Thoughts on Generative AI

Generative artificial intelligence (‘Gen AI’) refers to ‘techniques that learn a representation from data and use it to generate brand-new, unique information that resembles but does not repeat the original data’. 

Examples of this include text, images, video, audio, structures, computer code, synthetic data, workflows and even models of physical objects.

AI 1.0 Has Arrived

Over the past year and a quarter, the growth of AI has been remarkable, with businesses across industries and sectors increasingly adopting AI technologies. 

This increasing rate of adoption has been driven by AI’s potential to transform operations, simplify decision-making processes, and optimize business performance. Advancements in natural language processing and computer vision, among other things, have made this possible. 

These developments have been further supported by vast increases in computational capacity and efficiency, which have enabled large language models to ingest huge datasets and return easily digestible responses to user inputs. 

Although exact costs are still unknown, estimates suggest that OpenAI (the company behind ChatGPT) spent several million dollars to train GPT-3 on approximately 45 terabytes of text data.

More recently, Anthropic (one of OpenAI’s closest rivals) announced that it is building a new frontier model, Claude-Next, which it estimates will ‘require a billion dollars in spending.’

Needless to say, building a generative AI model is extremely costly.

Due to the sheer financial and technical resources required to build these models, it comes as no surprise that the industry is dominated by a few heavyweights. 

At the top of the list, unsurprisingly, is OpenAI, which was valued at a staggering $80B USD after a recent tender offer share sale. 

ChatGPT itself has experienced explosive growth, becoming the fastest consumer internet app in history to reach 100 million users. 

However, recent data indicates that ChatGPT’s total monthly visits have fallen for the first time since launch, and the downward trend has continued.

Other dominant players include:

  • Anthropic: Operator of a research company intended to create large-scale artificial intelligence systems that are steerable, interpretable, and robust. The early-stage startup has raised $7.3B USD over the past year and now sports a valuation of around $18B USD. 
  • HuggingFace: Developer of the first open-source chatbot application and natural language processing technologies designed to facilitate AI-powered communication. As of its last Series D funding round of $235M USD, its valuation stands at $4.5B USD. 
  • Jasper: Developer of an AI-based content writing platform designed to help businesses create original content. The company is now valued at $1.2B USD after an internal valuation downgrade this past September. 
  • Glean: Developer of an AI-powered search engine designed to find any piece of data through integrations with multiple platforms. The startup is valued at $2.2B USD as of its latest funding round in February 2024.
  • Stability AI: Developer of text-to-image AI software valued at $1B USD based on its November 2023 financing of $50M USD.

Timeline of Generative AI Development

ChatGPT’s launch in November 2022 served as a major turning point for Generative AI. 

According to a report from the Pew Research Center, 58% of American adults know about ChatGPT. 

As OpenAI continues to roll out product updates and new features, competitors have also followed suit. 

Several months after the launch of ChatGPT, Microsoft previewed its AI-powered Bing Chat in February 2023, while Google launched Bard in March 2023. For now, OpenAI is winning the ‘Generative AI War’, as Google’s rush to launch Bard meant it fell short of ChatGPT in terms of accuracy and efficiency.

This mishap was also reflected in the public markets: back in February 2023, Google lost approximately $100B in market value in a single day after Bard was shown to produce inaccurate information in a promotional video.

But this was just an early attempt and we expect Google and other players to catch up with OpenAI’s dominance in the months ahead. 

That being said, there are clear limitations of what these early generative AI models are capable of. 

First, Gen AI models rely heavily on data-driven algorithms, which means that the overall accuracy of these models decreases when they are presented with new information or contexts outside their original training distribution. 

Second, there are multiple security risks associated with Gen AI. According to research done by Cyberhaven, 11% of data that employees paste into ChatGPT is confidential, including companies’ intellectual property, sensitive strategic information, and client data. The inherent nature of Gen AI has also caused malicious use of deep fakes, where bad actors have used voice and facial recognition to fool authentication systems and bypass existing security measures.
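As a rough illustration of how a company might guard against this kind of leak, the sketch below flags obviously sensitive strings before text is sent to an external model. The regex patterns are illustrative assumptions of ours, not a vetted data-loss-prevention rule set:

```python
import re

# Illustrative patterns only; a production filter would use a broader,
# vetted rule set (and likely ML-based classification on top).
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "api_key": re.compile(r"\b(?:sk|pk)-[A-Za-z0-9]{16,}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def flag_confidential(text: str) -> list[str]:
    """Return the names of patterns found in text, suggesting the text
    should not be pasted into an external LLM as-is."""
    return [name for name, pat in PATTERNS.items() if pat.search(text)]

hits = flag_confidential("Contact jane@acme.com, key sk-abc123def456ghi789")
```

A filter like this would sit in front of any employee-facing LLM integration, blocking or redacting flagged inputs before they leave the company’s perimeter.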

Therefore, data leaks and AI misuse are key risks that need to be addressed before mass institutional adoption can take place. 

As founders, investors, and businesses begin to place more weight on this revolutionary technology, Cadenza is closely following the space to capitalize on opportunities over the long-term.

Some Considerations

As Generative AI matures, several key considerations will be crucial in evaluating the success of a given project:


Data Privacy

LLMs like GPT are generally trained on large, public datasets to improve their accuracy and robustness. 

However, reports estimate that high quality public language data will likely be exhausted by 2026. As such, private data will become a key differentiator in the capabilities of various LLMs. Encryption will become critical to ensuring the security of data, as data breaches could prove catastrophic.

Data Ingestion

Companies adopting ‘AI-native’ data streams can secure several strategic advantages that set them apart in today’s data-intensive business landscape. 

This approach employs advanced AI algorithms for immediate data processing, cleaning, categorization, and analysis upon ingestion, resulting in improved data utilization, and swift decision-making. 
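As a minimal sketch of what ingest-time processing might look like (the keyword classifier below is a stand-in assumption; an AI-native system would use a trained model):

```python
from dataclasses import dataclass

# Each record is cleaned and categorized the moment it arrives, rather
# than being dumped raw into storage for later batch processing.

@dataclass
class Record:
    raw: str
    clean: str = ""
    category: str = "uncategorized"

# Stand-in for a learned classifier: keyword -> category.
CATEGORY_KEYWORDS = {"invoice": "finance", "contract": "legal", "resume": "hr"}

def ingest(raw: str) -> Record:
    rec = Record(raw=raw)
    rec.clean = " ".join(raw.split()).lower()      # normalize on arrival
    for kw, cat in CATEGORY_KEYWORDS.items():      # categorize on arrival
        if kw in rec.clean:
            rec.category = cat
            break
    return rec

rec = ingest("  Invoice   #123 from ACME  ")
```

The point of the pattern is that downstream consumers always see clean, labeled records, so analysis can begin immediately instead of waiting on a separate cleaning pass.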

Another key benefit is scalability: AI-native systems readily adjust to fluctuations in data volume, offering both cost-efficiency and flexibility. Unlike traditional data systems, which require significant investment to scale, AI-native systems simplify the process. 

These systems also enhance accuracy by processing large data volumes more precisely than human capabilities permit and continually learning to minimize error rates.

This results in superior data quality that translates into more reliable business insights. Moreover, they facilitate predictive analytics through machine learning, identifying data patterns and trends to aid proactive decision-making. 

Cost efficiency is another significant advantage, as AI automation reduces labor costs and the expenses associated with data storage and processing infrastructure. 

Take, for example, traditional data sourcing companies like PitchBook or Crunchbase, which currently outsource data ingestion to developing countries, where humans manually input data. 

In the face of continually emerging data sources and types, AI-native data ingestion’s agility stands out, allowing swift adaptation without the significant expenses associated with manual labor.

Lastly, these systems can enhance data security by using AI algorithms to identify data pattern anomalies that could signal potential breaches, providing an additional protective layer for sensitive information.
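A simple statistical stand-in for that kind of anomaly detection follows; real systems would model far richer access patterns, and the z-score threshold here is an illustrative assumption:

```python
import statistics

def find_anomalies(values, threshold=2.0):
    """Flag indices whose z-score exceeds the threshold, e.g. a sudden
    spike in record accesses that could signal a potential breach."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [i for i, v in enumerate(values)
            if abs(v - mean) / stdev > threshold]

# Daily record-access counts; the final day's spike stands out.
suspicious_days = find_anomalies([100, 98, 102, 101, 99, 500])
```

Flagged indices would feed an alerting pipeline so that security teams review the unusual access before data leaves the system.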

In essence, ‘AI-native’ data ingestion lets companies leverage their data more effectively, efficiently, and securely than traditional methods allow, helping them outscale competitors and incumbents.

Use vs Fine-tune vs Build

When considering the application of LLMs in existing workflows, three distinct options exist: 

  • Use existing LLMs
  • Fine-tune existing LLMs, or 
  • Build novel LLMs for specific usage

Large Language Models (LLMs) like GPT-4 from OpenAI are widely applicable across various tasks and domains, and can be fine-tuned for more specified use-cases, while some verticals may demand custom LLMs for privacy, accuracy, or other reasons.
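One way to frame that decision, as an illustrative heuristic of our own rather than an industry rule, is:

```python
# Hypothetical decision helper: maps a few coarse requirements to one of
# the three strategies discussed above.

def choose_llm_strategy(task_is_generic: bool,
                        has_domain_data: bool,
                        strict_privacy_or_accuracy: bool) -> str:
    if strict_privacy_or_accuracy:
        return "build"       # custom model, full control over data and weights
    if not task_is_generic and has_domain_data:
        return "fine-tune"   # adapt an existing model with domain data
    return "use"             # call an off-the-shelf model as-is

strategy = choose_llm_strategy(task_is_generic=True,
                               has_domain_data=False,
                               strict_privacy_or_accuracy=False)
```

In practice the trade-off also involves cost and talent: building is orders of magnitude more expensive than fine-tuning, which is in turn more expensive than simply calling an existing API.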

Value Creation in the New Internet

As the market matures, the picture for the Generative AI infrastructure stack is becoming increasingly clear. Categories include foundation models, fine-tuning, and data storage, all of which address how to deliver performance at scale.

Foundation Models 

Together is a decentralized compute platform that Cadenza originally funded at the pre-seed stage; it recently raised a new $106M round of funding at a $1.25 billion post-money valuation.

It is representative of a foundation model composed of open models, decentralized cloud, and a developer cloud, allowing for research and commercial applications to be built on top of it. 

These foundation models allow for much more flexibility, as they can be reassigned from one problem to another, and can be applied across multiple use cases such as art, programming and search.

As such, they open up the possibility of simply describing a task and shifting the burden of devising a solution to computers. 

Other examples of popular foundation models include OpenAI’s GPT-3 and Google’s PaLM.  Users can simply enter a prompt on GPT-3 for it to solve a variety of tasks, including Q&A, parsing unstructured data, and explaining code snippets. 

GitHub has also partnered with OpenAI’s Codex model to create GitHub Copilot, the world’s first at-scale Generative AI development tool that can autocomplete code in any language. Separately, Google launched Minerva, a language model built on PaLM that is capable of solving mathematical and scientific questions using step-by-step reasoning. 

Bottom line, foundation models are providing the basis for new and exciting applications. We therefore believe that open-source foundation models, such as those hosted by Together, will outperform closed-source API distribution models in the long run.

Fine-Tuning and Domain-Specific Models 

Fine-tuning is a customization technique to train an existing foundation model to become an ‘expert’ in a specific use case. 

Pre-trained models significantly reduce training time and costs, while increasing performance and accuracy.

By narrowing down an area of expertise, more targeted solutions (e.g. medical research, legal analysis, or customer support) can be created. There are different open-source frameworks like TensorFlow and PyTorch that facilitate dataset training to fine-tune an existing model. 
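A minimal PyTorch sketch of the fine-tuning pattern follows: freeze a stand-in ‘pretrained’ backbone and train only a new task-specific head. Real fine-tuning would load actual pretrained weights (for example via Hugging Face Transformers); the shapes and data here are toy assumptions:

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(16, 32), nn.ReLU())  # stand-in "pretrained" model
head = nn.Linear(32, 2)                                 # new task-specific head

for p in backbone.parameters():
    p.requires_grad = False                             # freeze the backbone

model = nn.Sequential(backbone, head)
opt = torch.optim.AdamW(head.parameters(), lr=1e-3)     # optimize only the head
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 16)                                  # toy batch of 8 examples
y = torch.randint(0, 2, (8,))
opt.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
opt.step()

# Only the head's parameters (32*2 weights + 2 biases = 66) are trainable.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
```

Freezing the backbone is what makes fine-tuning cheap relative to training from scratch: gradient updates touch only a tiny fraction of the parameters.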

Due to the host of benefits that fine-tuning brings, domain-specific models are becoming increasingly popular.  

The opportunities within this part of the infrastructure stack are seemingly endless. As an example, Bloomberg launched BloombergGPT, their own 50-billion parameter LLM ‘specifically trained on a wide range of financial data to support a diverse set of natural language processing tasks within the financial industry’. 

Bloomberg isn’t alone in their plan to leverage their own proprietary data and strengthen their AI moat. 

A recent survey on LLMs revealed that ‘nearly 40% of surveyed enterprises are already considering building enterprise-specific language models’, highlighting that companies recognize the benefits of incorporating LLMs into their workflows. 

As we have seen with each technological paradigm shift, domain-specific models have the potential to enhance user experiences and improve productivity. 

Data Storage

Within Generative AI, data storage and management are essential at every stage of development. 

Databricks ($43B Valuation) is one notable example which unifies data science, data engineering, and business analytics in a collaborative workspace. 

Built on top of Apache Spark, Databricks offers streamlined workflows for big data processing and machine learning tasks. The platform also features a variety of cloud-based services such as data storage, real-time analytics, and machine learning model training and deployment.

Key Insights

Overall, Generative AI has huge implications for the global economy. 

According to a report released by McKinsey, Generative AI is estimated to add approximately $2.6-4.4T USD in value annually across the 63 use cases they analyzed. About 75% of that value falls into four main categories: customer operations, marketing and sales, software engineering, and R&D. 

However, at Cadenza, we believe that the foundations on which Generative AI is built will disrupt how we use the Internet, representing a fundamental change in the underlying technology. 

In this new technology frontier, our thesis is that Generative AI will be more than just a tool or a feature; it will be a fundamental source of value creation in the new Internet.

If you found this informative, you may also like The Case for Decentralized Social Media or Web3 Gaming is Going Mainstream.

If you would like more information on our thesis surrounding Generative AI or other transformative technologies, please email info@cadenza.vc
