There is a lot of excitement about Generative AI these days, and for good reason. The emergence of this technology feels like a fundamental platform shift, much like the early days of the Internet or mobile, and opens our minds to what is possible at every layer of the software stack.
As with every major technology stack before it, communities of builders and users have stepped in and are building some of the most interesting Generative AI projects in open source. Several of the most active open source projects of the past few months, in both contribution and usage, have been in Generative AI. With the number of such projects increasing rapidly, Generative AI now deserves its own category in open source.
Today, we are launching a Generative AI Open Source (GenOS) Index to track the most active open source projects in Generative AI. We plan to update this index every month. Each update will identify the top 30 projects by GitHub star growth (stars added) over the preceding ninety days; a project must have added at least 500 stars in that window to be included. Because growth characteristics differ enough across project types, we also group the projects into three subcategories: Models, Infrastructure/Tools and Applications.
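For readers who want to see the selection rule in concrete terms, here is a minimal Python sketch of it. It is an illustration under stated assumptions, not the pipeline behind the index: the `Project` type, the `build_index` function, and the sample repositories are all hypothetical, and "star adds" is assumed to mean the net change in star count across the ninety-day window.

```python
from dataclasses import dataclass

MIN_STAR_ADDS = 500  # inclusion threshold stated in the post
TOP_N = 30           # size of the index stated in the post


@dataclass(frozen=True)
class Project:
    """Hypothetical record for one repository over a 90-day window."""
    name: str
    subcategory: str   # "Models", "Infrastructure/Tools" or "Applications"
    stars_start: int   # star count at the start of the window
    stars_end: int     # star count at the end of the window

    @property
    def star_adds(self) -> int:
        # Assumption: "adds" is the net change in stars over the window.
        return self.stars_end - self.stars_start


def build_index(projects, min_adds=MIN_STAR_ADDS, top_n=TOP_N):
    """Keep projects at or above the threshold, rank by star adds, take the top N."""
    eligible = [p for p in projects if p.star_adds >= min_adds]
    return sorted(eligible, key=lambda p: p.star_adds, reverse=True)[:top_n]


# Made-up sample data, for illustration only.
sample = [
    Project("example/model-a", "Models", 1_000, 4_200),
    Project("example/tool-b", "Infrastructure/Tools", 300, 9_100),
    Project("example/app-c", "Applications", 5_000, 5_450),  # only 450 adds
    Project("example/app-d", "Applications", 0, 2_600),
]

index = build_index(sample)
for rank, project in enumerate(index, start=1):
    print(rank, project.name, project.subcategory, project.star_adds)
```

In this sample, `example/app-c` has the most total stars after `example/tool-b` but is excluded because it added fewer than 500 in the window: the index ranks growth, not size.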
In Q1 2023, a total of 5,716 open source projects, existing or newly created, added at least 500 GitHub stars. Of those, we identified 187 as Generative AI projects: 46 (25%) Models, 83 (44%) Infrastructure/Tools and 58 (31%) Applications. Among the 30 fastest-growing Generative AI projects, the split shifts to 33% Models, 40% Infrastructure/Tools and 27% Applications.
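The percentages above follow directly from the counts, and a short Python check makes the arithmetic explicit. The counts are taken from this post; the per-subcategory counts for the top 30 are not stated here and are inferred from the rounded percentages, so treat them as an assumption.

```python
TOTAL_PROJECTS = 5_716  # projects that added at least 500 stars in Q1 2023

# Generative AI projects by subcategory, as reported in the post.
genai_counts = {"Models": 46, "Infrastructure/Tools": 83, "Applications": 58}

genai_total = sum(genai_counts.values())
shares = {
    name: round(100 * count / genai_total)
    for name, count in genai_counts.items()
}
genai_share_of_all = round(100 * genai_total / TOTAL_PROJECTS, 1)

print(genai_total)         # 187
print(shares)              # {'Models': 25, 'Infrastructure/Tools': 44, 'Applications': 31}
print(genai_share_of_all)  # 3.3

# Assumption: the only whole-number split of 30 projects that rounds to
# 33% / 40% / 27% is 10 / 12 / 8. These counts are inferred, not reported.
top30_counts = {"Models": 10, "Infrastructure/Tools": 12, "Applications": 8}
top30_shares = {
    name: round(100 * count / 30)
    for name, count in top30_counts.items()
}
print(top30_shares)        # {'Models': 33, 'Infrastructure/Tools': 40, 'Applications': 27}
```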
The top 30 Generative AI projects, ranked by the number of GitHub stars added during the last ninety days, are the following:
Beyond the top 30, there were several other really interesting Generative AI projects that we anticipate gaining adoption and breaking into the GenOS Index. Here are five Rising Stars that we liked the most:
In the current GenOS Index, Infrastructure/Tools projects were the most active, accounting for 40% of the top 30 and 44% of all 187 Generative AI projects that met the 500-star threshold. This reflects the emerging nature of the Generative AI category, which requires building the right infrastructure and toolchain first so that users can train models and build AI applications. Across all 187 projects, Applications were the second most active subcategory in Q1, followed by Models; within the top 30, that order is reversed, with Models at 33% and Applications at 27%.
Models: In the Models category, we see strong representation of “lightweight” GPT models such as nanoGPT, minGPT and PicoGPT, as well as ALBERT, a “lite” version of BERT from Google, and Cramming, which enables training a BERT-like LLM with limited compute. This indicates strong demand for OpenAI alternatives that are easier to train, possibly on proprietary domain data, and that run at a fraction of the cost. As more products get powered by AI, we anticipate growing demand for such lightweight GPT models in open source. In addition, we see the emergence of LLMs for languages beyond English: GLM-130B for English and Chinese, PhoBERT for Vietnamese and Rinna for Japanese.
Infrastructure/Tools: Among Infrastructure/Tools projects, we see adoption of ColossalAI, Petals and CarperAI trlX, which focus on making large AI models cheaper and faster to train and more accessible for inference. The same is true for projects like LangChain and LlamaIndex (GPT Index), which enable LLMs to connect with external data. Finally, vector databases such as Weaviate, Qdrant and Milvus are doing well; they scale similarity search by storing both vectors and objects in a database and making the data available through GraphQL, REST and other clients.
Applications: In the Applications category, we see a wide variety of applications powered by GPT, as expected from a community that is experimenting with many diverse use cases. Several of the projects, such as lencx/ChatGPT, wechat-ChatGPT and ChatGPT-Mac, make ChatGPT available through other interfaces such as a desktop app, WeChat and a menu bar. As Generative AI technology and infrastructure mature and become accessible to a wider audience of builders and creators, we expect projects in the Applications subcategory to surpass those in Models and Infrastructure/Tools.
While Generative AI projects were only 3.3% of the 5,716 open source projects that added at least 500 stars in Q1, we anticipate the share of fast-growing Generative AI projects to increase materially over the next several quarters.
Stay tuned for monthly installments of the Decibel GenOS Index!