This interview is part of the Decibel OSS Spotlight series, where we showcase founders of fast-growing, community-led open source projects that are solving unique problems and seeing strong community adoption.
Sudip Chakrabarti spoke to Shishir Patil and Tianjun Zhang, the co-creators of Gorilla, an open source project that enables LLMs to use tools by invoking APIs. Given a natural language query, Gorilla finds the semantically and syntactically correct API to invoke, demonstrating how LLMs can accurately invoke 1,600+ (and growing) APIs while reducing hallucination.
Shishir and Tianjun shared with us their inspiration behind creating Gorilla and their vision to make it a widely adopted project.
Tianjun: The name Gorilla actually reflects the mission of our project. We believe that LLMs, as incredible as they are, can solve meaningful, real-life problems only when they are connected with external tools. Since gorillas are among the great apes known to use tools - using sticks to forage for food and even to gauge the depth of water - we thought Gorilla would be a fitting name for our project.
Tianjun: I am a 5th-year PhD student at UC Berkeley’s Sky Computing Lab, advised by Prof. Joseph Gonzalez. Before Berkeley, I completed two years of undergrad at Shanghai Jiao Tong University followed by two years at the University of Michigan. Early in my PhD, I focused on building autonomous agents using Reinforcement Learning for robotics and for games such as StarCraft. Later, as I started looking at LLMs, I realized their potential to create highly capable autonomous agents by leveraging their vast knowledge of the world. This realization inspired Shishir and me to create Gorilla, itself an autonomous agent, designed to connect multiple external tools and assist in complex tasks. It brought my initial research on autonomous agents full circle.
Shishir: I am a 4th-year PhD student at the Sky Computing Lab, where I work with Prof. Joseph Gonzalez and also collaborate with Professors Prabal Dutta and Ion Stoica. I did my undergrad in India and then spent two years at Microsoft Research as a Research Fellow before moving to Berkeley. My research interests span the two extremes of systems: edge computing - how to build applications for extremely low-power devices like ARM Cortex microcontrollers, the ThingWorx EMS, and smartphones - and sky (cloud) computing - how to build applications powered by LLMs that run across multiple disparate cloud platforms.
Shishir: Last year, when ChatGPT became popular, we were already exploring LLMs. We realized that while using LLMs for chat is a powerful demonstration of what’s possible, integrating them with other tools - via APIs - is crucial to accomplishing meaningful tasks. Most existing solutions were, and continue to be, prompt-based, like LangChain, which means they can’t handle the scale of web applications with potentially millions of changing APIs. Moreover, discovering the right APIs with the correct parameters for specific tasks poses a significant hurdle. Whether it's connecting an iOS application to a database, implementing authentication through Auth0, or adding a payment gateway like Stripe, calling the right APIs with the right context requires expertise. This problem extends beyond specialized APIs to even commonly used services like AWS, GCP, and Azure, each offering thousands of APIs, with each API requiring several different input parameters. The current reliance on human experts, or on time-consuming searches through API documentation and online resources like StackOverflow, makes the process inefficient and unmanageable when building complex applications.
We built Gorilla to solve this exact problem. We figured that if we could teach an LLM to make the right API calls with the right parameters and context, then we could easily connect all the tools needed to build really powerful LLM-driven applications. That’s how Gorilla was born!
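To make this concrete, here is a minimal sketch of what querying Gorilla looks like from a user's point of view, using an OpenAI-compatible client pointed at a hosted Gorilla model. The endpoint URL and model name below are placeholders of ours, not guaranteed to match the current deployment; the project README and Colab notebooks carry the up-to-date values.

```python
# Minimal sketch: ask Gorilla for the right API call in plain English.
# Uses the legacy openai<1.0 client; the endpoint and model name are
# illustrative placeholders, not the project's actual values.
import openai

openai.api_key = "EMPTY"                           # hosted demo needs no key
openai.api_base = "http://<gorilla-host>:8000/v1"  # placeholder endpoint

response = openai.ChatCompletion.create(
    model="gorilla-7b-hf-v1",                      # placeholder model name
    messages=[{
        "role": "user",
        "content": "I want to translate English text to Chinese.",
    }],
)

# Gorilla replies with a concrete, ready-to-run API call (for example,
# a HuggingFace translation pipeline) rather than a chatty explanation.
print(response.choices[0].message.content)
```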
Tianjun: Gorilla is a novel pipeline designed for fine-tuning LLMs to call APIs on a large scale, potentially encompassing millions of constantly changing APIs with varying functionalities and constraints.
To create Gorilla, we started by constructing a massive dataset of ML APIs from public repositories like TorchHub, TensorHub, and HuggingFace. We enhanced this dataset by generating synthetic user question prompts for each API, turning each entry into an {instruction, reference API} pair. We then developed RAT - Retrieval-Aware Training. The idea behind RAT is simple: retrievers are not always accurate and can make mistakes, so you want to train the LLM to be aware that a retrieved data point may be wrong. Using this dataset and RAT, we fine-tuned Gorilla, which is based on the LLaMA-7B model and incorporates retrieval of API documentation. This retrieval-aware training approach enables Gorilla to adapt to API documentation changes at test time and to reason effectively about API constraints and input parameters. The result is a reliable API call generator with virtually no hallucination, an impressive ability to adapt to API usage changes at test time, and the capability to satisfy constraints while selecting appropriate APIs.
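As a rough illustration of the retrieval-aware idea - the function and data layout here are our own sketch, not Gorilla's actual training code - each training example pairs an instruction with a retrieved API doc, and a fraction of examples deliberately attach a random, likely irrelevant doc so the fine-tuned model learns not to trust the retriever blindly:

```python
import random

def make_rat_example(instruction, gold_api, api_pool, noise_rate=0.3):
    """Build one retrieval-aware training example (illustrative only).

    With probability `noise_rate`, pair the instruction with a random
    (likely wrong) API doc instead of the gold one, so the model learns
    that retrieved context can be unreliable.
    """
    retrieved = random.choice(api_pool) if random.random() < noise_rate else gold_api
    prompt = (
        f"{instruction}\n"
        "Use this API documentation for reference if relevant:\n"
        f"{retrieved['doc']}"
    )
    # The target is always the ground-truth call, whatever was retrieved.
    return {"prompt": prompt, "completion": gold_api["call"]}

# Toy API pool standing in for TorchHub/HuggingFace entries:
apis = [
    {"doc": "torch.hub.load(repo, model): load a model from TorchHub",
     "call": "torch.hub.load('pytorch/vision', 'resnet18')"},
    {"doc": "pipeline(task): build a HuggingFace pipeline",
     "call": "pipeline('image-classification')"},
]
print(make_rat_example("Classify this image of a cat.", apis[1], apis))
```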
Gorilla's greatest strength lies in handling APIs with constraints. API calls often come with inherent limitations, requiring the LLM to not only understand the API's functionality but also categorize calls based on different constraint parameters. This introduces added complexity, demanding a more nuanced understanding from the LLM. For machine learning API calls, common constraints include parameter size and minimum accuracy. For example, a prompt like "Invoke an image classification model with less than 10M parameters and at least 70% ImageNet accuracy" poses a significant challenge for the LLM to interpret accurately. The model must comprehend the user's description and reason about the embedded constraints within the request.
Gorilla excels in meeting this challenge, allowing LLMs to navigate the intricate landscape of constraints that accompany real-world API calls. It goes beyond merely comprehending API functionality, demonstrating its capability to handle complex constraints effectively.
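To give a flavor of the reasoning involved, here is that constraint check written out as explicit code. Gorilla learns to do this end to end from natural language and API documentation; the catalog and numbers below are purely illustrative:

```python
from dataclasses import dataclass

@dataclass
class ApiCandidate:
    name: str
    params_millions: float   # model size in millions of parameters
    imagenet_top1: float     # ImageNet top-1 accuracy in [0, 1]

# Toy catalog; real entries would come from TorchHub/HuggingFace metadata.
candidates = [
    ApiCandidate("resnet50", 25.6, 0.761),
    ApiCandidate("mobilenet_v3_small", 2.5, 0.677),
    ApiCandidate("efficientnet_b0", 5.3, 0.777),
]

# "Less than 10M parameters and at least 70% ImageNet accuracy."
viable = [c for c in candidates
          if c.params_millions < 10 and c.imagenet_top1 >= 0.70]
best = max(viable, key=lambda c: c.imagenet_top1)
print(best.name)  # -> efficientnet_b0
```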
Finally, we introduced a technique to actually measure hallucination: using Abstract Syntax Tree (AST) matching, we can now accurately put a number on how much LLMs hallucinate when tasked with making API calls.
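As a simplified sketch of that idea, using Python's built-in ast module: the real evaluation matches a generated call against the whole dataset of reference APIs, counting a call that matches no reference sub-tree as a hallucination, whereas this toy version compares one generated call against a single reference:

```python
import ast

def call_signature(code):
    """Extract the dotted function name and keyword args from one call."""
    node = ast.parse(code, mode="eval").body
    assert isinstance(node, ast.Call), "expected a single function call"
    name = ast.unparse(node.func)
    kwargs = {kw.arg: ast.unparse(kw.value) for kw in node.keywords}
    return name, kwargs

def matches_reference(generated, reference):
    """Hallucination check: the generated call must use the reference's
    function and must not invent keyword arguments the reference lacks."""
    g_name, g_kwargs = call_signature(generated)
    r_name, r_kwargs = call_signature(reference)
    return g_name == r_name and set(g_kwargs) <= set(r_kwargs)

ref = "torch.hub.load(repo_or_dir='pytorch/vision', model='resnet18')"
print(matches_reference(ref, ref))                              # True
print(matches_reference("torch.hub.download(model='x')", ref))  # False
```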
Tianjun: At Sky Lab, we have a rich history of open sourcing our research, including projects like Mesos, Spark, Ray, Skyplane, SkyPilot, POET, and more. Naturally, when it came to Gorilla, open sourcing was an obvious choice for us. We had started building Gorilla after observing significant hallucinations and inaccuracies in API calls and input parameters with GPT-4. Our aim was to create a far superior solution. But instead of just claiming Gorilla's superiority, we wanted to empower our users to evaluate it themselves and draw their own conclusions. To achieve this, we decided to provide the actual model, the evaluation dataset, and the evaluation technique to our users. And open sourcing was the easiest way to achieve our goal.
Shishir: We have been completely blown away by the reception of the project from the get-go - honestly, we were not prepared for it. The day we uploaded the Gorilla paper to arXiv, the project blew up that same evening, with several hundred people liking it and asking how they could try it. And we didn’t even have a project website! So we slogged through the next 24 hours to release the models on HuggingFace along with instructions for replicating and building on top of Gorilla, put up a hosted model so people could try it easily from Colab notebooks, and started building the community around it.
Since then, Gorilla has surpassed 6,000 GitHub stars, garnered 2,000+ active Discord community members, and served 100,000+ user requests. The project's momentum keeps growing, thanks to active contributions from the community. Originally, Gorilla supported ~1,600 APIs - mostly ML APIs - from TorchHub, TensorHub, and HuggingFace. Since then, the community has helped us add support for multiple developer tools and cloud services - Kubernetes, AWS, GCP, GitHub, GNU tools, Bash, and other CLI APIs - with several other exciting additions in the pipeline.
Shishir: Starting with a mistake: I wish we had set up the hosted service and website, and shared the open source code, before submitting the arXiv paper. There was a moment when everyone was emailing us for resources, and even a few hours matter in that initial phase. Thankfully, we were able to course-correct quickly. On the bright side, we have been deeply engaged with the community and have genuinely benefitted from its feedback. For example, we had initially focused on ML APIs, but the community's requests for other mainstream developer tools and cloud services made us adapt and add support for them.
Second, we spend a lot of time working on pull requests from the community, even when some of them represent corner cases. Most of our users are technical, and they often have very specific asks. We go the extra mile to accommodate their needs - we have found that to be an easy way to build credibility with the community and to increase usage of Gorilla - even though it has meant sleeping only every other night for the past couple of months!
Finally, we make every effort to be completely transparent with our community and to clearly communicate our plans. For example, we make it very clear that we only show our users the correct API call and the corresponding parameters; we do not execute the actual call. We do not collect any data from the output of any API except for the exit condition - success or failure. Trust is crucial in the open source world, and we strive to be upfront and honest at every step of the journey.
Tianjun: I have three: EdgeML from Microsoft, PyTorch and HuggingFace. I find myself using the tools built by these communities a lot and I really like the developer experience as well as how well maintained these open source projects are.
Shishir: I look at successful open source projects through a different lens. While many are backed by large organizations - like Spark by Databricks or Kafka by Confluent - I am drawn to projects created and maintained by a single creator or a small team, yet solving a problem - however narrow - really well. Case in point: fzf, the command-line fuzzy finder (with 53k+ GitHub stars) that provides a neat UI for reverse-i-search on the terminal. It is blazing fast and just works amazingly well; once you start using it, you’ll never go back! I have a whole list of similar open source gems that might never become standalone businesses but are absolutely delightful to use.
Shishir: One piece of advice I have for every new open source project creator is to focus on making the "step zero" incredibly easy for users - so easy that people could try out the project while casually scrolling through their phone during lunch. I've seen many open source projects make the mistake of introducing unnecessary barriers, like asking for sign-ups or credit card details (even when they don’t plan to charge), which creates friction for users who are just trying to evaluate the project's usefulness. I believe that if users can evaluate your project really quickly and then use it with minimal changes to their workflow, it becomes so much easier to gain adoption and attract users and contributors to the project.