Last fall, after experimenting with OpenAI’s GPT-3 text-generating AI model, the predecessor to GPT-4, former Uber researcher Jerry Liu discovered what he describes as “limitations” around the model’s ability to work with private data (e.g. personal files). To solve this, he launched an open source project, LlamaIndex, designed to unlock the capabilities and use cases of large language models (LLMs) such as GPT-3 and GPT-4.
“LLMs provide incredible capabilities for knowledge extraction and reasoning: they can answer questions, summarize and extract insights, and even [perform] sequential decision-making with an external environment,” Liu told businessroundups.org in an email interview. “But LLMs have limits.”
As the project grew in popularity, reaching 200,000 monthly downloads, Liu joined forces with Simon Suo, a former colleague at Uber, to turn LlamaIndex into a full-fledged company. Today, LlamaIndex (the company) provides a framework to help developers leverage the capabilities of LLMs on top of their personal or organizational data.
“LlamaIndex [helps] developers manage their data for LLM applications,” Liu said. “Our toolkit has the most depth in this aspect and we make it easy to integrate with other tools the developer is using.”
The LlamaIndex framework allows developers to connect data from files such as PDFs and PowerPoints, apps such as Notion and Slack, and databases such as Postgres and MongoDB to LLMs. The framework includes connectors to ingest data sources and data formats, as well as ways to structure data so that it can be easily used with LLMs.
In addition, LlamaIndex features a data retrieval and query interface that lets developers pass in any LLM input prompt and get back what Liu describes as “context and knowledge-augmented” output.
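To make the general idea concrete, here is a minimal sketch in plain Python of the pattern described above: ingest documents, index them, retrieve the most relevant one for a question, and prepend it to the prompt. It does not use LlamaIndex's actual API. The names `Document`, `KeywordIndex` and `build_augmented_prompt` are hypothetical, the sample data is made up, and simple keyword overlap stands in for whatever indexing and retrieval the real framework performs.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Document:
    """A piece of private data plus a label for where it came from (hypothetical type)."""
    source: str
    text: str


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class KeywordIndex:
    """Toy index: scores documents by how often the query's words appear in them."""

    def __init__(self, documents: list[Document]) -> None:
        self.documents = list(documents)
        self._counts = [Counter(tokenize(d.text)) for d in self.documents]

    def retrieve(self, query: str, top_k: int = 1) -> list[Document]:
        terms = tokenize(query)
        scored = []
        for doc, counts in zip(self.documents, self._counts):
            score = sum(counts[t] for t in terms)
            if score > 0:  # skip documents that share no words with the query
                scored.append((score, doc))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[:top_k]]


def build_augmented_prompt(index: KeywordIndex, question: str, top_k: int = 1) -> str:
    """Prepend the retrieved context to the question before it is sent to an LLM."""
    hits = index.retrieve(question, top_k)
    context = "\n".join(f"[{d.source}] {d.text}" for d in hits)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Made-up sample data standing in for content pulled from apps such as Notion and Slack.
docs = [
    Document("notion/roadmap", "The billing service migration is scheduled for March."),
    Document("slack/general", "Lunch on Friday is pizza."),
]
index = KeywordIndex(docs)
prompt = build_augmented_prompt(index, "When is the billing migration scheduled?")
print(prompt)
```

The resulting prompt carries the relevant private document alongside the question, which is what lets a general-purpose LLM answer from data it was never trained on. A production system would typically swap the keyword scoring for embedding-based similarity search.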
“There are other LLM application frameworks that provide basic building blocks for LLM applications and agents,” said Liu. “What’s specific to LlamaIndex is that we focus on connecting your data sources to LLMs, and we have extensive tools for data ingestion, data management and indexing, and data retrieval related to LLM applications.”
The prospect of expanding LLMs in this way attracted investors, who committed $8.5 million to LlamaIndex in a recently closed seed funding round. Greylock led the round, with participation from angel investors including Jack Altman, Lenny Rachitsky and Charles Xie.
So what will LlamaIndex spend the money on? Liu says it will be used to build an “enterprise solution” on top of the open source LlamaIndex project, with a launch planned for later this year. One capability will let customers use “production-grade” data connectors to parse and transport large amounts of data, while another, related capability will let them index “domain-specific” data.
“LlamaIndex is not tied to a specific piece of technology, so we can continue to be used with LLMs as the technology evolves,” Liu said. “The AI industry is evolving so quickly that any initial stacks that emerge will likely change over the course of the next few months.”