Announcing LanceDB

Announcing LanceDB

3 min read

🙋 Are you building a generative AI app using an LLM or API? Are you working on a new recsys, a modern search engine, or a new analytical tool for unstructured data? Good news: you no longer have to struggle with Pinecone’s high cost, over the top complexity, or data privacy concerns.

🚀 LanceDB is a free and open-source vector database that you can run locally or on your own server. It’s lightning fast and is easy to embed into your backend server. Check out our github repo or pip install lancedb to get started.

For a quick demo, you can take a look at our sample youtube transcript search app. We’ll be adding a lot more soon. For more details on the API, take a look at LanceDB docs.

❓ Why we built LanceDB

As we spoke to builders of ML/AI applications, a common refrain from users was struggling to get services like Pinecone even running. After a while, we realized that the retrieve-filter-hydrate workflow was often time a big bottleneck in productivity and app latency.

So we put our heads together. I was one of the original co-authors of the pandas library. Lei was a core-contributor to HDFS and led ML infrastructure at Cruise. Using our experience building data/ML tooling, we’ve built something totally new.

LanceDB ❤️ builders

We’ve reimagined vector search from the ground up for better developer productivity, scalability, and performance. LanceDB is backed by Lance format — a modern columnar data format that is an alternative to parquet. It’s optimized for high speed random access and management of AI datasets like vectors, documents, and images.

We then added our own Rust implementations of a number of SOTA ANN-index algorithms to support low-latency vector search. These indices are SSD-based and can easily scale wayyyy beyond memory.

What’s more, LanceDB allows you to store and filter other features along with vectors. Our users have been able to replace 3–4 different data stores with LanceDB alone and achieve a speedup at the same time.

🔥 Updates

Since our first release 2 weeks ago, many exciting things have been happening. Thanks to Minh Le from the community, we now have a LangChain integration. Our integration for LlamaIndex is also under review.

Our current focus is building a TypeScript implementation with a native-level experience — no python server necessary! If you have thoughts on what the right API should look like, hop on this PR and let us know!

🛣️ Roadmap

Currently, we provide a python package called lancedb which is pip installable and delivers a great local workflow. Beyond what we’re working on right now, here’s what you can expect:

  • Ecosystem integrations into OpenAI plugin / AutoGPT etc
  • Richer set of embedding functions and document processing conveniences
  • Gallery of generative AI apps powered by LanceDB
  • Solutions for cloud deployment

🙏 We want to hear from you

We’d love to get your take on LanceDB. If you have questions, feedback, or want help using LanceDB in your app, don’t hesitate to drop us a line at contact@lancedb.com. And we’d really appreciate your support in the form of a Github Star on our LanceDB repo