
Embeddings and Vector Stores in LangChain

Explore how LangChain leverages embeddings to represent text as vectors that capture semantic meaning. Understand the importance of vector stores for storing, indexing, and searching these vectors to build intelligent AI applications. Learn practical methods for adding documents with metadata into vector stores, and discover integrations like Chroma, Pinecone, and FAISS that enable scalable and persistent storage solutions. This lesson provides foundational knowledge for creating applications that retrieve information based on meaning rather than exact matches, using LangChain's flexible tools and vector databases.

Let’s dive right into one of the most critical pieces of building intelligent applications with LangChain: vector stores. By now, you’ve already experimented with language models to generate content or answer questions. But how do we store and retrieve text data in a way that actually captures its meaning? That’s where vector stores shine.

What are embeddings?

First, let’s discuss embeddings, which are closely linked with vector stores. An embedding is a numerical representation of text. Whether it’s a word, a sentence, or an entire document, an embedding model converts it into a vector: a list of numbers that captures its semantic meaning.

An easy way to picture this is as a giant three-dimensional space (though, in practice, the space often has hundreds or thousands of dimensions). Words or sentences that are related in meaning sit “close” to each other, while unrelated text drifts farther away. For example, “kitten” would be near “cat,” but both would be quite distant from “car.”

[Figure: A sample plot of embeddings in three dimensions]
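To make the idea concrete, here is a minimal, self-contained sketch using cosine similarity, a common way to measure how closely two vectors point in the same direction. The three-dimensional values below are invented purely for illustration; real embedding models produce vectors with hundreds or thousands of dimensions.

# Toy 3-D vectors, hand-picked for illustration only; real embeddings
# come from a model and have far more dimensions.
kitten = [0.90, 0.80, 0.10]
cat = [0.85, 0.75, 0.15]
car = [0.10, 0.20, 0.90]

def cosine_similarity(a, b):
    # Cosine similarity: 1.0 means the vectors point the same way,
    # values near 0 mean they are unrelated directions.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

print(cosine_similarity(kitten, cat))  # high: "kitten" sits close to "cat"
print(cosine_similarity(kitten, car))  # lower: "kitten" is far from "car"

This “closeness” measure is exactly what vector stores use under the hood when they search for text that is similar in meaning.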

This is powerful because machines don’t process English or any other language; they process numbers. By encoding words into vectors that represent their semantic relationships, we create a bridge between human language and the mathematical operations that computers excel at.

LangChain offers integrations with multiple embedding providers. One standout option is OpenAI, which provides state-of-the-art models like text-embedding-3-large. Here’s a quick look at how you might use it:

from langchain_openai import OpenAIEmbeddings

# Load OpenAI's text-embedding-3-large model through LangChain's integration.
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

Previously, we used Groq to access our LLM of choice; however, at the time of writing, Groq offers no embedding models, so we will use OpenAI and its library of models instead.

That’s all it takes to get started. This line of code loads a model that can turn any text you give it into an embedding vector.
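As a quick sanity check, here is a minimal sketch of using the embeddings object. LangChain’s embeddings interface exposes embed_query for a single string and embed_documents for a batch; the query text below is just an example, and the vector’s dimensionality depends on the model you chose.

# Embed a single query string; the result is a plain list of floats.
vector = embeddings.embed_query("What is a vector store?")
print(len(vector))  # dimensionality of the embedding vector
print(vector[:3])   # first few components of the vector

# Embed several texts at once; you get back one vector per input.
doc_vectors = embeddings.embed_documents(["kitten", "cat", "car"])
print(len(doc_vectors))  # 3, one vector per text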