Embeddings and Vector Stores in LangChain

Discover different ways of storing documents and using them in your applications with LangChain

Let’s dive right into one of the most critical pieces of building intelligent applications with LangChain: vector stores. By now, you’ve already experimented with language models to generate content or answer questions. But how do we store and retrieve text data in a way that actually captures its meaning? That’s where vector stores shine.

What are embeddings?

First, let’s discuss embeddings, which are closely linked with vector stores. An embedding is a numerical representation of text. Whether it’s a word, a sentence, or an entire document, embeddings convert it into a vector, a list of numbers that captures its semantic meaning.

An easy mental image is a giant three-dimensional space (though, in practice, the space often has hundreds or thousands of dimensions). Words or sentences that are related in meaning sit “close” to each other, while unrelated text lies farther away. For example, “kitten” would be near “cat,” but both would be quite distant from “car.”

A sample plot for the embeddings in 3 dimensions
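
To make “close” concrete, here is a minimal sketch that compares vectors with cosine similarity, one common way to measure closeness between embeddings (the lesson itself does not name a metric, so treat that choice as an assumption). The three-dimensional vectors are made up for illustration and do not come from any real model.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-D vectors for illustration only; real embeddings come from a model
toy_embeddings = {
    "kitten": [0.9, 0.8, 0.1],
    "cat": [1.0, 0.7, 0.0],
    "car": [0.1, 0.0, 1.0],
}

kitten_cat = cosine_similarity(toy_embeddings["kitten"], toy_embeddings["cat"])
kitten_car = cosine_similarity(toy_embeddings["kitten"], toy_embeddings["car"])

print(f"kitten vs cat: {kitten_cat:.3f}")
print(f"kitten vs car: {kitten_car:.3f}")
```

With these toy values, the kitten/cat score comes out close to 1 while the kitten/car score is much lower, which mirrors the picture of related words sitting near each other.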

This is powerful because machines don’t process English or any other language; they process numbers. By encoding words into vectors that represent their semantic relationships, we create a bridge between human language and the mathematical operations that computers excel at.

LangChain offers integrations with multiple embedding providers. One widely used option is OpenAI, which provides embedding models such as text-embedding-3-large. Here’s a quick look at how you might use it:

from langchain_openai import OpenAIEmbeddings

# Create an embeddings client that uses OpenAI's text-embedding-3-large model
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

Previously, we used Groq to access our LLM of choice. At the time of writing, however, no embedding model was available on Groq, so we use OpenAI's embedding models here. OpenAIEmbeddings reads your API key from the OPENAI_API_KEY environment variable.

That’s all it takes to get started. This line of code creates a client for OpenAI's embedding model, which converts your text into dense numerical vectors. ...
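
To see the vectors themselves, here is a minimal sketch that uses the `embed_query` and `embed_documents` methods of LangChain's embeddings interface. The helper `describe_embeddings` is a hypothetical name introduced for illustration, and the OpenAI call only runs when an `OPENAI_API_KEY` environment variable is set, since it sends a request to OpenAI's API.

```python
import os


def describe_embeddings(embedder, texts):
    """Embed each text and return (text, vector_length) pairs.

    `embedder` is any object with an `embed_documents` method,
    such as a LangChain embeddings instance. This helper is a
    hypothetical illustration, not part of LangChain.
    """
    vectors = embedder.embed_documents(texts)  # one vector per input text
    return [(text, len(vector)) for text, vector in zip(texts, vectors)]


# Only call OpenAI when a key is configured; the request goes over the network
if __name__ == "__main__" and os.environ.get("OPENAI_API_KEY"):
    from langchain_openai import OpenAIEmbeddings

    embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

    # embed_query turns a single string into one vector (a list of floats)
    query_vector = embeddings.embed_query("kitten")
    print(f"query vector length: {len(query_vector)}")

    # embed_documents turns a list of strings into a list of vectors
    for text, dimension in describe_embeddings(embeddings, ["cat", "car"]):
        print(f"{text}: {dimension} dimensions")
```

Every vector returned by the same model has the same length, which is what lets them be compared against one another.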