
What Are Embeddings?

Explore how embeddings transform text into numerical vectors that capture semantic meaning, enabling similarity-based search in LLM applications. Understand different embedding granularities, the role of cosine similarity, and how embedding quality affects retrieval accuracy in retrieval-augmented generation pipelines.

Every step in a Retrieval-Augmented Generation pipeline depends on one fundamental operation: converting human-readable text into numbers that machines can compare. The previous lesson on chunking strategies showed how documents are split into manageable pieces, but those pieces are still raw strings. A similarity search algorithm cannot look at the sentence “How do I reset my password?” and know it is related to “Steps to change your login credentials.” To bridge that gap, the pipeline needs a mathematical representation of meaning. That representation is called an embedding, and understanding how embeddings work is the single most important prerequisite for building effective retrieval systems.
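To see why raw strings fall short, consider a purely lexical comparison of the two sentences above. The sketch below uses Jaccard overlap of word sets (an illustrative baseline, not part of any RAG pipeline) to show that the sentences share no surface tokens even though they express the same intent:

```python
def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercase word sets: |A ∩ B| / |A ∪ B|."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    return len(set_a & set_b) / len(set_a | set_b)

query = "How do I reset my password?"
document = "Steps to change your login credentials"

# No shared words at all, despite identical intent.
print(token_overlap(query, document))  # → 0.0
```

A lexical score of zero here is exactly the gap embeddings close: meaning, not spelling, determines closeness.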

An embedding is a fixed-length vector of floating-point numbers that encodes the semantic content of a piece of text. Instead of treating words as arbitrary symbols, an embedding model learns to place text into a coordinate system where proximity reflects meaning. The word “king” and the word “queen” end up as nearby points in this coordinate system, while “king” and “refrigerator” land far apart. This is not hand-coded. The model learns these relationships from massive amounts of text data during training.

Embeddings come in different granularities, and choosing the right one depends on the task at hand.

  • Word embeddings: These assign a single vector to each word in a vocabulary. Models like Word2Vec and GloVe pioneered this approach, and Amazon SageMaker’s BlazingText algorithm provides a managed ...