Embeddings

Learn how embeddings transform raw data into meaningful vectors that power search, recommendations, and modern AI systems.

If you walk into a machine learning or generative AI interview, there’s a high chance you’ll be asked, “What are embeddings, and why are they important?” This is a common interview question because embeddings are a fundamental concept in modern AI systems. An interviewer bringing this up wants to see that you understand how we represent data (like text or images) in a way that machines can work with. Embeddings come up in many contexts—from natural language processing to recommendation systems—so demonstrating a solid grasp of them signals that you’re well-versed in the building blocks of ML models. Essentially, this question tests whether you know that an embedding is a numeric representation of data that captures important meaning. It also probes if you appreciate why such representations are useful (for example, how they enable algorithms to measure similarity or learn patterns in the data).

To answer this question well, you should cover what embeddings are and why they matter. A strong answer would explain that an embedding is a vector (a list of numbers) that encodes some properties of the input data in a continuous space. You’d want to mention different types of embeddings: for instance, sparse representations like one-hot encodings or TF-IDF vs. dense embeddings learned by neural networks. You should also mention static embeddings (like Word2Vec or GloVe, where each word has a fixed vector) vs. contextual embeddings (like those from BERT, where the vector for a word can change depending on the sentence context). Interviewers expect you to know these distinctions.
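
To make the static-vs-contextual distinction concrete, here is a minimal sketch of how you could inspect a contextual embedding. It assumes the Hugging Face transformers library and the bert-base-uncased checkpoint, which are illustrative choices rather than part of this lesson’s required setup:

# Sketch: the contextual vector for "bank" shifts with the sentence.
# Assumes torch and transformers are installed; bert-base-uncased is one example model.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

for sentence in ["I deposited cash at the bank.",
                 "We sat on the grassy bank of the river."]:
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # one vector per token
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    bank_vector = hidden[tokens.index("bank")]
    print(sentence, "->", bank_vector[:4])  # first few dimensions differ by context

A static model like Word2Vec would, by contrast, return the same stored vector for “bank” in both sentences.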

In the rest of this lesson, we’ll discuss all these concepts: we’ll define embeddings, discuss how to create them, and explore their benefits.

What are embeddings?

An embedding is a way to represent data, such as a word, a sentence, an image, or any other item, as a point in a high-dimensional space. In practical terms, an embedding is a vector of numbers. Each dimension of this vector doesn’t necessarily have an interpretable meaning by itself, but collectively, the vector captures meaningful patterns or features of the data. The key idea is that similar data will have similar embedding vectors in this space. For example, in a text embedding space, the words “cat” and “kitty” would end up with close vectors, reflecting their related meaning. Likewise, two sentences that mean roughly the same thing will yield numerically close embeddings, even if the actual words differ.

To understand why we need embeddings, consider how we might feed text data into a machine learning model. Computers can’t directly interpret words or images; we must convert them into numbers. A simple approach might be to use a one-hot encoding for words, where each word is represented by a giant vector that is mostly zeros, with a single 1 indicating the word’s ID. However, this kind of representation is sparse (mostly zeros) and doesn’t reflect any similarity between words. A simple Python example is given below:

# One-hot encoding in simple Python
vocab = ['cat', 'kitten', 'bank', 'Java']
target = 'Java'
index = vocab.index(target)
one_hot = [1 if i == index else 0 for i in range(len(vocab))]
print(one_hot) # Output: [0, 0, 0, 1]

In a one-hot scheme, the words “cat” and “kitten” are just as different as “cat” and “bank,” sharing no features in common. In contrast, an embedding assigns each word a dense vector where all dimensions can have non-zero values, and it can place “cat” and “kitten” close together in that space because they are similar. A simple way to understand this is that sparse vectors tend to capture only superficial differences (distinct words are simply distinct IDs), whereas dense vectors encode semantic meaning: they carry information about how words relate to one another. Dense embeddings typically have far fewer dimensions than the vocabulary size (for example, a 300-dimensional dense vector instead of a 50,000-dimensional one-hot vector), yet they pack more information into each dimension, making them very information-rich.
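
The sketch below makes this contrast concrete. The dense values are hand-picked toy numbers chosen purely for illustration; real embeddings are learned by a model rather than set by hand:

# Contrasting similarity under one-hot vs. dense representations.
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

one_hot = {"cat": [1, 0, 0], "kitten": [0, 1, 0], "bank": [0, 0, 1]}
print(cosine(one_hot["cat"], one_hot["kitten"]))  # 0.0 -- no shared features
print(cosine(one_hot["cat"], one_hot["bank"]))    # 0.0 -- equally unrelated

dense = {"cat": [0.8, 0.1, 0.9], "kitten": [0.7, 0.2, 0.8], "bank": [0.1, 0.9, 0.2]}
print(round(cosine(dense["cat"], dense["kitten"]), 2))  # ~0.99 -- close in meaning
print(round(cosine(dense["cat"], dense["bank"]), 2))    # ~0.31 -- far apart

Under one-hot encoding, every pair of distinct words scores exactly zero; with dense vectors, the geometry itself reflects how related the words are.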

An embedding space is often imagined as a kind of map of meanings. If we take a large collection of words or other data and compute embeddings for all of them, we can visualize how they cluster. Items with similar characteristics will cluster together in this geometric space. For instance, if we generate embeddings for a bunch of words and reduce the dimensionality for visualization, we might see all the days of the week grouped in one region. Words like “Monday,” “Tuesday,” and “Wednesday” would form a tight cluster separate from, say, a cluster of color names or a cluster of animal names.
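
As a rough sketch of how such a visualization could be produced, the snippet below projects stand-in vectors down to two dimensions with PCA. It assumes NumPy and scikit-learn are installed and uses synthetic clusters in place of real word embeddings:

# Sketch: reducing embeddings to 2-D to reveal clusters.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(seed=0)
# Synthetic stand-ins for 50-dimensional word embeddings:
days = rng.normal(loc=1.0, scale=0.1, size=(7, 50))     # e.g., "Monday" ... "Sunday"
colors = rng.normal(loc=-1.0, scale=0.1, size=(5, 50))  # e.g., "red" ... "blue"
vectors = np.vstack([days, colors])

coords = PCA(n_components=2).fit_transform(vectors)
print(coords[:7])  # the seven "day" points land close together
print(coords[7:])  # the five "color" points form a separate cluster

In practice, you would feed real embedding vectors into the same pipeline (often using t-SNE or UMAP instead of PCA) and plot the 2-D coordinates to see clusters like the ones described above.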


This property—that semantic similarity in concepts translates to proximity in the embedding space—is what makes embeddings so powerful. It means we can mathematically measure similarity (for example, by cosine similarity or Euclidean distance between vectors) and have that correspond to conceptual ...