Data Science in 5 Minutes: What is One Hot Encoding?

Learn what one-hot encoding is, when to use it, how it compares to other encoding techniques, and how to implement it with Pandas and Scikit-learn to prepare categorical data for machine learning models.

15 mins read

Jun 01, 2026

If you’re in the field of data science, you’ve probably heard the term “one hot encoding”. Even the Sklearn documentation tells you to “encode categorical integer features using a one-hot scheme”. But what is one hot encoding, and why do we use it?

Most machine learning tutorials and tools require you to prepare data before it can be fit to a particular ML model. One hot encoding is a process for converting categorical variables into a numerical form that machine learning algorithms can use, which can improve their predictions. It is a crucial part of feature engineering for machine learning.

In this guide, we will introduce you to one hot encoding and show you when to use it in your ML models. We’ll provide some real-world examples with Sklearn and Pandas.

This tutorial at a glance:

What is one hot encoding?
How to convert categorical data to numerical data
One hot encoding with Pandas
One hot encoding with Sklearn
Next steps for your learning

Start mastering feature engineering for ML with our hands-on course today.

Feature Engineering for Machine Learning

Feature engineering is a crucial stage in any machine learning project. It allows you to use data to define features that enable machine learning algorithms to work properly. In this course, you will learn techniques for creating new features from existing ones. You’ll start by diving into label encoding, which is crucial for converting categorical features into numerical ones. You’ll also learn about other types of encoding, such as one-hot, count, and mean encoding, all of which are important for feature engineering. In the remaining chapters, you’ll learn about feature interaction and datetime features. In all, this course will show you the many different ways you can create features from existing ones.

30mins

Advanced

10 Playgrounds

1 Quiz

What is one hot encoding?#

Categorical data refers to variables that are made up of label values. For example, a “color” variable could have the values “red,” “blue,” and “green.” Think of these values as categories, which sometimes have a natural ordering (such as low, medium, and high) and sometimes do not.

Some machine learning algorithms, such as certain decision tree implementations, can work directly with categorical data. Most, however, require all input and output variables to be numeric. This means that categorical data must first be mapped to numbers.

One hot encoding is one method of converting data to prepare it for an algorithm and get a better prediction. With one-hot, we convert each categorical value into a new binary column and assign a value of 1 or 0 to those columns. Each integer-encoded value is represented as a binary vector in which every entry is 0 except the one at that value’s index, which is 1.

Take a look at this chart for a better understanding:

Let’s apply this to an example. Say we have the values red and blue. First, we assign each value an integer: red gets 0 and blue gets 1.

It’s crucial to keep this mapping consistent. A fixed mapping makes it possible to invert the encoding later and recover the original categorical values.

Once we assign integers, we create a binary vector to represent each one. Because there are 2 values, each vector has length 2. Thus, red is represented by the binary vector [1,0], and blue by [0,1].
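The red/blue example above can be sketched in plain Python. This is a minimal illustration that assumes a fixed category list; the helper names `one_hot` and `decode` are hypothetical and not part of any library. Because the category order never changes, the position of the 1 is enough to recover the original value.

```python
# Fixed category order: red -> index 0, blue -> index 1
categories = ["red", "blue"]

def one_hot(value, categories):
    """Hypothetical helper: return a binary vector with a 1 at the category's index."""
    vector = [0] * len(categories)
    vector[categories.index(value)] = 1
    return vector

def decode(vector, categories):
    """Hypothetical helper: invert the encoding by finding the position of the 1."""
    return categories[vector.index(1)]

print(one_hot("red", categories))   # [1, 0]
print(one_hot("blue", categories))  # [0, 1]
print(decode([0, 1], categories))   # blue
```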

Why use one hot encoding?#

One hot encoding is useful for categorical data whose values have no natural order. Many machine learning algorithms, especially linear and distance-based models, treat numeric inputs as magnitudes. In other words, if red = 0, green = 1, and blue = 2, the model may treat blue as “larger” than red, or green as lying halfway between them.

That ordering is helpful for ordinal data, but many categorical features have no ranking at all, and an artificial order can lead to poor predictions. That’s when one hot encoding saves the day: it gives each category its own independent column.

One-hot encoded features are already on a 0 to 1 scale, so they rarely need rescaling. One hot encoding is also common for output labels in classification. A one-hot target vector has the same shape as a model’s predicted probability distribution over classes, which lets the model express its confidence across every class rather than emit a single label.

How to read this decision tree#

Use one-hot encoding when your categories are nominal, unordered, and have a small number of unique values, such as red, blue, and green.
Use ordinal encoding when the categories have a meaningful order, such as low, medium, and high.
Use label encoding carefully. It can work well with tree-based models, but for unordered categories, it may accidentally imply an order that does not exist.
Use target encoding when you have high-cardinality features, such as ZIP codes, product IDs, or user segments. It can reduce dimensionality, but you should apply it carefully to avoid data leakage.
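To make the ordinal vs. one-hot distinction concrete, here is a small sketch that assumes Scikit-learn’s OrdinalEncoder for an ordered feature and pd.get_dummies for an unordered one. The data is made up for illustration. Passing the category order explicitly is what guarantees low < medium < high; without it, the encoder would sort the labels alphabetically.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Ordered feature: pass the order explicitly so low < medium < high
sizes = pd.DataFrame({"size": ["low", "high", "medium", "low"]})
ordinal = OrdinalEncoder(categories=[["low", "medium", "high"]])
size_codes = ordinal.fit_transform(sizes)
print(size_codes.ravel())  # [0. 2. 1. 0.]

# Unordered feature: one binary column per color, no implied ranking
colors = pd.DataFrame({"color": ["red", "blue", "green", "red"]})
color_dummies = pd.get_dummies(colors["color"], prefix="color")
print(color_dummies.columns.tolist())  # ['color_blue', 'color_green', 'color_red']
```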

The encoding choice is a feature engineering decision, and it can directly affect model accuracy.

Handling high-cardinality categorical features#

One-hot encoding works well when a feature has only a few categories. But in real-world machine learning systems, you’ll often encounter features with hundreds or even thousands of unique values. This is called high cardinality, and it can create serious performance and scalability problems if you’re not careful.

What does “high cardinality” mean?#

A categorical feature has high cardinality when it contains many unique categories. There is no strict cutoff, but features with more than roughly 50 to 100 unique values are often described this way.

Common examples include:

Product IDs in e-commerce systems
City or ZIP code features
User IDs
Search queries or keywords

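For example, a ZIP code feature can easily contain thousands of unique values, and one-hot encoding it adds one new column per ZIP code.

That’s where problems start.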

Why one-hot encoding becomes problematic#

With high-cardinality features, one-hot encoding can quickly become inefficient.

Here’s why:

It creates too many columns
Memory usage increases significantly
Most values become 0, creating sparse matrices
Training becomes slower
Some models struggle with extremely wide datasets
It can reduce generalization and hurt performance

For small datasets, this might be manageable. For production-scale ML systems, it often isn’t.

Small vs large category example#

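A small categorical feature works well with one-hot encoding: a color feature with three values (red, blue, green) produces just three binary columns. A feature with 1,000 unique values, such as product IDs, is a different story.

That means 1,000 separate columns for just one feature.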
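A quick way to see the difference is to compare the shape of the encoded output in each case. This sketch uses synthetic data; the `product_{i}` naming is hypothetical and only stands in for real product IDs.

```python
import pandas as pd

# Low-cardinality feature: 3 unique colors -> 3 columns
colors = pd.Series(["red", "blue", "green"] * 100)
print(pd.get_dummies(colors).shape)  # (300, 3)

# High-cardinality feature: 1,000 synthetic product IDs -> 1,000 columns
product_ids = pd.Series([f"product_{i}" for i in range(1000)] * 3)
print(pd.get_dummies(product_ids).shape)  # (3000, 1000)
```

The row count is the same order of magnitude in both cases; only the number of distinct categories drives the column explosion.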

Better alternatives for high-cardinality features#

Instead of blindly applying one-hot encoding, you can use more scalable encoding techniques.

Target encoding#

Target encoding replaces each category with the average target value associated with it.

Example:

City A → average purchase value = 120
City B → average purchase value = 85

This works especially well for:

Tree-based models
Large tabular datasets
Kaggle-style ML problems

Warning: If done incorrectly, target encoding can cause data leakage because it uses information from the target variable.
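A minimal sketch of target encoding with pandas follows, using made-up city and purchase data that mirrors the example above. The key assumption is that category means are computed on the training split only and then mapped onto the test split, with unseen categories falling back to the global training mean. This is a simplified illustration: each training row still contributes to its own encoding, so real pipelines typically add out-of-fold encoding or smoothing on top of it.

```python
import pandas as pd

# Hypothetical training and test data
train = pd.DataFrame({
    "city": ["A", "A", "B", "B", "B", "C"],
    "purchase": [100, 140, 80, 90, 85, 60],
})
test = pd.DataFrame({"city": ["A", "B", "D"]})  # "D" never appears in training

# Compute per-city means on the TRAINING data only to avoid leaking test targets
city_means = train.groupby("city")["purchase"].mean()
global_mean = train["purchase"].mean()

# Map means onto both sets; unseen categories fall back to the global mean
train["city_encoded"] = train["city"].map(city_means)
test["city_encoded"] = test["city"].map(city_means).fillna(global_mean)
print(test)
```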

Frequency/count encoding#

This technique replaces categories with how often they appear.

Example:

“New York” → 12,500
“Chicago” → 8,200

Useful when:

Category frequency itself carries meaning
You want a simple and lightweight solution
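Frequency encoding takes only a couple of lines in pandas. In this sketch the city list is made up; in a real pipeline you would compute the counts on training data only, and `value_counts(normalize=True)` gives proportions instead of raw counts if you prefer a scale that does not depend on dataset size.

```python
import pandas as pd

cities = pd.Series(["New York", "Chicago", "New York", "Boston", "New York", "Chicago"])

# Count how often each category appears, then replace each value with its count
counts = cities.value_counts()
encoded = cities.map(counts)
print(encoded.tolist())  # [3, 2, 3, 1, 3, 2]
```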

Hash encoding#

Hash encoding maps categories into a fixed number of columns using a hash function.

Benefits:

Controls feature size
Works well for very large datasets
Useful in streaming systems and NLP pipelines

Trade-off:

Different categories can occasionally collide into the same bucket
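One way to try hash encoding is Scikit-learn’s FeatureHasher. This sketch assumes one city string per row and an arbitrary budget of 8 output columns; with so few buckets, collisions between cities are possible, which is exactly the trade-off described above. Setting `alternate_sign=False` keeps the hashed values non-negative so each row contains a single 1.

```python
from sklearn.feature_extraction import FeatureHasher

# Each sample is a list of string features; here, one city per row
cities = [["New York"], ["Chicago"], ["Lahore"], ["Abuja"]]

# Map every city into 8 columns regardless of how many distinct cities exist
hasher = FeatureHasher(n_features=8, input_type="string", alternate_sign=False)
hashed = hasher.transform(cities)  # returns a SciPy sparse matrix

print(hashed.shape)  # (4, 8)
print(hashed.toarray())
```

The number of columns stays fixed at 8 even if new cities appear later, which is why hashing suits streaming data.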

Embedding layers#

Deep learning models often use embeddings instead of one-hot encoding.

Instead of creating thousands of sparse columns, embeddings learn dense numerical representations for categories.
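An embedding lookup is mathematically the same as multiplying a one-hot vector by a weight matrix; it simply skips building the one-hot vector. This NumPy sketch illustrates that equivalence. The sizes are arbitrary assumptions, and the table is random here, whereas a real model learns it during training.

```python
import numpy as np

rng = np.random.default_rng(0)
num_categories, embedding_dim = 1000, 16

# Embedding table: one dense row per category (random here, learned in a real model)
embedding_table = rng.normal(size=(num_categories, embedding_dim))

category_ids = np.array([3, 42, 999])

# One-hot route: build 1,000-wide vectors, then multiply by the table
one_hot = np.eye(num_categories)[category_ids]   # shape (3, 1000)
via_one_hot = one_hot @ embedding_table          # shape (3, 16)

# Embedding route: a direct row lookup gives the same result
via_lookup = embedding_table[category_ids]       # shape (3, 16)

print(np.allclose(via_one_hot, via_lookup))  # True
```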

Common use cases:

Recommendation systems
NLP models
Large-scale deep learning pipelines

This is how systems like YouTube, Netflix, and modern language models handle massive categorical spaces efficiently.

When should you use each approach?#

One-hot encoding → Small category sets
Target encoding → Tree-based models and tabular ML
Frequency encoding → Lightweight preprocessing
Hash encoding → Extremely large feature spaces
Embeddings → Deep learning and recommendation systems

Practical recommendation#

In practice, there’s no single “best” encoding strategy.

A good rule of thumb is:

Use one-hot encoding for low-cardinality features
Use alternative encodings when category counts become large
Always validate performance using cross-validation

The right encoding strategy can significantly improve both model scalability and feature engineering quality.

Sparse matrices and memory efficiency#

One-hot encoding is simple and powerful, but it comes with a hidden cost: memory usage. As the number of categories grows, the encoded dataset can become extremely large because most values in the matrix are zeros. That’s where sparse matrices become important.

Why one-hot encoding wastes memory#

When you one-hot encode categorical features, each category becomes its own binary column.

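For example, a feature with 1,000 categories produces 1,000 columns, but each row contains a single 1 and 999 zeros. A dense matrix stores every one of those zeros, while a sparse matrix stores only the nonzero values and their positions.

On larger datasets, the memory difference becomes dramatic.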
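The sketch below compares the two representations using Scikit-learn’s OneHotEncoder on synthetic data (5,000 rows, 500 categories). It relies on the encoder’s default sparse output rather than naming the parameter, because that parameter is called `sparse_output` in newer Scikit-learn releases and `sparse` in older ones. The sparse byte count adds up the three arrays a CSR matrix stores: values, column indices, and row pointers.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import OneHotEncoder

# 5,000 rows drawn from 500 categories (each category appears 10 times)
values = np.array([f"cat_{i % 500}" for i in range(5000)]).reshape(-1, 1)

encoder = OneHotEncoder()  # returns a SciPy sparse matrix by default
X_sparse = encoder.fit_transform(values)
X_dense = X_sparse.toarray()

# Sparse storage keeps only nonzero values plus their index arrays
sparse_bytes = X_sparse.data.nbytes + X_sparse.indices.nbytes + X_sparse.indptr.nbytes
dense_bytes = X_dense.nbytes

print(sparse.issparse(X_sparse), X_sparse.shape)  # True (5000, 500)
print(f"dense:  {dense_bytes / 1e6:.1f} MB")
print(f"sparse: {sparse_bytes / 1e6:.2f} MB")
```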

When should you care?#

Large datasets
NLP systems (bag-of-words, TF-IDF)
Recommendation systems
High-cardinality categorical features
Production ML pipelines

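Algorithms also differ in how well they handle sparse input. In Scikit-learn, linear models such as LogisticRegression accept sparse matrices directly, and some tree-based implementations do as well, while other estimators may require, or internally create, a dense array.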

Practical recommendation#

Use sparse matrices whenever your feature dimensionality becomes large. They improve both memory efficiency and scalability, which becomes critical in real-world machine learning systems.

How to convert categorical data to numerical data#

Manually converting our data to numerical values includes two basic steps:

Integer encoding
One hot encoding

For the first step, we assign each category value an integer. If we had the values red, yellow, and blue, we could assign them 1, 2, and 3, respectively.

When categorical variables have no order or relationship, we need to take this one step further. Step two applies one-hot encoding to the integers we just assigned: we remove the integer-encoded variable and add a binary variable for each unique integer value.

Above, we had three categories, or colors, so we use three binary variables. Each row gets a 1 in the column for its color and a 0 in the other two columns.

red	yellow	blue
1	0	0
0	1	0
0	0	1

Note: In many other fields, binary variables are referred to as dummy variables.
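The two steps can be sketched with NumPy. This is an illustrative assumption rather than the article’s own code: it uses 0-based integers (0, 1, 2) instead of 1, 2, 3, because array indices start at 0, and it builds the one-hot rows by selecting rows of an identity matrix.

```python
import numpy as np

# The category order defines the column order: red, yellow, blue
categories = ["red", "yellow", "blue"]
colors = ["blue", "red", "red", "yellow"]

# Step 1: integer encoding (0-based here)
integer_codes = np.array([categories.index(c) for c in colors])
print(integer_codes)  # [2 0 0 1]

# Step 2: one-hot encoding by selecting rows of an identity matrix
one_hot = np.eye(len(categories), dtype=int)[integer_codes]
print(one_hot)
# [[0 0 1]
#  [1 0 0]
#  [1 0 0]
#  [0 1 0]]
```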

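What is the dummy variable trap?#

In the example above, knowing the values of two columns automatically reveals the value of the third: if a row is not red and not yellow, it must be blue. For linear regression models with an intercept, this redundancy (perfect multicollinearity) makes coefficient estimation unstable and harder to interpret.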

To avoid this issue, many machine learning practitioners drop one encoded column and treat it as the baseline category. Libraries such as Pandas and Scikit-learn provide options to automate this behavior. For example, pd.get_dummies(drop_first=True) or OneHotEncoder(drop='first') can remove the redundant feature automatically.

It's important to note that the dummy variable trap primarily affects linear models. Tree-based algorithms such as Random Forests and Gradient Boosting are generally unaffected because they do not rely on matrix inversion when learning relationships.
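The redundancy can be checked numerically. This sketch, built on a made-up color column, appends an intercept column of ones to the full one-hot encoding and to a version with `drop_first=True`, then compares each matrix’s rank with its number of columns. A rank lower than the column count means some column is an exact linear combination of the others, which is the dummy variable trap.

```python
import numpy as np
import pandas as pd

colors = pd.Series(["red", "yellow", "blue", "red", "blue"])

# Full one-hot encoding plus an intercept column of ones
full = pd.get_dummies(colors).astype(float)
X_full = np.column_stack([np.ones(len(colors)), full.to_numpy()])

# Drop the first (alphabetical) category, "blue", to use it as the baseline
reduced = pd.get_dummies(colors, drop_first=True).astype(float)
X_reduced = np.column_stack([np.ones(len(colors)), reduced.to_numpy()])

# Rank < number of columns means the columns are perfectly collinear
print(X_full.shape[1], np.linalg.matrix_rank(X_full))        # 4 3
print(X_reduced.shape[1], np.linalg.matrix_rank(X_reduced))  # 3 3
print(reduced.columns.tolist())  # ['red', 'yellow']
```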

One hot encoding with Pandas#

We don’t have to one hot encode manually. Many data science tools offer easy ways to encode your data. The Python library Pandas provides a function called get_dummies to enable one-hot encoding.

df_new = pd.get_dummies(df, columns=["col1"], prefix="Planet")

Let’s see this in action.

Here, get_dummies performs one-hot encoding on a pandas DataFrame object, and the prefix parameter sets the prefix of the new column names.

Let’s apply this to a practical example. Say we have the following dataset.

import pandas as pd

# One id per city so both lists have the same length
ids = [11, 22, 33, 44, 55]
cities = ['Seattle', 'London', 'Lahore', 'Berlin', 'Abuja']

df = pd.DataFrame(list(zip(ids, cities)),
                  columns=['Ids', 'Cities'])

Here we have a Pandas DataFrame called df with two columns: Ids and Cities. Calling df.head() gives this result:

	Ids	Cities
0	11	Seattle
1	22	London
2	33	Lahore
3	44	Berlin
4	55	Abuja

We see here that the Cities column contains our categorical values: the names of our cities. We can convert them into one binary column per city using the get_dummies() function we discussed above.

y = pd.get_dummies(df.Cities, prefix='City')
print(y.head())

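Here, we pass City as the prefix argument of get_dummies(). Running the code prints the encoded values: one column per city (City_Abuja, City_Berlin, City_Lahore, City_London, City_Seattle), sorted alphabetically, with a 1 (shown as True in pandas 2.0 and later) in the row where that city appears.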

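One hot encoding with Sklearn#

Scikit-learn provides the same functionality through the OneHotEncoder class. In the example below, we first convert string labels to integers with LabelEncoder and then one-hot encode them.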

import numpy as np
import sklearn.preprocessing as preprocessing

# Six color labels stored as strings
targets = np.array(["red", "green", "blue", "yellow", "pink",
                    "white"])

# Step 1: map each string to an integer (classes are sorted alphabetically)
labelEnc = preprocessing.LabelEncoder()
new_target = labelEnc.fit_transform(targets)

# Step 2: fit the one-hot encoder on the integer column (2D input required)
onehotEnc = preprocessing.OneHotEncoder()
onehotEnc.fit(new_target.reshape(-1, 1))
targets_trans = onehotEnc.transform(new_target.reshape(-1, 1))

print("The original data")
print(targets)
print("The transformed data using OneHotEncoder")
# transform() returns a sparse matrix; toarray() makes it dense for printing
print(targets_trans.toarray())

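LabelEncoder converts each string to an integer (labelEnc.fit_transform).
We then create our OneHotEncoder object (onehotEnc).
onehotEnc.fit() learns the categories from the integer-encoded feature, which is reshaped into a single column because the encoder expects 2D input.
onehotEnc.transform() converts that feature into its one-hot representation.
The final print statement shows the new data. Because transform() returns a sparse matrix, toarray() converts it to a dense array for display.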

Note: Since Scikit-learn 0.20, OneHotEncoder can encode string categories directly, so the LabelEncoder step is no longer necessary.

Let’s see the OneHotEncoder class in action with another example. First, here’s how to import the class.

from sklearn.preprocessing import OneHotEncoder

Like before, we first populate our list of unique values for the encoder.
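A minimal sketch of that example, assuming made-up color data: the encoder is fit directly on strings, and `handle_unknown="ignore"` tells it to encode a category it never saw during fitting as an all-zero row instead of raising an error. Depending on your Scikit-learn version, you may also see a warning when unknown categories are found.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Training data: string categories can be passed directly (2D input)
train_colors = np.array([["red"], ["green"], ["blue"], ["green"]])
test_colors = np.array([["blue"], ["purple"]])  # "purple" was never seen

# handle_unknown="ignore" encodes unseen categories as all-zero rows
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train_colors)

print(encoder.categories_[0].tolist())  # ['blue', 'green', 'red']
print(encoder.transform(test_colors).toarray())
# [[1. 0. 0.]
#  [0. 0. 0.]]
```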

Comparing Pandas, Sklearn, and category_encoders#

Tool	Best For	Pros	Limitations	Handles train/test consistency?	Pipeline-friendly?
`pd.get_dummies()`	Quick analysis and notebooks	Simple, readable, easy to use	Can create train/test column mismatches	Not automatically	No
`sklearn.OneHotEncoder`	Production ML workflows	Works with Sklearn pipelines, handles unseen categories	Slightly more setup	Yes	Yes
`category_encoders`	Advanced feature engineering	Supports target encoding and high-cardinality features	Requires extra library and careful validation	Yes, when fitted properly	Yes
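The train/test mismatch listed for `pd.get_dummies()` is easy to reproduce. In this sketch (made-up data), the test set contains only one color, so it gets only one column. Reindexing against the training columns is one common workaround; Scikit-learn’s OneHotEncoder avoids the problem by remembering the categories it saw during fitting.

```python
import pandas as pd

train = pd.DataFrame({"color": ["red", "green", "blue"]})
test = pd.DataFrame({"color": ["green", "green"]})  # only one category present

train_encoded = pd.get_dummies(train, columns=["color"])
test_encoded = pd.get_dummies(test, columns=["color"])
print(train_encoded.columns.tolist())  # ['color_blue', 'color_green', 'color_red']
print(test_encoded.columns.tolist())   # ['color_green']

# One workaround: align the test columns to the training columns
test_aligned = test_encoded.reindex(columns=train_encoded.columns, fill_value=0)
print(test_aligned.columns.tolist())   # ['color_blue', 'color_green', 'color_red']
```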

Target encoding replaces each category with a value based on the target variable. This can be useful for high-cardinality features like city names, product IDs, or user segments.

Warning: Target encoding can cause data leakage if you apply it incorrectly. Always fit encoders on training data only and validate with cross-validation.

When should you use each?#

Use pd.get_dummies() when you’re doing quick analysis, exploring data, or building a simple notebook example.
Use sklearn.OneHotEncoder when you’re building a real ML pipeline and need consistent behavior across training and test data.
Use category_encoders when one-hot encoding creates too many columns or when you need advanced techniques like target encoding, count encoding, or hashing.

In practice, start simple with one-hot encoding, then move to advanced encoders when your dataset or model needs it.

One-hot encoding in PyTorch and TensorFlow#

In deep learning, one-hot encoding is commonly used to represent categorical labels in a numerical format that neural networks can understand. You’ll see it frequently in classification tasks where models predict one class out of many possible categories.

Unlike traditional ML preprocessing pipelines, deep learning frameworks often perform one-hot encoding directly on tensors during training.

One-hot encoding in TensorFlow#

TensorFlow provides the tf.one_hot() function for converting integer labels into one-hot encoded tensors.

This is especially useful in:

Multi-class classification
Image classification labels
NLP token processing

TensorFlow example#
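Here is a minimal sketch, assuming the integer labels [0, 1, 2] so that the shapes match the explanation that follows. `tf.one_hot()` takes the label tensor and a `depth` (the number of classes) and returns a float32 tensor.

```python
import tensorflow as tf

# Three integer class labels, three possible classes
labels = tf.constant([0, 1, 2])

one_hot = tf.one_hot(labels, depth=3)

print(one_hot.shape)  # (3, 3)
print(one_hot.numpy())
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```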

Shape explanation#

Input shape: (3,)
Output shape: (3, 3)

Each label becomes a vector of length 3, where:

1 marks the correct class
0 marks all other classes

For example:

Label 2 → [0, 0, 1]

Common TensorFlow use cases#

Image classification
Multi-class neural networks
Token representation in NLP pipelines
Recommendation systems

One-hot encoding in PyTorch#

PyTorch provides torch.nn.functional.one_hot() for the same purpose.

The idea is identical:

Integer labels are converted into one-hot vectors
Each class gets its own position in the vector

PyTorch example#
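A minimal sketch, again assuming the labels [0, 1, 2] to match the shapes below. `torch.nn.functional.one_hot()` expects an integer (int64) tensor and returns an integer tensor, so you may need `.float()` before using the result in a custom loss. Note that PyTorch’s built-in `CrossEntropyLoss` accepts integer class labels directly, so explicit one-hot targets are not always required.

```python
import torch
import torch.nn.functional as F

# Integer class labels (torch.tensor infers int64 here)
labels = torch.tensor([0, 1, 2])

one_hot = F.one_hot(labels, num_classes=3)

print(one_hot.shape)  # torch.Size([3, 3])
print(one_hot)
# tensor([[1, 0, 0],
#         [0, 1, 0],
#         [0, 0, 1]])
```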

Shape explanation#

Input tensor shape: (3,)
Output tensor shape: (3, 3)

Each row represents one encoded class label.

Common PyTorch use cases#

Deep learning classifiers
Custom loss functions
NLP pipelines
Reinforcement learning models

One-hot encoding vs embeddings in deep learning#

One-hot encoding works well when the number of categories is small. But for large vocabularies or high-cardinality features, it becomes inefficient because the vectors grow very large and sparse.

That’s why modern deep learning systems often use embeddings instead.

Embeddings:

Learn dense numerical representations
Reduce dimensionality
Improve scalability and memory efficiency

This is especially important in:

NLP systems
Recommendation engines
Transformer models and modern AI architectures

Practical use cases#

You’ll commonly see one-hot encoding in:

Image classification labels (cat, dog, car)
NLP token encoding
Recommendation systems
Multi-class prediction tasks

Important warning#

One-hot encoding very large vocabularies can become memory-intensive because every category creates a new dimension. For large-scale deep learning systems, embeddings are usually the preferred solution.

Next steps for your learning#

Congrats on making it to the end! You should now have a good idea of what one hot encoding does and how to implement it in Python. There is still a lot to learn to master machine learning feature engineering. Your next steps are:

One hot with Numpy
Count encoding
Mean encoding
Label encoding
Weight of evidence encoding

To get introduced to these, check out Educative’s mini course Feature Engineering for Machine Learning. You’ll learn the techniques to create new ML features from existing features. You’ll start by diving into label encoding, which is crucial for converting categorical features into numerical ones. In the remaining chapters, you’ll learn about feature interaction and datetime features.

Happy learning!

Written By:

Amanda Fawcett
