Trusted answers to developer questions

Related Tags

one hot encoding
python
machine learning
data sciences

One-hot encoding in Python

Educative Answers Team

Many machine learning algorithms cannot operate on categorical data directly, so the categorical data must first be converted to numerical data. One-hot encoding is one of the techniques used to perform this conversion. It is commonly used, for example, when deep learning techniques are applied to sequential classification problems.

One-hot encoding represents categorical variables as binary vectors. The categorical values are first mapped to integer values. Each integer value is then represented as a binary vector that is all 0s except for a 1 at the index given by that integer. For example, if red, green, and blue are mapped to 0, 1, and 2, then green becomes [0, 1, 0].
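
The two steps described above (category to integer, integer to binary vector) can be sketched compactly with NumPy. This is a minimal illustration that assumes the full list of categories is known in advance; the helper name `one_hot` is hypothetical and not part of any library. It relies on the fact that row `i` of an identity matrix is exactly the binary vector with a single 1 at index `i`.

```python
import numpy as np

def one_hot(values, categories):
    """Hypothetical helper: one-hot encode `values` against a fixed `categories` list."""
    # Step 1: map each category to an integer index
    index_of = {category: i for i, category in enumerate(categories)}
    indices = [index_of[v] for v in values]
    # Step 2: row i of the identity matrix is all 0s except a 1 at index i
    return np.eye(len(categories), dtype=int)[indices]

encoded = one_hot(["red", "green", "red"], ["red", "green", "blue"])
print(encoded)
# [[1 0 0]
#  [0 1 0]
#  [1 0 0]]
```

A value that is not in `categories` raises a `KeyError` in step 1, which is usually preferable to silently producing a wrong vector.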


Manual one-hot encoding

The example below manually converts a categorical list of colors into numerical data using one-hot encoding:

import numpy as np

### Categorical data to be converted to numeric data
colors = ["red", "green", "yellow", "red", "blue"]

### Universal list of colors (every category that can appear)
total_colors = ["red", "green", "blue", "black", "yellow"]

### Map each color to an integer index
mapping = {color: index for index, color in enumerate(total_colors)}

### Build one binary vector per input value
one_hot_encode = []

for c in colors:
  arr = np.zeros(len(total_colors), dtype=int)  # start with all 0s
  arr[mapping[c]] = 1                            # set this color's index to 1
  one_hot_encode.append(arr.tolist())            # plain Python ints for readable output

print(one_hot_encode)
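
Because each vector contains exactly one 1, the encoding can be reversed by locating that 1. The sketch below is an illustrative addition rather than part of the original example: it reuses the same `total_colors` list and uses `np.argmax` to recover each color. It assumes every vector is a valid one-hot vector.

```python
import numpy as np

total_colors = ["red", "green", "blue", "black", "yellow"]

# One-hot vectors for "red", "green", "yellow" (columns follow total_colors)
one_hot_encode = [
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
]

# argmax along axis=1 returns the position of the 1 in each row
indices = np.argmax(np.array(one_hot_encode), axis=1)
decoded = [total_colors[i] for i in indices]
print(decoded)  # ['red', 'green', 'yellow']
```

For an all-zero row `argmax` returns 0, so this decoding would silently report the first category; validating that each row sums to 1 is a cheap safeguard.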

One-hot encoding using scikit-learn

The example below uses the scikit-learn library to perform one-hot encoding:

from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder

### Categorical data to be converted to numeric data
colors = ["red", "green", "yellow", "red", "blue"]

### Integer mapping using LabelEncoder (classes are sorted alphabetically)
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(colors)
print(integer_encoded)

### OneHotEncoder expects a 2D array: one row per sample, one column per feature
integer_encoded = integer_encoded.reshape(-1, 1)

### One-hot encoding (sparse_output=False returns a dense NumPy array;
### scikit-learn versions before 1.2 call this parameter sparse)
onehot_encoder = OneHotEncoder(sparse_output=False)
onehot_encoded = onehot_encoder.fit_transform(integer_encoded)

print(onehot_encoded)

Copyright ©2022 Educative, Inc. All rights reserved