Trusted answers to developer questions

Related Tags

one hot encoding
python
machine learning
data sciences

# One-hot encoding in Python

Educative Answers Team

Most machine learning algorithms cannot operate directly on categorical data. The categorical data must first be converted to numerical data, and one-hot encoding is one of the techniques used to perform this conversion. It is commonly used when deep learning techniques are applied to sequential classification problems.

One-hot encoding represents categorical variables as binary vectors. Each categorical value is first mapped to an integer. Each integer is then represented as a binary vector that is all 0s except at the index of that integer, which is set to 1. For example, with three categories mapped to 0, 1, and 2, the category mapped to 1 becomes [0, 1, 0].

## Manual one-hot encoding

The example below manually converts a categorical list of colors to a numerical list using one-hot encoding:

```python
import numpy as np

# Categorical data to be converted to numeric data
colors = ["red", "green", "yellow", "red", "blue"]

# Universal list of colors
total_colors = ["red", "green", "blue", "black", "yellow"]

# Map each color to an integer
mapping = {}
for x in range(len(total_colors)):
    mapping[total_colors[x]] = x

# Build one binary vector per color, with a 1 at the color's index
one_hot_encode = []
for c in colors:
    # .tolist() yields plain Python ints, so the result prints cleanly
    arr = np.zeros(len(total_colors), dtype=int).tolist()
    arr[mapping[c]] = 1
    one_hot_encode.append(arr)

print(one_hot_encode)
```
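Because each vector contains exactly one 1, the encoding can be reversed by looking up the position of that 1 in the universal list. The snippet below is a minimal sketch of that idea; `one_hot_decode` is a hypothetical helper name, and the `encoded` list is written out by hand to match the colors used above.

```python
def one_hot_decode(vectors, categories):
    """Map each one-hot vector back to its category label."""
    decoded = []
    for vec in vectors:
        # A valid one-hot vector has a single 1 and 0s everywhere else
        if vec.count(1) != 1 or vec.count(0) != len(vec) - 1:
            raise ValueError(f"not a one-hot vector: {vec}")
        decoded.append(categories[vec.index(1)])
    return decoded


# Universal list of colors; the index of each color is its integer code
total_colors = ["red", "green", "blue", "black", "yellow"]

# One-hot vectors for red, green, yellow, red, blue
encoded = [
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
]

print(one_hot_decode(encoded, total_colors))
# ['red', 'green', 'yellow', 'red', 'blue']
```

Decoding only works if the same universal list, in the same order, is used for both directions.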

## One-hot encoding using scikit-learn

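The example below uses the scikit-learn library to perform one-hot encoding. `LabelEncoder` first maps each color to an integer, and `OneHotEncoder` then converts those integers to binary vectors: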

```python
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder

# Categorical data to be converted to numeric data
colors = ["red", "green", "yellow", "red", "blue"]

# Integer mapping using LabelEncoder
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(colors)
print(integer_encoded)

# OneHotEncoder expects a 2D array: one row per sample, one column per feature
integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)

# One-hot encoding
# Note: scikit-learn 1.2 renamed the `sparse` parameter to `sparse_output`;
# on older versions, use OneHotEncoder(sparse=False) instead.
onehot_encoder = OneHotEncoder(sparse_output=False)
onehot_encoded = onehot_encoder.fit_transform(integer_encoded)
print(onehot_encoded)
```
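In recent scikit-learn versions, `OneHotEncoder` also accepts string categories directly, so the integer-mapping step can be skipped. The snippet below is a sketch under that assumption. It also shows `inverse_transform` for decoding and `handle_unknown="ignore"`, which encodes a category not seen during fitting as an all-zero vector. The variable names are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# One column of string categories: shape (n_samples, 1)
colors = np.array(["red", "green", "yellow", "red", "blue"]).reshape(-1, 1)

# handle_unknown="ignore" avoids an error on categories not seen in fit
encoder = OneHotEncoder(handle_unknown="ignore")

# The default output is a sparse matrix; .toarray() makes it dense
onehot = encoder.fit_transform(colors).toarray()

# Categories learned for the single input column, in sorted order
categories = encoder.categories_[0].tolist()
print(categories)

# Recover the original labels from the one-hot vectors
decoded = encoder.inverse_transform(onehot).ravel().tolist()
print(decoded)

# "black" was not in the training data, so its vector is all zeros
unseen = encoder.transform(np.array([["black"]])).toarray()
print(unseen)
```

Calling `.toarray()` on the default sparse result keeps this sketch independent of whether the installed version names the dense-output parameter `sparse` or `sparse_output`.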
