Related Tags

numpy

# How to convert an array of indices to a one-hot encoded NumPy array

Vafa Batool

### Overview

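One-hot encoding is a widely used technique in machine learning for converting categorical data, such as red, blue, and green, into binary vectors of $0$s and $1$s that algorithms can process. Each category is represented by a vector with a single $1$ in the position assigned to that category and $0$ everywhere else.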
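As a minimal sketch of that idea, the snippet below maps a list of color labels to integer indices with np.unique and then one-hot encodes them. The colors data and the variable names are assumptions made for illustration; they are not part of the original example.

```python
import numpy as np

# Hypothetical categorical data (illustrative only)
colors = np.array(["red", "blue", "green", "blue"])

# np.unique returns the sorted unique labels and, with return_inverse=True,
# the position of each original label within that sorted list
categories, indices = np.unique(colors, return_inverse=True)

# One row per sample, one column per category
one_hot = np.zeros((colors.size, categories.size), dtype=int)
one_hot[np.arange(colors.size), indices] = 1

print(categories)  # sorted labels: blue, green, red
print(one_hot)
```

Because the labels are sorted, column 0 stands for blue, column 1 for green, and column 2 for red.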

When the categories are already stored as non-negative integers, each value can be treated as a column index. A high-level representation of a 1-D NumPy array of such indices being converted into a one-hot encoded 2-D array is as follows:

How a simple array would look after being converted to a one-hot encoded array

### Method

One-hot encoding creates a 2-D array whose number of rows equals the size of the original 1-D array and whose number of columns equals its maximum element plus $1$, because column indices start at $0$. In the example above, the number of rows is $3$ (the number of elements in the 1-D array) and the number of columns is $5$ (the maximum element plus $1$, or $4 + 1$). In each row, a single $1$ is stored at the position given by the corresponding number in the original array, which is now treated as a column index. For example, in the one-hot encoded array above, $1$ is stored at index $1$ in row $1$ for the number $1$, and at index $4$ in row $2$ for the number $4$.
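The shape rule can be checked directly. This sketch assumes a hypothetical 3-element array whose largest value is 4, chosen only to match the dimensions described above; the values in the original figure may differ.

```python
import numpy as np

# Hypothetical index array: 3 elements, largest value 4 (illustrative only)
indices = np.array([1, 4, 2])

n_rows = indices.size        # one row per element
n_cols = indices.max() + 1   # columns 0 .. max, so max + 1 in total

one_hot = np.zeros((n_rows, n_cols), dtype=int)
one_hot[np.arange(n_rows), indices] = 1

print(one_hot.shape)
print(one_hot)
```

Each row contains exactly one $1$, and taking the position of that $1$ in every row (for instance with argmax along axis 1) recovers the original array.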

Let's look at the step-by-step transformation of another simple example below:

Conversion of array of indices to one-hot encoded array

### Code

Now let's see the method in action in Python using NumPy. NumPy's array creation functions and integer array indexing let us perform this transformation without an explicit Python loop. Have a look at the code below, and try changing the values in the array to see how the result changes.

```python
import numpy as np

# creating a 1-D array of indices
simple_array = np.array([0, 2, 1])

# creating a 2-D array filled with 0s
encoded_array = np.zeros((simple_array.size, simple_array.max() + 1), dtype=int)

# setting a 1 in each row at the column given by the original array
encoded_array[np.arange(simple_array.size), simple_array] = 1

print(encoded_array)
```
Code example showing conversion of an array into a one-hot encoded array
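A common alternative, which is not the method used in the code above, is to index the rows of an identity matrix. The sketch below assumes the same input array and relies only on np.eye and integer array indexing.

```python
import numpy as np

simple_array = np.array([0, 2, 1])

# Row k of the identity matrix is the one-hot vector for index k,
# so indexing the identity matrix with the array picks one row per element
encoded_array = np.eye(simple_array.max() + 1, dtype=int)[simple_array]

print(encoded_array)
```

This version is shorter, but it builds a full square identity matrix first, so the zeros-based approach may be preferable when the maximum index is large.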

### Explanation

• Line 1: We import NumPy to use functions from this library.
• Line 4: We declare a simple array of numbers.
• Line 7: We initialize a 2-D array of $0$'s using the numpy.zeros function, which takes the shape (rows, columns) as its first argument and the data type as its second argument. As mentioned earlier, the rows are equal to the length of the original array and the columns are equal to the value of the max element added to 1. In the code above, both the rows and columns are equal to 3. The data type of the array is specified as int.
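• Line 10: We use the numpy.arange function to create the row indices, from $0$ up to the size of the original array minus $1$, and pair them with the values of the original array as column indices. Every selected element is then set to $1$.

In the example above, numpy.arange returns [0 1 2]. These row indices are paired element by element with the column indices [0 2 1] from the original array, so the positions (0, 0), (1, 2), and (2, 1) are set to $1$ in a single assignment, without an explicit loop. Notice that in the first row (row 0), $1$ is placed at index 0 for the number $0$, and so on.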
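To make the pairing of row and column indices concrete, the sketch below compares the indexed assignment with an equivalent explicit loop. The loop is shown only for comparison, and the names vectorized and looped are illustrative.

```python
import numpy as np

simple_array = np.array([0, 2, 1])
n_rows, n_cols = simple_array.size, simple_array.max() + 1

# Indexed assignment: pairs row i with column simple_array[i]
vectorized = np.zeros((n_rows, n_cols), dtype=int)
vectorized[np.arange(n_rows), simple_array] = 1

# Equivalent explicit loop, shown only for comparison
looped = np.zeros((n_rows, n_cols), dtype=int)
for row, col in enumerate(simple_array):
    looped[row, col] = 1

print(np.array_equal(vectorized, looped))
```

Both versions produce the same array; the indexed assignment simply performs all of the updates in one step.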
