What is the get_dummies function in Pandas?

The get_dummies function is used to convert categorical variables into dummy or indicator variables.

A dummy or indicator variable can have a value of 0 or 1.

How get_dummies works

The get_dummies function works as follows:

  • It takes a data frame, series, or list.
  • Then, it converts each unique element present in the object to a column heading.
  • The function iterates over the object that is passed and checks if the element at the particular index matches the column heading.
  • If it does, it encodes it as a 1.
  • Otherwise, it assigns it a 0.
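The steps above can be sketched by hand for a flat list of values. This is only an illustration of the idea, not how Pandas implements it internally; manual_dummies is a hypothetical helper name, and dtype=int is passed to get_dummies so that both results hold integers.

```python
import pandas as pd

def manual_dummies(values):
    """Hypothetical helper: mimic the steps above for a flat list of values."""
    # Each unique element becomes a column heading (sorted, as get_dummies does)
    headings = sorted(set(values))
    # For every element, write 1 where it matches the heading and 0 elsewhere
    rows = [[1 if value == heading else 0 for heading in headings]
            for value in values]
    return pd.DataFrame(rows, columns=headings)

values = list('abcb')
manual = manual_dummies(values)
builtin = pd.get_dummies(values, dtype=int)

print(manual)
# Compare the encoded values produced by the sketch and by Pandas
print(manual.values.tolist() == builtin.values.tolist())
```

Each row of the result contains exactly one 1, in the column whose heading matches the original element.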

Illustration

The illustration below gives an example of how the get_dummies function works:

How the get_dummies function works

Syntax

The syntax of the get_dummies function is as follows:

pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)

Only the first parameter is compulsory. The rest are optional.

Parameters

The table below describes the parameters:

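Parameter     Description
data          The data to encode: a data frame, series, or list.
prefix        String (or a list or dictionary of strings) prepended to the names of the new columns. It is None by default.
prefix_sep    The separator or delimiter placed between the prefix and the category value. It is _ by default.
dummy_na      Whether to add a column that indicates NaN values. It is False by default, in which case NaN values are ignored.
columns       Column names in the data frame to be encoded. It is None by default, in which case all columns with an object, string, or category dtype are encoded.
sparse        Whether a SparseArray should back the dummy-encoded columns. It is False by default.
drop_first    Whether to drop the first category, which yields k-1 dummy columns for k categories. It is False by default.
dtype         Data type for the new columns. It is None by default, which resolves to bool in pandas 2.0 and later (uint8 in earlier versions).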
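As a small sketch of how several of these parameters combine, the snippet below encodes a single column of a data frame. The data frame and its column names (color, size, price) are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({'color': ['red', 'blue', 'red'],
                   'size': ['S', 'M', 'S'],
                   'price': [10, 20, 30]})

# Encode only the 'color' column, prepend the prefix 'c',
# join prefix and category with '-', and store the result as integers
encoded = pd.get_dummies(df, columns=['color'], prefix='c',
                         prefix_sep='-', dtype=int)
print(encoded)
print(list(encoded.columns))
```

Because columns is given, the size and price columns are left untouched and the new dummy columns are appended after them.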

Return value

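The get_dummies function returns a data frame of dummy-encoded columns, even when a series or list is passed. The values are 0s and 1s, or False and True when the dtype is boolean (the default in pandas 2.0 and later).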
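A quick way to check the shape of the return value is sketched below; the series contents are arbitrary sample values.

```python
import pandas as pd

s = pd.Series(['x', 'y', 'x'])
result = pd.get_dummies(s)

# The result is a data frame with one row per element
# and one column per unique value
print(type(result).__name__)
print(result.shape)

# Every row has exactly one positive indicator
print(result.sum(axis=1).tolist())
```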

Example

The code snippet below shows how the get_dummies function is used in Pandas:

import pandas as pd
import numpy as np
# Creating a series from a list
s = pd.Series(list('abcb'))
print(s)
print('\n')
# Encoding using the function
print(pd.get_dummies(s))
print('\n')
# With dummy_na = True
s1 = ['a', 'b', np.nan]
print("With NA column as well")
print(pd.get_dummies(s1, dummy_na=True))
print('\n')
# On a dataframe with column prefixes
df = pd.DataFrame({'A': ['a', 'b', 'a'],
                   'B': ['b', 'a', 'c'],
                   'C': [1, 2, 3]})
print(pd.get_dummies(df, prefix=['col1', 'col2']))
print('\n')
# With drop_first = True
print(pd.get_dummies(pd.Series(list('abcaa')), drop_first=True))
print('\n')