What is the get_dummies function in Pandas?
The get_dummies function is used to convert categorical variables into dummy or indicator variables.
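A dummy or indicator variable takes only two values: 1 if the observation belongs to a given category and 0 otherwise. In pandas 2.0 and later, these values are returned as True and False by default unless a dtype is specified.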
How get_dummies works
The get_dummies function works as follows:
- It takes a data frame, series, or list.
- Then, it converts each unique element present in the object to a column heading.
- The function iterates over the object that is passed and checks if the element at the particular index matches the column heading.
- If it does, it encodes it as a 1.
- Otherwise, it assigns it a 0.
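The steps above can be sketched by hand and compared with the built-in function. This is only an illustration of the idea, not how pandas implements it internally. The names values, headings, manual, and builtin are chosen for this example, and dtype=int is passed so that both results hold integers.

```python
import pandas as pd

values = ['a', 'b', 'c', 'b']

# Step 1: every unique element becomes a column heading
# (sorted here so the order matches what pandas produces).
headings = sorted(set(values))

# Steps 2-3: compare each element with each heading;
# a match is encoded as 1, anything else as 0.
manual = pd.DataFrame(
    [[int(v == h) for h in headings] for v in values],
    columns=headings,
)

# The built-in function, forced to integers for a like-for-like comparison.
builtin = pd.get_dummies(pd.Series(values), dtype=int)

print(manual)
print((manual.to_numpy() == builtin.to_numpy()).all())
```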
Illustration
The illustration below gives an example of how the get_dummies function works:
Syntax
The syntax of the get_dummies function is as follows:
pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)
Only the first parameter is compulsory. The rest are optional.
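As a quick check of the defaults, the sketch below calls the function once with only data and once with every optional parameter spelled out at its default value. The series colors is made up for this example.

```python
import pandas as pd

colors = pd.Series(['red', 'green', 'red'])

# Only the compulsory argument.
minimal = pd.get_dummies(colors)

# The same call with every optional parameter set to its documented default.
explicit = pd.get_dummies(
    colors,
    prefix=None,
    prefix_sep='_',
    dummy_na=False,
    columns=None,
    sparse=False,
    drop_first=False,
    dtype=None,
)

# Both calls should produce the same data frame.
print(minimal.equals(explicit))
```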
Parameters
The table below describes the parameters:
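| Parameter | Description |
|---|---|
| data | Refers to a data frame, series, or list. |
| prefix | String (or list or dictionary of strings) to prepend to the column names of the data frame that is returned. It is None by default. |
| prefix_sep | The separator or delimiter placed between the prefix and the category name. It is _ by default. |
| dummy_na | Whether to add a column that represents NaN values. It is False by default, in which case NaN values are ignored. |
| columns | Column names in the data frame to be encoded. It is None by default, in which case all columns with object, string, or category dtype are encoded. |
| sparse | Whether a SparseArray should back the dummy-encoded columns. It is False by default. |
| drop_first | Whether to drop the first level of each encoded variable, so that k categories produce k-1 columns. It is False by default. |
| dtype | Data type for the new columns. It is None by default, which yields bool in pandas 2.0 and later (uint8 in earlier versions). |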
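A short sketch of how some of these parameters interact is given below. The data frame and its column names (size, color, price) are made up for this example, and dtype=int is passed only to keep the output as 0s and 1s.

```python
import pandas as pd

df = pd.DataFrame({
    'size': ['S', 'M', 'L', 'M'],
    'color': ['red', 'blue', 'red', 'red'],
    'price': [10, 12, 15, 11],
})

# columns: encode only 'color'; 'size' and 'price' are carried over unchanged.
# prefix and prefix_sep: control how the new column names are built.
only_color = pd.get_dummies(
    df, columns=['color'], prefix='c', prefix_sep='-', dtype=int
)
print(only_color.columns.tolist())  # ['size', 'price', 'c-blue', 'c-red']

# drop_first: the first level ('blue') is dropped, leaving k-1 columns.
dropped = pd.get_dummies(df, columns=['color'], drop_first=True, dtype=int)
print(dropped.columns.tolist())  # ['size', 'price', 'color_red']
```

Dropping the first level is commonly used to avoid perfectly collinear columns when the encoded data feeds a linear model.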
Return value
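The get_dummies function returns a data frame with one indicator column (0s and 1s) per category. When the input is a data frame, columns that are not encoded are carried over unchanged.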
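The shape of the returned object can be inspected with a minimal sketch like the one below. The variable names are chosen for this example, and the result is cast to int before summing because the default dtype differs between pandas versions.

```python
import pandas as pd

s = pd.Series(list('abcb'))
encoded = pd.get_dummies(s)

# The result is a data frame with one row per input element
# and one column per unique category.
print(type(encoded).__name__)  # DataFrame
print(encoded.shape)           # (4, 3)

# Each row contains exactly one 1, marking the category of that element.
as_int = encoded.astype(int)
print(as_int.sum(axis=1).tolist())  # [1, 1, 1, 1]
```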
Example
The code snippet below shows how the get_dummies function is used in Pandas:
```python
import pandas as pd
import numpy as np

# Creating a series from a list
s = pd.Series(list('abcb'))
print(s)
print('\n')

# Encoding using the function
print(pd.get_dummies(s))
print('\n')

# With dummy_na = True
s1 = ['a', 'b', np.nan]
print("With NA column as well")
print(pd.get_dummies(s1, dummy_na=True))
print('\n')

# On a dataframe with column prefixes
df = pd.DataFrame({'A': ['a', 'b', 'a'],
                   'B': ['b', 'a', 'c'],
                   'C': [1, 2, 3]})
print(pd.get_dummies(df, prefix=['col1', 'col2']))
print('\n')

# With drop_first = True
print(pd.get_dummies(pd.Series(list('abcaa')), drop_first=True))
print('\n')
```