How to convert a pandas object's variables to dummy variables

Overview

A dummy variable is an indicator variable that takes only the value 0 or 1 to show whether a categorical variable takes a specific value.

To create a dummy variable in a given DataFrame in pandas, we make use of the get_dummies() function.
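As a minimal sketch of the idea, the snippet below encodes a single categorical variable. The `colors` data is hypothetical sample input, and `dtype=int` is passed only so the result prints as 0/1 regardless of the installed pandas version.

```python
import pandas as pd

# Hypothetical sample data: one categorical variable with three observations
colors = pd.Series(['red', 'green', 'red'], name='color')

# One indicator column is created per distinct category.
# dtype=int forces 0/1 values; newer pandas versions default to True/False.
indicators = pd.get_dummies(colors, dtype=int)
print(indicators)
```

Each row has a 1 in the column that matches its category and a 0 elsewhere.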

Syntax

pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)
Syntax for the get_dummies() function in Pandas

Parameter values

The get_dummies() function takes the following parameter values:

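  • data (required): This is the input data (an array-like object, a Series, or a DataFrame) from which the dummy indicators are created.
  • prefix (optional): This is a string, a list of strings, or a dictionary mapping column names to prefixes. It is prepended to the names of the new dummy columns. A list must contain one entry for each column being encoded.
  • prefix_sep (optional): This is the separator placed between the prefix and the category value. The default is '_'.
  • dummy_na (optional): This takes a Boolean value indicating whether an extra column is added to mark NaN values. By default, NaN values are ignored.
  • columns (optional): This represents the names of the DataFrame columns to encode. If it is None, all columns with an object, string, or category dtype are encoded.
  • sparse (optional): This takes a Boolean value indicating whether the dummy-encoded columns should be backed by a SparseArray or a regular NumPy array.
  • drop_first (optional): This takes a Boolean value indicating whether to drop the first level of each variable, so that k categories produce k-1 dummy columns.
  • dtype (optional): This is the data type of the new columns. The default depends on the pandas version: bool in pandas 2.0 and later, and uint8 in earlier versions.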
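The following sketch shows how several of these parameters combine in one call. The DataFrame, its column names, and the chosen prefix and separator are assumptions made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a missing value in the categorical column
df = pd.DataFrame({'city': ['Oslo', 'Lima', np.nan], 'sales': [10, 20, 30]})

encoded = pd.get_dummies(
    df,
    columns=['city'],   # encode only this column
    prefix='in',        # prepended to each new column name
    prefix_sep='-',     # separator between the prefix and the category
    dummy_na=True,      # add an indicator column for missing values
    dtype=int,          # emit 0/1 instead of the version-dependent default
)
print(encoded)
```

The `sales` column is passed through unchanged, and the row with the missing city is flagged in the extra NaN indicator column instead of being left as all zeros.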

Return value

The get_dummies() function returns a DataFrame containing the dummy-encoded data.
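A quick way to confirm the return type is to inspect the result directly. This sketch assumes a plain Python list of made-up values as input.

```python
import pandas as pd

# A plain Python list is accepted as input (hypothetical sample values)
result = pd.get_dummies(['a', 'b', 'a'])

print(type(result).__name__)  # DataFrame
print(result.shape)           # (3, 2): three rows, one column per category
```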

Example

# A program to illustrate the get_dummies() function in pandas
# Import the pandas module
import pandas as pd
# Create a DataFrame; NAME and AGE hold strings, HEIGHT holds integers
df = pd.DataFrame({'NAME': ['Alek', 'Akim', 'Cynthia'], 'AGE': ['19', '29', '23'],
                   'HEIGHT': [189, 178, 168]})
# Dummy-encode the string columns; one prefix per encoded column (NAME, AGE)
dummy_table = pd.get_dummies(df, prefix=['col1', 'col2'])
print(dummy_table)

Explanation

  • Line 3: We import the pandas module.
  • Lines 5–6: We create the DataFrame df.
  • Line 8: We create a dummy-encoded table, dummy_table, using the get_dummies() function. Only the string columns NAME and AGE are encoded, with the prefixes col1 and col2 respectively. The numeric HEIGHT column is left unchanged.
  • Line 9: We print the new table of dummies.
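To see which columns the example produces, the sketch below rebuilds the same DataFrame and lists the resulting column names. It also shows a dictionary prefix as an alternative to the positional list. The prefix names `name` and `age` are illustrative choices, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'NAME': ['Alek', 'Akim', 'Cynthia'], 'AGE': ['19', '29', '23'],
                   'HEIGHT': [189, 178, 168]})

# List prefix: entries are matched in order to the encoded columns (NAME, AGE)
dummy_table = pd.get_dummies(df, prefix=['col1', 'col2'])
print(dummy_table.columns.tolist())

# Dictionary prefix: each encoded column is mapped to its prefix by name
named_table = pd.get_dummies(df, prefix={'NAME': 'name', 'AGE': 'age'})
print(named_table.columns.tolist())
```

The dictionary form avoids relying on column order, which can make the intent clearer when a DataFrame has many categorical columns.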
