How to convert a pandas object's variables to dummy variables
Overview
In pandas, a dummy variable is an indicator variable that takes only the value 0 or 1, showing whether a categorical variable takes a specific value (1) or not (0).
To create dummy variables from a Series or DataFrame in pandas, we use the get_dummies() function.
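As a minimal sketch of the idea, the snippet below encodes a small, made-up Series of colors. The sample data is hypothetical, and dtype=int is passed explicitly because the default output type depends on the pandas version (recent versions return True/False instead of 0/1).

```python
import pandas as pd

# Hypothetical sample data: one categorical variable with three distinct values
colors = pd.Series(["red", "green", "red", "blue"])

# One indicator column is created per distinct value.
# dtype=int is an explicit choice here so that the output is 0/1.
dummies = pd.get_dummies(colors, dtype=int)

print(dummies)
```

Each row of the result contains exactly one 1, located in the column that matches the original value.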
Syntax
pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)
Syntax for the get_dummies() function in Pandas
Parameter values
The get_dummies() function takes the following parameter values:
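- data (required): This is the input data that is used to get the dummy indicators.
- prefix (optional): This is a string, list of strings, or dictionary that is prepended to the names of the new dummy columns.
- prefix_sep (optional): This is the separator or delimiter placed between the prefix and the category value.
- dummy_na (optional): This takes a Boolean value indicating whether a column that marks NaN values is added or not.
- columns (optional): This represents the names of the DataFrame columns to encode. By default, all columns with an object, string, or category data type are encoded.
- sparse (optional): This takes a Boolean value indicating whether the dummy-encoded columns should be backed by a SparseArray or a regular NumPy array.
- drop_first (optional): This takes a Boolean value indicating whether the first category level is dropped, so that k categories produce k-1 dummy columns.
- dtype (optional): This is the data type of the new columns. The default is bool in pandas 2.0 and later, and uint8 in earlier versions.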
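The sketch below shows how several of these parameters interact. The DataFrame and its column names (city, plan, visits) are made up for illustration, and dtype=int is an explicit choice rather than the default.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two categorical columns (one with a missing value)
# and one numeric column
df = pd.DataFrame({
    "city": ["Lagos", "Accra", np.nan],
    "plan": ["basic", "pro", "basic"],
    "visits": [3, 5, 7],
})

encoded = pd.get_dummies(
    df,
    columns=["city"],   # encode only this column; "plan" is left as-is
    prefix="loc",       # new columns start with "loc"
    prefix_sep="-",     # separator between the prefix and the category value
    dummy_na=True,      # add an extra indicator column for NaN
    dtype=int,          # 0/1 integers instead of the version-dependent default
)

print(encoded.columns.tolist())
print(encoded)
```

The columns that are not encoded (plan and visits) come first in the result, followed by one indicator column per city and a final indicator column for the missing value.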
Return value
The get_dummies() function returns a DataFrame containing the dummy-encoded data.
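A quick way to check the return type is to call the function on both a Series and a DataFrame. The inputs below are hypothetical and serve only as a sketch.

```python
import pandas as pd

# Hypothetical inputs: a Series and a DataFrame with one text column
s = pd.Series(["a", "b", "a"])
df = pd.DataFrame({"k": ["x", "y", "x"], "n": [1, 2, 3]})

from_series = pd.get_dummies(s)
from_frame = pd.get_dummies(df)

# A DataFrame comes back in both cases, even when the input is a Series
print(type(from_series).__name__)
print(type(from_frame).__name__)

# Numeric columns pass through unchanged; text columns are replaced by dummies
print(from_frame.columns.tolist())
```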
Example
# A program to illustrate the get_dummies() function in Pandas

# importing the pandas module
import pandas as pd

# creating a dataframe
df = pd.DataFrame({'NAME': ['Alek', 'Akim', 'Cynthia'], 'AGE': ['19', '29', '23'],
                   'HEIGHT': [189, 178, 168]})

# creating a dummy-encoded data
dummy_table = pd.get_dummies(df, prefix=['col1', 'col2'])

print(dummy_table)
Explanation
- Line 4: We import the pandas module.
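- Lines 7–8: We create a DataFrame, df.
- Line 11: We create a dummy-encoded table, dummy_table, containing dummy variables, using the get_dummies() function. The prefix list is applied in order to the two encoded columns, NAME and AGE (both hold strings); the numeric HEIGHT column is left unchanged.
- Line 13: We print the new table of dummies.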
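A list passed to prefix is matched to the encoded columns by position, which is easy to get wrong. As a hedged variant of the example, the sketch below passes a dictionary instead so that each prefix is tied to a column name. The dtype=int argument is an addition made here to get 0/1 values regardless of the pandas version.

```python
import pandas as pd

# Same data as in the example above
df = pd.DataFrame({'NAME': ['Alek', 'Akim', 'Cynthia'],
                   'AGE': ['19', '29', '23'],
                   'HEIGHT': [189, 178, 168]})

# Map each encoded column to its prefix explicitly
dummy_table = pd.get_dummies(df, prefix={'NAME': 'col1', 'AGE': 'col2'}, dtype=int)

print(dummy_table.columns.tolist())
# ['HEIGHT', 'col1_Akim', 'col1_Alek', 'col1_Cynthia', 'col2_19', 'col2_23', 'col2_29']
```

The numeric HEIGHT column is kept first, and the dummy columns for NAME and AGE follow in sorted order of their values.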