A dummy variable in pandas is an indicator variable that takes only the value 0 or 1 to indicate whether a separate categorical variable takes a specific value or not.
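To make the definition concrete, here is a minimal sketch that builds a single indicator variable by hand, without any helper function. The Series name `color` and its values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical categorical column, for illustration only
color = pd.Series(['red', 'blue', 'red', 'green'], name='color')

# Indicator for the value 'red': 1 where the value matches, 0 elsewhere
is_red = (color == 'red').astype(int)

print(is_red.tolist())
```

Each category of the original variable would get one such 0/1 column.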
To create dummy variables from a given DataFrame in pandas, we use the get_dummies() function.
pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, dtype=None)
The get_dummies() function takes the following parameter values:
data (required): This is the input data from which the dummy indicators are created.
prefix (optional): This is a string, or a list or dictionary of strings, that is prepended to the names of the dummy columns.
prefix_sep (optional): This is the separator or delimiter placed between the prefix and the category value.
dummy_na (optional): This takes a Boolean value indicating whether a column that flags NaN values is added.
columns (optional): This represents the names of the columns in the DataFrame to be encoded. If it is not given, the columns holding strings or categories are encoded.
sparse (optional): This takes a Boolean value indicating whether the dummy-encoded columns should be backed by a SparseArray or a regular NumPy array.
dtype (optional): This is the data type of the resulting columns.
The get_dummies() function returns the dummy-encoded data as a DataFrame.
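The following sketch shows how several of these parameters interact. The DataFrame, its column names (`city`, `tier`, `score`), and the chosen prefix and separator are assumptions made up for this example; only the get_dummies() parameters themselves come from the description above.

```python
import numpy as np
import pandas as pd

# Hypothetical data, for illustration only; 'city' has one missing value
df = pd.DataFrame({
    'city': ['Lagos', 'Accra', np.nan],
    'tier': ['S', 'M', 'S'],
    'score': [1.5, 2.0, 3.5],
})

encoded = pd.get_dummies(
    df,
    columns=['city'],  # encode only this column; 'tier' is left as-is
    prefix='c',        # prepended to each dummy column name
    prefix_sep='-',    # placed between the prefix and the category value
    dummy_na=True,     # add one extra column that flags missing values
    dtype=int,         # store the indicators as integers
)

print(encoded)
```

Because columns names only `city`, the `tier` and `score` columns pass through unchanged, and the extra column produced by dummy_na=True is 1 only in the row where `city` is missing.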
# A program to illustrate the get_dummies() function in pandas

# importing the pandas module
import pandas as pd

# creating a DataFrame
df = pd.DataFrame({'NAME': ['Alek', 'Akim', 'Cynthia'],
                   'AGE': ['19', '29', '23'],
                   'HEIGHT': [189, 178, 168]})

# creating the dummy-encoded data
dummy_table = pd.get_dummies(df, prefix=['col1', 'col2'])

print(dummy_table)
We create a DataFrame, df. We then create a table, dummy_table, containing dummy variables, using the get_dummies() function.
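As a quick check of what the program produces, the sketch below rebuilds the same DataFrame and lists the resulting column names. It assumes the default behavior in which only the string-valued columns are encoded; note that AGE holds strings here, so it is encoded along with NAME, while the numeric HEIGHT column is carried over unchanged.

```python
import pandas as pd

# Same data as in the program above
df = pd.DataFrame({'NAME': ['Alek', 'Akim', 'Cynthia'],
                   'AGE': ['19', '29', '23'],
                   'HEIGHT': [189, 178, 168]})

# One prefix per encoded column, in order: 'col1' for NAME, 'col2' for AGE
dummy_table = pd.get_dummies(df, prefix=['col1', 'col2'])

# List the column names of the dummy-encoded table
print(dummy_table.columns.tolist())
```

The list should start with HEIGHT, followed by one col1_ column per name and one col2_ column per age. The prefix list needs exactly one entry per encoded column, which is why two prefixes are passed here.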