The `get_dummies` function in pandas converts categorical variables into dummy (indicator) variables.
A dummy or indicator variable takes the value 0 or 1: 1 if an observation belongs to a given category, and 0 otherwise.
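To make the 0/1 idea concrete, here is a minimal hand-written sketch of dummy coding in plain Python. It only illustrates the concept and is not how pandas implements `get_dummies`. The helper name `dummy_encode` is hypothetical, and sorting the categories is a choice made for this sketch.

```python
def dummy_encode(values):
    """Hypothetical helper: return (categories, rows) with one 0/1 flag per category."""
    categories = sorted(set(values))  # one indicator column per distinct value
    # For each value, put 1 under its own category and 0 under every other category
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return categories, rows


categories, rows = dummy_encode(["a", "b", "c", "b"])
print(categories)  # ['a', 'b', 'c']
for row in rows:
    print(row)
# [1, 0, 0]
# [0, 1, 0]
# [0, 0, 1]
# [0, 1, 0]
```

Each row contains exactly one 1, because every observation belongs to exactly one category.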
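How `get_dummies` works

The `get_dummies` function creates one new column for each distinct category in the input. In every row, the column that matches that row's category is marked 1 (`True`) and the other new columns are marked 0 (`False`).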
The illustration below gives an example of how the `get_dummies` function works:
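The same behavior can be reproduced in code. The following sketch uses made-up sample data (a Series of colors). It passes `dtype=int` so the result shows 0s and 1s regardless of the pandas version.

```python
import pandas as pd

# Sample data invented for this example: three distinct categories, one repeated
colors = pd.Series(["red", "green", "blue", "green"])

# One column per distinct color; dtype=int gives 0/1 instead of True/False
dummies = pd.get_dummies(colors, dtype=int)
print(dummies)
```

The result has the columns `blue`, `green`, and `red`, sorted alphabetically. Each row has a single 1 in the column for its own color.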
The syntax of the `get_dummies` function is as follows:

```python
pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)
```
Only the first parameter, `data`, is required. All other parameters are optional.
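Because every other parameter has a default value, passing only `data` should give the same result as spelling out the defaults from the signature. The following sketch checks this on a small made-up Series.

```python
import pandas as pd

s = pd.Series(["x", "y", "x"])

# Only the required `data` argument is passed
minimal = pd.get_dummies(s)

# The same call with every default from the signature written out explicitly
explicit = pd.get_dummies(
    s,
    prefix=None,
    prefix_sep="_",
    dummy_na=False,
    columns=None,
    sparse=False,
    drop_first=False,
    dtype=None,
)
print(minimal.equals(explicit))  # True
```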
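The table below describes the parameters:

| Parameter | Description |
|---|---|
| `data` | The data to encode: a DataFrame, Series, or array-like such as a list. |
| `prefix` | String to prepend to the new column names. For a DataFrame, it can also be a list of strings (one per encoded column) or a dictionary mapping column names to prefixes. It is `None` by default, in which case the original column names are used as prefixes when a DataFrame is encoded. |
| `prefix_sep` | Separator placed between the prefix and the category value when a prefix is added. It is `'_'` by default. |
| `dummy_na` | If `True`, adds a column that flags `NaN` values. If `False`, `NaN` values are ignored. It is `False` by default. |
| `columns` | Names of the DataFrame columns to encode. It is `None` by default, in which case all columns with `object`, `string`, or `category` dtype are encoded. |
| `sparse` | Whether the dummy-encoded columns are backed by a `SparseArray` (`True`) or a regular NumPy array (`False`). It is `False` by default. |
| `drop_first` | If `True`, removes the first category level, producing k-1 dummy columns for k categories. It is `False` by default. |
| `dtype` | Data type of the new columns (only a single dtype is allowed). It is `None` by default, which produces `bool` columns in pandas 2.0 and later (`uint8` in earlier versions). |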
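Several of these parameters can be combined in a single call. In the sketch below, the DataFrame and its column names (`city`, `size`, `price`) are made up for illustration. Only `city` is encoded, because `columns` lists it alone. A string `prefix` replaces the default column-name prefix, and `prefix_sep` changes the separator.

```python
import pandas as pd

# Illustrative data: two string columns and one numeric column
df = pd.DataFrame({
    "city": ["Paris", "Rome", "Paris"],
    "size": ["S", "L", "M"],
    "price": [10, 20, 15],
})

# Encode only "city"; "size" is kept as-is even though it holds strings
out = pd.get_dummies(df, columns=["city"], prefix="town", prefix_sep="=", dtype=int)
print(out.columns.tolist())  # ['size', 'price', 'town=Paris', 'town=Rome']
```

The columns that are not encoded come first in the result, followed by the new dummy columns.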
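The `get_dummies` function returns a DataFrame containing the dummy-coded columns. When a DataFrame is passed, columns that are not encoded are kept alongside the new ones. In pandas 2.0 and later, the dummy columns hold `True`/`False` by default, while earlier versions use 0s and 1s (`uint8`). Pass `dtype=int` to get 0s and 1s in any version.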
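The following sketch compares the default output with `dtype=int` on a small made-up Series. The printed dtypes depend on the installed pandas version, so no expected output is shown for them.

```python
import pandas as pd

s = pd.Series(list("abca"))

default = pd.get_dummies(s)             # bool in pandas >= 2.0, uint8 before
as_ints = pd.get_dummies(s, dtype=int)  # 0/1 integers in any version

print(type(default).__name__)  # DataFrame
print(default.dtypes)
print(as_ints.dtypes)
print(as_ints)
```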
The code snippet below shows how the `get_dummies` function is used in pandas:
```python
import pandas as pd
import numpy as np

# Creating a Series from a list
s = pd.Series(list('abcb'))
print(s)
print('\n')

# Encoding the Series with get_dummies
print(pd.get_dummies(s))
print('\n')

# With dummy_na=True, an extra column flags missing values
s1 = ['a', 'b', np.nan]
print("With NA column as well")
print(pd.get_dummies(s1, dummy_na=True))
print('\n')

# On a DataFrame with column prefixes: only the string columns 'A' and 'B'
# are encoded (one prefix each); numeric column 'C' is passed through unchanged
df = pd.DataFrame({'A': ['a', 'b', 'a'],
                   'B': ['b', 'a', 'c'],
                   'C': [1, 2, 3]})
print(pd.get_dummies(df, prefix=['col1', 'col2']))
print('\n')

# With drop_first=True, the first category ('a') gets no column
print(pd.get_dummies(pd.Series(list('abcaa')), drop_first=True))
print('\n')
```
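The `drop_first=True` call in the last example loses no information. Because every row belongs to exactly one category, the dropped level can be inferred from the remaining columns. The following sketch, on the same `'abcaa'` data, shows that a row is `'a'` exactly when all of its remaining dummy columns are 0.

```python
import pandas as pd

s = pd.Series(list("abcaa"))

full = pd.get_dummies(s, dtype=int)                      # columns a, b, c
reduced = pd.get_dummies(s, drop_first=True, dtype=int)  # columns b, c

# Rebuild the dropped "a" column: a row is "a" when no remaining flag is set
recovered_a = (reduced.sum(axis=1) == 0).astype(int)
print((recovered_a == full["a"]).all())  # True
```

Dropping one column per variable in this way is commonly used to avoid perfectly collinear dummy columns in models such as linear regression.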