How do you create a category column while reading a file in pandas?
What are categorical values in pandas?
A Categorical is a pandas data type that corresponds to categorical variables in statistics. A categorical variable takes a limited, and usually fixed, number of possible values. Examples of categorical variables include gender, social class, blood type, and country.
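To make this concrete, here is a minimal sketch using a small hand-written series of blood types. The values are illustrative only and do not come from any dataset. It shows that a categorical column keeps a fixed set of categories and stores an integer code for each row.

```python
import pandas as pd

# Illustrative values only, not taken from a real dataset
blood_types = pd.Series(["A", "B", "O", "A", "AB", "O"], dtype="category")

print(blood_types.dtype)                 # the dtype of the series
print(list(blood_types.cat.categories))  # the distinct values the column can take
print(list(blood_types.cat.codes))       # the integer code stored for each row
```

Each row holds a small integer that points into the list of categories, rather than a separate copy of the string.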
Old way to create a categorical column in pandas
Earlier in pandas, you could create a category column after reading the file. The code snippet below shows how this works: it uses the astype() function to convert an existing column to a category column.
Let’s take a look at the code:
    import pandas as pd

    drinks = pd.read_csv('http://bit.ly/drinksbycountry')

    print("Datatype of each column:")
    print(drinks.dtypes)

    drinks['continent'] = drinks.continent.astype('category')

    print("\nDatatype after creating category column:")
    print(drinks.dtypes)
Explanation:
- In line 1, we import the required package.
- In line 3, we read a CSV file from the URL.
- In line 6, we print the data types of all the columns. You can see that continent is of type object and not a categorical column.
- In line 8, we convert the continent column to a categorical column using the astype() function.
- In line 11, we print the data types of all the columns again, and can see that the continent column is now a categorical column.
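If you want to try the same conversion without downloading anything, the following sketch reads a tiny in-memory CSV instead of the URL. The rows and the variable names (csv_text, sample) are hypothetical stand-ins for the drinks data, chosen only for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for the drinks dataset, so no download is needed
csv_text = """country,beer_servings,continent
Albania,89,Europe
Algeria,25,Africa
Andorra,245,Europe
Angola,217,Africa
"""

sample = pd.read_csv(io.StringIO(csv_text))
dtype_before = sample["continent"].dtype  # a plain text dtype, not category

# Convert the column after the file has been read
sample["continent"] = sample["continent"].astype("category")
dtype_after = sample["continent"].dtype

print(dtype_before, "->", dtype_after)
print(list(sample["continent"].cat.categories))
```

The values in the column are unchanged by the conversion. Only the way pandas stores them is different.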
New way to create a categorical column in pandas
The above approach works fine, but what if we could do this conversion while reading the file itself?
Take a look at the code to see how this works.
    import pandas as pd

    drinks = pd.read_csv('http://bit.ly/drinksbycountry',
                         dtype={'continent':'category'})

    print("Datatype of each column:")
    print(drinks.dtypes)
Explanation:
- In line 1, we import the required package.
- In line 3, we read the CSV file and, while reading it, pass the dtype parameter, where we set the data type of the continent column to category. Similarly, you can set the data type of multiple columns using key-value pairs.
- In line 7, we print the data types of all the columns and can see that the continent column is now a categorical column.
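As a sketch of the multiple-column case mentioned above, the dtype dictionary below sets two columns at once. The in-memory CSV and its rows are hypothetical and only mimic the shape of the drinks data. Choosing float64 for beer_servings is an arbitrary choice for illustration.

```python
import io
import pandas as pd

# Hypothetical sample rows that mimic the shape of the drinks data
csv_text = """country,beer_servings,continent
Albania,89,Europe
Algeria,25,Africa
Andorra,245,Europe
Angola,217,Africa
"""

# One key-value pair per column: column name -> desired data type
sample = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"continent": "category", "beer_servings": "float64"},
)

print(sample.dtypes)
```

Columns that are not listed in the dictionary, such as country here, keep the type that pandas infers for them.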