How to perform cross-tabulation on a column in pandas
In pandas, cross-tabulation builds a frequency table of two or more factors: it counts how often each combination of their values occurs. This lets us see the frequency distribution of one variable across the categories of another.
Syntax
pandas.crosstab(index, columns, values = None, rownames = None, colnames = None, aggfunc = None, margins = False, margins_name = 'All', dropna = True, normalize = False)
To use cross-tabulation, we call the built-in function pandas.crosstab. In the index parameter, we pass the values to group by in the rows, and in the columns parameter, we pass the values to group by in the columns. The remaining parameters, such as values, rownames, and colnames, are optional. If we omit them, they keep the default values shown in the syntax above.
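As a minimal sketch of how the optional parameters change the result, the snippet below builds the same table three ways. The department and shift data are hypothetical and invented purely for illustration; only the crosstab parameters themselves come from the syntax above.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT", "IT"],
    "Shift": ["Day", "Night", "Day", "Day", "Night"],
})

# Required arguments only: rows come from index, columns from columns.
counts = pd.crosstab(index=df["Department"], columns=df["Shift"])

# margins=True appends row and column totals, labeled by margins_name.
with_totals = pd.crosstab(index=df["Department"], columns=df["Shift"],
                          margins=True, margins_name="Total")

# normalize="index" converts each row into proportions that sum to 1.
row_share = pd.crosstab(index=df["Department"], columns=df["Shift"],
                        normalize="index")

print(counts)
print(with_totals)
print(row_share)
```

Passing Series from a DataFrame means the row and column labels are taken from the Series names, so rownames and colnames are not needed here.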
Example
For instance, consider a table with the following data:
Employee Name | Nationality | Gender |
Jerry | Germany | Male |
Harry | USA | Male |
Emma | USA | Female |
Amalia | China | Female |
Suppose we want to find out how many of the employees are males and females from each country. Pandas helps us do this via the crosstab() function:
import numpy as np
import pandas as pd

employee_name = np.array(["Jerry", "Harry", "Emma", "Amalia"], dtype = object)
nationality = np.array(["Germany", "USA", "USA", "China"], dtype = object)
gender = np.array(["Male", "Male", "Female", "Female"], dtype = object)
print(pd.crosstab(nationality, gender, rownames = ['Nationality'], colnames = ['Gender']))
We see the results below:
Nationality | Female | Male |
China | 1 | 0 |
Germany | 0 | 1 |
USA | 1 | 1 |
Explanation
Lines 4–6: Store the data in the employee_name, nationality, and gender variables, respectively.
Line 7: Call the crosstab() function to cross-tabulate the data, passing nationality as the rows and gender as the columns.
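The same idea also works when the data lives in a DataFrame, and the values and aggfunc options can then aggregate a third column instead of counting rows. The following is a sketch under stated assumptions: the Salary column and its figures are hypothetical and are not part of the employee table above.

```python
import pandas as pd

df = pd.DataFrame({
    "Employee Name": ["Jerry", "Harry", "Emma", "Amalia"],
    "Nationality": ["Germany", "USA", "USA", "China"],
    "Gender": ["Male", "Male", "Female", "Female"],
    # Hypothetical column, added only to demonstrate values/aggfunc.
    "Salary": [50000, 60000, 65000, 55000],
})

# Plain frequency table: how many employees per nationality and gender.
counts = pd.crosstab(df["Nationality"], df["Gender"])

# With values and aggfunc, each cell holds an aggregate of Salary
# (here the mean) instead of a count.
mean_salary = pd.crosstab(df["Nationality"], df["Gender"],
                          values=df["Salary"], aggfunc="mean")

print(counts)
print(mean_salary)
```

Combinations that never occur appear as 0 in the frequency table, but as NaN in the aggregated table, because there is no salary to average for them.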