How to use the nunique() function on a DataFrame in pandas
DataFrame
A DataFrame is the primary 2-dimensional data structure in the pandas library. It is a table made up of rows and labeled columns.
How to import the pandas library
We use the following statement to import the pandas library.
import pandas as pd
Example
A DataFrame can be created as shown below. The one we use lists countries, each assigned to a group and given an a_score and a b_score.
Both scores are made-up values for this example.
import pandas as pd

# Made-up scores for ten countries
a_score = [4, 5, 7, 8, 2, 3, 1, 6, 9, 10]
b_score = [1, 2, 3, 4, 5, 6, 7, 10, 8, 9]
country = ['Pakistan', 'USA', 'Canada', 'Brazil', 'India', 'Belgium', 'Malaysia', 'Peru', 'England', 'Scotland']
groups = ['A', 'A', 'B', 'A', 'B', 'B', 'C', 'A', 'C', 'C']

df = pd.DataFrame({'group': groups, 'country': country, 'a_score': a_score, 'b_score': b_score})
print(df)
The nunique() function
The nunique() function counts the number of unique (distinct) values in a column. When called on a whole DataFrame, it returns one count for each column.
It is useful in situations where the number of categories is unknown beforehand.
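As a minimal sketch of that use case, the snippet below counts the distinct labels in a single column without listing them first. The variable names labels and n_categories are illustrative assumptions, not part of the examples in this article.

```python
import pandas as pd

# Hypothetical group labels; the number of distinct labels is not assumed in advance.
labels = pd.Series(['A', 'A', 'B', 'A', 'B', 'B', 'C', 'A', 'C', 'C'])

# nunique() on a single column returns one count.
n_categories = labels.nunique()
print(n_categories)  # 3 (A, B and C)
```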
Syntax
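The general form of the call is as follows.
df.nunique(axis=0, dropna=True)
Parameters
Both parameters are optional.
axis: 0 (the default) counts the unique values in each column; 1 counts them in each row.
dropna: True (the default) leaves NaN out of the count; False counts NaN as a value.
When nunique() is called on a single column, only dropna applies.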
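A plain df.nunique() call works because pandas supplies defaults for two optional keyword arguments, axis and dropna. The following is a small sketch of what changing them does; the DataFrame df_small and its missing values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
df_small = pd.DataFrame({
    'a_score': [4, 5, np.nan, 4],
    'b_score': [1, np.nan, 1, 1],
})

# Defaults (axis=0, dropna=True): one count per column, NaN ignored.
per_column = df_small.nunique()

# dropna=False: NaN is counted as one more distinct value.
per_column_with_nan = df_small.nunique(dropna=False)

# axis=1: one count per row instead of per column.
per_row = df_small.nunique(axis=1)

print(per_column)
print(per_column_with_nan)
print(per_row)
```

With this data, the default call gives 2 for a_score and 1 for b_score, and dropna=False raises those counts to 3 and 2.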
Return value
This method returns the number of unique entries: an integer when called on a single column, and a Series of counts indexed by column name when called on a DataFrame.
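The difference between the two return shapes can be seen in a short sketch. The DataFrame df_demo below is a made-up, smaller stand-in for the one used in this article.

```python
import pandas as pd

# Hypothetical miniature DataFrame.
df_demo = pd.DataFrame({
    'group': ['A', 'A', 'B', 'C'],
    'a_score': [4, 5, 4, 4],
})

# Called on one column: a single integer count.
single = df_demo['a_score'].nunique()

# Called on the whole DataFrame: a Series with one count per column.
overall = df_demo.nunique()

print(single)              # 2
print(overall['group'])    # 3
print(overall['a_score'])  # 2
```

Because the DataFrame result is indexed by column name, an individual count can be looked up with overall['group'] rather than by position.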
Example
The following example prints the number of unique entries in a_score, and then the number of unique entries in every column.
import pandas as pd

# Made-up scores that contain repeated values
a_score = [4, 5, 7, 4, 2, 4, 1, 1, 5, 10]
b_score = [1, 2, 3, 4, 3, 6, 4, 10, 1, 9]
country = ['Pakistan', 'USA', 'Canada', 'Brazil', 'India', 'Belgium', 'Malaysia', 'Peru', 'England', 'Scotland']
groups = ['A', 'A', 'B', 'A', 'B', 'B', 'C', 'A', 'C', 'C']

df = pd.DataFrame({'group': groups, 'country': country, 'a_score': a_score, 'b_score': b_score})

print("the main dataframe")
print(df)
print("")

# Unique count for a single column
print("unique entries in a_score = ")
print(df.a_score.nunique())
print("")

# Unique count for every column
print("unique entries in all columns = ")
print(df.nunique())