How to count unique items in pandas
pandas provides the handy value_counts() function for counting unique items: it returns a Series whose index holds the unique values and whose values hold their counts.
Categorical data value counts
First, let's look at a simple example without any parameters:
```python
import pandas as pd

# create a DataFrame with one column
df = pd.DataFrame({"col1": ["a", "b", "a", "c", "a", "a", "a", "c"]})

# print the DataFrame
print(df)

# separator line
print("=" * 30)

# count the unique items in col1
item_counts = df["col1"].value_counts()
print(item_counts)
```
- From the output of the final print(item_counts) call, you can see the result: the count of each unique value in column col1.
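Because the result is an ordinary Series indexed by the unique values, you can also look up individual counts directly. Below is a minimal sketch reusing the df from the example above; the lookup keys are just illustrations (for this data the expected counts are a: 5, c: 2, b: 1, though the exact printed formatting varies between pandas versions):

```python
import pandas as pd

df = pd.DataFrame({"col1": ["a", "b", "a", "c", "a", "a", "a", "c"]})
item_counts = df["col1"].value_counts()

# value_counts() sorts by count in descending order by default,
# so the first entry is the most frequent value ("a" with a count of 5 here)
print(item_counts.index[0], item_counts.iloc[0])

# the Series can be indexed like a dict to read a single value's count
print(item_counts["a"])         # 5
print(item_counts.get("z", 0))  # 0 for a value that never appears
```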
Categorical data value counts with normalize
Sometimes we don't care about the exact count of each item in a column, but about its relative frequency. Setting normalize=True returns proportions instead of counts.
```python
import pandas as pd

# create a DataFrame with one column
df = pd.DataFrame({"col1": ["a", "b", "a", "c", "a", "a", "a", "c"]})

# setting normalize=True returns proportions instead of counts
item_counts = df["col1"].value_counts(normalize=True)
print(item_counts)
```
- The value_counts() call sets normalize=True.
- From its output, you can see the difference from the previous demo: the result is the relative proportion of each value, not the raw count.
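If you want readable percentages rather than raw proportions, you can scale the normalized result yourself. A small sketch, assuming the same df as in the example above:

```python
import pandas as pd

df = pd.DataFrame({"col1": ["a", "b", "a", "c", "a", "a", "a", "c"]})

# normalize=True returns proportions that sum to 1.0
proportions = df["col1"].value_counts(normalize=True)

# multiply by 100 and round to turn the proportions into percentages
percentages = (proportions * 100).round(1)
print(percentages)  # a: 62.5, c: 25.0, b: 12.5 (sorted by frequency)
```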
Continuous data bucket intervals
value_counts() can handle continuous values as well as categorical values. By setting the bins=n parameter, you can group the continuous values into n equal-width intervals.
```python
import pandas as pd
import numpy as np

# create an array of 30 random values between 0 and 1
data = np.random.random((30,))

# create a DataFrame from the array
df = pd.DataFrame(data, columns=["col1"])

# show the first five rows of the DataFrame
print(df.head())

# separator line
print("=" * 30)

# set bins=8 to group the values into 8 intervals
value_bins = df["col1"].value_counts(bins=8)
print(value_bins)
```
- The value_counts() call sets bins=8.
- From its output, you can see that the original values are grouped into 8 bins.
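The bins produced this way are equal-width intervals computed from the observed data range, and they are listed by count rather than by position. If you want the intervals in order, or you need specific cut points, one option is sort_index() and pd.cut(); a sketch, assuming the same random df as above (the bin edges below are arbitrary examples):

```python
import pandas as pd
import numpy as np

data = np.random.random((30,))
df = pd.DataFrame(data, columns=["col1"])

# list the 8 equal-width bins in interval order instead of by count
print(df["col1"].value_counts(bins=8).sort_index())

# use explicit bin edges with pd.cut(), then count the values per interval
edges = [0.0, 0.25, 0.5, 0.75, 1.0]
print(pd.cut(df["col1"], bins=edges).value_counts().sort_index())
```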