pandas provides the useful function value_counts()
to count unique items – it returns a Series
with the counts of unique values.
First of all, let’s see a simple example without any parameters:
import pandas as pd
import numpy as np

# create a dataframe with one column
df = pd.DataFrame({"col1": ["a", "b", "a", "c", "a", "a", "a", "c"]})

# print the dataframe object
print(df)

# line break
print("=" * 30)

# counting unique items
item_counts = df["col1"].value_counts()
print(item_counts)
From the output of the last print() call, you can see the result, which is the count of each unique value in the column col1.
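Since value_counts() returns a regular Series, you can inspect its type and index directly. The following check is a small sketch built on the example above; the values in the comments are what this particular data should produce:

# continuing from the first example above
print(type(item_counts))        # <class 'pandas.core.series.Series'>
print(list(item_counts.index))  # unique values, most frequent first: ['a', 'c', 'b']
print(item_counts["a"])         # "a" appears 5 times in col1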
Sometimes, we don’t care about the exact count for each item of a column, but we do care about the relative percentage. Setting normalize=True returns the relative percentages instead of the counts.
import pandas as pd

# create a dataframe with one column
df = pd.DataFrame({"col1": ["a", "b", "a", "c", "a", "a", "a", "c"]})

# setting normalize=True
item_counts = df["col1"].value_counts(normalize=True)
print(item_counts)
Here, the value_counts() call sets normalize=True. From its output, you can see the difference from the last demo: the values are relative percentages, not counts.
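As a quick sanity check (a small sketch reusing the normalized item_counts from the example above), the relative frequencies should sum to 1, and multiplying by 100 turns them into percentages:

# the relative frequencies for this example should sum to 1.0
print(item_counts.sum())   # 1.0

# multiply by 100 to display them as percentages
print(item_counts * 100)   # a: 62.5, c: 25.0, b: 12.5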
value_counts() can handle continuous values as well as categorical values. By setting the bins=n parameter, you can group those continuous values into n bins.
import pandas as pd
import numpy as np

# create an array with random values between 0 and 1
data = np.random.random((30,))

# create a DataFrame object from the array
df = pd.DataFrame(data, columns=["col1"])

# show the first five rows of this dataframe object
print(df.head())

# line break
print("=" * 30)

# set bins=8
value_bins = df["col1"].value_counts(bins=8)
print(value_bins)
Here, the value_counts() call sets bins=8. From its output, you can see that the original values are grouped into 8 bins.
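The bins parameter can also be combined with normalize=True to get the share of values falling into each bin instead of raw counts. The snippet below is a sketch that reuses the random df from the previous example, so the exact numbers will differ from run to run:

# share of the 30 random values that falls into each of the 8 bins
value_shares = df["col1"].value_counts(bins=8, normalize=True)
print(value_shares)

# sort by bin interval instead of by count to read the bins in order
print(value_shares.sort_index())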