In this lesson, we will learn about percentiles.

Representing data through percentiles

Another useful way to describe a dataset is with percentiles.

For this we consider ordered data, meaning data that is sorted in ascending order. The 25th percentile marks a data point in the ordered data such that 25% of the data is below this data point and thus 75% is above this data point. If we say that the 25th percentile score on an exam was 85%, then 25% of the candidates scored less than 85% on the exam.

The percentiles of a dataset are commonly referred to as the ‘empirical percentiles’ as they are the percentiles of the dataset, not of the underlying distribution. The 50th empirical percentile is equivalent to the median of the data. A commonly examined interval is the 50% region around the median, also called the interquartile range or IQR.

The IQR runs from the 25th empirical percentile to the 75th empirical percentile. Another common interval is the 95% region, which runs from the 2.5th empirical percentile to the 97.5th empirical percentile. Percentiles of a dataset may be computed with the percentile() function in the numpy package. The first argument is the data, and the second argument is a list of percentiles, given as numbers between 0 and 100.
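
The following is a minimal sketch of how the percentile() call described above might be used. The dataset is made up for illustration (randomly generated scores, not from the lesson), and the variable names are assumptions. It computes the bounds of the 50% and 95% regions, checks that the 50th empirical percentile matches the median, and measures what fraction of the data actually falls below the 25th percentile.

```python
import numpy as np

# Hypothetical dataset: 1000 randomly generated "exam scores" (illustrative only)
rng = np.random.default_rng(seed=0)
scores = rng.normal(loc=70, scale=10, size=1000)

# First argument: the data. Second argument: a list of percentiles (0 to 100).
p2_5, p25, p50, p75, p97_5 = np.percentile(scores, [2.5, 25, 50, 75, 97.5])

# The 50th empirical percentile should agree with the median
median = np.median(scores)

# Width of the interquartile range (the 50% region around the median)
iqr_width = p75 - p25

# Fraction of data points strictly below the 25th empirical percentile
frac_below_p25 = np.mean(scores < p25)

# Fraction of data points inside the 95% region
frac_in_95 = np.mean((scores >= p2_5) & (scores <= p97_5))

print("median:", median, "50th percentile:", p50)
print("IQR runs from", p25, "to", p75, "(width", iqr_width, ")")
print("95% region runs from", p2_5, "to", p97_5)
print("fraction below 25th percentile:", frac_below_p25)
print("fraction inside 95% region:", frac_in_95)
```

By default, percentile() interpolates linearly between neighboring data points when a percentile falls between two of them, so the returned value is not necessarily an element of the dataset.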
