Statistical Methods
Learn how to apply basic statistical operations in pandas.
Importance of statistics in data analysis
Statistics plays a crucial role in data analysis, providing methods to summarize, organize, and make inferences from our data. Applying statistical methods lets us draw meaningful insights during data exploration and supports data-driven decisions. The pandas library contains a range of basic statistical methods for gaining a strong understanding of our data, which we’ll explore using the credit card dataset.
Central tendency
Central tendency is a statistical measure describing a dataset's center or typical value. There are three central tendency measures—mean, median, and mode. For example, we can find the mean of the Rating column, the median of the Income column, and the mode of the Cards column with the mean(), median(), and mode() methods, respectively.
Notice that mode() returns two values, 0 and 2. This implies that the Cards column is bimodal: two distinct values are tied for the highest frequency, each occurring more often than any other value.
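As a minimal sketch of these three calls, the snippet below uses a small made-up DataFrame. The values are illustrative assumptions and are not taken from the credit card dataset, so the output differs from the lesson's results. The toy Cards column is deliberately built so that two values tie for the highest count.

```python
import pandas as pd

# Hypothetical toy data for illustration only; NOT the credit card dataset.
df = pd.DataFrame({
    "Rating": [283, 483, 514, 681, 357],
    "Income": [14.9, 106.0, 104.6, 148.9, 55.9],
    "Cards": [2, 3, 4, 3, 2],
})

rating_mean = df["Rating"].mean()      # arithmetic average of the column
income_median = df["Income"].median()  # middle value once the column is sorted
cards_mode = df["Cards"].mode()        # Series holding every tied mode, in ascending order

print(rating_mean)
print(income_median)
print(cards_mode.tolist())  # two entries here because 2 and 3 each appear twice
```

Because mode() returns a Series rather than a single number, a tie such as this one shows up as multiple rows instead of being silently reduced to one value.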
Variability
Data variability refers to the degree to which the values in a dataset differ from the central tendency. It measures how spread out the data is, and it’s an important aspect of understanding the distribution of a dataset. There are various measures of variability, as shown in the examples below:
Range: Measures the difference between the highest and lowest value and is calculated with max() and min().
Variance: Measures the degree of spread about the mean value of the data and is calculated with var().
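The range and variance described above can be sketched on a short made-up Series. The numbers are hypothetical and chosen only to make the arithmetic easy to follow. Note that pandas computes the sample variance by default (ddof=1); passing ddof=0 gives the population variance instead.

```python
import pandas as pd

# Hypothetical values for illustration only; NOT the credit card dataset.
income = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], name="Income")

# Range: highest value minus lowest value.
income_range = income.max() - income.min()

# Variance: average squared deviation from the mean.
# var() divides by n - 1 by default (sample variance).
sample_var = income.var()
# Dividing by n instead gives the population variance.
population_var = income.var(ddof=0)

print(income_range)
print(sample_var)
print(population_var)
```

For these five values the mean is 30, the squared deviations sum to 1000, and dividing by 4 or by 5 yields the sample and population variances, respectively.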