How to compute mean, median, and mode using NumPy in Python

Key takeaways:

  • Mean computes the average of an array of numbers.

  • Median finds the middle value in a sorted array.

  • Mode identifies the most frequent value(s) in an array.

  • NumPy provides efficient built-in functions (np.mean() and np.median()) for the mean and median; the mode can be computed with SciPy’s stats.mode().

  • Understanding these measures is essential for data analysis, statistical modeling, and machine learning.

In data analysis, descriptive statistics like the mean, median, and mode help summarize and understand the underlying patterns in the data. Python’s NumPy library, widely used in data science, provides optimized functions to compute the mean and median efficiently, and SciPy covers the mode. This Answer explains how to calculate these measures, giving you basic tools for exploratory data analysis (EDA).

In NumPy, we use built-in functions to compute the mean and median; for the mode, we use SciPy.

Mean

The mean is the arithmetic average of the input values, a measure of central tendency. It is calculated by dividing the sum of the elements by the total number of elements.
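As a quick check of this definition, the following minimal sketch computes the mean by hand (sum divided by count) and compares it with np.mean(). The array values and variable names are illustrative assumptions, not data from this Answer.

```python
import numpy as np

# Illustrative data (assumed for this sketch)
values = np.array([4, 8, 15, 16, 23, 42])

# Mean by definition: sum of the elements divided by the number of elements
manual_mean = values.sum() / values.size

# Mean using NumPy's built-in function
numpy_mean = np.mean(values)

print("Manual mean:", manual_mean)
print("np.mean:    ", numpy_mean)
```

Both lines should report the same value, since np.mean() implements the same sum-over-count calculation.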

Usage of mean

The mean is widely used in applications where you need a measure of the average, such as:

  • To calculate average returns or stock prices.
  • To compute the average score of students.
  • To analyze average patient age, weight, etc.

The NumPy function for the mean is numpy.mean(), usually written as np.mean().

import numpy as np
array = np.arange(20)
print(array)
r1 = np.mean(array)
print("\nMean: ", r1)
Calculating mean using numpy.mean()

Median

The median is the middle value when the data is sorted. If the dataset has an odd number of values, the median is the exact middle element. If the dataset has an even number of values, the median is the average of the two middle values.
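The odd and even cases described above can be sketched as follows. The two small arrays are illustrative assumptions chosen so the sorted order is easy to follow.

```python
import numpy as np

# Illustrative data (assumed for this sketch); np.median sorts internally
odd_data = np.array([9, 1, 5, 3, 7])        # 5 values, sorted: 1 3 5 7 9
even_data = np.array([9, 1, 5, 3, 7, 11])   # 6 values, sorted: 1 3 5 7 9 11

# Odd count: the single middle element (5)
odd_median = np.median(odd_data)

# Even count: the average of the two middle elements, (5 + 7) / 2
even_median = np.median(even_data)

print("Median (odd count): ", odd_median)
print("Median (even count):", even_median)
```

Note that the input does not need to be sorted beforehand.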

Usage of median

The median is useful when:

  • Data contains outliers or skewed distributions, where the mean may not represent the central tendency well.
  • A typical value is needed, such as a median home price, so that a few extremely expensive properties do not distort the result.
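To illustrate the effect of outliers, here is a small sketch comparing the mean and median before and after adding one extreme value. The home prices (in thousands) are hypothetical numbers made up for this example.

```python
import numpy as np

# Hypothetical home prices in thousands (assumed for this sketch)
prices = np.array([200, 220, 250, 270, 300])

# Add one extremely expensive property
prices_with_outlier = np.append(prices, 5000)

mean_before = np.mean(prices)
median_before = np.median(prices)
mean_after = np.mean(prices_with_outlier)
median_after = np.median(prices_with_outlier)

print("Without outlier: mean =", mean_before, "median =", median_before)
print("With outlier:    mean =", mean_after, "median =", median_after)
```

In this sketch, the single outlier pulls the mean far above most of the prices, while the median moves only slightly.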

The NumPy function for the median is np.median().

import numpy as np
array = np.arange(20)
print(array)
r1 = np.median(array)
print("\nMedian: ", r1)
Calculating median using numpy.median()

Mode

The mode is the value that appears most frequently in a dataset. If several values occur with the same highest frequency, a dataset can have multiple modes (multimodal).

Usage of mode

The mode is used in applications where identifying the most frequent event is crucial:

  • Finding the most popular product or service.
  • Identifying the most common customer purchase.
  • Identifying the most common grade or score range.

NumPy does not have a built-in mode function, but we can use SciPy to compute the mode.

import numpy as np
from scipy import stats
data = np.array([3, 4, 4, 5, 6, 7, 8, 4])
print(data)
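# Compute mode
mode_value = stats.mode(data)
# Recent SciPy versions return scalars for 1-D input, older ones return
# length-1 arrays; np.atleast_1d handles both cases
print(f"Mode: {np.atleast_1d(mode_value.mode)[0]}, Count: {np.atleast_1d(mode_value.count)[0]}")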
Calculating mode using mode()
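As noted in the definition, a dataset can have several modes, and stats.mode() reports only one value. One possible NumPy-only approach, sketched here with illustrative data, is to count occurrences with np.unique() and keep every value that reaches the highest count. The variable names are assumptions for this example.

```python
import numpy as np

# Illustrative multimodal data (assumed): 2 and 3 both appear twice
data = np.array([1, 2, 2, 3, 3, 4])

# Unique values and how often each one occurs
values, counts = np.unique(data, return_counts=True)

# Keep every value whose count equals the highest count
modes = values[counts == counts.max()]

print("Modes:", modes.tolist())
print("Count:", int(counts.max()))
```

This is a sketch for small 1-D arrays; for a single mode, the SciPy call shown earlier is usually simpler.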

Conclusion

In this Answer, we explored how to calculate the mean, median, and mode using NumPy and SciPy. These fundamental statistical measures help summarize and understand datasets, providing valuable insights for further analysis. Whether you’re working with financial data, healthcare metrics, or sales data, these tools are essential for basic data analysis tasks.

Frequently asked questions



What is the median of 7, 10, 7, 5, 9 and 10?

The median is 8. Here is the calculation:

  1. Sort the numbers: [5, 7, 7, 9, 10, 10]

  2. Since there is an even number of values (6), the median is the average of the two middle values. The two middle values are 7 and 9.

  3. Calculate the average of these two values:

     Median = (7 + 9) / 2 = 16 / 2 = 8

What is the median of 2, 3, 4, 5, 1, 2, 3, 4, 6, 5?

The median is 3.5. Here is the calculation:

  1. Sort the numbers: [1, 2, 2, 3, 3, 4, 4, 5, 5, 6]

  2. Find the middle value(s): There are 10 values (even number), so the median will be the average of the 5th and 6th values. The 5th value is 3, and the 6th value is 4.

  3. Calculate the average of these two values:

     Median = (3 + 4) / 2 = 7 / 2 = 3.5
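The two worked answers above can be checked with np.median(), as in this short sketch. The variable names are illustrative; the numbers come from the questions themselves.

```python
import numpy as np

# Data from the two questions above
first = [7, 10, 7, 5, 9, 10]
second = [2, 3, 4, 5, 1, 2, 3, 4, 6, 5]

# np.median accepts plain Python lists and sorts internally
first_median = np.median(first)
second_median = np.median(second)

print("Median of first list: ", first_median)
print("Median of second list:", second_median)
```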

What is the full form of NumPy?

The full form of NumPy is Numerical Python. It is a popular library in Python used for working with arrays, matrices, and performing a wide range of mathematical and statistical operations.

