Statistical Measures Using Arrays
Learn to apply statistical measures using arrays.
In this lesson, we’ll apply different statistical measures using arrays.
Statistics is the science of collecting, organizing, analyzing, and presenting data, and it helps us understand and describe our data. For example, in a group of a hundred people of different ages and backgrounds, we might want to know the average age of the group or their average height. Or, we might want to know the most common ethnicity among the group.
Only after we understand and can describe the data can we make inferences about it. Statistical measures such as the mean, median, mode, variance, and standard deviation help us analyze the data.
Instruction: Use the playground below for all the upcoming tasks.
Statistical measures
1. Calculating the mean
The mean is the average of the values. We calculate the arithmetic mean by taking the sum of the values and dividing the sum by the total number of values.
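The calculation described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name `mean`, its signature, and the `double` return type are assumptions, and the function expects `n` to be greater than zero.

```cpp
// Arithmetic mean of the first n values of D[].
// Assumption: n > 0 (the function does not guard against division by zero).
// The name mean() and its signature are illustrative, not taken from the lesson.
double mean(const int D[], int n)
{
    double sum = 0.0;              // accumulate in double to avoid integer division
    for (int i = 0; i < n; i++)
        sum += D[i];
    return sum / n;                // sum of the values divided by their count
}
```

For example, calling `mean` on the values 2, 4, 6, 8 with `n = 4` gives 5.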
Instruction: Write the function in the above playground and test it.
Exercise: Calculating the grade
Write a function that calculates the student's total grade percentage. You are given the weightage (percentage) of each instrument and the marks achieved in it (each out of 100).
float theGrader(int Marks[], int Percentage[], int k)
Here is an example with five instruments, showing the weightage of each instrument and the marks earned in it:
| Instrument | Weightage | Marks (out of 100) |
|---|---|---|
| Assignment # 1 | 10% | 80 |
| Assignment # 2 | 20% | 90 |
| Midterm | 20% | 85 |
| Project | 25% | 95 |
| Final exam | 25% | 75 |
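For the example above, the achieved percentage is as follows:
$$\frac{80 \times 10 + 90 \times 20 + 85 \times 20 + 95 \times 25 + 75 \times 25}{100} = \frac{8550}{100} = 85.5\%$$
The general formula for the achieved percentage is as follows:
$$\text{Achieved percentage} = \sum_{i=1}^{k} \frac{M_i \times P_i}{100}$$
where $M_i$ represents the marks achieved in the $i$'th instrument, $P_i$ is the percentage (weightage) of that instrument, and $k$ is the number of instruments.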
Instruction: Write the code in the following widget.
2. Calculating the median
The median is the middle value. To calculate the median, we first need to sort the values in ascending or descending order.
If the total number of values is odd, we can easily find the middle value. However, if the total number of values is even, we take the average of the middle two values and that is the median.
Below, we’ve used bubble sort to sort the values first and then calculate the median.
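A minimal sketch of this approach follows. The names `bubbleSort` and `median` and their signatures are assumptions for illustration, and `n` is expected to be greater than zero. Note that this version sorts the caller's array in place.

```cpp
// Sort the first n values of D[] in ascending order using bubble sort.
void bubbleSort(int D[], int n)
{
    for (int pass = 0; pass < n - 1; pass++)
        for (int i = 0; i < n - 1 - pass; i++)
            if (D[i] > D[i + 1])
            {
                // swap adjacent values that are out of order
                int temp = D[i];
                D[i] = D[i + 1];
                D[i + 1] = temp;
            }
}

// Median of the first n values of D[].
// Assumption: n > 0. Side effect: D[] is left sorted.
double median(int D[], int n)
{
    bubbleSort(D, n);
    if (n % 2 == 1)
        return D[n / 2];                       // odd count: the single middle value
    return (D[n / 2 - 1] + D[n / 2]) / 2.0;    // even count: average of the middle two
}
```

For the values 5, 1, 4, 2, 3 the median is 3, and for 4, 1, 3, 2 it is 2.5. If the original order of the data matters, we can copy the array before calling `median`.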
Instruction: Write the code in the above exercise playground.
3. Calculating the mode
Another way to summarize data is the mode, which is the value with the highest frequency in the data.
To calculate the mode, we need to know the frequencies of all elements. We will make the following function:
int frequency(int D[], int n, int t)
Here, D[] is the data of size n, and t is the value to search for. The function returns the number of times t appears in D[].
Idea:
The mode() function assumes the first value to be the mode value mv. We then find the frequency of the first value of the array and store it inside mf.
Inside the for loop, in the first iteration, we find the frequency f of the second value of the array and compare it with mf, the highest frequency found so far. If f is greater than mf, we update the mode and frequency values (mv and mf, respectively). These steps are carried out for each remaining element of the array.
Lastly, after the entire array has been traversed, we return the mode value mv that holds the value with the highest frequency.
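The idea above can be sketched as follows. The `frequency()` signature and the names `mv`, `mf`, and `f` come from the description; the signature of `mode()` is an assumption, and the sketch expects `n` to be greater than zero. When two values share the highest frequency, this version returns the one that appears first in the array.

```cpp
// Count how many times t appears in the first n values of D[].
int frequency(int D[], int n, int t)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (D[i] == t)
            count++;
    return count;
}

// Value with the highest frequency in the first n values of D[].
// Assumption: n > 0. The signature of mode() is illustrative.
int mode(int D[], int n)
{
    int mv = D[0];                      // assume the first value is the mode
    int mf = frequency(D, n, mv);       // and remember its frequency
    for (int i = 1; i < n; i++)
    {
        int f = frequency(D, n, D[i]);  // frequency of the current value
        if (f > mf)                     // strictly greater, so ties keep the earlier value
        {
            mv = D[i];
            mf = f;
        }
    }
    return mv;
}
```

Because `frequency()` scans the whole array for every element, this sketch takes time proportional to the square of `n`, which is fine for small arrays.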
4. Calculating the variance and standard deviation
As the name suggests, the variance measures how much the values vary from the mean.
- We take the difference of each value from the mean and then square the differences (so that negative and positive differences do not cancel out).
- We then divide the sum of the squared differences by the total number of values.
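Variance:
$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (D_i - \mu)^2$$
where $\mu$ is the mean of the data D[] and $n$ is the total number of values.
Standard deviation is the square root of the variance, usually represented by $\sigma$.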
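The two steps above can be sketched as follows. The names `variance` and `standardDeviation` and their signatures are assumptions for illustration. The sketch divides by the total number of values, as described above, and expects `n` to be greater than zero.

```cpp
#include <cmath>

// Variance of the first n values of D[]: the mean of the squared
// differences from the mean. Assumption: n > 0.
double variance(const int D[], int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += D[i];
    double mu = sum / n;                // mean of the data

    double squaredSum = 0.0;
    for (int i = 0; i < n; i++)
    {
        double diff = D[i] - mu;        // difference of each value from the mean
        squaredSum += diff * diff;      // square it and accumulate
    }
    return squaredSum / n;              // divide by the total number of values
}

// Standard deviation: the square root of the variance.
double standardDeviation(const int D[], int n)
{
    return std::sqrt(variance(D, n));
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5, the squared differences add up to 32, the variance is 32 / 8 = 4, and the standard deviation is 2.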