Data Processing

In this lesson, we will learn about tools for processing data stored in arrays.

NumPy arrays provide a wide range of functions and tools to manipulate raw data and extract meaning from it.

Mean and standard deviation

Suppose you measure the peak current passing through a transformer every two hours over a day, giving 12 readings. Because of temperature changes, the peak value fluctuates from one reading to the next. To get a single representative value, we calculate the mean of all 12 readings. The mathematical formula for the mean is given below:

$$\overline{a}=\frac{1}{n}\sum_{i=1}^{n} a_i$$

where $a_i$ are the elements in the data set, $n$ is the total number of elements, and $\overline{a}$ is the mean.
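The formula can be checked directly against NumPy. The sketch below uses twelve made-up peak-current readings (hypothetical values in amperes, not measured data, one per two-hour interval). It computes the mean once by applying the formula literally, summing every $a_i$ and dividing by $n$, and once with `np.mean`; the two results should agree up to floating-point rounding.

```python
import numpy as np

# Hypothetical peak-current readings in amperes: 12 samples, one every two hours.
readings = np.array([10.0, 10.4, 9.8, 10.2, 9.6, 10.0,
                     10.2, 9.8, 10.4, 9.6, 10.0, 10.0])

# Mean from the formula: (1/n) * (a_1 + a_2 + ... + a_n)
n = readings.size
mean_formula = readings.sum() / n

# Mean from NumPy's built-in function
mean_numpy = np.mean(readings)

print(mean_formula, mean_numpy)
```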

The standard deviation and variance of the data describe how far the readings spread around the mean, that is, how large the fluctuations in the peak value are. The mathematical formula for the standard deviation is given below:

$$\sigma=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(a_i-\overline{a}\right)^2}$$

$$\text{variance}=\sigma^2$$

where $a_i$ are the elements in the data set, $n$ is the total number of elements, $\overline{a}$ is the mean, and $\sigma$ is the standard deviation. The variance is simply the square of the standard deviation.

In the example below, we will see how easily we can compute the mean and standard deviation.
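A minimal sketch of that computation, again using hypothetical readings (made-up values in amperes, not measured data). It evaluates the standard-deviation formula step by step and compares the result with `np.std` and `np.var`. The formula above divides by $n$ (the population standard deviation), which matches NumPy's default `ddof=0`; passing `ddof=1` would divide by $n-1$ instead.

```python
import numpy as np

# Hypothetical peak-current readings in amperes: 12 samples, one every two hours.
readings = np.array([10.0, 10.4, 9.8, 10.2, 9.6, 10.0,
                     10.2, 9.8, 10.4, 9.6, 10.0, 10.0])

n = readings.size
mean = np.mean(readings)

# Standard deviation from the formula: sqrt((1/n) * sum((a_i - mean)^2))
sigma_formula = np.sqrt(np.sum((readings - mean) ** 2) / n)
variance_formula = sigma_formula ** 2

# NumPy equivalents; the default ddof=0 divides by n, matching the formula
sigma_numpy = np.std(readings)
variance_numpy = np.var(readings)

print("mean:", mean)
print("std: ", sigma_formula, sigma_numpy)
print("var: ", variance_formula, variance_numpy)
```

Because `readings - mean` is a vectorized array operation, the whole formula is evaluated without an explicit Python loop.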
