Trusted answers to developer questions
Trusted Answers to Developer Questions

Related Tags

What are z-scores and how are they used in a dataset?

Khizar Hayat Saani

Z-score is a numeric measurement that identifies how far a data point is from the mean. It is measured in terms of the standard deviation.

The z-score is calculated as follows:

Z=data pointmeanstandard deviationZ = \frac{data \ point - mean} {standard \ deviation}

Significance of the z-score

Z-scores can be a vital tool for statisticians and developers alike.

Through z-scores, developers can easily detect anomalies within our dataset.

The following points allow us to extract more information from a z-score:

  • A positive z-score means that the data point is above the mean.
  • A negative z-score means that the data point is below the mean.
  • A z-score close to 0 means that the data point is close to the mean.
  • In general, a data point with a z-score greater than 3 or less than -3 is considered anomalous.
Z-scores and their meaning

How to use z-scores

By using the z-score of a particular data point, we can measure how close or far the point is from our mean. By setting a range of acceptable z-scores, we can identify the anomalies as the points that lie outside of our acceptable range( e.g., ±1\pm1).

A range of ±1\pm1 means that we will be considering points that are one standard deviation from our mean (as acceptable). All other points will be anomalies or outliers.

Example

Let’s consider the following dataset:

[2, 3, 5, 4, 7, 19, 6, 4, 3, 6]

First, we will calculate the mean and standard deviation of our dataset. These come out as:

  • Mean: 5.9
  • Standard Deviation: 4.6

Now, we will proceed to calculate the z-scores using the formula above.

Data point

z-score

2

-0.8

3

-0.6

5

-0.1

4

-0.4

7

0.2

19

2.8

6

0.02

4

-0.4

3

-0.6

6

0.02

From the table, we can easily identify that data point 19 has the highest z-score. Hence, the point can be considered an anomaly with a z-score of 2.8. The point lies 2.8 standard deviations beyond the mean.

Note: The z-score may also be referred to as Standard Score.

RELATED TAGS

CONTRIBUTOR

Khizar Hayat Saani
Copyright ©2022 Educative, Inc. All rights reserved
RELATED COURSES

View all Courses

Keep Exploring