Trusted answers to developer questions
Trusted Answers to Developer Questions

Related Tags

# What are z-scores and how are they used in a dataset?

Khizar Hayat Saani

Z-score is a numeric measurement that identifies how far a data point is from the mean. It is measured in terms of the standard deviation.

The z-score is calculated as follows:

$Z = \frac{data \ point - mean} {standard \ deviation}$

## Significance of the z-score

Z-scores can be a vital tool for statisticians and developers alike.

Through z-scores, developers can easily detect anomalies within our dataset.

The following points allow us to extract more information from a z-score:

• A positive z-score means that the data point is above the mean.
• A negative z-score means that the data point is below the mean.
• A z-score close to 0 means that the data point is close to the mean.
• In general, a data point with a z-score greater than 3 or less than -3 is considered anomalous.
Z-scores and their meaning

## How to use z-scores

By using the z-score of a particular data point, we can measure how close or far the point is from our mean. By setting a range of acceptable z-scores, we can identify the anomalies as the points that lie outside of our acceptable range( e.g., $\pm1$).

A range of $\pm1$ means that we will be considering points that are one standard deviation from our mean (as acceptable). All other points will be anomalies or outliers.

## Example

Let’s consider the following dataset:

[2, 3, 5, 4, 7, 19, 6, 4, 3, 6]


First, we will calculate the mean and standard deviation of our dataset. These come out as:

• Mean: 5.9
• Standard Deviation: 4.6

Now, we will proceed to calculate the z-scores using the formula above.

 Data point z-score 2 -0.8 3 -0.6 5 -0.1 4 -0.4 7 0.2 19 2.8 6 0.02 4 -0.4 3 -0.6 6 0.02

From the table, we can easily identify that data point 19 has the highest z-score. Hence, the point can be considered an anomaly with a z-score of 2.8. The point lies 2.8 standard deviations beyond the mean.

Note: The z-score may also be referred to as Standard Score.

RELATED TAGS

CONTRIBUTOR

Khizar Hayat Saani