Finding Past Anomalies

Learn how to run simulations on past data to find anomalies.

In the previous section, we identified an anomaly: an increase in the 400 status code, flagged because its z-score was 6. But how do we choose the z-score threshold? Is a z-score of 3 an anomaly? What about 2, or 1?
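As a reminder, the z-score of a value x is (x - mean) / stddev: the number of standard deviations x lies from the mean. The minimal Python sketch below uses made-up counts (hypothetical, not the lesson's data) to show that the same z-score can count as an anomaly under one threshold and not under another.

```python
from statistics import mean, stdev


def z_score(value: float, history: list[float]) -> float:
    """Number of sample standard deviations `value` lies from the mean of `history`."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs at least 2 points
    return (value - mu) / sigma


# Hypothetical counts of 400 responses per period (illustrative only).
history = [10, 12, 9, 11, 13, 10, 12]
latest = 14

z = z_score(latest, history)  # mean 11, stddev sqrt(2), so z is about 2.12
for threshold in (1, 2, 3):
    # Flags deviations in either direction; treat this as an assumption.
    print(f"threshold={threshold}: anomaly={abs(z) >= threshold}")
# threshold=1: anomaly=True
# threshold=2: anomaly=True
# threshold=3: anomaly=False
```

With a threshold of 1 or 2 this value is an anomaly; with 3 it is not. That is exactly why the threshold needs to be chosen deliberately.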

To find thresholds that fit our needs, we can run simulations on past data with different threshold values and evaluate the results. This is often called backtesting.
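A backtest can be sketched as: take z-scores already computed for past periods, apply each candidate threshold, and record which periods would have been flagged. The `backtest` function and the z-scores below are hypothetical; in practice you would compare the flagged periods against incidents you know actually happened.

```python
def backtest(z_scores: list[float], thresholds: list[float]) -> dict[float, list[int]]:
    """For each candidate threshold, list the indices of past periods that would be flagged."""
    return {
        t: [i for i, z in enumerate(z_scores) if abs(z) >= t]
        for t in thresholds
    }


# Hypothetical z-scores from past periods (illustrative only).
past_z_scores = [0.3, -1.2, 2.5, 0.8, 6.0, -3.1, 1.7]
print(backtest(past_z_scores, [1, 2, 3]))
# {1: [1, 2, 4, 5, 6], 2: [2, 4, 5], 3: [4, 5]}
```

A lower threshold flags more periods (more alerts, more false positives); a higher one flags fewer (risking missed incidents).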

First, we need to calculate the mean and standard deviation of each status code up to every row, as if that row were the current value. This is a classic job for a window function.
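In SQL, this is typically an `AVG(...)` and a `STDDEV(...)` computed `OVER` a window partitioned by status code, ordered by period, and framed `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`. The lesson's own query is not reproduced here. As a hedged illustration, the Python sketch below performs the same expanding-window computation over hypothetical `(status_code, period, entries)` rows. It includes the current row in each window (an assumption), uses the sample standard deviation (as PostgreSQL's `STDDEV` does), and returns `None` where only one value has been seen, mirroring SQL's `NULL`.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical rows: (status_code, period, entries), already sorted by period.
rows = [
    (200, 1, 100), (400, 1, 10),
    (200, 2, 104), (400, 2, 12),
    (200, 3, 98), (400, 3, 11),
]


def running_stats(rows):
    """Per status code, mean and sample stddev of `entries` from the first
    period up to and including the current row (an expanding window)."""
    seen = defaultdict(list)  # status_code -> entries observed so far
    result = []
    for status_code, period, entries in rows:
        seen[status_code].append(entries)
        window = seen[status_code]
        mu = mean(window)
        # Sample stddev is undefined for a single value; use None like SQL's NULL.
        sigma = stdev(window) if len(window) > 1 else None
        result.append((status_code, period, entries, mu, sigma))
    return result


for row in running_stats(rows):
    print(row)
```

Each output row can then be turned into a z-score, `(entries - mean) / stddev`, skipping rows where the stddev is `None`.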
