Preparing the Data

Learn how to prepare your data for anomaly detection.

Data in server logs

Web servers such as Nginx, Apache, and IIS write a record of every request to their access logs. The data in these logs can be instrumental in identifying anomalies.

We will analyze the logs of a web application, so the data we are most interested in is the timestamp and the status code of every response from the server. Here are some of the insights we can draw from this data:

  • A sudden increase in the 500 status code: You may have a problem with the server. Did you push a new version? Is there an external service you are using that started failing in unexpected ways?
  • A sudden increase in the 400 status code: You may have a problem with the client. Did you change some validation logic and forget to update the client? Did you make a change and forget to handle backward compatibility?
  • A sudden increase in the 404 status code: You may have an SEO problem. Did you move some pages and forget to set up redirects? Is there some script kiddie running a scan on your site?
  • A sudden increase in the 200 status code: You either have some significant legitimate traffic coming in or are under a DoS attack. Either way, you probably want to check where it is coming from.
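
To make the idea of "timestamp plus status code" concrete, here is a minimal Python sketch that counts responses per minute and status code. It assumes the access log uses the Common Log Format, which Nginx and Apache can emit. The sample lines, the `LOG_PATTERN` regex, and the `count_status_per_minute` helper are all made up for illustration and are not part of the lesson's own tooling.

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: lines follow the Common Log Format, e.g.
# host - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 2326
LOG_PATTERN = re.compile(r'\[(?P<ts>[^\]]+)\] "[^"]*" (?P<status>\d{3}) ')

# Hypothetical sample lines, invented for this example only.
SAMPLE_LINES = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 2326',
    '203.0.113.5 - - [10/Oct/2023:13:55:40 +0000] "GET /missing HTTP/1.1" 404 153',
    '203.0.113.9 - - [10/Oct/2023:13:56:02 +0000] "POST /api HTTP/1.1" 500 0',
    '203.0.113.9 - - [10/Oct/2023:13:56:05 +0000] "POST /api HTTP/1.1" 500 0',
]


def count_status_per_minute(lines):
    """Return a Counter keyed by (minute, status_code)."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # skip lines that do not look like access log entries
        ts = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        minute = ts.replace(second=0, microsecond=0)  # truncate to the minute
        counts[(minute, int(match.group("status")))] += 1
    return counts


counts = count_status_per_minute(SAMPLE_LINES)
for (minute, status), n in sorted(counts.items()):
    print(minute.isoformat(), status, n)
```

A sudden jump in any of these per-minute counts is the kind of signal described in the list above. Truncating to the minute is an arbitrary choice here; a different bucket size may suit your traffic better.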
