Website traffic forecasting predicts the future traffic on a website from its previously recorded traffic. With a forecast in hand, a website can plan capacity in advance so that the expected traffic stays within the available bandwidth. It also helps allocate resources and personnel to deal with traffic-related issues.
Traffic can also be used as a metric for a website’s reach and to decide the next step toward improving it. In this Answer, we’ll discuss how machine learning algorithms can be used for website traffic forecasting.
The dataset used to forecast traffic contains the view history of a sample website over a fixed period of time. It has two columns: Date_traffic holds the date, and Views_per_day holds the total views on the website on that date. Dates are stored as strings in d/m/Y format. The training data is stored in Traffic_record.csv.
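For readers who do not have Traffic_record.csv at hand, the following is a minimal sketch that builds a stand-in dataset with the same two columns. The values are synthetic, and the start date, the 90-day length, and the weekly rhythm are assumptions chosen only for illustration.

```python
import io

import numpy as np
import pandas as pd

# Synthetic stand-in data (an assumption, not the real traffic record):
# 90 days of views with a weekly rhythm plus random noise.
rng = np.random.default_rng(seed=0)
dates = pd.date_range("2021-06-01", periods=90, freq="D")
weekly_pattern = 1 + 0.3 * np.sin(2 * np.pi * np.arange(90) / 7)
views = (1000 * weekly_pattern + rng.normal(0, 50, size=90)).round().astype(int)

sample = pd.DataFrame({
    "Date_traffic": dates.strftime("%d/%m/%Y"),  # d/m/Y strings, as in the dataset
    "Views_per_day": views,
})

# Write to an in-memory buffer; pass a file path instead to create a CSV on disk.
buffer = io.StringIO()
sample.to_csv(buffer, index=False)
print(buffer.getvalue().splitlines()[:3])
```

Replacing the buffer with a path such as "Traffic_record.csv" would produce a file that the rest of the walkthrough can read.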
The website traffic forecasting process is a series of steps: we install and import the libraries, prepare the dataset, and plot it to estimate the parameter values needed for model training. We then train the model, predict the traffic, and plot the results.
Installing dependencies: To perform the website forecasting process, certain dependencies are required. In Python3, we use pip3 to install the required dependencies. For this specific process, we require matplotlib, pandas, and statsmodels. To install the dependencies, we use the following commands:
pip3 install matplotlib
pip3 install pandas
python3 -m pip install statsmodels
Importing libraries: The next step is to import the libraries into the Python file so that the code can use them. To import the libraries, we add the following statements to the code:
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_pacf
import statsmodels.api as sm
In the code:
Line 1: We import pandas for loading the dataset in the Python file.
Line 2: We import matplotlib for plotting the dataset.
Lines 3–5: We import seasonal_decompose and plot_pacf from statsmodels, which help estimate the values of p, d, and q, and the statsmodels API, which we use to train the SARIMAX model.
Preparing the data: For model training, we read the dataset and then format it. The dates in the dataset are strings in d/m/Y format, so we convert them to datetime values. To do this, we add the following lines of code to the Python file:
traffic_history = pd.read_csv("Traffic_record.csv")
print(traffic_history.head())
# Format the date
traffic_history["Date_traffic"] = pd.to_datetime(traffic_history["Date_traffic"], format="%d/%m/%Y")
traffic_history.info()
In the code:
Line 1: We read the dataset from the CSV file, Traffic_record.csv, and load it into the data frame traffic_history.
Line 2: We print the first five rows loaded from the dataset.
Line 4: We use pandas to convert the Date_traffic column to datetime values, parsing the strings with the format %d/%m/%Y.
Line 5: We print information about the columns in the dataset, such as their names and data types.
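The explicit format matters for ambiguous dates: without it, pandas reads a string such as 03/04/2021 month-first by default. The sketch below uses three made-up rows to show the day-first parsing, and optionally sets the dates as a daily index, which is an extra step not taken in the walkthrough above.

```python
import pandas as pd

# Hypothetical sample rows in the dataset's d/m/Y string format.
raw = pd.Series(["03/04/2021", "04/04/2021", "05/04/2021"])

# With the explicit format, "03/04/2021" is read as 3 April 2021.
parsed = pd.to_datetime(raw, format="%d/%m/%Y")
print(parsed.dt.day.tolist(), parsed.dt.month.tolist())

# Optional: index the views by date with a daily frequency so that
# plots and forecasts carry calendar dates instead of row numbers.
frame = pd.DataFrame({"Date_traffic": parsed, "Views_per_day": [120, 135, 128]})
frame = frame.set_index("Date_traffic").asfreq("D")
print(frame)
```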
Predicting p, d, and q: We use three different plots to estimate the parameter values for training the SARIMAX model. SARIMAX is a statistical model that accounts for seasonal patterns in the data, repeating every s observations, when predicting future values. Here, p is the autoregressive order, d is the degree of differencing, and q is the moving-average order. To determine the values of p, q, and d, we employ the following mechanisms:
Website traffic is usually not constant over time but follows a repeating pattern, so it is seasonal. For instance, educational websites get more traffic on weekdays than on weekends, and the opposite holds for entertainment websites. For seasonal traffic, we use the SARIMAX model and set the value of d equal to 1, which means the series is differenced once. To plot the graph that shows whether the data is seasonal or stationary, we use the following lines of code:
seasonal_traffic = seasonal_decompose(traffic_history["Views_per_day"], model='multiplicative', period=30)
figure_seasonal = plt.figure()
figure_seasonal = seasonal_traffic.plot()
figure_seasonal.set_size_inches(10, 10)
In the code:
Line 1: We apply seasonal_decompose to the Views_per_day column to compute seasonal_traffic for a period of 30 days using the multiplicative model.
Lines 2–4: We plot the graph and define the size of the graph.
We use autocorrelation on the Views_per_day column to estimate the value of p. To do that, we use the following line of code:
pd.plotting.autocorrelation_plot(traffic_history["Views_per_day"])
The output graph is as follows:
Based on the output, since the curve is moving after the fifth horizontal line, we set the value of p equal to 5.
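Now, to find the value of q, the moving-average order, this walkthrough uses the partial autocorrelation of the Views_per_day column. Conventionally, the partial autocorrelation plot guides p and the autocorrelation plot guides q, so the values chosen here are best treated as starting points. We plot the partial autocorrelation with the following line of code: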
plot_pacf(traffic_history["Views_per_day"], lags = 100)
The output of the graph is as follows:
Based on the output, only two points lie far away from all the other points plotted in the graph, so we set the value of q to 2.
Model training: After calculating the values necessary for model training, we can finally train our SARIMAX model on the Views_per_day column. To do this, we use the following lines of code:
p, d, q = 5, 1, 2
model_used = sm.tsa.statespace.SARIMAX(traffic_history['Views_per_day'], order=(p, d, q), seasonal_order=(p, d, q, 12))
model_used = model_used.fit()
print(model_used.summary())
In the code:
Line 1: We define the values of p, d, and q.
Line 2: We define the SARIMAX model on the Views_per_day column. The order argument sets the non-seasonal component, and seasonal_order sets the seasonal component using the same p, d, and q values plus the seasonal period s. Here, s is set to 12, a value that corresponds to a yearly cycle when the data is monthly. Because s counts observations and this dataset is daily, 12 spans 12 days; a weekly cycle would correspond to an s of 7.
Line 3: We train the model.
Line 4: We print the summary of the trained model.
Predicting future traffic: Now we use the trained model to predict the views on the website for the next 30 days. To do this, we use the following lines of code:
predicted_month = model_used.predict(len(traffic_history), len(traffic_history) + 30)
print(predicted_month)
In the code:
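Line 1: We use the trained model to predict the traffic on the website, passing the first and last positions to predict. Both endpoints are included, so adding 30 to the length of the data returns 31 values; adding 29 instead gives exactly 30.
Line 2: We print the predicted output.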
Plotting the prediction with history: Finally, we plot our predictions. The x-axis represents the days, and the y-axis represents the traffic on the website. The graph shows the predictions alongside the history that was fed to the model, using the following lines of code:
traffic_history["Views_per_day"].plot(legend=True, label="Traffic history", figsize=(10, 10))
predicted_month.plot(legend=True, label="Future predictions")
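The plot uses row positions on the x-axis. If calendar dates are preferred, one option is to re-index the predictions by date before plotting. The helper below, with_calendar_index, is a hypothetical name introduced for this sketch, and the last observed date and the placeholder values are assumptions.

```python
import pandas as pd

def with_calendar_index(predictions, last_observed):
    """Return the predictions as a Series indexed by the days after last_observed."""
    future_dates = pd.date_range(
        start=last_observed + pd.Timedelta(days=1),
        periods=len(predictions),
        freq="D",
    )
    return pd.Series(list(predictions), index=future_dates, name="Future predictions")

# Hypothetical last date in the history, written in the dataset's d/m/Y format.
last_observed = pd.to_datetime("30/08/2021", format="%d/%m/%Y")

# Placeholder values standing in for 30 model predictions.
dated = with_calendar_index([1000.0] * 30, last_observed)
print(dated.index[0].date(), dated.index[-1].date())
```

Calling dated.plot(legend=True) would then label the forecast with dates, provided the history is plotted against its Date_traffic values as well.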