Time series data is a collection of observations recorded at successive points in time, often at regular intervals. Because these data points are ordered chronologically, past values can be used to predict future ones. Time series forecasting is a crucial technique in domains such as finance, economics, and meteorology. In this answer, we’ll explore ARIMA time series forecasting using Python.
ARIMA stands for autoregressive integrated moving average. It’s a robust statistical method for analyzing and forecasting time series data. ARIMA combines three key components to model a time series:
Autoregressive (AR): The autoregressive component models the correlation between the current value of the time series and its previous values. It assumes that future values of the series can be predicted from a linear combination of past values.
Integrated (I): The integrated component applies differencing, which replaces each value with its change from the previous value, to make the time series stationary. Stationarity matters because the AR and MA components assume it. A stationary series has a constant mean and variance over time.
Moving average (MA): The moving average component models the relationship between the current value and past prediction errors (residuals).
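To make the three components concrete, here is a minimal sketch that simulates a series by hand with NumPy. The coefficients phi and theta, the series length, and the random seed are illustrative assumptions, not values taken from any real data set.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
phi, theta = 0.6, 0.3  # hypothetical AR and MA coefficients

errors = rng.normal(size=n)  # random shocks (the prediction errors)
w = np.zeros(n)              # stationary series built from AR and MA terms
for t in range(1, n):
    # AR term: previous value of the series; MA term: previous shock
    w[t] = phi * w[t - 1] + errors[t] + theta * errors[t - 1]

# Integrating (cumulative sum) turns the stationary series into a wandering one
y = np.cumsum(w)

# First-order differencing (d = 1) undoes the integration
recovered = np.diff(y)

print("Std. dev. of integrated series: ", y.std())
print("Std. dev. of differenced series:", recovered.std())
```

Differencing y once gives back the stationary series w (minus its first value), which is why a model fitted to data like y would use d = 1.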
Let’s illustrate the ARIMA model in Python by predicting the next 10 closing prices of the S&P 500 from the given data:
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA

# Data
my_df = pd.read_csv('data.csv')
closing_price = my_df['prices']

# ARIMA model
my_model = ARIMA(closing_price, order=(1, 1, 2))
model_fit = my_model.fit()

# Predict values
# 100-109, refers to the next 10 values after the value at 99th index
predicted_values = model_fit.predict(100, 109)

# Plot actual and predicted values
plt.figure()
plt.plot(closing_price, label='Actual Values')
plt.plot(predicted_values, label='Predicted Values', color='red', linestyle='dotted')
plt.xlabel('Time')
plt.ylabel('Value')
plt.title('Actual vs. Predicted Values from ARIMA')
plt.legend()
plt.show()
Lines 1–3: We import the necessary libraries: pandas for data manipulation, matplotlib for creating visualizations, and statsmodels for time series analysis using the ARIMA model.
Lines 6–7: The data is loaded from a CSV file named 'data.csv' using pandas, which is a powerful data manipulation library. Specifically, we’re interested in the 'prices' column of the DataFrame, which presumably contains the closing prices of the S&P 500 index.
Lines 10–11: We create an ARIMA model with an order of (1, 1, 2). The order consists of three components: the autoregressive order (p), the differencing order (d), and the moving average order (q). These values determine the behavior of the ARIMA model. Once the ARIMA model is defined, we fit it to the closing prices of the S&P 500 data.
Line 15: Here, we forecast the closing prices for the next 10 time points after the last index in our original data. This assumes the data contains 100 observations (indexes 0–99), so the model_fit.predict(100, 109) call generates out-of-sample predictions for indexes 100–109 using the fitted ARIMA model.
Lines 18–25: We create a line plot using matplotlib. The plot includes two lines: one representing the actual closing price values and another representing the predicted values from our ARIMA model. The actual values are displayed in matplotlib's default blue, while the predicted values are shown in red with a dotted line style. The x-axis of the plot represents time, and the y-axis represents the value of the closing prices. The title of the plot is set as 'Actual vs. Predicted Values from ARIMA'. To provide clarity, a legend is included to distinguish between the actual and predicted values.
In conclusion, this answer demonstrates the ARIMA model for time series forecasting along with the code example to load time series data, create an ARIMA model, make predictions, and visualize the actual versus predicted closing price values. It’s a practical example of how data analysis and forecasting can be performed using Python and relevant libraries.