Learn how to use SARIMA to make forecasts.

Understanding SARIMA

To review, SARIMA combines multiple time series components: seasonality, autoregression, integration, and moving average. The full model is defined by its order, which gives the parameter for each of these components: $(p, d, q)(P, D, Q)$. The lowercase letters $p$, $d$, and $q$ represent the orders for autoregression, integration, and moving average, respectively. The uppercase letters $P$, $D$, and $Q$ represent the seasonal orders for each of these same components.

We might wonder how exactly to define the model order. There is no hard rule, but we can try a general framework that works well. For that, we need to address each component separately, starting with $d$ and $D$, which depend on the stationarity of our data.

Finding $d$ and $D$

Since our series has trend and seasonality, it can't be stationary, so we need integration. Looking at the growth, the series seems to be closer to an exponential curve than to a straight line, so we can try differencing twice, using shift(1) each time, and then running an Augmented Dickey-Fuller test to see whether we got rid of the trend.

The shift(n) method on a pandas series moves its values $n$ rows down. With $n = 1$, the value in the first row moves to the second, and so on. This allows us to take the value at time $t$ and subtract the value at time $t-n$ from it. By differencing with shift(1) twice in a row, we are taking the difference of the difference, thus reducing the effect of a nonlinear trend.
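To make the mechanics concrete, here is a small sketch on a toy series of squares (an assumption chosen because its second difference is constant, which makes the effect easy to see).

```python
import pandas as pd

# Toy series with a nonlinear (quadratic) trend: t^2 for t = 1..5
s = pd.Series([1, 4, 9, 16, 25])

# shift(1) moves every value one row down; the first row becomes NaN
shifted = s.shift(1)

# First difference: value at time t minus value at time t-1
first_diff = s - s.shift(1)

# Second difference: the difference of the difference
second_diff = first_diff - first_diff.shift(1)

print(pd.DataFrame({
    "s": s,
    "shifted": shifted,
    "first_diff": first_diff,
    "second_diff": second_diff,
}))
```

The first difference still grows (3, 5, 7, 9), while the second difference is flat (2, 2, 2), so the quadratic trend is gone. The same result can be obtained with `s.diff().diff()`, which is pandas shorthand for this subtraction.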
