Stochastic Processes

Learn the difference between stochastic and deterministic processes and why stochastic processes are important for time series analysis.

We'll cover the following...

Randomness
Deterministic systems
Stochastic processes

Randomness

Time series are datasets whose time index is one of their main characteristics. This means we need to know when an observation was produced to understand the data-generating process behind our sample. Yet, before going deep into why time is so important, let’s take a step back and reflect on the expression “data-generating process” itself. What does it mean?

The data-generating process of our sample is nothing but the real-world mechanism that produces our data. While the definition may seem obvious, it implies an uncomfortable truth: In the empirical sciences, where statistics and data science belong, we usually don’t know for certain the mechanism that produced our data. At most, we know an approximation of it. Think, for instance, of stock prices: We know that good news about a company’s performance might positively impact its stock price. However, we almost never know exactly by how many dollars the stock will go up if the company beats its revenue forecasts by 1%. In such cases, the best we can usually offer is a more or less precise confidence interval.

For this reason, in the empirical sciences, we often think of data as realizations of a more or less random process. In mathematical terms, such processes are known as stochastic processes.
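To make the idea of a realization concrete, here is a minimal sketch. It assumes a toy process in which each observation is drawn independently from a normal distribution with mean 10 and standard deviation 2. These values are arbitrary and chosen only for illustration, and the helper name `sample_realization` is hypothetical. Running the same process with different random seeds produces different samples, even though the data-generating mechanism is identical. Fixing the seed reproduces a realization exactly.

```python
import numpy as np

def sample_realization(n, seed):
    """Draw one realization of a toy data-generating process.

    Hypothetical process: n independent draws from a normal distribution
    with mean 10 and standard deviation 2.
    """
    rng = np.random.default_rng(seed)
    return rng.normal(loc=10, scale=2, size=n)

# Same mechanism, different seeds: two different realizations
a = sample_realization(5, seed=1)
b = sample_realization(5, seed=2)
# Same mechanism, same seed: the realization is reproduced exactly
c = sample_realization(5, seed=1)

print(np.round(a, 2))
print(np.round(b, 2))
print(np.array_equal(a, c))  # True
```

The observed data is only one of many samples the same mechanism could have produced.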

Deterministic systems

Roughly speaking, stochastic processes are data-generating processes that contain a random component. This is in contrast to deterministic processes, which can be perfectly reconstructed based on logical rules.

Deterministic systems can be as simple as the metric system (one meter will always be 100 centimeters) or as complex as Newtonian physics. To understand the state of a deterministic system, we only need to know two things:

The laws that regulate the system
The initial conditions of the system
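The two ingredients above can be sketched in a few lines of Python. In this minimal example, the "law" is a hypothetical update rule passed in as a function: a quantity that grows by 5% per step, an arbitrary choice. The initial condition is the starting value. The names `simulate` and `grow_5_percent` are illustrative and do not come from any library. Nothing random is involved, so running the simulation twice with the same law and initial condition always yields exactly the same trajectory. Changing the initial condition changes the trajectory.

```python
from typing import Callable, List

def simulate(law: Callable[[float], float], initial_condition: float, steps: int) -> List[float]:
    """Apply a deterministic update rule `steps` times, starting from initial_condition."""
    trajectory = [initial_condition]
    state = initial_condition
    for _ in range(steps):
        state = law(state)  # the next state depends only on the current one
        trajectory.append(state)
    return trajectory

def grow_5_percent(x: float) -> float:
    """Hypothetical law: the quantity grows by 5% at every step."""
    return 1.05 * x

run_1 = simulate(grow_5_percent, 100.0, 10)
run_2 = simulate(grow_5_percent, 100.0, 10)
run_3 = simulate(grow_5_percent, 200.0, 10)

print(run_1 == run_2)  # True: same law + same initial condition -> same path
print(run_1 == run_3)  # False: a different initial condition changes the path
```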

Look at the code snippet below. It is an extremely simple example of a deterministic time series: at each step, the function computes the next power of an initial state. Note that by knowing the initial condition of the system (defined by the parameter initial_state) and the number of steps calculated, we can trace the whole process.

import pandas as pd
import matplotlib.pyplot as plt

def deterministic(initial_state, steps):
  '''
  Simulate a simple deterministic system.

  Step 0 is the initial state. At every following step, the state is
  multiplied by the initial state, so the value at step k is
  initial_state ** (k + 1). Returns the steps + 1 realizations.
  Parameters:
  :initial_state: integer, the initial condition of the system
  :steps: integer, the number of steps to simulate
  '''
  state = initial_state
  realisations = [state]
  for _ in range(steps):
    # The rule is fixed: the next state is fully determined by the current one
    state = state * initial_state
    realisations.append(state)
  return realisations

results = deterministic(2, 10)
s = pd.Series(results)
plt.plot(s)
plt.xlabel('Step')
plt.ylabel('Value of system')
plt.show()
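Because the system is deterministic, we don’t even need to run the loop to know its state. The value at any step can be computed directly from the initial condition. The sketch below assumes the rule described in the text: each step multiplies the state by the initial state, and step 0 is the initial state itself, so the value at step k is initial_state ** (k + 1). The function names `iterate_powers` and `closed_form` are hypothetical and introduced only for this comparison.

```python
def iterate_powers(initial_state, steps):
    """Step through the system one update at a time (step 0 is the initial state)."""
    state = initial_state
    values = [state]
    for _ in range(steps):
        state = state * initial_state
        values.append(state)
    return values

def closed_form(initial_state, step):
    """Compute the state at a given step directly, without simulating."""
    return initial_state ** (step + 1)

simulated = iterate_powers(2, 10)
direct = [closed_form(2, k) for k in range(11)]
print(simulated == direct)  # True
print(closed_form(2, 10))   # 2048
```

No stochastic process allows this kind of shortcut. Its future values can only be described in terms of probabilities.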

Stochastic processes

Stochastic processes, on the other hand, have a built-in random component. This doesn’t mean that the whole system is random, though. Think, for instance, of weather: If you had to guess what the weather will be like tomorrow, expecting it to be similar to today’s wouldn’t be a bad approximation. However, you couldn’t possibly know exactly how many minutes of sunshine there will be or how many millimeters of rain will fall at any given time, and even the best forecast will only give you a confidence interval. In other words, weather patterns are sticky and partly forecastable, but never entirely.
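One common way to model this kind of "sticky" behavior is an autoregressive process of order 1, or AR(1). In an AR(1) process, today’s value is a fraction of yesterday’s value plus a random shock: x_t = phi * x_{t-1} + e_t. The sketch below assumes phi = 0.8 and standard normal shocks. These are arbitrary, illustrative values and were not estimated from any weather data. The persistence shows up as a high correlation between consecutive values. The random shock is the reason a forecast for the next step comes as an interval rather than a single certain number.

```python
import numpy as np

rng = np.random.default_rng(42)
phi = 0.8      # assumed persistence: how much of yesterday carries over
sigma = 1.0    # assumed standard deviation of the daily random shock
n = 5000

# Simulate an AR(1) process: x_t = phi * x_{t-1} + e_t
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal(0, sigma)

# Lag-1 autocorrelation: how similar each value is to the previous one
lag1_corr = np.corrcoef(x[:-1], x[1:])[0, 1]
print(f"lag-1 autocorrelation: {lag1_corr:.2f}")  # close to phi

# One-step-ahead forecast with a 95% interval, assuming phi and sigma are known
point = phi * x[-1]
lower, upper = point - 1.96 * sigma, point + 1.96 * sigma
print(f"forecast: {point:.2f}, 95% interval: [{lower:.2f}, {upper:.2f}]")
```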

Another example is a country’s gross domestic product (GDP). Modern economists are confident that the value of a country’s production of goods and services tends to increase under the current economic system. However, the upward trend is not monotonic, and it’s usually broken by small and big variations up and down, as you can see in the figure below, which shows the US GDP from January 2000 to July 2022.
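A simple way to mimic a series that trends upward without rising at every step is a random walk with drift. Each step adds a small positive constant (the drift) plus a random shock. The sketch below uses a drift of 0.5 and standard normal shocks over 200 steps. These values are arbitrary and are not fitted to GDP data. The resulting path ends well above where it started, yet it contains many down steps, so the trend is not monotonic.

```python
import numpy as np

rng = np.random.default_rng(0)
drift = 0.5    # assumed average increase per step
steps = 200

# Each increment = constant drift + standard normal shock
increments = drift + rng.normal(0, 1, size=steps)
path = np.concatenate([[0.0], np.cumsum(increments)])

down_steps = int(np.sum(np.diff(path) < 0))
print(f"start: {path[0]:.1f}, end: {path[-1]:.1f}")
print(f"steps that went down: {down_steps} out of {steps}")
```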

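The code snippet below is the stochastic counterpart of the deterministic example: a simple random walk, in which each step adds a random shock drawn from a standard normal distribution to the previous state. Run it several times, and the path changes every time, even though the initial state and the number of steps stay the same.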

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def random_walk(initial_state, steps):
  '''
  Simulate a simple random walk.

  Starting from initial_state, each step adds an independent random shock
  drawn from a standard normal distribution (mean 0, standard deviation 1).
  Returns the initial state followed by the state after each step.
  Parameters:
  :initial_state: number, the starting value of the walk
  :steps: integer, the number of random steps to take
  '''
  state = initial_state
  realisations = [state]
  for _ in range(steps):
    # The next state is the current one plus an unpredictable shock
    state = state + np.random.normal(0, 1)
    realisations.append(state)
  return realisations

results = random_walk(0, 10)
s = pd.Series(results)
plt.plot(s)
plt.xlabel('Step')
plt.ylabel('Value of system')
plt.show()
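A random walk like the one above has a notable property: its uncertainty grows over time. Each step adds an independent shock with variance 1, so the variance of the position after t steps is t, starting from a fixed initial state. The sketch below checks this empirically by simulating many walks at once with `np.cumsum` instead of a loop. The number of paths (10,000) is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(7)
n_paths, steps = 10_000, 100

# Each row is one random walk starting at 0 with standard normal increments
shocks = rng.normal(0, 1, size=(n_paths, steps))
walks = np.cumsum(shocks, axis=1)  # column t-1 holds the position after t steps

var_at_10 = walks[:, 9].var()
var_at_100 = walks[:, 99].var()
print(f"variance after 10 steps:  {var_at_10:.1f}")   # roughly 10
print(f"variance after 100 steps: {var_at_100:.1f}")  # roughly 100
```

This growing spread is one reason why forecasts of a random walk become less precise the further ahead they look.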
