Introduction to Markov Decision Process (MDP)

Learn the basics of Markov decision processes and their use in commercial games.


The approach

Markov decision processes (MDPs) are a way to formulate problems in which an agent takes actions to maximize rewards in a fully observable environment. An MDP consists of a set of world states, $S$, and a set of actions available from those states, $A$. The MDP may be probabilistic in that the actual result of taking an action (the next state $s'$) may depend on a probabilistic transition model $P(s' \mid s, a)$. Each time the agent acts, it receives a reward $r$, though that reward may be zero or negative. The reward may be the same whenever the agent enters a particular state, or it may depend on the state and action that led there. An optimal policy for an MDP maps any world state $s$ to an action $a$ that maximizes the expected future reward (usually discounted over time to prioritize more immediate rewards).
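One standard way to make this precise is the Bellman optimality relationship. Here $\gamma \in [0, 1)$ is a discount factor and $V^*(s')$ is the optimal expected future reward from the successor state; both symbols are introduced here for illustration rather than taken from the text above:

$$
\pi^*(s) = \arg\max_{a \in A} \sum_{s' \in S} P(s' \mid s, a)\,\bigl[r(s, a, s') + \gamma\, V^*(s')\bigr]
$$

In words: the optimal action in state $s$ is the one with the best expected immediate reward plus discounted value of wherever the action leads.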

An MDP is commonly used as a problem formulation in reinforcement learning because an optimal policy for an MDP can be found through repeated simulation and iteration, without prior human-labeled data, as long as the world state is clearly defined, the action space is limited, and the reward function is known.
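As one illustration of such iteration, here is a minimal value-iteration sketch in Python. The two-state MDP, its transition probabilities, and its rewards are all invented for this example; only the repeated Bellman backup reflects the idea described above:

```python
# Minimal value iteration on a tiny, hypothetical MDP.
# States, actions, transitions, and rewards below are made up for illustration.

GAMMA = 0.9   # discount factor prioritizing more immediate rewards
THETA = 1e-6  # convergence threshold

# P[(s, a)] -> list of (probability, next_state, reward) triples.
P = {
    ("s0", "left"):  [(1.0, "s0", 0.0)],
    ("s0", "right"): [(0.8, "s1", 0.0), (0.2, "s0", 0.0)],
    ("s1", "left"):  [(1.0, "s0", 0.0)],
    ("s1", "right"): [(0.8, "s2", 1.0), (0.2, "s1", 0.0)],  # s2 is terminal
}
states = ["s0", "s1"]
actions = ["left", "right"]

V = {s: 0.0 for s in states}
V["s2"] = 0.0  # terminal state contributes no future reward

while True:
    delta = 0.0
    for s in states:
        # Bellman backup: best expected discounted return over all actions.
        best = max(
            sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[(s, a)])
            for a in actions
        )
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < THETA:  # values have stopped changing; stop iterating
        break

# Extract the optimal policy greedily from the converged values.
policy = {
    s: max(actions,
           key=lambda a: sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[(s, a)]))
    for s in states
}
print(V, policy)
```

Because the transition model and reward function are fully specified here, no labeled training data is needed: the loop simply sweeps the states until the value estimates converge, and the policy is read off at the end.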
