Extension of Markov Decision Process

Learn the basics of the POMDP approach, an extension of the MDP approach.

Disadvantages of MDPs

MDPs require a fully observable world state. They assume that the agent has all the information necessary to make its decisions, with no ambiguity about the current situation. They allow probabilistic outcomes of actions but not probabilistic observations of the state. Partially observable Markov decision processes (POMDPs) relax this assumption. They require only that the world state be consistently defined so that it can be explored and discovered, not that it is fully available to the agent in every situation. POMDPs are therefore more complex, and they can find good policies (better than those MDPs produce) for more difficult problems. However, there are currently no general-purpose methods for solving POMDPs exactly in a reasonable amount of time.
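To make the distinction concrete, the sketch below shows the core mechanism a POMDP adds on top of an MDP: instead of reading the true state directly, the agent maintains a belief (a probability distribution over states) and updates it from noisy observations. The transition and observation matrices here are small illustrative placeholders, not taken from any particular game or paper.

```python
import numpy as np

n_states = 3
T = np.array([  # T[s, s'] = P(s' | s, a) for one fixed action (assumed values)
    [0.8, 0.2, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.8],
])
O = np.array([  # O[s', o] = P(o | s') for two possible observations (assumed values)
    [0.9, 0.1],
    [0.5, 0.5],
    [0.1, 0.9],
])

def belief_update(belief, obs):
    """One step of the POMDP belief update: predict with T, correct with O."""
    predicted = belief @ T                 # prediction step over hidden states
    unnormalized = predicted * O[:, obs]   # weight by observation likelihood
    return unnormalized / unnormalized.sum()

belief = np.array([1/3, 1/3, 1/3])         # start fully uncertain
belief = belief_update(belief, obs=1)
print(belief)  # belief now leans toward states that best explain observation 1
```

In an MDP, the agent would simply be told which of the three states it is in; here it can only narrow down a distribution over them.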

Baker and Tenenbaum (2014) developed a plan recognition system that uses POMDPs. Because POMDPs can explicitly represent incomplete knowledge about other agents, Baker and Tenenbaum treated reasoning with POMDPs as a form of theory of mind: the assumption that other agents act according to an internal logic that reflects their current mental state, that is, their beliefs about the world and their desires. Specifically, they defined Bayesian theory of mind (BToM) as a formal version of theory of mind in which other agents are assumed to act to satisfy their own goals according to their own beliefs, but in a way that may not be perfectly optimal or based on unerring rational decision-making. BToM uses probabilistic modeling (with POMDPs) to represent approximate adherence to a rational decision-making process, with room for variation and error.
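Approximate adherence to rational decision-making is commonly modeled with a softmax (Boltzmann) choice rule, where better actions are more likely but never certain. The sketch below illustrates that idea; the action values and the rationality parameter beta are illustrative assumptions, not values from Baker and Tenenbaum (2014).

```python
import numpy as np

def noisy_rational_policy(action_values, beta=2.0):
    """Softmax over action values: higher beta means closer to fully optimal."""
    prefs = np.exp(beta * (action_values - np.max(action_values)))
    return prefs / prefs.sum()

values = np.array([1.0, 0.5, 0.2])    # hypothetical values of three actions
print(noisy_rational_policy(values))  # best action is most probable, not guaranteed
```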

What do the beliefs and desires represent?

Baker and Tenenbaum (2014) represented the observed agent’s beliefs and desires using a dynamic Bayesian network (DBN) and performed forward and backward inference with standard POMDP solution-approximation methods. Forward modeling of the POMDP corresponds to finding a policy, and inverting that model corresponds to inferring the agent’s plan. This inverse planning approach is common in plan recognition, and in this case, it also naturally incorporates probabilistic uncertainty. Specifically, desires in their model are represented as objects or events. Beliefs are represented as possible world states that may be true, each associated with a probability of being true. The observer agent explicitly tracks the observed agent’s likely beliefs and desires according to a Bayesian model that incorporates uncertainty about both.
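The sketch below illustrates the inverse planning idea in miniature: given the likelihood of the observed actions under each candidate desire (which, in the full model, would come from forward-solving the POMDP), Bayes' rule yields a posterior over desires. The desire names, prior, trajectory, and likelihood values are hypothetical stand-ins for illustration.

```python
import numpy as np

desires = ["truck_A", "truck_B", "truck_C"]
prior = np.array([1/3, 1/3, 1/3])      # P(desire) before observing anything

def action_likelihood(observed_actions, desire):
    """P(observed actions | desire). In the real model this comes from the
    forward POMDP solution; here it is a stand-in lookup for illustration."""
    table = {
        "truck_A": 0.40,
        "truck_B": 0.05,
        "truck_C": 0.10,
    }
    return table[desire]

observed_actions = ["north", "north", "east"]  # hypothetical trajectory
likelihoods = np.array([action_likelihood(observed_actions, d) for d in desires])
posterior = prior * likelihoods
posterior /= posterior.sum()                   # Bayes' rule: P(desire | actions)

for d, p in zip(desires, posterior):
    print(f"{d}: {p:.2f}")
```

The same update can be run after every new observed action, so the observer's estimate of the agent's desires sharpens as the trajectory unfolds.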

Reliability of the approach

Testing this approach in a relatively simple domain, a simulation of a student looking for and choosing between three food trucks, Baker and Tenenbaum (2014) found that their model was more effective at predicting the agent’s goals and plans than two simplified models. One of these alternative models assumed that the agent had complete knowledge of the world state before beginning its search, essentially treating the reasoning approach as an MDP instead of a POMDP. In the other, baseline model, the agent’s modeled beliefs were fixed, based on a prior distribution over the possible world states, and were not updated as the agent explored the environment. Compared to both simplified models, the BToM model was more successful in describing the internal beliefs and desires of the observed agent in a way that matched human judgments.

Can we apply POMDP in current commercial games?

The technique relies on a precise formulation of the problem space as a POMDP. Thus, limitations similar to those discussed for MDP-based techniques apply here as well. Specifically, there must be a compact world state and a clearly defined forward simulation, and the prior probability distributions or tables for player beliefs and desires must be set, either by expert judgment or by learning from data. In the case of the simple food truck domain, this is relatively easy: the agents can be expected to travel reasonably short paths toward their desired food truck once they see it.
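As a rough illustration of what such a formulation might involve, the sketch below collects the main ingredients for a food-truck-style domain: a compact state and action space plus prior tables for the player's beliefs and desires. The names and probability values are illustrative assumptions, not the actual formulation used by Baker and Tenenbaum (2014).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class FoodTruckPOMDP:
    """Hypothetical problem definition for a small food-truck domain."""
    grid_size: Tuple[int, int] = (5, 5)
    actions: List[str] = field(
        default_factory=lambda: ["north", "south", "east", "west", "eat"]
    )
    trucks: List[str] = field(
        default_factory=lambda: ["korean", "lebanese", "mexican"]
    )
    # Prior over which truck is parked at the hidden spot (player beliefs).
    truck_location_prior: Dict[str, float] = field(
        default_factory=lambda: {"korean": 0.5, "lebanese": 0.3, "mexican": 0.2}
    )
    # Prior over which truck the player actually wants (player desires).
    desire_prior: Dict[str, float] = field(
        default_factory=lambda: {"korean": 1/3, "lebanese": 1/3, "mexican": 1/3}
    )

problem = FoodTruckPOMDP()
print(problem.desire_prior)  # tables like these must be authored or learned from data
```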

In other games, with more complex decision-making, the problem formulation may not be nearly as easy. Moreover, POMDPs are known to be significantly more computationally complex than MDPs, and inverse planning is more computationally intensive than planning (Baker and Tenenbaum, 2014). In terms of computational hardness, solving POMDPs is PSPACE-complete (Papadimitriou and Tsitsiklis, 1987), placing them in a class of problems believed to be even harder than NP-complete problems, for which no efficient solutions are known.

It is unclear whether it is currently computationally feasible even to approximate an inverse planning solution for POMDPs in complex games. It is therefore no surprise that we have yet to see more examples of this approach in commercial or modern games. In the future, however, better inverse planning solvers and increased computing power may alleviate these concerns, and we may see these approaches become viable for commercial games.
