Formalization of the T-Maze Problem

Get an idea about imitation learning and the T-maze problem.

In supervised learning, we assumed that a teacher supplies detailed information about the desired response of a learner. This was particularly well suited to object recognition, where we had a large number of labeled examples. A much more common learning setting is one in which an agent, such as a human or a robot, has to learn to make decisions in its environment. In what follows, the agent is a machine learner that we implement in software, but it is useful to think of the agent as a system that can act in the world, like a robot or a human.

Imitation learning

A good example of such a learning task is learning to play tennis. Here, the agent might try out moves and be rewarded by the points it scores, rather than relying on a teacher who specifies every muscle movement it needs to make or, in the case of a robot, an engineer who designs every sequence of motor activations. One approach that does resemble supervised learning is that of a trainer who demonstrates the correct moves; this type of supervised learning is called imitation learning. Much of imitation learning follows the previous discussion, so in this chapter we will concentrate on an important learning scenario in which the agent only receives simple feedback, in the form of reward or punishment, after periods of actions, without any indication of which actions contributed to the outcome. This type of learning scenario is called reinforcement learning (RL).
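Since this section's title refers to the T-maze, here is a minimal sketch of such a delayed-reward task, assuming the standard layout: the agent walks down a corridor and must choose left or right at the junction, and a single reward arrives only at the end of the episode. The class and action names (`TMazeEnv`, `"forward"`, `"left"`, `"right"`) are illustrative assumptions, not part of any particular library.

```python
import random

class TMazeEnv:
    """A minimal T-maze sketch (assumed layout, not a formal spec):
    the agent walks down a corridor and chooses left or right at the
    junction; only one arm is rewarded. Feedback arrives only at the
    end of the episode, not after each individual action."""

    def __init__(self, corridor_length=4, rewarded_arm="left"):
        self.corridor_length = corridor_length
        self.rewarded_arm = rewarded_arm
        self.reset()

    def reset(self):
        self.position = 0          # start at the base of the T
        self.done = False
        return self.position

    def step(self, action):
        """action: 'forward' in the corridor; 'left'/'right' at the
        junction. Returns (observation, reward, done)."""
        if self.done:
            raise RuntimeError("Episode finished; call reset().")
        if self.position < self.corridor_length:
            # In the corridor, only 'forward' makes progress; no reward yet.
            if action == "forward":
                self.position += 1
            return self.position, 0.0, False
        # At the junction the choice ends the episode with one reward
        # signal; the agent is never told which earlier actions mattered.
        self.done = True
        reward = 1.0 if action == self.rewarded_arm else -1.0
        return self.position, reward, True


# A random agent: act until the episode ends, observing only the final reward.
env = TMazeEnv()
obs, total_reward, done = env.reset(), 0.0, False
while not done:
    if obs < env.corridor_length:
        action = "forward"
    else:
        action = random.choice(["left", "right"])
    obs, reward, done = env.step(action)
    total_reward += reward
print("episode reward:", total_reward)
```

Note how the reward of plus or minus one is issued only once, at the junction: this is exactly the sparse, delayed feedback that distinguishes reinforcement learning from the step-by-step supervision of imitation learning.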
