Actors and Actor-Critics

Learn about policy search and actor-critic schemes.

We'll cover the following

Policy search

So far, we have focused on finding a value function, and we derived from this the greedy policy as the action that leads to the state with the largest return. The value function can be seen as a critic that adapts the policy. Another approach, especially when using function approximators, is to consider a parameterized policy directly and search for good policy parameters. Such an approach is called an actor. We need to find parameters that maximize the payoff. We illustrated such a setting in this figurefig101c. It’s now time to think about the implementation of this actor as a deep neural network that takes observations such as pictures from a camera and produces outputs such as motor commands for a mobile robot.

Get hands-on with 1200+ tech skills courses.