Model-Free Reinforcement Learning

Learn about model-free reinforcement learning.

Temporal difference method for value iteration

In the previous lesson, we assumed a model of the environment through explicit knowledge of the functions τ(s, a) and ρ(s, a). Although the Bellman equations have been known since the 1950s, their usefulness has been limited because finding these environment functions can be difficult. This is one reason such RL techniques have not gained more traction.
