Proximal Policy Optimization
Policy Gradient Methods
Policy gradient methods optimize the policy directly by ascending the gradient of expected return.
REINFORCE Algorithm
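The policy gradient theorem gives the gradient of the expected return $J(\theta)$ with respect to the policy parameters $\theta$:
$$\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[\sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t\right]$$
where $G_t$ is the discounted return from time step $t$. REINFORCE estimates this expectation from sampled trajectories.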
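As a rough illustration of how a REINFORCE update is assembled, the sketch below computes the discounted return for every step of one episode (written $G_t$ here) and combines it with per-step log-probabilities into a scalar loss. It assumes the common convention $G_t = \sum_{k \ge t} \gamma^{k-t} r_k$; the function names, the default `gamma`, and the plain-list inputs are illustrative choices rather than anything prescribed by the lesson. In practice the log-probabilities would come from an autodiff framework so that the loss can be differentiated.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = sum_{k >= t} gamma**(k - t) * r_k for each step t.

    Assumed convention: rewards[t] is the reward received at step t.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Accumulate backwards so each G_t reuses G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def reinforce_loss(log_probs, returns):
    """Scalar surrogate: minus the sum of log pi(a_t | s_t) * G_t.

    Minimizing this with respect to the policy parameters follows a
    single-trajectory estimate of the policy gradient.
    """
    return -sum(lp * g for lp, g in zip(log_probs, returns))


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    returns = discounted_returns(rewards, gamma=0.5)
    print(returns)  # [1.75, 1.5, 1.0]
    print(reinforce_loss([-1.0, -2.0, -0.5], returns))
```

Computing the returns in a single backward pass avoids re-summing the tail of the episode for every step.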
PPO Clip Objective
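PPO constrains policy updates by maximizing a clipped surrogate objective:
$$L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right]$$
where $r_t(\theta) = \dfrac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}$ is the probability ratio between the new and old policies, $\hat{A}_t$ is an advantage estimate, and $\epsilon$ is the clip range.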
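To make the clipping behavior concrete, here is a minimal sketch of the clipped surrogate averaged over a batch of samples. It assumes the probability ratio is obtained from log-probabilities under the new and old policies and that advantage estimates are already available; the function name, the argument names, and the default `epsilon=0.2` are illustrative assumptions, not part of the lesson.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, epsilon=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over the batch.

    r is the probability ratio pi_new(a | s) / pi_old(a | s), computed
    here as exp(new_log_prob - old_log_prob).
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
        # Taking the minimum keeps the pessimistic (lower) estimate.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Ratio of 1 (policy unchanged): the objective equals the advantage.
    print(ppo_clip_objective([0.0], [0.0], [3.0]))
    # Ratio of about 2.72 with a positive advantage: capped at 1 + epsilon.
    print(ppo_clip_objective([1.0], [0.0], [1.0]))
    # Same ratio with a negative advantage: the unclipped term is lower, so it is kept.
    print(ppo_clip_objective([1.0], [0.0], [-1.0]))
```

The three calls show the asymmetry of the objective: gains from moving the ratio beyond the clip range are capped, while penalties for a move that makes a bad action more likely are not.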