Policy Gradient Methods: Teaching Your Agent to Climb Mountains
From REINFORCE to Actor-Critic: how gradient ascent on policy parameters unlocks continuous and high-dimensional action spaces.
Intermediate · ~3 hours · 3 notebooks
Topics: Policy Parameterization, Softmax Policy, Policy Gradient Theorem, REINFORCE Algorithm, Variance Reduction, Baseline Methods, Actor-Critic, Advantage Function
Curator of this Module
Dr. Rajat Dandekar
Course Instructor
Dr. Rajat Dandekar is a researcher and educator specializing in AI/ML, with a passion for making complex concepts accessible through intuitive explanations and hands-on learning.
Learning Path
Article
Notebook 1
Notebook 2
Notebook 3 (Case Study)
Certificate