Group-Relative Policy Optimization (GRPO) -- From Scratch
How DeepSeek eliminated the critic network and made RLHF simpler, cheaper, and better. From group-relative advantages to training reasoning models.
Intermediate, ~3 hours, 3 notebooks
Topics: GRPO, Group-Relative Advantages, PPO vs GRPO, DeepSeek-R1, Verifiable Rewards, RLHF without Critic
Curator of this Module
Dr. Rajat Dandekar
Course Instructor
Dr. Rajat Dandekar is a researcher and educator specializing in AI/ML, with a passion for making complex concepts accessible through intuitive explanations and hands-on learning.
Learning Path
Article
Notebook 1
Notebook 2
Notebook 3
Case Study
Certificate