Group-Relative Policy Optimization (GRPO) -- From Scratch

How DeepSeek eliminated the critic network and made RLHF simpler, cheaper, and better. From group-relative advantages to training reasoning models.

intermediate | ~3 hours | 3 notebooks

GRPO, Group-Relative Advantages, PPO vs GRPO, DeepSeek-R1, Verifiable Rewards, RLHF without Critic

Curator of this Module

Dr. Rajat Dandekar

Course Instructor

Dr. Rajat Dandekar is a researcher and educator specializing in AI/ML, with a passion for making complex concepts accessible through intuitive explanations and hands-on learning.

Learning Path

Article

Notebook 1

Notebook 2

Notebook 3

Case Study

Certificate

Read the Article

Practice with Notebooks

Apply Your Knowledge

Get Your Certificate