Building a Reasoning Model from Scratch

How to teach a small language model to think step by step using reinforcement learning with verifiable rewards: from supervised fine-tuning (SFT) on chain-of-thought data, through GRPO training, to distillation.

Intermediate, ~3 hours, 4 notebooks

Topics: Reasoning Models, Chain-of-Thought, Supervised Fine-Tuning, GRPO, Verifiable Rewards, Rejection Sampling, Distillation, DeepSeek-R1

Curator of this Module

Dr. Rajat Dandekar

Course Instructor

Dr. Rajat Dandekar is a researcher and educator specializing in AI/ML, with a passion for making complex concepts accessible through intuitive explanations and hands-on learning.

Learning Path

Article

Notebook 1

Notebook 2

Notebook 3

Notebook 4

Case Study

Certificate
