Vizuara AI Pods

Building a Reasoning Model from Scratch

How to teach a small language model to think step by step using reinforcement learning with verifiable rewards, from supervised fine-tuning (SFT) on chain-of-thought data, through GRPO training, to distillation.

Intermediate | ~3 hours | 4 notebooks

Topics: Reasoning Models, Chain-of-Thought, Supervised Fine-Tuning, GRPO, Verifiable Rewards, Rejection Sampling, Distillation, DeepSeek-R1
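The core idea behind verifiable rewards and GRPO fits in a few lines of plain Python. This is a minimal illustrative sketch, not the module's own code.

It rests on a few assumptions:
- The model ends its reasoning with a final answer inside hypothetical <answer>...</answer> tags.
- Each completion scores 1 or 0 by exact match against a known reference answer.
- Each reward becomes a GRPO-style group-relative advantage. It is standardized against the mean and standard deviation of all completions sampled for the same prompt.

The tag format, the function names, and the use of the population standard deviation are all choices made for illustration.

```python
import re
import statistics

# Hypothetical answer format (an assumption): the model is expected to wrap its
# final answer in <answer>...</answer> tags after its step-by-step reasoning.
ANSWER_RE = re.compile(r"<answer>\s*(.*?)\s*</answer>", re.DOTALL)


def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Return 1.0 if the extracted final answer exactly matches the reference, else 0.0."""
    match = ANSWER_RE.search(completion)
    if match is None:
        return 0.0  # no parsable final answer, so no reward
    return 1.0 if match.group(1) == ground_truth.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standardize each reward against the mean and std of its own sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an illustrative choice
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four made-up completions sampled for the prompt "What is 12 * 7?"
    group = [
        "12 * 7 = 84. <answer>84</answer>",
        "12 * 7 = 74. <answer>74</answer>",
        "10 * 7 = 70, 2 * 7 = 14, 70 + 14 = 84. <answer>84</answer>",
        "The answer is 84.",  # right value, but no answer tags, so it earns nothing
    ]
    rewards = [verifiable_reward(c, "84") for c in group]
    print(rewards)  # [1.0, 0.0, 1.0, 0.0]
    print([round(a, 3) for a in group_relative_advantages(rewards)])  # [1.0, -1.0, 1.0, -1.0]
```

Completions that beat their group's average get positive advantages, and the rest get negative ones. The baseline therefore comes from the group itself rather than from a learned value model. If every completion in a group earns the same reward, all advantages are zero, and that prompt contributes no learning signal.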

Curator of this Module

Dr. Rajat Dandekar

Course Instructor

Dr. Rajat Dandekar is a researcher and educator specializing in AI/ML, with a passion for making complex concepts accessible through intuitive explanations and hands-on learning.

Learning Path

Article
Notebook 1
Notebook 2
Notebook 3
Notebook 4
Case Study
Certificate