Mini-SWE-RL: Teaching a Small Language Model to Fix Bugs with Reinforcement Learning
How to build the same RL pipeline used by state-of-the-art SWE agents like DeepSWE, miniaturized to run on your laptop in 30 minutes. Covers everything from puzzle design to GRPO training to results analysis.
Level: Intermediate | Duration: ~3 hours | 4 notebooks
Topics: GRPO for Code Fixing, Binary Reward RL, The Sweet Spot Problem, Policy Gradient Training Loop, From Laptop to Production Scale
Curator of this Module
Dr. Rajat Dandekar
Course Instructor
Dr. Rajat Dandekar is a researcher and educator specializing in AI/ML systems. He creates hands-on courses that teach complex concepts from first principles.
Learning Path
Article
Notebook 1
Notebook 2
Notebook 3
Notebook 4
Case Study
Certificate