Practice Notebooks

Work through each notebook sequentially. Complete the exercises to unlock the next one.


RL Foundations: The Four Elements

Compare RL to supervised and unsupervised learning, understand the four elements of RL (policy, reward, value function, model), and build a Tic-Tac-Toe agent that learns through self-play.

45 min · 2 exercises · Narrated

MDPs, Rewards, and the Markov Property

Implement a complete MDP from scratch (the recycling robot), compute returns with and without discounting, verify the Markov property computationally, and find optimal policies through exhaustive search.

50 min · 2 exercises · Narrated
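
As a preview of the "returns with and without discounting" exercise, here is a minimal sketch of how a return can be computed from a finite reward sequence. The function name, the reward values, and the choice of gamma are illustrative assumptions, not taken from the notebook itself.

```python
def discounted_return(rewards, gamma=1.0):
    """Return G_0 = sum over k of gamma**k * rewards[k] for a finite episode."""
    g = 0.0
    # Work backwards so each step applies G_t = r_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical four-step episode with a reward of 1.0 at every step.
rewards = [1.0, 1.0, 1.0, 1.0]
undiscounted = discounted_return(rewards)           # gamma = 1: plain sum of rewards
discounted = discounted_return(rewards, gamma=0.5)  # later rewards count for less
```

With gamma = 1 the return is the plain sum of rewards; with gamma < 1 each later reward is weighted by a further factor of gamma, so the same episode yields a smaller return.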

Your First RL Agent with Gymnasium

Use Gymnasium (the maintained successor to OpenAI Gym) to interact with the CartPole environment, compare random agents to heuristic agents, and implement a Q-learning agent that learns to balance the pole from scratch.

55 min · 2 exercises · Narrated
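
For orientation, the core of a tabular Q-learning agent is a single update rule, sketched below. This is an illustration under stated assumptions rather than the notebook's implementation: the function name, the dictionary-based Q-table, and the hyperparameter values are hypothetical, and because CartPole observations are continuous, a tabular agent would first need to map each observation to a discrete (hashable) state, which is not shown here.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, done,
                      n_actions, alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update in place and return the new Q(s, a)."""
    # Bootstrapped target: reward plus the discounted best next-state value.
    # A terminal transition has no future value, so the bootstrap term is zero.
    best_next = 0.0 if done else max(q[(next_state, a)] for a in range(n_actions))
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha of the way toward the target.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Q-table keyed by (state, action); unseen pairs default to 0.0.
q = defaultdict(float)

# Hypothetical transition: in state 0, action 1 earned reward 1.0 and led to state 1.
new_value = q_learning_update(q, state=0, action=1, reward=1.0,
                              next_state=1, done=False,
                              n_actions=2, alpha=0.5, gamma=0.9)
```

Starting from an all-zero table, the first update moves Q(0, 1) halfway toward the target of 1.0; repeated updates over many episodes propagate value estimates backwards from rewarding transitions.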