Notebook 2 of 3
MDPs, Rewards, and the Markov Property
Implement a complete MDP from scratch (the recycling robot), compute returns with and without discounting, verify the Markov property computationally, and find optimal policies through exhaustive search.
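As a small preview of the "returns with and without discounting" step, here is a minimal sketch. The function name `discounted_return` and the reward sequence are illustrative assumptions, not taken from the notebook; the return is assumed to be the usual sum of rewards weighted by powers of the discount factor gamma.

```python
def discounted_return(rewards, gamma=1.0):
    """Compute G = sum over k of gamma**k * rewards[k].

    Accumulates backwards using G_k = r_k + gamma * G_{k+1}.
    gamma=1.0 gives the plain undiscounted sum.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical reward sequence, chosen only for illustration.
rewards = [1.0, 0.0, 2.0]

undiscounted = discounted_return(rewards)           # 1 + 0 + 2 = 3.0
discounted = discounted_return(rewards, gamma=0.5)  # 1 + 0.5*0 + 0.25*2 = 1.5

print(undiscounted, discounted)
```

With gamma below 1, later rewards count for less, which is why the second value is smaller than the first.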
Ready to Code
Download this notebook and open it in Google Colab, then work through the exercises. The notebook includes voice narration inside Colab.
~50 min · 2 exercises