MDPs, Rewards, and the Markov Property

Implement a complete MDP from scratch (the recycling robot), compute returns with and without discounting, verify the Markov property computationally, and find optimal policies through exhaustive search.
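As a taste of the "returns with and without discounting" step, here is a minimal sketch in Python. The function name `discounted_return` and the reward sequence are illustrative assumptions, not taken from the notebook. It applies the standard recursion G_t = R_{t+1} + gamma * G_{t+1} to a finite list of rewards.

```python
def discounted_return(rewards, gamma=1.0):
    """Return G_0 = sum over k of gamma**k * rewards[k] for a finite sequence.

    `discounted_return` is a hypothetical helper for illustration only.
    """
    g = 0.0
    # Walk backwards so each step applies G_t = R_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical reward sequence, not from the notebook.
rewards = [1.0, 0.0, 2.0, 3.0]

print(discounted_return(rewards))        # gamma = 1: plain sum, 6.0
print(discounted_return(rewards, 0.5))   # gamma = 0.5: 1 + 0 + 0.5 + 0.375 = 1.875
```

With gamma = 1 the return is just the sum of rewards; with gamma < 1 later rewards count for less, which is what keeps the return finite on continuing tasks.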

Ready to Code

Download this notebook, open it in Google Colab, and work through the exercises. The notebook includes voice narration inside Colab.

~50 min · 2 exercises
