Markov Decision Processes

Reinforcement Learning with Gymnasium in Python

Fouad Trad

Machine Learning Engineer

MDP

  • Models RL environments mathematically

Image showing a complex environment (smart city) and the components we should extract from it: states, actions, rewards, and transition probabilities.

Diagram showing how from a complex environment we extract MDP components (states, actions, rewards, and transition probabilities) to solve the environment with model-based RL techniques.

Markov property

  • Future state depends only on current state and action

Image showing a chess board with some arrows for possible moves.
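
Stated formally, the Markov property says that conditioning on the full history adds nothing beyond the current state and action. The notation below is an assumption (it does not appear on the slides): S_t and A_t denote the state and action at time step t, and s' is a candidate next state.

```latex
P(S_{t+1} = s' \mid S_t = s, A_t = a, S_{t-1}, A_{t-1}, \dots, S_0, A_0)
  = P(S_{t+1} = s' \mid S_t = s, A_t = a)
```

In the chess picture, this means the current board position is enough to reason about the next one; the sequence of moves that produced it is not needed.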

Frozen Lake as MDP

  • Agent must reach goal without falling into holes

Image showing the frozen lake environment.

Frozen Lake as MDP - states

  • Positions agent can occupy

Image showing three different positions of the agent within the Frozen Lake environment.
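
A minimal sketch of how positions map to state numbers, assuming the default 4x4 grid with states numbered row by row from the top left. The helper names `state_to_position` and `position_to_state` are hypothetical and not part of Gymnasium.

```python
N_COLS = 4  # width of the default 4x4 grid (assumed)


def state_to_position(state):
    """Convert a state number into a (row, column) pair."""
    return divmod(state, N_COLS)


def position_to_state(row, col):
    """Convert a (row, column) pair into a state number."""
    return row * N_COLS + col


print(state_to_position(6))      # (1, 2)
print(position_to_state(3, 3))   # 15
```

State 0 is the top-left corner and state 15 is the bottom-right corner of the grid.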

Frozen Lake as MDP - terminal states

  • Lead to episode termination

Image showing the terminal states in the frozen lake environment.
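
As an illustration, the terminal states can be read off the grid layout. This sketch assumes the default 4x4 map (S = start, F = frozen, H = hole, G = goal); the variable names are illustrative only.

```python
# Default 4x4 Frozen Lake layout, one string per row (assumed).
GRID = ["SFFF", "FHFH", "FFFH", "HFFG"]

# Flatten the rows so that the string index equals the state number.
tiles = "".join(GRID)

holes = [state for state, tile in enumerate(tiles) if tile == "H"]
goal = tiles.index("G")

# An episode ends when the agent lands in a hole or on the goal.
terminal_states = sorted(holes + [goal])
print(terminal_states)  # [5, 7, 11, 12, 15]
```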

Frozen Lake as MDP - actions

  • Up, down, left, right

Image showing the actions to perform in Frozen Lake along with their associated labels: 0-left, 1-down, 2-right, and 3-up.

Frozen Lake as MDP - transitions

  • Actions don't necessarily lead to expected outcomes

Image showing agent at the top left corner of the frozen lake grid aiming to move right.

Image showing that the agent can move to the right.

Image showing that the agent can also move down.

Image showing that the agent might also stay in the same place.

Image showing that when the agent decides to move right, there are probabilities for the agent to go right, down, or stay in the same place.

  • Transition probabilities: likelihood of reaching a state given a state and action
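
The slippery behavior can be sketched without Gymnasium. This is a hand-rolled model, not the library's implementation: it assumes the default 4x4 grid and assumes the agent moves in the intended direction or in one of the two perpendicular directions, each with probability 1/3. The names `move` and `transition_probs` are hypothetical.

```python
from collections import defaultdict

N_ROWS, N_COLS = 4, 4  # default 4x4 grid (assumed)

# Action labels as on the slides: 0-left, 1-down, 2-right, 3-up.
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}


def move(state, direction):
    """Return the state reached by one step; the agent stays put at a wall."""
    row, col = divmod(state, N_COLS)
    d_row, d_col = MOVES[direction]
    row = min(max(row + d_row, 0), N_ROWS - 1)
    col = min(max(col + d_col, 0), N_COLS - 1)
    return row * N_COLS + col


def transition_probs(state, action):
    """Map each reachable next state to its probability on slippery ice."""
    probs = defaultdict(float)
    # Intended direction plus the two perpendicular ones, 1/3 each (assumed).
    for direction in ((action - 1) % 4, action, (action + 1) % 4):
        probs[move(state, direction)] += 1 / 3
    return dict(probs)


# Top-left corner (state 0), trying to move right (action 2).
print(transition_probs(0, 2))
```

The printed dictionary has three keys: 1 (moved right), 4 (slipped down), and 0 (slipped up into the wall and stayed in place), matching the three outcomes shown on the slides.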

Frozen Lake as MDP - rewards

  • Reward given only on reaching the goal state

Image showing the agent in the goal state.
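
A small sketch of this reward rule, assuming the default 4x4 map and a reward of 1.0 for reaching the goal and 0.0 otherwise. The function name `reward` is hypothetical.

```python
# Default 4x4 Frozen Lake layout, one string per row (assumed).
GRID = ["SFFF", "FHFH", "FFFH", "HFFG"]
tiles = "".join(GRID)  # string index equals state number


def reward(next_state):
    """Return 1.0 when the agent lands on the goal tile, 0.0 otherwise."""
    return 1.0 if tiles[next_state] == "G" else 0.0


rewarding_states = [state for state in range(len(tiles)) if reward(state) > 0]
print(rewarding_states)  # [15]
```

Falling into a hole ends the episode but carries no penalty under this rule; the agent simply misses the goal reward.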

Gymnasium states and actions

import gymnasium as gym

# Create the slippery Frozen Lake environment (explicit version ID)
env = gym.make('FrozenLake-v1', is_slippery=True)

# Inspect the action space and the observation (state) space
print(env.action_space)
print(env.observation_space)
print("Number of actions:", env.action_space.n)
print("Number of states:", env.observation_space.n)

Discrete(4)
Discrete(16)
Number of actions: 4
Number of states: 16
Gymnasium rewards and transitions

env.unwrapped.P: nested dictionary indexed first by state, then by action; each entry is a list of possible transitions

print(env.unwrapped.P[state][action])
[
  (probability_1, next_state_1, reward_1, is_terminal_1),
  (probability_2, next_state_2, reward_2, is_terminal_2),
  ...
]
]
Gymnasium rewards and transitions - example

state = 6
action = 0

print(env.unwrapped.P[state][action])
[(0.3333333333333333, 2, 0.0, False), 
(0.3333333333333333, 5, 0.0, True), 
(0.3333333333333333, 10, 0.0, False)]

Image showing action numbers: 0-left, 1-down, 2-right, 3-up.

Image showing the agent in state number 6 with states being numbered from the top left to the lower right, line by line.
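
The transition list above is enough to compute simple quantities such as the expected immediate reward and the chance of ending the episode. This sketch copies the printed output for state 6 and action 0 into a plain list, so it runs without Gymnasium; the variable names are illustrative.

```python
# Transitions for state 6, action 0 (left), copied from the output above.
# Each tuple is (probability, next_state, reward, is_terminal).
transitions = [
    (0.3333333333333333, 2, 0.0, False),
    (0.3333333333333333, 5, 0.0, True),
    (0.3333333333333333, 10, 0.0, False),
]

# Expected immediate reward: probability-weighted sum of rewards.
expected_reward = sum(prob * reward for prob, _, reward, _ in transitions)

# Probability that this step ends the episode.
p_terminal = sum(prob for prob, _, _, done in transitions if done)

print(expected_reward)  # 0.0
print(p_terminal)
```

Moving left from state 6 never earns a reward and ends the episode one time in three, because state 5 is a hole.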

Let's practice!
