Navigating the RL framework

Reinforcement Learning with Gymnasium in Python

Fouad Trad

Machine Learning Engineer

RL framework

Image showing an agent component.

Reinforcement Learning with Gymnasium in Python

RL framework

Image showing agent and environment components.

Reinforcement Learning with Gymnasium in Python

RL framework

  • Agent: learner, decision-maker
  • Environment: challenges to be solved

Image showing all RL components including: agent, environment, states, actions, and rewards.

Reinforcement Learning with Gymnasium in Python

RL framework

  • Agent: learner, decision-maker
  • Environment: challenges to be solved
  • State: environment snapshot at given time

Image showing that the environment gives states for the agent.

Reinforcement Learning with Gymnasium in Python

RL framework

  • Agent: learner, decision-maker
  • Environment: challenges to be solved
  • State: environment snapshot at given time
  • Action: agent's choice in response to state

Image showing that the agent responds to the environment's state by executing an action.

Reinforcement Learning with Gymnasium in Python

RL framework

  • Agent: learner, decision-maker
  • Environment: challenges to be solved
  • State: environment snapshot at given time
  • Action: agent's choice in response to state
  • Reward: feedback for agent action

Image showing that the agent responds to the environment's state by executing an action, and receives a reward from the environment based on the executed action.

Reinforcement Learning with Gymnasium in Python

RL interaction loop

env = create_environment()
state = env.get_initial_state()

for i in range(n_iterations): action = choose_action(state)
state, reward = env.execute(action)
update_knowledge(state, action, reward)

Image showing that the agent responds to the environment's state by executing an action, and receives a reward from the environment based on the executed action.

Reinforcement Learning with Gymnasium in Python

Episodic vs. continuous tasks

Episodic tasks
  • Tasks segmented in episodes
  • Episode has beginning and end
  • Example: agent playing chess

Image showing a cat playing chess.

Continuous tasks
  • Continuous interaction
  • No distinct episodes
  • Example: Adjusting traffic lights

Image showing a frog riding a bike and waiting for traffic lights to turn green.

Reinforcement Learning with Gymnasium in Python

Return

  • Actions have long term consequences
  • Agent aims to maximize total reward over time
  • Return: sum of all expected rewards

Image showing that the return is the sum of individual rewards r_1 through r_n.

Reinforcement Learning with Gymnasium in Python

Discounted return

  • Immediate rewards are more valuable than future ones
  • Discounted return: gives more weight to nearer rewards
  • Discount factor ($\gamma$): discounts future rewards

Image showing the formula of the discounted return as the sum of rewards, each multiplied by the discount factor, raised to the power of its respective time step.

Reinforcement Learning with Gymnasium in Python

Discount factor

  • Between zero and one
  • Balances immediate vs. long-term rewards
    • Lower value → immediate gains
    • Higher value → long-term benefits

Image showing the influence of discount factor's extreme values where a value of zero favors only immediate gains, and a value of one favors future gains without discount.

Reinforcement Learning with Gymnasium in Python

Numerical example

import numpy as np
expected_rewards = np.array([1, 6, 3])

discount_factor = 0.9
discounts = np.array([discount_factor ** i for i in range(len(expected_rewards))])
print(f"Discounts: {discounts}")
Discounts: [1.   0.9  0.81]
discounted_return = np.sum(expected_rewards * discounts)
print(f"The discounted return is {discounted_return}")
The discounted return is 8.83
Reinforcement Learning with Gymnasium in Python

Let's practice!

Reinforcement Learning with Gymnasium in Python

Preparing Video For Download...