Balancing exploration and exploitation

Reinforcement Learning with Gymnasium in Python

Fouad Trad

Machine Learning Engineer

Training with random actions

  • Agent explores the environment
  • Strategy is not optimized using what the agent has learned
  • Agent uses its knowledge only once training is done

Image showing an agent within an environment


Exploration-exploitation trade-off

 

  • Balances exploration and exploitation
  • Continuous exploration prevents strategy refinement
  • Exclusive exploitation misses undiscovered opportunities

Image showing an agent that explores new actions to discover more rewards, and an agent that exploits its current knowledge while possibly missing out on some rewards.


Dining choices

Image showing a restaurant table.



Epsilon-greedy strategy

 

  • Explore with probability epsilon
  • Exploit with probability 1-epsilon
  • Ensures continuous exploration while using knowledge

Diagram showing that with a probability epsilon, the agent explores by choosing a random action, and with a probability of 1 - epsilon, it exploits by selecting the best known action.
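As a rough illustration of the rule above, the sketch below applies epsilon-greedy selection many times to a single state and measures how often the best known action is chosen. The function name `epsilon_greedy_action`, the Q-values, and epsilon = 0.2 are hypothetical values picked for the example. Because a random pick can also land on the best action, the greedy action is chosen with probability (1 - epsilon) + epsilon / n_actions, slightly more often than 1 - epsilon.

```python
import numpy as np

def epsilon_greedy_action(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # Explore: uniform random action
    return int(np.argmax(q_values))              # Exploit: best known action

rng = np.random.default_rng(0)
q_values = np.array([0.1, 0.5, 0.2, 0.0])  # hypothetical action values for one state
epsilon = 0.2                               # hypothetical exploration rate

picks = [epsilon_greedy_action(q_values, epsilon, rng) for _ in range(20_000)]
greedy_share = picks.count(1) / len(picks)  # action 1 has the highest value

# A random pick can also hit the greedy action, so the expected share is
# (1 - epsilon) + epsilon / n_actions
expected_share = (1 - epsilon) + epsilon / len(q_values)
print(round(greedy_share, 2), round(expected_share, 2))
```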


Decayed epsilon-greedy strategy

 

  • Reduces epsilon over time
  • More exploration initially
  • More exploitation later on
  • Agent increasingly relies on its accumulated knowledge

Image showing how epsilon decreases over time.
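To make the decay concrete, here is a small sketch of a multiplicative schedule with a lower bound, using the starting value, decay factor, floor, and episode count from the training slides in this deck. The names `schedule` and `first_floor_episode` are introduced only for this example. With these settings epsilon reaches its floor after roughly 4,600 episodes, so the remaining episodes are almost purely exploitative.

```python
epsilon = 1.0          # Starting exploration rate
epsilon_decay = 0.999  # Multiplicative decay applied after every episode
min_epsilon = 0.01     # Floor that keeps a little exploration alive
total_episodes = 10000

schedule = []
for episode in range(total_episodes):
    schedule.append(epsilon)  # Value in effect during this episode
    epsilon = max(min_epsilon, epsilon * epsilon_decay)

# First episode at which epsilon sits on the floor
first_floor_episode = next(i for i, e in enumerate(schedule) if e == min_epsilon)
print(schedule[0], round(schedule[1000], 3), first_floor_episode)
```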


Implementation with Frozen Lake

import gymnasium as gym
import numpy as np

env = gym.make('FrozenLake-v1', is_slippery=True)

action_size = env.action_space.n
state_size = env.observation_space.n
Q = np.zeros((state_size, action_size))

alpha = 0.1              # Learning rate
gamma = 0.99             # Discount factor
total_episodes = 10000

Image showing a snapshot of the Frozen Lake environment.


Implementing epsilon_greedy()

def epsilon_greedy(state):
    if np.random.rand() < epsilon:
        action = env.action_space.sample()  # Explore
    else:
        action = np.argmax(Q[state, :])     # Exploit
    return action

Training epsilon-greedy

epsilon = 0.9   # Exploration rate

rewards_eps_greedy = []
for episode in range(total_episodes):
    state, info = env.reset()
    terminated = False
    episode_reward = 0
    while not terminated:
        action = epsilon_greedy(state)
        new_state, reward, terminated, truncated, info = env.step(action)       
        Q[state, action] = update_q_table(state, action, new_state) 
        state = new_state
        episode_reward += reward
    rewards_eps_greedy.append(episode_reward)
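The loop above calls update_q_table(), which is not defined on these slides. One common choice for such an update is the Q-learning rule, sketched below under that assumption. The function name `q_learning_update` and its explicit signature are hypothetical and differ from the call shown on the slide. The table size assumes the default 4x4 Frozen Lake map, and the value placed in `Q[10, 1]` is made up for the example.

```python
import numpy as np

def q_learning_update(Q, state, action, reward, new_state, alpha=0.1, gamma=0.99):
    """Return the new value for Q[state, action] under the Q-learning rule."""
    old_value = Q[state, action]
    best_next = np.max(Q[new_state])        # Value of the best action in the next state
    td_target = reward + gamma * best_next  # Reward plus discounted future value
    return (1 - alpha) * old_value + alpha * td_target

Q = np.zeros((16, 4))  # Assumes the default 4x4 map: 16 states, 4 actions
Q[10, 1] = 0.5         # Hypothetical value already learned for state 10

# One transition: state 9, action 2, no reward, landing in state 10
Q[9, 2] = q_learning_update(Q, state=9, action=2, reward=0.0, new_state=10)
print(Q[9, 2])
```

The update moves the old estimate a fraction alpha of the way toward the target, so a single transition only nudges the table.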

Training decayed epsilon-greedy

epsilon = 1.0   # Exploration rate
epsilon_decay = 0.999
min_epsilon = 0.01

rewards_decay_eps_greedy = []
for episode in range(total_episodes):
    state, info = env.reset()
    terminated = False
    episode_reward = 0
    while not terminated:
        action = epsilon_greedy(state)
        new_state, reward, terminated, truncated, info = env.step(action)
        episode_reward += reward
        Q[state, action] = update_q_table(state, action, new_state)
        state = new_state
    rewards_decay_eps_greedy.append(episode_reward)
    epsilon = max(min_epsilon, epsilon * epsilon_decay)  # Decay once per episode

Comparing strategies

import matplotlib.pyplot as plt

avg_eps_greedy = np.mean(rewards_eps_greedy)
avg_decay = np.mean(rewards_decay_eps_greedy)
plt.bar(['Epsilon Greedy', 'Decayed Epsilon Greedy'],
        [avg_eps_greedy, avg_decay],
        color=['blue', 'green'])
plt.title('Average Reward per Episode')
plt.ylabel('Average Reward')
plt.show()

Image of a bar plot showing that the average reward achieved with epsilon-greedy is around 0.02 while the one achieved with decayed epsilon-greedy is around 0.55.
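A single average over all episodes hides how performance changes during training. One possible complement, sketched here, is a moving average over the per-episode rewards. The helper `moving_average`, the window of 500, and the synthetic reward list are all assumptions made for this example, not results from the runs above.

```python
import numpy as np

def moving_average(values, window):
    """Mean of each run of `window` consecutive values."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

rng = np.random.default_rng(0)
# Synthetic stand-in for a reward list: success probability rises from 0 to 0.6
success_prob = np.linspace(0.0, 0.6, 10_000)
rewards = (rng.random(10_000) < success_prob).astype(float)

smoothed = moving_average(rewards, window=500)
print(len(smoothed), round(smoothed[0], 2), round(smoothed[-1], 2))
```

Passing `smoothed` to plt.plot() in place of the bar chart would show the trend over episodes rather than one number per strategy.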


Let's practice!
