Expected SARSA

Reinforcement Learning with Gymnasium in Python

Fouad Trad

Machine Learning Engineer

Expected SARSA

  • TD method
  • Model-free technique
  • Updates the Q-table differently from SARSA and Q-learning

Diagram showing the steps involved in expected SARSA including initializing a Q-table, choosing an action to perform, receiving a reward from the environment, and updating the table. The agent continues this loop until convergence is achieved after a certain number of episodes.


Expected SARSA update

SARSA

SARSA update rule: Q(s, a) ← Q(s, a) + α [r + γ Q(s', a') − Q(s, a)]

Q-learning

Q-learning update rule: Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]

Expected SARSA

Expected SARSA update rule: Q(s, a) ← Q(s, a) + α [r + γ Σ_a' π(a'|s') Q(s', a') − Q(s, a)]
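The three methods differ only in which Q-value of the next state they bootstrap from. A minimal NumPy sketch (all numbers below are made up for illustration) makes the contrast concrete:

```python
import numpy as np

# Hypothetical values for one transition (illustrative only)
alpha, gamma, reward = 0.1, 0.99, 1.0
q_old = 0.3                              # current estimate Q(s, a)
Q_next = np.array([0.2, 0.5, 0.1, 0.4])  # Q(s', a') for the 4 next actions
a_next = 1                               # next action actually sampled (for SARSA)
probs = np.full(4, 0.25)                 # uniform policy probabilities pi(a'|s')

sarsa_target = reward + gamma * Q_next[a_next]                   # sampled next action
qlearning_target = reward + gamma * Q_next.max()                 # greedy next action
expected_sarsa_target = reward + gamma * (probs * Q_next).sum()  # policy-weighted average

# All three targets plug into the same TD update:
q_new = q_old + alpha * (expected_sarsa_target - q_old)
```

Expected SARSA averages over all next actions instead of committing to a single sampled or greedy one, which reduces the variance of the update.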


Expected value of next state

Expected SARSA update rule: Q(s, a) ← Q(s, a) + α [r + γ Σ_a' π(a'|s') Q(s', a') − Q(s, a)]

  • Takes into account all actions

Expected Q-value of the next state: E[Q(s', ·)] = Σ_a' π(a'|s') Q(s', a')

  • Random actions → equal probabilities

With equal probabilities π(a'|s') = 1/|A|, the expectation reduces to the mean: E[Q(s', ·)] = (1/|A|) Σ_a' Q(s', a')
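A short check (with hypothetical Q-values) confirms that the probability-weighted sum under a uniform policy is exactly the mean of the row:

```python
import numpy as np

Q_next = np.array([1.0, 2.0, 3.0, 4.0])      # hypothetical Q(s', a') values
n_actions = len(Q_next)
probs = np.full(n_actions, 1.0 / n_actions)  # equal probability per action

expected_q = (probs * Q_next).sum()          # sum over a' of pi(a'|s') * Q(s', a')
# With a uniform policy this equals Q_next.mean()
```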


Implementation with Frozen Lake

import gymnasium as gym
import numpy as np

env = gym.make('FrozenLake-v1',
               is_slippery=False)

num_states = env.observation_space.n
num_actions = env.action_space.n
Q = np.zeros((num_states, num_actions))

gamma = 0.99
alpha = 0.1
num_episodes = 1000

Image showing the Frozen Lake environment


Expected SARSA update rule

def update_q_table(state, action, next_state, reward):
    expected_q = np.mean(Q[next_state])
    Q[state, action] = (1 - alpha) * Q[state, action] + \
                       alpha * (reward + gamma * expected_q)

Expected SARSA update rule: Q(s, a) ← Q(s, a) + α [r + γ Σ_a' π(a'|s') Q(s', a') − Q(s, a)]
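As a quick sanity check, the update can be exercised on a hypothetical transition (the state and action indices below are illustrative, not taken from an actual rollout):

```python
import numpy as np

alpha, gamma = 0.1, 0.99
Q = np.zeros((16, 4))  # 16 states x 4 actions, as in Frozen Lake

def update_q_table(state, action, next_state, reward):
    expected_q = np.mean(Q[next_state])  # uniform policy: expectation = mean
    Q[state, action] = (1 - alpha) * Q[state, action] + \
                       alpha * (reward + gamma * expected_q)

# Hypothetical transition into the goal state with reward 1
update_q_table(state=14, action=2, next_state=15, reward=1.0)
# Q[15] is still all zeros, so the new value is alpha * reward = 0.1
```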


Training

for i in range(num_episodes):
    state, info = env.reset()
    terminated = False
    while not terminated:
        action = env.action_space.sample()
        next_state, reward, terminated, truncated, info = env.step(action)
        update_q_table(state, action, next_state, reward)
        state = next_state

Agent's policy

policy = {state: np.argmax(Q[state]) 
          for state in range(num_states)}
print(policy)
{ 0: 1,  1: 2,  2: 1,  3: 0, 
  4: 1,  5: 0,  6: 1,  7: 0, 
  8: 2,  9: 2, 10: 1, 11: 0, 
 12: 0, 13: 2, 14: 2, 15: 0}

Image showing the policy learned by the agent, showing which action to perform in every state.
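The same dictionary comprehension works on any Q-table. A toy example (made-up values) shows how the greedy policy is extracted and how ties are resolved:

```python
import numpy as np

# Hypothetical Q-table for a 4-state, 2-action toy problem
Q = np.array([[0.1, 0.9],
              [0.8, 0.2],
              [0.0, 0.0],
              [0.5, 0.5]])

policy = {state: int(np.argmax(Q[state])) for state in range(len(Q))}
# np.argmax breaks ties by returning the first index,
# so states 2 and 3 both map to action 0
```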


Let's practice!
