Reinforcement Learning with Gymnasium in Python
Fouad Trad
Machine Learning Engineer
# 0: left, 1: down, 2: right, 3: up
policy = {
    0:1, 1:2, 2:1,
    3:1, 4:3, 5:1,
    6:2, 7:3
}
state, info = env.reset()
terminated = False
while not terminated:
    action = policy[state]
    state, reward, terminated, _, _ = env.step(action)
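The loop above follows the policy until the episode ends. As a minimal sketch of the same idea without the course's custom environment, the block below uses a hypothetical pure-Python stand-in: a 3x3 grid with states 0 to 8, where the `step` helper and its rewards (+10 for entering state 8, -2 for entering states 4 or 7, -1 otherwise) are assumptions inferred from the state-values shown in these slides, not the actual environment code. It also adds a `max_steps` cap, since a policy that cycles would otherwise loop forever.

```python
# Hypothetical stand-in for the custom 3x3 grid environment (not the course's code).
# Assumed rewards: +10 entering state 8, -2 entering states 4 or 7, -1 otherwise.
N_ROWS, N_COLS = 3, 3
TERMINAL_STATE = 8
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}  # left, down, right, up


def step(state, action):
    """Deterministic transition: returns (next_state, reward, terminated)."""
    row, col = divmod(state, N_COLS)
    d_row, d_col = MOVES[action]
    # Moves that would leave the grid keep the agent at the border.
    new_row = min(max(row + d_row, 0), N_ROWS - 1)
    new_col = min(max(col + d_col, 0), N_COLS - 1)
    next_state = new_row * N_COLS + new_col
    if next_state == TERMINAL_STATE:
        reward = 10
    elif next_state in (4, 7):
        reward = -2
    else:
        reward = -1
    return next_state, reward, next_state == TERMINAL_STATE


# 0: left, 1: down, 2: right, 3: up
policy = {0: 1, 1: 2, 2: 1, 3: 1, 4: 3, 5: 1, 6: 2, 7: 3}


def run_episode(policy, start_state=0, max_steps=100):
    """Follow the policy from start_state; return (undiscounted return, visited states)."""
    state = start_state
    total_return = 0
    visited = [state]
    for _ in range(max_steps):  # cap guards against policies that never terminate
        state, reward, terminated = step(state, policy[state])
        total_return += reward
        visited.append(state)
        if terminated:
            break
    return total_return, visited


total_return, visited = run_episode(policy)
print(visited)
print(total_return)
```

Under these assumed rewards, the episode visits states 0, 3, 6, 7, 4, 1, 2, 5, 8 and collects an undiscounted return of 1.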
def compute_state_value(state):
    if state == terminal_state:
        return 0
    action = policy[state]
    _, next_state, reward, _ = env.unwrapped.P[state][action][0]
    return reward + gamma * compute_state_value(next_state)

terminal_state = 8
gamma = 1
V = {state: compute_state_value(state) for state in range(num_states)}
print(V)
{0: 1, 1: 8, 2: 9,
3: 2, 4: 7, 5: 10,
6: 3, 7: 5, 8: 0}
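The recursion applies V(s) = r + gamma * V(s') along the single path the policy follows in a deterministic environment. The sketch below reproduces it without Gymnasium. The transition table `P` is a hypothetical stand-in that mimics the `env.unwrapped.P` layout of `(probability, next_state, reward, terminated)` tuples, and its rewards (+10 for entering state 8, -2 for entering states 4 or 7, -1 otherwise) are assumptions inferred from the values printed above. Passing the policy in as an argument, rather than reading a global, is a design choice of this sketch.

```python
# Hypothetical transition table in the same layout as env.unwrapped.P:
# P[state][action] -> [(probability, next_state, reward, terminated)]
N_ROWS, N_COLS = 3, 3
NUM_STATES = N_ROWS * N_COLS
TERMINAL_STATE = 8
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}  # left, down, right, up


def build_transitions():
    """Build a deterministic grid model under the assumed reward scheme."""
    P = {}
    for state in range(NUM_STATES):
        row, col = divmod(state, N_COLS)
        P[state] = {}
        for action, (d_row, d_col) in MOVES.items():
            new_row = min(max(row + d_row, 0), N_ROWS - 1)
            new_col = min(max(col + d_col, 0), N_COLS - 1)
            next_state = new_row * N_COLS + new_col
            if next_state == TERMINAL_STATE:
                reward = 10  # assumed goal reward
            elif next_state in (4, 7):
                reward = -2  # assumed penalty states
            else:
                reward = -1  # assumed step cost
            P[state][action] = [(1.0, next_state, reward, next_state == TERMINAL_STATE)]
    return P


def compute_state_value(state, policy, P, gamma=1):
    """V(s) = r + gamma * V(s') along the path the policy follows."""
    if state == TERMINAL_STATE:
        return 0
    _, next_state, reward, _ = P[state][policy[state]][0]
    return reward + gamma * compute_state_value(next_state, policy, P, gamma)


P = build_transitions()
policy = {0: 1, 1: 2, 2: 1, 3: 1, 4: 3, 5: 1, 6: 2, 7: 3}
V = {s: compute_state_value(s, policy, P) for s in range(NUM_STATES)}
print(V)
```

For example, state 5 moves down into state 8, so V(5) = 10 + 0 = 10, and state 2 moves down into state 5, so V(2) = -1 + 10 = 9. The recursion only terminates if the policy eventually reaches the terminal state; a policy that cycles would recurse until Python raises a `RecursionError`.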
# 0: left, 1: down, 2: right, 3: up
policy_two = {
    0:2, 1:2, 2:1,
    3:2, 4:2, 5:1,
    6:2, 7:2
}
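# compute_state_value reads the global `policy`, so point it at policy_two first
policy = policy_two
V_2 = {state: compute_state_value(state) for state in range(num_states)}
print(V_2)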
State-values for policy 1
{0: 1, 1: 8, 2: 9,
3: 2, 4: 7, 5: 10,
6: 3, 7: 5, 8: 0}
State-values for policy 2
{0: 7, 1: 8, 2: 9,
3: 7, 4: 9, 5: 10,
6: 8, 7: 10, 8: 0}
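One way to read the two tables side by side is to subtract them state by state. The sketch below copies the values shown above into plain dictionaries; the names `improvement` and `policy_two_dominates` are illustrative and not part of the course code.

```python
# State-values copied from the two tables above.
V = {0: 1, 1: 8, 2: 9, 3: 2, 4: 7, 5: 10, 6: 3, 7: 5, 8: 0}
V_2 = {0: 7, 1: 8, 2: 9, 3: 7, 4: 9, 5: 10, 6: 8, 7: 10, 8: 0}

# Per-state gain from switching to policy 2.
improvement = {state: V_2[state] - V[state] for state in V}

# Policy 2 is at least as good as policy 1 if no state gets worse.
policy_two_dominates = all(gain >= 0 for gain in improvement.values())

print(improvement)
print(policy_two_dominates)
```

Policy 2 matches policy 1 in states 1, 2, 5 and 8 and is strictly better everywhere else, with the largest gain (6) in state 0, so it is the better policy under this comparison.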