Introduction to deep reinforcement learning

Deep Reinforcement Learning in Python

Timothée Carayol

Principal Machine Learning Engineer, Komment

Why Deep Reinforcement Learning

 

  • Traditional RL is suitable for low-dimensional tasks

 

 

  • Many applications have high-dimensional state and/or action space

 

An agent exploring the Frozen Lake environment

An agent playing the classic Space Invaders video game


The ingredients of DRL

 

  1. Reinforcement Learning concepts
  2. Deep Learning and PyTorch

 

  • DRL uses these concepts with deep neural networks

 

Pixel art illustration of a cook mixing ingredients


The RL framework

 

  • Step t:

 

A large box with two smaller boxes inside it, labelled 'Agent' and 'Environment'


The RL framework

 

  • Step t:
    • Agent observes state $s_t$

 

A red arrow with the label State s_t goes from environment to agent.


The RL framework

 

  • Step t:
    • Agent observes state $s_t$
    • Agent takes action $a_t$

 

A red arrow with label action a_t goes from agent to environment. The state arrow is now black.


The RL framework

 

  • Step t:
    • Agent observes state $s_t$
    • Agent takes action $a_t$
  • Step t+1:
    • Environment gives reward $r_t$
    • State evolves to $s_{t+1}$

 

The state s_t arrow updates its label to state s_t+1 and is red again. A new red arrow labelled reward r_t+1 goes from environment to agent. The action arrow is now black.


The RL framework

 

  • Step t:
    • Agent observes state $s_t$
    • Agent takes action $a_t$
  • Step t+1:
    • Environment gives reward $r_t$
    • State evolves to $s_{t+1}$
  • Repeat until episode is complete

 

The same image as the previous slide, but all arrows are black.
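
The interaction cycle above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `CorridorEnv` is a hypothetical toy environment (not part of Gymnasium) whose `reset` and `step` methods merely mimic the Gymnasium calling convention, and the agent picks actions at random rather than learning.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk right along a corridor to reach the goal."""

    def __init__(self, length=5, max_steps=20):
        self.length = length
        self.max_steps = max_steps

    def reset(self):
        self.position = 0
        self.steps = 0
        return self.position, {}  # initial state and an empty info dict

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the position.
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        self.steps += 1
        terminated = self.position == self.length - 1  # goal reached
        truncated = self.steps >= self.max_steps and not terminated  # time limit
        reward = 1.0 if terminated else 0.0
        return self.position, reward, terminated, truncated, {}


def run_episode(env, rng):
    state, _ = env.reset()
    trajectory, rewards = [], []
    done = False
    while not done:
        action = rng.choice([0, 1])  # agent takes action a_t (random here)
        next_state, reward, terminated, truncated, _ = env.step(action)
        trajectory.append((state, action))  # record the pair (s_t, a_t)
        rewards.append(reward)
        state = next_state  # the state evolves to s_{t+1}
        done = terminated or truncated
    return trajectory, rewards


trajectory, rewards = run_episode(CorridorEnv(), random.Random(0))
print(len(trajectory), sum(rewards))
```

Each pass through the `while` loop is one step of the cycle: observe the state, act, receive the reward and the next state, and repeat until the episode ends.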


Policy $\pi(s_t)$

 

  • Mapping from state to action, describing how the agent behaves in a given state $s_t$

 

  • Deterministic:
    • Returns the selected action
  • Stochastic:
    • Returns a distribution over actions
    • The action is then sampled from this probability distribution
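
As a rough illustration of the two kinds of policy, the sketch below uses plain Python instead of PyTorch. The `preferences` list is a hypothetical stand-in for the per-action scores a network might output for a state $s_t$, and the softmax used to turn scores into probabilities is one common choice, assumed here rather than taken from the slides.

```python
import math
import random


def deterministic_policy(preferences):
    """Return the index of the highest-scoring action."""
    return max(range(len(preferences)), key=lambda a: preferences[a])


def action_probabilities(preferences):
    """Softmax: turn raw scores into a probability distribution over actions."""
    highest = max(preferences)  # subtract the max for numerical stability
    exps = [math.exp(p - highest) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def stochastic_policy(preferences, rng):
    """Sample an action index according to the softmax probabilities."""
    probs = action_probabilities(preferences)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


preferences = [0.5, 2.0, -1.0]  # hypothetical scores for 3 actions in one state
rng = random.Random(0)

greedy_action = deterministic_policy(preferences)
probs = action_probabilities(preferences)
sampled_action = stochastic_policy(preferences, rng)
print(greedy_action, probs, sampled_action)
```

The deterministic policy always returns the same action for the same scores, while the stochastic policy can return any action with non-zero probability.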

Trajectory and episode return

 

Trajectory $\tau$: sequence of all states and actions in an episode; $\tau = ((s_0, a_0), (s_1, a_1), \ldots, (s_T, a_T))$

 

Episode return $R_\tau$: total (discounted) reward accumulated along trajectory $\tau$; $R_\tau = \sum_{t=0}^{T} \gamma^t r_t$
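
The episode return can be computed directly from the list of rewards collected during an episode. A minimal sketch, where the function name `episode_return` and the example rewards are illustrative assumptions:

```python
def episode_return(rewards, gamma=0.99):
    """Sum of gamma**t * r_t over the rewards of one episode."""
    total = 0.0
    for t, reward in enumerate(rewards):
        total += (gamma ** t) * reward
    return total


rewards = [1.0, 0.0, 2.0]  # hypothetical rewards r_0, r_1, r_2
print(episode_return(rewards, gamma=0.5))  # 1.0 + 0.5 * 0.0 + 0.25 * 2.0 = 1.5
```

With `gamma=1.0` the return is the plain sum of rewards; smaller values of `gamma` give less weight to rewards received later in the episode.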


Setting up the environment

import gymnasium as gym
import torch.nn as nn
import torch.optim as optim

env = gym.make("ALE/SpaceInvaders-v5")

# Define neural network architecture
class Network(nn.Module):
    def __init__(self, dim_inputs, dim_outputs):
        super(Network, self).__init__()
        self.linear = nn.Linear(dim_inputs, dim_outputs)

    def forward(self, x):
        return self.linear(x)

# Instantiate network
network = Network(dim_inputs, dim_outputs)
# Instantiate optimizer
optimizer = optim.Adam(network.parameters(), lr=0.0001)

The basic loop

for episode in range(1000):
    state, info = env.reset()
    done = False

    while not done:
        action = select_action(network, state)
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        loss = calculate_loss(network, state, action, next_state, reward, done)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        state = next_state

 

  • Outer loop: iterate through episodes
  • Inner loop: iterate through steps
    • Select an action
    • Observe new state and reward
    • Calculate the loss and update the network
    • Update the state
  • (How is the loss calculated? It depends on the algorithm.)

Coming next

 

 

  • DRL is powerful!
  • Value-based and policy-based approaches
  • DQN and refinements
  • Policy gradient methods

A DataCamp learner diving deep into the sea to discover the secrets of Deep Reinforcement Learning


Let's practice!
