Introduction to deep Q learning

Deep Reinforcement Learning in Python

Timothée Carayol

Principal Machine Learning Engineer, Komment

What is Deep Q Learning?

 

 

An image representing Q(state, action), with the state represented as the Earth and the action represented as a joystick


Q-Learning refresher

 

Action-value function $Q_\pi(s, a)$: expected sum of future rewards if action $a$ is taken in state $s$, assuming that policy $\pi$ is followed afterwards:

$$ Q_\pi(s, a) = \mathbb{E}_\tau \left[ R_\tau \mid s_t = s, a_t = a \right] $$

 

 

  • Knowledge of $Q$ would enable optimal policy: $$ \pi(s_t) = {\arg\max}_a Q(s_t, a) $$

  • Goal of Q-learning: learn $Q$ over time
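The argmax policy above can be sketched in a few lines. This is an illustrative NumPy example, not from the slides: the Q-table values and the `greedy_policy` name are assumptions chosen for demonstration.

```python
import numpy as np

# Hypothetical Q-table: 4 states x 2 actions (values chosen for illustration)
Q = np.array([[0.1, 0.9],
              [0.5, 0.2],
              [0.0, 0.3],
              [0.7, 0.7]])

def greedy_policy(state):
    # pi(s) = argmax_a Q(s, a)
    return int(np.argmax(Q[state]))

print(greedy_policy(0))  # action 1 has the higher Q-value in state 0
```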


Q-Learning refresher

Bellman equation (in Q-learning) in a deterministic environment:

$$ Q_\pi(s_t, a_t) = r_{t+1} + \gamma \max_{a_{t+1}} Q_\pi(s_{t+1}, a_{t+1}) $$

Temporal difference target (a.k.a. TD-target, Q-target, or target Q-value): the right-hand side of the Bellman equation, used as the target value for the Q-learning update rule:

$$ r_{t+1} + \gamma \max_{a_{t+1}} Q_\pi(s_{t+1}, a_{t+1}) $$

  • Bellman equation: recursive formula for $Q$
  • Right-hand side of the Bellman equation: the "TD-target"
  • Use the TD-target to update $\hat{Q}$ after each step

Q-learning update rule:

$$ Q_{\text{new}} = (1 - \alpha)\, Q_{\text{old}} + \alpha \cdot \text{TD-target} $$
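The refresher above can be sketched as a single tabular update step. A minimal NumPy illustration, where the environment shape, `alpha`, `gamma`, and the `q_learning_update` helper are assumptions for demonstration only:

```python
import numpy as np

# Illustrative setup: 4 states, 2 actions, all Q-values start at zero
n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99  # learning rate and discount factor

def q_learning_update(s, a, r, s_next):
    # TD-target: r + gamma * max_a' Q(s', a')
    td_target = r + gamma * np.max(Q[s_next])
    # Update rule: blend the old estimate with the TD-target
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * td_target

# One transition: from state 0, action 1 yields reward 1.0 and lands in state 2
q_learning_update(s=0, a=1, r=1.0, s_next=2)
print(Q[0, 1])  # 0.9 * 0 + 0.1 * (1.0 + 0.99 * 0) = 0.1
```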


The Q-Network

A Q-table with 4 states and 4 actions, so 16 cells to fill


The Q-Network

A Q-table with 9 states and 4 actions, so 36 cells to fill


The Q-Network

A Q-table with dozens of states and 4 actions, so on the order of 100 cells to fill


The Q-Network

  • At the heart of Deep Q Learning: a neural network

Illustration of a fully connected neural network with two hidden layers


The Q-Network

  • At the heart of Deep Q Learning: a neural network

Illustration of a fully connected neural network with two hidden layers, with the Earth image from the previous slide feeding into the input layer


The Q-Network

  • At the heart of Deep Q Learning: a neural network mapping state to Q-values

The illustration from the previous slide, with each node in the output layer associated with an action, represented as a direction on the joystick: up = action 0, right = 1, down = 2, left = 3.

  • A network approximating the action-value function is called a 'Q-network'
  • Q-networks are commonly used in Deep Q Learning algorithms, such as DQN

Implementing the Q-network

import torch
import torch.nn as nn
import torch.optim as optim

class QNetwork(nn.Module):
    def __init__(self, state_size, action_size):
        super(QNetwork, self).__init__()
        self.fc1 = nn.Linear(state_size, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, action_size)

    def forward(self, state):
        x = torch.relu(self.fc1(torch.tensor(state)))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)

q_network = QNetwork(8, 4)
optimizer = optim.Adam(q_network.parameters(), lr=0.0001)
  • Input dimension determined by state
  • Output dimension determined by number of possible actions

  • In this example:

    • 2 hidden layers with 64 nodes each
    • ReLU activation function
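Acting with a Q-network is a forward pass followed by an argmax over the output Q-values. A self-contained sketch of the same architecture using `nn.Sequential` (the 8-dimensional state and random input are assumptions for illustration):

```python
import torch
import torch.nn as nn

# Same shape as the QNetwork above: 8-dim state in, 4 Q-values out,
# two 64-unit hidden layers with ReLU activations
q_network = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 4),
)

state = torch.rand(8)             # hypothetical observation vector
q_values = q_network(state)       # one Q-value per action
action = int(torch.argmax(q_values))
print(q_values.shape)             # torch.Size([4])
```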

Let's practice!

