Deep Reinforcement Learning in Python
Timothée Carayol
Principal Machine Learning Engineer, Komment
Knowledge of $Q$ would enable the optimal policy: in each state, pick the action with the highest Q-value, $$ \pi(s_t) = \arg\max_a Q(s_t, a) $$
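To make the $\arg\max$ concrete, here is a minimal sketch of a greedy policy over a small Q-table. The table values, the state names `"s0"`/`"s1"` and the helper `greedy_policy` are hypothetical, chosen only for illustration.

```python
# Hypothetical Q-table: 2 states, 3 actions each.
# Q[s][a] is the estimated return of taking action a in state s.
Q = {
    "s0": [1.0, 3.5, 2.0],
    "s1": [0.5, 0.1, 0.9],
}

def greedy_policy(Q, state):
    """Return the action a that maximizes Q(state, a) (first one on ties)."""
    q_values = Q[state]
    return max(range(len(q_values)), key=lambda a: q_values[a])

print(greedy_policy(Q, "s0"))  # 1
print(greedy_policy(Q, "s1"))  # 2
```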
Goal of Q-learning: learn $Q$ over time
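As a point of reference, the standard tabular Q-learning update nudges $Q(s, a)$ toward the target $r + \gamma \max_{a'} Q(s', a')$ after each transition. The sketch below assumes a dictionary-backed table, a learning rate `alpha`, a discount `gamma` and a `done` flag for terminal transitions; these names and values are illustrative assumptions, not taken from the course code.

```python
from collections import defaultdict

def q_learning_update(Q, state, action, reward, next_state, done,
                      n_actions, alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update in place; return the TD error.

    Q maps (state, action) pairs to values. alpha, gamma and done are
    illustrative assumptions for this sketch.
    """
    # Bootstrap from the best next action unless the episode has ended
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in range(n_actions))
    td_target = reward + gamma * best_next
    td_error = td_target - Q[(state, action)]
    Q[(state, action)] += alpha * td_error
    return td_error

Q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
Q[("s1", 0)] = 2.0
err = q_learning_update(Q, "s0", 1, reward=1.0, next_state="s1",
                        done=False, n_actions=2, alpha=0.5, gamma=0.9)
# target = 1.0 + 0.9 * 2.0 = 2.8, so Q[("s0", 1)] moves halfway: 0.0 -> 1.4
```

Repeating this update over many transitions is what "learning $Q$ over time" means in the tabular case; a Q-network replaces the table when the state space is too large to enumerate.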
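```python
import torch
import torch.nn as nn
import torch.optim as optim

class QNetwork(nn.Module):
    def __init__(self, state_size, action_size):
        super(QNetwork, self).__init__()
        # Two hidden layers of 64 units each
        self.fc1 = nn.Linear(state_size, 64)
        self.fc2 = nn.Linear(64, 64)
        # One output unit per action: the estimated Q-value of that action
        self.fc3 = nn.Linear(64, action_size)

    def forward(self, state):
        # Convert the state (e.g. a NumPy array) to float32 to match the layer weights
        x = torch.relu(self.fc1(torch.as_tensor(state, dtype=torch.float32)))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)

q_network = QNetwork(8, 4)
optimizer = optim.Adam(q_network.parameters(), lr=0.0001)
```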
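Input dimension determined by the size of the state; output dimension determined by the number of possible actions (one Q-value per action)
In this example: 8-dimensional state, 4 possible actions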
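As a quick sanity check on the architecture above, the number of trainable parameters follows directly from the layer sizes: each fully connected layer has `n_in * n_out` weights plus `n_out` biases. The helper `linear_params` below is hypothetical and only does the arithmetic; with PyTorch, `sum(p.numel() for p in q_network.parameters())` should report the same total, since `nn.Linear` includes a bias by default.

```python
def linear_params(n_in, n_out):
    """Parameters of a fully connected layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out

# Layer sizes of QNetwork(8, 4): 8 -> 64 -> 64 -> 4
state_size, hidden, action_size = 8, 64, 4
layers = [(state_size, hidden), (hidden, hidden), (hidden, action_size)]
total = sum(linear_params(n_in, n_out) for n_in, n_out in layers)
print(total)  # 4996  (576 + 4160 + 260)
```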