Deep Reinforcement Learning in Python
Timothée Carayol
Principal Machine Learning Engineer, Komment
Q-learning:
Policy learning:
$\pi_\theta(a_t | s_t)$:
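import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    def __init__(self, state_size, action_size):
        super().__init__()
        # Two hidden layers of 64 units, then one output per action
        self.fc1 = nn.Linear(state_size, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, action_size)

    def forward(self, state):
        # Accept arrays or tensors; cast to float32 to match the layer weights
        state = torch.as_tensor(state, dtype=torch.float32)
        x = torch.relu(self.fc1(state))
        x = torch.relu(self.fc2(x))
        # Softmax turns the final layer's outputs into action probabilities
        action_probs = torch.softmax(self.fc3(x), dim=-1)
        return action_probs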
action_probs = policy_network(state)
print('Action probabilities:', action_probs)
Action probabilities: tensor([0.21, 0.02, 0.74, 0.03])
action_dist = torch.distributions.Categorical(action_probs)
action = action_dist.sample()
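The sampling steps above can be put together in one runnable sketch. The sizes (`state_size = 8`, `action_size = 4`), the random placeholder state, and the compact `nn.Sequential` stand-in for the policy network are all assumptions made for illustration. The `log_prob` call is an addition not shown above; it returns the log-probability of the sampled action under the current policy.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so repeated runs sample the same action

# Hypothetical sizes, chosen only for illustration.
state_size, action_size = 8, 4

# Compact stand-in for a policy network: state in, action probabilities out.
policy_network = nn.Sequential(
    nn.Linear(state_size, 64),
    nn.ReLU(),
    nn.Linear(64, action_size),
    nn.Softmax(dim=-1),
)

# Placeholder state; a real one would come from the environment.
state = torch.rand(state_size)

# Forward pass: one probability per action, summing to 1.
action_probs = policy_network(state)

# Wrap the probabilities in a categorical distribution and sample from it.
action_dist = torch.distributions.Categorical(probs=action_probs)
action = action_dist.sample()

# Log-probability of the sampled action (assumed extra step, not in the slides).
log_prob = action_dist.log_prob(action)

print("Action probabilities:", action_probs)
print("Sampled action:", action.item())
print("Log-probability:", log_prob.item())
```

Because the action is sampled rather than taken as the argmax, less likely actions are still chosen occasionally, in proportion to their probability.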
Policy must maximize expected returns
Objective function:
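The formula itself is not shown here. A common way to write such an objective, given as an assumption rather than as the original slide content, is the expected return over trajectories $\tau$ generated by following $\pi_\theta$, where $R_\tau$ (assumed notation) is the return of trajectory $\tau$:

$$J(\pi_\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ R_\tau \right]$$

Under this reading, training the policy means adjusting $\theta$ to increase $J(\pi_\theta)$.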