Deep Reinforcement Learning in Python
Timothée Carayol
Principal Machine Learning Engineer, Komment


Knowing $Q$ yields the optimal policy: $$ \pi(s_t) = \arg\max_a Q(s_t, a) $$
Goal of Q-learning: learn $Q$ over time
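
As a minimal sketch of the greedy policy above, the snippet below picks the action with the largest Q-value for a single state. The Q-values and the helper name `greedy_action` are hypothetical and chosen only for illustration. In practice the values would come from a learned estimate of $Q$.

```python
# Hypothetical Q-values for one state s_t with 4 possible actions
# (made-up numbers, not taken from any trained model).
q_values = [0.1, 0.5, -0.2, 0.3]


def greedy_action(q_values):
    """Return the index a that maximizes Q(s_t, a)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


action = greedy_action(q_values)
print(action)  # 1, since 0.5 is the largest Q-value
```

Ties are broken in favor of the lowest action index here, which is a design choice of this sketch rather than part of the policy definition.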









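    import torch
    import torch.nn as nn
    import torch.optim as optim

    class QNetwork(nn.Module):
        def __init__(self, state_size, action_size):
            super(QNetwork, self).__init__()
            # Two hidden layers of 64 units, then one output per action
            self.fc1 = nn.Linear(state_size, 64)
            self.fc2 = nn.Linear(64, 64)
            self.fc3 = nn.Linear(64, action_size)

        def forward(self, state):
            # as_tensor accepts arrays or tensors and enforces float32
            state = torch.as_tensor(state, dtype=torch.float32)
            x = torch.relu(self.fc1(state))
            x = torch.relu(self.fc2(x))
            return self.fc3(x)  # one Q-value per action

    # 8 state features, 4 possible actions
    q_network = QNetwork(8, 4)
    optimizer = optim.Adam(q_network.parameters(), lr=0.0001)
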
The output dimension is determined by the number of possible actions
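
A quick shape check can make this concrete. The sketch below rebuilds a network with the same layer sizes as the one above and feeds it placeholder states. The all-zero inputs and the batch size of 32 are assumptions made only for illustration, not part of the original example.

```python
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    def __init__(self, state_size, action_size):
        super().__init__()
        self.fc1 = nn.Linear(state_size, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, action_size)

    def forward(self, state):
        state = torch.as_tensor(state, dtype=torch.float32)
        x = torch.relu(self.fc1(state))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)


q_network = QNetwork(state_size=8, action_size=4)

# Placeholder inputs (assumed): one state and a batch of 32 states.
single_state = torch.zeros(8)
batch_states = torch.zeros(32, 8)

with torch.no_grad():  # no gradients needed for a shape check
    q_values = q_network(single_state)
    batch_q = q_network(batch_states)

print(q_values.shape)  # torch.Size([4]): one Q-value per action
print(batch_q.shape)   # torch.Size([32, 4])

# Greedy action for the single state: index of the largest Q-value.
action = int(q_values.argmax())
```

Because the last layer has `action_size` outputs, the final dimension always matches the number of actions, whether the input is a single state or a batch.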
This example: