Deep Reinforcement Learning in Python
Timothée Carayol
Principal Machine Learning Engineer, Komment

for episode in range(num_episodes):
    # 1. Initialize the episode
    while not done:
        # 2. Select an action
        # 3. Play the action; obtain the next state and reward
        # 4. Add the (discounted) reward to the return
        # 5. Update the state
    # 6. Calculate the loss
    # 7. Update the policy network by gradient descent
from torch.distributions import Categorical

def select_action(policy_network, state):
    # The policy network outputs one probability per action
    action_probs = policy_network(state)
    action_dist = Categorical(action_probs)
    # Sample an action and record its log probability
    action = action_dist.sample()
    log_prob = action_dist.log_prob(action)
    return action.item(), log_prob.reshape(1)

action, log_prob = select_action(policy_network, state)
Sampled action index: 1
Log probability of the sampled action: -1.38
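To make the sampling step concrete, here is a minimal sketch that uses Categorical on its own, without a policy network. The uniform probabilities over four actions are an assumption chosen for illustration; with them, every action has log probability log(0.25), roughly -1.386, which would be consistent with the value shown above.

```python
import math

import torch
from torch.distributions import Categorical

# Hypothetical policy output: four equally likely actions (an assumption).
action_probs = torch.tensor([0.25, 0.25, 0.25, 0.25])
action_dist = Categorical(action_probs)

# Sample an action index and look up its log probability.
action = action_dist.sample()
log_prob = action_dist.log_prob(action)

print("Sampled action index:", action.item())
print("Log probability:", round(log_prob.item(), 3))
print("Expected log(0.25):", round(math.log(0.25), 3))
```

Whichever index is sampled, the log probability is the same here because the distribution is uniform.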
Let's return to the policy gradient theorem:
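The formula itself is not reproduced in this text. As an assumption, the commonly used episodic form of the theorem, which matches the loss used in this section, can be written as follows, where R_τ is the return of episode τ and π_θ is the policy:

```latex
\nabla_\theta J(\pi_\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}
    \left[ R_\tau \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \right]
```

Under this form, differentiating the quantity -R_τ Σ_t log π_θ(a_t | s_t) for one sampled episode gives a single-sample estimate of the negative gradient, which is why minimizing it by gradient descent increases the expected return.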


In Python:
# episode_return: the return accumulated over the episode
# episode_log_probs: log probabilities of the actions taken
loss = -episode_return * episode_log_probs.sum()
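A small sketch can show which way this loss pushes the policy. The two-action policy parameterised directly by its logits, the choice of action 1, and the return of 2.0 are all assumptions made for illustration, not part of the original material.

```python
import torch
from torch.distributions import Categorical

# Hypothetical two-action policy parameterised directly by its logits.
logits = torch.zeros(2, requires_grad=True)
action_dist = Categorical(torch.softmax(logits, dim=0))

# Pretend action 1 was taken once and the episode return was 2.0.
episode_log_probs = action_dist.log_prob(torch.tensor(1)).reshape(1)
episode_return = 2.0

loss = -episode_return * episode_log_probs.sum()
loss.backward()

# The gradient is approximately [1.0, -1.0]: a descent step lowers
# logit 0 and raises logit 1, making the rewarded action more likely.
print(logits.grad)
```

With a positive return, gradient descent on this loss raises the probability of the actions that were taken; a larger return scales the update proportionally.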
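import torch

# Assumes env, policy_network, optimizer and gamma are defined beforehand
for episode in range(50):
    state, info = env.reset()
    done = False
    step = 0
    episode_log_probs = torch.tensor([])
    R = 0
    while not done:
        step += 1
        action, log_prob = select_action(policy_network, state)
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # Add the discounted reward to the episode return
        R += (gamma ** step) * reward
        # Store the log probability of the action taken
        episode_log_probs = torch.cat((episode_log_probs, log_prob))
        state = next_state
    # Policy gradient loss for the episode, then one gradient descent step
    loss = -R * episode_log_probs.sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()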
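The return accumulation in the loop can be checked in isolation with plain Python. The helper discounted_return and its first_exponent parameter are hypothetical names introduced for this sketch. The default mirrors the loop above, where step is incremented before it is used, so the first reward is weighted by gamma ** 1; the other common convention starts at gamma ** 0, and the two differ only by a constant factor of gamma.

```python
def discounted_return(rewards, gamma, first_exponent=1):
    """Sum rewards weighted by gamma ** k, with k starting at first_exponent."""
    total = 0.0
    for offset, reward in enumerate(rewards):
        total += (gamma ** (first_exponent + offset)) * reward
    return total


rewards = [1.0, 1.0, 1.0]

# Convention of the loop above: 0.5 + 0.25 + 0.125
print(discounted_return(rewards, gamma=0.5))  # 0.875

# Convention starting at gamma ** 0: 1 + 0.5 + 0.25
print(discounted_return(rewards, gamma=0.5, first_exponent=0))  # 1.75
```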