Deep Reinforcement Learning with Python
Timothée Carayol
Principal Machine Learning Engineer, Komment
| Examples |
|---|
| Discount rate |
| PPO: clipping epsilon, entropy bonus |
| Experience replay: buffer size, batch size |
| Decaying epsilon-greedy schedule |
| Fixed Q-targets: $\tau$ |
| Learning rate |
| Number of layers, nodes per layer... |
Objective: average cumulative reward
Hyperparameter search techniques:


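The search techniques above, such as grid search and random search, can be sketched with a toy objective; the score function and the value ranges below are illustrative assumptions, not part of the course material:

```python
import itertools
import random

# Toy objective standing in for (negated) average cumulative reward:
# lower is better, minimized at lr=0.01, batch_size=32.
def score(lr, batch_size):
    return (lr - 0.01) ** 2 + 1e-6 * (batch_size - 32) ** 2

# Grid search: evaluate every combination of a fixed grid
grid = list(itertools.product([0.1, 0.01, 0.001], [16, 32, 64]))
best_grid = min(grid, key=lambda p: score(*p))
print(best_grid)  # → (0.01, 32)

# Random search: sample configurations at random from the same ranges
random.seed(0)
samples = [(10 ** random.uniform(-4, -1), random.choice([16, 32, 64]))
           for _ in range(20)]
best_random = min(samples, key=lambda p: score(*p))
```

Grid search scales exponentially with the number of hyperparameters, which is why samplers like Optuna's TPE are preferred for expensive RL training runs.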
Optuna workflow:
An Optuna study
```python
import optuna

def objective(trial):
    ...

study = optuna.create_study()
study.optimize(objective, n_trials=100)
```
```python
study.best_params
```

```
{'learning_rate': 0.001292481, 'batch_size': 8}
```
In the objective function:
Full flexibility in specifying hyperparameters:
```python
def objective(trial: optuna.trial.Trial):
    # Hyperparameters x and y between -10 and 10
    x = trial.suggest_float('x', -10, 10)
    y = trial.suggest_float('y', -10, 10)
    # Return the metric to be minimized
    return (x - 2) ** 2 + 1.2 * (y + 3) ** 2
```
n_trials with the default sampler (TPE)
If n_trials is not set: runs until interrupted
```python
import optuna

study = optuna.create_study(
    storage="sqlite:///DRL.db",
    study_name="my_study")
study.optimize(objective, n_trials=100)
```
```python
loaded_study = optuna.load_study(
    study_name="my_study",
    storage="sqlite:///DRL.db")
```
```python
optuna.visualization.plot_param_importances(study)
```

```python
optuna.visualization.plot_contour(study)
```
