Deep Reinforcement Learning in Python
Timothée Carayol
Principal Machine Learning Engineer, Komment
| Examples |
|---|
| Discount rate |
| PPO: clipping epsilon, entropy bonus |
| Experience replay: buffer size, batch size |
| Decayed epsilon greediness schedule |
| Fixed Q-targets: $\tau$ |
| Learning rate |
| Number of layers, nodes per layer... |
Objective: maximize average cumulative reward

Hyperparameter search techniques:

- Grid search
- Random search
- Bayesian methods (e.g. TPE, Optuna's default sampler)

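Random search, one of the simplest hyperparameter search techniques, can be sketched in plain Python; the toy objective and search ranges below are illustrative assumptions, not part of Optuna:

```python
import random

def toy_objective(learning_rate, batch_size):
    # Toy stand-in for average cumulative reward; assumed (for
    # illustration) to peak near learning_rate=1e-3, batch_size=32.
    return -((learning_rate - 1e-3) ** 2) - (batch_size - 32) ** 2

random.seed(0)
best_params, best_score = None, float("-inf")
for _ in range(100):
    # Sample each hyperparameter independently from its range
    params = {
        "learning_rate": 10 ** random.uniform(-5, -1),  # log-uniform
        "batch_size": random.choice([8, 16, 32, 64, 128]),
    }
    score = toy_objective(**params)
    if score > best_score:
        best_params, best_score = params, score

print(best_params)
```

Unlike grid search, random search covers continuous ranges and scales to many hyperparameters; Optuna's TPE sampler improves on it by concentrating trials in promising regions.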
Optuna workflow: define an objective over a trial, create a study, then optimize:

import optuna

def objective(trial): ...

study = optuna.create_study()
study.optimize(objective, n_trials=100)
study.best_params
{'learning_rate': 0.001292481, 'batch_size': 8}
The objective function offers full flexibility in hyperparameter specification:
def objective(trial: optuna.trial.Trial):
    # Hyperparameters x and y between -10 and 10
    x = trial.suggest_float('x', -10, 10)
    y = trial.suggest_float('y', -10, 10)
    # Return the metric to minimize
    return (x - 2) ** 2 + 1.2 * (y + 3) ** 2
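As a plain-Python sanity check of the quadratic above (no Optuna needed): its minimum sits at x = 2, y = -3, which is what the study should converge towards.

```python
def f(x, y):
    # Same quadratic as the Optuna objective above
    return (x - 2) ** 2 + 1.2 * (y + 3) ** 2

print(f(2, -3))  # minimum: prints 0.0
print(f(0, 0))   # away from the minimum: about 14.8
```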
- n_trials set: run that many trials with the default sampler (TPE, Tree-structured Parzen Estimator)
- n_trials omitted: run trials until interrupted
Persist the study to a local SQLite database (no manual sqlite import is needed; Optuna manages the storage backend):

study = optuna.create_study(
    storage="sqlite:///DRL.db",
    study_name="my_study")
study.optimize(objective, n_trials=100)
loaded_study = optuna.load_study(
study_name="my_study",
storage="sqlite:///DRL.db")
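The `sqlite:///DRL.db` storage URL refers to a plain file, `DRL.db`, in the working directory; a small hypothetical helper (not an Optuna API) makes that mapping explicit:

```python
def sqlite_storage_path(storage_url: str) -> str:
    # Hypothetical helper (not part of Optuna): extract the file path
    # from a storage URL of the form "sqlite:///<path>".
    prefix = "sqlite:///"
    if not storage_url.startswith(prefix):
        raise ValueError(f"not a sqlite storage URL: {storage_url!r}")
    return storage_url[len(prefix):]

print(sqlite_storage_path("sqlite:///DRL.db"))  # prints DRL.db
```

Because the study lives in that file, any later process pointing `load_study` at the same URL resumes with the full trial history.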
Visualize results (each call returns an interactive Plotly figure):

optuna.visualization.plot_param_importances(study)

Relative importance of each hyperparameter for the objective.

optuna.visualization.plot_contour(study)

Objective value across pairs of hyperparameters.
