Deep Reinforcement Learning in Python
Timothée Carayol
Principal Machine Learning Engineer, Komment
| Examples |
|---|
| Discount rate |
| PPO: clipping epsilon, entropy bonus |
| Experience replay: buffer size, batch size |
| Decayed epsilon greediness schedule |
| Fixed Q-targets: $\tau$ |
| Learning rate |
| Number of layers, nodes per layer... |
Objective: average cumulative reward
Hyperparameter search techniques:
Optuna workflow:

```python
import optuna

# Define the function to optimize (details below)
def objective(trial): ...

# Create a study and run the optimization
study = optuna.create_study()
study.optimize(objective, n_trials=100)

# Best hyperparameters found across all trials
study.best_params
```

```
{'learning_rate': 0.001292481, 'batch_size': 8}
```
In the objective function, Optuna offers full flexibility on hyperparameter specification:

```python
def objective(trial: optuna.trial.Trial):
    # Hyperparameters x and y between -10 and 10
    x = trial.suggest_float('x', -10, 10)
    y = trial.suggest_float('y', -10, 10)
    # Return the metric to minimize
    return (x - 2) ** 2 + 1.2 * (y + 3) ** 2
- `n_trials`: number of trials to run with the default sampler (TPE)
- `n_trials` omitted: run trials until interrupted
```python
# Persist the study in a SQLite database (no sqlite import needed;
# Optuna takes the storage URL directly)
study = optuna.create_study(
    storage="sqlite:///DRL.db",
    study_name="my_study")
study.optimize(objective, n_trials=100)

# Reload the stored study later
loaded_study = optuna.load_study(
    study_name="my_study",
    storage="sqlite:///DRL.db")
```
```python
# Visualize results (requires plotly)
optuna.visualization.plot_param_importances(study)
optuna.visualization.plot_contour(study)
```