CI/CD for Machine Learning
Ravi Bhadauria
Machine Learning Engineer
Hyperparameter tuning
Training
Loose coupling for independent training
Both jobs are dataset dependent
# Contents of hp configuration
{
"n_estimators": [2, 4, 5],
"max_depth": [10, 20, 50],
"random_state": [1993]
}
# Contents of best parameters
{
"n_estimators": 5,
"max_depth": 20,
"random_state": 1993
}
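To make the relationship between the two files concrete, here is a minimal standard-library sketch that lists the combinations a grid search over this configuration would evaluate. The `expand_grid` helper is hypothetical and only for illustration (scikit-learn's `ParameterGrid` plays this role in practice); both dictionaries are copied from the JSON contents above.

```python
import itertools

# Search space copied from the hp configuration shown above.
hp_config = {
    "n_estimators": [2, 4, 5],
    "max_depth": [10, 20, 50],
    "random_state": [1993],
}

# Best parameters copied from the second JSON file shown above.
best_params = {"n_estimators": 5, "max_depth": 20, "random_state": 1993}


def expand_grid(grid):
    """Hypothetical helper: list every combination in a parameter grid."""
    names = sorted(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in itertools.product(*value_lists)]


candidates = expand_grid(hp_config)
print(len(candidates))            # 3 * 3 * 1 combinations
print(best_params in candidates)  # the best parameters are one point of the grid
```

With five-fold cross-validation, each of these candidates is fitted five times, which is why the search space here is kept small.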
import json
from sklearn.ensemble import RandomForestClassifier

# Load hyperparameters from the JSON file
with open("rfc_best_params.json", "r") as params_file:
    rfc_params = json.load(params_file)
# Define and train model
model = RandomForestClassifier(**rfc_params)
model.fit(X_train, y_train)
import json
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Define the model and hyperparameter search space
model = RandomForestClassifier()
with open("hp_config.json", "r") as config_file:
    param_grid = json.load(config_file)
# Perform GridSearch with five-fold CV
grid_search = GridSearchCV(model, param_grid, cv=5)
grid_search.fit(X_train, y_train)
# Get the best hyperparameters
best_params = grid_search.best_params_
with open("rfc_best_params.json", "w") as outfile:
    json.dump(best_params, outfile)
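Because the two scripts communicate only through `rfc_best_params.json`, the training side benefits from failing loudly when that file is absent or incomplete. The following is a minimal sketch under that assumption; `load_best_params` and `EXPECTED_KEYS` are hypothetical names that do not appear in the slides, and the demonstration writes to a temporary directory rather than to a real project.

```python
import json
import os
import tempfile

# Parameter names taken from the best-parameters file shown earlier.
EXPECTED_KEYS = {"n_estimators", "max_depth", "random_state"}


def load_best_params(path):
    """Hypothetical helper: read tuned parameters or fail with a clear message."""
    if not os.path.exists(path):
        raise FileNotFoundError(
            f"{path} not found; run the hyperparameter tuning script first"
        )
    with open(path, "r") as params_file:
        params = json.load(params_file)
    missing = EXPECTED_KEYS - params.keys()
    if missing:
        raise ValueError(f"{path} is missing keys: {sorted(missing)}")
    return params


# Demonstration in a temporary directory so no real project file is touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "rfc_best_params.json")
    with open(demo_path, "w") as outfile:
        json.dump({"n_estimators": 5, "max_depth": 20, "random_state": 1993}, outfile)
    loaded_params = load_best_params(demo_path)

print(loaded_params)
```

The returned dictionary can be unpacked into the model constructor exactly as the training script does with `**rfc_params`.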
stages:
  preprocess: ...
  train: ...
  hp_tune:
    cmd: python hp_tuning.py
    deps:
      - processed_dataset/weather.csv
      - hp_config.json
      - hp_tuning.py
    outs: # Not tracking best parameters
      - hp_tuning_results.md:
          cache: false
stages:
  preprocess: ...
  hp_tune: ...
  train:
    cmd: python train.py
    deps:
      - processed_dataset/weather.csv
      - rfc_best_params.json # Best parameters
      - train.py
    metrics:
      - metrics.json:
          cache: false
Stages can be triggered independently: dvc repro <stage_name>
Force-run the hyperparameter tuning stage: dvc repro -f hp_tune
Training can be run with: dvc repro train
Both stages trigger the preprocessing step as a dependency
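The loose coupling can be read directly off the `deps` lists of the two stages. As a simplified illustration (DVC decides what to rerun on its own; this is not its implementation), the sketch below records those lists in a dictionary and asks which stages name a given file as a dependency. `STAGE_DEPS` and `stages_affected_by` are hypothetical names, and the elided preprocess stage is left out.

```python
# Dependencies copied from the two dvc.yaml fragments above.
# The preprocess stage is elided in the source, so it is not modelled here.
STAGE_DEPS = {
    "hp_tune": {"processed_dataset/weather.csv", "hp_config.json", "hp_tuning.py"},
    "train": {"processed_dataset/weather.csv", "rfc_best_params.json", "train.py"},
}


def stages_affected_by(changed_path):
    """Hypothetical helper: stages that list changed_path as a dependency."""
    return sorted(name for name, deps in STAGE_DEPS.items() if changed_path in deps)


print(stages_affected_by("hp_config.json"))                 # only the tuning stage
print(stages_affected_by("rfc_best_params.json"))           # only the training stage
print(stages_affected_by("processed_dataset/weather.csv"))  # both stages
```

Since `rfc_best_params.json` is not declared as an output of `hp_tune`, editing the search space does not by itself cause `train` to rerun, which is what keeps the two jobs independent.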
| mean_test_score | std_test_score | max_depth | n_estimators | random_state |
|---|---|---|---|---|
| 0.999733 | 0.000413118 | 20 | 5 | 1993 |
| 0.999307 | 0.000574418 | 50 | 5 | 1993 |
| 0.99888 | 0.000617378 | 10 | 5 | 1993 |
| 0.997813 | 0.00117333 | 10 | 4 | 1993 |
Changes in Python hyperparameter tuning script
# Save the results of hyperparameter tuning
cv_results = pd.DataFrame(grid_search.cv_results_)
markdown_table = cv_results.to_markdown(index=False)
with open("hp_tuning_results.md", "w") as markdown_file:
    markdown_file.write(markdown_table)
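The results table shown earlier has five columns and is ordered by score, whereas `cv_results_` holds additional fields and names its parameter columns with a `param_` prefix. The slides do not show how the columns were selected, so the following is only an assumed sketch of that step, written with the standard library instead of pandas. The rows are copied from the table above, and `to_markdown_table` is a hypothetical helper.

```python
# Rows copied from the results table above, deliberately out of order.
rows = [
    {"mean_test_score": 0.99888, "std_test_score": 0.000617378,
     "max_depth": 10, "n_estimators": 5, "random_state": 1993},
    {"mean_test_score": 0.997813, "std_test_score": 0.00117333,
     "max_depth": 10, "n_estimators": 4, "random_state": 1993},
    {"mean_test_score": 0.999733, "std_test_score": 0.000413118,
     "max_depth": 20, "n_estimators": 5, "random_state": 1993},
    {"mean_test_score": 0.999307, "std_test_score": 0.000574418,
     "max_depth": 50, "n_estimators": 5, "random_state": 1993},
]

COLUMNS = ["mean_test_score", "std_test_score", "max_depth",
           "n_estimators", "random_state"]


def to_markdown_table(records, columns):
    """Hypothetical helper: render records as a Markdown table, best score first."""
    ordered = sorted(records, key=lambda r: r["mean_test_score"], reverse=True)
    lines = ["| " + " | ".join(columns) + " |"]
    lines.append("|" + "|".join("---" for _ in columns) + "|")
    for record in ordered:
        lines.append("| " + " | ".join(str(record[c]) for c in columns) + " |")
    return "\n".join(lines)


markdown_table = to_markdown_table(rows, COLUMNS)
print(markdown_table)
```

Putting the best candidate on the first data row makes the report easier to scan when it is posted to a pull request.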
Hyperparameter tuning route
hp_tune/<some-string>
dvc repro -f hp_tune
cml pr create to create a new training PR with best parameters
Manual route
train/<some-string>
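The two routes can be pictured as a dispatch on a name prefix. The sketch below assumes that `hp_tune/<some-string>` and `train/<some-string>` are branch-name patterns, which the slide does not state outright, and `commands_for_branch` is a hypothetical function. The command strings are copied from the slides, with any arguments to `cml pr create` left elided as they are there.

```python
def commands_for_branch(branch_name):
    """Hypothetical dispatcher: choose pipeline commands from a branch prefix."""
    if branch_name.startswith("hp_tune/"):
        # Tuning route: rerun tuning, then open a PR (arguments elided in the slides).
        return ["dvc repro -f hp_tune", "cml pr create"]
    if branch_name.startswith("train/"):
        # Manual route: train with whatever best parameters are checked in.
        return ["dvc repro train"]
    # Any other branch triggers neither route in this sketch.
    return []


# Example branch names are made up for illustration.
print(commands_for_branch("hp_tune/wider-grid"))
print(commands_for_branch("train/edit-params"))
print(commands_for_branch("docs/readme"))
```

In a CI system the same prefix test would normally live in the workflow's trigger or condition rather than in Python; the function only makes the branching logic explicit.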