Model training with MLflow in Databricks

Databricks Concepts

Kevin Barlow

Data Practitioner

Machine Learning Lifecycle

¹ https://www.datacamp.com/blog/machine-learning-lifecycle-explained

Model training and development

Machine Learning Lifecycle - Modeling

Single-node vs. Multi-node

Single-node machine learning

Great for experimenting and starting
Easier initial setup
Hard to implement in production

Example library: scikit-learn

Multi-node machine learning

Great for production workloads
Easier maintenance long-term
Highly scalable

Example framework: Apache Spark

AutoML

"Glass box" approach to automated machine learning
Leverages open-source libraries
Creates models based on data and targeted prediction
Provides notebook with generated code for further

AutoML example

¹ https://www.databricks.com/product/automl

MLflow

Open-source framework
End-to-end machine learning lifecycle management
Track, evaluate, manage, and deploy
Pre-installed on ML Runtime!

import mlflow

with mlflow.start_run() as run:
  # machine learning training goes here
  pass

mlflow.autolog()

mlflow.log_metric('accuracy', acc)

mlflow.log_param('k', kNum)

MLflow Experiments

Collect information across multiple runs in a single location
Sort and compare model runs
Find and promote the best model


Let's practice!
