Model training with MLFlow in Databricks

Databricks Concepts

Kevin Barlow

Data Practitioner

Machine Learning Lifecycle

1 https://www.datacamp.com/blog/machine-learning-lifecycle-explained

Model training and development

Machine Learning Lifecycle - Modeling

Single-node vs. Multi-node

Single-node machine learning

  • Great for experimenting and starting
  • Easier initial setup
  • Hard to implement in production

Example framework: scikit-learn

Multi-node machine learning

  • Great for production workloads
  • Easier maintenance long-term
  • Highly scalable

Example framework: Apache Spark

AutoML

  • "Glass box" approach to automated machine learning
  • Leverages open-source libraries
  • Creates models based on the data and the prediction target
  • Provides a notebook with the generated code for further customization

AutoML example

1 https://www.databricks.com/product/automl

MLFlow

  • Open-source framework
  • End-to-end machine learning lifecycle management
  • Track, evaluate, manage, and deploy
  • Pre-installed on ML Runtime!

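import mlflow

# Automatically log parameters, metrics, and models for supported libraries
mlflow.autolog()

with mlflow.start_run() as run:
  # machine learning training goes here (it produces acc and kNum)

  # Manually log a metric and a parameter for this run
  mlflow.log_metric('accuracy', acc)
  mlflow.log_param('k', kNum)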

MLFlow Experiments

  • Collect information across multiple runs in a single location
  • Sort and compare model runs
  • Find and promote the best model

Let's practice!
