Model training with MLflow in Databricks

Databricks Concepts

Kevin Barlow

Data Practitioner

Machine Learning Lifecycle

1 https://www.datacamp.com/blog/machine-learning-lifecycle-explained

Model training and development

Machine Learning Lifecycle - Modeling


Single-node vs. Multi-node

Single-node machine learning

  • Great for experimenting and starting
  • Easier initial setup
  • Hard to implement in production

Example library: scikit-learn
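
As a rough illustration of the single-node case, the sketch below trains a k-nearest-neighbors classifier with scikit-learn on one machine. The Iris dataset, the train/test split, and the value k = 5 are assumptions chosen for the example, not part of the course material.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small built-in dataset that fits comfortably in memory on one node
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train a k-nearest-neighbors classifier; k_num is an illustrative choice
k_num = 5
model = KNeighborsClassifier(n_neighbors=k_num)
model.fit(X_train, y_train)

# Accuracy on the held-out split
acc = model.score(X_test, y_test)
print(f"k={k_num}, accuracy={acc:.3f}")
```

Everything here runs in a single Python process, which is what makes this style quick to start with and harder to scale once the data no longer fits on one machine.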

Multi-node machine learning

  • Great for production workloads
  • Easier maintenance long-term
  • Highly scalable

Example framework: Apache Spark


AutoML

  • "Glass box" approach to automated machine learning
  • Leverages open-source libraries
  • Builds models from your data and the chosen prediction target
  • Provides a notebook with the generated code for further development

AutoML example

1 https://www.databricks.com/product/automl

MLflow

  • Open-source framework
  • End-to-end machine learning lifecycle management
  • Track, evaluate, manage, and deploy
  • Pre-installed on ML Runtime!

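import mlflow

# Automatically log parameters, metrics, and models for supported libraries
mlflow.autolog()

with mlflow.start_run() as run:
    # Machine learning training goes here; it is expected to
    # define kNum (the hyperparameter) and acc (the accuracy)
    mlflow.log_param('k', kNum)
    mlflow.log_metric('accuracy', acc)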

MLflow Experiments

  • Collect information across multiple runs in a single location
  • Sort and compare model runs
  • Find and promote the best model



Let's practice!
