Using Databricks for machine learning

Databricks Concepts

Kevin Barlow

Data Practitioner

Machine Learning Lifecycle

Source: https://www.datacamp.com/blog/machine-learning-lifecycle-explained

Planning and preparation

ML Lifecycle - EDA

Planning for machine learning

What do I have?

  1. Data availability
  2. Business requirements
  3. Data scientists/data analysts

Data team and resources

What do I want?

  1. Use cases
  2. Legal and security compliance
  3. Business outcomes

Business outcomes

ML Runtime

  • Extension of Databricks compute
  • Optimized for machine learning applications
  • Contains most common libraries and frameworks
    • scikit-learn, SparkML, TensorFlow
    • MLflow
  • Works with cluster library management
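
As a rough way to see which of these libraries a given environment provides, the sketch below checks whether each one can be imported, using only the standard library. The import names (`sklearn`, `pyspark`, `tensorflow`, `mlflow`) are the usual ones for the libraries listed above; the helper `available_libraries` is a hypothetical name introduced for illustration and is not part of Databricks.

```python
from importlib.util import find_spec

# Usual import names for the libraries listed above (an assumption; adjust as needed).
ML_LIBRARIES = {
    "scikit-learn": "sklearn",
    "SparkML": "pyspark",
    "TensorFlow": "tensorflow",
    "MLflow": "mlflow",
}


def available_libraries(libraries=ML_LIBRARIES):
    """Return {display name: bool} telling whether each module can be imported."""
    return {name: find_spec(module) is not None for name, module in libraries.items()}


if __name__ == "__main__":
    for name, present in available_libraries().items():
        print(f"{name}: {'available' if present else 'not installed'}")
```

On a cluster running the ML Runtime, all four would be expected to report as available; on a standard runtime or a local machine, some may not.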

Databricks ML Runtime

Exploratory Data Analysis

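# pandas DataFrame: summary statistics for numeric columns
import pandas as pd
df.describe()

# Spark DataFrame: count, mean, stddev, min, quartiles, max
df.summary()

# Databricks utility: summary statistics with visualizations
dbutils.data.summarize(df)

# bamboolib: displaying a pandas DataFrame opens the interactive UI
import bamboolib as bam
df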
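
To make concrete what these summary calls report, here is a minimal sketch that computes a few of the same statistics by hand with the Python standard library. It uses the `price` column from the raw book data in this section; the helper `summarize` is a hypothetical name, and this is an illustration rather than a replacement for the built-in tools.

```python
import statistics

# Prices from the raw book data shown in this section.
prices = [12.50, 13.99, 16.50, 9.99, 24.99, 19.99, 17.50, 12.99, 14.99]


def summarize(values):
    """Compute a few basic summary statistics for a numeric column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stddev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }


if __name__ == "__main__":
    for name, value in summarize(prices).items():
        print(f"{name}: {value:.2f}")
```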

EDA in Databricks

Feature tables and feature stores

Raw Data

count  category  price  shelf_loc  rating
4      horror    12.50  end        3
6      romance   13.99  top        4.5
12     sci-fi    16.50  bottom     5
31     romance    9.99  bottom     3.5
23     fantasy   24.99  top        4
18     horror    19.99  end        2.5
19     cooking   17.50  end        4.5
7      fantasy   12.99  top        3
37     sci-fi    14.99  bottom     5

Feature table

count  category  price  shelf_loc  rating
4      1         12.50  1          3
6      2         13.99  2          4.5
12     3         16.50  3          5
31     2          9.99  3          3.5
23     4         24.99  2          4
18     1         19.99  1          2.5
19     5         17.50  1          4.5
7      4         12.99  2          3
37     3         14.99  3          5
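
The step from the raw data to the feature table can be sketched as a simple integer encoding of the two text columns. The version below assigns codes in order of first appearance, which happens to reproduce the codes in the feature table above; whether the original slides used this exact scheme is an assumption, and `encode_by_first_appearance` is a hypothetical helper name.

```python
def encode_by_first_appearance(values):
    """Map each distinct value to an integer code (1, 2, 3, ...) in order of first appearance.

    Returns the encoded list and the value-to-code mapping.
    """
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes) + 1
        encoded.append(codes[value])
    return encoded, codes


# Text columns from the raw data above.
category = ["horror", "romance", "sci-fi", "romance", "fantasy",
            "horror", "cooking", "fantasy", "sci-fi"]
shelf_loc = ["end", "top", "bottom", "bottom", "top",
             "end", "end", "top", "bottom"]

category_codes, category_map = encode_by_first_appearance(category)
shelf_codes, shelf_map = encode_by_first_appearance(shelf_loc)

if __name__ == "__main__":
    print(category_map)
    print(shelf_map)
```

With pandas, `pd.factorize` gives a similar encoding, with codes starting at 0 rather than 1.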

Databricks Feature Store

  • Centralized storage for featurized datasets
  • Easily discover and re-use features for machine learning models
  • Upstream and downstream lineage

Databricks Feature Store

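from databricks import feature_store

# Client for creating and reading feature tables
fs = feature_store.FeatureStoreClient()

# table_name and features_df (a Spark DataFrame) are assumed to be defined earlier
fs.create_table(
    name=table_name,
    primary_keys=["wine_id"],       # column(s) that uniquely identify each row
    df=features_df,                 # data to write into the new feature table
    schema=features_df.schema,
    description="wine features"
)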

Let's practice!
