Packaging ML models

Developing Machine Learning Models for Production

Sinan Ozdemir

Data Scientist, Entrepreneur, and Author

Why packaging matters

  • Optimize performance
  • Ensure compatibility
  • Makes models easier to deploy

Packaging Methods

  • Serialization: simple, lightweight, and language-agnostic
  • Environment packaging: capture entire software environments
  • Containerization: a portable, reproducible, and isolated environment
Developing Machine Learning Models for Production

How to package ML models

  • Serialization - storing and retrieving an ML model

  • Environment packaging - consistent and reproducible environment for the ML Model

  • Containerization - packaging the model, dependencies, and the environment in a single "container"

Developing Machine Learning Models for Production

Serializing scikit-learn models

Serializing an sklearn model using pickle:

import pickle

model = ...  # Train the scikit-learn model

# Serialize the model to a file
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)

# Load the serialized model from the file
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)

Serializing an sklearn model in HDF5 format:

import h5py
import numpy as np
from sklearn.externals import joblib

model = ...  # Train the scikit-learn model

# Serialize the model to an HDF5 file
with h5py.File('model.h5', 'w') as f:
    f.create_dataset('model_weights',
    data=joblib.dump(model))

# Load the serialized model from the HDF5 file
with h5py.File('model.h5', 'r') as f:
    model = joblib.load(f['model_weights'][:])
Developing Machine Learning Models for Production

Serializing PyTorch and Tensorflow models

Serializing a PyTorch model:

import torch

# Train a PyTorch model and store it in a variable
trained_model = ...

# Serialize the trained model to a file
serialized_model_path = 'model.pt'
torch.save(trained_model.state_dict(), serialized_model_path)

# Load the serialized model from a file
loaded_model = ... # Initialize the model
loaded_model.load_state_dict(
    torch.load(serialized_model_path))

Serializing a Tensorflow model:

import tensorflow as tf

# Train a Tensorflow model
trained_model = ...

# Save the trained model to a directory 
saved_model_directory = 'model/'
tf.saved_model.save
    (trained_model, saved_model_directory)

# Load the saved model from the directory 
loaded_model = tf.saved_model.load(
    saved_model_directory)
Developing Machine Learning Models for Production

ML environment packaging with Docker

  • Ensure the model's environment lets it run
  • virtualenv etc. create consistent and reproducible environments
  • Docker containers are self-contained units that are easily deployable

docker

Developing Machine Learning Models for Production

Example Dockerfile

# Use an existing image as the base image
FROM python:3.8-slim

# Set the working directory
WORKDIR /app

# Copy the requirements file to the image
COPY requirements.txt .

# Install the required dependencies
RUN pip install -r requirements.txt

# Copy the ML model and its dependencies to the image
COPY model/ .

# Set the entrypoint to run the model
ENTRYPOINT ["python", "run_model.py"]

<---- Use Python 3.8 base image


<---- Set the working directory


<---- Copy the requirmentes.txt file


<---- Install the model's dependent packages


<---- Copy the model into the continer


<---- Tell the container how to start up
Developing Machine Learning Models for Production

Experiment -> Docker workflow

  1. Serialize the trained ML model using a format such as pickle, HDF5, or PyTorch.

  2. Containerize the serialized ML model, its dependencies, and the environment

  3. Deploy the Docker image to a target environment such as a cloud platform

  4. Run the Docker container from the deployed Docker image and run the ML model.

  5. Use the model within the container from an API or some other access point.

Docker Workflow

Developing Machine Learning Models for Production

Let's practice!

Developing Machine Learning Models for Production

Preparing Video For Download...