Serving modes

MLOps Deployment and Life Cycling

Nemanja Radojkovic

Senior Machine Learning Engineer

model as software

user perspective

service like any other

food service

model service

Serving and serving mode

  • Model serving: providing predictions as a service
  • Serving mode: the specific way that service is implemented

Choose carefully!

when should the service run

scheduled

on demand

batch prediction 1

batch pred 2

batch definition

also known as

Batch prediction: Keep it simple

  • Batch prediction is the simplest serving mode
  • If the use case allows it, choose it
  • Good fit: monthly generation of sales forecasts
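
The monthly sales forecast case above can be sketched as a plain batch job: a scheduled run that scores every record in one pass and stores the results for later use. This is a minimal illustration, not a reference implementation. The `predict_sales` function, the `promo_days` feature, and the store records are hypothetical stand-ins for a real trained model and real input data.

```python
from datetime import date


def predict_sales(features):
    # Hypothetical stand-in for a trained model's predict call.
    return 100.0 + 2.5 * features["promo_days"]


def run_batch_job(records, run_date):
    # Score every record in one pass; nobody is waiting on a single answer.
    predictions = []
    for record in records:
        predictions.append({
            "store_id": record["store_id"],
            "run_date": run_date.isoformat(),
            "forecast": predict_sales(record),
        })
    return predictions


# Hypothetical input; in practice a scheduler would trigger this once a month.
stores = [
    {"store_id": "A", "promo_days": 4},
    {"store_id": "B", "promo_days": 0},
]
forecasts = run_batch_job(stores, date(2024, 1, 1))
```

The job has no latency requirement beyond finishing before the forecasts are needed, which is what makes this mode the simplest to operate.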

on demand 1

on demand synonyms

on demand time importance

tech term

request time

response time

Acceptable latency

What is acceptable?

  • < 1 hour?
  • < 1 minute?
  • < 1 second?
  • < 1 millisecond?
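
Whichever threshold applies, latency here means the time between the request and the response. One rough way to check a service against a chosen budget is to time a single call, as in the sketch below. The helper names `timed_call` and `within_budget` are made up for this illustration, and the lambda stands in for a real prediction call.

```python
import time


def timed_call(func, *args):
    # Latency = response time minus request time, measured around one call.
    start = time.perf_counter()
    result = func(*args)
    latency_seconds = time.perf_counter() - start
    return result, latency_seconds


def within_budget(latency_seconds, budget_seconds):
    # True when the measured latency meets the acceptable-latency budget.
    return latency_seconds <= budget_seconds


# Hypothetical prediction function standing in for a real model service.
result, latency = timed_call(lambda x: x * 2, 21)
meets_one_second_budget = within_budget(latency, budget_seconds=1.0)
```

A single measurement is only indicative; a real check would look at many requests under realistic load.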

Near-real-time prediction a.k.a. stream processing

Acceptable latency: on the order of minutes

Requests and responses form "data streams", hence the name stream processing

Real-time prediction

Acceptable latency < 1 sec

Example:

  • Credit card fraud detection
  • A late prediction is as good as useless

When latency is a priority

  • A weaker but faster model can be more valuable than a stronger but slower one
  • Models are deployed to end-user devices to reduce latency => "edge deployment"
    • ML-infused smartphone apps:
      • navigation apps
      • unlocking via facial recognition
      • image filters
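
The trade-off between model strength and speed can be sketched as a selection rule: among the candidate models, keep only those that meet the latency budget, then take the most accurate of what remains. The model names, accuracy values, and latencies below are invented for illustration, and `pick_model` is a hypothetical helper.

```python
def pick_model(candidates, latency_budget_ms):
    # Discard models that cannot answer within the latency budget.
    eligible = [c for c in candidates if c["latency_ms"] <= latency_budget_ms]
    if not eligible:
        return None
    # Among the models that are fast enough, prefer the most accurate one.
    return max(eligible, key=lambda c: c["accuracy"])


# Made-up numbers: the stronger model is too slow for a 1-second budget.
candidates = [
    {"name": "large_model", "accuracy": 0.95, "latency_ms": 1800},
    {"name": "small_model", "accuracy": 0.91, "latency_ms": 40},
]
chosen = pick_model(candidates, latency_budget_ms=1000)
```

With a 1000 ms budget the weaker model is selected, because the stronger one would respond too late to be useful.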

Let's practice!
