Serving modes

Deployment and Life Cycle in MLOps

Nemanja Radojkovic

Senior Machine Learning Engineer

Model as software

User perspective

A service like any other

Food service

Model service

Serving and serving mode

  • Providing a prediction service == Model serving
  • Implementing a specific type of serving == Serving mode

 

Choose carefully!

When should the service run?

Scheduled

On demand

Batch prediction (part 1)

Batch prediction (part 2)

Batch prediction: definition

Batch prediction: also known as

Batch prediction: Keep it simple

  • Batch prediction is the simplest serving mode
  • If the use case allows it, go for it
  • Good fit: monthly generation of sales forecasts
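
To make "keep it simple" concrete, here is a minimal sketch of a batch job for the monthly sales forecast example. It assumes a hypothetical forecast_next_month "model" (the mean of the last three months) and an in-memory dict of sales history; a real job would load a trained model, read from a data store, and be triggered on a schedule (for example, once a month by cron).

```python
from datetime import date
from statistics import mean


def forecast_next_month(history):
    # Hypothetical "model": mean of the last three months of sales.
    # A real batch job would load a trained model here instead.
    return mean(history[-3:])


def run_batch_job(sales_by_store, run_date):
    """Score every store in a single pass and return all predictions together."""
    return {
        "run_date": run_date.isoformat(),
        "forecasts": {
            store: forecast_next_month(history)
            for store, history in sales_by_store.items()
        },
    }


if __name__ == "__main__":
    # Hypothetical input; in practice this would be read from a database or file.
    sales = {"store_a": [100, 120, 110, 130], "store_b": [50, 55, 60, 65]}
    print(run_batch_job(sales, date(2024, 1, 1)))
```

Nobody waits on the result: the job runs, writes out every forecast at once, and consumers read them later. That is what makes batch prediction the easiest mode to build and operate.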

On-demand prediction (part 1)

On-demand prediction: synonyms

On-demand prediction: why time matters

Technical term

Request time

Response time
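
Latency is the gap between the request time and the response time. Below is a minimal sketch of measuring it around a single prediction. The predict function is a hypothetical stand-in for a real model call. time.perf_counter is used because it is a monotonic clock intended for measuring short intervals.

```python
import time


def predict(x: float) -> float:
    # Hypothetical stand-in for a real model call.
    return 2 * x


def timed_predict(x: float):
    """Return (prediction, latency in seconds), where latency = response time - request time."""
    request_time = time.perf_counter()
    prediction = predict(x)
    response_time = time.perf_counter()
    return prediction, response_time - request_time


if __name__ == "__main__":
    value, latency_s = timed_predict(3.0)
    print(f"prediction={value}, latency={latency_s * 1000:.3f} ms")
```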

Acceptable latency

What is acceptable?

  • < 1 hour?
  • < 1 minute?
  • < 1 second?
  • < 1 millisecond?
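
The answer to "what is acceptable?" drives the choice of serving mode. Here is a rough sketch of that decision as a function. The "< 1 second" cut-off for real-time follows the definition on these slides. The 1-hour boundary between near-real-time and batch is an assumed, illustrative value, not a standard.

```python
from datetime import timedelta

# "< 1 second" matches the real-time definition in these slides.
REAL_TIME_LIMIT = timedelta(seconds=1)
# Assumed, illustrative boundary between near-real-time and batch.
NEAR_REAL_TIME_LIMIT = timedelta(hours=1)


def suggest_serving_mode(acceptable_latency: timedelta) -> str:
    """Return a rough serving-mode suggestion for a given latency budget."""
    if acceptable_latency < REAL_TIME_LIMIT:
        return "real-time"
    if acceptable_latency < NEAR_REAL_TIME_LIMIT:
        return "near-real-time (stream processing)"
    return "batch"


if __name__ == "__main__":
    for budget in (timedelta(milliseconds=1), timedelta(minutes=1), timedelta(days=30)):
        print(budget, "->", suggest_serving_mode(budget))
```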

Near-real-time prediction, a.k.a. stream processing

Acceptable latency: on the order of minutes

The name "stream processing" comes from the fact that the requests and responses form continuous "data streams".
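
One common way to process a stream is micro-batching: group incoming requests into small consecutive batches and score each batch as it arrives. This technique is my choice for illustration, not necessarily the one used in the course. The sketch below is a minimal version of it. The flag_transaction rule (amount above 1000) is a hypothetical stand-in for a trained model, and the stream is simulated with an ordinary iterable.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def micro_batches(requests: Iterable, size: int) -> Iterator[List]:
    """Group an incoming request stream into small consecutive batches."""
    it = iter(requests)
    while batch := list(islice(it, size)):
        yield batch


def flag_transaction(request: dict) -> bool:
    # Hypothetical stand-in for a trained model.
    return request["amount"] > 1000


def process_stream(requests: Iterable[dict], size: int = 2) -> Iterator[Tuple[int, bool]]:
    """Turn a stream of requests into a stream of (request_id, prediction) responses."""
    for batch in micro_batches(requests, size):
        for request in batch:
            yield request["id"], flag_transaction(request)


if __name__ == "__main__":
    incoming = ({"id": i, "amount": a} for i, a in enumerate([10, 2000, 50]))
    for response in process_stream(incoming):
        print(response)
```

Because process_stream is a generator, responses flow out as soon as each micro-batch is scored rather than after the whole input has been collected.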

Real-time prediction

Acceptable latency < 1 sec

Example:

  • Credit card fraud detection
  • A late prediction is as good as useless
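
For fraud detection, a prediction that misses the deadline can be treated as no prediction at all. The following minimal sketch enforces a 1-second budget with a future timeout from Python's concurrent.futures. The fraud_score rule and the simulated_delay_s field are hypothetical and exist only for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Optional

LATENCY_BUDGET_S = 1.0  # "< 1 sec" from the slide


def fraud_score(transaction: dict) -> float:
    # Hypothetical model; "simulated_delay_s" lets us simulate a slow call.
    time.sleep(transaction.get("simulated_delay_s", 0.0))
    return 0.9 if transaction["amount"] > 5000 else 0.1


def score_within_budget(executor: ThreadPoolExecutor, transaction: dict,
                        budget_s: float = LATENCY_BUDGET_S) -> Optional[float]:
    """Return the fraud score, or None if it missed the latency budget."""
    future = executor.submit(fraud_score, transaction)
    try:
        return future.result(timeout=budget_s)
    except FutureTimeout:
        # The worker thread keeps running; its late result is simply discarded.
        return None


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(score_within_budget(pool, {"amount": 9000}))
        print(score_within_budget(pool, {"amount": 9000, "simulated_delay_s": 2.0}))
```

When None comes back, the caller needs a fallback policy, for example a simple rule or a default decision. Which fallback is right depends on the business case.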

When latency is a priority

  • A weaker but faster model can be more valuable than a stronger but slower one
  • Models can be deployed directly to end-user devices to reduce latency ("edge deployment")
    • ML-infused smartphone apps:
      • navigation apps
      • unlocking via facial recognition
      • image filters
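
The trade-off between a weaker, faster model and a stronger, slower one can be made explicit. First discard every model that cannot meet the latency budget, then pick the most accurate of the rest. The sketch below does this. The candidate names, accuracies, and 95th-percentile latencies are hypothetical numbers, and using p95 latency as the yardstick is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Candidate:
    name: str
    accuracy: float        # offline validation metric (hypothetical values below)
    p95_latency_ms: float  # assumed yardstick: 95th-percentile serving latency


def pick_model(candidates: Sequence[Candidate], budget_ms: float) -> Optional[Candidate]:
    """Return the most accurate candidate that fits the latency budget, or None."""
    feasible = [c for c in candidates if c.p95_latency_ms <= budget_ms]
    return max(feasible, key=lambda c: c.accuracy, default=None)


if __name__ == "__main__":
    models = [Candidate("big_ensemble", 0.95, 800.0), Candidate("small_tree", 0.90, 5.0)]
    print(pick_model(models, budget_ms=50))    # the small model wins under a tight budget
    print(pick_model(models, budget_ms=1000))  # the big model wins when latency allows it
```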

Let's practice!