Automation in MLOps deployment strategies

Fully Automated MLOps

Arturo Opsetmoen Amador

Senior Consultant - Machine Learning

Model deployment - prediction service

Image of the MLOps reference architecture with an element highlighted: the prediction service.

Prediction services - modes recap

Predictions can be served in:

  • Batch
  • Streams
  • Real-time
  • On-the-edge

Prediction service - batch serving

  • Large number of predictions
  • Periodic schedules
  • Event-triggered

Figure representing batch processing. All input is delivered to a batch processing engine. All output is delivered simultaneously after batch processing the whole input.
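As an illustration of the batch mode, here is a minimal sketch assuming a toy `predict` function stands in for a trained model; `predict` and `run_batch_job` are hypothetical names. The whole input is collected first, scored in one pass, and all outputs are returned together, as in the figure.

```python
from typing import Callable, Iterable, List


def predict(features: List[float]) -> float:
    """Hypothetical stand-in for a trained model: a fixed linear score."""
    weights = [0.5, -0.25, 1.0]
    return sum(w * x for w, x in zip(weights, features))


def run_batch_job(
    records: Iterable[List[float]],
    model: Callable[[List[float]], float] = predict,
) -> List[float]:
    """Score the whole input at once and deliver all outputs together.

    In practice a scheduler would launch this job periodically (for example
    nightly) or when an event occurs, such as a new input file arriving.
    """
    batch = list(records)  # collect the full input before processing
    return [model(r) for r in batch]  # all predictions are returned in one go


if __name__ == "__main__":
    nightly_input = [[1.0, 2.0, 3.0], [0.0, 4.0, 1.0], [2.0, 0.0, 0.5]]
    print(run_batch_job(nightly_input))
```

Nothing is delivered until the whole batch has been processed, which is what makes this mode suited to large volumes but not to low-latency use cases.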

Prediction service - streaming serving

  • Continuously incoming data
  • Continuously delivered predictions

Figure representing stream processing. Elements are continuously delivered to a stream processing engine. After processing, outputs are continuously delivered to an output stream.
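A minimal sketch of streaming serving, assuming a plain Python iterator stands in for the incoming stream (in production this would typically be a message broker topic) and a toy `predict` function stands in for the model; both names are hypothetical. Each prediction is yielded as soon as its input has been processed, without waiting for the rest of the stream.

```python
from typing import Iterable, Iterator, Tuple


def predict(value: float) -> float:
    """Hypothetical stand-in for a trained model."""
    return 2.0 * value + 1.0


def stream_predictions(
    events: Iterable[Tuple[str, float]],
) -> Iterator[Tuple[str, float]]:
    """Consume events one by one and emit a prediction for each immediately.

    Unlike a batch job, this generator never needs the full input: it can run
    on an unbounded stream and deliver outputs continuously.
    """
    for event_id, value in events:
        yield event_id, predict(value)


if __name__ == "__main__":
    incoming = iter([("e1", 0.5), ("e2", 3.0), ("e3", -1.0)])
    for event_id, prediction in stream_predictions(incoming):
        print(event_id, prediction)
```

Using a generator keeps the sketch lazy, so it works the same way whether the stream has three events or never ends.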

Prediction service - real time

  • Single record
  • Instant predictions

Figure representing a real-time interaction.
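A minimal sketch of a real-time handler, assuming the model is already loaded in memory and that an HTTP framework (omitted here) would call `handle_request` once per incoming request; `predict` and `handle_request` are hypothetical names. A single record goes in and one prediction comes straight back, together with the time the handler spent on it.

```python
import time
from typing import Dict, List


def predict(features: List[float]) -> float:
    """Hypothetical stand-in for a model already loaded in memory."""
    return sum(features) / len(features)


def handle_request(request: Dict[str, List[float]]) -> Dict[str, float]:
    """Serve one record synchronously, as a real-time endpoint would.

    The caller waits for this response, so the work per request must stay small.
    """
    start = time.perf_counter()
    features = request["features"]
    if not features:
        raise ValueError("request must contain at least one feature")
    prediction = predict(features)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return {"prediction": prediction, "latency_ms": latency_ms}


if __name__ == "__main__":
    print(handle_request({"features": [1.0, 2.0, 3.0]}))
```

Measuring latency inside the handler is one simple way to check the "instant predictions" requirement; a real service would usually also measure it at the load balancer.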

Prediction service - on the edge

  • Mobile devices
  • IoT devices
  • Reduced latency

Image of an IoT application running on a tablet device.

Deployment strategies

The ML serving type determines how we should deploy and update our prediction services.

Model deployment strategies include:

  • Shadow deployment
  • Canary deployment
  • A/B testing
  • Blue/Green deployment

A/B testing

Figure representing A/B testing in a deployed prediction service. Prediction requests are sent to a load balancer, which distributes them between models A and B. The performance of both models is continuously monitored.
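The load balancer in the figure can be sketched as a weighted random router that also keeps per-model monitoring counters. This is a minimal sketch under stated assumptions: `ABRouter` is a hypothetical class, mean absolute error is an assumed monitoring metric, and the true label is assumed to be known at request time, which in real systems usually arrives later.

```python
import random
from collections import defaultdict
from typing import Callable, Dict


class ABRouter:
    """Hypothetical load balancer splitting requests between model variants."""

    def __init__(self, models: Dict[str, Callable[[float], float]],
                 weights: Dict[str, float], seed: int = 0) -> None:
        self.models = models
        self.weights = weights
        self.rng = random.Random(seed)  # seeded so the split is reproducible
        self.requests: Dict[str, int] = defaultdict(int)
        self.abs_error: Dict[str, float] = defaultdict(float)

    def route(self, x: float, y_true: float) -> float:
        """Send one request to a variant chosen at random according to its weight."""
        names = list(self.models)
        name = self.rng.choices(names, weights=[self.weights[n] for n in names])[0]
        y_pred = self.models[name](x)
        # Monitoring: count traffic and accumulate the error per variant.
        # Assumes y_true is available immediately (a simplification).
        self.requests[name] += 1
        self.abs_error[name] += abs(y_pred - y_true)
        return y_pred

    def mean_abs_error(self, name: str) -> float:
        """Average absolute error observed so far for one variant."""
        return self.abs_error[name] / max(self.requests[name], 1)


if __name__ == "__main__":
    router = ABRouter(
        models={"A": lambda x: 2.0 * x + 0.5, "B": lambda x: 2.0 * x},
        weights={"A": 0.5, "B": 0.5},
    )
    for x in range(1000):
        router.route(float(x), y_true=2.0 * x)  # ground truth matches model B
    for name in ("A", "B"):
        print(name, router.requests[name], router.mean_abs_error(name))
```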

A/B testing

Figure of an A/B testing deployment. After both models have been monitored long enough, model B has shown better performance, so the system shifts all requests to model B, the best-performing model.
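The decision to shift all traffic can be sketched as a function that turns monitored metrics into new routing weights. Here "long enough" is represented by a hypothetical `min_requests` threshold and lower mean error is assumed to mean better performance; `promote_best` is an illustrative name. A production setup would typically also check statistical significance before switching.

```python
from typing import Dict


def promote_best(mean_errors: Dict[str, float], requests: Dict[str, int],
                 min_requests: int = 500) -> Dict[str, float]:
    """Return new traffic weights based on the monitored errors.

    Until every variant has served at least ``min_requests`` requests, traffic
    stays split equally; afterwards all traffic goes to the lowest-error model.
    """
    if any(requests.get(name, 0) < min_requests for name in mean_errors):
        return {name: 1.0 / len(mean_errors) for name in mean_errors}
    best = min(mean_errors, key=mean_errors.get)
    return {name: (1.0 if name == best else 0.0) for name in mean_errors}


if __name__ == "__main__":
    errors = {"A": 0.5, "B": 0.1}
    print(promote_best(errors, {"A": 120, "B": 130}))  # too early: equal split
    print(promote_best(errors, {"A": 620, "B": 640}))  # enough data: all to B
```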

Shadow deployment

Figure of a shadow deployment. Prediction requests are sent to a load balancer, which delivers each request to both the live and the shadow model. Only the live model returns predictions to the caller. The performance of both models is continuously monitored.
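A minimal sketch of the shadow pattern, assuming both models are plain Python callables; `ShadowDeployment` is a hypothetical class. Every request is scored by both models, but only the live prediction is returned, while the shadow output is stored for later comparison. A failure in the shadow model is logged and never affects the live response.

```python
import logging
from typing import Callable, List, Tuple

log = logging.getLogger(__name__)


class ShadowDeployment:
    """Hypothetical router: both models score every request, only live answers."""

    def __init__(self, live: Callable[[float], float],
                 shadow: Callable[[float], float]) -> None:
        self.live = live
        self.shadow = shadow
        # (input, live prediction, shadow prediction) kept for monitoring.
        self.comparisons: List[Tuple[float, float, float]] = []

    def handle(self, x: float) -> float:
        live_pred = self.live(x)
        try:
            shadow_pred = self.shadow(x)
            # The shadow output is only recorded, never returned to the caller.
            self.comparisons.append((x, live_pred, shadow_pred))
        except Exception:
            # A failing shadow model must not affect the live response.
            log.exception("shadow model failed for input %r", x)
        return live_pred


if __name__ == "__main__":
    deployment = ShadowDeployment(live=lambda x: x + 1.0, shadow=lambda x: x + 1.5)
    print(deployment.handle(2.0))  # the caller only sees the live output
    print(deployment.comparisons)
```

This sketch calls the shadow model synchronously for simplicity; a real service would often call it asynchronously so that it adds no latency to the live path.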

Blue/Green deployment

Figure of a blue/green deployment. A model is live in production in the blue environment.

Blue/Green deployment

An updated model is deployed to a replica of the blue environment; this replica is the green environment. A traffic switch module then automatically starts redirecting prediction requests from the blue to the green environment.

Blue/Green deployment

Figure of a blue/green deployment. All traffic is gradually and automatically switched from the blue to the green environment.

Blue/Green deployment

After all traffic is switched, the blue environment can be deleted. The green environment with the updated model becomes the new production environment.
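The blue/green steps above can be sketched as a traffic switch that routes a growing share of requests to green and only allows blue to be deleted once green serves all traffic. This is a minimal sketch: `TrafficSwitch` and the rollout schedule are hypothetical, and the monitoring checks that would gate each step in an automated pipeline are only indicated by a comment.

```python
import random
from typing import Callable, Dict


class TrafficSwitch:
    """Hypothetical switch between a blue (current) and green (updated) environment."""

    def __init__(self, blue: Callable[[float], float],
                 green: Callable[[float], float], seed: int = 0) -> None:
        self.envs: Dict[str, Callable[[float], float]] = {"blue": blue, "green": green}
        self.green_share = 0.0  # fraction of requests routed to green
        self.rng = random.Random(seed)

    def handle(self, x: float) -> float:
        """Route one prediction request according to the current share."""
        env = "green" if self.rng.random() < self.green_share else "blue"
        return self.envs[env](x)

    def set_green_share(self, share: float) -> None:
        """Move traffic between environments; 0.0 acts as a rollback to blue."""
        if not 0.0 <= share <= 1.0:
            raise ValueError("share must be between 0 and 1")
        if "blue" not in self.envs and share < 1.0:
            raise RuntimeError("blue has been retired")
        self.green_share = share

    def retire_blue(self) -> None:
        """Delete blue once green serves all traffic; green becomes production."""
        if self.green_share < 1.0:
            raise RuntimeError("blue still receives traffic")
        del self.envs["blue"]


if __name__ == "__main__":
    switch = TrafficSwitch(blue=lambda x: x, green=lambda x: 2.0 * x)
    for share in (0.25, 0.5, 0.75, 1.0):  # hypothetical rollout schedule
        switch.set_green_share(share)
        # In an automated pipeline, monitoring checks would gate each step here.
        print(share, [switch.handle(1.0) for _ in range(8)])
    switch.retire_blue()
    print(sorted(switch.envs))
```

Because blue stays intact until `retire_blue` is called, rolling back during the switch only means setting the green share back to 0.0.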

Deploying and updating prediction services

The model type determines the deployment strategy.

Table comparing the deployment strategies along four dimensions: no downtime, condition-based, rollback time, and additional costs.

Let's practice!
