Automation in MLOps deployment strategies

Fully Automated MLOps

Arturo Opsetmoen Amador

Senior Consultant - Machine Learning

Model deployment - prediction service

Image of the MLOps reference architecture with an element highlighted: the prediction service.

Prediction services - modes recap

Predictions can be served in:

Batch

Streams

Real-time

On-the-edge

Prediction service - batch serving

Large number of predictions

Periodic schedules

Event triggered

Figure representing batch processing. All input is delivered to a batch processing engine. All output is delivered simultaneously after batch processing the whole input.
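A minimal Python sketch of batch serving, assuming the model is any callable that maps one record to one prediction. The names predict_batch and double are hypothetical and only illustrate the pattern: the whole input is scored in one run, and all outputs are returned together.

```python
from typing import Callable, Iterable, List


def predict_batch(model: Callable[[float], float], records: Iterable[float]) -> List[float]:
    """Score every record in the batch and return all predictions at once."""
    return [model(record) for record in records]


def double(x: float) -> float:
    # Stand-in for a trained model; purely illustrative.
    return 2 * x


# In practice this call would be started by a schedule or an event trigger.
batch_predictions = predict_batch(double, [1.0, 2.0, 3.0])
print(batch_predictions)
```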

Prediction service - streaming serving

Continuously incoming data

Continuously delivered predictions

Figure representing stream processing. Elements are continuously delivered to a stream processing engine. After processing, outputs are continuously delivered to an output stream.
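A minimal Python sketch of streaming serving, assuming events arrive through an iterator and the model is a plain callable. The names predict_stream and sensor_events are hypothetical. A generator emits each prediction as soon as its input arrives instead of waiting for the full input.

```python
from typing import Callable, Iterable, Iterator


def predict_stream(model: Callable[[float], object], events: Iterable[float]) -> Iterator[object]:
    """Yield one prediction per incoming event, as soon as the event arrives."""
    for event in events:
        yield model(event)


def sensor_events() -> Iterator[float]:
    # Stand-in for a message queue or event stream; purely illustrative.
    for reading in (0.5, 1.5, 2.5):
        yield reading


# Each prediction is delivered before the next event is read.
for prediction in predict_stream(lambda reading: reading > 1.0, sensor_events()):
    print(prediction)
```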

Prediction service - real time

Single record

Instant predictions

Prediction service - on the edge

Mobile devices

IoT devices

Reduced latency

Image of an IoT application running on a tablet.

Deployment strategies

The ML serving type determines how we should deploy and update our prediction services

Model deployment strategies include:

Shadow deployment
Canary deployment
A/B testing
Blue/Green

A/B testing

Figure representing A/B testing in a deployed prediction service. Prediction requests are sent to a load balancer. The load balancer distributes requests between models A and B. The performance of both models is continuously monitored.

A/B testing

Figure of an A/B testing deployment. After the performance of both models has been monitored long enough, model B has shown better performance. The system shifts all requests to model B, the better-performing model.
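The A/B flow in the two figures can be sketched in Python as follows. This is an illustration under stated assumptions, not a production load balancer: the class ABRouter, its share_b and min_samples parameters, and the use of mean error as the monitored metric are all hypothetical choices.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple


class ABRouter:
    """Toy load balancer that splits requests between models "A" and "B"."""

    def __init__(self, model_a: Callable[[float], float], model_b: Callable[[float], float],
                 share_b: float = 0.5, seed: int = 0) -> None:
        self.models: Dict[str, Callable[[float], float]] = {"A": model_a, "B": model_b}
        self.share_b = share_b              # fraction of requests sent to model B
        self.rng = random.Random(seed)      # seeded so the sketch is reproducible
        self.errors: Dict[str, List[float]] = {"A": [], "B": []}
        self.winner: Optional[str] = None

    def predict(self, x: float) -> Tuple[str, float]:
        """Route one request and return (model name, prediction)."""
        if self.winner is not None:
            name = self.winner              # after promotion, all traffic goes to the winner
        else:
            name = "B" if self.rng.random() < self.share_b else "A"
        return name, self.models[name](x)

    def record_error(self, name: str, error: float) -> None:
        """Monitoring hook: store the observed error for the model that answered."""
        self.errors[name].append(error)

    def promote_best(self, min_samples: int = 100) -> Optional[str]:
        """Promote the model with the lowest mean error once both have enough samples."""
        if any(len(errs) < min_samples for errs in self.errors.values()):
            return None
        mean_error = {name: sum(errs) / len(errs) for name, errs in self.errors.items()}
        self.winner = min(mean_error, key=mean_error.get)
        return self.winner
```

A real system would usually apply a statistical test before promoting a model; the sample-count threshold here only stands in for "monitored long enough".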

Shadow deployment

Figure of a shadow deployment. Prediction requests are sent to a load balancer. The load balancer delivers requests to both the live and shadow models. Only the live model delivers predictions back. The performance of both models is continuously monitored.
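A minimal Python sketch of the shadow pattern, assuming both models are plain callables. The class ShadowRouter and its log attribute are hypothetical names. Every request reaches both models, only the live prediction is returned to the caller, and both outputs are logged for monitoring.

```python
from typing import Callable, List, Optional, Tuple


class ShadowRouter:
    """Send each request to the live and shadow models; answer only with the live one."""

    def __init__(self, live: Callable[[float], float], shadow: Callable[[float], float]) -> None:
        self.live = live
        self.shadow = shadow
        # Each entry is (request, live prediction, shadow prediction or None).
        self.log: List[Tuple[float, float, Optional[float]]] = []

    def predict(self, x: float) -> float:
        live_prediction = self.live(x)
        try:
            shadow_prediction: Optional[float] = self.shadow(x)
        except Exception:
            # A failing shadow model must never affect the response to the caller.
            shadow_prediction = None
        self.log.append((x, live_prediction, shadow_prediction))
        return live_prediction
```

In this sketch the shadow model runs in the same call for simplicity; a real service would more likely call it asynchronously so it cannot add latency to the live response.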

Blue/Green deployment

Figure of a blue/green deployment. A model is live in production in the blue environment.

Blue/Green deployment

An updated model is deployed to a replica of the blue environment. This replica is the green environment. Prediction requests start to be redirected from the blue to the green environment automatically by a traffic switch module.

Blue/Green deployment

Figure of a blue/green deployment. All traffic is gradually and automatically switched from the blue to the green environment.

Blue/Green deployment

After all traffic is switched, the blue environment can be deleted. The green environment with the updated model becomes the new production environment.
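The traffic switch module in the blue/green slides can be sketched in Python as below. This is a simplified illustration: the class TrafficSwitch, the shift step size, and the rollback method are assumptions, and each environment is represented by a callable rather than a real deployment.

```python
import random
from typing import Callable, Tuple


class TrafficSwitch:
    """Gradually move prediction requests from the blue to the green environment."""

    def __init__(self, blue: Callable[[float], object], green: Callable[[float], object],
                 seed: int = 0) -> None:
        self.environments = {"blue": blue, "green": green}
        self.green_share = 0.0            # fraction of traffic sent to green
        self.rng = random.Random(seed)    # seeded so the sketch is reproducible

    def shift(self, step: float = 0.25) -> float:
        """Send a larger share of traffic to green, capped at 100%."""
        self.green_share = min(1.0, self.green_share + step)
        return self.green_share

    def rollback(self) -> None:
        """Return all traffic to blue; only possible while blue still exists."""
        self.green_share = 0.0

    def predict(self, x: float) -> Tuple[str, object]:
        """Route one request and return (environment name, prediction)."""
        name = "green" if self.rng.random() < self.green_share else "blue"
        return name, self.environments[name](x)
```

Once green_share reaches 1.0, the blue environment no longer receives requests and can be deleted, after which a rollback would require redeploying the old model.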

Deploying and updating prediction services

Model type determines deployment strategy

Table of deployment strategies and their properties in four dimensions: no downtime, condition-based, rollback time, and additional costs.

Let's practice!
