Orchestrating AI Deployment: Azure Pipelines Meets VMware ECS
Introduction
This post shows how Azure Pipelines can orchestrate controlled, stage-gated deployment of an AI application across a fleet of VMware Edge Compute Stack (ECS) sites, using Git as the single source of truth for each site's configuration.
Edge AI Challenge
AI applications at the edge offer significant advantages, including local data processing for real-time analysis, reduced latency, offline operation, and scalability without overwhelming central infrastructure. Regular retraining of models allows continual incremental accuracy improvements, and techniques such as quantization, pruning, and knowledge distillation help reduce model size and improve inference performance with minimal accuracy loss. However, the software lifecycle management of applications across a large estate of distributed edge compute environments presents several significant challenges:
- Ensuring compatibility across different operating systems and software environments.
- Edge environments can have limited or unreliable network connectivity.
- Edge sites often have a shortage of on-site IT expertise.
To address these challenges, organizations are turning to specialized edge platforms and tools that offer features like:
- Pull-based architectures for efficient management of distributed sites.
- Zero-touch provisioning and lifecycle management.
- Edge-optimized runtimes for both virtualized and containerized applications.
- Integrated telemetry for visibility into application and platform behaviors.
Deploying large-scale AI model updates to thousands of edge sites in an uncontrolled fashion is risky: a bad update can cause widespread system failures or performance degradation. Without proper testing, updates could introduce new biases or errors into AI decision-making, and without rollback capabilities it is difficult to address problems quickly. To manage these risks, organizations are deploying continuous delivery pipelines with testing stage gates.
GitOps
GitOps is an operational model for cloud-native applications that uses Git as the single source of truth for declarative infrastructure and applications. Its core principles are:
- Infrastructure-as-Code (IaC) stored in Git repositories
- Automated synchronization between Git state and cluster state
- Declarative descriptions of the desired system state
- Continuous deployment and reconciliation
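To make this concrete, here is a minimal illustration of a declarative desired-state manifest as it might be stored in Git (a hypothetical Kubernetes Deployment, not taken from my environment); a GitOps agent continuously compares the cluster against this file and reconciles any drift:

# Hypothetical desired-state manifest stored in Git.
# The GitOps agent pulls this file and reconciles the cluster to match it.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: face-detector
spec:
  replicas: 2                 # desired state is declared, not applied imperatively
  selector:
    matchLabels:
      app: face-detector
  template:
    metadata:
      labels:
        app: face-detector
    spec:
      containers:
        - name: face-detector
          image: registry.example.com/face-detector:1.0.0   # rolling out a new version is just a new commit

Rolling out a change means committing a new image tag; rolling back means reverting the commit.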
Azure DevOps Pipeline
Azure Pipelines is a cloud-based service provided by Microsoft as part of the Azure DevOps suite, designed to automate the building, testing, and deployment of code projects. It enables controlled, stage-gated deployments with approvals and gates, allowing thorough testing in controlled environments before wider rollout. It also provides visibility into the deployment process, which is essential for managing and troubleshooting across distributed edge sites.
VMware Edge Compute Stack (ECS)
VMware Edge Compute Stack (ECS) is an edge-optimized runtime and orchestration platform designed for managing edge applications and infrastructure across multiple distributed sites. The edge-optimized runtime is managed by VMware Edge Cloud Orchestrator (VECO), which enables a GitOps approach to managing edge infrastructure and applications, providing version control and auditing of changes to edge environments. VECO implements zero-touch provisioning with a pull-based model suitable for edge sites with limited IT skills and intermittent connectivity.
In VECO, a site can be used to manage a number of edge hosts in the same location; these hosts can share a common set of configuration files and can be visualized and troubleshot as a single location. A site can be associated with a GitHub repository, branch, and/or folder. In my environment I associate each site with a site folder, one each for the qa, uat, and production environments. Within each folder I have:
- configureHost.yaml - Defines the configuration to be applied to the ESXi host
- face-detector.yaml - Defines the configuration of the AI inference application
- vm.yaml - Defines the configuration of other VM workloads to be deployed to the site
- metallb.yaml - Defines the configuration of the load balancer that exposes the AI inference application's Kubernetes service to the outside network (see the sketch below)
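For illustration, a minimal metallb.yaml could look like the following, assuming the standard MetalLB IPAddressPool and L2Advertisement resources; the address range shown is hypothetical and would be site-specific:

# Hypothetical MetalLB configuration: allocates an IP range for
# LoadBalancer services and advertises it on the local L2 network.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: edge-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.240-192.168.1.250   # assumed range on the site's local network
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: edge-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - edge-pool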
The file and folder structure for the sites and associated declarative configuration files looks something like:
sites
├── qa
│   ├── configureHost.yaml
│   ├── face-detector.yaml
│   ├── vm.yaml
│   └── metallb.yaml
├── uat
│   ├── configureHost.yaml
│   ├── face-detector.yaml
│   ├── vm.yaml
│   └── metallb.yaml
└── production
    ├── configureHost.yaml
    ├── face-detector.yaml
    ├── vm.yaml
    └── metallb.yaml
Computer Vision Pipeline
A while ago I created a computer vision demonstration application; the approach is documented here. It was designed so that simple changes to the code are reflected visually in the running application: there is the ability to simulate a change to the AI model, drawing a green box around either the full face or just the eyes, and the ability to simulate other changes to the app, such as the web app background colour or text. The source code for this application is stored in a GitHub repository. There are various other assets in this repository; all face detector application files are within the folder face-detector. As such, the first part of the pipeline watches for commits to files in the face-detector folder on the main repository branch.
trigger:
  branches:
    include:
      - main
  paths:
    include:
      - face-detector/*
An Azure Pipelines agent is a software component that runs on a virtual server in the Azure cloud or on a physical device, and is responsible for executing the tasks defined in an Azure DevOps pipeline. There is a cost to using Azure cloud agents, and I have a perfectly good laptop which can run the pipeline agent software.
Any software required by the pipeline needs to be available on the agent. My pipeline builds Docker images, so the Docker Desktop engine must be running. On my Arm-based Mac the Docker engine socket is in a non-standard location, and with the default configuration the Azure pipeline fails because it cannot find the socket. In this case it is possible to set an environment variable to the correct location before starting the agent:
export DOCKER_HOST=unix://$HOME/.docker/run/docker.sock
yq is a lightweight and portable command-line tool for processing YAML, JSON, XML, CSV, and other structured data formats, and is particularly useful for updating YAML files. My pipeline uses yq to update the infrastructure-as-code YAML files, so I needed to install it on macOS:
brew install yq
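As an example of the kind of edit the pipeline makes (the YAML field path and image name here are assumptions for illustration, not my exact files), yq can update the image tag in a site's face-detector.yaml in place:

yq -i '.spec.template.spec.containers[0].image = "myregistry/face-detector:1.0.1"' sites/qa/face-detector.yaml

Once the change is committed and pushed, VECO pulls it and reconciles the site to the new desired state.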
The next section of the pipeline defines the agent pool, selecting my laptop rather than a cloud instance.
pool:
  name: Default
  demands:
    - agent.name -equals Darryls-MacBook-Pro
With the base pipeline attributes defined, we can look at the stages. A stage is a logical boundary in an Azure DevOps pipeline; stages can be used to group actions, and each stage contains one or more jobs. In this pipeline I have the following stages (a skeleton follows the list):
- Build and push a multi-arch Docker image
- Update the QA environment
- Approval stage gate for QA
- Update the UAT environment
- Approval stage gate for UAT
- Update the production environment
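A simplified skeleton of this layout might look like the following (a sketch rather than my full pipeline; note that in Azure Pipelines the approvals themselves are configured as checks on the qa, uat, and production environments in the Azure DevOps portal, not in the YAML):

stages:
  - stage: Build
    jobs:
      - job: BuildPush
        steps:
          # Build and push a multi-arch image (assumes docker buildx is available on the agent)
          - script: docker buildx build --platform linux/amd64,linux/arm64 -t myregistry/face-detector:$(Build.BuildId) --push face-detector
  - stage: QA
    dependsOn: Build
    jobs:
      - deployment: UpdateQA
        environment: qa               # approvals on this environment act as the stage gate
        strategy:
          runOnce:
            deploy:
              steps:
                - script: echo "update sites/qa YAML with yq, commit and push"
  - stage: UAT
    dependsOn: QA
    jobs:
      - deployment: UpdateUAT
        environment: uat
        strategy:
          runOnce:
            deploy:
              steps:
                - script: echo "update sites/uat YAML with yq, commit and push"
  - stage: Production
    dependsOn: UAT
    jobs:
      - deployment: UpdateProduction
        environment: production
        strategy:
          runOnce:
            deploy:
              steps:
                - script: echo "update sites/production YAML with yq, commit and push"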
Bringing this together: Azure Pipelines orchestrates the whole deployment process, managing the VMware ECS site YAML files in Git so that edge deployments are automated, consistent, and controlled. VMware ECS is the deployment target, with each site pulling its declarative configuration via VECO. The YAML files define the desired state of each ECS site, and the stage gates control the flow of a change from one environment to the next.
Pipeline Structure
The pipeline is structured as a build stage followed by three deployment stages: QA, UAT, and Production. Each deployment stage updates the YAML files in the corresponding site folder, and approval stage gates between the environments ensure that a change must be verified in QA and UAT before it reaches production.
Desired State - YAML Configuration Management
Each environment has its own folder of YAML configuration files under sites (qa, uat, and production), so the desired state of every environment is versioned and promoted independently. The pipeline applies the same change to each environment in turn by updating the relevant file with yq, which keeps the configurations consistent while still allowing environment-specific values to differ.
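One way to avoid duplicating that update logic across the three stages is a step template parameterized by environment (a hypothetical sketch; update-site.yaml and its parameters are illustrative, not files from my repository):

# update-site.yaml - hypothetical reusable step template
parameters:
  - name: environment        # qa, uat, or production
    type: string
steps:
  - checkout: self
    persistCredentials: true    # allow the script to push back to the repo
  - script: |
      yq -i '.spec.template.spec.containers[0].image = "myregistry/face-detector:$(Build.BuildId)"' sites/${{ parameters.environment }}/face-detector.yaml
      git commit -am "Promote face-detector $(Build.BuildId) to ${{ parameters.environment }}"
      git push origin main
    displayName: Update ${{ parameters.environment }} site configuration

Each deployment stage would then include the template with its environment parameter, for example - template: update-site.yaml with environment set to qa.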