CI/CD Pipelines for ML: Automating the MLOps Lifecycle
In the first article of this series, we explored why 85% of ML projects never reach production and how MLOps solves this problem. We transformed a monolithic notebook into a modular, configurable pipeline. Now it is time to take the next step: automating the entire pipeline with CI/CD, so that every change to code, data, or configuration automatically triggers training, validation, and model deployment.
In this article, we will build a complete CI/CD pipeline for machine learning using GitHub Actions as the orchestrator and Docker as the execution runtime. We will go beyond theory and build a fully functional project with a sentiment classifier, complete with multi-stage Dockerfiles, YAML workflows, data validation, model registry integration, and automated deployment.
What You Will Learn
- Why CI/CD for ML differs fundamentally from traditional software CI/CD
- How to design the architecture of an end-to-end ML pipeline
- How to write optimized multi-stage Dockerfiles for training and serving
- How to write complete GitHub Actions workflows for ML
- How to integrate data validation, model registry, and automated deployment
- How to manage data versioning with DVC inside the pipeline
- How to implement ML-specific testing (unit, integration, smoke)
- How to monitor the model after deployment
- How to reduce costs with caching and self-hosted runners
- How to choose the right CI/CD tool for your team
Why CI/CD for ML Is Different
If you come from traditional software development, you might assume that standard CI/CD practices translate directly to ML projects. In reality, machine learning introduces unique complexities that require a purpose-built approach. The fundamental difference is that traditional CI/CD manages a single artifact (source code), while ML CI/CD must manage three simultaneously: code, data, and model.
The Three ML Artifacts
In traditional software, if the code does not change, the output does not change. In ML, even with identical code, a change in the data produces a different model. This means the CI/CD pipeline must track and validate three independent dimensions.
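To make the three dimensions concrete, a model's identity can be thought of as a fingerprint over all three. A minimal stdlib sketch (the `fingerprint` helper and its inputs are illustrative, not part of the project):

```python
import hashlib
import json

def fingerprint(code: str, data: bytes, config: dict) -> str:
    """Hash code, data, and configuration together: if any one of the
    three changes, the resulting model is a different artifact."""
    h = hashlib.sha256()
    h.update(code.encode())
    h.update(data)
    h.update(json.dumps(config, sort_keys=True).encode())
    return h.hexdigest()[:12]

code = "def train(X, y): ..."
config = {"lr": 0.01, "epochs": 10}

v1 = fingerprint(code, b"day-1 observations", config)
v2 = fingerprint(code, b"day-2 observations", config)  # same code, new data
print(v1 != v2)  # a data change alone yields a new model identity
```

This is exactly why tools like DVC and MLflow track data and parameters alongside the Git commit: any of the three inputs changing invalidates the cached model.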
| Dimension | Traditional CI/CD | ML CI/CD |
|---|---|---|
| Code | Git push triggers build + test | Git push triggers training + evaluation |
| Data | Not applicable | New data triggers retraining |
| Model | Not applicable | New model requires validation + promotion |
| Configuration | Feature flags, env vars | Hyperparameters, feature sets, metric thresholds |
| Environment | OS + libraries | OS + libraries + GPU drivers + CUDA version |
| Validation | Tests pass or fail | Metrics above/below threshold + comparison with production model |
| Deployment | Deploy or rollback | Gradual rollout + A/B test + drift monitoring |
Continuous Training: The Key Concept
CI/CD for ML introduces a concept absent from traditional software engineering: Continuous Training (CT). Beyond Continuous Integration and Continuous Deployment, CT ensures the model is automatically retrained whenever:
- New data arrives: the dataset is updated with new observations
- Code changes: preprocessing logic or the algorithm is modified
- Metrics degrade: monitoring detects data drift or performance drop
- A timer fires: scheduled retraining (e.g., weekly) is triggered
Common Mistake: CI/CD Without CT
Many teams implement CI/CD for ML code but forget Continuous Training. The result is a model that gets deployed once and then never updated, silently degrading over time as production data diverges from training data. A pipeline without CT is like a car without maintenance: it works until it breaks down.
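The four retraining triggers above can be folded into a single decision function that a scheduler or monitoring job calls. A minimal sketch; the threshold values and argument names are illustrative, not a fixed API:

```python
from datetime import datetime, timedelta, timezone

def should_retrain(new_data: bool,
                   code_changed: bool,
                   drift_score: float,
                   last_trained: datetime,
                   max_age: timedelta = timedelta(days=7),
                   drift_threshold: float = 0.3) -> tuple[bool, str]:
    """Return (retrain?, reason) based on the four CT triggers."""
    if new_data:
        return True, "new data"
    if code_changed:
        return True, "code change"
    if drift_score > drift_threshold:
        return True, f"drift {drift_score:.2f} > {drift_threshold}"
    if datetime.now(timezone.utc) - last_trained > max_age:
        return True, "scheduled retraining"
    return False, "model is fresh"

decision, reason = should_retrain(
    new_data=False, code_changed=False, drift_score=0.45,
    last_trained=datetime.now(timezone.utc) - timedelta(days=2))
print(decision, reason)  # True drift 0.45 > 0.3
```

In the workflows below, the first two triggers map to `push` events and the last one to the `schedule` cron; the drift trigger usually arrives from the monitoring system via a `workflow_dispatch` or `repository_dispatch` call.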
ML Pipeline Architecture
Before writing code, let us design the complete architecture. Each phase has specific inputs and outputs, and failure in one phase blocks all downstream phases. This "fail fast" approach ensures that only validated models reach production.
+------------------+ +------------------+ +------------------+
| DATA INGESTION |---->| PREPROCESSING |---->| TRAINING |
| | | | | |
| - Pull DVC data | | - Data cleaning | | - Train model |
| - Validation | | - Feature eng. | | - Log metrics |
| - Schema check | | - Train/val/test | | - Log params |
| | | split | | - Save artifacts |
+------------------+ +------------------+ +------------------+
| |
| (trigger: new data |
| or schedule) v
| +------------------+
| | EVALUATION |
| | |
| | - Metrics |
| | - Compare with |
| | production |
| | - Gate: thresholds|
| +------------------+
| |
| (if metrics > threshold)
| v
+------------------+ +------------------+ +------------------+
| MONITORING |<----| SMOKE TEST |<----| DEPLOYMENT |
| | | | | |
| - Health check | | - Test endpoint | | - Push registry |
| - Drift detect. | | - Sample predict | | - Stage/Prod |
| - Alert | | - Latency check | | - Rollback ready |
| - Trigger retrain| | | | |
+------------------+ +------------------+ +------------------+
Each block corresponds to a step in the GitHub Actions workflow. Let us now look at how to implement each phase, starting with containerization using Docker.
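Before jumping into the tooling, the fail-fast chaining itself can be sketched in a few lines of Python (the stage names mirror the diagram; the lambda bodies are placeholders for the real phase logic):

```python
from typing import Callable

def run_pipeline(stages: list[tuple[str, Callable[[], bool]]]) -> list[str]:
    """Run stages in order; a failing stage blocks everything downstream."""
    completed = []
    for name, stage in stages:
        if not stage():
            print(f"{name} failed -- stopping pipeline (fail fast)")
            break
        completed.append(name)
    return completed

# Placeholder stages: evaluation fails, so deployment never runs
stages = [
    ("ingestion", lambda: True),
    ("preprocessing", lambda: True),
    ("training", lambda: True),
    ("evaluation", lambda: False),   # e.g. accuracy below threshold
    ("deployment", lambda: True),
]
print(run_pipeline(stages))  # ['ingestion', 'preprocessing', 'training']
```

GitHub Actions gives us this behavior for free through job-level `needs:` dependencies, which is one reason it maps so naturally onto ML pipelines.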
Docker for Machine Learning
Docker solves one of the most frustrating problems in ML: "it works on my machine". By containerizing the training and serving environments, we guarantee that code produces identical results wherever it runs: on the data scientist's laptop, in the CI/CD runner, and in production. For ML, Docker requires special attention: images tend to be very large (scientific libraries + GPU drivers) and builds can be slow.
Base Images for ML
Choosing the right base image is critical for both size and compatibility. Here are the main options and when to use each.
| Base Image | Size | Use | When to Choose |
|---|---|---|---|
| python:3.11-slim | ~120 MB | CPU Training/Serving | scikit-learn, XGBoost, lightweight serving |
| python:3.11-bookworm | ~900 MB | Training with build tools | Dependencies that require C/C++ compilation |
| nvidia/cuda:12.1-runtime | ~3.5 GB | GPU Inference | Deep learning model serving |
| nvidia/cuda:12.1-devel | ~5.2 GB | GPU Training | PyTorch/TensorFlow training with CUDA |
| pytorch/pytorch:2.1.0-cuda12.1 | ~6 GB | PyTorch Training/Serving | PyTorch projects that want to avoid manual CUDA setup |
Multi-Stage Dockerfile for Training and Serving
The multi-stage pattern is fundamental for ML. By separating a full build environment (with compilers and build tools) from lean runtime stages that contain only what they need, we can reduce the final image size by up to 60%.
# ============================================
# Stage 1: Builder - install dependencies
# ============================================
FROM python:3.11-slim AS builder
WORKDIR /build
# Install build tools required for native dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
gcc \
g++ \
&& rm -rf /var/lib/apt/lists/*
# Copy and install dependencies into a virtual environment
COPY requirements.txt .
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
RUN pip install --no-cache-dir --upgrade pip && \
pip install --no-cache-dir -r requirements.txt
# ============================================
# Stage 2: Training - runs the training job
# ============================================
FROM python:3.11-slim AS trainer
WORKDIR /app
# Copy virtual environment from builder stage
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Copy source code
COPY src/ ./src/
COPY config/ ./config/
COPY train.py .
COPY evaluate.py .
ENTRYPOINT ["python", "train.py"]
# ============================================
# Stage 3: Serving - production API
# ============================================
FROM python:3.11-slim AS serving
WORKDIR /app
# Non-root user for security
RUN useradd --create-home appuser
# Copy virtual environment from builder stage
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Copy only the code needed for serving
COPY src/serving/ ./src/serving/
COPY src/preprocessing/ ./src/preprocessing/
# Health check
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"
USER appuser
EXPOSE 8000
ENTRYPOINT ["uvicorn", "src.serving.app:app", "--host", "0.0.0.0", "--port", "8000"]
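The HEALTHCHECK above can also be reused as a local smoke test before wiring the image into CI. A stdlib-only sketch; the URL and timeout are illustrative:

```python
import urllib.request
import urllib.error

def check_health(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

# Example: run the serving container, then point this at it
# print(check_health("http://localhost:8000/health"))
```

Docker runs the same check inside the container every 30 seconds and marks the container unhealthy after 3 consecutive failures, which orchestrators can use to restart or reroute traffic.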
Why Multi-Stage for ML?
- Security: the serving image contains no compilers or build tools
- Size: the serving stage is much lighter (~300 MB vs ~1.2 GB)
- Cache: dependencies change less often than code, leveraging layer cache
- Flexibility: with docker build --target you can build only the training stage or only the serving stage
Layer Cache Optimization
The order of COPY instructions in the Dockerfile is critical for caching. Python dependencies change rarely; source code changes often. By copying requirements.txt first and then the code, we avoid reinstalling dependencies on every code change. A .dockerignore file completes the picture by keeping data, models, and development artifacts out of the build context:
# Data and models (managed by DVC, not Docker)
data/
models/
*.pkl
*.h5
*.pt
# Development environment
.venv/
__pycache__/
*.pyc
.pytest_cache/
.mypy_cache/
# Git and CI
.git/
.github/
.dvc/cache/
# IDE and editor
.vscode/
.idea/
*.swp
# Documentation
docs/
*.md
LICENSE
Docker with GPU Support
For deep learning model training, you need GPU support inside the container. Docker supports NVIDIA GPUs through the NVIDIA Container Toolkit. The configuration requires NVIDIA drivers on the host and the toolkit installed.
# Base image with CUDA runtime
FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04 AS gpu-trainer
WORKDIR /app
RUN apt-get update && apt-get install -y --no-install-recommends \
python3.11 \
python3.11-venv \
python3-pip \
&& rm -rf /var/lib/apt/lists/*
RUN python3.11 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# PyTorch with CUDA dependencies
COPY requirements-gpu.txt .
RUN pip install --no-cache-dir -r requirements-gpu.txt
COPY src/ ./src/
COPY train.py .
# CUDA environment variables
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
ENTRYPOINT ["python", "train.py"]
With the host drivers and the NVIDIA Container Toolkit in place, build and run the image:
# Build GPU image
docker build -f Dockerfile.gpu -t ml-trainer:gpu .
# Run with GPU access
docker run --gpus all \
-v $(pwd)/data:/app/data \
-v $(pwd)/models:/app/models \
ml-trainer:gpu \
--config config/training.yaml
GitHub Actions for Machine Learning
GitHub Actions is a CI/CD service integrated into GitHub that executes automated workflows in response to events (push, pull request, schedule, manual dispatch). For ML, it offers significant advantages: native integration with the Git repository, a marketplace with pre-built actions, secret management for credentials, free minutes on standard runners for public repositories, and 2,000 free minutes/month for private repositories on the Free plan.
ML Workflow Structure
A GitHub Actions workflow for ML has a specific structure: multiple jobs corresponding to pipeline phases, with explicit dependencies between jobs and execution conditions based on model metrics.
name: ML Pipeline - Train, Evaluate, Deploy
on:
# Trigger on push to main (code or config changes)
push:
branches: [main]
paths:
- 'src/**'
- 'config/**'
- 'requirements.txt'
- 'train.py'
- 'evaluate.py'
# Scheduled trigger for periodic retraining
schedule:
- cron: '0 6 * * 1' # Every Monday at 06:00 UTC
# Manual trigger with parameters
workflow_dispatch:
inputs:
force_deploy:
description: 'Force deployment even if metrics do not improve'
required: false
default: 'false'
type: choice
options:
- 'true'
- 'false'
training_config:
description: 'Training configuration file'
required: false
default: 'config/training.yaml'
env:
PYTHON_VERSION: '3.11'
DOCKER_REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}