CI/CD Pipelines for ML: Automating the MLOps Lifecycle
In the first article of this series, we explored why 85% of ML projects never reach production and how MLOps solves this problem. We transformed a monolithic notebook into a modular, configurable pipeline. Now it is time to take the next step: automating the entire pipeline with CI/CD, so that every change to code, data, or configuration automatically triggers training, validation, and model deployment.
In this article, we will build a complete CI/CD pipeline for machine learning using GitHub Actions as the orchestrator and Docker as the execution runtime. We will go beyond theory and build a fully functional project with a sentiment classifier, complete with multi-stage Dockerfiles, YAML workflows, data validation, model registry integration, and automated deployment.
What You Will Learn
- Why CI/CD for ML differs fundamentally from traditional software CI/CD
- How to design the architecture of an end-to-end ML pipeline
- How to write optimized multi-stage Dockerfiles for training and serving
- How to write complete GitHub Actions workflows for ML
- How to integrate data validation, model registry, and automated deployment
- How to manage data versioning with DVC inside the pipeline
- How to implement ML-specific testing (unit, integration, smoke)
- How to monitor the model after deployment
- How to reduce costs with caching and self-hosted runners
- How to choose the right CI/CD tool for your team
Why CI/CD for ML Is Different
If you come from traditional software development, you might assume that standard CI/CD practices translate directly to ML projects. In reality, machine learning introduces unique complexities that require a purpose-built approach. The fundamental difference is that traditional CI/CD manages a single artifact (source code), while ML CI/CD must manage three simultaneously: code, data, and model.
The Three ML Artifacts
In traditional software, if the code does not change, the output does not change. In ML, even with identical code, a change in the data produces a different model. This means the CI/CD pipeline must track and validate three independent dimensions.
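To make the three dimensions concrete, a model's identity can be thought of as a fingerprint over all three. A minimal stdlib sketch (the `fingerprint` helper and its inputs are illustrative, not part of the project):

```python
import hashlib
import json

def fingerprint(code: str, data: bytes, config: dict) -> str:
    """Hash code, data, and configuration together: if any one of the
    three changes, the resulting model is a different artifact."""
    h = hashlib.sha256()
    h.update(code.encode())
    h.update(data)
    h.update(json.dumps(config, sort_keys=True).encode())
    return h.hexdigest()[:12]

code = "def train(X, y): ..."
config = {"lr": 0.01, "epochs": 10}

v1 = fingerprint(code, b"day-1 observations", config)
v2 = fingerprint(code, b"day-2 observations", config)  # same code, new data
print(v1 != v2)  # a data change alone yields a new model identity
```

This is exactly why tools like DVC and MLflow track data and parameters alongside the Git commit: any of the three inputs changing invalidates the cached model.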
| Dimension | Traditional CI/CD | ML CI/CD |
|---|---|---|
| Code | Git push triggers build + test | Git push triggers training + evaluation |
| Data | Not applicable | New data triggers retraining |
| Model | Not applicable | New model requires validation + promotion |
| Configuration | Feature flags, env vars | Hyperparameters, feature sets, metric thresholds |
| Environment | OS + libraries | OS + libraries + GPU drivers + CUDA version |
| Validation | Tests pass or fail | Metrics above/below threshold + comparison with production model |
| Deployment | Deploy or rollback | Gradual rollout + A/B test + drift monitoring |
Continuous Training: The Key Concept
CI/CD for ML introduces a concept absent from traditional software engineering: Continuous Training (CT). Beyond Continuous Integration and Continuous Deployment, CT ensures the model is automatically retrained whenever:
- New data arrives: the dataset is updated with new observations
- Code changes: preprocessing logic or the algorithm is modified
- Metrics degrade: monitoring detects data drift or performance drop
- A timer fires: scheduled retraining (e.g., weekly) is triggered
Common Mistake: CI/CD Without CT
Many teams implement CI/CD for ML code but forget Continuous Training. The result is a model that gets deployed once and then never updated, silently degrading over time as production data diverges from training data. A pipeline without CT is like a car without maintenance: it works until it breaks down.
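The four retraining triggers above can be folded into a single decision function that a scheduler or monitoring job calls. A minimal sketch; the threshold values and argument names are illustrative, not a fixed API:

```python
from datetime import datetime, timedelta, timezone

def should_retrain(new_data: bool,
                   code_changed: bool,
                   drift_score: float,
                   last_trained: datetime,
                   max_age: timedelta = timedelta(days=7),
                   drift_threshold: float = 0.3) -> tuple[bool, str]:
    """Return (retrain?, reason) based on the four CT triggers."""
    if new_data:
        return True, "new data"
    if code_changed:
        return True, "code change"
    if drift_score > drift_threshold:
        return True, f"drift {drift_score:.2f} > {drift_threshold}"
    if datetime.now(timezone.utc) - last_trained > max_age:
        return True, "scheduled retraining"
    return False, "model is fresh"

decision, reason = should_retrain(
    new_data=False, code_changed=False, drift_score=0.45,
    last_trained=datetime.now(timezone.utc) - timedelta(days=2))
print(decision, reason)  # True drift 0.45 > 0.3
```

In the workflows below, the first two triggers map to `push` events and the last one to the `schedule` cron; the drift trigger usually arrives from the monitoring system via a `workflow_dispatch` or `repository_dispatch` call.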
ML Pipeline Architecture
Before writing code, let us design the complete architecture. Each phase has specific inputs and outputs, and failure in one phase blocks all downstream phases. This "fail fast" approach ensures that only validated models reach production.
+------------------+ +------------------+ +------------------+
| DATA INGESTION |---->| PREPROCESSING |---->| TRAINING |
| | | | | |
| - Pull DVC data | | - Data cleaning | | - Train model |
| - Validation | | - Feature eng. | | - Log metrics |
| - Schema check | | - Train/val/test | | - Log params |
| | | split | | - Save artifacts |
+------------------+ +------------------+ +------------------+
| |
| (trigger: new data |
| or schedule) v
| +------------------+
| | EVALUATION |
| | |
| | - Metrics |
| | - Compare with |
| | production |
| | - Gate: thresholds|
| +------------------+
| |
| (if metrics > threshold)
| v
+------------------+ +------------------+ +------------------+
| MONITORING |<----| SMOKE TEST |<----| DEPLOYMENT |
| | | | | |
| - Health check | | - Test endpoint | | - Push registry |
| - Drift detect. | | - Sample predict | | - Stage/Prod |
| - Alert | | - Latency check | | - Rollback ready |
| - Trigger retrain| | | | |
+------------------+ +------------------+ +------------------+
Each block corresponds to a step in the GitHub Actions workflow. Let us now look at how to implement each phase, starting with containerization using Docker.
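Before jumping into the tooling, the fail-fast chaining itself can be sketched in a few lines of Python (the stage names mirror the diagram; the lambda bodies are placeholders for the real phase logic):

```python
from typing import Callable

def run_pipeline(stages: list[tuple[str, Callable[[], bool]]]) -> list[str]:
    """Run stages in order; a failing stage blocks everything downstream."""
    completed = []
    for name, stage in stages:
        if not stage():
            print(f"{name} failed -- stopping pipeline (fail fast)")
            break
        completed.append(name)
    return completed

# Placeholder stages: evaluation fails, so deployment never runs
stages = [
    ("ingestion", lambda: True),
    ("preprocessing", lambda: True),
    ("training", lambda: True),
    ("evaluation", lambda: False),   # e.g. accuracy below threshold
    ("deployment", lambda: True),
]
print(run_pipeline(stages))  # ['ingestion', 'preprocessing', 'training']
```

GitHub Actions gives us this behavior for free through job-level `needs:` dependencies, which is one reason it maps so naturally onto ML pipelines.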
Docker for Machine Learning
Docker solves one of the most frustrating problems in ML: "it works on my machine". By containerizing the training and serving environments, we guarantee that code produces identical results wherever it runs: on the data scientist's laptop, in the CI/CD runner, and in production. For ML, Docker requires special attention: images tend to be very large (scientific libraries + GPU drivers) and builds can be slow.
Base Images for ML
Choosing the right base image is critical for both size and compatibility. Here are the main options and when to use each.
| Base Image | Size | Use | When to Choose |
|---|---|---|---|
| python:3.11-slim | ~120 MB | CPU Training/Serving | scikit-learn, XGBoost, lightweight serving |
| python:3.11-bookworm | ~900 MB | Training with build tools | Dependencies that require C/C++ compilation |
| nvidia/cuda:12.1-runtime | ~3.5 GB | GPU Inference | Deep learning model serving |
| nvidia/cuda:12.1-devel | ~5.2 GB | GPU Training | PyTorch/TensorFlow training with CUDA |
| pytorch/pytorch:2.1.0-cuda12.1 | ~6 GB | PyTorch Training/Serving | PyTorch projects that want to avoid manual CUDA setup |
Multi-Stage Dockerfile for Training and Serving
The multi-stage pattern is fundamental for ML. By separating a full build environment (with compilers and build tools) from lean runtime stages that contain only what they need, we can reduce the final image size by up to 60%.
# ============================================
# Stage 1: Builder - install dependencies
# ============================================
FROM python:3.11-slim AS builder
WORKDIR /build
# Install build tools required for native dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
gcc \
g++ \
&& rm -rf /var/lib/apt/lists/*
# Copy and install dependencies into a virtual environment
COPY requirements.txt .
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
RUN pip install --no-cache-dir --upgrade pip && \
pip install --no-cache-dir -r requirements.txt
# ============================================
# Stage 2: Training - runs the training job
# ============================================
FROM python:3.11-slim AS trainer
WORKDIR /app
# Copy virtual environment from builder stage
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Copy source code
COPY src/ ./src/
COPY config/ ./config/
COPY train.py .
COPY evaluate.py .
ENTRYPOINT ["python", "train.py"]
# ============================================
# Stage 3: Serving - production API
# ============================================
FROM python:3.11-slim AS serving
WORKDIR /app
# Non-root user for security
RUN useradd --create-home appuser
# Copy virtual environment from builder stage
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Copy only the code needed for serving
COPY src/serving/ ./src/serving/
COPY src/preprocessing/ ./src/preprocessing/
# Health check
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"
USER appuser
EXPOSE 8000
ENTRYPOINT ["uvicorn", "src.serving.app:app", "--host", "0.0.0.0", "--port", "8000"]
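The HEALTHCHECK above can also be reused as a local smoke test before wiring the image into CI. A stdlib-only sketch; the URL and timeout are illustrative:

```python
import urllib.request
import urllib.error

def check_health(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

# Example: run the serving container, then point this at it
# print(check_health("http://localhost:8000/health"))
```

Docker runs the same check inside the container every 30 seconds and marks the container unhealthy after 3 consecutive failures, which orchestrators can use to restart or reroute traffic.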
Why Multi-Stage for ML?
- Security: the serving image contains no compilers or build tools
- Size: the serving stage is much lighter (~300 MB vs ~1.2 GB)
- Cache: dependencies change less often than code, leveraging layer cache
- Flexibility: with docker build --target you can build only the training stage or only the serving stage
Layer Cache Optimization
The order of COPY instructions in the Dockerfile is critical for caching. Python dependencies change rarely; source code changes often. By copying requirements.txt first and then the code, we avoid reinstalling dependencies on every code change. A .dockerignore file completes the picture by keeping data, models, and development artifacts out of the build context:
# Data and models (managed by DVC, not Docker)
data/
models/
*.pkl
*.h5
*.pt
# Development environment
.venv/
__pycache__/
*.pyc
.pytest_cache/
.mypy_cache/
# Git and CI
.git/
.github/
.dvc/cache/
# IDE and editor
.vscode/
.idea/
*.swp
# Documentation
docs/
*.md
LICENSE
Docker with GPU Support
For deep learning model training, you need GPU support inside the container. Docker supports NVIDIA GPUs through the NVIDIA Container Toolkit. The configuration requires NVIDIA drivers on the host and the toolkit installed.
# Base image with CUDA runtime
FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04 AS gpu-trainer
WORKDIR /app
RUN apt-get update && apt-get install -y --no-install-recommends \
python3.11 \
python3.11-venv \
python3-pip \
&& rm -rf /var/lib/apt/lists/*
RUN python3.11 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# PyTorch with CUDA dependencies
COPY requirements-gpu.txt .
RUN pip install --no-cache-dir -r requirements-gpu.txt
COPY src/ ./src/
COPY train.py .
# CUDA environment variables
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
ENTRYPOINT ["python", "train.py"]
With the host drivers and the NVIDIA Container Toolkit in place, build and run the image:
# Build GPU image
docker build -f Dockerfile.gpu -t ml-trainer:gpu .
# Run with GPU access
docker run --gpus all \
-v $(pwd)/data:/app/data \
-v $(pwd)/models:/app/models \
ml-trainer:gpu \
--config config/training.yaml
GitHub Actions for Machine Learning
GitHub Actions is a CI/CD service integrated into GitHub that executes automated workflows in response to events (push, pull request, schedule, manual dispatch). For ML, it offers significant advantages: native integration with the Git repository, a marketplace with pre-built actions, secret management for credentials, free minutes on standard runners for public repositories, and 2,000 free minutes/month for private repositories on the Free plan.
ML Workflow Structure
A GitHub Actions workflow for ML has a specific structure: multiple jobs corresponding to pipeline phases, with explicit dependencies between jobs and execution conditions based on model metrics.
name: ML Pipeline - Train, Evaluate, Deploy
on:
# Trigger on push to main (code or config changes)
push:
branches: [main]
paths:
- 'src/**'
- 'config/**'
- 'requirements.txt'
- 'train.py'
- 'evaluate.py'
# Scheduled trigger for periodic retraining
schedule:
- cron: '0 6 * * 1' # Every Monday at 06:00 UTC
# Manual trigger with parameters
workflow_dispatch:
inputs:
force_deploy:
description: 'Force deployment even if metrics do not improve'
required: false
default: 'false'
type: choice
options:
- 'true'
- 'false'
training_config:
description: 'Training configuration file'
required: false
default: 'config/training.yaml'
env:
PYTHON_VERSION: '3.11'
DOCKER_REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}