MLOps 101: From Experiment to Production
Every data scientist has experienced this moment: the model performs flawlessly in the Jupyter notebook, metrics look stellar, the team celebrates during the demo. Then comes the dreaded question: "When can we ship this to production?" Silence follows. Industry estimates suggest that up to 85% of machine learning projects never reach production. Not because the models are broken, but because the infrastructure, processes, and discipline required to run them reliably and continuously simply do not exist.
MLOps (Machine Learning Operations) exists precisely to bridge this gap. It is not a single tool or technology, but a set of practices, tools, and cultural shifts that transform isolated experiments into robust, production-grade ML systems. In this article, we will explore what MLOps means, why it has become indispensable, and how to start applying it concretely, even on a limited budget.
What You Will Learn
- Why most ML projects never reach production and how MLOps solves this problem
- The key differences between DevOps and MLOps
- Google's 3-level MLOps maturity model
- The complete lifecycle of an ML model in production
- How to track experiments with MLflow
- How to serve a model with FastAPI and Docker
- An open-source stack to get started for less than $5,000/year
What Is MLOps and Why Does It Matter
MLOps applies DevOps principles to the machine learning lifecycle. Just as DevOps unified development and operations for traditional software, MLOps brings together data science, engineering, and operations for ML systems. The goal is to automate and make reproducible every stage: from data preparation to training, from validation to deployment, from monitoring to retraining.
DevOps vs MLOps: Key Differences
Engineers coming from a software background might assume that standard DevOps practices translate directly to ML. In reality, fundamental differences make MLOps a discipline of its own.
| Aspect | DevOps | MLOps |
|---|---|---|
| Artifact | Source code | Code + Data + Model |
| Versioning | Git for code | Git + DVC for data and models |
| Testing | Unit tests, integration tests | Data validation, model validation, A/B tests |
| CI/CD | Build, test, deploy code | Train, validate, deploy model |
| Monitoring | Latency, errors, uptime | Data drift, concept drift, model performance |
| Degradation | Explicit bugs | Silent degradation over time |
| Reproducibility | Same code = same output | Same code + same data + same seed = same output |
The most critical difference is silent degradation. A traditional software service either works or it does not: a bug produces an error. An ML model can keep returning predictions without any technical errors while its accuracy steadily deteriorates because incoming data has shifted from the training distribution. Without targeted monitoring, no one notices until users start complaining.
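The reproducibility row in the table above can be made concrete: fixing the random seed, alongside the code and the data, is what makes a training run repeatable. A minimal sketch with scikit-learn (the synthetic dataset and parameters are illustrative only):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Fixed data: same seed for the generator means the same dataset every run
X, y = make_classification(n_samples=200, random_state=0)

def train(seed: int) -> np.ndarray:
    """Train a small forest with a fixed seed and return its predictions."""
    model = RandomForestClassifier(n_estimators=10, random_state=seed)
    model.fit(X, y)
    return model.predict(X)

preds_a = train(seed=42)
preds_b = train(seed=42)
assert (preds_a == preds_b).all()  # same code + same data + same seed = same output
```

Drop any one of the three ingredients (unversioned data, an unpinned seed, untracked code changes) and the equation no longer holds.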
The ML "Valley of Death"
Gartner predicted that 30% of generative AI projects would be abandoned after the proof-of-concept stage by the end of 2025, due to poor data quality, inadequate risk controls, escalating costs, or unclear business value. MLOps systematically addresses each of these root causes.
The MLOps Market: Numbers and Trends
The MLOps market is expanding at a remarkable pace. According to industry analyses, the global MLOps market was valued between $2 and $3 billion in 2025, with projections ranging from $25 to $56 billion by 2035, at a compound annual growth rate (CAGR) between 29% and 42% depending on the source.
These numbers reflect a concrete reality: organizations are investing heavily in bringing ML models to production. According to market estimates, over 70% of large enterprises in North America run production AI workloads, and more than 55% have integrated automated model monitoring. Yet nearly two-thirds of organizations remain stuck in the pilot stage, unable to scale AI across the enterprise.
The 3 MLOps Maturity Levels
Google defined a 3-level MLOps maturity model that has become the de facto industry standard. Each level represents an increasing degree of automation and reliability in the ML lifecycle.
Level 0: Manual Process
At Level 0, every step is manual. The data scientist works in a notebook, trains the model locally, exports it as a file, and hands it to the engineering team who wraps it in an API. There is no automation, no monitoring, no automatic retraining.
| Characteristic | Level 0 |
|---|---|
| Training | Manual, in a notebook |
| Deployment | Manual, file handoff (.pkl or .h5) |
| Monitoring | None or manual |
| Retraining | Only on explicit request |
| Reproducibility | Poor or nonexistent |
This level is common in organizations starting to apply ML to their use cases. It may be sufficient when models are rarely updated and data changes slowly, but it does not scale.
Level 1: ML Pipeline Automation
At Level 1, training is automated through an ML pipeline. Instead of deploying a single model, you deploy the entire pipeline that produces it. This enables continuous training: when new data arrives, the pipeline automatically retrains the model.
| Characteristic | Level 1 |
|---|---|
| Training | Automated via pipeline |
| Deployment | Automated pipeline |
| Monitoring | Model performance + retraining triggers |
| Retraining | Automatic on new data or degradation |
| Reproducibility | Good (versioned pipelines) |
Level 1 is sufficient when data changes frequently but the ML approach remains stable. The pipeline stays the same but is re-executed periodically with fresh data.
Level 2: CI/CD for Machine Learning
At Level 2, a full CI/CD system purpose-built for ML is added. Not only does the data change, but so does the pipeline code, features, hyperparameters, and model architecture. Every change goes through automated tests, validation, and controlled deployment.
| Characteristic | Level 2 |
|---|---|
| Training | Automated + CI/CD on the pipeline itself |
| Deployment | Blue/green, canary, A/B testing |
| Monitoring | Full: data drift, concept drift, performance, latency |
| Retraining | Automatic with validation and rollback |
| Reproducibility | Complete (code + data + environment versioned) |
Reaching Level 2 is the target for mature organizations. It requires significant investment in infrastructure and culture, but it is the only sustainable way to manage dozens or hundreds of models in production.
The MLOps Lifecycle
The lifecycle of an ML model in production is an iterative process spanning six core phases. Unlike traditional software development, this cycle never truly ends: a production model requires continuous maintenance.
+----------+ +---------+ +----------+
| DATA |---->| TRAIN |---->| EVALUATE |
| Collect | | Feature | | Validate |
| Clean | | Train | | Compare |
| Version | | Tune | | Approve |
+----------+ +---------+ +----------+
^ |
| v
+----------+ +---------+ +----------+
| RETRAIN |<----| MONITOR |<----| DEPLOY |
| Trigger | | Drift | | Stage |
| Schedule | | Metrics | | Canary |
| Auto | | Alert | | Release |
+----------+ +---------+ +----------+
1. Data: Collection, Cleaning, and Versioning
Everything starts with data. In this phase, raw data is collected, cleaned (handling missing values, outliers, duplicates), transformed into useful features, and versioned. Data versioning is essential: to reproduce a model, you need to know exactly which data was used for training. Tools like DVC (Data Version Control) version large datasets in a Git-like manner.
2. Train: Feature Engineering and Model Training
With data ready, features are built, an algorithm is chosen, and the model is trained. Each experiment (combination of hyperparameters, features, architecture) is tracked with its parameters and metrics. Tools like MLflow make this process systematic and reproducible.
3. Evaluate: Validation and Comparison
The trained model is validated against predefined metrics (accuracy, F1-score, RMSE, AUC) and compared with the version currently in production. If the new model does not meet minimum thresholds or does not improve upon the previous one, it is not promoted.
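The promotion rule described above can be captured in a small gate function. This is a sketch of the idea, not a standard API; the metric names and thresholds are illustrative:

```python
def should_promote(new_metrics: dict, prod_metrics: dict,
                   min_accuracy: float = 0.85) -> bool:
    """Promote only if the candidate clears the floor AND beats production."""
    if new_metrics["accuracy"] < min_accuracy:
        return False  # fails the absolute minimum threshold
    return new_metrics["f1_score"] > prod_metrics["f1_score"]

# Candidate clears the floor and beats the production model on F1: promoted
assert should_promote({"accuracy": 0.91, "f1_score": 0.88},
                      {"accuracy": 0.89, "f1_score": 0.85})
# Candidate below the accuracy floor is rejected regardless of its F1
assert not should_promote({"accuracy": 0.80, "f1_score": 0.90},
                          {"accuracy": 0.89, "f1_score": 0.85})
```

Making the gate an explicit function means the promotion decision is versioned, testable, and auditable, rather than a judgment call buried in a notebook.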
4. Deploy: Staging, Canary, and Release
The approved model progresses through environments: staging for integration tests, canary for validation with limited real traffic, and finally full production. Strategies like blue/green deployment and canary releases minimize risk.
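At its core, a canary release is a weighted routing decision: a small configurable fraction of requests goes to the candidate model while the rest hits the stable one. A minimal sketch (the function and variant names are hypothetical):

```python
import random

def route(canary_fraction: float, rng: random.Random) -> str:
    """Return which model variant should serve this request."""
    return "canary" if rng.random() < canary_fraction else "stable"

rng = random.Random(42)  # seeded so the example is deterministic
routed = [route(0.1, rng) for _ in range(10_000)]
canary_share = routed.count("canary") / len(routed)
assert 0.05 < canary_share < 0.15  # roughly 10% of traffic hits the canary
```

In a real deployment, this split typically lives in the load balancer or service mesh rather than application code, but the principle is the same: if the canary's metrics hold up, the fraction is ramped to 100%; if not, rollback is instant.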
5. Monitor: Drift, Metrics, and Alerts
In production, the model is monitored continuously. Both technical metrics (latency, throughput, errors) and ML metrics (accuracy on real data, prediction distribution, data drift) are tracked. Alerts fire when metrics fall below thresholds.
6. Retrain: Triggers and Automation
When monitoring detects degradation, retraining is triggered. This can be scheduled (e.g., weekly), trigger-based (e.g., accuracy below 90%), or manual. The new model goes through the evaluate and deploy phases again.
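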
The Open-Source MLOps Stack
One of MLOps' greatest strengths is a mature open-source ecosystem covering every lifecycle phase. You do not need expensive enterprise platforms to get started: the right combination of open-source tools builds a complete MLOps pipeline.
| Phase | Tool | Purpose |
|---|---|---|
| Data Versioning | DVC | Version datasets and models, integrated with Git |
| Experiment Tracking | MLflow | Log parameters, metrics, and artifacts per experiment |
| Model Registry | MLflow Model Registry | Version and promote models (staging/production) |
| Pipeline Orchestration | Prefect / Airflow | Workflow orchestration, scheduling, retries |
| Model Serving | FastAPI + Docker | REST API for predictions, containerized |
| Containerization | Docker + K8s | Reproducible environments, horizontal scaling |
| Monitoring | Prometheus + Grafana | Metrics, dashboards, alerting |
| Data Validation | Great Expectations | Automated data quality tests |
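Great Expectations automates exactly this kind of check in a declarative way; to give a flavor of what "automated data quality tests" means, here is a hand-rolled sketch (the column names follow the churn example used later in this article):

```python
import pandas as pd

def validate_customers(df: pd.DataFrame) -> list[str]:
    """Return a list of data-quality violations (empty list = data is OK)."""
    errors = []
    for col in ("age", "orders", "avg_order_value"):
        if col not in df.columns:
            errors.append(f"missing column: {col}")
    if "age" in df.columns and not df["age"].between(0, 120).all():
        errors.append("age out of range [0, 120]")
    if df.isna().any().any():
        errors.append("dataset contains missing values")
    return errors

good = pd.DataFrame({"age": [30, 45], "orders": [3, 1], "avg_order_value": [20.0, 35.5]})
bad = pd.DataFrame({"age": [30, 150], "orders": [3, 1], "avg_order_value": [20.0, None]})
assert validate_customers(good) == []
assert len(validate_customers(bad)) == 2  # age out of range + missing value
```

Running such checks in the pipeline, before training and before serving, catches bad data where it is cheapest to fix.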
From Notebook to Pipeline: A Practical Example
Let us walk through the most common transition every ML team faces: refactoring code written in a Jupyter notebook into a modular, reproducible pipeline. We will take a real-world classification example and restructure it step by step.
Before: The Monolithic Notebook
Here is the typical notebook where everything lives in a single file: no separation of concerns, no logging, no versioning.
# Cell 1: Everything in one notebook
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
import pickle
# Load data
df = pd.read_csv("data/customers.csv")
# Inline feature engineering
df["age_group"] = pd.cut(df["age"], bins=[0, 25, 45, 65, 100],
labels=["young", "adult", "senior", "elderly"])
df["total_spend"] = df["orders"] * df["avg_order_value"]
# Split
X = df[["age", "total_spend", "visits", "days_since_last"]]
y = df["churned"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Training - hardcoded parameters
model = RandomForestClassifier(n_estimators=100, max_depth=10)
model.fit(X_train, y_train)
# Evaluation - print to screen
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")
print(f"F1: {f1_score(y_test, y_pred)}")
# Save - pickle with no versioning
with open("model.pkl", "wb") as f:
pickle.dump(model, f)
print("Model saved!")
Problems with the Monolithic Notebook
- Not reproducible: no seed, no data versioning
- Not tracked: parameters and metrics live only in notebook output
- Not testable: no isolated functions to test
- Not deployable: a pickle file is not an API
- Not maintainable: changing one feature requires re-running everything
After: A Modular Pipeline
We restructure the code into separate modules, each with a single responsibility. Every function is testable, every parameter is configurable, and every metric is tracked.
"""Module for data preparation and transformation."""
import pandas as pd
from pathlib import Path
from typing import Tuple
def load_data(path: str) -> pd.DataFrame:
"""Load the dataset from the specified path."""
filepath = Path(path)
if not filepath.exists():
raise FileNotFoundError(f"Dataset not found: {path}")
return pd.read_csv(filepath)
def create_features(df: pd.DataFrame) -> pd.DataFrame:
"""Create derived features for the model."""
result = df.copy()
result["age_group"] = pd.cut(
result["age"],
bins=[0, 25, 45, 65, 100],
labels=["young", "adult", "senior", "elderly"]
)
result["total_spend"] = result["orders"] * result["avg_order_value"]
return result
def split_data(
df: pd.DataFrame,
target_col: str = "churned",
feature_cols: list = None,
test_size: float = 0.2,
random_state: int = 42
) -> Tuple[pd.DataFrame, pd.DataFrame, pd.Series, pd.Series]:
"""Split data into train/test with a fixed seed for reproducibility."""
from sklearn.model_selection import train_test_split
if feature_cols is None:
feature_cols = ["age", "total_spend", "visits", "days_since_last"]
X = df[feature_cols]
y = df[target_col]
return train_test_split(X, y, test_size=test_size, random_state=random_state)
"""Module for model training and evaluation."""
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from typing import Dict, Any
import pandas as pd
def train_model(
X_train: pd.DataFrame,
y_train: pd.Series,
n_estimators: int = 100,
max_depth: int = 10,
random_state: int = 42
) -> RandomForestClassifier:
"""Train a RandomForestClassifier with configurable parameters."""
model = RandomForestClassifier(
n_estimators=n_estimators,
max_depth=max_depth,
random_state=random_state
)
model.fit(X_train, y_train)
return model
def evaluate_model(
model: RandomForestClassifier,
X_test: pd.DataFrame,
y_test: pd.Series
) -> Dict[str, float]:
"""Evaluate the model and return a metrics dictionary."""
y_pred = model.predict(X_test)
return {
"accuracy": accuracy_score(y_test, y_pred),
"f1_score": f1_score(y_test, y_pred),
"precision": precision_score(y_test, y_pred),
"recall": recall_score(y_test, y_pred),
}
"""Main pipeline orchestrating all stages."""
from src.data.preprocessing import load_data, create_features, split_data
from src.models.trainer import train_model, evaluate_model
import yaml
from pathlib import Path
def run_pipeline(config_path: str = "config.yaml") -> None:
"""Run the full ML pipeline using an external config."""
# 1. Load configuration
with open(config_path) as f:
config = yaml.safe_load(f)
# 2. Data preparation
print("[1/4] Loading data...")
df = load_data(config["data"]["path"])
df = create_features(df)
# 3. Split
print("[2/4] Splitting train/test...")
    X_train, X_test, y_train, y_test = split_data(
        df,
        feature_cols=config["data"]["feature_cols"],
        test_size=config["data"]["test_size"],
        random_state=config["data"]["random_state"]
    )
# 4. Training
print("[3/4] Training model...")
model = train_model(
X_train, y_train,
n_estimators=config["model"]["n_estimators"],
max_depth=config["model"]["max_depth"],
random_state=config["model"]["random_state"]
)
# 5. Evaluation
print("[4/4] Evaluating...")
metrics = evaluate_model(model, X_test, y_test)
for name, value in metrics.items():
print(f" {name}: {value:.4f}")
if __name__ == "__main__":
run_pipeline()
# config.yaml - All parameters in one file
data:
path: "data/customers.csv"
test_size: 0.2
random_state: 42
feature_cols:
- age
- total_spend
- visits
- days_since_last
model:
algorithm: "random_forest"
n_estimators: 100
max_depth: 10
random_state: 42
evaluation:
metrics:
- accuracy
- f1_score
- precision
- recall
min_accuracy: 0.85
Benefits of a Modular Pipeline
- Reproducible: fixed seed, externalized configuration, versionable data
- Testable: every function is isolated and can have dedicated unit tests
- Maintainable: changing features does not affect training and vice versa
- Configurable: change hyperparameters without touching code
- Automatable: the pipeline can be triggered by CI/CD
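The "testable" claim above is easy to verify: because `create_features` is now an isolated function, it can be unit-tested in seconds. A sketch of such a test (a trimmed-down copy of the function is inlined here so the example is self-contained; in the real project you would import it from src/data/preprocessing.py):

```python
import pandas as pd

def create_features(df: pd.DataFrame) -> pd.DataFrame:
    """Trimmed-down copy of the feature step from src/data/preprocessing.py."""
    result = df.copy()
    result["total_spend"] = result["orders"] * result["avg_order_value"]
    return result

def test_create_features_adds_total_spend():
    df = pd.DataFrame({"orders": [2, 3], "avg_order_value": [10.0, 20.0]})
    out = create_features(df)
    assert list(out["total_spend"]) == [20.0, 60.0]
    assert "total_spend" not in df.columns  # the input frame is not mutated

test_create_features_adds_total_spend()
```

With pytest, tests like this run on every commit via CI, which is precisely what makes the later automation steps safe.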
Experiment Tracking with MLflow
How many times have you tweaked a hyperparameter and then forgotten which combination yielded the best result? Experiment tracking solves this by automatically recording the parameters, metrics, and artifacts of every experiment.
MLflow is the most widely adopted open-source tool for experiment tracking. It provides a tracking server with a web UI for visualizing and comparing experiments, a Python API for logging, and a Model Registry for managing model lifecycles.
Setup and First Experiment
# Installation
pip install mlflow
# Start a local tracking server
mlflow server --host 127.0.0.1 --port 5000
"""ML pipeline with experiment tracking via MLflow."""
import mlflow
import mlflow.sklearn
from src.data.preprocessing import load_data, create_features, split_data
from src.models.trainer import train_model, evaluate_model
def run_tracked_pipeline(config: dict) -> None:
"""Run the pipeline while tracking everything with MLflow."""
# Set the tracking URI (local or remote server)
mlflow.set_tracking_uri("http://127.0.0.1:5000")
mlflow.set_experiment("churn-prediction")
with mlflow.start_run(run_name="rf-baseline") as run:
# Log parameters
mlflow.log_param("algorithm", "RandomForest")
mlflow.log_param("n_estimators", config["model"]["n_estimators"])
mlflow.log_param("max_depth", config["model"]["max_depth"])
mlflow.log_param("test_size", config["data"]["test_size"])
mlflow.log_param("random_state", config["data"]["random_state"])
# Data preparation
df = load_data(config["data"]["path"])
df = create_features(df)
X_train, X_test, y_train, y_test = split_data(
df,
test_size=config["data"]["test_size"],
random_state=config["data"]["random_state"]
)
# Log dataset dimensions
mlflow.log_param("train_samples", len(X_train))
mlflow.log_param("test_samples", len(X_test))
mlflow.log_param("n_features", X_train.shape[1])
# Training
model = train_model(
X_train, y_train,
n_estimators=config["model"]["n_estimators"],
max_depth=config["model"]["max_depth"]
)
# Evaluation
metrics = evaluate_model(model, X_test, y_test)
# Log metrics
for name, value in metrics.items():
mlflow.log_metric(name, value)
# Log the model as an artifact
mlflow.sklearn.log_model(
model,
artifact_path="model",
registered_model_name="churn-classifier"
)
# Log the config as an artifact
mlflow.log_artifact("config.yaml")
print(f"Run ID: {run.info.run_id}")
print(f"Metrics: {metrics}")
After running several experiments, open http://127.0.0.1:5000 in your browser. The MLflow UI displays a table of all experiments, letting you compare metrics, sort by performance, and visualize parameter-vs-metric charts.
Model Registry: Versioning Models
Just as code is versioned with Git, ML models should be versioned with a Model Registry. MLflow Model Registry provides a centralized system for managing model lifecycles through three stages.
| Stage | Description | Used By |
|---|---|---|
| None / Staging | Model under testing and validation | Data scientists, QA |
| Production | Approved model serving real traffic | Serving API, end users |
| Archived | Retired model kept for auditing | Compliance, rollback |
"""Model lifecycle management with MLflow Model Registry."""
from mlflow.tracking import MlflowClient
client = MlflowClient("http://127.0.0.1:5000")
# Retrieve the latest staging version
latest_versions = client.get_latest_versions(
name="churn-classifier",
stages=["Staging"]
)
if latest_versions:
version = latest_versions[0].version
print(f"Model in staging: v{version}")
# Promote to Production after validation
client.transition_model_version_stage(
name="churn-classifier",
version=version,
stage="Production",
archive_existing_versions=True # Archive previous version
)
print(f"Model v{version} promoted to Production")
# Load the production model for inference
import mlflow.pyfunc
model = mlflow.pyfunc.load_model("models:/churn-classifier/Production")
prediction = model.predict(new_data)  # new_data: a DataFrame with the model's feature columns
Deployment: FastAPI + Docker
A production ML model is typically exposed as a REST API. FastAPI is the ideal choice for Python: it is fast (ASGI-based), generates automatic documentation (OpenAPI/Swagger), and has excellent data validation through Pydantic. By containerizing with Docker, we get an artifact deployable anywhere.
"""REST API for serving ML model predictions."""
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field
import mlflow.pyfunc
import pandas as pd
import logging
from typing import List
# Logging configuration
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
app = FastAPI(
title="Churn Prediction API",
description="ML-based churn prediction API",
version="1.0.0"
)
class PredictionRequest(BaseModel):
"""Prediction request schema."""
age: int = Field(..., ge=0, le=120, description="Customer age")
total_spend: float = Field(..., ge=0, description="Total spending")
visits: int = Field(..., ge=0, description="Number of visits")
days_since_last: int = Field(..., ge=0, description="Days since last visit")
class PredictionResponse(BaseModel):
"""Prediction response schema."""
prediction: int
probability: float
model_version: str
# Load model at startup
MODEL_NAME = "churn-classifier"
MODEL_STAGE = "Production"
model = None
model_version = "unknown"
@app.on_event("startup")
async def load_model():
"""Load the MLflow model on server startup."""
global model, model_version
try:
model_uri = f"models:/{MODEL_NAME}/{MODEL_STAGE}"
model = mlflow.pyfunc.load_model(model_uri)
model_version = model.metadata.run_id[:8]
logger.info(f"Model loaded: {MODEL_NAME} ({model_version})")
except Exception as e:
logger.error(f"Model loading error: {e}")
raise
@app.get("/health")
async def health_check():
"""Health check endpoint."""
return {"status": "healthy", "model_loaded": model is not None}
@app.post("/predict", response_model=PredictionResponse)
async def predict(request: PredictionRequest):
"""Generate a churn prediction for a customer."""
if model is None:
raise HTTPException(status_code=503, detail="Model not loaded")
try:
input_data = pd.DataFrame([request.model_dump()])
prediction = model.predict(input_data)
        # NOTE: pyfunc predict returns class labels, not probabilities; for real
        # probabilities, load the sklearn flavor and call predict_proba instead
        probability = float(prediction[0]) if hasattr(prediction[0], '__float__') else 0.0
return PredictionResponse(
prediction=int(prediction[0]),
probability=probability,
model_version=model_version
)
except Exception as e:
logger.error(f"Prediction error: {e}")
raise HTTPException(status_code=500, detail="Prediction error")
@app.post("/predict/batch", response_model=List[PredictionResponse])
async def predict_batch(requests: List[PredictionRequest]):
"""Generate batch predictions for multiple customers."""
if model is None:
raise HTTPException(status_code=503, detail="Model not loaded")
input_data = pd.DataFrame([r.model_dump() for r in requests])
predictions = model.predict(input_data)
return [
PredictionResponse(
prediction=int(p),
probability=float(p),
model_version=model_version
)
for p in predictions
]
# Dockerfile for ML model serving
FROM python:3.11-slim
WORKDIR /app
# System dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
gcc \
&& rm -rf /var/lib/apt/lists/*
# Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Application code
COPY src/serving/ ./serving/
COPY config.yaml .
# Service port
EXPOSE 8000
# Healthcheck (python:3.11-slim does not ship curl, so use Python itself)
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
# Start with uvicorn
CMD ["uvicorn", "serving.app:app", "--host", "0.0.0.0", "--port", "8000"]
# Build the image
docker build -t churn-api:v1.0.0 .
# Start the container
docker run -d \
--name churn-api \
-p 8000:8000 \
-e MLFLOW_TRACKING_URI=http://mlflow-server:5000 \
churn-api:v1.0.0
# Test the API
curl -X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{"age": 35, "total_spend": 1250.50, "visits": 12, "days_since_last": 45}'
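The API and the MLflow server can also be run together with Docker Compose, which is handy for local development. A minimal sketch (the service names and image tags are illustrative, not from the article's repository):

```yaml
# docker-compose.yaml - local serving stack (illustrative sketch)
services:
  mlflow-server:
    image: ghcr.io/mlflow/mlflow:latest
    command: mlflow server --host 0.0.0.0 --port 5000
    ports:
      - "5000:5000"
  churn-api:
    image: churn-api:v1.0.0
    environment:
      - MLFLOW_TRACKING_URI=http://mlflow-server:5000
    ports:
      - "8000:8000"
    depends_on:
      - mlflow-server
```

With this file in place, `docker compose up` starts both containers on a shared network, and the API reaches the tracking server by its service name.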
Production Monitoring
Deployment is not the end of the journey but the beginning of a critical new phase: monitoring. A production model degrades over time because the world changes and data shifts with it. Monitoring must cover three main areas.
Metrics to Track
| Category | Metrics | Tool |
|---|---|---|
| Infrastructure | Latency (p50, p95, p99), throughput, HTTP errors, CPU/RAM | Prometheus + Grafana |
| Model | Accuracy, F1-score, prediction distribution, confidence | MLflow + custom metrics |
| Data | Data drift, feature drift, missing values, input distribution | Evidently AI / Great Expectations |
Data Drift vs Concept Drift
It is critical to distinguish between two types of model degradation:
- Data Drift: the distribution of incoming data shifts compared to the training set. Example: a model trained on customers aged 25-45 starts receiving requests for customers aged 60+.
- Concept Drift: the relationship between inputs and outputs changes. Example: after a pandemic, customer churn patterns are completely different, but the input features have the same distribution.
"""Data drift detection using statistical tests."""
import numpy as np
from scipy import stats
from typing import Dict, Tuple
def detect_drift(
reference_data: np.ndarray,
production_data: np.ndarray,
feature_names: list,
threshold: float = 0.05
) -> Dict[str, Dict]:
"""
Detect data drift by comparing distributions with the KS test.
Args:
reference_data: training data (reference)
production_data: production data (current)
feature_names: feature names
threshold: p-value threshold for drift (default 0.05)
Returns:
Drift report for each feature
"""
drift_report = {}
for i, feature in enumerate(feature_names):
ref_values = reference_data[:, i]
prod_values = production_data[:, i]
# Kolmogorov-Smirnov test
ks_stat, p_value = stats.ks_2samp(ref_values, prod_values)
drift_detected = p_value < threshold
drift_report[feature] = {
"ks_statistic": round(ks_stat, 4),
"p_value": round(p_value, 4),
"drift_detected": drift_detected,
"ref_mean": round(float(np.mean(ref_values)), 4),
"prod_mean": round(float(np.mean(prod_values)), 4),
}
if drift_detected:
print(f"DRIFT DETECTED on '{feature}': "
f"KS={ks_stat:.4f}, p={p_value:.4f}")
return drift_report
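To see the KS test in action, here is a tiny synthetic demonstration: a production distribution shifted away from the reference is flagged, while an unshifted one typically yields a p-value well above 0.05 (the sample sizes and the size of the shift are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
reference = rng.normal(loc=40, scale=5, size=2_000)  # e.g. training-time ages
same_dist = rng.normal(loc=40, scale=5, size=2_000)  # production, no drift
shifted = rng.normal(loc=55, scale=5, size=2_000)    # production, clear drift

_, p_same = stats.ks_2samp(reference, same_dist)
_, p_shift = stats.ks_2samp(reference, shifted)
assert p_shift < 0.05      # the shifted distribution is flagged as drift
assert p_shift < p_same    # and is far less plausible than the unshifted one
```

In practice, run this comparison on a rolling window of production inputs against a frozen reference sample from the training set.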
When to Trigger Retraining
Not every drift warrants immediate retraining. Define clear thresholds: data drift on critical features, accuracy drop greater than 5%, or a significantly skewed prediction distribution. Avoid excessive retraining, which can introduce instability.
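Those rules can be encoded as an explicit policy so that retraining decisions are auditable rather than ad hoc. A sketch (the function shape, feature names, and thresholds mirror the examples above but are assumptions, not a standard API):

```python
def should_retrain(current_accuracy: float,
                   baseline_accuracy: float,
                   drifted_features: list[str],
                   critical_features: frozenset = frozenset({"total_spend", "visits"})) -> bool:
    """Retrain on a >5% relative accuracy drop or drift on a critical feature."""
    accuracy_drop = (baseline_accuracy - current_accuracy) / baseline_accuracy
    if accuracy_drop > 0.05:
        return True
    return any(f in critical_features for f in drifted_features)

assert should_retrain(0.82, 0.90, [])               # ~8.9% relative drop -> retrain
assert should_retrain(0.90, 0.90, ["total_spend"])  # drift on a critical feature -> retrain
assert not should_retrain(0.89, 0.90, ["age"])      # small drop, non-critical drift -> hold
```

Whatever the exact thresholds, keeping them in versioned code (or config) prevents the instability that comes from retraining on every alert.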
Getting Started on Less Than $5,000/Year
MLOps does not have to mean six-figure enterprise platforms. For small to mid-sized teams, it is entirely possible to build effective MLOps infrastructure using open-source tools and minimal cloud spend.
Proposed Stack for Small Teams
| Component | Solution | Annual Cost |
|---|---|---|
| Code | GitHub Free / GitLab CE | $0 |
| Data Versioning | DVC + Google Cloud Storage (5 GB free) | $0 - $50 |
| Experiment Tracking | MLflow on a budget VM | $200 - $500 |
| Training | Google Colab Pro / spot VMs | $120 - $600 |
| Serving | FastAPI on a VM (2 vCPU, 4 GB RAM) | $300 - $800 |
| Monitoring | Prometheus + Grafana (self-hosted) | $0 (same VM) |
| CI/CD | GitHub Actions (2,000 min/month free) | $0 |
| Container Registry | GitHub Container Registry | $0 |
Estimated total: $620 - $1,950/year, well below the $5,000 threshold. This stack supports up to 5-10 models in production with moderate traffic volumes (thousands of predictions per day).
Cost Reduction Tips
- Spot/preemptible VMs: up to 70% savings for non-urgent training
- Autoscaling: scale to zero when there are no requests
- Model compression: smaller models = fewer serving resources
- Batch inference: if real-time predictions are not needed, use nightly batches
- Multi-tenant: a single MLflow/Grafana infrastructure for all projects
MLOps Project Structure
To wrap up with something immediately actionable, here is the recommended folder structure for an MLOps project. This organization follows separation of concerns and facilitates automation, testing, and collaboration.
churn-prediction/
data/
raw/ # Raw data (versioned with DVC)
processed/ # Transformed data
data.dvc # DVC tracking file
src/
data/
preprocessing.py # Cleaning and feature engineering
validation.py # Data quality validation
models/
trainer.py # Training logic
evaluator.py # Evaluation and metrics
serving/
app.py # FastAPI application
schemas.py # Pydantic schemas
monitoring/
drift_detector.py # Drift detection
metrics.py # Custom metrics
pipeline.py # Pipeline orchestration
tests/
test_preprocessing.py
test_trainer.py
test_api.py
config.yaml # Pipeline configuration
Dockerfile # Serving container
docker-compose.yaml # Full local stack
requirements.txt # Python dependencies
.dvc/ # DVC configuration
.github/
workflows/
train.yaml # CI/CD for training
deploy.yaml # CI/CD for deployment
mlflow/ # MLflow artifacts (local)
README.md
Conclusion and Next Steps
MLOps is not a luxury reserved for Big Tech. It is a necessity for anyone who wants to bring ML models to production reliably and sustainably. In this article we covered the fundamentals: from understanding the problem (why ML projects fail) to concrete solutions (modular pipelines, experiment tracking, model registry, containerized serving, and monitoring).
The key is to start incrementally. You do not need to reach Level 2 of Google's maturity model on day one. Start at Level 0 with good practices:
- Right now: Break your notebook code into modules. Use a config.yaml.
- Week 1: Add MLflow for experiment tracking.
- Week 2: Containerize your model with FastAPI + Docker.
- Month 1: Implement a CI/CD pipeline with GitHub Actions.
- Month 2: Add monitoring with Prometheus and basic alerting.
- Month 3: Implement DVC for data versioning.
In the upcoming articles in this series, we will dive deeper into each component: data management with DVC, ML-specific CI/CD pipelines, advanced monitoring with Evidently AI, and scalable deployment on Kubernetes. Each article will be hands-on, with working code and step-by-step instructions.
Series Roadmap
- Article 2: DVC - Data Versioning for ML
- Article 3: MLflow Deep Dive - Advanced Experiment Tracking
- Article 4: CI/CD for Machine Learning with GitHub Actions
- Article 5: Feature Stores and Production Feature Engineering
- Article 6: Scalable Model Serving with Kubernetes
- Article 7: Advanced Monitoring: Data Drift and Evidently AI
- Article 8: Governance, Compliance, and Responsible ML