I create modern web applications and custom digital tools to help businesses grow through technological innovation. My passion is combining computer science and economics to generate real value.
My passion for computer science was born at the Technical Commercial Institute of Maglie, where I discovered the power of programming and the fascination of creating digital solutions. From the start, I understood that computer science was not just code, but an extraordinary tool for turning ideas into reality.
During my studies in Business Information Systems, I began to interweave computer science and economics, understanding how technology can be the engine of growth for any business. This vision accompanied me to the University of Bari, where I obtained my degree in Computer Science, deepening my technical skills and passion for software development.
Today I put this experience at the service of businesses, professionals and startups, creating tailor-made digital solutions that automate processes, optimize resources and open new business opportunities. Because true innovation begins when technology meets the real needs of people.
My Skills
Data Analysis & Predictive Models
I transform data into strategic insights with in-depth analysis and predictive models for informed decisions
Process Automation
I create custom tools that automate repetitive operations and free up time for value-added activities
Custom Systems
I develop tailor-made software systems, from platform integrations to customized dashboards
I firmly believe that computer science is the most powerful tool for turning ideas into reality and improving people's lives.
Democratizing Technology
My mission is to make computing accessible to everyone: from small local businesses to innovative startups, to professionals who want to digitize their business. Every organization deserves to harness the potential of digital technology.
Uniting Computer Science and Economics
It is not just about writing code: it is about understanding how technology can generate real value. By weaving together computer science skills and an economic vision, I help businesses grow, optimize processes, and reach new levels of efficiency and profitability.
Creating Tailor-Made Solutions
Every business is unique, and so should its solutions be. I develop customized tools that address each client's specific needs, automating repetitive processes and freeing up time for what really matters: growing the business.
Transform Your Business with Technology
Whether you run a shop, a professional practice, or a company, I can help you harness the power of computing to work better, faster, and smarter.
Bari, Puglia, Italy · Hybrid
Analysis and development of software systems using Java and Quarkus in the health and public sectors. Continuous training on modern technologies, including agents, for building customized and efficient software solutions.
💼
06/2022 - 12/2024
Software analyst and Back End Developer Associate Consultant
Links Management and Technology SpA
Experience analyzing as-is software systems and ETL flows using PowerCenter. Completed Spring Boot training for developing modern, scalable backend applications. Backend developer specialized in Spring Boot, with experience in database design and in the analysis, development, and testing of assigned tasks.
💼
02/2021 - 10/2021
Software programmer
Adesso.it (formerly WebScience srl)
Experience in AS-IS and TO-BE analysis, and in SEO and website improvements to boost performance and user engagement.
🎓
2018 - 2025
Degree in Computer Science
University of Bari Aldo Moro
Bachelor's degree in Computer Science, focusing on software engineering, algorithms, and modern development practices.
📚
2013 - 2018
Diploma - Corporate Information Systems
Technical Commercial Institute of Maglie
Technical diploma specializing in Business Information Systems, combining IT knowledge with business management.
Contact Me
Have a project in mind? Let's talk! Fill out the form below and I will get back to you as soon as possible.
* Required fields. Your data will be used only to respond to your request.
Experiment Tracking with MLflow: The Complete Guide
Have you ever spent hours searching for which hyperparameter combination produced that
excellent result from three weeks ago? Or wondered why the production model behaves
differently from what you tested locally? These problems, extremely common in the machine
learning lifecycle, share a root cause: the absence of a structured
experiment tracking system.
MLflow is the most widely adopted open-source answer to this problem.
Created at Databricks in 2018 and donated to the Linux Foundation in 2020, MLflow has
established itself as the de facto standard for ML experiment tracking in the Python
ecosystem. With the release of MLflow 3 in June 2025 at the Databricks
Data + AI Summit, the platform made a significant evolutionary leap: from a pure tracking
tool to a unified platform for developing, evaluating, and deploying ML and GenAI models,
with LoggedModel as a first-class entity and a 25% improvement in logging throughput
compared to version 2.5.
This guide covers MLflow end-to-end: from installation to advanced tracking, from
autologging to the Model Registry, through model serving and Docker integration.
Every example is tested and production-ready.
What You Will Learn in This Article
MLflow architecture: tracking server, backend store, artifact store, and what's new in MLflow 3
Local and production setup: SQLite, PostgreSQL, S3 as artifact store
Autologging: zero-config integration with scikit-learn, XGBoost, PyTorch, TensorFlow
Model Registry: staging, production, archive, and model lifecycle management
Model Serving with MLflow: REST API, Docker containers, FastAPI integration
MLflow with Docker Compose for production deployment
Comparison with alternatives: W&B, Neptune, ClearML - when to choose what
Best practices and anti-patterns for ML teams of any size
The MLOps and Machine Learning in Production Series
# | Article | Focus
1 | MLOps: From Experiment to Production | Foundations and full lifecycle
2 | ML Pipelines with CI/CD | GitHub Actions and Docker for ML
3 | Dataset and Model Versioning: DVC vs LakeFS | Data and model versioning
4 | Experiment Tracking with MLflow (you are here) | Tracking, registry, serving
5 | Model Drift Detection | Monitoring and automated retraining
6 | Serving with FastAPI + Uvicorn | Deploying models to production
7 | Scaling ML on Kubernetes | KubeFlow and Seldon Core
8 | A/B Testing ML Models | Methodology and implementation
9 | ML Governance | Compliance, EU AI Act, ethics
10 | Case Study: Churn Prediction | End-to-end production pipeline
MLflow Architecture: The Four Core Components
Before writing a single line of code, it is essential to understand MLflow's building
blocks. The platform consists of four main components, each with a specific role in the
machine learning lifecycle:
MLflow Tracking: the API and UI for logging and querying experiments.
Records parameters, metrics, tags, artifacts, and notes for every training run.
MLflow Projects: a format for packaging ML code into reproducible runs,
with dependency and environment management via Conda or Docker.
MLflow Models: a standard format for saving models so they can be
served from multiple frameworks (Python function, REST API, Spark UDF, etc.).
MLflow Model Registry: a centralized store for managing model
lifecycle: versioning, staging, production, archiving, and audit trails.
MLflow 3 (2025) adds a fifth fundamental element: the concept of
LoggedModel as a first-class entity. Instead of the previous
run-centric approach (where the model was just an artifact of a run), LoggedModels
persist across multiple runs, environments, and deployments, with full lineage
to parameters, metrics, traces, and evaluation data.
MLflow Storage Architecture
Component | What It Stores | Recommended Backend
Backend Store | Parameters, metrics, tags, run metadata | PostgreSQL / MySQL (production), SQLite (local)
Artifact Store | Model files, images, CSVs, evaluation data | S3, GCS, Azure Blob (production), local filesystem
Tracking Server | REST API for logging and web UI | Docker container / EC2 / Kubernetes pod
Model Registry | Model versions, stages, annotations, webhooks | Requires a database backend (does not work with the filesystem store)
MLflow Setup: From Local to Production
Installation and Local Setup
MLflow installation requires a single pip command. For local development, MLflow uses
SQLite as the backend store and the local filesystem as the artifact store: no additional
server is required.
# Install MLflow (2.x/3.x)
pip install mlflow
# With extras for specific integrations
pip install mlflow[extras] # scikit-learn, XGBoost, LightGBM
pip install mlflow[databricks] # Databricks integration
pip install mlflow[genai] # GenAI and LLM tools (MLflow 3+)
# Verify installation
mlflow --version
# mlflow, version 2.x.x or 3.x.x
# Launch the local UI (uses ./mlruns as storage)
mlflow ui
# UI available at http://localhost:5000
# Start with SQLite database
mlflow server \
--backend-store-uri sqlite:///mlflow.db \
--default-artifact-root ./mlruns \
--host 0.0.0.0 \
--port 5000
Production Setup: PostgreSQL + S3
For production environments with multiple users and high concurrency, it is essential
to use a relational database as the backend and object storage as the artifact store.
MLflow's Model Registry requires a database (does not work with a file system):
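A minimal sketch of such a stack with Docker Compose, assuming MinIO as an S3-compatible artifact store. The image tags, credentials, and bucket name below are illustrative placeholders, not values from this article, and the official MLflow image may need psycopg2-binary and boto3 added for the PostgreSQL and S3 backends:

```yaml
# docker-compose.yml - illustrative sketch: PostgreSQL backend + MinIO artifacts
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: mlflow
      POSTGRES_PASSWORD: mlflow_pass   # use a secret manager in real deployments
      POSTGRES_DB: mlflow
    volumes:
      - pgdata:/var/lib/postgresql/data

  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: minio
      MINIO_ROOT_PASSWORD: minio_pass
    ports: ["9000:9000", "9001:9001"]
    volumes:
      - miniodata:/data

  mlflow:
    image: ghcr.io/mlflow/mlflow:latest
    depends_on: [postgres, minio]
    environment:
      AWS_ACCESS_KEY_ID: minio
      AWS_SECRET_ACCESS_KEY: minio_pass
      MLFLOW_S3_ENDPOINT_URL: http://minio:9000
    command: >
      mlflow server
      --backend-store-uri postgresql://mlflow:mlflow_pass@postgres:5432/mlflow
      --artifacts-destination s3://mlflow-artifacts
      --host 0.0.0.0 --port 5000
    ports: ["5000:5000"]

volumes:
  pgdata:
  miniodata:
```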
For teams on a tight budget, this Docker Compose configuration on a single VM costs
approximately 180 EUR/year (EC2 t3.small or equivalent). MinIO replaces S3 locally
with a fully compatible API. For persistent storage you can also use AWS S3
(around 2-5 EUR/month for a few GB of artifacts). The savings compared to SaaS
platforms like W&B Teams (50+ USD/user/month) are significant: a 5-person team
saves over 2,500 EUR per year.
Experiment Tracking: Logging Params, Metrics, and Artifacts
The heart of MLflow is the tracking API. Each call to mlflow.start_run()
creates a new run within an experiment. An experiment
groups related runs (e.g., all runs for the churn prediction model). Runs log four
types of data:
Parameters: fixed values for the run (hyperparameters, configuration)
Metrics: numeric values that can vary over time (loss, accuracy per epoch)
Artifacts: arbitrary files (models, images, datasets, HTML reports)
Tags: key-value metadata for annotating and filtering runs
import mlflow
import mlflow.sklearn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import (
accuracy_score, f1_score, roc_auc_score,
confusion_matrix, classification_report, RocCurveDisplay
)
from sklearn.model_selection import train_test_split
import os
# ==================== MLflow Configuration ====================
# Connect to the tracking server (local or remote)
mlflow.set_tracking_uri("http://localhost:5000")
# Create or use an existing experiment
experiment_name = "churn-prediction-gbm"
mlflow.set_experiment(experiment_name)
experiment = mlflow.get_experiment_by_name(experiment_name)
print(f"Experiment ID: {experiment.experiment_id}")
# ==================== Training with Full Tracking ====================
def train_churn_model(X_train, X_val, y_train, y_val, params: dict) -> str:
"""
Train a GBM for churn prediction with complete MLflow tracking.
Returns the MLflow run_id.
"""
with mlflow.start_run(
run_name=f"gbm-lr{params['learning_rate']}-depth{params['max_depth']}"
) as run:
# ---- 1. TAGS: run metadata ----
mlflow.set_tags({
"team": "ml-engineering",
"project": "churn-prediction",
"dataset_version": "v2.1",
"git_commit": os.popen("git rev-parse HEAD").read().strip(),
"environment": "dev",
})
# ---- 2. PARAMS: hyperparameters and config ----
mlflow.log_params(params)
mlflow.log_params({
"train_size": len(X_train),
"val_size": len(X_val),
"n_features": X_train.shape[1],
"target_positive_rate": float(y_train.mean()),
})
# ---- 3. TRAINING ----
model = GradientBoostingClassifier(**params)
model.fit(X_train, y_train)
# ---- 4. STEP-BY-STEP METRICS during training ----
train_scores = list(model.staged_predict(X_train))
val_scores = list(model.staged_predict(X_val))
for step, (tr_pred, val_pred) in enumerate(zip(train_scores, val_scores)):
train_acc = accuracy_score(y_train, tr_pred)
val_acc = accuracy_score(y_val, val_pred)
mlflow.log_metrics({
"train_accuracy_step": train_acc,
"val_accuracy_step": val_acc,
}, step=step)
# ---- 5. FINAL METRICS ----
y_pred = model.predict(X_val)
y_prob = model.predict_proba(X_val)[:, 1]
final_metrics = {
"accuracy": accuracy_score(y_val, y_pred),
"f1_score": f1_score(y_val, y_pred),
"auc_roc": roc_auc_score(y_val, y_prob),
}
mlflow.log_metrics(final_metrics)
# ---- 6. ARTIFACTS: model file and reports ----
# Confusion matrix as an image artifact
fig, ax = plt.subplots(figsize=(6, 5))
cm = confusion_matrix(y_val, y_pred)
im = ax.imshow(cm, interpolation='nearest', cmap='Blues')
ax.set_title('Confusion Matrix - Churn Prediction')
ax.set_xlabel('Predicted')
ax.set_ylabel('True')
plt.colorbar(im)
plt.tight_layout()
mlflow.log_figure(fig, "confusion_matrix.png")
plt.close()
# Classification report as a text artifact
report = classification_report(y_val, y_pred, target_names=["No Churn", "Churn"])
mlflow.log_text(report, "classification_report.txt")
# Feature importance as a JSON table
feature_imp = pd.DataFrame({
"feature": X_train.columns.tolist(),
"importance": model.feature_importances_
}).sort_values("importance", ascending=False)
mlflow.log_table(feature_imp.to_dict(orient="list"), "feature_importance.json")
# ---- 7. LOG MODEL: save with signature and input example ----
input_example = X_val.head(3)
signature = mlflow.models.infer_signature(X_val, y_pred)
mlflow.sklearn.log_model(
sk_model=model,
artifact_path="model",
signature=signature,
input_example=input_example,
registered_model_name="churn-gbm-model", # Auto-register in Registry
)
print(f"Run ID: {run.info.run_id}")
print(f"Accuracy: {final_metrics['accuracy']:.4f}")
print(f"AUC-ROC: {final_metrics['auc_roc']:.4f}")
return run.info.run_id
Nested Runs and Hyperparameter Search
MLflow supports nested runs: child runs inside a parent run. This
pattern is ideal for hyperparameter search (Optuna, GridSearchCV) where you want
a parent run representing the overall search and many child runs, one per tested
configuration:
import mlflow
import optuna
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
def hyperparameter_search(X_train, y_train, n_trials: int = 50) -> str:
"""
Hyperparameter search with Optuna + MLflow nested runs.
Parent run: contains the search summary.
Child runs: each Optuna trial is a child MLflow run.
"""
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("churn-hparam-search")
with mlflow.start_run(run_name="optuna-search-v1") as parent_run:
mlflow.set_tag("search_method", "optuna-tpe")
mlflow.log_param("n_trials", n_trials)
mlflow.log_param("optimization_metric", "auc_roc")
def objective(trial) -> float:
params = {
"n_estimators": trial.suggest_int("n_estimators", 100, 1000),
"learning_rate": trial.suggest_float("learning_rate", 0.001, 0.3, log=True),
"max_depth": trial.suggest_int("max_depth", 2, 8),
"subsample": trial.suggest_float("subsample", 0.5, 1.0),
}
# Each trial gets its own child MLflow run
with mlflow.start_run(
run_name=f"trial-{trial.number}",
nested=True # marks this as a child run
):
mlflow.log_params(params)
model = GradientBoostingClassifier(**params, random_state=42)
scores = cross_val_score(model, X_train, y_train, cv=3, scoring="roc_auc")
auc_mean = scores.mean()
auc_std = scores.std()
mlflow.log_metrics({
"cv_auc_mean": auc_mean,
"cv_auc_std": auc_std,
})
return auc_mean
study = optuna.create_study(
direction="maximize",
sampler=optuna.samplers.TPESampler(seed=42)
)
study.optimize(objective, n_trials=n_trials)
best_trial = study.best_trial
mlflow.log_params({f"best_{k}": v for k, v in best_trial.params.items()})
mlflow.log_metric("best_auc", best_trial.value)
print(f"Best AUC: {best_trial.value:.4f}")
print(f"Best params: {best_trial.params}")
return parent_run.info.run_id
Autologging: Zero-Config Tracking
Autologging is one of MLflow's most convenient features: with a single line of code,
MLflow automatically intercepts calls to major ML frameworks and logs parameters,
metrics, and artifacts without any changes to your training code. It is supported by
scikit-learn, XGBoost, LightGBM, PyTorch Lightning, TensorFlow/Keras, Spark MLlib,
and others.
import mlflow
import mlflow.sklearn
import mlflow.xgboost
# ==================== scikit-learn Autologging ====================
mlflow.sklearn.autolog(
log_input_examples=True,
log_model_signatures=True,
log_models=True,
log_datasets=False,
max_tuning_runs=100,
exclusive=False,
)
mlflow.set_experiment("autolog-demo")
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
with mlflow.start_run(run_name="rf-gridsearch-autolog"):
param_grid = {
"n_estimators": [100, 300],
"max_depth": [4, 6, 8],
"min_samples_split": [10, 20],
}
model = GridSearchCV(
RandomForestClassifier(random_state=42),
param_grid,
cv=3,
scoring="roc_auc",
n_jobs=-1
)
model.fit(X_train, y_train)
# MLflow automatically logged:
# - All RandomForest parameters
# - cv=3, scoring, n_jobs
# - Best params from GridSearchCV
# - Cross-validation score
# - The model as an artifact
# ==================== XGBoost Autologging ====================
mlflow.xgboost.autolog(
importance_types=["gain", "weight"],
log_model_signatures=True,
)
import xgboost as xgb
with mlflow.start_run(run_name="xgb-autolog"):
dtrain = xgb.DMatrix(X_train, label=y_train)
dval = xgb.DMatrix(X_val, label=y_val)
params = {
"objective": "binary:logistic",
"eval_metric": ["logloss", "auc"],
"learning_rate": 0.05,
"max_depth": 6,
}
# Autolog tracks all metrics per boosting round
booster = xgb.train(
params,
dtrain,
num_boost_round=500,
evals=[(dtrain, "train"), (dval, "val")],
early_stopping_rounds=50,
verbose_eval=False,
)
Autologging Limitations
Autologging is convenient for rapid prototyping, but has limitations in production:
it does not log custom user-defined metrics, does not log dataset information
(size, version, class distribution), and does not handle dependencies between
experiments. For mature MLOps pipelines, it is recommended to use autologging
as a foundation and add manual calls to mlflow.log_param(),
mlflow.log_metric(), and mlflow.log_artifact() for
domain-specific information.
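A sketch of that hybrid approach. The helper and function names (dataset_fingerprint, train_with_hybrid_logging) are my own, and the MLflow calls assume a reachable tracking server; the import is deferred so the pure helper stays usable on its own:

```python
import hashlib
import json

def dataset_fingerprint(rows, version):
    """Compute the dataset metadata that autologging does not capture."""
    payload = json.dumps(rows, sort_keys=True).encode()
    return {
        "dataset_version": version,
        "dataset_rows": len(rows),
        "dataset_sha256": hashlib.sha256(payload).hexdigest()[:12],
    }

def train_with_hybrid_logging(X_rows, y, params, version="v2.1"):
    """Sketch: autologging as a base layer, plus manual domain-specific logging."""
    import mlflow  # deferred so dataset_fingerprint works without MLflow installed
    import mlflow.sklearn
    from sklearn.ensemble import RandomForestClassifier

    mlflow.sklearn.autolog()  # base layer: params, metrics, model artifact
    with mlflow.start_run():
        # manual layer: dataset info that autolog does not record
        mlflow.log_params(dataset_fingerprint(X_rows, version))
        model = RandomForestClassifier(**params).fit(X_rows, y)
        mlflow.log_metric("positive_rate", sum(y) / len(y))
    return model
```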
MLflow Model Registry: Managing the Model Lifecycle
The MLflow Model Registry is the component that transforms MLflow from a simple tracking
tool into a true MLOps platform. It allows managing model versions through a standardized
lifecycle: from development to staging to production, with
audit trails, annotations, and notifications.
import mlflow
from mlflow.tracking import MlflowClient
mlflow.set_tracking_uri("http://localhost:5000")
client = MlflowClient()
MODEL_NAME = "churn-gbm-model"
# ==================== 1. REGISTER A MODEL ====================
# Method A: during log_model (most common)
with mlflow.start_run() as run:
    mlflow.sklearn.log_model(
        sk_model=model,
        artifact_path="model",
        registered_model_name=MODEL_NAME,
    )
    # Automatically registered as version 1 with stage "None"
# Method B: from an existing run via URI
run_id = "abc123def456"
version = mlflow.register_model(
model_uri=f"runs:/{run_id}/model",
name=MODEL_NAME,
tags={"team": "ml-eng", "algorithm": "gbm"}
)
print(f"Registered: {MODEL_NAME} version {version.version}")
# ==================== 2. MANAGE VERSIONS AND STAGES ====================
# Add a description to the version
client.update_model_version(
name=MODEL_NAME,
version=version.version,
description=(
"GBM for churn prediction v2.1. "
"Accuracy: 0.9423, AUC-ROC: 0.9567. "
"Trained on 2024-01 to 2025-01 dataset, 45k samples."
)
)
# Promote to Staging (after internal validation)
client.transition_model_version_stage(
name=MODEL_NAME,
version=version.version,
stage="Staging",
archive_existing_versions=False,
)
print(f"Model v{version.version} promoted to Staging")
# Add traceability tags
client.set_model_version_tag(
name=MODEL_NAME,
version=version.version,
key="validated_by",
value="alice.rossi@company.com"
)
# Promote to Production (after team approval)
client.transition_model_version_stage(
name=MODEL_NAME,
version=version.version,
stage="Production",
archive_existing_versions=True, # Archive the previous Production version
)
print(f"Model v{version.version} is now in production!")
# ==================== 3. LOAD THE PRODUCTION MODEL ====================
def load_production_model(model_name: str):
    """Always loads the Production version from the registry."""
    model_uri = f"models:/{model_name}/Production"
    return mlflow.sklearn.load_model(model_uri)
# In an inference script or serving layer
model = load_production_model(MODEL_NAME)
predictions = model.predict(new_data)
# ==================== 4. QUERY THE REGISTRY ====================
# List all versions of a model
versions = client.search_model_versions(f"name='{MODEL_NAME}'")
for v in versions:
    print(f"v{v.version} | Stage: {v.current_stage} | Run: {v.run_id[:8]}...")

# Get only Production versions
prod_versions = client.get_latest_versions(MODEL_NAME, stages=["Production"])
if prod_versions:
    latest_prod = prod_versions[0]
    run_data = client.get_run(latest_prod.run_id).data
    print(f"Production AUC-ROC: {run_data.metrics.get('auc_roc', 'N/A')}")
Model Serving with MLflow
MLflow includes a built-in serving server that exposes any registered model as a REST
API with a single command. This is an excellent solution for prototyping and development
environments. For production at scale, the recommended approach is to load the MLflow
model into a FastAPI application (covered in the next article in this series).
# ==================== Serving via CLI ====================
# Serve the latest Production version of the model
mlflow models serve \
--model-uri "models:/churn-gbm-model/Production" \
--host 0.0.0.0 \
--port 8080 \
--env-manager conda
# Serve a specific run
mlflow models serve \
--model-uri "runs:/abc123def456/model" \
--port 8080
# With Docker (recommended for production)
mlflow models build-docker \
--model-uri "models:/churn-gbm-model/Production" \
--name "churn-model-server" \
--enable-mlserver
docker run -p 8080:8080 churn-model-server
# ==================== Test the Server ====================
import requests
import json
import pandas as pd
test_data = pd.DataFrame({
    "feature_0": [0.5, -1.2],
    "feature_1": [1.3, 0.8],
})

payload = {
    "dataframe_split": {
        "columns": test_data.columns.tolist(),
        "data": test_data.values.tolist()
    }
}

response = requests.post(
    "http://localhost:8080/invocations",
    headers={"Content-Type": "application/json"},
    data=json.dumps(payload)
)
print(f"Predictions: {response.json()}")
# {"predictions": [0, 1]}
MLflow + FastAPI Integration for Production
For robust production serving, the best practice is to load the model from the MLflow
Model Registry at application startup and automatically refresh it when a new Production
version is promoted:
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import mlflow
import pandas as pd
import numpy as np
import threading
import time
import logging
from typing import List
logger = logging.getLogger(__name__)
app = FastAPI(title="Churn Prediction API", version="2.0.0")
class MLflowModelManager:
    """
    Manages loading and auto-refresh of the model
    from the MLflow Model Registry.
    """
    def __init__(self, model_name: str, tracking_uri: str, refresh_interval: int = 300):
        self.model_name = model_name
        self.refresh_interval = refresh_interval
        self._model = None
        self._model_version = None
        self._lock = threading.Lock()
        mlflow.set_tracking_uri(tracking_uri)
        self._load_model()
        self._start_refresh_thread()

    def _load_model(self) -> None:
        try:
            model_uri = f"models:/{self.model_name}/Production"
            new_model = mlflow.sklearn.load_model(model_uri)
            client = mlflow.MlflowClient()
            versions = client.get_latest_versions(self.model_name, stages=["Production"])
            version = versions[0].version if versions else "unknown"
            with self._lock:
                self._model = new_model
                self._model_version = version
            logger.info(f"Model {self.model_name} v{version} loaded from registry")
        except Exception as e:
            logger.error(f"Failed to load model: {e}")

    def _start_refresh_thread(self) -> None:
        def refresh_loop():
            while True:
                time.sleep(self.refresh_interval)  # default: check every 5 minutes
                self._load_model()
        thread = threading.Thread(target=refresh_loop, daemon=True)
        thread.start()

    def predict(self, features: pd.DataFrame) -> np.ndarray:
        with self._lock:
            if self._model is None:
                raise RuntimeError("Model not available")
            return self._model.predict(features)

    def predict_proba(self, features: pd.DataFrame) -> np.ndarray:
        with self._lock:
            if self._model is None:
                raise RuntimeError("Model not available")
            return self._model.predict_proba(features)[:, 1]

    @property
    def model_version(self) -> str:
        return self._model_version or "unknown"

model_manager = MLflowModelManager(
    model_name="churn-gbm-model",
    tracking_uri="http://mlflow-server:5000",
)

class PredictionRequest(BaseModel):
    features: List[List[float]]
    feature_names: List[str]

class PredictionResponse(BaseModel):
    predictions: List[int]
    probabilities: List[float]
    model_version: str

@app.post("/predict", response_model=PredictionResponse)
async def predict(request: PredictionRequest) -> PredictionResponse:
    try:
        df = pd.DataFrame(request.features, columns=request.feature_names)
        predictions = model_manager.predict(df).tolist()
        probabilities = model_manager.predict_proba(df).tolist()
        return PredictionResponse(
            predictions=predictions,
            probabilities=probabilities,
            model_version=model_manager.model_version,
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/health")
async def health_check():
    return {"status": "healthy", "model_version": model_manager.model_version}
MLflow vs Alternatives: W&B, Neptune, ClearML
MLflow is not the only option for experiment tracking. The market offers several valid
alternatives, each with specific strengths. The choice depends on budget, team size,
existing infrastructure, and governance requirements.
Full Comparison of Experiment Tracking Tools
Dimension | MLflow | W&B | Neptune | ClearML
License | Open-source (Apache 2.0) | SaaS + Enterprise self-host | SaaS + self-host | Open-source + Enterprise
Cost (5-person team) | Free (self-hosted) | ~$250/month | ~$100/month | Free (self-hosted)
Setup | Docker in ~30 min | Zero config (SaaS) | Zero config (SaaS) | Complex (ClearML Server)
UI/UX | Functional, not flashy | Excellent, rich dashboards | Good, highly customizable | Full-featured, steep learning curve
Autologging | Excellent (20+ frameworks) | Excellent (W&B SDK) | Good | Automatic via monkey-patching
Model Registry | Built-in, staging workflow | W&B Model Registry | Available | Integrated Model Repository
Hyperparameter Sweep | Via Optuna / Hyperopt | Native Sweeps (excellent) | Good | Built-in HPO
GenAI / LLM Support | MLflow 3: tracing, evaluation | Prompts, LLM monitoring | LLM tracking | LLM experiment tracking
Best for | Self-hosted infra, SMBs, compliance | Teams prioritizing UX and collaboration | Teams with moderate budget | Enterprise automation needs
When to Choose MLflow
MLflow is the optimal choice in these scenarios:
Limited budget (<5K EUR/year): self-hosting on a single VM costs around 180 EUR/year
Data residency requirements: sensitive data that cannot leave company infrastructure
Python ecosystem integration: MLflow integrates natively with scikit-learn, PyTorch, TensorFlow, XGBoost, and 20+ frameworks
Compliance and audit (EU AI Act): full access to database and artifacts, no SaaS vendor lock-in
DevOps-oriented teams: MLflow is a Docker container like any other service
Querying and Analyzing Experiments
One of MLflow's most valuable features is the ability to programmatically query all
experiments to find the best run, compare runs across multiple dimensions, or extract
data for automated reports:
import mlflow
import pandas as pd
mlflow.set_tracking_uri("http://localhost:5000")
# ==================== Search Runs with Filters ====================
# Find all runs with AUC-ROC > 0.92 in your experiment
runs = mlflow.search_runs(
experiment_names=["churn-prediction-gbm"],
filter_string="metrics.auc_roc > 0.92 and tags.environment = 'dev'",
order_by=["metrics.auc_roc DESC"],
max_results=20,
)
print(runs[["run_id", "metrics.auc_roc", "metrics.f1_score",
"params.learning_rate", "params.max_depth"]].head())
# ==================== Find and Load the Best Run ====================
def get_best_run(experiment_name: str, metric: str = "metrics.auc_roc") -> dict:
    runs = mlflow.search_runs(
        experiment_names=[experiment_name],
        filter_string="status = 'FINISHED'",
        order_by=[f"{metric} DESC"],
        max_results=1,
    )
    if runs.empty:
        raise ValueError(f"No runs found in {experiment_name}")
    best = runs.iloc[0]
    return {
        "run_id": best["run_id"],
        "auc_roc": best.get("metrics.auc_roc"),
        "accuracy": best.get("metrics.accuracy"),
    }
best_run = get_best_run("churn-prediction-gbm")
print(f"Best run: {best_run['run_id']}, AUC-ROC: {best_run['auc_roc']:.4f}")
# Load the model from the best run
model = mlflow.sklearn.load_model(f"runs:/{best_run['run_id']}/model")
# ==================== Export Step-by-Step Metric History ====================
from mlflow.tracking import MlflowClient
client = MlflowClient()
run_id = "abc123def456"
metric_history = client.get_metric_history(run_id, "val_accuracy_step")
steps = [m.step for m in metric_history]
values = [m.value for m in metric_history]
accuracy_over_time = pd.DataFrame({"step": steps, "val_accuracy": values})
print(f"Training steps logged: {len(accuracy_over_time)}")
Best Practices for MLflow in Production
1. Naming Conventions
# Recommended naming schema for experiments and runs
# EXPERIMENT: [team]-[project]-[type]
"ml-eng-churn-prediction-gbm"
"ml-eng-churn-prediction-neural-net"
"research-recommender-collaborative"
# RUN NAME: [algorithm]-[key-param]-[date]
"gbm-lr0.05-depth6-2025-11-15"
"xgb-v2-autofeat-2025-11-20"
# Always use tags for structured metadata
mlflow.set_tags({
"team": "ml-engineering",
"project": "churn-prediction",
"environment": "dev", # dev | staging | prod
"dataset_version": "v2.1",
"git_branch": git_branch,
"git_commit": git_commit[:8],
"triggered_by": "ci-cd", # manual | ci-cd | scheduled
"approved_by": "", # Filled before promotion
})
2. Common Anti-Patterns
Using inconsistent metric names: accuracy and acc are two separate metrics for MLflow. Define a canonical dictionary of metric names and use it consistently.
Not closing active runs: if code crashes inside an active run
without the with mlflow.start_run() context manager, the run stays
in "RUNNING" state indefinitely. Always use the context manager.
Logging large CSV files as artifacts: MLflow is not a data lake.
For large datasets, log only metadata (DVC path, hash, size) and use DVC for
data versioning.
Using SQLite in multi-user production: SQLite does not support
concurrent writes. With two parallel training processes, lock errors occur.
Use PostgreSQL or MySQL for any multi-user setup.
Not logging the dataset version: model parameters without the
data version are insufficient for reproducibility. Always log the Git commit,
DVC tag, and dataset dimensions.
Skipping the Staging intermediate step: the Staging workflow
allows integration tests and team validation before production deployment.
Never skip this step.
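The inconsistent-metric-names pitfall above can be closed mechanically with a tiny shared module that every training script imports; CANONICAL_METRICS and canonical() are illustrative names of my own, not MLflow APIs:

```python
# metrics_names.py - one canonical alias map, imported by every training script
CANONICAL_METRICS = {
    "accuracy": "accuracy",
    "acc": "accuracy",
    "f1": "f1_score",
    "f1_score": "f1_score",
    "auc": "auc_roc",
    "roc_auc": "auc_roc",
    "auc_roc": "auc_roc",
}

def canonical(name: str) -> str:
    """Map any alias to the canonical metric name; fail fast on unknown names."""
    try:
        return CANONICAL_METRICS[name.lower()]
    except KeyError:
        raise ValueError(f"Unknown metric alias: {name!r}") from None
```

In training code, every logging call then goes through the helper, e.g. mlflow.log_metric(canonical("acc"), value), so a typo raises immediately instead of silently creating a new metric.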
What's New in MLflow 3: Toward GenAI and Agents
With the release of MLflow 3 in June 2025, the platform made a significant evolutionary
leap oriented toward the GenAI world. The most relevant updates for teams working with
traditional ML models and LLMs:
LoggedModel as a first-class entity: the model is no longer just
an artifact of a run. LoggedModels persist across runs, environments, and deployments,
with full lineage to metrics, parameters, traces, and evaluation data.
25% logging performance improvement: MLflow 3.x optimized database
queries and reduced logging overhead, delivering 25% higher log throughput compared
to version 2.5.
GenAI Tracing: automatic tracing for LLMs, chains, tool calls, and
agents with support for LangChain, LlamaIndex, OpenAI SDK, Anthropic, and others.
Feedback Collection API: structured collection of human feedback on
model outputs, integrated with the UI for review and evaluation.
Evolved Evaluation Framework: mlflow.evaluate() now
supports custom metrics, LLM-as-judge, and automatic model comparison.
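As a minimal sketch of the tracing idea, assuming MLflow's mlflow.trace decorator (available since the 2.14 line and central in MLflow 3); the two-step "agent" below is a toy stand-in, and the import is deferred so the module loads without MLflow installed:

```python
def build_traced_pipeline():
    """Sketch: trace a toy retrieve-then-answer pipeline with MLflow tracing."""
    import mlflow  # deferred: only needed when the pipeline is actually built

    @mlflow.trace  # records inputs, outputs, and latency as a span
    def retrieve(query: str) -> list:
        return [f"doc about {query}"]  # stand-in for a real retriever call

    @mlflow.trace
    def answer(query: str) -> str:
        docs = retrieve(query)  # nested traced call becomes a child span
        return f"Answer based on {len(docs)} document(s)"

    return answer
```

Calling the returned function produces a trace visible in the MLflow UI under the active experiment.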
Conclusions and Next Steps
MLflow has consolidated its position as the most widely used experiment tracking tool
in the open-source ML ecosystem, with an active community and continuous evolution.
The combination of tracking, Model Registry, and serving in a single self-hosted platform
makes it the natural choice for teams that want complete control over their MLOps
infrastructure without SaaS costs.
The workflow we covered in this article, from experiment logging to production promotion
through the Model Registry, covers 90% of real use cases for an ML team. Integrated with
DVC for data versioning (previous article) and GitHub Actions for CI/CD automation, it
delivers a complete, professional MLOps system for under 250 EUR/year.
The next article tackles one of the most insidious problems in production machine
learning: Model Drift. We will explore how to detect performance
degradation over time (data drift, concept drift, prediction drift) and how to implement
automated retraining systems with Grafana and Prometheus alerts.