YOLO and Object Detection: From Theory to Practice with YOLO26
In January 2026 Ultralytics released YOLO26, the latest evolution of the YOLO family that has redefined real-time object detection. But to understand YOLO26 you must first understand YOLO itself: what makes it extraordinarily fast, how its architecture works, and why it has become the de facto standard for object detection in industrial, automotive, surveillance, and robotics applications. In this article we will build a complete understanding of modern object detection, from YOLOv1 through practical implementation with YOLOv8 and YOLO26.
Note: This is one of the first comprehensive English tutorials on YOLO26. The Computer Vision with Deep Learning series on federicocalo.dev is the reference source for these topics.
What You Will Learn
- How object detection works: bounding boxes, confidence scores, class probabilities
- YOLO architecture: backbone, neck, head - from theory to implementation
- YOLO history: from v1 to YOLO26, key improvements at each version
- Anchor-free detection: why YOLOv8 and YOLO26 abandoned anchor boxes
- Core metrics: IoU, mAP, precision-recall curves
- Complete training on a custom dataset with YOLOv8 (Ultralytics)
- Real-time inference on images, video and webcam
- Export and deployment: ONNX, TensorRT, OpenVINO, CoreML, NCNN
- YOLO26: the architectural innovations of January 2026
- Best practices for dataset preparation and model selection
1. Object Detection Fundamentals
Object detection is the task of simultaneously locating and classifying one or more objects within an image. Unlike classification (a single label for the entire image), detection must answer three questions: what each object is, where it is (bounding box), and how confident the model is in each detection.
1.1 Output Representation
Every detected object is represented by a bounding box with 5 core values plus a probability vector for the classes. YOLO uses normalized coordinates (0 to 1) relative to the image dimensions, making annotations resolution-independent.
# Each detection is represented by:
# [x_center, y_center, width, height, confidence] + [p_class1, p_class2, ..., p_classN]
# Example: detecting a cat (class 0) in a 640x640 image
detection = {
    'bbox': (0.45, 0.60, 0.30, 0.40),  # x_c, y_c, w, h (normalized 0-1)
    'confidence': 0.94,                # objectness confidence score
    'class_id': 0,
    'class_name': 'cat',
    'class_prob': 0.96                 # conditional class probability
}
# The "final score" is: confidence * class_prob = 0.94 * 0.96 = 0.90
# Convert to pixel coordinates (640x640 image):
x_c_px = 0.45 * 640 # = 288
y_c_px = 0.60 * 640 # = 384
w_px = 0.30 * 640 # = 192
h_px = 0.40 * 640 # = 256
# Convert to [x1, y1, x2, y2] format
x1 = x_c_px - w_px / 2 # = 192
y1 = y_c_px - h_px / 2 # = 256
x2 = x_c_px + w_px / 2 # = 384
y2 = y_c_px + h_px / 2 # = 512
1.2 Non-Maximum Suppression (NMS)
Detection models produce hundreds of overlapping bounding box proposals for every object. Non-Maximum Suppression (NMS) selects the single best box per object and removes all near-duplicate proposals, using Intersection over Union (IoU) as the overlap criterion. YOLO26 introduces a learned Dynamic NMS that adapts the threshold based on scene density.
import numpy as np
def compute_iou(box1: np.ndarray, box2: np.ndarray) -> float:
    """
    Computes Intersection over Union between two bounding boxes.
    Input: [x1, y1, x2, y2] for both boxes.
    Returns: IoU in [0, 1]
    """
    x_left = max(box1[0], box2[0])
    y_top = max(box1[1], box2[1])
    x_right = min(box1[2], box2[2])
    y_bottom = min(box1[3], box2[3])
    if x_right < x_left or y_bottom < y_top:
        return 0.0  # no intersection
    intersection = (x_right - x_left) * (y_bottom - y_top)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - intersection
    return intersection / union
def non_maximum_suppression(
    boxes: np.ndarray,
    scores: np.ndarray,
    iou_threshold: float = 0.45,
    score_threshold: float = 0.25
) -> list[int]:
    """
    Applies NMS to remove duplicate bounding boxes.
    Args:
        boxes: [N, 4] array of boxes in [x1, y1, x2, y2] format
        scores: [N] confidence scores
        iou_threshold: boxes with IoU above this are considered duplicates
        score_threshold: discard boxes below this confidence
    Returns:
        List of indices of selected boxes
    """
    valid_mask = scores >= score_threshold
    boxes = boxes[valid_mask]
    scores = scores[valid_mask]
    indices = np.where(valid_mask)[0]
    order = np.argsort(scores)[::-1]
    selected = []
    while len(order) > 0:
        best = order[0]
        selected.append(indices[best])
        order = order[1:]
        if len(order) == 0:
            break
        ious = np.array([compute_iou(boxes[best], boxes[i]) for i in order])
        order = order[ious < iou_threshold]
    return selected
# Example
boxes = np.array([[100, 100, 300, 300],
                  [110, 105, 310, 305],   # near-duplicate of first
                  [500, 200, 700, 400]])  # distinct object
scores = np.array([0.92, 0.88, 0.75])
kept = non_maximum_suppression(boxes, scores, iou_threshold=0.5)
print(f"Kept boxes: {kept}") # [0, 2] - duplicate removed
1.3 Evaluation Metrics
Core Metrics for Object Detection
| Metric | Formula | Meaning |
|---|---|---|
| IoU | Intersection / Union | Overlap between predicted box and ground truth |
| Precision | TP / (TP + FP) | Fraction of correct predictions among all predictions |
| Recall | TP / (TP + FN) | Fraction of real objects that were found |
| AP@0.5 | Area under PR curve at IoU=0.5 | Per-class detection accuracy |
| mAP@0.5 | Mean AP across all classes | Primary metric for model comparison |
| mAP@0.5:0.95 | Mean AP averaged over IoU thresholds 0.5-0.95 (step 0.05) | Stricter COCO standard metric (harder thresholds) |
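To make these formulas concrete, here is a minimal NumPy sketch that computes precision, recall, and an all-points-interpolated AP from a ranked list of predictions. The `is_tp` flags (whether each prediction matched a ground-truth box at IoU >= 0.5) and the toy values are illustrative assumptions, not output from a real model:

```python
import numpy as np

def precision_recall_ap(scores, is_tp, num_gt):
    """Precision/recall curves and average precision (AP) for one class.

    scores: confidence of each prediction
    is_tp:  1 if the prediction matched a ground-truth box, else 0
    num_gt: total number of ground-truth objects
    """
    order = np.argsort(scores)[::-1]            # rank by confidence, descending
    tp = np.cumsum(np.asarray(is_tp)[order])
    fp = np.cumsum(1 - np.asarray(is_tp)[order])
    precision = tp / (tp + fp)
    recall = tp / num_gt
    # All-points interpolation: make precision monotonically non-increasing
    # from right to left, then integrate the PR curve over recall.
    p_interp = np.maximum.accumulate(precision[::-1])[::-1]
    recall_ext = np.concatenate([[0.0], recall])
    ap = np.sum((recall_ext[1:] - recall_ext[:-1]) * p_interp)
    return precision, recall, ap

# 5 predictions against 4 ground-truth objects
scores = np.array([0.95, 0.90, 0.80, 0.60, 0.40])
is_tp  = np.array([1,    1,    0,    1,    0])
p, r, ap = precision_recall_ap(scores, is_tp, num_gt=4)
print(f"final precision={p[-1]:.2f} recall={r[-1]:.2f} AP={ap:.3f}")
```

mAP@0.5 is then the mean of this AP over all classes; mAP@0.5:0.95 additionally averages over the ten IoU matching thresholds.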
2. YOLO Architecture: How It Works
YOLO (You Only Look Once) was introduced by Redmon et al. in 2016 with a revolutionary idea: treat object detection as a single regression problem, predicting bounding boxes and class probabilities directly from a single forward pass through the network. No region proposals, no two-stage processing: one network, one inference, extreme speed. Modern YOLO versions process 640x640 images at 30-100 FPS on consumer GPUs.
2.1 Three-Stage Architecture: Backbone, Neck, Head
Input Image (640x640x3)
        |
        v
+------------------+
|     BACKBONE     |  Multi-scale feature extraction
|  (CSPDarkNet /   |  Outputs feature maps at 3 scales:
|   Hybrid Attn)   |    P3: 80x80 (small objects, high resolution)
|                  |    P4: 40x40 (medium objects)
|                  |    P5: 20x20 (large objects, high semantics)
+------------------+
        |
        v
+------------------+
|       NECK       |  Multi-scale feature aggregation
|  (PANet / BiFPN) |  Feature Pyramid Network (FPN) top-down path
|                  |  fuses semantic info from deep layers with
|                  |  spatial detail from shallow layers
+------------------+
        |
        v
+------------------+
|       HEAD       |  Final predictions per scale
|    (Decoupled    |  For each grid cell at each scale:
|   Anchor-Free)   |    - Box regression: [x, y, w, h]
|                  |    - Objectness: p(object present)
|                  |    - Classification: [p_c1, ..., p_cN]
+------------------+
        |
        v
Output: [batch, num_predictions, 4 + 1 + num_classes]

# YOLOv8 nano on 640x640: 8400 total predictions
# Breakdown: 80x80 + 40x40 + 20x20 = 6400 + 1600 + 400 = 8400
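Since the head is anchor-free, the prediction count is simply the number of grid cells across the three stride-8/16/32 scales. A quick sketch to verify the breakdown for any input size:

```python
def num_predictions(imgsz: int, strides=(8, 16, 32)) -> int:
    """Total grid cells (= anchor-free predictions) across all detection scales."""
    return sum((imgsz // s) ** 2 for s in strides)

print(num_predictions(640))   # 80*80 + 40*40 + 20*20 = 8400
print(num_predictions(1280))  # 160*160 + 80*80 + 40*40 = 33600
```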
2.2 YOLO Evolution: From v1 to YOLO26
YOLO Version History
| Version | Year | Key Innovation | mAP (COCO) |
|---|---|---|---|
| YOLOv1 | 2016 | Single-stage detection, SxS grid, one-shot regression | 63.4 (VOC) |
| YOLOv3 | 2018 | Multi-scale detection at 3 resolutions, Darknet-53 backbone | 33.0 |
| YOLOv5 | 2020 | CSP backbone, mosaic augmentation, PyTorch-native | 48.2 |
| YOLOv7 | 2022 | Extended ELAN, auxiliary training heads, model re-parameterization | 51.4 |
| YOLOv8 | 2023 | Anchor-free, decoupled head, C2f block, Ultralytics API | 53.9 |
| YOLOv9 | 2024 | GELAN, Programmable Gradient Information (PGI) | 55.6 |
| YOLOv10 | 2024 | NMS-free inference, dual-label assignment, consistent matching | 54.4 |
| YOLO26 | Jan 2026 | Hybrid Attention backbone, Dynamic NMS, C3k2 blocks | 57.2 |
2.3 Anchor-Free Detection: The YOLOv8 Revolution
One of the most significant innovations in YOLOv8 (inherited by YOLO26) is the abandonment of anchor boxes. Earlier YOLO versions used predefined anchor shapes computed via k-means clustering on the training dataset. This approach had two major drawbacks: (1) anchor design required dataset-specific tuning, and (2) the model predicted offsets relative to these anchors, introducing a prior bias that could hurt performance on novel object shapes.
In the anchor-free approach adopted by YOLOv8 and YOLO26, the model directly predicts the absolute (x, y) center coordinates and (w, h) dimensions of each bounding box from each grid cell. The decoupled head separates classification and box regression into two parallel branches, significantly improving performance because the two tasks require different feature representations. This simplification also makes the model easier to train and transfer to new domains.
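To make the decoupled head concrete, here is a simplified PyTorch sketch of the idea: two parallel convolutional branches over one feature map, one regressing box geometry and one predicting class scores. This is an illustration of the concept only, not the actual Ultralytics head (which, among other things, uses Distribution Focal Loss bins for box regression):

```python
import torch
import torch.nn as nn

class DecoupledHead(nn.Module):
    """Simplified anchor-free decoupled head for ONE scale (illustrative only)."""

    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        # Box regression branch: 4 values [x, y, w, h] per grid cell
        self.reg_branch = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(in_channels, 4, 1),
        )
        # Classification branch: one logit per class per grid cell
        self.cls_branch = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(in_channels, num_classes, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        box = self.reg_branch(x)                 # [B, 4, H, W]
        cls = self.cls_branch(x)                 # [B, nc, H, W]
        out = torch.cat([box, cls], dim=1)       # [B, 4 + nc, H, W]
        return out.flatten(2).permute(0, 2, 1)   # [B, H*W, 4 + nc]

head = DecoupledHead(in_channels=256, num_classes=80)
feat = torch.randn(1, 256, 20, 20)               # a P5-like feature map
print(head(feat).shape)  # torch.Size([1, 400, 84])
```

The key point is that the two branches share no weights after the input feature map, so each can learn the representation its task needs.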
3. Training on a Custom Dataset with YOLOv8
3.1 Dataset Preparation
YOLOv8 uses the YOLO TXT format for annotations: one .txt file per image,
one line per object in the format:
<class_id> <x_center> <y_center> <width> <height>
(all coordinates normalized to [0, 1]).
dataset/
├── images/
│ ├── train/ # Training images
│ │ ├── img001.jpg
│ │ └── ...
│ ├── val/ # Validation images (~20%)
│ └── test/ # Test images (optional)
├── labels/
│ ├── train/ # One .txt per image
│ │ ├── img001.txt # One line per object
│ │ └── ...
│ ├── val/
│ └── test/
└── dataset.yaml # Dataset configuration
# Content of img001.txt (two objects):
# class_id x_c y_c w h
0 0.45 0.60 0.30 0.40 # cat in center
1 0.85 0.25 0.20 0.35 # dog top right
# dataset.yaml:
# path: /path/to/dataset
# train: images/train
# val: images/val
# nc: 2 # number of classes
# names: ['cat', 'dog']
import json
from pathlib import Path
def convert_coco_to_yolo(coco_json_path: str, output_dir: str) -> None:
    """
    Converts COCO JSON annotations to YOLO TXT format.
    Useful for public datasets: COCO, Open Images, LVIS, etc.
    """
    with open(coco_json_path) as f:
        coco_data = json.load(f)
    images = {img['id']: img for img in coco_data['images']}
    # Map COCO category_id -> zero-based YOLO index
    cat_id_to_yolo = {cat['id']: i for i, cat in enumerate(coco_data['categories'])}
    # Group annotations by image
    annotations_by_image: dict = {}
    for ann in coco_data['annotations']:
        img_id = ann['image_id']
        annotations_by_image.setdefault(img_id, []).append(ann)
    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)
    for img_id, anns in annotations_by_image.items():
        img_info = images[img_id]
        img_w, img_h = img_info['width'], img_info['height']
        img_name = Path(img_info['file_name']).stem
        lines = []
        for ann in anns:
            # COCO: [x_top_left, y_top_left, width, height]
            x_tl, y_tl, w, h = ann['bbox']
            # Convert to YOLO normalized center format
            x_c = (x_tl + w / 2) / img_w
            y_c = (y_tl + h / 2) / img_h
            w_n = w / img_w
            h_n = h / img_h
            cls = cat_id_to_yolo[ann['category_id']]
            lines.append(f"{cls} {x_c:.6f} {y_c:.6f} {w_n:.6f} {h_n:.6f}")
        with open(output_path / f"{img_name}.txt", 'w') as f:
            f.write('\n'.join(lines))
    print(f"Converted {len(annotations_by_image)} images -> {output_dir}")
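Before training on converted labels, it pays to validate them. A small sketch (the helper name and checks are our own, not part of Ultralytics) that flags malformed lines, out-of-range class ids, and unnormalized coordinates:

```python
from pathlib import Path

def validate_yolo_labels(labels_dir: str, num_classes: int) -> list[str]:
    """Return a list of problems found in YOLO TXT label files."""
    problems = []
    for txt in sorted(Path(labels_dir).glob('*.txt')):
        for line_no, line in enumerate(txt.read_text().splitlines(), start=1):
            if not line.strip():
                continue  # skip blank lines
            parts = line.split()
            if len(parts) != 5:
                problems.append(f"{txt.name}:{line_no}: expected 5 fields, got {len(parts)}")
                continue
            cls, *coords = parts
            if not (0 <= int(cls) < num_classes):
                problems.append(f"{txt.name}:{line_no}: class id {cls} out of range")
            if any(not (0.0 <= float(v) <= 1.0) for v in coords):
                problems.append(f"{txt.name}:{line_no}: coordinate outside [0, 1]")
    return problems

issues = validate_yolo_labels('dataset/labels/train', num_classes=2)
print(f"{len(issues)} issues found")
for msg in issues[:10]:
    print(' ', msg)
```

Running this once before a long training run catches the silent failure mode where bad labels simply depress mAP without any error.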
3.2 Complete Training with Ultralytics YOLOv8
from ultralytics import YOLO
import yaml
from pathlib import Path
# Model variants (speed/accuracy tradeoff):
# yolov8n.pt - nano (fastest, ~3.2ms)
# yolov8s.pt - small
# yolov8m.pt - medium (recommended starting point)
# yolov8l.pt - large
# yolov8x.pt - xlarge (most accurate, ~10ms)
model = YOLO('yolov8m.pt') # loads COCO pre-trained weights
results = model.train(
    data='dataset.yaml',
    epochs=100,
    imgsz=640,
    batch=16,
    workers=8,
    device='0',              # GPU index; 'cpu' for CPU-only
    # Optimization
    optimizer='AdamW',
    lr0=0.001,               # initial learning rate
    lrf=0.01,                # final LR = lr0 * lrf
    momentum=0.937,
    weight_decay=0.0005,
    warmup_epochs=3,
    cos_lr=True,             # cosine annealing LR schedule
    # Augmentation
    mosaic=1.0,              # mosaic: combines 4 images
    mixup=0.1,               # mixup augmentation
    copy_paste=0.1,          # copy-paste augmentation
    degrees=10.0,
    translate=0.1,
    scale=0.5,
    fliplr=0.5,
    flipud=0.0,
    # Regularization
    dropout=0.0,
    label_smoothing=0.0,
    # Checkpointing
    save=True,
    save_period=10,
    project='runs/train',
    name='yolov8m_custom',
    patience=50,
    plots=True,
    verbose=True
)
print(f"Best mAP@0.5: {results.results_dict['metrics/mAP50(B)']:.3f}")
print(f"Best model: runs/train/yolov8m_custom/weights/best.pt")
3.3 Real-Time Inference
from ultralytics import YOLO
import cv2
import numpy as np
import time
from pathlib import Path
class YOLOInferenceEngine:
    """
    Production-ready inference engine for YOLOv8/YOLO26.
    Supports single images, video files, RTSP streams, and webcam.
    """

    def __init__(
        self,
        model_path: str = 'yolov8m.pt',
        conf_threshold: float = 0.25,
        iou_threshold: float = 0.45,
    ):
        self.model = YOLO(model_path)
        self.conf = conf_threshold
        self.iou = iou_threshold
        # Random color palette for up to 100 classes
        np.random.seed(42)
        self.colors = np.random.randint(0, 255, size=(100, 3), dtype=np.uint8)

    def predict_image(self, image_path: str,
                      save_path: str | None = None) -> list[dict]:
        """Single image inference with optional annotated output save."""
        results = self.model.predict(
            source=image_path, conf=self.conf, iou=self.iou, verbose=False
        )
        detections = []
        for r in results:
            for box in r.boxes:
                det = {
                    'bbox': box.xyxy[0].tolist(),  # [x1, y1, x2, y2]
                    'confidence': float(box.conf[0]),
                    'class_id': int(box.cls[0]),
                    'class_name': r.names[int(box.cls[0])]
                }
                detections.append(det)
            if save_path:
                cv2.imwrite(save_path, r.plot())
        return detections

    def process_video(self, source, output_path: str | None = None) -> None:
        """
        Process a video file or RTSP stream with real-time FPS overlay.
        source: file path, RTSP URL, or integer camera index.
        """
        cap = cv2.VideoCapture(source)
        writer = None
        if output_path and isinstance(source, str):
            fps = cap.get(cv2.CAP_PROP_FPS)
            w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
            h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
            writer = cv2.VideoWriter(
                output_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h)
            )
        frame_count = 0
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                break
            t0 = time.perf_counter()
            results = self.model(frame, conf=self.conf, iou=self.iou, verbose=False)
            fps_val = 1 / (time.perf_counter() - t0)
            annotated = results[0].plot()
            cv2.putText(annotated, f"FPS: {fps_val:.1f}", (10, 30),
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
            cv2.putText(annotated, f"Frame: {frame_count}", (10, 65),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 0), 2)
            if writer:
                writer.write(annotated)
            cv2.imshow('YOLO Detection', annotated)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
            frame_count += 1
        cap.release()
        if writer:
            writer.release()
        cv2.destroyAllWindows()
        print(f"Processed {frame_count} frames")

    def run_webcam(self, camera_id: int = 0) -> None:
        """Live detection from webcam. Press 'q' to quit."""
        print(f"Starting webcam {camera_id} detection. Press 'q' to exit.")
        self.process_video(camera_id, output_path=None)
# Usage
engine = YOLOInferenceEngine('best.pt', conf_threshold=0.4)
# Single image
dets = engine.predict_image('test.jpg', save_path='result.jpg')
for d in dets:
print(f"{d['class_name']}: {d['confidence']:.2f} @ {[int(c) for c in d['bbox']]}")
# Video file
engine.process_video('traffic.mp4', 'traffic_detected.mp4')
# Live webcam
engine.run_webcam(camera_id=0)
4. Export and Deployment
After training, you typically need to export your model to a deployment-optimized format. The choice depends on your target hardware: TensorRT for NVIDIA GPUs, OpenVINO for Intel silicon, NCNN or TFLite for ARM edge devices, CoreML for Apple Silicon.
YOLOv8 / YOLO26 Export Formats
| Format | Target Hardware | Speedup vs PyTorch | Use Case |
|---|---|---|---|
| ONNX | Multi-platform CPU/GPU | 1.5-2x | Maximum portability, cross-framework compatibility |
| TensorRT | NVIDIA GPU | 5-8x | Highest throughput on NVIDIA Jetson, T4, A100 |
| OpenVINO | Intel CPU/GPU/VPU | 3-4x | Intel server CPUs, Movidius Neural Compute Sticks |
| TFLite | Mobile/Edge | 2-3x | Android devices, Raspberry Pi with Coral TPU |
| CoreML | Apple Silicon (M1-M4) | 3-5x | iOS/macOS apps with hardware-accelerated Neural Engine |
| NCNN | ARM CPU | 2-4x | Raspberry Pi, embedded ARM SoCs, Qualcomm SoCs |
from ultralytics import YOLO
model = YOLO('runs/train/yolov8m_custom/weights/best.pt')
# ONNX (most portable, works on any hardware)
model.export(format='onnx', imgsz=640, opset=17, simplify=True)
# TensorRT (fastest on NVIDIA GPUs, requires TensorRT + CUDA)
model.export(
    format='engine',  # TensorRT engine
    imgsz=640,
    half=True,        # FP16: 2x faster with minimal accuracy drop
    workspace=4,      # GPU workspace in GB
    batch=1,          # fixed batch size for TRT
    device=0
)
# OpenVINO (Intel CPU optimized)
model.export(format='openvino', imgsz=640, half=False)
# TFLite (mobile/edge, supports INT8 quantization)
model.export(format='tflite', imgsz=640, int8=False)
# CoreML (Apple Silicon, iOS/macOS)
model.export(format='coreml', imgsz=640)
# NCNN (ARM embedded, no Python dependency at inference)
model.export(format='ncnn', imgsz=640)
# --- Benchmark: compare formats ---
from ultralytics.utils.benchmarks import benchmark
benchmark(
    model='runs/train/yolov8m_custom/weights/best.pt',
    data='dataset.yaml',
    imgsz=640,
    half=True,
    device=0
)
# --- Use exported models (same Ultralytics API) ---
model_onnx = YOLO('best.onnx')
model_trt = YOLO('best.engine')
results = model_onnx.predict('image.jpg', conf=0.25)
print(f"ONNX detections: {len(results[0].boxes)}")
results = model_trt.predict('image.jpg', conf=0.25)
print(f"TensorRT detections: {len(results[0].boxes)}")
5. YOLO26: What's New in January 2026
Released by Ultralytics in January 2026, YOLO26 introduces significant architectural innovations that position it as the reference model for real-time object detection in 2026. The key improvements are focused on three areas: a more expressive backbone with hybrid attention, a smarter post-processing stage with learned Dynamic NMS, and improved training with self-calibrating augmentation policies.
5.1 Key Innovations
YOLO26 vs YOLOv8: Technical Comparison
| Feature | YOLOv8 | YOLO26 |
|---|---|---|
| Backbone | CSP-DarkNet with C2f blocks | Hybrid Attention + C3k2 blocks |
| Neck | PANet | Enhanced PANet with SCDown |
| Head | Anchor-free decoupled | Anchor-free with Dual Head |
| NMS | Fixed-threshold NMS | Dynamic NMS (learned threshold scheduling) |
| Training augmentation | Manual mosaic/mixup/copy-paste | Self-calibrating auto_augment='yolo26' |
| mAP@0.5 (COCO) | 53.9 | 57.2 (+3.3) |
| mAP@0.5:0.95 (COCO) | 37.3 | 41.1 (+3.8) |
| Inference latency (A100) | 4.1ms | 3.8ms (-7%) |
The Hybrid Attention backbone combines convolutional operations (efficient for local texture features) with window-based self-attention (effective for long-range dependencies). The attention is applied selectively in the deeper backbone stages where it provides the most benefit for detecting objects at varying scales. Dynamic NMS replaces the static IoU threshold with a learned score that adapts to object density in each image, reducing false negatives in crowded scenes (pedestrians, vehicles in traffic) without increasing false positives in sparse scenes.
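Ultralytics has not published the internals of Dynamic NMS, so the following is only a heuristic illustration of the idea: derive the IoU threshold from scene density (many small boxes raise it so overlapping objects survive, few large boxes lower it) before running classic NMS. The function name and all constants here are invented for the sketch:

```python
import numpy as np

def adaptive_iou_threshold(boxes: np.ndarray, img_area: float,
                           lo: float = 0.40, hi: float = 0.65) -> float:
    """Heuristic stand-in for a density-adaptive NMS threshold.

    Crowded scenes (many boxes, each covering little of the image) get a
    HIGHER IoU threshold, so genuinely overlapping objects are not
    suppressed; sparse scenes get a lower one. Illustrative only - this
    is NOT the learned scheduler shipped in YOLO26.
    """
    n = len(boxes)
    if n == 0:
        return lo
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    mean_rel_area = float(np.mean(areas)) / img_area
    # Density score in [0, 1]: many small boxes -> ~1, few large -> ~0
    density = min(1.0, n / 50) * (1.0 - min(1.0, mean_rel_area * 10))
    return lo + (hi - lo) * density

sparse = np.array([[50, 50, 400, 400]], dtype=float)
crowd = np.array([[i * 12, 100, i * 12 + 30, 180] for i in range(40)], dtype=float)
print(f"sparse scene threshold:  {adaptive_iou_threshold(sparse, 640 * 640):.2f}")
print(f"crowded scene threshold: {adaptive_iou_threshold(crowd, 640 * 640):.2f}")
```

The returned threshold would then feed the `iou_threshold` parameter of a standard NMS routine like the one in Section 1.2.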
# YOLO26 requires ultralytics >= 8.3.0 (January 2026 release)
# pip install ultralytics --upgrade
from ultralytics import YOLO
# Available model sizes (same family as YOLOv8)
# yolo26n.pt - nano (fastest, ~3.1ms A100)
# yolo26s.pt - small
# yolo26m.pt - medium (recommended: best accuracy/speed tradeoff)
# yolo26l.pt - large
# yolo26x.pt - xlarge (most accurate, ~8.1ms A100)
model = YOLO('yolo26m.pt')
# Training on custom dataset (same Ultralytics API as YOLOv8)
results = model.train(
    data='dataset.yaml',
    epochs=100,
    imgsz=640,
    batch=16,
    device='0',
    # YOLO26-specific: self-calibrating augmentation policy
    # automatically adjusts mosaic/mixup/copy-paste intensity per epoch
    auto_augment='yolo26',
    # YOLO26-specific: adaptive NMS threshold scheduling
    # gradually tightens NMS during training for better precision
    nms_schedule=True,
    # Standard hyperparameters
    optimizer='AdamW',
    lr0=0.001,
    cos_lr=True,
    patience=50,
    plots=True
)
# Validation
metrics = model.val(data='dataset.yaml')
print(f"mAP@0.5: {metrics.box.map50:.3f}")
print(f"mAP@0.5:0.95: {metrics.box.map:.3f}")
# Inference (identical API to YOLOv8)
results = model.predict('test_image.jpg', conf=0.25, iou=0.45)
for r in results:
print(f"Detected {len(r.boxes)} objects")
6. Advanced Training Strategies and Hyperparameter Tuning
YOLO's success depends as much on training strategy as on architecture. From learning rate scheduling to class imbalance handling, these techniques distinguish a mediocre model from a production-ready one that actually holds up under real-world conditions.
from ultralytics import YOLO
import yaml
# ---- Dataset YAML with class weights for imbalance compensation ----
dataset_config = {
    'path': './datasets/custom',
    'train': 'images/train',
    'val': 'images/val',
    'nc': 5,
    'names': ['person', 'car', 'bicycle', 'dog', 'cat'],
    # Class weights: compensate imbalance (person is 5x more frequent)
    'cls_weights': [1.0, 1.0, 2.0, 2.5, 2.5]
}
with open('dataset.yaml', 'w') as f:
    yaml.dump(dataset_config, f)
# ---- Training with optimized hyperparameters ----
model = YOLO('yolo26m.pt')
results = model.train(
    data='dataset.yaml',
    epochs=300,
    imgsz=640,
    batch=16,
    device='0',
    # ---- Optimizer ----
    optimizer='AdamW',       # AdamW better than SGD for fine-tuning
    lr0=0.001,               # Initial learning rate
    lrf=0.01,                # Final LR = lr0 * lrf
    weight_decay=0.0005,
    # ---- LR Scheduling ----
    cos_lr=True,             # Cosine annealing (smoother than step)
    warmup_epochs=3,         # Linear warmup from lr0/10 to lr0
    warmup_momentum=0.8,
    # ---- YOLO26 Augmentation ----
    auto_augment='yolo26',   # YOLO26 self-calibrating augmentation policy
    mosaic=1.0,
    mixup=0.2,
    copy_paste=0.1,
    degrees=10.0,
    translate=0.2,
    scale=0.9,
    fliplr=0.5,
    # ---- Loss weights ----
    box=7.5,                 # Bounding box loss weight
    cls=0.5,                 # Classification loss weight
    dfl=1.5,                 # Distribution Focal Loss weight
    # ---- Early stopping and checkpointing ----
    patience=50,
    save_period=25,
    plots=True,
    project='runs/train',
    name='yolo26m_custom',
)
print(f"Best mAP@0.5: {results.results_dict['metrics/mAP50(B)']:.4f}")
print(f"Best mAP@0.5:0.95: {results.results_dict['metrics/mAP50-95(B)']:.4f}")
# ---- Hyperparameter Auto-Tuning with Ray Tune ----
def tune_yolo26(model_path: str, data_path: str) -> None:
    """
    Automatically optimize hyperparameters with Ray Tune.
    Requires: pip install ray[tune]
    Searches over learning rate, augmentation intensity, loss weights.
    """
    model = YOLO(model_path)
    # Search space
    space = {
        'lr0': (1e-5, 1e-1),  # log-uniform
        'lrf': (0.01, 1.0),
        'weight_decay': (0.0, 0.001),
        'warmup_epochs': (0, 5),
        'box': (0.02, 0.2),
        'cls': (0.2, 4.0),
        'mosaic': (0.0, 1.0),
        'mixup': (0.0, 0.5),
    }
    result = model.tune(
        data=data_path,
        space=space,
        epochs=50,        # epochs per trial
        iterations=100,   # number of configurations to try
        optimizer='AdamW',
        plots=True,
        save=True
    )
    print("Best hyperparameters found:")
    for k, v in result.items():
        print(f"  {k}: {v}")
# ---- Structured training monitoring with callbacks ----
class YOLOTrainingMonitor:
    """Custom training callbacks for advanced monitoring."""

    def __init__(self, no_improve_alert: int = 30):
        self.best_map = 0.0
        self.no_improve_count = 0
        self.alert_after = no_improve_alert
        self.history = []

    def on_train_epoch_end(self, trainer) -> None:
        metrics = trainer.metrics
        current_map = metrics.get('metrics/mAP50(B)', 0.0)
        self.history.append({
            'epoch': trainer.epoch,
            'map50': current_map,
        })
        if current_map > self.best_map:
            self.best_map = current_map
            self.no_improve_count = 0
        else:
            self.no_improve_count += 1
        if self.no_improve_count == self.alert_after:
            print(f"[WARN] No improvement for {self.alert_after} epochs. "
                  f"Best mAP: {self.best_map:.4f}")
6.1 Dataset Quality Rules for Robust Training
Dataset quality matters more than architecture. A YOLO26n trained on excellent data will outperform a YOLO26x trained on poor data. These are the non-negotiable rules for building a YOLO dataset that generalizes in production:
YOLO Dataset Quality Checklist
| Aspect | Minimum | Optimal | Why it Matters |
|---|---|---|---|
| Images per class | 500 | 2000+ | More variety = better generalization |
| Boxes per image | 1-10 | 5-50 (real scenes) | Too sparse = model ignores context |
| Condition variety | 2 lighting conditions | Day/night/indoor/outdoor | Robustness to domain shift |
| Class balance | Max 5:1 ratio | 2:1 or better | Prevents class dominance |
| Train/val/test split | 70/20/10 | 80/10/10 | Test set never seen during development |
| Annotation quality | Inter-annotator kappa > 0.8 | Consensus of 2+ annotators | Label noise directly degrades mAP |
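The class-balance rule is easy to audit automatically. A small sketch (the helper name is our own) that counts instances per class across YOLO label files and warns when the imbalance ratio exceeds the target:

```python
from collections import Counter
from pathlib import Path

def class_balance_report(labels_dir: str, names: list[str],
                         max_ratio: float = 5.0) -> dict[str, int]:
    """Count instances per class and warn when imbalance exceeds max_ratio."""
    counts = Counter()
    for txt in Path(labels_dir).glob('*.txt'):
        for line in txt.read_text().splitlines():
            if line.strip():
                counts[int(line.split()[0])] += 1
    report = {names[i]: counts.get(i, 0) for i in range(len(names))}
    present = [c for c in report.values() if c > 0]
    if present and max(present) / min(present) > max_ratio:
        print(f"[WARN] class imbalance exceeds {max_ratio}:1 "
              f"- consider cls_weights or resampling")
    for name, c in sorted(report.items(), key=lambda kv: -kv[1]):
        print(f"{name:>10}: {c}")
    return report

class_balance_report('dataset/labels/train', names=['cat', 'dog'])
```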
7. Best Practices for Object Detection
YOLO Model Selection Guide
| Scenario | Recommended Model | Reason |
|---|---|---|
| Rapid prototyping / research | YOLOv8n / YOLO26n | Fast iteration, easy to debug, small GPU memory |
| Production (GPU server) | YOLO26m / YOLO26l | Best accuracy/throughput balance for cloud deployment |
| Edge (Raspberry Pi, Jetson Nano) | YOLOv8n + INT8 quantization | Minimal memory footprint, works without NVIDIA GPU |
| Maximum accuracy (offline batch) | YOLO26x with TTA | State of the art, test-time augmentation for extra boost |
| Small dense objects (drones, PCB) | YOLOv8m with imgsz=1280 | Larger input resolution preserves fine-grained detail |
| Apple Silicon (iOS/macOS) | YOLOv8m exported to CoreML | Uses Neural Engine for 3-5x speedup on M-series chips |
Common Mistakes to Avoid
- Unbalanced dataset: If one class has 10x more images than others, the model will over-specialize. Use weighted sampling (cls_weights in dataset.yaml) or strategic over/undersampling to balance class distributions.
- Confidence threshold too low: A threshold of 0.1 floods outputs with false positives. Start at 0.25 and increase incrementally until precision is acceptable for your application.
- NMS IoU threshold too low: An IoU threshold of 0.3 incorrectly suppresses valid boxes in dense scenes (crowds, parking lots). Use 0.45-0.5 for overlapping objects, or use YOLO26's Dynamic NMS.
- Training images too small: YOLO is optimized for 640x640. Training at 320x320 significantly degrades detection of small objects, which can be the majority of challenging cases.
- Applying mosaic augmentation on industrial images: Mosaic combines 4 images into 1, which destroys spatial context. This is counterproductive for industrial inspection where the location of a defect relative to the part matters.
- Ignoring domain shift: YOLO pre-trained on COCO expects natural images. For very different domains (infrared, X-ray, microscopy, satellite), either fine-tune on a representative dataset or train from scratch.
Conclusions
In this article we built a comprehensive understanding of modern object detection with YOLO, covering theory, implementation, and production deployment:
- Object detection fundamentals: bounding boxes, IoU, NMS, mAP metrics
- YOLO three-stage architecture: backbone, FPN+PAN neck, anchor-free head
- YOLO evolution from v1 to YOLO26, with each version's key contribution
- Advanced training: cosine LR scheduling, class-weighted loss, warmup phases
- Hyperparameter auto-tuning with Ray Tune for finding optimal configurations
- Export to ONNX, TensorRT, OpenVINO, CoreML, NCNN for every hardware target
- Dataset quality rules: the non-negotiable foundation for a model that generalizes
- YOLO26 innovations: Hybrid Attention backbone, Dynamic NMS, +3.3 mAP vs YOLOv8
Cross-Series Resources
- MLOps: Model Serving in Production - serve your YOLO26 model at scale with FastAPI and Kubernetes
- Computer Vision on Edge: Raspberry Pi and Jetson - deploy YOLO26 with TensorRT and NCNN on embedded hardware
- Deep Learning: Advanced Object Detection - DETR, DINO and transformer-based detectors