Introduction: When Data Is Not Enough
In machine learning, data is everything. But often we do not have enough, or the dataset is imbalanced (one class has far more samples than others). Data augmentation is a set of techniques for artificially expanding the dataset, creating new samples from existing ones. The mathematics behind these techniques ranges from geometric transformations to statistical interpolation.
What You Will Learn
- Geometric transformations for images: rotation, flip, scaling
- Mixup and CutMix: interpolation between samples
- SMOTE: synthetic oversampling for minority classes
- Text and time series augmentation
- Generative models: GANs and VAEs for synthetic data
- When augmentation helps and when it hurts
Geometric Transformations for Images
Geometric transformations are the simplest and most intuitive forms of augmentation. Each transformation can be expressed as a transformation matrix applied to pixel coordinates.
Rotation
A rotation by angle \theta in the 2D plane:

R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}

Scaling (Zoom)
A scaling by factors s_x, s_y along the two axes (with s_x = s_y > 1 zooming in, < 1 zooming out):

S = \begin{pmatrix} s_x & 0 \\ 0 & s_y \end{pmatrix}

General Affine Transformation
Combining rotation, scaling, translation, and shear in homogeneous coordinates:

A = \begin{pmatrix} a_{11} & a_{12} & t_x \\ a_{21} & a_{22} & t_y \\ 0 & 0 & 1 \end{pmatrix}
```python
import numpy as np

def rotate_image(image, angle_deg):
    """Image rotation via inverse mapping (simplified for 2D arrays)."""
    angle_rad = np.radians(angle_deg)
    cos_a, sin_a = np.cos(angle_rad), np.sin(angle_rad)
    # Rotation matrix
    R = np.array([[cos_a, -sin_a],
                  [sin_a,  cos_a]])
    h, w = image.shape[:2]
    center = np.array([h / 2, w / 2])
    # Inverse mapping: for each output pixel, look up the source pixel
    rotated = np.zeros_like(image)
    for i in range(h):
        for j in range(w):
            coords = np.array([i, j]) - center
            src = R.T @ coords + center  # R.T is the inverse rotation
            si, sj = int(round(src[0])), int(round(src[1]))
            if 0 <= si < h and 0 <= sj < w:
                rotated[i, j] = image[si, sj]
    return rotated

def augment_batch(images, labels):
    """Apply random augmentations to a batch, keeping the originals."""
    augmented_images = []
    augmented_labels = []
    for img, label in zip(images, labels):
        augmented_images.append(img)
        augmented_labels.append(label)
        # Horizontal flip (50% probability)
        if np.random.random() > 0.5:
            augmented_images.append(np.fliplr(img))
            augmented_labels.append(label)
        # Vertical flip (30% probability)
        if np.random.random() > 0.7:
            augmented_images.append(np.flipud(img))
            augmented_labels.append(label)
        # Gaussian noise (50% probability)
        if np.random.random() > 0.5:
            noise = np.random.normal(0, 0.05, img.shape)
            augmented_images.append(np.clip(img + noise, 0, 1))
            augmented_labels.append(label)
    return np.array(augmented_images), np.array(augmented_labels)

# Example
np.random.seed(42)
batch = np.random.rand(4, 8, 8)  # 4 images of 8x8
labels = np.array([0, 1, 0, 2])
aug_images, aug_labels = augment_batch(batch, labels)
print(f"Original: {batch.shape[0]} images")
print(f"Augmented: {aug_images.shape[0]} images")
```
Mixup: Interpolation Between Samples
Mixup creates new samples by linearly interpolating between pairs of existing samples (both inputs and labels):

\tilde{x} = \lambda x_i + (1 - \lambda) x_j, \qquad \tilde{y} = \lambda y_i + (1 - \lambda) y_j

where \lambda \sim \text{Beta}(\alpha, \alpha) with \alpha \in (0, \infty). Typically \alpha = 0.2 (light mixing).
Why it works: Mixup acts as a regularizer, forcing the model to make linear predictions between samples. It reduces overfitting and improves calibration.
```python
import numpy as np

def mixup(X, y, alpha=0.2):
    """Mixup data augmentation (y can be one-hot or scalar targets)."""
    n = X.shape[0]
    # One lambda per sample, from a Beta distribution
    lam = np.random.beta(alpha, alpha, size=n)
    # Random permutation assigns each sample a mixing partner
    indices = np.random.permutation(n)
    # Reshape lambda to broadcast over the trailing dimensions
    lam_x = lam.reshape(-1, *([1] * (X.ndim - 1)))
    lam_y = lam.reshape(-1, *([1] * (y.ndim - 1)))
    # Interpolate inputs and labels with the same lambda
    X_mix = lam_x * X + (1 - lam_x) * X[indices]
    y_mix = lam_y * y + (1 - lam_y) * y[indices]
    return X_mix, y_mix

# Example with one-hot labels
np.random.seed(42)
X = np.random.randn(100, 10)                 # 100 samples, 10 features
y = np.eye(3)[np.random.randint(0, 3, 100)]  # One-hot, 3 classes
X_mix, y_mix = mixup(X, y, alpha=0.2)
print(f"Original sample y[0]: {y[0]}")
print(f"Mixup sample y_mix[0]: {np.round(y_mix[0], 3)}")
print(f"Mixup label sum: {y_mix[0].sum():.4f}")  # Always 1: convex combination
```
CutMix: Cut and Paste
CutMix cuts a rectangular region from one image and replaces it with the corresponding region from another. The mixed label weights the original label by the surviving area fraction:

\lambda = 1 - \frac{r_w r_h}{W H}, \qquad \tilde{y} = \lambda y_i + (1 - \lambda) y_j

where r_w, r_h are the width and height of the cut region, and W, H are the image dimensions.
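The area-based label weighting can be sketched in NumPy. This is a minimal sketch, assuming single-channel images of shape (n, H, W) and one-hot labels; the `cutmix` function and its parameters are illustrative, not a library API:

```python
import numpy as np

def cutmix(X, y, alpha=1.0, rng=np.random.default_rng(0)):
    """CutMix sketch: paste a random patch from a shuffled batch partner.

    X: (n, H, W) images, y: (n, C) one-hot labels.
    """
    n, H, W = X.shape
    indices = rng.permutation(n)          # mixing partners
    lam = rng.beta(alpha, alpha)
    # Patch dimensions so that the cut area fraction is about (1 - lam)
    r_h, r_w = int(H * np.sqrt(1 - lam)), int(W * np.sqrt(1 - lam))
    cy, cx = rng.integers(0, H), rng.integers(0, W)
    y1, y2 = np.clip(cy - r_h // 2, 0, H), np.clip(cy + r_h // 2, 0, H)
    x1, x2 = np.clip(cx - r_w // 2, 0, W), np.clip(cx + r_w // 2, 0, W)
    X_mix = X.copy()
    X_mix[:, y1:y2, x1:x2] = X[indices, y1:y2, x1:x2]
    # Recompute lambda from the area actually pasted (clipping may shrink it)
    lam_adj = 1 - (y2 - y1) * (x2 - x1) / (H * W)
    y_mix = lam_adj * y + (1 - lam_adj) * y[indices]
    return X_mix, y_mix

X = np.random.rand(4, 8, 8)
y = np.eye(2)[[0, 1, 0, 1]]
X_mix, y_mix = cutmix(X, y)
print(X_mix.shape, np.round(y_mix[0], 3))
```

As with Mixup, each mixed label remains a convex combination of two one-hot vectors, so it still sums to 1.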
SMOTE: Oversampling for Minority Classes
SMOTE (Synthetic Minority Over-sampling Technique) generates synthetic samples for the minority class by interpolating between existing samples and their k nearest neighbors:

\mathbf{x}_{\text{new}} = \mathbf{x}_i + \lambda (\mathbf{x}_{nn} - \mathbf{x}_i), \qquad \lambda \sim \mathcal{U}(0, 1)

where \mathbf{x}_{nn} is a random neighbor among the k nearest neighbors of \mathbf{x}_i.
When to use SMOTE: for tabular datasets with class imbalance (fraud, medical diagnoses, anomalies). Do NOT use for images (prefer Focal Loss or class weights) and do NOT apply to the test set (training only).
```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote(X_minority, n_synthetic, k=5):
    """SMOTE: generate synthetic samples from the minority class."""
    n_samples = X_minority.shape[0]
    k_actual = min(k, n_samples - 1)
    # Fit k+1 neighbors: each point's nearest neighbor is itself
    nn = NearestNeighbors(n_neighbors=k_actual + 1)
    nn.fit(X_minority)
    _, indices = nn.kneighbors(X_minority)
    synthetic = []
    for _ in range(n_synthetic):
        # Choose a random minority sample
        idx = np.random.randint(0, n_samples)
        # Choose a random neighbor (column 0 is the sample itself)
        nn_idx = indices[idx, np.random.randint(1, k_actual + 1)]
        # Interpolate between the sample and its neighbor
        lam = np.random.random()
        new_sample = X_minority[idx] + lam * (X_minority[nn_idx] - X_minority[idx])
        synthetic.append(new_sample)
    return np.array(synthetic)

# Imbalanced dataset: 100 samples of class 0, 10 of class 1
np.random.seed(42)
X_majority = np.random.randn(100, 5) + np.array([2, 0, 0, 0, 0])
X_minority = np.random.randn(10, 5) + np.array([-2, 0, 0, 0, 0])
# Generate 90 synthetic samples to balance the classes
X_synthetic = smote(X_minority, n_synthetic=90, k=5)
X_balanced = np.vstack([X_majority, X_minority, X_synthetic])
y_balanced = np.array([0] * 100 + [1] * 10 + [1] * 90)
print(f"Original: class 0={100}, class 1={10}")
print(f"Balanced: class 0={np.sum(y_balanced == 0)}, class 1={np.sum(y_balanced == 1)}")
print(f"Balanced shape: {X_balanced.shape}")
```
Text Augmentation
For text data, the main techniques are:
- Synonym Replacement: replace words with synonyms
- Random Insertion: insert synonyms at random positions
- Random Swap: swap word positions
- Random Deletion: delete random words with probability p
- Back-Translation: translate to another language and back (EN -> FR -> EN)
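Random Deletion and Random Swap are easy to sketch in plain Python; Synonym Replacement and Back-Translation need external resources (a thesaurus, a translation model) and are omitted here. A minimal sketch, with illustrative function names and parameters:

```python
import random

random.seed(42)

def random_deletion(words, p=0.1):
    """Drop each word independently with probability p (keep at least one)."""
    kept = [w for w in words if random.random() > p]
    return kept if kept else [random.choice(words)]

def random_swap(words, n_swaps=1):
    """Swap two random word positions, n_swaps times."""
    words = list(words)
    for _ in range(n_swaps):
        i, j = random.randrange(len(words)), random.randrange(len(words))
        words[i], words[j] = words[j], words[i]
    return words

sentence = "data augmentation artificially expands small datasets".split()
print(random_deletion(sentence, p=0.2))
print(random_swap(sentence, n_swaps=2))
```

Both operations preserve most of the sentence's meaning while changing its surface form, which is exactly the property a label-preserving augmentation needs.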
Time Series Augmentation
Specific techniques for temporal data:
Jittering (Gaussian Noise)
Adds independent Gaussian noise to every time step:

\tilde{x}_t = x_t + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, \sigma^2)
Time Warping
Distorts the time axis with a random monotone function, speeding up or slowing down portions of the series.
Window Slicing
Extracts random sub-sequences of length w < T and rescales them to the original length.
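Jittering and Window Slicing fit in a few lines of NumPy; `np.interp` handles the rescaling back to the original length. The series and parameters below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)
T = 100
series = np.sin(np.linspace(0, 4 * np.pi, T))

# Jittering: add Gaussian noise to every time step
jittered = series + rng.normal(0, 0.05, T)

# Window slicing: random sub-sequence of length w, rescaled to length T
w = 70
start = rng.integers(0, T - w)
window = series[start:start + w]
sliced = np.interp(np.linspace(0, w - 1, T), np.arange(w), window)

print(jittered.shape, sliced.shape)  # both (100,)
```

Note that slicing changes the effective frequency content of the series, so it should only be used when the label does not depend on absolute time scale.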
Generative Models for Synthetic Data
GAN (Generative Adversarial Networks)
A generator G and a discriminator D compete in a min-max game:

\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]

G generates fake data from noise z, while D tries to distinguish real from fake. At equilibrium, G produces samples indistinguishable from real data and D outputs 1/2 everywhere.
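To make the value function concrete, we can evaluate it on hypothetical discriminator outputs. At the equilibrium D outputs 1/2 on every sample, giving V = \log(1/2) + \log(1/2) = -\log 4:

```python
import numpy as np

def gan_value(d_real, d_fake):
    """V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))], averaged over a batch."""
    return np.mean(np.log(d_real)) + np.mean(np.log(1 - d_fake))

# A confident discriminator keeps the value high...
print(gan_value(np.array([0.9, 0.8]), np.array([0.1, 0.2])))
# ...while a fully fooled one (outputs 0.5 everywhere) gives -log 4
print(round(gan_value(np.full(2, 0.5), np.full(2, 0.5)), 4))  # -1.3863
```

Training alternates gradient steps: D ascends this value while G descends it (in practice via surrogate losses for numerical stability).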
VAE (Variational Autoencoder)
The VAE loss combines reconstruction and latent space regularization:

\mathcal{L} = \mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] - D_{\text{KL}}(q_\phi(z|x) \,\|\, p(z))

The first term measures reconstruction quality; the second pushes the latent posterior toward the standard Gaussian prior p(z) = \mathcal{N}(0, I), which makes it possible to sample new data by decoding z \sim \mathcal{N}(0, I).
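When the encoder outputs a diagonal Gaussian q(z|x) = \mathcal{N}(\mu, \text{diag}(e^{\text{logvar}})), the KL term has a closed form. A small NumPy check, where mu and logvar stand in for hypothetical encoder outputs:

```python
import numpy as np

def kl_diag_gaussian(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims."""
    return -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=-1)

mu = np.array([[0.0, 0.0], [1.0, -1.0]])  # two latent codes, 2 dims each
logvar = np.zeros((2, 2))                 # unit variance
print(kl_diag_gaussian(mu, logvar))       # [0. 1.]
```

The first code matches the prior exactly (KL = 0); the second pays a penalty for its shifted mean, which is what keeps the latent space sampleable.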
When Augmentation Works (and When It Does Not)
Works well:
- Small datasets with few samples per class
- Transformations that preserve semantics (flipping natural images)
- Imbalanced classes (SMOTE for tabular, Focal Loss + augmentation for images)
Does not work or causes harm:
- Transformations that change semantics (rotating digits 6 and 9)
- Overly aggressive augmentation (deforms data beyond recognition)
- Applying augmentation to the test set (data leakage)
- Already abundant and diversified data
Summary
Key Takeaways
- Geometric transformations: rotation, scaling, flip matrices - the basis of image augmentation
- Mixup: \tilde{x} = \lambda x_i + (1-\lambda) x_j - regularization through interpolation
- SMOTE: interpolates between minority class neighbors to balance
- GAN: min-max game to generate realistic synthetic data
- VAE: reconstruction + KL divergence for a sampleable latent space
- Golden rule: augmentation must preserve semantics and never touch the test set
Series Conclusion: with this article, the "Mathematics and Statistics for AI" series concludes. We have covered the foundations from linear algebra to information theory, from optimization to Transformer mathematics. Every concept has been connected to practical ML/AI applications with NumPy implementations. These mathematical foundations will allow you to deeply understand any machine learning algorithm you encounter.