Introduction: Measuring Information
Information theory, founded by Claude Shannon in 1948, gives us the tools to quantify uncertainty, measure the amount of information in a message, and evaluate how well a model approximates reality. In machine learning, these concepts appear everywhere: cross-entropy is the default classification loss, and KL divergence is at the heart of VAEs and knowledge distillation.
What You Will Learn
- Information content: surprise as -log(p)
- Entropy: measuring the uncertainty of a distribution
- Cross-entropy: the most used classification loss
- KL divergence: an asymmetric measure of the difference between distributions
- Mutual information: dependency between variables
- Perplexity and its connections to language models
Information Content: Surprise
The information content (or self-information) of an event with probability p measures how "surprising" that event is:

I(x) = -\log_2 p(x)

Intuition: a very probable event (p \approx 1) carries little information (low surprise). A rare event (p \approx 0) carries much information (high surprise). In base 2 the unit is the bit: one bit is the information content of a fair coin flip.
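A quick way to build intuition is to compute self-information directly; this minimal sketch only assumes NumPy:

```python
import numpy as np

def self_information(p):
    """Information content -log2(p), in bits, of an event with probability p."""
    return -np.log2(p)

print(f"Fair coin flip: {self_information(0.5):.2f} bits")    # 1 bit
print(f"Rolling a 6:    {self_information(1/6):.2f} bits")
print(f"Rare event:     {self_information(0.001):.2f} bits")
```

Note how the rarer the event, the more bits of surprise it carries.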
Entropy: Average Uncertainty
Entropy is the expected value of the information content, i.e. the average surprise of a distribution:

H(X) = -\sum_x p(x) \log_2 p(x)

For a continuous distribution, the differential entropy is:

h(X) = -\int p(x) \log p(x) \, dx
Fundamental properties:
- H(X) \geq 0 always (uncertainty is never negative)
- H(X) = 0 only if X is deterministic (one event has probability 1)
- H(X) is maximal for the uniform distribution (maximum uncertainty)
Example: for a fair coin (P(H) = P(T) = 0.5), the entropy is H = -0.5\log_2(0.5) - 0.5\log_2(0.5) = 1 bit. For a biased coin with P(H) = 0.99, the entropy is about 0.08 bits: almost no uncertainty, since we almost always know the outcome.
import numpy as np

def entropy(probs):
    """Compute entropy in bits (log base 2)."""
    probs = np.array(probs)
    probs = probs[probs > 0]  # Avoid log(0)
    return -np.sum(probs * np.log2(probs))

# Fair coin
print(f"Fair coin: H = {entropy([0.5, 0.5]):.4f} bits")

# Biased coin
print(f"Biased coin (0.99): H = {entropy([0.99, 0.01]):.4f} bits")

# Fair 6-sided die (uniform)
print(f"Fair die: H = {entropy([1/6]*6):.4f} bits")

# Loaded die (3 comes up 50%)
probs_loaded = [0.1, 0.1, 0.5, 0.1, 0.1, 0.1]
print(f"Loaded die: H = {entropy(probs_loaded):.4f} bits")
Cross-Entropy: The Classification Loss
Cross-entropy between the true distribution p and the model's predicted distribution q measures how many bits are needed on average to encode data from p using the optimal code for q:

H(p, q) = -\sum_x p(x) \log q(x)

In classification, p is the target distribution (one-hot) and q is the softmax output. For a single sample with one-hot label y and prediction \hat{y}:

\text{CE} = -\sum_c y_c \log \hat{y}_c = -\log \hat{y}_{\text{true class}}

For binary classification, this simplifies to binary cross-entropy:

\text{BCE} = -\left[ y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \right]
Fundamental connection: minimizing cross-entropy is equivalent to maximizing the log-likelihood of the model. This explains why cross-entropy is the natural classification loss: we are looking for the model that assigns the maximum probability to the observed data.
import numpy as np

def cross_entropy(p, q):
    """Cross-entropy H(p, q) using natural logarithm."""
    q = np.clip(q, 1e-15, 1 - 1e-15)  # Avoid log(0)
    return -np.sum(p * np.log(q))

def binary_cross_entropy(y_true, y_pred):
    """Binary cross-entropy for a single sample."""
    y_pred = np.clip(y_pred, 1e-15, 1 - 1e-15)
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
# 3-class classification
y_true = np.array([0, 0, 1]) # Class 3
# Good prediction
y_pred_good = np.array([0.05, 0.05, 0.90])
print(f"Good prediction: CE = {cross_entropy(y_true, y_pred_good):.4f}")
# Mediocre prediction
y_pred_mid = np.array([0.2, 0.3, 0.5])
print(f"Medium prediction: CE = {cross_entropy(y_true, y_pred_mid):.4f}")
# Wrong prediction
y_pred_bad = np.array([0.7, 0.2, 0.1])
print(f"Wrong prediction: CE = {cross_entropy(y_true, y_pred_bad):.4f}")
# Binary cross-entropy
print(f"\nBCE(y=1, pred=0.9) = {binary_cross_entropy(1, 0.9):.4f}")
print(f"BCE(y=1, pred=0.5) = {binary_cross_entropy(1, 0.5):.4f}")
print(f"BCE(y=1, pred=0.1) = {binary_cross_entropy(1, 0.1):.4f}")
KL Divergence: Distance Between Distributions
KL divergence (Kullback-Leibler) measures how much a distribution q differs from a reference distribution p:

D_{\text{KL}}(p \| q) = \sum_x p(x) \log \frac{p(x)}{q(x)}
Important properties:
- D_{\text{KL}}(p \| q) \geq 0 always (Gibbs' inequality)
- D_{\text{KL}}(p \| q) = 0 if and only if p = q
- Not symmetric: D_{\text{KL}}(p \| q) \neq D_{\text{KL}}(q \| p), so it is not a true distance
The relation H(p, q) = H(p) + D_{\text{KL}}(p \| q) tells us that cross-entropy is the entropy of p plus the KL divergence. Since H(p) is constant (it does not depend on the model), minimizing cross-entropy is equivalent to minimizing KL divergence.
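This identity is easy to verify numerically; the sketch below uses natural logarithms throughout (so all quantities are in nats) and assumes nothing beyond NumPy:

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])   # "true" distribution
q = np.array([0.5, 0.3, 0.2])   # model distribution

entropy_p = -np.sum(p * np.log(p))   # H(p)
cross_ent = -np.sum(p * np.log(q))   # H(p, q)
kl_pq = np.sum(p * np.log(p / q))    # D_KL(p || q)

# H(p, q) = H(p) + D_KL(p || q), up to floating-point error
print(f"H(p, q)     = {cross_ent:.6f}")
print(f"H(p) + D_KL = {entropy_p + kl_pq:.6f}")
```

The two printed values agree, which is exactly why minimizing one is the same as minimizing the other when H(p) is fixed.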
KL Divergence in VAEs
In Variational Autoencoders, the loss includes a KL divergence term that forces the latent distribution N(\mu, \sigma^2) to be close to a standard Gaussian N(0, 1). For a diagonal Gaussian, this term has a closed form:

D_{\text{KL}}\left( N(\mu, \sigma^2) \,\|\, N(0, 1) \right) = -\frac{1}{2} \sum_j \left( 1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2 \right)
import numpy as np

def kl_divergence(p, q):
    """KL divergence D_KL(p || q)."""
    p = np.array(p, dtype=float)
    q = np.array(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))
# Two distributions over 4 classes
p = np.array([0.25, 0.25, 0.25, 0.25]) # Uniform
q1 = np.array([0.3, 0.2, 0.3, 0.2]) # Slightly different
q2 = np.array([0.9, 0.03, 0.04, 0.03]) # Very different
print(f"KL(p || q1) = {kl_divergence(p, q1):.6f}")
print(f"KL(p || q2) = {kl_divergence(p, q2):.6f}")
# KL asymmetry
print(f"\nKL(p || q1) = {kl_divergence(p, q1):.6f}")
print(f"KL(q1 || p) = {kl_divergence(q1, p):.6f}")
# KL for VAE (Gaussian vs standard normal)
def kl_gaussian(mu, log_var):
    """KL divergence between N(mu, sigma^2) and N(0, 1)."""
    return -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))
mu = np.array([0.5, -0.3, 0.1])
log_var = np.array([-0.5, 0.2, -0.1])
print(f"\nKL(N(mu,sigma^2) || N(0,1)) = {kl_gaussian(mu, log_var):.4f}")
Mutual Information
Mutual information measures how much information one random variable provides about another:

I(X; Y) = \sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(x) \, p(y)} = H(X) - H(X \mid Y)
If I(X; Y) = 0, the variables are independent. In ML, mutual information is used for: feature selection (selecting the most informative features), clustering evaluation, and as an objective in the InfoNCE loss of contrastive learning.
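For discrete variables, mutual information can be computed directly from a joint probability table; this is a minimal sketch assuming only NumPy (results in nats):

```python
import numpy as np

def mutual_information(joint):
    """Mutual information I(X; Y) in nats from a joint probability table p(x, y)."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)   # marginal p(x)
    py = joint.sum(axis=0, keepdims=True)   # marginal p(y)
    mask = joint > 0                        # Avoid log(0)
    return np.sum(joint[mask] * np.log(joint[mask] / (px * py)[mask]))

# Independent variables: joint = outer product of marginals -> I = 0
indep = np.outer([0.5, 0.5], [0.3, 0.7])
print(f"Independent: I = {mutual_information(indep):.6f}")

# Perfectly correlated binary variables -> I = H(X) = log(2) nats
corr = np.array([[0.5, 0.0], [0.0, 0.5]])
print(f"Correlated:  I = {mutual_information(corr):.6f}")
```

The two extremes bracket the general case: knowing Y tells us nothing about X in the first example and everything in the second.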
Perplexity: Evaluating Language Models
Perplexity is a standard metric for evaluating language models. It is defined as the exponential of the average cross-entropy per token:

\text{PPL} = \exp\left( -\frac{1}{N} \sum_{i=1}^{N} \log q(w_i \mid w_{<i}) \right)

A perplexity of k means that, on average, the model is as "confused" as if it were choosing uniformly among k options at each step. Lower perplexity means a better model.
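Given the probabilities a model assigned to each observed token, perplexity is a one-liner; this sketch assumes a list of per-token probabilities and uses natural logarithms:

```python
import numpy as np

def perplexity(token_probs):
    """Perplexity from the probabilities the model assigned to the observed tokens."""
    token_probs = np.asarray(token_probs, dtype=float)
    avg_nll = -np.mean(np.log(token_probs))  # average cross-entropy per token (nats)
    return np.exp(avg_nll)

# A model that assigns probability 0.25 to every token -> perplexity 4
print(f"Uniform over 4: PPL = {perplexity([0.25] * 10):.2f}")

# A more confident model -> lower perplexity
print(f"Confident:      PPL = {perplexity([0.9, 0.8, 0.7, 0.95]):.2f}")
```

The first case illustrates the "k options" intuition exactly: probability 1/4 per token gives perplexity 4.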
Summary and Connections to ML
Key Takeaways
- Entropy H(X): measures uncertainty, maximal for uniform distributions
- Cross-entropy H(p,q): the standard classification loss
- KL divergence: asymmetric measure of the difference between distributions, used in VAEs
- Minimizing cross-entropy = maximizing log-likelihood = minimizing KL
- Mutual information: measures dependency, used in feature selection and contrastive learning
- Perplexity: standard language model metric, lower is better
In the Next Article: we will explore PCA and dimensionality reduction. We will see how the covariance matrix, eigenvectors, and SVD allow compressing data while retaining most of the information.