Introduction: Why Probability Is Fundamental for AI
Machine learning is, at its core, an exercise in reasoning under uncertainty. Data is noisy, models are approximations, and predictions are never certain. Probability gives us the mathematical framework to quantify this uncertainty and make informed decisions.
In this article, we will work our way from the basics (conditional probability, distributions) up to advanced concepts: Bayes' theorem, Maximum Likelihood Estimation, and the comparison between frequentist and Bayesian approaches.
What You Will Learn
- Conditional probability and independence
- Distributions: Gaussian, Bernoulli, Categorical, Poisson
- Bayes' theorem: updating beliefs with data
- Maximum Likelihood Estimation (MLE)
- Maximum A Posteriori (MAP) and the Bayesian approach
- Central Limit Theorem and its implications
Foundations: Probability and Random Variables
The probability of an event A is a number between 0 and 1 that measures how likely that event is: P(A) \\in [0, 1].
The conditional probability of A given B measures the probability of A knowing that B occurred:
P(A | B) = \\frac{P(A \\cap B)}{P(B)}, \\quad P(B) > 0
Two events are independent if P(A \\cap B) = P(A) \\cdot P(B), meaning knowing one occurred does not change the probability of the other.
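These two definitions are easy to check by simulation. A minimal sketch, assuming a fair six-sided die, with A = "the roll is even" and B = "the roll is greater than 3" (events chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)  # fair six-sided die

A = rolls % 2 == 0   # event A: roll is even
B = rolls > 3        # event B: roll is greater than 3

p_a = A.mean()
p_b = B.mean()
p_ab = (A & B).mean()

# Conditional probability: P(A|B) = P(A and B) / P(B)
p_a_given_b = p_ab / p_b
print(f"P(A) = {p_a:.3f}, P(A|B) = {p_a_given_b:.3f}")

# Independence check: compare P(A and B) with P(A) * P(B)
print(f"P(A and B) = {p_ab:.3f}, P(A)P(B) = {p_a * p_b:.3f}")
```

Here knowing B raises the probability of A (two of the three rolls above 3 are even), so P(A and B) ≈ 1/3 differs from P(A)P(B) ≈ 1/4: the events are dependent.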
Expected Value and Variance
The expected value (mean) of a random variable X:
\\mathbb{E}[X] = \\sum_x x \\, P(X = x) \\ \\text{(discrete)} \\qquad \\mathbb{E}[X] = \\int x \\, p(x) \\, dx \\ \\text{(continuous)}
The variance measures the spread around the mean:
\\text{Var}(X) = \\mathbb{E}[(X - \\mathbb{E}[X])^2] = \\mathbb{E}[X^2] - \\mathbb{E}[X]^2
The standard deviation \\sigma = \\sqrt{\\text{Var}(X)} has the same units as X, making it more interpretable.
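Both definitions of the variance can be verified numerically. A quick sketch, assuming samples from a Gaussian with known mean 5 and standard deviation 2:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(loc=5.0, scale=2.0, size=100_000)

# E[X]: the sample mean approximates the expected value
mean = x.mean()

# Var(X) two ways: E[(X - E[X])^2] and E[X^2] - E[X]^2
var_def = np.mean((x - mean) ** 2)
var_alt = np.mean(x ** 2) - mean ** 2

std = np.sqrt(var_def)  # same units as X, hence more interpretable
print(f"mean ~ {mean:.3f} (true 5), var ~ {var_def:.3f} (true 4), std ~ {std:.3f} (true 2)")
```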
Fundamental Distributions for ML
Bernoulli Distribution
Models a single experiment with two outcomes (success/failure). Parameter: p (probability of success): P(X = 1) = p, P(X = 0) = 1 - p, with mean p and variance p(1 - p).
In ML: models binary classification. The output of a neuron with sigmoid is the parameter p of a Bernoulli.
Gaussian (Normal) Distribution
The most important distribution in statistics and ML, parameterized by mean \\mu and variance \\sigma^2:
p(x) = \\frac{1}{\\sqrt{2\\pi\\sigma^2}} \\exp\\left( -\\frac{(x - \\mu)^2}{2\\sigma^2} \\right)
Why it appears everywhere: by the Central Limit Theorem, the sum of many independent random variables tends to a Gaussian, regardless of the original distribution.
In ML: weight initialization (Gaussian with \\mu = 0), data noise, Gaussian Mixture Models, VAE (variational autoencoders).
Categorical (Multinoulli) Distribution
Generalization of Bernoulli to K classes with probabilities p_1, p_2, \\ldots, p_K where \\sum_i p_i = 1:
P(X = k) = p_k
In ML: the output of softmax models a categorical distribution over K classes.
import numpy as np
from scipy import stats
# Bernoulli: biased coin flip (p=0.7)
bernoulli = stats.bernoulli(p=0.7)
samples = bernoulli.rvs(size=1000)
print(f"Bernoulli - Empirical mean: {samples.mean():.3f} (expected: 0.7)")
# Gaussian: heights (mean=170cm, std=10cm)
gaussian = stats.norm(loc=170, scale=10)
heights = gaussian.rvs(size=1000)
print(f"Gaussian - Mean: {heights.mean():.1f}, Std: {heights.std():.1f}")
# Probability of being between 160 and 180
prob = gaussian.cdf(180) - gaussian.cdf(160)
print(f"P(160 < X < 180) = {prob:.4f}")
# Categorical: 6-sided die
probs = np.array([1/6] * 6)
categorical_samples = np.random.choice(6, size=1000, p=probs) + 1
print(f"Die - Mean: {categorical_samples.mean():.2f} (expected: 3.5)")
Bayes' Theorem: Updating Beliefs
Bayes' theorem is one of the most powerful tools for probabilistic reasoning. It allows us to update our belief about the probability of a hypothesis after observing data:
P(\\theta | D) = \\frac{P(D | \\theta) \\, P(\\theta)}{P(D)}
where:
- P(\\theta | D) - Posterior: updated belief after data
- P(D | \\theta) - Likelihood: how probable the data is given the model
- P(\\theta) - Prior: initial belief before data
- P(D) - Evidence: marginal probability of data (normalization constant)
Intuition: Bayes tells us to start with an initial belief (prior), observe data (likelihood), and combine the two to obtain an updated belief (posterior). The more data we observe, the more the posterior is dominated by the likelihood and less by the prior.
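This prior-to-posterior flow can be watched directly in the simplest conjugate setting: estimating a coin's bias with a Beta prior, whose posterior after Bernoulli observations is again a Beta. A minimal sketch, assuming a simulated coin with true bias 0.7 and an illustrative Beta(2, 2) prior centered on 0.5:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_p = 0.7
flips = rng.random(500) < true_p  # simulated coin flips (True = heads)

# Beta prior is conjugate to Bernoulli: posterior is Beta(alpha + heads, beta + tails)
alpha, beta = 2.0, 2.0  # mild prior belief centered on p = 0.5
for n in [0, 10, 100, 500]:
    heads = flips[:n].sum()
    post = stats.beta(alpha + heads, beta + (n - heads))
    print(f"n = {n:3d}: posterior mean = {post.mean():.3f}")
```

As n grows, the posterior mean drifts from the prior's 0.5 toward the true bias 0.7: exactly the "likelihood dominates the prior" behavior described above.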
Practical Example: Naive Bayes Classifier
import numpy as np
# Dataset: email spam detection
# Features: count of words "free", "offer", "hello"
X_train = np.array([
    [3, 2, 0],  # spam
    [4, 3, 1],  # spam
    [0, 0, 3],  # not-spam
    [1, 0, 4],  # not-spam
    [5, 4, 0],  # spam
    [0, 1, 2],  # not-spam
])
y_train = np.array([1, 1, 0, 0, 1, 0]) # 1=spam, 0=not-spam
# Naive Bayes: P(spam|features) proportional to P(features|spam) * P(spam)
# Compute prior
n_spam = np.sum(y_train == 1)
n_ham = np.sum(y_train == 0)
p_spam = n_spam / len(y_train)
p_ham = n_ham / len(y_train)
print(f"P(spam) = {p_spam:.3f}, P(ham) = {p_ham:.3f}")
# Compute mean and variance per feature (Gaussian likelihood)
spam_mean = X_train[y_train == 1].mean(axis=0)
spam_var = X_train[y_train == 1].var(axis=0) + 1e-6
ham_mean = X_train[y_train == 0].mean(axis=0)
ham_var = X_train[y_train == 0].var(axis=0) + 1e-6
def gaussian_log_likelihood(x, mean, var):
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean)**2 / var)
# Classify new email
x_new = np.array([4, 3, 0])
log_p_spam = np.log(p_spam) + gaussian_log_likelihood(x_new, spam_mean, spam_var)
log_p_ham = np.log(p_ham) + gaussian_log_likelihood(x_new, ham_mean, ham_var)
print(f"Log P(spam|x) proportional to: {log_p_spam:.4f}")
print(f"Log P(ham|x) proportional to: {log_p_ham:.4f}")
print(f"Classification: {'SPAM' if log_p_spam > log_p_ham else 'HAM'}")
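The two log scores above are only proportional to the true posteriors; to recover normalized probabilities you can subtract their log-sum-exp before exponentiating, which is numerically stable. A small sketch with illustrative score values (not the ones computed above):

```python
import numpy as np

# Unnormalized log-posteriors for two classes (illustrative values)
log_scores = np.array([-12.3, -18.9])  # e.g. [spam score, ham score]

# Log-sum-exp trick: subtract the max before exponentiating for stability
m = log_scores.max()
log_norm = m + np.log(np.sum(np.exp(log_scores - m)))
posteriors = np.exp(log_scores - log_norm)

print(f"P(spam|x) ~ {posteriors[0]:.4f}, P(ham|x) ~ {posteriors[1]:.4f}")
```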
Maximum Likelihood Estimation (MLE)
MLE finds the model parameters that make the observed data most probable. Given a series of independent observations D = \\{x_1, \\ldots, x_n\\}, the likelihood is:
L(\\theta) = P(D | \\theta) = \\prod_{i=1}^{n} P(x_i | \\theta)
In practice, we work with the log-likelihood (it turns products into sums):
\\ell(\\theta) = \\log L(\\theta) = \\sum_{i=1}^{n} \\log P(x_i | \\theta)
To find the maximum, we compute the derivative and set it to zero: \\frac{d\\ell}{d\\theta} = 0.
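For a Bernoulli model this derivation has a closed form: setting \\frac{d\\ell}{dp} = 0 gives \\hat{p} equal to the sample mean. A quick numerical check, assuming simulated Bernoulli(0.3) data and a simple grid search over the log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(1)
x = (rng.random(1000) < 0.3).astype(float)  # Bernoulli(0.3) samples

# Closed form from setting the derivative to zero: p_hat = sample mean
p_closed = x.mean()

# Numerical check: maximize l(p) = sum(x log p + (1-x) log(1-p)) on a grid
grid = np.linspace(0.01, 0.99, 981)
ll = np.array([np.sum(x * np.log(p) + (1 - x) * np.log(1 - p)) for p in grid])
p_grid = grid[ll.argmax()]

print(f"closed form: {p_closed:.3f}, grid search: {p_grid:.3f}")
```

The grid maximizer lands on the grid point nearest the sample mean, confirming the calculus.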
Example: MLE for a Gaussian
For Gaussian data, the MLE parameter estimates are:
\\hat{\\mu} = \\frac{1}{n} \\sum_{i=1}^{n} x_i, \\qquad \\hat{\\sigma}^2 = \\frac{1}{n} \\sum_{i=1}^{n} (x_i - \\hat{\\mu})^2
The sample mean and variance are the MLE estimates. The connection to ML is deep: minimizing cross-entropy loss is equivalent to maximizing the log-likelihood.
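This equivalence is not just conceptual; for binary labels the two quantities are identical line by line. A minimal sketch with illustrative labels and predicted probabilities (e.g. sigmoid outputs):

```python
import numpy as np

y = np.array([1, 0, 1, 1, 0])            # binary labels
p = np.array([0.9, 0.2, 0.8, 0.6, 0.1])  # predicted P(y=1), e.g. sigmoid outputs

# Binary cross-entropy loss (mean over samples)
bce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Negative log-likelihood of the labels under Bernoulli(p), per sample
nll = -np.mean(np.log(np.where(y == 1, p, 1 - p)))

print(f"cross-entropy = {bce:.6f}, NLL = {nll:.6f}")
```

The two numbers agree exactly: minimizing cross-entropy is maximizing the Bernoulli log-likelihood.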
Maximum A Posteriori (MAP)
MAP adds a prior to the parameters, combining likelihood and prior:
\\hat{\\theta}_{\\text{MAP}} = \\arg\\max_\\theta \\left[ \\log P(D | \\theta) + \\log P(\\theta) \\right]
With a Gaussian prior P(\\theta) \\sim \\mathcal{N}(0, \\sigma_p^2), the term \\log P(\\theta) becomes an L2 penalty on the weights: this is why L2 regularization (Ridge) is equivalent to a Gaussian prior on weights.
import numpy as np
from scipy.optimize import minimize_scalar
# Observed data (heights in cm)
data = np.array([168, 172, 175, 170, 173, 169, 171, 174, 176, 170])
# MLE: sample mean and variance
mu_mle = np.mean(data)
sigma2_mle = np.var(data)
print(f"MLE: mu={mu_mle:.2f}, sigma^2={sigma2_mle:.2f}")
# Log-likelihood function
def neg_log_likelihood(mu, data=data, sigma2=sigma2_mle):
    n = len(data)
    return 0.5 * n * np.log(2 * np.pi * sigma2) + np.sum((data - mu)**2) / (2 * sigma2)
# MAP with Gaussian prior: mu ~ N(170, 5^2)
prior_mu = 170
prior_sigma2 = 25
def neg_log_posterior(mu):
    nll = neg_log_likelihood(mu)
    neg_log_prior = (mu - prior_mu)**2 / (2 * prior_sigma2)
    return nll + neg_log_prior
result_mle = minimize_scalar(neg_log_likelihood, bounds=(150, 190), method='bounded')
result_map = minimize_scalar(neg_log_posterior, bounds=(150, 190), method='bounded')
print(f"MLE mu: {result_mle.x:.4f}")
print(f"MAP mu: {result_map.x:.4f} (shrunk toward prior {prior_mu})")
Central Limit Theorem
The Central Limit Theorem (CLT) states that the sum (or average) of many independent random variables, regardless of their original distribution, tends to a Gaussian distribution:
\\frac{\\bar{X}_n - \\mu}{\\sigma / \\sqrt{n}} \\xrightarrow{d} \\mathcal{N}(0, 1) \\quad \\text{as } n \\to \\infty
This explains why the Gaussian appears everywhere: a trained neural network weight is the sum of many small gradient updates, sensor noise is the sum of many small perturbations, and so on.
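The CLT is easy to see empirically. A sketch, assuming averages of n = 50 uniform variables (a decidedly non-Gaussian starting distribution with \\mu = 0.5 and \\sigma^2 = 1/12):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 50
# 10,000 independent averages of n uniform(0, 1) variables
means = rng.random((10_000, n)).mean(axis=1)

# Standardize: (X_bar - mu) / (sigma / sqrt(n)) with mu = 0.5, sigma = sqrt(1/12)
z = (means - 0.5) / (np.sqrt(1 / 12) / np.sqrt(n))
print(f"mean ~ {z.mean():.3f} (expect 0), std ~ {z.std():.3f} (expect 1)")

# Normality check: skewness and excess kurtosis should both be near 0
print(f"skew ~ {stats.skew(z):.3f}, kurtosis ~ {stats.kurtosis(z):.3f}")
```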
Summary and Connections to ML
Key Takeaways
- Bayes: P(\\theta|D) \\propto P(D|\\theta) P(\\theta) - updates beliefs with data
- Gaussian: the most common distribution, appears everywhere thanks to CLT
- MLE: finds parameters that maximize the probability of observed data
- MAP: MLE + prior, equivalent to regularization
- Cross-entropy loss = negative log-likelihood: the fundamental connection
- L2 regularization = Gaussian prior: the probabilistic connection
In the Next Article: we will explore optimization for ML. We will cover Gradient Descent, SGD, Adam, momentum, and learning rate scheduling strategies that determine whether a model converges or diverges.