Introduction: How Neural Networks Learn
If linear algebra is the language of machine learning, differential calculus is its learning engine. Every time a model improves its predictions, it does so through a process called gradient descent, which relies entirely on derivatives and gradients. Without calculus, neural networks could not learn.
In this article, we will see how partial derivatives tell us which direction to adjust weights, how the chain rule makes backpropagation possible, and how everything is implemented in practice with NumPy.
What You Will Learn
- Derivatives: the concept of rate of change
- Partial derivatives and the gradient vector
- Chain rule: how to compose derivatives (the heart of backpropagation)
- Computational graphs: forward and backward pass
- Jacobian and Hessian: higher-order information
- Manual backpropagation implementation in NumPy
Derivatives: The Rate of Change
The derivative of a function f(x) at a point tells us how quickly the function value changes when x changes by an infinitesimal amount:

$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$
Intuition: the derivative is the slope of the function at a point. If positive, the function is increasing; if negative, decreasing; if zero, we are at a stationary point (minimum, maximum, or saddle point).
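This slope intuition is easy to verify numerically. A minimal sketch (the test function $f(x) = x^2$ and the step size are illustrative choices): approximating the derivative with a small symmetric difference recovers the expected slopes, including the zero slope at the stationary point.

```python
# Approximate the slope of f(x) = x^2 at a few points with a
# symmetric finite difference: (f(x+h) - f(x-h)) / (2h).
def f(x):
    return x ** 2

def slope(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

print(slope(f, 2.0))   # close to 4: the function is increasing here
print(slope(f, -3.0))  # close to -6: the function is decreasing here
print(slope(f, 0.0))   # close to 0: stationary point (here, the minimum)
```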
Derivatives of common activation functions in deep learning:

- Sigmoid: $\sigma'(x) = \sigma(x)(1 - \sigma(x))$
- Tanh: $\tanh'(x) = 1 - \tanh^2(x)$
- ReLU: $\text{ReLU}'(x) = 1$ for $x > 0$, $0$ for $x < 0$
Why This Matters: the sigmoid derivative has a maximum of 0.25 (when x = 0). This means that at each layer the gradient is multiplied by a factor of at most 0.25, causing the famous vanishing gradient problem in deep networks. That is why ReLU (derivative = 1 for x > 0) is preferred.
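The shrinkage is easy to see numerically. A minimal sketch (the layer counts are illustrative): even in the best case, multiplying the maximum sigmoid derivative across layers collapses the gradient, while a chain of ReLU factors of 1 leaves it unchanged.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_deriv(x):
    s = sigmoid(x)
    return s * (1 - s)

print(sigmoid_deriv(0.0))  # 0.25, the maximum of the sigmoid derivative

# Best case: the gradient factor after passing through n sigmoid layers
for n in [5, 10, 20]:
    print(n, 0.25 ** n)
# After 10 layers the factor is already below 1e-6;
# with ReLU (derivative 1 for x > 0) it would stay 1.
```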
Partial Derivatives and the Gradient
When the function depends on multiple variables (like a loss function depending on all weights), we compute partial derivatives: the derivative with respect to each variable, keeping the others fixed.
For a function $f(x_1, x_2, \ldots, x_n)$, the gradient is the vector of all partial derivatives:

$$\nabla f = \left[ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right]$$
Crucial intuition: the gradient points in the direction of steepest ascent of the function. To minimize the loss, we move in the opposite direction:

$$\theta \leftarrow \theta - \eta \nabla_\theta L(\theta)$$

where $\eta$ is the learning rate and $L(\theta)$ the loss function. This is the fundamental formula of gradient descent.
```python
import numpy as np

# Example: f(x, y) = x^2 + x*y + y^2 (convex: its only minimum is at the origin)
# Gradient: [2x + y, x + 2y]
def f(x, y):
    return x**2 + x*y + y**2

def gradient_f(x, y):
    df_dx = 2*x + y
    df_dy = x + 2*y
    return np.array([df_dx, df_dy])

# Starting point
x, y = 3.0, 2.0
print(f"f({x}, {y}) = {f(x, y)}")
print(f"Gradient: {gradient_f(x, y)}")

# Gradient descent: step against the gradient
lr = 0.1
for step in range(20):
    grad = gradient_f(x, y)
    x -= lr * grad[0]
    y -= lr * grad[1]
    if step % 5 == 0:
        print(f"Step {step}: x={x:.4f}, y={y:.4f}, f={f(x, y):.6f}")
```
The Chain Rule: The Heart of Backpropagation
The chain rule is the mathematical principle that makes training deep neural networks possible. If we have composed functions $y = f(g(x))$, the derivative is:

$$\frac{dy}{dx} = \frac{df}{dg} \cdot \frac{dg}{dx}$$

With multiple composed functions $y = f_1(f_2(f_3(x)))$:

$$\frac{dy}{dx} = \frac{df_1}{df_2} \cdot \frac{df_2}{df_3} \cdot \frac{df_3}{dx}$$
A neural network is exactly a composition of functions: each layer applies a linear transformation followed by a non-linear activation. The chain rule allows us to compute how the loss changes with respect to every weight, traversing all layers in reverse order.
Example: Backpropagation on a Single Neuron
Consider a single neuron with MSE loss:

$$\hat{y} = \sigma(wx + b), \qquad L = (y - \hat{y})^2$$

The gradient with respect to $w$ via the chain rule:

$$\frac{\partial L}{\partial w} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} \cdot \frac{\partial z}{\partial w} = 2(\hat{y} - y) \cdot \sigma'(z) \cdot x$$

where $z = wx + b$. Each term in the chain has a precise meaning: the error, the activation sensitivity, and the input.
```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_deriv(x):
    s = sigmoid(x)
    return s * (1 - s)

# Single neuron: forward and backward pass
x = 2.0   # input
y = 1.0   # target
w = 0.5   # weight
b = 0.1   # bias
lr = 0.1

for epoch in range(50):
    # Forward pass
    z = w * x + b
    y_hat = sigmoid(z)
    loss = (y - y_hat) ** 2

    # Backward pass (chain rule)
    dL_dyhat = 2 * (y_hat - y)   # dL/d(y_hat)
    dyhat_dz = sigmoid_deriv(z)  # d(y_hat)/dz
    dz_dw = x                    # dz/dw
    dz_db = 1.0                  # dz/db
    dL_dw = dL_dyhat * dyhat_dz * dz_dw  # Full chain rule
    dL_db = dL_dyhat * dyhat_dz * dz_db

    # Update weights
    w -= lr * dL_dw
    b -= lr * dL_db

    if epoch % 10 == 0:
        print(f"Epoch {epoch}: loss={loss:.6f}, w={w:.4f}, b={b:.4f}")
```
Computational Graphs: Visualizing Forward and Backward
A computational graph represents a function as a directed graph of elementary operations. Each node performs a simple operation (addition, multiplication, activation), and during the backward pass the gradient flows through the graph in reverse order thanks to the chain rule.
Consider $L = (\sigma(w_1 x_1 + w_2 x_2 + b) - y)^2$:
- Forward: $z_1 = w_1 x_1$, $z_2 = w_2 x_2$, $s = z_1 + z_2 + b$, $a = \sigma(s)$, $L = (a - y)^2$
- Backward: compute $\frac{\partial L}{\partial a}$, then $\frac{\partial L}{\partial s}$, then $\frac{\partial L}{\partial w_1}$ and $\frac{\partial L}{\partial w_2}$
This is exactly what PyTorch and TensorFlow do automatically with automatic differentiation.
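The two-input graph above can be traced node by node in plain NumPy. A minimal sketch (the concrete values of the weights, inputs, and target are illustrative choices), mirroring the forward and backward steps listed above:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Illustrative values for the graph L = (sigmoid(w1*x1 + w2*x2 + b) - y)^2
w1, w2, b = 0.3, -0.2, 0.1
x1, x2, y = 1.0, 2.0, 1.0

# Forward pass: one node at a time
z1 = w1 * x1
z2 = w2 * x2
s = z1 + z2 + b
a = sigmoid(s)
L = (a - y) ** 2

# Backward pass: the gradient flows through the graph in reverse
dL_da = 2 * (a - y)      # loss node
da_ds = a * (1 - a)      # sigmoid node
dL_ds = dL_da * da_ds
dL_dw1 = dL_ds * x1      # multiply node: d(w1*x1)/dw1 = x1
dL_dw2 = dL_ds * x2
dL_db = dL_ds * 1.0      # add node passes the gradient through unchanged

print(dL_dw1, dL_dw2, dL_db)  # each ≈ -0.25, -0.5, -0.25 for these values
```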
Jacobian and Hessian
The Jacobian generalizes the gradient to vector-valued functions. If $\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^m$, the Jacobian is an $m \times n$ matrix:

$$J_{ij} = \frac{\partial f_i}{\partial x_j}$$

The Hessian is the matrix of second derivatives, giving us information about the curvature of the loss function:

$$H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}$$
The eigenvalues of the Hessian determine whether a critical point is a minimum (all positive), maximum (all negative), or saddle point (mixed). In neural network optimization, saddle points are much more common than local minima.
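The eigenvalue test is straightforward to check numerically. A minimal sketch (the two quadratic test functions are illustrative choices): at the origin, $f(x, y) = x^2 + y^2$ has a positive-definite Hessian (minimum), while $f(x, y) = x^2 - y^2$ has mixed-sign eigenvalues (saddle).

```python
import numpy as np

# Hessians at the origin of two quadratic test functions
H_min = np.array([[2.0, 0.0], [0.0, 2.0]])      # f = x^2 + y^2
H_saddle = np.array([[2.0, 0.0], [0.0, -2.0]])  # f = x^2 - y^2

for label, H in [("x^2 + y^2", H_min), ("x^2 - y^2", H_saddle)]:
    eig = np.linalg.eigvalsh(H)  # eigenvalues of a symmetric matrix
    if np.all(eig > 0):
        kind = "minimum"
    elif np.all(eig < 0):
        kind = "maximum"
    else:
        kind = "saddle point"
    print(label, eig, "->", kind)
```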
Full Backpropagation: 2-Layer Network
```python
import numpy as np

np.random.seed(42)

# XOR dataset (not linearly separable)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

# Weight initialization
W1 = np.random.randn(2, 4) * 0.5  # (2 inputs, 4 hidden)
b1 = np.zeros((1, 4))
W2 = np.random.randn(4, 1) * 0.5  # (4 hidden, 1 output)
b2 = np.zeros((1, 1))

def sigmoid(x):
    return 1 / (1 + np.exp(-np.clip(x, -500, 500)))

lr = 1.0
for epoch in range(10000):
    # === FORWARD PASS ===
    z1 = X @ W1 + b1   # (4, 2) @ (2, 4) = (4, 4)
    a1 = sigmoid(z1)   # Hidden activation
    z2 = a1 @ W2 + b2  # (4, 4) @ (4, 1) = (4, 1)
    a2 = sigmoid(z2)   # Output

    # Loss: MSE
    loss = np.mean((y - a2) ** 2)

    # === BACKWARD PASS (Chain Rule) ===
    m = X.shape[0]
    # Output layer gradient
    dL_da2 = 2 * (a2 - y) / m
    da2_dz2 = a2 * (1 - a2)      # Sigmoid derivative
    dz2 = dL_da2 * da2_dz2       # (4, 1)
    dW2 = a1.T @ dz2             # (4, 4).T @ (4, 1) = (4, 1)
    db2 = np.sum(dz2, axis=0, keepdims=True)

    # Hidden layer gradient (chain rule continues!)
    da1 = dz2 @ W2.T             # (4, 1) @ (1, 4) = (4, 4)
    dz1 = da1 * (a1 * (1 - a1))  # Sigmoid derivative
    dW1 = X.T @ dz1              # (2, 4).T @ (4, 4) = (2, 4)
    db1 = np.sum(dz1, axis=0, keepdims=True)

    # === WEIGHT UPDATE ===
    W2 -= lr * dW2
    b2 -= lr * db2
    W1 -= lr * dW1
    b1 -= lr * db1

    if epoch % 2000 == 0:
        print(f"Epoch {epoch}: Loss = {loss:.6f}")

# Final result
predictions = np.round(a2, 2)
print(f"\nFinal predictions:\n{predictions.flatten()}")
print(f"Targets: {y.flatten()}")
```
Gradient Checking: Verifying Gradients
To ensure backpropagation is correctly implemented, we can compare analytical gradients with numerical ones computed via finite differences:

$$\frac{\partial L}{\partial \theta_i} \approx \frac{L(\theta_i + \epsilon) - L(\theta_i - \epsilon)}{2\epsilon}$$

with $\epsilon \approx 10^{-7}$. The relative difference between analytical and numerical gradients should be less than $10^{-5}$.
```python
import numpy as np

def numerical_gradient(f, param, idx, epsilon=1e-7):
    """Compute the numerical gradient of f with respect to param[idx]
    via central differences, for verification."""
    original = param[idx]
    param[idx] = original + epsilon  # perturb in place so f sees the change
    loss_plus = f()
    param[idx] = original - epsilon
    loss_minus = f()
    param[idx] = original            # restore the original value
    return (loss_plus - loss_minus) / (2 * epsilon)

# Simple example: f = (w*x - y)^2
w = np.array([0.5])
x, y_true = 2.0, 3.0

def compute_loss():
    return (w[0] * x - y_true) ** 2

# Analytical gradient: dL/dw = 2*(w*x - y)*x
grad_analytical = 2 * (w[0] * x - y_true) * x

# Numerical gradient
grad_numerical = numerical_gradient(compute_loss, w, 0)

print(f"Analytical: {grad_analytical:.8f}")
print(f"Numerical: {grad_numerical:.8f}")
print(f"Rel diff: {abs(grad_analytical - grad_numerical) / max(abs(grad_analytical), 1e-8):.2e}")
```
Summary and Connections to ML
Key Takeaways
- Derivative: measures rate of change, indicates the function slope
- Gradient $\nabla L$: points in the direction of steepest ascent of the loss
- Gradient descent: $\theta \leftarrow \theta - \eta \nabla L$ (move opposite to the gradient)
- Chain rule: allows computing gradients through function compositions
- Backpropagation: applying the chain rule to the network's computational graph
- Vanishing gradient: the sigmoid derivative is at most 0.25, so gradients shrink layer by layer; ReLU (derivative 1 for x > 0) avoids this
In the Next Article: we will explore probability and statistics for ML. We will cover Bayes' theorem, distributions, Maximum Likelihood Estimation, and how to quantify uncertainty in predictions.