Introduction: Disassembling the Magic of LLMs
Large Language Models seem magical: you type a question and get a coherent, structured, often surprisingly intelligent response. But under the hood there is no magic, only mathematics. An LLM is fundamentally a system that predicts the next token in a sequence based on statistical patterns learned from billions of words during training.
Understanding how LLMs work is not an academic exercise: it's an essential practical skill. Knowing what happens between your prompt and the generated response allows you to write better prompts, debug unexpected behaviors, choose the right model for your use case, and understand why LLMs hallucinate.
What You'll Learn in This Article
- How text is transformed into numbers through tokenization
- The role of embeddings in representing semantic meaning
- How the attention mechanism works in Transformers
- The text generation process: from logits to tokens
- Sampling strategies: temperature, top-k, and top-p
- Why LLMs hallucinate and the role of the context window
Phase 1: Tokenization - From Text to Numbers
The first step in LLM processing is tokenization: converting text into a sequence of integers. Neural models don't understand letters or words; they operate on numerical vectors. Tokenization is the bridge between human language and model mathematics.
Byte-Pair Encoding (BPE)
The most common algorithm is Byte-Pair Encoding (BPE), used by GPT, Claude, and most modern models. BPE works iteratively: it starts from individual characters and progressively merges the most frequent pairs into longer tokens.
The result is a vocabulary of 50,000-100,000 tokens representing an optimal compromise between granularity and efficiency. Common words like "the" become a single token, while rare words are split into sub-tokens.
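The merge loop described above can be sketched in a few lines of Python. This is a toy illustration of a single BPE iteration, not a production tokenizer: real implementations also record the learned merge table and typically operate on bytes rather than characters.

```python
from collections import Counter

def bpe_merge_step(corpus):
    """One BPE iteration: find the most frequent adjacent pair and merge it.

    `corpus` is a list of token sequences (initially individual characters).
    """
    pairs = Counter()
    for seq in corpus:
        for a, b in zip(seq, seq[1:]):
            pairs[(a, b)] += 1
    if not pairs:
        return corpus, None
    best = max(pairs, key=pairs.get)  # most frequent adjacent pair
    merged = []
    for seq in corpus:
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == best:
                out.append(seq[i] + seq[i + 1])  # merge the pair into one token
                i += 2
            else:
                out.append(seq[i])
                i += 1
        merged.append(out)
    return merged, best

# Start from individual characters and run a few merges
corpus = [list("lower"), list("lowest"), list("low")]
for _ in range(3):
    corpus, pair = bpe_merge_step(corpus)
    print(f"merged {pair} -> {corpus}")
```

After a few iterations, frequent character sequences like "low" fuse into single tokens, while rarer suffixes ("r", "st") remain separate — exactly the common-word/rare-word split described above.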
```python
# Example of tokenization with tiktoken (OpenAI's tokenizer)
import tiktoken

# Load GPT-4's tokenizer
enc = tiktoken.encoding_for_model("gpt-4")

# Tokenize a sentence
text = "Generative artificial intelligence is revolutionary"
tokens = enc.encode(text)

print(f"Text: {text}")
print(f"Token IDs: {tokens}")
print(f"Token count: {len(tokens)}")

# Decode each token to see the breakdown
for token_id in tokens:
    print(f"  ID {token_id} -> '{enc.decode([token_id])}'")

# Typical output (the IDs are integers; decoding shows each token's text):
# Text: Generative artificial intelligence is revolutionary
# Breakdown: 'Gen' + 'erative' + ' artificial' + ' intelligence' + ' is' + ' revolutionary'
# Token count: 6
```
Practical Impact of Tokenization
Tokenization has important practical consequences every developer should know:
- Cost: APIs charge per token, not per word. One word can be 1-4 tokens
- Context window: the limit is in tokens, not words. 4,000 tokens is roughly 3,000 English words
- Different languages: non-English languages often require more tokens to express the same concept (~1.3x for Italian)
- Code: source code is often less token-efficient than natural text
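The cost point can be made concrete with a back-of-the-envelope calculation. The per-token prices below are hypothetical placeholders — check your provider's current price list:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Estimate an API call's cost; prices are per 1,000 tokens."""
    return (input_tokens / 1000 * price_in_per_1k
            + output_tokens / 1000 * price_out_per_1k)

# Hypothetical prices: $0.003 / 1K input tokens, $0.015 / 1K output tokens
cost = estimate_cost(input_tokens=2_000, output_tokens=500,
                     price_in_per_1k=0.003, price_out_per_1k=0.015)
print(f"${cost:.4f}")  # 0.006 for input + 0.0075 for output = $0.0135
```

Note that output tokens are typically priced several times higher than input tokens, so verbose responses dominate the bill.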
Phase 2: Embeddings - From Token to Meaning
After tokenization, each token ID is converted into an embedding: a dense vector of real numbers (typically 768-12,288 dimensions) that captures the semantic meaning of the token.
The power of embeddings lies in their geometry: words with similar meanings have nearby vectors in the space. "King" and "Queen" are close, as are "Paris" and "France". These relationships are automatically learned during training.
Embeddings: Numbers with Meaning
An embedding is not a simple numeric ID: it's a high-dimensional vector where each dimension captures an aspect of meaning. Arithmetic operations on embeddings produce semantically sensible results: vec("king") - vec("man") + vec("woman") ≈ vec("queen").
```python
# Visualizing semantic similarity of embeddings
from openai import OpenAI
import numpy as np

client = OpenAI()

def get_embedding(text: str) -> list:
    """Get the embedding of a text using OpenAI."""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

def cosine_similarity(a: list, b: list) -> float:
    """Calculate cosine similarity between two vectors."""
    a, b = np.array(a), np.array(b)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Compare semantic similarities
words = ["cat", "dog", "automobile", "feline"]
embeddings = {w: get_embedding(w) for w in words}

print("Semantic similarities:")
print(f"  cat-feline:     {cosine_similarity(embeddings['cat'], embeddings['feline']):.4f}")
print(f"  cat-dog:        {cosine_similarity(embeddings['cat'], embeddings['dog']):.4f}")
print(f"  cat-automobile: {cosine_similarity(embeddings['cat'], embeddings['automobile']):.4f}")
# cat-feline should have the highest similarity
```
Positional Encoding
Beyond semantic embeddings, Transformers add a positional encoding: a signal indicating the position of each token in the sequence. Without this mechanism, the model wouldn't distinguish "the cat chases the mouse" from "the mouse chases the cat", since the Transformer architecture processes all tokens in parallel, not sequentially.
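As an illustration, here is the sinusoidal positional encoding from the original "Attention Is All You Need" paper. Many recent LLMs use learned or rotary positional embeddings instead, but the idea is the same: inject each token's position into its vector.

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encoding (assumes an even d_model):
    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    """
    pos = np.arange(seq_len)[:, None]             # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]          # (1, d_model/2)
    angles = pos / (10000 ** (2 * i / d_model))   # (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                  # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)                  # odd dimensions: cosine
    return pe

pe = positional_encoding(seq_len=50, d_model=64)
print(pe.shape)   # (50, 64) — one vector per position, added to the token embedding
```

Each position gets a unique pattern of sines and cosines at different frequencies, so the model can tell "cat" in position 1 apart from "cat" in position 5.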
Phase 3: The Transformer - Attention Is All You Need
The heart of every modern LLM is the Transformer architecture, composed of repeated blocks of Self-Attention and Feed-Forward Networks. Large models stack on the order of a hundred of these blocks, each refining the text representation.
Self-Attention: The Key Mechanism
Self-attention allows each token to "look at" all other tokens in the sequence and decide how relevant they are to its meaning in the current context. In the sentence "The cat sat on the mat because it was tired", the attention mechanism links "it" back to "cat" (not "mat"), resolving the coreference.
Mathematically, for each token, three vectors are computed: Query (what am I looking for), Key (what I offer as context), and Value (my informational content). The dot product between Query and Key determines the attention weight, which is used to weigh the Values.
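This computation can be sketched in a few lines of numpy. It is a single attention head with random weights for illustration; real decoder LLMs also apply a causal mask so a token cannot attend to future positions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv     # project each token into Query/Key/Value
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))   # (seq, seq): how much each token attends to each other
    return weights @ V, weights                 # weighted sum of Values

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))          # token embeddings (+ positions)
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))

out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape)             # (4, 8): one refined vector per token
print(weights.sum(axis=-1))  # each row of attention weights sums to 1
```

The division by sqrt(d_k) keeps the dot products from growing with dimension, which would otherwise push the softmax into near-one-hot saturation.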
Multi-Head Attention
A single attention mechanism captures one type of relationship. Multi-head attention runs multiple attention operations in parallel (typically 32-128 "heads"), each specialized in a different aspect: syntactic, semantic, proximity, coreference relationships, and so on.
Anatomy of a Transformer Block
Each Transformer block follows this structure: Layer Norm to stabilize input, Multi-Head Self-Attention to capture relationships between tokens, Residual Connection to preserve original information, a second Layer Norm, and a Feed-Forward Network (2 dense layers) to transform the representation, followed by another Residual Connection. GPT-4 is reported to stack on the order of 120 such blocks, though its exact architecture has not been officially disclosed.
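The data flow of such a (pre-norm) block can be sketched with placeholder sublayers. The `attn` and `ffn` callables below are hypothetical stand-ins, not real trained layers; the point is the normalize → sublayer → residual pattern.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each token vector to zero mean and unit variance."""
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def transformer_block(x, attn, ffn):
    """One pre-norm Transformer block: norm -> sublayer -> residual, twice."""
    x = x + attn(layer_norm(x))   # self-attention sublayer + residual connection
    x = x + ffn(layer_norm(x))    # feed-forward sublayer + residual connection
    return x

# Stub sublayers, just to show shapes flowing through
d = 16
W1, W2 = np.eye(d) * 0.1, np.eye(d) * 0.1
attn = lambda h: h * 0.5                     # placeholder for multi-head attention
ffn = lambda h: np.maximum(h @ W1, 0) @ W2   # 2 dense layers with a ReLU between

x = np.random.default_rng(1).normal(size=(4, d))  # 4 tokens, 16 dimensions
y = transformer_block(x, attn, ffn)
print(y.shape)  # (4, 16): same shape in and out, so blocks can be stacked
```

Because input and output shapes match, a model is built simply by stacking this block dozens of times.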
Phase 4: Text Generation
After the input text has passed through all Transformer blocks, the last layer produces an output vector for each position. To generate the next token, this vector is projected onto the entire vocabulary producing logits: a numerical score for every possible token in the vocabulary.
From Logits to Probabilities: Softmax
Logits are transformed into probabilities through the softmax function, which normalizes the scores so they sum to 1. The token with the highest probability is the model's "best prediction", but it's not always the one selected.
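A minimal softmax sketch (the maximum is subtracted before exponentiating for numerical stability, as real implementations do); the four logits stand in for a hypothetical four-token vocabulary:

```python
import numpy as np

def softmax(logits):
    z = np.asarray(logits, dtype=float)
    e = np.exp(z - z.max())   # shift by the max to avoid overflow
    return e / e.sum()

logits = np.array([4.0, 2.0, 1.0, 0.5])   # raw scores over a toy vocabulary
probs = softmax(logits)
print(probs.round(3))   # highest logit -> highest probability
print(probs.sum())      # probabilities sum to 1
```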
Sampling Strategies
The choice of the next token is not deterministic. Different sampling strategies produce outputs with different characteristics:
- Greedy decoding: always picks the most probable token. Deterministic but often repetitive and boring
- Random sampling: samples from the full distribution. Creative but potentially incoherent
- Temperature: controls "randomness". T=0 is greedy, T=1 is the original distribution, T>1 increases creativity
- Top-k sampling: samples only from the k most probable tokens (e.g., k=40)
- Top-p (nucleus) sampling: samples from the smallest set of tokens whose cumulative probability exceeds p (e.g., p=0.9)
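The strategies above can be combined into one sampling function. This is a simplified numpy sketch, not what production inference engines do verbatim (for instance, they typically renormalize between the top-k and top-p filters):

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Pick a token id from logits using temperature, top-k, and top-p filtering."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float)
    if temperature == 0:                        # greedy decoding: just the argmax
        return int(logits.argmax())
    scaled = logits / temperature               # temperature reshapes the distribution
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    if top_k is not None:                       # keep only the k most probable tokens
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)
    if top_p is not None:                       # nucleus: smallest set with cumulative prob >= p
        order = np.argsort(probs)[::-1]
        cum = np.cumsum(probs[order])
        keep = order[: int(np.searchsorted(cum, top_p)) + 1]
        mask = np.zeros_like(probs)
        mask[keep] = probs[keep]
        probs = mask
    probs /= probs.sum()                        # renormalize after filtering
    return int(rng.choice(len(probs), p=probs))

logits = np.array([3.0, 2.5, 1.0, -1.0, -2.0])  # toy 5-token vocabulary
print(sample_next_token(logits, temperature=0))            # always the argmax
print(sample_next_token(logits, temperature=0.7, top_k=3)) # one of the top 3 tokens
```

Note how greedy decoding is just the limit of temperature → 0, and top-k/top-p act as filters before the final random draw.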
```python
# Example: effect of temperature on generation
from anthropic import Anthropic

client = Anthropic()
prompt = "Write the beginning of a fantasy story in one line:"

# Note: the Anthropic API accepts temperature values between 0.0 and 1.0
for temp in [0.0, 0.3, 0.7, 1.0]:
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=100,
        temperature=temp,
        messages=[{"role": "user", "content": prompt}]
    )
    print(f"\nTemperature {temp}:")
    print(f"  {response.content[0].text}")

# Temperature 0.0: near-deterministic output, usually the same every time
# Temperature 0.3: slight variation, still very coherent
# Temperature 0.7: creative, good balance
# Temperature 1.0: most creative, occasionally less coherent
```
Context Window: The Memory of LLMs
The context window is the maximum number of tokens an LLM can process in a single request (input + output). This is effectively the model's "working memory" during a conversation.
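Since the window covers input and output together, a request must budget for both. A trivial check, with hypothetical numbers:

```python
def fits_context(input_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """The context window covers input AND output: both must fit together."""
    return input_tokens + max_output_tokens <= context_window

# Hypothetical budget against an 8,192-token window
print(fits_context(7_000, 1_000, 8_192))   # fits: 8,000 <= 8,192
print(fits_context(7_000, 2_000, 8_192))   # does not fit: 9,000 > 8,192
```

In long conversations, the history itself counts as input, which is why chat applications eventually truncate or summarize earlier turns.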
Context Window by Model
| Model | Context Window | Approximate Equivalent |
|---|---|---|
| GPT-3.5 Turbo | 4,096 / 16,384 tokens | ~3,000 / 12,000 words |
| GPT-4 / GPT-4 Turbo | 8,192 / 128,000 tokens | ~6,000 / 96,000 words |
| Claude 3.5 Sonnet | 200,000 tokens | ~150,000 words |
| Gemini 1.5 Pro | 1,000,000 tokens | ~750,000 words |
| Llama 3.1 | 128,000 tokens | ~96,000 words |
Hallucinations: Why LLMs Make Things Up
Hallucinations are one of the most critical problems with LLMs: the model generates false information with the same confidence it generates true information. This happens because an LLM doesn't "know" facts: it predicts the most probable next token given the context.
If the statistical pattern suggests that after "The capital of Australia is" the most likely token is "Sydney", the model will generate "Sydney" even though the correct answer is "Canberra". The model has no internal mechanism to verify the truth of its outputs.
Mitigation: Retrieval-Augmented Generation (RAG)
The most effective strategy to reduce hallucinations is RAG: providing the model with factual information retrieved from reliable sources as part of the context. Instead of asking the model to "remember", we give it updated data and ask it to reason about it.
```python
# Simplified RAG: providing factual context to the model
from anthropic import Anthropic

client = Anthropic()

# Context retrieved from a database or search engine
factual_context = """
Company data updated to Q3 2025:
- Revenue: EUR 12.5M (+23% YoY)
- Employees: 85
- Active clients: 342
- NPS score: 72
"""

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=500,
    messages=[{
        "role": "user",
        "content": f"""Based EXCLUSIVELY on the following data, answer the question.
If the data doesn't contain the answer, say "I don't have this information".

DATA:
{factual_context}

QUESTION: What is the current revenue and how many employees do we have?"""
    }]
)

print(response.content[0].text)
```
Conclusions
Understanding the inner workings of LLMs - from tokenization to generation, through embeddings, attention, and sampling - is not just theoretical knowledge. It's the foundation for using these tools effectively and consciously.
Tokenization influences costs and context limits. Temperature and sampling strategies determine output creativity. The attention mechanism explains why the model understands (or doesn't understand) context. Hallucinations are a direct consequence of the next-token-prediction architecture.
In the next article, we'll put this knowledge into practice with Advanced Prompt Engineering: systematic techniques to get the most from LLMs, from zero-shot to chain-of-thought, from system prompts to the ReAct pattern.