Introduction: When Prompt Engineering Isn't Enough
Prompt engineering is powerful, but it has limits. When you need a model to consistently follow a specific style, respond in a proprietary format, or excel in a niche domain, fine-tuning becomes the solution. Fine-tuning adapts the model's weights to your data, creating a specialized version that "thinks" the way your domain requires.
But full fine-tuning of a model with billions of parameters is prohibitively expensive. Parameter-efficient techniques such as LoRA and QLoRA, part of the broader PEFT family, have changed this: they make it possible to adapt even a 70-billion-parameter model on a single GPU while modifying less than 0.1% of the total weights.
What You'll Learn in This Article
- The difference between full fine-tuning and parameter-efficient techniques
- How LoRA (Low-Rank Adaptation) works and why it's so efficient
- QLoRA: combining quantization and LoRA for limited GPUs
- Dataset preparation for fine-tuning
- Practical implementation with Hugging Face and PEFT
- When fine-tuning is better than prompt engineering
Full Fine-Tuning vs Parameter-Efficient Fine-Tuning
In full fine-tuning, all model parameters are updated during training. For a model like Llama 3 70B, this means updating 70 billion weights, requiring hundreds of GB of GPU memory and incurring significant hardware costs.
PEFT (Parameter-Efficient Fine-Tuning) techniques solve this problem by updating only a small fraction of parameters, achieving results comparable to full fine-tuning with a fraction of the resources.
Resource Comparison: Full vs LoRA vs QLoRA
| Characteristic | Full Fine-Tuning | LoRA | QLoRA |
|---|---|---|---|
| Parameters updated | 100% | 0.1-1% | 0.1-1% |
| GPU RAM (7B model) | ~60 GB | ~16 GB | ~6 GB |
| GPU RAM (70B model) | ~500 GB | ~160 GB | ~48 GB |
| Result quality | Best | ~95-98% of full | ~93-97% of full |
| Training time | Hours/Days | Minutes/Hours | Minutes/Hours |
| Estimated cost (7B) | $50-200 | $5-20 | $2-10 |
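The gap between these columns can be checked with back-of-envelope arithmetic. The sketch below assumes fp16 weights and a naive Adam setup (roughly 16 bytes of memory per trainable parameter); the two helpers are illustrative, and memory-saving optimizers, gradient checkpointing, and sharding bring real-world numbers down toward the table's estimates.

```python
# Back-of-envelope GPU memory estimates. Rough rules of thumb,
# not measurements; actual usage varies with optimizer, sharding,
# and activation memory.

def full_ft_memory_gb(n_params: float) -> float:
    # fp16 weights (2 B) + fp16 grads (2 B) + fp32 master copy (4 B)
    # + Adam moment estimates (8 B) = ~16 bytes per parameter
    return n_params * 16 / 1e9

def lora_memory_gb(n_params: float, trainable_frac: float = 0.01) -> float:
    # Frozen fp16 base weights (2 B/param) + full optimizer state
    # only for the small trainable fraction
    return n_params * 2 / 1e9 + n_params * trainable_frac * 16 / 1e9

print(f"Full fine-tuning, 7B: ~{full_ft_memory_gb(7e9):.0f} GB")  # ~112 GB
print(f"LoRA, 7B:             ~{lora_memory_gb(7e9):.0f} GB")     # ~15 GB
```

The LoRA estimate lands close to the ~16 GB in the table; the naive full fine-tuning figure is higher than the table's ~60 GB precisely because practical setups use memory optimizations.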
LoRA: Low-Rank Adaptation
LoRA (Low-Rank Adaptation) is the most popular PEFT technique, based on an elegant mathematical insight: the weight changes that fine-tuning induces have low intrinsic rank. Instead of updating a full weight matrix W (dimension d × d), LoRA learns the update as a product ΔW = B·A of two small matrices, B (d × r) and A (r × d), where the rank r is much smaller than d.
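The savings are easy to quantify. For a single square weight matrix with a typical hidden size d = 4096 and rank r = 16:

```python
# Trainable parameters for one d x d weight matrix:
# updating W directly vs. learning the low-rank factors B and A.
d, r = 4096, 16

full_update = d * d          # update W itself
lora_update = d * r + r * d  # B (d x r) plus A (r x d)

print(full_update)                      # 16777216
print(lora_update)                      # 131072
print(full_update // lora_update)       # 128x fewer trainable parameters
```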
In practice, LoRA "freezes" all original model weights and adds small adapter modules alongside the attention layers. During training, only these adapters are updated; after training, their weights can be merged back into the originals, so inference adds no extra latency.
```python
# LoRA configuration with Hugging Face PEFT
from peft import LoraConfig, get_peft_model, TaskType
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load base model
model_name = "meta-llama/Llama-3.1-8B"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Configure LoRA
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                    # Decomposition rank (higher = more capacity)
    lora_alpha=32,           # Scale factor (typically 2x r)
    lora_dropout=0.05,       # Dropout for regularization
    target_modules=[         # Layers to apply LoRA to
        "q_proj", "k_proj",  # Query and Key projections in attention
        "v_proj", "o_proj",  # Value and Output projections
    ],
    bias="none"              # Don't train biases
)

# Apply LoRA to the model
peft_model = get_peft_model(model, lora_config)

# Check trainable parameters
peft_model.print_trainable_parameters()
# Example output: trainable params ≈ 13.6M of ~8.04B total (≈0.17%)
```
How to Choose Rank r
The r (rank) parameter determines the expressive capacity of the LoRA adaptation:
- r = 4-8: sufficient for simple tasks (classification, output formatting)
- r = 16-32: good balance for most use cases
- r = 64-128: for complex tasks requiring significant behavior changes
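Since each adapter holds r · (d_in + d_out) parameters per target matrix, the trainable count grows linearly with r. A rough sketch for the configuration above (`lora_trainable` is an illustrative helper; it treats all four projections as square d × d matrices, ignoring the smaller k/v projections that grouped-query attention gives Llama 3):

```python
# Approximate trainable-parameter count as a function of LoRA rank r,
# assuming n_modules square d x d target matrices per layer.
def lora_trainable(d: int, r: int, n_modules: int, n_layers: int) -> int:
    return n_layers * n_modules * 2 * d * r  # 2*d*r params per adapted matrix

for r in (4, 16, 64):
    n = lora_trainable(d=4096, r=r, n_modules=4, n_layers=32)
    print(f"r={r:>3}: {n / 1e6:.1f}M trainable params")
```

Doubling r doubles adapter size and capacity, which is why the low end of the range suffices for narrow formatting tasks while behavior-changing tasks benefit from higher ranks.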
QLoRA: LoRA + Quantization
QLoRA combines LoRA with 4-bit quantization of the base model. The original model is compressed from float16 (16 bits per weight) to int4 (4 bits per weight), reducing memory requirements by approximately 4x. LoRA modules remain in float16 to maintain fine-tuning precision.
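Simple arithmetic confirms the savings on the base weights alone (activations, optimizer state, and the adapters themselves add a few extra GB on top of these figures):

```python
# Memory footprint of the base weights for an 8B-parameter model
# at different precisions.
n_params = 8e9

fp16_gb = n_params * 2 / 1e9    # 16 bits = 2 bytes per weight
int4_gb = n_params * 0.5 / 1e9  # 4 bits = 0.5 bytes per weight

print(f"fp16 weights: {fp16_gb:.0f} GB")  # 16 GB
print(f"int4 weights: {int4_gb:.0f} GB")  # 4 GB
```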
```python
# QLoRA: fine-tuning with 4-bit quantization
from transformers import BitsAndBytesConfig
from peft import prepare_model_for_kbit_training
import torch

# 4-bit quantization configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # Quantize to 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4, designed for LLM weights
    bnb_4bit_compute_dtype=torch.bfloat16,  # Compute in bfloat16
    bnb_4bit_use_double_quant=True          # Double quantization to save RAM
)

# Load quantized model
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",
    quantization_config=bnb_config,
    device_map="auto"
)

# Prepare the quantized model for training, then apply LoRA on top
model = prepare_model_for_kbit_training(model)
peft_model = get_peft_model(model, lora_config)

# The 8B model now requires only ~6 GB of VRAM
print(f"GPU memory: ~{torch.cuda.memory_allocated() / 1e9:.1f} GB")
```
Dataset Preparation
Fine-tuning quality directly depends on dataset quality. A well-prepared dataset is more important than perfect hyperparameters. The standard format is instruction-response pairs.
```python
# Dataset preparation for fine-tuning
import json

# Standard instruction-following format
training_examples = [
    {
        "instruction": "Analyze the sentiment of this restaurant review",
        "input": "The pasta was cold and the service painfully slow. Never again.",
        "output": "NEGATIVE. The review expresses dissatisfaction with two aspects: food quality (cold pasta) and service (painfully slow). The expression 'Never again' indicates a strongly negative experience."
    },
    {
        "instruction": "Analyze the sentiment of this restaurant review",
        "input": "Nice atmosphere, average food. Prices a bit high for the area.",
        "output": "NEUTRAL/MIXED. The review presents positive aspects (nice atmosphere), neutral (average food), and negative (high prices). The overall judgment is balanced with a slight negative tendency."
    }
]

# Format for training
def format_for_training(example: dict) -> str:
    """Format an example in the Alpaca-style instruction template."""
    return f"""### Instruction:
{example['instruction']}

### Input:
{example['input']}

### Response:
{example['output']}"""

# Save dataset as JSONL
with open("training_data.jsonl", "w") as f:
    for ex in training_examples:
        f.write(json.dumps({"text": format_for_training(ex)}) + "\n")

print(f"Dataset created with {len(training_examples)} examples")
```
Dataset Best Practices
- Quality > Quantity: 500 high-quality examples beat 5,000 mediocre ones
- Diversity: cover all variants of the task the model will need to handle
- Consistency: maintain uniform format and style across all examples
- Balance: distribute classes uniformly (positive/negative/neutral)
- Validation: set aside at least 10-20% of data for evaluation
- Cleaning: remove duplicates, grammar errors, inconsistent responses
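Some of these checks are easy to automate. A minimal sketch, assuming the JSONL format produced earlier, with one `{"text": ...}` object per line; `validate_dataset` is an illustrative helper, not a library function:

```python
import json

def validate_dataset(path: str, val_frac: float = 0.1):
    """Deduplicate a JSONL dataset and split off a validation set."""
    with open(path) as f:
        rows = [json.loads(line)["text"] for line in f]

    # Remove exact duplicates while preserving order
    seen, unique = set(), []
    for text in rows:
        if text not in seen:
            seen.add(text)
            unique.append(text)

    # Hold out at least one example for evaluation
    n_val = max(1, int(len(unique) * val_frac))
    return unique[:-n_val], unique[-n_val:]
```

In a real pipeline you would shuffle before splitting and stratify by class; this sketch only covers deduplication and the hold-out split.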
Training and Evaluation
Fine-tuning follows the same principle as any supervised training: minimizing the loss on the training examples. With LoRA/QLoRA, however, the process is much faster and requires far fewer resources.
```python
# Training with the Hugging Face Trainer
from transformers import (
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling,
)
from datasets import load_dataset

# Load dataset and hold out 10% for evaluation
dataset = load_dataset("json", data_files="training_data.jsonl", split="train")
dataset = dataset.train_test_split(test_size=0.1)

# Tokenize
def tokenize(example):
    return tokenizer(
        example["text"],
        truncation=True,
        max_length=512,
        padding="max_length"
    )

tokenized = dataset.map(tokenize, batched=True)

# The collator builds the labels for the causal-LM loss from the input IDs
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

# Configure training
training_args = TrainingArguments(
    output_dir="./fine-tuned-model",
    num_train_epochs=3,             # Number of epochs (2-5 is typical for LoRA)
    per_device_train_batch_size=4,  # Batch size per GPU
    gradient_accumulation_steps=4,  # Simulate a larger batch size
    learning_rate=2e-4,             # Learning rate (1e-4 to 3e-4 for LoRA)
    warmup_steps=100,               # Gradual warmup
    logging_steps=10,               # Log every 10 steps
    save_strategy="epoch",          # Save at each epoch
    eval_strategy="epoch",          # Evaluate at each epoch
                                    # ("evaluation_strategy" in older transformers)
    fp16=True,                      # Mixed precision for speed
)

# Start training
trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=data_collator,
)
trainer.train()

# Save the LoRA adapters only (a few MB, not the full model)
peft_model.save_pretrained("./lora-adapters")
print("Training complete! Adapters saved.")
```
Merge and Deploy
After training, LoRA adapters can be used in two ways: loaded separately on top of the base model (flexible, you can have multiple adapters) or merged with the base model into a single model (simpler to deploy, no inference overhead).
```python
# Merge LoRA adapters into the base model
from peft import PeftModel

# Load base model and attach the trained adapters
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",
    torch_dtype=torch.float16,
    device_map="auto"
)
peft_model = PeftModel.from_pretrained(base_model, "./lora-adapters")

# Merge adapters into the base weights
merged_model = peft_model.merge_and_unload()

# Save the complete model
merged_model.save_pretrained("./final-model")
tokenizer.save_pretrained("./final-model")
print("Final model saved (base + LoRA merged)")

# Test the fine-tuned model
inputs = tokenizer(
    "### Instruction:\nAnalyze the sentiment...", return_tensors="pt"
).to(merged_model.device)
outputs = merged_model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Decision Framework: Fine-Tuning vs Prompt Engineering
Fine-tuning isn't always the right choice. Here's a framework for deciding when to invest in fine-tuning and when prompt engineering is sufficient.
When to Choose What
| Scenario | Recommendation | Reason |
|---|---|---|
| Generic task with specific format | Prompt Engineering | Few-shot is sufficient |
| Niche domain with specific terminology | Fine-Tuning | Model needs to learn the vocabulary |
| Consistent writing style | Fine-Tuning | Difficult to maintain with prompts alone |
| Limited budget, few data | Prompt Engineering | Fine-tuning requires data and compute |
| Critical latency, high per-token cost | Fine-Tuning (small model) | Small fine-tuned model beats large generic one |
| Data privacy requirement | Fine-Tuning (open source) | No data sent to third parties |
Conclusions
Fine-tuning with LoRA and QLoRA has democratized language model adaptation. What previously required expensive GPU clusters is now possible on a single consumer GPU, modifying less than 1% of the model's total parameters.
The key to success lies in dataset quality: a few hundred manually curated examples often produce better results than thousands of automatically generated ones. Invest time in data preparation, not just hyperparameters.
In the next article, we'll see how to bring LLMs to production: OpenAI and Anthropic APIs, open source model deployment, caching strategies, rate limiting, monitoring, and cost management.