The Importance of Data Preprocessing
Feature engineering and data preprocessing are the most critical phases of any machine learning project. An unwritten rule says that 80% of a data scientist's time is spent preparing data and only 20% on modeling. No matter how sophisticated the algorithm, if the input data is dirty, incomplete, or poorly represented, the model will produce poor results: garbage in, garbage out.
Preprocessing transforms raw data into a format suitable for the algorithm. Feature engineering goes further: it creates new variables from existing ones, leveraging domain knowledge to capture relationships that the algorithm alone would not find. Together, these phases determine the success or failure of an ML project.
What You Will Learn in This Article
- Techniques for handling missing values
- Encoding categorical variables
- Scaling and normalizing numerical features
- Outlier detection and handling
- Creating new features with domain knowledge
- Preprocessing pipelines with scikit-learn
Handling Missing Values
Real-world data almost always contains missing values (NaN, null). There are three main strategies: deletion (removing rows or columns with too many missing values), statistical imputation (replacing missing entries with the mean, median, or mode), and predictive imputation (using a model to predict the missing values). The choice depends on how much data is missing and on the missingness pattern (random or systematic).
import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer
# Dataset with missing values
data = pd.DataFrame({
    'age': [25, 30, np.nan, 45, 35, np.nan, 28, 50],
    'income': [30000, np.nan, 45000, 60000, np.nan, 55000, 32000, 70000],
    'category': ['A', 'B', 'A', np.nan, 'B', 'A', 'B', 'A'],
    'target': [0, 1, 0, 1, 1, 0, 1, 0]
})
print("Missing values per column:")
print(data.isnull().sum())
print(f"\nMissing percentage:\n{(data.isnull().mean() * 100).round(1)}")
# Strategy 1: Mean/median imputation
imputer_mean = SimpleImputer(strategy='mean')
data['age_imputed'] = imputer_mean.fit_transform(data[['age']])
imputer_median = SimpleImputer(strategy='median')
data['income_imputed'] = imputer_median.fit_transform(data[['income']])
# Strategy 2: KNN imputation (uses neighbors)
knn_imputer = KNNImputer(n_neighbors=3)
numeric_cols = data[['age', 'income']].values
imputed_knn = knn_imputer.fit_transform(numeric_cols)
# Strategy 3: Categorical imputation with mode
imputer_mode = SimpleImputer(strategy='most_frequent')
data['category_imputed'] = imputer_mode.fit_transform(data[['category']])
print("\nAfter imputation:")
print(data[['age_imputed', 'income_imputed', 'category_imputed']].head())
Encoding Categorical Variables
ML algorithms work with numbers, not strings, so categorical variables must be converted to numerical format. Label Encoding assigns an integer to each category (A=0, B=1, C=2): simple, but it introduces an order that does not exist. One-Hot Encoding creates a binary column for each category: it introduces no order, but it can generate many columns when a categorical has high cardinality. Target Encoding replaces each category with the mean of the target for that category: powerful, but prone to overfitting.
from sklearn.preprocessing import (
    LabelEncoder, OneHotEncoder, OrdinalEncoder,
    StandardScaler, MinMaxScaler, RobustScaler
)
from sklearn.compose import ColumnTransformer
import pandas as pd
import numpy as np
# Example dataset
df = pd.DataFrame({
    'color': ['red', 'blue', 'green', 'red', 'blue'],
    'size': ['S', 'M', 'L', 'XL', 'M'],
    'price': [10.5, 25.0, 45.0, 80.0, 22.0],
    'weight': [100, 500, 1200, 3000, 450]
})
# One-Hot for color (nominal, no order)
ohe = OneHotEncoder(sparse_output=False, drop='first')
color_encoded = ohe.fit_transform(df[['color']])
print(f"One-Hot color:\n{color_encoded}")
# Ordinal for size (ordinal, has an order)
oe = OrdinalEncoder(categories=[['S', 'M', 'L', 'XL']])
size_encoded = oe.fit_transform(df[['size']])
print(f"\nOrdinal size: {size_encoded.flatten()}")
# --- SCALING ---
# StandardScaler: mean=0, std=1 (good default for roughly Gaussian features)
ss = StandardScaler()
price_standard = ss.fit_transform(df[['price']])
# MinMaxScaler: rescales to [0,1] (preserves shape, but sensitive to outliers)
mms = MinMaxScaler()
price_minmax = mms.fit_transform(df[['price']])
# RobustScaler: uses median and IQR (robust to outliers)
rs = RobustScaler()
weight_robust = rs.fit_transform(df[['weight']])
print(f"\nStandard: {price_standard.flatten().round(2)}")
print(f"MinMax: {price_minmax.flatten().round(2)}")
print(f"Robust: {weight_robust.flatten().round(2)}")
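Target Encoding, mentioned above but not shown in the snippet, can be sketched roughly as follows. This is a minimal illustration on a tiny synthetic dataset; the smoothing strength `m` is an arbitrary choice, and in practice you would compute the encoding out-of-fold (or use a dedicated library) to limit the overfitting risk described earlier:

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['Rome', 'Milan', 'Rome', 'Turin', 'Milan', 'Rome'],
    'target': [1, 0, 1, 0, 1, 0]
})

# Global target mean, used as a prior for smoothing
global_mean = df['target'].mean()

# Per-category mean and count
stats = df.groupby('city')['target'].agg(['mean', 'count'])

# Smoothed target encoding: rare categories shrink toward the global mean
m = 2  # smoothing strength (arbitrary choice for this sketch)
smoothed = (stats['count'] * stats['mean'] + m * global_mean) / (stats['count'] + m)
df['city_encoded'] = df['city'].map(smoothed)
print(df)
```

The smoothing means a category seen only once (like 'Turin' here) is pulled strongly toward the global mean instead of being encoded with an unreliable single-sample average.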
Outlier Detection
Outliers are anomalous values that deviate significantly from the rest of the data. They can be measurement errors, corrupted data, or genuine extreme values. The IQR (Interquartile Range) method identifies outliers as points beyond 1.5 times the IQR from the first or third quartile. The Z-score method identifies points with standardized values beyond a threshold (typically 3). Isolation Forest is an ML approach that isolates outliers using random decision trees.
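The three approaches can be compared on synthetic data; the two injected extreme values below are assumptions made for illustration:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# 100 normal points plus two injected extreme values
values = np.concatenate([rng.normal(50, 5, 100), [120.0, -30.0]])

# IQR method: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
iqr_outliers = (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)

# Z-score method: flag points more than 3 standard deviations from the mean
z = (values - values.mean()) / values.std()
z_outliers = np.abs(z) > 3

# Isolation Forest: isolates anomalies with random splits (-1 = outlier)
iso = IsolationForest(contamination=0.02, random_state=42)
iso_labels = iso.fit_predict(values.reshape(-1, 1))

print(f"IQR flagged: {iqr_outliers.sum()}, "
      f"Z-score: {z_outliers.sum()}, "
      f"IsolationForest: {(iso_labels == -1).sum()}")
```

Note that the z-score method computes the mean and standard deviation on data that includes the outliers themselves, which inflates the threshold; this is exactly why RobustScaler and the IQR method, based on quartiles, are preferred when outliers are present.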
Feature Selection
Not all features contribute positively to the model. Irrelevant or redundant features can worsen performance and slow down training. Feature selection identifies the most informative variables. Methods include: correlation (remove highly correlated features), variance threshold (remove low-variance features), SelectKBest (select the K best according to a statistical test), and Random Forest feature importance.
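Two of these methods can be sketched on a deliberately simple synthetic dataset, where one feature drives the target, one is pure noise, and one is constant:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold, SelectKBest, f_classif

rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    'useful': rng.normal(0, 1, n),
    'noise': rng.normal(0, 1, n),
    'constant': np.ones(n),  # zero variance: carries no information
})
y = (X['useful'] + 0.1 * rng.normal(0, 1, n) > 0).astype(int)

# Variance threshold: drop (near-)constant features
vt = VarianceThreshold(threshold=0.0)
vt.fit(X)
kept = X.columns[vt.get_support()]
print(f"After VarianceThreshold: {list(kept)}")

# SelectKBest: keep the K features most associated with the target (ANOVA F-test)
skb = SelectKBest(score_func=f_classif, k=1)
skb.fit(X[kept], y)
best = kept[skb.get_support()]
print(f"SelectKBest picked: {list(best)}")
```

The variance filter removes 'constant', and the F-test then ranks 'useful' far above 'noise', since the target was built almost entirely from it.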
Preprocessing Pipeline with scikit-learn
scikit-learn Pipelines chain preprocessing and modeling steps into a single object. This prevents data leakage (when test set information contaminates training) and simplifies cross-validation and deployment. ColumnTransformer allows applying different transformations to different columns.
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
import pandas as pd
import numpy as np
# Realistic dataset
np.random.seed(42)
n = 200
df = pd.DataFrame({
    'age': np.random.randint(18, 70, n).astype(float),
    'income': np.random.normal(40000, 15000, n),
    'experience': np.random.randint(0, 30, n).astype(float),
    'city': np.random.choice(['Milan', 'Rome', 'Naples', 'Turin'], n),
    'education': np.random.choice(['Diploma', 'Bachelor', 'Master'], n),
    'target': np.random.randint(0, 2, n)
})
# Insert random missing values
for col in ['age', 'income', 'experience']:
    mask = np.random.random(n) < 0.1
    df.loc[mask, col] = np.nan
# Define columns by type
numeric_features = ['age', 'income', 'experience']
categorical_features = ['city', 'education']
# Numeric preprocessing: impute + scale
numeric_transformer = Pipeline([
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])
# Categorical preprocessing: impute + one-hot
categorical_transformer = Pipeline([
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(drop='first', handle_unknown='ignore'))
])
# ColumnTransformer combines everything
preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, numeric_features),
    ('cat', categorical_transformer, categorical_features)
])
# Full pipeline: preprocessing + model
full_pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('classifier', RandomForestClassifier(n_estimators=100, random_state=42))
])
# Cross-validation (preprocessing happens INSIDE each fold)
X = df.drop('target', axis=1)
y = df['target']
scores = cross_val_score(full_pipeline, X, y, cv=5, scoring='accuracy')
print(f"Accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
Data Leakage: preprocessing steps (scaling, imputation) must be fitted on the training data only, never on the full dataset. If you fit a scaler on the entire dataset, the test set influences the scaler's parameters. A scikit-learn Pipeline used inside cross-validation prevents this automatically: it calls fit_transform on the training folds and only transform on the held-out fold.
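The same rule, made explicit by hand on synthetic data: the scaler learns its parameters from the training split alone, and the test split only reuses them.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(100, 20, size=(200, 1))
y = rng.integers(0, 2, 200)

# Split FIRST, then fit the scaler only on the training portion
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit: learns mean/std from train only
X_test_scaled = scaler.transform(X_test)        # transform: reuses train statistics

print(f"Mean learned from the training set: {scaler.mean_[0]:.2f}")
```

Fitting the scaler on `X` before the split would let the test rows shift the learned mean and standard deviation, which is exactly the leakage the Pipeline guards against.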
Key Takeaways
- Preprocessing is the most critical phase: 80% of time goes to data preparation
- Missing values: deletion, statistical or predictive imputation depending on context
- Encoding: One-Hot for nominal, Ordinal for ordinal, Target Encoding with caution
- Scaling: StandardScaler for normal distributions, RobustScaler with outliers
- Pipeline + ColumnTransformer prevent data leakage and simplify code
- Feature engineering with domain knowledge often makes more difference than algorithm choice