I create modern web applications and custom digital tools to help businesses grow through technological innovation. My passion is combining computer science and economics to generate real value.
My passion for computer science was born at the Technical Commercial Institute of Maglie, where I discovered the power of programming and the fascination of creating digital solutions. From the start, I understood that computer science was not just code, but an extraordinary tool for turning ideas into reality.
During my studies in Business Information Systems, I began to interweave computer science and economics, understanding how technology can be the engine of growth for any business. This vision accompanied me to the University of Bari, where I obtained my degree in Computer Science, deepening my technical skills and passion for software development.
Today I put this experience at the service of businesses, professionals and startups, creating tailor-made digital solutions that automate processes, optimize resources and open new business opportunities. Because true innovation begins when technology meets the real needs of people.
My Skills
Data Analysis & Predictive Models
I transform data into strategic insights with in-depth analysis and predictive models for informed decisions
Process Automation
I create custom tools that automate repetitive operations and free up time for value-added activities
Custom Systems
I develop tailor-made software systems, from platform integrations to customized dashboards
I firmly believe that computer science is the most powerful tool for turning ideas into reality and improving people's lives.
Democratizing Technology
My mission is to make computing accessible to everyone: from small local businesses to innovative startups, to professionals looking to digitalize their work. Every organization deserves to harness the potential of digital technology.
Combining Computer Science and Economics
It is not just about writing code: it is about understanding how technology can generate real value. By combining technical skills with an economic perspective, I help businesses grow, streamline processes, and reach new levels of efficiency and profitability.
Creating Tailor-Made Solutions
Every business is unique, and its solutions should be too. I develop custom tools that address each client's specific needs, automating repetitive processes and freeing up time for what really matters: growing the business.
Transform Your Business with Technology
Whether you run a shop, a professional practice, or a company, I can help you harness the potential of computing to work better, faster, and smarter.
Bari, Puglia, Italy · Hybrid
Analysis and development of software systems using Java and Quarkus in the healthcare and public sectors. Continuous training on modern technologies, including AI agents, for building customized and efficient software solutions.
💼
06/2022 - 12/2024
Software Analyst and Backend Developer, Associate Consultant
Links Management and Technology SpA
Experience analyzing as-is software systems and ETL flows using PowerCenter. Completed Spring Boot training for developing modern, scalable backend applications. Specialized in Spring Boot backend development, with experience in database design, analysis, development, and testing.
💼
02/2021 - 10/2021
Software programmer
Adesso.it (formerly WebScience srl)
Experience in AS-IS and TO-BE analysis, SEO evolutions and website evolutions to improve user performance and engagement.
🎓
2018 - 2025
Degree in Computer Science
University of Bari Aldo Moro
Bachelor's degree in Computer Science, focusing on software engineering, algorithms, and modern development practices.
📚
2013 - 2018
Diploma - Corporate Information Systems
Technical Commercial Institute of Maglie
Technical diploma specializing in Business Information Systems, combining IT knowledge with business management.
Contact Me
Have a project in mind? Let's talk! Fill out the form below and I will get back to you as soon as possible.
* Required fields. Your data will be used only to respond to your request.
Introduction: From Experiment to Production
Building an AI agent that works on a developer's laptop is relatively straightforward.
Bringing it to production with reliability, scalability, and observability is an entirely
different challenge. According to Gartner, only 25% of organizations that
have developed AI agent prototypes have successfully scaled them to production environments.
The gap between prototype and production-ready system is enormous, and the causes are almost
always infrastructural: inadequate containerization, missing monitoring, ineffective scaling,
and approximate state management.
AI agents present unique deployment challenges compared to traditional applications. An agent
is not a simple stateless microservice: it maintains conversational state, makes external API
calls with variable latency, consumes computational resources unpredictably, and can remain
active for minutes (or hours) on a single task. These characteristics require specific
deployment strategies that go beyond the classic request-response pattern.
In this article, we will analyze the complete deployment stack for AI agents: from Docker
containerization to Kubernetes orchestration, from scaling strategies to advanced monitoring.
Each section includes production-ready configurations and architectural patterns consolidated
by teams managing agents at scale.
What You Will Learn in This Article
How to containerize an AI agent with Docker and multi-stage builds
Kubernetes deployment with manifests optimized for agentic workloads
Horizontal, vertical, and queue-based scaling strategies
State persistence: Redis, PostgreSQL, and PersistentVolumes
Service mesh and networking for inter-agent communication
Agent-specific health checks: liveness, readiness, and startup probes
Monitoring with Prometheus and Grafana: custom metrics for agents
Structured logging and distributed tracing with OpenTelemetry
Docker Containerization for AI Agents
Containerization is the critical first step to making an AI agent portable and reproducible.
A Docker container encapsulates the agent, its dependencies, local models (if any), and
configuration into a deployable unit that runs anywhere. However, containerizing an AI agent
requires attention to specific details that traditional web applications do not present.
Optimized Dockerfile: Multi-Stage Build
A multi-stage build approach drastically reduces the final image size by separating the build
environment from the runtime environment. For a Python-based agent, this means installing
compilation dependencies only in the build stage and copying only the necessary artifacts
to the final stage.
# === Stage 1: Builder ===
FROM python:3.12-slim AS builder
WORKDIR /app
# Install system dependencies for compilation
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
gcc \
&& rm -rf /var/lib/apt/lists/*
# Copy and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt
# === Stage 2: Runtime ===
FROM python:3.12-slim AS runtime
WORKDIR /app
# Copy only installed dependencies
COPY --from=builder /install /usr/local
# Copy agent source code
COPY src/ ./src/
COPY config/ ./config/
# Create non-root user for security
RUN useradd --create-home --shell /bin/bash agent
USER agent
# Environment variables
ENV PYTHONUNBUFFERED=1
ENV AGENT_ENV=production
ENV LOG_LEVEL=INFO
# Built-in container health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
CMD python -c "import requests; requests.get('http://localhost:8080/health', timeout=5).raise_for_status()" || exit 1
# Expose service port
EXPOSE 8080
# Start the agent
CMD ["python", "-m", "src.agent_server"]
This Dockerfile implements several critical best practices. Using python:3.12-slim
as the base image reduces the attack surface and overall size. The multi-stage build eliminates
compilation tools from the final image. The non-root user prevents privilege escalation attacks.
The native HEALTHCHECK allows Docker itself to monitor the container's state.
Image Optimization
For agents using heavy libraries like PyTorch or TensorFlow, image optimization becomes
crucial. Some effective strategies:
Layer caching: order COPY instructions from least to most volatile (requirements.txt before source code) to maximize Docker layer cache
.dockerignore: exclude tests, documentation, temporary files, virtual environments, and models not needed in production
Alpine vs Slim: for Python agents, slim is generally preferable to alpine because it avoids compatibility issues with packages that require glibc
Distroless: for maximum security, Google Distroless images eliminate even the shell from the container, reducing the attack surface to a minimum
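As an illustration, a minimal .dockerignore for a Python agent might look like the following (the exact entries depend on your repository layout):

```
# .dockerignore - keep tests, docs, and local artifacts out of the build context
__pycache__/
*.pyc
.venv/
venv/
.git/
tests/
docs/
*.md
models/
.env
```

Excluding local models and the .env file both shrinks the build context and prevents local secrets from being baked into image layers.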
Environment-Specific Configuration
A production agent needs different configurations compared to development: real API keys,
production endpoints, appropriate logging levels. Configuration management happens through
environment variables, configuration files mounted as volumes, or external secret managers.
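As a sketch of this approach (variable names like LLM_API_KEY are illustrative, not a fixed convention), configuration can be resolved from environment variables once at startup, so the same image runs unchanged in every environment:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentConfig:
    """Agent configuration resolved from environment variables at startup."""
    env: str
    log_level: str
    llm_api_key: str
    redis_url: str

    @classmethod
    def from_env(cls) -> "AgentConfig":
        # Fail fast in production if a required secret is missing
        api_key = os.getenv("LLM_API_KEY", "")
        env = os.getenv("AGENT_ENV", "development")
        if env == "production" and not api_key:
            raise RuntimeError("LLM_API_KEY must be set in production")
        return cls(
            env=env,
            log_level=os.getenv("LOG_LEVEL", "INFO"),
            llm_api_key=api_key,
            redis_url=os.getenv("REDIS_URL", "redis://localhost:6379/0"),
        )
```

Failing fast on missing secrets turns a silent runtime error (an unauthorized LLM call minutes later) into an immediate, visible crash at pod startup.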
Kubernetes Orchestration
Kubernetes is the standard orchestration platform for containerized workloads in production.
For AI agents, Kubernetes offers fundamental advantages: automatic scaling, self-healing,
secret management, service discovery, and rolling updates with zero downtime. However, AI
agents have specific requirements that demand dedicated Kubernetes configurations.
Base Manifests: Deployment and Service
The deployment manifest defines how Kubernetes should run and manage agent pods. For AI
agents, it is essential to correctly configure resources (CPU and memory), health probes,
and restart policies.
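A minimal Deployment and Service sketch along these lines (image name, resource values, and labels are illustrative and should be tuned to your workload):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-agent
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ai-agent
  template:
    metadata:
      labels:
        app: ai-agent
    spec:
      containers:
        - name: agent
          image: registry.example.com/ai-agent:1.0.0   # illustrative registry
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "2"
              memory: "4Gi"
          # liveness/readiness/startup probes are covered in the
          # health-checks section and omitted here for brevity
---
apiVersion: v1
kind: Service
metadata:
  name: ai-agent
spec:
  selector:
    app: ai-agent
  ports:
    - port: 80
      targetPort: 8080
```

Note the generous gap between requests and limits: agentic workloads burst unpredictably, so requests sized for the steady state with headroom in the limits avoid both waste and OOM kills.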
Separating configuration from code is a fundamental principle of Twelve-Factor Apps.
In Kubernetes, ConfigMaps manage non-sensitive configuration, while Secrets protect API keys,
database credentials, and TLS certificates. For AI agents, LLM provider API keys are the most
critical secret to protect.
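A sketch of the split between the two resources (names and keys are illustrative; in practice the Secret would be injected by an external secret manager rather than committed as a manifest):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: agent-config
data:
  AGENT_ENV: "production"
  LOG_LEVEL: "INFO"
---
apiVersion: v1
kind: Secret
metadata:
  name: agent-secrets
type: Opaque
stringData:
  LLM_API_KEY: "replace-me"   # never commit real keys; inject at deploy time
```

The pod consumes both as environment variables with `envFrom`, one `configMapRef` entry for agent-config and one `secretRef` entry for agent-secrets.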
When an agent needs to maintain persistent local state (for example, embedding caches,
locally fine-tuned models, or on-disk session history), a StatefulSet
is preferable to a Deployment. StatefulSet guarantees stable network identity for each pod,
persistent storage via PersistentVolumeClaim, and deterministic startup and shutdown ordering.
Scaling Strategies
Scaling AI agents is more complex than scaling traditional microservices. An agent can occupy
a thread for tens of seconds (or minutes) while executing a multi-step task, making traditional
metrics (CPU, memory) insufficient indicators of actual load. Multi-dimensional scaling
strategies are needed.
Horizontal Pod Autoscaler (HPA)
The HPA automatically scales the number of replicas based on observed metrics. For AI agents,
custom metrics are essential: the number of concurrent tasks, request queue depth, and average
latency per task are more meaningful indicators than CPU utilization.
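A sketch of an HPA driven by a custom per-pod metric (the metric name agent_active_tasks is illustrative, and exposing it to the HPA requires a metrics adapter such as prometheus-adapter):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ai-agent-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-agent          # name of your agent Deployment
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: agent_active_tasks     # custom metric via a metrics adapter
        target:
          type: AverageValue
          averageValue: "5"            # target ~5 concurrent tasks per pod
```

Scaling on concurrent tasks rather than CPU means a pod blocked waiting on a slow LLM response still counts as loaded, which CPU-based scaling would miss entirely.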
For agents that process asynchronous tasks, queue-depth-based scaling is the most effective
pattern. The idea is simple: when the queue grows, add workers; when it empties, scale down.
Tools like KEDA (Kubernetes Event-Driven Autoscaling) allow scaling pods based
on metrics from RabbitMQ, Redis Streams, Kafka, or SQS.
Scale-to-zero: when there are no tasks in the queue, KEDA can reduce replicas to zero, completely eliminating infrastructure costs during idle periods
Burst scaling: during sudden spikes, KEDA can scale aggressively based on the queue growth rate
Cooldown period: a stabilization period prevents thrashing (continuous up and down scaling) caused by temporary load fluctuations
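A KEDA ScaledObject implementing the pattern above might look like this (queue name, target Deployment, and connection string are illustrative):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: agent-worker-scaler
spec:
  scaleTargetRef:
    name: agent-worker        # Deployment that consumes queued tasks
  minReplicaCount: 0          # scale to zero when the queue is empty
  maxReplicaCount: 30
  cooldownPeriod: 120         # seconds of inactivity before scaling down
  triggers:
    - type: rabbitmq
      metadata:
        queueName: agent-tasks
        mode: QueueLength
        value: "10"           # target ~10 pending messages per replica
        host: amqp://guest:guest@rabbitmq:5672/
```

With minReplicaCount set to 0, an idle agent fleet costs nothing; the trade-off is a cold-start delay on the first task after an idle period.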
Vertical Scaling
Some agentic tasks require more resources per instance rather than more instances. For example,
an agent performing complex reasoning with a local model or processing large documents benefits
from more memory and CPU per pod rather than more pods with limited resources. The
Vertical Pod Autoscaler (VPA) automatically adjusts resource requests based
on historical usage.
State Persistence
State management is one of the most critical challenges in AI agent deployment. An agent that
loses its conversational state, long-term memory, or the context of an ongoing task due to a
pod restart is unusable in production. State persistence requires a multi-layer approach.
Redis for Session State
Redis is the ideal choice for agent session state: low latency (sub-millisecond),
support for complex data structures, and automatic TTL for expired session cleanup. In a
multi-replica context, Redis acts as a shared session store that allows any pod to continue
a conversation started on another pod.
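A sketch of such a shared session store (key prefix and TTL are illustrative choices). The Redis client is injected, so the same class works with redis-py's `redis.Redis` in production and a fake in tests:

```python
import json
from typing import Any, Optional

class SessionStore:
    """Shared conversational state keyed by session ID, with a TTL.

    `client` is any object exposing redis-py's `setex`/`get` methods,
    so any replica can resume a conversation started on another pod.
    """

    def __init__(self, client: Any, ttl_seconds: int = 3600) -> None:
        self.client = client
        self.ttl = ttl_seconds

    @staticmethod
    def key(session_id: str) -> str:
        # Namespaced key avoids collisions with other data in the same Redis
        return f"agent:session:{session_id}"

    def save(self, session_id: str, state: dict) -> None:
        # JSON keeps the state readable and language-agnostic; setex
        # refreshes the TTL on every write, expiring abandoned sessions
        self.client.setex(self.key(session_id), self.ttl, json.dumps(state))

    def load(self, session_id: str) -> Optional[dict]:
        raw = self.client.get(self.key(session_id))
        return json.loads(raw) if raw is not None else None
```

The TTL doubles as the session cleanup mechanism: no background job is needed, because Redis expires abandoned conversations on its own.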
PostgreSQL + pgvector for Long-Term Memory
Long-term agent memory requires a database capable of handling both structured data
(interaction history, user preferences, metrics) and semantic searches (similarity search
on vector embeddings). PostgreSQL with pgvector satisfies both requirements
in a single solution, avoiding the complexity of managing a separate relational database
and a vector store.
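A sketch of the schema and similarity query (table name, columns, and the 1536-dimension assumption are illustrative; the dimension must match your embedding model). pgvector's `<=>` operator computes cosine distance, which the index below accelerates:

```python
# Sketch of a pgvector-backed long-term memory layer (names illustrative).
# One table holds both structured fields and the embedding vector.

MEMORY_SCHEMA = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS agent_memory (
    id          BIGSERIAL PRIMARY KEY,
    user_id     TEXT NOT NULL,
    content     TEXT NOT NULL,
    embedding   vector(1536),         -- dimension of your embedding model
    created_at  TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX IF NOT EXISTS agent_memory_embedding_idx
    ON agent_memory USING hnsw (embedding vector_cosine_ops);
"""

def similarity_query(limit: int = 5) -> str:
    """Parameterized similarity search.

    Execute with (query_embedding, user_id, query_embedding): the embedding
    is passed twice, once for the similarity score and once for ordering.
    """
    return (
        "SELECT content, 1 - (embedding <=> %s::vector) AS similarity "
        "FROM agent_memory WHERE user_id = %s "
        f"ORDER BY embedding <=> %s::vector LIMIT {limit}"
    )
```

Because `<=>` returns cosine distance, `1 - distance` recovers cosine similarity for display, while the ORDER BY on the raw distance lets the HNSW index do the work.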
PersistentVolumes for Local Cache
When an agent uses local caches (pre-computed embeddings, downloaded models, temporary
processing files), Kubernetes PersistentVolumes ensure that this data survives pod restarts.
It is important to configure the appropriate storageClassName and reclaim policy
to avoid data loss or orphaned volume accumulation.
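A minimal PersistentVolumeClaim sketch for such a cache (class name and size are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: agent-cache
spec:
  accessModes:
    - ReadWriteOnce           # one node mounts it at a time
  storageClassName: standard  # choose a class with the reclaim policy you want
  resources:
    requests:
      storage: 20Gi
```

The pod then mounts the claim (for example at /app/cache) via a volume and volumeMounts entry; with a Retain reclaim policy the data even survives deletion of the claim itself.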
Networking and Inter-Agent Communication
In multi-agent architectures, communication between agents is a critical aspect that impacts
latency, reliability, and security. Kubernetes networking offers several options, from simple
Service discovery to advanced service meshes.
Service Mesh with Istio
A service mesh like Istio adds a dedicated infrastructure layer for inter-service
communication. For multi-agent systems, Istio provides:
Automatic mTLS: mutual encryption between all pods, ensuring that inter-agent communication is always encrypted and authenticated
Circuit breaker: when a downstream agent is overloaded or unresponsive, the circuit breaker stops requests to prevent cascading failures
Automatic retries: failed requests are retried with exponential backoff, transparently handling transient errors
Advanced load balancing: intelligent traffic distribution with algorithms like least-connections or consistent hashing
Observability: traffic, latency, and error rate metrics for every service pair, without modifications to agent code
Communication Patterns
The choice of communication pattern depends on the type of interaction between agents:
Synchronous request-response: for interactions where the calling agent must wait for the result (gRPC or REST). Suitable for tool calling and queries to specialized sub-agents
Asynchronous message queue: for delegated tasks that do not require an immediate response (RabbitMQ, Kafka). Ideal for multi-agent pipelines where each agent processes and passes the result to the next
Event-driven: for notifications and triggers (Kafka, Redis Pub/Sub). Enables complete decoupling between agents that produce events and agents that consume them
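The asynchronous pipeline pattern can be sketched in-process with asyncio, using `asyncio.Queue` as a stand-in for RabbitMQ or Kafka (agent names and the string-based "work" are illustrative placeholders for real LLM calls):

```python
import asyncio

async def research_agent(inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """First stage: enriches each task (an LLM call in a real system)."""
    while True:
        task = await inbox.get()
        outbox.put_nowait({**task, "notes": f"research on {task['query']}"})
        inbox.task_done()

async def writer_agent(inbox: asyncio.Queue, results: list) -> None:
    """Second stage: turns enriched tasks into final answers."""
    while True:
        task = await inbox.get()
        results.append(f"answer[{task['query']}] using {task['notes']}")
        inbox.task_done()

async def run_pipeline(queries):
    q1, q2, results = asyncio.Queue(), asyncio.Queue(), []
    workers = [
        asyncio.create_task(research_agent(q1, q2)),
        asyncio.create_task(writer_agent(q2, results)),
    ]
    for q in queries:
        q1.put_nowait({"query": q})
    await q1.join()   # wait for stage 1 to drain
    await q2.join()   # wait for stage 2 to drain
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

The key property carries over to the distributed version: each stage only knows its inbox and outbox, so stages can be scaled, replaced, or moved to separate pods without touching the others.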
Health Checks for AI Agents
Kubernetes probes are essential to ensure that only healthy pods receive traffic. For AI
agents, the three probe types have specific meanings:
Liveness Probe: verifies that the agent process is alive and not in a deadlock.
Checks that the HTTP server responds and that the main loop is not stuck. If it fails,
Kubernetes restarts the pod.
Readiness Probe: verifies that the agent is ready to receive new tasks.
Checks the connection to Redis, the database, and the availability of external APIs.
If it fails, the pod is removed from the Service (no incoming traffic) but not restarted.
Startup Probe: verifies that initialization has completed. For agents that
need to load models, populate caches, or establish multiple connections, startup time can
be significant (30-120 seconds). The startup probe prevents liveness/readiness from
killing the pod before it is ready.
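Wired into the pod spec, the three probes might look like the following fragment (paths match the FastAPI endpoints implemented in the next section; timing values are illustrative):

```yaml
containers:
  - name: agent
    # ... image, ports, resources ...
    startupProbe:
      httpGet:
        path: /health/startup
        port: 8080
      periodSeconds: 5
      failureThreshold: 24    # allows up to 120s of initialization
    livenessProbe:
      httpGet:
        path: /health/live
        port: 8080
      periodSeconds: 15
      timeoutSeconds: 5
    readinessProbe:
      httpGet:
        path: /health/ready
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
```

While the startup probe is failing, Kubernetes suspends the other two probes entirely, which is exactly what prevents a slow model load from triggering a restart loop.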
Health Endpoint Implementation
# health.py - Health endpoints for the AI agent
import os
import time

import psycopg2
import redis
from fastapi import FastAPI, Response

app = FastAPI()

# Global agent state
agent_ready = False
agent_start_time = time.time()

@app.get("/health/live")
async def liveness():
    """Is the agent alive? Is the process running?"""
    return {"status": "alive", "uptime": time.time() - agent_start_time}

@app.get("/health/ready")
async def readiness():
    """Is the agent ready to receive tasks?"""
    checks = {}

    # Verify Redis connection
    try:
        r = redis.from_url("redis://redis:6379")
        r.ping()
        checks["redis"] = "ok"
    except Exception:
        checks["redis"] = "failed"
        return Response(status_code=503, content="Redis unavailable")

    # Verify database connection
    try:
        conn = psycopg2.connect("postgresql://agent:pass@db:5432/agentdb")
        conn.close()
        checks["database"] = "ok"
    except Exception:
        checks["database"] = "failed"
        return Response(status_code=503, content="Database unavailable")

    # Verify API key is configured
    if not os.getenv("ANTHROPIC_API_KEY"):
        checks["api_key"] = "missing"
        return Response(status_code=503, content="API key missing")
    checks["api_key"] = "configured"

    return {"status": "ready", "checks": checks}

@app.get("/health/startup")
async def startup():
    """Is initialization complete?"""
    if not agent_ready:
        return Response(status_code=503, content="Initialization in progress")
    return {"status": "started"}
Monitoring and Alerting
Monitoring is the backbone of production operability. For AI agents, standard infrastructure
metrics (CPU, memory, network) are necessary but insufficient. Specific metrics are needed
that capture the behavior and performance of agent reasoning.
Prometheus Metrics for Agents
Prometheus is the de facto standard for monitoring cloud-native systems.
For AI agents, we define custom metrics that track every critical aspect of a task's lifecycle.
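In production you would define these with the official prometheus_client library; the following dependency-free sketch (metric names are illustrative) shows the kind of agent-specific metrics involved and the text exposition format a /metrics endpoint serves:

```python
from collections import defaultdict

class AgentMetrics:
    """Minimal agent metrics rendered in Prometheus' text exposition format."""

    def __init__(self) -> None:
        self.task_total = defaultdict(int)   # counter, labeled by status
        self.iterations = []                 # loop iterations per task
        self.tokens_used = 0                 # counter of LLM tokens consumed

    def record_task(self, status: str, loop_iterations: int, tokens: int) -> None:
        self.task_total[status] += 1
        self.iterations.append(loop_iterations)
        self.tokens_used += tokens

    def render(self) -> str:
        """Render the text format that Prometheus scrapes from /metrics."""
        lines = ["# TYPE agent_tasks_total counter"]
        for status, n in sorted(self.task_total.items()):
            lines.append(f'agent_tasks_total{{status="{status}"}} {n}')
        avg = sum(self.iterations) / len(self.iterations) if self.iterations else 0.0
        lines.append("# TYPE agent_loop_iterations_avg gauge")
        lines.append(f"agent_loop_iterations_avg {avg}")
        lines.append("# TYPE agent_tokens_total counter")
        lines.append(f"agent_tokens_total {self.tokens_used}")
        return "\n".join(lines)
```

Iterations per task and tokens consumed are the metrics that make the alerting rules below possible: neither is visible in CPU or memory graphs.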
Alerting rules should capture agent-specific failure modes, for example:
Warning: average number of iterations per task steadily increasing (possible infinite loop)
Critical: Redis or database connection lost for more than 2 minutes
Logging and Distributed Tracing
In an agentic system, a single user task can generate dozens of LLM calls, tool invocations,
and interactions with external services. Tracing the complete flow of a task requires
structured logging and distributed tracing.
Structured Logging (JSON)
Structured logging in JSON format enables automatic parsing, indexed search, and event
correlation. Every log entry must include a correlation ID (or trace ID)
that links all logs related to a single user task.
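A sketch of such a formatter using only the standard library (field names are an illustrative choice, not a fixed schema):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line, carrying a correlation ID."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # correlation_id ties together all logs for one user task;
            # it is attached per-call via the `extra` argument
            "correlation_id": getattr(record, "correlation_id", None),
        }
        return json.dumps(entry)

# Usage: attach to a handler, then pass the ID on each call
logger = logging.getLogger("agent")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.warning("tool call failed", extra={"correlation_id": "task-42"})
```

With every line a JSON object, the aggregation systems discussed below can filter on correlation_id and reconstruct the full story of a single task across pods.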
Distributed Tracing with OpenTelemetry
OpenTelemetry (OTel) is the open source standard for distributed observability.
For AI agents, OTel allows tracing the entire path of a task through all system components:
from receiving the request, through every agent loop iteration, every LLM call, every tool
invocation, to the final response.
Every significant operation is wrapped in an OTel span. Spans are organized
hierarchically: the task is the root span, each loop iteration is a child, and LLM calls
and tool invocations are grandchildren. This hierarchy allows visualizing the complete task
flow in tools like Jaeger or Zipkin, immediately identifying
bottlenecks and failure points.
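In practice this hierarchy comes from the opentelemetry-sdk (spans opened with `start_as_current_span` automatically parent to the active span). The parent/child mechanics can be sketched dependency-free with contextvars:

```python
import contextvars
import time
import uuid
from contextlib import contextmanager

# The currently active span; contextvars keeps it correct across async tasks
_current_span = contextvars.ContextVar("current_span", default=None)
finished_spans = []

@contextmanager
def span(name: str):
    """Record a span whose parent is whatever span is active when it starts."""
    parent = _current_span.get()
    record = {
        "name": name,
        "span_id": uuid.uuid4().hex[:8],
        "parent": parent["name"] if parent else None,
        "start": time.monotonic(),
    }
    token = _current_span.set(record)
    try:
        yield record
    finally:
        record["duration_s"] = time.monotonic() - record["start"]
        _current_span.reset(token)
        finished_spans.append(record)

# Usage mirrors the hierarchy described above: task -> iteration -> LLM call
with span("task"):
    with span("loop_iteration_1"):
        with span("llm_call"):
            pass
```

Nesting the context managers is all it takes: each span inherits its parent implicitly, which is exactly the property that lets Jaeger or Zipkin reassemble the tree.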
Log Aggregation
For systems with tens or hundreds of agent instances, centralized log aggregation is
indispensable. The most common solutions are:
ELK Stack (Elasticsearch, Logstash, Kibana): powerful for full-text search and advanced log analysis, but requires significant resources
Grafana Loki: lightweight and cost-effective solution that indexes only log metadata (labels), not the full content. Ideal for teams already using Grafana
Datadog / New Relic: SaaS solutions that integrate logs, metrics, and tracing in a single platform, with AI-powered analysis for anomaly detection
LangSmith as an Observability Platform
LangSmith, developed by the LangChain team, is an observability platform
specifically designed for LLM applications and AI agents. Unlike generic monitoring tools,
LangSmith understands the semantics of agent interactions:
LLM chain tracing: complete visualization of every chain/graph execution with input, output, latency, and cost for each node
Integrated playground: ability to re-run any step with modified prompts for rapid debugging
Dataset and evaluation: creation of test datasets from production traces for automated regression testing
Native alerting: rules based on response quality, costs, and error patterns specific to agents
Self-hosted or SaaS: available as both cloud service and on-premise deployment for compliance requirements
CI/CD for AI Agents
The CI/CD pipeline for AI agents extends the traditional one with specific steps: prompt
validation, LLM provider integration testing, and reasoning performance verification. A
robust pipeline includes:
Unit tests: testing individual tools, routing logic, and error handling
Integration tests: end-to-end testing with real LLMs (or mocks) on predefined scenarios
Regression tests: golden answer datasets to verify that prompt or model updates do not degrade quality
Canary deployment: progressive rollout to 5%, 25%, 50%, 100% of traffic with automatic metric monitoring
Automatic rollback: if metrics degrade during canary, automatic rollback to the previous version
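The regression-test step can be sketched as follows. Plain string similarity stands in for the scoring function here purely for illustration; real pipelines typically score with an LLM judge or embedding distance:

```python
from difflib import SequenceMatcher

def regression_check(agent_answers: dict, golden: dict, threshold: float = 0.8):
    """Compare agent outputs against golden answers; return failing case IDs.

    `golden` maps case IDs to reference answers; `agent_answers` maps the
    same IDs to what the candidate build produced. A CI gate fails the
    deploy if this returns any IDs.
    """
    failures = []
    for case_id, expected in golden.items():
        actual = agent_answers.get(case_id, "")
        score = SequenceMatcher(None, expected.lower(), actual.lower()).ratio()
        if score < threshold:
            failures.append(case_id)
    return failures
```

Running this gate on every prompt or model change is what catches the silent regressions that unit tests on tools and routing logic cannot see.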
Pre-Production Deployment Checklist
Before bringing an agent to production, verify every item on this checklist:
Security audit completed: API keys protected, input sanitized, output filtered
Load testing performed: system handles expected load with 50% margin
Monitoring configured: metrics, dashboards, and alerting operational
Logging active: structured logs with correlation IDs, centralized aggregation
Health checks implemented: liveness, readiness, and startup probes functional
Scaling configured: HPA with custom metrics, appropriate min/max limits
Rollback plan documented: tested procedure to revert to previous version
Rate limiting active: protection against traffic bursts and infinite loops
Budget alerts configured: LLM API spending thresholds with notifications
Disaster recovery tested: data backups, verified recovery procedure
Deploying AI agents to production requires a rigorous engineering approach that goes far
beyond simply packaging code in a container. The infrastructure must handle the specificities
of agentic workloads: variable latency, unpredictable resource consumption, persistent
conversational state, and dependence on external services.
A robust deployment rests on four pillars: optimized containerization with multi-stage Docker builds, Kubernetes orchestration with probes and scaling configured for agentic workloads, complete observability with custom Prometheus metrics and distributed tracing, and state persistence through Redis and PostgreSQL.
The gap between prototype and production is bridged by investing in the platform: an agent
with excellent monitoring and automatic scaling is a business asset. An agent without
observability is an operational risk. The pre-production checklist presented in this article
represents the bare minimum for a responsible go-live.
In the next article, "FinOps & Cost Optimization for AI Agents", we will
tackle the other critical aspect of production: cost control. We will analyze token economics,
model routing strategies to reduce spending by 60-80%, and prompt engineering techniques
focused on savings.