AI Observability: Monitoring LLMs, Tokens, and Agents
With the massive adoption of Large Language Models (LLMs) in production, a new observability domain has emerged: monitoring AI-based applications. LLM calls have unique characteristics compared to traditional APIs: variable costs driven by token usage, high and unpredictable latencies, non-deterministic outputs, and hallucination risks. AI observability applies classical observability principles to this new paradigm.
In this article, we will explore how to instrument language model calls, trace AI agent behavior, monitor costs in real time, and detect anomalies in model responses using OpenTelemetry and emerging semantic conventions for AI.
What You Will Learn in This Article
- Why AI applications require specialized observability
- Tracing LLM calls with specific spans and attributes
- Monitoring token usage and costs in real time
- Observing AI agent behavior (tool calls, reasoning)
- Detecting hallucinations and quality degradation
- Frameworks and tools for AI observability
Why AI Applications Require Specialized Observability
LLM-based applications have characteristics that make them fundamentally different from traditional applications from an observability perspective:
Unique AI Observability Challenges
| Challenge | Traditional Application | AI/LLM Application |
|---|---|---|
| Costs | Fixed (compute, storage) | Variable (per token, per request) |
| Latency | Predictable (ms range) | High and variable (1-60s for streaming) |
| Output | Deterministic | Non-deterministic (temperature, sampling) |
| Errors | Clear status codes | Hallucination, incoherent responses |
| Testing | Deterministic unit tests | Qualitative evaluation (eval) |
Instrumenting LLM Calls
Instrumenting language model calls follows the same OTel pattern for external APIs, but with specific attributes to capture AI metadata: model used, token count, temperature, estimated cost.
```python
from opentelemetry import trace, metrics
from opentelemetry.trace import StatusCode
from openai import OpenAI, RateLimitError
import time
import tiktoken

tracer = trace.get_tracer("ai-service", "1.0.0")
meter = metrics.get_meter("ai-service", "1.0.0")

openai_client = OpenAI()

# LLM-specific metrics
llm_token_usage = meter.create_counter(
    name="llm.token.usage",
    description="Tokens used in LLM calls",
    unit="token"
)
llm_request_duration = meter.create_histogram(
    name="llm.request.duration",
    description="LLM call duration",
    unit="ms"
)
llm_cost = meter.create_counter(
    name="llm.cost.total",
    description="Estimated LLM call cost",
    unit="USD"
)

def call_llm(messages, model="gpt-4", temperature=0.7, max_tokens=1000):
    with tracer.start_as_current_span(
        "llm.chat.completion",
        attributes={
            # Semantic conventions for AI (draft)
            "gen_ai.system": "openai",
            "gen_ai.request.model": model,
            "gen_ai.request.temperature": temperature,
            "gen_ai.request.max_tokens": max_tokens,
            "gen_ai.request.top_p": 1.0,
            # Application context
            "llm.prompt.messages_count": len(messages),
            "llm.prompt.system_prompt_length": len(messages[0]["content"])
                if messages[0]["role"] == "system" else 0,
        }
    ) as span:
        start_time = time.monotonic()
        try:
            # Input token counting (pre-call)
            encoding = tiktoken.encoding_for_model(model)
            input_tokens = sum(
                len(encoding.encode(m["content"])) for m in messages
            )
            span.set_attribute("gen_ai.usage.prompt_tokens", input_tokens)

            # Model call
            response = openai_client.chat.completions.create(
                model=model,
                messages=messages,
                temperature=temperature,
                max_tokens=max_tokens
            )

            # Response attributes
            output_tokens = response.usage.completion_tokens
            total_tokens = response.usage.total_tokens
            span.set_attribute("gen_ai.usage.completion_tokens", output_tokens)
            span.set_attribute("gen_ai.usage.total_tokens", total_tokens)
            span.set_attribute("gen_ai.response.model", response.model)
            span.set_attribute("gen_ai.response.finish_reason",
                               response.choices[0].finish_reason)

            # Estimated cost calculation
            cost = estimate_cost(model, input_tokens, output_tokens)
            span.set_attribute("llm.cost.estimated_usd", cost)

            # Record metrics
            duration_ms = (time.monotonic() - start_time) * 1000
            common_attrs = {
                "gen_ai.system": "openai",
                "gen_ai.request.model": model
            }
            llm_token_usage.add(input_tokens,
                                {**common_attrs, "gen_ai.token.type": "input"})
            llm_token_usage.add(output_tokens,
                                {**common_attrs, "gen_ai.token.type": "output"})
            llm_request_duration.record(duration_ms, common_attrs)
            llm_cost.add(cost, common_attrs)

            span.set_status(StatusCode.OK)
            return response
        except RateLimitError as e:
            span.record_exception(e)
            span.set_status(StatusCode.ERROR, "Rate limit exceeded")
            span.set_attribute("llm.error.type", "rate_limit")
            raise
        except Exception as e:
            span.record_exception(e)
            span.set_status(StatusCode.ERROR, str(e))
            raise
```
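The `estimate_cost` helper used above is a simple pricing lookup. A minimal sketch follows; the per-1K-token prices are placeholders (real prices change frequently and vary by provider), so treat the table as configuration to keep up to date rather than authoritative numbers:

```python
# Hypothetical per-1K-token prices in USD -- placeholders, not current prices.
MODEL_PRICING = {
    "gpt-4":         {"input": 0.03,   "output": 0.06},
    "gpt-4o":        {"input": 0.0025, "output": 0.01},
    "gpt-3.5-turbo": {"input": 0.0005, "output": 0.0015},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a call from its token counts."""
    pricing = MODEL_PRICING.get(model)
    if pricing is None:
        return 0.0  # unknown model: record zero rather than guessing
    return (
        (input_tokens / 1000) * pricing["input"]
        + (output_tokens / 1000) * pricing["output"]
    )
```

Returning `0.0` for unknown models keeps the metric pipeline working when a new model is deployed before its pricing entry exists; an alternative is to raise, forcing the pricing table to stay complete.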
Tracing AI Agents
AI agents are systems that use LLMs to reason, plan, and invoke tools (tool calls) to complete complex tasks. Their behavior is particularly difficult to observe because it involves multiple reasoning cycles, tool selection decisions, and result composition.
```python
def trace_agent_execution(task, agent):
    with tracer.start_as_current_span(
        "agent.execute",
        attributes={
            "agent.name": agent.name,
            "agent.model": agent.model,
            "agent.task": task.description,
            "agent.max_iterations": agent.max_iterations,
            "agent.tools_available": ",".join(agent.tool_names)
        }
    ) as agent_span:
        iteration = 0
        total_tokens = 0
        total_cost = 0.0
        tools_called = []

        while not agent.is_done() and iteration < agent.max_iterations:
            iteration += 1
            # One span per reasoning-loop iteration; keep the span name
            # static and put the counter in an attribute to avoid
            # high-cardinality span names
            with tracer.start_as_current_span(
                "agent.iteration",
                attributes={
                    "agent.iteration": iteration,
                    "agent.state": agent.current_state
                }
            ) as iter_span:
                # Span for the LLM reasoning call
                with tracer.start_as_current_span("agent.reasoning") as reason_span:
                    decision = agent.reason(task)
                    reason_span.set_attribute("agent.decision.type",
                                              decision.type)  # "tool_call" | "final_answer"
                    reason_span.set_attribute("agent.decision.confidence",
                                              decision.confidence)
                    total_tokens += decision.tokens_used
                    # Assumes the agent reports per-step cost, so the
                    # final cost attribute is meaningful
                    total_cost += decision.cost_usd

                # If the agent decides to call a tool
                if decision.type == "tool_call":
                    with tracer.start_as_current_span(
                        "agent.tool_call",
                        attributes={
                            "agent.tool.name": decision.tool_name,
                            "agent.tool.input_summary": decision.tool_input[:200]
                        }
                    ) as tool_span:
                        result = agent.execute_tool(decision)
                        tool_span.set_attribute("agent.tool.success",
                                                result.success)
                        tool_span.set_attribute("agent.tool.output_length",
                                                len(str(result.output)))
                        tools_called.append(decision.tool_name)

        # Final agent attributes
        agent_span.set_attribute("agent.iterations_total", iteration)
        agent_span.set_attribute("agent.tokens_total", total_tokens)
        agent_span.set_attribute("agent.cost_total_usd", total_cost)
        agent_span.set_attribute("agent.tools_called",
                                 ",".join(tools_called))
        agent_span.set_attribute("agent.success", agent.is_done())
```
Real-Time Cost Monitoring
LLM call costs can scale rapidly, especially with models like GPT-4 or Claude. Real-time cost monitoring allows detecting anomalies (for example, an agent stuck in a loop consuming tokens), setting budget alerts, and optimizing model usage.
```yaml
# Prometheus alert rules for AI cost monitoring
groups:
  - name: ai-cost-alerts
    rules:
      # Alert: hourly spending above budget
      - alert: HighAICostPerHour
        expr: |
          sum(rate(llm_cost_total[1h])) * 3600 > 50
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "AI cost exceeding $50/hour"

      # Alert: high LLM error rate
      - alert: HighLLMErrorRate
        expr: |
          rate(llm_request_duration_count{status="error"}[5m])
            / rate(llm_request_duration_count[5m]) > 0.1
        for: 3m
        labels:
          severity: critical
        annotations:
          summary: "LLM error rate above 10%"

      # Alert: degraded LLM latency
      - alert: HighLLMLatency
        expr: |
          histogram_quantile(0.95,
            rate(llm_request_duration_bucket[5m])
          ) > 30000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "LLM P95 latency above 30 seconds"

      # Alert: anomalous token usage
      - alert: AnomalousTokenUsage
        expr: |
          rate(llm_token_usage[5m]) > 3 * avg_over_time(rate(llm_token_usage[1h])[24h:1h])
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Token usage 3x above 24h average"
```
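Alerts react after the fact; a complement is an in-process guard that refuses new calls once a sliding-window budget is exhausted. Here is a minimal sketch: the class name, the one-hour window, and the $50 budget mirror the alert above but are illustrative choices, not a standard API.

```python
import time
from collections import deque

class CostBudgetGuard:
    """Reject new LLM calls once the estimated spend in a sliding
    time window exceeds a budget. Purely illustrative; a production
    version would need to be thread-safe and shared across workers."""

    def __init__(self, budget_usd, window_seconds=3600):
        self.budget_usd = budget_usd
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, cost_usd) pairs

    def record(self, cost_usd, now=None):
        """Record the cost of a completed call."""
        now = time.monotonic() if now is None else now
        self._events.append((now, cost_usd))

    def current_spend(self, now=None):
        """Total spend inside the window, evicting expired entries."""
        now = time.monotonic() if now is None else now
        while self._events and self._events[0][0] < now - self.window_seconds:
            self._events.popleft()
        return sum(cost for _, cost in self._events)

    def allow(self, now=None):
        """True if a new call still fits within the budget."""
        return self.current_spend(now) < self.budget_usd
```

Calling `guard.allow()` before each `call_llm` invocation (and `guard.record(cost)` after it) gives a hard stop that a runaway agent loop cannot outspend, whereas the Prometheus alert only pages a human.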
Key Metrics for AI Observability
- Token usage: input/output tokens per model, per endpoint, per user
- Cost per request: estimated cost based on model and tokens used
- TTFT latency: Time To First Token, critical for streaming applications
- Error rate: rate limits, timeouts, model errors
- Finish reason: distribution of stop, max_tokens, tool_call, content_filter
- Agent iterations: number of reasoning cycles per completed task
- Tool call success rate: percentage of successful vs failed tool calls
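TTFT in particular requires instrumenting the streaming path, since the overall request duration hides how long the user waited for the first visible output. A minimal sketch, assuming `stream` is any iterable of text chunks (such as the deltas of a streaming chat completion); the helper name and return shape are illustrative:

```python
import time

def measure_ttft(stream):
    """Consume a streaming response, returning (full_text, ttft_seconds).

    Records the time from the start of consumption to the arrival of
    the first chunk; ttft is None if the stream yields nothing.
    """
    start = time.monotonic()
    ttft = None
    chunks = []
    for chunk in stream:
        if ttft is None:
            ttft = time.monotonic() - start  # first token arrived
        chunks.append(chunk)
    return "".join(chunks), ttft
```

The resulting `ttft` value would typically be recorded in a dedicated histogram (e.g. `llm.time_to_first_token`) alongside the total duration metric shown earlier.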
Hallucination and Quality Detection
Response quality monitoring is a unique aspect of AI observability. While technical errors (timeouts, rate limits) are easy to detect, hallucinations and low-quality responses require proxy metrics and automatic evaluation.
Proxy Signals for Response Quality
- Response length: responses too short or too long compared to the average may indicate quality problems.
- Anomalous finish reason: a high rate of max_tokens indicates truncated responses; a high rate of content_filter indicates blocked content.
- User feedback: track user actions after the response (retry, thumbs down, abandonment) as indirect quality signals.
- Similarity score: compare the response with reference responses using embeddings to detect quality drift.
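The response-length signal can be implemented as a simple z-score check against a recent baseline. A minimal sketch (the function name and threshold are illustrative, and this catches empty, truncated, or runaway responses rather than hallucinations per se):

```python
from statistics import mean, stdev

def length_anomaly(baseline_lengths, new_length, z_threshold=3.0):
    """Flag a response whose length deviates strongly from the
    recent baseline of observed response lengths (in characters
    or tokens -- any consistent unit works)."""
    if len(baseline_lengths) < 2:
        return False  # not enough history to form a baseline
    mu = mean(baseline_lengths)
    sigma = stdev(baseline_lengths)
    if sigma == 0:
        return new_length != mu  # identical history: any deviation is anomalous
    return abs(new_length - mu) / sigma > z_threshold
```

In practice the baseline would be maintained per endpoint or per prompt template, since "normal" length differs widely between, say, a summarization route and a classification route.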
Frameworks for AI Observability
Several frameworks are emerging to simplify AI observability, offering automatic instrumentation for major AI libraries and preconfigured dashboards. Among the most relevant: OpenLLMetry (OTel-based for LLM), LangSmith (for LangChain), Helicone (OpenAI proxy), and OTel Semantic Conventions for GenAI (emerging standard).
Conclusions and Next Steps
AI observability is a rapidly evolving field that extends classical observability principles to LLM-based applications. The unique challenges (variable costs, non-deterministic output, hallucination) require specialized metrics and monitoring patterns.
The three pillars of AI observability are: cost monitoring (token usage, cost per request), performance monitoring (latency, TTFT, error rate), and quality monitoring (finish reason, user feedback, similarity scores).
In the next and final article of the series, we will present a complete case study: an end-to-end observability implementation for a microservices architecture, with before and after metrics from the OpenTelemetry adoption.