Cześć! Jestem

Federico Calò

Sviluppatore Software | Divulgatore Tecnico

Creo applicazioni web moderne e strumenti digitali personalizzati per aiutare le attività a crescere attraverso l'innovazione tecnologica. La mia passione è unire informatica ed economia per generare valore reale.

Skontaktuj się

O Mnie

La mia passione per l'informatica è nata tra i banchi dell'Istituto Tecnico Commerciale di Maglie, dove ho scoperto il potere della programmazione e il fascino di creare soluzioni digitali. Fin da subito, ho capito che l'informatica non era solo codice, ma uno strumento straordinario per trasformare idee in realtà.

Durante gli studi superiori in Sistemi Informativi Aziendali, ho iniziato a intrecciare informatica ed economia, comprendendo come la tecnologia possa essere il motore della crescita per qualsiasi attività. Questa visione mi ha accompagnato all'Università degli Studi di Bari, dove ho conseguito la Laurea in Informatica, approfondendo le mie competenze tecniche e la mia passione per lo sviluppo software.

Oggi metto questa esperienza al servizio di imprese, professionisti e startup, creando soluzioni digitali su misura che automatizzano processi, ottimizzano risorse e aprono nuove opportunità di business. Perché la vera innovazione inizia quando la tecnologia incontra le esigenze reali delle persone.

Moje Umiejętności

Analisi Dati & Modelli Previsionali

Trasformo i dati in insights strategici con analisi approfondite e modelli predittivi per decisioni informate

Automatyzacja Procesów

Creo strumenti personalizzati che automatizzano operazioni ripetitive e liberano tempo per attività a valore aggiunto

Systemy Na Zamówienie

Sviluppo sistemi software su misura, dalle integrazioni tra piattaforme alle dashboard personalizzate

const federico = {
  nome: "Federico Calò",
  ruolo: "Sviluppatore Software",
  città: "Bari, Italia",
  missione: "Aiutare attraverso l'informatica",
  passioni: [
    "Codice Pulito",
    "Innovazione",
    "Crescita Continua"
  ]
};

Moja Misja

Credo fermamente che l'informatica sia lo strumento più potente per trasformare le idee in realtà e migliorare la vita delle persone.

Demokratyzacja Technologii

La mia missione è rendere l'informatica accessibile a tutti: dalle piccole imprese locali alle startup innovative, fino ai professionisti che vogliono digitalizzare la propria attività. Ogni realtà merita di sfruttare le potenzialità del digitale.

Łączenie IT i Biznesu

Non è solo questione di scrivere codice: è capire come la tecnologia possa generare valore reale. Intrecciando competenze informatiche e visione economica, aiuto le attività a crescere, ottimizzare processi e raggiungere nuovi traguardi di efficienza e redditività.

Tworzenie Rozwiązań na Miarę

Ogni attività è unica, e così devono esserlo le soluzioni. Sviluppo strumenti personalizzati che rispondono alle esigenze specifiche di ciascun cliente, automatizzando processi ripetitivi e liberando tempo per ciò che conta davvero: far crescere il business.

Przekształć Swój Biznes z Technologią

Che tu gestisca un negozio, uno studio professionale o un'azienda, posso aiutarti a sfruttare le potenzialità dell'informatica per lavorare meglio, più velocemente e in modo più intelligente.

Porozmawiajmy →

Unisciti alla Community

Entra nella community di sviluppatori dove discutiamo di software, AI, architettura e DevOps. Condividi idee, fai domande e cresci insieme a noi.

Canale

FC Dev Blog

Ricevi notifiche su nuovi articoli, serie complete, tips settimanali e tool in evidenza. Contenuti bilingui IT/EN direttamente nel tuo Telegram.

Nuovi articoli appena pubblicati
Tips e code snippets settimanali
Sondaggi sugli argomenti futuri

Iscriviti al Canale

Gruppo

FC Dev Community

Una community bilingue IT/EN per sviluppatori. Discussioni, Q&A, aiuto reciproco e networking con altri professionisti del settore.

Discussioni su articoli e tecnologie
Help coding e code review
Opportunità di lavoro e collaborazione

Unisciti al Gruppo

Topic di Discussione

Visualizza

Master SQL

RoadMap.sh

Novembre 2024

Visualizza

Oracle Certified Foundations Associate

Oracle

Ottobre 2024

Visualizza

People Leadership Credential

Connect

Settembre 2024

Linguaggi & Tecnologie

Java

Python

JavaScript

Angular

React

TypeScript

SQL

PHP

CSS/SCSS

Node.js

Docker

Git

💼

12/2024 - Presente

Custom Software Engineering Analyst

Accenture

Bari, Puglia, Italia · Ibrida Analisi e sviluppo di sistemi informatici attraverso l'utilizzo di Java e Quarkus in Health and Public Sector. Formazione continua su tecnologie moderne per la creazione di soluzioni software personalizzate ed efficienti e sugli agenti.

💼

06/2022 - 12/2024

Analista software e Back End Developer Associate Consultant

Links Management and Technology SpA

Esperienza nell'analisi di sistemi software as-is e flussi ETL utilizzando PowerCenter. Formazione completata su Spring Boot per lo sviluppo di applicazioni backend moderne e scalabili. Sviluppatore Backend specializzato in Spring Boot, con esperienza in progettazione di database, analisi, sviluppo e testing dei task assegnati.

💼

02/2021 - 10/2021

Programmatore software

Adesso.it (prima era WebScience srl)

Esperienza nell'analisi AS-IS e TO-BE, evoluzioni SEO ed evoluzioni website per migliorare le performance e l'engagement degli utenti.

🎓

2018 - 2025

Laurea in Informatica

Università degli Studi di Bari Aldo Moro

Bachelor's degree in Computer Science, focusing on software engineering, algorithms, and modern development practices.

📚

2013 - 2018

Diploma - Sistemi Informativi Aziendali

Istituto Tecnico Commerciale di Maglie

Technical diploma specializing in Business Information Systems, combining IT knowledge with business management.

Skontaktuj się

Masz projekt? Porozmawiajmy! Wypełnij formularz, a odpowiem wkrótce.

* Campi obbligatori. I tuoi dati saranno utilizzati solo per rispondere alla tua richiesta.

RAG z PostgreSQL: od dokumentu do odpowiedzi

Czy kiedykolwiek chciałeś, aby Twój system AI mógł odpowiadać na pytania w oparciu o dokumenty? specyficzne dla Twojej firmy, bez konieczności szkolenia niestandardowego modelu? Odpowiedź tak zadzwoń Generacja wzmocniona odzyskiwaniem (RAG)i jest to jedna z najpiękniejszych architektur potężne i praktyczne technologie nowoczesnej sztucznej inteligencji. A PostgreSQL z pgvector jest jednym z najlepszych narzędzi aby to wdrożyć.

RAG łączy w sobie dwie uzupełniające się możliwości: wyszukiwanie semantyczne (znajdź I dokumenty najbardziej istotne dla wniosku) z rozszerzeniem generacja języka naturalnego (udziel spójnej odpowiedzi na podstawie tych dokumentów). The result is a system that responds z aktualną wiedzą o Twoich danych, a nie z ogólną znajomością wcześniej wytrenowanego modelu.

W tym artykule zbudujemy kompletny, kompleksowy potok RAG: od przetwarzania dokumentów na zapytanie z odpowiedzią wygenerowaną przez GPT-4, wszystko na PostgreSQL. Brak dodatkowych baz danych, brak usług zewnętrznych dla sklepu wektorowego.

Przegląd serii

#	Przedmiot	Centrum
1	pgwektor	Instalacja, operatorzy, indeksowanie
2	Osadzanie w głębi	Modele, odległości, generacja
3	Jesteś tutaj - RAG z PostgreSQL	Rurociąg RAG od końca do końca
4	Wyszukiwanie podobieństw	Algorytmy i optymalizacja
5	HNSW i IVFFlat	Zaawansowane strategie indeksowania
6	RAG w produkcji	Skalowalność i wydajność

Czego się nauczysz

Kompletna architektura systemu RAG: komponenty i przepływ danych
Potok pozyskiwania dokumentów: ładowanie, analizowanie, fragmentowanie
Strategia przechowywania w PostgreSQL z pgvector
Pobieranie: od zapytania do wyboru najbardziej odpowiednich fragmentów
Generacja: jak zbudować zachętę i zintegrować z GPT-4
Wyszukiwanie hybrydowe: połączenie wyszukiwania wektorowego i wyszukiwania pełnotekstowego PostgreSQL
Ocena jakości RAG: mierniki i narzędzia

Architektura RAG: jak to działa

System RAG składa się z dwóch głównych faz, które działają w różnym czasie:

Faza 1: Przetwarzanie (offline)

Dzieje się to jednorazowo (lub okresowo w miarę zmiany dokumentów). Proces jest następujący:

Obciążenie: Załaduj dokumenty z systemu plików, adresu URL, bazy danych, API
Analizować: Wyodrębnij tekst z plików PDF, DOCX, HTML, Markdown
Kawałki: Podziel tekst na fragmenty o optymalnej wielkości
Osadzać: Wygeneruj wektor osadzania dla każdego fragmentu
Sklep: Zapisz fragment + osadzenie + metadane w PostgreSQL

Faza 2: Pobieranie + Generowanie (online, dla każdego zapytania)

Zapytania: Użytkownik zadaje pytanie w języku naturalnym
Osadź zapytania: Przekształć pytanie w wektor, korzystając z tego samego modelu
Szukaj: Znajdź k najbardziej podobnych fragmentów w PostgreSQL
Kontekst: Złóż znalezione fragmenty jako kontekst
Spowodować: Wyślij pytanie + kontekst do LLM, aby uzyskać odpowiedź

## Flusso RAG Visualizzato

INGESTION (offline):
Documento PDF
    |
    v
[Parser] -> Testo grezzo
    |
    v
[Chunker] -> ["chunk 1", "chunk 2", ..., "chunk N"]
    |
    v
[Embedding Model] -> [[0.023, -0.841, ...], [0.891, 0.234, ...], ...]
    |
    v
[PostgreSQL + pgvector] -> Memorizzazione permanente

QUERY (online):
Domanda utente: "Come funziona l'indicizzazione HNSW?"
    |
    v
[Embedding Model] -> [0.045, -0.823, ...]  (query vector)
    |
    v
[PostgreSQL ANN Search] -> Top 5 chunk più simili
    |
    v
[Prompt Builder] -> "Usa questo contesto: [chunk1, chunk2, ...] Domanda: ..."
    |
    v
[GPT-4 / Claude] -> "L'indicizzazione HNSW (Hierarchical Navigable Small World) ..."
    |
    v
Risposta all'utente

Konfiguracja projektu

Uzależnienia

# requirements.txt
openai>=1.12.0
psycopg2-binary>=2.9.9
langchain>=0.1.0
langchain-openai>=0.0.5
langchain-community>=0.0.20
pypdf>=3.17.0
python-dotenv>=1.0.0
tiktoken>=0.5.0

# Installazione
pip install -r requirements.txt

Konfiguracja bazy danych

-- Setup iniziale PostgreSQL
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_trgm;  -- per full-text search

-- Schema completo per RAG
CREATE TABLE IF NOT EXISTS rag_documents (
    id              BIGSERIAL PRIMARY KEY,
    -- Informazioni sorgente
    source_path     TEXT NOT NULL,
    source_type     TEXT NOT NULL CHECK (source_type IN ('pdf', 'txt', 'md', 'html', 'docx')),
    source_hash     TEXT NOT NULL,          -- hash MD5 del file originale
    -- Chunk info
    chunk_index     INTEGER NOT NULL,
    chunk_total     INTEGER,
    -- Contenuto
    title           TEXT,
    content         TEXT NOT NULL,
    content_length  INTEGER GENERATED ALWAYS AS (length(content)) STORED,
    -- Embedding
    embedding_model TEXT NOT NULL DEFAULT 'text-embedding-3-small',
    embedding       vector(1536),
    -- Metadata
    metadata        JSONB DEFAULT '{}',
    tags            TEXT[] DEFAULT '{}',
    -- Timestamps
    ingested_at     TIMESTAMPTZ DEFAULT NOW(),
    updated_at      TIMESTAMPTZ DEFAULT NOW(),
    UNIQUE (source_path, chunk_index, source_hash)
);

-- Indice HNSW per vector search veloce
CREATE INDEX idx_rag_embedding_hnsw
ON rag_documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Indice GIN per full-text search
CREATE INDEX idx_rag_content_fts
ON rag_documents
USING gin (to_tsvector('english', content));

-- Indice per filtri comuni
CREATE INDEX idx_rag_source_type ON rag_documents (source_type);
CREATE INDEX idx_rag_tags ON rag_documents USING gin (tags);
CREATE INDEX idx_rag_metadata ON rag_documents USING gin (metadata);

Potok pozyskiwania dokumentów

Struktura projektu w Pythonie

rag_system/
├── config.py          # Configurazione DB, API keys, parametri
├── ingestion/
│   ├── __init__.py
│   ├── loaders.py     # Caricamento documenti da varie sorgenti
│   ├── parsers.py     # Parsing PDF, DOCX, HTML, Markdown
│   ├── chunkers.py    # Strategie di chunking
│   └── pipeline.py    # Pipeline ingestion orchestrator
├── retrieval/
│   ├── __init__.py
│   ├── embedder.py    # Generazione embeddings
│   └── searcher.py    # Vector search e hybrid search
├── generation/
│   ├── __init__.py
│   ├── prompts.py     # Template prompts
│   └── generator.py  # Integrazione LLM
├── rag.py             # Classe principale RAGSystem
└── main.py            # Entry point

config.py

import os
from dataclasses import dataclass
from dotenv import load_dotenv

load_dotenv()

@dataclass
class Config:
    # Database
    db_host: str = os.getenv("DB_HOST", "localhost")
    db_port: int = int(os.getenv("DB_PORT", "5432"))
    db_name: str = os.getenv("DB_NAME", "ragdb")
    db_user: str = os.getenv("DB_USER", "postgres")
    db_password: str = os.getenv("DB_PASSWORD", "")

    # OpenAI
    openai_api_key: str = os.getenv("OPENAI_API_KEY", "")
    embedding_model: str = "text-embedding-3-small"
    embedding_dim: int = 1536
    chat_model: str = "gpt-4o-mini"  # cost-effective default

    # Chunking
    chunk_size: int = 800
    chunk_overlap: int = 150
    min_chunk_size: int = 100

    # Retrieval
    top_k: int = 5
    similarity_threshold: float = 0.65  # minimum cosine similarity

    # Generation
    max_context_tokens: int = 8000
    temperature: float = 0.1  # low temperature for factual answers

    def get_db_url(self) -> str:
        return f"postgresql://{self.db_user}:{self.db_password}@{self.db_host}:{self.db_port}/{self.db_name}"

config = Config()

ingestion/loaders.py — Ładowanie z wielu źródeł

import hashlib
from pathlib import Path
from dataclasses import dataclass
from typing import Optional
import requests
from bs4 import BeautifulSoup

@dataclass
class RawDocument:
    content: str
    source_path: str
    source_type: str
    source_hash: str
    title: Optional[str] = None
    metadata: dict = None

    def __post_init__(self):
        if self.metadata is None:
            self.metadata = {}

def load_text_file(path: str) -> RawDocument:
    p = Path(path)
    content = p.read_text(encoding="utf-8")
    return RawDocument(
        content=content,
        source_path=path,
        source_type="txt",
        source_hash=hashlib.md5(content.encode()).hexdigest(),
        title=p.stem
    )

def load_markdown_file(path: str) -> RawDocument:
    p = Path(path)
    content = p.read_text(encoding="utf-8")
    # Estrai titolo dal frontmatter o dalla prima riga H1
    title = None
    for line in content.split("\n"):
        if line.startswith("# "):
            title = line[2:].strip()
            break
    return RawDocument(
        content=content,
        source_path=path,
        source_type="md",
        source_hash=hashlib.md5(content.encode()).hexdigest(),
        title=title
    )

def load_pdf_file(path: str) -> RawDocument:
    from pypdf import PdfReader
    reader = PdfReader(path)
    pages = []
    for page in reader.pages:
        pages.append(page.extract_text())
    content = "\n\n".join(pages)
    return RawDocument(
        content=content,
        source_path=path,
        source_type="pdf",
        source_hash=hashlib.md5(content.encode()).hexdigest(),
        title=Path(path).stem,
        metadata={"pages": len(reader.pages)}
    )

def load_url(url: str) -> RawDocument:
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Rimuovi script, style, nav
    for tag in soup(["script", "style", "nav", "header", "footer"]):
        tag.decompose()
    content = soup.get_text(separator="\n", strip=True)
    title = soup.title.string if soup.title else url
    return RawDocument(
        content=content,
        source_path=url,
        source_type="html",
        source_hash=hashlib.md5(content.encode()).hexdigest(),
        title=title
    )

def load_document(source: str) -> RawDocument:
    """Smart loader che sceglie il parser corretto."""
    if source.startswith("http"):
        return load_url(source)
    p = Path(source)
    loaders = {
        ".txt":  load_text_file,
        ".md":   load_markdown_file,
        ".pdf":  load_pdf_file,
    }
    loader = loaders.get(p.suffix.lower())
    if not loader:
        raise ValueError(f"Tipo file non supportato: {p.suffix}")
    return loader(source)

ingestion/chunkers.py — Inteligentne porcjowanie

from langchain.text_splitter import RecursiveCharacterTextSplitter
from dataclasses import dataclass
from typing import Optional

@dataclass
class TextChunk:
    content: str
    chunk_index: int
    source_path: str
    source_type: str
    source_hash: str
    title: Optional[str] = None
    metadata: dict = None

    def __post_init__(self):
        if self.metadata is None:
            self.metadata = {}

class SmartChunker:
    """
    Chunker che adatta la strategia al tipo di documento.
    """
    def __init__(self, chunk_size: int = 800, chunk_overlap: int = 150):
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap

        # Separatori per testo generico
        self._text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=chunk_size,
            chunk_overlap=chunk_overlap,
            separators=["\n\n", "\n", ". ", "! ", "? ", "; ", ", ", " "],
            length_function=len
        )

        # Separatori per markdown (rispetta la struttura)
        self._md_splitter = RecursiveCharacterTextSplitter(
            chunk_size=chunk_size,
            chunk_overlap=chunk_overlap,
            separators=["## ", "# ", "\n\n", "\n", ". "],
            length_function=len
        )

    def chunk(self, doc) -> list[TextChunk]:
        """Chunk un documento, scegliendo la strategia giusta."""
        if doc.source_type == "md":
            raw_chunks = self._md_splitter.split_text(doc.content)
        else:
            raw_chunks = self._text_splitter.split_text(doc.content)

        # Filtra chunk troppo piccoli
        raw_chunks = [c for c in raw_chunks if len(c.strip()) > 100]

        return [
            TextChunk(
                content=chunk.strip(),
                chunk_index=i,
                source_path=doc.source_path,
                source_type=doc.source_type,
                source_hash=doc.source_hash,
                title=doc.title,
                metadata={
                    **doc.metadata,
                    "chunk_total": len(raw_chunks),
                    "char_count": len(chunk)
                }
            )
            for i, chunk in enumerate(raw_chunks)
        ]

ingestion/pipeline.py — główny koordynator

import psycopg2
from psycopg2.extras import execute_values
import json
import time
from .loaders import load_document
from .chunkers import SmartChunker

class IngestionPipeline:
    def __init__(self, config, embedder):
        self.config = config
        self.embedder = embedder
        self.chunker = SmartChunker(
            chunk_size=config.chunk_size,
            chunk_overlap=config.chunk_overlap
        )
        self.conn = psycopg2.connect(config.get_db_url())

    def is_already_ingested(self, source_path: str, source_hash: str) -> bool:
        """Controlla se il documento e già nel DB con lo stesso hash (non cambiato)."""
        with self.conn.cursor() as cur:
            cur.execute(
                "SELECT COUNT(*) FROM rag_documents WHERE source_path = %s AND source_hash = %s",
                (source_path, source_hash)
            )
            return cur.fetchone()[0] > 0

    def ingest(self, source: str, tags: list[str] = None, force: bool = False) -> dict:
        """
        Processa un documento e lo inserisce in PostgreSQL.
        Ritorna statistiche sull'operazione.
        """
        tags = tags or []
        start_time = time.time()

        # 1. Carica documento
        doc = load_document(source)
        print(f"Caricato: {source} ({len(doc.content)} chars, hash: {doc.source_hash[:8]})")

        # 2. Controlla se già presente (incrementale update)
        if not force and self.is_already_ingested(source, doc.source_hash):
            print(f"  Saltato: documento non modificato")
            return {"skipped": True, "source": source}

        # 3. Chunking
        chunks = self.chunker.chunk(doc)
        print(f"  Chunking: {len(chunks)} chunk creati")

        # 4. Elimina versione precedente (se esiste)
        with self.conn.cursor() as cur:
            cur.execute("DELETE FROM rag_documents WHERE source_path = %s", (source,))

        # 5. Genera embeddings in batch
        texts = [c.content for c in chunks]
        embeddings = self.embedder.embed_batch(texts)
        print(f"  Embeddings generati: {len(embeddings)} vettori dim {len(embeddings[0])}")

        # 6. Inserisci in PostgreSQL
        rows = [
            (
                c.source_path,
                c.source_type,
                c.source_hash,
                c.chunk_index,
                len(chunks),  # chunk_total
                c.title,
                c.content,
                self.config.embedding_model,
                embeddings[i],
                json.dumps(c.metadata),
                tags
            )
            for i, c in enumerate(chunks)
        ]

        with self.conn.cursor() as cur:
            execute_values(cur, """
                INSERT INTO rag_documents
                    (source_path, source_type, source_hash, chunk_index, chunk_total,
                     title, content, embedding_model, embedding, metadata, tags)
                VALUES %s
                ON CONFLICT (source_path, chunk_index, source_hash) DO UPDATE SET
                    content = EXCLUDED.content,
                    embedding = EXCLUDED.embedding,
                    updated_at = NOW()
            """, rows, template="(%s,%s,%s,%s,%s,%s,%s,%s,%s::vector,%s::jsonb,%s::text[])")
            self.conn.commit()

        elapsed = time.time() - start_time
        stats = {
            "source": source,
            "chunks": len(chunks),
            "embeddings": len(embeddings),
            "elapsed_sec": round(elapsed, 2)
        }
        print(f"  Completato in {elapsed:.1f}s - {stats}")
        return stats

    def ingest_directory(self, directory: str, extensions: list[str] = None) -> list[dict]:
        """Ingesta tutti i documenti in una directory."""
        from pathlib import Path
        extensions = extensions or [".txt", ".md", ".pdf"]
        results = []
        for path in Path(directory).rglob("*"):
            if path.suffix.lower() in extensions:
                result = self.ingest(str(path))
                results.append(result)
        return results

Odzyskiwanie: znajdowanie odpowiednich kawałków

Retrieval/searcher.py

import psycopg2
from dataclasses import dataclass
from typing import Optional

@dataclass
class SearchResult:
    id: int
    source_path: str
    source_type: str
    chunk_index: int
    title: Optional[str]
    content: str
    similarity: float
    metadata: dict

class HybridSearcher:
    """
    Combina vector search (semantica) con full-text search (keyword).
    Reciprocal Rank Fusion per merging dei risultati.
    """
    def __init__(self, config, embedder):
        self.config = config
        self.embedder = embedder
        self.conn = psycopg2.connect(config.get_db_url())

    def vector_search(self, query: str, top_k: int = 10,
                      source_type: Optional[str] = None,
                      tags: Optional[list[str]] = None) -> list[SearchResult]:
        """Ricerca semantica con filtri opzionali."""
        query_embedding = self.embedder.embed_single(query)
        threshold = 1 - self.config.similarity_threshold  # converti a cosine distance

        # Costruisci query dinamica con filtri opzionali
        filters = ["embedding <=> %s::vector < %s"]
        params = [query_embedding, threshold]

        if source_type:
            filters.append("source_type = %s")
            params.append(source_type)
        if tags:
            filters.append("tags && %s::text[]")  -- overlap: almeno un tag in comune
            params.append(tags)

        where_clause = " AND ".join(filters)

        with self.conn.cursor() as cur:
            cur.execute(f"""
                SELECT
                    id, source_path, source_type, chunk_index, title, content,
                    1 - (embedding <=> %s::vector) AS similarity,
                    metadata
                FROM rag_documents
                WHERE {where_clause}
                ORDER BY embedding <=> %s::vector
                LIMIT %s
            """, [query_embedding] + params + [query_embedding, top_k])

            rows = cur.fetchall()
            return [
                SearchResult(
                    id=r[0], source_path=r[1], source_type=r[2],
                    chunk_index=r[3], title=r[4], content=r[5],
                    similarity=round(r[6], 4), metadata=r[7]
                )
                for r in rows
            ]

    def fulltext_search(self, query: str, top_k: int = 10) -> list[SearchResult]:
        """Full-text search con ts_rank per ranking."""
        with self.conn.cursor() as cur:
            cur.execute("""
                SELECT
                    id, source_path, source_type, chunk_index, title, content,
                    ts_rank(to_tsvector('english', content),
                            plainto_tsquery('english', %s)) AS rank,
                    metadata
                FROM rag_documents
                WHERE to_tsvector('english', content) @@
                      plainto_tsquery('english', %s)
                ORDER BY rank DESC
                LIMIT %s
            """, (query, query, top_k))

            rows = cur.fetchall()
            return [
                SearchResult(
                    id=r[0], source_path=r[1], source_type=r[2],
                    chunk_index=r[3], title=r[4], content=r[5],
                    similarity=round(float(r[6]), 4), metadata=r[7]
                )
                for r in rows
            ]

    def hybrid_search(self, query: str, top_k: int = 5,
                       vector_weight: float = 0.7) -> list[SearchResult]:
        """
        Reciprocal Rank Fusion (RRF) per combinare vector e full-text search.
        RRF Score = sum(1 / (k + rank)) per ogni lista di risultati.
        """
        k_rrf = 60  # costante RRF standard

        # Ottieni entrambi i risultati
        vector_results = self.vector_search(query, top_k=top_k * 2)
        fts_results = self.fulltext_search(query, top_k=top_k * 2)

        # Calcola RRF scores
        scores = {}
        all_results = {}

        for rank, result in enumerate(vector_results):
            scores[result.id] = scores.get(result.id, 0) + vector_weight / (k_rrf + rank + 1)
            all_results[result.id] = result

        fts_weight = 1 - vector_weight
        for rank, result in enumerate(fts_results):
            scores[result.id] = scores.get(result.id, 0) + fts_weight / (k_rrf + rank + 1)
            all_results[result.id] = result

        # Ordina per RRF score e prendi top_k
        sorted_ids = sorted(scores.keys(), key=lambda x: scores[x], reverse=True)
        final_results = [all_results[id] for id in sorted_ids[:top_k]]

        # Aggiorna similarity con RRF score normalizzato
        max_score = scores[sorted_ids[0]] if sorted_ids else 1
        for result in final_results:
            result.similarity = round(scores[result.id] / max_score, 4)

        return final_results

Generacja: od kontekstu do odpowiedzi

Generation/prompts.py

from string import Template

# System prompt che definisce il comportamento dell'AI
RAG_SYSTEM_PROMPT = """Sei un assistente AI preciso e utile. Rispondi alle domande
basandoti ESCLUSIVAMENTE sui documenti di contesto forniti.

Regole:
1. Usa SOLO le informazioni presenti nel contesto. Non inventare.
2. Se la risposta non e nel contesto, dillo chiaramente.
3. Cita le sorgenti usando [Fonte: nome_file, chunk X] dopo ogni affermazione.
4. Mantieni un tono professionale e conciso.
5. Struttura la risposta in modo chiaro con paragrafi o bullet points se appropriato.
"""

def build_rag_prompt(query: str, context_chunks: list, include_sources: bool = True) -> str:
    """
    Costruisce il prompt per l'LLM con il contesto recuperato.

    Args:
        query: La domanda dell'utente
        context_chunks: Lista di SearchResult
        include_sources: Se includere le informazioni sulla sorgente

    Returns:
        Il prompt formattato per l'LLM
    """
    if not context_chunks:
        return f"Domanda: {query}\n\nNota: Non ho trovato documenti rilevanti nel knowledge base."

    # Costruisci il contesto con numerazione e sorgente
    context_parts = []
    for i, chunk in enumerate(context_chunks, 1):
        source_info = f"[Fonte: {chunk.source_path}, chunk {chunk.chunk_index}]" if include_sources else ""
        context_parts.append(f"--- Documento {i} {source_info} ---\n{chunk.content}")

    context_text = "\n\n".join(context_parts)

    return f"""Contesto dai documenti:
{context_text}

---

Domanda dell'utente: {query}

Rispondi basandoti sul contesto fornito."""

generacja/generator.py

from openai import OpenAI
from dataclasses import dataclass
from typing import Optional
import tiktoken
from .prompts import RAG_SYSTEM_PROMPT, build_rag_prompt

@dataclass
class RAGResponse:
    answer: str
    sources: list[dict]
    model: str
    total_tokens: int
    prompt_tokens: int
    completion_tokens: int

class RAGGenerator:
    def __init__(self, config):
        self.config = config
        self.client = OpenAI(api_key=config.openai_api_key)
        self.tokenizer = tiktoken.encoding_for_model("gpt-4o")

    def count_tokens(self, text: str) -> int:
        return len(self.tokenizer.encode(text))

    def truncate_context(self, chunks: list, max_tokens: int) -> list:
        """
        Tronca il contesto per non superare il limite di token.
        Mantieni i chunk più rilevanti (gia ordinati per similarità).
        """
        selected = []
        used_tokens = 0

        for chunk in chunks:
            chunk_tokens = self.count_tokens(chunk.content)
            if used_tokens + chunk_tokens > max_tokens:
                break
            selected.append(chunk)
            used_tokens += chunk_tokens

        return selected

    def generate(self, query: str, context_chunks: list,
                 stream: bool = False) -> RAGResponse:
        """
        Genera una risposta RAG.

        Args:
            query: La domanda dell'utente
            context_chunks: Chunk recuperati da PostgreSQL
            stream: Se True, usa streaming (non implementato qui per semplicità)
        """
        # Tronca il contesto se necessario
        max_context_tokens = self.config.max_context_tokens
        truncated_chunks = self.truncate_context(context_chunks, max_context_tokens)

        if len(truncated_chunks) < len(context_chunks):
            print(f"  Contesto troncato: {len(context_chunks)} -> {len(truncated_chunks)} chunk")

        # Costruisci il prompt
        user_prompt = build_rag_prompt(query, truncated_chunks)

        # Chiama l'LLM
        response = self.client.chat.completions.create(
            model=self.config.chat_model,
            messages=[
                {"role": "system", "content": RAG_SYSTEM_PROMPT},
                {"role": "user", "content": user_prompt}
            ],
            temperature=self.config.temperature,
            max_tokens=1500
        )

        answer = response.choices[0].message.content
        usage = response.usage

        # Prepara le sorgenti per la risposta
        sources = [
            {
                "source": chunk.source_path,
                "chunk_index": chunk.chunk_index,
                "similarity": chunk.similarity,
                "excerpt": chunk.content[:200] + "..."
            }
            for chunk in truncated_chunks
        ]

        return RAGResponse(
            answer=answer,
            sources=sources,
            model=self.config.chat_model,
            total_tokens=usage.total_tokens,
            prompt_tokens=usage.prompt_tokens,
            completion_tokens=usage.completion_tokens
        )

Kompletny system RAG

rag.py — klasa główna

from config import config, Config
from ingestion.pipeline import IngestionPipeline
from retrieval.searcher import HybridSearcher
from generation.generator import RAGGenerator

class EmbeddingService:
    """Wrapper per generazione embeddings OpenAI."""
    def __init__(self, cfg: Config):
        from openai import OpenAI
        self.client = OpenAI(api_key=cfg.openai_api_key)
        self.model = cfg.embedding_model

    def embed_single(self, text: str) -> list[float]:
        resp = self.client.embeddings.create(
            input=[text.replace("\n", " ")],
            model=self.model
        )
        return resp.data[0].embedding

    def embed_batch(self, texts: list[str]) -> list[list[float]]:
        cleaned = [t.replace("\n", " ").strip() for t in texts]
        resp = self.client.embeddings.create(input=cleaned, model=self.model)
        return [item.embedding for item in resp.data]

class RAGSystem:
    """
    Sistema RAG completo: ingestion + retrieval + generation.
    """
    def __init__(self, cfg: Config = None):
        self.config = cfg or config
        self.embedder = EmbeddingService(self.config)
        self.ingestion = IngestionPipeline(self.config, self.embedder)
        self.searcher = HybridSearcher(self.config, self.embedder)
        self.generator = RAGGenerator(self.config)

    def add_document(self, source: str, tags: list[str] = None) -> dict:
        """Aggiunge un documento al knowledge base."""
        return self.ingestion.ingest(source, tags=tags)

    def add_directory(self, directory: str, extensions: list[str] = None) -> list[dict]:
        """Aggiunge tutti i documenti di una directory."""
        return self.ingestion.ingest_directory(directory, extensions)

    def ask(self, question: str, use_hybrid: bool = True,
            source_type: str = None) -> dict:
        """
        Pone una domanda al sistema RAG.

        Returns:
            dict con answer, sources, usage
        """
        # 1. Retrieval
        if use_hybrid:
            chunks = self.searcher.hybrid_search(question, top_k=self.config.top_k)
        else:
            chunks = self.searcher.vector_search(
                question, top_k=self.config.top_k, source_type=source_type
            )

        if not chunks:
            return {
                "answer": "Non ho trovato informazioni rilevanti per rispondere a questa domanda.",
                "sources": [],
                "retrieval": {"chunks_found": 0}
            }

        # 2. Generation
        response = self.generator.generate(question, chunks)

        return {
            "answer": response.answer,
            "sources": response.sources,
            "retrieval": {
                "chunks_found": len(chunks),
                "top_similarity": chunks[0].similarity if chunks else 0
            },
            "usage": {
                "model": response.model,
                "total_tokens": response.total_tokens
            }
        }

main.py — Użycie systemu

from rag import RAGSystem

# Inizializza il sistema
rag = RAGSystem()

# --- INGESTION ---
print("=== Aggiungendo documenti al knowledge base ===")

# Aggiungi singoli file
rag.add_document("docs/postgresql_guide.pdf", tags=["postgresql", "database"])
rag.add_document("docs/pgvector_tutorial.md", tags=["pgvector", "vector-search"])
rag.add_document("https://www.postgresql.org/docs/current/", tags=["official-docs"])

# Aggiungi una directory intera
stats = rag.add_directory("docs/", extensions=[".md", ".txt", ".pdf"])
print(f"Ingestati {len(stats)} documenti")

# --- QUERY ---
print("\n=== Interrogando il sistema ===")

questions = [
    "Come si installa pgvector su PostgreSQL 16?",
    "Qual e la differenza tra HNSW e IVFFlat?",
    "Come si ottimizza la memoria per il vector search?",
]

for q in questions:
    print(f"\nDomanda: {q}")
    print("-" * 60)
    result = rag.ask(q)
    print(f"Risposta:\n{result['answer']}")
    print(f"\nSorgenti utilizzate ({len(result['sources'])}):")
    for src in result["sources"]:
        print(f"  - {src['source']} [similarità: {src['similarity']}]")
    print(f"\nToken usati: {result['usage']['total_tokens']}")

Wyszukiwanie hybrydowe: PostgreSQL pełnotekstowy + wektor

Jedną z największych zalet PostgreSQL dla RAG jest to, że możesz połączyć wyszukiwanie w jedno zapytanie semantyka (wektorowa) z klasycznym wyszukiwaniem pełnotekstowym. Jest to szczególnie przydatne dla zapytania zawierające precyzyjne terminy techniczne (nazwy własne, akronimy, wersje oprogramowania), które Samo wyszukiwanie semantyczne może nie uchwycić idealnie:

-- Hybrid search in SQL puro: vettore + full-text in una query
WITH vector_search AS (
    SELECT id, content, source_path, chunk_index,
           1 - (embedding <=> %s::vector) AS vector_score,
           ROW_NUMBER() OVER (ORDER BY embedding <=> %s::vector) AS vector_rank
    FROM rag_documents
    ORDER BY embedding <=> %s::vector
    LIMIT 20
),
fts_search AS (
    SELECT id, content, source_path, chunk_index,
           ts_rank(to_tsvector('english', content),
                   plainto_tsquery('english', %s)) AS fts_score,
           ROW_NUMBER() OVER (
               ORDER BY ts_rank(to_tsvector('english', content),
                                plainto_tsquery('english', %s)) DESC
           ) AS fts_rank
    FROM rag_documents
    WHERE to_tsvector('english', content) @@ plainto_tsquery('english', %s)
    LIMIT 20
),
-- Reciprocal Rank Fusion
rrf AS (
    SELECT
        COALESCE(v.id, f.id) AS id,
        COALESCE(v.content, f.content) AS content,
        COALESCE(v.source_path, f.source_path) AS source_path,
        -- RRF score: 0.7 * vector_weight + 0.3 * fts_weight
        COALESCE(0.7 / (60 + v.vector_rank), 0) +
        COALESCE(0.3 / (60 + f.fts_rank), 0) AS rrf_score
    FROM vector_search v
    FULL OUTER JOIN fts_search f ON v.id = f.id
)
SELECT id, content, source_path, rrf_score
FROM rrf
ORDER BY rrf_score DESC
LIMIT 5;

Ocena jakości RAG

Jak sprawdzić, czy system RAG działa dobrze? Główne wskaźniki to:

Metryczny	Co mierzy	Cel	Jak to obliczyć
Przypomnij @K	Odpowiednie dokumenty znajdują się w najwyższych wynikach K	> 0,70	Zestaw testowy z podstawową prawdą
Precyzja@K	Znalezione wyniki są rzeczywiście istotne	> 0,60	Ręczna adnotacja
Odpowiedz Wierność	Odpowiedź jest obsługiwana przez pobrany kontekst	> 0,80	Ramy RAGAS
Odpowiedź Trafność	Odpowiedź odpowiada na zadane pytanie	> 0,75	Ramy RAGAS
Opóźnienie P95	Czas reakcji na 95. percentylu	< 3 s	Monitoring w produkcji

# Valutazione con RAGAS
# pip install ragas
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_recall
from datasets import Dataset

# Prepara il dataset di test
test_data = {
    "question": [
        "Come si crea un indice HNSW in pgvector?",
        "Qual e il limite di dimensioni per i vettori in pgvector?",
    ],
    "answer": [
        # Risposte generate dal tuo sistema RAG
        rag.ask("Come si crea un indice HNSW in pgvector?")["answer"],
        rag.ask("Qual e il limite di dimensioni per i vettori in pgvector?")["answer"],
    ],
    "contexts": [
        # I chunk recuperati per ciascuna domanda
        [c["excerpt"] for c in rag.ask("...")["sources"]],
        [c["excerpt"] for c in rag.ask("...")["sources"]],
    ],
    "ground_truth": [
        "CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64)",
        "Il limite e 16000 dimensioni per vettori di tipo vector in pgvector 0.7+",
    ]
}

dataset = Dataset.from_dict(test_data)
results = evaluate(dataset, metrics=[faithfulness, answer_relevancy, context_recall])
print(results)

Zaawansowane strategie fragmentowania

Jakość kawałkowania jest jednym z najważniejszych czynników wpływających na jakość RAG. Strategia źle skalibrowane fragmentowanie może obniżyć wydajność nawet w przypadku najlepszego modelu osadzania. Oto zaawansowane strategie dla konkretnych przypadków użycia:

Porcjowanie z nakładaniem się semantycznym

from langchain.text_splitter import RecursiveCharacterTextSplitter
import re

class SemanticChunker:
    """
    Chunker che preserva la coerenza semantica dei paragrafi.
    A differenza del semplice chunking per caratteri, questo
    rispetta i confini di frase e paragrafo.
    """
    def __init__(self, chunk_size: int = 800, chunk_overlap: int = 150):
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap

    def split_by_sentences(self, text: str) -> list[str]:
        """Divide il testo in frasi usando regex."""
        # Pattern per fine frase: ., !, ? seguiti da spazio e maiuscola
        sentences = re.split(r'(?<=[.!?])\s+(?=[A-Z])', text)
        return [s.strip() for s in sentences if s.strip()]

    def create_chunks_with_context(self, text: str) -> list[str]:
        """
        Crea chunk con context overlap:
        ogni chunk include le ultime N parole del chunk precedente.
        """
        sentences = self.split_by_sentences(text)
        chunks = []
        current_chunk = []
        current_size = 0

        for sentence in sentences:
            sentence_size = len(sentence)

            # Se la frase corrente supera da sola il chunk_size, spezzala
            if sentence_size > self.chunk_size:
                if current_chunk:
                    chunks.append(" ".join(current_chunk))
                    # Mantieni overlap: ultime N parole
                    words = " ".join(current_chunk).split()
                    overlap_words = words[-30:]  # ~150 caratteri overlap
                    current_chunk = [" ".join(overlap_words)]
                    current_size = len(" ".join(overlap_words))

                # Spezza la frase lunga
                splitter = RecursiveCharacterTextSplitter(
                    chunk_size=self.chunk_size,
                    chunk_overlap=self.chunk_overlap
                )
                for sub in splitter.split_text(sentence):
                    chunks.append(sub)
                continue

            # Aggiungi frase al chunk corrente
            if current_size + sentence_size + 1 > self.chunk_size and current_chunk:
                chunks.append(" ".join(current_chunk))
                # Overlap: ultime 30 parole del chunk precedente
                words = " ".join(current_chunk).split()
                overlap_words = words[-30:]
                current_chunk = [" ".join(overlap_words), sentence]
                current_size = len(" ".join(current_chunk))
            else:
                current_chunk.append(sentence)
                current_size += sentence_size + 1

        if current_chunk:
            chunks.append(" ".join(current_chunk))

        return chunks

Dzielenie według struktury dokumentu (w oparciu o nagłówek)

import re
from typing import Generator

def chunk_by_headers(markdown_text: str, max_chunk_size: int = 800) -> Generator:
    """
    Chunking che rispetta la struttura gerarchica dei documenti Markdown.
    Ogni sezione H2/H3 diventa un contesto separato, preservando il titolo
    come header del chunk (fondamentale per la qualità dell'embedding).
    """
    # Regex per trovare header Markdown (H1-H4)
    header_pattern = re.compile(r'^(#{1,4})\s+(.+), re.MULTILINE)

    # Trova tutti gli header con le loro posizioni
    headers = list(header_pattern.finditer(markdown_text))

    if not headers:
        # Nessun header: usa chunking standard
        yield {"content": markdown_text, "header": "", "level": 0}
        return

    # Processa ogni sezione delimitata dagli header
    for i, header in enumerate(headers):
        level = len(header.group(1))  # numero di # = livello header
        title = header.group(2).strip()

        # Contenuto dalla posizione attuale fino al prossimo header
        start = header.end()
        end = headers[i + 1].start() if i + 1 < len(headers) else len(markdown_text)
        section_content = markdown_text[start:end].strip()

        if not section_content:
            continue

        # Prefissa ogni chunk con il titolo della sezione
        # CRITICO: il titolo migliora drasticamente la qualità dell'embedding
        full_chunk = f"# {title}\n\n{section_content}"

        # Se la sezione e troppo grande, spezzala
        if len(full_chunk) <= max_chunk_size:
            yield {"content": full_chunk, "header": title, "level": level}
        else:
            # Sezione grande: spezza mantenendo il titolo come prefisso
            splitter = RecursiveCharacterTextSplitter(
                chunk_size=max_chunk_size - len(title) - 10,
                chunk_overlap=100
            )
            for j, sub_chunk in enumerate(splitter.split_text(section_content)):
                yield {
                    "content": f"# {title}\n\n{sub_chunk}",
                    "header": title,
                    "level": level,
                    "sub_index": j
                }

Przepisywanie i dekompozycja zapytań

Zaawansowana technika poprawiająca jakość i jakość RAG przepisanie zapytania: przed wykonaniem wyszukiwania wektorowego przeformułuj zapytanie użytkownika, aby było większe nadaje się do wyszukiwania semantycznego. Zapytania konwersacyjne („ten wcześniej”, „jak to działa?”) często nie są one zgodne z dokumentacją techniczną.

from openai import OpenAI

client = OpenAI()

def rewrite_query_for_search(original_query: str, chat_history: list = None) -> str:
    """
    Riformula la query dell'utente per ottimizzare la ricerca semantica.
    Utile per:
    1. Query conversazionali con riferimenti impliciti
    2. Query brevi e ambigue
    3. Query con abbreviazioni o gergo tecnico non standard
    """
    history_context = ""
    if chat_history:
        history_context = f"\nConversazione precedente:\n{chr(10).join(chat_history[-4:])}\n"

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": """Sei un esperto di ricerca semantica. Dato una query utente,
                riscrivila per massimizzare la probabilità di trovare documenti rilevanti
                in una ricerca vettoriale. La query riscritta deve:
                1. Essere autonoma (senza riferimenti impliciti a "quello")
                2. Usare termini tecnici espliciti e precisi
                3. Esprimere chiaramente il concetto cercato
                4. Essere lunga 1-3 frasi
                Rispondi SOLO con la query riscritta, senza spiegazioni."""
            },
            {
                "role": "user",
                "content": f"{history_context}Query originale: {original_query}\nQuery riscritta:"
            }
        ],
        temperature=0,
        max_tokens=200
    )
    return response.choices[0].message.content.strip()

def decompose_complex_query(query: str) -> list[str]:
    """
    Decompone una query complessa in sub-query più semplici.
    Utile per domande multi-aspetto che richiedono informazioni da più documenti.

    Es: "Qual e la differenza tra HNSW e IVFFlat, e quale e più veloce?"
    -> ["Come funziona HNSW?", "Come funziona IVFFlat?", "Performance HNSW vs IVFFlat benchmark"]
    """
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": """Analizza la query e, se contiene più domande o aspetti distinti,
                scomponila in 2-4 sub-query semplici. Se la query e già semplice, restituisci
                solo la query originale. Formato risposta: JSON list di stringhe."""
            },
            {
                "role": "user",
                "content": f"Query: {query}"
            }
        ],
        temperature=0,
        response_format={"type": "json_object"}
    )
    import json
    result = json.loads(response.choices[0].message.content)
    return result.get("sub_queries", [query])

# Uso nel pipeline RAG
def advanced_rag_query(rag_system, question: str) -> dict:
    """
    Pipeline RAG avanzata con query rewriting e decomposizione.
    """
    # 1. Rewrite per ricerca semantica ottimale
    rewritten = rewrite_query_for_search(question)
    print(f"Query riscritta: {rewritten}")

    # 2. Controlla se e una query complessa
    sub_queries = decompose_complex_query(rewritten)

    if len(sub_queries) == 1:
        # Query semplice: ricerca standard
        return rag_system.ask(rewritten)
    else:
        # Query complessa: cerca per ogni sub-query e deduplicating
        all_chunks = []
        seen_ids = set()

        for sq in sub_queries:
            results = rag_system.searcher.hybrid_search(sq, top_k=3)
            for chunk in results:
                if chunk.id not in seen_ids:
                    all_chunks.append(chunk)
                    seen_ids.add(chunk.id)

        # Genera risposta con tutti i chunk raccolti
        response = rag_system.generator.generate(question, all_chunks[:8])
        return {
            "answer": response.answer,
            "sources": response.sources,
            "sub_queries": sub_queries
        }

Monitorowanie jakości RAG na produkcji

-- Query SQL per monitorare la salute del knowledge base RAG

-- 1. Documenti per tipo sorgente e dimensione media chunk
SELECT
    source_type,
    COUNT(*) AS total_chunks,
    COUNT(DISTINCT source_path) AS unique_documents,
    ROUND(AVG(content_length)) AS avg_chunk_chars,
    MIN(content_length) AS min_chunk_chars,
    MAX(content_length) AS max_chunk_chars,
    SUM(content_length) AS total_chars
FROM rag_documents
GROUP BY source_type
ORDER BY total_chunks DESC;

-- 2. Distribuzione temporale dell'ingestion
SELECT
    DATE_TRUNC('day', ingested_at) AS day,
    COUNT(*) AS chunks_ingested,
    COUNT(DISTINCT source_path) AS docs_ingested
FROM rag_documents
WHERE ingested_at >= NOW() - INTERVAL '30 days'
GROUP BY day
ORDER BY day DESC;

-- 3. Documenti più vecchi (candidati per re-ingestion)
SELECT
    source_path,
    source_type,
    COUNT(*) AS chunks,
    MAX(ingested_at) AS last_ingested,
    NOW() - MAX(ingested_at) AS age
FROM rag_documents
GROUP BY source_path, source_type
ORDER BY last_ingested ASC
LIMIT 20;

-- 4. Verifica che gli embedding abbiano le dimensioni corrette
SELECT
    embedding_model,
    COUNT(*) AS total,
    -- array_length per vector type non e supportato nativamente
    -- usa questo per verificare che non ci siano embedding NULL
    COUNT(embedding) AS with_embedding,
    COUNT(*) - COUNT(embedding) AS missing_embedding
FROM rag_documents
GROUP BY embedding_model;

-- 5. Chunk più corti (probabilmente frammentati male)
SELECT id, source_path, chunk_index, content_length, content
FROM rag_documents
WHERE content_length < 100  -- chunk molto corti
ORDER BY content_length ASC
LIMIT 10;

-- 6. Dimensione totale del knowledge base
SELECT
    pg_size_pretty(pg_total_relation_size('rag_documents')) AS total_size,
    pg_size_pretty(pg_relation_size('rag_documents')) AS table_size,
    pg_size_pretty(pg_relation_size('idx_rag_embedding_hnsw')) AS hnsw_index_size,
    COUNT(*) AS total_chunks,
    COUNT(DISTINCT source_path) AS total_documents
FROM rag_documents;

Anty-wzorce, których należy unikać

5 najczęstszych błędów w RAG z PostgreSQL

Kawałki są za duże: Kawałki zawierające ponad 3000 znaków obejmują wiele tematów, mylą model osadzania. Maks. 1000 znaków (200 tokenów).
Brak filtra progowego: Zwróć fragmenty o niskim podobieństwie (np. 0,3) wprowadza szum do odpowiedzi. Ustaw minimum na 0,60-0,70.
Inny model osadzania zapytań i dokumentów: Jeśli połknąłeś z text-embedding-3-small, użyj tego samego w przypadku zapytań. Zawsze.
Podpowiedź zbyt ogólna: Monit systemowy musi poinstruować LLM, aby to zrobił trzymaj się kontekstu i cytuj źródła.
Brak buforowania: Identyczne zapytania za każdym razem przeliczają osadzanie. Zaimplementuj pamięć podręczną Redis dla najczęstszych osadzań zapytań.

Wnioski i dalsze kroki

Masz teraz kompletny i działający system RAG na PostgreSQL. Architektura, którą zbudowaliśmy i modułowe: możesz zastąpić model osadzania, zmienić LLM lub dodać nowe źródła bez dotykania serca systemu. PostgreSQL obsługuje zarówno magazyn wektorowy, jak i wyszukiwanie pełnotekstowe, eliminując potrzebę stosowania oddzielnych systemów, takich jak Pinecone lub Elasticsearch.

W następnym artykule omówiono Zaawansowane wyszukiwanie podobieństw: jak działają Algorytmy ANN (Approximate Nearest Neighbor), różnice pomiędzy wyszukiwaniem dokładnym i techniki przybliżone i optymalizacyjne dla zapytań o niskim opóźnieniu.

Seria trwa

Poprzedni: Osadzania: teoria i praktyka
Następny: Zaawansowane wyszukiwanie podobieństw w PostgreSQL
Powiązany: Inżynieria AI: architektura RAG