Indexing for Vector Search: HNSW, IVFFlat and Performance Strategies
In the previous article, we explored how HNSW and IVFFlat work conceptually. Now we dive into the technical details that make the difference in production: how to choose optimal parameters, how to monitor index health over time, how to handle incremental updates without performance loss, and how to scale with millions of vectors on real hardware.
A poorly configured vector index can be 10x slower than an optimized one, or consume 4x more RAM than necessary. This article gives you concrete numbers and practical rules for configuring pgvector professionally, based on real benchmarks and production patterns. In 2026, with the "Just Use Postgres" trend continuing to grow, knowing how to correctly configure vector indexes is a fundamental skill for every AI engineer.
Series Overview
| # | Article | Focus |
|---|---|---|
| 1 | pgvector | Installation, operators, indexing |
| 2 | Embeddings Deep Dive | Models, distances, generation |
| 3 | RAG with PostgreSQL | End-to-end RAG pipeline |
| 4 | Similarity Search | Algorithms and optimization |
| 5 | HNSW and IVFFlat (you are here) | Advanced indexing strategies |
| 6 | RAG in Production | Scalability and performance |
What You Will Learn
- Calculating HNSW and IVFFlat index sizes
- Choosing optimal parameters: formulas and benchmarks
- Configuring ef_search and probes at runtime for specific recall targets
- Monitoring index health in production
- REINDEX without downtime
- Incremental updates: how inserts degrade ANN indexes over time
- Multi-index strategies for different use cases
- Full PostgreSQL configuration to maximize vector performance
- Advanced query optimization: query planning and EXPLAIN ANALYZE
- Scheduled index maintenance: vacuum and aggressive autovacuum
Index Sizing: How Much Space Do They Use?
Before creating an index, it is essential to understand how much space it will occupy in memory
and on disk. Having the entire index in shared_buffers is the optimal condition
for maximum performance. An index that does not fit in memory requires I/O for every query,
increasing latency by 10-100x.
HNSW Size Formula
-- Approximate HNSW index size formula:
-- Size ~= n_vectors * m * (2 + 4 * d / 8) bytes + overhead
-- where:
-- n_vectors = number of vectors
-- m = index m parameter (connections per node)
-- d = vector dimensions (e.g. 1536)
-- Practical example: 1M vectors, dim=1536, m=16:
-- 1_000_000 * 16 * (2 + 4 * 1536 / 8) = 1_000_000 * 16 * 770 = 12.3 GB
-- HNSW typically occupies 1.5-3x the raw data size
-- Raw data size (float4 vector):
-- 4 bytes * 1536 dim * 1_000_000 vectors = 6.1 GB
-- Check actual sizes:
SELECT
pg_size_pretty(pg_relation_size('documents')) AS table_size,
pg_size_pretty(pg_indexes_size('documents')) AS indexes_size,
pg_size_pretty(pg_total_relation_size('documents')) AS total_size;
-- Size of each specific index:
SELECT
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    pg_relation_size(indexrelid) AS raw_bytes
FROM pg_stat_user_indexes
WHERE relname = 'documents'
ORDER BY pg_relation_size(indexrelid) DESC;
-- Check if index fits in shared_buffers:
-- Rule of thumb: shared_buffers must be >= 1.5x the HNSW index size
-- If it doesn't fit, consider reduced-dimension embeddings (768 or 384 dim)
SELECT
current_setting('shared_buffers') AS shared_buffers,
pg_size_pretty(pg_indexes_size('documents')) AS total_index_size;
Quick Sizing Reference
| Vectors | Dimensions | Raw Data | HNSW (m=16) | IVFFlat (lists=sqrt(n)) | RAM Recommended |
|---|---|---|---|---|---|
| 100K | 1536 | 600 MB | ~1.2 GB | ~700 MB | 4 GB |
| 1M | 1536 | 6 GB | ~12 GB | ~7 GB | 32 GB |
| 10M | 1536 | 60 GB | ~120 GB | ~70 GB | 256 GB |
| 1M | 768 | 3 GB | ~6 GB | ~3.5 GB | 16 GB |
| 1M | 384 | 1.5 GB | ~3 GB | ~1.8 GB | 8 GB |
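The sizing arithmetic above is easy to script. Below is a small Python sketch (a back-of-the-envelope helper, not pgvector's internal accounting) that applies the article's approximation formulas; sizes are in decimal GB to match the tables above.

```python
def raw_size_bytes(n_vectors: int, dims: int) -> int:
    """Raw float4 vector data: 4 bytes per dimension, per-row overhead ignored."""
    return 4 * dims * n_vectors

def hnsw_size_bytes(n_vectors: int, dims: int, m: int = 16) -> int:
    """The article's HNSW approximation: n * m * (2 + 4 * d / 8) bytes, overhead excluded."""
    return int(n_vectors * m * (2 + 4 * dims / 8))

GB = 1_000_000_000  # decimal GB, as used in the tables above

print(f"raw:  {raw_size_bytes(1_000_000, 1536) / GB:.1f} GB")        # 6.1 GB
print(f"hnsw: {hnsw_size_bytes(1_000_000, 1536, m=16) / GB:.1f} GB")  # 12.3 GB
```

Doubling `m` doubles the estimate, which is why the table shows ~24 GB for m=32 on the same data.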
HNSW Parameters: Optimal Configuration Guide
HNSW has three key parameters that control the balance between memory, build time, recall, and query latency. Understanding them deeply allows you to configure the index professionally for any use case.
The m Parameter: Connections Per Node
-- m: maximum bidirectional connections per node per level
-- Default: 16 | pgvector accepts 2-100; practical range is 4-64
-- Practical guidelines for m:
-- m=8: Low memory, lower recall. Use for: caching, fast suggestions, huge datasets
-- m=16: Balanced default. Use for: general-purpose RAG, semantic search
-- m=32: High recall, double memory. Use for: medical/legal search, high precision
-- m=64: Maximum recall, 4x memory. Use for: extreme cases, small datasets <100K
-- Benchmark m vs recall and memory (1M vectors, 1536 dim, ef_search=40):
-- m=8: recall@10=84%, index=6GB, p50=7ms, p95=15ms
-- m=16: recall@10=93%, index=12GB, p50=10ms, p95=22ms
-- m=32: recall@10=97%, index=24GB, p50=18ms, p95=38ms
-- m=64: recall@10=99%, index=48GB, p50=35ms, p95=72ms
-- Create indexes with different m values (comparative test):
CREATE INDEX idx_hnsw_m8 ON documents USING hnsw (embedding vector_cosine_ops) WITH (m=8, ef_construction=64);
CREATE INDEX idx_hnsw_m16 ON documents USING hnsw (embedding vector_cosine_ops) WITH (m=16, ef_construction=64);
CREATE INDEX idx_hnsw_m32 ON documents USING hnsw (embedding vector_cosine_ops) WITH (m=32, ef_construction=64);
-- Compare actual sizes of created indexes:
SELECT
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS size,
    idx_scan AS query_count
FROM pg_stat_user_indexes
WHERE relname = 'documents'
  AND indexrelname LIKE 'idx_hnsw_m%'
ORDER BY indexrelname;
The ef_construction Parameter: Build Quality
-- ef_construction: candidates considered during index build
-- Affects the quality (maximum achievable recall) of the built index
-- Does NOT affect index size
-- Default: 64
-- Practical guidelines:
-- ef_construction=32: Fast build, reduced potential recall. Prototypes only.
-- ef_construction=64: Default. Excellent for most cases.
-- ef_construction=128: 2x slower build, ~2% better max recall.
-- ef_construction=256: Very slow build, marginal improvement.
-- Build time benchmark (m=16, 1M vectors):
-- ef=32: Build ~20 min, max recall@10 ~89%
-- ef=64: Build ~45 min, max recall@10 ~95%
-- ef=128: Build ~90 min, max recall@10 ~97%
-- ef=256: Build ~3 hours, max recall@10 ~98.5%
-- For maximum index quality (one-time operation):
-- Use a large maintenance_work_mem for the build
SET maintenance_work_mem = '4GB'; -- temporary for the build
CREATE INDEX idx_hnsw_highquality
ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m=24, ef_construction=128);
-- After the build, RAM is released automatically
-- Monitor build progress:
SELECT
phase,
blocks_done,
blocks_total,
ROUND(blocks_done::numeric / NULLIF(blocks_total, 0) * 100, 1) AS pct_done,
tuples_done,
tuples_total
FROM pg_stat_progress_create_index
WHERE relid = 'documents'::regclass;
The ef_search Parameter: Query Quality
-- ef_search: candidates examined during search (beam search width)
-- This is a RUNTIME parameter: you can change it without rebuilding the index
-- Default: 40
-- Valid range: 1-1000 (not capped by ef_construction)
-- Note: the scan returns at most ef_search rows, so keep ef_search >= your LIMIT
-- Set ef_search for the current session:
SET hnsw.ef_search = 40; -- default, good balance
-- High precision (enterprise RAG, medical, legal):
SET hnsw.ef_search = 100;
-- High speed (autocomplete, real-time recommendation):
SET hnsw.ef_search = 20;
-- Benchmark ef_search (1M vectors, 1536 dim, m=16, ef_construction=64):
-- ef_search=10: ~3ms/query, recall@10 ~75%
-- ef_search=20: ~5ms/query, recall@10 ~85%
-- ef_search=40: ~10ms/query, recall@10 ~92%
-- ef_search=100: ~25ms/query, recall@10 ~97%
-- ef_search=200: ~50ms/query, recall@10 ~99%
-- Set per transaction (safer in production):
BEGIN;
SET LOCAL hnsw.ef_search = 80;
SELECT id, content, embedding <=> $1::vector AS dist
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 10;
COMMIT;
-- Set globally in postgresql.conf (persists across restarts):
-- hnsw.ef_search = 60
-- Check current configuration:
SHOW hnsw.ef_search;
SELECT current_setting('hnsw.ef_search');
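The recall figures above only mean something if you can measure recall on your own data. A minimal sketch in Python: given the exact top-k IDs for a test query (computed by brute force, e.g. in a session with index scans disabled) and the IDs the index returned at a given ef_search, recall@k is just the overlap. All IDs below are synthetic.

```python
def recall_at_k(approx_ids: list, exact_ids: list, k: int = 10) -> float:
    """Fraction of the true top-k neighbors that the ANN search returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

def mean_recall(per_query_pairs, k: int = 10) -> float:
    """Average recall@k over a test set of (approx_ids, exact_ids) pairs."""
    return sum(recall_at_k(a, e, k) for a, e in per_query_pairs) / len(per_query_pairs)

# Synthetic example: the ANN search missed 2 of the true top-10
exact  = [7, 3, 9, 1, 4, 8, 2, 6, 5, 0]
approx = [7, 3, 1, 9, 4, 2, 8, 11, 5, 13]
print(recall_at_k(approx, exact))  # 0.8
```

Run this over 100+ held-out queries per ef_search value and pick the smallest setting that meets your recall target.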
IVFFlat Parameters: lists and probes
IVFFlat uses a fundamentally different approach: instead of a navigable graph, it creates
clusters via K-means and searches only the most promising clusters at query time. The
lists and probes parameters control this mechanism.
Choosing the Number of Lists
-- lists: number of clusters (centroids) for IVFFlat
-- pgvector rule of thumb:
-- lists = n_rows / 1000 for datasets up to 1M rows
-- lists = sqrt(n_rows) for datasets above 1M rows
-- Automatically calculate the recommended value:
WITH stats AS (
    SELECT COUNT(*) AS n FROM documents
)
SELECT
    n,
    CASE
        WHEN n <= 1000000 THEN GREATEST(10, (n / 1000)::int)
        ELSE CEIL(SQRT(n::float))::int
    END AS recommended_lists
FROM stats;
-- Examples:
-- 10K rows: lists=10 (10000 / 1000)
-- 100K rows: lists=100 (100000 / 1000)
-- 1M rows: lists=1000 (1000000 / 1000)
-- 10M rows: lists=3162 (sqrt(10000000))
-- 100M rows: lists=10000 (sqrt(100000000))
-- Create the index with the calculated value (automated procedure):
DO $$
DECLARE
    n_rows BIGINT;
    n_lists INTEGER;
BEGIN
    SELECT COUNT(*) INTO n_rows FROM documents;
    IF n_rows <= 1000000 THEN
        n_lists := GREATEST(10, (n_rows / 1000)::int);
    ELSE
        n_lists := CEIL(SQRT(n_rows::float))::int;
    END IF;
    EXECUTE format(
        'CREATE INDEX idx_ivfflat ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = %s)',
        n_lists
    );
    RAISE NOTICE 'IVFFlat index created with % lists for % rows', n_lists, n_rows;
END $$;
-- IMPORTANT: IVFFlat needs existing data to perform K-means clustering!
-- Create the index AFTER loading at least 70-80% of your data.
-- If you add significant data after the build, recall degrades: rebuild periodically.
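pgvector's documented rule of thumb (rows/1000 up to 1M rows, sqrt(rows) above) can also be wrapped in a small application-side helper; the floor of 10 lists for tiny tables is an arbitrary guard of mine, not a pgvector requirement.

```python
import math

def recommended_lists(n_rows: int) -> int:
    """pgvector rule of thumb: rows/1000 up to 1M rows, sqrt(rows) above.
    The minimum of 10 lists is an arbitrary guard for tiny tables."""
    if n_rows <= 1_000_000:
        return max(10, n_rows // 1000)
    return math.isqrt(n_rows)

for n in (10_000, 100_000, 1_000_000, 10_000_000):
    print(n, recommended_lists(n))
```

Useful when your ingestion pipeline creates the index programmatically after the bulk load.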
Probes at Runtime: Balancing Recall and Latency
-- probes: how many lists to search during a query
-- Must be <= lists
-- Default: 1 (searches only the nearest list - very fast but very low recall!)
-- WARNING: the default probes=1 gives very low recall!
-- Always set an appropriate probes value for your use case.
-- Pessimistic worst-case estimate for a recall target:
-- probes_needed ~= lists * target_recall^2
-- For 90% recall with lists=1000: probes ~= 1000 * 0.81 = 810 (!!)
-- For 85% recall with lists=1000: probes ~= 1000 * 0.72 = 720
-- For 80% recall with lists=1000: probes ~= 1000 * 0.64 = 640
-- In practice, with well-distributed clustering (K-means converges):
-- probes = lists * 0.05 -> recall ~= 85% (good balance)
-- probes = lists * 0.10 -> recall ~= 90%
-- probes = lists * 0.20 -> recall ~= 95%
-- Benchmark IVFFlat (1M vectors, lists=1000, 1536 dim):
-- probes=5: ~3ms/query, recall@10 ~72%
-- probes=10: ~6ms/query, recall@10 ~82%
-- probes=50: ~28ms/query, recall@10 ~92%
-- probes=100: ~55ms/query, recall@10 ~96%
-- probes=200: ~110ms/query, recall@10 ~98%
-- Set in postgresql.conf (persists between sessions):
-- ivfflat.probes = 10 (global default)
-- Override per session/transaction:
BEGIN;
SET LOCAL ivfflat.probes = 50; -- only for this transaction
SELECT id, content FROM documents ORDER BY embedding <=> query_vec LIMIT 5;
COMMIT;
-- Application-level tuning in Python with psycopg2:
with conn.cursor() as cur:
    cur.execute("SET ivfflat.probes = %s", (probes,))
    cur.execute("""
        SELECT id, content, 1 - (embedding <=> %s::vector) AS similarity
        FROM documents
        ORDER BY embedding <=> %s::vector
        LIMIT %s
    """, (query_vec, query_vec, top_k))
    results = cur.fetchall()
Direct Comparison: HNSW vs IVFFlat
Choosing between HNSW and IVFFlat is not always obvious. This table summarizes the key trade-offs with concrete data to help you decide.
| Feature | HNSW (m=16, ef=64) | IVFFlat (lists=1000) | Recommendation |
|---|---|---|---|
| Query latency (p50) | ~10ms (ef_search=40) | ~6ms (probes=10) | IVFFlat faster with low probes |
| Recall@10 at equal latency | ~92% | ~82% | HNSW better recall |
| Build time (1M vectors) | ~45 min | ~10 min | IVFFlat 4x faster |
| Index memory | ~12 GB (1M x 1536) | ~7 GB (1M x 1536) | IVFFlat ~40% less RAM |
| Incremental inserts | Excellent, no re-training | Degrades over time | HNSW for dynamic data |
| Data required for build | None (can start empty) | Requires existing data | HNSW more flexible |
| Parallel build (PG16+) | Yes, multi-worker | Partial | HNSW scales better |
Quick Decision Guide
- HNSW is the right choice in most cases: growing datasets, RAG applications where recall is critical, environments with adequate available RAM.
- IVFFlat is preferable when: you have a near-static dataset (rarely updated), limited memory, or you need an operational index quickly (e.g. urgent proof-of-concept).
- No index (brute force) is correct under 50K vectors or when 100% guaranteed recall is required.
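The decision guide above can be codified as a tiny helper. This is a sketch of this article's heuristics, not a universal rule; the thresholds and trade-offs are the ones stated in the table and bullets.

```python
def suggest_index(n_vectors: int, mostly_static: bool, ram_fits_hnsw: bool,
                  need_exact: bool = False) -> str:
    """Codifies the article's quick decision guide (thresholds from the text)."""
    if need_exact or n_vectors < 50_000:
        return "none (brute force)"      # exact search is cheap enough
    if mostly_static and not ram_fits_hnsw:
        return "ivfflat"                 # static data, tight memory budget
    return "hnsw"                        # the default for dynamic RAG workloads

print(suggest_index(1_000_000, mostly_static=False, ram_fits_hnsw=True))  # hnsw
```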
Monitoring Index Health in Production
Index Status and Usage
-- Complete monitoring dashboard for vector indexes
SELECT
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    -- Usage
    idx_scan AS queries_using_index,
    idx_tup_read AS tuples_read_from_index,
    idx_tup_fetch AS tuples_actually_returned,
    -- Efficiency
    CASE
        WHEN idx_scan > 0 THEN ROUND(idx_tup_fetch::numeric / idx_scan, 1)
        ELSE 0
    END AS avg_tuples_per_query,
    -- Size
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'documents'
ORDER BY idx_scan DESC;
-- Check if the index is in cache (shared_buffers)
-- Requires pg_buffercache extension:
CREATE EXTENSION IF NOT EXISTS pg_buffercache;
SELECT
relname,
pg_size_pretty(pg_relation_size(oid)) AS size,
ROUND(
(SELECT COUNT(*) FROM pg_buffercache WHERE relfilenode = pg_relation_filenode(oid))::numeric
/ NULLIF(pg_relation_size(oid) / 8192, 0) * 100, 2
) AS pct_in_shared_buffers
FROM pg_class
WHERE relname LIKE '%hnsw%' OR relname LIKE '%ivfflat%';
-- If <50% in cache, queries will be significantly slower (I/O bound)
-- Solution: increase shared_buffers or use pg_prewarm
-- Recent slow queries involving vector search (requires pg_stat_statements):
SELECT
LEFT(query, 100) AS query_short,
calls,
ROUND(mean_exec_time::numeric, 2) AS mean_ms,
ROUND(max_exec_time::numeric, 2) AS max_ms,
ROUND(total_exec_time::numeric / 1000, 2) AS total_sec
FROM pg_stat_statements
WHERE query LIKE '%<=>%' -- queries with the vector distance operator
ORDER BY mean_exec_time DESC
LIMIT 10;
pg_prewarm: Load Index into Cache
-- pg_prewarm extension: loads indexes into shared_buffers on startup
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
-- Load the HNSW index into cache immediately
SELECT pg_prewarm('documents_hnsw_idx');
-- Returns: number of blocks loaded
-- Check how much memory was used
SELECT
    pg_size_pretty(pg_relation_size('documents_hnsw_idx')) AS index_size,
    current_setting('shared_buffers') AS shared_buffers,
    ROUND(
        pg_relation_size('documents_hnsw_idx')::numeric /
        pg_size_bytes(current_setting('shared_buffers')) * 100, 1
    ) AS pct_of_shared_buffers;
-- Configure automatic prewarming at PostgreSQL startup
-- in postgresql.conf:
-- shared_preload_libraries = 'pg_prewarm'
-- pg_prewarm.autoprewarm = on
-- pg_prewarm.autoprewarm_interval = 300 -- save state every 5 minutes
-- This ensures that after a restart, the index is reloaded into cache
-- automatically using the state saved before shutdown.
-- Prewarm all vector indexes at once:
SELECT pg_prewarm(indexrelid::regclass)
FROM pg_stat_user_indexes
WHERE relname = 'documents'
  AND (indexrelname LIKE '%hnsw%' OR indexrelname LIKE '%ivfflat%');
Index Degradation: The Incremental Insert Problem
A critical aspect often overlooked: ANN indexes degrade over time with inserts. HNSW adds new nodes to the existing structure, but the connection quality of these nodes is inferior compared to a full rebuild. IVFFlat degrades even more noticeably because new vectors are assigned to existing clusters that are no longer optimal for the updated data distribution.
Measuring Degradation
-- Monitor recall over time after inserts
-- Save recall periodically in a monitoring table
CREATE TABLE IF NOT EXISTS index_quality_log (
measured_at TIMESTAMPTZ DEFAULT NOW(),
index_name TEXT,
n_rows BIGINT,
recall_at_10 FLOAT,
p50_ms FLOAT,
p95_ms FLOAT,
pct_cache FLOAT -- % of index in shared_buffers
);
-- Automated measurement function
CREATE OR REPLACE FUNCTION measure_index_quality(
    p_index_name TEXT,
    p_table_name TEXT
) RETURNS void AS $$
DECLARE
    v_n_rows BIGINT;
    v_cache_pct FLOAT;
BEGIN
    -- Count current rows
    EXECUTE format('SELECT COUNT(*) FROM %I', p_table_name) INTO v_n_rows;
    -- Calculate % in cache (approximation)
    SELECT ROUND(
        (SELECT COUNT(*) FROM pg_buffercache
         WHERE relfilenode = pg_relation_filenode(p_index_name::regclass))::numeric
        / NULLIF(pg_relation_size(p_index_name::regclass) / 8192, 0) * 100, 2
    ) INTO v_cache_pct;
    -- Insert log (recall must be measured externally with a test set)
    INSERT INTO index_quality_log (index_name, n_rows, pct_cache)
    VALUES (p_index_name, v_n_rows, v_cache_pct);
    RAISE NOTICE 'Quality log: index=%, rows=%, cache=%', p_index_name, v_n_rows, v_cache_pct;
END;
$$ LANGUAGE plpgsql;
-- Query to view degradation over time
SELECT
measured_at::date AS date,
n_rows AS rows,
recall_at_10 AS "Recall@10",
p95_ms AS "P95 latency (ms)",
pct_cache AS "% In Cache"
FROM index_quality_log
WHERE index_name = 'documents_hnsw_idx'
ORDER BY measured_at;
-- Alert thresholds (set up notifications when exceeded):
-- recall_at_10 < 0.85 -> urgent REINDEX needed
-- recall_at_10 < 0.90 -> schedule REINDEX within 1 week
-- p95_ms > 100 -> check if index is in cache
-- pct_cache < 50% -> increase shared_buffers or run pg_prewarm
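The alert thresholds above translate directly into monitoring code. A minimal sketch, using exactly the threshold values from the comments; wiring the returned messages into your own alerting system is left as an assumption.

```python
def index_health_alerts(recall_at_10: float, p95_ms: float, pct_cache: float) -> list:
    """Apply the article's alert thresholds to one index_quality_log row."""
    alerts = []
    if recall_at_10 < 0.85:
        alerts.append("urgent REINDEX needed")
    elif recall_at_10 < 0.90:
        alerts.append("schedule REINDEX within 1 week")
    if p95_ms > 100:
        alerts.append("check if index is in cache")
    if pct_cache < 50:
        alerts.append("increase shared_buffers or run pg_prewarm")
    return alerts

print(index_health_alerts(0.88, 120, 45))
```

Run it on each new row of `index_quality_log` from a cron job or your metrics pipeline.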
REINDEX CONCURRENTLY: Rebuild Without Downtime
-- REINDEX CONCURRENTLY rebuilds the index without blocking read queries
-- Requires PostgreSQL 12+ and takes longer than regular REINDEX
-- During the rebuild, queries continue using the old index
-- METHOD 1: Direct REINDEX (simpler, PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY documents_hnsw_idx;
-- Pro: simple
-- Con: cannot change parameters during rebuild
-- METHOD 2: Swap with temporary index (more flexible)
-- Step 1: Create a new index with optimized parameters
CREATE INDEX CONCURRENTLY documents_hnsw_new
ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m=16, ef_construction=128); -- improved ef_construction!
-- Step 2: Verify the new index was built correctly
SELECT
    c.relname AS index_name,
    i.indisvalid AS is_valid,
    pg_size_pretty(pg_relation_size(c.oid)) AS size
FROM pg_class c
JOIN pg_index i ON i.indexrelid = c.oid
WHERE c.relname IN ('documents_hnsw_idx', 'documents_hnsw_new');
-- Step 3: Atomic swap (only a brief exclusive lock)
BEGIN;
DROP INDEX documents_hnsw_idx; -- very brief exclusive lock
ALTER INDEX documents_hnsw_new RENAME TO documents_hnsw_idx;
COMMIT;
-- When to rebuild?
-- After >20% inserts/updates relative to original size
-- If recall < 0.85 (measured with test set)
-- After large batch deletions (>30% of rows)
-- Recommended schedule: weekly for highly dynamic datasets,
-- monthly for stable datasets
Multi-Index Strategies
In complex production scenarios, you may need multiple indexes for different access patterns. PostgreSQL with pgvector supports multiple vector indexes on the same embedding column, and the query planner automatically chooses the most appropriate one.
-- Strategy 1: Partial indexes by document type
-- Advantages: each index is smaller, faster, occupies less RAM
CREATE INDEX idx_hnsw_docs_pdf
ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m=16, ef_construction=64)
WHERE source_type = 'pdf';
CREATE INDEX idx_hnsw_docs_web
ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m=16, ef_construction=64)
WHERE source_type IN ('html', 'md');
-- Query that automatically activates the partial index:
EXPLAIN SELECT id, content
FROM documents
WHERE source_type = 'pdf' -- this condition activates idx_hnsw_docs_pdf
ORDER BY embedding <=> '[...]'::vector
LIMIT 5;
-- Output: Index Scan using idx_hnsw_docs_pdf
-- Strategy 2: Indexes for different dimensions (Matryoshka embeddings / MRL)
-- text-embedding-3-small supports 512 and 1536 dimensions
ALTER TABLE documents ADD COLUMN IF NOT EXISTS embedding_512 vector(512);
ALTER TABLE documents ADD COLUMN IF NOT EXISTS embedding_1536 vector(1536);
CREATE INDEX idx_hnsw_512
ON documents USING hnsw (embedding_512 vector_cosine_ops)
WITH (m=16, ef_construction=64);
CREATE INDEX idx_hnsw_1536
ON documents USING hnsw (embedding_1536 vector_cosine_ops)
WITH (m=32, ef_construction=128); -- higher quality for the full version
-- Queries using the appropriate version:
-- Fast search (autocomplete, 3x faster, ~95% quality):
SELECT id, content, embedding_512 <=> query_512 AS dist
FROM documents
ORDER BY embedding_512 <=> query_512 LIMIT 20;
-- Precise search (RAG):
SELECT id, content, embedding_1536 <=> query_1536 AS dist
FROM documents
WHERE id IN (
SELECT id FROM documents
ORDER BY embedding_512 <=> query_512 LIMIT 100 -- coarse filter
)
ORDER BY embedding_1536 <=> query_1536 LIMIT 5;
-- Strategy 3: Time-based partial index (only recent documents)
-- Ideal for news, chat history, fresh document applications
-- NOTE: a partial-index predicate must be IMMUTABLE, so NOW() is not allowed.
-- Use a fixed cutoff and recreate the index on a schedule (e.g. nightly):
CREATE INDEX idx_hnsw_recent
ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m=16, ef_construction=64)
WHERE created_at > '2026-01-01'; -- advance this cutoff at each rebuild
-- New inserts matching the condition enter the index automatically;
-- periodic recreation moves the cutoff forward and drops expired documents.
-- Queries must repeat the same created_at predicate to use this index.
PostgreSQL Configuration for Vector Workloads
PostgreSQL configuration is as important as the choice of index parameters. A wrong configuration can nullify all the benefits of HNSW. Here is the complete optimal configuration for production RAG systems.
# postgresql.conf - Optimal configuration for vector search
# Apply after determining the server RAM amount
# ========================================
# MEMORY - The most critical part
# ========================================
shared_buffers = '8GB' # 25% of total RAM
# HNSW index MUST fit here
# With 32GB RAM: shared_buffers = 8GB
# With 64GB RAM: shared_buffers = 16GB
effective_cache_size = '24GB' # 75% of total RAM
# Hint for the query planner
# Does NOT allocate memory, just a hint
work_mem = '64MB' # For sort and hash operations
# Influences queries with ORDER BY + LIMIT
# Caution: each connection may use it multiple times
maintenance_work_mem = '2GB' # For CREATE INDEX (uses MUCH more than normal)
# Set to 25-50% of RAM before a rebuild
# After build, restore original value
# ========================================
# PARALLELISM
# ========================================
max_parallel_workers_per_gather = 4 # Workers per single query
max_parallel_workers = 8 # Total workers system-wide
max_parallel_maintenance_workers = 7 # For parallel CREATE INDEX (HNSW needs pgvector 0.6.0+)
parallel_tuple_cost = 0.1 # Encourages parallelism usage
parallel_setup_cost = 100 # Parallelism setup overhead
# ========================================
# pgvector SETTINGS
# ========================================
# Set at runtime or in postgresql.conf:
hnsw.ef_search = 60 # System default (override per session)
ivfflat.probes = 10 # System default
# ========================================
# WAL for heavy INSERTs
# ========================================
wal_buffers = '64MB'
max_wal_size = '4GB'
checkpoint_completion_target = 0.9
wal_compression = on # Reduces WAL I/O (useful for intensive ingestion)
# ========================================
# MONITORING
# ========================================
log_min_duration_statement = 100 # Log queries slower than 100ms
track_io_timing = on # Measure I/O time (useful for cache miss diagnosis)
track_activity_query_size = 2048 # Max query text length kept in pg_stat_activity
shared_preload_libraries = 'pg_stat_statements,pg_prewarm'
pg_stat_statements.max = 10000 # Track last 10K unique queries
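The memory ratios above (25% of RAM for shared_buffers, 75% for effective_cache_size) can be derived programmatically. A sketch; the maintenance_work_mem heuristic is my own simplification, not a value from the configuration above, and the steady-state 1-4GB range follows the comments there.

```python
def vector_pg_conf(total_ram_gb: int) -> dict:
    """Derive memory settings from the ratios above (25% / 75% of total RAM)."""
    return {
        "shared_buffers": f"{total_ram_gb // 4}GB",
        "effective_cache_size": f"{total_ram_gb * 3 // 4}GB",
        # Hypothetical heuristic: 1-4GB steady state; raise it before index builds
        "maintenance_work_mem": f"{min(4, max(1, total_ram_gb // 16))}GB",
    }

print(vector_pg_conf(32))  # shared_buffers 8GB, effective_cache_size 24GB
```

Handy for templating postgresql.conf across environments with different RAM sizes.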
Parallel Index Build: Accelerating HNSW Construction
-- pgvector 0.6.0+ supports parallel index builds for HNSW
-- Dramatically reduces build time on multi-core systems
-- Set workers for the build (more workers = faster build)
-- max = max_parallel_maintenance_workers
SET max_parallel_maintenance_workers = 7; -- use 8 CPUs total (1 leader + 7 workers)
-- Set maintenance_work_mem high for the build
SET maintenance_work_mem = '4GB';
-- Build with parallelism:
CREATE INDEX idx_hnsw_parallel
ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m=16, ef_construction=64);
-- Monitor build progress (PostgreSQL 12+):
SELECT
phase,
blocks_done,
blocks_total,
ROUND(blocks_done::numeric / NULLIF(blocks_total, 0) * 100, 1) AS pct_done,
tuples_done,
tuples_total,
ROUND(tuples_done::numeric / NULLIF(tuples_total, 0) * 100, 1) AS pct_tuples_done
FROM pg_stat_progress_create_index
WHERE relid = 'documents'::regclass;
-- Build speed comparison (1M vectors, 1536 dim, m=16, ef=64):
-- 1 worker: ~90 minutes
-- 4 workers: ~25 minutes
-- 8 workers: ~15 minutes (diminishing returns beyond 8)
-- 16 workers: ~12 minutes (minimal additional improvement)
-- Automatic build time estimate based on current data:
WITH stats AS (SELECT COUNT(*) AS n FROM documents)
SELECT
n AS num_vectors,
ROUND(n / 1000000.0, 2) AS millions,
-- Estimate with 8 workers, m=16, ef_construction=64
ROUND(n / 1000000.0 * 15, 0) || ' min' AS estimated_build_8workers
FROM stats;
Two-Phase Retrieval: Precision with Efficiency
An advanced technique to balance speed and precision is two-phase retrieval (also called coarse-to-fine search): a fast first phase using low-dimensional embeddings or a relaxed HNSW index, followed by precise re-ranking on a restricted subset of the best candidates. This often delivers better recall than a single-phase search at lower latency.
-- Two-phase retrieval for maximum efficiency
-- Phase 1: Fast coarse search with 512-dim embeddings (3x faster)
-- Phase 2: Precise re-ranking with 1536-dim embeddings (only on top-50)
WITH coarse_candidates AS (
-- Phase 1: top-50 candidates with fast 512-dim search
SELECT
id,
embedding_512 <=> %s::vector(512) AS coarse_dist
FROM documents
ORDER BY embedding_512 <=> %s::vector(512)
LIMIT 50
),
precise_ranking AS (
-- Phase 2: re-rank top-50 with precise 1536-dim embeddings
SELECT
d.id,
d.content,
d.source_path,
d.embedding_1536 <=> %s::vector(1536) AS precise_dist,
1 - (d.embedding_1536 <=> %s::vector(1536)) AS similarity
FROM documents d
INNER JOIN coarse_candidates c ON c.id = d.id
ORDER BY d.embedding_1536 <=> %s::vector(1536)
)
SELECT id, content, source_path, similarity
FROM precise_ranking
LIMIT 5;
-- Latency vs quality comparison (1M vectors):
-- Direct 1536-dim HNSW (ef_search=40): ~10ms, Recall@5 ~94%
-- Direct 1536-dim HNSW (ef_search=100): ~25ms, Recall@5 ~98%
-- Two-phase (512 coarse + 1536 rerank): ~4ms, Recall@5 ~96%
-- -> 2.5x faster with even better recall!
Managing Deletions: Vacuum and HNSW
-- Deletions in PostgreSQL are "soft deletes" (tuples marked dead)
-- The HNSW index maintains references to these dead tuples
-- VACUUM removes dead tuples and updates the index
-- Check dead tuples (indicator of VACUUM need)
SELECT
relname AS table_name,
n_live_tup AS live_rows,
n_dead_tup AS dead_rows,
ROUND(n_dead_tup::numeric / NULLIF(n_live_tup, 0) * 100, 2) AS pct_dead,
last_vacuum,
last_autovacuum,
n_mod_since_analyze AS mods_since_analyze
FROM pg_stat_user_tables
WHERE relname = 'documents';
-- If pct_dead > 10-20%, it is time to vacuum
VACUUM ANALYZE documents; -- vacuum + update statistics
-- VACUUM FULL: rebuilds the table (blocks writes, frees more space)
-- Use only in scheduled maintenance windows:
VACUUM FULL documents;
-- Aggressive autovacuum for vector tables
-- (many updates/deletes typical in RAG pipelines):
ALTER TABLE documents SET (
autovacuum_vacuum_scale_factor = 0.01, -- vacuum after 1% modified (default 20%)
autovacuum_analyze_scale_factor = 0.005, -- analyze after 0.5% (default 10%)
autovacuum_vacuum_cost_delay = 2, -- 2ms; the default since PG12 (was 20ms before)
autovacuum_vacuum_threshold = 50 -- at least 50 rows modified (default 50)
);
-- Verify autovacuum is running:
SELECT
schemaname,
relname,
last_autovacuum,
last_autoanalyze,
autovacuum_count,
autoanalyze_count
FROM pg_stat_user_tables
WHERE relname = 'documents';
Query Optimization with EXPLAIN ANALYZE
Using EXPLAIN ANALYZE regularly is essential to verify that vector search queries are using indexes correctly and to diagnose performance problems.
-- Complete analysis of a vector search query
EXPLAIN (ANALYZE, BUFFERS, TIMING, FORMAT TEXT)
SELECT id, content, embedding <=> '[0.1, 0.2, ...]'::vector(1536) AS dist
FROM documents
WHERE source_type = 'pdf'
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector(1536)
LIMIT 10;
-- Desired output (uses HNSW index):
-- Index Scan using idx_hnsw_docs_pdf on documents (cost=0.00..8.54 rows=10)
-- Index Cond: (embedding <=> '[...]'::vector)
-- Filter: (source_type = 'pdf')
-- Buffers: shared hit=247 <-- everything from cache!
-- -> Planning Time: 0.3 ms
-- -> Execution Time: 8.7 ms
-- Undesired output (brute force - to be avoided):
-- Seq Scan on documents (cost=0.00..1234.56 rows=10)
-- Sort Key: (embedding <=> '...'::vector)
-- Buffers: shared hit=1234 read=5678 <-- many reads from disk!
-- -> Execution Time: 3450 ms
-- If you see Seq Scan instead of Index Scan, check:
-- 1. Does the index exist?
SELECT indexname FROM pg_indexes WHERE tablename = 'documents';
-- 2. Is the LIMIT small enough?
-- PostgreSQL uses the index only for small LIMITs
-- 3. Are statistics up to date?
ANALYZE documents;
-- 4. Is enable_indexscan on?
SHOW enable_indexscan; -- must be 'on'
-- 5. Is ef_search appropriate?
SHOW hnsw.ef_search;
Production Checklist for Vector Indexes
- Memory sizing: verify that shared_buffers is large enough to hold the HNSW index. If the index is not in cache, queries will be 10-100x slower.
- maintenance_work_mem: set to 1-4GB before creating the index. You can reduce it after the build.
- Parallel build: use max_parallel_maintenance_workers=7 for fast builds on multi-core systems. Saves hours of downtime.
- ef_search in production: do not use the default of 40. Measure recall on your dataset and set the appropriate value (typically 60-100 for enterprise RAG).
- Recall monitoring: run recall tests weekly. If recall drops below 0.85, schedule an urgent REINDEX.
- Aggressive autovacuum: for tables with heavy inserts/deletes, lower autovacuum_vacuum_scale_factor to 0.01-0.05.
- pg_prewarm: enable auto-prewarming to ensure the index is back in cache after every PostgreSQL restart.
- EXPLAIN ANALYZE: regularly check that queries use the HNSW indexes and do not accidentally fall back to sequential scans.
Common Mistakes and How to Avoid Them
| Mistake | Symptom | Solution |
|---|---|---|
| shared_buffers too small | Slow queries (>500ms), high % disk reads in EXPLAIN | Increase to 25% of RAM; use pg_prewarm |
| Default ef_search (40) in production | Recall@10 ~92%, imprecise RAG responses | Set 60-100 for enterprise RAG |
| IVFFlat with probes=1 (default) | Recall@10 ~50-60%, completely wrong results | Set probes=10-50 according to recall target |
| No REINDEX after many inserts | Recall degrades progressively over time | Schedule weekly/monthly REINDEX CONCURRENTLY |
| Sequential scan instead of Index Scan | Very slow vector queries without index | ANALYZE table; check LIMIT and WHERE clause |
| maintenance_work_mem too low | HNSW build extremely slow, takes hours/days | SET maintenance_work_mem = '2GB' before CREATE INDEX |
Conclusions and Next Steps
Vector index tuning is a discipline that requires concrete measurements, not just intuition. Optimal parameters depend on your specific dataset, latency requirements, and available memory. The golden rule: measure first, optimize after, monitor always.
In 2026, with the "Just Use Postgres" trend increasingly consolidated, knowing how to correctly configure HNSW and IVFFlat indexes lets you achieve performance competitive with specialized vector databases like Pinecone or Qdrant, while maintaining the simplicity of a single PostgreSQL infrastructure. Recent benchmarks show pgvector up to 28x faster than Pinecone at 16x lower cost, with the right configuration.
The final article in this series tackles the ultimate challenge: taking all of this to production at scale. Partitioning for enormous datasets, connection pooling with PgBouncer, read replicas dedicated to vector search, query caching with Redis, and multi-tenant architectures that allow PostgreSQL to serve millions of vector queries per day.