Indexing for Vector Search: HNSW, IVFFlat and Performance Strategies
In the previous article, we explored how HNSW and IVFFlat work conceptually. Now we dive into the technical details that make the difference in production: how to choose optimal parameters, how to monitor index health over time, how to handle incremental updates without performance loss, and how to scale with millions of vectors on real hardware.
A poorly configured vector index can be 10x slower than an optimized one, or consume 4x more RAM than necessary. This article gives you concrete numbers and practical rules for configuring pgvector professionally, based on real benchmarks and production patterns. In 2026, with the "Just Use Postgres" trend continuing to grow, knowing how to correctly configure vector indexes is a fundamental skill for every AI engineer.
Series Overview
| # | Article | Focus |
|---|---|---|
| 1 | pgvector | Installation, operators, indexing |
| 2 | Embeddings Deep Dive | Models, distances, generation |
| 3 | RAG with PostgreSQL | End-to-end RAG pipeline |
| 4 | Similarity Search | Algorithms and optimization |
| 5 | HNSW and IVFFlat (you are here) | Advanced indexing strategies |
| 6 | RAG in Production | Scalability and performance |
What You Will Learn
- Calculating HNSW and IVFFlat index sizes
- Choosing optimal parameters: formulas and benchmarks
- Configuring ef_search and probes at runtime for specific recall targets
- Monitoring index health in production
- REINDEX without downtime
- Incremental updates: how inserts degrade ANN indexes over time
- Multi-index strategies for different use cases
- Full PostgreSQL configuration to maximize vector performance
- Advanced query optimization: query planning and EXPLAIN ANALYZE
- Scheduled index maintenance: vacuum and aggressive autovacuum
Index Sizing: How Much Space Do They Use?
Before creating an index, it is essential to understand how much space it will occupy in memory
and on disk. Having the entire index in shared_buffers is the optimal condition
for maximum performance. An index that does not fit in memory requires I/O for every query,
increasing latency by 10-100x.
HNSW Size Formula
-- Approximate HNSW index size formula:
-- Size ~= n_vectors * m * (2 + 4 * d / 8) bytes + overhead
-- where:
-- n_vectors = number of vectors
-- m = index m parameter (connections per node)
-- d = vector dimensions (e.g. 1536)
-- Practical example: 1M vectors, dim=1536, m=16:
-- 1_000_000 * 16 * (2 + 4 * 1536 / 8) = 1_000_000 * 16 * 770 = 12.3 GB
-- HNSW typically occupies 1.5-3x the raw data size
-- Raw data size (float4 vector):
-- 4 bytes * 1536 dim * 1_000_000 vectors = 6.1 GB
-- Check actual sizes:
SELECT
pg_size_pretty(pg_relation_size('documents')) AS table_size,
pg_size_pretty(pg_indexes_size('documents')) AS indexes_size,
pg_size_pretty(pg_total_relation_size('documents')) AS total_size;
-- Size of each specific index:
SELECT
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    pg_relation_size(indexrelid) AS raw_bytes
FROM pg_stat_user_indexes
WHERE relname = 'documents'
ORDER BY pg_relation_size(indexrelid) DESC;
-- Check if index fits in shared_buffers:
-- Rule of thumb: shared_buffers must be >= 1.5x the HNSW index size
-- If it doesn't fit, consider reduced-dimension embeddings (768 or 384 dim)
SELECT
current_setting('shared_buffers') AS shared_buffers,
pg_size_pretty(pg_indexes_size('documents')) AS total_index_size;
Quick Sizing Reference
| Vectors | Dimensions | Raw Data | HNSW (m=16) | IVFFlat (lists=sqrt(n)) | RAM Recommended |
|---|---|---|---|---|---|
| 100K | 1536 | 600 MB | ~1.2 GB | ~700 MB | 4 GB |
| 1M | 1536 | 6 GB | ~12 GB | ~7 GB | 32 GB |
| 10M | 1536 | 60 GB | ~120 GB | ~70 GB | 256 GB |
| 1M | 768 | 3 GB | ~6 GB | ~3.5 GB | 16 GB |
| 1M | 384 | 1.5 GB | ~3 GB | ~1.8 GB | 8 GB |
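The sizing arithmetic above is easy to script. Below is a small Python sketch (a back-of-the-envelope helper, not pgvector's internal accounting) that applies the article's approximation formulas; sizes are in decimal GB to match the tables above.

```python
def raw_size_bytes(n_vectors: int, dims: int) -> int:
    """Raw float4 vector data: 4 bytes per dimension, per-row overhead ignored."""
    return 4 * dims * n_vectors

def hnsw_size_bytes(n_vectors: int, dims: int, m: int = 16) -> int:
    """The article's HNSW approximation: n * m * (2 + 4 * d / 8) bytes, overhead excluded."""
    return int(n_vectors * m * (2 + 4 * dims / 8))

GB = 1_000_000_000  # decimal GB, as used in the tables above

print(f"raw:  {raw_size_bytes(1_000_000, 1536) / GB:.1f} GB")        # 6.1 GB
print(f"hnsw: {hnsw_size_bytes(1_000_000, 1536, m=16) / GB:.1f} GB")  # 12.3 GB
```

Doubling `m` doubles the estimate, which is why the table shows ~24 GB for m=32 on the same data.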
HNSW Parameters: Optimal Configuration Guide
HNSW has three key parameters that control the balance between memory, build time, recall, and query latency. Understanding them deeply allows you to configure the index professionally for any use case.
The m Parameter: Connections Per Node
-- m: maximum bidirectional connections per node per level
-- Default: 16 | pgvector accepts 2-100; practical range is 4-64
-- Practical guidelines for m:
-- m=8: Low memory, lower recall. Use for: caching, fast suggestions, huge datasets
-- m=16: Balanced default. Use for: general-purpose RAG, semantic search
-- m=32: High recall, double memory. Use for: medical/legal search, high precision
-- m=64: Maximum recall, 4x memory. Use for: extreme cases, small datasets <100K
-- Benchmark m vs recall and memory (1M vectors, 1536 dim, ef_search=40):
-- m=8: recall@10=84%, index=6GB, p50=7ms, p95=15ms
-- m=16: recall@10=93%, index=12GB, p50=10ms, p95=22ms
-- m=32: recall@10=97%, index=24GB, p50=18ms, p95=38ms
-- m=64: recall@10=99%, index=48GB, p50=35ms, p95=72ms
-- Create indexes with different m values (comparative test):
CREATE INDEX idx_hnsw_m8 ON documents USING hnsw (embedding vector_cosine_ops) WITH (m=8, ef_construction=64);
CREATE INDEX idx_hnsw_m16 ON documents USING hnsw (embedding vector_cosine_ops) WITH (m=16, ef_construction=64);
CREATE INDEX idx_hnsw_m32 ON documents USING hnsw (embedding vector_cosine_ops) WITH (m=32, ef_construction=64);
-- Compare actual sizes of created indexes:
SELECT
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS size,
    idx_scan AS query_count
FROM pg_stat_user_indexes
WHERE relname = 'documents'
  AND indexrelname LIKE 'idx_hnsw_m%'
ORDER BY indexrelname;
The ef_construction Parameter: Build Quality
-- ef_construction: candidates considered during index build
-- Affects the quality (maximum achievable recall) of the built index
-- Does NOT affect index size
-- Default: 64
-- Practical guidelines:
-- ef_construction=32: Fast build, reduced potential recall. Prototypes only.
-- ef_construction=64: Default. Excellent for most cases.
-- ef_construction=128: 2x slower build, ~2% better max recall.
-- ef_construction=256: Very slow build, marginal improvement.
-- Build time benchmark (m=16, 1M vectors):
-- ef=32: Build ~20 min, max recall@10 ~89%
-- ef=64: Build ~45 min, max recall@10 ~95%
-- ef=128: Build ~90 min, max recall@10 ~97%
-- ef=256: Build ~3 hours, max recall@10 ~98.5%
-- For maximum index quality (one-time operation):
-- Use a large maintenance_work_mem for the build
SET maintenance_work_mem = '4GB'; -- temporary for the build
CREATE INDEX idx_hnsw_highquality
ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m=24, ef_construction=128);
-- After the build, RAM is released automatically
-- Monitor build progress:
SELECT
phase,
blocks_done,
blocks_total,
ROUND(blocks_done::numeric / NULLIF(blocks_total, 0) * 100, 1) AS pct_done,
tuples_done,
tuples_total
FROM pg_stat_progress_create_index
WHERE relid = 'documents'::regclass;
The ef_search Parameter: Query Quality
-- ef_search: candidates examined during search (beam search width)
-- This is a RUNTIME parameter: you can change it without rebuilding the index
-- Default: 40
-- Valid range: 1-1000 (not capped by ef_construction)
-- Note: the scan returns at most ef_search rows, so keep ef_search >= your LIMIT
-- Set ef_search for the current session:
SET hnsw.ef_search = 40; -- default, good balance
-- High precision (enterprise RAG, medical, legal):
SET hnsw.ef_search = 100;
-- High speed (autocomplete, real-time recommendation):
SET hnsw.ef_search = 20;
-- Benchmark ef_search (1M vectors, 1536 dim, m=16, ef_construction=64):
-- ef_search=10: ~3ms/query, recall@10 ~75%
-- ef_search=20: ~5ms/query, recall@10 ~85%
-- ef_search=40: ~10ms/query, recall@10 ~92%
-- ef_search=100: ~25ms/query, recall@10 ~97%
-- ef_search=200: ~50ms/query, recall@10 ~99%
-- Set per transaction (safer in production):
BEGIN;
SET LOCAL hnsw.ef_search = 80;
SELECT id, content, embedding <=> $1::vector AS dist
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 10;
COMMIT;
-- Set globally in postgresql.conf (persists across restarts):
-- hnsw.ef_search = 60
-- Check current configuration:
SHOW hnsw.ef_search;
SELECT current_setting('hnsw.ef_search');
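The recall figures above only mean something if you can measure recall on your own data. A minimal sketch in Python: given the exact top-k IDs for a test query (computed by brute force, e.g. in a session with index scans disabled) and the IDs the index returned at a given ef_search, recall@k is just the overlap. All IDs below are synthetic.

```python
def recall_at_k(approx_ids: list, exact_ids: list, k: int = 10) -> float:
    """Fraction of the true top-k neighbors that the ANN search returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

def mean_recall(per_query_pairs, k: int = 10) -> float:
    """Average recall@k over a test set of (approx_ids, exact_ids) pairs."""
    return sum(recall_at_k(a, e, k) for a, e in per_query_pairs) / len(per_query_pairs)

# Synthetic example: the ANN search missed 2 of the true top-10
exact  = [7, 3, 9, 1, 4, 8, 2, 6, 5, 0]
approx = [7, 3, 1, 9, 4, 2, 8, 11, 5, 13]
print(recall_at_k(approx, exact))  # 0.8
```

Run this over 100+ held-out queries per ef_search value and pick the smallest setting that meets your recall target.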
IVFFlat Parameters: lists and probes
IVFFlat uses a fundamentally different approach: instead of a navigable graph, it creates
clusters via K-means and searches only the most promising clusters at query time. The
lists and probes parameters control this mechanism.
Choosing the Number of Lists
-- lists: number of clusters (centroids) for IVFFlat
-- pgvector rule of thumb:
-- lists = n_rows / 1000 for datasets up to 1M rows
-- lists = sqrt(n_rows) for datasets above 1M rows
-- Automatically calculate the recommended value:
WITH stats AS (
    SELECT COUNT(*) AS n FROM documents
)
SELECT
    n,
    CASE
        WHEN n <= 1000000 THEN GREATEST(10, (n / 1000)::int)
        ELSE CEIL(SQRT(n::float))::int
    END AS recommended_lists
FROM stats;
-- Examples:
-- 10K rows: lists=10 (10000 / 1000)
-- 100K rows: lists=100 (100000 / 1000)
-- 1M rows: lists=1000 (1000000 / 1000)
-- 10M rows: lists=3162 (sqrt(10000000))
-- 100M rows: lists=10000 (sqrt(100000000))
-- Create the index with the calculated value (automated procedure):
DO $$
DECLARE
    n_rows BIGINT;
    n_lists INTEGER;
BEGIN
    SELECT COUNT(*) INTO n_rows FROM documents;
    IF n_rows <= 1000000 THEN
        n_lists := GREATEST(10, (n_rows / 1000)::int);
    ELSE
        n_lists := CEIL(SQRT(n_rows::float))::int;
    END IF;
    EXECUTE format(
        'CREATE INDEX idx_ivfflat ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = %s)',
        n_lists
    );
    RAISE NOTICE 'IVFFlat index created with % lists for % rows', n_lists, n_rows;
END $$;
-- IMPORTANT: IVFFlat needs existing data to perform K-means clustering!
-- Create the index AFTER loading at least 70-80% of your data.
-- If you add significant data after the build, recall degrades: rebuild periodically.
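pgvector's documented rule of thumb (rows/1000 up to 1M rows, sqrt(rows) above) can also be wrapped in a small application-side helper; the floor of 10 lists for tiny tables is an arbitrary guard of mine, not a pgvector requirement.

```python
import math

def recommended_lists(n_rows: int) -> int:
    """pgvector rule of thumb: rows/1000 up to 1M rows, sqrt(rows) above.
    The minimum of 10 lists is an arbitrary guard for tiny tables."""
    if n_rows <= 1_000_000:
        return max(10, n_rows // 1000)
    return math.isqrt(n_rows)

for n in (10_000, 100_000, 1_000_000, 10_000_000):
    print(n, recommended_lists(n))
```

Useful when your ingestion pipeline creates the index programmatically after the bulk load.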
Probes at Runtime: Balancing Recall and Latency
-- probes: how many lists to search during a query
-- Must be <= lists
-- Default: 1 (searches only the nearest list - very fast but very low recall!)
-- WARNING: the default probes=1 gives very low recall!
-- Always set an appropriate probes value for your use case.
-- Pessimistic worst-case estimate for a recall target:
-- probes_needed ~= lists * target_recall^2
-- For 90% recall with lists=1000: probes ~= 1000 * 0.81 = 810 (!!)
-- For 85% recall with lists=1000: probes ~= 1000 * 0.72 = 720
-- For 80% recall with lists=1000: probes ~= 1000 * 0.64 = 640
-- In practice, with well-distributed clustering (K-means converges):
-- probes = lists * 0.05 -> recall ~= 85% (good balance)
-- probes = lists * 0.10 -> recall ~= 90%
-- probes = lists * 0.20 -> recall ~= 95%
-- Benchmark IVFFlat (1M vectors, lists=1000, 1536 dim):
-- probes=5: ~3ms/query, recall@10 ~72%
-- probes=10: ~6ms/query, recall@10 ~82%
-- probes=50: ~28ms/query, recall@10 ~92%
-- probes=100: ~55ms/query, recall@10 ~96%
-- probes=200: ~110ms/query, recall@10 ~98%
-- Set in postgresql.conf (persists between sessions):
-- ivfflat.probes = 10 (global default)
-- Override per session/transaction:
BEGIN;
SET LOCAL ivfflat.probes = 50; -- only for this transaction
SELECT id, content FROM documents ORDER BY embedding <=> query_vec LIMIT 5;
COMMIT;
-- Application-level tuning in Python with psycopg2:
with conn.cursor() as cur:
    cur.execute("SET ivfflat.probes = %s", (probes,))
    cur.execute("""
        SELECT id, content, 1 - (embedding <=> %s::vector) AS similarity
        FROM documents
        ORDER BY embedding <=> %s::vector
        LIMIT %s
    """, (query_vec, query_vec, top_k))
    results = cur.fetchall()
Direct Comparison: HNSW vs IVFFlat
Choosing between HNSW and IVFFlat is not always obvious. This table summarizes the key trade-offs with concrete data to help you decide.
| Feature | HNSW (m=16, ef=64) | IVFFlat (lists=1000) | Recommendation |
|---|---|---|---|
| Query latency (p50) | ~10ms (ef_search=40) | ~6ms (probes=10) | IVFFlat faster with low probes |
| Recall@10 at equal latency | ~92% | ~82% | HNSW better recall |
| Build time (1M vectors) | ~45 min | ~10 min | IVFFlat 4x faster |
| Index memory | ~12 GB (1M x 1536) | ~7 GB (1M x 1536) | IVFFlat ~40% less RAM |
| Incremental inserts | Excellent, no re-training | Degrades over time | HNSW for dynamic data |
| Data required for build | None (can start empty) | Requires existing data | HNSW more flexible |
| Parallel build (PG16+) | Yes, multi-worker | Partial | HNSW scales better |
Quick Decision Guide
- HNSW is the right choice in most cases: growing datasets, RAG applications where recall is critical, environments with adequate available RAM.
- IVFFlat is preferable when: you have a near-static dataset (rarely updated), limited memory, or you need an operational index quickly (e.g. urgent proof-of-concept).
- No index (brute force) is correct under 50K vectors or when 100% guaranteed recall is required.
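The decision guide above can be codified as a tiny helper. This is a sketch of this article's heuristics, not a universal rule; the thresholds and trade-offs are the ones stated in the table and bullets.

```python
def suggest_index(n_vectors: int, mostly_static: bool, ram_fits_hnsw: bool,
                  need_exact: bool = False) -> str:
    """Codifies the article's quick decision guide (thresholds from the text)."""
    if need_exact or n_vectors < 50_000:
        return "none (brute force)"      # exact search is cheap enough
    if mostly_static and not ram_fits_hnsw:
        return "ivfflat"                 # static data, tight memory budget
    return "hnsw"                        # the default for dynamic RAG workloads

print(suggest_index(1_000_000, mostly_static=False, ram_fits_hnsw=True))  # hnsw
```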
Monitoring Index Health in Production
Index Status and Usage
-- Complete monitoring dashboard for vector indexes
SELECT
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    -- Usage
    idx_scan AS queries_using_index,
    idx_tup_read AS tuples_read_from_index,
    idx_tup_fetch AS tuples_actually_returned,
    -- Efficiency
    CASE
        WHEN idx_scan > 0 THEN ROUND(idx_tup_fetch::numeric / idx_scan, 1)
        ELSE 0
    END AS avg_tuples_per_query,
    -- Size
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'documents'
ORDER BY idx_scan DESC;
-- Check if the index is in cache (shared_buffers)
-- Requires pg_buffercache extension:
CREATE EXTENSION IF NOT EXISTS pg_buffercache;
SELECT
relname,
pg_size_pretty(pg_relation_size(oid)) AS size,
ROUND(
(SELECT COUNT(*) FROM pg_buffercache WHERE relfilenode = pg_relation_filenode(oid))::numeric
/ NULLIF(pg_relation_size(oid) / 8192, 0) * 100, 2
) AS pct_in_shared_buffers
FROM pg_class
WHERE relname LIKE '%hnsw%' OR relname LIKE '%ivfflat%';
-- If <50% in cache, queries will be significantly slower (I/O bound)
-- Solution: increase shared_buffers or use pg_prewarm
-- Recent slow queries involving vector search (requires pg_stat_statements):
SELECT
LEFT(query, 100) AS query_short,
calls,
ROUND(mean_exec_time::numeric, 2) AS mean_ms,
ROUND(max_exec_time::numeric, 2) AS max_ms,
ROUND(total_exec_time::numeric / 1000, 2) AS total_sec
FROM pg_stat_statements
WHERE query LIKE '%<=>%' -- queries with the vector distance operator
ORDER BY mean_exec_time DESC
LIMIT 10;
pg_prewarm: Load Index into Cache
-- pg_prewarm extension: loads indexes into shared_buffers on startup
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
-- Load the HNSW index into cache immediately
SELECT pg_prewarm('documents_hnsw_idx');
-- Returns: number of blocks loaded
-- Check how much memory was used
SELECT
    pg_size_pretty(pg_relation_size('documents_hnsw_idx')) AS index_size,
    current_setting('shared_buffers') AS shared_buffers,
    ROUND(
        pg_relation_size('documents_hnsw_idx')::numeric /
        pg_size_bytes(current_setting('shared_buffers')) * 100, 1
    ) AS pct_of_shared_buffers;
-- Configure automatic prewarming at PostgreSQL startup
-- in postgresql.conf:
-- shared_preload_libraries = 'pg_prewarm'
-- pg_prewarm.autoprewarm = on
-- pg_prewarm.autoprewarm_interval = 300 -- save state every 5 minutes
-- This ensures that after a restart, the index is reloaded into cache
-- automatically using the state saved before shutdown.
-- Prewarm all vector indexes at once:
SELECT pg_prewarm(indexrelid::regclass)
FROM pg_stat_user_indexes
WHERE relname = 'documents'
  AND (indexrelname LIKE '%hnsw%' OR indexrelname LIKE '%ivfflat%');
Index Degradation: The Incremental Insert Problem
A critical aspect often overlooked: ANN indexes degrade over time with inserts. HNSW adds new nodes to the existing structure, but the connection quality of these nodes is inferior compared to a full rebuild. IVFFlat degrades even more noticeably because new vectors are assigned to existing clusters that are no longer optimal for the updated data distribution.
Measuring Degradation
-- Monitor recall over time after inserts
-- Save recall periodically in a monitoring table
CREATE TABLE IF NOT EXISTS index_quality_log (
measured_at TIMESTAMPTZ DEFAULT NOW(),
index_name TEXT,
n_rows BIGINT,
recall_at_10 FLOAT,
p50_ms FLOAT,
p95_ms FLOAT,
pct_cache FLOAT -- % of index in shared_buffers
);
-- Automated measurement function
CREATE OR REPLACE FUNCTION measure_index_quality(
    p_index_name TEXT,
    p_table_name TEXT
) RETURNS void AS $$
DECLARE
    v_n_rows BIGINT;
    v_cache_pct FLOAT;
BEGIN
    -- Count current rows
    EXECUTE format('SELECT COUNT(*) FROM %I', p_table_name) INTO v_n_rows;
    -- Calculate % in cache (approximation)
    SELECT ROUND(
        (SELECT COUNT(*) FROM pg_buffercache
         WHERE relfilenode = pg_relation_filenode(p_index_name::regclass))::numeric
        / NULLIF(pg_relation_size(p_index_name::regclass) / 8192, 0) * 100, 2
    ) INTO v_cache_pct;
    -- Insert log (recall must be measured externally with a test set)
    INSERT INTO index_quality_log (index_name, n_rows, pct_cache)
    VALUES (p_index_name, v_n_rows, v_cache_pct);
    RAISE NOTICE 'Quality log: index=%, rows=%, cache=%', p_index_name, v_n_rows, v_cache_pct;
END;
$$ LANGUAGE plpgsql;
-- Query to view degradation over time
SELECT
measured_at::date AS date,
n_rows AS rows,
recall_at_10 AS "Recall@10",
p95_ms AS "P95 latency (ms)",
pct_cache AS "% In Cache"
FROM index_quality_log
WHERE index_name = 'documents_hnsw_idx'
ORDER BY measured_at;
-- Alert thresholds (set up notifications when exceeded):
-- recall_at_10 < 0.85 -> urgent REINDEX needed
-- recall_at_10 < 0.90 -> schedule REINDEX within 1 week
-- p95_ms > 100 -> check if index is in cache
-- pct_cache < 50% -> increase shared_buffers or run pg_prewarm
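The alert thresholds above translate directly into monitoring code. A minimal sketch, using exactly the threshold values from the comments; wiring the returned messages into your own alerting system is left as an assumption.

```python
def index_health_alerts(recall_at_10: float, p95_ms: float, pct_cache: float) -> list:
    """Apply the article's alert thresholds to one index_quality_log row."""
    alerts = []
    if recall_at_10 < 0.85:
        alerts.append("urgent REINDEX needed")
    elif recall_at_10 < 0.90:
        alerts.append("schedule REINDEX within 1 week")
    if p95_ms > 100:
        alerts.append("check if index is in cache")
    if pct_cache < 50:
        alerts.append("increase shared_buffers or run pg_prewarm")
    return alerts

print(index_health_alerts(0.88, 120, 45))
```

Run it on each new row of `index_quality_log` from a cron job or your metrics pipeline.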
REINDEX CONCURRENTLY: Rebuild Without Downtime
-- REINDEX CONCURRENTLY rebuilds the index without blocking read queries
-- Requires PostgreSQL 12+ and takes longer than regular REINDEX
-- During the rebuild, queries continue using the old index
-- METHOD 1: Direct REINDEX (simpler, PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY documents_hnsw_idx;
-- Pro: simple
-- Con: cannot change parameters during rebuild
-- METHOD 2: Swap with temporary index (more flexible)
-- Step 1: Create a new index with optimized parameters
CREATE INDEX CONCURRENTLY documents_hnsw_new
ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m=16, ef_construction=128); -- improved ef_construction!
-- Step 2: Verify the new index was built correctly
SELECT
    c.relname AS index_name,
    i.indisvalid AS is_valid,
    pg_size_pretty(pg_relation_size(c.oid)) AS size
FROM pg_class c
JOIN pg_index i ON i.indexrelid = c.oid
WHERE c.relname IN ('documents_hnsw_idx', 'documents_hnsw_new');
-- Step 3: Atomic swap (only a brief exclusive lock)
BEGIN;
DROP INDEX documents_hnsw_idx; -- very brief exclusive lock
ALTER INDEX documents_hnsw_new RENAME TO documents_hnsw_idx;
COMMIT;
-- When to rebuild?
-- After >20% inserts/updates relative to original size
-- If recall < 0.85 (measured with test set)
-- After large batch deletions (>30% of rows)
-- Recommended schedule: weekly for highly dynamic datasets,
-- monthly for stable datasets
Multi-Index Strategies
In complex production scenarios, you may need multiple indexes for different access patterns. PostgreSQL with pgvector supports multiple vector indexes on the same embedding column, and the query planner automatically chooses the most appropriate one.
-- Strategy 1: Partial indexes by document type
-- Advantages: each index is smaller, faster, occupies less RAM
CREATE INDEX idx_hnsw_docs_pdf
ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m=16, ef_construction=64)
WHERE source_type = 'pdf';
CREATE INDEX idx_hnsw_docs_web
ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m=16, ef_construction=64)
WHERE source_type IN ('html', 'md');
-- Query that automatically activates the partial index:
EXPLAIN SELECT id, content
FROM documents
WHERE source_type = 'pdf' -- this condition activates idx_hnsw_docs_pdf
ORDER BY embedding <=> '[...]'::vector
LIMIT 5;
-- Output: Index Scan using idx_hnsw_docs_pdf
-- Strategy 2: Indexes for different dimensions (Matryoshka embeddings / MRL)
-- text-embedding-3-small supports 512 and 1536 dimensions
ALTER TABLE documents ADD COLUMN IF NOT EXISTS embedding_512 vector(512);
ALTER TABLE documents ADD COLUMN IF NOT EXISTS embedding_1536 vector(1536);
CREATE INDEX idx_hnsw_512
ON documents USING hnsw (embedding_512 vector_cosine_ops)
WITH (m=16, ef_construction=64);
CREATE INDEX idx_hnsw_1536
ON documents USING hnsw (embedding_1536 vector_cosine_ops)
WITH (m=32, ef_construction=128); -- higher quality for the full version
-- Queries using the appropriate version:
-- Fast search (autocomplete, 3x faster, ~95% quality):
SELECT id, content, embedding_512 <=> query_512 AS dist
FROM documents
ORDER BY embedding_512 <=> query_512 LIMIT 20;
-- Precise search (RAG):
SELECT id, content, embedding_1536 <=> query_1536 AS dist
FROM documents
WHERE id IN (
SELECT id FROM documents
ORDER BY embedding_512 <=> query_512 LIMIT 100 -- coarse filter
)
ORDER BY embedding_1536 <=> query_1536 LIMIT 5;
-- Strategy 3: Time-based partial index (only recent documents)
-- Ideal for news, chat history, fresh document applications
-- NOTE: a partial-index predicate must be IMMUTABLE, so NOW() is not allowed.
-- Use a fixed cutoff and recreate the index on a schedule (e.g. nightly):
CREATE INDEX idx_hnsw_recent
ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m=16, ef_construction=64)
WHERE created_at > '2026-01-01'; -- advance this cutoff at each rebuild
-- New inserts matching the condition enter the index automatically;
-- periodic recreation moves the cutoff forward and drops expired documents.
-- Queries must repeat the same created_at predicate to use this index.
PostgreSQL Configuration for Vector Workloads
PostgreSQL configuration is as important as the choice of index parameters. A wrong configuration can nullify all the benefits of HNSW. Here is the complete optimal configuration for production RAG systems.
# postgresql.conf - Optimal configuration for vector search
# Apply after determining the server RAM amount
# ========================================
# MEMORY - The most critical part
# ========================================
shared_buffers = '8GB' # 25% of total RAM
# HNSW index MUST fit here
# With 32GB RAM: shared_buffers = 8GB
# With 64GB RAM: shared_buffers = 16GB
effective_cache_size = '24GB' # 75% of total RAM
# Hint for the query planner
# Does NOT allocate memory, just a hint
work_mem = '64MB' # For sort and hash operations
# Influences queries with ORDER BY + LIMIT
# Caution: each connection may use it multiple times
maintenance_work_mem = '2GB' # For CREATE INDEX (uses MUCH more than normal)
# Set to 25-50% of RAM before a rebuild
# After build, restore original value
# ========================================
# PARALLELISM
# ========================================
max_parallel_workers_per_gather = 4 # Workers per single query
max_parallel_workers = 8 # Total workers system-wide
max_parallel_maintenance_workers = 7 # For parallel CREATE INDEX (HNSW needs pgvector 0.6.0+)
parallel_tuple_cost = 0.1 # Encourages parallelism usage
parallel_setup_cost = 100 # Parallelism setup overhead
# ========================================
# pgvector SETTINGS
# ========================================
# Set at runtime or in postgresql.conf:
hnsw.ef_search = 60 # System default (override per session)
ivfflat.probes = 10 # System default
# ========================================
# WAL for heavy INSERTs
# ========================================
wal_buffers = '64MB'
max_wal_size = '4GB'
checkpoint_completion_target = 0.9
wal_compression = on # Reduces WAL I/O (useful for intensive ingestion)
# ========================================
# MONITORING
# ========================================
log_min_duration_statement = 100 # Log queries slower than 100ms
track_io_timing = on # Measure I/O time (useful for cache miss diagnosis)
track_activity_query_size = 2048 # Max query text length kept in pg_stat_activity
shared_preload_libraries = 'pg_stat_statements,pg_prewarm'
pg_stat_statements.max = 10000 # Track last 10K unique queries
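The memory ratios above (25% of RAM for shared_buffers, 75% for effective_cache_size) can be derived programmatically. A sketch; the maintenance_work_mem heuristic is my own simplification, not a value from the configuration above, and the steady-state 1-4GB range follows the comments there.

```python
def vector_pg_conf(total_ram_gb: int) -> dict:
    """Derive memory settings from the ratios above (25% / 75% of total RAM)."""
    return {
        "shared_buffers": f"{total_ram_gb // 4}GB",
        "effective_cache_size": f"{total_ram_gb * 3 // 4}GB",
        # Hypothetical heuristic: 1-4GB steady state; raise it before index builds
        "maintenance_work_mem": f"{min(4, max(1, total_ram_gb // 16))}GB",
    }

print(vector_pg_conf(32))  # shared_buffers 8GB, effective_cache_size 24GB
```

Handy for templating postgresql.conf across environments with different RAM sizes.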
Parallel Index Build: Accelerating HNSW Construction
-- pgvector 0.6.0+ supports parallel index builds for HNSW
-- Dramatically reduces build time on multi-core systems
-- Set workers for the build (more workers = faster build)
-- max = max_parallel_maintenance_workers
SET max_parallel_maintenance_workers = 7; -- use 8 CPUs total (1 leader + 7 workers)
-- Set maintenance_work_mem high for the build
SET maintenance_work_mem = '4GB';
-- Build with parallelism:
CREATE INDEX idx_hnsw_parallel
ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m=16, ef_construction=64);
-- Monitor build progress (PostgreSQL 12+):
SELECT
phase,
blocks_done,
blocks_total,
ROUND(blocks_done::numeric / NULLIF(blocks_total, 0) * 100, 1) AS pct_done,
tuples_done,
tuples_total,
ROUND(tuples_done::numeric / NULLIF(tuples_total, 0) * 100, 1) AS pct_tuples_done
FROM pg_stat_progress_create_index
WHERE relid = 'documents'::regclass;
-- Build speed comparison (1M vectors, 1536 dim, m=16, ef=64):
-- 1 worker: ~90 minutes
-- 4 workers: ~25 minutes
-- 8 workers: ~15 minutes (diminishing returns beyond 8)
-- 16 workers: ~12 minutes (minimal additional improvement)
-- Automatic build time estimate based on current data:
WITH stats AS (SELECT COUNT(*) AS n FROM documents)
SELECT
n AS num_vectors,
ROUND(n / 1000000.0, 2) AS millions,
-- Estimate with 8 workers, m=16, ef_construction=64
ROUND(n / 1000000.0 * 15, 0) || ' min' AS estimated_build_8workers
FROM stats;
Two-Phase Retrieval: Precision with Efficiency
An advanced technique to balance speed and precision is two-phase retrieval (also called coarse-to-fine search): a fast first phase using low-dimensional embeddings or a relaxed HNSW index, followed by precise re-ranking on a restricted subset of the best candidates. This often delivers better recall than a single-phase search at lower latency.
-- Two-phase retrieval for maximum efficiency
-- Phase 1: Fast coarse search with 512-dim embeddings (3x faster)
-- Phase 2: Precise re-ranking with 1536-dim embeddings (only on top-50)
WITH coarse_candidates AS (
-- Phase 1: top-50 candidates with fast 512-dim search
SELECT
id,
embedding_512 <=> %s::vector(512) AS coarse_dist
FROM documents
ORDER BY embedding_512 <=> %s::vector(512)
LIMIT 50
),
precise_ranking AS (
-- Phase 2: re-rank top-50 with precise 1536-dim embeddings
SELECT
d.id,
d.content,
d.source_path,
d.embedding_1536 <=> %s::vector(1536) AS precise_dist,
1 - (d.embedding_1536 <=> %s::vector(1536)) AS similarity
FROM documents d
INNER JOIN coarse_candidates c ON c.id = d.id
ORDER BY d.embedding_1536 <=> %s::vector(1536)
)
SELECT id, content, source_path, similarity
FROM precise_ranking
LIMIT 5;
-- Latency vs quality comparison (1M vectors):
-- Direct 1536-dim HNSW (ef_search=40): ~10ms, Recall@5 ~94%
-- Direct 1536-dim HNSW (ef_search=100): ~25ms, Recall@5 ~98%
-- Two-phase (512 coarse + 1536 rerank): ~4ms, Recall@5 ~96%
-- -> 2.5x faster with even better recall!
Managing Deletions: Vacuum and HNSW
-- Deletions in PostgreSQL are "soft deletes" (tuples marked dead)
-- The HNSW index maintains references to these dead tuples
-- VACUUM removes dead tuples and updates the index
-- Check dead tuples (indicator of VACUUM need)
SELECT
relname AS table_name,
n_live_tup AS live_rows,
n_dead_tup AS dead_rows,
ROUND(n_dead_tup::numeric / NULLIF(n_live_tup, 0) * 100, 2) AS pct_dead,
last_vacuum,
last_autovacuum,
n_mod_since_analyze AS mods_since_analyze
FROM pg_stat_user_tables
WHERE relname = 'documents';
-- If pct_dead > 10-20%, it is time to vacuum
VACUUM ANALYZE documents; -- vacuum + update statistics
-- VACUUM FULL: rebuilds the table (blocks writes, frees more space)
-- Use only in scheduled maintenance windows:
VACUUM FULL documents;
-- Aggressive autovacuum for vector tables
-- (many updates/deletes typical in RAG pipelines):
ALTER TABLE documents SET (
autovacuum_vacuum_scale_factor = 0.01, -- vacuum after 1% modified (default 20%)
autovacuum_analyze_scale_factor = 0.005, -- analyze after 0.5% (default 10%)
autovacuum_vacuum_cost_delay = 2, -- 2ms; the default since PG12 (was 20ms before)
autovacuum_vacuum_threshold = 50 -- at least 50 rows modified (default 50)
);
-- Verify autovacuum is running:
SELECT
schemaname,
relname,
last_autovacuum,
last_autoanalyze,
autovacuum_count,
autoanalyze_count
FROM pg_stat_user_tables
WHERE relname = 'documents';
Query Optimization with EXPLAIN ANALYZE
Using EXPLAIN ANALYZE regularly is essential to verify that vector search queries are using indexes correctly and to diagnose performance problems.
-- Complete analysis of a vector search query
EXPLAIN (ANALYZE, BUFFERS, TIMING, FORMAT TEXT)
SELECT id, content, embedding <=> '[0.1, 0.2, ...]'::vector(1536) AS dist
FROM documents
WHERE source_type = 'pdf'
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector(1536)
LIMIT 10;
-- Desired output (uses HNSW index):
-- Index Scan using idx_hnsw_docs_pdf on documents (cost=0.00..8.54 rows=10)
-- Index Cond: (embedding <=> '[...]'::vector)
-- Filter: (source_type = 'pdf')
-- Buffers: shared hit=247 <-- everything from cache!
-- -> Planning Time: 0.3 ms
-- -> Execution Time: 8.7 ms
-- Undesired output (brute force - to be avoided):
-- Seq Scan on documents (cost=0.00..1234.56 rows=10)
-- Sort Key: (embedding <=> '...'::vector)
-- Buffers: shared hit=1234 read=5678 <-- many reads from disk!
-- -> Execution Time: 3450 ms
-- If you see Seq Scan instead of Index Scan, check:
-- 1. Does the index exist?
SELECT indexname FROM pg_indexes WHERE tablename = 'documents';
-- 2. Is the LIMIT small enough?
-- PostgreSQL uses the index only for small LIMITs
-- 3. Are statistics up to date?
ANALYZE documents;
-- 4. Is enable_indexscan on?
SHOW enable_indexscan; -- must be 'on'
-- 5. Is ef_search appropriate?
SHOW hnsw.ef_search;
Production Checklist for Vector Indexes
- Memory sizing: verify that shared_buffers is large enough to hold the HNSW index. If the index is not in cache, queries will be 10-100x slower.
- maintenance_work_mem: set to 1-4GB before creating the index. You can reduce it after the build.
- Parallel build: use max_parallel_maintenance_workers=7 for fast builds on multi-core systems. Saves hours of downtime.
- ef_search in production: do not use the default of 40. Measure recall on your dataset and set the appropriate value (typically 60-100 for enterprise RAG).
- Recall monitoring: run recall tests weekly. If recall drops below 0.85, schedule an urgent REINDEX.
- Aggressive autovacuum: for tables with heavy inserts/deletes, lower autovacuum_vacuum_scale_factor to 0.01-0.05.
- pg_prewarm: enable auto-prewarming to ensure the index is back in cache after every PostgreSQL restart.
- EXPLAIN ANALYZE: regularly check that queries use the HNSW indexes and do not accidentally fall back to sequential scans.
Common Mistakes and How to Avoid Them
| Mistake | Symptom | Solution |
|---|---|---|
| shared_buffers too small | Slow queries (>500ms), high % disk reads in EXPLAIN | Increase to 25% of RAM; use pg_prewarm |
| Default ef_search (40) in production | Recall@10 ~92%, imprecise RAG responses | Set 60-100 for enterprise RAG |
| IVFFlat with probes=1 (default) | Recall@10 ~50-60%, completely wrong results | Set probes=10-50 according to recall target |
| No REINDEX after many inserts | Recall degrades progressively over time | Schedule weekly/monthly REINDEX CONCURRENTLY |
| Sequential scan instead of Index Scan | Very slow vector queries without index | ANALYZE table; check LIMIT and WHERE clause |
| maintenance_work_mem too low | HNSW build extremely slow, takes hours/days | SET maintenance_work_mem = '2GB' before CREATE INDEX |
Conclusions and Next Steps
Vector index tuning is a discipline that requires concrete measurements, not just intuition. Optimal parameters depend on your specific dataset, latency requirements, and available memory. The golden rule: measure first, optimize after, monitor always.
In 2026, with the "Just Use Postgres" trend increasingly consolidated, knowing how to correctly configure HNSW and IVFFlat indexes lets you achieve performance competitive with specialized vector databases like Pinecone or Qdrant, while maintaining the simplicity of a single PostgreSQL infrastructure. Recent benchmarks show pgvector up to 28x faster than Pinecone at 16x lower cost, with the right configuration.
The final article in this series tackles the ultimate challenge: taking all of this to production at scale. Partitioning for enormous datasets, connection pooling with PgBouncer, read replicas dedicated to vector search, query caching with Redis, and multi-tenant architectures that allow PostgreSQL to serve millions of vector queries per day.