pgvector Tutorial: Turn PostgreSQL into a Vector Database
In 2026, the developer community has embraced a clear mantra: "Just Use Postgres". Snowflake acquired Crunchy Data for $250 million, Databricks invested $1 billion in Neon, and Supabase reached a $5 billion valuation. PostgreSQL is no longer just a relational database: with pgvector, it becomes a full-fledged vector database, capable of handling embeddings, similarity search, and AI workloads without adding a single service to your stack.
If you are already running PostgreSQL (and statistically, you are), you do not need Pinecone, Qdrant, or Weaviate to get started with vector search. In this article, we will build a complete vector search system from scratch: installing pgvector, creating optimized indexes, generating embeddings, and integrating with LangChain.
Series Overview
| # | Article | Focus |
|---|---|---|
| 1 | You are here - pgvector | Installation, operators, indexing |
| 2 | Embeddings Deep Dive | Models, chunking, dimensions |
| 3 | RAG with PostgreSQL | End-to-end RAG pipeline |
| 4 | Hybrid Search | Vectors + full-text search |
| 5 | Scaling pgvector | Partitioning, sharding, performance |
| 6 | pgvector in Production | Monitoring, backup, CI/CD |
What You Will Learn
- What pgvector is and why it is the most important PostgreSQL extension in 2026
- How to install pgvector on Linux, Docker, and managed services (Supabase, Neon, AWS RDS)
- How to create tables with vector columns and insert embeddings
- The three distance operators: Cosine, L2, and Inner Product
- How to run effective similarity search queries
- How to generate embeddings with Python (OpenAI and Sentence Transformers)
- HNSW vs IVFFlat: which index to choose and how to configure it
- pgvector benchmarks against dedicated vector databases
- A complete end-to-end example with Python
- LangChain integration for RAG systems
1. The "Just Use Postgres" Trend in 2026
For years, the AI ecosystem pushed toward dedicated vector databases: Pinecone, Weaviate, Qdrant, Milvus. Every RAG project seemed to require a new service to manage, monitor, and pay for. But in 2025-2026, something changed dramatically.
The community realized that vectors are not a database type but a data type. Just as PostgreSQL handles JSON, arrays, geometries, and full-text search, it can also handle high-dimensional vectors. The extension that makes all of this possible is called pgvector.
Why "Just Use Postgres" in 2026
| Factor | Dedicated Vector Database | PostgreSQL + pgvector |
|---|---|---|
| Infrastructure | New service to manage | Extension in your existing DB |
| Cost | Separate plan (Pinecone from $70/month) | Included in PostgreSQL cost |
| Data consistency | Cross-DB synchronization needed | Native ACID transactions |
| Queries | Proprietary API | Standard SQL + JOIN, WHERE, GROUP BY |
| Backup | Separate system | pg_dump includes everything |
| Team | Specialized skills required | Anyone who knows SQL |
The key advantage is operational simplicity. With pgvector, your embeddings live in the same table as your relational data. You can JOIN vectors with metadata, use WHERE clauses to filter, and have transactional consistency. No external service to synchronize, no different API to learn.
2. What Is pgvector
pgvector is an open-source extension for PostgreSQL that adds native support for high-dimensional vectors and similarity search. Created by Andrew Kane in 2021, it has quickly become the most downloaded PostgreSQL extension for AI/ML use cases.
With pgvector you can:
- Store vectors (embeddings) as a native data type in a table column
- Run similarity searches using three different distance metrics
- Create specialized indexes (HNSW and IVFFlat) for fast lookups
- Combine vector search with everything PostgreSQL offers: JOINs, transactions, triggers, materialized views
pgvector by the Numbers (2026)
| Metric | Value |
|---|---|
| Current version | 0.8.0+ (with iterative index scans) |
| Max vector dimensions | 16,000 stored (up to 2,000 indexable; 4,000 with halfvec) |
| Index types | HNSW, IVFFlat |
| Vector types | vector, halfvec, sparsevec, bit |
| Distance metrics | Cosine, L2 (Euclidean), Inner Product |
| Supported by | AWS RDS, Supabase, Neon, Google Cloud SQL, Azure |
| License | PostgreSQL License (open-source) |
Version 0.8.0, released in 2025, introduced a key feature: iterative index scans. This solves the overfiltering problem, where a vector index fails to return enough results because most candidates are discarded by WHERE filters. With iterative scans, pgvector continues exploring the index until it finds enough results that satisfy the filters, dramatically improving filtered query quality.
3. Installing pgvector
pgvector can be installed in several ways depending on your environment. Let us walk through the three main approaches: native Linux installation, Docker, and managed services.
3.1 Installation on Ubuntu/Debian
```bash
# Prerequisites: PostgreSQL 13+ already installed
sudo apt-get update
sudo apt-get install -y postgresql-server-dev-16

# Clone and build pgvector
git clone --branch v0.8.0 https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install

# Enable the extension in your database
sudo -u postgres psql -d your_database -c "CREATE EXTENSION vector;"
```
For PostgreSQL 17, replace postgresql-server-dev-16 with
postgresql-server-dev-17. pgvector supports PostgreSQL versions 13 through 17.
3.2 Installation with Docker
The fastest way to get started. The official pgvector/pgvector image includes
PostgreSQL with pgvector pre-installed.
```yaml
# docker-compose.yml
version: '3.8'
services:
  postgres:
    image: pgvector/pgvector:pg17
    environment:
      POSTGRES_DB: vectordb
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: secret_password
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
volumes:
  pgdata:
```

```sql
-- init.sql: automatically executed on first startup
CREATE EXTENSION IF NOT EXISTS vector;
```
```bash
# Quick start without docker-compose
docker run -d \
  --name pgvector-db \
  -e POSTGRES_DB=vectordb \
  -e POSTGRES_PASSWORD=secret \
  -p 5432:5432 \
  pgvector/pgvector:pg17

# Connect and enable pgvector
docker exec -it pgvector-db psql -U postgres -d vectordb \
  -c "CREATE EXTENSION vector;"
```
3.3 Managed Services (Zero Setup)
If you do not want to manage infrastructure, several cloud providers offer PostgreSQL with pgvector already enabled. This is the ideal choice for production deployments.
Managed Services with pgvector
| Service | pgvector Included | Free Tier | Notes |
|---|---|---|---|
| Supabase | Pre-installed | Yes (500 MB) | Dashboard UI, automatic REST APIs |
| Neon | Pre-installed | Yes (0.5 GB) | Serverless, database branching |
| AWS RDS | Must enable | No | pgvector 0.8.0 on Aurora PostgreSQL |
| Google Cloud SQL | Must enable | No | pgvector 0.8.0 support |
| Azure Database | Must enable | No | Flexible Server with pgvector |
```sql
-- On Supabase, pgvector is already available.
-- Just enable the extension:
CREATE EXTENSION IF NOT EXISTS vector;

-- Verify the installation
SELECT extversion FROM pg_extension WHERE extname = 'vector';
-- Result: 0.8.0
```
4. Creating a Table with Vectors
With pgvector installed, you can add vector(n) columns to your tables,
where n is the vector dimensionality. The value of n depends
on the embedding model you use: OpenAI text-embedding-3-small produces
1536-dimensional vectors, while text-embedding-3-large goes up to 3072.
```sql
-- Enable the extension (if not already done)
CREATE EXTENSION IF NOT EXISTS vector;

-- Documents table for a RAG system
CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    title VARCHAR(500) NOT NULL,
    content TEXT NOT NULL,
    source VARCHAR(255),
    category VARCHAR(100),
    embedding vector(1536),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Add a column comment for documentation
COMMENT ON COLUMN documents.embedding IS
    'Embedding generated with OpenAI text-embedding-3-small (1536 dim)';
```
Inserting Documents with Embeddings
Insertion works like a regular INSERT, passing the vector as a comma-separated string of values within square brackets.
```sql
-- Single insert
INSERT INTO documents (title, content, source, embedding)
VALUES (
    'Introduction to PostgreSQL',
    'PostgreSQL is a powerful and reliable open-source relational database...',
    'docs/postgres-intro.md',
    '[0.023, -0.045, 0.067, 0.089, -0.012, ...]' -- 1536-dimensional vector
);

-- Batch insert
INSERT INTO documents (title, content, source, category, embedding)
VALUES
    ('Chapter 1', 'Chapter 1 content...', 'book.pdf', 'tutorial',
     '[0.11, -0.22, 0.33, ...]'),
    ('Chapter 2', 'Chapter 2 content...', 'book.pdf', 'tutorial',
     '[0.44, -0.55, 0.66, ...]'),
    ('FAQ', 'Frequently asked questions...', 'faq.md', 'support',
     '[-0.12, 0.34, -0.56, ...]');
```
Vector Dimensionality
The dimensionality specified in the column definition (vector(1536)) must
exactly match the dimension of the embeddings you insert. Attempting to insert a
768-dimensional vector into a vector(1536) column will raise an error.
Choose the dimension based on the embedding model you intend to use.
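Because a dimension mismatch only surfaces as a database error at insert time, it can be worth validating client-side first. A minimal sketch (the `validate_embedding` helper is hypothetical, not part of pgvector or psycopg2):

```python
def validate_embedding(embedding: list[float], expected_dim: int = 1536) -> str:
    """Check dimensionality and serialize to pgvector's '[x, y, ...]' text format."""
    if len(embedding) != expected_dim:
        raise ValueError(
            f"expected {expected_dim} dimensions, got {len(embedding)}"
        )
    return "[" + ", ".join(str(x) for x in embedding) + "]"

# A 1536-dimensional vector serializes cleanly...
ok = validate_embedding([0.0] * 1536)

# ...while a 768-dimensional one fails fast, before PostgreSQL rejects it
try:
    validate_embedding([0.0] * 768)
except ValueError as e:
    print(e)  # expected 1536 dimensions, got 768
```

The returned string can be passed directly as the `%s` parameter of an INSERT, exactly like the bracketed literals above.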
Popular Embedding Model Dimensions
| Model | Provider | Dimensions | Cost / 1M tokens |
|---|---|---|---|
| text-embedding-3-small | OpenAI | 1536 | $0.02 |
| text-embedding-3-large | OpenAI | 3072 | $0.13 |
| all-MiniLM-L6-v2 | HuggingFace (free) | 384 | Free (local) |
| all-mpnet-base-v2 | HuggingFace (free) | 768 | Free (local) |
| voyage-3 | Voyage AI | 1024 | $0.06 |
5. Distance Operators
pgvector provides three operators for computing distance between vectors. Each operator corresponds to a different mathematical metric and has specific use cases.
The Three pgvector Operators
| Operator | Metric | Formula | Ordering | Use Case |
|---|---|---|---|---|
| `<=>` | Cosine Distance | 1 - cosine_similarity | ASC (0 = identical) | Text, normalized embeddings |
| `<->` | L2 (Euclidean) | sqrt(sum((a-b)^2)) | ASC (0 = identical) | Images, spatial data |
| `<#>` | Inner Product (negated) | -1 * dot_product | ASC (more negative = more similar) | Normalized vectors, max performance |
```sql
-- COSINE DISTANCE <=> (most commonly used for text)
-- Result: 0 = identical, 2 = opposite
SELECT title, content,
       embedding <=> '[0.1, 0.2, ...]' AS cosine_distance
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]' ASC
LIMIT 5;

-- L2 (EUCLIDEAN) <->
-- Result: 0 = identical, increasing values = farther apart
SELECT title, content,
       embedding <-> '[0.1, 0.2, ...]' AS l2_distance
FROM documents
ORDER BY embedding <-> '[0.1, 0.2, ...]' ASC
LIMIT 5;

-- INNER PRODUCT (NEGATED) <#>
-- Result: more negative values = more similar
SELECT title, content,
       (embedding <#> '[0.1, 0.2, ...]') * -1 AS similarity
FROM documents
ORDER BY embedding <#> '[0.1, 0.2, ...]' ASC
LIMIT 5;
```
Which Operator to Choose
For 90% of text and AI use cases, use <=> (cosine distance).
Embedding models like those from OpenAI produce normalized vectors, making cosine
distance and inner product equivalent in terms of results. Use <->
(L2) only when working with spatial data or images where vector magnitude matters.
Inner product (<#>) is slightly faster than cosine because it skips
normalization, but only on already-normalized vectors.
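The claimed equivalence is easy to verify in plain Python: on unit-length vectors, cosine distance is `1 - dot` and negated inner product is `-dot`, so both are monotonic in the dot product and produce the same ranking. A stdlib-only sketch (toy 3-dimensional vectors, not real embeddings):

```python
import math

def cosine_distance(a, b):
    """What pgvector's <=> computes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return 1 - dot / norm

def neg_inner_product(a, b):
    """What pgvector's <#> computes."""
    return -sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

query = normalize([0.3, 0.4, 0.5])
docs = [normalize(v) for v in ([0.3, 0.4, 0.51], [0.9, -0.1, 0.2], [-0.3, -0.4, -0.5])]

# Ranking by <=> and by <#> is identical on normalized vectors
by_cosine = sorted(range(3), key=lambda i: cosine_distance(query, docs[i]))
by_ip = sorted(range(3), key=lambda i: neg_inner_product(query, docs[i]))
print(by_cosine == by_ip)  # True
```

This is why, for normalized embeddings, switching from `<=>` to `<#>` changes the distance values but never the order of results.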
6. Similarity Search Queries
Similarity search is the core of pgvector. The basic query is straightforward: order results by distance from the query vector and take the top K. But in production, queries become more sophisticated by combining vector search with SQL filters.
6.1 Basic Query
```sql
-- Find the 5 most similar documents to a query vector
SELECT
    id,
    title,
    content,
    1 - (embedding <=> $1) AS similarity  -- $1 = query vector
FROM documents
ORDER BY embedding <=> $1
LIMIT 5;
```
6.2 Queries with Filters (pgvector's Real Power)
This is where pgvector shines compared to dedicated vector databases. You can combine vector search with any SQL clause: WHERE, JOIN, GROUP BY, subqueries.
```sql
-- Filter by category and date
SELECT
    id, title, content,
    1 - (embedding <=> $1) AS similarity
FROM documents
WHERE category = 'tutorial'
  AND created_at > NOW() - INTERVAL '30 days'
ORDER BY embedding <=> $1
LIMIT 5;

-- JOIN with other tables: find similar docs with author info
SELECT
    d.title,
    d.content,
    a.name AS author_name,
    1 - (d.embedding <=> $1) AS similarity
FROM documents d
JOIN authors a ON d.author_id = a.id
WHERE a.verified = true
ORDER BY d.embedding <=> $1
LIMIT 10;

-- Exclusion: find similar but NOT the document itself
SELECT
    id, title,
    1 - (embedding <=> (SELECT embedding FROM documents WHERE id = 42)) AS similarity
FROM documents
WHERE id != 42
ORDER BY embedding <=> (SELECT embedding FROM documents WHERE id = 42)
LIMIT 5;
```
6.3 Iterative Index Scans (pgvector 0.8.0)
One of the most important additions in version 0.8.0 is support for iterative index scans. When you combine vector search with restrictive WHERE filters, the index might not find enough results on the first pass. With iterative scans, pgvector continues exploring the index until it satisfies the requested LIMIT.
```sql
-- Enable iterative scan for HNSW
SET hnsw.iterative_scan = relaxed_order;  -- or strict_order

-- Set the maximum number of tuples to scan
SET hnsw.max_scan_tuples = 20000;

-- Now queries with restrictive filters work correctly
SELECT id, title,
       1 - (embedding <=> $1) AS similarity
FROM documents
WHERE category = 'rare'  -- category with few documents
ORDER BY embedding <=> $1
LIMIT 10;

-- Without iterative scan: might return 0-3 results
-- With iterative scan: always returns up to 10 results

-- Available modes:
--   off           = traditional behavior (default)
--   strict_order  = maintains exact distance ordering
--   relaxed_order = approximate ordering (better performance)
```
When to Use Iterative Scans
- relaxed_order: for most cases, offers the best balance between performance and quality
- strict_order: when exact distance ordering is critical (e.g., ranking results for users)
- off: when you have no WHERE filters or when filters are not restrictive
7. Generating Embeddings with Python
To insert data into pgvector, you first need to transform text into numerical vectors using an embedding model. Let us look at two approaches: the OpenAI API (paid, high quality) and Sentence Transformers (free, local).
7.1 With the OpenAI API
```python
import openai
import psycopg2

# Configure the OpenAI client
client = openai.OpenAI(api_key="sk-...")

def get_embedding(text: str, model: str = "text-embedding-3-small") -> list[float]:
    """Generate an embedding for a text using OpenAI."""
    response = client.embeddings.create(
        input=text,
        model=model,
    )
    return response.data[0].embedding  # list of 1536 floats

# Connect to PostgreSQL
conn = psycopg2.connect(
    host="localhost",
    database="vectordb",
    user="admin",
    password="secret_password",
)
cur = conn.cursor()

# Generate and insert embeddings
documents = [
    ("Introduction to Python", "Python is a versatile programming language..."),
    ("Advanced PostgreSQL", "Recursive CTEs allow hierarchical queries..."),
    ("Docker for Developers", "Docker containers isolate applications..."),
]
for title, content in documents:
    embedding = get_embedding(content)
    cur.execute(
        "INSERT INTO documents (title, content, embedding) VALUES (%s, %s, %s)",
        (title, content, str(embedding)),
    )
conn.commit()
cur.close()
conn.close()
```
7.2 With Sentence Transformers (Free and Local)
```python
from sentence_transformers import SentenceTransformer
import psycopg2

# Load the model (automatically downloaded the first time)
model = SentenceTransformer('all-MiniLM-L6-v2')  # 384 dimensions

# Generate embeddings for a batch of texts
texts = [
    "Python is a versatile programming language",
    "Recursive CTEs enable hierarchical queries in SQL",
    "Docker containers isolate applications",
]
embeddings = model.encode(texts)  # shape: (3, 384)

# Connect and insert
conn = psycopg2.connect(
    host="localhost", database="vectordb",
    user="admin", password="secret_password",
)
cur = conn.cursor()

# NOTE: the column must be vector(384), not vector(1536)
for text, embedding in zip(texts, embeddings):
    cur.execute(
        "INSERT INTO documents (title, content, embedding) VALUES (%s, %s, %s)",
        (text[:100], text, str(embedding.tolist())),
    )
conn.commit()
cur.close()
conn.close()
```
OpenAI vs Sentence Transformers
| Aspect | OpenAI API | Sentence Transformers |
|---|---|---|
| Cost | $0.02 / 1M tokens | Free |
| Quality | Excellent (state of the art) | Good (sufficient for many use cases) |
| Privacy | Data sent to OpenAI | Fully local, no data leaves your server |
| Speed | ~100ms per request (network) | ~10ms per text (with GPU) |
| Offline | No (requires internet) | Yes (after first download) |
| Dimensions | 1536 or 3072 | 384 or 768 (model-dependent) |
The Golden Rule of Embeddings
Always use the same embedding model for both indexing documents and
generating the query vector. Embeddings from different models live in different vector
spaces and are not comparable. If you index with text-embedding-3-small,
you must search with the same model.
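One lightweight way to enforce this rule in application code is to pin the model name, the dimensionality, and the encoder together in a single object, so index-time and query-time embeddings cannot come from different models. A sketch with hypothetical names (`EmbeddingSpace` and `fake_encoder` are illustrative, not a real library API):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class EmbeddingSpace:
    """Pins one embedding model to one table, so index and query vectors always match."""
    model_name: str
    dimensions: int
    encode: Callable[[str], list[float]]

    def embed(self, text: str) -> list[float]:
        vec = self.encode(text)
        if len(vec) != self.dimensions:
            raise ValueError(
                f"{self.model_name} returned {len(vec)} dims, expected {self.dimensions}"
            )
        return vec

# Toy encoder standing in for a real model (e.g. OpenAI or Sentence Transformers)
def fake_encoder(text: str) -> list[float]:
    return [float(len(text)), 0.0, 1.0]

space = EmbeddingSpace("fake-model-3d", 3, fake_encoder)

# Both document and query vectors go through the same pinned model
doc_vec = space.embed("pgvector turns PostgreSQL into a vector database")
query_vec = space.embed("what is pgvector?")
print(len(doc_vec) == len(query_vec) == 3)  # True
```

In a real system, the `model_name` would also be stored alongside the table (as in the COMMENT ON COLUMN example earlier), so a model change forces a deliberate re-embedding of the corpus.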
8. Indexing: HNSW vs IVFFlat
Without an index, pgvector performs a sequential scan (brute force), comparing the query vector with every row in the table. This is 100% accurate but becomes slow with large datasets. ANN (Approximate Nearest Neighbor) indexes sacrifice a small percentage of accuracy for searches that are orders of magnitude faster.
8.1 HNSW Index
HNSW (Hierarchical Navigable Small World) builds a multi-layered graph of connections between vectors. It is the recommended index for most use cases: fast reads, good accuracy, and it supports incremental inserts without rebuilding.
```sql
-- HNSW index with default parameters
CREATE INDEX idx_documents_embedding_hnsw
    ON documents
    USING hnsw (embedding vector_cosine_ops);

-- HNSW index with custom parameters
-- (index names must be unique: drop the default one first)
DROP INDEX IF EXISTS idx_documents_embedding_hnsw;
CREATE INDEX idx_documents_embedding_hnsw
    ON documents
    USING hnsw (embedding vector_cosine_ops)
    WITH (
        m = 24,                -- connections per node (default: 16, range: 2-100)
        ef_construction = 128  -- candidates during build (default: 64, range: 4-1000)
    );

-- Tune ef_search for queries (default: 40)
SET hnsw.ef_search = 100;  -- higher = more accurate but slower
```
HNSW Parameters: Configuration Guide
| Parameter | Default | Range | Effect | Recommendation |
|---|---|---|---|---|
| `m` | 16 | 2 - 100 | More connections = more accurate, more memory | 16-32 for most cases |
| `ef_construction` | 64 | 4 - 1000 | Higher = better index, slower build | 64-256 (one-time cost, worth increasing) |
| `ef_search` | 40 | 1 - 1000 | Higher = more accurate search, slower | 40-200 based on desired trade-off |
8.2 IVFFlat Index
IVFFlat (Inverted File with Flat compression) divides vectors into clusters and searches only the clusters closest to the query. It builds faster than HNSW but requires rebuilding when you add significant new data. Ideal for datasets that change infrequently.
```sql
-- IMPORTANT: IVFFlat requires data in the table BEFORE creating the index
-- (it needs to perform k-means clustering on existing data)

-- IVFFlat index
CREATE INDEX idx_documents_embedding_ivf
    ON documents
    USING ivfflat (embedding vector_cosine_ops)
    WITH (
        lists = 100  -- number of clusters (default: 100)
    );

-- Tune the number of clusters to explore during search
SET ivfflat.probes = 10;  -- default: 1, recommended: sqrt(lists)

-- Rule of thumb for "lists":
--   rows <  1M: lists = rows / 1000
--   rows >= 1M: lists = sqrt(rows)
```
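The rule of thumb above translates directly into a small helper for sizing an IVFFlat index (the `ivfflat_params` function is a hypothetical convenience, not part of pgvector):

```python
import math

def ivfflat_params(row_count: int) -> tuple[int, int]:
    """Suggest (lists, probes) for an IVFFlat index:
    lists = rows/1000 below 1M rows, sqrt(rows) at 1M and above;
    probes = sqrt(lists), the commonly recommended starting point."""
    if row_count < 1_000_000:
        lists = max(1, row_count // 1000)
    else:
        lists = round(math.sqrt(row_count))
    probes = max(1, round(math.sqrt(lists)))
    return lists, probes

print(ivfflat_params(500_000))    # (500, 22)
print(ivfflat_params(4_000_000))  # (2000, 45)
```

The suggested values would then be applied with `WITH (lists = ...)` at index creation and `SET ivfflat.probes = ...` at query time; treat them as a starting point to tune against your own recall measurements.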
8.3 Operator to Index Class Mapping
The operator specified when creating the index must match the operator used in your queries. Here is the complete mapping.
Operator and Index Class Mapping
| Distance | SQL Operator | Index Class (ops) |
|---|---|---|
| Cosine | `<=>` | `vector_cosine_ops` |
| L2 (Euclidean) | `<->` | `vector_l2_ops` |
| Inner Product | `<#>` | `vector_ip_ops` |
8.4 HNSW vs IVFFlat: Which to Choose
HNSW vs IVFFlat Comparison
| Aspect | HNSW | IVFFlat |
|---|---|---|
| Search speed | Faster | Fast |
| Accuracy (recall) | ~95-99% | ~90-98% |
| Build time | Slow | Fast |
| Memory | High | Medium |
| Incremental inserts | Supported | Requires REINDEX |
| Iterative scans | Yes (0.8.0) | Yes (0.8.0) |
| Ideal use case | General purpose, growing data | Static datasets, fast builds |
Practical Advice
For most projects, choose HNSW. It is more versatile, supports incremental inserts without rebuilding, and pgvector 0.8.0 has significantly improved its performance. Use IVFFlat only if index build time is critical (very large datasets loaded once) or if you have strict memory constraints.
9. Performance: pgvector vs Dedicated Vector Databases
The most common question is: "Is pgvector fast enough for production?" The answer in 2026 is a clear yes, for the vast majority of use cases. Let us look at the numbers.
9.1 Large-Scale Benchmarks
Benchmarks from pgvectorscale (a complementary extension developed by Timescale) on 50 million vectors show impressive results:
Benchmark: 50M Vectors at 99% Recall
| System | QPS (Queries/sec) | Recall | Notes |
|---|---|---|---|
| pgvectorscale | 471 | 99% | 11.4x faster than Qdrant |
| Qdrant | 41 | 99% | Optimized for pure vector search |
9.2 Mid-Scale Benchmarks (1M Vectors)
Benchmark: 1M Vectors - Throughput
| System | Queries/sec | Inserts/sec | Managed Cost |
|---|---|---|---|
| Pinecone | ~5,000 | ~50,000 | From $70/month |
| Qdrant | ~4,500 | ~45,000 | Self-hosted or cloud |
| pgvector | ~3,000 | ~30,000 | Included in PG cost |
At 1M vectors, pgvector is slightly slower in raw throughput. But considering that the cost is virtually zero (already included in your PostgreSQL), the value-for-money ratio is unbeatable. For most web applications, 3,000 queries per second is more than enough.
When pgvector Is Not Enough
Consider a dedicated vector database (Qdrant, Pinecone) when:
- You have over 50 million vectors
- You need p99 latencies under 5ms
- Your workload is exclusively vector search (no relational data)
- You need native distributed sharding
For everything else, pgvector is the right choice. A system under 10 million vectors with PostgreSQL already in the stack does not need additional complexity.
10. Complete Example: Knowledge Base with Python
Let us put everything together in a practical example. We will build a knowledge base that indexes documents with embeddings, enables similarity search, and supports filtering by category.
"""
Knowledge Base with PostgreSQL + pgvector
Requirements: pip install openai psycopg2-binary
"""
import openai
import psycopg2
from dataclasses import dataclass
# === Configuration ===
OPENAI_API_KEY = "sk-..."
DB_CONFIG = {
"host": "localhost",
"database": "vectordb",
"user": "admin",
"password": "secret_password",
}
EMBEDDING_MODEL = "text-embedding-3-small"
EMBEDDING_DIM = 1536
@dataclass(frozen=True)
class SearchResult:
id: int
title: str
content: str
similarity: float
category: str
# === Embedding functions ===
client = openai.OpenAI(api_key=OPENAI_API_KEY)
def get_embedding(text: str) -> list[float]:
"""Generate an embedding for a text."""
response = client.embeddings.create(
input=text,
model=EMBEDDING_MODEL,
)
return response.data[0].embedding
def get_embeddings_batch(texts: list[str]) -> list[list[float]]:
"""Generate embeddings for a batch of texts (more efficient)."""
response = client.embeddings.create(
input=texts,
model=EMBEDDING_MODEL,
)
return [item.embedding for item in response.data]
# === Database setup ===
def setup_database() -> None:
"""Create the table and indexes."""
conn = psycopg2.connect(**DB_CONFIG)
cur = conn.cursor()
cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute(f"""
CREATE TABLE IF NOT EXISTS kb_documents (
id BIGSERIAL PRIMARY KEY,
title VARCHAR(500) NOT NULL,
content TEXT NOT NULL,
category VARCHAR(100) DEFAULT 'general',
embedding vector({EMBEDDING_DIM}),
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
""")
cur.execute("""
CREATE INDEX IF NOT EXISTS idx_kb_embedding_hnsw
ON kb_documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 24, ef_construction = 128);
""")
cur.execute("""
CREATE INDEX IF NOT EXISTS idx_kb_category
ON kb_documents (category);
""")
conn.commit()
cur.close()
conn.close()
print("Database setup complete.")
# === Document insertion ===
def insert_documents(
documents: list[dict],
batch_size: int = 50,
) -> None:
"""Insert documents with automatically generated embeddings."""
conn = psycopg2.connect(**DB_CONFIG)
cur = conn.cursor()
for i in range(0, len(documents), batch_size):
batch = documents[i : i + batch_size]
texts = [doc["content"] for doc in batch]
embeddings = get_embeddings_batch(texts)
for doc, emb in zip(batch, embeddings):
cur.execute(
"""INSERT INTO kb_documents (title, content, category, embedding)
VALUES (%s, %s, %s, %s)""",
(doc["title"], doc["content"],
doc.get("category", "general"), str(emb)),
)
print(f"Inserted {min(i + batch_size, len(documents))}/{len(documents)} documents")
conn.commit()
cur.close()
conn.close()
# === Similarity search ===
def search(
query: str,
limit: int = 5,
category: str | None = None,
min_similarity: float = 0.3,
) -> list[SearchResult]:
"""Search for documents similar to the query."""
query_embedding = get_embedding(query)
conn = psycopg2.connect(**DB_CONFIG)
cur = conn.cursor()
# Enable iterative scan for restrictive filters
cur.execute("SET hnsw.iterative_scan = relaxed_order;")
if category:
cur.execute(
"""SELECT id, title, content, category,
1 - (embedding <=> %s::vector) AS similarity
FROM kb_documents
WHERE category = %s
ORDER BY embedding <=> %s::vector
LIMIT %s""",
(str(query_embedding), category,
str(query_embedding), limit),
)
else:
cur.execute(
"""SELECT id, title, content, category,
1 - (embedding <=> %s::vector) AS similarity
FROM kb_documents
ORDER BY embedding <=> %s::vector
LIMIT %s""",
(str(query_embedding), str(query_embedding), limit),
)
results = [
SearchResult(
id=row[0], title=row[1], content=row[2],
category=row[3], similarity=row[4],
)
for row in cur.fetchall()
if row[4] >= min_similarity
]
cur.close()
conn.close()
return results
# === Main ===
if __name__ == "__main__":
# 1. Setup
setup_database()
# 2. Insert sample documents
sample_docs = [
{
"title": "Introduction to PostgreSQL",
"content": "PostgreSQL is an open-source RDBMS with full ACID support.",
"category": "database",
},
{
"title": "PostgreSQL Indexes",
"content": "B-trees are the default index type. HNSW is used for vectors.",
"category": "database",
},
{
"title": "What is RAG",
"content": "RAG combines retrieval and generation for document-based answers.",
"category": "ai",
},
{
"title": "Docker for Developers",
"content": "Docker isolates applications in lightweight, portable containers.",
"category": "devops",
},
{
"title": "Embeddings and Vector Search",
"content": "Embeddings transform text into vectors for similarity search.",
"category": "ai",
},
]
insert_documents(sample_docs)
# 3. Search documents
print("\n--- Search: 'how do vector databases work' ---")
results = search("how do vector databases work", limit=3)
for r in results:
print(f" [{r.similarity:.3f}] {r.title} ({r.category})")
print("\n--- Filtered search for category 'ai' ---")
results = search("PostgreSQL tutorial", limit=3, category="ai")
for r in results:
print(f" [{r.similarity:.3f}] {r.title} ({r.category})")
11. LangChain Integration
If you are building a RAG system, LangChain provides a direct integration with pgvector
through the langchain-postgres package. This lets you use PostgreSQL as a
vector store without writing SQL manually.
```bash
pip install langchain-postgres langchain-openai psycopg
```
```python
from langchain_postgres import PGVector
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document

# Configure the embedding model
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    openai_api_key="sk-...",
)

# NOTE: langchain-postgres uses psycopg3 (not psycopg2)
CONNECTION_STRING = "postgresql+psycopg://admin:secret@localhost:5432/vectordb"

# Create the vector store
vector_store = PGVector(
    embeddings=embeddings,
    collection_name="langchain_docs",
    connection=CONNECTION_STRING,
    use_jsonb=True,  # JSONB metadata for efficient filtering
)

# Add documents
docs = [
    Document(
        page_content="pgvector turns PostgreSQL into a vector database",
        metadata={"source": "tutorial.md", "chapter": 1},
    ),
    Document(
        page_content="HNSW is the recommended index for most use cases",
        metadata={"source": "tutorial.md", "chapter": 2},
    ),
    Document(
        page_content="LangChain simplifies building RAG systems",
        metadata={"source": "rag-guide.md", "chapter": 1},
    ),
]
vector_store.add_documents(docs)

# Similarity search
results = vector_store.similarity_search(
    "how to create a vector index",
    k=3,
)
for doc in results:
    print(f"[{doc.metadata['source']}] {doc.page_content[:80]}...")

# Search with scores
results_with_scores = vector_store.similarity_search_with_score(
    "what is pgvector",
    k=3,
)
for doc, score in results_with_scores:
    print(f"[Score: {score:.4f}] {doc.page_content[:80]}...")

# Use as a retriever in a RAG chain
retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 5},
)

# The retriever can be used directly in LangChain chains
relevant_docs = retriever.invoke("install pgvector docker")
```
Note on psycopg3
The langchain-postgres package requires psycopg3 (not psycopg2).
Connection strings must use the postgresql+psycopg:// format instead of
postgresql+psycopg2://. If you are migrating from langchain-community,
update your connection string accordingly.
12. Best Practices for pgvector in Production
12.1 Vector Dimensions
Smaller vectors take less space, make indexes faster, and reduce costs. OpenAI
text-embedding-3-small (1536 dim) offers the best quality-to-size ratio
for most use cases. If you can tolerate a slight quality loss,
all-MiniLM-L6-v2 (384 dim) is an excellent free alternative.
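The storage impact of the dimension choice is easy to estimate: the `vector` type stores 4 bytes per float plus a small per-value header, and `halfvec` halves the per-float cost. A back-of-the-envelope calculator, assuming ~8 bytes of per-value overhead and ignoring row headers and index size:

```python
def vector_storage_mb(rows: int, dims: int, bytes_per_float: int = 4) -> float:
    """Approximate storage for the embedding column alone: per-value overhead
    (~8 bytes, an assumption) plus bytes_per_float * dims, times the row count."""
    per_row = 8 + bytes_per_float * dims
    return rows * per_row / (1024 * 1024)

# 1M embeddings: 1536-dim float32 vs 384-dim float32 vs 1536-dim halfvec (float16)
print(round(vector_storage_mb(1_000_000, 1536)))                     # ~5867 MB
print(round(vector_storage_mb(1_000_000, 384)))                      # ~1472 MB
print(round(vector_storage_mb(1_000_000, 1536, bytes_per_float=2)))  # ~2937 MB
```

The same ratios apply to index size and cache pressure, which is why dropping from 1536 to 384 dimensions speeds up far more than just disk usage.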
12.2 Batch Inserts
```python
from psycopg2.extras import execute_values

# WRONG: one INSERT at a time (slow)
for doc in documents:
    cur.execute(
        "INSERT INTO docs (content, embedding) VALUES (%s, %s)",
        (doc.content, str(doc.embedding)),
    )

# CORRECT: batch insert with execute_values (10-50x faster)
data = [
    (doc.content, str(doc.embedding))
    for doc in documents
]
execute_values(
    cur,
    "INSERT INTO docs (content, embedding) VALUES %s",
    data,
    page_size=500,  # rows per SQL batch
)
conn.commit()
```
12.3 Connection Pooling
In production, do not open a new PostgreSQL connection for every request. Use a connection
pool such as PgBouncer or the built-in pooling from psycopg2.
```python
from psycopg2 import pool

# Create a connection pool (at application startup)
connection_pool = pool.ThreadedConnectionPool(
    minconn=5,   # minimum connections
    maxconn=20,  # maximum connections
    host="localhost",
    database="vectordb",
    user="admin",
    password="secret_password",
)

# Use a connection from the pool
conn = connection_pool.getconn()
try:
    cur = conn.cursor()
    cur.execute("SELECT ... ORDER BY embedding <=> %s LIMIT 5", ...)
    results = cur.fetchall()
    cur.close()
finally:
    connection_pool.putconn(conn)  # return to the pool
```
12.4 Maintenance
```sql
-- VACUUM to reclaim space after DELETE/UPDATE
VACUUM ANALYZE documents;

-- Rebuild IVFFlat index after massive inserts
REINDEX INDEX CONCURRENTLY idx_documents_embedding_ivf;
-- NOTE: HNSW does not require REINDEX for new inserts

-- Monitor index size
SELECT
    indexname,
    pg_size_pretty(pg_relation_size(indexname::regclass)) AS size
FROM pg_indexes
WHERE tablename = 'documents';

-- Verify that the index is being used
EXPLAIN ANALYZE
SELECT id, 1 - (embedding <=> '[0.1, 0.2, ...]') AS similarity
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 5;
-- Look for "Index Scan using idx_..._hnsw" in the output
```
Production Checklist
- Create an HNSW index with `m = 24` and `ef_construction = 128`
- Set `ef_search` between 40 and 200 based on the latency/accuracy trade-off
- Enable `hnsw.iterative_scan = relaxed_order` if you use WHERE filters
- Use batch inserts with `execute_values` for bulk loads
- Configure a connection pool (PgBouncer or psycopg2.pool)
- Schedule periodic `VACUUM ANALYZE`
- Monitor queries with `EXPLAIN ANALYZE` to verify index usage
- Use `halfvec` instead of `vector` to save 50% space (if float16 precision is sufficient)
Conclusions
pgvector has transformed PostgreSQL from a purely relational database into an AI-ready platform. With version 0.8.0 and its iterative index scans, the gap with dedicated vector databases has narrowed dramatically, and for most projects under 10 million vectors, there is no reason to add a separate service to your stack.
In this article, you learned how to install pgvector, create tables with vector columns, run similarity searches with three distance operators, generate embeddings with Python, configure HNSW and IVFFlat indexes, and integrate everything with LangChain. You also saw benchmarks confirming pgvector's production-grade performance.
In the next article, we will dive deep into embeddings: how they work internally, how to choose the right model, chunking strategies for long documents, and how to optimize search quality. In the third article, we will build a complete RAG pipeline with PostgreSQL as the sole backend.
Additional Resources
- pgvector GitHub: github.com/pgvector/pgvector - Official documentation and changelog
- pgvectorscale: github.com/timescale/pgvectorscale - Extension for advanced performance
- LangChain PGVector: python.langchain.com - Official Python integration
- Supabase Vector: supabase.com/docs/guides/ai - pgvector guide on Supabase
- Neon AI: neon.tech/docs/ai - pgvector on Neon serverless