Postgres Is All You Need
Last year I evaluated four vector databases for a
production RAG system. I read every benchmark, sat through
three vendor demos, and built proof-of-concept integrations
with Pinecone, Weaviate, and Qdrant.
Then I shipped pgvector on the Postgres instance we already
had running. It took an afternoon.
Six months later, that system serves 50,000 semantic
searches per day at p95 latency under 40ms. The total
infrastructure cost for vector search is $0 extra per
month because it runs on the same database that stores our
users, sessions, and application data.
The Vector DB Hype Cycle
Venture capital poured $350M+ into vector database
startups in 2023-2024. Pinecone raised $138M. Weaviate
raised $50M. Qdrant raised $28M. That money needs to
justify itself, which means marketing budgets aimed
directly at convincing you that Postgres cannot handle
your embeddings.
Here is what the pitch sounds like: "You need a
purpose-built vector database for production AI. Postgres
was not designed for high-dimensional similarity search.
You will hit scaling walls."
That pitch is not wrong for everyone. It is wrong for
about 90% of teams building AI features today.
Most production RAG systems have fewer than 5M vectors.
Most query volumes are under 100 QPS. Most latency
requirements are "under 200ms." If that describes your
system, you are the target customer for a $200K/year
service you do not need.
pgvector Basics
pgvector is a Postgres extension that adds vector data
types and similarity search operators. You enable it with
one line:
CREATE EXTENSION IF NOT EXISTS vector;
That gives you a vector column type, three distance
operators (<-> for L2 distance, <=> for cosine distance,
<#> for negative inner product), and three indexing
strategies.
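To make the operators concrete, here is what `<=>` computes: cosine distance, which is 1 minus cosine similarity. A TypeScript sketch for illustration only; in practice the database evaluates this in C.

```typescript
// Cosine distance, the quantity pgvector's <=> operator returns:
// 1 - (a . b) / (|a| * |b|)
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Identical vectors -> distance 0; orthogonal vectors -> distance 1
console.log(cosineDistance([1, 0], [1, 0])) // 0
console.log(cosineDistance([1, 0], [0, 1])) // 1
```

This is why the search function later in this article converts distance back to similarity with `1 - (embedding <=> query_embedding)`.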
Flat Scan (No Index)
No index at all. Postgres scans every row and computes
the distance. 100% recall, but O(n) performance. Fine for
tables under 10,000 rows. I use this for development and
small lookup tables.
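A flat scan is nothing more than this brute-force loop, sketched in TypeScript to show the shape of the work (Postgres does the equivalent in C, row by row):

```typescript
// What a sequential scan does: compute the distance to every row,
// sort, take the top k. Exact results (100% recall), O(n) per query.
function flatScan(
  query: number[],
  rows: { id: number; embedding: number[] }[],
  k: number
): number[] {
  const l2 = (a: number[], b: number[]) =>
    Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0))
  return rows
    .map((r) => ({ id: r.id, dist: l2(query, r.embedding) }))
    .sort((a, b) => a.dist - b.dist)
    .slice(0, k)
    .map((r) => r.id)
}

console.log(
  flatScan([0, 0], [{ id: 1, embedding: [1, 1] }, { id: 2, embedding: [0.1, 0] }], 1)
) // [ 2 ]
```

The O(n) cost is why this stops being fine past roughly 10,000 rows: every query touches every vector.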
IVFFlat
Inverted file index. Partitions vectors into clusters
(called "lists"), then searches only the nearest clusters
at query time. You choose the number of lists and how many
to probe.
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
Good recall at reasonable speed. The catch: you need to
build the index after inserting your data, and the quality
depends on choosing the right number of lists. Rule of
thumb: lists = rows / 1000 for up to 1M rows.
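That rule of thumb is easy to encode. A small helper, following the guidance in the pgvector README (rows / 1000 up to 1M rows; the sqrt(rows) branch for larger tables comes from the upstream docs, not from anything measured in this article):

```typescript
// Suggested IVFFlat `lists` value, per the pgvector README:
// rows / 1000 for up to 1M rows, sqrt(rows) above that.
function suggestedLists(rows: number): number {
  if (rows <= 1_000_000) return Math.max(1, Math.round(rows / 1000))
  return Math.round(Math.sqrt(rows))
}

console.log(suggestedLists(100_000))   // 100
console.log(suggestedLists(4_000_000)) // 2000
```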
HNSW
Hierarchical Navigable Small World graph. This is the one
you want for production. It builds a multi-layer graph
structure that enables logarithmic search time with
configurable recall.
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
HNSW takes longer to build but delivers better query
performance than IVFFlat and does not require rebuilding
after inserts. The two parameters that matter:
- m: connections per node (default 16, higher = better
recall but more memory)
- ef_construction: build-time search width (default 64,
higher = better recall but slower builds)
Performance Reality
I ran benchmarks on a Supabase Pro instance (4 GB RAM,
2-core CPU) with 1M vectors at 1536 dimensions (OpenAI
text-embedding-3-small output size). Here is what I
measured:
| Metric | HNSW | IVFFlat | Pinecone p1 |
|---|---|---|---|
| p50 latency | 8ms | 12ms | 5ms |
| p95 latency | 22ms | 35ms | 11ms |
| p99 latency | 38ms | 58ms | 18ms |
| Recall@10 | 0.98 | 0.95 | 0.99 |
| QPS (single conn) | 120 | 85 | 200 |
Pinecone is faster. No question. A purpose-built vector
index running on optimized hardware will beat a general
purpose database. But look at the actual numbers. 22ms
vs 11ms at p95. For a RAG pipeline where the LLM
generation step takes 800-2000ms, you are arguing about
11 milliseconds.
Recall at 0.98 means for every 100 queries, pgvector
returns the same top-10 results as an exact search 98
times. The 2% where it differs? The "wrong" results are
still semantically close -- usually items ranked 11th or
12th in the exact ordering.
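Recall@k is simple to compute yourself if you keep a set of exact (flat-scan) results to compare against. A sketch:

```typescript
// Recall@k: what fraction of the exact top-k result IDs the
// approximate index also returned in its top k.
function recallAtK(approx: number[], exact: number[], k: number): number {
  const approxSet = new Set(approx.slice(0, k))
  const hits = exact.slice(0, k).filter((id) => approxSet.has(id)).length
  return hits / k
}

// The index returned [1, 2, 3, 4]; the exact answer was [1, 2, 3, 5].
console.log(recallAtK([1, 2, 3, 4], [1, 2, 3, 5], 4)) // 0.75
```

Averaged over a few hundred representative queries, this gives you the recall number to weigh against your latency budget.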
I tuned ef_search (the query-time search width) to 100:
SET hnsw.ef_search = 100;
That pushed recall to 0.99 with p95 at 28ms. Still well
within acceptable range for any user-facing application.
The Supabase Advantage
I run my Postgres on Supabase, which ships pgvector
enabled by default. Zero setup. The same connection
string that my Next.js app uses for user auth and
application data also serves vector search queries.
This matters more than benchmarks. Here is why:
One fewer service to monitor. No separate vector DB
dashboard, no additional uptime SLA to track, no second
set of access credentials to rotate.
One fewer network hop. My application queries users,
embeddings, and metadata in a single SQL query with JOINs.
A dedicated vector DB means a separate API call, a
separate round trip, and the complexity of correlating
results across two data stores.
One fewer billing surprise. Supabase Pro is $25/month.
I know exactly what it costs. Vector DB pricing is per
vector, per query, per dimension, per namespace -- a
pricing model designed to scale with your success and
punish you for it.
Transactional consistency. When I insert a new
document, the text content, metadata, and embedding
all land in the same transaction. There is no sync
lag between my application database and my vector
index. No eventual consistency headaches.
When pgvector Is NOT Enough
I would be dishonest if I told you pgvector solves
everything. Here are the cases where a dedicated vector
database earns its cost:
More than 10M vectors. At this scale, HNSW index
build times stretch to hours and memory requirements
exceed what a standard Postgres instance provides.
Pinecone and Qdrant shard across machines natively.
pgvector does not.
Sub-10ms p99 latency requirements. If you are building
a real-time recommendation system where every millisecond
counts (think ad serving, trading signals), the 20-40ms
p99 range of pgvector will not cut it.
Multi-tenant isolation at massive scale. You have 1000
tenants each with 1M vectors. Namespaces in Pinecone
handle this cleanly. In pgvector, you are either
partitioning tables or filtering with WHERE clauses, and
neither scales as elegantly past a certain point.
GPU-accelerated search. Some workloads need the raw
throughput that GPU-based vector search provides. pgvector
runs on CPU only.
If your use case hits any of these, the dedicated vector
DB is worth the money. But be honest about whether you
are solving today's problem or a problem you imagine
having in 18 months.
Implementation: Supabase + pgvector
Here is the complete setup I use in production. Five
steps from zero to semantic search.
1. Create the Table
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
content TEXT NOT NULL,
metadata JSONB DEFAULT '{}',
embedding VECTOR(1536),
created_at TIMESTAMPTZ DEFAULT NOW()
);
2. Add the HNSW Index
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);
I set ef_construction to 200 for production. Higher
than the default, but it only affects build time, not
query time. Worth the upfront cost for better recall.
3. Create a Search Function
CREATE OR REPLACE FUNCTION match_documents(
query_embedding VECTOR(1536),
match_threshold FLOAT DEFAULT 0.78,
match_count INT DEFAULT 10
)
RETURNS TABLE (
id BIGINT,
content TEXT,
metadata JSONB,
similarity FLOAT
)
LANGUAGE plpgsql
AS $$
BEGIN
RETURN QUERY
SELECT
d.id,
d.content,
d.metadata,
1 - (d.embedding <=> query_embedding) AS similarity
FROM documents d
WHERE 1 - (d.embedding <=> query_embedding)
> match_threshold
ORDER BY d.embedding <=> query_embedding
LIMIT match_count;
END;
$$;
4. Insert Embeddings (TypeScript)
import { createClient } from '@supabase/supabase-js'
import OpenAI from 'openai'
const supabase = createClient(
process.env.SUPABASE_URL!,
process.env.SUPABASE_SERVICE_KEY!
)
const openai = new OpenAI()
async function insertDocument(
content: string,
metadata: Record<string, unknown>
) {
const { data: embeddingData } =
await openai.embeddings.create({
model: 'text-embedding-3-small',
input: content,
})
const embedding =
embeddingData[0].embedding
const { error } = await supabase
.from('documents')
.insert({
content,
metadata,
embedding,
})
if (error) throw error
}
5. Semantic Search (TypeScript)
async function searchDocuments(query: string) {
const { data: embeddingData } =
await openai.embeddings.create({
model: 'text-embedding-3-small',
input: query,
})
const queryEmbedding =
embeddingData[0].embedding
const { data, error } = await supabase
.rpc('match_documents', {
query_embedding: queryEmbedding,
match_threshold: 0.78,
match_count: 5,
})
if (error) throw error
return data
}
That is the entire implementation. No SDK for a
separate vector service. No API key management for
another provider. No data sync pipeline. Just SQL
and the Supabase client you already have.
Cost Comparison: Pinecone vs. pgvector
I priced both options for a realistic workload: 500K
vectors at 1536 dimensions, 50K queries per day,
single region.
| | Pinecone (s1) | Supabase Pro |
|---|---|---|
| Base cost | $70/month | $25/month |
| Storage (500K vecs) | included | included |
| Query costs | included | included |
| Writes (10K/day) | included | included |
| Additional infra | $0 | $0 |
| Total | $70/month | $25/month |
The Pinecone s1 pod at $70/month handles this
workload. Supabase Pro at $25/month handles it too --
and also runs your auth, your application database,
your realtime subscriptions, and your storage.
But the real cost is not the invoice. It is the
engineering time.
Adding Pinecone means: a new SDK to learn, a data sync
pipeline to build and maintain, a separate monitoring
setup, credential rotation for another service, a
migration path if you outgrow the s1 pod or Pinecone
changes pricing, and the cognitive overhead of reasoning
about data consistency across two stores.
I estimate that overhead at 2-4 engineering hours per
month for a small team. At $150/hour fully loaded, that
is $300-600/month in hidden costs. More than the
infrastructure itself.
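Spelled out (the hours and the hourly rate are this article's estimates, not measured data):

```typescript
// Total monthly cost = invoice + hidden engineering time.
// Hours and rate are rough estimates, not measurements.
const hourlyRate = 150
const pineconeInvoice = 70
const supabaseInvoice = 25

const pineconeLow = pineconeInvoice + 2 * hourlyRate  // 370
const pineconeHigh = pineconeInvoice + 4 * hourlyRate // 670
const supabaseTotal = supabaseInvoice // no second service to maintain

console.log(pineconeLow, pineconeHigh, supabaseTotal) // 370 670 25
```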
The Boring Technology Thesis
Dan McKinley wrote "Choose Boring Technology" in 2015.
The argument: every technology choice you make spends
from a limited innovation budget. You can only sustain
a few novel technologies at a time before your team
drowns in operational complexity.
Postgres is the most boring technology I know. It has
been in production since 1996. Every engineer on your
team has used it. Your monitoring, backup, and
failover infrastructure already handles it.
pgvector turns Postgres into a good-enough vector
database. Not the fastest. Not the most feature-rich.
But good enough for the vast majority of production
AI workloads. And it does it without spending any of
your innovation budget.
I have shipped three AI features on pgvector. None of
them required migrating to a dedicated vector DB later.
The one time I considered it, I realized the bottleneck
was my chunking strategy, not the vector index. I fixed
the real problem and moved on.
The performance gap between pgvector and a dedicated
solution is real: roughly double the p95 latency in my
benchmarks, and a point or two of recall. But it is a gap
you trade for one fewer service, one fewer vendor
relationship, one fewer failure mode, and one fewer thing
to explain to the on-call engineer at 3 AM.
That trade-off is worth it almost every time.
Start Here
If you are evaluating vector databases, try this
before you sign a contract:
- Enable pgvector on your existing Postgres
- Load your actual embeddings (not a benchmark dataset)
- Run your actual queries and measure latency
- Compare that latency to your actual requirements
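The measurement step can be a 20-line script. A sketch: `search` stands in for whatever query function you are testing (for example, a wrapper around a `match_documents` RPC like the one above); the query list and percentile cutoffs are yours to adapt.

```typescript
// Measure p50/p95 latency over your real queries.
// `search` is a placeholder for your own query function.
async function benchmark(
  search: (q: string) => Promise<unknown>,
  queries: string[]
): Promise<{ p50: number; p95: number }> {
  const timings: number[] = []
  for (const q of queries) {
    const start = performance.now()
    await search(q)
    timings.push(performance.now() - start)
  }
  timings.sort((a, b) => a - b)
  const pct = (p: number) =>
    timings[Math.min(timings.length - 1, Math.floor(p * timings.length))]
  return { p50: pct(0.5), p95: pct(0.95) }
}
```

Run it against your actual embeddings with a few hundred real queries, then put the p95 next to your latency requirement. That comparison, not a vendor benchmark, is the decision.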
Most teams discover that pgvector meets their needs
before they finish step 3. The ones that do not usually
know exactly why, which makes the case for a dedicated
solution much clearer.
The best infrastructure decision is the one you do not
have to make. You already run Postgres. Use it.