Postgres Is All You Need
Last year I evaluated four vector databases for a
production RAG system. I read every benchmark, sat through
three vendor demos, and built proof-of-concept integrations
with Pinecone, Weaviate, and Qdrant.
Then I shipped pgvector on the Postgres instance we already
had running. It took an afternoon.
Six months later, that system serves 50,000 semantic
searches per day at p95 latency under 40ms. The total
infrastructure cost for vector search is $0 extra per
month because it runs on the same database that stores our
users, sessions, and application data.
The Vector DB Hype Cycle
Venture capital poured $350M+ into vector database
startups in 2023-2024. Pinecone raised $138M. Weaviate
raised $50M. Qdrant raised $28M. That money needs to
justify itself, which means marketing budgets aimed
directly at convincing you that Postgres cannot handle
your embeddings.
Here is what the pitch sounds like: "You need a
purpose-built vector database for production AI. Postgres
was not designed for high-dimensional similarity search.
You will hit scaling walls."
That pitch is not wrong for everyone. It is wrong for
about 90% of teams building AI features today.
Most production RAG systems have fewer than 5M vectors.
Most query volumes are under 100 QPS. Most latency
requirements are "under 200ms." If that describes your
system, you are the target customer for a $200K/year
service you do not need.
pgvector Basics
pgvector is a Postgres extension that adds vector data
types and similarity search operators. You enable it with
one line:
CREATE EXTENSION IF NOT EXISTS vector;
That gives you a vector column type, three distance
operators (<-> for L2 distance, <=> for cosine distance,
<#> for negative inner product), and three indexing
strategies.
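To make the operators concrete, here is what `<=>` computes: cosine distance, which is 1 minus cosine similarity. A TypeScript sketch for illustration only; in practice the database evaluates this in C.

```typescript
// Cosine distance, the quantity pgvector's <=> operator returns:
// 1 - (a . b) / (|a| * |b|)
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Identical vectors -> distance 0; orthogonal vectors -> distance 1
console.log(cosineDistance([1, 0], [1, 0])) // 0
console.log(cosineDistance([1, 0], [0, 1])) // 1
```

This is why the search function later in this article converts distance back to similarity with `1 - (embedding <=> query_embedding)`.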
Flat Scan (No Index)
No index at all. Postgres scans every row and computes
the distance. 100% recall, but O(n) performance. Fine for
tables under 10,000 rows. I use this for development and
small lookup tables.
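A flat scan is nothing more than this brute-force loop, sketched in TypeScript to show the shape of the work (Postgres does the equivalent in C, row by row):

```typescript
// What a sequential scan does: compute the distance to every row,
// sort, take the top k. Exact results (100% recall), O(n) per query.
function flatScan(
  query: number[],
  rows: { id: number; embedding: number[] }[],
  k: number
): number[] {
  const l2 = (a: number[], b: number[]) =>
    Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0))
  return rows
    .map((r) => ({ id: r.id, dist: l2(query, r.embedding) }))
    .sort((a, b) => a.dist - b.dist)
    .slice(0, k)
    .map((r) => r.id)
}

console.log(
  flatScan([0, 0], [{ id: 1, embedding: [1, 1] }, { id: 2, embedding: [0.1, 0] }], 1)
) // [ 2 ]
```

The O(n) cost is why this stops being fine past roughly 10,000 rows: every query touches every vector.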
IVFFlat
Inverted file index. Partitions vectors into clusters
(called "lists"), then searches only the nearest clusters
at query time. You choose the number of lists and how many
to probe.
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
Good recall at reasonable speed. The catch: you need to
build the index after inserting your data, and the quality
depends on choosing the right number of lists. Rule of
thumb: lists = rows / 1000 for up to 1M rows.
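That rule of thumb is easy to encode. A small helper, following the guidance in the pgvector README (rows / 1000 up to 1M rows; the sqrt(rows) branch for larger tables comes from the upstream docs, not from anything measured in this article):

```typescript
// Suggested IVFFlat `lists` value, per the pgvector README:
// rows / 1000 for up to 1M rows, sqrt(rows) above that.
function suggestedLists(rows: number): number {
  if (rows <= 1_000_000) return Math.max(1, Math.round(rows / 1000))
  return Math.round(Math.sqrt(rows))
}

console.log(suggestedLists(100_000))   // 100
console.log(suggestedLists(4_000_000)) // 2000
```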
HNSW
Hierarchical Navigable Small World graph. This is the one
you want for production. It builds a multi-layer graph
structure that enables logarithmic search time with
configurable recall.
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
HNSW takes longer to build but delivers better query
performance than IVFFlat and does not require rebuilding
after inserts. The two parameters that matter:
- m: connections per node (default 16, higher = better
recall but more memory)
- ef_construction: build-time search width (default 64,
higher = better recall but slower builds)
Performance Reality
I ran benchmarks on a Supabase Pro instance (4 GB RAM,
2-core CPU) with 1M vectors at 1536 dimensions (OpenAI
text-embedding-3-small output size). Here is what I
measured:
| Metric | HNSW | IVFFlat | Pinecone p1 |
|---|---|---|---|
| p50 latency | 8ms | 12ms | 5ms |
| p95 latency | 22ms | 35ms | 11ms |
| p99 latency | 38ms | 58ms | 18ms |
| Recall@10 | 0.98 | 0.95 | 0.99 |
| QPS (single conn) | 120 | 85 | 200 |
Pinecone is faster. No question. A purpose-built vector
index running on optimized hardware will beat a general
purpose database. But look at the actual numbers. 22ms
vs 11ms at p95. For a RAG pipeline where the LLM
generation step takes 800-2000ms, you are arguing about
11 milliseconds.
Recall at 0.98 means for every 100 queries, pgvector
returns the same top-10 results as an exact search 98
times. The 2% where it differs? The "wrong" results are
still semantically close -- usually items ranked 11th or
12th in the exact ordering.
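Recall@k is simple to compute yourself if you keep a set of exact (flat-scan) results to compare against. A sketch:

```typescript
// Recall@k: what fraction of the exact top-k result IDs the
// approximate index also returned in its top k.
function recallAtK(approx: number[], exact: number[], k: number): number {
  const approxSet = new Set(approx.slice(0, k))
  const hits = exact.slice(0, k).filter((id) => approxSet.has(id)).length
  return hits / k
}

// The index returned [1, 2, 3, 4]; the exact answer was [1, 2, 3, 5].
console.log(recallAtK([1, 2, 3, 4], [1, 2, 3, 5], 4)) // 0.75
```

Averaged over a few hundred representative queries, this gives you the recall number to weigh against your latency budget.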
I tuned ef_search (the query-time search width) to 100:
SET hnsw.ef_search = 100;
That pushed recall to 0.99 with p95 at 28ms. Still well
within acceptable range for any user-facing application.
The Supabase Advantage
I run my Postgres on Supabase, which ships pgvector
enabled by default. Zero setup. The same connection
string that my Next.js app uses for user auth and
application data also serves vector search queries.
This matters more than benchmarks. Here is why:
One fewer service to monitor. No separate vector DB
dashboard, no additional uptime SLA to track, no second
set of access credentials to rotate.
One fewer network hop. My application queries users,
embeddings, and metadata in a single SQL query with JOINs.
A dedicated vector DB means a separate API call, a
separate round trip, and the complexity of correlating
results across two data stores.
One fewer billing surprise. Supabase Pro is $25/month.
I know exactly what it costs. Vector DB pricing is per
vector, per query, per dimension, per namespace -- a
pricing model designed to scale with your success and
punish you for it.
Transactional consistency. When I insert a new
document, the text content, metadata, and embedding
all land in the same transaction. There is no sync
lag between my application database and my vector
index. No eventual consistency headaches.
When pgvector Is NOT Enough
I would be dishonest if I told you pgvector solves
everything. Here are the cases where a dedicated vector
database earns its cost:
More than 10M vectors. At this scale, HNSW index
build times stretch to hours and memory requirements
exceed what a standard Postgres instance provides.
Pinecone and Qdrant shard across machines natively.
pgvector does not.
Sub-10ms p99 latency requirements. If you are building
a real-time recommendation system where every millisecond
counts (think ad serving, trading signals), the 20-40ms
p99 range of pgvector will not cut it.
Multi-tenant isolation at massive scale. You have 1000
tenants each with 1M vectors. Namespaces in Pinecone
handle this cleanly. In pgvector, you are either
partitioning tables or filtering with WHERE clauses, and
neither scales as elegantly past a certain point.
GPU-accelerated search. Some workloads need the raw
throughput that GPU-based vector search provides. pgvector
runs on CPU only.
If your use case hits any of these, the dedicated vector
DB is worth the money. But be honest about whether you
are solving today's problem or a problem you imagine
having in 18 months.
Implementation: Supabase + pgvector
Here is the complete setup I use in production. Five
steps from zero to semantic search.
1. Create the Table
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
content TEXT NOT NULL,
metadata JSONB DEFAULT '{}',
embedding VECTOR(1536),
created_at TIMESTAMPTZ DEFAULT NOW()
);
2. Add the HNSW Index
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);
I set ef_construction to 200 for production. Higher
than the default, but it only affects build time, not
query time. Worth the upfront cost for better recall.
3. Create a Search Function
CREATE OR REPLACE FUNCTION match_documents(
query_embedding VECTOR(1536),
match_threshold FLOAT DEFAULT 0.78,
match_count INT DEFAULT 10
)
RETURNS TABLE (
id BIGINT,
content TEXT,
metadata JSONB,
similarity FLOAT
)
LANGUAGE plpgsql
AS $$
BEGIN
RETURN QUERY
SELECT
d.id,
d.content,
d.metadata,
1 - (d.embedding <=> query_embedding) AS similarity
FROM documents d
WHERE 1 - (d.embedding <=> query_embedding)
> match_threshold
ORDER BY d.embedding <=> query_embedding
LIMIT match_count;
END;
$$;
4. Insert Embeddings (TypeScript)
import { createClient } from '@supabase/supabase-js'
import OpenAI from 'openai'
const supabase = createClient(
process.env.SUPABASE_URL!,
process.env.SUPABASE_SERVICE_KEY!
)
const openai = new OpenAI()
async function insertDocument(
content: string,
metadata: Record<string, unknown>
) {
const { data: embeddingData } =
await openai.embeddings.create({
model: 'text-embedding-3-small',
input: content,
})
const embedding =
embeddingData[0].embedding
const { error } = await supabase
.from('documents')
.insert({
content,
metadata,
embedding,
})
if (error) throw error
}
5. Semantic Search (TypeScript)
async function searchDocuments(query: string) {
const { data: embeddingData } =
await openai.embeddings.create({
model: 'text-embedding-3-small',
input: query,
})
const queryEmbedding =
embeddingData[0].embedding
const { data, error } = await supabase
.rpc('match_documents', {
query_embedding: queryEmbedding,
match_threshold: 0.78,
match_count: 5,
})
if (error) throw error
return data
}
That is the entire implementation. No SDK for a
separate vector service. No API key management for
another provider. No data sync pipeline. Just SQL
and the Supabase client you already have.
Cost Comparison: Pinecone vs. pgvector
I priced both options for a realistic workload: 500K
vectors at 1536 dimensions, 50K queries per day,
single region.
| | Pinecone (s1) | Supabase Pro |
|---|---|---|
| Base cost | $70/month | $25/month |
| Storage (500K vecs) | included | included |
| Query costs | included | included |
| Writes (10K/day) | included | included |
| Additional infra | $0 | $0 |
| Total | $70/month | $25/month |
The Pinecone s1 pod at $70/month handles this
workload. Supabase Pro at $25/month handles it too --
and also runs your auth, your application database,
your realtime subscriptions, and your storage.
But the real cost is not the invoice. It is the
engineering time.
Adding Pinecone means: a new SDK to learn, a data sync
pipeline to build and maintain, a separate monitoring
setup, credential rotation for another service, a
migration path if you outgrow the s1 pod or Pinecone
changes pricing, and the cognitive overhead of reasoning
about data consistency across two stores.
I estimate that overhead at 2-4 engineering hours per
month for a small team. At $150/hour fully loaded, that
is $300-600/month in hidden costs. More than the
infrastructure itself.
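Spelled out (the hours and the hourly rate are this article's estimates, not measured data):

```typescript
// Total monthly cost = invoice + hidden engineering time.
// Hours and rate are rough estimates, not measurements.
const hourlyRate = 150
const pineconeInvoice = 70
const supabaseInvoice = 25

const pineconeLow = pineconeInvoice + 2 * hourlyRate  // 370
const pineconeHigh = pineconeInvoice + 4 * hourlyRate // 670
const supabaseTotal = supabaseInvoice // no second service to maintain

console.log(pineconeLow, pineconeHigh, supabaseTotal) // 370 670 25
```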
The Boring Technology Thesis
Dan McKinley wrote "Choose Boring Technology" in 2015.
The argument: every technology choice you make spends
from a limited innovation budget. You can only sustain
a few novel technologies at a time before your team
drowns in operational complexity.
Postgres is the most boring technology I know. It has
been in production since 1996. Every engineer on your
team has used it. Your monitoring, backup, and
failover infrastructure already handles it.
pgvector turns Postgres into a good-enough vector
database. Not the fastest. Not the most feature-rich.
But good enough for the vast majority of production
AI workloads. And it does it without spending any of
your innovation budget.
I have shipped three AI features on pgvector. None of
them required migrating to a dedicated vector DB later.
The one time I considered it, I realized the bottleneck
was my chunking strategy, not the vector index. I fixed
the real problem and moved on.
The performance gap between pgvector and a dedicated
solution is real: roughly double the p95 latency in my
benchmarks, and a point or two of recall. But it is a gap
you trade for one fewer service, one fewer vendor
relationship, one fewer failure mode, and one fewer thing
to explain to the on-call engineer at 3 AM.
That trade-off is worth it almost every time.
Start Here
If you are evaluating vector databases, try this
before you sign a contract:
- Enable pgvector on your existing Postgres
- Load your actual embeddings (not a benchmark dataset)
- Run your actual queries and measure latency
- Compare that latency to your actual requirements
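The measurement step can be a 20-line script. A sketch: `search` stands in for whatever query function you are testing (for example, a wrapper around a `match_documents` RPC like the one above); the query list and percentile cutoffs are yours to adapt.

```typescript
// Measure p50/p95 latency over your real queries.
// `search` is a placeholder for your own query function.
async function benchmark(
  search: (q: string) => Promise<unknown>,
  queries: string[]
): Promise<{ p50: number; p95: number }> {
  const timings: number[] = []
  for (const q of queries) {
    const start = performance.now()
    await search(q)
    timings.push(performance.now() - start)
  }
  timings.sort((a, b) => a - b)
  const pct = (p: number) =>
    timings[Math.min(timings.length - 1, Math.floor(p * timings.length))]
  return { p50: pct(0.5), p95: pct(0.95) }
}
```

Run it against your actual embeddings with a few hundred real queries, then put the p95 next to your latency requirement. That comparison, not a vendor benchmark, is the decision.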
Most teams discover that pgvector meets their needs
before they finish step 3. The ones that do not usually
know exactly why, which makes the case for a dedicated
solution much clearer.
The best infrastructure decision is the one you do not
have to make. You already run Postgres. Use it.