pgvector: How to Add Vector Search to PostgreSQL

pgvector is a PostgreSQL extension that lets you store vectors in a table column and search them by similarity. It reached version 0.8.2 in late 2025 and now comes preinstalled on most hosted PostgreSQL services, including AWS RDS, Google Cloud SQL, Supabase, and Neon. If you already run PostgreSQL, you can add vector search without standing up a separate database.

Vectors are arrays of numbers that represent meaning. A sentence about "database migrations" and a sentence about "schema changes" end up with vectors that sit close to each other, even though they share no keywords. That closeness is the whole point of vector search: it finds semantically related content rather than exact word matches.

What pgvector stores

An embedding model converts text into a list of numbers. Send "I love coffee" to the model and it returns 1,536 numbers. Send "I enjoy espresso" and you get a very similar list, even though the words are different. Send "database migration" and the numbers land far from both. The model learned these relationships from patterns in text; you don't define them yourself.

pgvector adds a column type to PostgreSQL that stores these lists. You set the dimension count when you create the column (1,536 for OpenAI's text-embedding-3-small) and it stays fixed from then on. When you search, pgvector finds the rows whose numbers are closest to your query, which means closest in meaning, not in wording.

Installing pgvector

On a self-hosted PostgreSQL server:

# Ubuntu/Debian (replace 16 with your PostgreSQL major version)
sudo apt install postgresql-16-pgvector

# macOS with Homebrew
brew install pgvector

Then enable it in each database where you want to use it:

CREATE EXTENSION vector;

On hosted services (RDS, Google Cloud SQL, Supabase, Neon), the package is already installed, so CREATE EXTENSION vector; is all you need.

Storing and querying vectors

You add a vector column to any table and store embeddings like any other value:

CREATE TABLE documents (
  id        SERIAL PRIMARY KEY,
  content   TEXT,
  embedding vector(1536)
);

In practice you generate the embeddings in application code, sending text to an embedding model (OpenAI, Cohere, a local model), then inserting the returned array. To search, you pass a query embedding and order by distance:

-- Find the 5 documents most similar to a query
SELECT content, embedding <=> $1 AS distance
FROM documents
ORDER BY embedding <=> $1
LIMIT 5;
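On the application side, pgvector accepts a vector as a text literal such as '[1,2,3]', so inserting an embedding mostly comes down to turning the model's output into that string with the right length. A minimal sketch in Python, assuming the 1,536-dimension column above; to_vector_literal is a hypothetical helper name, not part of any library:

```python
import math
from typing import Sequence

EMBEDDING_DIM = 1536  # assumption: must match vector(1536) in the documents table


def to_vector_literal(values: Sequence[float], dim: int = EMBEDDING_DIM) -> str:
    """Format an embedding as pgvector's text input, e.g. "[0.5,-1.0,2.0]".

    Hypothetical helper: checks two things pgvector would reject anyway,
    a wrong dimension count and non-finite numbers, before the round trip.
    """
    if len(values) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(values)}")
    floats = [float(v) for v in values]
    if not all(math.isfinite(v) for v in floats):
        raise ValueError("embedding contains NaN or infinity")
    # repr() gives the shortest string that round-trips each float.
    return "[" + ",".join(repr(v) for v in floats) + "]"
```

With a driver such as psycopg you would pass the string as an ordinary parameter and cast it in SQL (%s::vector); the pgvector Python package can also register adapters so that plain lists are converted for you.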

The <=> operator computes cosine distance, the most common choice for text. pgvector also supports L2 distance (<->) and inner product (<#>, which returns the negative inner product so that ascending order still puts the closest match first). This is where the advantage over a dedicated vector database shows: you can combine vector search with any SQL filter or join in the same query.
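To make the three operators concrete, here is what each one computes, written out in plain Python. This is an illustrative re-implementation of the math, not pgvector's own code, and the function names are made up for the example:

```python
import math
from typing import Sequence


def _check_same_dims(a: Sequence[float], b: Sequence[float]) -> None:
    # pgvector raises an error when dimensions differ; zip() would silently truncate.
    if len(a) != len(b):
        raise ValueError(f"different vector dimensions {len(a)} and {len(b)}")


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Like <=>: 1 - cosine similarity. 0 = same direction, 1 = orthogonal, 2 = opposite."""
    _check_same_dims(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Undefined for a zero vector; this sketch returns NaN in that case.
    return 1.0 - dot / norms if norms else math.nan


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Like <->: straight-line (Euclidean) distance."""
    _check_same_dims(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Like <#>: the inner product, negated so that smaller still means more similar."""
    _check_same_dims(a, b)
    return -sum(x * y for x, y in zip(a, b))
```

For vectors normalized to length 1, such as OpenAI embeddings, all three measures rank results in the same order: cosine distance becomes 1 minus the inner product, and squared L2 distance becomes 2 minus twice the inner product. pgvector's documentation suggests inner product in that case because it is the cheapest to compute.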

Indexing for performance

Without an index, every similarity query computes the distance to every row (a sequential scan). That's fine up to roughly 100,000 rows but gets slower as the table grows. An HNSW index switches to approximate nearest-neighbor search, which is much faster at the cost of some recall: the index can occasionally miss one of the true nearest neighbors. To add one for cosine distance:

CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

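HNSW can be built on an empty table and handles new inserts without a rebuild. An index only serves queries whose operator matches its operator class: vector_cosine_ops for <=>, vector_l2_ops for <->, and vector_ip_ops for <#>. On large tables, use CREATE INDEX CONCURRENTLY so the build does not block writes. pgvector also has an IVFFlat index type that builds faster and uses less memory, but it should be created after the table already has data, and HNSW offers a better speed-recall trade-off, which makes it the better default for most cases.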
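HNSW's layered graph is too involved to reproduce in a few lines, but the recall trade-off is easier to see with the simpler IVFFlat idea: group rows into lists around centroids, then scan only the lists nearest the query. The sketch below is a toy in pure Python, not how pgvector implements either index. It picks random rows as centroids instead of running k-means, and ToyIVFIndex, exact_top_k, and probes are illustrative names (probes loosely echoes pgvector's ivfflat.probes setting). exact_top_k is what a query does with no index: measure every row.

```python
import heapq
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance, the metric pgvector's <-> operator uses."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_top_k(rows: Dict[int, Vector], query: Vector, k: int) -> List[int]:
    """No index: compute the distance to every row and keep the k closest ids."""
    scored = ((l2(v, query), rid) for rid, v in rows.items())
    return [rid for _, rid in heapq.nsmallest(k, scored)]


class ToyIVFIndex:
    """Toy IVFFlat-style index: each row lives in the list of its nearest centroid."""

    def __init__(self, rows: Dict[int, Vector], n_lists: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Simplification: real IVFFlat trains centroids with k-means;
        # here we just sample existing rows as centroids.
        self.centroids = [rows[rid] for rid in rng.sample(sorted(rows), n_lists)]
        self.lists: List[List[Tuple[int, Vector]]] = [[] for _ in range(n_lists)]
        for rid, v in rows.items():
            self.lists[self._nearest_lists(v, 1)[0]].append((rid, v))

    def _nearest_lists(self, v: Vector, n: int) -> List[int]:
        """Indexes of the n centroids closest to v."""
        return heapq.nsmallest(
            n, range(len(self.centroids)), key=lambda i: l2(self.centroids[i], v)
        )

    def search(self, query: Vector, k: int, probes: int = 1) -> List[int]:
        """Scan only the `probes` nearest lists; true neighbors elsewhere are missed."""
        scored = [
            (l2(v, query), rid)
            for i in self._nearest_lists(query, probes)
            for rid, v in self.lists[i]
        ]
        return [rid for _, rid in heapq.nsmallest(k, scored)]


if __name__ == "__main__":
    rng = random.Random(42)
    rows = {i: [rng.random() for _ in range(8)] for i in range(2000)}
    query = [rng.random() for _ in range(8)]
    index = ToyIVFIndex(rows, n_lists=20)
    truth = set(exact_top_k(rows, query, 10))
    for probes in (1, 5, 20):
        found = set(index.search(query, 10, probes))
        # Recall: fraction of the true 10 nearest neighbors the index returned.
        print(f"probes={probes:2d} recall={len(found & truth) / 10:.1f}")
```

Raising probes scans more lists and recovers more of the true neighbors at the cost of more distance computations; with probes equal to the number of lists, the result matches the exact scan. pgvector exposes the same kind of knob for HNSW through the hnsw.ef_search setting.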

pgvector vs. a dedicated vector database

The main alternative to pgvector is a purpose-built vector database: Pinecone, Weaviate, Qdrant, or Milvus. Which one fits depends on your scale and how much operational complexity you're willing to take on.

| Factor | pgvector | Dedicated vector DB |
|---|---|---|
| Setup | `CREATE EXTENSION` | Separate service to run |
| SQL joins and filters | Yes, natively | Metadata filtering only |
| ACID transactions | Yes | Varies |
| Comfortable scale | Up to ~10M rows | Hundreds of millions |
| Operational overhead | Existing PostgreSQL infra | New service to manage |
| Managed options | RDS, Supabase, Neon | Pinecone, Weaviate Cloud |

For most teams building a first RAG pipeline or adding semantic search to an existing product, pgvector is the right place to start. You get vector search without new infrastructure, and your vectors live next to your relational data where you can query them together.

A purpose-built database earns its keep at very large scale (50M+ vectors), when you need extremely low query latency, or when you want hybrid search built in. The PostgreSQL vs MongoDB comparison covers a related trade-off: MongoDB Atlas has its own vector search, and if your data already lives there, that may be the simpler path.

pgvector and schema migrations

Adding pgvector to a production database is a schema change like any other: you enable an extension, add columns, and build indexes. A couple of things are worth knowing before you do.

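HNSW index builds are CPU- and memory-intensive, and they run much faster when the graph fits in maintenance_work_mem. On production tables, use CREATE INDEX CONCURRENTLY so the build does not block writes; note that it cannot run inside a transaction block.
Switching embedding models usually changes the dimension count, and vectors from different models are not comparable anyway, so every row has to be re-embedded. Add a new column with the new dimension, backfill it, build its index, and only then drop the old column. Plan that as a multi-step migration, not a one-liner.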
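The multi-step shape of a model switch can be sketched as an ordered list of statements. This is a hypothetical Python helper, assuming the documents table from this article, an HNSW index with cosine distance, and an application that writes new-model embeddings to a column named embedding_new once it exists; the column and index names are illustrative. It also checks pgvector's 2,000-dimension limit for indexing the plain vector type.

```python
from typing import Iterator, List, Tuple

# pgvector can index the plain vector type (HNSW or IVFFlat) up to 2,000 dimensions.
MAX_INDEXED_DIM = 2000


def backfill_batches(max_id: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive id ranges so each backfill transaction touches a bounded set of rows."""
    start = 1
    while start <= max_id:
        end = min(start + batch_size - 1, max_id)
        yield start, end
        start = end + 1


def embedding_migration_steps(new_dim: int) -> List[str]:
    """Hypothetical helper: ordered SQL for moving documents to a new embedding model.

    Assumes the application writes new-model vectors into embedding_new for
    any row it inserts or updates after step 1.
    """
    if not 0 < new_dim <= MAX_INDEXED_DIM:
        raise ValueError(f"cannot build an HNSW index on vector({new_dim})")
    return [
        # Step 1: add the new column beside the old one; existing rows get NULL.
        f"ALTER TABLE documents ADD COLUMN embedding_new vector({new_dim});",
        # Step 2 (application code, not a single statement): for each range from
        # backfill_batches(), re-embed content with the new model and
        # UPDATE documents SET embedding_new = ... WHERE id = ...
        # Step 3: build the index without blocking writes. CONCURRENTLY cannot
        # run inside a transaction block.
        "CREATE INDEX CONCURRENTLY documents_embedding_new_idx "
        "ON documents USING hnsw (embedding_new vector_cosine_ops);",
        # Step 4: swap atomically, together with the deploy that starts embedding
        # search queries with the new model (old and new vectors aren't comparable).
        # Dropping the old column also drops its index.
        "BEGIN;",
        "ALTER TABLE documents DROP COLUMN embedding;",
        "ALTER TABLE documents RENAME COLUMN embedding_new TO embedding;",
        "COMMIT;",
    ]
```

Each range from backfill_batches becomes one short transaction. A model with more than 2,000 dimensions, such as text-embedding-3-large (3,072), fails the check; pgvector's halfvec type can be indexed up to 4,000 dimensions.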

pgvector follows the same lifecycle as any other PostgreSQL extension: enable it in staging first, confirm that its version matches production, then roll it out. The article "What is database migration?" covers safe sequencing. pgvector also consistently ranks among the most widely used PostgreSQL extensions.

Bytebase handles PostgreSQL schema migrations with a review step before SQL runs in production, and keeps a full history of every change, including extension additions and index builds. For teams managing pgvector alongside other PostgreSQL schema changes, that review step is where you catch problems before they reach production.

FAQ

Does pgvector work with all PostgreSQL hosting providers? Most major providers include it as of 2024: AWS RDS (PostgreSQL 15.2+), Google Cloud SQL, Azure Database for PostgreSQL, Supabase, Neon, and Render.

How many vectors can pgvector handle? Most teams run comfortably up to 5-10 million vectors on a well-sized PostgreSQL instance. Approximate indexes (HNSW, IVFFlat) help the most past 100,000 rows. Beyond ~50M vectors, a dedicated vector database is worth evaluating.

Is pgvector production-ready? Yes. HNSW indexing, the main gap for production workloads, arrived in pgvector 0.5.0 (August 2023). Version 0.8.2 is the current stable release. It runs in production at Supabase, Neon, and many teams building RAG applications.
