When does pgvector make sense instead of a dedicated vector database?

pgvector is the right default when you already run PostgreSQL and need vector search joined to relational data, but the cited guidance says dedicated vector databases become worth evaluating around 50M+ vectors or when you need extremely low latency or built-in hybrid search.

By AxiomLogica Editorial

May 4, 2026 · 21 min read

Reviewed by Editorial

Bottom Line: pgvector is the economically rational default when your team already operates PostgreSQL and needs vector search joined to relational data. The operational break-even shifts around 50M vectors, or earlier if p99 latency targets are strict or hybrid search must run at high concurrency — those are the conditions under which a dedicated system like Qdrant earns its second operational surface.

When pgvector is the right default

pgvector extends PostgreSQL with vector column types and similarity search, and it is available on most hosted PostgreSQL services without provisioning anything new. The case for it is not primarily about raw performance — it is about what you avoid: a second database service to deploy, monitor, back up, and keep consistent with your primary store.

As the pgvector README puts it: "Store your vectors with the rest of your data." That single sentence captures the operational logic. Vectors live in the same schema as the application records they describe, subject to the same ACID guarantees, foreign key constraints, and point-in-time recovery that the team already depends on. By default pgvector performs exact nearest neighbor search, which provides perfect recall; approximate nearest neighbor search (ANN) is opt-in via an HNSW or IVFFlat index, trading recall for speed when the dataset warrants it. Vectors up to 2,000 dimensions can be indexed.
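To make "exact search provides perfect recall" concrete, here is a minimal pure-Python sketch of what a brute-force scan does and how recall is measured against it. It is not pgvector's implementation; the toy vectors and the function names are assumptions made for illustration.

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def exact_top_k(query, vectors, k):
    """Brute-force scan: score every stored vector and keep the k closest."""
    ranked = sorted(vectors, key=lambda vid: cosine_distance(query, vectors[vid]))
    return ranked[:k]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k that an approximate result recovered."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

# Hypothetical toy data: four 2-dimensional vectors keyed by id.
vectors = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}
truth = exact_top_k([1.0, 0.0], vectors, k=2)

# An ANN index might return ["a", "c"]; it found one of the two true neighbors.
approx_recall = recall_at_k(["a", "c"], truth)
```

The exact scan is the ground truth that any ANN index is scored against, which is why recall has to be fixed before latency numbers from two systems can be compared.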

Bottom Line: If you already run PostgreSQL, the build-vs-buy threshold starts high. Every factor that pulls toward a dedicated vector database — scale, latency, hybrid search — must be weighed against the cost of owning two stateful services instead of one. For most teams with fewer than 5M vectors, pgvector on an existing Postgres cluster is the correct starting point, not a stopgap.

What changes the decision from convenience to scale

The decision is not a feature checklist. It is a weighted set of operational and performance constraints that each pull in different directions. The table below maps the key factors:

Decision Factor	Favors pgvector	Favors dedicated DB (e.g., Qdrant)
Existing PostgreSQL stack	✅ Zero incremental service	❌ Adds a second service
Relational joins on query path	✅ Native SQL	❌ Cross-service round-trips
ACID transactions required	✅ First-class in Postgres	❌ Eventual consistency risk
Vector count	✅ Under ~50M	❌ 50M+ strains Postgres ANN
Concurrent traffic / p99 latency	✅ Moderate; no strict SLA	❌ High concurrency + tight p99
Hybrid search (dense + sparse)	✅ Via Postgres FTS + RRF	❌ Native at scale favors specialists
Maintenance burden	✅ One toolchain	❌ Two toolchains, two runbooks

Bytebase's pgvector guide frames the shift plainly: purpose-built vector databases become more attractive at scale, with a practical threshold around 50M+ vectors, especially when latency or hybrid search needs exceed what PostgreSQL comfortably handles. Qdrant's benchmark methodology adds precision: latency and requests-per-second (RPS) are separate variables. As the benchmark page states, "ANN search is all about trading precision for speed", and it describes the throughput mode as "Requests-per-Second (RPS): Serve more requests per second in exchange of individual requests taking longer (i.e. higher latency)." Comparing two systems is therefore only meaningful when their precision/recall is matched.

The practical implication: you cannot answer "should we switch?" by looking at vector count alone. Traffic pattern, filter selectivity, update rate, and p99 latency targets each modulate where the real threshold falls.

Existing Postgres stack and lower incremental ops cost

Teams that already run PostgreSQL can add pgvector with no new service to provision. The extension installs via APT, Yum, Homebrew, Docker, conda-forge, and comes preinstalled on Postgres.app and most hosted providers — including RDS, Cloud SQL, Supabase, and Neon. The marginal cost of enabling vector search is an extension flag and an index, not a new deployment pipeline. Bytebase notes that pgvector is available on most hosted PostgreSQL services, so teams already on Postgres can add vector search without provisioning a separate database service.
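As a rough illustration of how small that step is, the sketch below generates the three SQL statements involved: enabling the extension, adding a vector column, and building an HNSW index. The table and column names are hypothetical, the helper is not part of any library, and the statements would still need to be executed through your own migration tooling.

```python
MAX_INDEXED_DIMS = 2000  # indexing limit for the vector type cited in this article

def pgvector_setup_sql(table, column, dims):
    """Return the SQL statements that add an indexed vector column.

    `table` and `column` are interpolated directly, so they must be trusted
    constants from your own migration code, never user input.
    """
    if dims > MAX_INDEXED_DIMS:
        raise ValueError(f"{dims} dimensions exceeds the {MAX_INDEXED_DIMS}-dimension index limit")
    index_name = f"{table}_{column}_hnsw_idx"
    return [
        "CREATE EXTENSION IF NOT EXISTS vector;",
        f"ALTER TABLE {table} ADD COLUMN {column} vector({dims});",
        # CONCURRENTLY avoids blocking writes, but cannot run inside a transaction block.
        f"CREATE INDEX CONCURRENTLY {index_name} ON {table} USING hnsw ({column} vector_cosine_ops);",
    ]

# Hypothetical example: a `documents` table with 1536-dimensional embeddings.
statements = pgvector_setup_sql("documents", "embedding", 1536)
```

The dimension check is there because an oversized embedding model is easier to catch in a migration script than in a failed index build.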

Production Note: Staying inside PostgreSQL means your backup schedule, monitoring stack, alerting runbooks, access-control policies, and disaster-recovery playbooks already cover your vector data. Moving to a dedicated vector database means separately running, monitoring, backing up, and scaling a second stateful service — with none of those runbooks written yet.

Qdrant, by contrast, is explicitly positioned as a standalone service. That isolation has value at scale, but it means every operational control must be duplicated.

Relational joins and ACID transactions as first-class requirements

pgvector's README states the core value proposition directly: "Plus ACID compliance, point-in-time recovery, JOINs, and all of the other great features of Postgres." For workloads where a similarity search must be filtered by row-level permissions, joined to a users table, or executed inside a transaction that also modifies application state, keeping vectors in PostgreSQL eliminates an entire class of consistency bugs.

A RAG pipeline that retrieves document chunks and then filters by tenant ID or access scope is a canonical example. Running that filter in Postgres costs a standard SQL predicate. Running it across two services means either fetching more vectors than needed and filtering in application code, or replicating access-control logic into the vector store — both are error-prone and slower.
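The tenant-filtered retrieval described above can be sketched as a single parameterized query. The `chunks` and `documents` tables, their columns, and the helper names are assumptions for illustration; the `%s` placeholders follow the psycopg parameter style, and `<=>` is pgvector's cosine distance operator.

```python
def to_vector_literal(embedding):
    """Format a Python sequence in pgvector's text form, e.g. '[1.0,0.25,-3.0]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"

def tenant_search_query(tenant_id, embedding, k=5):
    """Build (sql, params) for a similarity search restricted to one tenant."""
    sql = (
        "SELECT c.id, c.body, d.title "
        "FROM chunks AS c "
        "JOIN documents AS d ON d.id = c.document_id "
        "WHERE d.tenant_id = %s "                # access scope enforced in SQL
        "ORDER BY c.embedding <=> %s::vector "   # cosine distance, nearest first
        "LIMIT %s"
    )
    return sql, (tenant_id, to_vector_literal(embedding), k)

sql, params = tenant_search_query(tenant_id=42, embedding=[1, 0.25, -3], k=3)
```

Because the filter and the join sit in the same statement as the distance ordering, there is no second service that has to learn the tenant model.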

Pro Tip: If metadata and vectors live in one transactional store, you can atomically insert a document record and its embedding in a single transaction. Split across two systems, you either accept a window of inconsistency or build a two-phase commit wrapper — neither is free.
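The atomicity argument can be demonstrated with a small runnable sketch. To keep it self-contained it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL and stores the embedding as JSON text; in a real pgvector schema the column would be a `vector` type, but the transactional behavior being illustrated is the same. Table names and the simulated failure are assumptions.

```python
import json
import sqlite3

def insert_document_with_embedding(conn, doc_id, title, embedding):
    """Write the document row and its embedding in one transaction."""
    with conn:  # commits on success, rolls back if anything inside raises
        conn.execute("INSERT INTO documents (id, title) VALUES (?, ?)", (doc_id, title))
        if not embedding:
            # Simulates a failure between the two writes (e.g. the embedding call failed).
            raise ValueError("embedding must not be empty")
        conn.execute(
            "INSERT INTO embeddings (document_id, embedding) VALUES (?, ?)",
            (doc_id, json.dumps(embedding)),
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE embeddings ("
    "document_id INTEGER PRIMARY KEY REFERENCES documents(id), "
    "embedding TEXT NOT NULL)"
)

insert_document_with_embedding(conn, 1, "intro", [0.1, 0.2])
try:
    insert_document_with_embedding(conn, 2, "broken", [])
except ValueError:
    pass  # the documents row for id 2 was rolled back along with the failed write

doc_count = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
embedding_count = conn.execute("SELECT COUNT(*) FROM embeddings").fetchone()[0]
```

After the failed call both tables still hold exactly one row each, so no document exists without its embedding. With two separate systems, the equivalent guarantee has to be built in application code.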

pgvector supports exact and approximate nearest neighbor search in the same database as relational metadata, and it composes naturally with PostgreSQL row-level security and foreign keys.

Operational burden when vector search becomes another platform

A dedicated vector database is not just a faster similarity search engine; it is another platform with its own operational lifecycle. "Qdrant - High-performance, massive-scale vector database and similarity search engine" is how the project describes itself, and the same repository frames Qdrant as a production-ready service for storing, searching, and managing vectors with payloads. The Qdrant 1.17 release (2026) lists improved search latency and better operational observability, which signals that observability in dedicated vector databases is a first-class product concern, not an afterthought.

Watch Out: Adding a dedicated vector database introduces separate backup schedules, separate capacity planning, separate monitoring dashboards, and cross-system consistency risks — especially during reindexing or schema migrations. If the vectors and application records diverge during a failed write, detecting and repairing that split state is non-trivial without application-layer reconciliation logic.

For many teams, this operational overhead is the hidden cost that vendor benchmarks never surface. The performance uplift from a dedicated system must exceed not just the raw query latency delta, but the engineering time to build and maintain two-system consistency.

Scale breakpoints where pgvector starts to strain

pgvector does not have a hard row-count ceiling, but performance degrades predictably as vector counts grow and index memory pressure mounts. The practical scale bands below use the 50M+ threshold from Bytebase's guidance as the primary anchor, with latency expectations framed around ANN index behavior from the pgvector README:

Scale Band	Index Choice	Approximate Build Time	Query Latency Profile	Recommendation
< 5M vectors	HNSW or IVFFlat	Minutes	Low; comfortably sub-10ms p99 at moderate QPS	pgvector; no case to migrate
5M – 50M vectors	HNSW preferred	Tens of minutes to hours	Workload-dependent; benchmark required	pgvector viable; benchmark at real concurrency
50M+ vectors	HNSW strains RAM	Hours+	p99 degrades under load without vertical scaling	Evaluate dedicated DB
50M+ with hybrid search	N/A	N/A	PostgreSQL FTS + RRF adds latency at scale	Dedicated DB often wins

As the pgvector README notes: "HNSW index ... has better query performance than IVFFlat (in terms of speed-recall tradeoff), but has slower build times and uses more memory." Conversely, "IVFFlat ... has faster build times and uses less memory than HNSW, but has lower query performance." The index tradeoff compounds as dataset size grows — HNSW's memory requirement scales with the number of vectors, which eventually creates RAM pressure on the Postgres host itself.
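A back-of-the-envelope calculation shows why RAM becomes the binding constraint. The sketch below uses the storage figure given in the pgvector README for the `vector` type (4 bytes per dimension plus 8 bytes per vector) and deliberately ignores HNSW graph links and table overhead, so it is a lower bound. The 1536-dimension embedding size is an assumption chosen for illustration.

```python
def vector_storage_bytes(dimensions, rows):
    """Raw storage for `rows` vectors: 4 * dimensions + 8 bytes each."""
    return rows * (4 * dimensions + 8)

def to_gib(num_bytes):
    return num_bytes / 2**30

# Hypothetical workload: 1536-dimensional embeddings at two scale bands.
small = vector_storage_bytes(1536, 5_000_000)    # 30_760_000_000 bytes
large = vector_storage_bytes(1536, 50_000_000)   # 307_600_000_000 bytes

for label, size in (("5M vectors", small), ("50M vectors", large)):
    print(f"{label}: about {to_gib(size):.0f} GiB of raw vector data")
```

At 5M vectors the raw data fits in memory on a mid-sized database host; at 50M it is an order of magnitude larger before any index structure is counted, which is the pressure the scale bands above describe.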

Under 5M vectors: why pgvector usually stays simpler

Below 5M vectors, the argument for migrating off pgvector is almost entirely absent. HNSW indexes at this scale build in minutes, fit comfortably in memory on a standard database host, and deliver p99 latency well within acceptable ranges for most application workloads. The operational advantage of staying in PostgreSQL — shared backups, shared monitoring, no cross-service consistency — dominates any marginal latency gap versus a dedicated system.

pgvector also supports hybrid search at this scale via PostgreSQL full-text search combined with reciprocal rank fusion (RRF) or a cross-encoder reranker, covering the most common retrieval patterns without a second service.

Bottom Line: Under 5M vectors with moderate concurrency, pgvector on your existing PostgreSQL cluster is the correct choice. The ops cost of a dedicated vector database is not justified by the performance difference at this scale. Start here, instrument your query latency, and revisit when you cross measurable thresholds — not before.

5M to 50M vectors: where benchmarking and query mix matter

This band is where the decision genuinely requires empirical measurement rather than rules of thumb. The pgvector README recommends IVFFlat list sizing of approximately rows/1000 for up to 1M rows and sqrt(rows) above 1M, with a starting probe count of sqrt(lists) — these parameters are sensitive to your actual data distribution and must be tuned against recall targets.
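Those starting values are easy to compute. The helper below is a small sketch of the sizing rule just described; the function name and the rounding choices are assumptions, and the output is a starting point for tuning against a recall target, not a final setting.

```python
import math

def ivfflat_starting_params(rows):
    """Starting IVFFlat settings: (lists, probes).

    lists  = rows / 1000 up to 1M rows, sqrt(rows) above that
    probes = sqrt(lists)
    """
    lists = rows / 1000 if rows <= 1_000_000 else math.sqrt(rows)
    lists = max(1, round(lists))
    probes = max(1, round(math.sqrt(lists)))
    return lists, probes

for rows in (500_000, 1_000_000, 4_000_000):
    lists, probes = ivfflat_starting_params(rows)
    print(f"{rows:>9} rows: lists={lists}, probes={probes}")
```

Raising probes improves recall at the cost of speed, so the computed value is where a recall sweep should begin rather than end.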

Qdrant's benchmark framework separates two distinct modes: latency-first (minimize single-request p99) and RPS-first (maximize throughput at the cost of individual request latency). This distinction matters because your traffic pattern determines which axis is binding.

Signal	Interpretation	Action
p99 > 50ms at target concurrency	pgvector HNSW under RAM pressure	Benchmark Qdrant at matched recall
Filter selectivity < 1%	High-cardinality filter on large table	Test payload-filtered ANN in dedicated DB
Update rate > 10K vectors/hour	IVFFlat index rebuild cost grows	Evaluate write path in dedicated DB
Mixed dense + sparse queries	Postgres FTS + RRF adds latency	Native hybrid search in Qdrant may win
p99 < 20ms, QPS headroom adequate	pgvector holds	Stay; revisit at next scale milestone

Benchmark at real concurrency with production-representative filter predicates, not single-request recall measured offline. For example, a system that achieves 95% recall at 5ms for a single query may exhibit 150ms p99 under 200 concurrent requests if the HNSW graph has to compete for shared buffers with the rest of the Postgres workload.
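A minimal harness for that kind of measurement might look like the sketch below. `run_query` is a hypothetical callable that you would replace with a real search against your database; the thread pool size stands in for concurrent clients, and the percentile uses the nearest-rank method.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor

def percentile(samples, pct):
    """Nearest-rank percentile of a list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def timed_call(run_query):
    """Run one query and return its wall-clock latency in seconds."""
    start = time.perf_counter()
    run_query()
    return time.perf_counter() - start

def measure_latencies(run_query, total_requests, concurrency):
    """Issue `total_requests` calls with `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(timed_call, run_query) for _ in range(total_requests)]
        return [f.result() for f in futures]

# Placeholder workload: swap the lambda for a real filtered vector search.
latencies = measure_latencies(lambda: time.sleep(0.001), total_requests=50, concurrency=10)
p50_ms = percentile(latencies, 50) * 1000
p99_ms = percentile(latencies, 99) * 1000
```

Run the same harness at several concurrency levels and compare p99 rather than the mean, since tail latency is what degrades first under contention.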

50M+ vectors and low-latency hybrid search

At 50M+ vectors, dedicated vector databases become worth a structured evaluation, particularly when p99 latency targets are strict or hybrid search (dense vector + keyword) must run at scale. Bytebase's guidance makes this the practical inflection point.

Dimension	pgvector (50M+)	Qdrant (50M+)
HNSW memory footprint	Grows linearly; Postgres host RAM becomes binding	Purpose-built memory management; disk-backed segments
Hybrid search	Postgres FTS + RRF; separate execution plans	Native: "Blend keyword and vector search in one query – use dense or sparse vectors"
Scaling model	Vertical (bigger Postgres instance)	Horizontal sharding and distributed deployment
Latency isolation	Shares Postgres buffer pool with OLTP workload	Dedicated process; no buffer pool contention
Operational surface	One service	Second service with its own lifecycle

Qdrant's hybrid search natively supports dense and sparse vectors — including BM25, SPLADE++, and miniCOIL — in a single query. pgvector's hybrid search routes through PostgreSQL full-text search and combines results via RRF, which is functional but adds execution path complexity and latency at high vector counts.

When Qdrant and other dedicated vector databases win

A dedicated vector database earns its keep when search speed, query isolation, and vector-native features outweigh the operational convenience of PostgreSQL. Qdrant is explicitly "tailored to extended filtering support" — payload-filtered ANN search is a first-class operation, not bolted onto a general-purpose query planner. The same logic applies to other dedicated systems (Pinecone, Weaviate, Milvus, LanceDB, Turbopuffer), each optimized for different segments of the scale and latency space.

Capability	Qdrant	pgvector
Payload-filtered ANN (native)	✅ First-class	⚠️ Via SQL WHERE; planner overhead
Horizontal scaling / sharding	✅ Built-in	❌ Postgres sharding is complex
Search process isolation	✅ Dedicated process	❌ Shares Postgres buffer pool
Vector-native observability (v1.17)	✅ Built-in metrics	❌ Requires external instrumentation
ACID transactions	❌ Not a transactional DB	✅ First-class
SQL joins on relational data	❌ Requires application-side join	✅ Native

High concurrency and strict p99 latency targets

When a p99 latency SLA drives the architecture decision, dedicated databases have a structural advantage: they do not share a buffer pool or execution scheduler with an OLTP workload. A Postgres instance handling concurrent application writes, analytical queries, and high-QPS ANN searches on 50M+ vectors is competing with itself for I/O and memory bandwidth.

Qdrant's Sprinklr case study reports a P99 latency of 20ms for searches on 1 million vectors, while Elasticsearch and Milvus both exceeded 100ms in the same benchmark setup. The numbers are vendor-produced and should be reproduced against your own data before being treated as universal — but the directional signal is consistent with the architectural explanation: process isolation removes buffer pool contention.

Pro Tip: Measure p99 latency under realistic concurrent load — not single-request recall from an offline benchmark. A system that reports 5ms average latency in isolation may exhibit 80ms p99 at 200 concurrent requests when the host is also serving OLTP traffic. Run your concurrency test with production-representative filter predicates and a real update stream before making a migration decision.

Qdrant's benchmark framework explicitly separates latency-first and RPS-first workload profiles — a distinction that matters because optimizing for throughput (more RPS) degrades individual request latency, and vice versa. Confirm which axis your SLA binds before interpreting any published numbers.

Built-in hybrid search and vector-native features

Hybrid search — combining dense vector similarity with keyword relevance — is where the operational gap between pgvector and dedicated systems is most visible at scale.

Capability	pgvector + Postgres FTS	Qdrant native hybrid
Query interface	Two execution plans merged via RRF or cross-encoder	Single query: dense + sparse in one pass
Sparse vector support	⚠️ sparsevec type for storage and distance search; no built-in BM25 or SPLADE scoring	✅ BM25, SPLADE++, miniCOIL
Latency at 50M+ vectors	Adds FTS plan overhead	Optimized single-pass execution
Tuning surface	Postgres FTS configuration + RRF weights	Sparse model selection + fusion weights
Operational complexity	One service, two query types	One service, one query type

pgvector's README confirms "Use together with Postgres full-text search for hybrid search" — RRF and cross-encoder reranking are the documented patterns. This works well at moderate scale. At 50M+ vectors under load, merging two execution plans adds latency that native hybrid search avoids. The operational trade-off is clear: pgvector keeps one service but requires more query engineering; Qdrant adds a service but delivers hybrid search as a first-class primitive.
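Reciprocal rank fusion itself is a small amount of code once the two result lists exist. The sketch below merges a vector ranking and a keyword ranking; the document ids are made up, and the constant k=60 is the commonly used default for RRF rather than a value taken from the sources cited here.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked id lists: each list adds 1 / (k + rank) to an id's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results from the two execution plans.
vector_hits = ["d1", "d2", "d3"]    # from the ANN query
keyword_hits = ["d3", "d1", "d4"]   # from Postgres full-text search

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Documents that appear in both lists rise to the top, which is the behavior hybrid search is after. The cost at scale is not the fusion step but running and materializing two separate result sets first.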

Migration and ops overhead you should price in

Adopting a dedicated vector database is not only a query performance decision; it is a systems ownership decision, and that is the part of the build-vs-buy question that comparisons rarely make concrete. The table below maps the operational cost dimensions across both options:

Cost Dimension	pgvector on Postgres	Dedicated vector DB (e.g., Qdrant)
Migration effort	Extension install + index build	Data export, transform, re-import; embedding pipeline reroute
Duplicate indexing	None	ANN index built separately from Postgres records
Monitoring	Shared Postgres tooling (PG Stats, Datadog, etc.)	Additional service metrics; new dashboards required
Backups	Covered by existing Postgres backup schedule	Separate backup job; separate retention policy
Scaling cost	Vertical Postgres instance upgrade	Horizontal shard scaling; separate capacity plan
Consistency risk	Zero (same transaction)	Cross-service sync required on every write
On-call runbooks	Existing Postgres playbooks apply	New runbooks required from scratch

What it costs to keep vectors inside PostgreSQL

For a team that already owns PostgreSQL operations, adding pgvector has near-zero marginal overhead. On most providers the extension is enabled with a single CREATE EXTENSION statement, the index can be built with CREATE INDEX CONCURRENTLY so that writes are not blocked during the build, and every existing operational tool (backup schedules, Prometheus exporters, alerting thresholds, PITR policies) already covers the vector data without modification. Because pgvector ships as an extension and is supported by hosted providers, vectors stay within the same operational surface area as the rest of PostgreSQL.

Operational Item	Existing Postgres team cost
Backup coverage	Included; no new job
Monitoring	Included; extend existing queries
Access control	Row-level security and roles already in place
Disaster recovery	Existing PITR covers vectors automatically
Scale ceiling	Vertical; acceptable under ~50M vectors

Production Note: Shared operational tooling is the compounding advantage of pgvector. Every engineer who can operate Postgres can operate pgvector. There is no vector-database-specific incident response to train, no separate on-call rotation, and no divergent infrastructure-as-code path for the vector store.

What a dedicated vector database adds to the stack

"It provides a production-ready service with a convenient API to store, search, and manage points—vectors with an additional payload" is how Qdrant describes the service in its README. Production-ready means it can handle the workload; it does not mean the operational overhead disappears.

Operational Item	Dedicated vector DB incremental cost
Deployment	New service: container/VM, config, networking
Backups	Separate snapshot or export job
Monitoring	New dashboards; latency, RPS, recall metrics
Capacity planning	Independent from Postgres; must model separately
Consistency	Application-layer sync on every document write
Migration effort	Full re-embed or data export from Postgres

Watch Out: Consistency between your PostgreSQL application records and your dedicated vector store is entirely the application's responsibility. A failed write that inserts a database row but not the corresponding vector embedding produces silent retrieval gaps — documents that exist but are never returned by semantic search. Detecting this requires reconciliation jobs that add operational complexity and latency to your data pipeline.
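The core of such a reconciliation job is a set comparison between the ids each system knows about. The sketch below assumes you can list document ids from PostgreSQL and point ids from the vector store; how those lists are fetched, and the function name, are left as assumptions.

```python
def find_sync_gaps(postgres_ids, vector_store_ids):
    """Compare id sets from the two systems and report both kinds of drift."""
    pg = set(postgres_ids)
    vs = set(vector_store_ids)
    return {
        # Rows with no embedding: exist in the app, invisible to semantic search.
        "missing_vectors": sorted(pg - vs),
        # Embeddings with no row: stale results that point at deleted documents.
        "orphaned_vectors": sorted(vs - pg),
    }

# Hypothetical snapshot after a partially failed batch write.
gaps = find_sync_gaps(postgres_ids=[1, 2, 3, 4], vector_store_ids=[1, 2, 4, 9])
```

At tens of millions of ids the comparison would need to be chunked or streamed, which is part of the operational cost this section is pricing in.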

Decision framework for backend and platform teams

The weighted decision comes down to six axes: workload size, join/ACID requirements, concurrent traffic, p99 latency targets, hybrid search complexity, and update rate. As Qdrant's benchmark guidance states, "ANN search is all about trading precision for speed"; the architectural choice makes a parallel trade between operational simplicity and performance headroom. The threshold from Bytebase is ~50M vectors, but each of the other axes can pull that threshold lower.

Axis	pgvector threshold	Dedicated DB threshold
Vector count	< ~50M	≥ 50M
Relational joins on query path	Required	Not required
ACID transactions	Required	Not required
p99 latency target	> 20ms acceptable	< 20ms at high concurrency
Concurrent search QPS	Moderate (< a few hundred)	High (hundreds to thousands)
Hybrid search at scale	Low-to-moderate volume	High volume, sparse vectors required
Update rate	Low-to-moderate	High (frequent reindexing)

Choose pgvector when these conditions are true

Bottom Line: Choose pgvector when: (1) your team already operates PostgreSQL, (2) vector search must join to relational data or run inside transactions, (3) your vector count is below ~50M, and (4) p99 latency targets are achievable without dedicated process isolation. The operational default is one service, one toolchain, and zero migration cost. Every departure from this default requires a concrete performance or feature justification measured under production-representative load.

The green-light criteria in plain terms: existing PostgreSQL ownership, SQL join requirements, ACID semantics, vector count under the ~50M threshold, and no strict sub-20ms p99 SLA at high concurrency. pgvector also handles light hybrid search via PostgreSQL full-text search and RRF without adding a service.

Choose Qdrant or another dedicated system when these conditions are true

The trigger conditions for moving off PostgreSQL are not vague — they are measurable:

Vector count exceeds ~50M and HNSW memory pressure is degrading query performance or forcing vertical scaling beyond cost-efficient instance sizes.
p99 latency targets are strict (e.g., sub-20ms) at high concurrency, and benchmarking confirms that Postgres buffer pool contention is the binding constraint — not query logic.
Hybrid search at scale requires native sparse vector support (BM25, SPLADE++, miniCOIL) that PostgreSQL FTS cannot match in a single query pass.
Vector search needs operational isolation from the OLTP workload, and the team is willing to own a second stateful service, its backups, its monitoring, and its consistency guarantees.
Horizontal scaling is required and vertical Postgres scaling has hit cost or architectural limits.
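Because the triggers above are measurable, they can be written down as a checklist in code. The sketch below encodes this article's thresholds (~50M vectors, sub-20ms p99 at high concurrency); the `Workload` fields and function name are hypothetical, and the inputs are meant to come from your own benchmarks, not estimates.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    vector_count: int
    p99_target_ms: float
    high_concurrency: bool
    needs_native_sparse_hybrid: bool
    needs_isolation_from_oltp: bool
    needs_horizontal_scaling: bool

def migration_triggers(w):
    """Return the trigger conditions this workload meets; empty means stay on pgvector."""
    triggers = []
    if w.vector_count > 50_000_000:
        triggers.append("vector count above ~50M")
    if w.p99_target_ms < 20 and w.high_concurrency:
        triggers.append("strict p99 target at high concurrency")
    if w.needs_native_sparse_hybrid:
        triggers.append("native sparse-vector hybrid search required")
    if w.needs_isolation_from_oltp:
        triggers.append("search must be isolated from OLTP workload")
    if w.needs_horizontal_scaling:
        triggers.append("horizontal scaling required")
    return triggers

# Hypothetical mid-sized RAG workload: no trigger is met.
rag_app = Workload(8_000_000, 50.0, False, False, False, False)
# Hypothetical large search product: several triggers are met.
search_product = Workload(120_000_000, 15.0, True, True, True, False)
```

An empty list is the default outcome; each entry that appears should be backed by a measurement taken under production-representative load.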

Qdrant is the reference point here for high-performance, payload-filtered ANN at scale. Pinecone, Weaviate, Milvus, LanceDB, and Turbopuffer occupy different segments of this space (managed vs. self-hosted, dense-only vs. multimodal, serverless vs. provisioned) — the right choice among dedicated systems is a separate evaluation once the decision to leave PostgreSQL is justified.

Questions teams ask before committing

How many vectors can pgvector handle?

pgvector does not impose a hard vector-count limit, but practical performance degrades as ANN index memory requirements grow relative to available RAM. Most teams find pgvector operates comfortably below ~50M vectors with HNSW indexing; above that threshold, query latency under load and index build times create pressure to evaluate dedicated systems. The exact inflection depends on vector dimensionality (up to 2,000 dimensions can be indexed), update rate, filter selectivity, and concurrent QPS — not vector count alone.

Is pgvector good for production?

Yes, for workloads that need vectors joined to relational data with ACID guarantees and moderate concurrency. pgvector runs on the same PostgreSQL instance as application data, inheriting point-in-time recovery, role-based access control, and existing monitoring. The qualification is scale: at 50M+ vectors or strict p99 SLAs under high concurrency, production readiness depends on whether Postgres can meet your latency budget without dedicated process isolation.

Is pgvector better than Qdrant?

They are optimized for different regimes. pgvector wins when operational simplicity, SQL joins, and ACID transactions are required. Qdrant wins when vector count is large, p99 latency must be tight under high concurrency, or native hybrid search with sparse vectors is required. The Sprinklr benchmark shows Qdrant delivering 20ms P99 on 1M vectors where Elasticsearch and Milvus exceeded 100ms — but that comparison does not include pgvector, and single-system benchmarks under controlled conditions rarely replicate production multi-tenant load.

What are the limitations of pgvector?

Key constraints: vectors up to 2,000 dimensions can be indexed; approximate search requires an explicit HNSW or IVFFlat index (exact search is the default); HNSW uses more memory than IVFFlat but delivers better speed-recall tradeoff; IVFFlat has faster build times but lower query performance; both index types share the PostgreSQL buffer pool with the rest of the workload, creating contention under high concurrency; horizontal scaling requires Postgres-level partitioning or Citus, which adds complexity; hybrid search requires composing PostgreSQL FTS with ANN results, adding execution overhead at scale.

When does the migration cost make a dedicated database irrational?

When the performance gap does not justify building consistency sync, new monitoring, separate backup schedules, and new on-call runbooks. Teams that migrate at 10M vectors to chase a 30% latency improvement often find they have traded a simple operational model for a complex one without a meaningful user-facing difference. Measure p99 under real concurrency first.

Sources and References

Bytebase pgvector Guide — Primary source for operational positioning, hosted-provider availability, and the ~50M vector decision threshold
pgvector GitHub README — Authoritative source for indexing constraints (HNSW vs. IVFFlat trade-offs, 2,000-dimension limit, ACID and JOIN capabilities, hybrid search patterns)
Qdrant GitHub README — Qdrant's official positioning as a standalone high-performance vector database with extended filtering support
Qdrant Benchmarks — Methodology for ANN comparison: precision/recall matching, latency vs. RPS distinction
Qdrant Single-Node Speed Benchmark — Latency-first vs. RPS-first benchmark modes; concurrency planning guidance
Qdrant Homepage — Native hybrid search capabilities: dense + sparse vectors, BM25, SPLADE++, miniCOIL
Qdrant Sprinklr Case Study — P99 latency of 20ms on 1M vectors; comparison against Elasticsearch and Milvus
Qdrant 1.17 Release Blog — 2026 release notes covering search latency improvements and operational observability enhancements
