
Qdrant vs pgvector vs pgvectorscale for billion-vector filtering workloads

Q: Is Qdrant faster than pgvector?

Against plain pgvector on an unoptimized configuration, Qdrant can outperform it on filtered workloads due to its one-stage HNSW traversal. Against pgvector paired with pgvectorscale's StreamingDiskANN index, the answer reverses sharply on throughput: [pgvector + pgvectorscale posted 471.57 QPS versus Qdrant's 41.47 QPS](https://www.tigerdata.com/blog/pgvector-vs-qdrant) at 99% recall on 50M × 768-dim Cohere embeddings. The comparison is not one-dimensional — Qdrant prioritizes tail latency consistency while pgvectorscale prioritizes peak throughput.

Q: Does pgvector support filtering?

Yes. pgvector runs inside PostgreSQL, so metadata filtering uses standard SQL predicates — `WHERE` clauses, joins, and subqueries — applied by the query planner. Both HNSW and IVFFlat indexes are available. The scaling implication: at low-to-moderate vector counts with broad filters, SQL predicate filtering is efficient. At billion-vector scale with high-selectivity filters, query plan choices and index memory pressure degrade performance in ways that StreamingDiskANN or Qdrant's one-stage traversal handle more predictably.

Q: What is pgvectorscale used for?

pgvectorscale adds the StreamingDiskANN search index to PostgreSQL on top of pgvector's existing data types and distance functions. It targets large-scale vector search workloads — specifically the regime where pgvector's in-memory HNSW index becomes too expensive to maintain and serve. StreamingDiskANN stores the graph on disk and streams neighborhoods into memory during traversal, enabling high throughput at lower memory cost. It is not a standalone database — it is a scale-oriented Postgres extension used alongside pgvector.

On a 50M-vector benchmark, pgvectorscale/Postgres delivered 11.4× higher throughput than Qdrant at 99% recall (471.57 QPS vs 41.47 QPS), while Qdrant's design favors tail-latency consistency. The result is workload-dependent, and the Tiger Data comparison notes that index build speed and operational trade-offs still matter.

By AxiomLogica Editorial

May 4, 2026 · 16 min read

Reviewed by Editorial


How we compared Qdrant, pgvector, and pgvectorscale

The comparison centers on a single, practically relevant question: which engine sustains the best throughput and tail latency on vector search at scale, under identical hardware and a fixed recall target? To answer it, the primary dataset used throughout is 50 million Cohere embeddings at 768 dimensions, evaluated on identical AWS hardware at a 99% recall target, the same conditions Tiger Data reported in its pgvector vs Qdrant comparison. That benchmark is ANN-benchmarks-style, not a test of a filtered production workload, so the filtered-search conclusions below are workload interpretation rather than measured results.
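A "99% recall target" means each engine is tuned until its approximate top-k overlaps the exact top-k on about 99% of results. The sketch below shows recall@k under two assumptions: results are lists of vector IDs, and an exact (brute-force) ground truth is available. The function names are illustrative and are not taken from any particular benchmark harness.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k neighbors that the approximate search returned."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])
    found = sum(1 for i in approx_ids[:k] if i in truth)
    return found / len(truth)


def mean_recall(approx_results, exact_results, k):
    """Average recall@k over a batch of queries (a common reporting convention)."""
    scores = [recall_at_k(a, e, k) for a, e in zip(approx_results, exact_results)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    exact = [[1, 2, 3, 4], [5, 6, 7, 8]]
    approx = [[1, 2, 3, 9], [5, 6, 7, 8]]  # first query misses one true neighbor
    print(mean_recall(approx, exact, k=4))  # 0.875
```

Engines are compared at equal recall because any ANN index can buy throughput by lowering recall. Fixing the recall first keeps the QPS numbers comparable.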

The benchmark methodology follows ANN-benchmarks-style evaluation: server-side engine, client on a separate machine, engine-specific configuration for collection creation, upload, and search. Qdrant's vector-db-benchmark repository codifies this structure, running the server in Docker Compose with a remote client to isolate query latency from local interference.
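The client side of such a harness records per-query latency and total elapsed time, then reports QPS and percentiles. The sketch below makes several simplifying assumptions. It uses a single sequential client, whereas real harnesses usually drive many concurrent clients. It computes nearest-rank percentiles. `search_fn` is a hypothetical callable wrapping whatever engine client you use.

```python
import math
import time


def percentile(samples, p):
    """Nearest-rank percentile (p in (0, 100]) of a list of latencies."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]


def run_queries(search_fn, queries):
    """Run queries from one client and summarize throughput and tail latency."""
    latencies = []
    start = time.perf_counter()
    for q in queries:
        t0 = time.perf_counter()
        search_fn(q)  # hypothetical: wraps an engine's search call
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return {
        "qps": len(queries) / elapsed,
        "p50": percentile(latencies, 50),
        "p95": percentile(latencies, 95),
        "p99": percentile(latencies, 99),
    }


if __name__ == "__main__":
    print(run_queries(lambda q: sum(range(1000)), list(range(500))))
```

Running the client on a separate machine, as the methodology above describes, keeps the client's own CPU work out of the measured server latency.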

Parameter	Value
Dataset	50M Cohere embeddings, 768 dimensions
Recall target	99%
Hardware	Identical AWS EC2 instances
Evaluation style	ANN-benchmarks (server + remote client)
Engines tested	Qdrant · pgvector · pgvector + pgvectorscale

Benchmark scope: same dataset, same hardware, same recall target

Tiger Data ran Qdrant and PostgreSQL with pgvector plus pgvectorscale on the same 50M-vector, 768-dimensional Cohere dataset and the same AWS hardware. As Tiger Data states: "We tested Postgres and Qdrant on a level playing field: 50 million embeddings, each with 768 dimensions."

Shared condition	Detail
Dataset	50M × 768-dim Cohere embeddings
Recall target	99%
Hardware parity	Identical AWS instances for all engines
Benchmark framework	ANN-benchmarks-style, server + remote client
Postgres build	pgvector + pgvectorscale, release-mode with native compiler opts

The benchmark cited here is ANN-oriented. Filtered-search behavior can diverge materially from pure ANN results, particularly at high filter selectivity. Treat the benchmark as a production filtered-search measurement only if you explicitly reproduce the filter workload.

Why filtered search changes the ranking

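Pure ANN benchmarks measure nearest-neighbor recall and throughput on an unfiltered index. A predicate narrows the candidate space, as in "return the top-10 vectors where category = 'electronics' and price < 200". Once that happens, the search can no longer treat the whole graph as eligible. Only the nodes that pass the filter can be returned, and that subset may be sparse or poorly connected. This changes which engine design wins.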
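The filtered query quoted above has a precise meaning: apply the predicate, then take the k nearest survivors. Below is a brute-force sketch of that ground truth, which is what a filtered recall target would be measured against. It uses a toy in-memory catalog with hypothetical field names and Euclidean distance.

```python
import heapq
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_filtered_topk(items, query, predicate, k):
    """Exact filtered kNN: keep rows whose payload passes, then rank by distance.

    items: iterable of (id, vector, payload) tuples.
    """
    matches = (
        (l2(vec, query), item_id)
        for item_id, vec, payload in items
        if predicate(payload)
    )
    return [item_id for _, item_id in heapq.nsmallest(k, matches)]


if __name__ == "__main__":
    catalog = [
        (1, [0.0, 0.0], {"category": "electronics", "price": 150}),
        (2, [0.1, 0.0], {"category": "electronics", "price": 350}),
        (3, [0.2, 0.1], {"category": "books", "price": 20}),
        (4, [0.9, 0.9], {"category": "electronics", "price": 120}),
    ]

    def wanted(p):
        return p["category"] == "electronics" and p["price"] < 200

    print(exact_filtered_topk(catalog, [0.05, 0.0], wanted, k=2))  # [1, 4]
```

Item 2 is as close to the query as item 1 but fails the price filter. The second result therefore comes from much farther away in vector space, and that gap is exactly where graph-based indexes struggle.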

Qdrant applies filters during HNSW traversal, avoiding a separate pre- or post-filtering pass. Its documentation acknowledges the failure mode directly: if too many vectors are filtered out, the HNSW graph becomes disconnected. For very restrictive filters, Qdrant's query planner uses payload-index cardinality estimates to switch from graph search to a scan over the points the payload index selects. No engine is immune to filter-induced degradation.
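In Qdrant, the filter is part of the search request itself, which lets the engine apply it during traversal rather than afterward. The sketch below builds a JSON body for Qdrant's REST search endpoint (`POST /collections/<collection>/points/search`), expressing the category-and-price example as a `must` filter. The payload keys and the `hnsw_ef` value are illustrative assumptions, not tuned settings. Check the request shape against the Qdrant version you run.

```python
import json


def qdrant_filtered_search_body(vector, category, max_price, limit=10, hnsw_ef=128):
    """Build a filtered search request body for Qdrant's REST API.

    "category" and "price" are hypothetical payload fields; hnsw_ef=128 is an
    illustrative search-time beam width, not a recommendation.
    """
    return {
        "vector": vector,
        "filter": {
            "must": [
                {"key": "category", "match": {"value": category}},
                {"key": "price", "range": {"lt": max_price}},
            ]
        },
        "limit": limit,
        "params": {"hnsw_ef": hnsw_ef},
    }


if __name__ == "__main__":
    body = qdrant_filtered_search_body([0.1] * 4, "electronics", 200)
    print(json.dumps(body, indent=2))
```

Creating payload indexes on the filtered fields is what gives Qdrant's planner the cardinality estimates it uses to choose between filtered graph search and a payload-index scan.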

pgvector handles filtering through standard SQL predicates and the query planner. With an approximate index (HNSW or IVFFlat), the predicate is applied to the candidates the index scan returns. A narrow filter can therefore leave fewer rows than the LIMIT asks for unless the search is widened. Alternatively, the planner may choose an ordinary index on the filtered column followed by an exact distance sort. How well either path works depends heavily on filter selectivity. pgvectorscale's StreamingDiskANN index alters this dynamic by redesigning the traversal path for large, disk-resident indexes, but its behavior under narrow versus broad filters still varies by workload.
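pgvector's README gives a concrete case of post-filtering. A condition matching 10% of rows, scanned with HNSW at the default `hnsw.ef_search` of 40, leaves about 4 matching rows on average. The arithmetic is sketched below. It is a simplification that assumes matching rows are spread uniformly among the index candidates. pgvector 0.8.0 and later also offer iterative index scans (`hnsw.iterative_scan`), which keep scanning until enough rows pass the filter.

```python
def expected_rows_after_post_filter(ef_search, selectivity):
    """Expected rows surviving a WHERE clause applied after an approximate scan
    that produced ef_search candidates (selectivity = fraction of rows passing)."""
    return ef_search * selectivity


def candidates_needed(k, selectivity):
    """Candidates the index must emit, on average, for k rows to survive the filter."""
    return k / selectivity


if __name__ == "__main__":
    # pgvector README example: 10% selectivity, default hnsw.ef_search = 40
    print(expected_rows_after_post_filter(40, 0.10))  # 4.0
    # A filter passing 1% of rows that still needs a top-10 result
    print(candidates_needed(10, 0.01))  # 1000.0
```

The same numbers explain why a 1% filter behaves so differently from an 80% filter. The candidate budget required grows inversely with selectivity.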

Pro Tip: An ANN leaderboard ranking tells you almost nothing about filtered-retrieval performance. Before accepting any benchmark number, verify whether the test included payload filters and what the filter selectivity was. A 10× throughput gap on unfiltered ANN can shrink or reverse when you add a predicate that passes 1% of vectors versus one that passes 80%.

Benchmark snapshot: throughput, tail latency, and recall

At 99% recall on the 50M-vector dataset, pgvector + pgvectorscale reached 471.57 QPS while Qdrant reached 41.47 QPS — an 11.4× throughput advantage for the Postgres stack. The surfaced source quantifies throughput, but it does not expose exact p95 and p99 numbers in the excerpt available here, so tail-latency comparison has to remain qualitative in this article.

Engine	QPS @ 99% recall	p95 latency	p99 latency
pgvector + pgvectorscale	471.57	Not surfaced in excerpt	Not surfaced in excerpt
Qdrant	41.47	Not surfaced in excerpt	Not surfaced in excerpt
pgvector (plain)	Below pgvectorscale	Not surfaced in excerpt	Not surfaced in excerpt

Plain pgvector without pgvectorscale performs below the combined stack in this benchmark — the StreamingDiskANN index is the primary driver of the Postgres throughput advantage.

What the 50M-vector result says about pgvectorscale vs Qdrant

The 11.4× throughput gap is the headline, but the mechanism matters. As Tiger Data states: "At 99% recall, Postgres enhanced with pgvector and pgvectorscale demonstrates significantly higher throughput, processing 471.57 queries per second compared to Qdrant's 41.47 queries per second."

Metric	pgvector + pgvectorscale	Qdrant
QPS @ 99% recall	471.57	41.47
Throughput ratio	11.4×	1×
Index type	StreamingDiskANN	HNSW
Tail latency	Not quantified in surfaced source	Not quantified in surfaced source

StreamingDiskANN's disk-oriented traversal lets a large index serve queries without holding the entire graph in memory. This is the most plausible architectural explanation for the Postgres stack's raw-QPS lead in this benchmark. The advantage is reported under the stated conditions and does not extend automatically to every filter shape or dataset size.

Where Qdrant still wins on latency consistency

The QPS number favors pgvectorscale, but QPS alone does not determine SLA fitness. Tiger Data's comparison reports p50, p95, and p99 latencies alongside throughput, but the exact numbers are not exposed in the excerpt used here, so any measured latency advantage has to be left unclaimed in this article.
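Throughput and latency are linked only through concurrency. In a closed-loop benchmark, Little's law gives the mean and only the mean: with C requests in flight, mean latency = C / QPS. It says nothing about p95 or p99, so tail latency has to be measured rather than derived. The sketch below uses the two reported QPS figures with a hypothetical client count, not the benchmark's actual concurrency setting.

```python
def throughput_ratio(qps_a, qps_b):
    """How many times more queries per second engine A serves than engine B."""
    return qps_a / qps_b


def mean_latency_s(concurrency, qps):
    """Little's law for a closed-loop client: in-flight requests = throughput x latency."""
    return concurrency / qps


if __name__ == "__main__":
    pg, qd = 471.57, 41.47  # QPS at 99% recall, as reported by Tiger Data
    print(f"ratio: {throughput_ratio(pg, qd):.1f}x")  # ratio: 11.4x
    clients = 16  # hypothetical concurrency, chosen only for illustration
    for name, qps in (("pgvector + pgvectorscale", pg), ("Qdrant", qd)):
        print(f"{name}: mean latency {1000 * mean_latency_s(clients, qps):.1f} ms")
```

Two engines with the same mean latency can still have very different p99 values. That is why the percentile columns matter even when the QPS gap is large.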

The key trade-off is architectural rather than numeric. Qdrant applies payload filters during HNSW traversal and runs as a dedicated vector service, while PostgreSQL shares CPU, memory, and connection-management overhead with other relational workloads. Where p99 spikes eat into user-facing timeout budgets, the tail-latency distribution matters more than peak throughput, even though the exact percentile values are not available here.

Latency consideration	Qdrant posture	pgvector + pgvectorscale posture
Filter application	During HNSW traversal (one-stage)	SQL predicate + StreamingDiskANN
Shared resource contention	Isolated vector-only engine	Shared Postgres process pool
Tail latency under filters	Architecturally optimized, exact p95/p99 not surfaced here	Benchmark-dependent, exact p95/p99 not surfaced here
Reported percentiles	p50, p95, p99 in Tiger Data source	p50, p95, p99 in Tiger Data source

Qdrant: strengths, weak spots, and ideal workload fit

Qdrant is a purpose-built vector search engine written in Rust, designed around payload-aware filtered retrieval. Its HNSW implementation integrates filter predicates directly into graph traversal rather than applying them as a separate stage, which preserves recall under complex, multi-condition filters that would otherwise require expensive post-processing.

Dimension	Qdrant	pgvector	pgvector + pgvectorscale
Payload filtering	Native, one-stage during HNSW traversal	SQL predicates through PostgreSQL planner	SQL predicates through PostgreSQL planner
Multi-tenancy	Payload-based partitioning (tenant key in one collection) or collection-per-tenant	PostgreSQL schema and row-level controls	PostgreSQL schema and row-level controls
Scaling model	Horizontal sharding, distributed mode	Vertical scale inside PostgreSQL	Disk-resident index on PostgreSQL
Operational burden	Standalone service, separate lifecycle	Postgres extension, no separate service	Postgres extension plus index-management overhead

Where Qdrant is a better operational fit

Qdrant runs as an independent service with no relational database dependency. Some teams treat vector search as a distinct capability that is not tightly coupled to transactional Postgres tables. For them, running Qdrant separately isolates resource consumption and keeps vector queries from competing with relational queries for CPU, memory, and I/O. It also lets the vector index be tuned, scaled, and upgraded on its own schedule.

Pro Tip: If your vector workload runs alongside heavy OLTP writes or complex analytical queries in the same database, co-locating vectors in Postgres introduces resource contention that a dedicated vector store avoids. With Qdrant's operational isolation, a write spike on your relational layer does not compete for the same resources as vector retrieval, so it is far less likely to degrade vector p99.

Where Qdrant loses ground in this benchmark

On the cited 50M-vector, 99% recall benchmark, Qdrant posted 41.47 QPS against 471.57 QPS for pgvector+pgvectorscale — an 11.4× throughput deficit in this specific evaluation.

Watch Out: The 41.47 QPS result is workload-dependent. Tiger Data's benchmark reflects a specific dataset (50M × 768-dim Cohere embeddings), a specific recall target (99%), and specific AWS hardware. Qdrant's relative performance changes materially with different filter selectivities, different vector dimensionalities, and different concurrency patterns. Do not treat the 11.4× gap as a universal ceiling on Qdrant throughput — reproduce the benchmark on your actual workload before committing.

pgvector: when the PostgreSQL extension is enough

pgvector adds vector similarity search to PostgreSQL as an extension, supporting both HNSW and IVFFlat index types. Filtering is expressed through standard SQL predicates and joins, the same WHERE clauses engineers already use for relational queries. As Tiger Data's documentation states: "You can rely on familiar pgvector capabilities (for example HNSW and IVFFlat indexes)…"

Dimension	pgvector
Filtering mechanism	SQL predicates, query planner
Index types	HNSW, IVFFlat
Ecosystem fit	Native Postgres — joins, transactions, RLS
Scale ceiling	Hardware/config-dependent; degrades before pgvectorscale
Operational model	Postgres extension, no separate service

What pgvector does well for mixed relational and vector queries

The core strength of pgvector is co-location: vectors live in the same table as the metadata they describe, join freely against other Postgres tables, and participate in transactions. Because pgvector runs inside PostgreSQL, it inherits Postgres's join semantics and transactional guarantees. Applying a metadata filter requires no synchronization between a separate vector store and a relational source of truth; the filter executes in a single query plan.
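One statement can combine an access-control join, a tenant filter, and nearest-neighbor ordering, as the sketch below shows. It builds a parameterized query in the `%s` placeholder style used by drivers such as psycopg, where it would be run with `cursor.execute(sql, params)`. The tables and columns (`documents`, `document_acl`, `user_id`, `tenant_id`) are hypothetical. `<=>` is pgvector's cosine-distance operator.

```python
def vector_literal(values):
    """Format floats in pgvector's text input form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def filtered_knn_query(query_vec, user_id, tenant_id, k=10):
    """Return (sql, params) for a filtered, permission-aware kNN query.

    All table and column names are hypothetical. Ordering by the distance
    expression (rather than the alias) keeps the query in the shape pgvector
    documents for index-assisted ordering.
    """
    q = vector_literal(query_vec)
    sql = (
        "SELECT d.id, d.embedding <=> %s::vector AS distance "
        "FROM documents AS d "
        "JOIN document_acl AS a ON a.document_id = d.id "
        "WHERE a.user_id = %s AND d.tenant_id = %s "
        "ORDER BY d.embedding <=> %s::vector "
        "LIMIT %s"
    )
    params = (q, user_id, tenant_id, q, k)
    return sql, params


if __name__ == "__main__":
    sql, params = filtered_knn_query([0.1, 0.2, 0.3], user_id=42, tenant_id="acme")
    print(sql)
    print(params)
```

Because the permission check and the vector ranking live in one query, nothing has to be kept in sync across two stores.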

Pro Tip: If your application needs to join vector results against user permissions, product catalog metadata, or real-time inventory counts, keeping vectors in Postgres with pgvector eliminates an entire class of consistency problems. A separate vector store with a Postgres metadata backend requires two-phase retrieval and risks staleness between the stores on writes.

Where pgvector alone stops being the answer

Plain pgvector begins to show stress as dataset size grows past tens of millions of vectors and query concurrency increases. The HNSW index needs to stay in memory for efficient graph traversal. At billion-vector scale, memory pressure forces one of two outcomes: an index that no longer fits in RAM, or a configuration trade-off that degrades recall. Write cost on large HNSW indexes is also real. Each insertion runs a graph search and rewrites neighbor lists on existing index pages, which competes with read throughput for CPU and I/O.
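A back-of-the-envelope estimate shows how quickly memory pressure builds. The sketch below counts only raw vector payload: 4 bytes per dimension for pgvector's `vector` type and 2 bytes for its `halfvec` type. The HNSW graph's neighbor lists and page overhead come on top and are deliberately left out, because they depend on index parameters.

```python
def raw_vector_bytes(n_vectors, dims, bytes_per_dim):
    """Bytes needed for the vector values alone (no graph links, no page overhead)."""
    return n_vectors * dims * bytes_per_dim


def fmt_gb(n_bytes):
    """Decimal gigabytes with thousands separators."""
    return f"{n_bytes / 1e9:,.1f} GB"


if __name__ == "__main__":
    dims = 768
    for n in (50_000_000, 1_000_000_000):
        f32 = raw_vector_bytes(n, dims, 4)  # pgvector `vector`: 4-byte floats
        f16 = raw_vector_bytes(n, dims, 2)  # pgvector `halfvec`: 2-byte floats
        print(f"{n:,} vectors: float32 {fmt_gb(f32)}, float16 {fmt_gb(f16)}")
```

At 50M × 768 dimensions the float32 values alone come to about 153.6 GB. At one billion vectors they reach about 3,072 GB before any graph structure is counted.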

Watch Out: pgvector has no hard vector-count ceiling, but operational pressure — index memory footprint, write amplification on HNSW, and query planner behavior under highly selective filters — builds well before you reach 1 billion vectors. pgvectorscale's StreamingDiskANN index exists specifically to address this regime.

pgvectorscale: what changes in Postgres at large scale

pgvectorscale is a PostgreSQL extension built by Timescale (now Tiger Data) that adds the StreamingDiskANN index on top of pgvector's existing data types and distance functions. It does not replace pgvector; it extends it. As Tiger Data describes: "Pgvectorscale complements pgvector by building on its data type and distance functions with a new search index, StreamingDiskANN, which is purpose-built for high performance and cost-efficient scalability."
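The layering is visible in the setup SQL. pgvectorscale's extension is named `vectorscale`, and `CASCADE` installs pgvector (`vector`) as its dependency. The new index access method is `diskann`, while plain pgvector uses `hnsw`, whose defaults are `m = 16` and `ef_construction = 64`. The sketch below only assembles the statements, based on the two projects' READMEs, so verify them against the versions you install. The table and column names are hypothetical.

```python
def setup_statements(table="documents", column="embedding", use_diskann=True):
    """Return SQL to enable the extensions and build an ANN index.

    table and column are hypothetical and interpolated directly, so they must be
    trusted identifiers, never user input.
    """
    if use_diskann:
        return [
            # CASCADE also installs pgvector ("vector"), which vectorscale builds on
            "CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;",
            f"CREATE INDEX ON {table} USING diskann ({column} vector_cosine_ops);",
        ]
    return [
        "CREATE EXTENSION IF NOT EXISTS vector;",
        # m and ef_construction shown at pgvector's default values
        f"CREATE INDEX ON {table} USING hnsw ({column} vector_cosine_ops) "
        "WITH (m = 16, ef_construction = 64);",
    ]


if __name__ == "__main__":
    for stmt in setup_statements():
        print(stmt)
```

Both indexes sit on the same `vector` column and operator class, which is what "building on pgvector's data type and distance functions" means in practice.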

The Tiger Data benchmark ran self-hosted PostgreSQL with pgvector and pgvectorscale on AWS hardware, compared directly against standalone vector databases under identical conditions. The resulting 471.57 QPS at 99% recall is the outcome of StreamingDiskANN's disk-resident traversal strategy, not of plain pgvector alone.

Dimension	pgvectorscale
Core addition	StreamingDiskANN index over pgvector types
Scale target	Large-scale, disk-resident vector search
Postgres compatibility	Extension — works alongside pgvector
Hosting model	Self-hosted Postgres (AWS EC2 in benchmark)
Cost vs managed vector DBs	75–79% lower cost than Pinecone, per Tiger Data

Why StreamingDiskANN matters for large filtered indexes

DiskANN-family indexes store the graph on disk and stream node neighborhoods into memory during traversal, rather than requiring the full graph to fit in RAM. This makes the index viable at scales where HNSW memory pressure becomes prohibitive — particularly relevant for billion-vector targets.
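The search pattern behind DiskANN-family indexes is a greedy best-first beam search. It keeps the closest `beam_width` candidates, expands the nearest unvisited one by reading its neighbor list, and stops when every candidate in the beam has been expanded. The toy below counts each neighbor-list read as one simulated disk fetch. It is an illustration of the technique and does not model pgvectorscale's storage layout, quantization, or streaming behavior. `DiskGraph` and `beam_search` are hypothetical names.

```python
import math


class DiskGraph:
    """Stand-in for a disk-resident graph: neighbor lists are read on demand."""

    def __init__(self, vectors, neighbors):
        self.vectors = vectors        # node id -> coordinates (kept in memory)
        self._neighbors = neighbors   # node id -> neighbor ids ("on disk")
        self.reads = 0

    def read_neighbors(self, node):
        self.reads += 1               # one simulated page fetch per expansion
        return self._neighbors[node]


def beam_search(graph, query, start, k, beam_width):
    """Greedy best-first search keeping the beam_width closest candidates."""

    def dist(n):
        return math.dist(graph.vectors[n], query)

    candidates = {start: dist(start)}
    visited = set()
    while True:
        frontier = [n for n in candidates if n not in visited]
        if not frontier:
            break                     # every candidate in the beam was expanded
        node = min(frontier, key=candidates.get)
        visited.add(node)
        for nb in graph.read_neighbors(node):
            if nb not in candidates:
                candidates[nb] = dist(nb)
        best = sorted(candidates, key=candidates.get)[:beam_width]
        candidates = {n: candidates[n] for n in best}
    return sorted(candidates, key=candidates.get)[:k]


if __name__ == "__main__":
    # A 10-node chain at x = 0..9; real Vamana/DiskANN graphs are far denser.
    vectors = {i: (float(i), 0.0) for i in range(10)}
    neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
    g = DiskGraph(vectors, neighbors)
    print(beam_search(g, (7.2, 0.0), start=0, k=1, beam_width=3), g.reads)  # [7] 9
```

Only the neighbor lists along the search path are touched. That is why memory can hold far less than the full graph while disk reads stay bounded per query.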

Pro Tip: StreamingDiskANN's disk-oriented design changes the throughput-vs-memory trade-off that makes plain pgvector expensive at scale. At 50M vectors, the benchmark shows 471.57 QPS at 99% recall on AWS hardware, a result the source attributes to the pgvectorscale stack rather than to plain pgvector's in-memory HNSW. If you are budgeting memory for a large Postgres instance to hold a pgvector HNSW index, benchmark pgvectorscale's StreamingDiskANN against your workload before sizing the instance.

The caveat: StreamingDiskANN's filtered-search behavior, like any ANN index, varies with filter selectivity. The Tiger Data benchmark is primarily ANN-oriented; teams with narrow, high-selectivity filters should validate on their own data before treating the 11.4× QPS advantage as guaranteed.

Self-hosting on AWS EC2: the cost and ops trade-off

Self-hosting PostgreSQL with pgvector and pgvectorscale on AWS EC2 can cost substantially less than a managed vector database service. Tiger Data reports 75–79% lower cost than Pinecone in its direct comparison. Hardware parity in the benchmark was maintained by running both Postgres and the standalone vector DBs on equivalent AWS instances.

Dimension	Self-hosted pgvector + pgvectorscale	Managed / standalone vector DB
Infrastructure cost	Lower (EC2 on-demand or reserved)	Higher (managed service markup)
Ops overhead	Postgres DBA skills required	Vendor-managed availability
Upgrade path	Manual extension upgrades	Vendor-controlled
Existing Postgres teams	Low marginal overhead	New toolchain overhead
Isolation from relational load	None — shared Postgres process	Full isolation

The operational overhead is real: index builds at large scale require maintenance windows, EC2 instance sizing decisions tie directly to StreamingDiskANN memory patterns, and the team bears full responsibility for availability, backups, and upgrade coordination. For teams already operating Postgres infrastructure at scale, the marginal ops cost is manageable. For teams new to Postgres operations, a managed vector store can reduce engineering burden even at higher infrastructure cost.

Decision matrix: which engine to choose for your workload

The 11.4× QPS advantage for pgvectorscale is significant under the benchmarked conditions, but it does not by itself determine the correct choice. Filter selectivity, tail-latency SLOs, existing infrastructure, and team operational capability each shift the decision.

Workload trait	Qdrant	pgvector (plain)	pgvector + pgvectorscale
Filtered search, low-selectivity	Strong	Adequate	Strong
Filtered search, high-selectivity	Strong (one-stage HNSW)	Degrades	Benchmark-dependent
Peak throughput @ 99% recall (50M vectors)	41.47 QPS	Below pgvectorscale	471.57 QPS
Tail latency consistency	Architecturally prioritized	Planner-dependent	Benchmark-dependent
Existing Postgres footprint	Separate service	Native extension	Native extension
Mixed relational + vector queries	Requires dual-store sync	Native SQL joins	Native SQL joins
Billion-vector scale	Horizontal sharding	Memory-constrained	Disk-resident index
Ops complexity	Standalone service	Standard Postgres	Postgres + extension mgmt

Choose Qdrant when filtered latency consistency matters most

Qdrant is the right choice when your SLA is defined by p99 latency under complex payload filters, when the vector workload must be isolated from relational database resource contention, or when the team is not already operating Postgres infrastructure.

Your application fires multi-condition filters (nested, array-valued, or compound payloads) and needs a consistently low p99 (for example, sub-50 ms) as query concurrency rises
The vector workload is operationally distinct from your relational data — no tight join requirements at query time
You need horizontal sharding across vector collections without committing to Postgres cluster management

Choose pgvectorscale when throughput and Postgres consolidation matter

pgvectorscale paired with pgvector is the strongest option when raw QPS at a fixed recall target is the primary constraint and your team already operates Postgres.

Your 50M+ vector workload requires sustained throughput above what plain pgvector's in-memory HNSW can provide, and the benchmark-validated 471.57 QPS at 99% recall fits your concurrency model
Vectors and relational metadata live in the same Postgres schema — SQL joins at query time are a hard requirement
Your team operates EC2-hosted Postgres and can absorb extension management overhead in exchange for the 75–79% infrastructure cost savings Tiger Data reports versus Pinecone
Postgres consolidation reduces toolchain surface area and you are not introducing a new service dependency

Choose plain pgvector without pgvectorscale only when dataset size stays below approximately 10–20M vectors and query concurrency is modest — the operational simplicity of a single extension without StreamingDiskANN is a real advantage at that scale.
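The rules of thumb in this section can be written as a small function and used as a starting point for which engines to benchmark, not as a verdict. The sketch below makes three assumptions. SQL joins at query time are treated as a hard requirement that keeps you in Postgres. A p99-under-filters SLA pushes toward Qdrant otherwise. The plain-pgvector cutoff is set at 10M vectors, the conservative end of the ~10–20M guideline. All field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    n_vectors: int
    needs_sql_joins: bool            # vectors must join relational tables at query time
    runs_postgres_already: bool      # team already operates Postgres
    p99_under_filters_is_sla: bool   # tail latency under payload filters defines the SLA
    high_concurrency: bool


def suggest_engine(w: Workload) -> str:
    """Encode this section's heuristics; always validate with your own benchmark."""
    stay_in_postgres = w.needs_sql_joins or (
        w.runs_postgres_already and not w.p99_under_filters_is_sla
    )
    if not stay_in_postgres:
        return "Qdrant"
    if w.n_vectors < 10_000_000 and not w.high_concurrency:
        return "pgvector"
    return "pgvector + pgvectorscale"


if __name__ == "__main__":
    print(suggest_engine(Workload(50_000_000, True, True, False, True)))
```

Ordering the checks so that joins win over latency SLAs reflects the section's framing of joins as a hard requirement. Reverse the order if your latency SLO is the harder constraint.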

Bottom Line: On a 50M-vector benchmark at 99% recall, pgvector + pgvectorscale delivers 11.4× higher QPS than Qdrant (471.57 vs 41.47). Choose pgvectorscale when throughput and Postgres consolidation are the constraints; choose Qdrant when tail latency consistency and operational isolation from relational workloads are the constraints; use plain pgvector when your dataset is small enough that StreamingDiskANN overhead is not justified.

Sources & References

Qdrant vector-db-benchmark — benchmark framework with Docker Compose server plus remote client isolation.
Tiger Data: pgvector vs Qdrant — primary source for the 471.57 vs 41.47 QPS result at 99% recall.
Tiger Data: pgvector vs Pinecone — StreamingDiskANN description and scale-oriented PostgreSQL extension context.
Tiger Data: Why Postgres wins for AI and vector workloads — 50M-vector benchmark context and hardware parity statement.
Tiger Data: pgvector + pgvectorscale docs — pgvector HNSW and IVFFlat capabilities reference.
Qdrant benchmarks page — filter behavior and HNSW disconnection description.
Tiger Data: pgvector cost vs Pinecone — 75–79% cost advantage claim for self-hosted Postgres.

