How we compared Qdrant, pgvector, and pgvectorscale
The comparison centers on one practical question: which engine sustains the best throughput and tail latency on vector search at scale, under identical hardware and a fixed recall target? The primary dataset used throughout is 50 million Cohere embeddings at 768 dimensions, evaluated on identical AWS hardware at a 99% recall target, the same conditions Tiger Data reported in their pgvector vs Qdrant comparison. That benchmark is ANN-benchmarks-style, not a direct test of a filtered production workload, so the filtered-search conclusions below should be treated as workload interpretation rather than a measured result.
The benchmark methodology follows ANN-benchmarks-style evaluation: server-side engine, client on a separate machine, engine-specific configuration for collection creation, upload, and search. Qdrant's vector-db-benchmark repository codifies this structure, running the server in Docker Compose with a remote client to isolate query latency from local interference.
| Parameter | Value |
|---|---|
| Dataset | 50M Cohere embeddings, 768 dimensions |
| Recall target | 99% |
| Hardware | Identical AWS EC2 instances |
| Evaluation style | ANN-benchmarks (server + remote client) |
| Engines tested | Qdrant · pgvector · pgvector + pgvectorscale |
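The 99% recall target in the table is a statement about result quality: of the true nearest neighbors for a query, what fraction did the engine return? A minimal sketch of that calculation follows, assuming ground-truth neighbor IDs are available from an exact search. The function names and sample IDs are illustrative and are not taken from the Tiger Data harness.

```python
from typing import Sequence


def recall_at_k(retrieved: Sequence[int], ground_truth: Sequence[int], k: int) -> float:
    """Fraction of the true top-k neighbors that appear in the retrieved top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    true_top_k = set(ground_truth[:k])
    hits = len(set(retrieved[:k]) & true_top_k)
    return hits / k


def mean_recall(all_retrieved, all_truth, k: int) -> float:
    """Average recall@k over a batch of queries."""
    scores = [recall_at_k(r, t, k) for r, t in zip(all_retrieved, all_truth)]
    return sum(scores) / len(scores)


# Hypothetical neighbor IDs for two queries with k = 4.
retrieved = [[1, 2, 3, 9], [5, 6, 7, 8]]
truth = [[1, 2, 3, 4], [5, 6, 7, 8]]
print(mean_recall(retrieved, truth, k=4))  # 0.875
```

A throughput figure is only comparable across engines when each engine is tuned to hit the same mean recall, which is why the benchmark fixes the target first and reports QPS second.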
Benchmark scope: same dataset, same hardware, same recall target
Tiger Data ran Qdrant and PostgreSQL with pgvector plus pgvectorscale on the same 50M-vector, 768-dimensional Cohere dataset and the same AWS hardware. As Tiger Data states: "We tested Postgres and Qdrant on a level playing field: 50 million embeddings, each with 768 dimensions."
| Shared condition | Detail |
|---|---|
| Dataset | 50M × 768-dim Cohere embeddings |
| Recall target | 99% |
| Hardware parity | Identical AWS instances for all engines |
| Benchmark framework | ANN-benchmarks-style, server + remote client |
| Postgres build | pgvector + pgvectorscale, release-mode with native compiler opts |
This benchmark is ANN-oriented. Filtered-search behavior can diverge materially from pure ANN results, particularly with highly selective filters that pass only a small fraction of vectors, so the benchmark should not be read as a direct production filtered-search measurement unless the filter workload is explicitly reproduced.
Why filtered search changes the ranking
Pure ANN benchmarks measure nearest-neighbor recall and throughput on an unfiltered index. Once a predicate narrows the candidate space (for example, "return the top-10 vectors where category = 'electronics' and price < 200"), the ranking can change, because each engine combines the predicate with index traversal in a different way.
Qdrant applies filters during HNSW traversal, avoiding a separate pre- or post-filtering pass. Its documentation acknowledges the failure mode directly: "If too many vectors are filtered out, so the HNSW graph becomes disconnected." When a filter excludes most vectors, graph disconnection can degrade recall and force fallback behavior, so Qdrant is not immune to filter-induced degradation either.
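To make the filtered query concrete, here is a sketch of a search request body that carries the example predicate (category equals "electronics", price below 200) alongside the query vector. It assumes Qdrant's documented filter syntax of a `must` list with `match` and `range` conditions. The field names and the three-element vector are hypothetical, and the exact endpoint and body shape should be checked against the Qdrant version you run.

```python
import json


def build_filtered_search(vector, category, max_price, limit=10):
    """Build a search request body with a payload filter.

    The filter combines a `match` condition and a `range` condition under
    `must`, so both have to hold. Field names are hypothetical.
    """
    return {
        "vector": list(vector),
        "filter": {
            "must": [
                {"key": "category", "match": {"value": category}},
                {"key": "price", "range": {"lt": max_price}},
            ]
        },
        "limit": limit,
    }


# A toy 3-dimensional vector stands in for a 768-dimensional embedding.
body = build_filtered_search([0.1, 0.2, 0.3], "electronics", 200)
print(json.dumps(body, indent=2))
```

The filter travels in the same request as the vector, which is what lets the engine evaluate it during traversal instead of in a second pass.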
pgvector handles filtering through standard SQL predicates and query planner logic. At moderate scale the planner can push predicates alongside an HNSW scan, but the effectiveness depends heavily on filter selectivity and the shape of the index. pgvectorscale's StreamingDiskANN index alters this dynamic by redesigning the traversal path for large, disk-resident indexes — but the behavior under narrow filters versus broad filters still varies by workload.
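As a rough illustration of the SQL side, the sketch below builds a filtered nearest-neighbor query for pgvector. `<=>` is pgvector's cosine-distance operator. The `items` table, its columns, and the `%s` placeholder style (as used by psycopg) are assumptions for illustration only.

```python
def to_vector_literal(embedding):
    """Format a Python sequence as a pgvector text literal such as '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


def filtered_knn_query(embedding, category, max_price, k=10):
    """Return (sql, params) for a filtered nearest-neighbor query.

    The WHERE clause is an ordinary SQL predicate; the planner decides how
    to combine it with the vector index. Table and column names are hypothetical.
    """
    sql = (
        "SELECT id "
        "FROM items "
        "WHERE category = %s AND price < %s "
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    return sql, (category, max_price, to_vector_literal(embedding), k)


sql, params = filtered_knn_query([0.1, 0.2], "electronics", 200)
print(sql)
print(params)
```

Running the statement under `EXPLAIN ANALYZE` on your own data is the practical way to see whether the planner chose the vector index, a metadata index, or a sequential scan for a given filter.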
Pro Tip: An ANN leaderboard ranking tells you almost nothing about filtered-retrieval performance. Before accepting any benchmark number, verify whether the test included payload filters and what the filter selectivity was. A 10× throughput gap on unfiltered ANN can shrink or reverse when you add a predicate that passes 1% of vectors versus one that passes 80%.
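The effect of selectivity can be approximated with simple arithmetic. If a filter is applied after the vector search, the engine has to examine roughly k divided by the pass fraction candidates to keep k results. The sketch below assumes matching rows are spread uniformly among the nearest neighbors, which real data often violates, so treat the numbers as a lower bound on the extra work.

```python
import math


def candidates_needed(k: int, pass_fraction: float) -> int:
    """Expected unfiltered neighbors to examine so that k survive a post-filter.

    Assumes matches are uniformly distributed among the neighbors.
    """
    if not 0 < pass_fraction <= 1:
        raise ValueError("pass_fraction must be in (0, 1]")
    return math.ceil(k / pass_fraction)


for fraction in (0.80, 0.01):
    needed = candidates_needed(10, fraction)
    print(f"filter passes {fraction:.0%}: examine about {needed} candidates for top-10")
```

Under this assumption a top-10 query needs about 13 candidates when the filter passes 80% of vectors and about 1,000 when it passes 1%, which is the gap that lets filtered results diverge from an unfiltered leaderboard.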
Benchmark snapshot: throughput, tail latency, and recall
At 99% recall on the 50M-vector dataset, pgvector + pgvectorscale reached 471.57 QPS while Qdrant reached 41.47 QPS, an 11.4× throughput advantage for the Postgres stack. The excerpt of the source available here quantifies throughput but not exact p95 and p99 values, so the tail-latency comparison in this article stays qualitative.
| Engine | QPS @ 99% recall | p95 latency | p99 latency |
|---|---|---|---|
| pgvector + pgvectorscale | 471.57 | Not surfaced in excerpt | Not surfaced in excerpt |
| Qdrant | 41.47 | Not surfaced in excerpt | Not surfaced in excerpt |
| pgvector (plain) | Below pgvectorscale | Not surfaced in excerpt | Not surfaced in excerpt |
Plain pgvector without pgvectorscale performs below the combined stack in this benchmark — the StreamingDiskANN index is the primary driver of the Postgres throughput advantage.
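The 11.4× figure follows directly from the two reported QPS values. The short sketch below reproduces it and converts the rates into wall-clock time for a batch of queries. The batch size of one million is a hypothetical choice for illustration.

```python
PGVECTORSCALE_QPS = 471.57  # reported by Tiger Data at 99% recall
QDRANT_QPS = 41.47          # reported by Tiger Data at 99% recall


def throughput_ratio(fast_qps: float, slow_qps: float) -> float:
    """How many times more queries per second the faster engine serves."""
    return fast_qps / slow_qps


def seconds_to_serve(num_queries: int, qps: float) -> float:
    """Wall-clock seconds to serve a batch at a sustained rate."""
    return num_queries / qps


ratio = throughput_ratio(PGVECTORSCALE_QPS, QDRANT_QPS)
print(f"ratio: {ratio:.1f}x")  # ratio: 11.4x

batch = 1_000_000  # hypothetical batch size
for name, qps in (("pgvector + pgvectorscale", PGVECTORSCALE_QPS), ("Qdrant", QDRANT_QPS)):
    print(f"{name}: {seconds_to_serve(batch, qps) / 60:.0f} minutes for {batch:,} queries")
```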
What the 50M-vector result says about pgvectorscale vs Qdrant
The 11.4× throughput gap is the headline, but the mechanism matters. As Tiger Data states: "At 99% recall, Postgres enhanced with pgvector and pgvectorscale demonstrates significantly higher throughput, processing 471.57 queries per second compared to Qdrant's 41.47 queries per second."
| Metric | pgvector + pgvectorscale | Qdrant |
|---|---|---|
| QPS @ 99% recall | 471.57 | 41.47 |
| Throughput ratio | 11.4× | 1× |
| Index type | StreamingDiskANN | HNSW |
| Tail latency | Not quantified in surfaced source | Not quantified in surfaced source |
StreamingDiskANN's disk-oriented traversal strategy enables high throughput on large indexes without requiring the entire graph to reside in memory. This is the most likely architectural reason the Postgres stack outperforms Qdrant on raw QPS in this specific benchmark. That advantage is real and reproducible under the stated conditions; it does not extend automatically to every filter shape or every dataset size.
Where Qdrant still wins on latency consistency
The QPS number favors pgvectorscale, but QPS alone does not determine SLA fitness. Tiger Data's comparison reports p50, p95, and p99 latencies alongside throughput, but the exact numbers are not exposed in the excerpt used here, so any measured latency advantage has to be left unclaimed in this article.
The key trade-off is architectural rather than numeric: Qdrant applies payload filters during HNSW traversal and runs as a dedicated vector service, while PostgreSQL shares CPU, memory, and connection-management overhead with other relational workloads. For workloads where p99 spikes eat into user-facing timeout budgets, the tail-latency distribution matters more than peak throughput, even though the exact percentile values are not available here.
| Latency consideration | Qdrant posture | pgvector + pgvectorscale posture |
|---|---|---|
| Filter application | During HNSW traversal (one-stage) | SQL predicate + StreamingDiskANN |
| Shared resource contention | Isolated vector-only engine | Shared Postgres process pool |
| Tail latency under filters | Architecturally optimized, exact p95/p99 not surfaced here | Benchmark-dependent, exact p95/p99 not surfaced here |
| Reported percentiles | p50, p95, p99 in Tiger Data source | p50, p95, p99 in Tiger Data source |
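Because the exact percentiles are not available here, the practical step is to compute them from your own measurements. The sketch below uses the nearest-rank definition of a percentile and checks a p99 budget. Both latency samples are invented to show that a better median can coexist with a worse tail; they are not benchmark data for either engine.

```python
import math


def percentile(samples, pct: int):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_slo(samples, p99_budget_ms: float) -> bool:
    """True when the measured p99 fits inside the latency budget."""
    return percentile(samples, 99) <= p99_budget_ms


# Invented latencies in milliseconds, 100 requests each.
steady = [10] * 98 + [12, 15]
spiky = [5] * 97 + [200, 250, 300]

for name, samples in (("steady", steady), ("spiky", spiky)):
    print(name, "p50:", percentile(samples, 50), "p99:", percentile(samples, 99),
          "meets 50 ms budget:", meets_slo(samples, 50))
```

The spiky sample wins on p50 (5 ms against 10 ms) and still fails a 50 ms p99 budget, which is the pattern a QPS-only comparison cannot reveal.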
Qdrant: strengths, weak spots, and ideal workload fit
Qdrant is a purpose-built vector search engine written in Rust, designed around payload-aware filtered retrieval. Its HNSW implementation integrates filter predicates directly into graph traversal rather than applying them as a separate stage, which preserves recall under complex, multi-condition filters that would otherwise require expensive post-processing.
| Dimension | Qdrant | pgvector | pgvector + pgvectorscale |
|---|---|---|---|
| Payload filtering | Native, one-stage during HNSW traversal | SQL predicates through PostgreSQL planner | SQL predicates through PostgreSQL planner |
| Multi-tenancy | Payload-based partitioning (tenant ID field) or collection-per-tenant | PostgreSQL schema and row-level controls | PostgreSQL schema and row-level controls |
| Scaling model | Horizontal sharding, distributed mode | Vertical scale inside PostgreSQL | Disk-resident index on PostgreSQL |
| Operational burden | Standalone service, separate lifecycle | Postgres extension, no separate service | Postgres extension plus index-management overhead |
Where Qdrant is a better operational fit
Qdrant runs as an independent service with no relational database dependency. For teams where vector search is a distinct capability — not tightly coupled to transactional Postgres tables — running Qdrant separately isolates resource consumption, eliminates shared query planner contention, and allows the vector index to be tuned, scaled, and upgraded on its own schedule.
Pro Tip: If your vector workload runs alongside heavy OLTP writes or complex analytical queries in the same database, co-locating vectors in Postgres introduces resource contention that doesn't exist in a dedicated vector store. Qdrant's operational isolation means a spike in write throughput on your relational layer won't degrade p99 on vector retrieval.
Where Qdrant loses ground in this benchmark
On the cited 50M-vector, 99% recall benchmark, Qdrant posted 41.47 QPS against 471.57 QPS for pgvector+pgvectorscale — an 11.4× throughput deficit in this specific evaluation.
Watch Out: The 41.47 QPS result is workload-dependent. Tiger Data's benchmark reflects a specific dataset (50M × 768-dim Cohere embeddings), a specific recall target (99%), and specific AWS hardware. Qdrant's relative performance changes materially with different filter selectivities, different vector dimensionalities, and different concurrency patterns. Do not treat the 11.4× gap as a universal ceiling on Qdrant throughput — reproduce the benchmark on your actual workload before committing.
pgvector: when the PostgreSQL extension is enough
pgvector adds vector similarity search to PostgreSQL 16 as an extension, supporting both HNSW and IVFFlat index types. Filtering is expressed through standard SQL predicates and joins — the same WHERE clauses engineers already use for relational queries. As Tiger Data's documentation states: "You can rely on familiar pgvector capabilities (for example HNSW and IVFFlat indexes)…"
| Dimension | pgvector |
|---|---|
| Filtering mechanism | SQL predicates, query planner |
| Index types | HNSW, IVFFlat |
| Ecosystem fit | Native Postgres — joins, transactions, RLS |
| Scale ceiling | Hardware/config-dependent; degrades before pgvectorscale |
| Operational model | Postgres extension, no separate service |
What pgvector does well for mixed relational and vector queries
The core strength of pgvector is co-location: vectors live in the same table as the metadata they describe, join freely against other Postgres tables, and participate in transactions. PostgreSQL documents this behavior in its core SQL model, including joins and transactional guarantees, while pgvector inherits those capabilities inside the database. Applying a metadata filter requires no synchronization between a separate vector store and a relational source of truth — the filter executes in a single query plan.
Pro Tip: If your application needs to join vector results against user permissions, product catalog metadata, or real-time inventory counts, keeping vectors in Postgres with pgvector eliminates an entire class of consistency problems. A separate vector store with a Postgres metadata backend requires two-phase retrieval and risks staleness between the stores on writes.
Where pgvector alone stops being the answer
Plain pgvector begins to show stress as dataset size grows past tens of millions of vectors and query concurrency increases. The HNSW index needs to fit in memory for efficient graph traversal; as the dataset grows toward billion-vector scale, memory pressure forces either an index that no longer fits in RAM or a configuration trade-off that degrades recall. Write amplification on large HNSW indexes is a real cost: each insertion updates neighbor lists in the graph, and those index writes compete with read throughput.
Watch Out: pgvector has no hard vector-count ceiling, but operational pressure — index memory footprint, write amplification on HNSW, and query planner behavior under highly selective filters — builds well before you reach 1 billion vectors. pgvectorscale's StreamingDiskANN index exists specifically to address this regime.
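A back-of-the-envelope estimate shows why memory becomes the constraint. The sketch below computes the raw storage for single-precision vectors and adds a rough allowance for graph links. The link count and bytes per link are assumptions chosen for illustration and do not describe pgvector's actual on-disk layout.

```python
BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(num_vectors: int, dimensions: int) -> int:
    """Storage for the vectors alone, at 4 bytes per dimension."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32


def graph_link_bytes(num_vectors: int, links_per_vector: int, bytes_per_link: int) -> int:
    """Rough neighbor-list cost; both link parameters are assumptions."""
    return num_vectors * links_per_vector * bytes_per_link


vectors = raw_vector_bytes(50_000_000, 768)
links = graph_link_bytes(50_000_000, links_per_vector=32, bytes_per_link=8)

print(f"raw vectors: {vectors / 1e9:.1f} GB")  # raw vectors: 153.6 GB
print(f"graph links: {links / 1e9:.1f} GB")    # graph links: 12.8 GB
```

At the benchmark's 50M × 768 shape, the vectors alone come to about 153.6 GB before any graph overhead, so a fully memory-resident index already implies a large instance at this scale.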
pgvectorscale: what changes in Postgres at large scale
pgvectorscale is a PostgreSQL extension built by Timescale (now Tiger Data) that adds the StreamingDiskANN index on top of pgvector's existing data types and distance functions. It does not replace pgvector; it extends it. As Tiger Data describes: "Pgvectorscale complements pgvector by building on its data type and distance functions with a new search index, StreamingDiskANN, which is purpose-built for high performance and cost-efficient scalability."
The Tiger Data benchmark ran self-hosted PostgreSQL with pgvector and pgvectorscale on AWS hardware, compared directly against standalone vector databases under identical conditions. The resulting 471.57 QPS at 99% recall is the outcome of StreamingDiskANN's disk-resident traversal strategy, not of plain pgvector alone.
| Dimension | pgvectorscale |
|---|---|
| Core addition | StreamingDiskANN index over pgvector types |
| Scale target | Large-scale, disk-resident vector search |
| Postgres compatibility | Extension — works alongside pgvector |
| Hosting model | Self-hosted Postgres (AWS EC2 in benchmark) |
| Cost vs managed vector DBs | Up to 75–79% cheaper than Pinecone per Tiger Data |
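In practice, adopting pgvectorscale means installing the extension and creating a `diskann` index on an existing pgvector column. The sketch below generates those statements following the pattern in the pgvectorscale README. The table name, column name, and index name are hypothetical, and the identifiers are interpolated directly, so this is suitable for illustration only and not for untrusted input.

```python
def diskann_setup_statements(table: str = "items", column: str = "embedding", dims: int = 768):
    """Return SQL statements that enable pgvectorscale and index a vector column.

    `CASCADE` also installs pgvector if it is missing. Names are hypothetical.
    """
    return [
        "CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;",
        f"CREATE TABLE IF NOT EXISTS {table} ("
        f"id BIGINT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY, "
        f"{column} VECTOR({dims}));",
        f"CREATE INDEX {table}_{column}_diskann_idx ON {table} "
        f"USING diskann ({column} vector_cosine_ops);",
    ]


for statement in diskann_setup_statements():
    print(statement)
```

Queries keep using pgvector's distance operators after the index is created, so application SQL does not need to change when switching from an HNSW index to StreamingDiskANN.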
Why StreamingDiskANN matters for large filtered indexes
DiskANN-family indexes store the graph on disk and stream node neighborhoods into memory during traversal, rather than requiring the full graph to fit in RAM. This makes the index viable at scales where HNSW memory pressure becomes prohibitive — particularly relevant for billion-vector targets.
Pro Tip: StreamingDiskANN's disk-oriented design changes the throughput-vs-memory trade-off that makes plain pgvector expensive at scale. At 50M vectors, the benchmark shows 471.57 QPS at 99% recall on AWS hardware, a result that plain pgvector's in-memory HNSW did not reach in the same benchmark. If you are budgeting memory for a large Postgres instance to hold a pgvector HNSW index, benchmark pgvectorscale's StreamingDiskANN against your workload before sizing the instance.
The caveat: StreamingDiskANN's filtered-search behavior, like any ANN index, varies with filter selectivity. The Tiger Data benchmark is primarily ANN-oriented; teams with narrow, high-selectivity filters should validate on their own data before treating the 11.4× QPS advantage as guaranteed.
Self-hosting on AWS EC2: the cost and ops trade-off
Self-hosting PostgreSQL with pgvector and pgvectorscale on AWS EC2 delivers substantial cost advantages over managed vector database services — Tiger Data reports 75–79% lower cost than Pinecone in their direct comparison. The benchmark hardware parity was maintained by running both Postgres and standalone vector DBs on equivalent AWS instances.
| Dimension | Self-hosted pgvector + pgvectorscale | Managed / standalone vector DB |
|---|---|---|
| Infrastructure cost | Lower (EC2 on-demand or reserved) | Higher (managed service markup) |
| Ops overhead | Postgres DBA skills required | Vendor-managed availability |
| Upgrade path | Manual extension upgrades | Vendor-controlled |
| Existing Postgres teams | Low marginal overhead | New toolchain overhead |
| Isolation from relational load | None — shared Postgres process | Full isolation |
The operational overhead is real: index builds at large scale require maintenance windows, EC2 instance sizing decisions tie directly to StreamingDiskANN memory patterns, and the team bears full responsibility for availability, backups, and upgrade coordination. For teams already operating Postgres infrastructure at scale, the marginal ops cost is manageable. For teams new to Postgres operations, a managed vector store can reduce engineering burden even at higher infrastructure cost.
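To put the reported 75–79% saving in budget terms, the sketch below converts a managed-service bill into the implied self-hosted range. The $10,000 monthly figure is hypothetical, and the calculation covers infrastructure only; it does not price the engineering time described above.

```python
def self_hosted_cost_range(managed_monthly_cost: float,
                           low_saving: float = 0.75,
                           high_saving: float = 0.79):
    """Return (lowest, highest) self-hosted cost implied by the savings range."""
    lowest = managed_monthly_cost * (1 - high_saving)
    highest = managed_monthly_cost * (1 - low_saving)
    return lowest, highest


managed = 10_000  # hypothetical monthly managed-service bill in dollars
lowest, highest = self_hosted_cost_range(managed)
print(f"self-hosted estimate: ${lowest:,.0f} to ${highest:,.0f} per month")
```

If the added on-call and DBA effort costs more than the difference between the two bills, the managed option can still be the cheaper one overall.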
Decision matrix: which engine to choose for your workload
The 11.4× QPS advantage for pgvectorscale is significant and reproducible under the benchmarked conditions, but it does not universally determine the correct choice. Filter selectivity, tail latency SLOs, existing infrastructure, and team operational capability each shift the decision.
| Workload trait | Qdrant | pgvector (plain) | pgvector + pgvectorscale |
|---|---|---|---|
| Filtered search, broad filter (passes most vectors) | Strong | Adequate | Strong |
| Filtered search, narrow filter (passes few vectors) | Strong (one-stage HNSW) | Degrades | Benchmark-dependent |
| Peak throughput @ 99% recall (50M vectors) | 41.47 QPS | Below pgvectorscale | 471.57 QPS |
| Tail latency consistency | Architecturally prioritized | Planner-dependent | Benchmark-dependent |
| Existing Postgres footprint | Separate service | Native extension | Native extension |
| Mixed relational + vector queries | Requires dual-store sync | Native SQL joins | Native SQL joins |
| Billion-vector scale | Horizontal sharding | Memory-constrained | Disk-resident index |
| Ops complexity | Standalone service | Standard Postgres | Postgres + extension mgmt |
Choose Qdrant when filtered latency consistency matters most
Qdrant is the right choice when your SLA is defined by p99 latency under complex payload filters, when the vector workload must be isolated from relational database resource contention, or when the team is not already operating Postgres infrastructure.
- Your application fires multi-condition filters (nested, array-valued, or compound payloads) and needs a consistent p99 within a fixed latency budget regardless of query concurrency
- The vector workload is operationally distinct from your relational data — no tight join requirements at query time
- You need horizontal sharding across vector collections without committing to Postgres cluster management
Choose pgvectorscale when throughput and Postgres consolidation matter
pgvectorscale paired with pgvector is the strongest option when raw QPS at a fixed recall target is the primary constraint and your team already operates Postgres.
- Your 50M+ vector workload requires sustained throughput above what plain pgvector's in-memory HNSW can provide, and the benchmark-validated 471.57 QPS at 99% recall fits your concurrency model
- Vectors and relational metadata live in the same Postgres schema — SQL joins at query time are a hard requirement
- Your team operates EC2-hosted Postgres and can absorb extension management overhead in exchange for 75–79% infrastructure cost savings versus managed vector services
- Postgres consolidation reduces toolchain surface area and you are not introducing a new service dependency
Choose plain pgvector without pgvectorscale only when dataset size stays below approximately 10–20M vectors and query concurrency is modest — the operational simplicity of a single extension without StreamingDiskANN is a real advantage at that scale.
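The guidance in this section can be condensed into a small rule-based helper. The sketch below is one possible encoding, not a definitive policy. The field names are invented, the 10M threshold is the lower end of the rule of thumb above, and the choice to let a hard SQL-join requirement outweigh the Qdrant signals is an assumption.

```python
from dataclasses import dataclass

SMALL_DATASET_LIMIT = 10_000_000  # lower end of the 10-20M rule of thumb


@dataclass
class Workload:
    vector_count: int
    modest_concurrency: bool
    needs_sql_joins: bool
    runs_postgres_already: bool
    p99_under_filters_is_primary_slo: bool
    needs_isolation_from_relational_load: bool


def recommend_engine(w: Workload) -> str:
    """Map workload traits to a starting recommendation to validate by benchmark."""
    prefers_qdrant = (
        w.p99_under_filters_is_primary_slo
        or w.needs_isolation_from_relational_load
        or not w.runs_postgres_already
    )
    # Assumption: a hard SQL-join requirement outweighs the Qdrant signals.
    if prefers_qdrant and not w.needs_sql_joins:
        return "Qdrant"
    if w.vector_count < SMALL_DATASET_LIMIT and w.modest_concurrency:
        return "pgvector"
    return "pgvector + pgvectorscale"


catalog_search = Workload(
    vector_count=50_000_000,
    modest_concurrency=False,
    needs_sql_joins=True,
    runs_postgres_already=True,
    p99_under_filters_is_primary_slo=False,
    needs_isolation_from_relational_load=False,
)
print(recommend_engine(catalog_search))  # pgvector + pgvectorscale
```

The output is a starting point for a benchmark on your own data, in line with the caveats about filter selectivity and tail latency earlier in the article.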
FAQ
Bottom Line: On a 50M-vector benchmark at 99% recall, pgvector + pgvectorscale delivers 11.4× higher QPS than Qdrant (471.57 vs 41.47). Choose pgvectorscale when throughput and Postgres consolidation are the constraints; choose Qdrant when tail latency consistency and operational isolation from relational workloads are the constraints; use plain pgvector when your dataset is small enough that StreamingDiskANN overhead is not justified.
Is Qdrant faster than pgvector?
Against plain pgvector in an untuned configuration, Qdrant can be faster on filtered workloads because of its one-stage HNSW traversal. Against pgvector paired with pgvectorscale's StreamingDiskANN index, the answer reverses sharply on throughput: pgvector + pgvectorscale posted 471.57 QPS versus Qdrant's 41.47 QPS at 99% recall on 50M × 768-dim Cohere embeddings. The comparison is not one-dimensional: Qdrant's design prioritizes tail-latency consistency, while pgvectorscale prioritizes peak throughput.
Does pgvector support filtering?
Yes. pgvector runs inside PostgreSQL, so metadata filtering uses standard SQL predicates (WHERE clauses, joins, and subqueries) applied by the query planner. Both HNSW and IVFFlat indexes are available. At low-to-moderate vector counts with broad filters, SQL predicate filtering is efficient. At much larger scale with narrow filters that pass few rows, query plan choices and index memory pressure can degrade performance, a regime that StreamingDiskANN and Qdrant's one-stage traversal are designed to handle more predictably.
What is pgvectorscale used for?
pgvectorscale adds the StreamingDiskANN search index to PostgreSQL on top of pgvector's existing data types and distance functions. It targets large-scale vector search workloads — specifically the regime where pgvector's in-memory HNSW index becomes too expensive to maintain and serve. StreamingDiskANN stores the graph on disk and streams neighborhoods into memory during traversal, enabling high throughput at lower memory cost. It is not a standalone database — it is a scale-oriented Postgres extension used alongside pgvector.
Sources & References
- Qdrant vector-db-benchmark — benchmark framework with Docker Compose server plus remote client isolation.
- Tiger Data: pgvector vs Qdrant — primary source for the 471.57 vs 41.47 QPS result at 99% recall.
- Tiger Data: pgvector vs Pinecone — StreamingDiskANN description and scale-oriented PostgreSQL extension context.
- Tiger Data: Why Postgres wins for AI and vector workloads — 50M-vector benchmark context and hardware parity statement.
- Tiger Data: pgvector + pgvectorscale docs — pgvector HNSW and IVFFlat capabilities reference.
- Qdrant benchmarks page — filter behavior and HNSW disconnection description.
- Tiger Data: pgvector cost vs Pinecone — 75–79% cost advantage claim for self-hosted Postgres.
Keywords: Qdrant, pgvector, pgvectorscale, PostgreSQL 16, StreamingDiskANN, ANN-benchmarks, Cohere embeddings, AWS EC2, p95 latency, p99 latency, QPS, recall@99%, payload filtering, Rust, Timescale



