Multi-tenant vector search forces a choice that no general ANN benchmark captures: how much memory overhead do you accept in exchange for per-tenant query isolation? The Curator paper (arXiv:2401.07119, Jin & Wu et al.) frames this directly — every conventional multi-tenancy design either wastes memory or degrades search performance, and Curator proposes an indexing approach aimed at escaping that binary. The paper's value is in articulating the trade-off precisely, not in claiming leaderboard-level ANN superiority.
Why multi-tenancy is hard in vector databases
The core tension is architectural. As the Curator abstract states: "Multi-tenancy in vector databases is currently achieved by building either a single, shared index among all tenants, or a per-tenant index." Each approach optimizes for a different resource: "The former optimizes for memory efficiency at the expense of search performance, while the latter does the opposite."
This is not a tuning problem — it is a structural constraint. A shared index allows tenants' vectors to occupy adjacent graph nodes or cluster centroids. When a query arrives for tenant A, the index must either scan regions owned by tenant B (wasting compute and polluting cache lines) or apply a post-hoc filter that discards irrelevant results after traversal (hurting recall or requiring over-fetch). Per-tenant indexes solve both problems but multiply memory footprint with tenant count, making them uneconomical when tenant populations are large or when tenant vector sets are small individually.
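The over-fetch problem described above can be illustrated with a small brute-force sketch. It is not taken from the Curator paper: the data is random, the 5% tenant share is an assumption, and `top_k` and `shared_search` are hypothetical helper names standing in for a real ANN index.

```python
import random

def top_k(query, items, k):
    """Brute-force k nearest items by squared Euclidean distance."""
    def dist(item):
        return sum((a - b) ** 2 for a, b in zip(query, item["vec"]))
    return sorted(items, key=dist)[:k]

def shared_search(query, index, tenant, k, fetch):
    """Fetch `fetch` candidates from the shared index, then drop off-tenant hits."""
    candidates = top_k(query, index, fetch)
    return [c for c in candidates if c["tenant"] == tenant][:k]

random.seed(0)
# Hypothetical split: tenant "A" owns 5% of a 2,000-vector shared index.
index = [
    {"id": i, "tenant": "A" if i % 20 == 0 else "B",
     "vec": (random.random(), random.random())}
    for i in range(2000)
]
query, k = (0.5, 0.5), 10

# Reference answer: what a dedicated index for tenant A would return.
tenant_a = [item for item in index if item["tenant"] == "A"]
truth = {item["id"] for item in top_k(query, tenant_a, k)}

recall = {}
for fetch in (10, 100, 2000):
    hits = shared_search(query, index, "A", k, fetch)
    recall[fetch] = len({h["id"] for h in hits} & truth) / k
    print(f"fetch={fetch:4d} returned={len(hits):2d} recall={recall[fetch]:.2f}")
```

Recall against the per-tenant reference can only rise as `fetch` grows, and it reaches 1.0 once the whole shared index is scanned. That is the cost the paragraph describes: the shared design either returns too few in-tenant results or pays for traversal over vectors the tenant does not own.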
The Curator paper's operator-level contribution is to name this as a first-class engineering problem rather than a configuration choice, and to provide measured evidence for where the trade-off sits.
Bottom Line: The shared-versus-per-tenant index choice is not about preference — it governs whether you pay in memory or in tail latency. Curator proposes an indexing structure that targets both dimensions simultaneously, and its value is in the evidence for that claim, not in any claim of universal ANN superiority.
How Curator models tenant isolation and shared infrastructure
Curator is an indexing method for multi-tenant vector databases designed to avoid the forced choice between memory efficiency and query latency. The Semantic Scholar figure caption for the paper states: "Current multi-tenancy strategies incur inherent inefficiency in either query latency or memory usage, while Curator aims to optimize both simultaneously."
The mechanism keeps tenant-specific clustering trees while sharing common subtrees where tenant embeddings overlap. That architecture avoids the all-or-nothing choice: shared-index designs pollute traversal paths; per-tenant designs replicate identical graph edges for every new tenant. Curator encodes tenant-specific structure compactly and shares only what can safely be shared.
The table below captures the qualitative isolation and latency behavior of the three designs:
| Design | Isolation model | Memory behavior | Latency implication |
|---|---|---|---|
| Shared index | Logical (filter-based) | Low — one index for all tenants | Search traversal visits off-tenant nodes; filtering overhead at query time |
| Per-tenant index | Physical (separate index per tenant) | High — scales with tenant count | Clean traversal; no cross-tenant noise |
| Curator | Structural (tenant-specific subtrees + shared regions) | Moderate — shared subtrees reduce duplication | Traversal scope bounded per tenant; less filter overhead than shared index |
This table is a qualitative summary of the cited paper. Exact numeric overhead ratios are reported only in the full PDF.
Shared index vs per-tenant index: the baseline designs
The shared-index model is the default in many managed vector stores. Pinecone implements multi-tenancy through namespaces: "Namespaces let you partition records within an index and are essential for implementing multitenancy when you need to isolate the data of each customer/user." On Standard and Enterprise plans, Pinecone supports up to 100,000 namespaces per index, with stated support for million-scale namespace counts in specific use cases.
Qdrant takes a tiered approach as of v1.16.0: "With tiered multitenancy, you can implement two levels of tenant isolation within a single collection, keeping small tenants together inside a shared Shard, while isolating large tenants into their own dedicated Shards." This makes Qdrant's model a hybrid — logically shared for small tenants, physically isolated for large ones.
Both approaches reflect the same underlying trade-off the Curator paper formalizes. Namespace-based designs (Pinecone multi-tenancy) push the isolation boundary to query routing. Shard-based designs (Qdrant) push it to storage allocation. Neither eliminates the memory-versus-latency tension; they shift where it is managed.
Pro Tip: The shared-vs-per-tenant framing is necessary but not sufficient for production planning. What the baseline analysis omits is tenant size distribution — a system with a few large tenants and many small ones behaves differently from one with uniform tenant sizes. Qdrant's tiered model encodes this asymmetry explicitly; Pinecone multi-tenancy leaves it to the operator to manage through index design. Neither the paper nor vendor documentation provides a formula for the break-even point.
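The size asymmetry described above can be sketched as a simple tiering rule. This is an illustration only: `assign_tiers`, the tenant names, the vector counts, and the `promote_at` threshold are all hypothetical, and the code does not call any Qdrant or Pinecone API.

```python
def assign_tiers(tenant_sizes, promote_at):
    """Place tenants below `promote_at` vectors in a shared tier, the rest in dedicated tiers."""
    shared = sorted(t for t, n in tenant_sizes.items() if n < promote_at)
    dedicated = sorted(t for t, n in tenant_sizes.items() if n >= promote_at)
    return shared, dedicated

# Hypothetical tenant sizes (vector counts) and promotion threshold.
tenant_sizes = {
    "acme": 2_000_000,
    "bolt": 1_200,
    "cyan": 800,
    "dune": 45_000,
    "echo": 5_000_000,
}
shared, dedicated = assign_tiers(tenant_sizes, promote_at=1_000_000)

# Share of all vectors that live in the shared tier.
shared_fraction = sum(tenant_sizes[t] for t in shared) / sum(tenant_sizes.values())

print(shared)     # ['bolt', 'cyan', 'dune']
print(dedicated)  # ['acme', 'echo']
print(f"{shared_fraction:.2%} of vectors sit in the shared tier")
```

In this made-up population most tenants are small but hold under 1% of the vectors, which is the kind of skew that makes a single break-even formula hard to state.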
Tenant-specific clustering trees and compact shared subtrees
Curator's indexing approach encodes tenant identity into the structure of the index rather than relying on post-query filtering. Where conventional ANN indexes (HNSW, IVF, ScaNN) build a global graph or cluster assignment without tenant awareness, Curator maintains per-tenant clustering trees with shared subtree regions when tenants' data distributions overlap.
This design has a concrete operator implication: the memory cost of adding a new tenant is not the cost of a full index replica, nor is it zero. It is proportional to how much of the new tenant's data occupies regions not already encoded in the shared subtree structure. Tenants with highly similar embeddings (e.g., many SaaS tenants querying a shared domain corpus) benefit most; tenants with entirely distinct data distributions approach the memory cost of a per-tenant index.
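A toy model makes the marginal-cost argument concrete. It is a sketch under stated assumptions, not Curator's actual data structure: tree nodes are represented as path strings, every node is assumed to cost one unit of memory, and `path_nodes`, `tenant_nodes`, and `marginal_nodes` are hypothetical helpers.

```python
def path_nodes(leaf):
    """All tree nodes on the root-to-leaf path, e.g. "0/2/1" -> {"0", "0/2", "0/2/1"}."""
    parts = leaf.split("/")
    return {"/".join(parts[:i]) for i in range(1, len(parts) + 1)}

def tenant_nodes(leaves):
    """Every node a tenant needs in order to reach its leaf clusters."""
    nodes = set()
    for leaf in leaves:
        nodes |= path_nodes(leaf)
    return nodes

def marginal_nodes(existing, new_leaves):
    """Nodes a new tenant adds when each shared node is stored only once."""
    return len(tenant_nodes(new_leaves) - existing)

# Two existing tenants whose data falls in mostly the same clusters.
tenants = {
    "t1": ["0/0/0", "0/0/1", "0/1/0"],
    "t2": ["0/0/0", "0/0/1", "0/1/1"],
}
existing = set().union(*(tenant_nodes(leaves) for leaves in tenants.values()))

overlapping = ["0/0/1", "0/1/0"]        # regions already encoded
disjoint = ["0/2/0", "0/2/1", "0/3/0"]  # regions no tenant has used yet

print(marginal_nodes(existing, overlapping))  # 0
print(marginal_nodes(existing, disjoint))     # 5
print(len(tenant_nodes(disjoint)))            # 6, the cost of a full private tree
```

The overlapping tenant adds nothing to the shared structure, while the disjoint tenant adds five of the six nodes a private tree would need, sharing only the root.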
Pro Tip: If your tenant population has high semantic overlap — for instance, all tenants query the same product catalog with tenant-specific access control — Curator's shared-subtree model captures the most memory savings. If tenants are domain-disjoint (e.g., a multi-industry platform where each tenant's vectors have no distributional overlap), the shared subtree shrinks and memory savings erode toward the per-tenant baseline. The Curator paper does not provide explicit guidance on when the break-even occurs, so operators must benchmark against their own tenant distribution before assuming general memory reduction.
Methodology and what the paper actually measured
The Curator paper's experimental scope is multi-tenant vector search — specifically, how index design choices affect memory usage and search performance when multiple tenants share infrastructure. The study compares Curator against the two canonical baselines: a single shared index and a collection of per-tenant indexes.
The paper's methodology centers on the trade-off between these baselines rather than on positioning Curator against standalone ANN indexes such as HNSW or IVF in single-tenant workloads. This is a deliberate scope boundary: the contribution is in the multi-tenant regime, and the measured outputs are memory usage and search performance under tenant-separated workloads.
Exact experimental details (dataset names, vector dimensionality, tenant count, hardware configuration, and complete result tables) are available only in the full PDF on arXiv. The abstract establishes the design comparison and the directional findings; it does not expose numeric benchmark rows.
| Design point | What the paper measures | Direction of finding |
|---|---|---|
| Shared index | Memory usage vs. per-tenant | Lower memory; higher latency cost |
| Per-tenant index | Query latency vs. shared | Lower latency; higher memory cost |
| Curator | Memory and latency vs. both baselines | Targets improvement on both dimensions |
Note: Exact QPS, recall, p99 latency values, and memory-reduction percentages cannot be recovered from the abstract alone. Treat the table above as a directional summary, not as measured results.
Metrics that matter: memory overhead, p99 latency, and tenant filtering
For platform engineers, the Curator paper surfaces three operationally concrete metrics: memory overhead relative to tenant count, tail latency under concurrent multi-tenant queries, and the cost of tenant filtering during search traversal.
| Metric | Shared index | Per-tenant index | Curator | Units |
|---|---|---|---|---|
| Memory overhead vs. one tenant | ~1.0x index memory | ~N x index memory for N tenants | Between shared and per-tenant; distribution-dependent | x index memory |
| Tail latency (p99) under co-resident load | Elevated by off-tenant traversal and cache contention | Lowest isolation-induced p99 | Lower than shared index; depends on overlap | ms |
| Tenant filtering cost | High — post-traversal filtering on shared structure | None for in-tenant traversal | Reduced by tenant-aware structure | candidate checks per query |
Memory overhead is the most tractable to plan for. A shared index holds memory roughly constant as tenants grow; a per-tenant index scales linearly. Curator's shared-subtree design should sit between these curves, with the slope determined by tenant data distribution — a point the paper motivates even if it does not supply a universal formula.
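The three curves can be written as a normalized planning model. The linear interpolation used for the partially shared case is an assumption made for illustration, not a formula from the paper, and `overlap` is a hypothetical parameter for the share of each later tenant's structure that already exists.

```python
def shared_memory(n_tenants, index_units=1.0):
    """One index regardless of tenant count."""
    return index_units if n_tenants > 0 else 0.0

def per_tenant_memory(n_tenants, index_units=1.0):
    """One full index per tenant."""
    return n_tenants * index_units

def partially_shared_memory(n_tenants, overlap, index_units=1.0):
    """First tenant pays full cost; each later tenant pays only its non-overlapping share."""
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be between 0 and 1")
    if n_tenants == 0:
        return 0.0
    return index_units * (1 + (n_tenants - 1) * (1 - overlap))

n = 100
print(shared_memory(n))                 # 1.0
print(per_tenant_memory(n))             # 100.0
print(partially_shared_memory(n, 0.5))  # 50.5
```

With `overlap=1.0` the model collapses to the shared-index line and with `overlap=0.0` to the per-tenant line, so the only real planning input is an overlap estimate measured on your own tenant data.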
Tail latency — specifically p99 — matters because multi-tenant workloads are bursty and uneven. One large tenant issuing high-QPS queries on a shared index can push p99 latency for small tenants upward, not because their queries are slow in isolation but because they compete for cache and traversal capacity. Per-tenant indexes decouple this; Curator's partial sharing introduces partial coupling.
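A short calculation shows why p99 is the metric that exposes this coupling. The latency samples below are synthetic and chosen only for illustration; `percentile` is a hypothetical nearest-rank helper.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    ordered = sorted(samples)
    rank = -(-pct * len(ordered) // 100)  # ceil(pct / 100 * n) using integer arithmetic
    return ordered[rank - 1]

# Synthetic latencies in ms: 98% of queries run uncontended, 2% hit contention.
latencies_ms = [5] * 980 + [50] * 20

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
mean = sum(latencies_ms) / len(latencies_ms)

print(p50, p99, mean)  # 5 50 5.9
```

The median and mean barely register the contended queries, while p99 reports the full contended latency, which is why a small tenant's SLA can be broken by a neighbor without its typical query getting any slower.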
Tenant filtering is the third dimension. In Pinecone multi-tenancy, "namespaces let you partition records within an index", making namespace routing the primary isolation mechanism. In Qdrant's shard model, shard-selection determines whether a query touches shared or dedicated storage. In both cases, the filtering mechanism interacts with the index structure to determine traversal scope. Curator's design internalizes tenant awareness into the index itself, which reduces the reliance on post-traversal filtering — a latency advantage that is most pronounced when tenant data distributions are tight.
What the results prove, and what they do not
Curator's measured contribution is specific: its indexing approach improves the memory-versus-latency trade-off relative to shared-index and per-tenant-index baselines in multi-tenant vector search workloads. The paper's own framing is comparative with those two design points, as the Semantic Scholar figure caption confirms: "Current multi-tenancy strategies incur inherent inefficiency in either query latency or memory usage, while Curator aims to optimize both simultaneously."
Watch Out: Curator is a multi-tenancy paper, not a universal ANN benchmark. It does not establish superiority across unrelated workloads, and it should not be used as proof that Curator beats HNSW, IVF, or ScaNN on single-tenant recall or throughput tests. Evaluate it within the multi-tenant regime the paper actually measured.
What this does not prove:
- Curator does not establish dominance over HNSW, IVF, or ScaNN on single-tenant ANN-Benchmarks workloads. That comparison was not the study's objective.
- The results do not generalize automatically to all vector dimensionalities, all tenant size distributions, or all hardware configurations.
- No direct head-to-head with Qdrant's tiered shard model or Pinecone's namespace model under identical workloads appears in the sources cited here.
Watch Out: Citing Curator as evidence of general ANN superiority misreads the paper's scope. The findings are valid within their experimental regime — multi-tenant isolation with two specific baseline comparisons. An operator who applies Curator's conclusions to a single-tenant, high-recall ANN search workload is extrapolating beyond the study's evidence. Evaluate Curator on multi-tenancy benchmarks; use ANN-Benchmarks for single-tenant recall and throughput comparisons.
What platform teams should infer for Qdrant and Pinecone
The Curator paper's findings do not directly benchmark Qdrant or Pinecone — but the isolation models those systems expose map cleanly onto the paper's design taxonomy, and the trade-off logic transfers.
Qdrant's tiered multitenancy in v1.16.0 is structurally closest to a hybrid of the paper's two baselines: "With tiered multitenancy, you can implement two levels of tenant isolation within a single collection" — shared shards for small tenants, dedicated shards for large tenants. This means Qdrant's memory profile is sublinear in tenant count when most tenants are small, but its per-large-tenant memory cost is equivalent to a per-tenant index. The Curator paper's finding that per-tenant designs trade memory for latency applies directly to Qdrant's dedicated-shard path.
Pinecone multi-tenancy uses namespaces as the isolation primitive. Standard and Enterprise plans support up to 100,000 namespaces per index, with stated capability for million-scale counts for specific use cases. Treated as an instance of the paper's shared-index baseline, the memory profile stays flat as namespace count grows because the underlying index is shared. The latency exposure then follows the paper's reasoning: cross-namespace traversal and filter cost scale with query load, and noisy-neighbor effects become a risk under high-concurrency workloads.
| System | Isolation model | Memory behavior | Latency exposure | Closest paper analog |
|---|---|---|---|---|
| Qdrant (tiered, v1.16.0) | Hybrid: shared shards (small) + dedicated shards (large) | Sublinear until large-tenant threshold | Low for large tenants; shared-index risk for small tenants | Hybrid between paper's two baselines |
| Pinecone (namespaces) | Logical (namespace routing within shared index) | Flat as namespace count grows | Filter/routing overhead at query time; noisy-neighbor risk | Shared-index baseline |
| Curator (paper) | Structural (tenant-specific subtrees + shared regions) | Moderate; proportional to tenant data novelty | Reduced traversal scope vs. shared index | Novel design between baselines |
The mapping here is conceptual, not a validated peer comparison. No public benchmark directly measures Qdrant tiered shards against Pinecone namespaces under identical multi-tenant load.
When shared infrastructure is worth the latency risk
Shared infrastructure is operationally acceptable when three conditions hold: tenant SLA requirements are uniform, tenant data distributions overlap enough that shared traversal paths are efficient, and operator control over noisy-neighbor mitigation exists at the routing layer.
Pinecone multi-tenancy fits this profile for SaaS products with many small customers sharing a common domain. The flat memory profile and high namespace ceiling (100,000 per index, scaling further on request) make it operationally simple. The latency risk is real but bounded when namespace routing is fast and per-namespace query rates stay low.
Qdrant's tiered model fits asymmetric tenant populations: "keeping small tenants together inside a shared Shard, while isolating large tenants into their own dedicated Shards" provides a natural break-even — small tenants benefit from memory sharing, large tenants get isolation without requiring a separate cluster.
| Condition | Shared infrastructure (Pinecone namespaces / Qdrant shared shards) | Per-tenant isolation (Qdrant dedicated shards / separate collections) |
|---|---|---|
| Tenant count | High (hundreds to millions) | Low to moderate (tens to hundreds) |
| SLA sensitivity | Relaxed or uniform across tenants | Strict or differentiated per tenant |
| Memory budget | Constrained | Flexible; cost is secondary |
| Tenant data distribution | High overlap | Domain-disjoint or low overlap |
| Noisy-neighbor tolerance | Acceptable | Not acceptable |
Choose shared infrastructure when tenant count is high, SLAs are uniform, memory is constrained, tenant embeddings overlap substantially, and some noisy-neighbor exposure is acceptable.
Choose per-tenant isolation when tenant count is lower, SLAs are strict or tiered, memory budget can absorb duplication, tenant data is domain-disjoint, or noisy-neighbor risk cannot be tolerated.
No universal numeric threshold for when to switch between models has been established in public sources. Treat the matrix above as a qualitative decision guide, not a formula.
Deployment boundaries: filtering, noisy neighbors, and tail latency
Tenant filtering becomes the dominant query bottleneck when the tenant filter is highly selective, that is, when the tenant owns only a small fraction of the index and many off-tenant vectors must be examined before the tenant-specific result set is assembled. In a shared-index model, a query for a tenant that owns 0.1% of the index vectors may traverse a significant portion of the graph before filtering eliminates off-tenant candidates. This is the mechanism behind the Curator paper's observation that shared indexes sacrifice search performance: "The former optimizes for memory efficiency at the expense of search performance."
Noisy-neighbor effects compound filter latency. A large tenant running high-QPS queries saturates shared HNSW graph traversal capacity, polluting CPU cache and memory bandwidth for co-resident small tenants. Qdrant's dedicated-shard model for large tenants directly addresses this by removing large tenants from the shared traversal path. Pinecone's namespace model relies on service-level capacity management to bound this effect.
Pro Tip: In shared-index deployments, tenant filtering overhead scales inversely with tenant data fraction, and namespace routing or shard selection only helps when it shortens the candidate set before traversal finishes. A tenant owning 1% of vectors in a shared index faces 100× the filtering work per relevant result compared to a tenant owning 100% of a private index. When p99 latency for small tenants degrades under load, the first place to investigate is filter selectivity, not index build parameters. Moving high-value small tenants to dedicated isolation boundaries often recovers more tail latency than any HNSW ef or M tuning.
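The inverse scaling in the tip above is simple arithmetic, sketched here under the assumption that a tenant's vectors are uniformly interleaved with everyone else's in traversal order. Real graph traversal is not uniform, so treat the numbers as a first-order estimate; both function names are hypothetical.

```python
def candidates_per_hit(owned, total):
    """Expected candidates examined per in-tenant result under uniform interleaving."""
    if not 0 < owned <= total:
        raise ValueError("owned must be in 1..total")
    return total / owned

def candidates_for_top_k(k, owned, total):
    """Candidates to examine for k in-tenant results, rounded up."""
    return -(-k * total // owned)  # integer ceiling of k * total / owned

total = 1_000_000
for owned in (1_000_000, 10_000, 1_000):  # 100%, 1%, and 0.1% of the index
    print(owned, candidates_per_hit(owned, total), candidates_for_top_k(10, owned, total))
```

A tenant holding 1% of the vectors needs roughly 100 candidates per relevant result, and one holding 0.1% needs roughly 1,000, which matches the 100× figure in the tip and the 0.1% example earlier in this section.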
Limitations and caveats in the Curator paper
The Curator paper addresses one specific operational problem — the memory-versus-latency trade-off in multi-tenant vector search — and evaluates two specific baselines. That scope is the source of both its contribution and its limits.
The paper does not establish generalization across vector dimensionality ranges, arbitrary tenant size distributions, or all workload access patterns. The experimental details needed to reproduce the results (dataset provenance, hardware specifications, exact tenant configurations, and complete result tables) are not accessible from the abstract alone. Operators evaluating Curator must read the full PDF and assess whether the paper's experimental workload resembles their own production conditions before applying its conclusions.
Watch Out: The paper's benchmark scope is narrower than the problem it names. Multi-tenancy in vector databases spans many workload patterns — high-write, high-read, mixed-tenant query concurrency, variable recall requirements — and the paper evaluates a subset. Before adopting Curator's design conclusions, operators should verify that the paper's tenant count, vector distribution, and query pattern match their deployment context. A finding that holds at N tenants with K-dimensional vectors may not hold at 10N tenants with domain-disjoint distributions.
Why memory savings do not automatically mean lower total cost
Memory reduction in the index layer is one cost driver, not the whole cost model. In a self-hosted Qdrant deployment, the operational cost includes cluster node count, shard replication factor, query CPU allocation, and write amplification during tenant onboarding — none of which scale purely with index memory. Curator's memory savings apply to the in-memory index structure; they do not reduce storage costs for the raw vector payload, metadata, or WAL.
In Pinecone's managed model, cost depends on index size (pod tier and count), namespace count, and query volume — not just memory footprint. Reducing in-index memory overhead does not directly translate to a lower Pinecone bill unless that reduction crosses a pod-tier threshold.
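The threshold effect can be shown with a step function. The unit capacity and index footprint below are hypothetical values chosen for illustration; they are not Pinecone pod sizes or prices.

```python
import math

def units_needed(index_gb, unit_capacity_gb):
    """Whole capacity units required to hold an index of the given size."""
    return math.ceil(index_gb / unit_capacity_gb)

UNIT_CAPACITY_GB = 32  # hypothetical capacity per billed unit, not a vendor figure
BASELINE_GB = 120      # hypothetical index footprint

for saving in (0.0, 0.15, 0.25):
    size_gb = BASELINE_GB * (1 - saving)
    units = units_needed(size_gb, UNIT_CAPACITY_GB)
    print(f"saving={saving:.0%} size={size_gb:.0f} GB units={units}")
```

In this example a 15% memory reduction still needs four units, and only the 25% reduction drops to three, so a saving that stays inside one step leaves the bill unchanged.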
Brief Context: Memory savings from shared-subtree designs reduce the per-tenant marginal cost of index growth — a meaningful planning input for platform teams projecting tenant count. But the full TCO calculation must include query CPU, storage, replication, and operational overhead. Qdrant's tiered shard model illustrates this: "keeping small tenants together inside a shared Shard, while isolating large tenants into their own dedicated Shards" saves memory for small tenants but adds shard management complexity as the large-tenant count grows. Curator's memory improvements are a necessary but not sufficient input to cost planning.
FAQ for engineers evaluating vector-store multi-tenancy
What is the trade-off between shared and per-tenant indexes?
The Curator paper states it precisely: "The former optimizes for memory efficiency at the expense of search performance, while the latter does the opposite." A shared index holds memory costs flat as tenant count grows but forces query traversal across off-tenant vectors. A per-tenant index delivers clean isolation and predictable latency but multiplies memory with tenant count.
What is Curator in vector databases?
Curator is an indexing method for multi-tenant vector databases that proposes tenant-specific clustering trees with shared subtree regions to reduce the memory cost of per-tenant isolation without accepting the full latency penalty of shared-index traversal. The paper (arXiv:2401.07119) evaluates Curator against shared-index and per-tenant-index baselines. It is a research proposal, not a production component shipping inside Qdrant or Pinecone.
Is Qdrant good for multi-tenancy?
Qdrant's tiered multitenancy (v1.16.0) is operationally practical for asymmetric tenant populations. Shared shards for small tenants keep memory costs low; dedicated shards for large tenants prevent noisy-neighbor latency contamination. The model aligns closely with the hybrid position Curator's paper motivates, without requiring Curator's specific indexing implementation.
How does Pinecone support multi-tenancy?
Pinecone multi-tenancy uses namespaces as the isolation primitive. Standard and Enterprise plans support up to 100,000 namespaces per index, scaling to million-scale counts for specific use cases. "Namespaces are created automatically as you upsert records", making onboarding operationally simple. The latency exposure is the shared-index baseline's filter overhead, which Pinecone manages at the service layer rather than at the index structure layer.
Does Curator replace HNSW or IVF for standard ANN search?
No. The Curator paper is a multi-tenancy study, not an ANN-Benchmarks submission. Its comparisons are against shared-index and per-tenant-index baselines under multi-tenant workloads. Claims about HNSW, IVF, or ScaNN recall-vs-latency curves in single-tenant regimes are outside the paper's evidence boundary.
When should an operator prefer per-tenant indexes over a shared index?
When any of the following apply: per-tenant SLAs are strict and differentiated, tenant data is domain-disjoint (low distributional overlap), noisy-neighbor effects are unacceptable, or the tenant population is small enough that the memory multiplier is manageable. Qdrant's dedicated-shard path and separate Qdrant collections both satisfy this requirement; Pinecone's namespace model does not provide equivalent physical isolation.
Sources & References
- Curator: Efficient Indexing for Multi-Tenant Vector Databases (arXiv:2401.07119) — Primary paper; defines the shared-index vs. per-tenant-index trade-off and proposes Curator as a resolution.
- Curator on Semantic Scholar (Jin & Wu) — Figure caption confirming Curator's dual optimization target.
- Qdrant multitenancy documentation — Official docs for tiered multitenancy, shared shards, and dedicated shards in v1.16.0.
- Pinecone multitenancy guide — Official docs for namespace-based isolation, namespace limits, and multi-tenant index design.
- Pinecone upsert and namespace documentation — Source for namespace isolation framing and automatic namespace creation behavior.
- Pinecone manage namespaces documentation — Confirms namespace lifecycle and creation semantics.
Keywords: Curator, Pinecone, Qdrant, HNSW, IVF, ScaNN, ANN-Benchmarks, tenant isolation, tail latency, memory overhead, tenant filtering, multi-tenant vector search, shared index, per-tenant index, vector database



