Curator and the multi-tenancy problem in vector databases


Curator tackles multi-tenancy by balancing isolation against memory so that tenants can share vector infrastructure without inflating tail latency. The paper's value lies in its measured latency-versus-memory trade-off, not in any claim of universal best-in-class ANN performance.

By AxiomLogica Editorial

May 6, 2026 · 19 min read

Reviewed by Editorial

Multi-tenant vector search forces a choice that no general ANN benchmark captures: how much memory overhead do you accept in exchange for per-tenant query isolation? The Curator paper (arXiv:2401.07119, Jin, Wu, et al.) frames this directly: every conventional multi-tenancy design either wastes memory or degrades search performance, and Curator proposes an indexing approach aimed at escaping that binary. The paper's value is in articulating the trade-off precisely, not in claiming leaderboard-level ANN superiority.

Why multi-tenancy is hard in vector databases

The core tension is architectural. As the Curator abstract states: "Multi-tenancy in vector databases is currently achieved by building either a single, shared index among all tenants, or a per-tenant index." Each approach optimizes for a different resource: "The former optimizes for memory efficiency at the expense of search performance, while the latter does the opposite."

This is not a tuning problem — it is a structural constraint. A shared index allows tenants' vectors to occupy adjacent graph nodes or cluster centroids. When a query arrives for tenant A, the index must either scan regions owned by tenant B (wasting compute and polluting cache lines) or apply a post-hoc filter that discards irrelevant results after traversal (hurting recall or requiring over-fetch). Per-tenant indexes solve both problems but multiply memory footprint with tenant count, making them uneconomical when tenant populations are large or when tenant vector sets are small individually.
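The over-fetch problem described above can be illustrated with a small brute-force sketch. This is a toy stand-in, not any real index: exact nearest-neighbour search over random 2-D points replaces graph or cluster traversal, and the function names, tenant labels, and 5% ownership ratio are assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def per_tenant_search(items, tenant, query, k):
    """Per-tenant index stand-in: search only the tenant's own vectors."""
    own = [it for it in items if it[0] == tenant]
    return sorted(own, key=lambda it: sq_dist(it[1], query))[:k]

def shared_postfilter_search(items, tenant, query, k, fetch):
    """Shared index stand-in: take the global top `fetch`, then drop off-tenant hits."""
    top = sorted(items, key=lambda it: sq_dist(it[1], query))[:fetch]
    return [it for it in top if it[0] == tenant][:k]

def min_fetch_for_full_results(items, tenant, query, k):
    """Smallest global fetch size at which post-filtering still yields k tenant hits."""
    for fetch in range(k, len(items) + 1):
        if len(shared_postfilter_search(items, tenant, query, k, fetch)) == k:
            return fetch
    return None  # the tenant owns fewer than k vectors

rng = random.Random(0)
# 1,000 vectors; tenant "a" owns 50 of them (5%), tenant "b" owns the rest.
items = [("a" if i < 50 else "b", (rng.random(), rng.random())) for i in range(1000)]
query, k = (0.5, 0.5), 10

exact = per_tenant_search(items, "a", query, k)
naive = shared_postfilter_search(items, "a", query, k, fetch=k)
needed = min_fetch_for_full_results(items, "a", query, k)
print(f"per-tenant hits: {len(exact)}, shared hits at fetch=k: {len(naive)}, fetch needed: {needed}")
```

Fetching only k candidates from the shared structure can return fewer than k tenant results, and the fetch size needed to match the per-tenant answer grows as the tenant's share of the data shrinks. Real ANN indexes add approximation on top of this effect, which the sketch does not model.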

The Curator paper's operator-level contribution is to treat this as a first-class engineering problem rather than a configuration choice, and to provide measured evidence for where the trade-off sits.

Bottom Line: The shared-versus-per-tenant index choice is not about preference — it governs whether you pay in memory or in tail latency. Curator proposes an indexing structure that targets both dimensions simultaneously, and its value is in the evidence for that claim, not in any claim of universal ANN superiority.

How Curator models tenant isolation and shared infrastructure

Curator is an indexing method for multi-tenant vector databases designed to avoid the forced choice between memory efficiency and query latency. The Semantic Scholar figure caption for the paper states: "Current multi-tenancy strategies incur inherent inefficiency in either query latency or memory usage, while Curator aims to optimize both simultaneously."

The mechanism keeps tenant-specific clustering trees while sharing common subtrees where tenant embeddings overlap. That architecture avoids the all-or-nothing choice: shared-index designs pollute traversal paths with off-tenant vectors, while per-tenant designs duplicate index structure for every new tenant. Curator encodes tenant-specific structure compactly and shares only what can safely be shared.

The table below captures the qualitative isolation and latency behavior of the three designs:

Design	Isolation model	Memory behavior	Latency implication
Shared index	Logical (filter-based)	Low — one index for all tenants	Search traversal visits off-tenant nodes; filtering overhead at query time
Per-tenant index	Physical (separate index per tenant)	High — scales with tenant count	Clean traversal; no cross-tenant noise
Curator	Structural (tenant-specific subtrees + shared regions)	Moderate — shared subtrees reduce duplication	Traversal scope bounded per tenant; less filter overhead than shared index

This table is a qualitative summary based on the cited paper's framing. Exact numeric overhead ratios require consulting the full PDF.

Shared index vs per-tenant index: the baseline designs

The shared-index model is the default in many managed vector stores. Pinecone implements multi-tenancy through namespaces: "Namespaces let you partition records within an index and are essential for implementing multitenancy when you need to isolate the data of each customer/user." On Standard and Enterprise plans, Pinecone supports up to 100,000 namespaces per index, with stated support for million-scale namespace counts in specific use cases.

Qdrant takes a tiered approach as of v1.16.0: "With tiered multitenancy, you can implement two levels of tenant isolation within a single collection, keeping small tenants together inside a shared Shard, while isolating large tenants into their own dedicated Shards." This makes Qdrant's model a hybrid — logically shared for small tenants, physically isolated for large ones.

Both approaches reflect the same underlying trade-off the Curator paper formalizes. Namespace-based designs (Pinecone multi-tenancy) push the isolation boundary to query routing. Shard-based designs (Qdrant) push it to storage allocation. Neither eliminates the memory-versus-latency tension; they shift where it is managed.
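As a rough illustration of the tiered idea, the sketch below assigns tenants to a shared or dedicated tier by vector count. It is not Qdrant's API: the function names, the tenant names, and the 500,000-vector threshold are hypothetical, and a real deployment would pick the threshold from its own measurements.

```python
def assign_tiers(tenant_sizes, dedicated_threshold):
    """Map each tenant to 'dedicated' or 'shared' based on its vector count."""
    return {
        tenant: "dedicated" if size >= dedicated_threshold else "shared"
        for tenant, size in tenant_sizes.items()
    }

def tier_summary(tenant_sizes, dedicated_threshold):
    """Summarize how many tenants and vectors land in each tier."""
    tiers = assign_tiers(tenant_sizes, dedicated_threshold)
    shared = [t for t, tier in tiers.items() if tier == "shared"]
    return {
        "dedicated_tenants": sorted(t for t, tier in tiers.items() if tier == "dedicated"),
        "shared_tenant_count": len(shared),
        "shared_vectors": sum(tenant_sizes[t] for t in shared),
    }

# Hypothetical tenant population with a few large tenants and several small ones.
sizes = {"acme": 2_000_000, "beta": 15_000, "gamma": 900, "delta": 750_000}
summary = tier_summary(sizes, dedicated_threshold=500_000)
print(summary)
```

The point of the sketch is only that the isolation boundary becomes a function of tenant size, which is the asymmetry a tiered model makes explicit.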

Pro Tip: The shared-vs-per-tenant framing is necessary but not sufficient for production planning. What the baseline analysis omits is tenant size distribution — a system with a few large tenants and many small ones behaves differently from one with uniform tenant sizes. Qdrant's tiered model encodes this asymmetry explicitly; Pinecone multi-tenancy leaves it to the operator to manage through index design. Neither the paper nor vendor documentation provides a formula for the break-even point.

Tenant-specific clustering trees and compact shared subtrees

Curator's indexing approach encodes tenant identity into the structure of the index rather than relying on post-query filtering. Where conventional ANN indexes (HNSW, IVF, ScaNN) build a global graph or cluster assignment without tenant awareness, Curator maintains per-tenant clustering trees with shared subtree regions when tenants' data distributions overlap.

This design has a concrete operator implication: the memory cost of adding a new tenant is not the cost of a full index replica, nor is it zero. It is proportional to how much of the new tenant's data occupies regions not already encoded in the shared subtree structure. Tenants with highly similar embeddings (e.g., many SaaS tenants querying a shared domain corpus) benefit most; tenants with entirely distinct data distributions approach the memory cost of a per-tenant index.
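The marginal-cost argument can be made concrete with a toy model. The sketch below is not Curator's data structure: it assumes a fixed binary tree over the interval [0, 1), treats each tenant's "tree" as the set of nodes its values touch, and counts how many of those nodes are new to the shared structure. The class and function names are hypothetical.

```python
def touched_nodes(values, depth):
    """Return the (level, index) tree nodes visited by values in [0, 1)."""
    nodes = set()
    for v in values:
        for level in range(depth + 1):
            # At each level the interval is split into 2**level equal cells.
            nodes.add((level, int(v * (1 << level))))
    return nodes

class SharedTree:
    """Toy shared structure: stores the union of all tenants' touched nodes."""

    def __init__(self, depth):
        self.depth = depth
        self.nodes = set()

    def add_tenant(self, values):
        """Add a tenant; return (newly stored nodes, nodes a private tree would need)."""
        subtree = touched_nodes(values, self.depth)
        new_nodes = subtree - self.nodes
        self.nodes |= subtree
        return len(new_nodes), len(subtree)

tree = SharedTree(depth=3)
first = tree.add_tenant([0.1, 0.2])          # nothing to share with yet
overlapping = tree.add_tenant([0.15, 0.22])  # falls in regions already stored
disjoint = tree.add_tenant([0.9])            # shares only the root
print(first, overlapping, disjoint)
```

In this toy model the overlapping tenant adds no new nodes, while the disjoint tenant pays for almost its whole subtree, which mirrors the qualitative claim that savings depend on distributional overlap.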

Pro Tip: If your tenant population has high semantic overlap — for instance, all tenants query the same product catalog with tenant-specific access control — Curator's shared-subtree model captures the most memory savings. If tenants are domain-disjoint (e.g., a multi-industry platform where each tenant's vectors have no distributional overlap), the shared subtree shrinks and memory savings erode toward the per-tenant baseline. The Curator paper does not provide explicit guidance on when the break-even occurs, so operators must benchmark against their own tenant distribution before assuming general memory reduction.

Methodology and what the paper actually measured

The Curator paper's experimental scope is multi-tenant vector search — specifically, how index design choices affect memory usage and search performance when multiple tenants share infrastructure. The study compares Curator against the two canonical baselines: a single shared index and a collection of per-tenant indexes.

The paper's methodology centers on the trade-off between these baselines rather than on positioning Curator against standalone ANN indexes such as HNSW or IVF in single-tenant workloads. This is a deliberate scope boundary: the contribution is in the multi-tenant regime, and the measured outputs are memory usage and search performance under tenant-separated workloads.

Exact experimental details (dataset names, vector dimensionality, tenant count, hardware configuration, and complete result tables) require reading the full PDF on arXiv. The abstract establishes the design comparison and the directional findings; it does not expose numeric benchmark rows.

Design point	What the paper measures	Direction of finding
Shared index	Memory usage vs. per-tenant	Lower memory; higher latency cost
Per-tenant index	Query latency vs. shared	Lower latency; higher memory cost
Curator	Memory and latency vs. both baselines	Targets improvement on both dimensions

Note: Exact QPS, recall, p99 latency values, and memory-reduction percentages are not reproducible from the abstract alone. Treat the table above as a directional summary pending full-PDF extraction.

Metrics that matter: memory overhead, p99 latency, and tenant filtering

For platform engineers, the Curator paper surfaces three operationally concrete metrics: memory overhead relative to tenant count, tail latency under concurrent multi-tenant queries, and the cost of tenant filtering during search traversal.

Metric (unit)	Shared index	Per-tenant index	Curator
Memory overhead relative to a single index (× index memory)	~1×	~N× for N tenants	Between shared and per-tenant; distribution-dependent
Tail latency, p99, under co-resident load (ms)	Elevated by off-tenant traversal and cache contention	Lowest; no cross-tenant interference inside the index	Lower than shared index; depends on overlap
Tenant filtering cost (candidate checks per query)	High: post-traversal filtering on shared structure	None for in-tenant traversal	Reduced by tenant-aware structure

Memory overhead is the most tractable to plan for. A shared index holds memory roughly constant as tenants grow; a per-tenant index scales linearly. Curator's shared-subtree design should sit between these curves, with the slope determined by tenant data distribution — a point the paper motivates even if it does not supply a universal formula.
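A minimal sketch of the three memory curves, in the same "× index memory" units used above. The linear interpolation and the `novelty` parameter (the fraction of each additional tenant's structure that cannot be shared) are assumptions made for illustration; the paper does not supply this formula.

```python
def shared_memory(n_tenants):
    """Shared index: roughly constant index memory regardless of tenant count."""
    return 1.0

def per_tenant_memory(n_tenants):
    """Per-tenant indexes: one index's worth of memory per tenant."""
    return float(n_tenants)

def partially_shared_memory(n_tenants, novelty):
    """Illustrative in-between curve; novelty=0 is fully shared, novelty=1 is per-tenant."""
    if not 0.0 <= novelty <= 1.0:
        raise ValueError("novelty must be between 0 and 1")
    return 1.0 + (n_tenants - 1) * novelty

for n in (1, 10, 100):
    print(n, shared_memory(n), partially_shared_memory(n, 0.1), per_tenant_memory(n))
```

Under this assumed model the slope of the middle curve is set entirely by `novelty`, which is why benchmarking against a real tenant distribution matters more than any single headline ratio.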

Tail latency — specifically p99 — matters because multi-tenant workloads are bursty and uneven. One large tenant issuing high-QPS queries on a shared index can push p99 latency for small tenants upward, not because their queries are slow in isolation but because they compete for cache and traversal capacity. Per-tenant indexes decouple this; Curator's partial sharing introduces partial coupling.
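One practical consequence is that p99 should be tracked per tenant, not only in aggregate. The sketch below uses a nearest-rank percentile over hypothetical latency samples; the tenant names and millisecond values are made up to show how an aggregate p99 can hide a small tenant's tail.

```python
import math
from collections import defaultdict

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def p99_by_tenant(records):
    """Group (tenant, latency_ms) records and compute p99 for each tenant."""
    by_tenant = defaultdict(list)
    for tenant, latency_ms in records:
        by_tenant[tenant].append(latency_ms)
    return {tenant: percentile(values, 99) for tenant, values in by_tenant.items()}

# Hypothetical samples: a large tenant with fast queries, a small tenant that is slow.
records = [("big", 2.0)] * 990 + [("small", 40.0)] * 10
overall_p99 = percentile([latency for _, latency in records], 99)
per_tenant_p99 = p99_by_tenant(records)
print(overall_p99, per_tenant_p99)
```

Here the aggregate p99 looks healthy because the small tenant contributes only 1% of the samples, while its own p99 is far worse.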

Tenant filtering is the third dimension. In Pinecone multi-tenancy, "namespaces let you partition records within an index", making namespace routing the primary isolation mechanism. In Qdrant's shard model, shard selection determines whether a query touches shared or dedicated storage. In both cases, the filtering mechanism interacts with the index structure to determine traversal scope. Curator's design internalizes tenant awareness into the index itself, which reduces the reliance on post-traversal filtering. That latency advantage is likely most pronounced when each tenant's vectors are tightly clustered.

What the results prove, and what they do not

Curator's measured contribution is specific: its indexing approach improves the memory-versus-latency trade-off relative to shared-index and per-tenant-index baselines in multi-tenant vector search workloads. The paper's own framing is comparative with those two design points, as the Semantic Scholar figure caption confirms: "Current multi-tenancy strategies incur inherent inefficiency in either query latency or memory usage, while Curator aims to optimize both simultaneously."

Watch Out: Curator is a multi-tenancy paper, not a universal ANN benchmark. It does not establish superiority across unrelated workloads, and it should not be used as proof that Curator beats HNSW, IVF, or ScaNN on single-tenant recall or throughput tests. Evaluate it within the multi-tenant regime the paper actually measured.

What this does not prove:

Curator does not establish dominance over HNSW, IVF, or ScaNN on single-tenant ANN-Benchmarks workloads. That comparison was not the study's objective.
The results do not generalize automatically to all vector dimensionalities, all tenant size distributions, or all hardware configurations.
No direct head-to-head with Qdrant's tiered shard model or Pinecone's namespace model under identical workloads was found in public sources.

Watch Out: Citing Curator as evidence of general ANN superiority misreads the paper's scope. The findings are valid within their experimental regime — multi-tenant isolation with two specific baseline comparisons. An operator who applies Curator's conclusions to a single-tenant, high-recall ANN search workload is extrapolating beyond the study's evidence. Evaluate Curator on multi-tenancy benchmarks; use ANN-Benchmarks for single-tenant recall and throughput comparisons.

What platform teams should infer for Qdrant and Pinecone

The Curator paper's findings do not directly benchmark Qdrant or Pinecone — but the isolation models those systems expose map cleanly onto the paper's design taxonomy, and the trade-off logic transfers.

Qdrant's tiered multitenancy in v1.16.0 is structurally closest to a hybrid of the paper's two baselines: "With tiered multitenancy, you can implement two levels of tenant isolation within a single collection" — shared shards for small tenants, dedicated shards for large tenants. This means Qdrant's memory profile is sublinear in tenant count when most tenants are small, but its per-large-tenant memory cost is equivalent to a per-tenant index. The Curator paper's finding that per-tenant designs trade memory for latency applies directly to Qdrant's dedicated-shard path.

Pinecone multi-tenancy uses namespaces as the isolation primitive. Standard and Enterprise plans support up to 100,000 namespaces per index, with stated capability for million-scale counts for specific use cases. To the extent that namespaces partition one shared index, the memory profile stays flat as namespace count grows. The latency exposure then follows the paper's shared-index prediction: filter and routing cost scale with query load, and noisy-neighbor effects become possible under high-concurrency workloads.

System	Isolation model	Memory behavior	Latency exposure	Closest paper analog
Qdrant (tiered, v1.16.0)	Hybrid: shared shards (small) + dedicated shards (large)	Sublinear until large-tenant threshold	Low for large tenants; shared-index risk for small tenants	Hybrid between paper's two baselines
Pinecone (namespaces)	Logical (namespace routing within shared index)	Flat as namespace count grows	Filter/routing overhead at query time; noisy-neighbor risk	Shared-index baseline
Curator (paper)	Structural (tenant-specific subtrees + shared regions)	Moderate; proportional to tenant data novelty	Reduced traversal scope vs. shared index	Novel design between baselines

The mapping here is conceptual, not a validated peer comparison. No public benchmark directly measures Qdrant tiered shards against Pinecone namespaces under identical multi-tenant load.

When shared infrastructure is worth the latency risk

Shared infrastructure is operationally acceptable when three conditions hold: tenant SLA requirements are uniform, tenant data distributions overlap enough that shared traversal paths are efficient, and operator control over noisy-neighbor mitigation exists at the routing layer.

Pinecone multi-tenancy fits this profile for SaaS products with many small customers sharing a common domain. The flat memory profile and high namespace ceiling (100,000 per index, scaling further on request) make it operationally simple. The latency risk is real but bounded when namespace routing is fast and per-namespace query rates stay low.

Qdrant's tiered model fits asymmetric tenant populations: "keeping small tenants together inside a shared Shard, while isolating large tenants into their own dedicated Shards" provides a natural split. Small tenants benefit from memory sharing, and large tenants get isolation without requiring a separate cluster.

Condition	Shared infrastructure (Pinecone namespaces / Qdrant shared shards)	Per-tenant isolation (Qdrant dedicated shards / separate collections)
Tenant count	High (hundreds to millions)	Low to moderate (tens to hundreds)
SLA sensitivity	Relaxed or uniform across tenants	Strict or differentiated per tenant
Memory budget	Constrained	Flexible; cost is secondary
Tenant data distribution	High overlap	Domain-disjoint or low overlap
Noisy-neighbor tolerance	Acceptable	Not acceptable

Choose shared infrastructure when tenant count is high, SLAs are uniform, memory is constrained, tenant embeddings overlap substantially, and some noisy-neighbor exposure is acceptable.

Choose per-tenant isolation when tenant count is lower, SLAs are strict or tiered, memory budget can absorb duplication, tenant data is domain-disjoint, or noisy-neighbor risk cannot be tolerated.

No universal numeric threshold for when to switch between models has been established in public sources. Treat the matrix above as a qualitative decision guide, not a formula.
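The qualitative guide can be written down as a checklist, which makes the "all conditions for shared, any condition for isolation" logic explicit. This is a sketch with hypothetical names; the boolean inputs are judgment calls by the operator, not measured thresholds.

```python
from dataclasses import dataclass

@dataclass
class TenantProfile:
    """Operator's qualitative answers to the five matrix conditions."""
    high_tenant_count: bool
    uniform_slas: bool
    memory_constrained: bool
    high_embedding_overlap: bool
    noisy_neighbor_tolerable: bool

def recommend(profile):
    """Return ('shared', []) only if every shared-side condition holds; otherwise list why not."""
    reasons = []
    if not profile.high_tenant_count:
        reasons.append("tenant count is low to moderate")
    if not profile.uniform_slas:
        reasons.append("SLAs are strict or differentiated")
    if not profile.memory_constrained:
        reasons.append("memory budget can absorb duplication")
    if not profile.high_embedding_overlap:
        reasons.append("tenant data is domain-disjoint")
    if not profile.noisy_neighbor_tolerable:
        reasons.append("noisy-neighbor risk is not acceptable")
    return ("per-tenant", reasons) if reasons else ("shared", [])

saas = TenantProfile(True, True, True, True, True)
regulated = TenantProfile(True, False, True, True, False)
print(recommend(saas))
print(recommend(regulated))
```

The returned reasons are more useful than the label itself, since they show which condition pushed the decision toward isolation.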

Deployment boundaries: filtering, noisy neighbors, and tail latency

Tenant filtering becomes the dominant query bottleneck when the querying tenant owns only a small fraction of the index, so that many off-tenant vectors must be examined before the tenant-specific result set is assembled. In a shared-index model, a query for a tenant that owns 0.1% of the index vectors may traverse a significant portion of the graph before filtering eliminates off-tenant candidates. This is the mechanism behind the Curator paper's observation that shared indexes sacrifice search performance: "The former optimizes for memory efficiency at the expense of search performance."

Noisy-neighbor effects compound filter latency. A large tenant running high-QPS queries saturates shared HNSW graph traversal capacity, polluting CPU cache and memory bandwidth for co-resident small tenants. Qdrant's dedicated-shard model for large tenants directly addresses this by removing large tenants from the shared traversal path. Pinecone's namespace model relies on service-level capacity management to bound this effect.

Pro Tip: In shared-index deployments that filter after traversal, the filtering work per relevant result scales roughly inversely with the tenant's share of the data, and namespace routing or shard selection only helps when it narrows the candidate set before traversal rather than after it. Assuming vectors are evenly mixed, a tenant owning 1% of the vectors in a shared index faces about 100× the candidate checks per relevant result compared with a tenant searching its own private index. When p99 latency for small tenants degrades under load, the first place to investigate is the tenant's share of the index, not index build parameters. Moving high-value small tenants to dedicated isolation boundaries can recover more tail latency than tuning HNSW parameters such as ef or M.
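The 100× figure follows from simple arithmetic under a strong assumption: if each examined candidate belongs to the tenant independently with probability f (vectors evenly mixed, naive post-filtering), the expected number of candidates needed to collect k tenant results is k / f. The function names below are hypothetical, and real indexes with tenant-aware routing will do better than this bound.

```python
def expected_candidates(k, tenant_fraction):
    """Expected candidates examined to collect k tenant hits, assuming evenly mixed vectors."""
    if not 0.0 < tenant_fraction <= 1.0:
        raise ValueError("tenant_fraction must be in (0, 1]")
    return k / tenant_fraction

def filtering_multiplier(tenant_fraction):
    """Candidate checks per relevant result, relative to a private index (fraction 1.0)."""
    return expected_candidates(1, tenant_fraction)

for fraction in (1.0, 0.1, 0.01, 0.001):
    print(fraction, filtering_multiplier(fraction), expected_candidates(10, fraction))
```

The same relation explains the 0.1% case mentioned earlier: under these assumptions, a top-10 query would examine on the order of 10,000 candidates before post-filtering yields ten tenant results.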

Limitations and caveats in the Curator paper

The Curator paper addresses one specific operational problem — the memory-versus-latency trade-off in multi-tenant vector search — and evaluates two specific baselines. That scope is the source of both its contribution and its limits.

The paper does not establish generalization across vector dimensionality ranges, arbitrary tenant size distributions, or all workload access patterns. The experimental details needed to reproduce the results (dataset provenance, hardware specifications, exact tenant configurations, and complete result tables) are not accessible from the abstract alone. Operators evaluating Curator must read the full PDF and assess whether the paper's experimental workload resembles their own production conditions before applying its conclusions.

Watch Out: The paper's benchmark scope is narrower than the problem it names. Multi-tenancy in vector databases spans many workload patterns — high-write, high-read, mixed-tenant query concurrency, variable recall requirements — and the paper evaluates a subset. Before adopting Curator's design conclusions, operators should verify that the paper's tenant count, vector distribution, and query pattern match their deployment context. A finding that holds at N tenants with K-dimensional vectors may not hold at 10N tenants with domain-disjoint distributions.

Why memory savings do not automatically mean lower total cost

Memory reduction in the index layer is one cost driver, not the whole cost model. In a self-hosted Qdrant deployment, the operational cost includes cluster node count, shard replication factor, query CPU allocation, and write amplification during tenant onboarding — none of which scale purely with index memory. Curator's memory savings apply to the in-memory index structure; they do not reduce storage costs for the raw vector payload, metadata, or WAL.

In Pinecone's managed model, cost depends on the billing model rather than on memory footprint alone: pod-based indexes are billed by pod type and count, while serverless indexes are billed by storage and read/write usage. Reducing in-index memory overhead does not directly translate to a lower Pinecone bill unless that reduction crosses a billing boundary such as a pod-tier threshold.

Brief Context: Memory savings from shared-subtree designs reduce the per-tenant marginal cost of index growth — a meaningful planning input for platform teams projecting tenant count. But the full TCO calculation must include query CPU, storage, replication, and operational overhead. Qdrant's tiered shard model illustrates this: "keeping small tenants together inside a shared Shard, while isolating large tenants into their own dedicated Shards" saves memory for small tenants but adds shard management complexity as the large-tenant count grows. Curator's memory improvements are a necessary but not sufficient input to cost planning.

FAQ for engineers evaluating vector-store multi-tenancy

What is the trade-off between shared and per-tenant indexes?

The Curator paper states it precisely: "The former optimizes for memory efficiency at the expense of search performance, while the latter does the opposite." A shared index holds memory costs flat as tenant count grows but forces query traversal across off-tenant vectors. A per-tenant index delivers clean isolation and predictable latency but multiplies memory with tenant count.

What is Curator in vector databases?

Curator is an indexing method for multi-tenant vector databases that proposes tenant-specific clustering trees with shared subtree regions to reduce the memory cost of per-tenant isolation without accepting the full latency penalty of shared-index traversal. The paper (arXiv:2401.07119) evaluates Curator against shared-index and per-tenant-index baselines. It is a research proposal, not a production component shipping inside Qdrant or Pinecone.

Is Qdrant good for multi-tenancy?

Qdrant's tiered multitenancy (v1.16.0) is operationally practical for asymmetric tenant populations. Shared shards for small tenants keep memory costs low; dedicated shards for large tenants prevent noisy-neighbor latency contamination. The model aligns closely with the hybrid position Curator's paper motivates, without requiring Curator's specific indexing implementation.

How does Pinecone support multi-tenancy?

Pinecone multi-tenancy uses namespaces as the isolation primitive. Standard and Enterprise plans support up to 100,000 namespaces per index, scaling to million-scale counts for specific use cases. "Namespaces are created automatically as you upsert records", making onboarding operationally simple. The latency exposure is the shared-index baseline's filter overhead, which Pinecone manages at the service layer rather than at the index structure layer.

Does Curator replace HNSW or IVF for standard ANN search?

No. The Curator paper is a multi-tenancy study, not an ANN-Benchmarks submission. Its comparisons are against shared-index and per-tenant-index baselines under multi-tenant workloads. Claims about HNSW, IVF, or ScaNN recall-vs-latency curves in single-tenant regimes are outside the paper's evidence boundary.

When should an operator prefer per-tenant indexes over a shared index?

When any of the following apply: per-tenant SLAs are strict and differentiated, tenant data is domain-disjoint (low distributional overlap), noisy-neighbor effects are unacceptable, or the tenant population is small enough that the memory multiplier is manageable. Qdrant's dedicated-shard path and separate Qdrant collections both satisfy this requirement; Pinecone's namespace model does not provide equivalent physical isolation.

Sources & References

Curator: Efficient Indexing for Multi-Tenant Vector Databases (arXiv:2401.07119) — Primary paper; defines the shared-index vs. per-tenant-index trade-off and proposes Curator as a resolution.
Curator on Semantic Scholar (Jin & Wu) — Figure caption confirming Curator's dual optimization target.
Qdrant multitenancy documentation — Official docs for tiered multitenancy, shared shards, and dedicated shards in v1.16.0.
Pinecone multitenancy guide — Official docs for namespace-based isolation, namespace limits, and multi-tenant index design.
Pinecone upsert and namespace documentation — Source for namespace isolation framing and automatic namespace creation behavior.
Pinecone manage namespaces documentation — Confirms namespace lifecycle and creation semantics.

Keywords: Curator, Pinecone, Qdrant, HNSW, IVF, ScaNN, ANN-Benchmarks, tenant isolation, tail latency, memory overhead, tenant filtering, multi-tenant vector search, shared index, per-tenant index, vector database
