
How filtered vector search works under the hood

Filtered vector search is not one algorithm but a planner choice among pre-filtering, post-filtering, and inline-filtering. Highly selective filters (few vectors pass) favor pre-filtering, broad filters (most vectors pass) favor post-filtering, and filters in between can use inline strategies. Stale selectivity estimates can make the planner choose badly and hurt recall and latency.

By AxiomLogica Editorial

May 4, 2026 · 24 min read

Reviewed by Editorial

What filtered vector search is solving

Relational predicates change the shape of a vector search problem in ways a pure ANN index cannot absorb. A query like "find the 10 embedding-nearest documents, but only among those owned by tenant X and published after 2024-01-01" cannot be handed wholesale to an HNSW graph or an IVF index and expect correct results — the index was built over the full corpus, not over that tenant's filtered slice.

Google Research defines filtered vector search as the operation where relational predicates restrict the candidate set before or during top-k similarity search. The same paper states plainly: "Filters alter the effective dataset size and distribution, impacting the search effort required." That single sentence captures the core engineering problem. Reducing the candidate set changes not just how many vectors the engine scores, but also the graph topology the traversal must work through — and that topological shift is what breaks naive ANN approaches.

Bottom Line: Filtered vector search is not a wrapper around ANN search; it is a distinct execution problem. The engine must decide when to apply the relational predicate relative to similarity traversal, and that decision — pre-filter, post-filter, or inline-filter — has direct consequences for recall, p99 latency, and resource consumption. The right choice depends on filter selectivity, cardinality, and data correlation, all of which change over a production workload's lifetime. ANN indexes alone are insufficient when predicates materially reduce the candidate set; the execution strategy must account for all three inputs.

How the query planner decides the execution path

The planner's job is to pick the execution strategy that minimizes total cost — typically a weighted combination of CPU wall-time, IOs, and recall loss. Google Research classifies those strategies as pre-filtering, post-filtering, and inline-filtering, with efficiency depending on filter selectivity, cardinality, and data correlation. In practice, no planner has perfect information, so the choice is driven by statistics, not ground truth.

The diagram below shows the three execution paths and where the predicate evaluation gate sits relative to ANN traversal:

flowchart LR
    Q([Query + Predicate]) --> PL[Query Planner]

    PL -->|High selectivity| PF_IN[Pre-filter\nEvaluate predicate first\nShrink candidate pool\nThen run ANN on subset]
    PL -->|Low selectivity| PF_OUT[Post-filter\nRun ANN first\nEvaluate predicate on ANN output]
    PL -->|Medium selectivity| IF[Inline-filter\nInterleave predicate\nevaluation during\ngraph traversal]

    PF_IN --> R([Top-k results])
    PF_OUT --> R
    IF --> R
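The three branches of the diagram can be sketched as a small decision function. This is a minimal illustration, not any engine's real planner: the name `choose_strategy` and both cutoffs are assumptions (the 1 percent and 20 percent values echo the rough figures used later in this article), and a real planner would weigh cost estimates rather than fixed thresholds.

```python
from enum import Enum


class Strategy(Enum):
    PRE_FILTER = "pre-filter"
    POST_FILTER = "post-filter"
    INLINE_FILTER = "inline-filter"


def choose_strategy(estimated_pass_fraction: float,
                    narrow_cutoff: float = 0.01,
                    broad_cutoff: float = 0.20) -> Strategy:
    """Pick an execution path from the estimated fraction of vectors passing.

    Both cutoffs are illustrative assumptions, not values from a real planner.
    """
    if not 0.0 <= estimated_pass_fraction <= 1.0:
        raise ValueError("pass fraction must be within [0, 1]")
    if estimated_pass_fraction < narrow_cutoff:
        return Strategy.PRE_FILTER      # highly selective: shrink the pool first
    if estimated_pass_fraction >= broad_cutoff:
        return Strategy.POST_FILTER     # broad: run ANN first, trim afterwards
    return Strategy.INLINE_FILTER       # in between: filter during traversal
```

The function only sees an estimate, which is why the quality of the statistics behind that estimate matters so much in the sections that follow.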

AlloyDB's integration of ScaNN demonstrates what deep planner integration looks like in a SQL engine. As Google Cloud's AlloyDB engineering blog describes it: "By deeply integrating with the AlloyDB query planner, the ScaNN index is able to optimize the ordering of SQL filters in vector search based on your workload characteristics." That phrase — "ordering of SQL filters" — is the key: the planner decides whether a B-tree index on a metadata column fires before or after the vector scan, or whether both indexes run in tandem via inline filtering.

Pro Tip: Treat selectivity estimation as a first-class concern at schema design time, not as a post-hoc diagnostic. If your planner's cardinality estimates are wrong by an order of magnitude — common when metadata columns are skewed or correlated — it will pick the wrong strategy and you will see either recall collapse or latency spikes, depending on which direction the estimate drifts.

Where selectivity, cardinality, and correlation enter the decision

Filter selectivity is the fraction of the total corpus that survives the predicate. A filter with selectivity 0.001 (0.1 percent of vectors pass) is highly selective; a filter with selectivity 0.9 (90 percent pass) is broad. Note the inversion: throughout this article, a "high-selectivity" or "highly selective" filter is one with a numerically small pass fraction, and a "low-selectivity" filter is a broad one. Cardinality is the count of distinct values in the filtered attribute: a column with two values (true/false) has cardinality 2, while a user-ID column on a 50M-vector corpus might have cardinality 10M. Data correlation describes whether vectors that are embedding-similar also tend to share metadata values, which determines whether a filtered subgraph remains geometrically coherent after the predicate fires.
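To make the first two definitions concrete, here is a minimal sketch that computes them over a toy corpus. The helper names and the `corpus` layout are hypothetical and exist only for illustration.

```python
def selectivity(rows, predicate):
    """Fraction of rows that satisfy the predicate (0.0 for an empty corpus)."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if predicate(row)) / len(rows)


def cardinality(rows, field):
    """Number of distinct values observed for a metadata field."""
    return len({row[field] for row in rows})


# Hypothetical toy corpus: 1,000 rows, 10 tenants, every fourth row archived.
corpus = [
    {"tenant": f"t{i % 10}", "archived": i % 4 == 0}
    for i in range(1000)
]

tenant_pass = selectivity(corpus, lambda row: row["tenant"] == "t3")
archived_pass = selectivity(corpus, lambda row: row["archived"])
```

In this toy data the tenant filter passes one row in ten and the archived filter passes one row in four, while the two columns have cardinality 10 and 2 respectively. Cardinality alone does not determine selectivity: it only does so when values are evenly distributed.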

Google Research names all three as the main factors driving execution-strategy efficiency. AlloyDB's ScaNN documentation reinforces this: "Let's dive into what filter selectivity is and how AlloyDB ScaNN's index leverages it to improve the performance and quality of your search." The planner ingests these statistics as cost inputs. When they are accurate, the strategy selection is sound. When they are stale, the engine optimizes for a distribution that no longer exists.

Pro Tip: Because selectivity is a planner input, a wrong estimate can select a strategy that degrades recall and latency simultaneously, a worse outcome than either metric degrading in isolation. Track the selectivity distribution of your most frequent filter predicates as a monitoring signal, not only as a schema-design artifact.

Why stale stats make the wrong strategy look attractive

Planners build cost models from statistics collected at a point in time. In a live system, metadata distributions shift — a tenant's document count grows 40×, a "status" enum that was uniformly distributed becomes 95 percent "archived" after a migration, a tag column develops a power-law distribution as usage patterns evolve. Each of these shifts changes the effective selectivity of predicates the planner already has cached estimates for.

Google Research frames this directly: changes in filter distribution and dataset size are the root source of planning errors. Google Cloud's adaptive filtering mechanism states: "At times the query planner may misjudge the selectivity of a filter due to outdated statistics, resulting in the vector search and filtering conditions being applied in a suboptimal order." Adaptive filtering uses real observed statistics at query time to override the stale plan, recovering correct ordering even as distributions evolve.

Watch Out: Two failure modes compound each other in production. First, a high-cardinality metadata column (e.g., user ID) produces a near-unique filter per query: the planner sees "high selectivity" for the column as a whole but cannot account for per-value variance without per-value statistics. Second, skewed metadata distributions mean that a query against the most populous 1 percent of values can match on the order of 100× more vectors than a column-level average predicts, so the filter is far less selective than the planner assumed. Both scenarios cause the wrong strategy to look optimal, and the wrong strategy in a high-QPS system means sustained recall degradation or latency blowout.
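The gap between a column-level average and per-value reality is easy to reproduce. The sketch below uses a hypothetical skewed tenant column; `uniform_estimate` stands in for a planner that assumes every distinct value is equally common, which is an assumption made for illustration rather than a description of any specific engine.

```python
from collections import Counter


def per_value_pass_fractions(values):
    """Map each distinct value to the fraction of rows that carry it."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def uniform_estimate(values):
    """Column-level estimate assuming every distinct value is equally common."""
    return 1.0 / len(set(values))


# Hypothetical skewed column: one large tenant and 99 small ones (10,000 rows).
tenant_ids = ["big"] * 9_010 + [f"small-{i}" for i in range(99) for _ in range(10)]

fractions = per_value_pass_fractions(tenant_ids)
estimate = uniform_estimate(tenant_ids)
```

Here the uniform estimate is 1 percent for every tenant, yet the large tenant actually passes about 90 percent of rows and each small tenant about 0.1 percent. A planner working from the average would misjudge both ends of the distribution in opposite directions.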

Pre-filtering when the predicate is highly selective

Pre-filtering wins when the predicate leaves only a small fraction of vectors eligible for similarity search. The FANS benchmark paper states this directly: "When a filter is highly selective and matches only a small portion of the tuples, a filter-first approach becomes highly advantageous." The mechanism is candidate shrinkage — by materializing the filtered set before running ANN traversal, the engine reduces the search space dramatically, which cuts traversal cost proportionally.

Dimension	High-selectivity pre-filter	Low-selectivity pre-filter
Candidate set size	Tiny (e.g., <1% of corpus)	Large (e.g., >20% of corpus)
Graph traversal cost	Low — fewer nodes to visit	High — nearly full index scan
Recall risk	Moderate — sparse subgraph	Low — dense subgraph
Filter evaluation overhead	High relative to ANN work	Low relative to ANN work
Recommended?	✅ Yes, if subgraph connected	❌ No — use post or inline

Qdrant filtering uses this logic through its payload index system: when a payload index exists on a filtered field and the planner determines that the filtered subset is small, Qdrant can locate matching records quickly rather than scanning all vectors. The effect is that the filterable HNSW graph navigates a smaller and more targeted neighborhood, reducing work per query.

Why pre-filtering can collapse the search space

Pre-filtering exploits the fact that filtered vector search changes the effective dataset size. With selectivity at 0.1 percent on a 50M-vector corpus, the planner works with 50,000 candidates — an HNSW traversal on that subset is orders of magnitude cheaper than traversal on the full graph. The similarity search sees a dramatically smaller problem, and most of that cost reduction comes for free if a payload index can deliver the filtered set without a full scan.
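The candidate-shrinkage arithmetic and the filter-first ordering can be sketched together. This is a simplified illustration that scores the surviving subset exactly instead of running ANN on it; the function names and the in-memory list layout are assumptions, and a production engine would fetch the eligible ids from a payload or secondary index rather than scanning metadata.

```python
import math


def effective_candidates(corpus_size: int, pass_fraction: float) -> int:
    """Number of vectors left after the filter, rounded to a whole count."""
    return round(corpus_size * pass_fraction)


def pre_filter_search(vectors, metadata, predicate, query, k):
    """Filter first, then rank only the surviving vectors by exact distance."""
    eligible = [i for i, meta in enumerate(metadata) if predicate(meta)]
    eligible.sort(key=lambda i: math.dist(vectors[i], query))
    return eligible[:k]
```

For the example in the text, `effective_candidates(50_000_000, 0.001)` gives 50,000. Once the eligible set is that small, an exact scan over it can be competitive with graph traversal, which is one reason engines sometimes fall back to brute force under narrow predicates.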

Production Note: High-selectivity predicates do reduce ANN work substantially, but the savings are not cost-free. If the filtered subset is small enough, the exact-filter evaluation against a payload index may dominate query time rather than the vector traversal. The crossover point depends on index type (hash vs. range vs. inverted), cardinality, and whether the filtered set fits in memory. Profile both components separately before attributing latency to ANN traversal.

Where pre-filtering breaks down in practice

The HNSW graph is built over the full corpus. When pre-filtering produces a small subset, the subgraph induced by those vectors may be sparsely connected or even disconnected — the navigable small-world property that makes HNSW fast depends on graph density, which the filtered slice may not preserve. The FANS benchmark summary warns that graph traversal with in-filtering strategies can fail when the filtered subgraph is sparsely connected.

Algorithms like ACORN and Filtered-DiskANN were developed specifically to address this: they modify index construction to maintain connectivity within filtered subsets, rather than relying on the full graph's topology to remain useful after filtering. For standard HNSW without filtered-graph augmentation, extremely selective predicates (fewer than roughly 0.01 percent of vectors passing) can produce recall far below what the planner expects, because subgraph navigation cannot find enough valid neighbors.
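One way to reason about fragmentation offline is to count the connected components of the subgraph induced by the eligible nodes. The sketch below is a simplification under stated assumptions: it takes a single-layer adjacency map with symmetric links, whereas real HNSW graphs are layered and directed, and `filtered_components` is a hypothetical helper rather than part of any library.

```python
from collections import deque


def filtered_components(neighbors, eligible):
    """Count connected components of the subgraph induced by eligible nodes.

    `neighbors` maps node id -> iterable of neighbor ids and is assumed to be
    symmetric (if a lists b, then b lists a).
    """
    eligible = set(eligible)
    seen = set()
    components = 0
    for start in eligible:
        if start in seen:
            continue
        components += 1                 # found a node no earlier search reached
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors.get(node, ()):
                if nxt in eligible and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return components
```

A result of 1 means a traversal restricted to eligible nodes can reach every eligible vector from any entry point. Anything larger means some valid neighbors are unreachable without stepping through filtered-out nodes.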

Watch Out: Fragmented HNSW subgraphs are a silent recall killer. The engine will return k results and report success, but those results may not be the true top-k because the traversal was stuck in a disconnected component. Validate recall at production-representative selectivity values, not just at the average. A filter that usually returns 0.5 percent but occasionally returns 0.005 percent of the corpus will behave very differently at the tail.

Post-filtering when the predicate is broad

Post-filtering is the default strategy for broad predicates, those where most vectors satisfy the filter condition. The FANS benchmark confirms post-filtering is most efficient when most items pass the filter. The ANN index runs first over the full corpus, producing a candidate list of size ef (the exploration factor, i.e. the search-time candidate list size, typically set well above k), and the metadata predicate then discards ineligible candidates from that list.

Pinecone metadata filtering is the canonical example of a managed service that exposes this pattern. Pinecone's filtering language supports expression-based operators applied at query time, and for broad predicates — where only a small fraction of ANN-returned candidates will be excluded — this architecture performs well without requiring any planner intelligence.

Dimension	Broad predicate (post-filter)	Narrow predicate (post-filter)
ANN candidate pool	Unrestricted — full index	Unrestricted — full index
Filter discard rate	Low — most candidates survive	High — most candidates discarded
Recall risk	Low — k survivors easy to find	High — may not reach k after discard
ANN traversal efficiency	Preserved	Preserved — but wasted for discarded
Recommended?	✅ Yes	❌ No — use pre-filter or inline

Why broad filters preserve ANN efficiency

Low-selectivity filters — predicates that admit most of the corpus — are ideal for post-filtering because the ANN index can traverse a large, geometrically coherent graph without immediate candidate starvation. The graph topology is intact, the navigable-small-world guarantees hold, and the traversal produces a high-quality candidate list. The metadata filter then makes minor trimmings on the output.

Pinecone's filtering works well in this regime precisely because the managed service does not need to modify graph structure or maintain filtered sub-indexes. The ANN layer delivers candidates; the filter layer discards a small fraction.

Pro Tip: Low-selectivity filters often favor post-filtering because the ANN graph operates over a less constrained search space without immediate candidate starvation. If your predicate admits more than roughly 20 percent of the corpus, post-filtering typically delivers the best recall-latency trade-off without requiring graph augmentation or inline traversal logic. Below that threshold, model selectivity explicitly and consider whether inline-filtering or pre-filtering becomes more cost-effective.

Why post-filtering can miss the true top-k

Post-filtering has a structural recall hazard that becomes acute when ANN returns a fixed-size candidate list before the filter runs. Suppose ef (the exploration factor) is set to 2k, twice the requested result count k, so the ANN traversal returns 2k candidates. If the predicate discards 90 percent of those, only 0.2k valid candidates survive, which is fewer than k. The engine returns a partial result set with no error signal that it missed the true top-k.
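The starvation arithmetic can be written down directly. The sketch below assumes that ANN candidates pass the filter independently at the corpus-wide pass rate, which only holds when metadata is uncorrelated with embedding similarity; both function names are hypothetical.

```python
import math


def expected_survivors(ef: int, pass_fraction: float) -> float:
    """Expected number of ANN candidates that remain after the filter."""
    return ef * pass_fraction


def min_ef_for_k(k: int, pass_fraction: float) -> int:
    """Smallest candidate list whose expected survivor count reaches k."""
    if pass_fraction <= 0:
        raise ValueError("no vector passes the filter")
    return math.ceil(k / pass_fraction)
```

With k = 10, ef = 20, and a 10 percent pass rate, the expected survivor count is about 2, matching the 0.2k figure above. Reaching 10 survivors on average would need ef of roughly 100, which shows how quickly oversizing the candidate list becomes expensive as filters narrow.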

The FANS benchmark describes this failure mode: if too few vectors pass the filter, ANNS may not return enough valid candidates for post-filtering. This is not a theoretical edge case; it occurs whenever the ef value was calibrated for a broad-filter workload and a narrow filter arrives without triggering a strategy change.

Watch Out: Post-filtering can silently return fewer than k valid matches when ANN truncation occurs before metadata validation. The query succeeds from the engine's perspective — it returns the best results it found — but the recall relative to the true top-k may be 40–60 percent. Engines that expose ef as a configurable parameter allow you to oversize the candidate list as a defensive measure, but this trades latency for recall. There is no free lunch; the real fix is a strategy switch or an inline-filtering implementation.

Inline-filtering and filterable graph traversal

Inline-filtering applies the relational predicate during graph traversal rather than before or after. As the traversal expands neighbors from the current node, it evaluates the predicate on each candidate and excludes ineligible nodes from the beam. The effect is that the search never wastes traversal steps on vectors that the predicate would reject, but it also never restricts the graph topology the way pre-filtering does.

AlloyDB Omni documents this directly: "Inline filtering is a query optimization strategy where AlloyDB Omni uses both vector and other secondary indexes to perform vector search and filter evaluation in tandem."

sequenceDiagram
    participant Q as Query
    participant PL as Planner
    participant VI as Vector Index (ScaNN/HNSW)
    participant SI as Secondary Index (B-tree/Inverted)
    participant R as Result Buffer

    Q->>PL: Query + predicate + k
    PL->>VI: Begin ANN traversal (ef candidates)
    loop For each neighbor node expanded
        VI->>SI: Evaluate predicate on candidate
        SI-->>VI: Pass / Reject
        VI->>R: Append if pass, skip if reject
    end
    R-->>Q: Top-k filtered results

The inline strategy sits between the two extremes in the selectivity spectrum. It avoids the subgraph fragmentation risk of pure pre-filtering because the global graph topology remains intact — the traversal starts from the same entry points and navigates the same edges. It avoids the candidate-starvation risk of pure post-filtering because ineligible vectors are pruned during traversal rather than after a fixed-size truncation.
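A minimal sketch of the idea, assuming a single-layer neighbor graph held in memory: the predicate is checked as each neighbor is discovered, rejected nodes are still used for navigation, and only passing nodes enter the result set. This is one possible variant chosen for illustration; real engines differ in whether and how they traverse through rejected nodes, and every name here is hypothetical.

```python
import heapq
import math


def inline_filtered_search(vectors, neighbors, passes, query, entry, k, ef):
    """Best-first graph search that evaluates the predicate during traversal."""
    def dist(i):
        return math.dist(vectors[i], query)

    visited = {entry}
    frontier = [(dist(entry), entry)]   # min-heap of nodes still to expand
    results = []                        # max-heap (negated distance) of passing nodes
    if passes(entry):
        results.append((-dist(entry), entry))
    while frontier:
        d, node = heapq.heappop(frontier)
        # Stop when the result set is full and the closest unexpanded node
        # is already farther than the worst result being tracked.
        if len(results) >= ef and d > -results[0][0]:
            break
        for nxt in neighbors[node]:
            if nxt in visited:
                continue
            visited.add(nxt)
            dn = dist(nxt)
            heapq.heappush(frontier, (dn, nxt))     # navigate through any node
            if passes(nxt):                          # but only keep passing ones
                heapq.heappush(results, (-dn, nxt))
                if len(results) > ef:
                    heapq.heappop(results)           # drop the farthest result
    ranked = sorted((-neg, i) for neg, i in results)
    return [i for _, i in ranked][:k]
```

Because the stop condition counts only passing nodes, the search keeps expanding until it has enough eligible results or runs out of graph. That avoids fixed-size truncation, at the price of more traversal work when few nodes pass.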

Pro Tip: Inline filtering is most valuable in the medium-selectivity band — predicates where neither pre-filtering nor post-filtering dominates. The cost of predicate evaluation per neighbor depends heavily on whether a secondary index supports the predicate (B-tree range lookup is cheap; sequential scan of a payload column is not). Ensure your inline-filter predicates are backed by secondary indexes or the per-node evaluation overhead will negate the traversal savings.

How Qdrant materializes one-stage filtering

Qdrant filtering implements a variant of inline filtering through its filterable HNSW index. The key operational constraint is build order: payload indexes must be created before the HNSW index is built. As Qdrant's own example notebook documents: "This should dramatically improve filtering performance by allowing Qdrant to quickly locate matching records instead of scanning all vectors." Without a payload index, Qdrant falls back to scanning, which degrades to brute-force behavior under narrow predicates.

Dimension	Qdrant inline (payload-indexed)	Pure pre-filter	Pure post-filter
Graph topology	Full — intact HNSW graph	Subset — fragmented risk	Full — intact HNSW graph
Predicate evaluation	During traversal via payload index	Before traversal	After ANN output
Recall at high selectivity	Good — traversal continues globally	Fragmentation risk	Candidate starvation risk
Recall at low selectivity	Good — low discard during traversal	Wasteful — near-full scan	Good
Build order dependency	Payload index must precede HNSW build	N/A	N/A

Qdrant's planner adapts its strategy at the segment level based on estimated selectivity. The filterable HNSW path activates when a payload index exists and the planner estimates that inline traversal is cheaper than a brute-force scan of the filtered set. This is the same decision axis Google Research describes — but Qdrant exposes the payload-index/HNSW-build-order constraint as a concrete operational requirement rather than a planner-internal abstraction.

Why AlloyDB and ScaNN start broad and adapt

AlloyDB with ScaNN takes a different materialization approach. Rather than modifying the graph construction phase, it integrates with the SQL query planner to dynamically order filter evaluation. The planner starts from observed workload statistics, selects an execution strategy, and then — via adaptive filtering — uses real observed statistics at query time when those static estimates drift.

Google Cloud's adaptive filtering mechanism is the production-facing answer to the stale-statistics problem described above. When the query planner misjudges selectivity due to outdated statistics, adaptive filtering detects the mismatch during execution and adjusts the filter ordering mid-query. This converts a planning failure from a silent recall degradation into a recoverable runtime adaptation.
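The general shape of such a runtime check can be sketched as follows. This is not AlloyDB's implementation, which the sources do not describe at this level of detail; it is a hypothetical illustration in which the engine samples rows during execution, compares the observed pass rate with the planner's estimate, and flags a re-plan when they differ by more than an assumed tolerance of one order of magnitude.

```python
def observed_pass_fraction(sample, predicate):
    """Fraction of a runtime sample of rows that satisfies the predicate."""
    if not sample:
        raise ValueError("sample must not be empty")
    return sum(1 for row in sample if predicate(row)) / len(sample)


def should_replan(estimated: float, observed: float, tolerance: float = 10.0) -> bool:
    """True when estimate and observation differ by more than `tolerance` times."""
    floor = 1e-9                        # guards against division by zero
    ratio = max(estimated, floor) / max(observed, floor)
    return ratio > tolerance or ratio < 1.0 / tolerance
```

For example, a planner estimate of 0.1 percent against an observed pass rate of 50 percent would trigger a re-plan, while 10 percent against 12 percent would not. The tolerance trades responsiveness to drift against the cost of switching strategies too often.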

Production Note: AlloyDB's adaptive filtering increases in value proportionally to how fast your data distributions evolve. In stable datasets with regular ANALYZE cycles, the static planner is often sufficient. In high-churn systems — where new tenants appear, old tenants deactivate, and metadata distributions shift week-over-week — adaptive filtering is the mechanism that prevents the system from silently optimizing for a distribution that no longer exists. The architecture deliberately starts broad (full vector search, then filter) and adapts down toward tighter pre-filtering as evidence accumulates.

Why ANN indexes struggle when filters change the dataset

Standard ANN indexes — HNSW, IVF, ScaNN — are built to optimize similarity search over a fixed distribution. Their construction algorithms assume the search graph or partition structure reflects the actual query-time candidate pool. Filtered vector search violates that assumption every time a predicate materially reduces the candidate set, because the index was optimized for a different effective dataset size and distribution.

Google Research states this as the core complication: filtered vector search alters the effective dataset size and distribution, making ANN optimization harder than unfiltered search. The FANS benchmark confirms the downstream consequence: ANNS can fail to return enough valid candidates when too few vectors pass the filter, directly degrading filtered-search quality.

The mathematical relationship is straightforward. If the corpus contains $N$ vectors and the filter has selectivity $s$ (fraction of vectors passing the predicate), the effective search set size is:

$$N_{\text{eff}} = s \cdot N$$

ANN index quality, specifically recall at a given ef, degrades as $N_{\text{eff}}$ shrinks relative to $N$, because the index structure was optimized for $N$, not $N_{\text{eff}}$. At extreme selectivity ($s \rightarrow 0$), the traversal may find fewer than $k$ nodes within the filtered set before exhausting its beam, and recall approaches the brute-force rate on the tiny filtered subset rather than the ANN rate on the full corpus.

Pro Tip: Monitor recall separately for queries segmented by estimated filter selectivity. An aggregate recall-at-10 of 0.95 across all queries can mask recall-at-10 of 0.60 on the most selective 5 percent of queries — and those selective queries are often the highest-value ones (per-tenant isolation, compliance-restricted data access). Instrument selectivity buckets in your observability pipeline so that latency and recall metrics surface per selectivity tier.
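Bucketed recall is straightforward to compute once each evaluated query carries its pass fraction, the ids the engine returned, and the ground-truth ids from an exact filtered search. The sketch below is a hypothetical offline helper; the bucket edges and the query dictionary layout are assumptions.

```python
from collections import defaultdict


def recall_at_k(returned, truth, k):
    """Share of the true top-k ids that appear in the returned top-k."""
    expected = set(truth[:k])
    if not expected:
        return 1.0                      # nothing to find, so nothing was missed
    return len(expected & set(returned[:k])) / len(expected)


def recall_by_bucket(queries, k, edges=(0.001, 0.01, 0.1)):
    """Average recall@k per pass-fraction bucket.

    Each query is a dict with 'pass_fraction', 'returned', and 'truth'.
    The bucket index is the number of edges at or below the pass fraction,
    so bucket 0 holds the most selective queries.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for q in queries:
        bucket = sum(q["pass_fraction"] >= edge for edge in edges)
        sums[bucket][0] += recall_at_k(q["returned"], q["truth"], k)
        sums[bucket][1] += 1
    return {bucket: total / count for bucket, (total, count) in sorted(sums.items())}
```

Reporting the per-bucket values alongside the aggregate makes a weak tail visible even when the overall average looks healthy.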

How data correlation changes candidate quality

Data correlation between metadata attributes and embedding space is the least-intuitive of the three planner inputs. If vectors that are embedding-similar also tend to share a metadata value, then the filtered subgraph induced by that predicate is geometrically coherent — semantically similar vectors remain neighbors in the filtered subgraph. The inverse is also true: if metadata is distributed independently of embedding space, the filtered subset samples the full graph uniformly, which fragments the navigable-small-world structure.

Google Research explicitly names data correlation as a key determinant of filtered vector search efficiency — and frames it as one reason the effective dataset distribution changes under filtering. Correlated metadata creates benign subgraphs; uncorrelated metadata creates adversarial ones.

Watch Out: Correlated metadata can make naive selectivity estimates appear accurate in development but diverge dangerously under production drift. If your development dataset has language-based clustering (all English documents in one tenant, all Spanish in another), the filtered subgraph is coherent. After users begin mixing languages within a single tenant, the same predicate induces a fragmented subgraph — same selectivity, much worse recall. Correlation is not a static property; validate subgraph connectivity under representative production filter patterns, not just development distributions.
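A rough way to check how strongly a metadata field tracks embedding space is to measure how often a vector's nearest neighbors share its metadata value. The sketch below uses exact distances over a small sample and needs at least two vectors; the `neighbor_agreement` name and the choice of two neighbors are assumptions for illustration, not a standard metric.

```python
import math


def neighbor_agreement(vectors, labels, n_neighbors=2):
    """Mean fraction of each vector's nearest neighbors that share its label."""
    total = 0.0
    for i, v in enumerate(vectors):
        others = sorted(
            (j for j in range(len(vectors)) if j != i),
            key=lambda j: math.dist(v, vectors[j]),
        )
        nearest = others[:n_neighbors]
        total += sum(labels[j] == labels[i] for j in nearest) / len(nearest)
    return total / len(vectors)


# Two hypothetical clusters of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clustered = neighbor_agreement(points, ["en", "en", "en", "es", "es", "es"])
mixed = neighbor_agreement(points, ["en", "es", "en", "es", "en", "es"])
```

With labels aligned to the clusters the score is 1.0; with labels interleaved across clusters it drops sharply even though each label still covers half the data. Tracking a score like this over time is one way to notice the language-mixing drift described above before recall degrades.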

What the paper says about open research problems

Google Research positions filtered vector search as an area with material open problems rather than a solved engineering problem. The paper's framing implies that adaptive planning and workload-aware optimization remain unresolved in practical ANN systems — no single strategy dominates all selectivity regimes, and the challenge of maintaining accurate runtime selectivity estimates at scale has no general solution as of early 2026.

Algorithmic work like ACORN and Filtered-DiskANN addresses the graph-structure side of the problem — building indexes that remain connected under filtered access patterns. The planning side — accurately estimating selectivity, detecting distribution drift, and switching strategies mid-workload — is less mature in open-source systems.

Pro Tip: The open research problems map directly to production pain points: adaptive planning under workload shift, mixed-selectivity query batches where different queries in the same second want different execution paths, and accurate selectivity estimation for high-cardinality or correlated metadata columns. If your system encounters these, you are operating at the frontier of what current implementations handle reliably. Invest in monitoring selectivity distributions and validating recall at tail selectivity values as your primary mitigation.

What this means for operators choosing a vector store

The choice of vector store is partly a choice of how much visibility and control you have over the filtering execution path. The three strategies — pre-filter, post-filter, inline-filter — are available in different forms across Pinecone, Qdrant, and AlloyDB Omni, but the operational exposure differs significantly.

System	Filtering strategy	Planner visibility	Config surface	Recall risk at high selectivity
Pinecone	Post-filter (managed)	Low — opaque service	Filter expression syntax	Candidate starvation possible
Qdrant (self-hosted)	Inline via filterable HNSW	Medium — payload index control	Payload index + HNSW build order	Fragmentation if payload index missing
AlloyDB + ScaNN	Inline + adaptive planner	High — SQL EXPLAIN, adaptive stats	ANALYZE cadence, scann parameters	Stale stats → wrong strategy

The recall risk column is the most operationally important. All three systems can produce correct results under the right conditions. The question is whether the failure mode is observable and recoverable — and that depends on how much planner surface the system exposes.

When to trust managed filtering

Managed services like Pinecone metadata filtering remove the payload-index/build-order concerns that self-hosted systems expose. You write a filter expression at query time; the service handles execution strategy. For workloads where predicates are broad — admitting 20 percent or more of the indexed corpus — this is a safe default. The post-filter architecture performs well, recall is predictable, and the operational surface is minimal.

Production Note: Managed filtering reduces tuning burden substantially compared with self-hosted systems that expose payload index configuration, graph build order, and ANALYZE scheduling. The cost is reduced planner visibility — you cannot inspect why a specific query's recall dropped after a metadata distribution shift, and you cannot force a strategy change for a specific predicate class. If your workload contains high-selectivity or highly variable predicates, that opacity becomes a liability. For stable, broad-filter workloads, it is a reasonable trade.

When self-hosted filtering pays off

Qdrant filtering in a self-hosted deployment pays off when you need direct control over three dimensions: payload index configuration (which fields are indexed and how), HNSW build order (payload indexes must precede HNSW construction for filterable behavior), and multi-tenancy isolation (separate collections or named vectors per tenant to prevent cross-tenant recall interference).

The Qdrant examples notebook makes the operational dependency explicit: filterable HNSW behavior is only activated when payload indexes are created before the HNSW index. This is not a limitation — it is a contract that gives operators a deterministic way to enable or disable inline-filtering behavior per field. Engineers operating Qdrant at 50M+ vectors need to evaluate filtering with and without payload indexes to find the optimal configuration for their specific selectivity distribution, as Qdrant's own course material recommends.

Production Note: Self-hosted Qdrant exposes write amplification trade-offs that managed services hide. Adding a payload index increases write overhead — every insert or update must maintain the index in addition to the HNSW structure. In write-heavy multi-tenant workloads, the amplification cost is non-trivial. Profile write throughput with and without payload indexes on your specific schema before committing to a per-field indexing strategy. The filtering performance gains are real, but so is the write-path cost.

Questions readers ask next

What is filtered vector search? Filtered vector search is a query operation that combines a relational predicate (a metadata filter) with approximate nearest-neighbor similarity search, restricting the candidate set to only those vectors that satisfy both the filter and the top-k similarity criterion. It is not a single algorithm but a planner-chosen execution strategy among pre-filtering, post-filtering, and inline-filtering, each suited to different filter selectivity regimes.

What are pre-filtering and post-filtering in vector search? Pre-filtering evaluates the relational predicate first, materializes a reduced candidate set, and then runs ANN search within that subset. It is advantageous when the predicate is highly selective (a small fraction of the corpus passes). Post-filtering runs ANN search first over the full index, then applies the predicate to the ANN output. It is advantageous when the predicate is broad (most of the corpus passes). Both strategies fail at the opposite selectivity extreme: pre-filtering risks subgraph fragmentation, post-filtering risks candidate starvation.

How does Qdrant handle filtering in vector search? Qdrant implements inline-filtering through a filterable HNSW index. Payload indexes on metadata fields must be created before the HNSW index is built — this is a hard operational dependency. With payload indexes in place, Qdrant's planner can evaluate the predicate during graph traversal rather than before or after, which preserves global graph connectivity while avoiding full-index post-filter candidate starvation. Without payload indexes, Qdrant falls back to scanning, which degrades to brute-force behavior on narrow predicates.

Does Pinecone support metadata filtering? Yes. Pinecone metadata filtering is a first-class query capability that supports expression-based filter operators at query time. This article models Pinecone's behavior as a post-filter step on ANN results; because the service does not expose its execution strategy, treat that as a working assumption to validate with your own recall measurements. Under that model, filtering works well for broad predicates but can degrade recall on highly selective predicates if the ANN candidate list does not contain enough valid results before the filter discards candidates.

Why does filtering hurt vector search recall? Filtering hurts recall through two distinct mechanisms. Pre-filtering can fragment the ANN subgraph when the filtered subset is small, preventing the traversal from finding true nearest neighbors within the restricted graph. Post-filtering can starve the result set when ANN truncates the candidate list to ef entries before the filter runs, leaving fewer than k valid results after discard. Inline-filtering mitigates both risks but still depends on secondary index support for predicate evaluation and on accurate selectivity estimation by the planner.

Sources and references

Filtered Vector Search: State-of-the-art and Research Opportunities — Google Research paper (arXiv:2401.07119); primary source for execution strategy classification and the role of selectivity, cardinality, and data correlation
arXiv:2401.07119 — arXiv mirror of the Google Research filtered vector search paper
AlloyDB AI's ScaNN index improves search on all kinds of data — Google Cloud engineering blog; source for ScaNN planner integration and adaptive filtering quotes
AlloyDB Omni: Filtered vector search overview — Google Cloud official docs; source for inline-filtering definition and tandem index evaluation
Pinecone: Filter by metadata — Pinecone official documentation; source for metadata filtering operators and post-filter architecture
Qdrant Essentials Course: Day 2 Pitstop Project — Qdrant official course material; source for payload index tuning recommendations
Qdrant HNSW Performance Tuning notebook — Qdrant official examples; source for filterable HNSW build-order dependency and payload index performance quote
FANS: Filtered Approximate Nearest-Neighbor Search Benchmark — Benchmark paper; source for quantitative characterization of pre-filter, post-filter, and inline-filter efficiency across selectivity regimes
Adaptive Filtering: Improving AlloyDB AI Vector Search — Secondary coverage of Google Cloud's adaptive filtering mechanism; corroborates planner mis-estimation failure mode

Keywords: filtered vector search, Qdrant filtering, Pinecone metadata filtering, AlloyDB Omni, ScaNN, HNSW, ANN indexes, selectivity estimation, cardinality, data correlation, ACORN, Filtered-DiskANN, PostgreSQL query planner, recall, p99 latency
