IN LIST: clean up generic static filtering #21927
Conversation
|
run benchmark in_list_strategy |
|
🤖 Criterion benchmark running (GKE). Comparing perf/in_list_generic_static_filter (ba66c2f) to 3aefba7 (merge-base). |
|
🤖 Criterion benchmark completed (GKE). |
ba66c2f to 5a9378f
5a9378f to a84579d
There are some regressions in benchmarks, I'll run again. |
|
run benchmark in_list_strategy |
|
🤖 Benchmark running (GKE). Comparing perf/in_list_generic_static_filter (a84579d) to c7e9284 (merge-base) using: in_list_strategy. |
|
🤖 Benchmark completed (GKE). |
adriangb left a comment
Seems like it was all noise. LGTM!
@adriangb Thanks for the stamp! I've picked up this epic again as you can see, currently working on a clean PR stack. Draft PRs are up and linked to the main issue, but still need some work and tuning before ready for review status. |
|
@adriangb Do you want to merge this one? The rest of the stack should be mostly ready for review now, I just need to rebase on |
|
merging! |
## Which issue does this PR close?

- Part of apache#19241.
- Stacked on apache#21927.
- Next in stack: apache#23012.
- Extracted from apache#19390.

## Rationale for this change

`IN LIST` evaluates expressions like `x IN (1, 3, 7)`. The list on the right is fixed, so DataFusion can precompute a small lookup structure once and then reuse it for every input row.

For `UInt8`, there are only 256 possible values: 0 through 255. That means the lookup can be a tiny checklist with one bit per possible value:

- If the list contains `3`, set bit `3`.
- If the list contains `7`, set bit `7`.
- To check whether an input value is present, read that one bit.

So instead of hashing each input value or comparing it against the list, membership becomes one indexed bit test. The bitmap is only 32 bytes, because 256 bits = 32 bytes.

This PR adds the first specialized primitive path in the stack as a concrete `UInt8` filter. The `UInt16` version is added in apache#23012, and the shared bitmap abstraction is introduced only after both concrete implementations are visible in apache#23035.

## What changes are included in this PR?

- Adds `UInt8BitmapFilter`, a 32-byte bitmap built from the non-null constants in the `IN` list.
- Routes `UInt8` constant-list filtering to that bitmap path.
- Keeps the same SQL null behavior as the generic path for both `IN` and `NOT IN`.
- Moves shared dictionary-needle handling into `static_filter.rs`, so specialized filters can reuse it consistently.
- Adds focused tests for `UInt8` null handling and dictionary-encoded needles.

## Are these changes tested?

Yes.

- `cargo fmt --all`
- `cargo test -p datafusion-physical-expr bitmap_filter_u8 --lib`
- `cargo test -p datafusion-physical-expr in_list_int_types --lib`
- `cargo clippy -p datafusion-physical-expr --all-targets --all-features -- -D warnings`

## Are there any user-facing changes?

No. This is an internal performance optimization only.

<!-- codex-benchmark-start -->
## Local benchmark snapshot

Benchmark command:

```bash
cargo bench -p datafusion-physical-expr --profile release-nonlto --bench in_list_strategy -- --save-baseline <name>
```

Method: compare adjacent saved baselines using raw Criterion sample minima (`min(time / iters)`). Lower is better; changes within +/-5% are treated as noise. These numbers were not rerun after splitting the bitmap abstraction into apache#23035.

Compared baselines: [apache#21927](apache#21927) -> [apache#23011](apache#23011)

Relevant scope: UInt8 narrow-integer rows.

Summary: 5 relevant rows, 5 faster, 0 slower, 0 within +/-5%.

| Benchmark | Before | After | Change |
|---|---:|---:|---:|
| `narrow_integer/u8/list=16/match=0%` | 20.39 us | 3.94 us | -80.7% (5.18x faster) |
| `narrow_integer/u8/list=16/match=50%` | 38.38 us | 3.98 us | -89.6% (9.65x faster) |
| `narrow_integer/u8/list=4/match=0%` | 18.18 us | 3.93 us | -78.4% (4.62x faster) |
| `narrow_integer/u8/list=4/match=50%` | 34.63 us | 3.96 us | -88.6% (8.75x faster) |
| `nulls/narrow_integer/u8/list=16/match=50%/nulls=20%` | 37.12 us | 4.16 us | -88.8% (8.93x faster) |

<!-- codex-benchmark-end -->

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
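As a rough illustration of the one-bit-per-value idea in the rationale above, here is a minimal Rust sketch of a 256-bit membership set for `u8` values. The type `U8Bitmap` and its methods are hypothetical names chosen for this example and are not the `UInt8BitmapFilter` from the PR. SQL null semantics and Arrow array types are deliberately left out, and null list entries are simply skipped.

```rust
/// Hypothetical 256-bit membership set for `u8` values.
/// Four 64-bit words give 256 bits, which is 32 bytes.
struct U8Bitmap {
    bits: [u64; 4],
}

impl U8Bitmap {
    /// Build the bitmap once from the constant list, skipping nulls.
    fn from_list(list: &[Option<u8>]) -> Self {
        let mut bits = [0u64; 4];
        for v in list.iter().flatten() {
            // High two bits pick the word, low six bits pick the bit.
            bits[(*v >> 6) as usize] |= 1u64 << (*v & 63);
        }
        Self { bits }
    }

    /// Membership is a single indexed bit test, with no hashing
    /// and no comparison against the list entries.
    fn contains(&self, v: u8) -> bool {
        (self.bits[(v >> 6) as usize] >> (v & 63)) & 1 == 1
    }
}
```

Building `U8Bitmap::from_list(&[Some(3), Some(7), None])` sets bits 3 and 7 only, so `contains(3)` is true and `contains(4)` is false. A real filter would still need to apply the `IN` / `NOT IN` null rules on top of this lookup.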
Which issue does this PR close?

- `IN LIST` optimization series in "Optimize `IN` performance with specialized implementations" #19390.

Rationale for this change

After #21649, non-primitive constant `IN LIST` evaluation still uses the extracted `ArrayStaticFilter` fallback path. That path relies on comparator checks for each input row. This PR replaces that fallback lookup with a precomputed hash table and shared result construction so generic constant-list evaluation is cheaper before the later specialized primitive and string optimizations from #19390.

What changes are included in this PR?

The PR is split so reviewers can separate mechanical cleanup from the behavior/performance changes:

- Refactor generic InList static filter helpers: Pure refactoring. This moves the existing generic static-filter construction and probe loop into helper methods inside `ArrayStaticFilter`, without changing the lookup data structure or result semantics.
- Build InList results from bitmaps: Changes how the generic path materializes `BooleanArray` results after membership has been computed. Instead of mixing membership checks and SQL three-valued null handling in the row loop, this builds a contains bitmap first and applies the null/negation rules with bitmap operations. This keeps the same `IN` / `NOT IN` semantics, including the `NULL` cases.
- Optimize generic InList static filtering: Replaces the fallback lookup storage from a unit-valued raw-entry `HashMap` to `hashbrown::HashTable<usize>`. The table still stores indices into the constant list and still uses Arrow hashing plus `make_comparator` for equality, but avoids the extra map value bookkeeping.

The existing specialized primitive filters and dictionary handling are intentionally left out of scope.
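The two behavioral changes described above (an index-based hash lookup, then result construction from a contains bitmap) can be sketched in plain Rust. This is an illustration under several assumptions, not the PR's code: `IndexFilter` is a hypothetical name, a standard-library `HashMap` of hash to index buckets stands in for `hashbrown::HashTable<usize>`, string equality stands in for `make_comparator`, and `Vec<bool>` / `Vec<Option<bool>>` stand in for Arrow bitmaps and `BooleanArray`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

fn hash_of(s: &str) -> u64 {
    let mut h = DefaultHasher::new();
    s.hash(&mut h);
    h.finish()
}

/// Hypothetical stand-in for a generic static filter. The lookup stores
/// only indices into `list`; equality is checked against the list itself.
struct IndexFilter<'a> {
    list: Vec<Option<&'a str>>,
    buckets: HashMap<u64, Vec<usize>>,
    has_null: bool,
}

impl<'a> IndexFilter<'a> {
    fn new(list: Vec<Option<&'a str>>) -> Self {
        let mut buckets: HashMap<u64, Vec<usize>> = HashMap::new();
        let mut has_null = false;
        for (i, item) in list.iter().enumerate() {
            match item {
                Some(s) => buckets.entry(hash_of(s)).or_default().push(i),
                None => has_null = true,
            }
        }
        Self { list, buckets, has_null }
    }

    fn contains(&self, needle: &str) -> bool {
        self.buckets
            .get(&hash_of(needle))
            .map_or(false, |idxs| idxs.iter().any(|&i| self.list[i] == Some(needle)))
    }

    /// `negated = true` evaluates `NOT IN`. `None` in the output is SQL NULL.
    fn evaluate(&self, input: &[Option<&str>], negated: bool) -> Vec<Option<bool>> {
        // Pass 1: membership only, no null logic.
        let contains: Vec<bool> = input
            .iter()
            .map(|v| v.map_or(false, |s| self.contains(s)))
            .collect();
        // Pass 2: apply null and negation rules to the contains bits.
        // A row is non-null if the input is non-null and either it matched
        // or the list has no NULL entry.
        input
            .iter()
            .zip(&contains)
            .map(|(v, &hit)| {
                let valid = v.is_some() && (hit || !self.has_null);
                valid.then_some(hit ^ negated)
            })
            .collect()
    }
}
```

With the list `('a', NULL)`, an input of `'z'` yields NULL for both `IN` and `NOT IN`, while `'a'` yields TRUE for `IN` and FALSE for `NOT IN`. Separating the two passes is what lets the second one be expressed as whole-bitmap operations in a columnar implementation.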
Are these changes tested?
Yes.
Are there any user-facing changes?
No. This is an internal performance optimization only.
Local benchmark snapshot
Benchmark command:
Method: compare adjacent saved baselines using raw Criterion sample minima (`min(time / iters)`). Lower is better; changes within +/-5% are treated as noise.

Compared baselines: merge-base -> #21927
Relevant scope: generic fallback string/view/binary rows.
Summary: 62 relevant rows, 61 faster, 0 slower, 1 within +/-5%.
Largest relevant deltas:
- `utf8view/short_8b/list=64/match=0%`
- `utf8view/short_8b/list=256/match=0%`
- `utf8view/short_8b/list=16/match=0%`
- `utf8view/short_8b/list=4/match=0%`
- `utf8view/len_12b/list=16/match=0%`
- `utf8view/len_12b/list=64/match=0%`
- `fixed_size_binary/fsb16/list=10000/match=0%`
- `fixed_size_binary/fsb16/list=256/match=0%`
- `utf8view/shared_prefix/pfx=8/list=16/match=0%`
- `utf8view/mixed_len/list=16/match=0%`
- `fixed_size_binary/fsb16/list=4/match=0%`
- `fixed_size_binary/fsb16/list=64/match=0%`
- `utf8view/mixed_len/list=64/match=0%`
- `utf8/short_8b/list=256/match=0%`
- `utf8view/shared_prefix/pfx=12/list=32/match=0%`

Full relevant table (62 rows)

- `fixed_size_binary/fsb16/list=10000/match=0%`
- `fixed_size_binary/fsb16/list=10000/match=50%`
- `fixed_size_binary/fsb16/list=256/match=0%`
- `fixed_size_binary/fsb16/list=256/match=50%`
- `fixed_size_binary/fsb16/list=4/match=0%`
- `fixed_size_binary/fsb16/list=4/match=50%`
- `fixed_size_binary/fsb16/list=64/match=0%`
- `fixed_size_binary/fsb16/list=64/match=50%`
- `nulls/utf8/long_24b/list=16/match=50%/nulls=20%`
- `nulls/utf8/short_8b/list=16/match=50%/nulls=20%`
- `nulls/utf8view/long_24b/list=16/match=50%/nulls=20%`
- `nulls/utf8view/short_8b/list=16/match=50%/nulls=20%`
- `nulls/utf8view/short_8b/list=16/match=50%/nulls=20%/NOT_IN`
- `nulls/utf8view/short_8b/list=16/match=50%/nulls=50%`
- `utf8/long_24b/list=256/match=0%`
- `utf8/long_24b/list=256/match=50%`
- `utf8/long_24b/list=4/match=0%`
- `utf8/long_24b/list=4/match=50%`
- `utf8/long_24b/list=64/match=0%`
- `utf8/long_24b/list=64/match=50%`
- `utf8/mixed_len/list=16/match=0%`
- `utf8/mixed_len/list=16/match=50%`
- `utf8/mixed_len/list=64/match=0%`
- `utf8/mixed_len/list=64/match=50%`
- `utf8/shared_prefix/pfx=12/list=32/match=50%`
- `utf8/short_8b/list=16/match=50%/NOT_IN`
- `utf8/short_8b/list=256/match=0%`
- `utf8/short_8b/list=256/match=50%`
- `utf8/short_8b/list=4/match=0%`
- `utf8/short_8b/list=4/match=50%`
- `utf8/short_8b/list=64/match=0%`
- `utf8/short_8b/list=64/match=50%`
- `utf8view/len_12b/list=16/match=0%`
- `utf8view/len_12b/list=16/match=50%`
- `utf8view/len_12b/list=64/match=0%`
- `utf8view/len_12b/list=64/match=50%`
- `utf8view/long_24b/list=16/match=0%`
- `utf8view/long_24b/list=16/match=50%`
- `utf8view/long_24b/list=256/match=0%`
- `utf8view/long_24b/list=256/match=50%`
- `utf8view/long_24b/list=4/match=0%`
- `utf8view/long_24b/list=4/match=50%`
- `utf8view/long_24b/list=64/match=0%`
- `utf8view/long_24b/list=64/match=50%`
- `utf8view/mixed_len/list=16/match=0%`
- `utf8view/mixed_len/list=16/match=50%`
- `utf8view/mixed_len/list=64/match=0%`
- `utf8view/mixed_len/list=64/match=50%`
- `utf8view/shared_prefix/pfx=12/list=32/match=0%`
- `utf8view/shared_prefix/pfx=12/list=32/match=50%`
- `utf8view/shared_prefix/pfx=16/list=64/match=0%`
- `utf8view/shared_prefix/pfx=16/list=64/match=50%`
- `utf8view/shared_prefix/pfx=8/list=16/match=0%`
- `utf8view/shared_prefix/pfx=8/list=16/match=50%`
- `utf8view/short_8b/list=16/match=0%`
- `utf8view/short_8b/list=16/match=50%`
- `utf8view/short_8b/list=256/match=0%`
- `utf8view/short_8b/list=256/match=50%`
- `utf8view/short_8b/list=4/match=0%`
- `utf8view/short_8b/list=4/match=50%`
- `utf8view/short_8b/list=64/match=0%`
- `utf8view/short_8b/list=64/match=50%`
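The comparison method stated in this snapshot (per-sample `time / iters`, take the minimum, treat changes within +/-5% as noise) could be computed along the following lines. The function names and the `Verdict` enum are hypothetical, the sample layout `(total_time, iters)` is an assumption about how the raw samples are read, and counting a change of exactly 5% as noise is also an assumption.

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    Faster,
    Slower,
    Noise,
}

/// Minimum per-iteration time over raw samples of (total_time, iters).
/// Returns `None` when there are no samples.
fn sample_min(samples: &[(f64, u64)]) -> Option<f64> {
    samples
        .iter()
        .map(|&(time, iters)| time / iters as f64)
        .reduce(f64::min)
}

/// Relative change in percent; negative means the new run is faster.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

/// Classify a row, treating changes within `noise_pct` as noise.
fn classify(before: f64, after: f64, noise_pct: f64) -> Verdict {
    let change = percent_change(before, after);
    if change.abs() <= noise_pct {
        Verdict::Noise
    } else if change < 0.0 {
        Verdict::Faster
    } else {
        Verdict::Slower
    }
}
```

For example, `percent_change(20.39, 3.94)` is roughly -80.7, which matches the first UInt8 row in the commit message quoted earlier, and `classify(20.39, 3.94, 5.0)` returns `Verdict::Faster`.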