
IN LIST: clean up generic static filtering #21927

Merged
adriangb merged 3 commits into apache:main from geoffreyclaude:perf/in_list_generic_static_filter on Jun 22, 2026


geoffreyclaude commented Apr 29, 2026

Contributor

Which issue does this PR close?

Rationale for this change

After #21649, non-primitive constant IN LIST evaluation still goes through the extracted ArrayStaticFilter fallback path, which relies on per-row comparator checks. This PR replaces that fallback lookup with a precomputed hash table and shared result construction, making generic constant-list evaluation cheaper ahead of the more specialized primitive and string optimizations from #19390.

What changes are included in this PR?

The PR is split so reviewers can separate mechanical cleanup from the behavior/performance changes:

  1. Refactor generic InList static filter helpers

    Pure refactoring. This moves the existing generic static-filter construction and probe loop into helper methods inside ArrayStaticFilter, without changing the lookup data structure or result semantics.

  2. Build InList results from bitmaps

    Changes how the generic path materializes BooleanArray results after membership has been computed. Instead of mixing membership checks and SQL three-valued null handling in the row loop, this builds a contains bitmap first and applies the null/negation rules with bitmap operations. This keeps the same IN / NOT IN semantics, including the NULL cases.

  3. Optimize generic InList static filtering

    Switches the fallback lookup storage from a unit-valued raw-entry HashMap to hashbrown::HashTable<usize>. The table still stores indices into the constant list and still uses Arrow hashing plus make_comparator for equality, but it avoids the extra map-value bookkeeping.
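A minimal std-only Rust sketch of the idea in step 3: the table stores only indices into the constant list, and equality is resolved by looking the value back up at that index. This is not the PR's code. It swaps in std's `DefaultHasher` for Arrow hashing, `==` on `&str` for `make_comparator`, and a hand-rolled linear-probing table for `hashbrown::HashTable<usize>`; `IndexSet` and `hash_of` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hashes a value with std's hasher (a stand-in for Arrow hashing).
fn hash_of<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical index-only hash set over a constant list: each occupied slot
/// holds an index into `list`, never a copy of the value itself.
struct IndexSet<'a> {
    list: &'a [&'a str],
    slots: Vec<Option<usize>>, // open addressing with linear probing
    mask: usize,               // capacity - 1 (capacity is a power of two)
}

impl<'a> IndexSet<'a> {
    fn new(list: &'a [&'a str]) -> Self {
        // At least 2x the list length, so probe loops always reach an empty slot.
        let capacity = (list.len() * 2).max(1).next_power_of_two();
        let mut set = IndexSet { list, slots: vec![None; capacity], mask: capacity - 1 };
        for (idx, &value) in list.iter().enumerate() {
            if set.contains(value) {
                continue; // duplicate constant: keep the first index only
            }
            let mut pos = (hash_of(value) as usize) & set.mask;
            while set.slots[pos].is_some() {
                pos = (pos + 1) & set.mask;
            }
            set.slots[pos] = Some(idx);
        }
        set
    }

    /// Probes with the row's hash, then checks equality against the list entry
    /// at each stored index.
    fn contains(&self, probe: &str) -> bool {
        let mut pos = (hash_of(probe) as usize) & self.mask;
        while let Some(idx) = self.slots[pos] {
            if self.list[idx] == probe {
                return true;
            }
            pos = (pos + 1) & self.mask;
        }
        false
    }
}

fn main() {
    let list = ["apple", "banana", "cherry", "apple"];
    let set = IndexSet::new(&list);
    let rows = ["banana", "durian", "apple"];
    let contains: Vec<bool> = rows.iter().map(|row| set.contains(row)).collect();
    println!("{contains:?}"); // prints "[true, false, true]"
}
```

Each slot is a single `usize` (plus the `Option` tag here), and duplicate list entries are inserted only once.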

The existing specialized primitive filters and dictionary handling are intentionally left out of scope.
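For step 2, here is a rough std-only sketch of applying SQL three-valued logic with word-level bit operations once a `contains` bitmap exists. Plain `Vec<u64>` words stand in for Arrow's boolean and null buffers, and `pack`, `get`, `finish_in_list`, and `to_options` are hypothetical helpers, not DataFusion APIs. A row is NULL if the input is NULL, or if it did not match and the list contains a NULL. Otherwise IN returns the membership bit and NOT IN returns its negation.

```rust
/// Packs row-level booleans into 64-bit words; row `i` is bit `i % 64` of word `i / 64`.
fn pack(bits: &[bool]) -> Vec<u64> {
    let mut words = vec![0u64; (bits.len() + 63) / 64];
    for (i, &b) in bits.iter().enumerate() {
        if b {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

/// Reads row `i` back out of a packed bitmap.
fn get(words: &[u64], i: usize) -> bool {
    (words[i / 64] >> (i % 64)) & 1 == 1
}

/// Applies IN / NOT IN null rules to a precomputed `contains` bitmap, one word at a time.
/// Returns (values, validity); a row is NULL where its validity bit is 0.
fn finish_in_list(
    contains: &[u64],
    input_valid: &[u64],
    list_has_null: bool,
    negated: bool,
) -> (Vec<u64>, Vec<u64>) {
    let mut values = Vec::with_capacity(contains.len());
    let mut validity = Vec::with_capacity(contains.len());
    for (&c, &v) in contains.iter().zip(input_valid) {
        // A non-match against a list containing NULL is unknown (NULL), not false.
        let valid = if list_has_null { v & c } else { v };
        // NOT IN flips the membership bit; IN keeps it.
        let value = if negated { !c } else { c };
        validity.push(valid);
        // Clear value bits under NULL slots and in padding past the last row.
        values.push(value & valid);
    }
    (values, validity)
}

/// Converts the two bitmaps back into SQL-style optional booleans for display.
fn to_options(values: &[u64], validity: &[u64], len: usize) -> Vec<Option<bool>> {
    (0..len)
        .map(|i| if get(validity, i) { Some(get(values, i)) } else { None })
        .collect()
}

fn main() {
    // Rows: a value found in the list, a value not found, and a NULL input.
    let contains = pack(&[true, false, false]);
    let input_valid = pack(&[true, true, false]);
    for (list_has_null, negated) in [(false, false), (false, true), (true, false), (true, true)] {
        let (values, validity) = finish_in_list(&contains, &input_valid, list_has_null, negated);
        println!(
            "list_has_null={list_has_null} negated={negated}: {:?}",
            to_options(&values, &validity, 3)
        );
    }
}
```

Keeping membership and null handling in separate passes is what lets the second pass run over whole words instead of branching per row.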

Are these changes tested?

Yes.

Are there any user-facing changes?

No. This is an internal performance optimization only.

Local benchmark snapshot

Benchmark command:

cargo bench -p datafusion-physical-expr --profile release-nonlto --bench in_list_strategy -- --save-baseline <name>

Method: compare adjacent saved baselines using raw Criterion sample minima (min(time / iters)). Lower is better; changes within +/-5% are treated as noise.
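The comparison arithmetic can be sketched in Rust as follows. The function names are hypothetical, samples are assumed to be `(total_time, iters)` pairs, and treating exactly +/-5% as noise is an assumption about the boundary. Applied to the `utf8view/short_8b/list=64/match=0%` row it reproduces the -52.0% (2.08x faster) figure.

```rust
/// Fastest per-iteration time across samples, i.e. min(time / iters).
/// Samples are assumed to be (total_time, iters) pairs.
fn sample_min(samples: &[(f64, u64)]) -> f64 {
    samples
        .iter()
        .map(|&(time, iters)| time / iters as f64)
        .fold(f64::INFINITY, f64::min)
}

/// Percent change from `before` to `after`; negative means faster.
fn change_pct(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

/// Speedup factor as printed in the tables, e.g. "2.08x faster".
fn speedup(before: f64, after: f64) -> f64 {
    before / after
}

#[derive(Debug, PartialEq)]
enum Verdict {
    Faster,
    Slower,
    Noise,
}

/// Changes within +/-5% (inclusive, by assumption) are treated as noise.
fn classify(before: f64, after: f64) -> Verdict {
    let pct = change_pct(before, after);
    if pct.abs() <= 5.0 {
        Verdict::Noise
    } else if pct < 0.0 {
        Verdict::Faster
    } else {
        Verdict::Slower
    }
}

fn main() {
    // utf8view/short_8b/list=64/match=0%, in microseconds.
    let (before, after) = (45.55, 21.85);
    println!(
        "{:.1}% ({:.2}x faster) -> {:?}",
        change_pct(before, after),
        speedup(before, after),
        classify(before, after)
    );
    // prints "-52.0% (2.08x faster) -> Faster"
}
```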

Compared baselines: merge-base -> #21927

Relevant scope: generic fallback string/view/binary rows.

Summary: 62 relevant rows, 61 faster, 0 slower, 1 within +/-5%.

Largest relevant deltas:

| Benchmark | Before | After | Change |
|---|---|---|---|
| utf8view/short_8b/list=64/match=0% | 45.55 us | 21.85 us | -52.0% (2.08x faster) |
| utf8view/short_8b/list=256/match=0% | 44.34 us | 21.71 us | -51.0% (2.04x faster) |
| utf8view/short_8b/list=16/match=0% | 44.03 us | 21.60 us | -50.9% (2.04x faster) |
| utf8view/short_8b/list=4/match=0% | 41.54 us | 20.52 us | -50.6% (2.02x faster) |
| utf8view/len_12b/list=16/match=0% | 41.43 us | 20.55 us | -50.4% (2.02x faster) |
| utf8view/len_12b/list=64/match=0% | 41.59 us | 21.00 us | -49.5% (1.98x faster) |
| fixed_size_binary/fsb16/list=10000/match=0% | 58.11 us | 29.36 us | -49.5% (1.98x faster) |
| fixed_size_binary/fsb16/list=256/match=0% | 55.49 us | 28.57 us | -48.5% (1.94x faster) |
| utf8view/shared_prefix/pfx=8/list=16/match=0% | 57.54 us | 32.07 us | -44.3% (1.79x faster) |
| utf8view/mixed_len/list=16/match=0% | 62.86 us | 35.25 us | -43.9% (1.78x faster) |
| fixed_size_binary/fsb16/list=4/match=0% | 47.62 us | 27.20 us | -42.9% (1.75x faster) |
| fixed_size_binary/fsb16/list=64/match=0% | 47.85 us | 27.45 us | -42.6% (1.74x faster) |
| utf8view/mixed_len/list=64/match=0% | 66.09 us | 38.00 us | -42.5% (1.74x faster) |
| utf8/short_8b/list=256/match=0% | 52.09 us | 30.49 us | -41.5% (1.71x faster) |
| utf8view/shared_prefix/pfx=12/list=32/match=0% | 70.61 us | 42.33 us | -40.1% (1.67x faster) |
Full relevant table (62 rows)
| Benchmark | Before | After | Change |
|---|---|---|---|
| fixed_size_binary/fsb16/list=10000/match=0% | 58.11 us | 29.36 us | -49.5% (1.98x faster) |
| fixed_size_binary/fsb16/list=10000/match=50% | 98.77 us | 81.20 us | -17.8% (1.22x faster) |
| fixed_size_binary/fsb16/list=256/match=0% | 55.49 us | 28.57 us | -48.5% (1.94x faster) |
| fixed_size_binary/fsb16/list=256/match=50% | 96.40 us | 79.32 us | -17.7% (1.22x faster) |
| fixed_size_binary/fsb16/list=4/match=0% | 47.62 us | 27.20 us | -42.9% (1.75x faster) |
| fixed_size_binary/fsb16/list=4/match=50% | 93.08 us | 75.58 us | -18.8% (1.23x faster) |
| fixed_size_binary/fsb16/list=64/match=0% | 47.85 us | 27.45 us | -42.6% (1.74x faster) |
| fixed_size_binary/fsb16/list=64/match=50% | 95.20 us | 74.96 us | -21.3% (1.27x faster) |
| nulls/utf8/long_24b/list=16/match=50%/nulls=20% | 85.74 us | 74.79 us | -12.8% (1.15x faster) |
| nulls/utf8/short_8b/list=16/match=50%/nulls=20% | 80.01 us | 77.30 us | -3.4% (within +/-5%) |
| nulls/utf8view/long_24b/list=16/match=50%/nulls=20% | 110.19 us | 96.52 us | -12.4% (1.14x faster) |
| nulls/utf8view/short_8b/list=16/match=50%/nulls=20% | 74.78 us | 62.92 us | -15.9% (1.19x faster) |
| nulls/utf8view/short_8b/list=16/match=50%/nulls=20%/NOT_IN | 71.24 us | 63.51 us | -10.9% (1.12x faster) |
| nulls/utf8view/short_8b/list=16/match=50%/nulls=50% | 83.84 us | 62.11 us | -25.9% (1.35x faster) |
| utf8/long_24b/list=256/match=0% | 58.79 us | 37.57 us | -36.1% (1.56x faster) |
| utf8/long_24b/list=256/match=50% | 107.85 us | 74.62 us | -30.8% (1.45x faster) |
| utf8/long_24b/list=4/match=0% | 56.68 us | 37.64 us | -33.6% (1.51x faster) |
| utf8/long_24b/list=4/match=50% | 100.40 us | 79.11 us | -21.2% (1.27x faster) |
| utf8/long_24b/list=64/match=0% | 59.39 us | 35.95 us | -39.5% (1.65x faster) |
| utf8/long_24b/list=64/match=50% | 101.26 us | 79.59 us | -21.4% (1.27x faster) |
| utf8/mixed_len/list=16/match=0% | 60.51 us | 49.06 us | -18.9% (1.23x faster) |
| utf8/mixed_len/list=16/match=50% | 154.00 us | 139.13 us | -9.7% (1.11x faster) |
| utf8/mixed_len/list=64/match=0% | 63.46 us | 49.87 us | -21.4% (1.27x faster) |
| utf8/mixed_len/list=64/match=50% | 154.01 us | 134.01 us | -13.0% (1.15x faster) |
| utf8/shared_prefix/pfx=12/list=32/match=50% | 98.73 us | 76.64 us | -22.4% (1.29x faster) |
| utf8/short_8b/list=16/match=50%/NOT_IN | 96.18 us | 72.15 us | -25.0% (1.33x faster) |
| utf8/short_8b/list=256/match=0% | 52.09 us | 30.49 us | -41.5% (1.71x faster) |
| utf8/short_8b/list=256/match=50% | 94.56 us | 74.39 us | -21.3% (1.27x faster) |
| utf8/short_8b/list=4/match=0% | 51.95 us | 32.27 us | -37.9% (1.61x faster) |
| utf8/short_8b/list=4/match=50% | 95.05 us | 78.47 us | -17.4% (1.21x faster) |
| utf8/short_8b/list=64/match=0% | 53.60 us | 33.34 us | -37.8% (1.61x faster) |
| utf8/short_8b/list=64/match=50% | 96.35 us | 80.95 us | -16.0% (1.19x faster) |
| utf8view/len_12b/list=16/match=0% | 41.43 us | 20.55 us | -50.4% (2.02x faster) |
| utf8view/len_12b/list=16/match=50% | 73.07 us | 50.49 us | -30.9% (1.45x faster) |
| utf8view/len_12b/list=64/match=0% | 41.59 us | 21.00 us | -49.5% (1.98x faster) |
| utf8view/len_12b/list=64/match=50% | 75.23 us | 50.25 us | -33.2% (1.50x faster) |
| utf8view/long_24b/list=16/match=0% | 58.48 us | 38.22 us | -34.7% (1.53x faster) |
| utf8view/long_24b/list=16/match=50% | 109.63 us | 87.32 us | -20.4% (1.26x faster) |
| utf8view/long_24b/list=256/match=0% | 61.12 us | 38.40 us | -37.2% (1.59x faster) |
| utf8view/long_24b/list=256/match=50% | 113.25 us | 91.61 us | -19.1% (1.24x faster) |
| utf8view/long_24b/list=4/match=0% | 58.43 us | 39.48 us | -32.4% (1.48x faster) |
| utf8view/long_24b/list=4/match=50% | 112.73 us | 90.14 us | -20.0% (1.25x faster) |
| utf8view/long_24b/list=64/match=0% | 62.17 us | 38.48 us | -38.1% (1.62x faster) |
| utf8view/long_24b/list=64/match=50% | 109.35 us | 87.64 us | -19.8% (1.25x faster) |
| utf8view/mixed_len/list=16/match=0% | 62.86 us | 35.25 us | -43.9% (1.78x faster) |
| utf8view/mixed_len/list=16/match=50% | 126.60 us | 103.97 us | -17.9% (1.22x faster) |
| utf8view/mixed_len/list=64/match=0% | 66.09 us | 38.00 us | -42.5% (1.74x faster) |
| utf8view/mixed_len/list=64/match=50% | 137.76 us | 112.23 us | -18.5% (1.23x faster) |
| utf8view/shared_prefix/pfx=12/list=32/match=0% | 70.61 us | 42.33 us | -40.1% (1.67x faster) |
| utf8view/shared_prefix/pfx=12/list=32/match=50% | 115.15 us | 94.27 us | -18.1% (1.22x faster) |
| utf8view/shared_prefix/pfx=16/list=64/match=0% | 63.47 us | 40.67 us | -35.9% (1.56x faster) |
| utf8view/shared_prefix/pfx=16/list=64/match=50% | 112.27 us | 91.32 us | -18.7% (1.23x faster) |
| utf8view/shared_prefix/pfx=8/list=16/match=0% | 57.54 us | 32.07 us | -44.3% (1.79x faster) |
| utf8view/shared_prefix/pfx=8/list=16/match=50% | 100.47 us | 82.69 us | -17.7% (1.21x faster) |
| utf8view/short_8b/list=16/match=0% | 44.03 us | 21.60 us | -50.9% (2.04x faster) |
| utf8view/short_8b/list=16/match=50% | 72.92 us | 49.10 us | -32.7% (1.49x faster) |
| utf8view/short_8b/list=256/match=0% | 44.34 us | 21.71 us | -51.0% (2.04x faster) |
| utf8view/short_8b/list=256/match=50% | 72.43 us | 51.58 us | -28.8% (1.40x faster) |
| utf8view/short_8b/list=4/match=0% | 41.54 us | 20.52 us | -50.6% (2.02x faster) |
| utf8view/short_8b/list=4/match=50% | 72.50 us | 48.46 us | -33.2% (1.50x faster) |
| utf8view/short_8b/list=64/match=0% | 45.55 us | 21.85 us | -52.0% (2.08x faster) |
| utf8view/short_8b/list=64/match=50% | 73.14 us | 50.92 us | -30.4% (1.44x faster) |

github-actions bot added the physical-expr label (Changes to the physical-expr crates) on Apr 29, 2026
@geoffreyclaude

Contributor Author

run benchmark in_list_strategy

@adriangbot


🤖 Criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4345337604-1911-hz6fq 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing perf/in_list_generic_static_filter (ba66c2f) to 3aefba7 (merge-base) diff
BENCH_NAME=in_list_strategy
BENCH_COMMAND=cargo bench --features=parquet --bench in_list_strategy
BENCH_FILTER=
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot


🤖 Criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                                                main                                   perf_in_list_generic_static_filter
-----                                                                ----                                   ----------------------------------
dictionary/i32/dict=10/list=16                                       1.00      7.6±0.01µs        ? ?/sec    1.00      7.6±0.01µs        ? ?/sec
dictionary/i32/dict=100/list=16                                      1.00      7.7±0.01µs        ? ?/sec    1.00      7.7±0.01µs        ? ?/sec
dictionary/i32/dict=100/list=16/NOT_IN                               1.00      7.7±0.02µs        ? ?/sec    1.00      7.7±0.01µs        ? ?/sec
dictionary/i32/dict=100/list=4                                       1.00      7.7±0.01µs        ? ?/sec    1.00      7.7±0.01µs        ? ?/sec
dictionary/i32/dict=100/list=64                                      1.00      7.8±0.01µs        ? ?/sec    1.00      7.7±0.01µs        ? ?/sec
dictionary/i32/dict=1000/list=16                                     1.00      9.0±0.01µs        ? ?/sec    1.00      9.0±0.02µs        ? ?/sec
dictionary/utf8_long/dict=100/list=16                                1.07      8.9±0.05µs        ? ?/sec    1.00      8.3±0.01µs        ? ?/sec
dictionary/utf8_short/dict=50/list=32                                1.06      8.6±0.05µs        ? ?/sec    1.00      8.1±0.01µs        ? ?/sec
dictionary/utf8_short/dict=50/list=8                                 1.06      8.5±0.21µs        ? ?/sec    1.00      8.0±0.01µs        ? ?/sec
dictionary/utf8_short/dict=500/list=20                               1.08     10.4±0.05µs        ? ?/sec    1.00      9.6±0.01µs        ? ?/sec
f32/large_list/list=64/match=0%                                      1.00     15.5±0.03µs        ? ?/sec    1.02     15.8±0.03µs        ? ?/sec
f32/large_list/list=64/match=50%                                     1.01     24.7±0.35µs        ? ?/sec    1.00     24.4±0.37µs        ? ?/sec
f32/small_list/list=32/match=0%                                      1.00     15.2±0.01µs        ? ?/sec    1.06     16.0±0.01µs        ? ?/sec
f32/small_list/list=32/match=50%                                     1.15     30.3±0.32µs        ? ?/sec    1.00     26.3±0.66µs        ? ?/sec
f32/small_list/list=4/match=0%                                       1.00     15.5±0.03µs        ? ?/sec    1.05     16.2±0.01µs        ? ?/sec
f32/small_list/list=4/match=50%                                      1.07     30.8±0.38µs        ? ?/sec    1.00     28.9±0.43µs        ? ?/sec
fixed_size_binary/fsb16/list=10000/match=0%                          1.25     36.7±0.10µs        ? ?/sec    1.00     29.3±0.06µs        ? ?/sec
fixed_size_binary/fsb16/list=10000/match=50%                         1.50     85.4±0.58µs        ? ?/sec    1.00     56.9±0.36µs        ? ?/sec
fixed_size_binary/fsb16/list=256/match=0%                            1.25     34.0±0.36µs        ? ?/sec    1.00     27.2±0.15µs        ? ?/sec
fixed_size_binary/fsb16/list=256/match=50%                           1.51     78.2±0.37µs        ? ?/sec    1.00     51.9±0.38µs        ? ?/sec
fixed_size_binary/fsb16/list=4/match=0%                              1.25     32.8±0.05µs        ? ?/sec    1.00     26.3±0.05µs        ? ?/sec
fixed_size_binary/fsb16/list=4/match=50%                             1.35     72.1±0.33µs        ? ?/sec    1.00     53.2±0.36µs        ? ?/sec
fixed_size_binary/fsb16/list=64/match=0%                             1.25     32.8±0.07µs        ? ?/sec    1.00     26.1±0.04µs        ? ?/sec
fixed_size_binary/fsb16/list=64/match=50%                            1.35     72.0±0.40µs        ? ?/sec    1.00     53.4±0.36µs        ? ?/sec
narrow_integer/i16/list=256/match=0%                                 1.03     12.2±0.02µs        ? ?/sec    1.00     11.8±0.01µs        ? ?/sec
narrow_integer/i16/list=256/match=50%                                1.01     18.5±0.18µs        ? ?/sec    1.00     18.3±0.11µs        ? ?/sec
narrow_integer/i16/list=4/match=0%                                   1.00     11.9±0.01µs        ? ?/sec    1.01     12.0±0.05µs        ? ?/sec
narrow_integer/i16/list=4/match=50%                                  1.00     22.0±0.58µs        ? ?/sec    1.00     22.1±0.26µs        ? ?/sec
narrow_integer/i16/list=64/match=0%                                  1.01     12.0±0.05µs        ? ?/sec    1.00     11.8±0.02µs        ? ?/sec
narrow_integer/i16/list=64/match=50%                                 1.36     24.6±0.93µs        ? ?/sec    1.00     18.1±0.13µs        ? ?/sec
narrow_integer/u8/list=16/match=0%                                   1.13     14.0±0.01µs        ? ?/sec    1.00     12.4±0.05µs        ? ?/sec
narrow_integer/u8/list=16/match=50%                                  1.33     35.8±0.30µs        ? ?/sec    1.00     27.0±0.28µs        ? ?/sec
narrow_integer/u8/list=4/match=0%                                    1.05     12.8±0.01µs        ? ?/sec    1.00     12.2±0.07µs        ? ?/sec
narrow_integer/u8/list=4/match=50%                                   1.42     36.6±0.18µs        ? ?/sec    1.00     25.8±0.26µs        ? ?/sec
nulls/narrow_integer/u8/list=16/match=50%/nulls=20%                  1.21     30.5±0.27µs        ? ?/sec    1.00     25.2±0.23µs        ? ?/sec
nulls/primitive/i32/large_list/list=64/match=50%/nulls=20%           1.00     19.4±0.23µs        ? ?/sec    1.07     20.9±0.28µs        ? ?/sec
nulls/primitive/i32/small_list/list=16/match=50%/nulls=20%           1.00     23.4±0.21µs        ? ?/sec    1.19     27.9±0.29µs        ? ?/sec
nulls/primitive/i32/small_list/list=16/match=50%/nulls=20%/NOT_IN    1.00     23.5±0.15µs        ? ?/sec    1.11     26.0±0.23µs        ? ?/sec
nulls/primitive/i32/small_list/list=16/match=50%/nulls=50%           1.00     14.5±0.25µs        ? ?/sec    1.16     16.9±0.06µs        ? ?/sec
nulls/utf8/long_24b/list=16/match=50%/nulls=20%                      1.05     75.1±0.15µs        ? ?/sec    1.00     71.2±0.29µs        ? ?/sec
nulls/utf8/short_8b/list=16/match=50%/nulls=20%                      1.05     63.5±0.19µs        ? ?/sec    1.00     60.3±0.43µs        ? ?/sec
nulls/utf8view/long_24b/list=16/match=50%/nulls=20%                  1.10     93.9±0.33µs        ? ?/sec    1.00     85.3±0.17µs        ? ?/sec
nulls/utf8view/short_8b/list=16/match=50%/nulls=20%                  1.21     47.1±0.34µs        ? ?/sec    1.00     38.8±0.18µs        ? ?/sec
nulls/utf8view/short_8b/list=16/match=50%/nulls=20%/NOT_IN           1.22     47.0±0.22µs        ? ?/sec    1.00     38.5±0.18µs        ? ?/sec
nulls/utf8view/short_8b/list=16/match=50%/nulls=50%                  1.74     50.2±0.26µs        ? ?/sec    1.00     28.9±0.14µs        ? ?/sec
primitive/i32/large_list/list=256/match=0%                           1.00     12.1±0.01µs        ? ?/sec    1.00     12.2±0.01µs        ? ?/sec
primitive/i32/large_list/list=256/match=50%                          1.00     16.8±0.15µs        ? ?/sec    1.44     24.3±0.49µs        ? ?/sec
primitive/i32/large_list/list=64/match=0%                            1.11     13.0±0.07µs        ? ?/sec    1.00     11.8±0.01µs        ? ?/sec
primitive/i32/large_list/list=64/match=50%                           1.00     22.6±0.17µs        ? ?/sec    1.45     32.6±0.17µs        ? ?/sec
primitive/i32/small_list/list=16/match=50%/NOT_IN                    1.00     20.8±0.18µs        ? ?/sec    1.33     27.8±0.29µs        ? ?/sec
primitive/i32/small_list/list=32/match=0%                            1.08     12.9±0.13µs        ? ?/sec    1.00     11.9±0.02µs        ? ?/sec
primitive/i32/small_list/list=32/match=50%                           1.00     16.8±0.20µs        ? ?/sec    1.92     32.2±0.19µs        ? ?/sec
primitive/i32/small_list/list=4/match=0%                             1.09     13.1±0.04µs        ? ?/sec    1.00     12.0±0.02µs        ? ?/sec
primitive/i32/small_list/list=4/match=50%                            1.00     24.0±0.38µs        ? ?/sec    1.38     33.1±0.18µs        ? ?/sec
primitive/i64/large_list/list=128/match=0%                           1.02     12.0±0.02µs        ? ?/sec    1.00     11.8±0.04µs        ? ?/sec
primitive/i64/large_list/list=128/match=50%                          1.13     21.9±0.38µs        ? ?/sec    1.00     19.3±0.17µs        ? ?/sec
primitive/i64/large_list/list=32/match=0%                            1.00     12.1±0.03µs        ? ?/sec    1.00     12.1±0.01µs        ? ?/sec
primitive/i64/large_list/list=32/match=50%                           1.00     19.1±0.12µs        ? ?/sec    1.13     21.5±0.35µs        ? ?/sec
primitive/i64/small_list/list=16/match=0%                            1.00     12.0±0.01µs        ? ?/sec    1.00     12.0±0.05µs        ? ?/sec
primitive/i64/small_list/list=16/match=50%                           1.04     24.9±0.32µs        ? ?/sec    1.00     23.9±0.30µs        ? ?/sec
primitive/i64/small_list/list=4/match=0%                             1.00     12.1±0.03µs        ? ?/sec    1.00     12.1±0.01µs        ? ?/sec
primitive/i64/small_list/list=4/match=50%                            1.03     23.8±0.18µs        ? ?/sec    1.00     23.2±0.17µs        ? ?/sec
timestamp_ns/large_list/list=32/match=0%                             1.37     24.6±0.02µs        ? ?/sec    1.00     17.9±0.12µs        ? ?/sec
timestamp_ns/large_list/list=32/match=50%                            1.98     58.8±0.47µs        ? ?/sec    1.00     29.7±0.13µs        ? ?/sec
timestamp_ns/small_list/list=16/match=0%                             1.38     24.8±0.03µs        ? ?/sec    1.00     18.0±0.07µs        ? ?/sec
timestamp_ns/small_list/list=16/match=50%                            2.10     62.1±0.53µs        ? ?/sec    1.00     29.6±0.20µs        ? ?/sec
timestamp_ns/small_list/list=4/match=0%                              1.40     25.0±0.10µs        ? ?/sec    1.00     17.8±0.07µs        ? ?/sec
timestamp_ns/small_list/list=4/match=50%                             1.84     56.6±0.53µs        ? ?/sec    1.00     30.8±0.13µs        ? ?/sec
utf8/long_24b/list=256/match=0%                                      1.18     41.2±0.04µs        ? ?/sec    1.00     34.8±0.21µs        ? ?/sec
utf8/long_24b/list=256/match=50%                                     1.29     89.8±0.37µs        ? ?/sec    1.00     69.4±0.45µs        ? ?/sec
utf8/long_24b/list=4/match=0%                                        1.19     41.1±0.05µs        ? ?/sec    1.00     34.5±0.06µs        ? ?/sec
utf8/long_24b/list=4/match=50%                                       1.24     87.7±0.44µs        ? ?/sec    1.00     70.6±0.41µs        ? ?/sec
utf8/long_24b/list=64/match=0%                                       1.22     41.9±0.96µs        ? ?/sec    1.00     34.4±0.05µs        ? ?/sec
utf8/long_24b/list=64/match=50%                                      1.33     91.7±0.41µs        ? ?/sec    1.00     68.9±0.37µs        ? ?/sec
utf8/mixed_len/list=16/match=0%                                      1.05     42.4±0.13µs        ? ?/sec    1.00     40.2±0.18µs        ? ?/sec
utf8/mixed_len/list=16/match=50%                                     1.18    115.5±0.59µs        ? ?/sec    1.00     98.1±0.48µs        ? ?/sec
utf8/mixed_len/list=64/match=0%                                      1.05     42.7±0.08µs        ? ?/sec    1.00     40.6±0.13µs        ? ?/sec
utf8/mixed_len/list=64/match=50%                                     1.26    123.4±0.55µs        ? ?/sec    1.00     97.7±2.18µs        ? ?/sec
utf8/shared_prefix/pfx=12/list=32/match=50%                          1.22     86.9±0.43µs        ? ?/sec    1.00     71.0±0.35µs        ? ?/sec
utf8/short_8b/list=16/match=50%/NOT_IN                               1.23     77.7±0.29µs        ? ?/sec    1.00     63.1±0.42µs        ? ?/sec
utf8/short_8b/list=256/match=0%                                      1.23     33.7±0.02µs        ? ?/sec    1.00     27.5±0.36µs        ? ?/sec
utf8/short_8b/list=256/match=50%                                     1.23     79.3±0.33µs        ? ?/sec    1.00     64.2±0.59µs        ? ?/sec
utf8/short_8b/list=4/match=0%                                        1.24     33.9±0.06µs        ? ?/sec    1.00     27.3±0.03µs        ? ?/sec
utf8/short_8b/list=4/match=50%                                       1.21     78.4±0.36µs        ? ?/sec    1.00     64.6±0.27µs        ? ?/sec
utf8/short_8b/list=64/match=0%                                       1.25     33.8±0.04µs        ? ?/sec    1.00     27.1±0.08µs        ? ?/sec
utf8/short_8b/list=64/match=50%                                      1.29     82.0±0.34µs        ? ?/sec    1.00     63.7±0.55µs        ? ?/sec
utf8view/len_12b/list=16/match=0%                                    1.39     26.0±0.04µs        ? ?/sec    1.00     18.7±0.03µs        ? ?/sec
utf8view/len_12b/list=16/match=50%                                   1.51     61.3±0.45µs        ? ?/sec    1.00     40.7±0.37µs        ? ?/sec
utf8view/len_12b/list=64/match=0%                                    1.39     26.2±0.38µs        ? ?/sec    1.00     18.8±0.11µs        ? ?/sec
utf8view/len_12b/list=64/match=50%                                   1.51     60.7±0.33µs        ? ?/sec    1.00     40.2±0.17µs        ? ?/sec
utf8view/long_24b/list=16/match=0%                                   1.17     47.9±0.08µs        ? ?/sec    1.00     41.1±0.12µs        ? ?/sec
utf8view/long_24b/list=16/match=50%                                  1.14     99.4±0.27µs        ? ?/sec    1.00     87.4±1.10µs        ? ?/sec
utf8view/long_24b/list=256/match=0%                                  1.18     47.7±0.06µs        ? ?/sec    1.00     40.5±0.05µs        ? ?/sec
utf8view/long_24b/list=256/match=50%                                 1.14     98.7±0.36µs        ? ?/sec    1.00     86.9±0.14µs        ? ?/sec
utf8view/long_24b/list=4/match=0%                                    1.18     47.9±0.06µs        ? ?/sec    1.00     40.7±0.09µs        ? ?/sec
utf8view/long_24b/list=4/match=50%                                   1.13     97.2±0.30µs        ? ?/sec    1.00     85.9±0.23µs        ? ?/sec
utf8view/long_24b/list=64/match=0%                                   1.17     47.4±0.14µs        ? ?/sec    1.00     40.5±0.05µs        ? ?/sec
utf8view/long_24b/list=64/match=50%                                  1.16     99.0±0.36µs        ? ?/sec    1.00     85.2±0.31µs        ? ?/sec
utf8view/mixed_len/list=16/match=0%                                  1.25     35.9±0.04µs        ? ?/sec    1.00     28.7±0.05µs        ? ?/sec
utf8view/mixed_len/list=16/match=50%                                 1.73    113.0±2.16µs        ? ?/sec    1.00     65.5±0.38µs        ? ?/sec
utf8view/mixed_len/list=64/match=0%                                  1.25     35.8±0.10µs        ? ?/sec    1.00     28.7±0.06µs        ? ?/sec
utf8view/mixed_len/list=64/match=50%                                 1.49    120.9±0.57µs        ? ?/sec    1.00     81.4±0.42µs        ? ?/sec
utf8view/shared_prefix/pfx=12/list=32/match=0%                       1.16     50.8±0.20µs        ? ?/sec    1.00     43.9±0.08µs        ? ?/sec
utf8view/shared_prefix/pfx=12/list=32/match=50%                      1.17    101.7±0.34µs        ? ?/sec    1.00     86.8±0.37µs        ? ?/sec
utf8view/shared_prefix/pfx=16/list=64/match=0%                       1.16     47.5±0.06µs        ? ?/sec    1.00     40.9±0.09µs        ? ?/sec
utf8view/shared_prefix/pfx=16/list=64/match=50%                      1.18    101.0±0.44µs        ? ?/sec    1.00     85.6±0.49µs        ? ?/sec
utf8view/shared_prefix/pfx=8/list=16/match=0%                        1.21     37.2±0.06µs        ? ?/sec    1.00     30.7±0.09µs        ? ?/sec
utf8view/shared_prefix/pfx=8/list=16/match=50%                       1.19     87.0±0.43µs        ? ?/sec    1.00     73.2±0.25µs        ? ?/sec
utf8view/short_8b/list=16/match=0%                                   1.39     25.3±0.03µs        ? ?/sec    1.00     18.2±0.04µs        ? ?/sec
utf8view/short_8b/list=16/match=50%                                  1.57     62.3±0.46µs        ? ?/sec    1.00     39.7±0.22µs        ? ?/sec
utf8view/short_8b/list=256/match=0%                                  1.35     25.1±0.04µs        ? ?/sec    1.00     18.6±0.43µs        ? ?/sec
utf8view/short_8b/list=256/match=50%                                 1.62     63.2±0.45µs        ? ?/sec    1.00     39.1±0.41µs        ? ?/sec
utf8view/short_8b/list=4/match=0%                                    1.38     25.8±0.10µs        ? ?/sec    1.00     18.7±0.07µs        ? ?/sec
utf8view/short_8b/list=4/match=50%                                   1.50     59.9±0.45µs        ? ?/sec    1.00     40.0±0.19µs        ? ?/sec
utf8view/short_8b/list=64/match=0%                                   1.37     25.4±0.04µs        ? ?/sec    1.00     18.4±0.04µs        ? ?/sec
utf8view/short_8b/list=64/match=50%                                  1.61     62.9±0.41µs        ? ?/sec    1.00     39.0±0.35µs        ? ?/sec
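Reading the bot's table: the leading factor in each column appears to be critcmp-style, i.e. that side's mean time divided by the faster side's mean, so 1.00 marks the faster build and a factor above 1.00 in the branch column is a regression. That interpretation is an assumption, and `relative` below is a hypothetical helper. Because the printed means are rounded, recomputing from them can be off by 0.01 (for `f32/small_list/list=32/match=0%`, 16.0 / 15.2 gives 1.05 while the table shows 1.06).

```rust
/// Normalizes two mean times so that the faster one reads 1.00.
fn relative(main: f64, branch: f64) -> (f64, f64) {
    let fastest = main.min(branch);
    (main / fastest, branch / fastest)
}

fn main() {
    // primitive/i32/small_list/list=4/match=50%: main 24.0 µs, branch 33.1 µs.
    let (m, b) = relative(24.0, 33.1);
    println!("main {m:.2}, branch {b:.2}"); // prints "main 1.00, branch 1.38"
}
```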

Resource Usage

base (merge-base)

Metric Value
Wall time 1090.2s
Peak memory 4.5 GiB
Avg memory 4.4 GiB
CPU user 1387.3s
CPU sys 2.0s
Peak spill 0 B

branch

Metric Value
Wall time 1110.2s
Peak memory 4.5 GiB
Avg memory 4.5 GiB
CPU user 1411.4s
CPU sys 1.3s
Peak spill 0 B


@adriangb

Contributor

primitive/i32/small_list/list=4/match=50% 1.00 24.0±0.38µs ? ?/sec 1.38 33.1±0.18µs ? ?/sec

There are some regressions in the benchmarks; I'll run them again.

@adriangb

Contributor

run benchmark in_list_strategy

@adriangbot


🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4740141036-588-82s8q 6.12.68+ #1 SMP Sat May 2 07:49:07 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing perf/in_list_generic_static_filter (a84579d) to c7e9284 (merge-base) diff using: in_list_strategy
Results will be posted here when complete


@adriangbot


🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

group                                                                HEAD                                   perf_in_list_generic_static_filter
-----                                                                ----                                   ----------------------------------
dictionary/i32/dict=10/list=16                                       1.00      7.6±0.02µs        ? ?/sec    1.00      7.6±0.00µs        ? ?/sec
dictionary/i32/dict=100/list=16                                      1.00      7.7±0.01µs        ? ?/sec    1.00      7.8±0.03µs        ? ?/sec
dictionary/i32/dict=100/list=16/NOT_IN                               1.00      7.7±0.01µs        ? ?/sec    1.00      7.7±0.01µs        ? ?/sec
dictionary/i32/dict=100/list=4                                       1.00      7.7±0.01µs        ? ?/sec    1.00      7.7±0.01µs        ? ?/sec
dictionary/i32/dict=100/list=64                                      1.00      7.8±0.01µs        ? ?/sec    1.00      7.8±0.02µs        ? ?/sec
dictionary/i32/dict=1000/list=16                                     1.00      9.0±0.02µs        ? ?/sec    1.00      9.0±0.01µs        ? ?/sec
dictionary/utf8_long/dict=100/list=16                                1.06      8.8±0.04µs        ? ?/sec    1.00      8.3±0.01µs        ? ?/sec
dictionary/utf8_short/dict=50/list=32                                1.07      8.7±0.04µs        ? ?/sec    1.00      8.1±0.01µs        ? ?/sec
dictionary/utf8_short/dict=50/list=8                                 1.06      8.5±0.06µs        ? ?/sec    1.00      8.0±0.01µs        ? ?/sec
dictionary/utf8_short/dict=500/list=20                               1.08     10.4±0.04µs        ? ?/sec    1.00      9.6±0.01µs        ? ?/sec
f32/large_list/list=64/match=0%                                      1.02     15.7±0.02µs        ? ?/sec    1.00     15.3±0.02µs        ? ?/sec
f32/large_list/list=64/match=50%                                     1.25     24.3±0.29µs        ? ?/sec    1.00     19.5±0.20µs        ? ?/sec
f32/small_list/list=32/match=0%                                      1.04     16.0±0.02µs        ? ?/sec    1.00     15.4±0.01µs        ? ?/sec
f32/small_list/list=32/match=50%                                     1.00     21.8±0.23µs        ? ?/sec    1.35     29.4±1.40µs        ? ?/sec
f32/small_list/list=4/match=0%                                       1.03     16.0±0.02µs        ? ?/sec    1.00     15.5±0.03µs        ? ?/sec
f32/small_list/list=4/match=50%                                      1.01     26.4±0.40µs        ? ?/sec    1.00     26.2±0.25µs        ? ?/sec
fixed_size_binary/fsb16/list=10000/match=0%                          1.25     33.4±0.06µs        ? ?/sec    1.00     26.7±0.32µs        ? ?/sec
fixed_size_binary/fsb16/list=10000/match=50%                         1.47     83.6±0.53µs        ? ?/sec    1.00     56.8±0.29µs        ? ?/sec
fixed_size_binary/fsb16/list=256/match=0%                            1.29     31.3±0.05µs        ? ?/sec    1.00     24.4±0.32µs        ? ?/sec
fixed_size_binary/fsb16/list=256/match=50%                           1.49     77.7±0.44µs        ? ?/sec    1.00     52.2±0.92µs        ? ?/sec
fixed_size_binary/fsb16/list=4/match=0%                              1.32     31.0±0.03µs        ? ?/sec    1.00     23.5±0.04µs        ? ?/sec
fixed_size_binary/fsb16/list=4/match=50%                             1.37     74.2±0.37µs        ? ?/sec    1.00     54.3±0.56µs        ? ?/sec
fixed_size_binary/fsb16/list=64/match=0%                             1.31     30.9±0.06µs        ? ?/sec    1.00     23.5±0.11µs        ? ?/sec
fixed_size_binary/fsb16/list=64/match=50%                            1.38     74.6±0.57µs        ? ?/sec    1.00     54.1±0.63µs        ? ?/sec
narrow_integer/i16/list=256/match=0%                                 1.09     13.6±0.13µs        ? ?/sec    1.00     12.4±0.01µs        ? ?/sec
narrow_integer/i16/list=256/match=50%                                1.18     22.1±0.32µs        ? ?/sec    1.00     18.6±0.18µs        ? ?/sec
narrow_integer/i16/list=4/match=0%                                   1.09     13.0±0.01µs        ? ?/sec    1.00     11.9±0.01µs        ? ?/sec
narrow_integer/i16/list=4/match=50%                                  1.00     22.5±0.28µs        ? ?/sec    1.05     23.6±0.59µs        ? ?/sec
narrow_integer/i16/list=64/match=0%                                  1.07     12.7±0.05µs        ? ?/sec    1.00     11.9±0.01µs        ? ?/sec
narrow_integer/i16/list=64/match=50%                                 1.00     18.9±0.13µs        ? ?/sec    1.21     22.8±0.39µs        ? ?/sec
narrow_integer/u8/list=16/match=0%                                   1.00     12.4±0.04µs        ? ?/sec    1.04     12.9±0.20µs        ? ?/sec
narrow_integer/u8/list=16/match=50%                                  1.00     19.1±0.24µs        ? ?/sec    1.42     27.1±0.22µs        ? ?/sec
narrow_integer/u8/list=4/match=0%                                    1.00     12.1±0.03µs        ? ?/sec    1.01     12.3±0.02µs        ? ?/sec
narrow_integer/u8/list=4/match=50%                                   1.00     18.3±0.13µs        ? ?/sec    1.34     24.5±0.32µs        ? ?/sec
nulls/narrow_integer/u8/list=16/match=50%/nulls=20%                  1.00     21.9±0.34µs        ? ?/sec    1.21     26.5±0.24µs        ? ?/sec
nulls/primitive/i32/large_list/list=64/match=50%/nulls=20%           1.00     16.5±0.16µs        ? ?/sec    1.00     16.5±0.17µs        ? ?/sec
nulls/primitive/i32/small_list/list=16/match=50%/nulls=20%           1.00     22.8±0.21µs        ? ?/sec    1.00     22.8±0.22µs        ? ?/sec
nulls/primitive/i32/small_list/list=16/match=50%/nulls=20%/NOT_IN    1.08     23.3±0.21µs        ? ?/sec    1.00     21.5±0.07µs        ? ?/sec
nulls/primitive/i32/small_list/list=16/match=50%/nulls=50%           1.07     17.7±0.04µs        ? ?/sec    1.00     16.6±0.06µs        ? ?/sec
nulls/utf8/long_24b/list=16/match=50%/nulls=20%                      1.14     82.1±0.31µs        ? ?/sec    1.00     71.9±0.28µs        ? ?/sec
nulls/utf8/short_8b/list=16/match=50%/nulls=20%                      1.14     71.5±0.37µs        ? ?/sec    1.00     62.9±0.63µs        ? ?/sec
nulls/utf8view/long_24b/list=16/match=50%/nulls=20%                  1.16     97.6±0.46µs        ? ?/sec    1.00     84.2±0.22µs        ? ?/sec
nulls/utf8view/short_8b/list=16/match=50%/nulls=20%                  1.24     58.4±0.28µs        ? ?/sec    1.00     47.2±0.55µs        ? ?/sec
nulls/utf8view/short_8b/list=16/match=50%/nulls=20%/NOT_IN           1.24     58.5±0.41µs        ? ?/sec    1.00     47.3±0.39µs        ? ?/sec
nulls/utf8view/short_8b/list=16/match=50%/nulls=50%                  1.36     50.9±0.28µs        ? ?/sec    1.00     37.3±0.18µs        ? ?/sec
primitive/i32/large_list/list=256/match=0%                           1.11     13.3±0.03µs        ? ?/sec    1.00     12.0±0.01µs        ? ?/sec
primitive/i32/large_list/list=256/match=50%                          1.00     18.2±0.12µs        ? ?/sec    1.10     20.0±0.12µs        ? ?/sec
primitive/i32/large_list/list=64/match=0%                            1.01     12.1±0.02µs        ? ?/sec    1.00     11.9±0.01µs        ? ?/sec
primitive/i32/large_list/list=64/match=50%                           1.04     19.1±0.11µs        ? ?/sec    1.00     18.4±0.13µs        ? ?/sec
primitive/i32/small_list/list=16/match=50%/NOT_IN                    1.09     24.1±0.38µs        ? ?/sec    1.00     22.2±0.31µs        ? ?/sec
primitive/i32/small_list/list=32/match=0%                            1.00     12.0±0.01µs        ? ?/sec    1.00     12.0±0.01µs        ? ?/sec
primitive/i32/small_list/list=32/match=50%                           1.00     22.2±0.28µs        ? ?/sec    1.04     23.1±0.28µs        ? ?/sec
primitive/i32/small_list/list=4/match=0%                             1.00     12.0±0.02µs        ? ?/sec    1.01     12.2±0.05µs        ? ?/sec
primitive/i32/small_list/list=4/match=50%                            1.00     22.8±0.40µs        ? ?/sec    1.00     22.8±0.25µs        ? ?/sec
primitive/i64/large_list/list=128/match=0%                           1.02     12.7±0.10µs        ? ?/sec    1.00     12.4±0.04µs        ? ?/sec
primitive/i64/large_list/list=128/match=50%                          1.00     18.8±0.14µs        ? ?/sec    1.03     19.3±0.25µs        ? ?/sec
primitive/i64/large_list/list=32/match=0%                            1.03     12.8±0.01µs        ? ?/sec    1.00     12.5±0.03µs        ? ?/sec
primitive/i64/large_list/list=32/match=50%                           1.26     25.2±0.47µs        ? ?/sec    1.00     20.0±0.17µs        ? ?/sec
primitive/i64/small_list/list=16/match=0%                            1.00     12.1±0.02µs        ? ?/sec    1.08     13.0±0.04µs        ? ?/sec
primitive/i64/small_list/list=16/match=50%                           1.00     17.2±0.20µs        ? ?/sec    1.39     23.9±0.39µs        ? ?/sec
primitive/i64/small_list/list=4/match=0%                             1.02     13.0±0.13µs        ? ?/sec    1.00     12.7±0.02µs        ? ?/sec
primitive/i64/small_list/list=4/match=50%                            1.00     22.3±0.40µs        ? ?/sec    1.05     23.3±0.46µs        ? ?/sec
timestamp_ns/large_list/list=32/match=0%                             1.42     24.7±0.03µs        ? ?/sec    1.00     17.5±0.02µs        ? ?/sec
timestamp_ns/large_list/list=32/match=50%                            1.79     59.8±0.49µs        ? ?/sec    1.00     33.4±0.17µs        ? ?/sec
timestamp_ns/small_list/list=16/match=0%                             1.42     24.9±0.03µs        ? ?/sec    1.00     17.6±0.02µs        ? ?/sec
timestamp_ns/small_list/list=16/match=50%                            1.64     59.2±0.54µs        ? ?/sec    1.00     36.1±0.29µs        ? ?/sec
timestamp_ns/small_list/list=4/match=0%                              1.43     25.2±0.03µs        ? ?/sec    1.00     17.5±0.02µs        ? ?/sec
timestamp_ns/small_list/list=4/match=50%                             1.87     59.5±0.61µs        ? ?/sec    1.00     31.9±0.36µs        ? ?/sec
utf8/long_24b/list=256/match=0%                                      1.21     41.2±0.06µs        ? ?/sec    1.00     34.0±0.04µs        ? ?/sec
utf8/long_24b/list=256/match=50%                                     1.33     94.5±0.41µs        ? ?/sec    1.00     71.2±0.20µs        ? ?/sec
utf8/long_24b/list=4/match=0%                                        1.22     41.2±0.06µs        ? ?/sec    1.00     33.9±0.03µs        ? ?/sec
utf8/long_24b/list=4/match=50%                                       1.27     93.7±0.48µs        ? ?/sec    1.00     73.8±0.42µs        ? ?/sec
utf8/long_24b/list=64/match=0%                                       1.21     41.1±0.04µs        ? ?/sec    1.00     34.1±0.07µs        ? ?/sec
utf8/long_24b/list=64/match=50%                                      1.31     93.6±0.37µs        ? ?/sec    1.00     71.5±0.40µs        ? ?/sec
utf8/mixed_len/list=16/match=0%                                      1.15     42.4±0.74µs        ? ?/sec    1.00     37.0±0.11µs        ? ?/sec
utf8/mixed_len/list=16/match=50%                                     1.14    121.3±0.73µs        ? ?/sec    1.00    106.7±0.80µs        ? ?/sec
utf8/mixed_len/list=64/match=0%                                      1.13     42.9±0.14µs        ? ?/sec    1.00     38.1±0.16µs        ? ?/sec
utf8/mixed_len/list=64/match=50%                                     1.07    127.0±1.44µs        ? ?/sec    1.00    118.4±0.72µs        ? ?/sec
utf8/shared_prefix/pfx=12/list=32/match=50%                          1.31     93.5±0.30µs        ? ?/sec    1.00     71.5±0.42µs        ? ?/sec
utf8/short_8b/list=16/match=50%/NOT_IN                               1.27     83.9±0.33µs        ? ?/sec    1.00     65.8±0.61µs        ? ?/sec
utf8/short_8b/list=256/match=0%                                      1.28     34.1±0.10µs        ? ?/sec    1.00     26.7±0.05µs        ? ?/sec
utf8/short_8b/list=256/match=50%                                     1.30     85.3±0.29µs        ? ?/sec    1.00     65.5±0.51µs        ? ?/sec
utf8/short_8b/list=4/match=0%                                        1.28     34.1±0.04µs        ? ?/sec    1.00     26.7±0.03µs        ? ?/sec
utf8/short_8b/list=4/match=50%                                       1.27     83.3±0.41µs        ? ?/sec    1.00     65.8±0.44µs        ? ?/sec
utf8/short_8b/list=64/match=0%                                       1.26     33.6±0.03µs        ? ?/sec    1.00     26.6±0.06µs        ? ?/sec
utf8/short_8b/list=64/match=50%                                      1.28     85.3±0.53µs        ? ?/sec    1.00     66.8±0.58µs        ? ?/sec
utf8view/len_12b/list=16/match=0%                                    1.39     25.7±0.03µs        ? ?/sec    1.00     18.5±0.03µs        ? ?/sec
utf8view/len_12b/list=16/match=50%                                   1.64     65.9±0.38µs        ? ?/sec    1.00     40.3±0.19µs        ? ?/sec
utf8view/len_12b/list=64/match=0%                                    1.40     25.9±0.02µs        ? ?/sec    1.00     18.4±0.04µs        ? ?/sec
utf8view/len_12b/list=64/match=50%                                   1.62     65.9±0.42µs        ? ?/sec    1.00     40.6±0.29µs        ? ?/sec
utf8view/long_24b/list=16/match=0%                                   1.18     47.8±0.07µs        ? ?/sec    1.00     40.6±0.06µs        ? ?/sec
utf8view/long_24b/list=16/match=50%                                  1.24    106.5±0.29µs        ? ?/sec    1.00     86.0±0.30µs        ? ?/sec
utf8view/long_24b/list=256/match=0%                                  1.18     47.6±0.08µs        ? ?/sec    1.00     40.3±0.10µs        ? ?/sec
utf8view/long_24b/list=256/match=50%                                 1.24    107.1±0.28µs        ? ?/sec    1.00     86.4±0.26µs        ? ?/sec
utf8view/long_24b/list=4/match=0%                                    1.18     47.8±0.04µs        ? ?/sec    1.00     40.5±0.06µs        ? ?/sec
utf8view/long_24b/list=4/match=50%                                   1.24    106.4±0.36µs        ? ?/sec    1.00     85.8±0.43µs        ? ?/sec
utf8view/long_24b/list=64/match=0%                                   1.18     47.4±0.07µs        ? ?/sec    1.00     40.2±0.05µs        ? ?/sec
utf8view/long_24b/list=64/match=50%                                  1.24    106.5±0.34µs        ? ?/sec    1.00     85.8±0.41µs        ? ?/sec
utf8view/mixed_len/list=16/match=0%                                  1.26     36.0±0.27µs        ? ?/sec    1.00     28.5±0.07µs        ? ?/sec
utf8view/mixed_len/list=16/match=50%                                 1.55    107.5±0.54µs        ? ?/sec    1.00     69.2±0.45µs        ? ?/sec
utf8view/mixed_len/list=64/match=0%                                  1.25     35.7±0.08µs        ? ?/sec    1.00     28.6±0.19µs        ? ?/sec
utf8view/mixed_len/list=64/match=50%                                 1.41    117.5±0.91µs        ? ?/sec    1.00     83.2±0.54µs        ? ?/sec
utf8view/shared_prefix/pfx=12/list=32/match=0%                       1.18     50.0±0.08µs        ? ?/sec    1.00     42.4±0.05µs        ? ?/sec
utf8view/shared_prefix/pfx=12/list=32/match=50%                      1.24    108.0±0.34µs        ? ?/sec    1.00     87.2±0.38µs        ? ?/sec
utf8view/shared_prefix/pfx=16/list=64/match=0%                       1.16     47.3±0.04µs        ? ?/sec    1.00     40.6±0.10µs        ? ?/sec
utf8view/shared_prefix/pfx=16/list=64/match=50%                      1.28    108.0±0.32µs        ? ?/sec    1.00     84.7±0.32µs        ? ?/sec
utf8view/shared_prefix/pfx=8/list=16/match=0%                        1.23     37.2±0.06µs        ? ?/sec    1.00     30.3±0.06µs        ? ?/sec
utf8view/shared_prefix/pfx=8/list=16/match=50%                       1.29     93.8±0.29µs        ? ?/sec    1.00     72.8±0.38µs        ? ?/sec
utf8view/short_8b/list=16/match=0%                                   1.40     25.3±0.15µs        ? ?/sec    1.00     18.1±0.09µs        ? ?/sec
utf8view/short_8b/list=16/match=50%                                  1.69     66.9±0.38µs        ? ?/sec    1.00     39.6±0.33µs        ? ?/sec
utf8view/short_8b/list=256/match=0%                                  1.39     25.1±0.04µs        ? ?/sec    1.00     18.0±0.03µs        ? ?/sec
utf8view/short_8b/list=256/match=50%                                 1.77     67.5±0.49µs        ? ?/sec    1.00     38.2±0.27µs        ? ?/sec
utf8view/short_8b/list=4/match=0%                                    1.40     25.7±0.03µs        ? ?/sec    1.00     18.3±0.03µs        ? ?/sec
utf8view/short_8b/list=4/match=50%                                   1.61     64.4±0.39µs        ? ?/sec    1.00     39.9±0.26µs        ? ?/sec
utf8view/short_8b/list=64/match=0%                                   1.41     25.4±0.43µs        ? ?/sec    1.00     18.0±0.03µs        ? ?/sec
utf8view/short_8b/list=64/match=50%                                  1.74     67.1±0.39µs        ? ?/sec    1.00     38.7±0.27µs        ? ?/sec

Resource Usage

in_list_strategy — base (merge-base)

Metric        Value
Wall time     1260.3s
Peak memory   41.7 MiB
Avg memory    29.5 MiB
CPU user      1377.7s
CPU sys       1.2s
Peak spill    0 B

in_list_strategy — branch

Metric        Value
Wall time     1285.3s
Peak memory   45.1 MiB
Avg memory    32.9 MiB
CPU user      1409.4s
CPU sys       1.1s
Peak spill    0 B

@geoffreyclaude geoffreyclaude changed the title from "Optimize generic InList static filtering" to "IN LIST: clean up generic static filtering" Jun 18, 2026

@adriangb adriangb left a comment

Contributor

Seems like it was all noise. LGTM!

@geoffreyclaude

Contributor Author

> Seems like it was all noise. LGTM!

@adriangb Thanks for the stamp! As you can see, I've picked up this epic again and am currently working on a clean PR stack. Draft PRs are up and linked to the main issue, but they still need some work and tuning before they're ready for review.

@geoffreyclaude

geoffreyclaude commented Jun 22, 2026

Contributor Author

@adriangb Do you want to merge this one? The rest of the stack should be mostly ready for review now; I just need to rebase on main after each merge to get a clean diff...

@adriangb adriangb added this pull request to the merge queue Jun 22, 2026
@adriangb

Contributor

merging!

Merged via the queue into apache:main with commit d2d9b12 Jun 22, 2026
42 checks passed
alamb added a commit to alamb/datafusion that referenced this pull request Jun 24, 2026
## Which issue does this PR close?

- Part of apache#19241.
- Stacked on apache#21927.
- Next in stack: apache#23012.
- Extracted from apache#19390.

## Rationale for this change

`IN LIST` evaluates expressions like `x IN (1, 3, 7)`. The list on the
right is fixed, so DataFusion can precompute a small lookup structure
once and then reuse it for every input row.
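A minimal sketch of that build-once, probe-many pattern, assuming a plain `i64` column and `std::collections::HashSet`. `ConstantListSet` is a hypothetical name, not a DataFusion type, and SQL NULL handling is deliberately left out.

```rust
use std::collections::HashSet;

/// Hypothetical precomputed lookup for a constant IN list (not DataFusion's type).
struct ConstantListSet {
    values: HashSet<i64>,
}

impl ConstantListSet {
    /// Built once from the constant list on the right-hand side of `IN`.
    fn new(list: &[i64]) -> Self {
        Self {
            values: list.iter().copied().collect(),
        }
    }

    /// Reused for every input row: one hash lookup per row.
    fn probe(&self, column: &[i64]) -> Vec<bool> {
        column.iter().map(|v| self.values.contains(v)).collect()
    }
}

fn main() {
    // x IN (1, 3, 7)
    let set = ConstantListSet::new(&[1, 3, 7]);
    println!("{:?}", set.probe(&[0, 1, 3, 5, 7])); // [false, true, true, false, true]
}
```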

For `UInt8`, there are only 256 possible values: 0 through 255. That
means the lookup can be a tiny checklist with one bit per possible
value:

- If the list contains `3`, set bit `3`.
- If the list contains `7`, set bit `7`.
- To check whether an input value is present, read that one bit.

So instead of hashing each input value or comparing it against the list,
membership becomes one indexed bit test. The bitmap is only 32 bytes,
because 256 bits = 32 bytes.
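The checklist can be sketched in Rust as four `u64` words (4 × 64 = 256 bits = 32 bytes). This is an illustration under that layout assumption; `U8Bitmap` is a hypothetical name and not necessarily how the PR's `UInt8BitmapFilter` stores its bits.

```rust
/// Hypothetical 256-bit membership bitmap for `u8` values:
/// 4 words x 64 bits = 256 bits = 32 bytes.
struct U8Bitmap {
    bits: [u64; 4],
}

impl U8Bitmap {
    /// Set one bit per constant in the list.
    fn new(list: &[u8]) -> Self {
        let mut bits = [0u64; 4];
        for &v in list {
            // Word index is v / 64, bit index within the word is v % 64.
            bits[(v >> 6) as usize] |= 1u64 << (v & 63);
        }
        Self { bits }
    }

    /// Membership is a single indexed bit test: no hashing, no comparisons.
    #[inline]
    fn contains(&self, v: u8) -> bool {
        ((self.bits[(v >> 6) as usize] >> (v & 63)) & 1) == 1
    }
}

fn main() {
    let filter = U8Bitmap::new(&[3, 7, 200]);
    let hits: Vec<bool> = [0u8, 3, 7, 8, 200, 255]
        .iter()
        .map(|&v| filter.contains(v))
        .collect();
    println!("{:?}", hits); // [false, true, true, false, true, false]
}
```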

This PR adds the first specialized primitive path in the stack as a
concrete `UInt8` filter. The `UInt16` version is added in apache#23012, and
the shared bitmap abstraction is introduced only after both concrete
implementations are visible in apache#23035.

## What changes are included in this PR?

- Adds `UInt8BitmapFilter`, a 32-byte bitmap built from the non-null
constants in the `IN` list.
- Routes `UInt8` constant-list filtering to that bitmap path.
- Keeps the same SQL null behavior as the generic path for both `IN` and
`NOT IN`.
- Moves shared dictionary-needle handling into `static_filter.rs`, so
specialized filters can reuse it consistently.
- Adds focused tests for `UInt8` null handling and dictionary-encoded
needles.
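The null rule these paths must preserve is standard SQL three-valued logic. Below is a row-at-a-time sketch of it, with `Option<bool>` standing in for a nullable boolean. `sql_in` is a hypothetical helper that scans the list directly rather than using a bitmap, so it shows only the semantics, not the PR's implementation.

```rust
/// Hypothetical per-row result of `needle IN (list)` (or `NOT IN` when
/// `negated` is true). `None` represents SQL NULL.
fn sql_in(needle: Option<u8>, list: &[Option<u8>], negated: bool) -> Option<bool> {
    let v = needle?; // NULL needle -> NULL result, for both IN and NOT IN
    let list_has_null = list.iter().any(|x| x.is_none());
    let contains = list.iter().flatten().any(|&x| x == v);
    match (contains, list_has_null) {
        (true, _) => Some(!negated),     // match: IN is true, NOT IN is false
        (false, true) => None,           // no match, but a NULL in the list: unknown
        (false, false) => Some(negated), // definite miss: IN is false, NOT IN is true
    }
}

fn main() {
    let list = [Some(1u8), Some(3), None]; // x IN (1, 3, NULL)
    println!("{:?}", sql_in(Some(3), &list, false)); // Some(true)
    println!("{:?}", sql_in(Some(5), &list, false)); // None
    println!("{:?}", sql_in(Some(5), &list, true)); // None
}
```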

## Are these changes tested?

Yes.

- `cargo fmt --all`
- `cargo test -p datafusion-physical-expr bitmap_filter_u8 --lib`
- `cargo test -p datafusion-physical-expr in_list_int_types --lib`
- `cargo clippy -p datafusion-physical-expr --all-targets --all-features -- -D warnings`

## Are there any user-facing changes?

No. This is an internal performance optimization only.

<!-- codex-benchmark-start -->
## Local benchmark snapshot

Benchmark command:

```bash
cargo bench -p datafusion-physical-expr --profile release-nonlto --bench in_list_strategy -- --save-baseline <name>
```

Method: compare adjacent saved baselines using raw Criterion sample
minima (`min(time / iters)`). Lower is better; changes within +/-5% are
treated as noise. These numbers were not rerun after splitting the
bitmap abstraction into apache#23035.
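The comparison method can be sketched as follows. The `Sample` struct is a hypothetical stand-in for one raw Criterion measurement (total time for a batch of iterations), and `classify` applies the +/-5% noise band described above; neither is part of Criterion's API.

```rust
/// Hypothetical representation of one raw Criterion sample: the total
/// measured time (in nanoseconds) for a batch of `iters` iterations.
struct Sample {
    time_ns: f64,
    iters: u64,
}

/// Per-iteration minimum over all samples, i.e. `min(time / iters)`.
fn min_per_iter(samples: &[Sample]) -> f64 {
    samples
        .iter()
        .map(|s| s.time_ns / s.iters as f64)
        .fold(f64::INFINITY, f64::min)
}

#[derive(Debug, PartialEq)]
enum Verdict {
    Faster,
    Slower,
    Noise,
}

/// Classify a before/after pair of minima. Lower is better, and a
/// relative change within +/-5% is treated as noise.
fn classify(before: f64, after: f64) -> Verdict {
    let change = (after - before) / before;
    if change < -0.05 {
        Verdict::Faster
    } else if change > 0.05 {
        Verdict::Slower
    } else {
        Verdict::Noise
    }
}

fn main() {
    let before = [
        Sample { time_ns: 2_000.0, iters: 100 },
        Sample { time_ns: 1_500.0, iters: 100 },
    ];
    let after = [
        Sample { time_ns: 1_400.0, iters: 100 },
        Sample { time_ns: 3_000.0, iters: 200 },
    ];
    let (b, a) = (min_per_iter(&before), min_per_iter(&after));
    println!("{b} -> {a}: {:?}", classify(b, a)); // 15 -> 14: Faster
}
```

Taking the minimum rather than the mean is a common way to reduce the influence of scheduler and cache noise, at the cost of ignoring variance.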

Compared baselines:
[apache#21927](apache#21927) ->
[apache#23011](apache#23011)

Relevant scope: UInt8 narrow-integer rows.

Summary: 5 relevant rows, 5 faster, 0 slower, 0 within +/-5%.

| Benchmark | Before | After | Change |
|---|---:|---:|---:|
| `narrow_integer/u8/list=16/match=0%` | 20.39 us | 3.94 us | -80.7% (5.18x faster) |
| `narrow_integer/u8/list=16/match=50%` | 38.38 us | 3.98 us | -89.6% (9.65x faster) |
| `narrow_integer/u8/list=4/match=0%` | 18.18 us | 3.93 us | -78.4% (4.62x faster) |
| `narrow_integer/u8/list=4/match=50%` | 34.63 us | 3.96 us | -88.6% (8.75x faster) |
| `nulls/narrow_integer/u8/list=16/match=50%/nulls=20%` | 37.12 us | 4.16 us | -88.8% (8.93x faster) |
<!-- codex-benchmark-end -->
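The Change column follows from the Before and After minima: the relative change is `(after - before) / before`, and the speed-up is `before / after`. A small formatter that reproduces the first row is sketched below; `change_label` is hypothetical, and the wording for slower rows is an assumption since the table has none.

```rust
/// Hypothetical formatter for the "Change" column: relative change in
/// percent plus the before/after speed ratio.
fn change_label(before_us: f64, after_us: f64) -> String {
    let pct = (after_us - before_us) / before_us * 100.0;
    if after_us <= before_us {
        format!("{:.1}% ({:.2}x faster)", pct, before_us / after_us)
    } else {
        // Assumed wording for regressions; the table above contains none.
        format!("+{:.1}% ({:.2}x slower)", pct, after_us / before_us)
    }
}

fn main() {
    // First row of the table: 20.39 us -> 3.94 us.
    println!("{}", change_label(20.39, 3.94)); // -80.7% (5.18x faster)
}
```

Recomputing other rows from the rounded values shown can land one digit off in the ratio (for example 38.38 / 3.98 ≈ 9.64 rather than the listed 9.65x), presumably because the published figures were computed from unrounded minima.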

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Labels

physical-expr Changes to the physical-expr crates
