Optimize SipHash by reordering compress instructions by mat-1 · Pull Request #127226 · rust-lang/rust

mat-1 · 2024-07-01T23:07:12Z

This PR optimizes hashing by changing the order of instructions in the sip.rs compress macro so the CPU can parallelize it better. The new order is taken directly from Fig 2.1 in the SipHash paper (but with the xors moved which makes it a little faster). I attempted to optimize it some more after this, but I think this might be the optimal instruction order. Note that this shouldn't change the behavior of hashing at all, only statements that don't depend on each other were reordered.

It appears like the current order hasn't changed since its original implementation from 2012 which doesn't look like it was written with data dependencies in mind.

Running ./x bench library/core --stage 0 --test-args hash before and after this change shows the following results:

Before:

benchmarks:
    hash::sip::bench_bytes_4             7.20/iter +/- 0.70
    hash::sip::bench_bytes_7             9.01/iter +/- 0.35
    hash::sip::bench_bytes_8             8.12/iter +/- 0.10
    hash::sip::bench_bytes_a_16         10.07/iter +/- 0.44
    hash::sip::bench_bytes_b_32         13.46/iter +/- 0.71
    hash::sip::bench_bytes_c_128        37.75/iter +/- 0.48
    hash::sip::bench_long_str          121.18/iter +/- 3.01
    hash::sip::bench_str_of_8_bytes     11.20/iter +/- 0.25
    hash::sip::bench_str_over_8_bytes   11.20/iter +/- 0.26
    hash::sip::bench_str_under_8_bytes   9.89/iter +/- 0.59
    hash::sip::bench_u32                 9.57/iter +/- 0.44
    hash::sip::bench_u32_keyed           6.97/iter +/- 0.10
    hash::sip::bench_u64                 8.63/iter +/- 0.07

After:

benchmarks:
    hash::sip::bench_bytes_4             6.64/iter +/- 0.14
    hash::sip::bench_bytes_7             8.19/iter +/- 0.07
    hash::sip::bench_bytes_8             8.59/iter +/- 0.68
    hash::sip::bench_bytes_a_16          9.73/iter +/- 0.49
    hash::sip::bench_bytes_b_32         12.70/iter +/- 0.06
    hash::sip::bench_bytes_c_128        32.38/iter +/- 0.20
    hash::sip::bench_long_str          102.99/iter +/- 0.82
    hash::sip::bench_str_of_8_bytes     10.71/iter +/- 0.21
    hash::sip::bench_str_over_8_bytes   11.73/iter +/- 0.17
    hash::sip::bench_str_under_8_bytes  10.33/iter +/- 0.41
    hash::sip::bench_u32                10.41/iter +/- 0.29
    hash::sip::bench_u32_keyed           9.50/iter +/- 0.30
    hash::sip::bench_u64                 8.44/iter +/- 1.09

I ran this on my computer so there's some noise, but you can tell at least bench_long_str is significantly faster (~18%).

Also, I noticed the same compress function from the library is used in the compiler as well, so I took the liberty of copy-pasting this change to there as well.

Thanks @Semisol for porting SipHash for another project which led me to notice this issue in Rust, and for helping investigate. <3

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Optimize SipHash by reordering compress instructions#127226

Optimize SipHash by reordering compress instructions#127226
bors merged 1 commit into
rust-lang:masterfrom
mat-1:optimize-siphash-round

mat-1 commented Jul 1, 2024 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

10 participants

Uh oh!

Conversation

mat-1 commented Jul 1, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

10 participants

mat-1 commented Jul 1, 2024 •

edited

Loading