git.codecow.com Git - Monocypher.git/commit

author	Loup Vaillant <loup@loup-vaillant.fr>
	Mon, 3 Jul 2023 21:27:37 +0000 (23:27 +0200)
committer	Loup Vaillant <loup@loup-vaillant.fr>
	Mon, 3 Jul 2023 21:27:37 +0000 (23:27 +0200)
commit	014f45d9fe54271c8e08006eea83fa97b9d55d32
tree	62be6345132e69d85e4066f8b5c9db525f686917
parent	f034515e60c7e59a4a77a0f0a11faa6181a7b2a9

Faster Argon2 inner loop

Compilers aren't magical. They need help to generate the best code.
Here we want to compute the following expression:

    mask = 0xffffffff;
    2 * (a & mask) * (b & mask)

The most efficient way to do this looks like this:

    u64 al = (u32)a;   // Truncate
    u64 bl = (u32)b;   // Truncate
    u64 x  = al * bl;  // 32->64 bits multiply
    u64 x2 = x << 1;   // shift
    return x2;

My compiler doesn't pick up on this, and performs a slower alternative
instead. Either the multiply by two uses an actual multiply instead of a
shift, or the shift is done first, forcing a more expensive 64->64
multiply.  More naive compilers may even do both.
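
For illustration, here is a minimal sketch of the two forms side by side.
The function names mul_lo_naive and mul_lo_fast are hypothetical and are not
taken from the Monocypher source; the u64/u32 typedefs are assumed to map to
the <stdint.h> types. Both functions return the same value modulo 2^64; only
the shape handed to the compiler differs.

```c
#include <stdint.h>

typedef uint64_t u64;
typedef uint32_t u32;

// Straightforward form: mask both operands, then multiply by two.
// The compiler is free to evaluate this as (2 * lo(a)) * lo(b), which
// may need a full 64->64 multiply.
static u64 mul_lo_naive(u64 a, u64 b)
{
    u64 mask = 0xffffffff;
    return 2 * (a & mask) * (b & mask);
}

// Explicit form: truncate first, so both operands are known to fit in
// 32 bits, multiply, and only then double the result with a shift.
static u64 mul_lo_fast(u64 a, u64 b)
{
    u64 al = (u32)a;   // keep the low 32 bits of a
    u64 bl = (u32)b;   // keep the low 32 bits of b
    u64 x  = al * bl;  // 32->64 bits multiply
    return x << 1;     // multiply by two (wraps modulo 2^64)
}
```

Whether the second form is actually faster depends on the compiler and
target, so it is worth measuring before adopting it elsewhere.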

Whatever the cause, I got 5% faster code on GCC 11.3.