Turns out compilers don't hoist the `-b` mask computation out of the
loop on their own, and this leads to observable slowdowns in some cases.
Also note that we are relying on 2's complement representation (we
already were). We could be more portable by going unsigned, but by this
logic the entire field arithmetic should go unsigned. That is possible,
but not trivial: I tried it in the past and failed.
Every architecture of interest is 2's complement anyway, so I think this
will be good enough.
static void fe_cswap(fe f, fe g, int b)
{
+ i32 mask = -b; // rely on 2's complement: -1 = 0xffffffff
FOR (i, 0, 10) {
- i32 x = (f[i] ^ g[i]) & -b;
+ i32 x = (f[i] ^ g[i]) & mask;
f[i] = f[i] ^ x;
g[i] = g[i] ^ x;
}
static void fe_ccopy(fe f, const fe g, int b)
{
+ i32 mask = -b; // rely on 2's complement: -1 = 0xffffffff
FOR (i, 0, 10) {
- i32 x = (f[i] ^ g[i]) & -b;
+ i32 x = (f[i] ^ g[i]) & mask;
f[i] = f[i] ^ x;
}
}
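
For reference, the unsigned alternative mentioned in the message could look
roughly like the sketch below. This is not part of the patch: `cswap_u32`
and the `u32` limb type are hypothetical names used only for illustration,
and the real field elements stay signed. Unsigned subtraction wraps modulo
2^32 by definition, so the mask no longer depends on how negative numbers
are represented.

```c
#include <stdint.h>

typedef uint32_t u32;

// Hypothetical unsigned variant of the conditional swap (illustration only).
// b must be 0 or 1. The mask is computed once, outside the loop.
static void cswap_u32(u32 f[10], u32 g[10], u32 b)
{
    u32 mask = (u32)0 - b; // wraps modulo 2^32: 0 -> 0, 1 -> 0xffffffff
    for (int i = 0; i < 10; i++) {
        u32 x = (f[i] ^ g[i]) & mask; // all zeroes when b == 0
        f[i] ^= x;                    // f[i] becomes g[i] when b == 1
        g[i] ^= x;                    // g[i] becomes f[i] when b == 1
    }
}
```

With b == 0 both arrays are left untouched; with b == 1 every limb is
swapped, and the same instructions run in both cases.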