Turns out compilers don't do this naturally, and this leads to
observable slowdowns in some cases.
Also noted that we are relying on 2's complement representation (we
already were). We could be more portable by going unsigned, but by this
logic the entire field arithmetic should go unsigned. It's possible,
but it's not trivial. I've kinda tried it in the past, and failed.
Every architecture of interest is 2's complement anyway, so I think this
will be good enough.