Note: I noticed something iffy about comparing against all zeroes: for
big buffers, the timings were way off (small buffers were okay). This
suggests the comparisons were *not* constant time, which is worrying.
The generated assembly is too big for me to review. I can't tell
whether there's a variable-time optimisation in there. Thankfully, we
rarely use crypto_memcmp() to compare big zeroed buffers in practice.
Instead, we compare small, pseudo-random data such as hashes or
authentication tags. So I used pseudo-random data for the tests.
While we should be good in practice, I'm a bit worried. Someone may
want to check that compilers haven't become too clever.
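
For anyone who wants to check, here is a minimal sketch of the kind of
loop whose generated assembly is worth inspecting. It is *not* the
actual crypto_memcmp(): the name ct_compare_sketch() is hypothetical,
and the volatile accumulator is only an assumption about what may
discourage a compiler from adding an early exit. It guarantees nothing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
 * Returns 0 if the first n bytes of a and b match, -1 otherwise. */
int ct_compare_sketch(const uint8_t *a, const uint8_t *b, size_t n)
{
    /* Accumulate every byte difference, with no data-dependent branch.
     * volatile is a hint against clever optimisations, not a proof. */
    volatile uint8_t diff = 0;
    for (size_t i = 0; i < n; i++) {
        diff |= (uint8_t)(a[i] ^ b[i]);
    }
    /* Map 0 to 0 and any nonzero value to -1 without branching.
     * d - 1 wraps to all ones only when d == 0, so bit 8 is then set. */
    unsigned d = diff;
    return (int)(1u & ((d - 1u) >> 8)) - 1;
}
```

The source looks branch-free on the data, but only the compiled output
says whether it stayed that way, so the assembly and the timings still
need checking for each compiler and optimisation level.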