And optimised `crypto_blake2b_update()` on short or unaligned inputs.
This also clarifies the source code somewhat, though it's now a bit more
verbose. That verbosity buys us a 15-30% speedup on small updates and on
the small inputs typical of key derivation schemes.
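Short and unaligned updates need care because of how a BLAKE2b-style `update()` buffers its input. Below is a minimal sketch of that structure, not the code of this commit. `toy_ctx`, `toy_update()` and the other names are hypothetical. "Unaligned" is assumed here to mean input that does not start or end on a 128-byte block boundary. The compression function is a non-cryptographic placeholder, so the control flow can be tested on its own.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 128  /* BLAKE2b block size in bytes */

/* Hypothetical context: only the buffering logic mirrors BLAKE2b. */
typedef struct {
    uint8_t  buf[BLOCK_SIZE]; /* bytes not yet compressed      */
    size_t   buf_len;         /* how much of buf is in use     */
    uint64_t checksum;        /* stands in for the hash state  */
    size_t   nb_blocks;       /* number of compressions so far */
} toy_ctx;

/* Toy stand-in for the compression function. NOT cryptographic. */
static void toy_compress(toy_ctx *ctx, const uint8_t *block)
{
    for (size_t i = 0; i < BLOCK_SIZE; i++) {
        ctx->checksum = ctx->checksum * 31 + block[i];
    }
    ctx->nb_blocks++;
}

static void toy_init(toy_ctx *ctx) { memset(ctx, 0, sizeof *ctx); }

static void toy_update(toy_ctx *ctx, const uint8_t *msg, size_t size)
{
    /* Short input: it fits in the buffer, so just copy it. */
    if (size <= BLOCK_SIZE - ctx->buf_len) {
        memcpy(ctx->buf + ctx->buf_len, msg, size);
        ctx->buf_len += size;
        return;
    }
    /* Non-empty buffer: top it up and compress it. This is safe because
     * more input follows, so it cannot be the final block. */
    if (ctx->buf_len > 0) {
        size_t fill = BLOCK_SIZE - ctx->buf_len;
        memcpy(ctx->buf + ctx->buf_len, msg, fill);
        toy_compress(ctx, ctx->buf);
        ctx->buf_len = 0;
        msg  += fill;
        size -= fill;
    }
    /* Whole blocks straight from the input, holding back the last one
     * (BLAKE2b flags the final block, so it must wait for final()). */
    while (size > BLOCK_SIZE) {
        toy_compress(ctx, msg);
        msg  += BLOCK_SIZE;
        size -= BLOCK_SIZE;
    }
    /* Tail: stash the remaining 1..BLOCK_SIZE bytes. */
    memcpy(ctx->buf, msg, size);
    ctx->buf_len = size;
}

static void toy_final(toy_ctx *ctx)
{
    /* Zero-pad and compress the last (possibly partial) block. */
    memset(ctx->buf + ctx->buf_len, 0, BLOCK_SIZE - ctx->buf_len);
    toy_compress(ctx, ctx->buf);
}
```

In this structure a short update costs a single `memcpy()`. Whole blocks in the middle of a long input are compressed in place instead of being copied through the buffer.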
---
The reason for the rework of the API is that the struct argument simply
does not work in practice. See, BLAKE2b has not one, but *two* typical
usages: regular hashing and keyed hashing. Controlling the size of the
hash is often useful even when we don't use the key, and when we do
keyed hashing (for a MAC or a KDF), having to fill in a struct is just
*so* verbose and tedious.
I briefly considered having just one hash function to rule them all,
with 6 arguments, but regular hashing is still the main use case. So
instead of a struct or the original split, I simply have:
* `crypto_blake2b()` with a variable-sized hash.
* `crypto_blake2b_keyed()` with every option.
And in practice, the 6 arguments of the keyed version are quite
manageable: output first as always, then the key, then the message.
If the argument list gets too long, the pointer/size pairs give us
natural line breaks.
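To make the argument order concrete, here is a self-contained sketch written from RFC 7693, not from Monocypher's source. `blake2b_sketch()` and `blake2b_keyed_sketch()` are hypothetical stand-ins for `crypto_blake2b()` and `crypto_blake2b_keyed()`. Only the parameter order (output, then key, then message, each as a pointer/size pair) comes from the description above. It is one-shot only, with no incremental interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* BLAKE2b initialisation vector (the same as SHA-512's). */
static const uint64_t IV[8] = {
    0x6a09e667f3bcc908ULL, 0xbb67ae8584caa73bULL, 0x3c6ef372fe94f82bULL,
    0xa54ff53a5f1d36f1ULL, 0x510e527fade682d1ULL, 0x9b05688c2b3e6c1fULL,
    0x1f83d9abfb41bd6bULL, 0x5be0cd19137e2179ULL,
};
/* Message schedule; rounds 10 and 11 reuse rows 0 and 1. */
static const uint8_t SIGMA[10][16] = {
    { 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15},
    {14, 10,  4,  8,  9, 15, 13,  6,  1, 12,  0,  2, 11,  7,  5,  3},
    {11,  8, 12,  0,  5,  2, 15, 13, 10, 14,  3,  6,  7,  1,  9,  4},
    { 7,  9,  3,  1, 13, 12, 11, 14,  2,  6,  5, 10,  4,  0, 15,  8},
    { 9,  0,  5,  7,  2,  4, 10, 15, 14,  1, 11, 12,  6,  8,  3, 13},
    { 2, 12,  6, 10,  0, 11,  8,  3,  4, 13,  7,  5, 15, 14,  1,  9},
    {12,  5,  1, 15, 14, 13,  4, 10,  0,  7,  6,  3,  9,  2,  8, 11},
    {13, 11,  7, 14, 12,  1,  3,  9,  5,  0, 15,  4,  8,  6,  2, 10},
    { 6, 15, 14,  9, 11,  3,  0,  8, 12,  2, 13,  7,  1,  4, 10,  5},
    {10,  2,  8,  4,  7,  6,  1,  5, 15, 11,  9, 14,  3, 12, 13,  0},
};
#define ROTR64(x, n) (((x) >> (n)) | ((x) << (64 - (n))))
#define G(a, b, c, d, x, y) do {             \
        a += b + (x); d = ROTR64(d ^ a, 32); \
        c += d;       b = ROTR64(b ^ c, 24); \
        a += b + (y); d = ROTR64(d ^ a, 16); \
        c += d;       b = ROTR64(b ^ c, 63); \
    } while (0)

static uint64_t load64_le(const uint8_t *p) /* safe on unaligned input */
{
    uint64_t x = 0;
    for (int i = 7; i >= 0; i--) { x = (x << 8) | p[i]; }
    return x;
}

/* Compression function F: t is the byte counter, last flags the final block. */
static void compress(uint64_t h[8], const uint8_t *block, uint64_t t, int last)
{
    uint64_t v[16], m[16];
    for (int i = 0; i < 16; i++) { m[i] = load64_le(block + 8 * i); }
    for (int i = 0; i < 8; i++)  { v[i] = h[i]; v[i + 8] = IV[i]; }
    v[12] ^= t;                  /* high counter word stays 0 here */
    if (last) { v[14] = ~v[14]; }
    for (int r = 0; r < 12; r++) {
        const uint8_t *s = SIGMA[r % 10];
        G(v[0], v[4], v[ 8], v[12], m[s[ 0]], m[s[ 1]]);
        G(v[1], v[5], v[ 9], v[13], m[s[ 2]], m[s[ 3]]);
        G(v[2], v[6], v[10], v[14], m[s[ 4]], m[s[ 5]]);
        G(v[3], v[7], v[11], v[15], m[s[ 6]], m[s[ 7]]);
        G(v[0], v[5], v[10], v[15], m[s[ 8]], m[s[ 9]]);
        G(v[1], v[6], v[11], v[12], m[s[10]], m[s[11]]);
        G(v[2], v[7], v[ 8], v[13], m[s[12]], m[s[13]]);
        G(v[3], v[4], v[ 9], v[14], m[s[14]], m[s[15]]);
    }
    for (int i = 0; i < 8; i++) { h[i] ^= v[i] ^ v[i + 8]; }
}

/* Hypothetical keyed variant: output, then key, then message.
 * Assumes 1 <= hash_size <= 64 and key_size <= 64 (not checked). */
void blake2b_keyed_sketch(uint8_t *hash, size_t hash_size,
                          const uint8_t *key, size_t key_size,
                          const uint8_t *msg, size_t msg_size)
{
    uint64_t h[8], counter = 0;
    uint8_t  block[128];
    memcpy(h, IV, sizeof h);
    h[0] ^= 0x01010000 ^ ((uint64_t)key_size << 8) ^ hash_size;
    if (key_size > 0) {          /* key is hashed as a padded first block */
        memset(block, 0, sizeof block);
        memcpy(block, key, key_size);
        counter = 128;
        compress(h, block, counter, msg_size == 0);
    }
    while (msg_size > 128) {     /* every full block except the last one */
        counter += 128;
        compress(h, msg, counter, 0);
        msg += 128; msg_size -= 128;
    }
    if (msg_size > 0 || key_size == 0) {   /* zero-padded final block */
        memset(block, 0, sizeof block);
        if (msg_size > 0) { memcpy(block, msg, msg_size); }
        counter += msg_size;
        compress(h, block, counter, 1);
    }
    for (size_t i = 0; i < hash_size; i++) {   /* little-endian output */
        hash[i] = (uint8_t)(h[i / 8] >> (8 * (i % 8)));
    }
}

/* Hypothetical unkeyed variant: the same thing with an empty key. */
void blake2b_sketch(uint8_t *hash, size_t hash_size, const uint8_t *msg, size_t msg_size)
{
    blake2b_keyed_sketch(hash, hash_size, NULL, 0, msg, msg_size);
}
```

A MAC call then looks like `blake2b_keyed_sketch(mac, 32, key, 32, msg, msg_size)`, where `mac`, `key` and `msg` are caller-provided buffers. When a call gets long, it splits naturally at each pointer/size pair.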
I had to work on KDFs to realise this, so thanks to @samuel-lucas6 for
bringing the whole issue up.
---
Unlike SHA512, I did *not* add an explicit KDF interface for BLAKE2b.
My reasons being:
* There is no clear standard KDF for BLAKE2b except HKDF.
* HKDF is built on top of HMAC, and HMAC is stupid when BLAKE2b already
has a keyed mode that does the exact same thing with less code and
fewer CPU cycles.
* HKDF is kind of *a little* stupid itself.
* `crypto_blake2b_keyed()` already implements KDF extract.
* `crypto_chacha20_*()` already implement KDF expand.
* I hesitate to add a non-standard convenience function for something
users are unlikely to get catastrophically wrong anyway.
So instead I updated the manual, including 3 full KDF functions lazy
users can just blindly copy & paste.
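The expand half mentioned in the list above can be sketched as follows. This sketch assumes the 32-byte input key is already uniformly random, for instance the output of keyed BLAKE2b used as the extract step. It uses a minimal self-contained ChaCha20 (DJB variant: 64-bit counter, 64-bit nonce) rather than Monocypher's `crypto_chacha20_*()` functions. `kdf_expand_sketch()` and the use of an 8-byte context label as the nonce are hypothetical choices for illustration. They are not a standard KDF and not one of the manual's functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROTL32(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
#define QR(a, b, c, d) do {                \
        a += b; d ^= a; d = ROTL32(d, 16); \
        c += d; b ^= c; b = ROTL32(b, 12); \
        a += b; d ^= a; d = ROTL32(d,  8); \
        c += d; b ^= c; b = ROTL32(b,  7); \
    } while (0)

static uint32_t load32_le(const uint8_t *p)
{
    return (uint32_t)p[0]       | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void store32_le(uint8_t *p, uint32_t x)
{
    p[0] = (uint8_t)x;         p[1] = (uint8_t)(x >> 8);
    p[2] = (uint8_t)(x >> 16); p[3] = (uint8_t)(x >> 24);
}

/* One 64-byte ChaCha20 block: 20 rounds, then add the input state. */
static void chacha20_block(uint8_t out[64], const uint32_t in[16])
{
    uint32_t x[16];
    memcpy(x, in, sizeof x);
    for (int i = 0; i < 10; i++) {     /* 10 double rounds */
        QR(x[0], x[4], x[ 8], x[12]);  /* columns */
        QR(x[1], x[5], x[ 9], x[13]);
        QR(x[2], x[6], x[10], x[14]);
        QR(x[3], x[7], x[11], x[15]);
        QR(x[0], x[5], x[10], x[15]);  /* diagonals */
        QR(x[1], x[6], x[11], x[12]);
        QR(x[2], x[7], x[ 8], x[13]);
        QR(x[3], x[4], x[ 9], x[14]);
    }
    for (int i = 0; i < 16; i++) { store32_le(out + 4 * i, x[i] + in[i]); }
}

/* Hypothetical KDF expand: ChaCha20 keystream keyed by a 32-byte
 * pseudorandom key, with an 8-byte context label used as the nonce. */
void kdf_expand_sketch(uint8_t *out, size_t out_size,
                       const uint8_t prk[32], const uint8_t context[8])
{
    uint32_t state[16] = {
        0x61707865, 0x3320646e, 0x79622d32, 0x6b206574, /* "expand 32-byte k" */
    };
    uint8_t block[64];
    for (int i = 0; i < 8; i++) { state[4 + i] = load32_le(prk + 4 * i); }
    state[12] = 0;                     /* block counter, low word  */
    state[13] = 0;                     /* block counter, high word */
    state[14] = load32_le(context);    /* nonce = context label    */
    state[15] = load32_le(context + 4);
    while (out_size > 0) {
        size_t n = out_size < 64 ? out_size : 64;
        chacha20_block(block, state);
        memcpy(out, block, n);
        out      += n;
        out_size -= n;
        if (++state[12] == 0) { state[13]++; }
    }
    memset(block, 0, sizeof block);    /* best-effort wipe; may be optimised away */
}
```

Each context label selects a different keystream. Asking for more bytes simply extends that keystream, so the first 64 bytes stay the same whether you request 64 or 100.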