The loading code for Chacha20, Poly1305, Blake2b, and SHA-512 was a bit
ad hoc, which made it hard to follow and error prone. Chacha20 in
particular was harder than it should be to adapt to faster
implementations that process several blocks at a time. So was
Poly1305, I think.
The loading code has been modified to conform to the following pattern:
1. Align ourselves with block boundaries
2. Process the message block by block
3. Process the remaining bytes
- The last section just calls the general-purpose update code. It's the
only one that's mandatory.
- The first section calls the same general-purpose update code, with
just enough input to reach the next block boundary. It must be
present whenever the second section is.
- The second section performs the optimised block-by-block update. It
needs the first section to ensure alignment.
Each section but the last updates the input pointers and lengths,
so that later sections may assume they are the first.
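The pattern above can be sketched as follows. This is a minimal
illustration, not the actual code of this commit: toy_ctx, toy_block,
toy_update_bytes, toy_update, and BLOCK_SIZE are hypothetical names, and
the "compression" is a toy mixing step rather than a real primitive.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 16 // hypothetical block size, for illustration only

typedef struct {
    uint64_t state;           // toy running state (not a real hash)
    uint8_t  buf[BLOCK_SIZE]; // partially filled block
    size_t   buf_len;         // number of bytes currently in buf
} toy_ctx;

void toy_init(toy_ctx *ctx)
{
    ctx->state   = 0;
    ctx->buf_len = 0;
}

// Toy stand-in for the real compression: absorbs one full block.
void toy_block(toy_ctx *ctx, const uint8_t *block)
{
    for (size_t i = 0; i < BLOCK_SIZE; i++) {
        ctx->state = ctx->state * 31 + block[i];
    }
}

// General-purpose update: buffers input byte by byte and
// absorbs the buffer whenever it becomes full.
void toy_update_bytes(toy_ctx *ctx, const uint8_t *msg, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        ctx->buf[ctx->buf_len++] = msg[i];
        if (ctx->buf_len == BLOCK_SIZE) {
            toy_block(ctx, ctx->buf);
            ctx->buf_len = 0;
        }
    }
}

void toy_update(toy_ctx *ctx, const uint8_t *msg, size_t len)
{
    // 1. Align ourselves with block boundaries
    size_t align = (BLOCK_SIZE - ctx->buf_len) % BLOCK_SIZE;
    if (align > len) { align = len; }
    toy_update_bytes(ctx, msg, align);
    msg += align;
    len -= align;

    // 2. Process the message block by block.
    //    If we get here with len >= BLOCK_SIZE, the buffer is empty,
    //    so blocks can be absorbed straight from the input.
    size_t nb_blocks = len / BLOCK_SIZE;
    for (size_t i = 0; i < nb_blocks; i++) {
        toy_block(ctx, msg);
        msg += BLOCK_SIZE;
        len -= BLOCK_SIZE;
    }

    // 3. Process the remaining bytes
    toy_update_bytes(ctx, msg, len);
}
```

In this sketch, removing section 2, or sections 1 and 2 together, leaves
the result unchanged, because the remaining sections see the updated
msg and len and fall back to the general-purpose path.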
Tests were run with sections 1, 2, and 3; with sections 1 and 3; and
with section 3 alone. They all yield the same, correct results. We
could write an equivalence proof, but the property-based tests were
designed to catch mistakes in the loading code in the first place, so
it is probably not worth the trouble.