Note a small problem in the implementation: one byte is reused for both
the tweak and the next random seed. The two are therefore *not*
independent, which is a possible source of vulnerability.
In practice, this is only a problem for the 3 bits comprising the
cofactor, which do affect the outcome; the sign and the padding play no
role in deciding whether the mapping fails or succeeds.
TODO: take the cofactor from the clamped bits of the scalar instead of
the tweak. This will ensure proper independence while keeping the
high-level code simple and maximally efficient.
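
A minimal sketch of the idea in the TODO, not the actual implementation.
It assumes the scalar is a 32-byte little-endian string clamped the usual
X25519 way, so that the low 3 bits of the first byte are the ones
clamping clears and are otherwise unused. The names `split_scalar` and
`cofactor` are hypothetical and exist only for illustration.

```python
from typing import Tuple


def split_scalar(raw: bytes) -> Tuple[bytes, int]:
    """Return (clamped_scalar, cofactor) from 32 random bytes.

    Assumption: the cofactor selector is read from the 3 low bits of
    byte 0, which standard X25519 clamping would discard anyway.
    """
    if len(raw) != 32:
        raise ValueError("scalar must be exactly 32 bytes")

    # Read the 3 bits before clamping erases them (value in 0..7).
    cofactor = raw[0] & 7

    # Standard X25519 clamping.
    scalar = bytearray(raw)
    scalar[0] &= 248   # clear the 3 low bits
    scalar[31] &= 127  # clear the top bit
    scalar[31] |= 64   # set the second-highest bit

    return bytes(scalar), cofactor
```

With this split, the cofactor no longer depends on any byte of the
tweak, so under the assumptions above the tweak byte can be shared with
the next seed without affecting whether the mapping succeeds.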