Loup Vaillant [Tue, 19 Jun 2018 22:26:41 +0000 (00:26 +0200)]
Corrected failing test on 32-bit systems
When size_t is not uint64_t, converting "negative" size_t integers to
uint64_t yields nonsensical results. That is, the following isn't
portable:
size_t x = 42;
uint64_t y = -x;
Because y might be missing the high order bits if size_t is smaller than
uint64_t. Instead, we want to convert to a large sized integer *before*
we negate it:
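A sketch of the portable spelling (the function name is mine, for illustration):

```c
#include <stdint.h>
#include <stddef.h>

// Widen to uint64_t *first*, then negate.  Negating the size_t first
// wraps at the width of size_t, so on 32-bit systems the high order
// bits of the result would be zero instead of all ones.
static uint64_t negate_size(size_t x)
{
    return -(uint64_t)x;
}
```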
Loup Vaillant [Sun, 17 Jun 2018 18:05:36 +0000 (20:05 +0200)]
Tests for crypto_verify*() catch more errors
The tests had a symmetry that caused them to miss a possible error,
where the implementer replaces an `|` operator with an `&` operator.
Breaking that symmetry allows them to catch that error.
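A word-wise comparison in the spirit of crypto_verify16 illustrates the problem (this is a sketch, not Monocypher's exact code):

```c
#include <stdint.h>
#include <string.h>

static uint64_t load64(const uint8_t s[8])
{
    uint64_t x;
    memcpy(&x, s, 8);  // endianness doesn't matter for equality tests
    return x;
}

// Constant-time comparison: OR together the differences of both halves.
static int verify16(const uint8_t a[16], const uint8_t b[16])
{
    uint64_t diff = (load64(a)     ^ load64(b))
                  | (load64(a + 8) ^ load64(b + 8));
    return diff == 0 ? 0 : -1;
}
```

If the `|` above is mistyped as `&`, inputs where both halves differ in the same bits are still rejected, so symmetric test cases pass; only an input where exactly one half differs exposes the bug.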
Loup Vaillant [Sun, 17 Jun 2018 17:47:46 +0000 (19:47 +0200)]
Faster crypto_verify*() tests
There was no need to test every possible value to catch the errors the
tests caught. Cutting them down makes the tests 64000 times faster,
which matters quite a lot when we run the TIS interpreter.
The new tests catch just as many errors as the old ones.
Loup Vaillant [Sun, 17 Jun 2018 17:19:20 +0000 (19:19 +0200)]
Run the TIS interpreter in 2 commands instead of 3.
The TIS interpreter doesn't need to be run from inside the
formal-analysis folder. We can refer to the relevant C files directly.
This simplifies the README a tiny little bit.
Loup Vaillant [Sun, 17 Jun 2018 17:06:12 +0000 (19:06 +0200)]
Corrected variable sized buffer in the tests.
The p_eddsa_random() test was tripping up the TIS interpreter because
of a variable sized array allocated on the stack. The test runs
properly with a fixed sized buffer. (Variable sized buffers are tested
elsewhere,
most notably with the test vectors).
Loup Vaillant [Sat, 16 Jun 2018 10:29:34 +0000 (12:29 +0200)]
Improved the test suite
The test suite has been trimmed down a little, and improved a bit. The
main goal was to have the TIS interpreter run the entire test suite
in less than 20 hours, so I (and others) could realistically run it on
each new release.
- We now have far fewer Argon2i test vectors. Only those at block size
  boundaries have been kept (for instance, it is important that we test
  both below and above 512 blocks, and below and above 64-byte hashes,
  to hit all code paths).
- X25519 test vectors have been cut in half. We have official test
vectors and the Monte Carlo test already, we don't need too many
vectors. The main advantage here is to reduce the size of the test
vector header file.
- The tests for chacha20_set_ctr() explore the test space more
efficiently.
- The comparison between crypto_argon2i() and crypto_argon2i_general()
is no longer repeated 128 times.
- The overlap tests for Argon2i have been cut down, and the overlap
space has been reduced to compensate (so we're still sure there will
be lots of overlap).
- The incremental tests for Blake2b and SHA-512 were cut down to a
  number of iterations, and a total message size, that is *not* a
  multiple of the block length, so tests can fail more reliably if the
  MIN() macro has an error.
- The roundtrip tests for EdDSA have now been cut down, and try several
message sizes as well.
- The random tests for EdDSA (which are mostly meant to test what
happens when the point is outside the curve), have been cut down (from
1000 to 100), and try several message sizes as well.
Loup Vaillant [Sat, 16 Jun 2018 10:03:22 +0000 (12:03 +0200)]
Reset SHA-512 input buffer like Blake2b's
This is mostly for consistency (code that follows the same pattern
everywhere is more easily reviewed). The generated code is also a tiny
bit more efficient that way.
Loup Vaillant [Sat, 16 Jun 2018 09:35:52 +0000 (11:35 +0200)]
Fixed undefined behaviour in Blake2b
Fixes #96
The function blake2b_set_input() was reading uninitialised memory.
While this didn't matter in practice (most platforms don't have trap
representations for unsigned integers), it is undefined behaviour under
the C and C++ standards. To fix it, we reset the whole input buffer
before setting its first byte.
The fix introduces a conditional, but that conditional only depends on
an index, which itself depends on the size of the input, which is not
secret. We're still "constant time" with respect to secrets.
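The shape of the fix, in a simplified sketch (the real context struct and function names differ):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    uint64_t input[16];   // 128-byte input buffer, as 64-bit words
    size_t   input_idx;   // bytes buffered so far
} ctx_t;

// Buffer one byte of input.  When starting a fresh buffer, wipe it
// entirely first, so the |= below never reads uninitialised memory.
// The branch depends only on input_idx, which derives from the input
// *length*, not its contents: still constant time w.r.t. secrets.
static void set_input(ctx_t *ctx, uint8_t byte)
{
    if (ctx->input_idx == 0) {
        memset(ctx->input, 0, sizeof(ctx->input));
    }
    size_t word = ctx->input_idx / 8;
    size_t pos  = ctx->input_idx % 8;
    ctx->input[word] |= (uint64_t)byte << (pos * 8);
    ctx->input_idx++;
}
```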
Loup Vaillant [Sat, 12 May 2018 16:06:18 +0000 (18:06 +0200)]
don't recommend 16 bytes for argon2i digests
Listing 16 bytes as a possible size sounds like an endorsement.
But really, 16 bytes is a bit short, and weakens the security
of the subsequent symmetric crypto. You kind of have to know what
you are doing to select such a size, so let's not list it.
This is C we're talking about. Functions that return void cannot fail
_only if_ they're used correctly. Incorrect inputs can still trigger
undefined behaviour. In this sense, those functions _can_ fail.
Returning void should be an obvious enough hint that the function
requires no error handling. At least it is if you're familiar enough
with C. (If one is not, one is not qualified to use a crypto library in
an unsafe language.)
An unqualified "cannot fail" doesn't give any more information than
`void`, and may even mislead some users. Better to stay on the safe
side.
Test vectors for EdDSA used to use Monocypher's Blake2b hash. This
didn't make much sense, and could even conceivably be construed as
circular. Blake2b vectors were properly generated, so it wasn't really
circular, but that's still ugly.
Now the test vectors only depend on Libsodium and ed25519-donna.
Monocypher failed to compile with -DED25519_SHA512 since we added the
incremental interface. The error has been corrected by pull request
#95 from @vbmithr.
To make sure this doesn't happen again, the test suite has been expanded
to include the official Ed25519 construction. You can test it thus:
$ make clean
$ make test CFLAGS="-DED25519_SHA512 -O3"
Or just run `tests/test.sh` to have the whole shebang.
Found 4 internal buffers that were not properly wiped. While these
were unlikely to turn into an exploitable vulnerability, we can never be too
cautious.
- XChacha20 initialisation failed to wipe its intermediate key.
- Blake2b initialisation failed to wipe a copy of its secret key.
- EdDSA failed to wipe an internal field element.
- EdDSA failed to wipe an internal group element.
I always wanted to release Monocypher to the public domain, at least to
the extent possible. My amputated version of the GNU all permissive
licence was nice and concise, but had no real legal basis. We need
something solid, hence CC-0 from Creative Commons.
In case that's still not enough, the better known BSD licence has been
kept as a fallback.
MSVC doesn't like when people negate unsigned numbers, and the result is
still unsigned. By default, it's an error. Moreover, section 6.5.6 of
the C11 standard doesn't clearly specify what -x means when x is an
unsigned integer. It does however specify ~x.
So I replaced -x by ~x+1. This was getting ugly, though, so I made the
ALIGN macro. ALIGN(x, block_size) returns how many bytes we need to
reach the next block size, assuming we've already consumed x bytes. For
instance, ALIGN(11, 8) == 5. It uses bit twiddling trickery, so the
block size must be a power of 2.
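Something like this (the exact spelling in the source may differ):

```c
#include <stddef.h>

// Bytes needed to reach the next block boundary, given that x bytes
// have already been consumed.  ~x + 1 is two's complement negation,
// written without the unary minus MSVC complains about.  Only valid
// when block_size is a power of 2.
#define ALIGN(x, block_size) ((~(x) + 1) & ((block_size) - 1))
```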
Loup Vaillant [Thu, 22 Mar 2018 21:36:48 +0000 (22:36 +0100)]
Replaced "Double Ratchet" by "X3DH" in the manual
The Double Ratchet algorithm has other purposes than simple forward
secrecy, is quite complicated, and relies on some prior key exchange
protocol to boot. Pointing to it wasn't good general purpose advice.
X3DH is what we were looking for. It is simple enough, and addresses
the main issues around key exchange (forward secrecy, replay attacks, and
deniability).
Loup Vaillant [Thu, 22 Mar 2018 20:51:06 +0000 (21:51 +0100)]
Adjusted the number of test vectors
There were too many test vectors. They were redundant, took more time
to test, and bloated the generated vectors header file, which then took
longer to compile.
I reduced their number, while making sure they were as effective as
they used to be (maximum code coverage, and every relevant length
still tested).
For those who worry about dangerously reducing the number of tests for
Poly1305: don't. There is nothing the random tests can catch that the
official, hand crafted test vectors cannot. The random tests don't
really test the core algorithm, they test the loading code.
Loup Vaillant [Thu, 22 Mar 2018 20:47:13 +0000 (21:47 +0100)]
Corrected possible misalignment in the tests
Argon2i must have its work area aligned for uint64_t. I don't recall
any guarantee of alignment when allocating an array of uint8_t on the
stack. So we allocate on the heap instead.
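The change amounts to something like this (the helper name is mine; Argon2 blocks are 1 KiB):

```c
#include <stdint.h>
#include <stdlib.h>

// malloc() returns memory suitably aligned for any fundamental type,
// including uint64_t; a uint8_t array on the stack only guarantees an
// alignment of 1.
static uint8_t *alloc_work_area(uint32_t nb_blocks)
{
    return (uint8_t *)malloc((size_t)nb_blocks * 1024);
}
```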
Loup Vaillant [Thu, 22 Mar 2018 12:30:21 +0000 (13:30 +0100)]
Added a test vector for Argon2i
Libsodium's API doesn't let the user specify the `key` and `ad`
arguments. An implementation that flips them by mistake would still
pass the test vectors.
So I added a test vector from the reference implementation (hard coded,
to avoid dragging the whole reference implementation with us). With
that, we're sure `key` and `ad` are processed in the right order.
It wouldn't have affected security, but due diligence can't hurt.
Loup Vaillant [Tue, 6 Mar 2018 22:34:24 +0000 (23:34 +0100)]
More auditable code for Poly1305
The invariants in the comments have been updated, and a couple of
minor errors of no consequence were corrected.
The final reduction code of crypto_poly1305_final() has been modified to
facilitate audits and formal proofs. This was motivated by the
following semi-formal proof:
Loup Vaillant [Sun, 25 Feb 2018 20:00:46 +0000 (21:00 +0100)]
More readable and more flexible loading code
The loading code for Chacha20, Poly1305, Blake2b, and SHA-512 was a bit
ad hoc. This made it a bit impenetrable, as well as error prone.
Chacha20 in particular was harder than it should have been to adapt to
faster implementations that process several blocks at a time. So was
Poly1305, I think.
The loading code has been modified to conform to the following pattern:
1. Align ourselves with block boundaries
2. Process the message block by block
3. Process the remaining bytes
- The last section just calls general purpose update code. It's the only
one that's mandatory.
- The first section calls the same general purpose update code, with
just enough input to reach the next block boundary. It must be
present whenever the second section is.
- The second section does optimised block-by-block update. It needs the
first section to ensure alignment.
Each section but the last updates the input pointers and lengths, so
that later sections may assume they were the first.
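The pattern can be sketched with a toy state (a running byte sum stands in for the real compression functions; all names are illustrative):

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK 8
#define MIN(a, b) ((a) < (b) ? (a) : (b))

typedef struct {
    uint8_t  buf[BLOCK];  // partial block buffer
    size_t   idx;         // bytes currently buffered
    uint64_t sum;         // toy "state": running byte sum
} ctx_t;

static void process_block(ctx_t *ctx, const uint8_t *block)
{
    for (int i = 0; i < BLOCK; i++) { ctx->sum += block[i]; }
}

// General purpose, byte-by-byte update (sections 1 and 3 both use it).
static void update_bytes(ctx_t *ctx, const uint8_t *msg, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        ctx->buf[ctx->idx++] = msg[i];
        if (ctx->idx == BLOCK) {
            process_block(ctx, ctx->buf);
            ctx->idx = 0;
        }
    }
}

static void update(ctx_t *ctx, const uint8_t *msg, size_t len)
{
    // 1. Align ourselves with the next block boundary.
    size_t align = ctx->idx == 0 ? 0 : MIN(BLOCK - ctx->idx, len);
    update_bytes(ctx, msg, align);
    msg += align;
    len -= align;

    // 2. Process the message block by block (the optimised path).
    size_t nb_blocks = len / BLOCK;
    for (size_t i = 0; i < nb_blocks; i++) {
        process_block(ctx, msg + i * BLOCK);
    }
    msg += nb_blocks * BLOCK;
    len -= nb_blocks * BLOCK;

    // 3. Buffer the remaining bytes (general purpose, always runs).
    update_bytes(ctx, msg, len);
}
```

Feeding the same message in different splits (whole, byte by byte, or mixed) must leave the context in the same state; that is exactly what the property-based tests check.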
Tests were performed with sections 1 2 3, 1 3, and 3 alone. They all
yield the same, correct results. We could write an equivalence proof,
but the property-based tests were designed to catch mistakes in the
loading code in the first place. Maybe not worth the trouble.
Loup Vaillant [Mon, 12 Feb 2018 20:25:59 +0000 (21:25 +0100)]
Renamed crypto_aead_(un)lock to crypto_(un)lock_aead
Related to #89
This is better for consistency (now authenticated encryption always
begins by "crypto_lock" or "crypto_unlock"), and has the added benefit
of warning developers of the major breaking changes triggered by IETF
padding.
Loup Vaillant [Sat, 10 Feb 2018 19:16:22 +0000 (20:16 +0100)]
Removed division operations
This has no effect on most platforms with most modern compilers, and
makes the code slightly less readable to boot.
But.
Some compilers may fail to transform divisions by a power of two into
the relevant shift or mask. Moreover, some platforms sport a variable
time division operation.
In the name of safety against timing attacks, those operations have
been removed explicitly. Only one remains, in Argon2i, but its
operands are
not secret.
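For instance, a quotient and remainder by a power of two can be spelled as a shift and a mask, which compile to constant-time instructions (illustrative helpers, assuming 16-byte blocks):

```c
#include <stddef.h>

// len / 16 and len % 16, written without division: some platforms
// have variable-time dividers, and not every compiler strength-reduces
// the division away.
static size_t nb_blocks(size_t len) { return len >> 4; }
static size_t tail_len(size_t len)  { return len & 15; }
```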
Loup Vaillant [Sat, 10 Feb 2018 18:25:28 +0000 (19:25 +0100)]
Changed authenticated encryption to match RFC 7539
Closes #87
This is a breaking change. For data in transit, update everyone at
once. For data at rest, decrypt then re-encrypt everything. Sorry
about that. I should have thought this through earlier.
The main reason for this change is speed. While Monocypher doesn't aim
to be as fast as possible itself, it *does* aim to allow upgrades. By
ensuring that processing is aligned to block boundaries, RFC 7539
simplifies the implementation of fast algorithms.
This change brings the following benefits:
- Users who need the best possible speed can upgrade.
- The length of the additional data is now authenticated, closing a
potential minor vulnerability.
- We can use Libsodium's crypto_aead_xchacha20poly1305_ietf_encrypt to
generate test vectors.
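For reference, RFC 7539 has Poly1305 authenticate the additional data padded with zeroes to a 16-byte boundary, then the ciphertext padded the same way, then both lengths as 64-bit little-endian integers. The padding computation is just (a sketch):

```c
#include <stddef.h>

// Zero bytes needed to pad len up to the next 16-byte boundary.
static size_t pad16(size_t len)
{
    return (16 - (len & 15)) & 15;
}
```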
---
The monolithic interface stays the same. Function names, types, and
buffer sizes, are identical. Just recompile your programs to upgrade.
The incremental interface has been changed to be more orthogonal:
`crypto_lock_encrypt()` and `crypto_lock_auth()` have been removed.
There shall be one true AEAD construction, users don't need those
building blocks. Users who *really* need another AEAD construction can
write it themselves with the low-level primitives.
`crypto_lock_aead_auth()` and `crypto_unlock_aead_auth()` have been
renamed `crypto_lock_auth_ad()` and `crypto_unlock_auth_ad()`
respectively. "aead" was a misnomer, those functions only authenticate
additional data.
`crypto_lock_auth_message()` and `crypto_unlock_auth_message()` have
been added. They authenticate the cipher text. Their main use is
performing a separate authentication pass (useful when users expect a
high corruption rate).
Loup Vaillant [Fri, 9 Feb 2018 20:58:27 +0000 (21:58 +0100)]
Less provocative introductory paragraph
Some people interpret the previous wording to mean that Monocypher
already eats Libsodium's lunch, which is obviously false. Status
slap-down ensues, and I have to explain the difference between "means to"
and "does".
This also hints more clearly at the scope of Monocypher: do no more than
TweetNacl (except for password key derivation), stay small and portable
without sacrificing too much speed. (That scope may need to be clearly
stated somewhere.)
Loup Vaillant [Wed, 7 Feb 2018 22:55:51 +0000 (23:55 +0100)]
Better wording for Poly1305 security considerations
Fixes #84. I hope.
Some users *will* use Poly1305 for who knows what nefarious purpose I
haven't anticipated. This is why we expose low-level primitives in the
first place.
This may sound bikeshed-y, but Poly1305 is quite exacting. Not as bad
as AES-GCM from what I've heard, but close. So the manual must be
precise and unambiguous.
Loup Vaillant [Sat, 3 Feb 2018 22:22:25 +0000 (23:22 +0100)]
More accurate speed benchmark
Used smaller buffers to minimise the impact of cache misses in the
benchmark. Chose a size that makes Libsodium and Monocypher look best.
(There is a trade-off between start up time and throughput.)
This should highlight the algorithmic differences better. Still, the
memory access patterns are very clean, so computation tends to
dominate.
Ultimately, this makes little difference.