Loup Vaillant [Sun, 6 Oct 2019 22:45:05 +0000 (00:45 +0200)]
Fused sliding windows and scalar multiplication
At last, we saved some stack. 320 bytes on my machine, which is a bit
disappointing. We may be able to shave off a couple more, but we're
reaching the limit.
Loup Vaillant [Sun, 6 Oct 2019 21:58:38 +0000 (23:58 +0200)]
Incremental left to right sliding windows
The main loop of the scalar multiplication goes one by one, so we can't
have the sliding loop skip indices. By adding a context that keeps
track of the next needed addition (as well as its value), we'll be able
to fuse the two slides and the scalar multiplication together.
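To make the idea concrete, here is a minimal sketch, not the actual Monocypher code. It assumes unsigned windows of width 4, a 32-bit scalar, and plain integers standing in for curve points. The names slide_ctx, slide_next, and slide_multiply are made up for illustration.

```c
#include <stdint.h>

#define WIDTH 4 // window width in bits (assumption for this sketch)

typedef struct {
    int      next_index; // bit position of the next addition, -1 if none
    unsigned next_digit; // odd value to add at that position
} slide_ctx;

static int bit(uint32_t s, int i) { return (int)((s >> i) & 1); }

// Finds the next window at or below bit `from`, going left to right.
static void slide_next(slide_ctx *ctx, uint32_t scalar, int from)
{
    int high = from;
    while (high >= 0 && !bit(scalar, high)) { high--; } // skip zeroes
    if (high < 0) {
        ctx->next_index = -1; // no addition left
        ctx->next_digit = 0;
        return;
    }
    int low = high - WIDTH + 1;
    if (low < 0) { low = 0; }
    while (!bit(scalar, low)) { low++; } // end the window on a set bit
    ctx->next_index = low;
    ctx->next_digit = (scalar >> low) & ((1u << (high - low + 1)) - 1);
}

// Computes scalar * p. The main loop visits every bit, one by one;
// the context tells it when an addition is due, and how much to add.
static uint64_t slide_multiply(uint32_t scalar, uint64_t p)
{
    slide_ctx ctx;
    slide_next(&ctx, scalar, 31);
    uint64_t acc = 0;
    for (int i = 31; i >= 0; i--) {
        acc *= 2;                            // doubling
        if (i == ctx.next_index) {
            acc += ctx.next_digit * p;       // addition
            slide_next(&ctx, scalar, i - 1); // look for the next one
        }
    }
    return acc;
}
```

Only the context lives on the stack. The digits are never stored all at once, which is where the memory savings would come from.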
Loup Vaillant [Sun, 6 Oct 2019 20:12:42 +0000 (22:12 +0200)]
Slide from left to right
Scalar multiplication goes from left to right (from MSB to
LSB). Computing the sliding windows used to go from *right to left*.
This direction mismatch forced us to keep all the signed digits in
memory, which currently incurs a little over 500 bytes of stack overhead.
That overhead is avoidable. Avoiding it will allow Monocypher to fit in
smaller embedded devices.
Right now we just change the direction of the sliding. Interleaving will
come later.
Those are easily visible through the QtCreator IDE intellisense, but
somehow never showed up when compiling at the command line. This should
help silence MSVC warnings as well.
Turns out compilers don't do this naturally, and this leads to
observable slow downs in some cases.
Also noted that we are relying on 2's complement representation (we
already were). We could be more portable by going unsigned, but by this
logic the entire field arithmetic should go unsigned. It's possible,
but it's not trivial. I've kinda tried it in the past, and failed.
Every architecture of interest is 2's complement anyway, so I think this
will be good enough.
Those functions are used for both X25519 and EdDSA. Moving them up one
section makes it easier for users to delete the X25519 section without
affecting EdDSA.
(Overall, Monocypher should let users delete the code they don't
need. This wasn't an explicit goal initially, but the code naturally
turned out that way. Supporting this use case cost us nothing.)
Made the 3 options (from source, from lib, system wide installation)
clearer, and stated the ability to change compilation flags explicitly.
(Those flags are all standard, but not everyone may know them).
Loup Vaillant [Thu, 14 Mar 2019 22:45:44 +0000 (23:45 +0100)]
Clarified why some buffers are not wiped
It was not clear why ge_msub() and ge_double_scalarmult_vartime() don't
wipe their buffers. I have added warnings that they indeed don't do so,
and thus should not be used to process secrets.
This also makes clear to auditors that failing to wipe the buffers was
intentional.
Loup Vaillant [Wed, 13 Mar 2019 23:10:26 +0000 (00:10 +0100)]
Improved the key exchange API
crypto_kex_ctx is now differentiated into a client specific context, and
a server specific context. The distinction is entirely artificial (it's
the same thing under the hood), but it prevents some misuses at compile
time, making the API easier to use.
The names of the arguments have also been changed: "local" and "remote"
have been replaced by "client" and "server" whenever appropriate. The
previous names made implementation easier, but their meaning was context
dependent, and thus confusing. The new names have stable meanings, and
are thus easier to document and use.
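A minimal sketch of the wrapper trick, with hypothetical names (the demo_ prefix marks everything that is not the real API, and the context's contents are made up):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Shared state (hypothetical layout).
typedef struct {
    uint8_t transcript[128];
    size_t  transcript_size;
} demo_kex_ctx;

// Same thing under the hood, but two distinct types.
typedef struct { demo_kex_ctx ctx; } demo_kex_client_ctx;
typedef struct { demo_kex_ctx ctx; } demo_kex_server_ctx;

static void demo_kex_init(demo_kex_ctx *ctx)
{
    memset(ctx->transcript, 0, sizeof(ctx->transcript));
    ctx->transcript_size = 0;
}

// Each side only accepts its own context type.
static void demo_kex_client_init(demo_kex_client_ctx *client)
{
    demo_kex_init(&client->ctx);
}

static void demo_kex_server_init(demo_kex_server_ctx *server)
{
    demo_kex_init(&server->ctx);
}
```

Passing a demo_kex_server_ctx pointer to demo_kex_client_init() is a pointer type mismatch, which the compiler diagnoses.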
Fabio Scotoni [Tue, 12 Mar 2019 06:17:39 +0000 (07:17 +0100)]
man: fix whitespace and macro invocation issues
- There was some trailing whitespace on some of the lines of the new
pages that I hadn't noticed.
- There was a .PP instead of .Pp.
- There was a .Fa with no space after it.
Loup Vaillant [Mon, 4 Mar 2019 22:20:28 +0000 (23:20 +0100)]
Corrected undefined behaviour in kex tests
Calling those functions again on the same status not only does not make
any sense, it can grow the transcript beyond its maximum size of 128
bytes, which triggers a buffer overflow. We needed to save the context
so we could re-run the relevant function where we left off.
It's the second time the TIS interpreter finds a bug that the other
sanitisers didn't.
Loup Vaillant [Sun, 3 Mar 2019 21:56:29 +0000 (22:56 +0100)]
Added secure channel protocols (experimental)
At long last, the NaCl family of crypto libraries is gaining direct
support for secure channels.
Up until now, the choices were basically to invent our own protocol, or
give up and use a TLS library, thus voiding the usability improvements
of NaCl libraries.
Now we have a solution. It's still a bit experimental, it's not yet
documented, but it's there. And soon, we will finally be able to shift
the cryptographic right answer for secure channels away from TLS, and
towards the NaCl family. Or perhaps just Monocypher, if for some reason
Libsodium doesn't follow suit. :-)
Loup Vaillant [Fri, 22 Feb 2019 20:14:06 +0000 (21:14 +0100)]
Added comment on speed tests
The way I measure timings is not perfectly portable. Users who
get weird results are encouraged to modify this bit of code to
have proper measurements.
Loup Vaillant [Sun, 17 Feb 2019 18:25:52 +0000 (19:25 +0100)]
Removed division by zero in speed benchmarks
If some library is so fast that it goes below the resolution of the
timer we're using to measure it, the measured duration may be zero, and
then trigger a division by zero when we convert it to a speed in Hz.
This could possibly happen with a very fast library (Libsodium), on a
very fast machine, with a sufficiently low resolution timer.
This patch reworks and simplifies things a bit, and adds an explicit
check. We now print "too fast to be measured" instead of dividing by
zero.
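A minimal sketch of such a check, assuming the duration is measured in microseconds. The names speed_hz and print_speed are made up for illustration, they are not the ones used in the benchmark code.

```c
#include <stdint.h>
#include <stdio.h>

// Converts the duration of one run (in microseconds) to a speed in Hz.
// Returns 0 when the duration is below the resolution of the timer.
static uint64_t speed_hz(uint64_t duration_us)
{
    if (duration_us == 0) {
        return 0; // explicit check: never divide by zero
    }
    return 1000000 / duration_us;
}

static void print_speed(const char *name, uint64_t duration_us)
{
    if (duration_us == 0) {
        printf("%s: too fast to be measured\n", name);
    } else {
        printf("%s: %llu Hz\n", name,
               (unsigned long long)speed_hz(duration_us));
    }
}
```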
Loup Vaillant [Sat, 26 Jan 2019 14:44:01 +0000 (15:44 +0100)]
Allow the test suite to customise its random seed
This will only affect the property based tests, not the test vectors
themselves. The idea is to let paranoid users run the test suite with
lots and lots of different streams of random numbers, just to be safe.
Test vector generation could undergo a similar transformation, though it
is less likely to be worth the trouble (we'd have to generate the test
vectors, compile the test suite all over again).
Loup Vaillant [Fri, 25 Jan 2019 14:43:02 +0000 (15:43 +0100)]
Link SHA-512 code when using -DED25519_SHA512
When the $CFLAGS variable contains the -DED25519_SHA512 option (by
default it doesn't), the code from src/optional/sha512.c is
automatically linked to the final libraries (libmonocypher.a and
libmonocypher.so).
That way, users who need to install an ED25519 compliant version of
Monocypher can do so simply by altering the compilation options with the
$CFLAGS variable.
Loup Vaillant [Sun, 20 Jan 2019 21:42:38 +0000 (22:42 +0100)]
Made L an array of *signed* integers
Was unsigned previously, causing a bunch of implementation defined
conversions. Every machine nowadays is 2's complement, but it's still
cleaner that way.
Loup Vaillant [Thu, 6 Dec 2018 00:04:37 +0000 (01:04 +0100)]
Decoupled window widths, minimised stack usage
The width of the pre-computed window affects the program size. It has
been set to 5 (8 elements) so we can approach maximum performance
without bloating the program too much.
The width of the cached window affects the *stack* size. It has been set
to 3 (2 elements) to avoid blowing up the stack (this matters most on
embedded environments). The performance hit is measurable, yet very
reasonable.
Footgun wielders can adjust those widths as they see fit.
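The relation between width and table size can be sketched as follows. The macro names are assumptions made for this example. The formula itself follows from the numbers above: a signed window of width w only needs the odd multiples 1, 3, ..., 2^(w-1) - 1, which makes 2^(w-2) elements.

```c
#define DEMO_B_WIDTH 5 // pre-computed window: affects program size
#define DEMO_P_WIDTH 3 // cached window: affects stack size

// Number of table elements for a signed window of the given width.
#define DEMO_W_SIZE(width) (1 << ((width) - 2))

// Width 5 gives 8 elements (1, 3, ..., 15 times the point).
static const int demo_b_size = DEMO_W_SIZE(DEMO_B_WIDTH);
// Width 3 gives 2 elements (1 and 3 times the point).
static const int demo_p_size = DEMO_W_SIZE(DEMO_P_WIDTH);
```

Each extra bit of width doubles the table, so widening the cached window by one bit doubles the stack it takes.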
Loup Vaillant [Wed, 5 Dec 2018 22:16:55 +0000 (23:16 +0100)]
Parameterise sliding window width with a macro
This is more general, perhaps even more readable this way. This also
lays the groundwork for using different window widths for the
pre-computed window and the cached one. (The cached window has to be
smaller to save stack space, while the pre-computed constant is allowed
to be bigger).
Loup Vaillant [Thu, 16 Aug 2018 19:29:13 +0000 (21:29 +0200)]
Added tests for HChacha20
Not that it needed any (the XChacha20 tests were enough), but it's
easier to communicate to outsiders that HChacha20 is correct when we
have explicit test vectors.
Loup Vaillant [Wed, 15 Aug 2018 18:02:03 +0000 (20:02 +0200)]
Properly prevent S malleability
S malleability was mostly prevented in a previous commit, for reasons
that had nothing to do with S malleability. This misled users into
thinking Monocypher was not S malleable.
To avoid confusion, I properly verify that S is strictly lower than L
(the order of the curve). S malleability is no longer a thing.
We still have nonce malleability, but that one can't be helped.
Also added Wycheproof test vectors about malleability.
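A minimal sketch of such a check, assuming S is stored as 32 little-endian bytes. The function name is_below_L is made up for illustration, and the actual code may be written differently. The comparison is not constant time, which is acceptable here because signatures are public.

```c
#include <stdint.h>

// Order of the curve, in little-endian:
// 2^252 + 27742317777372353535851937790883648493
static const uint8_t L[32] = {
    0xed, 0xd3, 0xf5, 0x5c, 0x1a, 0x63, 0x12, 0x58,
    0xd6, 0x9c, 0xf7, 0xa2, 0xde, 0xf9, 0xde, 0x14,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10,
};

// Returns 1 if s is strictly lower than L, 0 otherwise.
static int is_below_L(const uint8_t s[32])
{
    // Compare from the most significant byte down.
    for (int i = 31; i >= 0; i--) {
        if (s[i] < L[i]) { return 1; }
        if (s[i] > L[i]) { return 0; }
    }
    return 0; // s == L
}
```

A verifier would reject the signature whenever is_below_L() returns 0 on its second half.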
Loup Vaillant [Sat, 11 Aug 2018 18:05:28 +0000 (20:05 +0200)]
Signed sliding windows for EdDSA
Signed sliding windows are effectively one bit wider than their unsigned
counterparts, without doubling the size of the corresponding look up
table. Going from 4-bit unsigned to 5-bit signed allowed us to save
almost 17 additions on average.
This gain is less impressive than it sounds: the whole operation still
costs 254 doublings and 56 additions, and going signed made window
construction and look up a bit slower. Overall, we barely gained 2.5%.
We could gain a bit more speed still by precomputing the look up table
for the base point, but the gains would be similar, and the costs in
code size and complexity would be even bigger.
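For illustration, here is a sketch of a signed recoding on a 32-bit scalar. It uses the textbook width-5 non-adjacent form, going from right to left, which may differ in details from what Monocypher does. The name signed_slide is made up.

```c
#include <stdint.h>

#define WIDTH 5 // signed window width: digits are odd, from -15 to 15

// Recodes `scalar` into signed digits, least significant first, such
// that scalar == sum of digits[i] * 2^i. Returns the number of digits.
// The recoding can overflow the scalar by one bit, hence 33 digits.
static int signed_slide(int8_t digits[33], uint32_t scalar)
{
    int64_t k = scalar;
    int     n = 0;
    while (k != 0) {
        int d = 0;
        if (k & 1) {
            d = (int)(k & ((1 << WIDTH) - 1)); // k modulo 32
            if (d >= (1 << (WIDTH - 1))) {
                d -= 1 << WIDTH;               // map 17..31 to -15..-1
            }
            k -= d; // k is now a multiple of 32
        }
        digits[n++] = (int8_t)d;
        k >>= 1;
    }
    return n;
}
```

Every non-zero digit is followed by at least four zeroes. The look up table only needs the 8 odd multiples from 1 to 15, since negative digits reuse the same entries (negated).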
Loup Vaillant [Sat, 11 Aug 2018 16:19:35 +0000 (18:19 +0200)]
Reduced EdDSA malleability for sliding windows
Signed sliding windows can overflow the initial scalar by one bit. This
is not a problem when the scalar is reduced modulo L, which is smaller
than 2^253. The second half of the signature however is controlled by
the attacker, and can be any value.
Legitimate signatures however always reduce modulo L. They don't really
have to, but this helps with determinism, and enables test vectors. So
we can safely reject any signature whose second half exceeds L.
This patch rejects anything above 2^253-1, thus guaranteeing that the
three most significant bits are cleared. This eliminates S malleability
in most cases, but not all. Besides, there is still nonce malleability.
Users should still assume signatures are malleable.
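Assuming the second half of the signature is stored as 32 little-endian bytes, this check boils down to a mask on the last byte. The function name below is made up for illustration.

```c
#include <stdint.h>

// Returns 1 if s <= 2^253 - 1, 0 otherwise.
// Bits 253, 254 and 255 are the three top bits of the last byte.
static int top_bits_clear(const uint8_t s[32])
{
    return (s[31] & 0xe0) == 0;
}
```

Note that L itself passes this test (its last byte is 0x10), and so does s + L for a small enough s. That is why some S malleability remains.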
Loup Vaillant [Sat, 11 Aug 2018 15:36:14 +0000 (17:36 +0200)]
EdDSA sliding windows now indicate the number
This is in preparation for signed sliding windows. Instead of choosing
-1 for doing nothing, and an index to point to the table, we write how
much we add directly (that means 0 for nothing). We divide the number
by 2 to get the index.
The double scalarmult routine doesn't handle negative values yet.
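A small sketch of the new convention, with plain integers standing in for curve points. The names are made up, and the table size of 8 is an assumption for this example. The table holds the odd multiples, so digit d lives at index d / 2.

```c
#include <stdint.h>

#define TABLE_SIZE 8 // odd multiples: 1, 3, 5, ..., 15

// table[i] holds (2*i + 1) * p
static void fill_table(uint64_t table[TABLE_SIZE], uint64_t p)
{
    for (int i = 0; i < TABLE_SIZE; i++) {
        table[i] = (uint64_t)(2 * i + 1) * p;
    }
}

// Adds digit * p to acc. The digit is either 0 (nothing to add) or an
// odd number from 1 to 15. Negative digits are not handled.
static uint64_t add_digit(uint64_t acc, const uint64_t table[TABLE_SIZE],
                          int digit)
{
    if (digit == 0) {
        return acc; // 0 means "do nothing"
    }
    return acc + table[digit / 2]; // integer division drops the low bit
}
```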