Loup Vaillant [Thu, 12 Jan 2023 18:14:04 +0000 (19:14 +0100)]
Normalise AEAD direct interface
The AEAD interface had three problems:
- Inconsistent prefix (now fixed, to "crypto_aead").
- Inconsistent ordering of arguments (now the mac is after the output
text).
- Redundant API without the additional data (now removed).
The result is less than satisfactory to be honest. If it were just me I
would delete the direct interface entirely, because the streaming one is
almost easier to use...
...save one crucial detail: the choice of the exact algorithm. The
streaming interface offers three init options, each with its pros and
cons. Users need a default, and it shall be XChacha20. Those who know
what they are doing can easily use the streaming API anyway.
Loup Vaillant [Thu, 12 Jan 2023 17:15:27 +0000 (18:15 +0100)]
Added Ed25519ph
Not sure this is such a good idea, considering how niche Ed25519ph is.
Yet someone recently paid me handsomely for the functionality, sparking
the recent overhaul of the EdDSA API that made it simpler and more
flexible than ever.
And now implementing Ed25519ph has become so cheap that I felt like
thanking them by adding it upstream.
I'm currently rethinking the AEAD API as a whole, and to be honest I'm
so happy with this streaming API that I believe it could replace the
regular API entirely.
One problem with the AEAD API is the sheer number of arguments.
`crypto_lock_aead()` and `crypto_unlock_aead()` currently have 8
arguments, comprising 6 pointers (all of the same type) and 2 sizes.
There are way too many opportunities to swap arguments and break stuff.
The streaming API however is divided into an init phase, which has only
3 arguments, and a read/write phase, which has 7, but "only" 4 pointers
to byte buffers, which I don't think we can improve much. We could try
and use a second struct similar to what we do with Argon2, but with only
7 arguments (compared to Argon2's 15) I don't think we would gain that
much readability.
As for how to use the streaming API for single shot uses, that's obvious
enough:
- Declare the context and call Init.
- Call read/write.
- Wipe the context.
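The wrapping pattern above is mechanical enough to sketch with a stand-in. The context and functions below are illustrative toys (a running checksum), NOT Monocypher's actual AEAD API; they only demonstrate the init / read-write / wipe shape of a one-shot wrapper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-in context (NOT Monocypher's AEAD context): a toy
 * streaming checksum, used only to show the one-shot wrapping pattern. */
typedef struct { uint32_t sum; } stream_ctx;

static void stream_init(stream_ctx *ctx) { ctx->sum = 0; }

static void stream_update(stream_ctx *ctx, const uint8_t *msg, size_t n)
{
	for (size_t i = 0; i < n; i++) { ctx->sum += msg[i]; }
}

static uint32_t stream_final(stream_ctx *ctx)
{
	uint32_t sum = ctx->sum;
	memset(ctx, 0, sizeof(*ctx));  /* wipe the context */
	return sum;
}

/* The one-shot wrapper: declare, init, read/write, wipe. */
static uint32_t one_shot(const uint8_t *msg, size_t n)
{
	stream_ctx ctx;
	stream_init  (&ctx);
	stream_update(&ctx, msg, n);
	return stream_final(&ctx);
}
```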
One may argue that everything else (Poly1305, BLAKE2b, SHA-512, and
HMAC) provides a one-shot API, and we should do so here as well. There's
just one problem: we don't have one init function, we have _three_.
If we provide a one-shot API, orthogonality would need all 3 variants.
That's 6 functions total (3 locks, 3 unlocks), which is a bit much,
especially since at least one of them is only provided for compatibility
with a standard I don't entirely agree with. We could of course provide
only a single one-shot API (like we do right now), but that leaves an
obvious hole in the API.
Stopping at just the 5 functions we need for everything (streaming,
one-shot, all 3 variants) is very tempting.
Loup Vaillant [Mon, 9 Jan 2023 18:10:03 +0000 (19:10 +0100)]
Simplified and unified Chacha20 API
We had 6 functions. Now we only have 3.
While the basic variants are a bit more convenient to use, I don't
expect users will be using them frequently enough for it to matter. But
having 6 functions to choose from instead of 3 is in my opinion a
non-negligible cost.
Then there's HChacha20, the odd one out. While we're here breaking the
API left and right, I figured I needed a stable naming scheme for
everything. And I think each function should be named
crypto_<cluster>_<function_name>(), with relatively few clusters. And
HChacha20 quite clearly belongs to the "chacha20" cluster, even though
it's sometimes used as kind of a hash (for the extended nonce DJB only
relies on its properties as a bastardised stream cipher).
And while we're speaking clusters, I'm considering having one man page
per cluster, with no regard to whether a function is "advanced" or
not. In practice this would mean:
- Bundling HChacha20 and Chacha20 functions in the same man page. This
would help highlight how they're related.
- Bundling low-level EdDSA building blocks with the high-level
construction. We can always push the advanced stuff down the man
page, but the main point here is to make it easier to find. Oh and
we'd perhaps add the conversion to X25519 as well.
- Bundling the dirty X25519 functions together with the clean one. And
perhaps the conversion to EdDSA too.
- The Elligator functions are already documented together, but I think
they deserve their dedicated prefix. Like, "crypto_elligator_".
However we go about it, I'd like to strive towards a more systematic way
of documenting things, to the point of enabling some automatic checks as
hinted in #250.
Loup Vaillant [Sat, 7 Jan 2023 11:48:35 +0000 (12:48 +0100)]
Added X25519 -> EdDSA public key conversion
Also removed the private conversions (users can use the relevant hash
function instead), and renamed the existing conversion to fit the new
functionality set better.
Combined with the EdDSA building blocks, this should be enough to
implement XEdDSA.
Loup Vaillant [Fri, 6 Jan 2023 16:29:16 +0000 (17:29 +0100)]
Nicer Argon2 API
I believe it's hard to do any better.
- One function to rule them all.
- Inputs are all nicely organised.
- There's an easy way to omit the key and additional data.
- Argon2 user code is very clear, though a little verbose.
I believe fusing the "regular" and "extra" inputs together would not be
a good idea, because it would make the common case (no extra inputs)
either more verbose or more confusing than it is right now.
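The "regular"/"extra" split can be sketched roughly like this. Field and type names below are hypothetical stand-ins, not the final API; the point is only that the optional extras collapse to a single zeroed constant in the common case:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the input grouping (names are illustrative,
 * not Monocypher's actual declarations). */
typedef struct {
	const uint8_t *pass;  size_t pass_size;
	const uint8_t *salt;  size_t salt_size;
} argon2_inputs;

typedef struct {
	const uint8_t *key;   size_t key_size;  /* optional keyed hashing  */
	const uint8_t *ad;    size_t ad_size;   /* optional additional data */
} argon2_extras;

/* The common case: no key, no additional data. */
static const argon2_extras no_extras = { NULL, 0, NULL, 0 };

/* Typical call site stays terse: fill the mandatory inputs, pass the
 * zeroed constant for the rest. */
static argon2_inputs example_inputs(void)
{
	argon2_inputs in = {
		(const uint8_t *)"password", 8,
		(const uint8_t *)"salt",     4,
	};
	return in;
}
```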
Loup Vaillant [Sat, 31 Dec 2022 21:33:50 +0000 (22:33 +0100)]
Fixed uninitialised read UB in Argon2
The index block was declared in the block loop instead of the segment
loop. Yet it's only initialised one time out of 128 there, so most of
the time we're accessing uninitialised memory.
It still appeared to work because that block always occupied the
same spot in the stack. Only Clang's memory sanitiser and the TIS
interpreter caught this.
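A minimal analog of the scoping bug (not the actual Argon2 code): a value that is refreshed only once every 128 iterations must be declared outside the inner loop, otherwise most iterations read an uninitialised variable:

```c
#include <assert.h>

/* Correct version: `index` lives in the outer scope, so the value set
 * once every 128 iterations persists across the ones in between.  The
 * buggy version declared `int index;` inside the loop body, making its
 * value indeterminate on 127 iterations out of 128. */
static int sum_with_refresh(int nb_blocks)
{
	int sum   = 0;
	int index = 0;  /* must outlive a single loop iteration */
	for (int i = 0; i < nb_blocks; i++) {
		if (i % 128 == 0) { index = i / 128; }  /* 1 time in 128 */
		sum += index;  /* UB here if `index` were loop-local   */
	}
	return sum;
}
```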
Loup Vaillant [Fri, 30 Dec 2022 23:24:53 +0000 (00:24 +0100)]
Add Argon2d and Argon2id support
This is mostly about supporting Argon2id, because it is mandated by the
RFC, and sometimes recommended by people who know more than I do about
the threat models around passwords.
Argon2d is included as well because supporting it is practically free
(one character change and one constant).
Speaking of constants, I'm not sure whether the three `CRYPTO_ARGON2_*`
constants should be pre-processor definitions like they are now, or
proper `const uint32_t` declarations.
Loup Vaillant [Thu, 29 Dec 2022 23:06:38 +0000 (00:06 +0100)]
Fix tis-ci tests
The Argon2 tests were failing because we were allocating too much memory
on 16-bit platforms. Reducing the test from 4 lanes & 32KiB down to 2
lanes and 16KiB should fix it.
The main test suite of course still needs bigger parameters.
Loup Vaillant [Mon, 12 Dec 2022 21:21:23 +0000 (22:21 +0100)]
Reworked Argon2 API (draft)
This is a prelude to Argon2d and Argon2id support. The rationale here
is that supporting both with the current API would require way too many
functions. Using a structure also helps manage the ungodly amount of
arguments this function has.
A number of unresolved questions so far:
- Should we pass by value or by reference?
- Should we start the struct with a size field, Microsoft style?
- Should we add a version field?
- Should we keep the nb_lanes field?
- If so should we support more than one lane, even while staying single
threaded?
- Should we provide structures with default values to begin with?
This is mostly an API/ABI compatibility question. Personally I think we
should omit the size field and pass by value, it feels more convenient
in practice.
A version field would let us support future versions of Argon2 without
breaking users, but the specs are so stable nowadays that I'm not sure
this is worth the trouble. We may add it if users don't need to know
it's there.
The nb_lanes field however might be required for compatibility with the
_current_ specs, so I'm inclined to keep it even if we delay multi-lane
support indefinitely.
Default values are a difficult problem. The correct strength for
password hashing is highly context dependent: we almost always want to
choose the highest tolerable strength, and there is no one-size-fits-all.
The current manual outlines a _process_ for finding the values that work
for any given situation.
If we don't provide defaults, users have to fill out the fields
themselves, including fields that won't change often (nb_iterations), or
aren't supported yet (nb_lanes if we keep it). If we do provide
defaults, we need to choose them very carefully, and risk quick
obsolescence.
Finally, it's not clear which fields should go in the struct, and which
should be regular arguments. Right now I put fields that are
likely to stay identical from invocation to invocation in the struct.
Another possibility is to instead restrict ourselves to fields that have
a good default, which would likely demote the nb_blocks to being a
regular argument. That way users will know what parameters should be
treated as strong recommendations, and which they're supposed to choose
themselves.
Loup Vaillant [Mon, 12 Dec 2022 14:31:04 +0000 (15:31 +0100)]
More portable/consistent EdDSA verification
EdDSA has more corner cases than we would like. Up until now we didn't
pay much attention.
- The first version of Monocypher didn't check the range of S, allowing
attackers to generate valid variants of existing signatures. While it
doesn't affect the core properties of signatures, some systems rely on
a stricter security guarantee: generating a new, distinct signature
must require the private key.
- When the public key has a low-order component, there can be an
inconsistency between various verification methods. Detecting such
keys is prohibitively expensive (a full scalar multiplication), and
some systems nevertheless require that everyone agrees whether a
signature is valid or not (if they don't we risk various failures such
as network partitions).
- Further disagreement can occur if A and R use a non-canonical
encoding, though in practice this only happens when the public key has
low order (and detecting _that_ is not expensive).
There is a wide consensus that the range of S should be checked, and we
do. Where consensus is lacking is with respect to the verification
method (batch or strict equation), checking for non-canonical encodings,
and checking that A has low order.
The current version is as permissive as the consensus allows:
- It checks the range of S.
- It uses the batch equation.
- It allows non-canonical encodings for A and R.
- It allows A to have low order.
The previous version on the other hand used the strict equation, and did
not allow non-canonical encodings for R. The reasons for the current
policy are as follows:
- Everyone checks the range of S, it provides an additional security
guarantee, and it makes verification slightly faster.
- The batch equation is the only one that is consistent with batched
verification. Batch verification is important because it allows up to
2x performance gains, precisely in settings where it might be the
bottleneck (performing many verifications).
- Allowing non-canonical encodings and low order A makes the code
simpler, and makes sure we do not start rejecting signatures that were
previously accepted.
- Though these choices aren't completely RFC 8032 compliant, they _are_
consistent with at least one library out there (Zebra). Note that if
we forbade low order A, we would be consistent with Libsodium instead.
Which library we chose to be consistent with is kind of arbitrary.
The main downside for now is an 8% drop in performance. 1% can be
recovered by replacing the 3 final doublings by comparisons, but 7% come
from R decompression, which is a necessary cost of the batch equation.
I hope to overcome this loss with a lattice based optimisation [Thomas
Pornin 2020].
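Writing both checks out makes the difference concrete. With k = H(R ‖ A ‖ M) as in RFC 8032:

```latex
% Strict equation: sensitive to low-order components of A and R
[S]B = R + [k]A
% Batch equation: multiplying through by the cofactor 8 clears
% low-order components, so it accepts strictly more signatures
[8][S]B = [8]R + [8][k]A
```

This is also why the batch equation is the only one consistent with batched verification: summing many such relations only cancels reliably once low-order contributions are eliminated.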
Loup Vaillant [Wed, 7 Dec 2022 18:39:02 +0000 (19:39 +0100)]
Less error prone EdDSA verification building blocks
crypto_eddsa_r_check() is replaced by crypto_eddsa_check_equation().
This has two advantages:
- Users now only need to return the value of
crypto_eddsa_check_equation(). No need for an additional check we may
forget, much safer.
- Verifying the equation gives better optimisation opportunities.
Loup Vaillant [Fri, 2 Dec 2022 22:45:45 +0000 (23:45 +0100)]
Safer interface for EdDSA
Now the private key is 64 bytes, and is the concatenation of the seed
and the public key just like Libsodium. The idea is to make sure users
never sign messages with the wrong public key, which can leak the secret
scalar and allow forgeries.
Users who can't afford the overhead of storing 32 additional bytes for
the secret key (say they need to burn the key into expensive fuses) can
always store only the first 32 bytes, and re-derive the entire key pair
when they need it.
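The layout is simply the seed followed by the public key. A small sketch of splitting it back apart (the helper name is mine, not part of the API; re-deriving the pair from the seed would use the library's key generation function):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 64-byte secret key layout: seed (32 bytes) ‖ public key (32 bytes).
 * `split_key` is an illustrative helper, not a Monocypher function. */
static void split_key(const uint8_t secret_key[64],
                      uint8_t seed[32], uint8_t public_key[32])
{
	memcpy(seed,       secret_key,      32);
	memcpy(public_key, secret_key + 32, 32);
}
```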
Loup Vaillant [Thu, 1 Dec 2022 15:27:08 +0000 (16:27 +0100)]
Remove EdDSA incremental & custom hash API
The incremental and custom hash APIs were too complex and too niche to
justify themselves. I'm removing them in favour of a more flexible
approach: giving the basic building blocks necessary to implement EdDSA
manually.
Those building blocks comprise 5 specialised functions:
- crypto_eddsa_trim_scalar: turns 32 random bytes into a proper scalar.
- crypto_eddsa_reduce : reduces a 64-byte number modulo L.
- crypto_eddsa_mul_add : computes a multiply-add (a*b + c) modulo L.
- crypto_eddsa_scalarbase : multiplies a scalar by the base point.
- crypto_eddsa_r_check : generates R independently for verification.
These make it fairly easy to implement EdDSA (including Ed25519) in
various ways, including the streaming and custom hash functions I just
removed, replacing the deterministic nonce by a random one, or adding a
random prefix to mitigate the energy side channel in some settings.
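For reference, standard EdDSA signing maps onto these blocks roughly as follows. This is a sketch of RFC 8032's flow, not a prescribed usage:

```latex
% Expand the seed, clamp the lower half into the secret scalar:
a = \mathrm{trim\_scalar}(H(\mathit{seed})[0..31])
% Deterministic nonce, reduced modulo L:
r = \mathrm{reduce}(H(H(\mathit{seed})[32..63] \,\|\, M))
% Commitment point:
R = \mathrm{scalarbase}(r)
% Challenge, then response (mul\_add computes h \cdot a + r \bmod L):
h = \mathrm{reduce}(H(R \,\|\, A \,\|\, M))
S = \mathrm{mul\_add}(h, a, r)
% The signature is the pair (R, S).
```

Swapping the deterministic nonce for a random one, or the hash for a custom one, only changes how r and h are computed; the building blocks stay the same.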
I believe only minimal tweaks are required to implement the Edwards25519
half of RFC 8032 entirely (including the context and pre-hash variants),
as well as XEdDSA (which should only require a single Montgomery to
Edwards conversion function).
This is a prototype, and the extensibility promises remain to be tested.
Ideally that means implementing all the fancy extensions in a separate
project, and _maybe_ including some of them in the optional files.
Loup Vaillant [Tue, 29 Nov 2022 23:49:15 +0000 (00:49 +0100)]
Switch indentation from spaces to tabs.
For the longest time I had a fairly strong personal preference for
spaces. Then it came to my attention that using tabs meaningfully
increases accessibility.
As a cryptography library, Monocypher is supposed to ultimately help,
among other people, the most vulnerable among us. It would be a shame
to potentially exclude disabled contributors or auditors.
Note that this patch sometimes changes a little more than just
spacing. A few pieces of code in particular relied on indentation
width, and had to be reworked a little bit to make them tab width
agnostic.
Loup Vaillant [Mon, 28 Nov 2022 22:58:53 +0000 (23:58 +0100)]
Remove deprecated functions.
Deprecated functions are redundant with previous major branches of
Monocypher, and as such don't serve any meaningful purpose. Maintaining
old branches is cheaper anyway.
Note: this will also remove them from the manual, including on the
website when updated. We could compensate by publishing older versions
of the manual, or we could punt on it and rely on the fact that
tarballs contain the associated manual.
- Global constant should have been `static`
- Reserved identifier (double underscores)
- Loss of precision in implicit conversions
- Implicit change of sign
Users may wonder why we didn't provide the safer API from the outset.
We could explain this is for backwards compatibility, but this man page
is quite cluttered already.
TODO: the next major version of Monocypher should definitely adopt this
safer API.
- clarify NULL goes in public_key (not secret_key)
- add parenthetical note to define the term "fat public key" inline
- fix commas (I actually had to look up the rules for comma-before-but)
- avoid colloquial "we"
Though I think the option of choosing how many bits we mask off at the
end is useful, it made the code more confusing. Better leave it for the
one case where we really need it: parsing Elligator representatives,
which are represented with one less bit than usual.
This does cost us 4 lines of code, but I think they're worth it. The
previous code was too much like code golf for my comfort.
Loup Vaillant [Sun, 13 Mar 2022 09:57:52 +0000 (10:57 +0100)]
Removed redundant Valgrind tests
I think we don't need to test both -O2 and -O3. We're checking memory
errors here, and they should not differ much under different
optimisation settings (though maybe -O3 can trigger more bugs).
Also did not add myself to the list of authors, I think this trivial
change is not copyrightable.
Chris Fogelklou [Sat, 5 Mar 2022 06:33:16 +0000 (07:33 +0100)]
Run basic tests taken from the readme.md file using Github Actions.
* Add a github workflow which follows the basic install flow in the readme.md.
* Add installation of libsodium.
* Add valgrind as a dependency.
* Add llvm to list of dependencies.
Fabio Scotoni [Sun, 13 Feb 2022 05:44:40 +0000 (06:44 +0100)]
Deprecate crypto_key_exchange
See #230.
In summary:
The way crypto_key_exchange() works was not optimal.
Several combinations of (private, public) key pairs produce the same
shared secret (see 54a7ef551078dda001f94f1045e142af0fc1008a);
crypto_key_exchange() did nothing to mitigate that.
Since this is a high-level function that cannot be saved even with
documentation because it is so fundamentally misguided,
it is now deprecated (see #232).
Users are encouraged to either adopt the function into their own code
if and only if they require it for backwards compatibility,
starting with the release that removes it,
or preferably migrate to implementing an existing protocol,
such as instantiating a Noise protocol pattern or a protocol like X3DH.
Loup Vaillant [Sat, 12 Feb 2022 10:13:22 +0000 (11:13 +0100)]
Fix invsqrt() boolean return
The invsqrt() function used to return true when its operand was a
non-zero square. Thing is, it was used to determine whether it was a
square, *period*. If the operand was zero, we had the wrong answer.
This introduced two "errors" visible by the end user:
* The all-zero X25519 point could not be mapped back to an Elligator 2
representative. In practice that point is never used, but that does
not comply with the original paper.
* The (0, 1) EdDSA public key could not be serialised, causing any
signature verification associated with it to fail. We don't really
care about this one: even though the point is on the curve, it has low
order, and legitimate public keys all have prime order.
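The shape of the bug is easy to reproduce in miniature. The code below is a toy analog over the field modulo 11, not Monocypher's field arithmetic: a predicate that answers "is x a non-zero square?" silently gives the wrong answer for x == 0, because 0 is a square but has no inverse square root:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy analog of the bug, working modulo the prime 11 (illustrative
 * only).  The naive test answers "non-zero square?", which is wrong
 * when the caller really asks "square, period?". */
static bool is_nonzero_square_mod11(int x)
{
	for (int r = 1; r < 11; r++) {   /* note: skips r == 0 */
		if ((r * r) % 11 == x) { return true; }
	}
	return false;
}

/* Fixed predicate: zero is a square (0 * 0 == 0). */
static bool is_square_mod11(int x)
{
	return x == 0 || is_nonzero_square_mod11(x);
}
```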
Jens Alfke [Wed, 12 Jan 2022 19:36:44 +0000 (11:36 -0800)]
Allow headers to declare a C++ namespace
If the preprocessor symbol `MONOCYPHER_CPP_NAMESPACE` is defined,
the Monocypher headers will be wrapped in a C++ namespace instead of
in `extern "C"`. The name of the namespace is the value of the symbol.
This also requires compiling Monocypher's .c files as C++, either by:
- Using a compiler option like `-x c++`;
- Using an IDE setting, like the Identity & Type inspector in Xcode;
- Putting `#include "monocypher.c"` in a C++ file.
Loup Vaillant [Wed, 18 Aug 2021 19:09:11 +0000 (21:09 +0200)]
Inlined ge_r_check()
This saves 9 lines of code.
ge_r_check() was once separated to enhance readability, but it turns out
that inlining it saves more lines of code than expected. The code is
almost more readable now, though the comments are still needed.
MSVC doesn't like when we expand unsigned integers after a bitwise
negation. A fair distaste, except this time it's wrong. Writing
`(~x & 0xffffffff)` instead of just `~x` shows MSVC the error of its ways.
Also made a potentially lossy conversion to `int` explicit (and
explained in a comment why there is no actual loss).
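The masking trick can be shown in isolation. A minimal sketch (the helper name is mine): masking after the bitwise negation makes the intended 32-bit width explicit without changing the value:

```c
#include <assert.h>
#include <stdint.h>

/* Same value as ~x for a 32-bit operand; the explicit mask documents
 * the intended width and appeases MSVC's conversion warning. */
static uint32_t not32(uint32_t x)
{
	return ~x & 0xffffffff;
}
```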
The trick is to parse the input bytes into words at the last moment.
This simplifies incremental loading, and saves around 10 lines of code.
The performance impact is barely measurable.
Caught by TIS-CI and the latest Clang's UBSan.
Incrementing a NULL pointer, even by a zero offset, is not permitted.
This was caused by the removal of a conditional that exited early if the
message was empty.
The fix was to move the increment inside the alignment loop.
It may be a tiny bit slower, but this was the slow path already.
Users can avoid it by aligning their increments to block boundaries.
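The hazard in miniature (a sketch, not the actual Monocypher code): even `NULL + 0` is undefined behaviour in C, so the pointer must only be advanced inside a loop that is never entered when the buffer is empty:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* wrong:  msg += align_offset;  -- UB when msg == NULL and size == 0.
 * right:  advance the pointer only inside the loop, which the empty
 * message never enters. */
static size_t consume(const uint8_t *msg, size_t size)
{
	size_t counted = 0;
	while (size > 0) {  /* pointer arithmetic only on non-empty input */
		msg++; size--; counted++;
	}
	return counted;
}
```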
Previous changes caused ge_msub() to only be used for signature
verification (it was previously used for signature generation as well,
but this hurt readability). It thus became reasonable to use
temporary buffers, since we no longer have to wipe them (at a sizeable
performance cost).
The trick is the same as how ge_sub() is defined in terms of ge_add().
This saves 9 lines of code, and the performance cost is negligible.
This saved 16 lines of code. It is not clear how readability is impacted
(the calling code is a bit more complex, but it is also more local).
Overall, I believe it's a win.
Fabio Scotoni [Fri, 11 Jun 2021 18:37:46 +0000 (20:37 +0200)]
doc: Use canonical spelling of BLAKE2(b)
Spelling according to Jean-Philippe Aumasson, Samuel Neves, Zooko
Wilcox-O’Hearn, Christian Winnerlein. "BLAKE2: Simpler, Smaller, Fast as
MD5." ACNS 2013. Lecture Notes in Computer Science, vol 7954,
and also https://twitter.com/veorq/status/1396728032883322884