Loup Vaillant [Sun, 5 Mar 2023 10:25:07 +0000 (11:25 +0100)]
Split installation into 3 sub targets
Namely:
- `install-lib`: library files and headers.
- `install-doc`: man pages.
- `install-pc` : pkg-config file.
Some users don't want to install everything. Those who pull Monocypher
directly from git, despite the strong suggestion to use a tarball instead,
can be especially annoyed at the mandoc dependency.
We could also split the library and includes, but this feels overkill.
Loup Vaillant [Mon, 27 Feb 2023 13:24:05 +0000 (14:24 +0100)]
Move relevant parts of tests/utils.c to tests/gen
The main point is to remove dead code from tests/utils.c, and have the
tarball be as clean as possible. Incidentally this also cleans up the
dependencies for test vector generation.
Loup Vaillant [Sun, 26 Feb 2023 10:10:48 +0000 (11:10 +0100)]
Generate docs with `make all`
That way the docs no longer have to belong to root when we generate
them. Some more dist.sh shenanigans were required to spare tarball
users the mandoc dependency. Mostly making sure their doc is not
deleted when they `make clean`.
Loup Vaillant [Sat, 25 Feb 2023 23:36:32 +0000 (00:36 +0100)]
Give myself some copyright info
Now I'm not exactly sure where the CAVEATS section comes from. It dates
back to 2017, but it uses wording that was previously in the manual
(which was originally all mine). It is possible, though I'm not sure,
that I'm misappropriating the work of @CuleX and/or @FScoto here.
Hopefully I'm not. Or maybe future historians will judge me.
Fabio Scotoni [Sat, 25 Feb 2023 10:13:24 +0000 (11:13 +0100)]
Resolve mandoc -Tlint nits
Several of them I have ignored intentionally:
1. unknown manual section comes with the definition;
2. unusual Xr order seems to have been an intentional decision
grouping by topic rather than alphabetically,
even if admittedly unusual;
3. no blank before trailing delimiter: Fa extern const
crypto_argon2_extras crypto_argon2_no_extras;
is another intentional choice,
with no reasonable markup existing for a global variable
declaration other than Bd.
Fabio Scotoni [Sat, 25 Feb 2023 09:59:41 +0000 (10:59 +0100)]
makefile: fix install-doc
During the documentation overhaul in 85a7c3742f06ab55fdf523a7a6a9cfe5cda09837,
generating the man page symlinks was automated
(introduction of doc/gen_doc.sh, now called doc/doc_gen.sh).
However, the install-doc target of the makefile fails to account for
the source folder not necessarily existing.
This change runs doc/doc_gen.sh before attempting to install
the man pages.
This has the questionable side effect of creating a folder as root and
creating a bunch of files as root (including the HTML files,
i.e. running mandoc as root) when doing sudo make install,
but the average user will "just" install and forget about it anyway.
Loup Vaillant [Sun, 19 Feb 2023 23:31:32 +0000 (00:31 +0100)]
Fixed ctgrind
Note that Argon2 reports a "use" of uninitialised value. It does not
appear to be a secret-dependent branch, or even index, but I didn't
expect the warning.
Loup Vaillant [Fri, 10 Feb 2023 15:56:12 +0000 (16:56 +0100)]
Better doc integration
Now some checks must pass before we generate the docs:
- .Nm and .Fo names are identical.
- All .Fn names are referenced in .Nm and .Fo.
- All functions from the headers have an .Nm reference.
- No .Nm reference documents a non-existent function.
- No .Xr reference is dead.
In practice this allowed me to catch many stale references very easily.
Since those were basically mechanically caught typos, I did not update
the date nor copyright information in the affected documents.
Loup Vaillant [Fri, 3 Feb 2023 17:18:58 +0000 (18:18 +0100)]
Improved SHA-512 speed on small inputs.
Not as much as BLAKE2b because SHA-512 core is slower, but still.
The cost here is about 10 lines of code, and an additional couple
dozen bytes in the binary.
Loup Vaillant [Thu, 2 Feb 2023 23:14:15 +0000 (00:14 +0100)]
Document BLAKE2b KDF, change BLAKE2b API *again*
And optimised `crypto_blake2b_update()` on short or unaligned inputs.
This also clarifies the source code somewhat, though it's now a bit more
verbose. That verbosity does give us a 15-30% speedup on small updates
& small inputs typical of key derivation schemes.
---
The reason for the rework of the API is that the struct argument simply
does not work in practice. See, BLAKE2b has not one, but *two* typical
usages: regular hashing, and keyed hashing. There are many situations
where controlling the size of the hash is useful even when we don't use
the key, and when we do keyed hashing (for MAC or KDF), having to use
the struct is just *so* verbose and tedious.
I briefly considered having just one hash function to rule them all with
6 arguments, but regular hashing still is the main use case. So instead
of a struct or the original split, I simply have:
* `crypto_blake2b()` with a variable sized hash.
* `crypto_blake2b_keyed()` with every option.
and in practice, the 6 arguments of the keyed version are quite
manageable: output first as always, then the key, then the message.
And if arguments get too long the pointer/size pairs give us natural
line breaks.
I had to work on KDFs to realise this so thanks @samuel-lucas6 for
bringing the whole issue up.
---
Unlike SHA-512, I did *not* add an explicit KDF interface for BLAKE2b.
My reasons being:
* There is no clear standard KDF for BLAKE2b except HKDF.
* HKDF is built on top of HMAC, and HMAC is stupid when BLAKE2b already
has a keyed mode that does the exact same thing with less code and
fewer CPU cycles.
* HKDF is kind of *a little* stupid itself.
* `crypto_blake2b_keyed()` already implements KDF extract.
* `crypto_chacha20_*()` already implement KDF expand.
* I hesitate to implement a non-standard convenience function users are
unlikely to get catastrophically wrong.
So instead I updated the manual, including 3 full KDF functions lazy
users can just blindly copy & paste.
Loup Vaillant [Wed, 1 Feb 2023 22:03:06 +0000 (23:03 +0100)]
Add HKDF SHA-512
In principle, we can imitate HKDF quite easily with SHA-512 alone.
Being fully RFC compliant however is fiddly and tedious, so users who
want to do key derivation with SHA-512 in the most standard way possible
need dedicated functions.
Note the absence of a `crypto_sha512_hkdf_extract()` function, which
would be nothing more than an alias of `crypto_sha512_hmac()`. Aliases
to the equivalent incremental interface are also absent. There are pros
and cons to both the presence and absence of those aliases; I personally
prefer to leave them out.
Loup Vaillant [Mon, 30 Jan 2023 22:48:16 +0000 (23:48 +0100)]
Renamed crypto_hmac_sha512*() to crypto_sha512*_hmac()
The main goal here is to have one man page per hash, and steer users who
are reaching for hashes when trying to compute MACs or derive keys to
the right functions.
Fusing the man pages for SHA-512 and HMAC also helped reduce the total
size of the documentation.
Loup Vaillant [Thu, 26 Jan 2023 21:47:48 +0000 (22:47 +0100)]
Documentation overhaul
The API was broken left and right, and the documentation had to be
updated to reflect it. In the process, I decided to go slightly above
and beyond.
The documentation is rather daunting. One problem I had with it was the
sheer number of pages to navigate. On top of that the optional and
advanced sections didn't help. So I made a number of changes:
* I'm sick of having to update symbolic links, so I removed them all.
Instead they're generated as part of the documentation generation
process.
* I collapsed the advanced and optional sections into the main folder.
Monocypher is a low-level library, so in practice even part of the
main API is advanced. Let's stop being shy about it and document
everything at the same level.
The optional part costs almost nothing on systems that can
meaningfully use the makefile. It is now included by default.
* I organised all function names around the `crypto_<section>_*()` naming
scheme, and each section gets exactly one man page. This gives us 13
sections, 13 corresponding man pages, plus `intro.3monocypher`.
One unfortunate consequence is that some of the man pages grew extra
sections for the advanced functions that were bolted on there. I'm
not exactly sure how we should tackle them. Still, I think this is a
win overall: those fewer man pages are still easier to navigate, and I
don't think the advanced stuff really hurt the readability of the man
pages they were inserted in.
This automation is a first step towards solving #250.
The next step should be to automate some cross verification:
- Compare the header files and the man pages, spot any missing
functions. Maybe also control arguments.
- Control that all .Fo and .Fn macros list the same functions.
- Control that no cross link (.Xr) is dead.
- Control that every function name (.Fn) is declared in the header
(except in the HISTORY section). Maybe also do the same for
arguments.
Loup Vaillant [Sun, 22 Jan 2023 20:41:51 +0000 (21:41 +0100)]
Add simple limb overflow checker for Poly1305
Monocypher now grabs at the Holy Grail of Formal Verification.
It is very crude, but it works for Poly1305. Just transliterate the C
program into Python, add some preconditions and asserts, and execute the
result as Python to prove the absence of overflow (provided the
preconditions are correct).
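The same idea can be sketched in C instead of Python: replace each limb
by an upper bound, redo the arithmetic on the bounds, and flag any step
whose worst case would not fit in 64 bits. Everything below (the `bound`
type, the helper names, the example limb bounds) is hypothetical and is
not taken from the actual checker.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: worst-case value of a 64-bit variable, plus a sticky
 * flag that is cleared as soon as some step could overflow. */
typedef struct {
	uint64_t max;
	int      ok;
} bound;

static bound bound_of(uint64_t max)
{
	bound b = { max, 1 };
	return b;
}

static bound bound_mul(bound a, bound b)
{
	bound r = { 0, a.ok && b.ok };
	if (a.max != 0 && b.max > UINT64_MAX / a.max) {
		r.ok = 0;  /* worst-case product exceeds 64 bits */
	} else {
		r.max = a.max * b.max;
	}
	return r;
}

static bound bound_add(bound a, bound b)
{
	bound r = { 0, a.ok && b.ok };
	if (b.max > UINT64_MAX - a.max) {
		r.ok = 0;  /* worst-case sum exceeds 64 bits */
	} else {
		r.max = a.max + b.max;
	}
	return r;
}

/* Worst case of a sum of n products limb * factor, the shape of one
 * column of a schoolbook multiplication.  Returns 1 if it fits. */
static int column_fits(uint64_t limb_max, uint64_t factor_max, int n)
{
	bound acc = bound_of(0);
	for (int i = 0; i < n; i++) {
		acc = bound_add(acc, bound_mul(bound_of(limb_max),
		                               bound_of(factor_max)));
	}
	return acc.ok;
}
```

With 32-bit limbs and factors kept below 2^28 (bounds picked for
illustration), `column_fits(0xffffffff, 0x0fffffff, 5)` reports that a
five-term column fits, while unrestricted 32-bit factors do not.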
Fabio Scotoni [Fri, 20 Jan 2023 18:18:02 +0000 (19:18 +0100)]
Normalize crypto_blake2b markup
Struct documentation by mdoc(7) convention does not go in the synopsis.
Similarly, external constants are not part of .Nm,
which is only used for the actual function names.
Constants are marked up with .Dv,
whether they are macros or not.
Loup Vaillant [Thu, 12 Jan 2023 18:14:04 +0000 (19:14 +0100)]
Normalise AEAD direct interface
The AEAD interface had three problems:
- Inconsistent prefix (now fixed, to "crypto_aead").
- Inconsistent ordering of arguments (now the mac is after the output
text).
- Redundant API without the additional data (now removed).
The result is less than satisfactory to be honest. If it were just me I
would delete the direct interface entirely, because the streaming one is
almost easier to use...
...save one crucial detail: the choice of the exact algorithm. The
streaming interface offers three init options, each with its pros and
cons. Users need a default, and it shall be XChacha20. Those who know
what they are doing can easily use the streaming API anyway.
Loup Vaillant [Thu, 12 Jan 2023 17:15:27 +0000 (18:15 +0100)]
Added Ed25519ph
Not sure this is such a good idea, considering how niche Ed25519ph is.
Yet someone recently paid me handsomely for the functionality, sparking
the recent overhaul of the EdDSA API that made it simpler and more
flexible than ever.
And now implementing Ed25519ph has become so cheap that I felt like
thanking them by adding it upstream.
I'm currently rethinking the AEAD API as a whole, and to be honest I'm
so happy with this streaming API that I believe it could replace the
regular API entirely.
One problem with the AEAD API is the sheer number of arguments.
`crypto_lock_aead()` and `crypto_unlock_aead()` currently have 8
arguments, comprising 6 pointers (all of the same type) and 2 sizes.
There are way too many opportunities to swap arguments and break stuff.
The streaming API however is divided into an init phase, which has only
3 arguments, and a read/write phase, which has 7, but "only" 4 pointers
to byte buffers. Which I don't think we can improve much. We could try
and use a second struct similar to what we do with Argon2, but with only
7 arguments (compared to Argon2's 15) I don't think we would gain that
much readability.
As for how to use the streaming API for single shot uses, that's obvious
enough:
- Declare the context and call Init.
- Call read/write.
- Wipe the context.
One may argue that everything else (Poly1305, Blake2b, SHA-512, and
HMAC) provide a one-shot API, and we should do so here as well. There's
just one problem: we don't have one init function, we have _three_.
If we provide a one-shot API, orthogonality would need all 3 variants.
That's 6 functions total (3 locks, 3 unlocks), which is a bit much,
especially since at least one of them is only provided for compatibility
with a standard I don't entirely agree with. We could of course
provide only the single one-shot API (like we do right now), but that
leaves such an obvious hole in the API.
Stopping at just the 5 functions we need for everything (streaming,
one-shot, all 3 variants) is very tempting.
Loup Vaillant [Mon, 9 Jan 2023 18:10:03 +0000 (19:10 +0100)]
Simplified and unified Chacha20 API
We had 6 functions. Now we only have 3.
While the basic variants are a bit more convenient to use, I don't
expect users will be using them frequently enough for it to matter. But
having 6 functions to choose from instead of 3 is in my opinion a
non-negligible cost.
Then there's HChacha20, the odd one out. While we're here breaking the
API left and right, I figured I needed a stable naming scheme for
everything. And I think each function should be named
crypto_<cluster>_<function_name>(), with relatively few clusters. And
HChacha20 quite clearly belongs to the "chacha20" cluster, even though
it's sometimes used as kind of a hash (for the extended nonce DJB only
relies on its properties as a bastardised stream cipher).
And while we're speaking clusters, I'm considering having one man page
per cluster, with no regard for whether a function is "advanced" or
not. In practice this would mean:
- Bundling HChacha20 and Chacha20 functions in the same man page. This
would help highlight how they're related.
- Bundling low-level EdDSA building blocks with the high-level
construction. We can always push the advanced stuff down the man
page, but the main point here is to make it easier to find. Oh and
we'd perhaps add the conversion to X25519 as well.
- Bundling dirty X25519 functions together with the clean one. And
perhaps the conversion to EdDSA too.
- The Elligator functions are already documented together, but I think
they deserve their dedicated prefix. Like, "crypto_elligator_".
However we go about it, I'd like to strive towards a more systematic way
of documenting things, to the point of enabling some automatic checks as
hinted in #250.
Loup Vaillant [Sat, 7 Jan 2023 11:48:35 +0000 (12:48 +0100)]
Added X25519 -> EdDSA public key conversion
Also removed the private conversions (users can use the relevant hash
function instead), and renamed the existing conversion to fit the new
functionality set better.
Combined with the EdDSA building blocks, this should be enough to
implement XEdDSA.
Loup Vaillant [Fri, 6 Jan 2023 16:29:16 +0000 (17:29 +0100)]
Nicer Argon2 API
I believe it's hard to do any better.
- One function to rule them all.
- Inputs are all nicely organised.
- There's an easy way to omit the key and additional data.
- Argon2 user code is very clear, though a little verbose.
I believe fusing the "regular" and "extra" inputs together would not be
a good idea, because it would make the common case (no extra inputs)
either more verbose or more confusing than it is right now.
Loup Vaillant [Sat, 31 Dec 2022 21:33:50 +0000 (22:33 +0100)]
Fixed uninitialised read UB in Argon2
The index block was declared in the block loop instead of the segment
loop. Yet it's only initialised one time out of 128 there, so most of
the time we're accessing uninitialised memory.
It still appeared to work because that block always occupied the
same spot in the stack. Only Clang's memory sanitiser and the TIS
interpreter caught this.
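The bug pattern is easy to reproduce in miniature. The sketch below is
not Monocypher's code: the names, the fake `refresh()`, and the loop
structure are made up, and only the refresh period of 128 comes from the
description above. It shows the fixed layout, where the buffer is
declared in the outer loop and so legitimately survives between
refreshes.

```c
#include <assert.h>
#include <stdint.h>

#define REFRESH_PERIOD 128  /* the buffer is refilled once every 128 blocks */

/* Hypothetical stand-in for computing a fresh index block. */
static void refresh(uint32_t buf[REFRESH_PERIOD], uint32_t counter)
{
	for (uint32_t i = 0; i < REFRESH_PERIOD; i++) {
		buf[i] = counter * REFRESH_PERIOD + i;
	}
}

static uint64_t sum_indices(uint32_t nb_segments, uint32_t blocks_per_segment)
{
	uint64_t sum = 0;
	for (uint32_t segment = 0; segment < nb_segments; segment++) {
		/* Fixed: declared per segment, so its contents persist across
		 * the iterations of the block loop below.  Declaring it inside
		 * the block loop would start every iteration with an
		 * indeterminate buffer, refreshed only when block % 128 == 0. */
		uint32_t index_block[REFRESH_PERIOD];
		for (uint32_t block = 0; block < blocks_per_segment; block++) {
			if (block % REFRESH_PERIOD == 0) {
				refresh(index_block, block / REFRESH_PERIOD);
			}
			sum += index_block[block % REFRESH_PERIOD];
		}
	}
	return sum;
}
```

Moving the declaration one loop inwards would still compile, and would
often still produce the same sums when the compiler reuses the same
stack slot, which is why such a bug can go unnoticed without a sanitiser.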
Loup Vaillant [Fri, 30 Dec 2022 23:24:53 +0000 (00:24 +0100)]
Add Argon2d and Argon2id support
This is mostly about supporting Argon2id, because it is mandated by the
RFC, and sometimes recommended by people who know more than I do about
the threat models around passwords.
Argon2d is included as well because supporting it is practically free
(one character change and one constant).
Speaking of constants, I'm not sure whether the three `CRYPTO_ARGON2_*`
constants should be pre-processor definitions like they are now, or
proper `const uint32_t` declarations.
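One way to weigh the two options is to look at where C accepts each
form. The sketch below uses hypothetical names and values (`ALGO_*`,
`algo_id_const`), not the real `CRYPTO_ARGON2_*` definitions. It relies
on the fact that a `const` object is not an integer constant expression
in C, so only the macro form is portable in `case` labels and `#if`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names and values, for illustration only. */
#define ALGO_D  0
#define ALGO_I  1
#define ALGO_ID 2

/* The alternative: typed and visible to debuggers, but in C not an
 * integer constant expression. */
static const uint32_t algo_id_const = 2;

static const char *algo_name(uint32_t algorithm)
{
	switch (algorithm) {
	case ALGO_D:  return "Argon2d";   /* fine: macros expand to literals */
	case ALGO_I:  return "Argon2i";
	case ALGO_ID: return "Argon2id";
	/* "case algo_id_const:" is not portable C */
	default:      return "unknown";
	}
}

/* Ordinary comparisons work equally well with either form. */
static int is_id(uint32_t algorithm)
{
	return algorithm == algo_id_const;
}
```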
Loup Vaillant [Thu, 29 Dec 2022 23:06:38 +0000 (00:06 +0100)]
Fix tis-ci tests
The Argon2 tests were failing because we were allocating too much memory
on 16-bit platforms. Reducing the test from 4 lanes & 32KiB down to 2
lanes and 16KiB should fix it.
The main test suite of course still needs bigger parameters.
Loup Vaillant [Mon, 12 Dec 2022 21:21:23 +0000 (22:21 +0100)]
Reworked Argon2 API (draft)
This is a prelude to Argon2d and Argon2id support. The rationale here
is that supporting both with the current API would require way too many
functions. Using a structure also helps manage the ungodly number of
arguments this function has.
A number of unresolved questions so far:
- Should we pass by value or by reference?
- Should we start the struct with a size field, Microsoft style?
- Should we add a version field?
- Should we keep the nb_lanes field?
- If so should we support more than one lane, even while staying single
threaded?
- Should we provide structures with default values to begin with?
This is mostly an API/ABI compatibility question. Personally I think we
should omit the size field and pass by value, it feels more convenient
in practice.
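To make the first two questions concrete, here is a rough sketch of both
styles. The struct names, the function names, and the placeholder
computation are all hypothetical; only the general shape (a size field
checked by the callee versus a plain struct passed by value) is the
point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical configuration struct; field names are illustrative. */
typedef struct {
	uint32_t nb_blocks;
	uint32_t nb_iterations;
	uint32_t nb_lanes;
} config_plain;

/* "Microsoft style": the caller records the size of the struct it was
 * compiled against, so a newer library can detect an older layout. */
typedef struct {
	size_t   size;
	uint32_t nb_blocks;
	uint32_t nb_iterations;
	uint32_t nb_lanes;
} config_sized;

/* Pass by value: no NULL to check, and call sites can build the
 * argument in place with a compound literal. */
static uint64_t work_plain(config_plain c)
{
	return (uint64_t)c.nb_blocks * c.nb_iterations;
}

/* Pass by reference with a size check: returns 0 on layout mismatch. */
static uint64_t work_sized(const config_sized *c)
{
	if (c == NULL || c->size != sizeof(*c)) {
		return 0;
	}
	return (uint64_t)c->nb_blocks * c->nb_iterations;
}
```

The by-value version has nothing to validate, which is part of why it
feels more convenient; the sized version buys ABI flexibility at the
price of an extra field the caller must fill in correctly.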
A version field would let us support future versions of Argon2 without
breaking users, but the specs are so stable nowadays that I'm not sure
this is worth the trouble. We may add it if users don't need to know
it's there.
The nb_lanes field however might be required for compatibility with the
_current_ specs, so I'm inclined to keep it even if we delay multi-lane
support indefinitely.
Default values are a difficult problem. The correct strength for
password hashing is highly context dependent: we almost always want to
choose the highest tolerable strength, and there is no one size fits all.
The current manual outlines a _process_ for finding the values that work
for any given situation.
If we don't provide defaults, users have to fill out the fields
themselves, including fields that won't change often (nb_iterations), or
aren't supported yet (nb_lanes if we keep it). If we do provide
defaults, we need to choose them very carefully, and risk quick
obsolescence.
Finally, it's not clear which field should be in the struct, and which
field should be a regular argument. Right now I put fields that are
likely to stay identical from invocation to invocation in the struct.
Another possibility is to instead restrict ourselves to fields that have
a good default, which would likely demote the nb_blocks to being a
regular argument. That way users will know what parameters should be
treated as strong recommendations, and which they're supposed to choose
themselves.
Loup Vaillant [Mon, 12 Dec 2022 14:31:04 +0000 (15:31 +0100)]
More portable/consistent EdDSA verification
EdDSA has more corner cases than we would like. Up until now we didn't
pay much attention.
- The first version of Monocypher didn't check the range of S, allowing
attackers to generate valid variants of existing signatures. While it
doesn't affect the core properties of signatures, some systems rely on
a stricter security guarantee: generating a new, distinct signature
must require the private key.
- When the public key has a low-order component, there can be an
inconsistency between various verification methods. Detecting such
keys is prohibitively expensive (a full scalar multiplication), and
some systems nevertheless require that everyone agrees whether a
signature is valid or not (if they don't we risk various failures such
as network partitions).
- Further disagreement can occur if A and R use a non-canonical
encoding, though in practice this only happens when the public key has
low order (and detecting _that_ is not expensive).
There is a wide consensus that the range of S should be checked, and we
do. Where consensus is lacking is with respect to the verification
method (batch or strict equation), checking for non-canonical encodings,
and checking whether A has low order.
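For reference, the range check on S amounts to comparing the second half
of the signature against the group order L. Without it, adding L to S
yields a second encoding that satisfies the same equation. The sketch
below is a minimal standalone version with a hypothetical function name;
it is not necessarily how Monocypher implements the check.

```c
#include <assert.h>
#include <stdint.h>

/* Order of the Ed25519 base point, little-endian:
 * L = 2^252 + 27742317777372353535851937790883648493 */
static const uint8_t L[32] = {
	0xed, 0xd3, 0xf5, 0x5c, 0x1a, 0x63, 0x12, 0x58,
	0xd6, 0x9c, 0xf7, 0xa2, 0xde, 0xf9, 0xde, 0x14,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10,
};

/* Returns 1 if s < L, 0 otherwise.  s is public signature data, so an
 * early exit is acceptable here. */
static int s_in_range(const uint8_t s[32])
{
	for (int i = 31; i >= 0; i--) {
		if (s[i] < L[i]) { return 1; }
		if (s[i] > L[i]) { return 0; }
	}
	return 0;  /* s == L is out of range */
}
```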
The current version is as permissive as the consensus allows:
- It checks the range of S.
- It uses the batch equation.
- It allows non-canonical encodings for A and R.
- It allows A to have low order.
The previous version on the other hand used the strict equation, and did
not allow non-canonical encodings for R. The reasons for the current
policy are as follows:
- Everyone checks the range of S, it provides an additional security
guarantee, and it makes verification slightly faster.
- The batch equation is the only one that is consistent with batched
verification. Batch verification is important because it allows up to
2x performance gains, precisely in settings where it might be the
bottleneck (performing many verifications).
- Allowing non-canonical encodings and low order A makes the code
simpler, and makes sure we do not start rejecting signatures that were
previously accepted.
- Though these choices aren't completely RFC 8032 compliant, they _are_
consistent with at least one library out there (Zebra). Note that if
we forbade low order A, we would be consistent with Libsodium instead.
Which library we chose to be consistent with is kind of arbitrary.
The main downside for now is an 8% drop in performance. 1% can be
recovered by replacing the 3 final doublings by comparisons, but 7% come
from R decompression, which is a necessary cost of the batch equation.
I hope to overcome this loss with a lattice based optimisation [Thomas
Pornin 2020].
Loup Vaillant [Wed, 7 Dec 2022 18:39:02 +0000 (19:39 +0100)]
Less error prone EdDSA verification building blocks
crypto_eddsa_r_check() is replaced by crypto_eddsa_check_equation().
This has two advantages:
- Users now only need to return the value of
crypto_eddsa_check_equation(). No need for an additional check we may
forget, much safer.
- Verifying the equation gives better optimisation opportunities.
Loup Vaillant [Fri, 2 Dec 2022 22:45:45 +0000 (23:45 +0100)]
Safer interface for EdDSA
Now the private key is 64 bytes, and is the concatenation of the seed
and the public key just like Libsodium. The idea is to make sure users
never sign messages with the wrong public key, which can leak the secret
scalar and allow forgeries.
Users who can't afford the overhead of storing 32 additional bytes for
the secret key (say they need to burn the key into expensive fuses)
can always store only the first 32 bytes, and re-derive the entire
key pair when they need it.
Loup Vaillant [Thu, 1 Dec 2022 15:27:08 +0000 (16:27 +0100)]
Remove EdDSA incremental & custom hash API
The incremental and custom hash API was too complex and too niche to
justify itself. I'm removing them in favour of a more flexible
approach: giving the basic building blocks necessary to implement EdDSA
manually.
Those building blocks comprise 5 specialised functions:
- crypto_eddsa_trim_scalar: turn 32 random bytes into a proper scalar.
- crypto_eddsa_reduce : reduces a 64-byte number modulo L.
- crypto_eddsa_mul_add : like MUL_ADD, except modulo L.
- crypto_eddsa_scalarbase : multiplies a scalar by the base point.
- crypto_eddsa_r_check : generates R independently for verification.
These make it fairly easy to implement EdDSA (including Ed25519) in
various ways, including the streaming and custom hash functions I just
removed, replacing the deterministic nonce by a random one, or adding a
random prefix to mitigate the energy side channel in some settings.
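As an illustration of the first building block, the sketch below shows
the usual Ed25519 clamping described in RFC 8032. The function name is
hypothetical, and the body is the textbook operation rather than a copy
of crypto_eddsa_trim_scalar.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Turn 32 random bytes into a clamped scalar (RFC 8032 style). */
static void trim_scalar_sketch(uint8_t out[32], const uint8_t in[32])
{
	memcpy(out, in, 32);
	out[0]  &= 0xf8;  /* clear the 3 low bits: multiple of the cofactor 8 */
	out[31] &= 0x7f;  /* clear the top bit (bit 255) */
	out[31] |= 0x40;  /* set bit 254 */
}
```

In a typical Ed25519 implementation the 32 input bytes would be the
first half of the hash of the seed, not the seed itself.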
I believe only minimal tweaks are required to implement the Edwards25519
half of RFC 8032 entirely (including the context and pre-hash variants),
as well as XEdDSA (which should only require a single Montgomery to
Edwards conversion function).
This is a prototype, and the extensibility promises remain to be tested.
Ideally that means implementing all the fancy extensions in a separate
project, and _maybe_ include some of them in the optional files.
Loup Vaillant [Tue, 29 Nov 2022 23:49:15 +0000 (00:49 +0100)]
Switch indentation from spaces to tabs.
For the longest time I had a fairly strong personal preference for
spaces. Then it came to my attention that using tabs meaningfully
increases accessibility.
As a cryptography library, Monocypher is supposed to ultimately help,
among other people, the most vulnerable among us. It would be a shame
to potentially exclude disabled contributors or auditors.
Note that this patch sometimes changes a little more than just
spacing. A few pieces of code in particular relied on indentation
width, and had to be reworked a little bit to make them tab width
agnostic.
Loup Vaillant [Mon, 28 Nov 2022 22:58:53 +0000 (23:58 +0100)]
Remove deprecated functions.
Deprecated functions are redundant with previous major branches of
Monocypher, and as such don't serve any meaningful purpose. Maintaining
old branches is cheaper anyway.
Note: this will also remove them from the manual, including on the
website when updated. We could compensate by publishing older versions
of the manual, or we could punt on it and rely on the fact that the
tarballs contain the associated manual.
- Global constant should have been `static`
- Reserved identifier (double underscores)
- Loss of precision in implicit conversions
- Implicit change of sign
Users may wonder why we didn't provide the safer API from the outset.
We could explain this is for backwards compatibility, but this man page
is quite cluttered already.
TODO: the next major version of Monocypher should definitely adopt this
safer API.
- clarify NULL goes in public_key (not secret_key)
- add parenthetical note to define the term "fat public key" inline
- fix commas (I actually had to look up the rules for comma-before-but)
- avoid colloquial "we"