With the help of a (now updated) `doc_extract_examples.sh` script.
Note: We may want to integrate this script in the test suite, if we end
up writing more documentation.
Argon2 failed to conform to the reference implementation when used with
multiple lanes, rendering it useless for this compatibility use case.
The error came from the way we select the reference set:
- On the first slice of the first pass, only the current lane is valid.
- When selecting other lanes, only fully completed segments are valid.
- The previous block of *all* lanes must be excluded.
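The three rules above can be sketched as a function that counts how many blocks of a candidate lane are eligible as references. This is a minimal sketch modelled on the selection logic of the reference implementation, not Monocypher's actual code; the name `ref_set_size` and its parameters are hypothetical.

```c
#include <stdint.h>

// Number of blocks in the candidate lane that may be referenced when
// computing block `index` of segment `slice` during pass `pass`.
// Hypothetical helper: names and signature are illustrative only.
// On the first slice of the first pass, only the current lane is valid,
// so callers are assumed to pass same_lane = 1 there.
static uint32_t ref_set_size(uint32_t pass, uint32_t slice, uint32_t index,
                             uint32_t segment_size, uint32_t lane_size,
                             int same_lane)
{
    // Fully completed segments: everything before the current slice on
    // the first pass, every other segment of the lane on later passes.
    uint32_t done = pass == 0
                  ? slice * segment_size
                  : lane_size - segment_size;
    if (same_lane) {
        // Add the blocks already written in the current segment,
        // then exclude the previous block.
        return done + index - 1;
    }
    // Other lane: completed segments only. When index == 0, the block at
    // the previous position is the last block of those segments, so it
    // must be excluded as well.
    return index == 0 ? done - 1 : done;
}
```

With 4 blocks per segment and 16 blocks per lane, the third block of the very first segment can only reference one block, while later passes see 12 completed blocks in every other lane.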
Compilers aren't magical. They need help to generate the best code.
Here we want to compute the following expression:
mask = 0xffffffff;
2 * (a & mask) * (b & mask)
The most efficient way to do this looks like this:
u64 al = (u32)a; // Truncate
u64 bl = (u32)b; // Truncate
u64 x = al * bl; // 32->64 bits multiply
u64 x2 = x << 1; // shift
return x2;
My compiler doesn't pick up on this, and performs a slower alternative
instead. Either the multiply by two uses an actual multiply instead of a
shift, or the shift is done first, forcing a more expensive 64->64
multiply. More naive compilers may even do both.
Whatever the cause, I got 5% faster code on GCC 11.3.
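As a self-contained sketch, the steps above could be written as the following C function. The name `mul_low_x2` and the typedefs are assumptions made for illustration; the actual code may be laid out differently.

```c
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

// Computes 2 * (a & 0xffffffff) * (b & 0xffffffff), modulo 2^64.
// Hypothetical name; each step is spelled out so the compiler does not
// have to discover it on its own.
static u64 mul_low_x2(u64 a, u64 b)
{
    u64 al = (u32)a;   // truncate to the low 32 bits
    u64 bl = (u32)b;   // truncate to the low 32 bits
    u64 x  = al * bl;  // 32x32->64 bits multiply, cannot overflow
    return x << 1;     // multiply by two with a shift (modulo 2^64)
}
```

The result is the same as the naive masked expression; only the hints given to the compiler differ.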
Loup Vaillant [Wed, 22 Mar 2023 22:49:16 +0000 (23:49 +0100)]
Modify Blake2b context input to byte buffer
Though it requires a (safe because it's all aligned) cast at one point,
it makes the code simpler and significantly speeds up non-aligned
incremental hashes.
Surprisingly, forgoing word-by-word loading at the beginning of the
update doesn't slow anything down, but forgoing it at the end *does*.
So while we align with block boundaries directly, we end up copying the
remaining words first, then the remaining bytes.
Loup Vaillant [Wed, 22 Mar 2023 21:51:22 +0000 (22:51 +0100)]
Rename align() to gap() to avoid confusion
The name "align" made readers believe it returns the next multiple,
while in fact it's returning how much we need to get there.
The name "gap" was suggested to me, and I haven't found better. A fully
descriptive name would likely be quite long, and wouldn't preclude the
need to look the definition up anyway. (And I suspect even now one could
guess from context.)
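For illustration, here is one way such a function could be written. It is a sketch that assumes the second argument is always a power of two; the exact signature in the source may differ.

```c
#include <stddef.h>

// Distance from x to the next multiple of pow_2 (0 if x already is one).
// Assumption: pow_2 is a power of two, so (pow_2 - 1) is a valid bit mask.
static size_t gap(size_t x, size_t pow_2)
{
    return (~x + 1) & (pow_2 - 1);  // (-x) modulo pow_2
}
```

For instance, with 8-byte words, an offset of 13 is 3 bytes short of the next word boundary, and an offset of 8 is already on one.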
Loup Vaillant [Sun, 5 Mar 2023 10:25:07 +0000 (11:25 +0100)]
Split installation into 3 sub targets
Namely:
- `install-lib`: library files and headers.
- `install-doc`: man pages.
- `install-pc` : pkg-config file.
Some users don't want to install everything. Those who pull Monocypher
directly from git, despite the strong suggestion to use a tarball
instead, can be especially annoyed at the mandoc dependency.
We could also split the library and includes, but this feels like
overkill.
Loup Vaillant [Mon, 27 Feb 2023 13:24:05 +0000 (14:24 +0100)]
Move relevant parts of tests/utils.c to tests/gen
The main point is to remove dead code from tests/utils.c, and have the
tarball be as clean as possible. Incidentally, this also cleans up the
dependencies for test vector generation.
Loup Vaillant [Sun, 26 Feb 2023 10:10:48 +0000 (11:10 +0100)]
Generate docs with `make all`
That way the docs no longer have to belong to root when we generate
them. Some more dist.sh shenanigans were required to spare tarball
users the mandoc dependency. Mostly making sure their doc is not
deleted when they `make clean`.
Loup Vaillant [Sat, 25 Feb 2023 23:36:32 +0000 (00:36 +0100)]
Give myself some copyright info
Now I'm not exactly sure where the CAVEATS section comes from. It dates
back to 2017, but it uses wording that was previously in the manual
(which was originally all mine). It is possible, though I'm not sure,
that I'm misappropriating the work of @CuleX and/or @FScoto here.
Hopefully I'm not. Or maybe future historians will judge me.
Fabio Scotoni [Sat, 25 Feb 2023 10:13:24 +0000 (11:13 +0100)]
Resolve mandoc -Tlint nits
Several of them I have ignored intentionally:
1. unknown manual section comes with the definition;
2. unusual Xr order seems to have been an intentional decision
grouping by topic rather than alphabetically,
even if admittedly unusual;
3. no blank before trailing delimiter: Fa extern const
crypto_argon2_extras crypto_argon2_no_extras;
is another intentional choice,
with no reasonable markup existing for a global variable
declaration other than Bd.
Fabio Scotoni [Sat, 25 Feb 2023 09:59:41 +0000 (10:59 +0100)]
makefile: fix install-doc
During the documentation overhaul in 85a7c3742f06ab55fdf523a7a6a9cfe5cda09837,
generating the man page symlinks was automated
(introduction of doc/gen_doc.sh, now called doc/doc_gen.sh).
However, the install-doc target of the makefile fails to account for
the source folder not necessarily existing.
This change runs doc/doc_gen.sh before attempting to install
the man pages.
This has the questionable side effect of creating a folder as root and
creating a bunch of files as root (including the HTML files,
i.e. running mandoc as root) when doing sudo make install,
but the average user will "just" install and forget about it anyway.
Loup Vaillant [Sun, 19 Feb 2023 23:31:32 +0000 (00:31 +0100)]
Fixed ctgrind
Note that Argon2 reports a "use" of uninitialised value. It does not
appear to be a secret-dependent branch, or even index, but I didn't
expect the warning.
Loup Vaillant [Fri, 10 Feb 2023 15:56:12 +0000 (16:56 +0100)]
Better doc integration
Now some checks must pass before we generate the docs:
- .Nm and .Fo names are identical.
- All .Fn names are referenced in .Nm and .Fo.
- All functions from the headers have an .Nm reference.
- No .Nm reference documents a non-existent function.
- No .Xr reference is dead.
In practice this allowed me to catch many stale references very easily.
Since those were basically mechanically caught typos, I did not update
the date nor copyright information in the affected documents.
Loup Vaillant [Fri, 3 Feb 2023 17:18:58 +0000 (18:18 +0100)]
Improved SHA-512 speed on small inputs.
Not as much as BLAKE2b because the SHA-512 core is slower, but still.
The cost here is about 10 lines of code, and an additional couple
dozen bytes in the binary.
Loup Vaillant [Thu, 2 Feb 2023 23:14:15 +0000 (00:14 +0100)]
Document BLAKE2b KDF, change BLAKE2b API *again*
And optimised `crypto_blake2b_update()` on short or unaligned inputs.
This also clarifies the source code somewhat, though it's now a bit more
verbose. That verbosity does give us a 15-30% speedup on small updates
& small inputs typical of key derivation schemes.
---
The reason for the rework of the API is that the struct argument simply
does not work in practice. See, BLAKE2b has not one, but *two* typical
usages: regular hashing, and keyed hashing. There are many situations
where controlling the size of the hash is useful even when we don't use
the key, and when we do keyed hashing (for MAC or KDF), having to use
the struct is just *so* verbose and tedious.
I briefly considered having just one hash function to rule them all with
6 arguments, but regular hashing still is the main use case. So instead
of a struct or the original split, I simply have:
* `crypto_blake2b()` with a variable sized hash.
* `crypto_blake2b_keyed()` with every option.
and in practice, the 6 arguments of the keyed version are quite
manageable: output first as always, then the key, then the message.
And if arguments get too long the pointer/size pairs give us natural
line breaks.
I had to work on KDFs to realise this so thanks @samuel-lucas6 for
bringing the whole issue up.
---
Unlike SHA-512, I did *not* add an explicit KDF interface for BLAKE2b.
My reasons being:
* There is no clear standard KDF for BLAKE2b except HKDF.
* HKDF is built on top of HMAC, and HMAC is stupid when BLAKE2b already
has a keyed mode that does the exact same thing with less code and
fewer CPU cycles.
* HKDF is kind of *a little* stupid itself.
* `crypto_blake2b_keyed()` already implements KDF extract.
* `crypto_chacha20_*()` already implement KDF expand.
* I hesitate to implement a non-standard convenience function users are
unlikely to get catastrophically wrong.
So instead I updated the manual, including 3 full KDF functions lazy
users can just blindly copy & paste.
Loup Vaillant [Wed, 1 Feb 2023 22:03:06 +0000 (23:03 +0100)]
Add HKDF SHA-512
In principle, we can imitate HKDF quite easily with SHA-512 alone.
Being fully RFC compliant however is fiddly and tedious, so users who
want to do key derivation with SHA-512 in the most standard way
possible need dedicated functions.
Note the absence of a `crypto_sha512_hkdf_extract()` function, which
would be nothing more than an alias of `crypto_sha512_hmac()`. Aliases
to the equivalent incremental interface are also absent. There are pros
and cons to both the presence and absence of those aliases; I personally
prefer to leave them out.
Loup Vaillant [Mon, 30 Jan 2023 22:48:16 +0000 (23:48 +0100)]
Renamed crypto_hmac_sha512*() to crypto_sha512*_hmac()
The main goal here is to have one man page per hash, and steer users who
are reaching for hashes when trying to compute MACs or derive keys to
the right functions.
Fusing the man pages for SHA-512 and HMAC also helped reduce the total
size of the documentation.
Loup Vaillant [Thu, 26 Jan 2023 21:47:48 +0000 (22:47 +0100)]
Documentation overhaul
The API was broken left and right, and the documentation had to be
updated to reflect it. In the process, I decided to go slightly above
and beyond.
The documentation is rather daunting. One problem I had with it was the
sheer number of pages to navigate. On top of that, the optional and
advanced sections didn't help. So I made a number of changes:
* I'm sick of having to update symbolic links, so I removed them all.
Instead they're generated as part of the documentation generation
process.
* I collapsed the advanced and optional sections into the main folder.
Monocypher is a low-level library, so in practice even part of the
main API is advanced. Let's stop being shy about it and document
everything at the same level.
The optional part costs almost nothing on systems that can
meaningfully use the makefile. It is now included by default.
* I organised all function names around the `crypto_<section>_*()` naming
scheme, and each section gets exactly one man page. This gives us 13
sections, 13 corresponding man pages, plus `intro.3monocypher`.
One unfortunate consequence is that some of the man pages grew extra
sections for the advanced functions that were bolted on there. I'm
not exactly sure how we should tackle them. Still, I think this is a
win overall: those fewer man pages are still easier to navigate, and I
don't think the advanced stuff really hurts the readability of the man
pages they were inserted in.
This automation is a first step towards solving #250.
The next step should be to automate some cross verification:
- Compare the header files and the man pages, spot any missing
functions. Maybe also control arguments.
- Control that all .Fo and .Fn macros list the same functions.
- Control that no cross link (.Xr) is dead.
- Control that every function name (.Fn) is declared in the header
(except in the HISTORY section). Maybe also do the same for
arguments.
Loup Vaillant [Sun, 22 Jan 2023 20:41:51 +0000 (21:41 +0100)]
Add simple limb overflow checker for Poly1305
Monocypher now grabs at the grail of Formal Verification.
It is very crude, but it works for Poly1305. Just transliterate the C
program into Python, add some preconditions and asserts, and execute the
result as Python to prove the absence of overflow (provided the
preconditions are correct).
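The same idea can be sketched in C rather than Python: replace every limb with an upper bound, redo the arithmetic on the bounds, and assert that the worst case still fits in 64 bits. Everything below is an illustrative assumption, including the `bound` type, the helper names, and the 4-limb column; it is not the actual checker.

```c
#include <assert.h>
#include <stdint.h>

// A value known only by its upper bound (hypothetical type).
typedef struct { uint64_t max; } bound;

// Sum of two bounded values; aborts if the worst case overflows 64 bits.
static bound b_add(bound a, bound b)
{
    assert(a.max <= UINT64_MAX - b.max);
    return (bound){ a.max + b.max };
}

// Product of two bounded values; aborts if the worst case overflows.
static bound b_mul(bound a, bound b)
{
    assert(b.max == 0 || a.max <= UINT64_MAX / b.max);
    return (bound){ a.max * b.max };
}

// Worst case of one column of a 4-limb schoolbook product:
// h[0]*r[0] + h[1]*r[1] + h[2]*r[2] + h[3]*r[3].
static bound check_column(const bound h[4], const bound r[4])
{
    bound sum = { 0 };
    for (int i = 0; i < 4; i++) {
        sum = b_add(sum, b_mul(h[i], r[i]));
    }
    return sum;
}
```

If the preconditions say, for example, that each `h` limb fits in 32 bits and each `r` limb fits in 28 bits, the column stays below 2^62 and no assertion fires. As with the Python version, the conclusion is only as good as the preconditions.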
Fabio Scotoni [Fri, 20 Jan 2023 18:18:02 +0000 (19:18 +0100)]
Normalize crypto_blake2b markup
Struct documentation by mdoc(7) convention does not go in the synopsis.
Similarly, external constants are not part of .Nm,
which is only used for the actual function names.
Constants are marked up with .Dv,
whether they are macros or not.
Loup Vaillant [Thu, 12 Jan 2023 18:14:04 +0000 (19:14 +0100)]
Normalise AEAD direct interface
The AEAD interface had three problems:
- Inconsistent prefix (now fixed, to "crypto_aead").
- Inconsistent ordering of arguments (now the mac is after the output
text).
- Redundant API without the additional data (now removed).
The result is less than satisfactory to be honest. If it were just me I
would delete the direct interface entirely, because the streaming one is
almost easier to use...
...save one crucial detail: the choice of the exact algorithm. The
streaming interface offers three init options, each with its pros and
cons. Users need a default, and it shall be XChacha20. Those who know
what they are doing can easily use the streaming API anyway.
Loup Vaillant [Thu, 12 Jan 2023 17:15:27 +0000 (18:15 +0100)]
Added Ed25519ph
Not sure this is such a good idea, considering how niche Ed25519ph is.
Yet someone recently paid me handsomely for the functionality, sparking
the recent overhaul of the EdDSA API that made it simpler and more
flexible than ever.
And now implementing Ed25519ph has become so cheap that I felt like
thanking them by adding it upstream.
I'm currently rethinking the AEAD API as a whole, and to be honest I'm
so happy with this streaming API that I believe it could replace the
regular API entirely.
One problem with the AEAD API is the sheer number of arguments.
`crypto_lock_aead()` and `crypto_unlock_aead()` currently have 8
arguments, comprising 6 pointers (all of the same type) and 2 sizes.
There are way too many opportunities to swap arguments and break stuff.
The streaming API however is divided into an init phase, which has only
3 arguments, and a read/write phase, which has 7, but "only" 4 pointers
to byte buffers. Which I don't think we can improve much. We could try
and use a second struct similar to what we do with Argon2, but with only
7 arguments (compared to Argon2's 15) I don't think we would gain that
much readability.
As for how to use the streaming API for single shot uses, that's obvious
enough:
- Declare the context and call Init.
- Call read/write.
- Wipe the context.
One may argue that everything else (Poly1305, Blake2b, SHA-512, and
HMAC) provide a one-shot API, and we should do so here as well. There's
just one problem: we don't have one init function, we have _three_.
If we provide a one-shot API, orthogonality would need all 3 variants.
That's 6 functions total (3 locks, 3 unlocks), which is a bit much,
especially since at least one of them is only provided for compatibility
with a standard I don't entirely agree with. We could of course
provide only the single one-shot API (like we do right now), but that
leaves such an obvious hole in the API.
Stopping at just the 5 functions we need for everything (streaming,
one-shot, all 3 variants) is very tempting.
Loup Vaillant [Mon, 9 Jan 2023 18:10:03 +0000 (19:10 +0100)]
Simplified and unified Chacha20 API
We had 6 functions. Now we only have 3.
While the basic variants are a bit more convenient to use, I don't
expect users will be using them frequently enough for it to matter. But
having 6 functions to choose from instead of 3 is in my opinion a
non-negligible cost.
Then there's HChacha20, the odd one out. While we're here breaking the
API left and right, I figured I needed a stable naming scheme for
everything. And I think each function should be named
crypto_<cluster>_<function_name>(), with relatively few clusters. And
HChacha20 quite clearly belongs to the "chacha20" cluster, even though
it's sometimes used as kind of a hash (for the extended nonce DJB only
relies on its properties as a bastardised stream cipher).
And while we're speaking clusters, I'm considering having one man page
per cluster, with no regard to whether a function is "advanced" or
not. In practice this would mean:
- Bundling HChacha20 and Chacha20 functions in the same man page. This
would help highlight how they're related.
- Bundling low-level EdDSA building blocks with the high-level
construction. We can always push the advanced stuff down the man
page, but the main point here is to make it easier to find. Oh and
we'd perhaps add the conversion to X25519 as well.
- Bundling dirty X25519 functions together with the clean one. And
perhaps the conversion to EdDSA too.
- The Elligator functions are already documented together, but I think
they deserve their dedicated prefix. Like, "crypto_elligator_".
However we go about it, I'd like to strive towards a more systematic way
of documenting things, to the point of enabling some automatic checks as
hinted in #250.
Loup Vaillant [Sat, 7 Jan 2023 11:48:35 +0000 (12:48 +0100)]
Added X25519 -> EdDSA public key conversion
Also removed the private conversions (users can use the relevant hash
function instead), and renamed the existing conversion to fit the new
functionality set better.
Combined with the EdDSA building blocks, this should be enough to
implement XEdDSA.