From: Loup Vaillant
Date: Sat, 10 Nov 2018 12:59:38 +0000 (+0100)
Subject: Added -DBLAKE2_NO_UNROLLING preprocessor option
X-Git-Url: https://git.codecow.com/?a=commitdiff_plain;h=2174e60ef8aaf2c9779c12e4caaab087a4e5ab35;p=Monocypher.git

Added -DBLAKE2_NO_UNROLLING preprocessor option

Less bloat, faster on some embedded platforms.
---

diff --git a/README.md b/README.md
index 4fc4938..63604e5 100644
--- a/README.md
+++ b/README.md
@@ -126,22 +126,24 @@ TweetNaCl (the default is `-O3 march=native`):
 Customisation
 -------------
 
-For simplicity, compactness, and performance reasons, Monocypher
-signatures default to EdDSA with curve25519 and Blake2b. This is
-different from the more mainstream Ed25519, which uses SHA-512
-instead.
-
-If you need Ed25519 compatibility, you must do the following:
-
-- Compile Monocypher.c with option -DED25519_SHA512.
-- Link the final program with a suitable SHA-512 implementation. You
-  can use the `sha512.c` and `sha512.h` files provided in
-  `src/optional`.
-
-Note that even though the default hash (Blake2b) is not "standard",
-you can still upgrade to faster implementations if you really need to.
-The Donna implementations of ed25519 for instance can use a custom
-hash —one test does just that.
+Monocypher has two preprocessor flags: `ED25519_SHA512` and
+`BLAKE2_NO_UNROLLING`, which are activated by compiling monocypher.c
+with the options `-DED25519_SHA512` and `-DBLAKE2_NO_UNROLLING`
+respectively.
+
+The `-DED25519_SHA512` option is a compatibility feature for public key
+signatures. The default is EdDSA with Curve25519 and Blake2b.
+Activating the option replaces it by Ed25519 (EdDSA with Curve25519 and
+SHA-512). When this option is activated, you will need to link the
+final program with a suitable SHA-512 implementation. You can use the
+`sha512.c` and `sha512.h` files provided in `src/optional`.
+
+The `-DBLAKE2_NO_UNROLLING` option is a performance tweak. By default,
+Monocypher unrolls the Blake2b inner loop, because it is over 25% faster
+on modern processors. On some embedded processors however, unrolling
+the loop makes it _slower_ (the unrolled loop is 5KB bigger, and may
+strain the instruction cache). If you're using an embedded platform,
+try this option. The binary will be smaller, perhaps even faster.
 
 
 Contributor notes
diff --git a/src/monocypher.c b/src/monocypher.c
index 40bb999..6ea9832 100644
--- a/src/monocypher.c
+++ b/src/monocypher.c
@@ -482,6 +482,8 @@ static void blake2b_compress(crypto_blake2b_ctx *ctx, int is_last_block)
         { 13, 11, 7, 14, 12, 1, 3, 9, 5, 0, 15, 4, 8, 6, 2, 10 },
         { 6, 15, 14, 9, 11, 3, 0, 8, 12, 2, 13, 7, 1, 4, 10, 5 },
         { 10, 2, 8, 4, 7, 6, 1, 5, 15, 11, 9, 14, 3, 12, 13, 0 },
+        { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 },
+        { 14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3 },
     };
 
     // init work vector
@@ -511,9 +513,15 @@ static void blake2b_compress(crypto_blake2b_ctx *ctx, int is_last_block)
         BLAKE2_G(v, 2, 7, 8, 13, input[sigma[i][12]], input[sigma[i][13]]);\
         BLAKE2_G(v, 3, 4, 9, 14, input[sigma[i][14]], input[sigma[i][15]])
 
+#ifdef BLAKE2_NO_UNROLLING
+    FOR (i, 0, 12) {
+        BLAKE2_ROUND(i);
+    }
+#else
     BLAKE2_ROUND(0); BLAKE2_ROUND(1); BLAKE2_ROUND(2); BLAKE2_ROUND(3);
     BLAKE2_ROUND(4); BLAKE2_ROUND(5); BLAKE2_ROUND(6); BLAKE2_ROUND(7);
     BLAKE2_ROUND(8); BLAKE2_ROUND(9); BLAKE2_ROUND(0); BLAKE2_ROUND(1);
+#endif
 
     // update hash
     ctx->hash[0] ^= v0 ^ v8;
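
The pattern this patch applies can be sketched in isolation. The example below is a toy, not Blake2b: `demo_table`, `DEMO_ROUND`, `demo_mix*`, and the `DEMO_NO_UNROLLING` flag are all hypothetical names invented for illustration. It tries to show why the patch appends two rows to `sigma`: the unrolled code runs rounds 0 to 9 and then 0 and 1 again, so a plain 12-iteration loop can only index the table directly if entries 10 and 11 repeat entries 0 and 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy round constants (hypothetical values, NOT the Blake2b sigma table).
 * Entries 10 and 11 repeat entries 0 and 1, so a 12-round loop can index
 * the table directly, much as the two added sigma rows allow in the patch. */
static const uint32_t demo_table[12] = {
    3, 1, 4, 1, 5, 9, 2, 6, 5, 3,
    3, 1,
};

/* One toy round: mixes a table entry into the accumulator `acc`. */
#define DEMO_ROUND(i) acc = (acc ^ demo_table[i]) * 16777619u

/* Loop form: a single copy of the round body, hence smaller code. */
uint32_t demo_mix_loop(uint32_t acc)
{
    for (size_t i = 0; i < 12; i++) {
        DEMO_ROUND(i);
    }
    return acc;
}

/* Unrolled form: twelve copies of the round body, no loop overhead. */
uint32_t demo_mix_unrolled(uint32_t acc)
{
    DEMO_ROUND(0); DEMO_ROUND(1); DEMO_ROUND(2); DEMO_ROUND(3);
    DEMO_ROUND(4); DEMO_ROUND(5); DEMO_ROUND(6); DEMO_ROUND(7);
    DEMO_ROUND(8); DEMO_ROUND(9); DEMO_ROUND(0); DEMO_ROUND(1);
    return acc;
}

/* Compile with -DDEMO_NO_UNROLLING (hypothetical flag) to pick the loop. */
uint32_t demo_mix(uint32_t acc)
{
#ifdef DEMO_NO_UNROLLING
    return demo_mix_loop(acc);
#else
    return demo_mix_unrolled(acc);
#endif
}

/* Sanity check: both forms must produce the same result. */
void demo_self_check(void)
{
    assert(demo_mix_loop(42u) == demo_mix_unrolled(42u));
}
```

Both forms compute the same value, so the flag only trades code size against loop overhead. Which one is faster on a given target is something to measure rather than assume.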