From: Loup Vaillant <loup@loup-vaillant.fr>
Date: Fri, 22 Mar 2019 20:52:30 +0000 (+0100)
Subject: Optimised Poly1305 loading code
X-Git-Url: https://git.codecow.com/?a=commitdiff_plain;h=4635859c4c75fcfdf652491375cee96216df7170;p=Monocypher.git

Optimised Poly1305 loading code

Roll the four unrolled load32_le() calls into a loop.  I haven't
looked at the assembly, but I suspect the loop is easier for the
compiler to vectorise.

This results in a 5% speed increase on my machine (Intel i5 Skylake
laptop, gcc 7.3.0).

This fix was made possible by @Sadoon-AlBader on GitHub, who submitted
pull request #118.
---
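
Note: the sketch below is not part of the patch.  It is a minimal,
standalone illustration of why the rolled loop should load the same four
words as the unrolled code it replaces.  The body of load32_le() here is
an assumed little-endian load, and load_block_unrolled(),
load_block_rolled(), loads_agree() and check_poly1305_loads() are
hypothetical names introduced only for this comparison.  It says nothing
about speed; that still has to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Assumed little-endian 32-bit load, mirroring how load32_le() is
// called in the diff (this body is a sketch, not copied from the source).
static uint32_t load32_le(const uint8_t s[4])
{
    return  (uint32_t)s[0]
         | ((uint32_t)s[1] <<  8)
         | ((uint32_t)s[2] << 16)
         | ((uint32_t)s[3] << 24);
}

// Hypothetical helper: the four explicit loads removed by the patch.
static void load_block_unrolled(uint32_t c[4], const uint8_t message[16])
{
    c[0] = load32_le(message +  0);
    c[1] = load32_le(message +  4);
    c[2] = load32_le(message +  8);
    c[3] = load32_le(message + 12);
}

// Hypothetical helper: the rolled loop added by the patch.
static void load_block_rolled(uint32_t c[4], const uint8_t message[16])
{
    for (size_t i = 0; i < 4; i++) {
        c[i] = load32_le(message + i*4);
    }
}

// Returns 1 when both versions load the same four words from `message`.
int loads_agree(const uint8_t message[16])
{
    uint32_t a[4];
    uint32_t b[4];
    load_block_unrolled(a, message);
    load_block_rolled  (b, message);
    return memcmp(a, b, sizeof a) == 0;
}

// Self-check on one arbitrary 16-byte block; call it from a test harness.
void check_poly1305_loads(void)
{
    uint8_t block[16];
    for (size_t i = 0; i < 16; i++) {
        block[i] = (uint8_t)(i * 17 + 3);
    }
    assert(loads_agree(block));
}
```

Calling check_poly1305_loads() from any test harness checks the two
versions against each other on one block.  Whether the compiler actually
vectorises the rolled form is a separate question that only the
generated assembly can answer.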

diff --git a/src/monocypher.c b/src/monocypher.c
index 55a0d4c..efa04f3 100644
--- a/src/monocypher.c
+++ b/src/monocypher.c
@@ -388,10 +388,9 @@ void crypto_poly1305_update(crypto_poly1305_ctx *ctx,
     // Process the message block by block
     size_t nb_blocks = message_size >> 4;
     FOR (i, 0, nb_blocks) {
-        ctx->c[0] = load32_le(message +  0);
-        ctx->c[1] = load32_le(message +  4);
-        ctx->c[2] = load32_le(message +  8);
-        ctx->c[3] = load32_le(message + 12);
+        FOR (i, 0, 4) {
+            ctx->c[i] = load32_le(message +  i*4);
+        }
         poly_block(ctx);
         message += 16;
     }