From: Loup Vaillant
Date: Mon, 12 Dec 2022 14:31:04 +0000 (+0100)
Subject: More portable/consistent EdDSA verification
X-Git-Url: https://git.codecow.com/?a=commitdiff_plain;h=325da52cdec24fe1e6b316544792383d21353c32;p=Monocypher.git

More portable/consistent EdDSA verification

EdDSA has more corner cases than we would like. Up until now we didn't
pay much attention:

- The first version of Monocypher didn't check the range of S, allowing
  attackers to generate valid variants of existing signatures. While
  this doesn't affect the core properties of signatures, some systems
  rely on a stricter security guarantee: generating a new, distinct
  signature must require the private key.

- When the public key has a low-order component, there can be an
  inconsistency between various verification methods. Detecting such
  keys is prohibitively expensive (a full scalar multiplication), and
  some systems nevertheless require that everyone agrees whether a
  signature is valid or not (if they don't, we risk various failures
  such as network partitions).

- Further disagreement can occur if A and R use a non-canonical
  encoding, though in practice this only happens when the public key
  has low order (and detecting _that_ is not expensive).

There is a wide consensus that the range of S should be checked, and we
do. Where consensus is lacking is with respect to the verification
method (batch or strict equation), checking for non-canonical
encodings, and checking that A has low order. The current version is as
permissive as the consensus allows:

- It checks the range of S.
- It uses the batch equation.
- It allows non-canonical encodings for A and R.
- It allows A to have low order.

The previous version on the other hand used the strict equation, and
did not allow non-canonical encodings for R.

The reasons for the current policy are as follows:

- Everyone checks the range of S, it provides an additional security
  guarantee, and it makes verification slightly faster.
- The batch equation is the only one that is consistent with batched
  verification. Batch verification is important because it allows up to
  2x performance gains, precisely in settings where it might be the
  bottleneck (performing many verifications).

- Allowing non-canonical encodings and a low-order A makes the code
  simpler, and makes sure we do not start rejecting signatures that
  were previously accepted.

- Though these choices aren't completely RFC 8032 compliant, they _are_
  consistent with at least one library out there (Zebra). Note that if
  we forbade a low-order A, we would be consistent with Libsodium
  instead. Which library we chose to be consistent with is somewhat
  arbitrary.

The main downside for now is an 8% drop in performance. 1% can be
recovered by replacing the 3 final doublings with comparisons, but 7%
comes from R decompression, which is a necessary cost of the batch
equation. I hope to overcome this loss with a lattice-based
optimisation [Thomas Pornin 2020].

Should mostly fix #248
---

diff --git a/src/monocypher.c b/src/monocypher.c
index 035cb21..2864f1f 100644
--- a/src/monocypher.c
+++ b/src/monocypher.c
@@ -2002,38 +2002,41 @@ static int slide_step(slide_ctx *ctx, int width, int i, const u8 scalar[32])
 int crypto_eddsa_check_equation(const u8 signature[64], const u8 public_key[32],
                                 const u8 h[32])
 {
-    ge A; // -public_key
+    ge minus_A; // -public_key
+    ge minus_R; // -first_half_of_signature
     const u8 *s = signature + 32;
 
-    // Check that public_key is on the curve
-    // Compute A = -public_key
-    // Prevent s malleability
+    // Check that A and R are on the curve
+    // Check that 0 <= S < L (prevents malleability)
+    // *Allow* non-canonical encoding for A and R
     {
         u32 s32[8];
         load32_le_buf(s32, s, 8);
-        if (ge_frombytes_neg_vartime(&A, public_key) || is_above_l(s32)) {
+        if (ge_frombytes_neg_vartime(&minus_A, public_key) ||
+            ge_frombytes_neg_vartime(&minus_R, signature)  ||
+            is_above_l(s32)) {
             return -1;
         }
     }
 
-    // look-up table for A
+    // look-up table for minus_A
     ge_cached lutA[P_W_SIZE];
     {
-        ge A2, tmp;
-        ge_double(&A2, &A, &tmp);
-        ge_cache(&lutA[0], &A);
+        ge minus_A2, tmp;
+        ge_double(&minus_A2, &minus_A, &tmp);
+        ge_cache(&lutA[0], &minus_A);
         FOR (i, 1, P_W_SIZE) {
-            ge_add(&tmp, &A2, &lutA[i-1]);
+            ge_add(&tmp, &minus_A2, &lutA[i-1]);
             ge_cache(&lutA[i], &tmp);
         }
     }
 
-    // A = [s]B - [h]A
+    // sum = [s]B - [h]A
     // Merged double and add ladder, fused with sliding
     slide_ctx h_slide;  slide_init(&h_slide, h);
     slide_ctx s_slide;  slide_init(&s_slide, s);
     int i = MAX(h_slide.next_check, s_slide.next_check);
-    ge *sum = &A;
+    ge *sum = &minus_A; // reuse minus_A for the sum
     ge_zero(sum);
     while (i >= 0) {
         ge tmp;
@@ -2048,10 +2051,19 @@ int crypto_eddsa_check_equation(const u8 signature[64], const u8 public_key[32],
         i--;
     }
 
-    // Compare R and A (originally [s]B - [h]A)
-    u8 r_check[32];
-    ge_tobytes(r_check, &A); // r_check = A
-    return crypto_verify32(r_check, signature); // R == R_check ? OK : fail
+    // Compare [8](sum-R) and the zero point
+    // The multiplication by 8 eliminates any low-order component
+    // and ensures consistency with batched verification.
+    ge_cached cached;
+    u8 check[32];
+    static const u8 zero_point[32] = {1}; // Point of order 1
+    ge_cache(&cached, &minus_R);
+    ge_add(sum, sum, &cached);
+    ge_double(sum, sum, &minus_R); // reuse minus_R as temporary
+    ge_double(sum, sum, &minus_R); // reuse minus_R as temporary
+    ge_double(sum, sum, &minus_R); // reuse minus_R as temporary
+    ge_tobytes(check, sum);
+    return crypto_verify32(check, zero_point);
 }
 
 // 5-bit signed comb in cached format (Niels coordinates, Z=1)
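For reference, the strict and batch equations the message contrasts can
be written out explicitly. Here B is the base point, A the public key,
R the first half of the signature, S the second half, and h the hash
reduced mod L; notation is mine, the equations themselves match the
comments in the diff:

```latex
% Strict (cofactorless) equation, used by the previous version:
R \stackrel{?}{=} [S]B - [h]A

% Batch (cofactored) equation, used by this commit; multiplying by 8
% clears any low-order component contributed by A or R, which is why
% it stays consistent with batched verification:
[8]\bigl([S]B - [h]A - R\bigr) \stackrel{?}{=} 0
```

The three `ge_double` calls at the end of the new code are exactly this
multiplication by 8, and `zero_point` is the encoding of the identity.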