Report on Learning with Errors over Rings-based HILA5 and its CCA Security

Jesús Antonio Soto Velázquez

January 24, 2018

Abstract

HILA5 is a lattice-based cryptographic primitive that was submitted to the NIST Post-Quantum competition as a Key Exchange Mechanism (KEM) and Public Key Encryption (PKE) scheme. Its structure is based on the ring learning with errors (RLWE) problem, and it claims to have not only CPA security but also CCA2 security. Although very similar to the NewHope cryptosystem, HILA5 employs a slightly different reconciliation technique and additionally proposes an error correction technique, XE5. This allows for smaller ciphertexts and a lower decryption failure rate of $2^{-128}$ when used with the same parameters as NewHope. Bernstein et al. showed in [2] that, despite the claim that HILA5 offers IND-CCA security, there is a practical chosen-ciphertext attack that an evil Bob can use to retrieve Alice's secret key.

1 Introduction

In recent years, lattice-based cryptography has gained a lot of attention due to its properties, which include provable security guarantees, the possibility of fully or somewhat homomorphic encryption, efficiency, and resistance to quantum attacks. This apparent resistance to quantum attacks has led to the submission of many proposals to the Post-Quantum competition organized by NIST to standardize a suite of quantum-safe algorithms for key exchange and public-key encryption.

One such submission was the Hila5 cryptosystem proposed by Saarinen in [4]. It builds upon the NewHope [1] cryptosystem and its parameters, and differs mainly in two aspects: the reconciliation technique used, and the addition of an error-correcting code mechanism. The Hila5 cryptosystem is instantiated as a Key Exchange Mechanism (KEM), and its author suggests that it can easily be adapted into a Public-Key Encryption scheme (PKE).

1.1 Key Exchange Mechanism

A Key Exchange Mechanism is a protocol of one or two messages used by Alice to transmit an ephemeral key to Bob. A KEM typically consists of three efficient algorithms: key generation, encapsulation, and decapsulation. Key generation is done by Alice and produces a secret key and a public key. The public key is sent to Bob, who uses it in encapsulation to create a ciphertext and an ephemeral session key. The ciphertext is sent to Alice, who uses it in decapsulation, which produces either the same session key or a failure. Thus, the final output is a key shared only by Alice and Bob.

1.2 Learning with Errors over Rings

A ring $R$ is an algebraic structure with two built-in operations, typically addition and multiplication. Under addition it has the properties of an abelian group, i.e., closure, associativity, an identity element, inverses, and commutativity. On top of that it has a second operation, e.g., multiplication, which may or may not be commutative.

Let $R$ be a ring with elements $v \in \mathbb{Z}_q^n$. Each element can be expressed as a polynomial represented by its coefficients, or intuitively as a vector $v$ of length $n$, where each position represents a single coefficient. Hila5 makes use of the polynomial basis $\mathbb{Z}_q[x]/(x^n + 1)$. Thus, all polynomials in the ring have degree at most $n - 1$, with coefficients reduced mod $q$.

Let $\chi$ be an error (discrete Gaussian) distribution tightly concentrated around zero and rounded to the nearest integer. In Hila5, the domain of this distribution is $[-16, 16]$.

Definition. Let $g \xleftarrow{u} R$ be chosen uniformly at random, and let $s, e \in R$ be chosen at random from some distribution $\chi$ which is tightly concentrated around zero. Given $(g, g \cdot s + e)$, determining $s \in R$ is known as the Ring Learning with Errors (RLWE) problem. The hardness of this problem is a function of $n$, $q$, and $\chi$.
We say that an algorithm $A$ solves RLWE with error distribution $\chi$ if, for any $s \in \mathbb{Z}_q^n$, given an arbitrary number of independent samples from $A_{s,\chi}$, $A$ outputs $s$ with high probability. In other words, $\Pr[s \leftarrow A(q, \chi, a_1, \ldots, a_n \leftarrow A_{s,\chi})] \approx 1$.

The additive error $e$ is essential to the RLWE problem. Without it, finding $s$ is easy: with $n$ equations, it is possible to recover $s$ using Gaussian elimination. When the error is included, Gaussian elimination amplifies the error, so the result carries no additional information about $s$.
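As an illustration of the sample structure $(g, g \cdot s + e)$, the following is a minimal Python sketch, not the Hila5 implementation. The toy sizes ($n = 8$, $q = 257$), the helper names, and the uniform small-error sampler standing in for $\chi$ are all assumptions made for this example; the multiplication is the plain $O(n^2)$ schoolbook method in $\mathbb{Z}_q[x]/(x^n + 1)$.

```python
import random


def ring_mul(u, v, q):
    """Schoolbook product of two polynomials in Z_q[x]/(x^n + 1)."""
    n = len(u)
    out = [0] * n
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            k = i + j
            if k < n:
                out[k] = (out[k] + ui * vj) % q
            else:
                # x^n = -1 in this ring, so wrapped-around terms change sign.
                out[k - n] = (out[k - n] - ui * vj) % q
    return out


def small_error(n, bound, rng):
    """Hypothetical stand-in for chi: uniform integers in [-bound, bound]."""
    return [rng.randint(-bound, bound) for _ in range(n)]


def rlwe_sample(s, q, bound, rng):
    """Return (g, b, e) with g uniform and b = g*s + e mod q."""
    n = len(s)
    g = [rng.randrange(q) for _ in range(n)]
    e = small_error(n, bound, rng)
    b = [(x + y) % q for x, y in zip(ring_mul(g, s, q), e)]
    return g, b, e


# Toy parameters (assumptions for illustration only).
rng = random.Random(2018)
secret = small_error(8, 16, rng)
g, b, e = rlwe_sample(secret, 257, 16, rng)
```

An attacker sees only `g` and `b`; the value `e` is returned here purely so the relation `b - g*s = e (mod q)` can be checked.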

The Ring Learning with Errors problem can be reduced to a decision variant, since it suffices to distinguish RLWE samples from entirely uniform samples. These samples are of the form $(a, b = a \cdot s + e) \in R_q \times R_q$, for $s, a, e \in R_q$. Furthermore, a reduction from the worst case to the average case is described in [3]: it suffices to solve this distinguishing task for a uniform secret $s \in \mathbb{Z}_q^n$.

Note: In Hila5, the multiplication of two elements $u, v \in R$ requires $O(n^2)$ operations. The author proposes converting the polynomial representation to the Number Theoretic Transform domain, which reduces the number of operations to $O(n \log n)$. However, this transformation matters only for implementation reasons and is not taken into account in this report.

2 Hila5 as a KEM

As previously mentioned, Hila5 can be instantiated as a Key Exchange Mechanism. The three steps are as follows:

1. (Alice) $(PK, SK) \leftarrow \mathrm{KeyGeneration}()$:
   $g \xleftarrow{u} R$
   $(sk) = a \leftarrow \chi$
   $e \leftarrow \chi$
   $A = g \cdot a + e$
   Alice sends the public key $PK = (A, g)$ to Bob.

2. (Bob) $(CT, K) \leftarrow \mathrm{Encapsulation}(pk)$:
   $b \leftarrow \chi$
   $y \leftarrow A \cdot b$
   $d, k, c \leftarrow \mathrm{SafeBits}(y)$
   $p \,\|\, z = k$
   $r \leftarrow \mathrm{XE5\_Cod}(p) \oplus z$
   $e' \leftarrow \chi$
   $B \leftarrow g \cdot b + e'$
   $K = h(h(pk) \,\|\, h(ct) \,\|\, p)$
   Bob sends $ct = B \,\|\, d \,\|\, c \,\|\, r$ to Alice.

3. (Alice) $(K') \leftarrow \mathrm{Decapsulation}(SK, CT)$:
   $x \leftarrow B \cdot a$
   $k' \leftarrow \mathrm{Select}(x, d, c)$
   $p' \,\|\, z' = k'$
   $r' \leftarrow \mathrm{XE5\_Cod}(p')$
   $p'' \leftarrow \mathrm{XE5\_Fix}(r \oplus z' \oplus r') \oplus p'$
   Return $K' = h(h(pk) \,\|\, h(ct) \,\|\, p'')$

The hash function $h(x)$ employed is SHA3-256. The encapsulation method introduces key reconciliation and a linear error correction code, XE5. In the decapsulation phase, Alice obtains an approximate key $x$, and uses the key reconciliation vector $c$ and XE5 to fix up to 5 bit mismatches. The key exchange mechanism is successful if Alice and Bob agree on the exact key, i.e., $K = K'$.

2.1 Why do we need reconciliation?

The first two steps of the encapsulation method in section 2 are straightforward: Bob creates his secret share of the key, $y = A \cdot b = (g \cdot a + e) \cdot b = gab + eb$. Later on, Alice calculates her secret share of the key, $x = B \cdot a = (g \cdot b + e') \cdot a = gab + e'a$. Since the added error is very small, $x \approx gab \approx y$, where $x$ and $y$ are two vectors in $\mathbb{Z}_q^n$. The difference is $\Delta = x - y = e'a - eb$.

Thus, Alice's and Bob's secret shares are approximately the same, but that is not good enough. To agree on an exact secret given these approximate key shares, they must perform key reconciliation. Key reconciliation makes both parties share an exact key with very high probability. From the coefficients of the vectors $x, y$, up to $n$ shared bits can be extracted. The disagreeing bits can be fixed through a binary classifier.

Since the error distribution $\chi$ is tightly centered around zero, the distribution of the distance between corresponding vector elements, $\delta_i = x_i - y_i$, is also centered around zero.

The $\mathrm{SafeBits}(y)$ function is intended to provide three things: safe bit positions ($d_i$), the value of each key bit ($k_i$), and the reconciliation value ($c_i$). Each key bit and reconciliation bit is chosen as:

$$k_i = \left\lfloor \frac{2 y_i}{q} \right\rfloor, \qquad c_i = \left\lfloor \frac{4 y_i}{q} \right\rfloor \bmod 2,$$

for $y_i$ uniform in the range $[0, q - 1]$. Naturally, $k_i$ remains private, while $c_i$ is eventually sent to Alice. Alice then uses this information to find $k'_i = k_i$.
More concretely, Alice gets $k'_i$ using $c_i$ via:

$$k'_i = \left\lfloor \frac{2}{q} \left( \left( x_i - c_i \left\lfloor \frac{q}{4} \right\rfloor + \left\lfloor \frac{q}{8} \right\rfloor \right) \bmod q \right) \right\rfloor$$

Schemes that predate Hila5 include reconciliation bits for every coefficient given by the ring dimension, but this work instead assumes that not all of them are needed. This introduces a vector $d$ of length $n$, where $d_i = 0$ means that coefficient $i$ is not needed for reconciliation. Since the distribution is biased towards zero, there are some bits that are less likely to agree. Thus, the strategy is that honest Bob selects $m$ indices in $y$ that are likely to agree. These coefficients are the ones closest to the centers of the $k = 0$ and $k = 1$ ranges, $\frac{q}{4}$ and $\frac{3q}{4}$, respectively.

To decide which bits should be included as part of the reconciliation, Bob sets $d_i = 1$ if:

$$\left| \left( y_i \bmod \left\lfloor \frac{q}{4} \right\rceil \right) - \left\lfloor \frac{q}{8} \right\rfloor \right| \le b,$$

where $b$ is the window size (range) for safe bit selection, and the rounding function is $\lfloor x \rceil = \lfloor x + \frac{1}{2} \rfloor$. The resulting size of the key is $m = wt(d)$ bits, where $wt(d)$ is the Hamming weight of the vector $d$.

3 Error Correction Code XE5

On top of the new reconciliation technique, Hila5 introduces an error correction mechanism, XE5, in order to fix any mismatches occurring during reconciliation. This error correction code runs on secret data, i.e., the session key $k$.

Definition. XE5 is a linear block code with a block size of 496 bits, of which 256 bits are payload bits $p = (p_0, p_1, \ldots, p_{255})$ and 240 bits provide redundancy $r$. The redundancy is divided into ten subcodewords $r_0, r_1, \ldots, r_9$ of varying bit length $|r_i| = L_i$. The bit length of each subcodeword $r_i$ is fixed, and the bits in each are indexed $r_{(i,0)}, r_{(i,1)}, \ldots, r_{(i,L_i - 1)}$.

The subcodeword $r_0$ satisfies a certain parity equation, and the remaining $r_1, \ldots, r_9$ satisfy a parity congruence; neither is elaborated here. However, it is relevant to note that each payload bit position $p_i$ is assigned a corresponding integer weight $w_i \in [0, 10]$ as a sum:

$$w_i = r_{(0, \lfloor i/16 \rfloor)} + \sum_{j=1}^{9} r_{(j,\, i \bmod L_j)}, \quad \text{for } 0 \le i < 256.$$

Lemma. If the message payload $p$ has only a single non-zero bit $p_e$, then $w_e = 10$ and $w_i \le 1$ for all $i \ne e$.

The lemma hints that the weight $w_i$ is useful for identifying errors in the payload. Since $w_e = 10$ is the highest value a weight can take, it suggests that the bit $p_e$ is an error and should be flipped for correction.

Definition. Given an XE5 input block $p \,\|\, r'$, we derive a redundancy check $r$ from $p$ via the parity equation and parity congruence. Moreover, the distance is $\Delta r = r \oplus r'$. The payload distance weight vector $w^{\Delta}$ is derived from $\Delta r$ via the weight sum above.
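The weight computation and the lemma can be exercised with a short Python sketch. Several things here are assumptions, since this report does not elaborate the parity equations: the subcodeword lengths in `LENGTHS` (they sum to 240 and are pairwise coprime apart from the two 16s), the reading of $r_0$ as the parity of each 16-bit payload block, and the reading of $r_1, \ldots, r_9$ as parities of payload positions congruent modulo $L_j$. These readings are inferred from the shape of the weight sum, and the function names are hypothetical. The threshold of 6 in `xe5_fix` follows the correction rule $w_i \ge 6$ used in this report.

```python
# Assumed subcodeword lengths L_0..L_9; not given in this report.
LENGTHS = [16, 16, 17, 31, 19, 29, 23, 25, 27, 37]
PAYLOAD_BITS = 256


def xe5_cod(p):
    """Redundancy (ten subcodewords) of a 256-bit payload given as a list of 0/1."""
    r = [[0] * length for length in LENGTHS]
    for i, bit in enumerate(p):
        if bit:
            r[0][i // 16] ^= 1               # block parity (assumed form of r_0)
            for j in range(1, 10):
                r[j][i % LENGTHS[j]] ^= 1    # parity of positions congruent mod L_j
    return r


def weights(dr):
    """Weight w_i in [0, 10] of every payload position for a redundancy distance dr."""
    return [
        dr[0][i // 16] + sum(dr[j][i % LENGTHS[j]] for j in range(1, 10))
        for i in range(PAYLOAD_BITS)
    ]


def xe5_fix(dr, threshold=6):
    """Bit mask of payload positions whose weight reaches the threshold."""
    return [1 if w >= threshold else 0 for w in weights(dr)]


# A single payload error at position 100: its weight is 10, all others at most 1.
single = [0] * PAYLOAD_BITS
single[100] = 1
w = weights(xe5_cod(single))
```

Because the code is linear, the redundancy distance of two payloads equals the redundancy of their XOR, so `xe5_cod(single)` plays the role of $\Delta r$ for a payload pair that differs only in bit 100.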

In general, the error correction strategy is to flip bit $p_x$ at position $x$ where $w_x = 10$. Changing each bit $p_i$ when $w_i \ge 6$ will correct up to five bit errors in a single block.

In the description of the KEM, Bob obtains $p \,\|\, z = k$ and $r \leftarrow \mathrm{XE5\_Cod}(p) \oplus z$ in the encapsulation method. Later, Alice reconstructs the key $k'$, from which she obtains $p' \,\|\, z' = k'$ and then $r' \leftarrow \mathrm{XE5\_Cod}(p')$. However, there might be some errors in the payload $p'$, so that $r \ne r'$. Therefore, applying the error correction

$$\mathrm{XE5\_Fix}(r \oplus z' \oplus r') = \mathrm{XE5\_Fix}(\mathrm{XE5\_Cod}(p) \oplus z \oplus z' \oplus \mathrm{XE5\_Cod}(p'))$$

sets correct bits to zero, due to the XOR property. Errors set bits $p_x = 1$, each of which is assigned a weight $w_i$ by the XE5 error correction mechanism; the weight is used to decide whether the bit $p_x$ contains an error or not. If the result of this operation is XORed with $p'$, the errors found in the payload $p'$ are corrected by flipping the bits at the positions where $w_i \ge 6$. What remains of this process is a corrected payload $p''$, so that if it is used to construct $K'$, then $K = K'$ with very high probability, and Alice and Bob now agree exactly on an ephemeral key.

4 Security claims

4.1 Chosen Plaintext Attack Security

Recall that the final output of a KEM is a shared key, not a plaintext. Therefore, to define security, what should be considered are the outputs of the key generation, encapsulation, and decapsulation algorithms. Considering these, a KEM (Gen, Encap, Decap) is $(t, \varepsilon)$ IND-CPA secure if for all $t$-time adversaries $A$:

$$\mathrm{Adv}^{\text{ind-cpa}}(A) = \Pr[G_0^A = 1] - \Pr[G_1^A = 1] \le \varepsilon,$$

where the security games are defined as follows:

$G_0$:
   $(sk, pk) \leftarrow \mathrm{Gen}$
   $CT \leftarrow \mathrm{Encap}(pk)$
   $K \leftarrow \mathrm{Decap}(sk, CT)$
   return $A(pk, CT, K)$

$G_1$:
   $(sk, pk) \leftarrow \mathrm{Gen}$
   $CT \leftarrow \mathrm{Encap}(pk)$
   $K \leftarrow \mathrm{Decap}(sk, CT)$
   return $A(pk, CT, K')$

In short, this means that the probability that a $t$-time adversary distinguishes between two valid triplets $(pk, CT, K)$ and $(pk, CT, K')$ is bounded by $\varepsilon$. To break the IND-CPA definition, an adversary $B$ would have to distinguish any pair of corresponding elements from the two triplets. The intuition behind Hila5 being IND-CPA secure comes from the following facts:

1. The public key is obtained via $pk = A \leftarrow g \cdot a + e$. Distinguishing between two public keys for two distinct $g, g' \xleftarrow{u} R$ is the same as solving the decisional variant of the RLWE problem, which is assumed to be hard.

2. The ciphertext CT obtained from encapsulation is the concatenation $B \,\|\, d \,\|\, c \,\|\, r$. Distinguishing two values of $B$ reduces to the same problem as in the previous point. The vectors $d$ and $c$ used for reconciliation depend heavily on the errors present in the shared key material, which come from the error distribution $\chi$. Due to the combined use of the secret key and the error, i.e., $g \cdot a + e$, it is not possible to distinguish the effects of the unknown error. Finally, the author claims that the error correction code $r$ does not impact security; intuitively, this depends on the structure of the RLWE problem.

3. The two keys $K, K'$ are made up as follows: $K = h(h(pk) \,\|\, h(ct) \,\|\, p)$ and $K' = h(h(pk) \,\|\, h(ct) \,\|\, p'')$. The first two parts of each key, PK and CT, were discussed in the previous points. The remaining parts, $p$ and $p''$, represent the first 256 bits of the secret and are derived from Bob's secret share $y = A \cdot b = (g \cdot a + e) \cdot b = gab + eb$ and Alice's secret share $x = B \cdot a = (g \cdot b + e') \cdot a = gab + e'a$, where $e, e' \leftarrow \chi$. As this falls under the structure of the RLWE decisional variant, it is indistinguishable for a $t$-time adversary $A$.

4.2 Chosen Ciphertext Attack Security

Bernstein et al. claim in [2] that Hila5 does not offer IND-CCA security, despite the implicit claim of the author of Hila5 that it does offer IND-CCA2 security.
In [4], Saarinen claims that Hila5 can be made secure against active attacks, i.e., IND-CCA2 secure, if $K$ is used as keying material for an AEAD (Authenticated Encryption with Associated Data) scheme, such as AES256-GCM or Keyak.

The main difference from the previous IND-CPA games is that evil Bob (the adversary) has partial (CCA1) or full (CCA2) access to a decapsulation oracle. In the attack proposed by Bernstein et al., evil Bob chooses non-legitimate ciphertexts to provide to Alice, and then learns something about the key from Alice's responses. This attack does not need a decapsulation oracle and only decrypts legitimate ciphertexts; thus, Hila5 would not provide IND-CCA1 security either.

The most natural way to show that the KEM does not have IND-CCA security is to give an attack for the IND-CCA games:

$G_0$:
   $(sk, pk) \leftarrow \mathrm{Gen}$
   $CT \leftarrow \mathrm{Encap}^{\mathrm{Decap}(\cdot)}(pk)$
   $K \leftarrow \mathrm{Decap}(sk, CT)$
   return $A(pk, CT, K)$

$G_1$:
   $(sk, pk) \leftarrow \mathrm{Gen}$
   $CT \leftarrow \mathrm{Encap}^{\mathrm{Decap}(\cdot)}(pk)$
   $K \leftarrow \mathrm{Decap}(sk, CT)$
   return $A(pk, CT, K')$

The attack given by Bernstein et al. is a variant of Fluhrer's chosen-ciphertext attack, which works against similar RLWE cryptosystems. In this attack, evil Bob artificially forces the first coefficient of $gab$ to be close to the edge $M$. Recall that errors prior to reconciliation are more likely to occur at the edges of the intervals. An honest Bob would set a reconciliation bit $c[0]$ for the first coefficient, but evil Bob does not. Evil Bob proceeds honestly with the rest of the bits, so he can now try to guess the first bit and see how Alice reacts to it. From this reaction, Bob is able to distinguish between 0 and 1 for the first coefficient.

Assuming he guessed correctly, Bob knows the first coefficient of $(gab)$, and with this information he can pinpoint the interval of the first coefficient of $(e'a)$. The more queries Bob makes to Alice, the smaller the interval of $(e'a)[0]$ becomes, e.g., through binary search, until he deduces the exact value of this coefficient. Once he knows this, setting $e' = 1$ reveals the first coefficient of Alice's secret key $a$.

The same procedure can be repeated for the remaining 255 key bits. It would be preferable if these could all be obtained at once with high probability, and for this there is another method.

In general, the major steps of the adapted Fluhrer's attack are:

1. Guess a small low-weight secret $b_0$ (as suggested earlier) such that the first coefficient of $(gab_0)$ is at the edge $M$. Recall that $b_0 \in R$ is chosen such that $(b_0 \cdot (g \cdot a + e))[0]$ is close to $M$.

2. For each coefficient $\delta \in [-16, \ldots, 16]$, compute $b_\delta$ such that $(gab_\delta)[0] = M + \delta$.

3. For each target coefficient of Alice's secret:
   (a) Choose $e'$ such that $(e'a)[0]$ is the target coefficient. For the first coefficient, $e' = 1$.
   (b) Perform a binary search using the $b_\delta$ to recover the target coefficient. In the case where $(e'a)[0] > \delta$, the target coefficient $(gab_\delta)[0] + (e'a)[0]$ maps to 1.

4. If, after recovering several coefficients, the results look like bad guesses, then most likely $b_0$ was a wrong guess. If so, start again from step 1. A sequence of good guesses looks like it was sampled from the error distribution $\chi$.

After successful execution of this attack, Bob learns Alice's secret key $sk$, showing that Hila5 is not secure against this particular chosen-ciphertext attack.

The only obstacle to executing this attack is the error correction code present in Hila5, i.e., XE5. Fluhrer's attack depends on detecting bit errors in Alice's shared secret, i.e., $x = B \cdot a = gab + e'a$. The application of XE5_Fix hides any bit errors present, which stops the attack momentarily. A work-around begins with evil Bob inducing a single bit flip in the payload $p$; the redundancy $z$ then has no additional errors, so any interaction with Alice has the same result whether Bob flipped that bit or not. We know that a bit is corrected whenever $w_i \ge 6$ for bit $p[i]$, and that if one non-zero bit $p[i]$ is flipped, it is assigned $w_i = 10$. Therefore, evil Bob can flip at least 5 bits in $r$, so that Alice will not be able to correct bit $p[i]$ when she computes $p'[i]$. There will then be a disagreement between the shared keys because an error is present, and the variation of Fluhrer's attack can be used as described previously.

References

[1] Erdem Alkim et al. Post-quantum key exchange - a new hope. Cryptology ePrint Archive, Report 2015/1092. https://eprint.iacr.org/2015/1092. 2015.

[2] Daniel J. Bernstein et al. HILA5 Pindakaas: On the CCA security of lattice-based encryption with error correction. Cryptology ePrint Archive, Report 2017/1214. https://eprint.iacr.org/2017/1214. 2017.

[3] Chris Peikert. Lattice Cryptography for the Internet. Cryptology ePrint Archive, Report 2014/070. https://eprint.iacr.org/2014/070. 2014.

[4] Markku-Juhani O. Saarinen. HILA5: On Reliability, Reconciliation, and Error Correction for Ring-LWE Encryption. Cryptology ePrint Archive, Report 2017/424. https://eprint.iacr.org/2017/424. 2017.