Linear Cryptanalysis. Kaisa Nyberg. Department of Computer Science Aalto University School of Science. S3, Sackville, August 11, 2015



Outline
- Linear characteristics and correlations
- Matsui's algorithms
- Traditional statistical models using normal distributions under key equivalence hypotheses
- Linear Hull theorem
- Key variance and more realistic key hypotheses

Expected outcome
This lecture provides you with the basic concepts for understanding Matsui's algorithms and more general linear cryptanalysis.
This lecture is targeted to give you the necessary (and hopefully also sufficient) prerequisites for being able to read
- the recent CRYPTO 2015 paper by Jialin Huang et al. (and possibly also the one by Bing Sun et al.)
- the FSE 2013 paper by Andrey Bogdanov and Elmar Tischhauser
and goes beyond by presenting a new comprehensive model of Matsui's Algorithm 2 that handles key variance for both wrong keys and right keys.

Section: Linear characteristics and correlations

Symmetric-key encryption
- $k \in K$ the key
- $x \in P$ the plaintext
- $y \in C$ the ciphertext
An encryption method is a family $\{E_k\}$ of transformations $E_k : P \to C$, parametrized by the key $k$, such that for each encryption transformation $E_k$ there is a decryption transformation $D_k : C \to P$ with $D_k(E_k(x)) = x$ for all $x \in P$.
The spaces are too large to allow deterministic analysis when observing data from a cipher. Therefore we use information-theoretic and statistical analysis and consider the key, plaintext and ciphertext as random variables.
Assumption: Plaintext and key are independent random variables that follow the uniform distribution.

Block cipher
The data to be encrypted is split into blocks $x_i$, $i = 1, \ldots, N$, of fixed length $n$. A typical value of $n$ is 128.
$P = C = \mathbb{Z}_2^n$, $K = \mathbb{Z}_2^l$.
For the purposes of linear cryptanalysis a block cipher is considered as a vectorial Boolean function
$$f : \mathbb{Z}_2^n \times \mathbb{Z}_2^l \to \mathbb{Z}_2^n \times \mathbb{Z}_2^l \times \mathbb{Z}_2^n, \quad f(x, k) = (x, k, E_k(x)).$$
The inner product of a mask $u = (u_1, \ldots, u_n) \in \mathbb{Z}_2^n$ and a data vector $x = (x_1, \ldots, x_n) \in \mathbb{Z}_2^n$ is computed as
$$u \cdot x = u_1 x_1 + u_2 x_2 + \cdots + u_n x_n \pmod 2.$$
A linear approximation with mask triple $(u, v, w)$ of a block cipher is the relation $u \cdot x + v \cdot k + w \cdot E_k(x)$.

Correlation
The correlation between two Boolean functions $f : \mathbb{Z}_2^n \to \mathbb{Z}_2$ and $g : \mathbb{Z}_2^n \to \mathbb{Z}_2$ is defined as
$$c(f, g) = 2^{-n} \left( \#\{x \in \mathbb{Z}_2^n \mid f(x) = g(x)\} - \#\{x \in \mathbb{Z}_2^n \mid f(x) \neq g(x)\} \right).$$
The correlation $c(f, 0)$ is called the correlation of $f(x)$ over $x$, and is also denoted by $c_x(f(x))$. Then
$$c_x(f(x)) = 2 \Pr(f(x) = 0) - 1.$$
Linear cryptanalysis makes use of Boolean functions derived from ciphers with large correlations $c_x(u \cdot x + v \cdot k + w \cdot E_k(x))$.
Useful expression:
$$c_x(f(x)) = 2^{-n} \sum_{x} (-1)^{f(x)},$$
where the sum is taken over the integers.
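The two expressions for the correlation can be compared numerically. The following is a minimal sketch, assuming masks and data are stored as Python integers and using a hypothetical 4-bit permutation `SBOX` as a stand-in for a cipher; the function names are illustrative only.

```python
def parity(v: int) -> int:
    """XOR of the bits of v; the inner product u.x is parity(u & x)."""
    return bin(v).count("1") & 1

# Hypothetical 4-bit permutation, used only as a toy stand-in for a cipher.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
N_BITS = 4

def correlation_by_counting(f) -> float:
    """c_x(f(x)) = 2^-n (#{x : f(x) = 0} - #{x : f(x) = 1})."""
    agree = sum(1 for x in range(1 << N_BITS) if f(x) == 0)
    disagree = (1 << N_BITS) - agree
    return (agree - disagree) / (1 << N_BITS)

def correlation_by_sum(f) -> float:
    """c_x(f(x)) = 2^-n sum_x (-1)^f(x), with the sum taken over the integers."""
    return sum((-1) ** f(x) for x in range(1 << N_BITS)) / (1 << N_BITS)

def approximation(u: int, w: int):
    """Boolean function x -> u.x + w.S(x) (mod 2) for the toy permutation."""
    return lambda x: parity(u & x) ^ parity(w & SBOX[x])
```

For example, `correlation_by_sum(approximation(u, w))` evaluates the correlation of one mask pair over the toy permutation; both functions return the same value for every Boolean function `f`.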

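Correlations in iterated block ciphers
We focus on key-alternating iterated block ciphers. Let $(k_1, k_2, \ldots, k_r)$ be the extended key with the round keys $k_i$ derived from the master key $k$. A key-alternating cipher $E_k$ has the following structure:
$$E_k(x) = g(\ldots g(g(g(x + k_1) + k_2) \ldots) + k_r).$$
[Figure: $x$ passes through $r$ rounds, each adding a round key $k_1, k_2, k_3, \ldots, k_r$ and then applying $g$; the output is $y$.]
Then
$$c_x(u \cdot x + w \cdot E_k(x)) = \sum_{\tau} \prod_{i=1}^{r} (-1)^{\tau_i \cdot k_i} \, c_z(\tau_i \cdot z + \tau_{i+1} \cdot g(z)),$$
where $\tau = (\tau_1 = u, \tau_2, \ldots, \tau_r, \tau_{r+1} = w)$. [JD94]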

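Proof in case r = 2
[Figure: $x$, key addition $k_1$, $g$, key addition $k_2$, $g$, output $E_k(x)$.]
In this case $\tau = (u, \tau_2, w)$. The third equality below uses the fact that $2^{-n} \sum_{\tau_2} (-1)^{\tau_2 \cdot (a + b)}$ equals 1 if $a = b$ and 0 otherwise, applied with $a = g(x + k_1)$ and $b = y$; the fourth substitutes $z_1 = x + k_1$ and $z_2 = y + k_2$.
$$c_x(u \cdot x + w \cdot E_k(x)) = 2^{-n} \sum_{x} (-1)^{u \cdot x + w \cdot E_k(x)}$$
$$= 2^{-n} \sum_{x} (-1)^{u \cdot x + w \cdot g(g(x + k_1) + k_2)}$$
$$= 2^{-2n} \sum_{\tau_2} \sum_{x} (-1)^{u \cdot x + \tau_2 \cdot g(x + k_1)} \sum_{y} (-1)^{\tau_2 \cdot y + w \cdot g(y + k_2)}$$
$$= 2^{-2n} \sum_{\tau_2} \sum_{z_1} (-1)^{u \cdot (z_1 + k_1) + \tau_2 \cdot g(z_1)} \sum_{z_2} (-1)^{\tau_2 \cdot (z_2 + k_2) + w \cdot g(z_2)}$$
$$= \sum_{\tau_2} (-1)^{u \cdot k_1 + \tau_2 \cdot k_2} \, c_{z_1}(u \cdot z_1 + \tau_2 \cdot g(z_1)) \, c_{z_2}(\tau_2 \cdot z_2 + w \cdot g(z_2)).$$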
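The two-round identity can be checked by brute force on a small example. This is a sketch under stated assumptions: the round function $g$ is a hypothetical 4-bit permutation `SBOX`, key addition is XOR, and all function names are illustrative.

```python
SIZE = 16  # 4-bit blocks and round keys

# Hypothetical 4-bit permutation playing the role of the round function g.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def parity(v: int) -> int:
    return bin(v).count("1") & 1

def corr_sbox(a: int, b: int) -> float:
    """One-round correlation c_z(a.z + b.g(z))."""
    return sum((-1) ** (parity(a & z) ^ parity(b & SBOX[z]))
               for z in range(SIZE)) / SIZE

def encrypt(x: int, k1: int, k2: int) -> int:
    """Two-round key-alternating cipher E_k(x) = g(g(x + k1) + k2)."""
    return SBOX[SBOX[x ^ k1] ^ k2]

def corr_cipher(u: int, w: int, k1: int, k2: int) -> float:
    """Left-hand side: c_x(u.x + w.E_k(x)) computed directly."""
    return sum((-1) ** (parity(u & x) ^ parity(w & encrypt(x, k1, k2)))
               for x in range(SIZE)) / SIZE

def corr_via_characteristics(u: int, w: int, k1: int, k2: int) -> float:
    """Right-hand side: signed sum over the intermediate mask tau_2."""
    total = 0.0
    for t in range(SIZE):
        sign = (-1) ** (parity(u & k1) ^ parity(t & k2))
        total += sign * corr_sbox(u, t) * corr_sbox(t, w)
    return total
```

Both functions agree for every choice of masks and round keys, which is exactly the statement proved above for $r = 2$.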

Linear characteristic
As in the previous proof, we set $z_1 = x + k_1$ and $z_i = g(z_{i-1}) + k_i$, $i = 2, \ldots, r$.
Let $v = (v_1, \ldots, v_r, v_{r+1})$ be an $(r+1)$-tuple of masks such that $v_1 = u$ and $v_{r+1} = w$. Then
$$\sum_{i=1}^{r} \left( v_i \cdot z_i + v_{i+1} \cdot g(z_i) \right) = u \cdot x + v_1 \cdot k_1 + \cdots + v_r \cdot k_r + w \cdot E_k(x).$$
The sequence $v = (v_1, \ldots, v_r, v_{r+1})$ is called a linear characteristic from $u$ to $w$ over the key-alternating cipher $E_k$.
We set $v \cdot k = v_1 \cdot k_1 + \cdots + v_r \cdot k_r$. Then the linear characteristic $v = (v_1, \ldots, v_r, v_{r+1})$ defines the linear approximation $u \cdot x + v \cdot k + w \cdot E_k(x)$.

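Correlation of linear characteristic
Using
$$c_x(u \cdot x + w \cdot E_k(x)) = \sum_{\tau} (-1)^{\tau \cdot k} \prod_{i=1}^{r} c_z(\tau_i \cdot z + \tau_{i+1} \cdot g(z)),$$
where $\tau = (u, \tau_2, \ldots, \tau_r, w)$ and $\tau \cdot k = \tau_1 \cdot k_1 + \cdots + \tau_r \cdot k_r$, we obtain
$$c_x(u \cdot x + v \cdot k + w \cdot E_k(x)) = (-1)^{v \cdot k} \, c_x(u \cdot x + w \cdot E_k(x))$$
$$= (-1)^{v \cdot k} \sum_{\tau} (-1)^{\tau \cdot k} \prod_{i=1}^{r} c_z(\tau_i \cdot z + \tau_{i+1} \cdot g(z))$$
$$= \prod_{i=1}^{r} c_z(v_i \cdot z + v_{i+1} \cdot g(z)) + \sum_{\tau \neq v} (-1)^{(\tau + v) \cdot k} \prod_{i=1}^{r} c_z(\tau_i \cdot z + \tau_{i+1} \cdot g(z)).$$
Taking the average over $k = (k_1, \ldots, k_r)$, where all $k_i$ take all possible values, makes the second term vanish, because $\tau \neq v$ means $\tau_i + v_i \neq 0$ for at least one $i$. We can state the following theorem.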

Average correlation of linear characteristic
Assumption. Round keys $k_1, \ldots, k_r$ take all possible values.
Theorem. The average correlation of a linear characteristic $v = (v_1, v_2, \ldots, v_r, v_{r+1})$ from $u$ to $w$ taken over extended keys $k = (k_1, \ldots, k_r)$ is
$$c(u, v, w) = \mathrm{Avg}_k \, c_x(u \cdot x + v \cdot k + w \cdot E_k(x)) = \prod_{i=1}^{r} c_z(v_i \cdot z + v_{i+1} \cdot g(z)).$$
In the first practical cryptanalysis, the following estimate was used:
$$c_x(u \cdot x + v \cdot k + w \cdot E_k(x)) = (-1)^{v \cdot k} \, c_x(u \cdot x + w \cdot E_k(x)) \approx c(u, v, w).$$
$c(u, v, w)$ was computed using the Piling-up lemma under the heuristic assumption of round-independence. We replaced this assumption by assuming independent round keys (i.e., a long-key cipher [DR2007]).
Is $c(u, v, w)$ a good estimate for any fixed key $k$?
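The theorem can be verified exhaustively on a two-round toy cipher, where averaging over all pairs of round keys is cheap. A minimal sketch, assuming a hypothetical 4-bit permutation `SBOX` as round function and XOR key addition; the names are illustrative.

```python
SIZE = 16

# Hypothetical 4-bit permutation playing the role of g.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def parity(v: int) -> int:
    return bin(v).count("1") & 1

def corr_sbox(a: int, b: int) -> float:
    """c_z(a.z + b.g(z))."""
    return sum((-1) ** (parity(a & z) ^ parity(b & SBOX[z]))
               for z in range(SIZE)) / SIZE

def corr_cipher(u: int, w: int, k1: int, k2: int) -> float:
    """c_x(u.x + w.E_k(x)) for E_k(x) = g(g(x + k1) + k2)."""
    return sum((-1) ** (parity(u & x) ^ parity(w & SBOX[SBOX[x ^ k1] ^ k2]))
               for x in range(SIZE)) / SIZE

def average_correlation(u: int, v2: int, w: int) -> float:
    """Avg over (k1, k2) of c_x(u.x + v.k + w.E_k(x)) with v = (u, v2, w)."""
    total = 0.0
    for k1 in range(SIZE):
        for k2 in range(SIZE):
            sign = (-1) ** (parity(u & k1) ^ parity(v2 & k2))  # (-1)^(v.k)
            total += sign * corr_cipher(u, w, k1, k2)
    return total / SIZE ** 2

def product_of_round_correlations(u: int, v2: int, w: int) -> float:
    """Right-hand side of the theorem for r = 2."""
    return corr_sbox(u, v2) * corr_sbox(v2, w)
```

The check only confirms the averaged statement; for a fixed key the correlation can still differ from $c(u, v, w)$, which is the question raised at the end of the slide.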

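Case of single dominant characteristic
Rewriting the previous equation gives
$$c_x(u \cdot x + w \cdot E_k(x)) \approx (-1)^{v \cdot k} \prod_{i=1}^{r} c_z(v_i \cdot z + v_{i+1} \cdot g(z)). \quad (1)$$
The values $c_x(u \cdot x + w \cdot E_k(x))$ are estimated from cipher data. In reality,
$$c_x(u \cdot x + w \cdot E_k(x)) = \sum_{\tau} (-1)^{\tau \cdot k} \prod_{i=1}^{r} c_z(\tau_i \cdot z + \tau_{i+1} \cdot g(z)).$$
A characteristic $v = (v_1, \ldots, v_{r+1})$ is called a dominant characteristic from $u$ to $w$ if
$$|c(u, v, w)| = \left| \prod_{i=1}^{r} c_z(v_i \cdot z + v_{i+1} \cdot g(z)) \right|$$
is large, and
$$\prod_{i=1}^{r} c_z(\tau_i \cdot z + \tau_{i+1} \cdot g(z)) \approx 0 \quad \text{for } \tau \neq v.$$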

Toy example
$E_k(x) = g(g(x) + k)$, where $g$ is the AES S-box and $k$ is eight bits.
The maximum of $|c(u \cdot x + v \cdot g(x))|$ is $2^{-3}$. Then for any 8-bit end masks $u$ and $w$ there exist many characteristics $v$ with equally large correlations $|c(u, v, w)| = 2^{-6}$, which is the maximum possible value.
On the other hand, for any given $(u, w)$ the true value $c_x(u \cdot x + w \cdot E_k(x))$ varies a lot with the key $k$.
Consider $(u, w) = ( , )$. Then we have $c_x(u \cdot x + w \cdot E_k(x)) = 0$ for 21 keys $k$. For these keys, linear cryptanalysis fails! For the remaining 235 keys $k$ we have $|c_x(u \cdot x + w \cdot E_k(x))| \geq 2^{-6}$.
There are no single dominant characteristics.

Correlations over SPN: S-box layer
$x = (x_1, x_2, \ldots, x_t)$, $g(x) = (S_1(x_1), S_2(x_2), \ldots, S_t(x_t))$
Assumption: Inputs $x_j$ to different S-boxes are independent.
$u = (u_1, u_2, \ldots, u_t)$, $v = (v_1, v_2, \ldots, v_t)$
$$c_x(u \cdot x + v \cdot g(x)) = \prod_{j=1}^{t} c_{x_j}(u_j \cdot x_j + v_j \cdot S_j(x_j)),$$
by the Piling-up lemma. To maximize the correlation one usually takes almost all $u_j$ and $v_j$ equal to zero, since then $c_{x_j}(u_j \cdot x_j + v_j \cdot S_j(x_j)) = 1$.
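The product formula for an S-box layer can be illustrated with two parallel S-boxes. This sketch assumes a hypothetical layer of two identical 4-bit permutations acting on the high and low nibble of an 8-bit block; the names `layer`, `corr_layer` and `corr_layer_piling_up` are illustrative.

```python
# Hypothetical 4-bit permutation used for both S-boxes of the layer.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def parity(v: int) -> int:
    return bin(v).count("1") & 1

def corr_sbox(a: int, b: int) -> float:
    """c_{x_j}(a.x_j + b.S(x_j)) for one 4-bit S-box."""
    return sum((-1) ** (parity(a & z) ^ parity(b & SBOX[z]))
               for z in range(16)) / 16

def layer(x: int) -> int:
    """S-box layer g(x) = (S(x_1), S(x_2)) on an 8-bit block."""
    return (SBOX[x >> 4] << 4) | SBOX[x & 0xF]

def corr_layer(u: int, v: int) -> float:
    """c_x(u.x + v.g(x)) computed over all 256 inputs."""
    return sum((-1) ** (parity(u & x) ^ parity(v & layer(x)))
               for x in range(256)) / 256

def corr_layer_piling_up(u: int, v: int) -> float:
    """Product of the per-S-box correlations."""
    return corr_sbox(u >> 4, v >> 4) * corr_sbox(u & 0xF, v & 0xF)
```

An S-box with zero input and output masks contributes the factor `corr_sbox(0, 0) == 1.0`, which is why characteristics with few active S-boxes tend to have the largest correlations.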

Linear characteristics for SPN: linear layer
$g(x) = Mx$
$$c_x(u \cdot x + v \cdot Mx) = c_x(u \cdot x + M^t v \cdot x) = \begin{cases} 1 & \text{if } u = M^t v, \\ 0 & \text{otherwise.} \end{cases}$$
This uniquely determines the masks over the linear layer.
Textbooks have nice concrete examples of how to construct linear characteristics over SPNs, see [Stinson] or [Knudsen-Robshaw].
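The mask rule for a linear layer can be checked directly. The following sketch assumes a hypothetical $4 \times 4$ binary matrix `M` stored as a list of row bitmasks, where row $i$ produces output bit $i$; these conventions and names are assumptions made for the illustration.

```python
# Hypothetical 4x4 binary matrix; M[i] is row i as a bitmask over the input bits.
M = [0b0111, 0b1011, 0b1101, 0b1110]
N_BITS = 4

def parity(v: int) -> int:
    return bin(v).count("1") & 1

def apply_matrix(x: int) -> int:
    """y = Mx over Z_2: output bit i is the inner product of row i with x."""
    return sum(parity(M[i] & x) << i for i in range(N_BITS))

def transpose_times(v: int) -> int:
    """M^t v over Z_2: XOR of the rows selected by the bits of v."""
    result = 0
    for i in range(N_BITS):
        if (v >> i) & 1:
            result ^= M[i]
    return result

def corr_linear(u: int, v: int) -> float:
    """c_x(u.x + v.Mx) computed over all inputs."""
    return sum((-1) ** (parity(u & x) ^ parity(v & apply_matrix(x)))
               for x in range(1 << N_BITS)) / (1 << N_BITS)
```

For every output mask `v` exactly one input mask, `transpose_times(v)`, gives correlation 1, and all others give 0.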

[Figure: SPN characteristic example]

Section: Matsui's algorithms

Empirical correlation
Given a sample $D$ of size $N$ of plaintext-ciphertext pairs $(x, y = E_K(x))$ obtained from the cipher with unknown key $K$, the empirical correlation (or observed correlation) $\hat{c}(D, K)$ of the linear approximation $u \cdot x + w \cdot y$ is computed as
$$\hat{c}(D, K) = \frac{1}{N} \left( \#\{(x, y) \in D \mid u \cdot x + w \cdot y = 0\} - \#\{(x, y) \in D \mid u \cdot x + w \cdot y \neq 0\} \right).$$
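The empirical correlation is a simple counting statistic. A minimal sketch, assuming the sample is a list of integer pairs `(x, y)`; `draw_sample` is a hypothetical helper that queries an encryption function on random plaintexts.

```python
import random

def parity(v: int) -> int:
    return bin(v).count("1") & 1

def empirical_correlation(sample, u: int, w: int) -> float:
    """(#{u.x + w.y = 0} - #{u.x + w.y = 1}) / N over the pairs in the sample."""
    zeros = sum(1 for x, y in sample if (parity(u & x) ^ parity(w & y)) == 0)
    ones = len(sample) - zeros
    return (zeros - ones) / len(sample)

def draw_sample(encrypt, n_bits: int, size: int, seed: int = 0):
    """Hypothetical helper: known-plaintext sample for a given encryption function."""
    rng = random.Random(seed)
    plaintexts = (rng.randrange(1 << n_bits) for _ in range(size))
    return [(x, encrypt(x)) for x in plaintexts]
```

In an attack the function `encrypt` would be the cipher under the unknown key; here any callable on integers can be used to experiment with the statistic.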

Algorithm 1
Matsui's Algorithm 1 is a statistical cryptanalysis method for finding one bit of the key, with the following steps:
1. Select a linear characteristic $v$ such that the average correlation $c = c(u, v, w)$ deviates from 0 as much as possible.
2. Sample plaintext-ciphertext pairs $(x, E_K(x))$ for a fixed (unknown) key $K$ and determine the empirical correlation $\hat{c}(D, K)$ of the linear approximation $u \cdot x + w \cdot E_K(x)$.
3. If $c$ and $\hat{c}$ are of the same sign, output $v \cdot K = 0$. Else output $v \cdot K = 1$.
Here estimate (1) is used: $c_x(u \cdot x + w \cdot E_K(x)) \approx (-1)^{v \cdot K} c(u, v, w)$.
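The three steps can be sketched on a deliberately simple case. The assumptions here are strong and chosen for illustration only: a hypothetical one-round cipher $E_K(x) = S(x + K)$ over 4 bits, for which estimate (1) holds exactly with $v \cdot K = u \cdot K$, and a sample consisting of the full codebook, which removes sampling noise. A real attack has neither property.

```python
SIZE = 16

# Hypothetical 4-bit permutation used as the single round function.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def parity(v: int) -> int:
    return bin(v).count("1") & 1

def corr_sbox(a: int, b: int) -> float:
    return sum((-1) ** (parity(a & z) ^ parity(b & SBOX[z]))
               for z in range(SIZE)) / SIZE

def encrypt(x: int, key: int) -> int:
    """One-round toy cipher E_K(x) = S(x + K)."""
    return SBOX[x ^ key]

def best_masks():
    """Step 1: nonzero masks (u, w) whose correlation deviates most from 0."""
    candidates = [(u, w) for u in range(1, SIZE) for w in range(1, SIZE)]
    return max(candidates, key=lambda m: abs(corr_sbox(m[0], m[1])))

def empirical_correlation(sample, u: int, w: int) -> float:
    """Step 2: observed correlation of u.x + w.y over the sample."""
    zeros = sum(1 for x, y in sample if (parity(u & x) ^ parity(w & y)) == 0)
    return (2 * zeros - len(sample)) / len(sample)

def algorithm1(sample, u: int, w: int) -> int:
    """Step 3: output the key bit u.K by comparing the signs of c and c-hat."""
    c = corr_sbox(u, w)
    c_hat = empirical_correlation(sample, u, w)
    return 0 if c * c_hat > 0 else 1
```

With a random sample of size $N$ instead of the full codebook, the decision in step 3 becomes probabilistic, which is what the statistical model in the following sections quantifies.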

Algorithm 2
Matsui's Algorithm 2 is a statistical cryptanalysis method for finding a part of the last round key for block ciphers where a relation (last round trick)
$$E_{k, k_r}(x) = G_{k_r}(E_k(x)),$$
where $k_r$ is relatively short, can be constructed from the encryption transformation.
1. Select a linear characteristic $v$ such that the average correlation $c = c(u, v, w)$ of the linear approximation $u \cdot x + v \cdot k + w \cdot E_k(x)$ deviates from 0 as much as possible.

Algorithm 2, cont'd
2. Take a sample $D$ of plaintext-ciphertext pairs $(x, E_{k, k_r}(x))$ of size $N$. For each last round key candidate $K$, compute pairs $(x, y = G_K^{-1}(E_{k, k_r}(x)))$ and determine the empirical correlation $\hat{c}(D, K)$ of the linear approximation $u \cdot x + w \cdot y$.
3. Output a set of values $K$ that achieve the largest values $|\hat{c}(D, K)|$.
4. Additionally, one can determine the value $v \cdot k$ as in Algorithm 1.

Key assumptions of Matsui's Algorithm 2
The statistical model of Algorithm 2 relies on two assumptions.
Hypothesis of right-key equivalence: The observed correlation $\hat{c}(D, K_R)$ computed from a data sample of size $N$ obtained by partial decryption using the right key candidate $K = K_R$ follows, for all $K_R$, a normal distribution with parameters
- $\mathrm{Exp}_D(\hat{c}(D, K_R)) = c$, i.e. is the same for all $K_R$
- $\mathrm{Var}_D(\hat{c}(D, K_R)) = \mathrm{Exp}_D\left( \left( \hat{c}(D, K_R) - \mathrm{Exp}_D(\hat{c}(D, K_R)) \right)^2 \right) = \frac{1}{N}$.
Wrong-key hypothesis: The observed correlation $\hat{c}(D, K_W)$ computed from a data sample of size $N$ obtained by partial decryption using a wrong key candidate $K = K_W$ follows, for all $K_W$, a normal distribution with parameters
- $\mathrm{Exp}_D(\hat{c}(D, K_W)) = 0$
- $\mathrm{Var}_D(\hat{c}(D, K_W)) = \frac{1}{N}$.

Variance of observed correlation
We explain the wrong key variance. Let $N$ be the size of the sample and $N_0$ be the number of observed pairs $(x, y)$ computed with the wrong key that satisfy $u \cdot x + w \cdot y = 0$.
$N_0$ is binomially distributed with expected value $N/2$ and variance $N/4$. For large $N$ we can approximate the binomial distribution with the normal one.
From $\hat{c} = \frac{2 N_0}{N} - 1$ it follows that $\mathrm{Exp}_D(\hat{c}(D, K_W)) = 0$ and $\mathrm{Var}_D(\hat{c}(D, K_W)) = \frac{1}{N}$.
The data variance of the correlation computed for the correct key is approximately the same. As $N$ grows the variance decreases.
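The step from the binomial count to the variance $1/N$ can be written out as follows. This is a short worked derivation under the slide's assumption that, for a wrong key, each pair satisfies the approximation independently with probability $1/2$.

```latex
\begin{align*}
\mathrm{Exp}_D(\hat{c}) &= \frac{2\,\mathrm{Exp}(N_0)}{N} - 1
  = \frac{2}{N}\cdot\frac{N}{2} - 1 = 0, \\
\mathrm{Var}_D(\hat{c}) &= \left(\frac{2}{N}\right)^{2} \mathrm{Var}(N_0)
  = \frac{4}{N^{2}}\cdot\frac{N}{4} = \frac{1}{N}.
\end{align*}
```

For instance, a sample of $N = 2^{20}$ pairs gives a standard deviation of $2^{-10}$ for the observed correlation.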

Normal distributions of observed correlations in Algorithm 2
[Figure: normal densities of the observed correlation.] Legend: black = wrong key, red = right key with $\mathrm{Exp}_D(\hat{c}(D, K_R)) < 0$, blue = right key with $\mathrm{Exp}_D(\hat{c}(D, K_R)) > 0$

Section: Success probability and data complexity

Statistical tests
Linear cryptanalysis makes use of a statistical hypothesis test.
Algorithm 1 makes a decision between
- $H_0$: $v \cdot k = 0$
- $H_1$: $v \cdot k = 1$
Algorithm 2 makes a decision between
- $H_0$: $K = K_R$, that is, $G_K^{-1}(E_{k, K_R}(x)) = E_k(x)$ for all $x$
- $H_1$: $K = K_W$, that is, data pairs $(x, G_K^{-1}(E_{k, K_R}(x)))$ are not from the cipher

The setting
Two normal deviates $T_W$ and $T_R$ such that $T_W \sim N(\mu_W, \sigma_W^2)$ and $T_R \sim N(\mu_R, \sigma_R^2)$ (w.l.o.g. assume $\mu_W < \mu_R$).
A value of $T$ is computed from a sample drawn from either the distribution of $T_W$ or that of $T_R$. The task is to decide which one of the two.

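Success probability
Threshold value $\Theta$:
- $T \leq \Theta \Rightarrow T$ is drawn from the distribution of $T_W$,
- $T > \Theta \Rightarrow T$ is drawn from the distribution of $T_R$.
We set bounds to the error probabilities
$$\Pr(T > \Theta \mid T \sim T_W) \leq \alpha_0 \quad \text{and} \quad \Pr(T \leq \Theta \mid T \sim T_R) \leq \alpha_1,$$
which are satisfied if
$$\mu_W + \sigma_W \zeta_0 \leq \Theta \leq \mu_R - \sigma_R \zeta_1,$$
where $\Phi(\zeta_i) = 1 - \alpha_i$ for $i = 0, 1$, and $\Phi$ is the cumulative distribution function of the standard normal distribution.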

Example success area in Algorithm 1
[Figure: success area on the $\hat{c}$ axis, with $0$ and $c$ marked.]

Example success area in Algorithm 2
[Figure: success area on the $\hat{c}$ axis, with $0$ and $c$ marked.]

Success probability in Algorithm 2
We denote $\alpha_0 = 2^{-a-1}$ and $\alpha_1 = 1 - P_S$, where $a$ is often called the advantage and $P_S$ is the success probability that the right key is accepted.
The value $a$ indicates how many bits of the right key are correctly determined. Then a set of $2^{l-a}$ keys remains to be searched.
Question: Why $\alpha_0 = 2^{-a-1}$ and not $2^{-a}$?

Success probability in Algorithm 2
Plugging in the parameters of the distributions gives
$$\frac{1}{\sqrt{N}} \Phi^{-1}(1 - 2^{-a-1}) \leq \Theta \leq |c| - \frac{1}{\sqrt{N}} \Phi^{-1}(P_S).$$
Such a value $\Theta$ exists if
$$P_S \leq \Phi\left( \sqrt{N} |c| - \Phi^{-1}(1 - 2^{-a-1}) \right),$$
or equivalently
$$N \geq \frac{\left( \Phi^{-1}(1 - 2^{-a-1}) + \Phi^{-1}(P_S) \right)^2}{c^2}.$$
This lower bound of $N$ is an estimate of the data complexity required to get $a$ bits of the right key with success probability $P_S$.
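The bound is easy to evaluate numerically. A minimal sketch using the standard normal distribution from Python's `statistics` module; the function and parameter names are illustrative, and the formulas are the classical ones above, under the key-equivalence hypotheses.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, variance 1

def data_complexity(c: float, advantage: float, p_success: float) -> float:
    """N = (Phi^-1(1 - 2^(-a-1)) + Phi^-1(P_S))^2 / c^2."""
    phi_a = _STD_NORMAL.inv_cdf(1 - 2 ** (-advantage - 1))
    phi_ps = _STD_NORMAL.inv_cdf(p_success)
    return ((phi_a + phi_ps) / c) ** 2

def success_probability(c: float, advantage: float, n_samples: float) -> float:
    """P_S = Phi(sqrt(N) |c| - Phi^-1(1 - 2^(-a-1)))."""
    phi_a = _STD_NORMAL.inv_cdf(1 - 2 ** (-advantage - 1))
    return _STD_NORMAL.cdf(sqrt(n_samples) * abs(c) - phi_a)
```

The two functions are inverse to each other in $N$ and $P_S$: feeding the output of `data_complexity` into `success_probability` returns the requested success probability. Halving $|c|$ multiplies the required data by four.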

Why $\alpha_0 = 2^{-a-1}$?
Answer: Samples computed with wrong keys will be accepted on both sides. To have the total probability less than $2^{-a}$, both sides must have probability less than $2^{-a-1}$.
Question: Why is the success probability not halved?
Answer: There is only one correct key. It can fail the test only on one side of the distribution.

Statistical model of Algorithm 2
- $\Phi$ is the cumulative distribution function of the standard normal distribution
- $P_S$ is the success probability, $\varphi_{P_S} = \Phi^{-1}(P_S)$
- $2^{-a}$ is the proportion of accepted wrong keys
- $a$ is the advantage of the attack, $\varphi_a = \Phi^{-1}(1 - 2^{-a-1})$
- $\mu_R$ and $\sigma_R^2$ are the mean and variance of the normal deviate $\hat{c}(D, K_R)$ for the right key, and
- $\mu_W$ and $\sigma_W^2$ are the mean and variance of the normal deviate $\hat{c}(D, K_W)$ for the wrong key.
Then the success probability can be determined by
$$P_S \approx \Phi\left( \frac{|\mu_R - \mu_W| - \sigma_W \varphi_a}{\sigma_R} \right).$$

Summary of Algorithm 2
- Fix the advantage $a$, where $a \leq t$ and $t$ is the length in bits of the last round key.
- Fix the success probability $P_S$.
- Compute the sample size $N$ from
$$N = \frac{\left( \Phi^{-1}(1 - 2^{-a-1}) + \Phi^{-1}(P_S) \right)^2}{c^2}.$$
- Use a sample $D$ of $N$ known plaintext-ciphertext pairs.
- Compute the observed correlations for all $2^t$ last round key candidates.
- Take those $2^{t-a}$ of the round key candidates that have the largest empirical correlation (in absolute value).
- Complement them to full $l$-bit keys by appending the remaining $l - t$ bits to them (assuming the key schedule allows this).
- Search the set of $2^{l-a}$ cipher key candidates exhaustively.

Section: Linear Hull theorem

Fixed-key correlation of a linear characteristic
Recall
$$c_x(u \cdot x + w \cdot E_k(x)) = \sum_{\tau} \prod_{i=1}^{r} (-1)^{\tau_i \cdot k_i} \, c_z(\tau_i \cdot z + \tau_{i+1} \cdot g(z)),$$
where $\tau = (u, \tau_2, \ldots, \tau_r, w)$, and that for any $v$,
$$c_x(u \cdot x + v \cdot k + w \cdot E_k(x)) = (-1)^{v \cdot k} \, c_x(u \cdot x + w \cdot E_k(x))$$
$$= \prod_{i=1}^{r} c_z(v_i \cdot z + v_{i+1} \cdot g(z)) + \sum_{\tau \neq v} (-1)^{(\tau + v) \cdot k} \prod_{i=1}^{r} c_z(\tau_i \cdot z + \tau_{i+1} \cdot g(z)).$$
We also computed the average of this correlation over the keys.
The Right Key Hypothesis states that all right keys behave as the average. This is clearly not the case.
Next we compute the variance of $c_x(u \cdot x + v \cdot k + w \cdot E_k(x))$ as the key $k$ varies.

The Fundamental Theorem
By Jensen's inequality,
$$\mathrm{Avg}_k \, c_x(u \cdot x + v \cdot k + w \cdot E_k(x))^2 \geq c(u, v, w)^2 \quad \text{for all } v,$$
and in general the strict inequality holds. More accurately, the following theorem holds.
The Linear Hull Theorem [KN94, KN01, DR2007]. If the round keys of a block cipher $E_k$ take on all values (aka $E_k$ is a long-key cipher), then
$$\mathrm{Avg}_k \, c_x(u \cdot x + w \cdot E_k(x))^2 = \sum_{\tau} c(u, \tau, w)^2.$$
We denote $\mathrm{ELP}(u, w) = \mathrm{Avg}_k \left( c_x(u \cdot x + w \cdot E_k(x)) \right)^2$ and call it the expected linear potential of $(u, w)$.
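The Linear Hull Theorem can be confirmed by exhaustive computation on a small long-key cipher. A sketch under stated assumptions: two rounds, a hypothetical 4-bit permutation `SBOX` as round function, XOR key addition, and independent round keys taking all values; the function names are illustrative.

```python
SIZE = 16

# Hypothetical 4-bit permutation playing the role of g.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def parity(v: int) -> int:
    return bin(v).count("1") & 1

def corr_sbox(a: int, b: int) -> float:
    return sum((-1) ** (parity(a & z) ^ parity(b & SBOX[z]))
               for z in range(SIZE)) / SIZE

def corr_cipher(u: int, w: int, k1: int, k2: int) -> float:
    """c_x(u.x + w.E_k(x)) for E_k(x) = g(g(x + k1) + k2)."""
    return sum((-1) ** (parity(u & x) ^ parity(w & SBOX[SBOX[x ^ k1] ^ k2]))
               for x in range(SIZE)) / SIZE

def elp_by_key_average(u: int, w: int) -> float:
    """Left-hand side: average of the squared correlation over all round keys."""
    total = sum(corr_cipher(u, w, k1, k2) ** 2
                for k1 in range(SIZE) for k2 in range(SIZE))
    return total / SIZE ** 2

def elp_by_characteristics(u: int, w: int) -> float:
    """Right-hand side: sum over tau_2 of c(u, tau, w)^2."""
    return sum((corr_sbox(u, t) * corr_sbox(t, w)) ** 2 for t in range(SIZE))
```

Comparing `elp_by_characteristics(u, w)` with the square of any single term in its sum illustrates the Jensen-type inequality stated first on the slide.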

Toy example cont'd
Consider the previous example. We saw that in terms of single characteristics, all $(u, w)$ are about equally good, but there are no dominant characteristics.
Also in terms of linear hulls, all $(u, w)$ have about equally large ELP:
$$\mathrm{ELP}( , ) = 2^{-10.40} \leq \mathrm{ELP}(u, w) \leq 2^{-9.65} = \mathrm{ELP}( , )$$
The weakest of $(u, w)$ is $( , )$. For this mask pair $c_x(u \cdot x + w \cdot E_k(x)) = 0$ for 33 keys $k$. For the remaining 223 keys we have $|c_x(u \cdot x + w \cdot E_k(x))| \geq 2^{-6}$.
$c(u \cdot x + w \cdot E_k(x))^2 \geq \mathrm{ELP}( , )$ for 76 keys $k$.
$c(u \cdot x + w \cdot E_k(x))^2 \geq \mathrm{ELP}( , )$ for 80 keys $k$.

Computing an estimate of ELP(u, w)
$$\mathrm{ELP}(u, w) = \mathrm{Avg}_k \, c_x(u \cdot x + w \cdot E_k(x))^2 = \sum_{\tau_2, \ldots, \tau_r} \prod_{i=1}^{r} c_z(\tau_i \cdot z + \tau_{i+1} \cdot g(z))^2$$
$$= \sum_{\tau_r} c_z(\tau_r \cdot z + w \cdot g(z))^2 \sum_{\tau_{r-1}} c_z(\tau_{r-1} \cdot z + \tau_r \cdot g(z))^2 \cdots$$
$$\cdots \sum_{\tau_3} c_z(\tau_3 \cdot z + \tau_4 \cdot g(z))^2$$
$$\sum_{\tau_2} c_z(\tau_2 \cdot z + \tau_3 \cdot g(z))^2 \, c_z(u \cdot z + \tau_2 \cdot g(z))^2$$
This expression gives an iterative algorithm: start from the bottom line to compute, for each $\tau_3$, the value on the last line.
The computation can be made feasible by restricting to $\tau$ with low Hamming weight and keeping only the largest values from each iteration.
Restrictions on $\tau$ will lead to a lower bound of $\mathrm{ELP}(u, w)$, which is still much larger than any $c(u, v, w)^2$.
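The iteration amounts to repeated multiplication of a vector by the matrix of squared one-round correlations. The sketch below assumes a hypothetical 4-bit permutation `SBOX` as round function; the pruning rule (keep the `keep` largest intermediate values) is one simple choice made for illustration, not necessarily the restriction used in practice.

```python
SIZE = 16

# Hypothetical 4-bit permutation playing the role of g.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def parity(v: int) -> int:
    return bin(v).count("1") & 1

def corr_sbox(a: int, b: int) -> float:
    return sum((-1) ** (parity(a & z) ^ parity(b & SBOX[z]))
               for z in range(SIZE)) / SIZE

# C2[a][b] = c_z(a.z + b.g(z))^2, the squared one-round correlations.
C2 = [[corr_sbox(a, b) ** 2 for b in range(SIZE)] for a in range(SIZE)]

def elp_estimate(u: int, w: int, rounds: int, keep=None) -> float:
    """Iteratively sum squared characteristic correlations from u to w.

    With keep=None every intermediate mask is used and the result is the
    full sum; with keep=m only the m largest intermediate values survive
    each iteration, which gives a lower bound.
    """
    vec = list(C2[u])  # indexed by tau_2 after the first round
    for _ in range(rounds - 1):
        if keep is not None:
            top = set(sorted(range(SIZE), key=lambda t: vec[t], reverse=True)[:keep])
            vec = [vec[t] if t in top else 0.0 for t in range(SIZE)]
        vec = [sum(vec[t] * C2[t][t_next] for t in range(SIZE))
               for t_next in range(SIZE)]
    return vec[w]
```

Because all terms are non-negative, `elp_estimate(u, w, r, keep=m)` can never exceed `elp_estimate(u, w, r)`, in line with the lower-bound remark above.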

Section: Key variance and more realistic key hypotheses

Key-variance of observed correlation
Wrong key case, see also [BT2013]
The wrong-key hypothesis states that $\mathrm{Exp}_D(\hat{c}(D, K_W))$ is equal for all keys $K_W$. This is not true in practice.
Denote $c(K_W) = \mathrm{Exp}_D(\hat{c}(D, K_W))$. Then $\mathrm{Exp}_{K_W}(c(K_W)) = 0$.
To compute the variance, we use
$$\mathrm{Var}_{K_W}(c(K_W)) = \mathrm{Exp}_{K_W}\left( c(K_W)^2 \right) - \left( \mathrm{Exp}_{K_W}(c(K_W)) \right)^2.$$
By [DR2007], Corollary 7, $\mathrm{Exp}_{K_W}\left( c(K_W)^2 \right) = 2^{-n}$. Then $\mathrm{Var}_{K_W}(c(K_W)) = 2^{-n}$.

Key-variance of observed correlation
Right key case (New!)
The right-key equivalence hypothesis states that $|\mathrm{Exp}_D(\hat{c}(D, K_R))|$ is equal for all keys $K_R$. This is not realistic.
Denote $\mathrm{Exp}_D(\hat{c}(D, K_R)) = c(K_R)$. If $c(K_R) > 0$ set $K_R \in K^+$. Else $K_R \in K^-$.
Hypothesis of Algorithm 2 (single dominant characteristic):
$$\mathrm{Exp}_{K_R \in K^-}(|c(K_R)|) \approx \mathrm{Exp}_{K_R \in K^+}(|c(K_R)|).$$
Denote $\mathrm{Exp}_{K_R \in K^+}(|c(K_R)|) = c$.
To compute the variance of $c(K_R)$ for $K_R \in K^+$ (similarly for $K_R \in K^-$) we apply again the rule
$$\mathrm{Var}_X(F(X)) = \mathrm{Exp}_X\left( F(X)^2 \right) - \left( \mathrm{Exp}_X(F(X)) \right)^2.$$
Then we get $\mathrm{Var}_{K_R \in K^+}(|c(K_R)|) = \mathrm{ELP} - c^2$.

Total variance of observed correlation
Wrong key case
To compute the total variance over data and wrong keys, we recall the rule
$$\mathrm{Var}_{X,Y}(F(X, Y)) = \mathrm{Exp}_X(\mathrm{Var}_Y(F(X, Y))) + \mathrm{Var}_X(\mathrm{Exp}_Y(F(X, Y)))$$
... and get
$$\mathrm{Var}_{D, K_W}(\hat{c}(D, K_W)) = \frac{1}{N} + 2^{-n}.$$

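Total variance of observed correlation
Right key case
To compute the total variance of $\hat{c}(D, K_R)$ over data and right keys, we use the same rule as above, with
$$\mathrm{Exp}_{K_R}(\mathrm{Var}_D(\hat{c}(D, K_R))) = \frac{1}{N},$$
$$\mathrm{Var}_{K_R}(\mathrm{Exp}_D(\hat{c}(D, K_R))) = \mathrm{Var}_{K_R}(|c(K_R)|) = \mathrm{ELP} - c^2,$$
... and get
$$\mathrm{Var}_{D, K_R \in K^+}(\hat{c}(D, K_R)) = \mathrm{Var}_{D, K_R \in K^-}(\hat{c}(D, K_R)) = \frac{1}{N} + \mathrm{ELP} - c^2.$$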

How large is the right key variance
Matsui's Algorithm uses one dominant characteristic $(u, v, w)$. By the Linear Hull theorem
$$\mathrm{ELP} - c(u, v, w)^2 = \sum_{\tau \neq v} c(u, \tau, w)^2,$$
and
$$\mathrm{Exp}_{D, K_R} |\hat{c}(D, K_R)| = |c| = |c(u, v, w)|.$$
In the ideal case, the sum on the right side of the first equation is the variance of pure noise, which has mean equal to zero and variance equal to $2^{-n}$. We obtain
$$\mathrm{ELP} - c^2 \approx 2^{-n}.$$
Note that the classical hypothesis of right key equivalence assumes $\mathrm{ELP} - c^2 = 0$.

Recalling the statistical model of Algorithm 2
- $\Phi$ is the cumulative distribution function of the standard normal distribution
- $P_S$ is the success probability, $\varphi_{P_S} = \Phi^{-1}(P_S)$
- $2^{-a}$ is the proportion of accepted wrong keys
- $a$ is the advantage of the attack, $\varphi_a = \Phi^{-1}(1 - 2^{-a-1})$
- $\mu_R$ and $\sigma_R^2$ are the mean and variance of the normal deviate $\hat{c}(D, K_R)$ for the right key, and
- $\mu_W$ and $\sigma_W^2$ are the mean and variance of the normal deviate $\hat{c}(D, K_W)$ for the wrong key.
Then the success probability can be determined by
$$P_S \approx \Phi\left( \frac{|\mu_R - \mu_W| - \sigma_W \varphi_a}{\sigma_R} \right).$$

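Success probability and data complexity
Now we plug in the new parameters of Matsui's Algorithm 2, namely $\mu_R = |c|$, $\mu_W = 0$, $\sigma_W^2 = \frac{1}{N} + 2^{-n}$ and $\sigma_R^2 = \frac{1}{N} + \mathrm{ELP} - c^2$, to get
$$P_S \approx \Phi\left( \frac{|c| \sqrt{N} - \varphi_a \sqrt{1 + N 2^{-n}}}{\sqrt{N(\mathrm{ELP} - c^2) + 1}} \right).$$
If $\mathrm{ELP} = c^2$ (key-equivalence) then this is identical to the result in [BT2013] Eq. (6).
In reality, we have $\mathrm{ELP} - c^2 > 2^{-n}$, and we derive the data complexity estimate as
$$N \approx \frac{(\varphi_a + \varphi_{P_S})^2}{c^2 - (\mathrm{ELP} - c^2)(\varphi_a + \varphi_{P_S})^2 + \varphi_a^2 (\mathrm{ELP} - c^2 - 2^{-n})}.$$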
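The two formulas can be evaluated side by side. A minimal sketch, assuming the expressions exactly as reconstructed above; function and parameter names are illustrative, and `data_complexity_kv` returns `None` when the denominator is not positive, that is, when the estimate gives no finite $N$.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def success_probability_kv(c, elp, n_bits, advantage, n_samples):
    """P_S with key variance for both wrong and right keys."""
    phi_a = _STD_NORMAL.inv_cdf(1 - 2 ** (-advantage - 1))
    sigma_w = sqrt(1 / n_samples + 2.0 ** -n_bits)     # wrong-key std deviation
    sigma_r = sqrt(1 / n_samples + elp - c * c)        # right-key std deviation
    return _STD_NORMAL.cdf((abs(c) - sigma_w * phi_a) / sigma_r)

def data_complexity_kv(c, elp, n_bits, advantage, p_success):
    """Data complexity estimate N; None if the denominator is not positive."""
    phi_a = _STD_NORMAL.inv_cdf(1 - 2 ** (-advantage - 1))
    phi_ps = _STD_NORMAL.inv_cdf(p_success)
    s = (phi_a + phi_ps) ** 2
    key_var = elp - c * c                              # ELP - c^2
    denom = c * c - key_var * s + phi_a ** 2 * (key_var - 2.0 ** -n_bits)
    return s / denom if denom > 0 else None
```

In the ideal case $\mathrm{ELP} - c^2 = 2^{-n}$ the estimate inverts the success probability formula exactly. For larger key variance, and for $P_S \geq 1/2$, a short calculation suggests that the estimate errs on the safe side: the success probability achieved with that many samples is at least the requested one.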

Extensions and variations
- Zero-correlation: $c = c(K_R) = 0$ for all keys.
- Is it possible to handle a case with two dominant characteristics?
Example [AES book]: the linear approximation is composed of more than one dominant characteristic,
$$c(u \cdot x + w \cdot E_k(x)) = (-1)^{\gamma \cdot k} c_\gamma + (-1)^{\lambda \cdot k} c_\lambda,$$
where $c_\gamma$ and $c_\lambda$ are the correlations of the characteristics. Then
$$\mathrm{Avg}_k \left( c(u \cdot x + \gamma \cdot k + w \cdot E_k(x)) \right) = c_\gamma.$$
But this gives a usable estimate only for half of the keys. Those are the keys for which $(\gamma + \lambda) \cdot k = 0$, and then
$$c(u \cdot x + \gamma \cdot k + w \cdot E_k(x)) = c_\gamma + c_\lambda.$$
For the remaining keys we have
$$c(u \cdot x + \gamma \cdot K + w \cdot E_K(x)) = c_\gamma - c_\lambda \approx 0.$$

The case of about equally small characteristics
To resist linear cryptanalysis, [DR2007] advises designers to take care that no single dominant characteristic exists. It will make linear cryptanalysis harder, but not impossible.
We have for the right key $\mathrm{Exp}_D(\hat{c}(D, K_R)) = c_x(u \cdot x + w \cdot E_k(x))$, and Theorem 22 of [DR2007]:
Theorem. If the number of characteristics with non-zero correlation is large and all characteristic correlations are small compared to ELP, then
$$c_x(u \cdot x + w \cdot E_k(x)) \sim N(0, \mathrm{ELP}),$$
from where we get
$$c_x(u \cdot x + w \cdot E_k(x))^2 \sim \chi^2(1) \cdot \mathrm{ELP}.$$

The case of about equally small characteristics, cont'd
For the wrong keys
$$\mathrm{Exp}_D(\hat{c}(D, K_W)) \sim N(0, 2^{-n})$$
as before, and
$$2^n \, \mathrm{Exp}_D(\hat{c}(D, K_W))^2 \sim \chi^2(1).$$
If $\mathrm{ELP} \gg 2^{-n}$, the distributions of the observed correlations for the wrong keys and right keys can be distinguished.

Multiple/Multidimensional linear cryptanalysis
So far we fixed $(u, w)$. By collecting a number, say $m$, of strong linear approximations with different mask pairs $(u, w)$, the sum of their observed squared correlations is a multiple of a $\chi^2(m)$-distributed random variable
... provided that the correlations are independent random variables.
To overcome this problem, it is possible to consider distributions of data extracted from plaintext-ciphertext pairs with an expected distribution which is far from uniform.
Such non-uniform distributions can be found with the help of linear approximations with high correlations. They may also be found with the help of truncated differentials with high probabilities.

Summary
- Linear cryptanalysis using one dominant characteristic
  - First the statistical model of observed correlation was presented for a fixed key.
  - Traditional key-equivalence hypotheses were stated.
  - As they are known not to hold in practice for all keys, we studied the variance in the correlation due to the key for wrong keys and also for right keys.
  - New estimates, with improved theoretical justification, of success probability and data complexity were achieved.
- Among modern ciphers, cases with single dominant linear characteristics are very rare.
- The statistical model for uniformly small characteristic correlations was briefly described.
- Enhancements by using more linear approximations were discussed.

Cited papers and books
[KN94] K. Nyberg: Linear approximation of block ciphers. In Advances in Cryptology - EUROCRYPT '94, volume 950 of Lecture Notes in Computer Science. Springer-Verlag, 1995.
[JD94] J. Daemen: Correlation Matrices. In Fast Software Encryption, FSE 2, volume 1008 of Lecture Notes in Computer Science. Springer-Verlag, 1995.
[KN01] K. Nyberg: Correlation theorems in cryptanalysis. Discrete Applied Mathematics, 111:177-188, 2001.
[AES book] J. Daemen and V. Rijmen: The Design of Rijndael: AES - The Advanced Encryption Standard. Springer-Verlag, 2002.
[Stinson] D. R. Stinson: Cryptography: Theory and Practice, 3rd ed. CRC Press, 2005.
[Knudsen-Robshaw] L. R. Knudsen and M. Robshaw: The Block Cipher Companion. Springer, 2011.
[DR2007] J. Daemen and V. Rijmen: Probability distributions of correlation and differentials in block ciphers. J. Mathematical Cryptology, 1(3):221-242, 2007.
[BT2013] A. Bogdanov and E. Tischhauser: On the wrong key randomisation and key equivalence hypotheses in Matsui's algorithm 2. FSE 2013, LNCS 8424, pages 19-38.