Hash Functions
Adam O'Neill
Based on http://cseweb.ucsd.edu/~mihir/cse207/
Where we are
We've seen a lower-level primitive (blockciphers) and a higher-level primitive (symmetric encryption).
Now let's see another lower-level primitive, hash functions.
Their primary purpose is data compression, but they have many other uses as well.
They are treated a bit like a magic wand.
Universal Hashing
Attack Game 7.1 (universal hash function). For a keyed hash function $H$ defined over $(\mathcal{K}, \mathcal{M}, \mathcal{T})$, i.e. $H: \mathcal{K} \times \mathcal{M} \to \mathcal{T}$, and a given adversary $\mathcal{A}$, the attack game runs as follows. The challenger picks a random $k \xleftarrow{R} \mathcal{K}$ and keeps $k$ to itself. $\mathcal{A}$ outputs two distinct messages $m_0, m_1 \in \mathcal{M}$. We say that $\mathcal{A}$ wins the above game if $H(k, m_0) = H(k, m_1)$. We define $\mathcal{A}$'s advantage with respect to $H$, denoted $\mathrm{UHFadv}[\mathcal{A}, H]$, as the probability that $\mathcal{A}$ wins the game.
Definition 7.2. Let $H$ be a keyed hash function defined over $(\mathcal{K}, \mathcal{M}, \mathcal{T})$. We say that $H$ is an $\epsilon$-bounded universal hash function, or $\epsilon$-UHF, if $\mathrm{UHFadv}[\mathcal{A}, H] \le \epsilon$ for all adversaries $\mathcal{A}$ (even inefficient ones). We say that $H$ is a statistical UHF if it is an $\epsilon$-UHF for some negligible $\epsilon$. We say that $H$ is a computational UHF if $\mathrm{UHFadv}[\mathcal{A}, H]$ is negligible for all efficient adversaries $\mathcal{A}$.
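Because $\mathcal{A}$ never sees $k$, the best it can do is output the fixed pair with the highest collision probability over the key. For a tiny keyed hash we can therefore compute $\epsilon$ exactly by enumeration. A minimal sketch, assuming a hypothetical toy hash `H_lin(k, (a, b)) = a*k + b mod 7` (not from the slides):

```python
from itertools import combinations

P = 7
KEYS = range(P)
MESSAGES = [(a, b) for a in range(P) for b in range(P)]


def H_lin(k, m):
    # Hypothetical toy keyed hash on Z_7^2: H(k, (a, b)) = a*k + b mod 7
    return (m[0] * k + m[1]) % P


def max_collision_prob(H, keys, messages):
    """Return max over distinct (m0, m1) of Pr_k[H(k, m0) == H(k, m1)].

    This is the best UHF advantage of any adversary (even an inefficient one),
    so H is an eps-UHF for eps equal to this value.
    """
    keys = list(keys)
    best = 0.0
    for m0, m1 in combinations(messages, 2):
        hits = sum(1 for k in keys if H(k, m0) == H(k, m1))
        best = max(best, hits / len(keys))
    return best
```

For `H_lin`, two distinct messages with $a \ne a'$ collide for exactly one key and messages with $a = a'$ never collide, so the result is $1/7$.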
Constructing UHFs
The challenge in constructing good universal hash functions (UHFs) is to construct a function that achieves a small collision probability using a short key. Preferably, the size of the key should not depend on the length of the message being hashed. We give three constructions. The first is an elegant construction of a statistical UHF using modular arithmetic and polynomials. Our second construction is based on the CBC and cascade functions defined in Section 6.4; we show that both are computational UHFs. The third construction is based on PMAC$_0$ from Section 6.11.
7.2.1 Construction using polynomials
We start with a UHF construction using polynomials modulo a prime. Let $\ell$ be a (poly-bounded) length parameter and let $p$ be a prime. We define a hash function $H_{\mathrm{poly}}$ that hashes a message $m \in \mathbb{Z}_p^{\le \ell}$ to a single element $t \in \mathbb{Z}_p$. The key space is $\mathcal{K} := \mathbb{Z}_p$.
Let $m$ be a message, $m = (a_1, a_2, \ldots, a_v) \in \mathbb{Z}_p^{\le \ell}$ for some $0 \le v \le \ell$, and let $k \in \mathbb{Z}_p$ be a key. The hash function $H_{\mathrm{poly}}(k, m)$ is defined as follows:
$$H_{\mathrm{poly}}\big(k, (a_1, \ldots, a_v)\big) := k^v + a_1 k^{v-1} + a_2 k^{v-2} + \cdots + a_{v-1} k + a_v \in \mathbb{Z}_p \qquad (7.3)$$
That is, the message blocks are the coefficients of the polynomial and the key is the point at which it is evaluated.
$H_{\mathrm{poly}}(k, m)$ can be evaluated with Horner's method:
1. Set $t \leftarrow 1$
2. For $i \leftarrow 1$ to $v$:
3.     $t \leftarrow t \cdot k + a_i \in \mathbb{Z}_p$
4. Output $t$
It is not difficult to show that this algorithm produces the same value as defined in (7.3). Observe that a long message can be processed one block at a time using little additional space. Every iteration takes one multiplication and one addition.
On a machine that has several multiplication units, say four units, we can use a 4-way parallel version of Horner's method to utilize all the available units and speed up the evaluation of $H_{\mathrm{poly}}$. Assuming the length of $m$ is a multiple of 4, simply replace lines (2) and (3) above with the following:
2. For $i \leftarrow 1$ to $v$, incrementing $i$ by 4 at every iteration:
3.     $t \leftarrow t \cdot k^4 + a_i k^3 + a_{i+1} k^2 + a_{i+2} k + a_{i+3} \in \mathbb{Z}_p$
One can precompute the values $k^2, k^3, k^4$ in $\mathbb{Z}_p$. Then at every iteration we process four blocks of the message using four multiplications that can all be done in parallel.
Security as a UHF. Next we show that $H_{\mathrm{poly}}$ is an $(\ell/p)$-UHF. If $p$ is super-poly, this implies that $\ell/p$ is negligible, which means that $H_{\mathrm{poly}}$ is a statistical UHF.
Lemma 7.2. The function $H_{\mathrm{poly}}$ over $(\mathbb{Z}_p, (\mathbb{Z}_p)^{\le \ell}, \mathbb{Z}_p)$ defined in (7.3) is an $(\ell/p)$-UHF.
Proof. Consider two distinct messages $m_0 = (a_1, \ldots, a_u)$ and $m_1 = (b_1, \ldots, b_v)$ in $(\mathbb{Z}_p)^{\le \ell}$. We show that $\Pr[H_{\mathrm{poly}}(k, m_0) = H_{\mathrm{poly}}(k, m_1)] \le \ell/p$, where the probability is over the random choice of key $k$ in $\mathbb{Z}_p$. Define the two polynomials:
$$f(X) := X^u + a_1 X^{u-1} + a_2 X^{u-2} + \cdots + a_{u-1} X + a_u$$
$$g(X) := X^v + b_1 X^{v-1} + b_2 X^{v-2} + \cdots + b_{v-1} X + b_v$$
in $\mathbb{Z}_p[X]$. Then, by definition of $H_{\mathrm{poly}}$, we need to show that
$$\Pr[f(k) = g(k)] \le \ell/p \qquad (7.4)$$
where $k$ is uniform in $\mathbb{Z}_p$. In other words, we need to bound the number of points $k \in \mathbb{Z}_p$ for which $f(k) - g(k) = 0$. Since the messages $m_0$ and $m_1$ are distinct we know that $f(X) - g(X)$ is a nonzero polynomial (if $u \ne v$ the leading terms $X^u$ and $X^v$ do not cancel; if $u = v$ then $a_i \ne b_i$ for some $i$). Its degree is at most $\ell$, so it has at most $\ell$ roots in $\mathbb{Z}_p$, and a uniformly random $k$ is one of them with probability at most $\ell/p$. This proves (7.4).
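The three ways of computing $H_{\mathrm{poly}}$ above (the definition (7.3), Horner's method, and the 4-way variant) can be cross-checked in a few lines. A minimal sketch, assuming the prime $p = 2^{61} - 1$ purely for illustration:

```python
P = 2**61 - 1  # a Mersenne prime; assumed choice of p for illustration


def hpoly_direct(k, m, p=P):
    """Evaluate (7.3) term by term: k^v + a_1 k^(v-1) + ... + a_v mod p."""
    v = len(m)
    t = pow(k, v, p)
    for i, a in enumerate(m, start=1):
        t = (t + a * pow(k, v - i, p)) % p
    return t


def hpoly_horner(k, m, p=P):
    """Horner's method: one multiplication and one addition per block."""
    t = 1
    for a in m:
        t = (t * k + a) % p
    return t


def hpoly_horner4(k, m, p=P):
    """4-way Horner: needs len(m) % 4 == 0; k^2, k^3, k^4 are precomputed."""
    assert len(m) % 4 == 0
    k2 = k * k % p
    k3 = k2 * k % p
    k4 = k3 * k % p
    t = 1
    for i in range(0, len(m), 4):
        # The four products below are independent, so they could run in parallel.
        t = (t * k4 + m[i] * k3 + m[i + 1] * k2 + m[i + 2] * k + m[i + 3]) % p
    return t
```

For example, with $p = 7$, $k = 3$ and $m = (2, 1)$ all three give $3^2 + 2 \cdot 3 + 1 = 16 \equiv 2 \pmod 7$.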
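PRF as a computational UHF
Let $F: \mathcal{K} \times \mathcal{M} \to \mathcal{T}$ be a PRF. Then $F$ is a computational UHF (provided $|\mathcal{T}|$ is super-poly).
Proof. Suppose $\mathcal{A}$ is an adversary against the UHF. We construct an adversary $\mathcal{B}$ against the PRF, which has oracle access to a function Fn:
Adversary B^Fn
  Run A; let (m_1, m_2) be A's output
  If Fn(m_1) = Fn(m_2) then return 1
  Else return 0
If Fn is $F(k, \cdot)$ for a random key $k$, then $\mathcal{B}$ returns 1 with probability $\mathrm{UHFadv}[\mathcal{A}, F]$. If Fn is a truly random function into $\mathcal{T}$, then, since $m_1 \ne m_2$, $\mathcal{B}$ returns 1 with probability exactly $1/|\mathcal{T}|$. Hence $\mathrm{UHFadv}[\mathcal{A}, F] \le \mathrm{PRFadv}[\mathcal{B}, F] + 1/|\mathcal{T}|$, which is negligible when $F$ is a secure PRF and $|\mathcal{T}|$ is super-poly.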
Collision Resistance
Definition: A collision for a function $h: D \to \{0,1\}^n$ is a pair $x_1, x_2 \in D$ of points such that $h(x_1) = h(x_2)$ but $x_1 \ne x_2$.
If $|D| > 2^n$ then the pigeonhole principle tells us that there must exist a collision for $h$.
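The pigeonhole argument can be made concrete: evaluating an $n$-bit function on $2^n + 1$ distinct inputs must produce a repeated output. A minimal sketch, assuming a hypothetical toy function $h$ defined as the top `nbits` bits of SHA-256 (small `nbits` keeps the search short):

```python
import hashlib


def h(x: bytes, nbits: int) -> int:
    """Hypothetical toy n-bit hash: the top nbits of SHA-256(x)."""
    return int.from_bytes(hashlib.sha256(x).digest(), "big") >> (256 - nbits)


def pigeonhole_collision(nbits: int):
    """Hash 2^n + 1 distinct inputs; a collision is guaranteed to appear."""
    seen = {}
    for i in range(2**nbits + 1):  # |D| = 2^n + 1 > 2^n
        x = i.to_bytes(4, "big")
        y = h(x, nbits)
        if y in seen:
            return seen[y], x
        seen[y] = x
    raise AssertionError("unreachable by the pigeonhole principle")
```

Usage: `x1, x2 = pigeonhole_collision(12)` returns two distinct 4-byte inputs with the same 12-bit output.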
The Formalism
The formalism considers a family $H: \mathrm{Keys}(H) \times D \to R$ of functions, meaning for each $K \in \mathrm{Keys}(H)$ we have a map $H_K: D \to R$ defined by $H_K(x) = H(K, x)$. (Keyless means that $\mathrm{Keys}(H)$ contains a single key.)
Game CR_H
  procedure Initialize
    K ←$ Keys(H)
    Return K    // the adversary is given the key
  procedure Finalize(x_1, x_2)
    If (x_1 = x_2) then return false
    If (x_1 ∉ D or x_2 ∉ D) then return false
    Return (H_K(x_1) = H_K(x_2))
$$\mathrm{Adv}^{\mathrm{cr}}_H(A) = \Pr\big[\mathrm{CR}_H^A \Rightarrow \mathrm{true}\big]$$
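Game CR_H translates almost line for line into code. A minimal sketch, assuming a hypothetical family $H$ given by HMAC-SHA256 truncated to 3 bytes (a deliberately tiny range so that the sample adversary, which knows $K$, finds a collision quickly):

```python
import hashlib
import hmac
import os

KEY_LEN = 16  # hypothetical Keys(H) = {0,1}^128
OUT_LEN = 3   # hypothetical 24-bit range R, kept tiny so collisions are findable


def H(K: bytes, x: bytes) -> bytes:
    """Hypothetical family H: Keys(H) x D -> R (truncated HMAC-SHA256)."""
    return hmac.new(K, x, hashlib.sha256).digest()[:OUT_LEN]


def in_domain(x) -> bool:
    return isinstance(x, bytes)  # assumption: D = all byte strings


def cr_game(adversary) -> bool:
    K = os.urandom(KEY_LEN)      # Initialize: K <-$ Keys(H); K is given to A
    x1, x2 = adversary(K)
    if x1 == x2:                 # Finalize
        return False
    if not (in_domain(x1) and in_domain(x2)):
        return False
    return H(K, x1) == H(K, x2)


def birthday_adversary(K: bytes):
    """Knows K, so it can evaluate H_K itself and search for a collision."""
    seen = {}
    i = 0
    while True:
        x = i.to_bytes(8, "big")
        y = H(K, x)
        if y in seen:
            return seen[y], x
        seen[y] = x
        i += 1
```

$\mathrm{Adv}^{\mathrm{cr}}_H(A)$ is the probability, over the random key, that `cr_game(A)` returns `True`.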
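Keyless Hash Functions
Practical cryptographic hash functions (SHA3, etc.) are keyless.
However, no keyless hash function can be collision resistant under the formal definition: since $|D| > |R|$ a collision exists, and the adversary that simply outputs it (hardwired into its code) has advantage 1, even if nobody knows how to find it.
This leads to the foundations-of-hashing dilemma (Rogaway).
The solution is to use keyless hash functions in reductions only (no definition).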
Example
Let $E: \{0,1\}^k \times \{0,1\}^n \to \{0,1\}^n$ be a blockcipher. Let $H: \{0,1\}^k \times \{0,1\}^{2n} \to \{0,1\}^n$ be defined by
Alg H(K, x[1]x[2])
  y ← E_K(E_K(x[1]) ⊕ x[2])
  Return y
H is not collision resistant. Since $E_K$ is a permutation, $E_K(E_K(x[1]) \oplus x[2]) = E_K(E_K(x'[1]) \oplus x'[2])$ if and only if $E_K(x[1]) \oplus x[2] = E_K(x'[1]) \oplus x'[2]$, and the adversary is given $K$:
Adversary A(K)
  Choose any x[1], x[2], and x'[1] ≠ x[1]
  x'[2] ← E_K(x[1]) ⊕ x[2] ⊕ E_K(x'[1])
  Return (x[1]x[2], x'[1]x'[2])
A has advantage 1.
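The attack only needs to evaluate $E_K$, which the adversary can do because it holds $K$. A minimal sketch, assuming a hypothetical toy 64-bit blockcipher (a 4-round Feistel network with a SHA-256 round function) in place of a real blockcipher such as AES:

```python
import hashlib

N = 8  # block length n in bytes (64-bit blocks); hypothetical toy parameter


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(u ^ w for u, w in zip(a, b))


def E(K: bytes, x: bytes) -> bytes:
    """Hypothetical toy blockcipher E_K (4-round Feistel). Not secure."""
    L, R = x[:N // 2], x[N // 2:]
    for r in range(4):
        L, R = R, xor(L, hashlib.sha256(K + bytes([r]) + R).digest()[:N // 2])
    return L + R


def H(K: bytes, x: bytes) -> bytes:
    """H(K, x[1]x[2]) = E_K(E_K(x[1]) XOR x[2])."""
    x1, x2 = x[:N], x[N:]
    return E(K, xor(E(K, x1), x2))


def adversary(K: bytes):
    x1, x2 = bytes(N), bytes(N)              # any x[1], x[2]
    x1p = b"\x01" + bytes(N - 1)             # any x'[1] != x[1]
    x2p = xor(xor(E(K, x1), x2), E(K, x1p))  # x'[2] = E_K(x[1]) ^ x[2] ^ E_K(x'[1])
    return x1 + x2, x1p + x2p
```

For any key `K`, the two returned $2n$-bit strings differ in their first half yet hash to the same value.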
Alg SHA1(M) // |M| < 2^64
  V ← SHF1(5A827999 ∥ 6ED9EBA1 ∥ 8F1BBCDC ∥ CA62C1D6, M)
  return V
Alg SHF1(K, M) // |K| = 128 and |M| < 2^64
  y ← shapad(M)
  Parse y as M_1 ∥ M_2 ∥ ··· ∥ M_n where |M_i| = 512 (1 ≤ i ≤ n)
  V ← 67452301 ∥ EFCDAB89 ∥ 98BADCFE ∥ 10325476 ∥ C3D2E1F0
  for i = 1, ..., n do V ← shf1(K, M_i ∥ V)
  return V
Alg shapad(M) // |M| < 2^64
  d ← (447 − |M|) mod 512
  Let ℓ be the 64-bit binary representation of |M|
  y ← M ∥ 1 ∥ 0^d ∥ ℓ // |y| is a multiple of 512
  return y
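A byte-oriented sketch of shapad, assuming the message is a whole number of bytes (the slide works with arbitrary bit strings). Since $|M|$ is then a multiple of 8, the single 1 bit and the first 7 of the $d$ zero bits together form the byte `0x80`:

```python
def shapad(m: bytes) -> bytes:
    """y = M || 1 || 0^d || <|M|>_64 with d = (447 - |M|) mod 512 (bit lengths)."""
    bitlen = 8 * len(m)
    assert bitlen < 2**64
    d = (447 - bitlen) % 512  # number of zero bits after the single 1 bit
    # 0x80 supplies the 1 bit plus 7 zero bits; the other d - 7 zeros are whole bytes.
    return m + b"\x80" + b"\x00" * ((d - 7) // 8) + bitlen.to_bytes(8, "big")
```

A 55-byte message still fits in one 512-bit block after padding, while a 56-byte message needs two.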
shf1
Alg shf1(K, B ∥ V) // |K| = 128, |B| = 512 and |V| = 160
  Parse B as W_0 ∥ W_1 ∥ ··· ∥ W_15 where |W_i| = 32 (0 ≤ i ≤ 15)
  Parse V as V_0 ∥ V_1 ∥ ··· ∥ V_4 where |V_i| = 32 (0 ≤ i ≤ 4)
  Parse K as K_0 ∥ K_1 ∥ K_2 ∥ K_3 where |K_i| = 32 (0 ≤ i ≤ 3)
  for t = 16 to 79 do W_t ← ROTL^1(W_{t−3} ⊕ W_{t−8} ⊕ W_{t−14} ⊕ W_{t−16})
  A ← V_0 ; B ← V_1 ; C ← V_2 ; D ← V_3 ; E ← V_4
  for t = 0 to 19 do L_t ← K_0 ; L_{t+20} ← K_1 ; L_{t+40} ← K_2 ; L_{t+60} ← K_3
  for t = 0 to 79 do
    if (0 ≤ t ≤ 19) then f ← (B ∧ C) ∨ ((¬B) ∧ D)
    if (20 ≤ t ≤ 39 OR 60 ≤ t ≤ 79) then f ← B ⊕ C ⊕ D
    if (40 ≤ t ≤ 59) then f ← (B ∧ C) ∨ (B ∧ D) ∨ (C ∧ D)
    temp ← ROTL^5(A) + f + E + W_t + L_t
    E ← D ; D ← C ; C ← ROTL^30(B) ; B ← A ; A ← temp
  V_0 ← V_0 + A ; V_1 ← V_1 + B ; V_2 ← V_2 + C ; V_3 ← V_3 + D ; V_4 ← V_4 + E
  V ← V_0 ∥ V_1 ∥ V_2 ∥ V_3 ∥ V_4 ; return V
// + denotes addition modulo 2^32
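Putting SHA1, SHF1, shapad and shf1 together gives a complete, if slow, SHA-1 that can be checked against Python's `hashlib`. A minimal sketch for byte-string messages, following the slide's structure (the blockcipher-like round loop, keyed by the four constants) rather than any optimized layout:

```python
import hashlib
import struct

MASK = 0xFFFFFFFF  # all additions are mod 2^32
K_SHA1 = (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6)
IV_SHA1 = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)


def rotl(x: int, r: int) -> int:
    """ROTL^r on a 32-bit word."""
    return ((x << r) | (x >> (32 - r))) & MASK


def shapad(m: bytes) -> bytes:
    bitlen = 8 * len(m)
    d = (447 - bitlen) % 512
    return m + b"\x80" + b"\x00" * ((d - 7) // 8) + bitlen.to_bytes(8, "big")


def shf1(K, block: bytes, V):
    """shf1(K, B || V): 512-bit block, 160-bit chaining value as five words."""
    W = list(struct.unpack(">16I", block))  # W_0 .. W_15
    for t in range(16, 80):
        W.append(rotl(W[t - 3] ^ W[t - 8] ^ W[t - 14] ^ W[t - 16], 1))
    A, B, C, D, E = V
    L = [K[0]] * 20 + [K[1]] * 20 + [K[2]] * 20 + [K[3]] * 20
    for t in range(80):
        if t <= 19:
            f = (B & C) | (~B & D)
        elif 40 <= t <= 59:
            f = (B & C) | (B & D) | (C & D)
        else:  # 20 <= t <= 39 or 60 <= t <= 79
            f = B ^ C ^ D
        temp = (rotl(A, 5) + f + E + W[t] + L[t]) & MASK
        E, D, C, B, A = D, C, rotl(B, 30), A, temp
    return tuple((x + y) & MASK for x, y in zip(V, (A, B, C, D, E)))


def sha1(m: bytes) -> bytes:
    y = shapad(m)
    V = IV_SHA1
    for i in range(0, len(y), 64):  # V <- shf1(K, M_i || V)
        V = shf1(K_SHA1, y[i:i + 64], V)
    return struct.pack(">5I", *V)


assert sha1(b"abc") == hashlib.sha1(b"abc").digest()
```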
SHA3
Selected in October 2012 as the winner of the NIST competition.
NOT Merkle-Damgård: uses the sponge construction.
Natively supports variable-length output.
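Python's `hashlib` exposes the SHA3 family, including the extendable-output functions SHAKE128/SHAKE256, which show the variable-length output directly. A short illustration (the message is arbitrary):

```python
import hashlib

m = b"hash functions"
out16 = hashlib.shake_128(m).digest(16)  # 128 bits of output
out64 = hashlib.shake_128(m).digest(64)  # 512 bits from the same sponge
# Squeezing more output just continues the same stream.
assert out64[:16] == out16
# SHA3-256 is the fixed-length member of the family: always 32 bytes.
assert hashlib.sha3_256(m).digest_size == 32
```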
Sponge Function
Figure 8.11: The sponge construction, with permutation π: {0,1}^n → {0,1}^n, rate r and capacity c (n = r + c)
Input: M ∈ {0,1}^{≤L} and output length v > 0
Output: a tag h ∈ {0,1}^v
// Absorbing stage
Pad M and break into r-bit blocks m_1, ..., m_s
h ← 0^n
for i ← 1 to s do
  m'_i ← m_i ∥ 0^c ∈ {0,1}^n
  h ← π(h ⊕ m'_i)
// Squeezing stage
z ← h[0..r−1]
for i ← 1 to ⌈v/r⌉ do
  h ← π(h)
  z ← z ∥ (h[0..r−1])
output z[0..v−1]
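A toy sponge makes the absorb/squeeze structure concrete. A minimal sketch under stated assumptions: $n = 16$, $r = c = 8$, the permutation $\pi$ is a hypothetical fixed pseudorandom shuffle of all 16-bit values, and padding simply appends the byte `0x80` (a 1 bit followed by zeros up to the next 8-bit boundary). Real SHA3 uses Keccak-f[1600] and a different padding rule:

```python
import random

R_BITS, C_BITS = 8, 8
N_BITS = R_BITS + C_BITS
# Hypothetical permutation pi on {0,1}^16: a fixed pseudorandom shuffle.
PI = list(range(2**N_BITS))
random.Random(2012).shuffle(PI)


def pad(m: bytes) -> bytes:
    # Assumed padding: one 1 bit, then zeros up to a multiple of r = 8 bits.
    return m + b"\x80"


def sponge(m: bytes, v_bytes: int) -> bytes:
    h = 0                                # h <- 0^n
    for block in pad(m):                 # absorbing: each block m_i is r = 8 bits
        h = PI[h ^ (block << C_BITS)]    # m'_i = m_i || 0^c ; h <- pi(h xor m'_i)
    z = [h >> C_BITS]                    # z <- h[0..r-1], the top r bits
    for _ in range(v_bytes):             # ceil(v/r) squeezes, with v = 8 * v_bytes
        h = PI[h]
        z.append(h >> C_BITS)
    return bytes(z[:v_bytes])            # output z[0..v-1]
```

As with SHAKE, asking for more output extends the same stream: `sponge(m, 2)` is a prefix of `sponge(m, 6)`.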
Applications
Killer app: hash-before-sign.
Used in both security and non-security applications, e.g. password verification, compare-by-hash, and virus protection.
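For password verification, the server stores a hash of the password instead of the password itself. A minimal sketch using PBKDF2-HMAC-SHA256 from Python's `hashlib` (a salted, iterated hash; the slide does not prescribe a construction, and the iteration count below is an assumed value):

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for the deployment


def make_record(password: str):
    """Store (salt, hash) instead of the password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify(password: str, record) -> bool:
    salt, digest = record
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison
```

The random salt makes equal passwords produce different records, and `hmac.compare_digest` avoids leaking timing information during the comparison.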
Birthday Attack
Let $H: \{0,1\}^k \times D \to \{0,1\}^n$.
adversary A(K)
  for i = 1, ..., q do x_i ←$ D ; y_i ← H_K(x_i)
  if ∃ i, j (i ≠ j and y_i = y_j and x_i ≠ x_j) then return x_i, x_j
  else return FAIL
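With a dictionary from outputs to inputs, the check for a pair $i \ne j$ with $y_i = y_j$ costs $O(1)$ per trial, so the attack runs in time about $q$. A minimal sketch, assuming a hypothetical keyed family $H_K$ = HMAC-SHA256 truncated to $n = 24$ bits and $D = \{0,1\}^{64}$:

```python
import hashlib
import hmac
import random

N_BYTES = 3  # n = 24-bit toy output


def H(K: bytes, x: bytes) -> bytes:
    """Hypothetical keyed family: HMAC-SHA256 truncated to n bits."""
    return hmac.new(K, x, hashlib.sha256).digest()[:N_BYTES]


def birthday_attack(K: bytes, q: int, rng: random.Random):
    seen = {}  # maps y_i -> x_i
    for _ in range(q):
        x = rng.getrandbits(64).to_bytes(8, "big")  # x_i <-$ D = {0,1}^64
        y = H(K, x)
        if y in seen and seen[y] != x:
            return seen[y], x
        seen[y] = x
    return None  # FAIL
```

With $q = 2^{15}$ trials against a $2^{24}$-point range, $q^2/2^{n+1} = 32$, so a collision is essentially certain.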
Analysis
What is the probability that the birthday attack finds a collision? Consider the simplified adversary:
adversary A(K)
  for i = 1, ..., q do x_i ←$ D ; y_i ← H_K(x_i)
  if ∃ i, j (i ≠ j and y_i = y_j) then COLL ← true
We have dropped things that don't much affect the advantage and focused on success probability. So we want to know Pr[COLL].
Birthday vs. the attack
Birthday: choose points at random from the range.
  for i = 1, ..., q do y_i ←$ {0,1}^n
  if ∃ i, j (i ≠ j and y_i = y_j) then COLL ← true
  Pr[COLL] = C(2^n, q)
Adversary A: choose points at random from the domain, hash them.
  for i = 1, ..., q do x_i ←$ D ; y_i ← H_K(x_i)
  if ∃ i, j (i ≠ j and y_i = y_j) then COLL ← true
  Pr[COLL] = ?
Here $C(N, q)$ denotes the probability of at least one collision among $q$ uniform draws from a set of size $N$.
Are the two collision probabilities the same?
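$C(N, q)$ has the exact form $1 - \prod_{i=0}^{q-1}(1 - i/N)$. Using $1 - x \le e^{-x}$ and a union bound gives
$$1 - e^{-q(q-1)/2N} \le C(N, q) \le \frac{q(q-1)}{2N}.$$
A minimal sketch that computes both the exact value and these bounds:

```python
import math


def birthday_C(N: int, q: int) -> float:
    """Exact Pr[COLL] when q points are drawn uniformly from a set of size N."""
    no_coll = 1.0
    for i in range(q):
        no_coll *= (N - i) / N
    return 1.0 - no_coll


def bounds(N: int, q: int):
    lower = 1.0 - math.exp(-q * (q - 1) / (2 * N))
    upper = q * (q - 1) / (2 * N)  # union bound over all pairs
    return lower, upper
```

The classic check is `birthday_C(365, 23)`, roughly 0.507: 23 people suffice for a better-than-even chance of a shared birthday.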
Regularity
We say that $H: \{0,1\}^k \times D \to \{0,1\}^n$ is regular if every range point has the same number of pre-images under $H_K$. That is, if we let $H_K^{-1}(y) = \{x \in D : H_K(x) = y\}$, then $H$ is regular if $|H_K^{-1}(y)| = |D|/2^n$ for all $K$ and $y$. In this case the following processes both result in a random output:
Process 1: y ←$ {0,1}^n ; return y
Process 2: x ←$ D ; y ← H_K(x) ; return y
If $H: \{0,1\}^k \times D \to \{0,1\}^n$ is regular, then the birthday attack finds a collision in about $2^{n/2}$ trials. If $H$ is not regular, the attack may succeed sooner. So we want functions to be close to regular. It seems MD4, MD5, SHA1, SHA2, SHA3, ... have this property.
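The effect of irregularity can be computed exactly from the pre-image counts $|H_K^{-1}(y)|$: the probability that $q$ hashed random domain points are all distinct is $q!\, e_q(p_1, \ldots, p_N)$, where $p_y = |H_K^{-1}(y)|/|D|$ and $e_q$ is the $q$-th elementary symmetric polynomial. A minimal sketch, with hypothetical toy pre-image counts (16 range points, $|D| = 64$):

```python
from math import factorial


def collision_prob(counts, q):
    """Exact Pr[COLL] for q samples x <-$ D, y = H_K(x), where
    counts[y] = |H_K^{-1}(y)|. Uses Pr[no collision] = q! * e_q(p)."""
    total = sum(counts)
    probs = [c / total for c in counts]
    e = [1.0] + [0.0] * q  # elementary symmetric polynomials e_0 .. e_q
    for p in probs:
        for j in range(q, 0, -1):
            e[j] += e[j - 1] * p
    return 1.0 - factorial(q) * e[q]


regular = [4] * 16           # every y has |D| / 2^n = 4 pre-images
irregular = [49] + [1] * 15  # one y has most of the domain
```

For `q = 3`, the regular case gives exactly $1 - (15/16)(14/16) = 46/256$, the same as sampling directly from the range, while the irregular function collides far more often.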
Birthday attack times
Function    n     T_B
MD4         128   2^64
MD5         128   2^64
SHA1        160   2^80
SHA2-256    256   2^128
SHA2-512    512   2^256
SHA3-256    256   2^128
SHA3-512    512   2^256
n is the hash output length in bits; T_B is the number of trials to find collisions via a birthday attack.
Compression Functions
A compression function is a family $h: \{0,1\}^k \times \{0,1\}^{b+n} \to \{0,1\}^n$ of hash functions whose inputs are of a fixed size $b + n$, where $b$ is called the block size.
E.g. $b = 512$ and $n = 160$, in which case $h: \{0,1\}^k \times \{0,1\}^{672} \to \{0,1\}^{160}$.
[Figure: inputs x (b bits) and v (n bits) enter h_K, which outputs h_K(x ∥ v).]
MD Transform (Merkle-Damgård)
Design principle: To build a CR hash function $H: \{0,1\}^k \times D \to \{0,1\}^n$ where $D = \{0,1\}^{\le 2^{64}}$:
First build a CR compression function $h: \{0,1\}^k \times \{0,1\}^{b+n} \to \{0,1\}^n$.
Appropriately iterate $h$ to get $H$, using $h$ to hash block-by-block.
Setup
Assume for simplicity that $|M|$ is a multiple of $b$. Let $\|M\|_b$ be the number of $b$-bit blocks in $M$, and write $M = M[1] \ldots M[\ell]$ where $\ell = \|M\|_b$.
Let $\langle i \rangle$ denote the $b$-bit binary representation of $i \in \{0, \ldots, 2^b - 1\}$.
Let $D$ be the set of all strings of at most $2^b - 1$ blocks, so that $\|M\|_b \in \{0, \ldots, 2^b - 1\}$ for any $M \in D$, and thus $\|M\|_b$ can be encoded as above.
The Transform
Given: compression function $h: \{0,1\}^k \times \{0,1\}^{b+n} \to \{0,1\}^n$.
Build: hash function $H: \{0,1\}^k \times D \to \{0,1\}^n$.
Algorithm H_K(M)
  m ← ‖M‖_b ; M[m+1] ← ⟨m⟩ ; V[0] ← 0^n
  For i = 1, ..., m+1 do V[i] ← h_K(M[i] ∥ V[i−1])
  Return V[m+1]
The values V[i] are the chaining variables. In SHA1 the initial value V[0] is not 0^n.
[Figure: for a two-block message, M[1], M[2] and ⟨2⟩ are fed in turn into h_K, starting from V[0] = 0^n; the final output is H_K(M).]
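The transform is a short loop once a compression function is fixed. A minimal sketch, assuming hypothetical toy sizes $b = 128$, $n = 64$ bits and a hypothetical compression function $h_K(x) = $ SHA-256$(K \| x)$ truncated to $n$ bits:

```python
import hashlib

B_BYTES, N_BYTES = 16, 8  # hypothetical toy sizes: b = 128, n = 64 bits


def h(K: bytes, x: bytes) -> bytes:
    """Hypothetical compression function h_K: {0,1}^(b+n) -> {0,1}^n."""
    assert len(x) == B_BYTES + N_BYTES
    return hashlib.sha256(K + x).digest()[:N_BYTES]


def md(K: bytes, M: bytes) -> bytes:
    assert len(M) % B_BYTES == 0, "assume |M| is a multiple of b"
    blocks = [M[i:i + B_BYTES] for i in range(0, len(M), B_BYTES)]
    m = len(blocks)
    blocks.append(m.to_bytes(B_BYTES, "big"))  # M[m+1] <- <m>
    V = bytes(N_BYTES)                         # V[0] <- 0^n
    for block in blocks:                       # V[i] <- h_K(M[i] || V[i-1])
        V = h(K, block + V)
    return V                                   # V[m+1]
```

Appending the block count $\langle m \rangle$ is what separates messages of different lengths; the next slides show that this is enough for CR to carry over from $h$ to $H$.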
MD preserves CR
Assume h is CR and H is built from h using MD. Then H is CR too!
This means:
No need to attack H! You won't find a weakness in it unless h has one.
H is guaranteed to be secure assuming h is.
For this reason, MD is the design used in many current hash functions. Newer hash functions use other iteration methods with analogous properties.
The Theorem
Theorem: Let $h: \{0,1\}^k \times \{0,1\}^{b+n} \to \{0,1\}^n$ be a family of functions and let $H: \{0,1\}^k \times D \to \{0,1\}^n$ be obtained from $h$ via the MD transform. Given a cr-adversary $A_H$ we can build a cr-adversary $A_h$ such that
$$\mathrm{Adv}^{\mathrm{cr}}_H(A_H) \le \mathrm{Adv}^{\mathrm{cr}}_h(A_h)$$
and the running time of $A_h$ is that of $A_H$ plus the time for computing $h$ on the outputs of $A_H$.
Implication: $h$ CR $\Rightarrow$ $\mathrm{Adv}^{\mathrm{cr}}_h(A_h)$ small $\Rightarrow$ $\mathrm{Adv}^{\mathrm{cr}}_H(A_H)$ small $\Rightarrow$ $H$ CR
Proof
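Case 1: $\|M_1\|_b \ne \|M_2\|_b$, illustrated with $M_1$ of two blocks and $M_2$ of one block. Here $V_1[i]$ and $V_2[i]$ denote the chaining variables in the computations of $H_K(M_1)$ and $H_K(M_2)$. Let $x_1 = \langle 2 \rangle \,\|\, V_1[2]$ and $x_2 = \langle 1 \rangle \,\|\, V_2[1]$. Then $h_K(x_1) = h_K(x_2)$ because $H_K(M_1) = H_K(M_2)$. But $x_1 \ne x_2$ because $\langle 1 \rangle \ne \langle 2 \rangle$.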
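Case 2: $\|M_1\|_b = \|M_2\|_b$, illustrated with two blocks each. $A_h$ proceeds as follows:
  x_1 ← ⟨2⟩ ∥ V_1[2] ; x_2 ← ⟨2⟩ ∥ V_2[2]
  If x_1 ≠ x_2 then return x_1, x_2
  Else // V_1[2] = V_2[2]
    x_1 ← M_1[2] ∥ V_1[1] ; x_2 ← M_2[2] ∥ V_2[1]
    If x_1 ≠ x_2 then return x_1, x_2
    Else // V_1[1] = V_2[1]
      x_1 ← M_1[1] ∥ 0^n ; x_2 ← M_2[1] ∥ 0^n
      Return x_1, x_2
At each step the two candidate inputs have equal $h_K$ outputs. The final pair is distinct because $M_1 \ne M_2$ while $M_1[2] = M_2[2]$, so $M_1[1] \ne M_2[1]$.
Mihir Bellare UCSD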
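Both cases generalize to a single backward walk over the chain: starting from the last call to $h_K$, whose outputs agree because $H_K(M_1) = H_K(M_2)$, return the first pair of inputs that differ; if the inputs agree, the previous outputs agree too. A minimal sketch of $A_h$, assuming hypothetical tiny sizes ($b = 32$, $n = 16$ bits) and a hypothetical truncated-SHA-256 compression function so that an $H$-collision can be found by brute force first:

```python
import hashlib
from itertools import count

B, N = 4, 2  # hypothetical tiny sizes in bytes, so H-collisions are easy to find
K = b"demo-key"


def h(x: bytes) -> bytes:
    return hashlib.sha256(K + x).digest()[:N]


def chain(M: bytes):
    """Return the inputs M[i] || V[i-1] fed to h by MD, and H_K(M)."""
    blocks = [M[i:i + B] for i in range(0, len(M), B)]
    blocks.append(len(blocks).to_bytes(B, "big"))  # length block <m>
    V, inputs = bytes(N), []
    for blk in blocks:
        x = blk + V
        inputs.append(x)
        V = h(x)
    return inputs, V


def extract(M1: bytes, M2: bytes):
    """A_h: turn an H-collision M1 != M2 into an h-collision x1 != x2."""
    in1, out1 = chain(M1)
    in2, out2 = chain(M2)
    assert M1 != M2 and out1 == out2
    # Walk back from the last h call; outputs at each step are equal by induction.
    for x1, x2 in zip(reversed(in1), reversed(in2)):
        if x1 != x2:
            return x1, x2
    raise AssertionError("unreachable: M1 != M2")


def find_H_collision():
    """Brute-force an H-collision among distinct one- and two-block messages."""
    seen = {}
    for i in count():
        M = i.to_bytes(B if i % 2 else 2 * B, "big")
        _, y = chain(M)
        if y in seen:
            return seen[y], M
        seen[y] = M
```

If the messages have different lengths, the first comparison already differs in the length block, matching Case 1.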
Joux's Attack
Intuition about random functions may lead to incorrect conclusions when applied to a Merkle-Damgård function. We say that an $s$-collision for a hash function $H$ is a set of distinct messages $M_1, \ldots, M_s \in \mathcal{M}$ such that $H(M_1) = \cdots = H(M_s)$. Joux showed how to find an $s$-collision for a Merkle-Damgård function in time $O((\log_2 s) \cdot |\mathcal{X}|^{1/2})$, where $\mathcal{X}$ is the set of chaining values.
Consequence: consider $H_{12}(M) := H_1(M) \,\|\, H_2(M)$, where $H_1$ is a Merkle-Damgård function and $H_1, H_2$ both have $n$-bit outputs. Using Joux's method we can find a $2^{n/2}$-collision $M_1, \ldots, M_{2^{n/2}}$ for $H_1$ in time $O(n 2^{n/2})$. Then, by the birthday paradox, it is likely that two of these messages, say $M_i, M_j$, are also a collision for $H_2$. This pair $M_i, M_j$ is a collision for both $H_1$ and $H_2$ and therefore a collision for $H_{12}$. It was found in time $O(n 2^{n/2})$, as promised.
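Joux's idea: find $t$ successive single-block collisions, each starting from the chaining value the previous one produced. Choosing either block at each of the $t$ positions gives $2^t$ distinct messages that all reach the same chaining values, so $t$ birthday searches yield a $2^t$-collision. A minimal sketch, assuming hypothetical tiny sizes ($b = 32$, $n = 16$ bits) and a hypothetical truncated-SHA-256 compression function:

```python
import hashlib
from itertools import count, product

B, N = 4, 2  # hypothetical tiny sizes in bytes: 16-bit chaining value


def h(x: bytes) -> bytes:
    """Hypothetical toy compression function {0,1}^(b+n) -> {0,1}^n."""
    return hashlib.sha256(b"joux" + x).digest()[:N]


def H(M: bytes) -> bytes:
    """Merkle-Damgard with a final length block, as in the MD transform."""
    blocks = [M[i:i + B] for i in range(0, len(M), B)]
    blocks.append(len(blocks).to_bytes(B, "big"))
    V = bytes(N)
    for blk in blocks:
        V = h(blk + V)
    return V


def block_collision(V: bytes):
    """Birthday search for x != x' with h(x || V) == h(x' || V)."""
    seen = {}
    for i in count():
        x = i.to_bytes(B, "big")
        y = h(x + V)
        if y in seen:
            return seen[y], x
        seen[y] = x


def joux(t: int):
    """Return 2^t distinct t-block messages that all collide under H."""
    V, pairs = bytes(N), []
    for _ in range(t):
        x0, x1 = block_collision(V)
        pairs.append((x0, x1))
        V = h(x0 + V)  # equals h(x1 + V)
    return [b"".join(choice) for choice in product(*pairs)]
```

All messages have the same number of blocks, so the final length block is identical and the collision survives it.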
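Compression Function from Blockcipher: An Example
Let $E: \{0,1\}^b \times \{0,1\}^n \to \{0,1\}^n$ be a block cipher. Let us design a keyless compression function $h: \{0,1\}^{b+n} \to \{0,1\}^n$ by $h(x \,\|\, v) = E_x(v)$.
Is h collision resistant? No: pick any $x \,\|\, v$ and any $x' \ne x$, and set $v' = E_{x'}^{-1}(E_x(v))$. Then $h(x' \,\|\, v') = E_{x'}(v') = E_x(v) = h(x \,\|\, v)$.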
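The attack uses the fact that anyone can invert $E$ under a key of their choice. A minimal sketch, assuming a hypothetical toy 64-bit blockcipher (a 4-round Feistel network with a SHA-256 round function, keyed by the block $x$) in place of a real one:

```python
import hashlib

N = 8  # n = 64-bit block; hypothetical toy blockcipher keyed by the b-bit x


def _f(key: bytes, r: int, half: bytes) -> bytes:
    return hashlib.sha256(key + bytes([r]) + half).digest()[:N // 2]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(u ^ w for u, w in zip(a, b))


def E(key: bytes, v: bytes) -> bytes:
    L, R = v[:N // 2], v[N // 2:]
    for r in range(4):
        L, R = R, _xor(L, _f(key, r, R))
    return L + R


def D(key: bytes, c: bytes) -> bytes:
    """Inverse of E: undo the Feistel rounds in reverse order."""
    L, R = c[:N // 2], c[N // 2:]
    for r in reversed(range(4)):
        L, R = _xor(R, _f(key, r, L)), L
    return L + R


def h(x: bytes, v: bytes) -> bytes:
    return E(x, v)  # h(x || v) = E_x(v)


x, v = b"block-one", bytes(N)  # any x || v
x2 = b"block-two"              # any x' != x
v2 = D(x2, E(x, v))            # v' = E_{x'}^{-1}(E_x(v))
```

Here `(x, v)` and `(x2, v2)` are different inputs with `h(x, v) == h(x2, v2)`.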
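Davies-Meyer
$h_{DM}(x \,\|\, y) = E_y(x) \oplus x$: the message block $y := m_i \in \mathcal{K}$ is used as the cipher key and the chaining value $x := t_{i-1} \in \mathcal{X}$ as the plaintext, so $t_i := E(m_i, t_{i-1}) \oplus t_{i-1}$.
Figure 8.6: The Davies-Meyer compression function
A collision now requires $E_y(x) \oplus x = E_{y'}(x') \oplus x'$. The inversion trick of the previous example no longer directly yields one: with $x' = E_{y'}^{-1}(E_y(x))$ the two outputs differ by $x \oplus x'$.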
Other Examples
Davies-Meyer variants. The Davies-Meyer construction is not unique. Many other similar methods can convert a block cipher into a collision resistant compression function. For example, one could use
Matyas-Meyer-Oseas: $h_1(x, y) := E(x, y) \oplus y$
Miyaguchi-Preneel: $h_2(x, y) := E(x, y) \oplus y \oplus x$
Or even: $h_3(x, y) := E(x \oplus y, y) \oplus y$
or many other such variants. Preneel et al. [89] give twelve different variants that can be shown to be collision resistant.
Davies-Meyer Security
Theorem 8.4 (Davies-Meyer). Let $h_{DM}$ be the Davies-Meyer hash function derived from a block cipher $\mathcal{E} = (E, D)$ defined over $(\mathcal{K}, \mathcal{X})$, where $|\mathcal{X}|$ is large. Then $h_{DM}$ is collision resistant in the ideal cipher model. In particular, every collision finding adversary $\mathcal{A}$ that issues at most $q$ ideal-cipher queries will satisfy
$$\mathrm{CR^{ic}adv}[\mathcal{A}, h_{DM}] \le \frac{(q+1)(q+2)}{|\mathcal{X}|}.$$
Proof
Cryptanalysis
So far we have looked at attacks that do not attempt to exploit the structure of H. Can we do better than the birthday attack if we do exploit the structure? Ideally not, but some functions have fallen short!
MD5
Designed by Ron Rivest in the early 1990s and still widely used.
Collisions can be found in under a minute [WaFeLaYu, LeWadW, KI].
One can find two Win32 executables whose MD5 hashes collide (breaking virus protection).
One can break deployed cryptographic protocols.
SHA1
SHA3
Theorem 8.6. Let $H$ be the hash function obtained from a permutation $\pi: \{0,1\}^n \to \{0,1\}^n$, with capacity $c$, rate $r$ (so $n = r + c$), and output length $v \le r$. In the ideal permutation model, where $\pi$ is modeled as a random permutation, the hash function $H$ is collision resistant, assuming $2^v$ and $2^c$ are super-poly. In particular, for every collision finding adversary $\mathcal{A}$, if the number of ideal-permutation queries plus the number of $r$-bit blocks in the output messages of $\mathcal{A}$ is bounded by $q$, then
$$\mathrm{CR^{ic}adv}[\mathcal{A}, H] \le \frac{q(q-1)}{2^v} + \frac{q(q+1)}{2^c}.$$