Perfectly Balanced Boolean Functions and Golić Conjecture Stanislav V. Smyshlyaev Abstract: Golić conjecture ([3]) states that the necessary condition for a function to be perfectly balanced for any choice of a tapping sequence is linearity of a function in the first or in the last essential variable. In the current paper we prove Golić conjecture. Keywords: Boolean function, perfectly balanced function, keystream generator, filter, Golić conjecture. Introduction Golić ([3]) studied cryptographic properties of keystream generators consisting of a shift register and a filter function which is connected to the register according to some tapping sequence. Golić considered model of a keystream generator as a filter with a fixed filter function and arbitrary choice of a tapping sequence. With proposal of an inversion attack on the filter he showed a cryptographic weakness of keystream generators in such model in the case of a filter function being linear either in the first or in the last variable. Earlier Anderson ([]) proposed an idea of optimum correlation attack and showed corresponding cryptographic weakness of such keystream generators in case of inappropriate choice of both the tapping sequence and the filter function. The important open question was: do any keystream generators without these undesirable properties exist in Golić model? Golić conjectured that in his model a filter with a filter function f is invulnerable to Anderson optimum correlation attack if and only if f is linear either in the first or in the last variable. Golić proved an easier part of this conjecture, namely sufficiency, and noted that necessity remained unproven due to a subtle underlying combinatorial problem remaining to be solved. According to Golić conjecture, the necessary condition for a function to be perfectly balanced (i. e. preserving pure randomness of an input binary sequence when used as a filter function) for any choice of a tapping sequence is linearity of a function in the first or in the last essential variable. Golić conjecture implies that in the model being considered (with The work was partially supported by the Russian Foundation for Basic Research (grant no. 09-0-00653) Computer Science Department, Lomonosov University, Moscow, Russia; E-mail: smyshsv@gmail.com
independent choice of a tapping sequence and a Boolean function) there are no functions invulnerable both to the inversion attack and to the optimum correlation attack. To prove Golić conjecture, it suffices to find for arbitrary Boolean function which is nonlinear in the first and in the last essential variables a tapping sequence, such that the Boolean function which describes input-output behaviour of the corresponding filter does not satisfy conditions of the Sumarokov criterion of perfect balancedness ([]). The trivial case of a function with no linear variables was considered in [7]. In the general case, all linear variables of a function have to be handled in a special way to construct a particular tapping sequence and two binary sequences required by Sumarokov criterion. This in fact solves an underlying combinatorial problem mentioned by Golić. Related work. Sumarokov ([]) defined perfect balancedness of Boolean functions. A perfectly balanced filter function transforms uniformly distributed input sequences into uniformly distributed output sequences. Also, Sumarokov proved a useful criterion of perfect balancedness. Dichtl ([2]) offered an example of a Boolean function that is nonlinear in the first and in the last variables but is perfectly balanced when used as a filter function with a certain choice of a tapping sequence. That example does not rule out Golić conjecture because of the fact that some other choices of a tapping sequence do not induce perfect balancedness of corresponding filter functions. Gouget and Sibert ([4]) suggested not to consider a Boolean function independently of a tapping sequence used in a filter and noted that one class of perfectly balanced functions nonlinear in both the first and the last variable was described by Logachev ([6]). Nevertheless, existence of this class is not in conflict with Golić conjecture either, because the models are different. 2 Definitions As usual, F 2 denotes the Galois field. For any n N V n denotes F n 2, F n is the set of all Boolean functions in n variables. Variable x i is called essential for the function f(x, x 2,..., x n ) F n if there exists (α, α 2,..., α i, α i+,..., α n ) V n such that f(α, α 2,..., α i, 0, α i+,..., α n ) f(α, α 2,..., α i,, α i+,..., α n ). Variable x i is called linear essential for the function f(x, x 2,..., x n ) F n if for any (α, α 2,..., α i, α i+,..., α n ) V n inequality f(α, α 2,..., α i, 0, α i+,..., α n ) f(α, α 2,..., α i,, α i+,..., α n ) holds. By Φ n, Φ n F n, we denote the set of all Boolean functions with both first and last variables being essential. Let m N. Boolean function g F N, N N, induces mapping g m : V m+n V m of the form g m (z, z 2,..., z m+n ) = (g(z,..., z N ), g(z 2,..., z N+ ),..., g(z m,..., z m+n )). (2.) Let γ = (γ,..., γ n ) be a tuple of nonnegative integers such that γ = 0; γ i+ > γ i, i =, 2,..., n, and let N = γ n +. From now on we consider tuples γ of this form. For γ of the above form and arbitrary f Φ n we denote f(x N γn, x N γn,..., x N γ ) by f γ (x,..., x N ). A filter with a tapping sequence γ, and a filter function f is a mapping of the set to V i, defined by equations (2.) with m =, 2,... and g = f γ. V i i=γ n+ i= 2
Definition 2.. ([]). A Boolean function f F n is said to be perfectly balanced if for any m N and any y V m (f m ) (y) = 2 n, where denotes cardinality. The subset of F n composed by all functions linear in the first (resp. last) variable is denoted by L n (resp. R n ). It is easy to see ([]) that all functions in L n R n are perfectly balanced. 3 Preliminaries We denote the set of all perfectly balanced n -variable functions by PB n, PB n F n. From cryptographic applications point of view the subset Φ n PB n \ (L n R n ) is of primary importance. The next theorem states necessary and sufficient condition for an m -tuple in the righthand side of equation (2.) to be distributed uniformly in V m given the uniform distribution of the vector X m = (x,..., x m+n ) and can be easily proven using only Definition 2. and basics of probability theory. Theorem 3.. Let n N and f F n. Let {X m = (x,..., x m+n )} m= random vectors with distribution be a sequence of Pr{X m = (a,..., a m+n )} = 2 (m+n ) for any (a,..., a m+n ) V m+n. Random vector Y m = f m (X m ) is distributed uniformly for each m N iff f is perfectly balanced. Theorem 3.2. ([]). A Boolean function f F n of distinct binary sequences is perfectly balanced iff there is no pair x = (x,..., x r ), z = (z,..., z r ) V r, r > 2n, (3.) such that x = z,..., x n = z n, x r n+ = z r n+,..., x r = z r ; (3.2) x z; (3.3) f(x i,..., x i+n ) = f(z i,..., z i+n ), i =,..., r n +. (3.4) Full proof of the Theorem 3.2 can be found in Appendix A. Theorem 3.3. ([3]) For a filter with a filter function f for any choice of a tapping sequence γ the output sequence is purely random given that the input sequence is such if (and only if [not proven]) f(z,..., z n ) is balanced for each value of (z 2,..., z n ) (i. e. f is linear in the first variable) or f(z,..., z n ) is balanced for each value of (z,..., z n ) (i. e. f is linear in the last variable). According to Dichtl ([2]), unproven necessary condition in Theorem 3.3 is referred to as Golić conjecture. 3
4 Main Result Theorem 3. implies that Golić conjecture can be stated in the following form. Conjecture 4.. If f γ is perfectly balanced for every possible choice of γ, then f is linear in the first or in the last variable. To prove Golić conjecture it suffices to construct for arbitrary f Φ n \ (L n R n ) a particular tapping sequence making function f γ not perfectly balanced. The key idea is to force γ i increase exponentially in i. After choosing appropriate γ we construct two different binary sequences of the special form required by Sumarokov criterion (Theorem 3.2) to prove that f γ is not perfectly balanced. Theorem 4.2. For any f Φ n \ (L n R n ) there exists a tuple γ such that f γ / PB N. Proof. Let f Φ n \ (L n R n ). Suppose that f depends on each variable essentially (this is w.l.o.g. since we are free to choose any tuple γ ). Choose γ as follows: γ = (τ 0, τ 0 + δ 0,..., τ 0 + (m 0 )δ 0, τ,..., τ k, τ k + δ k,..., τ k + (m k )δ k, τ k+,..., τ n l ), where δ k = τ k+ τ k m k and m k is the number of succeeding linear essential variables of f between (n l k ) th and (n l k) th nonlinear essential variables, k = 0,,..., n l 2 ; l = (m 0 ) +... + (m n l ) is the total number of linear essential variables of f. Let m = max m k k=0,...,n l 2, τ 0 = 0, τ = m 0, τ k+ > (4m 2 + )τ k, k =,..., n l 2 and τ k+ τ k be a multiple of m k. Consider two binary sequences y = (y 0,..., y M ), z = (z 0,..., z M ), M = 2N + l, where k j are indices such that m kj > ( l denotes the total number of these indices). Fix certain bits of these sequences as follows: = 0, z yn+ l =, N+ l a j a j {0, }, j =,..., l. Indices of the form N + l j= a j j= a j are referred to as B-indices and all the others as A- indices. It is easy to conclude using Theorem 3.2, that to prove the Theorem it suffices to show that one can set all yet unfixed bits of y so that f γm N+2 (y) = f γm N+2 (z) and z j = y j holds for any A-index j. Thereby we have distinct binary sequences y, z, y = z > 2N, with coinciding leading as well as tailing N-bit subsequences and such that f γm N+2 (y) = f γm N+2 (z). Then, using Theorem 3.2, one concludes that γ is required tapping sequence, f γ / PB N and the Theorem follows. First, we demonstrate some simple relations.. δ k = τ k+ τ k m k > (+4m2 )τ k τ k m k 4mτ k. 2. If m k >, then τ k = τ k + m k δ k 2δ k. 3. δ k > δ k. From and 2 it follows that if m k >, then δ k > δ k. j= j= 4
l 4. From 3 it follows that < l j=j j=j δ kl () l j 5. δ k = τ k+ τ k m k < τ k+ m k τ k+ if k ; δ 0 τ. < δ kl = δ k l () i i=0 = δ kl. According to Theorem 3.2, to prove the Theorem it suffices to prove solvability of the following system of equations. f γ (y 0,..., y N ) = f γ (z 0,..., z N )... f γ (y M N+,..., y M ) = f γ (z M N+,..., z M ) = 0, a yn+ l j {0, }, j =,..., l a j j= (4.) =, a zn+ l j {0, }, j =,..., l a j j= y t = z t, t N + l a j, a j {0, }, j =,..., l j= Now we fix variables involved in the second subsystem of (4.) and consider i th equation ( i = 0,..., M N + ) of the first subsystem. Three cases are possible. Case. Each B-index variable, which is essential for f i γ f γ (y i,..., y i+n ), is linear for f i γ. In Lemma 4.3 (the proof can be found in Appendix B) we prove that in this case f i γ depends on exactly two such variables. Then, from definition of linear dependence and from our fixation of B-index variables, we get that i th equation of the first subsystem turns out to be identity. Lemma 4.3. Let the set of B-index variables that are essential for fγ i be nonempty, and let fγ i be linear in any B-index variable. Then fγ i is linear in exactly two B-index variables. Case 2. fγ i depends essentially on no B-index variable. In this case we have a trivial equality, since variables with equal A-indices are equal, i.e. y j = z j, j N + l a j. j= Case 3. f i γ depends essentially and nonlinearly on some B-index variable y _ j i. Lemma 4.4 (the proof is in Appendix C) states that in this case fγ i is nonlinear in exactly one essential B-index variable. In other words, if any other B-index variable is essential for fγ i, then the latter is linear essential variable of fγ i. Lemma 4.4. If fγ i depends essentially and nonlinearly on some B-index variable, then there is exactly one such (nonlinear, essential, B-index) variable. Therefore i th equation of the system could be written in the form φ(y j i, y j i 2,..., y _ j i, 0, y_ j i +,..., y j i n l ) = φ(y j i, y j i 2,..., y_ j i,, y_ j i +,..., y j i n l ) ζ i, where φ is the function constructed from f by setting all linear essential variables to zero; (y j i, y j i 2,..., y _, j i y_,..., y j i + jn l i ) are yet unfixed variables and ζ i is a constant. The 5
variable y _ is essential and nonlinear for φ, thus there exists at least one setting of variables j i (y j i, y j i 2,..., y _, j i y_,..., y j i + jn l i ), which turns i th equation of the first subsystem of system (4.) to identity. The Theorem is proven if one shows that no indices jm i appear in any other equation whose function satisfies conditions of the Case 3. In other words, each index of a nonlinear essential variable of fγ i appears in at most one equation with function fγ i depending essentially and nonlinearly on a B-index variable. This fact is proven in Lemma 4.5 (the proof is in Appendix D). Lemma 4.5. There is no index of a nonlinear essential variable of fγ j (for any j) that occurs in at least two equations with functions fγ i satisfying conditions of the Case 3. Also, each variable, that is present in equations corresponding to Case 3, is present only in one such equation. Equations which correspond to Case and Case 2 turn into trivial equalities, and each equation corresponding to Case 3 is solvable. So, we can conclude that the whole system is solvable, and that fact directly implies statement of the Theorem. Remark 4.6. Proof of Theorem 4.2 is much easier in the case of f without linear essential variables. In this case one has m k =, k = 0,..., n ; the sequences y, z are of length 2N + and differ in one bit only. 5 Conclusion and Open Question Theorem 4.2 implies the negative answer to the question of existence (in Golić model) of keystream generators without undesirable properties mentioned in introduction. But our proof is based on a register whose size exponentially grows with the number of taps. Thus, though theoretically the question with Golić conjecture is now closed, there remains the following open question: whether it is possible to prove a similar statement without forcing sequence γ increase exponentially (e. g. in the model where the size of a register is bounded by some polynomial). References [] R.J.Anderson. Searching for the Optimum Correlation Attack. B. Preneel (ed.) Fast Software Encryption. LNCS, vol. 008, pp. 37-43. Springer, Heidelberg (995). [2] M. Dichtl. On nonlinear filter generators. E. Biham (ed.) FSE 997. LNCS, vol. 267, pp. 03-06. Springer, Heidelberg (997). [3] Dj.Golić, J. On the Security of Nonlinear Filter Generators. D. Gollmann (ed.) Proceedings of Fast Software Encryption 996. LNCS, vol. 039, pp. 73-88. Springer, Heidelberg (996). [4] A.Gouget, H.Sibert. Revisiting Correlation-Immunity in Filter Generators. C.Adams, A.Miri and M.Wiener(Eds.) Proceedings of SAC 2007, LNCS, vol. 4876, pp. 378-395. Springer, Heidelberg (2007). 6
[5] D. A. Huffman. Canonical Forms for Information-Lossless Finite-State Logical Machines. IRE Trans. Circuit Theory, 959, v. 5, spec. suppl., p. 4-59. [6] O.A. Logachev, On perfectly balanced Boolean functions. Cryptology eprint Archive, Report 2007/022, http://eprint.iacr.org/. [7] O.A. Logachev, A.A. Salnikov, S.V. Smyshlyaev, V.V. Yashchenko. Perfectly Balanced Functions in Symbolic Dynamics. Cryptology eprint Archive, Report 2009/296, http://eprint.iacr.org/. [8] O.A. Logachev, A.A. Salnikov, and V.V. Yashchenko, Boolean Functions in Coding Theory and Cryptology, MCCME, Moscow, 2004 (in Russian). [9] O.A. Logachev, S.V. Smyshlyaev, V.V. Yashchenko. New methods of investigation of perfectly balanced Boolean functions. Discrete Mathematics and Applications. Volume 9, Issue 3, Pages 237 262, ISSN (Online) 569-3929, ISSN (Print) 0924-9265, DOI: 0.55/DMA.2009.04, /July/2009. [0] F.P. Preparata. Convolutional Transformations of Binary Sequences: Boolean Functions and Their Resynchronizing Properties. IEEE Trans. Electron. Comput., 966, v.5, N6, pp. 898-909. [] S.N. Sumarokov, Functions of defect zero and invertability of some class of finite-memory encoders, Obozrenie prom. i prikl. mat. () (994) 33-55 (in Russian). Appendix A Proof of Theorem 3.2. Denote by γ(f, l) the maximum possible (over all (y, y 2,..., y l ) V l ) number of solutions to the system { f(x s, x s+,..., x s+n ) = y s (5.) s =, 2,..., l. It is obvious that if for some f there are no sequences x, z such that (3.)-(3.4) hold, then the output sequence (y, y 2,..., y r n+ ) = f r n+ (x) and x, x 2,..., x n ; x r n+,..., x r determine the whole input sequence x, and so, for any integer l, γ(f, l) 2 2n 2. It is easy to show that in the opposite case γ(f, l) is unbounded as a function of l with l. In the remaining part of the proof it is shown that γ(f, l) is bounded (with l ) iff f is perfectly balanced. By definition, for any f PB n and any natural l γ(f, l) = 2 n, i. e. γ(f, l) is not unbounded with l. Let f / PB n. Then there is an integer l and a tuple ỹ = (ỹ, ỹ 2,..., ỹ l ) V l, such that there exist 2 n + α solutions to (5.), where α. For the tuple ỹ V l construct the set of all possible sequences of length (k+)l+k(n ) of the following form: ỹ,..., ỹ l, y l+,..., y l+n, ỹ,..., ỹ l, y 2l+n,..., y 2l+2(n ),......, y kl+(k )(n )+,..., y kl+k(n ), ỹ,..., ỹ l, (5.2) 7
k =, 2,..., where y i F 2, i = l +, l + 2,..., l + n ; 2l + n, 2l + n 2,.... Let µ k denote the average number of inputs of f (k+)l+k(n ) that correspond to one output of the form (5.2). In this case, µ k = 2 n ( + α 2 n )k+, so µ k with k. That is, for any integer M there is an integer k = k(m) such that µ k(m) > M, i. e. preimage of one of the sequences (5.2) of length t(m) = (k(m)+)l + k(m)(n ) is of cardinality greater than M. This means that for arbitrary M there exists t(m) such that γ(f, t(m)) > M and thus γ(f, l) is unbounded as a function of l. Appendix B Proof of Lemma 4.3. Consider the set of all essential B-index variables of fγ i and let the variable in this set with the maximal B-index correspond to the (N τ k rδ k ) th variable of f γ, r m k. It is evident that in this case there is another B-index variable corresponding to (N τ k (r + )δ k ) th variable of f γ. According to conditions of Case, this variable is linear as well. Therefore r m k 2, m k 3. Next one has to prove that no other B-index variable is essential for fγ i. It suffices to show that variables of f γ with indices N τ k rδ k l b j, b j {, 0, }, j =,..., l are not essential for f γ except for two trivial cases ( l j= b j = 0 ). j= l j= b j = δ k and Let k j = k. Two cases are possible. ) j > j : b j 0 and let j be the maximal index j such that b j 0. Evidently, it suffices to consider the case of b j =. Then N τ k rδ k l b j N τ k rδ k + j l j= j= < N τ k rδ k 4m( )τ k j < N τ kj. Also, the following inequality holds. N τ k rδ k l b j = N (τ k+ (m k r)δ k ) b j N (τ k+ 2δ k ) l b j N (τ k+ 2δ k ) j j= j= > N (τ k+ 2δ k ) j= j= δ k j. If k j = k, then k j = k and one can estimate the last expression as follows: N (τ k+ 2δ k ) δ k j = N τ k+ +δ k δ k δ k+ > N τ k+ δ k+ = N τ kj. Else N (τ k+ 2δ k ) δ δ kj k j > N τ k+ δ k j N τ k+ δ k j N τ kj δ k j > N ( + ) δ 4m k j = N τ k j τ k j m kj ( + ) δ 4m k j > N τ k j m kj ( ++ ) δ 4m k j N τ k j m kj ( ++ ) δ 2 23 k j N τ k j ( + + ) δ 2 2 23 k j > N τ kj. This implies that all the variables with indices j > j, b j = occur in the interval (N τ kj, N τ kj ) and thus could not be essential for f γ. 8
2) j > j b j = 0 ; j < j : b j 0 (if there are multiple such j, we choose the largest one). If b j =, then N τ k rδ k l b j > N τ k rδ k δ k > N τ k (r + 2)δ k ; j= j= N τ k rδ k l b j < N τ k rδ k, N τ k rδ k l b j N τ k (r + )δ k. If b j = 0, then N τ k rδ k l j= j= j= b j > N τ k rδ k δ k > N τ k (r + )δ k ; N τ k rδ k l b j < N τ k (r )δ k, N τ k rδ k l b j N τ k rδ k. Thus, in this case variables are not essential too. Appendix C Proof of Lemma 4.4. By contradiction, let for some fγ i two B-index variables correspond to (N τ k ) th and (N τ p ) th variables of f γ, p > k. Then j= τ p τ k = l j= b j, b j {, 0, }. (5.3) ) Let the set K = {j k j p, b j = } be nonempty and let j be the maximum element of this set. Then l b j j > δ k j > ( ) > 4m( )τ p > τ p τ k. j= 2) Let K be empty, i.e. b j 0, k j p. Then l b j j= j= j p j= < δ k jp, where k jp p. δ k jp = τ kjp + τ kjp m kjp 8 τ kjp + τ kjp 8 τ p τ p < τ 7 2 7 2 p τ p τ p τ k. Hence, (5.3) is impossible and this concludes the proof of the Lemma. Appendix D Proof of Lemma 4.5. We have to prove that equality τ a τ b + l j= a j = τ c τ d + l j= a j (5.4) does not hold if conditions are not satisfied. τ a = τ c τ b = τ d a j = a j, j =,..., l (5.5) 9
First, we prove that equality l j= a j = l j= a j holds only if a j = a j, j =,..., l. Let a j a j, a j =, a j = 0 and let j be the largest index such that a j a j. Then l j= a j l j= j= a j j a j = + j j= j j= a j j j= a j > δ k j > 0. Consider indices a, b, c, d, e+, e = k j, where j is the largest index such that a j a j. One can transform (5.4) as follows: τ a τ b = τ c τ d + j b j, b j = a j a j, j =,..., j. Let q = max{a, b, c, d, e + }. We have (up to equivalence) five possibilities. ) q = a, q > b, q > c, q > d, q > e +. Then τ a = τ q > (4m 2 + )τ q 5τ q τ b + (τ c τ d ) + 3τ q τ b + τ c τ d + 3δ q 2 > τ b + τ c τ d + δ q 2 > τ b + τ c τ d + j τ b + τ c τ d + j b j, hence equality (5.4) does not hold. j= 2) q = e +, a e, b e, c e, d e. Let b j = (the case of b j = is treated along the same lines). δ e > 4mτ e > 2τ e + 2δ e > (τ a τ b + τ d τ c ) + δ e + δ e > τ a τ b + τ d τ c + j τ a τ b + τ d τ c + j b j, thus equality (5.4) does not hold. j= 3) q = a = c. Then (5.4) can be transformed into τ d = τ b + j b j. If b = d, then (5.4) turns into j j= j= j= j= b j = 0, which holds only if b j = 0, j =,..., j. If d > b (or d < b, that can be treated similarly), we denote q = max{b, d, e + } and consider three subcases. d = q > e+. Then τ d > (4m 2 +)τ q > τ b +4m 2 τ q > τ b +4m 2 δ e > τ b + δ e > τ b + j τ b + j b j. j= j= q = e + > d. Then τ d +τ b +2δ q 2 j j= j j= b j > 4mτ q j = τ d +τ b +2δ e j j= j= j= τ d + τ b + 2τ q j j= > > τ d +τ b + δ k j j > τ d +τ b. j= q = e + = d. Then τ d > 3τ 4 d + m 2 τ q 3τ e+ + τ 2m j b > δ e + τ b > τ b + j τ b + j b kj. j= j= 0
In fact, other subcases are possible but each of them is equivalent to one of the above. 4) q = a = d, b < q, c < q. Then e + q and hence τ a + τ d > (4m 2 + )τ q + τ q > τ c + τ b + τ q τ c + τ b + 2δ e > τ c + τ b + δ e > τ c + τ b + j b j, thus (5.4) does not hold in this case either. 5) q = a = e +, b < q, c < q, d < q. Then τ a = 3τa 4 + τa 4 > 3τ e+ + m 2 τ 4 q. e = k j, so m e 2, m 2. Then 3τ e+ + m 2 τ 4 q > 3τ e+ 2m e + 3τ q > 3δ 2 e + 3τ q > ( + e + τ b + (τ c τ d ) > j + τ b + (τ c τ d ) j b j + τ b + τ c τ d. This implies that (5.4) does not j= hold in this case either. j= j=