
On the Randomness Complexity of Efficient Sampling

Research Thesis

Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Science

Bella Dubrov

Submitted to the Senate of the Technion, Israel Institute of Technology

Adar 5766, Haifa, March 2006

The research thesis was done under the supervision of Dr. Yuval Ishai in the Department of Computer Science.

I wish to express my deep gratitude to my advisor, Yuval Ishai, for his wise guidance, constant encouragement and many inspiring discussions. I would also like to thank Eyal Kushilevitz and Ronen Shaltiel for useful comments and suggestions regarding this thesis. The generous financial help of the Technion is gratefully acknowledged.

Dedicated to my mother, grandfather and Vadim

Contents

Abstract
1 Introduction
 1.1 Our Contribution
2 Preliminaries
 2.1 Boolean Circuits
 2.2 Cryptography and Pseudorandom Generators
 2.3 The Nisan-Wigderson Pseudorandom Generator
 2.4 Function Compression
3 Pseudorandom Generators Fooling Non-Boolean Distinguishers
 3.1 Cryptographic nb-PRGs
 3.2 Nisan-Wigderson Style nb-PRGs
  3.2.1 nb-PRGs for Constant Depth Circuits
4 Compression Lower Bounds for Parity
 4.1 Exact Compression of Parity
 4.2 Average-Case Compression of Parity
  4.2.1 Proof of Part 1 of Theorem 3.3
  4.2.2 Proof of Part 2 of Theorem 3.3
5 A Win-Win Result
6 Applications
 6.1 Probabilistic Functions and Random Sampling
  6.1.1 Relation with the Promise-P vs. Promise-BPP Question
  6.1.2 Matching the Entropy Bound
 6.2 Cryptographic Applications
7 Conclusions and Open Problems
References
Hebrew Abstract

List of Tables

3.1 Summary of nb-PRG parameter settings
6.1 Summary of randomness parameters for samplers

Abstract

We consider the following question: Can every efficiently samplable distribution be efficiently sampled, up to a small statistical distance, using roughly as much randomness as the length of its output? Towards a study of this question we generalize the current theory of pseudorandomness and consider pseudorandom generators that fool non-boolean distinguishers (nb-PRGs). We show a link between nb-PRGs and a notion of function compression, introduced by Harnik and Naor [18]. (A compression algorithm for f should efficiently compress an input x in a way that preserves the information needed to compute f(x).) By constructing nb-PRGs, we answer the above question affirmatively under the following types of assumptions:

- Cryptographic incompressibility assumptions (that are implied by, and seem weaker than, exponential cryptographic assumptions).
- Nisan-Wigderson style (average-case) incompressibility assumptions for polynomial-time computable functions.

No assumptions are needed for answering our question affirmatively in the case of constant-depth samplers. To complement the above, we extend an idea from [18] and establish the following win-win situation: if the answer to our main question is no, then it is possible to construct a (weak variant of a) collision-resistant hash function from any one-way permutation. The latter would be considered a surprising result, as a black-box construction of this type was ruled out by Simon [38]. Finally, we present an application of nb-PRGs to information-theoretic cryptography. Specifically, under any of the above assumptions, efficient protocols for information-theoretic secure multiparty computation never need to use (much) more randomness than communication. An extended abstract of our results will appear in [9].


Notation and Abbreviations

U_n : the uniform distribution on {0,1}^n
U_S : the uniform distribution on the set S
x ←_R X : the choice of x according to the distribution X
H(X) : the Shannon entropy of X
SD(X, Y) : the statistical distance between X and Y
PPTM : probabilistic polynomial-time Turing machine
⟨u, v⟩ : the inner product of the vectors u and v modulo 2
∘ : concatenation
f : n → m(n) : the function f outputs a string of length m(n) on inputs of length n
f^k(x) : f(f(...f(x)...)) (k times)
Size(s(n)) : the class of circuits of size s(n)
Size(s(n)) ∩ Depth(d(n)) : the class of circuits of size s(n) and depth d(n)
s-DNF : a DNF with terms of length s
s-CNF : a CNF with clauses of length s


Chapter 1

Introduction

In their 1976 paper, Knuth and Yao [30] consider the following problem. Suppose we wish to sample from some distribution R on {0,1}^m. How many random bits are needed in order to do this? Knuth and Yao showed that there exists a (possibly inefficient, or even non-terminating) algorithm that samples R using H(R) + O(1) random bits on average, where H(·) denotes Shannon entropy.
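To make this concrete, here is a minimal Python sketch (ours, not from the thesis) of the walk on the discrete-distribution-generating (DDG) tree that underlies the Knuth-Yao result; the function name and the depth cap are illustrative choices, and probabilities are read bit-by-bit from their binary expansions.

```python
import random

def knuth_yao_sample(probs, max_depth=60, rng=random):
    """Sample i with probability probs[i], consuming one random bit per
    level of the DDG tree; expected consumption is at most H(probs) + 2."""
    u = 0                                    # position among this level's nodes
    for t in range(1, max_depth + 1):
        u = 2 * u + rng.getrandbits(1)       # descend one level
        for i, p in enumerate(probs):
            u -= int(p * (1 << t)) & 1       # t-th bit of p marks a leaf for i
            if u < 0:
                return i
    return max(range(len(probs)), key=probs.__getitem__)  # depth-cap fallback

# P(0) = 1/4, P(1) = 3/4: uses 1.5 bits on average, close to H(R) of about 0.81.
counts = [0, 0]
for _ in range(100000):
    counts[knuth_yao_sample([0.25, 0.75])] += 1
print(counts)  # roughly [25000, 75000]
```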

We investigate the computational analog of this question: we deal with the case where R is samplable by some efficient algorithm D. Our goal is to reduce the amount of randomness that D uses as much as possible, when we are willing to settle for sampling from a distribution which is statistically close to R. (Computational closeness can be achieved using standard pseudorandom generators; while this suffices for most real-life applications of sampling, there are contexts in which one actually needs the strict notion of statistical closeness considered here. Such an application in cryptography is discussed in Section 1.1.) The question we ask is the following: Can every efficiently samplable distribution be efficiently sampled, up to a small statistical distance, using roughly as much randomness as the length of its output? (Note that in case the Shannon entropy is efficiently computable, or is given as advice, one could hope to match the entropy bound. We will deal with this more refined question later.)

Building randomness-efficient (or completely derandomized) algorithms for various purposes has been the subject of extensive research, e.g. [34, 35, 22, 23, 31, 29, 7, 40, 26, 21, 39, 27]. Considerable effort has been devoted to derandomizing algorithms that compute deterministic boolean functions, with P = BPP being a major open problem in the area. A more general problem is the reduction of the amount of randomness used by algorithms that compute general (not necessarily boolean) probabilistic functions. (We note that in the case of probabilistic functions with logarithmic output length this problem is equivalent to the Promise-P vs. Promise-BPP question; see Section 6.1.1 for more details.) A probabilistic function f is a random process that on input x outputs a sample from some distribution f(x). Therefore sampling algorithms comprise a special case of probabilistic functions (where the input is over a unary alphabet or, equivalently, the output distribution depends only on the length of the input). In the following, whenever we refer to sampling algorithms, the discussion can be generalized to arbitrary probabilistic functions.

Pseudorandomness. A major tool in the area of derandomization is the notion of a pseudorandom generator. A pseudorandom generator (PRG) is an efficient deterministic algorithm that stretches a short random string (called the seed) into a long pseudorandom string, i.e., a string that looks random to computationally bounded distinguishers. In the classical setting the distinguisher gets some input string and outputs a single bit. The distinguisher is said to be fooled by the PRG if the probability that it outputs 1 on a random string is close to the probability that it outputs 1 on a pseudorandom string. Except for very limited distinguisher classes (e.g. constant-depth circuits), constructions of pseudorandom generators rely on some hardness assumption. The first approach, initiated by Blum-Micali and Yao, is to use cryptographic assumptions, such as the existence of one-way permutations (OWPs) [5, 42] and one-way functions [16]. The resulting generators have polynomial stretch (i.e., the seed length and the output length are polynomially related), are computable in polynomial time and, provided the assumption holds, fool any polynomial-time distinguisher. Relaxing the requirements, Nisan and Wigderson [35] constructed pseudorandom generators relying on weaker assumptions, namely (non-uniform) average-case hardness assumptions, which in turn can be reduced to corresponding worst-case assumptions [1, 23, 40]. The resulting generators fool distinguishers of fixed circuit size and are allowed to be computable in time exponential in their seed length. This allows reducing the seed length to O(log n), where n is the output length. It should be noted that these PRGs require more time to compute than the size of the circuits they try to fool.

1.1 Our Contribution

In order to deal with reducing the amount of randomness required by probabilistic functions and samplers, we present a natural generalization of the classical notion of PRGs to pseudorandom generators that fool non-boolean distinguishers (nb-PRGs). In our setting the distinguisher gets an input string of length n and outputs some string of length m(n). We say that the nb-PRG fools the distinguisher if the distribution of the distinguisher's output on a random string is statistically close to the distribution of its output on a pseudorandom string. The question of constructing pseudorandom generators that fool non-boolean distinguishers was posed in [24].

Typical parameters. The typical parameters of nb-PRGs that are useful for our applications are the following. First note that every nb-PRG fooling distinguishers that output m bits must have seed length at least m, since otherwise the distinguisher can just output the first m bits of its input. Suppose that we want to reduce the amount of randomness used by a sampler that uses n random bits and outputs m(n) = n^γ bits, for some constant 0 < γ < 1. The sampler runs in poly(n) time and we want the new sampler (with the reduced randomness) to also be efficient. We would therefore like to have an nb-PRG with the shortest seed possible (i.e., of length close to m) and with polynomial stretch, which is computable in poly(n) time. In the sequel we call this parameter setting the polynomial setting. Note that this is different from the case of standard PRGs for derandomizing BPP,

where the PRG can be computable in exponential time in its seed length. The reason for this is that in the standard setting one needs to iterate over all possible seeds anyway in order to achieve full derandomization. The seed length in the standard setting would ideally be O(log n). In our setting, however, we cannot hope for full derandomization, since the processes that we deal with are inherently probabilistic. The seed length in our setting would typically be polynomially related to the output length of the generator. It is obvious that in the general case, similarly to standard PRGs, the construction of nb-PRGs should rely on hardness assumptions. But what type of assumptions can be useful for this task? Observe that a standard PRG fooling circuits of size s + 2^m is also an nb-PRG fooling circuits of size s that output m bits. (This is so since a distinguisher that outputs m bits can be converted into a boolean distinguisher at the expense of a 2^m increase in its size.) It follows that good nb-PRGs can be constructed based on exponential-strength cryptographic assumptions, and in particular from the existence of a OWP with exponential hardness (since such OWPs imply the existence of PRGs with exponential hardness). In contrast, in the Nisan-Wigderson setting, a PRG fooling circuits of size 2^m will be computable in time greater than 2^m. Thus, under such assumptions we do not get efficient nb-PRGs for the polynomial parameter setting. We would like to construct nb-PRGs based on weaker assumptions than exponential-strength OWPs. Our results in this direction are outlined below.

Function compression. We show that the notion of nb-PRGs is closely connected with the notion of function compression introduced by Harnik and Naor [18]. (This notion is not to be confused with other notions of language compression that appear in the literature, e.g. [11].) Consider the following setting. A bounded player, called the compressor, wishes to compute some function f on an input x of length n. Because of its limited computing power, the compressor cannot compute f(x) directly. Instead, the compressor has a connection with a computationally unbounded player, called the solver, that is willing to compute f(x). Moreover, the compressor is only allowed to communicate m(n) < n bits to the solver. Therefore the compressor needs to somehow compress x to m(n) bits in a way that will preserve the information needed to compute f(x). We say that f can be compressed to m(n) bits by some algorithm C if on inputs of size n the algorithm C outputs m(n) bits and there exists an unbounded solver S such that S(C(x)) = f(x) for all x. This (worst-case) notion of compression can be naturally generalized to average-case compression, where we measure the fraction of inputs x on which S(C(x)) = f(x).

Constructing nb-PRGs. We show that the assumption that function compression is hard has some useful consequences in our context. More specifically, we demonstrate that nb-PRGs can be constructed based on (average-case) function compression hardness assumptions. (Note that, conversely, the existence of an nb-PRG fooling distinguishers that output m bits implies the existence of a function in NP that is hard to compress to m bits: the function that equals 1 exactly on the set of images of the nb-PRG.) In particular, the cryptographic and the Nisan-Wigderson constructions both give rise to nb-PRGs if, instead of standard hardness assumptions, compression hardness assumptions are used. We construct nb-PRGs for the polynomial parameter setting based on the following assumptions. In the cryptographic setting we construct nb-PRGs based on

polynomial-strength cryptographic incompressibility assumptions, namely that there exists a (polynomial-strength) OWP with a hard-core bit that is hard to compress. We note that, using an exact complexity analysis of the hard-core bit construction of Goldreich and Levin [13], this assumption is implied by an exponentially strong OWP. We can also base our nb-PRGs on Nisan-Wigderson style incompressibility assumptions, namely on the existence of a function in P that is hard to compress on average to, say, n/2 bits by circuits of some fixed polynomial size. For instance, assuming the existence of a function in P that cannot be compressed to n/2 bits with advantage 1/n^8 by circuits of size O(n^8), we obtain nb-PRGs that fool (up to an error ε = 1/n) distinguishers of linear size that output m = n^{1/4} bits, where the seed length is l = O(m^2). Note that, for the reasons discussed above, we need to base our constructions on hard functions in P (instead of hard functions in E, which are used for constructing standard PRGs). In the classical setting, the average-case assumptions can be relaxed to worst-case assumptions by using worst-case to average-case hardness amplification techniques [1, 23, 40]. Since no hardness amplification techniques for functions in P are known, we cannot relax our assumptions to worst-case. Similarly, PRG constructions based on a polynomial encoding of the hard function, such as [37, 41], cannot be used in our setting. These constructions require the computation of a low-degree extension of the hard boolean function, that is, a low-degree polynomial over a finite field that agrees with the boolean function on its domain. The problem is that the low-degree extension of a function in P is generally #P-hard to compute, so the resulting generator will not be efficient.

A win-win result. We also reach some interesting conclusions in case our hardness assumptions do not hold. Harnik and Naor [18] suggest applications of function compression to cryptography. In particular, they show that if SAT can be efficiently compressed (with certain parameters) then a collision-resistant hash function (CRHF) can be built from any one-way function. Our results can be viewed as complementary to theirs: they show consequences of the existence of good compression algorithms, whereas we show consequences of the non-existence of such algorithms. However, there is a considerable gap between the easiness results exploited by [18] and the hardness results we require. First, [18] requires good compression in the worst case, whereas we rely on incompressibility in the average case. More importantly, even if SAT is incompressible on average, this does not imply a positive answer to our main question, since the resulting nb-PRGs will typically be computationally inefficient. Thus, to establish a tighter win-win situation we need to rely on a different construction of CRHFs whose failure would imply the type of negative compression results that is useful for our purpose. Using this approach, we show the following result. Suppose OWPs exist. (For this result we only require the existence of standard, polynomial-strength, OWPs, as opposed to the exponential-strength OWPs that imply a positive answer to our question.) Then either the answer to our main question is yes, or there exists a distributional collision-resistant hash function (d-CRHF). Distributional collision-resistant hash functions are a weaker variant of CRHFs that we define: instead of requiring that finding an arbitrary collision is hard, we require that finding a random collision is hard. A different way of interpreting the above result is that if the answer to our main question is no, then OWP implies d-CRHF. The latter implication would be considered a surprising result, since Simon [38] shows an oracle relative to which OWPs exist but CRHFs do not, hence ruling out a black-box construction of CRHFs from OWPs. We note that relative to the same oracle used in [38], d-CRHFs also do not exist, thus no black-box construction of d-CRHFs from OWPs is possible.

Unconditional results for constant-depth samplers. We prove unconditional lower bounds on the size of constant depth circuits that compress the parity function. As in the standard setting, this gives rise to unconditional nb-PRGs that fool constant depth circuits. Our lower bounds generalize the lower bounds for computing parity obtained by Håstad [15]. Using the results of [15] directly, the compression bounds that can be obtained are only for m = o(n^{1/(d+2)}), where d is the depth of the circuit. We extend these results to obtain nearly tight bounds that apply to every value of m of the form m = n^δ, for any constant 0 < δ < 1. Thus we get (unconditionally) efficient nb-PRGs for constant depth samplers. Our compression lower bounds show that in the specific context of computing parity by constant depth circuits, the relaxation from standard computation to compression does not give much more computing power.

Application to cryptography. As was already mentioned, the main application that motivated our notion of nb-PRGs is the problem of reducing the amount of randomness used by sampling algorithms. Our generators also have a cryptographic application. Specifically, they can be used to reduce the amount of randomness in protocols for information-theoretic secure multiparty computation. In this problem a set of players wish to compute a function of their inputs, while maintaining the (information-theoretic) privacy of their data against a computationally unbounded adversary. There has been a significant body of work on characterizing the amount of randomness used by such protocols, both for specific and general tasks, e.g. [6, 32, 31, 7, 10]. We show that using nb-PRGs (and under the corresponding assumptions) efficient protocols for information-theoretic secure multiparty computation never need to use (much) more randomness than communication. (In fact, it suffices to use roughly as much randomness as the amount of communication viewed by an adversarial coalition.)

Matching the entropy bound. Finally, we consider the more refined question of reducing the amount of randomness to match the entropy of the sampled distribution. Recall that Knuth and Yao [30] showed that this can be done non-explicitly. We show an explicit version of this result. Specifically, under exponential-strength cryptographic assumptions, or when the probability function of the sampled distribution can be computed in polynomial time, the amount of randomness needed to efficiently sample R can be reduced to O(H(R)).

Related work. Several known PRGs for space-bounded machines (such as [34] and [36]) are in fact also nb-PRGs in our sense. (When constructing the PRG, the output of the distinguisher should be considered part of the space that it uses.) This is so since, when the distinguisher runs with a pseudorandom input produced by such a PRG, its final state distribution is statistically close to its final state distribution on a random input. This fact was noted, for example, by Nisan and Zuckerman [36], who also point out that their PRG can be used to reduce the amount of randomness in several types of probabilistic algorithms, such as walks on Markov chains. The seed length of such a PRG will be close to the space

the distinguisher uses. Therefore, using these PRGs one can only reduce the randomness to the output length for efficient sampling algorithms that do not use more space than their output length. In contrast, we consider the general case of efficient sampling algorithms.

Organization. The remainder of the thesis is organized as follows. We start with some preliminaries in Chapter 2. In Chapter 3 we introduce the notion of nb-PRGs and show that under compression hardness assumptions both the cryptographic construction and the Nisan-Wigderson construction give rise to nb-PRGs. In Chapter 4 we prove lower bounds on the size of constant depth circuits that compress parity, exactly and on average. In Chapter 5 we establish the win-win result, namely we show that if OWPs exist, then either the answer to our question is yes or d-CRHFs exist. Chapter 6 presents some applications of nb-PRGs: the application to random sampling is described in Section 6.1, Section 6.1.1 explains the connection with the Promise-P vs. Promise-BPP question, and Section 6.1.2 discusses the possibility of matching the entropy bound. The application to secure computation protocols is described in Section 6.2. Finally, Chapter 7 discusses conclusions and open problems.

Chapter 2

Preliminaries

We refer the reader to the Notation and Abbreviations section above. We will also need the following definitions and basic facts.

Definition 2.1 Let X be a random variable on {0,1}^m. The Shannon entropy of X is defined as

  H(X) = Σ_{x ∈ {0,1}^m} Pr[X = x] · log(1/Pr[X = x]).

Definition 2.2 For random variables X, Y on {0,1}^m, the statistical distance of X and Y is defined as

  SD(X, Y) = (1/2) · Σ_{z ∈ {0,1}^m} |Pr[X = z] − Pr[Y = z]|.

Fact 2.1 For any random variables X, Y on {0,1}^m,

  SD(X, Y) = max_{T ⊆ {0,1}^m} |Pr[X ∈ T] − Pr[Y ∈ T]|.

Fact 2.2 For any random variables X, Y, Z, if SD(X, Y) ≤ α and SD(Y, Z) ≤ β, then SD(X, Z) ≤ α + β.

Definition 2.3 Let m, n be integers such that m ≤ n. Let h = {h_s}, h_s : {0,1}^n → {0,1}^m, be a family of functions. We say that h is a pairwise independent hash function family if for every x_1, x_2 ∈ {0,1}^n such that x_1 ≠ x_2 and every y_1, y_2 ∈ {0,1}^m,

  Pr_s[h_s(x_1) = y_1 ∧ h_s(x_2) = y_2] = 1/2^{2m}.

We note that polynomial-time computable pairwise independent hash function families can be easily constructed for every n and m ≤ n.
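Both quantities are simple to compute from an explicit probability vector; the following Python sketch (the helper names are ours) mirrors Definitions 2.1 and 2.2 and checks Fact 2.2 on a toy triple of distributions.

```python
from math import log2

def shannon_entropy(p):
    """H(X) = sum over x of Pr[X = x] * log(1 / Pr[X = x])  (Definition 2.1)."""
    return sum(px * log2(1.0 / px) for px in p if px > 0)

def statistical_distance(p, q):
    """SD(X, Y) = (1/2) * sum over z of |Pr[X = z] - Pr[Y = z]|  (Definition 2.2)."""
    return 0.5 * sum(abs(px - qx) for px, qx in zip(p, q))

p, q, r = [0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]
print(shannon_entropy(p))   # 1.0 bit
# Fact 2.2 (the triangle inequality) on this triple:
assert statistical_distance(p, r) <= statistical_distance(p, q) + statistical_distance(q, r) + 1e-12
```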

2.1 Boolean Circuits

A boolean circuit consists of NOT, AND and OR gates. Unless stated otherwise, we discuss circuits of unbounded fan-in (i.e., each gate can have any number of inputs). For any boolean circuit we define its size as the number of gates in the circuit excluding NOT gates, and its depth as the length (in gates, excluding NOT gates) of the longest path from input to output. We call the input level of the circuit the bottom of the circuit. A circuit family C = {C_n}_{n ∈ N} is a collection of circuits such that for every n the circuit C_n has n boolean inputs. For functions s : N → N, d : N → N we denote by Size(s(n)) the class of (non-uniform) circuit families C = {C_n} such that for every n the circuit C_n is of size at most s(n), and we similarly define Size(s(n)) ∩ Depth(d(n)). A literal is a boolean variable or its negation. A term is a conjunction (AND) of literals and a clause is a disjunction (OR) of literals. A DNF is an OR of terms; a CNF is an AND of clauses. An s-DNF is a DNF where all the terms are of length at most s. An s-CNF is a CNF where all the clauses are of length at most s.

2.2 Cryptography and Pseudorandom Generators

The following are basic cryptographic definitions and results; see [12] for more details. In [12] the adversary is usually a PPTM. We use generalized versions of the standard definitions, in which the adversary lies in some arbitrary complexity class K.

Definition 2.4 Let K be a complexity class. A length-preserving permutation f : {0,1}* → {0,1}* is called a K-one-way permutation (K-OWP for short) if it is computable in polynomial time and for every algorithm A ∈ K and every polynomial p(·), for all sufficiently large n,

  Pr[A(f(x)) = x] < 1/p(n),

where the probability is over x chosen according to U_n and the coin tosses of A.

Definition 2.5 Let K be a complexity class and let f : {0,1}* → {0,1}* be a function. A predicate b : {0,1}* → {0,1} is called a K-hard-core bit of f if it is computable in polynomial time and for every algorithm A ∈ K and every polynomial p(·), for all sufficiently large n,

  Pr[A(f(x)) = b(x)] < 1/2 + 1/p(n),

where the probability is over x chosen according to U_n and the coin tosses of A.

Definition 2.6 Let K be a complexity class and let ε : N → R be a function such that 0 ≤ ε(n) < 1 for all n. Let l : N → N be a function such that l(n) < n for all n. A function G : l(n) → n is called an (l, ε, K)-pseudorandom generator ((l, ε, K)-PRG for short) if for every distinguisher A ∈ K, for all sufficiently large n,

  |Pr[A(G(U_l)) = 1] − Pr[A(U_n) = 1]| < ε(n).

G is called an (l, K)-cryptographic pseudorandom generator ((l, K)-crypto-PRG for short) if G is an (l, 1/p(n), K)-PRG for every polynomial p(·).

The proofs of the following theorems are implicit in the works of Goldreich and Levin [13] and of Blum, Micali and Yao [5, 42]; see also [12], chapters 2 and 3. (The theorems also hold in the uniform setting; we need the non-uniform versions for our purposes.)

Theorem 2.1 (An exact version of Theorem 2.5.2 from [12]) Let 0 < γ ≤ 1. Let f be a Size(2^{c_1 n^γ})-OWP, for some constant c_1 > 0. Define g(x, r) = (f(x), r) and b(x, r) = ⟨x, r⟩. Then b is a Size(2^{c_2 n^γ})-hard-core bit of g, for some constant c_2 > 0.

Theorem 2.2 (An exact version of Proposition 3.4.3 from [12]) Let 0 < γ ≤ δ < 1 be constants. For every constant c_1 > 0, if there exists a Size(2^{c_1 n^{γ/δ}})-OWP, then there exists a poly(n)-time computable (n^δ, Size(2^{c_2 n^γ}))-crypto-PRG, for some constant c_2 > 0.

2.3 The Nisan-Wigderson Pseudorandom Generator

We now present basic definitions and results from the work of Nisan and Wigderson [35].

Definition 2.7 A collection of sets {S_1, ..., S_n}, S_i ⊆ [l], is called an (s, k)-design if |S_i| = k for all i and |S_i ∩ S_j| ≤ s for all i ≠ j. An n × l 0-1 matrix is called an (s, k)-design if the collection of its n rows, interpreted as subsets of [l], is an (s, k)-design.

Lemma 2.1 For all integers n and k such that log n ≤ k ≤ n, there exists an n × l boolean matrix which is a (log n, k)-design, where l = O(k^2). The matrix can be computed in time polynomial in n. If k = O(log n), there exists such a matrix with l = O(log n) that is computable in time polynomial in n.

Definition 2.8 Let l : N → N be a 1-1 function. Let A = {A_n} be a collection of 0-1 matrices such that A_n is an n × l(n) matrix. For x ∈ {0,1}^{l(n)} the matrix A_n defines n subsets of the bits of x. Let f be a boolean function. We denote by f^A the transformation such that for every n and every x of length l(n), the output of f^A(x) is the concatenation of n applications of the function f to the subsets of the bits of x defined by A_n.

The following theorem from [35] shows how to convert a hard boolean function into a pseudorandom generator.

Theorem 2.3 Let k(n) < l(n) < n and let f be a boolean function. Suppose that for all sufficiently large n, the function f on inputs of length k(n) cannot be computed by circuits of size n^2 on a 1/2 + 1/n^2 fraction of the inputs. For every n let A_n be a boolean n × l(n) matrix which is a (log n, k(n))-design, and let A = {A_n}. Then G given by G = f^A is an (l, 1/n, Size(n))-PRG.
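The designs of Lemma 2.1 are classically obtained from low-degree polynomials over a finite field. The sketch below is our illustration of that construction (assuming k is prime): it produces k^d sets of size k inside a ground set of size l = k^2 whose pairwise intersections have size at most d-1.

```python
from itertools import product

def poly_design(k, d):
    """For each of the k**d polynomials q of degree < d over Z_k (k prime),
    output the set {(a, q(a)) : a in Z_k}, encoded inside range(k*k).
    Distinct polynomials agree on at most d-1 points, so any two sets
    intersect in at most d-1 elements."""
    sets = []
    for coeffs in product(range(k), repeat=d):   # q(x) = sum_j coeffs[j] * x**j
        pts = frozenset(a * k + sum(c * pow(a, j, k) for j, c in enumerate(coeffs)) % k
                        for a in range(k))
        sets.append(pts)
    return sets

D = poly_design(5, 2)    # n = 25 sets of size 5 in a ground set of size l = 25
assert all(len(S) == 5 for S in D)
assert all(len(A & B) <= 1 for i, A in enumerate(D) for B in D[:i])
```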

2.4 Function Compression

The general notion of function compression was very recently introduced by Harnik and Naor [18], who mostly consider it in the context of NP relations. We use the following variant of this notion.

Definition 2.9 Let m : N → N be a function such that m(n) < n for all n. Let C, S be complexity classes. We say that a boolean function f : {0,1}* → {0,1} has compression complexity (m, C, S) if there exist two algorithms, a compressor C : n → m(n) and a solver S : {0,1}* → {0,1}, such that C ∈ C, S ∈ S and S(C(x)) = f(x) for all x.

We will generally be interested in the case where the solver is unbounded. In this case we say that f can be m-compressed by algorithms in C. We will sometimes be interested in average-case function compression, that is, compression which is correct only for some fraction of the inputs.

Definition 2.10 We say that a compressing algorithm C compresses f on average with probability p if there exists a solver S such that for every n we have Pr[S(C(x)) = f(x)] ≥ p, where the probability is over the uniform choice of x and the coin tosses of C (if C is probabilistic). We define the advantage of C as p − 1/2.

We note that in the case of average-case compression by circuits, we can assume w.l.o.g. that the compressing circuit is deterministic. We use the following notation.

Notation 2.1 Let C : {0,1}^n → {0,1}^m be a compressing circuit for a boolean function f : {0,1}^n → {0,1} and let S : {0,1}^m → {0,1} be a solver. We define the following:

  Υ^f_S(C) = the number of inputs x ∈ {0,1}^n for which S(C(x)) = f(x),
  Υ^f(C) = max_S Υ^f_S(C),
  α^f(C) = Υ^f(C) / 2^n,
  ∆^f(C) = α^f(C) − 1/2.

For y ∈ {0,1}^{m(n)} we define:

  zero^f_C(y) = {x ∈ {0,1}^n : f(x) = 0 and C(x) = y},
  one^f_C(y) = {x ∈ {0,1}^n : f(x) = 1 and C(x) = y}.

In order to get some intuition about function compression, we now present a couple of basic observations.

Observation 2.1 Most boolean functions are not compressible to n − 1 bits by circuits of size 2^{o(n)}.

Proof: By a counting argument.

Note that if f can be compressed to m bits by circuits of size s, then it can be computed by circuits of size s + 2^m. Thus we get the following.

Observation 2.2 If a boolean function is not computable by circuits of size s + 2^m, then it cannot be compressed to m bits by circuits of size s.

Chapter 3

Pseudorandom Generators Fooling Non-Boolean Distinguishers

We are now ready to present our notion of pseudorandom generators fooling non-boolean distinguishers.

Definition 3.1 Let K be a complexity class, and let 0 ≤ ε(n) < 1 and m(n) ≤ l(n) < n for all n. A function G : l(n) → n is called a pseudorandom generator fooling non-boolean distinguishers with parameters (l(n), m(n), ε(n), K) ((l, m, ε, K)-nb-PRG for short) if for every D ∈ K such that D : n → m(n), for all sufficiently large n we have

  SD(D(G(U_l)), D(U_n)) < ε(n).

(If D is probabilistic, the probability space also includes its coin tosses.)

G is called an (l, m)-crypto-nb-PRG if for every polynomial p(·) the function G is an (l, m, 1/p(n), K)-nb-PRG, where K is the class of PPTMs.

Note that for m = 1 we get the classical PRG definition. Also note that the requirement m(n) ≤ l(n) is needed, since otherwise the distinguisher could output the first m bits of its input and achieve advantage ε ≥ 1/2.

The following parameters can be obtained non-constructively.

Proposition 3.1 For every n, m, s and ε there exists a function G : {0,1}^l → {0,1}^n which is an (l, m, ε, Size(s))-nb-PRG, for l = O(log s + m + log(1/ε)).

As was already mentioned, in our applications we would typically like to have nb-PRGs with polynomial stretch that can be computed in polynomial time. For a summary of the various parameter settings that we obtain, see Table 3.1.
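At toy sizes, Definition 3.1 can be checked exactly by enumerating all seeds and all inputs. The Python sketch below (G and D are hypothetical stand-ins of our own) computes SD(D(G(U_l)), D(U_n)) directly; the second distinguisher shows how a multi-bit output can expose structure that each single output bit hides.

```python
from collections import Counter
from itertools import product

def exact_sd_of_outputs(G, D, l, n):
    """Exactly compute SD(D(G(U_l)), D(U_n)) by enumeration (toy sizes only)."""
    def dist(outputs):
        c = Counter(outputs)
        total = sum(c.values())
        return {y: cnt / total for y, cnt in c.items()}
    p = dist(D(G(x)) for x in product((0, 1), repeat=l))
    q = dist(D(x) for x in product((0, 1), repeat=n))
    return 0.5 * sum(abs(p.get(y, 0) - q.get(y, 0)) for y in set(p) | set(q))

G = lambda x: x + x                 # toy "generator": repeat the 4-bit seed
D1 = lambda z: z[:2]                # first two bits stay uniform under G
D2 = lambda z: (z[0], z[4])         # exploits z[4] == z[0] under G
print(exact_sd_of_outputs(G, D1, l=4, n=8))   # 0.0
print(exact_sd_of_outputs(G, D2, l=4, n=8))   # 0.5
```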

We note that, somewhat counterintuitively, nb-PRGs fooling distinguishers with m = n^{0.1} can be turned into nb-PRGs fooling distinguishers with m = n^{0.9}. (This might seem counterintuitive, since a larger m means an easier task for the distinguisher.) Indeed, to get an nb-PRG G' against distinguishers with m = n^{0.9}, take the nb-PRG G against distinguishers with m = n^{0.1} and output only its first n^{0.1/0.9} bits. (This nb-PRG has output length n' = n^{0.1/0.9} and fools distinguishers that output n^{0.1} = (n')^{0.9} bits.) However, it is important to pay attention to the running time and the seed length of the resulting nb-PRG. If, for example, the seed length of G is l = m^2 = n^{0.2}, we get that the seed length of G' is greater than its output length, so G' is not a valid nb-PRG. For the same reason it is not possible to build an efficient nb-PRG G' against distinguishers with m = n^{0.1} from an nb-PRG G against distinguishers with m = log n, where G takes poly(n) time to compute: G' would not be computable in polynomial time in its output length.

Similarly to Observation 2.2, a sufficiently strong classical PRG is also a good nb-PRG.

Observation 3.1 If G is an (l, ε, Size(s + 2^m))-PRG, then G is an (l, m, ε, Size(s))-nb-PRG.

Using this observation we conclude that strong cryptographic assumptions (e.g. exponential OWPs, as in Theorem 2.2) imply the existence of efficient nb-PRGs for the polynomial parameter setting. (A similar result can be obtained using exponential-strength one-way functions; however, in this case there is a polynomial blowup in the seed length. For instance, the construction of [16] gives l = O(m^8); this overhead can be reduced using the constructions of [17, 19].) For NW-style PRGs this observation only gives us nb-PRGs with m = O(log n).

We now construct nb-PRGs based on weaker assumptions, namely compression hardness assumptions. First, we note that both the cryptographic and the Nisan-Wigderson constructions have black-box proofs, i.e., the proof exhibits an oracle circuit that contradicts the hardness assumption given an oracle that breaks the PRG. We also observe that in these proofs the oracle is used in a simple way; for example, in both proofs the circuit calls the oracle only once. We show that proofs of this form can be translated to the setting of function compression. Specifically, we show that if an oracle circuit A^B computes some function f then, given an oracle B' that compresses the function computed by B, it is possible to build an oracle circuit A'^{B'} that compresses f. The circuit A'^{B'} works roughly as follows. It performs the computation of A^B until it encounters an oracle gate. Then, instead of a B-gate it has a B'-gate. At this point it outputs the output of the B'-gate, concatenated with all the additional information that is needed in order to continue the computation. The solver will later continue the computation from this stage. The connection with nb-PRGs follows by observing that a distinguisher for an nb-PRG actually compresses some function that breaks the PRG in the standard sense.

Construction 3.1 Let A^B be an oracle circuit. Given an oracle gate B' (which can have many output bits) we construct the circuit A'^{B'} as follows. We start with the circuit A and go over all its oracle gates. For each oracle gate we delete all the paths from it to the output. (By deleting a path we mean deleting all the gates and edges on the path. This leaves some edges hanging, not entering any gate: these are outputs of the new circuit.) We replace the oracle gate by a B'-gate. (The outputs of B' are also outputs of the new circuit.) Finally, we delete all the constant outputs of A' (these are edges of A with constant values that turned into outputs of A').

Table 3.1: Summary of nb-PRG parameter settings. All the nb-PRGs have n output bits and are computable in poly(n) time; all the distinguishers have advantage at most ε, where 0 < ε < 1 can be an arbitrarily small constant. Each entry lists the compression length m, the distinguisher class, the seed length l, the assumption, and the reference.

  m = 1; Size(n); l = O(log n); standard NW assumption: a function in E with circuit size 2^{Ω(n)}. [23, 40]

  m = log n; Size(n); l = O(log n); standard NW assumption: a function in E with circuit size 2^{Ω(n)}. [23, 40] + Obs. 3.1

  m = (log n)^c; Size(n); l = O((log n)^{2c}); compression NW assumption: a function in DTIME(2^{O(n^{1/c})}) which is not n/2-compressible by Size(O(2^{2n^{1/c}})) on a 1/2 + 1/2^{2n^{1/c}} fraction of the inputs. Thm. 3.2

  m = n^γ (0 < γ ≤ δ < 1); Size(2^{n^γ}); l = n^δ; strong crypto assumption: a Size(2^{Ω(n^{γ/δ})})-OWP. Thm. 2.2 + Obs. 3.1

  m = n^γ (0 < γ < 1); PPTM; l = O(n^γ); compression crypto assumption: an OWP with an n/2-incompressible hard-core bit. Thm. 3.1

  m = n^γ (γ < 1/2); Size(n); l = O(n^{2γ}); compression NW assumption: a function in P which is not n/2-compressible by Size(O(n^{2/γ})) on a 1/2 + 1/n^{2/γ} fraction of the inputs. Thm. 3.2

  m = n^γ (0 < γ < 1/2, δ > 0); Size(n) ∩ Depth(d); l = O(n^{2γ+δ}); no assumption. Thm. 3.4

Note that if A calls B adaptively, i.e., there is a path from one oracle gate to the output that goes through another oracle gate, then the second oracle gate will be deleted when we process the first one. This does not damage the compression, since the solver will continue the original computation from the first oracle gate. Observe also that if {A^B} is a P-uniform circuit family, then so is {A'^{B'}}.

Lemma 3.1 Let A^B be a probabilistic oracle circuit that computes some function f with probability p on a uniformly chosen input and random coins. Let B' be an oracle that compresses the function that B computes. Let A'^{B'} be the circuit constructed according to Construction 3.1. Let n denote the input length of A^B (and of A'^{B'}) and denote by m(n) the output length of A'^{B'}. If m(n) < n, then A'^{B'} compresses f with probability p.

Proof: Since B' compresses the function that B computes, there exists a solver S such that S(B'(x)) = B(x) for all x. It is possible to construct a solver T such that T(A'^{B'}(·)) computes f with probability p. The solver T completes the circuit A'^{B'} to get a circuit T(A'^{B'}(·)) that is equivalent to A^B: it is the circuit A, except that in place of each B-gate it has S(B'(·)).

Note that each oracle gate of the original circuit (that is not deleted) adds at least one output bit to the new circuit. Therefore this approach does not work when there are too many oracle calls. This is the case, for instance, in proofs of Yao's XOR Lemma [14, 20, 23], which do not efficiently translate to the compression setting via the above paradigm. Another problematic case is when the circuit performs complex computations that involve both the oracle answers and the input.

3.1 Cryptographic nb-PRGs

We now weaken the assumptions and show that cryptographic nb-PRGs exist under polynomial-strength cryptographic incompressibility assumptions. Specifically, we prove that cryptographic nb-PRGs exist provided there are OWPs with incompressible hard-core bits.

Definition 3.2 Let K be a complexity class. Let b be a polynomial-time computable predicate, let f be a length-preserving permutation, and let m(n) < n for all n. Define g(y) = b(f^{-1}(y)). We say that b is an (m, K)-incompressible hard-core bit of f if for every D ∈ K such that D : n → m(n), for every polynomial p(n), for all sufficiently large n, D compresses g on average with probability less than 1/2 + 1/p(n).

Observe that for m = 1 we get the standard hard-core bit definition. In the sequel, when K is omitted we take it to be probabilistic polynomial time. (Similar results can be obtained for the non-uniform setting, i.e., where K is the class of polynomial-size circuits.) We note that even optimal compression in the sense of Harnik and Naor [18] does not rule out the existence of incompressible hard-core bits. Harnik and Naor mainly consider the compression of NP languages up to the witness length. The NP language {y : b(f^{-1}(y)) = 1} has natural witnesses of length n and we require incompressibility to

length m < n. So even if optimal compression in the sense of [18] is possible, there may still exist incompressible hard-core bits in the sense required here.

We now show that the BMY construction produces an nb-PRG if, instead of a standard hard-core bit, we use an incompressible hard-core bit. The proof of correctness of the BMY construction is a black-box proof: given a distinguisher B for the generator as an oracle, we build an algorithm A that breaks the hard-core bit. Since the algorithm A is relatively simple (it calls the oracle only once and then performs a simple computation), we can use Construction 3.1 to translate the proof to our setting.

Theorem 3.1 Let m : N → N be a function such that m(n) < n for all n. If there exists a OWP with an m-incompressible hard-core bit, then for any function l : N → N such that l(n) < n and n = l(n)^{O(1)}, there exists an (l, m(l(n)) − 1)-crypto-nb-PRG G : l(n) → n that is computable in poly(n) time.

Proof: We use the standard BMY construction [5, 42]. Namely, let f : {0,1}* → {0,1}* be a one-way permutation with an m-incompressible hard-core bit b : {0,1}* → {0,1}. We define the nb-PRG G as

  G(x) = b(x) ∘ b(f(x)) ∘ ... ∘ b(f^{n-1}(x)).

Our goal is to show that G is an (l, m(l) − 1)-crypto-nb-PRG. Suppose, for the sake of contradiction, that there exists a PPTM D : n → m(l) − 1 such that SD(D(G(U_l)), D(U_n)) ≥ 1/poly(n). This means that there exists some T ⊆ {0,1}^{m(l)−1} such that Pr[D(G(U_l)) ∈ T] − Pr[D(U_n) ∈ T] ≥ 1/poly(n). Define the function S by S(y) = 1 iff y ∈ T, and define the function B by B(x) = S(D(x)). It holds that D compresses B. We also know that B is a distinguisher for G in the standard sense. The standard proof shows that there exists a PPTM A that, given B as an oracle, computes b(f^{-1}(·)) with probability 1/2 + 1/poly(n). The machine A works as follows. On input y it chooses a random bit a, a random j with 0 ≤ j ≤ n, and random bits r_1, ..., r_{j−1}. It then computes out = B(r_1 ... r_{j−1} a b(y) b(f(y)) ... b(f^{n−j−1}(y))). If out = 1 it outputs a, and otherwise it outputs the complement ā of a. Viewing A as an oracle circuit and using Construction 3.1 and Lemma 3.1, we conclude that the following machine C compresses b(f^{-1}(·)) with probability 1/2 + 1/poly(n). On input y the machine C chooses a random bit a, a random j with 0 ≤ j ≤ n, and random bits r_1, ..., r_{j−1}. It then outputs a ∘ D(r_1 ... r_{j−1} a b(y) b(f(y)) ... b(f^{n−j−1}(y))).

Using the above theorem we get, for example, that if there exists a OWP with an n/2-incompressible hard-core bit, then there are nb-PRGs for the polynomial parameter setting with l = O(m). We note that n = poly(l(n)), even for small values of m (for example, using the theorem we cannot get seed length close to m when m = log n). Thus the incompressible hard-core bit assumption is not strong enough to derandomize BPP.
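For concreteness, here is the shape of the BMY stretch used in the proof of Theorem 3.1 as a Python sketch; the toy f (an affine permutation of Z_{2^16}) and b are stand-ins of ours with no one-wayness whatsoever, so this illustrates only the iteration structure, not a secure generator.

```python
def bmy_stretch(f, b, x, n):
    """The BMY construction: output b(x), b(f(x)), ..., b(f^{n-1}(x))."""
    out = []
    for _ in range(n):
        out.append(b(x))
        x = f(x)
    return out

L = 16
f = lambda x: (5 * x + 3) % (1 << L)   # a permutation of Z_{2^L}, NOT one-way
b = lambda x: bin(x).count("1") & 1    # toy stand-in for a hard-core bit
print(bmy_stretch(f, b, x=0xBEEF, n=32))
```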

3.2 Nisan-Wigderson Style nb-PRGs

As was previously noted, standard NW-style assumptions yield nb-PRGs whose computation requires time exponential in the seed length, and which are thus unsuitable for the polynomial parameter setting. If we use compression complexity assumptions, we can get nb-PRGs for the full parameter range. This is shown by the following theorem, which is a generalization of a theorem of Nisan and Wigderson [35].

Theorem 3.2 Let k = k(n), l = l(n) and m = m(k) be such that m < k < l < n. Let f be a boolean function such that for all sufficiently large n, f on inputs of length k cannot be compressed on average to m bits with probability 1/2 + 1/n^2 by circuits of size O(n^2). For every n let A_n be a boolean n × l(n) matrix which is a (log n, k(n))-design, and let A = {A_n}. Then G given by G = f^A is an (l, m(k), 1/n, Size(n))-nb-PRG.

Proof: We again observe that the standard proof is a black-box proof that uses the oracle in a simple way. Suppose, for the sake of contradiction, that there exists some distinguisher D ∈ Size(n), D : n → m(k), such that SD(D(G(U_l)), D(U_n)) ≥ 1/n. Let T ⊆ {0,1}^m be a set that realizes the statistical distance, meaning Pr[D(G(U_l)) ∈ T] − Pr[D(U_n) ∈ T] ≥ 1/n. As before, define the function S by S(y) = 1 iff y ∈ T, and define the function B by B(x) = S(D(x)). It holds that D compresses B, and B is a distinguisher for G in the standard sense. The proof of [35] shows how to build a circuit A of size O(n^2) that, given oracle access to B, computes f on inputs of length k with advantage 1/n^2. The circuit A works as follows on input x. It first computes the functions y_1, ..., y_{i−1} (for some 0 ≤ i ≤ n) such that every y_j depends on at most log n bits of the input (the y_j's are computed by CNFs). It then computes out = B(y_1, ..., y_{i−1}, c_i, ..., c_n), where the c_j's are constants. If out = 1 it outputs c_i, and otherwise it outputs the complement of c_i. Using Construction 3.1 and Lemma 3.1, we conclude that the following circuit C compresses f with advantage 1/n^2: it computes y_1, ..., y_{i−1} similarly to A, then runs D(y_1, ..., y_{i−1}, c_i, ..., c_n) and outputs its answer. Note that the constant c_i does not have to be in the output of C, because it can be wired into the solver circuit.

The following are typical parameters. Suppose that there exists some 0 < γ < 1/2 and a function in P which is not compressible to n/2 bits on a 1/2 + 1/n^{2/γ} fraction of the inputs by circuits of size O(n^{2/γ}). Then there is an (O(n^{2γ}), n^γ, 1/n, Size(n))-nb-PRG that is computable in poly(n) time. We note that the advantage of the distinguisher can be made negligible by using a function that is hard to compress with even negligible advantage.
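The generator G = f^A of Definition 2.8, on which Theorem 3.2 (and, with f = parity, Theorem 3.4 below) is built, is easy to express in code. In the sketch below the design is a tiny hand-picked example of ours rather than one produced by Lemma 2.1.

```python
import random

def nw_generator(f, design, seed):
    """G = f^A (Definition 2.8): output bit i is f applied to the seed
    positions listed in design[i]."""
    return [f([seed[j] for j in sorted(S)]) for S in design]

parity = lambda bits: sum(bits) & 1
design = [{0, 1, 2}, {0, 3, 4}, {1, 3, 5}, {2, 4, 5}]   # pairwise intersections of size 1
seed = [random.getrandbits(1) for _ in range(6)]
print(nw_generator(parity, design, seed))               # 4 output bits from a 6-bit seed
```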

3.2.1 nb-PRGs for Constant Depth Circuits

Similarly to the classical Nisan-Wigderson PRGs, Theorem 3.2 can be used to build nb-PRGs that fool constant depth circuits, given a function that is hard to compress by constant depth circuits. This is because the proof of the theorem increases the depth of the circuit only by a small constant. In order to build such nb-PRGs we obtain unconditional lower bounds on the size of constant depth circuits that compress the parity function. Håstad [15] showed that circuits of depth d and size 2^{n^{1/d}} can only compute parity with advantage 2^{−Ω(n^{1/d})}. Using this result we conclude that circuits of depth d and size 2^{o(n^{1/(d+2)})} that output m = o(n^{1/(d+2)}) bits can compress parity with advantage at most 2^{−Ω(n^{1/(d+2)})}. If we build nb-PRGs using this result, we get a seed length of roughly m^d. We would like to get a shorter seed that does not depend on d. Thus we prove a stronger lower bound, namely that small circuits of constant depth cannot compress parity on average even slightly, i.e., even to m = n^δ bits for any 0 < δ < 1. The following theorem summarizes the results that we obtain.

Theorem 3.3 Let 0 < δ < 1 be a constant. Let C : {0,1}^n → {0,1}^m be a circuit of depth d, where m = n^δ. Then the following holds.
1) For d = 1, the advantage of C in compressing parity is at most 2^{−Ω(n − n^δ log n)}.
2) For d ≥ 2 and C of size 2^{O(n^{(1−δ)/d})}, the advantage of C in compressing parity is at most 2^{−Ω(n^{(1−δ)/d})}.

We prove this theorem in Chapter 4. The following theorem shows the existence of unconditional nb-PRGs fooling constant depth circuits.

Theorem 3.4 Let d ∈ N and 0 < δ < 1 be constants, and let k : N → N be a function such that log n ≤ k < n and n^2 = o(2^{k(n)^{(1−δ)/(d+2)}}). There exists a function G which is an (l, m(k), 1/n, Size(n) ∩ Depth(d))-nb-PRG, for l = O(k^2) and m(k) = k^δ. G is computable in time polynomial in n.

Proof: Let A = {A_n} be a collection of boolean n × l(n) matrices which are (log n, k)-designs (such matrices exist by Lemma 2.1). We define G(x) = parity^A(x). Clearly, G is computable in space log n. The proof of Theorem 3.2 works also for constant depth circuits since, similarly to the original proof of [35], it increases the depth of the circuit from d to d + 2.

We therefore get an nb-PRG against small constant depth circuits that output m bits, with seed length O(m^{2+δ}) for any constant 0 < δ < 1.


Chapter 4

Compression Lower Bounds for Parity

We extend the parity lower bounds of Håstad [15] to function compression. We assume that the solver is unbounded and prove lower bounds on compressing circuits for exact and average-case compression of parity. We note that the following upper bounds can be obtained. For d = 1, a circuit that outputs the first n^δ − 1 bits of its input, concatenated with an AND of the remaining bits, has advantage 2^{−Ω(n − n^δ)}. For d ≥ 2, it is known that parity on n bits can be computed by circuits of depth d and size 2^{O(n^{1/(d−1)})} [15]. Thus, by dividing the input into n^δ sets of size n^{1−δ} each and computing the parity of each set, we get a circuit of size 2^{O(n^{(1−δ)/(d−1)})} that compresses parity exactly to n^δ bits.

Notation 4.1 In this chapter we omit the superscript parity and write Υ_S(C), Υ(C), α(C), ∆(C), zero_C(y) and one_C(y).

4.1 Exact Compression of Parity

In this section we prove the following theorem.

Theorem 4.1 A circuit C : {0,1}^n → {0,1}^{n^δ} of depth d that compresses parity exactly must have size 2^{Ω(n^{(1−δ)/d})}.

We later prove a stronger statement, namely that circuits of size 2^{O(n^{(1−δ)/d})} cannot compress parity even on average. However, for exact compression lower bounds a simpler technique can be used, so we present it here. We use the connection between the sensitivity of boolean functions and their constant depth circuit size introduced by Linial et al. [33].

Definition 4.1 Let f : {0,1}^n → {0,1}^m be a function and x ∈ {0,1}^n. The sensitivity of f on x, s_x(f), is the number of Hamming neighbors x' of x such that f(x) ≠ f(x'). The average sensitivity of f, s(f), is defined as s(f) = (1/2^n) Σ_x s_x(f).
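Functionally, the d ≥ 2 upper bound described at the start of this chapter is just "split the input and XOR each part"; the compressor/solver pair below (our sketch) shows why the solver can always recover parity(x) from the block parities. The circuit size and depth, which are the substance of the bounds in this chapter, are of course not modeled by Python.

```python
import random

def compressor(x, m):
    """Split x into m blocks and output each block's parity (the exact
    compression of parity to m bits sketched above)."""
    return [sum(x[i::m]) & 1 for i in range(m)]   # the blocks partition the input

def solver(y):
    """Unbounded solver: the XOR of the block parities is parity(x)."""
    return sum(y) & 1

x = [random.getrandbits(1) for _ in range(64)]
assert solver(compressor(x, m=8)) == sum(x) & 1   # S(C(x)) = parity(x) for all x
```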