Average Case Complexity                                          February 8, 2011

Impagliazzo's Hardcore Lemma

Professor: Valentine Kabanets                                Scribe: Hoda Akbari

1  Average-Case Hard Boolean Functions w.r.t. Circuits

In this lecture, we introduce a notion of hardness in the sense of circuit size. For now, we focus on the uniform distribution.

Definition 1. A function f : {0,1}^n → {0,1} is said to be δ-hard for size s (for parameters δ, s) if for all circuits C of size |C| ≤ s, we have

    Pr_{x ∼ U_n}[C(x) ≠ f(x)] > δ.

In this sense,

    f is not δ-hard  ⟺  there is a circuit C, |C| ≤ s, with Pr_{x ∼ U_n}[C(x) ≠ f(x)] ≤ δ  ⟺  f ∈ Heur_δ-SIZE(s).

Theorem 1. For all δ < 1/2 (see footnote 1) and 0 < ε < 1/2 such that δ = 1/2 − ε, there exist δ-hard functions for circuit size s, where s ≤ (ε²/64)·2^n/n (see footnote 2).

Proof. The proof is by a probabilistic argument over a randomized construction: we prove that a random function f works. For each fixed circuit C of size s, we have

    Pr_f[ Pr_x[f(x) = C(x)] > 1/2 + ε ] ≤ exp(−2ε²·2^n)    (1)

This is because the expected fraction of agreements is 1/2, and the claim follows by a Chernoff bound argument.

Remark (Lower bound for circuit size). Suppose we have a truth table of size 2^n which is compressed into a minimal circuit C of size s. To represent C with bits, we need at most s(2 + 2 log s) ≤ 3s log s bits. Roughly, to have 3s log s ≪ 2^n, we can choose s ≈ 2^n/n.

Using the above remark and equation (1), by the union bound, the probability that there exists some circuit within the size limit computing f is upper-bounded by:

    (number of circuits) · exp(−2ε²·2^n) ≤ 2^{3s log s} · exp(−2ε²·2^n) < 1.

Suppose now that we choose a hardness measure δ = 1/2 − ε, and make ε sufficiently small. To have a δ-hard function here means having a function that is very close to random, and essentially unpredictable by any circuit within the allowed size, on almost all inputs.

Footnote 1: It never happens that δ > 1/2, because either 1 or 0 constitutes at least half of the function's outputs, and therefore we can achieve δ ≤ 1/2 by having a circuit that outputs the more frequent bit, regardless of the input.
Footnote 2: The inequality s ≤ 2^n/n (up to a 1 + o(1) factor) always holds, since according to the Shannon-Lupanov bound [Shannon and Lupanov 60], any function f : {0,1}^n → {0,1} can be computed by some circuit of size (2^n/n)(1 + o(1)). Thus, for the 2^n/n bound to be interesting, we must add a factor so that s ≪ 2^n/n.
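The counting argument behind Theorem 1 is easy to watch numerically. The following Python sketch is our own illustration (the parity predictor and the parameter choices are ours, not from the notes): it draws random truth tables and estimates how often a fixed predictor agrees with a random f on more than a 1/2 + ε fraction of inputs, comparing against the Chernoff bound.

```python
import math
import random

# Sketch (our own, not from the notes) of the concentration behind Theorem 1:
# for a random truth table f, a FIXED predictor agrees with f on about half
# of the 2^n inputs.  We estimate Pr_f[agreement > 1/2 + eps] empirically and
# compare it with the Chernoff bound exp(-2 * eps^2 * 2^n).

def agreement_fraction(n, predictor, rng):
    """Draw a random truth table on n-bit inputs; return its agreement rate with predictor."""
    N = 2 ** n
    f = [rng.randint(0, 1) for _ in range(N)]
    return sum(f[x] == predictor(x) for x in range(N)) / N

def estimate_tail(n, eps, trials=2000, seed=0):
    """Empirical Pr_f[agreement > 1/2 + eps] against a fixed parity 'circuit'."""
    rng = random.Random(seed)
    parity = lambda x: bin(x).count("1") % 2  # stand-in for one fixed small circuit
    hits = sum(agreement_fraction(n, parity, rng) > 0.5 + eps for _ in range(trials))
    return hits / trials

n, eps = 10, 0.1
print(estimate_tail(n, eps), "<=", math.exp(-2 * eps ** 2 * 2 ** n))
```

Even modest n makes the Chernoff tail astronomically small, which is what leaves room for the union bound over all 2^{O(s log s)} circuits.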
In general, we can construct a δ-hard function f in the following way. Choose a subset H of density 2δ from the universe (|H| = 2δ·2^n). Let f be such that f|_H is random and f|_H̄ is 0, where H̄ is the complement of H. With this construction, it can be easily verified that f is (2δ·(1/2) = δ)-hard.

Intuitively, f is δ-hard because there exists a hardcore set H of inputs, of size |H| ≈ δ·2^n, such that f is essentially unpredictable on H by any small circuit. Surprisingly, this intuition is actually correct! The following lemma formalizes this fact:

Lemma 1 (Hardcore Lemma [Impagliazzo 95 [1], Holenstein 05 [2]]). Let f : {0,1}^n → {0,1} be δ-hard for size s, where s ≤ δ³·2^n/n. Then for any 0 < ε < 1, there exists a set H ⊆ {0,1}^n with size |H| ≥ δ·2^n such that H is an ε-hardcore for circuits of size s′ = s·poly(ε, δ). This means that for some vanishing ε, we cannot do better than randomly guessing the output; thus ε is a measure of hardcoreness and determines how unpredictable our function is. The value of s′ is s′ = s·ε²/(3n), and the lemma says that for any circuit C with size |C| ≤ s′,

    Pr_{x ∈ H}[C(x) = f(x)] ≤ 1/2 + ε.

Proof. Let us first argue the easier result of an ε-hardcore set of size (δ/2)·2^n. The proof is by contradiction. Suppose there is no such ε-hardcore set; that is,

    for every S ⊆ {0,1}^n with |S| ≥ (δ/2)·2^n, there is a circuit C, |C| ≤ s′, s.t. Pr_{x ∈ S}[C(x) = f(x)] > 1/2 + ε.

Consider a two-player zero-sum game with players S (set player) and C (circuit player), where the payoff matrix can be arranged in the form of a table whose row headers are the different choices of sets of size at least δ·2^n and whose column headers are all possible circuits of size at most s′. We define the payoff matrix as:

    Payoff_{S,C} = Pr_{x ∈ S}[C(x) = f(x)]    (the amount S gives to C)

Note that 0 ≤ Payoff_{S,C} ≤ 1. Let v be the value of the game:

    v = min_𝒮 max_𝒞 E_{S∼𝒮, C∼𝒞}[Payoff_{S,C}] = max_𝒞 min_𝒮 E_{S∼𝒮, C∼𝒞}[Payoff_{S,C}]

That is, we consider mixed strategies for the two players (a mixed strategy is a probability distribution over pure strategies: rows for 𝒮, and columns for 𝒞).
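The construction above can be made concrete with a small experiment. The sketch below is our own (the tiny family of predictors is an arbitrary stand-in for "small circuits"): it plants random values on a set H of density 2δ, zeroes out the rest, and checks that simple fixed predictors err at a rate of roughly δ.

```python
import random

# Sketch (ours, not from the notes): pick a hardcore set H of density 2*delta,
# make f random on H and 0 elsewhere.  A fixed predictor is right only about
# half the time on H, so its overall error is roughly 2*delta * 1/2 = delta.

def build_hard_function(n, delta, seed=1):
    """Return the truth table f (as a list) and the planted set H."""
    rng = random.Random(seed)
    N = 2 ** n
    H = set(rng.sample(range(N), int(2 * delta * N)))  # |H| = 2*delta*2^n
    f = [rng.randint(0, 1) if x in H else 0 for x in range(N)]
    return f, H

def error_rate(f, predictor):
    N = len(f)
    return sum(f[x] != predictor(x) for x in range(N)) / N

n, delta = 12, 0.1
f, H = build_hard_function(n, delta)
# A few fixed "circuits": the constants and parity.  None can exploit the
# random bits planted on H.
predictors = [lambda x: 0, lambda x: 1, lambda x: bin(x).count("1") % 2]
best_error = min(error_rate(f, p) for p in predictors)
print(best_error)  # roughly delta = 0.1
```

Of course, three predictors are not "all circuits of size s"; the full claim needs the counting argument of Theorem 1 restricted to H.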
Then we look at the min-max or max-min of the expected payoff when the players use their respective mixed strategies (with the set player trying to minimize, and the circuit player trying to maximize, the payoff). To continue the proof, we distinguish between the following two cases:

    (1) v < 1/2 + ε        (2) v ≥ 1/2 + ε

Case (1). By the min-max theorem, there exists a mixed strategy 𝒮 for the set player such that, for all circuits C within the size limit,

    E_{S∼𝒮}[Payoff_{S,C}] < 1/2 + ε,

or equivalently,

    E_{S∼𝒮}[ Pr_{x∈S}[C(x) = f(x)] ] < 1/2 + ε.    (2)
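The min-max theorem invoked here can be observed concretely on a toy game. This Python sketch is our own illustration (the 2×2 payoff matrix is arbitrary and vastly smaller than the sets-vs-circuits game): it brute-forces mixed strategies over a probability grid and checks that the min-max and max-min values coincide.

```python
# Toy illustration (ours) of the min-max theorem used in the proof: in a
# two-player zero-sum game, the best guarantee of the minimizing row player
# equals the best guarantee of the maximizing column player.

A = [[1.0, 0.0],  # A[i][j]: amount the row (set) player pays the column (circuit) player
     [0.0, 1.0]]

def expected_payoff(p, q):
    """Row player plays row 0 with prob p; column player plays column 0 with prob q."""
    return (p * q * A[0][0] + p * (1 - q) * A[0][1]
            + (1 - p) * q * A[1][0] + (1 - p) * (1 - q) * A[1][1])

grid = [i / 100 for i in range(101)]  # mixed strategies, step 0.01
minmax = min(max(expected_payoff(p, q) for q in grid) for p in grid)
maxmin = max(min(expected_payoff(p, q) for p in grid) for q in grid)
print(minmax, maxmin)  # both 0.5, the value of this game
```

Because the expected payoff is linear in each player's mixed strategy separately, the inner optimum is always attained at a pure strategy, which is why a grid search suffices for this toy example.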
By averaging, we can get a set S, |S| ≥ δ·2^n, s.t.

    Pr_{x∈S}[C(x) = f(x)] < 1/2 + ε.

This is for a fixed circuit C, though. We want to argue the existence of a single set S that has this property for all circuits C. To do so, we need to use some concentration bounds. Define the distribution D in the following way: sample S from 𝒮; then sample x from S uniformly at random. With this definition, Eq. (2) yields:

    Pr_{x∼D}[C(x) = f(x)] < 1/2 + ε.

The rest of the proof for case (1) is a concentration-type argument on the quality of a random sample, and is omitted here to avoid technicalities. In the end, one gets:

    there exists S, |S| ≥ (δ/2)·2^n, such that S is ε-hardcore,

contradicting the initial assumption.

Case (2). Similarly to case (1), there exists a mixed strategy 𝒞 for the circuit player such that, for every set S with |S| ≥ δ·2^n,

    E_{C∼𝒞}[Payoff_{S,C}] ≥ 1/2 + ε.

That is, for every set S of density δ, on average over x ∈ S, the majority of circuits (weighted according to 𝒞) is correct on x. Let us define the set H of those inputs x on which less than a 1/2 + ε/2 fraction of the circuits in 𝒞 are correct. That is,

    H = {x | weighted under 𝒞, less than a (1/2 + ε/2) fraction of the circuits C in 𝒞 are correct on x}.

Note:

    E_{C∼𝒞}[Payoff_{H,C}] = Pr_{x∈H, C∼𝒞}[C(x) = f(x)] < 1/2 + ε/2.

By the definition of H and the case (2) guarantee, it must hold that |H| < δ·2^n (otherwise H itself would be a row on which 𝒞 earns less than v). We will give up on the inputs from H, which constitute less than a δ fraction of all inputs.

Fix an x ∉ H. For this x, at least a 1/2 + ε/2 fraction of the circuits C ∼ 𝒞 are correct. Hence, if we sample O(n/ε²) circuits C from 𝒞, their majority is correct on x with probability at least 1 − exp(−2n). By the union bound,

    Pr[ there exists x ∉ H s.t. the majority is wrong ] < 1.

As a result, there must exist a good choice of sampled circuits whose majority is correct on all x ∉ H.

Proof for the tight bound of δ·2^n. Our approach is to use the same game with the same payoff function as in the proof of the loose bound, but now we label the rows with sets S of density 2δ (rather than δ). Again, case (1), v < 1/2 + ε, cannot happen, as this would imply the existence of a hardcore set of density δ, which we assumed does not exist.
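The majority-vote step for x ∉ H is ordinary Chernoff amplification, which can be simulated directly. In the sketch below (our own; each "sampled circuit" is modeled as an independent coin that is correct with probability 1/2 + ε/2), the majority's error rate drops rapidly as the number of samples grows.

```python
import random

# Sketch (ours) of the sampling step in case (2): if each sampled circuit is
# correct on a fixed input x independently with probability 1/2 + eps/2, then
# the majority of k = O(n / eps^2) samples errs with probability exponentially
# small in k, which is what the union bound over all x outside H needs.

def majority_error_rate(k, p_correct, trials=5000, seed=2):
    """Estimate Pr[majority of k independent votes is wrong] (k odd)."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        correct_votes = sum(rng.random() < p_correct for _ in range(k))
        wrong += correct_votes <= k // 2  # strict minority of correct votes
    return wrong / trials

eps = 0.2
for k in (5, 25, 125):  # odd sample sizes, growing
    print(k, majority_error_rate(k, 0.5 + eps / 2))
```

In the proof, the per-input failure probability exp(−2n) beats the union bound over the at most 2^n inputs outside H, so some fixed sample of circuits works for all of them simultaneously.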
We want to compute f on at least a 1 − δ fraction of all inputs. As before, we can get the majority circuit correct on at least a 1 − 2δ fraction of all inputs.
We want to recover half of the inputs in H; i.e., we want our algorithm to also be correct on about half of the inputs in H (in addition to being correct outside of H). This would be easy if we knew what H is, via the following randomized circuit:

    On input x: if x ∉ H, output the majority vote of the sampled circuits on x; else output a random bit (or the more frequent bit).

But this is not possible, since deciding membership in H requires knowing f(x), while f is hard. We will therefore define a set H in a special way (different from the one we used for the loose bound above). Our approach is to sample a few circuits and estimate the expectation by the sample average, based on the law of large numbers:

    E_{C∼𝒞}[Payoff_{S,C}] ≈ Avg[·]    (over a few circuits C sampled from 𝒞)

Definition 2. Define α_corr(x) and α_1(x) as:

    α_corr(x) = 2·Pr_{C in samples}[C(x) = f(x)] − 1
    α_1(x)    = 2·Pr_{C in samples}[C(x) = 1] − 1

Note that α_1(x) is computable, since our collection of sampled circuits is polynomially small. We have:

    Pr_{C in samples}[C(x) = f(x)] = 1/2 + α_corr(x)/2
    Pr_{C in samples}[C(x) = 1]   = 1/2 + α_1(x)/2

Observe that for all x, −1 ≤ α_corr(x) ≤ 1. We also define the set H, of size δ·2^n, of all those x's with the smallest values of α_corr(x). Let φ = max_{x∈H} α_corr(x). We have

    α_corr(x_1) ≤ α_corr(x_2) ≤ … ≤ α_corr(x_{δ·2^n}) = φ,

where all these x_i's are the members of H.

Claim. φ > 0. (Roughly, because the average success probability of 𝒞 on H is greater than 1/2, so E_{x∈H}[α_corr(x)] > 2·(1/2) − 1 = 0.)

For x ∉ H, α_corr(x) ≥ φ, so at least a 1/2 + α_corr(x)/2 ≥ 1/2 + φ/2 fraction of the circuits in the samples collection are correct. Define a randomized circuit C*(x) as follows:

    C*(x) = 0                                       if α_1(x) ≤ −φ,
    C*(x) = 1 with probability 1/2 + α_1(x)/(2φ)    if −φ < α_1(x) < φ,
    C*(x) = 1                                       if α_1(x) ≥ φ.

Note that for every x ∉ H, C*(x) = f(x) with probability 1. Also, it can be proved that

    Pr[C*(x) = f(x)] = 1/2 + α_corr(x)/(2φ),  truncated at 0 and 1.

Thus, for each x ∈ H:

    Pr[C*(x) = f(x)] ≥ max(0, 1/2 + α_corr(x)/(2φ)),
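The claimed success probability of C* can be checked mechanically. The sketch below is our own code (the function names are ours, while α_1, α_corr, and φ follow the notes): it implements the piecewise definition of C* and verifies the truncated-probability identity over a range of values, using the fact that α_corr = α_1 when f(x) = 1 and α_corr = −α_1 when f(x) = 0, which follows from the definitions.

```python
# Check (ours) that C* outputs 1 with probability 1/2 + alpha_1/(2*phi)
# clamped to [0, 1], and hence agrees with f(x) with probability
# 1/2 + alpha_corr/(2*phi) truncated at 0 and 1.

def clamp(v, lo=0.0, hi=1.0):
    return max(lo, min(hi, v))

def prob_C_star_outputs_1(alpha_1, phi):
    """Piecewise definition of the randomized circuit C* (requires phi > 0)."""
    if alpha_1 <= -phi:
        return 0.0
    if alpha_1 >= phi:
        return 1.0
    return 0.5 + alpha_1 / (2 * phi)

def prob_correct(alpha_1, f_x, phi):
    """Probability that C*(x) = f(x), given f(x) and alpha_1(x)."""
    p1 = prob_C_star_outputs_1(alpha_1, phi)
    return p1 if f_x == 1 else 1.0 - p1

phi = 0.3
for f_x in (0, 1):
    for alpha_1 in (-0.9, -0.3, -0.1, 0.0, 0.2, 0.5):
        alpha_corr = alpha_1 if f_x == 1 else -alpha_1
        claimed = clamp(0.5 + alpha_corr / (2 * phi))
        assert abs(prob_correct(alpha_1, f_x, phi) - claimed) < 1e-12
print("truncated-probability identity verified")
```

Note in particular that when α_corr(x) ≥ φ (i.e., x ∉ H), the clamped value is exactly 1, matching the claim that C* is always correct outside H.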
and therefore

    E_{x∈H}[ Pr[C*(x) = f(x)] ] ≥ 1/2.

By fixing the randomness of C* so as to at least preserve this expectation, we get a deterministic circuit that is correct on more than a 1 − δ fraction of inputs, contradicting the assumption that f is δ-hard.

In the next lecture, we will see how to use these results for hardness amplification.

References

[1] R. Impagliazzo. Hard-core distributions for somewhat hard problems. In Proc. 36th IEEE Symposium on Foundations of Computer Science, pp. 538-545, 1995.

[2] T. Holenstein. Key agreement from weak bit agreement. In Proc. 37th ACM Symposium on Theory of Computing, pp. 664-673, 2005.