Lecture 3: AC^0, the switching lemma

Topics in Complexity Theory and Pseudorandomness (Spring 2013)
Rutgers University
Swastik Kopparty
Scribes: Meng Li, Abdul Basit

1 Pseudorandom sets

We start by proving the existence of small sets that look pseudorandom to all small circuits. Getting your hands on such a set is enough to derandomize all your randomized algorithms.

Theorem 1. For any c and n, there is an m = O(n^{2c}) and x_1, …, x_m ∈ {0,1}^n such that for every circuit C with |C| ≤ n^c,

    | Pr_{i ∈ [m]}[C(x_i) = 1] − Pr_{x ∈ {0,1}^n}[C(x) = 1] | ≤ 0.1.

Proof. Pick x_1, x_2, …, x_m from {0,1}^n independently and uniformly at random. For a fixed C, consider the probability

    Pr_{x_1,…,x_m}[ | Pr_{i ∈ [m]}[C(x_i) = 1] − Pr_{x ∈ {0,1}^n}[C(x) = 1] | > 0.1 ].

Let

    Y_i = 1 if C(x_i) = 1, and Y_i = 0 if C(x_i) = 0.

Notice that

    E[Y_i] = Pr_{x ∈ {0,1}^n}[C(x) = 1]   and   (1/m) Σ_{i=1}^m Y_i = Pr_{i ∈ [m]}[C(x_i) = 1].

So by the Chernoff bound:

    Pr_{x_1,…,x_m}[ | Pr_{i ∈ [m]}[C(x_i) = 1] − Pr_{x ∈ {0,1}^n}[C(x) = 1] | > 0.1 ] ≤ e^{−(0.1)^2 m / 2}.

Now we know that for every fixed circuit C, the random collection x_1, …, x_m fools C with high probability. Taking a union bound over all C with |C| ≤ n^c (of which there are (n^c)^{O(n^c)} = e^{O(n^c log n)}):

    Pr_{x_1,…,x_m}[ ∃ C with |C| ≤ n^c s.t. | Pr_{i ∈ [m]}[C(x_i) = 1] − Pr_{x ∈ {0,1}^n}[C(x) = 1] | > 0.1 ] ≤ e^{−Ω(n^{2c})} · e^{O(n^c log n)} < 1.

Thus with positive probability (and in fact, with high probability), a random choice of x_1, …, x_m has the desired property. In particular, such a choice exists.
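On a toy scale, the sampling argument above is easy to run. The sketch below is purely illustrative (the "circuits" are a few hand-picked small functions and the parameters are not the theorem's): it draws m random points and checks that the empirical acceptance probability of each test function tracks its true acceptance probability to within 0.1.

```python
import itertools
import random

random.seed(0)
n = 12      # toy input length
m = 4000    # sample size; the theorem takes m = O(n^{2c})

# The candidate pseudorandom set: m uniformly random points of {0,1}^n.
sample = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(m)]

# A few hand-picked small "circuits" standing in for all circuits of size <= n^c.
circuits = {
    "x0 AND x1": lambda x: x[0] & x[1],
    "x2 OR x5": lambda x: x[2] | x[5],
    "NOT x3": lambda x: 1 - x[3],
    "x0 XOR x7": lambda x: x[0] ^ x[7],
}

def exact_prob(C):
    # Pr_{x in {0,1}^n}[C(x) = 1], by brute-force enumeration.
    return sum(C(x) for x in itertools.product((0, 1), repeat=n)) / 2 ** n

def empirical_prob(C):
    # Pr_{i in [m]}[C(x_i) = 1].
    return sum(C(x) for x in sample) / m

worst = max(abs(exact_prob(C) - empirical_prob(C)) for C in circuits.values())
print(f"largest deviation over the test circuits: {worst:.3f}")
assert worst <= 0.1
```

With m = 4000 samples the Chernoff bound makes a deviation of 0.1 astronomically unlikely for any fixed function, which is what the assertion observes.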

2 AC^0 circuits

AC^0 circuits are circuits with O(1) depth and unbounded fan-in AND, OR and NOT gates. This will be the first nontrivial circuit class for which we can prove exponential lower bounds. First observe: every polynomial-size AC^0 circuit has an equivalent circuit in NC^1. We will show that PARITY and MAJORITY both do not have polynomial-size AC^0 circuits. Thus AC^0 is a strict subset of NC^1.

Let us warm up by understanding AC^0 circuits when the depth is really small. Depth-0 circuits are simply literals. Depth-1 circuits are simply ANDs or ORs of literals. Circuits of depth 0 and 1 thus cannot compute some functions, even if we do not care about their size. Depth-2 circuits are either DNFs (ORs of ANDs of literals) or CNFs (ANDs of ORs of literals). Every function can be computed by a DNF and also by a CNF. The AND gates of a DNF are called terms. The OR gates of a CNF are called clauses. A t-DNF is a DNF in which all the gates at the bottom level have fan-in at most t. Note that not every function can be computed by a t-DNF if t < n.

Theorem 2. Depth-2 AC^0 circuits for PARITY have size at least Ω(2^n).

Proof. Suppose we have a DNF computing PARITY (the case of a CNF is handled similarly). Suppose some term T does not include all variables. Consider an assignment which sets T to 1, and thus sets the value of the full DNF to 1. Now if we flip the value of any variable not in T, the term is still set to 1, and thus so is the DNF. But PARITY changes its value every time we flip a variable. Hence all terms include all n variables. Now a term that includes all n variables equals 1 on exactly one input in {0,1}^n. Since the DNF has to compute PARITY, which equals 1 on 2^{n−1} inputs, there must be at least 2^{n−1} terms in the DNF.

Even showing that polynomial-size depth-3 circuits cannot compute PARITY is quite nontrivial. We will show that any AC^0 circuit of constant depth and subexponential size cannot compute PARITY.

Theorem 3. If C is an AC^0 circuit of size s and depth d computing PARITY, then s ≥ 2^{Ω(n^{1/(d−1)})}.

In order to prove the theorem, we need a powerful theorem describing the behavior of AC^0 circuits under restrictions. As in the case of formula lower bounds, we will show that there is a restriction of any AC^0 circuit which simplifies it drastically. On the other hand, any restriction of the PARITY function is a PARITY function, and this will mean that PARITY cannot be computed by a drastically simplified AC^0 circuit (unless the original AC^0 circuit was very big to begin with).

How can we hope to prove the existence of such a restriction? One idea is to induct on the circuit. Consider all the subcircuits feeding into the top gate. By induction, we can get a simplifying restriction for each of the subcircuits. But how do these restrictions relate to each other? In fact,

these restrictions may be mutually inconsistent. This dooms any such naive strategy for finding such a restriction.

The key idea is to aim higher. We will try to show that most restrictions simplify a circuit drastically. This kind of statement lends itself to induction very nicely: if for each subcircuit of a circuit, most restrictions simplify that subcircuit drastically, then by a union bound, most restrictions simplify all the subcircuits of the circuit simultaneously. The crucial step is to analyze how these simplified subcircuits interact with the top gate of the circuit. This is precisely the content of the switching lemma.

The switching lemma is concerned with using random restrictions to simplify a CNF/DNF. To generate a random restriction, we pick a random set of variables, set them to 0/1 uniformly and independently, and leave the remaining variables unset. We denote the formula f after restriction ρ by f|_ρ.

Lemma 4. Given a t-CNF f, apply a random restriction ρ to it, leaving pn variables alive (p < 1/2). Then

    Pr[f|_ρ cannot be represented as an s-DNF] ≤ (16pt)^s.

Proof. Let l = pn be the number of alive variables, and let R_k be the set of restrictions leaving k variables alive. We have

    |R_k| = (n choose k) · 2^{n−k}.

Let B be the set of bad restrictions in R_l: those for which the restricted function cannot be represented as an s-DNF. Obviously B ⊆ R_l, and the probability that f|_ρ cannot be written as an s-DNF is |B| / |R_l|. To upper bound |B|, we find an encoding of B. Specifically, we will give an injection from B into R_{l−s} × W, where W is a set of size at most (4t)^s. Therefore we will get:

    |B| / |R_l| ≤ [ (n choose l−s) · 2^{n−l+s} · (4t)^s ] / [ (n choose l) · 2^{n−l} ] ≤ ( l / (n−l) )^s · (8t)^s ≤ ( p / (1−p) )^s · (8t)^s ≤ (16tp)^s.

What kind of function is not representable as an s-DNF? Define a minterm of a function to be a minimal set of variables with the property that there is a setting of those variables which forces the function to equal 1. It is a simple fact that if every minterm of a function has size at most s, then the function can be represented as an s-DNF. So if a function is not representable as an s-DNF, then it must have a minterm of size > s.

We fix an ordering of the clauses of f, and within each clause we fix an ordering of the variables. Take a bad restriction ρ. Because ρ is bad, f|_ρ is not identically 0 or 1; thus no clause of f is fixed to 0 under ρ (some clauses are fixed to 1, and the rest are left unfixed). We now use the minterm property of functions not representable as s-DNFs: since ρ is bad, there is a restriction π of more than s of the remaining variables that fixes f|_ρ to 1, but no setting of a subset of s of the variables of π fixes f|_ρ to 1.

Since ρπ fixes f to 1, it must fix each clause of f to 1. Let C_1 be the first clause which is not fixed to 1 by ρ (but is fixed to 1 by ρπ). Suppose C_1 has d_1 variables from π, and let π_1 be the part of π restricting those variables. Let π̄_1 be the unique setting of the variables of π_1 which prevents C_1 from being fixed to 1 (this exists since C_1 is an OR of literals).
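The "simple fact" about minterms is easy to confirm by brute force on a small example. The sketch below is illustrative only (MAJ_5 and the parameter s = 3 are my choices, not from the notes): it builds the DNF whose terms are all forcing partial assignments on at most s variables, and checks that this s-DNF computes the function exactly. Since every 1-input extends some minterm, and here every minterm has size ≤ s, the DNF covers all 1-inputs and accepts no 0-input.

```python
from itertools import combinations, product

n, s = 5, 3

def f(x):
    # Example function: MAJ_5. Every minterm is "some 3 variables set to 1".
    return 1 if sum(x) >= 3 else 0

def forces_one(S, sigma):
    # True iff fixing the variables in S to the bits sigma forces f = 1
    # on every extension to a full input.
    return all(f(x) for x in product((0, 1), repeat=n)
               if all(x[i] == b for i, b in zip(S, sigma)))

# Terms of the s-DNF: every forcing partial assignment on at most s variables.
# (Each minterm contributes such a term, so this DNF covers all 1-inputs.)
terms = [(S, sigma)
         for size in range(s + 1)
         for S in combinations(range(n), size)
         for sigma in product((0, 1), repeat=size)
         if forces_one(S, sigma)]

def dnf(x):
    return 1 if any(all(x[i] == b for i, b in zip(S, sigma))
                    for S, sigma in terms) else 0

assert all(dnf(x) == f(x) for x in product((0, 1), repeat=n))
print(f"MAJ_5 is computed exactly by a {s}-DNF with {len(terms)} terms")
```

For MAJ_5 the forcing partial assignments of size ≤ 3 are exactly the 10 ways of setting three variables to 1, so the resulting 3-DNF is the familiar one.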

Now suppose we know only the restriction ρπ̄_1, but not ρ. The key observation is that this is enough to identify the clause C_1! Indeed, we can consider the clauses of f in order, and identify the first one which is not fixed to 1 by ρπ̄_1. By giving a little bit more information along with ρπ̄_1, we can also identify π_1 and thus ρ. Thus we have found a way of describing a bad ρ by giving an element of R_{l−d_1}, along with a little bit more information. This is the main trick of the proof.

We now fully describe the encoding. Find the first clause C_2 which is not fixed to 1 by ρπ_1. Let π_2 be the restriction to C_2 of π \ π_1, and let π̄_2 be the unique setting of the variables of π_2 which prevents C_2 from being fixed to 1. In general, let C_i be the first clause which is not fixed to 1 by ρπ_1π_2⋯π_{i−1}. Let π_i be the restriction to C_i of π \ (π_1π_2⋯π_{i−1}), and let π̄_i be the unique setting of the variables of π_i which prevents C_i from being fixed to 1.

Continuing, we get π_1, π_2, … and π̄_1, π̄_2, …. We stop this process once the total number of variables restricted by π_1⋯π_k (equivalently, by π̄_1⋯π̄_k) exactly equals s (at the last step, we may restrict only a subset of the variables of π in C_i). Let π̄ be the restriction π̄_1π̄_2⋯π̄_k, truncated to exactly s variables. Define ρ̄ = ρπ̄. For each i ∈ [k], define a_i ∈ {0,1}^t as follows: a_i indicates which of the t variables of clause C_i are set by π_i. Also define a string b ∈ {0,1}^s which contains the values assigned by π to each of the s variables fixed by π_1, π_2, …, π_k.

We will show that (ρ̄, (a_i)_{i=1}^k, b) completely determines ρ. Furthermore, we will show that ((a_i)_{i=1}^k, b) lies in a set of size at most (4t)^s. As indicated earlier, using ρπ̄ we can identify C_1: we find the first clause not fixed to 1 by ρπ̄. Then, using a_1 and b, we can find π_1 and π̄_1. Now consider the restriction ρ_1 obtained from ρπ̄ by replacing π̄_1 with π_1. Restriction ρ_1 lets us identify C_2: it is the first clause which is not fixed to 1 by ρ_1. Using a_2 and b, we can identify π_2 and π̄_2. Then consider ρ_2, obtained from ρ_1 by replacing π̄_2 with π_2, and proceed. In the end we know ρ_k = ρπ_1π_2⋯π_k, as well as π_1, π_2, …, π_k, and thus we know ρ. Thus we have an encoding of ρ.

Finally, observe that (a_i)_{i=1}^k comes from the set of all sequences of length at most k such that each element of the sequence is an element of {0,1}^t with at least one 1, and the total number of 1's in all the a_i equals s. It is a simple exercise to show that the number of such sequences is at most (2t)^s. Together with the 2^s possible choices of b, this gives |W| ≤ (2t)^s · 2^s = (4t)^s, as claimed.

Theorem 5. If C is a circuit of size M and depth d that computes the parity of n inputs, then M ≥ 2^{Ω(n^{1/(d−1)})}.

Proof idea: The proof will use the switching lemma extensively. Given an AC^0 circuit, a random restriction is very likely to simplify each DNF (assuming it has small width) at the bottom two layers. Specifically, it is quite likely that the restriction of each small-width DNF will be computable by a small-width CNF. So we switch all the small-width DNFs at the bottom two layers to small-width CNFs. This lets us merge two layers of AND gates, and hence shrink the depth by 1. We then repeat, overall making d−2 random restrictions. The final restricted function is computable by a small-width DNF/CNF. A good fraction of the variables are still unset, and any restriction of PARITY is either PARITY or its negation. But PARITY on m variables (or its negation) cannot be computed by a DNF/CNF of width < m.

Proof. Suppose C is a depth-d AC^0 circuit of size M which computes parity. Assume without loss of generality that the bottom layer of C consists of ∧ gates (a symmetric argument works if the bottom layer consists of ∨ gates). Then the second layer consists of ∨ gates computing DNF formulas. We first apply a random restriction α_0 which sets each variable with probability 1/2; this serves to make every gate at the bottom layer have fan-in bounded by t (with t to be chosen later).

A bottom-layer ∧ gate survives (is not fixed to 0) iff none of its literals has been set to false by α_0; for a fixed gate of width > t, this happens with probability at most (3/4)^t. Then by the union bound, the probability that some gate at height 1 of width > t survives is at most M(3/4)^t. Also, by the Chernoff bound, the probability that fewer than n/4 variables are left unset is exp(−Ω(n)).

After the restriction α_0, the gates at the second layer compute t-DNFs. We now apply a restriction α_1 with p = 1/(100t). By (the dual form of) the switching lemma, the probability that a particular restricted DNF cannot be represented by a t-CNF is at most (16pt)^t ≤ (1/6)^t. Then by the union bound, the probability that some gate fails to become a t-CNF is at most M(1/6)^t. If we now plug these CNFs into the circuit, we can collapse the bottom two layers and get a new circuit of depth d−1, with the gates at the bottom level having fan-in at most t.

We can fix such a restriction and repeat. We apply restrictions α_2, …, α_{d−2}, each with p = 1/(100t), collapsing the circuit to depth 2. After the last restriction, the number m of variables remaining is at least n / (4(100t)^{d−2}) with probability at least 1 − exp(−Ω(n)). Also, with probability at least 1 − O(dM(1/6)^t), the resulting depth-2 circuit is a t-DNF/t-CNF. Now if t < m, this is a contradiction, since a t-DNF/t-CNF cannot compute parity or its negation on more than t bits. Thus if

    O(dM(1/6)^t) + exp(−Ω(n)) < 1   and   t < n / (4(100t)^{d−2}),

then we have a contradiction. Taking t = Θ(n^{1/(d−1)}), the second condition holds, and so to invalidate the first condition we must have M ≥ 2^{Ω(n^{1/(d−1)})}, as desired.

3 Lower bounds for Majority

Theorem 6. If C is a circuit of size M and depth d that computes the majority of n inputs, then M ≥ 2^{Ω(n^{1/(d−1)})}.

Theorem 6 can be proved using the switching lemma; it is a good exercise to try to do so. We prove the following weaker version of the theorem:

Theorem 7. If C is a circuit of size M and depth d that computes the majority of n inputs, then M ≥ 2^{Ω(n^{1/d})}.

The theorem follows from the following lemma:

Lemma 8. If MAJORITY ∈ AC^0, then PARITY ∈ AC^0.

Proof. Say that MAJORITY ∈ AC^0. Then given x ∈ {0,1}^n, we can test whether |x| ≥ k or whether |x| ≤ k by feeding fixed 0/1 bits along with x into the majority circuit. It follows that we can test whether |x| = k. Let C_k be a circuit that tests whether |x| = k. Then we can realize PARITY as ∨_{k odd} C_k.

Roughly, this reduction turns a depth-d majority circuit of size M into a PARITY circuit of depth d + 1 and size O(nM), and applying Theorem 5 to that circuit gives M ≥ 2^{Ω(n^{1/d})}, proving Theorem 7.
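Lemma 8's reduction is concrete enough to simulate. The sketch below is illustrative (an ideal majority gate stands in for the assumed AC^0 majority circuit): it pads the input with fixed constant bits to get threshold tests, combines them into the circuits C_k, and checks the resulting parity circuit on every input.

```python
from itertools import product

n = 6  # toy input length

def MAJ(bits):
    # An ideal majority gate: 1 iff more than half of its inputs are 1.
    return 1 if 2 * sum(bits) > len(bits) else 0

def at_least(x, k):
    # Test |x| >= k by feeding x plus fixed constant bits into a majority
    # gate, as in the proof: appending zeros or ones shifts the threshold.
    pad = 2 * k - 1 - n
    bits = list(x) + ([0] * pad if pad >= 0 else [1] * (-pad))
    return MAJ(bits)

def C_k(x, k):
    # The circuit C_k: tests |x| = k.
    return at_least(x, k) & (1 - at_least(x, k + 1))

def parity_from_majority(x):
    # PARITY as the OR over odd k of C_k.
    return max(C_k(x, k) for k in range(1, n + 1, 2))

for x in product((0, 1), repeat=n):
    assert parity_from_majority(x) == sum(x) % 2
print(f"PARITY realized from majority gates on all {2 ** n} inputs")
```

The padding arithmetic: with a appended ones and b appended zeros, the majority gate fires iff 2|x| > n + b − a, so choosing b − a = 2k − 1 − n makes it exactly the test |x| ≥ k.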

4 Randomized AC^0

Theorem 9. For every C(x, r) ∈ AC^0 with |C| ≤ n^c and every function f : {0,1}^n → {0,1} such that

    Pr_{r ∈ {0,1}^n}[C(x, r) ≠ f(x)] ≤ ε   for all x ∈ {0,1}^n,

there exists C′(x) ∈ AC^0 such that C′(x) = f(x) for all x ∈ {0,1}^n, with |C′| ≤ n^{2c}.

In the last lecture, we showed that any randomized circuit C computing f can be converted into a deterministic one. The proof involved amplifying the success probability by running the circuit several times with independent random seeds and then taking the majority. Once we have a circuit C computing f with error probability at most 2^{−n}, there exists a random string r that gives the correct answer for every x ∈ {0,1}^n; fixing r in C gives us a deterministic circuit.

Since we proved that MAJORITY ∉ AC^0, the same argument will not work for converting randomized AC^0 circuits to deterministic ones. A beautiful observation of Ajtai and Ben-Or is that we don't really require the full power of MAJORITY for that argument to work. Instead, one can work with a weaker notion, Approximate Majority. It turns out that AC^0 circuits can compute Approximate Majority. With this new tool, we will be able to prove Theorem 9.

Definition 10 (Approximate Majority). An Approximate Majority is a boolean function f : {0,1}^n → {0,1} such that:
- f(x) = 0 for every x of Hamming weight at most n/4;
- f(x) = 1 for every x of Hamming weight at least 3n/4.

Theorem 11 (Ajtai, Ben-Or). There exists an Approximate Majority computable by polynomial-size AC^0 circuits. Thus every function computable by a polynomial-size randomized AC^0 circuit can also be computed by a polynomial-size deterministic AC^0 circuit.

We will prove this by coming up with a distribution over circuits C such that:
- if |x| ≤ n/4 then Pr_C[C(x) ≠ 0] ≪ 2^{−n};
- if |x| ≥ 3n/4 then Pr_C[C(x) ≠ 1] ≪ 2^{−n}.
If we can come up with such a distribution, then by a union bound over all 2^n inputs, there exists a circuit in AC^0 that computes Approximate Majority.
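The amplification step behind Theorem 9 only needs a vote that is correct on lopsided outcomes, which is exactly what Approximate Majority supplies: if each run errs with probability well below 1/4, the k outcome bits land on the right side of the n/4 vs. 3n/4 promise gap unless at least k/4 runs err. The quick numeric check below (with illustrative parameters of my choosing) confirms that O(n) repetitions push the per-input error below 2^{−n}, small enough for a union bound over all 2^n inputs.

```python
from math import comb

def amplified_error(eps, k):
    # Run the randomized circuit k times independently; each run errs with
    # probability at most eps. An approximate-majority vote over the k
    # outcome bits can only fail if at least k/4 of the runs err, so the
    # failure probability is at most this binomial tail.
    return sum(comb(k, i) * eps ** i * (1 - eps) ** (k - i)
               for i in range(k // 4, k + 1))

n = 40        # illustrative input length
eps = 0.1     # assumed per-run error (any constant well below 1/4 works)
k = 20 * n    # O(n) independent repetitions

err = amplified_error(eps, k)
print("per-input error after amplification:", err, "  target:", 2.0 ** -n)
assert err < 2 ** -n  # small enough to union-bound over all 2^n inputs
```

This is the Chernoff calculation from the classical derandomization argument, with the exact majority vote replaced by the weaker 1/4-vs-3/4 vote that AC^0 can afford.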