15-859(M): Randomized Algorithms
Lecturer: Shuchi Chawla
Topic: Sampling, Medians of Means method and DNF counting
Date: October 6, 2004
Scribe: Florin Oprea

8.1 Introduction

In this lecture we consider the problem of counting the number of solutions of a problem in NP. This is obviously NP-hard for problems that are NP-complete, but it is not necessarily easy even for problems in P. The class of such counting problems is denoted by #P.

Definition 8.1.1 The class #P (called "Sharp-P") is the class of all functions $f$ for which there exists an NP language $L$ with a non-deterministic Turing machine $M$ such that $f(x)$ equals the number of accepting paths of $M$ on input $x$.

Counting problems have several interesting applications: estimating the volume of a body in space, counting the number of perfect matchings of a graph, counting the number of satisfying assignments of a DNF formula, and approximating the permanent of a matrix. In this lecture we will look at the last two.

A counting problem, in its most general form, can be described as follows: given a universe $U$ and a set $S \subseteq U$, determine the size of $S$. When it is easy to determine the size of $U$, this problem translates to estimating the ratio $|S|/|U|$ in an efficient way. For functions in the class #P, we can think of the universe $U$ as the set of all solutions to the problem, or the set of all accepting paths.

Example: #SAT is a canonical problem in #P (i.e., it is #P-complete). A SAT formula is a formula in CNF form, for example $(x_1 \vee \bar{x}_2 \vee x_3) \wedge (\bar{x}_4 \vee x_1 \vee x_5) \wedge \cdots$. #SAT asks for the number of satisfying assignments of such a formula.

8.2 The DNF Counting Problem

A DNF formula is a disjunction of clauses, each of which is a conjunction of literals of the form $x$ or $\bar{x}$. An example is $(x_1 \wedge \bar{x}_2 \wedge x_3) \vee \cdots$. Our goal is to estimate the number of assignments to the variables that satisfy such a formula. Note that it is easy to determine whether a DNF formula is satisfiable: the formula is unsatisfiable if and only if each clause in it contains a contradiction (some variable together with its negation).

We will now show the surprising result that #DNF is in fact #P-complete. This indicates that counting can be much harder than the corresponding decision problem.

Theorem 8.2.1 #DNF is #P-complete.

Proof: We reduce #SAT to this problem. Consider a CNF formula $f$ of polynomial size over $n$ variables. Let $g = \neg f$. Note that $g$ can be expressed in DNF form with the same size as $f$ (think about why this is true). Now any assignment that satisfies $f$ does not satisfy $g$ and vice versa. Thus we have $2^n - \#g = \#f$, and solving $\#g$ gives an answer to $\#f$, which is a #P-complete problem.
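To make the reduction concrete, here is a minimal brute-force sketch (not from the original notes; the toy formula and helper names are illustrative assumptions) that checks the relation $\#f = 2^n - \#g$ on a small CNF formula:

```python
from itertools import product

# Hypothetical toy CNF formula f on n = 3 variables, as a list of clauses;
# each clause is a list of literals, where +i means x_i and -i means (not x_i).
cnf_f = [[1, -2, 3], [-1, 2]]
n = 3

def satisfies_cnf(assignment, cnf):
    # assignment[i] is the Boolean value of x_{i+1}
    return all(any(assignment[abs(l) - 1] == (l > 0) for l in clause)
               for clause in cnf)

def satisfies_dnf(assignment, dnf):
    return any(all(assignment[abs(l) - 1] == (l > 0) for l in clause)
               for clause in dnf)

# g = not f: by De Morgan, negating every literal of each CNF clause and
# replacing AND-of-ORs by OR-of-ANDs yields a DNF formula of the same size.
dnf_g = [[-l for l in clause] for clause in cnf_f]

count_f = sum(satisfies_cnf(a, cnf_f) for a in product([False, True], repeat=n))
count_g = sum(satisfies_dnf(a, dnf_g) for a in product([False, True], repeat=n))

assert count_f == 2 ** n - count_g  # the counting relation used in the reduction
print(count_f, count_g)
```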

Although we do not know how to compute #DNF exactly in polynomial time, our aim will be to estimate it to within a reasonable accuracy. In order to do so, our first attempt is to use sampling. Let $U$ be the universe of possible assignments for a DNF formula $f$, and $S \subseteq U$ the set of satisfying assignments. The algorithm is the following:

1. Repeat the following $t$ times:
   (a) Pick $x$ uniformly at random from $U$.
   (b) If $x$ belongs to $S$, count 1.
   (c) If not, count 0.
2. Take the average of the above counts and return it as an estimate of $|S|/|U|$ (scaling by $|U|$ gives an estimate of $|S|$).

Let $p = |S|/|U|$ and let the random variable $X_i$ denote the output of the $i$-th iteration. Clearly $E[X_i] = |S|/|U| = p$. Let $X$ denote our estimate at the end of the algorithm, that is, $X = \frac{1}{t}\sum_{i=1}^{t} X_i$. Then $E[X] = p$, and one can use the value $X$ as an estimator for the size of $S$. There is a trade-off between the value of $t$ and the accuracy of the estimator. To determine after how many steps it is safe to stop the algorithm we will use Chebyshev's inequality:

Chebyshev: $\Pr[|X - \mu| \geq y\sigma] \leq \frac{1}{y^2}$, or, using $y = \frac{\epsilon\mu}{\sigma}$, $\Pr[|X - \mu| \geq \epsilon\mu] \leq \frac{\sigma^2}{\epsilon^2\mu^2}$.

The first and second moments of the estimator $X$ are given as follows:
$\mu(X) = p$, $\sigma^2(X_i) = p(1-p)$, and $\sigma^2(X) = \frac{1}{t^2}\sum_i \sigma^2(X_i) = \frac{t\,p(1-p)}{t^2} = \frac{p(1-p)}{t}$.

Using these bounds, we can bound the probability that our estimate is very far from the true value of $|S|$:
$\Pr[\text{failure}] = \Pr[\text{estimate is off by a factor of more than } (1 \pm \epsilon)] \leq \frac{p(1-p)}{t\epsilon^2 p^2} = \frac{1-p}{t\epsilon^2 p}.$

So it is sufficient to take $t = \frac{1-p}{\delta\epsilon^2 p}$ to have a failure probability of at most $\delta$. In general, after $t$ steps of sampling from a random variable with mean $\mu$ and variance $\sigma^2$, we have:
$\mu(X) = \mu$, $\sigma^2(X) = \frac{\sigma^2}{t}$, and $\Pr[\text{failure}] \leq \frac{\sigma^2}{t\epsilon^2\mu^2}$, which is less than $\frac{1}{4}$ for $t = \frac{4\sigma^2}{\epsilon^2\mu^2}$.
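As a concrete illustration of this naive estimator (a sketch added to these notes, not the lecture's own code; the clause encoding and function names are assumptions), one might implement it as follows:

```python
import random

def satisfies_dnf(assignment, dnf):
    # assignment: list of bools; literal +i means x_i, -i means (not x_i)
    return any(all(assignment[abs(l) - 1] == (l > 0) for l in clause)
               for clause in dnf)

def naive_dnf_estimate(dnf, n, t, rng=None):
    """Estimate #DNF by sampling t assignments uniformly from U = {0,1}^n.

    By the Chebyshev argument above, t on the order of (1-p)/(delta * eps^2 * p)
    samples give a (1 +/- eps)-estimate with probability >= 1 - delta, where
    p = |S| / 2^n; this is only practical when p is not too small.
    """
    rng = rng or random.Random(0)
    hits = 0
    for _ in range(t):
        a = [rng.random() < 0.5 for _ in range(n)]  # uniform random assignment
        hits += satisfies_dnf(a, dnf)
    return (hits / t) * (2 ** n)  # scale the empirical fraction by |U|

# Example (hypothetical formula): (x1 ^ ~x2) v (x2 ^ x3) on n = 3 variables.
print(naive_dnf_estimate([[1, -2], [2, 3]], n=3, t=10000))
```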

An estimator with the property $E[X] = \mu$ (where $\mu$ is the quantity to be estimated) is called an unbiased estimator. Thus the $X$ given by the above algorithm is an unbiased estimator. The quantity $\frac{\sigma^2}{\mu^2}$ is called the critical ratio. If $\frac{\sigma^2}{\mu^2}$ is poly(size of the problem, $\frac{1}{\epsilon}$), the above algorithm is an FPRAS (fully polynomial randomized approximation scheme).

Definition 8.2.2 A fully polynomial randomized approximation scheme (FPRAS) is a randomized algorithm $A$ running in time poly$(n, \frac{1}{\epsilon})$ such that
$\Pr[(1-\epsilon)f(x) \leq A(x) \leq (1+\epsilon)f(x)] \geq \frac{3}{4}.$
The algorithm $A$ is called a PRAS if its running time is poly$(n)$ for each fixed $\epsilon$.

As mentioned before, if the critical ratio is polynomial in $n$, the above algorithm is an FPRAS for $\mu = E[X]$. If we want $\Pr[\text{failure}] < \delta$ we can take $t = \frac{\sigma^2}{\delta\epsilon^2\mu^2}$. When $\delta$ is large, choosing $t$ this way is sufficient. However, if $\delta$ is very small, $t$ becomes large. To do better we would like $t$ to depend on $\delta$ only as $O(\log\frac{1}{\delta})$. Here is an algorithm that achieves this. The technique is known as the median of means method.

Denoting by $A$ the previous algorithm, we run $2s+1$ independent copies of $A$ for some value of $s$ to be determined later. Denote by $A_i(x)$ the output of the $i$-th instance of $A$. Our new algorithm outputs the median of all these $2s+1$ values. A natural question here is why we don't take the mean. Note, however, that the algorithm $A$ is not guaranteed to be within $(1 \pm \epsilon)$ of $\mu$ with probability 1, so there may be cases when its output is far away from the true value of $\mu$. Taking the mean in such a case might have a bad impact on the final estimate, and our accuracy would decrease only as $\frac{1}{t}$ rather than as $\frac{1}{\exp(t)}$.
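A minimal sketch of the median-of-means wrapper, under the assumption that the base estimator succeeds with probability at least $3/4$ (the function names and the particular choice of $s$ follow the analysis below and are illustrative, not code from the notes):

```python
import math
import statistics

def median_of_means(run_A, delta):
    """Boost a base estimator A to failure probability at most delta.

    run_A: zero-argument callable returning one independent estimate, assumed
           to land in [(1-eps)mu, (1+eps)mu] with probability >= 3/4.
    Runs 2s+1 independent copies with s = ceil(log_{4/3}(1/(2*delta))) and
    returns their median.
    """
    s = max(0, math.ceil(math.log(1.0 / (2 * delta)) / math.log(4.0 / 3.0)))
    estimates = [run_A() for _ in range(2 * s + 1)]
    return statistics.median(estimates)
```

For example, `median_of_means(lambda: naive_dnf_estimate(dnf, n, t), delta=1e-6)` (using the earlier sketch as the base estimator) drives the failure probability down to $10^{-6}$ with only $O(\log\frac{1}{\delta})$ repetitions.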

Let us now compute the probability of failure of the new algorithm. First note that failure in this case means that the median fell outside the interval $[(1-\epsilon)\mu, (1+\epsilon)\mu]$, an event which happens only when at least $s+1$ instances of $A$ have failed. Since each instance fails with probability at most $\frac{1}{4}$, we can upper bound the probability of failure as follows:
\[
\Pr[\text{new algorithm fails}] \;\leq\; \Pr[\text{at least } s+1 \text{ runs of } A \text{ fail}]
\;\leq\; \sum_{i=s+1}^{2s+1} \binom{2s+1}{i} \left(\tfrac{1}{4}\right)^{i} \left(\tfrac{3}{4}\right)^{2s+1-i}
\;\leq\; \left(\tfrac{1}{4}\right)^{s+1} \left(\tfrac{3}{4}\right)^{s} \sum_{i=s+1}^{2s+1} \binom{2s+1}{i}
\;\leq\; \left(\tfrac{1}{4}\right)^{s+1} \left(\tfrac{3}{4}\right)^{s} 2^{2s+1}
\;=\; \tfrac{1}{2}\left(\tfrac{3}{4}\right)^{s}.
\]
We can now choose $s = \log_{4/3}\frac{1}{2\delta}$ to obtain a failure probability of at most $\delta$. Note that the new estimator may be biased; however, we have boosted the probability that $A(x) \in [(1-\epsilon)\mu, (1+\epsilon)\mu]$.

Let us see how we can apply the previous results to #DNF. Recall that $U$ denotes the set of all assignments and $S$ the set of satisfying assignments, with $p = \frac{|S|}{|U|}$. Then after $t = \Omega(\frac{1-p}{\epsilon^2 p})$ steps, the probability of failure is below $\frac{1}{4}$. Now, if $p$ is large, the algorithm has a high chance of succeeding; however, when $p$ is small (say $\frac{1}{2^n}$), it may happen that no element from $S$ is sampled within any polynomial number of steps. This is an inherent problem with this approach, due to the fact that the size of the sampling universe is very large. In the following subsection we will see a different sampling technique, called importance sampling. The idea will be to shrink the size of the sampling universe to only a sub-population of interest, thereby increasing the value of $p$.

8.2.1 Importance Sampling

This method is due to Karp, Luby and Madras. Recall that we need the critical ratio $\frac{\sigma^2}{\mu^2}$ to be small; the algorithm achieves this by boosting $p$, that is, by decreasing the size of the universe $U$.

Let $S_i$ be the set of assignments satisfying clause $C_i$, for $i = 1, \ldots, m$. Then $S = \bigcup_{i=1}^{m} S_i$ is the set whose size we need to estimate. Define the new universe $U = \{(i, a) : a \in S_i\}$; then $|U| = \sum_i |S_i|$. The sets $S_i$ have the following properties (a short sketch of these operations for DNF clauses appears after the list):

1. For all $i$, $|S_i|$ is computable in polynomial time.
2. It is possible to sample uniformly at random from each $S_i$.
3. For any assignment $a$, it can be determined in polynomial time whether $a \in S_i$.
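For DNF clauses these three operations are straightforward. A minimal sketch, assuming clauses are encoded as lists of signed integers ($+i$ for $x_i$, $-i$ for $\bar{x}_i$) over $n$ variables (the encoding and function names are assumptions made for illustration):

```python
import random

def clause_count(clause, n):
    """|S_i|: a clause fixing k distinct, non-contradictory literals has
    2^(n - k) satisfying assignments (0 if it contains x and not-x)."""
    fixed = {}
    for l in clause:
        v, val = abs(l), (l > 0)
        if fixed.get(v, val) != val:
            return 0  # contradictory clause
        fixed[v] = val
    return 2 ** (n - len(fixed))

def clause_sample(clause, n, rng):
    """Sample uniformly from S_i: fix the clause's variables, set the rest at random."""
    a = [rng.random() < 0.5 for _ in range(n)]
    for l in clause:
        a[abs(l) - 1] = (l > 0)
    return tuple(a)

def clause_contains(clause, a):
    """Membership test: does assignment a satisfy clause C_i?"""
    return all(a[abs(l) - 1] == (l > 0) for l in clause)
```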

To understand how the sampling process works we will use the following construction. Imagine a table with the rows indexed by the assignments $a$ and the columns indexed by the sets $S_i$:

        S_1   S_2   S_3   ...   S_m
  a_1
  a_2
  a_3
   .
   .
   .

We place a star ($\star$) at the intersection of the row corresponding to an assignment $a$ and the column corresponding to a set $S_i$ whenever $a \in S_i$ (that is, whenever $a$ satisfies clause $C_i$). Thus every assignment satisfying the formula has a corresponding row containing at least one star. To determine the size of $S$ it would therefore be sufficient to count all such rows. For that we use the following procedure: for each row containing at least one star, make one of its stars a special star. Now the number of special stars is exactly the size of $S$. To count the number of special stars we use the same idea as before: sample a star uniformly at random from the set of all stars (the new universe $U$) and test whether it is special. If so, count 1, otherwise count 0. After $t$ steps, stop and output $\frac{X}{t}\cdot|U|$, where $X$ is the total number of 1's.

To complete the description of the algorithm we need two more things:
(i) a method to pick a star uniformly at random from the table;
(ii) a way to determine whether a star is special.

For (ii), we define a star as being special if it is the first one in its row. It is then easy to test in polynomial time whether a star is special, using property 3 from above: if the corresponding assignment satisfies the star's column but no earlier column, then it is special; otherwise it is not.

For (i) (picking a star uniformly at random) we use the following algorithm: compute the sizes $|S_i|$; pick $i$ with probability $\frac{|S_i|}{\sum_j |S_j|}$; then pick a uniformly random assignment satisfying $C_i$. This is again easy to do, by the properties of the sets $S_i$.

There is one more thing we need to check before we are done: the fact that $p$ is sufficiently large. But
$p = \frac{|S|}{|U|} = \frac{\text{number of special stars}}{\text{number of stars}} \geq \frac{1}{m},$
since every marked row contains exactly one special star and at most $m$ stars. Therefore, after $t = \Omega(\frac{m}{\epsilon^2})$ steps we have a fairly good estimator for the size of $S$.
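Putting the pieces together, here is a minimal self-contained sketch of the resulting importance-sampling estimator for #DNF (an illustration under the same clause encoding as above, not the notes' own code):

```python
import random

def karp_luby_count(dnf, n, t, rng=None):
    """Importance-sampling estimate of #DNF (Karp, Luby, Madras).

    dnf: list of non-contradictory clauses; each clause is a list of literals
    (+i for x_i, -i for not x_i). Samples t stars uniformly from the new
    universe U = {(i, a) : a satisfies clause C_i} and counts the special ones.
    """
    rng = rng or random.Random(0)

    def fixed_vars(clause):
        return {abs(l): (l > 0) for l in clause}

    sizes = [2 ** (n - len(fixed_vars(c))) for c in dnf]   # |S_i|
    total = sum(sizes)                                     # |U| = sum_i |S_i|
    weights = [s / total for s in sizes]

    def satisfies(a, clause):
        return all(a[abs(l) - 1] == (l > 0) for l in clause)

    hits = 0
    for _ in range(t):
        # pick clause i with probability |S_i| / |U|, then a uniform a in S_i
        i = rng.choices(range(len(dnf)), weights=weights)[0]
        a = [rng.random() < 0.5 for _ in range(n)]
        for v, val in fixed_vars(dnf[i]).items():
            a[v - 1] = val
        # the star (i, a) is special iff C_i is the first clause that a satisfies
        if all(not satisfies(a, dnf[j]) for j in range(i)):
            hits += 1
    return (hits / t) * total   # estimate of |S|

# Example (hypothetical formula): (x1 ^ ~x2) v (x2 ^ x3); the true count is 4.
print(karp_luby_count([[1, -2], [2, 3]], n=3, t=20000))
```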

8.3 Permanent of a matrix

Let $M$ be an $n \times n$ $\{0,1\}$ matrix. Its permanent is defined as follows:
\[
\mathrm{Perm}(M) = \sum_{\pi \in S_n} \prod_{i=1}^{n} M_{i\pi(i)}.
\]
This is very similar to the determinant of $M$, except that the sign of each permutation $\pi$ is dropped:
\[
\mathrm{Det}(M) = \sum_{\pi \in S_n} \mathrm{sgn}(\pi) \prod_{i=1}^{n} M_{i\pi(i)}.
\]
Computing Perm is #P-complete. Note that computing Perm is equivalent to counting the number of perfect matchings in a bipartite graph: if we take the two sets of vertices to be the row indices and the column indices respectively, each non-zero term in the permanent corresponds to a perfect matching in the associated bipartite graph $G_M$.

An unbiased estimator for Perm

Replace each 1 in $M$ by $+1$ or $-1$ independently at random and let $M'$ be the newly obtained matrix. Let $X = (\mathrm{Det}(M'))^2$.

Claim 8.3.1 $E[X] = \mathrm{Perm}(M)$.

Proof: Let $S$ be the set of permutations $\pi$ for which $\prod_{i=1}^{n} M_{i\pi(i)} = 1$. Then $\mathrm{Perm}(M) = |S|$. For each $\pi_i \in S$, let
\[
X_i = \mathrm{sgn}(\pi_i) \prod_{j=1}^{n} M'_{j\pi_i(j)} \in \{-1, +1\},
\]
so that $\mathrm{Det}(M') = \sum_{\pi_i \in S} X_i$ (permutations outside $S$ contribute 0). Then
\[
X = (\mathrm{Det}(M'))^2 = \sum_i X_i^2 + \sum_{i \neq j} X_i X_j,
\qquad
E[X] = \sum_i E[X_i^2] + \sum_{i \neq j} E[X_i]\,E[X_j] = |S| + 0 = \mathrm{Perm}(M),
\]
where we used the fact that the variables $X_i$ and $X_j$ are pairwise independent for $i \neq j$ and each of them has zero mean.

This estimator has a high variance, therefore in practice we use something different. This will be presented later in the class.
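For concreteness, here is a minimal sketch of this determinant-based estimator (often called the Godsil-Gutman estimator); the code below is an illustration added to these notes, averaging several independent copies of $X = \mathrm{Det}(M')^2$ to tame the (potentially very high) variance:

```python
import random
import numpy as np

def permanent_estimate(M, trials, rng=None):
    """Unbiased estimate of Perm(M) for a 0/1 matrix M.

    Each trial replaces every entry of M independently by a uniform +/-1 sign
    times that entry (so zeros stay zero) and records Det(M')^2; the average
    over trials is an unbiased estimate of Perm(M).
    """
    rng = rng or random.Random(0)
    M = np.asarray(M, dtype=float)
    total = 0.0
    for _ in range(trials):
        signs = np.array([[1.0 if rng.random() < 0.5 else -1.0
                           for _ in range(M.shape[1])] for _ in range(M.shape[0])])
        M_prime = M * signs                      # random +/-1 signs on the ones of M
        total += np.linalg.det(M_prime) ** 2     # one sample of X = Det(M')^2
    return total / trials

# Example: the all-ones 3x3 matrix has permanent 3! = 6.
print(permanent_estimate(np.ones((3, 3)), trials=20000))
```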