An Exposition
Sandip Sinha, Anamay Chaturvedi
Indian Institute of Science, Bangalore
14th November 14
Introduction: Deciding a Language
Let $L \subseteq \{0,1\}^*$ be a language, and let $M$ be a Turing machine. We say $M$ decides $L$ if, for every string in $L$, $M$ halts in the accept state, and for every string not in $L$, $M$ halts in the reject state.
Definitions
The class DTIME. Let $T : \mathbb{N} \to \mathbb{N}$ be some function. A language $L$ is in $\mathrm{DTIME}(T(n))$ iff there is a Turing machine that decides $L$ and runs in time $c \cdot T(n)$ for some constant $c > 0$.
The class P (polynomial time): $\mathrm{P} = \bigcup_{c \ge 1} \mathrm{DTIME}(n^c)$.
Examples
- CONNECTED GRAPH $= \{\langle G \rangle : G \text{ is a connected graph}\}$
- Decision version of the single-source shortest-path problem: $L = \{\langle G, s, t, k \rangle : G \text{ has non-negative edge weights and } s, t \text{ are vertices of } G \text{ joined by a path of length at most } k\}$
- PRIMES $= \{\langle n \rangle : n \text{ is prime}\}$
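To make "decidable in polynomial time" concrete, here is a minimal Python sketch of a decider for CONNECTED GRAPH. It assumes a hypothetical encoding in which $\langle G \rangle$ is given as a vertex count plus an undirected edge list, and it runs breadth-first search in $O(|V| + |E|)$ time. Treating graphs with at most one vertex as connected is our convention.

```python
from collections import deque


def is_connected(num_vertices: int, edges: list[tuple[int, int]]) -> bool:
    """Decide CONNECTED GRAPH for an undirected graph on vertices 0..num_vertices-1.

    Hypothetical encoding: <G> is given as (num_vertices, edge list).
    """
    if num_vertices <= 1:
        return True  # convention: empty and single-vertex graphs are connected
    adjacency: list[list[int]] = [[] for _ in range(num_vertices)]
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)
    # Breadth-first search from vertex 0 marks every vertex reachable from it.
    seen = [False] * num_vertices
    seen[0] = True
    queue = deque([0])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if not seen[w]:
                seen[w] = True
                queue.append(w)
    # Accept iff every vertex was reached; otherwise reject.
    return all(seen)


print(is_connected(4, [(0, 1), (1, 2), (2, 3)]))  # True
print(is_connected(4, [(0, 1), (2, 3)]))          # False
```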
Definitions
The class NP (non-deterministic polynomial time). A language $L \subseteq \{0,1\}^*$ is in NP if there exist a polynomial $p : \mathbb{N} \to \mathbb{N}$ and a polynomial-time TM $M$ (called the verifier for $L$) such that for every $x \in \{0,1\}^*$,
$$x \in L \iff \exists\, u \in \{0,1\}^{p(|x|)} \text{ s.t. } M(x, u) = 1.$$
If $x \in L$ and $u \in \{0,1\}^{p(|x|)}$ satisfy $M(x, u) = 1$, then we call $u$ a certificate for $x$ (with respect to the language $L$ and the machine $M$).
Examples
- All languages in P
- VERTEX COVER $= \{\langle G, k \rangle : G \text{ has a vertex cover of size } k\}$
- COMPOSITES $= \{\langle n \rangle : n \text{ is composite}\}$
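The verifier $M(x, u)$ in the definition of NP can be sketched for the two examples above. In this minimal Python sketch (function names and encodings are hypothetical), a certificate for VERTEX COVER is a list of $k$ vertices, and a certificate for COMPOSITES is a nontrivial factor of $n$. Both checks run in time polynomial in the length of the input, and both certificates have polynomial length.

```python
def verify_vertex_cover(num_vertices, edges, k, certificate):
    """Verifier M(<G, k>, u) for VERTEX COVER.

    Hypothetical encoding: u is a list of vertices claimed to be a vertex cover of size k.
    """
    cover = set(certificate)
    if len(cover) != k or not all(0 <= v < num_vertices for v in cover):
        return 0
    # Accept iff every edge has at least one endpoint in the cover.
    return 1 if all(a in cover or b in cover for a, b in edges) else 0


def verify_composite(n, certificate):
    """Verifier M(<n>, u) for COMPOSITES: u is a claimed nontrivial factor of n."""
    d = certificate
    return 1 if 1 < d < n and n % d == 0 else 0


triangle = [(0, 1), (1, 2), (0, 2)]
print(verify_vertex_cover(3, triangle, 2, [0, 1]))  # 1
print(verify_composite(15, 3))                      # 1
print(verify_composite(13, 5))                      # 0
```

A rejected certificate says nothing on its own: $x \notin L$ only when every candidate $u$ is rejected.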
Reduction
Polynomial-time reduction. A language $L \subseteq \{0,1\}^*$ is polynomial-time reducible to a language $L' \subseteq \{0,1\}^*$, denoted $L \le_p L'$, if there is a polynomial-time computable function $f : \{0,1\}^* \to \{0,1\}^*$ such that for every $x \in \{0,1\}^*$, $x \in L$ if and only if $f(x) \in L'$.
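As an illustration not taken from the slides, consider the classic reduction from VERTEX COVER to INDEPENDENT SET $= \{\langle G, k \rangle : G \text{ has an independent set of size } k\}$. It maps $\langle G, k \rangle$ to $\langle G, |V| - k \rangle$ and rests on the fact that $S$ is a vertex cover iff $V \setminus S$ is an independent set. The minimal Python sketch below implements $f$ and checks $x \in L \iff f(x) \in L'$ on a small graph. The brute-force membership tests are exponential and serve only as a check. All function names are hypothetical.

```python
from itertools import combinations


def reduce_vc_to_is(num_vertices, edges, k):
    """f(<G, k>) = <G, |V| - k>; polynomial time, since it only copies G."""
    return num_vertices, edges, num_vertices - k


def has_vertex_cover(num_vertices, edges, k):
    """Brute-force membership test for VERTEX COVER (exponential; testing only)."""
    if not 0 <= k <= num_vertices:
        return False
    return any(
        all(a in s or b in s for a, b in edges)
        for s in map(set, combinations(range(num_vertices), k))
    )


def has_independent_set(num_vertices, edges, size):
    """Brute-force membership test for INDEPENDENT SET (exponential; testing only)."""
    if not 0 <= size <= num_vertices:
        return False
    return any(
        not any(a in s and b in s for a, b in edges)
        for s in map(set, combinations(range(num_vertices), size))
    )


# Check x in VERTEX COVER  <=>  f(x) in INDEPENDENT SET on a 4-cycle, for several k.
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
for k in range(6):
    assert has_vertex_cover(4, square, k) == has_independent_set(*reduce_vc_to_is(4, square, k))
```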
Definitions
NP-hard: We say $L$ is NP-hard if $L' \le_p L$ for every $L' \in \mathrm{NP}$.
NP-complete: We say $L$ is NP-complete if $L$ is NP-hard and $L \in \mathrm{NP}$.
CNF Formula
A Boolean formula over variables $u_1, u_2, \ldots, u_n$ consists of the variables (and their negations) and the logical operators AND ($\wedge$), OR ($\vee$) and NOT ($\neg$). A clause is a Boolean formula of the form $\bigvee_j v_{ij}$, where each $v_{ij}$ is a variable $u_k$ or its negation $\bar{u}_k$. We say a Boolean formula over variables $u_1, u_2, \ldots, u_n$ is in Conjunctive Normal Form (CNF) if it is a conjunction of clauses, i.e., it is of the form
$$\bigwedge_i \Big( \bigvee_j v_{ij} \Big).$$
CNF Formula
Example: If $x$ and $y$ are bits, the expression $x = y$ can be represented as $(x \vee \bar{y}) \wedge (\bar{x} \vee y)$.
The size of a CNF formula is the number of $\wedge$/$\vee$ symbols it contains.
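A CNF formula is easy to represent and evaluate in code. This minimal Python sketch uses DIMACS-style integer literals, which is our representation choice, not the slides': $+i$ stands for variable $u_i$ and $-i$ for $\bar{u}_i$. It checks the $x = y$ example on all four assignments and counts its $\wedge$/$\vee$ symbols.

```python
from itertools import product

# A CNF formula is a list of clauses; each clause is a list of nonzero ints
# (DIMACS-style literals): +i means variable i, -i means its negation.
Clause = list[int]


def evaluate(cnf: list[Clause], assignment: dict[int, bool]) -> bool:
    """Return the value of the CNF formula under the given assignment."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in cnf
    )


def size(cnf: list[Clause]) -> int:
    """Number of AND/OR symbols: len(clause) - 1 ORs per clause,
    plus (number of clauses - 1) ANDs joining the clauses."""
    ors = sum(len(clause) - 1 for clause in cnf)
    ands = max(len(cnf) - 1, 0)
    return ors + ands


# x = y as (x OR NOT y) AND (NOT x OR y), with x = variable 1 and y = variable 2.
equal = [[1, -2], [-1, 2]]
for x, y in product([False, True], repeat=2):
    assert evaluate(equal, {1: x, 2: y}) == (x == y)
print(size(equal))  # 3
```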
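Representation of a Boolean function in CNF
Lemma: For every Boolean function $f : \{0,1\}^k \to \{0,1\}$, there is a $k$-variable CNF formula $\varphi$ of size at most $k2^k$ such that $\varphi(v) = f(v)$ for every $v \in \{0,1\}^k$.
Proof. For $u = (u_1, u_2, \ldots, u_k)$, we can design a clause $C_u(v_1, v_2, \ldots, v_k)$ such that $C_u(u) = 0$ and $C_u(v) = 1$ for every $v \ne u$. For instance, if $u = (1, 0, 0, 1)$, we set $C_u(v) = \bar{v}_1 \vee v_2 \vee v_3 \vee \bar{v}_4$. The CNF formula $\varphi$ is the conjunction of all such clauses $C_u$ for $u$ such that $f(u) = 0$:
$$\varphi(v) = \bigwedge_{u : f(u) = 0} C_u(v).$$
Then $\varphi(v) = f(v)$ for all $v \in \{0,1\}^k$: if $f(v) = 0$, the clause $C_v$ appears in $\varphi$ and evaluates to 0; if $f(v) = 1$, every clause $C_u$ in $\varphi$ has $u \ne v$ and evaluates to 1. The size of $\varphi$ is at most $k2^k$, since there are at most $2^k$ clauses, each contributing at most $k$ symbols (its $k - 1$ $\vee$'s and at most one $\wedge$).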
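The construction in the proof can be sketched directly in Python: enumerate the truth table, and for every $u$ with $f(u) = 0$ emit the clause $C_u$. This is a minimal sketch assuming DIMACS-style literals ($+j$ for $v_j$, $-j$ for $\bar{v}_j$). The function names and the 3-variable majority example are hypothetical illustrations.

```python
from itertools import product
from typing import Callable


def cnf_from_function(f: Callable[..., int], k: int) -> list[list[int]]:
    """Build the lemma's CNF: one clause C_u for every u in {0,1}^k with f(u) = 0.

    C_u contains v_j when u_j = 0 and NOT v_j when u_j = 1, so C_u(v) = 0 iff v = u.
    """
    clauses = []
    for u in product([0, 1], repeat=k):
        if f(*u) == 0:
            clauses.append([-(j + 1) if bit else (j + 1) for j, bit in enumerate(u)])
    return clauses


def evaluate(cnf: list[list[int]], v: tuple[int, ...]) -> int:
    """Evaluate a CNF on a 0/1 tuple v (v[0] is the value of variable 1)."""
    return int(all(any(v[abs(lit) - 1] == (lit > 0) for lit in c) for c in cnf))


def majority(a: int, b: int, c: int) -> int:
    """Example function: 1 iff at least two inputs are 1."""
    return int(a + b + c >= 2)


phi = cnf_from_function(majority, 3)
assert all(evaluate(phi, v) == majority(*v) for v in product([0, 1], repeat=3))
print(len(phi))  # 4 clauses, one per u with majority(u) = 0
```

The slide's example clause for $u = (1, 0, 0, 1)$ comes out as `[-1, 2, 3, -4]`, i.e. $\bar{v}_1 \vee v_2 \vee v_3 \vee \bar{v}_4$.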
SAT
A satisfiable formula $\varphi$ is one for which there exists a satisfying assignment, i.e., an assignment $z$ of the variables such that $\varphi(z) = 1$. SAT is the language consisting of all satisfiable CNF formulae.
Cook-Levin Theorem
Theorem (Cook-Levin): SAT is NP-complete.
Clearly, SAT is in NP: a satisfying assignment serves as the certificate, and checking that it satisfies every clause takes time polynomial in the size of the formula. The more interesting result is that SAT is NP-hard.
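The membership $\mathrm{SAT} \in \mathrm{NP}$ can be illustrated by contrasting the polynomial-time certificate check with exhaustive search over all $2^n$ assignments. This is a minimal Python sketch with hypothetical function names, again assuming DIMACS-style literals and a certificate that assigns every variable.

```python
from itertools import product
from typing import Optional


def verify_sat(cnf: list[list[int]], assignment: dict[int, bool]) -> bool:
    """Verifier for SAT: the certificate is an assignment to all variables.

    Runs in time linear in the number of literals, hence polynomial.
    """
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause) for clause in cnf)


def brute_force_sat(cnf: list[list[int]]) -> Optional[dict[int, bool]]:
    """Decide SAT by trying all 2^n assignments (exponential time)."""
    variables = sorted({abs(lit) for clause in cnf for lit in clause})
    for bits in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        if verify_sat(cnf, assignment):
            return assignment  # a certificate
    return None  # unsatisfiable


print(brute_force_sat([[1, 2], [-1, -2]]))  # {1: False, 2: True}
print(brute_force_sat([[1], [-1]]))         # None
```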
Proof: Goal
Let $L \subseteq \{0,1\}^*$ be a language in NP. By the definition of NP, there are a polynomial $p$ and a polynomial-time TM $M$ such that $x \in L$ iff $\exists\, u \in \{0,1\}^{p(|x|)}$ s.t. $M(x, u) = 1$. Let the running time of $M$ be $T(n)$, which is a polynomial.
To show that SAT is NP-hard, we have to exhibit a polynomial-time computable function $f$ which converts any $x \in \{0,1\}^*$ into a CNF formula $\varphi_x$ such that $x \in L$ iff $\varphi_x$ is satisfiable. Equivalently, $\varphi_x \in \mathrm{SAT}$ iff $\exists\, u \in \{0,1\}^{p(|x|)}$ s.t. $M(x, u) = 1$.
For the rest of the proof, we fix an input $x$ of length $n$, a certificate $u$ of length $p(n)$, and $y = x \circ u$ (the concatenation of $x$ and $u$) of length $n + p(n)$.
Proof: First Attempt
By the lemma on representing Boolean functions in CNF, we can convert the Boolean function that maps $u \in \{0,1\}^{p(|x|)}$ to $M(x, u)$ into a CNF formula $\varphi_x$. Then $\varphi_x(u) = M(x, u)$ for every $u \in \{0,1\}^{p(|x|)}$. Thus a string $u$ such that $M(x, u) = 1$ exists iff $\varphi_x$ is satisfiable.
Problem: The size of $\varphi_x$ can be as large as $p(|x|)\,2^{p(|x|)}$, which is exponential in the length of the input $x$, so $\varphi_x$ cannot be computed in polynomial time.
Proof: Main Ideas
Computation is local: in a single step of the execution of $M$ on $x$, at most one cell on each tape is modified, and this change depends on a small (constant) number of factors. So we can express each basic step of $M$ as a CNF formula of constant size. The conjunction of all these formulae, along with CNF formulae that check the input and the initial configuration and a constant-sized CNF formula that checks the final configuration, will be a CNF formula of size $d \cdot (n + T(n))$, where $d$ is some constant which depends only on $M$. Thus the CNF formula has size polynomial in $n$. From the proof, it will be clear that this CNF formula can be computed in time polynomial in the running time of $M$.
Proof: Assumptions
- $M$ has two tapes: an input tape and a work/output tape.
- $M$ is an oblivious TM, i.e., a TM whose head movement is independent of its tape contents. So $M$'s computation takes the same time for all inputs of size $n$, and for every $i$ the location of $M$'s heads at the $i$-th step depends only on $i$ and the input length.
These assumptions are without loss of generality, since it can be shown that if a language $L$ is decided by a TM $M$ in time $T(n)$, then there is a two-tape oblivious TM which decides $L$ in time $O(T(n)^2)$.
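Proof: Snapshot
A snapshot of $M$ is a 3-tuple $\langle p, a, b \rangle$, where:
- $p$ is the current state of $M$,
- $a$ is the symbol at the current head position on $M$'s input tape,
- $b$ is the symbol at the current head position on $M$'s work/output tape.
[Figure: the input tape with its head on symbol $a$, the work tape with its head on symbol $b$, and the machine in state $p$.]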
Proof: Encoding of Snapshot
For each step $i$ of the computation of $M$ on $y$, we store an encoding $z_i$ of the snapshot of $M$ at the end of the $i$-th step. The size of $z_i$ is $|Q \times \Gamma \times \Gamma|$, where $Q$ is the set of states and $\Gamma$ is the tape alphabet of $M$. This is a constant, which we call $c$.
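A length of $|Q \times \Gamma \times \Gamma|$ matches a one-hot encoding, which is our reading of the slide: $z_i$ has a single 1 at the index of $\langle p, a, b \rangle$. In the minimal Python sketch below, the state set and alphabet are hypothetical placeholders for a toy machine. The point is only that $c$ depends on $M$, not on $n$.

```python
from itertools import product

# Hypothetical toy machine: states Q and tape alphabet Gamma (not from the slides).
Q = ["start", "accept", "reject"]
GAMMA = ["0", "1", "_"]
SNAPSHOTS = list(product(Q, GAMMA, GAMMA))  # all triples <p, a, b>
C = len(SNAPSHOTS)                           # c = |Q x Gamma x Gamma|


def encode_snapshot(p: str, a: str, b: str) -> list[int]:
    """One-hot encoding: c bits with a single 1 at the index of <p, a, b>."""
    z = [0] * C
    z[SNAPSHOTS.index((p, a, b))] = 1
    return z


def decode_snapshot(z: list[int]) -> tuple[str, str, str]:
    """Inverse of encode_snapshot (assumes z contains exactly one 1)."""
    return SNAPSHOTS[z.index(1)]


z = encode_snapshot("start", "1", "_")
print(C, decode_snapshot(z))  # 27 ('start', '1', '_')
```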
Notation
- $\mathrm{prev}(i)$ denotes the last step before the $i$-th step at which the head position on the work/output tape was the same as it is just before the $i$-th step.
- $\mathrm{inputpos}(i)$ denotes the head position on the input tape just before the $i$-th step.
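Because $M$ is oblivious, the head positions before each step depend only on the input length. They could therefore be obtained by simulating $M$ on any input of length $n + p(n)$, e.g. all zeros. Given those positions, $\mathrm{inputpos}(i)$ is read off directly, and $\mathrm{prev}(i)$ takes one linear pass. This is a minimal Python sketch under these assumptions. The position list in the example is hypothetical, and returning `None` for a first visit to a cell is our convention.

```python
from typing import Optional


def compute_prev(work_pos: list[int]) -> dict[int, Optional[int]]:
    """work_pos[i - 1] is the work-tape head position just before step i (i = 1..T).

    Returns prev[i] = the last step j < i whose head position equals that of step i,
    or None if step i is the first visit to that cell (our convention).
    """
    last_step_at: dict[int, int] = {}  # cell -> most recent step with the head on it
    prev: dict[int, Optional[int]] = {}
    for i, cell in enumerate(work_pos, start=1):
        prev[i] = last_step_at.get(cell)
        last_step_at[cell] = i
    return prev


# Hypothetical oblivious sweep: the head moves right twice, then back left.
print(compute_prev([0, 1, 2, 1, 0]))  # {1: None, 2: None, 3: None, 4: 2, 5: 1}
```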
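Dependence of a Snapshot
[Figure: snapshot $z_i$ depends on input-tape cell $\mathrm{inputpos}(i)$ (among cells $1, \ldots, m$) and on the earlier snapshots $z_{\mathrm{prev}(i)}$ and $z_{i-1}$ (among snapshots $z_0, \ldots, z_{T(n)}$).]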
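Proof: Constructing the CNF Formula
We would like to check that for each $i \le T(n)$, the snapshot $z_i$ is correct given the snapshots $z_0, \ldots, z_{i-1}$. The snapshot $z_i$ depends only on the previous state of $M$ (recorded in $z_{i-1}$) and on the symbols read from its tapes: the input symbol $y_{\mathrm{inputpos}(i)}$ and the work-tape symbol, which was last written at step $\mathrm{prev}(i)$. We can express this dependence as
$$z_i = F\big(z_{i-1}, z_{\mathrm{prev}(i)}, y_{\mathrm{inputpos}(i)}\big),$$
where $F$ is derived from the transition function of $M$. This condition involves only $3c + 1$ bits, so by the lemma on representing Boolean functions in CNF it can be expressed as a CNF formula of size at most $(3c+1)2^{3c+1}$.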
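The per-step condition can be turned into clauses with the same truth-table construction as in the lemma, applied to the $(3c+1)$-bit predicate "$z_i = F(z_{i-1}, z_{\mathrm{prev}(i)}, y_{\mathrm{inputpos}(i)})$". In this minimal Python sketch, `toy_F` and $c = 2$ are hypothetical stand-ins. A real $F$ would be derived from $M$'s transition function. The sketch checks the size bound from the slide.

```python
from itertools import product

C = 2  # hypothetical snapshot length c for a toy machine


def toy_F(z_prev_step, z_prev_cell, y_bit):
    """Hypothetical stand-in for F (the real F comes from M's transition function)."""
    return (z_prev_step[0] ^ y_bit, z_prev_cell[1])


def step_constraint_cnf(F, c):
    """CNF over 3c+1 variables that is true iff z_i = F(z_{i-1}, z_prev(i), y_inputpos(i)).

    Variables (DIMACS-style, 1-based): z_i -> 1..c, z_{i-1} -> c+1..2c,
    z_prev(i) -> 2c+1..3c, y_inputpos(i) -> 3c+1. One clause per falsifying assignment.
    """
    clauses = []
    for u in product([0, 1], repeat=3 * c + 1):
        z_i, z_step, z_cell, y = u[:c], u[c:2 * c], u[2 * c:3 * c], u[3 * c]
        if tuple(z_i) != tuple(F(z_step, z_cell, y)):
            clauses.append([-(j + 1) if bit else (j + 1) for j, bit in enumerate(u)])
    return clauses


clauses = step_constraint_cnf(toy_F, C)
k = 3 * C + 1
size = sum(len(cl) - 1 for cl in clauses) + len(clauses) - 1  # number of OR/AND symbols
assert size <= k * 2 ** k  # the (3c+1) 2^(3c+1) bound
print(len(clauses), size)  # 96 671
```

For each of the $2^{2c+1}$ settings of the inputs to $F$, exactly one value of $z_i$ is correct. The others each contribute one clause, which gives $32 \cdot 3 = 96$ clauses here.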
Proof: Constructing the CNF Formula
$x \in L$ iff there exist a string $y \in \{0,1\}^{n+p(n)}$ and a sequence of strings $z_0, \ldots, z_{T(n)} \in \{0,1\}^c$ satisfying the following conditions:
1. The first $n$ bits of $y$ are equal to $x$. Size: $4n$.
2. The string $z_0$ encodes the initial snapshot of $M$ (whose state is the start state $s$). Size: $c2^c$.
3. For every $i \in \{1, \ldots, T(n)\}$, $z_i = F(z_{i-1}, z_{\mathrm{prev}(i)}, y_{\mathrm{inputpos}(i)})$. Size: $T(n)(3c+1)2^{3c+1}$.
4. The last string $z_{T(n)}$ encodes a snapshot in which the machine halts and outputs 1. Size: $c2^c$.
Thus the entire CNF formula, which is the conjunction of all the above conditions, has size at most $d \cdot (n + T(n))$, where $d$ is some constant depending only on $M$. The size is polynomial in $n$, since $T(n)$ is a polynomial.
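Adding up the four sizes shows where the constant $d$ comes from. In this minimal Python sketch, the snapshot length $c = 2$ and the running time $T(n) = n^2$ are hypothetical values chosen only for illustration. The particular $d$ below is one valid choice, not the slides' own constant, and the few $\wedge$'s joining the four parts are ignored.

```python
def formula_size_bound(n: int, T: int, c: int) -> int:
    """Sum of the sizes of the four conditions on the slide."""
    cond1 = 4 * n                               # first n bits of y equal x
    cond2 = c * 2 ** c                          # z_0 is the initial snapshot
    cond3 = T * (3 * c + 1) * 2 ** (3 * c + 1)  # one step constraint for each i = 1..T(n)
    cond4 = c * 2 ** c                          # z_T(n) halts with output 1
    return cond1 + cond2 + cond3 + cond4


def constant_d(c: int) -> int:
    """A constant d (depending only on c) with bound <= d * (n + T) whenever n + T >= 1."""
    return 4 + 2 * c * 2 ** c + (3 * c + 1) * 2 ** (3 * c + 1)


c = 2  # hypothetical snapshot length
for n in (10, 20, 40):
    T = n ** 2  # hypothetical running time T(n) = n^2
    bound = formula_size_bound(n, T, c)
    assert bound <= constant_d(c) * (n + T)
    print(n, T, bound)
```

The bound grows linearly in $n + T(n)$, so it is polynomial in $n$ whenever $T$ is. The constant is large, but it depends only on $M$.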
Proof: Winding Up
The CNF formula $\varphi_x$ takes variables $y \in \{0,1\}^{n+p(n)}$ and $z \in \{0,1\}^{c(T(n)+1)}$ and verifies that $y, z$ satisfy the AND of all four conditions. Thus:
- $x \in L$ iff $\varphi_x \in \mathrm{SAT}$;
- $\varphi_x$ has size polynomial in $n$;
- $\varphi_x$ can be computed in time polynomial in $T(n)$, the running time of $M$.
This shows that SAT is NP-hard, and hence NP-complete.
Implications
This theorem provides the first example of a natural NP-complete problem (one whose definition does not depend on Turing machines). To show that some other language $L$ in NP is NP-complete, it suffices to exhibit a polynomial-time computable reduction from SAT (or 3SAT, in which each clause has at most 3 literals) to $L$, since polynomial-time reductions compose. Example: 0/1 Integer Programming.
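The 0/1 Integer Programming example can be sketched as a standard reduction, which is not spelled out in the slides. We take the problem to be: given an integer matrix $C$ and vector $d$, is there $x \in \{0,1\}^n$ with $Cx \ge d$ row-wise? Each clause becomes one inequality. A positive literal $u_k$ contributes $x_k$, a negated literal $\bar{u}_k$ contributes $1 - x_k$, and the clause holds iff the sum is at least 1. Moving the constants to the right-hand side gives one row of $C$ and one entry of $d$. This is a minimal Python sketch with hypothetical function names, using DIMACS-style literals. The brute-force feasibility check is only for testing.

```python
from itertools import product


def sat_to_01_ip(cnf, num_vars):
    """Map a CNF to (C, d) so that the CNF is satisfiable iff some
    x in {0,1}^num_vars satisfies C x >= d row-wise.

    Clause sum: x_k for literal +k and (1 - x_k) for literal -k; require sum >= 1.
    The constant 1 from each negated literal moves to the right-hand side.
    """
    C, d = [], []
    for clause in cnf:
        row = [0] * num_vars
        negatives = 0
        for lit in clause:
            if lit > 0:
                row[lit - 1] += 1
            else:
                row[-lit - 1] -= 1
                negatives += 1
        C.append(row)
        d.append(1 - negatives)
    return C, d


def ip_feasible(C, d, num_vars):
    """Brute-force search for a 0/1 solution (exponential; testing only)."""
    return any(
        all(sum(c * x for c, x in zip(row, xs)) >= rhs for row, rhs in zip(C, d))
        for xs in product([0, 1], repeat=num_vars)
    )


print(sat_to_01_ip([[1, 2], [-1, -2]], 2))  # ([[1, 1], [-1, -1]], [1, -1])
```

The map takes time linear in the size of the formula, so it is a polynomial-time reduction. Each assignment satisfies a clause exactly when the corresponding 0/1 vector satisfies its row.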