Class Note #20
Date: 03/29/2006

[Overall Information]
In today's class, the following four concepts were introduced: the decision version of a problem, formal languages, P, and NP. We also discussed the relationship between P and NP.

[No Announcements Today]

[During the Lecture]
1, At the beginning of today's lecture, we briefly reviewed the general lower bound on the running time of comparison-based sorting algorithms, and recalled what we learned before about polynomial time: an algorithm is said to run in polynomial time if its running time T(n) satisfies T(n) = O(n^k). Here, k is some constant that does not grow with n. We also went through several simple running time expressions as examples. Notice that while log(n) grows even more slowly than any polynomial, we are only looking for an upper bound, so an algorithm running in time O(log n) is also polynomial. Usually, we equate polynomial running time with "good" and non-polynomial with "bad". Most algorithms we have studied in this course so far are polynomial; most non-polynomial algorithms are impractical except for small inputs. However, it should be noted that sometimes, inputs are
known to be small enough to make exponential algorithms practical, and in other cases, the input is so large that merely polynomial time is not good enough, and algorithms can only be used if their exponent is very small. For instance, if an algorithm is used to analyze the entire WWW graph, a running time of n^2 is already much too expensive.

2, To introduce the idea of NP and NP-completeness, we discussed the context in which it may be relevant in practice. The motivating story was taken from the wonderful book on the topic by Garey and Johnson. Suppose a programmer/algorithm designer is in charge of solving a particular problem (in our case, a real-world application which requires finding the largest clique in a graph of recommendations among customers), and cannot come up with a correct polynomial-time algorithm. How could the programmer convince others (such as his boss) that this is not a failure on his part, but inherent in the problem? Ideally,
the programmer would like to prove formally that no efficient algorithm is possible. However, for most practical problems, no such proof is known at this point. Instead, the theory of NP-completeness provides a way to prove that the problem is just as difficult as thousands of other problems that many famous researchers have worked on for decades without being able to solve them efficiently. This is strong evidence that the problem may not be solvable efficiently, and certainly that the failure to solve it efficiently is not the programmer's fault.

In order to simplify speaking formally about problems, we will focus on what is called the decision version: turning the problem into a yes/no question that still captures all of the difficulty of the original problem. For example, the decision version of the Minimum Spanning Tree problem is phrased as follows: given a graph G and a cost bound C, is there a spanning tree of cost at most C? As further examples, we looked at the following three problems:
(1) The Coin Selection problem from the midterm exam;
(2) The recommendation contest problem (formally known as the k-Clique problem);
(3) The Graph Coloring problem (assign the minimum number of colors to the nodes of a graph such that no two adjacent nodes have the same color).
Below, we see the decision versions of those problems.
3, Once we focus on decision problems, we can easily derive the notion of a language. All inputs are considered as strings (describing the data in some natural format). A language is simply a set of strings; with each decision problem, we associate the language consisting of exactly those input strings for which the answer is Yes. We say that an algorithm decides a language L if and only if it answers Yes on all inputs from the language and No on all others.
4, The class P (polynomial time) then consists of all languages L such that there exists a polynomial-time algorithm A deciding L. Notice that this is a class of languages/problems, not of algorithms. For instance, there may also be other, non-polynomial, algorithms for a language L in P (stupid ways to solve the same problem). However, as long as there is at least one algorithm deciding the language in polynomial time, the language is in P.
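As an illustration of a polynomial-time decider, here is a minimal Python sketch for the decision version of the Minimum Spanning Tree problem stated earlier. It is not the code from lecture: the function names, the edge format (cost, u, v), and the choice of Kruskal's algorithm with a simple union-find are assumptions made for this example.

```python
def mst_cost(num_nodes, edges):
    """Return the cost of a minimum spanning tree, or None if none exists.

    Assumed input format: nodes are numbered 0..num_nodes-1 and
    edges is a list of (cost, u, v) tuples.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Walk up to the representative of x's component,
        # shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, used = 0, 0
    for cost, u, v in sorted(edges):      # Kruskal: cheapest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:              # edge joins two components
            parent[root_u] = root_v
            total += cost
            used += 1
    # A spanning tree has exactly num_nodes - 1 edges.
    return total if used == num_nodes - 1 else None


def decide_spanning_tree(num_nodes, edges, bound):
    """Decision version: is there a spanning tree of cost at most bound?"""
    cost = mst_cost(num_nodes, edges)
    return cost is not None and cost <= bound


triangle = [(1, 0, 1), (2, 1, 2), (3, 0, 2)]
print(decide_spanning_tree(3, triangle, 3))   # cheapest tree costs 1 + 2
print(decide_spanning_tree(3, triangle, 2))
```

The decider answers Yes exactly when the cheapest spanning tree fits within the bound, so computing the optimum once and comparing it with the threshold is enough.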
5, After that, we tried to see which of the four decision problems we looked at earlier are in P. On the one hand, since we have polynomial-time algorithms solving the Minimum Spanning Tree and Coin Selection problems, we can compute the best solution, compare the result with the threshold value, and then give the correct yes/no answer; hence, the decision versions of these two problems are in P. On the other hand, it is not known whether there are polynomial-time algorithms solving the k-Clique and k-Coloring problems.

6, With all this preparation, we are now ready to define the class NP. Notice that it does not stand for "not polynomial", but rather for "nondeterministic polynomial time". The notion of NP is based on verifying a solution to a problem, rather than finding one from scratch. Informally, NP is defined as the problems
with the property that when the answer is Yes, there exists a short proof which can be verified efficiently. The Minimum Spanning Tree problem belongs to NP, as does the Coin Selection problem. As a further example, Graph Coloring belongs to NP. To prove to someone that a coloring with k colors exists, one can simply write down the coloring (which does not take much space). And if a coloring has been suggested, it is easy to verify that it is legal, by ascertaining that (1) no two adjacent nodes have the same color, and (2) no more than k colors are used. Similarly, to prove that a graph contains a k-clique, one can simply write down the corresponding set of nodes (which is a short proof). This can be verified by testing that all pairs of nodes in the set are connected by an edge, and that at least k nodes are in the set. Hence, all four problems are in NP.

The notion of NP, and proof verification, can be understood quite naturally with the analogy of a student (prover) and a TA or instructor (verifier). If the student thinks that the answer to a problem is Yes, then he has to provide a proof of that fact. If the correct answer is Yes, then there must be at least one proof that the student could give which would convince the instructor to accept the proof and give full credit. If the correct answer is No, then no matter what proof the student writes down, a correct instructor can never be convinced that the answer is in fact Yes.
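The two verification procedures described above can be sketched in Python as follows. This is only an illustrative sketch: the function names, the representation of a graph as a node set plus a list of undirected edges, and the representation of a proof as a dict (coloring) or a list of nodes (clique) are assumptions made for this example.

```python
from itertools import combinations


def verify_coloring(nodes, edges, k, coloring):
    """Check a proposed proof that the graph can be colored with k colors.

    coloring is assumed to be a dict mapping each node to its color.
    """
    if any(node not in coloring for node in nodes):
        return False                      # every node needs a color
    if len({coloring[node] for node in nodes}) > k:
        return False                      # (2) no more than k colors used
    # (1) no two adjacent nodes share a color
    return all(coloring[u] != coloring[v] for u, v in edges)


def verify_clique(nodes, edges, k, candidate):
    """Check a proposed proof that the graph contains a k-clique.

    candidate is assumed to be a collection of nodes.
    """
    members = set(candidate)
    if len(members) < k or not members <= set(nodes):
        return False                      # at least k distinct graph nodes
    adjacent = {frozenset(edge) for edge in edges}
    # all pairs of nodes in the set must be connected by an edge
    return all(frozenset(pair) in adjacent
               for pair in combinations(members, 2))


nodes = {0, 1, 2, 3}
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(verify_clique(nodes, edges, 3, [0, 1, 2]))
print(verify_coloring(nodes, edges, 3, {0: "r", 1: "g", 2: "b", 3: "r"}))
```

Both checks run in time polynomial in the size of the graph, and neither one searches for a solution: they only inspect the proof they are handed.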
6, The formal definition of NP was given afterwards. In this definition, the notions of "short" and "efficiently" are defined mathematically. The polynomial-time verification algorithm is denoted by A(x, y), where x is the input and y is the proof, whose length is at most s(|x|). The size function s(|x|) describes how long the proof is allowed to be as a function of the input length, and must be bounded by a polynomial.
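Written out in symbols, the definition takes roughly the following form. This is the standard textbook formulation, assumed here to match the one given in lecture; |x| denotes the length of the input string x, and A and s are the verification algorithm and size function described above.

```latex
L \in \mathrm{NP}
\iff
\text{there exist a polynomial-time algorithm } A(x, y)
\text{ and a polynomially bounded function } s
\text{ such that for every input } x:
\qquad
x \in L
\iff
\exists\, y \text{ with } |y| \le s(|x|) \text{ and } A(x, y) = \text{Yes}.
```

The "if" direction says that every Yes instance has some short proof that A accepts; the "only if" direction says that no proof can make A accept a No instance.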
7, Why is the concept of NP important? Why did we spend so much time and energy to define something as abstract as proof verification? It turns out that the set NP contains most practical problems one runs into, and thus captures the common characteristics of many problems. First, we can verify that P is a subset of NP. That is because for problems in P, we don't need the y part (the proof) at all in the verification algorithm A(x, y). Given the input x, we can just run the solving algorithm in polynomial time and then give the yes/no answer. Keeping up the student and instructor analogy, this would mean that problems in P are so easy that the instructor will grade correctly no matter what the student writes. The instructor is only willing to spend polynomial time grading, but as polynomial time is enough for him to solve the problem on his own, he is willing to do so and ignore the student's solution.
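To illustrate why P is a subset of NP, the following Python sketch turns any decider into a verifier A(x, y) that simply ignores the proof y. The names verifier_from_decider and is_sorted are hypothetical, and the toy problem (is a list of numbers sorted?) is chosen only because it is obviously decidable in polynomial time; it was not an example from lecture.

```python
def verifier_from_decider(decide):
    """Wrap a polynomial-time decider as a verifier A(x, y)."""
    def verify(x, y):
        # The proof y is ignored: the verifier solves the problem itself.
        return decide(x)
    return verify


def is_sorted(numbers):
    """Toy decider (assumed example): is the list in non-decreasing order?"""
    return all(a <= b for a, b in zip(numbers, numbers[1:]))


verify_sorted = verifier_from_decider(is_sorted)
print(verify_sorted([1, 2, 3], "any proof at all"))
print(verify_sorted([3, 1, 2], "a very convincing-looking proof"))
```

Because the verifier's answer depends only on x, a Yes instance is accepted with every proof (including the empty one) and a No instance is rejected with every proof, which is exactly what the definition of NP requires.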
8, The question "Is NP = P?" is considered the biggest open problem in computer science, or even in all of mathematics. In other words, the question asks: is it true that whenever it is easy (polynomial time) to verify someone else's solution to a problem, it is also easy to come up with your own solution?