Complexity Theory Part Two

Recap from Last Time

The Complexity Class P The complexity class P (for polynomial time) contains all problems that can be solved in polynomial time. Formally: P = { L There is a polynomial-time decider for L } Assuming the Cobham-Edmonds thesis, a language is in P if it can be decided efficiently.

Polynomial-Time Verifiers A polynomial-time verifier for L is a TM V such that V halts on all inputs. w L iff c Σ*. V accepts w, c. V's runtime is a polynomial in w (that is, V's runtime is O( w k ) for some integer k)

The Complexity Class NP The complexity class NP (nondeterministic polynomial time) contains all problems that can be verified in polynomial time. Formally: NP = { L There is a polynomial-time verifier for L } The name NP comes from another way of characterizing NP. If you introduce nondeterministic Turing machines and appropriately define polynomial time, then NP is the set of problems that an NTM can solve in polynomial time.

Theorem (Baker-Gill-Solovay): Any proof that purely relies on universality and self-reference cannot resolve P NP. Proof: Take CS154!

So how are we going to reason about P and NP?

New Stuff!

A Challenge

REG P NP Problems in NP vary widely in their difficulty, even if P = NP. How can we rank the relative difficulties of problems?

Reducibility

Maximum Matching Given an undirected graph G, a matching in G is a set of edges such that no two edges share an endpoint. A maximum matching is a matching with the largest number of edges.

Maximum Matching Given an undirected graph G, a matching in G is a set of edges such that no two edges share an endpoint. A maximum matching is a matching with the largest number of edges. A matching, matching, but but not not a a maximum maximum matching. matching.

Maximum Matching Given an undirected graph G, a matching in G is a set of edges such that no two edges share an endpoint. A maximum matching is a matching with the largest number of edges.

Maximum Matching Jack Edmonds' paper Paths, Trees, and Flowers gives a polynomial-time algorithm for finding maximum matchings. (This is the same Edmonds as in Cobham- Edmonds Thesis. ) Using this fact, what other problems can we solve?

Domino Tiling

Solving Domino Tiling

In Pseudocode boolean canplacedominos(grid G, int k) { } return hasmatching(gridtograph(g), k);

Based Based on on this this connection connection between between maximum maximum matching matching and and domino domino tiling, tiling, which which of of the the following following statements statements would would be be more more proper proper to to conclude? conclude? A. A. Finding Finding a a maximum maximum matching matching isn t isn t any any more more difficult difficult than than tiling tiling a a grid grid with with dominoes. dominoes. B. B. Tiling Tiling a a grid grid with with dominoes dominoes isn t isn t any any more more difficult difficult than than finding finding a a maximum maximum matching. matching. Answer Answer at at PollEv.com/cs103 PollEv.com/cs103 or or text CS103 to 22333 once to join, then A or B. text CS103 to 22333 once to join, then A or B.

Intuition: Tiling a grid with dominoes can't be harder than solving maximum matching, because if we can solve maximum matching efficiently, we can solve domino tiling efficiently.

Another Example

Reachability Consider the following problem: Given an directed graph G and nodes s and t in G, is there a path from s to t? It's known that this problem can be solved in polynomial time (use DFS or BFS). Given that we can solve the reachability problem in polynomial time, what other problems can we solve in polynomial time?

Converter Conundrums Suppose that you want to plug your laptop into a projector. Your laptop only has a VGA output, but the projector needs HDMI input. You have a box of connectors that convert various types of input into various types of output (for example, VGA to DVI, DVI to DisplayPort, etc.) Question: Can you plug your laptop into the projector?

Converter Conundrums Connectors RGB RGB to to USB USB VGA VGA to to DisplayPort DB13W3 DB13W3 to to CATV CATV DisplayPort to to RGB RGB DB13W3 DB13W3 to to HDMI HDMI DVI DVI to to DB13W3 DB13W3 S-Video S-Video to to DVI DVI FireWire FireWire to to SDI SDI VGA VGA to to RGB RGB DVI DVI to to DisplayPort DisplayPort USB USB to to S-Video S-Video SDI SDI to to HDMI HDMI

Converter Conundrums VGA RGB USB DisplayPort DB13W3 CATV HDMI X DVI S-Video FireWire SDI

Converter Conundrums Connectors RGB RGB to to USB USB VGA VGA to to DisplayPort DB13W3 DB13W3 to to CATV CATV DisplayPort to to RGB RGB DB13W3 DB13W3 to to HDMI HDMI DVI DVI to to DB13W3 DB13W3 S-Video S-Video to to DVI DVI FireWire FireWire to to SDI SDI VGA VGA to to RGB RGB DVI DVI to to DisplayPort USB USB to to S-Video S-Video SDI SDI to to HDMI HDMI VGA RGB USB DisplayPort DB13W3 CATV HDMI X FireWire DVI S-Video SDI

In Pseudocode boolean canplugin(list<plug> plugs) { } return isreachable(plugstograph(plugs), VGA, HDMI);

Based Based on on this this connection connection between between plugging plugging a a laptop laptop into into a a projector projector and and determining determining reachability, reachability, which which of of the the following following statements statements would would be be more more proper proper to to conclude? conclude? A. A. Plugging Plugging a a laptop laptop into into a a projector projector isn t isn t any any more more difficult difficult that that computing computing reachability reachability in in a a directed directed graph. graph. B. B. Computing Computing reachability reachability in in a a directed directed graph graph isn t isn t any any more more difficult difficult than than plugging plugging a a laptop laptop into into a a projector. projector. Answer Answer at at PollEv.com/cs103 PollEv.com/cs103 or or text CS103 to 22333 once to join, then A or B. text CS103 to 22333 once to join, then A or B.

Intuition: Finding a way to plug a computer into a projector can't be harder than determining reachability in a graph, since if we can determine reachability in a graph, we can find a way to plug a computer into a projector.

bool solveproblema(string input) { return solveproblemb(transform(input)); } Intuition: Problem A can't be harder than problem B, because solving problem B lets us solve problem A.

bool solveproblema(string input) { return solveproblemb(transform(input)); } If A and B are problems where it's possible to solve problem A using the strategy shown above *, we write A p B. We say that A is polynomial-time reducible to B. * Assuming that transform * runs in polynomial time.

Polynomial-Time Reductions If A p B and B P, then A P. If L 1 P L 2 and L 2 NP, then L 1 NP. P

Polynomial-Time Reductions If A p B and B P, then A P. If A p B and B NP, then A NP. P

Polynomial-Time Reductions If A p B and B P, then A P. If A p B and B NP, then A NP. P NP

This ₚ relation lets us rank the relative difficulties of problems in P and NP. What else can we do with it?

NP-Hardness and NP-Completeness

Question: What makes a problem hard to solve?

Intuition: If A ₚ B, then problem B is at least as hard * as problem A. * for some definition of at least as hard as.

Intuition: To show that some problem is hard, show that lots of other problems reduce to it.

NP-Hardness A language L is called NP-hard if for every A NP, we have A P L. A language in L is called NP-complete iff L is NP-hard and L NP. The class NPC is the set of NP-complete problems. NP P

NP-Hardness A language L is called NP-hard if for every A NP, we have A P L. A language in L is called NP-complete iff L is NP-hard and L NP. The class NPC is the set of NP-complete problems. NP NP-Hard P

NP-Hardness A language L is called NP-hard if for every A NP, we have A P L. Intuitively: L has has to to be be at at least least as as hard as every problem in NP, since an an algorithm for for L can can be be used used to to decide all all problems in in NP. NP. A language in L is hard called as NP-complete every problem iff L is in NP-hard NP, since and L NP. The class NPC is the set of NP-complete problems. NP NP-Hard P

NP-Hardness A language L is called NP-hard if for every A NP, we have A P L. A language in L is called NP-complete iff L What's is What's NP-hard in in here? here? and L NP. The class NPC is the set of NP-complete problems. NP NP-Hard P

NP-Hardness A language L is called NP-hard if for every A NP, we have A P L. A language in L is called NP-complete if L is NP-hard and L NP. The class NPC is the set of NP-complete problems. NP NP-Hard P

The Tantalizing Truth Theorem: If any NP-complete language is in P, then P = NP. Proof: Suppose that L is NP-complete and L P. Now consider any arbitrary NP problem X. Since L is NP-complete, we know that X p L. Since L P and X p L, we see that X P. Since our choice of X was arbitrary, this means that NP P, so P = NP.

The Tantalizing Truth Theorem: If any NP-complete language is in P, then P = NP. Proof: Suppose that L is NP-complete and L P. Now consider any arbitrary NP problem A. Since L is NP-complete, we know that A p L. Since L P and A p L, we see that A P. Since our choice of A was arbitrary, this means that NP P, so P = NP. P = NP

The Tantalizing Truth Theorem: If any NP-complete language is not in P, then P NP. Proof: Suppose that L is an NP-complete language not in P. Since L is NP-complete, we know that L NP. Therefore, we know that L NP and L P, so P NP. NP P NPC

How do we even know NP-complete problems exist in the first place?

Satisfiability A propositional logic formula φ is called satisfiable if there is some assignment to its variables that makes it evaluate to true. p q is satisfiable. p p is unsatisfiable. p (q q) is satisfiable. An assignment of true and false to the variables of φ that makes it evaluate to true is called a satisfying assignment.

SAT The boolean satisfiability problem (SAT) is the following: Formally: Given a propositional logic formula φ, is φ satisfiable? SAT = { φ φ is a satisfiable PL formula }

SAT = { φ φ is a satisfiable PL formula } The The language language SAT SAT happens happens to to be be in in NP. NP. Think Think about about how how a a polynomial-time polynomial-time verifier verifier for for SAT SAT might might work. work. Which Which of of the the following following would would work work as as certificates certificates for for such such a a verifier, verifier, given given that that the the input input is is a a propositional propositional formula formula φ? φ? A. A. The The truth truth table table of of φ. φ. B. B. One One possible possible variable variable assignment assignment to to φ. φ. C. C. A list list of of all all possible possible variable variable assignments assignments for for φ. φ. D. D. None None of of the the above, above, or or two two or or more more of of the the above. above. Answer Answer at at PollEv.com/cs103 PollEv.com/cs103 or or text text CS103 CS103 to to 22333 22333 once once to to join, join, then then A, A, B, B, C, C, or or D. D.

Theorem (Cook-Levin): SAT is NP-complete. Proof Idea: To see that SAT NP, show how to make a polynomial-time verifier for it. Key idea: have the certificate be a satisfying assignment. To show that SAT is NP-hard, given a polymomial-time verifier V for an arbitrary NP language L, for any string w you can construct a polynomially-sized formula φ(w) that says there is a certificate c where V accepts w, c. This formula is satisfiable if and only if w L, so deciding whether the formula is satisfiable decides whether w is in L. Proof: Take CS154!

Why All This Matters Resolving P NP is equivalent to just figuring out how hard SAT is. If SAT P, then P = NP. If SAT P, then P NP. We've turned a huge, abstract, theoretical problem about solving problems versus checking solutions into the concrete task of seeing how hard one problem is. You can get a sense for how little we know about algorithms and computation given that we can't yet answer this question!

Why All This Matters You will almost certainly encounter NP-hard problems in practice they're everywhere! If a problem is NP-hard, then there is no known algorithm for that problem that is efficient on all inputs, always gives back the right answer, and runs deterministically. Useful intuition: If you need to solve an NP-hard problem, you will either need to settle for an approximate answer, an answer that's likely but not necessarily right, or have to work on really small inputs.

Sample NP-Hard Problems Computational biology: Given a set of genomes, what is the most probable evolutionary tree that would give rise to those genomes? (Maximum parsimony problem) Game theory: Given an arbitrary perfect-information, finite, twoplayer game, who wins? (Generalized geography problem) Operations research: Given a set of jobs and workers who can perform those tasks in parallel, can you complete all the jobs within some time bound? (Job scheduling problem) Machine learning: Given a set of data, find the simplest way of modeling the statistical patterns in that data (Bayesian network inference problem) Medicine: Given a group of people who need kidneys and a group of kidney donors, find the maximum number of people who can end up with kidneys (Cycle cover problem) Systems: Given a set of processes and a number of processors, find the optimal way to assign those tasks so that they complete as soon as possible (Processor scheduling problem)

Coda: What if P NP is resolved?

Intermediate Problems With few exceptions, every problem we've discovered in NP has either definitely been proven to be in P, or definitely been proven to be NP-complete. A problem that's NP, not in P, but not NP-complete is called NP-intermediate. Theorem (Ladner): There are NP-intermediate problems if and only if P NP. NP P NPC

What if P NP?

A Good Read: A Personal View of Average-Case Complexity by Russell Impagliazzo

What if P = NP?

And a Dismal Third Option

Next Time The Big Picture Where to Go from Here A Final Your Questions Parting Words!