A FINE-GRAINED APPROACH TO ALGORITHMS AND COMPLEXITY Virginia Vassilevska Williams MIT ICDT 2018

THE CENTRAL QUESTION OF ALGORITHMS RESEARCH: How fast can we solve fundamental problems, in the worst case?

HARD PROBLEMS
For many problems, the known techniques get stuck:
- Very important computational problems from diverse areas.
- They have simple, often brute-force, textbook algorithms that are slow.
- No improvements in many decades!

A CANONICAL HARD PROBLEM: k-SAT
Input: variables x_1, ..., x_n and a formula F = C_1 ∧ C_2 ∧ ... ∧ C_m, where each clause C_i has the form (y_1 ∨ y_2 ∨ ... ∨ y_k) and each literal y_j is either x_t or ¬x_t for some t.
Output: a Boolean assignment to {x_1, ..., x_n} that satisfies all the clauses, or NO if the formula is not satisfiable.
Brute-force algorithm: try all 2^n assignments.
Best known algorithm: O(2^{n - cn/k} · n^d) time for constants c, d. This approaches 2^n as k grows.
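To make the brute-force baseline concrete, here is a minimal Python sketch of the try-all-2^n-assignments algorithm; the clause encoding (signed integers, +t for x_t and -t for ¬x_t) is my own choice for illustration, not from the talk.

```python
from itertools import product

def brute_force_sat(n, clauses):
    """Try all 2^n assignments. A clause is a list of signed ints:
    +t means x_t, -t means NOT x_t, variables numbered 1..n."""
    for bits in product([False, True], repeat=n):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return {i + 1: bits[i] for i in range(n)}  # satisfying assignment
    return None  # formula is unsatisfiable

# (x1 OR NOT x2) AND (x2 OR x3)
print(brute_force_sat(3, [[1, -2], [2, 3]]))
```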

ANOTHER HARD PROBLEM: LONGEST COMMON SUBSEQUENCE (LCS)
Given two strings of n letters, e.g. ATCGGGTTCCTTAAGGG and ATTGGTACCTTCAGGT, find a subsequence of both strings of maximum length.
Algorithms: classical O(n^2) time; best known O(n^2 / log^2 n) time [MP 80].
Applications both in computational biology and in spellcheckers. Solved daily on huge strings! (Human genome: 3 x 10^9 base pairs.)
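A minimal Python sketch of the classical quadratic dynamic program (the textbook recurrence, not the O(n^2 / log^2 n) algorithm of [MP 80]):

```python
def lcs_length(a, b):
    """Classical O(n^2) DP: dp[i][j] = LCS length of a[:i] and b[:j]."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]

print(lcs_length("ATCGGGTTCCTTAAGGG", "ATTGGTACCTTCAGGT"))
```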

ANOTHER HARD PROBLEM: BATCH PARTIAL MATCH
Given a database D ⊆ {0,1}^d of size n and queries x_1, ..., x_n ∈ {0,1,*}^d, for every x_i report whether there is a y in D that matches x_i in all non-* positions, i.e. for every coordinate c, either x_i[c] = * or x_i[c] = y[c]. E.g. 001101 matches *01**1.
Try all pairs (y ∈ D, query x_i) and check whether they match: O(n^2 d) time. Or: preprocess all possible queries: O(2^d n) time.
No O(n^{2-ε}) time algorithm for any ε > 0 is known for d = ω(log n)!
[AWY 15]: if d = c log n, there is an O(n^{2 - 1/O(log c)}) time algorithm.
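A small Python sketch of the try-all-pairs O(n^2 d) baseline; the string encoding of vectors and wildcards is assumed here purely for readability.

```python
def batch_partial_match(database, queries):
    """Naive O(n^2 * d): compare every query against every database vector.
    Database vectors are strings over {0,1}; queries over {0,1,*}, * matches anything."""
    def matches(q, y):
        return all(qc == '*' or qc == yc for qc, yc in zip(q, y))
    return [any(matches(q, y) for y in database) for q in queries]

print(batch_partial_match(["001101", "111000"], ["*01**1", "0*0*0*"]))  # [True, False]
```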

IN THEORETICAL CS, POLYNOMIAL TIME = EFFICIENT/EASY. This is for a variety of reasons: e.g. composing two efficient algorithms results in an efficient algorithm, and the notion is model-independent. However, no one would consider an O(n^100) time algorithm efficient in practice. And if n is huge, then even O(n^2) is inefficient.

WE ARE STUCK ON MANY PROBLEMS, EVEN JUST IN O(N^2) TIME
No n^{2-ε} time algorithms are known for:
- Many string matching problems: edit distance, sequence local alignment, LCS, jumbled indexing. General form: given two sequences of length n (e.g. ATCGGGTTCCTTAAGGG and ATTGGTACCTTCAGG), how similar are they? All variants can be solved in O(n^2) time by dynamic programming.
- Many problems in computational geometry: e.g. given n points in the plane, are any three collinear? A very important primitive!
- Many graph problems in sparse graphs: e.g. given an n-node, O(n)-edge graph, what is its diameter? A fundamental problem; even approximation algorithms seem hard!
- Many problems from databases: batch partial match, the query Q = R(A,B) ⋈ S(B,C) ⋈ T(C,A) / triangle listing.
- Many other problems.
Why are we stuck? Are we stuck because of the same reason?

PLAN
- Traditional hardness in complexity
- A fine-grained approach
- New developments

COMPUTATIONAL COMPLEXITY
The traditional complexity class approach (AC0 ⊆ Logspace ⊆ P ⊆ NP ⊆ PSPACE) says little about runtime:
- PSPACE: QBF with k quantifier alternations is solvable in 2^{n - ω(k)} time [SW15], so QBF gets easier as the number of quantifiers increases...
- NP: the best known SAT algorithms take ~2^n time.
- P: even O(n^2) time is inefficient, and P contains the 1000-clique problem, whose best known runtime is roughly n^790.

TIME HIERARCHY THEOREMS
For most natural computational models one can prove: for any constant c, there exist problems solvable in O(n^c) time but not in O(n^{c-ε}) time for any ε > 0.
But it is completely unclear how to show that a particular problem solvable in O(n^c) time is not in O(n^{c-ε}) time for any ε > 0. It is not even known whether SAT is in linear time!

WHY IS K-SAT HARD?
Theorem [Cook 71, Karp 72]: k-SAT is NP-complete for all k ≥ 3; hence, unless P = NP, it has no polynomial-time algorithm.
I.e. k-SAT is considered hard because fast algorithms for it would imply fast algorithms for many important problems.
But NP-completeness addresses runtime too coarsely: it is measured in the input size N, and it does not apply to problems already in P!
We'll develop a fine-grained theory of hardness that is conditional and mimics NP-completeness.

PLAN
- Traditional hardness in complexity
- A fine-grained approach
- Examples

FINE-GRAINED HARDNESS
Idea: mimic NP-completeness.
1. Identify key hard problems.
2. Reduce these to all (?) problems believed hard.
3. Hopefully form equivalence classes.

CNF SAT IS CONJECTURED TO BE REALLY HARD
Two popular conjectures about SAT on n variables [IPZ01]:
- ETH: 3-SAT requires 2^{δn} time for some constant δ > 0.
- SETH: for every ε > 0, there is a k such that k-SAT on n variables and m clauses cannot be solved in 2^{(1-ε)n} poly(m) time.
So we can use k-SAT as our hard problem and ETH or SETH as the hypothesis we base hardness on.

MORE KEY PROBLEMS TO BLAME
Fix the model: word-RAM with O(log n)-bit words.

Batch Partial Match (BPM) / Orthogonal Vectors: given a database D of n vectors in {0,1}^d and n queries x_1, ..., x_n in {0,1,*}^d for d = ω(log n), for every x_i answer whether some y in D matches it. Easy O(n^2 d) time algorithm; best known [AWY 15]: n^{2 - 1/O(log(d / log n))}. Hypothesis: BPM requires n^{2-o(1)} time. [W 05]: SETH implies this hypothesis! (This is a strengthening of SETH; [CGIMPS 16] suggests the two are not equivalent.)

3SUM: given a set S of n integers, are there a, b, c ∈ S with a + b + c = 0? Easy O(n^2) time algorithm; [BDP 05]: ~n^2 / log^2 n time for integers, [Chan 18]: ~n^2 / log^2 n time for reals. Hypothesis: 3SUM requires n^{2-o(1)} time.

APSP (all pairs shortest paths): given an n-node weighted graph, find the distance between every two nodes. Classical algorithms: O(n^3) time; [W 14]: n^3 / exp(√log n) time. Hypothesis: APSP requires n^{3-o(1)} time.
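For the "easy O(n^2) time algorithm" for 3SUM, one standard choice is sort-plus-two-pointers; a minimal Python sketch (it looks for three entries at distinct positions):

```python
def three_sum(nums):
    """Classic O(n^2): sort, then for each a use two pointers to find b + c = -a."""
    s = sorted(nums)
    n = len(s)
    for i in range(n - 2):
        lo, hi = i + 1, n - 1
        while lo < hi:
            total = s[i] + s[lo] + s[hi]
            if total == 0:
                return s[i], s[lo], s[hi]
            elif total < 0:
                lo += 1
            else:
                hi -= 1
    return None

print(three_sum([-5, 1, 4, 2, -3, 7]))  # (-5, 1, 4)
```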

FINE-GRAINED HARDNESS (recap)
Idea: mimic NP-completeness.
1. Identify key hard problems.
2. Reduce these to all (?) other hard problems.
3. Hopefully form equivalence classes.

FINE-GRAINED REDUCTIONS
A is (a,b)-reducible to B if for every ε > 0 there is a δ > 0 and an O(a(n)^{1-δ}) time algorithm that adaptively transforms any A-instance of size n into B-instances of sizes n_1, ..., n_k so that Σ_i b(n_i)^{1-ε} < a(n)^{1-δ}.
Consequence: if B is solvable in O(b(n)^{1-ε}) time, then A is solvable in O(a(n)^{1-δ}) time.
We focus on exponents, and we can build equivalences.
Intuition: a(n) and b(n) are the naive runtimes for A and B. "A reducible to B" means that beating the naive runtime for B implies also beating the naive runtime for A.

EXAMPLE: BATCH PARTIAL MATCH ~ EXISTS PARTIAL MATCH
Input for both: D ⊆ {0,1}^d with |D| = n, queries x_1, ..., x_n ∈ {0,1,*}^d, d = ω(log n).
BPM output: for every x_i, is there a y ∈ D matching x_i?
EPM output: is there an x_i and a y ∈ D matching x_i?
Theorem (VW-W 10): if EPM on inputs of size N is solvable in O(N^{2-ε}) time, then BPM is solvable in O(n^{2-ε/2}) time.
An O(n)-time reduction maps a BPM instance D, x_1, ..., x_n to instances 1, ..., n of EPM detection, each on an n^{1/2}-size database with n^{1/2} queries.

Input: D, X = {x_1, ..., x_n}.
BPM output: for each x_i, is there a y ∈ D that matches x_i?
EPM output: is there any x_i with a y ∈ D that matches it?
Reduction from BPM to EPM (t = n^{1/2}):
  Split D into pieces D_1, ..., D_t of size n/t.
  Split X into pieces X_1, ..., X_t of size n/t.
  Z ← all-zeros vector.
  For all pairs (D_p, X_q) in turn:
    While EPM finds a partial match y ∈ D_p, x_i ∈ X_q:
      Set Z[i] = 1.
      Remove x_i from X.
  Return Z.

BPM TO EPM REDUCTION: CORRECTNESS AND RUNTIME
Correctness: every pair y ∈ D, x_i ∈ X appears in some examined pair (D_p, X_q).
Runtime: every call to the Exists Partial Match finding algorithm is due to either
(1) setting an entry Z[i] to 1, which happens at most once per x_i ∈ X, or
(2) determining that some pair (D_p, X_q) has no more partial matches, which happens at most once per pair (D_p, X_q).
If the runtime for detecting a partial match is T(n) = O(n^{2-ε}), then the reduction runs in (n + t^2) · T(n/t) time. Setting t = n^{1/2}, we get O(n^{2-ε/2}).
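A small Python sketch of this BPM-to-EPM self-reduction. The find_partial_match routine below is just a naive quadratic stand-in for the assumed faster EPM algorithm (the reduction only uses it as a black box); the splitting into ~√n pieces and the bookkeeping follow the pseudocode above.

```python
import math

def find_partial_match(db_piece, query_piece):
    """Stand-in 'Exists Partial Match' oracle: return some matching (y, i) or None."""
    for y in db_piece:
        for q, i in query_piece:
            if all(qc == '*' or qc == yc for qc, yc in zip(q, y)):
                return y, i
    return None

def bpm_via_epm(database, queries):
    """BPM via the EPM black box: split both sides into ~sqrt(n) pieces and
    repeatedly query the oracle, retiring each query once it is answered."""
    n = max(len(database), len(queries), 1)
    t = max(1, math.isqrt(n))
    db_pieces = [database[i:i + t] for i in range(0, len(database), t)]
    q_pieces = [[(q, i) for i, q in list(enumerate(queries))[j:j + t]]
                for j in range(0, len(queries), t)]
    answer = [False] * len(queries)
    for dp in db_pieces:
        for qp in q_pieces:
            live = [pair for pair in qp if not answer[pair[1]]]
            while True:
                hit = find_partial_match(dp, live)
                if hit is None:
                    break
                _, i = hit
                answer[i] = True
                live = [pair for pair in live if pair[1] != i]
    return answer

print(bpm_via_epm(["001101", "111000"], ["*01**1", "0*0*0*"]))  # [True, False]
```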

SOME STRUCTURE WITHIN P
Using other hardness assumptions, one can unravel even more structure. (N = input size; n = number of variables or vertices.) A diagram on this slide connects k-SAT for all k (hard under SETH: 2^{(1-ε)n}) via [W 04] to Batch Partial Match / OV, and shows the OV, 3SUM (n^{2-ε}) and APSP (n^{3-ε}) hypotheses implying, among others:
- N^{2-ε} hardness for graph diameter [RV 13, BRSVW 18], eccentricities [AVW 16], local alignment, longest common substring* [AVW 14], Frechet distance [Br 14], edit distance [BI 15], LCS, dynamic time warping [ABV 15, BrK 15], subtree isomorphism [ABHVZ 15], betweenness [AGV 15], Hamming closest pair [AW 15], regular expression matching [BI 16, BGL 17];
- N^{2-ε} hardness for many dynamic problems [P 10], [AV 14], [HKNS 15], [D 16], [RZ 04], [AD 16], ...;
- hardness (N^{2-ε}, and N^{1.5-ε} for some problems) for a huge literature in computational geometry [GO 95, BHP 98, ...]: Geombase, 3PointsLine, 3LinesPoint, polygonal containment, planar motion planning, 3D motion planning, and for string problems such as sequence local alignment [AVW 14] and jumbled indexing [ACLL 14];
- n^{3-ε} hardness, in fact equivalence with APSP, for problems in dense graphs: radius, median, betweenness centrality [AGV 15], negative triangle, second shortest path, replacement paths, shortest cycle [VW 10].

PLAN
- Traditional hardness in complexity
- A fine-grained approach
- More examples

LISTING TRIANGLES
Suppose G = (V, E) is an m-edge graph and we want to list its triangles. How fast can we do it? This corresponds to the query R(A,B) ⋈ S(B,C) ⋈ T(C,A) where R, S, T have m tuples.
G can have ~m^{3/2} triangles, and these can be listed in ~m^{3/2} time. What if G has only ~m triangles?
Patrascu 10: under the 3SUM hypothesis, m^{4/3 - o(1)} time is needed!
Bjorklund et al. 14: if n x n matrices can be multiplied in O(n^2) time (a huge open problem), then m triangles can be listed in ~m^{4/3} time.
Follow-up work, Lincoln et al. 17: finding directed cycles is hard.
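For reference, here is a standard Python sketch of listing all triangles in roughly m^{3/2} time via the classical degree-ordering trick; this is the baseline algorithm, not the ~m^{4/3}-time algorithm of Bjorklund et al.

```python
from collections import defaultdict

def list_triangles(edges):
    """List all triangles of an undirected graph in roughly O(m^{3/2}) time.
    Each edge is directed from its lower-ranked endpoint to its higher-ranked one,
    with rank = (degree, vertex id), so every triangle is reported exactly once."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    rank = {v: (len(adj[v]), v) for v in adj}
    out = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}
    triangles = []
    for u in adj:
        for v in out[u]:
            # common out-neighbors of u and v close a triangle u-v-w
            for w in out[u] & out[v]:
                triangles.append((u, v, w))
    return triangles

print(list_triangles([(1, 2), (2, 3), (1, 3), (3, 4), (4, 1)]))
# two triangles: {1, 2, 3} and {1, 3, 4}
```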

FURTHER DEVELOPMENTS
The fine-grained approach has been spreading:
- Space usage of data structures [GKLP 17]
- Planar graph problems [AD 16]
- Sparse graph problems [LVW 18]
- Fine-grained fixed-parameter tractability [AVW 16]
- Fine-grained cryptography [BRSV 17]
- Fine-grained I/O complexity [DLLLV 18]
- Time/space tradeoffs [LVW 16]
- Approximation hardness
- More-believable hypotheses

Many open problems: bit.ly/2mdi6zh (on my webpage). New survey: bit.ly/2d9jbax. Thank you! Questions?