Average Case Complexity

Computational Complexity, by Y. Fu

A fundamental question in NP-completeness theory is: when, and in what sense, can an NP-complete problem be considered solvable in practice? In real life a problem often arises together with a natural distribution of problem instances. What qualifies as natural?

Historically the focus was on the probabilistic analysis of algorithms with respect to the uniform distribution. There are hard NP problems that have easy average-case algorithms.

The foundations of average-case complexity were laid by Leonid Levin in 1986 in a two-page paper, Average Case Complete Problems, SIAM Journal on Computing, 15:285-286, 1986. It aims at an NP-completeness theory for natural distributions.

Synopsis
1. Distributional Problem
2. Natural Distribution
3. DistNP-Completeness
4. SampNP-Completeness

Distributional Problem

Technically, a problem that arises in practice is a pair consisting of a decision problem and a distribution over its instances.

Distribution Function
A distribution function $\mu : \{0,1\}^* \to [0,1]$ maps strings to real values in $[0,1]$ such that $\mu(x) \le \mu(y)$ whenever $x < y$, and $\lim_{x \to \infty} \mu(x) = 1$. The order $<$ is defined from the lexicographic order: $x < y$ iff $|x| < |y|$, or $|x| = |y|$ and $x$ precedes $y$ lexicographically.
1. The value $\mu(x)$ is the cumulative probability at $x$.
2. The density function $\hat\mu$ of $\mu$ is defined by $\hat\mu(x) = \mu(x) - \mu(x-1)$, where $x-1$ denotes the predecessor of $x$ in the order $<$.
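To make the order and the density concrete, here is a small Python sketch. The helper names `rank`, `unrank`, `predecessor`, `density` and `geometric_mu` are hypothetical, and treating $\mu(x-1)$ as 0 when $x$ is the empty string (which has no predecessor) is our assumption; the distribution function is passed in as a Python function returning exact rationals.

```python
from fractions import Fraction
from typing import Callable, Optional

def rank(x: str) -> int:
    """Position of x in the order "" < 0 < 1 < 00 < 01 < 10 < 11 < 000 < ..."""
    return (2 ** len(x) - 1) + (int(x, 2) if x else 0)

def unrank(i: int) -> str:
    """Inverse of rank: the i-th binary string (counting from 0) in that order."""
    n = (i + 1).bit_length() - 1        # length of the i-th string
    offset = i - (2 ** n - 1)           # its position among the strings of length n
    return format(offset, "b").zfill(n) if n > 0 else ""

def predecessor(x: str) -> Optional[str]:
    """The string x - 1, or None when x is the empty string."""
    r = rank(x)
    return unrank(r - 1) if r > 0 else None

def density(mu: Callable[[str], Fraction], x: str) -> Fraction:
    """mu_hat(x) = mu(x) - mu(x - 1), treating mu(x - 1) as 0 for the empty string."""
    p = predecessor(x)
    return mu(x) - (mu(p) if p is not None else Fraction(0))

def geometric_mu(x: str) -> Fraction:
    """Hypothetical example: the i-th string in the order gets probability 2^-(i+1)."""
    return 1 - Fraction(1, 2 ** (rank(x) + 1))

print([unrank(i) for i in range(7)])   # ['', '0', '1', '00', '01', '10', '11']
print(density(geometric_mu, "0"))      # 1/4
```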

A distributional problem is a pair $\langle L, \mu\rangle$ where $L \subseteq \{0,1\}^*$ and $\mu$ is a distribution function. We are particularly interested in the distributional problem classes that are the average-case counterparts of P and NP.

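On-Average Time
For every TM $A$ and input $x$, let $\mathrm{time}_A(x)$ denote the number of steps $A$ takes on input $x$.
1. We say that the worst-case complexity of $A$ is polynomial if $\exists c, d.\ \forall x.\ \mathrm{time}_A(x) \le c|x|^d$.
2. It seems natural to say that a distributional problem $\langle L, \mu\rangle$ is efficiently solvable by a TM $A$ if $\exists c, d.\ \forall n.\ \sum_{x \in \{0,1\}^n} \hat\mu_n(x)\,\mathrm{time}_A(x) \le c n^d$, where $\hat\mu_n$ is $\hat\mu$ conditioned on $|x| = n$; that is, if the expected running time on inputs of length $n$ is polynomial in $n$.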

Polynomial Time on Average
However, this natural definition is pathological: it is not closed under function composition, and it is not model independent.

Consider a $k$-tape TM that halts in $n$ steps on every input $x \in \{0,1\}^n$ with $x \neq 0^n$, and in $2^n$ steps on input $0^n$. Assume the distribution is uniform. Its expected running time is
$$n\left(1 - \frac{1}{2^n}\right) + \frac{2^n}{2^n} < n + 1.$$
On a machine with only one tape the average running time would be exponential, due to the quadratic slowdown: if the simulation takes $(2^n)^2$ steps on $0^n$, that input alone contributes $4^n/2^n = 2^n$ to the expectation.
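The arithmetic can be checked exactly with rationals. In this sketch, modelling the one-tape machine by squaring every running time is our simplifying assumption (the simulation costs at most quadratic time, and we take that worst case); `expected_time` is a hypothetical helper name.

```python
from fractions import Fraction

def expected_time(n: int, slowdown_exponent: int = 1) -> Fraction:
    """Exact expected running time on a uniformly random x in {0,1}^n for the machine
    that takes n steps when x != 0^n and 2^n steps when x == 0^n. Every running time
    is raised to `slowdown_exponent`; 2 models a worst-case quadratic slowdown."""
    p_peak = Fraction(1, 2 ** n)                     # probability of the single input 0^n
    t_normal = Fraction(n) ** slowdown_exponent      # time on each of the other inputs
    t_peak = Fraction(2 ** n) ** slowdown_exponent   # time on 0^n
    return (1 - p_peak) * t_normal + p_peak * t_peak

for n in (4, 10, 20):
    print(n, float(expected_time(n)), float(expected_time(n, 2)))
```

The expectation stays below $n + 1$, while after squaring the single input $0^n$ contributes $2^n$ on its own.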

Let us manipulate the worst-case complexity formula slightly:
$$\exists c, d.\ \forall x.\ \frac{\mathrm{time}_A(x)^{1/d}}{|x|} \le \sqrt[d]{c}.$$
By applying the expectation operation to this formula (with $\varepsilon = 1/d$) we obtain Levin's definition of average-case polynomial time.

Average Case Analog of P
A distributional problem $\langle L, \mu\rangle$ is in AvgP if $L$ is accepted by a TM $A$ that satisfies
$$\exists C, \varepsilon > 0.\ \sum_{x \in \{0,1\}^*} \hat\mu(x)\,\frac{\mathrm{time}_A(x)^{\varepsilon}}{|x|} \le C. \qquad (1)$$
For every $d > 0$, condition (1) is equivalent to
$$\exists C, \varepsilon > 0.\ \sum_{x \in \{0,1\}^*} \hat\mu(x)\,\frac{\mathrm{time}_A(x)^{\varepsilon}}{|x|^d} \le C.$$
Hint: $E[X]^d \le E[X^d]$ for each $d \ge 1$.
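Levin's definition avoids the pathology of the naive one: squaring every running time only requires halving $\varepsilon$, since $(t^2)^{1/2} = t$. A numerical sketch on the $k$-tape/one-tape example, assuming the length distribution $\frac{1}{n(n+1)}$ of the uniform distribution defined later in these notes and truncating the infinite sum at a maximum length (both our choices); `levin_sum` is a hypothetical name, and floats are used because $\varepsilon$ is fractional.

```python
def levin_sum(max_len: int, eps: float, square: bool) -> float:
    """Partial sum, over lengths 1..max_len, of mu_hat(x) * time(x)**eps / |x| for the example
    machine (n steps on x != 0^n, 2^n steps on 0^n). Length n has probability 1/(n(n+1)) and
    strings of equal length are equally likely. square=True squares every running time,
    modelling the one-tape simulation."""
    total = 0.0
    for n in range(1, max_len + 1):
        p_len = 1.0 / (n * (n + 1))          # probability of length n
        p_peak = 2.0 ** -n                   # probability of 0^n among the strings of length n
        t_normal, t_peak = float(n), 2.0 ** n
        if square:
            t_normal, t_peak = t_normal ** 2, t_peak ** 2
        total += p_len * ((1 - p_peak) * t_normal ** eps + p_peak * t_peak ** eps) / n
    return total

print(levin_sum(200, 1.0, square=False))  # k-tape machine, eps = 1: bounded
print(levin_sum(200, 0.5, square=True))   # one-tape machine, eps = 1/2: the same sum
print(levin_sum(60, 1.0, square=True))    # one-tape machine, eps = 1: blows up
```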

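Observations: $\mathrm{P} \subseteq \mathrm{AvgP}$. An average-case P-time algorithm runs in polynomial time with high probability. This is due to Markov's inequality:
$$\Pr_{x \in_R \{0,1\}^*}\left[\frac{\mathrm{time}_A(x)^{\varepsilon}}{|x|} \ge KC\right] \le \frac{1}{K},$$
where $x$ is drawn according to $\mu$. Hence, with probability at least $1 - 1/K$, $\mathrm{time}_A(x) < (KC|x|)^{1/\varepsilon}$.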
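Markov's inequality is easy to check on a concrete finite case. This sketch (the helper name `markov_tail` is hypothetical) computes the tail probability exactly for the one-tape example above with $\varepsilon = 1/2$, restricted to inputs of a single length $n = 10$ so that the distribution is finite; that restriction is our simplification.

```python
from fractions import Fraction
from typing import List, Tuple

def markov_tail(dist: List[Tuple[Fraction, Fraction]], K: int) -> Tuple[Fraction, Fraction]:
    """For a finite distribution of a nonnegative random variable V, given as (probability, value)
    pairs with E[V] > 0, return (Pr[V >= K * E[V]], 1/K). Markov's inequality: first <= second."""
    C = sum(p * v for p, v in dist)
    tail = sum((p for p, v in dist if v >= K * C), Fraction(0))
    return tail, Fraction(1, K)

# V = time(x)^eps / |x| for the one-tape example on uniform x in {0,1}^10 with eps = 1/2:
# (n^2)^(1/2) / n = 1 on the 1023 inputs x != 0^10, and (4^10)^(1/2) / 10 = 1024/10 on 0^10.
n = 10
dist = [(Fraction(2 ** n - 1, 2 ** n), Fraction(1)), (Fraction(1, 2 ** n), Fraction(2 ** n, n))]
for K in (2, 50, 100):
    print(K, *markov_tail(dist, K))
```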

1. AvgP contains not just theoretically feasible problems but also practically feasible ones.
2. One gets a superclass of NP, denoted AvgNP, by replacing the TM in the definition of AvgP with an NDTM. Since $\mathrm{NP} \subseteq \mathrm{AvgNP}$, the hard problems in AvgNP are unlikely to have efficient algorithms.
3. One looks for a class of NP problems that have efficient average-case algorithms.

Natural Distribution

Levin assumed that natural distributions are P-time computable.

Polynomial Time Computable Distribution (Levin, 1986)
A distribution function $\mu$ is P-computable if there is a P-time TM that computes it, i.e. that on input $x$ and $1^k$ outputs the first $k$ bits of the binary expansion of $\mu(x)$ in time polynomial in $|x|$ and $k$. The density function of a P-computable distribution function is also P-time computable.

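The density function for the uniform distribution is given by
$$\hat\mu(x) = \frac{1}{|x|(|x|+1)} \cdot \frac{1}{2^{|x|}}$$
for nonempty $x$: each length $n \ge 1$ receives probability $\frac{1}{n(n+1)}$, shared equally among the $2^n$ strings of that length, and $\sum_{n \ge 1} \frac{1}{n(n+1)} = 1$.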
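A sketch of this density and of the corresponding distribution function, using exact rationals (function names hypothetical). Assigning probability 0 to the empty string is our reading, since the formula is undefined at $|x| = 0$. The closed form for $\mu(x)$ uses the telescoping sum $\sum_{m=1}^{n-1} \frac{1}{m(m+1)} = 1 - \frac{1}{n}$; both functions run in time polynomial in $|x|$, as P-computability requires.

```python
from fractions import Fraction

def uniform_density(x: str) -> Fraction:
    """mu_hat(x) = 1/(|x|(|x|+1)) * 2^-|x| for nonempty x; the empty string gets 0 (our reading)."""
    n = len(x)
    if n == 0:
        return Fraction(0)
    return Fraction(1, n * (n + 1) * 2 ** n)

def uniform_cdf(x: str) -> Fraction:
    """mu(x): total probability of all strings y <= x (shorter first, then lexicographic)."""
    n = len(x)
    if n == 0:
        return Fraction(0)
    shorter = 1 - Fraction(1, n)     # sum_{m=1}^{n-1} 1/(m(m+1)) telescopes to 1 - 1/n
    same_length_up_to_x = (int(x, 2) + 1) * uniform_density(x)
    return shorter + same_length_up_to_x

print(uniform_cdf("1" * 10))                                           # 10/11
print(uniform_cdf("000") - uniform_cdf("11"), uniform_density("000"))  # 1/96 1/96
```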

Arguably a distribution is natural not because we can calculate it efficiently, but because it can be generated efficiently.

Polynomial Time Samplable Distribution (Impagliazzo and Levin, 1990)
A distribution function $\mu$ is P-samplable if there is a P-time PTM $A$ such that $A$ outputs $x$ with probability $\hat\mu(x)$ for all $x \in \{0,1\}^*$.
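Here is what a sampler can look like for the uniform distribution defined earlier (our illustration, not from the slides; `sample_uniform` is a hypothetical name): draw the length first, then flip that many fair coins. Drawing the length as $n = \lfloor 1/u \rfloor$ for $u$ uniform in $(0,1]$ gives $\Pr[n \ge k] = 1/k$ and hence $\Pr[n = k] = \frac{1}{k(k+1)}$, up to floating-point granularity; Python's `random` module stands in for the coin flips of a PTM, and the running time is linear in the length of the output.

```python
import random

def sample_uniform(rng: random.Random) -> str:
    """Draw x with probability 1/(|x|(|x|+1)) * 2^-|x| (nonempty strings only)."""
    u = 1.0 - rng.random()                             # uniform in (0, 1]
    n = int(1 / u)                                     # Pr[n >= k] = Pr[u <= 1/k] = 1/k
    return format(rng.getrandbits(n), "b").zfill(n)    # n fair coin flips

rng = random.Random(2024)
samples = [sample_uniform(rng) for _ in range(5000)]
print(sum(len(s) == 1 for s in samples) / len(samples))   # close to 1/2
print(sum(len(s) == 2 for s in samples) / len(samples))   # close to 1/6
```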

Lemma. A P-computable distribution is also P-samplable.
Lemma. Assume $\mathrm{P} \neq \mathrm{P}^{\#\mathrm{P}}$. Then there is a P-samplable distribution that is not P-computable.
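One way to prove the first lemma is inverse-transform sampling: pick a random real $r \in [0,1)$ and output the least $x$ (in the order $<$) with $\mu(x) > r$, which happens with probability $\mu(x) - \mu(x-1) = \hat\mu(x)$. The sketch below assumes $\mu$ is available as a Python function returning exact rationals and draws $r$ with a fixed number of random bits (the hypothetical parameter `bits`), so the output distribution is only correct up to $2^{-\text{bits}}$; a genuine P-time sampler would generate the bits of $r$ lazily. Exponential search followed by binary search over ranks needs $O(\log \mathrm{rank}(x)) = O(|x|)$ evaluations of $\mu$.

```python
import random
from fractions import Fraction
from typing import Callable

def unrank(i: int) -> str:
    """The i-th binary string (from 0) in the order "" < 0 < 1 < 00 < 01 < ..."""
    n = (i + 1).bit_length() - 1
    return format(i - (2 ** n - 1), "b").zfill(n) if n > 0 else ""

def sample_from_cdf(mu: Callable[[str], Fraction], rng: random.Random, bits: int = 64) -> str:
    """Inverse-transform sampling: draw r in [0, 1) and return the least x with mu(x) > r,
    so x is returned with probability mu(x) - mu(x - 1), up to the 2^-bits resolution of r."""
    r = Fraction(rng.getrandbits(bits), 2 ** bits)
    hi = 1
    while mu(unrank(hi)) <= r:      # exponential search for some rank hi with mu > r
        hi *= 2
    lo = 0
    while lo < hi:                  # binary search for the least such rank in [lo, hi]
        mid = (lo + hi) // 2
        if mu(unrank(mid)) > r:
            hi = mid
        else:
            lo = mid + 1
    return unrank(lo)

def geometric_mu(x: str) -> Fraction:
    """Hypothetical example: the i-th string in the order gets probability 2^-(i+1)."""
    return 1 - Fraction(1, 2 ** (2 ** len(x) + (int(x, 2) if x else 0)))

rng = random.Random(0)
print([sample_from_cdf(geometric_mu, rng) for _ in range(10)])
```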

DistNP and SampNP
A distributional problem $\langle L, \mu\rangle$ is in DistNP if $L \in \mathrm{NP}$ and $\mu$ is P-computable.
A distributional problem $\langle L, \mu\rangle$ is in SampNP if $L \in \mathrm{NP}$ and $\mu$ is P-samplable.

DistNP-Completeness

A reduction between problems in DistNP is a Karp reduction that additionally satisfies a continuity property.

Suppose $\langle L, \mu\rangle$ and $\langle L', \mu'\rangle$ are distributional problems. $\langle L, \mu\rangle$ average-case reduces to $\langle L', \mu'\rangle$, written $\langle L, \mu\rangle \le_A \langle L', \mu'\rangle$, if there is a P-time computable $f$ and polynomials $p, q$ such that
Correctness. $\forall x \in \{0,1\}^*.\ x \in L \iff f(x) \in L'$;
Length Regularity. $\forall x \in \{0,1\}^*.\ |f(x)| = p(|x|)$;
Domination. $\forall y \in \{0,1\}^*.\ \sum_{x \in \{0,1\}^*,\ f(x) = y} \hat\mu(x) \le q(|y|)\,\hat\mu'(y)$.

1. Length Regularity implies that $f^{-1}(y)$ is finite for all $y \in \{0,1\}^*$ (for non-constant $p$).
2. The Domination condition $\forall y \in \{0,1\}^*.\ \sum_{x \in \{0,1\}^*,\ f(x) = y} \hat\mu(x) \le q(|y|)\,\hat\mu'(y)$ ensures that the reduction does not map a highly likely instance of the first problem onto a rare instance of the second problem. Otherwise an easy solution to the latter does not necessarily yield an easy solution to the former.
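To see what Domination rules out, a small experiment with a hypothetical helper (not part of the theory) computes, for each $y$, the ratio between the total $\hat\mu$-mass mapped onto $y$ and $\hat\mu'(y)$. Both distributions are taken to be the uniform distribution defined earlier, and only inputs up to a fixed length are enumerated; these are our choices. The length-regular map $x \mapsto x0$ keeps the ratio bounded by a constant, whereas $x \mapsto 0^{|x|}$ piles the whole mass of length $n$ onto one rare string, so the ratio grows like $2^n$ and no polynomial $q$ can work.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from typing import Callable, Dict

def uniform_density(x: str) -> Fraction:
    n = len(x)
    return Fraction(1, n * (n + 1) * 2 ** n) if n > 0 else Fraction(0)

def domination_ratios(f: Callable[[str], str], max_len: int) -> Dict[str, Fraction]:
    """For each y hit by f on nonempty inputs of length <= max_len, return
    (sum of mu_hat(x) over x with f(x) = y) / mu_hat'(y), both densities uniform."""
    mass: Dict[str, Fraction] = defaultdict(Fraction)
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            x = "".join(bits)
            mass[f(x)] += uniform_density(x)
    return {y: m / uniform_density(y) for y, m in mass.items()}

r_good = domination_ratios(lambda x: x + "0", 8)
r_bad = domination_ratios(lambda x: "0" * len(x), 8)
print(max(r_good.values()))   # 6: bounded by a constant
print(r_bad["0" * 8])         # 256 = 2^8: exponential in the length
```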

Lemma. Average case reduction is transitive.

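Theorem. If $\langle L, \mu\rangle \le_A \langle L', \mu'\rangle \in \mathrm{AvgP}$, then $\langle L, \mu\rangle \in \mathrm{AvgP}$.
Proof. Let $f$ be a reduction from $\langle L, \mu\rangle$ to $\langle L', \mu'\rangle$ with polynomials $p, q$. Let the running time of $f$ be bounded by $dn^d$. Clearly $dn^d \ge p(n)$, since $f$ must write an output of length $p(n)$. Suppose $A'$ is a TM for $\langle L', \mu'\rangle$ and $\varepsilon, C$ are such that $\sum_{y \in \{0,1\}^*} \hat\mu'(y)\,\frac{\mathrm{time}_{A'}(y)^{\varepsilon}}{|y|} \le C$. Let $A$ be the obvious TM for $\langle L, \mu\rangle$ obtained by composition: compute $f(x)$, then run $A'$ on it. The inequality derived below implies $\langle L, \mu\rangle \in \mathrm{AvgP}$.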

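Writing $y = f(x)$ and assuming without loss of generality that $\varepsilon \le 1$ and $q \ge 1$:
$$
\begin{aligned}
\sum_{x \in \{0,1\}^*} \hat\mu(x)\,\frac{\mathrm{time}_A(x)^{\varepsilon}}{q(|f(x)|)\,d|x|^d}
&\le \sum_{x \in \{0,1\}^*} \hat\mu(x)\,\frac{(\mathrm{time}_{A'}(y) + d|x|^d)^{\varepsilon}}{q(|y|)\,d|x|^d}\\
&\le \sum_{x \in \{0,1\}^*} \hat\mu(x)\,\frac{\mathrm{time}_{A'}(y)^{\varepsilon} + (d|x|^d)^{\varepsilon}}{q(|y|)\,d|x|^d}\\
&\le \sum_{y \in \{0,1\}^*} \Bigl(\sum_{x \in \{0,1\}^*,\ f(x) = y} \frac{\hat\mu(x)}{q(|y|)}\Bigr)\frac{\mathrm{time}_{A'}(y)^{\varepsilon}}{|y|} + 1\\
&\le \sum_{y \in \{0,1\}^*} \hat\mu'(y)\,\frac{\mathrm{time}_{A'}(y)^{\varepsilon}}{|y|} + 1\\
&\le C + 1.
\end{aligned}
$$
The steps use, in order: $\mathrm{time}_A(x) \le \mathrm{time}_{A'}(f(x)) + d|x|^d$; $(a+b)^{\varepsilon} \le a^{\varepsilon} + b^{\varepsilon}$ for $\varepsilon \le 1$; $d|x|^d \ge p(|x|) = |y|$ together with $\sum_x \hat\mu(x) \le 1$; Domination; and the assumption on $A'$. Since $q(|f(x)|)\,d|x|^d = q(p(|x|))\,d|x|^d$ is bounded by a polynomial in $|x|$, the equivalent form of condition (1) applies.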

We say that $\langle L, \mu\rangle$ is DistNP-complete if $\langle L, \mu\rangle \in \mathrm{DistNP}$ and $\langle L', \mu'\rangle \le_A \langle L, \mu\rangle$ for all $\langle L', \mu'\rangle \in \mathrm{DistNP}$.

Levin provided the first DistNP-complete problem in 1986. The proof presented below is from Yuri Gurevich, Complete and Incomplete Randomized NP Problems, FOCS 1987.

Distributional Bounded Halting Problem
1. Let $U$ contain all tuples $\langle \alpha, x, 1^t\rangle$ such that the NDTM $N_\alpha$ accepts $x$ within $t$ steps.
2. Let $\mu_u$ be the distribution on tuples $\langle \alpha, x, 1^t\rangle$ of length $n$ such that $\alpha \in_R \{0,1\}^{\log n}$, $t \in_R \{0, \dots, n - \log n\}$, and $x \in_R \{0,1\}^{n - \log n - t}$. This distribution is P-time computable.
3. $\langle U, \mu_u\rangle$ is the distributional version of Bounded Halting.
We could make $\mu_u$ uniform by replacing $1^t$ with a string of equal length and assigning each such string the same probability. But we would lose the domination property had we done that.
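The weights of $\mu_u$ are straightforward to compute. The sketch below assumes that $\log n$ means $\lfloor \log_2 n \rfloor$ and, like the slide, conditions on the total length $n$ and ignores how the tuple is encoded; both are our simplifications, and the function names are hypothetical. Under these assumptions the probabilities of all tuples of length $n$ sum to 1, and a tuple whose $x$-part has length $\ell$ gets probability $2^{-\ell}$ divided only by the polynomial factor $2^{\lfloor \log_2 n \rfloor}(n - \lfloor \log_2 n \rfloor + 1)$.

```python
from fractions import Fraction

def log_n(n: int) -> int:
    """Hypothetical reading of log(n): floor(log2 n)."""
    return n.bit_length() - 1

def tuple_probability(n: int, t: int) -> Fraction:
    """Probability, given total length n, of one particular tuple <alpha, x, 1^t> under mu_u:
    alpha uniform in {0,1}^log n, t uniform in {0..n - log n}, x uniform of length n - log n - t."""
    L = log_n(n)
    if not 0 <= t <= n - L:
        raise ValueError("t out of range")
    p_alpha = Fraction(1, 2 ** L)
    p_t = Fraction(1, n - L + 1)
    p_x = Fraction(1, 2 ** (n - L - t))
    return p_alpha * p_t * p_x

print(tuple_probability(16, 12))   # 1/208: x is empty, only alpha and t are random
print(tuple_probability(16, 0))    # 1/851968 = 1/(208 * 2^12)
```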

Peak Elimination
The obvious reduction, which maps $x$ to $\langle \alpha, x, 1^k\rangle$, fails the Domination property, because $\hat\mu(x)$ may be far larger than $2^{-|x|}$. We bypass the problem using the following lemma.
Lemma. Let $\mu$ be a P-computable distribution function. There is a P-time computable function $g : \{0,1\}^* \to \{0,1\}^*$ such that
1. $g$ is one-one: $g(x) = g(x')$ iff $x = x'$;
2. for every $x \in \{0,1\}^*$, $|g(x)| \le |x| + 1$;
3. for every $y \in \{0,1\}^*$, $\hat\mu(\{x \mid y = g(x)\}) \le \frac{1}{2^{|y|-1}}$.

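Proof. Given $x \in \{0,1\}^*$, let $h(x)$ be the longest common prefix of the binary representations of $\mu(x-1)$ and $\mu(x)$.
$h$ is P-time computable.
$|h(x)| \le k$ if $\hat\mu(x) = \mu(x) - \mu(x-1) \ge 2^{-k}$, since two numbers sharing their first $k+1$ bits differ by less than $2^{-(k+1)}$.
$h$ is one-one. Suppose $x < x'$ and $h(x) = h(x')$ with $|h(x)| = k$. Since $\mu(x-1) < \mu(x)$ first differ at bit $k+1$, the $(k+1)$-th bit of $\mu(x)$ must be 1. Because $\mu(x) \le \mu(x'-1) \le \mu(x')$ and all of these numbers begin with $h(x)$, the $(k+1)$-th bit of both $\mu(x'-1)$ and $\mu(x')$ must be 1, contradicting the fact that $\mu(x'-1)$ and $\mu(x')$ differ at bit $k+1$.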

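Proof (continued).
1. For every $x \in \{0,1\}^n$, define
$$g(x) = \begin{cases} 0x, & \text{if } \hat\mu(x) \le 2^{-n},\\ 1h(x), & \text{otherwise.} \end{cases}$$
Clearly $g$ satisfies the first two conditions of the lemma: $0x$ and $1h(x')$ differ in the first bit and $h$ is one-one, and $|h(x)| < |x|$ whenever $\hat\mu(x) > 2^{-|x|}$.
2. We now show that $\hat\mu(\{x \mid y = g(x)\}) \le \frac{1}{2^{|y|-1}}$ for all $y \in \{0,1\}^*$.
If $y$ is not in the image of $g$, then $\hat\mu(\{x \mid y = g(x)\}) = 0$.
If $y = 0x$ and $\hat\mu(x) \le \frac{1}{2^{|x|}}$, then $\hat\mu(\{x \mid y = g(x)\}) \le \frac{1}{2^{|x|}} = \frac{1}{2^{|y|-1}}$.
If $y = 1h(x)$ and $\hat\mu(x) > \frac{1}{2^{|x|}}$, then $|h(x)| \le \log\frac{1}{\hat\mu(x)}$, so $\hat\mu(\{x \mid y = g(x)\}) = \hat\mu(x) \le \frac{1}{2^{|h(x)|}} = \frac{1}{2^{|y|-1}}$.
It follows that $\hat\mu(\{x \mid y = g(x)\}) \le \frac{1}{2^{|y|-1}}$ for all $y \in \{0,1\}^*$.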
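The construction can be tried out on a toy distribution. In this sketch (all names hypothetical), $\mu$ puts half of its mass on the single string `PEAK = "0110"` and spreads the other half according to the uniform distribution defined earlier; binary expansions are the terminating ones, computed with exact rationals. Every ordinary string $x$ is mapped to $0x$, the peak is replaced by the much shorter code $1h(x)$, and the three properties of the lemma can be checked on all strings up to a fixed length.

```python
import math
from fractions import Fraction
from itertools import product
from typing import List

PEAK = "0110"   # hypothetical string that carries half of the probability mass

def rank(x: str) -> int:
    """Position of x in the order "" < 0 < 1 < 00 < 01 < ..."""
    return 2 ** len(x) - 1 + (int(x, 2) if x else 0)

def unrank(i: int) -> str:
    n = (i + 1).bit_length() - 1
    return format(i - (2 ** n - 1), "b").zfill(n) if n > 0 else ""

def uniform_cdf(x: str) -> Fraction:
    """Distribution function of 1/(n(n+1)) * 2^-n on nonempty strings of length n."""
    n = len(x)
    if n == 0:
        return Fraction(0)
    return 1 - Fraction(1, n) + Fraction(int(x, 2) + 1, n * (n + 1) * 2 ** n)

def mu(x: str) -> Fraction:
    """Toy P-computable distribution: (uniform + point mass at PEAK) / 2."""
    return (uniform_cdf(x) + (1 if rank(x) >= rank(PEAK) else 0)) / 2

def mu_prev(x: str) -> Fraction:
    """mu(x - 1), taken to be 0 for the empty string."""
    return mu(unrank(rank(x) - 1)) if x else Fraction(0)

def h(x: str) -> str:
    """Longest common prefix of the terminating binary expansions of mu(x - 1) < mu(x)."""
    a, b = mu_prev(x), mu(x)
    bits, i = [], 1
    while math.floor(a * 2 ** i) == math.floor(b * 2 ** i):
        bits.append(str(math.floor(a * 2 ** i) % 2))   # i-th bit after the binary point
        i += 1
    return "".join(bits)

def g(x: str) -> str:
    """Peak elimination: keep ordinary strings (tagged 0), replace peaks by 1h(x)."""
    if mu(x) - mu_prev(x) <= Fraction(1, 2 ** len(x)):
        return "0" + x
    return "1" + h(x)

def all_strings(max_len: int) -> List[str]:
    return [""] + ["".join(b) for n in range(1, max_len + 1) for b in product("01", repeat=n)]

print(g(PEAK), g("0101"))   # 1 00101
```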

Theorem. $\langle U, \mu_u\rangle$ is DistNP-complete.
Proof. Suppose $\langle L, \mu\rangle \in \mathrm{DistNP}$.
1. Let $N$ be a P-time NDTM that accepts $L$. Define $N_\alpha$ by: on input $y$, guess $x$ such that $y = g(x)$, then execute $N(x)$. Let $p$ be the polynomial running time of $N_\alpha$.
2. Reduction: $f(x) = \langle \alpha, y, 1^k\rangle$, where $|x| = n$, $y = g(x)$ and $k = p(n) + \log n + n - |\alpha| - |y|$.
3. The Correctness and Length Regularity conditions are satisfied. By the lemma, $\hat\mu(\{x \mid \langle \alpha, y, 1^k\rangle = f(x)\}) \le \frac{1}{2^{|y|-1}}$. Let $m = |\langle \alpha, y, 1^k\rangle|$. The probability that $\langle \alpha, y, 1^k\rangle$ occurs is at least
$$2^{-\log m} \cdot \frac{1}{m} \cdot \frac{1}{2^{|y|}} = \frac{1}{m^2} \cdot \frac{1}{2^{|y|}}.$$
So the Domination condition is met.

It is remarkable that an NP-complete problem coupled with a simple distribution contains the projected image of everything in DistNP. Levin's definition of P-computable distribution is crucial to the transformation that maps an instance with higher-than-average probability to a shorter instance whose probability is fair.

SampNP-Completeness

We say that $\langle L, \mu\rangle$ is SampNP-complete if $\langle L, \mu\rangle \in \mathrm{SampNP}$ and $\langle L', \mu'\rangle \le_A \langle L, \mu\rangle$ for all $\langle L', \mu'\rangle \in \mathrm{SampNP}$.

Theorem (Impagliazzo and Levin, 1990). If $\langle L, \mu\rangle$ is DistNP-complete, then it is also SampNP-complete.
Proof. See the paper by Impagliazzo and Levin: No Better Ways to Generate Hard NP Instances than Picking Uniformly at Random, FOCS 1990.

Levin got it right after all. By restricting to P-computable distributions we may overlook some easy problems, but we never turn easy problems into hard ones.

Average Case Complexity vs. Worst Case Complexity
Investigations have shown that it is unlikely that the existence of an efficient average-case algorithm implies the existence of an efficient worst-case algorithm.

Application
In cryptography one seeks NP problems that are hard on average. This is a strong motivation for studying average-case complexity.
Open Problem. Is factorization (or discrete log) DistNP-hard?

Open Problems
1. DistNP SampNP NP AvgNP.
2. Natural DistNP-complete problems.