Average Case Complexity under the Universal Distribution Equals Worst Case Complexity

Ming Li
University of Waterloo, Department of Computer Science
Waterloo, Ontario N2L 3G1, Canada

Paul M.B. Vitányi
Centrum voor Wiskunde en Informatica, Kruislaan 413, 1098 SJ Amsterdam, The Netherlands
and Universiteit van Amsterdam, Faculteit Wiskunde en Informatica

Abstract

The average complexity of any algorithm whatsoever (provided it always terminates) under the universal distribution is of the same order of magnitude as the worst-case complexity. This holds both for time complexity and for space complexity. To focus our discussion, we use as illustrations the particular case of sorting algorithms, and the general case of the average-case complexity of NP-complete problems.

[Footnote 1: The work of the first author was supported in part by NSERC Operating Grant OG0036747. Part of the work was performed while he was with the Department of Computer Science, York University, North York, Ontario, Canada. The work of the second author was supported in part by NSERC International Scientific Exchange Award ISE0046203. A preliminary version of this work was presented at the 30th Annual IEEE Symposium on Foundations of Computer Science, 1989, pp. 34-39.]

1 Introduction

For many algorithms the average-case running time under some distributions on the inputs is less than the worst-case running time. For instance, using Quicksort on a list of n items to be sorted gives, under the uniform distribution on the inputs, an average running time of O(n log n), while the worst-case running time is Ω(n²). The worst-case running time of Quicksort is typically reached when the list is already sorted or almost sorted, that is, exactly in cases where we actually should not have to do much work at all. Since in practice the lists to be sorted occurring in computer computations are very likely to be sorted or almost sorted, programmers implementing systems involving sorting algorithms tend to resort to fast sorting algorithms whose provable average running time is of the same order of magnitude as the worst-case running time, even though this average running time can only be proved to be O(n log² n) under the uniform distribution, as in the case of Shellsort, or to hold for some randomized version of Quicksort.
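To make the gap concrete, here is a small experiment (an illustrative Python sketch, not part of the original paper) counting the comparisons made by Quicksort with a first-element pivot on a random input versus an already sorted one; the naive pivot rule is chosen deliberately, since it exhibits the Ω(n²) behavior on sorted lists.

    import random
    import sys

    sys.setrecursionlimit(10000)  # the sorted case recurses to depth n

    def quicksort_comparisons(a):
        # Quicksort with first-element pivot; returns the number of
        # comparisons. On a sorted list every partition is maximally
        # unbalanced, giving (n-1) + (n-2) + ... + 1 = n(n-1)/2 comparisons.
        if len(a) <= 1:
            return 0
        pivot, rest = a[0], a[1:]
        smaller = [x for x in rest if x < pivot]
        larger = [x for x in rest if x >= pivot]
        return len(rest) + quicksort_comparisons(smaller) + quicksort_comparisons(larger)

    n = 1000
    print(quicksort_comparisons(random.sample(range(n), n)))  # on the order of n log n
    print(quicksort_comparisons(list(range(n))))              # exactly n(n-1)/2 = 499500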

In the case of NP-complete problems the question arises whether there are algorithms that solve them in polynomial time "on the average". Whether this phenomenon occurs must depend on the combination of the particular NP-complete problem to be solved and the distribution of the instances. Obviously, some combinations are easy on the average and some combinations are hard on the average, since one can tailor the distribution to the ease or hardness of the individual instances of the problem. This raises the question of a meaningful definition of a problem that is "hard on the average". L.A. Levin [Le] has shown that for the Tiling problem with uniform distribution of instances there is no polynomial-on-the-average algorithm, unless there exists such an algorithm for each combination of an NP-complete problem and a polynomial-time computable probability distribution. Here it is shown that under the universal distribution all NP-complete problems are hard to compute on the average unless P = NP.

2 The Universal Distribution

Let N, Q, and R denote the set of nonnegative integers, nonnegative rational numbers, and nonnegative real numbers, respectively. A superscript '+' excludes zero. We consider countably infinite sample spaces, say S = N ∪ {u}, where u is an 'undefined' element not in N. A function P from S into R such that ∑_{x∈S} P(x) ≤ 1 defines a probability distribution on S. (This allows us to consider defective probability distributions on the natural numbers, which sum to less than one, by concentrating the surplus probability on u.) A probability distribution P is called enumerable if the set of points {(x, y) : x ∈ N, y ∈ Q, y ≤ P(x)} is recursively enumerable. That is, P(x) can be approximated from below by a Turing machine, for all x ∈ N. (P(u) can be approximated from above. A probability distribution P is recursive if P(x) can be approximated both from below and from above by a Turing machine, for all x.)

Levin has shown that we can effectively enumerate all enumerable probability distributions, P_1, P_2, .... In particular, there exists a universal enumerable probability distribution, denoted by, say, m, such that

    ∀k ∈ N ∃c ∈ Q⁺ ∀x ∈ N [c · m(x) ≥ P_k(x)].    (1)

That is, m dominates each P_k multiplicatively. It is convenient to define

    m(x) = 2^{-K(x)},    (2)

where K(x) is the prefix variant of Kolmogorov complexity [G1]. In equation (1), the constant c can be set to

    c = 2^{K(P_k)+O(1)} = 2^{K(k)+O(1)} = O(k log² k).    (3)

This means that we can take c to be exponential in the length of the shortest self-delimiting binary program to compute P_k. The universal distribution (rather, its continuous version) was originally discovered by R.J. Solomonoff in 1964, with the aim of predicting continuations of finite prefixes of infinite binary sequences.
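Since K(x) is uncomputable, m(x) cannot be computed either, only approximated from below. Purely as an illustration of the qualitative behavior, the following sketch replaces K(x) by the length of a zlib compression (an assumption of this sketch, not a construction from the paper) and shows that a regular string receives enormously more weight than a random one.

    import os
    import zlib

    def K_proxy_bits(x: bytes) -> int:
        # Crude, illustrative stand-in for the prefix complexity K(x):
        # the length in bits of a zlib compression of x.
        return 8 * len(zlib.compress(x, 9))

    regular = b"0" * 1000       # highly regular: a very short description suffices
    rand = os.urandom(1000)     # incompressible with overwhelming probability

    # m(x) = 2^{-K(x)}, so we compare exponents rather than the
    # (underflowing) floating-point weights themselves.
    print(K_proxy_bits(regular))   # on the order of a hundred bits
    print(K_proxy_bits(rand))      # close to 8000 bits
    # The proxy weight of the regular string exceeds that of the random
    # one by a factor of roughly 2 to the difference of the two numbers.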

We can view the discrete probability density m as the a priori probability [2] of finite objects in the absence of any knowledge about them [So]. Levin has shown that Solomonoff's definition and the two definitions (1) and (2) given above are equivalent up to a multiplicative constant. Thus, three very different formalizations turn out to define the same notion of universal probability. Such a circumstance is often taken as evidence that we are dealing with a fundamental concept. See [ZL] for the analogous notions in continuous sample spaces, and [G2], [LV1], or [LV2] for elaboration of the cited facts and proofs.

This universal distribution has many important properties. Under m, easily describable objects have high probability, and complex or random objects have low probability. Other things being equal, it embodies Occam's Razor, which says we should prefer simple explanations over complicated ones. To give an example, with x = 2^n we have K(x) ≤ log n + 2 log log n + O(1) and m(x) = Ω(1/(n log² n)). If we generate the binary representation of y by n tosses of a fair coin, apart from the leading '1', then for the overwhelming majority of outcomes we shall have K(y) ≥ n and m(y) = O(2^{-n}).

By Markov's inequality, for any two probability distributions P and Q, and for all k, we have Q(x) ≤ k · P(x) with P-probability at least 1 − 1/k. By equations (1) and (3), therefore, for each enumerable probability distribution P we have

    ∑ { P(x) : 2^{K(P)+O(1)} m(x) ≥ P(x) ≥ m(x)/k } ≥ 1 − 1/k,    (4)

for all k > 0. In this sense, with high P-probability, P(x) is close to m(x), for each enumerable P. The distribution m is the only enumerable one which has that property. If the problem instances are generated algorithmically, then the distribution is enumerable. In the absence of any a priori knowledge of the actual distribution, therefore, apart from the knowledge that it is enumerable, studying the average behavior under m is considerably more meaningful than studying the average behavior under any other particular enumerable distribution.

3 Average Case Complexity

Let x ∈ N, and let l(x) denote the length of the binary representation of x. Let t(x) be the running time of algorithm A on problem instance x. Define the worst-case time complexity of A as T(n) = max{t(x) : l(x) = n}. Define the average time complexity of A with respect to a probability distribution P on the sample space S by

    T^P_average(n) = ∑_{l(x)=n} P(x) t(x) / ∑_{l(x)=n} P(x).

Example (Quicksort). Let us compare the average time complexity of Quicksort under the uniform distribution L(x) and under the universal distribution m(x). Define L(x) = 2^{-2l(x)}, such that the conditional probability satisfies L(x | l(x) = n) = 2^{-n}. We encode the list of elements to be sorted as nonnegative integers in some standard way.

[Footnote 2: Consider an enumeration T_1, T_2, ... of Turing machines with a separate binary one-way input tape. Let T be such a machine. If T halts with output x, then T has scanned a finite initial segment of the input, say p, and we define T(p) = x. The set of such p for which T halts is a prefix code: no such input is a proper prefix of another one. Assume the input is provided by tosses of a fair coin. The probability that T halts with output x is Q_T(x) = ∑_{T(p)=x} 2^{-l(p)}, where l(p) denotes the length of p. Then ∑_{x∈N} Q_T(x) ≤ 1, the deficit from one being the probability that T doesn't halt. Concentrate this surplus probability on Q_T(u), such that ∑_{x∈S} Q_T(x) = 1. It can be shown that Q is an enumerable probability distribution iff Q = Q_T for some T. In particular, Q_U(x) = Θ(m(x)) for a universal machine U. From this, properties (1), (2), and (3) can be derived.]
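As a minimal illustration of the construction in this footnote, the following sketch (with a made-up three-program machine; the table is hypothetical) computes Q_T(x) for a machine T given by a finite table whose domain is a prefix code.

    from fractions import Fraction

    # A toy machine T, given by a finite table from programs (the bits
    # scanned off the one-way input tape) to outputs. The domain
    # {"0", "10", "110"} is a prefix code: no program is a proper
    # prefix of another, so T knows where each program ends.
    T = {"0": 5, "10": 7, "110": 5}

    def Q(x):
        # Q_T(x) = sum of 2^{-l(p)} over all programs p with T(p) = x:
        # the probability that T halts with output x when its input
        # tape is filled by fair coin flips.
        return sum(Fraction(1, 2 ** len(p)) for p, out in T.items() if out == x)

    print(Q(5))             # 1/2 + 1/8 = 5/8
    print(Q(7))             # 1/4
    print(1 - Q(5) - Q(7))  # 1/8, the surplus probability concentrated on u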

For Quicksort, T^L_average(n) = Θ(n log n). We might expect that T^m_average(n) = Θ(n log n) as well. But the Theorem below will tell us much more, namely, T^m_average(n) = Θ(n²)!

Let us give some intuition for why this is the case. Because the average time complexity under the uniform distribution is low, there can be only o((log n) 2^n / n) strings x of length n with t(x) = Ω(n²). Therefore, given n, each such string can be described by its sequence number in this small set, and hence for each such x we find K(x | n) ≤ n − log n + 3 log log n. (Since n is known, we can recover x from a description of length n − k by coding k self-delimitingly in 2 log k bits; the stated inequality follows by setting k = log n − log log n.) Therefore, no really random x, with K(x | n) ≥ n, can achieve the worst-case running time Ω(n²). Only strings x which are non-random, with K(x | n) < n, among which are the sorted or almost sorted lists, and lists exhibiting other regularities, can have Ω(n²) running time. Such lists x have relatively low Kolmogorov complexity K(x), since they are regular (can be described shortly), and therefore m(x) = 2^{-K(x)} is very high. Consequently, the contribution of these strings to the average running time is weighted very heavily.

This intuition can be made precise in a much more general form. We assume that all inputs to an algorithm are coded as integers according to some standard encoding.

Theorem. Let A be any algorithm, provided it terminates for all inputs in N. Let the inputs to A be distributed according to m. Then the average-case time complexity of A is of the same order of magnitude as the corresponding worst-case time complexity.

Proof. We define a probability distribution P(x) on the inputs that assigns high probability to the inputs for which the worst-case complexity is reached, and zero probability to all other inputs. Let A be the algorithm involved, and let T(n) be its worst-case time complexity. Clearly, T(n) is recursive (for instance, by running A on all x of length n). Define the probability distribution P(x) as follows:

1. For each n = 1, 2, ..., define a_n := ∑_{l(x)=n} m(x);
2. If l(x) = n and x is lexicographically least with t(x) = T(n), then P(x) := a_n, else P(x) := 0.

It is easy to see that a_n is enumerable since m(x) is enumerable. Therefore, P(x) is enumerable. Setting P(u) := m(u), we have defined P(x) such that ∑_{x∈S} P(x) = ∑_{x∈S} m(x), and P(x) is an enumerable probability distribution. Using c · m(x) ≥ P(x), which holds by (1), the average-case time complexity T^m_average(n) with respect to the distribution m(x) on the inputs is bounded as follows:

    T^m_average(n) = ∑_{l(x)=n} m(x) t(x) / ∑_{l(x)=n} m(x)
                   ≥ (1/c) ∑_{l(x)=n} P(x) t(x) / ∑_{l(x)=n} m(x)
                   = (1/c) T(n) ∑_{l(x)=n} P(x) / ∑_{l(x)=n} m(x)
                   = T(n) / c,

where the last step uses ∑_{l(x)=n} P(x) = a_n = ∑_{l(x)=n} m(x). The proof of the theorem is finished by the observation that T^m_average(n) ≤ T(n)
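The construction in this proof can be rendered numerically on a toy input space. In the sketch below (illustrative only: zlib-compressed length again stands in for the uncomputable K(x), and the input space is the set of permutations of {0,...,6} rather than all strings of a given length), P places all the m-mass of the class on the lexicographically least worst-case input, the average under P equals the worst case exactly, and the domination constant c then yields the bound T^m_average ≥ T/c.

    import itertools
    import zlib
    from fractions import Fraction

    def comparisons(a):
        # First-element-pivot Quicksort comparison count (as in the
        # sketch in the introduction); worst case on sorted input.
        if len(a) <= 1:
            return 0
        pivot, rest = a[0], a[1:]
        return (len(rest)
                + comparisons([x for x in rest if x < pivot])
                + comparisons([x for x in rest if x >= pivot]))

    def m_weight(x):
        # Illustrative stand-in for m: weight 2^{-(bits of zlib-compressed x)}.
        return Fraction(1, 2 ** (8 * len(zlib.compress(bytes(x)))))

    inputs = list(itertools.permutations(range(7)))   # the 'length-n class'
    t = {x: comparisons(list(x)) for x in inputs}
    T_worst = max(t.values())

    # The proof's distribution P: all the class's m-mass on the
    # lexicographically least worst-case input, zero elsewhere.
    a_n = sum(m_weight(x) for x in inputs)
    x_bad = min(x for x in inputs if t[x] == T_worst)
    P = {x: (a_n if x == x_bad else Fraction(0)) for x in inputs}

    avg_P = sum(P[x] * t[x] for x in inputs) / sum(P.values())
    c = max(P[x] / m_weight(x) for x in inputs)   # c * m(x) >= P(x) for all x
    avg_m = sum(m_weight(x) * t[x] for x in inputs) / a_n

    print(avg_P == T_worst)          # True: average under P is the worst case
    print(avg_m >= T_worst / c)      # True: the theorem's lower bound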

holds vacuously.

If P in the proof is P_k in the standard effective enumeration P_1, P_2, ... of enumerable semimeasures, then we can set c = O(k log² k) by equation (3). Namely, considering the binary representation of a positive integer k, we can give a prefix code c(k) for k with l(c(k)) = log k + 2 log log k (one concrete such code is sketched just before the references). Since there is a Turing machine that halts with output k iff its input is c(k), the length K(k) of the shortest prefix-free program for k does not exceed l(c(k)). This gives an interpretation to the constant of proportionality between the m-average complexity and the worst-case complexity: if the algorithm approximating P(x) from below is the k-th algorithm in the standard effective enumeration of all algorithms, then

    T^m_average(n) ≥ T(n) / (k log² k).

Hence we must code the algorithm that computes P as compactly as possible to get the most significant lower bound. That is, the ease with which we can describe (algorithmically) the strings that produce a worst-case running time determines how close the average time complexity is to the worst-case time complexity.

It would seem that this result has implications for algorithm design. For large n, average-case analysis under the uniform distribution is misleading, because real inputs tend to be distributed according to the universal distribution, not according to the uniform distribution. But the constant of proportionality in the high-order term is something like 2^{-K(P)}. Consider Quicksort again. It runs in n log n time under the uniform distribution, but in n² time in the worst case. So its real average time complexity might be something like n log n + n² 2^{-K(P)}. As long as the input size n satisfies n log n ≥ n² 2^{-K(P)}, as when K(P) ≥ log n, experimental testing of the average running time of Quicksort will show a considerable improvement over the n² worst-case behavior, corresponding to the analysis for the uniform distribution. Here K(P) is the size of the shortest program generating the pseudo-uniform distribution over the sample. Frequently, people use pseudorandom permutations in order to kill off the worst-case behavior, or they choose the 'pivot' in the algorithm randomly; this results in randomized Quicksort. Again, the Kolmogorov complexity of the random number generator must be at least log n in order to drive the high-order term down to n. Thus, random number generators should be selected with the input size of the final algorithm in mind. An interesting question is whether any random number generator of Kolmogorov complexity at least log n is sufficient, or whether all of them are.

We finish with some immediate corollaries.

Corollary. The analogue of the Theorem holds for other complexity measures (like space complexity), by about the same proof.

Corollary. The m-average time complexity of Quicksort is Θ(n²).

Corollary. For each NP-complete problem, if the problem instances are distributed according to m, then the average running time of any algorithm that solves it is superpolynomial unless P = NP. (A result related to this corollary is suggested in [BCGL], apparently using different arguments.)

Following the work reported here, related questions with respect to more feasible classes of probability distributions (such as polynomial-time computable ones) have been studied in [Mi].

Acknowledgements. Richard Beigel, Benny Chor, Gloria Kissin, John Tromp, and Vladimir Uspenskii commented on the manuscript. Mike O'Donnell raised the question we address here during a lecture given by the second author.
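The prefix code c(k) referred to above is not spelled out in the text; one standard construction achieving l(c(k)) ≈ log k + 2 log log k (the bit-doubling scheme below is an illustrative choice, not the paper's) writes the length of bin(k) with every bit doubled, then a '01' stopper, then bin(k) itself.

    def c(k: int) -> str:
        # Self-delimiting code for a positive integer k: the length of
        # bin(k), written with each bit doubled, then the stopper '01',
        # then bin(k) itself. Total length: log k + 2 log log k + O(1) bits.
        b = format(k, "b")          # about log k bits
        l = format(len(b), "b")     # about log log k bits
        return "".join(bit * 2 for bit in l) + "01" + b

    def decode(s: str) -> int:
        # Read doubled bits until the '01' stopper, recover the length,
        # then read exactly that many payload bits. Because the code is
        # prefix-free, no lookahead past the codeword is ever needed.
        i, length_bits = 0, ""
        while s[i] == s[i + 1]:
            length_bits += s[i]
            i += 2
        i += 2                      # skip the '01' stopper
        n = int(length_bits, 2)
        return int(s[i:i + n], 2)

    k = 1990
    assert decode(c(k)) == k
    print(len(c(k)))                # 11 + 2*4 + 2 = 21 bits for k = 1990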

References

[BCGL] S. Ben-David, B. Chor, O. Goldreich, M. Luby, On the theory of average case complexity, Proc. 21st ACM Symposium on Theory of Computing, 1989, pp. 204-216.

[G1] P. Gács, On the symmetry of algorithmic information, Soviet Math. Dokl., 15(1974), pp. 1477-1481. (Correction, Ibid., 15(1974), p. 1481.)

[G2] P. Gács, Lecture notes on descriptional complexity and randomness, Manuscript, Boston University, Boston, Mass., October 1987 (unpublished).

[LV1] M. Li and P. Vitányi, Kolmogorov complexity and its applications, in Handbook of Theoretical Computer Science, Vol. A, J. van Leeuwen, Managing Editor, North-Holland, 1990. More up to date: M. Li and P. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, 2nd Edition, Springer-Verlag, New York, 1997.

[LV2] M. Li and P. Vitányi, Inductive reasoning and Kolmogorov complexity, Proc. 4th IEEE Structure in Complexity Theory Conference, 1989, pp. 165-185. Final version: J. Comput. System Sciences, 44:2(1992), pp. 343-384. See also: P.M.B. Vitányi and M. Li, Minimum Description Length induction, Bayesianism, and Kolmogorov complexity, IEEE Trans. Inform. Theory, IT-46:2(2000), pp. 446-464.

[Le] L.A. Levin, Average case complete problems, SIAM J. Comput., 15(1986), pp. 285-286.

[Mi] P.B. Miltersen, The complexity of malign ensembles, Tech. Rept. DAIMI PB-335, Aarhus University, September 1990.

[So] R.J. Solomonoff, A formal theory of inductive inference, Part 1 and Part 2, Information and Control, 7(1964), pp. 1-22 and 224-254.

[ZL] A.K. Zvonkin and L.A. Levin, The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms, Russian Math. Surveys, 25:6(1970), pp. 83-124.