Kolmogorov complexity and its applications

Similar documents
Kolmogorov complexity and its applications

CISC 876: Kolmogorov Complexity

Sophistication Revisited

2 Plain Kolmogorov Complexity

Lecture 13: Foundations of Math and Kolmogorov Complexity

Kolmogorov complexity

Computability Theory

Is there an Elegant Universal Theory of Prediction?

Notes for Lecture Notes 2

Introduction to Kolmogorov Complexity

Limitations of Efficient Reducibility to the Kolmogorov Random Strings

1 Computational Problems

Time, Space, and Energy in Reversible Computing. Paul Vitanyi, CWI, University of Amsterdam, National ICT Australia

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

CS154, Lecture 12: Kolmogorov Complexity: A Universal Theory of Data Compression

17.1 The Halting Problem

Gödel s Incompleteness Theorem. Overview. Computability and Logic

CSE 4111/5111/6111 Computability Jeff Edmonds Assignment 3: Diagonalization & Halting Problem Due: One week after shown in slides

Complexity 6: AIT. Outline. Dusko Pavlovic. Kolmogorov. Solomonoff. Chaitin: The number of wisdom RHUL Spring Complexity 6: AIT.

Lecture 11: Gödel s Second Incompleteness Theorem, and Tarski s Theorem

U.C. Berkeley CS278: Computational Complexity Professor Luca Trevisan August 30, Notes for Lecture 1

Complex Systems Methods 2. Conditional mutual information, entropy rate and algorithmic complexity

Compression Complexity

Algorithmic Probability

Lecture 11: Quantum Information III - Source Coding

Introduction to Turing Machines. Reading: Chapters 8 & 9

Counting dependent and independent strings

Large Numbers, Busy Beavers, Noncomputability and Incompleteness

The axiomatic power of Kolmogorov complexity

Complexity Theory VU , SS The Polynomial Hierarchy. Reinhard Pichler

Outline. Complexity Theory EXACT TSP. The Class DP. Definition. Problem EXACT TSP. Complexity of EXACT TSP. Proposition VU 181.

Chapter 2 Algorithms and Computation

CSCI 2200 Foundations of Computer Science Spring 2018 Quiz 3 (May 2, 2018) SOLUTIONS

CSE 105 THEORY OF COMPUTATION. Spring 2018 review class

Decidability: Church-Turing Thesis

Finish K-Complexity, Start Time Complexity

Decision Problems with TM s. Lecture 31: Halting Problem. Universe of discourse. Semi-decidable. Look at following sets: CSCI 81 Spring, 2012

TURING MAHINES

Lecture Notes: The Halting Problem; Reductions

Warm-Up Problem. Please fill out your Teaching Evaluation Survey! Please comment on the warm-up problems if you haven t filled in your survey yet.

Theory of Computation

Computability and Complexity

Theory of Computation

CPSC 421: Tutorial #1

Pseudorandom Generators

Computability Theory. CS215, Lecture 6,

Gödel s Incompleteness Theorem. Overview. Computability and Logic

CS154, Lecture 10: Rice s Theorem, Oracle Machines

Randomness. What next?

Decidable Languages - relationship with other classes.

Computational Models Lecture 8 1

Computation. Some history...

1 Showing Recognizability

CSCI3390-Lecture 6: An Undecidable Problem

CS21 Decidability and Tractability

Undecidability. Andreas Klappenecker. [based on slides by Prof. Welch]

Logic: The Big Picture

CSE 105 THEORY OF COMPUTATION

CS294: Pseudorandomness and Combinatorial Constructions September 13, Notes for Lecture 5

On approximate decidability of minimal programs 1

CS 125 Section #10 (Un)decidability and Probability November 1, 2016

Undecidability COMS Ashley Montanaro 4 April Department of Computer Science, University of Bristol Bristol, UK

Part I: Definitions and Properties

COS597D: Information Theory in Computer Science October 19, Lecture 10

Turing Machines Part III

Introduction to Languages and Computation

The Turing Machine. Computability. The Church-Turing Thesis (1936) Theory Hall of Fame. Theory Hall of Fame. Undecidability

Algorithmic Information Theory

Preface These notes were prepared on the occasion of giving a guest lecture in David Harel's class on Advanced Topics in Computability. David's reques

Predicate Logic - Undecidability

Symmetry of Information: A Closer Look

Turing Machines. Lecture 8

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

Random Reals à la Chaitin with or without prefix-freeness

MTAT Complexity Theory October 13th-14th, Lecture 6

Computable Functions

CS154, Lecture 17: conp, Oracles again, Space Complexity

Computer Sciences Department

A version of for which ZFC can not predict a single bit Robert M. Solovay May 16, Introduction In [2], Chaitin introd

Kolmogorov-Loveland Randomness and Stochasticity

1 From previous lectures

Computational Models Lecture 8 1

Cogito ergo sum non machina!

Introduction to Turing Machines

Informal Statement Calculus

CSCE 551: Chin-Tser Huang. University of South Carolina

Algorithmic Information Theory

König s Lemma and Kleene Tree

Kolmogorov Complexity and Diophantine Approximation

Universal probability distributions, two-part codes, and their optimal precision

CSC 5170: Theory of Computational Complexity Lecture 9 The Chinese University of Hong Kong 15 March 2010

to fast sorting algorithms of which the provable average run-time is of equal order of magnitude as the worst-case run-time, even though this average

The complexity of stochastic sequences

Undecidability. We are not so much concerned if you are slow as when you come to a halt. (Chinese Proverb)

Lecture 20: conp and Friends, Oracles in Complexity Theory

Computational Models Lecture 8 1

YALE UNIVERSITY DEPARTMENT OF COMPUTER SCIENCE

Lecture 6: Oracle TMs, Diagonalization Limits, Space Complexity

CS151 Complexity Theory. Lecture 1 April 3, 2017

CSE 105 Theory of Computation

Transcription:

Spring, 2009 Kolmogorov complexity and its applications Paul Vitanyi Computer Science University of Amsterdam http://www.cwi.nl/~paulv/course-kc

We live in an information society. Information science is our profession. But do you know what is information, mathematically, and how to use it to prove theorems? You will, by the end of the term.

Examples Average case analysis of Shellsort. Open since 1959. What is the distance between two pieces of information carrying entities? For example, distance from an internet query to an answer.

Lecture 1. History and Definition History Intuition and ideas in the past Inventors Basic mathematical theory The course. Textbook: Li-Vitanyi: An introduction to Kolmogorov complexity and its applications, preferably the third edition! Homework every week about last lecture, mid-term and final exam (or possibly individual project and presentation).

1. Intuition & history What is the information content of an individual string? 111. 1 (n 1 s) π = 3.1415926 n = 2 1024 Champernowne s number: 0.1234567891011121314 is normal in scale 10 (every block has same frequency) All these numbers share one commonality: there are small programs to generate them. Shannon s information theory does not help here.

1903: An interesting year This and the next two pages were stolen from Lance Fortnow

1903: An interesting year Kolmogorov Church von Neumann

Andrey Nikolaevich Kolmogorov (1903, Tambov, Russia 1987 Moscow) Measure Theory Probability Analysis Intuitionistic Logic Cohomology Dynamical Systems Hydrodynamics Kolmogorov complexity

When there were no digital cameras (1987).

A story of Dr. Samuel Johnson Dr. Beattie observed, as something remarkable which had happened to him, that he chanced to see both No.1 and No.1000 hackney-coaches. Why sir, said Johnson there is an equal chance for one s seeing those two numbers as any other two. Boswell s Life of Johnson

The case of cheating casino Bob proposes to flip a coin with Alice: Alice wins a dollar if Heads; Bob wins a dollar if Tails Result: TTTTTT. 100 Tails in a roll. Alice lost $100. She feels being cheated.

Alice goes to the court Alice complains: T 100 is not random. Bob asks Alice to produce a random coin flip sequence. Alice flipped her coin and got THTTHHTHTHHHTTTTH But Bob claims Alice s sequence has probability 2-100, and so does his. How do we define randomness?

2. Roots of Kolmogorov complexity and preliminaries (1) Foundations of Probability P. Laplace: a sequence is extraordinary (nonrandom) because it contains regularity (which is rare). 1919. von Mises notion of a random sequence S: lim n { #(1) in n-prefix of S}/n =p, 0<p<1 The above holds for any subsequence of S selected by an admissible selection rule. If `admissible rule is any partial function then there are no random sequences. A. Wald: countably many admissible selection rules. Then there are random sequences. A. Church: recursive selection functions J. Ville: von Mises-Wald-Church random sequence does not satisfy all laws of randomness.

Roots (2) Information Theory. Shannon theory is on an ensemble. But what is information in an individual object? (3) Inductive inference. Bayesian approach using universal prior distribution (4) Shannon s State x Symbol (Turing machine) complexity.

Preliminaries and Notations Strings: x, y, z. Usually binary. x=x 1 x 2... an infinite binary sequence x i:j =x i x i+1 x j x is number of bits in x. Textbook uses l(x). Sets, A, B, C A, number of elements in set A. Textbook uses d(a). K-complexity vs C-complexity, names etc. I assume you know Turing machines, universal TM s, basic facts...

3. Mathematical Theory Solomonoff (1960)-Kolmogorov (1965)-Chaitin (1969): The amount of information in a string is the size of the smallest program of an optimal Universal TM U generating that string. C (x) = min { p : U(p) = x } U p Invariance Theorem: It does not matter which optimal universal Turing machine U we choose. I.e. all universal encoding methods are ok.

Proof of the Invariance theorem Fix an effective enumeration of all Turing machines (TM s): T 1, T 2, Define C = min { p : T(p) = x} T p U is an optimal universal TM such that (p produces x) U(1 n 0p) = T n (p) Then for all x: C U (x) C Tn (x) + n+1, and C U (x) C U (x) c. Fixing U, we write C(x) instead of C U (x). QED Formal statement of the Invariance Theorem: There exists a computable function S 0 such that for all computable functions S, there is a constant c S such that for all strings x ε {0,1} * C S0 (x) C S (x) + c S

It has many applications Mathematics --- probability theory, logic, statistics. Physics --- chaos, thermodynamics. Computer Science average case analysis, inductive inference and learning, shared information between documents, data mining and clustering, incompressibility method -- examples: Prime number theorem Goedel s incompleteness Shellsort average case Heapsort average case Circuit complexity Lower bounds on combinatorics, graphs,turing machine computations, formal languages, communication complexity, routing Philosophy, biology, cognition, etc randomness, inference, learning, complex systems, sequence similarity Information theory information in individual objects, information distance Classifying objects: documents, genomes Query Answering systems

Mathematical Theory cont. Intuitively: C(x)= length of shortest description of x Define conditional Kolmogorov complexity similarly, with C(x y)=length of shortest description of x given y. Examples C(xx) = C(x) + O(1) C(xy) C(x) + C(y) + O(log(min{C(x),C(y)}) C(1 n ) O(logn) C(π 1:n ) O(logn); C(π 1:n n) O(1) For all x, C(x) x +O(1) C(x x) = O(1) C(x ε) = C(x); C(ε x)=o(1)

3.1 Basics Incompressibility: For constant c>0, a string x ε {0,1} * is c-incompressible if C(x) x -c. For constant c, we often simply say that x is incompressible. (We will call incompressible strings random strings.) Lemma. There are at least 2 n 2 n-c +1 c-incompressible strings of length n. Proof. There are only k=0,,n-c-1 2 k = 2 n-c -1 programs with length less than n-c. Hence only that many strings (out of total 2 n strings of length n) can have shorter programs (descriptions) than n-c. QED.

Facts If x=uvw is incompressible, then C(v) v - O(log x ). Proof. C(uvw) = uvw uw +C(v)+ O(log u ) +O(log C(v)). If p is the shortest program for x, then C(p) p - O(1) C(x p) = O(1) but C(p x) C( p )+O(1) (optimal because of the Halting Problem!) If a subset A of {0,1}* is recursively enumerable (r.e.) (the elements of A can be listed by a Turing machine), and A is sparse ( A =n p(n) for some polynomial p), then for all x in A, x =n, C(x) O(log p(n) ) + O(C(n)) +O( A )= O(log n).

3.2 Asymptotics Enumeration of binary strings: 0,1,00,01,10, mapping to natural numbers 0, 1, 2, 3, C(x) as x Define m(x) to be the monotonic lower bound of C(x) curve (as natural number x ). Then m(x), as x, and m(x) < Q(x) for all unbounded computable Q. Nonmonotonicity: for x=yz, it does not imply that C(y) C(x)+O(1).

Graph of C(x) for integer x. Function m(x) is greatest monotonic non-decreasing lower bound.

Length-conditional complexity. Let x=x_1...x_n. Self-delimiting codes are x =1^n 0x with x =2n+1, and x = 1^{ n }0nx with x = n+2 n +1 ( n =log n). n-strings are x s of the form x=n 0...0 with n= x. Note that n =2 log n +1. So, for every n, C(x n)=o(1) for all n-strings.

Graph of C(x l(x)). Function m(x) is greatest monotonic non-decreasing lower bound.

3.3 Properties Theorem (Kolmogorov) (i) C(x) is not partially recursive. That is, there is no Turing machine M s.t. M accepts (x,k) if C(x) k and undefined otherwise. (ii) However, there is H(t,x) such that H(t+1,x) H(t,x) and lim t H(t,x)=C(x) where H(t,x) is total recursive. Proof. (i) If such M exists, then design M as follows. M simulates M on input (x,n), for all x =n in parallel (one step each), and outputs the first x such that M says `yes. Choose n >> M. Thus we have a contradiction: C(x) n by M, but M outputs x hence x =n >> M C(x) n. (ii) TM with program for x running for t steps defines H(t,x). QED

3.4 Godel s Theorem Theorem. The statement x is random (=incompressible) is undecidable for all but finitely many x. Proof (J. Barzdins, G. Chaitin). Let F be an axiomatic theory (sound, consistent, containing PA). C(F)= C. If the theorem is false and statement x is random is provable in F, then we can enumerate all proofs in F to find a proof of x is random. Consider x s with (1) x >> C+O(log x ), and output (first) random such x. Then (2) C(x) < C +O(log x ) But the proof for x is random implies that (3) C(x) x. Now (1)+(2)+(3) yields a contradiction, C+O(log x ) >> C+O(log x ). QED

3.5 Barzdin s Lemma A characteristic sequence of set A is an infinite binary sequence χ=χ 1 χ 2, χ i =1 iff iεa. Theorem. (i) The characteristic sequence χ of an r.e. set A satisfies C(χ 1:n n) log n+c A for all n. (ii) There is an r.e. set such that C(χ 1:n ) log n for all n. Proof. (i) Use the number m of 1 s in the prefix χ 1:n as termination condition [C(m) log n+o(1)]. (ii) By diagonalization. Let U be the universal TM. Define χ=χ 1 χ 2, by χ i =1 if the i-th bit output by U(i)< equals 0, otherwise χ i =0. χ defines an r.e. set. Suppose, for some n, we have C(χ 1:n )<log n. Then, there is a program p such that for all i n we have U(p,i)= χ i and p < log n, hence p<n. But U(p,p) not equal χ p by definition. QED