Lecture 13: Foundations of Math and Kolmogorov Complexity

6.045 Lecture 13: Foundations of Math and Kolmogorov Complexity 1

Self-Reference and the Recursion Theorem 2

Lemma: There is a computable function q : Σ* Σ* such that for every string w, q(w) is the description of a TM P w that on every input, prints out w and then accepts Proof Define a TM Q: s w Q P w w 3

Theorem: There is a Self-Printing TM Proof: First define a TM B which does this: M B w P M M M Now consider the TM that looks like this: w B w P B B P B B B No explicit self-reference here! QED 4

The Recursion Theorem Theorem: For every TM T computing a function t : Σ* Σ* Σ* there is a Turing machine R computing a function r : Σ* Σ*, such that for every string w, r(w) = t(r, w) (a,b) T t(a,b) w R t(r,w) 5

Proof: (a,b) T t(a,b) Define M = N B N w B Y w T w P N N w N Define R: What is S? w P M M B S w T t(s,w) 6

Proof: (a,b) Define M = T M t(a,b) B What is M(M,w)? N w B Y w T w P M M w M Define R: S M S w P B T t(s,w) M w 7

Proof: (a,b) T t(a,b) M B w M PM w B Y w T Define R: S S = Y = R. QED M S w P B T t(s,w) M w 8

FOO x (y) := Output x and halt. BAR(M) := Output N(w) = Run FOO M outputting M. Run M on (M, w) Q(N, w) := Run BAR(N) outputting S. Run T on (S, w) R(w) := Run FOO Q outputting Q. Run BAR(Q) outputting S. Run T on (S, x) Claim: S is a description of R itself! S(w) = Run FOO Q outputting Q. Run Q on (Q, w)

FOO x (y) := Output x and halt. BAR(M) := Output N(w) = Run FOO M outputting M. Run M on (M, w) Q(N, w) := Run BAR(N) outputting S. Run T on (S, w) R(w) := Run FOO Q outputting Q. Run BAR(Q) outputting S. Run T on (S, w) Claim: S is a description of R itself! S(w) = Run FOO Q outputting Q. Run BAR(Q) outputting S. Run T on (S, w) Therefore R(w) = T(R, w)

For every computable t, there is a computable r such that r(w) = t(r,w) where R is a description of r Suppose we can design a TM T of the form: On input (x,w), do bla bla with x, do bla bla bla with w, etc. etc. We can then find a TM R with the behavior: On input w, do bla bla with R, do bla bla bla with w, etc. etc. We can use the operation: Obtain your own description in Turing machine pseudocode! 11

Theorem: A TM is undecidable Proof (using the recursion theorem) Assume H decides A TM Construct machine B such that on input w: 1. Obtains its own description B 2. Runs H on (B, w) and flips the output Running B on input w always does the opposite of what H says it should! A formalization of free will paradoxes! No single machine can predict behavior of all others 12

Turing Machine Minimization MIN = {M M is a minimal-state Turing machine} Theorem: MIN is unrecognizable! Proof: Suppose we could recognize MIN with TM M M(x) := Obtain description of M. For k = 1,2,, Run M on the TMs M 1, M k for k steps, until M accepts some M i with Q(M i ) > Q(M). Output M i (x). We have: 1. L M = L M i [by construction] 2. M has fewer states than M i 3. M i is minimal [by definition of MIN] CONTRADICTION! 13

Computability and the Foundations of Mathematics 17

Formal Systems of Mathematics A formal system describes a formal language for - writing (finite) mathematical statements, - has a definition of a proof of a statement Example: Every TM M defines some formal system F - {Mathematical statements in F } = * String w represents the statement M halts on w - A proof that M halts on w can be defined to be the computation history of M on w: the sequence of configurations C 0 C 1 C t that M goes through while computing on w Could sometimes prove M doesn t halt on w 18

Interesting Systems of Mathematics Define a formal system F to be interesting if: 1. Any mathematical statement about computation can be (computably) described as a statement of F. Given (M, w), there is a (computable) S M,w of F such that S M,w is true in F if and only if M accepts w. 2. Proofs are convincing a TM can check that a candidate proof of a theorem is correct This set is decidable: {(S, P) P is a proof of S in F } 3. If S is in F and there is a proof of S describable as a computation, then there s a proof of S in F. If M halts on w, then there s either a proof P of S M,w or a proof P of S M,w 19

Consistency and Completeness A formal system F is inconsistent if there is a statement S in F such that both S and S are provable in F F is consistent if it is NOT inconsistent A formal system F is incomplete if there is a statement S in F such that neither S nor S are provable in F F is complete if it is NOT incomplete 20

Limitations on Mathematics! For every consistent and interesting F, Theorem 1. (Gödel 1931) F must be incomplete! There are mathematical statements that are true but cannot be proved. Theorem 2. (Gödel 1931) The consistency of F cannot be proved in F. Theorem 3. (Church-Turing 1936) The problem of checking whether a given statement in F has a proof is undecidable. 21

Unprovable Truths in Mathematics (Gödel) Every consistent interesting F is incomplete: there are statements that cannot be proved or disproved. Let S M, w in F be true if and only if M accepts w Proof: Define TM G(w): 1. Obtain own description G [Recursion Theorem!] 2. For all strings P in lexicographical order, If (P is a proof of S G, w in F ) then reject If (P is a proof of S G, w in F ) then accept Note: If F is complete then G cannot run forever! 1. If (G accepts w) then have proof P of G doesn t accept w 2. If (G rejects w) then found proof P of G accepts w In either case, F is inconsistent! Proof of S G, w and S G, w 22

(Gödel 1931) The consistency of F cannot be proved within any interesting consistent F Proof: Assume we can prove F is consistent in F We constructed :S G, w = G does not accept w which is true, but has no proof in F G does not accept w There is no proof of :S G, w in F But if there s a proof of F is consistent in F, then there is a proof of :S G, w in F (here s the proof): If S G, w is true, then there is a proof in F of S G, w and a proof in in F of :S G, w But since F is consistent, this cannot be true. Therefore, :S G, w is true This contradicts the previous theorem! 23

Undecidability in Mathematics PROVABLE F = {S there s a proof in F of S, or there s a proof in F of :S} (Church-Turing 1936) For every interesting consistent F, PROVABLE F is undecidable Proof: Suppose PROVABLE F is decidable with TM P. Then we can decide A TM with the following procedure: On input (M, w), run the TM P on input S M,w If P accepts, examine all proofs in lex order If a proof of S M,w is found then accept If a proof of :S M,w is found then reject If P rejects, then reject. Why does this work? 24

Kolmogorov Complexity: A Universal Theory of Data Compression 25

The Church-Turing Thesis: Everyone s Intuitive Notion of Algorithms = Turing Machines This is not a theorem it is a falsifiable scientific hypothesis. A Universal Theory of Computation 26

A Universal Theory of Information? Can we quantify how much information is contained in a string? A = 01010101010101010101010101010101 B = 110010011101110101101001011001011 Idea: The more we can compress a string, the less information it contains. 27

Information as Description Thesis: The amount of information in a string x is the length of the shortest description of x Let x 2 {0,1}* How should we describe strings? Use Turing machines with inputs! Def: A description of x is a string <M,w> such that M on input w halts with only x on its tape. Def: The shortest description of x, denoted as d(x), is the lexicographically shortest string <M,w> such that M(w) halts with only x on its tape. 28

A Specific Pairing Function Theorem. There is a 1-1 computable function <,> : Σ* x Σ* Σ* and computable functions 1 and 2 : Σ* Σ* such that: z = <M,w> iff 1 (z) = M and 2 (z) = w Define: <M,w> := 0 M 1 M w (Example: <10110,101> = 00000110110101) Note that <M,w> = 2 M + w + 1 29

Kolmogorov Complexity (1960 s) Definition: The shortest description of x, denoted as d(x), is the lexicographically shortest string <M,w> such that M(w) halts with only x on its tape. Definition: The Kolmogorov complexity of x, denoted as K(x), is d(x). EXAMPLES?? Let s first determine some properties of K. Examples will fall out of this. 30

A Simple Upper Bound Theorem: There is a fixed c so that for all x in {0,1}* K(x) x + c The amount of information in x isn t much more than x Proof: Define a TM M = On input w, halt. On any string x, M(x) halts with x on its tape. Observe that <M,x> is a description of x. Let c = 2 M +1 Then K(x) <M,x> 2 M + x + 1 x + c 31

Repetitive Strings have Low K-Complexity Theorem: There is a fixed c so that for all n 2, and all x ϵ {0,1}*, K(x n ) K(x) + c log n The information in x n isn t much more than that in x Proof: Define the TM N = On input <n,<m,w>>, Let x = M(w). Print x for n times. Let <M,w> be the shortest description of x. Then K(x n ) K(<N,<n,<M,w>>>) 2 N + d log n + K(x) c log n + K(x) for some constants c and d 32

Repetitive Strings have Low K-Complexity Theorem: There is a fixed c so that for all n 2, and all x ϵ {0,1}*, K(x n ) K(x) + c log n The information in x n isn t much more than that in x Recall: A = 01010101010101010101010101010101 For w = (01) n, we have K(w) K(01) + c log n So for all n, K((01) n ) d + c log n for a fixed c, d 33

Does The Computational Model Matter? Turing machines are one programming language. If we use other programming languages, could we get significantly shorter descriptions? An interpreter is a semi-computable function p : Σ* Σ* Takes programs as input, and (may) print their outputs Definition: Let x ϵ {0,1}*. The shortest description of x under p, called d p (x), is the lexicographically shortest string w for which p(w) = x. Definition: The K p complexity of x is K p (x) := d p (x). 34

Does The Computational Model Matter? Theorem: For every interpreter p, there is a fixed c so that for all x ϵ {0,1}*, K(x) K p (x) + c Moral: Using another programming language would only change K(x) by some additive constant Proof: Define M = On w, simulate p(w) and write its output to tape Then <M,d p (x)> is a description of x, so K(x) <M,d p (x)> 2 M + K p (x) + 1 c + K p (x) 35

There Exist Incompressible Strings Theorem: For all n, there is an x ϵ {0,1} n such that K(x) n There are incompressible strings of every length Proof: (Number of binary strings of length n) = 2 n but (Number of descriptions of length < n) (Number of binary strings of length < n) = 1 + 2 + 4 + + 2 n-1 = 2 n 1 Therefore, there is at least one n-bit string x that does not have a description of length < n 36

Random Strings Are Incompressible! Theorem: For all n and c 1, Pr x ϵ {0,1} n[ K(x) n-c ] 1 1/2 c Most strings are highly incompressible Proof: (Number of binary strings of length n) = 2 n but (Number of descriptions of length < n-c) (Number of binary strings of length < n-c) = 2 n-c 1 Hence the probability that a random x satisfies K(x) < n-c is at most (2 n-c 1)/2 n < 1/2 c. 37

KOLMOGOROV DIRECTIONS 38

Kolmogorov Complexity: Try it! Give short algorithms for generating the strings: 1. 01000110110000010100111001011101110000 2. 123581321345589144233377610987 3. 126241207205040403203628803628800 39

Kolmogorov Complexity: Try it! Give short algorithms for generating the strings: 1. 01000110110000010100111001011101110000 2. 123581321345589144233377610987 3. 126241207205040403203628803628800 40

Kolmogorov Complexity: Try it! Give short algorithms for generating the strings: 1. 01000110110000010100111001011101110000 2. 123581321345589144233377610987 3. 126241207205040403203628803628800 41

Kolmogorov Complexity: Try it! Give short algorithms for generating the strings: 1. 01000110110000010100111001011101110000 2. 123581321345589144233377610987 3. 126241207205040403203628803628800 This seems hard to determine in general. Why? 42

Computing Compressibility? Can an algorithm perform optimal compression? Can algorithms tell us if a given string is compressible? COMPRESS = { (x,c) K(x) c} Theorem: COMPRESS is undecidable! Idea: If decidable, we could design an algorithm that prints the shortest incompressible string of length n But such a string could then be succinctly described, by providing the algorithm code and n in binary! Berry Paradox: The smallest integer that cannot be defined in less than thirteen words. 43

Computing Compressibility? COMPRESS = {(x,c) K(x) c} Theorem: COMPRESS is undecidable! Proof: Suppose it s decidable. Consider the TM: M = On input x ϵ {0,1}*, let N = 2 x. For all y ϵ {0,1}* in lexicographical order, If (y,n) COMPRESS then print y and halt. M(x) prints the shortest string y with K(y ) > 2 x. <M,x> is a description of y, and <M,x> d + x So 2 x < K(y ) d + x. CONTRADICTION for large x! 44

Yet Another Proof that A TM is Undecidable! COMPRESS = {(x,c) K(x) c} Theorem: A TM is undecidable. Proof: Reduction from COMPRESS to A TM. Given a pair (x,c), our reduction constructs a TM: M x,c = On input w, For all pairs <M,w > with <M,w > c, simulate each M on w in parallel. If some M halts and prints x, then accept. K(x) c M x,c accepts ε 45

More on Interesting Formal Systems A formal system F is interesting if it is finite and: 1. Any mathematical statement about computation can also be effectively described within F. For all strings x and integers c, there is a S x,c in F that is equivalent to K(x) c 2. Proofs are convincing: it should be possible to check that a proof of a theorem is correct This set is decidable: { (S,P) P is a proof of S in F } 46

The Unprovable Truth About K-Complexity Theorem: For every interesting consistent F, There is a t s.t. for all x, K(x) > t is unprovable in F Proof: Define an M that treats its input as an integer: M(k) := Search over all strings x and proofs P for a proof P in F that K(x) > k. Output x if found Suppose M(k) halts. It must print some output x Then K(x ) = K(<M,k>) c + k c + log k for some c Because F is consistent, K(x ) > k is true But k < c + log k only holds for small enough k If we choose t to be greater than these k then M(t) cannot halt, so K(x) > t has no proof! 47

Random Unprovable Truths Theorem: For every interesting consistent F, There is a t s.t. for all x, K(x) > t is unprovable in F For a randomly chosen x of length t+100, K(x) > t is true with probability at least 1-1/2 100 We can randomly generate true statements in F which have no proof in F, with high probability! For every interesting formal system F there is always some finite integer (say, t=10000) so that you ll never be able to prove in F that a random 20000-bit string requires a 10000-bit program! 48

Proving Theorems With K-Complexity Theorem: L = {x x x 0, 1 } is not regular. Proof: Suppose L is recognized by a DFA D. Let n 0 and choose an x 0, 1 such that K x n Let q x be the state of D reached after reading in x Define a TM M(D, q, n): Find a path P in D of length n that starts from state q and ends in an accept state. Print the n-bit string along path P, and halt. Claim: The pair <M,(D, q x, n)> is a description of x! So n K x <M,(D, q x, n)> O log n CONTRADICTION! 50

Next Episode: Complexity Theory! 51

Repetitive Strings have Low Information Theorem: There is a fixed c so that for all x ϵ {0,1}* K(xx) K(x) + c The information in xx isn t much more than that in x Proof: Let N = On <M,w>, let s = M(w). Print ss. Suppose <M,w> is the shortest description of x. Then <N,<M,w>> is a description of xx Therefore K(xx) <N,<M,w>> 2 N + <M,w> + 1 2 N + K(x) + 1 c + K(x) 57

Information as Description Thesis: The amount of information in a string x is the length of the smallest description of x Let x 2 {0,1}* How should we describe strings? Use Turing Machines! Def. The shortest description of x, denoted d(x), is the lexicographically shortest string M such that the TM M on input ε halts with only x on its tape. 58

Kolmogorov Complexity (1960 s) Def. The shortest description of x, denoted d(x), is the lexicographically shortest string M such that the TM M on input ε halts with only x on its tape. Definition: The Kolmogorov complexity of x, denoted as K(x), is d(x). EXAMPLES?? Let s first determine some properties of K. Examples will fall out of this. 59

A Simple Upper Bound Theorem: There is a fixed c so that for all x in {0,1}* K(x) x + c The amount of information in x isn t much more than x Proof: For any string x, define a TM M x = On input w, overwrite w with x, and halt. Then M x on ε halts with just x on its tape. We have K(x) M x x + c 60