Complexity Theory, Part 6: What did we achieve?
Algorithmic information and logical depth
RHUL, Spring 2012

Outline
- What did we achieve?
- Algorithmic information
- Algorithmic probability
- The number of wisdom
- Logical depth

What did we achieve?
The question was: Where does the complexity come from?
Why are there complex phenomena?
Why does evolution increase complexity? Why are we so complex?

What is complexity? Which strings are complex? Did complexity theory bring us closer to an answer?

010101010101...010101 = (01)^50 (100 digits)
0101100111000111110...1101 (100 digits)
1100110101010111010...1101 (100 digits)

How do you know which strings are complex?
What is complexity?
Idea: Complexity = length of description.

010101...01 = (01)^50 is described by: do i = 1..50 write 01 od
0101100111000111110... follows a law 0^i 1^i and is described by: i = 1; do until length = 100: write 0^i 1^i; i = i + 1 od
110011010101...00 shows no visible law; its shortest description may be: print 110011010101...00

Fix a language and define, for any x,
C(x) = the length of the simplest description of x.
Then C(0101010...) < C(010110...) < C(110010...).

What is a description?
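The idea that complexity is description length can be tried out concretely: a general-purpose compressor gives a computable upper bound on description length (the true minimum is uncomputable). A minimal sketch, using zlib as the fixed "language" — the example strings are stand-ins for the three strings above:

```python
# Upper-bounding "length of the simplest description" with zlib.
# This is only an upper bound: the real Kolmogorov complexity is
# uncomputable, but regular vs. lawless strings still separate.
import random
import zlib

def description_length(s: str) -> int:
    """Bytes needed for a zlib description of s; the decompressor
    plays the role of the fixed description language."""
    return len(zlib.compress(s.encode(), 9))

periodic = "01" * 50                                           # (01)^50
blocks = "".join("0" * i + "1" * i for i in range(1, 20))[:100]  # 0^i 1^i law
random.seed(0)                                                 # reproducible
noisy = "".join(random.choice("01") for _ in range(100))       # no visible law

# Regular strings admit much shorter descriptions than lawless ones.
print(description_length(periodic), description_length(blocks),
      description_length(noisy))
```

All three strings have 100 digits, but the compressor finds a far shorter description for the periodic one than for the pseudo-random one.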
What is a description?

Berry Paradox
β = the first x such that C(x) > 20.
The length of the expression "the first x such that C(x) > 20" is < 20.
Therefore β does not exist.
Therefore all x satisfy C(x) ≤ 20 — which is absurd, since there are infinitely many strings but only finitely many descriptions of length ≤ 20.

Moral
We need a formal definition of a description: "the first x such that C(x) > 20" is not a valid description.
Idea: descriptions = programs

Assume programs with a "length" measure
l(p) = the code length of p (e.g. the number of symbols in p).

Complexity = length of program
This escapes the Berry paradox. Idea: the complexity of x is the length of the simplest program that outputs x.

Fact
The predicate y = K(x), where
K(x) = min { l(p) : p() = x },
is not decidable — or else β = "the first x with K(x) > 20" would be a program.
Consequence
Kolmogorov complexity does not satisfy the axioms of complexity measures.

Definition
For a given 2-tape Turing machine M, the M-description distance from a string x to a string y is
K_M(y|x) = min { l(p) : M(p, x) = y }.
The machine M is called the interpreter.

Notation
For any pair of functions f, g : N → N we write
f ≤+ g  iff there is a c such that f(n) ≤ g(n) + c for all n;
f =+ g  iff f ≤+ g and g ≤+ f.

Proposition 1 (The Invariance Theorem)
There is a universal interpreter: a 2-tape Turing machine U such that for all 2-tape Turing machines M,
K_U(y|x) ≤+ K_M(y|x).

Proof. If U' is a universal machine for all 2-tape machines, i.e.
U'(⟨M⟩, x, y) = M(x, y),
then define
U(J(x, y), z) = U'(x, y, z),
where J is Cantor's pairing function, so that
U(J(⟨M⟩, p), x) = M(p, x),
and thus
K_U(y|x) ≤ K_M(y|x) + l(⟨M⟩) + 6.
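The proof packs the machine code ⟨M⟩ and the program p into a single input via Cantor's pairing function J. A quick runnable check that J is a bijection N × N → N (the inverse formula below is the standard one, included for illustration):

```python
# Cantor's pairing function, as used in the invariance proof to
# combine the machine code <M> and the program p into one input.
from math import isqrt

def J(x: int, y: int) -> int:
    return (x + y) * (x + y + 1) // 2 + y

def J_inv(z: int):
    # recover the diagonal w = x + y first, then y, then x
    w = (isqrt(8 * z + 1) - 1) // 2
    y = z - w * (w + 1) // 2
    return (w - y, y)

print(J(3, 5), J_inv(J(3, 5)))  # every pair gets a unique code
```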
Scholium
For any two universal interpreters U, V:
K_U(y|x) =+ K_V(y|x).

Definition
The description distance from a string x to a string y (or the complexity of y relative to x) is the description distance with respect to a universal interpreter:
K(y|x) = min { l(p) : p(x) = y }.

Definition
The description complexity (or just complexity) of a string y is the description distance from the empty string to y:
K(y) = min { l(p) : p() = y }.

Properties of description complexity
- #{ x ∈ {0,1}^l : K(x) < l − m } ≤ 2^(l−m): at most a 2^(−m) fraction of the strings of length l can be compressed by m bits.
- m(x) ≤ K(x) ≤+ l(x), for m(x) = min_{y ≥ x} K(y).
- lim_{x→∞} m(x) = ∞, yet for every recursive φ with lim_{x→∞} φ(x) = ∞ we have m(x) < φ(x) infinitely often: K has an unbounded but extremely slowly growing lower bound.
- There is a total recursive χ with lim_{t→∞} χ(t, x) = K(x): K is approximable from above.
- For all h: K(x + h) − K(x) < 2h.

Compression and randomness
Question
Temperature is the average of kinetic energies. What is the entropy the average of?

Theorem (Schack 1997)
Complexity can be defined with optimal encoding, and then
H(q) =+ Σ_{i ∈ I} q_i K(q_i):
the entropy is, up to a constant, the average complexity.
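The first property above is pure counting: there are 2^l strings of length l, but fewer than 2^(l−m) binary programs of length below l − m, so almost all strings are incompressible. A small numeric sketch of the resulting bound:

```python
# The counting argument behind incompressibility: programs shorter
# than k are binary strings of length 0..k-1, so there are 2^k - 1
# of them, while there are 2^l strings of length l.
def num_programs_shorter_than(k: int) -> int:
    return 2 ** k - 1

def num_strings(l: int) -> int:
    return 2 ** l

l, m = 20, 5
# At most 2^(l-m) - 1 strings of length l can have a program shorter
# than l - m, so at least this fraction cannot be compressed by m bits:
incompressible_fraction = 1 - num_programs_shorter_than(l - m) / num_strings(l)
print(incompressible_fraction)  # strictly above 1 - 2^-m
```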
Compression and randomness
Upshot
The most informative strings cannot be compressed:
K(x) =+ l(x).
This is Kolmogorov's definition of randomness.

Entropy encoding: more likely = shorter

Shannon's Noiseless Coding
Let µ : Σ → [0, 1] be the frequency distribution of the alphabet Σ. Then the optimal word length for representing a symbol s ∈ Σ is
l(s) = −log µ(s).

Remark
The Shannon entropy is the average word length.

Occam's Razor: shorter = more likely

Solomonoff's Algorithmic Probability
Let U be a self-delimiting universal Turing machine, i.e. if U(p) halts then U(p :: q) diverges for every nonempty q. Then the a priori probability distribution of data y is
µ(y) = Σ_{U(p) = y} 2^(−l(p)).
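The remark that Shannon entropy is the average word length can be checked numerically. A minimal sketch with an example dyadic distribution (chosen, as an assumption, so that the optimal lengths −log2 µ(s) come out as whole bits):

```python
# Shannon's optimal word lengths l(s) = -log2 mu(s); the entropy
# H(mu) = -sum mu(s) log2 mu(s) is exactly the average word length.
from math import log2

mu = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}   # example distribution
lengths = {s: -log2(p) for s, p in mu.items()}        # 1, 2, 3, 3 bits
entropy = -sum(p * log2(p) for p in mu.values())
avg_len = sum(mu[s] * lengths[s] for s in mu)
print(lengths, entropy, avg_len)  # entropy == average word length
```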
Occam's Razor: shorter = more likely

Solomonoff's Algorithmic Probability (conditional version)
Let U be a self-delimiting universal Turing machine, i.e. if U(p, x) halts then U(p :: q, x) diverges for every nonempty q. Then the a priori conditional probability distribution is
µ(y|x) = Σ_{U(p, x) = y} 2^(−l(p)).

Remark
Self-delimiting ⇒ Σ_y µ(y) ≤ 1.

Inductive inference
Solomonoff's model of theory formation: phenomena are modeled as bitstrings d, h, ... ∈ {0,1}*, where
- d is the observed data,
- h is the hypothetic data,
- a program p with p(d) = h is an explanation of causality.

Inferring the most effective explanations
The goal is to maximize
Pr(h|d) = Pr(d|h) Pr(h) / Pr(d).
Taking
Pr(h|d) = Σ_{p(d) = h} 2^(−l(p)),
the best explanation is the simplest explanation.

Minimum Description Length Principle (MDLP)
Given an observation d, the best explanation minimizes
l(p) + l(q), where p() = h and q(h) = d.
Since Pr(x) = 2^(−K(x)), this is equivalent to minimizing
K(h|d) = K(d|h) + K(h) − K(d).
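The two-part code l(p) + l(q) of the MDL principle can be illustrated with a toy model-selection problem. Here the hypotheses are candidate periods of a bitstring: l(p) is the repeating block, and l(q) charges about log2(n) ≈ 6 bits per exception. Both the data and the per-exception cost are illustrative assumptions, not an optimal code:

```python
# Toy MDL selection: pick the period of a bitstring that minimizes
# l(p) + l(q) = (bits for the repeating block) + (bits for exceptions).
EXC_BITS = 6  # assumed cost per exception: an index needs ~log2(38) bits

def mdl_cost(data: str, period: int) -> int:
    block = data[:period]
    predicted = (block * (len(data) // period + 1))[:len(data)]
    mismatches = sum(a != b for a, b in zip(data, predicted))
    return len(block) + EXC_BITS * mismatches

data = "01" * 14 + "11" + "01" * 4    # mostly period 2, one glitch
best = min(range(1, len(data) + 1), key=lambda k: mdl_cost(data, k))
print(best, mdl_cost(data, best))     # short model + short exception list wins
```

Period 1 must pay for every "1" as an exception, and period 38 stores the data verbatim; the period-2 hypothesis with a single exception gives the shortest total description.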
The halting number
κ = 0.κ₁κ₂κ₃..., where
κ_p = 1 if U(p)↓, and κ_p = 0 otherwise.

Fact
κ is highly compressible!

Remark
The value of κ depends on a fixed universal Turing machine U. Its crucial properties do not depend on the choice of U.

Definition (time-bounded complexity)
K^t_M(y) = min { l(p) : p() = y and time(p, y) ≤ t(|y|) }.

Proposition (Barzdin)
Suppose that Y = { i ∈ N : y_i = 1 } is recursively enumerable. Then there are a c > 0 and a total recursive function t such that
K^t_M(y) ≤ c log |y|.
Proposition 2
For every n there is a program of length ≤ 2 log n that outputs the first n digits of κ. (It suffices to know n and the number of halting programs among the first n: dovetail them all until that many have halted.)

Question
What might this compressed κ look like?

Chaitin's number
Ω = Σ_s µ(s) = Σ_{U(p)↓} 2^(−l(p))

Interpretations
- the probability that a hypothesis will be formed;
- the a priori probability that a randomly chosen program will halt.

Proposition 3
Ω is incompressible.

Proposition 4
Ω_{1..n} decides halting of all programs of length up to n. (Dovetail all programs, accumulating the weight 2^(−l(p)) of each one that halts; once the accumulated weight comes within 2^(−n) of Ω, no further program of length ≤ n can ever halt.)
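Ω itself is uncomputable, but it can be approximated from below exactly as in the dovetailing argument: run ever more programs for ever more steps and add up the weight of those seen to halt. A sketch for a toy machine — the "semantics" (Collatz orbits) and the 2l+1-bit self-delimiting weight are assumptions for illustration, not a real universal machine:

```python
# Lower-bound approximation of a Chaitin-style Omega by dovetailing.
# TOY machine: a real Omega needs a universal self-delimiting machine;
# here a program p "computes" the Collatz orbit of int(p, 2) + 1 and
# halts when the orbit reaches 1.
from fractions import Fraction
from itertools import product

def halts_within(p: str, budget: int) -> bool:
    n = int(p, 2) + 1
    for _ in range(budget):
        if n == 1:
            return True
        n = n // 2 if n % 2 == 0 else 3 * n + 1
    return False  # unknown: might still halt with a bigger budget

def omega_lower_bound(max_len: int, budget: int) -> Fraction:
    # weight 2^-(2l+1) mimics a self-delimiting code 1^l 0 p of length 2l+1
    total = Fraction(0)
    for l in range(1, max_len + 1):
        for bits in product("01", repeat=l):
            if halts_within("".join(bits), budget):
                total += Fraction(1, 2 ** (2 * l + 1))
    return total

# the approximations grow monotonically toward (the toy) Omega
print(omega_lower_bound(3, 2), omega_lower_bound(3, 100))
```

The approximations only ever increase, which is why a sufficiently precise prefix Ω_{1..n} settles the halting of every program of length up to n.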
Interpretation
Many open problems can be formulated as questions of whether certain search programs will halt:
- Riemann Hypothesis
- P = NP, AP = ANP, one-way, trapdoor, DDH, ...
- Twin Primes
- ...
These programs are shorter than 5000 characters. Knowing Ω_{1..5000} would resolve most of the open problems of mathematics.

Time-bounded algorithmic probability
Definition
µ^t(y) = Σ_{p() = y, time(p, y) ≤ t(|y|)} 2^(−l(p))

Logical depth
Definition (Bennett, Adleman, ...)
L_ε(y) = min { t ∈ REC : µ^t(y) ≥ ε · µ(y) }

Interpretation
The shortest time in which one of the minimal programs that output y will halt, with conditional probability ε.
Logical depth
Proposition 5 (Adleman)
The predicate L_ε(y) ∈ O(n^k) is decidable in polynomial time.

Proposition 6 (Bennett)
Deep strings cannot be quickly computed from shallow ones.

Bennett's interpretation
"A structure is deep if most of its algorithmic probability is contributed by slow-running processes."

Logical mutual information
Levin's Law of Conservation of Information
Let d be a stream of randomized observations, and h a stream generated by a deterministic mathematical model. Then the probability that the mutual information I(D : H) exceeds m is less than 2^(−m).

Levin's comment
"Following Church's Thesis, Theorem 3 precludes the increase of information through computational processes. Theorem 2 precludes the increase of information through a combination of computational and randomized processes."

Levin's interpretation
"Our results contradict the assertion of some mathematicians that the truth of any valid proposition can be verified in the course of scientific progress through informal methods. (To do so by formal methods has been proven impossible by Gödel.)"