Complexity 6: AIT. Dusko Pavlovic. Kolmogorov, Solomonoff, Chaitin: the number of wisdom. RHUL, Spring 2012.


Complexity Theory, Part 6: Algorithmic information and logical depth

Outline

- What did we achieve?
- Algorithmic information
- Algorithmic probability
- The number of wisdom
- Logical depth

What did we achieve?

The question was: Where does the complexity come from? Why are there complex phenomena?

Why are there complex phenomena? Why does evolution increase complexity? Why are we so complex? What is complexity? Which strings are complex? Did complexity theory bring us closer to an answer?

Consider three 100-digit strings:

    010101010101010101010101010101010...01
    0101100111000111110...1101
    1100110101010111010...1101

What is complexity? How do you know which strings are complex?

The first string is $(01)^{50}$, produced by the short program

    do i=1..50 write 01 od

The second is $0^1 1^1 0^2 1^2 \cdots 0^i 1^i \cdots$, produced by

    i=1; do until length=100 write 0^i 1^i; i=i+1 od

The third, 110011010101...00, shows no pattern: its only apparent description is

    print 110011010101...00

Idea: Complexity = length of description. Fix a language and define, for any x,

    C(x) = the length of the simplest description of x.

Then $C(0101\ldots) < C(0101100111\ldots) < C(110010\ldots)$.
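The same three descriptions, rendered as a small Python sketch (illustrative only: the lecture's C(x) minimizes over all programs of a fixed language, not over these three):

    import random

    periodic = "01" * 50                      # (01)^50: a very short description
    blocks, i = "", 1                         # 0^1 1^1 0^2 1^2 ...: still a short description
    while len(blocks) < 100:
        blocks += "0" * i + "1" * i
        i += 1
    blocks = blocks[:100]
    random.seed(0)
    patternless = "".join(random.choice("01") for _ in range(100))  # no visible pattern

    # For the first two, the generating program is far shorter than the 100-digit string;
    # for the patternless one, nothing shorter than "print <the string itself>" is apparent.
    for name, s in [("periodic", periodic), ("blocks", blocks), ("patternless", patternless)]:
        print(name, len(s), s[:20] + "...")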

What is a description?

Berry Paradox. Let

    β = the least x such that the length of any expression describing x is ≥ 20.

The length of the expression "the least x such that ..." is itself < 20, so it describes β. Therefore β does not exist; that is, all x satisfy C(x) < 20. But this is absurd: there are only finitely many descriptions of length < 20.

Moral: We need a formal definition of a description. "The least x such that ..." is not a valid description.

Idea: descriptions = programs.

Assume programs with a "length" measure: $l(p) =$ the code length of p (e.g. the number of symbols in p).

Complexity = length of program. The complexity of x is the length of the simplest program that outputs x:

$$K(x) = \min\{\, l(p) : p() = x \,\}$$

Escape from the Berry Paradox. Fact: the predicate $K(x) \geq y$ is not decidable, or else β = "the least x with K(x) > 20" would itself be a program.
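If K were computable, the Berry construction would literally be a program. A conceptual Python sketch (the stub `K` marks the impossible step; nothing here is from the lecture's formal apparatus):

    from itertools import count

    def K(x):
        # K is NOT computable; this stub stands for the assumed decision procedure.
        raise NotImplementedError("no algorithm computes Kolmogorov complexity")

    def berry(n):
        # If K were computable, berry(n) would return the least x with K(x) > n.
        for x in count():
            if K(x) > n:
                return x

    # berry's source code is short: l(berry) + O(log n) bits describe berry(n),
    # so K(berry(n)) < n for large n, contradicting K(berry(n)) > n. Hence no such K.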

Consequence. Kolmogorov complexity does not satisfy the axioms of complexity measures.

Definition. For a given 2-tape Turing machine M, the M-description distance from a string x to a string y is

$$K_M(y \mid x) = \min\{\, l(p) : M(p, x) = y \,\}$$

(by the definition of infimum, $K_M(y \mid x) = \infty$ when no such p exists). The machine M is called the interpreter.

Notation. For any pair of functions $f, g : \mathbb{N} \to \mathbb{N}$ we write

$$f \leq^+ g \iff \exists c\, \forall n.\ f(n) \leq g(n) + c$$
$$f =^+ g \iff f \leq^+ g \ \wedge\ g \leq^+ f$$

Proposition 1 (The Invariance Theorem). There is a universal interpreter: a 2-tape Turing machine U such that for all 2-tape Turing machines M,

$$K_U(y \mid x) \leq^+ K_M(y \mid x)$$

Proof. If U′ is a universal machine for all 2-tape machines, i.e. $U'(\ulcorner M \urcorner, x, y) = M(x, y)$, then define $U(J(x, y), z) = U'(x, y, z)$, where J is Cantor's pairing function, so that $U(J(\ulcorner M \urcorner, p), x) = M(p, x)$ and thus $K_U(y \mid x) \leq K_M(y \mid x) + l(\ulcorner M \urcorner) + 6$.
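A toy model of the invariance argument (a sketch under loud assumptions: "machines" are two hard-coded Python functions, ⌜M⌝ is an index into a table, and J is Cantor's pairing function as in the proof):

    import math

    def J(m, p):
        # Cantor's pairing function
        return (m + p) * (m + p + 1) // 2 + p

    def J_inv(z):
        w = (math.isqrt(8 * z + 1) - 1) // 2
        p = z - w * (w + 1) // 2
        return w - p, p

    # A finite toy table standing in for "all 2-tape machines M"
    MACHINES = [
        lambda p, x: x * p,            # M_0(p, x): repeat x, p times
        lambda p, x: x + bin(p)[2:],   # M_1(p, x): append p in binary
    ]

    def U(j, x):
        # Universal interpreter: U(J(m, p), x) = M_m(p, x)
        m, p = J_inv(j)
        return MACHINES[m](p, x)

    # Simulating M_1 on U costs only the fixed overhead of packing m into the program,
    # which is the "+ l(<M>) + constant" in the proof of the Invariance Theorem.
    assert U(J(1, 5), "abc") == MACHINES[1](5, "abc") == "abc101"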

Scholium. For any two universal interpreters U, V,

$$K_U(y \mid x) =^+ K_V(y \mid x)$$

Definition. The description distance from a string x to a string y (or the complexity of y relative to x) is the description distance with respect to a universal interpreter:

$$K(y \mid x) = \min\{\, l(p) : p(x) = y \,\}$$

Definition. The description complexity (or just complexity) of a string y is the description distance from the empty string to y:

$$K(y) = \min\{\, l(p) : p() = y \,\}$$

Properties of description complexity:

- $\#\{\, x \in \{0,1\}^l : K(x) < l - m \,\} < 2^{l-m}$
- $m(x) \leq K(x) \leq^+ l(x)$, where $m(x) = \min_{y \geq x} K(y)$
- $\lim_{x \to \infty} m(x) = \infty$, yet for every $\varphi \in$ RPF with $\lim_{x \to \infty} \varphi(x) = \infty$ we have $m(x) < \varphi(x)$ for infinitely many x
- there is a $\chi \in$ RTF with $\lim_{t \to \infty} \chi(t, x) = K(x)$, i.e. K is approximable from above
- $\forall h.\ K(x + h) - K(x) < 2h$

Compression and randomness.

Question: Temperature is the average of kinetic energies. What is the entropy the average of?

Theorem (Schack 1997). Complexity can be defined with an optimal encoding, and then

$$H(q) =^+ \sum_{i \in I} q_i\, K(q_i)$$
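The first property above is pure counting: there are fewer than $2^{l-m}$ programs shorter than $l - m$, so at most a $2^{-m}$ fraction of the $2^l$ strings of length l can be compressed by m bits. An empirical echo in Python, using zlib as a crude stand-in for description length (an upper bound on compressibility only, not the formal K):

    import os, zlib

    l, m, trials = 1000, 1, 1000    # string length in bytes, margin, sample size
    wins = 0
    for _ in range(trials):
        x = os.urandom(l)                        # a typical (random) string
        if len(zlib.compress(x, 9)) < l - m:     # a description at least m bytes shorter?
            wins += 1
    print(f"{wins}/{trials} random strings compressed")   # expect 0/1000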

Upshot: The most informative strings cannot be compressed:

$$K(x) \geq l(x)$$

This is Kolmogorov's definition of randomness.

Entropy encoding: more likely ⟹ shorter.

Shannon's Noiseless Coding. Let $\mu : \Sigma \to [0,1]$ be the frequency distribution of the alphabet Σ. Then the optimal word length for representing a symbol $s \in \Sigma$ is

$$l(s) = -\log \mu(s)$$

Remark. The Shannon entropy is the average word length.

Occam's Razor: shorter ⟹ more likely.

Solomonoff's Algorithmic Probability. Let U be a self-delimiting universal Turing machine, i.e.

$$U(p){\downarrow} \implies \forall q.\ U(p :: q){\uparrow}$$

Then the a priori probability distribution of data from a dataset $\Sigma^*$ is

$$\mu(y) = \sum_{U(p) = y} 2^{-l(p)}$$
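The two directions of the likely/short correspondence, in one runnable Python sketch (the symbol frequencies and the prefix-free toy machine are both made up for illustration):

    import math

    # Shannon: more likely => shorter. Optimal word lengths and their average (= entropy).
    freq = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}      # assumed toy distribution
    word_len = {s: -math.log2(p) for s, p in freq.items()}    # l(s) = -log mu(s)
    print(word_len)                                           # a:1, b:2, c:3, d:3
    print(sum(p * word_len[s] for s, p in freq.items()))      # 1.75 = Shannon entropy

    # Solomonoff: shorter => more likely. A made-up prefix-free machine (no program
    # is a prefix of another; that is what "self-delimiting" guarantees).
    toy_machine = {"0": "ab", "10": "ab", "110": "cd"}        # program -> output
    mu = {}
    for p, y in toy_machine.items():
        mu[y] = mu.get(y, 0.0) + 2 ** -len(p)                 # each p contributes 2^-l(p)
    print(mu)                                                 # {'ab': 0.75, 'cd': 0.125}
    print(sum(mu.values()) <= 1)                              # Kraft: total mass <= 1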

For conditional probabilities, let U be self-delimiting in the first argument, i.e.

$$U(p, x){\downarrow} \implies \forall q.\ U(p :: q, x){\uparrow}$$

Then the a priori conditional probability distribution of data from a dataset $\Sigma^*$ is

$$\mu(y \mid x) = \sum_{U(p, x) = y} 2^{-l(p)}$$

Remark. Self-delimiting ⟹ $\sum_{y \in \Sigma^*} \mu(y) \leq 1$.

Inductive inference: Solomonoff's model of theory formation. Phenomena are modeled as bitstrings $d, h, \ldots \in \{0,1\}^*$:

- d: the observed data
- h: the hypothetical data
- p with p(d) = h: an explanation of causality

Inferring the most effective explanations: the goal is to maximize

$$\Pr(h \mid d) = \frac{\Pr(d \mid h)\, \Pr(h)}{\Pr(d)}$$

The best explanation is the simplest:

$$\Pr(h \mid d) = \sum_{p(d) = h} 2^{-l(p)}$$

Minimum Description Length Principle (MDLP). Given an observation d, the best explanation minimizes $l(p) + l(q)$, where $p() = h$ and $q(h) = d$. Since $\Pr(x) = 2^{-K(x)}$, this is equivalent to minimizing

$$K(h \mid d) =^+ K(d \mid h) + K(h) - K(d)$$
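A toy version of "the best explanation is the simplest": conditional programs mapping observed data d to hypotheses h, with made-up program lengths (a hypothetical machine, for illustration only):

    # program -> (input d, output h); a made-up prefix-free conditional machine
    toy = {"0": ("1101", "h1"), "10": ("1101", "h2"), "110": ("1101", "h1")}

    def posterior(h, d):
        # Pr(h | d) = sum of 2^-l(p) over programs p with p(d) = h
        return sum(2 ** -len(p) for p, (dd, hh) in toy.items() if dd == d and hh == h)

    for h in ("h1", "h2"):
        print(h, posterior(h, "1101"))   # h1: 0.625, h2: 0.25
    # h1 wins because it has the shortest explanation: Occam's razor as Bayesian
    # inference. MDL picks the same h by minimizing length instead of maximizing probability.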

The number of wisdom. The halting number is

$$\kappa = .\kappa_1 \kappa_2 \kappa_3 \ldots \qquad \kappa_p = \begin{cases} 1 & \text{if } U(p){\downarrow} \\ 0 & \text{otherwise} \end{cases}$$

Fact: κ is highly compressible!

Remark. The value of κ depends on a fixed universal Turing machine U. Its crucial properties do not depend on the choice of U.

Definition. The time-bounded complexity is

$$K^t_M(y) = \min\{\, l(p) : p() = y,\ \mathrm{time}(p, y) \leq t(|y|) \,\}$$

Proposition (Barzdin). Suppose that $Y = \{\, i \in \mathbb{N} : y_i = 1 \,\}$ is recursively enumerable. Then for any $c > 0$ there is a total recursive function t such that

$$K^t_M(y) \leq c \cdot \log |y|$$
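κ is approximable from below by running all programs with ever larger step budgets. A runnable sketch with a loudly fake machine (here "program p halts after p steps iff p is even"; any real step-limited interpreter could be slotted in):

    # A loudly fake step-bounded interpreter, assumed only to keep the sketch runnable.
    def run(p, max_steps):
        return p % 2 == 0 and p <= max_steps

    def kappa_prefix(n, max_steps):
        """Lower approximation of the first n bits of kappa: bit p is 1 once
        program p has been observed to halt within max_steps steps."""
        return [1 if run(p, max_steps) else 0 for p in range(1, n + 1)]

    print(kappa_prefix(8, 4))   # [0, 1, 0, 1, 0, 0, 0, 0]
    print(kappa_prefix(8, 8))   # [0, 1, 0, 1, 0, 1, 0, 1]  (bits only ever flip 0 -> 1)
    # No computable step budget is correct for every p: that would decide halting.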

Proposition 2. For every n there is a program of length $2 \log n$ that outputs the first n digits of κ.

Question: What might this compressed κ look like?

Chaitin's number:

$$\Omega = \sum_{s \in \mathbb{N}} \mu(s) = \sum_{U(p){\downarrow}} 2^{-l(p)}$$

Interpretations:

- the probability that a hypothesis will be formed;
- the a priori probability that a randomly chosen program will halt.

Proposition 3. Ω is incompressible.

Proposition 4. $\Omega_{1..n}$ decides halting of all programs of length up to n.
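Why Proposition 4 works: dovetail all programs, adding $2^{-l(p)}$ whenever one halts, until the accumulated sum reaches $\Omega_{1..n}$; any program of length ≤ n still running can then never halt, since it would contribute at least $2^{-n}$ more. A runnable sketch over an assumed finite toy machine (for the real U, Ω is not computable and the table is infinite):

    # Assumed toy machine: program -> steps before halting (None = never halts).
    toy = {"0": 7, "10": None, "110": 2, "111": None}

    omega = sum(2 ** -len(p) for p, steps in toy.items() if steps is not None)  # 0.625

    def decide_halting(omega_lower_bound, machine):
        """Dovetail until the accumulated halting probability reaches the given
        bound on Omega; every program still running at that point never halts."""
        acc, halted, t = 0.0, set(), 0
        while acc < omega_lower_bound:
            t += 1                                   # one more step for everyone
            for p, steps in machine.items():
                if steps == t:                       # p halts exactly at step t
                    halted.add(p)
                    acc += 2 ** -len(p)
        return halted

    print(decide_halting(omega, toy))                # {'110', '0'}; "10", "111" never halt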

Interpretation. Many open problems can be formulated as questions of whether certain search programs will halt:

- Riemann Hypothesis
- P = NP, AP = ANP, one-way, trapdoor, DDH, ...
- Twin Primes
- ...

These programs are shorter than 5000 characters. Knowing $\Omega_{1..5000}$ would resolve most of the open problems of mathematics.

Time-bounded algorithmic probability.

Definition.

$$\mu^t(y) = \sum_{p() = y,\ \mathrm{time}(p, y) \leq t(|y|)} 2^{-l(p)}$$

Logical depth.

Definition (Bennett, Adleman, ...).

$$L_\varepsilon(y) = \min\Big\{\, t \in \mathrm{REC} : \frac{\mu^t(y)}{\mu(y)} \geq \varepsilon \,\Big\}$$

Interpretation: the shortest time within which one of the minimal programs that output y will halt, with conditional probability ε.
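A toy computation of $\mu^t$ and $L_\varepsilon$ over an assumed machine whose programs carry made-up running times (illustration only; the REC-indexed time bounds are collapsed to integer step counts):

    # Assumed toy machine: program -> (output, running time in steps).
    timed = {"0": ("y", 100), "10": ("y", 3), "110": ("z", 5)}

    def mu_t(y, t):
        # time-bounded algorithmic probability: only programs halting within t steps count
        return sum(2 ** -len(p) for p, (out, steps) in timed.items()
                   if out == y and steps <= t)

    def mu(y):
        return mu_t(y, float("inf"))

    def depth(y, eps):
        # logical depth: the least time bound capturing an eps-fraction of mu(y)
        t = 1
        while mu_t(y, t) < eps * mu(y):
            t += 1
        return t

    print(depth("y", 0.3))    # 3: the fast 2-bit program already carries 1/3 of mu("y")
    print(depth("y", 0.9))    # 100: must wait for the slow minimal program "0"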

Proposition 5 (Adleman). The predicate $L_\varepsilon(y) \in O(n^k)$ is decidable in polynomial time.

Proposition 6 (Bennett). Deep strings cannot be quickly computed from shallow ones.

Bennett's interpretation: "A structure is deep if most of its algorithmic probability is contributed by slow-running processes."

Logical mutual information. Levin's Law of Conservation of Information: let d be a stream of randomized observations, and h a stream generated by a deterministic mathematical model. Then the probability that the mutual information exceeds m, $I(D : H) > m$, is less than $2^{-m}$.

Levin's comment: "Following Church's Thesis, Theorem 3 precludes the increase of information through computational processes. Theorem 2 precludes the increase of information through a combination of computational and randomized processes."

Levin's interpretation: "Our results contradict the assertion of some mathematicians that the truth of any valid proposition can be verified in the course of scientific progress through informal methods. (To do so by formal methods has been proven impossible by Gödel.)"