# Algorithmic Probability


From Scholarpedia, the free peer-reviewed encyclopedia. Curators: Marcus Hutter (Australian National University), Shane Legg (Dalle Molle Institute for Artificial Intelligence, IDSIA), and Paul M.B. Vitanyi (CWI and Computer Science, University of Amsterdam).

Algorithmic "Solomonoff" Probability (AP) assigns to objects an a priori probability that is in some sense universal. This prior distribution has theoretical applications in a number of areas, including inductive inference theory and the time complexity analysis of algorithms. Its main drawback is that it is not computable and thus can only be approximated in practice.

*Figure 1: Ray Solomonoff in Bioggio, Switzerland.*

## Bayes, Occam and Epicurus

In an inductive inference problem there is some observed data $D$ and a set of hypotheses $H_1, H_2, \ldots$, one of which may be the true hypothesis generating $D$. The task is to decide which hypothesis, or hypotheses, are the most likely to be responsible for the observations. An elegant solution to this problem is Bayes' rule,

$$P(H_i \mid D) = \frac{P(D \mid H_i)\, P(H_i)}{P(D)}.$$

To compute the relative probabilities of different hypotheses, $P(D)$ can be dropped, as it is a constant independent of the hypothesis. Furthermore, for a given hypothesis it is often straightforward to compute the likelihood $P(D \mid H_i)$. A conceptually more difficult problem is to assign the prior probability $P(H_i)$. Indeed, how can one assign a probability to a hypothesis before observing any data? A theoretically sound solution to this problem was not known until R. J. Solomonoff founded algorithmic probability theory in the 1960s.
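As a concrete numeric illustration of the rule, consider a small hypothetical hypothesis class of biased coins (the hypothesis names, priors, and data below are invented for this sketch, not taken from the article):

```python
# Hypothetical Bayes' rule example: three candidate coins (hypotheses) and
# observed data D = 8 heads in 10 flips. All numbers here are illustrative.
from math import comb

priors = {"fair": 0.5, "biased_0.8": 0.3, "biased_0.2": 0.2}  # P(H)
bias = {"fair": 0.5, "biased_0.8": 0.8, "biased_0.2": 0.2}    # P(heads | H)
heads, flips = 8, 10

def likelihood(p):
    # P(D | H): binomial probability of observing `heads` in `flips`
    return comb(flips, heads) * p**heads * (1 - p)**(flips - heads)

# Unnormalised posteriors P(D|H) P(H); the constant P(D) cancels when
# comparing hypotheses, exactly as described in the text.
unnorm = {h: likelihood(bias[h]) * priors[h] for h in priors}
z = sum(unnorm.values())                      # P(D), recovered by normalising
posterior = {h: w / z for h, w in unnorm.items()}

best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))        # the 0.8-biased coin dominates
```

Note how the ranking of hypotheses is already determined by the unnormalised products $P(D \mid H)\,P(H)$; normalisation only rescales them.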
Algorithmic probability rests upon two philosophical principles. The first is Epicurus' (342?–270? B.C.) principle of multiple explanations, which states that one should keep all hypotheses that are consistent with the data. From Bayes' rule it is clear that in order to keep consistent hypotheses, what is needed is

$P(H_i) > 0$ for every such hypothesis. The second philosophical foundation is the principle of Occam's razor (sometimes spelt Ockham). Occam's razor states that, when inferring causes, entities should not be multiplied beyond necessity. This is widely understood to mean: among all hypotheses consistent with the observations, choose the simplest. In terms of a prior distribution over hypotheses, this is the same as giving simpler hypotheses higher a priori probability and more complex ones lower probability. Using Turing's model of universal computation, Solomonoff (1964) produced a universal prior distribution that unifies these two principles. This work was later generalized and extended by a number of researchers, in particular L. A. Levin, who formalized the initial intuitive approach in a rigorous mathematical setting and supplied the basic theorems of the field. There now exist a number of strongly related universal prior distributions; this article, however, will describe only the discrete universal a priori probability and its continuous counterpart. See Chapters 4 and 5 of Li and Vitányi (1997) for a general survey.

## Discrete Universal A Priori Probability

Consider an unknown process producing a binary string of one hundred 1s. The probability that such a string is the result of a uniform random process, such as fair coin flips, is just $2^{-100}$, like that of any other string of the same length. Intuitively, we feel that there is a difference between a string that we can recognize and distinguish, and the vast majority of strings that are indistinguishable to us. In fact, we feel that a far more likely explanation is that some simple deterministic algorithmic process has generated this string. This observation was already made by P.-S. Laplace in about 1800, but the techniques were lacking for that great mathematician to quantify this insight. Presently we do this using the lucky confluence of combinatorics, information theory and the theory of algorithms.
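Laplace's intuition can be made tangible with a computable stand-in: compressed length as a rough upper-bound proxy for description complexity. This is only an illustrative sketch; a general-purpose compressor like zlib is of course no substitute for Kolmogorov complexity.

```python
# A string of one hundred 1s compresses far better than a typical random
# string, even though both have probability 2**-100 under fair coin flips.
# Compressed length here serves as a crude, computable proxy for the length
# of the shortest description.
import random
import zlib

regular = "1" * 100
random.seed(0)  # fixed seed so the sketch is reproducible
noisy = "".join(random.choice("01") for _ in range(100))

c_regular = len(zlib.compress(regular.encode()))
c_noisy = len(zlib.compress(noisy.encode()))
print(c_regular, c_noisy)
assert c_regular < c_noisy  # the patterned string has a shorter description
```

The gap between the two compressed sizes is the computable shadow of the gap between their algorithmic probabilities.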
It is clear that, in a world with computable processes, patterns which result from simple processes are relatively likely, while patterns that can only be produced by very complex processes are relatively unlikely. Formally, a computable process that produces a string $x$ is a program $p$ that, when executed on a universal Turing machine $U$, produces $x$ as output. As $p$ is itself a binary string, we can define the discrete universal a priori probability, $m(x)$, to be the probability that the output of a universal prefix Turing machine $U$ is $x$ when provided with fair coin flips on the input tape. Formally,

$$m(x) = \sum_{p\,:\,U(p) = x} 2^{-\ell(p)},$$

where the sum is over all halting programs $p$ for which $U$ outputs the string $x$, and $\ell(p)$ denotes the length of $p$ in bits. As $U$ is a prefix universal Turing machine, the set of valid programs forms a prefix-free set, and thus the sum is bounded due to Kraft's inequality. In fact, the boundedness of the sum is an obvious consequence of the above expression being a probability, which hinges on the fact that a prefix machine is one with only {0,1} input symbols and no end-of-input marker of any kind. It is easy to see that this distribution is strongly related to Kolmogorov complexity in that $m(x)$ is at least the maximum term in the summation, which is $2^{-K(x)}$. The Coding Theorem of L. A. Levin (1974) states that equality also holds in the other direction in the sense that $m(x) = \Theta(2^{-K(x)})$, or equivalently, $K(x) = -\log_2 m(x) + O(1)$. As every string $x$ can be computed by at least one program on $U$, it follows that $m$ assigns a non-zero probability to all computable hypotheses, and thus this distribution respects Epicurus' principle. Furthermore, if we measure the complexity of $x$ by its Kolmogorov complexity $K(x)$, we see that the simplest strings have the highest probability under $m$, while complex ones have a correspondingly low probability. In this way $m$ also formalises the principle of Occam's razor. Unfortunately, it is impossible in general to know whether any given program will halt when run on $U$, due
to the halting problem. This means that the sum can never be computed exactly; it can only be approximated from below. Stated technically, the function $m$ is only lower semi-computable. Furthermore, $m$ is not a proper probability measure but rather a semi-measure, as $\sum_x m(x) < 1$. This can be corrected by normalising; however, normalisation introduces other theoretical weaknesses and so will not be discussed here.

The universal a priori probability $m$ has many remarkable properties. A theorem of L. A. Levin states that among the lower semi-computable semi-measures it is the largest such function in the following sense: if $\mu$ is a lower semi-computable semi-measure, then there is a constant $c$ such that $c\, m(x) \geq \mu(x)$ for all $x$. In fact, we can take $c = 2^{K(\mu)}$, with $K(\mu)$ the prefix Kolmogorov complexity of the distribution $\mu$ (defined as the least complexity of a prefix Turing machine lower semi-computing the distribution). This justifies calling $m$ a universal distribution. Thus, the same mathematical object arises in three different ways:

- The family of lower semi-computable semi-measures contains an element that multiplicatively dominates every other element: a "universal" lower semi-computable semi-measure $m$;
- The probability mass function defined as the probability that the universal prefix machine $U$ outputs $x$ when the input is provided by fair coin flips is the "a priori" probability $m(x)$; and
- The probability $m(x)$ has, by the Coding Theorem, the code of length $K(x)$ as its associated Shannon-Fano code.

When three very different definitions (or properties) turn out to define the same object, this is usually taken in mathematics to mean that the object is objectively important.

## Continuous Universal A Priori Probability

The universal distribution $m$ is defined on a discrete domain; its arguments are finite binary strings. For applications such as the prediction of growing sequences it is necessary to define a similar distribution on infinite binary sequences.
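Before moving to the continuous case, the prefix-free condition that makes the discrete sum a (semi)measure can be checked concretely. A minimal sketch, using a small hypothetical set of program strings (any real machine's program set would be far larger):

```python
# Kraft's inequality for a prefix-free set: if no string in the set is a
# proper prefix of another, the summed weights 2**-len(p) cannot exceed 1.
# This is exactly what keeps the universal a priori probability bounded.
programs = ["0", "10", "110", "1110", "1111"]  # hypothetical program strings

def prefix_free(codes):
    # True iff no code is a proper prefix of a different code
    return not any(a != b and b.startswith(a) for a in codes for b in codes)

assert prefix_free(programs)
kraft_sum = sum(2.0 ** -len(p) for p in programs)
print(kraft_sum)  # this particular code is complete, so the sum is exactly 1.0
```

Dropping any program from the set would leave the sum strictly below 1, which is the typical situation for the halting programs of a real prefix machine.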
This leads to the universal semi-measure $M$, defined as the probability that the output of a monotone universal Turing machine $U$ starts with the finite string $x$ when provided with fair coin flips on the input tape. Formally,

$$M(x) = \sum_{p\,:\,U(p) = x*} 2^{-\ell(p)},$$

where the sum is over all minimal programs $p$ for which the output of $U$ starts with $x$ (denoted by $x*$). Like $m$, it can be shown that $M$ is only a lower semi-computable semi-measure. Due to a theorem of L. A. Levin, it is the largest such function: if $\mu$ is a lower semi-computable semi-measure, then there is a constant $c$ such that $c\, M(x) \geq \mu(x)$ for all $x$. The precise definition of these notions is similar to, but more complicated than, the discrete setting above, and thus we refer the reader to the literature (see for example Chapter 4 of Li and Vitányi 1997). The semi-measure $M$ has remarkable properties similar to those of $m$; for example, $M(x) \geq 2^{-Km(x)}$, where $Km$ is the monotone complexity. Also, $M$ divided by the uniform measure is the maximal semi-computable supermartingale. Additionally, if the infinite binary sequences are distributed according to a computable measure $\mu$, then the predictive distribution $M(x_{n+1} \mid x_1 \dots x_n)$ converges rapidly to $\mu(x_{n+1} \mid x_1 \dots x_n)$ with $\mu$-probability 1. Hence, $M$ predicts almost as well as does the true distribution.

## Applications

Algorithmic probability has a number of important theoretical applications; some of them are listed below. For a more complete survey see Li and Vitányi (1997).

### Solomonoff Induction

Solomonoff (1964) developed a quantitative formal theory of induction based on universal Turing machines (to compute, quantify and assign codes to all quantities of interest) and algorithmic complexity (to define what simplicity and complexity mean). The important Prediction Error Theorem is due to Solomonoff (1978): let $s_n$ be the expected squared error made in the $n$-th prediction of the next bit of an infinite binary sequence, in the ensemble of infinite binary sequences distributed according to a computable measure $\mu$, when predicting according to the fixed semi-measure $M$. Then

$$\sum_{n=1}^{\infty} s_n \leq \tfrac{1}{2}\ln 2 \cdot K(\mu),$$

that is, the total summed expected squared prediction error is bounded by a constant, and therefore, if $s_n$ is smooth enough, it typically decreases faster than $1/n$. In particular, the theorem implies the convergence of $M$ to $\mu$ with $\mu$-probability 1 stated above. This is remarkable, as it means that Solomonoff's inductive inference system will learn to correctly predict any computable sequence with only the absolute minimum amount of data. It would thus, in some sense, be the perfect universal prediction algorithm, if only it were computable.

### AIXI and a Universal Definition of Intelligence

It can also be shown that $M$ leads to excellent predictions and decisions in more general stochastic environments. Essentially, AIXI is a generalisation of Solomonoff induction to the reinforcement learning setting, that is, where the agent's actions can influence the state of the environment. AIXI can be proven to be, in some sense, an optimal reinforcement learning agent embedded in an arbitrary unknown environment (Hutter 2004). Like Solomonoff induction, the AIXI agent is not computable and thus can only be approximated in practice.
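The flavour of this style of prediction can be sketched with a drastically simplified, hypothetical analogue: a Bayesian mixture over a tiny hand-picked class of Bernoulli hypotheses, instead of the class of all lower semi-computable semi-measures. When the true generator is in the class, the total squared prediction error stays bounded, echoing the Prediction Error Theorem.

```python
# Toy analogue of Solomonoff prediction (illustrative only): a Bayesian
# mixture over a small, hand-picked class of Bernoulli hypotheses.
thetas = [0.1, 0.3, 0.5, 0.7, 0.9, 1.0]      # hypothesis: P(next bit = 1)
weights = [1.0 / len(thetas)] * len(thetas)  # uniform prior over the class

sequence = [1] * 200                         # generated by theta = 1.0
total_sq_error = 0.0
for bit in sequence:
    pred = sum(w * t for w, t in zip(weights, thetas))  # mixture P(next=1)
    total_sq_error += (bit - pred) ** 2
    # Bayesian update: reweight each hypothesis by its likelihood of `bit`
    weights = [w * (t if bit else 1 - t) for w, t in zip(weights, thetas)]
    z = sum(weights)
    weights = [w / z for w in weights]

# The cumulative squared error converges to a constant: after a handful of
# bits the posterior concentrates on theta = 1.0 and predictions become sharp.
print(round(total_sq_error, 3), round(pred, 6))
```

The true theorem replaces this six-element class with all computable measures, at the price of computability of the predictor itself.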
Instead of defining an optimal general agent, it is possible to turn things around and define a universal performance measure for agents acting in computable environments, known as universal intelligence.

### Expected Time/Space Complexity of Algorithms under the Universal Distribution

For every algorithm whatsoever, the $m$-average-case complexity is always of the same order of magnitude as the worst-case complexity for computational resources such as time and storage space. This result of Li and Vitányi means that, if the inputs are distributed according to the universal distribution rather than according to the uniform distribution, as is customary, then the expected case is as bad as the worst case. For example, the sorting procedure Quicksort has worst-case running time $\Theta(n^2)$, but expected running time $\Theta(n \log n)$ on a list of $n$ keys for permutations drawn at random from the uniform distribution over permutations of the keys. But if the permutations are distributed according to the universal distribution, that is, according to Occam's dictum that the simple permutations have the highest probabilities, as is often the case in practice, then the expected running time rises to the worst case of $\Theta(n^2)$. Experiments appear to confirm this theoretical prediction.

### PAC Learning Using the Universal Distribution in the Learning Phase

Using $m$ as the sampling distribution in the learning phase in PAC learning properly increases the learning power for the family of discrete concept classes under computable distributions, and similarly for PAC learning

of continuous concepts over computable measures by using $M$. This model of Li and Vitányi is akin to using a teacher in the learning phase who gives the simple examples first. See Applications of AIT.

### Halting Probability

A formally related quantity is the probability $\Omega$ that $U$ halts when provided with fair coin flips on the input tape (i.e., that a random computer program will eventually halt). This halting probability $\Omega$, also known as Chaitin's constant or "the number of wisdom", has numerous remarkable mathematical properties. For example, it can be used to quantify Gödel's Incompleteness Theorem.

## References

- M. Hutter. Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability. Springer, Berlin, 2004.
- M. Li and P. M. B. Vitányi. An Introduction to Kolmogorov Complexity and Its Applications. Springer, New York, 2nd edition, 1997; 3rd edition, 2008.
- L. A. Levin. Laws of information conservation (non-growth) and aspects of the foundation of probability theory. Problems of Information Transmission, 10(3):206-210, 1974.
- R. J. Solomonoff. A formal theory of inductive inference: Parts 1 and 2. Information and Control, 7:1-22 and 224-254, 1964.
- R. J. Solomonoff. Complexity-based induction systems: Comparisons and convergence theorems. IEEE Transactions on Information Theory, IT-24:422-432, 1978.
- A. K. Zvonkin and L. A. Levin. The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms. Russian Mathematical Surveys, 25(6):83-124, 1970.

## External Links

Homepages of Leonid A. Levin, Ray Solomonoff, Ming Li, Paul Vitányi, Marcus Hutter, and Gregory Chaitin.

## See Also

Algorithmic Information Theory, Algorithmic Complexity, Algorithmic Randomness.

Marcus Hutter, Shane Legg, Paul M.B. Vitanyi (2007) Algorithmic Probability.
Scholarpedia, 2(8):2572. Created: 4 December 2006; reviewed: 23 August 2007; accepted: 23 August 2007. Editor: Marcus Hutter, Australian National University.


### Minimum Message Length and Kolmogorov Complexity

Minimum Message Length and Kolmogorov Complexity C. S. WALLACE AND D. L. DOWE Computer Science, Monash University, Clayton Vic 3168 Australia Email: csw@cs.monash.edu.au The notion of algorithmic complexity

### Ultimate Intelligence Part I: Physical Completeness and Objectivity of Induction

Ultimate Intelligence Part I: Physical Completeness and Objectivity of Induction Eray Özkural Gök Us Sibernetik Ar&Ge Ltd. Şti. Abstract. We propose that Solomonoff induction is complete in the physical

### A version of for which ZFC can not predict a single bit Robert M. Solovay May 16, Introduction In [2], Chaitin introd

CDMTCS Research Report Series A Version of for which ZFC can not Predict a Single Bit Robert M. Solovay University of California at Berkeley CDMTCS-104 May 1999 Centre for Discrete Mathematics and Theoretical

### Introduction to Machine Learning (67577) Lecture 5

Introduction to Machine Learning (67577) Lecture 5 Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem Nonuniform learning, MDL, SRM, Decision Trees, Nearest Neighbor Shai

### ,

Kolmogorov Complexity Carleton College, CS 254, Fall 2013, Prof. Joshua R. Davis based on Sipser, Introduction to the Theory of Computation 1. Introduction Kolmogorov complexity is a theory of lossless

### Kolmogorov Complexity and Applications

06051 Abstracts Collection Kolmogorov Complexity and Applications Dagstuhl Seminar M. Hutter 1, W. Merkle 2 and P. Vitanyi 3 1 IDSIA - Lugano, CH marcus@idsia.ch 2 Univ. Heidelberg, DE merkle@math.uni-heidelberg.de

### What Causality Is (stats for mathematicians)

What Causality Is (stats for mathematicians) Andrew Critch UC Berkeley August 31, 2011 Introduction Foreword: The value of examples With any hard question, it helps to start with simple, concrete versions

### Primitive Recursive Presentations of Transducers and their Products

Primitive Recursive Presentations of Transducers and their Products Victor Yodaiken FSResearch LLC, 2718 Creeks Edge Parkway yodaiken@fsmlabs.com http://www.yodaiken.com Abstract. Methods for specifying

### Logistic Regression. Will Monroe CS 109. Lecture Notes #22 August 14, 2017

1 Will Monroe CS 109 Logistic Regression Lecture Notes #22 August 14, 2017 Based on a chapter by Chris Piech Logistic regression is a classification algorithm1 that works by trying to learn a function

### P, NP, NP-Complete, and NPhard

P, NP, NP-Complete, and NPhard Problems Zhenjiang Li 21/09/2011 Outline Algorithm time complicity P and NP problems NP-Complete and NP-Hard problems Algorithm time complicity Outline What is this course

### On the Foundations of Universal Sequence Prediction

Technical Report IDSIA-03-06 On the Foundations of Universal Sequence Prediction Marcus Hutter IDSIA, Galleria 2, CH-6928 Manno-Lugano, Switzerland marcus@idsia.ch http://www.idsia.ch/ marcus 8 February

### An Introduction to Statistical Theory of Learning. Nakul Verma Janelia, HHMI

An Introduction to Statistical Theory of Learning Nakul Verma Janelia, HHMI Towards formalizing learning What does it mean to learn a concept? Gain knowledge or experience of the concept. The basic process

### Sample Project: Simulation of Turing Machines by Machines with only Two Tape Symbols

Sample Project: Simulation of Turing Machines by Machines with only Two Tape Symbols The purpose of this document is to illustrate what a completed project should look like. I have chosen a problem that

### Measures of relative complexity

Measures of relative complexity George Barmpalias Institute of Software Chinese Academy of Sciences and Visiting Fellow at the Isaac Newton Institute for the Mathematical Sciences Newton Institute, January

### Lecture 13: Foundations of Math and Kolmogorov Complexity

6.045 Lecture 13: Foundations of Math and Kolmogorov Complexity 1 Self-Reference and the Recursion Theorem 2 Lemma: There is a computable function q : Σ* Σ* such that for every string w, q(w) is the description

### 1 Computational problems

80240233: Computational Complexity Lecture 1 ITCS, Tsinghua Univesity, Fall 2007 9 October 2007 Instructor: Andrej Bogdanov Notes by: Andrej Bogdanov The aim of computational complexity theory is to study

### Measuring and Optimizing Behavioral Complexity for Evolutionary Reinforcement Learning

Measuring and Optimizing Behavioral Complexity for Evolutionary Reinforcement Learning Faustino J. Gomez, Julian Togelius, and Juergen Schmidhuber IDSIA, Galleria 2 6928 Manno-Lugano Switzerland Abstract.

### Computability Theory

Computability Theory Cristian S. Calude May 2012 Computability Theory 1 / 1 Bibliography M. Sipser. Introduction to the Theory of Computation, PWS 1997. (textbook) Computability Theory 2 / 1 Supplementary

### 1 Computational Problems

Stanford University CS254: Computational Complexity Handout 2 Luca Trevisan March 31, 2010 Last revised 4/29/2010 In this lecture we define NP, we state the P versus NP problem, we prove that its formulation

### ALGORITHMIC INFORMATION THEORY: REVIEW FOR PHYSICISTS AND NATURAL SCIENTISTS

ALGORITHMIC INFORMATION THEORY: REVIEW FOR PHYSICISTS AND NATURAL SCIENTISTS S D DEVINE Victoria Management School, Victoria University of Wellington, PO Box 600, Wellington, 6140,New Zealand, sean.devine@vuw.ac.nz

### Lecture 3: Constructing a Quantum Model

CS 880: Quantum Information Processing 9/9/010 Lecture 3: Constructing a Quantum Model Instructor: Dieter van Melkebeek Scribe: Brian Nixon This lecture focuses on quantum computation by contrasting it

### Automata Theory CS S-12 Turing Machine Modifications

Automata Theory CS411-2015S-12 Turing Machine Modifications David Galles Department of Computer Science University of San Francisco 12-0: Extending Turing Machines When we added a stack to NFA to get a

### Theory of computation: initial remarks (Chapter 11)

Theory of computation: initial remarks (Chapter 11) For many purposes, computation is elegantly modeled with simple mathematical objects: Turing machines, finite automata, pushdown automata, and such.

### Chapter ML:IV. IV. Statistical Learning. Probability Basics Bayes Classification Maximum a-posteriori Hypotheses

Chapter ML:IV IV. Statistical Learning Probability Basics Bayes Classification Maximum a-posteriori Hypotheses ML:IV-1 Statistical Learning STEIN 2005-2017 Area Overview Mathematics Statistics...... Stochastics

### Information Theory CHAPTER. 5.1 Introduction. 5.2 Entropy

Haykin_ch05_pp3.fm Page 207 Monday, November 26, 202 2:44 PM CHAPTER 5 Information Theory 5. Introduction As mentioned in Chapter and reiterated along the way, the purpose of a communication system is

### Layerwise computability and image randomness

Layerwise computability and image randomness Laurent Bienvenu Mathieu Hoyrup Alexander Shen arxiv:1607.04232v1 [math.lo] 14 Jul 2016 Abstract Algorithmic randomness theory starts with a notion of an individual