Uncertainty & Induction in AGI

Marcus Hutter
Canberra, ACT, 0200, Australia
http://www.hutter1.net/
ANU AGI ProbOrNot Workshop, 31 July 2013

Abstract

AGI systems have to learn from experience, build models of the environment from the acquired knowledge, and use these models for prediction (and action). In philosophy this is called inductive inference, in statistics it is called estimation and prediction, and in computer science it is addressed by machine learning. I will first review unsuccessful attempts and unsuitable approaches towards a general theory of uncertainty and induction, including Popper's denial of induction, frequentist statistics, much of statistical learning theory, subjective Bayesianism, Carnap's confirmation theory, the data paradigm, eliminative induction, pluralism, and deductive and other approaches. I will then argue that Solomonoff's formal, general, complete, consistent, and essentially unique theory provably solves most issues that have plagued the other approaches. Some theoretical properties, extensions to (re)active learning agents, and practical approximations are mentioned in passing, but they are not the focus of this talk. I will conclude with some general advice to philosophers and AGI researchers.

Contents

- The Need for a General Theory
- Unsuitable/Limited Approaches
- Universal Induction and AGI
- Discussion

Motivation: the necessity of dealing properly with uncertainty & induction in AGI

- The world is only partially observable: we need to form beliefs about the unobserved, so the induction problem needs to be solved. Also: the world itself may be indeterministic (quantum theory).
- Uncertainty comes in degrees: low/medium/high and finer.
- A theory handling uncertainty must be strong enough to support rational decision making and action recommendation.

Upshot: Uncertainty and induction have to be dealt with in a quantitative and graded manner.

Induction/Prediction Examples

- Hypothesis testing/identification: Does treatment X cure cancer? Do observations of white swans confirm that all ravens are black?
- Model selection: Are planetary orbits circles or ellipses? How many wavelets do I need to describe my picture well? Which genes can predict cancer?
- Parameter estimation: Bias of my coin. Eccentricity of earth's orbit.
- Sequence prediction: Predict tomorrow's weather/stock quote/..., based on the past sequence. Continue an IQ-test sequence like 1, 4, 9, 16, ? (see the sketch after this slide).
- Classification can be reduced to sequence prediction: Predict whether an email is spam.

Question: Is there a general & formal & complete & consistent theory for induction & prediction?

Beyond induction: active/reward learning, function optimization, game theory.
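
As a toy illustration of simplicity-biased sequence prediction (my own sketch, not part of the talk), the following Python snippet continues an IQ-test-style sequence by choosing the lowest-degree polynomial that reproduces it exactly. The function name predict_next and the polynomial hypothesis class are illustrative assumptions, not the talk's method.

    # Ockham-flavored toy predictor: try hypotheses from simple to complex,
    # keep the first (lowest-degree polynomial) that fits the data exactly.
    import numpy as np

    def predict_next(seq, max_degree=5, tol=1e-9):
        """Prediction of the simplest polynomial model that fits seq exactly."""
        xs = np.arange(1, len(seq) + 1)
        for degree in range(max_degree + 1):  # simplest hypotheses first
            coeffs = np.polyfit(xs, seq, degree)
            if np.max(np.abs(np.polyval(coeffs, xs) - np.asarray(seq))) < tol:
                return float(np.polyval(coeffs, len(seq) + 1))
        return None  # no low-degree polynomial explains the data

    print(round(predict_next([1, 4, 9, 16])))  # -> 25, since n^2 is the simplest fit

Higher-degree polynomials also fit the four data points, but the loop stops at the first (simplest) exact hypothesis, a crude stand-in for the simplicity bias discussed below.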

The Need for a Unified Theory

Why do we need, or should want, a unified theory of induction?

- Finding new rules for every particular (new) problem is cumbersome.
- A plurality of theories is prone to disagreement or contradiction.
- Axiomatization boosted mathematics, logic, and deduction, and so it should boost induction.
- It provides a convincing story and conceptual tools for outsiders.
- It automatizes induction & science (that is what machine learning does).
- By relating it to existing narrow/heuristic/practical approaches, we deepen our understanding of them and can improve them.
- It is necessary for resolving philosophical problems.
- Unified/universal theories are often beautiful gems.
- There is no convincing argument that the goal is unattainable.

Unsuitable/Limited Approaches

- Popper's approach to induction is seriously flawed: falsificationism is too limited, corroboration is either just confirmation or meaningless, and simple does not mean easy-to-refute.
- No-free-lunch myth: relies on unrealistic uniform sampling. Universal sampling permits free lunch.
- Frequentism: the definition is circular, limited to i.i.d. data, reference class problem.
- Statistical Learning Theory: predominantly considers i.i.d. data: Empirical Risk Minimization, PAC bounds, VC-dimension, Rademacher complexity, Cross-Validation.

Unsuitable/Limited Approaches (continued)

- Subjective Bayes: no formal procedure/theory to get the prior.
- Objective Bayes: right in spirit, but limited to small classes unless the community embraces information theory.
- MDL/MML: practical approximations of universal induction (a toy sketch follows below).
- Pluralism: is globally inconsistent.
- Deductive logic: not strong enough to allow for induction.
- Non-monotonic reasoning, inductive logic, default reasoning: do not properly take uncertainty into account.
- Carnap's confirmation theory: only for exchangeable data; cannot confirm universal hypotheses.
- Data paradigm: data may be more important than algorithms for simple problems, but a lookup-table AGI will not work.
- Eliminative induction: ignores uncertainty and information theory.
- Fuzzy logic/sets: no sound foundations / heuristic.
- Imprecise probability: is indecisive.
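
To make the MDL idea concrete, here is a hedged toy sketch (my own crude encoding, not Rissanen's actual scheme): select a model by minimizing the total description length of the model plus the data given the model. The function mdl_degree, the 32-bits-per-coefficient model cost, and the residual code length are all illustrative assumptions.

    # Toy two-part MDL: pick the polynomial degree minimizing
    #   L(model) + L(data | model), both measured in bits.
    import numpy as np

    def mdl_degree(xs, ys, max_degree=6, resolution=0.01):
        """Degree with the smallest (crudely estimated) total code length."""
        best_cost, best_degree = float('inf'), 0
        for d in range(max_degree + 1):
            coeffs = np.polyfit(xs, ys, d)
            residuals = ys - np.polyval(coeffs, xs)
            model_bits = 32.0 * (d + 1)  # crude: 32 bits per coefficient
            # bits to encode each residual to the given resolution
            data_bits = np.sum(np.log2(1.0 + np.abs(residuals) / resolution))
            if model_bits + data_bits < best_cost:
                best_cost, best_degree = model_bits + data_bits, d
        return best_degree

    xs = np.linspace(0, 10, 50)
    ys = 3 * xs**2 + np.random.normal(0, 1, 50)  # noisy quadratic
    print(mdl_degree(xs, ys))  # typically 2: higher degrees don't pay their way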

Summary

The criticized approaches cannot serve as a general foundation of induction.

Conciliation: Of course, most of the criticized approaches do work in their limited domains and are trying to push their boundaries towards more generality.

And what now? Criticizing others is easy and in itself a bit pointless. The crucial question is whether there is something better out there. And indeed there is, which I will turn to now.

Induction <=> Deduction

Approximate correspondence between the most important concepts in induction and deduction:

                     Induction                    Deduction
Type of inference:   generalization/prediction    specialization/derivation
Framework:           probability axioms         = logical axioms
Assumptions:         prior                      = non-logical axioms
Inference rule:      Bayes rule                 = modus ponens
Results:             posterior                  = theorems
Universal scheme:    Solomonoff probability     = Zermelo-Fraenkel set theory
Universal inference: universal induction        = universal theorem prover
Limitation:          incomputable               = incomplete (Gödel)
In practice:         approximations             = semi-formal proofs
Operation:           computation                = proof

The foundations of induction are as solid as those for deduction.

Foundations of Universal Induction and AGI

- Ockham's razor (simplicity) principle: Entities should not be multiplied beyond necessity.
- Epicurus' principle of multiple explanations: If more than one theory is consistent with the observations, keep all theories.
- Bayes' rule for conditional probabilities: Given the prior belief/probability, one can predict all future probabilities: Posterior(H|D) ∝ Likelihood(D|H) × Prior(H).
- Turing's universal machine: Everything computable by a human using a fixed procedure can also be computed by a (universal) Turing machine.
- Kolmogorov's complexity: The complexity or information content of an object is the length of its shortest description on a universal Turing machine.
- Solomonoff's universal prior = Ockham + Epicurus + Bayes + Turing: Solves the question of how to choose the prior if nothing is known; yields universal induction, a formal Ockham's razor: Prior(H) = 2^-Kolmogorov(H) (a toy numerical sketch follows below).
- Bellman equations: Theory of how to optimally plan and act in known environments.

Solomonoff + Bellman = Universal Artificial Intelligence.
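
The following is a minimal numerical sketch (mine, not from the talk) of the recipe Prior(H) = 2^-K(H): "programs" are repeating bit patterns, description length stands in for Kolmogorov complexity, Epicurus keeps every pattern consistent with the data, and Bayes mixes their predictions. The function universal_predict and the pattern hypothesis class are illustrative assumptions.

    # Toy 'universal' predictor: hypotheses are repeating bit patterns,
    # weighted by 2^-length (a stand-in for 2^-Kolmogorov complexity).
    from itertools import product

    def universal_predict(prefix, max_pattern_len=8):
        """P(next bit = '1' | prefix) under a 2^-length prior over patterns."""
        weight_one = weight_total = 0.0
        for length in range(1, max_pattern_len + 1):
            for bits in product('01', repeat=length):
                pattern = ''.join(bits)
                s = pattern * (len(prefix) // length + 2)
                if s.startswith(prefix):          # Epicurus: keep all consistent H
                    w = 2.0 ** (-length)          # Ockham: shorter H weigh more
                    weight_total += w             # Bayes: renormalize over survivors
                    weight_one += w * (s[len(prefix)] == '1')
        return weight_one / weight_total

    print(universal_predict('010101'))  # close to 0: continuing with '0' dominates

Long, exotic patterns that also match the prefix keep the probability of '1' nonzero, but their 2^-length weights make it small: exactly the graded, quantitative uncertainty called for earlier.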

Properties of Universal Induction

It solves/avoids/ameliorates most problems other systems are plagued with:
+ is formal, complete, general, and globally consistent,
+ has essentially optimal error bounds,
+ solves the problem of zero p(oste)rior,
+ can confirm universal hypotheses,
+ is reparametrization and regrouping invariant,
+ solves the problem of old evidence and updating,
+ allows one to incorporate prior knowledge,
+ is ready for integration into a rational decision agent,
- prediction of short sequences needs care,
- constant fudges in all results and the U-dependence,
- is incomputable, with crude practical approximations.

Induction -> Prediction -> Decision -> Action

Having or acquiring or learning or inducing a model of the environment an agent interacts with allows the agent to make predictions and utilize them in its decision process of finding a good next action. Induction infers general models from specific observations/facts/data, usually exhibiting regularities or properties or relations in the latter.

Example:
- Induction: Find a model of the world economy.
- Prediction: Use the model for predicting the future stock market.
- Decision: Decide whether to invest assets in stocks or bonds.
- Action: Trading large quantities of stocks influences the market.

Universal Artificial Intelligence

Key idea: Optimal action/plan/policy based on the simplest world model consistent with history. Formally:

AIXI: a_k := arg max_{a_k} sum_{o_k r_k} ... max_{a_m} sum_{o_m r_m} [r_k + ... + r_m] * sum_{p : U(p, a_1..a_m) = o_1 r_1 .. o_m r_m} 2^{-length(p)}

where k = now, a = action, o = observation, r = reward, U = universal TM, p = program, m = lifespan. (A toy numerical sketch follows below.)

AIXI is an elegant, complete, essentially unique, and limit-computable mathematical theory of AI.

Claim: AIXI is the most intelligent environment-independent, i.e. universally optimal, agent possible.

Proof: For formalizations, quantifications, and proofs, see [H 00-05].

Problem: Computationally intractable.

Achievement: Well-defines AI. A gold standard to aim at. Inspired practical algorithms. Cf. infeasible exact minimax.
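
To unpack the structure of the formula, here is a heavily simplified sketch (my illustration; AIXI itself is incomputable): the sum over programs p is replaced by two hand-coded environment models whose fixed weights stand in for 2^-length(p), and the agent computes the expectimax value while Bayes-updating its mixture after every hypothetical reward. All names (MODELS, PRIOR, value) are mine.

    # Expectimax over a tiny Bayesian mixture of environments, imitating
    # the shape of the AIXI formula: max over actions, expectation over
    # percepts, environments weighted by a simplicity prior.
    MODELS = {  # P(reward = 1 | action) under each candidate environment
        'a_is_good': {'a': 1.0, 'b': 0.0},
        'b_is_good': {'a': 0.0, 'b': 1.0},
    }
    PRIOR = {'a_is_good': 0.5, 'b_is_good': 0.5}  # stand-in for 2^-length(p)

    def value(weights, horizon):
        """Best achievable expected total reward under current beliefs."""
        if horizon == 0:
            return 0.0
        best = float('-inf')
        for action in ('a', 'b'):
            v = 0.0
            for r in (0, 1):
                # mixture probability of reward r for this action
                joint = {m: w * (MODELS[m][action] if r else 1 - MODELS[m][action])
                         for m, w in weights.items()}
                p = sum(joint.values())
                if p == 0.0:
                    continue
                posterior = {m: q / p for m, q in joint.items()}  # Bayes update
                v += p * (r + value(posterior, horizon - 1))
            best = max(best, v)
        return best

    print(value(PRIOR, horizon=3))  # 2.5: one exploratory step, then exploit

The printed value 2.5 shows planned exploration emerging from the formula: the first action earns 0.5 in expectation but identifies the true environment, after which the agent collects reward 1 on each of the remaining two steps.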

Approximations

- Exact computation of probabilities is often intractable. Approximate the solution, not the problem!
- Any heuristic method needs to be gauged against the Solomonoff/AIXI gold standard. Cheap surrogates will fail in critical situations.
- Established/good approximations: MDL, MML, CTW, universal search, Monte Carlo, ... (a sketch of CTW's core estimator follows below).
- AGI needs an anytime algorithm powerful enough to approach Solomonoff/AIXI in the limit of infinite computation time.
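
As one tiny concrete building block: the Krichevsky-Trofimov (KT) estimator below is the leaf-level bit predictor used inside CTW, one of the approximations named above. The snippet is my sketch of the standard formula; kt_code_length additionally shows the prediction = compression view by summing sequential log losses.

    # KT estimator: P(next bit = 1) = (ones + 1/2) / (zeros + ones + 1),
    # the Bayes predictor under a Beta(1/2, 1/2) prior, used at every
    # node of Context Tree Weighting (CTW).
    import math

    def kt_prob_one(zeros: int, ones: int) -> float:
        """P(next bit = '1') after observing the given counts."""
        return (ones + 0.5) / (zeros + ones + 1.0)

    def kt_code_length(bits: str) -> float:
        """Bits needed to encode the string sequentially with KT predictions."""
        zeros = ones = 0
        total = 0.0
        for b in bits:
            p1 = kt_prob_one(zeros, ones)
            total += -math.log2(p1 if b == '1' else 1.0 - p1)
            ones += b == '1'
            zeros += b == '0'
        return total

    print(kt_code_length('1111111111'))  # ~2.5 bits: regularity is exploited
    print(kt_code_length('1010110001'))  # near 10 bits: looks random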

Summary

- Conceptually and mathematically the problem of induction is solved. Computational problems and some philosophical questions remain.
- Ingredients for induction and prediction: Ockham, Epicurus, Turing, Bayes, Kolmogorov, Solomonoff. For decisions and actions: include Bellman.
- Mathematical results: consistency, bounds, optimality, and many others.
- Most philosophical riddles around induction are solved.
- Experimental results via practical compressors.

Induction ≈ Science ≈ Machine Learning ≈ Ockham's razor ≈ Compression ≈ Intelligence.

Advice to Philosophers & AGI Researchers

- Embrace U(A)I as the best conceptual solution of the induction/AGI problem so far.
- Stand on the shoulders of giants like Bayes, Shannon, Turing, Kolmogorov, Solomonoff, Wallace, Rissanen, Bellman.
- Work out defects / what is missing, and try to improve U(A)I, or
- work on alternatives, but then benchmark your approach against state-of-the-art U(A)I.
- Cranks who have not understood the giants and try to reinvent the wheel from scratch can safely be ignored.

Motto: "Never trust a theory if it is not supported by an experiment", with "theory" and "experiment" swapped: never trust an experiment if it is not supported by a theory.

When it's OK to ignore U(A)I

- if your pursued approach already works sufficiently well,
- if your problem is simple enough (e.g. i.i.d.),
- if you do not care about a principled/sound solution,
- if you're happy to succeed by trial-and-error (with restrictions).

Information Theory

Information theory plays an even more significant role for induction than this presentation might suggest. Algorithmic Information Theory is superior to Shannon information. There are AIT versions that even capture meaningful information.

Outlook

- Use compression size as a general performance measure (like perplexity is used in speech); a minimal sketch follows below.
- Via the code-length view, many approaches become comparable and may be regarded as approximations to UI.
- This should lead to better compression algorithms, which in turn should lead to better learning algorithms.
- Address open problems in induction within the UI framework.
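
A minimal sketch of the code-length view (my example, using the general-purpose zlib compressor as a crude stand-in for a learned model): whichever "model" assigns the data a shorter code has, in this view, captured more of its structure.

    # Compression size as a rough, model-agnostic performance measure.
    import os
    import zlib

    def code_length(data: bytes) -> int:
        """Compressed size in bytes; smaller = more regularity captured."""
        return len(zlib.compress(data, level=9))

    structured = b'0123456789' * 100   # 1000 bytes with obvious regularity
    random_ish = os.urandom(1000)      # 1000 bytes with (almost) none

    print(code_length(structured))  # small: the 'model' found the pattern
    print(code_length(random_ish))  # around 1000: incompressible data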

Thanks! Questions?

Details:
- Paper 1: A Philosophical Treatise of Universal Induction. Entropy, 13:6 (2011) 1076-1136.
- Paper 2: Universal Intelligence: A Definition of Machine Intelligence. Minds & Machines, 17:4 (2007) 391-444.
- The book intends to excite a broader AI audience about abstract Algorithmic Information Theory and to inform theorists about exciting applications to AI:

  Decision Theory     = Probability + Utility Theory
        +                    +
  Universal Induction = Ockham + Bayes + Turing
        =                    =
  A Unified View of Artificial Intelligence