Statistical and Inductive Inference by Minimum Message Length

C.S. Wallace. Statistical and Inductive Inference by Minimum Message Length. With 22 Figures. Springer.

Contents

Preface

1. Inductive Inference: Introduction; Inductive Inference; The Demise of Theories; Approximate Theories; Explanation; Explanatory Power; The Explanation Message; Random Variables; Probability; Independence; Discrete Distributions; Example: The Binomial Distribution; Continuous Distributions; Expectation; Non-Bayesian Inference; Non-Bayesian Estimation; Non-Bayesian Model Selection; Bayesian Inference; Bayesian Decision Theory; The Origins of Priors; Previous Likelihood; Conjugate Priors; The Jeffreys Prior; Uninformative Priors; Maximum-Entropy Priors; Invariant Conjugate Priors; Summary; Summary of Statistical Critique

2. Information: Shannon Information; Binary Codes; Optimal Codes; Measurement of Information; The Construction of Optimal Codes; Coding Multi-Word Messages; Arithmetic Coding; Some Properties of Optimal Codes; Non-Binary Codes: The Nit; The Subjective Nature of Information; The Information Content of a Multinomial Distribution; Entropy; Codes for Infinite Sets; Unary and Punctuated Binary Codes; Optimal Codes for Integers; Feasible Codes for Infinite Sets; Universal Codes; Algorithmic Complexity; Turing Machines; Start and Stop Conditions; Dependence on the Choice of Turing Machine; Turing Probability Distributions; Universal Turing Machines; Algorithmic Complexity vs. Shannon Information; Information, Inference and Explanation; The Second Part of an Explanation; The First Part of an Explanation; Theory Description Codes as Priors; Universal Codes in Theory Descriptions; Relation to Bayesian Inference; Explanations and Algorithmic Complexity; The Second Part; The First Part; An Alternative Construction; Universal Turing Machines as Priors; Differences among UTMs; The Origins of Priors Revisited; The Evolution of Priors

3. Strict Minimum Message Length (SMML): Problem Definition; The Set X of Possible Data; The Probabilistic Model of Data; Coding of the Data; The Set of Possible Inferences; Coding the Inference; Prior Probability Density; Meaning of the Assertion; The Strict Minimum Message Length Explanation for Discrete Data; Discrete Hypothesis Sets; Minimizing Relations for SMML; Binomial Example; Significance of I ~ I; Non-Uniqueness of Sufficient Statistics; Binomial Example Using a Sufficient Statistic; An Exact Algorithm for the Binomial Problem; A Solution for the Trinomial Distribution; The SMML Explanation for Continuous Data; Mean of a Normal; A Boundary Rule for Growing Data Groups; Estimation of Normal Mean with Normal Prior; Mean of a Multivariate Normal; Summary of Multivariate Mean Estimator; Mean of a Uniform Distribution of Known Range; Some General Properties of SMML Estimators; Property 1: Data Representation Invariance; Property 2: Model Representation Invariance; Property 3: Generality; Property 4: Dependence on Sufficient Statistics; Property 5: Efficiency; Discrimination; Example: Discrimination of a Mean; Summary

4. Approximations to SMML: The "Ideal Group" (IG) Estimator; SMML-like codes; Ideal Data Groups; The Estimator; The Neyman-Scott Problem; The Ideal Group Estimator for Neyman-Scott; Other Estimators for Neyman-Scott; Maximum Likelihood for Neyman-Scott; Marginal Maximum Likelihood; Kullback-Leibler Distance; Minimum Expected K-L Distance (MEKL); Minimum Expected K-L Distance for Neyman-Scott; Blurred Images; Dowe's Approximation I1D to the Message Length; Random Coding of Estimates; Choosing a Region in O; Partitions of the Hypothesis Space; The Meaning of Uncertainty Regions; Uncertainty via Limited Precision; Uncertainty via Dowe's I1D Construction; What Uncertainty Is Described by a Region?; Summary

5. MML: Quadratic Approximations to SMML: The MML Coding Scheme; Assumptions of the Quadratic MML Scheme; A Trap for the Unwary; Properties of the MML Estimator; An Alternative Expression for Fisher Information; Data Invariance and Sufficiency; Model Invariance; Efficiency; Multiple Parameters; MML Multi-Parameter Properties; The MML Message Length Formulae; Standard Formulae; Small-Sample Message Length; Curved-Prior Message Length; Singularities in the Prior; Large-D Message Length; Approximation Based on I; Precision of Estimate Spacing; Empirical Fisher Information; Formula I1A for Many Parameters; Irregular Likelihood Functions; Transformation of Empirical Fisher Information; A Safer? Empirical Approximation to Fisher Information; A Binomial Example; The Multinomial Distribution; Irregularities in the Binomial and Multinomial Distributions; Limitations; The Normal Distribution; Extension to the Neyman-Scott Problem; Negative Binomial Distribution; The Likelihood Principle

6. MML Details in Some Interesting Cases: Geometric Constants; Conjugate Priors for the Normal Distribution; Conjugate Priors for the Multivariate Normal Distribution; Normal Distribution with Perturbed Data; Normal Distribution with Coarse Data; von Mises-Fisher Distribution; Circular von Mises-Fisher distribution; Spherical von Mises-Fisher Distribution; Poisson Distribution; Linear Regression and Function Approximation; Linear Regression; Function Approximation; Mixture Models; ML Mixture Estimation: The EM Algorithm; A Message Format for Mixtures; A Coding Trick; Imprecise Assertion of Discrete Parameters; The Code Length of Imprecise Discrete Estimates; A Surrogate Class Label "Estimate"; The Fisher Information for Mixtures; The Fisher Information with Class Labels; Summary of the Classified Model; Classified vs. Unclassified Models; A "Latent Factor" Model; Multiple Latent Factors

7. Structural Models: Inference of a Regular Grammar; A Mealey Machine Representation; Probabilistic FSMs; An Assertion Code for PFSMs; A Less Redundant FSM Code; Transparency and Redundancy; Coding Transitions; An Example; Classification Trees and Nets; A Decision Tree Explanation; Coding the Tree Structure; Coding the Class Distributions at the Leaves; Decision Graphs and Other Elaborations; A Binary Sequence Segmentation Problem; The Kearns et al. "MDL" Criterion; Correcting the Message Length; Results Using the MML Criterion; An SMML Approximation to the Sequence Problem; Learning Causal Nets; The Model Space; The Message Format; Equivalence Sets; Insignificant Effects; Partial Order Equivalence; Structural Equivalence; Explanation Length; Finding Good Models; Prior Constraints; Test Results

8. The Feathers on the Arrow of Time: Closed Systems and Their States; Reversible Laws; Entropy as a Measure of Disorder; Why Entropy Will Increase; A Paradox?; Deducing the Past; Macroscopic Deduction; Deduction with Deterministic Laws, Exact View; Deduction with Deterministic Laws, Inexact View; Deduction with Non-deterministic Laws; Alternative Priors; A Tale of Two Clocks; Records and Memories; Induction of the Past (A la recherche du temps perdu); Induction of the Past by Maximum Likelihood; Induction of the Past by MML; The Uses of Deduction; The Inexplicable; Induction of the Past with Deterministic Laws; Causal and Teleological Explanations; Reasons for Asymmetry; Summary: The Past Regained?; Gas Simulations; Realism of the Simulation; Backtracking to the Past; Diatomic Molecules; The Past of a Computer Process; Addendum: Why Entropy Will Increase (Additional Simulation Details); Simulation of the Past; A Non-Adiabatic Experiment

9. MML as a Descriptive Theory: The Grand Theories; Primitive Inductive Inferences; The Hypotheses of Natural Languages; The Efficiencies of Natural Languages; Some Inefficiencies of Natural Languages; Scientific Languages; The Practice of MML Induction; Human Induction; Evolutionary Induction; Experiment

10. Related Work: Solomonoff; Prediction with Generalized Scoring; Is Prediction Inductive?; A Final Quibble; Rissanen, MDL and NML; Normalized Maximum Likelihood; Has NML Any Advantage over MML?

Bibliography

Index

The Minimum Message Length Principle for Inductive Inference

Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology, School of Population Health, University of Melbourne. University of Helsinki, August 25,