Cromwell's principle idealized under the theory of large deviations

Cromwell's principle idealized under the theory of large deviations
Seminar, Statistics and Probability Research Group, University of Ottawa, Ottawa, Ontario, April 27, 2018
David Bickel, University of Ottawa

Cromwell's principle
Cromwell's principle: P(model is wrong) > 0, i.e., the prior probability that the assumptions are incorrect is greater than 0. D. Lindley (2013), Understanding Uncertainty.
Relevance to Bayesian model checking: diagnostics often reveal that prior distributions require revision, which is impossible under Bayes's theorem if P(model) = 1.
Idealized Cromwell's principle: P(model) vanishes.
Large-deviations version of the infinitesimal probability due to A. Hájek and M. Smithson (2012), Synthese 187, 33-48.
D. R. Bickel (2018), Statistics, http://bit.ly/2vke23q
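
To make the obstacle explicit, here is a standard one-line application of Bayes's theorem (not on the slide): if the model M carries prior probability 1, no data y can lower its posterior probability, whatever the diagnostics reveal.

```latex
\[
P(M \mid y)
  = \frac{P(y \mid M)\,P(M)}{P(y \mid M)\,P(M) + P(y \mid M^{c})\,P(M^{c})}
  = \frac{P(y \mid M)\cdot 1}{P(y \mid M)\cdot 1 + P(y \mid M^{c})\cdot 0}
  = 1 .
\]
```

Hence a model assigned prior probability 1 can never be revised away, which is why Cromwell's principle demands P(model is wrong) > 0.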

Empirical distribution converges to maximum entropy: weak convergence to the maximum entropy distribution. D. R. Bickel (2018), Statistics, http://bit.ly/2vke23q
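
The weak-convergence claim is in the spirit of the Gibbs conditioning principle from large-deviations theory. The sketch below is illustrative only (an exponential base measure, a sample-mean constraint, and arbitrary Monte Carlo settings are my assumptions, not the talk's example): conditional on a rare moment constraint, the empirical distribution of i.i.d. draws looks like the exponentially tilted, maximum relative entropy distribution satisfying that constraint.

```python
# Illustrative sketch of Gibbs conditioning (assumed settings, not from the talk).
import numpy as np

rng = np.random.default_rng(0)
n, reps, target = 100, 100_000, 1.3              # sample size, replicates, mean constraint

samples = rng.exponential(1.0, size=(reps, n))   # base measure: Exp with mean 1
keep = samples[samples.mean(axis=1) >= target]   # condition on the rare event

# The tilted / maximum relative entropy solution under E[X] = target is Exp(mean = target).
x = np.linspace(0.0, 8.0, 100)
empirical_cdf = (keep.ravel()[:, None] <= x).mean(axis=0)
tilted_cdf = 1.0 - np.exp(-x / target)

print("kept samples:", keep.shape[0], "of", reps)
print("max CDF discrepancy:", np.abs(empirical_cdf - tilted_cdf).max())
```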

Normal-normal CDFs (figure). D. R. Bickel (2018), Statistics, http://bit.ly/2vke23q

Assessing multiple models (diagram): among all models, those in conflict with the data are set against the models of sufficient evidence.

Bayesian model assessment
Does the prior or parametric family conflict with the data? Yes, if it has relatively low evidence.
Measures of evidence satisfying the criteria of Bickel, International Statistical Review 81, 188-206:
- Likelihood ratio (limited to comparing two simple models)
- Bayes factor
- Penalized likelihood ratio
If no: use the model for decisions (minimize posterior expected loss), even if other models are adequate.
If yes: change the model to an adequate model, a model of high evidence.
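
For concreteness, here is a minimal sketch of one such evidence measure, the Bayes factor, in a normal location problem with known variance and a conjugate prior under the alternative. The hypotheses, prior scale, and data below are illustrative assumptions, not the talk's example.

```python
# Bayes factor as an evidence measure: H0: mu = 0 versus H1: mu ~ N(0, tau^2),
# with i.i.d. data y_i ~ N(mu, sigma^2) and sigma known. Since ybar is sufficient
# for mu, the Bayes factor equals the ratio of the marginal densities of ybar.
import numpy as np
from scipy import stats

def bayes_factor_01(y, sigma=1.0, tau=1.0):
    n, ybar = len(y), float(np.mean(y))
    m0 = stats.norm.pdf(ybar, loc=0.0, scale=sigma / np.sqrt(n))               # p(ybar | H0)
    m1 = stats.norm.pdf(ybar, loc=0.0, scale=np.sqrt(tau**2 + sigma**2 / n))   # p(ybar | H1)
    return m0 / m1

rng = np.random.default_rng(1)
y = rng.normal(0.8, 1.0, size=30)      # data generated away from H0
print("BF_01 =", bayes_factor_01(y))   # a small value is evidence against H0
```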

Loss function example
What act is appropriate, given multiple priors of sufficient evidence (i.e., agreement with the data), with or without confidence sets?

Hurwicz criterion

Robust Bayes act under the Hurwicz criterion (diagram): the models, their posteriors, and a loss function yield the act.

Decisions under robust Bayes rules applied to models of sufficient evidence
Diagram: the models' posteriors, together with a loss function, yield robust Bayes acts (E-admissible, Hurwicz criterion, Γ-minimax); with 1 model and 1 posterior, this reduces to the usual Bayes act.
Bickel, International Journal of Approximate Reasoning 66, 53-72
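
A toy sketch of how such robust Bayes acts could be computed when the set Γ of posteriors, the act set, and the parameter grid are all finite. The loss function, posteriors, and optimism index below are made-up placeholders; this is one reading of the criteria, not the paper's implementation.

```python
# Robust Bayes acts over a finite set Gamma of posteriors of sufficient evidence.
import numpy as np

theta = np.array([-1.0, 0.0, 1.0])                 # parameter grid
acts = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])       # candidate point estimates
loss = (acts[:, None] - theta[None, :]) ** 2       # squared-error loss, acts x theta

posteriors = np.array([                            # Gamma: three posteriors on theta
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.2, 0.7],
])

exp_loss = loss @ posteriors.T                     # acts x posteriors

# Gamma-minimax: minimize the worst-case posterior expected loss over Gamma.
gamma_minimax_act = acts[np.argmin(exp_loss.max(axis=1))]

# Hurwicz criterion: weight the best case against the worst case by an optimism index.
alpha = 0.3
hurwicz_score = alpha * exp_loss.min(axis=1) + (1 - alpha) * exp_loss.max(axis=1)
hurwicz_act = acts[np.argmin(hurwicz_score)]

# E-admissibility: an act is E-admissible if it is Bayes for some posterior in Gamma.
e_admissible_acts = acts[np.unique(np.argmin(exp_loss, axis=0))]

print(gamma_minimax_act, hurwicz_act, e_admissible_acts)
```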

Decisions under the entropy-maximizing model of sufficient evidence
Diagram: the models' posteriors are blended, via nested confidence sets, into 1 posterior, which with the loss function gives the Bayes act.
Bickel, International Journal of Approximate Reasoning 66, 53-72

Decisions under the entropy-maximizing Bayesian model of sufficient evidence
Diagram: the models' posteriors are blended, via nested confidence sets, into 1 posterior, which with the loss function gives the Bayes act.
Bickel, International Journal of Approximate Reasoning 66, 53-72

Pooling methods
Good derived a harmonic mean of p-values from Bayes's rule.
Examples of subjectively weighting each expert's distribution:
- Linear combination: minimizing a weighted sum of divergences from the distributions being combined yields a linear combination of them, and any linearly combined marginal distribution is the same whether marginalization or combination is carried out first.
- Weighted multiplicative combination of the distributions: externally Bayesian updating.
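
A minimal sketch of the two weighting schemes just listed, linear and weighted multiplicative (logarithmic) pooling, on a finite outcome space. The expert distributions and weights are invented for illustration.

```python
# Two standard opinion-pooling rules on a finite outcome space (illustrative inputs).
import numpy as np

p_experts = np.array([                  # each row: one expert's distribution
    [0.60, 0.30, 0.10],
    [0.20, 0.50, 0.30],
    [0.10, 0.30, 0.60],
])
w = np.array([0.5, 0.3, 0.2])           # subjective weights, summing to 1

# Linear opinion pool: the weighted average; it commutes with marginalization.
linear_pool = w @ p_experts

# Logarithmic opinion pool: renormalized weighted geometric mean; this weighted
# multiplicative rule is externally Bayesian (pooling and conditioning commute).
log_pool = np.prod(p_experts ** w[:, None], axis=0)
log_pool /= log_pool.sum()

print("linear:", linear_pool)
print("log   :", log_pool)
```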

Decisions under minimax-redundancy pooling of models of small codelengths
Diagram: the models' posteriors are pooled into 1 posterior; with the loss function, the act minimizing expected loss is chosen.
Bickel, International Journal of Approximate Reasoning 66, 53-72

Conflicting probabilities

The three players
- Combiner is you, the statistician who combines the candidate distributions. Goal 1: minimize error; Goal 2: beat Chooser.
- Chooser is the imaginary statistician who instead chooses a single candidate distribution. Goal 1: minimize error; Goal 2: beat Combiner.
- Nature picks the true distribution among those that are plausible, in order to help Chooser.

Information game
Information divergence and inferential gain, the utility paid to Statistician 2 (Combiner or Chooser), and the reduction to a game of the Combiner versus the Nature-Chooser coalition (formulas on the slide not transcribed).
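
One way to read the reduced game (an interpretation under assumed details, not the talk's derivation): the Combiner announces a single distribution Q, the Nature-Chooser coalition picks the candidate P_i that makes Q look worst in Kullback-Leibler divergence, and the Combiner's minimax choice is a mixture of the candidates whose weights can be found with the Blahut-Arimoto capacity iteration. A sketch on a finite outcome space:

```python
# Minimax combination over a finite set of candidate distributions (assumed setup).
import numpy as np

def kl(p, q):
    return np.sum(np.where(p > 0, p * np.log(p / q), 0.0))

def minimax_combiner(P, iters=500):
    """P: candidate distributions as rows of a (k, m) array."""
    k = P.shape[0]
    w = np.full(k, 1.0 / k)                       # weights over candidates
    for _ in range(iters):
        q = w @ P                                 # Combiner's current mixture
        w = w * np.exp([kl(p, q) for p in P])     # Blahut-Arimoto update
        w /= w.sum()
    return w @ P, w

P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.2, 0.7],
              [0.3, 0.4, 0.3]])
q_star, w_star = minimax_combiner(P)
print("minimax combination:", q_star)
print("worst-case divergence:", max(kl(p, q_star) for p in P))
```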

Decisions under minimax-redundancy pooling of models of small codelengths
Diagram: the models' posteriors are pooled into 1 posterior; with the loss function, the act minimizing expected loss is chosen.
Bickel, International Journal of Approximate Reasoning 66, 53-72

Summary diagram: the models' posteriors and a loss function lead either directly to robust Bayes acts (E-admissible, Hurwicz criterion, Γ-minimax) or, via blending with nested confidence sets or via pooling, to 1 posterior and its Bayes act.
Funded by the Natural Sciences and Engineering Research Council of Canada.