Whaddya know? Bayesian and Frequentist approaches to inverse problems
Philip B. Stark, Department of Statistics, University of California, Berkeley
Inverse Problems: Practical Applications and Advanced Analysis
Schlumberger WesternGeco, Houston, TX, 12–15 November 2012

Abstract The difference between Bayesians and Frequentists is what they presume to know about how the data were generated. Frequentists generally presume to know that the data come from a statistical model. A statistical model says what the probability distribution of the data would be, as a function of one or more unknown parameters. Frequentists might also presume to know that the parameters satisfy some constraints. Bayesians claim that and more: that the actual values of the parameters are selected at random from a set of possible values of those parameters, and that they know the probability distribution used to make that random selection. Unsurprisingly, the difference in assumptions leads to different conclusions. It also leads to different interpretations and definitions of what a good estimator is. Frequentist criteria generally quantify error over repetitions of the experiment, keeping the parameters constant. Bayesian criteria generally quantify error over repetitions of selecting the parameters, keeping the observations constant.

Acknowledgments: Much cribbed from joint work with Luis Tenorio. See: Stark, P.B. and L. Tenorio, 2010. A Primer of Frequentist and Bayesian Inference in Inverse Problems. In Large Scale Inverse Problems and Quantification of Uncertainty, Biegler, L., G. Biros, O. Ghattas, M. Heinkenschloss, D. Keyes, B. Mallick, L. Tenorio, B. van Bloemen Waanders and K. Willcox, eds. John Wiley and Sons, NY.

What's a Bayesian? Frequentists and Bayesians agree, as a matter of math, that P(A|B) = P(B|A)P(A)/P(B). The problem arises when A is a hypothesis. Then P(A) is a prior. Bayesians have no trouble with priors, e.g., "the probability that the hypothesis is true is p" ("the probability that there's oil in the formation is 10%"). To frequentists, that doesn't make sense: the hypothesis is true or not; its truth is not random. Ignorance cannot always be expressed as a probability, and hence not as a prior. Frequentist focus: evidence. Bayesian focus: degree of belief given evidence.

What's P(A)? Several standard interpretations: equally likely outcomes; frequency theory; subjective (Bayesian) theory; model-based.

Arguments for Bayesianism: Descriptive (mountains of evidence to the contrary). Normative (maybe, if you had a prior). Appeals to betting (Dutch book: if you had to bet, would you be Bayesian?). To capture constraints (frequentists can, too). It doesn't matter if you have enough data (depends: not true for improper priors or inverse problems). Makes prejudices explicit (does it?). Always gives an answer, with muy macho computations. The error bars are smaller than for frequentist methods.

How does probability enter an inverse problem? Physics: quantum, thermo. Deliberate randomization (experiments). Measurement error: real or an idealization. Convenient approximate description (stochastic model): random media, earthquakes. Priors. Probability means quite different things in these cases.

A partial taxonomy of problems: Prediction or inference? Philosophical or practical? Falsifiable/testable on a relevant timescale? Repeatable? Earth's core: inference, philosophical, not falsifiable, not repeatable. Oil exploration: prediction, practical, falsifiable, repeatable: if the method makes more money than anything else in the toolkit, use it. (But check whether it actually does.)

When are Bayesian Methods Justified? You really have a prior. Still: little reason anyone but you should be persuaded. I've never met anyone with a prior, only people who use Bayesian computations. Priors are generally chosen for convenience or habit, not because anyone deeply believes they are true. Mountains of evidence that nobody is Bayesian in daily life: biology. Prediction that is practical and falsifiable on a reasonable time scale, e.g., targeted marketing, predicting server load, aiming anti-aircraft guns: can test the approach against data, repeatedly. Still, the error bars don't mean much. I feel the same about tea leaves and chicken bones. Checking theoretical properties of frequentist methods: if an estimate is not Bayes for some prior (and the loss in question), it is inadmissible; might want a different method. Bayesian uncertainties are not to be trusted in high-stakes decisions if they are smaller than frequentist uncertainties. Frequentist uncertainties based on an invented model are not to be trusted, either.

Does God play Dice with the Universe? Einstein: no. Bayesians: yes. PBS: what's the potential downside of acting as if she does, if she doesn't? Can compare performance of frequentist and Bayes estimators, both with and without a prior. Can evaluate a Bayes estimator from the frequentist perspective and vice versa. Comparing minimax risk to Bayes risk measures how much information the prior adds.

Terminology. State of the world, θ: mathematical representation of a physical property, e.g., seismic velocity. Θ: set of possible states of the world, incorporating any prior constraints. Know that θ ∈ Θ. Parameter λ = λ[θ] is a property of θ. Data Y take values in the sample space Y. Statistical model gives the distribution of Y for any θ ∈ Θ: if θ = η, then Y ∼ IP_η. Density of IP_η (with respect to a dominating measure µ) at y: p_η(y) ≡ dIP_η/dµ (y). Likelihood of η given Y = y is p_η(y) as a function of η, with y fixed. Estimators: quantities that can be computed from the data Y without knowing the state of the world.

Priors and Posteriors. Prior π is a probability distribution on Θ. Assume p_η(y) is jointly measurable in (η, y). Then π and IP_η determine the joint distribution of θ and Y. Marginal distribution of Y has density m(y) = ∫_Θ p_η(y) π(dη). Posterior distribution of θ given Y = y: π(dη | Y = y) = p_η(y) π(dη) / m(y). Marginal posterior distribution of λ[θ] given Y = y, π_λ(dl | Y = y), defined by π_λ(Λ | Y = y) = ∫_{η : λ[η] ∈ Λ} π(dη | Y = y).

Bounded Normal Mean. θ ∈ Θ ≡ [−τ, τ]. Y = θ + Z, Z ∼ N(0, 1). Density of IP_θ at y is φ_θ(y) ≡ (1/√(2π)) e^{−(y−θ)²/2}. Likelihood of η given Y = y is φ_η(y) as a function of η, with y fixed.

Priors for the Bounded Normal Mean. π needs to assign probability 1 to the interval [−τ, τ]. Every choice of π has more information than the constraint θ ∈ Θ, because it specifies the chance that θ is in each subset of Θ. One choice: the "uninformative" uniform distribution on [−τ, τ], with density U_τ(η) ≡ (1/(2τ)) 1_{[−τ,τ]}(η). Density of the predictive distribution of Y is m(y) = (1/(2τ)) ∫_{−τ}^{τ} φ_η(y) dη = (1/(2τ)) (Φ(y + τ) − Φ(y − τ)). Posterior distribution of θ given Y = y is π(dη | Y = y) = φ_η(y) 1_{[−τ,τ]}(η) dη / (Φ(y + τ) − Φ(y − τ)).
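As a numerical check of these formulas, here is a minimal sketch (the helper names `Phi`, `phi`, and `posterior_density` are mine, not from the slides). It evaluates the posterior density φ_η(y) 1_{[−τ,τ]}(η) / (Φ(y + τ) − Φ(y − τ)), using φ_η(y) = φ(y − η), and confirms that it integrates to 1 over [−τ, τ]:

```python
from math import erf, exp, pi, sqrt

def Phi(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def phi(x):
    # Standard normal density.
    return exp(-x * x / 2.0) / sqrt(2.0 * pi)

def posterior_density(eta, y, tau):
    # Posterior density of theta at eta, given Y = y, under the
    # uniform prior on [-tau, tau]: phi(y - eta) restricted to the
    # interval, renormalized by Phi(y + tau) - Phi(y - tau).
    if eta < -tau or eta > tau:
        return 0.0
    return phi(y - eta) / (Phi(y + tau) - Phi(y - tau))

# Sanity check: the posterior integrates to 1 over [-tau, tau]
# (midpoint rule on a fine grid).
tau, y, n = 3.0, 0.7, 20000
h = 2 * tau / n
total = sum(posterior_density(-tau + (i + 0.5) * h, y, tau)
            for i in range(n)) * h
print(round(total, 4))  # -> 1.0
```

The normalization works because ∫_{−τ}^{τ} φ(y − η) dη = Φ(y + τ) − Φ(y − τ).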

[Figure: posterior densities of θ for a bounded normal mean with a uniform prior on the interval [−3, 3], for four values of y.]

Comparing estimators. Risk function: a way to compare estimators: the expected cost of using a particular estimator when the world is in a given state. Choice of risk function should be informed by the scientific goal, but is often made for convenience. Generally, no estimator has the smallest risk for all θ ∈ Θ. Bayesian and frequentist methods trade off the risks for different θ differently.

Decision Theory. Two-player game: Nature versus analyst. Frequentist and Bayesian games differ. In both, Nature and the analyst know Θ, IP_η for all η ∈ Θ, λ, and the payoff rule L(l, λ[η]). Nature selects θ from Θ. Analyst selects estimator λ̂. Analyst does not know θ, and Nature does not know λ̂. Data Y are generated using the value of θ that Nature selected, plugged into λ̂, and L(λ̂(Y), λ[θ]) is found. With θ fixed, a new value of Y is generated, and L(λ̂(Y), λ[θ]) is calculated again; repeat. Analyst pays the average value of L(λ̂(Y), λ[θ]), the risk of λ̂ at θ, denoted ρ_θ(λ̂, λ[θ]).
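The game can be played on a computer. A sketch (my own, assuming squared-error loss and, for concreteness, the truncation estimator that clamps Y to the constraint interval): Nature fixes θ, fresh data are drawn round after round, and the average payment approximates the risk ρ_θ:

```python
import random

def truncated_estimator(y, tau):
    # Clamp the raw observation to the constraint set [-tau, tau].
    return max(-tau, min(tau, y))

def empirical_risk(theta, estimator, tau, n_rounds=100000, seed=0):
    # Nature has fixed theta; each round draws fresh Y = theta + Z,
    # the analyst pays squared-error loss, and the average payment
    # estimates the risk of the estimator at theta.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rounds):
        y = theta + rng.gauss(0.0, 1.0)
        total += (estimator(y, tau) - theta) ** 2
    return total / n_rounds

tau = 3.0
r = empirical_risk(0.0, truncated_estimator, tau)
print(round(r, 2))
```

At θ = 0 with τ = 3 the truncation rarely binds, so the estimated risk is close to the unit variance of the noise.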

Bayesian Rules v Frequentist Rules. Bayesian version: Nature selects θ at random according to the prior distribution π, and the analyst knows π. Frequentist version: the analyst does not know how Nature will select θ from Θ. Essential difference between the frequentist and Bayesian viewpoints: Bayesians claim to know more about how Nature generates the data. A cautious frequentist might select λ̂ to minimize her worst-case risk, on the assumption that Nature might play intelligently to win as much as possible: the minimax estimator. A Bayesian would select the estimator that minimizes the average risk, on the assumption that Nature selects θ at random from π: the Bayes estimator.

Duality between Bayes Risk and Minimax Risk. Bayes risk depends on Θ, {IP_η : η ∈ Θ}, λ, L, and π. Consider allowing π to vary over a rich set of possible priors. The prior for which the Bayes risk is largest is least favorable. Typically not the "uninformative" prior. Under some technical conditions, the Bayes risk for the least favorable prior is equal to the minimax risk. If the Bayes risk is much smaller than the minimax risk, the prior added information not present in the constraint itself.
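One half of this duality is elementary: for any fixed estimator, its average risk under any prior cannot exceed its worst-case risk, so every Bayes risk is at most the minimax risk. A small Monte Carlo sketch for the truncation estimator (function names are mine; the uniform prior is approximated by averaging over a grid):

```python
import random

def clip(y, tau):
    # Truncation estimator: project Y onto [-tau, tau].
    return max(-tau, min(tau, y))

def mc_risk(theta, tau, n=20000, seed=1):
    # Monte Carlo squared-error risk of the truncation estimator at theta.
    rng = random.Random(seed)
    return sum((clip(theta + rng.gauss(0.0, 1.0), tau) - theta) ** 2
               for _ in range(n)) / n

tau = 3.0
grid = [-tau + 2.0 * tau * i / 30 for i in range(31)]
risks = [mc_risk(t, tau) for t in grid]
avg_risk = sum(risks) / len(risks)  # ~ Bayes risk under the uniform prior
max_risk = max(risks)               # ~ worst-case (frequentist) risk
print(avg_risk <= max_risk)  # -> True: an average never exceeds the max
```

The gap max_risk − avg_risk is one crude measure of how much information the uniform prior adds beyond the interval constraint.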

What's it all about, α? MSE. Posterior MSE. Confidence levels. Credible levels.

Mean Squared Error. Suppose λ[θ] takes values in a Hilbert space. Estimate λ[θ] by λ̂(Y). MSE of λ̂ when θ = η is MSE(λ̂, η) ≡ IE_η ‖λ̂(Y) − λ[η]‖². Depends on η: the expectation is with respect to IP_η. The true θ is unknown, so one can't select λ̂ to minimize the MSE at the truth. Might choose λ̂ to make the largest MSE over η ∈ Θ as small as possible: the minimax MSE estimator.
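For the bounded normal mean the MSE of simple estimators is available in closed form. A sketch (helper names mine) comparing the truncation estimator clip(Y, [−τ, τ]) with the raw estimator Y, whose MSE is identically 1; with a = −τ − θ and b = τ − θ, the identity ∫ z²φ(z) dz = Φ(z) − zφ(z) gives the formula below:

```python
from math import erf, exp, pi, sqrt

def Phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def phi(x):
    return exp(-x * x / 2.0) / sqrt(2.0 * pi)

def mse_truncated(theta, tau):
    # Closed-form MSE of clip(Y, [-tau, tau]) when Y = theta + Z,
    # Z ~ N(0, 1): integrate z^2 over the unclipped range [a, b],
    # plus a^2 and b^2 times the clipped tail probabilities.
    a, b = -tau - theta, tau - theta
    return (Phi(b) - Phi(a) - b * phi(b) + a * phi(a)
            + a * a * Phi(a) + b * b * (1.0 - Phi(b)))

tau = 3.0
grid = [-tau + 2.0 * tau * i / 100 for i in range(101)]
worst = max(mse_truncated(t, tau) for t in grid)
# Since clipping onto an interval containing theta can only move the
# estimate closer to theta, the truncated MSE is below 1 everywhere on Theta.
print(worst < 1.0)
```

This is why a frequentist who knows θ ∈ [−τ, τ] has no reason to report the raw estimator Y: the constraint alone already improves the worst-case MSE.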

Posterior Mean Squared Error. PMSE(λ̂(y), π) ≡ IE_π[‖λ̂(y) − λ[η]‖² | Y = y]. Depends on π and the observed value y. The expectation is with respect to the posterior distribution of θ given Y = y. π is known, so one can select (for each y) the estimator with smallest PMSE. The Bayes estimator for PMSE is the marginal posterior mean: λ̂_π(y) ≡ ∫ l π_λ(dl | Y = y).
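For the bounded normal mean with uniform prior, the posterior is N(y, 1) truncated to [−τ, τ], so the Bayes estimator for PMSE, the posterior mean, has the standard truncated-normal closed form. A sketch (names mine):

```python
from math import erf, exp, pi, sqrt

def Phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def phi(x):
    return exp(-x * x / 2.0) / sqrt(2.0 * pi)

def posterior_mean(y, tau):
    # Mean of N(y, 1) truncated to [-tau, tau]: the Bayes estimator
    # for PMSE under the uniform prior on [-tau, tau].
    alpha, beta = -tau - y, tau - y
    return y + (phi(alpha) - phi(beta)) / (Phi(beta) - Phi(alpha))

tau = 3.0
print(posterior_mean(0.0, tau))  # -> 0.0 (by symmetry)
```

Unlike the truncation estimator, which sits exactly at ±τ for extreme data, the posterior mean shrinks smoothly: even for y far outside the constraint, the estimate stays strictly inside [−τ, τ].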

MSE v PMSE Both involve expectations of the squared norm of the difference between the estimate and the true value of the parameter. MSE: expectation wrt distribution of Y, holding θ = η fixed. PMSE: expectation wrt posterior distribution of θ, holding Y = y fixed.

Confidence Sets and Credible Regions. Pick α ∈ (0, 1). A random set I(Y) of possible values of λ is a 1 − α confidence set for λ[θ] if IP_η{I(Y) ∋ λ[η]} ≥ 1 − α for all η ∈ Θ. The probability is with respect to the distribution of Y, holding η fixed. Coverage probability is the (smallest) chance that I(Y) ∋ λ[η] over η ∈ Θ, with Y ∼ IP_η.
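The confidence property can be verified by simulation. A minimal sketch (my own) for the standard interval Y ± 1.96 in the normal-mean problem: hold θ fixed, draw fresh data repeatedly, and count how often the random interval catches the fixed parameter:

```python
import random

def covers(theta, n_trials=100000, z=1.96, seed=2):
    # Estimated coverage probability of [Y - z, Y + z] at a fixed theta.
    # The interval covers theta iff |Y - theta| = |Z| <= z, so the
    # simulation does not depend on theta at all -- which is exactly
    # the frequentist guarantee: ~95% coverage at *every* theta.
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_trials)
               if abs(rng.gauss(0.0, 1.0)) <= z)
    return hits / n_trials

print(round(covers(0.0), 2))  # -> 0.95
```

Note what is random here: the interval I(Y), not the parameter. That is the opposite of the credible-region calculation on the next slide.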

Posterior Credible Region. Pick α ∈ (0, 1). A set I(y) of possible values of λ is a 1 − α posterior credible region for λ[θ] if IP_{π(dθ | Y = y)}(λ[θ] ∈ I(y)) = ∫_{I(y)} π_λ(dl | Y = y) ≥ 1 − α. The probability is with respect to the posterior distribution of θ, holding Y = y fixed: the posterior probability that I(y) ∋ λ[θ], given Y = y.

Frequentist performance of Bayes estimators for a BNM. Maximum MSE of the Bayes estimator of θ for MSE risk. Frequentist coverage probability and expected length of the 95% Bayes credible interval for θ.
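The coverage calculation behind these plots can be sketched numerically. This version (names mine) uses the equal-tailed 95% credible interval, i.e., the 2.5% and 97.5% posterior quantiles of N(y, 1) truncated to [−τ, τ]; the slides' figures may use a different construction (e.g., highest posterior density), so the numbers are only illustrative. One striking feature of the equal-tailed interval: at the boundary θ = τ it never covers, because its upper endpoint is strictly inside [−τ, τ].

```python
import random
from statistics import NormalDist

N = NormalDist()

def credible_interval(y, tau, level=0.95):
    # Equal-tailed credible interval for theta given Y = y under the
    # uniform prior on [-tau, tau]: quantiles of N(y, 1) truncated
    # to the interval, via the standard-normal inverse CDF.
    c = N.cdf(-tau - y)
    d = N.cdf(tau - y)
    p = (1.0 - level) / 2.0
    lo = y + N.inv_cdf(c + p * (d - c))
    hi = y + N.inv_cdf(c + (1.0 - p) * (d - c))
    return lo, hi

def frequentist_coverage(theta, tau, n_trials=20000, seed=3):
    # Chance, over fresh data Y ~ N(theta, 1), that the credible
    # interval contains the *fixed* true theta.
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        lo, hi = credible_interval(theta + rng.gauss(0.0, 1.0), tau)
        if lo <= theta <= hi:
            hits += 1
    return hits / n_trials

tau = 3.0
cov_edge = frequentist_coverage(tau, tau)    # boundary: never covered
cov_center = frequentist_coverage(0.0, tau)  # interior: near nominal
print(cov_edge)  # -> 0.0
print(round(cov_center, 2))
```

This is the general moral of the coverage plots: a 95% credible level is a statement about the posterior, not a guarantee about coverage over repeated data, and the two can differ sharply for some θ.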

Bias², variance, and MSE of the Bayes estimator of the BNM, π = U[−3, 3]. [Figure: bias², variance, and MSE as functions of θ ∈ [−3, 3].]

MSE Risk. [Figure, two panels. Left, τ = 1/2: risk as a function of θ for the Bayes estimator, minimax affine estimator, truncated estimate, and truncated affine minimax estimator. Right, τ = 3: Bayes risk, minimax affine risk, a bound on the nonlinear minimax risk, and the risk of Y.]

Frequentist coverage of the 95% Credible Region. [Figure: frequentist coverage (%) as a function of θ, compared with the nominal 95% level, for τ = 1/2 and τ = 3.]

Expected length of the Bayesian Credible Region for the BNM. [Figure: mean length of the credible interval and of the truncated and truncated Pratt confidence intervals, as functions of θ, for τ = 1/2 and τ = 3.]

Summary. Bayesian methods require augmenting the data and physical constraints with priors. Difference between the frequentist and Bayesian viewpoints: Bayesians claim to know more about how the data are generated. Frequentists claim to know that θ ∈ Θ, but not how θ is selected from Θ. Bayesians claim to know that θ is selected at random from Θ according to the prior distribution π, which is known to them. Both claim to know IP_η, the probability distribution of the data if θ = η, for every η ∈ Θ.

Summary. Bayesian: probability measures what the analyst thinks. Frequentist: probability has to do with empirical regularities. Model-based inference: what does probability mean outside the model? Bayesian and frequentist measures of uncertainty differ. MSE and PMSE are expectations of the same thing, but with respect to different distributions: MSE with respect to the distribution of the data, holding the parameter fixed; PMSE with respect to the posterior distribution of the parameter, holding the data fixed. Coverage probability and credible level are both the chance that a set contains θ. Coverage probability is with respect to the distribution of the data, holding the parameter fixed and allowing the set to vary randomly. Credible level is with respect to the posterior distribution of the parameter, holding the data and the set fixed and allowing θ to vary randomly.

Summary. Priors add information not present in the constraints. The problem is worse in higher dimensions. Under mild conditions, the largest Bayes risk as the prior is allowed to vary equals the minimax risk as the parameter is allowed to vary. If the Bayes risk for a given prior is less than the minimax risk, beware: do your beliefs match the analyst's?