A Discussion of the Bayesian Approach

Reference: Chapter 10 of Theoretical Statistics, Cox and Hinkley, 1974, and Sujit Ghosh's lecture notes. David Madigan

Statistics. The subject of statistics concerns itself with using data to make inferences and predictions about the world. Researchers assembled the vast bulk of the statistical knowledge base prior to the availability of significant computing. Lots of assumptions and brilliant mathematics took the place of computing and led to useful and widely-used tools. But there are serious limits on the applicability of many of these methods: small data sets, unrealistically simple models, and they produce hard-to-interpret outputs like p-values and confidence intervals.

Bayesian Statistics. The Bayesian approach has deep historical roots but required the algorithmic developments of the late 1980s before it was of any practical use. The old, sterile Bayesian-frequentist debates are a thing of the past; most data analysts take a pragmatic point of view and use whatever is most useful.

Think about this. Let θ denote the probability that the next operation in hospital A results in a death. Use the data to estimate (i.e., guess the value of) θ.

Introduction. The classical approach treats θ as fixed and draws on a repeated-sampling principle. The Bayesian approach regards θ as the realized value of a random variable Θ with density f_Θ(θ) ("the prior"). This makes life easier because it is clear that if we observe data X = x, then we need to compute the conditional density of Θ given X = x ("the posterior"). The Bayesian critique focuses on the legitimacy and desirability of introducing the random variable Θ and of specifying its prior distribution.
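
Concretely, the posterior comes from Bayes' theorem:

f_{Θ|X}(θ | x) = f(x | θ) f_Θ(θ) / ∫ f(x | θ′) f_Θ(θ′) dθ′,

with the integral replaced by a sum when Θ is discrete.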

Bayesian Estimation. e.g. the beta-binomial model: with a Beta(a, b) prior and r successes in n trials,

p(θ | D) ∝ p(D | θ) p(θ) ∝ θ^r (1 − θ)^(n−r) · θ^(a−1) (1 − θ)^(b−1) = θ^(a+r−1) (1 − θ)^(b+n−r−1),

so the posterior is Beta(a + r, b + n − r). Predictive distribution:

p(x_(n+1) | D) = ∫ p(x_(n+1) | θ) p(θ | D) dθ.
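
A minimal R sketch of this update (the prior parameters and data here are illustrative, not from the slides):

a <- 1; b <- 1                      # illustrative Beta(a, b) prior
n <- 10; r <- 7                     # illustrative data: r successes in n trials
aPost <- a + r; bPost <- b + n - r  # posterior is Beta(a + r, b + n - r)
aPost / (aPost + bPost)             # predictive probability that the next trial succeeds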

Interpretations of Prior Distributions
1. As frequency distributions
2. As normative and objective representations of what it is rational to believe about a parameter, sometimes in a state of ignorance
3. As a subjective measure of what a particular individual, you, actually believes

Prior Frequency Distributions. Sometimes the parameter value may be generated by a stable physical mechanism that may be known, or inferred from previous data; e.g. a parameter that measures a property of a batch of material in an industrial inspection problem. Data on previous batches allow the estimation of a prior distribution, which then has a physical interpretation in terms of frequencies.

Normative/Objective Interpretation. Central problem: specifying a prior distribution for a parameter about which nothing is known. If θ can only take a finite set of values, it seems natural to assume all values equally likely a priori. This can have odd consequences. For example, specifying a uniform prior over the 16 regression models on four candidate variables: [], [1], [2], [3], [4], [12], [13], [14], [23], [24], [34], [123], [124], [134], [234], [1234] assigns prior probability 6/16 to the class of 2-variable models but only 4/16 to the class of 3-variable models.
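
A one-line check of these class probabilities in R:

choose(4, 0:4) / 16   # 1/16, 4/16, 6/16, 4/16, 1/16 for 0-, 1-, 2-, 3-, 4-variable models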

Continuous Parameters. Invariance arguments: e.g. for a normal mean µ, argue that all intervals (a, a+h) should have the same prior probability for any given h and all a. This leads to a uniform prior on the entire real line (an "improper" prior). For a scale parameter σ, one may say all intervals (a, ka) should have the same prior probability, leading to a prior proportional to 1/σ, again improper.

Continuous Parameters. It is natural to use a uniform prior (at least if the parameter space is of finite extent). However, if Θ is uniform, an arbitrary non-linear function g(Θ) is not. Example: p(θ) = 1, θ > 0. Re-parametrize as γ = g(θ); then p(γ) = p(g⁻¹(γ)) |dg⁻¹(γ)/dγ|, which is constant in γ only when g is linear, so that ignorance about θ does not imply ignorance about γ. Is the notion of prior ignorance untenable?
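
A small simulation illustrates the point; the proper Uniform(0, 10) prior standing in for the flat prior and the choice g(θ) = 1/θ are just for illustration:

th <- runif(100000, 0, 10)         # flat (proper) prior on theta, standing in for p(theta) = 1
gam <- 1 / th                      # a nonlinear reparametrization gamma = g(theta)
hist(gam[gam < 5], breaks = 100)   # far from flat: mass piles up near gamma = 0.1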

The Jeffreys Prior (single parameter). The Jeffreys prior is given by p(θ) ∝ I(θ)^(1/2), where I(θ) = −E[∂² log p(X | θ)/∂θ²] is the expected Fisher information. This is invariant to transformation in the sense that all parametrizations lead to the same prior. One can also argue that it is uniform for a parametrization in which the likelihood is completely determined except for its location (see Box and Tiao, 1973, Section 1.3).

Jeffreys for Binomial. For r successes in n trials, the likelihood is p(r | θ) ∝ θ^r (1 − θ)^(n−r), so I(θ) = n/(θ(1 − θ)) and the Jeffreys prior is p(θ) ∝ θ^(−1/2) (1 − θ)^(−1/2), which is a beta density with parameters ½ and ½.
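
A quick Monte Carlo check of the information calculation (θ and n chosen arbitrarily):

theta <- 0.3; n <- 50
r <- rbinom(200000, n, theta)
# observed information: -d^2/dtheta^2 log p(r | theta) = r/theta^2 + (n - r)/(1 - theta)^2
mean(r / theta^2 + (n - r) / (1 - theta)^2)   # approx 238.1
n / (theta * (1 - theta))                     # exact expected information: 238.095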

Other Jeffreys Priors. Standard single-parameter cases: for a normal mean μ (σ known), p(μ) ∝ 1; for a normal scale σ (μ known), p(σ) ∝ 1/σ; for a Poisson rate λ, p(λ) ∝ λ^(−1/2).

Improper Priors => Trouble (sometimes). Suppose Y₁, …, Yₙ are independently normally distributed with constant variance σ² and with a mean structure involving parameters γ, β, and ρ. Suppose it is known that ρ is in [0,1], that ρ is uniform on [0,1], and that γ, β, and σ have improper priors. Then for any observations y, the marginal posterior density of ρ is proportional to h(ρ) times a factor that is not integrable over [0,1], where h is bounded and has no zeroes in [0,1]. This posterior is an improper distribution on [0,1]!

Improper prior usually => proper posterior (but not always, as the example above shows).

Another Example

Subjective Degrees of Belief. Probability represents a subjective degree of belief held by a particular person at a particular time. There are various techniques for eliciting subjective priors; for example, Good's device of imaginary results: e.g., a binomial experiment with a beta prior with a = b. Imagine the experiment yields 1 tail and n − 1 heads. How large should n be in order that we would just give odds of 2 to 1 in favor of a head occurring next? (e.g., n = 4 implies a = b = 1)
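
To see the arithmetic: with a Beta(a, a) prior and imagined data of n − 1 heads and 1 tail, the posterior is Beta(a + n − 1, a + 1), so Pr(head next) = (a + n − 1)/(2a + n); setting this equal to 2/3 (odds of 2 to 1) gives a = n − 3. In R:

# Pr(next toss is a head) after imagining n-1 heads and 1 tail under a Beta(a, a) prior
pHead <- function(a, n) (a + n - 1) / (2 * a + n)
pHead(a = 1, n = 4)   # 2/3, i.e. odds of 2 to 1, so n = 4 corresponds to a = b = 1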

Problems with Subjectivity. What if the prior and the likelihood disagree substantially? The subjective prior cannot be "wrong", but it may be based on a misconception; the model may also be substantially wrong. Hierarchical models are often used in practice.

General Comments. Determination of subjective priors is difficult. It is also difficult to assess the usefulness of a subjective posterior. Don't be misled by the term "subjective"; all data analyses involve appreciable personal elements.

EVVE: Var(θ) = E[Var(θ | y)] + Var(E[θ | y]).

Bayesian Compromise between Data and Prior. The posterior variance is on average (over possible data) smaller than the prior variance; the reduction is the variance of the posterior mean over the distribution of possible data.
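
A simulation sketch of this identity for the beta-binomial model (the prior and sample size are illustrative):

# Var(theta) = E[Var(theta | y)] + Var(E[theta | y])  (the "EVVE" identity)
a <- 2; b <- 3; n <- 10                  # illustrative prior and sample size
theta <- rbeta(100000, a, b)             # draw theta from the prior
y <- rbinom(100000, n, theta)            # draw data given theta
postMean <- (a + y) / (a + b + n)        # posterior mean of Beta(a + y, b + n - y)
postVar  <- (a + y) * (b + n - y) / ((a + b + n)^2 * (a + b + n + 1))
mean(postVar) + var(postMean)            # approximately equals the prior variance:
a * b / ((a + b)^2 * (a + b + 1))        # = 0.04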

Posterior Summaries. Mean, median, mode, etc. Central 95% interval versus highest posterior density (HPD) region (normal mixture example).
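
A sketch of why the two summaries differ for a bimodal posterior (the mixture here is made up for illustration):

x <- c(rnorm(50000, -3, 1), rnorm(50000, 3, 1))   # draws from a bimodal "posterior"
quantile(x, c(0.025, 0.975))                      # central interval spans the low-density middle
# a 95% HPD region here is two disjoint intervals, one around each mode,
# with smaller total length than the central interval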

Conjugate priors. A family of priors is conjugate for a given likelihood if the posterior always belongs to the same family as the prior.

Example: Football Scores. The point spread: Team A might be favored to beat Team B by 3.5 points; the prior probability that A wins by 4 points or more is then 50%. We treat point spreads as given; in fact there should be an uncertainty measure associated with the point spread.

Example: Football Scores. (outcome − spread) seems roughly normal, e.g., N(0, 14²). Pr(favorite wins | spread = 3.5) = Pr(outcome − spread > −3.5) = 1 − Φ(−3.5/14) = 0.598. Pr(favorite wins | spread = 9.0) = 0.74.
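
These are one-line normal calculations, e.g. in R:

1 - pnorm(-3.5 / 14)   # 0.598
1 - pnorm(-9.0 / 14)   # 0.74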

Example: Football Scores, cont. Model: X = outcome − spread ~ N(0, σ²). Prior for σ²? The inverse-gamma (equivalently, the scaled inverse-χ²) is conjugate.

Example: Football Scores, cont. n = 2240 and v = 187.3.

Prior               Posterior
Inv-χ²(3, 10)    => Inv-χ²(2243, 187.1)
Inv-χ²(1, 50)    => Inv-χ²(2241, 187.2)
Inv-χ²(100, 180) => Inv-χ²(2340, 187.0)
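
The rule behind this table is the standard scaled inverse-χ² update: an Inv-χ²(ν₀, s₀²) prior combined with n observations with average squared deviation v gives Inv-χ²(ν₀ + n, (ν₀s₀² + nv)/(ν₀ + n)). Reproducing the first row in R:

nu0 <- 3; s20 <- 10                 # prior Inv-chisq(nu0, s20)
n <- 2240; v <- 187.3               # data summary: n and v = mean(x^2)
nuN <- nu0 + n                      # 2243
s2N <- (nu0 * s20 + n * v) / nuN    # 187.06, matching the table's 187.1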

Example: Football Scores. Pr(favorite wins | spread = 3.5) = Pr(outcome − spread > −3.5) = 1 − Φ(−3.5/σ) (≈ 0.598 for σ = 14). Simulate from the posterior to propagate the uncertainty in σ:

rsinvchisq <- function(n, nu, s2) nu * s2 / rchisq(n, nu)  # scaled inverse-chi-square sampler (not in base R)
postSigmaSample <- sqrt(rsinvchisq(10000, 2340, 187.0))
hist(1 - pnorm(-3.5 / postSigmaSample), nclass = 50)

Example: Football Scores, cont. n = 10 and v = 187.3.

Prior               Posterior
Inv-χ²(3, 10)    => Inv-χ²(13, 146.4)
Inv-χ²(1, 50)    => Inv-χ²(11, 174.8)
Inv-χ²(100, 180) => Inv-χ²(110, 180.7)

Prediction. Posterior predictive density of a future observation ỹ: p(ỹ | y) = ∫ p(ỹ | θ) p(θ | y) dθ. Binomial example: n = 20, x = 12, a = 1, b = 1.
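
For this example the posterior is Beta(1 + 12, 1 + 8) = Beta(13, 9); a minimal R sketch of the predictive for one future trial:

thetaPost <- rbeta(100000, 13, 9)        # posterior draws
yTilde <- rbinom(100000, 1, thetaPost)   # one future Bernoulli trial per draw
mean(yTilde)                             # approx 13/22 = 0.59, the exact predictive probability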

Prediction for Univariate Normal

Prediction for Univariate Normal. The posterior predictive distribution is normal: with known variance σ², prior μ ~ N(μ₀, τ₀²), and posterior μ | y ~ N(μₙ, τₙ²), a future observation satisfies ỹ | y ~ N(μₙ, τₙ² + σ²).
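
A simulation sketch of the known-variance case (all prior and data values are illustrative):

sigma2 <- 4; mu0 <- 0; tau02 <- 1       # illustrative known variance and N(mu0, tau02) prior
y <- c(1.2, 2.3, 0.7, 1.9)              # illustrative data
n <- length(y)
tau2n <- 1 / (1 / tau02 + n / sigma2)   # posterior variance of mu
mun   <- tau2n * (mu0 / tau02 + sum(y) / sigma2)  # posterior mean of mu
muDraw <- rnorm(100000, mun, sqrt(tau2n))
yTilde <- rnorm(100000, muDraw, sqrt(sigma2))     # predictive draws: N(mun, tau2n + sigma2)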

Prediction for a Poisson. With a Gamma(a, b) prior on the Poisson rate λ and observed counts y₁, …, yₙ, the posterior is Gamma(a + Σyᵢ, b + n), and the posterior predictive for a future count is negative binomial.
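
A simulation sketch (the Gamma prior and the counts are illustrative):

a <- 2; b <- 1                        # illustrative Gamma(a, b) prior on lambda
y <- c(3, 5, 2, 4)                    # illustrative counts
aPost <- a + sum(y); bPost <- b + length(y)
lambdaDraw <- rgamma(100000, aPost, rate = bPost)
yTilde <- rpois(100000, lambdaDraw)   # mixing Poissons over the posterior => negative binomial
head(prop.table(table(yTilde)), 10)   # approximate predictive pmf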