Bayesian statistics, simulation and software


Module 10: Bayesian prediction and model checking
Department of Mathematical Sciences, Aalborg University

Prior predictions

Suppose we want to predict future data $\tilde x$ without having observed any data $x$. Assume:

Data model: $\tilde x \mid \theta \sim \pi(\tilde x \mid \theta)$.
Prior: $\theta \sim \pi(\theta)$.

This implies a joint distribution $(\tilde x, \theta) \sim \pi(\tilde x, \theta) = \pi(\tilde x \mid \theta)\,\pi(\theta)$. From this joint distribution we obtain the marginal density of $\tilde x$,
$$\pi(\tilde x) = \int \pi(\tilde x \mid \theta)\,\pi(\theta)\,d\theta,$$
which is called the prior predictive density.

Prior prediction: Normal case, $\tau$ known

Assume:

Data model: $x \mid \mu \sim N(\mu, \tau)$, where the second argument is a precision.
Prior: $\mu \sim N(\mu_0, \tau_0)$.

Prior predictive density:
$$\pi(x) = \int \pi(x \mid \mu)\,\pi(\mu)\,d\mu = \int \sqrt{\frac{\tau}{2\pi}} \exp\!\left(-\tfrac{1}{2}\tau(x-\mu)^2\right) \sqrt{\frac{\tau_0}{2\pi}} \exp\!\left(-\tfrac{1}{2}\tau_0(\mu-\mu_0)^2\right) d\mu,$$
and a simple calculation shows
$$\pi(x) \propto \exp\!\left(-\tfrac{1}{2}\,\frac{\tau\tau_0}{\tau+\tau_0}\,(x-\mu_0)^2\right), \qquad \text{i.e.}\quad x \sim N\!\left(\mu_0, \frac{\tau\tau_0}{\tau+\tau_0}\right).$$

Easier argument: $x - \mu \sim N(0, \tau)$ is independent of $\mu \sim N(\mu_0, \tau_0)$, so
$$x \sim N\!\left(\mu_0, \Big(\tfrac{1}{\tau} + \tfrac{1}{\tau_0}\Big)^{-1}\right) = N\!\left(\mu_0, \frac{\tau\tau_0}{\tau+\tau_0}\right).$$

NB: the prior predictive variance is larger than both the prior variance and the data variance.
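The "simple calculation" is just completing the square in $\mu$; a sketch of the step (not on the original slide):
$$\tau(x-\mu)^2 + \tau_0(\mu-\mu_0)^2 = (\tau+\tau_0)\left(\mu - \frac{\tau x + \tau_0\mu_0}{\tau+\tau_0}\right)^2 + \frac{\tau\tau_0}{\tau+\tau_0}\,(x-\mu_0)^2.$$
Integrating over $\mu$, the first term gives a Gaussian integral that does not depend on $x$, leaving $\pi(x) \propto \exp\!\big(-\tfrac{1}{2}\,\frac{\tau\tau_0}{\tau+\tau_0}(x-\mu_0)^2\big)$ as stated.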

Prior predictive distribution

Illustration of the fact that prior predictive precision < prior precision (ignore the dashed line):

[Figure: densities labelled "Prior", "Prior predictive" and "Predictive if $\mu = 170$", plotted over the range 140 to 200.]

Simulating the prior predictive distribution

If the prior predictive density $\pi(\tilde x)$ is difficult to derive, we can simply simulate $\tilde x$ in two steps:

1. Generate a parameter from the prior: $\theta \sim \pi(\theta)$.
2. Conditional on $\theta$, generate $\tilde x$: $\tilde x \sim \pi(\tilde x \mid \theta)$.
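As a concrete illustration, here is a minimal Python sketch of the two-step recipe for the normal example above; the numerical values of $\mu_0$, $\tau_0$ and $\tau$ are assumptions made for this sketch only.

import numpy as np

rng = np.random.default_rng(0)

# Assumed illustrative values (precision parametrisation as on the slides)
mu0, tau0 = 170.0, 0.01   # prior mean and prior precision for mu
tau = 0.04                # data precision

N = 100_000
mu = rng.normal(mu0, 1 / np.sqrt(tau0), size=N)   # step 1: mu ~ prior
x = rng.normal(mu, 1 / np.sqrt(tau))              # step 2: x ~ data model given mu

# Compare the simulated variance with the analytic prior predictive
# variance 1/tau + 1/tau0 (the inverse of tau*tau0/(tau+tau0)).
print(x.var(), 1 / tau + 1 / tau0)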

Posterior prediction

Now suppose we have observed data $x$ and want to predict a possible future observation $\tilde x$ given the data $x$. Assume:

Data model: $x \mid \theta \sim \pi(x \mid \theta)$ and $\tilde x \mid \theta \sim \pi(\tilde x \mid \theta)$, and given $\theta$, $x$ and $\tilde x$ are independent.
Prior: $\theta \sim \pi(\theta)$.

The joint density of the predicted data $\tilde x$, the data $x$ and the parameter $\theta$ is
$$\pi(\tilde x, x, \theta) = \pi(\tilde x, x \mid \theta)\,\pi(\theta) = \pi(\tilde x \mid \theta)\,\pi(x \mid \theta)\,\pi(\theta) = \pi(\tilde x \mid \theta)\,\pi(\theta \mid x)\,\pi(x),$$
where $\pi(\tilde x \mid \theta)$ and $\pi(x \mid \theta)$ represent the same conditional distribution (namely that given by the data model).

The posterior predictive distribution is the (marginal) distribution of $\tilde x$ conditional on the data $x$:
$$\pi(\tilde x \mid x) = \int \pi(\tilde x, \theta \mid x)\,d\theta = \int \frac{\pi(\tilde x, \theta, x)}{\pi(x)}\,d\theta = \int \pi(\tilde x \mid \theta)\,\pi(\theta \mid x)\,d\theta.$$

Thus we have now replaced the prior density with the posterior density.

Simulating the posterior predictive distribution

If the posterior predictive density $\pi(\tilde x \mid x)$ is difficult to derive, we can simply simulate $\tilde x$ in two steps:

1. Generate a parameter from the posterior: $\theta \sim \pi(\theta \mid x)$.
2. Conditional on $\theta$, generate $\tilde x$ from the data model: $\tilde x \sim \pi(\tilde x \mid \theta)$.

Posterior prediction: Normal case, $\tau$ known

Data model: $X_1, X_2, \ldots, X_n$ iid $N(\mu, \tau)$.
Prior: $\mu \sim N(\mu_0, \tau_0)$.
Posterior: $\mu \mid x \sim N(\mu_1, \tau_1)$ with
$$\mu_1 = \frac{n\tau\bar x + \tau_0\mu_0}{n\tau + \tau_0} \qquad\text{and}\qquad \tau_1 = n\tau + \tau_0.$$

As the prior predictive distribution (of one observation) is $N\!\left(\mu_0, \frac{\tau\tau_0}{\tau+\tau_0}\right)$ and the posterior acts as the prior for the posterior prediction, we obtain, by replacing $\mu_0$ by $\mu_1$ and $\tau_0$ by $\tau_1$, that
$$\tilde x \mid x \sim N\!\left(\mu_1, \frac{\tau\tau_1}{\tau+\tau_1}\right) = N\!\left(\frac{n\tau\bar x + \tau_0\mu_0}{n\tau+\tau_0},\; \frac{(n\tau+\tau_0)\,\tau}{\tau+n\tau+\tau_0}\right).$$

NB: the posterior predictive mean and the posterior mean are equal, but the posterior predictive precision $\frac{\tau\tau_1}{\tau+\tau_1}$ is smaller than the posterior precision $\tau_1$ and smaller than the data precision $\tau$.

When $n$ is large, we have $\tilde x \mid x \approx N(\bar x, \tau)$.
When $\tau_0 = 0$ (i.e. we consider an improper prior), we have (as in classical statistics) $\mu \mid x \sim N(\bar x, n\tau)$ and $\tilde x \mid x \approx N(\bar x, \tau)$ for $n$ large.
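A minimal Python check of this formula, using the two-step simulation from the previous slide; all numerical values are assumed for illustration only.

import numpy as np

rng = np.random.default_rng(1)

# Assumed illustrative values (precision parametrisation)
mu0, tau0, tau, n = 170.0, 0.01, 0.04, 25
x = rng.normal(175.0, 1 / np.sqrt(tau), size=n)    # some "observed" data
xbar = x.mean()

# Conjugate posterior for mu
tau1 = n * tau + tau0
mu1 = (n * tau * xbar + tau0 * mu0) / tau1

# Two-step posterior predictive simulation
N = 100_000
mu = rng.normal(mu1, 1 / np.sqrt(tau1), size=N)    # step 1: mu ~ posterior
xtilde = rng.normal(mu, 1 / np.sqrt(tau))          # step 2: xtilde ~ data model

# Compare with the analytic posterior predictive N(mu1, tau*tau1/(tau+tau1))
print(xtilde.mean(), mu1)                          # means should agree
print(xtilde.var(), 1 / tau + 1 / tau1)            # variances should agree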

Prior and posterior predictive distributions

[Figure, two panels over the range 140 to 200: the top panel shows the prior, the prior predictive and the predictive if $\mu = 170$; the bottom panel shows the posterior, the prior predictive and the posterior predictive.]

Model checking

Idea: if the model is correct, then posterior predictions of the data should look like the observed data.
Difficulty: how do we choose a good measure of similarity?

Example: we have observed a sequence of $n = 20$ zeros and ones:

1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0

Model: $X_1, X_2, \ldots, X_{20}$ are iid with $P(X_i = 1) = p$, where $p$ is unknown.
Prior: $p \sim \mathrm{Be}(\alpha, \beta)$ where $\alpha > 0$ and $\beta > 0$ are known.
Posterior: $p \mid x \sim \mathrm{Be}(\#\text{ones} + \alpha,\ \#\text{zeros} + \beta)$.

Model checking: we simulate $N$ posterior predictive realisations
$$\tilde x^{(i)} = (\tilde x^{(i)}_1, \tilde x^{(i)}_2, \ldots, \tilde x^{(i)}_{20}), \qquad i = 1, \ldots, N,$$
as in the sketch below. If these vectors look similar to the data above, it would indicate that the model is probably okay.
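A minimal Python sketch of this posterior predictive simulation; the prior parameters $\alpha = \beta = 1$ are an assumption made for the sketch, not a choice stated on the slides.

import numpy as np

rng = np.random.default_rng(2)

data = np.array([1,1,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0])
alpha, beta = 1.0, 1.0                    # assumed prior parameters
n_ones = data.sum()
n_zeros = len(data) - n_ones

N = 10_000
# Step 1: draw p from the posterior Be(#ones + alpha, #zeros + beta)
p = rng.beta(n_ones + alpha, n_zeros + beta, size=N)
# Step 2: for each p, draw a predictive 0/1 sequence of length 20
xtilde = rng.binomial(1, p[:, None], size=(N, len(data)))

# Summary used in the first attempt on the next slide: the number of ones
s_ones = xtilde.sum(axis=1)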

Model checking: First attempt (a failure)

Define the summary function $s(x) = \#\text{ones in } x$.

[Histogram of $s(\tilde x^{(i)})$ for $N = 10{,}000$ independent posterior predictions; panel title "Histogram of proportion of ones", x-axis from 0.0 to 0.8.]

The observed number of ones is in no way unusual compared to the posterior predictions. This is just as expected, so we need another summary function $s(x)$.

Model checking: Second attempt (a success)

Define the summary function $s(x)$ = number of switches between ones and zeros in $x$. In the data the number of switches is 3:

1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0

[Histogram of $s(\tilde x^{(i)})$ for $N = 10{,}000$ independent posterior predictions; panel title "Histogram of number of switches", x-axis from 0 to 20.]

Only around 1.7% of the posterior predictions have 3 or fewer switches. This suggests that the model assumption of independence is questionable.
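Continuing the Python sketch above (it reuses data and xtilde from that block), the switches summary and its posterior predictive tail probability could be computed as follows; the exact percentage will vary with the assumed prior and the random seed.

# Number of switches between consecutive zeros and ones in a 0/1 sequence
def n_switches(x):
    return int(np.sum(np.abs(np.diff(x))))

s_obs = n_switches(data)                              # equals 3 for the observed sequence
s_rep = np.array([n_switches(row) for row in xtilde])

# Posterior predictive probability of 3 or fewer switches
print(np.mean(s_rep <= s_obs))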

Example: Speed of light

66 measurements of the time it takes light to travel 7445 metres (deviations in nanoseconds from a given number).

[Histogram of the 66 measurements, ranging roughly from 24.76 to 24.84.]

Data model: $x_1, \ldots, x_{66}$ iid $N(\mu, \tau)$. (Questionable?)
Prior: $\pi(\mu, \tau) \sim N(0, 0.001) \times \mathrm{Gamma}(0.001, 1000)$.
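This prior is not jointly conjugate for $(\mu, \tau)$, so the posterior is usually explored by simulation. Below is a minimal Gibbs-sampler sketch in Python; it is not the method shown on the slides, it assumes the precision parametrisation used throughout, reads Gamma(0.001, 1000) as shape 0.001 and scale 1000, and uses placeholder data since the 66 measurements are not reproduced here.

import numpy as np

rng = np.random.default_rng(3)

# x = np.loadtxt("speedoflight.txt")      # assumed: file with the 66 measurements
x = rng.normal(24.826, 0.011, size=66)    # placeholder data for the sketch
n, xbar = len(x), x.mean()

# Prior hyperparameters (precision parametrisation; Gamma shape/scale assumed)
mu0, tau0 = 0.0, 0.001
a, scale = 0.001, 1000.0                  # tau ~ Gamma(shape=a, scale=scale)
b = 1.0 / scale                           # corresponding rate

n_iter = 5000
mu_s, tau_s = np.empty(n_iter), np.empty(n_iter)
mu, tau = xbar, 1.0 / x.var()             # initial values

for i in range(n_iter):
    # mu | tau, x ~ N(mean, precision = tau0 + n*tau)
    prec = tau0 + n * tau
    mean = (tau0 * mu0 + n * tau * xbar) / prec
    mu = rng.normal(mean, 1.0 / np.sqrt(prec))
    # tau | mu, x ~ Gamma(shape = a + n/2, rate = b + 0.5*sum((x - mu)^2))
    rate = b + 0.5 * np.sum((x - mu) ** 2)
    tau = rng.gamma(a + n / 2, 1.0 / rate)
    mu_s[i], tau_s[i] = mu, tau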

Example: Speed of light

Posterior distributions of $\mu$, $\tau$ and $1/\tau$:

[Figure, three histograms: $\mu$ over roughly 24.822 to 24.830, $\tau$ over roughly 4000 to 12000, and $1/\tau$ over roughly 0.00010 to 0.00020. Red lines denote the sample mean and the sample variance, respectively.]

Example: Speed of light

The data contain one very low measurement. Is this unusual? Generate 1000 posterior predictive samples
$$\tilde x^{(i)} = (\tilde x^{(i)}_1, \ldots, \tilde x^{(i)}_{66}), \qquad i = 1, \ldots, 1000,$$
and define $s(x) = \min\{x_1, \ldots, x_{66}\}$.

[Histogram of $s(\tilde x^{(i)})$ for the 1000 posterior predictive samples, ranging over 24.76 to 24.88.]

Conclusion: the smallest value in the data is very unlikely under the assumed model.
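Continuing the Gibbs-sampler sketch above (reusing x, n, mu_s and tau_s), the minimum-based check could look as follows; this is an illustration only, not the code behind the slide.

# For each of the last 1000 posterior draws, simulate a predictive data set
# of the same size as the observed data and record its minimum
s_rep = np.array([
    rng.normal(m, 1.0 / np.sqrt(t), size=n).min()
    for m, t in zip(mu_s[-1000:], tau_s[-1000:])
])

s_obs = x.min()
print(np.mean(s_rep <= s_obs))    # posterior predictive tail probability of the observed minimum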