Bayesian Methods: Introduction to Multi-parameter Models


Parameter: $\theta = (\theta_1, \theta_2)$. Given the likelihood $p(y \mid \theta)$ and the prior $p(\theta)$, the posterior $p(\theta \mid y)$ is proportional to $p(y \mid \theta) \times p(\theta)$.

Interested only in $\theta_1$? Then the object of interest is the marginal posterior $p(\theta_1 \mid y)$.

o Loss function depends on $\theta_1$ only. E.g., $y \sim N(\mu, \sigma^2)$, both unknown. Then $\theta = (\mu, \sigma^2)$ and $L(\theta, \delta) = (\mu - \delta(x))^2$.

o Can show that the expected posterior loss involves the marginal posterior

$$p(\theta_1 \mid y) = \int p(\theta_1, \theta_2 \mid y)\, d\theta_2 = \int p(\theta_1 \mid \theta_2, y)\, p(\theta_2 \mid y)\, d\theta_2$$

Note that the marginal posterior of $\theta_1$ is a mixture of conditional posteriors given $\theta_2$. When $\theta_2$ takes discrete values ($1, 2, \ldots, M$), possibly denoting different models, the posterior of $\theta_1$ is a weighted average of the posteriors given each model. The weights depend on the combined evidence from the prior and the data, $p(\theta_2 \mid y)$. This is the key idea underlying Bayesian Model Averaging (BMA). Given a strong belief in the prior specification, a Bayesian need not select a model; BMA would minimize the expected risk.

We can always draw samples from the joint posterior $p(\theta_1, \theta_2 \mid y)$.

o If it is easier to sample from $p(\theta_2 \mid y)$, draw samples from this distribution, and then, for each of these samples, draw from $p(\theta_1 \mid \theta_2, y)$. To obtain samples from $p(\theta_1 \mid y)$, ignore the $\theta_2$ coordinate of the pairs $(\theta_1, \theta_2)$, as in the sketch below.
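A minimal sketch of this two-step sampling idea (my illustration, not from the notes; the model weights, means, and standard deviations are made-up numbers), with $\theta_2$ a discrete model indicator and $\theta_1 \mid \theta_2, y$ Normal:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior over M = 2 models: p(theta_2 = k | y)
weights = np.array([0.7, 0.3])
# Hypothetical conditional posteriors theta_1 | theta_2 = k, y ~ N(mean_k, sd_k)
means = np.array([0.0, 3.0])
sds = np.array([1.0, 0.5])

n_draws = 100_000
# Step 1: draw theta_2 from its marginal posterior p(theta_2 | y)
k = rng.choice(2, size=n_draws, p=weights)
# Step 2: given each theta_2, draw theta_1 from p(theta_1 | theta_2, y)
theta_1 = rng.normal(means[k], sds[k])

# Ignoring the theta_2 coordinate leaves draws from the marginal p(theta_1 | y),
# the BMA mixture; its mean is the weighted average of the model means.
print(theta_1.mean(), weights @ means)  # both approximately 0.9
```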

Analysis for Normal Data, $N(\mu, \sigma^2)$, with Non-informative Prior

Prior: $p(\mu \mid \sigma^2) \propto c$; $p(\sigma^2) \propto (\sigma^2)^{-1}$; i.e., $(\mu, \log \sigma)$ uniform.

Normal likelihood: given $(\mu, \sigma^2)$, $n$ iid observations lead to the sufficient statistics $(\bar y, s^2)$, where $(n-1)s^2 = \sum (y_i - \bar y)^2$, and the likelihood function

$$(\sigma^2)^{-n/2} \exp\Big\{ -\frac{1}{2\sigma^2} \big[ (n-1)s^2 + n(\bar y - \mu)^2 \big] \Big\}$$

The posterior of $(\mu, \sigma^2)$, proportional to the likelihood times the prior, factorizes into two parts:

$$\exp\Big\{ -\frac{n}{2\sigma^2} (\bar y - \mu)^2 \Big\}; \qquad (\sigma^2)^{-(n+2)/2} \exp\Big\{ -\frac{(n-1)s^2}{2\sigma^2} \Big\}$$

The first term represents the kernel of a Normal density with mean $\bar y$ and variance $\sigma^2/n$, except for the constant $1/\sqrt{2\pi\sigma^2/n}$.

o When using the precision notation $\tau = 1/\sigma^2$, we say that, given $\bar y$ and $\tau$, the conditional posterior of $\mu$ is Normal with mean $\bar y$ and precision $n\tau$. Note that for a sample of size $n$, given $(\mu, \sigma^2)$, $\bar y$ has a Normal distribution with precision $n\tau$.

For the marginal posterior of $\tau$ or $\sigma^2$:

o We must integrate out the first term, corresponding to the conditional posterior of $\mu$ given $\tau$, in the above joint posterior expression. This yields a term proportional to $\sigma$. Alternatively, the first term is proportional to $\sigma \times N(\bar y, \sigma^2/n)$. Now, the marginal posterior density of $\sigma^2$ is proportional to

$$(\sigma^2)^{-(n+1)/2} \exp\Big\{ -\frac{\nu}{2\sigma^2} \Big\}, \quad \text{where } \nu = (n-1)s^2$$

Note that this corresponds to an inverse-Gamma density, or a scaled inverse chi-squared. For its summary statistics (mean, median and mode), see Table A in the textbook (pp. 574). A numerical check of this marginalization appears below.
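A quick numerical check of the marginalization step (my sketch, with made-up data): integrate $\mu$ out of the joint posterior kernel on a grid and confirm that the result is a constant multiple of the $(\sigma^2)^{-(n+1)/2} \exp\{-\nu/(2\sigma^2)\}$ kernel above.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(5.0, 2.0, size=20)   # hypothetical data
n, ybar = y.size, y.mean()
s2 = y.var(ddof=1)
nu = (n - 1) * s2

def log_joint(mu, sig2):
    # Log of the joint posterior kernel under the prior p(mu, sig2) ∝ 1/sig2
    return (-(n / 2 + 1) * np.log(sig2)
            - ((n - 1) * s2 + n * (ybar - mu) ** 2) / (2 * sig2))

mu_grid = np.linspace(ybar - 10, ybar + 10, 4001)
dx = mu_grid[1] - mu_grid[0]
for sig2 in (2.0, 4.0, 8.0):
    marginal = np.exp(log_joint(mu_grid, sig2)).sum() * dx  # integrate over mu
    kernel = sig2 ** (-(n + 1) / 2) * np.exp(-nu / (2 * sig2))
    # The ratio equals sqrt(2*pi/n) for every sig2, i.e., constant, as claimed
    print(sig2, marginal / kernel)
```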

o However, in order to obtain the density of $\tau$, we must account for a change of variables from $\sigma^2$ to $\tau$. Since the absolute value of $d\sigma^2/d\tau$ yields the term $\tau^{-2}$, the marginal posterior density of $\tau$ is proportional to

$$\tau^{(n-3)/2} \exp\Big\{ -\frac{\nu\tau}{2} \Big\}$$

which corresponds to a scaled chi-squared with $(n-1)$ degrees of freedom and scale parameter $\nu$, or a Gamma$\big(\frac{n-1}{2}, \frac{\nu}{2}\big)$. Note that the posterior distribution of $\nu\tau$ is chi-squared with $(n-1)$ degrees of freedom, which is the same as the sampling distribution of $\nu/\sigma^2 = (n-1)s^2/\sigma^2$. Given this non-informative prior on the parameters, the distribution of the pivotal quantity $(n-1)s^2/\sigma^2$ remained unchanged.

The posterior distribution of $(\mu, \sigma^2)$ belongs to the Normal-Inverted Gamma family, and that of $(\mu, \tau)$ belongs to the Normal-Gamma family. We can easily draw samples from this joint posterior by first drawing samples from a Gamma (scaled chi-squared), and then, given each $\tau$, drawing a sample from the conditional Normal distribution of $\mu$; see the sketch below.

For the Marginal Posterior of $\mu$:

o Since the conditional posterior distribution of $\mu$ given $\tau$ is Normal with mean $\bar y$ and variance $\sigma^2/n$, it follows that, given $\tau$,

$$Z = \sqrt{n\tau}\,(\mu - \bar y) \sim N(0, 1)$$

Since the distribution of $Z$ does not depend on the conditioning variable $\tau$, $(Z, \tau)$ are independent random variables. Thus $\nu\tau$ is a chi-squared random variable with $(n-1)$ df, independent of $Z$. Hence,

$$\frac{Z}{\sqrt{\nu\tau/(n-1)}} = \frac{\sqrt{n}\,(\mu - \bar y)}{s} \sim \text{Student's } t_{n-1}$$

Thus, the marginal posterior of $\mu$ is a $t$-distribution with location $\bar y$ and scale $s/\sqrt{n}$.
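The sampler just described, as a short sketch (my code; the data are made up): draw $\nu\tau \sim \chi^2_{n-1}$, set $\sigma^2 = 1/\tau$, then draw $\mu \sim N(\bar y, \sigma^2/n)$; standardizing the $\mu$-draws recovers the Student-$t$ marginal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
y = rng.normal(10.0, 3.0, size=25)   # hypothetical data
n, ybar = y.size, y.mean()
s2 = y.var(ddof=1)
nu = (n - 1) * s2                    # nu = (n-1) s^2, as in the notes

n_draws = 50_000
# Step 1: nu * tau ~ chi^2_{n-1}, so tau = chi^2 draw / nu and sigma^2 = 1/tau
tau = rng.chisquare(n - 1, size=n_draws) / nu
sigma2 = 1.0 / tau
# Step 2: mu | tau, y ~ Normal(ybar, sigma^2 / n)
mu = rng.normal(ybar, np.sqrt(sigma2 / n))

# Check: sqrt(n) (mu - ybar) / s should be Student-t with (n-1) df
t_draws = np.sqrt(n) * (mu - ybar) / np.sqrt(s2)
print(stats.kstest(t_draws, stats.t(df=n - 1).cdf).pvalue)  # large p-value
```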

o Note that with this non-informative prior, the sampling distribution of the pivotal quantity $t = \sqrt{n}(\bar y - \mu)/s$ also has the Student's $t$ distribution with $(n-1)$ df.

o Note that the $t$-distribution represents a scale-mixture of Normal random variables, where the scale has an inverted Gamma distribution.

Posterior-predictive density of Future Observation(s)

o In order to predict a future observable $\tilde y$, whose density depends on $(\mu, \sigma^2)$, we need to find the predictive density $p(\tilde y \mid y)$, where the uncertainty about the parameters $(\mu, \sigma^2)$ is given by their posterior.

o Of course, given the samples from the posterior of $(\mu, \sigma^2)$, and the density of $\tilde y \mid (\mu, \sigma^2)$, one can draw samples from the joint density of $(\tilde y, \mu, \tau)$. Ignoring the second and third columns provides the samples from the posterior-predictive density of $\tilde y$.

o However, if the future observation is also from the Normal$(\mu, \sigma^2)$ population, one can easily get the analytic expression of the posterior-predictive density. Given $(\mu, \sigma^2)$, $\tilde y = \mu + \sigma Z$, where $Z$ is a standard Normal random variable. Furthermore, since $\mu \mid (y, \sigma^2)$ is Normal with mean $\bar y$ and variance $\sigma^2/n$, it follows that $\tilde y \mid (y, \sigma^2)$ is Normal with mean $\bar y$ and variance $(1 + 1/n)\sigma^2$. Hence, given $(y, \tau)$,

$$U = \frac{\sqrt{\tau}\,(\tilde y - \bar y)}{\sqrt{(n+1)/n}} \sim N(0, 1)$$

Furthermore, since $\nu\tau$ is an independent chi-squared random variable with $(n-1)$ degrees of freedom, it follows that

$$\frac{\tilde y - \bar y}{s\sqrt{(n+1)/n}}$$

has a $t$-distribution. In other words, the posterior-predictive density of $\tilde y$ is a $t$-distribution with location $\bar y$ and scale $s\sqrt{(n+1)/n}$, as the sketch below illustrates.
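A sketch of both routes to $p(\tilde y \mid y)$ (my code, with made-up data): simulate $(\mu, \sigma^2)$ from the posterior, set $\tilde y = \mu + \sigma Z$, and check against the analytic $t_{n-1}$ with location $\bar y$ and scale $s\sqrt{(n+1)/n}$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
y = rng.normal(0.0, 1.0, size=15)    # hypothetical data
n, ybar = y.size, y.mean()
s2 = y.var(ddof=1)

n_draws = 50_000
# Posterior draws under the non-informative prior (as above)
sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=n_draws)
mu = rng.normal(ybar, np.sqrt(sigma2 / n))
# Simulation route: y_tilde = mu + sigma * Z, ignoring the (mu, tau) columns
y_tilde = mu + np.sqrt(sigma2) * rng.standard_normal(n_draws)

# Analytic route: t with (n-1) df, location ybar, scale s * sqrt((n+1)/n)
pred = stats.t(df=n - 1, loc=ybar, scale=np.sqrt(s2 * (n + 1) / n))
print(stats.kstest(y_tilde, pred.cdf).pvalue)  # large p-value: routes agree
```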

Note that if we want to predict $m$ future observations from this same population, knowing that $(\bar y_m, s_m^2)$ is the sufficient statistic, we can achieve this task by first predicting one observation from $N(\mu, \sigma^2/m)$, as above, as well as one from the predictive density of $s_m^2$, which can be found similarly. Now, given $(\bar y_m, s_m^2)$, the conditional distribution of $Y_1, \ldots, Y_m$ does not depend on the parameters. Thus we can now draw the $Y$'s from this distribution.

The example on the speed of light is worth reading, since in this case the outliers do not satisfy the normal model, and the posterior based on this data model does not look good. In fact, in this problem the signal-to-noise ratio is very small, so the model has to be really good.

o In fact, the values of the physical constants are reviewed every five years by the Committee on Data for Science and Technology (CODATA); see, e.g., http://physics.nist.gov/cuu/reference/contents.html, and an interesting article on the implications of a non-constant velocity of light at http://www.ldolphin.org/cdkconseq.html. CODATA evaluates the collection of observations made in the intervening five years for outliers, etc., and then updates the values of the physical constants. Of course, the changes are in a few least significant digits.

Analysis of Normal Data, $N(\mu, \sigma^2)$, with Conjugate Normal-Inverted Gamma Prior

Given the likelihood of $n$ iid observations from a Normal, the conjugate prior should also have two terms of the same form. It suggests the conjugate prior $\mu \mid \sigma^2 \sim N(\mu_0, \sigma^2/\kappa_0)$. [This prior is equivalent to the posterior from a state starting with a uniform prior and drawing $\kappa_0$ observations with observed mean $\mu_0$ when the variance is known.] In addition, $\tau \sim$ scaled $\chi^2$ with $\nu_0$ degrees of freedom and scale $\nu_0\tau_0$, where $\tau_0 = 1/\sigma_0^2$; that is, $\nu_0\sigma_0^2\,\tau \sim \chi^2_{\nu_0}$, or $\tau \sim$ Gamma$\big(\frac{\nu_0}{2}, \frac{\nu_0\sigma_0^2}{2}\big)$.

Note that in the conjugate prior the two parameters are dependent, but we are assigning independent distributions to $\mu/\sigma$ and $1/\sigma$. The signal-to-noise ratio $\mu/\sigma$ is a very popular parameter in engineering applications.

o In effect, the prior is the same as a random-effects model for $\mu$, which may not be suitable in some applications. [See the textbook on this issue.]

On multiplying the likelihood by the prior, it is easy to see that the posterior is also of Normal-Inverted Gamma form, with updated parameters

$$\mu_n = \omega\mu_0 + (1-\omega)\bar y, \quad \text{where } \omega = \frac{\kappa_0}{\kappa_0 + n}; \qquad \kappa_n = \kappa_0 + n; \qquad \nu_n = \nu_0 + n;$$

$$\nu_n\sigma_n^2 = \nu_0\sigma_0^2 + (n-1)s^2 + \omega\,n\,(\bar y - \mu_0)^2$$

Again, sampling from this distribution is self-explanatory (a sketch is given at the end of this section). Now, for the marginal posterior distribution of $\mu$: following the discussion in the non-informative prior case, it is easy to see that we get a $t$-distribution with location $\mu_n$ and scale $\sqrt{\sigma_n^2/\kappa_n}$. Similarly, the predictive density of a future observation can be obtained.

Analysis of Normal Data with a Semi-conjugate Prior

In some applications, the prior on $(\mu, \sigma^2)$ may be required to be independent. In this case, the joint posterior will not factorize any more, but one can still obtain the conditional and marginal posteriors, e.g., by Gibbs sampling, as sketched below.
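First, a sketch of the conjugate update and joint sampling (my code; the data and hyperparameters are made-up):

```python
import numpy as np

rng = np.random.default_rng(3)

def posterior_draws(y, mu0, kappa0, nu0, sigma0_sq, n_draws=10_000):
    """Conjugate Normal-inverted Gamma update, then joint sampling."""
    n, ybar, s2 = y.size, y.mean(), y.var(ddof=1)
    w = kappa0 / (kappa0 + n)                      # omega in the notes
    mu_n = w * mu0 + (1 - w) * ybar
    kappa_n, nu_n = kappa0 + n, nu0 + n
    nu_sig_n = nu0 * sigma0_sq + (n - 1) * s2 + w * n * (ybar - mu0) ** 2
    # sigma^2 | y: nu_n sigma_n^2 / sigma^2 ~ chi^2_{nu_n}
    sigma2 = nu_sig_n / rng.chisquare(nu_n, size=n_draws)
    # mu | sigma^2, y ~ Normal(mu_n, sigma^2 / kappa_n)
    mu = rng.normal(mu_n, np.sqrt(sigma2 / kappa_n))
    return mu, sigma2

y = rng.normal(2.0, 1.5, size=30)                  # hypothetical data
mu, sigma2 = posterior_draws(y, mu0=0.0, kappa0=1.0, nu0=1.0, sigma0_sq=1.0)
print(mu.mean(), sigma2.mean())
```

Second, for the semi-conjugate case, a common route (my sketch, not spelled out in the notes) is Gibbs sampling with independent priors $\mu \sim N(\mu_0, \tau_0^2)$ and $\sigma^2 \sim$ scaled Inv-$\chi^2(\nu_0, \sigma_0^2)$, alternating between the two conditional posteriors (continuing the snippet above):

```python
def gibbs(y, mu0, tau0_sq, nu0, sigma0_sq, n_iter=5_000):
    """Gibbs sampler for independent priors on mu and sigma^2."""
    n, ybar = y.size, y.mean()
    mu, sigma2 = ybar, y.var(ddof=1)               # starting values
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        # mu | sigma^2, y: precision-weighted Normal
        prec = 1 / tau0_sq + n / sigma2
        mu = rng.normal((mu0 / tau0_sq + n * ybar / sigma2) / prec,
                        np.sqrt(1 / prec))
        # sigma^2 | mu, y: scaled inverse chi-squared with nu0 + n df
        ss = nu0 * sigma0_sq + ((y - mu) ** 2).sum()
        sigma2 = ss / rng.chisquare(nu0 + n)
        draws[t] = mu, sigma2
    return draws

draws = gibbs(y, mu0=0.0, tau0_sq=10.0, nu0=1.0, sigma0_sq=1.0)
print(draws[1000:].mean(axis=0))                   # posterior means after burn-in
```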