Conjugate Analysis for the Linear Model

Conjugate Analysis for the Linear Model

If we have good prior knowledge that can help us specify priors for $\beta$ and $\sigma^2$, we can use conjugate priors. Following the procedure in Christensen, Johnson, Branscum, and Hanson (2010), we will actually specify a prior for the error precision parameter $\tau = 1/\sigma^2$:

$$\tau \sim \text{gamma}(a, b)$$

This is analogous to placing an inverse gamma prior on $\sigma^2$. Then our prior on $\beta$ will depend on $\tau$:

$$\beta \mid \tau \sim \text{MVN}\left(\delta, \; \tau^{-1}\left[\tilde{X}^{-1} D (\tilde{X}^{-1})'\right]\right)$$

(Note $\tau^{-1} = \sigma^2$.)

Conjugate Analysis for the Linear Model

We will specify a set of $k$ a priori reasonable hypothetical observations having predictor vectors $\tilde{x}_1, \ldots, \tilde{x}_k$ (these, along with a column of 1's, will form the rows of $\tilde{X}$) and prior expected response values $\tilde{y}_1, \ldots, \tilde{y}_k$. Our MVN prior on $\beta$ is equivalent to a MVN prior on $\tilde{X}\beta$:

$$\tilde{X}\beta \mid \tau \sim \text{MVN}(\tilde{y}, \; \tau^{-1} D)$$

Hence the prior mean of $\tilde{X}\beta$ is $\tilde{y}$, implying that the prior mean $\delta$ of $\beta$ is $\tilde{X}^{-1}\tilde{y}$. $D^{-1}$ is a diagonal matrix whose diagonal elements represent the weights of the hypothetical observations. Intuitively, the prior has the same worth as $\text{tr}(D^{-1})$ observations.
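A minimal sketch of this prior construction in R, for a hypothetical one-predictor regression (so $k = 2$); all numbers here are invented for illustration:

```r
## Hypothetical prior information for a one-predictor model (k = 2):
## at x = 2 we expect a response near 10; at x = 8, near 25.
X.tilde <- cbind(1, c(2, 8))   # rows are the hypothetical predictor vectors
y.tilde <- c(10, 25)           # prior expected responses
D.inv   <- diag(c(1, 1))       # weights; prior worth tr(D.inv) = 2 observations
D       <- solve(D.inv)

delta <- drop(solve(X.tilde) %*% y.tilde)  # prior mean of beta: X.tilde^{-1} y.tilde

## One draw from the prior: tau ~ gamma(a, b), then beta | tau via MASS::mvrnorm
library(MASS)
a <- 2; b <- 4
tau  <- rgamma(1, shape = a, rate = b)
beta <- mvrnorm(1, mu = delta,
                Sigma = (1 / tau) * solve(X.tilde) %*% D %*% t(solve(X.tilde)))
```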

Conjugate Analysis for the Linear Model

The joint density is

$$\pi(\beta, \tau \mid X, y) \propto \tau^{n/2}\, \tau^{k/2} |D|^{-1/2}\, \tau^{a-1} e^{-b\tau} \exp\left\{-\tfrac{1}{2}(X\beta - y)'(\tau^{-1}I)^{-1}(X\beta - y)\right\} \exp\left\{-\tfrac{1}{2}(\tilde{X}\beta - \tilde{y})'(\tau^{-1}D)^{-1}(\tilde{X}\beta - \tilde{y})\right\}$$

It can be shown that the conditional posterior for $\beta \mid \tau$ is

$$\beta \mid \tau, X, y \sim \text{MVN}\left(\hat{\beta}, \; \tau^{-1}(X'X + \tilde{X}'D^{-1}\tilde{X})^{-1}\right)$$

where $\hat{\beta} = (X'X + \tilde{X}'D^{-1}\tilde{X})^{-1}[X'y + \tilde{X}'D^{-1}\tilde{y}]$.
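Continuing the sketch above, $\hat{\beta}$ is a few lines of matrix algebra in R (here wrapped in a helper function; X and y denote the observed design matrix and response):

```r
## Posterior mean of beta given observed data (X, y) and the prior quantities
beta.hat <- function(X, y, X.tilde, y.tilde, D.inv) {
  A <- t(X) %*% X + t(X.tilde) %*% D.inv %*% X.tilde
  solve(A, t(X) %*% y + t(X.tilde) %*% D.inv %*% y.tilde)
}
```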

Conjugate Analysis for the Linear Model

And the posterior for $\tau$ is

$$\tau \mid X, y \sim \text{gamma}\left(\frac{n + 2a}{2}, \; \frac{(n + 2a)\,s}{2}\right)$$

where

$$s = \frac{(y - X\hat{\beta})'(y - X\hat{\beta}) + (\tilde{y} - \tilde{X}\hat{\beta})'D^{-1}(\tilde{y} - \tilde{X}\hat{\beta}) + 2b}{n + 2a}$$

The subjective information is incorporated via $\hat{\beta}$ (a function of $\tilde{X}$ and $\tilde{y}$) and $s$ (a function of $\hat{\beta}$, $a$, and $b$).
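In the same sketch, $s$ and draws from the gamma posterior for $\tau$ follow directly (note that R's rgamma() is parameterized by shape and rate):

```r
## Draws from the posterior of tau, using beta.hat() from the sketch above
tau.posterior.draw <- function(n.draws, X, y, X.tilde, y.tilde, D.inv, a, b) {
  n  <- length(y)
  bh <- beta.hat(X, y, X.tilde, y.tilde, D.inv)
  s  <- drop(sum((y - X %*% bh)^2) +
             t(y.tilde - X.tilde %*% bh) %*% D.inv %*% (y.tilde - X.tilde %*% bh) +
             2 * b) / (n + 2 * a)
  rgamma(n.draws, shape = (n + 2 * a) / 2, rate = (n + 2 * a) * s / 2)
}
```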

Conjugate Analysis for the Linear Model

While the conditional posterior $\pi(\beta \mid \tau, X, y)$ is multivariate normal, the marginal posterior $\pi(\beta \mid X, y)$ is a (scaled) noncentral multivariate $t$-distribution. In making inference about $\beta$, it is easier to use the conditional posterior for $\beta \mid \tau$. Rather than basing inference on the posterior for $\beta \mid \hat{\tau}$ (by plugging in a posterior estimate of $\tau$), it is more appropriate to sample random values $\tau^{[1]}, \ldots, \tau^{[J]}$ from the posterior distribution of $\tau$, and then randomly sample from the conditional posterior of $\beta \mid \tau^{[j]}$, $j = 1, \ldots, J$. Posterior point estimates and interval estimates can then be based on those random draws.
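A sketch of this two-stage (composition) sampling, using the helper functions above; X and y are again the observed design matrix and response:

```r
## Composition sampling: draw tau from its posterior, then beta | tau
J   <- 5000
tau <- tau.posterior.draw(J, X, y, X.tilde, y.tilde, D.inv, a, b)
bh  <- beta.hat(X, y, X.tilde, y.tilde, D.inv)
V   <- solve(t(X) %*% X + t(X.tilde) %*% D.inv %*% X.tilde)
beta.draws <- t(sapply(tau, function(t.j)
  MASS::mvrnorm(1, mu = drop(bh), Sigma = (1 / t.j) * V)))

## Posterior medians and 95% interval estimates for each coefficient
apply(beta.draws, 2, quantile, probs = c(0.025, 0.5, 0.975))
```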

Prior Specification for the Conjugate Analysis

We will specify a matrix $\tilde{X}$ of hypothetical predictor values. We also specify (via expert opinion or previous knowledge) a corresponding vector $\tilde{y}$ of reasonable response values for such predictors. The number of such hypothetical observations we specify must be one more than the number of predictor variables in the regression, so that $\tilde{X}$ (which includes a column of 1's) is square and invertible. Our prior mean for $\beta$ will be $\tilde{X}^{-1}\tilde{y}$.

Prior Specification for the Conjugate Analysis

We also must specify the shape parameter $a$ and the rate parameter $b$ for the gamma prior on $\tau$. One strategy is to choose $a$ first, based on the degree of confidence in our prior. For a given $a$, we can view the prior as being worth the same as $2a$ sample observations. A larger value of $a$ indicates we are more confident in our prior.

Prior Specification for the Conjugate Analysis

Here is one strategy for specifying $b$: Consider any of the hypothetical observations (take the first, for example). If $\tilde{y}_1$ is the prior expected response for a hypothetical observation with predictors $\tilde{x}_1$, then let $\tilde{y}_{\max}$ be the a priori maximum reasonable response for a hypothetical observation with predictors $\tilde{x}_1$. Then (based on the normal distribution, under which such a maximum reasonable value sits roughly 1.645 standard deviations above the mean) let a prior guess for $\sigma$ be

$$\frac{\tilde{y}_{\max} - \tilde{y}_1}{1.645}$$

Since $\tau = 1/\sigma^2$, this gives us a reasonable guess for $\tau$. Set this guess for $\tau$ equal to the mean $a/b$ of the gamma prior for $\tau$. Since we have already specified $a$, we can solve for $b$.
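As a small numeric illustration of this strategy (all values invented):

```r
## Suppose a = 2, the prior expected response is 10, and the a priori
## maximum reasonable response for the same predictors is 16.
a           <- 2
sigma.guess <- (16 - 10) / 1.645   # (y.max - y.tilde[1]) / 1.645
tau.guess   <- 1 / sigma.guess^2   # since tau = 1 / sigma^2
b           <- a / tau.guess       # so the gamma prior mean a/b equals tau.guess
```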

Example of a Conjugate Analysis

Example in R with Automobile Data Set: We can get point and interval estimates for $\tau$ (and thus for $\sigma^2$). We can get point and interval estimates for the elements of $\beta$ most easily by drawing from the posterior distribution of $\tau$ and then from the conditional posterior of $\beta \mid \tau$, as in the sketch below.
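The original automobile data are not reproduced here; as a stand-in, here is an end-to-end sketch using R's built-in mtcars data (mpg regressed on weight), with invented prior guesses and the helper functions defined above:

```r
## Stand-in for the automobile example: regress mpg on weight (1000s of lbs)
X <- cbind(1, mtcars$wt)
y <- mtcars$mpg

X.tilde <- cbind(1, c(2, 5))       # hypothetical cars weighing 2000 and 5000 lbs
y.tilde <- c(25, 12)               # invented prior expected mpg for those cars
D.inv   <- diag(c(1, 1))           # each hypothetical observation worth one data point
a           <- 2
sigma.guess <- (35 - 25) / 1.645   # invented y.max = 35 mpg for the first car
b           <- a * sigma.guess^2   # sets the prior mean a/b equal to 1/sigma.guess^2

tau <- tau.posterior.draw(5000, X, y, X.tilde, y.tilde, D.inv, a, b)
quantile(1 / tau, c(0.025, 0.5, 0.975))  # point/interval estimates for sigma^2
```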

A Bayesian Approach to Model Selection

In exploratory regression problems, we often must select which subset of our potential predictor variables produces the best model. A Bayesian may consider the possible models and compare them based on their posterior probabilities. Note that if the value of coefficient $\beta_j$ is 0, then variable $X_j$ is not needed in the model. Let $\beta_j = z_j b_j$ for each $j$, where $z_j = 0$ or $1$ and $b_j \in (-\infty, \infty)$. Then our model is

$$Y_i = z_0 b_0 + z_1 b_1 X_{i1} + z_2 b_2 X_{i2} + \cdots + z_{k-1} b_{k-1} X_{i,k-1} + \epsilon_i, \quad i = 1, \ldots, n$$

where any $z_j = 0$ indicates that this predictor variable does not belong in the model.

A Bayesian Approach to Model Selection

Example: Oxygen uptake example, with $X_1$ = group, $X_2$ = age, $X_3$ = group × age:

z = (z_0, z_1, z_2, z_3)    E[Y | x, b, z]
(1, 0, 0, 0)                b_0
(1, 1, 0, 0)                b_0 + b_1 (group)
(1, 0, 1, 0)                b_0 + b_2 (age)
(1, 1, 1, 0)                b_0 + b_1 (group) + b_2 (age)
(1, 1, 1, 1)                b_0 + b_1 (group) + b_2 (age) + b_3 (group × age)

A Bayesian Approach to Model Selection

For each possible value of the vector $z$, we calculate the posterior probability for that model. For any particular $z^*$, say:

$$\pi(z^* \mid y, X) = \frac{p(z^*)\, p(y \mid X, z^*)}{\sum_z p(z)\, p(y \mid X, z)}$$

This involves a prior $p(\cdot)$ on each possible model; a noninformative approach would be to let all these prior probabilities be equal. If there are a large number of potential predictors, we would use a method called Gibbs sampling (more on this later) to search over the many models.
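As a sketch, given a uniform prior over models and a function log.marg.lik(z) returning $\log p(y \mid X, z)$ (a hypothetical placeholder here; its actual form depends on the within-model priors chosen), the posterior model probabilities are a simple normalization:

```r
## Posterior model probabilities under a uniform prior on z.
## log.marg.lik is a hypothetical placeholder: it must return log p(y | X, z)
## for a given 0/1 vector z, and depends on the within-model prior choices.
model.posterior <- function(Z, log.marg.lik) {
  lml  <- apply(Z, 1, log.marg.lik)  # one log marginal likelihood per model
  lml  <- lml - max(lml)             # stabilize before exponentiating
  post <- exp(lml) / sum(exp(lml))   # normalize over the candidate models
  cbind(Z, posterior = post)
}
```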

Example of Bayesian Model Selection

Example in R with Oxygen Data Set: We can consider all possible subsets of the set of predictor variables, or only certain subsets; here, we only consider including the interaction term when both first-order terms appear, as in the sketch below.
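A sketch of building that restricted model list in R: enumerate all z vectors with expand.grid(), then keep the interaction term (z3) only when both first-order terms are present:

```r
## All candidate models z = (z0, z1, z2, z3), intercept always included
Z <- as.matrix(expand.grid(z0 = 1, z1 = 0:1, z2 = 0:1, z3 = 0:1))

## Keep the interaction only when both first-order terms appear
Z <- Z[Z[, "z3"] == 0 | (Z[, "z1"] == 1 & Z[, "z2"] == 1), ]
Z   # the five models in the table above
```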