Gibbs Sampling in Linear Models #1

Econ 690, Purdue University. Justin L. Tobias.

Outline
1. Conditional posterior distributions for regression parameters in the linear model [Lindley and Smith (1972, JRSSB)]
2. Gibbs sampling in the classical linear regression model, with an application to our wage data
3. Gibbs sampling in a linear regression model with groupwise heteroscedasticity

Before discussing applications of Gibbs sampling in several different linear models, we must first prove an important result that will assist us in deriving the needed conditional posterior distributions. This result is relevant for the derivation of posterior conditional distributions for regression parameters, and makes use of the completion-of-the-square formulae presented earlier. We will use this result all the time in this and subsequent lectures. I'm not kidding: ALL THE TIME! (So try to get familiar with it.)

Deriving β | Σ, y in the LRM

Consider the regression model
$$y \mid \beta, \Sigma \sim N(X\beta, \Sigma)$$
(where y is n × 1) under the proper prior
$$\beta \sim N(\mu_\beta, V_\beta)$$
(and some independent prior on Σ, p(Σ), which we will not explicitly model).

We will show that
$$\beta \mid \Sigma, y \sim N(D_\beta d_\beta, D_\beta),$$
where
$$D_\beta = (X'\Sigma^{-1}X + V_\beta^{-1})^{-1}$$
and
$$d_\beta = X'\Sigma^{-1}y + V_\beta^{-1}\mu_\beta.$$

First, note that (under prior independence)
$$p(\beta, \Sigma \mid y) \propto p(y \mid \beta, \Sigma)\, p(\beta)\, p(\Sigma),$$
where p(Σ) denotes the prior for Σ. Since the conditional posterior distribution p(β | Σ, y) is proportional to the joint above (why?), we can ignore all terms which enter multiplicatively and involve only Σ or y (or both).

It follows that
$$p(\beta \mid \Sigma, y) \propto \exp\left(-\frac{1}{2}\left[(y - X\beta)'\Sigma^{-1}(y - X\beta) + (\beta - \mu_\beta)'V_\beta^{-1}(\beta - \mu_\beta)\right]\right).$$
This is almost in the form where we can apply our previous completion-of-the-square result. However, before doing so, we must first obtain a quadratic form in β from the likelihood part of this expression.

To this end, let us define
$$\hat{\beta} \equiv (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y.$$
Note that β̂ is just the familiar GLS estimator. Also note that, given Σ and y, β̂ is effectively known for purposes of deriving β | Σ, y (i.e., we can and will condition on it).

For the sake of space, let
$$\hat{u} \equiv y - X\hat{\beta},$$
and note
$$(y - X\beta)'\Sigma^{-1}(y - X\beta) = \hat{u}'\Sigma^{-1}\hat{u} + 2\hat{u}'\Sigma^{-1}X(\hat{\beta} - \beta) + (\beta - \hat{\beta})'X'\Sigma^{-1}X(\beta - \hat{\beta}) = \hat{u}'\Sigma^{-1}\hat{u} + (\beta - \hat{\beta})'X'\Sigma^{-1}X(\beta - \hat{\beta}),$$
since the middle (cross-product) term is zero.

The reason why this middle term was zero on the previous page follows since
$$\hat{u}'\Sigma^{-1}X = (y - X\hat{\beta})'\Sigma^{-1}X = y'\Sigma^{-1}X - y'\Sigma^{-1}X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}X = y'\Sigma^{-1}X - y'\Sigma^{-1}X = 0.$$

Using this last result, we can replace the quadratic form (y − Xβ)'Σ⁻¹(y − Xβ) with û'Σ⁻¹û + (β − β̂)'X'Σ⁻¹X(β − β̂) to yield
$$p(\beta \mid \Sigma, y) \propto \exp\left(-\frac{1}{2}\left[(\beta - \hat{\beta})'X'\Sigma^{-1}X(\beta - \hat{\beta}) + (\beta - \mu_\beta)'V_\beta^{-1}(\beta - \mu_\beta)\right]\right),$$
since the term û'Σ⁻¹û does not involve β and can be absorbed into the proportionality constant.

We can apply our completion-of-the-square formula to the above to get
$$p(\beta \mid \Sigma, y) \propto \exp\left(-\frac{1}{2}(\beta - \bar{\beta})'\bar{V}_\beta^{-1}(\beta - \bar{\beta})\right),$$
where
$$\bar{V}_\beta = (X'\Sigma^{-1}X + V_\beta^{-1})^{-1}$$
and
$$\bar{\beta} = \bar{V}_\beta(X'\Sigma^{-1}X\hat{\beta} + V_\beta^{-1}\mu_\beta) = \bar{V}_\beta(X'\Sigma^{-1}y + V_\beta^{-1}\mu_\beta).$$
Defining
$$D_\beta = \bar{V}_\beta \quad \text{and} \quad d_\beta = X'\Sigma^{-1}y + V_\beta^{-1}\mu_\beta,$$
it follows that
$$\beta \mid \Sigma, y \sim N(D_\beta d_\beta, D_\beta),$$
as claimed.
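Since this result will be reused constantly, it may help to see it spelled out computationally. Below is a minimal sketch (my own illustration, not part of the original notes) of a helper that returns the mean and covariance of β | Σ, y for given data and prior moments; the function name and its interface are assumptions, and Σ⁻¹ and V_β⁻¹ are passed directly for simplicity.

```python
import numpy as np

def beta_conditional_moments(y, X, Sigma_inv, mu_beta, V_beta_inv):
    """Moments of beta | Sigma, y under y ~ N(X beta, Sigma), beta ~ N(mu_beta, V_beta).

    Returns (mean, cov) = (D_beta @ d_beta, D_beta), where
    D_beta = (X' Sigma^{-1} X + V_beta^{-1})^{-1} and
    d_beta = X' Sigma^{-1} y + V_beta^{-1} mu_beta.
    """
    D_beta = np.linalg.inv(X.T @ Sigma_inv @ X + V_beta_inv)
    d_beta = X.T @ Sigma_inv @ y + V_beta_inv @ mu_beta
    return D_beta @ d_beta, D_beta
```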

Gibbs Sampling in the Classical Linear Model

Consider the regression model
$$y \mid \beta, \sigma^2 \sim N(X\beta, \sigma^2 I_n)$$
under the priors
$$\beta \sim N(\mu_\beta, V_\beta), \qquad \sigma^2 \sim IG(a, b).$$
We seek to show how the Gibbs sampler can be employed to fit this model (noting, of course, that we do not need to resort to this algorithm in such a simplified model, but introduce it here as a starting point).

To implement the sampler, we need to derive two things:
1. the posterior conditional p(β | σ², y), and
2. the posterior conditional p(σ² | β, y).
The first of these can be obtained by applying our previous theorem (with Σ = σ²I_n). Specifically, we obtain
$$\beta \mid \sigma^2, y \sim N(D_\beta d_\beta, D_\beta),$$
where
$$D_\beta = \left(X'X/\sigma^2 + V_\beta^{-1}\right)^{-1}, \qquad d_\beta = X'y/\sigma^2 + V_\beta^{-1}\mu_\beta.$$

As for the posterior conditional for σ², note
$$p(\beta, \sigma^2 \mid y) \propto p(\beta)\, p(\sigma^2)\, p(y \mid \beta, \sigma^2).$$
Since p(σ² | β, y) is proportional to the joint posterior above, it follows that
$$p(\sigma^2 \mid \beta, y) \propto (\sigma^2)^{-(a+1)}\exp\left(-\frac{1}{b\sigma^2}\right)(\sigma^2)^{-n/2}\exp\left(-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right) = (\sigma^2)^{-\left(\frac{n}{2}+a+1\right)}\exp\left(-\frac{1}{\sigma^2}\left[b^{-1} + \frac{1}{2}(y - X\beta)'(y - X\beta)\right]\right).$$

The density on the last page is easily recognized as the kernel of an
$$IG\left(\frac{n}{2} + a,\ \left[b^{-1} + \frac{1}{2}(y - X\beta)'(y - X\beta)\right]^{-1}\right)$$
density.

Thus, to implement the Gibbs sampler in the linear regression model, we can proceed as follows. Given a current value of σ²:
1. Calculate D_β = D_β(σ²) and d_β = d_β(σ²).
2. Draw β from a N[D_β(σ²)d_β(σ²), D_β(σ²)] distribution.
3. Draw σ² from an
$$IG\left(\frac{n}{2} + a,\ \left[b^{-1} + \frac{1}{2}(y - X\beta)'(y - X\beta)\right]^{-1}\right)$$
distribution.
4. Repeat this process many times, updating the posterior conditionals at each iteration to condition on the most recent simulations produced in the chain.
5. Discard an early set of parameter simulations as the burn-in period.
6. Use the subsequent draws to compute posterior features of interest.
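As a concrete illustration (again my own sketch, not code from the lecture), here is a minimal Python/NumPy implementation of this sampler. The function name gibbs_lm and its arguments are assumptions; it uses the IG(a, b) parameterization above, under which 1/σ² follows a Gamma distribution with shape a and scale b.

```python
import numpy as np

def gibbs_lm(y, X, mu_beta, V_beta, a, b, n_iter=5000, burn=100, seed=0):
    """Gibbs sampler for y ~ N(X beta, sigma^2 I_n), beta ~ N(mu_beta, V_beta),
    sigma^2 ~ IG(a, b) with p(sigma^2) proportional to (sigma^2)^-(a+1) exp(-1/(b sigma^2))."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    V_inv = np.linalg.inv(V_beta)
    V_inv_mu = V_inv @ mu_beta
    sigma2 = 1.0                                   # initial condition for sigma^2
    beta_draws = np.empty((n_iter, k))
    sig2_draws = np.empty(n_iter)
    for it in range(n_iter):
        # beta | sigma^2, y ~ N(D_beta d_beta, D_beta)
        D_beta = np.linalg.inv(XtX / sigma2 + V_inv)
        d_beta = Xty / sigma2 + V_inv_mu
        beta = rng.multivariate_normal(D_beta @ d_beta, D_beta)
        # sigma^2 | beta, y ~ IG(n/2 + a, [1/b + .5 (y - X beta)'(y - X beta)]^{-1}),
        # drawn as the reciprocal of a Gamma variate
        resid = y - X @ beta
        scale = 1.0 / (1.0 / b + 0.5 * resid @ resid)
        sigma2 = 1.0 / rng.gamma(n / 2 + a, scale)
        beta_draws[it], sig2_draws[it] = beta, sigma2
    return beta_draws[burn:], sig2_draws[burn:]    # discard the burn-in draws
```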

Application to Wage Data

We apply this Gibbs sampler to our wage-education data set. We seek to compare posterior features estimated via the Gibbs output to those we previously derived analytically in our linear regression model lecture notes. The following results are obtained under the prior
$$\beta \sim N(0, 4I_2), \qquad \sigma^2 \sim IG\left[3, \frac{1}{2 \cdot 2}\right].$$
We obtain 5,000 simulations and discard the first 100 as the burn-in period.

Posterior Calculations Using the Gibbs Sampler

               β₀        β₁              σ²
E(· | y)      1.18      .091 [.091]     .267
Std(· | y)    .087      .0063 [.0066]   .011

In addition, we calculate Pr(β₁ < .10 | y) = .911. The numbers in square brackets were our analytical results based on a diffuse prior. Thus, we are able to match these analytical results almost exactly with 4,900 post-convergence simulations.
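For completeness, posterior summaries of this kind could be computed from the retained draws along the following lines. This is only a usage sketch of the hypothetical gibbs_lm function above, and it assumes the wage data have already been loaded into y and X (with X containing a constant and the education variable).

```python
# Usage sketch: prior settings and posterior summaries for the wage application.
beta_draws, sig2_draws = gibbs_lm(
    y, X,
    mu_beta=np.zeros(2), V_beta=4 * np.eye(2),   # beta ~ N(0, 4 I_2)
    a=3, b=1 / (2 * 2),                          # sigma^2 ~ IG[3, 1/(2*2)]
    n_iter=5000, burn=100)

post_mean = beta_draws.mean(axis=0)              # E(beta_0 | y), E(beta_1 | y)
post_std = beta_draws.std(axis=0, ddof=1)        # Std(beta_0 | y), Std(beta_1 | y)
prob = (beta_draws[:, 1] < 0.10).mean()          # Pr(beta_1 < .10 | y)
```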

Gibbs with Groupwise Heteroscedasticity

Suppose we generalize the regression model in the previous exercise so that y = Xβ + ε, where
$$E(\varepsilon\varepsilon' \mid X) \equiv \Sigma = \begin{bmatrix} \sigma_1^2 I_{n_1} & 0 & \cdots & 0 \\ 0 & \sigma_2^2 I_{n_2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_J^2 I_{n_J} \end{bmatrix}.$$

That is, we relax the homoscedasticity assumption in favor of groupwise heteroscedasticity, where we presume there are J different groups with identical variance parameters within each group, but different parameters across groups. Let n_j represent the number of observations belonging to group j. Given priors of the form
$$\beta \sim N(\mu_\beta, V_\beta), \qquad \sigma_j^2 \sim IG(a_j, b_j), \quad j = 1, 2, \ldots, J,$$
show how the Gibbs sampler can be used to fit this model.

It is important to recognize that, even in this simple model, proceeding analytically, as we did in the homoscedastic regression model, will prove to be very difficult. As will soon become clear, however, estimation of this model via the Gibbs sampler involves only a simple extension of our earlier algorithm for the classical linear regression model.

Since we have pre-sorted the data into blocks by type of variance parameter, define y_j as the n_j × 1 outcome vector and X_j as the n_j × k covariate matrix for group j, j = 1, 2, …, J. Thus,
$$\text{Var}(y_j \mid X_j) = \sigma_j^2 I_{n_j}, \qquad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_J \end{bmatrix}, \qquad X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_J \end{bmatrix}.$$

The likelihood function for this sample of data can be written as
$$p(y \mid \beta, \{\sigma_j^2\}) = \prod_{j=1}^{J} (2\pi)^{-n_j/2}(\sigma_j^2)^{-n_j/2}\exp\left(-\frac{1}{2\sigma_j^2}(y_j - X_j\beta)'(y_j - X_j\beta)\right).$$

Combining this likelihood with our priors (which we won't write out explicitly), we obtain the joint posterior
$$p(\beta, \{\sigma_j^2\} \mid y) \propto p(\beta)\prod_{j=1}^{J} p(\sigma_j^2)\,(\sigma_j^2)^{-n_j/2}\exp\left(-\frac{1}{2\sigma_j^2}(y_j - X_j\beta)'(y_j - X_j\beta)\right).$$

To implement the Gibbs sampler, we need to derive the posterior conditionals:
1. $p(\beta \mid \sigma_1^2, \sigma_2^2, \ldots, \sigma_J^2, y) = p(\beta \mid \Sigma, y)$
2. $p(\sigma_1^2 \mid \beta, \sigma_2^2, \ldots, \sigma_J^2, y)$
3. ⋮
4. $p(\sigma_J^2 \mid \beta, \sigma_1^2, \ldots, \sigma_{J-1}^2, y)$

The first posterior conditional, again, can be obtained by a direct application of the Lindley and Smith (1972) result. Specifically, since y ∼ N(Xβ, Σ) and β ∼ N(μ_β, V_β), it follows that
$$\beta \mid \{\sigma_j^2\}_{j=1}^{J}, y \sim N(D_\beta d_\beta, D_\beta),$$
where
$$D_\beta = (X'\Sigma^{-1}X + V_\beta^{-1})^{-1} = \left(\sum_{j=1}^{J}X_j'X_j/\sigma_j^2 + V_\beta^{-1}\right)^{-1}$$
and
$$d_\beta = X'\Sigma^{-1}y + V_\beta^{-1}\mu_\beta = \sum_{j=1}^{J}X_j'y_j/\sigma_j^2 + V_\beta^{-1}\mu_\beta.$$
Note that Σ will need to be updated at each iteration of the algorithm when sampling β.

As for the complete posterior conditional for each σ_j², note (with the −j subscript denoting all variance parameters other than the j-th)
$$p(\sigma_j^2 \mid \beta, \sigma_{-j}^2, y) \propto p(\beta, \{\sigma_j^2\} \mid y).$$
Staring at the expression for the joint posterior, we see that all terms involving σ_{−j}² are separable from those involving σ_j², and so the posterior conditional of interest is of the form
$$p(\sigma_j^2 \mid \beta, \sigma_{-j}^2, y) \propto (\sigma_j^2)^{-(a_j+1)}\exp\left(-\frac{1}{b_j\sigma_j^2}\right)(\sigma_j^2)^{-n_j/2}\exp\left(-\frac{1}{2\sigma_j^2}(y_j - X_j\beta)'(y_j - X_j\beta)\right).$$

In this form, it is recognized that
$$\sigma_j^2 \mid \beta, \sigma_{-j}^2, y \sim IG\left(\frac{n_j}{2} + a_j,\ \left[b_j^{-1} + \frac{1}{2}(y_j - X_j\beta)'(y_j - X_j\beta)\right]^{-1}\right).$$

Thus, to implement the sampler, we would first need to specify an initial condition for β or {σ_j²}_{j=1}^J. What might be reasonable values for these? Let's say we set β = β̂ initially. Then we:
1. Sample σ₁² from an
$$IG\left(\frac{n_1}{2} + a_1,\ \left[b_1^{-1} + \frac{1}{2}(y_1 - X_1\beta)'(y_1 - X_1\beta)\right]^{-1}\right)$$
distribution.
2. ⋮ (proceed similarly for groups 2 through J − 1)
3. Sample σ_J² from an
$$IG\left(\frac{n_J}{2} + a_J,\ \left[b_J^{-1} + \frac{1}{2}(y_J - X_J\beta)'(y_J - X_J\beta)\right]^{-1}\right)$$
distribution.
4. Sample a new β from a N(D_β d_β, D_β) density, noting that Σ must be updated appropriately.
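To make the algorithm concrete, here is a minimal sketch (my own illustration, not the lecture's code) of this sampler in Python/NumPy. The function name gibbs_groupwise and its interface are assumptions: the data are passed as lists of group-level blocks, a and b are length-J arrays of prior parameters in the same IG parameterization as before, and β is initialized at the pooled OLS estimate as one reasonable starting value.

```python
import numpy as np

def gibbs_groupwise(y_blocks, X_blocks, mu_beta, V_beta, a, b,
                    n_iter=5000, burn=100, seed=0):
    """Gibbs sampler for the groupwise-heteroscedastic linear model:
    y_j ~ N(X_j beta, sigma_j^2 I_{n_j}), beta ~ N(mu_beta, V_beta),
    sigma_j^2 ~ IG(a_j, b_j).  y_blocks, X_blocks: lists of group-level data."""
    rng = np.random.default_rng(seed)
    J = len(y_blocks)
    k = X_blocks[0].shape[1]
    n_j = np.array([len(yj) for yj in y_blocks])
    V_inv = np.linalg.inv(V_beta)
    V_inv_mu = V_inv @ mu_beta
    # initialize beta at the pooled OLS estimate (one reasonable starting value)
    X_all, y_all = np.vstack(X_blocks), np.concatenate(y_blocks)
    beta = np.linalg.lstsq(X_all, y_all, rcond=None)[0]
    sig2 = np.ones(J)
    beta_draws = np.empty((n_iter, k))
    sig2_draws = np.empty((n_iter, J))
    for it in range(n_iter):
        # sigma_j^2 | beta, y ~ IG(n_j/2 + a_j, [1/b_j + .5 (y_j - X_j b)'(y_j - X_j b)]^{-1})
        for j in range(J):
            r = y_blocks[j] - X_blocks[j] @ beta
            scale = 1.0 / (1.0 / b[j] + 0.5 * r @ r)
            sig2[j] = 1.0 / rng.gamma(n_j[j] / 2 + a[j], scale)
        # beta | {sigma_j^2}, y ~ N(D_beta d_beta, D_beta), rebuilding Sigma from the new sigma_j^2
        D_inv = V_inv.copy()
        d = V_inv_mu.copy()
        for j in range(J):
            D_inv += X_blocks[j].T @ X_blocks[j] / sig2[j]
            d += X_blocks[j].T @ y_blocks[j] / sig2[j]
        D_beta = np.linalg.inv(D_inv)
        beta = rng.multivariate_normal(D_beta @ d, D_beta)
        beta_draws[it], sig2_draws[it] = beta, sig2
    return beta_draws[burn:], sig2_draws[burn:]
```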