Gibbs Sampling in Linear Models #2

Econ 690, Purdue University

Outline
1. Linear regression model with a changepoint (with an example using temperature data)
2. The seemingly unrelated regressions model
3. Gibbs sampling in a linear model with inequality constraints

In this lecture, we extend our previous lecture on Gibbs sampling in the linear regression model. In particular, we consider three additional variants of the linear model:
1. A linear regression model with a single, unknown changepoint.
2. The seemingly unrelated regressions [SUR] model.
3. A linear model with inequality restrictions on the regression coefficients.

A Linear Model with a Single Changepoint

Suppose that the density for a time series y_t, t = 1, 2, ..., T, conditioned on its lags, the model parameters, and other covariates x_t, can be expressed as

y_t | x_t, θ, β, σ², τ², λ ∼ N(x_t θ, σ²) if t ≤ λ,
y_t | x_t, θ, β, σ², τ², λ ∼ N(x_t β, τ²) if t > λ.

In this model, λ is a changepoint: for periods up to and including λ, one regression is assumed to generate y, and following λ, a new regression is assumed to generate y.

Suppose you employ priors of the form:

θ = [θ_1 θ_2]′ ∼ N(µ_θ, V_θ)
β = [β_1 β_2]′ ∼ N(µ_β, V_β)
σ² ∼ IG(a_1, a_2)
τ² ∼ IG(b_1, b_2)
λ ∼ Uniform{1, 2, ..., T − 1}.

Note that λ is treated as a parameter of the model, and by placing a uniform prior over the elements 1, 2, ..., T − 1, a changepoint is assumed to exist.

For this model, we will do the following:
(a) Derive the likelihood function.
(b) Describe how the Gibbs sampler can be employed to estimate the parameters of this model, given the priors above.

First, for the likelihood function, note that the data essentially divide into two parts, conditioned on the changepoint λ. As such, standard results for the linear regression model (as in our previous lectures) can be used to simulate the regression and variance parameters within each of the two regimes. Specifically,

L(θ, β, σ², τ², λ) = ∏_{t=1}^{λ} φ(y_t; x_t θ, σ²) × ∏_{t=λ+1}^{T} φ(y_t; x_t β, τ²),

where φ(y; µ, σ²) denotes a Normal density with mean µ and variance σ² evaluated at y. The joint posterior is proportional to the likelihood above times the given priors for the model parameters.

So, in this model, what will be the complete posterior conditionals for β and θ? Using our established results for the linear regression model, we know:

θ | β, σ², τ², λ, y ∼ N(D_θ d_θ, D_θ),

where

D_θ = (X_θ′ X_θ / σ² + V_θ^{−1})^{−1} and d_θ = X_θ′ y_θ / σ² + V_θ^{−1} µ_θ.

In the above, we have defined X_θ and y_θ as the covariate matrix and outcome vector obtained by stacking x_t and y_t over the first regime, t = 1, 2, ..., λ.

Similarly, we obtain the posterior conditional for β:

β | θ, σ², τ², λ, y ∼ N(D_β d_β, D_β),

where

D_β = (X_β′ X_β / τ² + V_β^{−1})^{−1} and d_β = X_β′ y_β / τ² + V_β^{−1} µ_β.

Here, again, we have defined X_β and y_β as the stacked covariates and outcomes from the second regime, t = λ + 1, ..., T.
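Since both conditionals are Normal, each update is a standard multivariate Normal draw. Below is a minimal Python sketch of this step for a single regime (illustrative code, not from the lecture; draw_coeffs and its arguments are assumed names):

```python
import numpy as np

def draw_coeffs(y_r, X_r, var, mu, V, rng):
    # Combine one regime's data (y_r, X_r) and error variance var
    # with the N(mu, V) prior, then draw from N(D d, D).
    Vinv = np.linalg.inv(V)
    D = np.linalg.inv(X_r.T @ X_r / var + Vinv)   # posterior covariance
    d = X_r.T @ y_r / var + Vinv @ mu             # posterior "shift"
    return rng.multivariate_normal(D @ d, D)

# e.g., theta = draw_coeffs(y[:lam], X[:lam], sig2, mu_th, V_th, rng)
#       beta  = draw_coeffs(y[lam:], X[lam:], tau2, mu_b,  V_b,  rng)
```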

What about the complete conditional posterior distributions for the variance parameters σ² and τ²? Can you derive these? It is straightforward to show that

σ² | θ, β, τ², λ, y ∼ IG( λ/2 + a_1, [a_2^{−1} + (1/2) ∑_{t=1}^{λ} (y_t − x_t θ)²]^{−1} )

and

τ² | θ, β, σ², λ, y ∼ IG( (T − λ)/2 + b_1, [b_2^{−1} + (1/2) ∑_{t=λ+1}^{T} (y_t − x_t β)²]^{−1} ).
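As a sketch of these updates, assuming the IG(a, b) parameterization used in these notes (kernel x^{−(a+1)} exp[−1/(bx)], so that 1/x is Gamma with shape a and scale b):

```python
import numpy as np

def draw_variance(resid, a, b, rng):
    # Under this IG(a, b) parameterization, 1/x ~ Gamma(shape=a, scale=b),
    # so we draw the precision from a Gamma and invert it.
    a_post = a + len(resid) / 2.0
    b_post = 1.0 / (1.0 / b + 0.5 * resid @ resid)
    return 1.0 / rng.gamma(a_post, b_post)

# e.g., sig2 = draw_variance(y[:lam] - X[:lam] @ theta, a1, a2, rng)
#       tau2 = draw_variance(y[lam:] - X[lam:] @ beta,  b1, b2, rng)
```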

For the changepoint λ, simulation is not as standard. What we do know is (under our uniform prior on λ):

p(λ | θ, β, σ², τ², y) ∝ ∏_{t=1}^{λ} φ(y_t; x_t θ, σ²) × ∏_{t=λ+1}^{T} φ(y_t; x_t β, τ²),

where λ ∈ {1, 2, ..., T − 1}.

Since λ is discrete-valued, we can:
1. Calculate the above for each possible value of λ (given the current values of the other parameters).
2. Normalize these values into probabilities by dividing each value from the first step by the sum of all values obtained from the first step.
3. Sample λ by drawing from this discrete distribution.
A posterior simulator proceeds by iteratively sampling from all of these conditional posterior distributions; a sketch of the λ step follows.
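A minimal Python sketch of this discrete update (illustrative names; the normalization is done on the log scale to avoid numerical underflow):

```python
import numpy as np
from scipy.stats import norm

def draw_lambda(y, X, theta, beta, sig2, tau2, rng):
    T = len(y)
    support = np.arange(1, T)                    # lam in {1, ..., T-1}
    logp = np.empty(T - 1)
    for i, lam in enumerate(support):
        # two-regime log-likelihood at this candidate changepoint
        logp[i] = (norm.logpdf(y[:lam], X[:lam] @ theta, np.sqrt(sig2)).sum()
                   + norm.logpdf(y[lam:], X[lam:] @ beta, np.sqrt(tau2)).sum())
    probs = np.exp(logp - logp.max())            # stable normalization
    return rng.choice(support, p=probs / probs.sum())
```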

Example with U.S. Average Annual Temperature Data

We now provide an example of the previous method using temperature data. Specifically, we obtain data on the average annual temperature in the U.S. from 1895 to 2006 (n = 112). We seek to determine if there is a change or break in the historical temperature path, perhaps consistent with the idea of global warming. If there is such a break, and if global warming concerns are warranted, then we might expect recent years to have been characterized by more rapid increases in temperature than years in the distant past. In this example, we aren't thinking about a discrete jump in temperatures, but rather just a change in slopes.

To this end, we consider a restricted version of our previous model:

y_t = α_0 + α_1 t + α_2 (t − λ) I(t > λ) + ε_t, ε_t ∼ iid N(0, σ²),

where I(·) denotes the indicator function. The changepoint in this model is λ, and periods before and after λ will have different slopes (α_1 versus α_1 + α_2). For this application, we focus attention on fitting a continuous function, and do not allow for discrete jumps as in our original presentation of the model.

We restrict λ ∈ {3, 4, ..., T − 3}. Thus, we assume that a changepoint exists, and force the changepoint to occur toward the interior of the sample period. For our priors, we specify a uniform prior for λ and also specify

α = [α_0 α_1 α_2]′ ∼ N( [52 0 0]′, diag(100, 100, 100) ).

In this model, the complete posterior conditionals for α and σ² are straightforward. That is, given λ, the covariate matrix X_λ (with t-th row [1  t  (t − λ)I(t > λ)]) can be constructed, and the conditionals for these parameters are of standard forms. For the changepoint λ, under our flat prior, we obtain:

p(λ | α, σ², y) ∝ exp( −(1/(2σ²)) (y − X_λ α)′(y − X_λ α) ), λ ∈ {3, 4, ..., T − 3}.

Thus, we must calculate the likelihood (for given α and σ²) for each possible value of λ ∈ {3, 4, ..., T − 3}, and then normalize these quantities. The changepoint λ is then drawn from this discrete distribution.
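As a sketch, the covariate matrix X_λ for this continuous-changepoint specification might be constructed as follows (illustrative code, not from the lecture):

```python
import numpy as np

def build_X(T, lam):
    # t-th row: [1, t, (t - lam) * 1(t > lam)]; the fitted path is
    # continuous at t = lam but changes slope there.
    t = np.arange(1, T + 1, dtype=float)
    kink = np.where(t > lam, t - lam, 0.0)
    return np.column_stack([np.ones(T), t, kink])
```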

The following slides present results of this estimation exercise. The simulator was run for 50,000 iterations, and the first 1,000 were discarded as the burn-in. We present two graphs: The first plots the posterior mean of the fitted temperature path ŷ = X_λ α. Note that this function also averages over the uncertainty surrounding the location of the changepoint λ. The second graph plots the posterior frequencies associated with the location of λ.

[Figure: Posterior Mean of Expected Temperature. Average annual U.S. temperature (roughly 50.5 to 55.5) plotted against Year, 1900-2000.]

[Figure: Posterior Frequencies of Changepoint. Changepoint frequency plotted against Year, 1900-2000.]

The first graph clearly indicates the location of a changepoint at the latter end of the sample period (around 1980), and a strong upswing in average temperatures after that period. The second graph reaches a similar conclusion, as the posterior distribution of the changepoint λ places most of its mass toward the end of the sample period. Note the revision of our prior here, which was specified to be uniform over all possible discrete values: if no learning had taken place, we would expect a similarly uniform shape for the posterior distribution.

The SUR Model

Consider a two-equation version of the Seemingly Unrelated Regressions (SUR) model [Zellner (1962)]:

y_{i1} = x_{i1} β_1 + ε_{i1},
y_{i2} = x_{i2} β_2 + ε_{i2},

where ε_i = [ε_{i1} ε_{i2}]′ ∼ iid N(0, Σ), i = 1, 2, ..., n; x_{i1} and x_{i2} are 1 × k_1 and 1 × k_2 vectors of covariates, respectively; and β_1 and β_2 are the corresponding k_1 × 1 and k_2 × 1 coefficient vectors.

Suppose you employ priors of the form

β = [β_1′ β_2′]′ ∼ N(µ_β, V_β) and Σ^{−1} ∼ W(Ω, ν),

where W denotes a Wishart distribution. Note that the Wishart distribution is a multivariate generalization of the gamma distribution, and is the conjugate prior for Σ^{−1} in a multivariate normal sampling model. The kernel of the Wishart is

p(Σ^{−1}) ∝ |Σ^{−1}|^{(ν − p − 1)/2} exp( −(1/2) tr(Ω^{−1} Σ^{−1}) ),

where p is the dimension of Σ (here, p = 2).

First, note that the likelihood function can be expressed as

L(β, Σ) = (2π)^{−n} |Σ|^{−n/2} exp( −(1/2) ∑_{i=1}^{n} (ỹ_i − X_i β)′ Σ^{−1} (ỹ_i − X_i β) ),

where

ỹ_i = [y_{i1} y_{i2}]′ and X_i = [ x_{i1} 0 ; 0 x_{i2} ],

the latter being the 2 × (k_1 + k_2) block-diagonal covariate matrix for observation i.

To implement the Gibbs sampler for this model, we must derive p(β | Σ, y) and p(Σ^{−1} | β, y). We will now consider the first of these. Observe that we can stack the SUR model as

y = Xβ + ε,

where

y = [y_1′ y_2′]′, X = [ X_1 0 ; 0 X_2 ], ε = [ε_1′ ε_2′]′,

and y_j, X_j and ε_j, for j = 1, 2, have been stacked over i = 1, 2, ..., n. In addition, stacked in this way, we obtain:

ε ∼ N(0, Σ ⊗ I_n).
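A minimal sketch of this stacking step (stack_sur is an assumed name; scipy's block_diag builds the block-diagonal X):

```python
import numpy as np
from scipy.linalg import block_diag

def stack_sur(y1, X1, y2, X2):
    # y = [y1', y2']' and X = blockdiag(X1, X2), so that y = X beta + eps
    # with eps ~ N(0, Sigma kron I_n) under this equation-by-equation ordering.
    return np.concatenate([y1, y2]), block_diag(X1, X2)
```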

In this form, we can apply our established result for the linear regression model to obtain:

β | Σ, y ∼ N(D_β d_β, D_β),

where

D_β = ( X′(Σ^{−1} ⊗ I_n)X + V_β^{−1} )^{−1} and d_β = X′(Σ^{−1} ⊗ I_n)y + V_β^{−1} µ_β.
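A sketch of this β update (illustrative names; the Kronecker product Σ^{−1} ⊗ I_n is formed explicitly, which is fine for modest n but could be avoided for efficiency):

```python
import numpy as np

def draw_beta_sur(y, X, Sig_inv, mu_b, V_b, n, rng):
    W = np.kron(Sig_inv, np.eye(n))              # Sigma^{-1} kron I_n
    V_b_inv = np.linalg.inv(V_b)
    D = np.linalg.inv(X.T @ W @ X + V_b_inv)     # D_beta
    d = X.T @ W @ y + V_b_inv @ mu_b             # d_beta
    return rng.multivariate_normal(D @ d, D)
```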

The derivation of the conditional posterior distribution for Σ^{−1} is new, so we will go through the details associated with this derivation. Combining prior with likelihood, we obtain:

p(Σ^{−1} | β, y) ∝ p(Σ^{−1}) |Σ|^{−n/2} exp( −(1/2) ∑_{i=1}^{n} (ỹ_i − X_i β)′ Σ^{−1} (ỹ_i − X_i β) ).

Using the specific form of our Wishart prior, and the fact that ∑_i ε̃_i′ Σ^{−1} ε̃_i = tr( ∑_i ε̃_i ε̃_i′ Σ^{−1} ), we get (letting ε̃_i = ỹ_i − X_i β):

p(Σ^{−1} | β, y) ∝ |Σ^{−1}|^{(ν + n − p − 1)/2} exp( −(1/2) tr[ (Ω^{−1} + ∑_{i=1}^{n} ε̃_i ε̃_i′) Σ^{−1} ] ).

In this form, it follows that

Σ^{−1} | β, y ∼ W( [Ω^{−1} + ∑_{i=1}^{n} ε̃_i ε̃_i′]^{−1}, ν + n ).

Thus, implementation of the Gibbs sampler for the SUR model only requires sampling from a Normal and a Wishart distribution.

The latter of these (i.e., the Wishart), though perhaps unfamiliar, is easy to draw from. For example, when ν is an integer (at least as large as the dimension of Σ), as is often the case, we can obtain a draw from a W(Ω, ν) density as follows:

First, sample x_i ∼ N(0, Ω), i = 1, 2, ..., ν.

Then, let P = ∑_{i=1}^{ν} x_i x_i′.

It follows that P ∼ W(Ω, ν).
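A minimal Python sketch of this Wishart draw, with a trailing comment indicating how it would plug into the Σ^{−1} update above (names are illustrative):

```python
import numpy as np

def draw_wishart(Omega, nu, rng):
    # Sum of nu outer products of N(0, Omega) draws: P ~ W(Omega, nu)
    # for integer nu at least as large as the dimension of Omega.
    p = Omega.shape[0]
    x = rng.multivariate_normal(np.zeros(p), Omega, size=nu)  # nu x p
    return x.T @ x

# For the SUR model:
# Sig_inv = draw_wishart(
#     np.linalg.inv(np.linalg.inv(Omega0) + E.T @ E), nu0 + n, rng),
# where the rows of E are the residuals (y_i - X_i beta)'.
```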

Inequality Constraints

Consider the regression model

y_i = β_0 + β_1 x_{i1} + β_2 x_{i2} + ⋯ + β_k x_{ik} + ε_i, i = 1, 2, ..., n,

where ε_i ∼ iid N(0, σ²). Let β = [β_0 β_1 ⋯ β_k]′. Suppose that the regression coefficients are known to satisfy constraints of the form

a < Hβ < b,

where a and b are known (k + 1) × 1 vectors of lower and upper limits, respectively, and H is a non-singular matrix which selects linear combinations of β so as to incorporate the known constraints.

The vectors a and b may also contain elements equal to −∞ or ∞, respectively, if a particular linear combination of β is not to be bounded from below or above (or both). Thus, in this formulation of the model, we are capable of imposing up to k + 1 inequality restrictions, but no more. We seek to describe how the Gibbs sampler can be employed to fit this model.

Following Geweke (1996), we will first reparameterize the model in terms of γ = Hβ. To this end, let us write the regression model as

y = Xβ + ε,

where y = [y_1 y_2 ⋯ y_n]′, X is the n × (k + 1) matrix with i-th row [1 x_{i1} ⋯ x_{ik}], and ε = [ε_1 ε_2 ⋯ ε_n]′.

Since γ = Hβ and H is non-singular, we can write β = H^{−1}γ. The reparameterized regression model thus becomes

y = X̃γ + ε, with X̃ = XH^{−1}.

In the γ-parameterization, note that independent priors can be employed which satisfy the stated inequality restrictions. To this end, we specify independent truncated Normal priors for the elements of γ of the form:

p(γ_j) ∝ exp( −(γ_j − µ_j)²/(2V_j) ) I(a_j < γ_j < b_j), j = 0, 1, ..., k,

with I(·) denoting the standard indicator function. The posterior distribution is proportional to the product of the priors (with p(σ²) ∝ σ^{−2}) and the likelihood:

p(γ, σ² | y) ∝ σ^{−2} [ ∏_{j=0}^{k} p(γ_j) ] (σ²)^{−n/2} exp( −(1/(2σ²)) (y − X̃γ)′(y − X̃γ) ).

We can make use of our often-used formula to derive the posterior conditional for γ:

γ | σ², y ∼ N(D_γ d_γ, D_γ) truncated to a < γ < b,

where

D_γ = (X̃′X̃/σ² + V^{−1})^{−1} and d_γ = X̃′y/σ² + V^{−1}µ,

µ = [µ_0 µ_1 ⋯ µ_k]′, and V is the (k + 1) × (k + 1) diagonal matrix with j-th diagonal element equal to V_j.

The density on the last slide is a multivariate Normal density truncated to the regions defined by a and b. Drawing directly from this multivariate truncated Normal is non-trivial in general, but as Geweke [1991] points out, the posterior conditional distributions γ_j | γ_{−j}, σ², y (with γ_{−j} denoting all elements of γ other than γ_j) are univariate truncated Normal. Thus, one can sample each γ_j individually (rather than sampling γ in a single step).

Let Ω ≡ D_γ^{−1} = X̃′X̃/σ² + V^{−1}, let ω_{ij} denote the (i, j) element of Ω, and let γ_j denote the j-th element of γ. With a bit of work, one can show:

γ_j | γ_{−j}, σ², y ∼ TN_{(a_j, b_j)}( µ*_j − ω_{jj}^{−1} ∑_{i≠j} ω_{ij}(γ_i − µ*_i), ω_{jj}^{−1} ),

where µ* ≡ D_γ d_γ and TN_{(a,b)}(µ, σ²) denotes a Normal density with mean µ and variance σ² truncated to the interval (a, b).

The derivation of the posterior conditional for the variance parameter is reasonably standard by this point:

σ² | γ, y ∼ IG( n/2, [ (1/2)(y − X̃γ)′(y − X̃γ) ]^{−1} ).

A Gibbs sampling algorithm for the regression model with linear inequality constraints thus follows by sampling each regression parameter γ_j in turn and then sampling the variance parameter σ², as in the sketch below.
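A minimal sketch of one full Gibbs pass for this constrained model, under the assumptions above (X̃ passed as Xt, diagonal prior variances Vdiag, bounds a and b; all names illustrative, and entries of a or b may be ±np.inf for unbounded directions):

```python
import numpy as np
from scipy.stats import truncnorm

def gibbs_pass(y, Xt, gamma, sig2, mu, Vdiag, a, b, rng):
    # One pass: update each gamma_j from its univariate truncated Normal
    # conditional, then update sigma^2 under the prior p(sig2) ~ 1/sig2.
    Omega = Xt.T @ Xt / sig2 + np.diag(1.0 / Vdiag)   # posterior precision
    mu_star = np.linalg.solve(Omega, Xt.T @ y / sig2 + mu / Vdiag)
    for j in range(len(gamma)):
        v_j = 1.0 / Omega[j, j]
        # conditional mean: subtract the i = j term from the full sum
        m_j = mu_star[j] - v_j * (Omega[j] @ (gamma - mu_star)
                                  - Omega[j, j] * (gamma[j] - mu_star[j]))
        sd = np.sqrt(v_j)
        gamma[j] = truncnorm.rvs((a[j] - m_j) / sd, (b[j] - m_j) / sd,
                                 loc=m_j, scale=sd, random_state=rng)
    resid = y - Xt @ gamma
    sig2 = 1.0 / rng.gamma(len(y) / 2.0, 2.0 / (resid @ resid))
    return gamma, sig2
```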