Gibbs Sampling in Latent Variable Models #1

Econ 690, Purdue University

Outline
1. Data augmentation
2. The Probit Model
   - Probit Application
   - A Panel Probit Model
3. The Tobit Model
   - Example: Female Labor Supply

Many microeconometric applications (including binary, discrete choice, tobit and generalized tobit analyses) involve the use of latent data. This latent data is unobserved by the econometrician, but the observed choices economic agents make typically impose some type of truncation or ordering among the latent variables.

In this lecture we show how the Gibbs sampler can be used to fit a variety of common microeconometric models involving the use of latent data. In particular, we review how data augmentation [see, e.g., Tanner and Wong (1987), Chib (1992) and Albert and Chib (1993)] can be used to simplify the computations in these models. Importantly, we recognize that many popular models in econometrics are essentially linear regression models on suitably defined latent data. Thus, conditioned on the latent data's values, we can apply all of our previous techniques to sample parameters in the framework of a linear regression model.

To review the idea behind data augmentation in general terms, suppose that we are primarily interested in characterizing the joint posterior p(θ | y). An alternative to analyzing this posterior directly involves working with a seemingly more complicated posterior p(θ, z | y) for some latent data z. Although the addition of the variables z might seem to needlessly complicate the estimation exercise, it often proves to be computationally convenient. To implement the Gibbs sampler here, we need to obtain the posterior conditionals p(θ | z, y) and p(z | θ, y). Often, the posterior conditional p(θ | z, y) will take a convenient form, thus making it possible to fit the model using the Gibbs sampler.

Probit Model

Consider the following latent variable representation of the probit model:

z_i = x_iβ + ɛ_i, ɛ_i ~ iid N(0, 1),

y_i = 1 if z_i > 0 and y_i = 0 if z_i ≤ 0.

The value of the binary variable y_i is observed, as are the values of the explanatory variables x_i. The latent data z_i, however, are unobserved.

For this model, we seek to:

(a) Derive the likelihood function.
(b) Using a prior for β of the form β ~ N(µ_β, V_β), derive the augmented joint posterior p(β, z | y).
(c) Verify that, marginalized over z, the joint posterior for the parameters β is exactly the same as the posterior you would obtain without introducing any latent variables into the model.
(d) Discuss how the Gibbs sampler can be employed to fit the model.

(a) Note that

Pr(y_i = 1 | x_i, β) = Pr(x_iβ + ɛ_i > 0) = Pr(ɛ_i > −x_iβ) = 1 − Φ(−x_iβ) = Φ(x_iβ).

Similarly, Pr(y_i = 0 | x_i, β) = 1 − Φ(x_iβ). Because of the assumed independence across observations, the likelihood function is obtained as:

L(β) = ∏_{i=1}^{n} Φ(x_iβ)^{y_i} [1 − Φ(x_iβ)]^{1−y_i}.
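The likelihood just derived is easy to evaluate numerically. Below is a minimal sketch using only the Python standard library; the function and variable names are ours, not from the course materials.

```python
from math import log
from statistics import NormalDist

Phi = NormalDist().cdf  # standard Normal c.d.f.

def probit_loglik(beta, X, y):
    """Log of L(beta) = prod Phi(x_i beta)^y_i [1 - Phi(x_i beta)]^(1 - y_i)."""
    ll = 0.0
    for xi, yi in zip(X, y):
        xb = sum(b * v for b, v in zip(beta, xi))  # x_i beta
        ll += log(Phi(xb)) if yi == 1 else log(1.0 - Phi(xb))
    return ll
```

For example, with an intercept-only design and β = 0, each observation contributes log Φ(0) = log 0.5.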

(b) To derive the augmented joint posterior, note that

p(β, z, y) = p(β)p(y, z | β),

implying that

p(β, z | y) ∝ p(β)p(y, z | β).

The term p(β) is simply our prior, while p(y, z | β) represents the complete or augmented data density.

To characterize this density in more detail, note that

p(y, z | β) = p(y | z, β)p(z | β).

Immediately, from our latent variable representation, we know

z_i | β ~ ind N(x_iβ, 1).

As for the conditional for y given z and β, note that when z_i > 0, then y_i must equal one, while when z_i ≤ 0, y_i must equal zero.

In other words, the sign of z_i perfectly predicts the value of y_i. Hence, we can write

p(y | z, β) = ∏_{i=1}^{n} [I(z_i > 0)I(y_i = 1) + I(z_i ≤ 0)I(y_i = 0)],

with I(·) denoting the indicator function, taking on the value one if the statement in the parentheses is true and zero otherwise.

This previous expression simply states that when z_i is positive, then y_i is one with probability one, and conversely, when z_i is negative, then y_i is zero with probability one. Putting the pieces together, we obtain the augmented data density p(y, z | β). We combine this with our prior to obtain

p(β, z | y) ∝ p(β) ∏_{i=1}^{n} [I(z_i > 0)I(y_i = 1) + I(z_i ≤ 0)I(y_i = 0)] φ(z_i; x_iβ, 1).

(c) To show the equality of these quantities, we integrate our last expression for the joint posterior over z to obtain:

p(β | y) = ∫ p(β, z | y) dz

∝ p(β) ∫ ∏_{i=1}^{n} [I(z_i > 0)I(y_i = 1) + I(z_i ≤ 0)I(y_i = 0)] φ(z_i; x_iβ, 1) dz_1 ⋯ dz_n

= p(β) ∏_{i=1}^{n} [ ∫ [I(z_i > 0)I(y_i = 1) + I(z_i ≤ 0)I(y_i = 0)] φ(z_i; x_iβ, 1) dz_i ]

= p(β) ∏_{i=1}^{n} [ I(y_i = 1) ∫_0^∞ φ(z_i; x_iβ, 1) dz_i + I(y_i = 0) ∫_{−∞}^0 φ(z_i; x_iβ, 1) dz_i ]

= p(β) ∏_{i=1}^{n} [ I(y_i = 0)[1 − Φ(x_iβ)] + I(y_i = 1)Φ(x_iβ) ]

= p(β) ∏_{i=1}^{n} Φ(x_iβ)^{y_i} [1 − Φ(x_iβ)]^{1−y_i}.

Note that this is exactly the prior times the likelihood obtained in part (a), which was derived without explicitly introducing any latent variables. What is important to note is that the posterior of β is unchanged by the addition of the latent variables. So, augmenting the posterior with z will not change any inference regarding β, though it does make the problem computationally easier, as described in the solution to the next part.

(d) With the addition of the latent data, the complete conditionals are easily obtained. In particular, the complete conditional for β given z and the data y follows directly from standard results for the linear regression model (with variance parameter equal to one):

β | z, y ~ N(D_β d_β, D_β),

where, with X denoting the stacked x_i and z the stacked latent data,

D_β = (X′X + V_β^{−1})^{−1}, d_β = X′z + V_β^{−1}µ_β.

As for the complete conditional for z, first note that the independence across observations implies that each z_i can be drawn independently. We also note that

p(z_i | β, y) ∝ φ(z_i; x_iβ, 1) [I(z_i > 0)I(y_i = 1) + I(z_i ≤ 0)I(y_i = 0)].

Thus,

z_i | β, y ~ ind TN_{(0,∞)}(x_iβ, 1) if y_i = 1,
z_i | β, y ~ ind TN_{(−∞,0]}(x_iβ, 1) if y_i = 0,

where the notation TN_{[a,b]}(µ, σ²) denotes a Normal distribution with mean µ and variance σ² truncated to the interval [a, b].

These derivations show that one can implement the Gibbs sampler by drawing β | z, y from a multivariate Normal distribution and then drawing each z_i | β, y, i = 1, 2, …, n, independently from its truncated Normal conditional posterior distribution. To generate draws from the truncated Normal, one can use the inverse transform method as described in previous lectures. I am supplying a MATLAB program, truncnorm3.m, which generates draws from a truncated normal distribution (and uses the method of inversion).
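To make the algorithm concrete, here is a sketch of the full sampler in Python for the special case of a single covariate, so that β is scalar and no matrix algebra is needed. The truncated Normal draws use the inverse transform method, in the spirit of truncnorm3.m; all names and the scalar simplification are our own.

```python
import random
from statistics import NormalDist

N01 = NormalDist()

def draw_trunc_normal(mu, sigma, lower, upper, rng):
    # Inverse transform: draw u uniformly between the c.d.f. values of
    # the bounds, then map back through the Normal quantile function.
    a = N01.cdf((lower - mu) / sigma)
    b = N01.cdf((upper - mu) / sigma)
    u = a + (b - a) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # guard the extreme tails
    return mu + sigma * N01.inv_cdf(u)

def probit_gibbs(x, y, mu_beta, V_beta, iters, seed=0):
    rng = random.Random(seed)
    n = len(y)
    beta, z = 0.0, [0.0] * n
    draws = []
    inf = float("inf")
    for _ in range(iters):
        # beta | z, y ~ N(D*d, D) with D = (x'x + 1/V_beta)^(-1)
        D = 1.0 / (sum(xi * xi for xi in x) + 1.0 / V_beta)
        d = sum(xi * zi for xi, zi in zip(x, z)) + mu_beta / V_beta
        beta = rng.gauss(D * d, D ** 0.5)
        # z_i | beta, y ~ N(x_i beta, 1), truncation fixed by the sign of y_i
        for i in range(n):
            lo, hi = (0.0, inf) if y[i] == 1 else (-inf, 0.0)
            z[i] = draw_trunc_normal(x[i] * beta, 1.0, lo, hi, rng)
        draws.append(beta)
    return draws
```

With 70 percent successes on an intercept-only design, the retained draws of β should settle near Φ⁻¹(.7) ≈ .52.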

Probit: Application

In a paper that has been used to illustrate the binary choice model in different econometrics texts [e.g., Gujarati (2003)], Fair (1978) analyzed the decision to have an extramarital affair. We take a version of these data and fit a binary choice model to examine factors that are related to the decision to have an affair.

We specify a probit model where the decision to have an extramarital affair depends on seven variables:
1. An intercept (CONS)
2. A male dummy (MALE)
3. Number of years married (YS-MARRIED)
4. A dummy variable if the respondent has children from the marriage (KIDS)
5. A dummy for classifying one's self as religious (RELIGIOUS)
6. Years of schooling completed (ED)
7. A final dummy variable denoting whether the person views the marriage as happier than an average marriage (HAPPY)

For our prior for the 7 × 1 parameter vector β, we specify β ~ N(0, 10²I_7), so that the prior is quite noninformative and centered at zero. We run the Gibbs sampler for 2,000 iterations and discard the first 500 as the burn-in.

Table 14.1: Coefficient and Marginal Effect Posterior Means and Standard Deviations from Probit Model Using Fair's (1978) Data

                 Coefficient            Marginal Effect
               Mean    Std. Dev.      Mean    Std. Dev.
CONS          -.726    (.417)          --      --
MALE           .154    (.131)         .047    (.040)
YS-MARRIED     .029    (.013)         .009    (.004)
KIDS           .256    (.159)         .073    (.045)
RELIGIOUS     -.514    (.124)        -.150    (.034)
ED             .005    (.026)         .001    (.008)
HAPPY         -.514    (.125)        -.167    (.042)

To illustrate how quantities of interest other than regression coefficients can be calculated, we also report posterior means and standard deviations associated with marginal effects from the probit model. For a continuous explanatory variable we note:

∂E(y | x, β)/∂x_j = β_j φ(xβ).

For given x, the marginal effect on the previous slide is a function of the regression parameters β. Thus, the posterior distribution of the marginal effect can be obtained by calculating and collecting the values {β_j^{(i)} φ(xβ^{(i)})}_{i=1}^{1500}, where β^{(i)} represents the i-th post-convergence draw obtained from the Gibbs sampler. When x_j is binary, we calculate the marginal effect as the difference between the Normal c.d.f.s evaluated when the binary indicator is set to 1 and then to 0.
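In code, computing the posterior of a marginal effect amounts to pushing each retained draw through the formula. A sketch with illustrative names (beta_draws would be the post-burn-in Gibbs output):

```python
from statistics import NormalDist, mean, stdev

phi = NormalDist().pdf  # standard Normal density

def marginal_effect_draws(beta_draws, x, j):
    """Return {beta_j^(i) * phi(x beta^(i))} over the retained draws."""
    out = []
    for beta in beta_draws:
        xb = sum(b * v for b, v in zip(beta, x))
        out.append(beta[j] * phi(xb))
    return out
```

mean(...) and stdev(...) of the returned list give the posterior mean and standard deviation of the marginal effect, evaluated at the chosen covariate vector x.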

A Panel Probit Model

To illustrate how our results for nonlinear and hierarchical models can be combined, consider a panel probit model of the form:

z_it = x_itβ + α_i + ɛ_it, ɛ_it ~ iid N(0, 1).

The observed binary responses y_it are generated according to:

y_it = 1 if z_it > 0, and y_it = 0 if z_it ≤ 0.

The random effects {α_i} are drawn from a common distribution: α_i ~ iid N(α, σ_α²). Suppose that you employ priors of the following forms:

β ~ N(µ_β, V_β), α ~ N(µ_α, V_α), σ_α² ~ IG(a, b).

We seek to show how a Gibbs sampler can be employed to fit this panel probit model.

Like the probit without a panel structure, we will use data augmentation to fit the model. Specifically, we will work with an augmented posterior distribution of the form p(z, {α_i}, α, σ_α², β | y). Derivation of the complete posterior conditionals then follows similarly to those derived in this lecture for the probit and in a previous lecture for linear hierarchical models.

Specifically, we obtain:

α_i | α, σ_α², β, z, y ~ ind N(D_{α_i} d_{α_i}, D_{α_i}), i = 1, 2, …, n,

where

D_{α_i} = (T + σ_α^{−2})^{−1}, d_{α_i} = Σ_t (z_it − x_itβ) + σ_α^{−2}α.

Next,

β | α, {α_i}, σ_α², z, y ~ N(D_β d_β, D_β),

where

D_β = (X′X + V_β^{−1})^{−1}, d_β = X′(z − α̃) + V_β^{−1}µ_β,

X and z have been stacked appropriately, and α̃ ≡ [α_1 ι_T′ ⋯ α_n ι_T′]′ with ι_T denoting a T × 1 vector of ones.

Next,

α | β, {α_i}, σ_α², z, y ~ N(D_α d_α, D_α),

where

D_α = (n/σ_α² + V_α^{−1})^{−1}, d_α = Σ_i α_i/σ_α² + V_α^{−1}µ_α.

Finally,

σ_α² | α, {α_i}, β, z, y ~ IG( n/2 + a, [b^{−1} + .5 Σ_i (α_i − α)²]^{−1} ).

Fitting the model involves cycling through these conditional posterior distributions. Of course, blocking steps can and should be used to improve the mixing of the posterior simulator.
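The hierarchical updates above map directly into code. A sketch of the per-iteration draws for α_i, α, and σ_α² (illustrative names; the inverse Gamma draw is implemented by drawing the precision from the corresponding Gamma and inverting):

```python
import random

def draw_alpha_i(z_i, xb_i, alpha, sig2_a, rng):
    # alpha_i | rest ~ N(D d, D), with D = (T + 1/sig2_a)^(-1)
    # and d = sum_t (z_it - x_it beta) + alpha / sig2_a
    T = len(z_i)
    D = 1.0 / (T + 1.0 / sig2_a)
    d = sum(zt - xbt for zt, xbt in zip(z_i, xb_i)) + alpha / sig2_a
    return rng.gauss(D * d, D ** 0.5)

def draw_alpha(alphas, sig2_a, mu_a, V_a, rng):
    # alpha | rest ~ N(D d, D), with D = (n/sig2_a + 1/V_a)^(-1)
    n = len(alphas)
    D = 1.0 / (n / sig2_a + 1.0 / V_a)
    d = sum(alphas) / sig2_a + mu_a / V_a
    return rng.gauss(D * d, D ** 0.5)

def draw_sig2_alpha(alphas, alpha, a, b, rng):
    # sig2_a | rest ~ IG(n/2 + a, [1/b + .5 sum (alpha_i - alpha)^2]^(-1))
    n = len(alphas)
    shape = n / 2.0 + a
    scale = 1.0 / (1.0 / b + 0.5 * sum((ai - alpha) ** 2 for ai in alphas))
    return 1.0 / rng.gammavariate(shape, scale)
```

The β and z draws are as in the cross-sectional probit, with x_itβ + α_i replacing x_iβ as the latent mean.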

The Tobit Model

The tobit model specifies a mixed discrete-continuous distribution for a censored outcome variable y. In most applications of tobit models, values of y are observed provided y is positive, while we simultaneously see a clustering of y values at zero.

Formally, we can write the tobit specification in terms of a latent variable model:

z_i = x_iβ + ɛ_i, ɛ_i ~ iid N(0, σ²),

and

y_i = z_i if z_i > 0, y_i = 0 if z_i ≤ 0.

For this model, we seek to:

(a) Write down the likelihood function for the tobit model.
(b) Describe how data augmentation can be used in conjunction with the Gibbs sampler to carry out a Bayesian analysis of this model.

(a) The likelihood function breaks into two parts. For the set of observations censored at zero, the contribution to the likelihood is:

Pr(y_i = 0 | x_i, β, σ²) = Pr(ɛ_i ≤ −x_iβ) = 1 − Φ(x_iβ/σ).

Similarly, when y_i > 0, the contribution to the likelihood is

φ(y_i; x_iβ, σ²) = (1/σ) φ[(y_i − x_iβ)/σ],

with φ(·) denoting the standard Normal density function. Hence,

L(β, σ²) = ∏_{i: y_i = 0} [1 − Φ(x_iβ/σ)] ∏_{i: y_i > 0} (1/σ) φ[(y_i − x_iβ)/σ].
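As a check on the two-part likelihood, a direct evaluation in Python (standard library only; the names are ours):

```python
from math import log
from statistics import NormalDist

N01 = NormalDist()

def tobit_loglik(beta, sigma, X, y):
    """Censored-at-zero tobit log-likelihood."""
    ll = 0.0
    for xi, yi in zip(X, y):
        xb = sum(b * v for b, v in zip(beta, xi))
        if yi == 0:
            # censored: contributes 1 - Phi(x_i beta / sigma)
            ll += log(1.0 - N01.cdf(xb / sigma))
        else:
            # uncensored: contributes (1/sigma) phi((y_i - x_i beta)/sigma)
            ll += log(N01.pdf((yi - xb) / sigma) / sigma)
    return ll
```

At β = 0 a censored observation contributes log(1 − Φ(0)) = log 0.5, regardless of σ.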

(b) Chib (1992, Journal of Econometrics) was the original Bayesian tobit paper. Before discussing how the model can be fit using the Gibbs sampler, we note that the following prior specifications are employed:

β ~ N(µ_β, V_β), σ² ~ IG(a, b).

Following Chib (1992), we augment the joint posterior with the latent data z. Thus, we will work with p(β, σ², z | y). It is also important to recognize that when y_i > 0, z_i is observed (and equals y_i), while when y_i = 0, we know that z_i is truncated from above at zero.

To this end, let D_i be a binary variable that equals one if the observation is censored (y_i = 0) and equals zero otherwise. We define

y_{z,i} ≡ D_i z_i + (1 − D_i) y_i,

so that y_z just takes the value y for the uncensored observations, and for the censored observations, y_z will take the value of the latent data z. Conditioned on z, this makes it easy to draw from the β conditional.

We obtain the following complete posterior conditionals for the standard tobit model:

β | σ², z, y ~ N(D_β d_β, D_β),

where

D_β = (X′X/σ² + V_β^{−1})^{−1}, d_β = X′y_z/σ² + V_β^{−1}µ_β,

and

σ² | β, z, y ~ IG( n/2 + a, [b^{−1} + (1/2) Σ_{i=1}^{n} (y_{z,i} − x_iβ)²]^{−1} ).

Finally, for the censored observations (y_i = 0),

z_i | β, σ², y ~ ind TN_{(−∞,0]}(x_iβ, σ²),

while z_i is simply set to y_i for the uncensored observations. A Gibbs sampler involves cycling through these three sets of conditional posterior distributions.
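Putting the three conditionals together for a single covariate (so β is scalar), a sketch of the full tobit sampler follows; the names, the scalar simplification, and the inverse-transform truncated Normal step are our own illustration of the algorithm.

```python
import random
from statistics import NormalDist

N01 = NormalDist()

def tobit_gibbs(x, y, mu_b, V_b, a, b, iters, seed=0):
    rng = random.Random(seed)
    n = len(y)
    beta, sig2 = 0.0, 1.0
    z = [float(yi) for yi in y]
    draws = []
    for _ in range(iters):
        # complete data: latent z for censored obs, observed y otherwise
        yz = [z[i] if y[i] == 0 else y[i] for i in range(n)]
        # beta | sig2, z, y
        D = 1.0 / (sum(xi * xi for xi in x) / sig2 + 1.0 / V_b)
        d = sum(xi * v for xi, v in zip(x, yz)) / sig2 + mu_b / V_b
        beta = rng.gauss(D * d, D ** 0.5)
        # sig2 | beta, z, y (inverse Gamma, drawn via the precision)
        shape = n / 2.0 + a
        scale = 1.0 / (1.0 / b + 0.5 * sum((v - beta * xi) ** 2
                                           for v, xi in zip(yz, x)))
        sig2 = 1.0 / rng.gammavariate(shape, scale)
        # z_i | beta, sig2, y: Normal truncated above at 0 (censored obs)
        sd = sig2 ** 0.5
        for i in range(n):
            if y[i] == 0:
                ub = N01.cdf((0.0 - beta * x[i]) / sd)
                u = max(ub * rng.random(), 1e-300)  # guard underflow
                z[i] = beta * x[i] + sd * N01.inv_cdf(u)
        draws.append((beta, sig2))
    return draws
```

Each iteration cycles β, σ², and then the censored z_i, exactly as in the three conditionals above.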

Tobit Model: Application

We illustrate the use of the tobit model with a simple data set providing information on the number of weeks worked in 1990 by a sample of 2,477 married women. The data are taken from the National Longitudinal Survey of Youth (NLSY). Importantly, we recognize that this data set has a two-sided censoring problem, wherein the number of weeks worked is clustered both at zero and at 52 (full-time employment). We seek to account for this two-sided censoring in our model specification.

For our priors, we use the same forms as those presented for the general tobit model and set µ_β = 0, V_β = 10²I_5, a = 3, b = (1/40). In our data, approximately 45 percent of the sample reports working 52 weeks per year and 17 percent of the sample reports working 0 weeks per year. To account for this feature of our data, we specify a slightly generalized form of the tobit model:

z_i = x_iβ + ɛ_i, ɛ_i ~ iid N(0, σ²),

and

y_i = 0 if z_i ≤ 0, y_i = z_i if 0 < z_i < 52, y_i = 52 if z_i ≥ 52.

In terms of the Gibbs sampler, the only difference between the algorithm required for this model and the one for the standard tobit is that z_i must also be drawn truncated from below at 52 for all of those observations with y_i = 52. One possible programming expedient in this regard is to sample a latent variable truncated from above at zero (say z_i^1) and one truncated from below at 52 (say z_i^2) for every observation in the sample. Then, let D_i^1 be a dummy variable equal to one if y_i = 0 and D_i^2 a dummy equal to one if y_i = 52. We can then construct our complete data vector y_z as:

y_z = D^1 z^1 + D^2 z^2 + (1 − D^1 − D^2)y.

With this construction, the Gibbs sampler then proceeds identically to the previous case.
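The construction of y_z is a one-liner per observation. A sketch with hypothetical names (z1 and z2 hold the above- and below-truncated draws, respectively):

```python
def complete_data(y, z1, z2):
    # y_z = D1*z1 + D2*z2 + (1 - D1 - D2)*y, with D1 = 1(y = 0)
    # flagging censoring at zero and D2 = 1(y = 52) at 52.
    yz = []
    for yi, z1i, z2i in zip(y, z1, z2):
        D1 = 1 if yi == 0 else 0
        D2 = 1 if yi == 52 else 0
        yz.append(D1 * z1i + D2 * z2i + (1 - D1 - D2) * yi)
    return yz
```

Interior observations pass through unchanged; only the two censoring points are replaced by their latent draws.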

Table 14.4: Coefficient and Marginal Effect Posterior Means and Standard Deviations from Tobit Model of Weeks Worked by Married Women

                 Coefficient            Marginal Effect
               Mean    Std. Dev.      Mean     Std. Dev.
CONST         29.65    (6.06)          --       --
AFQT           .107    (.047)         .043     (.019)
SPOUSE-INC    -.245    (.053)        -.098     (.021)
KIDS        -26.65     (2.44)      -10.61      (.971)
ED            2.94     (.513)        1.17      (.202)
σ            44.35     (1.20)          --       --

The sampler was run for 5,500 iterations, and the first 500 were discarded as the burn-in. For a tobit with two-sided censoring, one can show:

∂E(y | x, β)/∂x_j = β_j [Φ((52 − xβ)/σ) − Φ(−xβ/σ)].

Our results suggest that married women with low test scores, few years of education, and children in the home are likely to work fewer weeks during the year.
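The two-sided marginal effect can be evaluated per posterior draw exactly as in the probit case. A sketch with illustrative names:

```python
from statistics import NormalDist

Phi = NormalDist().cdf

def two_sided_me(beta, sigma, x, j):
    # beta_j * [Phi((52 - x beta)/sigma) - Phi(-x beta/sigma)]:
    # the raw coefficient is damped by the probability mass lying
    # strictly between the two censoring points.
    xb = sum(b * v for b, v in zip(beta, x))
    return beta[j] * (Phi((52.0 - xb) / sigma) - Phi(-xb / sigma))
```

When xβ sits far inside (0, 52) relative to σ, the bracketed term is near one and the effect approaches β_j; far outside that range it shrinks toward zero.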