November 2002 STA216 1 Random Effects Selection in Linear Mixed Models

November 2002 STA216 2 Introduction It is common practice in many applications to collect multiple measurements on a subject. Linear mixed models (Laird and Ware, 1982; Longford, 1993) attempt to account for within-subject dependency in the multiple measurements by including one or more subject-specific latent variables (i.e., random effects) in the regression model. An important practical problem in applying linear mixed models is how to choose the random effects component. Use AIC or BIC? Likelihood ratio test? Score test?

November 2002 STA216 3 Bayesian Hierarchical Approach We propose an approach for selecting random effects using a hierarchical Bayesian model. A key step is the covariance decomposition
$D = \Lambda \Gamma \Gamma^T \Lambda$, (1)
where $\Lambda$ is a nonnegative diagonal matrix and $\Gamma$ is lower triangular with unit diagonal. We allow elements of $\Lambda$ to have positive probability of being zero, so that random effects can have zero variances and effectively drop out of the model. Conditionally, the parameters in either $\Lambda$ or $\Gamma$ can be regarded as regression coefficients in a normal linear model.
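
As a quick illustration of decomposition (1), the sketch below (hypothetical parameter values; NumPy assumed) builds $D$ from $\Lambda$ and $\Gamma$ for $q = 3$ and shows that setting $\lambda_3 = 0$ zeroes out the third row and column of $D$, dropping that random effect.

import numpy as np

q = 3
lam = np.array([1.5, 0.8, 0.0])    # lambda_3 = 0: third random effect drops out
Lam = np.diag(lam)                 # Lambda: nonnegative diagonal
Gam = np.array([[1.0, 0.0, 0.0],   # Gamma: lower triangular, unit diagonal
                [0.4, 1.0, 0.0],
                [0.0, 0.0, 1.0]])  # gamma_31 = gamma_32 = 0 is forced when lambda_3 = 0

D = Lam @ Gam @ Gam.T @ Lam        # decomposition (1)
print(D)                           # third row and column are identically zero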

November 2002 STA216 4 Linear Mixed Models n subjects, with subject i contributing $n_i$ observations. For subject i at observation j, let $y_{ij}$ denote a response variable, let $x_{ij}$ denote a $p \times 1$ vector of predictors, and let $z_{ij}$ denote a $q \times 1$ vector of predictors. In general, the linear mixed effects model is written as
$y_i = X_i \alpha + Z_i \beta_i + \varepsilon_i$, (2)
where $y_i = (y_{i1}, \ldots, y_{i n_i})^T$, $X_i = (x_{i1}^T, \ldots, x_{i n_i}^T)^T$, $Z_i = (z_{i1}^T, \ldots, z_{i n_i}^T)^T$, $\alpha$ is a $p \times 1$ vector of unknown population parameters, $\beta_i$ is a $q \times 1$ vector of unknown subject-specific random effects with $\beta_i \sim N(0, D)$, and the residual vector satisfies $\varepsilon_i \sim N(0, \sigma^2 I)$. Integrating out the random effects $\beta_i$, the marginal distribution of $y_i$ is

November 2002 STA216 5 $N(X_i \alpha, \, Z_i D Z_i^T + \sigma^2 I)$. Heterogeneity among subjects is accommodated by allowing the linear predictor, conditional on the covariates, to vary. When $z_{ij}$ is a subvector of $x_{ij}$, the model allows the regression coefficients for the covariates included in $z_{ij}$ to vary among subjects, while assuming that the remaining coefficients are fixed for all subjects. In Bayesian estimation of mixed models, the standard choice is an inverse-Wishart prior for $D$. The inverse-Wishart density tends to be restrictive, however, since it prescribes a common degrees of freedom for all the diagonal entries of $D$. In addition, it is only useful if the random effects component is known, since it restricts all random effect variances to be positive.
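
To make model (2) concrete, here is a minimal simulation sketch (hypothetical dimensions and parameter values; NumPy assumed) that draws data for one subject repeatedly and checks the marginal covariance $Z_i D Z_i^T + \sigma^2 I$ by Monte Carlo.

import numpy as np

rng = np.random.default_rng(0)
n_i, p, q, sigma2 = 5, 2, 2, 0.5           # hypothetical sizes and error variance
X = rng.normal(size=(n_i, p))              # fixed-effects design X_i
Z = rng.normal(size=(n_i, q))              # random-effects design Z_i
alpha = np.array([1.0, -0.5])              # population coefficients
D = np.array([[1.0, 0.3],                  # random-effects covariance
              [0.3, 0.5]])

def draw_y():
    beta = rng.multivariate_normal(np.zeros(q), D)      # beta_i ~ N(0, D)
    eps = rng.normal(scale=np.sqrt(sigma2), size=n_i)   # eps_i ~ N(0, sigma^2 I)
    return X @ alpha + Z @ beta + eps                   # model (2)

ys = np.array([draw_y() for _ in range(50_000)])
emp_cov = np.cov(ys.T)                     # approaches Z_i D Z_i^T + sigma^2 I
print(np.round(emp_cov - (Z @ D @ Z.T + sigma2 * np.eye(n_i)), 2))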

November 2002 STA216 6 Reparameterization Starting with the model that has a random coefficient for each of the elements of $z_{ij}$, we adaptively select models having some random effects excluded. From model (2), it is clear that selecting a subset of random effects is equivalent to setting to 0 the variances of the nonselected random effects. Let $d_{lm}$ denote the $(l, m)$th entry of $D$, for $l, m = 1, \ldots, q$. The $l$th random effect $\beta_{il}$ is excluded if $d_{ll} = 0$ and is included if $d_{ll} > 0$. Let $L$ be the lower triangular Cholesky factor of $D$. We assume that $L$ has nonnegative diagonal elements so that it is unique (Seber, 1977, p. 388). Given $L$, the linear mixed model (2) can be reexpressed as
$y_i = X_i \alpha + Z_i L b_i + \varepsilon_i$,

November 2002 STA216 7 where $b_i = (b_{i1}, \ldots, b_{iq})^T$ is a vector of independent standard normal latent variables. We further let $L = \Lambda \Gamma$, where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_q)$ and $\Gamma$ is a $q \times q$ matrix with $(l, m)$th element denoted by $\gamma_{lm}$. As minimal conditions on $\Lambda$ and $\Gamma$ so that they are uniquely defined, we assume that
$\lambda_l \geq 0$, $\gamma_{ll} = 1$, and $\gamma_{lm} = 0$ for $l = 1, \ldots, q$, $m = l + 1, \ldots, q$. (3)
Specifically, we choose $\Lambda$ to be a nonnegative $q \times q$ diagonal matrix, and $\Gamma$ to be a lower triangular matrix with 1s on the diagonal. This leads to the decomposition of $D$ in (1) and to the reparameterized linear mixed model,
$y_i = X_i \alpha + Z_i \Lambda \Gamma b_i + \varepsilon_i$. (4)
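
When all the variances are positive, $\Lambda$ and $\Gamma$ can be read off the Cholesky factor: $\lambda_l$ is the $l$th diagonal entry of $L$, and $\Gamma = \Lambda^{-1} L$. A small sketch (hypothetical $D$; NumPy assumed):

import numpy as np

D = np.array([[1.0, 0.3],              # hypothetical positive-definite D
              [0.3, 0.5]])
L = np.linalg.cholesky(D)              # unique lower Cholesky factor of D
lam = np.diag(L)                       # lambda_l: diagonal of L
Gam = L / lam[:, None]                 # Gamma = Lambda^{-1} L: unit lower triangular
Lam = np.diag(lam)
assert np.allclose(Lam @ Gam @ Gam.T @ Lam, D)   # recovers decomposition (1)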

November 2002 STA216 8 Implications of the Reparameterization Following straightforward matrix algebra, the diagonal elements of $D$ are
$d_{ll} = \lambda_l^2 \left( 1 + \sum_{r=1}^{l-1} \gamma_{lr}^2 \right)$ for $l = 1, \ldots, q$, (5)
and the off-diagonal elements are
$d_{lm} = d_{ml} = \lambda_l \lambda_m \left( \gamma_{ml} + \sum_{r=1}^{l-1} \gamma_{lr} \gamma_{mr} \right)$ for $l = 1, \ldots, q$; $m = l + 1, \ldots, q$.
In the case where $\lambda_l = 0$, $\mathrm{var}(\beta_{il}) = 0$ and the $l$th random effect, $\beta_{il}$, is effectively dropped. The parameters $\gamma \in \mathbb{R}^{q(q-1)/2}$ measure the degree of within-subject dependency in the random effects $\beta_i$, as is clear from the expression for the correlation coefficient

November 2002 STA216 9 between $\beta_{il}$ and $\beta_{im}$, for $l < m$,
$\rho(\beta_{im}, \beta_{il}) = \dfrac{\gamma_{ml} + \sum_{r=1}^{l-1} \gamma_{lr} \gamma_{mr}}{\sqrt{\left( 1 + \sum_{r=1}^{l-1} \gamma_{lr}^2 \right) \left( 1 + \sum_{r=1}^{m-1} \gamma_{mr}^2 \right)}}$,
which does not depend on $\lambda$. As functions of elements of the covariance matrix $D$, $\lambda$ and $\gamma$ are not independent. In particular, if $\lambda_l = 0$, then $\gamma_{ml} = \gamma_{lm} = 0$ for all $m \in \{l + 1, \ldots, q\}$ and all $m \in \{1, \ldots, l - 1\}$, respectively. For later use, we define
$R_\lambda = \left\{ \gamma : \gamma_{ml} = \gamma_{lm} = 0 \text{ if } \lambda_l = 0, \; l = 1, \ldots, q, \; m = l + 1, \ldots, q \text{ and } m = 1, \ldots, l - 1, \text{ respectively} \right\}$. (6)
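
A numerical check of (5) and of the correlation identity, with hypothetical $\Lambda$ and $\Gamma$ (NumPy assumed; 0-based indices):

import numpy as np

lam = np.array([1.5, 0.8, 1.2])
Gam = np.array([[1.0, 0.0, 0.0],
                [0.4, 1.0, 0.0],
                [-0.7, 0.2, 1.0]])
D = np.diag(lam) @ Gam @ Gam.T @ np.diag(lam)

# d_ll = lambda_l^2 (1 + sum_{r<l} gamma_lr^2), as in (5)
l = 2
print(D[l, l], lam[l]**2 * (1 + Gam[l, :l] @ Gam[l, :l]))

# rho(beta_im, beta_il) depends on Gamma only
corr = D / np.sqrt(np.outer(np.diag(D), np.diag(D)))
l, m = 0, 2
num = Gam[m, l] + Gam[l, :l] @ Gam[m, :l]
den = np.sqrt((1 + Gam[l, :l] @ Gam[l, :l]) * (1 + Gam[m, :m] @ Gam[m, :m]))
print(corr[m, l], num / den)           # the two values agree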

November 2002 STA216 10 Prior Specification Our model is completed with a prior density for $\theta = (\alpha, \lambda, \gamma, \sigma^2)^T$. First, we assume $p(\theta) = p(\lambda, \gamma)\, p(\alpha)\, p(\sigma^2)$. Following standard convention, we choose conjugate priors, with $N(\alpha_0, A_0)$ for $\alpha$ and $G(c_0, d_0) \propto (\sigma^{-2})^{c_0 - 1} \exp\{ -d_0 \sigma^{-2} \}$ for $\sigma^{-2}$. In choosing priors for $\Lambda$ and $\Gamma$, and hence for $D$, we wish to allocate positive probability to zero values for the random effects variances. In addition, motivated by practical considerations, we want to choose priors that facilitate posterior computation. For this reason, prior distributions that are conditionally conjugate are desirable. We assume that
$p(\lambda, \gamma) = p(\gamma \mid \lambda)\, p(\lambda) \propto N(\gamma; \gamma_0, R_0)\, 1(\gamma \in R_\lambda)\, p(\lambda)$,

November 2002 STA216 11 We further assume that the $\lambda_l$'s are independent, so that $p(\lambda) = \prod_{l=1}^q p(\lambda_l)$. Let ZI-N$^+(\pi, \mu, \sigma^2)$ denote the density of a zero-inflated half-normal distribution, consisting of a point mass at zero (with probability $\pi$) and a $N(\mu, \sigma^2)$ density truncated below by zero. To specify a model selection prior, we choose $p(\lambda_l) \stackrel{d}{=} \text{ZI-N}^+(p_{l0}, m_{l0}, s_{l0}^2)$ for each $l$, where $p_{l0}$, $m_{l0}$, and $s_{l0}^2$ are hyperparameters to be specified by the investigators. The prior probability that the $l$th random effect is excluded (i.e., its variance is zero) is $p_{l0}$, and the overall prior probability of excluding all the random effects is $\prod_{l=1}^q p_{l0}$.
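
Sampling the ZI-N$^+$ prior is a two-part mixture draw: flip the point-mass indicator, otherwise draw from the truncated normal. A minimal sketch (hypothetical hyperparameter values; NumPy and SciPy assumed):

import numpy as np
from scipy import stats

def rzi_n_plus(pi0, mu, sd, rng):
    """Draw from ZI-N+(pi0, mu, sd^2): 0 with probability pi0,
    otherwise N(mu, sd^2) truncated below by zero."""
    if rng.random() < pi0:
        return 0.0
    a = (0.0 - mu) / sd                # standardized lower truncation point
    return stats.truncnorm.rvs(a, np.inf, loc=mu, scale=sd, random_state=rng)

rng = np.random.default_rng(1)
print([rzi_n_plus(0.5, 1.0, 0.5, rng) for _ in range(8)])   # about half exact zeros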

November 2002 STA216 12 Posterior Computation Letting $b = (b_1, \ldots, b_n)^T$ and $y = (y_1, \ldots, y_n)^T$, the likelihood is given by
$l(\theta, b; y) = (2\pi\sigma^2)^{-\sum_{i=1}^n n_i/2} \exp\left( -\sigma^{-2} \sum_{i=1}^n \sum_{j=1}^{n_i} (y_{ij} - x_{ij}^T \alpha - z_{ij}^T \Lambda \Gamma b_i)^2 / 2 \right)$. (7)
The posterior distribution is obtained by combining the priors and the likelihood in the usual way. However, direct evaluation of the posterior distribution seems to be difficult. Instead we employ a Gibbs sampler (Gelfand and Smith, 1990), which works by alternately sampling from the full conditional distributions of the parameters $(\alpha, \sigma^2, \lambda, \gamma)$ and the latent variables $b$. Bayesian linear model theory (Lindley and Smith, 1972) applies when deriving the full

November 2002 STA216 13 conditional distributions of $\alpha$, $\sigma^2$, and $b$:
$p(\alpha \mid \lambda, \gamma, \sigma^2, b, y) \stackrel{d}{=} N(\hat{\alpha}, \hat{A})$, with $\hat{A} = \left( \sigma^{-2} \sum_{i=1}^n \sum_{j=1}^{n_i} x_{ij} x_{ij}^T + A_0^{-1} \right)^{-1}$ and $\hat{\alpha} = \hat{A} \left\{ \sigma^{-2} \sum_{i=1}^n \sum_{j=1}^{n_i} x_{ij} (y_{ij} - z_{ij}^T \Lambda \Gamma b_i) + A_0^{-1} \alpha_0 \right\}$.
For $\sigma^{-2}$, the full conditional distribution is given by $p(\sigma^{-2} \mid \alpha, \lambda, \gamma, b, y) \stackrel{d}{=} G(\hat{c}, \hat{d})$, where $\hat{c} = c_0 + \sum_{i=1}^n n_i / 2$ and $\hat{d} = d_0 + \sum_{i=1}^n \sum_{j=1}^{n_i} (y_{ij} - x_{ij}^T \alpha - z_{ij}^T \Lambda \Gamma b_i)^2 / 2$.
Similar to $\alpha$, the full conditional distribution of the latent normal variables $b$ is
$p(b \mid \lambda, \gamma, \sigma^2, \alpha, y) = \prod_{i=1}^n p(b_i \mid \lambda, \gamma, \sigma^2, \alpha, y_i)$,
with $p(b_i \mid \lambda, \gamma, \sigma^2, \alpha, y_i) \stackrel{d}{=} N(\hat{h}_i, \hat{H}_i)$, where $\hat{H}_i = \left( \sigma^{-2} \sum_{j=1}^{n_i} v_{ij} v_{ij}^T + I \right)^{-1}$, $\hat{h}_i = \sigma^{-2} \hat{H}_i \sum_{j=1}^{n_i} v_{ij} (y_{ij} - x_{ij}^T \alpha)$, and $v_{ij} = \Gamma^T \Lambda z_{ij}$, so that $z_{ij}^T \Lambda \Gamma b_i = v_{ij}^T b_i$.
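
To show how these conjugate updates translate into code, here is a minimal sketch of the $\alpha$ draw (hypothetical stacked data arrays; NumPy assumed); the $\sigma^{-2}$ and $b_i$ draws follow the same pattern.

import numpy as np

def update_alpha(X, resid_rand, sigma2, alpha0, A0_inv, rng):
    """One Gibbs draw of alpha ~ N(alpha_hat, A_hat).

    X          : (N, p) rows x_ij stacked over all subjects and occasions
    resid_rand : (N,) values y_ij - z_ij' Lambda Gamma b_i
    """
    A_hat = np.linalg.inv(X.T @ X / sigma2 + A0_inv)
    alpha_hat = A_hat @ (X.T @ resid_rand / sigma2 + A0_inv @ alpha0)
    return rng.multivariate_normal(alpha_hat, A_hat)

rng = np.random.default_rng(2)                    # hypothetical usage
X = rng.normal(size=(100, 2))
resid = rng.normal(size=100)
print(update_alpha(X, resid, 1.0, np.zeros(2), np.eye(2), rng))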

November 2002 STA216 14 FCDs of $\lambda$ and $\gamma$ The full conditional distributions of $\lambda$ and $\gamma$ seem to be complex, given the likelihood form in (7). However, upon rewriting expression (4) with constraint (3) as
$y_{ij} = x_{ij}^T \alpha + \sum_{l=1}^q b_{il} \left( \lambda_l z_{ijl} + \sum_{m=l+1}^q \lambda_m z_{ijm} \gamma_{ml} \right) + \varepsilon_{ij}$,
we obtain two equations that characterize $\lambda$ and $\gamma$ as regression coefficients in a normal linear model. First define the $q(q-1)/2 \times 1$ vector
$u_{ij} = \left( b_{il} \lambda_m z_{ijm} : l = 1, \ldots, q, \; m = l + 1, \ldots, q \right)^T$.
Then expression (7) implies
$y_{ij} - x_{ij}^T \alpha - \sum_{l=1}^q \lambda_l z_{ijl} b_{il} = u_{ij}^T \gamma + \varepsilon_{ij}$,
where the subtracted sum collects the terms whose coefficients $\gamma_{ll} = 1$ are held fixed. Since the error term is normally distributed and $\gamma$ has a multivariate normal prior

November 2002 STA216 15 distribution (after setting elements equal to zero to ensure that $\gamma \in R_\lambda$), the full conditional distribution for $\gamma$ is easy to derive. It is given by
$p(\gamma \mid \alpha, \lambda, b, \sigma^2, y) \propto N(\hat{\gamma}, \hat{R})\, 1(\gamma \in R_\lambda)$,
where $\hat{R} = \left( \sigma^{-2} \sum_{i=1}^n \sum_{j=1}^{n_i} u_{ij} u_{ij}^T + R_0^{-1} \right)^{-1}$ and $\hat{\gamma} = \hat{R} \left( \sigma^{-2} \sum_{i=1}^n \sum_{j=1}^{n_i} u_{ij} \left( y_{ij} - x_{ij}^T \alpha - \sum_{l=1}^q \lambda_l z_{ijl} b_{il} \right) + R_0^{-1} \gamma_0 \right)$.
Similarly, on defining the $q \times 1$ vector $t_{ij} = \left( z_{ijl} \left( b_{il} + \sum_{m=1}^{l-1} b_{im} \gamma_{lm} \right) : l = 1, \ldots, q \right)^T$, it is easy to verify that (7) implies $y_{ij} - x_{ij}^T \alpha = t_{ij}^T \lambda + \varepsilon_{ij}$. Letting $\eta_{ijl} = y_{ij} - x_{ij}^T \alpha - \sum_{m \neq l} t_{ijm} \lambda_m$ for each $\lambda_l$, we have $\eta_{ijl} = t_{ijl} \lambda_l + \varepsilon_{ij}$. It follows from straightforward (but lengthy) algebra
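
Both regression representations can be checked numerically. The sketch below (hypothetical values; NumPy assumed; 0-based indices) builds $u_{ij}$ and $t_{ij}$ and confirms that both reproduce $z_{ij}^T \Lambda \Gamma b_i$.

import numpy as np

rng = np.random.default_rng(3)
q = 3
lam = np.array([1.5, 0.8, 1.2])
Gam = np.tril(rng.normal(size=(q, q)), -1) + np.eye(q)   # unit lower triangular
z, b = rng.normal(size=q), rng.normal(size=q)
target = z @ np.diag(lam) @ Gam @ b                      # z_ij' Lambda Gamma b_i

# u_ij pairs with the free elements gamma_ml, m > l
u = np.array([b[l] * lam[m] * z[m] for l in range(q) for m in range(l + 1, q)])
gam_free = np.array([Gam[m, l] for l in range(q) for m in range(l + 1, q)])
print(target, u @ gam_free + (z * lam) @ b)   # offset term from gamma_ll = 1

# t_ij pairs with lambda
t = np.array([z[l] * (b[l] + Gam[l, :l] @ b[:l]) for l in range(q)])
print(target, t @ lam)                        # lambda-regression form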

November 2002 STA216 16 that the full conditional distribution of $\lambda_l$ is
$p(\lambda_l \mid \lambda_{(l)}, \alpha, b, \gamma, \sigma^2, y) \stackrel{d}{=} \text{ZI-N}^+(\hat{p}_l, \hat{\lambda}_l, \hat{\sigma}_l^2)$, (8)
where $\hat{p}_l = P(\lambda_l = 0 \mid \lambda_{(l)}, \alpha, b, \gamma, \sigma^2, y)$ is the conditional posterior probability that $\lambda_l = 0$, and $\hat{\lambda}_l$ and $\hat{\sigma}_l^2$ are the updated mean and variance in the normal component of the ZI-N$^+$ density. To derive the expressions for $\hat{p}_l$, $\hat{\lambda}_l$, and $\hat{\sigma}_l^2$, first let $\omega_l^2 = \sum_{i=1}^n \sum_{j=1}^{n_i} t_{ijl}^2 / \sigma^2$, and let $\tilde{\lambda}_l$ be the maximum likelihood estimate of $\lambda_l$, so that
$\tilde{\lambda}_l = \sum_{i=1}^n \sum_{j=1}^{n_i} t_{ijl} \eta_{ijl} \Big/ \sum_{i=1}^n \sum_{j=1}^{n_i} t_{ijl}^2$.
Then $\hat{\lambda}_l = \hat{\sigma}_l^2 (\omega_l^2 \tilde{\lambda}_l + s_{l0}^{-2} m_{l0})$ and $\hat{\sigma}_l^2 = (\omega_l^2 + s_{l0}^{-2})^{-1}$. Define

November 2002 STA216 17
$a = \exp\left\{ -\sum_{i=1}^n \sum_{j=1}^{n_i} \eta_{ijl}^2 / 2\sigma^2 \right\}$ and
$b = \dfrac{\hat{\sigma}_l \{1 - \Phi(-m_{l0}/s_{l0})\}^{-1}}{s_{l0} \{1 - \Phi(-\hat{\lambda}_l/\hat{\sigma}_l)\}^{-1}} \exp\left\{ -\sum_{i=1}^n \sum_{j=1}^{n_i} (\eta_{ijl} - \tilde{\lambda}_l t_{ijl})^2 / 2\sigma^2 \right\} \exp\left\{ -\left( \omega_l^2 \tilde{\lambda}_l^2 / 2 + m_{l0}^2 / 2 s_{l0}^2 - \hat{\lambda}_l^2 / 2 \hat{\sigma}_l^2 \right) \right\}$.
Then
$\hat{p}_l = \dfrac{p_{l0}\, a}{p_{l0}\, a + (1 - p_{l0})\, b}$.
Distribution (8) is conditionally conjugate, following the same form as the prior for $\lambda_l$. Sampling from expression (8) can be implemented by (i) sampling $\delta_l$ from Bernoulli$(\hat{p}_l)$; and (ii) setting $\lambda_l = 0$ if $\delta_l = 1$ and otherwise sampling $\lambda_l$ from $N(\hat{\lambda}_l, \hat{\sigma}_l^2)$ truncated below by zero.
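
Putting the last two slides together, a single $\lambda_l$ update might look like the sketch below (t and eta are the stacked $t_{ijl}$ and $\eta_{ijl}$; all other inputs hypothetical; NumPy and SciPy assumed; no underflow protection, so treat it as illustrative only).

import numpy as np
from scipy import stats

def update_lambda_l(t, eta, sigma2, p0, m0, s0, rng):
    """One ZI-N+ Gibbs draw of lambda_l, as in (8)."""
    omega2 = t @ t / sigma2                     # omega_l^2
    lam_tilde = (t @ eta) / (t @ t)             # MLE of lambda_l
    sig2_hat = 1.0 / (omega2 + s0**-2)
    lam_hat = sig2_hat * (omega2 * lam_tilde + m0 / s0**2)
    sig_hat = np.sqrt(sig2_hat)

    a = np.exp(-eta @ eta / (2 * sigma2))
    resid = eta - lam_tilde * t
    b = (sig_hat * (1 - stats.norm.cdf(-lam_hat / sig_hat))
         / (s0 * (1 - stats.norm.cdf(-m0 / s0)))
         * np.exp(-resid @ resid / (2 * sigma2))
         * np.exp(-(omega2 * lam_tilde**2 / 2 + m0**2 / (2 * s0**2)
                    - lam_hat**2 / (2 * sig2_hat))))
    p_hat = p0 * a / (p0 * a + (1 - p0) * b)

    if rng.random() < p_hat:                    # delta_l = 1: drop the effect
        return 0.0
    lo = -lam_hat / sig_hat                     # truncate N(lam_hat, sig2_hat) at 0
    return stats.truncnorm.rvs(lo, np.inf, loc=lam_hat, scale=sig_hat, random_state=rng)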

November 2002 STA216 18 parameters $\alpha$, $\gamma$, $\lambda$, and $\sigma^2$ proceeds as usual. In particular, one can report posterior means, posterior standard deviations, and highest posterior density (HPD) intervals. To compute the posterior probability of each of the $2^q$ models, we simply count the occurrences of each model among the Gibbs draws and divide by the number of iterations. The prior and posterior probabilities can then be used to calculate Bayes factors for comparing individual models. Refer to Kass and Raftery (1995) for a review of the Bayes factor.
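
For example, given a matrix of saved $\lambda$ draws (one row per Gibbs iteration), each model is identified by the zero pattern of the row, and posterior model probabilities are relative frequencies. A sketch with hypothetical draws (NumPy assumed):

import numpy as np
from collections import Counter

rng = np.random.default_rng(4)
# hypothetical saved draws: (iterations, q) matrix of lambda samples
lam_draws = rng.gamma(1.0, 1.0, size=(5000, 3)) * (rng.random((5000, 3)) > 0.4)

models = Counter(tuple((row > 0).astype(int)) for row in lam_draws)
for model, count in models.most_common():
    print(model, count / len(lam_draws))   # posterior probability of each model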