Bayesian shrinkage approach in variable selection for mixed

Similar documents
Or How to select variables Using Bayesian LASSO

Lasso & Bayesian Lasso

Consistent high-dimensional Bayesian variable selection via penalized credible regions

Bayesian Sparse Linear Regression with Unknown Symmetric Error

Horseshoe, Lasso and Related Shrinkage Methods

Bayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson

Default Priors and Effcient Posterior Computation in Bayesian

Bayesian Grouped Horseshoe Regression with Application to Additive Models

Fixed and Random Effects Selection in Linear and Logistic Models

STA 216, GLM, Lecture 16. October 29, 2007

November 2002 STA Random Effects Selection in Linear Mixed Models

Fixed and random effects selection in linear and logistic models

Bayesian Additive Regression Tree (BART) with application to controlled trail data analysis

Bayesian linear regression

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Default Priors and Efficient Posterior Computation in Bayesian Factor Analysis

Partial factor modeling: predictor-dependent shrinkage for linear regression

Bayesian methods in economics and finance

Bayes Model Selection with Path Sampling: Factor Models

Bayesian Grouped Horseshoe Regression with Application to Additive Models

Nonparametric Bayes Modeling

Sparse Linear Models (10/7/13)

BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage

Some Curiosities Arising in Objective Bayesian Analysis

Hierarchical models. Dr. Jarad Niemi. August 31, Iowa State University. Jarad Niemi (Iowa State) Hierarchical models August 31, / 31

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2

Analysis Methods for Supersaturated Design: Some Comparisons

Approximating high-dimensional posteriors with nuisance parameters via integrated rotated Gaussian approximation (IRGA)

Gibbs Sampling in Endogenous Variables Models

Principles of Bayesian Inference

Unsupervised Regressive Learning in High-dimensional Space

Variable Selection in Nonparametric Random Effects Models. Bo Cai and David B. Dunson

Modeling and Predicting Healthcare Claims

Bayesian Multivariate Logistic Regression

Gibbs Sampling in Linear Models #2

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence

Lecture 16: Mixtures of Generalized Linear Models

Bayes methods for categorical data. April 25, 2017

CASE STUDY: Bayesian Incidence Analyses from Cross-Sectional Data with Multiple Markers of Disease Severity. Outline:

Online appendix to On the stability of the excess sensitivity of aggregate consumption growth in the US

Aspects of Feature Selection in Mass Spec Proteomic Functional Data

Using Model Selection and Prior Specification to Improve Regime-switching Asset Simulations

Scalable MCMC for the horseshoe prior

Chris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010

Probabilistic machine learning group, Aalto University Bayesian theory and methods, approximative integration, model

Generalized Linear Models and Its Asymptotic Properties

Estimating Sparse High Dimensional Linear Models using Global-Local Shrinkage

Nonconcave Penalized Likelihood with A Diverging Number of Parameters

Bayesian data analysis in practice: Three simple examples

Nonparametric Bayesian Methods (Gaussian Processes)

Lecture 4: Dynamic models

Sparse Bayesian Logistic Regression with Hierarchical Prior and Variational Inference

Nonparametric Bayes tensor factorizations for big data

Standard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j

Statistical Inference

ABC methods for phase-type distributions with applications in insurance risk problems

Bayesian Methods for Machine Learning

Prior Distributions for the Variable Selection Problem

An Extended BIC for Model Selection

Lecture 14: Shrinkage

Reduction of Model Complexity and the Treatment of Discrete Inputs in Computer Model Emulation

arxiv: v1 [stat.me] 6 Jul 2017

Computational statistics

Likelihood NIPS July 30, Gaussian Process Regression with Student-t. Likelihood. Jarno Vanhatalo, Pasi Jylanki and Aki Vehtari NIPS-2009

Regularization in Cox Frailty Models

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC

Robust Bayesian Simple Linear Regression

Bayesian Analysis of Latent Variable Models using Mplus

Plausible Values for Latent Variables Using Mplus

Bayesian non-parametric model to longitudinally predict churn

Efficient sampling for Gaussian linear regression with arbitrary priors

CTDL-Positive Stable Frailty Model

A Study into Mechanisms of Attitudinal Scale Conversion: A Randomized Stochastic Ordering Approach

A Frequentist Assessment of Bayesian Inclusion Probabilities

Ratemaking application of Bayesian LASSO with conjugate hyperprior

Hierarchical Modeling for Univariate Spatial Data

Motivation Scale Mixutres of Normals Finite Gaussian Mixtures Skew-Normal Models. Mixture Models. Econ 690. Purdue University

Hierarchical sparsity priors for regression models

Bayesian Hypothesis Testing in GLMs: One-Sided and Ordered Alternatives. 1(w i = h + 1)β h + ɛ i,

Interquantile Shrinkage in Regression Models

Extended Bayesian Information Criteria for Model Selection with Large Model Spaces

ESTIMATING THE MEAN LEVEL OF FINE PARTICULATE MATTER: AN APPLICATION OF SPATIAL STATISTICS

Sparse Variational Analysis of Large Longitudinal Data Sets

STA 4273H: Statistical Machine Learning

Variable Selection in Restricted Linear Regression Models. Y. Tuaç 1 and O. Arslan 1

Sparse Factor-Analytic Probit Models

Coupled Hidden Markov Models: Computational Challenges

Markov Chain Monte Carlo (MCMC)

Factor model shrinkage for linear instrumental variable analysis with many instruments

Bayesian inference on dependence in multivariate longitudinal data

MCMC Sampling for Bayesian Inference using L1-type Priors

An exploration of fixed and random effects selection for longitudinal binary outcomes in the presence of non-ignorable dropout

Linear Mixed Models. One-way layout REML. Likelihood. Another perspective. Relationship to classical ideas. Drawbacks.

Contents. Part I: Fundamentals of Bayesian Inference 1

penalized joint log-likelihood procedure with an adaptive penalty for the selection and estimation of the fixed and random effects.

Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms

Bayesian Linear Models

Bayesian Linear Regression

Model Choice. Hoff Chapter 9, Clyde & George Model Uncertainty StatSci, Hoeting et al BMA StatSci. October 27, 2015

Embedding Supernova Cosmology into a Bayesian Hierarchical Model

Transcription:

Bayesian shrinkage approach in variable selection for mixed effects s GGI Statistics Conference, Florence, 2015 Bayesian Variable Selection June 22-26, 2015

Outline 1 Introduction 2 3 4

Outline Introduction 1 Introduction 2 3 4

Variable selection problem draws considerable attention. Penalized regression with shrinkage estimation and simultaneous variable selection seems promising Frequentist approaches: Tibshirani (1996), Fan and Li (2001), Zou (2006), Zou and Li (2008) etc. Bayesian approaches: Park and Casella (2008), Hans (2009), Carvalho et. al. (2009, 2010), Griffin and Brown (2007), and Armagan et al. (2013) None proposed for mixed effects s (LME). Even more challenging for LME coupled with random effects.

Linear mixed effects (LME) : y i = X i β + Z i ζ i + ɛ i ζ i N(0, Ψ) ɛ i N(0, σ 2 Λ) Challenge: standard methods such as AIC, BIC, Bayes factors etc. do not work well different number of random effects. (4 p s for p 1 fixed and random vector) Very few approaches proposed so far: Chen and Dunson (2003), Cai and Dunson (2006), Kinney and Dunson (2007), Bondell et al. (2010), Ibrahim et al. (2010), Yang (2012, 2013).

Some current methods for random effects Chen and Dunson (2003): Stochastic search variable selection (SSVS) approach to the LMM. Kinney and Dunson (2007): of data augmentation algorithm, parameter expansion (Gelman, 2004, 2005) and approximation for the GLMM. Ibrahim et. al. (2010):Maximum penalized likelihood (MPL) and smoothly clipped absolute deviation (SCAD) and ALASSO Bondell et. al. (2010): adaptive LASSO and EM Cai and Dunson (2006):Taylor series expansion for GLMM. All these s are parametric ones.

We use Bayesian shrinkage priors for joint selection of both fixed and random effects Desirable properties of such priors: spikes at zero, student t-like tails, simple characterization as a scale mixture of normals to facilitate computaton Such shrinkage priors can shrink small coefficients to zero without over-shrink large coefficients.

Outline Introduction 1 Introduction 2 3 4

Shrinkage priors for fixed effects Stochastic search variable selection (SSVS, George and McCulloch 1993): Introduce latent variable J, β J k = (β k, j k = 1) We use generalized double Pareto, horse shoe prior, and normal-exponential-gamma priors for the fixed effects.

Stochastic search variable selection:ssvs Proposed by George and McCulloch (1993), originally for subset selection in linear regression, searching for s with highest posterior probabilities starting with full s containing all candidate predictors choosing mixture priors that allow predictors to drop by zeroing the coefficients running Gibbs sampler with conditional conjugacy from the posterior distribution.

Mixed effects Introduction β = (β 1,, β p ), J = (j 1,, j p ) j k = 0 indicates β k = 0 j k 0 for β k 0 We take the Zellner g-prior (Zellner and Siow, 1980) β J N(0, σ 2 (X J X J ) 1 /g), g Gamma(.) and Bernoulli for j k Generally ζ N(0, Ω). of Cholesky decomposition Ω = ΛΓΓ Λ where Λ is a positive diagonal matrix with diagonal elements λ l, l = 1,, q proportional to the random effects standard deviations, and Γ is a lower triangular matrix with diagonal elements equal to 1.

The generalized double Pareto (GDP) density is defined as f (x a, b) = 1/(2a)(1 + x /(ab)) b+1, x GDP(a, b). The GDP resembles the Laplace density with similar sparse shrinkage properties, also has Cauchy-like tails, which avoids over-shrinkage However the posterior form is very complicated and the parameters are not in closed forms. Instead, it can be expressed as a scale mixture of normal distribution. x N(0, τ), τ Exp(λ 2 /2), and λ Ga(α, η), then x GDP(a = η/b, b).

β j k N(0, σ 2 /gκ k ), κ k exp(ϕ 2 k /2), ϕ k Gamma(ν 1, ϑ 1 ), β j k N(0, σ 2 /gκ 2 k ), κ k Ca + (0, ν 2 ), ν 2 Ca + (0, ϑ 2 ), β j k N(0, σ 2 /gκ k ), κ k exp(ς 2 /2), ς 2 Gamma(ν 3, ϑ 3 ),

Γ: lower triangular matrix q q with diagonal element 1. Model specification Introduction y ij = X ij β + Z ij ζ i + ɛ ij, V (ζ i ) = Ω i = 1,, n, j = 1,, n i X ij : q 1 Cholesky decomposition y ij = X ij ij i + ɛ ij, Λ : Diag(λ 1,, λ q ) ΛΓV (b i )Γ Λ = Ω

Prior for random effects The random effect with the generalized shrinkage prior is specified as follows: b ik G k G k DP(αG0 k ) G k 0 GDP(1.0, 1.0) The horse shoe prior and normal-exponential-gamma prior can be specified similarly. Λ : Diag(λ 1,, λ q ) Γ: lower triangular matrix q q with diagonal element 1. λ k p k0 δ 0 + (1 p k0 )N + (0, φ 2 k ), φ2 k IG(1/2, 1/2)

Parameter expansion (PX) Liu and Wu (1999), Gelman (2004, 2006) y θ, z N(θ + z, 1), z θ N(0, D) θ unknown parameter, z missing y θ, α, w N(θ α + w, 1), w θ, α N(α, D) z α, w PX breaks association, speeds up convergence.

Outline Introduction 1 Introduction 2 3 4

y ij = X ij β + Z ij ζ i + ɛ ij, ɛ ij N(0, σ 2 ) X ij = (1, x ij1,, x ij15 ), x ijk Unif ( 2, 2) z ijk = x ijk, k = 1,, 4 Case 1:ϱ i N(0, Ω), Ω = 0.00 0.00 0.00 0.00 0.00 0.9 0.48 0.06 0.00 0.48 0.40 0.10 0.00 0.06 0.10 0.10. β = (β 1,, β 16 ), β 1 = β 2 = β 3 = 1.5, β k = 0, k = 4,..., 16 Case 2:q = 8, (ϱ i1, ϱ i4 ) N(0, Ω 1 ), Ω = Ω1 same as Case 1 Case 3:β 4 = = β 9 = 0.01, everything else same as Case 2.

Case Method PF PR PM MSE S.E. 1 GDPP 1.00 0.63 0.63 0.12 0.18 HSPP 1.00 0.64 0.63 0.11 0.19 NEGP 0.98 0.59 0.58 0.13 0.19 KDNP 0.86 0.57 0.46 0.18 0.25 2 GDPP 1.00 0.57 0.55 0.11 0.011 HSPP 0.97 0.56 0.54 0.12 0.013 NEGP 0.96 0.54 0.53 0.13 0.014 KDNP 0.82 0.53 0.44 0.16 0.39 3 GDPP 1.00 0.60 0.60 0.18 0.10 HSPP 0.99 0.61 0.59 0.20 0.11 NEGP 0.93 0.58 0.56 0.26 0.15 KDNP 0.40 0.55 0.23 0.35 0.24

captioncomparisons of parameters estimates in the simulation study. S.E. is the standard error of the MSE.PM is the percentage of times the true was selected, PF and PR are the percentage of times the correct fixed and random effects were selected respectively.

Table: Models with highest posterior probability in the simulation study for Case 3: is the true Model GDPP HSPP NEGP KDNP x ij1, x ij2, x ij3, z ij2, z ij3, zij4 0.62 0.61 0.59 0.23 x ij1, x ij2, x ij3, z ij1, z ij2, z ij3, z ij4 0.37 0.38 0.39 0.15 x ij1, x ij2, x ij3, x ij6, x ij9, z ij2, z ij3, z ij4 0.082 x ij1, x ij2, x ij3, x ij5, x ij7, x ij9, z ij2, z ij3, z ij4 0.060 x ij1, x ij3, x ij5, z ij2, z ij3, z ij4 0.051 x ij1, x ij2, x ij3, x ij9, z ij2, z ij3, z ij4 0.031 x ij1, x ij2, x ij3, x ij6, z ij1, z ij2, z ij3, z ij4 0.027 x ij1, x ij2, x ij3, x ij8, x ij9, z ij2, z ij3, z ij4 0.025 x ij1, x ij2, x ij3, x ij8, z ij1, z ij2, z ij3, z ij4 0.023 x ij1, x ij2, x ij3, x ij8, z ij2, z ij3, z ij4 0.022

The shrinkage approaches provide better results than the KDNP. Based on extensive simulations, we find that the KDNP is not stable and suffers from numerical issues and fails easily. Of all the approaches, GDPP and HSPP perform better than the others in variable selection and parameter estimation. GDPP can be implemented in a more efficient Gibbs sampling algorithm than HSPP, which uses Metropolis-Hasting algorithm.

Outline Introduction 1 Introduction 2 3 4

A bioassay study conducted to identify the in vivo estrogenic responses to chemicals. Data consisted of 2681 female rats, experimented by 19 participating labs in 8 countries from OECD. 2 protocols on immature female rats with administration of doses by oral gavage (protocol A) and subcutaneous injection (protocol B). 2 other protocols on adult female rats with oral gavage (protocol C) and subcutaneous injection (protocol D). Each protocol was comprised of 11 groups, each with 6 rats. 1 untreated group, 1 control group, 7 groups administered with varied doses of ethinyl estradiol (EE) and ZM.

Goal: how the uterus weights interacts with the in vivo activity of chemical compounds under different experimental conditions. There is concerns about heterogeneity across different labs. Kannol et. al (2001) analyzed the data with simple parametric methods. The number of possible competing s is over 4,000.

Table: Top 10 s with highest posterior probabilities by GDPP, HSPP and NEGP for real data Model GDPP HSPP NEGP x ij1, x ij3, x ij4, x ij5, x ij6, z ij1, z ij3, z ij4 0.098 0.093 0.072 x ij1, x ij3, x ij5, x ij6, z ij1, z ij2 0.063 0.049 0.097 x ij1, x ij3, x ij4, x ij5, x ij6, z ij1, z ij2, z ij4 0.047 0.039 0.038 x ij1, x ij3, x ij4, x ij5, x ij6, z ij1, z ij4 0.038 0.023 0.033 x ij1, x ij3, x ij4, x ij5, x ij6, z ij1, z ij3, z ij4, z ij6 0.026 x ij1, x ij3, x ij4, x ij5, x ij6, z ij1, z ij2, z ij6 0.023 x ij1, x ij3, x ij4, x ij5, x ij6, z ij1, z ij2, z ij5 0.020 0.018 0.034 x ij1, x ij3, x ij4, x ij5, x ij6, z ij1 0.020 0.013 0.046 x ij1, x ij3, x ij4, x ij5, x ij6, z ij1, z ij2, z ij3, z ij4 0.020 0.025 x ij1, x ij3, x ij4, x ij5, x ij6, z ij1, z ij3, z ij4, z ij5 0.019 x ij1, x ij3, x ij4, x ij5, x ij6, z ij1, z ij3, z ij4, z ij6 0.015 x ij1, x ij3, x ij5, x ij6, z ij1, z ij3, z ij4, z ij5 0.032 x ij1, x ij3, x ij4, x ij5, x ij6, z ij1, z ij2 0.028 x ij1, x ij3, x ij4, x ij5, x ij6, z ij1, z ij2, z ij4, z ij5 0.015

Table: Comparisons of parameter estimates in the real data study Parameter GDPP HSPP NEGP LMER β 1 3.61 a (3.31, 3.76) b 3.60 a (3.35, 3.87) b 3.60 a (3.28, 3.96) b 3.61 a (3.16, 4.10) c β 2 NA d NA d NA d 0.06 (0.02, 0.12) β 3 1.21 (1.04, 1.47) 1.18 (1.06, 1.53) 1.22 (0.97, 1.49) 1.20 (1.03, 1.69) β 4 1.20 (0.89, 1.64) 1.21 (0.87, 1.70) 1.19 (0.79, 1.68) 1.27 (0.85, 1.64) β 5 0.14 (0.13, 0.16) 0.14 (0.13, 0.16) 0.14 (0.13, 0.16) 0.14 (0.13, 0.15) β 6-0.48 (-0.57, -0.35) -0.47 (-0.56, -0.39) -0.47(-0.59, -0.24) -0.49 (-0.57, -0.37) a Mean b 95% credible interval c 95% confidence interval d Not selected in the

We try to fit the with KDNP but it fails with too many numerical issues. Fixed effect of protocol B is not selected for any, but the corresponding random effects is, which indicates heterogeneity. Both fixed and random effects of protocol C and D are selected in the leading s. The fixed effects of EE dose and ZM dose are selected in the leading s, but seldom for the corresponding random effects.

References Introduction Chen and Dunson Biometrics, 2003 Cai and Dunson Biometrics, 2006 Kinney and Dunson Biometrics, 2007 Bondell et al. Biometrics, 2010 Ibrahim et al. Biometrics, 2010 Yang, M, Computational Statistics and Data Analysis, 2012 Yang, M, Biometrical Journal, 2013 Yang, M, Under review, 2015