Bayesian Variable Selection Regression Of Multivariate Responses For Group Data


1 Bayesian Variable Selection Regression Of Multivariate Responses For Group Data B. Liquet 1,2, K. Mengersen 2, A. N. Pettitt 2 and M. Sutton 2. 1 LMAP, Université de Pau et des Pays de l'Adour. 2 ACEMS, QUT, Australia.

2 Contents Motivation: integrative analysis for group data. Application to pleiotropy investigation. Sparse models: Lasso penalty, group penalty, sparse group Lasso. Bayesian framework. Simulation studies. Illustration: genomics data. R package: MBSGS.

3 Integrative Analysis Goal of integrative analysis (Wikipedia): data integration involves combining data residing in different sources and providing users with a unified view of these data. This process becomes significant in a variety of situations, both commercial and scientific. Systems Biology. Integrative analysis: analysis of heterogeneous types of data from inter-platform technologies. Goal: combine multiple types of data to contribute to a better understanding of biological mechanisms, with the potential to improve the diagnosis and treatment of complex diseases.

4 Example: Data definition Two data matrices on the same n observations: X (n observations, p variables) and Y (n observations, q variables).

5 Example: Data definition Two data matrices on the same n observations: X (n × p) and Y (n × q). Omics: Y matrix gene expression, X matrix SNPs (single nucleotide polymorphisms); many others such as proteomic and metabolomic data. Neuroimaging: Y matrix behavioral variables, X matrix brain activity (e.g., EEG, fMRI, NIRS). Neuroimaging genetics: Y matrix fMRI (functional magnetic resonance imaging), X matrix SNPs. Ecology/Environment: Y matrix water quality variables, X matrix landscape variables.

6 Data: Constraints and Aims Main constraint: situation with p > n

7 Group structures within the data Natural example: a categorical variable, which is represented by a group of dummy variables in a regression setting.

8 Group structures within the data Natural example: a categorical variable, which is represented by a group of dummy variables in a regression setting. Genomics: genes within the same pathway have similar functions and act together in regulating a biological system. These genes can add up to a larger effect and can be detected as a group (i.e., at a pathway or gene set/module level).

9 Group structures within the data Natural example: a categorical variable, which is represented by a group of dummy variables in a regression setting. Genomics: genes within the same pathway have similar functions and act together in regulating a biological system. These genes can add up to a larger effect and can be detected as a group (i.e., at a pathway or gene set/module level). We consider variables divided into groups. Example (p SNPs grouped into K genes):

$$X = [\underbrace{\text{SNP}_1, \ldots, \text{SNP}_k}_{\text{gene } 1}, \underbrace{\text{SNP}_{k+1}, \text{SNP}_{k+2}, \ldots, \text{SNP}_h}_{\text{gene } 2}, \ldots, \underbrace{\text{SNP}_{l+1}, \ldots, \text{SNP}_p}_{\text{gene } K}]$$

Example (p genes grouped into K pathways/modules, with $X_j = \text{gene}_j$):

$$X = [\underbrace{X_1, X_2, \ldots, X_k}_{M_1}, \underbrace{X_{k+1}, X_{k+2}, \ldots, X_h}_{M_2}, \ldots, \underbrace{X_{l+1}, X_{l+2}, \ldots, X_p}_{M_K}]$$
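As a small illustration (not from the paper; the sizes are hypothetical), such a partition is conveniently stored as a group index vector in R:

    # Hypothetical example: p = 10 SNPs partitioned into K = 3 genes
    # of sizes 3, 4 and 3.
    group_sizes <- c(3, 4, 3)
    group <- rep(seq_along(group_sizes), times = group_sizes)
    group
    # [1] 1 1 1 2 2 2 2 3 3 3
    # The design sub-matrix for gene g is then X[, group == g].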

10 Aims in regression setting: Select grouped variables taking the data structure into account; either all the variables within a group are selected or none of them are. Combine sparsity of groups and sparsity within each group; only relevant variables within a group are selected.

11 Some frequentist approaches Lasso models: regression models using $\ell_1$ and $\ell_2$ penalties to perform variable selection.

12 Lasso models Univariate model: $Y = X\beta + \varepsilon$, $Y \in \mathbb{R}^n$, $\beta \in \mathbb{R}^p$.

Lasso regression: sparse model with $\ell_1$ penalty

$$\hat{\beta}^{\text{lasso}} = \operatorname*{argmin}_{\beta \in \mathbb{R}^p} \sum_{i=1}^n (Y_i - X_i^T\beta)^2 + \lambda \sum_{j=1}^p |\beta_j| = \operatorname*{argmin}_{\beta \in \mathbb{R}^p} \underbrace{\|Y - X\beta\|_2^2}_{\text{Loss}} + \underbrace{\lambda \|\beta\|_1}_{\text{Penalty}}$$

Group Lasso regression: sparse model using the group structure, with an $\ell_2$ penalty on each group ($X_g$ with $\beta_g \in \mathbb{R}^{p_g}$)

$$\min_{\beta \in \mathbb{R}^p} \Big\|Y - \sum_{g=1}^G X_g \beta_g\Big\|_2^2 + \lambda \sum_{g=1}^G \sqrt{p_g}\, \|\beta_g\|_2$$

If $p_g = 1$ for all $g$, this reduces to the Lasso ($\|\beta_g\|_2 = \sqrt{\beta_g^2} = |\beta_g|$).
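As a minimal sketch on simulated data (the gglasso package is one of several group lasso implementations in R and is used here only for illustration):

    # Lasso via glmnet; group lasso via gglasso.
    library(glmnet)
    library(gglasso)
    set.seed(1)
    n <- 50; p <- 12
    X <- matrix(rnorm(n * p), n, p)
    beta <- c(rep(2, 3), rep(0, p - 3))     # only the first group is active
    y <- drop(X %*% beta + rnorm(n))
    fit_lasso <- glmnet(X, y, alpha = 1)    # L1 penalty path
    group <- rep(1:4, each = 3)             # 4 groups of 3 predictors
    fit_glasso <- gglasso(X, y, group = group, loss = "ls")
    # Compare selected coefficients at a given penalty level:
    # coef(fit_lasso, s = 0.1); coef(fit_glasso, s = 0.1)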

13 Lasso models Univariate model: $Y = \sum_{g=1}^G X_g\beta_g + \varepsilon$, $Y \in \mathbb{R}^n$, $\beta \in \mathbb{R}^p$.

Sparse Group Lasso regression: sparse group model using the group structure, combining $\ell_1$ and $\ell_2$ penalties

$$\min_{\beta \in \mathbb{R}^p} \Big\|Y - \sum_{g=1}^G X_g \beta_g\Big\|_2^2 + \lambda_1 \sum_{g=1}^G \sqrt{p_g}\, \|\beta_g\|_2 + \lambda_2 \|\beta\|_1$$

14 Lasso model for multivariate Y: $Y = XB + E$, where $Y$ is $n \times q$ and $B$ is $p \times q$.

Sparse model: select predictors associated with the multiple phenotypes $Y$ ($n \times q$):

$$\min_B \|Y - XB\|_F^2 + \lambda \sum_{l=1}^p \|\beta_l\|_2, \quad \text{where } B = \begin{pmatrix} \beta_1^1 & \beta_1^2 & \cdots & \beta_1^q \\ \beta_2^1 & \beta_2^2 & \cdots & \beta_2^q \\ \vdots & & & \vdots \\ \beta_p^1 & \beta_p^2 & \cdots & \beta_p^q \end{pmatrix}, \quad \beta_l = (\beta_l^1, \beta_l^2, \ldots, \beta_l^q)$$

Implemented in glmnet(). Sparse Group model: see e.g., Li et al. (2015), Wand et al. (2012).
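A minimal sketch of this row-wise penalty using glmnet's multivariate Gaussian family (simulated data):

    # family = "mgaussian" penalises each row of B by its l2 norm,
    # so a predictor is kept or dropped jointly for all q responses.
    library(glmnet)
    set.seed(1)
    n <- 60; p <- 15; q <- 4
    X <- matrix(rnorm(n * p), n, p)
    B <- rbind(matrix(2, 3, q), matrix(0, p - 3, q))   # 3 active predictors
    Y <- X %*% B + matrix(rnorm(n * q), n, q)
    fit <- glmnet(X, Y, family = "mgaussian")
    # coef(fit, s = 0.1) returns one coefficient vector per response;
    # the zero pattern is shared across the q responses.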

15 Bayesian Framework Univariate regression model: $Y - \sum_{g=1}^G X_g\beta_g \sim N_n(0, \sigma^2 I_n)$. Xu and Ghosh (2015): Bayesian group lasso using spike and slab priors for group variable selection. Ročková and Lesaffre (2014): EM algorithm for a hierarchical model incorporating grouping information. Stingo et al. (2011): PLS approach for pathway and gene selection using variable selection priors and Markov chain Monte Carlo (MCMC).

16 Bayesian Framework Multivariate regression model: $Y - \sum_{g=1}^G X_g B_g \sim MN_{n\times q}(0_{n\times q}, I_n, \Sigma)$. Zhu et al. (2014): Bayesian generalized low rank regression model. Greenlaw et al. (2016): Bayesian hierarchical modeling for imaging genomics data; limited to $\Sigma = \sigma^2 I_q$.

17 Bayesian group lasso model with spike and slab priors Xu and Ghosh (2015) approaches: spike and slab priors providing variable selection at the group level; a hierarchical spike and slab prior structure to select variables both at the group level and within each group. Limited to the univariate case; does not take group size into account in the penalisation part.

18 Multivariate Bayesian Group Lasso with Spike and Slab prior

$$Y \mid X, B, \Sigma \sim MN_{n\times q}(XB, I_n, \Sigma), \quad (1)$$
$$\text{Vec}(B_g^T) \mid \Sigma, \tau_g, \pi_0 \stackrel{ind}{\sim} (1-\pi_0)\, N_{m_g q}(0, \tau_g^2\, I_{m_g} \otimes \Sigma) + \pi_0\, \delta_0(\text{Vec}(B_g^T)), \quad (2)$$
$$\tau_g^2 \stackrel{ind}{\sim} \text{Gamma}\Big(\frac{m_g q + 1}{2}, \frac{\lambda^2}{2}\Big), \quad g = 1, \ldots, G, \quad (3)$$
$$\Sigma \sim IW(d, Q), \quad (4)$$
$$\pi_0 \sim \text{Beta}(a, b), \quad (5)$$

where $\delta_0(\text{Vec}(B_g^T))$ denotes a point mass at $0 \in \mathbb{R}^{m_g q}$ and $B_g$ is the $m_g \times q$ regression coefficient matrix for group $g$.

19 Calibration of λ λ is related to the degree of shrinkage; it is tuned using an empirical Bayes approach via a Monte Carlo EM algorithm. The kth EM update for λ is

$$\lambda^{(k)} = \sqrt{\frac{pq + G}{\sum_{g=1}^G E_{\lambda^{(k-1)}}[\tau_g^2 \mid Y]}},$$

in which the posterior expectation of $\tau_g^2$ is replaced by the Monte Carlo average of the $\tau_g^2$ draws from the Gibbs sampler run with $\lambda^{(k-1)}$.
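A minimal sketch of one such update, assuming tau2_draws holds the Gibbs draws of (τ_1², …, τ_G²) obtained under λ^(k−1), one row per iteration and one column per group; the (pq + G) numerator follows from the Gamma((m_g q + 1)/2, λ²/2) prior above:

    # One Monte Carlo EM update for lambda.
    em_update_lambda <- function(tau2_draws, p, q, G) {
      Etau2 <- colMeans(tau2_draws)   # Monte Carlo estimates of E[tau_g^2 | Y]
      sqrt((p * q + G) / sum(Etau2))  # lambda^(k)
    }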

20 Median Thresholding Estimator In the case of a block orthogonal design matrix $X$ (i.e., $X_i^T X_j = 0$ for $i \neq j$), we have for $1 \leq g \leq G$

$$\text{Vec}(\hat{B}_g^T) = \text{Vec}\big(((X_g^T X_g)^{-1} X_g^T Y)^T\big) \sim N_{m_g q}\big(\text{Vec}(B_g^T),\ (X_g^T X_g)^{-1} \otimes \Sigma\big).$$

We showed that, assuming $\pi_0 > \frac{c}{1+c}$, there exists $t(\pi_0) > 0$ such that the marginal posterior median of $\beta_{ij}^g$ satisfies $\text{Med}(\beta_{ij}^g \mid \hat{B}_g) = 0$ for all $1 \leq i \leq m_g$ and $1 \leq j \leq q$ whenever $\|\text{Vec}(\hat{B}_g^T)\|_2 < t$. In words: the marginal posterior median estimator of the gth group of regression coefficients is zero when the norm of the corresponding block least squares estimator is below a certain threshold.

21 Posterior median as a soft thresholding estimator Assume an orthogonal design matrix $X$, i.e., $X^T X = nI_p$, and consider our model with fixed $\tau_g^2$ ($1 \leq g \leq G$). The posterior distribution of $B_g$ is a spike and slab distribution,

$$\text{Vec}(B_g^T) \mid X, Y \sim (1 - l_g)\, N_{m_g q}\Big((1-D_g)\text{Vec}(\hat{B}_{LS,g}^T),\ \frac{1-D_g}{n}\, I_{m_g} \otimes \Sigma\Big) + l_g\, \delta_0(\text{Vec}(B_g^T)),$$

where $\hat{B}_{LS,g}$ is the least squares estimator of $B_g$, $D_g = \frac{1}{1 + n\tau_g^2}$, and

$$l_g = p(B_g = 0 \mid \text{rest}) = \frac{\pi_0}{\pi_0 + (1-\pi_0)(1 + n\tau_g^2)^{-m_g q/2} \exp\big\{\tfrac{1}{2}(1-D_g)\, n\, \text{Tr}[\Sigma^{-1}\hat{B}_{LS,g}^T \hat{B}_{LS,g}]\big\}}.$$

The marginal posterior distribution of $\beta_{ij}^g$ is also a spike and slab distribution,

$$\beta_{ij}^g \mid X, Y \sim (1 - l_g)\, N\Big((1-D_g)\hat{\beta}_{LS,ij}^g,\ \frac{1-D_g}{n}\, \Sigma_{jj}\Big) + l_g\, \delta_0(\beta_{ij}^g),$$

where $\Sigma_{jj}$ is the jth diagonal element of $\Sigma$.

22 Posterior median as a soft thresholding estimator The resulting median is a soft thresholding estimator,

$$\hat{\beta}_{ij}^{Med,g} = \text{Med}(\beta_{ij}^g \mid X, Y) = \text{sgn}\big(\hat{\beta}_{LS,ij}^g\big)\Big((1-D_g)\,|\hat{\beta}_{LS,ij}^g| - Q_g\sqrt{\frac{(1-D_g)\Sigma_{jj}}{n}}\Big)_+,$$

where $z_+$ denotes the positive part of $z$ and $Q_g = \Phi^{-1}\Big(\frac{1}{2(1 - \min(\frac{1}{2}, l_g))}\Big)$.

For a univariate response ($q = 1$) the matrix $\Sigma$ reduces to the scalar $\sigma^2$, and our result matches the earlier work of Xu and Ghosh (2015). In the multivariate frequentist setting, Li et al. (2015) proposed an iterative algorithm with a similar soft thresholding function to incorporate group structure into the regression estimates.
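A small sketch of this estimator for one group, assuming the block least squares estimate (an m_g × q matrix), D_g, l_g, the diagonal of Σ and n are already available:

    # Marginal posterior median (soft thresholding) for group g.
    median_threshold <- function(B_ls_g, D_g, l_g, Sigma_diag, n) {
      Q_g <- qnorm(1 / (2 * (1 - min(0.5, l_g))))      # threshold quantile
      shrunk <- (1 - D_g) * abs(B_ls_g)                # shrunken magnitudes
      # subtract the per-response threshold from each column
      thr <- sweep(shrunk, 2, Q_g * sqrt((1 - D_g) * Sigma_diag / n), "-")
      sign(B_ls_g) * pmax(thr, 0)                      # positive part
    }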

23 Oracle property Let $B^0$, $B_g^0$, $\beta_{ij}^{0,g}$ denote the true values of $B$, $B_g$, $\beta_{ij}^g$, respectively. Define the index vector of the true model as $A = \big(I(\|\text{Vec}(B_g^0)\|_2 \neq 0),\ g = 1, \ldots, G\big)$, and the index vector of the model selected by a given thresholding estimator $\hat{B}_g$ as $\hat{A}_n = \big(I(\|\text{Vec}(\hat{B}_g)\|_2 \neq 0),\ g = 1, \ldots, G\big)$. Model selection consistency is attained if and only if $\lim_{n\to\infty} P(\hat{A}_n = A) = 1$.

24 Oracle property Under an orthogonal design the median thresholding estimator has the oracle property. Theorem: Assume an orthogonal design matrix, i.e., $X^T X = nI_p$. Suppose $n\tau_{g,n}^2 \to \infty$ and $\log(\tau_{g,n}^2)/n \to 0$ as $n \to \infty$, for $g = 1, \ldots, G$. Then the median thresholding estimator has the oracle property, that is, variable selection consistency and asymptotic normality:

$$\lim_{n\to\infty} P(\hat{A}_n^{Med} = A) = 1, \qquad \sqrt{n}\big(\text{Vec}(\hat{B}_A^{Med}) - \text{Vec}(B_A^0)\big) \xrightarrow{d} N(0, \Sigma \otimes I).$$

25 Gibbs Sampler: full posterior distribution

$$p(B, \tau^2, \Sigma, \pi_0 \mid Y, X) \propto p(Y \mid B, \tau^2, \Sigma, \pi_0)\, p(B \mid \tau^2, \Sigma, \pi_0)\, p(\tau^2)\, p(\Sigma)\, p(\pi_0),$$

where

$$p(Y \mid B, \tau^2, \Sigma, \pi_0) \propto |\Sigma|^{-n/2} \exp\Big\{-\frac{1}{2}\text{Tr}\big[(Y - XB)\Sigma^{-1}(Y - XB)^T\big]\Big\},$$

$$p(B \mid \tau^2, \Sigma, \pi_0) = \prod_{g=1}^G p(B_g \mid \tau_g^2, \Sigma, \pi_0),$$

$$p(B_g \mid \tau_g^2, \Sigma, \pi_0) \propto (1-\pi_0)(2\pi)^{-\frac{q m_g}{2}} (\tau_g^2)^{-\frac{q m_g}{2}} |\Sigma|^{-\frac{m_g}{2}} \exp\Big\{-\frac{1}{2\tau_g^2}\text{Tr}\big[B_g \Sigma^{-1} B_g^T\big]\Big\} I[B_g \neq 0] + \pi_0\, \delta_0(\text{Vec}(B_g^T)),$$

$$p(\tau_1^2, \ldots, \tau_G^2) \propto \prod_{g=1}^G (\lambda^2)^{\frac{q m_g + 1}{2}} (\tau_g^2)^{\frac{q m_g + 1}{2} - 1} \exp\Big\{-\frac{\lambda^2}{2}\tau_g^2\Big\},$$

$$p(\pi_0) \propto \pi_0^{a-1}(1-\pi_0)^{b-1}, \qquad p(\Sigma) \propto |\Sigma|^{-\frac{d+q+1}{2}} \exp\Big\{-\frac{1}{2}\text{Tr}(Q\Sigma^{-1})\Big\}.$$

26 Conditional posterior distributions Let $B_{(g)}$ denote the matrix $B$ without the gth group and $X_{(g)}$ the covariate matrix corresponding to $B_{(g)}$, that is, $X_{(g)} = (X_1, \ldots, X_{g-1}, X_{g+1}, \ldots, X_G)$, where $X_g$ is the design matrix corresponding to $B_g$. Conditional posterior distribution of $B_g$: spike and slab distribution. Conditional posterior distribution of $\alpha_g^2 = 1/\tau_g^2$: Inverse Gamma if $B_g = 0$, Inverse Gaussian if $B_g \neq 0$. Conditional posterior distribution of $\Sigma$: Inverse Wishart. Conditional posterior distribution of $\pi_0$: Beta distribution.

27 Multivariate Sparse Group Selection with Spike and Slab Prior (MBSGS-SS) Reparametrize the coefficient matrices to tackle the two kinds of sparsity separately:

$$B_g = V_g^{1/2} \tilde{B}_g, \quad g = 1, \ldots, G, \qquad V_g^{1/2} = \text{diag}\{\tau_{g1}, \ldots, \tau_{g m_g}\}, \quad \tau_{gj} \geq 0,\ j = 1, \ldots, m_g,$$

where $\tilde{B}_g$, when nonzero, follows $\text{Vec}(\tilde{B}_g^T) \sim N_{m_g q}(0, I_{m_g} \otimes \Sigma)$. The diagonal elements of $V_g^{1/2}$ control the magnitude of the elements of $B_g$.

28 Multivariate Sparse Group Selection with Spike and Slab Prior (MBSGS-SS) To select variables at the group level, we assume a multivariate spike and slab prior for each $\text{Vec}(\tilde{B}_g^T)$:

$$\text{Vec}(\tilde{B}_g^T) \mid \Sigma, \tau_g, \pi_0 \stackrel{ind}{\sim} (1-\pi_0)\, N_{m_g q}(0, I_{m_g} \otimes \Sigma) + \pi_0\, \delta_0(\text{Vec}(\tilde{B}_g^T)).$$

Note that when $\tau_{gj} = 0$, the jth row of $B_g$ is dropped from the model even when $\tilde{B}_g^j \neq 0$. So, to choose variables within each relevant group, we assume the following spike and slab prior for each $\tau_{gj}$:

$$\tau_{gj} \stackrel{ind}{\sim} (1-\pi_1)\, N^+(0, s^2) + \pi_1\, \delta_0(\tau_{gj}), \quad g = 1, \ldots, G;\ j = 1, \ldots, m_g,$$

where $N^+(0, s^2)$ denotes a normal $N(0, s^2)$ distribution truncated below at 0. A small sampling sketch follows.
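A small sketch of a draw from this within-group prior, using the fact that |N(0, s²)| has the N⁺(0, s²) distribution (parameter values are illustrative):

    # Draw m values of tau_gj from (1 - pi1) * N+(0, s2) + pi1 * delta_0.
    rtau <- function(m, pi1, s2) {
      spike <- rbinom(m, 1, pi1) == 1
      ifelse(spike, 0, abs(rnorm(m, 0, sqrt(s2))))
    }
    set.seed(1)
    rtau(5, pi1 = 0.4, s2 = 1)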

29 Prior Specification We assume an Inverse Wishart prior $\Sigma \sim IW(d, Q)$ and conjugate Beta hyper-priors for $\pi_0$ and $\pi_1$. We use a conjugate inverse gamma prior for $s^2$, $s^2 \sim \text{Inverse Gamma}(1, t)$, and estimate $t$ with the Monte Carlo EM algorithm. For the kth EM update,

$$t^{(k)} = \frac{1}{E_{t^{(k-1)}}\big[\tfrac{1}{s^2} \mid Y\big]},$$

where the posterior expectation of $1/s^2$ is estimated from the Gibbs samples based on $t^{(k-1)}$.

30 Gibbs Sampler: full posterior distribution Joint posterior:

$$p(\tilde{B}, \tau^2, \Sigma, \pi_0, \pi_1, s^2 \mid Y, X) \propto |\Sigma|^{-n/2} \exp\Bigg\{-\frac{1}{2}\text{Tr}\Big[\Big(Y - \sum_{g=1}^G X_g V_g^{1/2}\tilde{B}_g\Big)\Sigma^{-1}\Big(Y - \sum_{g=1}^G X_g V_g^{1/2}\tilde{B}_g\Big)^T\Big]\Bigg\}$$
$$\times \prod_{g=1}^G \Bigg[(1-\pi_0)(2\pi)^{-\frac{q m_g}{2}} |\Sigma|^{-\frac{m_g}{2}} \exp\Big\{-\frac{1}{2}\text{Tr}\big[\tilde{B}_g\Sigma^{-1}\tilde{B}_g^T\big]\Big\} I[\tilde{B}_g \neq 0] + \pi_0\, \delta_0(\text{Vec}(\tilde{B}_g^T))\Bigg]$$
$$\times \prod_{g=1}^G \prod_{j=1}^{m_g} \Bigg[(1-\pi_1)\, 2(2\pi s^2)^{-\frac{1}{2}} \exp\Big\{-\frac{\tau_{gj}^2}{2s^2}\Big\} I[\tau_{gj} > 0] + \pi_1\, \delta_0(\tau_{gj})\Bigg]$$
$$\times |\Sigma|^{-\frac{d+q+1}{2}} \exp\Big\{-\frac{1}{2}\text{Tr}(Q\Sigma^{-1})\Big\}\; \pi_0^{a_1-1}(1-\pi_0)^{a_2-1}\; \pi_1^{c_1-1}(1-\pi_1)^{c_2-1}\; t\,(s^2)^{-2}\exp\Big\{-\frac{t}{s^2}\Big\}.$$

31 Simulation Studies Comparison to the lasso, group lasso, and sparse group lasso in the univariate setting (R packages glmnet and SGL). Comparison to the lasso in the multivariate setting (R package glmnet). Bayesian approaches: BGL-SS (Bayesian Group Lasso with Spike and Slab prior), BSGS-SS (Bayesian Sparse Group Selection with Spike and Slab priors), MBGL-SS for the multivariate setting, MBSGS-SS for the multivariate setting. MCMC runs with the first portion of iterations discarded as burn-in.

32 Simulation results: summary Simulation results suggest that the multivariate Bayesian group lasso with spike and slab prior is strongly influenced by the combination of different group size structures and high correlation between predictors. The multivariate Bayesian sparse group selection with spike and slab prior does not suffer in this situation: it was the most powerful method for variable selection and prediction performance in the presence of group-structured data. All numerical results from our article can be reproduced using our R package MBSGS, available on CRAN (see the sketch below).
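A hedged sketch of fitting both models with the package; the function names and arguments below are assumptions from memory, and the iteration counts are illustrative, so consult the MBSGS manual on CRAN for the exact interface:

    # install.packages("MBSGS")
    library(MBSGS)
    group_size <- rep(5, 20)   # e.g. 20 groups of 5 predictors
    # Assumed exported fitters (check ?MBGLSS and ?MBSGSSS):
    # fit_mbgl  <- MBGLSS(Y, X, niter = 10000, burnin = 5000,
    #                     group_size = group_size)
    # fit_mbsgs <- MBSGSSS(Y, X, niter = 10000, burnin = 5000,
    #                      group_size = group_size)
    # Posterior median estimates of B are then read off the fitted object.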

33 Computation The current version of our package fits, for example, an MBSGS-SS model in around 2 minutes and an MBGL-SS model in around 1 minute, for a model with 20 groups of 5 variables and a sample size of n = 900 (iterations including burn-in). Further improvements of the code, such as parallelization over the group structure, are in progress to speed up the computation for tackling big data sets. In the context of genetic studies, further extensions of our model are under investigation, such as integrating different group penalties given a biological prior on the pathways, or different prior distributions for each group.

34 Application: Aim and Data Identify a parsimonious set of predictors that explains the joint variability of gene expression in four tissues (adrenal gland, fat, heart, and kidney). 770 SNPs in 29 inbred rats as the predictor matrix (n = 29, p = 770); the 29 measured expression levels in the 4 tissues as the outcome (q = 4).

35 Application Chromosome information defines the group structure of the predictor matrix; we ran our MBGL-SS and MBSGS-SS models using this group structure. The multivariate lasso selects 69 SNPs spread over the 20 chromosomes. MBGL-SS selects only the first two groups, corresponding to the SNPs from chromosomes 1 and 2.

36 Application: MBSGS-SS results, 32 SNPs from 8 chromosomes

37 Application: results SNP D14Mit (from chromosome 10) has been previously identified; it has the highest estimate (0.334), for the heart tissue. The posterior standard deviation of the regression parameter for each selected non-zero median estimate was in the range 0.11 to . SNP D14Mit3 estimate was with posterior standard deviation . The importance of chromosomes can be investigated using an estimate of the probability of inclusion.

38 Concluding remarks: summary Bayesian methods for group-sparse modeling in the context of a multivariate correlated response. Our models are based on spike and slab type priors, which facilitate variable selection. It is important to include group size information in the shrinkage part of the model. We have shown that the posterior median estimator can both select and estimate the regression coefficients. Simulation results suggest very good performance of the posterior median estimator for variable selection and prediction error; it obtains results similar to the highest probability model in terms of true and false positive rates.

39 References
- Li, Y., Nan, B., and Zhu, J. (2015). *Multivariate Sparse Group Lasso for the Multivariate Multiple Linear Regression with an Arbitrary Group Structure.* Biometrics, 71(2).
- Liquet, B., Lafaye de Micheaux, P., Hejblum, B., and Thiebaut, R. (2016). *Group and Sparse Group Partial Least Square Approaches Applied in Genomics Context.* Bioinformatics, 32(1).
- Liquet, B. and Sutton, M. (2016). *MBSGS: Multivariate Bayesian Sparse Group Selection with Spike and Slab.* R package, available on CRAN.
- Liquet, B., Mengersen, K., Pettitt, A. N., and Sutton, M. (2017). *Bayesian Variable Selection Regression of Multivariate Responses for Group Data.* Bayesian Analysis, 12(4): 1039-1067. Received the Lindley Award from the ISBA Prize Committee, presented during the ISBA World Meeting in Edinburgh.
- Xu, X. and Ghosh, M. (2015). *Bayesian Variable Selection and Estimation for Group Lasso.* Bayesian Analysis, 10(4).

40 Extension to pleiotropic mapping for genome-wide association studies using group variable selection Pleiotropy: genetic variants which affect multiple different complex diseases. Example: genetic variants which affect both breast and thyroid cancer. Results from GWAS suggest that complex diseases are often affected by many variants with small effects (known as polygenicity). Aims: a statistical method to leverage pleiotropic effects and incorporate prior pathway knowledge, to increase statistical power and identify important risk variants.

41 Model for multiple GWAS studies Suppose we have data from K independent GWAS datasets, $D = D_1 \cup \cdots \cup D_K$, where $D_k = (\{y_1, x_1\}, \ldots, \{y_{n_k}, x_{n_k}\})$, $y_{ik} \in \{0, 1\}$ denotes the phenotype in the kth study, and $x_{ik} \in \mathbb{R}^p$ is the corresponding vector of p SNPs. Logistic regression model:

$$y_{ik} \sim \text{Bernoulli}\big(g^{-1}(\nu_{ik})\big), \qquad \nu_{ik} = x_{ik}^T \beta_k, \quad k = 1, \ldots, K,$$

where $g(\cdot)$ is the logistic link function and $\beta_k \in \mathbb{R}^p$ are the regression coefficients for the kth GWAS. Let $\beta^j \in \mathbb{R}^K$, $j = 1, \ldots, p$, be the vector of K regression coefficients corresponding to the jth SNP across the K GWAS.

42 Group Structure SNPs can be partitioned into G groups (genes). Let $\pi_g$, $g = 1, \ldots, G$, be the set of SNPs contained in the gth group, with $p_g = |\pi_g|$. The matrix of all regression coefficients is $B = (\beta_1, \ldots, \beta_K) = (\beta^1, \ldots, \beta^p)^T$.

43 Frequentist Approach The likelihood for the combined datasets enters through the logistic loss $L(x) = \ln(1 + e^{-x})$, applied to the recoded phenotypes $\tilde{y}_{ik} = 2y_{ik} - 1 \in \{-1, 1\}$:

$$\sum_{k=1}^K \sum_{i=1}^{n_k} L\big(\tilde{y}_{ik}\, \beta_k^T x_{ik}\big).$$

The penalised likelihood estimate is

$$\hat{B} = \operatorname*{argmin}_{B \in \mathbb{R}^{p \times K}} \sum_{k=1}^K \sum_{i=1}^{n_k} L\big(\tilde{y}_{ik}\, \beta_k^T x_{ik}\big) + \lambda_1 \|B\|_{G_{2,1}} + \lambda_2 \|B\|_{l_{2,1}}, \quad (6)$$

with the $G_{2,1}$-norm penalty $\|B\|_{G_{2,1}} = \sum_{g=1}^G \sqrt{\sum_{i \in \pi_g} \sum_{k=1}^K \beta_{ik}^2}$ and the $l_{2,1}$-norm penalty $\|B\|_{l_{2,1}} = \sum_{i=1}^p \sqrt{\sum_{k=1}^K \beta_{ik}^2}$. The $G_{2,1}$-norm fixes the group structure across studies and encourages sparsity at the gene level; the $l_{2,1}$-norm allows sparsity within a group (a small sketch of the two norms follows below).
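A small sketch of the two penalty terms for a coefficient matrix B (rows = SNPs, columns = studies), with a gene membership index vector:

    # G_{2,1} and l_{2,1} norms of a p x K matrix B.
    norm_G21 <- function(B, group) {
      # one l2 norm per gene, over all (SNP, study) entries in that gene
      sum(tapply(rowSums(B^2), group, function(z) sqrt(sum(z))))
    }
    norm_l21 <- function(B) sum(sqrt(rowSums(B^2)))  # one l2 norm per SNP
    B <- matrix(rnorm(12), 6, 2)    # toy: p = 6 SNPs, K = 2 studies
    group <- rep(1:2, each = 3)     # 2 genes of 3 SNPs each
    norm_G21(B, group); norm_l21(B)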

44 Inference Inference uses the alternating direction method of multipliers (ADMM) algorithm. This is a novel approach for identifying pleiotropic effects, as it accounts for gene-specific and SNP-specific effects through variable selection. However, the method only produces a point estimate of B, and accurate estimation of the variance of these parameters is not easily obtained.

45 Bayesian logistic regression with multivariate spike and slab prior: LogitMBGL-SS Let $\gamma = (\gamma_1, \ldots, \gamma_p)^T$ indicate the association status of the SNPs, where $\gamma_j = 1$ indicates that the jth SNP is not associated with any of the K traits ($\beta^j = 0$). Spike and slab prior for the jth SNP effect $\beta^j \in \mathbb{R}^K$:

$$\beta^j \sim (1-\gamma_j)\, N_K(0, \tau_j^2 V) + \gamma_j\, \delta_0(\beta^j),$$
$$\tau_j^2 \sim \text{Gamma}\Big(\frac{K+1}{2}, \frac{\lambda^2}{2}\Big), \qquad V \sim IW(d, Q),$$
$$\gamma_j \sim \text{Bernoulli}(\alpha_0), \qquad \alpha_0 \sim \text{Beta}(a, b),$$

for $j = 1, \ldots, p$, where $\delta_0(\beta^j)$ denotes a point mass at $0 \in \mathbb{R}^K$. Here $V \in \mathbb{R}^{K \times K}$ is a covariance matrix modeling the covariance of the SNP effects across the traits.

46 Extension LogitMBGL-SS should perform well when the SNPs are independent, but GWAS datasets exhibit strong correlations between SNPs within the same gene. Solution: reparameterise the coefficients to handle sparsity at the gene grouping level and at the individual feature level separately: $\tau \in \mathbb{R}^p$ models individual sparsity, and $b^{(g)} \in \mathbb{R}^{p_g K}$, with $b^{(g)} = (b_1^{(g)T}, \ldots, b_{p_g}^{(g)T})^T$ and $b_j^{(g)} \in \mathbb{R}^K$, models group sparsity, via $\beta^j = \tau_j b_j^{(g)}$ with $\tau_j \geq 0$ for all $j \in \pi_g$.

47 Bayesian logistic regression using multivariate sparse group selection with spike and slab priors With $\beta^j = \tau_j b_j^{(g)}$, $\tau_j \geq 0$ for all $j \in \pi_g$, we assume the following multivariate spike and slab priors:

$$b^{(g)} \sim (1-\alpha_0)\, N_{p_g K}(0, I_{p_g} \otimes V) + \alpha_0\, \delta_0(b^{(g)}),$$
$$\tau_j \sim (1-\alpha_1)\, N^+(0, s^2) + \alpha_1\, \delta_0(\tau_j),$$
$$\alpha_0 \sim \text{Beta}(a_1, a_2), \qquad \alpha_1 \sim \text{Beta}(c_1, c_2), \qquad s^2 \sim \text{InvGamma}(1, t),$$

for $j \in \pi_g$ and $g = 1, \ldots, G$.

48 Signal recovery: compared methods (i) gmt: grouped multi-task penalised logistic regression ($\lambda_1 > 0$, $\lambda_2 = 0$) using the $G_{2,1}$-norm; (ii) smt: sparse multi-task penalised logistic regression ($\lambda_1 = 0$, $\lambda_2 > 0$) using the $l_{2,1}$-norm; (iii) sgmt: sparse group multi-task penalised logistic regression ($\lambda_1 > 0$, $\lambda_2 > 0$); (iv) logitMBGL: Bayesian logistic regression using the multivariate group lasso with spike and slab prior; (v) logitMBSGS: Bayesian logistic regression using multivariate sparse group selection with spike and slab priors.

49 Results: Frequentist methods [Figure: estimated coefficient values plotted against coefficient index for the gmt, sgmt, and smt methods, shown separately for Study 1 and Study 2.]

50 Results: Bayesian methods [Figure: estimated coefficient values plotted against coefficient index for the MBGL and MBSGS methods, shown separately for Study 1 and Study 2.]

51 Main conclusions from the simulation studies The penalised approaches perform reasonably well in variable selection, but the reconstructed signal is underestimated. In general, the penalised likelihood methods suffer in terms of false positives, selecting more variables as nonzero than the Bayesian methods. The Bayesian methods perform best in terms of signal recovery, measured by the $l_1$ error, and in variable selection performance metrics. The penalised likelihood approaches are computationally efficient thanks to the ADMM algorithm, but the simulation results suggest that, when computationally feasible, the Bayesian estimators should be used. The multivariate Bayesian sparse group selection with spike and slab prior performed best in terms of signal recovery, and the Bayesian method provides a natural way of quantifying the variability of the estimated coefficients.

52 What Next? Application on real data: case/control studies of breast cancer and thyroid cancer. Thyroid cancer (482 cases, 463 controls); breast cancer (1172 cases, 1125 controls); 6677 SNPs from 618 genes in 10 non-overlapping gene pathways.

53 What Next? Application on real data: case/control studies of breast cancer and thyroid cancer. Thyroid cancer (482 cases, 463 controls); breast cancer (1172 cases, 1125 controls); 6677 SNPs from 618 genes in 10 non-overlapping gene pathways. We are looking to fill one two-year postdoc position funded by la Ligue contre le Cancer (see post-doctorat.html).
