Fused elastic net EPoC

Size: px

Start display at page:

Download "Fused elastic net EPoC"

Kenneth Sharp
5 years ago
Views:

1 Fused elastic net EPoC Tobias Abenius November 28, 2013

2 Data Set and Method Use mrna and copy number aberration (CNA) to build a network using an extension of our EPoC method (Jörnsten et al., 2011; Abenius et al., 2012). Extension with generalized fused LASSO and elastic net generalized fused LASSO which is for this project. Two different cancers, breast cancer (N=766) and ovarian cancer (N=560). Small subset (p = 250) of all genes that exist in the intersect gene list of the two cancers.

3 EPoC summary The EPoC method first standardize mrna and CNA, then CNA direct effects on their corresponding mrna are calculated (univariate regression) and subtracted from mrna to form the residual mrna. This residuals will act as response and CNA will act as predictors.

4 Fused Objective The objective of the standard fused lasso penalizes differences between consecutive coefficients (Tibshirani et al., 2005): β fused = arg min β p y Xβ λ 1 β 1 + λ 2 β i 1 β i i=2 }{{} standard fused LASSO part

5 Fused Objective The objective of the standard fused lasso penalizes differences between consecutive coefficients (Tibshirani et al., 2005): β fused = arg min β p y Xβ λ 1 β 1 + λ 2 β i 1 β i i=2 }{{} standard fused LASSO part This has been extended to a more general case where all pairwise differences are penalized (Petry et al., 2011): p p β pwfused = arg min y Xβ λ 1 β 1 + λ 2 β i β j β i=1 j=i+1 }{{} pairwise fused LASSO part

6 Fused Objective cont. This can be even more generalized to (Ye and Xie, 2010) β = arg min y Xβ λ 1 β 1 + λ 2 Lβ 1, β where matrix L { 1, 0, 1} m p tell which absolute differences between β coefficients to penalize.

7 For standard fused lasso L becomes a matrix with ones on the diagonal and 1 on the superdiagonal. L = β 1 β 2 β p

8 For pairwise fused lasso L contains ( p 2) rows where a row contains one 1 and one 1 in the places corresponding to the coefficient combination for this row. Here is an example with p = 4: L = β 1 β 2 β 3 β 4 pair pair ( )

9 Split Bregman Solver Ye and Xie (2010) introduced a method to solve this generalized fused lasso with this L: repeat β k+1 arg min V (β) + u k, β a k + v k, Lβ b k + β + µ 1 β a k µ 2 Lβ b k ) a k+1 T λ 1 (β k+1 + uk µ 1 µ 1 ) b k+1 T λ 2 (Lβ k+1 + vk µ 2 µ 2 u k+1 u k + δ 1 (β k+1 a k+1 ) v k+1 v k + δ 2 (Lβ k+1 b k+1 ) until convergence where T λ is the soft threshold operator with parameter λ, µ 1 = 1, µ 2 = 1, δ 1 = µ 1 1 and δ 2 = µ 1 2.

10 Elastic net Elastic net: arg min β y Xβ λ elastic β λ LASSO β 1 It is fairly easy to show (see exercise 2.9) that elastic net can be solved by a LASSO solver. Instead of using X X use where G = (1 γ)x X + γi p γ = λ elastic 1 + λ elastic. I implemented fused elastic net in C as an R package using this extension of the split bregman algorithm.

11 Results There seems to be an interplay of different penalty parameters. Up, no elastic penalty, down: λ Elastic = 10. Horizontal: different λ LASSO Vertical: different λ Fused. xtable(fusedlinks) xtable(fusedelasticlinks)

12 Results xtable(t(ns1)) xtable(t(ns2))

13 Claim to verify Elastic net adds an L2 penalty term and claims that it will not behave as LASSO where of two correlated predictors one can be active and the other not, somewhat randomly. I decided to try to verify this claim.

14 Results For each pair of genes (predictors) very correlated (ρ > 0.9), how often are exactly one of them in the active set, in all the (p = 250) (K = 2) (#λ 1 = 8) different regressions? fused epoc: % elastic fused: % How often are both of them zero? fused epoc: % elastic fused: % How often are none of them zero? fused epoc: % elastic fused: %

15 RSS e-04 5e-04 5e-03 5e-02 5e-01 λ LASSO Figure : In blue RSS without elastic and in red with elastic

16 References Rebecka Jörnsten, Tobias Abenius, Teresia Kling, Linnéa Schmidt, Erik Johansson, Torbjörn EM Nordling, Bodil Nordlander, Chris Sander, Peter Gennemark, Keiko Funa, et al. Network modeling of the transcriptional effects of copy number aberrations in glioblastoma. Molecular Systems Biology, 7(1), Tobias Abenius, Rebecka Jörnsten, Teresia Kling, Linnéa Schmidt, José Sánchez, and Sven Nelander. System-scale network modeling of cancer using epoc. In Advances in Systems Biology, pages Springer, Robert Tibshirani, Michael Saunders, Saharon Rosset, Ji Zhu, and Keith Knight. Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(1): , February Sebastian Petry, Claudia Flexeder, and Gerhard Tutz. Pairwise Fused Lasso. Technical report, Dept.of Statistics, University of Munich, (102), Gui-bo Ye and Xiaohui Xie. Split Bregman method for large scale fused Lasso. arxiv: v1, (1), 2010.

Stability and the elastic net

Stability and the elastic net Patrick Breheny March 28 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/32 Introduction Elastic Net Our last several lectures have concentrated on methods for