Structural Learning and Integrative Decomposition of Multi-View Data


1 Structural Learning and Integrative Decomposition of Multi-View Data. Department of Statistics, Texas A&M University. JSM 2018, Vancouver, Canada, July 31, 2018.

2 Dr. Gen Li, Columbia University, Mailman School of Public Health

3 Heterogeneous multi-view data. Multi-view / multimodal / multi-source: $n$ common samples measured on $d$ different views/platforms, $X_1, \ldots, X_d$ with $X_i \in \mathbb{R}^{n \times p_i}$. Prototypical example: TCGA.

4 TCGA - BRCA. From Cancer Genome Atlas Network, Nature, 2012: $n = 348$ subjects with measurements of
1. gene expression ($p_1 = 645$)
2. DNA methylation ($p_2 = 574$)
3. miRNA expression ($p_3 = 423$)
4. reverse-phase protein array ($p_4 = 171$)
Goals of the analysis: exploratory analysis with respect to breast cancer subtype identification, consensus clustering, dimension reduction.

5 Naive approaches. PCA is a standard dimension-reduction technique that can be used for exploratory analysis, clustering, etc. Possible strategies (sketched below):
1. Apply PCA (or another method) separately to each view.
2. Apply PCA (or another method) to the concatenated matrix of all views.
Limitations: strategy 1 does not take advantage of the matched samples, and strategy 2 does not take into account heterogeneity between the views. Neither 1 nor 2 can be used for association analysis.
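To make the two naive strategies concrete, here is a minimal NumPy sketch (not from the talk; the data and dimensions are hypothetical):

```python
# Minimal sketch of the two naive strategies (hypothetical data, not the talk's).
import numpy as np

rng = np.random.default_rng(0)
n, p1, p2, r = 100, 30, 20, 3
X1 = rng.standard_normal((n, p1))       # view 1: n matched samples
X2 = rng.standard_normal((n, p2))       # view 2: same samples, other platform

def pca_scores(X, r):
    """First r principal-component scores via the SVD of the centered matrix."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :r] * s[:r]

# Strategy 1: separate PCA per view -- the two score sets are not linked,
# so the matched-sample information is not used.
scores1, scores2 = pca_scores(X1, r), pca_scores(X2, r)

# Strategy 2: PCA on the concatenated matrix -- a single score set, but view
# heterogeneity (scale, noise level, dimension) is ignored.
scores_concat = pca_scores(np.hstack([X1, X2]), r)
```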

6 Linked component models. Represent each view via shared and individual structures: view = shared + individual + noise.
- shared + individual: dimension reduction
- shared: consensus clustering, association analysis
- individual: information unique to a particular view
Multiple formulations exist, both frequentist and Bayesian (Jia et al., 2010; Lock et al., 2013; Klami et al., 2015; Yang and Michailidis, 2015; Zhou et al., 2016; ...).

7 Example: Joint and Individual Variation Explained (JIVE; Lock et al., 2013). Represent each view via shared and individual structures: view = shared + individual + noise. JIVE is a low-rank decomposition into (globally) shared and individual parts:
$$X_i = U_0 V_{0i}^\top + U_i V_i^\top + E_i, \quad i = 1, \ldots, d,$$
where $U_0$ holds latent components shared across all $d$ views and $U_i$ holds latent components specific to view $i$. We use JIVE as a benchmark due to its automatic implementation (the r.jive package) and proven success (Kuligowski et al., 2015; Hellton and Thoresen, 2016; ...).
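As a point of reference, the following hedged sketch simulates data from a JIVE-type decomposition in NumPy. It is not the r.jive implementation, and it omits JIVE's orthogonality constraint between joint and individual row spaces:

```python
# Hedged sketch: simulate d = 2 views from X_i = U0 V0i^T + Ui Vi^T + E_i.
import numpy as np

rng = np.random.default_rng(1)
n, ps, r0, ri = 100, [30, 20], 2, 2
U0 = np.linalg.qr(rng.standard_normal((n, r0)))[0]      # globally shared scores
views = []
for p in ps:
    Ui = np.linalg.qr(rng.standard_normal((n, ri)))[0]  # view-specific scores
    V0i = rng.standard_normal((p, r0))                  # loadings on shared part
    Vi = rng.standard_normal((p, ri))                   # loadings on individual part
    E = 0.1 * rng.standard_normal((n, p))               # noise
    views.append(U0 @ V0i.T + Ui @ Vi.T + E)
X1, X2 = views
```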

8 What if there are partially-shared components? Recall TCGA-BRCA (Cancer Genome Atlas Network, Nature, 2012): $n = 348$ subjects with measurements of
1. gene expression ($p_1 = 645$)
2. DNA methylation ($p_2 = 574$)
3. miRNA expression ($p_3 = 423$)
4. reverse-phase protein array ($p_4 = 171$)
Possible approaches to find partially-shared components:
1. Ignore them.
2. Apply JIVE (or another method) sequentially, adding one dataset at a time.
Disadvantages: (1) partially-shared structures are mistaken for shared or individual ones; (2) the solution of the sequential approach may depend on the order in which datasets are added.

9 Objectives. What if there are more than 2 views?
$$X_1 = U_1 V_{1,1}^\top + U_2 V_{1,2}^\top + U_3 V_{1,3}^\top + U_5 V_{1,5}^\top + E_1,$$
$$X_2 = U_1 V_{2,1}^\top + U_2 V_{2,2}^\top + U_4 V_{2,4}^\top + U_6 V_{2,6}^\top + E_2,$$
$$X_3 = U_1 V_{3,1}^\top + U_3 V_{3,3}^\top + U_4 V_{3,4}^\top + U_7 V_{3,7}^\top + E_3.$$
Here $U_1$ is shared by all three views, $U_2, U_3, U_4$ are each partially shared by a pair of views, and $U_5, U_6, U_7$ are individual.

10 Objectives. The three-view model above can be expressed jointly as
$$X = [X_1 \; X_2 \; X_3] = U V^\top + E, \qquad V = \begin{bmatrix} V_1 \\ V_2 \\ V_3 \end{bmatrix} = \begin{bmatrix} V_{1,1} & V_{1,2} & V_{1,3} & 0 & V_{1,5} & 0 & 0 \\ V_{2,1} & V_{2,2} & 0 & V_{2,4} & 0 & V_{2,6} & 0 \\ V_{3,1} & 0 & V_{3,3} & V_{3,4} & 0 & 0 & V_{3,7} \end{bmatrix}.$$
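A short sketch of how this block-sparsity pattern can be encoded and used to generate data; the binary matrix S below matches the three-view example above (column k of S records which views load on $U_k$), and the dimensions are hypothetical:

```python
# Sketch: build V with the block-sparsity pattern of the three-view example,
# then generate X = U V^T + E. Dimensions are hypothetical.
import numpy as np

rng = np.random.default_rng(2)
n, ps = 50, [5, 4, 3]                    # hypothetical n, p_1, p_2, p_3
S = np.array([[1, 1, 1, 0, 1, 0, 0],     # view 1 loads on U1, U2, U3, U5
              [1, 1, 0, 1, 0, 1, 0],     # view 2 loads on U1, U2, U4, U6
              [1, 0, 1, 1, 0, 0, 1]])    # view 3 loads on U1, U3, U4, U7
d, r = S.shape
# Stack view-wise loadings, zeroing the blocks where S says "inactive".
V = np.vstack([rng.standard_normal((p, r)) * S[i] for i, p in enumerate(ps)])
U = np.linalg.qr(rng.standard_normal((n, r)))[0]   # orthonormal scores
X = U @ V.T + 0.1 * rng.standard_normal((n, sum(ps)))
```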

11 Proposed model.
$$X = [X_1 \; \cdots \; X_d] = U V^\top + E.$$
(Figure: the decomposition drawn as blocks, with $X = [X_1 \; X_2 \; X_3] \in \mathbb{R}^{n \times (p_1+p_2+p_3)}$, $U \in \mathbb{R}^{n \times r}$, and $V^\top \in \mathbb{R}^{r \times (p_1+p_2+p_3)}$.)

12 Proposed model. Constrain $V = V(S)$ for a binary structure matrix $S$. (Figure: same block diagram with the $d \times r$ 0/1 pattern of $S$ shown alongside $X = [X_1 \; X_2 \; X_3] \in \mathbb{R}^{n \times (p_1+p_2+p_3)}$, $U \in \mathbb{R}^{n \times r}$, $V^\top \in \mathbb{R}^{r \times (p_1+p_2+p_3)}$.)

13 Proposed model: Structural Learning and Integrative Decomposition. SLIDE:
$$X = [X_1 \; \cdots \; X_d] = U V(S)^\top + E,$$
where
- $S \in \{0,1\}^{d \times r}$ is a binary structure matrix;
- $U \in \mathbb{R}^{n \times r}$ is a score matrix with orthogonal columns, $U^\top U = I$;
- $V = V(S) \in \mathbb{R}^{p \times r}$ is a loading matrix with the sparsity pattern given by $S$.

14 SLIDE model: existence and identifiability. Advantages: we prove that the SLIDE matrix decomposition always exists and is unique under certain conditions.

15 Model existence and identifiability. Advantages: we prove that the SLIDE matrix decomposition always exists and is unique under certain conditions. Number of possible models: given $d$ matched datasets and a maximal possible rank $r$, there exist $\binom{r + 2^d - 1}{2^d - 1}$ distinct binary structures for the SLIDE model (checked numerically below).
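A quick numerical check of the count, assuming the binomial form reconstructed above: $d$ views admit $2^d - 1$ distinct nonzero view-membership patterns per column, and choosing at most $r$ columns with repetition, order ignored, gives the binomial coefficient.

```python
# Check the structure count C(r + 2^d - 1, 2^d - 1).
from math import comb

def n_structures(d, r):
    k = 2 ** d - 1           # distinct nonzero binary column patterns for d views
    return comb(r + k, k)    # multisets of at most r columns from k patterns

print(n_structures(3, 7))    # 3432, matching the next slide
```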

16 SLIDE model. With $d = 3$ and $r = 7$, there are 3,432 possible structures $S$. (Figure: same block diagram, $X = [X_1 \; X_2 \; X_3] \in \mathbb{R}^{n \times (p_1+p_2+p_3)}$, $U \in \mathbb{R}^{n \times r}$, $V^\top \in \mathbb{R}^{r \times (p_1+p_2+p_3)}$.)

17 Structural Learning and Integrative Decomposition. SLIDE model: $X = [X_1 \; \cdots \; X_d] = U V(S)^\top + E$. Three questions:
1. How to reduce the number of structures $S$ under consideration from $\binom{r + 2^d - 1}{2^d - 1}$ to a small subset? (structural learning)
2. Given a sequence of $m$ distinct binary structures $S_1, \ldots, S_m$, how to choose the best one? (structural learning)
3. Given a binary structure $S \in \{0,1\}^{d \times r}$, how to fit the SLIDE model? (integrative decomposition)

18 Estimation: SLIDE workflow.
Step 1: Form a sequence of candidate structures $S_1, \ldots, S_m$.
Step 2: Apply the BCV procedure to select one structure from the sequence.
Step 3: Fit the SLIDE model for the selected structure $S$.

19 Step 1: generating a sequence of candidate structures. SLIDE: $X = [X_1 \; \cdots \; X_d] = U V(S)^\top + E$. Given a grid of $\lambda$ values, choose $S_1, \ldots, S_m$ based on the support of $V$ from
$$\min_{U, V} \; \sum_{i=1}^{d} \Big( \underbrace{\tfrac{1}{2}\, \| X_i - U V_i^\top \|_F^2}_{\text{loss function}} + \lambda \underbrace{\sum_{j=1}^{r} \| V_{ij} \|_2}_{\text{block-sparse penalty}} \Big) \quad \text{subject to } U^\top U = I.$$
Updates of $U$ and $V$ have closed form (a sketch follows). The tuning parameter $\lambda$ controls the sparsity level in $V$.
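A hedged sketch of those closed-form updates, as I read the criterion (a simplified reimplementation, not the authors' reference code): with $U$ fixed and orthonormal, each block $V_{ij}$ is a group soft-thresholding of $X_i^\top u_j$; with $V$ fixed, the $U$-update is an orthogonal Procrustes problem solved by an SVD.

```python
# Hedged sketch of Step 1's alternating minimization (simplified reading of
# the criterion, not the authors' reference implementation).
import numpy as np

def slide_step1(Xs, r, lam, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    X = np.hstack(Xs)                                   # n x (p_1 + ... + p_d)
    U = np.linalg.qr(rng.standard_normal((X.shape[0], r)))[0]
    Vs = [np.zeros((Xi.shape[1], r)) for Xi in Xs]
    for _ in range(n_iter):
        # V-update: since U^T U = I, each block V_ij minimizes
        # 0.5 ||v||^2 - (X_i^T u_j)^T v + lam ||v||_2 => group soft-thresholding.
        for Xi, Vi in zip(Xs, Vs):
            B = Xi.T @ U
            norms = np.maximum(np.linalg.norm(B, axis=0), 1e-12)
            Vi[:] = B * np.maximum(0.0, 1.0 - lam / norms)
        # U-update: orthogonal Procrustes, U = P Q^T where X V = P Sigma Q^T.
        P, _, Qt = np.linalg.svd(X @ np.vstack(Vs), full_matrices=False)
        U = P @ Qt
    # Candidate structure: which (view, component) blocks survived shrinkage.
    S = np.array([(np.linalg.norm(Vi, axis=0) > 0).astype(int) for Vi in Vs])
    return U, Vs, S
```

Running this over a decreasing grid of $\lambda$ values and collecting the distinct supports would yield the candidate sequence $S_1, \ldots, S_m$.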

20 Estimation: SLIDE workflow (recap).
Step 1: Form a sequence of candidate structures $S_1, \ldots, S_m$.
Step 2: Apply the BCV procedure to select one structure from the sequence.
Step 3: Fit the SLIDE model for the selected structure $S$.

21 Step 2: selecting the structure $S$ from the fixed set. Adapt the bi-cross-validation (BCV) approach (Owen and Perry, 2009). Split the rows and each view's columns into two folds, so that each view is partitioned as
$$X_i = \begin{pmatrix} X_i^{11} & X_i^{12} \\ X_i^{21} & X_i^{22} \end{pmatrix}.$$
Suppose $X^{11} = [X_1^{11} \cdots X_d^{11}]$ is held out. Fit the SLIDE model with structure $S$ to $X^{22} = [X_1^{22} \cdots X_d^{22}]$ to find $\hat{U}$ and $\hat{V}$, then evaluate the prediction error on $X^{11}$ (a sketch of one fold follows).
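A hedged sketch of one BCV fold in the Owen and Perry style. The talk does not spell out the prediction rule, so the $\hat{X}^{11} = X^{12} (\hat{L}^{22})^{+} X^{21}$ step below is the standard BCV predictor, with the SLIDE fit on $X^{22}$ supplying the low-rank estimate $\hat{L}^{22}$:

```python
# Hedged sketch of one BCV fold (Owen & Perry style; the exact prediction rule
# used by SLIDE may differ).
import numpy as np

def bcv_error(Xs, fit_lowrank, row1, col1s):
    """Xs: list of views. fit_lowrank: fits the model with structure S on the
    list of X22 blocks and returns the low-rank estimate U_hat V_hat^T.
    row1: boolean mask of held-out rows. col1s: per-view masks of held-out columns."""
    X11 = np.hstack([Xi[row1][:, c] for Xi, c in zip(Xs, col1s)])
    X12 = np.hstack([Xi[row1][:, ~c] for Xi, c in zip(Xs, col1s)])
    X21 = np.hstack([Xi[~row1][:, c] for Xi, c in zip(Xs, col1s)])
    X22s = [Xi[~row1][:, ~c] for Xi, c in zip(Xs, col1s)]
    L22 = fit_lowrank(X22s)                       # SLIDE fit on the kept block
    X11_hat = X12 @ np.linalg.pinv(L22) @ X21     # standard BCV prediction
    return np.sum((X11 - X11_hat) ** 2) / X11.size
```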

22 Estimation: SLIDE workflow (recap).
Step 1: Form a sequence of candidate structures $S_1, \ldots, S_m$.
Step 2: Apply the BCV procedure to select one structure from the sequence.
Step 3: Fit the SLIDE model for the selected structure $S$.

23 Step 3: fitting the SLIDE model given structure $S$. SLIDE: $X = [X_1 \; \cdots \; X_d] = U V(S)^\top + E$. Given $S \in \{0,1\}^{d \times r}$, fit the SLIDE model by solving
$$\min_{U, V} \; \| X - U V^\top \|_F^2 \quad \text{subject to } U^\top U = I, \; V = V(S).$$
Updates of $U$ and $V$ have closed form (sketched below).
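A hedged sketch of the alternating closed-form updates for this constrained problem (my reading, not the reference code): the $V$-update is unconstrained least squares projected onto the support $S$, and the $U$-update is again an orthogonal Procrustes step.

```python
# Hedged sketch of Step 3: fit X ~ U V^T with U^T U = I and supp(V) given by S.
import numpy as np

def slide_fit(Xs, S, n_iter=100, seed=0):
    X = np.hstack(Xs)
    r = S.shape[1]
    rng = np.random.default_rng(seed)
    U = np.linalg.qr(rng.standard_normal((X.shape[0], r)))[0]
    # Expand the d x r structure S to a (p_1+...+p_d) x r 0/1 mask.
    mask = np.repeat(S, [Xi.shape[1] for Xi in Xs], axis=0)
    for _ in range(n_iter):
        V = (X.T @ U) * mask                # least squares given U, projected on S
        P, _, Qt = np.linalg.svd(X @ V, full_matrices=False)
        U = P @ Qt                          # orthogonal Procrustes step
    return U, V
```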

24 Application to TCGA BRCA data. $n = 348$ primary tumor samples with matched measurements of
1. gene expression ($p_1 = 645$)
2. DNA methylation ($p_2 = 574$)
3. miRNA expression ($p_3 = 423$)
4. reverse-phase protein array ($p_4 = 171$)
Goals: exploratory analysis and consensus clustering.

25 Comparison with JIVE. Ranks and percentage of variance explained:

          SLIDE           JIVE            Dimension
GE        14 (47.89%)     34 (61.72%)     n × 645
ME        10 (42.56%)     31 (47.87%)     n × 574
miRNA     19 (57.50%)     30 (48.20%)     n × 423
RPPA      20 (63.55%)     22 (52.04%)     n × 171
Total     50 (52.88%)     108 (52.46%)    n × 1813

Both SLIDE and JIVE identify 3 globally-shared components. SLIDE additionally identifies 3 partially-shared components: one for (GE, ME, miRNA), one for (GE, ME), and one for (GE, miRNA).
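For reference, one plausible way such percentages are computed (an assumption; the talk does not define them): the squared Frobenius norm of the fitted low-rank part of each view over the view's total.

```python
# Sketch (assumed definition): % variance of view i explained by its fitted
# low-rank structure U V_i^T.
import numpy as np

def pct_var_explained(Xi, U, Vi):
    return 100.0 * np.linalg.norm(U @ Vi.T) ** 2 / np.linalg.norm(Xi) ** 2
```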

26 Application to TCGA BRCA data. Hierarchical clustering using the 6 scores from SLIDE. (Figure: overall survival probability versus years for the subgroups induced by SLIDE; log-rank test p-value = 0.029; group sizes: Group 1: 115, Group 2: 65, Group 3: 94, Group 4: ...)

27 Summary. SLIDE highlights:
- structured sparsity within low-rank matrix decompositions
- justification of model existence and identifiability
- application to breast cancer data from TCGA reveals superior clustering compared to an existing prominent method (JIVE)

28 Thank you!

29 SLIDE model existence and identifiability. Let $\mathcal{B}_d$ be the set of distinct binary vectors $b_k \in \{0,1\}^d$.
Existence: For any given signal matrix, there exist $S$, $U$, $V$ such that the SLIDE decomposition holds. Moreover, for each $b_k \in \mathcal{B}_d$, the corresponding nonzero columns of the dataset-specific loadings $V_i$ are linearly independent.
Uniqueness: If the nonzero columns of each $V_i$ are linearly independent, then $S$ is unique. If, for each $b_k \in \mathcal{B}_d$, the corresponding columns of $V$ are orthogonal with distinct norms, then $U$ and $V$ are also unique.
