Structural Learning and Integrative Decomposition of Multi-View Data


1 Structural Learning and Integrative Decomposition of Multi-View Data. Department of Statistics, Texas A&M University. JSM 2018, Vancouver, Canada, July 31, 2018.

2 Dr. Gen Li, Columbia University, Mailman School of Public Health

3 Heterogeneous multi-view data. Multi-view / multimodal / multi-source: $n$ common samples measured on $d$ different views/platforms, $X_1, \ldots, X_d$ with $X_i \in \mathbb{R}^{n \times p_i}$. Prototypical example: TCGA.

4 TCGA - BRCA. From Cancer Genome Atlas Network, Nature, 2012: $n = 348$ subjects with measurements of
1. gene expression ($p_1 = 645$)
2. DNA methylation ($p_2 = 574$)
3. miRNA expression ($p_3 = 423$)
4. reverse-phase protein array ($p_4 = 171$)
Goals of the analysis: exploratory analysis with respect to breast cancer subtype identification, consensus clustering, dimension reduction.

5 Naive approaches. PCA is a standard dimension-reduction technique that can be used for exploratory analysis, clustering, etc. Possible strategies (sketched below):
1. Apply PCA (or another method) separately to each view.
2. Apply PCA (or another method) to the concatenated matrix of all views.
Limitations: strategy 1 does not take advantage of the matched samples, and strategy 2 does not take into account heterogeneity between the views. Neither 1 nor 2 can be used for association analysis.
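To make the two naive strategies concrete, here is a minimal NumPy sketch (not from the talk; the data and dimensions are hypothetical):

```python
# Minimal sketch of the two naive strategies (hypothetical data, not the talk's).
import numpy as np

rng = np.random.default_rng(0)
n, p1, p2, r = 100, 30, 20, 3
X1 = rng.standard_normal((n, p1))       # view 1: n matched samples
X2 = rng.standard_normal((n, p2))       # view 2: same samples, other platform

def pca_scores(X, r):
    """First r principal-component scores via the SVD of the centered matrix."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :r] * s[:r]

# Strategy 1: separate PCA per view -- the two score sets are not linked,
# so the matched-sample information is not used.
scores1, scores2 = pca_scores(X1, r), pca_scores(X2, r)

# Strategy 2: PCA on the concatenated matrix -- a single score set, but view
# heterogeneity (scale, noise level, dimension) is ignored.
scores_concat = pca_scores(np.hstack([X1, X2]), r)
```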

6 Linked component models. Represent each view via shared and individual structures: view = shared + individual + noise.
- shared + individual: dimension reduction
- shared: consensus clustering, association analysis
- individual: information unique to a particular view
Multiple formulations exist, both frequentist and Bayesian (Jia et al., 2010; Lock et al., 2013; Klami et al., 2015; Yang and Michailidis, 2015; Zhou et al., 2016; ...).

7 Example: Joint and Individual Variation Explained (JIVE; Lock et al., 2013). Represent each view via shared and individual structures: view = shared + individual + noise. JIVE is a low-rank decomposition into (globally) shared and individual parts:
$$X_i = U_0 V_{0i}^\top + U_i V_i^\top + E_i, \quad i = 1, \ldots, d,$$
where $U_0$ holds latent components shared across all $d$ views and $U_i$ holds latent components specific to view $i$. We use JIVE as a benchmark due to its automatic implementation (the r.jive package) and proven success (Kuligowski et al., 2015; Hellton and Thoresen, 2016; ...).
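As a point of reference, the following hedged sketch simulates data from a JIVE-type decomposition in NumPy. It is not the r.jive implementation, and it omits JIVE's orthogonality constraint between joint and individual row spaces:

```python
# Hedged sketch: simulate d = 2 views from X_i = U0 V0i^T + Ui Vi^T + E_i.
import numpy as np

rng = np.random.default_rng(1)
n, ps, r0, ri = 100, [30, 20], 2, 2
U0 = np.linalg.qr(rng.standard_normal((n, r0)))[0]      # globally shared scores
views = []
for p in ps:
    Ui = np.linalg.qr(rng.standard_normal((n, ri)))[0]  # view-specific scores
    V0i = rng.standard_normal((p, r0))                  # loadings on shared part
    Vi = rng.standard_normal((p, ri))                   # loadings on individual part
    E = 0.1 * rng.standard_normal((n, p))               # noise
    views.append(U0 @ V0i.T + Ui @ Vi.T + E)
X1, X2 = views
```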

8 What if there are partially-shared components? Recall TCGA-BRCA (Cancer Genome Atlas Network, Nature, 2012): $n = 348$ subjects with measurements of
1. gene expression ($p_1 = 645$)
2. DNA methylation ($p_2 = 574$)
3. miRNA expression ($p_3 = 423$)
4. reverse-phase protein array ($p_4 = 171$)
Possible approaches to find partially-shared components:
1. Ignore them.
2. Apply JIVE (or another method) sequentially, adding one dataset at a time.
Disadvantages: (1) partially-shared structures are mistaken for shared or individual ones; (2) the solution of the sequential approach may depend on the order in which datasets are added.

9 Objectives. What if there are more than 2 views?
$$X_1 = U_1 V_{1,1}^\top + U_2 V_{1,2}^\top + U_3 V_{1,3}^\top + U_5 V_{1,5}^\top + E_1,$$
$$X_2 = U_1 V_{2,1}^\top + U_2 V_{2,2}^\top + U_4 V_{2,4}^\top + U_6 V_{2,6}^\top + E_2,$$
$$X_3 = U_1 V_{3,1}^\top + U_3 V_{3,3}^\top + U_4 V_{3,4}^\top + U_7 V_{3,7}^\top + E_3.$$
Here $U_1$ is shared by all three views, $U_2, U_3, U_4$ are each partially shared by a pair of views, and $U_5, U_6, U_7$ are individual.

10 Objectives. The three-view model above can be expressed jointly as
$$X = [X_1 \; X_2 \; X_3] = U V^\top + E, \qquad V = \begin{bmatrix} V_1 \\ V_2 \\ V_3 \end{bmatrix} = \begin{bmatrix} V_{1,1} & V_{1,2} & V_{1,3} & 0 & V_{1,5} & 0 & 0 \\ V_{2,1} & V_{2,2} & 0 & V_{2,4} & 0 & V_{2,6} & 0 \\ V_{3,1} & 0 & V_{3,3} & V_{3,4} & 0 & 0 & V_{3,7} \end{bmatrix}.$$
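A short sketch of how this block-sparsity pattern can be encoded and used to generate data; the binary matrix S below matches the three-view example above (column k of S records which views load on $U_k$), and the dimensions are hypothetical:

```python
# Sketch: build V with the block-sparsity pattern of the three-view example,
# then generate X = U V^T + E. Dimensions are hypothetical.
import numpy as np

rng = np.random.default_rng(2)
n, ps = 50, [5, 4, 3]                    # hypothetical n, p_1, p_2, p_3
S = np.array([[1, 1, 1, 0, 1, 0, 0],     # view 1 loads on U1, U2, U3, U5
              [1, 1, 0, 1, 0, 1, 0],     # view 2 loads on U1, U2, U4, U6
              [1, 0, 1, 1, 0, 0, 1]])    # view 3 loads on U1, U3, U4, U7
d, r = S.shape
# Stack view-wise loadings, zeroing the blocks where S says "inactive".
V = np.vstack([rng.standard_normal((p, r)) * S[i] for i, p in enumerate(ps)])
U = np.linalg.qr(rng.standard_normal((n, r)))[0]   # orthonormal scores
X = U @ V.T + 0.1 * rng.standard_normal((n, sum(ps)))
```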

11 Proposed model.
$$X = [X_1 \; \cdots \; X_d] = U V^\top + E.$$
(Figure: the decomposition drawn as blocks, with $X = [X_1 \; X_2 \; X_3] \in \mathbb{R}^{n \times (p_1+p_2+p_3)}$, $U \in \mathbb{R}^{n \times r}$, and $V^\top \in \mathbb{R}^{r \times (p_1+p_2+p_3)}$.)

12 Proposed model. Constrain $V = V(S)$ for a binary structure matrix $S$. (Figure: same block diagram with the $d \times r$ 0/1 pattern of $S$ shown alongside $X = [X_1 \; X_2 \; X_3] \in \mathbb{R}^{n \times (p_1+p_2+p_3)}$, $U \in \mathbb{R}^{n \times r}$, $V^\top \in \mathbb{R}^{r \times (p_1+p_2+p_3)}$.)

13 Proposed model: Structural Learning and Integrative Decomposition. SLIDE:
$$X = [X_1 \; \cdots \; X_d] = U V(S)^\top + E,$$
where
- $S \in \{0,1\}^{d \times r}$ is a binary structure matrix;
- $U \in \mathbb{R}^{n \times r}$ is a score matrix with orthogonal columns, $U^\top U = I$;
- $V = V(S) \in \mathbb{R}^{p \times r}$ is a loading matrix with the sparsity pattern given by $S$.

14 SLIDE model: existence and identifiability. Advantages: we prove that the SLIDE matrix decomposition always exists and is unique under certain conditions.

15 Model existence and identifiability. Advantages: we prove that the SLIDE matrix decomposition always exists and is unique under certain conditions. Number of possible models: given $d$ matched datasets and a maximal possible rank $r$, there exist $\binom{r + 2^d - 1}{2^d - 1}$ distinct binary structures for the SLIDE model (checked numerically below).
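A quick numerical check of the count, assuming the binomial form reconstructed above: $d$ views admit $2^d - 1$ distinct nonzero view-membership patterns per column, and choosing at most $r$ columns with repetition, order ignored, gives the binomial coefficient.

```python
# Check the structure count C(r + 2^d - 1, 2^d - 1).
from math import comb

def n_structures(d, r):
    k = 2 ** d - 1           # distinct nonzero binary column patterns for d views
    return comb(r + k, k)    # multisets of at most r columns from k patterns

print(n_structures(3, 7))    # 3432, matching the next slide
```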

16 SLIDE model. With $d = 3$ and $r = 7$, there are 3,432 possible structures $S$. (Figure: same block diagram, $X = [X_1 \; X_2 \; X_3] \in \mathbb{R}^{n \times (p_1+p_2+p_3)}$, $U \in \mathbb{R}^{n \times r}$, $V^\top \in \mathbb{R}^{r \times (p_1+p_2+p_3)}$.)

17 Structural Learning and Integrative Decomposition. SLIDE model: $X = [X_1 \; \cdots \; X_d] = U V(S)^\top + E$. Three questions:
1. How to reduce the number of structures $S$ under consideration from $\binom{r + 2^d - 1}{2^d - 1}$ to a small subset? (structural learning)
2. Given a sequence of $m$ distinct binary structures $S_1, \ldots, S_m$, how to choose the best one? (structural learning)
3. Given a binary structure $S \in \{0,1\}^{d \times r}$, how to fit the SLIDE model? (integrative decomposition)

18 Estimation: SLIDE workflow.
Step 1: Form a sequence of candidate structures $S_1, \ldots, S_m$.
Step 2: Apply the BCV procedure to select one structure from the sequence.
Step 3: Fit the SLIDE model for the selected structure $S$.

19 Step 1: generating a sequence of candidate structures. SLIDE: $X = [X_1 \; \cdots \; X_d] = U V(S)^\top + E$. Given a grid of $\lambda$ values, choose $S_1, \ldots, S_m$ based on the support of $V$ from
$$\min_{U, V} \; \sum_{i=1}^{d} \Big( \underbrace{\tfrac{1}{2}\, \| X_i - U V_i^\top \|_F^2}_{\text{loss function}} + \lambda \underbrace{\sum_{j=1}^{r} \| V_{ij} \|_2}_{\text{block-sparse penalty}} \Big) \quad \text{subject to } U^\top U = I.$$
Updates of $U$ and $V$ have closed form (a sketch follows). The tuning parameter $\lambda$ controls the sparsity level in $V$.
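A hedged sketch of those closed-form updates, as I read the criterion (a simplified reimplementation, not the authors' reference code): with $U$ fixed and orthonormal, each block $V_{ij}$ is a group soft-thresholding of $X_i^\top u_j$; with $V$ fixed, the $U$-update is an orthogonal Procrustes problem solved by an SVD.

```python
# Hedged sketch of Step 1's alternating minimization (simplified reading of
# the criterion, not the authors' reference implementation).
import numpy as np

def slide_step1(Xs, r, lam, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    X = np.hstack(Xs)                                   # n x (p_1 + ... + p_d)
    U = np.linalg.qr(rng.standard_normal((X.shape[0], r)))[0]
    Vs = [np.zeros((Xi.shape[1], r)) for Xi in Xs]
    for _ in range(n_iter):
        # V-update: since U^T U = I, each block V_ij minimizes
        # 0.5 ||v||^2 - (X_i^T u_j)^T v + lam ||v||_2 => group soft-thresholding.
        for Xi, Vi in zip(Xs, Vs):
            B = Xi.T @ U
            norms = np.maximum(np.linalg.norm(B, axis=0), 1e-12)
            Vi[:] = B * np.maximum(0.0, 1.0 - lam / norms)
        # U-update: orthogonal Procrustes, U = P Q^T where X V = P Sigma Q^T.
        P, _, Qt = np.linalg.svd(X @ np.vstack(Vs), full_matrices=False)
        U = P @ Qt
    # Candidate structure: which (view, component) blocks survived shrinkage.
    S = np.array([(np.linalg.norm(Vi, axis=0) > 0).astype(int) for Vi in Vs])
    return U, Vs, S
```

Running this over a decreasing grid of $\lambda$ values and collecting the distinct supports would yield the candidate sequence $S_1, \ldots, S_m$.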

20 Estimation: SLIDE workflow (recap).
Step 1: Form a sequence of candidate structures $S_1, \ldots, S_m$.
Step 2: Apply the BCV procedure to select one structure from the sequence.
Step 3: Fit the SLIDE model for the selected structure $S$.

21 Step 2: selecting the structure $S$ from the fixed set. Adapt the bi-cross-validation (BCV) approach (Owen and Perry, 2009). Split the rows and each view's columns into two folds, so that each view is partitioned as
$$X_i = \begin{pmatrix} X_i^{11} & X_i^{12} \\ X_i^{21} & X_i^{22} \end{pmatrix}.$$
Suppose $X^{11} = [X_1^{11} \cdots X_d^{11}]$ is held out. Fit the SLIDE model with structure $S$ to $X^{22} = [X_1^{22} \cdots X_d^{22}]$ to find $\hat{U}$ and $\hat{V}$, then evaluate the prediction error on $X^{11}$ (a sketch of one fold follows).
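A hedged sketch of one BCV fold in the Owen and Perry style. The talk does not spell out the prediction rule, so the $\hat{X}^{11} = X^{12} (\hat{L}^{22})^{+} X^{21}$ step below is the standard BCV predictor, with the SLIDE fit on $X^{22}$ supplying the low-rank estimate $\hat{L}^{22}$:

```python
# Hedged sketch of one BCV fold (Owen & Perry style; the exact prediction rule
# used by SLIDE may differ).
import numpy as np

def bcv_error(Xs, fit_lowrank, row1, col1s):
    """Xs: list of views. fit_lowrank: fits the model with structure S on the
    list of X22 blocks and returns the low-rank estimate U_hat V_hat^T.
    row1: boolean mask of held-out rows. col1s: per-view masks of held-out columns."""
    X11 = np.hstack([Xi[row1][:, c] for Xi, c in zip(Xs, col1s)])
    X12 = np.hstack([Xi[row1][:, ~c] for Xi, c in zip(Xs, col1s)])
    X21 = np.hstack([Xi[~row1][:, c] for Xi, c in zip(Xs, col1s)])
    X22s = [Xi[~row1][:, ~c] for Xi, c in zip(Xs, col1s)]
    L22 = fit_lowrank(X22s)                       # SLIDE fit on the kept block
    X11_hat = X12 @ np.linalg.pinv(L22) @ X21     # standard BCV prediction
    return np.sum((X11 - X11_hat) ** 2) / X11.size
```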

22 Estimation: SLIDE workflow (recap).
Step 1: Form a sequence of candidate structures $S_1, \ldots, S_m$.
Step 2: Apply the BCV procedure to select one structure from the sequence.
Step 3: Fit the SLIDE model for the selected structure $S$.

23 Step 3: fitting the SLIDE model given structure $S$. SLIDE: $X = [X_1 \; \cdots \; X_d] = U V(S)^\top + E$. Given $S \in \{0,1\}^{d \times r}$, fit the SLIDE model by solving
$$\min_{U, V} \; \| X - U V^\top \|_F^2 \quad \text{subject to } U^\top U = I, \; V = V(S).$$
Updates of $U$ and $V$ have closed form (sketched below).
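A hedged sketch of the alternating closed-form updates for this constrained problem (my reading, not the reference code): the $V$-update is unconstrained least squares projected onto the support $S$, and the $U$-update is again an orthogonal Procrustes step.

```python
# Hedged sketch of Step 3: fit X ~ U V^T with U^T U = I and supp(V) given by S.
import numpy as np

def slide_fit(Xs, S, n_iter=100, seed=0):
    X = np.hstack(Xs)
    r = S.shape[1]
    rng = np.random.default_rng(seed)
    U = np.linalg.qr(rng.standard_normal((X.shape[0], r)))[0]
    # Expand the d x r structure S to a (p_1+...+p_d) x r 0/1 mask.
    mask = np.repeat(S, [Xi.shape[1] for Xi in Xs], axis=0)
    for _ in range(n_iter):
        V = (X.T @ U) * mask                # least squares given U, projected on S
        P, _, Qt = np.linalg.svd(X @ V, full_matrices=False)
        U = P @ Qt                          # orthogonal Procrustes step
    return U, V
```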

24 Application to TCGA BRCA data. $n = 348$ primary tumor samples with matched measurements of
1. gene expression ($p_1 = 645$)
2. DNA methylation ($p_2 = 574$)
3. miRNA expression ($p_3 = 423$)
4. reverse-phase protein array ($p_4 = 171$)
Goals: exploratory analysis and consensus clustering.

25 Comparison with JIVE. Ranks and percentage of variance explained:

          SLIDE           JIVE            Dimension
GE        14 (47.89%)     34 (61.72%)     n × 645
ME        10 (42.56%)     31 (47.87%)     n × 574
miRNA     19 (57.50%)     30 (48.20%)     n × 423
RPPA      20 (63.55%)     22 (52.04%)     n × 171
Total     50 (52.88%)     108 (52.46%)    n × 1813

Both SLIDE and JIVE identify 3 globally-shared components. SLIDE additionally identifies 3 partially-shared components: one for (GE, ME, miRNA), one for (GE, ME), and one for (GE, miRNA).
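For reference, one plausible way such percentages are computed (an assumption; the talk does not define them): the squared Frobenius norm of the fitted low-rank part of each view over the view's total.

```python
# Sketch (assumed definition): % variance of view i explained by its fitted
# low-rank structure U V_i^T.
import numpy as np

def pct_var_explained(Xi, U, Vi):
    return 100.0 * np.linalg.norm(U @ Vi.T) ** 2 / np.linalg.norm(Xi) ** 2
```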

26 Application to TCGA BRCA data. Hierarchical clustering using the 6 scores from SLIDE. (Figure: overall survival probability versus years for the subgroups induced by SLIDE; log-rank test p-value = 0.029; group sizes: Group 1: 115, Group 2: 65, Group 3: 94, Group 4: ...)

27 Summary. SLIDE highlights:
- structured sparsity within low-rank matrix decompositions
- justification of model existence and identifiability
- application to breast cancer data from TCGA reveals superior clustering compared to an existing prominent method (JIVE)

28 Thank you!

29 SLIDE model existence and identifiability. Let $\mathcal{B}_d$ be the set of distinct binary vectors $b_k \in \{0,1\}^d$.
Existence: For any given signal matrix, there exist $S$, $U$, $V$ such that the SLIDE decomposition holds. Moreover, for each $b_k \in \mathcal{B}_d$, the corresponding nonzero columns of the dataset-specific loadings $V_i$ are linearly independent.
Uniqueness: If the nonzero columns of each $V_i$ are linearly independent, then $S$ is unique. If, for each $b_k \in \mathcal{B}_d$, the corresponding columns of $V$ are orthogonal with distinct norms, then $U$ and $V$ are also unique.
