Mid-year Report: Linear and Non-linear Dimensionality Reduction Applied to Gene Expression Data of Cancer Tissue Samples


Mid-year Report: Linear and Non-linear Dimensionality Reduction Applied to Gene Expression Data of Cancer Tissue Samples. Franck Olivier Ndjakou Njeunje, Applied Mathematics, Statistics, and Scientific Computation. Advisers: Wojtek Czaja, John J. Benedetto. Norbert Wiener Center for Harmonic Analysis and Applications, Department of Mathematics, University of Maryland - College Park. December 2, 2014. 1 / 26

Outline 2 / 26


DNA Microarray [2] Gene expression matrix 3 / 26


Project goal: In this project I would like to demonstrate the effectiveness of non-linear versus linear algorithms in capturing biologically relevant structures in cancer expression datasets. I will be using clustering analysis and the Rand index as tools to measure the preservation of structure between the original and the reduced datasets. 4 / 26


I am considering the following methods. 1. Dimension reduction algorithms 1,2: LDR: Principal Component Analysis (PCA) [1]; NDR: Laplacian Eigenmap (LE) [3]. 2. Clustering algorithms: K-means (KM); Hierarchical clustering (HC). 1 Jinlong Shi, Zhigang Luo, Nonlinear dimensionality reduction. 2 Mikhail Belkin, Partha Niyogi, Laplacian Eigenmaps. 5 / 26
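A minimal sketch of the two reduction methods listed above, run on a synthetic stand-in for the gene expression matrix. These are not the tools used in the project (the slides use the Matlab DR Toolbox and a custom PCA); the shape of X, the number of neighbors, and the use of scikit-learn's SpectralEmbedding (an implementation of Laplacian eigenmaps) are all assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import SpectralEmbedding

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2000))   # hypothetical: 60 samples x 2000 genes

# Linear dimension reduction (PCA) down to 2 components.
X_pca = PCA(n_components=2).fit_transform(X)

# Non-linear dimension reduction: Laplacian Eigenmaps via spectral embedding
# of a nearest-neighbor graph.
X_le = SpectralEmbedding(n_components=2, n_neighbors=10).fit_transform(X)

print(X_pca.shape, X_le.shape)    # (60, 2) (60, 2)
```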

Step 1: Compute the standardized matrix $\tilde{X}$ of the original matrix $X = (x_1, x_2, \ldots, x_M)$:
$$\tilde{X} = (\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_M) \qquad (1)$$
$$= \left( \frac{x_1 - \bar{x}_1}{\sqrt{\sigma_{11}}}, \frac{x_2 - \bar{x}_2}{\sqrt{\sigma_{22}}}, \ldots, \frac{x_M - \bar{x}_M}{\sqrt{\sigma_{MM}}} \right). \qquad (2)$$
Here $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_M$ and $\sigma_{11}, \sigma_{22}, \ldots, \sigma_{MM}$ are respectively the mean values and the variances of the corresponding variable vectors. 6 / 26

Step 2: Compute the covariance matrix of $\tilde{X}$, then perform a spectral decomposition to get the eigenvalues and their corresponding eigenvectors:
$$C = \tilde{X}^{\top} \tilde{X} = U \Lambda U^{\top}. \qquad (3)$$
Here $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_M)$ with $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_M$, and $U = (u_1, u_2, \ldots, u_M)$; $\lambda_i$ and $u_i$ are respectively the $i$th eigenvalue and corresponding eigenvector of the covariance matrix $C$. Step 3: Determine the number of principal components based on a preset criterion (e.g., the fraction of variance to retain). Supposing the number to be $m$, the $i$th principal component can be computed as $\tilde{X} u_i$, and the reduced-dimensional ($N \times m$) subspace is $\tilde{X} U_m$. 7 / 26
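A direct NumPy transcription of Steps 1-3 (standardize, eigendecompose the covariance matrix, project onto the top m eigenvectors). The data matrix and its shape are made up; this is a sketch of the procedure, not the project's Matlab implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))        # hypothetical: N = 100 samples, M = 20 variables

# Step 1: standardize each variable (column).
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: covariance matrix and its spectral decomposition (eigh: C is symmetric).
C = X_std.T @ X_std
eigvals, U = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]     # sort eigenpairs in descending order
eigvals, U = eigvals[order], U[:, order]

# Step 3: keep m components; the reduced (N x m) representation is X_std @ U_m.
m = 3
U_m = U[:, :m]
X_reduced = X_std @ U_m
print(X_reduced.shape)                # (100, 3)
```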

Illustration 8 / 26



Loading vectors: Principal components are, in short, linear combinations of the original variables making up the matrix X. The vectors carrying the coefficients of these combinations are known as the loading vectors. The first loading vector u_1 must be chosen so that the magnitude of the first principal component is maximized. 9 / 26


First loading vector: Rayleigh quotient.
$$u_1 = \arg\max_{\|u\|=1} \|y_1\|^2 \qquad (4)$$
$$= \arg\max_{\|u\|=1} \|Xu\|^2 \qquad (5)$$
$$= \arg\max_{\|u\|=1} \frac{u^{\top} X^{\top} X u}{u^{\top} u} \qquad (6)$$
$$= \arg\max_{\|u\|=1} \frac{u^{\top} C u}{u^{\top} u} \qquad (7)$$
The quantity being maximized is the well-known Rayleigh quotient for symmetric matrices. The solution to this optimization problem is known to be the eigenvector of C corresponding to the eigenvalue of largest magnitude. 10 / 26
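A quick numerical check of the claim above: among unit vectors u, the Rayleigh quotient of C = X^T X (equivalently ||Xu||^2) is maximized by the eigenvector of the largest eigenvalue. The data and the number of random trials are arbitrary choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 10))
C = X.T @ X

eigvals, U = np.linalg.eigh(C)
u1 = U[:, -1]                                   # eigenvector of the largest eigenvalue

# Compare against many random unit vectors: none should beat u1.
best_random = 0.0
for _ in range(10000):
    u = rng.normal(size=10)
    u /= np.linalg.norm(u)
    best_random = max(best_random, np.linalg.norm(X @ u) ** 2)

print(np.linalg.norm(X @ u1) ** 2, ">=", best_random)
```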

Remaining loading vectors: To find the remaining loading vectors $u_k$ for $k = 2, \ldots, m$, we will apply the same idea to the modified matrix $X_k$,
$$X_k = X - \sum_{i=1}^{k-1} X u_i u_i^{\top}, \qquad (8)$$
where all correlation with the previously found loading vectors has been removed. 11 / 26
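A sketch of the deflation step in Eq. (8): after extracting a loading vector, remove its contribution from X and take the top eigenvector of the deflated covariance matrix. The data are synthetic and assumed already standardized; the orthonormality check at the end is just a sanity test.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 8))

def top_eigvec(C):
    """Eigenvector of symmetric C with the largest eigenvalue."""
    vals, vecs = np.linalg.eigh(C)
    return vecs[:, -1]

loadings = []
X_k = X.copy()
for k in range(3):                         # first m = 3 loading vectors
    u = top_eigvec(X_k.T @ X_k)
    loadings.append(u)
    X_k = X_k - X_k @ np.outer(u, u)       # remove correlation with u, as in Eq. (8)

U_m = np.column_stack(loadings)
print(np.round(U_m.T @ U_m, 6))            # ~ identity: loading vectors are orthonormal
```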


Covariance matrix: With $C \in \mathbb{R}^{M \times M}$ as our symmetric covariance matrix, the set $\{u_j\}$, $j = 1, \ldots, M$, of unit eigenvectors of $C$ forms a basis of $\mathbb{R}^M$, and the corresponding eigenvalues $\{\lambda_j\}$ are such that $\lambda_1 > \lambda_2 > \ldots > \lambda_M$. So any vector $u^{(0)} \in \mathbb{R}^M$ can be written as
$$u^{(0)} = c_1 u_1 + c_2 u_2 + \ldots + c_M u_M \qquad (9)$$
for some $c_1, c_2, \ldots, c_M \in \mathbb{R}$. 12 / 26


[4] Assuming that $c_1 \neq 0$:
$$A u^{(0)} = c_1 \lambda_1 u_1 + c_2 \lambda_2 u_2 + \ldots + c_M \lambda_M u_M$$
$$A^k u^{(0)} = c_1 \lambda_1^k u_1 + c_2 \lambda_2^k u_2 + \ldots + c_M \lambda_M^k u_M$$
$$A^k u^{(0)} = \lambda_1^k \left( c_1 u_1 + c_2 \left(\tfrac{\lambda_2}{\lambda_1}\right)^k u_2 + \ldots + c_M \left(\tfrac{\lambda_M}{\lambda_1}\right)^k u_M \right).$$
So, as $k$ increases we get
$$u_1 \approx \frac{A^k u^{(0)}}{\|A^k u^{(0)}\|}. \qquad (10)$$
13 / 26

Algorithm 3
Pick a starting vector $u^{(0)}$ with $\|u^{(0)}\| = 1$.
While $\|u^{(k)} - u^{(k-1)}\| > 10^{-6}$:
  let $w = A u^{(k-1)}$,
  let $u^{(k)} = w / \|w\|$.
Note: The convergence rate depends on the magnitude of the second largest eigenvalue. 3 Gene H. Golub, Henk A. van der Vorst, Eigenvalue computation in the 20th century. 14 / 26
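A runnable version of the pseudocode above, applied to the covariance matrix of a synthetic data matrix. The tolerance 1e-6 comes from the slide; the maximum iteration cap, the random starting vector, and the final up-to-sign comparison against numpy's eigendecomposition are additions for this sketch.

```python
import numpy as np

def power_method(A, tol=1e-6, max_iter=10000):
    """Return the dominant eigenvector of a symmetric PSD matrix A."""
    rng = np.random.default_rng(0)
    u = rng.normal(size=A.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(max_iter):
        w = A @ u
        u_new = w / np.linalg.norm(w)
        if np.linalg.norm(u_new - u) < tol:
            return u_new
        u = u_new
    return u

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 15))
C = X.T @ X                              # symmetric, positive semidefinite

u1 = power_method(C)
reference = np.linalg.eigh(C)[1][:, -1]  # eigenvector of the largest eigenvalue
print(abs(u1 @ reference))               # ~ 1: same direction up to sign
```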

Data set [5]: The matrix X has dimension 2000 x 3. 15 / 26

Variability: This number reflects the amount of variance captured in the reduction. It is the percentage of the total magnitude of the eigenvalues corresponding to the eigenvectors (loading vectors) used. DR Toolbox PCA vs. My PCA: 90% variability. 16 / 26
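A sketch of how the variability figure can be computed: the share of the total eigenvalue magnitude carried by the m retained components. The data here are random, so the printed percentage will be far from the 90% / 99% values reported on the slides for the real data sets.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 50))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigenvalues of the covariance matrix, sorted in descending order.
eigvals = np.linalg.eigvalsh(X_std.T @ X_std)[::-1]

m = 2
variability = eigvals[:m].sum() / eigvals.sum()
print(f"{100 * variability:.1f}% of the variance captured by {m} components")
```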

Data set [5]: The matrix X has dimension 20000 x 3. 17 / 26

DR Toolbox PCA vs. My PCA: 99% variability. 18 / 26


[6] The Rand index is a measure of agreement between two data clusterings. Given a set of n elements S and two partitions X and Y of S, the Rand index r is given by
$$r = \frac{a + b}{\binom{n}{2}}, \qquad (11)$$
where a is the number of pairs of elements in S that are in the same set in X and in the same set in Y, and b is the number of pairs of elements in S that are in different sets in X and in different sets in Y. 19 / 26
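A direct implementation of Eq. (11): a counts pairs placed together in both partitions, b counts pairs separated in both, and C(n, 2) is the total number of pairs. The example labelings at the end are made up to show the call.

```python
from itertools import combinations
from math import comb

def rand_index(labels_x, labels_y):
    """Rand index between two labelings of the same n elements."""
    n = len(labels_x)
    a = b = 0
    for i, j in combinations(range(n), 2):
        same_x = labels_x[i] == labels_x[j]
        same_y = labels_y[i] == labels_y[j]
        if same_x and same_y:
            a += 1                      # together in both partitions
        elif not same_x and not same_y:
            b += 1                      # apart in both partitions
    return (a + b) / comb(n, 2)

print(rand_index([0, 0, 1, 1, 2], [1, 1, 0, 0, 0]))   # 0.8
```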

K-means Clustering: The matrix X has dimension 20002 x 60. Raw clustering labels vs. 2D labels: 83% agreement. 20 / 26
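A sketch of the comparison behind the agreement figure: k-means labels computed on the raw matrix versus on a 2-D PCA projection, scored with the Rand index. The data are synthetic blobs rather than the gene expression set, and scikit-learn's rand_score (available in scikit-learn >= 0.24) stands in for the formula above, so the printed agreement will not match 83%.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import rand_score

# Hypothetical stand-in data: 60 samples in a high-dimensional space.
X, _ = make_blobs(n_samples=60, n_features=2000, centers=3, random_state=0)

labels_raw = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

X_2d = PCA(n_components=2).fit_transform(X)
labels_2d = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)

print(f"agreement: {100 * rand_score(labels_raw, labels_2d):.0f}%")
```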

Hierarchical Clustering: The matrix X has dimension 20002 x 60. Raw clustering labels vs. 2D labels: 73% agreement. 21 / 26
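The same comparison with hierarchical clustering in place of k-means, sketched here with SciPy's Ward linkage cut into three clusters; the slides do not specify the linkage criterion or number of clusters, so those are assumptions, and the synthetic data will not reproduce the 73% figure.

```python
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import rand_score

X, _ = make_blobs(n_samples=60, n_features=2000, centers=3, random_state=0)

# Hierarchical clustering on the raw data and on the 2-D PCA projection.
labels_raw = fcluster(linkage(X, method="ward"), t=3, criterion="maxclust")
X_2d = PCA(n_components=2).fit_transform(X)
labels_2d = fcluster(linkage(X_2d, method="ward"), t=3, criterion="maxclust")

print(f"agreement: {100 * rand_score(labels_raw, labels_2d):.0f}%")
```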


Variability preserved within data 23 / 26



Conclusion. PCA: K-means vs Hierarchical, 83% vs 73% agreement. Laplacian Eigenmap: K-means vs Hierarchical, ??% vs ??% agreement. PCA vs Laplacian Eigenmap: ??% vs ??% agreement. 24 / 26

References: [1] Jinlong Shi, Zhigang Luo, Nonlinear dimensionality reduction of gene expression data for visualization and clustering analysis of cancer tissue samples. Computers in Biology and Medicine 40 (2010) 723-732. [2] Larssono, September 2007, Microarray-schema. Via Wikipedia, DNA microarray page. [3] Mikhail Belkin, Partha Niyogi, Laplacian Eigenmaps for Dimensionality Reduction and Data Representation. Neural Computation 15 (2003) 1373-1396. [4] Gene H. Golub, Henk A. van der Vorst, Eigenvalue computation in the 20th century. Journal of Computational and Applied Mathematics 123 (2000) 35-65. 25 / 26

References (continued): [5] Laurens van der Maaten, Delft University of Technology. Matlab Toolbox for Dimensionality Reduction (v0.8.1b), March 21, 2013. [6] W. M. Rand (1971), Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66 (336): 846-850. 26 / 26