Principal Components Analysis (PCA) and Singular Value Decomposition (SVD) with applications to Microarrays


Prof. Tesler, Math 283, Fall 2015

Covariance

Let X and Y be random variables, possibly dependent. Recall that the covariance of X and Y is defined as

    Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)]

and that an alternate formula is

    Cov(X, Y) = E(XY) − E(X)E(Y)

Previously we used

    Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)

and, for independent X_1, ..., X_n,

    Var(X_1 + X_2 + ··· + X_n) = Var(X_1) + ··· + Var(X_n)

Covariance properties

Covariance properties:
    Cov(X, X) = Var(X)
    Cov(X, Y) = Cov(Y, X)
    Cov(aX + b, cY + d) = ac Cov(X, Y)

Sign of covariance: Cov(X, Y) = E((X − µ_X)(Y − µ_Y))

When Cov(X, Y) is positive: there is a tendency to have X > µ_X when Y > µ_Y and vice-versa, and X < µ_X when Y < µ_Y and vice-versa.

When Cov(X, Y) is negative: there is a tendency to have X > µ_X when Y < µ_Y and vice-versa, and X < µ_X when Y > µ_Y and vice-versa.

When Cov(X, Y) = 0:
    a) X and Y might be independent, but it's not guaranteed.
    b) Var(X + Y) = Var(X) + Var(Y)

Sample variance

Variance of a random variable:
    σ² = Var(X) = E((X − µ_X)²) = E(X²) − (E(X))²

Sample variance from data x_1, ..., x_n:
    s² = var(x) = (1/(n−1)) Σ_{i=1}^n (x_i − x̄)² = (1/(n−1)) Σ_{i=1}^n x_i² − (n/(n−1)) x̄²

Vector formula: centered data M = [ x_1 − x̄   x_2 − x̄   ···   x_n − x̄ ], so
    s² = (M · M)/(n−1) = M M′/(n−1)

Sample covariance

Covariance between random variables X, Y:
    σ_XY = Cov(X, Y) = E((X − µ_X)(Y − µ_Y)) = E(XY) − E(X)E(Y)

Sample covariance from data (x_1, y_1), ..., (x_n, y_n):
    s_XY = cov(x, y) = (1/(n−1)) Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) = (1/(n−1)) Σ_{i=1}^n x_i y_i − (n/(n−1)) x̄ ȳ

Vector formula:
    M_X = [ x_1 − x̄   x_2 − x̄   ···   x_n − x̄ ]
    M_Y = [ y_1 − ȳ   y_2 − ȳ   ···   y_n − ȳ ]
    s_XY = (M_X · M_Y)/(n−1) = M_X M_Y′/(n−1)
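As a quick numerical check of these vector formulas, here is a minimal numpy sketch (not part of the original slides; the data values are made up):

    import numpy as np

    x = np.array([3.1, 0.9, 9.9, 4.4, 6.7])
    y = np.array([12.9, 10.8, 8.8, 13.6, 9.4])
    n = len(x)

    Mx = x - x.mean()            # centered data M_X
    My = y - y.mean()            # centered data M_Y

    s2  = Mx @ Mx / (n - 1)      # sample variance   s^2  = (M_X . M_X)/(n-1)
    sxy = Mx @ My / (n - 1)      # sample covariance s_XY = (M_X . M_Y)/(n-1)

    # agrees with numpy's built-ins (ddof=1 gives the n-1 denominator)
    assert np.isclose(s2,  np.var(x, ddof=1))
    assert np.isclose(sxy, np.cov(x, y, ddof=1)[0, 1])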

Covariance matrix

For problems with many simultaneous random variables, put them into vectors:
    X = [ R ]      Y = [ T ]
        [ S ]          [ U ]
                       [ V ]
and then form a covariance matrix:
    Cov(X, Y) = [ Cov(R, T)  Cov(R, U)  Cov(R, V) ]
                [ Cov(S, T)  Cov(S, U)  Cov(S, V) ]

In matrix/vector notation,
    Cov(X, Y) = E[ (X − E(X)) (Y − E(Y))′ ]

Covariance matrix (a.k.a. variance-covariance matrix)

Often there's one vector with all the variables:
    X = [ R ]
        [ S ]
        [ T ]

    Cov(X) = Cov(X, X) = E[ (X − E(X)) (X − E(X))′ ]

           = [ Cov(R, R)  Cov(R, S)  Cov(R, T) ]     [ Var(R)     Cov(R, S)  Cov(R, T) ]
             [ Cov(S, R)  Cov(S, S)  Cov(S, T) ]  =  [ Cov(R, S)  Var(S)     Cov(S, T) ]
             [ Cov(T, R)  Cov(T, S)  Cov(T, T) ]     [ Cov(R, T)  Cov(S, T)  Var(T)    ]

The matrix is symmetric. The diagonal entries are ordinary variances.

Covariance matrix properties

    Cov(X, Y) = Cov(Y, X)′
    Cov(AX + B, Y) = A Cov(X, Y)
    Cov(X, CY + D) = Cov(X, Y) C′
    Cov(AX + B) = A Cov(X) A′
    Cov(X_1 + X_2, Y) = Cov(X_1, Y) + Cov(X_2, Y)
    Cov(X, Y_1 + Y_2) = Cov(X, Y_1) + Cov(X, Y_2)

A, C are constant matrices, B, D are constant vectors, and all dimensions must be correct for matrix arithmetic.

Example (2D, but works for higher dimensions too)

Data (x_1, y_1), ..., (x_100, y_100):
    M_0 = [ x_1 ··· x_100 ]  =  [  3.0858   0.8806  ···   9.8850   4.4106 ]
          [ y_1 ··· y_100 ]     [ 12.8562  10.7804  ···   8.7504  13.5627 ]

[Figure: scatter plot of the original data.]

Centered data

[Figure: scatter plots of the original data and of the centered data (means subtracted).]

Computing sample covariance matrix

Original data: 100 (x, y) points in a 2 × 100 matrix M_0:
    M_0 = [ x_1 ··· x_100 ]  =  [  3.0858   0.8806  ···   9.8850   4.4106 ]
          [ y_1 ··· y_100 ]     [ 12.8562  10.7804  ···   8.7504  13.5627 ]

Centered data: subtract x̄ from the x's and ȳ from the y's to get M; here x̄ = 5, ȳ = 10:
    M = [ −1.9142  −4.1194  ···   4.8850  −0.5894 ]
        [  2.8562   0.7804  ···  −1.2496   3.5627 ]

Sample covariance:
    C = M M′/(100 − 1) = [ 31.9702  16.5683 ]  =  [ s_XX  s_XY ]  =  [ s_X²  s_XY ]
                         [ 16.5683  13.0018 ]     [ s_YX  s_YY ]     [ s_XY  s_Y² ]
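The same computation in numpy, as a hedged sketch (synthetic data standing in for the 2 × 100 example above):

    import numpy as np

    rng = np.random.default_rng(0)
    M0 = rng.normal(size=(2, 100)) * [[5.0], [3.0]] + [[5.0], [10.0]]   # 2 x 100 data; rows = variables

    M = M0 - M0.mean(axis=1, keepdims=True)   # center each row
    n = M.shape[1]
    C = M @ M.T / (n - 1)                     # 2 x 2 sample covariance matrix

    # matches numpy's covariance routine (rows = variables, n-1 denominator by default)
    assert np.allclose(C, np.cov(M0))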

Orthonormal matrix

Recall that for vectors v, w, we have v · w = |v| |w| cos(θ), where θ is the angle between the vectors.

Orthogonal means perpendicular. v and w are orthogonal when the angle between them is θ = 90° = π/2 radians, so cos(θ) = 0 and v · w = 0.

Vectors v_1, ..., v_n are orthonormal when
    v_i · v_j = 0 for i ≠ j (different vectors are orthogonal)
    v_i · v_i = 1 for all i (each vector has length 1; they are all unit vectors)

In short: v_i · v_j = δ_ij = { 0 if i ≠ j; 1 if i = j }.

Orthonormal matrix

Form an n × n matrix of orthonormal vectors V = [ v_1 ··· v_n ] by loading the n-dimensional column vectors into the columns of V. Transpose it to convert the vectors to row vectors:
    V′ = [ v_1′ ]
         [ v_2′ ]
         [  ⋮   ]
         [ v_n′ ]

(V′V)_ij is the ith row of V′ dotted with the jth column of V:
    (V′V)_ij = v_i · v_j = δ_ij

Thus V′V = I (the n × n identity matrix), so V′ = V⁻¹.

An n × n matrix V is orthonormal when V′V = I (or equivalently, VV′ = I), where I is the n × n identity matrix.

Diagonalizing the sample covariance matrix C

C is a real-valued symmetric matrix. It can be shown that it can be diagonalized as C = VDV′, where V is orthonormal, so V⁻¹ = V′:

    C = VDV′:
    [ 31.9702  16.5683 ]  =  [ 0.8651  −0.5016 ] [ 41.5768  0      ] [  0.8651  0.5016 ]
    [ 16.5683  13.0018 ]     [ 0.5016   0.8651 ] [ 0        3.3952 ] [ −0.5016  0.8651 ]

Also, all eigenvalues are ≥ 0 (C is "positive semidefinite"):
    For all vectors w,  w′Cw = w′MM′w/(n−1) = ‖M′w‖²/(n−1) ≥ 0.
    The eigenvector equation Cw = λw gives w′Cw = λ w′w = λ‖w‖².
    So λ‖w‖² = w′Cw ≥ 0, giving λ ≥ 0.

It is conventional to put the eigenvalues on the diagonal of D in decreasing order: λ_1 ≥ λ_2 ≥ ··· ≥ 0.
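A sketch of this diagonalization in numpy (np.linalg.eigh is suited to symmetric matrices; it returns eigenvalues in increasing order, so they are flipped here to match the decreasing-order convention):

    import numpy as np

    C = np.array([[31.9702, 16.5683],
                  [16.5683, 13.0018]])

    evals, evecs = np.linalg.eigh(C)        # symmetric eigendecomposition, ascending eigenvalues
    order = np.argsort(evals)[::-1]         # re-sort into decreasing order
    D = np.diag(evals[order])               # lambda_1 >= lambda_2 >= ... >= 0 on the diagonal
    V = evecs[:, order]                     # columns of V are the corresponding eigenvectors

    assert np.allclose(V @ D @ V.T, C)          # C = V D V'
    assert np.allclose(V.T @ V, np.eye(2))      # V is orthonormal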

Principal axes

The columns of V are the right eigenvectors of C. Multiply each eigenvector by the square root of its eigenvalue to get the principal components.

    Eigenvalue    Eigenvector             PC
    41.5768       (0.8651, 0.5016)′       (5.5782, 3.2343)′
    3.3952        (−0.5016, 0.8651)′      (−0.9242, 1.5940)′

Put them into the columns of a matrix:
    P = V√D = [ 5.5782  −0.9242 ]
              [ 3.2343   1.5940 ]

    C = VDV′ = V√D √D V′ = (V√D)(V√D)′ = PP′

Principal axes

Plot the centered data with lines along the principal axes:

[Figure: centered data with the two principal axes drawn through the origin.]

The sum of squared perpendicular distances of the data points to the first PC line (red) is minimum among all lines through the origin.

The ith PC is perpendicular to the previous ones, and the sum of squared perpendicular distances to the span (line, plane, ...) of the first i PCs is minimum among all i-dimensional subspaces through the origin.

Rotate axes

Transform M to M_2 = V′M and plot the points given by the columns of M_2:

[Figure: centered data with principal axes (left); the same data after rotation to the new principal axes (right).]

From linear algebra, a linear transformation M ↦ AM does a combination of rotations, reflections, scaling, shearing, and orthogonal projections. V′ is orthonormal, so M_2 = V′M rotates/reflects all the data. M = VM_2 recovers the centered data M from the rotated data M_2.

The sample covariance matrix of M_2 is
    M_2 M_2′/(n−1) = (V′M)(V′M)′/(n−1) = V′MM′V/(n−1) = V′ (MM′/(n−1)) V = V′CV = D

Note C = VDV′ and V′ = V⁻¹, so D = V⁻¹C(V′)⁻¹ = V′CV.
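Continuing in numpy, a small sketch that rotates centered data by V′ and confirms the rotated data's covariance is the diagonal matrix D (synthetic data; illustrative only):

    import numpy as np

    rng = np.random.default_rng(1)
    M = rng.normal(size=(2, 100))
    M = M - M.mean(axis=1, keepdims=True)    # centered data
    n = M.shape[1]

    C = M @ M.T / (n - 1)
    evals, V = np.linalg.eigh(C)
    V = V[:, ::-1]                           # columns ordered by decreasing eigenvalue

    M2 = V.T @ M                             # rotated data
    C2 = M2 @ M2.T / (n - 1)                 # covariance of the rotated data

    assert np.allclose(C2, V.T @ C @ V)                 # equals V'CV = D
    assert np.allclose(C2 - np.diag(np.diag(C2)), 0)    # ... which is diagonal
    assert np.allclose(V @ M2, M)                       # rotating back recovers M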

New coordinates

The rotated data has new coordinates (t_1, u_1), ..., (t_100, u_100) and covariance matrix D:
    V′CV = D = [ var(T)     cov(T, U) ]  =  [ 41.5768  0      ]
               [ cov(T, U)  var(U)    ]     [ 0        3.3952 ]

In D, the total variance is var(T) + var(U) = 44.9720. Note that this is the sum of the eigenvalues, λ_1 + λ_2 + ···.

The trace of a matrix is the sum of its diagonal entries. So the total variance is Tr(D) = λ_1 + λ_2 + ···.

General linear algebra fact: Tr(X) = Tr(AXA⁻¹). So Tr(C) = Tr(VDV′) = Tr(VDV⁻¹) = Tr(D). Below, Tr(C) = Tr(D) = 44.9720.

    C = [ 31.9702  16.5683 ]      D = [ 41.5768  0      ]
        [ 16.5683  13.0018 ]          [ 0        3.3952 ]

Part of variance explained by each axis

The part of the variance explained by each axis is λ_i / (total variance):

    Eigenvector            Eigenvalue    Explained
    (0.8651, 0.5016)′      41.5768       41.5768/44.9720 = 92.45%
    (−0.5016, 0.8651)′     3.3952         3.3952/44.9720 =  7.55%
    Total                  44.9720       100%

This is an application of Cov(AX) = A Cov(X) A′ with A = V′:
    Cov(V′ [X, Y]′) = V′ Cov([X, Y]′) V

Dimension reduction

To clean up noise, set all u_i = 0 and rotate back:
    V [ t_1  t_2  t_3  ··· ]  =  [ x̃_1  x̃_2  x̃_3  ··· ]
      [ 0    0    0    ··· ]     [ ỹ_1  ỹ_2  ỹ_3  ··· ]

[Figure: original centered data projected onto the 1st PC line.]

Dimension reduction

Say we want to keep enough information to explain 90% of the variance.

Take enough top PCs to explain 90% of the variance. Let M_3 be M_2 (the rotated data) with the remaining coordinates zeroed out. Rotate it back to the original axes with VM_3.

In other applications, a dominant signal can be suppressed by zeroing out the coordinates for the top PCs instead of the bottom PCs.
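A minimal sketch of this recipe in numpy (the function name and threshold handling are mine, not from the slides): keep the top PCs that explain at least 90% of the variance, zero out the remaining rotated coordinates, and rotate back.

    import numpy as np

    def reduce_dimension(M0, frac=0.90):
        """Project centered data (rows = variables) onto the top PCs explaining >= frac of the variance."""
        M = M0 - M0.mean(axis=1, keepdims=True)       # center each row
        n = M.shape[1]
        C = M @ M.T / (n - 1)                         # sample covariance matrix
        evals, V = np.linalg.eigh(C)
        order = np.argsort(evals)[::-1]               # decreasing eigenvalue order
        evals, V = evals[order], V[:, order]

        k = np.searchsorted(np.cumsum(evals) / evals.sum(), frac) + 1   # number of PCs to keep
        M2 = V.T @ M                                  # rotate to principal axes
        M3 = M2.copy()
        M3[k:, :] = 0                                 # zero out the bottom coordinates
        return V @ M3                                 # rotate back to the original axes

    M0 = np.random.default_rng(2).normal(size=(5, 200))
    M_reduced = reduce_dimension(M0)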

Variations for PCA (and SVD, upcoming)

Some people reverse the roles of rows and columns of M.

In some applications, M is centered (subtract off row means) and in others it's not.

If the ranges of the variables (rows) are very different, the data might be rescaled in each row to make the ranges similar. For example, replace each row by its Z-scores.

Sensitivity to scaling

PCA was originally designed for measurements in ordinary space, so all axes had the same units (e.g., cm or inches) and equivalent results would be obtained no matter what units were used.

It's problematic to mix physical quantities with different units:
    Length of (a, b) in (seconds, mm): √(a² + b²)  (adding sec² plus mm² is not legitimate!)
    Convert to (hours, miles): (a/3600, b/1609344), with length √(a²/3600² + b²/1609344²).
    Angles are also distorted by this unit conversion: arctan(b/a) vs. arctan((b/1609344)/(a/3600)).
    ‖(0 °C, 0 °C)‖ = 0 vs. ‖(32 °F, 32 °F)‖ = 32√2. Both °C and °F use an arbitrary zero instead of absolute zero.

PCA is sensitive to differences in the scale, offset, and ranges of the variables. Rescaling one row without the others changes angles and lengths nonuniformly, and changes the eigenvalues and eigenvectors in an inequivalent way. This is typically addressed by replacing each row with Z-scores.

Microarrays

Before, we considered single genes where red or green (positive or negative expression level) distinguished the classes.

If x_i is the expression level of gene i, then L = a_1 x_1 + a_2 x_2 + ··· is a linear combination of genes.

Next up: a method that finds linear combinations of genes where L > C and L < C distinguish two classes, for some constant C. So L = C is a line / plane / etc. that splits the multidimensional space of expression levels.

Different classes are not always separated in this fashion. In some situations, nonlinear relations may be required.

Microarrays

Consider an experiment with 80 microarrays, each with 10000 spots. M is 10000 × 80.

C = MM′/(80 − 1) is 10000 × 10000! M′M is only 80 × 80.

We will see that MM′ has the same 80 eigenvalues as M′M, plus an additional 10000 − 80 = 9920 eigenvalues equal to 0.

Some of the 80 eigenvalues of M′M may also be 0. For centered data, all row sums of M are 0, so [1, ..., 1]′ is an eigenvector of M′M with eigenvalue 0.

We will see we can work with the smaller of MM′ or M′M.

Singular Value Decomposition (SVD)

Let M be a p × q matrix (not necessarily centered). The singular value decomposition of M is M = USV′, where
    U is orthonormal, p × p;
    V is orthonormal, q × q;
    S is a diagonal p × q matrix with s_1 ≥ s_2 ≥ ··· ≥ 0.

If M is 5 × 3, this would look like
    M  =  U (5 × 5)  ·  S  ·  V′ (3 × 3),    with S = [ s_1  0    0   ]
                                                      [ 0    s_2  0   ]
                                                      [ 0    0    s_3 ]
                                                      [ 0    0    0   ]
                                                      [ 0    0    0   ]

Compact SVD

For p > q: the bottom p − q rows of S are all 0. Remove them: keep only the first q rows of S and the first q columns of U. Then
    U is p × q with orthonormal columns;
    V is orthonormal, q × q;
    S is a diagonal q × q matrix with s_1 ≥ s_2 ≥ ··· ≥ 0.

If M is 5 × 3, this would look like
    M  =  U (5 × 3)  ·  S  ·  V′ (3 × 3),    with S = [ s_1  0    0   ]
                                                      [ 0    s_2  0   ]
                                                      [ 0    0    s_3 ]

For q > p: keep only the first p columns of S and the first p rows of V′.

Matlab and R have options for the full or compact form in svd(M).
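In numpy (rather than Matlab or R), the full and compact forms can be requested with the full_matrices flag; a small sketch, with np.linalg.svd returning V′ as Vt and the singular values as a 1-D array:

    import numpy as np

    M = np.random.default_rng(3).normal(size=(5, 3))        # p = 5, q = 3

    # Full form: U is 5x5, S is 5x3 (bottom two rows zero), Vt is 3x3
    U, s, Vt = np.linalg.svd(M, full_matrices=True)
    S = np.zeros((5, 3))
    S[:3, :3] = np.diag(s)
    assert np.allclose(U @ S @ Vt, M)

    # Compact form: U is 5x3 with orthonormal columns, S is 3x3
    Uc, sc, Vtc = np.linalg.svd(M, full_matrices=False)
    assert np.allclose(Uc @ np.diag(sc) @ Vtc, M)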

Computing the SVD

    M′M = (VS′U′)(USV′) = V(S′S)V′ = V [ s_1²  0     0    ] V′
                                       [ 0     s_2²  0    ]
                                       [ 0     0     s_3² ]

    MM′ = (USV′)(VS′U′) = U(SS′)U′ = U [ s_1²  0     0     0  0 ] U′
                                       [ 0     s_2²  0     0  0 ]
                                       [ 0     0     s_3²  0  0 ]
                                       [ 0     0     0     0  0 ]
                                       [ 0     0     0     0  0 ]

This diagonalization of M′M and MM′ shows they have the same eigenvalues up to the dimension of the smaller matrix. The larger matrix has all additional eigenvalues equal to 0.

Compute the SVD using whichever gives smaller dimensions!

Computing the SVD

First method (recommended when p ≥ q):
    Diagonalize M′M = VDV′.
    Compute the p × q matrix S with S_ii = √(D_ii) and 0's elsewhere.
    The pseudoinverse S⁻¹ of S: replace each nonzero diagonal entry of S by its reciprocal, and transpose, to get a q × p matrix.
    Compute U = MVS⁻¹.

The case q ≥ p is analogous: diagonalize MM′ = UDU′; compute S from D; then compute V = (S⁻¹U′M)′.

Use svd(M) in both Matlab and R.
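A sketch of the first method (the p ≥ q case) in numpy, building the SVD from the eigendecomposition of the smaller matrix M′M; this is for illustration, and it assumes all singular values are nonzero (in practice np.linalg.svd handles this directly):

    import numpy as np

    M = np.random.default_rng(4).normal(size=(1000, 6))   # p = 1000 >> q = 6

    # Diagonalize the small q x q matrix M'M = V D V'
    D, V = np.linalg.eigh(M.T @ M)
    order = np.argsort(D)[::-1]                # decreasing order
    D, V = D[order], V[:, order]

    s = np.sqrt(np.maximum(D, 0))              # singular values s_i = sqrt(D_ii)
    U = (M @ V) / s                            # U = M V S^{-1}, compact form (divides column i by s_i)

    assert np.allclose(U @ np.diag(s) @ V.T, M)     # M = U S V'
    assert np.allclose(U.T @ U, np.eye(6))          # columns of U are orthonormal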

Singular values and singular vectors

Let M be a p × q matrix (not necessarily centered). Suppose
    s is a scalar,
    v is a q × 1 unit vector (column vector),
    u is a p × 1 unit vector (column vector).

s is a singular value of M with right singular vector v and left singular vector u if
    Mv = su   and   u′M = sv′   (the same as M′u = sv).

Break U and V into columns: U = [ u_1  u_2  ···  u_p ], V = [ v_1  v_2  ···  v_q ].
Then Mv_i = s_i u_i and M′u_i = s_i v_i for i up to min(p, q).
If p > q: M′u_i = 0 for i > q. If q > p: Mv_i = 0 for i > p.

To get the full-sized M = USV′ from the compact form (p > q case): choose the remaining columns of U from the nullspace of M′ in such a way that the columns of U form an orthonormal basis of R^p.

Relation between PCA and SVD

Previous computation for PCA:
    Start with the centered data matrix M (n columns).
    Compute the covariance matrix, diagonalize it, and compute P:
        C = MM′/(n−1) = VDV′ = PP′   where P = V√D

Computing PCA using SVD:
    In terms of the SVD factorization M = USV′, the covariance is
        C = MM′/(n−1) = (USV′)(VS′U′)/(n−1) = U(SS′)U′/(n−1) = UDU′   where D = SS′/(n−1)
          = PP′   where P = US/√(n−1)
    The variance for the ith component is s_i²/(n−1).

Note: there were minor notation adjustments to deal with n−1.
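The two routes can be compared numerically; a hedged sketch with synthetic centered data:

    import numpy as np

    rng = np.random.default_rng(5)
    M = rng.normal(size=(4, 50))
    M = M - M.mean(axis=1, keepdims=True)      # centered data matrix, n = 50 columns
    n = M.shape[1]

    # Route 1: diagonalize the covariance matrix
    C = M @ M.T / (n - 1)
    evals = np.linalg.eigh(C)[0][::-1]         # eigenvalues in decreasing order

    # Route 2: SVD of the data matrix itself
    U, s, Vt = np.linalg.svd(M, full_matrices=False)

    # The eigenvalues of C are s_i^2/(n-1), the variances of the components,
    # and C = U D U' with D = S S'/(n-1)
    assert np.allclose(evals, s**2 / (n - 1))
    assert np.allclose(C, U @ np.diag(s**2 / (n - 1)) @ U.T)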

SVD in microarrays

Nielsen et al.¹ studied tumors in six types of tissue: 41 tissue samples and 46 microarray slides.

They switched microarray platforms in the middle of the experiment:
    The first 26 slides have 22,654 spots (22K).
    The next 20 slides have 42,611 spots (42K) (mostly a superset).
    Five of the samples were done on both the 22K and 42K platforms.

7425 spots were common to both platforms, had good signal across all slides, and had sample variance above a certain threshold. So M is 7425 × 46.

¹ Molecular characterisation of soft tissue tumours: a gene expression study, Lancet (2002) 359: 1301–1307.

SVD in microarrays

[Figure: the compact form M = USV′ of the microarray data, shown as heat maps with a color scale from negative through 0 to positive. Nielsen et al., supplementary material: http://genome-www.stanford.edu/sarcoma/supplemental_data.shtml]

They call the columns of U eigenarrays and the columns of V (rows of V′) eigengenes. An eigenarray is a linear combination of arrays. An eigengene is a linear combination of genes.

SVD in microarrays

Sample covariance matrix: C = MM′/45 = U(SS′)U′/45.

Sample variance of the ith component: s_i²/45. Total sample variance: (s_1² + ··· + s_46²)/45.

[Figure: V′ and the explained fractions s_i²/(s_1² + ··· + s_46²). Nielsen et al., supplementary material.]

Expression level of eigengenes

The expression level of gene i on array j is M_ij.

Interpretation of the change of basis S = U′MV: the ith eigenarray only detects the ith eigengene, and has 0 response to the other eigengenes.

Interpretation of V′ = S⁻¹U′M: the expression level of eigengene i on array j is (V′)_ij = V_ji.

Let m represent a new array (e.g., a column vector of expression levels in each gene). The expression level of eigengene i for this array is (U′m)_i / s_i.
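As a numerical check of those interpretations, a small numpy sketch with synthetic data (the sizes here are placeholders, not the 7425 × 46 Nielsen et al. matrix):

    import numpy as np

    rng = np.random.default_rng(6)
    M = rng.normal(size=(200, 10))                 # genes x arrays (stand-in data)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)

    # Expression levels of the eigengenes on each array: V' = S^{-1} U' M
    assert np.allclose(np.diag(1 / s) @ U.T @ M, Vt)

    # For a new array m (column of gene expression levels), eigengene i's level is (U'm)_i / s_i
    m = rng.normal(size=200)
    eigengene_levels = (U.T @ m) / s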

Platform bias

They re-ordered the arrays according to the expression level V_j1 of the first eigengene (largest eigenvalue).

They found that V_j1 tends to be larger in the 42K arrays and smaller in the 22K arrays.

This is an experimental artifact, not a property of the specimens under study.

[Figure: Nielsen et al., supplementary material.]

Removing 22K vs. 42K array bias

Let S̃ be S with the (1,1) entry replaced by 0. Let M̃ = US̃V′.

This reduces the signal and variance in many spots. After removing weak spots, they cut down to 5520 spots, giving a 5520 × 46 data matrix.
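A sketch of this filtering step in numpy (synthetic data of a smaller size; the idea is just to zero the first singular value and reconstruct):

    import numpy as np

    rng = np.random.default_rng(7)
    M = rng.normal(size=(300, 46))                 # stand-in for the 7425 x 46 data matrix
    U, s, Vt = np.linalg.svd(M, full_matrices=False)

    s_tilde = s.copy()
    s_tilde[0] = 0                                 # zero out the dominant (platform-bias) component
    M_tilde = U @ np.diag(s_tilde) @ Vt            # M~ = U S~ V', with reduced signal and variance

    # What was removed is exactly the rank-1 piece s_1 u_1 v_1'
    assert np.allclose(M - M_tilde, s[0] * np.outer(U[:, 0], Vt[0, :]))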

Classification Eigengenes 1D

For the 5520 × 46 matrix, the expression levels of the top three eigengenes can be used to classify some tumor types.

[Figure: the 46 arrays ordered by expression level of eigengene 1 (left) and eigengene 2 (right) of the 5520-spot data, labeled by sample ID and tumor type (SynSarc, MFH, LEIO, LIPO, LIPO/MYX, GIST, Schwannoma).]

Classification Eigengenes 2D

λ_1, λ_2, λ_3 help distinguish between tumor types.

[Figure: scatter plots of the 46 arrays using pairs of the top three eigengenes as coordinates, colored by tumor type (SynSarc, GIST, LEIO, LIPO, MFH, Schwannoma).]