Singular Value Decomposition and Principal Component Analysis (PCA) I

Prof. Ned Wingreen, MOL 410/510

Microarray review. Data per array: ~10,000 genes, with two intensities I_i^(green) and I_i^(red) measured for each gene i. With ~100 arrays, this yields 1,000,000+ data points! The expression matrix X has entries

    X_ij = log_2( I_ij^(green) / I_ij^(red) ),

where the i-th row g_i of the m rows is the transcriptional response of gene i, and the j-th column a_j of the n columns is the expression profile of assay j. So X is an m × n matrix, with m ~ 10,000 rows and n ~ 100 columns.

Example: assays taken every 10 minutes after addition of some compound (nutrient, poison, signal molecule). How do we characterize the response of the genes?

[Figure 1: Noisy genes -- log_2 I(t)/I(0) vs. time t (minutes) for individual genes.]

For any individual gene, it may be impossible to see the signal through the noise. Individual genes may respond with a superposition of different patterns and noise. How do we extract the important patterns from the noise? We will use SVD and

PCA to analyze the expression matrix X. But first we need to learn (or review) some linear algebra, so we can manipulate matrices like X.

Linear Algebra

Matrices. An m × n matrix is

    A = [ a_11  a_12  ...  a_1n
          a_21  a_22  ...  a_2n
          ...
          a_m1  a_m2  ...  a_mn ],

and its transpose A^T is the n × m matrix with (A^T)_ij = a_ji, i.e. the transpose interchanges columns and rows. If m = n, then A is called square.

Matrix multiplication. If A is an m × n matrix and B is an n × l matrix, then AB is an m × l matrix with

    (AB)_ij = Σ_{k=1}^{n} a_ik b_kj.

We can think of this as dot products of vectors: with the rows a_i of A and the columns b_j of B,

    (AB)_ij = a_i · b_j = a_i^T b_j.

Identity matrix. The square matrix I with 1s on the diagonal and 0s elsewhere is the identity matrix, because AI = A and IA = A for all matrices A.
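As a quick numerical check of these definitions (a minimal numpy sketch; the matrices here are arbitrary examples, not from the lecture):

```python
import numpy as np

# An m x n matrix A (m=2, n=3) and an n x l matrix B (n=3, l=2); values are arbitrary.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 2.0]])

# (AB)_ij = sum_k a_ik b_kj: the i-th row of A dotted with the j-th column of B.
AB = A @ B
assert AB.shape == (2, 2)
assert np.isclose(AB[0, 1], A[0, :] @ B[:, 1])

# The transpose interchanges rows and columns: (A^T)_ij = a_ji.
assert A.T[2, 0] == A[0, 2]

# The identity matrix leaves any compatible matrix unchanged: AI = A and IA = A.
assert np.allclose(A @ np.eye(3), A)
assert np.allclose(np.eye(2) @ A, A)
```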

Inverse of a square matrix. The inverse A^{-1} satisfies A A^{-1} = A^{-1} A = I. Note that (A^{-1})^{-1} = A. Finding the inverse of a matrix is slow in practice. In general,

    A^{-1} = (1/|A|) (C_ij)^T = (1/|A|) [ C_11  C_21  ...
                                          C_12  C_22  ...
                                          ...           ],

where |A| = det A. Therefore A^{-1} is 1/det A times the transpose of the cofactor matrix (a.k.a. the adjoint of A). The cofactor is C_ij = (-1)^{i+j} M_ij, where the minor M_ij is the determinant of the submatrix obtained from A by deleting row i and column j.

Determinants of square matrices. Example, a 2 × 2 matrix:

    det [ a  b ;  c  d ] = ad - bc.

The determinant of larger matrices can be defined by induction. Example, a 3 × 3 matrix (as in the cross-product rule):

    det [ i  j  k ;  a  b  c ;  d  e  f ] = i(bf - ce) - j(af - cd) + k(ae - bd).

More generally, expanding along row i,

    |A| = det A = Σ_{j=1}^{n} a_ij (-1)^{i+j} M_ij,

where as above M_ij is the determinant of the submatrix formed from A without row i and column j; expanding along the first row, for instance, |A| = a_11 M_11 - a_12 M_12 + ...

Example: inverse of a 2 × 2 matrix.

    A = [ a  b ;  c  d ],
    A^{-1} = (1/|A|) [ C_11  C_21 ;  C_12  C_22 ]
           = (1/|A|) [ M_11  -M_21 ;  -M_12  M_22 ]
           = (1/(ad - bc)) [ d  -b ;  -c  a ].
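Before the algebraic check below, here is a numerical sanity check of the 2 × 2 cofactor formula (a numpy sketch; the entries a, b, c, d are arbitrary):

```python
import numpy as np

a, b, c, d = 3.0, 1.0, 4.0, 2.0          # arbitrary entries with ad - bc != 0
A = np.array([[a, b],
              [c, d]])

det_A = a * d - b * c                     # determinant of a 2x2 matrix
adjugate = np.array([[d, -b],             # transpose of the cofactor matrix
                     [-c, a]])
A_inv = adjugate / det_A                  # A^{-1} = (1/det A) * adj(A)

# Agrees with the general-purpose inverse, and A A^{-1} = I.
assert np.allclose(A_inv, np.linalg.inv(A))
assert np.allclose(A @ A_inv, np.eye(2))
assert np.isclose(det_A, np.linalg.det(A))
```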

Check:

    A^{-1} A = (1/(ad - bc)) [ d  -b ;  -c  a ] [ a  b ;  c  d ]
             = (1/(ad - bc)) [ ad - bc  0 ;  0  ad - bc ]
             = [ 1  0 ;  0  1 ] = I.

Linear independence. Vectors v_1, ..., v_n are linearly independent if

    c_1 v_1 + c_2 v_2 + c_3 v_3 + ... + c_n v_n = 0   ⟹   c_1 = c_2 = ... = c_n = 0.

If n vectors are not linearly independent, then they span a subspace of dimension < n. E.g. two vectors v_1 and v_2 are linearly dependent if and only if they are multiples of each other, v_1 = a v_2; in this case v_1 and v_2 only span a 1-dimensional subspace.

[Figure 2: Linearly dependent vectors.]

Orthogonality. Two vectors are orthogonal if x · y = 0. For example,

    x = (1, 2),  y = (4, -2):   x · y = 1·4 + 2·(-2) = 0.

Orthogonal vectors are perpendicular.

[Figure 3: Orthogonal vectors x ⊥ y.]

Eigenvalues and eigenvectors of symmetric matrices. "Eigen" means characteristic, so these are also sometimes called characteristic values and vectors. Eigenvalues and eigenvectors are defined and related by the equation

    A v = λ v,

where v is an eigenvector and λ is an eigenvalue. (A consequence of this definition is that if v is an eigenvector, then so is c v.) To find eigenvalues, we use the theorem that (A - λI) v = 0 has a nonzero solution v if and only if

    det(A - λI) = 0.

This follows from a fact which is good to know: det X = Π_i λ_i. Similarly, tr X = Σ_i λ_i. We can therefore find eigenvalues by solving

    det(A - λI) = det [ a_11 - λ   a_12       ...
                        a_21       a_22 - λ   ...
                        ...                   a_nn - λ ] = 0,

which is an n-th order polynomial in λ with n roots, i.e. solutions for λ (these are either real or come in complex conjugate pairs). Once a λ_i is known, A v = λ_i v gives a set of linear equations which can be solved for the associated eigenvector v_i.

Diagonalization. If A is an n × n matrix with n distinct eigenvalues λ_1, λ_2, ..., λ_n and associated eigenvectors v_i, then let

    U = ( v_1  v_2  ...  v_n ),

where the eigenvectors v_i are the columns of U. Now

    AU = A( v_1, v_2, ..., v_n ) = ( A v_1, A v_2, ..., A v_n ) = ( λ_1 v_1, λ_2 v_2, ..., λ_n v_n ) = U diag(λ_1, λ_2, ..., λ_n) = UD,

where D is the diagonal matrix of eigenvalues. Manipulating the expression AU = UD yields the identity A = UDU^{-1}. This is the eigendecomposition theorem, and it is an example of a similarity transform. When A is a symmetric matrix, i.e. when A = A^T, it is possible to make the v's orthonormal, in which case U^{-1} = U^T, so that A = UDU^T.
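A small sketch of diagonalization in numpy (the symmetric test matrix is arbitrary; for a symmetric A the eigenvectors can be chosen orthonormal, so U^{-1} = U^T):

```python
import numpy as np

# An arbitrary symmetric matrix (A = A^T).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

eigvals, U = np.linalg.eigh(A)   # eigensolver for symmetric matrices
D = np.diag(eigvals)

# A U = U D, and the eigendecomposition A = U D U^T (since U^{-1} = U^T here).
assert np.allclose(A @ U, U @ D)
assert np.allclose(U @ D @ U.T, A)
assert np.allclose(U.T @ U, np.eye(3))       # columns of U are orthonormal

# det A = product of eigenvalues, tr A = sum of eigenvalues.
assert np.isclose(np.linalg.det(A), np.prod(eigvals))
assert np.isclose(np.trace(A), np.sum(eigvals))
```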

(Principal Component Analysis (PCA) is equivalent to diagonalization of a particular symmetric matrix: the covariance matrix.)

The eigendecomposition allows some nice results:

    A^2 = AA = (UDU^{-1})(UDU^{-1}) = UDDU^{-1} = UD^2U^{-1} = U diag(λ_1^2, λ_2^2, ..., λ_n^2) U^{-1},

and in the same way

    A^m = UD^mU^{-1} = U diag(λ_1^m, λ_2^m, ..., λ_n^m) U^{-1}.

So if we know the eigenvalues of A, we immediately know the eigenvalues of A^m. In particular,

    A^{-1} = (UDU^{-1})^{-1} = (U^{-1})^{-1} D^{-1} U^{-1} = UD^{-1}U^{-1}.

It is easy to check that

    A^{-1} A = (UD^{-1}U^{-1})(UDU^{-1}) = UD^{-1}DU^{-1} = UU^{-1} = I.

Therefore

    A^{-1} = U diag(1/λ_1, 1/λ_2, ..., 1/λ_n) U^{-1}.

Singular Value Decomposition (SVD)

[Figure 4: Singular Value Decomposition (SVD) -- A maps the unit vectors v_1, v_2 to the orthogonal vectors A v_1 and A v_2, with lengths s_1 and s_2.]

SVD is a generalization of diagonalization to non-symmetric (and non-square) matrices. If A is an m × n matrix, then there exist U, D, V such that A = UDV^T,

where U is an m × m matrix with orthonormal columns, V is an n × n matrix with orthonormal rows, and D is an m × n diagonal matrix whose only nonzero entries are the diagonal entries D_ii = S_i, ordered so that

    S_1 ≥ S_2 ≥ ... ≥ S_n ≥ 0.

These S_i are the singular values.
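A minimal numpy sketch of A = UDV^T for a non-square matrix (the matrix is random, chosen only to show the shapes involved):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.normal(size=(m, n))               # arbitrary m x n matrix

# full_matrices=True gives U (m x m) and V^T (n x n); s holds the n singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

D = np.zeros((m, n))                      # m x n "diagonal" matrix of singular values
D[:n, :n] = np.diag(s)

assert np.allclose(U @ D @ Vt, A)         # A = U D V^T
assert np.all(s[:-1] >= s[1:])            # S_1 >= S_2 >= ... >= S_n >= 0
assert np.allclose(U.T @ U, np.eye(m))    # orthonormal columns of U
assert np.allclose(Vt @ Vt.T, np.eye(n))  # orthonormal rows of V^T
```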

SVD and PCA II
Prof. Ned Wingreen, MOL 410/510

Let's see how we can first apply PCA and then SVD to extract information about gene regulation from a gene expression matrix. Recall the expression matrix X, where the i-th row g_i of the m rows is the transcriptional response of gene i, and the j-th column a_j of the n columns is the expression profile of assay j.

Consider the expression profile for each assay. This will be the vector a_k = (x_1k, x_2k, ..., x_mk). Each assay is therefore associated with a point in the m-dimensional space of genes.

[Figure 1: Gene space -- the assay points a_1, a_2, ... plotted in the (g_1, g_2) plane.]

The graph, of course, shows only 2 dimensions of an m-dimensional space, but one can immediately see that there is a correlation between the expression of genes g_1 and g_2. How can we find correlations among multiple genes? E.g. how would we know if the expression of genes g_1, g_2, g_37, and g_64 are all correlated? We can learn this from PCA and/or SVD.

First, center the data by subtracting out the mean value for each gene:

    y_ik = x_ik - x̄_i,   where   x̄_i = (1/n) Σ_{k=1}^{n} x_ik,

the average having been taken over the n assays. This yields the vector

    ã_k = a_k - x̄ = (y_1k, y_2k, ..., y_mk).
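In code, the gene-by-gene centering might look like the following sketch (numpy; X is random stand-in data, not real microarray values, and the sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 1000, 20                      # stand-in sizes: m genes, n assays
X = rng.normal(size=(m, n))          # stand-in for the log-ratio expression matrix

gene_means = X.mean(axis=1, keepdims=True)   # x_bar_i, averaged over the n assays
Y = X - gene_means                           # y_ik = x_ik - x_bar_i

# Each centered gene now has (numerically) zero mean across assays;
# the k-th column of Y is the centered assay vector a-tilde_k.
assert np.allclose(Y.mean(axis=1), 0.0)
a_tilde_3 = Y[:, 3]                          # a-tilde_k for assay k = 3, for example
```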

[Figure 2: Re-centered gene space.]

Notice in this example that even after subtracting out the means, the expression values for genes 1 and 2 are correlated over the set of assays. Biologically, this suggests that genes 1 and 2 are coregulated. How can we quantify this? In particular, how can we recognize whether many sets of genes are coregulated (or counter-regulated)? Graphically, we'd like to find the directions in gene space (weighted combinations of genes) that best capture the observed correlations in the data. How do we do this mathematically, allowing for many correlated genes? Answer: these directions are the principal components of a particular matrix, the covariance matrix. With n assays, this takes the form

    C_ij = (1/n) Σ_{k=1}^{n} y_ik y_jk = (1/n) Σ_{k=1}^{n} (x_ik - x̄_i)(x_jk - x̄_j) = E[(x_i - x̄_i)(x_j - x̄_j)] = E[x_i x_j] - x̄_i x̄_j = cov(x_i, x_j).

From the covariance matrix we can also define

    cor(x_i, x_j) = cov(x_i, x_j) / (σ_i σ_j),

the correlation coefficient of x_i and x_j (often called r), which satisfies |r| ≤ 1. If x_i and x_j are independent,

    cov(x_i, x_j) = E[x_i x_j] - x̄_i x̄_j = x̄_i x̄_j - x̄_i x̄_j = 0.

Two other easy relations are

    cov(x_i, x_i) = E[(x_i - x̄_i)^2] = σ_i^2      (cor(x_i, x_i) = r = 1),
    cov(x_i, -x_i) = -σ_i^2                        (cor(x_i, -x_i) = r = -1).

The covariance matrix is symmetric: cov(x_i, x_j) = cov(x_j, x_i), e.g. cov(x_1, x_2) = cov(x_2, x_1).
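A sketch of the covariance and correlation computations in numpy, using the lecture's (1/n) normalization (the data are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 50, 20                        # small stand-in: m genes, n assays
X = rng.normal(size=(m, n))
Y = X - X.mean(axis=1, keepdims=True)

C = (Y @ Y.T) / n                    # C_ij = (1/n) sum_k y_ik y_jk = cov(x_i, x_j)

assert np.allclose(C, C.T)           # the covariance matrix is symmetric
assert np.allclose(np.diag(C), (Y**2).mean(axis=1))   # cov(x_i, x_i) = sigma_i^2

# Correlation coefficients r = cov(x_i, x_j) / (sigma_i sigma_j), with |r| <= 1.
sigma = np.sqrt(np.diag(C))
R = C / np.outer(sigma, sigma)
assert np.all(np.abs(R) <= 1.0 + 1e-12)
assert np.allclose(np.diag(R), 1.0)  # cor(x_i, x_i) = 1
```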

Diagonalizing the covariance matrix is called a Principal Component Analysis (PCA):

    cov = UDU^T,   U = ( u_1  u_2  ...  u_m ),   D = diag(λ_1, λ_2, ..., λ_m),

where λ_1 ≥ λ_2 ≥ ... ≥ λ_m ≥ 0. The u's are orthonormal eigenvectors, called the principal component vectors, and the λ's, the eigenvalues of the covariance matrix, are called the principal component values. The eigenvalues give the variance of the data along the principal component directions, and the covariance between these directions is zero. So u_1 gives the direction in which the data is most stretched, and λ_1 is the variance of the data in this direction. Notice that

    total variance of data = Σ_i cov(x_i, x_i) = tr cov = Σ_k λ_k,

where we've used the fact that the trace of a square matrix is the sum of its eigenvalues.

In terms of gene expression, u_1 gives the weighted set of genes that are most strongly coregulated (or counter-regulated!), u_2 gives the second strongest set, etc. A plot of the eigenvalues λ_i in order from largest to smallest may reveal a transition from signal to noise:

[Figure 3: Eigenvalue plot -- fraction of total variance per component, falling from a few large "signal" components to a flat "noise" floor.]

Dimension reduction. If the PCA separates signal from noise, we can clean up the data by considering the projection of the data onto the principal components. For each assay a_k = (x_1k, x_2k, ..., x_mk) = ã_k + x̄,

we can find the projection of ã_k onto the principal components:

    ỹ_k1 = u_1 · ã_k,
    ỹ_k2 = u_2 · ã_k,
    ...

[Figure 4: Values of ỹ -- the projections ỹ_1, ..., ỹ_5 of the assays onto the leading principal component direction in the (g_1, g_2) plane.]

Keeping only the first few values of ỹ for each assay accounts for most of the signal in the data. Plotting the assay data projected onto the first few principal components can also reveal correlations beyond linear ones. Sometimes ỹ_1 and ỹ_2 will show no additional correlations (Figure 5), but sometimes correlations will be apparent even though ⟨ỹ_1 ỹ_2⟩ = 0 (Figure 6).

[Figure 5: Uncorrelated -- scatter of ỹ_2 vs. ỹ_1 with no visible structure.]

Other applications of PCA in biology:
- molecular dynamics: reconstructing flexible modes of biomolecules from snapshots
- synthetic lethality
- immunology: antigen-antibody interactions
- residue-residue potentials
- almost any large data set
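Putting the last few steps together, a PCA sketch in numpy (random stand-in data; the variable names are mine, not the lecture's):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 50, 20                                  # m genes, n assays (stand-in sizes)
X = rng.normal(size=(m, n))
x_bar = X.mean(axis=1, keepdims=True)
Y = X - x_bar                                  # centered data; columns are a-tilde_k

C = (Y @ Y.T) / n                              # gene-gene covariance matrix
eigvals, U = np.linalg.eigh(C)                 # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]              # sort so lambda_1 >= lambda_2 >= ...
eigvals, U = eigvals[order], U[:, order]

# Total variance = trace of C = sum of eigenvalues.
assert np.isclose(np.trace(C), eigvals.sum())

# Fraction of variance per component (what the eigenvalue plot shows).
variance_fraction = eigvals / eigvals.sum()

# Projections of each centered assay onto the first two principal components:
# y_tilde[k, 0] = u_1 . a-tilde_k and y_tilde[k, 1] = u_2 . a-tilde_k.
y_tilde = Y.T @ U[:, :2]
print(variance_fraction[:5], y_tilde.shape)    # shape (n, 2)
```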

[Figure 6: Correlated -- scatter of ỹ_2 vs. ỹ_1 showing structure even though the linear correlation vanishes.]

SVD. By summing over assays to produce the covariance matrix, we have thrown away information. Imagine that the assays were taken at fixed time intervals: how can we recapture the time dependence of the gene expression? Any one gene may have a noisy time course of expression:

[Figure 7: Single gene expression -- a noisy time course x(t).]

But maybe there is a coherent response that contributes a little bit to many genes (Figure 8). This coherent signal will contribute to the correlation of many genes, and so these genes will co-vary. So these genes will form a principal component in the PCA of the covariance matrix. So after performing a PCA, imagine putting time labels back on the data (Figure 9): by finding the principal components and plotting the projections on these components for each assay vs. time, we reconstruct the coherent signal hidden in the noisy data. SVD allows us to do this in one step. For the m × n matrix X,

    X = UDV^T = ( u_1  u_2  ...  u_m ) diag(S_1, S_2, ..., S_n) ( v_1  v_2  ...  v_n )^T,   S_1 ≥ S_2 ≥ ... ≥ S_n ≥ 0.

In this decomposition, the u's represent correlated sets of genes and the v's represent coherent patterns of gene expression.
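A sketch of this one-step SVD route in numpy. The data are synthetic: a coherent time pattern is planted across many genes plus noise, so the planted pattern and all parameter values are illustrative assumptions, not the lecture's data:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 200, 30                                    # m genes, n time-ordered assays
t = np.arange(n)
pattern = np.sin(2 * np.pi * t / n)               # a planted coherent time course
weights = rng.normal(size=m)                      # each gene carries some share of it
X = np.outer(weights, pattern) + rng.normal(size=(m, n))   # signal + per-gene noise

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Columns of U: weighted sets of genes; rows of Vt: the corresponding
# expression patterns across assays (here, across time).
leading_gene_set = U[:, 0]
leading_time_pattern = Vt[0, :]

# The leading right singular vector should resemble the planted time course
# (up to sign and scale); plotting it against t recovers the coherent signal.
corr = np.corrcoef(leading_time_pattern, pattern)[0, 1]
print("correlation with planted pattern:", round(abs(corr), 3))
```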

[Figure 8: Overall x signal -- a coherent time course x(t) to which many genes contribute slightly positively and many genes slightly negatively.]

[Figure 9: Relabeled data -- the assay points in the (g_1, g_2) plane labeled by time point 1, 2, ..., 7.]

The singular values S are simply related to the principal component values of the covariance matrix: λ_k = S_k^2. The matrix constructed from the leading l singular values and the associated right {v} and left {u} singular vectors is the best rank-l approximation to X; that is, choosing

    X^(l) = Σ_{k=1}^{l} u_k S_k v_k^T

minimizes

    Σ_{i,j} | x_ij - x^(l)_ij |^2.
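A numpy sketch of the rank-l reconstruction (random stand-in data; l = 2 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 20))                 # stand-in data matrix

U, s, Vt = np.linalg.svd(X, full_matrices=False)

l = 2                                          # keep the leading l singular values
X_l = U[:, :l] @ np.diag(s[:l]) @ Vt[:l, :]    # X^(l) = sum_{k<=l} u_k S_k v_k^T

# The squared reconstruction error equals the sum of the discarded squared
# singular values -- the smallest possible error for any rank-l matrix.
err = np.sum((X - X_l) ** 2)
assert np.isclose(err, np.sum(s[l:] ** 2))
print("rank-2 reconstruction error:", round(err, 3))
```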

[Figure 10: Explicit time plot -- the projection ỹ_1 plotted vs. time t, revealing the coherent signal.]