PCA: Principal Component Analysis


1 PCA: Principal Component Analysis. Lyron Winderbaum, University of Adelaide, January 29, 2015.

2 PCA is the vanilla flavour of Component Analysis. What is a component? What makes a component principal?

3 A Contrived Example. I'll make some data so that we can discuss these concepts in a concrete context:

    n <- 100  # sample size (the original value was lost in extraction; 100 is a placeholder)
    df_eg <- data.frame(x = rnorm(n, 10, 2.5))
    df_eg$y <- rnorm(n, df_eg$x, abs(2.5 - abs(df_eg$x - 10)))

PS: I made this presentation in knitr, and you're welcome to the source code if you would like to see how.

4 [Figure: scatterplot of y against x for the simulated data.]

5 What are Components? Components can be thought of in a number of (equivalent) ways: as rotations, as directions, as axes, as orthogonal projections, as linear combinations. It can be useful to understand that all of these concepts are equivalent. Forgive me if I switch between them.
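To make the linear-combination view concrete, here is a minimal sketch (my addition, not from the slides; the direction and point are made up for illustration):

    # A component is just a unit direction; a point's coordinate along it is a dot product.
    a <- c(2, 1) / sqrt(5)   # a unit vector: a direction, equivalently an axis
    x <- c(3, 4)             # an arbitrary point
    sum(a * x)               # orthogonal projection of x onto a, i.e. a linear combination of x's entries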

6 [Figure: the same scatterplot, with the red and blue axes overlaid.]

7 If we rotate into the red and blue axes:

    rot_red  <- function(x, y) { return(( 2/sqrt(5))*x + (1/sqrt(5))*y) }
    rot_blue <- function(x, y) { return((-1/sqrt(5))*x + (2/sqrt(5))*y) }
    df_eg$red  <- rot_red( df_eg$x, df_eg$y)
    df_eg$blue <- rot_blue(df_eg$x, df_eg$y)
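As a quick sanity check (my addition): the red and blue directions are orthonormal, so stacking them as the rows of a matrix R gives a rotation, with R R^T equal to the identity.

    # The two directions as rows of a rotation matrix; R %*% t(R) should be the identity.
    R <- rbind(c( 2, 1) / sqrt(5),
               c(-1, 2) / sqrt(5))
    round(R %*% t(R), 10)   # 2 x 2 identity, confirming the map is a rotation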

8 Red vs. Blue ("What, like a puma?") [Figure: scatterplot of blue against red.]

9 Red, sd = [value lost]. [Figure: histogram of the red coordinate.]

10 Blue, sd = [value lost]. [Figure: histogram of the blue coordinate.]

11 X, sd = [value lost]. [Figure: histogram of x.]

12 Y, sd = [value lost]. [Figure: histogram of y.]

13 What makes a component principal? Large variance (equivalently, large standard deviation).
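One way to see this on the example data (my sketch, assuming the df_eg built on the earlier slides): compare the standard deviation along each of the four directions; the direction with the largest spread is the most "principal".

    # Spread of the data along the original and the rotated axes.
    sapply(df_eg[, c("x", "y", "red", "blue")], sd)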

14 PCA

    pca_results <- prcomp(df_eg[, c('x', 'y')])
    summary(pca_results)
    ## Importance of components:
    ##                           PC1  PC2
    ## Standard deviation        ...  ...
    ## Proportion of Variance    ...  ...
    ## Cumulative Proportion     ...  ...

(The numeric values in the summary were lost in extraction.)
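For the record (not on the original slide), prcomp returns the loadings, standard deviations, and rotated coordinates directly:

    pca_results$rotation   # columns are the principal directions (loadings)
    pca_results$sdev       # standard deviation along each component
    head(pca_results$x)    # the data expressed in the PC1/PC2 basis (scores)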

15 [Figure: scatterplot of y against x, with the principal axes overlaid.]

16 [Figure: scatterplot of PC2 against PC1.]

17 Things to Note about PCA:
- Real datasets typically have more than 2 variables, and weird things happen in high-dimensional space. Consult a mathematician.
- Just because a component contains the most variability doesn't mean it contains the variability you are interested in. PCA is not a golden hammer.
- It is relatively standard to scale the variables before doing PCA (why, I do not entirely understand), but here I did not do that. Doing so can produce quite different results; see the sketch below. Ask me about Florian's DIGE data.
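A minimal sketch of the scaling point (my addition): prcomp's scale. argument standardizes each variable to unit variance first, which is equivalent to working with the correlation matrix rather than the covariance matrix.

    # PCA on standardized variables; compare with the unscaled summary above.
    pca_scaled <- prcomp(df_eg[, c("x", "y")], scale. = TRUE)
    summary(pca_scaled)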

18 [Figure from [1]: PCA of European genetic data, in which genes mirror geography.]

19 Pointers (URLs truncated in extraction): an-intuitive-explanation-of-pca.html, bsa501/full

20 Notation

$$X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_d \end{pmatrix}, \qquad X \sim (0, \Sigma)$$

$E[X] = 0$ without loss of generality, and $\Sigma = \operatorname{var}(X) = \operatorname{cov}(X, X) = E\left[XX^T\right]$. For the purposes of this talk, I will assume $\Sigma$ has rank $d$.

21 Eigen-Decomposition of $\Sigma$. $\Sigma$ is always symmetric and positive semi-definite, by definition, and hence diagonalizable by the spectral theorem. The assumption that $\Sigma$ has full rank $d$ gives us that it must be strictly positive definite, so

$$\Sigma = U \Lambda U^T \tag{1}$$

where $\Lambda$ is the diagonal matrix of the eigenvalues $\lambda_1 > \lambda_2 > \dots > \lambda_d$ (in decreasing order, without loss of generality by a permutation of $X$) and $U$ is the orthogonal matrix with the corresponding eigenvectors $u_1, u_2, \dots, u_d$ as columns.
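In R this is just eigen() (my sketch, reusing the df_eg example data as an assumption):

    # Eigen-decomposition of the sample covariance matrix: Sigma = U Lambda U^T.
    Sigma <- cov(df_eg[, c("x", "y")])
    e <- eigen(Sigma)                                # e$values = lambda_i (decreasing), e$vectors = U
    e$vectors %*% diag(e$values) %*% t(e$vectors)    # reconstructs Sigma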

22 PCA revolves around the vectors $v_1, \dots, v_d$, which I will define as:

$$v_1 = \arg\max_a \left\{ \operatorname{var}(a^T X) : \|a\| = 1 \right\}$$
$$v_2 = \arg\max_a \left\{ \operatorname{var}(a^T X) : \|a\| = 1,\ a^T v_1 = 0 \right\}$$
$$v_3 = \arg\max_a \left\{ \operatorname{var}(a^T X) : \|a\| = 1,\ a^T v_1 = a^T v_2 = 0 \right\}$$
$$\vdots$$
$$v_d = \arg\max_a \left\{ \operatorname{var}(a^T X) : \|a\| = 1,\ a^T v_1 = \dots = a^T v_{d-1} = 0 \right\}$$
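In two dimensions the constraint set is just the unit circle, so the first definition can be checked numerically (my sketch, assuming df_eg): parametrize $a = (\cos t, \sin t)$ and maximize the variance over $t$.

    # v_1 via direct maximization of var(a^T X) over unit vectors a = (cos t, sin t).
    f <- function(t) var(cos(t) * df_eg$x + sin(t) * df_eg$y)
    t_star <- optimize(f, c(0, pi), maximum = TRUE)$maximum
    c(cos(t_star), sin(t_star))   # should match prcomp's first loading up to sign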

23 Some groundwork - Eq. 2

$$\operatorname{cov}\left(a^T X, b^T X\right) = a^T \Sigma b, \qquad \forall\, a, b \in \mathbb{R}^d \tag{2}$$

$$\operatorname{cov}\left(a^T X, b^T X\right) = E\left[\left(a^T X\right)\left(b^T X\right)^T\right] = E\left[a^T X X^T b\right] = a^T E\left[XX^T\right] b = a^T \Sigma b$$

Note how $\operatorname{cov}\left(a^T X, a^T X\right) = \operatorname{var}\left(a^T X\right) = a^T \Sigma a$ is a special case of this.
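Eq. 2 holds for sample covariances too, which makes it easy to check numerically (my sketch; the vectors a and b are arbitrary):

    # Numerical check of Eq. 2 on the simulated data.
    Xmat <- as.matrix(df_eg[, c("x", "y")])
    a <- c(1, 2); b <- c(3, -1)
    cov(Xmat %*% a, Xmat %*% b)    # sample covariance of the two linear combinations
    t(a) %*% cov(Xmat) %*% b       # a^T S b -- the same number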

24 Some groundwork - Eq. 3. Consider

$$\operatorname{var}\left(U^T X\right) = E\left[U^T X X^T U\right] = U^T E\left[XX^T\right] U = U^T U \Lambda U^T U = \Lambda \qquad \text{(Eq. 1)}$$

$\operatorname{cov}(u_i^T X, u_j^T X)$ is the $(i,j)$-th entry of $\operatorname{var}\left(U^T X\right) = \Lambda$, so

$$\operatorname{cov}(u_i^T X, u_j^T X) = \begin{cases} \lambda_i & i = j \\ 0 & i \neq j \end{cases} \tag{3}$$
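Equivalently: rotating the data into the eigenvector basis diagonalizes its covariance. A quick check in R (my sketch, assuming Xmat and e from the earlier sketches):

    # Covariance of the rotated data is (numerically) diagonal, with the eigenvalues on the diagonal.
    round(cov(Xmat %*% e$vectors), 10)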

25 Some groundwork - Eq. 4. Any $a$ with $\|a\| = 1$ can be written

$$a = \sum_{i=1}^d c_i u_i \quad \text{s.t.} \quad \sum_{i=1}^d c_i^2 = 1 \tag{4}$$

as $u_1, u_2, \dots, u_d$ form an orthonormal basis for $\mathbb{R}^d$. Note that Eq. 4 can be expressed in matrix form:

$$a = Uc \quad \text{s.t.} \quad \|c\| = 1, \qquad \text{for } c = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_d \end{pmatrix}$$
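Since $U$ is orthogonal, the coefficients are just $c = U^T a$. A small check (my sketch, reusing e$vectors as $U$):

    # Coordinates of a unit vector a in the eigenvector basis; their squares sum to 1.
    a <- c(1, 1) / sqrt(2)
    c_coef <- t(e$vectors) %*% a   # c = U^T a
    sum(c_coef^2)                  # = 1, as in Eq. 4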

26 A Proof. So consider the first principal component,

$$v_1 = \arg\max_a \left\{ \operatorname{var}(a^T X) : \|a\| = 1 \right\}$$
$$= \arg\max_a \left\{ a^T \Sigma a : \|a\| = 1 \right\} \qquad \text{(Eq. 2)}$$
$$= \arg\max_{a = Uc} \left\{ \sum_{i=1}^d \sum_{j=1}^d c_i c_j\, u_i^T \Sigma u_j : \sum_{i=1}^d c_i^2 = 1 \right\} \qquad \text{(Eq. 4)}$$
$$= \arg\max_{a = Uc} \left\{ \sum_{i=1}^d c_i^2 \lambda_i : \sum_{i=1}^d c_i^2 = 1 \right\} \qquad \text{(Eq. 3)}$$
$$= \pm u_1$$

with the last step because $\lambda_1$ is the largest eigenvalue, so the maximum is attained at $c = \pm e_1$, i.e. $a = \pm u_1$.
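The result is easy to confirm numerically (my sketch): prcomp's first loading and the top eigenvector of the covariance matrix agree up to sign.

    # v_1 from prcomp vs. u_1 from eigen -- identical up to a sign flip.
    cbind(prcomp(df_eg[, c("x", "y")])$rotation[, 1],
          eigen(cov(df_eg[, c("x", "y")]))$vectors[, 1])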

27 A Proof. Similarly, $v_2 = \pm u_2$, $v_3 = \pm u_3$, $\dots$, $v_d = \pm u_d$.

28 Some Interesting Questions. What happens when $\Sigma$ has rank $r < d$? What happens if $\lambda_i = \lambda_{i+1}$ for some $i$? What happens in the sample case?

29 A brief overview of the sample case. In the sample case, suppose we have a $d \times n$ data matrix $X$ of $n$ observations on $d$ variables (without loss of generality, centered). Then we estimate the eigenvectors of the covariance matrix $\Sigma$ with the eigenvectors of the sample covariance matrix $S = XX^T$ (the usual $1/(n-1)$ factor does not affect the eigenvectors). Analogues of most of the population-case results then hold for this matrix.
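A sketch of the sample case on the running example (my addition; the 1/(n-1) factor is included only so that S matches R's cov):

    # Centered d x n data matrix; S agrees with R's sample covariance.
    Xc <- t(scale(df_eg[, c("x", "y")], center = TRUE, scale = FALSE))   # d x n
    S  <- Xc %*% t(Xc) / (ncol(Xc) - 1)
    all.equal(S, as.matrix(cov(df_eg[, c("x", "y")])), check.attributes = FALSE)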

30 Some rank properties: $\operatorname{rank}(X) \le \min(n, d)$ and $\operatorname{rank}(S) \le \operatorname{rank}(X) \le \min(n, d)$.

31 DNA microarrays - Gene Expression studies. In gene expression studies that use DNA microarrays, typically $n \ll d$, and so $\operatorname{rank}(S) \le \operatorname{rank}(X) \le \min(n, d) = n \ll d$.
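This rank deficiency is easy to demonstrate (my sketch, with made-up dimensions):

    # With n << d, the d x d matrix S = X X^T has rank at most n.
    set.seed(1)
    n <- 5; d <- 100
    Xs <- matrix(rnorm(d * n), nrow = d)   # d x n data matrix, n << d
    S  <- Xs %*% t(Xs)                     # 100 x 100, but rank <= 5
    qr(S)$rank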

32 References

[1] John Novembre, Toby Johnson, Katarzyna Bryc, Zoltan Kutalik, Adam R. Boyko, Adam Auton, Amit Indap, Karen S. King, Sven Bergmann, Matthew R. Nelson, Matthew Stephens, and Carlos D. Bustamante. Genes mirror geography within Europe. Nature, 456(7218):98–101, November 2008.
