Lecture 4: Principal Component Analysis and Linear Dimension Reduction


1 Lecture 4: Principal Component Analysis and Linear Dimension Reduction. Advanced Applied Multivariate Analysis, STAT 2221, Fall 2013. Sungkyu Jung, Department of Statistics, University of Pittsburgh.

2 Linear dimension reduction. For a random vector X ∈ R^p, consider reducing the dimension from p to d, i.e., from the p variables (X_1, ..., X_p) to a set of the d most interesting variables. Here, 1 ≤ d ≤ p. Best subset? Linear dimension reduction: construct d variables Z_1, ..., Z_d as linear combinations of X_1, ..., X_p, i.e., Z_i = a_{i1}X_1 + ... + a_{ip}X_p = a_i'X (i = 1, ..., d). Linear dimension reduction seeks a sequence of such Z_i, or equivalently a sequence of a_i, such that the random variables Z_i are the most important among all possible choices.

3 Principal Component Analysis. In linear dimension reduction, we require ||a_i|| = 1 and ⟨a_i, a_j⟩ = 0 (i ≠ j). Thus the problem is to find an interesting set of direction vectors {a_i : i = 1, ..., p}, where the projection scores onto a_i are useful. Principal Component Analysis (PCA) is a linear dimension reduction technique that gives a set of direction vectors of maximal (projected) variance. Take d = 1. PCA for the distribution of X finds a_1 such that a_1 = argmax_{a ∈ R^p, ||a|| = 1} Var(Z_1(a)), where Z_1(a) = a_1X_1 + ... + a_pX_p = a'X.

4 Geometric understanding of PCA for a point cloud. n (= 200) data points in 2D. [figure: scatterplot of the data in coordinates v_1, v_2] PCA is best understood with a point cloud. Take a look at a 2D example.

5 Geometric understanding of PCA for a point cloud. n (= 200) data points in 2D. Take a = (1, 0). [figures: data with the projection direction a, and the density of the projection scores; variance 0.79, N = 200]

6 Geometric understanding of PCA for a point cloud. n (= 200) data points in 2D. Take a = (1, 0), (0, 1), (1/√2, 1/√2). Which one is better? [figures: projection scores and their densities for the three choices of a; sample variances 0.79, 1.32, and 1.04, N = 200]

7 Geometric understanding of PCA for a 3D point cloud. A 3D point cloud and its mean. The 1st PC direction (maximizing the variance of projections) explains the cloud best. The 1st and 2nd PC directions form a plane.

8 Formulation of population PCA (1). Suppose X is a random vector with mean µ and covariance Σ (not necessarily normal). The first principal component (PC) direction vector is the unit vector u_1 ∈ R^p that maximizes the variance of u_1'X compared to all other unit vectors, i.e., u_1 = argmax_{u ∈ R^p, ||u|| = 1} Var(u'X). u_1 = (u_{11}, ..., u_{1p})' is the first PC direction vector, sometimes called the loading vector; (u_{11}, ..., u_{1p}) are the loadings of the 1st PC. Z_1 = u_{11}X_1 + ... + u_{1p}X_p = u_1'X is the first PC score, or the first principal component (a random variable). λ_1 = Var(Z_1) is the variance explained by the first PC score.

9 Formulation of population PCA (2). The second PC direction is the unit vector u_2 ∈ R^p that maximizes the variance of u_2'X and is orthogonal to the first PC direction u_1. That is, u_2 = argmax_{u ∈ R^p, ||u|| = 1, u'u_1 = 0} Var(u'X). u_2 = (u_{21}, ..., u_{2p})' is the second PC direction vector, and is the vector of the 2nd loadings. Z_2 = u_2'X is the second principal component. λ_2 = Var(Z_2) is the variance explained by the second PC score, and λ_1 ≥ λ_2. Corr(Z_1, Z_2) = 0.

10 Formulation of population PCA (3, 4, ..., p). Given the first k − 1 PC directions u_1, ..., u_{k−1}, the kth PC direction is the unit vector u_k ∈ R^p that maximizes the variance of u_k'X and is orthogonal to the first through (k − 1)th PC directions u_j. That is, u_k = argmax_{u ∈ R^p, ||u|| = 1, u'u_j = 0, j = 1, ..., k−1} Var(u'X). u_k = (u_{k1}, ..., u_{kp})' is the kth PC direction vector, and is the vector of the kth loadings. Z_k = u_k'X is the kth principal component. λ_k = Var(Z_k) is the variance explained by the kth PC score, and λ_1 ≥ ... ≥ λ_{k−1} ≥ λ_k. Corr(Z_i, Z_j) = 0 for all i ≠ j ≤ k.

11 Relation to the eigen-decomposition of Σ. Recall the eigen-decomposition of the symmetric positive definite Σ = UΛU', with U = [u_1, ..., u_p] orthogonal, Λ = diag(λ_1, ..., λ_p) with λ_1 ≥ ... ≥ λ_p, and Σu_i = λ_i u_i. In the next two slides we show that: (1) the kth eigenvector u_k is the kth PC direction vector; (2) the kth eigenvalue λ_k is the variance explained by the kth principal component; (3) PC directions are orthogonal, u_i'u_j = 0 (i ≠ j), and also Σ-orthogonal, u_i'Σu_j = 0, so that Cov(Z_i, Z_j) = 0 (i ≠ j).

12 Relation to the eigen-decomposition of Σ. The first PC direction problem is to maximize Var(u'X) subject to the constraint u'u = 1. Using a Lagrange multiplier λ, the maximization is equivalent to finding a stationary point of Φ(u, λ) = Var(u'X) − λ(u'u − 1) = u'Σu − λ(u'u − 1). The stationary point solves (1/2) ∂Φ(u, λ)/∂u = Σu − λu = 0, which leads to λ = u'Σu = Var(u'X) and Σu = λu. (1) The eigenvector–eigenvalue pairs (u_i, λ_i), i = 1, ..., p, satisfy (1). It is clear that the first PC direction is the first eigenvector u_1, as it gives the largest variance λ_1 = u_1'Σu_1 = Var(u_1'X) ≥ λ_j (j > 1).

13 Relation to the eigen-decomposition of Σ. For the kth PC direction, given the first k − 1 PC directions, we form the Lagrangian Φ(u, λ, γ_1, ..., γ_{k−1}) = u'Σu − λ(u'u − 1) − Σ_{j=1}^{k−1} 2γ_j u_j'u. The derivatives of Φ, equated to zero, are (1/2) ∂Φ/∂u = Σu − λu − Σ_{j=1}^{k−1} γ_j u_j = 0 and ∂Φ/∂γ_j = u_j'u = 0. (2) We have γ_j = u_j'Σu = 0 (since Σu_j = λ_j u_j), thus λ = u'Σu = Var(u'X) and Σu = λu. (3) The kth through last eigen-pairs (u_i, λ_i), i = k, ..., p, satisfy both (2) and (3). Thus the kth PC direction is u_k, as it gives the largest variance λ_k = u_k'Σu_k.

14 Example: principal components of a bivariate normal distribution. Consider N_2(0, Σ) with Σ = [2 1; 1 2] = [cos(π/4) −sin(π/4); sin(π/4) cos(π/4)] [3 0; 0 1] [cos(π/4) sin(π/4); −sin(π/4) cos(π/4)]. The PC directions u_i are the principal axes of the ellipsoids representing the density of the MVN, with the length of each axis determined by λ_i.
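As a quick numerical check of this example (a minimal sketch, not part of the original slides), the eigen-decomposition of Σ can be computed directly in R:
Sigma <- matrix(c(2, 1, 1, 2), nrow = 2)   # the covariance matrix above
eig <- eigen(Sigma)
eig$values    # 3 and 1: the PC variances lambda_1 and lambda_2
eig$vectors   # columns are u_1 and u_2, i.e., (1, 1)/sqrt(2) and (1, -1)/sqrt(2) up to sign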

15 Sample PCA. Given multivariate data X = [x_1, ..., x_n] (p × n), the sample PCA sequentially finds orthogonal directions of maximal (projected) sample variance. (1) For x̃_i = x_i − x̄, the centered data matrix is X̃ = [x̃_1, ..., x̃_n] = X(I_n − (1/n) 1_n 1_n'). (2) The sample covariance matrix is then S = (1/(n − 1)) X̃ X̃'. (3) The eigen-decomposition S = Û Λ̂ Û' gives the kth sample PC direction û_k and the variance λ̂_k of the kth sample scores.
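These three steps can be written out in a few lines of R (a minimal sketch, assuming x is an n × p data matrix, i.e., the transpose of the p × n convention above; object names are illustrative):
n <- nrow(x); p <- ncol(x)
xc <- scale(x, center = TRUE, scale = FALSE)   # centered data, n x p
S <- crossprod(xc) / (n - 1)                   # sample covariance matrix (= cov(x))
es <- eigen(S, symmetric = TRUE)
U_hat <- es$vectors                            # columns are the sample PC directions u_hat_k
lambda_hat <- es$values                        # variances lambda_hat_k of the sample PC scores
scores <- xc %*% U_hat                         # n x p matrix of PC scores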

16 Sample PCA. It can be checked that the eigenvector û_k of S satisfies û_k = argmax_{u ∈ R^p, ||u|| = 1, u'û_j = 0, j = 1, ..., k−1} Var({z_(k)i(u) : i = 1, ..., n}), where z_(k)i(u) = u'x̃_i (i = 1, ..., n). û_k = (û_{k1}, ..., û_{kp})' is the kth sample PC direction vector, and is the vector of the kth loadings. z_(k) = (z_(k)1, ..., z_(k)n) (1 × n) = û_k'X̃ is the kth score vector, where z_(k)i = û_k'x̃_i. λ̂_k is the sample variance of {z_(k)i : i = 1, ..., n}. The information in the first two principal components is contained in the score vectors (z_(1), z_(2)), whose 2D scatterplot is thus the most informative one, with the largest total variance.

17 Example: Swiss Bank Note Data. Swiss Bank Note data from HS; p = 6, n = 200. [figure: pairwise scatterplots of genuine (blue) and fake (red) samples in the original measurements V1–V6]

18 Example: Swiss Bank Note Data. [figure: pairwise scatterplots of the principal component scores Comp.1–Comp.6]

19 How many components to keep? (1) The criterion for PCA is high variance in the principal components, so the question is how much of the variance of the whole data the PCs explain. (1) The total variance in X is the sum of all marginal variances: Σ_{k=1}^p Var({x_ki : i = 1, ..., n}) = trace(S) = Σ_{k=1}^p Var({z_(k)i : i = 1, ..., n}) = Σ_{k=1}^p λ̂_k. (2) The variance of the kth scores: Var({z_(k)i : i = 1, ..., n}) = λ̂_k. (3) The total variance in the 1st through kth scores: λ̂_1 + ... + λ̂_k. For heuristics built from these quantities, see the next slide.

20 Example: Swiss Bank Note Data, scree plot. In the scree plot of (k, λ̂_k), we look for an elbow. In the cumulative scree plot of the proportion of variance explained, (k, Σ_{j=1}^k λ̂_j / Σ_{j=1}^p λ̂_j), use 90% as a cutoff. [figures: scree plot (λ̂_k vs. component) and cumulative scree plot (cumulative proportion in % vs. component)]
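Both plots can be drawn directly from the sample eigenvalues computed earlier (a sketch reusing the illustrative object lambda_hat):
prop <- lambda_hat / sum(lambda_hat)
plot(seq_along(lambda_hat), lambda_hat, type = "b",
     xlab = "component", ylab = "lambda", main = "Scree plot")
plot(seq_along(prop), 100 * cumsum(prop), type = "b",
     xlab = "component", ylab = "cumulative proportion (%)", main = "Cum. scree plot")
abline(h = 90, lty = 2)   # the 90% cutoff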

21 How many components to keep? (2) (1) Kaiser's rule (of thumb): retain PCs 1, ..., k satisfying λ̂_k > λ̄ = (1/p) Σ_{j=1}^p λ̂_j. Tends to choose fewer components. (2) Likelihood ratio testing of the null hypothesis H_0(k): λ_{k+1} = ... = λ_p, where k components are used if H_0(k) is not rejected at a specified level.
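Kaiser's rule and the 90% cutoff from the previous slide are one-liners on the sample eigenvalues (a sketch with the illustrative lambda_hat; the likelihood ratio test is not sketched here):
k_kaiser <- sum(lambda_hat > mean(lambda_hat))                   # Kaiser's rule
k_90 <- which(cumsum(lambda_hat) / sum(lambda_hat) >= 0.90)[1]   # 90% variance cutoff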

22 Which variables are most responsible for the principal components? The loadings of the principal component directions. Biplot: a scatterplot of the PC1 and PC2 scores, overlaid with p vectors, each representing the loadings of one variable on the first two PC directions. In the Swiss Bank Note Data, the loadings are: [table: loadings of V1–V6 on Comp.1–Comp.6]

23 Example: Swiss Bank Note Data, biplot. Which measurements are most responsible for the first two principal components? [figures: PC1 loadings, PC2 loadings, and the biplot of Comp.1 vs. Comp.2 with loading vectors for V1–V6]

24 Computation of PCA. PCA is computed either by the eigen-decomposition of S = (1/(n − 1)) X̃ X̃' or by the singular value decomposition of X̃. Eigen-decomposition of S: for S = UΛU', (1) the PC directions are u_k (eigenvectors); (2) the variances of the PC (scores) are λ_k (eigenvalues); (3) the matrix of principal component scores is Z = U'X̃, whose rows are the score vectors z_(1), ..., z_(p).

25 Computation of PCA: SVD of X̃. The singular value decomposition (SVD) of the p × n matrix X̃ has the form X̃ = UDV'. The left singular vectors U = [u_1, ..., u_p] (p × p) and the right singular vectors V = [v_1, ..., v_p] (n × p) are orthonormal (U'U = I_p, V'V = I_p). The columns of U span the column space of X̃; the columns of V span the row space. D = diag(d_1, ..., d_p), where d_1 ≥ d_2 ≥ ... ≥ d_p ≥ 0 are the singular values of X̃.

26 Computation of PCA: SVD of X̃. With the SVD X̃ = UDV', S = (1/(n − 1)) X̃ X̃' = (1/(n − 1)) UDV'VDU' = U diag(d_j²/(n − 1)) U'. (1) The PC directions are u_k (left singular vectors). (2) The variances of the PC (scores) are d_j²/(n − 1) (scaled squared singular values). (3) The matrix of principal component scores is given by the scaled right singular vectors: Z = U'X̃ = DV', whose rows are z_(j) = d_j v_j'.
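The SVD route can be checked against the eigen-decomposition route (a sketch; xc is the centered n × p matrix from before, so the left singular vectors of the p × n matrix X̃ appear here as svd(xc)$v):
sv <- svd(xc)                       # xc = sv$u %*% diag(sv$d) %*% t(sv$v)
U_svd <- sv$v                       # PC directions
lambda_svd <- sv$d^2 / (n - 1)      # variances of the PC scores
scores_svd <- sv$u %*% diag(sv$d)   # PC scores, equal to xc %*% sv$v
# Up to column sign flips, U_svd matches U_hat and lambda_svd matches lambda_hat.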

27 PCA in R. The standard data format is the n × p data frame or matrix x. To perform PCA by eigen-decomposition:
spr <- princomp(x)
U <- spr$loadings
L <- (spr$sdev)^2
Z <- spr$scores
To perform PCA by singular value decomposition:
gpr <- prcomp(x)
U <- gpr$rotation
L <- (gpr$sdev)^2
Z <- gpr$x

28 Correlation PCA. PCA is not scale invariant. The correlation matrix of a random vector X is given by R = D_Σ^{−1/2} Σ D_Σ^{−1/2}, where D_Σ is the p × p diagonal matrix consisting of the diagonal elements of Σ. Correlation PCA: the PC directions and the variances of the PC scores are obtained from the eigen-decomposition of R = U_R Λ_R U_R'. Preferred if the measurements are not commensurate (e.g., X_1 = household income, X_2 = years in school).
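In R, correlation PCA amounts to PCA of the standardized variables, e.g. (a minimal sketch):
spr_cor <- princomp(x, cor = TRUE)   # PCA of the correlation matrix R
# equivalently: prcomp(x, center = TRUE, scale. = TRUE)
U_R <- spr_cor$loadings
lambda_R <- (spr_cor$sdev)^2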

29 Olivetti Faces data. PCA for the Olivetti Faces data. Grayscale faces, 8-bit [0–255]; a few (10) images of each of several (40) different people; 400 images in total, of size 64 × 64. From the Olivetti database at AT&T. [figure: 4 sample faces, 64 × 64 pixels, each pixel having a value (darker to lighter)]

30 Images as data. [figure: sample 1, 64 × 64 pixels, each pixel having a value (darker to lighter)] An image is a matrix-valued datum. In the Olivetti Faces data, each matrix is of size 64 × 64, with pixel values in [0–255]. The matrix corresponding to one observation is vectorized (vec'd) by stacking its columns into one long vector of size d = 4096 = 64 × 64. So x_1 is a d × 1 vector corresponding to the first image. PCA is applied to the data matrix X = [x_1, ..., x_n].
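A sketch of the vectorization step, assuming the images are available as a list of 64 × 64 matrices (the object name imgs is hypothetical):
X <- sapply(imgs, as.vector)   # as.vector stacks the columns; X is 4096 x n
xbar <- rowMeans(X)            # the mean face as a 4096-vector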

31 Olivetti Faces data: major components. [figure: pairwise scatterplots of the scores on the first four PC directions (PC1–PC4)]

32 Olivetti Faces data: scree plots. [figures: scree plot (proportion of variance vs. component) and cumulative scree plot (cumulative proportion of variance vs. component)]

33 Olivetti Faces data: interpretation. Examine the modes of variation by walking along each PC direction from the mean, here at ±1 and ±2 standard deviations of Z_(k). [figure: modes of variation along PC 1 and PC 2]

34 Olivetti Faces data: interpretation (eigenfaces). PC walk: ±2 standard deviations along each PC direction (top: PC 1, middle: PC 2, bottom: PC 3). PC1: darker to lighter face. PC2: feminine to masculine face. PC3: oval to rectangular face.

35 Olivetti Faces data: interpretation (eigenfaces computation). Vectorized data in X = [x_1, ..., x_n], with mean x̄. Denote by (u_j, λ_j) the jth PC direction and PC variance. Walk along the PC j direction and examine the positions s = ±2, ±1, 0 by: (1) reconstruction at s: w_s = x̄ + s√λ_j u_j; (2) conversion to an image by reshaping the vector w_s into a matrix W_s.
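A sketch of this PC walk in R, assuming X and xbar as above and that U (PC directions in its columns) and lambda (PC variances) have been obtained from a PCA of X; all names are illustrative:
pc_walk_image <- function(j, s, xbar, U, lambda) {
  w <- xbar + s * sqrt(lambda[j]) * U[, j]   # reconstruction at s standard deviations along PC j
  matrix(w, nrow = 64, ncol = 64)            # reshape the 4096-vector back into a 64 x 64 image
}
W <- pc_walk_image(j = 1, s = 2, xbar, U, lambda)
image(W, col = grey(seq(0, 1, length.out = 256)))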

36 Olivetti Faces data: reconstruction of the original data. Representation of the original data: x_i = x̄ + Σ_{j=1}^p z_(j)i u_j (i = 1, ..., n). Approximation of the original observation x_i by the first m < p principal components: x̂_i = x̄ + Σ_{j=1}^m z_(j)i u_j. The larger m, the better the approximation x̂_i; the smaller m, the more succinct the dimension reduction of X.
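A sketch of the rank-m reconstruction of a single face, with the same illustrative names:
reconstruct <- function(i, m, X, xbar, U) {
  Z <- t(U) %*% (X - xbar)                       # PC scores of the centered data (columns = observations)
  xbar + U[, 1:m, drop = FALSE] %*% Z[1:m, i]    # x_hat_i using the first m PCs
}
xhat_5 <- reconstruct(i = 5, m = 50, X, xbar, U)   # compare with the original X[, 5]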

37 Reconstruction of the original face. Observation index i = 5 (the 5th face). From top left to bottom right: reconstructions using the mean and the first 1, 5, 10, 20, 50, 100, and 400 PCs. [figure]

38 Reconstruction of the original face. Another observation. From top left to bottom right: reconstructions using the mean and the first 1, 5, 10, 20, 50, 100, and 400 PCs. [figure]

39 Reconstruction of the original face. A third observation. From top left to bottom right: reconstructions using the mean and the first 1, 5, 10, 20, 50, 100, and 400 PCs. [figure]

40 Olivetti Faces data: reconstruction of the original face. Human eyes require more than 50 principal components to see a resemblance between x̂_i and x_i, corresponding to about 90% of the variance explained by the PCs. The decision on how many components to use is subjective and heuristic. Reconstruction by PCA is most useful when each datum is visually represented (rather than being just numbers), for example when the data objects are images, functions, or shapes.

41 Dimension reduction for two sets of random variables. When the distinction between explanatory and response variables is not so clear, an analysis dealing with the two sets of variables in a symmetric manner is desired. Examples: (1) the relationship between gene expressions and biological variables; (2) the relation between two sets of psychological tests, each with multidimensional measurements.

42 Example: nutrimouse dataset, a nutrition study in the mouse. Reference: Pascal Martin, Toxicology and Pharmacology Laboratory (French National Institute for Agronomic Research); pharmaco-moleculaire/acceuil.html. Obtained from the R package CCA. n = 40 mice, each with r = 120 gene expression levels associated with the nutrition problem, and s = 21 measurements of concentrations of hepatic fatty acids. X is 120 × 40, Y is 21 × 40.

43 Example: nutrimouse dataset, objective. [figure: XY correlation] Focus on the first 10 gene expression levels (the dimension r of the first set of variables is now 10). X is 10 × 40, Y is 21 × 40.

44 Example: nutrimouse dataset, objective. We are interested in dimension reduction of X and Y, while keeping the important associations between X and Y. Find a linear dimension reduction of X and Y using the cross-correlation matrix. [figures: X correlation, Y correlation, cross-correlation]

45 Canonical Correlation Analysis (CCA). CCA seeks to identify and quantify the associations between two sets of variables. Given two random vectors X ∈ R^r and Y ∈ R^s (r = s or r ≠ s), consider a linear dimension reduction of each of the two random vectors, ξ = g'X = g_1X_1 + ... + g_rX_r, ω = h'Y = h_1Y_1 + ... + h_sY_s. CCA finds the random variables (ξ, ω), or the projection vectors (g, h), that give maximal correlation between ξ and ω: Corr(ξ, ω) = Cov(ξ, ω) / √(Var(ξ) Var(ω)).

46 Population CCA (1). Denote Σ_11 = Cov(X), Σ_22 = Cov(Y), Σ_12 = Cov(X, Y), Σ_21 = Cov(Y, X). The first pair of projection vectors is (g_1, h_1) = argmax_{g ∈ R^r, h ∈ R^s} Corr(g'X, h'Y) (4) = argmax_{g ∈ R^r, h ∈ R^s} g'Σ_12 h / √(g'Σ_11 g · h'Σ_22 h). (5) (g_1, h_1) are the canonical correlation vectors. (ξ_1 = g_1'X, ω_1 = h_1'Y) are called the canonical variates. ρ_1 = Corr(ξ_1, ω_1) is the largest canonical correlation.

47 Population CCA (2, 3, ...). The kth pair of projection vectors, given (g_1, ..., g_{k−1}) and (h_1, ..., h_{k−1}), is (g_k, h_k) = argmax_{g ∈ R^r, h ∈ R^s, g'Σ_11 g_j = 0, h'Σ_22 h_j = 0, j = 1, ..., k−1} g'Σ_12 h / √(g'Σ_11 g · h'Σ_22 h). (6) (g_k, h_k) are canonical correlation vectors, and (ξ_k = g_k'X, ω_k = h_k'Y) are the kth canonical variates. ρ_k = Corr(ξ_k, ω_k) ≤ ρ_j, j = 1, ..., k−1. Corr(ξ_k, ξ_j) = 0 and Corr(ω_k, ω_j) = 0 for j = 1, ..., k−1. In general, g_i'g_j = 0 is NOT true.

48 Sample CCA. For a sample (x_i, y_i), i = 1, ..., n, denote S_11 = Ĉov(X), S_22 = Ĉov(Y), S_12 = Ĉov(X, Y), S_21 = Ĉov(Y, X). The sample CCA is (g_k, h_k) = argmax_{g ∈ R^r, h ∈ R^s, g'S_11 g_j = 0, h'S_22 h_j = 0, j = 1, ..., k−1} g'S_12 h / √(g'S_11 g · h'S_22 h). (7)

49 Finding the CCA solution. First CC vectors: maximize g'S_12 h / √(g'S_11 g · h'S_22 h). Change of variables: a = S_11^{1/2} g, b = S_22^{1/2} h. The problem is now to maximize a'S_11^{−1/2} S_12 S_22^{−1/2} b / √(a'a · b'b) with respect to a ∈ R^r and b ∈ R^s, and the solution is given by a_1 = v_1(S_11^{−1/2} S_12 S_22^{−1} S_21 S_11^{−1/2}), b_1 = v_1(S_22^{−1/2} S_21 S_11^{−1} S_12 S_22^{−1/2}), where v_i(M) is the ith eigenvector (corresponding to the ith largest eigenvalue) of a symmetric matrix M.

50 Finding the CCA solution. For j = 1, ..., min(s, r), we have a_j = v_j(S_11^{−1/2} S_12 S_22^{−1} S_21 S_11^{−1/2}), b_j = v_j(S_22^{−1/2} S_21 S_11^{−1} S_12 S_22^{−1/2}), g_j = S_11^{−1/2} a_j, h_j = S_22^{−1/2} b_j, and thus the canonical variate scores ξ_j = g_j'X = (g_j'x_1, ..., g_j'x_n) and ω_j = h_j'Y = (h_j'y_1, ..., h_j'y_n). Note that g_j'g_k = a_j'S_11^{−1} a_k ≠ 0 (not orthogonal), while Cov({ξ_j(i)}, {ξ_k(i)}) = g_j'S_11 g_k = 0 for j ≠ k (uncorrelated).
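These formulas translate directly into R (a minimal sketch for an n × r matrix x and an n × s matrix y; the matrix square roots are taken via eigen-decomposition, and the result can be compared with cancor(x, y)):
msqrt_inv <- function(M) {   # inverse square root of a symmetric positive definite matrix
  e <- eigen(M, symmetric = TRUE)
  e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
}
S11 <- cov(x); S22 <- cov(y); S12 <- cov(x, y); S21 <- t(S12)
A <- msqrt_inv(S11) %*% S12 %*% solve(S22) %*% S21 %*% msqrt_inv(S11)
B <- msqrt_inv(S22) %*% S21 %*% solve(S11) %*% S12 %*% msqrt_inv(S22)
eA <- eigen(A, symmetric = TRUE); eB <- eigen(B, symmetric = TRUE)
g <- msqrt_inv(S11) %*% eA$vectors   # canonical correlation vectors g_j (columns)
h <- msqrt_inv(S22) %*% eB$vectors   # canonical correlation vectors h_j (columns)
rho <- sqrt(pmax(eA$values, 0))      # canonical correlations (first min(r, s) entries)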

51 nutrimouse data: canonical correlation coefficients. Reduced dataset: r = 10 and s = 21 variables with n = 40 samples. There are min(s, r) = 10 pairs of canonical variates, with decreasing canonical correlation coefficients. [output: res.cc$cor, the 10 canonical correlation coefficients]

52 nutrimouse data: canonical variate scores plot. [figure: the first four pairs of canonical variates {(ξ_(j)i, ω_(j)i) : i = 1, ..., n}]

53 nutrimouse data with r = 120. The original data, with r > n. Problem? [figure: XY correlation]

54 Regularized CCA. When r ≥ n or s ≥ n, there always exist (g_1, h_1) = argmax_{g ∈ R^r, h ∈ R^s} g'S_12 h / √(g'S_11 g · h'S_22 h) satisfying Corr(ξ_1, ω_1) = ±1 (perfect correlation!). This is an artifact of S_11 and S_22 not being invertible (the rank of X̃ and Ỹ is at most n − 1). To circumvent this issue and obtain a meaningful estimate of the population canonical correlation coefficients, regularized CCA is often used: (g_k, h_k) = argmax_{g ∈ R^r, h ∈ R^s, g'S_11 g_j = 0, h'S_22 h_j = 0, j = 1, ..., k−1} g'S_12 h / √(g'(S_11 + λ_1 I_r)g · h'(S_22 + λ_2 I_s)h), (8) for λ_1, λ_2 > 0.
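A sketch of this regularized criterion, reusing the illustrative helper msqrt_inv defined above; the rcc function from the CCA package (used on the next slides) is the ready-made alternative:
cca_reg <- function(x, y, lambda1, lambda2) {
  S11 <- cov(x) + lambda1 * diag(ncol(x))   # ridge-regularized covariance of x
  S22 <- cov(y) + lambda2 * diag(ncol(y))   # ridge-regularized covariance of y
  S12 <- cov(x, y)
  A <- msqrt_inv(S11) %*% S12 %*% solve(S22) %*% t(S12) %*% msqrt_inv(S11)
  e <- eigen(A, symmetric = TRUE)
  list(rho = sqrt(pmax(e$values, 0)),      # regularized canonical correlations
       g = msqrt_inv(S11) %*% e$vectors)   # regularized canonical vectors for x
}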

55 nutrimouse data: regularized CCA. Both (1) the sample canonical correlation coefficients ρ_j and (2) the corresponding projection vectors h_j and g_j vary depending on the choice of the regularization parameters λ_1 = λ_2 (e.g., 0.01). [figure]

56 CCA in R. Given an n × r data frame or matrix x and an n × s data frame or matrix y, to perform CCA:
cc <- cancor(x, y)
rho <- cc$cor
g <- cc$xcoef
h <- cc$ycoef
To perform regularized CCA, use the package CCA:
library(CCA)
cc <- rcc(x, y, lambda1, lambda2)
rho <- cc$cor
g <- cc$xcoef
h <- cc$ycoef
