Data Mining and Analysis: Fundamental Concepts and Algorithms
dataminingbook.info

Mohammed J. Zaki, Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA
Wagner Meira Jr., Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil

Chapter 7: Dimensionality Reduction

Zaki & Meira Jr. (RPI and UFMG), Data Mining and Analysis, Chapter 7: Dimensionality Reduction
Dimensionality Reduction

The goal of dimensionality reduction is to find a lower-dimensional representation of the data matrix $D$ that avoids the curse of dimensionality.

Given an $n \times d$ data matrix, each point $x_i = (x_{i1}, x_{i2}, \ldots, x_{id})^T$ is a vector in the ambient $d$-dimensional vector space spanned by the $d$ standard basis vectors $e_1, e_2, \ldots, e_d$.

Given any other set of $d$ orthonormal vectors $u_1, u_2, \ldots, u_d$, we can re-express each point $x$ as
$$x = a_1 u_1 + a_2 u_2 + \cdots + a_d u_d$$
where $a = (a_1, a_2, \ldots, a_d)^T$ represents the coordinates of $x$ in the new basis. More compactly, $x = Ua$, where $U$ is the $d \times d$ orthogonal matrix whose $i$th column comprises the $i$th basis vector $u_i$. Thus $U^{-1} = U^T$, and we have
$$a = U^T x$$
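The change of basis above can be checked numerically. This is a minimal sketch (not from the book's code) using an arbitrary orthonormal basis obtained from a QR decomposition of a random matrix:

```python
import numpy as np

# A point x in the standard basis of R^3 (illustrative values).
x = np.array([1.0, 2.0, 3.0])

# Any orthonormal basis works; the Q factor of a QR decomposition of a
# random matrix gives an arbitrary orthogonal U whose columns are the u_i.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.normal(size=(3, 3)))

a = U.T @ x       # coordinates of x in the new basis: a = U^T x
x_back = U @ a    # recover x in the standard basis: x = U a

assert np.allclose(x_back, x)
```

Because $U$ is orthogonal, the round trip $U (U^T x)$ recovers $x$ exactly (up to floating point).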
Optimal Basis: Projection in Lower-Dimensional Space

There are potentially infinite choices for the orthonormal basis vectors. Our goal is to choose an optimal basis that preserves essential information about $D$. We are interested in finding the optimal $r$-dimensional representation of $D$, with $r \ll d$.

The projection of $x$ onto the first $r$ basis vectors is given as
$$x' = a_1 u_1 + a_2 u_2 + \cdots + a_r u_r = \sum_{i=1}^r a_i u_i = U_r a_r$$
where $U_r$ and $a_r$ comprise the first $r$ basis vectors and coordinates, respectively. Also, restricting $a = U^T x$ to the first $r$ terms, we have
$$a_r = U_r^T x$$
The $r$-dimensional projection of $x$ is thus given as
$$x' = U_r U_r^T x = P_r x$$
where $P_r = U_r U_r^T = \sum_{i=1}^r u_i u_i^T$ is the orthogonal projection matrix for the subspace spanned by the first $r$ basis vectors.
Optimal Basis: Error Vector

Given the projected vector $x' = P_r x$, the corresponding error vector is the projection onto the remaining $d - r$ basis vectors:
$$\epsilon = \sum_{i=r+1}^{d} a_i u_i = x - x'$$
The error vector $\epsilon$ is orthogonal to $x'$.

The goal of dimensionality reduction is to seek an $r$-dimensional basis that gives the best possible approximation $x_i'$ over all the points $x_i \in D$. Alternatively, we seek to minimize the error $\epsilon_i = x_i - x_i'$ over all the points.
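The two facts above, that $P_r$ is an orthogonal projector and that the error $\epsilon$ is orthogonal to the projection, can be verified directly. A small sketch with an illustrative 3-D basis and $r = 2$ (not the book's data):

```python
import numpy as np

# Build an arbitrary orthonormal basis of R^3 and keep its first r = 2 columns.
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.normal(size=(3, 3)))
U_r = U[:, :2]
P_r = U_r @ U_r.T          # orthogonal projection matrix P_r = U_r U_r^T

x = np.array([1.0, -2.0, 0.5])
x_proj = P_r @ x           # projection x' onto the 2-D subspace
eps = x - x_proj           # error vector eps = x - x'

# P_r is idempotent, and the error is orthogonal to the projection.
assert np.allclose(P_r @ P_r, P_r)
assert abs(eps @ x_proj) < 1e-12
```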
Iris Data: Optimal One-dimensional Basis

(Figure: the three-dimensional Iris data, with axes $X_1$, $X_2$, $X_3$, and the optimal one-dimensional basis vector $u_1$.)
Iris Data: Optimal 2D Basis

(Figure: the three-dimensional Iris data, with axes $X_1$, $X_2$, $X_3$, and the optimal two-dimensional basis spanned by $u_1$ and $u_2$.)
Principal Component Analysis

Principal Component Analysis (PCA) is a technique that seeks an $r$-dimensional basis that best captures the variance in the data. The direction with the largest projected variance is called the first principal component. The orthogonal direction that captures the second largest projected variance is called the second principal component, and so on. The direction that maximizes the variance is also the one that minimizes the mean squared error.
Principal Component: Direction of Most Variance

We seek the unit vector $u$ that maximizes the projected variance of the points. Let $D$ be centered, and let $\Sigma$ be its covariance matrix. The projection of $x_i$ on $u$ is given as
$$x_i' = \left(\frac{u^T x_i}{u^T u}\right) u = (u^T x_i)\, u = a_i u$$
Across all the points, the projected variance along $u$ is
$$\sigma_u^2 = \frac{1}{n} \sum_{i=1}^n (a_i - \mu_u)^2 = \frac{1}{n} \sum_{i=1}^n u^T x_i x_i^T u = u^T \left(\frac{1}{n} \sum_{i=1}^n x_i x_i^T\right) u = u^T \Sigma u$$
since $\mu_u = 0$ for centered $D$. We have to find the optimal basis vector $u$ that maximizes the projected variance $\sigma_u^2 = u^T \Sigma u$, subject to the constraint that $u^T u = 1$. The maximization objective is given as
$$\max_u \; J(u) = u^T \Sigma u - \alpha (u^T u - 1)$$
Principal Component: Direction of Most Variance

Given the objective $\max_u J(u) = u^T \Sigma u - \alpha(u^T u - 1)$, we solve it by setting the derivative of $J(u)$ with respect to $u$ to the zero vector, obtaining
$$\frac{\partial}{\partial u}\left( u^T \Sigma u - \alpha (u^T u - 1) \right) = 0$$
that is,
$$2 \Sigma u - 2 \alpha u = 0 \quad \text{which implies} \quad \Sigma u = \alpha u$$
Thus $\alpha$ is an eigenvalue of the covariance matrix $\Sigma$, with the associated eigenvector $u$. Taking the dot product with $u$ on both sides, we have
$$\sigma_u^2 = u^T \Sigma u = u^T \alpha u = \alpha\, u^T u = \alpha$$
To maximize the projected variance $\sigma_u^2$, we thus choose the largest eigenvalue $\lambda_1$ of $\Sigma$; the dominant eigenvector $u_1$ specifies the direction of most variance, also called the first principal component.
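The conclusion that the variance along the dominant eigenvector equals the largest eigenvalue can be checked numerically. A sketch on synthetic 2-D data (not the Iris data):

```python
import numpy as np

# Correlated synthetic data: independent normals mixed by a fixed matrix.
rng = np.random.default_rng(2)
D = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])

Z = D - D.mean(axis=0)          # center the data
Sigma = Z.T @ Z / len(Z)        # covariance matrix (1/n convention)

evals, evecs = np.linalg.eigh(Sigma)   # eigenvalues in ascending order
lam1, u1 = evals[-1], evecs[:, -1]     # largest eigenvalue, dominant eigenvector

# The projected variance along u1 equals lambda_1.
proj_var = np.var(Z @ u1)
assert np.isclose(proj_var, lam1)
```

`np.var` uses the same $1/n$ normalization as the covariance matrix above, so the equality holds exactly up to floating point.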
Iris Data: First Principal Component

(Figure: the Iris data, with axes $X_1$, $X_2$, $X_3$, and the first principal component $u_1$.)
Minimum Squared Error Approach

The direction that maximizes the projected variance is also the one that minimizes the average squared error. The mean squared error (MSE) optimization condition is
$$\text{MSE}(u) = \frac{1}{n} \sum_{i=1}^n \|\epsilon_i\|^2 = \frac{1}{n} \sum_{i=1}^n \|x_i - x_i'\|^2 = \frac{1}{n} \sum_{i=1}^n \|x_i\|^2 - u^T \Sigma u$$
Since the first term is fixed for a dataset $D$, we see that the direction $u_1$ that maximizes the variance is also the one that minimizes the MSE. Further, we have
$$\frac{1}{n} \sum_{i=1}^n \|x_i\|^2 = \text{var}(D) = \text{tr}(\Sigma) = \sum_{i=1}^d \sigma_i^2$$
Thus, for the direction $u_1$ that minimizes the MSE, we have
$$\text{MSE}(u_1) = \text{var}(D) - u_1^T \Sigma u_1 = \text{var}(D) - \lambda_1$$
Best 2-dimensional Approximation

The best 2D subspace that captures the most variance in $D$ comprises the eigenvectors $u_1$ and $u_2$ corresponding to the largest and second largest eigenvalues $\lambda_1$ and $\lambda_2$, respectively. Let $U_2 = (u_1 \;\; u_2)$ be the matrix whose columns correspond to the two principal components. Given a point $x_i \in D$, its projected coordinates are computed as
$$a_i = U_2^T x_i$$
Let $A$ denote the projected 2D dataset. The total projected variance for $A$ is given as
$$\text{var}(A) = u_1^T \Sigma u_1 + u_2^T \Sigma u_2 = u_1^T \lambda_1 u_1 + u_2^T \lambda_2 u_2 = \lambda_1 + \lambda_2$$
The first two principal components also minimize the mean squared error objective, since
$$\text{MSE} = \frac{1}{n} \sum_{i=1}^n \|x_i - x_i'\|^2 = \text{var}(D) - \frac{1}{n} \sum_{i=1}^n x_i^T P_2 x_i = \text{var}(D) - \text{var}(A)$$
Optimal and Non-optimal 2D Approximations

The optimal subspace maximizes the variance and minimizes the squared error, whereas a non-optimal subspace captures less variance and has a higher mean squared error, as seen from the lengths of the error vectors (line segments).

(Figure: the Iris data, with axes $X_1$, $X_2$, $X_3$, showing the optimal 2D subspace spanned by $u_1$ and $u_2$ alongside a non-optimal subspace, with error segments drawn from each point.)
Best r-dimensional Approximation

To find the best $r$-dimensional approximation to $D$, we compute the eigenvalues of $\Sigma$. Because $\Sigma$ is positive semidefinite, its eigenvalues are non-negative:
$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r \ge \lambda_{r+1} \ge \cdots \ge \lambda_d \ge 0$$
We select the $r$ largest eigenvalues, and their corresponding eigenvectors, to form the best $r$-dimensional approximation.

Total projected variance: Let $U_r = (u_1 \; \cdots \; u_r)$ be the $r$-dimensional basis vector matrix, with the projection matrix given as $P_r = U_r U_r^T = \sum_{i=1}^r u_i u_i^T$. Let $A$ denote the dataset formed by the coordinates of the projected points in the $r$-dimensional subspace. The projected variance is given as
$$\text{var}(A) = \frac{1}{n} \sum_{i=1}^n x_i^T P_r x_i = \sum_{i=1}^r u_i^T \Sigma u_i = \sum_{i=1}^r \lambda_i$$
Mean squared error: The mean squared error in $r$ dimensions is
$$\text{MSE} = \frac{1}{n} \sum_{i=1}^n \|x_i - x_i'\|^2 = \text{var}(D) - \sum_{i=1}^r \lambda_i = \sum_{i=1}^d \lambda_i - \sum_{i=1}^r \lambda_i$$
Choosing the Dimensionality

One criterion for choosing $r$ is to compute the fraction of the total variance captured by the first $r$ principal components:
$$f(r) = \frac{\lambda_1 + \lambda_2 + \cdots + \lambda_r}{\lambda_1 + \lambda_2 + \cdots + \lambda_d} = \frac{\sum_{i=1}^r \lambda_i}{\sum_{i=1}^d \lambda_i} = \frac{\sum_{i=1}^r \lambda_i}{\text{var}(D)}$$
Given a desired variance threshold $\alpha$, starting from the first principal component we keep adding components, and stop at the smallest value $r$ for which $f(r) \ge \alpha$. In other words, we select the fewest dimensions such that the subspace spanned by those $r$ dimensions captures at least an $\alpha$ fraction (say 0.9) of the total variance.
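The selection rule can be sketched in a few lines. The eigenvalue spectrum below is an assumed, illustrative one (chosen so one dominant component carries most of the variance), not taken from the book:

```python
import numpy as np

# Assumed eigenvalue spectrum, sorted in decreasing order.
evals = np.array([3.6, 0.25, 0.08, 0.05])
alpha = 0.95

# f(r) for r = 1..d as a cumulative fraction of total variance.
f = np.cumsum(evals) / evals.sum()

# Smallest r with f(r) >= alpha (searchsorted finds the first index
# where f reaches alpha; +1 converts the 0-based index to r).
r = int(np.searchsorted(f, alpha) + 1)
```

With this spectrum, $f(1) \approx 0.90$ and $f(2) \approx 0.97$, so the rule selects $r = 2$.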
Principal Component Analysis: Algorithm

PCA(D, α):
1. $\mu = \frac{1}{n} \sum_{i=1}^n x_i$ // compute mean
2. $Z = D - 1 \cdot \mu^T$ // center the data
3. $\Sigma = \frac{1}{n} Z^T Z$ // compute covariance matrix
4. $(\lambda_1, \lambda_2, \ldots, \lambda_d)$ = eigenvalues($\Sigma$) // compute eigenvalues
5. $U = (u_1 \; u_2 \; \cdots \; u_d)$ = eigenvectors($\Sigma$) // compute eigenvectors
6. $f(r) = \frac{\sum_{i=1}^r \lambda_i}{\sum_{i=1}^d \lambda_i}$, for all $r = 1, 2, \ldots, d$ // fraction of total variance
7. Choose smallest $r$ so that $f(r) \ge \alpha$ // choose dimensionality
8. $U_r = (u_1 \; u_2 \; \cdots \; u_r)$ // reduced basis
9. $A = \{ a_i \mid a_i = U_r^T x_i, \; \text{for } i = 1, \ldots, n \}$ // reduced dimensionality data
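The pseudocode above translates almost line for line into NumPy. This is a sketch, not the book's reference implementation; `eigh` returns eigenvalues in ascending order, so they are flipped to match the decreasing order the algorithm assumes, and the projection in the last step uses the centered points:

```python
import numpy as np

def pca(D, alpha):
    mu = D.mean(axis=0)                      # 1: compute mean
    Z = D - mu                               # 2: center the data
    Sigma = Z.T @ Z / len(D)                 # 3: covariance matrix
    evals, evecs = np.linalg.eigh(Sigma)     # 4-5: eigen-decomposition
    evals, evecs = evals[::-1], evecs[:, ::-1]  # sort decreasing
    f = np.cumsum(evals) / evals.sum()       # 6: fraction of total variance
    r = int(np.searchsorted(f, alpha) + 1)   # 7: smallest r with f(r) >= alpha
    U_r = evecs[:, :r]                       # 8: reduced basis
    A = Z @ U_r                              # 9: project the centered points
    return A, U_r, evals

# Illustrative synthetic data with one dominant direction.
rng = np.random.default_rng(3)
D = rng.normal(size=(100, 3)) * np.array([5.0, 1.0, 0.1])
A, U_r, evals = pca(D, alpha=0.9)
```

In the projected data the coordinates are uncorrelated, and the variance along the $i$th column of $A$ equals $\lambda_i$.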
Iris Principal Components

(Table: the covariance matrix $\Sigma$ for the Iris data, its eigenvalues $\lambda_1, \lambda_2, \lambda_3$ and eigenvectors $u_1, u_2, u_3$, the total variance $\lambda_1 + \lambda_2 + \lambda_3$, and the fraction of total variance $f(r)$ for $r = 1, 2, 3$; numeric values omitted.)

Thus $r = 2$ principal components are needed to capture an $\alpha = 0.95$ fraction of the total variance.
Iris Data: Optimal 3D PC Basis

(Figure: the Iris data, with axes $X_1$, $X_2$, $X_3$, and the full three-dimensional principal component basis $u_1$, $u_2$, $u_3$.)
Iris Principal Components: Projected Data (2D)

(Figure: the Iris data projected onto the first two principal components $u_1$ and $u_2$.)
Geometry of PCA

Geometrically, when $r = d$, PCA corresponds to an orthogonal change of basis, so that the total variance is captured by the sum of the variances along each of the principal directions $u_1, u_2, \ldots, u_d$, and further, all covariances are zero.

Let $U$ be the $d \times d$ orthogonal matrix $U = (u_1 \; u_2 \; \cdots \; u_d)$, with $U^{-1} = U^T$, and let $\Lambda = \text{diag}(\lambda_1, \ldots, \lambda_d)$ be the diagonal matrix of eigenvalues. Each principal component $u_i$ corresponds to an eigenvector of the covariance matrix $\Sigma$:
$$\Sigma u_i = \lambda_i u_i \quad \text{for all } 1 \le i \le d$$
which can be written compactly in matrix notation as
$$\Sigma U = U \Lambda \quad \text{which implies} \quad \Sigma = U \Lambda U^T$$
Thus, $\Lambda$ represents the covariance matrix in the new PC basis.

In the new PC basis, the equation $x^T \Sigma^{-1} x = 1$ defines a $d$-dimensional ellipsoid (or hyper-ellipse). The eigenvectors $u_i$ of $\Sigma$, that is, the principal components, are the directions of the principal axes of the ellipsoid. The square roots of the eigenvalues, $\sqrt{\lambda_i}$, give the lengths of the semi-axes.
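The factorization $\Sigma = U \Lambda U^T$, and the fact that the covariance is diagonal in the PC basis, can be verified numerically. A sketch on a synthetic covariance matrix (eigenvalue order is immaterial for the factorization itself):

```python
import numpy as np

# Synthetic correlated data and its covariance matrix.
rng = np.random.default_rng(4)
D = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 3))
Z = D - D.mean(axis=0)
Sigma = Z.T @ Z / len(Z)

evals, U = np.linalg.eigh(Sigma)   # eigen-decomposition of symmetric Sigma
Lam = np.diag(evals)

# Sigma factors as U Lam U^T, and in the PC basis the covariance is diagonal.
assert np.allclose(U @ Lam @ U.T, Sigma)
assert np.allclose(U.T @ Sigma @ U, Lam)
```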
Iris: Elliptic Contours in Standard Basis

(Figure: elliptic contours of the Iris data in the standard basis, with the principal directions $u_1$, $u_2$, $u_3$ overlaid.)
Iris: Axis-Parallel Ellipsoid in PC Basis

(Figure: the same ellipsoid rendered in the principal component basis $u_1$, $u_2$, $u_3$, where it is axis-parallel.)
Kernel Principal Component Analysis

Principal component analysis can be extended to find nonlinear directions in the data using kernel methods. Kernel PCA finds the directions of most variance in feature space instead of input space. Using the kernel trick, all PCA operations can be carried out in terms of the kernel function in input space, without having to transform the data into feature space.

Let $\phi$ be a function that maps a point $x_i$ in input space to its image $\phi(x_i)$ in feature space. Let the points in feature space be centered, and let $\Sigma_\phi$ be the covariance matrix. The first PC in feature space corresponds to the dominant eigenvector:
$$\Sigma_\phi u_1 = \lambda_1 u_1 \quad \text{where} \quad \Sigma_\phi = \frac{1}{n} \sum_{i=1}^n \phi(x_i)\, \phi(x_i)^T$$
Kernel Principal Component Analysis

It can be shown that $u_1 = \sum_{i=1}^n c_i\, \phi(x_i)$. That is, the PC direction in feature space is a linear combination of the transformed points. The coefficients are captured in the weight vector
$$c = (c_1, c_2, \ldots, c_n)^T$$
Substituting into the eigen-decomposition of $\Sigma_\phi$ and simplifying, we get
$$K c = n \lambda_1 c = \eta_1 c$$
Thus, the weight vector $c$ is the eigenvector corresponding to the largest eigenvalue $\eta_1$ of the kernel matrix $K$.
Kernel Principal Component Analysis

The weight vector $c$ can then be used to find $u_1$ via $u_1 = \sum_{i=1}^n c_i\, \phi(x_i)$. The only constraint we impose is that $u_1$ should be normalized to be a unit vector, which implies $\|c\|^2 = \frac{1}{\eta_1}$.

We cannot compute the principal direction directly, but we can project any point $\phi(x)$ onto the principal direction $u_1$ as follows:
$$u_1^T \phi(x) = \sum_{i=1}^n c_i\, \phi(x_i)^T \phi(x) = \sum_{i=1}^n c_i\, K(x_i, x)$$
which requires only kernel operations.

We can obtain the additional principal components by solving for the other eigenvalues and eigenvectors of
$$K c_j = n \lambda_j c_j = \eta_j c_j$$
If we sort the eigenvalues of $K$ in decreasing order, $\eta_1 \ge \eta_2 \ge \cdots \ge \eta_n \ge 0$, we can obtain the $j$th principal component from the corresponding eigenvector $c_j$. The variance along the $j$th principal component is given as $\lambda_j = \frac{\eta_j}{n}$.
Kernel PCA Algorithm

KERNELPCA(D, K, α):
1. $K = \{ K(x_i, x_j) \}_{i,j=1,\ldots,n}$ // compute $n \times n$ kernel matrix
2. $K = (I - \frac{1}{n} 1_{n \times n})\, K\, (I - \frac{1}{n} 1_{n \times n})$ // center the kernel matrix
3. $(\eta_1, \eta_2, \ldots, \eta_n)$ = eigenvalues($K$) // compute eigenvalues
4. $(c_1 \; c_2 \; \cdots \; c_n)$ = eigenvectors($K$) // compute eigenvectors
5. $\lambda_i = \frac{\eta_i}{n}$ for all $i = 1, \ldots, n$ // compute variance for each component
6. $c_i = \frac{1}{\sqrt{\eta_i}}\, c_i$ for all $i = 1, \ldots, n$ // ensure that $u_i^T u_i = 1$
7. $f(r) = \frac{\sum_{i=1}^r \lambda_i}{\sum_{i=1}^n \lambda_i}$, for all $r = 1, 2, \ldots, n$ // fraction of total variance
8. Choose smallest $r$ so that $f(r) \ge \alpha$ // choose dimensionality
9. $C_r = (c_1 \; c_2 \; \cdots \; c_r)$ // reduced basis
10. $A = \{ a_i \mid a_i = C_r^T K_i, \; \text{for } i = 1, \ldots, n \}$ // reduced dimensionality data
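The algorithm can be sketched in NumPy. This is a simplified version that takes a fixed number of components $r$ instead of the variance threshold $\alpha$, and uses the homogeneous quadratic kernel on synthetic data; it is an illustration, not the book's reference implementation:

```python
import numpy as np

def kernel_pca(D, kernel, r):
    n = len(D)
    # 1: n x n kernel matrix.
    K = np.array([[kernel(x, y) for y in D] for x in D])
    # 2: center the kernel matrix with H = I - (1/n) 1_{n x n}.
    H = np.eye(n) - np.ones((n, n)) / n
    K = H @ K @ H
    # 3-4: eigen-decomposition, sorted in decreasing eigenvalue order.
    eta, C = np.linalg.eigh(K)
    eta, C = eta[::-1], C[:, ::-1]
    # 6: scale weight vectors so that u_i^T u_i = 1.
    C_r = C[:, :r] / np.sqrt(eta[:r])
    # 10: project every point onto the r components via kernel operations.
    return K @ C_r

rng = np.random.default_rng(5)
D = rng.normal(size=(50, 2))
A = kernel_pca(D, lambda x, y: (x @ y) ** 2, r=2)
```

The projected coordinates come out centered (the kernel matrix was centered) and mutually orthogonal across components.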
Nonlinear Iris Data: PCA in Input Space

(Figure: nonlinear Iris data over input axes $X_1$, $X_2$, with the linear principal components $u_1$ and $u_2$ found by standard PCA.)
Nonlinear Iris Data: Projection onto PCs

(Figure: the nonlinear Iris data projected onto the principal components $u_1$ and $u_2$.)
Kernel PCA: 3 PCs (Contours of Constant Projection)

Homogeneous quadratic kernel: $K(x_i, x_j) = (x_i^T x_j)^2$

(Figure: contours of constant projection onto each of the first three kernel principal components, plotted over the input axes $X_1$, $X_2$.)
Kernel PCA: Projected Points onto 2 PCs

Homogeneous quadratic kernel: $K(x_i, x_j) = (x_i^T x_j)^2$

(Figure: the data projected onto the first two kernel principal components $u_1$ and $u_2$.)
Singular Value Decomposition

Principal component analysis is a special case of a more general matrix decomposition method called Singular Value Decomposition (SVD). PCA yields the following decomposition of the covariance matrix:
$$\Sigma = U \Lambda U^T$$
where the covariance matrix has been factorized into the orthogonal matrix $U$ containing its eigenvectors, and a diagonal matrix $\Lambda$ containing its eigenvalues (sorted in decreasing order).

SVD generalizes this factorization to any matrix. In particular, for an $n \times d$ data matrix $D$ with $n$ points and $d$ columns, SVD factorizes $D$ as
$$D = L \Delta R^T$$
where $L$ is an orthogonal $n \times n$ matrix, $R$ is an orthogonal $d \times d$ matrix, and $\Delta$ is an $n \times d$ diagonal matrix, defined as $\Delta(i, i) = \delta_i$, and 0 otherwise. The columns of $L$ are called the left singular vectors, and the columns of $R$ (or rows of $R^T$) are called the right singular vectors. The entries $\delta_i$ are called the singular values of $D$, and they are all non-negative.
Reduced SVD

If the rank of $D$ is $r \le \min(n, d)$, then there are only $r$ nonzero singular values, ordered as $\delta_1 \ge \delta_2 \ge \cdots \ge \delta_r > 0$. We discard the left and right singular vectors that correspond to zero singular values, to obtain the reduced SVD:
$$D = L_r \Delta_r R_r^T$$
where $L_r$ is the $n \times r$ matrix of the left singular vectors, $R_r$ is the $d \times r$ matrix of the right singular vectors, and $\Delta_r$ is the $r \times r$ diagonal matrix containing the positive singular values.

The reduced SVD leads directly to the spectral decomposition of $D$, given as
$$D = \sum_{i=1}^r \delta_i\, l_i\, r_i^T$$
The best rank-$q$ approximation to the original data $D$ is the matrix $D_q = \sum_{i=1}^q \delta_i\, l_i\, r_i^T$ that minimizes the expression $\|D - D_q\|_F$, where $\|A\|_F = \sqrt{\sum_{i=1}^n \sum_{j=1}^d A(i, j)^2}$ is called the Frobenius norm of $A$.
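The rank-$q$ truncation can be sketched with `numpy.linalg.svd` on an illustrative random matrix. A known property of the best rank-$q$ approximation (the Eckart-Young theorem) is that its Frobenius error equals the root of the sum of the squared discarded singular values:

```python
import numpy as np

rng = np.random.default_rng(6)
D = rng.normal(size=(8, 5))

# Reduced SVD: D = L diag(s) R^T, with s sorted in decreasing order.
L, s, Rt = np.linalg.svd(D, full_matrices=False)

# Best rank-q approximation: keep the q largest singular triplets.
q = 2
D_q = L[:, :q] @ np.diag(s[:q]) @ Rt[:q, :]

# Frobenius error equals sqrt of the sum of the discarded delta_i^2.
err = np.linalg.norm(D - D_q, "fro")
assert np.isclose(err, np.sqrt(np.sum(s[q:] ** 2)))
```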
Connection Between SVD and PCA

Assume $D$ has been centered, and let $D = L \Delta R^T$ via SVD. Consider the scatter matrix of $D$, given as $D^T D$. We have
$$D^T D = (L \Delta R^T)^T (L \Delta R^T) = R \Delta^T L^T L \Delta R^T = R (\Delta^T \Delta) R^T = R\, \Delta_d^2\, R^T$$
where $\Delta_d^2$ is the $d \times d$ diagonal matrix defined as $\Delta_d^2(i, i) = \delta_i^2$, for $i = 1, \ldots, d$.

The covariance matrix of the centered $D$ is given as $\Sigma = \frac{1}{n} D^T D$, so we get
$$D^T D = n \Sigma = n U \Lambda U^T = U (n \Lambda) U^T$$
Thus the right singular vectors $R$ are the same as the eigenvectors of $\Sigma$, and the singular values of $D$ are related to the eigenvalues of $\Sigma$ as
$$n \lambda_i = \delta_i^2 \quad \text{which implies} \quad \lambda_i = \frac{\delta_i^2}{n}, \quad \text{for } i = 1, \ldots, d$$
Likewise, the left singular vectors in $L$ are the eigenvectors of the $n \times n$ matrix $D D^T$, and the corresponding eigenvalues are given as $\delta_i^2$.
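The relation $n \lambda_i = \delta_i^2$ can be confirmed numerically. A sketch on synthetic centered data (illustrative, not the Iris data):

```python
import numpy as np

rng = np.random.default_rng(7)
D = rng.normal(size=(100, 4))
D = D - D.mean(axis=0)            # center the data

# Singular values of centered D vs. eigenvalues of its covariance matrix.
_, s, Rt = np.linalg.svd(D, full_matrices=False)
Sigma = D.T @ D / len(D)
evals = np.sort(np.linalg.eigvalsh(Sigma))[::-1]  # decreasing order

# lambda_i = delta_i^2 / n, term by term.
assert np.allclose(s ** 2 / len(D), evals)
```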
Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used
More informationPreprocessing & dimensionality reduction
Introduction to Data Mining Preprocessing & dimensionality reduction CPSC/AMTH 445a/545a Guy Wolf guy.wolf@yale.edu Yale University Fall 2016 CPSC 445 (Guy Wolf) Dimensionality reduction Yale - Fall 2016
More informationLEC 2: Principal Component Analysis (PCA) A First Dimensionality Reduction Approach
LEC 2: Principal Component Analysis (PCA) A First Dimensionality Reduction Approach Dr. Guangliang Chen February 9, 2016 Outline Introduction Review of linear algebra Matrix SVD PCA Motivation The digits
More informationCOMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017
COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University PRINCIPAL COMPONENT ANALYSIS DIMENSIONALITY
More informationSTATISTICAL LEARNING SYSTEMS
STATISTICAL LEARNING SYSTEMS LECTURE 8: UNSUPERVISED LEARNING: FINDING STRUCTURE IN DATA Institute of Computer Science, Polish Academy of Sciences Ph. D. Program 2013/2014 Principal Component Analysis
More informationMethods for sparse analysis of high-dimensional data, II
Methods for sparse analysis of high-dimensional data, II Rachel Ward May 26, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 55 High dimensional
More informationKernel Principal Component Analysis
Kernel Principal Component Analysis Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationFocus was on solving matrix inversion problems Now we look at other properties of matrices Useful when A represents a transformations.
Previously Focus was on solving matrix inversion problems Now we look at other properties of matrices Useful when A represents a transformations y = Ax Or A simply represents data Notion of eigenvectors,
More informationDimensionality Reduction
Lecture 5 1 Outline 1. Overview a) What is? b) Why? 2. Principal Component Analysis (PCA) a) Objectives b) Explaining variability c) SVD 3. Related approaches a) ICA b) Autoencoders 2 Example 1: Sportsball
More informationConceptual Questions for Review
Conceptual Questions for Review Chapter 1 1.1 Which vectors are linear combinations of v = (3, 1) and w = (4, 3)? 1.2 Compare the dot product of v = (3, 1) and w = (4, 3) to the product of their lengths.
More informationData Mining Lecture 4: Covariance, EVD, PCA & SVD
Data Mining Lecture 4: Covariance, EVD, PCA & SVD Jo Houghton ECS Southampton February 25, 2019 1 / 28 Variance and Covariance - Expectation A random variable takes on different values due to chance The
More informationPrincipal Component Analysis. Applied Multivariate Statistics Spring 2012
Principal Component Analysis Applied Multivariate Statistics Spring 2012 Overview Intuition Four definitions Practical examples Mathematical example Case study 2 PCA: Goals Goal 1: Dimension reduction
More informationIntroduction to Matrix Algebra
Introduction to Matrix Algebra August 18, 2010 1 Vectors 1.1 Notations A p-dimensional vector is p numbers put together. Written as x 1 x =. x p. When p = 1, this represents a point in the line. When p
More informationData Mining and Analysis: Fundamental Concepts and Algorithms
Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA
More informationLecture 3: Review of Linear Algebra
ECE 83 Fall 2 Statistical Signal Processing instructor: R Nowak Lecture 3: Review of Linear Algebra Very often in this course we will represent signals as vectors and operators (eg, filters, transforms,
More informationCHAPTER 5. THE KERNEL METHOD 138
CHAPTER 5. THE KERNEL METHOD 138 Chapter 5 The Kernel Method Before we can mine data, it is important to first find a suitable data representation that facilitates data analysis. For example, for complex
More informationLecture 3: Review of Linear Algebra
ECE 83 Fall 2 Statistical Signal Processing instructor: R Nowak, scribe: R Nowak Lecture 3: Review of Linear Algebra Very often in this course we will represent signals as vectors and operators (eg, filters,
More informationLinear Methods for Regression. Lijun Zhang
Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived
More informationMATH 304 Linear Algebra Lecture 20: The Gram-Schmidt process (continued). Eigenvalues and eigenvectors.
MATH 304 Linear Algebra Lecture 20: The Gram-Schmidt process (continued). Eigenvalues and eigenvectors. Orthogonal sets Let V be a vector space with an inner product. Definition. Nonzero vectors v 1,v
More informationComputational math: Assignment 1
Computational math: Assignment 1 Thanks Ting Gao for her Latex file 11 Let B be a 4 4 matrix to which we apply the following operations: 1double column 1, halve row 3, 3add row 3 to row 1, 4interchange
More informationLecture 4: Principal Component Analysis and Linear Dimension Reduction
Lecture 4: Principal Component Analysis and Linear Dimension Reduction Advanced Applied Multivariate Analysis STAT 2221, Fall 2013 Sungkyu Jung Department of Statistics University of Pittsburgh E-mail:
More informationData Analysis and Manifold Learning Lecture 2: Properties of Symmetric Matrices and Examples
Data Analysis and Manifold Learning Lecture 2: Properties of Symmetric Matrices and Examples Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inrialpes.fr http://perception.inrialpes.fr/ Outline
More informationFoundations of Computer Vision
Foundations of Computer Vision Wesley. E. Snyder North Carolina State University Hairong Qi University of Tennessee, Knoxville Last Edited February 8, 2017 1 3.2. A BRIEF REVIEW OF LINEAR ALGEBRA Apply
More information1 Principal Components Analysis
Lecture 3 and 4 Sept. 18 and Sept.20-2006 Data Visualization STAT 442 / 890, CM 462 Lecture: Ali Ghodsi 1 Principal Components Analysis Principal components analysis (PCA) is a very popular technique for
More informationDimensionality Reduction and Principle Components
Dimensionality Reduction and Principle Components Ken Kreutz-Delgado (Nuno Vasconcelos) UCSD ECE Department Winter 2012 Motivation Recall, in Bayesian decision theory we have: World: States Y in {1,...,
More informationCS4495/6495 Introduction to Computer Vision. 8B-L2 Principle Component Analysis (and its use in Computer Vision)
CS4495/6495 Introduction to Computer Vision 8B-L2 Principle Component Analysis (and its use in Computer Vision) Wavelength 2 Wavelength 2 Principal Components Principal components are all about the directions
More information(a) If A is a 3 by 4 matrix, what does this tell us about its nullspace? Solution: dim N(A) 1, since rank(a) 3. Ax =
. (5 points) (a) If A is a 3 by 4 matrix, what does this tell us about its nullspace? dim N(A), since rank(a) 3. (b) If we also know that Ax = has no solution, what do we know about the rank of A? C(A)
More informationAdvanced Introduction to Machine Learning CMU-10715
Advanced Introduction to Machine Learning CMU-10715 Principal Component Analysis Barnabás Póczos Contents Motivation PCA algorithms Applications Some of these slides are taken from Karl Booksh Research
More informationProblem Set 1. Homeworks will graded based on content and clarity. Please show your work clearly for full credit.
CSE 151: Introduction to Machine Learning Winter 2017 Problem Set 1 Instructor: Kamalika Chaudhuri Due on: Jan 28 Instructions This is a 40 point homework Homeworks will graded based on content and clarity
More informationDecember 20, MAA704, Multivariate analysis. Christopher Engström. Multivariate. analysis. Principal component analysis
.. December 20, 2013 Todays lecture. (PCA) (PLS-R) (LDA) . (PCA) is a method often used to reduce the dimension of a large dataset to one of a more manageble size. The new dataset can then be used to make
More informationLinear Algebra Methods for Data Mining
Linear Algebra Methods for Data Mining Saara Hyvönen, Saara.Hyvonen@cs.helsinki.fi Spring 2007 Linear Discriminant Analysis Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki Principal
More information5 Linear Algebra and Inverse Problem
5 Linear Algebra and Inverse Problem 5.1 Introduction Direct problem ( Forward problem) is to find field quantities satisfying Governing equations, Boundary conditions, Initial conditions. The direct problem
More informationLecture 7 Spectral methods
CSE 291: Unsupervised learning Spring 2008 Lecture 7 Spectral methods 7.1 Linear algebra review 7.1.1 Eigenvalues and eigenvectors Definition 1. A d d matrix M has eigenvalue λ if there is a d-dimensional
More informationMachine Learning (BSMC-GA 4439) Wenke Liu
Machine Learning (BSMC-GA 4439) Wenke Liu 02-01-2018 Biomedical data are usually high-dimensional Number of samples (n) is relatively small whereas number of features (p) can be large Sometimes p>>n Problems
More informationLecture 2: Linear Algebra Review
EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1
More information1 Feature Vectors and Time Series
PCA, SVD, LSI, and Kernel PCA 1 Feature Vectors and Time Series We now consider a sample x 1,..., x of objects (not necessarily vectors) and a feature map Φ such that for any object x we have that Φ(x)
More informationConcentration Ellipsoids
Concentration Ellipsoids ECE275A Lecture Supplement Fall 2008 Kenneth Kreutz Delgado Electrical and Computer Engineering Jacobs School of Engineering University of California, San Diego VERSION LSECE275CE
More informationCSE 554 Lecture 7: Alignment
CSE 554 Lecture 7: Alignment Fall 2012 CSE554 Alignment Slide 1 Review Fairing (smoothing) Relocating vertices to achieve a smoother appearance Method: centroid averaging Simplification Reducing vertex
More informationDimensionality Reduction and Principal Components
Dimensionality Reduction and Principal Components Nuno Vasconcelos (Ken Kreutz-Delgado) UCSD Motivation Recall, in Bayesian decision theory we have: World: States Y in {1,..., M} and observations of X
More informationEECS 275 Matrix Computation
EECS 275 Matrix Computation Ming-Hsuan Yang Electrical Engineering and Computer Science University of California at Merced Merced, CA 95344 http://faculty.ucmerced.edu/mhyang Lecture 6 1 / 22 Overview
More informationDimension reduction, PCA & eigenanalysis Based in part on slides from textbook, slides of Susan Holmes. October 3, Statistics 202: Data Mining
Dimension reduction, PCA & eigenanalysis Based in part on slides from textbook, slides of Susan Holmes October 3, 2012 1 / 1 Combinations of features Given a data matrix X n p with p fairly large, it can
More informationCOMP6237 Data Mining Covariance, EVD, PCA & SVD. Jonathon Hare
COMP6237 Data Mining Covariance, EVD, PCA & SVD Jonathon Hare jsh2@ecs.soton.ac.uk Variance and Covariance Random Variables and Expected Values Mathematicians talk variance (and covariance) in terms of
More informationMethods for sparse analysis of high-dimensional data, II
Methods for sparse analysis of high-dimensional data, II Rachel Ward May 23, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 47 High dimensional
More informationProbabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More information