Preprocessing & dimensionality reduction


1 Introduction to Data Mining: Preprocessing & dimensionality reduction. CPSC/AMTH 445a/545a, Guy Wolf, Yale University, Fall 2016.

2 Outline
1. Preprocessing for data simplification: sampling, aggregation, discretization, density estimation, dimensionality reduction
2. Principal component analysis (PCA): autoencoder, variance maximization, singular value decomposition (SVD)
3. Multidimensional scaling (MDS): Gram matrix, double-centering, stress function

3 Preprocessing for data simplification: Sampling. Select a subset of representative data points instead of processing the entire data. A sampled subset is useful only if its analysis yields the same patterns, results, conclusions, etc., as the analysis of the entire data. [Figure: the same point cloud subsampled to 2000 points and to 500 points]

4 Preprocessing for data simplification: Sampling. Select a subset of representative data points instead of processing the entire data. Common sampling approaches: Random - an equal probability of selecting any particular item. Without replacement - iteratively select and remove items. With replacement - selected items remain in the population. Stratified - draw random samples from each partition. Choosing a sufficient sample size is often crucial for effective sampling.
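A minimal sketch of these sampling schemes in Python with numpy (the arrays `X` and `labels` are made-up stand-ins for a data matrix and its strata):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))              # toy data: 10k points, 5 attributes
labels = rng.integers(0, 3, size=len(X))      # toy strata/cluster labels

# Random sampling without replacement: select & remove items.
sample_wo = X[rng.choice(len(X), size=500, replace=False)]

# Random sampling with replacement: selected items remain in the population.
sample_w = X[rng.choice(len(X), size=500, replace=True)]

# Stratified sampling: draw a random sample from each partition (stratum).
per_stratum = 100
strat_idx = np.concatenate([
    rng.choice(np.flatnonzero(labels == c), size=per_stratum, replace=False)
    for c in np.unique(labels)
])
sample_strat = X[strat_idx]
```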

5 Preprocessing for data simplification: Sampling example. Choose enough samples to guarantee that at least one representative is selected from each distinct group/cluster/profile in the data.

6 Preprocessing for data simplification: Aggregation. Instead of sampling representative data points, we can coarse-grain the data by aggregating attributes or data points together. Aggregation: combining several attributes into a single feature, or several data points into a single observation. Examples: change monthly revenues to annual revenues; analyze neighborhoods instead of houses; provide the average rating of a season (not per episode).
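As a small illustration (assuming pandas is available; the table and column names are hypothetical), aggregating monthly revenues into annual revenues per store is a single groupby:

```python
import pandas as pd

# Hypothetical monthly revenue table (column names are illustrative).
monthly = pd.DataFrame({
    "store":   ["A", "A", "A", "B", "B", "B"],
    "month":   [1, 2, 3, 1, 2, 3],
    "revenue": [100.0, 120.0, 90.0, 200.0, 180.0, 210.0],
})

# Aggregate data points: monthly revenues -> total revenue per store.
annual = monthly.groupby("store", as_index=False)["revenue"].sum()
print(annual)
#   store  revenue
# 0     A    310.0
# 1     B    590.0
```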

7 Preprocessing for data simplification: Discretization. It is sometimes convenient to transform the entire data to nominal (or ordinal) attributes. Discretization: transformation of continuous attributes (or ones with an infinite range) to discrete ones with a finite range. Discretization can be done in a supervised manner (e.g., using class labels) or in an unsupervised manner (e.g., using clustering).

8 Preprocessing for data simplification: Discretization. Supervised discretization based on minimizing impurity. [Figure: the same data discretized with 3 values per axis and with 5 values per axis]

9 Preprocessing for data simplification: Discretization. [Figure: examples of unsupervised discretization]
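A sketch of unsupervised discretization in numpy, using equal-width and equal-frequency binning (the bin count is an arbitrary choice; a clustering-based variant would instead bin by k-means centers on the attribute):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)                     # a continuous attribute

# Equal-width discretization into 5 ordinal values.
edges = np.linspace(x.min(), x.max(), num=5 + 1)
x_width = np.digitize(x, edges[1:-1])         # values in {0, 1, 2, 3, 4}

# Equal-frequency (quantile) discretization into 5 ordinal values.
quantiles = np.quantile(x, np.linspace(0, 1, num=5 + 1))
x_freq = np.digitize(x, quantiles[1:-1])      # roughly equal-sized bins
```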

10 Preprocessing for data simplification: Density estimation. Transforming attributes from raw values to densities can be used to coarse-grain the data and bring its features to comparable scales between zero and one.

11 Preprocessing for data simplification: Density estimation. Transforming attributes from raw values to densities can be used to coarse-grain the data and bring its features to comparable scales between zero and one. [Figure: cell-based density estimation]

12 Preprocessing for data simplification: Density estimation. Transforming attributes from raw values to densities can be used to coarse-grain the data and bring its features to comparable scales between zero and one. [Figure: center-based density estimation]
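Both variants can be sketched in a few lines of numpy (grid resolution and radius are arbitrary choices for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))                # toy 2-D data

# Cell-based density: count points per grid cell, assign each point its cell's count.
bins = 20
hist, xedges, yedges = np.histogram2d(X[:, 0], X[:, 1], bins=bins)
ix = np.clip(np.digitize(X[:, 0], xedges[1:-1]), 0, bins - 1)
iy = np.clip(np.digitize(X[:, 1], yedges[1:-1]), 0, bins - 1)
cell_density = hist[ix, iy] / hist.max()      # rescaled to the [0, 1] range

# Center-based density: fraction of points within radius r of each point.
r = 0.5
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
center_density = (d2 <= r**2).mean(axis=1)    # already between 0 and 1
```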

13 Preprocessing for data simplification: Dimensionality reduction. The dimensionality of data is generally determined by the number of attributes or features that represent each data point. Curse of dimensionality: a general term for various phenomena that arise when analyzing and processing high-dimensional data. Common theme: statistical significance is difficult, impractical, or even impossible to obtain due to the sparsity of the data in high dimensions, which causes poor performance of classical statistical methods compared to low-dimensional data. Common solution: reduce the dimensionality of the data as part of its (pre)processing.
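A quick numerical illustration of the sparsity issue (an added example, not from the slides): for uniformly random points, the relative contrast between the nearest and farthest pairwise distances shrinks as the dimension grows.

```python
import numpy as np

def pairwise_dist(X):
    # Euclidean distances via inner products, avoiding a huge (N, N, d) array.
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.sqrt(np.maximum(d2, 0))

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    X = rng.uniform(size=(500, d))
    D = pairwise_dist(X)
    off = D[~np.eye(len(X), dtype=bool)]            # off-diagonal distances
    contrast = (off.max() - off.min()) / off.min()  # relative spread of distances
    print(f"dim={d:5d}  relative contrast={contrast:.3f}")
```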

14 Preprocessing for data simplification: Dimensionality reduction. There are several approaches to represent the data in a lower dimension, which can generally be split into two types. Feature selection/weighting - select a subset of existing features and only use them in the analysis, while possibly also assigning them importance weights to eliminate redundant information. Feature extraction/construction - create new features by extracting relevant information from the original features. PCA and MDS are two of the most common dimensionality reduction methods in data analysis, but many others exist as well.

15 Preprocessing for data simplification: Feature subset selection. Ideally, choose the best feature subset out of all possible combinations; this is impractical, since there are $2^n$ choices for $n$ attributes. Feature selection approaches: Embedded methods - choose the best features for a task as part of the data mining algorithm (e.g., decision trees). Filter methods - choose features that optimize a general criterion (e.g., minimal correlation) as part of data preprocessing, using an efficient search algorithm. Wrapper methods - first formulate and handle a data mining task to select features, and then use the resulting subset to solve the real task. Alternatively, expert knowledge can sometimes be used to eliminate redundant and unnecessary features.
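A toy filter-method sketch (the greedy correlation criterion here is one possible choice, not the slides' prescription): keep features whose absolute correlation with already-selected features stays below a threshold.

```python
import numpy as np

def filter_select(X, k, max_corr=0.9):
    """Greedy filter: keep up to k features, skipping any feature whose absolute
    correlation with an already-selected feature exceeds max_corr."""
    C = np.abs(np.corrcoef(X, rowvar=False))
    order = np.argsort(-X.var(axis=0))        # consider high-variance features first
    selected = []
    for j in order:
        if all(C[j, s] <= max_corr for s in selected):
            selected.append(j)
        if len(selected) == k:
            break
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
X[:, 5] = X[:, 0] + 0.01 * rng.normal(size=300)   # features 0 and 5 nearly identical
print(filter_select(X, k=5))                      # only one of features 0 and 5 is kept
```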

16 Principal Component Analysis

19 [Figure] Assume: the data has zero mean. Find: the best $k$-dimensional projection.

20 Projection on principal components. [Figure: data points and their principal components]

21 Projection on principal components. [Figure: data in 3D space projected onto the first principal component $\lambda_1 \varphi_1$, yielding a 1D representation]

22-25 What is the best projection?
Find a subspace $S \subseteq \mathbb{R}^n$ s.t. $\dim(S) = k$ and the data is well approximated by $\hat{x} = \mathrm{proj}_S\, x$.
Equivalently: find a subspace $S \subseteq \mathbb{R}^n$ s.t. $S = \mathrm{span}\{u_1, \dots, u_k\}$ and $\|x - \hat{x}\|$ is minimal over the data, with $\hat{x} = \mathrm{proj}_S\, x$.
Equivalently: find $k$ vectors $u_1, \dots, u_k$ s.t. $\frac{1}{N}\sum_{i=1}^{N} \|x_i - \hat{x}_i\|^2$ is minimal, with $\hat{x}_i = \mathrm{proj}_{\mathrm{span}\{u_1, \dots, u_k\}}\, x_i$.
How do we find these vectors $u_1, \dots, u_k$?

26 Autoencoder. Minimize $\frac{1}{N}\sum_{i=1}^{N} \|x_i - \hat{x}_i\|^2$ s.t. $\hat{x} = \mathrm{proj}_{\mathrm{span}\{u_1, \dots, u_k\}}\, x$. [Figure: a linear autoencoder with input layer $x[1], \dots, x[5]$, hidden layer $h[1], h[2], h[3]$, and output layer $\hat{x}[1], \dots, \hat{x}[5]$], where $h_i = W x_i$ and $\hat{x}_i = U h_i$. The resulting optimization is $\arg\min_{W \in \mathbb{R}^{k \times n},\, U \in \mathbb{R}^{n \times k}} \sum_{i=1}^{N} \|x_i - U W x_i\|^2$.
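A sketch of this linear autoencoder trained with plain gradient descent in numpy (layer sizes, learning rate, and iteration count are arbitrary choices; with enough iterations the learned subspace should approach the PCA subspace):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, N = 5, 2, 1000
# Toy centered data that lies (approximately) in a 2-D subspace of R^5.
X = rng.normal(size=(N, k)) @ rng.normal(size=(k, n)) + 0.05 * rng.normal(size=(N, n))
X -= X.mean(axis=0)

W = 0.1 * rng.normal(size=(k, n))    # encoder:  h_i = W x_i
U = 0.1 * rng.normal(size=(n, k))    # decoder:  x_hat_i = U h_i
lr = 0.01
for _ in range(3000):
    H = X @ W.T                      # hidden codes, shape (N, k)
    R = H @ U.T - X                  # reconstruction residuals, shape (N, n)
    grad_U = 2 * R.T @ H / N         # gradient of the mean squared error w.r.t. U
    grad_W = 2 * (R @ U).T @ X / N   # gradient of the mean squared error w.r.t. W
    U -= lr * grad_U
    W -= lr * grad_W

err = np.mean(np.sum((X @ W.T @ U.T - X) ** 2, axis=1))
print("mean reconstruction error:", err)   # should approach the 2-D PCA error
```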

27 Reconstruction error minimization. We only need to consider orthonormal vectors $u_1, \dots, u_k$ (i.e., $\|u_i\| = 1$ and $\langle u_i, u_j \rangle = 0$ for $i \neq j$) that form a basis for the subspace. We can then extend this set to a basis $u_1, \dots, u_n$ of the entire $\mathbb{R}^n$. Then we can write $x = \sum_{j=1}^{n} \langle x, u_j \rangle u_j = \sum_{j=1}^{n} u_j u_j^T x$ and $\mathrm{proj}_{\mathrm{span}\{u_1, \dots, u_k\}}\, x = \sum_{j=1}^{k} u_j u_j^T x$. We now consider the reconstruction error $\frac{1}{N}\sum_{i=1}^{N} \|x_i - \hat{x}_i\|^2$.

28 Reconstruction error minimization. First, notice that $x - \hat{x} = \sum_{j=1}^{n} u_j u_j^T x - \sum_{j=1}^{k} u_j u_j^T x = \sum_{j=k+1}^{n} u_j u_j^T x$, so
$\|x - \hat{x}\|^2 = \sum_{q=1}^{n} \Big( \sum_{j=k+1}^{n} u_j[q]\, u_j^T x \Big)^2 = \sum_{j=k+1}^{n} \sum_{j'=k+1}^{n} \Big( \sum_{q=1}^{n} u_j[q]\, u_{j'}[q] \Big) (u_j^T x)(u_{j'}^T x) = \sum_{j=k+1}^{n} (u_j^T x)^2 = \sum_{j=1}^{n} (u_j^T x)^2 - \sum_{j=1}^{k} (u_j^T x)^2 = \|x\|^2 - \|\hat{x}\|^2$.
Minimizing the reconstruction error is therefore equivalent to maximizing $\frac{1}{N}\sum_{i=1}^{N} \|\hat{x}_i\|^2 = \sum_{j=1}^{k} \frac{1}{N}\sum_{i=1}^{N} (u_j^T x_i)^2 = \sum_{j=1}^{k} \mathrm{variance}(u_j^T x)$.

29-33 Variance maximization. Find a direction that maximizes the variance in the projected data, i.e., find a unit vector $u \in \mathbb{R}^n$ that maximizes
$\mathrm{variance}(u^T x) = \frac{1}{N}\sum_{i=1}^{N} (u^T x_i)^2 = \frac{1}{N}\sum_{i=1}^{N} (u^T x_i)(x_i^T u) = u^T \Big( \frac{1}{N}\sum_{i=1}^{N} x_i x_i^T \Big) u = u^T \Sigma u$,
where $\Sigma$ is the covariance matrix. This leads to the maximization problem: maximize $u^T \Sigma u$ s.t. $\|u\| = 1$.

34 Variance maximization. Solve the maximization problem: maximize $u^T \Sigma u$ s.t. $\|u\| = 1$. Apply the Lagrange multipliers method: $f(u, \alpha) = u^T \Sigma u + \alpha(1 - u^T u)$, $\nabla_u f(u, \alpha) = 2(\Sigma u - \alpha u)$, and $\nabla_u f(u, \alpha) = 0 \Rightarrow \Sigma u = \alpha u$. Therefore, $u$ is an eigenvector of $\Sigma$ with eigenvalue $\alpha$, which has to be the maximal eigenvalue in order to maximize $u^T \Sigma u = \alpha$.

35 Variance maximization. Similarly, a second direction is found via: maximize $u_2^T \Sigma u_2$ s.t. $\|u_2\| = 1$ and $\langle u_2, u_1 \rangle = 0$. Apply the Lagrange multipliers method: $f(u_2, \alpha, \beta) = u_2^T \Sigma u_2 + \alpha(1 - u_2^T u_2) - \beta u_2^T u_1$ and $\nabla_{u_2} f(u_2, \alpha, \beta) = 2(\Sigma u_2 - \alpha u_2) - \beta u_1$. Requiring $\langle u_1, \nabla_{u_2} f(u_2, \alpha, \beta) \rangle = 0$ gives $\beta = 0$, and then $\nabla_{u_2} f(u_2, \alpha, \beta) = 0$ gives $\Sigma u_2 = \alpha u_2$. Therefore, $u_2$ is an eigenvector of $\Sigma$ with the second largest eigenvalue.

36 Eigendecomposition and SVD. [Figure: the covariance matrix (features by features) computed from the data matrix (data points by features)], with entries $\mathrm{cov}(q_1, q_2) = \sum_i x_i[q_1]\, x_i[q_2]$.

37 Eigendecomposition and SVD. [Figure: the eigendecomposition of the covariance matrix, $\Sigma\, \varphi_i = \lambda_i\, \varphi_i$, with eigenvalue $\lambda_i$ and eigenvector $\varphi_i$]

38 Eigendecomposition and SVD. The spectral theorem applies to covariance matrices. [Figure: $\Sigma\, \varphi_i = \lambda_i\, \varphi_i$; the covariance matrix factored by SVD (singular value decomposition) into singular vectors and singular values.] Spectral theorem: $\mathrm{cov}(q_1, q_2) = \sum_i \lambda_i\, \varphi_i[q_1]\, \varphi_i[q_2]$.

39 Singular value decomposition. Any matrix $M \in \mathbb{R}^{n \times k}$ can be decomposed as $(U, S, V) = \mathrm{SVD}(M)$ with $M = U S V^T$, where $U$ is $n \times n$ orthogonal, $S$ is $n \times k$ diagonal, and $V$ is $k \times k$ orthogonal. The singular values in $S$ are the square roots of the (nonnegative) eigenvalues of both $M M^T$ and $M^T M$. The singular vectors in (the columns of) $U$ are the eigenvectors of $M M^T$, and the singular vectors in (the columns of) $V$ are the eigenvectors of $M^T M$. Proofs and more details about the SVD can be found on Wikipedia.
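These relations can be checked numerically with numpy (note that `np.linalg.svd` returns the singular values as a vector and $V^T$ rather than $V$):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(6, 4))

U, s, Vt = np.linalg.svd(M)               # M = U @ S @ Vt
S = np.zeros_like(M)                      # embed the singular values in an n x k diagonal
np.fill_diagonal(S, s)
assert np.allclose(U @ S @ Vt, M)

# Singular values are the square roots of the eigenvalues of M M^T and M^T M.
eig_MMt = np.sort(np.linalg.eigvalsh(M @ M.T))[::-1]
eig_MtM = np.sort(np.linalg.eigvalsh(M.T @ M))[::-1]
assert np.allclose(np.sqrt(eig_MtM), s)
assert np.allclose(np.sqrt(np.clip(eig_MMt[:4], 0, None)), s)
```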

40 Singular value decomposition. [Figure: eigenvalues $\lambda_1 \geq \lambda_2 \geq \lambda_3 \geq \lambda_4 \geq \lambda_5 \geq \dots$] A decaying covariance spectrum reveals (low) dimensionality.

41 Singular value decomposition. [Figure: the covariance matrix factored into eigenvectors (the principal components) and eigenvalues.] The covariance matrix can be approximated by a truncated SVD.

42-44 Trivial example. Consider the simple case of data points that all lie on the same high-dimensional line. The straight line is defined by a unit vector $\psi$ ($\|\psi\| = 1$), points on the line are defined by multiplying $\psi$ by scalars, and so the points can be formulated as $x_i = c_i \psi$. Covariance:
$\mathrm{cov}(t_1, t_2) = \sum_i x_i[t_1]\, x_i[t_2] = \sum_i c_i \psi[t_1]\, c_i \psi[t_2] = \Big( \sum_i c_i^2 \Big)\, \psi[t_1]\, \psi[t_2] = \|c\|^2\, \psi[t_1]\, \psi[t_2]$, where $c = (c_1, c_2, \dots)$.
[Figure: the resulting covariance matrix is the rank-one matrix $\|c\|^2\, \psi \psi^T$.]
The covariance matrix has a single (nonzero) eigenvalue $\|c\|^2$ and a single eigenvector $\psi$, which defines the principal direction of the data-point vectors.
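The trivial example is easy to verify numerically (a sketch; the unit vector $\psi$ and the scalars $c_i$ are drawn at random):

```python
import numpy as np

rng = np.random.default_rng(0)
psi = rng.normal(size=5)
psi /= np.linalg.norm(psi)                # unit vector defining the line
c = rng.normal(size=200)                  # scalar coefficients
X = np.outer(c, psi)                      # points x_i = c_i * psi

Sigma = X.T @ X                           # covariance as on the slide: ||c||^2 psi psi^T
vals, vecs = np.linalg.eigh(Sigma)
print(np.round(vals, 6))                                 # a single nonzero eigenvalue, ~ ||c||^2
print(np.allclose(np.abs(vecs[:, -1]), np.abs(psi)))     # its eigenvector is +/- psi
```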

45-48 Trivial example. [Figures: the data in 3D space lies along a line; its first eigenvector $\varphi_1 = \psi$ recovers the direction of that line. More generally, the principal components $\lambda_1 \varphi_1, \lambda_2 \varphi_2, \dots$ are the maximum-variance directions: their directions are given by the eigenvectors and their lengths by the eigenvalues.]

49 PCA algorithm: 1. Centering; 2. Covariance; 3. SVD (or eigendecomposition); 4. Projection. Alternative method: multidimensional scaling (MDS) - preserve distances/inner products with a minimal set of coordinates.
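A compact sketch of these four steps via the SVD of the centered data (in practice one would usually call a library implementation such as scikit-learn's PCA; the toy data below is an assumption of the example):

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal components.
    Steps: 1) centering, 2) covariance (implicitly, through the SVD of the
    centered data), 3) SVD, 4) projection."""
    Xc = X - X.mean(axis=0)                              # 1) centering
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)    # 3) SVD of the centered data
    components = Vt[:k]                                  # eigenvectors of the covariance
    explained_var = s[:k] ** 2 / len(X)                  # eigenvalues of the covariance
    Y = Xc @ components.T                                # 4) projection
    return Y, components, explained_var

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ np.diag([3.0, 1.0, 0.1])   # anisotropic toy data
Y, comps, ev = pca(X, k=2)
print(ev)          # two dominant variances; the third direction is discarded
```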

50 Multidimensional Scaling

51 Multidimensional scaling. What if we cannot compute a covariance matrix? Consider a $k$-dimensional rigid body: all we need to know are the distances between its parts. We can ignore its position and orientation and find the most efficient way to place it in $\mathbb{R}^k$.

52 Multidimensional scaling. Given the distance matrix
$D = \begin{pmatrix} 0 & \cdots & d_{1j} & \cdots & d_{1m} \\ \vdots & & \vdots & & \vdots \\ d_{i1} & \cdots & 0 & \cdots & d_{im} \\ \vdots & & \vdots & & \vdots \\ d_{m1} & \cdots & d_{mj} & \cdots & 0 \end{pmatrix}$,
find $\{ y_1, \dots, y_m \in \mathbb{R}^k : \|y_i - y_j\| = d_{ij} = \|x_i - x_j\| \}$. Multidimensional scaling: given an $m \times m$ matrix $D$ of distances between $m$ objects, find $k$-dimensional coordinates that preserve these distances.

53 Multidimensional scaling: Gram matrix. A distance matrix is not convenient to embed directly in $\mathbb{R}^k$, but embedding inner products is a simpler task. Gram matrix: a matrix $G$ that contains the inner products $g_{ij} = \langle x_i, x_j \rangle$ is a Gram matrix. Using the spectral theorem we can decompose $G = \Phi \Lambda \Phi^T$ and get $\langle x_i, x_j \rangle = g_{ij} = \sum_{q=1}^{m} \lambda_q\, \Phi[i, q]\, \Phi[j, q] = \langle \Phi[i, \cdot]\, \Lambda^{1/2},\ \Phi[j, \cdot]\, \Lambda^{1/2} \rangle$. Similar to PCA, we can truncate small eigenvalues and use the $k$ biggest eigenpairs.

54-56 Multidimensional scaling: Spectral embedding. [Figure: $G$ decomposed into eigenvalues $\lambda_1 \geq \lambda_2 \geq \lambda_3 \geq \cdots \geq \lambda_k > 0$ and eigenvectors $\varphi_1, \varphi_2, \varphi_3, \dots, \varphi_k$.] The embedding is $x \mapsto \Phi(x) = [\lambda_1^{1/2} \varphi_1(x),\ \lambda_2^{1/2} \varphi_2(x),\ \lambda_3^{1/2} \varphi_3(x),\ \dots,\ \lambda_k^{1/2} \varphi_k(x)]^T$.
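The truncated spectral embedding from a Gram matrix, sketched in numpy (`k` selects the largest positive eigenpairs):

```python
import numpy as np

def embed_from_gram(G, k):
    """Spectral embedding: x -> [sqrt(l_1) phi_1(x), ..., sqrt(l_k) phi_k(x)]."""
    vals, vecs = np.linalg.eigh(G)                # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:k]              # k largest eigenpairs
    lam = np.clip(vals[idx], 0, None)             # guard against tiny negative values
    return vecs[:, idx] * np.sqrt(lam)            # rows are the embedded coordinates

# Sanity check: the Gram matrix of known 3-D points reproduces their geometry.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Y = embed_from_gram(X @ X.T, k=3)
assert np.allclose(Y @ Y.T, X @ X.T)              # inner products are preserved
```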

57-60 Multidimensional scaling: Double-centering. Notice that given a distance metric that is equivalent to Euclidean distances, we can write $\|x - y\|^2 = \|x\|^2 + \|y\|^2 - 2\langle x, y \rangle$. But then:
$\mathrm{mean}_x(\|x - y\|^2) = \overline{\|z\|^2} + \|y\|^2 - 2\langle z, y \rangle$
$\mathrm{mean}_y(\|x - y\|^2) = \overline{\|z\|^2} + \|x\|^2 - 2\langle x, z \rangle$
$\mathrm{mean}_{x,y}(\|x - y\|^2) = 2\,\overline{\|z\|^2} - 2\langle z, z \rangle$
where $z$ and $\overline{\|z\|^2}$ are the mean and the mean squared norm of the data.

61 Multidimensional scaling: Double-centering. Thus, if we set $g(x, y) = -\frac{1}{2}\big( \|x - y\|^2 - \mathrm{mean}_x(\|x - y\|^2) - \mathrm{mean}_y(\|x - y\|^2) + \mathrm{mean}_{x,y}(\|x - y\|^2) \big)$ we get a Gram matrix, since $g(x, y) = \big( \langle x, y \rangle - \langle x, z \rangle \big) - \big( \langle z, y \rangle - \langle z, z \rangle \big) = \langle x - z,\ y - z \rangle$. Therefore, we can compute $G = -\frac{1}{2} J D^{(2)} J$, where $J = \mathrm{Id} - \frac{1}{m} \mathbf{1} \mathbf{1}^T$.
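The double-centering formula takes two lines in numpy; the sanity check below (a sketch) confirms that applying it to Euclidean distances recovers the Gram matrix of the centered points:

```python
import numpy as np

def gram_from_distances(D):
    """Double-centering: G = -1/2 * J D^(2) J, with J = I - (1/m) 1 1^T."""
    m = len(D)
    J = np.eye(m) - np.ones((m, m)) / m
    return -0.5 * J @ (D ** 2) @ J

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # Euclidean distances
Xc = X - X.mean(axis=0)
assert np.allclose(gram_from_distances(D), Xc @ Xc.T)        # Gram matrix of centered data
```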

62 Multidimensional scaling: Classic MDS. Classic MDS is computed with the following algorithm: 1. Formulate squared distances; 2. Build the Gram matrix by double-centering; 3. SVD (or eigendecomposition); 4. Assign coordinates based on the eigenvalues and eigenvectors. Exercise: show that for centered data in Euclidean space this embedding is identical to PCA.
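For the exercise, here is a self-contained numerical check (not a proof) that classical MDS on the Euclidean distances of centered data agrees with the PCA projection up to the sign of each coordinate:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X -= X.mean(axis=0)                                   # centered data in Euclidean space
k = 2

# Classical MDS: squared distances -> double-centering -> eigendecomposition -> coordinates.
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
J = np.eye(len(X)) - np.ones((len(X), len(X))) / len(X)
G = -0.5 * J @ D2 @ J
vals, vecs = np.linalg.eigh(G)
idx = np.argsort(vals)[::-1][:k]
Y_mds = vecs[:, idx] * np.sqrt(vals[idx])

# PCA: project onto the top-k right singular vectors of the centered data.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Y_pca = X @ Vt[:k].T

# The two embeddings agree up to the sign of each coordinate.
for j in range(k):
    assert np.allclose(np.abs(Y_mds[:, j]), np.abs(Y_pca[:, j]), atol=1e-8)
```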

63 Multidimensional scaling: Stress function. What if we are not given a distance metric, but just dissimilarities? Stress function: a function that quantifies the disagreement between given dissimilarities and embedded Euclidean distances. Examples of stress functions:
Metric MDS stress: $\dfrac{\sum_{i<j} (\hat{d}_{ij} - f(d_{ij}))^2}{\sum_{i<j} d_{ij}^2}$, where $f$ is a predetermined monotonically increasing function.
Kruskal's stress-1: $\dfrac{\sum_{i<j} (\hat{d}_{ij} - f(d_{ij}))^2}{\sum_{i<j} \hat{d}_{ij}^2}$, where $f$ is optimized, but still monotonically increasing.
Sammon's stress: $\big( \sum_{i<j} d_{ij} \big)^{-1} \sum_{i<j} \dfrac{(\hat{d}_{ij} - d_{ij})^2}{d_{ij}}$.
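A sketch of evaluating these stress functions for a candidate embedding (here the fitting function `f` defaults to the identity, which is an assumption made for the illustration):

```python
import numpy as np

def pairwise(X):
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def stresses(D, Y, f=lambda d: d):
    """Metric MDS stress, Kruskal-style stress-1 (squared form), and Sammon's stress
    for dissimilarities D and embedded points Y, using a fitting function f."""
    Dhat = pairwise(Y)
    iu = np.triu_indices(len(D), k=1)             # use each pair (i < j) once
    d, dh = f(D[iu]), Dhat[iu]
    metric_stress  = np.sum((dh - d) ** 2) / np.sum(D[iu] ** 2)
    kruskal_stress = np.sum((dh - d) ** 2) / np.sum(dh ** 2)
    sammon_stress  = np.sum((dh - d) ** 2 / d) / np.sum(d)
    return metric_stress, kruskal_stress, sammon_stress

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
D = pairwise(X)
Y = X[:, :2]                                      # a crude 2-D "embedding" for illustration
print(stresses(D, Y))
```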

64 Multidimensional scaling: Non-metric MDS. Non-metric, or non-classical, MDS is computed by the following algorithm: 1. Formulate a dissimilarity matrix $D$. 2. Find an initial configuration (e.g., using classical MDS) with distance matrix $\hat{D}$. 3. Minimize $\mathrm{STRESS}_D(f, \hat{D})$ by optimizing the fitting function $f$. 4. Minimize $\mathrm{STRESS}_D(f, \hat{D})$ by optimizing the configuration and the resulting $\hat{D}$. 5. Iterate the previous two steps until the stress is lower than a stopping threshold.
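If scikit-learn is available, this kind of iterative stress minimization is implemented by `sklearn.manifold.MDS` with `metric=False`; a rough usage sketch (parameter values are arbitrary, and exact defaults vary between versions):

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # dissimilarity matrix

nmds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
           n_init=4, max_iter=300, random_state=0)
Y = nmds.fit_transform(D)
print(Y.shape, nmds.stress_)          # embedded coordinates and the final stress value
```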

65 Summary. Preprocessing steps are crucial in preparing data for meaningful analysis. Linear dimensionality reduction for alleviating the curse of dimensionality: PCA projects the data on the leading eigenvectors of the covariance matrix; MDS embeds the data using the leading eigenvalues of a Gram matrix and the entries of the corresponding eigenvectors. In both cases, SVD is used in practice instead of eigendecomposition. Nonlinear dimensionality reduction will be covered later in the semester.
