Machine Learning. B. Unsupervised Learning B.2 Dimensionality Reduction. Lars Schmidt-Thieme, Nicolas Schilling


1 Machine Learning
B. Unsupervised Learning
B.2 Dimensionality Reduction
Lars Schmidt-Thieme, Nicolas Schilling
Information Systems and Machine Learning Lab (ISMLL), Institute for Computer Science, University of Hildesheim, Germany

2 Outline
2. Non-linear Dimensionality Reduction
3. Supervised Dimensionality Reduction

3 Outline
2. Non-linear Dimensionality Reduction
3. Supervised Dimensionality Reduction

4 The Dimensionality Reduction Problem
Given
- a set $\mathcal{X}$ called data space, e.g., $\mathcal{X} := \mathbb{R}^m$,
- a set $X \subseteq \mathcal{X}$ called data,
- a function $D : \bigcup_{X \subseteq \mathcal{X},\, K \in \mathbb{N}} (\mathbb{R}^K)^X \to \mathbb{R}^+_0$ called distortion, where $D(P)$ measures how bad a low-dimensional representation $P : X \to \mathbb{R}^K$ for a data set $X \subseteq \mathcal{X}$ is, and
- a number $K \in \mathbb{N}$ of latent dimensions,
find a low-dimensional representation $P : X \to \mathbb{R}^K$ with $K$ dimensions with minimal distortion $D(P)$.

5 Distortions for Dimensionality Reduction (1/2)
Let $d_X$ be a distance on $\mathcal{X}$ and $d_Z$ be a distance on the latent space $\mathbb{R}^K$, usually just the Euclidean distance
$d_Z(v, w) := \|v - w\|_2 = \left( \sum_{i=1}^K (v_i - w_i)^2 \right)^{\frac{1}{2}}$
Multidimensional scaling aims to find latent representations $P$ that reproduce the distance measure $d_X$ as well as possible:
$D(P) := \frac{2}{|X|(|X|-1)} \sum_{x, x' \in X,\, x \neq x'} \left( d_X(x, x') - d_Z(P(x), P(x')) \right)^2 = \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j=1}^{i-1} \left( d_X(x_i, x_j) - \|z_i - z_j\| \right)^2, \quad z_i := P(x_i)$
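As a hedged illustration (not part of the slides), the MDS distortion can be computed directly from pairwise distances; the function name is a placeholder.

```python
import numpy as np

def mds_distortion(X, Z):
    """MDS distortion: mean squared mismatch between pairwise distances
    in data space X (n x m) and latent space Z (n x K)."""
    n = X.shape[0]
    d_X = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d_Z = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    i, j = np.tril_indices(n, k=-1)          # all pairs (x_i, x_j) with j < i
    return 2.0 / (n * (n - 1)) * np.sum((d_X[i, j] - d_Z[i, j]) ** 2)
```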

6 Distortions for Dimensionality Reduction (2/2)
Feature reconstruction methods aim to find latent representations $P$ and reconstruction maps $r : \mathbb{R}^K \to \mathcal{X}$ from a given class of maps that reconstruct the features as well as possible:
$D(P, r) := \frac{1}{|X|} \sum_{x \in X} d_X(x, r(P(x))) = \frac{1}{n} \sum_{i=1}^{n} d_X(x_i, r(z_i)), \quad z_i := P(x_i)$
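A hedged sketch of this distortion for Euclidean $d_X$ (the names are illustrative; $r$ is any map from the latent space back to the data space):

```python
import numpy as np

def reconstruction_distortion(X, Z, r):
    """Mean reconstruction error: average distance between each x_i
    and its reconstruction r(z_i)."""
    X_hat = np.array([r(z) for z in Z])
    return np.mean(np.linalg.norm(X - X_hat, axis=1))
```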

7 PCA: Intuition
We want to find an orthogonal basis that represents directions of maximum variance.

8 PCA: Intuition
We want to find an orthogonal basis that represents directions of maximum variance.
The red line indicates the direction of maximal variance within the data.

9 PCA: Intuition
We want to find an orthogonal basis that represents directions of maximum variance.
The red line indicates the direction of maximal variance within the data.
The blue line represents the direction of maximal variance given that it has to be orthogonal to the red line.

10 PCA: Task
We assume that the data has zero mean: $\sum_{i=1}^{n} x_i = 0$.
Then PCA can be accomplished by finding the top $K$ eigenvalues (and corresponding eigenvectors) of the covariance matrix $X^T X$ of the data!
The mean has to be zero to account for different units in the data (degrees C vs. degrees F).
PCA can also be accomplished through a singular value decomposition of the data matrix $X$, as we will see.

11 Singular Value Decomposition (SVD)
Theorem (Existence of SVD). For every $A \in \mathbb{R}^{n \times m}$ there exist matrices
$U \in \mathbb{R}^{n \times k}$, $V \in \mathbb{R}^{m \times k}$, $\Sigma := \mathrm{diag}(\sigma_1, \ldots, \sigma_k) \in \mathbb{R}^{k \times k}$, $k := \min\{n, m\}$,
$\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > \sigma_{r+1} = \cdots = \sigma_k = 0$, $r := \mathrm{rank}(A)$,
$U, V$ orthonormal, i.e., $U^T U = I$, $V^T V = I$,
with $A = U \Sigma V^T$.
The $\sigma_i$ are called singular values of $A$.
Note: $I := \mathrm{diag}(1, \ldots, 1) \in \mathbb{R}^{k \times k}$ denotes the unit matrix.

12 Singular Value Decomposition (SVD; 2/2)
It holds:
a) $\sigma_i^2$ are eigenvalues and $V_i$ eigenvectors of $A^T A$: $(A^T A) V_i = \sigma_i^2 V_i$, $i = 1, \ldots, k$, $V = (V_1, \ldots, V_k)$
b) $\sigma_i^2$ are eigenvalues and $U_i$ eigenvectors of $A A^T$: $(A A^T) U_i = \sigma_i^2 U_i$, $i = 1, \ldots, k$, $U = (U_1, \ldots, U_k)$

13 Singular Value Decomposition (SVD; 2/2)
It holds:
a) $\sigma_i^2$ are eigenvalues and $V_i$ eigenvectors of $A^T A$: $(A^T A) V_i = \sigma_i^2 V_i$, $i = 1, \ldots, k$, $V = (V_1, \ldots, V_k)$
b) $\sigma_i^2$ are eigenvalues and $U_i$ eigenvectors of $A A^T$: $(A A^T) U_i = \sigma_i^2 U_i$, $i = 1, \ldots, k$, $U = (U_1, \ldots, U_k)$
Proof:
a) $(A^T A) V_i = V \Sigma^T U^T U \Sigma V^T V_i = V \Sigma^2 e_i = \sigma_i^2 V_i$
b) $(A A^T) U_i = U \Sigma V^T V \Sigma^T U^T U_i = U \Sigma^2 e_i = \sigma_i^2 U_i$

14 Truncated SVD
Let $A \in \mathbb{R}^{n \times m}$ and $U \Sigma V^T = A$ its SVD. Then for $k \leq \min\{n, m\}$ the decomposition
$A \approx \tilde{U} \tilde{\Sigma} \tilde{V}^T$
with
$\tilde{U} := (U_{\cdot,1}, \ldots, U_{\cdot,k}), \quad \tilde{V} := (V_{\cdot,1}, \ldots, V_{\cdot,k}), \quad \tilde{\Sigma} := \mathrm{diag}(\sigma_1, \ldots, \sigma_k)$
is called the truncated SVD with rank $k$.
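A minimal numpy sketch of the truncated SVD (assuming a dense matrix A and rank k; the names are placeholders):

```python
import numpy as np

def truncated_svd(A, k):
    """Rank-k truncated SVD of A: keep the k leading singular triplets."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U @ diag(s) @ Vt
    return U[:, :k], np.diag(s[:k]), Vt[:k, :].T       # U_k, Sigma_k, V_k

A = np.random.randn(50, 20)
U_k, S_k, V_k = truncated_svd(A, k=5)
A_k = U_k @ S_k @ V_k.T                                # rank-5 approximation of A
```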

15 Low Rank Approximation
Let $A \in \mathbb{R}^{n \times m}$. For $k \leq \min\{n, m\}$, any pair of matrices $U \in \mathbb{R}^{n \times k}$, $V \in \mathbb{R}^{m \times k}$ is called a low rank approximation of $A$ with rank $k$.
The matrix $U V^T$ is called the reconstruction of $A$ by $U, V$ and the quantity $\|A - U V^T\|_F$ the L2 reconstruction error.

16 Low Rank Approximation
Let $A \in \mathbb{R}^{n \times m}$. For $k \leq \min\{n, m\}$, any pair of matrices $U \in \mathbb{R}^{n \times k}$, $V \in \mathbb{R}^{m \times k}$ is called a low rank approximation of $A$ with rank $k$.
The matrix $U V^T$ is called the reconstruction of $A$ by $U, V$ and the quantity
$\|A - U V^T\|_F = \left( \sum_{i=1}^{n} \sum_{j=1}^{m} (A_{i,j} - U_i^T V_j)^2 \right)^{\frac{1}{2}}$
the L2 reconstruction error.
Note: $\|A\|_F$ is called the Frobenius norm.

17 Optimal Low Rank Approximation is the Truncated SVD
Theorem (Low Rank Approximation; Eckart-Young theorem). Let $A \in \mathbb{R}^{n \times m}$. For $k \leq \min\{n, m\}$, the optimal low rank approximation of rank $k$ (i.e., with smallest reconstruction error)
$(U, V) := \arg\min_{U \in \mathbb{R}^{n \times k},\, V \in \mathbb{R}^{m \times k}} \|A - U V^T\|_F$
is the truncated SVD.
Note: As $U, V$ do not have to be orthonormal, one can take $U := \tilde{U} \tilde{\Sigma}$, $V := \tilde{V}$ for the truncated SVD $A \approx \tilde{U} \tilde{\Sigma} \tilde{V}^T$.
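As a hedged numeric illustration (not from the slides), one can check that the truncated SVD attains a Frobenius reconstruction error equal to the root of the sum of the squared discarded singular values:

```python
import numpy as np

A = np.random.randn(40, 25)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 5
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]       # truncated-SVD reconstruction

err = np.linalg.norm(A - A_k, ord='fro')          # L2 (Frobenius) reconstruction error
print(err, np.sqrt(np.sum(s[k:] ** 2)))           # the two values coincide
```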

18 Principal Components Analysis (PCA)
Let $X := \{x_1, \ldots, x_n\} \subseteq \mathbb{R}^m$ be a data set and $K \in \mathbb{N}$ the number of latent dimensions ($K \ll m$).
PCA finds $K$ principal components $v_1, \ldots, v_K \in \mathbb{R}^m$ and latent weights $z_i \in \mathbb{R}^K$ for each data point $i \in \{1, \ldots, n\}$, such that the linear combination of the principal components
$x_i \approx \sum_{k=1}^{K} z_{i,k} v_k$
reconstructs the original features $x_i$ as well as possible.

19 Principal Components Analysis (PCA)
Let $X := \{x_1, \ldots, x_n\} \subseteq \mathbb{R}^m$ be a data set and $K \in \mathbb{N}$ the number of latent dimensions ($K \ll m$).
PCA finds $K$ principal components $v_1, \ldots, v_K \in \mathbb{R}^m$ and latent weights $z_i \in \mathbb{R}^K$ for each data point $i \in \{1, \ldots, n\}$, such that the linear combination of the principal components reconstructs the original features $x_i$ as well as possible:
$\arg\min_{v_1, \ldots, v_K,\, z_1, \ldots, z_n} \sum_{i=1}^{n} \Big\| x_i - \sum_{k=1}^{K} z_{i,k} v_k \Big\|^2 = \sum_{i=1}^{n} \| x_i - V z_i \|^2, \quad V := (v_1, \ldots, v_K)$

20 Principal Components Analysis (PCA)
Let $X := \{x_1, \ldots, x_n\} \subseteq \mathbb{R}^m$ be a data set and $K \in \mathbb{N}$ the number of latent dimensions ($K \ll m$).
PCA finds $K$ principal components $v_1, \ldots, v_K \in \mathbb{R}^m$ and latent weights $z_i \in \mathbb{R}^K$ for each data point $i \in \{1, \ldots, n\}$, such that the linear combination of the principal components reconstructs the original features $x_i$ as well as possible:
$\arg\min_{v_1, \ldots, v_K,\, z_1, \ldots, z_n} \sum_{i=1}^{n} \Big\| x_i - \sum_{k=1}^{K} z_{i,k} v_k \Big\|^2 = \sum_{i=1}^{n} \| x_i - V z_i \|^2 = \| X - Z V^T \|_F^2,$
$V := (v_1, \ldots, v_K), \quad X := (x_1, \ldots, x_n)^T, \quad Z := (z_1, \ldots, z_n)^T$
Thus PCA is just the (truncated) SVD of the data matrix $X$.

21 PCA Algorithm
1: procedure dimred-pca(D := {x_1, ..., x_N} ⊆ R^M, K ∈ N)
2:   X := (x_1, x_2, ..., x_N)^T
3:   (U, Σ, V) := svd(X)
4:   Z := U_{·,1:K} Σ_{1:K,1:K}
5:   return D^dimred := {Z_{1,·}, ..., Z_{N,·}}
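A hedged, runnable counterpart of this pseudocode in numpy (assuming the data is already centered; all names are illustrative):

```python
import numpy as np

def dimred_pca(D, K):
    """Project the (centered) data points in D (N x M) onto their
    top-K principal components via the SVD of the data matrix."""
    X = np.asarray(D)                                  # X := (x_1, ..., x_N)^T
    U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = U @ diag(s) @ Vt
    Z = U[:, :K] * s[:K]                               # Z := U_{:,1:K} Sigma_{1:K,1:K}
    return Z                                           # latent representations

X = np.random.randn(100, 10)
X = X - X.mean(axis=0)                                 # PCA assumes zero mean
Z = dimred_pca(X, K=2)
```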

22 Principal Components Analysis (Example 1)
[Figure: "Simulated data in three classes, near ..." (caption truncated) [HTFF05, p. 530]]

23 Principal Components Analysis (Example 1)
[Figure (axes: first and second principal component): "The best rank-two linear approximation to the half-sphere data. The right panel shows the projected points with coordinates given by $U_2 D_2$, the first two principal components of the data." [HTFF05, p. 536]]

24 Principal Components Analysis (Example 2)
[Figure: "A sample of 130 handwritten 3's shows a variety of writing styles." [HTFF05, p. 537]]

25 Principal Components Analysis (Example 2)
[Figure (axes: first and second principal component): "(Left panel:) the first two principal components of the handwritten threes. The circled points are the closest projected images to the vertices of ..." (caption truncated) [HTFF05, p. 538]]

26 2. Non-linear Dimensionality Reduction
Outline
2. Non-linear Dimensionality Reduction
3. Supervised Dimensionality Reduction

27 2. Non-linear Dimensionality Reduction
Linear Dimensionality Reduction
Dimensionality reduction accomplishes two tasks:
1. compute lower dimensional representations for given data points $x_i$
   for PCA: $u_i = \Sigma^{-1} V^T x_i$, $U := (u_1, \ldots, u_n)^T$

28 2. Non-linear Dimensionality Reduction
Linear Dimensionality Reduction
Dimensionality reduction accomplishes two tasks:
1. compute lower dimensional representations for given data points $x_i$
   for PCA: $u_i = \Sigma^{-1} V^T x_i$, $U := (u_1, \ldots, u_n)^T$
2. compute lower dimensional representations for new data points $x$ (often called fold-in)
   for PCA: $u := \arg\min_u \| x - V \Sigma u \|^2 = \Sigma^{-1} V^T x$

29 2. Non-linear Dimensionality Reduction
Linear Dimensionality Reduction
Dimensionality reduction accomplishes two tasks:
1. compute lower dimensional representations for given data points $x_i$
   for PCA: $u_i = \Sigma^{-1} V^T x_i$, $U := (u_1, \ldots, u_n)^T$
2. compute lower dimensional representations for new data points $x$ (often called fold-in)
   for PCA: $u := \arg\min_u \| x - V \Sigma u \|^2 = \Sigma^{-1} V^T x$
PCA is called a linear dimensionality reduction technique because the latent representations $u$ depend linearly on the observed representations $x$.
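A hedged numpy sketch of the fold-in step under these definitions (illustrative names; assumes a centered data matrix X):

```python
import numpy as np

X = np.random.randn(100, 10)
X = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
K = 3
V_K, s_K = Vt[:K, :].T, s[:K]

def fold_in(x):
    """Latent representation of a new point x: u = Sigma^{-1} V^T x."""
    return (V_K.T @ x) / s_K

x_new = np.random.randn(10)
u_new = fold_in(x_new)          # K-dimensional representation of x_new
```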

30 2. Non-linear Dimensionality Reduction
Kernel Trick
Represent (conceptually) non-linearity by linearity in a higher-dimensional embedding $\phi : \mathbb{R}^m \to \mathbb{R}^{m'}$, but compute in lower dimensionality: for methods that depend on $x$ only through a scalar product, replace
$x^T \theta \;\text{by}\; \phi(x)^T \phi(\theta) = k(x, \theta), \quad x, \theta \in \mathbb{R}^m,$
if $k$ can be computed without explicitly computing $\phi$.

31 2. Non-linear Dimensionality Reduction
Kernel Trick / Example
Example: $\phi : \mathbb{R} \to \mathbb{R}^{1001}$, $\phi(x) := \left( \binom{1000}{i}^{\frac{1}{2}} x^i \right)_{i=0,\ldots,1000}$
$x^T \theta \rightarrow \phi(x)^T \phi(\theta) = \sum_{i=0}^{1000} \binom{1000}{i} x^i \theta^i = (1 + x\theta)^{1000} =: k(x, \theta)$
Naive computation: 2002 binomial coefficients, 3003 multiplications, 1000 additions.
Kernel computation: 1 multiplication, 1 addition, 1 exponentiation.
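A small hedged check of this example (purely illustrative, for scalar inputs): the explicit feature map and the kernel give the same value.

```python
import numpy as np
from scipy.special import comb

def phi(x, d=1000):
    """Explicit feature map: phi(x)_i = sqrt(C(d, i)) * x^i, i = 0..d."""
    i = np.arange(d + 1)
    return np.sqrt(comb(d, i)) * x ** i

x, theta = 0.3, 0.7
naive = phi(x) @ phi(theta)          # 1001-dimensional scalar product
kernel = (1 + x * theta) ** 1000     # k(x, theta): one mult, one add, one power
print(np.isclose(naive, kernel))     # True
```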

32 2. Non-linear Dimensionality Reduction
Kernel PCA
$\phi : \mathbb{R}^m \to \mathbb{R}^{m'}, \quad \tilde{X} := \begin{pmatrix} \phi(x_1) \\ \phi(x_2) \\ \vdots \\ \phi(x_n) \end{pmatrix}, \quad \tilde{X} \approx U \Sigma \tilde{V}^T, \quad m' \gg m$
We can compute the columns of $U$ as eigenvectors of $\tilde{X} \tilde{X}^T \in \mathbb{R}^{n \times n}$ without having to compute $\tilde{V} \in \mathbb{R}^{m' \times k}$ (which is large!):
$\tilde{X} \tilde{X}^T U_i = \sigma_i^2 U_i$

33 2. Non-linear Dimensionality Reduction
Kernel PCA / Removing the Mean
Issue 1: The $\tilde{x}_i := \phi(x_i)$ may not have zero mean and thus distort PCA.
$\tilde{x}'_i := \tilde{x}_i - \frac{1}{n} \sum_{j=1}^{n} \tilde{x}_j$

34 2. Non-linear Dimensionality Reduction
Kernel PCA / Removing the Mean
Issue 1: The $\tilde{x}_i := \phi(x_i)$ may not have zero mean and thus distort PCA.
$\tilde{x}'_i := \tilde{x}_i - \frac{1}{n} \sum_{j=1}^{n} \tilde{x}_j$
$\tilde{X}' := (\tilde{x}'_1, \ldots, \tilde{x}'_n)^T = (I - \tfrac{1}{n} \mathbb{1}) \tilde{X}$
Note: $\mathbb{1} := (1)_{i=1,\ldots,n,\, j=1,\ldots,n}$ matrix of ones, $I := (\delta(i = j))_{i=1,\ldots,n,\, j=1,\ldots,n}$ unity matrix.

35 2. Non-linear Dimensionality Reduction
Kernel PCA / Removing the Mean
Issue 1: The $\tilde{x}_i := \phi(x_i)$ may not have zero mean and thus distort PCA.
$\tilde{x}'_i := \tilde{x}_i - \frac{1}{n} \sum_{j=1}^{n} \tilde{x}_j$
$\tilde{X}' := (\tilde{x}'_1, \ldots, \tilde{x}'_n)^T = (I - \tfrac{1}{n} \mathbb{1}) \tilde{X}$
$K' := \tilde{X}' \tilde{X}'^T = (I - \tfrac{1}{n} \mathbb{1}) \tilde{X} \tilde{X}^T (I - \tfrac{1}{n} \mathbb{1}) = H K H, \quad H := (I - \tfrac{1}{n} \mathbb{1}) \text{ the centering matrix}$
Thus, the kernel matrix $K'$ with means removed can be computed from the kernel matrix $K$ without having to access coordinates.
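A hedged numpy sketch of double-centering a kernel matrix as above (illustrative names):

```python
import numpy as np

def center_kernel(K):
    """Double-center a kernel matrix: K' = H K H with H = I - (1/n) * ones."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H
```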

36 2. Non-linear Dimensionality Reduction
Kernel PCA / Fold In
Issue 2: How to compute projections $u$ of new points $x$ (as $\tilde{V}$ is not computed)?
With $\tilde{V} = \tilde{X}^T U \Sigma^{-1}$:
$u := \arg\min_u \| \phi(x) - \tilde{V} \Sigma u \|^2 = \Sigma^{-1} \tilde{V}^T \phi(x)$
$u = \Sigma^{-1} \tilde{V}^T \phi(x) = \Sigma^{-1} \Sigma^{-1} U^T \tilde{X} \phi(x) = \Sigma^{-2} U^T \big( k(x_i, x) \big)_{i=1,\ldots,n}$
$u$ can be computed with access to kernel values only (and to $U$, $\Sigma$).

37 2. Non-linear Dimensionality Reduction
Kernel PCA / Summary
Given: data set $X := \{x_1, \ldots, x_n\} \subseteq \mathbb{R}^m$, kernel function $k : \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}$.
task 1: Learn latent representations $U$ of data set $X$:
$K := (k(x_i, x_j))_{i=1,\ldots,n,\, j=1,\ldots,n}$  (0)
$K' := H K H, \quad H := (I - \tfrac{1}{n} \mathbb{1})$  (1)
$(U, \Sigma) := \text{eigen decomposition}(K')$  (2)
task 2: Learn latent representation $u$ of a new point $x$:
$u := \Sigma^{-2} U^T (k(x_i, x))_{i=1,\ldots,n}$  (3)
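Putting steps (0) to (3) together, a hedged numpy sketch (an RBF kernel is assumed purely for illustration; all names are placeholders):

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    return np.exp(-gamma * np.sum((x - y) ** 2))

def kernel_pca_fit(X, k, K_dims):
    """Task 1: latent representations of the training points."""
    n = X.shape[0]
    K = np.array([[k(xi, xj) for xj in X] for xi in X])        # (0) kernel matrix
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H                                              # (1) centering
    evals, evecs = np.linalg.eigh(Kc)                           # (2) eigen decomposition
    order = np.argsort(evals)[::-1][:K_dims]                    # top K_dims eigenvalues
    U, sigma = evecs[:, order], np.sqrt(np.maximum(evals[order], 0))
    Z = U * sigma                                               # latent representations Z = U Sigma
    return U, sigma, Z

def kernel_pca_fold_in(x, X, k, U, sigma):
    """Task 2: latent representation of a new point x, eq. (3)."""
    kx = np.array([k(xi, x) for xi in X])
    return (U.T @ kx) / sigma ** 2                              # Sigma^{-2} U^T (k(x_i, x))_i

X = np.random.randn(30, 2)
U, sigma, Z = kernel_pca_fit(X, rbf, K_dims=2)
u_new = kernel_pca_fold_in(np.array([0.1, -0.2]), X, rbf, U, sigma)
```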

38 2. Non-linear Dimensionality Reduction
Kernel PCA: Example 1
[Figure: eight panels, each titled "Eigenvalue=..." (values not recoverable) [Mur12, p. 493]]

39 2. Non-linear Dimensionality Reduction
Kernel PCA: Example
[Figure: panel "pca" [Mur12, p. 495]]

40 2. Non-linear Dimensionality Reduction
Kernel PCA: Example
[Figure: panel "kpca" [Mur12, p. 495]]

41 3. Supervised Dimensionality Reduction
Outline
2. Non-linear Dimensionality Reduction
3. Supervised Dimensionality Reduction

42 3. Supervised Dimensionality Reduction
Dimensionality Reduction as Pre-Processing
Given a prediction task and a data set $D^{\text{train}} := \{(x_1, y_1), \ldots, (x_n, y_n)\} \subseteq \mathbb{R}^m \times \mathcal{Y}$:
1. compute latent features $z_i \in \mathbb{R}^K$ for the objects of the data set by means of dimensionality reduction of the predictors $x_i$, e.g., using PCA on $\{x_1, \ldots, x_n\} \subseteq \mathbb{R}^m$
2. learn a prediction model $\hat{y} : \mathbb{R}^K \to \mathcal{Y}$ on the latent features, based on $\tilde{D}^{\text{train}} := \{(z_1, y_1), \ldots, (z_n, y_n)\}$
3. treat the number $K$ of latent dimensions as a hyperparameter, e.g., found using grid search (see the sketch below)
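A hedged sklearn sketch of this recipe (the estimator choice and parameter grid are purely illustrative):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X = np.random.randn(200, 30)
y = X[:, :3] @ np.array([1.0, -2.0, 0.5]) + 0.1 * np.random.randn(200)

# Steps 1+2: PCA as pre-processing, then a prediction model on the latent features.
pipe = Pipeline([("pca", PCA()), ("model", Ridge())])

# Step 3: treat the number K of latent dimensions as a hyperparameter (grid search).
search = GridSearchCV(pipe, {"pca__n_components": [2, 5, 10, 20]}, cv=5)
search.fit(X, y)
print(search.best_params_)
```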

43 3. Supervised Dimensionality Reduction
Dimensionality Reduction as Pre-Processing
Advantages:
- simple procedure
- generic procedure: works with any dimensionality reduction method and prediction method as component methods
- usually fast

44 3. Supervised Dimensionality Reduction
Dimensionality Reduction as Pre-Processing
Advantages:
- simple procedure
- generic procedure: works with any dimensionality reduction method and prediction method as component methods
- usually fast
Disadvantages:
- dimensionality reduction is unsupervised, i.e., not informed about the target that should be predicted later on
- it leads to the very same latent features regardless of the prediction task
- likely not the best task-specific features are extracted

45 3. Supervised Dimensionality Reduction
Supervised PCA
$p(z) := \mathcal{N}(z; 0, I)$
$p(x \mid z; \mu_x, \sigma_x^2, W_x) := \mathcal{N}(x; \mu_x + W_x z, \sigma_x^2 I)$
$p(y \mid z; \mu_y, \sigma_y^2, W_y) := \mathcal{N}(y; \mu_y + W_y z, \sigma_y^2 I)$
Like two PCAs, coupled by shared latent features $z$:
- one for the predictors $x$
- one for the targets $y$
The latent features act as an information bottleneck.
Also known as Latent Factor Regression or Bayesian Factor Regression.

46 3. Supervised Dimensionality Reduction
Supervised PCA: Discriminative Likelihood
A simple likelihood would put the same weight on reconstructing the predictors and reconstructing the targets. A weight $\alpha \in \mathbb{R}_0^+$ for the reconstruction error of the predictors should be introduced (discriminative likelihood):
$L_\alpha(\Theta; x, y, z) := \prod_{i=1}^{n} p(y_i \mid z_i; \Theta)\, p(x_i \mid z_i; \Theta)^\alpha\, p(z_i; \Theta)$
$\alpha$ can be treated as a hyperparameter and found by grid search.

47 3. Supervised Dimensionality Reduction
Supervised PCA: EM
The M-steps for $\mu_x, \sigma_x^2, W_x$ and $\mu_y, \sigma_y^2, W_y$ are exactly as before.
The coupled E-step is:
$z_i = \left( \frac{1}{\sigma_y^2} W_y^T W_y + \alpha \frac{1}{\sigma_x^2} W_x^T W_x \right)^{-1} \left( \frac{1}{\sigma_y^2} W_y^T (y_i - \mu_y) + \alpha \frac{1}{\sigma_x^2} W_x^T (x_i - \mu_x) \right)$
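A hedged numpy sketch of just this coupled E-step, exactly as the formula above reads (the parameters are assumed given; all names are placeholders):

```python
import numpy as np

def coupled_e_step(X, Y, W_x, W_y, mu_x, mu_y, var_x, var_y, alpha):
    """E-step of supervised PCA: latent z_i for each pair (x_i, y_i)."""
    A = W_y.T @ W_y / var_y + alpha * W_x.T @ W_x / var_x             # K x K system matrix
    B = (Y - mu_y) @ W_y / var_y + alpha * (X - mu_x) @ W_x / var_x   # n x K right-hand sides
    return np.linalg.solve(A, B.T).T                                  # rows are the z_i
```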

48 3. Supervised Dimensionality Reduction
Conclusion (1/3)
- Dimensionality reduction aims to find a lower dimensional representation of data that preserves the information as much as possible.
- Preserving information means to preserve pairwise distances between objects (multidimensional scaling), or to be able to reconstruct the original object features (feature reconstruction).
- The truncated Singular Value Decomposition (SVD) provides the best low rank factorization of a matrix into two factor matrices.
- SVD is usually computed by an algebraic factorization method (such as QR decomposition).

49 3. Supervised Dimensionality Reduction
Conclusion (2/3)
- Principal components analysis (PCA) finds latent object and variable features that provide the best linear reconstruction (in L2 error).
- PCA is a truncated SVD of the data matrix.
- Probabilistic PCA (PPCA) provides a probabilistic interpretation of PCA. PPCA adds an L2 regularization of the object features. PPCA is learned by the EM algorithm.
- Adding L2 regularization for the linear reconstruction/variable features on top leads to Bayesian PCA.
- Generalizing to variable-specific variances leads to Factor Analysis.
- For both Bayesian PCA and Factor Analysis, EM can be adapted easily.

50 3. Supervised Dimensionality Reduction
Conclusion (3/3)
- To capture a nonlinear relationship between latent features and observed features, PCA can be kernelized (Kernel PCA).
- Learning a Kernel PCA is done by an eigen decomposition of the kernel matrix.
- Kernel PCA is often found to lead to unnatural visualizations.
- But Kernel PCA sometimes provides better classification performance for simple classifiers on latent features (such as 1-Nearest Neighbor).

51 Readings
- Principal Components Analysis (PCA): [HTFF05], ch. , [Bis06], ch. 12.1, [Mur12], ch.
- Probabilistic PCA: [Bis06], ch. 12.2, [Mur12], ch.
- Factor Analysis: [HTFF05], ch. , [Bis06], ch.
- Kernel PCA: [HTFF05], ch. , [Bis06], ch. 12.3, [Mur12], ch.

52 Further Readings
- (Non-negative) Matrix Factorization: [HTFF05], ch.
- Independent Component Analysis, Exploratory Projection Pursuit: [HTFF05], ch. , [Bis06], ch. , [Mur12], ch.
- Nonlinear Dimensionality Reduction: [HTFF05], ch. 14.9, [Bis06], ch.

53 References
[Bis06] Christopher M. Bishop. Pattern Recognition and Machine Learning, volume 1. Springer, New York, 2006.
[HTFF05] Trevor Hastie, Robert Tibshirani, Jerome Friedman, and James Franklin. The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, 2005.
[Mur12] Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. The MIT Press, 2012.
