MACHINE LEARNING. Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA


1 1 MACHINE LEARNING Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA

2 2 Practicals Next Week Next Week, Practical Session on Computer Takes Place in Room GR C0 02!

3 3 Why reduce the data dimensionality? Reducing the dimensionality of the dataset at hand makes subsequent computation more tractable. Idea: only a few of the dimensions matter; the projections of the data along the remaining dimensions do not contain informative structure of the data (already a form of generalization).

4 4 Why reduce the data dimensionality? The curse of dimensionality refers to the exponential growth of the volume covered by the parameter values to be tested as the dimensionality increases. In ML, analyzing high-dimensional data is made particularly difficult because: Often one does not have enough observations to get good estimates (i.e. to sample all parameters sufficiently). Adding more dimensions (hence more features) can increase the noise, and hence the error.
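To make the exponential growth concrete, the short sketch below counts how many grid cells are needed to cover the parameter space when each dimension is discretized into a fixed number of bins; the bin count of 10 is just an illustrative assumption.

```python
# Illustrative sketch: number of grid cells needed to cover a unit hypercube
# when each dimension is split into `bins` intervals (bins=10 is an arbitrary
# choice for illustration). The count grows as bins**d.
bins = 10
for d in (1, 2, 3, 5, 10, 20):
    print(f"dimension {d:>2}: {bins**d:.3e} cells")
```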

5 5 Principal Component Analysis Principal Component Analysis is a method widely used in engineering and science. Its principle is based on a statistical analysis of the correlations underpinning the dataset at hand. Its popularity is due to the fact that: Its computation is simple and tractable, with an analytical solution. Its result can be easily visualized, usually through a 2- or 3-dimensional graph.

6 6 Co-Variance, Correlation The covariance and the correlation are measures of the dependency between two variables. Given two variables x and y (assuming that x and y are both zero mean):
cov(x, y) = E[x y] - E[x] E[y],   corr(x, y) = cov(x, y) / ( \sqrt{var(x)} \sqrt{var(y)} ).
x and y are said to be uncorrelated if their covariance is zero: corr(x, y) = 0 and cov(x, y) = 0.
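A minimal NumPy check of these definitions; the two toy variables below are invented for illustration (y is correlated with x by construction):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)                    # zero-mean toy variable
y = 0.8 * x + 0.6 * rng.standard_normal(1000)    # correlated with x by construction

cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)   # cov(x, y) = E[xy] - E[x]E[y]
corr_xy = cov_xy / (np.std(x) * np.std(y))          # corr(x, y) = cov / (std_x * std_y)
print(cov_xy, corr_xy)                               # corr_xy should be close to np.corrcoef(x, y)[0, 1]
```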

7 7 Co-Variance Matrix If X = {x^i_j}, i = 1...M, j = 1...N, is a multidimensional dataset containing M N-dimensional datapoints, the covariance matrix C of X is given by:
C = E[X X^T] = \begin{pmatrix} cov(X_1, X_1) & \cdots & cov(X_1, X_N) \\ \vdots & \ddots & \vdots \\ cov(X_N, X_1) & \cdots & cov(X_N, X_N) \end{pmatrix}
C is diagonal when the data X are decorrelated along each dimension. The rows X_j, j = 1...N, represent the coordinates of each datapoint with respect to the j-th basis vector. The columns of X contain the M datapoints.
C = E[X X^T] ~ X X^T, since the expectation is only a normalization factor.

8 8 Purpose of PCA Goal: to find a better representation of the dataset at hand so as to simplify computation afterwards. Assumes a linear transformation. Assumes maximal variance as the criterion. [Figure: raw 2D dataset, and the same data projected onto the first two principal components]

9 9 PCA: principle PRINCIPLE: Define a low-dimensional manifold in the original space. Represent each data point X by its projection Y onto this manifold.
FORMALISM: Consider a data set of M N-dimensional data points X = {x^i}_{i=1...M}, x^i \in R^N, i = 1,..., M. PCA aims at finding a linear map A, s.t.
A: R^N -> R^q, q <= N,   Y = A X,   Y = {y^1, ..., y^M} and each y^i \in R^q.

10 10 PCA: principle There are three equivalent methods for performing PCA: 1. Maximize the variance of the projection (Hotelling 1933). In other words, this method tries to maximize the spread of the projected data. 2. Minimize the reconstruction error (Pearson 1901), i.e. minimize the squared distance between the original data and its estimate in a low-dimensional manifold. 3. Maximum-likelihood estimation of the parameters of a latent variable model (Tipping and Bishop 1999).

11 11 Standard PCA: Variance Maximization through Eigenvalue Decomposition Algorithm: 1. Determine the direction (vector) along which the variance of the data is maximal. 2. Determine an orthonormal basis composed of the directions obtained as in 1. The projections of the data onto these axes are uncorrelated.

12 12 Standard PCA: Variance Maximization through Eigenvalue Decomposition Algorithm:
1) Zero mean the data: X' = X - E[X]
2) Compute the covariance matrix: C = E[X' X'^T]
3) Compute the eigenvalues \lambda_i, i = 1...N, by solving |C - \lambda I| = 0
4) Compute the eigenvectors from C e^i = \lambda_i e^i
5) Choose the first q <= N eigenvectors e^1, ..., e^q, with \lambda_1 >= ... >= \lambda_N
6) Project the data onto the new basis: Y = A_q X', with A_q = [e^1 ... e^q]^T
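A minimal NumPy sketch of these six steps, assuming the data matrix is arranged with datapoints as columns (as on the covariance slide); variable names and the toy data are illustrative:

```python
import numpy as np

def pca_eig(X, q):
    """Standard PCA via eigendecomposition of the covariance matrix.

    X : (N, M) array, M datapoints as columns (N dimensions).
    q : number of principal components to keep.
    Returns the projected data Y (q, M) and the basis A_q (q, N).
    """
    Xc = X - X.mean(axis=1, keepdims=True)        # 1) zero mean
    C = (Xc @ Xc.T) / Xc.shape[1]                 # 2) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)          # 3)-4) eigenvalues / eigenvectors (C is symmetric)
    order = np.argsort(eigvals)[::-1]             # sort by decreasing variance
    A_q = eigvecs[:, order[:q]].T                 # 5) first q eigenvectors as rows
    Y = A_q @ Xc                                  # 6) project onto the new basis
    return Y, A_q

# Example: project 20 random 5-D points onto their first 2 principal components.
rng = np.random.default_rng(1)
X = rng.standard_normal((5, 20))
Y, A_q = pca_eig(X, q=2)
```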

13 13 Standard PCA: Example Demo: PCA for Face Classification. Two classes with 20 and 16 examples in each class. [Figure: projection of the image datapoints on the first and 2nd PC, with a separating line] Projecting a set of images of two classes (two different persons) onto the first two principal components allows one to extract features particular to each class, which can then be used for classification.

14 14 Principal Component Analysis LIMITATIONS OF STANDARD (variance-maximization and least-squares) PCA: The variance-covariance matrix needs to be calculated: Can be very computation-intensive for large datasets with a high number of dimensions. Does not deal properly with missing data: incomplete data must either be discarded or imputed using ad-hoc methods. Outliers can unduly affect the analysis. Probabilistic PCA addresses some of the above limitations.

15 15 Probabilistic PCA The data X = {x^i_j}, i = 1...M, j = 1...N, are samples of the distribution of the random variable x. x is generated by the latent variable z following:
x = W z + \mu + \epsilon
The latent variable z corresponds to the unobserved variable. It is a lower-dimensional representation of the data and their dependencies. In Probabilistic PCA, the latent variable model consists then of: x: observed variables (dimension N); z: latent variables (dimension q), with q < N. Fewer dimensions result in more parsimonious models.

16 16 Probabilistic PCA Assumptions: x is generated by the latent variable z following x = W z + \mu + \epsilon, where:
- the latent variable z is centered and white, i.e. z ~ \mathcal{N}(0, I)
- \mu is a parameter (usually the mean of the data x)
- the noise \epsilon follows a zero-mean Gaussian distribution with diagonal covariance
- W is an N x q matrix.
The variance of the noise is diagonal => conditional independence of the observables given the latent variables; z encapsulates all correlations across the original dimensions.
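A small sketch of this generative model, sampling observations as x = W z + \mu + \epsilon; the dimensions, W, \mu and the noise variance below are made up for illustration, and the noise is taken isotropic for simplicity (as assumed a couple of slides further on):

```python
import numpy as np

rng = np.random.default_rng(2)
N, q, M = 5, 2, 200                      # observed dim, latent dim, number of samples (illustrative)
W = rng.standard_normal((N, q))          # N x q loading matrix (made up)
mu = rng.standard_normal(N)              # offset parameter (made up)
sigma2 = 0.1                             # isotropic noise variance (made up)

Z = rng.standard_normal((M, q))                          # z ~ N(0, I)
eps = np.sqrt(sigma2) * rng.standard_normal((M, N))      # eps ~ N(0, sigma^2 I)
X = Z @ W.T + mu + eps                                   # x = W z + mu + eps, one sample per row
```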

17 17 Probabilistic PCA [Figure: prior p(z) over the latent variable z] Same assumptions as on the previous slide: z ~ \mathcal{N}(0, I); \mu is a parameter (usually the mean of the data x); the noise \epsilon follows a zero-mean Gaussian distribution with diagonal covariance; W is an N x q matrix. z encapsulates all correlations across the original dimensions.

18 18 Probabilistic PCA [Figure: prior p(z) with a point z_1 in latent space, and the conditional p(x|z_1), a Gaussian centered at W z_1 + \mu in the (x_1, x_2) data space] Assuming further an isotropic Gaussian noise model \epsilon ~ \mathcal{N}(0, \sigma^2 I), the conditional probability distribution of the observables given the latent variables, p(x|z), is given by:
p(x|z) = \mathcal{N}(W z + \mu, \sigma^2 I)

19 19 Probabilistic PCA [Figure: p(z), p(x|z_1) and the marginal p(x), an ellipse in the (x_1, x_2) data space] The axes of the ellipse correspond to the columns of W, i.e. to the eigenvectors of the covariance matrix XX^T. The marginal distribution can be computed by integrating out the latent variable and is then:
p(x) = \mathcal{N}(\mu, W W^T + \sigma^2 I)
W, \mu and \sigma^2 are open parameters; they can be learned through maximum likelihood.

20 20 Probabilistic PCA through Maximum Likelihood If we set B = W W^T + \sigma^2 I, one can then compute the log-likelihood of the complete set of M datapoints X = {x^1, ..., x^M}:
ln L(B, \mu, \sigma) = -(M/2) { N ln(2\pi) + ln|B| + tr(B^{-1} C) }
where C is the sample covariance matrix of X. The maximum likelihood estimate of \mu is the mean of the dataset X. The parameters W and \sigma are estimated through E-M. See lecture notes for the values of W and \sigma + exercises for the derivation.
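A direct NumPy transcription of this log-likelihood; the data layout (one datapoint per row) and parameter names are placeholders matching the generative sketch above:

```python
import numpy as np

def ppca_log_likelihood(X, W, mu, sigma2):
    """Log-likelihood of PPCA; X has shape (M, N), one datapoint per row."""
    M, N = X.shape
    B = W @ W.T + sigma2 * np.eye(N)                 # marginal covariance W W^T + sigma^2 I
    Xc = X - mu
    C = (Xc.T @ Xc) / M                              # sample covariance matrix
    _, logdetB = np.linalg.slogdet(B)
    return -0.5 * M * (N * np.log(2 * np.pi) + logdetB + np.trace(np.linalg.solve(B, C)))
```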

21 21 Probabilistic PCA through Maximum Likelihood The use of E-M to estimate the parameters of PPCA offers a natural approach to the estimation of the principal axes when some of the data vectors X exhibit one or more missing values: exploit the E-M approach and treat the missing entries of X as additional latent variables.
- Compute the complete log-likelihood of the dataset (the dataset is X and Z): log p(X, Z | \mu, W, \sigma), treating the missing parts of X as missing data.
- E-step: compute the expectation of the complete log-likelihood using the estimate of p(z|x) and the current parameters.
- M-step: maximize over the parameters W, \mu, \sigma.
Iterate until the likelihood no longer increases. This also allows on-line learning (incremental updates).
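A compact sketch of the E-M iterations for PPCA (without missing data), using the standard update equations for this model; the initialization and iteration count below are arbitrary choices:

```python
import numpy as np

def ppca_em(X, q, n_iter=100, seed=0):
    """E-M for Probabilistic PCA. X: (M, N) data, one datapoint per row.

    Returns W (N, q), mu (N,), sigma2 (scalar).
    """
    rng = np.random.default_rng(seed)
    M, N = X.shape
    mu = X.mean(axis=0)                       # ML estimate of mu is the data mean
    Xc = X - mu
    W = rng.standard_normal((N, q))           # arbitrary initialization
    sigma2 = 1.0

    for _ in range(n_iter):
        # E-step: posterior moments of z given x, with B = W^T W + sigma^2 I (q x q)
        B = W.T @ W + sigma2 * np.eye(q)
        Binv = np.linalg.inv(B)
        Ez = Xc @ W @ Binv                    # (M, q) posterior means B^{-1} W^T (x - mu)
        Ezz = M * sigma2 * Binv + Ez.T @ Ez   # sum over n of E[z_n z_n^T]

        # M-step: update W, then sigma^2
        W = (Xc.T @ Ez) @ np.linalg.inv(Ezz)
        sigma2 = (np.sum(Xc**2)
                  - 2.0 * np.sum(Ez * (Xc @ W))
                  + np.trace(Ezz @ W.T @ W)) / (M * N)
    return W, mu, sigma2
```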

22 23 Probabilistic PCA [Figure: posterior p(z|x) in latent space, and the marginal p(x) in the (x_1, x_2) data space, with direction w] The conditional distribution of the latent variable given the data is:
p(z|x) = \mathcal{N}( B^{-1} W^T (x - \mu), \sigma^2 B^{-1} ),   B = W^T W + \sigma^2 I
It is again Gaussian! The axes of the ellipse correspond to the columns of W, i.e. to the eigenvectors of the covariance matrix XX^T. In the absence of noise, one recovers standard PCA, as (W^T W)^{-1} W^T (x - \mu) is an orthogonal projection of x onto the latent space.

23 24 Probabilistic PCA: Dimensionality Reduction Reduction of the dimensionality is obtained by looking at the latent variable and estimating its distribution. Reduce the dimensionality by projecting onto a subset of q dimensions:
p(z|x) = \mathcal{N}( B^{-1} W_q^T (x - \mu), \sigma^2 B^{-1} ),   B = W_q^T W_q + \sigma^2 I
In the absence of noise, one recovers standard PCA, as (W^T W)^{-1} W^T (x - \mu) is an orthogonal projection of x onto the latent space.
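Continuing the E-M sketch above, reducing the dimensionality amounts to taking the posterior mean of z for each datapoint; a minimal helper with the same placeholder names:

```python
import numpy as np

def ppca_project(X, W, mu, sigma2):
    """Posterior mean of z given x: B^{-1} W^T (x - mu), with B = W^T W + sigma^2 I."""
    q = W.shape[1]
    B = W.T @ W + sigma2 * np.eye(q)
    return (X - mu) @ W @ np.linalg.inv(B)    # (M, q) latent coordinates

# As sigma2 -> 0 this tends to the pseudo-inverse projection (W^T W)^{-1} W^T (x - mu),
# i.e. the orthogonal projection onto the latent space mentioned on the slide.
```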

24 25 Probabilistic PCA: Summary Idea: Assume that the data X were generated by a Gaussian latent variable model. Probabilistic PCA then consists in estimating the density of the latent variable through maximum likelihood. Probabilistic PCA is then PCA through projection onto a latent space. Advantages of expressing PCA in probabilistic form: It can easily be extended to estimation with mixtures of PCA models. The estimated density can easily be used for classification and other Bayesian computation afterwards.

25 26 Probabilistic PCA: Summary [Figure: latent prior p(z), conditional p(x|z_1) and marginal p(x) in the (x_1, x_2) data space] Assumptions: the underlying latent variable has a Gaussian distribution; linear relationship between latent and observed variables; isotropic Gaussian noise in the observed dimensions.

26 27 Revisiting the hypotheses of PCA PCA assumed a linear transformation. Non-linear PCA (Kernel PCA) aims to find a non-linear embedding of the data.

27 28 Going back to linearity Find a non-linear transformation that sends the data into a space where linear computation is again feasible.

28 29 Kernel-Induced Feature Space Idea: Send the data X = {x^i}_{i=1...M}, x^i \in R^N, into a feature space H through the nonlinear map f:
f: R^N -> H,   X -> f(X) = {f(x^1), ..., f(x^M)}
Then perform linear PCA in the feature space. [Figure: original space with axes (x_1, x_2) and basis vectors e_1, e_2, mapped into H]

29 30 Kernel-Induced Feature Space Idea: Send the data X = {x^i}_{i=1...M}, x^i \in R^N, into a feature space H through the nonlinear map f. [Figure: the original space is lifted onto H] While the dimension of the original space is N, the dimension of the feature space may be (much) greater than N! X is lifted onto H. Determining f is difficult -> Kernel Trick.

30 31 The Kernel Trick In most cases, determining the transformation f may be difficult. Linear PCA computes an inner product across pairs of observations: \langle x^i, x^j \rangle. There is no need to compute the transformation f if one expresses everything as a function of the inner product in feature space, the kernel function:
k: X x X -> R,   k(x^i, x^j) = \langle f(x^i), f(x^j) \rangle.
The kernel is a metric of similarity across datapoints and may extract some features.
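To see the trick in action, the check below compares the homogeneous polynomial kernel of degree 2 in 2-D with the inner product under its explicit feature map f(x) = (x_1^2, \sqrt{2} x_1 x_2, x_2^2); this map is the standard one for that kernel, and the test points are made up:

```python
import numpy as np

def poly2_kernel(x, y):
    return np.dot(x, y) ** 2                    # k(x, y) = <x, y>^2

def poly2_feature_map(x):
    # Explicit feature map f: R^2 -> R^3 for the degree-2 homogeneous polynomial kernel.
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x, y = np.array([1.0, 2.0]), np.array([-0.5, 3.0])      # arbitrary test points
assert np.isclose(poly2_kernel(x, y), np.dot(poly2_feature_map(x), poly2_feature_map(y)))
```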

31 32 Popular Kernels Gaussian / RBF Kernel (translation-invariant):
k(x, x') = \exp( -\|x - x'\|^2 / (2 \sigma^2) )
Homogeneous Polynomial Kernels:
k(x, x') = \langle x, x' \rangle^p,   p \in N
Inhomogeneous Polynomial Kernels:
k(x, x') = ( \langle x, x' \rangle + c )^p,   p \in N, c > 0
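These kernels are easy to evaluate on a whole dataset at once; a short sketch computing the M x M Gram matrices (the kernel width and polynomial parameters are just example values):

```python
import numpy as np

def rbf_gram(X, sigma=1.0):
    """Gaussian/RBF Gram matrix for X of shape (M, N)."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(X**2, axis=1)[None, :] - 2 * X @ X.T
    return np.exp(-sq / (2 * sigma**2))

def poly_gram(X, p=2, c=0.0):
    """(In)homogeneous polynomial Gram matrix: (<x, x'> + c)^p."""
    return (X @ X.T + c) ** p

X = np.random.default_rng(3).standard_normal((6, 2))    # 6 toy 2-D points
K_rbf, K_poly = rbf_gram(X, sigma=0.9), poly_gram(X, p=3, c=1.0)
```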

32 33 From Linear PCA to Kernel PCA Rewriting PCA in terms of dot products: each eigenvector e^1, ..., e^N found by linear PCA can be expressed as a linear combination of the datapoints. Using
C e^i = (1/M) \sum_{j=1}^M \langle x^j, e^i \rangle x^j   with   C e^i = \lambda_i e^i,
we obtain
e^i = (1/(M \lambda_i)) \sum_{j=1}^M \langle x^j, e^i \rangle x^j.

33 34 From Linear PCA to Kernel PCA Rewriting PCA in terms of dot products (continued):
e^i = \sum_{j=1}^M \alpha^i_j x^j,   where each coefficient   \alpha^i_j = (1/(M \lambda_i)) \langle x^j, e^i \rangle   is a scalar.

34 35 Linear PCA in Feature Space Sending the data into feature space through f: f: X -> H, x -> f(x). Assume that, in feature space H, the data are centered: \sum_{i=1}^M f(x^i) = 0. The covariance matrix in the feature space is:
C_f = (1/M) F F^T
where the columns of F, i = 1...M, are composed of the f(x^i).

35 36 Linear PCA in Feature Space As in the original space, the covariance matrix in feature space can be diagonalized, and we now have to find the eigenvalues \lambda_i > 0 and eigenvectors v^i satisfying:
C_f v^i = \lambda_i v^i,   equivalently   \langle f(x^j), C_f v^i \rangle = \lambda_i \langle f(x^j), v^i \rangle,   i, j = 1, ..., M.
All solutions v^i with \lambda_i different from zero lie in the span of f(x^1), ..., f(x^M), and we can thus write:
v^i = \sum_{j=1}^M \alpha^i_j f(x^j).

36 37 Linear PCA in Feature Space Substituting v^i = \sum_j \alpha^i_j f(x^j) into \langle f(x^l), C_f v^i \rangle = \lambda_i \langle f(x^l), v^i \rangle for l = 1, ..., M, and using
K_{ij} = \langle f(x^i), f(x^j) \rangle = k(x^i, x^j)   (Kernel Trick)
yields an eigenvalue problem of the form:
K \alpha^i = M \lambda_i \alpha^i,   M: number of datapoints.
This is the dual eigenvalue problem for finding the eigenvectors v^i of C_f.

37 38 Linear PCA in Feature Space The solutions to the dual eigenvalue problem are given by all the eigenvectors \alpha^1, ..., \alpha^M with non-zero eigenvalues \lambda_1, ..., \lambda_M. Asking that the eigenvectors v^i of C_f be normalized, i.e. \langle v^i, v^i \rangle = 1, i = 1, ..., M, is equivalent to asking that the dual eigenvectors \alpha^i satisfy M \lambda_i \langle \alpha^i, \alpha^i \rangle = 1. Kernel PCA finds at most M eigenvectors (M: number of datapoints); typically M >> N, the dimension of each datapoint.

38 39 Constructing the kPCA projections We cannot see the projection in feature space! We can only compute the projection of each point onto each eigenvector. Projection of a query point x onto eigenvector v^i:
\langle v^i, f(x) \rangle = \sum_{j=1}^M \alpha^i_j \langle f(x^j), f(x) \rangle = \sum_{j=1}^M \alpha^i_j k(x^j, x)
(a sum over all training points). Contour lines group points with equal projection: all points x s.t. \langle v^i, f(x) \rangle = const.
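Putting the previous slides together, a compact NumPy sketch of kernel PCA on the training set: it centers the Gram matrix in feature space (the centering assumed a few slides back), solves the dual eigenvalue problem and normalizes the dual eigenvectors as above. The RBF width and toy data are arbitrary example values.

```python
import numpy as np

def kernel_pca(K, q):
    """Kernel PCA from an (M, M) Gram matrix K; returns the projections of the
    training points onto the first q kernel principal components."""
    M = K.shape[0]
    one = np.full((M, M), 1.0 / M)
    Kc = K - one @ K - K @ one + one @ K @ one      # center the data in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)           # eigenvalues of Kc equal M * lambda_i
    order = np.argsort(eigvals)[::-1][:q]
    lam_M, alphas = eigvals[order], eigvecs[:, order]
    alphas = alphas / np.sqrt(lam_M)                # enforce M * lambda_i <alpha^i, alpha^i> = 1
    return Kc @ alphas                              # <v^i, f(x^j)> for all training points x^j

# Example with an RBF Gram matrix (width 0.9 is arbitrary):
X = np.random.default_rng(4).standard_normal((30, 2))
sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
K = np.exp(-sq / (2 * 0.9**2))
Y = kernel_pca(K, q=2)                              # projections onto the first two eigenvectors
```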

39 40 [Figure: from Scholkopf & Smola, 2002, showing the original space and the feature space H] Contour lines in linear PCA are straight lines. In kPCA, they appear curved in the original space, while straight in feature space.

40 41 Popular Kernels (recap) Gaussian / RBF Kernel (translation-invariant): k(x, x') = \exp( -\|x - x'\|^2 / (2 \sigma^2) ). Homogeneous Polynomial Kernels: k(x, x') = \langle x, x' \rangle^p, p \in N. Inhomogeneous Polynomial Kernels: k(x, x') = ( \langle x, x' \rangle + c )^p, p \in N, c > 0.

41 42 Kernel PCA: Examples From Scholkopf & Smola, 2002

42 43 Kernel PCA: Examples From Scholkopf & Smola, 2002

43 44 Kernel PCA: Examples MLDEMOS Two sets of circle datapoints Original Data

44 45 Kernel PCA: Examples MLDEMOS Gaussian Kernel Projections onto first two eigenvectors

45 46 Kernel PCA: Examples MLDEMOS Pair of Glasses datapoints Original Data

46 47 Kernel PCA: Examples MLDEMOS Gaussian Kernel, kernel width=0.9 Projections onto first two eigenvectors

47 48 Kernel PCA: Examples MLDEMOS Two sets of circle datapoints Original Data

48 49 Kernel PCA: Examples MLDEMOS Polynomial Kernel order p=20 Projections onto first two eigenvectors

49 50 Kernel PCA: Examples MLDEMOS Polynomial Kernel of order p=20. [Figure annotation: points cluster here] Projections onto the first two eigenvectors

50 51 Curse of Dimensionality Kernel PCA is computationally very intensive. Computing the eigenvectors requires an eigenvalue decomposition of the Gram matrix (the kernel matrix is M x M), whose size grows quadratically with the number of data points M. Computing each projection in the original space grows linearly with M too. A variety of sparse methods have been proposed in the literature.

51 54 How to choose kernels? There is no rule for choosing the right kernel; each kernel must be adapted to the particular problem. Do a grid search over values of the kernel parameters and perform cross-validation for each choice. Some considerations are important: Kernel parameters are often related to geometrical properties of the data; e.g. the kernel width in the RBF kernel relates to the variance of the data. Experimentally, there is some robustness in the choice if the chosen kernel provides an acceptable trade-off between: a simpler and more efficient structure (e.g. linear separability), which requires some explosion of the dimensionality; and preservation of the information structure, which requires that the explosion is not too strong.
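As an illustration of such a grid search, the sketch below scores each candidate RBF kernel width by the fraction of (centered) kernel variance captured by the first q components; this scoring criterion, the candidate widths and the toy data are assumptions chosen for illustration rather than a rule from the slides, and in a supervised setting one would instead cross-validate the downstream classifier for each width.

```python
import numpy as np

def centered_gram(X, sigma):
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    K = np.exp(-sq / (2 * sigma**2))
    one = np.full(K.shape, 1.0 / K.shape[0])
    return K - one @ K - K @ one + one @ K @ one

X = np.random.default_rng(5).standard_normal((50, 2))    # toy data
q = 2
for sigma in (0.1, 0.3, 0.9, 2.0, 5.0):                  # candidate kernel widths (illustrative)
    eigvals = np.sort(np.linalg.eigvalsh(centered_gram(X, sigma)))[::-1]
    score = eigvals[:q].sum() / eigvals.sum()            # variance captured by the first q components
    print(f"sigma={sigma:4.1f}  captured variance ratio={score:.2f}")
```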

52 55 Coding Mini-Projects List of topics:
- Co-Inertia Analysis (CIA): equivalent to CCA
- ISOMAP
- Locally Linear Embedding
- Laplacian Eigenmaps
- Chirp classifier
- Learning Vector Quantization Visualization

53 56 Coding Mini-Projects Instructions:
- Code should be embedded in mldemos and properly tested for compilation.
- Implement the method with real-world or simulated data and analyze its performance, when possible comparing it to other equivalent available methods.
- In your report, describe briefly the method, its implementation and its evaluation.

54 57 Surveys of Literature Topics:
- Real-world applications of kernel learning methods
- Application of continuous RL methods for robot control
- Inverse reinforcement learning methods
- Semi-supervised clustering with application to finance
- Classification with SVM with application to finance
- Applications of Gaussian Process regression for robot control

55 58 Caveats when Conducting Surveys of Literature Surveys are done by teams of two people. Count hours of work, including redaction. Each member of the team reads about … articles. Do not use Google! Rather, use Google Scholar, IEEEXplore and other known search engines (see …). Jot down notes as you read a paper! Be critical in your survey of the literature. Report on contradictory findings, or spot claims unsubstantiated by data!

56 59 Reports Format Pick your projects (first come, first served basis); see the doodle poll at …
Mini-Projects and Lit. Survey are evaluated in two ways:
1. Written Reports
1.1 Report on the coding project (10 pages maximum, 10pt minimum, single column; the code from the mini-project must be submitted together with the report).
1.2 Report on the lit. survey: if you do the lit. survey as a team of two, the maximum length of the survey should be 20 pages (10pt minimum, single column).
Reports (+code) are due on May …, 6pm and should be submitted electronically to …
2. Oral presentation in class (10-minute presentation) on May 31
