Advanced Machine Learning & Perception


1 Advanced Machine Learning & Perception Instructor: Tony Jebara

2 Topic 2: Nonlinear Manifold Learning
Multidimensional Scaling (MDS)
Locally Linear Embedding (LLE)
Beyond Principal Components Analysis (PCA): Kernel PCA (KPCA)
Semidefinite Embedding (SDE)
Minimum Volume Embedding (MVE)

3 Principal Components Analysis
Encode data on a linear (flat) manifold as steps along its axes: $x_j \approx y_j = \mu + \sum_{i=1}^C c_{ij} v_i$
The best choice of $\mu$, $c$ and $v$ is least squares, or equivalently maximum Gaussian likelihood:
$\mathrm{error} = \sum_{j=1}^N \| x_j - y_j \|^2 = \sum_{j=1}^N \big\| x_j - \mu - \sum_{i=1}^C c_{ij} v_i \big\|^2$
Take derivatives of the error over $\mu$, $c$ and $v$ and set them to zero:
$\mu = \frac{1}{N} \sum_{j=1}^N x_j, \quad v = \mathrm{eig}\Big( \frac{1}{N} \sum_{j=1}^N (x_j - \mu)(x_j - \mu)^T \Big), \quad c_{ij} = (x_j - \mu)^T v_i$
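
A minimal numpy sketch of these closed-form updates, assuming a data matrix X with one point per row (function and variable names are illustrative, not from the slides):

```python
import numpy as np

def pca(X, C):
    """Fit PCA by the least-squares solution of slide 3.

    X : (N, D) data matrix, one point per row.
    C : number of components to keep.
    Returns the mean mu, top-C eigenvectors V (D, C) and coefficients (N, C).
    """
    mu = X.mean(axis=0)                   # mu = (1/N) sum_j x_j
    Xc = X - mu
    Sigma = Xc.T @ Xc / X.shape[0]        # sample covariance
    evals, evecs = np.linalg.eigh(Sigma)  # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1][:C]   # keep the top-C directions
    V = evecs[:, order]
    coeffs = Xc @ V                       # c_ij = (x_j - mu)^T v_i
    return mu, V, coeffs

# usage: reconstruct with y_j = mu + sum_i c_ij v_i
X = np.random.randn(100, 5)
mu, V, c = pca(X, 2)
X_hat = mu + c @ V.T
```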

4 Manifold Learning & Embedding
Data is often not Gaussian and not in a linear subspace.
Consider an image of a face being translated from left to right: $x_t = T_t(x_0)$, which is nonlinear!
How do we capture the true coordinates of the data on the manifold or embedding space and represent them compactly?
Unlike PCA, embedding does not try to reconstruct the data; it just finds better, more compact coordinates on the manifold.
Example: instead of the pixel-intensity image (x, y), find a measure t of how far the face has translated.

5 Multidimensional Scaling
Idea: find a low-dimensional embedding that mimics only the distances between the points X in the original space.
Construct another set of low-dimensional (say 2D) points with coordinates Y that maintain the pairwise distances.
A dissimilarity $d(x_i, x_j)$ is a function of two inputs such that $d(x_i, x_j) \geq 0$, $d(x_i, x_i) = 0$, and $d(x_i, x_j) = d(x_j, x_i)$.
A distance metric is stricter; it also satisfies the triangle inequality: $d(x_i, x_k) \leq d(x_i, x_j) + d(x_j, x_k)$.
Standard example: the Euclidean (l2) metric $d(x_i, x_j) = \| x_i - x_j \|$ (the squared form $\tfrac{1}{2}\| x_i - x_j \|^2$ is a dissimilarity but not a metric).
Assume that for N objects we compute a dissimilarity matrix $\Delta$ which tells us how far apart they are: $\Delta_{ij} = d(x_i, x_j)$.

6 Multidimensional Scaling
Given the dissimilarity $\Delta$ between the original X points under the original metric d(), find Y points with dissimilarity D under another metric d'() such that D is similar to $\Delta$:
$\Delta_{ij} = d(x_i, x_j), \quad D_{ij} = d'(y_i, y_j)$
Want to find Y's that minimize some measure of the difference between D and $\Delta$:
E.g. Least Squares Stress: $\mathrm{Stress}(y_1, \ldots, y_N) = \sum_{ij} (D_{ij} - \Delta_{ij})^2$
E.g. Invariant Stress: $\mathrm{InvStress} = \mathrm{Stress}(Y) \big/ \sum_{i<j} D_{ij}^2$
E.g. Sammon Mapping: $\sum_{ij} \frac{1}{\Delta_{ij}} (D_{ij} - \Delta_{ij})^2$
E.g. Strain: $\mathrm{trace}\big( J(\Delta^2 - D^2) J (\Delta^2 - D^2) \big)$ where $J = I - \tfrac{1}{N} 1 1^T$
Some of these criteria are global, some are local; minimize them by gradient descent.
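
A minimal numpy sketch of gradient descent on the least-squares stress (the step size, iteration count and random initialization are illustrative assumptions, not values from the slides):

```python
import numpy as np

def mds_stress_descent(Delta, dim=2, steps=500, lr=0.01, seed=0):
    """Minimize the least-squares stress sum_{i<j} (D_ij - Delta_ij)^2
    over the embedded points Y by plain gradient descent.

    Delta : (N, N) symmetric dissimilarity matrix.
    """
    rng = np.random.default_rng(seed)
    N = Delta.shape[0]
    Y = rng.standard_normal((N, dim))
    for _ in range(steps):
        diff = Y[:, None, :] - Y[None, :, :]           # y_i - y_j, shape (N, N, dim)
        D = np.linalg.norm(diff, axis=2)               # embedded distances
        np.fill_diagonal(D, 1.0)                       # avoid divide-by-zero on the diagonal
        G = (D - Delta) / D
        np.fill_diagonal(G, 0.0)
        grad = 2 * (G[:, :, None] * diff).sum(axis=1)  # d Stress / d y_i
        Y -= lr * grad
    return Y
```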

7 MDS Example: 3D to 2D
Have distances from cities to cities; these lie on the surface of a sphere (the Earth) in 3D space.
The reconstructed 2D points on a plane capture the essential properties (though what happens near the poles?).

8 MDS Example: Multi-D to 2D
A more elaborate example: we have a correlation matrix between crimes, which is of arbitrary dimensionality.
Hack: convert the correlations to dissimilarities and show the reconstructed Y.

9 Locally Linear Embedding
Instead of trying to preserve ALL pairwise distances, only preserve SOME distances, across nearby points: this lets us unwrap the manifold!
Also, distances are only locally valid: Euclidean distance is only similar to geodesic distance at small scales.
How do we pick which distances? Find the k nearest neighbors of each point and only preserve those.

10 LLE with K-Nearest Neighbors
Start with unconnected points $x_1, \ldots, x_N$.
Compute all pairwise distances $A_{ij} = d(x_i, x_j)$.
Connect each point to its k closest points: $B_{ij} = 1$ if $A_{ij}$ is among the k smallest entries of row $A_{i\cdot}$, else 0.
Then symmetrize the connectivity matrix: $B_{ij} = \max(B_{ij}, B_{ji})$.
(Figure: six example points $x_1, \ldots, x_6$ and their resulting connectivity matrix B.)
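
A minimal numpy sketch of this connectivity construction (names are illustrative):

```python
import numpy as np

def knn_connectivity(X, k):
    """Build the symmetric k-nearest-neighbor connectivity matrix B of slide 10.

    X : (N, D) data matrix.  Returns B (N, N) with B_ij = 1 if i and j are connected.
    """
    # A_ij = ||x_i - x_j||: all pairwise distances
    A = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(A, np.inf)             # never pick the point itself
    B = np.zeros_like(A)
    idx = np.argsort(A, axis=1)[:, :k]      # k closest points of each row
    rows = np.repeat(np.arange(X.shape[0]), k)
    B[rows, idx.ravel()] = 1
    B = np.maximum(B, B.T)                  # symmetrize: B_ij = max(B_ij, B_ji)
    return B
```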

11 LLE
Instead of distances, look at the neighborhood of each point: preserve the reconstruction of each point from its neighbors in the low-dimensional space.
Find the K nearest neighbors of each point.
Describe each neighborhood by the best weights on the neighbors for reconstructing the point:
$\varepsilon(W) = \sum_i \big\| x_i - \sum_j W_{ij} x_j \big\|^2$ subject to $\sum_j W_{ij} = 1$ for each i.
Then find the low-dimensional vectors that are still best reconstructed by the same weights:
$\Phi(Y) = \sum_i \big\| y_i - \sum_j W_{ij} y_j \big\|^2$ subject to $\sum_{i=1}^N y_i = 0$ and $\sum_{i=1}^N y_i y_i^T = I$.

12 LLE
Finding the W's (a convex combination of weights on the neighbors):
$\varepsilon(W) = \sum_i \varepsilon_i(W_i)$ where $\varepsilon_i(W_i) = \big\| x_i - \sum_j W_{ij} x_j \big\|^2$
Using $\sum_j W_{ij} = 1$:
$\varepsilon_i(W_i) = \big\| \sum_j W_{ij} (x_i - x_j) \big\|^2 = \sum_{jk} W_{ij} W_{ik} (x_i - x_j)^T (x_i - x_k) = \sum_{jk} W_{ij} W_{ik} C_{jk}$
where $C_{jk} = (x_i - x_j)^T (x_i - x_k)$ is the local covariance. Enforce the sum-to-one constraint with a Lagrange multiplier:
$W_i^* = \arg\min_w \tfrac{1}{2} w^T C w - \lambda (w^T 1 - 1)$
1) Take the derivative and set it to zero: $C w - \lambda 1 = 0$
2) Solve the linear system: $C \tilde{w} = 1$ where $\tilde{w} = w / \lambda$
3) Find $\lambda$ from the constraint $w^T 1 = 1$: $\lambda = 1 / (\tilde{w}^T 1)$
4) Find $w = \lambda \tilde{w}$
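
A minimal numpy sketch of this per-point solve (the small ridge term `reg` is a common practical addition when k > D and is not part of the slide's derivation; names are illustrative):

```python
import numpy as np

def lle_weights(X, neighbors, reg=1e-3):
    """Solve for the LLE reconstruction weights W of slides 11-12.

    X         : (N, D) data matrix.
    neighbors : list where neighbors[i] holds the indices of x_i's k neighbors.
    reg       : ridge regularizer used when the local covariance C is singular.
    """
    N = X.shape[0]
    W = np.zeros((N, N))
    for i in range(N):
        nbrs = neighbors[i]
        Z = X[i] - X[nbrs]                           # rows are (x_i - x_j)
        C = Z @ Z.T                                  # local covariance C_jk
        C += reg * np.trace(C) * np.eye(len(nbrs))   # optional stabilizer
        w = np.linalg.solve(C, np.ones(len(nbrs)))   # step 2: solve C w~ = 1
        W[i, nbrs] = w / w.sum()                     # steps 3-4: rescale so weights sum to 1
    return W
```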

13 LLE
Finding the Y's (new low-dimensional points that agree with the W's):
$\Phi(Y) = \sum_i \big\| y_i - \sum_j W_{ij} y_j \big\|^2$ subject to $\sum_{i=1}^N y_i = 0$ and $\sum_{i=1}^N y_i y_i^T = I$
Expanding the quadratic form:
$\Phi(Y) = \sum_{jk} M_{jk} \, y_j^T y_k = \mathrm{tr}(M Y Y^T)$ where $M_{jk} = \delta_{jk} - W_{jk} - W_{kj} + \sum_i W_{ij} W_{ik}$
and Y is the matrix whose rows are the y vectors.
To minimize the above subject to the constraints, set Y to the bottom d+1 eigenvectors of M (the very bottom, constant eigenvector is discarded).
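
A minimal numpy sketch of this eigenvector step, assuming the weight matrix W from the previous slide (names are illustrative):

```python
import numpy as np

def lle_embedding(W, d):
    """Find the low-dimensional Y of slide 13 from the weight matrix W.

    W : (N, N) reconstruction weights (rows sum to 1).
    d : target dimensionality.
    """
    N = W.shape[0]
    M = (np.eye(N) - W).T @ (np.eye(N) - W)   # M_jk = delta_jk - W_jk - W_kj + sum_i W_ij W_ik
    evals, evecs = np.linalg.eigh(M)          # ascending eigenvalues
    # the bottom eigenvector is the constant vector (eigenvalue ~0); discard it
    Y = evecs[:, 1:d + 1]
    return Y
```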

14 LLE Results
Synthetic data on an S-shaped manifold: from noisy 3D samples X we get the 2D LLE embedding Y.
Real data on face images: each x is an image that has been rasterized into a vector, and the dots are the reconstructed two-dimensional Y points.

15 LLE Results (top: PCA, bottom: LLE)

16 Kernel Principal Components Analysis
Recall, PCA approximates the data with eigenvectors, a mean and coefficients: $x_i \approx \mu + \sum_{j=1}^C c_{ij} v_j$
Get the eigenvectors that best approximate the covariance: $\Sigma = V \Lambda V^T$
The eigenvectors are orthonormal: $v_i^T v_j = \delta_{ij}$
In the coordinates of v the Gaussian is diagonal: $\mathrm{cov} = \Lambda$
Higher eigenvalues mean higher variance, so use those directions first.
To compute the coefficients: $c_{ij} = (x_i - \mu)^T v_j$
How to extend PCA to make it nonlinear? Kernels!

17 Kernel PCA
Idea: replace the dot-products in PCA with kernel evaluations.
Recall that if the data is zero-mean we could do PCA on the DxD covariance matrix $C = \frac{1}{N} \sum_{i=1}^N x_i x_i^T$, whose eigenvalues and eigenvectors satisfy $\lambda v = C v$, or on the NxN Gram matrix $K_{ij} = x_i^T x_j$.
For nonlinearity, do PCA on feature expansions: $\bar{C} = \frac{1}{N} \sum_{i=1}^N \phi(x_i) \phi(x_i)^T$
Instead of doing the explicit feature expansion, use a kernel, e.g. the d-th order polynomial $K_{ij} = k(x_i, x_j) = \phi(x_i)^T \phi(x_j) = (x_i^T x_j)^d$
As usual, the kernel must satisfy Mercer's theorem.
Assume, for simplicity, that all the feature data is zero-mean: $\sum_{i=1}^N \phi(x_i) = 0$

18 Kernel PCA
Efficiently find and use the eigenvectors of $\bar{C}$: $\lambda v = \bar{C} v$
Dot either side of the equation with a feature vector: $\lambda \phi(x_i)^T v = \phi(x_i)^T \bar{C} v$
The eigenvectors lie in the span of the feature vectors: $v = \sum_{i=1}^N \alpha_i \phi(x_i)$
Combine the equations:
$\lambda \phi(x_i)^T \sum_{j=1}^N \alpha_j \phi(x_j) = \phi(x_i)^T \Big\{ \frac{1}{N} \sum_{k=1}^N \phi(x_k) \phi(x_k)^T \Big\} \sum_{j=1}^N \alpha_j \phi(x_j)$
$\lambda \sum_{j=1}^N \alpha_j K_{ij} = \frac{1}{N} \sum_{k=1}^N K_{ik} \sum_{j=1}^N \alpha_j K_{kj}$
$N \lambda K \alpha = K^2 \alpha \quad \Rightarrow \quad N \lambda \alpha = K \alpha$

19 Kernel PCA
From before we had $\lambda \phi(x_i)^T v = \phi(x_i)^T \bar{C} v$; this became the eigenvalue equation $N \lambda \alpha = K \alpha$.
Get the eigenvectors $\alpha$ and eigenvalues of K; the eigenvalues of K are N times $\lambda$.
For each eigenvector $\alpha^k$ there is an eigenvector $v^k$.
Want the eigenvectors v to be normalized, $(v^k)^T v^k = 1$:
$\Big( \sum_{i=1}^N \alpha_i^k \phi(x_i) \Big)^T \Big( \sum_{j=1}^N \alpha_j^k \phi(x_j) \Big) = (\alpha^k)^T K \alpha^k = (\alpha^k)^T N \lambda_k \alpha^k = 1 \quad \Rightarrow \quad (\alpha^k)^T \alpha^k = \frac{1}{N \lambda_k}$
Can now use the alphas only for doing PCA projection and reconstruction!

20 Kernel PCA
To compute the k-th projection coefficient of a new point $\phi(x)$:
$c_k = \phi(x)^T v^k = \phi(x)^T \sum_{i=1}^N \alpha_i^k \phi(x_i) = \sum_{i=1}^N \alpha_i^k \, k(x, x_i)$
Reconstruction*: $\hat{\phi}(x) = \sum_{k=1}^K c_k v^k = \sum_{k=1}^K c_k \sum_{j=1}^N \alpha_j^k \phi(x_j)$
*Pre-image problem: the linear combination in Hilbert space generally falls outside the image of the input space.
Can now do nonlinear PCA, and do PCA on non-vectors.
The nonlinear KPCA eigenvectors satisfy the same properties as usual PCA, but in Hilbert space. These eigenvectors:
1) Top q have maximum variance
2) Top q give the reconstruction with minimum mean squared error
3) Are uncorrelated/orthogonal
4) Top ones have maximum mutual information with the inputs

21 Centering Kernel PCA
So far we had assumed the feature data was zero-mean: $\sum_{i=1}^N \phi(x_i) = 0$
We want this: $\tilde{\phi}(x_j) = \phi(x_j) - \frac{1}{N} \sum_{i=1}^N \phi(x_i)$
How to do it without touching feature space? Use kernels:
$\tilde{K}_{ij} = \tilde{\phi}(x_i)^T \tilde{\phi}(x_j) = \Big( \phi(x_i) - \frac{1}{N} \sum_{k=1}^N \phi(x_k) \Big)^T \Big( \phi(x_j) - \frac{1}{N} \sum_{k=1}^N \phi(x_k) \Big)$
$= K_{ij} - \frac{1}{N} \sum_{k=1}^N K_{kj} - \frac{1}{N} \sum_{k=1}^N K_{ik} + \frac{1}{N^2} \sum_{k=1}^N \sum_{l=1}^N K_{kl}$
Can get the alpha eigenvectors from $\tilde{K}$ by adjusting the old K.
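
Putting slides 17-21 together, a minimal numpy sketch of kernel PCA from a precomputed, uncentered Gram matrix (the function name and the choice to return training-point projections are illustrative assumptions):

```python
import numpy as np

def kernel_pca(K, q):
    """Kernel PCA from an uncentered Gram matrix K_ij = k(x_i, x_j).

    Returns the alphas (N, q) and the projections of the training points (N, q).
    """
    N = K.shape[0]
    one = np.ones((N, N)) / N
    K_tilde = K - one @ K - K @ one + one @ K @ one   # centering in feature space (slide 21)
    evals, evecs = np.linalg.eigh(K_tilde)            # eigenvalues of K~ are N * lambda
    order = np.argsort(evals)[::-1][:q]               # keep the top-q components
    lam, alpha = evals[order], evecs[:, order]
    alpha = alpha / np.sqrt(lam)                      # enforce (alpha^k)^T alpha^k = 1 / (N lambda_k)
    coeffs = K_tilde @ alpha                          # c_k(x_i) = sum_j alpha_j^k K~_ij
    return alpha, coeffs
```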

22 Kernel PCA Results
KPCA on a 2D dataset.
Left to right: the kernel polynomial order goes from 1 to 3 (1 = linear = PCA).
Top to bottom: from the top eigenvector to weaker eigenvectors.

23 Kernel PCA Results
Use the KPCA coefficients for training a linear SVM classifier to recognize chairs from their images.
Use various polynomial kernel degrees, where degree 1 = linear, as in regular PCA.

24 Kernel PCA Results
Use the KPCA coefficients for training a linear SVM classifier to recognize characters from their images.
Use various polynomial kernel degrees, where degree 1 = linear, as in regular PCA (the worst case in these experiments).
Performance is inferior to nonlinear SVMs (why?).

25 Semidefinite Embedding
Also known as Maximum Variance Unfolding.
Similar to LLE and kernel PCA: like LLE, it maintains only the distances within each neighborhood.
Stretch all the data while maintaining those distances, then apply PCA (or KPCA).

26 Semidefinite Embedding
To visualize high-dimensional data $\{x_1, \ldots, x_N\}$:
PCA and Kernel PCA (Schölkopf et al.):
- Get a matrix A of affinities between pairs, $A_{ij} = k(x_i, x_j)$
- SVD A and view the top projections
Semidefinite Embedding (Weinberger & Saul):
- Get the k-nearest-neighbors graph of the data
- Get the matrix A
- Use a max-trace SDP to stretch the graph A into a positive definite matrix K
- SVD K and view the top projections

27 Semidefinite Embedding
SDE unfolds (pulls apart) the kNN-connected graph C but preserves the pairwise distances wherever $C_{ij} = 1$:
$\max_K \sum_i \lambda_i$ subject to $K \in \kappa$, where
$\kappa = \big\{ K \in R^{N \times N} : K \succeq 0, \;\; \sum_{ij} K_{ij} = 0, \;\; K_{ii} + K_{jj} - K_{ij} - K_{ji} = A_{ii} + A_{jj} - A_{ij} - A_{ji} \text{ if } C_{ij} = 1 \big\}$
SDE's stretching of the graph improves the visualization.
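
This SDP can be written almost verbatim in a convex-optimization layer. A minimal sketch using cvxpy, assumed here as a stand-in for the YALMIP/Matlab setup the slides use (function name and solver defaults are illustrative):

```python
import cvxpy as cp
import numpy as np

def sde(A, C):
    """Semidefinite embedding sketch of slide 27.

    A : (N, N) Gram matrix of the inputs, A_ij = x_i^T x_j.
    C : (N, N) binary kNN connectivity matrix.
    Returns the learned kernel K; embed by taking its top eigenvectors.
    """
    N = A.shape[0]
    K = cp.Variable((N, N), PSD=True)
    constraints = [cp.sum(K) == 0]                      # centering constraint
    for i in range(N):
        for j in range(N):
            if C[i, j] == 1:                            # preserve connected distances
                constraints.append(
                    K[i, i] + K[j, j] - K[i, j] - K[j, i]
                    == A[i, i] + A[j, j] - A[i, j] - A[j, i])
    prob = cp.Problem(cp.Maximize(cp.trace(K)), constraints)  # max sum of eigenvalues
    prob.solve()
    return K.value
```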

28 SDE Optimization with YALMIP
Linear Programming ⊂ Quadratic Programming ⊂ Quadratically Constrained Quadratic Programming ⊂ Semidefinite Programming ⊂ Convex Programming ⊂ Polynomial-Time Algorithms
(LP ⊂ QP ⊂ QCQP ⊂ SDP ⊂ CP ⊂ P)

29 SDE Optimization with YALMIP
LP (fast, roughly $O(N^3)$): $\min_x b^T x$ s.t. $c_i^T x \geq \alpha_i \; \forall i$
QP: $\min_x \tfrac{1}{2} x^T H x + b^T x$ s.t. $c_i^T x \geq \alpha_i \; \forall i$
QCQP: $\min_x \tfrac{1}{2} x^T H x + b^T x$ s.t. $c_i^T x \geq \alpha_i \; \forall i$, $x^T x \leq \eta$
SDP: $\min_K \mathrm{tr}(B K)$ s.t. $\mathrm{tr}(C_i^T K) \geq \alpha_i \; \forall i$, $K \succeq 0$
CP (slow, roughly $O(N^3)$): $\min_x f(x)$ s.t. $g(x) \leq \alpha$
All of the above are handled by the YALMIP package for Matlab. Google it, download and install!

30 SDE Results
SDE unfolds (pulls apart) the kNN-connected graph C.
SDE's stretching of the graph improves the visualization here.

31 SDE Results
SDE unfolds (pulls apart) the kNN-connected graph C before doing PCA.
It gets more use (energy) out of the top eigenvectors than PCA does.
SDE's stretching of the graph improves the visualization here.

32 SDE Problems (motivating MVE)
But SDE's stretching could worsen the visualization! (Spokes experiment: PCA vs. SDE.)
We want to pull apart only the visualized dimensions and flatten down the remaining ones.

33 Minimum Volume Embedding
To reduce dimension, drive the energy into the top (e.g. 2) dims.
Maximize the fidelity, the % of energy in the top dims: $F(K) = \frac{\lambda_1 + \lambda_2}{\sum_i \lambda_i}$
(e.g. $F(K) = 0.98$ versus $F(K) = 0.29$ for the two embeddings shown)
Equivalent to maximizing $\lambda_1 + \lambda_2 - \beta \sum_i \lambda_i$ for some $\beta$; assume $\beta = 1/2$.

34 Minimum Volume Embedding
Stretch in the top d < D dimensions and squash the rest:
$\max_K \sum_{i=1}^d \lambda_i - \sum_{i=d+1}^D \lambda_i$ s.t. $K \in \kappa$
The simplest linear-spectral SDP: $\alpha = (\alpha_1, \ldots, \alpha_d, \alpha_{d+1}, \ldots, \alpha_D) = (1, \ldots, 1, -1, \ldots, -1)$
This effectively maximizes the eigengap between the d-th and (d+1)-th eigenvalues.

35 Minimum Volume Embedding
Same setup as above: stretch in the top d < D dimensions and squash the rest, $\max_K \sum_{i=1}^d \lambda_i - \sum_{i=d+1}^D \lambda_i$ s.t. $K \in \kappa$.
Use a variational bound on the cost, giving an iterated, monotonic SDP:
Lock V and solve the SDP for K; then lock K and solve the SVD for V; repeat.

36 MVE Optimization: Spectral SDP
A spectral function $f(\lambda_1, \ldots, \lambda_D)$ depends on the eigenvalues of a matrix K.
SDP packages use restricted cost functions over the hull $\kappa$:
Trace SDPs: $\max_{K \in \kappa} \mathrm{tr}(B K)$
Logdet SDPs: $\max_{K \in \kappa} \sum_i \log \lambda_i$
Consider richer SDPs (assume the $\lambda_i$ are in decreasing order):
Linear-spectral SDPs: $\max_{K \in \kappa} \sum_i \alpha_i \lambda_i$
Problem: the last one doesn't fit nicely into standard SDP code (like YALMIP or CSDP). We need an iterative algorithm.

37 MVE Optimization: Spectral SDP
$g(K) = \sum_i \alpha_i \lambda_i$
If the alphas are ordered we get a variational SDP problem (a Procrustes problem):
$\max_{K \in \kappa} g(K) = \max_{K \in \kappa} \sum_i \alpha_i \lambda_i = \max_{K \in \kappa} \sum_i \alpha_i \lambda_i \, \mathrm{tr}(v_i^T v_i)$ s.t. $K v_i = \lambda_i v_i$, $\lambda_i \geq \lambda_{i+1}$, $v_i^T v_j = \delta_{ij}$
$= \max_{K \in \kappa} \sum_i \alpha_i \mathrm{tr}(\lambda_i v_i v_i^T) = \max_{K \in \kappa} \sum_i \alpha_i \mathrm{tr}(K v_i v_i^T) = \max_{K \in \kappa} \mathrm{tr}\big( K \sum_i \alpha_i v_i v_i^T \big)$
$= \max_{K \in \kappa} \min_U \mathrm{tr}\big( K \sum_i \alpha_i u_i u_i^T \big)$ s.t. $u_i^T u_j = \delta_{ij}$, if $\alpha_i \leq \alpha_{i+1}$
$= \max_{K \in \kappa} \max_U \mathrm{tr}\big( K \sum_i \alpha_i u_i u_i^T \big)$ s.t. $u_i^T u_j = \delta_{ij}$, if $\alpha_i \geq \alpha_{i+1}$
For the max over K use an SDP; for the max over U use an SVD. Iterate to obtain monotonic improvement.
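
A sketch of this alternation, reusing the SDE constraint set from slide 27 and again using cvxpy as a stand-in for the YALMIP/CSDP code mentioned on the slides; the function name, the iteration count and the initialization K = A are illustrative assumptions:

```python
import cvxpy as cp
import numpy as np

def mve(A, C, d=2, n_iter=5):
    """Iterated SDP/SVD sketch of MVE (slides 34-37).

    A, C : Gram and kNN connectivity matrices as in the SDE sketch above.
    d    : number of dimensions to stretch; the rest are squashed.
    """
    N = A.shape[0]
    alpha = np.concatenate([np.ones(d), -np.ones(N - d)])     # decreasing linear-spectral weights
    K_val = A.copy()
    for _ in range(n_iter):
        # lock K: eigenvectors in decreasing eigenvalue order (the SVD step)
        _, U = np.linalg.eigh(K_val)                          # eigh is ascending, so reverse
        U = U[:, ::-1]
        B = U @ np.diag(alpha) @ U.T                          # B = sum_i alpha_i u_i u_i^T
        # lock U: solve the SDP  max tr(B K)  over the same constraint set as SDE
        K = cp.Variable((N, N), PSD=True)
        cons = [cp.sum(K) == 0]
        for i in range(N):
            for j in range(N):
                if C[i, j] == 1:
                    cons.append(K[i, i] + K[j, j] - K[i, j] - K[j, i]
                                == A[i, i] + A[j, j] - A[i, j] - A[j, i])
        cp.Problem(cp.Maximize(cp.trace(B @ K)), cons).solve()
        K_val = K.value
    return K_val
```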

38 MVE Optimization: Spectral SDP
Theorem: if the alphas are decreasing, the objective $g(K) = \sum_i \alpha_i \lambda_i$ with $\alpha_i \geq \alpha_{i+1}$ is convex.
Proof: Recall from (Overton & Womersley '91), (Fan '49) and (Bach & Jordan '03) that the sum of the d top eigenvalues of a p.d. matrix is convex: $f_d(K) = \sum_{i=1}^d \lambda_i$ is convex.
Our linear-spectral cost is a combination of these:
$g(K) = \alpha_D f_D(K) + \sum_{i=1}^{D-1} (\alpha_i - \alpha_{i+1}) f_i(K) = \alpha_D \, \mathrm{tr}(K) + \sum_{i=1}^{D-1} (\alpha_i - \alpha_{i+1}) f_i(K)$
A trace (linear) term plus a conic combination of convex functions is convex.

39 MVE Pseudocode Download at

40 MVE Results
Spokes experiment: visualization and spectra.
Converges in about 5 iterations.

41 MVE Results
Swiss roll visualization (connectivity via kNN, d set to 2): PCA vs. SDE vs. MVE.
Same convergence under random initialization or K = A.

42 kNN Embedding with MVE
MVE does better even with kNN connectivity.
Face images with spectra: PCA vs. SDE vs. MVE.

43 kNN Embedding with MVE
MVE does better even with kNN connectivity.
Digit image visualization with spectra: PCA vs. SDE vs. MVE.

44 Graph Embedding with MVE
Collaboration network with spectra: PCA vs. SDE vs. MVE.

45 MVE Results
Evaluate the fidelity, the % of energy in the top 2 dims:

             KPCA    SDE     MVE
Star         95.0%   29.9%   100.0%
Swiss Roll   45.8%   99.9%    99.9%
Twos         18.4%   88.4%    97.8%
Faces        31.4%   83.6%    99.2%
Network       5.9%   95.3%    99.2%

46 MVE for Spanning Trees
Instead of kNN, use a maximum-weight spanning tree to connect the points (Kruskal's algorithm: connect points with short edges first, skip edges that create loops, stop when you have a tree).
Tree connectivity can fold under SDE or MVE. Add constraints on all pairs to keep all distances from shrinking (call this SDE-FULL or MVE-FULL):
$K \in \kappa$ (the original constraints) and $K_{ii} + K_{jj} - K_{ij} - K_{ji} \geq A_{ii} + A_{jj} - A_{ij} - A_{ji}$ for all pairs (i, j).
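
A minimal sketch of Kruskal's construction as described here, using a small union-find (names are illustrative; the maximum-weight tree over affinities is the minimum-distance tree built below):

```python
import numpy as np

def kruskal_tree(X):
    """Spanning-tree connectivity of slide 46: add short edges first,
    skip edges that would create a loop, stop when the tree spans all points.
    """
    N = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    edges = sorted((D[i, j], i, j) for i in range(N) for j in range(i + 1, N))
    parent = list(range(N))

    def find(a):                       # union-find with path halving
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    C = np.zeros((N, N), dtype=int)    # tree connectivity matrix
    n_edges = 0
    for w, i, j in edges:              # shortest edges first
        ri, rj = find(i), find(j)
        if ri != rj:                   # skip edges that create loops
            parent[ri] = rj
            C[i, j] = C[j, i] = 1
            n_edges += 1
            if n_edges == N - 1:       # stop when it is a spanning tree
                break
    return C
```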

47 MVE for Spanning Trees
Tree connectivity with degree = 2.
Top: taxonomy tree of 30 species of salamanders.
Bottom: taxonomy tree of 56 species of crustaceans.
