Advanced Machine Learning & Perception
1 Advanced Machine Learning & Perception Instructor: Tony Jebara
2 Topic 2: Nonlinear Manifold Learning, Beyond Principal Components Analysis (PCA)
- Multidimensional Scaling (MDS)
- Locally Linear Embedding (LLE)
- Kernel PCA (KPCA)
- Semidefinite Embedding (SDE)
- Minimum Volume Embedding (MVE)
3 Principal Components Analysis. Encode data on a linear (flat) manifold as steps along its axes: $y_j = \mu + \sum_{i=1}^C c_{ij} v_i$. The best choice of $\mu$, $c$ and $v$ is least squares, or equivalently maximum Gaussian likelihood: $\mathrm{error} = \sum_{j=1}^N \| x_j - y_j \|^2 = \sum_{j=1}^N \| x_j - \mu - \sum_{i=1}^C c_{ij} v_i \|^2$. Take derivatives of the error over $\mu$, $c$ and $v$ and set them to zero: $\mu = \frac{1}{N} \sum_{j=1}^N x_j$, $v = \mathrm{eig}\left( \frac{1}{N} \sum_{j=1}^N (x_j - \mu)(x_j - \mu)^T \right)$, $c_{ij} = (x_j - \mu)^T v_i$.
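The least-squares solution above maps directly onto a few lines of linear algebra. Here is a minimal NumPy sketch; the function and variable names are illustrative, not from the course code:

```python
import numpy as np

def pca(X, C):
    """X: (N, D) data matrix; C: number of components to keep."""
    mu = X.mean(axis=0)                    # mu = (1/N) sum_j x_j
    Xc = X - mu
    Sigma = Xc.T @ Xc / X.shape[0]         # sample covariance
    evals, evecs = np.linalg.eigh(Sigma)   # eigenvalues in ascending order
    V = evecs[:, ::-1][:, :C]              # top-C eigenvectors v_i
    coeffs = Xc @ V                        # c_ij = (x_j - mu)^T v_i
    return mu, V, coeffs

# Reconstruction y_j = mu + sum_i c_ij v_i:  X_hat = mu + coeffs @ V.T
```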
4 Manifold Learning & Embedding. Data is often not Gaussian and not in a linear subspace. Consider an image of a face being translated from left to right: $x_t = T_t(x_0)$, which is nonlinear! How do we capture the true coordinates of the data on the manifold or embedding space and represent them compactly? Unlike PCA, embedding does not try to reconstruct the data; it just finds better, more compact coordinates on the manifold. For example, instead of a pixel-intensity image (x,y), find a measure (t) of how far the face has translated.
5 Multidimensional Scaling. Idea: find a low-dimensional embedding that mimics only the distances between points X in the original space. Construct another set of low-dimensional (say 2D) points with coordinates Y that maintain the pairwise distances. A dissimilarity $d(x_i, x_j)$ is a function of two inputs such that $d(x_i, x_j) \geq 0$, $d(x_i, x_i) = 0$, and $d(x_i, x_j) = d(x_j, x_i)$. A distance metric is stricter and also satisfies the triangle inequality: $d(x_i, x_k) \leq d(x_i, x_j) + d(x_j, x_k)$. Standard example: the Euclidean $\ell_2$ metric $d(x_i, x_j) = \left( \| x_i - x_j \|^2 \right)^{1/2}$. Assume that for N objects we compute a dissimilarity matrix $\Delta$ which tells us how far apart they are: $\Delta_{ij} = d(x_i, x_j)$.
6 Multidimensional Scaling. Given the dissimilarity $\Delta$ between the original X points under the original metric $d()$, find Y points whose dissimilarity D under another metric $d'()$ is similar to $\Delta$: $\Delta_{ij} = d(x_i, x_j)$, $D_{ij} = d'(y_i, y_j)$. We want to find Y's that minimize some difference between D and $\Delta$. E.g. least-squares Stress: $\mathrm{Stress}(y_1, \ldots, y_N) = \sum_{ij} ( D_{ij} - \Delta_{ij} )^2$. E.g. invariant stress: $\mathrm{InvStress} = \mathrm{Stress}(Y) / \sum_{i<j} D_{ij}^2$. E.g. Sammon mapping: $\sum_{ij} \frac{1}{\Delta_{ij}} ( D_{ij} - \Delta_{ij} )^2$. E.g. Strain: $\mathrm{trace}\left( J (\Delta^2 - D^2) J (\Delta^2 - D^2) \right)$ where $J = I - \frac{1}{N} \mathbf{1} \mathbf{1}^T$. Some criteria are global, some are local; minimize by gradient descent.
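As a concrete illustration, here is a hedged NumPy sketch of minimizing the least-squares Stress by gradient descent; the step size, iteration count, and random initialization are arbitrary choices, not values from the slides:

```python
import numpy as np

def mds_stress(Delta, dim=2, steps=2000, lr=1e-3, seed=0):
    """Gradient descent on Stress = sum_ij (D_ij - Delta_ij)^2.

    Delta: (N, N) symmetric dissimilarity matrix. Returns (N, dim) points Y.
    """
    N = Delta.shape[0]
    Y = np.random.default_rng(seed).normal(size=(N, dim))
    for _ in range(steps):
        diff = Y[:, None, :] - Y[None, :, :]   # y_i - y_j
        D = np.linalg.norm(diff, axis=-1)      # D_ij = ||y_i - y_j||
        np.fill_diagonal(D, 1.0)               # diagonal terms have zero gradient anyway
        # dStress/dy_i = sum_j 4 (D_ij - Delta_ij) (y_i - y_j) / D_ij
        grad = 4.0 * (((D - Delta) / D)[:, :, None] * diff).sum(axis=1)
        Y -= lr * grad
    return Y
```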
7 MDS Example, 3D to 2D. Have distances between cities; these lie on the surface of a sphere (the Earth) in 3D space. The reconstructed 2D points on the plane capture the essential properties (except near the poles?).
8 MDS Example, Multi-D to 2D. A more elaborate example: have a correlation matrix between crimes, of arbitrary dimensionality. Hack: convert the correlations to dissimilarities and show the reconstructed Y.
9 Locally Linear Embedding. Instead of trying to preserve ALL pairwise distances, preserve only SOME distances, across nearby points: this lets us unwrap the manifold! Also, distances are only locally valid; Euclidean distance is only similar to geodesic distance at small scales. How do we pick which distances? Find the k nearest neighbors of each point and preserve only those.
10 LLE with K-Nearest Neighbors. Start with unconnected points. Compute all pairwise distances $A_{ij} = d(x_i, x_j)$. Connect each point to its k closest points: $B_{ij} = 1$ if $A_{ij}$ is among the k smallest entries of row $A_{i*}$. Then symmetrize the connectivity matrix: $B_{ij} = \max(B_{ij}, B_{ji})$. [Figure: example points $x_1, \ldots, x_6$ with their connectivity matrix.]
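The graph construction above is short enough to write out directly. A minimal NumPy sketch (illustrative names, not course code):

```python
import numpy as np

def knn_graph(X, k):
    """Symmetrized k-nearest-neighbor connectivity for rows of X."""
    A = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # A_ij = d(x_i, x_j)
    np.fill_diagonal(A, np.inf)                           # a point is not its own neighbor
    B = np.zeros_like(A, dtype=bool)
    for i in range(len(X)):
        B[i, np.argsort(A[i])[:k]] = True                 # connect i to its k closest points
    return B | B.T                                        # symmetrize: B_ij = max(B_ij, B_ji)
```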
11 LLE. Instead of distances, look at the neighborhood of each point, and preserve each point's reconstruction from its neighbors in the low-dimensional space. Find the K nearest neighbors of each point. Describe the neighborhood by the best weights on the neighbors for reconstructing the point: $\varepsilon(W) = \sum_i \| x_i - \sum_j W_{ij} x_j \|^2$ subject to $\sum_j W_{ij} = 1 \;\; \forall i$. Then find the best low-dimensional vectors that still have the same weights: $\Phi(Y) = \sum_i \| y_i - \sum_j W_{ij} y_j \|^2$ subject to $\sum_{i=1}^N y_i = 0$ and $\sum_{i=1}^N y_i y_i^T = I$.
12 LLE. Finding the W's (a convex combination of weights on the neighbors): $\varepsilon(W) = \sum_i \varepsilon_i(W_i)$ where $\varepsilon_i(W_i) = \| x_i - \sum_j W_{ij} x_j \|^2$. Using $\sum_j W_{ij} = 1$: $\varepsilon_i(W_i) = \| \sum_j W_{ij} (x_i - x_j) \|^2 = \sum_{jk} W_{ij} W_{ik} (x_i - x_j)^T (x_i - x_k) = \sum_{jk} W_{ij} W_{ik} C_{jk}$. So $W_i^* = \arg\min_w \frac{1}{2} w^T C w - \lambda ( w^T \mathbf{1} - 1 )$. 1) Take the derivative and set it to zero: $C w - \lambda \mathbf{1} = 0$. 2) Solve the linear system: $w = \lambda C^{-1} \mathbf{1}$. 3) Find $\lambda$ from the constraint $w^T \mathbf{1} = 1$. 4) Find $w$.
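A sketch of this weight-solving step: for each point, build the local Gram matrix $C_{jk} = (x_i - x_j)^T (x_i - x_k)$ over its neighbors, solve $C w \propto \mathbf{1}$, and rescale so the weights sum to one. The small regularizer is a common practical addition (needed when k exceeds the input dimension), not something stated on the slide:

```python
import numpy as np

def lle_weights(X, neighbors, reg=1e-3):
    """neighbors: list where neighbors[i] holds the indices of x_i's k-NN."""
    N = X.shape[0]
    W = np.zeros((N, N))
    for i in range(N):
        Z = X[neighbors[i]] - X[i]                   # rows are (x_j - x_i)
        C = Z @ Z.T                                  # local Gram matrix C_jk
        C += reg * np.trace(C) * np.eye(len(C))      # regularize a possibly singular C
        w = np.linalg.solve(C, np.ones(len(C)))      # solve C w = 1
        W[i, neighbors[i]] = w / w.sum()             # enforce sum_j W_ij = 1
    return W
```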
13 LLE. Finding the Y's (new low-dimensional points that agree with the W's): minimize $\Phi(Y) = \sum_i \| y_i - \sum_j W_{ij} y_j \|^2$ subject to $\sum_{i=1}^N y_i = 0$ and $\sum_{i=1}^N y_i y_i^T = I$. Expanding the quadratic: $\Phi(Y) = \sum_{jk} \left( \delta_{jk} - W_{jk} - W_{kj} + \sum_i W_{ij} W_{ik} \right) y_j^T y_k = \sum_{jk} M_{jk} \, y_j^T y_k = \mathrm{tr}( M Y Y^T )$, where Y is a matrix whose rows are the y vectors. To minimize this subject to the constraints, set Y to the bottom d+1 eigenvectors of M (the very bottom one is the constant vector, enforced to zero by the centering constraint, and is discarded).
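Note that $M = (I - W)^T (I - W)$ has exactly the entries above, which makes the embedding step a one-liner. A minimal sketch:

```python
import numpy as np

def lle_embed(W, d=2):
    """W: (N, N) LLE weight matrix. Returns the (N, d) embedding Y."""
    N = W.shape[0]
    I = np.eye(N)
    M = (I - W).T @ (I - W)            # M_jk = delta_jk - W_jk - W_kj + sum_i W_ij W_ik
    evals, evecs = np.linalg.eigh(M)   # ascending eigenvalues
    return evecs[:, 1:d + 1]           # skip the constant eigenvector (eigenvalue ~ 0)
```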
14 LLE Results. Synthetic data on an S-manifold: have noisy 3D samples X, get a 2D LLE embedding Y. Real data on face images: each x is an image that has been rasterized into a vector; the dots are the reconstructed two-dimensional Y points.
15 LLE Results. Top = PCA, bottom = LLE.
16 Kernel Principal Components Analysis. Recall, PCA approximates the data with eigenvectors, a mean, and coefficients: $x_i \approx \mu + \sum_{j=1}^C c_{ij} v_j$. Get the eigenvectors that best approximate the covariance: $\Sigma = V \Lambda V^T$. The eigenvectors are orthonormal: $v_i^T v_j = \delta_{ij}$. In the coordinates of v, the Gaussian is diagonal, with covariance $\Lambda$. Higher eigenvalues mean higher variance, so use those eigenvectors first. To compute the coefficients: $c_{ij} = (x_i - \mu)^T v_j$. How to extend PCA to make it nonlinear? Kernels!
17 Kernel PCA. Idea: replace the dot products in PCA with kernel evaluations. Recall, we could do PCA on the DxD covariance matrix of the data, $C = \frac{1}{N} \sum_{i=1}^N x_i x_i^T$ (if the data is zero-mean), finding eigenvalues and eigenvectors that satisfy $\lambda v = C v$, or equivalently work with the NxN Gram matrix $K_{ij} = x_i^T x_j$. For nonlinearity, do PCA on feature expansions: $\bar{C} = \frac{1}{N} \sum_{i=1}^N \phi(x_i) \phi(x_i)^T$. Instead of doing the explicit feature expansion, use a kernel, e.g. a d-th order polynomial: $K_{ij} = k(x_i, x_j) = \phi(x_i)^T \phi(x_j) = ( x_i^T x_j )^d$. As usual, the kernel must satisfy Mercer's theorem. Assume, for simplicity, that all the feature data is zero-mean: $\sum_{i=1}^N \phi(x_i) = 0$.
18 Kernel PCA. Efficiently find and use the eigenvectors of $\bar{C}$: $\lambda v = \bar{C} v$. We can dot either side of this equation with a feature vector: $\lambda \phi(x_i)^T v = \phi(x_i)^T \bar{C} v$. The eigenvectors are in the span of the feature vectors: $v = \sum_{j=1}^N \alpha_j \phi(x_j)$. Combining the equations: $\lambda \phi(x_i)^T \sum_{j=1}^N \alpha_j \phi(x_j) = \phi(x_i)^T \left\{ \frac{1}{N} \sum_{k=1}^N \phi(x_k) \phi(x_k)^T \right\} \sum_{j=1}^N \alpha_j \phi(x_j)$, i.e. $\lambda \sum_{j=1}^N \alpha_j K_{ij} = \frac{1}{N} \sum_{k=1}^N K_{ik} \sum_{j=1}^N \alpha_j K_{kj}$, i.e. $N \lambda K \alpha = K^2 \alpha$, so $N \lambda \alpha = K \alpha$.
19 Kernel PCA. From before we had $\lambda \phi(x_i)^T v = \phi(x_i)^T \bar{C} v$, which reduced to $N \lambda \alpha = K \alpha$: this is an eigenvalue equation! Get the eigenvectors $\alpha$ and eigenvalues of K; the eigenvalues of K are N times $\lambda$. For each eigenvector $\alpha^k$ there is an eigenvector $v^k$. We want the eigenvectors v to be normalized, $(v^k)^T v^k = 1$: $1 = \left( \sum_{i=1}^N \alpha_i^k \phi(x_i) \right)^T \left( \sum_{j=1}^N \alpha_j^k \phi(x_j) \right) = (\alpha^k)^T K \alpha^k = (\alpha^k)^T N \lambda_k \alpha^k$, so $(\alpha^k)^T \alpha^k = \frac{1}{N \lambda_k}$. Can now use the alphas alone for doing PCA projection and reconstruction!
20 Kernel PCA. To compute the k-th projection coefficient of a new point $\phi(x)$: $c_k = \phi(x)^T v^k = \phi(x)^T \sum_{i=1}^N \alpha_i^k \phi(x_i) = \sum_{i=1}^N \alpha_i^k \, k(x, x_i)$. Reconstruction*: $\hat{\phi}(x) = \sum_{k=1}^K c_k v^k = \sum_{k=1}^K \sum_{j=1}^N c_k \alpha_j^k \phi(x_j)$. *Pre-image problem: a linear combination in Hilbert space goes outside the set of valid feature vectors. Can now do nonlinear PCA, and do PCA on non-vectors. Nonlinear KPCA eigenvectors satisfy the same properties as usual PCA, but in Hilbert space. These eigenvectors: 1) the top q have max variance; 2) the top-q reconstruction has min mean squared error; 3) they are uncorrelated/orthogonal; 4) the top ones have max mutual information with the inputs.
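Putting the eigenproblem, the normalization, and the projection formula together gives a compact KPCA sketch. This assumes the Gram matrix has already been centered (next slide) and that the kept eigenvalues are strictly positive; the names are illustrative:

```python
import numpy as np

def kpca(K, q):
    """K: (N, N) centered Gram matrix; q: number of components to keep."""
    evals, evecs = np.linalg.eigh(K)                 # K alpha = (N lambda) alpha
    evals, evecs = evals[::-1][:q], evecs[:, ::-1][:, :q]
    # rescale so (alpha^k)^T alpha^k = 1 / (N lambda_k) = 1 / evals[k]
    alphas = evecs / np.sqrt(evals)
    return alphas            # training projections: K @ alphas

def project(k_new, alphas):
    """k_new: vector of kernel evaluations k(x, x_i) against the training set."""
    return k_new @ alphas    # c_k = sum_i alpha_i^k k(x, x_i)
```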
21 Centering Kernel PCA. So far we had assumed the data was zero-mean in feature space: $\sum_{i=1}^N \phi(x_i) = 0$. We want centered features: $\tilde{\phi}(x_j) = \phi(x_j) - \frac{1}{N} \sum_{i=1}^N \phi(x_i)$. How to do this without touching feature space? Use kernels: $\tilde{K}_{ij} = \tilde{\phi}(x_i)^T \tilde{\phi}(x_j) = \left( \phi(x_i) - \frac{1}{N} \sum_{k=1}^N \phi(x_k) \right)^T \left( \phi(x_j) - \frac{1}{N} \sum_{k=1}^N \phi(x_k) \right) = K_{ij} - \frac{1}{N} \sum_{k=1}^N K_{ik} - \frac{1}{N} \sum_{k=1}^N K_{kj} + \frac{1}{N^2} \sum_{k=1}^N \sum_{l=1}^N K_{kl}$. Can get the alpha eigenvectors from $\tilde{K}$ by adjusting the old K.
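The centering formula is a direct matrix identity, shown below as a one-line NumPy sketch:

```python
import numpy as np

def center_gram(K):
    """Center a Gram matrix in feature space without touching features."""
    N = K.shape[0]
    one = np.ones((N, N)) / N
    # K_tilde = K - (1/N) row sums - (1/N) column sums + (1/N^2) total sum
    return K - one @ K - K @ one + one @ K @ one
```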
22 Kernel PCA Results. KPCA on a 2D dataset. Left to right: the polynomial kernel order goes from 1 to 3 (1 = linear = PCA). Top to bottom: from the top eigenvector down to weaker eigenvectors.
23 Kernel PCA Results. Use the KPCA coefficients to train a linear SVM classifier that recognizes chairs from their images, with various polynomial kernel degrees, where degree 1 = linear, as in regular PCA.
24 Kernel PCA Results. Use the KPCA coefficients to train a linear SVM classifier that recognizes characters from their images, with various polynomial kernel degrees, where degree 1 = linear, as in regular PCA (the worst case in these experiments). Inferior performance to nonlinear SVMs (why??).
25 Semidefinite Embedding. Also known as Maximum Variance Unfolding. Similar to LLE and kernel PCA. Like LLE, it maintains only the distances within each neighborhood. Stretch all the data while maintaining those distances, then apply PCA (or KPCA).
26 Semidefinite Embedding. To visualize high-dimensional data $\{x_1, \ldots, x_N\}$:
PCA and Kernel PCA (Schölkopf et al.):
- Get the matrix A of affinities between pairs, $A_{ij} = k(x_i, x_j)$
- SVD A and view the top projections
Semidefinite Embedding (Weinberger & Saul):
- Get the k-nearest-neighbors graph of the data
- Get the matrix A
- Use a max-trace SDP to stretch graph A into a positive definite graph K
- SVD K and view the top projections
27 Semidefinite Embedding. SDE unfolds (pulls apart) the kNN-connected graph C but preserves pairwise distances where $C_{ij} = 1$: $\max_{K \in \kappa} \sum_i \lambda_i$ with $\kappa = \{ K \in R^{N \times N} \; \mathrm{s.t.} \; K \succeq 0, \; \sum_{ij} K_{ij} = 0, \; K_{ii} + K_{jj} - K_{ij} - K_{ji} = A_{ii} + A_{jj} - A_{ij} - A_{ji} \; \mathrm{if} \; C_{ij} = 1 \}$. SDE's stretching of the graph improves the visualization.
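This program translates almost verbatim into a convex-optimization modeling layer. Below is a hedged sketch in cvxpy, a Python analogue of the YALMIP package used on the following slides; it is practical only for small N, since the variable is an N x N matrix, and the names are illustrative:

```python
import cvxpy as cp
import numpy as np

def sde(A, C):
    """A: (N, N) Gram matrix of inputs; C: boolean kNN connectivity matrix."""
    N = A.shape[0]
    K = cp.Variable((N, N), PSD=True)          # K >= 0 (positive semidefinite)
    cons = [cp.sum(K) == 0]                    # centering: sum_ij K_ij = 0
    for i in range(N):
        for j in range(i + 1, N):
            if C[i, j]:                        # preserve connected pairwise distances
                cons.append(K[i, i] + K[j, j] - 2 * K[i, j]
                            == A[i, i] + A[j, j] - 2 * A[i, j])
    # maximize sum of eigenvalues = tr(K)
    cp.Problem(cp.Maximize(cp.trace(K)), cons).solve()
    return K.value   # then eigendecompose K and view the top projections
```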
28 SDE Optimization with YALMIP. Linear Programming ⊂ Quadratic Programming ⊂ Quadratically Constrained Quadratic Programming ⊂ Semidefinite Programming ⊂ Convex Programming ⊂ Polynomial-Time Algorithms: LP ⊂ QP ⊂ QCQP ⊂ SDP ⊂ CP ⊂ P.
29 SDE Optimization with YALMIP.
LP (fast, $O(N^3)$): $\min_x b^T x$ s.t. $c_i^T x \geq \alpha_i \;\; \forall i$.
QP: $\min_x \frac{1}{2} x^T H x + b^T x$ s.t. $c_i^T x \geq \alpha_i \;\; \forall i$.
QCQP: $\min_x \frac{1}{2} x^T H x + b^T x$ s.t. $c_i^T x \geq \alpha_i \;\; \forall i$, $x^T x \leq \eta$.
SDP: $\min_K \mathrm{tr}(B K)$ s.t. $\mathrm{tr}(C_i^T K) \geq \alpha_i \;\; \forall i$, $K \succeq 0$.
CP (slow, $O(N^3)$): $\min_x f(x)$ s.t. $g(x) \leq \alpha$.
ALL of the above are in the YALMIP package for Matlab! Google it, download and install!
30 SDE Results. SDE unfolds (pulls apart) the kNN-connected graph C. SDE's stretching of the graph improves the visualization here.
31 SDE Results. SDE unfolds (pulls apart) the kNN-connected graph C before doing PCA, and gets more use, or energy, out of the top eigenvectors than PCA does. SDE's stretching of the graph improves the visualization here.
32 SDE Problems, and MVE. But SDE's stretching can worsen the visualization! Spokes experiment: PCA vs. SDE. We want to pull apart only in the visualized dimensions and flatten down the remaining ones.
33 Minimum Volume Embedding. To reduce dimension, drive the energy into the top (e.g. 2) dimensions. Maximize the fidelity, the % energy in the top dimensions: $F(K) = \frac{\lambda_1 + \lambda_2}{\sum_i \lambda_i}$, e.g. $F(K) = 0.98$ vs. $F(K) = 0.29$. Equivalent to maximizing $\lambda_1 + \lambda_2 - \beta \sum_i \lambda_i$ for some $\beta$; assume $\beta = 1/2$.
34-35 Minimum Volume Embedding. Stretch in the top d < D dimensions and squash the rest: $\max_{K \in \kappa} \sum_{i=1}^d \lambda_i - \sum_{i=d+1}^D \lambda_i$. This is the simplest linear-spectral SDP, with $\alpha = ( \alpha_1, \ldots, \alpha_d, \alpha_{d+1}, \ldots, \alpha_D ) = ( 1, \ldots, 1, -1, \ldots, -1 )$. It effectively maximizes the eigengap between the d-th and (d+1)-th $\lambda$. Use a variational bound on the cost: an iterated, monotonic SDP. Lock V and solve the SDP for K; lock K and solve the SVD for V.
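The alternation just described ("lock V, solve SDP for K; lock K, solve SVD for V") can be sketched as follows, reusing the same cvxpy-style constraints as the SDE sketch above. This is a hedged illustration of the scheme, not the course's implementation; initialization with K = A and 5 iterations are assumptions:

```python
import cvxpy as cp
import numpy as np

def mve(A, C, d=2, iters=5):
    """A: (N, N) Gram matrix; C: boolean kNN connectivity; d: target dimension."""
    N = A.shape[0]
    K_val = A.copy()                           # initialize K = A
    alpha = np.concatenate([np.ones(d), -np.ones(N - d)])
    for _ in range(iters):
        _, V = np.linalg.eigh(K_val)           # eigenvectors, ascending eigenvalues
        V = V[:, ::-1]                         # descending, to match alpha's ordering
        B = (V * alpha) @ V.T                  # B = sum_i alpha_i v_i v_i^T
        K = cp.Variable((N, N), PSD=True)
        cons = [cp.sum(K) == 0]
        for i in range(N):
            for j in range(i + 1, N):
                if C[i, j]:                    # same distance constraints as SDE
                    cons.append(K[i, i] + K[j, j] - 2 * K[i, j]
                                == A[i, i] + A[j, j] - 2 * A[i, j])
        cp.Problem(cp.Maximize(cp.trace(B @ K)), cons).solve()
        K_val = K.value                        # monotonic improvement per slide 37
    return K_val
```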
36 MVE Optimization: Spectral SDP. A spectral function $f(\lambda_1, \ldots, \lambda_D)$ operates on the eigenvalues of a matrix K. SDP packages use restricted cost functions over the hull $\kappa$: trace SDPs, $\max_{K \in \kappa} \mathrm{tr}(BK)$; log-det SDPs, $\max_{K \in \kappa} \sum_i \log \lambda_i$. Consider richer SDPs (assume the $\lambda_i$ are in decreasing order): linear-spectral SDPs, $\max_{K \in \kappa} \sum_i \alpha_i \lambda_i$. Problem: the last one doesn't fit nicely into standard SDP code (like YALMIP or CSDP). We need an iterative algorithm.
37 MVE Optimization: Spectral SDP. Let $g(K) = \sum_i \alpha_i \lambda_i$. If the alphas are ordered we get a variational SDP problem (a Procrustes problem). Using $K v_i = \lambda_i v_i$ and $v_i^T v_j = \delta_{ij}$ with $\lambda_i \geq \lambda_{i+1}$: $\max_{K \in \kappa} g(K) = \max_{K \in \kappa} \sum_i \alpha_i \lambda_i = \max_{K \in \kappa} \sum_i \alpha_i \lambda_i \, \mathrm{tr}( v_i v_i^T ) = \max_{K \in \kappa} \sum_i \alpha_i \, \mathrm{tr}( K v_i v_i^T ) = \max_{K \in \kappa} \mathrm{tr}\left( K \sum_i \alpha_i v_i v_i^T \right) = \max_{K \in \kappa} \max_U \mathrm{tr}\left( K \sum_i \alpha_i u_i u_i^T \right)$ s.t. $u_i^T u_j = \delta_{ij}$, provided $\alpha_i \geq \alpha_{i+1}$ (with increasing alphas the inner problem becomes a min over U instead). For the max over K use an SDP; for the max over U use an SVD. Iterate to obtain monotonic improvement.
38 MVE Optimization: Spectral SDP. Theorem: if the alphas are decreasing, the objective $g(K) = \sum_i \alpha_i \lambda_i$ with $\alpha_i \geq \alpha_{i+1}$ is convex. Proof: recall from (Overton & Womersley '91), (Fan '49) and (Bach & Jordan '03) that the sum of the d top eigenvalues of a p.d. matrix is convex: $f_d(K) = \sum_{i=1}^d \lambda_i$ is convex. Our linear-spectral cost is a combination of these: $g(K) = \alpha_D f_D(K) + \sum_{i=1}^{D-1} ( \alpha_i - \alpha_{i+1} ) f_i(K) = \alpha_D \, \mathrm{tr}(K) + \sum_{i=1}^{D-1} ( \alpha_i - \alpha_{i+1} ) f_i(K)$. A trace (linear) term plus a conic combination of convex functions is convex.
39 MVE Pseudocode Download at
40 MVE Results. Spokes experiment: visualization and spectra. Converges in ~5 iterations.
41 MVE Results. Swiss roll visualization (connectivity via kNN; d is set to 2): PCA, SDE, MVE. Same convergence under random initialization or K = A.
42 kNN Embedding with MVE. MVE does better even with kNN connectivity. Face images with spectra: PCA, SDE, MVE.
43 kNN Embedding with MVE. MVE does better even with kNN. Digit images visualization with spectra: PCA, SDE, MVE.
44 Graph Embedding with MVE. Collaboration network with spectra: PCA, SDE, MVE.
45 MVE Results. Evaluate the fidelity, the % energy in the top 2 dimensions:

             kPCA    SDE     MVE
Star         95.0%   29.9%   100.0%
Swiss Roll   45.8%   99.9%    99.9%
Twos         18.4%   88.4%    97.8%
Faces        31.4%   83.6%    99.2%
Network       5.9%   95.3%    99.2%
46 MVE for Spanning Trees. Instead of kNN, use a maximum-weight spanning tree to connect the points (Kruskal's algorithm: connect points with short edges first, skip edges that create loops, stop when we have a tree). Tree connectivity can fold under SDE or MVE. Add constraints on all pairs to keep all distances from shrinking (call this SDE-FULL or MVE-FULL): keep $K \in \kappa$ as before, plus $K_{ii} + K_{jj} - K_{ij} - K_{ji} \geq A_{ii} + A_{jj} - A_{ij} - A_{ji}$ for all pairs.
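The Kruskal construction in parentheses is easy to write out. A minimal sketch on a distance matrix (union-find and all names here are illustrative):

```python
import numpy as np

def kruskal_tree(D):
    """D: (N, N) symmetric distance matrix. Returns the list of tree edges."""
    N = D.shape[0]
    parent = list(range(N))
    def find(i):                                 # union-find with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    edges = sorted((D[i, j], i, j) for i in range(N) for j in range(i + 1, N))
    tree = []
    for _, i, j in edges:                        # connect short edges first
        ri, rj = find(i), find(j)
        if ri != rj:                             # skip edges that create loops
            parent[ri] = rj
            tree.append((i, j))
        if len(tree) == N - 1:                   # stop when we have a tree
            break
    return tree
```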
47 MVE for Spanning Trees. Tree connectivity with degree = 2. Top: taxonomy tree of 30 species of salamanders. Bottom: taxonomy tree of 56 species of crustaceans.