Unsupervised dimensionality reduction


Unsupervised dimensionality reduction
Guillaume Obozinski
École des Ponts ParisTech
SOCN course 2014
Guillaume Obozinski Unsupervised dimensionality reduction 1/30
Outline
1. PCA
2. Kernel PCA
3. Multidimensional scaling
4. Laplacian Eigenmaps
5. Locally Linear Embedding
PCA
A direction that maximizes the variance

Data are points in $\mathbb{R}^d$. Looking for a direction $v \in \mathbb{R}^d$ such that the variance of the signals projected on $v$ is maximized:
$$\mathrm{Var}\big((v^\top x_i)_{i=1,\dots,n}\big) = \frac{1}{n}\sum_{i=1}^n (v^\top x_i)^2 = \frac{1}{n}\sum_{i=1}^n v^\top x_i x_i^\top v = v^\top \Big(\frac{1}{n}\sum_{i=1}^n x_i x_i^\top\Big) v = v^\top \Sigma\, v$$
Need to solve
$$\max_{\|v\|_2 = 1} v^\top \Sigma\, v.$$
Solution: the eigenvector associated with the largest eigenvalue of $\Sigma$.
Principal directions

Assume the design matrix is centered, i.e. $X^\top \mathbf{1} = 0$.

Principal directions as eigenvectors of the covariance: consider the eigenvalue decomposition of $\Sigma = X^\top X$:
$$\Sigma = V S^2 V^\top \quad \text{with } S = \mathrm{Diag}(s_1, \dots, s_d) \text{ and } s_1 \geq \dots \geq s_d.$$
The principal directions are the columns of $V$.

Principal directions as singular vectors of the design matrix: consider the singular value decomposition of $X$:
$$X = U S V^\top.$$
The principal directions are the right singular vectors.
Principal components

Obtained by projecting the rows of $X$ on $V$. But
$$X V = U S V^\top V = U S.$$
So the principal components are obtained from the left singular vectors and the singular values.
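The identity $XV = US$ above can be checked numerically. A minimal NumPy sketch (variable names such as `X`, `components` are mine, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
X = X - X.mean(axis=0)                  # center the design matrix: X^T 1 = 0

# Principal directions: right singular vectors of X
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Same directions arise as eigenvectors of Sigma = X^T X (eigenvalues s^2)
evals, _ = np.linalg.eigh(X.T @ X)
assert np.allclose(np.sort(evals)[::-1], s**2)

# Principal components: project rows of X on V, i.e. X V = U S
components = X @ Vt.T
assert np.allclose(components, U * s)
```

Either route (eigendecomposition of $X^\top X$ or SVD of $X$) yields the same subspaces; the SVD is usually preferred for numerical stability.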
Kernel PCA
Centering implicitly in feature space

Assume that we use a mapping $\varphi$ so that the representation of the data is the design matrix
$$\Phi = \begin{pmatrix} \varphi(x_1)^\top \\ \vdots \\ \varphi(x_n)^\top \end{pmatrix}, \qquad \bar{\varphi} = \frac{1}{n}\sum_{i=1}^n \varphi(x_i) = \frac{1}{n}\Phi^\top \mathbf{1}.$$
So that if $\tilde{\varphi}(x_i) = \varphi(x_i) - \bar{\varphi}$, then
$$\tilde{\Phi} = \Phi - \frac{1}{n}\mathbf{1}\mathbf{1}^\top \Phi = \Big(I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^\top\Big)\Phi.$$
Finally, the centered kernel matrix is computed as
$$\tilde{K} = \tilde{\Phi}\tilde{\Phi}^\top = H K H \qquad \text{with } H = I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^\top.$$
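For a linear kernel the identity $\tilde K = HKH$ can be verified directly against explicit centering. A small sketch (the matrix `Phi` stands in for the feature matrix $\Phi$; all names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.normal(size=(20, 3))           # stand-in for the feature matrix Phi
n = Phi.shape[0]

K = Phi @ Phi.T                          # kernel matrix K = Phi Phi^T
H = np.eye(n) - np.ones((n, n)) / n      # centering matrix H = I_n - (1/n) 1 1^T

K_tilde = H @ K @ H                      # centered kernel K~ = H K H

# agrees with the kernel of the explicitly centered features
Phi_tilde = Phi - Phi.mean(axis=0)
assert np.allclose(K_tilde, Phi_tilde @ Phi_tilde.T)

# rows and columns of K~ sum to zero
assert np.allclose(K_tilde.sum(axis=0), 0.0)
```

The point of the formula is that $HKH$ needs only the kernel matrix, never the (possibly infinite-dimensional) features themselves.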
Principal function in an RKHS (Schölkopf et al., 1998)

Find a function $f$ with $\|f\|_{\mathcal{H}} = 1$ that maximizes $\displaystyle\sum_{i=1}^n \langle f, K(x_i, \cdot) \rangle_{\mathcal{H}}^2$. Equivalently,
$$\max_f \sum_{i=1}^n f(x_i)^2 \quad \text{s.t.} \quad \|f\|_{\mathcal{H}}^2 \leq 1.$$
By the representer theorem, $f(x) = \sum_{j=1}^n \alpha_j K(x, x_j)$. So
$$\sum_{i=1}^n f(x_i)^2 = \sum_{i=1}^n \Big(\sum_{j=1}^n \alpha_j K(x_i, x_j)\Big)^2 = \sum_{j,j'=1}^n \alpha_j \alpha_{j'} \sum_{i=1}^n K(x_i, x_j)\, K(x_i, x_{j'}) = \alpha^\top K K \alpha.$$
So the problem can be written as
$$\max_\alpha\; \alpha^\top K K \alpha \quad \text{s.t.} \quad \alpha^\top K \alpha \leq 1.$$
Solution of kernel PCA

Write $K = U S^2 U^\top$. If $\beta = U^\top \alpha$, then the problem is formulated as
$$\max_\beta \sum_i \beta_i^2 s_i^4 \quad \text{s.t.} \quad \sum_i \beta_i^2 s_i^2 \leq 1.$$
This is attained for $\beta = \big(\tfrac{1}{s_1}, 0, \dots, 0\big)$ and thus $\alpha = \tfrac{1}{s_1} u_1$. So the first principal function is
$$f(x) = \frac{1}{s_1}\sum_{i=1}^n U_{i1}\, K(x_i, x),$$
and the $k$th principal function is
$$f(x) = \frac{1}{s_k}\sum_{i=1}^n U_{ik}\, K(x_i, x).$$
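As a sanity check on this solution, note that evaluating the $k$th principal function on the training points gives $K\alpha_k = s_k u_k$, and for a linear kernel these scores coincide (up to sign) with ordinary PCA's principal components. A sketch under that assumption (all variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
n = X.shape[0]
H = np.eye(n) - np.ones((n, n)) / n
K = H @ (X @ X.T) @ H                    # centered linear kernel

# K = U S^2 U^T (np.linalg.eigh returns ascending order, so flip)
s2, U = np.linalg.eigh(K)
s2, U = s2[::-1], U[:, ::-1]
s = np.sqrt(np.clip(s2, 0.0, None))

# alpha_1 = u_1 / s_1; the scores on the training set are K alpha_1 = s_1 u_1
alpha1 = U[:, 0] / s[0]
scores = K @ alpha1
assert np.allclose(scores, s[0] * U[:, 0])

# for a linear kernel this matches ordinary PCA's first component (up to sign)
Xc = X - X.mean(axis=0)
pc1 = Xc @ np.linalg.svd(Xc, full_matrices=False)[2][0]
assert np.allclose(np.abs(scores), np.abs(pc1))
```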
Multidimensional scaling
Multidimensional scaling

Goal: given a collection of not necessarily Euclidean distances $\delta_{ij}$ between pairs of points indexed by $\{1, \dots, n\}$, construct a collection of points $y_i$ in a Euclidean space such that
$$\|y_i - y_j\|_2 \approx \delta_{ij}.$$
Original formulation: minimize a function called the stress function
$$\min_Y \sum_{ij} \big(\|y_i - y_j\|_2 - \delta_{ij}\big)^2.$$
Classical formulation:
$$\min_Y \sum_{ij} \big(\|y_i - y_j\|_2^2 - \delta_{ij}^2\big)^2.$$
Centered kernel matrix from a Euclidean distance matrix

Lemma: If $D_2 = \big(d_{ij}^2\big)_{1 \leq i,j \leq n}$ is a matrix of squared Euclidean distances, then
$$\tilde{K} = -\tfrac{1}{2} H D_2 H \qquad \text{with } H = I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top$$
is the corresponding centered kernel matrix.

Proof: $d_{ij}^2 = \|\varphi(x_i) - \varphi(x_j)\|_2^2 = K_{ii} + K_{jj} - 2K_{ij}$. With $\kappa = (K_{11}, \dots, K_{nn})^\top$, we have $2K = \kappa \mathbf{1}^\top + \mathbf{1}\kappa^\top - D_2$, hence
$$\tilde{K} = H K H = \tfrac{1}{2} H\big(\kappa \mathbf{1}^\top + \mathbf{1}\kappa^\top - D_2\big) H = -\tfrac{1}{2} H D_2 H,$$
since $H\mathbf{1} = 0$.
Classical MDS algorithm

Algorithm:
1. Compute $\tilde{K} = -\tfrac{1}{2} H D_2 H$.
2. Remove negative eigenvalues from $\tilde{K}$.
3. Solve kernel PCA on $\tilde{K}$.

If $D_2$ contains squared Euclidean distances, step 2 is unnecessary and it can be shown that this solves the classical MDS problem.
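The three steps above can be sketched in a few lines of NumPy. For genuinely Euclidean distances the embedding reproduces the original distances exactly (up to rotation and translation); the names `D2`, `Y_true` etc. are mine:

```python
import numpy as np

rng = np.random.default_rng(0)
Y_true = rng.normal(size=(10, 2))                # ground-truth 2-D points
D2 = ((Y_true[:, None, :] - Y_true[None, :, :]) ** 2).sum(-1)  # squared distances

n = D2.shape[0]
H = np.eye(n) - np.ones((n, n)) / n
K = -0.5 * H @ D2 @ H                            # step 1: centered kernel

# steps 2-3: eigendecompose, clip negatives (a no-op for Euclidean D2),
# and form the principal components
evals, evecs = np.linalg.eigh(K)
evals, evecs = evals[::-1], evecs[:, ::-1]       # descending order
d = 2
Y = evecs[:, :d] * np.sqrt(np.clip(evals[:d], 0.0, None))

# the embedding reproduces the original squared distances
D2_rec = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
assert np.allclose(D2_rec, D2)
```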
Isomap (Tenenbaum et al., 2000)

Algorithm:
1. Compute a k-NN graph on the data.
2. Compute geodesic distances on the k-NN graph using the $\ell_2$ distance on each edge.
3. Apply classical MDS to the obtained distances.

Remarks:
- Isomap assumes that we can rely on the $\ell_2$ distance locally; it will fail if there are too many noise dimensions.
- Geodesic distances can be computed with e.g. the Floyd–Warshall algorithm.
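Steps 1 and 2 (graph construction plus Floyd–Warshall shortest paths) can be sketched as follows; the helper name `knn_graph_geodesics` and the toy data are mine, not from the slides:

```python
import numpy as np

def knn_graph_geodesics(X, k):
    """Geodesic distances on a k-NN graph with l2 edge lengths (Floyd-Warshall)."""
    n = len(X)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        nbrs = np.argsort(D[i])[1:k + 1]         # k nearest neighbours (skip self)
        G[i, nbrs] = D[i, nbrs]
        G[nbrs, i] = D[i, nbrs]                  # symmetrize the graph
    for m in range(n):                           # Floyd-Warshall relaxation
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    return G

# six points on a segment: graph geodesics recover the full length
X = np.linspace(0.0, 1.0, 6).reshape(-1, 1)
G = knn_graph_geodesics(X, k=2)
assert np.isclose(G[0, 5], 1.0)
```

On curved manifolds (the classic "Swiss roll") the graph geodesic differs from the ambient $\ell_2$ distance, which is exactly what Isomap exploits.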
Laplacian Eigenmaps
Graph Laplacians

Assume a similarity matrix $W$ is available on the data, e.g. a kernel matrix such as $W = (w_{ij})_{1 \leq i,j \leq n}$ with
$$w_{ij} = \exp\Big(-\frac{1}{h}\|x_i - x_j\|_2^2\Big).$$
We can think of $W$ as defining a weighted graph on the data. We say that a function $f$ is smooth on the weighted graph if
$$L(f) := \frac{1}{2}\sum_{ij} w_{ij}\big(f(x_i) - f(x_j)\big)^2$$
is small. Likewise, we say that a vector $\mathbf{f}$ is smooth on the weighted graph if
$$L(\mathbf{f}) := \frac{1}{2}\sum_{ij} w_{ij}(f_i - f_j)^2$$
is small.
Laplacian and normalized Laplacian matrices

Define $D = \mathrm{Diag}(d)$ with $d_i = \sum_j w_{ij}$. We then have
$$L(\mathbf{f}) = \frac{1}{2}\sum_{ij} w_{ij}(f_i - f_j)^2 = \frac{1}{2}\Big(\sum_{ij} w_{ij} f_i^2 - 2\sum_{ij} w_{ij} f_i f_j + \sum_{ij} w_{ij} f_j^2\Big) = \sum_i d_i f_i^2 - \sum_{ij} w_{ij} f_i f_j = \mathbf{f}^\top D \mathbf{f} - \mathbf{f}^\top W \mathbf{f} = \mathbf{f}^\top L \mathbf{f}.$$
Laplacian matrix: $L = D - W$.
Normalized Laplacian matrix:
$$\mathcal{L} := D^{-\frac{1}{2}} L D^{-\frac{1}{2}} = I - D^{-\frac{1}{2}} W D^{-\frac{1}{2}}.$$
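The quadratic-form identity $L(\mathbf{f}) = \mathbf{f}^\top L \mathbf{f}$ is easy to verify numerically. A small sketch with Gaussian similarities (bandwidth `h` and all other names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
h = 1.0
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq / h)                  # Gaussian similarities w_ij
np.fill_diagonal(W, 0.0)             # no self-loops (they would cancel anyway)

d = W.sum(axis=1)
D = np.diag(d)
L = D - W                            # Laplacian matrix

# f^T L f equals the smoothness penalty (1/2) sum_ij w_ij (f_i - f_j)^2
f = rng.normal(size=8)
smoothness = 0.5 * (W * (f[:, None] - f[None, :]) ** 2).sum()
assert np.isclose(f @ L @ f, smoothness)

# normalized Laplacian: I - D^{-1/2} W D^{-1/2}
Lnorm = np.eye(8) - (W / np.sqrt(d)[:, None]) / np.sqrt(d)[None, :]
assert np.allclose(Lnorm, np.diag(d ** -0.5) @ L @ np.diag(d ** -0.5))
```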
Laplacian embeddings (Belkin and Niyogi, 2001)

Principle: given a weight matrix $W$, find an embedding $y_i \in \mathbb{R}^K$ for point $i$ that, under scaling and centering constraints on the $y_i$, solves
$$\min_Y \sum_{ij} w_{ij}\, \|y_i - y_j\|_2^2 \qquad \text{with } Y = [y_1 \dots y_n]^\top.$$
We have
$$\sum_{ij} w_{ij}\, \|y_i - y_j\|_2^2 = \sum_{k=1}^K \sum_{ij} w_{ij} (Y_{ik} - Y_{jk})^2 = 2\sum_{k=1}^K Y_{\cdot k}^\top L\, Y_{\cdot k} = 2\,\mathrm{tr}(Y^\top L\, Y).$$
Laplacian embedding formulation

$$\min_Y \mathrm{tr}(Y^\top L\, Y) \quad \text{s.t.} \quad Y^\top D\, Y = I, \quad Y^\top D\, \mathbf{1} = 0.$$
With the change of variable $\tilde{Y} = D^{\frac{1}{2}} Y$, $\tilde{Y}$ solves
$$\min_{\tilde{Y}} \mathrm{tr}(\tilde{Y}^\top \mathcal{L}\, \tilde{Y}) \quad \text{s.t.} \quad \tilde{Y}^\top \tilde{Y} = I, \quad \tilde{Y}^\top D^{\frac{1}{2}}\mathbf{1} = 0.$$
But $\mathcal{L}\, D^{\frac{1}{2}}\mathbf{1} = 0$, so the columns of $\tilde{Y}$ are the eigenvectors of $\mathcal{L}$ associated with the smallest eigenvalues, except for the one associated with $D^{\frac{1}{2}}\mathbf{1}$. Equivalently, the columns of $Y$ are the solutions of the generalized eigenvalue problem $L u = \lambda D u$ for the smallest generalized eigenvalues, except for the eigenvector $\mathbf{1}$. The rows of the obtained matrix form the embedding.
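The generalized eigenvalue formulation can be sketched directly with `scipy.linalg.eigh`, which solves $Lu = \lambda D u$ and normalizes eigenvectors so that $U^\top D\, U = I$; the fully connected toy graph and names are mine:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq)                          # fully connected similarity graph
np.fill_diagonal(W, 0.0)
D = np.diag(W.sum(axis=1))
L = D - W

# generalized eigenproblem L u = lambda D u (eigenvalues ascending)
lams, U = eigh(L, D)

# lambda_1 = 0 with constant eigenvector 1: discard it, keep the next K
assert np.isclose(lams[0], 0.0, atol=1e-8)
K = 2
Y = U[:, 1:K + 1]                        # rows of Y give the K-dimensional embedding

# the scaling constraint Y^T D Y = I holds by eigh's normalization
assert np.allclose(Y.T @ D @ Y, np.eye(K), atol=1e-8)
```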
Locally Linear Embedding
Locally Linear Embedding (Roweis and Saul, 2000)

Let $x_1, \dots, x_n$ be a collection of vectors in $\mathbb{R}^p$.
1. Construct a k-NN graph.
2. Approximate $x_i$ by a linear combination of its neighbors, by finding the vector of weights solving the constrained linear regression
$$\min_{w_i} \Big\|x_i - \sum_{j \in N(i)} w_{ij}\, x_j\Big\|_2^2 \quad \text{with } \sum_{j \in N(i)} w_{ij} = 1.$$
3. Set $w_{ij} = 0$ for $j \notin N(i)$.
4. Find a centered set of points $y_i$ in $\mathbb{R}^d$ with white covariance which minimizes
$$\sum_{i=1}^n \Big\|y_i - \sum_{j=1}^n w_{ij}\, y_j\Big\|_2^2.$$
LLE step 2: constrained regressions

$$\min_{w_i} \Big\|x_i - \sum_{j \in N(i)} w_{ij}\, x_j\Big\|_2^2 \quad \text{with } \sum_{j \in N(i)} w_{ij} = 1.$$
But if $\sum_{j \in N(i)} w_{ij} = 1$, then
$$\Big\|x_i - \sum_{j \in N(i)} w_{ij}\, x_j\Big\|^2 = \Big\|\sum_{j \in N(i)} w_{ij}(x_i - x_j)\Big\|^2 = \sum_{j,k \in N(i)} w_{ij} w_{ik} \underbrace{(x_i - x_j)^\top (x_i - x_k)}_{K^{(i)}_{jk}}.$$
Need to solve
$$\min_u \tfrac{1}{2}\, u^\top K u \quad \text{s.t.} \quad u^\top \mathbf{1} = 1.$$
$$\mathcal{L}(u, \lambda) = \tfrac{1}{2}\, u^\top K u - \lambda(u^\top \mathbf{1} - 1) \quad \text{and} \quad \nabla_u \mathcal{L} = 0 \;\Leftrightarrow\; K u = \lambda \mathbf{1}.$$
Solved for $u = \dfrac{K^{-1}\mathbf{1}}{\mathbf{1}^\top K^{-1}\mathbf{1}}$.
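The closed form $u = K^{-1}\mathbf{1} / (\mathbf{1}^\top K^{-1}\mathbf{1})$ translates into a short solver. Note that when $x_i$ lies in the affine span of its neighbours, $K^{(i)}$ is singular, so a small ridge term is added in practice; the helper name `lle_weights` and the regularization constant are mine:

```python
import numpy as np

def lle_weights(xi, neighbors, reg=1e-10):
    """Weights minimizing ||x_i - sum_j w_ij x_j||^2 subject to sum_j w_ij = 1."""
    Z = xi - neighbors                            # rows: x_i - x_j for j in N(i)
    K = Z @ Z.T                                   # local Gram matrix K^{(i)}
    K = K + reg * np.trace(K) * np.eye(len(K))    # ridge: K can be singular
    u = np.linalg.solve(K, np.ones(len(K)))       # solve K u = 1
    return u / u.sum()                            # normalize: u^T 1 = 1

# a point in the affine span of its neighbours is reconstructed (almost) exactly
neighbors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
xi = np.array([0.25, 0.25])
w = lle_weights(xi, neighbors)
assert np.isclose(w.sum(), 1.0)
assert np.linalg.norm(w @ neighbors - xi) < 1e-4
```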
LLE step 4: final optimization problem

$$\min_{y_1, \dots, y_n} \sum_{i=1}^n \Big\|y_i - \sum_{j=1}^n w_{ij}\, y_j\Big\|_2^2 \quad \text{s.t.} \quad \sum_{i=1}^n y_i = 0, \quad \frac{1}{n}\sum_{i=1}^n y_i y_i^\top = I_d.$$
Equivalently, denoting $Y = [y_1 \dots y_n]^\top$, we have to solve
$$\min_Y \|Y - WY\|_F^2 \quad \text{s.t.} \quad \mathbf{1}^\top Y = 0, \quad \frac{1}{n}\, Y^\top Y = I_d.$$
Or
$$\min_Y \mathrm{tr}\big(Y^\top (I_n - W)^\top (I_n - W)\, Y\big) \quad \text{s.t.} \quad \mathbf{1}^\top Y = 0, \quad \frac{1}{n}\, Y^\top Y = I_d.$$
So the columns of $\frac{1}{\sqrt{n}}\, Y$ are the $d$ eigenvectors associated with the smallest $d$ nonzero eigenvalues of $(I_n - W)^\top (I_n - W)$.
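This final eigenproblem can be sketched with a toy weight matrix. Since the rows of $W$ sum to 1, the constant vector is an eigenvector of $M = (I - W)^\top(I - W)$ with eigenvalue 0 and is discarded; the random row-stochastic `W` stands in for real LLE reconstruction weights:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10, 2

# toy reconstruction weights: nonnegative, zero diagonal, rows summing to 1
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W = W / W.sum(axis=1, keepdims=True)

M = (np.eye(n) - W).T @ (np.eye(n) - W)
lams, U = np.linalg.eigh(M)              # ascending eigenvalues

# W 1 = 1, so the constant vector has eigenvalue 0 and is dropped
assert np.isclose(lams[0], 0.0, atol=1e-8)
Y = np.sqrt(n) * U[:, 1:d + 1]           # rows y_i; columns of Y/sqrt(n) orthonormal

# the centering and whitening constraints hold by construction
assert np.allclose(Y.T @ np.ones(n), 0.0, atol=1e-8)
assert np.allclose(Y.T @ Y / n, np.eye(d))
```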
References

Belkin, M. and Niyogi, P. (2001). Laplacian eigenmaps and spectral techniques for embedding and clustering. In NIPS, volume 14.
Roweis, S. T. and Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500).
Schölkopf, B., Smola, A., and Müller, K.-R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5).
Tenenbaum, J. B., De Silva, V., and Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500).
More informationVectors and Matrices Statistics with Vectors and Matrices
Vectors and Matrices Statistics with Vectors and Matrices Lecture 3 September 7, 005 Analysis Lecture #39/7/005 Slide 1 of 55 Today s Lecture Vectors and Matrices (Supplement A  augmented with SAS proc
More informationNumerical Methods I Singular Value Decomposition
Numerical Methods I Singular Value Decomposition Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATHGA 2011.003 / CSCIGA 2945.003, Fall 2014 October 9th, 2014 A. Donev (Courant Institute)
More informationNonlinear Component Analysis Based on Correntropy
onlinear Component Analysis Based on Correntropy JianWu Xu, Puskal P. Pokharel, António R. C. Paiva and José C. Príncipe Abstract In this paper, we propose a new nonlinear principal component analysis
More informationPrincipal Component Analysis
B: Chapter 1 HTF: Chapter 1.5 Principal Component Analysis Barnabás Póczos University of Alberta Nov, 009 Contents Motivation PCA algorithms Applications Face recognition Facial expression recognition
More informationRandom Sampling of Bandlimited Signals on Graphs
Random Sampling of Bandlimited Signals on Graphs Pierre Vandergheynst École Polytechnique Fédérale de Lausanne (EPFL) School of Engineering & School of Computer and Communication Sciences Joint work with
More informationPrincipal component analysis (PCA) for clustering gene expression data
Principal component analysis (PCA) for clustering gene expression data Ka Yee Yeung Walter L. Ruzzo Bioinformatics, v17 #9 (2001) pp 763774 1 Outline of talk Background and motivation Design of our empirical
More informationLecture 4: Principal Component Analysis and Linear Dimension Reduction
Lecture 4: Principal Component Analysis and Linear Dimension Reduction Advanced Applied Multivariate Analysis STAT 2221, Fall 2013 Sungkyu Jung Department of Statistics University of Pittsburgh Email:
More informationKernel Methods. Barnabás Póczos
Kernel Methods Barnabás Póczos Outline Quick Introduction Feature space Perceptron in the feature space Kernels Mercer s theorem Finite domain Arbitrary domain Kernel families Constructing new kernels
More informationPerformance Evaluation of Nonlinear Dimensionality Reduction Methods on the BANCA Database
Performance Evaluation of Nonlinear Dimensionality Reduction Methods on the BANCA Database CMPE 544 Pattern Recognition  Term Project Report Bogazici University Hasan Faik ALAN January 5, 2013 Abstract
More informationProperties of Matrices and Operations on Matrices
Properties of Matrices and Operations on Matrices A common data structure for statistical analysis is a rectangular array or matris. Rows represent individual observational units, or just observations,
More information7. Variable extraction and dimensionality reduction
7. Variable extraction and dimensionality reduction The goal of the variable selection in the preceding chapter was to find least useful variables so that it would be possible to reduce the dimensionality
More information