Dimensionality Reduction


Dimensionality Reduction
Sanjiv Kumar, Google Research, NY
EECS-6898, Columbia University - Fall, 2010

Sanjiv Kumar 11/16/2010 EECS6898 Large Scale Machine Learning 1

Curse of Dimensionality

Many learning techniques scale poorly with data dimensionality (d):
- Density estimation: for example, Gaussian Mixture Models (GMM) need to estimate covariance matrices, O(d^2)
- Nearest neighbor search: O(d); also, the performance of trees and hashes suffers with high dimensionality
- Optimization techniques: first-order methods scale as O(d) while second-order methods scale as O(d^2)
- Clustering, classification, regression, ...
- Data visualization: hard to do in high-dimensional spaces

Dimensionality Reduction
Key Idea: data dimensions in the input space may be statistically dependent, so it is possible to retain most of the information of the input space in a lower-dimensional space.

Dimensionality Reduction

50 x 50 pixel faces live in R^2500, as do 50 x 50 pixel random images.
The space of face images is significantly smaller than 256^2500.
Want to recover the underlying low-dimensional space!

Dimensionality Reduction

Linear Techniques: PCA, Metric MDS, Randomized projections, ...
- Assume data lies in a subspace
- Work well in practice in many cases
- Can be a poor approximation for some data

Nonlinear Techniques: manifold learning methods - Kernel PCA, LLE, ISOMAP, ...
- Assume local linearity of data
- Need densely sampled data as input

Other approaches: Autoencoders (multi-layer Neural Networks), ...
- Computationally more demanding than linear methods

Principal Component Analysis (PCA)

Two views but the same solution:
1. Want to find the best linear reconstruction of the data that minimizes the mean squared reconstruction error.
2. Want to find the best subspace that maximizes the projected data variance.

Suppose the input data {x_i}_{i=1}^n, x_i in R^d, is centered, i.e., (1/n) sum_i x_i = mu = 0 (mu: data mean).
Goal: to find a k-dim linear embedding y_i in R^k such that k < d.

Reconstruction View:
  x_i ~ x~_i = sum_{j=1}^k y_ij b_j = B y_i,   B in R^{d x k}, y_i in R^k
  argmin_{B, y} sum_{i=1}^n ||x_i - B y_i||^2   s.t. B^T B = I
In matrix form, with X the d x n data matrix:
  B^ = argmin_B ||X - B B^T X||_F^2   s.t. B^T B = I,   and   Y^ = B^T X
Solution: get the top k left singular vectors of X (O(nd^2)) and project the data onto them (O(nkd)).

Principal Component Analysis (PCA) - Max-Variance View

Want to find a k-dim linear projection y_i = B^T x_i such that
  B^ = argmax_B tr(B^T X X^T B)   s.t. B^T B = I   (assuming data is centered)
Solution: get the top k eigenvectors of X X^T (O(nd^2 + d^3)) and project the data (O(nkd)).
Left singular vectors of X = eigenvectors of X X^T.

Statistical assumption: data is normally distributed.
More general versions (allowing noise in the data): Factor Analysis and Probabilistic PCA.
Can be extended to a nonlinear version using kernels.

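The two PCA views above can be sketched in a few lines of numpy; this is an illustrative implementation (the function name `pca` and the d x n column-per-point layout are my conventions, not from the slides), using the SVD route from the reconstruction view:

```python
import numpy as np

def pca(X, k):
    """PCA on a d x n data matrix X (columns are points).

    Returns the d x k basis B (top-k left singular vectors of the
    centered data) and the k x n embedding Y = B^T X.
    """
    X = X - X.mean(axis=1, keepdims=True)        # center the data
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    B = U[:, :k]                                 # top-k left singular vectors
    Y = B.T @ X                                  # project data onto them
    return B, Y
```

For data lying exactly in a k-dim subspace, B @ Y reproduces the centered data, matching the zero-reconstruction-error optimum of the first view.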
Multidimensional Scaling (MDS)

Metric MDS: given pairwise (Euclidean) distances among n points, find a low-dim embedding that preserves the original distances:
  Y^ = argmin_Y sum_{i,j} (d_ij - ||y_i - y_j||)^2,   d_ij = ||x_i - x_j||,   X in R^{d x n}, Y in R^{k x n}
First, the n x n squared-distance matrix D (D_ij = d_ij^2) is converted to a similarity matrix K:
  K = -(1/2) H D H,   H = I - (1/n) 1 1^T,   1 = [1,...,1]^T (n entries)
Solution: the best k-dim (k < d) linear embedding Y is given by
  Y^T Y = K ~ U_k Sigma_k U_k^T   =>   Y = Sigma_k^{1/2} U_k^T
The embedding is identical to that from PCA on X.

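The double-centering recipe above is easy to check numerically. A minimal sketch (the name `classical_mds` is mine; assumes numpy and that the input holds squared Euclidean distances):

```python
import numpy as np

def classical_mds(D2, k):
    """Classical (metric) MDS from an n x n matrix of SQUARED
    Euclidean distances D2. Returns the k x n embedding Y."""
    n = D2.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    K = -0.5 * H @ D2 @ H                        # similarity (Gram) matrix
    w, V = np.linalg.eigh(K)                     # ascending eigenvalues
    idx = np.argsort(w)[::-1][:k]                # keep the top-k
    Y = np.sqrt(np.maximum(w[idx], 0))[:, None] * V[:, idx].T  # Sigma^{1/2} U^T
    return Y
```

With exact Euclidean distances from k-dim points, the recovered Y reproduces all pairwise distances, as the slide's equivalence with PCA predicts.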
Kernel (Nonlinear) PCA

Key Idea: instead of finding linear projections in the original input space, do it in the (implicit) feature space induced by a Mercer kernel,
  k(x, z) = Phi(x)^T Phi(z)     [Scholkopf et al. [5]]

Let's focus on a 1-dim projection, i.e., the best direction b.

PCA (assumption: centered data, sum_i x_i = X 1 = 0):
  data covariance C = X X^T,   C b = lambda b

Kernel PCA (sum_i Phi(x_i) = Phi(X) 1 = 0):
  C = Phi(X) Phi(X)^T
  C b = sum_i Phi(x_i) (Phi(x_i)^T b) = lambda b
  =>  b = sum_i alpha_i Phi(x_i) = Phi(X) alpha,   alpha_i = Phi(x_i)^T b / lambda,   lambda != 0
  b lies in the span of the mapped input points!

Premultiply by Phi(X)^T and replace Phi(X)^T Phi(X) = K:
  K^2 alpha = lambda K alpha   =>   K alpha = lambda alpha
(K is positive-definite; otherwise, the other solutions are not of interest.)

Kernel (Nonlinear) PCA

Main computation: find the top k eigenvectors of the kernel matrix,
  K alpha = lambda alpha,   O(n^2 k)!
Final solution: b = Phi(X) alpha, but b needs to have unit length:
  b^T b = alpha^T K alpha = lambda alpha^T alpha = 1   =>   ||alpha|| = 1/sqrt(lambda)
Projection of a point x_i:
  y_i = Phi(x_i)^T b = K_i^T alpha   (K_i: i-th column of K)

What if we want to find a projection for a new point not seen during training?
- Known as the out-of-sample extension
- Not as straightforward as for linear PCA
- Can be thought of as adding another row and column to the kernel matrix
- To avoid recomputing the eigendecomposition of the extended kernel matrix, use the Nystrom method to approximate the new embedding (recall matrix approximations)

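The eigen-computation and the unit-length rescaling of alpha can be sketched directly; this is my illustrative version (name `kernel_pca` is mine), assuming the n x n kernel matrix has already been centered in feature space:

```python
import numpy as np

def kernel_pca(K, k):
    """Kernel PCA given an n x n centered kernel matrix K.

    Returns the k x n training projections and the rescaled
    coefficient vectors alpha (n x k)."""
    w, V = np.linalg.eigh(K)                     # ascending order
    w, V = w[::-1][:k], V[:, ::-1][:, :k]        # top-k eigenpairs
    alpha = V / np.sqrt(w)                       # so that b^T b = alpha^T K alpha = 1
    Y = (K @ alpha).T                            # y_i = K_i^T alpha
    return Y, alpha
```

With a linear kernel K = Xc^T Xc, the projections agree (up to sign) with ordinary PCA on the centered data, as they should.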
Centering in Feature Space

We assumed that the data was centered in feature space.
- Easy to do with the input features {x_i}
- How to do it in the mapped feature space {Phi(x_i)}, as the explicit mapping may be unknown?

We want: Phi~(x_i) = Phi(x_i) - Phi_mean,   Phi_mean = (1/n) sum_{i=1}^n Phi(x_i)
But we need the data only through the kernel matrix, so get the centered kernel matrix:
  K~ = K - 1_n K - K 1_n + 1_n K 1_n,   (1_n)_ij = 1/n,   i, j = 1,...,n

Interpretation:
  K~_ij = K_ij - m_i - m_j + m
where m_i is the mean of the i-th row, m_j the mean of the j-th column, and m the mean of all entries of K.

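The centered-kernel formula is one line of numpy; a small sketch (function name mine), which for a linear kernel must agree with explicitly centering the inputs first:

```python
import numpy as np

def center_kernel(K):
    """Center an n x n kernel matrix in feature space:
    K~ = K - 1_n K - K 1_n + 1_n K 1_n, with (1_n)_ij = 1/n."""
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    return K - one @ K - K @ one + one @ K @ one
```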
Locally Linear Embedding (LLE)

Key Idea: given sufficient samples, each data point and its neighbors are assumed to lie close to a locally linear patch. Try to reconstruct each data point from its t neighbors:
  x_i ~ sum_{j~i} w_ij x_j   (j~i indicates neighbors of i),   O(nd)

Learn the weights by solving
  argmin_w sum_i ||x_i - sum_{j~i} w_ij x_j||^2   s.t. sum_j w_ij = 1,   O(ndt^3)

Assumption: the same weights reconstruct the low-dim embedding. Also construct a sparse n x n matrix M:
  argmin_Y sum_i ||y_i - sum_j w_ij y_j||^2   s.t. sum_i y_i = 0, (1/n) sum_i y_i y_i^T = I
  M = (I - W)^T (I - W)
Get the bottom k eigenvectors of M, ignoring the last one: O(n^2 k).

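The two LLE stages (local weight fitting, then the eigenproblem on M) can be sketched as below; this is my toy implementation (dense, brute-force neighbors, and a standard regularization of the local Gram matrix for the t > d case; none of these details are prescribed by the slides):

```python
import numpy as np

def lle(X, t, k, reg=1e-3):
    """Toy LLE. X is d x n, t neighbors, k output dims.
    Returns the k x n embedding."""
    d, n = X.shape
    D2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D2[i])[1:t + 1]        # skip the point itself
        Z = X[:, nbrs] - X[:, [i]]               # local patch, d x t
        G = Z.T @ Z                              # local Gram matrix
        G += reg * np.trace(G) * np.eye(t)       # regularize (needed when t > d)
        w = np.linalg.solve(G, np.ones(t))
        W[i, nbrs] = w / w.sum()                 # weights sum to one
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:k + 1].T                    # bottom k, skipping the constant one
```

Note that M 1 = 0 because each weight row sums to one, so the very bottom eigenvector is the constant vector and is discarded.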
PCA vs LLE

A face image translated in 2-D against a random background; n = 961, d = 3009, t = 4, k = 2
Roweis & Saul [6]

ISOMAP

Find the low-dimensional representation that best preserves geodesic distances between points - MDS with geodesic distances:
  Y^ = argmin_Y sum_{i,j} (Delta_ij - ||y_i - y_j||)^2,   Delta_ij: geodesic distance, y_i: output coordinates
Recovers the true (convex) manifold asymptotically!

ISOMAP

Given n input points:
1. Find the t nearest neighbors for each point: O(n^2)
2. Find the shortest path distance for every pair (i, j), Delta_ij: O(n^2 log n)
3. Construct the matrix K with entries given by the centered Delta_ij^2; K is a dense matrix
4. Optimal k reduced dims: Sigma_k^{1/2} U_k^T (eigenvalues/eigenvectors), O(n^2 k)!

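The four steps above can be sketched end-to-end on toy data. This is my illustrative version: it substitutes vectorized Floyd-Warshall (O(n^3), fine for small n) for the Dijkstra-style O(n^2 log n) shortest paths the slide assumes, and it assumes the neighborhood graph is connected:

```python
import numpy as np

def isomap(X, t, k):
    """Toy Isomap: t-NN graph, all-pairs shortest paths
    (Floyd-Warshall), then classical MDS on squared geodesics."""
    n = X.shape[1]
    D = np.sqrt(((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0))
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        nbrs = np.argsort(D[i])[1:t + 1]         # t nearest neighbors
        G[i, nbrs] = D[i, nbrs]
        G[nbrs, i] = D[i, nbrs]                  # symmetrize the graph
    for m in range(n):                           # Floyd-Warshall relaxation
        G = np.minimum(G, G[:, [m]] + G[[m], :])
    H = np.eye(n) - np.ones((n, n)) / n
    K = -0.5 * H @ (G ** 2) @ H                  # centered squared geodesics
    w, V = np.linalg.eigh(K)
    idx = np.argsort(w)[::-1][:k]
    return np.sqrt(np.maximum(w[idx], 0))[:, None] * V[:, idx].T
```

On points sampled along a curve, the 1-dim embedding should order the points by arc length, which is exactly what a linear method like PCA cannot guarantee.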
ISOMAP Experiment

Face images taken with two pose variations (left-right and up-down) and a 1-D illumination direction; d = 4096, n = 698 [Tenenbaum et al. [7]]

Issue: quite sensitive to false edges in the graph ("short-circuits") - one wrong edge may cause the shortest paths to change drastically. Better to use the expected commute time between two nodes -> Laplacian Eigenmaps.

Laplacian Eigenmaps

Minimize weighted distances between neighbors:
  Y^ = argmin_Y sum_{i~j} W_ij || y_i / sqrt(D_ii) - y_j / sqrt(D_jj) ||^2,   D_ii = sum_j W_ij
Another formulation:
  Y^ = argmin_Y tr[Y^T L Y]   s.t. Y^T D Y = I,   L = D - W

Algorithm:
1. Find the t nearest neighbors for each point: O(n^2)
2. Compute the weight matrix W: W_ij = exp(-||x_i - x_j||^2 / sigma^2) if i~j, 0 otherwise
3. Compute the normalized Laplacian K = I - D^{-1/2} W D^{-1/2}, where D_ii = sum_j W_ij
4. Optimal k reduced dims: U_k = bottom k eigenvectors of K, ignoring the last: O(n^2 k), but can be done much faster using Arnoldi's/Lanczos method since the matrix is sparse

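A dense toy sketch of the algorithm above (my illustrative code; a large-scale version would use sparse matrices and Lanczos iteration as the slide notes, and it assumes the graph has no isolated nodes):

```python
import numpy as np

def laplacian_eigenmaps(X, t, k, sigma=1.0):
    """Toy Laplacian Eigenmaps: t-NN graph, heat-kernel weights,
    bottom eigenvectors of I - D^{-1/2} W D^{-1/2}."""
    n = X.shape[1]
    D2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D2[i])[1:t + 1]
        W[i, nbrs] = np.exp(-D2[i, nbrs] / sigma ** 2)
    W = np.maximum(W, W.T)                       # symmetrize the graph
    deg = W.sum(axis=1)
    Dm = np.diag(1.0 / np.sqrt(deg))
    L = np.eye(n) - Dm @ W @ Dm                  # normalized Laplacian
    vals, vecs = np.linalg.eigh(L)
    return vecs[:, 1:k + 1].T                    # skip the trivial bottom eigenvector
```

The bottom eigenvector of the normalized Laplacian is D^{1/2} 1 with eigenvalue 0, so it carries no information and is dropped.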
Maximum Variance Unfolding (MVU)

Key Idea: find the embedding with maximum variance that preserves angles and lengths for edges between nearest neighbors.
Angle/distance preservation constraint:
  ||y_i - y_j||^2 = ||x_i - x_j||^2   if there is an edge (i, j) in the graph formed by pairwise connecting all t nearest neighbors
Centering constraint (for translational invariance): sum_i y_i = 0
Optimization criterion - maximize squared pairwise distances between embeddings:
  argmax_Y sum_{i,j} ||y_i - y_j||^2   s.t. the above constraints
Same as maximizing the variance of the outputs!  [Weinberger and Saul [12]]

Reformulation: using a kernel K such that K_ij = y_i^T y_j:
- Angle/distance preservation: K_ii - 2 K_ij + K_jj = d_ij^2 = ||x_i - x_j||^2
- Centering constraint: sum_{i,j} K_ij = 0  (equivalent to sum_i y_i = 0)
- Symmetric positive semi-definite constraint on K
- Max-variance objective function: tr(K)
A semi-definite program! O(n^3 + c^3), c = number of constraints.
Final solution: Y = Sigma_k^{1/2} U_k^T - top k eigenvalues and eigenvectors of K.
Can relax the hard constraints via slack variables!

PCA vs MVU

Trefoil knot: n = 1617, d = 3, t = 5, k = 2
A teapot viewed while rotated 180 degrees in a plane: n = 200, d = 23028, t = 4, k = 1
Weinberger and Saul [12]

Large-Scale Face Manifold Learning

Construct a Web dataset:
- Extracted 18M faces from 2.5B internet images; ~15 hours on 500 machines
- Faces normalized to zero mean and unit variance

Graph construction:
- Approximate nearest neighbors via spill trees; 5 NN, ~2 days
- Can be done much faster using appropriate hashes!
Talwalkar, Kumar, Rowley [13]

Neighborhood Graph Construction

Connect each node (face) with its neighbors. Is the graph connected?
- Depth-First-Search to find the largest connected component: 10 minutes on a single machine
- The largest component depends on the number of NN (t)
Talwalkar, Kumar, Rowley [13]

Samples from Connected Components

From the largest component; from smaller components
Talwalkar, Kumar, Rowley [13]

Graph Manipulation

Approximating geodesics: shortest paths between pairs of face images.
Computing them for all pairs is infeasible: O(n^2 log n)!
Key Idea: only a few columns of K are needed for a sampling-based spectral decomposition, which requires shortest paths between a few (l) nodes and all other nodes: 1 hour on 500 machines (l = 10K).

Computing embeddings (k = 100):
- Nystrom: 1.5 hours, 500 machines
- Col-Sampling: 6 hours, 500 machines
- Projections: 15 minutes, 500 machines
Talwalkar, Kumar, Rowley [13]

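The sampling-based spectral decomposition referred to above is the Nystrom method: approximate the eigenpairs of a huge PSD matrix from l sampled columns. A small sketch of the idea (my illustrative code, not the paper's distributed implementation):

```python
import numpy as np

def nystrom(C, W, k):
    """Nystrom sketch: approximate the top-k eigenpairs of an
    n x n PSD matrix K from l sampled columns C (n x l) and the
    l x l landmark block W = K[idx][:, idx]."""
    n, l = C.shape
    w, U = np.linalg.eigh(W)
    top = np.argsort(w)[::-1][:k]                # top-k eigenpairs of W
    w, U = w[top], U[:, top]
    vals = (n / l) * w                           # approximate eigenvalues of K
    vecs = np.sqrt(l / n) * (C @ U) / w          # approximate eigenvectors (n x k)
    return vals, vecs
```

When K has rank at most l and the landmark block is nonsingular, the reconstruction vecs @ diag(vals) @ vecs^T equals C W^{-1} C^T, which recovers K exactly; in general it is only an approximation whose quality depends on the sampled columns.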
CMU-PIE Dataset

68 people, 13 poses, 43 illuminations, 4 expressions
35,247 faces detected by a face detector
Classification and clustering on poses

Optimal 2-D Embeddings

Talwalkar, Kumar, Rowley [13]

Clustering

K-means clustering after transformation (k = 100); K fixed to be the same as the number of classes.
Two metrics:
- Purity: points within a cluster come from the same class
- Accuracy: points from a class form a single cluster
The matrix K is not guaranteed to be positive semi-definite in Isomap!
- Nystrom: EVD of W (can ignore negative eigenvalues)
- Col-sampling: SVD of C (signs are lost)!
Talwalkar, Kumar, Rowley [13]

Experiments - Classification

K-Nearest Neighbor classification error (%) after embedding, for 10 random splits
Talwalkar, Kumar, Rowley [13]

18M-Manifold in 2-D: Nystrom Isomap

Talwalkar, Kumar, Rowley [13]

Shortest Paths on the Manifold

18M samples are not enough!
Talwalkar, Kumar, Rowley [13]

People Hopper

Interface: Orkut gadget

Manifold Learning - Open Questions

- Does a manifold really exist for a given dataset? Is it really connected or convex?
- Instead of lying on a manifold, maybe the data lives in small clusters in different subspaces?
- Any practical benefits of nonlinear dimensionality reduction (manifold learning) in clustering/classification? Most of the results are on toy data, with no real practical utility so far. In practice, PCA is enough to give most of the benefits (if any).
- Instead of looking for yet another manifold learning method, better to focus on determining whether a manifold exists and how to quantify that.

References
1. K. Pearson, "On Lines and Planes of Closest Fit to Systems of Points in Space", Philosophical Magazine 2 (6): 559-572, 1901.
2. C. Spearman, "General Intelligence, Objectively Determined and Measured", American Journal of Psychology, 1904. (factor analysis)
3. I. T. Jolliffe, Principal Component Analysis, Springer-Verlag, pp. 487, 1986.
4. T. Cox & M. Cox, Multidimensional Scaling, Chapman & Hall, 1994.
5. B. Schölkopf, A. Smola, K.-R. Müller, "Kernel Principal Component Analysis", in: B. Schölkopf, C. J. C. Burges, A. J. Smola (Eds.), Advances in Kernel Methods - Support Vector Learning, MIT Press, Cambridge, MA, USA, 327-352, 1999.
6. S. T. Roweis and L. K. Saul, "Nonlinear Dimensionality Reduction by Locally Linear Embedding", Science, December 2000.
7. J. B. Tenenbaum, V. de Silva and J. C. Langford, "A Global Geometric Framework for Nonlinear Dimensionality Reduction", Science 290 (5500): 2319-2323, 2000.
8. M. Belkin and P. Niyogi, "Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering", Advances in Neural Information Processing Systems 14, pp. 586-691, 2001.
9. D. Donoho and C. Grimes, "Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data", Proc Natl Acad Sci USA, 100(10): 5591-5596, May 2003.
10. Y. Bengio, J.-F. Paiement, P. Vincent, O. Delalleau, N. Le Roux, M. Ouimet, "Out-of-sample extensions for LLE, Isomap, MDS, Eigenmaps, and spectral clustering", NIPS, 2004.
11. G. E. Hinton and R. R. Salakhutdinov, "Reducing the Dimensionality of Data with Neural Networks", Science, Vol. 313, no. 5786, pp. 504-507, 2006.
12. K. Q. Weinberger and L. K. Saul, "Unsupervised Learning of Image Manifolds by Semidefinite Programming", International Journal of Computer Vision (IJCV), 70(1), 2006.
13. A. Talwalkar, S. Kumar and H. Rowley, "Large Scale Manifold Learning", CVPR, 2008.
14. B. Shaw and T. Jebara, "Structure Preserving Embedding", ICML, 2009.

Sanjiv Kumar 11/16/2010 EECS6898 Large Scale Machine Learning 51