Dimensionality Reduction


Dimensionality Reduction
Sanjiv Kumar, Google Research, NY
EECS-6898, Columbia University - Fall, 2010

Sanjiv Kumar 11/16/2010 EECS6898 Large Scale Machine Learning 1

Curse of Dimensionality

Many learning techniques scale poorly with data dimensionality (d):
- Density estimation: for example, Gaussian Mixture Models (GMM) need to estimate covariance matrices, O(d^2)
- Nearest neighbor search: O(d); also, the performance of trees and hashes suffers with high dimensionality
- Optimization techniques: first-order methods scale as O(d) while second-order methods scale as O(d^2)
- Clustering, classification, regression, ...
- Data visualization: hard to do in high-dimensional spaces

Dimensionality Reduction
Key Idea: data dimensions in the input space may be statistically dependent, so it is possible to retain most of the information of the input space in a lower-dimensional space.

Dimensionality Reduction

50 x 50 pixel faces live in R^2500, as do 50 x 50 pixel random images.
The space of face images is significantly smaller than 256^2500.
Want to recover the underlying low-dimensional space!

Dimensionality Reduction

Linear Techniques: PCA, Metric MDS, Randomized projections, ...
- Assume data lies in a subspace
- Work well in practice in many cases
- Can be a poor approximation for some data

Nonlinear Techniques: manifold learning methods - Kernel PCA, LLE, ISOMAP, ...
- Assume local linearity of data
- Need densely sampled data as input

Other approaches: Autoencoders (multi-layer Neural Networks), ...
- Computationally more demanding than linear methods

Principal Component Analysis (PCA)

Two views but the same solution:
1. Want to find the best linear reconstruction of the data that minimizes the mean squared reconstruction error.
2. Want to find the best subspace that maximizes the projected data variance.

Suppose the input data {x_i}_{i=1}^n, x_i in R^d, is centered, i.e., (1/n) sum_i x_i = mu = 0 (mu: data mean).
Goal: to find a k-dim linear embedding y_i in R^k such that k < d.

Reconstruction View:
  x_i ~ x~_i = sum_{j=1}^k y_ij b_j = B y_i,   B in R^{d x k}, y_i in R^k
  argmin_{B, y} sum_{i=1}^n ||x_i - B y_i||^2   s.t. B^T B = I
In matrix form, with X the d x n data matrix:
  B^ = argmin_B ||X - B B^T X||_F^2   s.t. B^T B = I,   and   Y^ = B^T X
Solution: get the top k left singular vectors of X (O(nd^2)) and project the data onto them (O(nkd)).

Principal Component Analysis (PCA) - Max-Variance View

Want to find a k-dim linear projection y_i = B^T x_i such that
  B^ = argmax_B tr(B^T X X^T B)   s.t. B^T B = I   (assuming data is centered)
Solution: get the top k eigenvectors of X X^T (O(nd^2 + d^3)) and project the data (O(nkd)).
Left singular vectors of X = eigenvectors of X X^T.

Statistical assumption: data is normally distributed.
More general versions (allowing noise in the data): Factor Analysis and Probabilistic PCA.
Can be extended to a nonlinear version using kernels.

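The two PCA views above can be sketched in a few lines of numpy; this is an illustrative implementation (the function name `pca` and the d x n column-per-point layout are my conventions, not from the slides), using the SVD route from the reconstruction view:

```python
import numpy as np

def pca(X, k):
    """PCA on a d x n data matrix X (columns are points).

    Returns the d x k basis B (top-k left singular vectors of the
    centered data) and the k x n embedding Y = B^T X.
    """
    X = X - X.mean(axis=1, keepdims=True)        # center the data
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    B = U[:, :k]                                 # top-k left singular vectors
    Y = B.T @ X                                  # project data onto them
    return B, Y
```

For data lying exactly in a k-dim subspace, B @ Y reproduces the centered data, matching the zero-reconstruction-error optimum of the first view.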
Multidimensional Scaling (MDS)

Metric MDS: given pairwise (Euclidean) distances among n points, find a low-dim embedding that preserves the original distances:
  Y^ = argmin_Y sum_{i,j} (d_ij - ||y_i - y_j||)^2,   d_ij = ||x_i - x_j||,   X in R^{d x n}, Y in R^{k x n}
First, the n x n squared-distance matrix D (D_ij = d_ij^2) is converted to a similarity matrix K:
  K = -(1/2) H D H,   H = I - (1/n) 1 1^T,   1 = [1,...,1]^T (n entries)
Solution: the best k-dim (k < d) linear embedding Y is given by
  Y^T Y = K ~ U_k Sigma_k U_k^T   =>   Y = Sigma_k^{1/2} U_k^T
The embedding is identical to that from PCA on X.

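The double-centering recipe above is easy to check numerically. A minimal sketch (the name `classical_mds` is mine; assumes numpy and that the input holds squared Euclidean distances):

```python
import numpy as np

def classical_mds(D2, k):
    """Classical (metric) MDS from an n x n matrix of SQUARED
    Euclidean distances D2. Returns the k x n embedding Y."""
    n = D2.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    K = -0.5 * H @ D2 @ H                        # similarity (Gram) matrix
    w, V = np.linalg.eigh(K)                     # ascending eigenvalues
    idx = np.argsort(w)[::-1][:k]                # keep the top-k
    Y = np.sqrt(np.maximum(w[idx], 0))[:, None] * V[:, idx].T  # Sigma^{1/2} U^T
    return Y
```

With exact Euclidean distances from k-dim points, the recovered Y reproduces all pairwise distances, as the slide's equivalence with PCA predicts.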
Kernel (Nonlinear) PCA

Key Idea: instead of finding linear projections in the original input space, do it in the (implicit) feature space induced by a Mercer kernel,
  k(x, z) = Phi(x)^T Phi(z)     [Scholkopf et al. [5]]

Let's focus on a 1-dim projection, i.e., the best direction b.

PCA (assumption: centered data, sum_i x_i = X 1 = 0):
  data covariance C = X X^T,   C b = lambda b

Kernel PCA (sum_i Phi(x_i) = Phi(X) 1 = 0):
  C = Phi(X) Phi(X)^T
  C b = sum_i Phi(x_i) (Phi(x_i)^T b) = lambda b
  =>  b = sum_i alpha_i Phi(x_i) = Phi(X) alpha,   alpha_i = Phi(x_i)^T b / lambda,   lambda != 0
  b lies in the span of the mapped input points!

Premultiply by Phi(X)^T and replace Phi(X)^T Phi(X) = K:
  K^2 alpha = lambda K alpha   =>   K alpha = lambda alpha
(K is positive-definite; otherwise, the other solutions are not of interest.)

Kernel (Nonlinear) PCA

Main computation: find the top k eigenvectors of the kernel matrix,
  K alpha = lambda alpha,   O(n^2 k)!
Final solution: b = Phi(X) alpha, but b needs to have unit length:
  b^T b = alpha^T K alpha = lambda alpha^T alpha = 1   =>   ||alpha|| = 1/sqrt(lambda)
Projection of a point x_i:
  y_i = Phi(x_i)^T b = K_i^T alpha   (K_i: i-th column of K)

What if we want to find a projection for a new point not seen during training?
- Known as the out-of-sample extension
- Not as straightforward as for linear PCA
- Can be thought of as adding another row and column to the kernel matrix
- To avoid recomputing the eigendecomposition of the extended kernel matrix, use the Nystrom method to approximate the new embedding (recall matrix approximations)

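The eigen-computation and the unit-length rescaling of alpha can be sketched directly; this is my illustrative version (name `kernel_pca` is mine), assuming the n x n kernel matrix has already been centered in feature space:

```python
import numpy as np

def kernel_pca(K, k):
    """Kernel PCA given an n x n centered kernel matrix K.

    Returns the k x n training projections and the rescaled
    coefficient vectors alpha (n x k)."""
    w, V = np.linalg.eigh(K)                     # ascending order
    w, V = w[::-1][:k], V[:, ::-1][:, :k]        # top-k eigenpairs
    alpha = V / np.sqrt(w)                       # so that b^T b = alpha^T K alpha = 1
    Y = (K @ alpha).T                            # y_i = K_i^T alpha
    return Y, alpha
```

With a linear kernel K = Xc^T Xc, the projections agree (up to sign) with ordinary PCA on the centered data, as they should.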
Centering in Feature Space

We assumed that the data was centered in feature space.
- Easy to do with the input features {x_i}
- How to do it in the mapped feature space {Phi(x_i)}, as the explicit mapping may be unknown?

We want: Phi~(x_i) = Phi(x_i) - Phi_mean,   Phi_mean = (1/n) sum_{i=1}^n Phi(x_i)
But we need the data only through the kernel matrix, so get the centered kernel matrix:
  K~ = K - 1_n K - K 1_n + 1_n K 1_n,   (1_n)_ij = 1/n,   i, j = 1,...,n

Interpretation:
  K~_ij = K_ij - m_i - m_j + m
where m_i is the mean of the i-th row, m_j the mean of the j-th column, and m the mean of all entries of K.

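The centered-kernel formula is one line of numpy; a small sketch (function name mine), which for a linear kernel must agree with explicitly centering the inputs first:

```python
import numpy as np

def center_kernel(K):
    """Center an n x n kernel matrix in feature space:
    K~ = K - 1_n K - K 1_n + 1_n K 1_n, with (1_n)_ij = 1/n."""
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    return K - one @ K - K @ one + one @ K @ one
```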
Locally Linear Embedding (LLE)

Key Idea: given sufficient samples, each data point and its neighbors are assumed to lie close to a locally linear patch. Try to reconstruct each data point from its t neighbors:
  x_i ~ sum_{j~i} w_ij x_j   (j~i indicates neighbors of i),   O(nd)

Learn the weights by solving
  argmin_w sum_i ||x_i - sum_{j~i} w_ij x_j||^2   s.t. sum_j w_ij = 1,   O(ndt^3)

Assumption: the same weights reconstruct the low-dim embedding. Also construct a sparse n x n matrix M:
  argmin_Y sum_i ||y_i - sum_j w_ij y_j||^2   s.t. sum_i y_i = 0, (1/n) sum_i y_i y_i^T = I
  M = (I - W)^T (I - W)
Get the bottom k eigenvectors of M, ignoring the last one: O(n^2 k).

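The two LLE stages (local weight fitting, then the eigenproblem on M) can be sketched as below; this is my toy implementation (dense, brute-force neighbors, and a standard regularization of the local Gram matrix for the t > d case; none of these details are prescribed by the slides):

```python
import numpy as np

def lle(X, t, k, reg=1e-3):
    """Toy LLE. X is d x n, t neighbors, k output dims.
    Returns the k x n embedding."""
    d, n = X.shape
    D2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D2[i])[1:t + 1]        # skip the point itself
        Z = X[:, nbrs] - X[:, [i]]               # local patch, d x t
        G = Z.T @ Z                              # local Gram matrix
        G += reg * np.trace(G) * np.eye(t)       # regularize (needed when t > d)
        w = np.linalg.solve(G, np.ones(t))
        W[i, nbrs] = w / w.sum()                 # weights sum to one
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:k + 1].T                    # bottom k, skipping the constant one
```

Note that M 1 = 0 because each weight row sums to one, so the very bottom eigenvector is the constant vector and is discarded.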
PCA vs LLE

A face image translated in 2-D against a random background; n = 961, d = 3009, t = 4, k = 2
Roweis & Saul [6]

ISOMAP

Find the low-dimensional representation that best preserves geodesic distances between points - MDS with geodesic distances:
  Y^ = argmin_Y sum_{i,j} (Delta_ij - ||y_i - y_j||)^2,   Delta_ij: geodesic distance, y_i: output coordinates
Recovers the true (convex) manifold asymptotically!

ISOMAP

Given n input points:
1. Find the t nearest neighbors for each point: O(n^2)
2. Find the shortest path distance for every pair (i, j), Delta_ij: O(n^2 log n)
3. Construct the matrix K with entries given by the centered Delta_ij^2; K is a dense matrix
4. Optimal k reduced dims: Sigma_k^{1/2} U_k^T (eigenvalues/eigenvectors), O(n^2 k)!

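The four steps above can be sketched end-to-end on toy data. This is my illustrative version: it substitutes vectorized Floyd-Warshall (O(n^3), fine for small n) for the Dijkstra-style O(n^2 log n) shortest paths the slide assumes, and it assumes the neighborhood graph is connected:

```python
import numpy as np

def isomap(X, t, k):
    """Toy Isomap: t-NN graph, all-pairs shortest paths
    (Floyd-Warshall), then classical MDS on squared geodesics."""
    n = X.shape[1]
    D = np.sqrt(((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0))
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        nbrs = np.argsort(D[i])[1:t + 1]         # t nearest neighbors
        G[i, nbrs] = D[i, nbrs]
        G[nbrs, i] = D[i, nbrs]                  # symmetrize the graph
    for m in range(n):                           # Floyd-Warshall relaxation
        G = np.minimum(G, G[:, [m]] + G[[m], :])
    H = np.eye(n) - np.ones((n, n)) / n
    K = -0.5 * H @ (G ** 2) @ H                  # centered squared geodesics
    w, V = np.linalg.eigh(K)
    idx = np.argsort(w)[::-1][:k]
    return np.sqrt(np.maximum(w[idx], 0))[:, None] * V[:, idx].T
```

On points sampled along a curve, the 1-dim embedding should order the points by arc length, which is exactly what a linear method like PCA cannot guarantee.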
ISOMAP Experiment

Face images taken with two pose variations (left-right and up-down) and a 1-D illumination direction; d = 4096, n = 698 [Tenenbaum et al. [7]]

Issue: quite sensitive to false edges in the graph ("short-circuits") - one wrong edge may cause the shortest paths to change drastically. Better to use the expected commute time between two nodes -> Laplacian Eigenmaps.

Laplacian Eigenmaps

Minimize weighted distances between neighbors:
  Y^ = argmin_Y sum_{i~j} W_ij || y_i / sqrt(D_ii) - y_j / sqrt(D_jj) ||^2,   D_ii = sum_j W_ij
Another formulation:
  Y^ = argmin_Y tr[Y^T L Y]   s.t. Y^T D Y = I,   L = D - W

Algorithm:
1. Find the t nearest neighbors for each point: O(n^2)
2. Compute the weight matrix W: W_ij = exp(-||x_i - x_j||^2 / sigma^2) if i~j, 0 otherwise
3. Compute the normalized Laplacian K = I - D^{-1/2} W D^{-1/2}, where D_ii = sum_j W_ij
4. Optimal k reduced dims: U_k = bottom k eigenvectors of K, ignoring the last: O(n^2 k), but can be done much faster using Arnoldi's/Lanczos method since the matrix is sparse

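A dense toy sketch of the algorithm above (my illustrative code; a large-scale version would use sparse matrices and Lanczos iteration as the slide notes, and it assumes the graph has no isolated nodes):

```python
import numpy as np

def laplacian_eigenmaps(X, t, k, sigma=1.0):
    """Toy Laplacian Eigenmaps: t-NN graph, heat-kernel weights,
    bottom eigenvectors of I - D^{-1/2} W D^{-1/2}."""
    n = X.shape[1]
    D2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D2[i])[1:t + 1]
        W[i, nbrs] = np.exp(-D2[i, nbrs] / sigma ** 2)
    W = np.maximum(W, W.T)                       # symmetrize the graph
    deg = W.sum(axis=1)
    Dm = np.diag(1.0 / np.sqrt(deg))
    L = np.eye(n) - Dm @ W @ Dm                  # normalized Laplacian
    vals, vecs = np.linalg.eigh(L)
    return vecs[:, 1:k + 1].T                    # skip the trivial bottom eigenvector
```

The bottom eigenvector of the normalized Laplacian is D^{1/2} 1 with eigenvalue 0, so it carries no information and is dropped.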
Maximum Variance Unfolding (MVU)

Key Idea: find the embedding with maximum variance that preserves angles and lengths for edges between nearest neighbors.
Angle/distance preservation constraint:
  ||y_i - y_j||^2 = ||x_i - x_j||^2   if there is an edge (i, j) in the graph formed by pairwise connecting all t nearest neighbors
Centering constraint (for translational invariance): sum_i y_i = 0
Optimization criterion - maximize squared pairwise distances between embeddings:
  argmax_Y sum_{i,j} ||y_i - y_j||^2   s.t. the above constraints
Same as maximizing the variance of the outputs!  [Weinberger and Saul [12]]

Reformulation: using a kernel K such that K_ij = y_i^T y_j:
- Angle/distance preservation: K_ii - 2 K_ij + K_jj = d_ij^2 = ||x_i - x_j||^2
- Centering constraint: sum_{i,j} K_ij = 0  (equivalent to sum_i y_i = 0)
- Symmetric positive semi-definite constraint on K
- Max-variance objective function: tr(K)
A semi-definite program! O(n^3 + c^3), c = number of constraints.
Final solution: Y = Sigma_k^{1/2} U_k^T - top k eigenvalues and eigenvectors of K.
Can relax the hard constraints via slack variables!

PCA vs MVU

Trefoil knot: n = 1617, d = 3, t = 5, k = 2
A teapot viewed while rotated 180 degrees in a plane: n = 200, d = 23028, t = 4, k = 1
Weinberger and Saul [12]

Large-Scale Face Manifold Learning

Construct a Web dataset:
- Extracted 18M faces from 2.5B internet images; ~15 hours on 500 machines
- Faces normalized to zero mean and unit variance

Graph construction:
- Approximate nearest neighbors via spill trees; 5 NN, ~2 days
- Can be done much faster using appropriate hashes!
Talwalkar, Kumar, Rowley [13]

Neighborhood Graph Construction

Connect each node (face) with its neighbors. Is the graph connected?
- Depth-First-Search to find the largest connected component: 10 minutes on a single machine
- The largest component depends on the number of NN (t)
Talwalkar, Kumar, Rowley [13]

Samples from Connected Components

From the largest component; from smaller components
Talwalkar, Kumar, Rowley [13]

Graph Manipulation

Approximating geodesics: shortest paths between pairs of face images.
Computing them for all pairs is infeasible: O(n^2 log n)!
Key Idea: only a few columns of K are needed for a sampling-based spectral decomposition, which requires shortest paths between a few (l) nodes and all other nodes: 1 hour on 500 machines (l = 10K).

Computing embeddings (k = 100):
- Nystrom: 1.5 hours, 500 machines
- Col-Sampling: 6 hours, 500 machines
- Projections: 15 minutes, 500 machines
Talwalkar, Kumar, Rowley [13]

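The sampling-based spectral decomposition referred to above is the Nystrom method: approximate the eigenpairs of a huge PSD matrix from l sampled columns. A small sketch of the idea (my illustrative code, not the paper's distributed implementation):

```python
import numpy as np

def nystrom(C, W, k):
    """Nystrom sketch: approximate the top-k eigenpairs of an
    n x n PSD matrix K from l sampled columns C (n x l) and the
    l x l landmark block W = K[idx][:, idx]."""
    n, l = C.shape
    w, U = np.linalg.eigh(W)
    top = np.argsort(w)[::-1][:k]                # top-k eigenpairs of W
    w, U = w[top], U[:, top]
    vals = (n / l) * w                           # approximate eigenvalues of K
    vecs = np.sqrt(l / n) * (C @ U) / w          # approximate eigenvectors (n x k)
    return vals, vecs
```

When K has rank at most l and the landmark block is nonsingular, the reconstruction vecs @ diag(vals) @ vecs^T equals C W^{-1} C^T, which recovers K exactly; in general it is only an approximation whose quality depends on the sampled columns.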
CMU-PIE Dataset

68 people, 13 poses, 43 illuminations, 4 expressions
35,247 faces detected by a face detector
Classification and clustering on poses

Optimal 2-D Embeddings

Talwalkar, Kumar, Rowley [13]

Clustering

K-means clustering after transformation (k = 100); K fixed to be the same as the number of classes.
Two metrics:
- Purity: points within a cluster come from the same class
- Accuracy: points from a class form a single cluster
The matrix K is not guaranteed to be positive semi-definite in Isomap!
- Nystrom: EVD of W (can ignore negative eigenvalues)
- Col-sampling: SVD of C (signs are lost)!
Talwalkar, Kumar, Rowley [13]

Experiments - Classification

K-Nearest Neighbor classification error (%) after embedding, for 10 random splits
Talwalkar, Kumar, Rowley [13]

18M-Manifold in 2-D: Nystrom Isomap

Talwalkar, Kumar, Rowley [13]

Shortest Paths on the Manifold

18M samples are not enough!
Talwalkar, Kumar, Rowley [13]

People Hopper

Interface: Orkut gadget

Manifold Learning - Open Questions

- Does a manifold really exist for a given dataset? Is it really connected or convex?
- Instead of lying on a manifold, maybe the data lives in small clusters in different subspaces?
- Any practical benefits of nonlinear dimensionality reduction (manifold learning) in clustering/classification? Most of the results are on toy data, with no real practical utility so far. In practice, PCA is enough to give most of the benefits (if any).
- Instead of looking for yet another manifold learning method, better to focus on determining whether a manifold exists and how to quantify that.

References
1. K. Pearson, "On Lines and Planes of Closest Fit to Systems of Points in Space", Philosophical Magazine 2 (6): 559-572, 1901.
2. C. Spearman, "General Intelligence, Objectively Determined and Measured", American Journal of Psychology, 1904. (factor analysis)
3. I. T. Jolliffe, Principal Component Analysis, Springer-Verlag, pp. 487, 1986.
4. T. Cox & M. Cox, Multidimensional Scaling, Chapman & Hall, 1994.
5. B. Schölkopf, A. Smola, K.-R. Müller, "Kernel Principal Component Analysis", in: B. Schölkopf, C. J. C. Burges, A. J. Smola (Eds.), Advances in Kernel Methods - Support Vector Learning, MIT Press, Cambridge, MA, USA, 327-352, 1999.
6. S. T. Roweis and L. K. Saul, "Nonlinear Dimensionality Reduction by Locally Linear Embedding", Science, December 2000.
7. J. B. Tenenbaum, V. de Silva and J. C. Langford, "A Global Geometric Framework for Nonlinear Dimensionality Reduction", Science 290 (5500): 2319-2323, 2000.
8. M. Belkin and P. Niyogi, "Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering", Advances in Neural Information Processing Systems 14, pp. 586-691, 2001.
9. D. Donoho and C. Grimes, "Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data", Proc Natl Acad Sci USA, 100(10): 5591-5596, May 2003.
10. Y. Bengio, J.-F. Paiement, P. Vincent, O. Delalleau, N. Le Roux, M. Ouimet, "Out-of-sample extensions for LLE, Isomap, MDS, Eigenmaps, and spectral clustering", NIPS, 2004.
11. G. E. Hinton and R. R. Salakhutdinov, "Reducing the Dimensionality of Data with Neural Networks", Science, Vol. 313, no. 5786, pp. 504-507, 2006.
12. K. Q. Weinberger and L. K. Saul, "Unsupervised Learning of Image Manifolds by Semidefinite Programming", International Journal of Computer Vision (IJCV), 70(1), 2006.
13. A. Talwalkar, S. Kumar and H. Rowley, "Large Scale Manifold Learning", CVPR, 2008.
14. B. Shaw and T. Jebara, "Structure Preserving Embedding", ICML, 2009.

Sanjiv Kumar 11/16/2010 EECS6898 Large Scale Machine Learning 51