Similarities, Distances and Manifold Learning
Prof. Richard C. Wilson, Dept. of Computer Science, University of York
Part I: Euclidean Space
Position, Similarity and Distance; Manifold Learning in Euclidean space; Some famous techniques
Part II: Non-Euclidean Manifolds
Assessing Data; Nature and Properties of Manifolds; Data Manifolds; Learning some special types of manifolds
Part III: Advanced Techniques
Methods for intrinsically curved manifolds
Thanks to Edwin Hancock, Eliza Xu and Bob Duin for contributions, and support from the EU SIMBAD project
Part I: Euclidean Space
Position: The main arena for pattern recognition and machine learning problems is vector space: a set of n well-defined features collected into a vector in $\mathbb{R}^n$. Also defined are addition of vectors and multiplication by a scalar. Feature vector → position.
Similarity: To make meaningful progress, we need a notion of similarity, the inner product $\langle x, y\rangle$. The inner product $\langle x, y\rangle$ can be considered to be a similarity between x and y. In Euclidean space, position, similarity and distance are all neatly connected:
Position: $x$
Similarity: $\langle x, y\rangle = x^T y$
Distance (squared): $d^2(x, y) = \|x - y\|^2 = \langle x - y, x - y\rangle$
The Golden Trio: In Euclidean space, the concepts of position, similarity and distance are elegantly connected: Position X, Distance D, Similarity K.
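Point position matrix: In a normal manifold learning problem, we have a set of samples $X = \{x_1, x_2, \ldots, x_m\}$. These can be collected together in a matrix with one sample per row:
$X = \begin{pmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_m^T \end{pmatrix}$
I use this convention, but others may write them vertically (one sample per column).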
Centring: A common and important operation is centring, i.e. moving the mean to the origin; centred points behave better. If J is the m × m all-ones matrix, then $JX/m$ is the mean matrix (every row is the mean), so $X - JX/m$ is the centred point matrix. This can be done with
$X_c = CX, \quad C = I - J/m$
C is the centring matrix (and is symmetric, $C = C^T$).
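A quick numerical check of the centring matrix, as a minimal sketch: it assumes NumPy and stores samples as rows of X, following the convention above; the function name `centring_matrix` is only illustrative.

```python
import numpy as np

def centring_matrix(m):
    """C = I - J/m, where J is the m x m all-ones matrix."""
    return np.eye(m) - np.ones((m, m)) / m

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3)) + 10.0     # 5 samples (rows), 3 features, mean far from the origin
C = centring_matrix(X.shape[0])
Xc = C @ X                             # centred point matrix X_c = C X

# Multiplying by C is the same as subtracting the column means
print(np.allclose(Xc, X - X.mean(axis=0)))   # True
```

C is also idempotent ($C^2 = C$), so centring points that are already centred changes nothing.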
Position-Similarity: The similarity matrix K is defined as $K_{ij} = \langle x_i, x_j\rangle$. From the definition of X, we simply get $K = XX^T$. The Gram matrix is the similarity matrix of the centred points:
$K_c = X_c X_c^T = CXX^TC^T = CKC$
i.e. a centring operation on K. $K_c$ is really a kernel matrix for the points (linear kernel).
Position-Similarity: To go from K to X, we need to consider the eigendecomposition of K:
$K = U\Lambda U^T = XX^T$
As long as we can take the square root of Λ, we can find X as
$X = U\Lambda^{1/2}$
Kernel embedding: Finds a Euclidean manifold from object similarities:
$K = U\Lambda U^T, \quad X = U\Lambda^{1/2}$
Embeds a kernel matrix into a set of points in Euclidean space (if K has been centred, the points are automatically centred). K must have no negative eigenvalues, i.e. it must be a kernel matrix (Mercer condition).
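A minimal NumPy sketch of kernel embedding, $X = U\Lambda^{1/2}$. The tolerance `tol` used to decide whether an eigenvalue is genuinely negative is my assumption: in floating point, a true kernel matrix often shows tiny negative eigenvalues from round-off, so those are clipped to zero.

```python
import numpy as np

def kernel_embedding(K, tol=1e-10):
    """Embed a PSD similarity matrix: K = U Λ U^T  ->  X = U Λ^{1/2} (rows are points)."""
    K = (K + K.T) / 2                          # guard against tiny asymmetries
    lam, U = np.linalg.eigh(K)                 # eigenvalues in ascending order
    if lam.min() < -tol * max(1.0, np.abs(lam).max()):
        raise ValueError("K has negative eigenvalues: not a kernel matrix")
    lam = np.clip(lam, 0.0, None)              # remove round-off negatives
    order = np.argsort(lam)[::-1]              # largest eigenvalues first
    return U[:, order] * np.sqrt(lam[order])

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))
K = X @ X.T                                    # linear kernel of 6 points in 2-D
Y = kernel_embedding(K)                        # 6 x 6, but only the first 2 columns are non-zero
```

The first two columns of Y are X up to a rotation or reflection, so all pairwise distances are reproduced.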
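Similarity-Distance:
$D_{s,ij} = d^2(x_i, x_j) = \langle x_i - x_j, x_i - x_j\rangle = \langle x_i, x_i\rangle + \langle x_j, x_j\rangle - 2\langle x_i, x_j\rangle = K_{ii} + K_{jj} - 2K_{ij}$
We can easily determine $D_s$ (the matrix of squared distances) from K.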
Similarity-Distance: What about finding K from $D_s$?
$D_{s,ij} = K_{ii} + K_{jj} - 2K_{ij}$
Looking at this equation, we might imagine that $K = -\frac{1}{2}D_s$ is a suitable choice. It is not centred; the relationship is actually
$K = -\frac{1}{2} C D_s C$
Classic MDS: Classical Multidimensional Scaling embeds a (squared) distance matrix into Euclidean space. Using what we have so far, the algorithm is simple:
1. Compute the kernel: $K = -\frac{1}{2} C D_s C$
2. Eigendecompose the kernel: $K = U\Lambda U^T$
3. Embed the kernel: $X = U\Lambda^{1/2}$
This is MDS (Distance D → Position X).
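The three steps can be sketched in NumPy as follows. This is a minimal illustration, assuming `Ds` holds squared distances and that we keep only the `dim` largest eigenvalues (a choice the slide leaves open); any negative eigenvalues among them are clipped to zero.

```python
import numpy as np

def squared_distances(X):
    """D_s,ij = K_ii + K_jj - 2 K_ij with the linear kernel K = X X^T."""
    K = X @ X.T
    g = np.diag(K)
    return g[:, None] + g[None, :] - 2 * K

def classical_mds(Ds, dim):
    """Classical MDS of a matrix of squared distances Ds."""
    m = len(Ds)
    C = np.eye(m) - np.ones((m, m)) / m
    K = -0.5 * C @ Ds @ C                    # 1. compute the kernel
    lam, U = np.linalg.eigh(K)               # 2. eigendecompose (ascending order)
    idx = np.argsort(lam)[::-1][:dim]        #    keep the `dim` largest eigenvalues
    return U[:, idx] * np.sqrt(np.clip(lam[idx], 0.0, None))   # 3. X = U Λ^{1/2}

rng = np.random.default_rng(2)
X = rng.normal(size=(8, 3))
Y = classical_mds(squared_distances(X), dim=3)   # X up to rotation, reflection and translation
```

The recovered Y reproduces the original squared distances and is centred, as expected from the kernel embedding of a centred K.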
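The Golden Trio: MDS takes Distance D to Position X; kernel embedding takes Similarity K to Position X; distance and similarity are linked by
$K = -\frac{1}{2} C D_s C, \quad D_{s,ij} = K_{ii} + K_{jj} - 2K_{ij}$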
Kernel methods: A kernel is a function k(i, j) which computes an inner product
$k(i, j) = \langle x_i, x_j\rangle$
but without needing to know the actual points (the space is implicit). Using a kernel function we can directly compute K without knowing X (Kernel function → Similarity K).
Kernel methods: The implied space may be very high-dimensional, but a true kernel will always produce a positive semidefinite K, and the implied space will be Euclidean. Many (most?) PR algorithms can be kernelized, i.e. made to use K rather than X or D. The trick is to note that any interesting vector should lie in the space spanned by the examples we are given. Hence it can be written as a linear combination
$u = \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_m x_m = X^T\alpha$
Look for α instead of u.
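To see the claim that a true kernel gives a PSD K numerically, here is a small sketch using the Gaussian (RBF) kernel, a standard kernel chosen here purely for illustration; `sigma` is an arbitrary bandwidth. For contrast, it also builds a similarity that is not a kernel (negated distances), which has negative eigenvalues.

```python
import numpy as np

def squared_distances(X):
    sq = np.sum(X ** 2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)

def gaussian_kernel(X, sigma=1.0):
    """K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)); no explicit feature map is needed."""
    return np.exp(-squared_distances(X) / (2 * sigma ** 2))

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 2))
K = gaussian_kernel(X, sigma=0.7)
print(np.linalg.eigvalsh(K).min())        # >= 0 up to round-off: K is PSD

S = -np.sqrt(squared_distances(X))        # negated distances: a similarity, but not a kernel
print(np.linalg.eigvalsh(S).min())        # clearly negative
```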
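Kernel PCA: What about PCA? For centred data, PCA solves the following problem:
$u^* = \arg\max_{u^Tu = 1} u^T\Sigma u = \arg\max_{u^Tu = 1} \frac{1}{m} u^T X^T X u$
Let's kernelize it by substituting $u = X^T\alpha$:
$\frac{1}{m} u^T X^T X u = \frac{1}{m}\alpha^T XX^T XX^T\alpha = \frac{1}{m}\alpha^T K^2 \alpha, \quad u^Tu = \alpha^T XX^T\alpha = \alpha^T K\alpha = 1$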
Kernel PCA: $K^2$ has the same eigenvectors as K, so the eigenvectors of PCA (expressed as α) are the same as the eigenvectors of K. The eigenvalues of PCA are related to the eigenvalues of K by
$\lambda_{PCA} = \frac{1}{m}\lambda_K$
Kernel PCA is a kernel embedding with an externally provided kernel matrix.
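A numerical check of the eigenvalue relation, as a minimal NumPy sketch: it assumes centred data and the covariance $\Sigma = X^TX/m$ for m samples, and recovers the PCA directions from the kernel eigenvectors through $u = X^T\alpha$.

```python
import numpy as np

rng = np.random.default_rng(4)
m = 50                                            # number of samples
X = rng.normal(size=(m, 3)) * np.array([3.0, 1.0, 0.3])
X = X - X.mean(axis=0)                            # PCA assumes centred data

# Ordinary PCA: eigendecomposition of the 3 x 3 covariance matrix
Sigma = X.T @ X / m
lam_pca, V = np.linalg.eigh(Sigma)
lam_pca, V = lam_pca[::-1], V[:, ::-1]            # largest first

# Kernelized version: eigendecomposition of the m x m linear kernel K = X X^T
K = X @ X.T
lam_K, alpha = np.linalg.eigh(K)
lam_K, alpha = lam_K[::-1][:3], alpha[:, ::-1][:, :3]   # the 3 non-zero eigenvalues

# Principal directions u = X^T alpha, normalised so that u^T u = 1
# (for a unit eigenvector alpha, u^T u = alpha^T K alpha = lambda_K)
U_from_kernel = X.T @ alpha / np.sqrt(lam_K)

print(np.allclose(lam_pca, lam_K / m))            # eigenvalues agree up to the factor 1/m
```

The directions agree with ordinary PCA up to sign, which is why the choice between the two is purely a matter of which matrix (covariance or kernel) is smaller.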
Kernel PCA: So kernel PCA gives the same solution as kernel embedding; the eigenvalues are modified a bit. They are essentially the same thing in Euclidean space. MDS uses the kernel and kernel embedding, so MDS and PCA are also essentially the same thing in Euclidean space. Kernel embedding, MDS and PCA all give the same answer for a set of points in Euclidean space.
Some useful observations:
Your similarity matrix is Euclidean iff it has no negative eigenvalues (i.e. it is a kernel matrix and PSD).
By similar reasoning, your distance matrix is Euclidean iff the similarity matrix derived from it is PSD.
If the feature space is small but the number of samples is large, then the covariance matrix is small and it is better to do normal PCA (on the covariance matrix).
If the feature space is large and the number of samples is small, then the kernel matrix will be small and it is better to do kernel embedding.
Part II: Non-Euclidean Manifolds
Non-linear data: Much of the data in computer vision lies in a high-dimensional feature space but is constrained in some way. The space of all images of a face is a subspace of the space of all possible images. The subspace is highly non-linear but low-dimensional (described by a few parameters).
Non-linear data: This cannot be exploited by linear subspace methods like PCA, which assume that the subspace is a Euclidean space as well. A classic example is the Swiss roll data:
[Figure: the Swiss roll data set]
Flat Manifolds: Fundamentally different types of data, for example: the embedding of this data into the high-dimensional space is highly curved. This is called extrinsic curvature, the curvature of the manifold with respect to the embedding space. Now imagine that this manifold was a piece of paper; you could unroll the paper into a flat plane without distorting it. There is no intrinsic curvature; in fact it is isometric to a region of Euclidean space.
Curved manifold: This manifold is different. It must be stretched to map it onto a plane: it has non-zero intrinsic curvature. A flatlander living on this manifold can tell that it is curved, for example by measuring the ratio of the radius to the circumference of a circle. In the first case, we might still hope to find a Euclidean embedding. We can never find a distortion-free Euclidean embedding of the second (in the sense that the distances will always have errors).
Intrinsically Euclidean Manifolds: We cannot use the previous methods on the second type of manifold, but there is still hope for the first. The manifold is embedded in Euclidean space, but Euclidean distance is not the correct way to measure distance: the Euclidean distance shortcuts the manifold. The geodesic distance calculates the shortest path along the manifold.
Geodesics: The geodesic generalizes the concept of distance to curved manifolds: it is the shortest path joining two points which lies completely within the manifold. If we can correctly compute the geodesic distances, and the manifold is intrinsically flat, we should get Euclidean distances which we can plug into our Euclidean geometry machine (Geodesic distances → Distance D → Similarity K → Position X).
ISOMAP: ISOMAP is exactly such an algorithm. Approximate geodesic distances are computed for the points from a nearest-neighbours graph:
For neighbours, Euclidean distance ≈ geodesic distance.
For non-neighbours, the geodesic distance is approximated by the shortest distance in the graph.
Once we have distances D, we can use MDS to find a Euclidean embedding.
ISOMAP: Neighbourhood graph → Shortest path algorithm → MDS. ISOMAP is distance-preserving: embedded distances should be close to geodesic distances.
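A compact ISOMAP sketch in NumPy, written for clarity rather than speed: a symmetric k-nearest-neighbour graph, Floyd-Warshall shortest paths (an O(m³) choice of mine; Dijkstra on a sparse graph is more usual for large m), then classical MDS on the squared geodesic estimates. The spiral test data and `k=6` are illustrative assumptions.

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric k-NN graph: edge weights are Euclidean distances, non-edges are inf."""
    m = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    nn = np.argsort(D, axis=1)[:, 1:k + 1]            # skip self at position 0
    rows = np.repeat(np.arange(m), k)
    G = np.full((m, m), np.inf)
    G[rows, nn.ravel()] = D[rows, nn.ravel()]
    G = np.minimum(G, G.T)                            # keep an edge if either end chose it
    np.fill_diagonal(G, 0.0)
    return G

def shortest_paths(G):
    """Floyd-Warshall all-pairs shortest paths, O(m^3)."""
    D = G.copy()
    for k in range(len(D)):
        D = np.minimum(D, D[:, k:k + 1] + D[k:k + 1, :])
    return D

def classical_mds(Ds, dim):
    m = len(Ds)
    C = np.eye(m) - np.ones((m, m)) / m
    lam, U = np.linalg.eigh(-0.5 * C @ Ds @ C)
    idx = np.argsort(lam)[::-1][:dim]
    return U[:, idx] * np.sqrt(np.clip(lam[idx], 0.0, None))

def isomap(X, k, dim):
    Dg = shortest_paths(knn_graph(X, k))              # approximate geodesic distances
    if np.isinf(Dg).any():
        raise ValueError("neighbourhood graph is disconnected; try a larger k")
    return classical_mds(Dg ** 2, dim)                # MDS works on squared distances

s = np.linspace(np.pi, 3 * np.pi, 100)
X = np.c_[s * np.cos(s), s * np.sin(s)]               # a spiral: curved 1-D manifold in 2-D
Y = isomap(X, k=6, dim=1)
```

The 1-D coordinate Y should track arc length along the spiral, i.e. the curve is unrolled; plain MDS on the straight-line distances would instead be expected to squash the spiral.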
Laplacian Eigenmap: The Laplacian Eigenmap is another graph-based method of embedding non-linear manifolds into Euclidean space. As with ISOMAP, form a neighbourhood graph for the datapoints. Find the graph Laplacian as follows:
The adjacency matrix A is $A_{ij} = e^{-d_{ij}^2/t}$ if i and j are connected, and 0 otherwise.
The degree matrix D is the diagonal matrix $D_{ii} = \sum_j A_{ij}$.
The normalized graph Laplacian is $L = I - D^{-1/2} A D^{-1/2}$.
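Laplacian Eigenmap: We find the Laplacian eigenmap embedding using the eigendecomposition of L:
$L = U\Lambda U^T$
The embedded positions are $X = D^{-1/2}U$, using the eigenvectors with the smallest eigenvalues but discarding the trivial one (eigenvalue 0). Similar to ISOMAP, but structure-preserving rather than distance-preserving.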
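A minimal Laplacian eigenmap sketch under stated assumptions: a symmetric k-NN graph, the heat parameter `t` set to the median squared edge length when not given (a common heuristic, not from the slides), and a connected graph so that only one eigenvalue is zero. The returned columns solve the generalised problem $(D - A)y = \lambda Dy$, which is what $X = D^{-1/2}U$ amounts to.

```python
import numpy as np

def laplacian_eigenmap(X, k, dim, t=None):
    m = len(X)
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)     # squared distances
    nn = np.argsort(D2, axis=1)[:, 1:k + 1]                 # k nearest neighbours (skip self)
    conn = np.zeros((m, m), dtype=bool)
    conn[np.repeat(np.arange(m), k), nn.ravel()] = True
    conn = conn | conn.T                                    # symmetric neighbourhood graph
    if t is None:
        t = np.median(D2[conn])                             # heuristic heat parameter (assumption)
    A = np.where(conn, np.exp(-D2 / t), 0.0)                # A_ij = exp(-d_ij^2 / t) if connected
    deg = A.sum(axis=1)                                     # degrees D_ii = sum_j A_ij
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L = np.eye(m) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]   # I - D^{-1/2} A D^{-1/2}
    lam, U = np.linalg.eigh(L)                              # ascending eigenvalues
    # Skip the trivial eigenvector (eigenvalue 0) and keep the next `dim`
    Y = d_inv_sqrt[:, None] * U[:, 1:dim + 1]               # X = D^{-1/2} U
    return Y, lam[1:dim + 1], A, deg

s = np.linspace(np.pi, 3 * np.pi, 100)
X = np.c_[s * np.cos(s), s * np.sin(s)]                     # points along a spiral
Y, lam, A, deg = laplacian_eigenmap(X, k=6, dim=1)
```

Unlike the ISOMAP coordinate, Y is not expected to match arc length; it only preserves the ordering and neighbourhood structure along the curve.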
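Locally-Linear Embedding: Locally-linear Embedding is another classic method which also begins with a neighbourhood graph. We make point i (in the original data) from a weighted sum of the neighbouring points:
$\hat{x}_i = \sum_j W_{ij} x_j$
$W_{ij}$ is 0 for any point j not in the neighbourhood (and for i = j). We find the weights by minimising the reconstruction error
$\min_W \sum_i \|\hat{x}_i - x_i\|^2$
subject to the constraint that the weights of each point sum to 1, $\sum_j W_{ij} = 1$. This gives a relatively simple closed-form solution (the standard formulation does not require the weights to be non-negative; adding $W_{ij} \ge 0$ would require solving a quadratic programme instead).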
Locally-Linear Embedding: These weights encode how well a point j represents a point i and can be interpreted as the adjacency between i and j. A low-dimensional embedding is found by then finding points $y_i$ to minimise the error
$\min_Y \sum_i \|\hat{y}_i - y_i\|^2, \quad \hat{y}_i = \sum_j W_{ij} y_j$
In other words, we find a low-dimensional embedding which preserves the adjacency relationships. The solution to this embedding problem turns out to be simply the eigenvectors of the matrix
$M = (I - W)^T (I - W)$
with the smallest eigenvalues, discarding the constant eigenvector (whose eigenvalue is 0). LLE is scale-free: the final points have the covariance matrix I (unit scale).
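A minimal LLE sketch in NumPy. The weights come from the standard closed form (solve $Gw = \mathbf{1}$ with the local Gram matrix G of the centred neighbours, then rescale so the weights sum to 1); the regulariser `reg` is an assumption I add because G is singular whenever k exceeds the input dimension. The weights in this sketch are not forced to be non-negative.

```python
import numpy as np

def lle(X, k, dim, reg=1e-3):
    m = len(X)
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    nn = np.argsort(D2, axis=1)[:, 1:k + 1]         # k nearest neighbours (skip self)
    W = np.zeros((m, m))
    for i in range(m):
        Z = X[nn[i]] - X[i]                         # neighbours relative to x_i
        G = Z @ Z.T                                 # local Gram matrix (k x k)
        G = G + reg * np.trace(G) * np.eye(k)       # regularise (added for stability, not on the slides)
        w = np.linalg.solve(G, np.ones(k))
        W[i, nn[i]] = w / w.sum()                   # weights of point i sum to 1
    I = np.eye(m)
    M = (I - W).T @ (I - W)
    lam, U = np.linalg.eigh(M)                      # ascending; U[:, 0] should be constant (eigenvalue 0)
    Y = U[:, 1:dim + 1] * np.sqrt(m)                # unit covariance: Y^T Y / m = I
    return Y, W

s = np.linspace(np.pi, 3 * np.pi, 100)
X = np.c_[s * np.cos(s), s * np.sin(s)]             # points along a spiral
Y, W = lle(X, k=6, dim=1)
```

The scaling by $\sqrt{m}$ makes the output scale-free in the sense of the slide: whatever the size of the input, the embedded points have unit covariance.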
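Comparison: LLE might seem like quite a different process to the previous two, but it is actually very similar. We can interpret the process as producing a kernel matrix followed by scale-free kernel embedding:
$K = (k - 1)I - \frac{k}{m}J + W + W^T - W^TW, \quad K = U\Lambda U^T, \quad X = U$
where k is a constant at least as large as the largest eigenvalue of M, so that $K = kI - M - \frac{k}{m}J$ is positive semidefinite.
ISOMAP: representation, neighbourhood graph; similarity matrix, from geodesic distances; embedding, $X = U\Lambda^{1/2}$.
Laplacian Eigenmap: representation, neighbourhood graph; similarity matrix, graph Laplacian; embedding, $X = D^{-1/2}U$.
LLE: representation, neighbourhood graph; similarity matrix, reconstruction weights; embedding, $X = U$.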
Comparison: ISOMAP is the only method which directly computes and uses the geodesic distances. The other two depend indirectly on the distances through local structure. LLE is scale-free, so the original distance scale is lost, but the local structure is preserved. Computing the necessary local dimensionality to find the correct nearest neighbours is a problem for all such methods.
Part II: Indefinite Similarities
Non-Euclidean data: Data is Euclidean iff K is PSD. Unless you are using a kernel function, this is often not true. Why does this happen?
What type of data do I have? Starting point: distance matrix. However, we do not know a priori if our measurements are representable on a manifold, so we will call them dissimilarities. Our starting point to answer the question "What type of data do I have?" will be a matrix of dissimilarities D between objects. Types of dissimilarities:
Euclidean (no intrinsic curvature)
Non-Euclidean, metric (curved manifold)
Non-metric (no point-like manifold representation)
Causes: Example: Chicken pieces data. Distance by alignment. Global alignment of everything could find Euclidean distances. Only local alignments are practical.
Causes: Dissimilarities may also be non-metric. The data is metric if it obeys the metric conditions:
1. $D_{ij} \ge 0$ (non-negativity)
2. $D_{ij} = 0$ iff i = j (identity of indiscernibles)
3. $D_{ij} = D_{ji}$ (symmetry)
4. $D_{ij} \le D_{ik} + D_{kj}$ (triangle inequality)
Reasonable dissimilarities should meet 1 and 2.
Causes: Symmetry, $D_{ij} = D_{ji}$. Dissimilarities may not be symmetric by definition. Alignment: aligning i to j may find a better solution than aligning j to i.
Causes: Triangle violations, $D_{ij} \le D_{ik} + D_{kj}$. Extended objects can give $D_{ik} = 0$ and $D_{kj} = 0$ while $D_{ij} \ne 0$. Finally, noise in the measurement of D can cause all of these effects.
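Tests (1): Find the similarity matrix
$K = -\frac{1}{2} C D_s C$
The data is Euclidean iff K is positive semidefinite (no negative eigenvalues). Then K is a kernel, an explicit embedding follows from kernel embedding, and we can use K in a kernel algorithm. Negative eigenfraction (NEF):
$NEF = \frac{\sum_{\lambda_i < 0} |\lambda_i|}{\sum_i |\lambda_i|}$
It lies between 0 and 0.5, and NEF = 0 for Euclidean similarities.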
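A sketch of the NEF test in NumPy, assuming the input holds unsquared dissimilarities, which are squared before centring. It compares straight-line distances between random points, which should give NEF ≈ 0, with great-circle distances between the same points projected onto a unit sphere, which are metric but not Euclidean.

```python
import numpy as np

def negative_eigenfraction(D):
    """NEF of a dissimilarity matrix D (distances; squared here to form D_s)."""
    m = D.shape[0]
    C = np.eye(m) - np.ones((m, m)) / m
    K = -0.5 * C @ (D ** 2) @ C                    # K = -1/2 C D_s C
    lam = np.linalg.eigvalsh((K + K.T) / 2)        # symmetrise against round-off
    return np.abs(lam[lam < 0]).sum() / np.abs(lam).sum()

rng = np.random.default_rng(7)
X = rng.normal(size=(30, 3))
D_euclid = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

P = X / np.linalg.norm(X, axis=1, keepdims=True)   # project onto the unit sphere
D_sphere = np.arccos(np.clip(P @ P.T, -1.0, 1.0))  # great-circle (geodesic) distances

nef_euclid = negative_eigenfraction(D_euclid)      # ~0, up to round-off
nef_sphere = negative_eigenfraction(D_sphere)      # > 0: geodesic distances are not Euclidean
```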
Tests (2):
3. $D_{ij} = D_{ji}$ (symmetry): mean and maximum asymmetry. Easy to check by looking at pairs.
4. $D_{ij} \le D_{ik} + D_{kj}$ (triangle inequality): number of violations and maximum violation.
Check these for your data (the triangle inequality involves checking all triples, which is possibly expensive). Metric data is embeddable on a (curved) Riemannian manifold.
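Both checks vectorise easily. This NumPy sketch builds an m × m × m array for the triangle test, so it assumes m is at most a few hundred; the small matrix D is a made-up example with one triangle violation.

```python
import numpy as np

def asymmetry(D):
    """Mean and maximum of |D_ij - D_ji| over pairs i < j."""
    a = np.abs(D - D.T)[np.triu_indices_from(D, k=1)]
    return a.mean(), a.max()

def triangle_violations(D, tol=1e-12):
    """Number of triples with D_ij > D_ik + D_kj, and the largest excess."""
    via = D[:, :, None] + D[None, :, :]       # via[i, k, j] = D_ik + D_kj
    excess = D[:, None, :] - via              # > 0 where the triangle inequality fails
    return int((excess > tol).sum()), float(max(excess.max(), 0.0))

D = np.array([[0.0, 1.0, 5.0],
              [1.0, 0.0, 1.0],
              [5.0, 1.0, 0.0]])               # 5 > 1 + 1: violated for (0, 1, 2) and (2, 1, 0)
print(asymmetry(D))                           # symmetric, so both are 0
print(triangle_violations(D))                 # two violating triples, worst excess 3
```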
Determining the causes: The negative eigenvalues
[Figure: plots of the negative eigenvalues for three causes of non-Euclidean data: noise, extended objects and a spherical manifold]
Corrections: If the data is non-metric or non-Euclidean, we can correct it.
Symmetry violations: Average, $D'_{ij} = \frac{1}{2}(D_{ij} + D_{ji})$. For min-cost distances, $D'_{ij} = \min(D_{ij}, D_{ji})$ may be more appropriate.
Triangle violations: Constant offset, $D'_{ij} = D_{ij} + c \quad (i \ne j)$. This will also remove non-Euclidean behaviour for large enough c.
Euclidean violations: Discard negative eigenvalues.
Even when the violations are caused by noise, some information is still lost. There are many other approaches*.
* Duin, Pekalska, Harol, Lee and Bunke, "On Euclidean corrections for non-Euclidean dissimilarities", S+SSPR 08
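The three corrections, sketched in NumPy on a made-up 3 × 3 dissimilarity matrix. The offset is applied to the dissimilarities themselves, which is one reading of the slide; c = 3 is simply large enough to repair this particular triangle violation, and c = 100 illustrates that a large offset also makes K positive semidefinite.

```python
import numpy as np

def symmetrise(D, mode="average"):
    """'average': (D + D^T) / 2; anything else: elementwise min(D_ij, D_ji)."""
    return 0.5 * (D + D.T) if mode == "average" else np.minimum(D, D.T)

def constant_offset(D, c):
    """D'_ij = D_ij + c for i != j; the diagonal stays zero."""
    return D + c * (1.0 - np.eye(len(D)))

def similarity(D):
    """K = -1/2 C D_s C, with D_s the squared dissimilarities."""
    m = len(D)
    C = np.eye(m) - np.ones((m, m)) / m
    return -0.5 * C @ (D ** 2) @ C

def discard_negative_eigenvalues(D):
    """Euclidean positions from K after setting its negative eigenvalues to zero."""
    lam, U = np.linalg.eigh(similarity(D))
    return U * np.sqrt(np.clip(lam, 0.0, None))

D = np.array([[0.0, 1.0, 5.0],
              [2.0, 0.0, 1.0],
              [5.0, 1.0, 0.0]])          # asymmetric, and 5 > 1 + 1 breaks the triangle inequality
Dsym = symmetrise(D)                     # D_01 = D_10 = 1.5
Doff = constant_offset(Dsym, 3.0)        # 8.0 <= 4.5 + 4.0: triangle inequality now holds
X = discard_negative_eigenvalues(Dsym)   # approximate Euclidean embedding (information is lost)
```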
Part III: Techniques for Non-Euclidean Embeddings
Known Manifolds: Sometimes we have data which lies on a known but non-Euclidean manifold. Examples in computer vision: surface normals, rotation matrices, flow tensors (DT-MRI). This is not manifold learning, as we already know what the manifold is. What tools do we need to be able to process data like this? As before, distances are the key.
Example: 2D direction: The direction of an edge in an image, encoded as a unit vector $x = (x_1, x_2)^T$. The average of the direction vectors, $\frac{1}{n}\sum_i x_i$, isn't even a direction vector (it is not unit length), let alone the correct average direction. The normal definition of the mean is not correct because the manifold is curved.
Tangent space: The tangent space ($T_P$) is the Euclidean space which is parallel to the manifold (M) at a particular point (P). The tangent space is a very useful tool because it is Euclidean.
Exponential map: $\mathrm{Exp}_P: T_P \to M$, $A = \mathrm{Exp}_P X$. $\mathrm{Exp}_P$ maps a point X on the tangent plane onto a point A on the manifold. P is the centre of the mapping and is at the origin on the tangent space. The mapping is one-to-one in a local region of P. The most important property of the mapping is that the distances to the centre P are preserved:
$d_{T_P}(X, P) = d_M(A, P)$
The geodesic distance on the manifold equals the Euclidean distance on the tangent plane (for distances to the centre only).
Exponential map: The log map goes the other way, from manifold to tangent plane:
$\mathrm{Log}_P: M \to T_P, \quad X = \mathrm{Log}_P A$
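Exponential Map: Example on the circle: Embed the circle in the complex plane. A point on the circle is a complex number with magnitude 1 and can be written $x + iy = \exp(i\theta)$; the centre is $P = e^{i\theta_P}$.
[Figure: the unit circle in the complex plane (Re and Im axes) with the point P]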
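In this case it turns out that the map is related to the normal exp and log functions:
$X = \mathrm{Log}_P A = \log\frac{A}{P} = \log\frac{e^{i\theta_A}}{e^{i\theta_P}} = i(\theta_A - \theta_P)$
$A = \mathrm{Exp}_P X = P\exp(X) = e^{i\theta_P}\exp(i(\theta_A - \theta_P)) = e^{i\theta_A}$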
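These two maps are one-liners with complex arithmetic. A minimal NumPy sketch, assuming points are stored as unit-modulus complex numbers; `np.log` returns the principal logarithm, so this Log map is only valid for points less than π radians from P.

```python
import numpy as np

def circle_log(P, A):
    """Log_P(A) = log(A / P) = i (theta_A - theta_P): a point on the tangent line at P."""
    return np.log(A / P)

def circle_exp(P, X):
    """Exp_P(X) = P exp(X): maps a tangent-line point back onto the unit circle."""
    return P * np.exp(X)

P = np.exp(1j * 0.3)
A = np.exp(1j * 1.2)
X = circle_log(P, A)          # purely imaginary, |X| = 0.9 = arc length from P to A
```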
Intrinsic mean: The mean of a set of samples is usually defined as the sum of the samples divided by their number. This is only true in Euclidean space. A more general formula is
$\bar{x} = \arg\min_x \sum_i d_g^2(x, x_i)$
which minimises the (squared geodesic) distances from the mean to the samples (equivalent in Euclidean space).
Intrinsic mean: We can compute this intrinsic mean using the exponential map. If we knew what the mean M was, then we could use the mean as the centre of a map:
$X_i = \mathrm{Log}_M A_i$
From the properties of the Exp-map, the distances are the same:
$d_e(X_i, M) = d_g(A_i, M)$
So the mean on the tangent plane is equal to the mean on the manifold.
Intrinsic mean: Start with a guess at the mean and move towards the correct answer. This gives us the following algorithm. Guess at a mean $M_0$, then repeat:
1. Map onto the tangent plane using $M_k$.
2. Compute the mean on the tangent plane to get the new estimate $M_{k+1}$:
$M_{k+1} = \mathrm{Exp}_{M_k}\left(\frac{1}{n}\sum_i \mathrm{Log}_{M_k} A_i\right)$
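The iteration, sketched on the circle, a simple case where the maps are the complex exp and log as in the circle example. The stopping tolerance, the iteration cap and the starting guess $M_0 = A_0$ are assumptions. The example angles are made up; they show that the naive average of the unit vectors is neither unit length nor in the direction of the intrinsic mean.

```python
import numpy as np

def intrinsic_mean_circle(A, iters=100, tol=1e-12):
    """Intrinsic mean of points A on the unit circle (unit-modulus complex numbers)."""
    M = A[0]                                  # initial guess M_0 (assumption)
    for _ in range(iters):
        X = np.log(A / M)                     # 1. Log map onto the tangent line at M_k
        step = X.mean()                       # 2. mean on the tangent line
        M = M * np.exp(step)                  #    Exp map back gives M_{k+1}
        if abs(step) < tol:
            break
    return M

angles = np.array([0.1, 0.2, 0.3, 2.9])       # made-up directions, in radians
A = np.exp(1j * angles)
M = intrinsic_mean_circle(A)
naive = A.mean()                              # ordinary average of the unit vectors
print(np.angle(M), abs(naive))                # intrinsic mean angle; |naive| < 1
```

Here every sample lies within π of the result, so the intrinsic mean coincides with the average angle, 0.875 rad, whereas the naive average points at roughly 0.40 rad and has length about 0.53.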
Intrinsic Mean: For many manifolds, this procedure will converge to the intrinsic mean, but convergence is not always guaranteed. Other statistics and probability distributions on manifolds are problematic. We can hypothesise a normal distribution on the tangent plane, but distortions are inevitable.
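Some useful manifolds and maps: Directional vectors (surface normals etc.), $\langle a, a\rangle = 1$, with $\cos\theta = \langle a, p\rangle$:
$x = \frac{\theta}{\sin\theta}(a - p\cos\theta)$ (Log map)
$a = p\cos\theta + \frac{\sin\theta}{\theta}x$, with $\theta = \|x\|$ (Exp map)
a and p are unit vectors; x lies in an (n-1)-dimensional space.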
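A NumPy sketch of these two maps, with the tangent vector x stored in ambient coordinates (so the (n-1)-dimensional tangent space appears as the subspace orthogonal to p, an implementation choice). The Log map is undefined at the antipode of p, where sin θ = 0; the sketch does not handle that case.

```python
import numpy as np

def sphere_log(p, a):
    """Log_p(a) = theta / sin(theta) * (a - p cos(theta)), with cos(theta) = <a, p>."""
    cos_t = np.clip(a @ p, -1.0, 1.0)
    theta = np.arccos(cos_t)
    if theta < 1e-12:                         # a == p: the origin of the tangent space
        return np.zeros_like(p)
    return theta / np.sin(theta) * (a - p * cos_t)

def sphere_exp(p, x):
    """Exp_p(x) = p cos(theta) + sin(theta) / theta * x, with theta = |x|."""
    theta = np.linalg.norm(x)
    if theta < 1e-12:
        return p.copy()
    return p * np.cos(theta) + np.sin(theta) / theta * x

p = np.array([0.0, 0.0, 1.0])                 # north pole
a = np.array([1.0, 0.0, 0.0])                 # a point on the equator
x = sphere_log(p, a)                          # length pi/2: the geodesic distance from p to a
```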
Some useful manifolds and maps: Symmetric positive definite matrices (covariance, flow tensors etc.), A with $u^TAu > 0$ for all $u \ne 0$:
$X = \mathrm{Log}_P A = P^{1/2}\log(P^{-1/2}AP^{-1/2})P^{1/2}$ (Log map)
$A = \mathrm{Exp}_P X = P^{1/2}\exp(P^{-1/2}XP^{-1/2})P^{1/2}$ (Exp map)
A is symmetric positive definite, X is just symmetric. log is the matrix log, defined as a generalized matrix function.
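For symmetric matrices, the matrix log, exp and square root can be applied through the eigenvalues, which gives a short NumPy sketch of both maps. The helper `sym_fun` is my own, not a library function, and P and A are assumed to be symmetric positive definite.

```python
import numpy as np

def sym_fun(S, f):
    """Apply a scalar function f to a symmetric matrix S through its eigenvalues."""
    S = (S + S.T) / 2                         # remove round-off asymmetry
    w, V = np.linalg.eigh(S)
    return (V * f(w)) @ V.T                   # V diag(f(w)) V^T

def spd_log(P, A):
    """Log_P(A) = P^{1/2} log(P^{-1/2} A P^{-1/2}) P^{1/2}."""
    Ph = sym_fun(P, np.sqrt)
    Pmh = sym_fun(P, lambda w: 1.0 / np.sqrt(w))
    return Ph @ sym_fun(Pmh @ A @ Pmh, np.log) @ Ph

def spd_exp(P, X):
    """Exp_P(X) = P^{1/2} exp(P^{-1/2} X P^{-1/2}) P^{1/2}."""
    Ph = sym_fun(P, np.sqrt)
    Pmh = sym_fun(P, lambda w: 1.0 / np.sqrt(w))
    return Ph @ sym_fun(Pmh @ X @ Pmh, np.exp) @ Ph

rng = np.random.default_rng(9)
B1, B2 = rng.normal(size=(2, 3, 3))
P = B1 @ B1.T + 3 * np.eye(3)                 # two random SPD matrices
A = B2 @ B2.T + np.eye(3)
X = spd_log(P, A)                             # symmetric, but not necessarily positive definite
```

Any symmetric tangent vector maps back to a positive definite matrix, which is why averaging on the tangent plane cannot leave the manifold.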
Some useful manifolds and maps: Orthogonal matrices (rotation matrices, eigenvector matrices), A with $AA^T = I$:
$X = \mathrm{Log}_P A = \log(P^TA)$ (Log map)
$A = \mathrm{Exp}_P X = P\exp(X)$ (Exp map)
A is orthogonal, X is antisymmetric ($X + X^T = 0$). These are the matrix exp and log functions as before. In fact there are multiple solutions to the matrix log. Only one is the required real antisymmetric matrix, and it is not easy to find; the rest are complex.
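For the common special case of 3 × 3 rotations, Rodrigues' formula gives closed forms for the matrix exp and the real antisymmetric log. This NumPy sketch assumes rotation angles strictly less than π (near π the log is ill-conditioned); `hat`, `so3_exp`, `so3_log`, `rot_log` and `rot_exp` are illustrative names.

```python
import numpy as np

def hat(w):
    """3-vector -> antisymmetric 3 x 3 matrix."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def so3_exp(X):
    """Matrix exponential of an antisymmetric 3 x 3 matrix (Rodrigues' formula)."""
    theta = np.linalg.norm([X[2, 1], X[0, 2], X[1, 0]])   # rotation angle
    if theta < 1e-12:
        return np.eye(3) + X
    return np.eye(3) + np.sin(theta) / theta * X + (1 - np.cos(theta)) / theta ** 2 * (X @ X)

def so3_log(R):
    """Real antisymmetric log of a rotation whose angle is in [0, pi)."""
    theta = np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0))
    if theta < 1e-12:
        return 0.5 * (R - R.T)
    return theta / (2 * np.sin(theta)) * (R - R.T)

def rot_log(P, A):
    return so3_log(P.T @ A)                   # Log_P(A) = log(P^T A)

def rot_exp(P, X):
    return P @ so3_exp(X)                     # Exp_P(X) = P exp(X)

P = so3_exp(hat([0.1, 0.2, 0.3]))
X = hat([0.5, -0.4, 0.2])                     # tangent vector: rotation angle about 0.67 rad
A = rot_exp(P, X)
```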
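Embedding on $S^n$: On $S^2$ (the surface of a sphere in 3D) the following parametrisation is well known:
$x = (r\sin\theta\cos\phi,\ r\sin\theta\sin\phi,\ r\cos\theta)^T$
The distance between two points (the length of the geodesic) is
$d_{xy} = r\cos^{-1}\left[\sin\theta_x\sin\theta_y\cos(\phi_x - \phi_y) + \cos\theta_x\cos\theta_y\right]$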
More Spherical Geometry: But on a sphere, the distance is the arc length along the sphere. It is much neater to use the inner product:
$\langle x, y\rangle = |x||y|\cos\theta_{xy} = r^2\cos\theta_{xy}$
$d_{xy} = r\theta_{xy} = r\cos^{-1}\frac{\langle x, y\rangle}{r^2}$
And this works in any number of dimensions.
Spherical Embedding: Say we had the distances between some objects ($d_{ij}$), measured on the surface of a [hyper]sphere of dimension n. The sphere (and objects) can be embedded into an (n+1)-dimensional space. Let X be the matrix of point positions. Then $Z = XX^T$ is a kernel matrix. But
$Z_{ij} = \langle x_i, x_j\rangle$ and $d_{ij} = r\cos^{-1}\frac{\langle x_i, x_j\rangle}{r^2}$, so $Z_{ij} = r^2\cos\frac{d_{ij}}{r}$
We can compute Z from D and find the spherical embedding!
Spherical Embedding: But wait, we don't know what r is! The distances D are non-Euclidean, and if we use the wrong radius, Z is not a kernel matrix (it has negative eigenvalues). Use this to find the radius: choose r to minimise the negative eigenvalues,
$r^* = \arg\min_r \sum_{\lambda_i(Z(r)) < 0} |\lambda_i(Z(r))|$
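A sketch of the radius search, assuming a simple grid over candidate radii (the slide does not say how to minimise) and measuring the negative eigenvalues by their total magnitude for $Z(r) = r^2\cos(D/r)$. The test data are made-up points on a sphere of radius 2.

```python
import numpy as np

def spherical_kernel(D, r):
    """Z_ij = r^2 cos(d_ij / r)."""
    return r ** 2 * np.cos(D / r)

def negative_mass(Z):
    lam = np.linalg.eigvalsh((Z + Z.T) / 2)
    return np.abs(lam[lam < 0]).sum()

def spherical_embedding(D, radii, dim):
    """Pick r minimising the negative eigenvalues of Z(r), then embed Z = X X^T."""
    scores = [negative_mass(spherical_kernel(D, r)) for r in radii]
    r = radii[int(np.argmin(scores))]
    lam, U = np.linalg.eigh(spherical_kernel(D, r))
    idx = np.argsort(lam)[::-1][:dim + 1]          # an n-sphere lives in R^(n+1)
    return r, U[:, idx] * np.sqrt(np.clip(lam[idx], 0.0, None))

rng = np.random.default_rng(5)
P = rng.normal(size=(40, 3))
P /= np.linalg.norm(P, axis=1, keepdims=True)
D = 2.0 * np.arccos(np.clip(P @ P.T, -1.0, 1.0))   # geodesic distances on a sphere of radius 2
radii = np.linspace(1.0, 4.0, 301)
r_est, Y = spherical_embedding(D, radii, dim=2)
```

The recovered points lie on a sphere of radius `r_est`, and their great-circle distances reproduce D.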
Example: Texture Mapping: As an alternative to unwrapping an object onto a plane and texture-mapping the plane, embed it onto a sphere and texture-map the sphere.
[Figure: panels "Plane" and "Sphere"]
Backup slides
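Laplacian and related processes: As well as embedding objects onto manifolds, we can model many interesting processes on manifolds. Example: the way heat flows across a manifold can be very informative.
$\frac{du}{dt} = \nabla^2 u$ (heat equation)
$\nabla^2$ is the Laplacian, and in 3D Euclidean space it is
$\nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}$
On a sphere it is
$\nabla^2 = \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2}{\partial\phi^2}$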
Heat flow: Heat flow allows us to do interesting things on a manifold:
Smoothing: heat flow is a diffusion process (it will smooth the data).
Characterising the manifold (heat content, heat kernel coefficients, ...).
The Laplacian depends on the geometry of the manifold. We may not know this, and it may be hard to calculate explicitly → Graph Laplacian.
Graph Laplacian: Given a set of datapoints on the manifold, describe them by a graph: vertices are datapoints, edges are the adjacency relation. Adjacency matrix (for example):
$A_{ij} = \exp(-d_{ij}^2/\sigma^2)$
Then the graph Laplacian is
$L = V - A, \quad V_{ii} = \sum_j A_{ij}$
The graph Laplacian is a discrete approximation of the manifold Laplacian.
Heat Kernel: Using the graph Laplacian, we can easily implement heat-flow methods on the manifold using the heat kernel:
$\frac{du}{dt} = -Lu$ (heat equation)
$H = \exp(-Lt)$ (heat kernel)
We can diffuse a function on the manifold by $f' = Hf$.
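A minimal heat-kernel smoothing sketch: a dense Gaussian adjacency as on the graph Laplacian slide (with self-loops removed, an assumption), $L = V - A$, and $H = \exp(-Lt)$ computed through the eigendecomposition of the symmetric L. The noisy signal on a circle, `sigma` and `t` are illustrative choices.

```python
import numpy as np

def graph_laplacian(X, sigma):
    """L = V - A with A_ij = exp(-d_ij^2 / sigma^2) and V_ii = sum_j A_ij."""
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = np.exp(-D2 / sigma ** 2)
    np.fill_diagonal(A, 0.0)                  # no self-loops (assumption)
    return np.diag(A.sum(axis=1)) - A

def heat_kernel(L, t):
    """H = exp(-L t), via the eigendecomposition of the symmetric matrix L."""
    lam, U = np.linalg.eigh(L)
    return (U * np.exp(-lam * t)) @ U.T

theta = np.linspace(0, 2 * np.pi, 60, endpoint=False)
X = np.c_[np.cos(theta), np.sin(theta)]       # datapoints on a circle
rng = np.random.default_rng(6)
f = np.sin(theta) + 0.3 * rng.normal(size=60) # noisy function on the manifold
L = graph_laplacian(X, sigma=0.2)
H = heat_kernel(L, t=0.5)
f_smooth = H @ f                              # diffused (smoothed) function f' = H f
```

Because the constant vector has eigenvalue 0 in L, H leaves constants unchanged and conserves the total heat, while the roughness $f^TLf$ can only decrease.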