Adaptive Manifold Learning

Jing Wang, Zhenyue Zhang
Department of Mathematics
Zhejiang University, Yuquan Campus, Hangzhou, 310027, P. R. China
wroaring@sohu.com  zyzhang@zju.edu.cn

Hongyuan Zha
Department of Computer Science
Pennsylvania State University
University Park, PA 16802
zha@cse.psu.edu

Abstract

Recently, there have been several advances in the machine learning and pattern recognition communities for developing manifold learning algorithms to construct nonlinear low-dimensional manifolds from sample data points embedded in high-dimensional spaces. In this paper, we develop algorithms that address two key issues in manifold learning: 1) the adaptive selection of the neighborhood sizes; and 2) better fitting the local geometric structure to account for the variations in the curvature of the manifold and its interplay with the sampling density of the data set. We also illustrate the effectiveness of our methods on some synthetic data sets.

1 Introduction

Recently, there have been advances in the machine learning community for developing effective and efficient algorithms for constructing nonlinear low-dimensional manifolds from sample data points embedded in high-dimensional spaces, emphasizing simple algorithmic implementation and avoiding optimization problems prone to local minima. The proposed algorithms include Isomap [6], locally linear embedding (LLE) [3] and its variations, manifold charting [1], Hessian LLE [2] and local tangent space alignment (LTSA) [7], and they have been successfully applied in several computer vision and pattern recognition problems. Several drawbacks and possible extensions of the algorithms have been pointed out in [4, 7], and the focus of this paper is to address two key issues in manifold learning: 1) how to adaptively select the neighborhood sizes in the k-nearest neighbor computation used to construct the local connectivity; and 2) how to account for the variations in the curvature of the manifold and its interplay with the sampling density of the data set.
We will discuss those two issues in the context of local tangent space alignment (LTSA) [7], a variation of locally linear embedding (LLE) [3] (see also [5], [1]). We believe the basic ideas we propose can be similarly applied to other manifold learning algorithms. We first outline the basic steps of LTSA and illustrate its failure modes using two simple examples. Given a data set X = [x_1, ..., x_N] with x_i in R^m, sampled (possibly with noise) from a d-dimensional manifold (d < m), LTSA proceeds in the following steps.

1) LOCAL NEIGHBORHOOD CONSTRUCTION. For each x_i, i = 1, ..., N, determine a set X_i = [x_{i_1}, ..., x_{i_{k_i}}] of its neighbors (k nearest neighbors, for example).
Figure 1: The data sets (first column) and the coordinates τ_i computed by LTSA for k = 4, 6, 8, plotted vs. the centered arc-length coordinates. Top row: Example 1. Bottom row: Example 2.

2) LOCAL LINEAR FITTING. Compute an orthonormal basis Q_i for the d-dimensional tangent space of the manifold at x_i, and the orthogonal projection of each x_{ij} to the tangent space: θ_j^{(i)} = Q_i^T (x_{ij} − x̄_i), where x̄_i is the mean of the neighbors.

3) LOCAL COORDINATES ALIGNMENT. Align the N local projections Θ_i = [θ_1^{(i)}, ..., θ_{k_i}^{(i)}], i = 1, ..., N, to obtain the global coordinates τ_1, ..., τ_N. Such an alignment is achieved by minimizing the global reconstruction error

    Σ_i ‖E_i‖^2 ≡ Σ_i (1/k_i) ‖T_i (I − (1/k_i) e e^T) − L_i Θ_i‖^2    (1.1)

over all possible L_i in R^{d×d} and row-orthonormal T = [τ_1, ..., τ_N] in R^{d×N}, where T_i = [τ_{i_1}, ..., τ_{i_{k_i}}] with the index set {i_1, ..., i_{k_i}} determined by the neighborhood of each x_i, and e is a vector of all ones.

Two strategies are commonly used for selecting the local neighborhood size k_i: one is the k nearest neighbors (k-NN with a constant k for all sample points) and the other is the ε-neighborhood [3, 6]. The effectiveness of manifold learning algorithms, including LTSA, depends on how the nearby neighborhoods overlap with each other and on the variation of the curvature of the manifold and its interplay with the sampling density [4]. We illustrate those issues with two simple examples.

Example 1. We sample data points from a half unit circle, x_i = [cos(t_i), sin(t_i)]^T, i = 1, ..., N. It is easy to see that t_i represents the arc length of the circle. We choose t_i in [0, π] according to t_{i+1} − t_i = 0.1(0.01 + |cos(t_i)|), starting at t_1 = 0, and set N = 52 so that t_N ≤ π and t_{N+1} > π. Clearly, the half circle has unit curvature everywhere. This is an example of highly varying sampling density.

Example 2. The data set is generated as x_i = [t_i, e^{−t_i^2}]^T, i = 1, ..., N, where the t_i in [−6, 6] are uniformly distributed.
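The three LTSA steps above can be sketched as a minimal NumPy implementation of the baseline constant-k algorithm; the function `ltsa` and its brute-force neighbor search are illustrative assumptions, not the authors' code:

```python
import numpy as np

def ltsa(X, d, k):
    """Minimal sketch of constant-k LTSA (illustrative, not the authors' code).
    X: (m, N) data, one point per column; returns T: (d, N) coordinates."""
    m, N = X.shape
    # Step 1: k nearest neighbors by brute force (each point includes itself).
    D2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    nbrs = np.argsort(D2, axis=0)[:k, :].T                  # row i: neighbors of x_i

    B = np.zeros((N, N))
    for i in range(N):
        I = nbrs[i]
        Xc = X[:, I] - X[:, I].mean(axis=1, keepdims=True)  # centered neighborhood
        # Step 2: the top-d right singular vectors carry the local coordinates.
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        Vi = Vt[:d].T                                       # (k, d)
        # Step 3: accumulate the alignment matrix with W_i = I - G G^T,
        # where G = [e/sqrt(k), V_i] is an orthonormal basis of span{e, V_i}.
        G = np.hstack([np.ones((k, 1)) / np.sqrt(k), Vi])
        B[np.ix_(I, I)] += np.eye(k) - G @ G.T

    # Global coordinates: eigenvectors 2..d+1 of B (the smallest one is e).
    _, evecs = np.linalg.eigh(B)
    return evecs[:, 1:d + 1].T
```

On a uniformly sampled half circle, for example, this recovers the arc length up to sign and scale; the failure modes discussed next appear when the sampling density or curvature varies.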
The curvature of the 1-D curve at parameter value t is given by

    c_g(t) = 2 |1 − 2t^2| e^{−t^2} / (1 + 4t^2 e^{−2t^2})^{3/2},
which changes from min_t c_g(t) = 0 to max_t c_g(t) = 2 over t in [−6, 6]. We set N = 180. This is an example of highly varying curvature.

For the above two data sets, LTSA with the constant k-NN strategy fails for any reasonable k we have tested, and so does LTSA with constant ε-neighborhoods. In the first column of Figure 1 we plot the two data sets. The coordinates computed by LTSA with constant k-neighborhoods are plotted against the centered arc-length coordinates for a selected range of k (ideally, the plots should display points on a straight line at angle ±π/4).

2 Adaptive Neighborhood Selection

In this section, we propose a neighborhood contraction and expansion algorithm for adaptively selecting k_i at each sample point x_i. We assume that the data are generated from a parameterized manifold, x_i = f(τ_i), i = 1, ..., N, where f : Ω ⊂ R^d → R^m. If f is smooth enough, using a first-order Taylor expansion at a fixed τ, for a neighboring τ̄ we have

    f(τ̄) = f(τ) + J_f(τ) (τ̄ − τ) + ε(τ, τ̄),    (2.2)

where J_f(τ) in R^{m×d} is the Jacobian matrix of f at τ and ε(τ, τ̄) is the error term determined by the Hessian of f,

    ‖ε(τ, τ̄)‖_2 ≤ c_f(τ) ‖τ̄ − τ‖_2^2,

where c_f(τ) represents the curvature of the manifold at τ. Setting τ = τ_i and τ̄ = τ_j gives

    x_j = x_i + J_f(τ_i) (τ_j − τ_i) + ε(τ_i, τ_j).    (2.3)

A point x_j can be regarded as a neighbor of x_i with respect to the tangent space spanned by the columns of J_f(τ_i) if ‖τ_j − τ_i‖_2 is small and ‖ε(τ_i, τ_j)‖_2 ≪ ‖J_f(τ_i)(τ_j − τ_i)‖_2. The above conditions, however, are difficult to verify in practice since we do not know J_f(τ_i). To get around this problem, consider an orthogonal basis matrix Q_i of the tangent space spanned by the columns of J_f(τ_i), which can be approximately computed by the SVD of X_i − x̄_i e^T, where x̄_i is the mean of the neighbors x_{ij} = f(τ_{ij}), j = 1, ..., k_i. Note that

    x̄_i = (1/k_i) Σ_{j=1}^{k_i} x_{ij} = x_i + J_f(τ_i) (τ̄_i − τ_i) + ε̄_i,

where ε̄_i is the mean of ε(τ_i, τ_{i1}), ..., ε(τ_i, τ_{ik_i}). Eliminating x_i in (2.3) by the representation above yields

    x_{ij} = x̄_i + J_f(τ_i) (τ_{ij} − τ̄_i) + ε_j^{(i)},  with  ε_j^{(i)} = ε(τ_i, τ_{ij}) − ε̄_i.
Let θ_j^{(i)} = Q_i^T (x_{ij} − x̄_i); then x_{ij} = x̄_i + Q_i θ_j^{(i)} + ε_j^{(i)}. Thus, x_{ij} can be selected as a neighbor of x_i if the orthogonal projection θ_j^{(i)} is small and

    ‖ε_j^{(i)}‖_2 = ‖x_{ij} − x̄_i − Q_i θ_j^{(i)}‖_2 ≪ ‖Q_i θ_j^{(i)}‖_2 = ‖θ_j^{(i)}‖_2.    (2.4)

Assume all the x_{ij} satisfy the above inequality; then we should approximately have

    ‖(I − Q_i Q_i^T)(X_i − x̄_i e^T)‖_F ≤ η ‖Q_i^T (X_i − x̄_i e^T)‖_F.    (2.5)

We will use (2.5) as a criterion for adaptive neighbor selection, starting with a K-NN at each sample point x_i for a large enough initial K and deleting points one by one until (2.5) holds. This process terminates either when (2.5) holds or when the neighborhood size reaches d + k_0 for some small k_0 with (2.5) still not true. In the latter case, we reselect the k-NN that minimizes the ratio

    ‖(I − Q_i Q_i^T)(X_i − x̄_i e^T)‖_F / ‖Q_i^T (X_i − x̄_i e^T)‖_F

as the neighborhood set, as is detailed below.

NEIGHBORHOOD CONTRACTION.
C0. Determine the initial K and the K-NN neighborhood X_i^{(K)} = [x_{i_1}, ..., x_{i_K}] of x_i, ordered in non-decreasing distance to x_i: ‖x_{i_1} − x_i‖ ≤ ‖x_{i_2} − x_i‖ ≤ ... ≤ ‖x_{i_K} − x_i‖. Set k = K.

C1. Let x̄_i^{(k)} be the column mean of X_i^{(k)}. Compute the orthogonal basis matrix Q_i^{(k)} of the d largest singular vectors of X_i^{(k)} − x̄_i^{(k)} e^T, and set Θ_i^{(k)} = (Q_i^{(k)})^T (X_i^{(k)} − x̄_i^{(k)} e^T).

C2. If ‖X_i^{(k)} − x̄_i^{(k)} e^T − Q_i^{(k)} Θ_i^{(k)}‖_F < η ‖Θ_i^{(k)}‖_F, then set X_i = X_i^{(k)}, Θ_i = Θ_i^{(k)}, and terminate.

C3. If k > d + k_0, then delete the last column of X_i^{(k)} to obtain X_i^{(k−1)}, set k := k − 1, and go to step C1; otherwise, go to step C4.

C4. Let k* = arg min_{d+k_0 ≤ j ≤ K} ‖X_i^{(j)} − x̄_i^{(j)} e^T − Q_i^{(j)} Θ_i^{(j)}‖_F / ‖Θ_i^{(j)}‖_F, and set X_i = X_i^{(k*)}, Θ_i = Θ_i^{(k*)}.

Step C4 means that if there is no k-NN (k ≥ d + k_0) satisfying (2.5), then the contracted neighborhood X_i should be the one that minimizes the ratio ‖X_i − x̄_i e^T − Q_i Θ_i‖_F / ‖Θ_i‖_F. Once the contraction step is done, we can still add back some of the unselected x_{ij} to increase the overlap of nearby neighborhoods while keeping (2.5) intact. In fact, we can add x_{ij} if ‖x_{ij} − x̄_i − Q_i θ_j^{(i)}‖ ≤ η ‖θ_j^{(i)}‖, as is demonstrated by the following result (we refer to [8] for the proof).

Theorem 2.1. Let X_i = [x_{i_1}, ..., x_{i_k}] satisfy (2.5). Furthermore, assume that

    ‖x_{ij} − x̄_i − Q_i θ_j^{(i)}‖ ≤ η ‖θ_j^{(i)}‖,  j = k+1, ..., k+p,    (2.6)

where θ_j^{(i)} = Q_i^T (x_{ij} − x̄_i). Denote by x̂_i the column mean of the expanded matrix X̂_i = [X_i, x_{i,k+1}, ..., x_{i,k+p}]. Then for the left singular vector matrix Q̂_i corresponding to the d largest singular values of X̂_i − x̂_i e^T,

    ‖(I − Q̂_i Q̂_i^T)(X̂_i − x̂_i e^T)‖_F ≤ η ( ‖Q̂_i^T (X̂_i − x̂_i e^T)‖_F + (p/√(k+p)) ‖(1/p) Σ_{j=k+1}^{k+p} θ_j^{(i)}‖_2 ).

The above result shows that if the mean of the projections θ_j^{(i)} of the expanding neighbors is small and/or the number of the expanding points is relatively small, then approximately

    ‖(I − Q̂_i Q̂_i^T)(X̂_i − x̂_i e^T)‖_F ≤ η ‖Q̂_i^T (X̂_i − x̂_i e^T)‖_F.

NEIGHBORHOOD EXPANSION.

E0. Set k to be the column number of X_i obtained by the neighborhood contraction step. For j = k+1, ..., K, compute θ_j^{(i)} = Q_i^T (x_{ij} − x̄_i).

E1. Denote by J_i the index subset of the j's, k < j ≤ K, such that ‖(I − Q_i Q_i^T)(x_{ij} − x̄_i)‖_2 ≤ η ‖θ_j^{(i)}‖_2. Expand X_i by adding the x_{ij}, j in J_i.

Example 3.
We construct the data points as x_i = [sin(t_i), cos(t_i), 0.2 t_i]^T, i = 1, ..., N, with the t_i in [0, 4π] uniformly distributed; the data set is plotted in the top-left panel of Figure 2.
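The contraction steps C0–C4 and expansion steps E0–E1 can be sketched as follows; the function name, the defaults for η and k_0, and the brute-force K-NN search are illustrative assumptions, not the authors' settings:

```python
import numpy as np

def contract_expand(X, i, d, K, eta=0.1, k0=2):
    """Sketch of neighborhood contraction (C0-C4) and expansion (E0-E1) for
    sample x_i; eta, k0 and the brute-force K-NN are illustrative choices.
    X: (m, N) data, one point per column; returns the selected neighbor indices."""
    # C0: initial K-NN, ordered by non-decreasing distance to x_i.
    order = np.argsort(np.linalg.norm(X - X[:, [i]], axis=0))[:K]

    def fit(idx):
        # Local fit: mean, tangent basis Q, projections Theta, and the
        # Frobenius norms of the residual and of the projections.
        Xi = X[:, idx]
        xbar = Xi.mean(axis=1, keepdims=True)
        U, _, _ = np.linalg.svd(Xi - xbar, full_matrices=False)
        Q = U[:, :d]
        Theta = Q.T @ (Xi - xbar)
        res = np.linalg.norm(Xi - xbar - Q @ Theta)
        return Q, xbar, res, np.linalg.norm(Theta)

    # C1-C3: shrink until criterion (2.5) holds, down to size d + k0.
    ratios = {}
    k = K
    while True:
        Q, xbar, res, proj = fit(order[:k])
        if res < eta * proj:                  # criterion (2.5) satisfied
            break
        ratios[k] = res / proj
        if k <= d + k0:                       # C4: fall back to the best ratio seen
            k = min(ratios, key=ratios.get)
            Q, xbar, res, proj = fit(order[:k])
            break
        k -= 1                                # C3: drop the farthest neighbor

    # E0-E1: add back deleted points that the local tangent space still fits.
    keep = list(order[:k])
    for j in order[k:]:
        theta = Q.T @ (X[:, [j]] - xbar)
        if np.linalg.norm(X[:, [j]] - xbar - Q @ theta) <= eta * np.linalg.norm(theta):
            keep.append(j)
    return np.array(keep)
```

On exactly linear data the full K-NN passes (2.5) immediately and is kept, while on a curved set such as the helix above the neighborhoods contract to the locally flat scale before expansion.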
Figure 2: Plots of the data set (top left), the coordinates τ_i computed by LTSA vs. the centered arc-length coordinates (a–c), the coordinates computed by LTSA with neighborhood contraction vs. the centered arc-length coordinates (e–g), and the coordinates computed by LTSA with neighborhood contraction and expansion vs. the centered arc-length coordinates (bottom left).

LTSA with a constant k-NN fails for any k: a small k leads to a lack of the necessary overlap among the neighborhoods, while for a large k the computed tangent spaces cannot represent the local geometry well. In (a–c) of Figure 2 we plot the coordinates computed by LTSA vs. the arc length of the curve. Contracting the neighborhoods without expansion also gives poor results because of the small sizes of the resulting neighborhoods; see (e–g) of Figure 2. Panel (d) of Figure 2 shows the excellent result computed by LTSA with both neighborhood contraction and expansion. We want to mention that our adaptive strategies also work well for noisy data sets; we refer the readers to [8] for some examples.

3 Alignment incorporating variations of manifold curvature

Let X_i = [x_{i_1}, ..., x_{i_{k_i}}] consist of the neighbors determined by the contraction and expansion steps of the above section. One can show that the size of the error term ‖E_i‖ in (1.1) depends on the size of the curvature of the manifold at the sample point x_i [8]. To make the minimization in (1.1) more uniform, we need to factor out the effect of the variations of the curvature. To this end, we pose the following minimization problem,

    min_{T, {L_i}} Σ_i (1/k_i) ‖(T_i (I_{k_i} − (1/k_i) e e^T) − L_i Θ_i) D_i‖^2,    (3.7)

where D_i = diag(φ(θ_1^{(i)}), ..., φ(θ_{k_i}^{(i)})) and φ(θ_j^{(i)}) is proportional to the curvature of the manifold at the corresponding parameter value; its computation is discussed below. For a fixed T, the optimal L_i is given by L_i = T_i (I_{k_i} − (1/k_i) e e^T) Θ_i^+ = T_i Θ_i^+.
Substituting it into (3.7), we have the reduced minimization problem

    min_T Σ_i (1/k_i) ‖T_i (I_{k_i} − (1/k_i) e e^T − Θ_i^+ Θ_i) D_i‖^2.

Imposing the normalization condition T T^T = I_d, a solution to the minimization problem above is given by the d eigenvectors corresponding to the second to (d+1)st smallest
eigenvalues of the following matrix,

    B ≡ (SW) diag(D_1^2/k_1, ..., D_N^2/k_N) (SW)^T,

where SW = [S_1 W_1, ..., S_N W_N], S_i is the 0–1 selection matrix with T S_i = T_i, and W_i = (I_{k_i} − (1/k_i) e e^T)(I_{k_i} − Θ_i^+ Θ_i). A second-order analysis of the error term in (1.1) shows that we can set

    φ_i(θ_j^{(i)}) = γ + c_f(τ_i) ‖θ_j^{(i)}‖_2^2

with a small positive constant γ to ensure φ_i(θ_j^{(i)}) > 0, where c_f(τ_i) represents the mean of the curvatures c_f(τ_i, τ_{ij}) over all neighbors of x_i. Let Q_i denote the orthonormal matrix of the d largest left singular vectors of X_i (I_{k_i} − (1/k_i) e e^T). We can approximately compute c_f(τ_i) as follows,

    c_f(τ_i) ≈ (1/k_i) Σ_{l=2}^{k_i} arccos(σ_min(Q_i^T Q_{i_l})) / ‖θ_l^{(i)}‖_2,

where σ_min(·) denotes the smallest singular value of a matrix. Then the diagonal weights can be computed as

    φ_i(θ_j^{(i)}) = γ + ‖θ_j^{(i)}‖_2^2 (1/k_i) Σ_{l=2}^{k_i} arccos(σ_min(Q_i^T Q_{i_l})) / ‖θ_l^{(i)}‖_2.

With the above preparation, we are now ready to present the adaptive LTSA algorithm. Given a data set X = [x_1, ..., x_N], the approach consists of the following steps:

Step 1. Determine the neighborhood X_i = [x_{i_1}, ..., x_{i_{k_i}}] of each x_i, i = 1, ..., N, using the neighborhood contraction/expansion steps of Section 2.

Step 2. Compute the truncated SVD, say Q_i Σ_i V_i^T, of X_i (I_{k_i} − (1/k_i) e e^T), with d columns in both Q_i and V_i, and the projections θ_l^{(i)} = Q_i^T (x_{i_l} − x̄_i), where x̄_i is the mean of the neighbors; denote Θ_i = [θ_1^{(i)}, ..., θ_{k_i}^{(i)}].

Step 3. Estimate the curvatures: for each i = 1, ..., N,

    c_i = (1/k_i) Σ_{l=2}^{k_i} arccos(σ_min(Q_i^T Q_{i_l})) / ‖θ_l^{(i)}‖_2.

Step 4. Construct the alignment matrix: for i = 1, ..., N, set

    W_i = I_{k_i} − [e/√k_i, V_i][e/√k_i, V_i]^T,
    D_i = γ I + diag(c_i ‖θ_1^{(i)}‖_2^2, ..., c_i ‖θ_{k_i}^{(i)}‖_2^2),

where γ is a small constant (usually we set γ = 10^{−6}). Set the initial B = 0 and update B iteratively by

    B(I_i, I_i) := B(I_i, I_i) + W_i D_i D_i W_i^T / k_i,  i = 1, ..., N,

where I_i is the index set of the neighborhood of x_i.

Step 5. Align the global coordinates: compute the d+1 smallest eigenvectors of B, pick the eigenvector matrix [u_2, ..., u_{d+1}] corresponding to the 2nd to (d+1)st smallest eigenvalues, and set T = [u_2, ..., u_{d+1}]^T.

4 Experimental Results

In this section, we present several numerical examples to illustrate the performance of the adaptive LTSA algorithm. The test data sets include curves in 2D/3D Euclidean spaces.
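Steps 2–5 above can be sketched as follows, given neighborhoods from Step 1 (here passed in as index lists, each beginning with the point itself); the function and variable names are illustrative assumptions, not the authors' code:

```python
import numpy as np

def adaptive_ltsa_align(X, neighborhoods, d, gamma=1e-6):
    """Sketch of Steps 2-5 (curvature-weighted alignment); illustrative code.
    X: (m, N) data; neighborhoods: list of index arrays, each starting with
    the point itself; returns T: (d, N) global coordinates."""
    N = X.shape[1]
    # Step 2: local truncated SVDs -> tangent bases Q_i, V_i, projections Theta_i.
    Qs, Vs, Thetas = [], [], []
    for I in neighborhoods:
        Xc = X[:, I] - X[:, I].mean(axis=1, keepdims=True)
        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        Qs.append(U[:, :d])
        Vs.append(Vt[:d].T)
        Thetas.append(np.diag(s[:d]) @ Vt[:d])     # Theta_i = Q_i^T Xc

    B = np.zeros((N, N))
    for i, I in enumerate(neighborhoods):
        k = len(I)
        Qi, Thi = Qs[i], Thetas[i]
        # Step 3: curvature estimate c_i from the angles between the tangent
        # space at x_i and those at its neighbors, scaled by projected distance.
        c = 0.0
        for l in range(1, k):                      # l = 0 is x_i itself
            sig = np.linalg.svd(Qi.T @ Qs[I[l]], compute_uv=False)
            c += np.arccos(np.clip(sig.min(), -1.0, 1.0)) / np.linalg.norm(Thi[:, l])
        c /= k
        # Step 4: weights D_i and the weighted alignment update.
        Di = gamma * np.eye(k) + np.diag(c * (Thi ** 2).sum(axis=0))
        G = np.hstack([np.ones((k, 1)) / np.sqrt(k), Vs[i]])
        Wi = np.eye(k) - G @ G.T
        B[np.ix_(I, I)] += Wi @ Di @ Di @ Wi.T / k

    # Step 5: eigenvectors 2..d+1 of B give the global coordinates.
    _, evecs = np.linalg.eigh(B)
    return evecs[:, 1:d + 1].T
```

The correlation of the recovered coordinate with the true arc length on a simple curve gives a quick check of the construction.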
Figure 3: The coordinates τ_i computed by LTSA taking into account the curvature and variable neighborhood sizes.

First we apply the adaptive LTSA to the data sets of Examples 1 and 2. Adaptive LTSA with different starting values of k works very well; see Figure 3. It shows that for these two data sets the adaptive LTSA is not sensitive to the choice of the starting k or to the variations in the sampling densities and manifold curvatures.

Next, we consider the swiss-roll surface defined by f(s, t) = [s cos(s), t, s sin(s)]^T. It is easy to see that J_f(s, t)^T J_f(s, t) = diag(1 + s^2, 1). Denoting by s = s(r) the inverse of the transformation r = r(s) defined by

    r(s) = ∫_0^s √(1 + α^2) dα = (1/2)(s √(1 + s^2) + arcsinh(s)),

the swiss-roll surface can be parameterized as f̂(r, t) = [s(r) cos(s(r)), t, s(r) sin(s(r))]^T, and f̂ is isometric with respect to (r, t). The left figure of Figure 4 shows that there is a distortion between the coordinates computed by LTSA with the best-fit neighborhood size (bottom left) and the generating coordinates (r, t)^T (top right). In the right panel of the bottom row of the left figure of Figure 4, we plot the coordinates computed by the adaptive LTSA with initial neighborhood size k = 30. (In fact, the adaptive LTSA is insensitive to the initial k, and we obtain similar results with a larger or smaller initial k.) We can see that the coordinates computed by the adaptive LTSA recover the generating coordinates well, without much distortion.

Finally, we applied both LTSA and the adaptive LTSA to a 2D manifold with 3 peaks embedded in a 100-dimensional space. The data points are generated as follows. First we generate N = 2000 3D points y_i = (t_i, s_i, h(t_i, s_i))^T, where t_i and s_i are randomly distributed in the interval [−1.5, 1.5] and h(t, s) is defined by

    h(t, s) = e^{−2t^2 − 2s^2} − e^{−t^2 − (s+1)^2} − e^{−(1+t)^2 − s^2}.
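As a quick numerical sanity check of the two formulas above (an illustrative sketch, not part of the experiments), the closed form of r(s) can be compared against direct quadrature, and the three-peak height h(t, s) can be evaluated:

```python
import numpy as np

# 1) Closed form of the isometric swiss-roll coordinate
#    r(s) = int_0^s sqrt(1 + a^2) da = (1/2)(s*sqrt(1+s^2) + arcsinh(s)).
def r_closed(s):
    return 0.5 * (s * np.sqrt(1.0 + s ** 2) + np.arcsinh(s))

s = np.linspace(0.0, 4.0 * np.pi, 200001)
f = np.sqrt(1.0 + s ** 2)                             # integrand = local stretching
r_quad = 0.5 * np.sum((f[1:] + f[:-1]) * np.diff(s))  # trapezoidal quadrature
err = abs(r_quad - r_closed(4.0 * np.pi))             # tiny: closed form agrees

# 2) The 3-peak height function: one positive bump at the origin and two
#    negative bumps near (t, s) = (0, -1) and (-1, 0).
def h(t, s):
    return (np.exp(-2 * t ** 2 - 2 * s ** 2)
            - np.exp(-t ** 2 - (s + 1) ** 2)
            - np.exp(-(1 + t) ** 2 - s ** 2))
```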
Then we embed the 3D points into the 100-dimensional space by x_i^Q = Q y_i and x_i^H = H y_i, where Q in R^{100×3} is a random orthonormal matrix, giving an orthogonal transformation, and H in R^{100×3} is a matrix with its singular values uniformly distributed in (0, 1), giving an affine transformation. In the top row of the right figure of Figure 4, we plot the
Figure 4: Left figure: the 3D swiss roll and the generating coordinates (top row), the 2D coordinates computed by LTSA with the best neighborhood size k = 15 (bottom left), and the 2D coordinates computed by the adaptive LTSA (bottom right). Right figure: the coordinates computed by LTSA for the orthogonally embedded data set {x_i^Q} (a) and the affinely embedded data set {x_i^H} (b), and the coordinates computed by the adaptive LTSA for {x_i^Q} (c) and {x_i^H} (d).

coordinates computed by LTSA for the x_i^Q (shown in (a)) and the x_i^H (shown in (b)) with the best-fit neighborhood size k = 15. We can see that the deformations (stretching and compression) are quite prominent. In the bottom row of the right figure of Figure 4, we plot the coordinates computed by the adaptive LTSA for the x_i^Q (shown in (c)) and the x_i^H (shown in (d)) with initial neighborhood size k = 15. It is clear that the adaptive LTSA gives a much better result.

References

[1] M. Brand. Charting a manifold. Advances in Neural Information Processing Systems 15, MIT Press, 2003.
[2] D. Donoho and C. Grimes. Hessian eigenmaps: new tools for nonlinear dimensionality reduction. Proceedings of the National Academy of Sciences, 100:5591–5596, 2003.
[3] S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323–2326, 2000.
[4] L. Saul and S. Roweis. Think globally, fit locally: unsupervised learning of nonlinear manifolds. Journal of Machine Learning Research, 4:119–155, 2003.
[5] Y. W. Teh and S. Roweis. Automatic alignment of local representations. Advances in Neural Information Processing Systems 15, MIT Press, 2003.
[6] J. Tenenbaum, V. de Silva and J. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319–2323, 2000.
[7] Z. Zhang and H. Zha. Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM J. Scientific Computing, 26:313–338, 2004.
[8] J. Wang, Z. Zhang and H. Zha. Adaptive manifold learning.
Technical Report CSE-04-021, Dept. CSE, Pennsylvania State University, 2004.