A Local Non-Negative Pursuit Method for Intrinsic Manifold Structure Preservation


Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence

A Local Non-Negative Pursuit Method for Intrinsic Manifold Structure Preservation

Dongdong Chen, Jian Cheng Lv, and Zhang Yi
Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu, P. R. China

Abstract

The local neighborhood selection plays a crucial role for most representation based manifold learning algorithms. This paper reveals that an improper selection of neighborhood for representation learning will introduce negative components into the learnt representations. Importantly, representations with negative components affect the preservation of the intrinsic manifold structure. In this paper, a Local Non-negative Pursuit (LNP) method is proposed for neighborhood selection, and non-negative representations are learnt. Moreover, it is proved that the learnt representations are sparse and convex. Theoretical analysis and experimental results show that the proposed method achieves or outperforms the state-of-the-art results on various manifold learning problems.

Introduction

Manifold learning generally aims to learn a data representation that uncovers the intrinsic manifold structure. It is extensively used in machine learning, pattern recognition and computer vision (Wright et al. 2009; Lv et al. 2009; Liu, Lin, and Yu 2010; Cai et al. 2011). To learn a data representation, the neighbors of a data point must be selected in advance. There are many neighborhood selection methods, which can be divided into two categories: K nearest neighbors (KNN) methods and ℓ1 norm minimization methods. Accordingly, representation learning is divided into KNN-based learning algorithms, such as Locally Linear Embedding (LLE; Roweis and Saul 2000) and Laplacian Eigenmaps (LE; Belkin and Niyogi 2003), and ℓ1-based learning algorithms, such as Sparse Manifold Clustering Embedding (SMCE; Wright et al. 2009). However, the KNN method is heuristic: it is not easy to select proper neighbors of a data point in practical applications. On the other hand, the working mechanism of ℓ1-based methods has not been fully elucidated (Zhang, Yang, and Feng 2011); the solution of ℓ1 norm minimization does not indicate the spatial distribution of the samples. More importantly, the representations learnt by these algorithms cannot avoid negative components. Representations with negative components cannot correctly reflect the essential relations between data pairs, so the intrinsic structure of the data would be broken. Hoyer (2002) proposed Non-negative Sparse Coding (NSC) to learn sparse non-negative representations; recently, Zhuang et al. (2012) also proposed non-negative low rank and sparse representations for semi-supervised learning. Unfortunately, the following concerns have not been discussed: why do negative components of the learnt representation exist, and what is their influence on the preservation of the intrinsic manifold structure? In this paper, we first reveal that an improper neighborhood selection will result in negative components in the learnt representations. It is illustrated that representations with negative components will destroy the intrinsic manifold structure.

Copyright © 2014, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
In order to avoid negative components and to preserve the intrinsic structure of the data, a Local Non-negative Pursuit (LNP) method is proposed to select the neighbors and to learn non-negative representations. The selected neighbors construct a convex set so that non-negative affine representations are learnt. Further, we prove that the representations are sparse and non-negative, which is useful for manifold dimensionality estimation and intrinsic manifold structure preservation. Theoretical analysis and experimental results show that the proposed method achieves or outperforms the state-of-the-art results on various manifold learning problems.

Notations and Preliminaries

A = {a_i ∈ R^m}_{i=1}^n is the data set, which lies on a manifold M of intrinsic dimension d (≪ m). A = [a_1, a_2, …, a_n] is the matrix form of A. Generally, representation learning for a manifold involves solving the following problem:

A = AX,   (1)

where X = [x_1, x_2, …, x_n] ∈ R^{n×n} is the representation matrix of A. Accordingly, x_i = [x_{i1}, x_{i2}, …, x_{in}]^⊤ is the representation of a_i ∈ A (i = 1, 2, …, n). X should preserve the intrinsic structure of M. With different purposes, various constraints can be added on X so that particular representations are obtained, such as sparse representations. In this paper, three well-known representation learning algorithms (LLE, LE, and SMCE) will be used to compare

with our algorithm on various manifold learning problems. LLE learns local affine representations under the assumption that the local neighborhood of a point on the manifold can be well approximated by the affine subspace spanned by the neighbors of the point. LE learns heat kernel representations that best preserve locality instead of the local linearity used in LLE. SMCE learns a sparse affine representation by solving an ℓ1 minimization problem, which achieves a datum-adaptive neighborhood. Given a particular representation X, the corresponding low-dimensional embedding is obtained by solving the trace minimization problem below (Yan et al. 2007):

min_{YY^⊤ = I_d} trace(Y Φ(X) Y^⊤),   (2)

where I_d is an identity matrix and Y ∈ R^{d×n} is the final embedding of A. Φ(X) is a symmetric and positive definite matrix w.r.t. X. Four kinds of subspaces w.r.t. A used in this paper are defined as:

non-negative subspace: S{A}^+ = {Σ_i c_i a_i | c_i ≥ 0}
negative subspace: S{A}^- = {Σ_i c_i a_i | c_i < 0}
affine subspace: S_a{A} = {Σ_i c_i a_i | Σ_i c_i = 1}
convex set: S_a{A}^+ = {Σ_i c_i a_i | Σ_i c_i = 1, c_i ≥ 0}

where c_i is the combination coefficient w.r.t. a_i. In addition, P_S and P_{S_a} denote the projection operators onto the subspace S and S_a, respectively.

Motivation

The three methods mentioned above and their extensions have been widely studied and applied in many fields (He and Niyogi 2003; Lv, Zhang, and Kok Kiong 2007; Elhamifar, Sapiro, and Vidal 2012). The performance of these methods suffers from negative components in the learnt representations, and the intrinsic manifold structure cannot be preserved well. From Eq. (1), a_i is the i-th column of A and x_i is the i-th column of X (i = 1, 2, …, n). The point â_i = A x_i is the reconstruction of a_i. Suppose there are t non-zero components in x_i, and define a subset A_i = {a_{λ_j} | x_{iλ_j} ≠ 0}_{j=1}^t ⊆ A, where λ_j denotes the index of the j-th non-zero component of x_i. Thus, â_i is a linear combination of points in A_i, where the combination may involve both additive and subtractive interactions in terms of the components of x_i. Clearly, from the definition of S{·}^+, if â_i ∉ S{A_i}^+, then x_{ij} < 0 for some j ∈ {1, …, n}. Such negative components break the intrinsic manifold structure. The following example illustrates this problem.

Suppose A = {a_1 = (9.8, 5.4), a_2 = (2.35, 3.7), a_3 = (0.75, 8.2), a_4 = (4.9, 0.95)} is sampled from a 1-d manifold as shown in Figure 1(a). For a_i ∈ A (i = 1, 2, 3, 4), denote A_i = A \ {a_i}. LLE is used to learn the affine representation of A with a neighborhood size of three. The local affine representation X is obtained as

X = [(0, .748, −.47, .462), (.554, 0, .687, −.242), (−.7562, .42, 0, .354), (.8, ·, 2.6238, 0)].

Figure 1: (a) the data manifold and S_a{A_3}^+; (b) the affine representation of a_3 learnt by LLE; (c) the corresponding 1-d embedding of A, denoted by Y = {y_1, y_2, y_3, y_4}.

It is easy to see that each S_a{A_i} is the whole plane R^2, thus a_i ∈ S_a{A_i} and the reconstruction error is zero, i.e., ‖a_i − A x_i‖_2 = 0. However, a_i ∉ S_a{A_i}^+. Thus, the corresponding representation x_i must contain negative components. Figure 1(b) shows the representation of a_3. By solving Eq. (2) with Φ(X) = (I − X)(I − X)^⊤, where I is the identity matrix, the corresponding 1-d embedding of A is obtained (Figure 1(c)). It shows that the intrinsic manifold structure is changed (a small numerical sketch of how such negative affine weights arise is given below). Furthermore, non-negative representations are required by some clustering algorithms (Von Luxburg 2007).
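To make the issue concrete, the short sketch below (not from the paper; Python/NumPy is used purely for illustration, and the data values loosely follow the example above) computes the standard sum-to-one affine reconstruction weights of each point from the remaining three. Nothing in this fit forces the weights to be non-negative, so negative components appear exactly as described.

```python
import numpy as np

def affine_weights(a_i, N):
    """Sum-to-one (affine) reconstruction weights of a_i from the columns of N:
    the unconstrained minimizer of ||a_i - N w||^2 subject to 1^T w = 1.
    This is the usual local-affine (LLE-style) fit; nothing forces w >= 0."""
    G = a_i[:, None] - N                              # columns a_i - a_j
    C = G.T @ G                                       # local Gram matrix
    C = C + 1e-9 * np.trace(C) * np.eye(C.shape[0])   # regularize a near-singular Gram matrix
    w = np.linalg.solve(C, np.ones(C.shape[0]))
    return w / w.sum()

# Four points in the plane, in the spirit of the example in the text.
A = np.array([[9.8, 5.4], [2.35, 3.7], [0.75, 8.2], [4.9, 0.95]]).T
for i in range(A.shape[1]):
    N = np.delete(A, i, axis=1)                       # A_i = A \ {a_i}
    print(i + 1, np.round(affine_weights(A[:, i], N), 3))   # some weights come out negative
```

Whenever a_i lies outside the convex hull of its selected neighbors, at least one of the printed weights is negative, which is the situation depicted in Figure 1.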
Spectral clustering, for example, requires a non-negative graph weight matrix. A common remedy is to let X̃ = |X|; clearly, the resulting weight matrix X̃ is in general not the same as the original X, and its weights no longer reflect the inherent relationships between data points whenever X contains negative components. In short, an improper subset A_i introduces negative components into the learnt representation, so that the local manifold structure is broken. Since the subset A_i is determined by the neighborhood of a_i, selecting a proper neighborhood and learning a non-negative representation are crucial for preserving the intrinsic manifold structure.

Local Non-negative Pursuit

Manifold assumption (Saul and Roweis 2003): if sufficiently dense sampling of the manifold is available, then the local structure of the manifold is linear and can be well approximated by a local affine subspace. Based on this assumption, we suppose that for each data point there exists an optimal neighborhood which satisfies: 1) the size of this neighborhood is at most d + 1; 2) the corresponding affine projection of the point is located inside the non-negative affine subspace spanned by its neighboring points. More precisely:

Assumption 1. Given a data set A sampled from a sufficiently densely sampled manifold M, for a_i ∈ A, consider a linear patch U ⊆ A that contains the K (≥ d + 1) nearest neighbors of a_i. We assume that there exists A_opt ⊆ U such that P_{S_a{A_opt}} a_i ∈ S_a{A_opt}^+, where A_opt ∈ R^{m×k}, k ≤ d + 1. Such an S_a{A_opt}^+ is called a sparse convex patch of M at a_i. Generally, if a_i is a boundary point, |A_opt| < d + 1; otherwise |A_opt| = d + 1, where |·| denotes set cardinality.
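As an informal illustration of Assumption 1 (not part of the paper), the sketch below tests whether the affine projection of a_i onto the affine hull of a candidate subset lies in the corresponding convex set. The generic SLSQP solver from SciPy is used only for convenience; it is an assumption of this sketch, not the paper's procedure.

```python
import numpy as np
from scipy.optimize import minimize  # SLSQP handles the simplex-style constraint

def in_sparse_convex_patch(a_i, A_sub, tol=1e-8):
    """Check the condition of Assumption 1 for a candidate subset A_sub (columns):
    does P_{S_a{A_sub}} a_i lie inside the convex set S_a{A_sub}^+ ?"""
    base = A_sub[:, 0]
    B = A_sub[:, 1:] - base[:, None]                 # directions spanning the affine hull
    c, *_ = np.linalg.lstsq(B, a_i - base, rcond=None)
    p = base + B @ c                                 # affine projection of a_i

    k = A_sub.shape[1]
    obj = lambda w: 0.5 * np.sum((A_sub @ w - p) ** 2)
    cons = ({'type': 'eq', 'fun': lambda w: np.sum(w) - 1.0},)
    res = minimize(obj, np.full(k, 1.0 / k), bounds=[(0, None)] * k,
                   constraints=cons, method='SLSQP')
    return res.fun < tol                             # zero residual => p is a convex combination
```

If the test returns True for some subset of at most d + 1 neighbors, that subset plays the role of A_opt at a_i.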

Clearly, U is the set of K nearest neighbors of a_i. Rewrite U = A_k ∪ Ā_k, where A_k is a set containing any k (≤ K) neighbors of a_i and Ā_k is the complement of A_k in U. Denote G_k = {g_t = a_i − a_t | a_t ∈ A_k}. We have the following result.

Theorem 1. For a_j ∈ Ā_k, if P_{S_a{A_k}} a_i ∈ S_a{A_k}^+ and P_{S{G_k}} g_j ∈ S{G_k}^-, it holds that

P_{S_a{A_{k+1}}} a_i ∈ S_a{A_{k+1}}^+,

where g_j = a_i − a_j and A_{k+1} = A_k ∪ {a_j}. The proof can be found in Appendix A.

According to the theorem, a Local Non-negative Pursuit (LNP) method is proposed to select the proper neighboring points from U one by one. The algorithm is as follows.

Algorithm 1. Find A_opt: Local Non-negative Pursuit (LNP)
1: Input: a_i ∈ R^m, U ∈ R^{m×K}.
2: Initialize: k = 1, A_k = ∅, G_k = ∅.
3: Let k = 1 and select the nearest neighbor from U by
   a_{λ_1} = arg min_{a_j ∈ U} ‖a_i − a_j‖_2.   (3)
4: Update: A_1 = {a_{λ_1}}, G_1 = {g_{λ_1}}, k = 2.
5: while true
   a_{λ_k} = arg min_{a_j ∈ U} ‖a_i − a_j‖_2,  s.t.  P_{S{G_k}} g_j ∈ S{G_k}^-.   (4)
6:   if Eq. (4) has a solution, update: A_k = A_k ∪ {a_{λ_k}}; G_k = G_k ∪ {g_{λ_k}}; k = k + 1.
7:   else break.
8: end while
9: Output: A_opt = A_k = {a_{λ_1}, a_{λ_2}, …, a_{λ_k}}.

Here λ_j (j = 1, 2, …, k) denotes the index of the j-th point selected by LNP. Clearly, the points in A_opt come from the local patch of M at a_i and are affinely independent of each other, which is a very useful property for manifold learning. The algorithm converges fast, with a computational complexity of O(k(K − k)/2), where k = |A_opt| and K = |U|. The following theorem describes this.

Theorem 2. LNP never selects the same data point twice. Thus, it must converge in no more than K steps. The proof can be found in Appendix A.

LNP stops when no appropriate neighbor can be selected; e.g., if no neighbor can be selected at step t (< K), the algorithm stops, since it never selects the same point twice. Clearly, P_{S_a{A_k}} a_i is the affine reconstruction of a_i at step k. Denote r_k = a_i − P_{S_a{A_k}} a_i. The following theorem shows that the reconstruction error is decreasing.

Theorem 3. For k > 1,

‖r_k‖_2 < ‖r_{k−1}‖_2.   (5)

The proof can be found in Appendix A.

Based on the selected neighbor set A_opt, a non-negative sparse affine representation of a_i is given by minimizing the following objective function:

min_{x_i} (1/2) ‖a_i − A x_i‖_2^2,  s.t.  Σ_{j=1}^k x_{iλ_j} = 1;  x_{it} = 0 if a_t ∉ A_opt.   (6)

Denote G = [a_i − a_{λ_1}, a_i − a_{λ_2}, …, a_i − a_{λ_k}] and M = (G^⊤ G)^{−1}, where a_{λ_j} ∈ A_opt (j = 1, 2, …, k). A closed-form solution is

x_{iλ_j} = (1^⊤ m_j) / (1^⊤ M 1), for a_{λ_j} ∈ A_opt;  x_{it} = 0, for a_t ∉ A_opt,   (7)

where m_j is the j-th column of M and 1 ∈ R^k is the vector of all ones.

Theorem 4. The representation x_i learnt by Eq. (6) is sparse and convex. The proof can be found in Appendix A.

Such a non-negative sparse affine representation is referred to as a Sparse Convex Representation (SCR) in this paper.

Experiments and Discussions

In this section, a series of synthetic and real-world data sets are used for the experiments. The proposed method is compared with the three methods LLE, LE, and SMCE. All the experiments were carried out using MATLAB on a 2.2 GHz machine with 2.0 GB RAM.

Manifold dimensionality estimation. With the proposed method, the learnt SCRs are sparse, with ‖x_i‖_0 ≤ d + 1 (i = 1, …, n). Based on the sparsity of SCR, an intrinsic dimensionality estimation method is given. We first sort the components of each x_i in descending order and denote x̃_i = [x̃_{i1}, x̃_{i2}, …, x̃_{in}] with x̃_{i1} ≥ x̃_{i2} ≥ … ≥ x̃_{in}. Then, a manifold dimensionality estimation vector (dev) of M is defined as:

dev(M) ≜ (1/n) Σ_{i=1}^n x̃_i = [ρ_1, ρ_2, …, ρ_n].   (8)

Thus, the intrinsic dimension d of M can be determined by:

d = arg max_l {ρ_l − ρ_{l+1}}_{l=1}^k.   (9)
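A compact sketch of Algorithm 1 and the closed-form SCR of Eq. (7) follows. It is not part of the paper: Python/NumPy is used, and the selection test of Eq. (4) is read here as "every least-squares coefficient of g_j on the span of G_k is negative", which is one interpretation of that constraint.

```python
import numpy as np

def lnp_select(a_i, U, tol=1e-12):
    """Sketch of Algorithm 1 (LNP): greedily pick neighbors of a_i from the
    candidate columns of U so that the affine projection of a_i stays inside
    the convex set spanned by the selected points (Theorem 1)."""
    dists = np.linalg.norm(U - a_i[:, None], axis=0)
    order = list(np.argsort(dists))            # candidates sorted by distance to a_i
    selected = [order.pop(0)]                  # step 3: the nearest neighbor first
    G = a_i[:, None] - U[:, selected]          # G_k, columns g_t = a_i - a_t
    while order:
        picked = None
        for j in order:                        # nearest admissible candidate (Eq. (4))
            g_j = a_i - U[:, j]
            c, *_ = np.linalg.lstsq(G, g_j, rcond=None)
            if np.all(c < -tol):               # read as: P_{S{G_k}} g_j in S{G_k}^-
                picked = j
                break
        if picked is None:                     # step 7: no admissible neighbor left
            break
        order.remove(picked)
        selected.append(picked)
        G = np.hstack([G, (a_i - U[:, picked])[:, None]])
    return selected                            # indices of A_opt within U

def scr_weights(a_i, A_opt):
    """Closed-form Sparse Convex Representation of Eq. (7) over the columns of A_opt."""
    G = a_i[:, None] - A_opt                   # columns a_i - a_{lambda_j}
    M = np.linalg.pinv(G.T @ G)                # (G^T G)^{-1}; pseudo-inverse for stability
    ones = np.ones(M.shape[0])
    return (M @ ones) / (ones @ M @ ones)      # x_{i lambda_j} = 1^T m_j / (1^T M 1)
```

For a neighbor set satisfying Assumption 1, the returned weights sum to one and, in exact arithmetic, are non-negative; small negative values can appear only from round-off.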
Multi-manifold clustering and embedding. The SCRs of a data set can be learnt by the proposed method. The SCRs, X = [x_1, …, x_n] ∈ R^{n×n}, can then be used for multi-manifold clustering and embedding. First, a similarity matrix W = max(X, X^⊤) is constructed. Then, the normalized Laplacian matrix is calculated as L = D^{−1/2} W D^{−1/2}, where D is a diagonal matrix with diagonal elements d_{ii} = Σ_{j=1}^n w_{ij}. Finally, standard spectral clustering (Von Luxburg 2007) is used to obtain the n_c clusters, denoted by {M_i}_{i=1}^{n_c}. For each cluster M_i, the intrinsic dimension d_i is estimated by Eq. (9) and the d_i-dimensional embedding is obtained by Eq. (2).
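The dimensionality estimation of Eqs. (8)-(9) and the clustering pipeline just described are summarized in the following sketch (again illustrative and not from the paper; scikit-learn's KMeans is assumed to be available for the final k-means step, and the number of clusters n_c is treated as given).

```python
import numpy as np
from sklearn.cluster import KMeans   # assumed available for the k-means step

def estimate_dim(X, k_max=10):
    """Eqs. (8)-(9): average the column-wise descending-sorted SCRs and take the largest gap."""
    dev = np.sort(X, axis=0)[::-1].mean(axis=1)           # dev(M) = (1/n) sum_i x~_i
    gaps = dev[:k_max] - dev[1:k_max + 1]
    return int(np.argmax(gaps)) + 1                       # d = arg max_l (rho_l - rho_{l+1})

def cluster_from_scr(X, n_clusters):
    """Spectral clustering on the SCR graph, following the text above."""
    W = np.maximum(X, X.T)                                # similarity matrix W = max(X, X^T)
    d = np.maximum(W.sum(axis=1), 1e-12)
    L = (W / np.sqrt(d)[:, None]) / np.sqrt(d)[None, :]   # L = D^{-1/2} W D^{-1/2}
    _, vecs = np.linalg.eigh(L)
    Y = vecs[:, -n_clusters:]                             # leading eigenvectors of L
    Y = Y / np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-12)
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Y)
```

Each cluster returned by cluster_from_scr can then be fed to estimate_dim and embedded via Eq. (2) at its own estimated dimension.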

Evaluation metric. It is important to preserve the intrinsic structure of a manifold after embedding. Some methods, such as MMRE (Lee and Verleysen 2009) and LCMC (Chen and Buja 2009), have been proposed to assess intrinsic structure preservation by computing the neighborhood overlap. MMRE not only computes the neighborhood alignment accuracy, but also considers the order of the neighbors. Based on the MMRE method, this paper assesses the quality of intrinsic structure preservation for the different methods: LLE, LE, SMCE, and our method.

For the data set A = {a_i}_{i=1}^n, its corresponding low-dimensional embedding is denoted by Y = {y_i}_{i=1}^n; clearly, y_i is the embedding of a_i. Λ_i = {λ_{ij}}_{j=1}^{s_i} is the index set of the selected s_i neighbors of a_i, ordered so that ‖a_i − a_{λ_{ij}}‖_2 ≤ ‖a_i − a_{λ_{i,j+1}}‖_2 (j = 1, …, s_i − 1). Γ_i = {γ_{ij}}_{j=1}^{t_i} is the index set of the selected t_i neighbors of y_i, ordered so that ‖y_i − y_{γ_{ij}}‖_2 ≤ ‖y_i − y_{γ_{i,j+1}}‖_2 (j = 1, …, t_i − 1). The neighborhood alignment accuracy (nalac) is given as:

nalac ≜ (1/n) Σ_{i=1}^n [ Σ_{j=1}^{min(s_i, t_i)} δ(λ_{ij} = γ_{ij}) ] / min(s_i, t_i),   (10)

where δ(·) is the delta function. Another, averaged alignment accuracy is formulated as follows:

soft-nalac ≜ (1/n) Σ_{i=1}^n |Λ_i ∩ Γ_i| / max(s_i, t_i).   (11)

Clearly, the bigger the value of nalac (soft-nalac), the smaller the change of the neighborhood; thus a bigger value of nalac (soft-nalac) indicates better intrinsic manifold structure preservation. A small computational sketch of these two scores follows the COIL-20 results below.

Figure 2: The four employed data sets: (a) trefoil; (b) 2-trefoil-knots; (c) COIL-20; (d) ExtendedYaleB.

Synthetic data. Data points are sampled from a trefoil knot embedded in a higher-dimensional space and corrupted with small Gaussian white noise (Figure 2(a)). LLE, LE, SMCE, and the LNP scheme are used to learn a low-dimensional manifold over a range of neighborhood sizes K (and of λ for SMCE). Figure 6(a) reports the corresponding time costs. Figure 3 shows the final embedding results and the corresponding devs. It is shown that the SCRs learnt by our method are always non-negative and sparse, and the devs of our method always reflect the intrinsic dimension of the data. For different neighborhood sizes K (λ for SMCE), only our method always provides a good embedding result.

Figure 3: Embeddings of the trefoil. For each method, the left panel is the obtained embedding and the right panel is the corresponding dev.

The corresponding nalac and soft-nalac are shown in Figure 4 (a) and (b), respectively. Clearly, with the proposed method the neighborhood alignment accuracy is highest, meaning that the intrinsic structure is preserved better than with the other methods.

Next, consider a case where the manifolds are intertwined and close to each other, as shown in Figure 2(b). Points are sampled from a 2-trefoil-knots data set, embedded in a higher-dimensional space and corrupted with small Gaussian white noise. The methods LLE, LE, SMCE, and LNP are used for manifold clustering with different neighborhood sizes. Figure 5(a) reports the misclassification rates and Figure 6(b) reports the corresponding time costs. The results show that the proposed method always achieves stable and good clustering behaviour while its time costs are the lowest among all methods.

Real data. Two widely used real data sets, COIL-20 and ExtendedYaleB, are used in this section, as shown in Figure 2(c) and (d), respectively. The COIL-20 data set includes 20 objects; each object has 72 images taken from different positions. The 72 duck images are used for manifold embedding.
LLE, LE, SMCE, and LNP are used to learn the low-dimensional manifold with K = 2, 3, …, 7 (and a corresponding range of λ for SMCE). Figure 6(c) reports the corresponding time costs, which shows that LNP runs fast. Figure 7 shows the final embedding results and the corresponding devs. The results show that the SCRs learnt by the LNP scheme are always non-negative and sparse, and the devs of our method always reflect the intrinsic dimension of the data. The corresponding nalac and soft-nalac are reported in Figure 4(c) and (d), respectively, which indicates that the proposed method preserves the intrinsic structure well.
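As referenced above, the following sketch (not from the paper) computes the nalac and soft-nalac scores of Eqs. (10)-(11) for data columns A and embedding columns Y. For simplicity it uses the same neighborhood size s in both spaces, whereas the definitions allow different sizes s_i and t_i.

```python
import numpy as np

def nalac_scores(A, Y, s=10):
    """nalac and soft-nalac of Eqs. (10)-(11) with s_i = t_i = s for every point."""
    n = A.shape[1]
    hard, soft = 0.0, 0.0
    for i in range(n):
        da = np.linalg.norm(A - A[:, [i]], axis=0); da[i] = np.inf
        dy = np.linalg.norm(Y - Y[:, [i]], axis=0); dy[i] = np.inf
        lam = np.argsort(da)[:s]                      # Lambda_i, sorted by distance in data space
        gam = np.argsort(dy)[:s]                      # Gamma_i, sorted by distance in the embedding
        hard += np.mean(lam == gam)                   # fraction of rank-aligned neighbors, Eq. (10)
        soft += len(set(lam) & set(gam)) / s          # |Lambda_i ∩ Gamma_i| / max(s_i, t_i), Eq. (11)
    return hard / n, soft / n
```

Higher values of either score indicate that the embedding has changed the neighborhoods less, i.e., better intrinsic structure preservation.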

Figure 4: Intrinsic manifold structure preservation behaviour of various manifold learning methods: (a) nalac on trefoil; (b) soft-nalac on trefoil; (c) nalac on COIL-20; (d) soft-nalac on COIL-20.

Figure 5: Manifold clustering behaviour of various methods: (a) misclassification rate on 2-trefoil-knots; (b) misclassification rate on ExtendedYaleB.

Figure 6: The execution time of various methods: (a) on trefoil; (b) on 2-trefoil-knots; (c) on COIL-20; (d) on ExtendedYaleB.

The ExtendedYaleB data set contains about 2,414 frontal face images of 38 individuals. LLE, LE, SMCE, and LNP are used for manifold clustering with different neighborhood sizes. Figure 5(b) reports the misclassification rates and Figure 6(d) reports the corresponding time costs. These results show that the proposed method always obtains good manifold clustering performance with low time costs. Figure 8 shows the embedding results and the corresponding devs of the first two clusters obtained by the proposed method.

Figure 7: Embeddings of COIL-20. For each method, the left panel is the obtained embedding and the right panel is the corresponding dev.

Conclusion

This paper analyzes the origin of negative components in learnt representations. A Local Non-negative Pursuit (LNP) algorithm is proposed to select the neighbors of a data point, and Sparse Convex Representations (SCRs) are learnt. It has been shown that the SCRs learnt by the proposed method are sparse and non-negative. Experimental results show that the proposed method achieves or outperforms the state-of-the-art results.

Figure 8: The embeddings of the first two clusters of ExtendedYaleB, given by the proposed scheme, with the corresponding devs at the bottom right.

Acknowledgments

This work was supported by the National Science Foundation of China, and by the National Program on Key Basic Research Project (973 Program) under Grant 2CB322.

Appendix A

Proof of Theorem 1. Since P_{S_a{A_k}} a_i ∈ S_a{A_k}^+, P_{S_a{A_k}} a_i can be written as

P_{S_a{A_k}} a_i = Σ_{t=1}^k x_t a_t,   (12)

with a_t ∈ A_k (t = 1, …, k) and

Σ_{t=1}^k x_t = 1,  x_t ≥ 0.   (13)

Let r_k = a_i − P_{S_a{A_k}} a_i. Clearly,

r_k = a_i − Σ_{t=1}^k x_t a_t = Σ_{t=1}^k x_t (a_i − a_t) = Σ_{t=1}^k x_t g_t,   (14)

where g_t = a_i − a_t and x_t ≥ 0 (t = 1, …, k). Thus, r_k ∈ S{G_k}^+. Since P_{S{G_k}} g_j ∈ S{G_k}^-, it holds that 90° < ∠(r_k, g_j) ≤ 180°. Let o = [0, 0, …, 0]^⊤ ∈ R^m; we have

P_{S_a{r_k, g_j}} o ∈ S_a{r_k, g_j}^+.

It means that

P_{S_a{a_i − r_k, a_i − g_j}} (a_i − o) ∈ S_a{a_i − r_k, a_i − g_j}^+.

Then, we can get

P_{S_a{P_{S_a{A_k}} a_i, a_j}} a_i ∈ S_a{P_{S_a{A_k}} a_i, a_j}^+ = S_a{Σ_{t=1}^k x_t a_t, a_j}^+.   (15)

By the definition of S_a{·}^+, we have

S_a{Σ_{t=1}^k x_t a_t, a_j}^+ = { η_1 (Σ_{t=1}^k x_t a_t) + η_2 a_j } = { (η_1 x_1) a_1 + … + (η_1 x_k) a_k + η_2 a_j },

where η_1 + η_2 = 1, η_1, η_2 ≥ 0. Clearly, η_1 x_1 + η_1 x_2 + … + η_1 x_k + η_2 = η_1 (x_1 + x_2 + … + x_k) + η_2 = η_1 + η_2 = 1. Thus,

S_a{Σ_{t=1}^k x_t a_t, a_j}^+ ⊆ S_a{A_{k+1}}^+.   (16)

By Eq. (15) and Eq. (16), we have P_{S_a{A_{k+1}}} a_i ∈ S_a{A_{k+1}}^+. The proof is complete.

Proof of Theorem 2. Let a_t ∈ A be the point selected at step k. If a_t ∈ A_k, it means that a_t ∈ S{A_k}^+ and P_{S{A_k}} a_t = a_t; thus P_{S{A_k}} a_t ∈ S{A_k}^+. It follows that

P_{S{A_k}} a_t ∈ S{A_k}^+  ⟹  P_{S{G_k}} g_t ∈ S{G_k}^+,   (17)

where G_k = {a_i − a_j | a_j ∈ A_k} and g_t = a_i − a_t. Clearly, this contradicts condition (4). Thus, a_t cannot be selected twice. The proof is complete.

Proof of Theorem 3. The space spanned by A can be written as a direct sum: S{A} = S_a{A} ⊕ S_a{A}^⊥. Clearly,

P_{S_a{A_k}} a_i ∈ S_a{A_k}  and  a_i − P_{S_a{A_k}} a_i ∈ S_a{A_k}^⊥.   (18)

Since S_a{A_{k−1}} ⊆ S_a{A_k}, we have P_{S_a{A_{k−1}}} a_i ∈ S_a{A_k}, and thus

(P_{S_a{A_k}} a_i − P_{S_a{A_{k−1}}} a_i) ∈ S_a{A_k}.   (19)

From Eq. (18) and Eq. (19), we have (a_i − P_{S_a{A_k}} a_i) ⊥ (P_{S_a{A_k}} a_i − P_{S_a{A_{k−1}}} a_i). Since

(a_i − P_{S_a{A_{k−1}}} a_i) = (a_i − P_{S_a{A_k}} a_i) + (P_{S_a{A_k}} a_i − P_{S_a{A_{k−1}}} a_i),

the Pythagorean theorem gives

‖a_i − P_{S_a{A_{k−1}}} a_i‖_2^2 = ‖a_i − P_{S_a{A_k}} a_i‖_2^2 + ‖P_{S_a{A_k}} a_i − P_{S_a{A_{k−1}}} a_i‖_2^2,

i.e., ‖r_{k−1}‖_2^2 = ‖r_k‖_2^2 + ‖P_{S_a{A_k}} a_i − P_{S_a{A_{k−1}}} a_i‖_2^2. Clearly, ‖P_{S_a{A_k}} a_i − P_{S_a{A_{k−1}}} a_i‖_2^2 > 0, thus ‖r_k‖_2 < ‖r_{k−1}‖_2. The proof is complete.

Proof of Theorem 4. Clearly,

A x_i = Σ_{j=1}^k x_{iλ_j} a_{λ_j} = P_{S_a{A_opt}} a_i,

where a_{λ_j} ∈ A_opt and Σ_{j=1}^k x_{iλ_j} = 1. It is easy to see that P_{S_a{A_opt}} a_i ∈ S_a{A_opt}^+. Thus, the learnt affine representation x_i is non-negative. Since ‖x_i‖_0 = k (≤ d + 1), x_i is also sparse (Elad 2010). The proof is complete.
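As an informal numerical sanity check (not part of the paper), the sketch below verifies the geometric fact underlying Theorem 3: projecting a_i onto the affine hull of a growing, nested set of points never increases the reconstruction error. Random data and nested subsets are used here rather than LNP-selected neighbors, for which the decrease is strict.

```python
import numpy as np

rng = np.random.default_rng(0)
m, K = 5, 8
a_i = rng.normal(size=m)
pts = rng.normal(size=(m, K))                 # candidate neighbors as columns

def affine_proj(a, A):
    """Orthogonal projection of a onto the affine hull of the columns of A."""
    base = A[:, 0]
    B = A[:, 1:] - base[:, None]              # directions spanning the affine hull
    c, *_ = np.linalg.lstsq(B, a - base, rcond=None)
    return base + B @ c

errs = [np.linalg.norm(a_i - affine_proj(a_i, pts[:, :k])) for k in range(2, K + 1)]
# ||r_k|| is non-increasing as the nested neighbor set grows.
assert all(e2 <= e1 + 1e-9 for e1, e2 in zip(errs, errs[1:]))
print(np.round(errs, 4))
```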

References

Belkin, M., and Niyogi, P. 2003. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation 15(6):1373-1396.

Cai, D.; He, X.; Han, J.; and Huang, T. S. 2011. Graph regularized nonnegative matrix factorization for data representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 33(8).

Chen, L., and Buja, A. 2009. Local multidimensional scaling for nonlinear dimension reduction, graph drawing, and proximity analysis. Journal of the American Statistical Association 104(485).

Elad, M. 2010. Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing. Springer.

Elhamifar, E.; Sapiro, G.; and Vidal, R. 2012. See all by looking at a few: Sparse modeling for finding representative objects. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

He, X., and Niyogi, P. 2003. Locality preserving projections. In Advances in Neural Information Processing Systems (NIPS), volume 16.

Hoyer, P. O. 2002. Non-negative sparse coding. In Proceedings of the 2002 12th IEEE Workshop on Neural Networks for Signal Processing.

Lee, J. A., and Verleysen, M. 2009. Quality assessment of dimensionality reduction: Rank-based criteria. Neurocomputing 72(7).

Liu, G.; Lin, Z.; and Yu, Y. 2010. Robust subspace segmentation by low-rank representation. In Proceedings of the 27th International Conference on Machine Learning (ICML).

Lv, J. C.; Kok Kiong, T.; Zhang, Y.; and Huang, S. 2009. Convergence analysis of a class of Hyvärinen-Oja's ICA learning algorithms with constant learning rates. IEEE Transactions on Signal Processing 57(5).

Lv, J. C.; Zhang, Y.; and Kok Kiong, T. 2007. Global convergence of GHA learning algorithm with nonzero-approaching learning rates. IEEE Transactions on Neural Networks 18(6).

Roweis, S. T., and Saul, L. K. 2000. Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323-2326.

Saul, L. K., and Roweis, S. T. 2003. Think globally, fit locally: unsupervised learning of low dimensional manifolds. Journal of Machine Learning Research 4:119-155.

Von Luxburg, U. 2007. A tutorial on spectral clustering. Statistics and Computing 17(4):395-416.

Wright, J.; Yang, A. Y.; Ganesh, A.; Sastry, S. S.; and Ma, Y. 2009. Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 31(2):210-227.

Yan, S.; Xu, D.; Zhang, B.; Zhang, H.-J.; Yang, Q.; and Lin, S. 2007. Graph embedding and extensions: a general framework for dimensionality reduction. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(1):40-51.

Zhang, D.; Yang, M.; and Feng, X. 2011. Sparse representation or collaborative representation: Which helps face recognition? In IEEE International Conference on Computer Vision (ICCV).

Zhuang, L.; Gao, H.; Lin, Z.; Ma, Y.; Zhang, X.; and Yu, N. 2012. Non-negative low rank and sparse graph for semi-supervised learning. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
