Manifold Coarse Graining for Online Semi-supervised Learning

Size: px

Start display at page:

Download "Manifold Coarse Graining for Online Semi-supervised Learning"

Alexis Strickland
5 years ago
Views:

1 for Online Semi-supervised Learning Mehrdad Farajtabar, Amirreza Shaban, Hamid R. Rabiee, Mohammad H. Rohban Digital Media Lab, Department of Computer Engineering, Sharif University of Technology, Tehran, Iran. September 6, 2011 M. Farajtabar et al. for Online Semi-supervised Learning September 6, / 23

2 1 Introduction Manifold regularization 2 3 Exact Coarse Graining Approximate Coarse Graining Preserving Manifold Structure 4 Summary and Future Works M. Farajtabar et al. for Online Semi-supervised Learning September 6, / 23

3 Semi-supervised Leraning Introduction Manifold regularization Semi-supervised Learning: utilize unlabeled data to to enhance classification. Manifold assumption: the labeling function varies smoothly with respect to underlying manifold. Manifold structure is modeled by the neighborhood graph of the data points. Application such as image segmentation, handwritten digit recognition, text classification, and etc. M. Farajtabar et al. for Online Semi-supervised Learning September 6, / 23

4 Online setting Outline Introduction Manifold regularization Online classification of data is required in common applications such as object tracking, face recognition in surveillance systems, image retrieval and etc. Usually unlabeled data is easily available in such classification problems. Most of the classic SSL algorithms are not efficient in an online classification setting. This is due to Time complexity of O(n 3 ) in standard implementations. Space complexity. Our approach: data reduction! M. Farajtabar et al. for Online Semi-supervised Learning September 6, / 23

5 Some solutions Outline Introduction Manifold regularization Using a smoothness regularization term on the just τ most recent data. Using RPtree to partition the graph into clusters covering the manifold structure. Replacing neighboring points in Euclidean space with fixed number of centroids. Caorse graining based on diffusion maps. M. Farajtabar et al. for Online Semi-supervised Learning September 6, / 23

6 Our solution Outline Introduction Manifold regularization Our method like above ones rely on data reduction to overcome memory and time limitations. A semi-supervised data reduction method Captures the geometric structure of data Considers the labeled data as a cue to better preserve the classification accuracy. Spectral decomposition is used to find similar nodes on the manifold in order to be merged. Assuming a maximum buffer length of k, we do data reduction whenever this limit is reached. O(k 3 ) Time complexity compared with O(n 3 ) for each newly arrived data. M. Farajtabar et al. for Online Semi-supervised Learning September 6, / 23

7 Problem statement Outline Introduction Manifold regularization Let X u = {x 1,..., x u } and X l = {x u+1,..., x u+l } be sets of unlabeled and labeled data points respectively. Also let y = (y(u + 1),..., y(u + l)) be the vector labels on X l. Our goal is to predict labels of X = X u X l. Let W be the weight matrix of the k-nn graph of X, W (i, j) = exp( x i x j 2 /2σ) where σ is the bandwidth parameter. Defining diagonal matrix D with D(i, i) = n j=1 W (i, j) and the laplacian matrix L un = D W, manifold regularization algorithms minimize the smoothness functional S(f ) = 1 W (i, j)(f (i) f (j)) 2 = f T L un f 2 i,j under some appropriate criteria. M. Farajtabar et al. for Online Semi-supervised Learning September 6, / 23

8 Introduction Manifold regularization Harmonic Solution and Label Propagation Minimizing f T L un f such that f l = y is called Harmonic Solution for manifold regularization problem. Label Propagation is a way for computing HS. In this algorithm labels are propagated from labeled to unlabeled nodes through edges in an iterative manner. Defining the stochastic matrix P as P(i, j) = W (i, j)/ the algorithm is stated as follows 1 Propagation: f (t+1) Pf (t) 2 Clamping: f l = y n W (i, k). k=1 Under appropriate conditions, the solution converges to the HS: f u = (I P uu ) 1 P ul y f l = y. M. Farajtabar et al. for Online Semi-supervised Learning September 6, / 23

9 Spectral view Defining absorbing stochastic matrix, [ ] Puu P Q ul 0 I The entire process of LP may be rewritten in one step So the solution is where f (t+1) = Qf (t) n f (t+1) (j) = q ij f (t) (i). f u (j) = l+u k=u+1 i=1 Q (j, k)y(k), Q lim n Qn. M. Farajtabar et al. for Online Semi-supervised Learning September 6, / 23

10 Spectral view (cont.) Q (j, k) = p k (j), for u + 1 k l + u and 1 < j < n, where p k (j) denotes element j of the k th right eigenvector which is unitary (corresponding to eigenvalue equal to one). f u can be expressed in terms of the right unitary eigenvectors of Q: f u (j) = l+u k=u+1 p k (j)y(k). f u remains unchanged if these eigenvectors are preserved in a CG process. M. Farajtabar et al. for Online Semi-supervised Learning September 6, / 23

11 Intution Outline Exact Coarse Graining Approximate Coarse Graining Preserving Manifold Structure w 13 w 23 = q 13 = q 23 =, w 13 + w 14 w 23 + w 24 w 14 w 24 = q 14 = q 24 =. w 13 + w 14 w 23 + w w 14 w 13 w 23 w w 13 + w 23 w 14 + w w 35 w 48 w 35 w 48 5 w 36 w (a) Before CG 8 5 w 36 w (b) After CG 8 f (1) = q 13 f (3) + q 14 f (4) f (0) = q 03 f (3) + q 04 f (4) f (2) = q 23 f (3) + q 24 f (4) f (3) = q 31 f (1) + q 32 f (2) + f (3) = q 30 f (0) + f (4) = q 41 f (1) + q 42 f (2) + f (4) = q 40 f (0) + M. Farajtabar et al. for Online Semi-supervised Learning September 6, / 23

12 Formal statement Outline Exact Coarse Graining Approximate Coarse Graining Preserving Manifold Structure This process can be modeled by the transformation Q = LQR where d 1 d d 1+d 2 d 1+d L = R = 0.. I n 2. I n In the general case, R and L can be defined such that the transform merges all the nodes with the same neighbors. Rows i and j of Q are equal if and only if p k (i) = p k (j) for all k with λ k 0, where p k is a right eigenvector of Q. M. Farajtabar et al. for Online Semi-supervised Learning September 6, / 23

13 Formal statement(cont.) Exact Coarse Graining Approximate Coarse Graining Preserving Manifold Structure Lemma Lp is the right unitary eigenvector of Q with the same eigenvalue as p, where p is the right unitary eigenvector of Q. Before merge: f u (j) = u+l k=u+1 p k(j)y(k). After Merge: f u(j ) = u +l k=u +1 (Lp k)(j )y(k) Theorem LP solution is preserved for nodes or cluster of nodes in exact CG, i.e. when we merge nodes if their corresponding elements are the same along all right eigenvectors with nonzero eigenvalues. M. Farajtabar et al. for Online Semi-supervised Learning September 6, / 23

14 Motivation Outline Exact Coarse Graining Approximate Coarse Graining Preserving Manifold Structure In real problems the case where neighbors of two or more nodes are exactly the same rarely occurs. Merging nodes when their corresponding elements in eigenvectors are close enough. For example along i th eigenvector we consider two elements approximately the same if their difference is no more than η i, then merge nodes if they are pairwise approximately the same. On the other hand LP results depend on unitary eigenvectors, so a good practice is to do CG on unitary eigenvectors only. M. Farajtabar et al. for Online Semi-supervised Learning September 6, / 23

15 Analysis Outline Exact Coarse Graining Approximate Coarse Graining Preserving Manifold Structure consider the term p i RLp i. Lp i is the approximate right eigenvector of Q. Multiplying by R unfolds the clusters. Defining ε i = p i RLp i, we would like to find an upper bound of ε i for nodes to be merged. The smaller ε i is, the more similar Lp i is to p i Consider node 1 that is placed in a cluster S 1 = {1,..., m}, ε i (1) = p i (1) (RLp i )(1) = η i (1 u 1(1) m j=1 u 1(j) ). M. Farajtabar et al. for Online Semi-supervised Learning September 6, / 23

16 Analysis(cont.) Outline Exact Coarse Graining Approximate Coarse Graining Preserving Manifold Structure Suppose p is the true right eigenvector of Q. We would like to have Lp as its right eigenvector so as to better preserve the manifold structure and LP. It is thus interesting to see whether Lp can be approximately considered as an eigenvector of Q. Considering Q (Lp) = λ(lp) + e, we d like to minimize e. We have n l i=1 e(i) 2 u 1 (i) 2D, where D = k λ 2 n i=1 i j=1 u 1(j) ε i (j) 2 and p i s and λ i s are right eigenvectors and associated eigenvalues that CG is performed along. i : η i 2 1 2kM λ i 2 λmin n = λ l D/n D M. Farajtabar et al. for Online Semi-supervised Learning September 6, / 23

17 Manifold Structure Outline Exact Coarse Graining Approximate Coarse Graining Preserving Manifold Structure Not just unitary eigenvectors, but some other eigenvectors corresponding to larger eigenvalues. (c) Before CG (d) After CG M. Farajtabar et al. for Online Semi-supervised Learning September 6, / 23

18 Eigenvector Preservation Summary and Future Works Real world data: Handwritten digit recognition. Reduce the number of nodes from 1000 to 92. Eigenvalues and cosine similarity of eigenvectors before and after CG. Eigenvalues and eigenvectors are well preserved. This guarantees a good accuracy of classification after reduction i λ i λ i (Lp i ) p i Lp i p i M. Farajtabar et al. for Online Semi-supervised Learning September 6, / 23

19 Online Classification Summary and Future Works Maximum buffer size is 200. New data arrive sequentially. Classification accuracy up to time t. Comparison with the baseline and state of the art work (graph quantization). Manifold structure and label information is considered in the process of data reduction. Accuracy baseline coarse graining 0.84graph quantization Time step Accuracy Time step Accuracy Time step M. Farajtabar et al. for Online Semi-supervised Learning September 6, / 23

20 Manifold Structure Preservation Summary and Future Works CG is done for 500 data points to reduce the data size to 100. Accuracy is averaged over 500 new data points added separately. Prevent new data points recover the manifold structure. An indication of how well the manifold structure is preserved in CG. Accuracy coarse graining 0.84 graph quantization Number of clusters Accuracy Number of clusters M. Farajtabar et al. for Online Semi-supervised Learning September 6, / 23

21 Outlier Robustness Outline Summary and Future Works Noise is added manually and classification accuracy is calculated. Outliers are generated by adding multiples of the data variance. In our method outliers are merged and their effect is reduced while in the graph quantization method separate clusters are devoted to outliers. Accuracy Outlier ratio M. Farajtabar et al. for Online Semi-supervised Learning September 6, / 23

22 Summary and Future Works Summary Semi-supervised CG algorithm is proposed to reduce the number of data points while preserving the manifold structure. The prediction result is closely related to the eigenvectors of a variation of the stochastic matrix. Exact and approximate coarse graining algorithms. Future works: A theoretical analysis of robustness against noise. Extending the spectral view point to other manifold learning methods. Deriving tighter error bounds on CG. M. Farajtabar et al. for Online Semi-supervised Learning September 6, / 23

23 Summary and Future Works Thanks for your Attention. M. Farajtabar et al. for Online Semi-supervised Learning September 6, / 23

Efficient Iterative Semi-Supervised Classification on Manifold

Efficient Iterative Semi-Supervised Classification on Manifold Mehrdad Farajtabar, Hamid R. Rabiee, Amirreza Shaban, Ali Soltani-Farani Digital Media Lab, AICTC Research Center, Department of Computer