Non-negative Laplacian Embedding

2009 Ninth IEEE International Conference on Data Mining

Non-negative Laplacian Embedding

Dijun Luo, Chris Ding, Heng Huang
Computer Science and Engineering Department, University of Texas at Arlington, Arlington, Texas 76019

Tao Li
School of Computer Science, Florida International University, Miami, FL 33199

Abstract

Laplacian embedding provides a low-dimensional representation for a matrix of pairwise similarity data using the eigenvectors of the Laplacian matrix. The true power of Laplacian embedding is that it provides an approximation of the clustering. However, clustering requires the solution to be nonnegative. In this paper, we propose a new approach, nonnegative Laplacian embedding, which approximates clustering in a more direct way than traditional approaches. From the solution of our approach, clustering structures can be read off directly. We also propose an efficient algorithm to optimize the objective function used in our approach. Empirical studies on many real-world datasets show that our approach leads to more accurate solutions and improves clustering accuracy at the same time.

Keywords: Laplacian Embedding; Non-negative Matrix Factorization.

I. INTRODUCTION

In many real-world tasks in data mining, information retrieval, and machine learning, data are represented in a high-dimensional space, although they may intrinsically lie in a very low-dimensional one. In addition, many data come in as a matrix of pairwise similarities, such as network data and protein interaction data. Meanwhile, unlabeled data are much easier to obtain than labeled data. Thus, it is challenging and useful to develop unsupervised approaches to embed high-dimensional data into a low-dimensional space.

From the data embedding point of view, there are two categories of embedding approaches. Approaches in the first category embed data into a linear space with a linear transformation, such as principal component analysis (PCA). These approaches produce robust representations of data in a low dimension; however, they do not properly embed data that lie on a non-linear manifold. Approaches in the second category embed data in a nonlinear manner. They include IsoMAP [5], Local Linear Embedding (LLE) [], Local Tangent Space Alignment [7], etc. These embeddings have different purposes and objectives, but they can detect the nonlinear manifold on which the data lie.

The above approaches all assume that data points are represented by feature vectors (attributes). In this paper, our emphasis is on graph embedding, i.e., the relationships among data points are represented by a matrix of pairwise similarities (which are viewed as edge weights of the graph). Laplacian embedding is one of the most popular graph embedding methods. Laplacian embedding and the related usage of eigenvectors of the graph Laplace matrix were first developed in the 1970s. It was called quadratic placement [6] of graph nodes in a space. The eigenvectors of the graph Laplace matrix are used for graph partitioning and connectivity analysis [4]. This approach became popular in the 1990s for circuit layout in the VLSI community (see a review []) and for graph partitioning [] for domain decomposition, a key problem in distributed-memory computing. Laplacian embedding is now very popularly used [3], mainly due to its relation to graph clustering [5], [5], [4], [9].
In fact, the eigenvectors of the Laplace matrix provide an approximate solution of the clustering [5], and the generalized eigenvectors of the Laplace matrix provide approximate solutions of Normalized Cut clustering [4] and min-max cut clustering [9].

A Difficulty with Eigenvector Embedding

A main difficulty of using eigenvectors of the Laplace matrix to solve the multi-way clustering problem is that the eigenvectors have mixed-sign entries, while the cluster indicator vectors (that these eigenvectors approximate) are nonnegative. For two-way clustering this is not a problem, because a linear Ψ-transformation [7] of the second eigenvector (the Fiedler vector) and the first eigenvector leads to two genuine indicator vectors (vectors with positive and/or zero entries, where each row has only one nonzero entry). Because of this main difficulty, most applications resort to a two-step procedure []: (1) embed the graph into the eigenvector space (Laplacian embedding), and (2) cluster these embedded points using K-means clustering. This procedure provides an approximate solution to the clustering problem.

Nonnegative Embedding Provides a Solution

In this paper, we propose a new approach. We propose to perform the Laplacian embedding with nonnegative vectors, which can be directly interpreted as cluster membership indicator vectors. As a consequence, the nonnegative embedding also provides a more accurate solution to the clustering problem, because the solution indicators more closely resemble the desired cluster indicators.

We call this new approach the nonnegative Laplacian Embedding (NLE). NLE has the following properties. First, it optimizes the Ratio Cut function while enforcing the nonnegativity requirement rigorously. With the nonnegative representation of the cluster indicator, the embedding results can be interpreted as posterior clustering probabilities. As a result, the cluster membership can be read off from the embedding coordinates immediately (see Section IV). Second, our NLE method has soft clustering capability (see Section IX-A), where a data point can be fractionally assigned to several clusters. This capability is especially important for many real-life data that come with much noise. For such data, not every data point clearly and uniquely belongs to one cluster (pattern). This soft clustering capability is lacking in standard spectral clustering and K-means clustering.

Our approach requires a solver that optimizes a quadratic function under both orthonormality and nonnegativity constraints. The feasible domain of such an optimization problem is highly non-linear and non-convex. In this paper, we also propose an efficient algorithm to address this problem.

In the remainder of the paper, we first transform the minimization problems of the embedding (Sections II and III) into the maximization of a well-behaved positive definite function (Section IV). To generalize the problem definition, in Section V we prove that a similarity matrix (graph matrix) with mixed signs can also be used for Laplacian embedding. After that, we present the NLE algorithm (Section VI) and rigorously prove the correctness and convergence of our algorithm (Section VII) using the theory of constrained optimization. We illustrate the NLE algorithm and its capability using an example of face images in Section IX. In Section X, we perform extensive experiments on five UCI datasets [] and the AT&T face image dataset [3] to compare our NLE algorithm to the standard spectral approach. We show that our NLE algorithm consistently gives better objective function values for Laplacian embedding and the clustering objective. Meanwhile, our NLE method also improves clustering accuracy over the standard spectral approach.

Brief Summary of Major Clustering Frameworks

In essence, our line of clustering framework is to show that the clustering objective can be written as the optimization of a quadratic function with nonnegativity constraints and orthogonality constraints. If we retain orthogonality while ignoring the nonnegativity, the solution is the standard Laplacian embedding using eigenvectors; this has been the way spectral clustering has been developed so far. However, if we retain nonnegativity rigorously and enforce the orthogonality approximately, the solution is the NLE proposed in this paper. We note that this clustering framework is similar to the K-means clustering / PCA / Nonnegative Matrix Factorization (NMF) [7], [8] framework [8]. It has been shown [6], [7], [8] that the K-means clustering objective can be written as the maximization of a quadratic function with nonnegativity and orthogonality constraints; if we retain orthogonality while ignoring the nonnegativity, the solution is PCA [6], [7]. However, if we retain nonnegativity rigorously and enforce orthogonality approximately, the solution is NMF [7]. Several further developments using NMF for clustering are convex NMF [], orthogonal NMF [3], and the equivalence between NMF and probabilistic latent semantic indexing []. For recent surveys of NMF see [4], [9].
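To make the mixed-sign difficulty discussed above concrete before the formal development, here is a tiny numerical illustration (my own sketch, not from the paper): for a toy two-cluster graph, the Fiedler vector of the graph Laplacian has mixed signs, whereas the cluster indicator vectors it approximates are nonnegative.

import numpy as np

# Toy graph: nodes {0,1} form one cluster, nodes {2,3} the other,
# with strong within-cluster and weak between-cluster similarity.
W = np.array([[0.0, 1.0, 0.1, 0.1],
              [1.0, 0.0, 0.1, 0.1],
              [0.1, 0.1, 0.0, 1.0],
              [0.1, 0.1, 1.0, 0.0]])
D = np.diag(W.sum(axis=1))
L = D - W                                   # graph Laplacian

vals, vecs = np.linalg.eigh(L)              # eigenvalues in ascending order
print(vecs[:, 1])                           # Fiedler vector: mixed signs, e.g. [-.5 -.5 .5 .5]

# Normalized cluster indicators (what clustering needs): nonnegative,
# one nonzero entry per row.
H = np.column_stack([np.array([1, 1, 0, 0]) / np.sqrt(2),
                     np.array([0, 0, 1, 1]) / np.sqrt(2)])
print(H)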
II. LAPLACIAN EMBEDDING

We start with a brief introduction to Laplacian embedding. The input data is a matrix W of pairwise similarities among n objects. We view W as the edge weights on a graph with n nodes. The task is to embed the nodes of the graph into 1-D space with coordinates (x_1, ..., x_n). The objective is that if i, j are similar (i.e., w_{ij} is large), they should be adjacent in the embedded space, i.e., (x_i - x_j)^2 should be small. This can be achieved by minimizing [6]

  min_x J(x) = \sum_{ij} (x_i - x_j)^2 w_{ij} = 2 \sum_{ij} x_i (D - W)_{ij} x_j = 2 x^T (D - W) x,   (1)

where D = diag(d_1, ..., d_n) and d_i = \sum_j W_{ij}. The minimization of \sum_{ij} (x_i - x_j)^2 w_{ij} would give x_i = 0 if there were no constraint on the magnitude of the vector x. Therefore, we impose the normalization \sum_i x_i^2 = 1. The original objective function is invariant if we replace x_i by x_i + constant, so the solution is not unique. To fix this uncertainty, we can adjust the constant such that \sum_i x_i = 0 (x_i is centered around 0). Thus the x_i have mixed signs. With these two constraints,

  \sum_i x_i = 0,   \sum_i x_i^2 = 1,

the solution minimizing the embedding objective is given by the eigenvectors of

  (D - W) f = \lambda f.   (2)

The matrix L = D - W is called the graph Laplacian. This is because L is a discrete form of the Laplace operator

  \nabla^2 f(x, y, z) = ( \partial^2/\partial x^2 + \partial^2/\partial y^2 + \partial^2/\partial z^2 ) f(x, y, z).

In mathematical physics, a partial differential operator is not defined unless the boundary conditions are specified; different boundary conditions lead to different solutions. The graph Laplacian here is the discretized form of the Laplacian operator with the von Neumann boundary condition, i.e., the derivatives along the boundary are zero. (The discretized form of the Laplacian operator with the Dirichlet boundary condition has a slightly different form.)

Because of the von Neumann boundary condition, the solution is invariant w.r.t. an additive constant. As a consequence, the solution contains the constant eigenvector, the first eigenvector with eigenvalue zero (see [] for details).

Multi-dimensional Embedding

This embedding can be generalized to embedding in k-dimensional space, with coordinates r_i ∈ R^k. Let ||r_i - r_j|| be the Euclidean distance between nodes i, j. The embedding is obtained by optimizing

  min_R J(R) = \sum_{i,j=1}^n ||r_i - r_j||^2 w_{ij} = 2 \sum_{i,j=1}^n r_i^T (D - W)_{ij} r_j = 2 Tr[ R (D - W) R^T ],   (3)

where R = (r_1, ..., r_n). In order to prevent the degenerate solution R = 0, we impose the normalization constraint R R^T = I. To fix the uncertainty due to the shift invariance, we further impose the constraint \sum_i r_i = 0 (r_i is centered around 0). The solution is given by eigenvectors: R = (f_1, ..., f_k)^T. This is called spectral Laplacian embedding (spectral means using eigenvectors). Let Q = R^T ∈ R^{n×k}; the spectral Laplacian embedding can then be formally cast as an optimization problem:

  min_Q Tr[ Q^T (D - W) Q ],   s.t. Q^T Q = I.   (4)

III. RATIO CUT SPECTRAL CLUSTERING

The true power of Laplacian embedding is its clustering capability. Here we briefly outline the often neglected, but fundamentally important, relationship between spectral clustering [5] and Laplacian embedding. In fact, these two things are identical!

In clustering/partitioning a graph, the most popular objective is min-cut, which cuts the graph G into A, B such that the cross-cut similarity (weight) s(A, B) = \sum_{i ∈ A} \sum_{j ∈ B} w_{ij} is minimized. Without size balancing, the min-cut will often cut out a very small subgraph, leading to two highly unbalanced subgraphs. The first solution to this problem was developed in the circuit placement field by Cheng and Wei [6], who proposed to minimize the following ratio cut objective function:

  min_{A,B} s(A, B) / (|A| |B|) = (1/|G|) [ s(A, B)/|A| + s(A, B)/|B| ].

Note |G| is a constant and drops out. Hagen and Kahng [5] later showed that the Fiedler vector (2nd eigenvector of the graph Laplacian) provides an effective solution. Chan et al. [5] generalized this two-way clustering to multi-way Ratio Cut clustering: divide the nodes of G into K disjoint clusters {C_p} by minimizing the objective function

  J_rc = \sum_{1 ≤ p < q ≤ K} [ s(C_p, C_q)/|C_p| + s(C_p, C_q)/|C_q| ],   (5)

where s(C_k, C_l) = \sum_{i ∈ C_k} \sum_{j ∈ C_l} w_{ij} and d_i = \sum_j w_{ij}. Let h_k ∈ {0, 1}^n be an indicator vector for cluster C_k, i.e., h_k(i) = 1 if x_i belongs to cluster C_k and h_k(i) = 0 otherwise. They show that

Theorem 1: The objective can be written as

  J_rc = \sum_{1 ≤ p < q ≤ K} [ s(C_p, C_q)/|C_p| + s(C_p, C_q)/|C_q| ] = \sum_{l=1}^K h_l^T (D - W) h_l / (h_l^T h_l) = Tr( H^T (D - W) H ),   (6)

where H = (h_1/||h_1||, ..., h_K/||h_K||). The ratio cut problem becomes

  min_H Tr[ H^T (D - W) H ],   s.t. H^T H = I.   (7)

Chan et al. also discussed the embedding of this function, which is identical to the Laplacian embedding of Eq. (4) with the same orthogonality constraints. Shi and Malik [4] further developed this into normalized cut clustering. Ding et al. [9] further developed this into min-max cut clustering.

A simple and widely adopted algorithm for solving spectral clustering has two steps: (1) compute the eigenvectors of L = D - W for the Laplacian embedding; (2) do K-means clustering in the eigenspace to obtain the clusters. The second step is necessary because the eigenvector solution Q has mixed signs and the clusters cannot be identified directly. This is a generic difficulty of multi-way spectral clustering.
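For reference, the following is a minimal sketch of this two-step procedure (my own illustration, assuming a dense symmetric similarity matrix W and that scikit-learn is available; it is not code from the paper): Laplacian embedding by the K smallest eigenvectors of L = D - W, followed by K-means in the embedded space.

import numpy as np
from sklearn.cluster import KMeans

def spectral_ratio_cut_clustering(W, K, random_state=0):
    """Two-step spectral clustering: Laplacian embedding, then K-means."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    vals, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    Q = vecs[:, :K]                         # embedding by the K smallest eigenvectors (mixed signs)
    labels = KMeans(n_clusters=K, n_init=10,
                    random_state=random_state).fit_predict(Q)
    return Q, labels

The K-means step is exactly the post-processing that NLE removes: because the eigenvector embedding Q has mixed signs, cluster labels cannot be read off from it directly.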
IV. NONNEGATIVE LAPLACIAN EMBEDDING

In all previous work on spectral clustering, the nonnegativity of the cluster indicator H is ignored. On the other hand, a nonnegative solution obtained by enforcing the constraint H ≥ 0 has two direct benefits: (1) we can obtain cluster assignments directly; (2) we obtain a more accurate solution because the nonnegative solution resembles the desired cluster indicators. In this paper, we propose the Nonnegative Laplacian Embedding (NLE) approach. In NLE, we rigorously enforce the nonnegativity constraint. The most important benefit of nonnegative embedding is that the cluster membership can be read off from the solution Q immediately: x_i belongs to the cluster C_k, where k corresponds to the largest component in the i-th row of Q,

  k = \arg\max_{1 ≤ j ≤ K} Q_{ij}.   (8)

In fact, we may view the i-th row of Q as the posterior probability that object i belongs to the different clusters.
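A minimal sketch of this read-off (my own illustration, assuming Q is the nonnegative n x K embedding produced by NLE): hard assignments come from the row-wise argmax of Eq. (8), and row-normalizing Q gives the posterior-like cluster weights mentioned above.

import numpy as np

def read_off_clusters(Q):
    """Hard cluster assignments from a nonnegative embedding Q (n x K), as in Eq. (8)."""
    return np.argmax(Q, axis=1)

def posterior_like_weights(Q, eps=1e-12):
    """Row-normalize Q so each row sums to 1, interpretable as soft cluster memberships."""
    return Q / (Q.sum(axis=1, keepdims=True) + eps)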

Formally, the optimization of Eq. (4) is identical to

  max_Q Tr[ Q^T (W - D + \sigma I) Q ],   s.t. Q^T Q = I,  Q ≥ 0,   (9)

because the \sigma term, Tr(Q^T \sigma I Q) = \sigma Tr(Q^T Q), is a constant under the constraint Q^T Q = I. We set \sigma = \lambda_max, the largest eigenvalue of L = D - W. Then W - D + \sigma I is positive definite, because W - D + \sigma I = \sum_{k=1}^n (\sigma - \lambda_k) v_k v_k^T, where (\lambda_k, v_k) are the eigenvalue/eigenvector pairs of L. This two-step transformation (changing min to max and making the objective positive definite) turns the optimization into a well-behaved problem. The algorithm to solve Eq. (9) is provided in Section VI.

V. LAPLACIAN EMBEDDING WITH A MIXED-SIGN SIMILARITY MATRIX

In traditional Laplacian embedding, the graph matrix (i.e., the similarity matrix) is required to be non-negative. Here we show that a similarity matrix with mixed signs can also be used for Laplacian embedding, as well as NLE. Let W^+ and W^- be the positive and negative parts of W, respectively: W = W^+ - W^-. For the positive part, we want to minimize the embedding distance so that similar instances are embedded close to each other,

  min_x \sum_{i,j} w^+_{ij} (x_i - x_j)^2.

But for the negative part, we maximize the embedding distance so that dissimilar instances are embedded far apart,

  max_x \sum_{i,j} w^-_{ij} (x_i - x_j)^2.

We can combine them by minimizing the difference,

  min_x \sum_{i,j} (w^+_{ij} - w^-_{ij}) (x_i - x_j)^2 = \sum_{i,j} w_{ij} (x_i - x_j)^2.

Here we show that the similarity matrix can be shifted by any constant.

Theorem 2: If q is a non-trivial eigenvector of the graph Laplacian on similarity W, then q is also an eigenvector of the graph Laplacian on similarity W + \sigma E, where \sigma is any constant and E is the all-ones matrix of proper size.

Obviously e (a single column with all ones) is an eigenvector of any graph Laplacian, with corresponding eigenvalue 0. Here, by non-trivial eigenvector, we mean those eigenvectors which are not e.

Proof. Since q is a non-trivial eigenvector of the graph Laplacian on similarity W, (D - W) q = \lambda q. If the similarity matrix is shifted by a constant, W' = W + \sigma E, then the corresponding graph Laplacian becomes L' = D' - W' = (D + n\sigma I) - (W + \sigma E). Notice that all non-trivial eigenvectors are orthogonal to the trivial eigenvector e, so E q = 0 and

  L' q = [ (D + n\sigma I) - (W + \sigma E) ] q = (D - W) q + n\sigma q - \sigma E q = (\lambda + n\sigma) q,   (10)

which indicates that q is also an eigenvector of L'. Theorem 2 suggests that for any mixed-sign similarity matrix, we can add a constant such that the similarity matrix becomes nonnegative, without changing the eigenvectors (i.e., the embedding results remain the same).

VI. SOLVING NLE PROBLEMS

Inspired by NMF algorithms, we solve the NLE problem of Eq. (9) using similar techniques; see Section VIII for a discussion of the relationship with NMF.

A. NLE algorithm

The algorithm starts with an initial guess Q (Q ≥ 0). It then iteratively updates Q until convergence using the updating rule

  Q_{ik} <- Q_{ik} sqrt( [ (W + \sigma I) Q + Q \Lambda^- ]_{ik} / [ D Q + Q \Lambda^+ ]_{ik} ),   (11)

where

  \Lambda = Q^T (W + \sigma I - D) Q,   (12)

and \Lambda^+ is the positive part of \Lambda, and similarly for \Lambda^-. (A small code sketch of this update is given at the end of this section.) Notice that the feasible domain of Eq. (9) is non-convex, indicating that our algorithm can only reach local solutions. However, we show in the empirical study, with a statistical analysis over a large number of random trials, that our algorithm yields a much better Ratio Cut objective than standard spectral clustering.

B. Computational complexity analysis

In the typical implementation of the NLE algorithm, the computational complexity is O(n^2 K) per iteration [the bottleneck is the computation of \Lambda in Eq. (12)], which is not suitable for large-scale problems. However, one can easily incorporate approximate decompositions, such as the Nyström decomposition, to reduce the cost to O(nK^2).
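As referenced in Section VI-A, here is a small sketch of the NLE solver (my own illustration based on Eqs. (9), (11), and (12); the variable names, the convergence test, and the tolerance are my own choices, not from the paper). It assumes a dense symmetric nonnegative similarity matrix W and a nonnegative starting matrix Q0, e.g. a perturbed spectral clustering indicator as used in the experiments.

import numpy as np

def nle(W, Q0, max_iter=500, tol=1e-6, eps=1e-12):
    """Nonnegative Laplacian Embedding via the multiplicative update of Eq. (11)."""
    D = np.diag(W.sum(axis=1))
    sigma = np.linalg.eigvalsh(D - W)[-1]      # largest eigenvalue of L = D - W
    A = W + sigma * np.eye(W.shape[0])         # W + sigma*I, so that A - D is positive (semi)definite
    Q = Q0.astype(float).copy()
    prev_obj = -np.inf
    for _ in range(max_iter):
        Lam = Q.T @ (A - D) @ Q                # Eq. (12): Lambda = Q^T (W + sigma*I - D) Q
        Lam_pos = np.maximum(Lam, 0.0)         # positive part Lambda^+
        Lam_neg = np.maximum(-Lam, 0.0)        # negative part Lambda^-
        numer = A @ Q + Q @ Lam_neg
        denom = D @ Q + Q @ Lam_pos + eps
        Q = Q * np.sqrt(numer / denom)         # multiplicative update, Eq. (11)
        obj = np.trace(Q.T @ (A - D) @ Q)      # objective of Eq. (9); should not decrease
        if obj - prev_obj < tol * max(abs(obj), 1.0):
            break
        prev_obj = obj
    return Q

# Cluster labels can then be read off as in Eq. (8): labels = Q.argmax(axis=1).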
VII. ANALYSIS OF NLE ALGORITHM

In this section, we show the correctness and convergence of our algorithm. By correctness, we mean that the update yields a correct solution at convergence. The correctness of our algorithm is assured by the following theorem.

Theorem 3: Fixed points of Eq. (11) satisfy the KKT condition of the optimization problem of Eq. (7).

Proof. We begin with the Lagrangian

  L = Tr[ Q^T (W + \sigma I - D) Q - \Lambda (Q^T Q - I) - \Sigma Q ],   (13)

where the Lagrange multiplier \Lambda enforces the orthogonality condition Q^T Q = I and the Lagrange multiplier \Sigma enforces the nonnegativity of Q.

The KKT complementary slackness condition (\partial L / \partial Q_{ik}) Q_{ik} = 0 becomes

  [ (W + \sigma I - D) Q - Q \Lambda ]_{ik} Q_{ik} = 0.   (14)

Clearly, a fixed point of the update rule Eq. (11) satisfies [ (W + \sigma I - D) Q - Q \Lambda ]_{ik} Q_{ik} = 0, which is mathematically identical to Eq. (14). From Eq. (14), summing over i, we obtain \Lambda_{kk} = [ Q^T (W + \sigma I - D) Q ]_{kk}. To find the off-diagonal elements of \Lambda, we ignore the nonnegativity requirement and set \partial L / \partial Q = 0, which leads to \Lambda_{kl} = [ Q^T (W + \sigma I - D) Q ]_{kl}. Combining these two results gives \Lambda = Q^T (W + \sigma I - D) Q as in Eq. (12), so a fixed point of the update satisfies the KKT conditions of Eq. (9).

The convergence of our algorithm is assured by the following theorem.

Theorem 4: Under the update rule of Eq. (11), the Lagrangian function

  L = Tr[ Q^T (W + \sigma I - D) Q - \Lambda (Q^T Q - I) ],   (15)

increases monotonically.

Proof of Theorem 4. We use the auxiliary function approach [8]. An auxiliary function Z(H, \tilde{H}) of a function L(H) satisfies Z(H, H) = L(H) and Z(H, \tilde{H}) ≤ L(H). We define

  H^{(t+1)} = \arg\max_H Z(H, H^{(t)}).   (16)

Then by construction we have

  L(H^{(t)}) = Z(H^{(t)}, H^{(t)}) ≤ Z(H^{(t+1)}, H^{(t)}) ≤ L(H^{(t+1)}).   (17)

This proves that L(H^{(t)}) is monotonically increasing. The key steps in the remainder of the proof are: (1) find an appropriate auxiliary function; (2) find the global maximum of the auxiliary function.

Writing H for Q and dropping the constant Tr \Lambda, we write Eq. (15) as L = Tr[ H^T (W + \sigma I) H + \Lambda^- H^T H - H^T D H - \Lambda^+ H^T H ]. We can show that one auxiliary function of L is

  Z(H, \tilde{H}) = \sum_{ijk} (W + \sigma I)_{ij} \tilde{H}_{ik} \tilde{H}_{jk} ( 1 + \log [ H_{ik} H_{jk} / ( \tilde{H}_{ik} \tilde{H}_{jk} ) ] )
                  + \sum_{ikl} (\Lambda^-)_{kl} \tilde{H}_{ik} \tilde{H}_{il} ( 1 + \log [ H_{ik} H_{il} / ( \tilde{H}_{ik} \tilde{H}_{il} ) ] )
                  - \sum_{ik} (D \tilde{H})_{ik} H_{ik}^2 / \tilde{H}_{ik}
                  - \sum_{ik} (\tilde{H} \Lambda^+)_{ik} H_{ik}^2 / \tilde{H}_{ik},   (18)

using the inequality z ≥ 1 + \log z, with z = H_{ik} H_{jk} / ( \tilde{H}_{ik} \tilde{H}_{jk} ), and the generic inequality

  \sum_{i=1}^n \sum_{p=1}^k (A \tilde{S} B)_{ip} S_{ip}^2 / \tilde{S}_{ip} ≥ Tr( S^T A S B ),   (19)

where A, B, S, \tilde{S} > 0, A = A^T, B = B^T.

We now find the global maximum of Z(H, \tilde{H}) as a function of H. The gradient is

  \partial Z(H, \tilde{H}) / \partial H_{ik} = 2 [ (W + \sigma I) \tilde{H} ]_{ik} \tilde{H}_{ik} / H_{ik} + 2 ( \tilde{H} \Lambda^- )_{ik} \tilde{H}_{ik} / H_{ik} - 2 ( D \tilde{H} )_{ik} H_{ik} / \tilde{H}_{ik} - 2 ( \tilde{H} \Lambda^+ )_{ik} H_{ik} / \tilde{H}_{ik}.   (20)

The second derivative,

  \partial^2 Z(H, \tilde{H}) / ( \partial H_{ik} \partial H_{jl} ) = - \tilde{W}_{ik} \delta_{ij} \delta_{kl},   (21)

with

  \tilde{W}_{ik} = 2 ( [ (W + \sigma I) \tilde{H} ]_{ik} + ( \tilde{H} \Lambda^- )_{ik} ) \tilde{H}_{ik} / H_{ik}^2 + 2 ( ( D \tilde{H} )_{ik} + ( \tilde{H} \Lambda^+ )_{ik} ) / \tilde{H}_{ik},

is negative definite. Thus Z(H) is a concave function in H and has a unique global maximum. This maximum is obtained by setting the first derivative to zero, yielding

  H_{ik}^2 = \tilde{H}_{ik}^2 [ ( (W + \sigma I) \tilde{H} )_{ik} + ( \tilde{H} \Lambda^- )_{ik} ] / [ ( D \tilde{H} )_{ik} + ( \tilde{H} \Lambda^+ )_{ik} ].   (22)

According to Eq. (16), with H^{(t+1)} = H and H^{(t)} = \tilde{H}, we see that Eq. (22) is the update rule of Eq. (11). Thus Eq. (17) always holds.

VIII. RELATIONSHIP WITH NMF

Nonnegative Laplacian Embedding is inspired by the idea of NMF. Here we show that these two methods are connected.

Theorem 5: Eq. (9) is equivalent to the following:

  min_Q || (W - D + \sigma I) - Q Q^T ||^2,   s.t. Q^T Q = I,  Q ≥ 0.   (23)

Proof.

  || (W - D + \sigma I) - Q Q^T ||^2 = || W - D + \sigma I ||^2 - 2 Tr[ (W - D + \sigma I) Q Q^T ] + || Q Q^T ||^2.

Since || W - D + \sigma I ||^2 and || Q Q^T ||^2 are constant (under the constraint Q^T Q = I), Eq. (23) is equivalent to min_Q [ - Tr (W - D + \sigma I) Q Q^T ], or max_Q Tr Q^T (W - D + \sigma I) Q, with the same constraints, which is identical to Eq. (9).
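As a quick numerical sanity check of Theorem 5 (my own sketch, not from the paper), the following verifies that for any orthonormal Q the Frobenius objective of Eq. (23) differs from the trace objective of Eq. (9) only by a constant, so the two problems share the same optima.

import numpy as np

rng = np.random.default_rng(0)
n, K = 8, 3
M = rng.standard_normal((n, n))
M = (M + M.T) / 2                                 # stands in for W - D + sigma*I (symmetric)
Q, _ = np.linalg.qr(rng.standard_normal((n, K)))  # any Q with orthonormal columns, Q^T Q = I

frob = np.linalg.norm(M - Q @ Q.T, 'fro') ** 2    # objective of Eq. (23)
trace = np.trace(Q.T @ M @ Q)                     # objective of Eq. (9)
const = np.linalg.norm(M, 'fro') ** 2 + K         # ||M||_F^2 + ||Q Q^T||_F^2
assert np.isclose(frob, const - 2.0 * trace)      # minimizing (23) <=> maximizing (9)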

Figure 1. Face images selected from the AT&T face database. In the top three rows (one person per row), each person has ten images with different expressions. In the fourth row, the ten images come from ten different people.

IX. ILLUSTRATION EXAMPLE

We illustrate the nonnegative Laplacian embedding using a simple dataset of 30 images from the AT&T face database [3] (see the first three rows of Fig. 1). Each person has 10 images with different expressions. In the standard way, we reshape each image into a single vector to represent the image. For this experiment, since the pixel values of the images are non-negative, we use the inner product (w_{ij} = x_i^T x_j) of two images to compute the similarity; an advantage of inner-product similarity is that there is no adjustable parameter.

We start the NLE algorithm with a random matrix Q, Q ≥ 0. We show the NLE embedding results at the 1st iteration and at three later checkpoints (see Fig. 2). The objective function value is also shown on the y-axis. For each checkpoint, we use a 3D plot to show all 30 images (each image as a point) with the first, second, and third row of Q as the x-, y-, and z-axis. Because we impose both non-negativity and near-orthogonality constraints on Q, all the data points lie near the positive parts of the axes. From Fig. 2, we notice that the clustering structure becomes more and more clear as the objective function value increases.

A. Soft clustering capability of NLE

In traditional spectral clustering, a data point must belong to exactly one of the clusters; this is hard clustering. However, such hard clustering sometimes prevents us from detecting delicate cluster structure details in complex data. For example, in Fig. 1, we may add 10 images from 10 other persons (shown as the bottom row) to the 30 images on the top. Traditional spectral clustering will assign these images to one of the 3 clusters. However, these images do not belong to the three existing clusters. Ideally, the clustering solution would exhibit this fact. We now demonstrate that this fact is revealed in our NLE approach.

Our NLE has the soft clustering capability, i.e., the solution Q can be viewed as the posterior probability of each object being assigned to each cluster. The NLE solution Q = [q_1, q_2, q_3] is shown as a Hinton diagram (see Fig. 3). In the figure, the face image index i is sorted as follows: i = 1–10 for the images shown in the 1st row of Fig. 1, i = 11–20 for the images in the 2nd row, i = 21–30 for the images in the 3rd row, and i = 31–40 for the images in the 4th row. We plot the elements of the solution Q as rectangles whose sizes denote the values of the corresponding elements. We see from Fig. 3 that for the first 30 images, one of the q_k is very pronounced and the other components are negligible: the cluster distribution/assignment is very clear. For the last 10 images, none of them is clearly clustered into any cluster, indicating the soft clustering nature for these images. These images are outliers in this dataset, and our NLE algorithm can correctly detect them.
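A small sketch of this setup (my own illustration; the flattening step, variable names, and the outlier threshold are my own choices, not from the paper): build the parameter-free inner-product similarity from vectorized images, run NLE, and inspect the row-normalized solution, where rows without a pronounced component indicate softly assigned outliers.

import numpy as np

def inner_product_similarity(images):
    """images: array of shape (n, height, width) with nonnegative pixel values."""
    X = images.reshape(len(images), -1).astype(float)   # one row vector per image
    return X @ X.T                                       # w_ij = x_i^T x_j, no parameter to tune

# Usage outline, assuming the `nle` sketch from Section VI and 3 clusters:
#   W = inner_product_similarity(images)
#   Q0 = np.abs(np.random.default_rng(0).standard_normal((W.shape[0], 3)))
#   Q = nle(W, Q0)
#   P = Q / Q.sum(axis=1, keepdims=True)           # posterior-like soft memberships
#   outliers = np.where(P.max(axis=1) < 0.5)[0]    # rows with no pronounced component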

Figure 3. Soft clustering of NLE. q_1(i), q_2(i), q_3(i) are shown as 3 rows of a Hinton diagram over the image index i = 1–40 (x-axis) for the 40 images in Fig. 1, where i = 31–40 correspond to the images in the 4th row of Fig. 1.

Figure 2. NLE results on the top 30 face images of Figure 1 at different iterations. The objective function values of Tr Q^T (W + \sigma I - D) Q are shown on the y-axis. For each checkpoint, we use a 3D plot to show all 30 images (each image as a point) with the first, second, and third row of Q as the x-, y-, and z-axis.

X. EXPERIMENTS ON UCI DATASETS

We evaluate the performance of our NLE algorithm on 4 UCI datasets []: Dermatology, Soybean, Vehicle, and Zoo. In the experiments, our goal is to compare with the standard spectral approach (as explained in the last paragraph of Section III). Therefore, we initialize Q using the clustering solution of standard spectral clustering: H is set to the cluster indicator and Q = H + c, for a small positive constant c, is used as the starting point.

In the evaluation, we use clustering accuracy. Suppose we have N = n_1 + n_2 + ... + n_K data objects (n_k are known/observed to belong to class F_k, etc.). They are clustered into K clusters C_1, ..., C_K, with m_k = |C_k|. This forms a contingency table T = (T_{kl}), where T_{kl} denotes the number of objects from class F_k that have been clustered into cluster C_l. Clearly, \sum_l T_{kl} = n_k and \sum_k T_{kl} = m_l. The clustering accuracy is the percentage of objects correctly clustered: \rho = \sum_k T_{kk} / N. In practice, the matching of F_k to C_l is obtained by running the Hungarian algorithm for the optimal bipartite matching (a short sketch of this computation is given below).

A. Evolution of NLE algorithm

In Figs. 4 and 5, we show the NLE evolution of two typical runs on two UCI datasets (dermatology and zoo). The initial Q is set to the spectral clustering result, as explained above. We observe that the NLE objective function value increases steadily as the iterations proceed. The clustering accuracy also improves with more iterations. These facts indicate that the clustering quality improves as the objective function value increases.

B. Comparison with spectral clustering

Table I. Experimental setup details on the UCI and AT&T datasets (columns: Dataset, #samples, #features, #classes; rows: Dermatology, Glass, Soybean, Vehicle, Zoo, AT&T).

We perform an extensive evaluation of both NLE and spectral clustering on the 5 UCI datasets and the AT&T dataset (see Table I for experimental setup details). We note that the standard spectral clustering results on a dataset are not deterministic, because the results of K-means on the eigenspace (the spectral Laplacian embedding) depend sensitively on the initialization.
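As referenced above, here is a minimal sketch of the accuracy computation (my own illustration, not code from the paper; it assumes integer-coded class and cluster labels and uses SciPy's linear_sum_assignment for the Hungarian matching).

import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(true_labels, cluster_labels):
    """Best-match clustering accuracy from the contingency table, via the Hungarian algorithm."""
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    classes = np.unique(true_labels)
    clusters = np.unique(cluster_labels)
    # Contingency table T[k, l]: number of objects of class k assigned to cluster l.
    T = np.zeros((len(classes), len(clusters)), dtype=int)
    for k, f in enumerate(classes):
        for l, c in enumerate(clusters):
            T[k, l] = np.sum((true_labels == f) & (cluster_labels == c))
    rows, cols = linear_sum_assignment(-T)     # maximize the total matched count
    return T[rows, cols].sum() / len(true_labels)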

Figure 4. NLE objective function value and clustering accuracy versus the number of iterations on the dermatology dataset. The accuracy starts from the spectral clustering value and improves with more NLE iterations.

Figure 5. NLE objective function value and clustering accuracy versus the number of iterations on the zoo dataset. The accuracy starts from the spectral clustering value and improves with more NLE iterations.

For this reason, we perform 4 runs of K-means clustering on the eigenspace for each dataset. We also perform 4 NLE computations, each of them initialized from the spectral solution. We evaluate the performance as follows. Define Best(N) to be the lowest objective among N random trials, for both approaches (spectral clustering and NLE). Clearly, Best(N) improves (decreases) as we increase N. The results of the experiments for different N are shown in Figure 6. [At smaller N, the results are averaged over multiple N-interval runs.] The objectives are shown on the right of Figures 6 (a)-(f). We compare the clustering accuracy using the same strategy (shown on the left of Figures 6 (a)-(f)). For the objective, the best (minimum) value is subtracted from the original objective. On all 6 datasets, the NLE results are consistently better than spectral clustering on average, in terms of both the Ratio Cut objective and the clustering accuracy.

In Table II, we show the objective function value and the corresponding clustering accuracy, picking the best result of the 4 runs (here, the best means the lowest objective function value, because this is unsupervised learning). For all 4 datasets, NLE consistently gives a lower (better) objective function value and a higher clustering accuracy.

XI. CONCLUSION

In this paper, we propose a Nonnegative Laplacian Embedding (NLE) algorithm and prove the correctness and convergence of the algorithm. NLE gives nonnegative embedding results from which the clustering structures of the data can be read off immediately. A computationally efficient algorithm is developed to solve the proposed NLE problems. Moreover, we prove that a similarity matrix (i.e., graph matrix) with mixed signs can also be used for Laplacian embedding. We demonstrate the cluster assignment advantage and the soft-clustering capability of the NLE algorithm through illustrations on face expression data and extensive experiments on five UCI datasets and one image dataset. Our approach consistently outperforms spectral clustering in terms of both the Ratio Cut objective and clustering accuracy.

Acknowledgment. This work is supported partially by NSF DMS and NSF CCF-8378 at UTA, and NSF DMS and IIS-5468 at FIU.

REFERENCES

[1] C.J. Alpert and A.B. Kahng. Recent directions in netlist partitioning: a survey. Integration, the VLSI Journal, 19:1–81, 1995.
[2] A. Asuncion and D. Newman. UCI machine learning repository, 2007.
[3] M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. NIPS, 2001.
[4] M. Berry, M. Browne, A. Langville, P. Pauca, and R. Plemmons. Algorithms and applications for approximate nonnegative matrix factorization. To appear in Computational Statistics and Data Analysis, 2006.
[5] P.K. Chan, M. Schlag, and J.Y. Zien. Spectral k-way ratio-cut partitioning and clustering. IEEE Trans. CAD-Integrated Circuits and Systems, 13:1088–1096, 1994.
[6] C.-K. Cheng and Y.A. Wei. An improved two-way partitioning algorithm with stable performance. IEEE Trans. on Computer-Aided Design, 10:1502–1511, 1991.
[7] C. Ding and X. He. K-means clustering and principal component analysis. Int'l Conf. Machine Learning (ICML), 2004.

Table II. Average (Ave) and best Ratio Cut objective function value and clustering accuracy of standard spectral clustering (SpecClus) and NLE over 4 random trials, on the Dermatology, Glass, Soybean, Vehicle, Zoo, and AT&T datasets (columns: Objective and Clustering accuracy, each with SpecClus/NLE sub-columns and Ave/Best entries). For the objective, lower is better; for the clustering accuracy, higher is better.

[8] C. Ding, X. He, and H.D. Simon. On the equivalence of nonnegative matrix factorization and spectral clustering. Proc. SIAM Data Mining Conf., 2005.
[9] C. Ding, X. He, H. Zha, M. Gu, and H. Simon. A min-max cut algorithm for graph partitioning and data clustering. Proc. IEEE Int'l Conf. Data Mining (ICDM), pages 107–114, 2001.
[10] C. Ding, T. Li, and W. Peng. Nonnegative matrix factorization and probabilistic latent semantic indexing: Equivalence, chi-square statistic, and a hybrid method. Proc. National Conf. Artificial Intelligence (AAAI), 2006.
[11] Chris Ding, Rong Jin, Tao Li, and Horst D. Simon. A learning framework using Green's function and kernel regularization with application to recommender system. In KDD, 2007.
[12] Chris Ding, Tao Li, and Michael I. Jordan. Convex and semi-nonnegative matrix factorizations. IEEE Trans. Pattern Analysis and Machine Intelligence, 2009.
[13] Chris Ding, Tao Li, Wei Peng, and Haesun Park. Orthogonal nonnegative matrix tri-factorizations for clustering. Proc. Int'l Conf. on Knowledge Discovery and Data Mining (KDD 2006).
[14] M. Fiedler. Algebraic connectivity of graphs. Czech. Math. J., 23:298–305, 1973.
[15] L. Hagen and A.B. Kahng. New spectral methods for ratio cut partitioning and clustering. IEEE Trans. on Computer-Aided Design, 11:1074–1085, 1992.
[16] K.M. Hall. An r-dimensional quadratic placement algorithm. Management Science, 17:219–229, 1970.
[17] D.D. Lee and H.S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401:788–791, 1999.
[18] D.D. Lee and H.S. Seung. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems 13. MIT Press, Cambridge, MA, 2001.
[19] Tao Li and Chris Ding. The relationships among various nonnegative matrix factorization methods for clustering. In ICDM, pages 362–371, 2006.
[20] A.Y. Ng, M.I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. Proc. Neural Information Processing Systems (NIPS 2001).
[21] A. Pothen, H.D. Simon, and K.P. Liou. Partitioning sparse matrices with eigenvectors of graphs. SIAM Journal on Matrix Analysis and Applications, 11:430–452, 1990.
[22] S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323–2326, 2000.
[23] Ferdinando Samaria and Andy Harter. Parameterisation of a stochastic model for human face identification, 1994.
[24] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. on Pattern Analysis and Machine Intelligence, 22:888–905, 2000.
[25] J.B. Tenenbaum, V. de Silva, and J.C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319–2323, 2000.
[26] H. Zha, C. Ding, M. Gu, X. He, and H.D. Simon. Spectral relaxation for K-means clustering. Advances in Neural Information Processing Systems 14 (NIPS 2001), pages 1057–1064.
[27] Z. Zhang and Z. Zha. Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM J. Scientific Computing, 26:313–338, 2004.

Figure 6. Clustering accuracy (left) and objective (right) on six datasets, (A) Dermatology, (B) Glass, (C) Soybean, (D) Vehicle, (E) Zoo, and (F) AT&T, for spectral clustering (SpecClus) and our method (NLE). Each panel plots the highest clustering accuracy and the lowest objective of N trials against log N. For the clustering accuracy, the higher the better; for the objective, the lower the better.


Machine Learning (BSMC-GA 4439) Wenke Liu

Machine Learning (BSMC-GA 4439) Wenke Liu Machine Learning (BSMC-GA 4439) Wenke Liu 02-01-2018 Biomedical data are usually high-dimensional Number of samples (n) is relatively small whereas number of features (p) can be large Sometimes p>>n Problems

More information

Focus was on solving matrix inversion problems Now we look at other properties of matrices Useful when A represents a transformations.

Focus was on solving matrix inversion problems Now we look at other properties of matrices Useful when A represents a transformations. Previously Focus was on solving matrix inversion problems Now we look at other properties of matrices Useful when A represents a transformations y = Ax Or A simply represents data Notion of eigenvectors,

More information

Nonlinear Methods. Data often lies on or near a nonlinear low-dimensional curve aka manifold.

Nonlinear Methods. Data often lies on or near a nonlinear low-dimensional curve aka manifold. Nonlinear Methods Data often lies on or near a nonlinear low-dimensional curve aka manifold. 27 Laplacian Eigenmaps Linear methods Lower-dimensional linear projection that preserves distances between all

More information

A Local Non-Negative Pursuit Method for Intrinsic Manifold Structure Preservation

A Local Non-Negative Pursuit Method for Intrinsic Manifold Structure Preservation Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence A Local Non-Negative Pursuit Method for Intrinsic Manifold Structure Preservation Dongdong Chen and Jian Cheng Lv and Zhang Yi

More information

Finding normalized and modularity cuts by spectral clustering. Ljubjana 2010, October

Finding normalized and modularity cuts by spectral clustering. Ljubjana 2010, October Finding normalized and modularity cuts by spectral clustering Marianna Bolla Institute of Mathematics Budapest University of Technology and Economics marib@math.bme.hu Ljubjana 2010, October Outline Find

More information

Connection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis

Connection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis Connection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis Alvina Goh Vision Reading Group 13 October 2005 Connection of Local Linear Embedding, ISOMAP, and Kernel Principal

More information

Spectral Graph Theory and its Applications. Daniel A. Spielman Dept. of Computer Science Program in Applied Mathematics Yale Unviersity

Spectral Graph Theory and its Applications. Daniel A. Spielman Dept. of Computer Science Program in Applied Mathematics Yale Unviersity Spectral Graph Theory and its Applications Daniel A. Spielman Dept. of Computer Science Program in Applied Mathematics Yale Unviersity Outline Adjacency matrix and Laplacian Intuition, spectral graph drawing

More information

A Modified Incremental Principal Component Analysis for On-Line Learning of Feature Space and Classifier

A Modified Incremental Principal Component Analysis for On-Line Learning of Feature Space and Classifier A Modified Incremental Principal Component Analysis for On-Line Learning of Feature Space and Classifier Seiichi Ozawa 1, Shaoning Pang 2, and Nikola Kasabov 2 1 Graduate School of Science and Technology,

More information

Multiscale Manifold Learning

Multiscale Manifold Learning Multiscale Manifold Learning Chang Wang IBM T J Watson Research Lab Kitchawan Rd Yorktown Heights, New York 598 wangchan@usibmcom Sridhar Mahadevan Computer Science Department University of Massachusetts

More information

AN ALTERNATING MINIMIZATION ALGORITHM FOR NON-NEGATIVE MATRIX APPROXIMATION

AN ALTERNATING MINIMIZATION ALGORITHM FOR NON-NEGATIVE MATRIX APPROXIMATION AN ALTERNATING MINIMIZATION ALGORITHM FOR NON-NEGATIVE MATRIX APPROXIMATION JOEL A. TROPP Abstract. Matrix approximation problems with non-negativity constraints arise during the analysis of high-dimensional

More information

Data Mining. Dimensionality reduction. Hamid Beigy. Sharif University of Technology. Fall 1395

Data Mining. Dimensionality reduction. Hamid Beigy. Sharif University of Technology. Fall 1395 Data Mining Dimensionality reduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 42 Outline 1 Introduction 2 Feature selection

More information

Non-negative matrix factorization with fixed row and column sums

Non-negative matrix factorization with fixed row and column sums Available online at www.sciencedirect.com Linear Algebra and its Applications 9 (8) 5 www.elsevier.com/locate/laa Non-negative matrix factorization with fixed row and column sums Ngoc-Diep Ho, Paul Van

More information

THE HIDDEN CONVEXITY OF SPECTRAL CLUSTERING

THE HIDDEN CONVEXITY OF SPECTRAL CLUSTERING THE HIDDEN CONVEXITY OF SPECTRAL CLUSTERING Luis Rademacher, Ohio State University, Computer Science and Engineering. Joint work with Mikhail Belkin and James Voss This talk A new approach to multi-way

More information

HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH

HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH Hoang Trang 1, Tran Hoang Loc 1 1 Ho Chi Minh City University of Technology-VNU HCM, Ho Chi

More information

Discriminative Direction for Kernel Classifiers

Discriminative Direction for Kernel Classifiers Discriminative Direction for Kernel Classifiers Polina Golland Artificial Intelligence Lab Massachusetts Institute of Technology Cambridge, MA 02139 polina@ai.mit.edu Abstract In many scientific and engineering

More information

Data dependent operators for the spatial-spectral fusion problem

Data dependent operators for the spatial-spectral fusion problem Data dependent operators for the spatial-spectral fusion problem Wien, December 3, 2012 Joint work with: University of Maryland: J. J. Benedetto, J. A. Dobrosotskaya, T. Doster, K. W. Duke, M. Ehler, A.

More information

Data Analysis and Manifold Learning Lecture 9: Diffusion on Manifolds and on Graphs

Data Analysis and Manifold Learning Lecture 9: Diffusion on Manifolds and on Graphs Data Analysis and Manifold Learning Lecture 9: Diffusion on Manifolds and on Graphs Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inrialpes.fr http://perception.inrialpes.fr/ Outline of Lecture

More information

Local Learning Regularized Nonnegative Matrix Factorization

Local Learning Regularized Nonnegative Matrix Factorization Local Learning Regularized Nonnegative Matrix Factorization Quanquan Gu Jie Zhou State Key Laboratory on Intelligent Technology and Systems Tsinghua National Laboratory for Information Science and Technology

More information

A Unifying Approach to Hard and Probabilistic Clustering

A Unifying Approach to Hard and Probabilistic Clustering A Unifying Approach to Hard and Probabilistic Clustering Ron Zass and Amnon Shashua School of Engineering and Computer Science, The Hebrew University, Jerusalem 91904, Israel Abstract We derive the clustering

More information

Nonnegative Matrix Tri-Factorization Based High-Order Co-Clustering and Its Fast Implementation

Nonnegative Matrix Tri-Factorization Based High-Order Co-Clustering and Its Fast Implementation 2011 11th IEEE International Conference on Data Mining Nonnegative Matrix Tri-Factorization Based High-Order Co-Clustering and Its Fast Implementation Hua Wang, Feiping Nie, Heng Huang, Chris Ding Department

More information

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin 1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)

More information

Dimension Reduction Techniques. Presented by Jie (Jerry) Yu

Dimension Reduction Techniques. Presented by Jie (Jerry) Yu Dimension Reduction Techniques Presented by Jie (Jerry) Yu Outline Problem Modeling Review of PCA and MDS Isomap Local Linear Embedding (LLE) Charting Background Advances in data collection and storage

More information

From graph to manifold Laplacian: The convergence rate

From graph to manifold Laplacian: The convergence rate Appl. Comput. Harmon. Anal. 2 (2006) 28 34 www.elsevier.com/locate/acha Letter to the Editor From graph to manifold Laplacian: The convergence rate A. Singer Department of athematics, Yale University,

More information

Graphs, Geometry and Semi-supervised Learning

Graphs, Geometry and Semi-supervised Learning Graphs, Geometry and Semi-supervised Learning Mikhail Belkin The Ohio State University, Dept of Computer Science and Engineering and Dept of Statistics Collaborators: Partha Niyogi, Vikas Sindhwani In

More information

Bi-stochastic kernels via asymmetric affinity functions

Bi-stochastic kernels via asymmetric affinity functions Bi-stochastic kernels via asymmetric affinity functions Ronald R. Coifman, Matthew J. Hirn Yale University Department of Mathematics P.O. Box 208283 New Haven, Connecticut 06520-8283 USA ariv:1209.0237v4

More information

Dimensionality Reduc1on

Dimensionality Reduc1on Dimensionality Reduc1on contd Aarti Singh Machine Learning 10-601 Nov 10, 2011 Slides Courtesy: Tom Mitchell, Eric Xing, Lawrence Saul 1 Principal Component Analysis (PCA) Principal Components are the

More information

Space-Variant Computer Vision: A Graph Theoretic Approach

Space-Variant Computer Vision: A Graph Theoretic Approach p.1/65 Space-Variant Computer Vision: A Graph Theoretic Approach Leo Grady Cognitive and Neural Systems Boston University p.2/65 Outline of talk Space-variant vision - Why and how of graph theory Anisotropic

More information