Cholesky Decomposition Rectification for Non-negative Matrix Factorization
Tetsuya Yoshida
Graduate School of Information Science and Technology, Hokkaido University
N-14 W-9, Sapporo, Japan

Abstract. We propose a method based on Cholesky decomposition for Non-negative Matrix Factorization (NMF). The non-negativity constraint of NMF enables it to learn local representations. However, when NMF is utilized as a representation learning method, the issues arising from the non-orthogonality of the learned representation have not been addressed. Since NMF learns both feature vectors and data vectors in the feature space, the proposed method 1) estimates the metric in the feature space based on the learned feature vectors, 2) applies Cholesky decomposition to the metric and identifies the upper triangular matrix, and 3) utilizes the upper triangular matrix as a linear mapping for the data vectors. The proposed approach is evaluated over several real-world datasets. The results indicate that it is effective and improves performance.

1 Introduction

Previous representation learning methods have not explicitly considered the characteristics of the algorithms applied to the learned representation [4]. When applying Non-negative Matrix Factorization (NMF) [5,6,8,1] to document clustering, in most cases the number of features is set to the number of clusters [8,2]. However, when the number of features is increased, the non-orthogonality of the features in NMF hinders the effective utilization of the learned representation.

We propose a method based on Cholesky decomposition [3] to remedy the problem caused by the non-orthogonality of the features learned by NMF. Since NMF learns both feature vectors and data vectors in the feature space, the proposed method 1) first estimates the metric in the feature space based on the learned feature vectors, 2) applies Cholesky decomposition to the metric and identifies the upper triangular matrix, and 3) finally utilizes the upper triangular matrix as a linear mapping for the data vectors. The proposed method is evaluated over several document clustering problems, and the results indicate its effectiveness. In particular, the proposed method enables the effective utilization of the representation learned by NMF without modifying the algorithms applied to that representation. No label information is required to exploit the metric in the feature space, and the proposed method is fast and robust, since Cholesky decomposition is utilized [3].
2 Cholesky Decomposition Rectification for NMF

We use a bold capital letter for a matrix and a lower-case italic letter for a vector. X_ij stands for the element of a matrix X, tr stands for the trace of a matrix, and X^T stands for the transpose of X.

2.1 Non-negative Matrix Factorization

Under the specified number of features q, Non-negative Matrix Factorization (NMF) [6] factorizes a non-negative matrix X = [x_1, ..., x_n] ∈ R_+^{p×n} into two non-negative matrices U = [u_1, ..., u_q] ∈ R_+^{p×q} and V = [v_1, ..., v_n] ∈ R_+^{q×n} such that

    X ≈ UV    (1)

Each x_i is approximated as a linear combination of u_1, ..., u_q. The matrices U and V are obtained by minimizing the objective function

    J_0 = ||X − UV||^2    (2)

where ||·|| stands for a matrix norm; in this paper we focus on the Frobenius norm ||·||_F [6]. Compared with methods based on eigenvalue analysis such as PCA, each element of U and V is non-negative, and their column vectors are not necessarily orthogonal in Euclidean space.

2.2 Clustering with NMF

Besides image analysis [5], NMF is also applied to document clustering [8,2]. In most approaches which utilize NMF for document clustering, the number of features is set to the number of clusters [8,2]. Each instance is assigned to the cluster c* with the maximal value in the constructed representation v:

    c* = argmax_c v_c    (3)

where v_c stands for the value of the c-th element of v.

2.3 Representation Learning with NMF

When NMF is considered as a dimensionality reduction method, some learning method such as SVM (Support Vector Machine) or kmeans is applied to the learned representation V. In many cases, methods which assume Euclidean space (such as kmeans) are utilized for conducting learning on V [4]. However, to the best of our knowledge, the issues arising from the non-orthogonality of the learned representation have not been addressed.
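The factorization of eq. (1) and the cluster assignment of eq. (3) can be illustrated with a minimal sketch. This uses scikit-learn's NMF (with Frobenius loss) as a stand-in for the multiplicative-update algorithm of [6]; the function name nmf_cluster is our own illustration, not from the paper.

```python
# A minimal sketch of NMF-based clustering (Sects. 2.1-2.2), assuming
# scikit-learn's NMF as a stand-in for the multiplicative updates of [6].
from sklearn.decomposition import NMF

def nmf_cluster(X, k, max_iter=30, seed=0):
    """Factorize the non-negative p x n matrix X as X ~ UV and assign
    each column x_i to the cluster with the largest entry of v_i."""
    model = NMF(n_components=k, init='random', max_iter=max_iter,
                random_state=seed)
    U = model.fit_transform(X)   # p x k: the feature vectors u_1, ..., u_k
    V = model.components_        # k x n: the representation v_1, ..., v_n
    labels = V.argmax(axis=0)    # eq. (3): c* = argmax_c v_c per column
    return U, V, labels
```

Note that feeding X in the paper's p×n (terms × documents) orientation makes fit_transform return U and components_ return V, matching eq. (1) directly.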
2.4 Cholesky Decomposition Rectification

One of the causes of the above problem is that, when the learned representation V is utilized, the squared distance between a pair of instances (v_i, v_j) is usually calculated as (v_i − v_j)^T (v_i − v_j), by (implicitly) assuming that v_i is represented in some Euclidean space. However, since u_1, ..., u_q learned by NMF are in general not orthogonal to each other, this calculation is not appropriate when NMF is utilized to learn V. If we know the metric M which reflects the non-orthogonality of the feature space, the squared distance can be calculated as

    (v_i − v_j)^T M (v_i − v_j)    (4)

This corresponds to the squared Mahalanobis generalized distance.

We exploit the property of NMF that the data matrix X is decomposed into i) U, whose column vectors span the feature space, and ii) V, which is the representation in that feature space. Based on this property, the proposed method 1) first estimates the metric in the feature space based on the learned feature vectors, 2) applies Cholesky decomposition to the metric and identifies the upper triangular matrix, and 3) finally utilizes the upper triangular matrix as a linear mapping for the data vectors. Some learning algorithm is then applied to the transformed representation from 3), as in [4]. We explain steps 1) and 2) below. Note that the proposed method enables effective utilization of the representation learned by NMF without modifying the algorithms applied to the learned representation.

Estimation of Metric via NMF. In NMF, when approximating the data matrix X and representing it as V in the feature space, the explicit representation of the features in the original data space is also obtained as U. Thus, by normalizing each u such that u^T u = 1 as in [8], we estimate the metric M as the Gram matrix U^T U of the features:

    M = U^T U,  s.t.  u_l^T u_l = 1,  l = 1, ..., q    (5)

Contrary to other metric learning approaches, no label information is required to estimate M in our approach. Furthermore, since each data point is approximated (embedded) in the feature space spanned by u_1, ..., u_q, it is rather natural to utilize eq. (5) based on U to estimate the metric of the feature space.

Cholesky Decomposition Rectification. Since the metric M is estimated by eq. (5), M is guaranteed to be symmetric positive semi-definite. Thus, by standard linear algebra [3], M can be uniquely decomposed by Cholesky decomposition with an upper triangular matrix T as:

    M = T^T T    (6)

By substituting eq. (6) into eq. (4), we obtain the rectified representation TV:

    V → TV    (7)

based on the upper triangular matrix T obtained via Cholesky decomposition.
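The rectification step of eqs. (5)-(7) is small enough to sketch directly. The following is a minimal numpy sketch; the eps jitter is our own addition (not in the paper) to guard against M being only positive semi-definite, in which case a plain Cholesky factorization can fail.

```python
# A minimal sketch of the rectification of Sect. 2.4: normalize the
# columns of U, estimate M = U^T U (eq. (5)), factorize M = T^T T
# (eq. (6)), and map the representation V to TV (eq. (7)).
import numpy as np

def cholesky_rectify(U, V, eps=1e-10):
    # Normalize each feature vector so that u_l^T u_l = 1.
    U = U / (np.linalg.norm(U, axis=0, keepdims=True) + eps)
    M = U.T @ U                   # estimated metric, eq. (5)
    # np.linalg.cholesky returns a lower-triangular L with M = L L^T,
    # so the upper-triangular factor of eq. (6) is T = L^T.  The jitter
    # eps * I is an assumption added here for numerical safety.
    L = np.linalg.cholesky(M + eps * np.eye(M.shape[0]))
    T = L.T
    return T @ V                  # rectified representation TV, eq. (7)
```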
Algorithm 1. Cholesky Decomposition Rectification for NMF (CNMF)

CNMF(X, algNMF, q, pars)
Require: X ∈ R_+^{p×n}  // data matrix
Require: algNMF  // the utilized NMF algorithm
Require: q  // the number of features
Require: pars  // other parameters of algNMF
1: U, V := run algNMF on X with q (and pars) s.t. u_l^T u_l = 1, l = 1, ..., q
2: M := U^T U
3: T := Cholesky decomposition of M s.t. M = T^T T
4: return U, TV

The proposed algorithm CNMF is shown in Algorithm 1.

3 Evaluations

3.1 Experimental Settings

Datasets. We evaluated the proposed algorithm on the 20 Newsgroups data (20NG; jrennie/20newsgroups/, the 20news version was utilized). Each document is represented in the standard vector space model based on the occurrences of terms. We created three datasets from 20NG (the Multi5, Multi10, and Multi15 datasets, with 5, 10, and 15 clusters). To create one sample of a dataset, 50 documents were drawn from each group (cluster), and 10 samples were created for each dataset. For each sample, we conducted stemming using the Porter stemmer (martin/porterstemmer) and MontyTagger (hugo/montytagger), removed stop words, and selected the 2,000 words with the largest mutual information. We also conducted experiments on the TREC datasets; however, results on the other datasets are omitted due to the page limit.

Evaluation Measures. For each dataset, the cluster assignment was evaluated with respect to Normalized Mutual Information (NMI). Let C and Ĉ stand for the random variables over the true and assigned clusters. NMI is defined as

    NMI = I(Ĉ; C) / ((H(Ĉ) + H(C)) / 2)  ∈ [0, 1]

where H(·) is the Shannon entropy and I(·;·) is the mutual information. NMI corresponds to the accuracy of the assignment: the larger the NMI, the better the result.

Comparison. We applied the proposed method to 1) NMF [6], 2) WNMF [8], and 3) GNMF [1], and evaluated its effectiveness. Since these methods are partitioning-based clustering methods, we assume that the number of clusters k is specified. WNMF [8] first converts the data matrix X utilizing the weighting scheme in Ncut [7] and applies the standard NMF algorithm to the converted data. GNMF [1] constructs the m-nearest-neighbor graph and utilizes the graph Laplacian of the adjacency matrix A of the graph as a regularization term:

    J_2 = ||X − UV||^2 + λ tr(VLV^T)    (8)

where L = D − A (D is the degree matrix) and λ is the regularization parameter.
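The graph-regularization ingredients of eq. (8) can be sketched as follows. This is a hedged sketch, not GNMF itself: scikit-learn's kneighbors_graph stands in for whatever neighbor-graph construction [1] uses, and cosine distance is assumed to mirror the cosine similarity used in the experiments.

```python
# A sketch of eq. (8)'s ingredients: an m-nearest-neighbor graph, its
# adjacency matrix A, degree matrix D, and graph Laplacian L = D - A.
import numpy as np
from sklearn.neighbors import kneighbors_graph

def graph_laplacian(X, m=10):
    """X is p x n with documents in columns, as in the paper."""
    # kneighbors_graph expects samples in rows, hence X.T; cosine
    # distance is assumed here as a stand-in for cosine similarity.
    A = kneighbors_graph(X.T, n_neighbors=m, metric='cosine',
                         mode='connectivity').toarray()
    A = np.maximum(A, A.T)         # symmetrize the m-NN graph
    D = np.diag(A.sum(axis=1))     # degree matrix
    return D - A                   # graph Laplacian L = D - A
```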
Parameters. Cosine similarity was utilized as the pairwise similarity measure. We varied the value of q and conducted experiments. In GNMF, the number of neighbors m was set to 10, and λ was set to 100 based on [1]. The maximum number of iterations was set to 30.

Evaluation Procedure. As standard clustering methods based on Euclidean space, kmeans and skmeans were applied to the learned representation matrix V from each method, and to the proposed representation TV in eq. (7). Since NMF finds a local optimum, the results (U, V) depend on the initialization. Thus, we conducted 10 random initializations for the same data matrix. Furthermore, since both kmeans and skmeans are affected by the initial cluster assignment, for the same representation (either V or TV), clustering was repeated 10 times with random initial assignments.

3.2 Results

[Fig. 1. Results on the 20 Newsgroup datasets (Multi5, Multi10, Multi15) in terms of NMI; upper row: kmeans, lower row: skmeans. Curves: NMF, NMF+c, WNMF, WNMF+c, GNMF, GNMF+c.]

The reported figures are the average over the 10 samples of each dataset (in total, the average of 1,000 runs is reported for each dataset). The horizontal axis corresponds to the number of features q, and the vertical axis to NMI. In the legend, solid lines correspond to NMF, dotted lines to WNMF, and dashed lines to GNMF. In addition, "+c" stands for the results obtained by utilizing the proposed method in eq. (7), i.e., constructing TV for each method.
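A sketch of this evaluation procedure, reusing the helpers from the earlier snippets, is given below. Everything here is an assumption layered on the paper's description: scikit-learn's KMeans and normalized_mutual_info_score stand in for kmeans and NMI, and skmeans is approximated by L2-normalizing the columns before kmeans, a common stand-in for spherical k-means rather than the paper's actual implementation.

```python
# A hedged sketch of the evaluation loop: cluster V and the rectified TV
# with several random initial assignments and report the average NMI.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

def mean_nmi(R, true_labels, k, n_restarts=10):
    """R is q x n; cluster its n columns and average NMI over restarts."""
    scores = []
    for seed in range(n_restarts):
        pred = KMeans(n_clusters=k, n_init=1,
                      random_state=seed).fit_predict(R.T)
        scores.append(normalized_mutual_info_score(true_labels, pred))
    return float(np.mean(scores))

def evaluate(V, TV, true_labels, k):
    # L2-normalizing columns approximates skmeans (cosine similarity).
    l2 = lambda R: R / (np.linalg.norm(R, axis=0, keepdims=True) + 1e-10)
    return {'kmeans V':   mean_nmi(V, true_labels, k),
            'kmeans TV':  mean_nmi(TV, true_labels, k),
            'skmeans V':  mean_nmi(l2(V), true_labels, k),
            'skmeans TV': mean_nmi(l2(TV), true_labels, k)}
```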
The results in Fig. 1 show that the proposed method improves the performance of kmeans (the standard Euclidean distance) and skmeans (cosine similarity in Euclidean space). Thus, the proposed method is effective at improving performance. In particular, skmeans was substantially improved (lower figures in Fig. 1). In addition, when the proposed method was applied to WNMF (WNMF+c), performance equivalent to or even better than that of GNMF was obtained. On the other hand, the proposed method was not effective for GNMF, since the presupposition in Section 2.4 does not hold in GNMF.

As the number of features q increases, the performance of NMF and WNMF degrades. On the other hand, by utilizing the proposed method, NMF+c and WNMF+c were very robust with respect to the increase of q. Thus, the proposed method is effective for utilizing a large number of features in NMF.

4 Concluding Remarks

We proposed a method based on Cholesky decomposition to remedy the problem caused by the non-orthogonality of the features learned in Non-negative Matrix Factorization (NMF). Since NMF learns both feature vectors and data vectors in the feature space, the proposed method 1) first estimates the metric in the feature space based on the learned feature vectors, 2) applies Cholesky decomposition to the metric and identifies the upper triangular matrix, and 3) finally utilizes the upper triangular matrix as a linear mapping for the data vectors. The proposed method enables the effective utilization of the representation learned by NMF without modifying the algorithms applied to the learned representation.

References

1. Cai, D., He, X., Wu, X., Han, J.: Non-negative matrix factorization on manifold. In: Proc. of ICDM 2008 (2008)
2. Ding, C., Li, T., Peng, W., Park, H.: Orthogonal nonnegative matrix tri-factorizations for clustering. In: Proc. of KDD 2006 (2006)
3. Harville, D.A.: Matrix Algebra From a Statistician's Perspective. Springer, Heidelberg (2008)
4. Kamvar, S.D., Klein, D., Manning, C.D.: Spectral learning. In: Proc. of IJCAI 2003 (2003)
5. Lee, D.D., Seung, H.S.: Learning the parts of objects by non-negative matrix factorization. Nature 401 (1999)
6. Lee, D.D., Seung, H.S.: Algorithms for non-negative matrix factorization. In: Proc. of Neural Information Processing Systems (NIPS) (2001)
7. von Luxburg, U.: A tutorial on spectral clustering. Statistics and Computing 17(4) (2007)
8. Xu, W., Liu, X., Gong, Y.: Document clustering based on non-negative matrix factorization. In: Proc. of SIGIR 2003 (2003)