arxiv: v1 [stat.ml] 23 Dec 2015

k-means Clustering Is Matrix Factorization

Christian Bauckhage
B-IT, University of Bonn, Bonn, Germany
Fraunhofer IAIS, Sankt Augustin, Germany
http://mmprec.iais.fraunhofer.de/bauckhage.html
arXiv:1512.07548v1 [stat.ML] 23 Dec 2015

Abstract. We show that the objective function of conventional k-means clustering can be expressed as the Frobenius norm of the difference of a data matrix and a low rank approximation of that data matrix. In short, we show that k-means clustering is a matrix factorization problem. These notes are meant as a reference and intended to provide a guided tour towards a result that is often mentioned but seldom made explicit in the literature.

1 Introduction

The k-means procedure is one of the most popular techniques to cluster a data set $X \subset \mathbb{R}^m$ into subsets $C_1, \ldots, C_k$. The underlying ideas are intuitive and simple, and most theoretical properties of k-means clustering are well established textbook material [1,2].

In this note, we are concerned with an aspect of k-means clustering that is arguably less well known and somewhat under-appreciated. Over the past years, several authors have pointed out that k-means clustering can be understood as a constrained matrix factorization problem [3,4,5,6,7]. However, reading these or related texts, it appears as if most authors consider this fact self-explanatory and hardly discuss it in detail. Since this may confuse less experienced readers, our goal in this note is to rigorously establish the following equalities for the objective function of hard k-means clustering

$$\sum_{i=1}^{k} \sum_{j=1}^{n} z_{ij} \, \lVert x_j - \mu_i \rVert^2 = \lVert X - MZ \rVert^2 = \lVert X - X Z^T (Z Z^T)^{-1} Z \rVert^2 \tag{1}$$

where

$$X \in \mathbb{R}^{m \times n} \text{ is a matrix of data vectors } x_j \in \mathbb{R}^m \tag{2}$$

$$M \in \mathbb{R}^{m \times k} \text{ is a matrix of cluster centroids } \mu_i \in \mathbb{R}^m \tag{3}$$

and $Z \in \mathbb{R}^{k \times n}$ is a matrix of binary indicator variables such that

$$z_{ij} = \begin{cases} 1, & \text{if } x_j \in C_i \\ 0, & \text{otherwise.} \end{cases} \tag{4}$$
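
Before the derivation, it may help to see the three expressions in (1) agree on a concrete case. The following is a minimal sketch in plain Python, not part of the original note. The toy data set (five points in the plane, two clusters), the helper functions `matmul`, `transpose`, `sub` and `frob2`, and the use of exact fractions are all assumptions made for illustration. The sketch inverts $ZZ^T$ entry by entry along its diagonal, which relies on $ZZ^T$ being diagonal for hard assignments, as established in Section 3.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def frob2(A):
    """Squared Frobenius norm."""
    return sum(a * a for row in A for a in row)

# Hypothetical toy data: columns of X are the points (m = 2, n = 5, k = 2).
X = [[0, 2, 4, 6, 5],
     [0, 0, 4, 4, 7]]
# Indicator matrix: points 1-2 belong to cluster 1, points 3-5 to cluster 2.
Z = [[1, 1, 0, 0, 0],
     [0, 0, 1, 1, 1]]
# Columns of M are the cluster means (1, 0) and (5, 5).
M = [[1, 5],
     [0, 5]]
m, n, k = len(X), len(X[0]), len(Z)

# Left-hand side of (1): the conventional k-means objective.
lhs = sum(Z[i][j] * sum((X[l][j] - M[l][i]) ** 2 for l in range(m))
          for i in range(k) for j in range(n))

# Middle expression of (1): ||X - MZ||^2.
mid = frob2(sub(X, matmul(M, Z)))

# Right-hand side of (1). ZZ^T is diagonal here, so its inverse is
# obtained by taking reciprocals of the diagonal entries.
ZZt = matmul(Z, transpose(Z))
ZZt_inv = [[Fraction(1, ZZt[i][i]) if i == h else Fraction(0) for h in range(k)]
           for i in range(k)]
M_hat = matmul(matmul(X, transpose(Z)), ZZt_inv)
rhs = frob2(sub(X, matmul(M_hat, Z)))

print(lhs, mid, rhs)
```

On this toy example all three quantities coincide, and `M_hat` reproduces the matrix of cluster means `M`. This is only a numerical check on one instance; the general statement is what the following sections derive.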

2 Notation and Preliminaries

Throughout, we write $x_j$ to denote the $j$-th column vector of a matrix $X$. To refer to the $(l,j)$ element of a matrix $X$, we either write $x_{lj}$ or $(X)_{lj}$. The Euclidean norm of a vector will be written as $\lVert x \rVert$ and the Frobenius norm of a matrix as $\lVert X \rVert$. Regarding the squared Frobenius norm of a matrix, we recall the following properties

$$\lVert X \rVert^2 = \sum_{l,j} x_{lj}^2 = \sum_j \lVert x_j \rVert^2 = \sum_j x_j^T x_j = \sum_j \left( X^T X \right)_{jj} = \operatorname{tr}\left[ X^T X \right] \tag{5}$$

Finally, subscripts or summation indices $i$ will be understood to range from 1 to $k$ (the number of clusters), subscripts or summation indices $j$ will range from 1 up to $n$ (the number of data vectors), and subscripts or summation indices $l$ will be used to expand inner products between vectors or rows and columns of matrices.

3 Step by Step Derivation of (1)

To substantiate the claim in (1), we first point out several peculiar properties of the binary indicator matrix $Z$ in (4). If the clusters $C_1, \ldots, C_k$ have distinct cluster centroids $\mu_1, \ldots, \mu_k$, each of the columns of $Z$ will contain a single 1 and $k-1$ elements that are 0. Accordingly, the columns of $Z$ will sum to one

$$\sum_i z_{ij} = 1 \tag{6}$$

and its row sums will indicate the number of elements per cluster

$$\sum_j z_{ij} = n_i = \lvert C_i \rvert. \tag{7}$$

Moreover, since $z_{ij} \in \{0,1\}$ and each column of $Z$ only contains a single 1, the rows of $Z$ are pairwise perpendicular because, for every column $j$,

$$z_{ij} \, z_{i'j} = \begin{cases} z_{ij}, & \text{if } i = i' \\ 0, & \text{otherwise} \end{cases} \tag{8}$$

which is then to say that the matrix $ZZ^T$ is a diagonal matrix where

$$\left( Z Z^T \right)_{ii'} = \sum_j (Z)_{ij} \left( Z^T \right)_{ji'} = \sum_j z_{ij} \, z_{i'j} = \begin{cases} n_i, & \text{if } i = i' \\ 0, & \text{otherwise.} \end{cases} \tag{9}$$

Having familiarized ourselves with these properties of the indicator matrix, we are now positioned to establish the equalities in (1), which we will do in a step by step manner.
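
The properties (6), (7) and (9) of the indicator matrix can be checked directly on a small example. The sketch below is an illustration only and not part of the original note. The label list `labels`, with zero-based cluster indices, is a hypothetical hard assignment of five points to two clusters.

```python
# Hypothetical hard assignment: cluster index (0-based) of each of n = 5 points.
labels = [0, 0, 1, 1, 1]
k, n = 2, len(labels)

# Build the k x n indicator matrix Z of (4): z_ij = 1 iff point j is in cluster i.
Z = [[1 if labels[j] == i else 0 for j in range(n)] for i in range(k)]

# (6): every column contains a single 1, so the columns sum to one.
col_sums = [sum(Z[i][j] for i in range(k)) for j in range(n)]

# (7): the row sums give the cluster sizes n_i.
row_sums = [sum(row) for row in Z]

# (9): ZZ^T is diagonal with the cluster sizes on its diagonal.
ZZt = [[sum(Z[i][j] * Z[h][j] for j in range(n)) for h in range(k)]
       for i in range(k)]

print(col_sums)  # [1, 1, 1, 1, 1]
print(row_sums)  # [2, 3]
print(ZZt)       # [[2, 0], [0, 3]]
```

The off-diagonal zeros of `ZZt` are the pairwise perpendicularity of the rows of $Z$ stated in (8).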

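3.1 Step 1: Expanding the expression on the left of (1)

We begin our derivation by expanding the conventional k-means objective function on the left of (1). For this expression, we have

\begin{align}
\sum_{i,j} z_{ij} \, \lVert x_j - \mu_i \rVert^2
&= \sum_{i,j} z_{ij} \left( x_j^T x_j - 2 \, x_j^T \mu_i + \mu_i^T \mu_i \right) \notag \\
&= \underbrace{\sum_{i,j} z_{ij} \, x_j^T x_j}_{T_1} - 2 \underbrace{\sum_{i,j} z_{ij} \, x_j^T \mu_i}_{T_2} + \underbrace{\sum_{i,j} z_{ij} \, \mu_i^T \mu_i}_{T_3}. \tag{10}
\end{align}

This expansion leads to further insights if we examine the three terms $T_1$, $T_2$, and $T_3$ one by one. First of all, we find

\begin{align}
T_1 = \sum_{i,j} z_{ij} \, x_j^T x_j &= \sum_{i,j} z_{ij} \, \lVert x_j \rVert^2 \tag{11} \\
&= \sum_j \lVert x_j \rVert^2 \tag{12} \\
&= \operatorname{tr}\left[ X^T X \right] \tag{13}
\end{align}

where we made use of (6) and (5). Second of all, we observe

\begin{align}
T_2 = \sum_{i,j} z_{ij} \, x_j^T \mu_i &= \sum_{i,j} z_{ij} \sum_l x_{lj} \, \mu_{li} \tag{14} \\
&= \sum_{j,l} x_{lj} \sum_i \mu_{li} \, z_{ij} \tag{15} \\
&= \sum_{j,l} x_{lj} \left( MZ \right)_{lj} \tag{16} \\
&= \sum_j \sum_l \left( X^T \right)_{jl} \left( MZ \right)_{lj} \tag{17} \\
&= \sum_j \left( X^T M Z \right)_{jj} \tag{18} \\
&= \operatorname{tr}\left[ X^T M Z \right]. \tag{19}
\end{align}

Third of all, we note that

\begin{align}
T_3 = \sum_{i,j} z_{ij} \, \mu_i^T \mu_i &= \sum_{i,j} z_{ij} \, \lVert \mu_i \rVert^2 \tag{20} \\
&= \sum_i \lVert \mu_i \rVert^2 \, n_i \tag{21}
\end{align}

where we applied (7).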
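
The decomposition in (10) can also be traced numerically. The following sketch is an illustration under assumed toy data, not part of the original note. It evaluates $T_1$, $T_2$ and $T_3$ in the simplified forms (12), (14) and (21) and compares $T_1 - 2\,T_2 + T_3$ with the directly computed objective. The matrices `X`, `Z`, `M` and the helpers `column` and `dot` are hypothetical.

```python
# Hypothetical toy data: columns of X are points, columns of M are centroids.
X = [[0, 2, 4, 6, 5],
     [0, 0, 4, 4, 7]]
Z = [[1, 1, 0, 0, 0],
     [0, 0, 1, 1, 1]]
M = [[1, 5],
     [0, 5]]
m, n, k = len(X), len(X[0]), len(Z)

def column(A, j):
    """Return the j-th column of a matrix stored as a list of rows."""
    return [row[j] for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# T1 as in (12): sum of squared norms of the data vectors.
T1 = sum(dot(column(X, j), column(X, j)) for j in range(n))

# T2 as in (14): inner products of each point with its assigned centroid.
T2 = sum(Z[i][j] * dot(column(X, j), column(M, i))
         for i in range(k) for j in range(n))

# T3 as in (21): squared centroid norms weighted by the cluster sizes n_i.
sizes = [sum(row) for row in Z]
T3 = sum(sizes[i] * dot(column(M, i), column(M, i)) for i in range(k))

# Direct evaluation of the left-hand side of (1).
objective = sum(Z[i][j] * sum((X[l][j] - M[l][i]) ** 2 for l in range(m))
                for i in range(k) for j in range(n))

print(T1, T2, T3)                    # 162 152 152
print(T1 - 2 * T2 + T3, objective)   # 10 10
```

That $T_2$ and $T_3$ coincide in this example is a consequence of `M` holding the cluster means; for an arbitrary `M` only the identity `T1 - 2 * T2 + T3 == objective` is guaranteed by (10).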

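3.2 Step 2: Expanding the expression in the middle of (1)

Next, we look at the second expression in (1). As a squared Frobenius norm of a matrix difference, it can be written as

\begin{align}
\lVert X - MZ \rVert^2 &= \operatorname{tr}\left[ (X - MZ)^T (X - MZ) \right] \notag \\
&= \underbrace{\operatorname{tr}\left[ X^T X \right]}_{T_4} - 2 \underbrace{\operatorname{tr}\left[ X^T M Z \right]}_{T_5} + \underbrace{\operatorname{tr}\left[ Z^T M^T M Z \right]}_{T_6}. \tag{22}
\end{align}

Given our earlier results, we immediately recognize that $T_1 = T_4$ and $T_2 = T_5$. Thus, to establish that (10) and (22) are indeed equivalent, it remains to verify that $T_3 = T_6$. Regarding $T_6$, we note that, because of the cyclic permutation invariance of the trace operator, we have

$$\operatorname{tr}\left[ Z^T M^T M Z \right] = \operatorname{tr}\left[ M^T M Z Z^T \right]. \tag{23}$$

We also note that

\begin{align}
\operatorname{tr}\left[ M^T M Z Z^T \right] &= \sum_i \left( M^T M Z Z^T \right)_{ii} \tag{24} \\
&= \sum_i \sum_l \left( M^T M \right)_{il} \left( Z Z^T \right)_{li} \tag{25} \\
&= \sum_i \left( M^T M \right)_{ii} \left( Z Z^T \right)_{ii} \tag{26} \\
&= \sum_i \lVert \mu_i \rVert^2 \, n_i \tag{27}
\end{align}

where we used the fact that $ZZ^T$ is diagonal. This result shows that $T_3 = T_6$ and, consequently, that (10) and (22) really are equivalent.

3.3 Step 3: Eliminating matrix M

Finally, to establish the equality on the right of (1), we ask for the matrix $M$ that, for a given $Z$, would minimize $\lVert X - MZ \rVert^2$. To this end, we consider

\begin{align}
\frac{\partial}{\partial M} \lVert X - MZ \rVert^2 &= \frac{\partial}{\partial M} \left[ \operatorname{tr}\left[ X^T X \right] - 2 \operatorname{tr}\left[ X^T M Z \right] + \operatorname{tr}\left[ Z^T M^T M Z \right] \right] \notag \\
&= 2 \left( M Z Z^T - X Z^T \right) \tag{28}
\end{align}

which, upon equating to 0, leads to

$$M = X Z^T \left( Z Z^T \right)^{-1} \tag{29}$$

which beautifully reflects the fact that each of the k-means cluster centroids $\mu_i$ coincides with the mean of the corresponding cluster $C_i$, namely

$$\mu_i = \frac{\sum_j z_{ij} \, x_j}{\sum_j z_{ij}} = \frac{1}{n_i} \sum_{x_j \in C_i} x_j. \tag{30}$$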
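
Steps 2 and 3 can be illustrated on the same kind of toy example. The sketch below is not part of the original note and rests on assumed data: `X`, `Z`, `M` and the perturbed centroid matrix `M_shift` are hypothetical, as are the helper functions. It checks the trace identity (23), the equality $T_3 = T_6$ from (27), and that the gradient expression in (28) vanishes when the columns of `M` are the cluster means.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def objective(X, M, Z):
    """Squared Frobenius norm ||X - MZ||^2."""
    MZ = matmul(M, Z)
    return sum((x - y) ** 2 for rx, ry in zip(X, MZ) for x, y in zip(rx, ry))

# Hypothetical toy data; the columns of M are the means of the two clusters.
X = [[0, 2, 4, 6, 5],
     [0, 0, 4, 4, 7]]
Z = [[1, 1, 0, 0, 0],
     [0, 0, 1, 1, 1]]
M = [[1, 5],
     [0, 5]]
Mt, Zt = transpose(M), transpose(Z)

# T6 and its cyclically permuted form from (23).
T6 = trace(matmul(matmul(Zt, Mt), matmul(M, Z)))
T6_cyclic = trace(matmul(matmul(Mt, M), matmul(Z, Zt)))

# T3 as in (21), for comparison with (27).
sizes = [sum(row) for row in Z]
T3 = sum(sizes[i] * sum(M[l][i] ** 2 for l in range(len(M)))
         for i in range(len(Z)))

# Gradient from (28): 2 (M Z Z^T - X Z^T), evaluated at the cluster means.
MZZt = matmul(M, matmul(Z, Zt))
XZt = matmul(X, Zt)
grad = [[2 * (a - b) for a, b in zip(ra, rb)] for ra, rb in zip(MZZt, XZt)]

# Moving one centroid away from its cluster mean increases the objective.
M_shift = [[1, 5],
           [1, 5]]

print(T6, T6_cyclic, T3)
print(grad)
print(objective(X, M, Z), objective(X, M_shift, Z))
```

A vanishing gradient at the cluster means is consistent with (29) and (30); the comparison with `M_shift` is only a spot check at one perturbed point, not a proof of minimality.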

4 Conclusion

Using tedious yet straightforward algebra, we have shown that the problem of hard k-means clustering can be understood as the following constrained matrix factorization problem

$$\min_{Z} \; \lVert X - X Z^T \left( Z Z^T \right)^{-1} Z \rVert^2 \quad \text{s.t.} \quad z_{ij} \in \{0,1\}, \quad \sum_i z_{ij} = 1$$

References

1. MacKay, D.: Information Theory, Inference, & Learning Algorithms. Cambridge University Press (2003)
2. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer (2001)
3. Ding, C., He, X., Simon, H.: On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering. In: Proc. SDM, SIAM (2005)
4. Gaussier, E., Goutte, C.: Relations between PLSA and NMF and Implications. In: Proc. SIGIR, ACM (2005)
5. Kim, J., Park, H.: Sparse Nonnegative Matrix Factorization for Clustering. Technical Report GT-CSE-08-01, Georgia Institute of Technology (2008)
6. Arora, R., Gupta, M., Kapila, A., Fazel, M.: Similarity-based Clustering by Left-Stochastic Matrix Factorization. J. of Machine Learning Research 14(Jul.) (2013)
7. Bauckhage, C., Drachen, A., Sifa, R.: Clustering Game Behavior Data. IEEE Trans. on Computational Intelligence and AI in Games 7(3) (2015)