Nonnegative Matrix Factorization


Seungjin Choi
Department of Computer Science and Engineering
Pohang University of Science and Technology
77 Cheongam-ro, Nam-gu, Pohang 37673, Korea
seungjin@postech.ac.kr

[Figure: illustration from Lee and Seung (1999).]

Nonnegative Matrix Factorization (NMF)

Given a nonnegative target matrix $X \in \mathbb{R}^{D \times N}$, determine a 2-factor decomposition
$$X \approx U V^{\top},$$
such that the factor matrices $U \in \mathbb{R}^{D \times K}$ and $V \in \mathbb{R}^{N \times K}$ are nonnegative as well. This is a low-rank approximation of nonnegative data, and involves the optimization
$$\min_{U, V} \| X - U V^{\top} \|^2 \quad \text{subject to} \quad U \geq 0, \ V \geq 0.$$

Holistic Representation

[Figure]

Parts-Based Representation

[Figure]

Nonnegative Data

[Figure: examples of nonnegative data. (a) Document, (b) Spectrogram, (c) Image, (d) Gene expression.]

Matrix Factorization for Brain-Computer Interface (2008)

[Figure]

Matrix Factorization for Collaborative Filtering (2009)

[Figure]

NMF for Clustering

[Figure: block diagram of the factorization $X \approx U V^{\top}$.]

Nonnegative Matrix Factorization (NMF)

Given a nonnegative matrix $X \in \mathbb{R}^{m \times n}$ (with $X_{ij} \geq 0$ for $i = 1, \ldots, m$ and $j = 1, \ldots, n$), NMF seeks a decomposition of $X$ of the form
$$X \approx U V^{\top},$$
where $U \in \mathbb{R}^{m \times k}$ and $V \in \mathbb{R}^{n \times k}$ are restricted to be nonnegative matrices as well.

When the columns of $X$ are treated as data points in $m$-dimensional space, the columns of $U$ act as basis vectors (or factor loadings), and each row of $V$ is an encoding that represents the extent to which each basis vector is used to reconstruct the corresponding data vector. Alternatively, when the rows of $X$ are data points in $n$-dimensional space, the columns of $V$ correspond to basis vectors and each row of $U$ represents an encoding.
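As a quick illustration, here is a minimal sketch using scikit-learn's NMF; the toy matrix X is made-up data, and note that scikit-learn names the factors W and H, where H plays the role of $V^{\top}$ in the notation above.

```python
import numpy as np
from sklearn.decomposition import NMF

# A small made-up nonnegative data matrix: 4 samples x 6 features.
X = np.array([[1., 0., 2., 0., 1., 0.],
              [0., 3., 0., 1., 0., 2.],
              [2., 0., 4., 0., 2., 0.],
              [0., 6., 0., 2., 0., 4.]])

model = NMF(n_components=2, init="random", random_state=0, max_iter=500)
U = model.fit_transform(X)   # m x k: encodings (scikit-learn's W)
Vt = model.components_       # k x n: nonnegative basis vectors (scikit-learn's H = V^T)

print(np.round(U @ Vt, 2))   # should approximately reconstruct X
```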

NMF: Least Squares

The LS error function is given by
$$J_{LS} = \| X - U V^{\top} \|^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} \left( X_{ij} - [U V^{\top}]_{ij} \right)^2.$$

Then NMF involves the following optimization:
$$\arg\min_{U \geq 0, \, V \geq 0} \| X - U V^{\top} \|^2.$$

Gradient descent yields
$$U_{ij} \leftarrow U_{ij} + \eta^{U}_{ij} \left( [X V]_{ij} - [U V^{\top} V]_{ij} \right),$$
$$V_{ij} \leftarrow V_{ij} + \eta^{V}_{ij} \left( [X^{\top} U]_{ij} - [V U^{\top} U]_{ij} \right).$$

Multiplicative Updates

Gradient descent does not, by itself, keep the elements of $U$ and $V$ nonnegative. Choose the learning rates
$$\eta^{U}_{ij} = \frac{U_{ij}}{[U V^{\top} V]_{ij}}, \qquad \eta^{V}_{ij} = \frac{V_{ij}}{[V U^{\top} U]_{ij}}.$$

Then we have the multiplicative updates
$$U_{ij} \leftarrow U_{ij} \, \frac{[X V]_{ij}}{[U V^{\top} V]_{ij}}, \qquad V_{ij} \leftarrow V_{ij} \, \frac{[X^{\top} U]_{ij}}{[V U^{\top} U]_{ij}}.$$

Since each update multiplies a nonnegative entry by a ratio of nonnegative quantities, nonnegativity is preserved automatically.
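A minimal NumPy sketch of these least-squares multiplicative updates; the function name nmf_ls and the eps guard against division by zero are implementation choices, not part of the slides.

```python
import numpy as np

def nmf_ls(X, k, n_iter=200, eps=1e-9, seed=0):
    """Least-squares NMF via multiplicative updates: X (m x n) ~= U @ V.T."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k))
    V = rng.random((n, k))
    for _ in range(n_iter):
        U *= (X @ V) / (U @ V.T @ V + eps)    # update U with V fixed
        V *= (X.T @ U) / (V @ U.T @ U + eps)  # update V with U fixed
    return U, V

# Usage on a random nonnegative matrix; the fit error should be small.
X = np.abs(np.random.default_rng(1).random((20, 30)))
U, V = nmf_ls(X, k=5)
print(np.linalg.norm(X - U @ V.T))
```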

Multiplicative Updates for NMF: Alternative Derivation

Suppose that the gradient of an error function decomposes as
$$\nabla J = [\nabla J]^{+} - [\nabla J]^{-},$$
where $[\nabla J]^{+} > 0$ and $[\nabla J]^{-} > 0$. Then the multiplicative update for parameters $\Theta$ has the form
$$\Theta \leftarrow \Theta \odot \left( \frac{[\nabla J]^{-}}{[\nabla J]^{+}} \right)^{\eta},$$
where $\odot$ and the fraction denote elementwise multiplication and division.

Compute the derivatives:
$$\nabla_U J = [\nabla_U J]^{+} - [\nabla_U J]^{-} = U V^{\top} V - X V,$$
$$\nabla_V J = [\nabla_V J]^{+} - [\nabla_V J]^{-} = V U^{\top} U - X^{\top} U.$$

Choosing $\eta = 1$ yields
$$U \leftarrow U \odot \frac{X V}{U V^{\top} V}, \qquad V \leftarrow V \odot \frac{X^{\top} U}{V U^{\top} U}.$$

Term-Document Matrix

A term-document matrix $X \in \mathbb{R}^{m \times n}$ is a collection of vector-space representations of documents, where rows are terms (words) and columns are documents:
$$X_{ij} = t_{ij} \, \log\!\left( \frac{n}{\mathrm{df}_i} \right),$$
where $t_{ij}$ is the term frequency of word $i$ in document $j$ and $\mathrm{df}_i$ is the document frequency of word $i$, i.e., the number of documents containing word $i$ (so the log factor is the inverse document frequency).
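A minimal sketch of building this matrix from raw counts; the toy count matrix T is made up for illustration.

```python
import numpy as np

# T[i, j] = raw count of term i in document j (made-up toy counts).
T = np.array([[2., 0., 1.],
              [0., 3., 0.],
              [1., 1., 2.]])
m, n = T.shape

df = np.count_nonzero(T, axis=1)    # df_i: number of documents containing term i
X = T * np.log(n / df)[:, None]     # tf-idf weighted term-document matrix
print(X)
```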

Clustering by Factorization

NMF yields a factorization $X \approx U V^{\top}$, where
$U_{ij}$ is the degree to which term $i$ belongs to cluster $j$, and
$V_{ij}$ is the degree to which document $i$ is associated with cluster $j$.

Each column of $V$ corresponds to a cluster, and document clustering is read off from $V$: assign document $i$ to cluster
$$j^{*} = \arg\max_{j} V_{ij}.$$

Document Clustering by NMF

1. Construct a term-document matrix $X$.
2. Perform NMF on $X$, yielding $X \approx U V^{\top}$.
3. Normalize the factors:
$$U_{ij} \leftarrow \frac{U_{ij}}{\sqrt{\sum_k U_{kj}^2}}, \qquad V_{ij} \leftarrow V_{ij} \sqrt{\sum_k U_{kj}^2}.$$
4. Use $V$ to determine the cluster label for each document: assign document $d_i$ to cluster $j^{*} = \arg\max_j V_{ij}$.

The normalization rescales each column of $U$ to unit length and moves the scale into $V$, so that the entries of $V$ are comparable across clusters. A sketch of the full pipeline follows below.
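A minimal sketch of steps 2-4; the function name cluster_documents is illustrative, and scikit-learn's NMF stands in for any NMF solver.

```python
import numpy as np
from sklearn.decomposition import NMF

def cluster_documents(X, k):
    """Cluster the columns (documents) of a term-document matrix X via NMF."""
    model = NMF(n_components=k, init="random", random_state=0, max_iter=500)
    U = model.fit_transform(X)             # m x k: term-cluster weights
    V = model.components_.T                # n x k: document-cluster weights
    norms = np.sqrt((U ** 2).sum(axis=0))  # column norms of U
    U, V = U / norms, V * norms            # step 3: normalization
    return V.argmax(axis=1)                # step 4: one label per document

# labels = cluster_documents(X, k=3)       # X: term-document matrix from step 1
```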

k-Means: A Matrix Factorization Perspective

Recall the objective function for k-means clustering:
$$J = \sum_{k=1}^{K} \sum_{i \in C_k} \| x_i - \mu_k \|^2 = \sum_{i=1}^{N} \sum_{j=1}^{K} V_{ij} \, \| x_i - \mu_j \|^2,$$
where
$$V_{ij} = \begin{cases} 1 & \text{if } x_i \in C_j, \\ 0 & \text{otherwise.} \end{cases}$$

This can be written as
$$J = \| X - U V^{\top} \|^2,$$
where $X = [x_1, \ldots, x_N]$, $U = [\mu_1, \ldots, \mu_K]$ contains the cluster centers (prototype vectors) in its columns, and $V$ is the indicator matrix.
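A small numerical check of this identity on made-up data; the assignment and centers are arbitrary, not the output of an actual k-means run.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, d = 6, 2, 3
X = rng.random((d, N))                     # data points in columns
labels = np.array([0, 1, 0, 1, 1, 0])      # arbitrary cluster assignment
U = np.stack([X[:, labels == j].mean(axis=1) for j in range(K)], axis=1)
V = np.eye(K)[labels]                      # N x K indicator matrix

J_sum = sum(np.linalg.norm(X[:, i] - U[:, labels[i]]) ** 2 for i in range(N))
J_mf = np.linalg.norm(X - U @ V.T) ** 2
print(np.isclose(J_sum, J_mf))             # True: the two objectives agree
```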

NMF: I-Divergence

Consider the I-divergence between the data and the model:
$$J_I = \sum_{i} \sum_{j} \left( X_{ij} \log \frac{X_{ij}}{[U V^{\top}]_{ij}} - X_{ij} + [U V^{\top}]_{ij} \right).$$

Note that the I-divergence is identical to the Kullback-Leibler divergence when $\sum_i \sum_j X_{ij} = \sum_i \sum_j [U V^{\top}]_{ij} = 1$.

Multiplicative updates for $U$ and $V$ are determined by minimizing $J_I$ subject to the nonnegativity constraints $U \geq 0$, $V \geq 0$:
$$U_{ij} \leftarrow U_{ij} \, \frac{\sum_k \left( X_{ik} / [U V^{\top}]_{ik} \right) V_{kj}}{\sum_k V_{kj}}, \qquad V_{ij} \leftarrow V_{ij} \, \frac{\sum_k \left( X_{ki} / [U V^{\top}]_{ki} \right) U_{kj}}{\sum_k U_{kj}}.$$

The equivalence between NMF (with I-divergence) and PLSA was shown by Gaussier and Goutte (SIGIR 2005).
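A minimal NumPy sketch of these I-divergence updates; as before, the function name nmf_kl and the eps guards are implementation choices, not part of the slides.

```python
import numpy as np

def nmf_kl(X, k, n_iter=200, eps=1e-9, seed=0):
    """NMF minimizing the I-divergence via multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k))
    V = rng.random((n, k))
    for _ in range(n_iter):
        R = X / (U @ V.T + eps)                # ratios X_ij / [UV^T]_ij
        U *= (R @ V) / (V.sum(axis=0) + eps)   # row sums of V in the denominator
        R = X / (U @ V.T + eps)
        V *= (R.T @ U) / (U.sum(axis=0) + eps)
    return U, V
```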

Weighted NMF

In practice the data matrix is often incomplete, i.e., some of its entries are missing or unobserved. To handle missing entries in the decomposition, we consider an objective function that is a sum of weighted residuals:
$$J = \sum_{i,j} W_{ij} \left( X_{ij} - [U V^{\top}]_{ij} \right)^2 = \| W \odot (X - U V^{\top}) \|^2,$$
where the $W_{ij}$ are binary weights:
$$W_{ij} = \begin{cases} 1 & \text{if } X_{ij} \text{ is observed,} \\ 0 & \text{if } X_{ij} \text{ is unobserved.} \end{cases}$$

Multiplicative updates for WNMF are given by
$$U \leftarrow U \odot \frac{[W \odot X] V}{[W \odot (U V^{\top})] V}, \qquad V \leftarrow V \odot \frac{[W \odot X]^{\top} U}{[W \odot (U V^{\top})]^{\top} U}.$$
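A minimal sketch of these weighted updates, with a random mask of observed entries standing in for a real ratings matrix; the function name wnmf and the toy data are illustrative.

```python
import numpy as np

def wnmf(X, W, k, n_iter=300, eps=1e-9, seed=0):
    """Weighted NMF: fit X ~= U @ V.T using only entries where W == 1."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k))
    V = rng.random((n, k))
    for _ in range(n_iter):
        U *= ((W * X) @ V) / ((W * (U @ V.T)) @ V + eps)
        V *= ((W * X).T @ U) / ((W * (U @ V.T)).T @ U + eps)
    return U, V

# Toy example: predict held-out entries of a random rating-like matrix.
rng = np.random.default_rng(1)
X = rng.random((8, 10))
W = (rng.random((8, 10)) < 0.7).astype(float)  # ~70% of entries observed
U, V = wnmf(X, W, k=3)
X_hat = U @ V.T                                # predictions for all entries
```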

Weighted NMF for Collaborative Filtering

[Figure: a user-item matrix (Users 1-6 by Items 1-6) whose entries are marked positive, negative, or unobserved.]

Weighted NMF for Collaborative Filtering

[Figure: a user-item rating matrix (Users 1-7 by Items 1-7) factorized into a user factor matrix and an item factor matrix over latent movie factors (comedy, action, horror, thriller); each rating is modeled as $X_{ij} \approx u_i^{\top} v_j$.]