ENGG5781 Matrix Analysis and Computations
Lecture 10: Non-Negative Matrix Factorization and Tensor Decomposition
Wing-Kin (Ken) Ma, 2017-2018 Term 2
Department of Electronic Engineering, The Chinese University of Hong Kong
Topic 1: Nonnegative Matrix Factorization
Nonnegative Matrix Factorization

Consider again the low-rank factorization problem

    $\min_{A \in \mathbb{R}^{m \times r}, \, B \in \mathbb{R}^{r \times n}} \| Y - AB \|_F^2.$

The solution is not unique: if $(A, B)$ is a solution to the above problem, then $(A Q^T, Q B)$ for any orthogonal $Q$ is also a solution.

Nonnegative matrix factorization (NMF):

    $\min_{A \in \mathbb{R}^{m \times r}, \, B \in \mathbb{R}^{r \times n}} \| Y - AB \|_F^2 \quad \text{s.t.} \quad A \geq 0, \ B \geq 0,$

where $X \geq 0$ means that $X$ is elementwise nonnegative.

- found to be able to extract meaningful features (by empirical studies)
- under some conditions, the NMF solution is provably unique
- numerous applications, e.g., in machine learning, signal processing, remote sensing
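A quick numerical illustration of the non-uniqueness claim (a minimal NumPy sketch added for illustration; the matrix sizes are arbitrary):

    import numpy as np

    # (A Q^T)(Q B) = AB for any orthogonal Q, so the factorization is not unique.
    rng = np.random.default_rng(0)
    A, B = rng.random((6, 3)), rng.random((3, 8))
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # a random orthogonal Q
    assert np.allclose((A @ Q.T) @ (Q @ B), A @ B)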
Examples

Image Processing:

- $A \geq 0$ constrains the basis elements to be nonnegative; $B \geq 0$ imposes an additive reconstruction.
- the basis elements extract facial features such as eyes, nose and lips.

[Figure from [Lee-Seung1999]: NMF, VQ, and PCA bases learned from a database of 2,429 facial images. The NMF basis and encodings contain a large fraction of vanishing coefficients, so both the basis images and image encodings are sparse. The basis images are sparse because they are non-global and contain several versions of mouths, noses and other facial parts, in different locations or forms; the variability of a whole face is generated by combining these different parts. VQ and PCA, in contrast, learn holistic representations.]
Text Mining

- basis elements allow us to recover different topics;
- weights allow us to assign each text to its corresponding topics.
NMF

    $\min_{A \in \mathbb{R}^{m \times r}, \, B \in \mathbb{R}^{r \times n}} \| Y - AB \|_F^2 \quad \text{s.t.} \quad A \geq 0, \ B \geq 0.$

- NP-hard in general
- a practical way to go: alternating optimization w.r.t. $A$ and $B$
  - given $B$, minimizing $\| Y - AB \|_F^2$ over $A \geq 0$ is convex; given $A$, minimizing $\| Y - AB \|_F^2$ over $B \geq 0$ is also convex
  - despite that, there is no closed-form solution to $\min_{A \geq 0} \| Y - AB \|_F^2$ or $\min_{B \geq 0} \| Y - AB \|_F^2$; it takes time to solve them when $Y$ is big
- state of the art, such as the Lee-Seung multiplicative update, applies inexact alternating optimization
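For concreteness, a minimal NumPy sketch of the Lee-Seung multiplicative updates (an illustrative implementation of the objective above, not the course's reference code; eps is an assumed small constant added to avoid division by zero):

    import numpy as np

    def nmf_multiplicative(Y, r, num_iters=500, eps=1e-9, seed=0):
        """Lee-Seung multiplicative updates for min ||Y - AB||_F^2 s.t. A, B >= 0."""
        rng = np.random.default_rng(seed)
        m, n = Y.shape
        A = rng.random((m, r))
        B = rng.random((r, n))
        for _ in range(num_iters):
            # B <- B .* (A^T Y) ./ (A^T A B): keeps B nonnegative and
            # does not increase the objective.
            B *= (A.T @ Y) / (A.T @ A @ B + eps)
            # A <- A .* (Y B^T) ./ (A B B^T): same for A.
            A *= (Y @ B.T) / (A @ B @ B.T + eps)
        return A, B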
Toy Demonstration of NMF

A face image dataset. Image size = $101 \times 101$, number of face images = 13232. Each $x_n$ is the vectorization of one face image, leading to $m = 101 \times 101 = 10201$, $N = 13232$.
Toy Demonstration of NMF: NMF-Extracted Features

NMF settings: $r = 49$, Lee-Seung multiplicative update with 5000 iterations.
Toy Demonstration of NMF: Comparison with PCA

[Figures: the mean face and the 1st, 2nd, 3rd, and last principal left singular vectors. Plot: energy concentration, i.e., energy (log scale) versus rank.]
Suggested Further Readings for NMF

- [Gillis2014] for a review of some classical NMF algorithms and separable NMF
  - separable NMF is an NMF subclass that features provably polynomial-time or very simple algorithms for solving NMF under certain assumptions
- [Fu-Huang-Sidiropoulos-Ma2018] for an overview of classic and most recent developments in NMF identifiability
  - note volume-minimization NMF, which has provably better identifiability than classic NMF
Topic 2: Tensor Decomposition
Tensor

A tensor is a multi-way numerical array. An $N$-way tensor is denoted by $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ and its entries by $x_{i_1, i_2, \ldots, i_N}$.

- natural extension of matrices (which are two-way tensors)
- example: a color picture is a 3-way tensor, a video a 4-way tensor
- applications: blind signal separation, chemometrics, data mining, ...
- focus: decomposition for 3-way tensors $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$ (sufficiently complicated)
Outer Product

The outer product of $a \in \mathbb{R}^I$ and $b \in \mathbb{R}^J$ is an $I \times J$ matrix

    $a \circ b = a b^T = [\, b_1 a, \ b_2 a, \ \ldots, \ b_J a \,].$

Here, $\circ$ is used to denote the outer product operator.
Outer Product

The outer product of a matrix $A \in \mathbb{R}^{I \times J}$ and a vector $c \in \mathbb{R}^K$, denoted by $A \circ c$, is a three-way $I \times J \times K$ tensor. Specifically, if we let $\mathcal{X} = A \circ c$, then

    $\mathcal{X}(:, :, k) = c_k A, \quad k = 1, \ldots, K.$
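A small NumPy check of this slab identity (illustrative; the broadcasting construction of the outer product is an assumption, not notation from the notes):

    import numpy as np

    # Build a ∘ b ∘ c as a 3-way array via broadcasting (I=3, J=4, K=5).
    a, b, c = np.arange(1, 4.0), np.arange(1, 5.0), np.arange(1, 6.0)
    X = a[:, None, None] * b[None, :, None] * c[None, None, :]
    # Frontal-slab identity: X(:, :, k) = c_k * (a b^T).
    for k in range(c.size):
        assert np.allclose(X[:, :, k], c[k] * np.outer(a, b))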
Outer Product

Tensors of the form $a \circ b \circ c$ are called rank-1 tensors.
Tensor Decomposition

Problem: decompose $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$ as

    $x_{i,j,k} = \sum_{r=1}^{R} a_{ir} b_{jr} c_{kr}, \quad i = 1, \ldots, I, \ j = 1, \ldots, J, \ k = 1, \ldots, K,$

for some $a_{ir}$, $b_{jr}$, and $c_{kr}$, $r = 1, \ldots, R$; or equivalently,

    $\mathcal{X} = \sum_{r=1}^{R} a_r \circ b_r \circ c_r,$

where $A = [a_1, \ldots, a_R] \in \mathbb{R}^{I \times R}$, $B = [b_1, \ldots, b_R] \in \mathbb{R}^{J \times R}$, $C = [c_1, \ldots, c_R] \in \mathbb{R}^{K \times R}$.
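The factor-matrix form maps directly to an einsum (a sketch; the sizes are arbitrary):

    import numpy as np

    # x_{ijk} = sum_r A[i,r] * B[j,r] * C[k,r], i.e., X = sum_r a_r ∘ b_r ∘ c_r.
    rng = np.random.default_rng(0)
    I, J, K, R = 4, 5, 6, 3
    A, B, C = rng.random((I, R)), rng.random((J, R)), rng.random((K, R))
    X = np.einsum('ir,jr,kr->ijk', A, B, C)
    assert np.isclose(X[1, 2, 3], np.sum(A[1, :] * B[2, :] * C[3, :]))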
Tensor Decomposition

    $\mathcal{X} = \sum_{r=1}^{R} a_r \circ b_r \circ c_r \qquad (\star)$

- a sum of rank-1 tensors
- the smallest $R$ that satisfies $(\star)$ is called the rank of the tensor
- many names: tensor rank decomposition, canonical polyadic decomposition (CPD), parallel factor analysis (PARAFAC), CANDECOMP
Application Example: Enron Data

About Enron:

- once ranked the 6th largest energy company in the world
- shares were worth $90.75 at their peak in Aug 2000 and dropped to $0.67 in Jan 2002
- most top executives were tried for fraud after it was revealed in Nov 2001 that Enron's earnings had been overstated by several hundred million dollars

Enron email database:

- a large database of over 600,000 emails generated by 158 employees of the Enron Corporation, acquired by the Federal Energy Regulatory Commission during its investigation after the company's collapse (according to Wikipedia)
Application Example: Enron Data

Source: http://www.math.uwaterloo.ca/~hdesterc/websitew/data/presentations/pres2012/valencia.pptx.pdf
Uniqueness of Tensor Decomposition

The low-rank matrix factorization $X = AB$ is not unique. We also have many matrix decompositions: SVD, QR, ...

Theorem 10.1. Let $(A, B, C)$ be a factor for $\mathcal{X} = \sum_{r=1}^{R} a_r \circ b_r \circ c_r$. If

    $\mathrm{krank}(A) + \mathrm{krank}(B) + \mathrm{krank}(C) \geq 2R + 2,$

then $(A, B, C)$ is the unique tensor decomposition factor for $\mathcal{X}$, up to a common column permutation and scaling. Here, $\mathrm{krank}(A)$, the Kruskal rank of $A$, is the largest $k$ such that every $k$ columns of $A$ are linearly independent.

Implication: under some mild conditions on $A, B, C$, low-rank tensor decomposition is essentially unique.

- the above theorem is just one of the known sufficient conditions for unique tensor decomposition; some other results give much more relaxed conditions for unique tensor decomposition
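For small cases, the Kruskal rank in the theorem can be checked by brute force (an illustrative sketch, not from the notes; the combinatorial search is exponential in the number of columns, so it only makes sense for toy examples):

    import numpy as np
    from itertools import combinations

    def krank(A):
        """Largest k such that every set of k columns of A is linearly independent."""
        n = A.shape[1]
        for k in range(n, 0, -1):
            if all(np.linalg.matrix_rank(A[:, list(cols)]) == k
                   for cols in combinations(range(n), k)):
                return k
        return 0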
Slabs

A slab of a tensor is a matrix obtained by fixing one index of the tensor.

- Horizontal slabs: $\{ X^{(1)}_i = \mathcal{X}(i, :, :) \}_{i=1}^{I}$
- Lateral slabs: $\{ X^{(2)}_j = \mathcal{X}(:, j, :) \}_{j=1}^{J}$
- Frontal slabs: $\{ X^{(3)}_k = \mathcal{X}(:, :, k) \}_{k=1}^{K}$
PARAFAC Formulation

Consider the horizontal slabs as an example:

    $X^{(1)}_i = \sum_{r=1}^{R} a_{i,r} (b_r \circ c_r) = \sum_{r=1}^{R} a_{i,r} b_r c_r^T.$

We can write

    $X^{(1)}_i = B D_{A(i,:)} C^T, \quad D_{A(i,:)} = \mathrm{Diag}(A(i, :)).$
PARAFAC Formulation

Khatri-Rao (KR) product: the KR product of $A \in \mathbb{R}^{I \times R}$ and $B \in \mathbb{R}^{J \times R}$ is

    $A \odot B = [\, a_1 \otimes b_1, \ \ldots, \ a_R \otimes b_R \,],$

where $\otimes$ denotes the Kronecker product. A key KR property: let $D = \mathrm{Diag}(d)$. We have

    $\mathrm{vec}(A D B^T) = (B \odot A) d.$

Idea: recall the horizontal slabs expression

    $X^{(1)}_i = B D_{A(i,:)} C^T, \quad D_{A(i,:)} = \mathrm{Diag}(A(i, :)).$

It can be re-expressed as

    $\mathrm{vec}(X^{(1)}_i) = (C \odot B) A(i, :)^T.$

Roughly speaking, we can do $A(i, :)^T = (C \odot B)^\dagger \mathrm{vec}(X^{(1)}_i)$.
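A quick NumPy check of the KR property $\mathrm{vec}(A D B^T) = (B \odot A) d$, using column-major (Fortran-order) vectorization (illustrative; khatri_rao here is a hand-rolled helper, not a library call):

    import numpy as np

    def khatri_rao(A, B):
        """Column-wise Kronecker product [a_1 ⊗ b_1, ..., a_R ⊗ b_R]."""
        (I, R), (J, _) = A.shape, B.shape
        return (A[:, None, :] * B[None, :, :]).reshape(I * J, R)

    rng = np.random.default_rng(1)
    A, B, d = rng.random((4, 3)), rng.random((5, 3)), rng.random(3)
    lhs = (A @ np.diag(d) @ B.T).reshape(-1, order="F")  # vec() stacks columns
    rhs = khatri_rao(B, A) @ d
    assert np.allclose(lhs, rhs)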
PARAFAC Formulation

By the trick above, we can write:

- Horizontal slabs: $X^{(1)}_i = \mathcal{X}(i, :, :) = B D_{A(i,:)} C^T$, so $\mathrm{vec}(X^{(1)}_i) = (C \odot B) A(i, :)^T$
- Lateral slabs: $X^{(2)}_j = \mathcal{X}(:, j, :) = C D_{B(j,:)} A^T$, so $\mathrm{vec}(X^{(2)}_j) = (A \odot C) B(j, :)^T$
- Frontal slabs: $X^{(3)}_k = \mathcal{X}(:, :, k) = A D_{C(k,:)} B^T$, so $\mathrm{vec}(X^{(3)}_k) = (B \odot A) C(k, :)^T$

Observation:

- fixing $B, C$, solving for $A$ is a linear system problem
- fixing $A, C$, solving for $B$ is a linear system problem
- fixing $A, B$, solving for $C$ is a linear system problem
PARAFAC Formulation

A tensor decomposition formulation: given $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$ and $R > 0$, solve

    $\min_{A, B, C} \Big\| \mathcal{X} - \sum_{r=1}^{R} a_r \circ b_r \circ c_r \Big\|_F^2.$

- NP-hard in general
- can be conveniently handled by alternating optimization w.r.t. $A, B, C$
- e.g., optimization w.r.t. $A$, fixing $B, C$, is

    $\min_A \sum_{i=1}^{I} \| X^{(1)}_i - B D_{A(i,:)} C^T \|_F^2 = \min_A \sum_{i=1}^{I} \| \mathrm{vec}(X^{(1)}_i) - (C \odot B) A(i, :)^T \|_2^2,$

  and a solution is $A(i, :)^T = (C \odot B)^\dagger \mathrm{vec}(X^{(1)}_i)$, $i = 1, \ldots, I$.
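Putting the three linear-system steps together gives plain alternating least squares (ALS) for the CPD. Below is a compact NumPy sketch (illustrative only: random initialization, no convergence check; the matricization conventions are chosen to match the slab/vec ordering above):

    import numpy as np

    def khatri_rao(A, B):
        """Column-wise Kronecker product [a_1 ⊗ b_1, ..., a_R ⊗ b_R]."""
        (I, R), (J, _) = A.shape, B.shape
        return (A[:, None, :] * B[None, :, :]).reshape(I * J, R)

    def cpd_als(X, R, num_iters=200, seed=0):
        """ALS for X ≈ sum_r a_r ∘ b_r ∘ c_r; returns factor matrices (A, B, C)."""
        rng = np.random.default_rng(seed)
        I, J, K = X.shape
        A, B, C = (rng.standard_normal((n, R)) for n in (I, J, K))
        # Matricizations whose columns are vec(X_i^(1)), vec(X_j^(2)), vec(X_k^(3)).
        M1 = np.transpose(X, (1, 2, 0)).reshape(J * K, I, order="F")
        M2 = np.transpose(X, (2, 0, 1)).reshape(K * I, J, order="F")
        M3 = X.reshape(I * J, K, order="F")
        for _ in range(num_iters):
            # Each step is a linear least-squares solve, cf. the slab equations.
            A = np.linalg.lstsq(khatri_rao(C, B), M1, rcond=None)[0].T
            B = np.linalg.lstsq(khatri_rao(A, C), M2, rcond=None)[0].T
            C = np.linalg.lstsq(khatri_rao(B, A), M3, rcond=None)[0].T
        return A, B, C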
Suggested Further Readings for Tensor Decomposition

[Sidiropoulos-De Lathauwer-Fu-Huang-Papalexakis-Faloutsos2017]
References

[Lee-Seung1999] D.D. Lee and H.S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, 1999.

[Gillis2014] N. Gillis, "The why and how of nonnegative matrix factorization," in Regularization, Optimization, Kernels, and Support Vector Machines, J.A.K. Suykens, M. Signoretto, and A. Argyriou (eds.), Chapman & Hall/CRC Machine Learning and Pattern Recognition Series, 2014.

[Fu-Huang-Sidiropoulos-Ma2018] X. Fu, K. Huang, N. D. Sidiropoulos, and W.-K. Ma, "Nonnegative matrix factorization for signal and data analytics: Identifiability, algorithms, and applications," arXiv preprint, 2018. Online available: https://arxiv.org/abs/1803.01257

[Sidiropoulos-De Lathauwer-Fu-Huang-Papalexakis-Faloutsos2017] N. D. Sidiropoulos, L. De Lathauwer, X. Fu, K. Huang, E. E. Papalexakis, and C. Faloutsos, "Tensor decomposition for signal processing and machine learning," IEEE Trans. Signal Process., 2017.