ENGG5781 Matrix Analysis and Computations Lecture 10: Non-Negative Matrix Factorization and Tensor Decomposition


ENGG5781 Matrix Analysis and Computations, Lecture 10: Non-Negative Matrix Factorization and Tensor Decomposition. Wing-Kin (Ken) Ma, 2017-2018 Term 2, Department of Electronic Engineering, The Chinese University of Hong Kong.

Topic 1: Nonnegative Matrix Factorization

Nonnegative Matrix Factorization

Consider again the low-rank factorization problem

$\min_{A \in \mathbb{R}^{m \times r},\, B \in \mathbb{R}^{r \times n}} \| Y - AB \|_F^2.$

The solution is not unique: if $(A^\star, B^\star)$ is a solution to the above problem, then $(A^\star Q^T, Q B^\star)$ for any orthogonal $Q$ is also a solution.

Nonnegative Matrix Factorization (NMF):

$\min_{A \in \mathbb{R}^{m \times r},\, B \in \mathbb{R}^{r \times n}} \| Y - AB \|_F^2 \quad \text{s.t. } A \geq 0,\ B \geq 0,$

where X ≥ 0 means that X is elementwise non-negative.
- found to be able to extract meaningful features (by empirical studies)
- under some conditions, the NMF solution is provably unique
- numerous applications, e.g., in machine learning, signal processing, remote sensing
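To make the non-uniqueness concrete, here is a minimal NumPy sketch (the sizes and matrices are made up for illustration, not part of the lecture) showing that an orthogonal rotation leaves the unconstrained fit unchanged but typically destroys nonnegativity, which is exactly what the constraints A ≥ 0, B ≥ 0 rule out.

import numpy as np

rng = np.random.default_rng(0)
m, n, r = 6, 8, 3

# a nonnegative ground-truth factorization Y = A B
A = rng.random((m, r))
B = rng.random((r, n))
Y = A @ B

# an arbitrary orthogonal matrix Q (QR factorization of a random matrix)
Q, _ = np.linalg.qr(rng.standard_normal((r, r)))

A2, B2 = A @ Q.T, Q @ B
print(np.allclose(A2 @ B2, Y))           # True: same fit, so the unconstrained
                                         # problem cannot distinguish (A, B) from (A2, B2)
print((A2 >= 0).all(), (B2 >= 0).all())  # typically False: the rotation breaks nonnegativity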

NMF Examples

Image Processing:
- A ≥ 0 constrains the basis elements to be nonnegative.
- B ≥ 0 imposes an additive reconstruction.
- the basis elements extract facial features such as eyes, nose and lips.

[Figure from [Lee-Seung1999]: NMF learns parts-based face representations, with basis images containing several versions of mouths, noses and other facial parts in different locations or forms; the variability of a whole face is generated by combining these parts, whereas VQ and PCA learn holistic representations. The NMF basis and encodings contain a large fraction of vanishing coefficients, so both the basis images and image encodings are sparse.]

Source: [Lee-Seung1999]

Text Mining
- basis elements allow us to recover different topics;
- weights allow us to assign each text to its corresponding topics.

NMF

$\min_{A \in \mathbb{R}^{m \times r},\, B \in \mathbb{R}^{r \times n}} \| Y - AB \|_F^2 \quad \text{s.t. } A \geq 0,\ B \geq 0.$

- NP-hard in general
- a practical way to go: alternating optimization w.r.t. A and B
- given B, minimizing $\| Y - AB \|_F^2$ over A ≥ 0 is convex; given A, minimizing $\| Y - AB \|_F^2$ over B ≥ 0 is also convex
- despite that, there is no closed-form solution to $\min_{A \geq 0} \| Y - AB \|_F^2$ or $\min_{B \geq 0} \| Y - AB \|_F^2$; it takes time to solve them when Y is big
- the state of the art, such as the Lee-Seung multiplicative update, applies inexact alternating optimization (a minimal sketch follows below)
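The Lee-Seung multiplicative updates for the Frobenius-norm objective are simple to state; the following NumPy sketch is one common variant (the function name, random initialization, fixed iteration count, and the small constant eps are illustrative choices, not the course code).

import numpy as np

def nmf_mu(Y, r, num_iters=500, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for min ||Y - AB||_F^2 s.t. A, B >= 0."""
    rng = np.random.default_rng(seed)
    m, n = Y.shape
    A = rng.random((m, r)) + eps
    B = rng.random((r, n)) + eps
    for _ in range(num_iters):
        # each update is an inexact minimization of one block; it keeps the
        # factors nonnegative and does not increase the objective
        B *= (A.T @ Y) / (A.T @ A @ B + eps)
        A *= (Y @ B.T) / (A @ B @ B.T + eps)
    return A, B

# usage on a synthetic nonnegative matrix
Y = np.abs(np.random.default_rng(1).standard_normal((50, 40)))
A, B = nmf_mu(Y, r=5)
print(np.linalg.norm(Y - A @ B, 'fro'))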

Toy Demonstration of NMF

A face image dataset. Image size = 101 × 101, number of face images = 13232. Each $x_n$ is the vectorization of one face image, leading to m = 101 × 101 = 10201 and N = 13232.

Toy Demonstration of NMF: NMF-Extracted Features

NMF settings: r = 49, Lee-Seung multiplicative update with 5000 iterations.

Toy Demonstration of NMF: Comparison with PCA

[Figure: mean face; the 1st, 2nd, 3rd, and last principal left singular vectors; and a plot of energy concentration (energy versus rank).]
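For reference, the quantities in the figure can be computed with a plain SVD; this is a minimal sketch under placeholder data (the lecture's data matrix would have m = 10201 rows and N = 13232 columns).

import numpy as np

# placeholder data matrix: one vectorized image per column
rng = np.random.default_rng(2)
X = rng.standard_normal((500, 200))

mean_face = X.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(X - mean_face, full_matrices=False)

# the "principal left singular vectors" are the columns of U;
# s**2 measures the energy captured by each rank-1 component
first_pc, second_pc, last_pc = U[:, 0], U[:, 1], U[:, -1]
energy = s**2
print(energy[:5] / energy.sum())   # fraction of energy in the leading components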

Suggested Further Readings for NMF
- [Gillis2014] for a review of some classical NMF algorithms and of separable NMF
- separable NMF is an NMF subclass that features provably polynomial-time or very simple algorithms for solving NMF under certain assumptions
- [Fu-Huang-Sidiropoulos-Ma2018] for an overview of classic and recent developments in NMF identifiability
- note volume-minimization NMF, which has provably better identifiability than classic NMF

Topic 2: Tensor Decomposition

Tensor

A tensor is a multi-way numerical array. An N-way tensor is denoted by $X \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ and its entries by $x_{i_1, i_2, \ldots, i_N}$.
- natural extension of matrices (which are two-way tensors)
- example: a color picture is a 3-way tensor, a video a 4-way tensor
- applications: blind signal separation, chemometrics, data mining, ...
- focus: decomposition for 3-way tensors $X \in \mathbb{R}^{I \times J \times K}$ (sufficiently complicated)

Outer Product

The outer product of $a \in \mathbb{R}^I$ and $b \in \mathbb{R}^J$ is an $I \times J$ matrix

$a \circ b = ab^T = [\, b_1 a,\ b_2 a,\ \ldots,\ b_J a \,].$

Here, $\circ$ is used to denote the outer product operator.

Outer Product

The outer product of a matrix $A \in \mathbb{R}^{I \times J}$ and a vector $c \in \mathbb{R}^K$, denoted by $A \circ c$, is a three-way $I \times J \times K$ tensor. Specifically, if we let $X = A \circ c$, then

$X(:, :, k) = c_k A, \quad k = 1, \ldots, K.$

Outer Product

Tensors of the form $a \circ b \circ c$ are called rank-1 tensors.
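As a quick check of these definitions, the sketch below builds a rank-1 three-way tensor with NumPy's multiply.outer and verifies the slab identity $X(:,:,k) = c_k (a b^T)$ from the previous slides (the sizes and vectors are arbitrary illustrations).

import numpy as np

rng = np.random.default_rng(0)
a, b, c = rng.random(4), rng.random(5), rng.random(3)   # a in R^4, b in R^5, c in R^3

# a o b = a b^T  (4 x 5 matrix)
M = np.multiply.outer(a, b)
print(np.allclose(M, np.outer(a, b)))        # True

# a o b o c  (4 x 5 x 3 rank-1 tensor)
X = np.multiply.outer(M, c)
for k in range(c.size):
    # frontal slab k equals c_k * (a b^T)
    assert np.allclose(X[:, :, k], c[k] * M)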

Tensor Decomposition

Problem: decompose $X \in \mathbb{R}^{I \times J \times K}$ as

$x_{i,j,k} = \sum_{r=1}^{R} a_{ir} b_{jr} c_{kr}$

for some $a_{ir}$, $b_{jr}$, and $c_{kr}$, $i = 1, \ldots, I$, $j = 1, \ldots, J$, $k = 1, \ldots, K$, $r = 1, \ldots, R$; or equivalently,

$X = \sum_{r=1}^{R} a_r \circ b_r \circ c_r,$

where $A = [\, a_1, \ldots, a_R \,] \in \mathbb{R}^{I \times R}$, $B = [\, b_1, \ldots, b_R \,] \in \mathbb{R}^{J \times R}$, $C = [\, c_1, \ldots, c_R \,] \in \mathbb{R}^{K \times R}$.

Tensor Decomposition

$X = \sum_{r=1}^{R} a_r \circ b_r \circ c_r \qquad (\star)$

- a sum of rank-1 tensors
- the smallest R that satisfies $(\star)$ is called the rank of the tensor
- many names: tensor rank decomposition, canonical polyadic decomposition (CPD), parallel factor analysis (PARAFAC), CANDECOMP
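The following sketch builds a low-rank tensor from given factor matrices A, B, C as the sum of rank-1 terms above; np.einsum is just one convenient way to write it (the sizes and rank are arbitrary).

import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 4, 5, 6, 3
A, B, C = rng.random((I, R)), rng.random((J, R)), rng.random((K, R))

# X = sum_r a_r o b_r o c_r, i.e. x_{ijk} = sum_r A[i,r] B[j,r] C[k,r]
X = np.einsum('ir,jr,kr->ijk', A, B, C)

# equivalent explicit sum of rank-1 tensors
X2 = sum(np.multiply.outer(np.multiply.outer(A[:, r], B[:, r]), C[:, r]) for r in range(R))
print(np.allclose(X, X2))   # True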

Application Example: Enron Data

About Enron:
- once ranked the 6th largest energy company in the world
- shares were worth $90.75 at their peak in Aug 2000 and dropped to $0.67 in Jan 2002
- most top executives were tried for fraud after it was revealed in Nov 2001 that Enron's earnings had been overstated by several hundred million dollars

Enron email database:
- a large database of over 600,000 emails generated by 158 employees of the Enron Corporation and acquired by the Federal Energy Regulatory Commission during its investigation after the company's collapse, according to Wikipedia

Application Example: Enron Data

Source: http://www.math.uwaterloo.ca/~hdesterc/websitew/data/presentations/pres2012/valencia.pptx.pdf

Uniqueness of Tensor Decomposition

The low-rank matrix factorization problem X = AB is not unique. We also have many matrix decompositions: SVD, QR, ...

Theorem 10.1. Let (A, B, C) be a factor for $X = \sum_{r=1}^{R} a_r \circ b_r \circ c_r$. If

$\mathrm{krank}(A) + \mathrm{krank}(B) + \mathrm{krank}(C) \geq 2R + 2,$

then (A, B, C) is the unique tensor decomposition factor for X up to a common column permutation and scaling.

Implication: under some mild conditions on A, B, C, low-rank tensor decomposition is essentially unique.
- the above theorem is just one of the known sufficient conditions for unique tensor decomposition; some other results suggest much more relaxed conditions for unique tensor decomposition
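Here krank(A) denotes the Kruskal rank of A: the largest k such that every set of k columns of A is linearly independent. A brute-force check, written here only to make the definition concrete (it is exponential in the number of columns, so it is usable only for small matrices):

import numpy as np
from itertools import combinations

def kruskal_rank(A, tol=1e-10):
    """Largest k such that every k columns of A are linearly independent (brute force)."""
    n_cols = A.shape[1]
    k = 0
    for size in range(1, n_cols + 1):
        if all(np.linalg.matrix_rank(A[:, list(cols)], tol=tol) == size
               for cols in combinations(range(n_cols), size)):
            k = size
        else:
            break
    return k

A = np.random.default_rng(0).standard_normal((6, 4))
print(kruskal_rank(A))   # generically 4 for a random 6 x 4 matrix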

Slabs

A slab of a tensor is a matrix obtained by fixing one index of the tensor.
- Horizontal slabs: $\{ X^{(1)}_i = X(i, :, :) \}_{i=1}^{I}$
- Lateral slabs: $\{ X^{(2)}_j = X(:, j, :) \}_{j=1}^{J}$
- Frontal slabs: $\{ X^{(3)}_k = X(:, :, k) \}_{k=1}^{K}$

[Figure: horizontal, lateral, and frontal slabs of a 3-way tensor.]

PARAFAC Formulation

Consider the horizontal slabs as an example:

$X^{(1)}_i = \sum_{r=1}^{R} a_{i,r} (b_r \circ c_r) = \sum_{r=1}^{R} a_{i,r} b_r c_r^T.$

We can write

$X^{(1)}_i = B D_{A(i,:)} C^T, \quad D_{A(i,:)} = \mathrm{Diag}(A(i, :)).$
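A quick numerical check of this slab identity, reusing the einsum construction from the earlier sketch (again with arbitrary small sizes):

import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 4, 5, 6, 3
A, B, C = rng.random((I, R)), rng.random((J, R)), rng.random((K, R))
X = np.einsum('ir,jr,kr->ijk', A, B, C)

for i in range(I):
    # horizontal slab i equals B Diag(A(i,:)) C^T
    assert np.allclose(X[i, :, :], B @ np.diag(A[i, :]) @ C.T)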

PARAFAC Formulation

Khatri-Rao (KR) product: the KR product of $A \in \mathbb{R}^{I \times R}$ and $B \in \mathbb{R}^{J \times R}$ is

$A \odot B = [\, a_1 \otimes b_1,\ \ldots,\ a_R \otimes b_R \,].$

A key KR property: let $D = \mathrm{Diag}(d)$. We have

$\mathrm{vec}(A D B^T) = (B \odot A) d.$

Idea: recall the horizontal slab expression $X^{(1)}_i = B D_{A(i,:)} C^T$, $D_{A(i,:)} = \mathrm{Diag}(A(i, :))$. It can be re-expressed as

$\mathrm{vec}(X^{(1)}_i) = (C \odot B) A(i,:)^T.$

Roughly speaking, we can do $A(i,:)^T = (C \odot B)^{\dagger} \mathrm{vec}(X^{(1)}_i)$.
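NumPy has no built-in Khatri-Rao product, but it is a column-wise Kronecker product; the sketch below defines a small helper (the name khatri_rao is ours) and verifies the vec identity above with arbitrary sizes.

import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: [a_1 kron b_1, ..., a_R kron b_R]."""
    assert A.shape[1] == B.shape[1]
    return np.vstack([np.kron(A[:, r], B[:, r]) for r in range(A.shape[1])]).T

rng = np.random.default_rng(0)
A, B, d = rng.random((4, 3)), rng.random((5, 3)), rng.random(3)

lhs = (A @ np.diag(d) @ B.T).flatten(order='F')   # vec() stacks the columns
rhs = khatri_rao(B, A) @ d
print(np.allclose(lhs, rhs))   # True: vec(A Diag(d) B^T) = (B KR A) d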

PARAFAC Formulation

By the trick above, we can write
- Horizontal slabs: $X^{(1)}_i = X(i, :, :) = B D_{A(i,:)} C^T$, $\ \mathrm{vec}(X^{(1)}_i) = (C \odot B) A(i,:)^T$
- Lateral slabs: $X^{(2)}_j = X(:, j, :) = C D_{B(j,:)} A^T$, $\ \mathrm{vec}(X^{(2)}_j) = (A \odot C) B(j,:)^T$
- Frontal slabs: $X^{(3)}_k = X(:, :, k) = A D_{C(k,:)} B^T$, $\ \mathrm{vec}(X^{(3)}_k) = (B \odot A) C(k,:)^T$

Observation:
- fixing B, C, solving for A is a linear system problem
- fixing A, C, solving for B is a linear system problem
- fixing A, B, solving for C is a linear system problem

PARAFAC Formulation

A tensor decomposition formulation: given $X \in \mathbb{R}^{I \times J \times K}$ and $R > 0$, solve

$\min_{A, B, C} \Big\| X - \sum_{r=1}^{R} a_r \circ b_r \circ c_r \Big\|_F^2.$

- NP-hard in general
- can be conveniently handled by alternating optimization w.r.t. A, B, C (a minimal sketch follows below)
- e.g., optimization w.r.t. A with B, C fixed is

$\min_{A} \sum_{i=1}^{I} \| X^{(1)}_i - B D_{A(i,:)} C^T \|_F^2 = \min_{A} \sum_{i=1}^{I} \| \mathrm{vec}(X^{(1)}_i) - (C \odot B) A(i,:)^T \|_2^2,$

and a solution is $A(i,:)^T = (C \odot B)^{\dagger} \mathrm{vec}(X^{(1)}_i)$, $i = 1, \ldots, I$.
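Below is a minimal alternating least squares sketch of this scheme in NumPy, written directly from the vec identities above. The random initialization, fixed iteration count, helper names, and the use of np.linalg.lstsq are illustrative choices, not the course code; practical CPD implementations solve for all rows of a factor at once via matricized least squares.

import numpy as np

def khatri_rao(A, B):
    # column-wise Kronecker product
    return np.vstack([np.kron(A[:, r], B[:, r]) for r in range(A.shape[1])]).T

def cpd_als(X, R, num_iters=200, seed=0):
    """Alternating LS for X ~ sum_r a_r o b_r o c_r (3-way CPD), following the slab formulas."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    A, B, C = rng.random((I, R)), rng.random((J, R)), rng.random((K, R))
    for _ in range(num_iters):
        # A(i,:)^T = (C KR B)^+ vec(X(i,:,:)),  i = 1,...,I
        M = khatri_rao(C, B)
        A = np.vstack([np.linalg.lstsq(M, X[i, :, :].flatten(order='F'), rcond=None)[0]
                       for i in range(I)])
        # B(j,:)^T = (A KR C)^+ vec(lateral slab j), per the previous slide
        M = khatri_rao(A, C)
        B = np.vstack([np.linalg.lstsq(M, X[:, j, :].T.flatten(order='F'), rcond=None)[0]
                       for j in range(J)])
        # C(k,:)^T = (B KR A)^+ vec(X(:,:,k))
        M = khatri_rao(B, A)
        C = np.vstack([np.linalg.lstsq(M, X[:, :, k].flatten(order='F'), rcond=None)[0]
                       for k in range(K)])
    return A, B, C

# usage: fit a synthetic rank-3 tensor
rng = np.random.default_rng(1)
A0, B0, C0 = rng.random((4, 3)), rng.random((5, 3)), rng.random((6, 3))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cpd_als(X, R=3)
print(np.linalg.norm(X - np.einsum('ir,jr,kr->ijk', A, B, C)))   # should be small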

Suggested Further Readings for Tensor Decomposition
- [Sidiropoulos-De Lathauwer-Fu-Huang-Papalexakis-Faloutsos2017]

References

[Lee-Seung1999] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, 1999.

[Gillis2014] N. Gillis, "The why and how of nonnegative matrix factorization," in Regularization, Optimization, Kernels, and Support Vector Machines, J. A. K. Suykens, M. Signoretto, and A. Argyriou (eds.), Chapman & Hall/CRC Machine Learning and Pattern Recognition Series, 2014.

[Fu-Huang-Sidiropoulos-Ma2018] X. Fu, K. Huang, N. D. Sidiropoulos, and W.-K. Ma, "Nonnegative matrix factorization for signal and data analytics: Identifiability, algorithms, and applications," arXiv preprint, 2018. Available online: https://arxiv.org/abs/1803.01257

[Sidiropoulos-De Lathauwer-Fu-Huang-Papalexakis-Faloutsos2017] N. D. Sidiropoulos, L. De Lathauwer, X. Fu, K. Huang, E. E. Papalexakis, and C. Faloutsos, "Tensor decomposition for signal processing and machine learning," IEEE Trans. Signal Process., 2017.