Non-negative Matrix Factorization: Algorithms, Extensions and Applications

Similar documents
Probabilistic Latent Variable Models as Non-Negative Factorizations

Matrix Factorization & Latent Semantic Analysis Review. Yize Li, Lanbo Zhang

Sparse and Shift-Invariant Feature Extraction From Non-Negative Data

Non-Negative Matrix Factorization And Its Application to Audio. Tuomas Virtanen Tampere University of Technology

Data Mining and Matrices

Nonnegative Matrix Factor 2-D Deconvolution for Blind Single Channel Source Separation

Probabilistic Latent Semantic Analysis

On Spectral Basis Selection for Single Channel Polyphonic Music Separation

CONVOLUTIVE NON-NEGATIVE MATRIX FACTORISATION WITH SPARSENESS CONSTRAINT

Single-channel source separation using non-negative matrix factorization

Nonnegative Matrix Factorization

c Springer, Reprinted with permission.

MULTIPLICATIVE ALGORITHM FOR CORRENTROPY-BASED NONNEGATIVE MATRIX FACTORIZATION

Non-Negative Matrix Factorization

Shift-Invariant Probabilistic Latent Component Analysis

An Efficient Posterior Regularized Latent Variable Model for Interactive Sound Source Separation

A State-Space Approach to Dynamic Nonnegative Matrix Factorization

A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement

Non-negative matrix factorization with fixed row and column sums

NMF WITH SPECTRAL AND TEMPORAL CONTINUITY CRITERIA FOR MONAURAL SOUND SOURCE SEPARATION. Julian M. Becker, Christian Sohn and Christian Rohlfing

he Applications of Tensor Factorization in Inference, Clustering, Graph Theory, Coding and Visual Representation

Nonnegative Matrix Factorization with Markov-Chained Bases for Modeling Time-Varying Patterns in Music Spectrograms

EXPLOITING LONG-TERM TEMPORAL DEPENDENCIES IN NMF USING RECURRENT NEURAL NETWORKS WITH APPLICATION TO SOURCE SEPARATION

Nonnegative Tensor Factorization with Smoothness Constraints

Discovering Convolutive Speech Phones using Sparseness and Non-Negativity Constraints

Non-Negative Tensor Factorisation for Sound Source Separation

Independent Component Analysis and Unsupervised Learning

SHIFTED AND CONVOLUTIVE SOURCE-FILTER NON-NEGATIVE MATRIX FACTORIZATION FOR MONAURAL AUDIO SOURCE SEPARATION. Tomohiko Nakamura and Hirokazu Kameoka,

ECE521 Lectures 9 Fully Connected Neural Networks

Orthogonal Nonnegative Matrix Factorization: Multiplicative Updates on Stiefel Manifolds

Document and Topic Models: plsa and LDA

SUPERVISED NON-EUCLIDEAN SPARSE NMF VIA BILEVEL OPTIMIZATION WITH APPLICATIONS TO SPEECH ENHANCEMENT

Independent Component Analysis and Unsupervised Learning. Jen-Tzung Chien

Variational inference

Non-negative matrix factorization and term structure of interest rates

Convolutive Non-Negative Matrix Factorization for CQT Transform using Itakura-Saito Divergence

Note on Algorithm Differences Between Nonnegative Matrix Factorization And Probabilistic Latent Semantic Indexing

Cheng Soon Ong & Christian Walder. Canberra February June 2018

EUSIPCO

Probabilistic Latent Semantic Analysis

THE task of identifying the environment in which a sound

BILEVEL SPARSE MODELS FOR POLYPHONIC MUSIC TRANSCRIPTION

Machine Learning for Signal Processing Non-negative Matrix Factorization

Sound Recognition in Mixtures

Bayesian Hierarchical Modeling for Music and Audio Processing at LabROSA

Source Separation Tutorial Mini-Series III: Extensions and Interpretations to Non-Negative Matrix Factorization

Recent Advances in Bayesian Inference Techniques

Machine Learning for Signal Processing Non-negative Matrix Factorization

SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS. Emad M. Grais and Hakan Erdogan

Knowledge Discovery and Data Mining 1 (VO) ( )

Single Channel Music Sound Separation Based on Spectrogram Decomposition and Note Classification

Notes on Latent Semantic Analysis

Data Mining Techniques

Learning with Noisy Labels. Kate Niehaus Reading group 11-Feb-2014

Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a

Latent Dirichlet Allocation

Expectation Maximization

Environmental Sound Classification in Realistic Situations

Automatic Rank Determination in Projective Nonnegative Matrix Factorization

OBJECT DETECTION AND RECOGNITION IN DIGITAL IMAGES

Fast Coordinate Descent methods for Non-Negative Matrix Factorization

Machine Learning (BSMC-GA 4439) Wenke Liu

COLLABORATIVE SPEECH DEREVERBERATION: REGULARIZED TENSOR FACTORIZATION FOR CROWDSOURCED MULTI-CHANNEL RECORDINGS. Sanna Wager, Minje Kim

PROBABILISTIC LATENT SEMANTIC ANALYSIS

TRAFFIC RISK MINING. Gargee Chaudhari, Asawari Deore, Shruti Sonaje, Rabia Qazi

LOW RANK TENSOR DECONVOLUTION

Lecture 13 : Variational Inference: Mean Field Approximation

COMS 4721: Machine Learning for Data Science Lecture 18, 4/4/2017

City Research Online. Permanent City Research Online URL:

Machine Learning Techniques for Computer Vision

PROJECTIVE NON-NEGATIVE MATRIX FACTORIZATION WITH APPLICATIONS TO FACIAL IMAGE PROCESSING

Non-Negative Matrix Factorization with Quasi-Newton Optimization

Analysis of polyphonic audio using source-filter model and non-negative matrix factorization

Non-Negative Tensor Factorization with Applications to Statistics and Computer Vision

ORTHOGONALITY-REGULARIZED MASKED NMF FOR LEARNING ON WEAKLY LABELED AUDIO DATA. Iwona Sobieraj, Lucas Rencker, Mark D. Plumbley

Lecture 8: Clustering & Mixture Models

LONG-TERM REVERBERATION MODELING FOR UNDER-DETERMINED AUDIO SOURCE SEPARATION WITH APPLICATION TO VOCAL MELODY EXTRACTION.

Feature Learning with Matrix Factorization Applied to Acoustic Scene Classification

Sparseness Constraints on Nonnegative Tensor Decomposition

Brief Introduction of Machine Learning Techniques for Content Analysis

Machine learning strategies for fmri analysis

Constrained Nonnegative Matrix Factorization with Applications to Music Transcription

Probabilistic Latent Semantic Analysis

ACOUSTIC SCENE CLASSIFICATION WITH MATRIX FACTORIZATION FOR UNSUPERVISED FEATURE LEARNING. Victor Bisot, Romain Serizel, Slim Essid, Gaël Richard

Bregman Divergences for Data Mining Meta-Algorithms

Unsupervised learning: beyond simple clustering and PCA

The Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision

Deep Learning Basics Lecture 7: Factor Analysis. Princeton University COS 495 Instructor: Yingyu Liang

ECS289: Scalable Machine Learning

ACCOUNTING FOR PHASE CANCELLATIONS IN NON-NEGATIVE MATRIX FACTORIZATION USING WEIGHTED DISTANCES. Sebastian Ewert Mark D. Plumbley Mark Sandler

TIME-DEPENDENT PARAMETRIC AND HARMONIC TEMPLATES IN NON-NEGATIVE MATRIX FACTORIZATION

Non-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs

Independent Component Analysis. Contents

Non-Negative Factorization for Clustering of Microarray Data

Clustering, K-Means, EM Tutorial

Received: 1 May 2009 / Accepted: 8 July 2010 / Published online: 1 August 2010 The Author(s) 2010

Latent Dirichlet Allocation Introduction/Overview

Machine Learning and Bayesian Inference. Unsupervised learning. Can we find regularity in data without the aid of labels?

Scaling Neighbourhood Methods

On monotonicity of multiplicative update rules for weighted nonnegative tensor factorization

Transcription:

Non-negative Matrix Factorization: Algorithms, Extensions and Applications Emmanouil Benetos www.soi.city.ac.uk/ sbbj660/ March 2013 Emmanouil Benetos Non-negative Matrix Factorization March 2013 1 / 25

Outline 1 Introduction 2 Non-negative Matrix Factorization 3 Probabilistic Latent Semantic Analysis 4 Convolutive Extensions 5 Tensorial Extensions 6 Resources Emmanouil Benetos Non-negative Matrix Factorization March 2013 2 / 25

Introduction Assumption: Perception of the whole is based on perception on its parts There is evidence for parts-based representations in the brain Algorithms for learning holistic representations: principal component analysis, singular value decomposition Allowing additive (and not subtractive) combinations leads to parts-based representations First work in the 90s: positive matrix factorization Emmanouil Benetos Non-negative Matrix Factorization March 2013 3 / 25

Non-negative Matrix Factorization (1) Non-negative Matrix Factorization (NMF): first proposed by Lee and Seung in 1999 Unsupervised algorithm for decomposing multivariate data Alternatively, a method for dimensionality reduction by factorizing a data matrix into a low rank decomposition Constraint: non-negativity of data This allows a parts-based representation because only additive combinations are allowed D. D. Lee and H. S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature, vol. 401, pp. 788-791, Oct. 1999. Emmanouil Benetos Non-negative Matrix Factorization March 2013 4 / 25

Non-negative Matrix Factorization (2) Model: Given a non-negative matrix V find non-negative matrix factors W and H such that: V WH (1) where: V R n m W R n r H R r m The rank r of the factorization is chosen as (n+m)r < nm, so that data is compressed The columns of H are in one-to-one correspondence with the columns of V. Thus WH can be interpreted as weighted sum of each of the basis vectors in W, the weights being the corresponding columns of H Emmanouil Benetos Non-negative Matrix Factorization March 2013 5 / 25

Non-negative Matrix Factorization (3) Cost Functions: To find an approximate factorization, cost functions that quantify the quality of the approximation need to be defined Euclidean distance: A B 2 = ij (A ij B ij ) 2 (2) Kullback-Leibler divergence (aka relative entropy): D(A B) = ( A ij log A ) ij A ij +B ij B ij ij (3) D. D. Lee and H. S. Seung, Algorithms for non-negative matrix factorization, in Proc. NIPS, pp. 556-562, 2001. Emmanouil Benetos Non-negative Matrix Factorization March 2013 6 / 25

Non-negative Matrix Factorization (4) Algorithms: Problem 1: Minimize V WH 2 wrt W and H, subject to W,H 0 Problem 2: Minimize D(V WH) wrt W and H, subject to W,H 0 W, H estimated using multiplicative update rules Alternative solutions: alternating least squares, gradient descent Other cost functions: Itakura-Saito distance, α-divergences, β-divergences, φ-divergences, Bregman divergences,... Algorithms converge to a local minimum Emmanouil Benetos Non-negative Matrix Factorization March 2013 7 / 25

Network representation Network representation / generative model: Emmanouil Benetos Non-negative Matrix Factorization March 2013 8 / 25

Applications of NMF (1) Applications: Detection, dimensionality reduction, clustering, classification, denoising, prediction Examples: Digital image analysis Text mining Audio signal analysis Bioinformatics Emmanouil Benetos Non-negative Matrix Factorization March 2013 9 / 25

Applications of NMF (2) Learning parts-based representation of faces: Emmanouil Benetos Non-negative Matrix Factorization March 2013 10 / 25

Applications of NMF (3) Discovering semantic features in text: Emmanouil Benetos Non-negative Matrix Factorization March 2013 11 / 25

z z Applications of NMF (4) Discovering musical notes in a recording: 700 600 500 400 ω 300 200 100 0.5 1 1.5 2 2.5 t (sec) 5 5 4 4 3 3 2 2 1 100 200 300 400 500 600 700 ω 1 0.5 1 1.5 2 2.5 t (sec) Emmanouil Benetos Non-negative Matrix Factorization March 2013 12 / 25

Probabilistic Latent Semantic Analysis (1) In 1999, Hofmann proposed a technique for text processing and retrieval, called Probabilistic Latent Semantic Analysis (PLSA)...which is also called Probabilistic Latent Semantic Indexing (PLSI)...and also Probabilistic Latent Component Analysis (PLCA)! In fact, PLSA/PLSI/PLCA are the probabilistic counterparts of NMF using the KL divergence as a cost function This interpretation offers a framework that is easy to generalise and extend T. Hofmann, Learning the Similarity of Documents: an information-geometric approach to document retrieval and categorization, in Proc. NIPS, pp-914-920, 2000. Emmanouil Benetos Non-negative Matrix Factorization March 2013 13 / 25

Probabilistic Latent Semantic Analysis (2) Model: Considering the entries of V as having been generated by a probability distribution P(x 1,x 2 ), PLSA models P(x 1,x 2 ) as a mixture of conditionally independent multinomial distributions: P(x 1,x 2 ) = z P(z)P(x 1 z)p(x 2 z) = P(x 1 ) z P(x 2 z)p(z x 1 ) The 1st formulation is called symmetric, the 2nd asymmetric Parameters can be estimated using the Expectation-Maximization (EM) algorithm Essentially: H is P(x 1 z) and W is P(z)P(x 2 z) (4) Emmanouil Benetos Non-negative Matrix Factorization March 2013 14 / 25

Probabilistic Latent Semantic Analysis (3) Example of symmetric PLSA: Emmanouil Benetos Non-negative Matrix Factorization March 2013 15 / 25

Convolutive Extensions (1) Instead of identifying 1-D components, we might want to detect 2-D structures out of non-negative data The linear model is not good enough - we need a convolutive model Non-negative Matrix Factor Deconvolution (NMFD): Extracting shifted structures from non-negative data Model: V t W t H t (5) where: V R m n, W t R m r, H R r n, and H t shifts the columns of H by t spots to the right P. Smaragdis, Non-negative matrix factor deconvolution; extraction of multiple sound sources from monophonic inputs, in Proc. LVA/ICA, pp. 494-499, Sep. 2004. Emmanouil Benetos Non-negative Matrix Factorization March 2013 16 / 25

Convolutive Extensions (2) Example of NMFD: Emmanouil Benetos Non-negative Matrix Factorization March 2013 17 / 25

Convolutive Extensions (3) Probabilistic counterpart of NMFD: Shift-invariant Probabilistic Latent Component Analysis Model (shift invariance across 1 dimension): P(x,y) = z P(z) τ P(x,τ z)p(y τ z) (6) Model (shift invariance across 2 dimensions): P(x,y) = P(z) P(τ x,τ y z)p(x τ x,y τ y z) (7) z τ x τ y P. Smaragdis, B. Raj, M. V. S. Shashanka, Sparse and shift-invariant feature extraction from non-negative data, in Proc. ICASSP, pp. 2069-2072, 2008. Emmanouil Benetos Non-negative Matrix Factorization March 2013 18 / 25

Convolutive Extensions (4) Example of Shift-invariant PLCA for handwriting recognition: Emmanouil Benetos Non-negative Matrix Factorization March 2013 19 / 25

Convolutive Extensions (5) Example of Shift-invariant PLCA for musical pitch detection: Emmanouil Benetos Non-negative Matrix Factorization March 2013 20 / 25

Tensorial Extensions (1) What if we want to decompose multidimensional data (i.e. tensors)? NMF extends to Non-negative Tensor Factorization (NTF) Model: V k u j 1 uj 2... uj n (8) j=1 where V is an n-way array. The approximation tensor has rank k. As in NMF, there are cost functions and update rules for estimating the nk vectors u j i A. Shashua and T. Hazan, Non-negative tensor factorization with applications to statistics and computer vision, in Proc. ICML, pp. 792-799, 2005. Emmanouil Benetos Non-negative Matrix Factorization March 2013 21 / 25

Tensorial Extensions (2) Comparison of extracted NMF bases (2nd row) with NTF bases (3rd row): Emmanouil Benetos Non-negative Matrix Factorization March 2013 22 / 25

Tensorial Extensions (3) Probabilistic counterpart: Probabilistic Latent Component Analysis (PLCA) Model: P(x) = z P(z) N P(x j z) (9) where P(x) is an N-dimensional distribution of the random variable x = x 1,x 2,,x N. j=1 Unknown parameters estimated using EM algorithm M. Shashanka, B. Raj, and P. Smaragdis, Probabilistic latent variable models as nonnegative factorizations, Computational Intelligence and Neuroscience, Article ID 947438, 2008. Emmanouil Benetos Non-negative Matrix Factorization March 2013 23 / 25

More Extensions... Incorporate sparsity constraints into NMF/PLSA Incorporate prior distributions into PLCA models (incorporating information into the problem) Keeping fixed bases: supervised algorithms! Temporal constraints for modeling time-series: e.g. Non-negative Hidden Markov Model (N-HMM) More complex models based on NMF/PLSA: introduce more latent variables What if we don t know the size of r? Non-parametric techniques (e.g. GaP-NMF) For analysing spectra without losing phase: Complex NMF, High-Resolution NMF Emmanouil Benetos Non-negative Matrix Factorization March 2013 24 / 25

Resources Matlab Statistics Toolbox NMFlib (Matlab): http://www.ee.columbia.edu/~grindlay/code.html NMF-CUDA (OpenMP): https://github.com/ebattenberg/nmf-cuda NMFLAB (Matlab): http://www.bsp.brain.riken.go.jp/icalab/nmflab.html NMFN (R): http://cran.r-project.org/web/packages/nmfn/index.html NMF-DTU (Matlab): http://cogsys.imm.dtu.dk/toolbox/nmf/index.html nima (Python): http://nimfa.biolab.si/ libnmf (C): http://www.univie.ac.at/rlcta/software/ ITL Toolbox (C++): http://www.insight-journal.org/browse/publication/152 NNMA (C++): http://people.kyb.tuebingen.mpg.de/suvrit/work/progs/nnma.html Emmanouil Benetos Non-negative Matrix Factorization March 2013 25 / 25