Single-channel source separation using non-negative matrix factorization

Size: px

Start display at page:

Download "Single-channel source separation using non-negative matrix factorization"

Mae Franklin
5 years ago
Views:

1 Single-channel source separation using non-negative matrix factorization Mikkel N. Schmidt Technical University of Denmark DTU Informatics Department of Informatics and Mathematical Modeling Page 1 of 30

2 Agenda Single-channel source separation Non-negative matrix factorization NMF 2-D deconvolution (with Morten Mørup) Speech separation using sparse NMF (with Rasmus K. Olsson) NMF with Gaussian process priors (with Hans Laurberg) Page 2 of 30

3 Single-channel source separation Additive model Mixture Sources Noise Under-determined problem: More information required Example: Two-source noise-free Page 3 of 30

4 Approaches to single-channel source separation Filtering Decomposition and grouping Source modelling Page 4 of 30

5 Model-based source-separation mixture training data feature extraction feature extraction data representation inference separation model model reconstruction estimate Page 5 of 30

6 Signal representation Emphasize desired characteristics Introduce invariances Allow assumptions of independence or exchangeability Reduce dimensionality Allow signal reconstruction Page 6 of 30

7 Model Likelihood Mixing model Source model Model building Prior Model training Model adaptation Goals Accurately model sources and mixing process Enable efficient inference Page 7 of 30

8 Inference Posterior Likelihood Prior Estimate sources: Maximum a posteriori, posterior mean, marginal MAP, etc. Solve optimization or integration problem Page 8 of 30

9 Non-negative matrix factorization Non-negativity constraints Data Factorizing matrices Page 9 of 30

10 Non-negative matrix factorization Non-negative bilinear model Sum of products of non-negative variables Page 10 of 30

11 Why non-negativity? Many signals are naturally non-negative Pixel intensities Amplitude spectra Occurrence counts Discrete probabilities Additive combination of features No cancellations Build the whole as a sum of parts Page 11 of 30

12 Basic computation Constrained minimization problem Divergence measure Constraints Page 12 of 30

13 Probabilistic formulation Maximum likelihood Likelihood Prior Maximum a posteriori Page 13 of 30

14 Divergence measures (Likelihoods) Squared error (Lee and Seung, 1999) Kullback-Leibler divergence (Lee and Seung, 1999) Bregman's divergence (Dhillon and Sra, 2005) Kompass' divergence (Kompass, 2007) Csiszár's divergence (Cichocki et al., 2006) Amari's alpha divergence (Cichocki et al., 2006) Weighted versions of the above (Guillamet et al., 2001) Page 14 of 30

15 Distribution of the factors (Priors) Sparsity (Hoyer, 2002) Orthogonality (Ding et al., 2005) Discriminative (Wang et al., 2004) Smoothness (Virtanen, 2003) Gaussian process (Schmidt and Laurberg, 2008) Transformation invariance (Wersing et al., 2003) Convolutive (Virtanen, 2004; Smaragdis, 2004) 2-D convolutive (Schmidt and Mørup, 2006) Page 15 of 30

16 Optimization strategies Direct optimization Non-convex Alternating optimization Alternating descent Can have convex subproblems Page 16 of 30

17 Algorithms Projected least squares (Paatero, 1997) Multiplicative updates (Lee and Seung, 1999) Projected gradient descent (Lin, 2007) Logarithmic barrier method (Lu and Wu, 2005) Active set (Berry et al., 2006) Quasi Newton (Kim et al., 2007) Reparametrization (Cichocki et al., 2006) SOCP (Heiler and Schnörr, 2006) Page 17 of 30

18 NMF 2-D deconvolution NMF NMF 2-D Page 18 of 30

19 Example: Flute and guitar Frequency [khz] Time [seconds].2 Page 19 of 30

20 Speech separation using sparse NMF Training data Learn basis for each source Mixture Basis Decompose mixture Reconstruct each source Source estimate Page 20 of 30

21 Example: Two speakers Bases Mixture Frequency [khz] Time [seconds] Page 21 of 30

22 NMF with Gaussian process priors Data Gaussian processes Link functions GP: General distribution over functions Link function: Non-linear map to non-negative reals Page 22 of 30

23 Gaussian process Distribution over functions Characterized by Mean function Covariance function Example Page 23 of 30

24 Change of variable Cholesky decomposition I.I.D. standard normal Same model, different parametrization Parameters are a priori uncorrelated Empirically better optimization properties Page 24 of 30

25 Illustration of NMF with GP priors Page 25 of 30

26 Example: Toy problem Noisy data Underlying data NMF GPP-NMF Page 26 of 30

27 Example: Toy problem Columns of A Rows of B Underlying data NMF GPP-NMF Page 27 of 30

28 Example: Chemical shift brain imaging Data: 369 point 512 points (8 8 8 grid) in human head Task: Distinguish between brain and muscle tissue Spectra prior: Smooth exponential distribution Activation prior: 3-D smooth, exponential distribution, leftto-right symmetric Random draw from prior Page 28 of 30

29 Example: Chemical shift brain imaging Non-negative matrix factorization NMF with Gaussian process priors Brain Muscle Page 29 of 30

30 Conclusions Single-channel source separation is a difficult problem that occurs in many areas Model based source separation provides a principled approach to solving the problem NMF and its extensions are useful models for singlechannel source separation Gaussian processes can be used as a general framework for incorporating prior information in NMF Page 30 of 30

Nonnegative Matrix Factor 2-D Deconvolution for Blind Single Channel Source Separation

Nonnegative Matrix Factor 2-D Deconvolution for Blind Single Channel Source Separation Mikkel N. Schmidt and Morten Mørup Technical University of Denmark Informatics and Mathematical Modelling Richard