Temporal models with low-rank spectrograms


1 Temporal models with low-rank spectrograms. Cédric Févotte, Institut de Recherche en Informatique de Toulouse (IRIT). IEEE MLSP, Aalborg, Sep. 2018.

2 Outline
  NMF for spectral unmixing: generalities, Itakura-Saito NMF
  Low-rank time-frequency synthesis: analysis vs synthesis, generative model, maximum joint likelihood estimation, variants
  NMF with transform learning: principle, optimisation

3 Collaborators
  Low-rank time-frequency synthesis: Matthieu Kowalski (Paris-Saclay University)
  NMF with transform learning: Dylan Fagot, Herwig Wendt (CNRS, IRIT, Toulouse)

4 NMF for audio spectral unmixing (Smaragdis and Brown, 2003)
Figure: the power spectrogram S (frequency x time) of the original temporal signal x(t) is approximated by the product WH.
y_{fn}: short-time Fourier transform (STFT) of the temporal signal x(t); s_{fn} = |y_{fn}|^2: power spectrogram.
NMF extracts recurring spectral patterns from the data by solving min_{W,H} D(S | WH) = Σ_{fn} d(s_{fn} | [WH]_{fn}).
Successful applications in audio source separation and music transcription.
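As a purely illustrative aside (not code from the talk), the following NumPy/SciPy sketch computes the power spectrogram s_{fn} = |y_{fn}|^2 from a signal x(t) and evaluates a measure of fit D(S | WH) for candidate factors; the window length, rank and the squared Euclidean divergence are arbitrary choices of this example.

```python
import numpy as np
from scipy.signal import stft

def power_spectrogram(x, fs=16000, nperseg=512):
    """STFT y_fn of x(t) and its power spectrogram s_fn = |y_fn|^2."""
    _, _, Y = stft(x, fs=fs, nperseg=nperseg)
    return Y, np.abs(Y) ** 2

def nmf_fit(S, W, H, d):
    """Measure of fit D(S | WH) = sum_fn d(s_fn | [WH]_fn) for an elementwise divergence d."""
    return np.sum(d(S, W @ H))

# Toy usage with random data and the squared Euclidean divergence.
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)
Y, S = power_spectrogram(x)
F, N = S.shape
K = 10
W, H = rng.random((F, K)), rng.random((K, N))
print(nmf_fit(S, W, H, lambda s, v: (s - v) ** 2))
```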

5 NMF for audio spectral unmixing (Smaragdis and Brown, 2003)
All factors are positive-valued, so the resulting reconstruction is additive.
Figure: an input music passage spectrogram X decomposed into non-negative factors W and H, with the individual components shown over frequency (Hz) and time (sec). Reproduced from (Smaragdis, 2013).

6 NMF for audio spectral unmixing (Smaragdis and Brown, 2003)
Figure: power spectrogram S ≈ WH, computed from the original temporal signal x(t) by a frequency transform.
What is the right time-frequency transform S? What is the right measure of fit D(S | WH)?
NMF approximates the spectrogram by a sum of rank-one spectrograms: how can temporal components be reconstructed? What about phase?

7 Itakura-Saito NMF (Févotte, Bertin, and Durrieu, 2009)
Gaussian low-rank variance model of the complex-valued STFT: y_{fn} ~ N_c(0, [WH]_{fn})
The log-likelihood is equivalent to the Itakura-Saito (IS) divergence: -log p(Y | WH) = D_IS(|Y|^2 | WH) + cst
Zero-mean assumption: E[x(t)] = 0 implies E[y_{fn}] = 0 by linearity.
Underlies a Gaussian composite model (GCM): y_{fn} = Σ_k z_{kfn}, with z_{kfn} ~ N_c(0, w_{fk} h_{kn})
Latent STFT components can be estimated a posteriori by Wiener filtering: ẑ_{kfn} = E[z_{kfn} | Y, W, H] = (w_{fk} h_{kn} / [WH]_{fn}) y_{fn}
The inverse STFT of {ẑ_{kfn}}_{fn} produces temporal components ĉ_k(t) such that x(t) = Σ_k ĉ_k(t).
Related work by (Benaroya et al., 2003; Abdallah and Plumbley, 2004; Parry and Essa, 2007)
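A minimal NumPy sketch of the Wiener-filter reconstruction step, assuming the complex STFT Y and fitted factors W, H are already available; scipy.signal.istft is used for the inverse transform and its parameters must match those of the analysis STFT (the values below are placeholders).

```python
import numpy as np
from scipy.signal import istft

def wiener_components(Y, W, H, fs=16000, nperseg=512, eps=1e-12):
    """Posterior-mean (Wiener) reconstruction of the K temporal components.

    Y: complex STFT of the signal, shape (F, N); W: (F, K); H: (K, N).
    The STFT parameters passed to istft must match those used for analysis.
    """
    V = W @ H + eps                                  # low-rank variances [WH]_fn
    components = []
    for k in range(W.shape[1]):
        gain = np.outer(W[:, k], H[k, :]) / V        # w_fk h_kn / [WH]_fn
        Z_k = gain * Y                               # Wiener estimate of z_kfn
        _, c_k = istft(Z_k, fs=fs, nperseg=nperseg)  # inverse STFT of the k-th component
        components.append(c_k)
    return components                                # these sum (approximately) to x(t)
```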

13 The Itakura-Saito divergence (Itakura and Saito, 1968; Gray et al., 1980)
Figure: divergence d(x | y) as a function of y for x = 1 (Itakura-Saito, generalised KL, squared Euclidean).
IS divergence: d_IS(x | y) = x/y - log(x/y) - 1. Nonconvex in y.
Scale-invariance: d_IS(λx | λy) = d_IS(x | y). Very relevant for spectral data with high dynamic range.
In comparison, d_Euc(λx | λy) = λ^2 d_Euc(x | y) and d_KL(λx | λy) = λ d_KL(x | y).
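A tiny numerical check of the definition and of the scale-invariance property (the function name d_is is ours, introduced for this illustration):

```python
import numpy as np

def d_is(x, y):
    """Itakura-Saito divergence d_IS(x | y) = x/y - log(x/y) - 1 (elementwise)."""
    r = x / y
    return r - np.log(r) - 1.0

x, y, lam = 2.0, 0.5, 1e3
print(d_is(x, y))                  # some positive value
print(d_is(lam * x, lam * y))      # identical value: the divergence is scale-invariant
print(d_is(x, x))                  # 0.0: the divergence vanishes when y = x
```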

14 Optimisation for IS-NMF (Févotte and Idier, 2011)
Objective: min_{W,H} D_IS(S | WH) = Σ_{fn} [ s_{fn}/[WH]_{fn} - log(s_{fn}/[WH]_{fn}) - 1 ] = Σ_{fn} [ s_{fn}/[WH]_{fn} + log [WH]_{fn} ] + cst
State of the art: block-coordinate descent on (W, H) with majorisation-minimisation (MM). The updates of W and H are equivalent by transposition of S, and MM leads to multiplicative updates with linear complexity per iteration:
h_{kn} ← h_{kn} (Σ_f w_{fk} s_{fn} [WH]_{fn}^{-2}) / (Σ_f w_{fk} [WH]_{fn}^{-1})
The problem is nonconvex (because of the bilinearity and the divergence), so initialisation matters.
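A minimal NumPy sketch of these multiplicative MM updates, written from the formulas on this slide rather than from the speaker's reference implementation; the small eps flooring is an addition for numerical safety.

```python
import numpy as np

def is_nmf(S, K, n_iter=200, eps=1e-12, seed=0):
    """IS-NMF of a nonnegative matrix S via multiplicative MM updates."""
    rng = np.random.default_rng(seed)
    F, N = S.shape
    W = rng.random((F, K)) + eps
    H = rng.random((K, N)) + eps
    for _ in range(n_iter):
        V = W @ H + eps
        H *= (W.T @ (S / V**2)) / (W.T @ (1.0 / V))   # h_kn update
        V = W @ H + eps
        W *= ((S / V**2) @ H.T) / ((1.0 / V) @ H.T)   # w_fk update (same rule, transposed)
    return W, H
```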

15 Piano toy example (MIDI numbers: 61, 65, 68, 72). Figure: three representations of the data. A demo is available online.

16 Piano toy example: IS-NMF on the power spectrogram with K = 8. Figure: dictionary W, coefficients H and reconstructed components for each k. Pitch estimates (true values: 61, 65, 68, 72).

17 Piano toy example: KL-NMF on the magnitude spectrogram with K = 8. Figure: dictionary W, coefficients H and reconstructed components for each k. Pitch estimates (true values: 61, 65, 68, 72).

18 Follow-up on IS-NMF
  penalised versions promoting sparsity or dynamics (Lefèvre, Bach, and Févotte, 2011a; Févotte, 2011; Févotte, Le Roux, and Hershey, 2013)
  model order selection (Tan and Févotte, 2013)
  online/incremental variants (Dessein et al., 2010; Lefèvre et al., 2011b)
  Bayesian approaches (Hoffman et al., 2010; Dikmen and Févotte, 2011; Turner and Sahani, 2014)
  full-covariance models (Liutkus et al., 2011; Yoshii et al., 2013)
  improved phase models (Badeau, 2011; Magron, Badeau, and David, 2017)
  multichannel variants (Ozerov and Févotte, 2010; Sawada et al., 2013; Kounades-Bastian et al., 2016; Leglaive et al., 2016)
Figure: the multichannel NMF problem: sources S, modelled by NMF (W, H), pass through a mixing system A with additive noise to produce the mixture X; estimate W, H and A from X.

19 Outline
  NMF for spectral unmixing: generalities, Itakura-Saito NMF
  Low-rank time-frequency synthesis: analysis vs synthesis, generative model, maximum joint likelihood estimation, variants
  NMF with transform learning: principle, optimisation

20 Analysis vs synthesis
IS-NMF is a generative model of the STFT but not of the raw signal itself. Low-rank time-frequency synthesis (LRTFS) fills this gap.
The STFT is an analysis transform: y_{fn} = Σ_t x(t) φ_{fn}(t).
LRTFS is a synthesis model: x(t) = Σ_{fn} α_{fn} φ_{fn}(t) + e(t).
Figure: two Gabor atoms φ_{fn}(t).

21 Low-rank time-frequency synthesis (LRTFS) (Févotte and Kowalski, 2014)
Gaussian low-rank variance model of the synthesis coefficients:
x(t) = Σ_{fn} α_{fn} φ_{fn}(t) + e(t), with α_{fn} ~ N_c(0, [WH]_{fn}) and e(t) ~ N_c(0, λ).
LRTFS is a generative model of the raw signal x(t).
As in IS-NMF, the synthesis coefficients have a latent composite structure: α_{fn} = Σ_k z_{kfn}, z_{kfn} ~ N_c(0, w_{fk} h_{kn}).
Given {ẑ_{kfn}}_{fn}, temporal components can be reconstructed as ĉ_k(t) = Σ_{fn} ẑ_{kfn} φ_{fn}(t).
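To make the generative reading concrete, the sketch below draws coefficients α_{fn} ~ N_c(0, [WH]_{fn}) and synthesises a signal with a small Gabor-like dictionary of windowed complex exponentials; this toy frame is our own simplification and stands in for the proper time-frequency dictionaries (e.g. from LTFAT) used in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
T, F, N, K = 4096, 32, 32, 4            # signal length, frequency bins, time frames, rank
hop, win_len = T // N, 256
t = np.arange(T)

def atom(f, n):
    """Gabor-like atom: Gaussian window centred on frame n times a complex exponential."""
    centre = n * hop
    window = np.exp(-0.5 * ((t - centre) / (win_len / 4)) ** 2)
    return window * np.exp(2j * np.pi * f * (t - centre) / (2 * F))

# Low-rank variances [WH]_fn and coefficients alpha_fn ~ N_c(0, [WH]_fn).
W, H = rng.random((F, K)), rng.random((K, N))
V = W @ H
alpha = np.sqrt(V / 2) * (rng.standard_normal((F, N)) + 1j * rng.standard_normal((F, N)))

# Synthesis: x(t) = sum_fn alpha_fn phi_fn(t) + e(t), with e(t) ~ N_c(0, lambda).
lam = 1e-3
x = sum(alpha[f, n] * atom(f, n) for f in range(F) for n in range(N))
x = x + np.sqrt(lam / 2) * (rng.standard_normal(T) + 1j * rng.standard_normal(T))
print(x.shape, x.dtype)                 # (4096,) complex128
```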

22 Relation to sparse Bayesian learning (SBL)
Generative signal model in vector/matrix form: x = Φα + e.
x, e: vectors of signal and residual time samples (size T); α: vector of synthesis coefficients α_{fn} (size FN); Φ: time-frequency dictionary (size T x FN).
Synthesis coefficient model in vector/matrix form: p(α | v) = N_c(α | 0, diag(v)), where v is the vector of variance coefficients v_{fn} = [WH]_{fn} (size FN).
Similar to sparse Bayesian learning (Tipping, 2001; Wipf and Rao, 2004), except that the variance parameters are tied together by the low-rank structure WH.
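A few illustrative lines spelling out this vector/matrix form: the LRTFS prior is an SBL-style diagonal Gaussian whose FN variance parameters are tied to the rank-K product WH (all dimensions below are made up for the example).

```python
import numpy as np

F, N, K = 16, 8, 3
rng = np.random.default_rng(0)
W, H = rng.random((F, K)), rng.random((K, N))

# SBL would treat v as FN free hyperparameters; LRTFS ties them to a rank-K structure.
v = (W @ H).ravel()                      # variance vector v_fn = [WH]_fn, size FN
prior_cov = np.diag(v)                   # p(alpha | v) = N_c(alpha | 0, diag(v))
print(prior_cov.shape)                   # (128, 128), i.e. (FN, FN)
```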

23 Maximum joint likelihood estimation in LRTFS
Optimise C(α, W, H) := -log p(x, α | W, H, λ) = (1/λ) ||x - Φα||_2^2 + Σ_{fn} [ |α_{fn}|^2 / [WH]_{fn} + log [WH]_{fn} ] + cst
Block coordinate descent over (α, W, H):
Optimisation of α: min_α (1/λ) ||x - Φα||_2^2 + Σ_{fn} |α_{fn}|^2 / [WH]_{fn}, a ridge regression solved with complex-valued FISTA (Chaâri et al., 2011; Florescu et al., 2014).
Optimisation of W, H: min_{W,H} Σ_{fn} d_IS(|α_{fn}|^2 | [WH]_{fn}), i.e. IS-NMF with majorisation-minimisation (Févotte and Idier, 2011).

26 Maximum joint likelihood estimation in LRTFS
Algorithm 1: Block coordinate descent for LRTFS
  Set L ≥ ||Φ||_2^2 (inverse gradient step size)
  Set α^(0) = Φ^H x (STFT)
  repeat
    % Update W and H with NMF
    {W^(i+1), H^(i+1)} = arg min_{W,H} Σ_{fn} d_IS(|α_{fn}^(i)|^2 | [WH]_{fn})
    % Update α with FISTA
    repeat
      % Gradient descent
      z^(i) = α^(i) + (1/L) Φ^H (x - Φα^(i))
      % Shrink
      α_{fn}^(i+1) = ( [WH]_{fn}^(i+1) / ([WH]_{fn}^(i+1) + λ/L) ) z_{fn}^(i)
      % Accelerate with momentum
    until convergence
  until convergence
Complexity: one NMF per update of the synthesis coefficients. Efficient multiplication by Φ or Φ^H using the LTFAT time-frequency toolbox.
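A compact NumPy sketch of this alternating scheme, in an ISTA-style form without the momentum acceleration; it assumes Φ is given as an explicit T x FN matrix and reuses the is_nmf helper sketched earlier, whereas a practical implementation would use fast analysis/synthesis operators (e.g. LTFAT) instead of dense matrix products.

```python
import numpy as np

def lrtfs_mjle(x, Phi, F, N, K, lam=1e-2, n_outer=20, n_inner=30, eps=1e-12):
    """Block coordinate descent for LRTFS maximum joint likelihood (ISTA variant)."""
    L = np.linalg.norm(Phi, 2) ** 2              # squared spectral norm of Phi
    alpha = Phi.conj().T @ x                     # initial coefficients (analysis / STFT)
    for _ in range(n_outer):
        # NMF step on the power of the current synthesis coefficients
        S = np.abs(alpha.reshape(F, N)) ** 2 + eps
        W, H = is_nmf(S, K)                      # IS-NMF sketch from above
        V = (W @ H).ravel() + eps
        # Proximal-gradient (gradient + shrink) steps on alpha
        for _ in range(n_inner):
            z = alpha + Phi.conj().T @ (x - Phi @ alpha) / L
            alpha = (V / (V + lam / L)) * z
    return alpha.reshape(F, N), W, H
```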

27 Noisy piano example. Figure: (a) noisy signal; (b) IS-NMF decomposition into temporal components c_1(t), ..., c_10(t) (with audio examples).

28 Noisy piano example. Figure: (a) noisy signal; (b) LRTFS decomposition into temporal components c_1(t), ..., c_10(t) (with audio examples).

29 Remarks about LRTFS
Real-valued signals (Févotte and Kowalski, 2018): x(t) was previously assumed complex-valued for simplicity. In practice x(t) is real-valued and Φ, α have Hermitian symmetry: x(t) = Σ_{f=1}^{F/2} Σ_{n=1}^{N} 2 Re[α_{fn} φ_{fn}(t)] + e(t). This case is more difficult to address but leads to essentially the same algorithm.
Multi-layer representations (Févotte and Kowalski, 2014, 2015): LRTFS allows for multi-resolution hybrid representations x = Φ_a α_a + Φ_b α_b + e, where Φ_a and Φ_b are time-frequency dictionaries with possibly different resolutions, and α_a and α_b have their own structure, either low-rank or sparse. Not possible with standard NMF!

30 Remarks about LRTFS
Compressive sensing (Févotte and Kowalski, 2018): a signal x of size T is sensed through a linear operator A of size S x T: b = Ax + e = AΦα + e. Thanks to LRTFS, low-rankness can be used instead of sparsity: estimate α from b under α_{fn} ~ N_c(0, [WH]_{fn}), with a similar algorithm.
Figure: output SNR versus number of measurements S (in % of T) for a toy piano signal and a song excerpt (Mamavatu by S. Raman: acoustic guitar, percussion, drums); recovery with LRTFS, SBL, LASSO (optimal hyperparameter) and two oracles (spectrogram, and WH from IS-NMF).

31 Outline
  NMF for spectral unmixing: generalities, Itakura-Saito NMF
  Low-rank time-frequency synthesis: analysis vs synthesis, generative model, maximum joint likelihood estimation, variants
  NMF with transform learning: principle, optimisation

32 NMF with transform learning (Fagot, Wendt, and Févotte, 2018). Figure: the matrix X of windowed segments of the original temporal signal x(t) is passed through the Fourier transform, and |U_FT X|^2 is factorised as WH.

33 NMF with transform learning (Fagot, Wendt, and Févotte, 2018). Figure: the same pipeline with an adaptive transform U in place of the fixed Fourier transform: |UX|^2 is factorised as WH.

34 NMF with transform learning (Fagot, Wendt, and Févotte, 2018)
The power spectrogram S can be written as S = |U_FT X|^2, where X (size F x N) contains adjacent windowed segments of x(t) and U_FT (size F x F) is the orthogonal Fourier matrix.
Traditional NMF: min_{W,H} D(|U_FT X|^2 | WH) s.t. W, H ≥ 0
NMF with transform learning (TL-NMF): min_{W,H,U} D(|UX|^2 | WH) s.t. W, H ≥ 0, U orthogonal
Can be interpreted as a one-layer factorising network.
Inspired by the sparsifying transform of (Ravishankar and Bresler, 2013): min_U ||UX||_1 s.t. U square and invertible
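An illustrative NumPy sketch of the quantities involved: building the matrix X of adjacent windowed segments and evaluating the fit D_IS(|UX|^2 | WH) for a fixed orthogonal transform and for a random orthogonal candidate. The orthogonal DCT matrix is used here as a real-valued stand-in for U_FT, the sine window and sizes are arbitrary, and is_nmf refers to the earlier sketch.

```python
import numpy as np
from scipy.fft import dct

def segment(x, F, hop):
    """Stack adjacent windowed frames of x(t) into the columns of X (size F x N)."""
    win = np.sin(np.pi * (np.arange(F) + 0.5) / F)        # simple sine window
    n_frames = (len(x) - F) // hop + 1
    return np.stack([win * x[i*hop : i*hop + F] for i in range(n_frames)], axis=1)

def d_is_mat(S, V, eps=1e-12):
    """Itakura-Saito fit D_IS(S | V) summed over all entries."""
    R = (S + eps) / (V + eps)
    return np.sum(R - np.log(R) - 1.0)

rng = np.random.default_rng(0)
x = rng.standard_normal(8192)
F, hop, K = 64, 32, 5
X = segment(x, F, hop)

U_fixed = dct(np.eye(F), axis=0, norm='ortho')            # real orthogonal stand-in for U_FT
W, H = is_nmf(np.abs(U_fixed @ X) ** 2, K)                # traditional NMF on |U_FT X|^2
print("fixed transform objective:", d_is_mat(np.abs(U_fixed @ X) ** 2, W @ H))

# TL-NMF additionally optimises over orthogonal U; any orthogonal candidate can be scored:
Q, _ = np.linalg.qr(rng.standard_normal((F, F)))          # random orthogonal U
print("random U objective:", d_is_mat(np.abs(Q @ X) ** 2, W @ H))
```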

38 NMF with transform learning (Fagot, Wendt, and Févotte, 2018)
Objective: min_{W,H,U} D_IS(|UX|^2 | WH) + λ ||H||_1 s.t. W, H ≥ 0, ||w_k||_1 = 1, UU^T = I
For simplicity, U is a real-valued orthogonal matrix. Imposing sparsity on H is of practical importance.
Block coordinate descent over (U, W, H):
  update of (W, H) with majorisation-minimisation;
  update of U: projected gradient descent with line search (Manton, 2002), or a Jacobi algorithm in which U is decomposed as a product of Givens rotations (Wendt, Fagot, and Févotte, 2018).
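A rough sketch of one possible U update in the spirit of the Givens-rotation (Jacobi) idea: a single plane rotation of rows (i, j) is scanned over a grid of angles and kept if it decreases the objective. This is our simplified illustration, not the published algorithm, and it reuses d_is_mat from the previous sketch.

```python
import numpy as np

def givens_update(U, X, V, i, j, n_angles=64):
    """Try Givens rotations of rows (i, j) of U and keep the best one.

    V = W @ H is the current low-rank variance fit; the objective is
    D_IS(|U X|^2 | V), evaluated with the d_is_mat helper above.
    """
    best_U, best_obj = U, d_is_mat(np.abs(U @ X) ** 2, V)
    for theta in np.linspace(0.0, np.pi, n_angles, endpoint=False):
        R = np.eye(U.shape[0])
        c, s = np.cos(theta), np.sin(theta)
        R[i, i], R[i, j], R[j, i], R[j, j] = c, -s, s, c   # rotation in the (i, j) plane
        U_try = R @ U                                      # remains orthogonal
        obj = d_is_mat(np.abs(U_try @ X) ** 2, V)
        if obj < best_obj:
            best_U, best_obj = U_try, obj
    return best_U, best_obj
```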

39 Learnt atoms from the piano toy example. Figure: the most active atoms u_1, ..., u_30 learnt with TL-NMF (random initialisation).

40 Learnt atoms from the piano toy example. Figure: selected atom pairs; some atoms form pairs in phase quadrature.

41 Temporal decomposition with TL-NMF. Figure: the eight reconstructed components over time (s). Audio examples: c_1(t), ..., c_8(t).

42 Summary
IS-NMF is a generative model of the STFT.
LRTFS is a generative model of the signal itself, with a low-rank variance structure on the synthesis coefficients.
TL-NMF learns a short-time transform together with the factorisation.
Both LRTFS and TL-NMF take the raw signal as input.
PhD and postdoc positions in machine learning and signal processing are available within the ERC project FACTORY.

44 References I
S. A. Abdallah and M. D. Plumbley. Polyphonic transcription by nonnegative sparse coding of power spectra. In Proc. International Symposium on Music Information Retrieval (ISMIR), Barcelona, Spain, Oct. 2004.
R. Badeau. Gaussian modeling of mixtures of non-stationary signals in the time-frequency domain (HR-NMF). In Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2011.
L. Benaroya, R. Gribonval, and F. Bimbot. Non negative sparse representation for Wiener based source separation with a single sensor. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hong Kong, 2003.
L. Chaâri, J.-C. Pesquet, A. Benazza-Benyahia, and P. Ciuciu. A wavelet-based regularized reconstruction algorithm for SENSE parallel MRI with applications to neuroimaging. Medical Image Analysis, 15(2):185-201, 2011.
L. Daudet and B. Torrésani. Hybrid representations for audiophonic signal encoding. Signal Processing, 82(11), 2002.
A. Dessein, A. Cont, and G. Lemaitre. Real-time polyphonic music transcription with non-negative matrix factorization and beta-divergence. In Proc. International Society for Music Information Retrieval Conference (ISMIR), 2010.
O. Dikmen and C. Févotte. Nonnegative dictionary learning in the exponential noise model for adaptive music signal representation. In Advances in Neural Information Processing Systems (NIPS), Granada, Spain, Dec. 2011.

45 References II
D. Fagot, H. Wendt, and C. Févotte. Nonnegative matrix factorization with transform learning. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr. 2018.
C. Févotte. Majorization-minimization algorithm for smooth Itakura-Saito nonnegative matrix factorization. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, May 2011.
C. Févotte and J. Idier. Algorithms for nonnegative matrix factorization with the beta-divergence. Neural Computation, 23(9), Sep. 2011.
C. Févotte and M. Kowalski. Low-rank time-frequency synthesis. In Advances in Neural Information Processing Systems (NIPS), Dec. 2014.
C. Févotte and M. Kowalski. Hybrid sparse and low-rank time-frequency signal decomposition. In Proc. European Signal Processing Conference (EUSIPCO), Nice, France, Sep. 2015.
C. Févotte and M. Kowalski. Estimation with low-rank time-frequency synthesis models. IEEE Transactions on Signal Processing, 66(15), Aug. 2018.
C. Févotte, N. Bertin, and J.-L. Durrieu. Nonnegative matrix factorization with the Itakura-Saito divergence. With application to music analysis. Neural Computation, 21(3):793-830, Mar. 2009.

46 References III
C. Févotte, J. Le Roux, and J. R. Hershey. Non-negative dynamical system with application to speech and audio. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, Canada, May 2013.
A. Florescu, E. Chouzenoux, J.-C. Pesquet, P. Ciuciu, and S. Ciochina. A majorize-minimize memory gradient method for complex-valued inverse problems. Signal Processing, 103, 2014.
R. M. Gray, A. Buzo, A. H. Gray, and Y. Matsuyama. Distortion measures for speech processing. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(4), Aug. 1980.
M. Hoffman, D. Blei, and P. Cook. Bayesian nonparametric matrix factorization for recorded music. In Proc. 27th International Conference on Machine Learning (ICML), Haifa, Israel, 2010.
F. Itakura and S. Saito. Analysis synthesis telephony based on the maximum likelihood method. In Proc. 6th International Congress on Acoustics, pages C-17-C-20, Tokyo, Japan, Aug. 1968.
D. Kounades-Bastian, L. Girin, X. Alameda-Pineda, S. Gannot, and R. Horaud. A variational EM algorithm for the separation of time-varying convolutive audio mixtures. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(8), Aug. 2016.
A. Lefèvre, F. Bach, and C. Févotte. Itakura-Saito nonnegative matrix factorization with group sparsity. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, May 2011.
A. Lefèvre, F. Bach, and C. Févotte. Online algorithms for nonnegative matrix factorization with the Itakura-Saito divergence. In Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Mohonk, NY, Oct. 2011.

47 References IV
S. Leglaive, R. Badeau, and G. Richard. Multichannel audio source separation with probabilistic reverberation priors. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(12), Dec. 2016.
A. Liutkus, R. Badeau, and G. Richard. Gaussian processes for underdetermined source separation. IEEE Transactions on Signal Processing, 59(7), July 2011.
P. Magron, R. Badeau, and B. David. Phase-dependent anisotropic Gaussian model for audio source separation. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.
J. H. Manton. Optimization algorithms exploiting unitary constraints. IEEE Transactions on Signal Processing, 50(3):635-650, Mar. 2002.
A. Ozerov and C. Févotte. Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation. IEEE Transactions on Audio, Speech and Language Processing, 18(3), Mar. 2010.
R. M. Parry and I. Essa. Phase-aware non-negative spectrogram factorization. In Proc. International Conference on Independent Component Analysis and Signal Separation (ICA), London, UK, Sep. 2007.
S. Ravishankar and Y. Bresler. Learning sparsifying transforms. IEEE Transactions on Signal Processing, 61(5), Mar. 2013.
H. Sawada, H. Kameoka, S. Araki, and N. Ueda. Multichannel extensions of non-negative matrix factorization with complex-valued data. IEEE Transactions on Audio, Speech, and Language Processing, 21(5), May 2013.
P. Smaragdis. About this non-negative business. WASPAA keynote slides, 2013.

48 References V
P. Smaragdis and J. C. Brown. Non-negative matrix factorization for polyphonic music transcription. In Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct. 2003.
V. Y. F. Tan and C. Févotte. Automatic relevance determination in nonnegative matrix factorization with the beta-divergence. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(7), July 2013.
M. E. Tipping. Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1, 2001.
R. E. Turner and M. Sahani. Time-frequency analysis as probabilistic inference. IEEE Transactions on Signal Processing, 62(23), Dec. 2014.
H. Wendt, D. Fagot, and C. Févotte. Jacobi algorithm for nonnegative matrix factorization with transform learning. In Proc. European Signal Processing Conference (EUSIPCO), Sep. 2018.
D. P. Wipf and B. D. Rao. Sparse Bayesian learning for basis selection. IEEE Transactions on Signal Processing, 52(8), Aug. 2004.
K. Yoshii, R. Tomioka, D. Mochihashi, and M. Goto. Infinite positive semidefinite tensor factorization for source separation of mixture signals. In Proc. International Conference on Machine Learning (ICML), 2013.
