Research Seminar Télécom ParisTech


Slide 1/46: Title

Computational Methods of Information Geometry with Real-Time Applications in Audio Signal Processing

Arnaud Dessein, August 30th 2013

Jury: Gérard Assayag (advisor), Arshia Cont (supervisor), Francis Bach (reviewer), Frank Nielsen (reviewer), Roland Badeau (examiner), Silvère Bonnabel (examiner), Jean-Luc Zarader (examiner).

Slide 2/46: Introduction

From information geometry theory:
- Study of statistics with concepts from differential geometry (smooth manifolds) and information theory (statistical divergences).
- Parametric statistical models possess an intrinsic geometrical structure.

To computational information geometry:
- Broad community around the development and application of computational methods based on information geometry theory.
- Many techniques in machine learning and signal processing rely on statistical models or distance functions: principal component analysis, independent component analysis, centroid computation, k-means, expectation-maximization, nearest-neighbor search, range search, smallest enclosing balls, Voronoi diagrams.

Objectives of the thesis:
- Employ this framework for audio signal processing.
- Primary motivations from real-time machine listening.

Slide 3/46: Outline

I. Computational Methods of Information Geometry
1. Preliminaries on Information Geometry
2. Sequential Change Detection with Exponential Families
3. Non-Negative Matrix Factorization with Convex-Concave Divergences

II. Real-Time Applications in Audio Signal Processing
4. Real-Time Audio Segmentation
5. Real-Time Polyphonic Music Transcription

Slide 4/46: Outline, entering Part I, Chapter 1: Preliminaries on Information Geometry (separable divergences on the space of discrete positive measures; exponential families of probability distributions).

Slide 5/46: Basic notions and common information divergences

Divergences or distance functions generalize metric distances:
- Divergence: $D(y\|y') \geq 0$ and $D(y\|y) = 0$.
- Separable divergence: $D(y\|y') = \sum_{i=1}^m d(y_i\|y'_i)$.
- Convex-concave divergence: $d(y\|y') = \check{d}(y\|y') + \hat{d}(y\|y')$, with $\check{d}$ convex and $\hat{d}$ concave in the second argument.
- Squared Euclidean distance on $\mathbb{R}$: $d_E(y\|y') = (y - y')^2$.
- Kullback-Leibler divergence on $\mathbb{R}_+$: $d_{KL}(y\|y') = y \log(y/y') - y + y'$.
- Itakura-Saito divergence on $\mathbb{R}_+$: $d_{IS}(y\|y') = y/y' - \log(y/y') - 1$.

Separable divergences on $(\mathbb{R}_+)^n$ generated by a smooth convex function $\varphi$:
- Csiszár: $d^{(C)}_\varphi(y\|y') = y\,\varphi(y'/y)$, where $\varphi(1) = \varphi'(1) = 0$.
- $\alpha$-divergences: $d^{(a)}_\alpha(y\|y') = \frac{1}{\alpha(1-\alpha)}\left(\alpha y + (1-\alpha)y' - y^\alpha y'^{1-\alpha}\right)$.
  Special cases: Kullback-Leibler, dual Kullback-Leibler, Hellinger, Pearson's $\chi^2$, Neyman's $\chi^2$.
- Bregman: $d^{(B)}_\varphi(y\|y') = \varphi(y) - \varphi(y') - (y - y')\,\varphi'(y')$.
- $\beta$-divergences: $d^{(b)}_\beta(y\|y') = \frac{1}{\beta(\beta-1)}\left(y^\beta + (\beta-1)y'^\beta - \beta y y'^{\beta-1}\right)$.
  Special cases: squared Euclidean, Kullback-Leibler, Itakura-Saito.
- Skew Jeffreys-Bregman: $d^{(JB)}_{\varphi,\lambda}(y\|y') = \lambda\, d_\varphi(y\|y') + (1-\lambda)\, d_\varphi(y'\|y)$.
- Skew Jensen-Bregman: $d^{(JB')}_{\varphi,\lambda}(y\|y') = \lambda\, d_\varphi(y\|\lambda y + (1-\lambda)y') + (1-\lambda)\, d_\varphi(y'\|\lambda y + (1-\lambda)y')$.

Slide 6/46: Skew (α, β, λ)-divergences

Separable divergences on $(\mathbb{R}_+)^n$ (continued):
- $(\alpha,\beta)$-divergence: $d^{(ab)}_{\alpha,\beta}(y\|y') = \frac{1}{\alpha\beta(\alpha+\beta)}\left(\alpha y^{\alpha+\beta} + \beta y'^{\alpha+\beta} - (\alpha+\beta)\, y^\alpha y'^\beta\right)$.
  It reduces to an $\alpha$-divergence for $\alpha + \beta = 1$, and to a $(\beta+1)$-divergence for $\alpha = 1$.
- Skew $(\alpha,\beta,\lambda)$-divergence: $d^{(ab)}_{\alpha,\beta,\lambda}(y\|y') = \lambda\, d^{(ab)}_{\alpha,\beta}(y\|y') + (1-\lambda)\, d^{(ab)}_{\alpha,\beta}(y'\|y)$.
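For concreteness, here is a minimal NumPy sketch (ours, not from the talk) of the scalar Kullback-Leibler, Itakura-Saito, and (α, β)-divergences defined above, with a numerical check that the (α, β) family recovers the first two in the appropriate limits:

```python
import numpy as np

def kl_div(y, yp):
    # d_KL(y || y') = y log(y/y') - y + y' on the positive reals
    return y * np.log(y / yp) - y + yp

def is_div(y, yp):
    # d_IS(y || y') = y/y' - log(y/y') - 1 on the positive reals
    return y / yp - np.log(y / yp) - 1.0

def alpha_beta_div(y, yp, alpha, beta):
    # Scalar (alpha, beta)-divergence; assumes alpha, beta and alpha + beta
    # are nonzero (the degenerate cases are defined by continuity).
    c = 1.0 / (alpha * beta * (alpha + beta))
    return c * (alpha * y**(alpha + beta) + beta * yp**(alpha + beta)
                - (alpha + beta) * y**alpha * yp**beta)

y, yp = 2.0, 3.0
# alpha = 1, beta -> 0 tends to Kullback-Leibler; alpha = 1, beta -> -1 to Itakura-Saito.
print(np.isclose(alpha_beta_div(y, yp, 1.0, 1e-8), kl_div(y, yp), atol=1e-5))
print(np.isclose(alpha_beta_div(y, yp, 1.0, -1.0 + 1e-8), is_div(y, yp), atol=1e-4))
```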

Slide 7/46: Basic notions and properties

Exponential family: $p_\theta(x) = \exp(\theta^\top x - \psi(\theta))$.
- The sufficient observations $x$ belong to $\mathbb{R}^m$.
- The natural parameters $\theta$ belong to a convex set $\mathcal{N} \subseteq \mathbb{R}^m$.
- The log-normalizer $\psi$ is convex on $\mathcal{N}$ and smooth on $\operatorname{int}\mathcal{N}$.

Many common models: Bernoulli, Dirichlet, Gaussian, Laplace, Pareto, Poisson, Rayleigh, von Mises-Fisher, Weibull, Wishart, log-normal, exponential, beta, gamma, geometric, binomial, negative binomial, categorical, multinomial.

Legendre-Fenchel conjugate: $\phi(\eta) = \sup_{\theta \in \mathbb{R}^m} \theta^\top \eta - \psi(\theta)$.
- The expectation parameters $\eta$ belong to the convex set $\operatorname{int}\mathcal{K}$.
- We have duality between natural and expectation parameters through $\psi$ and $\phi$.

Maximum likelihood: $\hat{\eta}^{ml}(x_1, \dots, x_n) = \frac{1}{n}\sum_{j=1}^n x_j$.
- Simple arithmetic mean in expectation parameters.
- Natural parameters obtained by convex duality.
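To make the parameter duality concrete, here is a hedged sketch for the Poisson family (our choice of example): the log-normalizer is $\psi(\theta) = e^\theta$ with natural parameter $\theta = \log$ rate, the expectation parameter is $\eta = \psi'(\theta)$, its conjugate is $\phi(\eta) = \eta \log \eta - \eta$, and maximum likelihood is the sample mean in $\eta$-coordinates.

```python
import numpy as np

# Poisson as an exponential family: p_theta(x) = exp(theta * x - psi(theta)) / x!
# with natural parameter theta = log(rate) and log-normalizer psi(theta) = exp(theta).
psi = np.exp                                  # log-normalizer psi(theta)
grad_psi = np.exp                             # eta(theta) = psi'(theta)
phi = lambda eta: eta * np.log(eta) - eta     # Legendre-Fenchel conjugate of psi
grad_phi = np.log                             # theta(eta) = phi'(eta)

rng = np.random.default_rng(0)
x = rng.poisson(lam=4.0, size=10000)

eta_ml = x.mean()                  # maximum likelihood: arithmetic mean in eta
theta_ml = grad_phi(eta_ml)        # natural parameter recovered by convex duality
print(eta_ml, np.exp(theta_ml))    # both close to the true rate 4.0

# Duality sanity check: phi(eta) + psi(theta) = theta * eta at dual pairs.
theta = 1.3
eta = grad_psi(theta)
print(np.isclose(phi(eta) + psi(theta), theta * eta))
```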

Slide 8/46: Dually flat geometry

Canonical divergences:
- Kullback-Leibler divergence: $D_{KL}(P_\theta\|P_{\theta'}) = \int p_\theta \log(p_\theta/p_{\theta'})\,d\nu$.
- Bregman divergences: $B_\varphi(\xi\|\xi') = \varphi(\xi) - \varphi(\xi') - (\xi - \xi')^\top \nabla\varphi(\xi')$.
- Relation: $D_{KL}(P_\theta\|P_{\theta'}) = B_\psi(\theta'\|\theta) = B_\phi(\eta(\theta)\|\eta(\theta'))$.
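The relation between the Kullback-Leibler divergence and the two Bregman divergences can be checked numerically; a minimal sketch, again for the Poisson family (our example, not one singled out in the talk):

```python
import numpy as np

def bregman(f, grad_f, a, b):
    # B_f(a || b) = f(a) - f(b) - (a - b) f'(b)
    return f(a) - f(b) - (a - b) * grad_f(b)

psi, grad_psi = np.exp, np.exp                 # Poisson log-normalizer and gradient
phi = lambda e: e * np.log(e) - e              # its Legendre-Fenchel conjugate
grad_phi = np.log

lam, lam_p = 4.0, 2.5                          # two Poisson rates
kl = lam * np.log(lam / lam_p) - lam + lam_p   # closed-form D_KL(P_theta || P_theta')

theta, theta_p = np.log(lam), np.log(lam_p)    # natural parameters
eta, eta_p = lam, lam_p                        # expectation parameters

print(np.isclose(kl, bregman(psi, grad_psi, theta_p, theta)))   # B_psi(theta' || theta)
print(np.isclose(kl, bregman(phi, grad_phi, eta, eta_p)))       # B_phi(eta || eta')
```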

Slide 9/46: Outline, entering Part I, Chapter 2: Sequential Change Detection with Exponential Families (statistical framework; methods for exponential families).

Slide 10/46: Background

Principle:
- Decide whether the process presents some structural modifications along time.
- Find the time instants corresponding to the different change points.
- Characterize the properties within the respective segments.

Applications:
- Quality control in industrial production.
- Fault detection in technological processes.
- Automatic surveillance for intrusion and abnormal behavior in security monitoring.
- Signal processing in geophysics, econometrics, audio, medicine, image.

Approaches:
- Statistical modeling and monitoring of the distributions [Basseville & Nikiforov, 1993; Lai, 1995; Poor & Hadjiliadis, 2009; Chen & Gupta, 2012; Polunchenko & Tartakovsky, 2012].
- Machine learning techniques relying on distance functions [Harchaoui & Lévy-Leduc, 2008; Harchaoui & Lévy-Leduc, 2010; Vert & Bleakley, 2010; Desobry et al., 2005; Harchaoui et al., 2009].

Slide 11/46: Motivations and contributions

Issues of statistical online approaches:
- Either approximations of the exact statistics with unknown parameters, for tractability.
- Or restrictions on the data and scenarios [Siegmund & Venkatraman, 1995; Mei, 2006].
- With the exception of a full Bayesian framework for exponential families [Lai & Xing, 2010].

Goals in the non-Bayesian framework:
- Known or unknown parameters.
- Additive or non-additive changes.
- Arbitrary topology of the parameters and data.
- Exact inference for online schemes.

Contributions in this context:
- Study of the generalized likelihood ratios within the dually flat information geometry.
- Estimation with arbitrary estimators compared to maximum likelihood.
- Alternative expression of the statistics through convex duality.
- Attractive simplification for exact inference with maximum likelihood.

Slide 12/46: Multiple hypotheses

Problem formulation:
- $X_1, \dots, X_n$ are mutually independent from $\mathcal{P} = \{P_\xi\}_{\xi \in \Xi}$.
- Observe $\bar{x} = (x_1, \dots, x_n) \in \mathcal{X}^n$.
- Decide whether $X_1, \dots, X_n$ are i.i.d. or not.

Multiple hypotheses:
- $H_0$: $X_1, \dots, X_n \sim P_{\xi_0}$, with $\xi_0 \in \Xi_0$.
- $H_1^i$: $X_1, \dots, X_i \sim P_{\xi_0^i}$, with $\xi_0^i \in \Xi_0^i$, and $X_{i+1}, \dots, X_n \sim P_{\xi_1^i}$, with $\xi_1^i \in \Xi_1^i$, for $i \in \{1, \dots, n-1\}$.

Slide 13/46: Test statistics and decision rules

Likelihood ratio for known parameters:
$\Lambda^i(\bar{x}) = -2 \log \dfrac{\prod_{j=1}^n p_{\xi_{\mathrm{bef}}}(x_j)}{\prod_{j=1}^i p_{\xi_{\mathrm{bef}}}(x_j)\, \prod_{j=i+1}^n p_{\xi_{\mathrm{aft}}}(x_j)}$.
- Simplification as cumulative sum statistics: $\frac{1}{2}\Lambda^i(\bar{x}) = \sum_{j=i+1}^n \log \frac{p_{\xi_{\mathrm{aft}}}(x_j)}{p_{\xi_{\mathrm{bef}}}(x_j)}$.
- Efficient recursive implementation for online procedures.

Generalized likelihood ratio:
$\hat{\Lambda}^i(\bar{x}) = -2 \log \dfrac{\prod_{j=1}^n p_{\hat{\xi}_0(\bar{x})}(x_j)}{\prod_{j=1}^i p_{\hat{\xi}_0^i(\bar{x})}(x_j)\, \prod_{j=i+1}^n p_{\hat{\xi}_1^i(\bar{x})}(x_j)}$.
- Two cumulative sums: $\frac{1}{2}\hat{\Lambda}^i(\bar{x}) = \sum_{j=1}^i \log \frac{p_{\hat{\xi}_0^i(\bar{x})}(x_j)}{p_{\hat{\xi}_0(\bar{x})}(x_j)} + \sum_{j=i+1}^n \log \frac{p_{\hat{\xi}_1^i(\bar{x})}(x_j)}{p_{\hat{\xi}_0(\bar{x})}(x_j)}$.
- Computationally more demanding, so usually approximated for online procedures.

Non-Bayesian decision rule for a change: $\max_{1 \leq i \leq n-1} \hat{\Lambda}^i(\bar{x}) \gtrless_{H_0}^{H_1} \lambda$.
- Comparison of the maximum statistics to a threshold.
- Change point estimated as the first time point where the maximum is reached.
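The recursive implementation alluded to for known parameters is the classical cumulative-sum recursion; a minimal sketch for two known Gaussian regimes (the model and the threshold value are our assumptions):

```python
import numpy as np
from scipy.stats import norm

def cusum(xs, logpdf_before, logpdf_after, threshold):
    # Classical CUSUM recursion: g_n = max(0, g_{n-1} + log-likelihood ratio of x_n).
    # Returns the first alarm time, or None if no alarm is raised.
    g = 0.0
    for n, x in enumerate(xs):
        g = max(0.0, g + logpdf_after(x) - logpdf_before(x))
        if g > threshold:
            return n
    return None

rng = np.random.default_rng(1)
xs = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(1.0, 1.0, 200)])
alarm = cusum(xs,
              logpdf_before=lambda x: norm.logpdf(x, 0.0, 1.0),
              logpdf_after=lambda x: norm.logpdf(x, 1.0, 1.0),
              threshold=8.0)
print(alarm)   # typically shortly after the true change at n = 200
```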

Slide 14/46: Generic scheme

Theorem (2.1). For an exponential family, the generalized likelihood ratio satisfies:
$\frac{1}{2}\hat{\Lambda}^i(\bar{x}) = i \left\{ D_{KL}\!\left(P_{\hat{\theta}_0^{i\,ml}(\bar{x})} \,\middle\|\, P_{\hat{\theta}_0(\bar{x})}\right) - D_{KL}\!\left(P_{\hat{\theta}_0^{i\,ml}(\bar{x})} \,\middle\|\, P_{\hat{\theta}_0^i(\bar{x})}\right) \right\} + (n-i) \left\{ D_{KL}\!\left(P_{\hat{\theta}_1^{i\,ml}(\bar{x})} \,\middle\|\, P_{\hat{\theta}_0(\bar{x})}\right) - D_{KL}\!\left(P_{\hat{\theta}_1^{i\,ml}(\bar{x})} \,\middle\|\, P_{\hat{\theta}_1^i(\bar{x})}\right) \right\}$.

Slide 15/46: Specific cases

Various scenarios:
- Known parameters before and after change.
- Known parameter before change, unknown parameter after change.
- Unknown parameters before and after change.

Example (2.4). Exact statistics and maximum likelihood:
$\frac{1}{2}\hat{\Lambda}^i(\bar{x}) = i\, D_{KL}\!\left(P_{\hat{\theta}_0^{i\,ml}(\bar{x})} \,\middle\|\, P_{\hat{\theta}_0^{ml}(\bar{x})}\right) + (n-i)\, D_{KL}\!\left(P_{\hat{\theta}_1^{i\,ml}(\bar{x})} \,\middle\|\, P_{\hat{\theta}_0^{ml}(\bar{x})}\right)$.

Example (2.3). Approximate statistics:
$\frac{1}{2}\hat{\Lambda}^i(\bar{x}) = (n-i)\, D_{KL}\!\left(P_{\hat{\theta}_1^{i\,ml}(\bar{x})} \,\middle\|\, P_{\hat{\theta}_0(\bar{x})}\right)$.

Slide 16/46: Revisiting through convex duality

Proposition (2.3). For an exponential family, the generalized likelihood ratio satisfies:
$\frac{1}{2}\hat{\Lambda}^i(\bar{x}) = i\,\phi(\hat{\eta}_0^i(\bar{x})) + (n-i)\,\phi(\hat{\eta}_1^i(\bar{x})) - n\,\phi(\hat{\eta}_0(\bar{x})) + \Delta^i_{ml}(\bar{x})$,
where the corrective term $\Delta^i_{ml}$ compared to maximum likelihood estimation equals:
$\Delta^i_{ml}(\bar{x}) = i\,(\hat{\eta}_0^{i\,ml}(\bar{x}) - \hat{\eta}_0^i(\bar{x}))^\top \nabla\phi(\hat{\eta}_0^i(\bar{x})) + (n-i)\,(\hat{\eta}_1^{i\,ml}(\bar{x}) - \hat{\eta}_1^i(\bar{x}))^\top \nabla\phi(\hat{\eta}_1^i(\bar{x})) - n\,(\hat{\eta}_0^{ml}(\bar{x}) - \hat{\eta}_0(\bar{x}))^\top \nabla\phi(\hat{\eta}_0(\bar{x}))$.

Example (2.5). The exact generalized likelihood ratio for unknown parameters and maximum likelihood estimation verifies:
$\frac{1}{2}\hat{\Lambda}^i(\bar{x}) = i\,\phi(\hat{\eta}_0^{i\,ml}(\bar{x})) + (n-i)\,\phi(\hat{\eta}_1^{i\,ml}(\bar{x})) - n\,\phi(\hat{\eta}_0^{ml}(\bar{x}))$.
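Example 2.5 makes the exact statistics cheap to scan: only running means of the sufficient statistics and evaluations of $\phi$ are needed. A minimal sketch for a Gaussian with unit variance, where $\phi(\eta) = \eta^2/2$ (our choice of family):

```python
import numpy as np

def glr_scan(xs, phi):
    # Exact GLR (1/2) Lambda^i = i phi(eta_0^i) + (n - i) phi(eta_1^i) - n phi(eta_0)
    # for all candidate change points i, from running means of the observations.
    xs = np.asarray(xs, dtype=float)
    n = len(xs)
    csum = np.cumsum(xs)
    i = np.arange(1, n)                          # candidate change points 1 .. n-1
    eta0 = csum[-1] / n                          # ML estimate over the whole window
    eta0i = csum[:-1] / i                        # ML estimate before the candidate change
    eta1i = (csum[-1] - csum[:-1]) / (n - i)     # ML estimate after it
    return i * phi(eta0i) + (n - i) * phi(eta1i) - n * phi(eta0)

rng = np.random.default_rng(2)
xs = np.concatenate([rng.normal(0.0, 1.0, 150), rng.normal(1.5, 1.0, 150)])
stats = glr_scan(xs, phi=lambda e: 0.5 * e**2)   # unit-variance Gaussian conjugate
i_hat = 1 + int(np.argmax(stats))                # first maximizer, as in the decision rule
print(i_hat)                                     # change point estimate near 150
```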

Slide 17/46: Summary and perspectives

Summary:
- Standard non-Bayesian approach to sequential change detection.
- Dually flat information geometry of exponential families.
- Generalized likelihood ratios with arbitrary estimators.
- Attractive scheme for exact inference when parameters are unknown.

Perspectives:
- Direct extensions: non-steep or curved exponential families; maximum a posteriori estimators.
- Asymptotic properties: distribution of the test statistics; optimality formulation and analysis.
- Statistical dependence: autoregressive models; non-linear systems and particle filtering.
- Alternative test statistics: reversing the problem and starting from geometric considerations; information divergences and more robust estimators.

Slide 18/46: Outline, entering Part II, Chapter 4: Real-Time Audio Segmentation (proposed approach; experimental results).

Slide 19/46: Background

Principle:
- Determine time boundaries that partition a sound into homogeneous and continuous temporal segments, such that adjacent segments exhibit inhomogeneities.
- Define a criterion to quantify the homogeneity of the segments.

Approaches:
- Supervised: high-level classes and automatic classification.
- Unsupervised: statistical and distance-based approaches, including musical onset detection [Bello et al., 2005; Dixon, 2006] and speaker segmentation [Kemp et al., 2000; Kotti et al., 2008].

Slide 20/46: Motivations and contributions

Issues of unsupervised approaches to audio segmentation:
- Often tailored to particular types of signals and homogeneity criteria.
- Specific distance functions or models.
- Some are offline; others approximate the exact statistics.

Goals towards a unifying framework for audio segmentation:
- Arbitrary types of signals and homogeneity criteria.
- Large choice of distance functions or models.
- Real-time constraints.
- Exact online inference.

Contributions in this context:
- Generic framework for real-time audio segmentation.
- Unification of several standard approaches.
- Online change detection with exponential families.
- Exact generalized likelihood ratios and maximum likelihood.

Slide 21/46: System architecture

Segmentation scheme (online): auditory scene, then short-time sound representation, then statistical modeling, then change detection.
1. Represent frames with a short-time sound description.
2. Model the observations with probability distributions.
3. Detect changes in the distribution parameters sequentially.

Short-time sound representation:
- Energy for information on loudness.
- Fourier transform for information on spectral content.
- Mel-frequency cepstral coefficients for information on timbre.
- Many other possibilities.

Statistical modeling:
- Exponential families and generalized likelihood ratios.
- Unknown parameters and maximum likelihood.

Slide 22/46: Change detection

Clarification of the relations between statistical approaches:
- Likelihood statistics: $-2\log(p(\bar{x}|H_0)/p(\bar{x}|H_1^i)) > \lambda$.
  - Exact GLR: $\hat{\eta}_0^{ml}(\bar{x}) = \frac{1}{n}\sum_{j=1}^n x_j$, $\hat{\eta}_0^{i\,ml}(\bar{x}) = \frac{1}{i}\sum_{j=1}^i x_j$, $\hat{\eta}_1^{i\,ml}(\bar{x}) = \frac{1}{n-i}\sum_{j=i+1}^n x_j$.
  - Approximate GLR on the whole window: $\hat{\eta}_0^{i\,ml}(\bar{x}) \approx \hat{\eta}_0^{ml}(\bar{x}) = \frac{1}{n}\sum_{j=1}^n x_j$.
  - Approximate GLR in a dead region: $\hat{\eta}_0^{i\,ml}(\bar{x}) \approx \hat{\eta}_0^{ml}(\bar{x}) \approx \frac{1}{n_0}\sum_{j=1}^{n_0} x_j$.
- Model selection: $-2\log(p(\bar{x}|H_0)/p(\bar{x}|H_1^i)) > \lambda$, with AIC: $\lambda = 2d$; BIC: $\lambda = d \log n$; penalized BIC: $\lambda = \gamma\, d \log n$.

Links with distance-based approaches:
$\frac{1}{2}\hat{\Lambda}^i(\bar{x}) = i\, D_{KL}\!\left(P_{\hat{\theta}_0^{i\,ml}(\bar{x})} \,\middle\|\, P_{\hat{\theta}_0^{ml}(\bar{x})}\right) + (n-i)\, D_{KL}\!\left(P_{\hat{\theta}_1^{i\,ml}(\bar{x})} \,\middle\|\, P_{\hat{\theta}_0^{ml}(\bar{x})}\right)$.
- Heuristics: threshold on the observations; distance between the observations at successive frames.
- Kernel methods: equivalence between one-class support vector machines for novelty detection and approximate GLR statistics [Canu & Smola, 2006].
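Putting the pieces together, here is a simplified growing-window segmenter with a penalized BIC-style threshold (the windowing heuristic and the threshold constant are our assumptions, and the unit-variance Gaussian model is again used for illustration):

```python
import numpy as np

def segment_online(frames, phi, d, gamma=2.0):
    # Sequential segmentation: grow a window, compare the exact GLR to the
    # penalized BIC-style threshold lambda = gamma * d * log(n), and restart
    # the window at the estimated change point after each detection.
    window, boundaries, t0 = [], [], 0
    for x in frames:
        window.append(x)
        n = len(window)
        if n < 3:
            continue
        xs = np.asarray(window, dtype=float)
        csum = np.cumsum(xs)
        i = np.arange(1, n)
        half_glr = (i * phi(csum[:-1] / i)
                    + (n - i) * phi((csum[-1] - csum[:-1]) / (n - i))
                    - n * phi(csum[-1] / n))
        if 2.0 * half_glr.max() > gamma * d * np.log(n):
            i_hat = 1 + int(np.argmax(half_glr))
            boundaries.append(t0 + i_hat)
            window, t0 = window[i_hat:], t0 + i_hat
    return boundaries

rng = np.random.default_rng(3)
frames = np.concatenate([rng.normal(m, 1.0, 120) for m in (0.0, 2.0, -1.0)])
print(segment_online(frames, phi=lambda e: 0.5 * e**2, d=1))   # roughly near 120 and 240
```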

Slide 23/46: Segmentation into silence and activity

Parameters:
- Short-time sound representation: energy in a Mel-frequency filter bank at Hz.
- Statistical model: Rayleigh distributions.
- Topology: 1 dimension, continuous non-negative values.

[Figure: original audio, short-time sound representation, and reference annotation alternating between silence and activity, over time in seconds.]

Slide 24/46: Segmentation into music and speech

Parameters:
- Short-time sound representation: Mel-frequency cepstral coefficients at Hz.
- Parametric statistical model: multivariate spherical normal distributions with fixed variance.
- Topology: 12 dimensions, continuous real values.

[Figure: original audio and reference annotation alternating between music and speech segments, over time in seconds.]

Slide 25/46: Segmentation into different speakers

Parameters:
- Short-time sound representation: Mel-frequency cepstral coefficients at Hz.
- Parametric statistical model: multivariate spherical normal distributions with fixed variance.
- Topology: 12 dimensions, continuous real values.

[Figure: original audio, estimated Mel-frequency cepstral coefficients, and reference annotation over five speakers, over time in seconds.]

Slide 26/46: Segmentation into polyphonic note slices

Parameters:
- Short-time sound representation: normalized magnitude spectrum at Hz.
- Parametric statistical model: categorical distributions.
- Topology: 257 dimensions, discrete frequency histograms.

[Figure: original audio and reference piano roll, pitch against time in seconds.]

Slide 27/46: Evaluation on musical onset detection

Parameters:
- Short-time sound representation: normalized magnitude spectrum at Hz.
- Parametric statistical model: categorical distributions.
- Topology: 513 dimensions, discrete frequency histograms.

Evaluation of generalized likelihood ratios (GLR) and spectral flux (SF):
- Difficult dataset [Leveau et al., 2004].
- Standard methodology.

[Table: threshold, precision, recall, and F-measure for GLR with the Kullback-Leibler divergence, and for SF with the Euclidean distance, the Kullback-Leibler divergence, and the half-wave rectified difference.]

Comparison to the state of the art:
- LFSF: offline spectral flux with a logarithmic frequency scale and filtering [Böck et al., 2012].
- TSPC: online spectral peak classification into transients and non-transients [Röbel, 2011].

[Table: precision, recall, and F-measure for GLR, LFSF, and TSPC.]
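The standard methodology scores detections against annotated onsets within a small tolerance window; a minimal greedy-matching sketch (the 50 ms tolerance is our assumption):

```python
def onset_prf(detected, reference, tolerance=0.05):
    # Precision, recall and F-measure with a +/- tolerance (in seconds),
    # matching each reference onset to at most one detection.
    detected, used, tp = sorted(detected), set(), 0
    for r in sorted(reference):
        for k, d in enumerate(detected):
            if k not in used and abs(d - r) <= tolerance:
                used.add(k)
                tp += 1
                break
    p = tp / len(detected) if detected else 0.0
    r = tp / len(reference) if reference else 0.0
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f

print(onset_prf([0.10, 0.52, 1.01, 1.70], [0.11, 0.50, 1.00, 1.40]))
```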

Slide 28/46: Summary and perspectives

Summary:
- Real-time system for audio segmentation.
- Various types of signals and of homogeneity criteria.
- Sequential change detection with exponential families.
- Unification and generalization of several statistical and distance-based approaches.

Perspectives:
- Dependent observations: autoregressive models; non-linear systems and particle filtering.
- Improved robustness: post-processing by smoothing and adaptation; growing and sliding window heuristics.
- Consideration of prior information: maximum a posteriori; full Bayesian framework.
- Further applications: audio processing and music information retrieval; other domains in signal processing.

Slide 29/46: Outline, entering Part I, Chapter 3: Non-Negative Matrix Factorization with Convex-Concave Divergences (optimization framework; methods for convex-concave divergences).

Slide 30/46: Background

Principle:
- Decompose the observed non-negative data as a linear combination of basis elements.
- Constrain the dictionary and encoding to be non-negative.

Applications:
- Bioinformatics, spectroscopy, surveillance.
- Signal processing in text, image, audio.

Approaches:
- Euclidean cost function and alternating non-negative least squares [Paatero & Tapper, 1994; Paatero, 1997].
- Euclidean or Kullback-Leibler cost functions and multiplicative updates [Lee & Seung, 1999; Lee & Seung, 2001].
- Many extensions [Berry et al., 2007; Cichocki et al., 2009].
- Statistical inference [Févotte & Cemgil, 2009; Cemgil, 2009].

Slide 31/46: Motivations and contributions

Issues of approaches to optimizing factored models:
- Often particular cost functions or statistical models.
- Potential problems in convergence.
- With the exception of updates for classes of divergences [Cichocki et al., 2006; Cichocki et al., 2008; Dhillon & Sra, 2006; Kompass, 2007; Nakano et al., 2010; Févotte & Idier, 2011; Cichocki et al., 2011].

Goals for providing generic methods:
- Common information divergences.
- Multiplicative updates.
- Convergence guarantees.

Contributions in this context:
- Generic scheme for arbitrary convex-concave divergences.
- Majorization-minimization scheme with auxiliary functions.
- Known and novel updates for common information divergences.
- Multiplicative updates for α-divergences, β-divergences, and (α, β)-divergences.
- Extension to skew divergences.

Slide 32/46: Cost function minimization problem

Problem formulation:
- $y \in \mathbb{R}_+^m$ is an observation vector.
- $A \in \mathbb{R}_+^{m \times r}$ is a dictionary matrix.
- Find an encoding vector $x \in \mathbb{R}_+^r$ of $y$ into $A$ such that $y \approx Ax$.

Optimization formulation:
- Separable divergence: $D(y\|y') = \sum_{i=1}^m d(y_i\|y'_i)$.
- Cost function: $C(x) = D(y\|Ax)$.
- Problem: minimize $C(x)$ subject to $x \in \mathcal{X}$.

Slide 33/46: Variational bounding and auxiliary functions

Variational bounding:
- Iterative technique for minimization problems.
- Cost function replaced at each step with a surrogate majorizing function.

Auxiliary function: $G(\bar{x}|\bar{x}) = C(\bar{x})$ and $G(x|\bar{x}) \geq C(x)$.
- Iterative scheme: if $G(x|\bar{x}) \leq G(\bar{x}|\bar{x})$, then $C(x) \leq C(\bar{x})$.
- Majorization-minimization scheme: minimize $G(x|\bar{x})$ subject to $x \in \mathcal{X}$.

[Figure: the auxiliary function $G(\cdot|\bar{x})$ majorizes the cost $C(\cdot)$ and touches it at $\bar{x}$, where $C(\bar{x}) = G(\bar{x}|\bar{x})$.]
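The majorization-minimization loop itself is a few lines; below is a minimal sketch with a toy instance of our choosing (the $\ell_1$ location problem and its classical quadratic majorizer), not the NMF surrogate of the next slides, which instantiates the same pattern:

```python
import numpy as np

def majorize_minimize(cost, minimize_surrogate, x0, n_iter=50):
    # Variational bounding: each step minimizes an auxiliary function that
    # touches the cost at the current point and majorizes it everywhere,
    # so the recorded cost values decrease monotonically.
    x, history = x0, []
    for _ in range(n_iter):
        x = minimize_surrogate(x)
        history.append(cost(x))
    return x, history

# Toy instance: c(x) = sum_i |x - a_i|, majorized at x_bar by a quadratic
# with IRLS weights 1 / |x_bar - a_i| (regularized for stability).
a = np.array([1.0, 2.0, 2.5, 10.0])
cost = lambda x: np.abs(x - a).sum()
def irls_step(x_bar, eps=1e-9):
    w = 1.0 / (np.abs(x_bar - a) + eps)    # majorizer weights at x_bar
    return (w * a).sum() / w.sum()         # closed-form surrogate minimizer

x, hist = majorize_minimize(cost, irls_step, x0=0.0)
print(x, hist[:3], hist[-1])               # monotonically decreasing cost
```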

Slide 34/46: Generic updates

Proposition (3.3). For a convex-concave divergence, we have the following auxiliary function:
$G(x|\bar{x}) = \sum_{i=1}^m \left\{ \hat{d}\!\left(y_i \,\middle\|\, \sum_{l=1}^r a_{il}\bar{x}_l\right) + \sum_{k=1}^r \frac{a_{ik}\bar{x}_k}{\sum_{l=1}^r a_{il}\bar{x}_l}\, \check{d}\!\left(y_i \,\middle\|\, \frac{x_k}{\bar{x}_k} \sum_{l=1}^r a_{il}\bar{x}_l\right) + \sum_{k=1}^r a_{ik}(x_k - \bar{x}_k)\, \hat{d}'\!\left(y_i \,\middle\|\, \sum_{l=1}^r a_{il}\bar{x}_l\right) \right\}$.

Theorem (3.4). For a convex-concave divergence, the cost function decreases monotonically by iteratively solving:
$\sum_{i=1}^m a_{ik}\, \check{d}'\!\left(y_i \,\middle\|\, \frac{x_k}{\bar{x}_k} \sum_{l=1}^r a_{il}\bar{x}_l\right) = -\sum_{i=1}^m a_{ik}\, \hat{d}'\!\left(y_i \,\middle\|\, \sum_{l=1}^r a_{il}\bar{x}_l\right)$,
where the derivatives are taken in the second argument.

Slide 35/46: Specific cases

Various information divergences:
- Csiszár φ-divergences; α-divergences.
- Left-sided Bregman φ-divergences.
- Skew Jensen-Bregman (φ, λ)-divergences; skew Jeffreys (β, λ)-divergences; skew Jensen β-divergences.
- (α, β)-divergences; skew (α, β, λ)-divergences.

Example (3.8). Multiplicative updates for (α, β)-divergences:
$x_k = \bar{x}_k \left( \dfrac{\sum_{i=1}^m a_{ik}\, y_i^\alpha \left(\sum_{l=1}^r a_{il}\bar{x}_l\right)^{\beta-1}}{\sum_{i=1}^m a_{ik} \left(\sum_{l=1}^r a_{il}\bar{x}_l\right)^{\alpha+\beta-1}} \right)^{1/p_{\alpha,\beta}}$.
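For α = 1, Example 3.8 reduces to the familiar β-divergence multiplicative updates; here is a sketch for the encoding problem with a fixed dictionary, using the exponent γ(β) of [Févotte & Idier, 2011] that guarantees monotonic decrease (the initialization is our assumption):

```python
import numpy as np

def gamma_beta(beta):
    # Exponent ensuring monotonic decrease of the beta-divergence
    # under multiplicative updates [Fevotte & Idier, 2011].
    if beta < 1:
        return 1.0 / (2.0 - beta)
    if beta > 2:
        return 1.0 / (beta - 1.0)
    return 1.0

def nmf_encode(y, A, beta=1.0, n_iter=200, eps=1e-12):
    # Multiplicative updates for min_x D_beta(y || A x) with x >= 0,
    # the dictionary A being held fixed.
    m, r = A.shape
    x = np.full(r, y.mean() / r + eps)
    g = gamma_beta(beta)
    for _ in range(n_iter):
        ax = A @ x + eps
        num = A.T @ (y * ax**(beta - 2.0))
        den = A.T @ ax**(beta - 1.0) + eps
        x *= (num / den)**g
    return x

rng = np.random.default_rng(4)
A = rng.random((64, 8))
y = A @ rng.random(8)
x_hat = nmf_encode(y, A, beta=1.0, n_iter=500)   # beta = 1: Kullback-Leibler
print(np.abs(A @ x_hat - y).max())               # small reconstruction error
```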

Slide 36/46: Summary and perspectives

Summary:
- Non-negative matrix factorization with convex-concave divergences.
- Many common information divergences.
- Variational bounding with auxiliary functions.
- Symmetrized and skew divergences.

Perspectives:
- Enhanced factorization models: convex models, non-negative tensor models; convolutive models.
- Extended cost functions: penalty terms for regularization; $\ell_p$-norms for sparsity regularization.
- Other generic updates: majorization-equalization and tempered updates; further assumptions on convexity properties.
- Strong convergence properties: convergence of the cost function to a minimum; convergence of the updates.

Slide 37/46: Outline, entering Part II, Chapter 5: Real-Time Polyphonic Music Transcription (proposed approach; experimental results).

Slide 38/46: Background

Principle:
- Convert a raw music signal into a symbolic representation such as a score.
- From a low-level audio waveform to high-level symbolic information.

Approaches:
- Many methods for multiple pitch estimation [de Cheveigné, 2006; Klapuri & Davy, 2006].
- Non-negative matrix factorization [Smaragdis & Brown, 2003; Abdallah & Plumbley, 2004; Virtanen & Klapuri, 2006; Raczyński et al., 2007; Niedermayer, 2008; Marolt, 2009; Grindlay & Ellis, 2009; Vincent et al., 2010].
- Probabilistic latent component analysis [Smaragdis et al., 2008; Mysore & Smaragdis, 2009; Grindlay & Ellis, 2011; Hennequin et al., 2011; Fuentes et al., 2011; Benetos & Dixon, 2011].

Slide 39/46: Motivations and contributions

Issues of approaches based on non-negative matrix factorization:
- Inherently suitable for offline processing.
- Need for structuring the dictionary.
- With the exception of systems for audio decomposition [Sha & Saul, 2005; Cheng et al., 2008; Cont, 2006; Cont et al., 2007], which have specific cost functions, no suitable controls on the decomposition, and potential convergence issues.

Contributions in this context:
- Parametric family of (α, β)-divergences for decomposition.
- Insights into their relevance for audio analysis.
- Flexible control on the frequency compromise during the decomposition.
- Multiplicative updates tailored to real time.
- Monotonic decrease of the cost function.

Slide 40/46: System architecture

Transcription scheme:
1. Learn a dictionary of note templates offline.
2. Decompose the music signal online onto the dictionary.
3. Output the note activations along time.

Note template learning (offline): isolated note samples, then short-term sound representation, then non-negative matrix factorization, yielding note templates.
- Characteristic and discriminative templates.
- Short-term magnitude or power spectrum.

Music signal decomposition (online): auditory scene, then short-time sound representation, then non-negative decomposition, yielding note activations.

[Figure: learned note templates, frequency in Hz against notes A0 to A7.]
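A hedged sketch of the offline template learning step, as rank-1 NMF with the Kullback-Leibler divergence on the spectrogram of one isolated note (the rank-1 choice and the normalization are our assumptions):

```python
import numpy as np

def learn_template(S, n_iter=200, eps=1e-12):
    # Learn one spectral template from the magnitude spectrogram S
    # (frequencies x frames) of an isolated note, by rank-1 NMF with
    # Kullback-Leibler multiplicative updates.
    m, n = S.shape
    w = S.mean(axis=1) + eps          # template initialized at the mean spectrum
    h = np.full(n, 1.0)               # per-frame gains
    for _ in range(n_iter):
        wh = np.outer(w, h) + eps
        w *= (S / wh) @ h / h.sum()
        wh = np.outer(w, h) + eps
        h *= w @ (S / wh) / w.sum()
    return w / (w.sum() + eps)        # normalized note template

# Hypothetical usage: one learned template per note, stacked into a dictionary.
rng = np.random.default_rng(5)
S = np.abs(rng.normal(size=(257, 40)))        # stand-in for a real note spectrogram
A = np.column_stack([learn_template(S)])
print(A.shape)
```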

Slide 41/46: Non-negative decomposition

Scaling property of (α, β)-divergences: $d^{(ab)}_{\alpha,\beta}(\gamma y\|\gamma y') = \gamma^{\alpha+\beta}\, d^{(ab)}_{\alpha,\beta}(y\|y')$.
- Different emphasis is put on the coefficients depending on their magnitude.
- $\alpha + \beta > 0$: more emphasis is put on the higher-magnitude coefficients.
- $\alpha + \beta < 0$: converse effect.
- Compromise between fundamental frequencies, first partials, and higher partials.

Tailored updates in vector form: $x \leftarrow x \odot \left( \dfrac{A^\top \left( y^{\alpha} \odot (Ax)^{\beta-1} \right)}{A^\top (Ax)^{\alpha+\beta-1}} \right)^{1/p_{\alpha,\beta}}$, with powers, products, and quotients taken element-wise.
- 1 element-wise matrix multiplication and 1 transposed vector replication per time frame.
- 3 matrix-vector multiplications, 1 element-wise vector multiplication, 1 element-wise vector division, and 3 element-wise vector powers per iteration.
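A minimal sketch of the online decomposition loop in the α = 1 (β-divergence) case, warm-starting each frame from the previous activations; the per-frame iteration count and the warm start are our assumptions:

```python
import numpy as np

def decompose_stream(frames, A, beta=1.0, iters_per_frame=20, eps=1e-12):
    # Online non-negative decomposition: for each incoming magnitude
    # spectrum, run a few multiplicative updates against the fixed note
    # dictionary A, warm-starting from the previous activations.
    m, r = A.shape
    x = np.full(r, 1.0 / r)
    for y in frames:
        for _ in range(iters_per_frame):
            ax = A @ x + eps
            x *= (A.T @ (y * ax**(beta - 2.0))) / (A.T @ ax**(beta - 1.0) + eps)
        yield x.copy()                 # note activations for the current frame

rng = np.random.default_rng(6)
A = np.abs(rng.normal(size=(257, 12)))        # stand-in note templates
frames = np.abs(rng.normal(size=(5, 257)))    # stand-in magnitude spectra
activations = list(decompose_stream(frames, A))
print(len(activations), activations[-1].shape)
```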

Slide 42/46: Sample example of piano music

[Figure: note activations (A0 to A7) over time in seconds for the excerpt, across a grid of values of α (columns) and β (rows).]

Slide 43/46: Evaluation on multiple fundamental frequency estimation

Evaluation:
- Standard piano dataset [Emiya et al., 2010].
- Methodology from MIREX [Bay et al., 2009].

Results over the dataset for the different divergences:

[Table: threshold and F-measure per distance function, for (α, β) settings covering Itakura-Saito, a β-divergence, Neyman's χ², dual Kullback-Leibler, Hellinger, Kullback-Leibler (α + β = +1.0), Pearson's χ², and several α-divergences up to α = 5.]

Comparison of ABND to the state of the art:
- BHNMF: offline unsupervised non-negative matrix factorization with the β-divergences and a harmonic model exploiting spectral smoothness [Vincent et al., 2010].
- SACS: offline sinusoidal analysis with a candidate selection exploiting spectral features [Yeh et al., 2010].

[Table: precision, recall, F-measure, accuracy, and substitution, miss, false-alarm, and total error rates for ABND, BHNMF, and SACS.]


More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

THE task of identifying the environment in which a sound

THE task of identifying the environment in which a sound 1 Feature Learning with Matrix Factorization Applied to Acoustic Scene Classification Victor Bisot, Romain Serizel, Slim Essid, and Gaël Richard Abstract In this paper, we study the usefulness of various

More information

On Spectral Basis Selection for Single Channel Polyphonic Music Separation

On Spectral Basis Selection for Single Channel Polyphonic Music Separation On Spectral Basis Selection for Single Channel Polyphonic Music Separation Minje Kim and Seungjin Choi Department of Computer Science Pohang University of Science and Technology San 31 Hyoja-dong, Nam-gu

More information

Independent Component Analysis and Unsupervised Learning. Jen-Tzung Chien

Independent Component Analysis and Unsupervised Learning. Jen-Tzung Chien Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent voices Nonparametric likelihood

More information

CONVOLUTIVE NON-NEGATIVE MATRIX FACTORISATION WITH SPARSENESS CONSTRAINT

CONVOLUTIVE NON-NEGATIVE MATRIX FACTORISATION WITH SPARSENESS CONSTRAINT CONOLUTIE NON-NEGATIE MATRIX FACTORISATION WITH SPARSENESS CONSTRAINT Paul D. O Grady Barak A. Pearlmutter Hamilton Institute National University of Ireland, Maynooth Co. Kildare, Ireland. ABSTRACT Discovering

More information

Discovering Convolutive Speech Phones using Sparseness and Non-Negativity Constraints

Discovering Convolutive Speech Phones using Sparseness and Non-Negativity Constraints Discovering Convolutive Speech Phones using Sparseness and Non-Negativity Constraints Paul D. O Grady and Barak A. Pearlmutter Hamilton Institute, National University of Ireland Maynooth, Co. Kildare,

More information

topics about f-divergence

topics about f-divergence topics about f-divergence Presented by Liqun Chen Mar 16th, 2018 1 Outline 1 f-gan: Training Generative Neural Samplers using Variational Experiments 2 f-gans in an Information Geometric Nutshell Experiments

More information

Unsupervised Learning

Unsupervised Learning Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College

More information

Nonnegative Matrix Factorization with Markov-Chained Bases for Modeling Time-Varying Patterns in Music Spectrograms

Nonnegative Matrix Factorization with Markov-Chained Bases for Modeling Time-Varying Patterns in Music Spectrograms Nonnegative Matrix Factorization with Markov-Chained Bases for Modeling Time-Varying Patterns in Music Spectrograms Masahiro Nakano 1, Jonathan Le Roux 2, Hirokazu Kameoka 2,YuKitano 1, Nobutaka Ono 1,

More information

ACCOUNTING FOR PHASE CANCELLATIONS IN NON-NEGATIVE MATRIX FACTORIZATION USING WEIGHTED DISTANCES. Sebastian Ewert Mark D. Plumbley Mark Sandler

ACCOUNTING FOR PHASE CANCELLATIONS IN NON-NEGATIVE MATRIX FACTORIZATION USING WEIGHTED DISTANCES. Sebastian Ewert Mark D. Plumbley Mark Sandler ACCOUNTING FOR PHASE CANCELLATIONS IN NON-NEGATIVE MATRIX FACTORIZATION USING WEIGHTED DISTANCES Sebastian Ewert Mark D. Plumbley Mark Sandler Queen Mary University of London, London, United Kingdom ABSTRACT

More information

Chapter 2 Exponential Families and Mixture Families of Probability Distributions

Chapter 2 Exponential Families and Mixture Families of Probability Distributions Chapter 2 Exponential Families and Mixture Families of Probability Distributions The present chapter studies the geometry of the exponential family of probability distributions. It is not only a typical

More information

Curve Fitting Re-visited, Bishop1.2.5

Curve Fitting Re-visited, Bishop1.2.5 Curve Fitting Re-visited, Bishop1.2.5 Maximum Likelihood Bishop 1.2.5 Model Likelihood differentiation p(t x, w, β) = Maximum Likelihood N N ( t n y(x n, w), β 1). (1.61) n=1 As we did in the case of the

More information

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization

More information

A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement

A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement Simon Leglaive 1 Laurent Girin 1,2 Radu Horaud 1 1: Inria Grenoble Rhône-Alpes 2: Univ. Grenoble Alpes, Grenoble INP,

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Single Channel Music Sound Separation Based on Spectrogram Decomposition and Note Classification

Single Channel Music Sound Separation Based on Spectrogram Decomposition and Note Classification Single Channel Music Sound Separation Based on Spectrogram Decomposition and Note Classification Hafiz Mustafa and Wenwu Wang Centre for Vision, Speech and Signal Processing (CVSSP) University of Surrey,

More information

PMR Learning as Inference

PMR Learning as Inference Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Oracle Analysis of Sparse Automatic Music Transcription

Oracle Analysis of Sparse Automatic Music Transcription Oracle Analysis of Sparse Automatic Music Transcription Ken O Hanlon, Hidehisa Nagano, and Mark D. Plumbley Queen Mary University of London NTT Communication Science Laboratories, NTT Corporation {keno,nagano,mark.plumbley}@eecs.qmul.ac.uk

More information

Uncertainty. Jayakrishnan Unnikrishnan. CSL June PhD Defense ECE Department

Uncertainty. Jayakrishnan Unnikrishnan. CSL June PhD Defense ECE Department Decision-Making under Statistical Uncertainty Jayakrishnan Unnikrishnan PhD Defense ECE Department University of Illinois at Urbana-Champaign CSL 141 12 June 2010 Statistical Decision-Making Relevant in

More information

Applications of Information Geometry to Hypothesis Testing and Signal Detection

Applications of Information Geometry to Hypothesis Testing and Signal Detection CMCAA 2016 Applications of Information Geometry to Hypothesis Testing and Signal Detection Yongqiang Cheng National University of Defense Technology July 2016 Outline 1. Principles of Information Geometry

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

Kernel methods and the exponential family

Kernel methods and the exponential family Kernel methods and the exponential family Stephane Canu a Alex Smola b a 1-PSI-FRE CNRS 645, INSA de Rouen, France, St Etienne du Rouvray, France b Statistical Machine Learning Program, National ICT Australia

More information

Expectation Propagation Algorithm

Expectation Propagation Algorithm Expectation Propagation Algorithm 1 Shuang Wang School of Electrical and Computer Engineering University of Oklahoma, Tulsa, OK, 74135 Email: {shuangwang}@ou.edu This note contains three parts. First,

More information

Maximum Marginal Likelihood Estimation For Nonnegative Dictionary Learning

Maximum Marginal Likelihood Estimation For Nonnegative Dictionary Learning Maximum Marginal Likelihood Estimation For Nonnegative Dictionary Learning Onur Dikmen and Cédric Févotte CNRS LTCI; Télécom ParisTech dikmen@telecom-paristech.fr perso.telecom-paristech.fr/ dikmen 26.5.2011

More information

INFORMATION THEORY AND STATISTICS

INFORMATION THEORY AND STATISTICS INFORMATION THEORY AND STATISTICS Solomon Kullback DOVER PUBLICATIONS, INC. Mineola, New York Contents 1 DEFINITION OF INFORMATION 1 Introduction 1 2 Definition 3 3 Divergence 6 4 Examples 7 5 Problems...''.

More information

CS229 Project: Musical Alignment Discovery

CS229 Project: Musical Alignment Discovery S A A V S N N R R S CS229 Project: Musical Alignment iscovery Woodley Packard ecember 16, 2005 Introduction Logical representations of musical data are widely available in varying forms (for instance,

More information

Dimension Reduction (PCA, ICA, CCA, FLD,

Dimension Reduction (PCA, ICA, CCA, FLD, Dimension Reduction (PCA, ICA, CCA, FLD, Topic Models) Yi Zhang 10-701, Machine Learning, Spring 2011 April 6 th, 2011 Parts of the PCA slides are from previous 10-701 lectures 1 Outline Dimension reduction

More information

ESTIMATING TRAFFIC NOISE LEVELS USING ACOUSTIC MONITORING: A PRELIMINARY STUDY

ESTIMATING TRAFFIC NOISE LEVELS USING ACOUSTIC MONITORING: A PRELIMINARY STUDY ESTIMATING TRAFFIC NOISE LEVELS USING ACOUSTIC MONITORING: A PRELIMINARY STUDY Jean-Rémy Gloaguen, Arnaud Can Ifsttar - LAE Route de Bouaye - CS4 44344, Bouguenais, FR jean-remy.gloaguen@ifsttar.fr Mathieu

More information

EUSIPCO

EUSIPCO EUSIPCO 213 1569744273 GAMMA HIDDEN MARKOV MODEL AS A PROBABILISTIC NONNEGATIVE MATRIX FACTORIZATION Nasser Mohammadiha, W. Bastiaan Kleijn, Arne Leijon KTH Royal Institute of Technology, Department of

More information

Feature Learning with Matrix Factorization Applied to Acoustic Scene Classification

Feature Learning with Matrix Factorization Applied to Acoustic Scene Classification Feature Learning with Matrix Factorization Applied to Acoustic Scene Classification Victor Bisot, Romain Serizel, Slim Essid, Gaël Richard To cite this version: Victor Bisot, Romain Serizel, Slim Essid,

More information

arxiv: v1 [cs.lg] 1 May 2010

arxiv: v1 [cs.lg] 1 May 2010 A Geometric View of Conjugate Priors Arvind Agarwal and Hal Daumé III School of Computing, University of Utah, Salt Lake City, Utah, 84112 USA {arvind,hal}@cs.utah.edu arxiv:1005.0047v1 [cs.lg] 1 May 2010

More information

Machine Learning. CUNY Graduate Center, Spring Lectures 11-12: Unsupervised Learning 1. Professor Liang Huang.

Machine Learning. CUNY Graduate Center, Spring Lectures 11-12: Unsupervised Learning 1. Professor Liang Huang. Machine Learning CUNY Graduate Center, Spring 2013 Lectures 11-12: Unsupervised Learning 1 (Clustering: k-means, EM, mixture models) Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning

More information

STA414/2104 Statistical Methods for Machine Learning II

STA414/2104 Statistical Methods for Machine Learning II STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements

More information

LONG-TERM REVERBERATION MODELING FOR UNDER-DETERMINED AUDIO SOURCE SEPARATION WITH APPLICATION TO VOCAL MELODY EXTRACTION.

LONG-TERM REVERBERATION MODELING FOR UNDER-DETERMINED AUDIO SOURCE SEPARATION WITH APPLICATION TO VOCAL MELODY EXTRACTION. LONG-TERM REVERBERATION MODELING FOR UNDER-DETERMINED AUDIO SOURCE SEPARATION WITH APPLICATION TO VOCAL MELODY EXTRACTION. Romain Hennequin Deezer R&D 10 rue d Athènes, 75009 Paris, France rhennequin@deezer.com

More information

744 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 19, NO. 4, MAY 2011

744 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 19, NO. 4, MAY 2011 744 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 19, NO. 4, MAY 2011 NMF With Time Frequency Activations to Model Nonstationary Audio Events Romain Hennequin, Student Member, IEEE,

More information

A Geometric View of Conjugate Priors

A Geometric View of Conjugate Priors A Geometric View of Conjugate Priors Arvind Agarwal and Hal Daumé III School of Computing, University of Utah, Salt Lake City, Utah, 84112 USA {arvind,hal}@cs.utah.edu Abstract. In Bayesian machine learning,

More information

Inference. Data. Model. Variates

Inference. Data. Model. Variates Data Inference Variates Model ˆθ (,..., ) mˆθn(d) m θ2 M m θ1 (,, ) (,,, ) (,, ) α = :=: (, ) F( ) = = {(, ),, } F( ) X( ) = Γ( ) = Σ = ( ) = ( ) ( ) = { = } :=: (U, ) , = { = } = { = } x 2 e i, e j

More information

Learning Bayesian network : Given structure and completely observed data

Learning Bayesian network : Given structure and completely observed data Learning Bayesian network : Given structure and completely observed data Probabilistic Graphical Models Sharif University of Technology Spring 2017 Soleymani Learning problem Target: true distribution

More information

Stochastic Proximal Gradient Algorithm

Stochastic Proximal Gradient Algorithm Stochastic Institut Mines-Télécom / Telecom ParisTech / Laboratoire Traitement et Communication de l Information Joint work with: Y. Atchade, Ann Arbor, USA, G. Fort LTCI/Télécom Paristech and the kind

More information

Principles of Pattern Recognition. C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata

Principles of Pattern Recognition. C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata Principles of Pattern Recognition C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata e-mail: murthy@isical.ac.in Pattern Recognition Measurement Space > Feature Space >Decision

More information

BILEVEL SPARSE MODELS FOR POLYPHONIC MUSIC TRANSCRIPTION

BILEVEL SPARSE MODELS FOR POLYPHONIC MUSIC TRANSCRIPTION BILEVEL SPARSE MODELS FOR POLYPHONIC MUSIC TRANSCRIPTION Tal Ben Yakar Tel Aviv University talby0@gmail.com Roee Litman Tel Aviv University roeelitman@gmail.com Alex Bronstein Tel Aviv University bron@eng.tau.ac.il

More information

On the Chi square and higher-order Chi distances for approximating f-divergences

On the Chi square and higher-order Chi distances for approximating f-divergences On the Chi square and higher-order Chi distances for approximating f-divergences Frank Nielsen, Senior Member, IEEE and Richard Nock, Nonmember Abstract We report closed-form formula for calculating the

More information

REVIEW OF SINGLE CHANNEL SOURCE SEPARATION TECHNIQUES

REVIEW OF SINGLE CHANNEL SOURCE SEPARATION TECHNIQUES REVIEW OF SINGLE CHANNEL SOURCE SEPARATION TECHNIQUES Kedar Patki University of Rochester Dept. of Electrical and Computer Engineering kedar.patki@rochester.edu ABSTRACT The paper reviews the problem of

More information

PART I INTRODUCTION The meaning of probability Basic definitions for frequentist statistics and Bayesian inference Bayesian inference Combinatorics

PART I INTRODUCTION The meaning of probability Basic definitions for frequentist statistics and Bayesian inference Bayesian inference Combinatorics Table of Preface page xi PART I INTRODUCTION 1 1 The meaning of probability 3 1.1 Classical definition of probability 3 1.2 Statistical definition of probability 9 1.3 Bayesian understanding of probability

More information

Irr. Statistical Methods in Experimental Physics. 2nd Edition. Frederick James. World Scientific. CERN, Switzerland

Irr. Statistical Methods in Experimental Physics. 2nd Edition. Frederick James. World Scientific. CERN, Switzerland Frederick James CERN, Switzerland Statistical Methods in Experimental Physics 2nd Edition r i Irr 1- r ri Ibn World Scientific NEW JERSEY LONDON SINGAPORE BEIJING SHANGHAI HONG KONG TAIPEI CHENNAI CONTENTS

More information

TIME-DEPENDENT PARAMETRIC AND HARMONIC TEMPLATES IN NON-NEGATIVE MATRIX FACTORIZATION

TIME-DEPENDENT PARAMETRIC AND HARMONIC TEMPLATES IN NON-NEGATIVE MATRIX FACTORIZATION TIME-DEPENDENT PARAMETRIC AND HARMONIC TEMPLATES IN NON-NEGATIVE MATRIX FACTORIZATION Romain Hennequin, Roland Badeau and Bertrand David, Institut Telecom; Telecom ParisTech; CNRS LTCI Paris France romainhennequin@telecom-paristechfr

More information

Probability and Statistics

Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be Chapter 3: Parametric families of univariate distributions CHAPTER 3: PARAMETRIC

More information

Ch 4. Linear Models for Classification

Ch 4. Linear Models for Classification Ch 4. Linear Models for Classification Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Department of Computer Science and Engineering Pohang University of Science and echnology 77 Cheongam-ro,

More information

Information Geometry of Positive Measures and Positive-Definite Matrices: Decomposable Dually Flat Structure

Information Geometry of Positive Measures and Positive-Definite Matrices: Decomposable Dually Flat Structure Entropy 014, 16, 131-145; doi:10.3390/e1604131 OPEN ACCESS entropy ISSN 1099-4300 www.mdpi.com/journal/entropy Article Information Geometry of Positive Measures and Positive-Definite Matrices: Decomposable

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

MULTI-RESOLUTION SIGNAL DECOMPOSITION WITH TIME-DOMAIN SPECTROGRAM FACTORIZATION. Hirokazu Kameoka

MULTI-RESOLUTION SIGNAL DECOMPOSITION WITH TIME-DOMAIN SPECTROGRAM FACTORIZATION. Hirokazu Kameoka MULTI-RESOLUTION SIGNAL DECOMPOSITION WITH TIME-DOMAIN SPECTROGRAM FACTORIZATION Hiroazu Kameoa The University of Toyo / Nippon Telegraph and Telephone Corporation ABSTRACT This paper proposes a novel

More information

Foundations of Nonparametric Bayesian Methods

Foundations of Nonparametric Bayesian Methods 1 / 27 Foundations of Nonparametric Bayesian Methods Part II: Models on the Simplex Peter Orbanz http://mlg.eng.cam.ac.uk/porbanz/npb-tutorial.html 2 / 27 Tutorial Overview Part I: Basics Part II: Models

More information

Statistical and Inductive Inference by Minimum Message Length

Statistical and Inductive Inference by Minimum Message Length C.S. Wallace Statistical and Inductive Inference by Minimum Message Length With 22 Figures Springer Contents Preface 1. Inductive Inference 1 1.1 Introduction 1 1.2 Inductive Inference 5 1.3 The Demise

More information

Linear Classification

Linear Classification Linear Classification Lili MOU moull12@sei.pku.edu.cn http://sei.pku.edu.cn/ moull12 23 April 2015 Outline Introduction Discriminant Functions Probabilistic Generative Models Probabilistic Discriminative

More information

CS839: Probabilistic Graphical Models. Lecture 7: Learning Fully Observed BNs. Theo Rekatsinas

CS839: Probabilistic Graphical Models. Lecture 7: Learning Fully Observed BNs. Theo Rekatsinas CS839: Probabilistic Graphical Models Lecture 7: Learning Fully Observed BNs Theo Rekatsinas 1 Exponential family: a basic building block For a numeric random variable X p(x ) =h(x)exp T T (x) A( ) = 1

More information

Chapter 2. Binary and M-ary Hypothesis Testing 2.1 Introduction (Levy 2.1)

Chapter 2. Binary and M-ary Hypothesis Testing 2.1 Introduction (Levy 2.1) Chapter 2. Binary and M-ary Hypothesis Testing 2.1 Introduction (Levy 2.1) Detection problems can usually be casted as binary or M-ary hypothesis testing problems. Applications: This chapter: Simple hypothesis

More information

A CUSUM approach for online change-point detection on curve sequences

A CUSUM approach for online change-point detection on curve sequences ESANN 22 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Bruges Belgium, 25-27 April 22, i6doc.com publ., ISBN 978-2-8749-49-. Available

More information

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make

More information

Introduction to Information Geometry

Introduction to Information Geometry Introduction to Information Geometry based on the book Methods of Information Geometry written by Shun-Ichi Amari and Hiroshi Nagaoka Yunshu Liu 2012-02-17 Outline 1 Introduction to differential geometry

More information

OPTIMAL WEIGHT LEARNING FOR COUPLED TENSOR FACTORIZATION WITH MIXED DIVERGENCES

OPTIMAL WEIGHT LEARNING FOR COUPLED TENSOR FACTORIZATION WITH MIXED DIVERGENCES OPTIMAL WEIGHT LEARNING FOR COUPLED TENSOR FACTORIZATION WITH MIXED DIVERGENCES Umut Şimşekli, Beyza Ermiş, A. Taylan Cemgil Boğaziçi University Dept. of Computer Engineering 34342, Bebek, İstanbul, Turkey

More information

Lecture 3: Pattern Classification

Lecture 3: Pattern Classification EE E6820: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 1 2 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mixtures

More information

Need for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels

Need for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)

More information

Introduction to Probabilistic Machine Learning

Introduction to Probabilistic Machine Learning Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning

More information

Lecture 3: Pattern Classification. Pattern classification

Lecture 3: Pattern Classification. Pattern classification EE E68: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mitures and

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

t Tao Group Limited, Reading, RG6 IAZ, U.K. wenwu.wang(ieee.org t Samsung Electronics Research Institute, Staines, TW1 8 4QE, U.K.

t Tao Group Limited, Reading, RG6 IAZ, U.K.   wenwu.wang(ieee.org t Samsung Electronics Research Institute, Staines, TW1 8 4QE, U.K. NON-NEGATIVE MATRIX FACTORIZATION FOR NOTE ONSET DETECTION OF AUDIO SIGNALS Wenwu Wangt, Yuhui Luol, Jonathon A. Chambers', and Saeid Sanei t Tao Group Limited, Reading, RG6 IAZ, U.K. Email: wenwu.wang(ieee.org

More information

MULTIPITCH ESTIMATION AND INSTRUMENT RECOGNITION BY EXEMPLAR-BASED SPARSE REPRESENTATION. Ikuo Degawa, Kei Sato, Masaaki Ikehara

MULTIPITCH ESTIMATION AND INSTRUMENT RECOGNITION BY EXEMPLAR-BASED SPARSE REPRESENTATION. Ikuo Degawa, Kei Sato, Masaaki Ikehara MULTIPITCH ESTIMATION AND INSTRUMENT RECOGNITION BY EXEMPLAR-BASED SPARSE REPRESENTATION Ikuo Degawa, Kei Sato, Masaaki Ikehara EEE Dept. Keio University Yokohama, Kanagawa 223-8522 Japan E-mail:{degawa,

More information

10ème Congrès Français d Acoustique

10ème Congrès Français d Acoustique 1ème Congrès Français d Acoustique Lyon, 1-16 Avril 1 Spectral similarity measure invariant to pitch shifting and amplitude scaling Romain Hennequin 1, Roland Badeau 1, Bertrand David 1 1 Institut TELECOM,

More information