Maximum Marginal Likelihood Estimation For Nonnegative Dictionary Learning


1 Maximum Marginal Likelihood Estimation For Nonnegative Dictionary Learning
Onur Dikmen and Cédric Févotte
CNRS LTCI; Télécom ParisTech
perso.telecom-paristech.fr/ dikmen

2 Outline
1. Problem & Motivation
2. Model
3. Estimators & Algorithms: MJLE, MMLE
4. Results: A Piano Excerpt, Swimmer Dataset
5. Conclusions
Onur Dikmen and Cédric Févotte Maximum Marginal Likelihood Estimation For Nonnegative Dictionary Learning 1 / 15

3 Problem & Motivation
NMF: approximate $V \approx WH$ by minimising a divergence $D(V \mid WH)$ (Kullback-Leibler (KL), Itakura-Saito (IS), Euclidean), for example
$$D_{KL}(A \mid B) = \sum_{f=1}^{F} \sum_{n=1}^{N} \left( a_{fn} \log \frac{a_{fn}}{b_{fn}} - a_{fn} + b_{fn} \right)$$
$W$: dictionary ($F \times K$); $H$: expansion coefficients ($K \times N$); both nonnegative.
Efficient majorization-minimization algorithms (optimisation of an auxiliary function).
Expectation Maximisation (EM) algorithm (Poisson observation model).
The optimality of $W$ is in question: there are $FK + KN$ parameters, a number that grows with the number of observations $N$.
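
As an illustration of the divergence and of the majorization-minimization approach mentioned on this slide, here is a minimal NumPy sketch. The multiplicative updates are the standard KL-NMF rules, not code from the talk; the function names `kl_divergence` and `nmf_kl`, the random initialisation, and the iteration count are assumptions made for the example.

```python
import numpy as np

def kl_divergence(A, B, eps=1e-12):
    """Generalised KL divergence D_KL(A|B), summed over all entries."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # eps guards the logarithm; entries with a_fn = 0 contribute only b_fn
    log_term = np.where(A > 0, A * np.log((A + eps) / (B + eps)), 0.0)
    return float(np.sum(log_term - A + B))

def nmf_kl(V, K, n_iter=200, seed=0):
    """Standard multiplicative updates for KL-NMF (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + 0.1   # positive random initialisation (assumption)
    H = rng.random((K, N)) + 0.1
    for _ in range(n_iter):
        # Update H with W fixed, then W with the refreshed H
        H *= (W.T @ (V / (W @ H))) / W.sum(axis=0)[:, None]
        W *= ((V / (W @ H)) @ H.T) / H.sum(axis=1)[None, :]
    return W, H
```

Calling `nmf_kl(V, K=3)` on a positive matrix `V` and then `kl_divergence(V, W @ H)` gives the fitted divergence; both factors stay nonnegative because each update multiplies by a nonnegative ratio.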

4 Problem & Motivation
Our aim: to learn $W$ from the marginal likelihood
$$p(V \mid W) = \int_H p(V \mid W, H)\, p(H)\, dH$$
$p(V \mid W)$ is the marginal likelihood of $W$, not the evidence $p(V)$.
The integral cannot be computed directly, so a variational approximation will be pursued.
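
To make the integral concrete, consider the scalar special case $F = N = K = 1$ (an assumption made only for this example), with a Poisson observation of mean $wh$ and a Gamma prior on $h$. In that case the marginal has the well-known negative binomial closed form, and a naive Monte Carlo average over prior samples of $h$ should agree with it. The sketch below treats $\beta$ as a rate parameter; the function names are hypothetical.

```python
import math
import numpy as np

def marginal_closed_form(v, w, alpha, beta):
    """p(v | w) for v ~ Poisson(w h), h ~ Gamma(alpha, rate=beta): negative binomial."""
    log_p = (math.lgamma(v + alpha) - math.lgamma(alpha) - math.lgamma(v + 1)
             + alpha * math.log(beta / (beta + w))
             + v * math.log(w / (beta + w)))
    return math.exp(log_p)

def marginal_monte_carlo(v, w, alpha, beta, n_samples=200_000, seed=0):
    """Estimate the same integral by averaging the Poisson pmf over prior draws of h."""
    rng = np.random.default_rng(seed)
    # NumPy's gamma sampler takes a scale, so scale = 1 / rate
    h = rng.gamma(shape=alpha, scale=1.0 / beta, size=n_samples)
    log_pmf = v * np.log(w * h) - w * h - math.lgamma(v + 1)
    return float(np.mean(np.exp(log_pmf)))
```

For example, `marginal_closed_form(3, 1.5, 2.0, 1.0)` and `marginal_monte_carlo(3, 1.5, 2.0, 1.0)` should be close. With several components mixed inside each Poisson mean, the integral no longer takes this simple form.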

5 Model
Minimising $D_{KL}(V \mid WH)$ is equivalent to maximum likelihood (ML) estimation on the Poisson observation model
$$v_{fn} \sim \mathcal{PO}\Big(v_{fn} \,\Big|\, \sum_k w_{fk} h_{kn}\Big)$$
With the introduction of latent variables $C$:
$$v_{fn} = \sum_{k=1}^{K} c_{k,fn}, \qquad c_{k,fn} \sim \mathcal{PO}(c_{k,fn} \mid w_{fk} h_{kn})$$
We assign a prior distribution on $H$:
$$h_{kn} \sim \mathcal{G}(h_{kn} \mid \alpha_k, \beta_k)$$
The Gamma distribution $\mathcal{G}$ is the conjugate prior for the Poisson ($\mathcal{PO}$) observation model.
$W$ is a deterministic variable.
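
The generative model on this slide can be simulated directly. The following sketch draws $H$, the latent counts $C$, and the observations $V$ for a given $W$. It assumes that $\beta_k$ is a rate parameter (which matches the form of the MJLE update for $H$), and the function name `sample_gamma_poisson` is hypothetical.

```python
import numpy as np

def sample_gamma_poisson(W, alpha, beta, N, seed=0):
    """Draw (V, C, H) from the Gamma-Poisson model for a fixed dictionary W."""
    rng = np.random.default_rng(seed)
    F, K = W.shape
    # h_kn ~ Gamma(alpha_k, rate beta_k); NumPy expects scale = 1 / rate
    H = rng.gamma(shape=alpha[:, None], scale=1.0 / beta[:, None], size=(K, N))
    # c_{k,fn} ~ Poisson(w_fk * h_kn), stored as an array of shape (K, F, N)
    rates = W.T[:, :, None] * H[:, None, :]
    C = rng.poisson(rates)
    # v_fn is the sum of the K latent counts
    V = C.sum(axis=0)
    return V, C, H
```

Summing the latent counts over $k$ gives a Poisson variable with mean $[WH]_{fn}$, which is why the two formulations of the observation model agree.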

6 Estimators & Algorithms (MJLE)
Maximum joint likelihood estimation (MJLE):
$$C_{JL}(V \mid W, H) \overset{\text{def}}{=} \log p(V, H \mid W) = \log p(V \mid W, H) + \log p(H)$$
Optimisation w.r.t. $H$ can be performed by majorization-minimization:
$$h_{kn} \leftarrow \frac{h_{kn} \sum_f w_{fk}\, v_{fn} / [WH]_{fn} + (\alpha_k - 1)}{\sum_f w_{fk} + \beta_k}$$
Nonnegativity is ensured when $\alpha_k \geq 1$.
Akin to standard NMF:
$$w_{fk} \leftarrow w_{fk}\, \frac{\sum_n h_{kn}\, v_{fn} / [WH]_{fn}}{\sum_n h_{kn}}$$
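
A minimal NumPy sketch of the two MJLE updates on this slide follows. It assumes a rate-parameterised Gamma prior and $\alpha_k > 1$ so that $H$ stays strictly positive; the objective is computed only up to additive constants that do not depend on $W$ or $H$. The function names, the initialisation, and the iteration count are assumptions for the example.

```python
import numpy as np

def joint_log_likelihood(V, W, H, alpha, beta):
    """C_JL up to additive constants independent of W and H."""
    WH = W @ H
    log_lik = np.sum(V * np.log(WH) - WH)
    log_prior = np.sum((alpha[:, None] - 1.0) * np.log(H) - beta[:, None] * H)
    return float(log_lik + log_prior)

def mjle(V, K, alpha, beta, n_iter=100, seed=0):
    """Alternate the H and W updates and record the objective after each pass."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + 0.1   # positive random initialisation (assumption)
    H = rng.random((K, N)) + 0.1
    history = []
    for _ in range(n_iter):
        # H update: numerator gains (alpha_k - 1), denominator gains beta_k
        R = V / (W @ H)
        H = (H * (W.T @ R) + (alpha[:, None] - 1.0)) / (
            W.sum(axis=0)[:, None] + beta[:, None])
        # W update: same form as the standard KL-NMF rule
        R = V / (W @ H)
        W = W * (R @ H.T) / H.sum(axis=1)[None, :]
        history.append(joint_log_likelihood(V, W, H, alpha, beta))
    return W, H, history
```

Since each step maximises an auxiliary lower bound on the objective, the recorded `history` should be non-decreasing, which is a convenient sanity check when experimenting with the updates.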

7 Estimators & Algorithms (MMLE)
Maximum marginal likelihood estimation (MMLE):
$$C_{ML}(V \mid W) \overset{\text{def}}{=} \log p(V \mid W) = \log \int_H p(V \mid W, H)\, p(H)\, dH$$
We cannot obtain $\log p(V \mid W)$ analytically.
EM algorithm, with $\tilde{W}$ the current estimate of $W$:
$$Q(W \mid \tilde{W}) = \int_H \log p(V, C, H \mid W)\, p(C, H \mid V, \tilde{W})\, dC\, dH$$
Again, there is no analytical solution.
Approximate the posterior $p(C, H \mid V, \tilde{W})$ using variational Bayes or MCMC.

8 Estimators & Algorithms (MMLE with VBEM)
Approximate $p(C, H \mid V, W)$ with a variational distribution (independent Gamma and Multinomial distributions)
$$q(C, H) = \prod_{f=1}^{F} \prod_{n=1}^{N} q(c_{fn}) \prod_{k=1}^{K} \prod_{n=1}^{N} q(h_{kn})$$
by minimising
$$KL(q(C,H) \,\|\, p(C,H \mid V, W)) = \log p(V \mid W) + KL(q(C,H) \,\|\, p(V, C, H \mid W))$$
Fixed point equations:
$$\log q(h_{kn}) \overset{+}{=} \langle \log p(V, C, H \mid W) \rangle_{q(H_{-(kn)})\, q(C)}$$
$$\log q(c_{fn}) \overset{+}{=} \langle \log p(V, C, H \mid W) \rangle_{q(H)\, q(C_{-(fn)})}$$
Here $\overset{+}{=}$ denotes equality up to an additive constant, and the subscripts $-(kn)$ and $-(fn)$ exclude the corresponding element.
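
For this model the fixed point equations have a closed form, which the sketch below iterates. It relies on the standard mean-field result for the Gamma-Poisson model, which the slide does not spell out and is therefore an assumption here: $q(h_{kn})$ is Gamma with shape $\alpha_k + \sum_f \langle c_{k,fn} \rangle$ and rate $\beta_k + \sum_f w_{fk}$, and $q(c_{fn})$ is Multinomial with probabilities proportional to $w_{fk} \exp\langle \log h_{kn} \rangle$. The digamma function is hand-rolled to keep the example free of extra dependencies, and all names are hypothetical.

```python
import numpy as np

def digamma(x):
    """Digamma via the recurrence psi(x) = psi(x + 1) - 1/x and an asymptotic series."""
    x = np.array(x, dtype=float)
    result = np.zeros_like(x)
    while np.any(x < 6):
        small = x < 6
        result = result - np.where(small, 1.0 / x, 0.0)
        x = x + small  # adds 1 only where x < 6
    inv2 = 1.0 / (x * x)
    series = np.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result + series

def vb_e_step(V, W, alpha, beta, n_iter=50):
    """Iterate the mean-field updates; returns shape a and rate b of each q(h_kn)."""
    N = V.shape[1]
    a = np.repeat(alpha[:, None], N, axis=1).astype(float)
    # The rate does not depend on C, so it is fixed for a given W
    b = np.repeat((beta + W.sum(axis=0))[:, None], N, axis=1)
    for _ in range(n_iter):
        L = np.exp(digamma(a) - np.log(b))   # exp<log h_kn> under q
        R = V / (W @ L)
        # sum_f <c_{k,fn}> = L_kn * sum_f w_fk v_fn / [W L]_fn
        a = alpha[:, None] + L * (W.T @ R)
    return a, b
```

From the returned parameters, $\langle h_{kn} \rangle = a_{kn} / b_{kn}$ and $\langle \log h_{kn} \rangle = \psi(a_{kn}) - \log b_{kn}$. A useful check is that the expected counts sum back to the data: $\sum_k (a_{kn} - \alpha_k) = \sum_f v_{fn}$ for every $n$.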

9 Estimators & Algorithms (MMLE with VBEM)
Approximate E-step, with $q$ fitted at the current estimate $\tilde{W}$:
$$\hat{Q}(W \mid \tilde{W}) = \int_H \log p(V, C, H \mid W)\, q(C, H)\, dC\, dH$$
$\hat{Q}(W \mid \tilde{W})$ is analytically available because $q(C, H)$ can be factorized.
Multiplicative update rules for $W$:
$$w_{fk} \leftarrow w_{fk}\, \frac{\sum_n \exp(\langle \log h_{kn} \rangle)\, v_{fn} / [W \exp(\langle \log H \rangle)]_{fn}}{\sum_n \langle h_{kn} \rangle}$$
$-KL(q \,\|\, p(V, C, H \mid W))$ is a lower bound for $\log p(V \mid W)$, since $KL(q \,\|\, p(C, H \mid V, W)) \geq 0$.
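
The multiplicative rule for $W$ on this slide translates almost literally into NumPy. In this sketch the two expectations under $q$, $\exp\langle \log H \rangle$ and $\langle H \rangle$, are assumed to be supplied by the variational step as $K \times N$ arrays; the function and argument names are hypothetical.

```python
import numpy as np

def update_W(V, W, exp_log_H, mean_H):
    """One multiplicative update of W given expectations under q.

    exp_log_H : array (K, N), exp(<log h_kn>)
    mean_H    : array (K, N), <h_kn>
    """
    # Ratio v_fn / [W exp<log H>]_fn, shape (F, N)
    R = V / (W @ exp_log_H)
    numerator = R @ exp_log_H.T            # shape (F, K)
    denominator = mean_H.sum(axis=1)       # shape (K,)
    return W * numerator / denominator[None, :]
```

With a single component ($K = 1$) the update reduces to $w_f = \sum_n v_{fn} / \sum_n \langle h_n \rangle$, which makes a quick correctness check. In a full VBEM loop this step would alternate with a refresh of the expectations, because $q$ depends on the current $W$.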

10 Relation to Other Works
Our model coincides with the Gamma-Poisson model in Canny 04; MJLE is also derived there.
In Buntine 06, $W$ has Dirichlet priors and is inferred using variational Bayes.
In Cemgil 09, $W$ has Gamma priors. Model selection is done using $p(V)$, after a full Bayesian treatment.

11 A short piano excerpt
15 sec., 22.05 kHz, 16 bits.
STFT with overlapping windows of size 1024: $F = 513$, $N = 676$.
Combinations of 4 notes are played.

12 Likelihoods vs. Component Number
[Figure: (a) joint likelihood $C_{JL}$ versus $K$, MJLE with MU; (b) marginal likelihood $C_{ML}$ (lower bound) versus $K$, MMLE with VBEM.]

13 Estimated Dictionaries
[Figure: $W_{MJLE}$ and $W_{MMLE}$.]

14 Swimmer Dataset
4 joints / 4 angles.
256 figures, 32x32 pixels each.

15 Estimated Dictionaries
[Figure: (a) $W_{MJLE}$; (b) $W_{MMLE}$.]

16 Conclusions
Two approaches: joint likelihood versus marginal likelihood.
MMLE with VBEM has comparable complexity to MJLE.
When $K_{opt}$ is used, they perform similarly.
MMLE has an intrinsic way of selecting the model order by automatically cancelling irrelevant columns in $W$.
This is more efficient than computing and comparing $p(V)$ for many values of $K$ in a full Bayesian setting.
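
If irrelevant columns of $W$ are driven towards zero, the effective model order can be read off an estimated dictionary by counting the columns that retain non-negligible mass. The helper below is a hypothetical illustration of that reading; the relative threshold `rel_tol` is an arbitrary choice for the example, not a value from the talk.

```python
import numpy as np

def effective_order(W, rel_tol=1e-6):
    """Count columns of W whose total mass is non-negligible relative to the largest."""
    col_mass = np.asarray(W, dtype=float).sum(axis=0)
    if col_mass.max() <= 0:
        return 0, np.array([], dtype=int)
    keep = np.flatnonzero(col_mass > rel_tol * col_mass.max())
    return int(keep.size), keep
```

The function returns both the count and the indices of the retained columns, so the pruned dictionary is simply `W[:, keep]`.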

Maximum Marginal Likelihood Estimation for Nonnegative Dictionary Learning in the Gamma-Poisson Model

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 60, NO. 10, OCTOBER 2012, p. 5163. Onur Dikmen, Member, IEEE, and