Empirical Bayes Unfolding of Elementary Particle Spectra at the Large Hadron Collider

Empirical Bayes Unfolding of Elementary Particle Spectra at the Large Hadron Collider
Mikael Kuusela, Institute of Mathematics, EPFL
Statistics Seminar, University of Bristol, June 13, 2014
Joint work with Victor M. Panaretos

CERN and the Large Hadron Collider

CMS Experiment at the LHC

Statistics at CERN
- Hypothesis testing / interval estimation with a large number of nuisance parameters: Higgs boson, supersymmetry, beyond-Standard-Model physics, ...
- Nonparametric multiple regression: energy response calibration
- Statistical inverse problems: unfolding
- Classification: improving the S/B ratio, particle identification, triggering
- Pattern recognition: particle tracking

The Unfolding Problem
- Any measurement carried out at the LHC is affected by the finite resolution of the particle detectors
- This causes the observed spectrum of events to be smeared, or blurred, with respect to the true one
- The unfolding problem is to estimate the true spectrum using the smeared observations
- Mathematically closely related to deblurring in optics and to tomographic image reconstruction in medical imaging
[Figure: (a) true intensity and (b) smeared intensity as functions of the physical observable; folding maps (a) to (b), unfolding goes from (b) back to (a)]

Unfolding is an Ill-Posed Inverse Problem
- The main issue in unfolding is the ill-posedness of the mapping from the true spectrum to the smeared spectrum
- The (pseudo)inverse of this mapping is very sensitive to small perturbations of the data
- We need to regularize the problem by introducing additional information about plausible solutions
- Current state of the art:
  1. EM iteration with early stopping
  2. Generalized Tikhonov regularization
- Two major challenges:
  1. How to choose the regularization strength?
  2. How to quantify the uncertainty of the solution?
- In this talk, we propose an empirical Bayes unfolding framework for tackling these issues

Problem Formulation Using Poisson Point Processes (1)
- The appropriate mathematical model for unfolding is that of indirectly observed Poisson point processes
- A random measure M is a Poisson point process with intensity function f and state space E iff
  1. M(B) ~ Poisson(λ(B)), where λ(B) = ∫_B f(s) ds, for every Borel set B ⊂ E
  2. M(B_1), ..., M(B_n) are independent random variables for disjoint Borel sets B_1, ..., B_n ⊂ E
- The intensity function f uniquely characterizes the law of M, i.e., all the information about the behavior of M is contained in f
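
As a small illustration of this model (not part of the talk; the function names and the grid-based sampling shortcut are illustrative), the following sketch draws one realization of a Poisson point process on E = [-7, 7]: the total count is Poisson with mean λ(E), and given the count the points are i.i.d. with density f/λ(E).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def intensity(s, lam_tot=1000.0, E_len=14.0):
    """An example intensity f(s) on E = [-7, 7]: two Gaussian bumps plus a flat background."""
    return lam_tot * (0.2 * norm.pdf(s, -2, 1) + 0.5 * norm.pdf(s, 2, 1) + 0.3 / E_len)

def sample_ppp(intensity_fn, lo=-7.0, hi=7.0, n_grid=5000):
    """Draw one realization of a Poisson point process with the given intensity (grid approximation)."""
    s = np.linspace(lo, hi, n_grid)
    f = intensity_fn(s)
    lam_E = f.sum() * (s[1] - s[0])               # lambda(E) = int_E f(s) ds (Riemann sum)
    n = rng.poisson(lam_E)                        # M(E) ~ Poisson(lambda(E))
    return rng.choice(s, size=n, p=f / f.sum())   # points i.i.d. with density f / lambda(E)

events = sample_ppp(intensity)
print(f"{len(events)} true events drawn")
```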

Problem Formulation Using Poisson Point Processes (2)
- Let M and N be two Poisson point processes with intensities f and g and state spaces E and F, respectively
- Assume that M represents the true, particle-level events and N the smeared, detector-level events
- Then
  g(t) = (Kf)(t) = ∫_E k(t, s) f(s) ds,
  where the smearing kernel k represents the response of the detector and is given by
  k(t, s) = p(Y_i = t | X_i = s, ith event observed) P(ith event observed | X_i = s),
  where X_i is the ith true event and Y_i the corresponding smeared event
- Task: estimate f given a single realization of the process N

Empirical Bayes Unfolding
We propose to estimate f based on the following key principles:
1. Discretization of the true intensity f using a cubic B-spline basis expansion, that is, f(s) = Σ_{j=1}^p β_j B_j(s), where B_j, j = 1, ..., p, are the B-spline basis functions
2. Posterior mean estimation of the B-spline coefficients β = [β_1, ..., β_p]^T
3. Empirical Bayes selection of the scale δ of the regularizing smoothness prior p(β | δ)
4. Frequentist uncertainty quantification and bias correction using the parametric bootstrap

Discretization of the Problem
- Let {F_i}_{i=1}^n be a partition of the smeared space F into n intervals
- Let y_i = N(F_i) be the number of points observed in interval F_i, i.e., we record the observations into a histogram y = [y_1, ..., y_n]^T
- Then
  E(y_i | β) = ∫_{F_i} g(t) dt = ∫_{F_i} ∫_E k(t, s) f(s) ds dt = Σ_{j=1}^p ( ∫_{F_i} ∫_E k(t, s) B_j(s) ds dt ) β_j =: Σ_{j=1}^p K_{i,j} β_j
- Hence, we need to solve the Poisson regression problem
  y | β ~ Poisson(Kβ)
  for an ill-conditioned matrix K
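
A minimal sketch of how the matrix K with entries K_{i,j} = ∫_{F_i} ∫_E k(t, s) B_j(s) ds dt could be assembled. Assumptions (not from the slide): a Gaussian smearing kernel N(t − s; 0, 1) as in the later demonstration, clamped cubic B-splines on E = F = [-7, 7], and naive quadrature in s; all names are illustrative.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.stats import norm

def cubic_bspline_basis(lo, hi, p, degree=3):
    """Return a list of p clamped cubic B-spline basis functions on [lo, hi]."""
    n_interior = p - degree - 1
    interior = np.linspace(lo, hi, n_interior + 2)[1:-1]
    knots = np.concatenate([[lo] * (degree + 1), interior, [hi] * (degree + 1)])
    return [BSpline.basis_element(knots[j:j + degree + 2], extrapolate=False)
            for j in range(p)]

def smearing_matrix(basis, bin_edges, lo, hi, sigma=1.0, n_quad=400):
    """K[i, j] = int_{F_i} int_E N(t - s; 0, sigma^2) B_j(s) ds dt, by naive quadrature in s."""
    s = np.linspace(lo, hi, n_quad)
    ds = s[1] - s[0]
    B = np.array([np.nan_to_num(Bj(s)) for Bj in basis])          # (p, n_quad)
    K = np.zeros((len(bin_edges) - 1, len(basis)))
    for i, (a, b) in enumerate(zip(bin_edges[:-1], bin_edges[1:])):
        # int_{F_i} N(t - s; 0, sigma^2) dt = Phi((b - s)/sigma) - Phi((a - s)/sigma)
        w = norm.cdf((b - s) / sigma) - norm.cdf((a - s) / sigma)
        K[i] = (B * w).sum(axis=1) * ds
    return K

basis = cubic_bspline_basis(-7, 7, p=30)
edges = np.linspace(-7, 7, 41)                                    # n = 40 smeared bins
K = smearing_matrix(basis, edges, -7, 7)
print("condition number of K:", np.linalg.cond(K))                # very large: ill-posed
```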

Bayesian Estimation of the Spline Coefficients
- Posterior for β:
  p(β | y, δ) = p(y | β) p(β | δ) / p(y | δ), β ∈ R^p_+,
  where the likelihood is given by
  p(y | β) = Π_{i=1}^n ( Σ_{j=1}^p K_{i,j} β_j )^{y_i} e^{−Σ_{j=1}^p K_{i,j} β_j} / y_i!, β ∈ R^p_+
- We regularize the problem using the Gaussian smoothness prior
  p(β | δ) ∝ exp( −δ ‖f''‖_2^2 ) = exp( −δ β^T Ω β ), β ∈ R^p_+,
  with δ > 0 and Ω_{i,j} = ∫_E B_i''(s) B_j''(s) ds
- This becomes a proper pdf once we impose Aristotelian boundary conditions
- We use a single-component Metropolis–Hastings algorithm to sample from the posterior
- The univariate proposal densities are chosen to approximate the full conditionals p(β_k | β_{−k}, y, δ) of the Gibbs sampler, as proposed by Saquib et al. (1998)
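
To make the target of the sampler concrete, here is a minimal sketch (illustrative names; the roughness matrix is computed by naive quadrature from the B-splines' second derivatives) of the unnormalized log-posterior log p(β | y, δ):

```python
import numpy as np

def roughness_matrix(basis, lo, hi, n_quad=4000):
    """Omega[i, j] = int_E B_i''(s) B_j''(s) ds, by quadrature over the second derivatives."""
    s = np.linspace(lo, hi, n_quad)
    ds = s[1] - s[0]
    D2 = np.array([np.nan_to_num(Bj.derivative(2)(s)) for Bj in basis])
    return (D2 @ D2.T) * ds

def log_posterior(beta, y, K, Omega, delta):
    """Unnormalized log p(beta | y, delta): Poisson log-likelihood plus Gaussian smoothness prior."""
    if np.any(beta < 0):
        return -np.inf                          # support restricted to the nonnegative orthant
    mu = K @ beta                               # Poisson means mu_i = sum_j K_ij beta_j
    loglik = np.sum(y * np.log(mu) - mu)        # up to the constant -sum_i log(y_i!)
    logprior = -delta * beta @ Omega @ beta     # p(beta | delta) prop. to exp(-delta beta' Omega beta)
    return loglik + logprior
```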

Empirical Bayes Estimation of the Hyperparameter
- We propose choosing the hyperparameter δ (i.e., the regularization parameter) via marginal maximum likelihood:
  δ̂ = δ̂(y) = arg max_{δ>0} p(y | δ) = arg max_{δ>0} ∫_{R^p_+} p(y | β) p(β | δ) dβ
- The marginal maximum likelihood estimate δ̂ is found using a Monte Carlo expectation-maximization algorithm (Geman and McClure, 1985, 1987; Saquib et al., 1998):
  - E-step: sample β^(1), ..., β^(S) from the posterior p(β | y, δ^(t)) and compute Q(δ; δ^(t)) = (1/S) Σ_{s=1}^S log p(β^(s) | δ)
  - M-step: set δ^(t+1) = arg max_{δ>0} Q(δ; δ^(t))
- The spline coefficients β are then estimated using the mean of the empirical Bayes posterior: β̂ = E(β | y, δ̂)
- The estimated intensity is f̂(s) = Σ_{j=1}^p β̂_j B_j(s)
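
A compact sketch of the Monte Carlo EM loop for δ̂. It assumes a user-supplied sample_posterior(y, K, Omega, delta, S) returning S posterior draws of β (e.g. from a Metropolis–Hastings sampler), and it treats the prior as an untruncated Gaussian with full-rank Ω, so that the M-step has the closed form δ = p / (2 · mean(β^T Ω β)); both the helper name and this closed form are assumptions of the sketch, not the talk's exact implementation.

```python
import numpy as np

def mcem(y, K, Omega, sample_posterior, delta0=1e-5, n_iter=30, S=1000):
    """Monte Carlo EM for the marginal maximum likelihood estimate of delta (sketch)."""
    p = K.shape[1]
    delta = delta0
    for _ in range(n_iter):
        # E-step: sample from the posterior at the current delta and form Q(. ; delta_t).
        betas = sample_posterior(y, K, Omega, delta, S)           # shape (S, p)
        mean_quad = np.mean(np.einsum('si,ij,sj->s', betas, Omega, betas))
        # M-step: for p(beta | delta) prop. to delta^{p/2} exp(-delta beta' Omega beta)
        # (ignoring the truncation to beta >= 0), Q is maximized at delta = p / (2 E[beta' Omega beta]).
        delta = p / (2.0 * mean_quad)
    return delta
```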

What Does Empirical Bayes Do?
δ̂ = arg max_{δ>0} p(y | δ) = arg max_{δ>0} ∫ p(y | θ) p(θ | δ) dθ
[Figure: the prior p(θ | δ) and the likelihood p(y | θ) as functions of θ, illustrating how the marginal likelihood ∫ p(y | θ) p(θ | δ) dθ weighs their overlap]

Empirical Bayes vs. Hierarchical Bayes
- Hierarchical Bayes is a natural alternative to empirical Bayes
- But one then needs to choose the hyperprior p(δ):
  - It is a priori unclear how this should be done
  - Different choices can result in non-negligible differences in the posterior
  - The choice is not necessarily invariant under reparametrizations
- Empirical Bayes, on the other hand:
  - Chooses a unique, best regularizer among the family of priors {p(β | δ)}_{δ>0}
  - Requires only the choice of the family {p(β | δ)}_{δ>0}
  - Is by construction transformation invariant
- Empirical Bayes has become part of the standard methodology in generalized additive models (Wood, 2011) and Gaussian processes (Rasmussen and Williams, 2006)
- What about inverse problems?

Uncertainty Quantification and Bias Correction (1)
- The credible intervals of the empirical Bayes posterior p(β | y, δ̂) could in principle be used to make confidence statements about f
- But due to the data-driven choice of the prior, these intervals lose their subjective Bayesian interpretation; furthermore, their frequentist properties are poorly understood
- Instead, we propose using the parametric bootstrap to construct frequentist confidence bands for f (see the sketch below):
  1. Obtain a resampled observation y*
  2. Rerun the MCEM algorithm with y* to find δ̂* = δ̂(y*)
  3. Compute β̂* = E(β | y*, δ̂*)
  4. Obtain f̂*(s) = Σ_{j=1}^p β̂*_j B_j(s)
  5. Repeat R times
- The bootstrap sample {f̂*(r)}_{r=1}^R is then used to compute approximate frequentist confidence intervals for f(s) for each s ∈ E
- This procedure also takes into account the uncertainty regarding the choice of the hyperparameter δ
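
A sketch of this bootstrap loop; estimate_unfolded(y) is an assumed wrapper that reruns the full MCEM plus posterior-mean pipeline and returns the fitted coefficients for a given histogram, and is not a function defined in the talk.

```python
import numpy as np

rng = np.random.default_rng(1)

def parametric_bootstrap(y, K, estimate_unfolded, R=200):
    """Parametric bootstrap for empirical Bayes unfolding (sketch)."""
    beta_hat = estimate_unfolded(y)                # point estimate from the observed histogram
    mu_hat = K @ beta_hat
    boot = []
    for _ in range(R):
        y_star = rng.poisson(mu_hat)               # resample y* ~ Poisson(K beta_hat)
        boot.append(estimate_unfolded(y_star))     # rerun MCEM + posterior mean on y*
    return beta_hat, np.array(boot)                # boot has shape (R, p)
```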

Uncertainty Quantification and Bias Correction (2)
- One can envisage various ways of obtaining the resampled observations y* and of using the bootstrap sample {f̂*(r)}_{r=1}^R to compute approximate frequentist confidence bands
- We propose using:
  - Resampling: y* ~ Poisson(K β̂) with independent components, where β̂ = E(β | y, δ̂)
  - Intervals: pointwise 1 − 2α basic bootstrap intervals, given by [2f̂(s) − f̂*_{1−α}(s), 2f̂(s) − f̂*_α(s)], where f̂*_α(s) denotes the α-quantile of the bootstrap sample evaluated at the point s ∈ E
- The bootstrap may also be used to correct for the unavoidable bias in the point estimate f̂:
  - Bootstrap estimate of the bias: bias*(f̂(s)) = (1/R) Σ_{r=1}^R f̂*(r)(s) − f̂(s)
  - Bias-corrected point estimate: f̂_BC(s) = f̂(s) − bias*(f̂(s))
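
Given the bootstrap sample of unfolded curves, the pointwise basic intervals and the bias-corrected estimate follow directly; a minimal sketch, with the curves evaluated on a grid of s values and all names illustrative:

```python
import numpy as np

def basic_intervals_and_bias_correction(f_hat, f_boot, alpha=0.025):
    """Pointwise 1 - 2*alpha basic bootstrap intervals and bias-corrected estimate.

    f_hat  : (n_grid,)    unfolded intensity evaluated on a grid of s values
    f_boot : (R, n_grid)  bootstrap replicates evaluated on the same grid
    """
    q_lo = np.quantile(f_boot, alpha, axis=0)          # alpha-quantile of the bootstrap sample
    q_hi = np.quantile(f_boot, 1 - alpha, axis=0)      # (1 - alpha)-quantile
    lower = 2 * f_hat - q_hi                           # [2 f_hat - f*_{1-alpha}, 2 f_hat - f*_alpha]
    upper = 2 * f_hat - q_lo
    bias = f_boot.mean(axis=0) - f_hat                 # bootstrap estimate of the bias
    f_bc = f_hat - bias                                # bias-corrected point estimate
    return lower, upper, f_bc
```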

Demonstration: Setup
- True intensity
  f(s) = λ_tot { π_1 N(s | −2, 1) + π_2 N(s | 2, 1) + π_3 / |E| },
  with π_1 = 0.2, π_2 = 0.5 and π_3 = 0.3
- Smeared intensity
  g(t) = ∫_E N(t − s | 0, 1) f(s) ds
- E = F = [−7, 7], discretized using n = 40 histogram bins and p = 30 B-spline basis functions
- The condition number of the smearing matrix K is 2.6 × 10^8, so the problem is severely ill-posed!
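
A sketch of this simulation setup (the midpoint approximation of the bin integrals and all names are illustrative shortcuts, not the talk's exact implementation):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
lam_tot, E_len = 20000.0, 14.0                      # E = F = [-7, 7]

def f_true(s):
    """True intensity: lam_tot * (0.2 N(-2, 1) + 0.5 N(2, 1) + 0.3 uniform on E)."""
    return lam_tot * (0.2 * norm.pdf(s, -2, 1) + 0.5 * norm.pdf(s, 2, 1) + 0.3 / E_len)

def g_smeared(t, n_quad=2000):
    """Smeared intensity g(t) = int_E N(t - s; 0, 1) f(s) ds, by simple quadrature."""
    s = np.linspace(-7, 7, n_quad)
    ds = s[1] - s[0]
    return (norm.pdf(t[:, None] - s[None, :]) * f_true(s)[None, :]).sum(axis=1) * ds

edges = np.linspace(-7, 7, 41)                      # n = 40 smeared histogram bins
centers = 0.5 * (edges[:-1] + edges[1:])
mu = g_smeared(centers) * np.diff(edges)            # midpoint approximation of int_{F_i} g(t) dt
y = rng.poisson(mu)                                 # observed smeared histogram
```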

Demonstration: Empirical Bayes Unfolding, λ_tot = 20 000
[Figure: Empirical Bayes unfolding with λ_tot = 20 000, showing the unfolded intensity, the true intensity and the smeared intensity, with 95 % pointwise basic intervals]

Demonstration: Empirical Bayes Unfolding, λ_tot = 1 000
[Figure: Empirical Bayes unfolding with λ_tot = 1 000, showing the unfolded intensity with and without bias correction, the true intensity and the smeared intensity, with 95 % pointwise basic intervals]

Demonstration: Hierarchical Bayes Unfolding, λ_tot = 1 000
[Figure: Hierarchical Bayes unfolding with λ_tot = 1 000 and hyperprior δ ~ Gamma(1, 0.05), showing the unfolded intensity, the true intensity and the smeared intensity, with 95 % credible intervals]

Demonstration: Hierarchical Bayes Unfolding, λ_tot = 1 000
[Figure: Hierarchical Bayes unfolding with λ_tot = 1 000 and hyperprior δ ~ Gamma(0.001, 0.001), showing the unfolded intensity, the true intensity and the smeared intensity, with 95 % credible intervals]

Z → e⁺e⁻: Setup
- We demonstrate empirical Bayes unfolding with real data by unfolding the Z → e⁺e⁻ invariant mass spectrum measured in CMS
- The data are published in Chatrchyan et al. (2013) and correspond to an integrated luminosity of 4.98 fb⁻¹ collected in 2011 at √s = 7 TeV
- 67 778 high-quality electron-positron pairs with invariant masses between 65 and 115 GeV, in 0.5 GeV bins
- Response: convolution with the Crystal Ball function
  CB(m' − m; σ², α, γ) =
    C e^{−(m' − m)² / (2σ²)},                                    if (m' − m)/σ > −α,
    C (γ/α)^γ e^{−α²/2} ( γ/α − α − (m' − m)/σ )^{−γ},           if (m' − m)/σ ≤ −α
- The CB parameters are estimated by maximum likelihood using 30 % of the data, assuming that the true intensity is the non-relativistic Breit–Wigner with PDG values for the Z mass and width
- Only the remaining 70 % of the data are used for unfolding
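
For concreteness, a sketch of the Crystal Ball density in the standard parametrization reconstructed above (unnormalized; the overall constant C and any conventions specific to the talk are assumptions). scipy.stats.crystalball provides a normalized version with the same shape parameters (beta playing the role of α and m that of γ).

```python
import numpy as np

def crystal_ball(dm, sigma, alpha, gamma):
    """Unnormalized Crystal Ball density in dm = m' - m: Gaussian core with a power-law low-side tail."""
    z = np.asarray(dm, dtype=float) / sigma
    out = np.exp(-0.5 * z**2)                               # Gaussian core for (m' - m)/sigma > -alpha
    A = (gamma / alpha) ** gamma * np.exp(-0.5 * alpha**2)  # matches the core in value and slope at -alpha
    B = gamma / alpha - alpha
    tail = z <= -alpha
    out[tail] = A * (B - z[tail]) ** (-gamma)               # power-law tail, continuous at -alpha
    return out
```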

Z → e⁺e⁻: Event Display

Z → e⁺e⁻: Empirical Bayes Unfolding
[Figure: Empirical Bayes unfolding of the Z boson invariant mass spectrum (intensity vs. invariant mass in GeV), showing the unfolded intensity with bias correction, the true intensity and the smeared data, with 95 % pointwise basic intervals]

Conclusions
- We have introduced an empirical Bayes unfolding framework which enables a principled choice of the regularization parameter and frequentist uncertainty quantification
- Our studies are motivated by a real-world data analysis problem at CERN; we work in direct collaboration with CERN physicists to improve the unfolding techniques used in LHC data analysis
- Our method provides reasonable estimates in very challenging unfolding scenarios
- Uncertainty quantification in unfolding is hampered by the unavoidable bias introduced by the regularization, but basic bootstrap resampling still provides an encouraging first approximation
- Further details in: Kuusela, M. and Panaretos, V. M. (2014). Empirical Bayes unfolding of elementary particle spectra at the Large Hadron Collider. arXiv:1401.8274 [stat.AP].

References
Chatrchyan, S. et al. (CMS Collaboration) (2013). Energy calibration and resolution of the CMS electromagnetic calorimeter in pp collisions at √s = 7 TeV. Journal of Instrumentation, 8(09):P09009.
Geman, S. and McClure, D. E. (1985). Bayesian image analysis: an application to single photon emission tomography. In Proceedings of the American Statistical Association, Statistical Computing Section, pages 12–18.
Geman, S. and McClure, D. E. (1987). Statistical methods for tomographic image reconstruction. Bulletin of the International Statistical Institute, LII(4):5–21.
Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. MIT Press.
Saquib, S. S., Bouman, C. A., and Sauer, K. (1998). ML parameter estimation for Markov random fields with applications to Bayesian tomography. IEEE Transactions on Image Processing, 7(7):1029–1044.
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73:3–36.

Backup

Intuition on the Basic Bootstrap Intervals
- Inversion of a CDF pivot ("Neyman construction"): find [θ̂_L, θ̂_U] such that
  ∫_{−∞}^{θ̂_0} p(θ̂ | θ = θ̂_U) dθ̂ = α   and   ∫_{−∞}^{θ̂_0} p(θ̂ | θ = θ̂_L) dθ̂ = 1 − α
[Figure: the sampling densities p(θ̂ | θ = θ̂_L), p(θ̂ | θ = θ̂_0) and p(θ̂ | θ = θ̂_U), with the observed value θ̂_0 and the two tail areas α marked]

Intuition on the Basic Bootstrap Intervals
- Basic bootstrap interval: [θ̂_L, θ̂_U] = [2θ̂_0 − θ̂*_{1−α}, 2θ̂_0 − θ̂*_α]
- This amounts to approximating the sampling densities by shifted copies of the bootstrap distribution g:
  p(θ̂ | θ = θ̂_0) ≈ g(θ̂),  p(θ̂ | θ = θ̂_L) ≈ g(θ̂ + θ̂_0 − θ̂_L),  p(θ̂ | θ = θ̂_U) ≈ g(θ̂ + θ̂_0 − θ̂_U)
[Figure: the shifted bootstrap densities, with θ̂_L, θ̂_0, θ̂_U and the two tail areas α marked]

Demonstration: Setup (algorithm parameters)

                                   λ_tot = 1 000    λ_tot = 20 000
  MCEM iterations                  30               20
  δ^(0)                            1 × 10⁻⁵         1 × 10⁻⁵
  MCMC sample size during EM       1 000            500
  MCMC sample size for β̂          1 000            1 000
  R                                200              200
  Running time for f̂              9 min            3 min
  Running time with bootstrap      9 h 56 min       3 h 36 min

Z → e⁺e⁻: Setup
- We unfold the n = 30 bins on the interval F = [82.5 GeV, 97.5 GeV] and use p = 38 B-spline basis functions to reconstruct the true intensity on the interval E = [81.5 GeV, 98.5 GeV]
- Here p > n facilitates the mixing of the MCMC sampler, and E ⊃ F accounts for boundary effects
- Other parameters:

  MCEM iterations                  20
  δ^(0)                            1 × 10⁻⁶
  MCMC sample size during EM       500
  MCMC sample size for β̂          5 000
  R                                200
  Running time for f̂              5 min
  Running time with bootstrap      6 h 13 min

Aristotelian Boundary Conditions (1)
- The prior p(β | δ) ∝ exp(−δ β^T Ω β) with Ω_{i,j} = ∫_E B_i''(s) B_j''(s) ds is potentially improper since Ω has rank p − 2
- If the prior is improper, then the marginal p(y | δ) is also improper and it makes no sense to use empirical Bayes for estimating δ
- The problem can be solved by imposing the so-called Aristotelian boundary conditions
- That is, we condition on the unknown boundary values of f (or, equivalently, on β_1 and β_p) and place additional hyperpriors on these values:
  p(β | δ) = p(β_2, ..., β_{p−1} | β_1, β_p, δ) p(β_1 | δ) p(β_p | δ), β ∈ R^p_+,
  with
  p(β_2, ..., β_{p−1} | β_1, β_p, δ) ∝ exp(−δ β^T Ω β),
  p(β_1 | δ) ∝ exp(−δ γ_L β_1²),
  p(β_p | δ) ∝ exp(−δ γ_R β_p²),
  where γ_L, γ_R > 0 are fixed constants

Aristotelian Boundary Conditions (2)
- As a result, p(β | δ) ∝ exp(−δ β^T Ω_A β), where the elements of Ω_A are given by
  Ω_{A,i,j} = Ω_{i,j} + γ_L,  if i = j = 1,
  Ω_{A,i,j} = Ω_{i,j} + γ_R,  if i = j = p,
  Ω_{A,i,j} = Ω_{i,j},        otherwise
- The augmented matrix Ω_A is positive definite and hence the modified prior is a proper pdf
- The Aristotelian prior has the added benefit that, by controlling γ_L and γ_R, we are able to control the variance of f̂ near the boundaries
- In our numerical experiments we had:
  - Gaussian mixture model data: γ_L = γ_R = 5
  - Z → e⁺e⁻ data: γ_L = γ_R = 70
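
The matrix modification itself is a one-liner; a minimal sketch (the helper name is illustrative, and Omega is assumed to be the roughness matrix from the earlier sketch):

```python
import numpy as np

def aristotelian_omega(Omega, gamma_L, gamma_R):
    """Augment the roughness matrix so that the Aristotelian prior is a proper pdf."""
    Omega_A = Omega.copy()
    Omega_A[0, 0] += gamma_L       # hyperprior on the left boundary coefficient beta_1
    Omega_A[-1, -1] += gamma_R     # hyperprior on the right boundary coefficient beta_p
    # With gamma_L, gamma_R > 0 the augmented matrix is positive definite, so
    # p(beta | delta) prop. to exp(-delta beta' Omega_A beta) is proper.
    return Omega_A

# e.g. Omega_A = aristotelian_omega(Omega, gamma_L=5.0, gamma_R=5.0)   # Gaussian mixture demo values
```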

Demonstration: No Regularization
[Figure: Unfolding of the Gaussian mixture model data (λ_tot = 20 000) without regularization, comparing the posterior mean under a uniform prior and the unfolded least-squares estimate with the smeared least-squares fit, the true intensity and the smeared intensity]

Convergence of MCEM
[Figure: Convergence of the MCEM algorithm for estimating the hyperparameter δ with the Gaussian mixture model data; the iterates of δ are shown for λ_tot = 1 000 and λ_tot = 20 000]

MCMC Diagnostics
[Figure: Convergence and mixing diagnostics for the single-component Metropolis–Hastings sampler for the variables β_5 and β_21 with the Gaussian mixture model data with λ_tot = 20 000: from left to right, the trace plots, histograms, estimated autocorrelation functions and cumulative means of the samples]

Convergence of Empirical Bayes Unfolding
[Figure: Convergence of the mean integrated squared error (MISE) with the Gaussian mixture model data as the expected sample size λ_tot grows; the error bars indicate approximate 95 % confidence intervals]

Monte Carlo EM Algorithm for Finding the MMLE
The Monte Carlo EM algorithm (Geman and McClure, 1985, 1987; Saquib et al., 1998) for finding the marginal maximum likelihood estimate δ̂:
1. Pick some initial guess δ^(0) > 0 and set t = 0
2. E-step:
   a. Sample β^(1), β^(2), ..., β^(S) from the posterior p(β | y, δ^(t))
   b. Compute Q(δ; δ^(t)) = (1/S) Σ_{s=1}^S log p(β^(s) | δ)
3. M-step: set δ^(t+1) = arg max_{δ>0} Q(δ; δ^(t))
4. Set t ← t + 1
5. If some stopping rule is satisfied, set δ̂ = δ^(t) and terminate the iteration; else go to step 2

B-Spline Basis Functions
[Figure: the cubic B-spline basis functions B_j used in the expansion of f]

Details of the MCMC Implementation (1)
- We use the single-component Metropolis–Hastings sampler of Saquib et al. (1998)
- The kth full conditional satisfies
  log p(β_k* | β_{−k}, y, δ) = Σ_{i=1}^n y_i log( Σ_{j=1}^p K_{i,j} β_j ) − Σ_{i=1}^n Σ_{j=1}^p K_{i,j} β_j − δ Σ_{i=1}^p Σ_{j=1}^p Ω_{i,j} β_i β_j + const =: f(β_k*, β_{−k})
- Taking a 2nd-order Taylor expansion of the log-term around the current position β_k of the Markov chain, we find
  f(β_k*, β_{−k}) ≈ d_{1,k} (β_k* − β_k) + (d_{2,k}/2) (β_k* − β_k)² − δ ( Ω_{k,k} (β_k*)² + 2 Σ_{i≠k} Ω_{i,k} β_i β_k* ) + const =: g(β_k*, β),
  where, with μ = Kβ,
  d_{1,k} = Σ_{i=1}^n K_{i,k} ( y_i / μ_i − 1 ),   d_{2,k} = −Σ_{i=1}^n K_{i,k}² y_i / μ_i²

Details of the MCMC Implementation (2)
- As a function of β_k*, the approximate full conditional
  g(β_k*, β) = d_{1,k} (β_k* − β_k) + (d_{2,k}/2) (β_k* − β_k)² − δ ( Ω_{k,k} (β_k*)² + 2 Σ_{i≠k} Ω_{i,k} β_i β_k* ) + const
  is a Gaussian with mean
  m_k = ( d_{1,k} − d_{2,k} β_k − 2δ Σ_{i≠k} Ω_{i,k} β_i ) / ( 2δ Ω_{k,k} − d_{2,k} )
  and variance
  σ_k² = 1 / ( 2δ Ω_{k,k} − d_{2,k} )

Details of the MCMC Implementation (3)
- If m_k ≥ 0, the proposal β_k* is sampled from N(m_k, σ_k²) truncated to [0, ∞)
- If m_k < 0, the proposal β_k* is sampled from Exp(λ), with the rate chosen to match the slope of the approximation at zero,
  ∂/∂β_k* log p(β_k* | β) |_{β_k*=0} = ∂/∂β_k* g(β_k*, β) |_{β_k*=0},
  giving λ = −d_{1,k} + d_{2,k} β_k + 2δ Σ_{i≠k} Ω_{i,k} β_i
- Denote p(β_k* | β) =: q(β_k*, β_k, β_{−k}) and p(β | y, δ) =: h(β_k, β_{−k})
- The acceptance probability for the kth component of the single-component Metropolis–Hastings algorithm is given by
  a(β_k*, β) = min{ 1, [ h(β_k*, β_{−k}) q(β_k, β_k*, β_{−k}) ] / [ h(β_k, β_{−k}) q(β_k*, β_k, β_{−k}) ] }
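
A condensed sketch of one sweep of this sampler, following the construction on the last three slides. It is illustrative and unoptimized, the quantities are computed from the formulas as reconstructed above (sign conventions should be checked against the paper), and the acceptance ratio corrects for the proposal approximation in the usual Metropolis–Hastings way.

```python
import numpy as np
from scipy.stats import truncnorm, expon

rng = np.random.default_rng(4)

def log_target(beta, y, K, Omega, delta):
    """Unnormalized log posterior h(beta): Poisson log-likelihood plus Gaussian smoothness prior."""
    if np.any(beta < 0):
        return -np.inf
    mu = K @ beta
    return np.sum(y * np.log(mu) - mu) - delta * beta @ Omega @ beta

def proposal_for(k, beta, y, K, Omega, delta):
    """Frozen proposal distribution for component k, built around the current beta."""
    mu = K @ beta
    d1 = np.sum(K[:, k] * (y / mu - 1.0))          # 1st derivative of the log-term
    d2 = -np.sum(K[:, k] ** 2 * y / mu ** 2)       # 2nd derivative of the log-term
    off = Omega[k] @ beta - Omega[k, k] * beta[k]  # sum_{i != k} Omega_{i,k} beta_i
    denom = 2.0 * delta * Omega[k, k] - d2
    m_k = (d1 - d2 * beta[k] - 2.0 * delta * off) / denom
    if m_k >= 0:                                   # N(m_k, sigma_k^2) truncated to [0, inf)
        s_k = np.sqrt(1.0 / denom)
        return truncnorm(-m_k / s_k, np.inf, loc=m_k, scale=s_k)
    return expon(scale=1.0 / (denom * (-m_k)))     # Exp(lambda), lambda = -slope of g at 0

def mh_sweep(beta, y, K, Omega, delta):
    """One single-component Metropolis-Hastings sweep over all coefficients (sketch)."""
    beta = beta.copy()
    for k in range(len(beta)):
        q_fwd = proposal_for(k, beta, y, K, Omega, delta)
        beta_star = beta.copy()
        beta_star[k] = q_fwd.rvs(random_state=rng)
        q_rev = proposal_for(k, beta_star, y, K, Omega, delta)
        log_a = (log_target(beta_star, y, K, Omega, delta) + q_rev.logpdf(beta[k])
                 - log_target(beta, y, K, Omega, delta) - q_fwd.logpdf(beta_star[k]))
        if np.log(rng.uniform()) < log_a:          # accept with probability min(1, ratio)
            beta = beta_star
    return beta
```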

The Expectation-Maximization Algorithm
- The EM algorithm is an iterative method for finding the maximum of the likelihood L(θ; y) = p(y | θ)
- It applies in cases where the data y can be seen as an incomplete version of some complete data x (that is, y = g(x)) with complete-data likelihood L(θ; x) = p(x | θ)
- The EM iteration:
  1. Pick some initial guess θ^(0) and set t = 0
  2. E-step: compute Q(θ; θ^(t)) = E( log p(x | θ) | y, θ^(t) )
  3. M-step: set θ^(t+1) = arg max_θ Q(θ; θ^(t))
  4. Set t ← t + 1
  5. If some stopping rule is satisfied, set θ̂_MLE = θ^(t) and terminate the iteration; else go to step 2
- The EM iteration never decreases the incomplete-data likelihood, that is, L(θ^(t+1); y) ≥ L(θ^(t); y) for all t = 0, 1, 2, ...
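
The unfolding application uses the Monte Carlo variant described earlier, but as a self-contained illustration of the plain EM iteration (not from the talk), here is the textbook EM for a two-component Gaussian mixture, where the complete data x = (y, z) augment each observation with its unknown component label z.

```python
import numpy as np
from scipy.stats import norm

def em_gaussian_mixture(y, n_iter=100):
    """EM for a two-component Gaussian mixture with unknown weight, means and variances."""
    pi, mu, sd = 0.5, np.array([y.min(), y.max()]), np.array([y.std(), y.std()])
    for _ in range(n_iter):
        # E-step: posterior probability that each observation belongs to component 1.
        w1 = pi * norm.pdf(y, mu[0], sd[0])
        w2 = (1 - pi) * norm.pdf(y, mu[1], sd[1])
        r = w1 / (w1 + w2)
        # M-step: maximize the expected complete-data log-likelihood Q(theta; theta_t).
        pi = r.mean()
        mu = np.array([np.average(y, weights=r), np.average(y, weights=1 - r)])
        sd = np.sqrt([np.average((y - mu[0]) ** 2, weights=r),
                      np.average((y - mu[1]) ** 2, weights=1 - r)])
    return pi, mu, sd

# Example: across these iterations the incomplete-data likelihood never decreases.
rng = np.random.default_rng(5)
y = np.concatenate([rng.normal(-2, 1, 300), rng.normal(2, 1, 700)])
print(em_gaussian_mixture(y))
```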