Empirical Bayes Unfolding of Elementary Particle Spectra at the Large Hadron Collider

Empirical Bayes Unfolding of Elementary Particle Spectra at the Large Hadron Collider
Mikael Kuusela, Institute of Mathematics, EPFL
Statistics Seminar, University of Bristol, June 13, 2014
Joint work with Victor M. Panaretos

CERN and the Large Hadron Collider

CMS Experiment at the LHC

Statistics at CERN
- Hypothesis testing / interval estimation with a large number of nuisance parameters: Higgs boson, supersymmetry, beyond-Standard-Model physics, ...
- Nonparametric multiple regression: energy response calibration
- Statistical inverse problems: unfolding
- Classification: improving the S/B ratio, particle identification, triggering
- Pattern recognition: particle tracking

The Unfolding Problem
- Any measurement carried out at the LHC is affected by the finite resolution of the particle detectors
- This causes the observed spectrum of events to be smeared, or blurred, with respect to the true one
- The unfolding problem is to estimate the true spectrum using the smeared observations
- Mathematically closely related to deblurring in optics and to tomographic image reconstruction in medical imaging
[Figure: (a) true intensity and (b) smeared intensity as functions of the physical observable; folding maps (a) to (b), unfolding goes from (b) back to (a)]

Unfolding is an Ill-Posed Inverse Problem
- The main issue in unfolding is the ill-posedness of the mapping from the true spectrum to the smeared spectrum
- The (pseudo)inverse of this mapping is very sensitive to small perturbations of the data
- We need to regularize the problem by introducing additional information about plausible solutions
- Current state of the art:
  1. EM iteration with early stopping
  2. Generalized Tikhonov regularization
- Two major challenges:
  1. How to choose the regularization strength?
  2. How to quantify the uncertainty of the solution?
- In this talk, we propose an empirical Bayes unfolding framework for tackling these issues

Problem Formulation Using Poisson Point Processes (1)
- The appropriate mathematical model for unfolding is that of indirectly observed Poisson point processes
- A random measure M is a Poisson point process with intensity function f and state space E iff
  1. M(B) ~ Poisson(λ(B)), where λ(B) = ∫_B f(s) ds, for every Borel set B ⊂ E
  2. M(B_1), ..., M(B_n) are independent random variables for disjoint Borel sets B_1, ..., B_n ⊂ E
- The intensity function f uniquely characterizes the law of M, i.e., all the information about the behavior of M is contained in f
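
As a small illustration of this model (not part of the talk; the function names and the grid-based sampling shortcut are illustrative), the following sketch draws one realization of a Poisson point process on E = [-7, 7]: the total count is Poisson with mean λ(E), and given the count the points are i.i.d. with density f/λ(E).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def intensity(s, lam_tot=1000.0, E_len=14.0):
    """An example intensity f(s) on E = [-7, 7]: two Gaussian bumps plus a flat background."""
    return lam_tot * (0.2 * norm.pdf(s, -2, 1) + 0.5 * norm.pdf(s, 2, 1) + 0.3 / E_len)

def sample_ppp(intensity_fn, lo=-7.0, hi=7.0, n_grid=5000):
    """Draw one realization of a Poisson point process with the given intensity (grid approximation)."""
    s = np.linspace(lo, hi, n_grid)
    f = intensity_fn(s)
    lam_E = f.sum() * (s[1] - s[0])               # lambda(E) = int_E f(s) ds (Riemann sum)
    n = rng.poisson(lam_E)                        # M(E) ~ Poisson(lambda(E))
    return rng.choice(s, size=n, p=f / f.sum())   # points i.i.d. with density f / lambda(E)

events = sample_ppp(intensity)
print(f"{len(events)} true events drawn")
```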

Problem Formulation Using Poisson Point Processes (2)
- Let M and N be two Poisson point processes with intensities f and g and state spaces E and F, respectively
- Assume that M represents the true, particle-level events and N the smeared, detector-level events
- Then
  g(t) = (Kf)(t) = ∫_E k(t, s) f(s) ds,
  where the smearing kernel k represents the response of the detector and is given by
  k(t, s) = p(Y_i = t | X_i = s, ith event observed) P(ith event observed | X_i = s),
  where X_i is the ith true event and Y_i the corresponding smeared event
- Task: estimate f given a single realization of the process N

Empirical Bayes Unfolding
We propose to estimate f based on the following key principles:
1. Discretization of the true intensity f using a cubic B-spline basis expansion, that is, f(s) = Σ_{j=1}^p β_j B_j(s), where B_j, j = 1, ..., p, are the B-spline basis functions
2. Posterior mean estimation of the B-spline coefficients β = [β_1, ..., β_p]^T
3. Empirical Bayes selection of the scale δ of the regularizing smoothness prior p(β | δ)
4. Frequentist uncertainty quantification and bias correction using the parametric bootstrap

Discretization of the Problem
- Let {F_i}_{i=1}^n be a partition of the smeared space F into n intervals
- Let y_i = N(F_i) be the number of points observed in interval F_i, i.e., we record the observations into a histogram y = [y_1, ..., y_n]^T
- Then
  E(y_i | β) = ∫_{F_i} g(t) dt = ∫_{F_i} ∫_E k(t, s) f(s) ds dt = Σ_{j=1}^p ( ∫_{F_i} ∫_E k(t, s) B_j(s) ds dt ) β_j =: Σ_{j=1}^p K_{i,j} β_j
- Hence, we need to solve the Poisson regression problem
  y | β ~ Poisson(Kβ)
  for an ill-conditioned matrix K
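
A minimal sketch of how the matrix K with entries K_{i,j} = ∫_{F_i} ∫_E k(t, s) B_j(s) ds dt could be assembled. Assumptions (not from the slide): a Gaussian smearing kernel N(t − s; 0, 1) as in the later demonstration, clamped cubic B-splines on E = F = [-7, 7], and naive quadrature in s; all names are illustrative.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.stats import norm

def cubic_bspline_basis(lo, hi, p, degree=3):
    """Return a list of p clamped cubic B-spline basis functions on [lo, hi]."""
    n_interior = p - degree - 1
    interior = np.linspace(lo, hi, n_interior + 2)[1:-1]
    knots = np.concatenate([[lo] * (degree + 1), interior, [hi] * (degree + 1)])
    return [BSpline.basis_element(knots[j:j + degree + 2], extrapolate=False)
            for j in range(p)]

def smearing_matrix(basis, bin_edges, lo, hi, sigma=1.0, n_quad=400):
    """K[i, j] = int_{F_i} int_E N(t - s; 0, sigma^2) B_j(s) ds dt, by naive quadrature in s."""
    s = np.linspace(lo, hi, n_quad)
    ds = s[1] - s[0]
    B = np.array([np.nan_to_num(Bj(s)) for Bj in basis])          # (p, n_quad)
    K = np.zeros((len(bin_edges) - 1, len(basis)))
    for i, (a, b) in enumerate(zip(bin_edges[:-1], bin_edges[1:])):
        # int_{F_i} N(t - s; 0, sigma^2) dt = Phi((b - s)/sigma) - Phi((a - s)/sigma)
        w = norm.cdf((b - s) / sigma) - norm.cdf((a - s) / sigma)
        K[i] = (B * w).sum(axis=1) * ds
    return K

basis = cubic_bspline_basis(-7, 7, p=30)
edges = np.linspace(-7, 7, 41)                                    # n = 40 smeared bins
K = smearing_matrix(basis, edges, -7, 7)
print("condition number of K:", np.linalg.cond(K))                # very large: ill-posed
```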

Bayesian Estimation of the Spline Coefficients
- Posterior for β:
  p(β | y, δ) = p(y | β) p(β | δ) / p(y | δ), β ∈ R^p_+,
  where the likelihood is given by
  p(y | β) = Π_{i=1}^n ( Σ_{j=1}^p K_{i,j} β_j )^{y_i} e^{−Σ_{j=1}^p K_{i,j} β_j} / y_i!, β ∈ R^p_+
- We regularize the problem using the Gaussian smoothness prior
  p(β | δ) ∝ exp( −δ ‖f''‖_2^2 ) = exp( −δ β^T Ω β ), β ∈ R^p_+,
  with δ > 0 and Ω_{i,j} = ∫_E B_i''(s) B_j''(s) ds
- This becomes a proper pdf once we impose Aristotelian boundary conditions
- We use a single-component Metropolis–Hastings algorithm to sample from the posterior
- The univariate proposal densities are chosen to approximate the full conditionals p(β_k | β_{−k}, y, δ) of the Gibbs sampler, as proposed by Saquib et al. (1998)
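
To make the target of the sampler concrete, here is a minimal sketch (illustrative names; the roughness matrix is computed by naive quadrature from the B-splines' second derivatives) of the unnormalized log-posterior log p(β | y, δ):

```python
import numpy as np

def roughness_matrix(basis, lo, hi, n_quad=4000):
    """Omega[i, j] = int_E B_i''(s) B_j''(s) ds, by quadrature over the second derivatives."""
    s = np.linspace(lo, hi, n_quad)
    ds = s[1] - s[0]
    D2 = np.array([np.nan_to_num(Bj.derivative(2)(s)) for Bj in basis])
    return (D2 @ D2.T) * ds

def log_posterior(beta, y, K, Omega, delta):
    """Unnormalized log p(beta | y, delta): Poisson log-likelihood plus Gaussian smoothness prior."""
    if np.any(beta < 0):
        return -np.inf                          # support restricted to the nonnegative orthant
    mu = K @ beta                               # Poisson means mu_i = sum_j K_ij beta_j
    loglik = np.sum(y * np.log(mu) - mu)        # up to the constant -sum_i log(y_i!)
    logprior = -delta * beta @ Omega @ beta     # p(beta | delta) prop. to exp(-delta beta' Omega beta)
    return loglik + logprior
```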

Empirical Bayes Estimation of the Hyperparameter
- We propose choosing the hyperparameter δ (i.e., the regularization parameter) via marginal maximum likelihood:
  δ̂ = δ̂(y) = arg max_{δ>0} p(y | δ) = arg max_{δ>0} ∫_{R^p_+} p(y | β) p(β | δ) dβ
- The marginal maximum likelihood estimate δ̂ is found using a Monte Carlo expectation-maximization algorithm (Geman and McClure, 1985, 1987; Saquib et al., 1998):
  - E-step: sample β^(1), ..., β^(S) from the posterior p(β | y, δ^(t)) and compute Q(δ; δ^(t)) = (1/S) Σ_{s=1}^S log p(β^(s) | δ)
  - M-step: set δ^(t+1) = arg max_{δ>0} Q(δ; δ^(t))
- The spline coefficients β are then estimated using the mean of the empirical Bayes posterior: β̂ = E(β | y, δ̂)
- The estimated intensity is f̂(s) = Σ_{j=1}^p β̂_j B_j(s)
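
A compact sketch of the Monte Carlo EM loop for δ̂. It assumes a user-supplied sample_posterior(y, K, Omega, delta, S) returning S posterior draws of β (e.g. from a Metropolis–Hastings sampler), and it treats the prior as an untruncated Gaussian with full-rank Ω, so that the M-step has the closed form δ = p / (2 · mean(β^T Ω β)); both the helper name and this closed form are assumptions of the sketch, not the talk's exact implementation.

```python
import numpy as np

def mcem(y, K, Omega, sample_posterior, delta0=1e-5, n_iter=30, S=1000):
    """Monte Carlo EM for the marginal maximum likelihood estimate of delta (sketch)."""
    p = K.shape[1]
    delta = delta0
    for _ in range(n_iter):
        # E-step: sample from the posterior at the current delta and form Q(. ; delta_t).
        betas = sample_posterior(y, K, Omega, delta, S)           # shape (S, p)
        mean_quad = np.mean(np.einsum('si,ij,sj->s', betas, Omega, betas))
        # M-step: for p(beta | delta) prop. to delta^{p/2} exp(-delta beta' Omega beta)
        # (ignoring the truncation to beta >= 0), Q is maximized at delta = p / (2 E[beta' Omega beta]).
        delta = p / (2.0 * mean_quad)
    return delta
```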

What Does Empirical Bayes Do?
δ̂ = arg max_{δ>0} p(y | δ) = arg max_{δ>0} ∫ p(y | θ) p(θ | δ) dθ
[Figure: the prior p(θ | δ) and the likelihood p(y | θ) as functions of θ, illustrating how the marginal likelihood ∫ p(y | θ) p(θ | δ) dθ weighs their overlap]

Empirical Bayes vs. Hierarchical Bayes
- Hierarchical Bayes is a natural alternative to empirical Bayes
- But one then needs to choose the hyperprior p(δ):
  - It is a priori unclear how this should be done
  - Different choices can result in non-negligible differences in the posterior
  - The choice is not necessarily invariant under reparametrizations
- Empirical Bayes, on the other hand:
  - Chooses a unique, best regularizer among the family of priors {p(β | δ)}_{δ>0}
  - Requires only the choice of the family {p(β | δ)}_{δ>0}
  - Is by construction transformation invariant
- Empirical Bayes has become part of the standard methodology in generalized additive models (Wood, 2011) and Gaussian processes (Rasmussen and Williams, 2006)
- What about inverse problems?

Uncertainty Quantification and Bias Correction (1)
- The credible intervals of the empirical Bayes posterior p(β | y, δ̂) could in principle be used to make confidence statements about f
- But due to the data-driven choice of the prior, these intervals lose their subjective Bayesian interpretation; furthermore, their frequentist properties are poorly understood
- Instead, we propose using the parametric bootstrap to construct frequentist confidence bands for f (see the sketch below):
  1. Obtain a resampled observation y*
  2. Rerun the MCEM algorithm with y* to find δ̂* = δ̂(y*)
  3. Compute β̂* = E(β | y*, δ̂*)
  4. Obtain f̂*(s) = Σ_{j=1}^p β̂*_j B_j(s)
  5. Repeat R times
- The bootstrap sample {f̂*(r)}_{r=1}^R is then used to compute approximate frequentist confidence intervals for f(s) for each s ∈ E
- This procedure also takes into account the uncertainty regarding the choice of the hyperparameter δ
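
A sketch of this bootstrap loop; estimate_unfolded(y) is an assumed wrapper that reruns the full MCEM plus posterior-mean pipeline and returns the fitted coefficients for a given histogram, and is not a function defined in the talk.

```python
import numpy as np

rng = np.random.default_rng(1)

def parametric_bootstrap(y, K, estimate_unfolded, R=200):
    """Parametric bootstrap for empirical Bayes unfolding (sketch)."""
    beta_hat = estimate_unfolded(y)                # point estimate from the observed histogram
    mu_hat = K @ beta_hat
    boot = []
    for _ in range(R):
        y_star = rng.poisson(mu_hat)               # resample y* ~ Poisson(K beta_hat)
        boot.append(estimate_unfolded(y_star))     # rerun MCEM + posterior mean on y*
    return beta_hat, np.array(boot)                # boot has shape (R, p)
```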

Uncertainty Quantification and Bias Correction (2)
- One can envisage various ways of obtaining the resampled observations y* and of using the bootstrap sample {f̂*(r)}_{r=1}^R to compute approximate frequentist confidence bands
- We propose using:
  - Resampling: y* ~ Poisson(K β̂) with independent components, where β̂ = E(β | y, δ̂)
  - Intervals: pointwise 1 − 2α basic bootstrap intervals, given by [2f̂(s) − f̂*_{1−α}(s), 2f̂(s) − f̂*_α(s)], where f̂*_α(s) denotes the α-quantile of the bootstrap sample evaluated at the point s ∈ E
- The bootstrap may also be used to correct for the unavoidable bias in the point estimate f̂:
  - Bootstrap estimate of the bias: bias*(f̂(s)) = (1/R) Σ_{r=1}^R f̂*(r)(s) − f̂(s)
  - Bias-corrected point estimate: f̂_BC(s) = f̂(s) − bias*(f̂(s))
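
Given the bootstrap sample of unfolded curves, the pointwise basic intervals and the bias-corrected estimate follow directly; a minimal sketch, with the curves evaluated on a grid of s values and all names illustrative:

```python
import numpy as np

def basic_intervals_and_bias_correction(f_hat, f_boot, alpha=0.025):
    """Pointwise 1 - 2*alpha basic bootstrap intervals and bias-corrected estimate.

    f_hat  : (n_grid,)    unfolded intensity evaluated on a grid of s values
    f_boot : (R, n_grid)  bootstrap replicates evaluated on the same grid
    """
    q_lo = np.quantile(f_boot, alpha, axis=0)          # alpha-quantile of the bootstrap sample
    q_hi = np.quantile(f_boot, 1 - alpha, axis=0)      # (1 - alpha)-quantile
    lower = 2 * f_hat - q_hi                           # [2 f_hat - f*_{1-alpha}, 2 f_hat - f*_alpha]
    upper = 2 * f_hat - q_lo
    bias = f_boot.mean(axis=0) - f_hat                 # bootstrap estimate of the bias
    f_bc = f_hat - bias                                # bias-corrected point estimate
    return lower, upper, f_bc
```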

Demonstration: Setup
- True intensity
  f(s) = λ_tot { π_1 N(s | −2, 1) + π_2 N(s | 2, 1) + π_3 / |E| },
  with π_1 = 0.2, π_2 = 0.5 and π_3 = 0.3
- Smeared intensity
  g(t) = ∫_E N(t − s | 0, 1) f(s) ds
- E = F = [−7, 7], discretized using n = 40 histogram bins and p = 30 B-spline basis functions
- The condition number of the smearing matrix K is 2.6 × 10^8, so the problem is severely ill-posed!
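
A sketch of this simulation setup (the midpoint approximation of the bin integrals and all names are illustrative shortcuts, not the talk's exact implementation):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
lam_tot, E_len = 20000.0, 14.0                      # E = F = [-7, 7]

def f_true(s):
    """True intensity: lam_tot * (0.2 N(-2, 1) + 0.5 N(2, 1) + 0.3 uniform on E)."""
    return lam_tot * (0.2 * norm.pdf(s, -2, 1) + 0.5 * norm.pdf(s, 2, 1) + 0.3 / E_len)

def g_smeared(t, n_quad=2000):
    """Smeared intensity g(t) = int_E N(t - s; 0, 1) f(s) ds, by simple quadrature."""
    s = np.linspace(-7, 7, n_quad)
    ds = s[1] - s[0]
    return (norm.pdf(t[:, None] - s[None, :]) * f_true(s)[None, :]).sum(axis=1) * ds

edges = np.linspace(-7, 7, 41)                      # n = 40 smeared histogram bins
centers = 0.5 * (edges[:-1] + edges[1:])
mu = g_smeared(centers) * np.diff(edges)            # midpoint approximation of int_{F_i} g(t) dt
y = rng.poisson(mu)                                 # observed smeared histogram
```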

Demonstration: Empirical Bayes Unfolding, λ_tot = 20 000
[Figure: Empirical Bayes unfolding with λ_tot = 20 000, showing the unfolded intensity, the true intensity and the smeared intensity, with 95 % pointwise basic intervals]

Demonstration: Empirical Bayes Unfolding, λ_tot = 1 000
[Figure: Empirical Bayes unfolding with λ_tot = 1 000, showing the unfolded intensity with and without bias correction, the true intensity and the smeared intensity, with 95 % pointwise basic intervals]

Demonstration: Hierarchical Bayes Unfolding, λ_tot = 1 000
[Figure: Hierarchical Bayes unfolding with λ_tot = 1 000 and hyperprior δ ~ Gamma(1, 0.05), showing the unfolded intensity, the true intensity and the smeared intensity, with 95 % credible intervals]

Demonstration: Hierarchical Bayes Unfolding, λ_tot = 1 000
[Figure: Hierarchical Bayes unfolding with λ_tot = 1 000 and hyperprior δ ~ Gamma(0.001, 0.001), showing the unfolded intensity, the true intensity and the smeared intensity, with 95 % credible intervals]

Z → e⁺e⁻: Setup
- We demonstrate empirical Bayes unfolding with real data by unfolding the Z → e⁺e⁻ invariant mass spectrum measured in CMS
- The data are published in Chatrchyan et al. (2013) and correspond to an integrated luminosity of 4.98 fb⁻¹ collected in 2011 at √s = 7 TeV
- 67 778 high-quality electron-positron pairs with invariant masses between 65 and 115 GeV, in 0.5 GeV bins
- Response: convolution with the Crystal Ball function
  CB(m' − m; σ², α, γ) =
    C e^{−(m' − m)² / (2σ²)},                                    if (m' − m)/σ > −α,
    C (γ/α)^γ e^{−α²/2} ( γ/α − α − (m' − m)/σ )^{−γ},           if (m' − m)/σ ≤ −α
- The CB parameters are estimated by maximum likelihood using 30 % of the data, assuming that the true intensity is the non-relativistic Breit–Wigner with PDG values for the Z mass and width
- Only the remaining 70 % of the data are used for unfolding
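
For concreteness, a sketch of the Crystal Ball density in the standard parametrization reconstructed above (unnormalized; the overall constant C and any conventions specific to the talk are assumptions). scipy.stats.crystalball provides a normalized version with the same shape parameters (beta playing the role of α and m that of γ).

```python
import numpy as np

def crystal_ball(dm, sigma, alpha, gamma):
    """Unnormalized Crystal Ball density in dm = m' - m: Gaussian core with a power-law low-side tail."""
    z = np.asarray(dm, dtype=float) / sigma
    out = np.exp(-0.5 * z**2)                               # Gaussian core for (m' - m)/sigma > -alpha
    A = (gamma / alpha) ** gamma * np.exp(-0.5 * alpha**2)  # matches the core in value and slope at -alpha
    B = gamma / alpha - alpha
    tail = z <= -alpha
    out[tail] = A * (B - z[tail]) ** (-gamma)               # power-law tail, continuous at -alpha
    return out
```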

Z → e⁺e⁻: Event Display

Z → e⁺e⁻: Empirical Bayes Unfolding
[Figure: Empirical Bayes unfolding of the Z boson invariant mass spectrum (intensity vs. invariant mass in GeV), showing the unfolded intensity with bias correction, the true intensity and the smeared data, with 95 % pointwise basic intervals]

Conclusions
- We have introduced an empirical Bayes unfolding framework which enables a principled choice of the regularization parameter and frequentist uncertainty quantification
- Our studies are motivated by a real-world data analysis problem at CERN; we work in direct collaboration with CERN physicists to improve the unfolding techniques used in LHC data analysis
- Our method provides reasonable estimates in very challenging unfolding scenarios
- Uncertainty quantification in unfolding is hampered by the unavoidable bias introduced by the regularization, but basic bootstrap resampling still provides an encouraging first approximation
- Further details in: Kuusela, M. and Panaretos, V. M. (2014). Empirical Bayes unfolding of elementary particle spectra at the Large Hadron Collider. arXiv:1401.8274 [stat.AP].

References
Chatrchyan, S. et al. (CMS Collaboration) (2013). Energy calibration and resolution of the CMS electromagnetic calorimeter in pp collisions at √s = 7 TeV. Journal of Instrumentation, 8(09):P09009.
Geman, S. and McClure, D. E. (1985). Bayesian image analysis: an application to single photon emission tomography. In Proceedings of the American Statistical Association, Statistical Computing Section, pages 12–18.
Geman, S. and McClure, D. E. (1987). Statistical methods for tomographic image reconstruction. Bulletin of the International Statistical Institute, LII(4):5–21.
Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. MIT Press.
Saquib, S. S., Bouman, C. A., and Sauer, K. (1998). ML parameter estimation for Markov random fields with applications to Bayesian tomography. IEEE Transactions on Image Processing, 7(7):1029–1044.
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73:3–36.

Backup

Intuition on the Basic Bootstrap Intervals
- Inversion of a CDF pivot ("Neyman construction"): find [θ̂_L, θ̂_U] such that
  ∫_{−∞}^{θ̂_0} p(θ̂ | θ = θ̂_U) dθ̂ = α   and   ∫_{−∞}^{θ̂_0} p(θ̂ | θ = θ̂_L) dθ̂ = 1 − α
[Figure: the sampling densities p(θ̂ | θ = θ̂_L), p(θ̂ | θ = θ̂_0) and p(θ̂ | θ = θ̂_U), with the observed value θ̂_0 and the two tail areas α marked]

Intuition on the Basic Bootstrap Intervals
- Basic bootstrap interval: [θ̂_L, θ̂_U] = [2θ̂_0 − θ̂*_{1−α}, 2θ̂_0 − θ̂*_α]
- This amounts to approximating the sampling densities by shifted copies of the bootstrap distribution g:
  p(θ̂ | θ = θ̂_0) ≈ g(θ̂),  p(θ̂ | θ = θ̂_L) ≈ g(θ̂ + θ̂_0 − θ̂_L),  p(θ̂ | θ = θ̂_U) ≈ g(θ̂ + θ̂_0 − θ̂_U)
[Figure: the shifted bootstrap densities, with θ̂_L, θ̂_0, θ̂_U and the two tail areas α marked]

Demonstration: Setup (algorithm parameters)

                                   λ_tot = 1 000    λ_tot = 20 000
  MCEM iterations                  30               20
  δ^(0)                            1 × 10⁻⁵         1 × 10⁻⁵
  MCMC sample size during EM       1 000            500
  MCMC sample size for β̂          1 000            1 000
  R                                200              200
  Running time for f̂              9 min            3 min
  Running time with bootstrap      9 h 56 min       3 h 36 min

Z → e⁺e⁻: Setup
- We unfold the n = 30 bins on the interval F = [82.5 GeV, 97.5 GeV] and use p = 38 B-spline basis functions to reconstruct the true intensity on the interval E = [81.5 GeV, 98.5 GeV]
- Here p > n facilitates the mixing of the MCMC sampler, and E ⊃ F accounts for boundary effects
- Other parameters:

  MCEM iterations                  20
  δ^(0)                            1 × 10⁻⁶
  MCMC sample size during EM       500
  MCMC sample size for β̂          5 000
  R                                200
  Running time for f̂              5 min
  Running time with bootstrap      6 h 13 min

Aristotelian Boundary Conditions (1)
- The prior p(β | δ) ∝ exp(−δ β^T Ω β) with Ω_{i,j} = ∫_E B_i''(s) B_j''(s) ds is potentially improper since Ω has rank p − 2
- If the prior is improper, then the marginal p(y | δ) is also improper and it makes no sense to use empirical Bayes for estimating δ
- The problem can be solved by imposing the so-called Aristotelian boundary conditions
- That is, we condition on the unknown boundary values of f (or, equivalently, on β_1 and β_p) and place additional hyperpriors on these values:
  p(β | δ) = p(β_2, ..., β_{p−1} | β_1, β_p, δ) p(β_1 | δ) p(β_p | δ), β ∈ R^p_+,
  with
  p(β_2, ..., β_{p−1} | β_1, β_p, δ) ∝ exp(−δ β^T Ω β),
  p(β_1 | δ) ∝ exp(−δ γ_L β_1²),
  p(β_p | δ) ∝ exp(−δ γ_R β_p²),
  where γ_L, γ_R > 0 are fixed constants

Aristotelian Boundary Conditions (2)
- As a result, p(β | δ) ∝ exp(−δ β^T Ω_A β), where the elements of Ω_A are given by
  Ω_{A,i,j} = Ω_{i,j} + γ_L,  if i = j = 1,
  Ω_{A,i,j} = Ω_{i,j} + γ_R,  if i = j = p,
  Ω_{A,i,j} = Ω_{i,j},        otherwise
- The augmented matrix Ω_A is positive definite and hence the modified prior is a proper pdf
- The Aristotelian prior has the added benefit that, by controlling γ_L and γ_R, we are able to control the variance of f̂ near the boundaries
- In our numerical experiments we had:
  - Gaussian mixture model data: γ_L = γ_R = 5
  - Z → e⁺e⁻ data: γ_L = γ_R = 70
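
The matrix modification itself is a one-liner; a minimal sketch (the helper name is illustrative, and Omega is assumed to be the roughness matrix from the earlier sketch):

```python
import numpy as np

def aristotelian_omega(Omega, gamma_L, gamma_R):
    """Augment the roughness matrix so that the Aristotelian prior is a proper pdf."""
    Omega_A = Omega.copy()
    Omega_A[0, 0] += gamma_L       # hyperprior on the left boundary coefficient beta_1
    Omega_A[-1, -1] += gamma_R     # hyperprior on the right boundary coefficient beta_p
    # With gamma_L, gamma_R > 0 the augmented matrix is positive definite, so
    # p(beta | delta) prop. to exp(-delta beta' Omega_A beta) is proper.
    return Omega_A

# e.g. Omega_A = aristotelian_omega(Omega, gamma_L=5.0, gamma_R=5.0)   # Gaussian mixture demo values
```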

Demonstration: No Regularization
[Figure: Unfolding of the Gaussian mixture model data (λ_tot = 20 000) without regularization, comparing the posterior mean under a uniform prior and the unfolded least-squares estimate with the smeared least-squares fit, the true intensity and the smeared intensity]

Convergence of MCEM
[Figure: Convergence of the MCEM algorithm for estimating the hyperparameter δ with the Gaussian mixture model data; the iterates of δ are shown for λ_tot = 1 000 and λ_tot = 20 000]

MCMC Diagnostics
[Figure: Convergence and mixing diagnostics for the single-component Metropolis–Hastings sampler for the variables β_5 and β_21 with the Gaussian mixture model data with λ_tot = 20 000: from left to right, the trace plots, histograms, estimated autocorrelation functions and cumulative means of the samples]

Convergence of Empirical Bayes Unfolding
[Figure: Convergence of the mean integrated squared error (MISE) with the Gaussian mixture model data as the expected sample size λ_tot grows; the error bars indicate approximate 95 % confidence intervals]

Monte Carlo EM Algorithm for Finding the MMLE
The Monte Carlo EM algorithm (Geman and McClure, 1985, 1987; Saquib et al., 1998) for finding the marginal maximum likelihood estimate δ̂:
1. Pick some initial guess δ^(0) > 0 and set t = 0
2. E-step:
   a. Sample β^(1), β^(2), ..., β^(S) from the posterior p(β | y, δ^(t))
   b. Compute Q(δ; δ^(t)) = (1/S) Σ_{s=1}^S log p(β^(s) | δ)
3. M-step: set δ^(t+1) = arg max_{δ>0} Q(δ; δ^(t))
4. Set t ← t + 1
5. If some stopping rule is satisfied, set δ̂ = δ^(t) and terminate the iteration; else go to step 2

B-Spline Basis Functions
[Figure: the cubic B-spline basis functions B_j used in the expansion of f]

Details of the MCMC Implementation (1)
- We use the single-component Metropolis–Hastings sampler of Saquib et al. (1998)
- The kth full conditional satisfies
  log p(β_k* | β_{−k}, y, δ) = Σ_{i=1}^n y_i log( Σ_{j=1}^p K_{i,j} β_j ) − Σ_{i=1}^n Σ_{j=1}^p K_{i,j} β_j − δ Σ_{i=1}^p Σ_{j=1}^p Ω_{i,j} β_i β_j + const =: f(β_k*, β_{−k})
- Taking a 2nd-order Taylor expansion of the log-term around the current position β_k of the Markov chain, we find
  f(β_k*, β_{−k}) ≈ d_{1,k} (β_k* − β_k) + (d_{2,k}/2) (β_k* − β_k)² − δ ( Ω_{k,k} (β_k*)² + 2 Σ_{i≠k} Ω_{i,k} β_i β_k* ) + const =: g(β_k*, β),
  where, with μ = Kβ,
  d_{1,k} = Σ_{i=1}^n K_{i,k} ( y_i / μ_i − 1 ),   d_{2,k} = −Σ_{i=1}^n K_{i,k}² y_i / μ_i²

Details of the MCMC Implementation (2)
- As a function of β_k*, the approximate full conditional
  g(β_k*, β) = d_{1,k} (β_k* − β_k) + (d_{2,k}/2) (β_k* − β_k)² − δ ( Ω_{k,k} (β_k*)² + 2 Σ_{i≠k} Ω_{i,k} β_i β_k* ) + const
  is a Gaussian with mean
  m_k = ( d_{1,k} − d_{2,k} β_k − 2δ Σ_{i≠k} Ω_{i,k} β_i ) / ( 2δ Ω_{k,k} − d_{2,k} )
  and variance
  σ_k² = 1 / ( 2δ Ω_{k,k} − d_{2,k} )

Details of the MCMC Implementation (3)
- If m_k ≥ 0, the proposal β_k* is sampled from N(m_k, σ_k²) truncated to [0, ∞)
- If m_k < 0, the proposal β_k* is sampled from Exp(λ), with the rate chosen to match the slope of the approximation at zero,
  ∂/∂β_k* log p(β_k* | β) |_{β_k*=0} = ∂/∂β_k* g(β_k*, β) |_{β_k*=0},
  giving λ = −d_{1,k} + d_{2,k} β_k + 2δ Σ_{i≠k} Ω_{i,k} β_i
- Denote p(β_k* | β) =: q(β_k*, β_k, β_{−k}) and p(β | y, δ) =: h(β_k, β_{−k})
- The acceptance probability for the kth component of the single-component Metropolis–Hastings algorithm is given by
  a(β_k*, β) = min{ 1, [ h(β_k*, β_{−k}) q(β_k, β_k*, β_{−k}) ] / [ h(β_k, β_{−k}) q(β_k*, β_k, β_{−k}) ] }
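
A condensed sketch of one sweep of this sampler, following the construction on the last three slides. It is illustrative and unoptimized, the quantities are computed from the formulas as reconstructed above (sign conventions should be checked against the paper), and the acceptance ratio corrects for the proposal approximation in the usual Metropolis–Hastings way.

```python
import numpy as np
from scipy.stats import truncnorm, expon

rng = np.random.default_rng(4)

def log_target(beta, y, K, Omega, delta):
    """Unnormalized log posterior h(beta): Poisson log-likelihood plus Gaussian smoothness prior."""
    if np.any(beta < 0):
        return -np.inf
    mu = K @ beta
    return np.sum(y * np.log(mu) - mu) - delta * beta @ Omega @ beta

def proposal_for(k, beta, y, K, Omega, delta):
    """Frozen proposal distribution for component k, built around the current beta."""
    mu = K @ beta
    d1 = np.sum(K[:, k] * (y / mu - 1.0))          # 1st derivative of the log-term
    d2 = -np.sum(K[:, k] ** 2 * y / mu ** 2)       # 2nd derivative of the log-term
    off = Omega[k] @ beta - Omega[k, k] * beta[k]  # sum_{i != k} Omega_{i,k} beta_i
    denom = 2.0 * delta * Omega[k, k] - d2
    m_k = (d1 - d2 * beta[k] - 2.0 * delta * off) / denom
    if m_k >= 0:                                   # N(m_k, sigma_k^2) truncated to [0, inf)
        s_k = np.sqrt(1.0 / denom)
        return truncnorm(-m_k / s_k, np.inf, loc=m_k, scale=s_k)
    return expon(scale=1.0 / (denom * (-m_k)))     # Exp(lambda), lambda = -slope of g at 0

def mh_sweep(beta, y, K, Omega, delta):
    """One single-component Metropolis-Hastings sweep over all coefficients (sketch)."""
    beta = beta.copy()
    for k in range(len(beta)):
        q_fwd = proposal_for(k, beta, y, K, Omega, delta)
        beta_star = beta.copy()
        beta_star[k] = q_fwd.rvs(random_state=rng)
        q_rev = proposal_for(k, beta_star, y, K, Omega, delta)
        log_a = (log_target(beta_star, y, K, Omega, delta) + q_rev.logpdf(beta[k])
                 - log_target(beta, y, K, Omega, delta) - q_fwd.logpdf(beta_star[k]))
        if np.log(rng.uniform()) < log_a:          # accept with probability min(1, ratio)
            beta = beta_star
    return beta
```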

The Expectation-Maximization Algorithm
- The EM algorithm is an iterative method for finding the maximum of the likelihood L(θ; y) = p(y | θ)
- It applies in cases where the data y can be seen as an incomplete version of some complete data x (that is, y = g(x)) with complete-data likelihood L(θ; x) = p(x | θ)
- The EM iteration:
  1. Pick some initial guess θ^(0) and set t = 0
  2. E-step: compute Q(θ; θ^(t)) = E( log p(x | θ) | y, θ^(t) )
  3. M-step: set θ^(t+1) = arg max_θ Q(θ; θ^(t))
  4. Set t ← t + 1
  5. If some stopping rule is satisfied, set θ̂_MLE = θ^(t) and terminate the iteration; else go to step 2
- The EM iteration never decreases the incomplete-data likelihood, that is, L(θ^(t+1); y) ≥ L(θ^(t); y) for all t = 0, 1, 2, ...
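
The unfolding application uses the Monte Carlo variant described earlier, but as a self-contained illustration of the plain EM iteration (not from the talk), here is the textbook EM for a two-component Gaussian mixture, where the complete data x = (y, z) augment each observation with its unknown component label z.

```python
import numpy as np
from scipy.stats import norm

def em_gaussian_mixture(y, n_iter=100):
    """EM for a two-component Gaussian mixture with unknown weight, means and variances."""
    pi, mu, sd = 0.5, np.array([y.min(), y.max()]), np.array([y.std(), y.std()])
    for _ in range(n_iter):
        # E-step: posterior probability that each observation belongs to component 1.
        w1 = pi * norm.pdf(y, mu[0], sd[0])
        w2 = (1 - pi) * norm.pdf(y, mu[1], sd[1])
        r = w1 / (w1 + w2)
        # M-step: maximize the expected complete-data log-likelihood Q(theta; theta_t).
        pi = r.mean()
        mu = np.array([np.average(y, weights=r), np.average(y, weights=1 - r)])
        sd = np.sqrt([np.average((y - mu[0]) ** 2, weights=r),
                      np.average((y - mu[1]) ** 2, weights=1 - r)])
    return pi, mu, sd

# Example: across these iterations the incomplete-data likelihood never decreases.
rng = np.random.default_rng(5)
y = np.concatenate([rng.normal(-2, 1, 300), rng.normal(2, 1, 700)])
print(em_gaussian_mixture(y))
```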