Model Selection in Amplitude Analysis

Slide 1: Model Selection in Amplitude Analysis. Mike Williams, Department of Physics & Laboratory for Nuclear Science, Massachusetts Institute of Technology. PWA/ATHOS, April 2015.

Slide 2: Overview. This talk summarizes a paper I worked on with B. Guegan, J. Hardin and J. Stevens; it should appear on the arXiv next week. Everybody knows that one must truncate the set of waves considered; this is true of regression analyses performed in all fields of study. How do you do this? In physics the stepwise approach is by far the most common; however, general studies by statisticians show that these approaches tend to produce overly simplified models... is there a better way? Beyond this, if you want to discover a contribution, it's desirable to quote a statistical significance. Is this possible? Follow-up to: J. Stevens & MW [5.78]; MW [6.9].

Paper: "Model selection for amplitude analysis", Baptiste Guegan, John Hardin, Justin Stevens and Mike Williams (Massachusetts Institute of Technology, Cambridge, MA, United States). Abstract: Model complexity in amplitude analyses is often a priori under-constrained, since the underlying theory permits a large number of amplitudes to contribute to most physical processes. The use of an overly complex model results in reduced predictive power and worse resolution on unknown parameters of interest. Therefore, it is common to arbitrarily reduce the complexity by removing from consideration some subset of the allowed amplitudes. This paper studies a data-driven method for limiting model complexity through regularization during regression in the context of a multivariate (Dalitz-plot) analysis. The regularization technique applied greatly improves the performance. A method is also proposed for obtaining the significance of a resonance in a multivariate amplitude analysis.

Slide 3: Toy Model. Consider a toy model X → abc in bins of m(abc), and start with a simple 6-wave true model. The final state is studied in bins of m_abc, with the fits performed in each bin being independent. The particles that decay into the abc final state are generically labeled as X. The decay amplitudes are constructed in the isobar formalism, with the observed intensity written as the coherent sum of resonant terms plus a nonresonant term added incoherently:

M(x) = Σ_M | Σ_i a_i e^{iφ_i} A_i^M(x) |² + a_nr²,

where x = (m²_ab, m²_ac) represents the position in the Dalitz plot, a e^{iφ} describes the unknown complex factor for each resonant component of the model, and the outer sum runs over the spin states M of the various X particles. A table in the paper lists the properties and decay channels of the resonances contained in the true p.d.f. The resonant terms contain contributions from Blatt-Weisskopf barrier factors, relativistic Breit-Wigner line shapes to describe the propagators, and spin factors obtained using the Zemach formalism. The amplitudes are evaluated using the qft++ package.

[Figure: the X → abc mass distribution for one simulated data set, overlaid with the contributions from the individual resonant amplitudes, plus the corresponding Dalitz-plot (m²_ab vs m²_ac) distributions in several m_abc bins.]
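To make the formula concrete, here is a minimal numpy sketch of evaluating this intensity. It is an illustration only, not the qft++ implementation used in the paper; the amplitude functions passed in are hypothetical placeholders.

```python
import numpy as np

def intensity(x, a, phi, a_nr, amplitude_fns):
    """M(x) = sum_M | sum_i a_i e^{i phi_i} A_i^M(x) |^2 + a_nr^2.

    x             : (n_points, 2) array of Dalitz-plot positions (m2_ab, m2_ac)
    a, phi        : magnitudes and phases of the unknown complex factors
    amplitude_fns : amplitude_fns[M][i] maps x -> complex A_i^M(x); stand-ins
                    for the Breit-Wigner * barrier-factor * spin-factor terms
    """
    total = np.zeros(len(x))
    for fns_M in amplitude_fns:                   # incoherent sum over spin states M
        coherent = np.zeros(len(x), dtype=complex)
        for a_i, phi_i, A in zip(a, phi, fns_M):  # coherent sum over waves i
            coherent += a_i * np.exp(1j * phi_i) * A(x)
        total += np.abs(coherent) ** 2
    return total + a_nr ** 2                      # nonresonant term, added incoherently
```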

Slide 4: True PDF. If we fit the data using only the waves in the true model PDF, things look good.

[Figure: results of the extended maximum likelihood fit with the true p.d.f.: intensities and phase differences of the [S], [P] and [D] waves vs m_abc.]

Slide 5: Alternative PDF. Fitting with an additional 5 extraneous waves introduces a lot of noise...

[Figure: results of the extended maximum likelihood fit with the full p.d.f., including the additional extraneous waves.]

Slide 6: Alternative PDF (continued). Fitting with the additional extraneous waves introduces a lot of noise...

[Figure: cumulative fraction of extraneous waves with fit fraction greater than the y-axis value vs m_abc. The red line shows 1/√N, where N is the total number of events in the bin; it is shown just to give a sense of the statistical precision in the bin.]

Slide 7: LASSO. With the extraneous waves included, the performance of the fits has clearly deteriorated: the variance of the estimators of the true model parameters has in many places greatly increased, and the fits incorrectly determine that some of the extraneous amplitudes (those not in the true p.d.f.) describe significant contributions. Intuitively we know that: (a) adding more parameters to the fit will make *this* data set better; (b) adding too much complexity to the model actually reduces predictive power and the resolution on regression coefficients. How do we balance these competitors?

Step 1: Penalize complexity in the likelihood (we follow the LASSO in our approach), augmenting the minimized quantity to

-2 log L + λ [ Σ_{i,M} √( ∫ |a_i e^{iφ_i} A_i^M(x)|² dx ) + √( ∫ a_nr² dx ) ],

where λ is the LASSO regularization parameter. Our choice of LASSO penalty term does not depend on the relative normalization of the A(x) terms. Overly complex models tend to use large interference effects to produce unphysical features in the PDF; we penalize this using Σ √(fit fraction), where the fit fraction for resonant term i is defined as ∫ |a_i e^{iφ_i} A_i^M(x)|² dx / ∫ M(x) dx. The parameter λ balances the fit-quality gain against the complexity added: larger values of λ produce simpler models... but how do we choose λ? N.b., once λ is chosen, performing this fit is no different from a normal fit, and the penalty term is constructed using only quantities that are readily available (simple to code up).
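As a sketch of how simple this is to code up, here is a toy version of the penalized objective; the likelihood and per-wave intensity integrals below are hypothetical stand-ins for a real analysis's own.

```python
import numpy as np
from scipy.optimize import minimize

def nll(params):                      # toy stand-in for -log L of the amplitude fit
    return 0.5 * np.sum((params - 1.0) ** 2)

def wave_integral(i, params):         # toy stand-in for int |a_i e^{i phi_i} A_i(x)|^2 dx
    return params[i] ** 2

def penalized_objective(params, lam):
    # -2 log L + lambda * sum_i sqrt(intensity integral of wave i)
    penalty = sum(np.sqrt(wave_integral(i, params)) for i in range(len(params)))
    return 2.0 * nll(params) + lam * penalty

# Once lambda is chosen, this is just an ordinary fit:
result = minimize(penalized_objective, x0=np.ones(6), args=(2.0,), method="Nelder-Mead")
```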

Slide 8: Information Criteria. We consider the two most famous ICs: AIC = -2 log(L) + 2r and BIC = -2 log(L) + r log(n(events)), where r = n(waves) with fit fraction above a small threshold. The choice of threshold doesn't seem to matter (anything at the sub-percent level gives the same result for us).

[Figure: AIC and BIC vs λ.]

BIC produces models no more complex than AIC's. Generic recommendation: try both in your systematic studies, i.e., consider λ(AIC) ≤ λ ≤ λ(BIC). Specific cases: AIC is better when a false negative would be more misleading than a false positive, and BIC is better when a false positive would be more misleading than a false negative.
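A sketch of the resulting λ scan, assuming a hypothetical fit_at(lam) helper that performs one penalized fit and returns (-2 log L, fit fractions); the toy numbers inside it and the 0.5% threshold are illustrative choices, not the paper's.

```python
import numpy as np

def fit_at(lam):
    """Hypothetical helper: run the LASSO fit at this lambda (toy numbers here)."""
    rng = np.random.default_rng(seed=int(lam * 100))
    return 1000.0 + 5.0 * lam, rng.random(8) / (1.0 + lam)

def select_lambda(lambdas, n_events, threshold=0.005):
    best = {"AIC": (np.inf, None), "BIC": (np.inf, None)}
    for lam in lambdas:
        m2logL, fit_fractions = fit_at(lam)
        r = sum(ff > threshold for ff in fit_fractions)  # waves counted as "in" the model
        for name, ic in (("AIC", m2logL + 2 * r),
                         ("BIC", m2logL + r * np.log(n_events))):
            if ic < best[name][0]:
                best[name] = (ic, lam)
    return best  # BIC tends to pick the larger lambda, i.e. the simpler model

print(select_lambda(np.linspace(0.0, 10.0, 21), n_events=1000))
```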

Slide 9: LASSO Fit. The fit including the 5 extraneous waves with the LASSO is nearly identical to the fit using only the true PDF waves.

[Figure: results of the extended maximum likelihood fit with the full p.d.f., including the additional extraneous waves, with the LASSO penalty applied.]

Slide 10: False Waves. The LASSO does a good job of killing extraneous waves.

[Figure: cumulative fraction of extraneous waves with fit fraction greater than the y-axis value vs m_abc, without (left) and with (right) the LASSO; the 1/√N line, where N is the total number of events in the bin, gives a sense of the statistical precision.]

It is now very unlikely to have *any* extraneous wave with a fit fraction > 1/√N.
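The benchmark in the figure is easy to apply in practice; a trivial sketch (the numbers in the usage comment are made up):

```python
import numpy as np

def flag_extraneous(fit_fractions, n_events):
    """Indices of waves whose fit fraction exceeds the statistical scale 1/sqrt(N)."""
    threshold = 1.0 / np.sqrt(n_events)
    return [i for i, ff in enumerate(fit_fractions) if ff > threshold]

# e.g. flag_extraneous([0.002, 0.08, 0.01], n_events=1000) -> [1]   (threshold ~ 0.032)
```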

Slide 11: Bias? Does adding a penalty term to the likelihood bias the results towards smaller fit fractions?

[Figure: ratios of fitted to true fit fractions for the [S], [P] and [D] waves, for fits with the true model, the traditional approach, and the LASSO.]

All fit fractions for the waves we want to measure are unbiased within the uncertainties, and most are *less* biased than when not using the LASSO.

Slide 12: Bias? We also extract the resonance parameters of interest and find less bias using the LASSO (and better resolution, of course).

[Figure: ratios of fitted to true resonance masses and widths, and the phase differences Δφ, for the traditional and LASSO fits.]

13 Model. The LASSO successfully picks out a simple true model, but the real world is that the true model has a lot of nonzero (but small) contributions and we only care about studying the properties of larger waves. Model.: Add 6 more waves to the true model with maximum fit fractions between..%. We don t really care about these, we just want to still make sure we properly measure the waves with fit fractions > %. fit/true.. fit fractions [S] [S] [P] [D] [S] true model.8 traditional LASSO.6 Mike Williams

14 Model. The LASSO successfully picks out a simple true model, but the real world is that the true model has a lot of nonzero (but small) contributions and we only care about studying the properties of larger waves. Model.: Add 6 more waves to the true model with maximum fit fractions between..%. We don t really care about these, we just want to still make sure we properly measure the waves with fit fractions > %. traditional resonance mass [S] [P] [S] LASSO resonance width [S] [P] [S] fit/true.. fit/true Mike Williams

Slide 15: Significance. The LASSO permits selecting the wave set in a way that does not require human intervention. This means that we can now generate ensembles of data sets, run the procedure on them, and determine the statistical significance of any predefined effect. How to define a resonance candidate will be analysis dependent. As a simple example we chose: at least 6 consecutive m(abc) bins in which the wave has a fit fraction above threshold with more than the required significance in each bin, *and* a difference between the minimum and maximum phase difference (relative to a reference wave) of more than π/2. For all waves that satisfy these criteria, we fit their strength and phase motion and obtain a Δχ² relative to the χ² obtained assuming random noise in the wave.
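Once the ensemble of Δχ² values is in hand, converting an observed Δχ² into a p-value and a Gaussian significance is straightforward; a minimal sketch, assuming each toy data set has already been run through the full LASSO+BIC procedure to extract the best candidate's Δχ²:

```python
import numpy as np
from scipy import stats

def significance_from_ensemble(observed_dchi2, toy_dchi2):
    """p-value and Gaussian significance of an observed Delta-chi^2, given the
    Delta-chi^2 of the best noise-only candidate in each simulated data set."""
    toy_dchi2 = np.asarray(toy_dchi2)
    p = np.mean(toy_dchi2 >= observed_dchi2)   # one-sided p-value from the ensemble
    if p == 0.0:                               # observed value beyond all toys:
        p = 1.0 / (len(toy_dchi2) + 1.0)       # quote a conservative upper bound
    return p, stats.norm.isf(p)                # significance via the Gaussian tail

# e.g. p, z = significance_from_ensemble(25.0, np.random.chisquare(2, size=100000))
```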

Slide 16: Significance (continued). Large waves will be significant no matter what, but it is still interesting to see how much bigger Δχ² is with the LASSO for some waves.

[Table: mean and standard deviation of the measured Γ_X distributions extracted from the Breit-Wigner fits for the [S], [P] and [S] waves, for Models I and II, without the LASSO and with the BIC-LASSO.]

[Figure: Δχ² distributions for the resonance amplitudes in the true p.d.f. — [S] (blue), [P] (cyan), and [S] (green) — (a) without the LASSO procedure (i.e. λ = 0) and (b) with the LASSO procedure, using BIC to select the best λ for each data sample and m_abc mass bin.]

The mean of the Δχ² distribution is significantly larger for all the resonant amplitudes in the true p.d.f. when the LASSO procedure is used. To evaluate the extraneous amplitudes, the paper also shows the cumulative maximum Δχ² from the extraneous amplitudes in the fits without and with the LASSO. Only a small percentage of the data sets fit with the LASSO procedure had any extraneous amplitude that satisfied the criteria for being assigned a nonzero Δχ², and the distribution of the maximum Δχ² of any extraneous wave shows what Δχ² is required for a 5σ claim with(out) the LASSO. Any structure that survives the LASSO is unlikely to be a statistical artifact (that doesn't mean it's a resonance, of course).

Slide 17: Summary. Wave-set selection is an important part of amplitude analysis and (IMO) one that should be treated in an unbiased and rigorous way. The statistical community has been tackling this problem for decades, and many solutions that are widely used (outside of physics) exist that are easy to implement and study. The use of methods that do not require human input is desirable: you get out what you put in, and if what you put in is "known resonances + interesting wave", you're likely to get a nonzero contribution for the interesting wave. What does that mean? The use of the LASSO with the (A,B)IC permits an automated procedure that does not bias results (any more than traditional approaches do), gives resolution close to that of a cheat model, and permits defining a test statistic from which a significance can be obtained. We still need the human to interpret the results!
