Statistical models with uncertain error parameters


Eur. Phys. J. C (2019) 79:133
Regular Article - Experimental Physics

Glen Cowan^a
Physics Department, Royal Holloway, University of London, Egham TW20 0EX, UK

Received: 3 October 2018 / Accepted: 3 February 2019 / Published online: February 2019
© The Author(s) 2019

Abstract In a statistical analysis in Particle Physics, nuisance parameters can be introduced to take into account various types of systematic uncertainties. The best estimate of such a parameter is often modeled as a Gaussian distributed variable with a given standard deviation (the corresponding "systematic error"). Although the assigned systematic errors are usually treated as constants, in general they are themselves uncertain. A type of model is presented where the uncertainty in the assigned systematic errors is taken into account. Estimates of the systematic variances are modeled as gamma distributed random variables. The resulting confidence intervals show interesting and useful properties. For example, when averaging measurements to estimate their mean, the size of the confidence interval increases for decreasing goodness-of-fit, and averages have reduced sensitivity to outliers. The basic properties of the model are presented and several examples relevant for Particle Physics are explored.

1 Introduction

Data analysis in Particle Physics is based on observation of a set of numbers that can be represented by a (vector) random variable, here denoted as $\boldsymbol{y}$. The probability of $\boldsymbol{y}$ (or probability density for continuous variables) can in general be written $P(\boldsymbol{y}|\boldsymbol{\mu}, \boldsymbol{\theta})$, where $\boldsymbol{\mu}$ represents parameters of interest and $\boldsymbol{\theta}$ are nuisance parameters needed for the correctness of the model but not of interest to the analyst. The goal of the analysis is to carry out inference related to the parameters of interest. A procedure for doing this in the framework of frequentist statistics using the profile likelihood function is described in Sect. 2. This involves using control measurements with given standard deviations to provide information on the nuisance parameters.
Here we will take the term "systematic error" to mean the standard deviation of a control measurement itself. The word "error" is used in the sense defined here and not to mean, e.g., the unknown difference between an inferred and true value. The systematic errors defined in this way should also not be confused with the corresponding systematic uncertainty in the estimate of the parameter of interest.

Often the values assigned to the systematic errors are themselves uncertain. This can be incorporated into the model by treating their values as adjustable parameters and their estimates as random variables. A model is proposed in which the estimates of systematic variances are treated as following a gamma distribution, whose mean and width are set by the analyst to reflect the desired nominal value and its relative uncertainty. The confidence intervals that result from this type of model are found to have interesting and useful properties. For example, when averaging measurements to estimate their mean, the size of the confidence interval increases with decreasing goodness-of-fit, and averages have reduced sensitivity to outliers. The basic properties of the model are presented and several types of examples relevant for Particle Physics are explored.

The approach followed here is that of frequentist statistics, as this is widely used in Particle Physics. Models with elements similar to the one proposed have been discussed in the statistics literature, e.g., Refs. [1,2]. Analogous Bayesian procedures have been investigated in Particle Physics [3-5] and found to produce results with qualitatively similar properties.

After reviewing parameter inference using the profile likelihood with known systematic errors in Sect. 2, the model with adjustable error parameters is presented in Sect. 3 and its use in determining confidence intervals is discussed in Sect. 4. In this paper two areas where such a model can be applied are explored: a single Gaussian distributed measurement in Sect. 5 and the method of least squares in Sect. 6. The issue of correlated systematic uncertainties is discussed in Sect. 7 and conclusions are given in Sect. 8.

^a e-mail: g.cowan@rhul.ac.uk

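2 Parameter inference using the profile likelihood and the case of known systematic errors

Inference about a model's parameters is based on the likelihood function $L(\boldsymbol{\mu}, \boldsymbol{\theta}) = P(\boldsymbol{y}|\boldsymbol{\mu}, \boldsymbol{\theta})$. More specifically one can construct a frequentist test of values of the parameters of interest $\boldsymbol{\mu}$ by using the profile likelihood ratio (see, e.g., Ref. [6]),

$$\lambda(\boldsymbol{\mu}) = \frac{L(\boldsymbol{\mu}, \hat{\hat{\boldsymbol{\theta}}})}{L(\hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\theta}})}. \tag{1}$$

Here in the denominator, $\hat{\boldsymbol{\mu}}$ and $\hat{\boldsymbol{\theta}}$ represent the maximum-likelihood (ML) estimators of $\boldsymbol{\mu}$ and $\boldsymbol{\theta}$, and $\hat{\hat{\boldsymbol{\theta}}}$ are the profiled values of $\boldsymbol{\theta}$, i.e., the values of $\boldsymbol{\theta}$ that maximize the likelihood for a given value of $\boldsymbol{\mu}$.

Often the nuisance parameters are introduced to account for a systematic uncertainty in the model. Their presence parameterizes the systematic uncertainty such that for some point in the enlarged parameter space the model should be closer to the truth. Because of correlations between the estimators of the parameters, however, the nuisance parameters result in a decrease in sensitivity to the parameters of interest. To counteract this unwanted effect, one often includes into the set of observed quantities additional measurements that provide information on the nuisance parameters.

A simple and often used form of such control measurements involves treating the best available estimates of the nuisance parameters $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_N)$ as independent Gaussian distributed values $\boldsymbol{u} = (u_1, \ldots, u_N)$ with standard deviations $\boldsymbol{\sigma}_u = (\sigma_{u_1}, \ldots, \sigma_{u_N})$. In this way the full likelihood becomes

$$L(\boldsymbol{\mu}, \boldsymbol{\theta}) = P(\boldsymbol{y}, \boldsymbol{u}|\boldsymbol{\mu}, \boldsymbol{\theta}) = P(\boldsymbol{y}|\boldsymbol{\mu}, \boldsymbol{\theta})\, P(\boldsymbol{u}|\boldsymbol{\theta}) = P(\boldsymbol{y}|\boldsymbol{\mu}, \boldsymbol{\theta}) \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma_{u_i}}\, e^{-(u_i - \theta_i)^2 / 2\sigma_{u_i}^2}, \tag{2}$$

or equivalently the log-likelihood is

$$\ln L(\boldsymbol{\mu}, \boldsymbol{\theta}) = \ln P(\boldsymbol{y}|\boldsymbol{\mu}, \boldsymbol{\theta}) - \frac{1}{2} \sum_{i=1}^{N} \frac{(u_i - \theta_i)^2}{\sigma_{u_i}^2} + C, \tag{3}$$

where $C$ represents terms that do not depend on the adjustable parameters of the problem and therefore can be dropped; in the following such constant terms will usually not be written explicitly. The log-likelihood in Eq. (3) represents one of the most widely used methods for taking account of systematic uncertainties in Particle Physics analyses.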
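As a minimal sketch of Eq. (3), the following Python function adds the quadratic constraint terms to a user-supplied $\ln P(\boldsymbol{y}|\boldsymbol{\mu}, \boldsymbol{\theta})$. The function names and the toy model (a single measurement $y$ with mean $\mu + \theta$) are illustrative assumptions and do not come from the paper.

```python
def log_likelihood(ln_p, mu, theta, u, sigma_u):
    """ln L of Eq. (3), dropping the constant C.

    ln_p    : callable returning ln P(y | mu, theta)
    theta   : list of nuisance parameters theta_i
    u       : list of control measurements u_i (best estimates of theta_i)
    sigma_u : list of the assigned systematic errors sigma_{u_i}
    """
    constraint = sum(((ui - ti) / si) ** 2 for ui, ti, si in zip(u, theta, sigma_u))
    return ln_p(mu, theta) - 0.5 * constraint


def toy_ln_p(mu, theta, y=5.0, sigma_y=1.0):
    """Hypothetical model: one Gaussian measurement y with mean mu + theta[0]."""
    return -0.5 * ((y - mu - theta[0]) / sigma_y) ** 2


# Evaluate ln L at one point of the (mu, theta) space.
print(log_likelihood(toy_ln_p, 4.0, [0.5], u=[0.0], sigma_u=[0.5]))
```

Passing the model as a callable keeps the constraint terms separate from the main measurement, which mirrors the factorised form of Eq. (2).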
First nuisance parameters are introduced into the model to parameterize the systematic uncertainty, and then these parameters are constrained by means of control measurements.

The quadratic constraint terms in Eq. (3) correspond to the case where the estimate $u_i$ of the parameter $\theta_i$ is modeled as a Gaussian distributed variable of known standard deviation $\sigma_{u_i}$. In some problems one may have parameters $\eta_i$ that are intrinsically positive with estimates $t_i$ modeled as following a log-normal distribution. The Gaussian model covers this case as well by defining $\theta_i = \ln \eta_i$ and $u_i = \ln t_i$, so that $u_i$ is the corresponding Gaussian distributed estimator for $\theta_i$.

Often the estimates $u_i$ are the outcome of real control measurements, and so the standard deviations $\sigma_{u_i}$ are related to the corresponding sample size. The control measurement itself could, however, involve a number of uncertainties or arbitrary model choices, and as a result the values of the $\sigma_{u_i}$ may themselves be uncertain.

Gaussian modelling of the $u_i$ can be used even if the measurement exists only in an idealized sense. For example, the parameter $\theta_i$ could represent a not-yet computed coefficient in a perturbation series, and $u_i$ is one's best guess of its value (e.g., zero). In this case one may try to estimate an appropriate $\sigma_{u_i}$ by means of some recipe, e.g., by varying some aspects of the approximation technique used to arrive at $u_i$. For example, in the case of a prediction based on perturbation theory one may try varying the renormalization scale in some reasonable range. In such a case the estimate of $\sigma_{u_i}$ results from fairly arbitrary choices, and values that may differ by 50% or even a factor of two might not be unreasonable.

3 Gamma model for estimated variances

One can extend the model expressed by Eq. (2) to account for the uncertainty in the systematic errors by treating the $\sigma_{u_i}$ as adjustable parameters. The best estimates $s_i$ for the $\sigma_{u_i}$ are regarded as measurements to be included in the likelihood model. The width of the distribution of the $s_i$ is set by the analyst to reflect the appropriate uncertainty in the $\sigma_{u_i}$.
The characterization of the "error on the error" is described in Sect. 3.1. In Sect. 3.2 the full mathematical model is defined and the corresponding likelihood profiled over the $\sigma_{u_i}$ is derived. This is shown in Sect. 3.3 to be equivalent to a model in which the estimates $u_i$ follow a Student's t distribution.

3.1 The relative error on the error

In the model proposed here it is convenient to regard the variances $\sigma_{u_i}^2$ as the parameters, and to take values $v_i = s_i^2$ as their estimates. There is a special case in which the estimated variances $v_i$ will follow a chi-squared distribution, namely, when $v_i$ is the sample variance of $n$ independent observations of $u_i$, i.e.,

$$v_i = \frac{1}{n-1} \sum_{j=1}^{n} (u_{i,j} - \bar{u}_i)^2, \tag{4}$$

where $u_{i,j}$ is the $j$th observation of $u_i$ and $\bar{u}_i = \frac{1}{n} \sum_{j=1}^{n} u_{i,j}$. If the $u_{i,j}$ are Gaussian distributed with standard deviations $\sigma_{u_i}$, then one finds (see, e.g., Ref. [7]) that the statistic $(n-1) v_i / \sigma_{u_i}^2$ follows a chi-squared distribution for $n-1$ degrees of freedom. Furthermore, the chi-squared distribution for $n$ degrees of freedom is a special case of the gamma distribution,

$$f(v; \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, v^{\alpha-1} e^{-\beta v}, \quad v \ge 0, \tag{5}$$

for parameter values $\alpha = n/2$ and $\beta = 1/2$. The mean and variance are related to the parameters $\alpha$ and $\beta$ by $E[v] = \alpha/\beta$ and $V[v] = \alpha/\beta^2$. Therefore if $(n-1) v_i / \sigma_{u_i}^2$ follows a chi-squared distribution with $n-1$ degrees of freedom, then $v_i$ follows a gamma distribution with

$$\alpha_i = \frac{n-1}{2}, \tag{6}$$

$$\beta_i = \frac{n-1}{2\sigma_{u_i}^2}. \tag{7}$$

Fig. 1 Plots of (a) the gamma distribution $f(v|\alpha,\beta)$ of the estimated variance $v$ and (b) the Nakagami distribution $g(s|\alpha,\beta)$ for the estimated standard deviation $s = \sqrt{v}$ for several values of the parameter $r$ (see text)

In general the analyst will not base the estimate $v_i$ on $n$ observations of $u_i$ but rather on different types of information, such as related control measurements or approximate theoretical predictions. The analyst must then set the width of the distribution of $v_i$ to reflect the appropriate level of uncertainty in the estimate of $\sigma_{u_i}^2$. For $v_i = s_i^2$, using error propagation gives to first order

$$\frac{\sigma_{v_i}}{E[v_i]} \approx 2\, \frac{\sigma_{s_i}}{E[s_i]}. \tag{8}$$

To characterize the width of the gamma distribution we define

$$r_i \equiv \frac{1}{2} \frac{\sigma_{v_i}}{E[v_i]} = \frac{1}{2} \frac{\sigma_{v_i}}{\sigma_{u_i}^2}. \tag{9}$$

From Eq. (8) one sees that to first approximation $r_i \approx \sigma_{s_i}/E[s_i]$ and thus we can think of these factors as representing the relative uncertainty in the estimate of the systematic error. The parameters $r_i$ will be referred to as the "error on the error". A more accurate relation between $r_i$ as defined here and the quantity $\sigma_{s_i}/E[s_i]$ is given in Appendix A.

Using the expectation value of the gamma distribution $E[v_i] = \alpha_i/\beta_i$ and its variance $V[v_i] = \alpha_i/\beta_i^2$, we can relate the values $r_i$ supplied by the analyst and the $\sigma_{u_i}$ to $\alpha_i$ and $\beta_i$ by

$$\alpha_i = \frac{1}{4 r_i^2}, \tag{10}$$

$$\beta_i = \frac{1}{4 r_i^2 \sigma_{u_i}^2}. \tag{11}$$
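The mapping from $(\sigma_{u}, r)$ to the gamma parameters in Eqs. (10) and (11) can be checked with a short simulation. This is a sketch using only the Python standard library; the function names are hypothetical. Note that `random.gammavariate` takes a shape and a scale, so the scale passed is $1/\beta$.

```python
import math
import random


def gamma_params(sigma_u, r):
    """Shape alpha and rate beta from Eqs. (10) and (11)."""
    alpha = 1.0 / (4.0 * r**2)
    beta = 1.0 / (4.0 * r**2 * sigma_u**2)
    return alpha, beta


def sample_v(sigma_u, r, size, seed=1):
    """Draw estimated variances v from the gamma distribution of Eq. (5)."""
    alpha, beta = gamma_params(sigma_u, r)
    rng = random.Random(seed)
    # random.gammavariate expects (shape, scale); the scale is 1/beta.
    return [rng.gammavariate(alpha, 1.0 / beta) for _ in range(size)]


def mean_and_r(v):
    """Sample mean of v and the implied r = sigma_v / (2 E[v]) of Eq. (9)."""
    mean = math.fsum(v) / len(v)
    var = math.fsum((x - mean) ** 2 for x in v) / len(v)
    return mean, math.sqrt(var) / (2.0 * mean)


# The sample mean should be close to sigma_u^2 and the implied r close to the input.
print(mean_and_r(sample_v(sigma_u=1.0, r=0.2, size=50000)))
```

The sample size and seed are arbitrary choices made for this illustration.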
Figure 1a shows the gamma distribution for $\sigma_u = 1$ and several values of $r$ and Fig. 1b shows the corresponding distribution of $s = \sqrt{v}$. More details on the distribution of $s$ and its properties are given in Appendix A.

The assumption of a gamma distribution is not unique but represents nevertheless a reasonable and flexible expression of uncertainty in the $\sigma_{u_i}$. Moreover it will be shown that by using the gamma distribution one finds a very simple procedure for incorporating uncertain systematic errors into the model.

Using Eq. (6) to connect the relative uncertainty $r$ to the effective number of measurements $n$ gives $n = 1 + 1/(2r^2)$. A relevant special case is $n = 2$, sometimes called the problem of "two-point systematics", where one has two estimates $u_{i,1}$ and $u_{i,2}$ of a parameter $\theta_i$. This gives

$$\hat{\theta}_i = \bar{u}_i = \frac{u_{i,1} + u_{i,2}}{2}, \tag{12}$$

$$s_i = \frac{|u_{i,1} - u_{i,2}|}{\sqrt{2}}, \tag{13}$$

$$r_i = \frac{1}{\sqrt{2}}. \tag{14}$$

It will be assumed in this paper that the analyst is able to assign meaningful values for the error-on-the-error parameters $r_i$. The procedure for doing this will involve elicitation

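of expert knowledge from those who assigned the systematic errors and will in general vary depending on the experiment. One may want to regard a subset of the measurements as having a certain common $r$ which could be fitted from the data, but we do not investigate this possibility further here.

The proposed model thus makes two important assumptions. First, the control measurements are taken to be independent and Gaussian distributed. As mentioned in Sect. 2, the Gaussian $u_i$ can be extended to an alternative distribution if it can be related to a Gaussian by a transformation. Second, the estimates of the variances of the $u_i$ are gamma distributed. Both assumptions are reasonable but neither is a perfect description in practice, and thus the resulting inference could be subject to corresponding systematic uncertainties. Nevertheless the proposed model will in general be an improvement over the widely used Gaussian assumption for $u_i$ with fixed variances. In addition, the choice of the gamma distribution leads to important simplifications in mathematical expressions needed for inference, as shown in Sect. 3.2 below.

3.2 Likelihood for the gamma model

By treating the estimated variances $\boldsymbol{v} = (v_1, \ldots, v_N)$ as independent gamma distributed random variables, the full likelihood function becomes

$$L(\boldsymbol{\mu}, \boldsymbol{\theta}, \boldsymbol{\sigma}_u^2) = P(\boldsymbol{y}|\boldsymbol{\mu}, \boldsymbol{\theta}) \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma_{u_i}^2}}\, e^{-(u_i - \theta_i)^2 / 2\sigma_{u_i}^2}\; \frac{\beta_i^{\alpha_i}}{\Gamma(\alpha_i)}\, v_i^{\alpha_i - 1} e^{-\beta_i v_i}. \tag{15}$$

By using Eqs. (10) and (11) to relate the parameters $\alpha_i$ and $\beta_i$ to $\sigma_{u_i}^2$ and $r_i$ one finds, up to additive terms that are independent of the parameters, the log-likelihood

$$\ln L(\boldsymbol{\mu}, \boldsymbol{\theta}, \boldsymbol{\sigma}_u^2) = \ln P(\boldsymbol{y}|\boldsymbol{\mu}, \boldsymbol{\theta}) - \frac{1}{2} \sum_{i=1}^{N} \left[ \frac{(u_i - \theta_i)^2}{\sigma_{u_i}^2} + \left(1 + \frac{1}{2 r_i^2}\right) \ln \sigma_{u_i}^2 + \frac{v_i}{2 r_i^2 \sigma_{u_i}^2} \right]. \tag{16}$$

By setting the derivatives of $\ln L$ with respect to the $\sigma_{u_i}^2$ to zero for fixed $\boldsymbol{\theta}$ and $\boldsymbol{\mu}$ one finds the profiled values

$$\hat{\hat{\sigma}}_{u_i}^2 = \frac{v_i + 2 r_i^2 (u_i - \theta_i)^2}{1 + 2 r_i^2}. \tag{17}$$

Using these for the $\sigma_{u_i}^2$ gives the profile likelihood with respect to the systematic variances, which still depends on $\boldsymbol{\theta}$ as well as on the parameters of interest $\boldsymbol{\mu}$. After some manipulation it can be written up to constant terms as

$$\ln L'(\boldsymbol{\mu}, \boldsymbol{\theta}) = \ln L(\boldsymbol{\mu}, \boldsymbol{\theta}, \hat{\hat{\boldsymbol{\sigma}}}_u^2) = \ln P(\boldsymbol{y}|\boldsymbol{\mu}, \boldsymbol{\theta}) - \frac{1}{2} \sum_{i=1}^{N} \left(1 + \frac{1}{2 r_i^2}\right) \ln\left[1 + 2 r_i^2\, \frac{(u_i - \theta_i)^2}{v_i}\right]. \tag{18}$$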
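The profiled variance of Eq. (17) and the resulting logarithmic term of Eq. (18) can be verified numerically for a single control measurement. The sketch below uses hypothetical function names and arbitrary test values. It checks that Eq. (17) minimises the bracketed term of Eq. (16), and that the difference between the profiled terms at $\theta$ and at $\theta = u$ reproduces Eq. (18).

```python
import math


def minus2_lnL_term(sigma2, u, theta, v, r):
    """One bracketed term of -2 ln L in Eq. (16), constants dropped."""
    return ((u - theta) ** 2 / sigma2
            + (1.0 + 1.0 / (2.0 * r**2)) * math.log(sigma2)
            + v / (2.0 * r**2 * sigma2))


def profiled_sigma2(u, theta, v, r):
    """Profiled variance of Eq. (17)."""
    return (v + 2.0 * r**2 * (u - theta) ** 2) / (1.0 + 2.0 * r**2)


def minus2_lnLprime_term(u, theta, v, r):
    """One term of -2 ln L' in Eq. (18)."""
    return (1.0 + 1.0 / (2.0 * r**2)) * math.log1p(2.0 * r**2 * (u - theta) ** 2 / v)


u, theta, v, r = 0.3, 1.5, 0.8, 0.4           # arbitrary illustration values
s_hat = profiled_sigma2(u, theta, v, r)
at_profile = minus2_lnL_term(s_hat, u, theta, v, r)
at_theta_equal_u = minus2_lnL_term(profiled_sigma2(u, u, v, r), u, u, v, r)

# Both numbers printed on the first line should agree (Eq. 18 up to a constant).
print(at_profile - at_theta_equal_u, minus2_lnLprime_term(u, theta, v, r))
# For small r the logarithmic term approaches the quadratic form (u - theta)^2 / v.
print(minus2_lnLprime_term(u, theta, v, 1e-3), (u - theta) ** 2 / v)
```

Subtracting the value at $\theta = u$ removes the parameter-independent constant that is dropped in Eq. (18).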
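Some intermediate steps in the derivation of Eq. (18) are given in Appendix B. In the limit where all of the $r_i$ are small, the estimates $v_i$ are very close to their expectation values $\sigma_{u_i}^2$. Making this replacement and expanding the logarithmic terms to first order one recovers the quadratic terms as in Eq. (3).

3.3 Derivation of profile likelihood from Student's t distribution

An equivalent derivation of the profile likelihood (18) can be obtained by first defining

$$z_i \equiv \frac{u_i - \theta_i}{\sqrt{v_i}}. \tag{19}$$

As $u_i$ follows a Gaussian with mean $\theta_i$ and standard deviation $\sigma_{u_i}$, and $v_i$ follows a gamma distribution with mean $\sigma_{u_i}^2$ and standard deviation $\sigma_{v_i} = 2 r_i \sigma_{u_i}^2$, one can show (see, e.g., Ref. [7]) that $z_i$ follows a Student's t distribution,

$$f(z_i|\nu_i) = \frac{\Gamma\left(\frac{\nu_i + 1}{2}\right)}{\sqrt{\nu_i \pi}\, \Gamma(\nu_i/2)} \left(1 + \frac{z_i^2}{\nu_i}\right)^{-\frac{\nu_i + 1}{2}}, \tag{20}$$

with a number of degrees of freedom

$$\nu_i = \frac{1}{2 r_i^2}. \tag{21}$$

By constructing the likelihood $L(\boldsymbol{\mu}, \boldsymbol{\theta})$ as the product of $P(\boldsymbol{y}|\boldsymbol{\mu}, \boldsymbol{\theta})$ and Student's t distributions,

$$L(\boldsymbol{\mu}, \boldsymbol{\theta}) = P(\boldsymbol{y}|\boldsymbol{\mu}, \boldsymbol{\theta}) \prod_{i=1}^{N} \frac{\Gamma\left(\frac{\nu_i + 1}{2}\right)}{\sqrt{\nu_i \pi}\, \Gamma(\nu_i/2)} \left(1 + \frac{z_i^2}{\nu_i}\right)^{-\frac{\nu_i + 1}{2}}, \tag{22}$$

one obtains the same log-likelihood as given by $\ln L'$ from Eq. (18). That is, the same model results if one replaces the estimates $v_i$ by constants $\sigma_{u_i}^2$, but still takes the $z_i$ to follow a Student's t distribution, with $u_i = \theta_i + \sigma_{u_i} z_i$. Thus in the following we can drop the prime in the profile log-likelihood (18) and regard this equivalently as the log-likelihood resulting from a model where the control measurements are distributed according to a Student's t.

In the limit where $r_i \to 0$ and thus the number of degrees of freedom $\nu_i \to \infty$, the Student's t distribution becomes a Gaussian (see, e.g., Ref. [7]),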

and the corresponding term in the log-likelihood becomes quadratic in $u_i - \theta_i$, as in Eq. (3).

4 Estimators and confidence regions from profile likelihood

The ML estimators are found by maximizing the full $\ln L(\boldsymbol{\mu}, \boldsymbol{\theta}, \boldsymbol{\sigma}_u^2)$ with respect to all of the parameters, which is equivalent to maximizing the profile likelihood with respect to $\boldsymbol{\mu}$ and $\boldsymbol{\theta}$. In this way the statistical uncertainties due to both the estimated biases $u_i$ as well as their estimated variances $v_i$ are incorporated into the variances of the estimators for the parameters of interest $\hat{\boldsymbol{\mu}}$.

Consider for example the case of a single continuous parameter of interest $\mu$. Having found the estimator $\hat{\mu}$, one could quantify its statistical precision by using the standard deviation $\sigma_{\hat{\mu}}$. The covariance matrix for all of the estimated parameters can to first approximation be found from the inverse of the negative of the matrix of second derivatives of $\ln L$ (see, e.g., Refs. [8,9]). From this we extract the variance of the estimator of the parameter of interest, i.e., $V[\hat{\mu}] = \sigma_{\hat{\mu}}^2$.

The presence of the nuisance parameters in the model will in general inflate $\sigma_{\hat{\mu}}$, which reflects the corresponding systematic uncertainties. But $\sigma_{\hat{\mu}}$ is by construction a property of the model and not of a particular data set. One may want, however, to report a measure of uncertainty along with the estimate $\hat{\mu}$ that reflects the extent to which the data values are consistent with the hypothesized model, and $\sigma_{\hat{\mu}}$ is therefore not suitable for this purpose. We will show below, however, that a confidence region can be constructed that has this desired property.

In general to find a confidence region (or for a single parameter a confidence interval) one tests all values of $\boldsymbol{\mu}$ with a test of size $\alpha$ for some fixed probability $\alpha$. Those values of $\boldsymbol{\mu}$ that are not rejected by the test constitute a confidence region with confidence level $1 - \alpha$. To determine the critical region of the test of a given $\boldsymbol{\mu}$ one can use a test statistic based on the profile likelihood ratio

$$t_{\boldsymbol{\mu}} = -2 \ln \lambda(\boldsymbol{\mu}) = -2 \ln \frac{L(\boldsymbol{\mu}, \hat{\hat{\boldsymbol{\theta}}})}{L(\hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\theta}})}. \tag{23}$$
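The critical region of a test of $\boldsymbol{\mu}$ corresponds to the region of data space having probability content $\alpha$ with maximal $t_{\boldsymbol{\mu}}$. Equivalently, provided $t_{\boldsymbol{\mu}}$ can be treated as continuous, the p-value of a hypothesized point in parameter space $\boldsymbol{\mu}$ is

$$p_{\boldsymbol{\mu}} = \int_{t_{\boldsymbol{\mu},\mathrm{obs}}}^{\infty} f(t_{\boldsymbol{\mu}}|\boldsymbol{\mu}, \boldsymbol{\theta}, \boldsymbol{\sigma}_u^2)\, dt_{\boldsymbol{\mu}} = 1 - F(t_{\boldsymbol{\mu},\mathrm{obs}}), \tag{24}$$

where $t_{\boldsymbol{\mu},\mathrm{obs}}$ is the observed value of $t_{\boldsymbol{\mu}}$ and $F$ is the cumulative distribution of $t_{\boldsymbol{\mu}}$. That is, we define the region of data space even less compatible with the hypothesis than what was observed to correspond to $t_{\boldsymbol{\mu}} > t_{\boldsymbol{\mu},\mathrm{obs}}$. The boundary of the confidence region corresponds to the values of $\boldsymbol{\mu}$ where $p_{\boldsymbol{\mu}} = \alpha$. Solving Eq. (24) for the test statistic gives

$$t_{\boldsymbol{\mu}} = F^{-1}(1 - p_{\boldsymbol{\mu}}), \tag{25}$$

where here $t_{\boldsymbol{\mu}}$ refers to the value observed, and $F^{-1}$ is the quantile of $t_{\boldsymbol{\mu}}$. The statistic $t_{\boldsymbol{\mu}}$ is also defined in terms of the likelihood through Eqs. (1) and (23), and by using $p_{\boldsymbol{\mu}} = \alpha$ one finds that the boundary of the confidence region is given by

$$\ln L(\boldsymbol{\mu}, \hat{\hat{\boldsymbol{\theta}}}) = \ln L(\hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\theta}}) - \frac{1}{2} F^{-1}(1 - \alpha). \tag{26}$$

To find the p-values and thus determine the boundary of the confidence region one needs the distribution $f(t_{\boldsymbol{\mu}}|\boldsymbol{\mu}, \boldsymbol{\theta}, \boldsymbol{\sigma}_u^2)$. According to Wilks' theorem [10], for $M$ parameters of interest $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_M)$ the statistic $t_{\boldsymbol{\mu}}$ should follow a chi-squared distribution for $M$ degrees of freedom in the asymptotic limit, which here corresponds to the case where the distributions of all ML estimators are Gaussian. To the extent that this approximation holds we may identify the quantile $F^{-1}$ in Eq. (26) with $F^{-1}_{\chi^2_M}$, the chi-squared quantile for $M$ degrees of freedom.

If it is further assumed that the log-likelihood can be well approximated by a quadratic function about its maximum, then one finds asymptotically (see, e.g., Ref. [11]) that

$$\ln L(\boldsymbol{\mu}, \hat{\hat{\boldsymbol{\theta}}}) = \ln L(\hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\theta}}) - \frac{1}{2} (\boldsymbol{\mu} - \hat{\boldsymbol{\mu}})^{T} V^{-1} (\boldsymbol{\mu} - \hat{\boldsymbol{\mu}}), \tag{27}$$

where $V_{ij} = \mathrm{cov}[\hat{\mu}_i, \hat{\mu}_j]$ is the covariance matrix for the parameters of interest. This equation says that the confidence region is a hyper-ellipsoid of fixed size centred about $\hat{\boldsymbol{\mu}}$. For example, for a single parameter $\mu$ one finds that the endpoints

$$\mu_{\pm} = \hat{\mu} \pm \sigma_{\hat{\mu}} \left[ F^{-1}_{\chi^2_1}(1 - \alpha) \right]^{1/2} \tag{28}$$

give the central confidence interval with confidence level $1 - \alpha$.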
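The endpoints of Eq. (28) only require the chi-squared quantile for one degree of freedom, which can be obtained from the standard Gaussian quantile. A minimal sketch with hypothetical function names and illustrative input values, using only the Python standard library:

```python
from statistics import NormalDist


def chi2_1dof_quantile(p):
    """Quantile of the chi-squared distribution for one degree of freedom.

    Uses P(chi2_1 < q) = 2*Phi(sqrt(q)) - 1, so sqrt(q) = Phi^{-1}((1 + p)/2).
    """
    return NormalDist().inv_cdf(0.5 * (1.0 + p)) ** 2


def central_interval(mu_hat, sigma_mu_hat, cl=0.683):
    """Endpoints of Eq. (28) for confidence level cl = 1 - alpha."""
    half_width = sigma_mu_hat * chi2_1dof_quantile(cl) ** 0.5
    return mu_hat - half_width, mu_hat + half_width


# Illustrative values only: an estimate of 10.0 with standard deviation 0.5.
print(central_interval(10.0, 0.5))
print(central_interval(10.0, 0.5, cl=0.95))
```

This relies on the quadratic approximation of Eq. (27), so it applies to the gamma model only in the limit of small $r_i$.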
For a probability content corresponding to plus or minus one standard deviation about the centre of a Gaussian, i.e., $1 - \alpha = 68.3\%$, one has $F^{-1}_{\chi^2_1}(1 - \alpha) = 1$, which gives the well-known result that the interval of plus or minus one standard deviation about the estimate is asymptotically a 68.3% CL central confidence interval.

The relations (27) and (28) depend, however, on a quadratic approximation of the log-likelihood. In the model where the $\sigma_{u_i}^2$ are treated as adjustable, the profile log-likelihood is given by Eq. (18), which contains terms that are logarithmic in $(u_i - \theta_i)^2$, and not just the quadratic terms

that appear in Eq. (3). As a result the relation (27) is only a good approximation in the limit of small $r_i$, which is not always valid in the present problem.

We can nevertheless use Eq. (26) assuming a chi-squared distribution for $t_{\boldsymbol{\mu}}$ as a first approximation for confidence regions. We will see in the examples below that these have interesting properties that already capture the most important features of the model. If higher accuracy is required then Monte Carlo methods can be used to determine the distribution of $t_{\boldsymbol{\mu}}$. Alternatively we can modify the statistic so that its distribution is closer to the asymptotic form; this is explored further in Sect. 5.1.

5 Single-measurement model

To investigate the asymptotic properties of the profile likelihood ratio it is useful to examine a simple model with a single measured value $y$ following a Gaussian with mean $\mu$ and standard deviation $\sigma$. The parameter of interest is $\mu$ and we treat the variance $\sigma^2$ as a nuisance parameter, which is constrained by an independent gamma-distributed estimate $v$. Thus the likelihood is given by

$$L(\mu, \sigma^2) = f(y, v|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(y - \mu)^2 / 2\sigma^2}\; \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, v^{\alpha - 1} e^{-\beta v}. \tag{29}$$

As before we set the parameters $\alpha$ and $\beta$ of the gamma distribution so that $E[v] = \sigma^2$ and so that from Eq. (9) the standard deviation of $v$ is $\sigma_v = 2 r \sigma^2$, where $r$ characterizes the relative error on the error. This gives

$$\alpha = \frac{1}{4 r^2}, \tag{30}$$

$$\beta = \frac{1}{4 r^2 \sigma^2}. \tag{31}$$

The goal is to construct a confidence interval for $\mu$ by using the profile likelihood ratio

$$\lambda(\mu) = \frac{L(\mu, \hat{\hat{\sigma}}^2(\mu))}{L(\hat{\mu}, \hat{\sigma}^2)}. \tag{32}$$

The log-likelihood is

$$\ln L(\mu, \sigma^2) = -\frac{1}{2} \frac{(y - \mu)^2}{\sigma^2} - \frac{1}{2} \left(1 + \frac{1}{2 r^2}\right) \ln \sigma^2 - \frac{v}{4 r^2 \sigma^2} + C, \tag{33}$$

where $C$ represents constants that do not depend on $\mu$ or $\sigma^2$. From this we find the required estimators

$$\hat{\mu} = y, \tag{34}$$

$$\hat{\sigma}^2 = \frac{v}{1 + 2 r^2}, \tag{35}$$

$$\hat{\hat{\sigma}}^2(\mu) = \frac{v + 2 r^2 (y - \mu)^2}{1 + 2 r^2}. \tag{36}$$

With these ingredients we find the following simple expression for the statistic $t_\mu = -2 \ln \lambda(\mu)$,

$$t_\mu = \left(1 + \frac{1}{2 r^2}\right) \ln\left[1 + 2 r^2\, \frac{(y - \mu)^2}{v}\right]. \tag{37}$$

According to Wilks' theorem [10], the distribution $f(t_\mu|\mu)$ should, in the large-sample limit, be chi-squared for one degree of freedom.
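The approach to the chi-squared limit can be explored with a small simulation of Eq. (37). The sketch below uses hypothetical function names, and the values of $r$, the sample size and the seed are choices made for this illustration. It generates $y$ and $v$ according to Eqs. (29)-(31) and counts how often $t_\mu$ at the true $\mu$ exceeds 1. For a chi-squared distribution with one degree of freedom this fraction would be about 0.317.

```python
import math
import random


def t_mu(y, v, mu, r):
    """Profile likelihood-ratio statistic of Eq. (37)."""
    return (1.0 + 1.0 / (2.0 * r**2)) * math.log1p(2.0 * r**2 * (y - mu) ** 2 / v)


def simulate_t(r, n_exp, mu=0.0, sigma=1.0, seed=12345):
    """Generate t_mu, evaluated at the true mu, for n_exp pseudo-experiments."""
    rng = random.Random(seed)
    alpha = 1.0 / (4.0 * r**2)            # shape, Eq. (30)
    scale = 4.0 * r**2 * sigma**2         # 1/beta, with beta from Eq. (31)
    values = []
    for _ in range(n_exp):
        y = rng.gauss(mu, sigma)
        v = rng.gammavariate(alpha, scale)
        values.append(t_mu(y, v, mu, r))
    return values


def tail_fraction(t_values, threshold=1.0):
    """Fraction of pseudo-experiments with t_mu above the threshold."""
    return sum(t > threshold for t in t_values) / len(t_values)


for r in (0.01, 0.6):
    print(r, tail_fraction(simulate_t(r, 20000)))
```

A tail fraction above 0.317 means that an interval built from the asymptotic quantile would undercover.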
The large-sample limit corresponds to the situation where estimators for the parameters become Gaussian, which in this case means $r \ll 1$.

The behaviour of the distribution of $t_\mu$ for nonzero $r$ is illustrated in Fig. 2, which shows the distributions from data generated according to a Gaussian of mean $\mu = 0$, standard deviation $\sigma = 1$ and values of $r$ = 0.01, 0.2, 0.4 and 0.6. The case of $r = 0.01$ approximates the situation where the relative uncertainty on $\sigma$ is negligibly small. One can see that greater values of $r$ lead to an increasing departure of the distribution from the asymptotic form.

Fig. 2 Distributions of the test variable $t_\mu$ for a single Gaussian distributed measurement with relative error-on-error $r$. Panels (a)-(d) show $f(t_\mu)$ versus $t_\mu$ for the four values of $r$, each compared with the $\chi^2_1$ pdf

Depending on the size of the test being carried out or equivalently the confidence level of the interval, one may find that the asymptotic approximation is inadequate. In such a case one may wish to use Monte Carlo simulation to determine the distribution of the test statistic. Alternatively one can modify the statistic so that its distribution is better approximated by the asymptotic form, as described in the following section.

5.1 Bartlett correction for profile likelihood-ratio statistic

The likelihood-ratio statistic can be modified so as to follow more closely a chi-squared distribution using a type of correction due to Bartlett [12]. This method has received some limited notice in Particle Physics [13-15] but has not been widely used in that field. The basic idea is to determine the mean value $E[t_\mu]$ of the original statistic. In the asymptotic limit, this should be equal to the number of degrees of freedom $n_d$ of the chi-squared distribution, which in this example is $n_d = 1$. One then defines a modified statistic

$$t'_\mu = \frac{n_d}{E[t_\mu]}\, t_\mu, \tag{38}$$

so that by construction $E[t'_\mu] = n_d$. It was shown by Lawley [16] that the modified statistic approaches the reference chi-squared distribution with a difference of order $n^{-1/2}$, where here the effective sample size $n$ is related to the parameter $r$ by $n = 1 + 1/(2r^2)$ (cf. Eqs. (6) and (10)).

One could in principle find the expectation value $E[t_\mu]$ by the Monte Carlo method. But for the method to be convenient to use one would like to determine the Bartlett correction without resorting to simulation. By expanding the expectation value

$$E[t_\mu] = \int\!\!\int t_\mu(y, v)\, f(y, v|\mu, \sigma^2)\, dy\, dv \tag{39}$$

as a Taylor series in $r$ one finds

$$E[t_\mu] = 1 + 3 r^2 + c\, r^4 + \ldots, \tag{40}$$

where the coefficient of the $r^4$ term is found numerically to be $c \approx 2$ with an accuracy of around […]%. Dividing $t_\mu / n_d$ (here with $n_d = 1$) from Eq. (37) by $E[t_\mu]$ to obtain the Bartlett-corrected statistic therefore gives

$$t'_\mu = \frac{1 + 2 r^2}{2 r^2 (1 + 3 r^2 + 2 r^4)} \ln\left[1 + 2 r^2\, \frac{(y - \mu)^2}{v}\right]. \tag{41}$$

In more complex problems one may not have a simple expression for the expectation value needed in the Bartlett correction and calculation by Monte Carlo may be necessary.

Distributions of $t'_\mu$ are shown in Fig. 3 along with Monte Carlo distributions. As can be seen by comparing the uncorrected distributions from Fig. 2 to those in Fig. 3, the Bartlett correction is clearly very effective, as is needed when the parameter $r$ is large.

5.2 Confidence intervals for the single-measurement model

In the simple model explored in this section one can use the measured values of $y$ and $v$ to construct a confidence interval for the parameter of interest $\mu$. The probability that the interval includes the true value of $\mu$ (the coverage probability) can then be studied as a function of the relative error on the error $r$. What emerges is that the interval based on the chi-squared distribution of $t_\mu$ has a coverage probability substantially less than the nominal confidence level, but that this can be greatly improved by use of the Bartlett-corrected interval.

To derive exact confidence intervals for $\mu$ we can use the fact that

$$z = \frac{y - \mu}{\sqrt{v}} \tag{42}$$

follows a Student's t distribution for $\nu = 1/(2r^2)$ degrees of freedom (see, e.g., Ref. [7]). From the distribution of $z$ one can find the corresponding pdf of

$$t_\mu = (1 + \nu) \ln\left[1 + \frac{z^2}{\nu}\right], \tag{43}$$

but in fact this is not directly needed. Rather we can use the pdf of $z$ to find confidence intervals for $\mu$ from the fact that a critical region defined by $t_\mu > t_c$ is equivalent to the corresponding region of $z$ given by $z < -z_c$ and $z > z_c$, where the boundaries of the critical regions in the two variables are related by Eq. (43). Equivalently one can say that the p-value of a hypothesized value of $\mu$ is the probability, assuming $\mu$, to find $z$ further from zero than what was observed, i.e.,

$$p_\mu = 1 - \int_{-|z_{\mathrm{obs}}|}^{|z_{\mathrm{obs}}|} f(z)\, dz = 2 \left(1 - F\left(\frac{|y - \mu|}{\sqrt{v}}; \nu\right)\right), \tag{44}$$

where $F(z; \nu)$ is the cumulative Student's t distribution for $\nu = 1/(2r^2)$ degrees of freedom.

Fig. 3 Distributions of the Bartlett-corrected test variable $t'_\mu$ for a single Gaussian distributed measurement with relative error-on-error $r$. Panels (a)-(d) show $f(t'_\mu)$ versus $t'_\mu$ for the four values of $r$, each compared with the $\chi^2_1$ pdf

The boundaries of the confidence interval at confidence level $\mathrm{CL} = 1 - \alpha$ (here $\alpha$ refers to the size of the statistical test, not the parameter $\alpha$ in the gamma distribution) are found by setting $p_\mu = \alpha$ and solving for $\mu$, which gives the upper and lower limits

$$\mu_{\pm} = y \pm \sqrt{v}\, z_{\alpha/2}. \tag{45}$$

Here $z_{\alpha/2}$ is the $\alpha/2$ upper quantile of the Student's t distribution, i.e., the value of $z_{\mathrm{obs}}$ needed in Eq. (44) to have $p_\mu = \alpha$.

If one were to assume that the statistic $t_\mu$ follows the asymptotic chi-squared distribution, then $z_{\alpha/2}$ is replaced by

$$z_a = \left[ \frac{1}{2 r^2} \left( \exp\left[ \frac{2 r^2 Q_\alpha}{1 + 2 r^2} \right] - 1 \right) \right]^{1/2}. \tag{46}$$

Here $Q_\alpha = F^{-1}_{\chi^2_1}(1 - \alpha)$ is obtained from the quantile of the chi-squared distribution for one degree of freedom. And if the Bartlett-corrected statistic $t'_\mu$ is used to construct the interval, then the $Q_\alpha$ in Eq. (46) is replaced by $Q_\alpha E[t_\mu]$, where $E[t_\mu] = 1 + 3r^2 + 2r^4$ is the expectation value of $t_\mu$ from Eq. (40).

The half-width of the interval measured in units of the estimated standard deviation $\sqrt{v}$, i.e., $z_{\alpha/2}$ or $z_a$, is shown in Fig. 4a as a function of the $r$ parameter. The probability $P_c$ for the confidence interval to cover the true value of $\mu$ is by construction equal to $1 - \alpha$ for the exact confidence interval.
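The three half-widths discussed above (exact from Eq. (45), asymptotic from Eq. (46), and Bartlett-corrected) can be compared with a short script. The Python standard library has no Student's t quantile, so this sketch integrates the pdf of Eq. (20) with Simpson's rule and inverts it by bisection; that numerical approach and all function names are assumptions made for this illustration.

```python
import math
from statistics import NormalDist


def t_pdf(z, nu):
    """Student's t pdf of Eq. (20)."""
    log_norm = (math.lgamma(0.5 * (nu + 1.0)) - math.lgamma(0.5 * nu)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - 0.5 * (nu + 1.0) * math.log1p(z * z / nu))


def t_cdf(z, nu, steps=2000):
    """Cumulative Student's t for z >= 0, using Simpson's rule on [0, z]."""
    h = z / steps
    total = t_pdf(0.0, nu) + t_pdf(z, nu)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * t_pdf(k * h, nu)
    return 0.5 + total * h / 3.0


def t_quantile(p, nu):
    """Value z with t_cdf(z) = p for p > 0.5, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, nu) < p:
        hi *= 2.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def half_widths(r, cl=0.683):
    """Half-widths in units of sqrt(v): (exact, asymptotic, Bartlett-corrected)."""
    nu = 1.0 / (2.0 * r**2)
    alpha = 1.0 - cl
    exact = t_quantile(1.0 - 0.5 * alpha, nu)                 # z_{alpha/2} in Eq. (45)
    q_alpha = NormalDist().inv_cdf(1.0 - 0.5 * alpha) ** 2    # chi-squared quantile, 1 dof

    def z_a(q):
        # Eq. (46), written with nu = 1/(2 r^2)
        return math.sqrt(nu * math.expm1(q / (1.0 + nu)))

    mean_t = 1.0 + 3.0 * r**2 + 2.0 * r**4                    # E[t_mu] from Eq. (40)
    return exact, z_a(q_alpha), z_a(q_alpha * mean_t)


print(half_widths(0.5))
```

For $\nu = 2$ (that is, $r = 0.5$) the Student's t cumulative distribution has a closed form, which gives an independent check of the numerical quantile.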
For the interval based on the asymptotic distribution of the test statistic this is

$$P_c = \int_{-z_a}^{z_a} f_{\chi^2_1}(z)\, dz = F_{\chi^2_1}(z_a), \tag{47}$$

where $F_{\chi^2_1}$ is the cumulative chi-squared distribution for one degree of freedom and $z_a$ is given by Eq. (46), with $Q_\alpha$ replaced by $Q_\alpha E[t_\mu]$ for the Bartlett-corrected case.

The interval half-widths and coverage probabilities based on $t_\mu$ and $t'_\mu$ are shown in Fig. 4. As can be seen, the interval based on the Bartlett-corrected statistic is very close to the exact one, and its coverage is close to the nominal $1 - \alpha$ for relevant values of $r$.

Fig. 4 Plots of (a) the interval half-width in units of the estimated standard deviation $\sqrt{v}$ and (b) the coverage probability of the 68.3% CL confidence intervals for $\mu$, both as a function of $r$, for the exact, asymptotic and Bartlett-corrected intervals

As seen from the distributions in Figs. 2 and 3 for the single-measurement model, the agreement with the asymptotic form worsens for increasing values of the test statistic. For $Z = \sqrt{t_\mu}$ of 4 (a four standard-deviation significance; see, e.g., Ref. [6]), the Bartlett-corrected statistic is close to the asymptotic form for $r = 0.2$, with a small but visible departure for $r = 0.4$. In contrast, for a 68.3% confidence level (corresponding to $t_\mu = 1$), one sees from Fig. 4a that the Bartlett-corrected interval is in satisfactory agreement with the exact interval out to $r \approx$ […]. For a more complicated analysis with multiple measurements having different $r_i$ parameters one would need to check the validity of asymptotic distributions with Monte Carlo.

6 Least-squares fitting and averaging measurements

An important application of the model described in Sect. 3 is the least-squares fit of a curve, or as a special case of this, the average of a set of measurements. Suppose the data consist of $N$ independent Gaussian distributed values $y_i$, with mean and variance

$$E[y_i] = \varphi(x_i; \boldsymbol{\mu}) + \theta_i, \tag{48}$$

$$V[y_i] = \sigma_{y_i}^2. \tag{49}$$

Here the nuisance parameters $\theta_i$ represent a potential bias or offset. The function $\varphi(x; \boldsymbol{\mu})$ plus the bias $\theta_i$ gives the mean of $y_i$ as a function of a control variable $x$, and it depends on a set of $M$ parameters of interest $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_M)$.
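That is, the probability $P(\boldsymbol{y}|\boldsymbol{\mu}, \boldsymbol{\theta})$ in Eq. (2) becomes

$$P(\boldsymbol{y}|\boldsymbol{\mu}, \boldsymbol{\theta}) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma_{y_i}^2}}\, e^{-(y_i - \varphi(x_i; \boldsymbol{\mu}) - \theta_i)^2 / 2\sigma_{y_i}^2}. \tag{50}$$

As before suppose the nuisance parameters $\theta_i$ are constrained by $N$ corresponding independent Gaussian measurements $u_i$, with mean and variance

$$E[u_i] = \theta_i, \tag{51}$$

$$V[u_i] = \sigma_{u_i}^2. \tag{52}$$

Often the best estimates of a potential bias $\theta_i$ will be $u_i = 0$ for the actual measurement, but formally the $u_i$ are treated as random variables that would fluctuate upon repetition of the experiment. Therefore the full log-likelihood or equivalently $-2 \ln L(\boldsymbol{\mu}, \boldsymbol{\theta})$ is up to an additive constant given by

$$-2 \ln L(\boldsymbol{\mu}, \boldsymbol{\theta}) = \sum_{i=1}^{N} \left[ \frac{(y_i - \varphi(x_i; \boldsymbol{\mu}) - \theta_i)^2}{\sigma_{y_i}^2} + \frac{(u_i - \theta_i)^2}{\sigma_{u_i}^2} \right]. \tag{53}$$

That is, if we consider the $\sigma_{u_i}$ as known, then the maximum-likelihood estimators are obtained from the minimum of the sum of squares (53), which is the usual formulation of the method of least squares.

The next step will be to treat the $\sigma_{u_i}$ as adjustable parameters, but before doing this it is interesting to note that by profiling over the nuisance parameters $\theta_i$, one finds the profile likelihood

$$-2 \ln L'(\boldsymbol{\mu}) = \sum_{i=1}^{N} \frac{(y_i - \varphi(x_i; \boldsymbol{\mu}) - u_i)^2}{\sigma_{y_i}^2 + \sigma_{u_i}^2} \equiv \chi^2(\boldsymbol{\mu}). \tag{54}$$

That is, the same result is obtained by using the usual method of least squares with statistical and systematic uncertainties added in quadrature. This procedure gives the best linear unbiased estimator (BLUE), which is widely used in Particle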

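Physics, particularly for the problem of averaging a set of measurements as described in Refs. [17-20].

Returning to the full dependence on $\boldsymbol{\mu}$ and $\boldsymbol{\theta}$ and following the model of Sect. 3 we now regard the systematic variances $\sigma_{u_i}^2$ as free parameters for which we have independent gamma distributed estimates $v_i$, with parameters $\alpha_i$ and $\beta_i$ set by $\sigma_{u_i}$ and $r_i$ according to Eqs. (10) and (11). The log-likelihood profiled over the $\sigma_{u_i}^2$ is (cf. Eq. (18)),

$$-2 \ln L'(\boldsymbol{\mu}, \boldsymbol{\theta}) = \sum_{i=1}^{N} \left[ \frac{(y_i - \varphi(x_i; \boldsymbol{\mu}) - \theta_i)^2}{\sigma_{y_i}^2} + \left(1 + \frac{1}{2 r_i^2}\right) \ln\left(1 + 2 r_i^2\, \frac{(u_i - \theta_i)^2}{v_i}\right) \right]. \tag{55}$$

To find the required estimators we need to solve the system of equations

$$\frac{\partial \ln L'}{\partial \mu_i} = 0, \quad i = 1, \ldots, M, \tag{56}$$

$$\frac{\partial \ln L'}{\partial \theta_i} = 0, \quad i = 1, \ldots, N. \tag{57}$$

Equation (57) results in

$$\theta_i^3 + \left[-2u_i - y_i + \varphi_i\right] \theta_i^2 + \left[\frac{v_i + (1 + 2r_i^2)\sigma_{y_i}^2}{2 r_i^2} + 2u_i(y_i - \varphi_i) + u_i^2\right] \theta_i + \left[(\varphi_i - y_i)\left(\frac{v_i}{2 r_i^2} + u_i^2\right) - \frac{(1 + 2r_i^2)\sigma_{y_i}^2 u_i}{2 r_i^2}\right] = 0, \quad i = 1, \ldots, N, \tag{58}$$

where here $\varphi_i = \varphi(x_i; \boldsymbol{\mu})$. Simultaneously solving all $M + N$ equations for $\boldsymbol{\mu}$ and the $\theta_i$ gives their ML estimators. Solving for the $\theta_i$ for fixed $\boldsymbol{\mu}$, i.e., fixed $\varphi_i$, gives the profiled values $\hat{\hat{\theta}}_i$. Equations (58) are cubic in $\theta_i$ and so can be solved in closed form, giving either one or three real roots. In the case of three roots, the one is chosen that maximizes $\ln L'$.

Using the profile log-likelihood from Eq. (55) one can use, for example, the test statistic $t_{\boldsymbol{\mu}}$ defined in Eq. (23) to find confidence regions for $\boldsymbol{\mu}$ following the general procedure outlined in Sect. 4. Examples of this will be shown in Sect. 6.2.

6.1 Goodness of fit

In the usual method of least squares, the minimized sum of squares $\chi^2_{\min} = \chi^2(\hat{\boldsymbol{\mu}})$ based on Eq. (54) is often used to quantify the goodness-of-fit. Because it is constructed as a sum of squares of Gaussian distributed quantities, one can show (see, e.g., Ref. […]) that its sampling distribution is chi-squared for $N - M$ degrees of freedom, and the p-value of the hypothesis that the true model lies somewhere in the parameter space of $\boldsymbol{\mu}$ is thus

$$p = \int_{\chi^2_{\min}}^{\infty} f_{\chi^2_{N-M}}(x)\, dx. \tag{59}$$

When using the gamma error model presented above, the quantity $-2 \ln L'(\boldsymbol{\mu}, \boldsymbol{\theta})$ is no longer a simple sum of squares.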
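For the special case of an average, where $\varphi(x_i; \boldsymbol{\mu}) = \mu$, the minimisation of Eq. (55) can be sketched numerically. Rather than solving the cubic of Eq. (58) in closed form, the following illustration profiles each $\theta_i$ with a grid scan plus ternary search, which is a swapped-in technique and not the procedure of the paper. All function names and the input values are hypothetical. The comparison with `blue_average` assumes $\sigma_{u_i}^2 = v_i$ in Eq. (54).

```python
import math


def refine(f, lo, hi, grid=100, iters=40):
    """Minimise f on [lo, hi]: coarse grid scan, then ternary search near the best point."""
    if hi == lo:
        return lo
    step = (hi - lo) / grid
    j = min(range(grid + 1), key=lambda i: f(lo + i * step))
    best = lo + j * step
    left, right = max(lo, best - step), min(hi, best + step)
    for _ in range(iters):
        m1 = left + (right - left) / 3.0
        m2 = right - (right - left) / 3.0
        if f(m1) < f(m2):
            right = m2
        else:
            left = m1
    x = 0.5 * (left + right)
    return x if f(x) <= f(best) else best


def profiled_term(a, sigma_y, u, v, r):
    """Minimum over theta of one term of Eq. (55), with a = y - phi."""
    k = 1.0 + 1.0 / (2.0 * r**2)

    def f(theta):
        return ((a - theta) / sigma_y) ** 2 + k * math.log1p(2.0 * r**2 * (u - theta) ** 2 / v)

    # Both terms grow outside the interval between u and a, so the minimum lies inside it.
    return f(refine(f, min(a, u), max(a, u)))


def fit_average(y, sigma_y, u, v, r):
    """Value of mu minimising Eq. (55), profiled over all theta_i, for phi = mu."""
    def objective(mu):
        return sum(profiled_term(yi - mu, si, ui, vi, ri)
                   for yi, si, ui, vi, ri in zip(y, sigma_y, u, v, r))

    shifted = [yi - ui for yi, ui in zip(y, u)]
    return refine(objective, min(shifted), max(shifted))


def blue_average(y, sigma_y, u, v):
    """Least-squares average from Eq. (54), taking sigma_u^2 = v."""
    w = [1.0 / (si**2 + vi) for si, vi in zip(sigma_y, v)]
    return sum(wi * (yi - ui) for wi, yi, ui in zip(w, y, u)) / sum(w)


# Hypothetical data with one outlier; r = 0.5 for every measurement.
y = [9.5, 10.0, 10.5, 20.0]
ones, zeros = [1.0] * 4, [0.0] * 4
print(blue_average(y, ones, zeros, ones), fit_average(y, ones, zeros, ones, [0.5] * 4))
```

With these inputs the gamma-model average stays much closer to the three consistent measurements than the least-squares average does, in line with the reduced sensitivity to outliers described in the abstract.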
Nevertheless one can construct a statistic that plays the same role as the minimized $\chi^2(\boldsymbol{\mu})$ by considering the model in which the means $\varphi(x_i;\boldsymbol{\mu})$, which depend on the $M$ parameters of interest, are replaced by a vector of $N$ independent mean values, one for each of the measurements: $\boldsymbol{\varphi} = (\varphi_1,\ldots,\varphi_N)$. By requiring that the $\varphi_i$ are given by $\varphi(x_i;\boldsymbol{\mu})$ one imposes $N - M$ constraints and restricts the more general hypothesis to an $M$-dimensional subspace. One can then construct the likelihood ratio statistic

\[
q = -2\ln \frac{L'(\hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\theta}})}{L'(\hat{\boldsymbol{\varphi}}, \hat{\boldsymbol{\theta}})}, \tag{60}
\]

where the numerator contains the $M$ fitted parameters of interest $\hat{\boldsymbol{\mu}}$, and in the denominator one fits all $N$ of the $\varphi_i$. When fitting separate values of $\varphi_i$ and $\theta_i$ for each measurement (the saturated model), one can see from inspection that the maximized value of $\ln L'(\boldsymbol{\varphi}, \boldsymbol{\theta})$ is zero, and therefore the statistic $q$ becomes

\[
q = \min_{\boldsymbol{\mu}, \boldsymbol{\theta}} \sum_{i=1}^{N} \left[ \frac{(y_i - \varphi(x_i;\boldsymbol{\mu}) - \theta_i)^2}{\sigma_{y_i}^2} + \left( 1 + \frac{1}{2r_i^2} \right) \ln\left( 1 + 2r_i^2 \frac{(u_i - \theta_i)^2}{v_i} \right) \right]. \tag{61}
\]

According to Wilks' theorem ], in the limit where the estimators $\hat{\boldsymbol{\mu}}$ and $\hat{\boldsymbol{\theta}}$ are Gaussian distributed, $q$ will follow a chi-squared pdf for $N - M$ degrees of freedom. The statistic $q$ thus plays the same role as the minimized sum of squares $\chi^2_{\min}$ in the usual method of least squares.

In the case of Eq. (61), however, the chi-squared approximation is not exact. One can see this from the fact that the $v_i$ are gamma rather than Gaussian distributed; the Gaussian approximation holds only in the limit where the $r_i$ are sufficiently small. If all $r_i \to 0$, i.e., there is no uncertainty in the reported systematic errors, then the statistic $q$ reduces to the minimized sum of squares from the method of least squares or BLUE, namely,

\[
q = \sum_{i=1}^{N} \frac{(y_i - \varphi(x_i;\hat{\boldsymbol{\mu}}) - u_i)^2}{\sigma_{y_i}^2 + \sigma_{u_i}^2}. \tag{62}
\]

One can check in an example how closely the sampling distribution of $q$ follows a chi-squared distribution by generating measured values $y_i$, $u_i$, and $s_i$ according to the model described earlier, using the following parameter values: $\varphi_i = \mu =$ , $\sigma_{y_i} =$ , $\sigma_{u_i} =$ for all $i = 1,\ldots,N$. That is, the measurements are assumed to have the same mean and the goal is to fit this parameter. The resulting distributions of $q$ are shown in Fig. 5a, b for $N =$ and $N = 5$ using $r_i = 0.$ for all measurements. Overlaid on the histograms is the chi-squared pdf for $N - 1$ degrees of freedom. Although the agreement is reasonably good, there is still a noticeable departure from the asymptotic distribution in the tails. The same set of curves is shown in Fig. 5c, d for $r_i = 0.$, for which one sees an even greater discrepancy between the true (i.e., simulated) and asymptotic distributions.

Fig. 5 Distributions of the test variable $q$ for averages of $N =$ and 5 values using $r = 0.$ and $r = 0.$ (panels a to d; horizontal axis $q$, vertical axis $f(q)$; the $\chi^2$ pdf is overlaid in each panel)

One might need a p-value with an accuracy such that assumption of the asymptotic distribution of $q$ is not adequate. In such a case one can use Monte Carlo to determine the correct sampling distribution of $q$. Alternatively, following the procedure of Sect. 5. one can define a Bartlett-corrected statistic $q'$ as

\[
q' = \frac{N - M}{E[q]}\, q, \tag{63}
\]

so that by construction $E[q'] = N - M$ (in the example above for a single fitted parameter $M = 1$). Distributions of $q'$ corresponding to Fig. 5 are shown in Fig. 6, where the mean value $E[q]$ was itself found from Monte Carlo simulation. While one sees that the simulated distributions of $q'$ are in better agreement with the chi-squared pdf, visible discrepancies remain. And since here simulation was required to determine the Bartlett correction, one could use it as well to find the p-value directly. The Bartlett correction is nevertheless useful in such a situation because the number of simulated values of $q$ required to estimate $E[q]$ accurately may be much less than what one needs to find the upper tail area for a very high observed value of the test statistic.

Fig. 6 Distributions of the Bartlett-corrected test variable $q'$ for averages of $N =$ and 5 values using $r = 0.$ and $r = 0.$ (panels a to d; horizontal axis $q'$, vertical axis $f(q')$; the $\chi^2$ pdf is overlaid in each panel)

6.2 Averaging measurements

An important special case of a least-squares fit is the average of $N$ independent measurements, $y_1,\ldots,y_N$, of the same quantity, i.e., the fit function $\varphi(x;\mu) = \mu$ is in effect a horizontal line and the control variable $x$ does not enter. The expectation values of the measurements are thus

\[
E[y_i] = \mu + \theta_i, \quad i = 1,\ldots,N, \tag{64}
\]

where the parameter of interest $\mu$ represents the desired mean value and as before the $\theta_i$ are the bias parameters. As there is one parameter of interest, the statistic $q$ follows asymptotically a chi-squared distribution for $N - 1$ degrees of freedom, although as we have seen above this approximation breaks down as the $r_i$ increase.

As an example, consider the average of two independent measurements, nominally reported as $y_i \pm \sigma_{y_i} \pm s_i$ for $i = 1, 2$, in which the $\sigma_{y_i}$ represent the statistical uncertainties and the $s_i$ are the estimated systematic errors. Suppose here these are $\sigma_{y_i} = 1$ and $s_i = 1$ for both measurements, and that the analyst reports values $r_i$ representing the relative accuracy of the estimates of the systematic errors, which in this example we will take to be equal to a common value $r$. Furthermore suppose that the observed values of $y_1$ and $y_2$ are $10 + \delta$ and $10 - \delta$, respectively, and we will allow $\delta$ to vary. For the values of $\sigma_{y_i}$ and $s_i$ chosen in this example, the difference $y_1 - y_2 = 2\delta$ has a standard deviation of 2, so the value of $\delta$ corresponds to the significance of the discrepancy between $y_1$ and $y_2$ in standard deviations under assumption of $r = 0$.

Using the input values described above, the mean $\mu$, bias parameters $\theta_i$, and systematic errors $\sigma_{u_i}$ are adjusted to maximize the log-likelihood from Eq. (6). Figure 7 shows the half-width of the 68.3% confidence interval for $\mu$ as a function of the parameter $r$ for different levels of $\delta$. This interval corresponds to the standard deviation $\sigma_{\hat{\mu}}$ when the $r_i$ are all small, where the problem is the same as in least squares or BLUE. In Fig. 7a, the interval is based on Eq. (6), i.e., it is determined by the point where the profile log-likelihood drops by a fixed amount from its maximum (in Particle Physics often referred to as the MINOS interval ]). In Fig. 7b, the interval is found by solving for the value of $\mu$ where its p-value is $p = \alpha$, and here $\alpha = 1 - 0.683 = 0.317$. The p-value depends, however, on the assumed values of the nuisance parameters.
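The likelihood-based (MINOS-type) interval for the two-measurement average can be sketched numerically as follows. This is an illustrative sketch only, with several assumptions that are not taken from the text: the helper names are hypothetical, the control measurements are set to $u_i = 0$ with $v_i = s_i^2$, the central value 10.0 is a placeholder that does not affect the half-width, each $\theta_i$ is profiled by golden-section search (valid when the term is unimodal in $\theta_i$, which holds for $v_i/\sigma_{y_i}^2 > (1 + 2r_i^2)/8$), and the estimate $\hat{\mu}$ is taken at the centre by symmetry, which presumes a single maximum of $L'$.

```python
import math


def golden_min(f, lo, hi, iters=80):
    """Location of the minimum of a unimodal function f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = f(x1), f(x2)
    for _ in range(iters):
        if f1 < f2:
            hi, x2, f2 = x2, x1, f1
            x1 = hi - g * (hi - lo)
            f1 = f(x1)
        else:
            lo, x1, f1 = x1, x2, f2
            x2 = lo + g * (hi - lo)
            f2 = f(x2)
    return 0.5 * (lo + hi)


def neg2lnL_profile(mu, ys, sigma_y, us, vs, r):
    """-2 ln L' of Eq. (55) with phi = mu, profiled over every theta_i."""
    c = 1.0 + 1.0 / (2.0 * r**2)
    total = 0.0
    for y, u, v in zip(ys, us, vs):
        a = y - mu
        f = lambda t: (a - t) ** 2 / sigma_y**2 + c * math.log1p(2.0 * r**2 * (u - t) ** 2 / v)
        # The minimum in theta lies between the control value u and the residual a.
        total += f(golden_min(f, min(a, u), max(a, u)))
    return total


def minos_half_width(delta, r, center=10.0, sigma_y=1.0, s=1.0):
    """Upper half-width of the interval where -2 ln L' rises by 1 above its minimum."""
    ys = [center + delta, center - delta]
    us, vs = [0.0, 0.0], [s**2, s**2]
    g = lambda mu: neg2lnL_profile(mu, ys, sigma_y, us, vs, r)
    mu_hat = center                  # by symmetry (assumes a single maximum of L')
    target = g(mu_hat) + 1.0
    lo, hi = mu_hat, mu_hat + 1.0
    while g(hi) < target:            # step outward until the crossing is bracketed
        hi += 1.0
    for _ in range(60):              # bisection for g(mu) = g(mu_hat) + 1
        mid = 0.5 * (lo + hi)
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) - mu_hat


for delta in (0.0, 5.0):
    for r in (0.01, 0.2):
        print(f"delta = {delta:.0f}, r = {r}: half-width = {minos_half_width(delta, r):.3f}")
```

With these assumed inputs the half-width stays close to 1 for very small $r$, falls slightly below 1 for $\delta = 0$ at $r = 0.2$, and grows well above 1 for $\delta = 5$ at $r = 0.2$.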
Here we use the values of $\theta_i$ and $\sigma_{u_i}$ profiled at the value of $\mu$ tested. This technique is often called profile construction in Particle Physics ], where it is widely used, and elsewhere called hybrid resampling 3,]. The resulting confidence interval will have the correct coverage probability of $1 - \alpha$ if the nuisance parameters are equal to their profiled values; elsewhere the interval could under- or over-cover. Although the intervals from profile construction differ somewhat from those found directly from the log-likelihood, they have the same qualitative behaviour.

Fig. 7 Plots of the half-length of the 1-σ (68.3%) central confidence interval for the parameter $\mu$ as a function of the relative uncertainty on the systematic errors $r$ for different levels of discrepancy $\delta$ between two averaged measurements. Intervals are derived (a) from the log-likelihood and (b) using profile construction (see text). Curves are shown for $\delta = 0, 1, 2, 3, 4, 5$, with inputs $10 - \delta \pm 1 \pm 1$ and $10 + \delta \pm 1 \pm 1$

From Fig. 7 one can extract several interesting features. First, if $r$ is small, that is, the systematic errors $\sigma_{u_i}$ are very close to their estimated values $s_i$, then the interval's half-length is very close to the standard deviation of the estimator, $\sigma_{\hat{\mu}} = 1$, regardless of the level of discrepancy between the two measured values. Further, the effect of larger values of $r$ is seen to depend very much on the level of discrepancy between the measured values. If $y_1$ and $y_2$ are very close (e.g., $\delta = 0$ or 1), then the length of the confidence interval can even be reduced relative to the case of $r = 0$. If the measurements are in agreement at a level that is better than expected, given the reported statistical and systematic uncertainties, then one finds that the likelihood is maximized for values of the systematic errors $\sigma_{u_i}$ that are smaller than the initially estimated $s_i$. And as a consequence, the confidence interval for $\mu$ shrinks.

Finally, one can see that if the data are increasingly inconsistent, e.g., in Fig. 7 for $\delta \geq 2$, then the effect of allowing higher $r$ is to increase the length of the interval. This is also a natural consequence of the assumed model, whereby an observed level of heterogeneity greater than what was initially estimated results in maximizing the likelihood for larger values of $\sigma_{u_i}$ and consequently an increased confidence interval size.

The coverage properties of the intervals for the example of averaging two measurements are investigated by generating data values $y_i$ for $i = 1, 2$ according to a Gaussian with a common mean (here 10) and standard deviations both $\sigma_{y_i} = 1$, and the $u_i$ are generated according to a Gaussian distribution with mean $\theta_i = 0$ and standard deviation $\sigma_{u_i} = 1$.
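The generation of toy data for such a coverage study can be sketched as below. This is a hedged illustration rather than the procedure actually used: the function name `generate_toy` and the default $r = 0.2$ are assumptions, and the gamma parameters are taken to be $\alpha = 1/(4r^2)$ and $\beta = 1/(4r^2\sigma_u^2)$, which give $E[v] = \sigma_u^2$ and a relative standard deviation of $2r$ for $v$. These relations are assumed here because the equations that define $\alpha$ and $\beta$ lie outside this section.

```python
import math
import random


def generate_toy(rng, n_meas=2, mu=10.0, sigma_y=1.0, sigma_u=1.0, r=0.2):
    """Generate one toy data set (y_i, u_i, v_i) with all bias parameters theta_i = 0."""
    alpha = 1.0 / (4.0 * r**2)                # gamma shape (assumed relation)
    beta = 1.0 / (4.0 * r**2 * sigma_u**2)    # gamma rate (assumed relation)
    # With theta_i = 0 the measurements have E[y_i] = mu and E[u_i] = 0.
    ys = [rng.gauss(mu, sigma_y) for _ in range(n_meas)]
    us = [rng.gauss(0.0, sigma_u) for _ in range(n_meas)]
    # random.gammavariate takes the shape and the *scale*, so the scale is 1/beta.
    vs = [rng.gammavariate(alpha, 1.0 / beta) for _ in range(n_meas)]
    return ys, us, vs


rng = random.Random(12345)
toys = [generate_toy(rng) for _ in range(20000)]

# Check the first two moments of the generated variance estimates v_i.
v_all = [v for _, _, vs in toys for v in vs]
mean_v = sum(v_all) / len(v_all)
std_v = math.sqrt(sum((v - mean_v) ** 2 for v in v_all) / (len(v_all) - 1))
print(f"mean of v = {mean_v:.3f}, std(v) / (2 mean) = {std_v / (2.0 * mean_v):.3f}")
```

Each toy data set would then be passed to the interval construction, and the coverage probability estimated as the fraction of toys whose interval contains the true mean.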
The values $v_i$ are gamma distributed with parameters $\alpha_i$ and $\beta_i$ given by Eqs. () and () so as to correspond to $\sigma_{u_i} = 1$ and for different values of the parameters $r_i$, taken here to be the same for both measurements.

Figure 8 shows the coverage probability for the interval with nominal confidence level 68.3% based on the log-likelihood (the MINOS interval) and also using profile construction (hybrid resampling), as a function of the $r$ parameter. As seen in the figure, the coverage probability approximates the nominal value reasonably well out to $r = 0.5$, where one finds $P_{\rm cov} = 0.63$ and for MINOS and profile construction, respectively; at $r =$ , the corresponding values are 0.56 and 0.67 (the Monte Carlo statistical errors for all values are around 0.005). Thus reasonable agreement is found with both methods but one should be aware that the coverage probability may depart from the nominal value for large values of $r$.

Fig. 8 The coverage probability $P_{\rm cov}$ of the intervals based on the likelihood (MINOS method) and on profile construction (hybrid resampling) as a function of the parameter $r$, compared with the nominal confidence level (see text)

6.3 Sensitivity to outliers

One of the important properties of the error model used in this paper is that curves fitted to data become less sensitive to points that depart significantly from the fitted curve (outliers) as the $r$ parameters of the measurements are increased. This is a well-known feature of models based on the Student's t distribution (see, e.g., Ref. ]).

The reduced sensitivity to outliers is illustrated in Fig. 9 for the case of averaging five measurements of the same quantity (i.e., the fit of a horizontal line). All measured values are assigned $\sigma_y$ and $s$ equal to 1.0, and in Fig. 9a, c they are all fairly close to the central value of 10. In Fig. 9b, d the middle point is at 20.

Fig. 9 Result of averaging five quantities: (a) no outlier, $r = 0.01$; (b) with outlier, $r = 0.01$; (c) no outlier, $r = 0.2$; (d) with outlier, $r = 0.2$. Also indicated on the plots are the values of the Bartlett-corrected goodness-of-fit statistic $q'$ and the corresponding p-value. Fitted values shown in the panels: (a) $\hat{\mu} = 10.00 \pm 0.63$; (b) $\hat{\mu} = 12.00 \pm 0.63$; (c) $\hat{\mu} = 10.00 \pm 0.65$; (d) $\hat{\mu} = 10.75 \pm 0.78$

In the top two plots, the $r$ parameters for all measurements are taken to be $r = 0.01$, which is very close to what would be obtained with an ordinary least-squares fit. In (a) the average is 10.00; in (b) the outlier causes the fitted mean to move to 12.00. In both cases the half-width of the confidence interval is 0.63. In the lower two plots, (c) and (d), all of the points are assigned $r = 0.2$, i.e., a 20% relative uncertainty on the systematic error. In the case with no outlier, (c), the estimated mean stays at 10.00, and the half-width of the confidence interval only increases a small amount to 0.65. With the outlier in (d), the fitted mean is 10.75 with an interval half-width of 0.78. That is, the amount by which the outlier pulls the estimated mean away from the value preferred by the other points (10.00) is substantially less than with $r = 0.01$ (fitted mean 12.00). Furthermore, the lower compatibility between the measurements results in a confidence interval that is larger than without the outlier (half-width 0.78 rather than 0.65). When the $r_i$ are small, however, the interval size is independent of the goodness of fit. Both the increase in the size of the confidence interval and the decrease in sensitivity to the outlier represent important improvements in the inference.

It is important to note that the above-mentioned properties pertain to the case where each measurement has its own bias parameter $\theta_i$ with its own $r_i$.
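The reduced pull of an outlier can be illustrated with the following sketch, which averages five values with the gamma error model and compares the result with the equal-weight least-squares mean. Several ingredients are assumptions made for illustration: the data values 8, 9, 10, 11, 12 (with the middle point moved to 20 for the outlier case) are hypothetical, since the individual measurements are not listed in the text; the control measurements are set to $u_i = 0$ with $v_i = s_i^2 = 1$; the helper names are invented; and the minimization uses a grid scan plus golden-section refinement rather than the closed-form cubic.

```python
import math


def golden_min(f, lo, hi, iters=80):
    """Location of the minimum of a unimodal function f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = f(x1), f(x2)
    for _ in range(iters):
        if f1 < f2:
            hi, x2, f2 = x2, x1, f1
            x1 = hi - g * (hi - lo)
            f1 = f(x1)
        else:
            lo, x1, f1 = x1, x2, f2
            x2 = lo + g * (hi - lo)
            f2 = f(x2)
    return 0.5 * (lo + hi)


def neg2lnL_profile(mu, ys, sigma_y, s, r):
    """-2 ln L' of Eq. (55) for phi = mu, u_i = 0, v_i = s**2, profiled over each theta_i."""
    c = 1.0 + 1.0 / (2.0 * r**2)
    total = 0.0
    for y in ys:
        a = y - mu
        f = lambda t: (a - t) ** 2 / sigma_y**2 + c * math.log1p(2.0 * r**2 * t**2 / s**2)
        total += f(golden_min(f, min(a, 0.0), max(a, 0.0)))
    return total


def gamma_model_mean(ys, sigma_y=1.0, s=1.0, r=0.2, n_grid=400):
    """Estimate of mu: coarse grid scan over the data range, then local refinement."""
    g = lambda mu: neg2lnL_profile(mu, ys, sigma_y, s, r)
    lo, hi = min(ys), max(ys)
    step = (hi - lo) / n_grid
    best = min((lo + k * step for k in range(n_grid + 1)), key=g)
    return golden_min(g, best - step, best + step)


def least_squares_mean(ys):
    """Ordinary least-squares average for equal total uncertainties."""
    return sum(ys) / len(ys)


no_outlier = [8.0, 9.0, 10.0, 11.0, 12.0]      # hypothetical data
with_outlier = [8.0, 9.0, 20.0, 11.0, 12.0]    # middle point moved to 20

ls_out = least_squares_mean(with_outlier)
gm_no = gamma_model_mean(no_outlier)
gm_out = gamma_model_mean(with_outlier)
print(f"least squares with outlier: {ls_out:.2f}")
print(f"gamma model (r = 0.2) without outlier: {gm_no:.2f}, with outlier: {gm_out:.2f}")
```

For these assumed inputs the least-squares mean moves to 12, while the gamma-model estimate with $r = 0.2$ is pulled much less, to a value between 10 and 11.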
It might appear that one would obtain a result roughly equivalent to that of the proposed model by using the ordinary least-squares approach, i.e., the log-likelihood of Eq. (53), and simply making the replacement $\sigma_{u_i} \to \sigma_{u_i}(1 + r_i)$. In the example shown above with all $r_i = 0.2$, however, the result is $\hat{\mu} = 10.00 \pm 0.70$ without the outlier (middle data point at 10) and $\hat{\mu} = 12.00 \pm 0.70$ if the middle point is moved to 20. So by inflating the systematic errors but still using least squares, one increases the size of the confidence interval by an amount that does not depend on the goodness of fit, and the sensitivity to outliers is not improved.

7 Treatment of correlated uncertainties

The phrase correlated systematic uncertainties is often taken to mean the situation where a nuisance parameter affects multiple measurements in a coherent way. Suppose, for example, that the expectation values $E[y_i]$ of measured quantities $y_i$ with $i = 1,\ldots,L$ are functions $\varphi_i(\boldsymbol{\mu}, \boldsymbol{\theta})$ of parameters of interest $\boldsymbol{\mu} = (\mu_1,\ldots,\mu_M)$ and nuisance parameters $\boldsymbol{\theta} = (\theta_1,\ldots,\theta_N)$. Suppose further that the nuisance parameters are defined such that for $\boldsymbol{\theta} = 0$ the $y_i$ are unbiased measurements of the nominal model $\varphi_i(\boldsymbol{\mu})$. Expanding $\varphi_i$ to first order in $\boldsymbol{\theta}$ therefore gives

\[
E[y_i] = \varphi_i(\boldsymbol{\mu}, \boldsymbol{\theta}) \approx \varphi_i(\boldsymbol{\mu}) + \sum_{j=1}^{N} R_{ij}\theta_j, \tag{65}
\]

where the factors $R_{ij} = \partial\varphi_i/\partial\theta_j|_{\boldsymbol{\theta}=0}$ determine how much $\theta_j$ biases the measurement $y_i$. Suppose that the $R_{ij}$ are known, either from symmetry (e.g., a particular $\theta_j$ could be known to contribute equally to all of the $y_i$) or because they are determined using a Monte Carlo simulation.

As before suppose one has a set of independent Gaussian-distributed control measurements $u_j$ used to constrain the nuisance parameters, with mean values $\theta_j$ and standard deviations $\sigma_{u_j}$. One can define the total bias of measurement $y_i$ as

\[
b_i = \sum_{j=1}^{N} R_{ij}\theta_j, \tag{66}
\]

and an estimator for $b_i$ is

\[
\hat{b}_i = \sum_{j=1}^{N} R_{ij}u_j. \tag{67}
\]

These estimators of the biases are correlated. As the control measurements are assumed independent, and therefore $\mathrm{cov}[u_k, u_l] = V[u_k]\delta_{kl}$, the covariance of the bias estimators is

\[
U_{ij} = \mathrm{cov}[\hat{b}_i, \hat{b}_j] = \sum_{k=1}^{N} R_{ik}R_{jk}V[u_k]. \tag{68}
\]

It is in the sense described here that the proposed model is capable of treating correlated systematic uncertainties. That is, although the control measurements $u_i$ are independent, they result in a nondiagonal covariance for the estimated biases of the measurements. The matrix $U_{ij}$ is shown here only to illustrate how correlated bias estimates can be related to independent control measurements, and it is not explicitly needed in the type of analysis described here.

The full likelihood can be constructed from the measurements $y_i$ together with their expectation values given by Eq. (65), where the $R_{ij}$ are assumed known. That is, in the log-likelihood of Eqs. (53) or (55) the terms $y_i - \varphi(x_i;\boldsymbol{\mu}) - \theta_i$ are replaced by $y_i - \varphi_i(\boldsymbol{\mu}) - \sum_{j=1}^{N} R_{ij}\theta_j$. If the variances $\sigma_{u_i}^2$ of the control measurements $u_i$ are themselves uncertain, then they are treated as adjustable parameters with independent gamma-distributed estimates.
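The way independent control measurements lead to correlated bias estimates, Eq. (68), can be made concrete with a short sketch. The function name `bias_covariance` and the numerical values of $R_{ij}$ and $V[u_k]$ below are hypothetical and chosen only for illustration: three measurements share a first nuisance parameter equally and are shifted in opposite directions by a second one.

```python
def bias_covariance(R, var_u):
    """U_ij = sum_k R_ik R_jk V[u_k] (Eq. 68) for independent control measurements."""
    n_meas = len(R)
    n_nuis = len(var_u)
    return [
        [sum(R[i][k] * R[j][k] * var_u[k] for k in range(n_nuis)) for j in range(n_meas)]
        for i in range(n_meas)
    ]


# Hypothetical sensitivities R_ij: rows are measurements, columns are nuisance parameters.
R = [
    [1.0, 0.5],
    [1.0, 0.0],
    [1.0, -0.5],
]
var_u = [0.04, 1.0]  # hypothetical variances V[u_k] of the control measurements

U = bias_covariance(R, var_u)
for row in U:
    print("  ".join(f"{x:6.2f}" for x in row))
```

The off-diagonal elements are nonzero even though the $u_k$ are independent, because each nuisance parameter enters more than one measurement.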
8 Discussion and conclusions

The statistical model proposed here can be applied in a wide variety of analyses where the standard deviations of Gaussian measurements are deemed to have a given relative uncertainty, reflected by the parameters $r_i$ defined in Eq. (9). The quadratic constraint terms connecting control measurements to their corresponding nuisance parameters that appear in the log-likelihood are replaced by logarithmic terms [cf. Eqs. (3) and (8)]. The resulting model is equivalent to taking a Student's t distribution for the control measurements, with the number of degrees of freedom given by $\nu = 1/(2r^2)$.

It is not uncommon for systematic errors, especially those related to theoretical uncertainties, to be uncertain themselves to several tens of percent. The model presented here allows such uncertainties to be taken into account and it has been shown that this has interesting and useful consequences for the resulting inference. Confidence intervals are found to increase in size if the goodness of fit is poor and can decrease slightly if the data are more internally consistent than expected, given the level of statistical fluctuation assumed in the model. Averages and fitted curves become less sensitive to outliers.

If the relative uncertainty on the systematic errors is large enough ($r$ greater than around 0. in the examples studied), then the sampling distribution of likelihood-ratio test statistics starts to depart from the asymptotic chi-squared form. Thus one cannot in general apply asymptotic results for p-values and confidence intervals without taking some care to ensure their validity. In some cases Bartlett-corrected statistics can be used; alternatively one may need to determine the relevant distributions by Monte Carlo simulation.

In reporting results that use the procedure presented here it is important to communicate all of the $r_i$ parameters. To allow for combinations with other measurements one should ideally report the full likelihood, including the $r_i$ values, to permit a consistent treatment of uncertainties common to several of the measurements.
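The stated equivalence with a Student's t distribution can be checked directly against the logarithmic term of Eq. (55). The short derivation below is a sketch that assumes the standard form of the Student's t density; the scaled variable $t_i = (u_i - \theta_i)/\sqrt{v_i}$ is introduced here for illustration only.

\[
f(t_i|\nu) \propto \left( 1 + \frac{t_i^2}{\nu} \right)^{-(\nu+1)/2}
\quad\Longrightarrow\quad
-2\ln f(t_i|\nu) = (\nu + 1)\ln\left( 1 + \frac{t_i^2}{\nu} \right) + \mathrm{const}.
\]

Setting $\nu = 1/(2r_i^2)$ gives $\nu + 1 = 1 + 1/(2r_i^2)$ and $1/\nu = 2r_i^2$, so that

\[
-2\ln f(t_i|\nu) = \left( 1 + \frac{1}{2r_i^2} \right) \ln\left( 1 + 2r_i^2 \frac{(u_i - \theta_i)^2}{v_i} \right) + \mathrm{const},
\]

which is the logarithmic term of Eq. (55). In the limit $r_i \to 0$ ($\nu \to \infty$) it tends to the quadratic term $(u_i - \theta_i)^2/v_i$.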
The point of view taken here has been that the analyst must determine reasonable values for the relative uncertainties in the systematic errors. One should not, for example, decide to use the proposed model only if the goodness of fit is found to be poor. Rather, the $r$ parameters should reflect the accuracy with which the systematic variances have been estimated, and the resulting inference about the parameters of interest then incorporates this knowledge in a manner that is valid for any data outcome.

An alternative mentioned here as a possibility would be to fit a common relative uncertainty to all systematic errors (a global $r$), e.g., when averaging a set of numbers for which no $r$ values have been reported. This is analogous to the scale-factor procedure used by the Particle Data Group 9] or the method of DerSimonian and Laird 7] widely used in meta-analysis.
