Goodness of fit and Wilks' theorem

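DRAFT 0.0, Glen Cowan, 3 June 2013

Suppose we model data $\mathbf{y}$ with a likelihood $L(\boldsymbol{\mu})$ that depends on a set of $N$ parameters $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_N)$. Define the statistic

$$ t_{\boldsymbol{\mu}} = -2 \ln \frac{L(\boldsymbol{\mu})}{L(\hat{\boldsymbol{\mu}})} , \tag{1} $$

where $\hat{\boldsymbol{\mu}}$ are the ML estimators for $\boldsymbol{\mu}$. The value of $t_{\boldsymbol{\mu}}$ is a measure of how well the hypothesized set of parameters $\boldsymbol{\mu}$ stands in agreement with the data. If the agreement is poor, then $\hat{\boldsymbol{\mu}}$ will be far from $\boldsymbol{\mu}$, the ratio of likelihoods will be low and $t_{\boldsymbol{\mu}}$ will be large. Larger values of $t_{\boldsymbol{\mu}}$ thus indicate increasing incompatibility between the data and the hypothesized $\boldsymbol{\mu}$.

According to Wilks' theorem, if the parameter values $\boldsymbol{\mu}$ are true, then in the asymptotic limit of a large data sample, the pdf of $t_{\boldsymbol{\mu}}$ is a chi-square distribution for $N$ degrees of freedom. We will write this as

$$ f(t_{\boldsymbol{\mu}} | \boldsymbol{\mu}) \sim \chi^2_N . \tag{2} $$

Suppose we have a data set that gives us an observed value of the statistic, $t_{\boldsymbol{\mu},\mathrm{obs}}$. We can quantify the level of compatibility between $\boldsymbol{\mu}$ and the observed data by computing the p-value

$$ p_{\boldsymbol{\mu}} = \int_{t_{\boldsymbol{\mu},\mathrm{obs}}}^{\infty} f_{\chi^2_N}(t_{\boldsymbol{\mu}} | \boldsymbol{\mu}) \, dt_{\boldsymbol{\mu}} . \tag{3} $$

Now suppose that the set of parameters $\boldsymbol{\mu}$ can be expressed as $\boldsymbol{\mu}(\boldsymbol{\theta})$, where $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_M)$ is a set of $M$ parameters with $M < N$. Now define

$$ q_{\boldsymbol{\mu}} = -2 \ln \frac{L(\boldsymbol{\mu}(\hat{\boldsymbol{\theta}}))}{L(\hat{\boldsymbol{\mu}})} . \tag{4} $$

That is, in the numerator we adjust $M$ parameters and in the denominator $N$. In this case, Wilks' theorem states

$$ f(q_{\boldsymbol{\mu}} | \boldsymbol{\mu}(\boldsymbol{\theta})) \sim \chi^2_{N-M} . \tag{5} $$

Provided certain regularity conditions are satisfied, this holds regardless of the value of $\boldsymbol{\theta}$. This is a very useful property that allows one to compute p-values without needing to assume particular values for the parameters $\boldsymbol{\theta}$. In this case the p-value reflects the compatibility of the assumed functional form $\boldsymbol{\mu}(\boldsymbol{\theta})$.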
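The p-value of Eq. (3) is the upper-tail probability of a chi-square distribution, and the following is a minimal sketch of how it might be evaluated using only the Python standard library. The function names `chi2_sf` and `p_value` and the numerical inputs are hypothetical choices made for illustration; they do not come from the note. The sketch assumes an integer number of degrees of freedom and relies on the standard recurrence for the regularized upper incomplete gamma function, $Q(a+1, z) = Q(a, z) + z^a e^{-z} / \Gamma(a+1)$, starting from the closed forms for one and two degrees of freedom.

```python
import math


def chi2_sf(x, ndof):
    """Upper-tail probability P(chi2 >= x) for an integer ndof >= 1."""
    if ndof < 1:
        raise ValueError("ndof must be a positive integer")
    if x <= 0.0:
        return 1.0
    z = 0.5 * x
    # Closed-form starting points: ndof = 2 (a = 1) or ndof = 1 (a = 1/2).
    if ndof % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1),
    # applied until a reaches ndof / 2.
    for _ in range((ndof - 1) // 2):
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def p_value(t_obs, n_params):
    """Eq. (3): p-value of the hypothesized mu, assuming Wilks' theorem holds."""
    return chi2_sf(t_obs, n_params)


# Hypothetical inputs, for illustration only.
t_obs = 12.3
N = 5
print(f"p-value for t_obs = {t_obs}, N = {N}: {p_value(t_obs, N):.4f}")
```

The same function gives the p-value for the statistic of Eq. (4) if it is called with $N - M$ in place of $N$. Where SciPy is available, `scipy.stats.chi2.sf` would serve the same purpose.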

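1 Gaussian data

Suppose that the data are a set of $N$ independent Gaussian distributed values,

$$ y_i \sim \mathrm{Gauss}(\mu_i, \sigma_i) , \quad i = 1, \ldots, N , \tag{6} $$

where the standard deviations $\sigma_i$ are known but the $\mu_i$ must be determined from the data. The likelihood is

$$ L(\boldsymbol{\mu}) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma_i} \, e^{-(y_i - \mu_i)^2 / 2\sigma_i^2} , \tag{7} $$

so that the log-likelihood is

$$ \ln L(\boldsymbol{\mu}) = -\frac{1}{2} \sum_{i=1}^{N} \frac{(y_i - \mu_i)^2}{\sigma_i^2} + C , \tag{8} $$

where $C$ does not depend on $\boldsymbol{\mu}$. By setting the derivatives of $\ln L(\boldsymbol{\mu})$ with respect to the $\mu_i$ to zero we find the ML estimators to be

$$ \hat{\mu}_i = y_i , \tag{9} $$

and from this we find

$$ t_{\boldsymbol{\mu}} = -2 \ln \frac{L(\boldsymbol{\mu})}{L(\hat{\boldsymbol{\mu}})} = \sum_{i=1}^{N} \frac{(y_i - \mu_i)^2}{\sigma_i^2} . \tag{10} $$

In the case where $M$ parameters $\theta_1, \ldots, \theta_M$ are fitted, the statistic $q_{\boldsymbol{\mu}}$ is

$$ q_{\boldsymbol{\mu}} = -2 \ln \frac{L(\boldsymbol{\mu}(\hat{\boldsymbol{\theta}}))}{L(\hat{\boldsymbol{\mu}})} = \sum_{i=1}^{N} \frac{(y_i - \mu_i(\hat{\boldsymbol{\theta}}))^2}{\sigma_i^2} . \tag{11} $$

Thus we can use the minimized value of the sum of squares from an LS fit to test the goodness of fit. In such a case the values of $\mu_i$ are obtained by assuming a functional relation between $\mu$ and a control variable $x$, whose value is fixed for each measurement of $y$. That is,

$$ \mu_i(\boldsymbol{\theta}) = \mu(x_i; \boldsymbol{\theta}) , \quad i = 1, \ldots, N . \tag{12} $$

The p-value therefore reflects the degree of compatibility between the data and the functional form $\mu(x; \boldsymbol{\theta})$.

2 Histogram of Poisson or multinomial data

Consider now a set of data values $\mathbf{n} = (n_1, \ldots, n_N)$, which we may think of as a histogram with $N$ bins. Suppose the $n_i$ are independent and Poisson distributed with mean values $\nu_i$, so that the joint probability for the vector $\mathbf{n}$ is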

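$$ P(\mathbf{n}; \boldsymbol{\nu}) = \prod_{i=1}^{N} \frac{\nu_i^{n_i}}{n_i!} \, e^{-\nu_i} . \tag{13} $$

The log-likelihood is therefore

$$ \ln L(\boldsymbol{\nu}) = \sum_{i=1}^{N} \left[ n_i \ln \nu_i - \nu_i \right] + C , \tag{14} $$

where $C$ represents terms that do not depend on $\boldsymbol{\nu}$. If we regard each of the $\nu_i$ as adjustable, then by setting the derivatives of $\ln L(\boldsymbol{\nu})$ with respect to all of the $\nu_i$ to zero we find the ML estimators

$$ \hat{\nu}_i = n_i , \quad i = 1, \ldots, N . \tag{15} $$

Using this we can write down the statistic analogous to Eq. (1),

$$ t_{\boldsymbol{\nu}} = -2 \ln \frac{L(\boldsymbol{\nu})}{L(\hat{\boldsymbol{\nu}})} \tag{16} $$

$$ \phantom{t_{\boldsymbol{\nu}}} = -2 \sum_{i=1}^{N} \left[ n_i \ln \frac{\nu_i}{\hat{\nu}_i} - \nu_i + \hat{\nu}_i \right] \tag{17} $$

$$ \phantom{t_{\boldsymbol{\nu}}} = -2 \sum_{i=1}^{N} \left[ n_i \ln \frac{\nu_i}{n_i} - \nu_i + n_i \right] , \tag{18} $$

where in the final line we used $\hat{\nu}_i = n_i$. By going back to the original Poisson probabilities one can see that if $n_i = 0$, then the logarithmic term in Eq. (16) is in fact absent.

As with the statistic $t_{\boldsymbol{\mu}}$ from above, Wilks' theorem says that the distribution of $t_{\boldsymbol{\nu}}$ approaches a chi-square distribution for $N$ degrees of freedom in the limit of a large data sample. Here one can see the role of the large sample limit, since then the estimators $\hat{\nu}_i = n_i$ become approximately Gaussian distributed.

Now suppose that the set of $N$ mean values $\boldsymbol{\nu}$ can be determined through a set of $M$ parameters $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_M)$. We can then define the statistic

$$ q_{\boldsymbol{\nu}} = -2 \ln \frac{L(\boldsymbol{\nu}(\hat{\boldsymbol{\theta}}))}{L(\hat{\boldsymbol{\nu}})} = -2 \sum_{i=1}^{N} \left[ n_i \ln \frac{\nu_i(\hat{\boldsymbol{\theta}})}{n_i} - \nu_i(\hat{\boldsymbol{\theta}}) + n_i \right] . \tag{19} $$

As with the statistic $q_{\boldsymbol{\mu}}$ above, this will follow a chi-square distribution for $N - M$ degrees of freedom.

In some problems one may want to model a histogram of values $\mathbf{n} = (n_1, \ldots, n_N)$ as following a multinomial distribution. This is similar to the Poisson case above except that the total number of entries,

$$ n_{\mathrm{tot}} = \sum_{i=1}^{N} n_i , \tag{20} $$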

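is regarded as constant. There are in effect $N - 1$ free parameters in the problem, which can be taken as all but one of the probabilities $\mathbf{p} = (p_1, \ldots, p_N)$ for an event to be in one of the $N$ bins. One of the $p_i$ is fixed from the constraint

$$ \sum_{i=1}^{N} p_i = 1 . \tag{21} $$

The multinomial distribution for $\mathbf{n}$ is

$$ P(\mathbf{n} | \mathbf{p}, n_{\mathrm{tot}}) = \frac{n_{\mathrm{tot}}!}{n_1! \, n_2! \cdots n_N!} \, p_1^{n_1} p_2^{n_2} \cdots p_N^{n_N} . \tag{22} $$

Since $n_{\mathrm{tot}}$ is fixed, we can regard the parameters to be $\nu_i = p_i n_{\mathrm{tot}}$. The log-likelihood function is then

$$ \ln L(\boldsymbol{\nu}) = \sum_{i=1}^{N} n_i \ln \frac{\nu_i}{n_{\mathrm{tot}}} + C . \tag{23} $$

As in the Poisson case the ML estimators for the $\nu_i$ are found to be $\hat{\nu}_i = n_i$, so the statistic $t_{\boldsymbol{\nu}}$ then becomes

$$ t_{\boldsymbol{\nu}} = -2 \sum_{i=1}^{N} n_i \ln \frac{\nu_i}{n_i} . \tag{24} $$

That is, it is the same as in the Poisson case but without the terms $-\nu_i + n_i$. Because here there are only $N - 1$ fitted parameters (one of the $\hat{\nu}_i$ can be determined from $n_{\mathrm{tot}}$ minus the sum of the rest), Wilks' theorem says that $t_{\boldsymbol{\nu}}$ follows a chi-square distribution for $N - 1$ degrees of freedom.

If the $N$ mean values $\boldsymbol{\nu}$ are determined from $M$ parameters $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_M)$, then the distribution of the corresponding $q_{\boldsymbol{\nu}}$,

$$ q_{\boldsymbol{\nu}} = -2 \sum_{i=1}^{N} n_i \ln \frac{\nu_i(\hat{\boldsymbol{\theta}})}{n_i} , \tag{25} $$

is a chi-square distribution for $N - M - 1$ degrees of freedom.

Now suppose that instead of evaluating the $\nu_i$ terms in Eqs. (19) and (25) with the ML estimators for $\boldsymbol{\theta}$, we write the corresponding quantities as a function of $\boldsymbol{\theta}$, i.e.,

$$ \chi^2_{\mathrm{M}}(\boldsymbol{\theta}) = -2 \sum_{i=1}^{N} n_i \ln \frac{\nu_i(\boldsymbol{\theta})}{n_i} , \tag{26} $$

$$ \chi^2_{\mathrm{P}}(\boldsymbol{\theta}) = -2 \sum_{i=1}^{N} \left[ n_i \ln \frac{\nu_i(\boldsymbol{\theta})}{n_i} - \nu_i(\boldsymbol{\theta}) + n_i \right] , \tag{27} $$

where the subscripts M and P refer to the multinomial or Poisson cases, respectively. These expressions are equal to the corresponding values of $-2 \ln L(\boldsymbol{\theta})$, up to an additive constant that does not depend on $\boldsymbol{\theta}$. So to maximize the likelihood one can simply minimize $\chi^2_{\mathrm{P}}(\boldsymbol{\theta})$ or $\chi^2_{\mathrm{M}}(\boldsymbol{\theta})$, and the same ML estimators $\hat{\boldsymbol{\theta}}$ will result.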
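As an illustration of minimizing the Poisson expression in Eq. (27), the sketch below fits a single parameter (so $M = 1$) to a small histogram and converts the minimized value into a p-value for $N - M$ degrees of freedom. Everything specific here is an assumption made for the example and not part of the note: the bin counts, the fixed bin fractions `shape`, the model $\nu_i(\theta) = \theta s_i$, the function names, and the use of a golden-section search as a simple stand-in for a general-purpose minimizer. The logarithmic term is skipped for bins with $n_i = 0$, as described for the Poisson case.

```python
import math


def chi2_poisson(n, nu):
    """Eq. (27): -2 * sum[n_i ln(nu_i / n_i) - nu_i + n_i] for nu_i > 0."""
    total = 0.0
    for n_i, nu_i in zip(n, nu):
        term = n_i - nu_i
        if n_i > 0:  # the logarithmic term is absent when n_i = 0
            term += n_i * math.log(nu_i / n_i)
        total += term
    return -2.0 * total


def chi2_sf(x, ndof):
    """Upper-tail chi-square probability for an integer ndof >= 1."""
    if x <= 0.0:
        return 1.0
    z = 0.5 * x
    if ndof % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    for _ in range((ndof - 1) // 2):
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def golden_section_min(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


# Hypothetical histogram and model: nu_i(theta) = theta * shape_i.
n = [12, 18, 25, 19, 9]
shape = [0.15, 0.22, 0.28, 0.22, 0.13]


def objective(theta):
    return chi2_poisson(n, [theta * s for s in shape])


theta_hat = golden_section_min(objective, 1.0, 500.0)
chi2_min = objective(theta_hat)
ndof = len(n) - 1  # N - M with M = 1 fitted parameter
p_value = chi2_sf(chi2_min, ndof)
print(f"theta_hat = {theta_hat:.3f}, chi2_min = {chi2_min:.3f}, p = {p_value:.3f}")
```

For this particular model the minimum can also be found analytically as $\hat{\theta} = \sum_i n_i / \sum_i s_i$, which gives a quick check on the numerical search. For models with several parameters a multidimensional minimizer would be needed in place of the one-dimensional search.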

As an added bonus, however, the value of the minimized function can be used directly for a test of the goodness of fit, and to the extent that Wilks' theorem is satisfied, its sampling distribution is a chi-square distribution for $N - M$ (Poisson) or $N - M - 1$ (multinomial) degrees of freedom.

References

[1] S.S. Wilks, The large-sample distribution of the likelihood ratio for testing composite hypotheses, Ann. Math. Statist. 9 (1938) 60-62.

[2] G. Cowan, Statistical Data Analysis, Oxford University Press, 1998.

[3] Steve Baker and Robert D. Cousins, Clarification of the use of the chi-square and likelihood functions in fits to histograms, NIM 221 (1984) 437.