Maximum Likelihood Methods (Hogg Chapter Six)
STAT 406: Mathematical Statistics II
Spring Semester 2016

Contents

0 Administrata
1 Maximum Likelihood Estimation
  1.1 Maximum Likelihood Estimates
    1.1.1 Motivation
    1.1.2 Example: Signal Amplitude
    1.1.3 Example: Bradley-Terry Model
    1.1.4 Reparameterization
  1.2 Fisher Information and the Cramér-Rao Bound
    1.2.1 Fisher Information
    1.2.2 The Cramér-Rao Bound
2 Maximum-Likelihood Tests
  2.1 Likelihood Ratio Test
    2.1.1 Example: Gaussian Distribution
  2.2 Wald Test
    2.2.1 Example: Gaussian Distribution
  2.3 Rao Scores Test
    2.3.1 Example: Gaussian Distribution
3 Multi-Parameter Methods
  3.1 Maximum Likelihood Estimation
  3.2 Fisher Information Matrix
    3.2.1 Example: Normal Distribution
    3.2.2 Fisher Information Matrix for a Random Sample
    3.2.3 Error Estimation
  3.3 Maximum Likelihood Tests
    3.3.1 Example: Mean of Normal Distribution with Unknown Variance
    3.3.2 Example: Mean of Multivariate Normal Distribution with Unit Variance
    3.3.3 Large-Sample Limit

Copyright 2016, John T. Whelan, and all that

Thursday 18 February 2016
Read Section 6.1 of Hogg

0 Administrata

Effects of snow day: homeworks due on Thursdays through Spring Break. First prelim exam now Thursday, March 3. Shift
back to normal after Spring Break.

1 Maximum Likelihood Estimation

1.1 Maximum Likelihood Estimates

We now return to methods of classical, or frequentist, statistics, which are phrased not in terms of probabilities which quantify our confidence in general propositions with unknown truth values, but exclusively in terms of the frequency of outcomes in hypothetical repetitions of experiments. One of the fundamental tools is the sampling distribution, which we'll write as a joint pdf for a random vector X: f_X(x). If the model has a parameter θ, we'll write this as f_X(x; θ), and when we consider it as a function of θ, we'll write it L(θ; x) or sometimes L(θ). And we will often find it convenient to work with the logarithm l(θ) = ln L(θ).

Recall from last semester that one possible choice of an estimate for θ given an observed data vector x is the maximum likelihood estimate

  θ̂(x) = argmax_θ L(θ; x) = argmax_θ l(θ; x).   (1.1)

If we take the functional form of θ̂(x) and insert the random vector X into the function, we get a statistic θ̂ = θ̂(X) which is known as the maximum likelihood estimator. Note that this is a random variable: not in the formal sense which we invoked to assign a Bayesian probability distribution to an unknown parameter Θ, but rather a function of the random vector X whose value will be different in different random trials, even with a fixed value for the parameter θ. Of course, if we're using θ̂ as an estimator, the true value of the parameter θ will be unknown.

A useful special case is where the random vector X is a sample of size n drawn from some distribution f(x; θ). Then the likelihood function will be

  L(θ; x) = ∏_{i=1}^n f(x_i; θ)   (1.2)

and the log-likelihood will be

  l(θ; x) = ∑_{i=1}^n ln f(x_i; θ).   (1.3)

1.1.1 Motivation

Since we know that in the Bayesian framework the posterior pdf for the parameter θ given the observed data x is

  f_{Θ|X}(θ|x) ∝ f_Θ(θ) f_{X|Θ}(x|θ),   (1.4)

we see that if the prior pdf f_Θ(θ) is a constant, the posterior will be proportional to f_{X|Θ}(x|θ) = L(θ; x), and therefore the θ̂(x) which maximizes the likelihood will also maximize the posterior pdf f_{Θ|X}(θ|x).
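As a side remark: the argmax defining the MLE can also be found numerically when no closed form exists. Here is a minimal sketch in plain Python (my own illustration, not from Hogg; a crude grid search stands in for a proper optimizer), recovering the MLE of a normal mean, which we know should be the sample mean, by maximizing the log-likelihood directly:

```python
import math

def log_likelihood(theta, xs, sigma=1.0):
    """l(theta) = sum_i ln f(x_i; theta) for X_i ~ N(theta, sigma^2)."""
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (x - theta)**2 / (2 * sigma**2) for x in xs)

def mle_by_grid(xs, lo, hi, steps=10000):
    """Crude numerical argmax of l(theta) over a uniform grid."""
    best_theta, best_l = lo, -math.inf
    for k in range(steps + 1):
        theta = lo + (hi - lo) * k / steps
        l = log_likelihood(theta, xs)
        if l > best_l:
            best_theta, best_l = theta, l
    return best_theta

data = [2.1, 1.7, 2.4, 1.9, 2.3]   # a small made-up "observed sample"
theta_hat = mle_by_grid(data, 0.0, 4.0)
print(theta_hat)                   # close to the sample mean of data
```

Up to the grid spacing, the numerical argmax agrees with the analytic MLE x̄ for this model.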
Hogg also invokes two theorems which indicate that the MLE matches up with the true unknown value of the parameter in the limit that the size n of a sample goes to infinity:

Theorem 6.1.1: Suppose θ_0 is the true value of a parameter θ, and X is a sample of size n from a distribution f(x; θ). Given certain regularity conditions, most importantly that θ_0 is not on the boundary of the parameter space for θ, the true parameter maximizes the likelihood in the limit, i.e.,

  lim_{n→∞} P[ L(θ_0; X) > L(θ; X) ] = 1  for θ ≠ θ_0.   (1.5)

We also need different θ values to give different pdfs for X, and the support space for X must be the same for all θ.
Theorem 6.1.3: Under the same regularity conditions, the maximum likelihood estimator, if it exists, converges in probability to θ_0:

  θ̂ →^P θ_0.   (1.6)

The first theorem says that the true value becomes the maximum-likelihood value in the limit of an infinitely large sample, while the second says that the MLE becomes the true value in that limit.

1.1.2 Example: Signal Amplitude

See Hogg for several examples of maximum-likelihood solutions associated with random samples. To complement those, we're going to consider a couple of cases where the random vector is not quite a random sample, but is still drawn from a distribution with a parameter θ.

First, consider a simplified example from signal processing. Suppose you have a set of data points {x_i} which represent a signal with a known shape, described by {h_i}, and an unknown amplitude θ, on top of which there is some Gaussian noise with variances {σ_i²}. I.e., the distribution associated with each data value is X_i ~ N(θ h_i, σ_i²). This means the likelihood function is

  L(θ; x) = ∏_{i=1}^n (2πσ_i²)^{−1/2} exp( −(x_i − θ h_i)²/(2σ_i²) )   (1.7)

and the log-likelihood is

  l(θ) = −∑_{i=1}^n (x_i − θ h_i)²/(2σ_i²) + const.   (1.8)

The derivative is

  l′(θ) = ∑_{i=1}^n h_i (x_i − θ h_i)/σ_i² = ∑_{i=1}^n h_i x_i/σ_i² − θ ∑_{i=1}^n h_i²/σ_i²,   (1.9)

so the maximum likelihood estimate is

  θ̂ = ( ∑_i h_i x_i/σ_i² ) / ( ∑_j h_j²/σ_j² ).   (1.10)

We can also write this in matrix form, which would allow generalization to covariances between data points via a non-diagonal variance-covariance matrix Σ:

  θ̂ = ( h^T Σ^{−1} h )^{−1} h^T Σ^{−1} x.   (1.11)

This makes the signal model, with maximum-likelihood amplitude,

  h θ̂ = h ( h^T Σ^{−1} h )^{−1} h^T Σ^{−1} x = Σ^{1/2} P Σ^{−1/2} x,   (1.12)

where

  P = Σ^{−1/2} h ( h^T Σ^{−1} h )^{−1} h^T Σ^{−1/2}   (1.13)

is a projection matrix. Note that in a more complicated problem we might have a superposition of k different templates, each with its own amplitude,² so the signal model would be X ~ N( ∑_{i=1}^k θ_i h_i, Σ ).

1.1.3 Example: Bradley-Terry Model

The Bradley-Terry model³ is designed to model paired comparisons between objects (teams or players in games, foods in taste tests, etc.). Each object has a strength π_i, 0 < π_i < ∞, and in a comparison between two objects, object i will be chosen over object j with probability π_i/(π_i + π_j).
² E.g., for continuous gravitational waves, you have two polarizations, each with amplitude and phase, so four parameters.
³ Zermelo, Mathematische Zeitschrift 29, 436 (1929); Bradley and Terry, Biometrika 39, 324 (1952).

Consider a restricted
version of the model where an object with unknown strength θ is compared to a series of n objects⁴ with known strengths π_1, π_2, ..., π_n. Let X_i be a Bernoulli random variable which is 1 if the object wins the ith comparison and 0 if it loses, so that

  X_i ~ b( 1, θ/(θ + π_i) ).   (1.14)

Then the likelihood is

  L(θ; x) = f_{X|Θ}(x|θ) = ∏_{i=1}^n θ^{x_i} π_i^{1−x_i} / (θ + π_i)   (1.15)

and the log-likelihood is

  l(θ) = ∑_{i=1}^n [ x_i ln θ + (1 − x_i) ln π_i − ln(θ + π_i) ],   (1.16)

with derivative

  l′(θ) = ∑_{i=1}^n [ x_i/θ − 1/(θ + π_i) ].   (1.17)

If we define y = ∑_{i=1}^n x_i to be the total number of comparisons won, the MLE for θ satisfies

  y = ∑_{i=1}^n x_i = ∑_{i=1}^n θ̂/(θ̂ + π_i).   (1.18)

The right-hand side is the average number of comparison wins you'd predict, and the MLE is just the strength which makes this equal to the actual number observed. Although (1.18) can't be solved in closed form, one can find the MLE numerically by iterating⁵

  θ̂ ← y / ∑_{i=1}^n (θ̂ + π_i)^{−1}.   (1.19)

⁴ Note that, since the strengths are assumed to be known, some of these comparisons may actually be repeated comparisons with the same object.
⁵ Ford, American Mathematical Monthly 64, 28 (1957).

1.1.4 Reparameterization

A key property of maximum-likelihood estimates is captured in Hogg's Theorem 6.1.2: if you change variables in the likelihood from θ to some other η = g(θ), and work out the MLE for η, it is η̂ = g(θ̂). The proof is almost trivial, so rather than repeat it, we'll demonstrate the property in our Bradley-Terry example. Let λ = ln θ, so that −∞ < λ < ∞. The calculation of the log-likelihood proceeds as before, and we can substitute θ = e^λ into (1.16) to get

  l(λ) = ∑_{i=1}^n [ x_i λ + (1 − x_i) ln π_i − ln(e^λ + π_i) ].   (1.20)

The derivative is

  l′(λ) = ∑_{i=1}^n [ x_i − e^λ/(e^λ + π_i) ],   (1.21)

so the maximum-likelihood equation is

  y = ∑_{i=1}^n x_i = ∑_{i=1}^n e^{λ̂}/(e^{λ̂} + π_i),   (1.22)

which is just the same as (1.18) with θ̂ replaced by e^{λ̂}.

Tuesday 23 February 2016
Read Section 6.2 of Hogg

1.2 Fisher Information and the Cramér-Rao Bound

We now turn to a consideration of the properties of estimators and other statistics. Suppose we have a random vector X which
is a random sample from a distribution f(x; θ) which includes a parameter θ. We know the joint sampling distribution is

  f_{X|θ}(x_1, ..., x_n | θ) = ∏_{i=1}^n f(x_i; θ).   (1.23)

If we have a statistic Y = u(X), this is a random variable. The mean and variance of Y are not random variables, but they depend on θ because the sampling distribution does:

  µ_Y(θ) = E(Y|θ) = ∫ u(x_1, ..., x_n) ∏_{i=1}^n f(x_i; θ) dx_i   (1.24)

and

  Var(Y|θ) = E( [Y − µ_Y(θ)]² | θ ) = ∫ [u(x) − µ_Y(θ)]² f_{X|θ}(x|θ) dⁿx.   (1.25)

1.2.1 Fisher Information

The first such function we want to consider is the derivative with respect to the parameter θ of the log-likelihood l(θ; x) = ln f(x; θ). We will focus first on the case of a single random variable before considering a random sample. We will also assume an even broader set of regularity conditions than last time, specifically that we're allowed to interchange derivatives and integrals. The quantity of interest is

  u(x) = l′(θ; x) = ∂ ln f(x; θ)/∂θ = (∂f(x; θ)/∂θ) / f(x; θ),   (1.26)

and it is one measure of how much the probability distribution changes when we change the parameter. Note that in this case the function u(x) depends explicitly on the parameter, in addition to the θ dependence of the distribution for the random variable X. u(X) = l′(θ; X) is a statistic whose expectation value is

  E[ l′(θ; X) ] = ∫_S [ (∂f(x; θ)/∂θ) / f(x; θ) ] f(x; θ) dx = ∫_S (∂f(x; θ)/∂θ) dx,   (1.27)

where we explicitly restrict the integral to the support space S, i.e., the set of x for which f(x; θ) > 0. Now the regularity conditions (especially the fact that S is the same for any allowed value of θ) allow us to pull the derivative outside the integral, so we have

  E[ l′(θ; X) ] = d/dθ ∫_S f(x; θ) dx = d(1)/dθ = 0,   (1.28)

where we have used the fact that f(x; θ) is a normalized pdf in x. So note that while l′(θ̂(x); x) = 0, i.e., the derivative of the log-likelihood is zero at the maximum-likelihood value of θ for a given x, the expectation value E[ l′(θ; X) ] vanishes for any θ. This allows us to write the variance as

  Var[ l′(θ; X) ] = E[ ( l′(θ; X) )² ] = ∫_S ( ∂ ln f(x; θ)/∂θ )² f(x; θ) dx ≡ I(θ).   (1.29)

I(θ) is a property of the distribution known as the Fisher information. You've encountered it on the homework in the context of the Jeffreys prior f_Θ(θ) ∝ √I(θ).
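These two facts, that the score has mean zero and variance I(θ), can be checked by simulation. A minimal Monte Carlo sketch in plain Python (my own illustration, not from the notes), using the N(θ, σ²) score (x − θ)/σ², for which I(θ) = 1/σ²:

```python
import random

random.seed(0)
theta, sigma, trials = 1.5, 2.0, 200_000

# Score of a single observation from N(theta, sigma^2):
#   d/dtheta ln f(x; theta) = (x - theta) / sigma^2
scores = [(random.gauss(theta, sigma) - theta) / sigma**2
          for _ in range(trials)]

mean_score = sum(scores) / trials
var_score = sum(s * s for s in scores) / trials  # E[score] = 0, so this estimates the variance

print(mean_score)   # ~ 0
print(var_score)    # ~ 1/sigma^2 = 0.25, the Fisher information
```

The empirical mean of the score is near zero and its empirical variance is near 1/σ², as the derivation above requires.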
Note that if we go back and differentiate (1.28) we get

  0 = d/dθ E[ l′(θ; X) ] = d/dθ ∫_S (∂ ln f(x; θ)/∂θ) f(x; θ) dx
    = ∫_S (∂² ln f(x; θ)/∂θ²) f(x; θ) dx + ∫_S (∂ ln f(x; θ)/∂θ) (∂f(x; θ)/∂θ) dx
    = ∫_S (∂² ln f(x; θ)/∂θ²) f(x; θ) dx + ∫_S (∂ ln f(x; θ)/∂θ)² f(x; θ) dx.   (1.30)
This means there's an equivalent, and sometimes easier to calculate, way to write the Fisher information:

  I(θ) = −∫_S (∂² ln f(x; θ)/∂θ²) f(x; θ) dx = −E[ l″(θ; X) ].   (1.31)

Example: Fisher information for the mean of a normal distribution. To give a very simple example, consider the normal distribution N(θ, σ²). The likelihood is

  f(x; θ) = (σ√(2π))^{−1} exp( −(x − θ)²/(2σ²) ),   (1.32)

so the log-likelihood is

  ln f(x; θ) = −(x − θ)²/(2σ²) − (1/2) ln(2πσ²),   (1.33)

whose derivatives are

  ∂ ln f(x; θ)/∂θ = (x − θ)/σ²   (1.34a)
  ∂² ln f(x; θ)/∂θ² = −1/σ².   (1.34b)

So the Fisher information is I(θ) = 1/σ². Note in passing that the first derivative is (X − θ)/σ², whose expectation value is indeed zero.

Fisher information for a sample of size n. We can now consider the Fisher information associated with a random sample of size n, given by

  I_n(θ) = −E[ ∂² ln f_{X|Θ}(X|θ)/∂θ² ].   (1.35)

In fact, since the joint sampling distribution, and therefore the likelihood for the sample, is

  f_{X|Θ}(x|θ) = ∏_{i=1}^n f(x_i; θ),   (1.36)

the log-likelihood will be

  ln f_{X|Θ}(x|θ) = ∑_{i=1}^n ln f(x_i; θ),   (1.37)

and therefore the Fisher information for the sample is

  −E[ ∂² ln f_{X|Θ}(X|θ)/∂θ² ] = −∑_{i=1}^n E[ ∂² ln f(X_i; θ)/∂θ² ] = n I(θ),   (1.38)

where we've used the fact that both the derivative and the expectation value are linear operations, and that the random variables {X_i} are iid. Note that the Fisher information for a sample of size n from a normal distribution is n/σ², which is one over the variance of the sample mean.

1.2.2 The Cramér-Rao Bound

Now we return to consideration of a general statistic Y = u(X), with a special eye towards one being used as an estimator of the parameter θ. Recall the definition of the expectation value

  µ_Y(θ) = E(Y) = ∫ u(x_1, ..., x_n) f_{X|Θ}(x|θ) dⁿx   (1.39)

and variance

  Var(Y|θ) = E( [Y − µ_Y(θ)]² | θ ) = ∫ [u(x) − µ_Y(θ)]² f_{X|θ}(x|θ) dⁿx.   (1.40)
We notice a similarity to the Fisher information for the sample,

  n I(θ) = E[ ( ∂ ln f_{X|Θ}(X|θ)/∂θ )² ].   (1.41)

Both are of the form E( [a(X)]² | θ ) = ⟨a, a⟩, where we have defined the inner product

  ⟨a, b⟩ = E( a(X) b(X) | θ ) = ∫ a(x) b(x) f_{X|θ}(x|θ) dⁿx.   (1.42)

This behaves in many ways like a dot product a·b, and in particular satisfies the Cauchy-Schwarz inequality⁶

  ⟨a, a⟩ ⟨b, b⟩ ≥ ⟨a, b⟩².   (1.43)

This means that

  Var(Y|θ) · n I(θ) ≥ ( E[ (Y − µ_Y(θ)) ∂ ln f_{X|Θ}(X|θ)/∂θ ] )²
    = ( E[ u(X) ∂ ln f_{X|Θ}(X|θ)/∂θ ] − µ_Y(θ) E[ ∂ ln f_{X|Θ}(X|θ)/∂θ ] )²
    = ( ∫ u(x_1, ..., x_n) ∂f_{X|Θ}(x|θ)/∂θ dⁿx − 0 )²
    = ( d/dθ ∫ u(x_1, ..., x_n) f_{X|Θ}(x|θ) dⁿx )² = ( µ_Y′(θ) )²,   (1.44)

where the last step holds as long as u(x) doesn't depend explicitly on θ. The inequality can be solved for the variance of the statistic Y to give the Cramér-Rao bound:

  Var(Y|θ) ≥ ( µ_Y′(θ) )² / ( n I(θ) ).   (1.45)

⁶ In vector algebra, this is a consequence of the law of cosines: a·b = |a| |b| cos φ_{a,b}.

In the special case where Y is an unbiased estimator of θ, so that µ_Y(θ) = θ, it simplifies to

  Var(Y|θ) ≥ 1/( n I(θ) )  if E(Y|θ) = θ.   (1.46)

I.e., the variance of an unbiased estimator can be no less than the reciprocal of the Fisher information of the sample.

To return to our simple example of a Gaussian sampling distribution N(θ, σ²), let Y = X̄, the sample mean, which we know is an unbiased estimator of the distribution mean θ. Then the Cramér-Rao bound tells us

  Var(X̄|θ) ≥ 1/( n I(θ) ) = σ²/n.   (1.47)

Of course, we know that the variance of the sample mean is σ²/n, which means that in this case the bound is saturated, and the sample mean is the lowest-variance unbiased estimator of the distribution mean for a normal distribution with known variance.

When the Cramér-Rao bound is saturated, i.e., Var(Y|θ) = 1/(n I(θ)) for an unbiased estimator, we say that Y is an efficient estimator of θ. Otherwise, we can define the ratio of the Cramér-Rao bound to the actual variance as the efficiency of the estimator. Another selling point of maximum likelihood estimates is their relationship to efficiency: an MLE is not always an efficient estimator, but one can prove using Taylor series that it becomes so in the limit that the sample size goes to infinity.
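To illustrate efficiency numerically, here is a small Monte Carlo comparison (my own sketch in plain Python, not from the notes) of two unbiased estimators of a normal mean: the sample mean, which saturates the Cramér-Rao bound, and the sample median, whose asymptotic efficiency for normal data is 2/π ≈ 0.64:

```python
import random
import statistics

random.seed(1)
n, sigma, trials = 25, 1.0, 20_000
crlb = sigma**2 / n   # Cramér-Rao bound: 1/(n I(theta)) with I(theta) = 1/sigma^2

means, medians = [], []
for _ in range(trials):
    xs = [random.gauss(0.0, sigma) for _ in range(n)]
    means.append(statistics.fmean(xs))
    medians.append(statistics.median(xs))

var_mean = statistics.pvariance(means)
var_median = statistics.pvariance(medians)

print(crlb / var_mean)     # efficiency of the sample mean: ~ 1 (bound saturated)
print(crlb / var_median)   # efficiency of the median: roughly 2/pi
```

Both estimators are unbiased here, but only the sample mean attains the bound; the median's larger variance shows up as an efficiency noticeably below one.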
Another important theorem, the proof of which we omit for now, is that, subject to the usual regularity conditions, the distribution for the MLE converges to a Gaussian:

  √n ( θ̂(X) − θ ) →^D N( 0, 1/I(θ) ).   (1.48)
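This asymptotic normality can be checked by simulation. A sketch in plain Python (my own example, not from the notes), using an exponential distribution with rate θ, for which the MLE is 1/X̄ and I(θ) = 1/θ²: if the limiting distribution holds, the standardized quantity √(n I(θ_0)) (θ̂ − θ_0) should fall within ±1.96 about 95% of the time:

```python
import math
import random

random.seed(2)
theta0, n, trials = 2.0, 400, 3_000
fisher = 1.0 / theta0**2   # I(theta) = 1/theta^2 for an Exp(rate theta) distribution

inside = 0
for _ in range(trials):
    xs = [random.expovariate(theta0) for _ in range(n)]
    theta_hat = 1.0 / (sum(xs) / n)        # MLE of the rate parameter
    z = math.sqrt(n * fisher) * (theta_hat - theta0)
    if abs(z) < 1.959964:                  # standard-normal 97.5th percentile
        inside += 1

coverage = inside / trials
print(coverage)   # ~ 0.95 if sqrt(n I) (theta_hat - theta0) is approximately N(0, 1)
```

For n = 400 the empirical coverage is already very close to the nominal 95%, in line with the limiting Gaussian above.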
Tuesday 1 March 2016
Read Section 6.3 of Hogg

2 Maximum-Likelihood Tests

So far we've used maximum likelihood methods to consider ways to estimate the parameter θ of a distribution f(x; θ). We now consider tests designed to compare a null hypothesis H_0, which specifies θ = θ_0, and H_1, which allows for arbitrary θ within the allowed parameter space. Note that since these are classical hypothesis tests, we don't require H_1 to specify a prior probability distribution for θ. In each case, we will construct a test statistic from a sample of size n, drawn from f(x; θ), which is asymptotically chi-square distributed in the n → ∞ limit.

2.1 Likelihood Ratio Test

Define as usual the likelihood and its logarithm

  L(θ; x) = f_{X|Θ}(x|θ) = ∏_{i=1}^n f(x_i; θ)   (2.1)
  l(θ; x) = ln f_{X|Θ}(x|θ) = ∑_{i=1}^n ln f(x_i; θ).   (2.2)

Recall that for Bayesian hypothesis testing, the natural quantity to construct was the Bayes factor

  B_{01} = f_X(x|H_0) / f_X(x|H_1) = L(θ_0; x) / ∫_Ω L(θ; x) f_Θ(θ|H_1) dθ.   (2.3)

In the frequentist case we don't define a prior f_Θ(θ|H_1), so instead, of all the θ values at which to evaluate the likelihood, we choose the one which maximizes it⁷, and look at the likelihood ratio

  Λ(x) = L(θ_0; x) / L(θ̂(x); x).   (2.4)

Note that in the frequentist picture L(θ; X) and Λ(X) are statistics, i.e., random variables constructed from the sample X.

2.1.1 Example: Gaussian Distribution

Consider a sample of size n drawn from a N(θ, σ²) distribution, for which we know the MLE is x̄ and the likelihood is

  L(θ; x) = (σ√(2π))^{−n} exp( −∑_i (x_i − x̄)²/(2σ²) ) exp( −n(θ − x̄)²/(2σ²) ),   (2.5)

so the likelihood ratio is

  Λ(x) = exp( −n(θ_0 − x̄)²/(2σ²) ).   (2.6)

Since X̄ ~ N(θ, σ²/n), we have

  −2 ln Λ(X) = (X̄ − θ_0)²/(σ²/n) ~ χ²(1), assuming H_0.   (2.7)

Now, this won't be exactly true in general, but the various asymptotic theorems show that, for a regular underlying distribution, it's true in the asymptotic sense:

  χ²_L ≡ −2 ln Λ(X) →^D χ²(1), assuming H_0.   (2.8)

⁷ There are various arguments which make this a reasonable choice, most revolving around the fact that for large samples, the likelihood function is approximately Gaussian with a peak value of L(θ̂(x); x).
For finite n, we assume this is approximately true, and compare −2 ln Λ(X) to the 100(1 − α)th percentile of the χ²(1) distribution to get a test with false alarm probability α.

2.2 Wald Test

The statistic −2 ln Λ(X) constructed from the likelihood ratio is only one of several statistics which are asymptotically χ². Another possibility is to consider the result that

  θ̂(X) − θ_0 →^D N( 0, 1/(n I(θ_0)) )   (2.9)

and construct the statistic

  χ²_W = n I(θ̂(X)) ( θ̂(X) − θ_0 )² = ( θ̂(X) − θ_0 )² / [ 1/( n I(θ̂(X)) ) ].   (2.10)

Comparing this to the percentiles of a χ²(1) is known as the Wald test.

2.2.1 Example: Gaussian Distribution

The distribution of the statistic χ²_W should converge to χ²(1) as n → ∞. We can see what the exact distribution is for a given choice of the underlying sampling distribution. Assuming again a N(θ, σ²) distribution for f(x; θ), we have Fisher information I(θ) = 1/σ², maximum likelihood estimator θ̂(X) = X̄, and Wald test statistic

  χ²_W = n (X̄ − θ_0)²/σ²,   (2.11)

which is of the same form as (2.7), which we know to be exactly χ²(1) for any n.

2.3 Rao Scores Test

The final likelihood-based test involves the score

  ∂ ln f(X_i; θ)/∂θ   (2.12)

associated with each random variable in the sample, whose sum is the derivative of the log-likelihood

  l′(θ; X) = ∑_{i=1}^n ∂ ln f(X_i; θ)/∂θ.   (2.13)

We know the expectation value is

  E[ l′(θ; X) ] = ∑_{i=1}^n E[ ∂ ln f(X_i; θ)/∂θ ] = 0   (2.14)

and the variance is

  Var[ l′(θ; X) ] = ∑_{i=1}^n Var[ ∂ ln f(X_i; θ)/∂θ ] = ∑_{i=1}^n I(θ) = n I(θ),   (2.15)

so we construct a test statistic which, in the limit that the sum of the scores is normally distributed, becomes a χ²(1):

  χ²_R = ( l′(θ_0; X) )² / ( n I(θ_0) ).   (2.16)

Comparing this to the percentiles of a χ²(1) is known as the Rao scores test. It has the advantage over the other two tests that we don't need to work out the maximum likelihood estimator to apply it.

2.3.1 Example: Gaussian Distribution

If the underlying distribution is N(θ, σ²), so that

  ln f(x; θ) = −(1/2) ln(2πσ²) − (x − θ)²/(2σ²)   (2.17)
and

  ∂ ln f(x; θ)/∂θ = (x − θ)/σ²,   (2.18)

which makes the derivative of the log-likelihood

  l′(θ; X) = ∑_{i=1}^n (X_i − θ)/σ² = n (X̄ − θ)/σ²,   (2.19)

which makes the Rao scores statistic

  χ²_R = [ n (X̄ − θ_0)/σ² ]² / ( n/σ² ) = (X̄ − θ_0)²/(σ²/n),   (2.20)

which is, again, exactly χ²(1) distributed in this case.

Wednesday 2 March 2016
Review for Prelim Exam One

The exam covers materials from the first four class-weeks of the term, i.e., Hogg Sections 11.1-11.3 and 6.1-6.2 and associated topics covered in class through February 23, and problem sets 1-4.

Thursday 3 March 2016
First Prelim Exam

Tuesday 8 March 2016
Read Section 6.4 of Hogg

3 Multi-Parameter Methods

We now consider the case where the probability distribution f(x; θ) depends not just on a single parameter, but on p parameters {θ_1, θ_2, ..., θ_p} = {θ_α} = θ. The likelihood for a sample of size n is then

  L(θ; x) = ∏_{i=1}^n f(x_i; θ)   (3.1)

and the log-likelihood is

  l(θ; x) = ∑_{i=1}^n ln f(x_i; θ).   (3.2)

3.1 Maximum Likelihood Estimation

We're still interested in the set of parameter values {θ̂_1, ..., θ̂_p} which maximize the likelihood or, equivalently, its logarithm. But to maximize a function of p parameters, we need to set all of the partial derivatives to zero, which means there are now p maximum likelihood equations:

  ∂l(θ; x)/∂θ_α = 0,  α = 1, 2, ..., p.   (3.3)

We can in principle solve these p equations for the p unknowns {θ̂_α}. For example, consider a sample of size n drawn from a N(θ_1, θ_2) distribution with pdf

  f(x; θ_1, θ_2) = (2πθ_2)^{−1/2} exp( −(x − θ_1)²/(2θ_2) ),   (3.4)

so that

  ln f(x; θ_1, θ_2) = −(1/2) ln(2π) − (1/2) ln θ_2 − (x − θ_1)²/(2θ_2).   (3.5)

The partial derivatives are

  ∂ ln f(x; θ_1, θ_2)/∂θ_1 = (x − θ_1)/θ_2   (3.6a)
  ∂ ln f(x; θ_1, θ_2)/∂θ_2 = −1/(2θ_2) + (x − θ_1)²/(2θ_2²).   (3.6b)
Taking the partial derivatives of (3.2) and using linearity then gives us

  ∂l(x; θ_1, θ_2)/∂θ_1 = ∑_{i=1}^n (x_i − θ_1)/θ_2   (3.7a)
  ∂l(x; θ_1, θ_2)/∂θ_2 = −n/(2θ_2) + ∑_{i=1}^n (x_i − θ_1)²/(2θ_2²).   (3.7b)

We need to set both of these to zero to find θ̂_1 and θ̂_2, i.e., the coupled maximum-likelihood equations are

  ∑_{i=1}^n (x_i − θ̂_1)/θ̂_2 = 0   (3.8a)
  −n/(2θ̂_2) + ∑_{i=1}^n (x_i − θ̂_1)²/(2θ̂_2²) = 0.   (3.8b)

We can solve the first equation for

  θ̂_1 = (1/n) ∑_{i=1}^n x_i = x̄   (3.9)

and then substitute that into the second to get

  θ̂_2 = (1/n) ∑_{i=1}^n (x_i − x̄)²,   (3.10)

i.e.,

  θ̂_2 = ((n − 1)/n) s²,   (3.11)

where s² = (1/(n − 1)) ∑_{i=1}^n (x_i − x̄)² is the usual unbiased sample variance.

3.2 Fisher Information Matrix

If we consider the score statistic for a single observation,

  ∂ ln f(X; θ)/∂θ_α,   (3.12)

we see that it's now actually a random vector with one component per parameter. It is still true that

  E[ ∂ ln f(X; θ)/∂θ_α ] = ∫_S [ (∂f(x; θ)/∂θ_α) / f(x; θ) ] f(x; θ) dx = 0.   (3.13)

You can show this by taking ∂/∂θ_α of the normalization integral for f(x; θ). Since this is a random vector, we can construct its variance-covariance matrix I(θ) with elements

  I_{αβ}(θ) = E[ (∂ ln f(X; θ)/∂θ_α)(∂ ln f(X; θ)/∂θ_β) ].   (3.14)

As before, we can differentiate (3.13) with respect to θ_β to get the identity

  0 = ∫_S ∂/∂θ_β [ (∂ ln f(x; θ)/∂θ_α) f(x; θ) ] dx
    = ∫_S (∂² ln f(x; θ)/∂θ_α ∂θ_β) f(x; θ) dx
      + ∫_S (∂ ln f(x; θ)/∂θ_α)(∂ ln f(x; θ)/∂θ_β) f(x; θ) dx,   (3.15)

i.e., an equivalent way of writing the Fisher information matrix is

  I_{αβ}(θ) = −E[ ∂² ln f(X; θ)/∂θ_α ∂θ_β ].   (3.16)

Note that by definition and construction, the Fisher matrix is symmetric.
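Since the Fisher information matrix is just the covariance matrix of the score vector, it can be estimated by Monte Carlo. A sketch in plain Python (my own illustration, not from the notes) for the N(θ_1, θ_2) example with θ_2 = 4, where the analytic answer worked out in the example below is diagonal with entries 1/θ_2 and 1/(2θ_2²):

```python
import random

random.seed(3)
t1, t2, trials = 0.0, 4.0, 400_000   # theta_1 = mean, theta_2 = variance

# Score vector of one observation from N(theta_1, theta_2):
#   s1 = (x - theta_1)/theta_2
#   s2 = -1/(2 theta_2) + (x - theta_1)^2 / (2 theta_2^2)
s1s, s2s = [], []
for _ in range(trials):
    x = random.gauss(t1, t2**0.5)
    s1s.append((x - t1) / t2)
    s2s.append(-1 / (2 * t2) + (x - t1)**2 / (2 * t2**2))

# FIM elements as second moments of the (mean-zero) score vector
I11 = sum(s * s for s in s1s) / trials               # expect 1/theta_2       = 0.25
I22 = sum(s * s for s in s2s) / trials               # expect 1/(2 theta_2^2) = 0.03125
I12 = sum(a * b for a, b in zip(s1s, s2s)) / trials  # expect 0

print(I11, I22, I12)
```

The simulated second moments of the score reproduce the diagonal FIM, including the vanishing off-diagonal element.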
3.2.1 Example: Normal Distribution

If we return to the case of a N(θ_1, θ_2) distribution, taking derivatives of (3.6) gives us

  ∂² ln f(x; θ_1, θ_2)/∂θ_1² = −1/θ_2   (3.17a)
  ∂² ln f(x; θ_1, θ_2)/∂θ_1 ∂θ_2 = −(x − θ_1)/θ_2²   (3.17b)
  ∂² ln f(x; θ_1, θ_2)/∂θ_2² = 1/(2θ_2²) − (x − θ_1)²/θ_2³.   (3.17c)

If we recall E(X) = θ_1 and Var(X) = E[(X − θ_1)²] = θ_2, we get the Fisher information matrix elements I_{αβ}(θ) = −E[ ∂² ln f(X; θ_1, θ_2)/∂θ_α ∂θ_β ]:

  I_{11}(θ) = 1/θ_2   (3.18a)
  I_{22}(θ) = −1/(2θ_2²) + θ_2/θ_2³ = 1/(2θ_2²)   (3.18b)
  I_{12}(θ) = I_{21}(θ) = 0,   (3.18c)

i.e.,

  I(θ) = [ 1/θ_2    0
           0        1/(2θ_2²) ].   (3.19)

3.2.2 Fisher Information Matrix for a Random Sample

As in the single-parameter case, linearity means that the Fisher information matrix for a sample of size n is just n times the FIM for the distribution:

  −E[ ∂²l(θ; X)/∂θ_α ∂θ_β ] = −n E[ ∂² ln f(X; θ)/∂θ_α ∂θ_β ] = n I_{αβ}(θ).   (3.20)

3.2.3 Error Estimation

Recall that for a model with a single parameter θ, the Cramér-Rao bound stated that an unbiased estimator of θ had minimum variance 1/(n I(θ)). Also, the maximum-likelihood estimator θ̂ from a sample of size n saturated that bound in the n → ∞ limit, and in fact its distribution converged to a Gaussian with the minimum variance specified by the bound:

  √n ( θ̂ − θ ) →^D N( 0, 1/I(θ) ).   (3.21)

In the multi-parameter case, we have a maximum-likelihood estimator θ̂ which is a random vector with p components {θ̂_α | α = 1, ..., p}. The corresponding limiting distribution is a multivariate normal:

  √n ( θ̂ − θ ) →^D N_p( 0, I⁻¹(θ) ).   (3.22)

The matrix inverse I⁻¹(θ) of the Fisher information matrix is the variance-covariance matrix of the limiting distribution. It is also this inverse matrix which provides the multi-parameter version of the Cramér-Rao bound. If T_α(X) is an unbiased estimator of one parameter θ_α, the lower bound on its variance is

  Var( T_α(X) ) ≥ [ ( n I(θ) )⁻¹ ]_{αα}.   (3.23)

I.e., the diagonal elements of the inverse Fisher matrix provide lower bounds on the variance of unbiased estimators of the parameters. In cases like (3.19), where the Fisher matrix is diagonal, its inverse is also diagonal, with [I⁻¹(θ)]_{αα} being the same
as 1/I_{αα}(θ). But in general,⁸

  [ I⁻¹(θ) ]_{αα} ≥ 1/I_{αα}(θ).   (3.24)

This is thus a stronger bound than one would get by assuming all of the nuisance parameters to be known and applying the usual Cramér-Rao bound to the parameter of interest.

⁸ This is easy to see in the case p = 2, where

  I = [ I_{11}  I_{12}
        I_{12}  I_{22} ]   and   I⁻¹ = 1/(I_{11}I_{22} − I_{12}²) [ I_{22}  −I_{12}
                                                                    −I_{12}  I_{11} ],

so that [I⁻¹]_{11} = I_{22}/(I_{11}I_{22} − I_{12}²) = 1/(I_{11} − I_{12}²/I_{22}) ≥ 1/I_{11}.

Thursday 10 March 2016
Read Section 6.5 of Hogg

3.3 Maximum Likelihood Tests

We return now to the question of hypothesis testing when the parameter space associated with the distribution is multi-dimensional. Hogg considers this in a somewhat limited context where the null hypothesis H_0 involves picking values for one or more parameters or combinations of parameters and leaving the rest unspecified. Formally this is written as H_0: θ ∈ ω, where ω is some lower-dimensional subspace of the parameter space Ω. The alternative hypothesis is H_1: [θ ∈ Ω] and [θ ∉ ω]. The number of parameters in the full parameter space is p, while ω is defined by specifying values for q independent functions of the parameters, where 0 < q ≤ p. This means that ω is a (p − q)-dimensional space. The case considered before was p = q = 1, so that ω was a zero-dimensional point in the one-dimensional parameter space Ω.

Since H_0 is also a composite hypothesis with different possible parameter values, if we were constructing a Bayes factor, we'd also need to specify a prior distribution for θ ∈ ω associated with H_0:

  B_{01} = f_X(x|H_0) / f_X(x|H_1)
         = ∫_ω L(θ; x) f_Θ(θ|H_0) d^{p−q}θ / ∫_Ω L(θ; x) f_Θ(θ|H_1) d^pθ.   (3.25)

In classical statistics we don't have a prior distribution associated with either hypothesis, so we let them each put their best foot forward by picking the parameter values which maximize the likelihood, which we refer to as θ̂(x) = argmax_{θ∈Ω} L(θ; x) and θ̂_0(x) = argmax_{θ∈ω} L(θ; x). The likelihood ratio for use in tests is thus

  Λ(x) = L(θ̂_0(x); x) / L(θ̂(x); x) = max_{θ∈ω} L(θ; x) / max_{θ∈Ω} L(θ; x).   (3.26)

3.3.1 Example: Mean of Normal Distribution with Unknown Variance

As an example of the method, consider a sample of size n drawn from a N(θ_1, θ_2), i.e., a normal distribution where the mean and the variance are both unknown parameters.
The likelihood is

  L(θ; x) = (2πθ_2)^{−n/2} exp( −∑_{i=1}^n (x_i − θ_1)²/(2θ_2) ).   (3.27)

H_0 is θ_1 = µ_0, 0 < θ_2 < ∞, while H_1 is −∞ < θ_1 < ∞, 0 < θ_2 < ∞. H_0 is effectively a one-parameter model which has MLEs

  (θ̂_0)_1 = µ_0   (3.28a)
  (θ̂_0)_2 = (1/n) ∑_{i=1}^n (x_i − µ_0)² ≡ σ̂_0²,   (3.28b)
while the MLE for H_1 was found last time to be

  θ̂_1 = (1/n) ∑_{i=1}^n x_i = x̄   (3.29a)
  θ̂_2 = (1/n) ∑_{i=1}^n (x_i − x̄)² ≡ σ̂².   (3.29b)

We see that the maximized likelihoods are

  L(θ̂(x); x) = (2πσ̂²)^{−n/2} e^{−n/2}   (3.30a)
  L(θ̂_0(x); x) = (2πσ̂_0²)^{−n/2} e^{−n/2},   (3.30b)

so the likelihood ratio is

  Λ(X) = ( σ̂²/σ̂_0² )^{n/2} = ( ∑_i (X_i − X̄)² / ∑_i (X_i − µ_0)² )^{n/2}.   (3.31)

If we write

  ∑_{i=1}^n (X_i − µ_0)² = ∑_{i=1}^n [ (X_i − X̄) + (X̄ − µ_0) ]²
    = ∑_{i=1}^n (X_i − X̄)² + n(X̄ − µ_0)² + 2(X̄ − µ_0) ∑_{i=1}^n (X_i − X̄),   (3.32)

we can see that, since ∑_i (X_i − X̄) = 0,

  ∑_{i=1}^n (X_i − µ_0)² = ∑_{i=1}^n (X_i − X̄)² + n(X̄ − µ_0)²,   (3.33)

so

  Λ(X) = ( 1 + n(X̄ − µ_0)² / ∑_i (X_i − X̄)² )^{−n/2}.   (3.34)

This means that thresholding on the value of Λ(X) is the same as thresholding on the second term in parentheses, or equivalently on

  T = (X̄ − µ_0) / √( ∑_i (X_i − X̄)² / [n(n − 1)] ).   (3.35)

But we've constructed this last statistic so that, if H_0 holds and the {X_i} are independent N(µ_0, θ_2) random variables, Student's theorem tells us this is t-distributed with n − 1 degrees of freedom. So for a test at significance level α we reject H_0 if T > t_{α/2, n−1} or T < −t_{α/2, n−1}; since

  Λ(X) = ( 1 + T²/(n − 1) )^{−n/2},   (3.36)

we reject H_0 if

  Λ(X) < ( 1 + t²_{α/2, n−1}/(n − 1) )^{−n/2}.   (3.37)
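The algebraic relation between Λ and T above can be verified numerically. A sketch in plain Python (my own illustration; the function names are mine), computing the likelihood ratio both directly from the maximized variance estimates and from the t statistic:

```python
import math

def lam_direct(xs, mu0):
    """Likelihood ratio for H0: mean = mu0 vs. free mean, unknown variance,
    using Lambda = (sigma_hat^2 / sigma_hat_0^2)^(n/2)."""
    n = len(xs)
    xbar = sum(xs) / n
    s2_hat = sum((x - xbar)**2 for x in xs) / n   # MLE variance under H1
    s2_0 = sum((x - mu0)**2 for x in xs) / n      # MLE variance under H0
    return (s2_hat / s2_0) ** (n / 2)

def lam_from_t(xs, mu0):
    """Same ratio via the t statistic: Lambda = (1 + T^2/(n-1))^(-n/2)."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar)**2 for x in xs) / (n - 1)  # unbiased sample variance
    t = (xbar - mu0) / math.sqrt(s2 / n)
    return (1 + t**2 / (n - 1)) ** (-n / 2)

data = [1.2, 0.8, 1.5, 0.9, 1.1, 1.4]   # a small made-up sample
print(lam_direct(data, 1.0), lam_from_t(data, 1.0))   # the two computations agree
```

Both routes give the same number, confirming that thresholding on Λ is equivalent to a two-sided t test.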
likelihood function is

  L(θ; x) = (2π)^{−np/2} exp( −(1/2) ∑_{i=1}^n (x_i − θ)^T (x_i − θ) )
          = (2π)^{−np/2} exp( −(1/2) ∑_{i=1}^n ∑_{α=1}^p ([x_i]_α − θ_α)² )
          ∝ exp( −(n/2) ∑_{α=1}^p (θ_α − x̄_α)² ),   (3.38)

where

  x̄_α = (1/n) ∑_{i=1}^n [x_i]_α.   (3.39)

Thus the maximum likelihood solution under H_1 is θ̂_α = x̄_α, while under H_0 it is

  (θ̂_0)_α = { 0,    α = 1, ..., q
             { x̄_α,  α = q + 1, ..., p,   (3.40)

and the likelihood ratio is

  Λ({x_i}) = L(θ̂_0; x) / L(θ̂; x) = exp( −(n/2) ∑_{α=1}^q x̄_α² ).   (3.41)

Under H_0, the {X̄_α} are independent and X̄_α ~ N(0, 1/n) for α = 1, ..., q. Thus, if the null hypothesis is true,

  −2 ln Λ({X_i}) = ∑_{α=1}^q X̄_α²/(1/n) ~ χ²(q).   (3.42)

3.3.3 Large-Sample Limit

This last result also holds generically as a limiting distribution for large samples:

  −2 ln Λ({X_i}) →^D χ²(q).   (3.43)
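The exact χ²(q) behavior in the multivariate-normal example makes a convenient end-to-end check of the whole construction. A Monte Carlo sketch in plain Python (my own illustration, not from the notes): under H_0 the statistic −2 ln Λ = n ∑_{α≤q} X̄_α² should exceed the 95th percentile of χ²(q) about 5% of the time:

```python
import random

random.seed(4)
n, q, trials = 50, 3, 20_000
chi2_3_95 = 7.814728   # 95th percentile of the chi-square distribution with 3 dof

exceed = 0
for _ in range(trials):
    # Under H0 the first q component means are zero; the test statistic is
    # -2 ln Lambda = sum over those components of n * xbar_alpha^2
    stat = 0.0
    for _ in range(q):
        xbar = sum(random.gauss(0.0, 1.0) for _ in range(n)) / n
        stat += n * xbar * xbar
    if stat > chi2_3_95:
        exceed += 1

false_alarm = exceed / trials
print(false_alarm)   # ~ 0.05: the statistic is exactly chi-square with q dof here
```

The empirical false-alarm rate matches the nominal 5%, as it must, since in this example the chi-square distribution is exact rather than merely asymptotic.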