Maximum Likelihood Methods (Hogg Chapter Six)

STAT 406: Mathematical Statistics II, Spring Semester 2016

Contents

0 Administrata
1 Maximum Likelihood Estimation
  1.1 Maximum Likelihood Estimates
    1.1.1 Motivation
    1.1.2 Example: Signal Amplitude
    1.1.3 Example: Bradley-Terry Model
    1.1.4 Reparameterization
  1.2 Fisher Information and The Cramér-Rao Bound
    1.2.1 Fisher Information
    1.2.2 The Cramér-Rao Bound
2 Maximum-Likelihood Tests
  2.1 Likelihood Ratio Test
    2.1.1 Example: Gaussian Distribution
  2.2 Wald Test
    2.2.1 Example: Gaussian Distribution
  2.3 Rao Scores Test
    2.3.1 Example: Gaussian Distribution
3 Multi-Parameter Methods
  3.1 Maximum Likelihood Estimation
  3.2 Fisher Information Matrix
    3.2.1 Example: Normal Distribution
    3.2.2 Fisher Information Matrix for a Random Sample
    3.2.3 Error Estimation
  3.3 Maximum Likelihood Tests
    3.3.1 Example: Mean of Normal Distribution with Unknown Variance
    3.3.2 Example: Mean of Multivariate Normal Distribution with Unit Variance
    3.3.3 Large-Sample Limit

Copyright 2016, John T. Whelan, and all that

Thursday 18 February 2016 (Read Section 6.1 of Hogg)

0 Administrata

Effects of the snow day: homeworks are due on Thursdays through Spring Break. The first prelim exam is now Thursday, March 3. We shift back to normal after Spring Break.

1 Maximum Likelihood Estimation

1.1 Maximum Likelihood Estimates

We now return to the methods of classical, or frequentist, statistics, which are phrased not in terms of probabilities quantifying our confidence in general propositions with unknown truth values, but exclusively in terms of the frequency of outcomes in hypothetical repetitions of experiments. One of the fundamental tools is the sampling distribution, which we'll write as a joint pdf for a random vector $X$: $f_X(x)$. If the model has a parameter $\theta$, we'll write this as $f_X(x;\theta)$, and when we consider it as a function of $\theta$, we'll write it $L(\theta;x)$ or sometimes $L(\theta)$. We will often find it convenient to work with the logarithm $l(\theta) = \ln L(\theta)$.

Recall from last semester that one possible choice of an estimate for $\theta$ given an observed data vector $x$ is the maximum likelihood estimate

  $\hat\theta(x) = \operatorname{argmax}_\theta L(\theta;x) = \operatorname{argmax}_\theta l(\theta;x)$   (1.1)

If we take the functional form of $\hat\theta(x)$ and insert the random vector $X$ into the function, we get a statistic $\hat\theta = \hat\theta(X)$ which is known as the maximum likelihood estimator. Note that this is a random variable, not in the formal sense which we invoked to assign a Bayesian probability distribution to an unknown parameter $\Theta$, but rather as a function of the random vector $X$ whose value will be different in different random trials, even with a fixed value of the parameter $\theta$. Of course, if we're using $\hat\theta$ as an estimator, the true value of the parameter $\theta$ will be unknown.

A useful special case is where the random vector $X$ is a sample of size $n$ drawn from some distribution $f(x;\theta)$. Then the likelihood function will be

  $L(\theta;x) = \prod_{i=1}^n f(x_i;\theta)$   (1.2)

and the log-likelihood will be

  $l(\theta;x) = \sum_{i=1}^n \ln f(x_i;\theta)$   (1.3)

1.1.1 Motivation

Since we know that in the Bayesian framework the posterior pdf for the parameter $\theta$ given the observed data $x$ is

  $f_{\Theta|X}(\theta|x) \propto f_\Theta(\theta)\, f_{X|\Theta}(x|\theta)$,   (1.4)

we see that if the prior pdf $f_\Theta(\theta)$ is a constant, the posterior will be proportional to $f_{X|\Theta}(x|\theta) = L(\theta;x)$, and therefore the $\hat\theta(x)$ which maximizes the likelihood will also maximize the posterior pdf $f_{\Theta|X}(\theta|x)$.

Hogg also invokes two theorems which indicate that the MLE matches up with the true unknown value of the parameter in the limit that the size of a sample goes to infinity:

Theorem 6.1.1: Suppose $\theta_0$ is the true value of a parameter $\theta$, and $X$ is a sample of size $n$ from a distribution $f(x;\theta)$. Given certain regularity conditions, most importantly that $\theta_0$ is not at the boundary of the parameter space for $\theta$, the true parameter maximizes the likelihood in the limit, i.e.,

  $\lim_{n\to\infty} P\bigl[L(\theta_0;X) > L(\theta;X)\bigr] = 1$ for $\theta \ne \theta_0$   (1.5)

We also need different $\theta$ values to give different pdfs for $X$, and the support space for $X$ to be the same for all $\theta$.

Theorem 6.1.3: Under the same regularity conditions, the maximum likelihood estimator, if it exists, converges in probability to $\theta_0$:

  $\hat\theta \xrightarrow{P} \theta_0$   (1.6)

The first theorem says that the true value becomes the maximum likelihood value in the limit of an infinitely large sample, while the second says that the MLE becomes the true value in that limit.
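Since the argmax in (1.1) rarely has a closed form, it's worth seeing it done numerically. The following is a minimal sketch (my own illustration, not from Hogg; the exponential model, seed, and sample size are arbitrary choices) that minimizes the negative log-likelihood (1.3) and checks the result against the closed-form MLE $\hat\theta = 1/\bar x$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
theta_true = 2.5
x = rng.exponential(scale=1 / theta_true, size=1000)  # sample with true rate 2.5

def neg_log_like(theta):
    # -l(theta; x) for f(x; theta) = theta * exp(-theta * x)
    return -(len(x) * np.log(theta) - theta * x.sum())

result = minimize_scalar(neg_log_like, bounds=(1e-6, 100.0), method="bounded")
print(result.x)      # numerical argmax of the likelihood
print(1 / x.mean())  # closed-form MLE for comparison
```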

1.1.2 Example: Signal Amplitude

See Hogg for several examples of maximum-likelihood solutions associated with random samples. To complement those, we're going to consider a couple of cases where the random vector is not quite a random sample, but is still drawn from a distribution with a parameter $\theta$.

First, consider a simplified example from signal processing. Suppose you have a set of data points $\{x_i\}$ which represent a signal with a known shape, described by $\{h_i\}$, and an unknown amplitude $\theta$, on top of which there is some Gaussian noise with variances $\{\sigma_i^2\}$. I.e., the distribution associated with each data value is $X_i \sim N(\theta h_i, \sigma_i^2)$. This means the likelihood function is

  $L(\theta;x) = \prod_{i=1}^n \frac{1}{\sigma_i\sqrt{2\pi}} \exp\left(-\frac{(x_i-\theta h_i)^2}{2\sigma_i^2}\right)$   (1.7)

and the log-likelihood is

  $l(\theta) = -\sum_{i=1}^n \frac{(x_i-\theta h_i)^2}{2\sigma_i^2} + \text{const}$   (1.8)

The derivative is

  $l'(\theta) = \sum_{i=1}^n \frac{h_i(x_i-\theta h_i)}{\sigma_i^2} = \sum_{i=1}^n \frac{h_i x_i}{\sigma_i^2} - \theta \sum_{i=1}^n \frac{h_i^2}{\sigma_i^2}$   (1.9)

so the maximum likelihood estimate is

  $\hat\theta = \frac{\sum_i h_i x_i/\sigma_i^2}{\sum_j h_j^2/\sigma_j^2}$   (1.10)

We can also write this in matrix form, which would allow generalization to covariances between data points via a non-diagonal variance-covariance matrix $\Sigma$:

  $\hat\theta = (\mathbf h^T \Sigma^{-1} \mathbf h)^{-1}\, \mathbf h^T \Sigma^{-1} \mathbf x$   (1.11)

This makes the signal model, with maximum-likelihood amplitude,

  $\mathbf h\hat\theta = \mathbf h\,(\mathbf h^T\Sigma^{-1}\mathbf h)^{-1}\,\mathbf h^T\Sigma^{-1}\mathbf x = \Sigma^{1/2}\, P\, \Sigma^{-1/2}\mathbf x$   (1.12)

where

  $P = \Sigma^{-1/2}\,\mathbf h\,(\mathbf h^T\Sigma^{-1}\mathbf h)^{-1}\,\mathbf h^T\,\Sigma^{-1/2}$   (1.13)

is a projection matrix. Note that in a more complicated problem, we might have a superposition of $k$ different templates, each with its own amplitude,² so the signal would be $\mathbf X \sim N\left(\sum_{i=1}^k \theta_i \mathbf h_i, \Sigma\right)$.

² E.g., for continuous gravitational waves, you have two polarizations, each with amplitude and phase, so four parameters.
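As a numerical sanity check of (1.10) and (1.11) (my own illustration; the shape $h$, covariance $\Sigma$, and data are made up), the matrix form is a couple of calls to numpy's linear solver:

```python
import numpy as np

rng = np.random.default_rng(1)
h = np.array([1.0, 2.0, 3.0, 4.0])     # known signal shape {h_i}
Sigma = np.diag([0.5, 1.0, 1.5, 2.0])  # noise covariance (diagonal here)
x = 1.7 * h + rng.multivariate_normal(np.zeros(4), Sigma)  # data, theta = 1.7

# (h^T Sigma^{-1} h)^{-1} h^T Sigma^{-1} x, eq. (1.11)
theta_hat = (h @ np.linalg.solve(Sigma, x)) / (h @ np.linalg.solve(Sigma, h))

# For diagonal Sigma this reduces to the weighted average (1.10):
w = np.diag(Sigma)
print(theta_hat, (h * x / w).sum() / (h**2 / w).sum())
```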

1.1.3 Example: Bradley-Terry Model

The Bradley-Terry model³ is designed to model paired comparisons between objects (teams or players in games, foods in taste tests, etc.). Each object has a strength $\pi_i$, $0 < \pi_i < \infty$, and in a comparison between two objects, object $i$ will be chosen over object $j$ with probability $\frac{\pi_i}{\pi_i+\pi_j}$. Consider a restricted version of the model where an object with unknown strength $\theta$ is compared to a series of $n$ objects⁴ with known strengths $\pi_1, \pi_2, \ldots, \pi_n$. Let $X_i$ be a Bernoulli random variable which is 1 if the object wins the $i$th comparison and 0 if it loses, so that

  $X_i \sim b\left(1, \frac{\theta}{\theta+\pi_i}\right)$   (1.14)

Then the likelihood is

  $L(\theta;x) = f_{X|\Theta}(x|\theta) = \prod_{i=1}^n \frac{\theta^{x_i}\,\pi_i^{1-x_i}}{\theta+\pi_i}$   (1.15)

and the log-likelihood is

  $l(\theta) = \sum_{i=1}^n \left[x_i\ln\theta + (1-x_i)\ln\pi_i - \ln(\theta+\pi_i)\right]$   (1.16)

and

  $l'(\theta) = \sum_{i=1}^n \left(\frac{x_i}{\theta} - \frac{1}{\theta+\pi_i}\right)$   (1.17)

If we define $y = \sum_{i=1}^n x_i$ to be the total number of comparisons won, the MLE for $\theta$ satisfies

  $y = \sum_{i=1}^n x_i = \sum_{i=1}^n \frac{\hat\theta}{\hat\theta+\pi_i}$   (1.18)

The right-hand side is the average number of comparison wins you'd predict, and the MLE is just the strength which makes this equal to the actual number observed. (1.18) can't be solved in closed form; one can find the MLE numerically by iterating⁵

  $\hat\theta \leftarrow \frac{y}{\sum_{i=1}^n \frac{1}{\hat\theta+\pi_i}}$   (1.19)

³ Zermelo, Mathematische Zeitschrift 29, 436 (1929); Bradley and Terry, Biometrika 39, 324 (1952).
⁴ Note that, since the strengths are assumed to be known, some of these comparisons may actually be repeated comparisons with the same object.
⁵ Ford, American Mathematical Monthly 64, 28 (1957).

1.1.4 Reparameterization

A key property of maximum-likelihood estimates is captured in Hogg's Theorem 6.1.2: if you change variables in the likelihood from $\theta$ to some other $\eta = g(\theta)$, and work out the MLE for $\eta$, it is $\hat\eta = g(\hat\theta)$. The proof is almost trivial, so rather than repeat it, we'll demonstrate the property in our Bradley-Terry example. Let $\lambda = \ln\theta$, so that $-\infty < \lambda < \infty$. The calculation of the log-likelihood proceeds as before, and we can substitute $\theta = e^\lambda$ into (1.16) to get

  $l(\lambda) = \sum_{i=1}^n \left[x_i\lambda + (1-x_i)\ln\pi_i - \ln(e^\lambda+\pi_i)\right]$   (1.20)

The derivative is

  $l'(\lambda) = \sum_{i=1}^n \left(x_i - \frac{e^\lambda}{e^\lambda+\pi_i}\right)$   (1.21)

so the maximum-likelihood equation is

  $y = \sum_{i=1}^n x_i = \sum_{i=1}^n \frac{e^{\hat\lambda}}{e^{\hat\lambda}+\pi_i}$   (1.22)

which is just the same as (1.18) with $\hat\theta$ replaced by $e^{\hat\lambda}$.
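Returning to the Bradley-Terry fit: since (1.18) can't be solved in closed form, here is a sketch of the fixed-point iteration (1.19) on simulated data (my own illustration; the strengths and seed are arbitrary). By Theorem 6.1.2, iterating in $\lambda = \ln\theta$ via (1.22) would give the same fixed point:

```python
import numpy as np

rng = np.random.default_rng(2)
pi = np.array([0.5, 1.0, 2.0, 4.0] * 25)  # known strengths of n = 100 opponents
theta_true = 1.5
x = rng.random(pi.size) < theta_true / (theta_true + pi)  # win/loss outcomes
y = x.sum()                                # total number of comparisons won

theta = 1.0                                # any positive starting guess
for _ in range(100):
    theta = y / (1.0 / (theta + pi)).sum() # iteration (1.19)

# At the fixed point, (1.18) holds: predicted wins equal observed wins y
print(theta, (theta / (theta + pi)).sum(), y)
```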

Tuesday 23 February 2016 (Read Section 6.2 of Hogg)

1.2 Fisher Information and The Cramér-Rao Bound

We now turn to a consideration of the properties of estimators and other statistics. Suppose we have a random vector $X$ which is a random sample from a distribution $f(x;\theta)$ which includes a parameter $\theta$. We know the joint sampling distribution is

  $f_{X|\theta}(x_1,\ldots,x_n|\theta) = \prod_{i=1}^n f(x_i;\theta)$   (1.23)

If we have a statistic $Y = u(X)$, this is a random variable. The mean and variance of $Y$ are not random variables, but they depend on $\theta$ because the sampling distribution does:

  $\mu_Y(\theta) = E(Y|\theta) = \int\cdots\int u(x_1,\ldots,x_n)\prod_{i=1}^n f(x_i;\theta)\,dx_i$   (1.24)

and

  $\operatorname{Var}(Y|\theta) = E\left([Y-\mu_Y(\theta)]^2\,\middle|\,\theta\right) = \int [u(x)-\mu_Y(\theta)]^2\, f_{X|\theta}(x|\theta)\,d^n x$   (1.25)

1.2.1 Fisher Information

The first such function we want to consider is the derivative with respect to the parameter $\theta$ of the log-likelihood $l(\theta;x) = \ln f(x;\theta)$. We will focus first on the case of a single random variable before considering a random sample. We will also assume an even broader set of regularity conditions than last time, specifically that we're allowed to interchange derivatives and integrals. The quantity of interest is

  $u(x) = l'(\theta,x) = \frac{\partial \ln f(x;\theta)}{\partial\theta} = \frac{\partial f(x;\theta)/\partial\theta}{f(x;\theta)}$   (1.26)

and it is one measure of how much the probability distribution changes when we change the parameter. Note that in this case the function $u(x)$ depends explicitly on the parameter, in addition to the $\theta$-dependence of the distribution for the random variable $X$. $u(X) = l'(\theta,X)$ is a statistic whose expectation value is

  $E[l'(\theta,X)] = \int_{\mathcal S} \frac{\partial f(x;\theta)/\partial\theta}{f(x;\theta)}\, f(x;\theta)\,dx = \int_{\mathcal S} \frac{\partial f(x;\theta)}{\partial\theta}\,dx$   (1.27)

where we explicitly restrict the integral to the support space $\mathcal S = \{x : f(x;\theta) > 0\}$. Now the regularity conditions (especially the fact that $\mathcal S$ is the same for any allowed value of $\theta$) allow us to pull the derivative outside the integral, so we have

  $E[l'(\theta,X)] = \frac{d}{d\theta}\int_{\mathcal S} f(x;\theta)\,dx = \frac{d}{d\theta} 1 = 0$   (1.28)

where we have used the fact that $f(x;\theta)$ is a normalized pdf in $x$. So note that while $l'(\hat\theta(x),x) = 0$, i.e., the derivative of the log-likelihood is zero at the maximum-likelihood value of $\theta$ for a given $x$, the expectation value $E[l'(\theta,X)]$ vanishes for any $\theta$. This allows us to write the variance as

  $\operatorname{Var}[l'(\theta,X)] = E\left([l'(\theta,X)]^2\right) = \int_{\mathcal S} \left(\frac{\partial\ln f(x;\theta)}{\partial\theta}\right)^2 f(x;\theta)\,dx \equiv I(\theta)$   (1.29)

$I(\theta)$ is a property of the distribution known as the Fisher information. You've encountered it on the homework in the context of the Jeffreys prior $f_\Theta(\theta) \propto \sqrt{I(\theta)}$.

Note that if we go back and differentiate (1.28) we get

  $0 = \frac{d}{d\theta} E[l'(\theta,X)] = \frac{d}{d\theta}\int \frac{\partial\ln f(x;\theta)}{\partial\theta}\, f(x;\theta)\,dx = \int \frac{\partial^2\ln f(x;\theta)}{\partial\theta^2}\, f(x;\theta)\,dx + \int \frac{\partial\ln f(x;\theta)}{\partial\theta}\,\frac{\partial f(x;\theta)}{\partial\theta}\,dx = \int \frac{\partial^2\ln f(x;\theta)}{\partial\theta^2}\, f(x;\theta)\,dx + \int \left(\frac{\partial\ln f(x;\theta)}{\partial\theta}\right)^2 f(x;\theta)\,dx$   (1.30)

This means there's an equivalent, and sometimes easier to calculate, way to write the Fisher information:

  $I(\theta) = -\int \frac{\partial^2\ln f(x;\theta)}{\partial\theta^2}\, f(x;\theta)\,dx = -E[l''(\theta;X)]$   (1.31)

Example: Fisher information for the mean of a normal distribution. To give a very simple example, consider the normal distribution $N(\theta,\sigma^2)$. The likelihood is

  $f(x;\theta) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\theta)^2}{2\sigma^2}\right)$   (1.32)

so the log-likelihood is

  $\ln f(x;\theta) = -\frac{(x-\theta)^2}{2\sigma^2} - \frac{1}{2}\ln(2\pi\sigma^2)$   (1.33)

whose derivatives are

  $\frac{\partial\ln f(x;\theta)}{\partial\theta} = \frac{x-\theta}{\sigma^2}$   (1.34a)
  $\frac{\partial^2\ln f(x;\theta)}{\partial\theta^2} = -\frac{1}{\sigma^2}$   (1.34b)

So the Fisher information is $I(\theta) = \frac{1}{\sigma^2}$. Note in passing that the first derivative is $\frac{X-\theta}{\sigma^2}$, whose expectation value is indeed zero.

Fisher information for a sample of size $n$. We can now consider the Fisher information associated with a random sample of size $n$, given that

  $I(\theta) = -E\left(\frac{\partial^2\ln f(X;\theta)}{\partial\theta^2}\right)$   (1.35)

In fact, since the joint sampling distribution, and therefore the likelihood for the sample, is

  $f_{X|\Theta}(x|\theta) = \prod_{i=1}^n f(x_i;\theta)$,   (1.36)

the log-likelihood will be

  $\ln f_{X|\Theta}(x|\theta) = \sum_{i=1}^n \ln f(x_i;\theta)$,   (1.37)

and therefore the Fisher information for the sample is

  $-E\left(\frac{\partial^2 \ln f_{X|\Theta}(X|\theta)}{\partial\theta^2}\right) = -\sum_{i=1}^n E\left(\frac{\partial^2\ln f(X_i;\theta)}{\partial\theta^2}\right) = n\,I(\theta)$   (1.38)

where we've used the fact that both the derivative and the expectation value are linear operations, and that the random variables $\{X_i\}$ are iid. Note that the Fisher information for a sample of size $n$ from a normal distribution is $\frac{n}{\sigma^2}$, which is one over the variance of the sample mean.
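A quick Monte Carlo check (my own illustration, with arbitrary $\theta$ and $\sigma$) that the score (1.34a) has mean zero as in (1.28) and variance $I(\theta) = 1/\sigma^2$ as in (1.29):

```python
import numpy as np

rng = np.random.default_rng(3)
theta, sigma = 2.0, 1.5
x = rng.normal(theta, sigma, size=200_000)  # many draws of a single observation X

score = (x - theta) / sigma**2              # d ln f / d theta, eq. (1.34a)
print(score.mean())                         # ~ 0, as in (1.28)
print(score.var(), 1 / sigma**2)            # ~ I(theta) = 1/sigma^2, as in (1.29)
```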

1.2.2 The Cramér-Rao Bound

Now we return to consideration of a general statistic $Y = u(X)$, with a special eye towards one being used as an estimator of the parameter $\theta$. Recalling the definition of the expectation value

  $\mu_Y(\theta) = E(Y) = \int u(x_1,\ldots,x_n)\, f_{X|\Theta}(x,\theta)\,d^n x$   (1.39)

and variance

  $\operatorname{Var}(Y|\theta) = E\left([Y-\mu_Y(\theta)]^2\,\middle|\,\theta\right) = \int [u(x)-\mu_Y(\theta)]^2\, f_{X|\theta}(x|\theta)\,d^n x$   (1.40)

we notice a similarity to the Fisher information for the sample

  $n\,I(\theta) = E\left(\left[\frac{\partial\ln f_{X|\Theta}(X|\theta)}{\partial\theta}\right]^2\,\middle|\,\theta\right)$   (1.41)

Both are of the form $E([a(X)]^2|\theta) = \langle a|a\rangle$, where we have defined the inner product

  $\langle a|b\rangle = E(a(X)\,b(X)|\theta) = \int a(x)\,b(x)\,f(x;\theta)\,dx$   (1.42)

This behaves in many ways like a dot product $\vec a\cdot\vec b$, and in particular satisfies the Cauchy-Schwarz inequality⁶

  $\langle a|a\rangle\,\langle b|b\rangle \ge \langle a|b\rangle^2$   (1.43)

⁶ In vector algebra, this is a consequence of the law of cosines: $\vec a\cdot\vec b = \|\vec a\|\,\|\vec b\|\cos\varphi_{ab}$.

This means that

  $\operatorname{Var}(Y|\theta)\; n I(\theta) \ge \left(E\left[(Y-\mu_Y(\theta))\,\frac{\partial\ln f_{X|\Theta}(X|\theta)}{\partial\theta}\right]\right)^2$

where

  $E\left[(Y-\mu_Y(\theta))\,\frac{\partial\ln f_{X|\Theta}(X|\theta)}{\partial\theta}\right] = E\left[u(X)\,\frac{\partial\ln f_{X|\Theta}(X|\theta)}{\partial\theta}\right] - \mu_Y(\theta)\,\underbrace{E\left[\frac{\partial\ln f_{X|\Theta}(X|\theta)}{\partial\theta}\right]}_{0} = \int u(x_1,\ldots,x_n)\,\frac{\partial f_{X|\Theta}(x,\theta)}{\partial\theta}\,d^n x = \frac{d}{d\theta}\int u(x_1,\ldots,x_n)\, f_{X|\Theta}(x,\theta)\,d^n x = \mu_Y'(\theta)$   (1.44)

where the last step holds as long as $u(x)$ doesn't depend explicitly on $\theta$. The inequality can be solved for the variance of the statistic $Y$ to give the Cramér-Rao bound:

  $\operatorname{Var}(Y|\theta) \ge \frac{[\mu_Y'(\theta)]^2}{n\,I(\theta)}$   (1.45)

In the special case where $Y$ is an unbiased estimator of $\theta$, so that $\mu_Y'(\theta) = 1$, it simplifies to

  $\operatorname{Var}(Y|\theta) \ge \frac{1}{n\,I(\theta)}$ if $E(Y|\theta) = \theta$   (1.46)

I.e., the variance of an unbiased estimator can be no less than the reciprocal of the Fisher information of the sample.

To return to our simple example of a Gaussian sampling distribution $N(\theta,\sigma^2)$, let $Y = \bar X$, the sample mean, which we know is an unbiased estimator of the distribution mean $\theta$. Then the Cramér-Rao bound tells us

  $\operatorname{Var}(\bar X|\theta) \ge \frac{1}{n\,I(\theta)} = \frac{\sigma^2}{n}$   (1.47)

Of course, we know that the variance of the sample mean is $\sigma^2/n$, which means that in this case the bound is saturated, and the sample mean is the lowest-variance unbiased estimator of the distribution mean for a normal distribution with known variance.

When the Cramér-Rao bound is saturated, i.e., $\operatorname{Var}(Y|\theta) = \frac{1}{nI(\theta)}$ for an unbiased estimator, we say that $Y$ is an efficient estimator of $\theta$. Otherwise, we can define the ratio of the Cramér-Rao bound to the actual variance as the efficiency of the estimator. Another selling point of maximum likelihood estimates is their relationship to efficiency: an MLE is not always an efficient estimator, but one can prove using Taylor series that it becomes so in the limit that the sample size goes to infinity. Another important theorem, the proof of which we omit for now, is that, subject to the usual regularity conditions, the distribution for the MLE converges to a Gaussian:

  $\sqrt{n}\,\left(\hat\theta(X) - \theta\right) \xrightarrow{D} N\left(0, \frac{1}{I(\theta)}\right)$   (1.48)
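To see (1.48) at work in a case where the bound isn't saturated exactly at finite $n$, here is a small simulation (my own illustration, using the exponential model with rate $\theta$, for which $I(\theta) = 1/\theta^2$ and $\hat\theta = 1/\bar X$):

```python
import numpy as np

rng = np.random.default_rng(4)
theta, n, trials = 2.5, 400, 20_000
x = rng.exponential(scale=1 / theta, size=(trials, n))
theta_hat = 1 / x.mean(axis=1)  # MLE of the rate in each trial

print(theta_hat.var())          # sampling variance of the MLE
print(theta**2 / n)             # Cramér-Rao bound 1/(n I(theta)), approached as n grows
```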

Tuesday 1 March 2016 (Read Section 6.3 of Hogg)

2 Maximum-Likelihood Tests

So far we've used maximum likelihood methods to consider ways to estimate the parameter $\theta$ of a distribution $f(x;\theta)$. We now consider tests designed to compare a null hypothesis $H_0$, which specifies $\theta = \theta_0$, and $H_1$, which allows for arbitrary $\theta$ within the allowed parameter space. Note that since these are classical hypothesis tests, we don't require $H_1$ to specify a prior probability distribution for $\theta$. In each case, we will construct a test statistic from a sample of size $n$ drawn from $f(x;\theta)$ which is asymptotically chi-square distributed in the limit $n\to\infty$.

2.1 Likelihood Ratio Test

Define as usual the likelihood and its logarithm

  $L(\theta;x) = f_{X|\Theta}(x|\theta) = \prod_{i=1}^n f(x_i;\theta)$   (2.1)
  $l(\theta;x) = \ln f_{X|\Theta}(x|\theta) = \sum_{i=1}^n \ln f(x_i;\theta)$   (2.2)

Recall that for Bayesian hypothesis testing, the natural quantity to construct was the Bayes factor

  $B_{01} = \frac{f_X(x|H_0)}{f_X(x|H_1)} = \frac{L(\theta_0;x)}{\int_{\theta\in\Omega} L(\theta;x)\, f_\Theta(\theta|H_1)\,d\theta}$   (2.3)

In the frequentist case we don't define a prior $f_\Theta(\theta|H_1)$, so instead, of all the $\theta$ values at which to evaluate the likelihood, we choose the one which maximizes it⁷, and look at the likelihood ratio

  $\Lambda(x) = \frac{L(\theta_0;x)}{L(\hat\theta(x);x)}$   (2.4)

Note that in the frequentist picture $L(\theta;X)$ and $\Lambda(X)$ are statistics, i.e., random variables constructed from the sample $X$.

⁷ There are various arguments which make this a reasonable choice, most revolving around the fact that for large samples, the likelihood function is approximately Gaussian with a peak value of $L(\hat\theta(x);x)$.

2.1.1 Example: Gaussian Distribution

Consider a sample of size $n$ drawn from a $N(\theta,\sigma^2)$ distribution, for which we know the MLE is $\bar x$ and the likelihood is

  $L(\theta;x) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n \exp\left(-\frac{\sum_i (x_i-\bar x)^2}{2\sigma^2}\right)\exp\left(-\frac{n(\theta-\bar x)^2}{2\sigma^2}\right)$   (2.5)

and the likelihood ratio is

  $\Lambda(x) = \exp\left(-\frac{n(\theta_0-\bar x)^2}{2\sigma^2}\right)$   (2.6)

Since $\bar X \sim N(\theta,\sigma^2/n)$, we have

  $-2\ln\Lambda(X) = \frac{(\bar X-\theta_0)^2}{\sigma^2/n} \sim \chi^2(1)$ assuming $H_0$   (2.7)

Now, this won't be exactly true in general, but the various asymptotic theorems show that, for a regular underlying distribution, it's true in the asymptotic sense:

  $\chi^2_L = -2\ln\Lambda(X) \xrightarrow{D} \chi^2(1)$ assuming $H_0$   (2.8)

For finite $n$, we assume this is approximately true, and compare $-2\ln\Lambda(X)$ to the $100(1-\alpha)$th percentile of the $\chi^2(1)$ distribution to get a test with false alarm probability $\alpha$.
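A sketch of that recipe (my own illustration; the model, seed, and numbers are arbitrary): compute $-2\ln\Lambda$ from (2.7) and compare it to the $\chi^2(1)$ percentile:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
theta0, sigma, n, alpha = 0.0, 1.0, 50, 0.05
x = rng.normal(0.3, sigma, size=n)                  # true mean differs from theta0

chi2_L = (x.mean() - theta0) ** 2 / (sigma**2 / n)  # -2 ln Lambda, eq. (2.7)
threshold = chi2.ppf(1 - alpha, df=1)               # 100(1-alpha)th percentile
print(chi2_L, threshold, chi2_L > threshold)        # True means reject H0
```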

2.2 Wald Test

The statistic $-2\ln\Lambda(X)$ constructed from the likelihood ratio is only one of several statistics which are asymptotically $\chi^2$. Another possibility is to consider the result that

  $\sqrt{n}\,\left(\hat\theta(X) - \theta_0\right) \xrightarrow{D} N\left(0, \frac{1}{I(\theta_0)}\right)$   (2.9)

and construct the statistic

  $\chi^2_W = n\,I(\hat\theta(X))\,\left(\hat\theta(X)-\theta_0\right)^2 = \frac{\left(\hat\theta(X)-\theta_0\right)^2}{1/\left[n\,I(\hat\theta(X))\right]}$   (2.10)

Comparing this to the percentiles of a $\chi^2(1)$ is known as the Wald test.

2.2.1 Example: Gaussian Distribution

The distribution of the statistic $\chi^2_W$ should converge to $\chi^2(1)$ as $n\to\infty$. We can see what the exact distribution is for a given choice of the underlying sampling distribution. Assuming again a $N(\theta,\sigma^2)$ distribution for $f(x;\theta)$, we have Fisher information $I(\theta) = \frac{1}{\sigma^2}$, maximum likelihood estimator $\hat\theta(X) = \bar X$, and Wald test statistic

  $\chi^2_W = \frac{(\bar X-\theta_0)^2}{\sigma^2/n}$   (2.11)

which is of the same form as (2.7), which we know to be exactly $\chi^2(1)$ for any $n$.

2.3 Rao Scores Test

The final likelihood-based test involves the score

  $\frac{\partial\ln f(X_i;\theta)}{\partial\theta}$   (2.12)

associated with each random variable in the sample, whose sum is the derivative of the log-likelihood

  $l'(\theta;X) = \sum_{i=1}^n \frac{\partial\ln f(X_i;\theta)}{\partial\theta}$   (2.13)

We know the expectation value is

  $E[l'(\theta;X)] = \sum_{i=1}^n E\left[\frac{\partial\ln f(X_i;\theta)}{\partial\theta}\right] = 0$   (2.14)

and the variance is

  $\operatorname{Var}[l'(\theta;X)] = \sum_{i=1}^n \operatorname{Var}\left[\frac{\partial\ln f(X_i;\theta)}{\partial\theta}\right] = n\,I(\theta)$   (2.15)

so we construct a test statistic which, in the limit that the sum of the scores is normally distributed, becomes a $\chi^2(1)$:

  $\chi^2_R = \frac{\left[l'(\theta_0;X)\right]^2}{n\,I(\theta_0)}$   (2.16)

Comparing this to the percentiles of a $\chi^2(1)$ is known as the Rao scores test. It has the advantage over the other two tests that we don't need to work out the maximum likelihood estimator to apply it.

2.3.1 Example: Gaussian Distribution

If the underlying distribution is $N(\theta,\sigma^2)$, so that

  $\ln f(x;\theta) = -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{(x-\theta)^2}{2\sigma^2}$   (2.17)

and

  $\frac{\partial\ln f(x;\theta)}{\partial\theta} = \frac{x-\theta}{\sigma^2}$   (2.18)

which makes the derivative of the log-likelihood

  $l'(\theta;X) = \sum_{i=1}^n \frac{X_i-\theta}{\sigma^2} = \frac{n(\bar X-\theta)}{\sigma^2}$   (2.19)

which makes the Rao scores statistic

  $\chi^2_R = \frac{n^2(\bar X-\theta_0)^2}{\sigma^4}\,\frac{\sigma^2}{n} = \frac{(\bar X-\theta_0)^2}{\sigma^2/n}$   (2.20)

which is, again, exactly $\chi^2(1)$ distributed in this case.
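For this Gaussian example all three statistics are identical, so a more instructive sketch (my own illustration) uses the exponential model, where $\chi^2_L$, $\chi^2_W$, and $\chi^2_R$ agree only asymptotically:

```python
import numpy as np

rng = np.random.default_rng(6)
theta0, n = 2.0, 200
x = rng.exponential(scale=1 / theta0, size=n)  # generate under H0

# Exponential model: l(theta) = n ln(theta) - theta sum(x), I(theta) = 1/theta^2
theta_hat = 1 / x.mean()
def loglike(th):
    return n * np.log(th) - th * x.sum()

chi2_L = -2 * (loglike(theta0) - loglike(theta_hat))   # likelihood ratio statistic
chi2_W = n * (theta_hat - theta0) ** 2 / theta_hat**2  # Wald statistic, eq. (2.10)
chi2_R = (n / theta0 - x.sum()) ** 2 * theta0**2 / n   # Rao statistic, eq. (2.16)
print(chi2_L, chi2_W, chi2_R)  # close but not identical at finite n
```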

Wednesday 2 March 2016: Review for Prelim Exam One

The exam covers materials from the first four class-weeks of the term, i.e., Hogg Sections 11.1-11.3 and 6.1-6.2 and associated topics covered in class through February 23, and problem sets 1-4.

Thursday 3 March 2016: First Prelim Exam

Tuesday 8 March 2016 (Read Section 6.4 of Hogg)

3 Multi-Parameter Methods

We now consider the case where the probability distribution $f(x;\boldsymbol\theta)$ depends not just on a single parameter, but on $p$ parameters $\{\theta_1,\theta_2,\ldots,\theta_p\} = \{\theta_\alpha\} = \boldsymbol\theta$. The likelihood for a sample of size $n$ is then

  $L(\boldsymbol\theta;x) = \prod_{i=1}^n f(x_i;\boldsymbol\theta)$   (3.1)

and the log-likelihood is

  $l(\boldsymbol\theta;x) = \sum_{i=1}^n \ln f(x_i;\boldsymbol\theta)$   (3.2)

3.1 Maximum Likelihood Estimation

We're still interested in the set of parameter values $\{\hat\theta_1,\ldots,\hat\theta_p\}$ which maximize the likelihood or, equivalently, its logarithm. But to maximize a function of $p$ parameters, we need to set all of the partial derivatives to zero, which means there are now $p$ maximum likelihood equations

  $\frac{\partial l(\boldsymbol\theta;x)}{\partial\theta_\alpha} = 0 \qquad \alpha = 1,2,\ldots,p$   (3.3)

We can in principle solve these $p$ equations for the $p$ unknowns $\{\hat\theta_\alpha\}$. For example, consider a sample of size $n$ drawn from a $N(\theta_1,\theta_2)$ distribution with pdf

  $f(x;\theta_1,\theta_2) = \frac{1}{\sqrt{2\pi\theta_2}}\exp\left(-\frac{(x-\theta_1)^2}{2\theta_2}\right)$   (3.4)

so that

  $\ln f(x;\theta_1,\theta_2) = -\frac{1}{2}\ln(2\pi) - \frac{1}{2}\ln\theta_2 - \frac{(x-\theta_1)^2}{2\theta_2}$   (3.5)

The partial derivatives are

  $\frac{\partial\ln f(x;\theta_1,\theta_2)}{\partial\theta_1} = \frac{x-\theta_1}{\theta_2}$   (3.6a)
  $\frac{\partial\ln f(x;\theta_1,\theta_2)}{\partial\theta_2} = -\frac{1}{2\theta_2} + \frac{(x-\theta_1)^2}{2\theta_2^2}$   (3.6b)

Taking the partial derivatives of (3.2) and using linearity then gives us

  $\frac{\partial l(x;\theta_1,\theta_2)}{\partial\theta_1} = \sum_{i=1}^n \frac{x_i-\theta_1}{\theta_2}$   (3.7a)
  $\frac{\partial l(x;\theta_1,\theta_2)}{\partial\theta_2} = -\frac{n}{2\theta_2} + \sum_{i=1}^n \frac{(x_i-\theta_1)^2}{2\theta_2^2}$   (3.7b)

We need to set both of these to zero to find $\hat\theta_1$ and $\hat\theta_2$, i.e., the coupled maximum-likelihood equations are

  $\sum_{i=1}^n \frac{x_i-\hat\theta_1}{\hat\theta_2} = 0$   (3.8a)
  $-\frac{n}{2\hat\theta_2} + \sum_{i=1}^n \frac{(x_i-\hat\theta_1)^2}{2\hat\theta_2^2} = 0$   (3.8b)

We can solve the first equation for

  $\hat\theta_1 = \frac{1}{n}\sum_{i=1}^n x_i = \bar x$   (3.9)

and then substitute that into the second to get

  $\hat\theta_2 = \frac{1}{n}\sum_{i=1}^n (x_i-\bar x)^2$   (3.10)

i.e.,

  $\hat\theta_2 = \frac{1}{n}\sum_{i=1}^n (x_i-\bar x)^2 = \frac{n-1}{n}\,s^2$   (3.11)

where $s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i-\bar x)^2$ is the usual unbiased sample variance.

3.2 Fisher Information Matrix

If we consider the score statistic for a single observation

  $\frac{\partial\ln f(X;\boldsymbol\theta)}{\partial\theta_\alpha}$   (3.12)

we see that it's now actually a random vector with one component per parameter. It is still true that

  $E\left(\frac{\partial\ln f(X;\boldsymbol\theta)}{\partial\theta_\alpha}\right) = \int \frac{\partial f(x;\boldsymbol\theta)/\partial\theta_\alpha}{f(x;\boldsymbol\theta)}\, f(x;\boldsymbol\theta)\,dx = 0$   (3.13)

You can show this by taking $\frac{\partial}{\partial\theta_\alpha}$ of the normalization integral for $f(x;\boldsymbol\theta)$. Since this is a random vector, we can construct its variance-covariance matrix $\mathbf I(\boldsymbol\theta)$ with elements

  $I_{\alpha\beta}(\boldsymbol\theta) = E\left(\left[\frac{\partial\ln f(X;\boldsymbol\theta)}{\partial\theta_\alpha}\right]\left[\frac{\partial\ln f(X;\boldsymbol\theta)}{\partial\theta_\beta}\right]\right)$   (3.14)

As before, we can differentiate (3.13) with respect to $\theta_\beta$ to get the identity

  $0 = \frac{\partial}{\partial\theta_\beta}\int \frac{\partial\ln f(x;\boldsymbol\theta)}{\partial\theta_\alpha}\, f(x;\boldsymbol\theta)\,dx = \int \frac{\partial^2\ln f(x;\boldsymbol\theta)}{\partial\theta_\alpha\,\partial\theta_\beta}\, f(x;\boldsymbol\theta)\,dx + \int \frac{\partial\ln f(x;\boldsymbol\theta)}{\partial\theta_\alpha}\,\frac{\partial\ln f(x;\boldsymbol\theta)}{\partial\theta_\beta}\, f(x;\boldsymbol\theta)\,dx$   (3.15)

i.e., an equivalent way of writing the Fisher information matrix is

  $I_{\alpha\beta}(\boldsymbol\theta) = -E\left(\frac{\partial^2\ln f(X;\boldsymbol\theta)}{\partial\theta_\alpha\,\partial\theta_\beta}\right)$   (3.16)

Note that by definition and construction, the Fisher matrix is symmetric.
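A quick check of (3.9)-(3.11) on simulated data (my own illustration; the parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(5.0, 2.0, size=500)           # sample from N(theta1=5, theta2=4)

theta1_hat = x.mean()                        # eq. (3.9)
theta2_hat = ((x - theta1_hat) ** 2).mean()  # eq. (3.10), the 1/n variance
s2 = x.var(ddof=1)                           # unbiased sample variance s^2
n = len(x)
print(theta1_hat, theta2_hat, (n - 1) / n * s2)  # last two agree, eq. (3.11)
```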

3.2.1 Example: Normal Distribution

If we return to the case of a $N(\theta_1,\theta_2)$ distribution, taking derivatives of (3.6) gives us

  $\frac{\partial^2\ln f(x;\theta_1,\theta_2)}{\partial\theta_1^2} = -\frac{1}{\theta_2}$   (3.17a)
  $\frac{\partial^2\ln f(x;\theta_1,\theta_2)}{\partial\theta_1\,\partial\theta_2} = -\frac{x-\theta_1}{\theta_2^2}$   (3.17b)
  $\frac{\partial^2\ln f(x;\theta_1,\theta_2)}{\partial\theta_2^2} = \frac{1}{2\theta_2^2} - \frac{(x-\theta_1)^2}{\theta_2^3}$   (3.17c)

If we recall $E(X) = \theta_1$ and $\operatorname{Var}(X) = E\left((X-\theta_1)^2\right) = \theta_2$, we get the Fisher information matrix elements $I_{\alpha\beta}(\boldsymbol\theta) = -E\left(\frac{\partial^2\ln f(X;\theta_1,\theta_2)}{\partial\theta_\alpha\,\partial\theta_\beta}\right)$:

  $I_{11}(\boldsymbol\theta) = \frac{1}{\theta_2}$   (3.18a)
  $I_{22}(\boldsymbol\theta) = -\frac{1}{2\theta_2^2} + \frac{\theta_2}{\theta_2^3} = \frac{1}{2\theta_2^2}$   (3.18b)
  $I_{12}(\boldsymbol\theta) = I_{21}(\boldsymbol\theta) = 0$   (3.18c)

i.e.,

  $\mathbf I(\boldsymbol\theta) = \begin{pmatrix} \frac{1}{\theta_2} & 0 \\ 0 & \frac{1}{2\theta_2^2} \end{pmatrix}$   (3.19)

3.2.2 Fisher Information Matrix for a Random Sample

As in the single-parameter case, linearity means that the Fisher information matrix for a sample of size $n$ is just $n$ times the FIM for the distribution:

  $-E\left(\frac{\partial^2 l(\boldsymbol\theta;X)}{\partial\theta_\alpha\,\partial\theta_\beta}\right) = -\sum_{i=1}^n E\left(\frac{\partial^2\ln f(X_i;\boldsymbol\theta)}{\partial\theta_\alpha\,\partial\theta_\beta}\right) = n\,I_{\alpha\beta}(\boldsymbol\theta)$   (3.20)

3.2.3 Error Estimation

Recall that for a model with a single parameter $\theta$, the Cramér-Rao bound stated that an unbiased estimator of $\theta$ had minimum variance $\frac{1}{nI(\theta)}$. Also, the maximum-likelihood estimator $\hat\theta$ from a sample of size $n$ saturated that bound in the limit $n\to\infty$, and in fact its distribution converged to a Gaussian with the minimum variance specified by the bound:

  $\sqrt{n}\,(\hat\theta - \theta) \xrightarrow{D} N\left(0, \frac{1}{I(\theta)}\right)$   (3.21)

In the multi-parameter case, we have a maximum-likelihood estimator $\hat{\boldsymbol\theta}$ which is a random vector with $p$ components $\{\hat\theta_\alpha \mid \alpha = 1,\ldots,p\}$. The corresponding limiting distribution is a multivariate normal:

  $\sqrt{n}\,(\hat{\boldsymbol\theta} - \boldsymbol\theta) \xrightarrow{D} N_p\left(\mathbf 0, [\mathbf I(\boldsymbol\theta)]^{-1}\right)$   (3.22)

The matrix inverse $[\mathbf I(\boldsymbol\theta)]^{-1}$ of the Fisher information matrix is the variance-covariance matrix of the limiting distribution. It is also this inverse matrix which provides the multi-parameter version of the Cramér-Rao bound. If $T_\alpha(X)$ is an unbiased estimator of one parameter $\theta_\alpha$, the lower bound on its variance is

  $\operatorname{Var}(T_\alpha(X)) \ge \frac{1}{n}\left[\mathbf I(\boldsymbol\theta)^{-1}\right]_{\alpha\alpha}$   (3.23)

I.e., the diagonal elements of the inverse Fisher matrix provide lower bounds on the variances of unbiased estimators of the parameters.
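A minimal sketch of using (3.22) and (3.23) for error estimation (my own illustration, with the diagonal Fisher matrix (3.19) of the $N(\theta_1,\theta_2)$ example): invert $n\,\mathbf I(\boldsymbol\theta)$ and read the error bars off the diagonal:

```python
import numpy as np

n, theta2 = 500, 4.0
I = np.array([[1 / theta2, 0.0],
              [0.0, 1 / (2 * theta2**2)]])  # per-observation Fisher matrix (3.19)

cov = np.linalg.inv(n * I)    # asymptotic covariance of (theta1-hat, theta2-hat)
print(np.sqrt(np.diag(cov)))  # errors: sqrt(theta2/n), sqrt(2 theta2^2/n)
```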

In cases like (3.19), where the Fisher matrix is diagonal, its inverse is also diagonal, with $[\mathbf I(\boldsymbol\theta)^{-1}]_{\alpha\alpha}$ being the same as $1/I_{\alpha\alpha}(\boldsymbol\theta)$. But in general,⁸

  $\left[\mathbf I(\boldsymbol\theta)^{-1}\right]_{\alpha\alpha} \ge \frac{1}{I_{\alpha\alpha}(\boldsymbol\theta)}$   (3.24)

This is thus a stronger bound than one would get by assuming all of the nuisance parameters to be known and applying the usual Cramér-Rao bound to the parameter of interest.

⁸ This is easy to see in the case $p = 2$, where $\mathbf I = \begin{pmatrix} I_{11} & I_{12} \\ I_{12} & I_{22} \end{pmatrix}$ and $\mathbf I^{-1} = \frac{1}{I_{11}I_{22}-I_{12}^2}\begin{pmatrix} I_{22} & -I_{12} \\ -I_{12} & I_{11} \end{pmatrix}$, so that $[\mathbf I^{-1}]_{11} = \frac{I_{22}}{I_{11}I_{22}-I_{12}^2} = \frac{1}{I_{11}-I_{12}^2/I_{22}} \ge \frac{1}{I_{11}}$.

Thursday 10 March 2016 (Read Section 6.5 of Hogg)

3.3 Maximum Likelihood Tests

We return now to the question of hypothesis testing when the parameter space associated with the distribution is multi-dimensional. Hogg considers this in a somewhat limited context where the null hypothesis $H_0$ involves picking values for one or more parameters (or combinations of parameters) and leaving the rest unspecified. Formally this is written as $H_0\colon \boldsymbol\theta\in\omega$, where $\omega$ is some lower-dimensional subspace of the parameter space $\Omega$. The alternative hypothesis is $H_1\colon [\boldsymbol\theta\in\Omega] \wedge [\boldsymbol\theta\notin\omega]$. The number of parameters in the full parameter space is $p$, while $\omega$ is defined by specifying values for $q$ independent functions of the parameters, where $0 < q \le p$. This means that $\omega$ is a $(p-q)$-dimensional space. The case considered before was $p = q = 1$, so that $\omega$ was a zero-dimensional point in the one-dimensional parameter space $\Omega$.

Since $H_0$ is also a composite hypothesis with different possible parameter values, if we were constructing a Bayes factor, we'd also need to specify a prior distribution for $\boldsymbol\theta\in\omega$ associated with $H_0$:

while the MLE for H was foud last time to be θ x i x 3.9a θ x i x σ 3.9b We see that the maximized likelihoods are L θx; x π σ / e / L θ 0 x; x π σ 0 / e / 3.30a 3.30b o the likelihood ratio is / σ ΛX X i X / X 3.3 i µ 0 If we write σ 0 X i µ 0 [X i X + X µ 0 ] we ca see that X i µ 0 so X i X + X µ 0 + X i XX µ 0 3.3 X i X +X µ 0 0 X i X X µ 0 + 3.33 ΛX + X µ 0 X i X / 3.34 This meas that thresholdig o the value of ΛX is the same as thresholdig o the secod term i paretheses, or equivaletly o T X µ 0 X i X / 3.35 But we ve costructed this last statistic so that, if H 0 holds ad the {X i } are idepedet Nµ 0, θ radom variables, tudet s theorem tells is this is t-distributed with degrees of freedom. o for a test at sigificace level α we reject H 0 if T > t, α or T < t, α, sice o we reject H 0 if ΛX + T / 3.36 ΛX < + t, α / 3.37 3.3. Example: Mea of Multivariate Normal Distributio with Uit Variace We kow that the behavior of maximum likelihood tests is simplest whe the sample is draw from a ormal distributio with kow variace whose mea is the parameter i questio. Now we have multiple parameters, so the straightforward extesio is to have the sample draw from a multivariate ormal distributio whose meas are the parameters. o the sample ow cosists of idepedet radom vectors X, X,..., X, each with p elemets ad draw from a multivariate ormal distributio. For simplicity, we ll use the idetity matrix as the variacecovariace matrix, so X i N p θ, p p. We ll defie the ull hypothesis H 0 to be that the first q of the {θ α } are zero. The 4

likelihood fuctio is Lθ; x π / exp h x i θ T x i θ π / exp p [x i ] α θ α α exp p θ α x α where α x α 3.38 [x i ] α 3.39 Thus the maximum likelihood solutio uder H is θ α x α, whike uder H 0 it is { 0 α,..., q [ θ 0 ] α 3.40 α q +,..., p x α ad the likelihood ratio is Λ{x i } L θ 0 ; x L θ; x exp q α x α 3.4 Uder H 0, the {X α } are idepedet ad X α N0, for α,..., q. Thus, if the ull hypothesis is true, q X α l Λ{X i } / χ q 3.4 3.3.3 Large-ample Limit This last result also holds geerically as a limitig distributio for large samples : α l Λ{X i } D χ q 3.43 5