Lecture 5

Let us give one more example of MLE.

Example 3. The uniform distribution $U[0, \theta]$ on the interval $[0, \theta]$ has p.d.f.

$$f(x|\theta) = \begin{cases} \dfrac{1}{\theta}, & 0 \le x \le \theta, \\ 0, & \text{otherwise.} \end{cases}$$

The likelihood function is

$$\varphi(\theta) = \prod_{i=1}^{n} f(X_i|\theta) = \frac{1}{\theta^n} I(X_1, \ldots, X_n \in [0, \theta]) = \frac{1}{\theta^n} I(\max(X_1, \ldots, X_n) \le \theta).$$

Here the indicator function $I(A)$ equals 1 if $A$ happens and 0 otherwise. What we wrote is that the product of the p.d.f.s $f(X_i|\theta)$ is equal to 0 if at least one of the factors is 0, and this happens if at least one of the $X_i$ falls outside of the interval $[0, \theta]$, which is the same as the maximum among them exceeding $\theta$. In other words,

$$\varphi(\theta) = 0 \quad \text{if } \theta < \max(X_1, \ldots, X_n),$$

and

$$\varphi(\theta) = \frac{1}{\theta^n} \quad \text{if } \theta \ge \max(X_1, \ldots, X_n).$$

Therefore, looking at figure 5.1, we see that $\hat\theta = \max(X_1, \ldots, X_n)$ is the MLE.

5.1 Consistency of MLE.

Why does the MLE $\hat\theta$ converge to the unknown parameter $\theta_0$? This is not immediately obvious, and in this section we will give a sketch of why this happens.
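As a quick numerical illustration of the convergence in question, the following minimal sketch simulates Example 3 and prints the MLE $\max(X_1, \ldots, X_n)$ for growing sample sizes. The function names `uniform_mle` and `simulate_uniform_mle`, the true value `theta0 = 2.0`, the seed, and the sample sizes are all illustrative assumptions, not part of the lecture.

```python
import random

def uniform_mle(sample):
    """MLE of theta for U[0, theta]: the sample maximum (Example 3)."""
    return max(sample)

def simulate_uniform_mle(theta0, sizes, seed=0):
    """Return {n: MLE} for i.i.d. samples of each size n drawn from U[0, theta0]."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return {
        n: uniform_mle([rng.uniform(0.0, theta0) for _ in range(n)])
        for n in sizes
    }

# Hypothetical true parameter and sample sizes, chosen only for illustration.
estimates = simulate_uniform_mle(theta0=2.0, sizes=[10, 100, 1000, 10000])
for n, estimate in estimates.items():
    print(n, round(estimate, 4))
```

The estimate can never exceed the true parameter, since every observation lies in $[0, \theta_0]$; it approaches $\theta_0$ from below as $n$ grows.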

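[Figure 5.1: Maximize over $\theta$. Plot of $\varphi(\theta)$ against $\theta$, with $\max(X_1, \ldots, X_n)$ marked on the horizontal axis.]

First of all, the MLE $\hat\theta$ is a maximizer of

$$L_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} \log f(X_i|\theta),$$

which is just the log-likelihood function normalized by $\frac{1}{n}$ (of course, this does not affect the maximization). $L_n(\theta)$ depends on the data. Let us consider the function $l(X|\theta) = \log f(X|\theta)$ and define

$$L(\theta) = \mathbb{E}_{\theta_0} l(X|\theta),$$

where we recall that $\theta_0$ is the true unknown parameter of the sample $X_1, \ldots, X_n$. By the law of large numbers, for any $\theta$,

$$L_n(\theta) \to \mathbb{E}_{\theta_0} l(X|\theta) = L(\theta).$$

Note that $L(\theta)$ does not depend on the sample; it only depends on $\theta$. We will need the following.

Lemma. We have, for any $\theta$,

$$L(\theta) \le L(\theta_0).$$

Moreover, the inequality is strict, $L(\theta) < L(\theta_0)$, unless

$$\mathbb{P}_{\theta_0}\big(f(X|\theta) = f(X|\theta_0)\big) = 1,$$

which means that $\mathbb{P}_\theta = \mathbb{P}_{\theta_0}$.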
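Before the proof, a concrete instance may help. The sketch below assumes the family $N(\theta, 1)$, which is an illustrative choice and not part of the lecture. For this family $L(\theta) = -\frac{1}{2}\log(2\pi) - \frac{1}{2}\big(1 + (\theta - \theta_0)^2\big)$, so the Lemma can be read off directly. The code compares this closed form with the normalized log-likelihood $L_n(\theta)$ on a simulated sample. The names `log_density_normal`, `L_n`, `L`, and all numeric settings are hypothetical.

```python
import math
import random

def log_density_normal(x, theta):
    """log f(x | theta) for the N(theta, 1) density (assumed family)."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * (x - theta) ** 2

def L_n(sample, theta):
    """Normalized log-likelihood (1/n) * sum_i log f(X_i | theta)."""
    return sum(log_density_normal(x, theta) for x in sample) / len(sample)

def L(theta, theta0):
    """Closed form of E_{theta0} log f(X | theta) for the N(theta, 1) family."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * (1.0 + (theta - theta0) ** 2)

rng = random.Random(1)            # fixed seed for reproducibility
theta0 = 1.0                      # hypothetical true parameter
sample = [rng.gauss(theta0, 1.0) for _ in range(20000)]
grid = [theta0 + 0.25 * k for k in range(-8, 9)]

for theta in grid:
    # L_n(theta) should be close to L(theta), and L(theta) <= L(theta0).
    print(f"{theta:5.2f}  L_n={L_n(sample, theta):8.4f}  L={L(theta, theta0):8.4f}")
```

On the grid, $L(\theta)$ peaks at $\theta = \theta_0$ and $L_n(\theta)$ tracks it closely, which is the picture the consistency argument relies on.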

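Proof. Let us consider the difference

$$L(\theta) - L(\theta_0) = \mathbb{E}_{\theta_0}\big(\log f(X|\theta) - \log f(X|\theta_0)\big) = \mathbb{E}_{\theta_0} \log \frac{f(X|\theta)}{f(X|\theta_0)}.$$

[Figure 5.2: Diagram $(t - 1)$ vs. $\log t$.]

Since $(t - 1)$ is an upper bound on $\log t$ (see figure 5.2), we can write

$$\mathbb{E}_{\theta_0} \log \frac{f(X|\theta)}{f(X|\theta_0)} \le \mathbb{E}_{\theta_0}\Big(\frac{f(X|\theta)}{f(X|\theta_0)} - 1\Big) = \int \Big(\frac{f(x|\theta)}{f(x|\theta_0)} - 1\Big) f(x|\theta_0)\,dx = \int f(x|\theta)\,dx - \int f(x|\theta_0)\,dx = 1 - 1 = 0.$$

Both integrals are equal to 1 because we are integrating probability density functions. This proves that $L(\theta) - L(\theta_0) \le 0$. The second statement of the Lemma is also clear, since $\log t = t - 1$ only at $t = 1$.

We will use this Lemma to sketch the consistency of the MLE.

Theorem: Under some regularity conditions on the family of distributions, the MLE $\hat\theta$ is consistent, i.e. $\hat\theta \to \theta_0$ as $n \to \infty$.

The statement of this Theorem is not very precise, but rather than proving a rigorous mathematical statement, our goal here is to illustrate the main idea. Mathematically inclined students are welcome to come up with some precise statement.

Proof.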

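We have the following facts:

1. $\hat\theta$ is the maximizer of $L_n(\theta)$ (by definition).
2. $\theta_0$ is the maximizer of $L(\theta)$ (by the Lemma).
3. For all $\theta$, we have $L_n(\theta) \to L(\theta)$ by the LLN.

This situation is illustrated in figure 5.3. Therefore, since the two functions $L_n$ and $L$ are getting closer, the points of maximum should also get closer, which exactly means that $\hat\theta \to \theta_0$.

[Figure 5.3: Lemma: $L(\theta) \le L(\theta_0)$. Curves $L_n(\theta)$ and $L(\theta)$ against $\theta$, with the MLE $\hat\theta$ and $\theta_0$ marked on the horizontal axis.]

5.2 Asymptotic normality of MLE. Fisher information.

We want to show the asymptotic normality of the MLE, i.e. that

$$\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N(0, \sigma^2_{MLE}) \quad \text{for some } \sigma^2_{MLE}.$$

Let us recall that above we defined the function $l(X|\theta) = \log f(X|\theta)$. To simplify the notation we will denote by $l'(X|\theta)$, $l''(X|\theta)$, etc. the derivatives of $l(X|\theta)$ with respect to $\theta$.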
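To make this notation concrete, here is a short worked instance for one family chosen purely for illustration. The exponential density with rate $\theta$ is an assumption of this example and does not appear in the lecture.

```latex
\begin{align*}
f(x|\theta) &= \theta e^{-\theta x}, \qquad x \ge 0,\ \theta > 0, \\
l(x|\theta) &= \log f(x|\theta) = \log\theta - \theta x, \\
l'(x|\theta) &= \frac{1}{\theta} - x, \\
l''(x|\theta) &= -\frac{1}{\theta^2}.
\end{align*}
```

In this particular family $l''$ happens not to depend on $x$; in general both derivatives are functions of the observation.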

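Definition. (Fisher information.) The Fisher information of a random variable $X$ with distribution $\mathbb{P}_{\theta_0}$ from the family $\{\mathbb{P}_\theta : \theta \in \Theta\}$ is defined by

$$I(\theta_0) = \mathbb{E}_{\theta_0}\big(l'(X|\theta_0)\big)^2 \equiv \mathbb{E}_{\theta_0}\Big(\frac{\partial}{\partial\theta} \log f(X|\theta)\Big|_{\theta=\theta_0}\Big)^2.$$

The next lemma gives another, often convenient, way to compute the Fisher information.

Lemma. We have

$$\mathbb{E}_{\theta_0} l''(X|\theta_0) \equiv \mathbb{E}_{\theta_0} \frac{\partial^2}{\partial\theta^2} \log f(X|\theta_0) = -I(\theta_0).$$

Proof. First of all, we have

$$l'(X|\theta) = (\log f(X|\theta))' = \frac{f'(X|\theta)}{f(X|\theta)}$$

and

$$(\log f(X|\theta))'' = \frac{f''(X|\theta)}{f(X|\theta)} - \frac{(f'(X|\theta))^2}{f^2(X|\theta)}.$$

Also, since the p.d.f. integrates to 1,

$$\int f(x|\theta)\,dx = 1,$$

if we take derivatives of this equation with respect to $\theta$ (and interchange derivative and integral, which can usually be done) we will get

$$\frac{\partial}{\partial\theta} \int f(x|\theta)\,dx = 0 \quad \text{and} \quad \frac{\partial^2}{\partial\theta^2} \int f(x|\theta)\,dx = \int f''(x|\theta)\,dx = 0.$$

To finish the proof we write the following computation:

$$\mathbb{E}_{\theta_0} l''(X|\theta_0) = \mathbb{E}_{\theta_0} \frac{\partial^2}{\partial\theta^2} \log f(X|\theta_0) = \int (\log f(x|\theta_0))''\, f(x|\theta_0)\,dx$$

$$= \int \Big(\frac{f''(x|\theta_0)}{f(x|\theta_0)} - \Big(\frac{f'(x|\theta_0)}{f(x|\theta_0)}\Big)^2\Big) f(x|\theta_0)\,dx = \int f''(x|\theta_0)\,dx - \mathbb{E}_{\theta_0}\big(l'(X|\theta_0)\big)^2 = 0 - I(\theta_0) = -I(\theta_0).$$

We are now ready to prove the main result of this section.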
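Before turning to that result, the Lemma can be sanity-checked numerically. The sketch below assumes the family $N(0, \theta)$ with $\theta$ the variance, an illustrative choice not taken from the lecture. For it, $l'(x|\theta) = -\frac{1}{2\theta} + \frac{x^2}{2\theta^2}$, $l''(x|\theta) = \frac{1}{2\theta^2} - \frac{x^2}{\theta^3}$, and $I(\theta) = \frac{1}{2\theta^2}$. The function names and numeric settings are hypothetical.

```python
import math
import random

def score(x, theta):
    """l'(x | theta) for the N(0, theta) density, with theta the variance."""
    return -1.0 / (2.0 * theta) + x * x / (2.0 * theta ** 2)

def second_derivative(x, theta):
    """l''(x | theta) for the same (assumed) density."""
    return 1.0 / (2.0 * theta ** 2) - x * x / theta ** 3

rng = random.Random(2)            # fixed seed for reproducibility
theta0 = 2.0                      # hypothetical true variance
n = 200_000
sample = [rng.gauss(0.0, math.sqrt(theta0)) for _ in range(n)]

# Monte Carlo estimates of the two expectations appearing in the Lemma.
mean_score_sq = sum(score(x, theta0) ** 2 for x in sample) / n
mean_second = sum(second_derivative(x, theta0) for x in sample) / n
fisher_exact = 1.0 / (2.0 * theta0 ** 2)

print("E (l')^2  ~", round(mean_score_sq, 4))
print("-E l''    ~", round(-mean_second, 4))
print("I(theta0) =", fisher_exact)
```

Up to Monte Carlo error, the first two printed values should both be close to the third, matching the two ways of computing the Fisher information.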

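Theorem. (Asymptotic normality of MLE.) We have

$$\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N\Big(0, \frac{1}{I(\theta_0)}\Big).$$

Proof. Since the MLE $\hat\theta$ is the maximizer of $L_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} \log f(X_i|\theta)$, we have

$$L_n'(\hat\theta) = 0.$$

Let us use the Mean Value Theorem

$$\frac{f(a) - f(b)}{a - b} = f'(c) \quad \text{or} \quad f(a) = f(b) + f'(c)(a - b) \quad \text{for some } c \in [a, b],$$

with $f(\theta) = L_n'(\theta)$, $a = \hat\theta$ and $b = \theta_0$. Then we can write

$$0 = L_n'(\hat\theta) = L_n'(\theta_0) + L_n''(\hat\theta_1)(\hat\theta - \theta_0)$$

for some $\hat\theta_1 \in [\hat\theta, \theta_0]$. From here we get that

$$\hat\theta - \theta_0 = -\frac{L_n'(\theta_0)}{L_n''(\hat\theta_1)} \quad \text{and} \quad \sqrt{n}(\hat\theta - \theta_0) = -\frac{\sqrt{n}\, L_n'(\theta_0)}{L_n''(\hat\theta_1)}. \tag{5.1}$$

Since, by the Lemma in the previous section, $\theta_0$ is the maximizer of $L(\theta)$, we have

$$L'(\theta_0) = \mathbb{E}_{\theta_0} l'(X|\theta_0) = 0. \tag{5.2}$$

Therefore, the numerator in (5.1),

$$\sqrt{n}\, L_n'(\theta_0) = \sqrt{n}\Big(\frac{1}{n} \sum_{i=1}^{n} l'(X_i|\theta_0) - 0\Big) = \sqrt{n}\Big(\frac{1}{n} \sum_{i=1}^{n} l'(X_i|\theta_0) - \mathbb{E}_{\theta_0} l'(X_1|\theta_0)\Big) \xrightarrow{d} N\big(0, \mathrm{Var}_{\theta_0}(l'(X_1|\theta_0))\big), \tag{5.3}$$

converges in distribution by the Central Limit Theorem.

Next, let us consider the denominator in (5.1). First of all, we have that for all $\theta$,

$$L_n''(\theta) = \frac{1}{n} \sum_{i=1}^{n} l''(X_i|\theta) \to \mathbb{E}_{\theta_0} l''(X_1|\theta) \quad \text{by the LLN.} \tag{5.4}$$

Also, since $\hat\theta_1 \in [\hat\theta, \theta_0]$ and, by the consistency result of the previous section, $\hat\theta \to \theta_0$, we have $\hat\theta_1 \to \theta_0$. Using this together with (5.4) we get

$$L_n''(\hat\theta_1) \to \mathbb{E}_{\theta_0} l''(X_1|\theta_0) = -I(\theta_0)$$

by the Lemma above.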

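Combining this with (5.3) we get

$$-\frac{\sqrt{n}\, L_n'(\theta_0)}{L_n''(\hat\theta_1)} \xrightarrow{d} N\Big(0, \frac{\mathrm{Var}_{\theta_0}(l'(X_1|\theta_0))}{(I(\theta_0))^2}\Big).$$

Finally, the variance is

$$\mathrm{Var}_{\theta_0}(l'(X_1|\theta_0)) = \mathbb{E}_{\theta_0}\big(l'(X|\theta_0)\big)^2 - \big(\mathbb{E}_{\theta_0} l'(X|\theta_0)\big)^2 = I(\theta_0) - 0,$$

where in the last equality we used the definition of Fisher information and (5.2). Hence the limiting variance is $I(\theta_0)/(I(\theta_0))^2 = 1/I(\theta_0)$, as claimed.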
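The conclusion of the theorem can be illustrated by simulation. The sketch below assumes the exponential family $f(x|\theta) = \theta e^{-\theta x}$ with rate $\theta$, an illustrative choice not taken from the lecture. For it, setting $L_n'(\theta) = \frac{1}{\theta} - \bar{X} = 0$ gives the MLE $\hat\theta = 1/\bar{X}$, and $I(\theta) = 1/\theta^2$, so the predicted limiting variance is $\theta_0^2$. The function names, seed, sample size, and number of repetitions are hypothetical.

```python
import math
import random
import statistics

def exponential_rate_mle(sample):
    """MLE of the rate theta for f(x | theta) = theta * exp(-theta * x)."""
    return len(sample) / sum(sample)

def scaled_errors(theta0, n, reps, seed=3):
    """Simulate sqrt(n) * (MLE - theta0) over `reps` independent samples of size n."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    errors = []
    for _ in range(reps):
        sample = [rng.expovariate(theta0) for _ in range(n)]
        errors.append(math.sqrt(n) * (exponential_rate_mle(sample) - theta0))
    return errors

theta0 = 2.0                                  # hypothetical true rate
z = scaled_errors(theta0, n=400, reps=2000)
empirical_var = statistics.variance(z)        # sample variance of the scaled errors
predicted_var = theta0 ** 2                   # 1 / I(theta0), with I(theta) = 1 / theta^2

print("empirical variance:", round(empirical_var, 3))
print("predicted 1/I(theta0):", predicted_var)
```

For moderate $n$ the empirical variance should already be close to $1/I(\theta_0)$, with a small finite-sample bias in the mean of the scaled errors that vanishes as $n$ grows.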