Maximum Likelihood Asymptotic Theory
Eduardo Rossi
University of Pavia
Slutsky's Theorem, Cramér's Theorem

Slutsky's Theorem. Let $\{X_N\}$ be a random sequence converging in probability to a constant $a$, and let $g(\cdot)$ be a function continuous at $a$. Then $g(X_N) \xrightarrow{p} g(a)$.

Cramér's Theorem. Let $\{Y_N\}$ and $\{X_N\}$ be random sequences, with $Y_N \xrightarrow{d} Y$ and $X_N \xrightarrow{p} a$ (a constant). Then
1. $X_N + Y_N \xrightarrow{d} a + Y$
2. $X_N Y_N \xrightarrow{d} aY$
3. $Y_N / X_N \xrightarrow{d} Y/a$ when $a \neq 0$.

Continuous Mapping Theorem. If $X_N \xrightarrow{d} X$ and $g(\cdot)$ is continuous, then $g(X_N) \xrightarrow{d} g(X)$.

Eduardo Rossi - Econometria per i mercati finanziari (avanzato)
Cramér-Wold device

Cramér-Wold device. If $\{X_N\}$ is a sequence of random vectors, $X_N \xrightarrow{d} X$ if and only if, for all conformable fixed vectors $\lambda$, $\lambda' X_N \xrightarrow{d} \lambda' X$.
The Weak Law of Large Numbers, Khinchine's Theorem

Cramér's Theorem. Assume $z_N \xrightarrow{d} N(\mu, \Sigma)$ and $A_N$ is a conformable matrix with $A_N \xrightarrow{p} A$. Then
$$A_N z_N \xrightarrow{d} N(A\mu, A\Sigma A').$$

The Weak Law of Large Numbers. This is the result that, given a sample $z_1, z_2, \dots, z_N$ of random variables with $E(z_t) = \mu$, the sample mean satisfies $\bar z_N \xrightarrow{p} \mu$. The strong law of large numbers is the corresponding result with a.s. convergence.

Khinchine's Theorem. Let $\{z_t\}$ be i.i.d. and integrable with $E(z_t) = \mu < \infty$. Then $\bar z_N \xrightarrow{p} \mu$.
Chebyshev's Theorem

Chebyshev's Theorem. If $E(z_t) = 0$ for all $t$ and $\lim_{N\to\infty} E(\bar z_N^2) = 0$, then $\bar z_N \xrightarrow{p} 0$.

Assume $E(z_t) = 0$ and $E(z_t^2) = \sigma_t^2$. If the terms of the sequence are independent they are also uncorrelated, so that
$$E(\bar z_N^2) = \frac{1}{N^2} \sum_{t=1}^{N} \sigma_t^2.$$
$E(\bar z_N^2) \to 0$ as $N \to \infty$ implies $\bar z_N \xrightarrow{p} 0$. In fact, by Chebyshev's inequality,
$$P(|\bar z_N| > \epsilon) \le \frac{E(\bar z_N^2)}{\epsilon^2} \to 0 \quad \forall\, \epsilon > 0.$$
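A small Monte Carlo sketch of the Chebyshev bound, under assumptions that are not in the slides: the $z_t$ are taken to be i.i.d. uniform on $[-1, 1]$ (so $\sigma_t^2 = 1/3$), and the values of `eps`, `reps` and the sample sizes are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.1      # illustrative tolerance epsilon
reps = 1000    # Monte Carlo replications (illustrative choice)

def chebyshev_check(N):
    # z_t i.i.d. uniform on [-1, 1]: E(z_t) = 0 and E(z_t^2) = 1/3
    z = rng.uniform(-1.0, 1.0, size=(reps, N))
    zbar = z.mean(axis=1)
    freq = np.mean(np.abs(zbar) > eps)    # Monte Carlo estimate of P(|zbar_N| > eps)
    bound = (1.0 / 3.0) / N / eps**2      # E(zbar_N^2) / eps^2 = sigma^2 / (N eps^2)
    return freq, bound

results = {N: chebyshev_check(N) for N in (50, 500, 2000)}
for N, (freq, bound) in results.items():
    print(N, freq, bound)
```

The estimated exceedance frequency should sit below the bound for each $N$, and both shrink as $N$ grows; the bound is typically far from tight.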
The Central Limit Theorem

The CLT concerns the convergence of scaled sums of random variables, but the means of the variables must be zero, and the scale factor is $N^{-1/2}$ instead of $N^{-1}$. If
$$Var(\bar z_N) = O(N^{-1})$$
then
$$Var(N^{1/2} \bar z_N) = O(1).$$
The random sequence of normalized sample means $\{N^{1/2} \bar z_N\}$, having $E(N^{1/2} \bar z_N) = 0$, neither degenerates to 0 in the limit nor diverges to infinite values, except with probability 0.
Lindeberg-Lévy Theorem

Lindeberg-Lévy Theorem. Let $\{z_t\}$ be i.i.d., with $E(z_t) = 0$ and $E(z_t^2) = \sigma^2 < \infty$. If
$$\omega_N = \frac{\sqrt{N}\, \bar z_N}{\sigma}$$
then
$$\omega_N \xrightarrow{d} \omega \sim N(0, 1).$$
The variance must be finite. This is always a requirement for the CLT to hold.
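The theorem can be illustrated by simulation. The sketch below assumes centered exponential draws (a deliberately non-normal choice, not taken from the slides) with $\sigma = 1$; `N` and `reps` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
N, reps = 400, 5000    # sample size and Monte Carlo replications (illustrative)

# Centered exponential draws: E(z_t) = 0, E(z_t^2) = sigma^2 = 1, skewed distribution
z = rng.exponential(1.0, size=(reps, N)) - 1.0
sigma = 1.0

omega = np.sqrt(N) * z.mean(axis=1) / sigma    # omega_N, one value per replication
coverage = np.mean(np.abs(omega) < 1.96)       # should be near 0.95 if omega_N is close to N(0, 1)

print(omega.mean(), omega.var(), coverage)
```

The simulated mean and variance of $\omega_N$ should be close to 0 and 1, and roughly 95% of the draws should fall inside $\pm 1.96$.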
Asymptotic analysis of the MLE

Let $\hat\theta_N$ be an estimator, applied to a sample of size $N$, of a vector of parameters $\theta_0$, with $\hat\theta_N, \theta_0 \in \Theta \subseteq \mathbb{R}^K$.

For technical reasons, $\Theta$ must be a compact subset (the set is bounded and closed).

It is required that $\theta_0$ be an interior point of $\Theta$: $\theta_0 \in int(\Theta)$. We say that $\theta_0 \in int(\Theta)$ if there exists a real number $\delta > 0$ such that $\theta \in \Theta$ whenever $\|\theta - \theta_0\| < \delta$. This excludes $\theta_0$ lying on the boundary of the set.
Consistency

Definition (Consistency). $\hat\theta_N$ is said to be consistent for $\theta_0$ if $\mathrm{plim}\, \hat\theta_N = \theta_0$ ($\hat\theta_N \xrightarrow{p} \theta_0$).

Sketch of proof. If the sequence $E(\hat\theta_N)$ has a limit and
$$\lim_{N\to\infty} Var(\hat\theta_N) = 0$$
then $\hat\theta_N$ converges in probability to this limit. If $\lim_{N\to\infty} E(\hat\theta_N) = \theta_0$, then $\hat\theta_N$ is asymptotically unbiased. Asymptotic unbiasedness and zero limiting variance are jointly sufficient for consistency.
CAN estimator

Suppose $\hat\theta_N \xrightarrow{p} \theta_0$, and that $N^k(\hat\theta_N - \theta_0) = O_p(1)$ for some $k > 0$, and has a non-degenerate distribution as $N \to \infty$. This distribution is called the asymptotic distribution of $\hat\theta_N$.

Definition (CAN estimator). $\hat\theta_N$ is said to be consistent and asymptotically normal (CAN) for $\theta_0 \in int(\Theta)$ if there exists $k > 0$ such that
$$N^k(\hat\theta_N - \theta_0) \xrightarrow{d} N(0, V)$$
where $V$ is a finite variance matrix.
CAN estimator

In most applications $k = 1/2$, although it can be larger than this in models containing trend terms. Sometimes with stochastic trends $k > 1/2$, but the limiting distribution is not Gaussian.

Definition (BAN estimator in a class $C$ of CAN estimators of $\theta_0$). $\hat\theta_N \in C$ is said to be the best asymptotically normal estimator of $\theta_0$ (BAN) in the class $C$ if
$$Avar(\tilde\theta_N) - Avar(\hat\theta_N)$$
is p.s.d. for all $\tilde\theta_N \in C$. This property is called asymptotic efficiency.
MLE Asymptotics

The MLE is an implicit function of the random sample. The MLE is not a function of sample averages of the data, but the sample log-likelihood is a sum of i.i.d. random variables. Because the $(U_t, V_t)$ are i.i.d., so are any transformations of them such as $L(\theta) \equiv L(\theta; U_t \mid V_t)$, $t = 1, 2, \dots, N$.

The LLN can be applied to the sample average log-likelihood function itself:
$$E_N[L(\theta)] \xrightarrow{p} E[L(\theta)]$$
for any fixed $\theta$.
Consistency

Under the assumptions
1. Distribution
2. Dominance
3. Global Identification
4. Compactness of $\Theta$
the MLE is consistent: $\hat\theta_N \xrightarrow{p} \theta_0$.
Consistency

The sample average log-likelihood converges to the expected log-likelihood for any value of $\theta$:
$$E_N[L(\theta)] \xrightarrow{p} E[L(\theta)]$$
$$\hat\theta_N = \arg\max_{\theta \in \Theta} E_N[L(\theta)] \quad \text{(by construction)}$$
$$\theta_0 = \arg\max_{\theta \in \Theta} E[L(\theta)] \quad \text{(by the strict expected log-likelihood inequality)}$$
As a result, $\hat\theta_N \xrightarrow{p} \theta_0$, provided that the relationships are continuous.
Consistency

The argument of $\arg\max_{\theta \in \Theta}$ is a function of $\theta$, namely $E_N[L(\theta)]$, so $\arg\max_{\theta \in \Theta}$ must be a continuous function of its functional argument. This requires a notion of distance between two functions over a set containing an infinite number of possible comparisons at different values of $\theta$.

Uniform Convergence in Probability. The sequence of real-valued functions $\{g_N(\theta)\}$ converges uniformly in probability to the limit function $g_0(\theta)$ if
$$\sup_{\theta \in \Theta} |g_N(\theta) - g_0(\theta)| \xrightarrow{p} 0;$$
we say $g_N(\theta) \xrightarrow{p} g_0(\theta)$ uniformly.
Consistency

We use uniform convergence in probability to define the probability limit of a sequence of random functions.

Uniform LLN. Let $g(\theta; U)$ be a continuous function over $\theta \in \Theta$, where $\Theta \subset \mathbb{R}^K$ is closed and bounded, and let $\{U_t\}$ be a sequence of i.i.d. r.v. with c.d.f. $F_U(u)$. If $E[\sup_{\theta \in \Theta} |g(\theta; U)|]$ exists, then
1. $E[g(\theta; U)]$ is continuous over $\theta \in \Theta$
2. $E_N[g(\theta; U)] \xrightarrow{p} E[g(\theta; U)]$ uniformly.
Consistency

We apply the uniform LLN to the sample average log-likelihood.

Consistency of Maxima. If there is a sequence of functions $Q_N(\theta)$ that converges in probability uniformly to a function $Q_0(\theta)$ on the closed and bounded $\Theta$, and if $Q_0(\theta)$ is continuous and uniquely maximized at $\theta_0$, then
$$\hat\theta_N = \arg\max_{\theta \in \Theta} Q_N(\theta) \xrightarrow{p} \theta_0.$$
Compactness and differentiability guarantee that $E_N[L(\theta)]$ has a maximum.
Consistency

Let $g(\theta; U) \equiv L(\theta; U \mid V)$ be the conditional log-likelihood function for $\theta$ evaluated at the r.v. $(U, V)$. The conditions for uniform convergence are satisfied:
- Differentiability implies continuity of $L(\theta)$.
- Compactness of $\Theta$.
- $(U_t, V_t)$ are i.i.d. with c.d.f. $F_{U|V}(u \mid v; \theta)$.
- Dominance states that $E[\sup_{\theta \in \Theta} |L(\theta)|]$ exists.
Then $E[L(\theta)]$ is continuous and
$$E_N[L(\theta)] \xrightarrow{p} E[L(\theta)]$$
uniformly.
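As an illustration of uniform convergence of the average log-likelihood, here is a sketch for a simple unconditional model that is an assumption of this example rather than part of the slides: $U_t$ i.i.d. exponential with rate $\theta_0 = 2$, so $L(\theta; u) = \log\theta - \theta u$ and $E[L(\theta)] = \log\theta - \theta/\theta_0$. The compact set $\Theta = [0.5, 4]$ is approximated by a grid.

```python
import numpy as np

rng = np.random.default_rng(2)
theta0 = 2.0                          # hypothetical true rate
grid = np.linspace(0.5, 4.0, 351)     # grid approximation to the compact set Theta = [0.5, 4]

def sup_distance(N):
    # NumPy parameterizes the exponential by its scale = 1 / rate
    u = rng.exponential(1.0 / theta0, size=N)
    sample_avg = np.log(grid) - grid * u.mean()   # E_N[L(theta)] on the grid
    expected = np.log(grid) - grid / theta0       # E[L(theta)], using E(U) = 1 / theta0
    return np.max(np.abs(sample_avg - expected))  # sup over the grid of |E_N[L] - E[L]|

sup_by_N = {N: sup_distance(N) for N in (100, 10_000, 1_000_000)}
for N, d in sup_by_N.items():
    print(N, d)
```

In this model the sup distance equals $4\,|\bar u - 1/\theta_0|$, so it is of order $N^{-1/2}$ and tends to shrink as $N$ grows.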
Consistency

For the Consistency of Maxima, $Q_N(\theta) = E_N[L(\theta)]$ and $Q_0(\theta) = E[L(\theta)]$.

From Likelihood Identification (if $\theta_1 \in \Theta$, then $\theta_0 \neq \theta_1$ implies $\Pr\{L(\theta_0) \neq L(\theta_1)\} > 0$) we have the Strict Expected Log-likelihood Inequality:
$$\theta \neq \theta_0 \text{ implies } E[L(\theta)] < E[L(\theta_0)].$$
Hence $E[L(\theta)]$ is uniquely maximized at $\theta_0$. Therefore
$$\hat\theta_N = \arg\max_{\theta \in \Theta} E_N[L(\theta)] \xrightarrow{p} \theta_0 = \arg\max_{\theta \in \Theta} E[L(\theta)].$$
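The consistency of the maximizer can be sketched numerically. The example assumes an exponential model with hypothetical true rate $\theta_0 = 2$ (not a model used in the slides) and replaces the maximization over the compact set $\Theta = [0.5, 4]$ with a grid search.

```python
import numpy as np

rng = np.random.default_rng(3)
theta0 = 2.0                           # hypothetical true rate
grid = np.linspace(0.5, 4.0, 3501)     # grid over Theta = [0.5, 4], spacing 0.001

def grid_mle(N):
    u = rng.exponential(1.0 / theta0, size=N)     # scale = 1 / rate
    avg_loglik = np.log(grid) - grid * u.mean()   # E_N[L(theta)] for L = log(theta) - theta * u
    return grid[np.argmax(avg_loglik)]            # arg max of the sample average log-likelihood

estimates = {N: grid_mle(N) for N in (50, 5_000, 500_000)}
for N, est in estimates.items():
    print(N, est)
```

For the largest sample size the grid maximizer should lie very close to $\theta_0$, up to the grid resolution and sampling error of order $\theta_0/\sqrt{N}$.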
Asymptotic Normality

Assumption: There is an open subset of $\Theta$ that contains the population parameter value $\theta_0$; that is, $\theta_0$ is not on the boundary of $\Theta$.

Assumption: $E_N[L_\theta(\hat\theta_N)] = 0$, i.e. the MLE solves the normal equations.

First-order Taylor series expansion:
$$E_N[L_\theta(\hat\theta_N)] = 0 = E_N[L_\theta(\theta_0)] + E_N[L_{\theta\theta}(\bar\theta_N)](\hat\theta_N - \theta_0)$$
$$\bar\theta_N = \alpha_N \hat\theta_N + (1 - \alpha_N)\theta_0, \quad \alpha_N \in [0, 1].$$
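Asymptotic Normality

$$\sqrt{N}(\hat\theta_N - \theta_0) = \{-E_N[L_{\theta\theta}(\bar\theta_N)]\}^{-1} \sqrt{N}\, E_N[L_\theta(\theta_0)]$$
$$\sqrt{N}\, E_N[L_\theta(\theta_0)] \xrightarrow{d} N(0, I(\theta_0)) \quad \text{(by CLT)}$$
$$-E_N[L_{\theta\theta}(\bar\theta_N)] \xrightarrow{p} I(\theta_0) \quad \text{(by LLN)}$$
then
$$\sqrt{N}(\hat\theta_N - \theta_0) \xrightarrow{d} N(0, I(\theta_0)^{-1}).$$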
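Asymptotic Normality

$$\sqrt{N}\, E_N[L_\theta(\theta_0)] \xrightarrow{d} N(0, I(\theta_0))$$
Proof. Note that
$$\sqrt{N}\, E_N[L_\theta(\theta_0)] = \sqrt{N}\, \frac{1}{N} \sum_{t=1}^{N} \left. \frac{\partial L(\theta; U_t \mid V_t)}{\partial \theta} \right|_{\theta = \theta_0} = \frac{1}{\sqrt{N}} \sum_{t=1}^{N} L_\theta(\theta_0; U_t \mid V_t)$$
is a scaled sum of i.i.d. r.v. Under the assumptions of i.i.d. $(U_t, V_t)$ and differentiability, $E[L_\theta(\theta_0) \mid V = v] = 0$, so that
$$E[c' L_\theta(\theta_0)] = 0 \quad \forall\, c \in \mathbb{R}^K$$
$$Var[c' L_\theta(\theta_0)] = c' I(\theta_0) c \text{ exists } \forall\, c \in \mathbb{R}^K.$$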
Asymptotic Normality

The CLT implies
$$\sqrt{N}\, E_N[c' L_\theta(\theta_0)] \xrightarrow{d} N(0, c' I(\theta_0) c)$$
and, by the Cramér-Wold device,
$$\sqrt{N}\, E_N[L_\theta(\theta_0)] \xrightarrow{d} N(0, I(\theta_0)).$$
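Asymptotic Normality

$$-E_N[L_{\theta\theta}(\bar\theta_N)] \xrightarrow{p} I(\theta_0)$$
where $\bar\theta_N \xrightarrow{p} \theta_0$.

While $\bar\theta_N \xrightarrow{p} \theta_0$ implies $g(\bar\theta_N) \xrightarrow{p} g(\theta_0)$ for a fixed continuous $g$, here $g$ is a function of the random sample and is therefore itself random.

Lemma. If
1. $g_N(\theta) \xrightarrow{p} g_0(\theta)$ uniformly for $\theta \in \Theta$, a closed and bounded subset of $\mathbb{R}^K$,
2. $g_0(\theta)$ is continuous,
3. $\bar\theta_N \xrightarrow{p} \theta_0 \in \Theta$,
then $g_N(\bar\theta_N) \xrightarrow{p} g_0(\theta_0)$.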
Asymptotic Normality

Assumption (Dominance II). $E[\sup_{\theta \in \Theta} \|L_{\theta\theta}(\theta)\|]$ exists.

Lemma. Under the assumptions
1. Differentiability
2. Finite Information: $Var[L_\theta(\theta_0)]$ exists.
3. Compactness: $\Theta$ is bounded and closed.
4. Dominance II: $E[\sup_{\theta \in \Theta} \|L_{\theta\theta}(\theta)\|]$ exists.
$$-E_N[L_{\theta\theta}(\bar\theta_N)] \xrightarrow{p} I(\theta_0).$$
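Asymptotic Normality

Proof.
1. To establish that $E_N[L_{\theta\theta}(\theta)] \xrightarrow{p} E[L_{\theta\theta}(\theta)]$ uniformly, we note that:
- Differentiability implies that $L_{\theta\theta}(\theta)$ is continuous.
- $(U_t, V_t)$ are i.i.d.
- $\Theta$ is compact.
- $E[\sup_{\theta \in \Theta} \|L_{\theta\theta}(\theta)\|]$ exists.
Then we can invoke the uniform LLN: $E_N[L_{\theta\theta}(\theta)] \xrightarrow{p} E[L_{\theta\theta}(\theta)]$ uniformly in $\theta \in \Theta$.
2. $\bar\theta_N \xrightarrow{p} \theta_0$. Applying the Lemma, $E_N[L_{\theta\theta}(\bar\theta_N)] \xrightarrow{p} E[L_{\theta\theta}(\theta_0)]$. The assumptions of Distribution, Differentiability and Finite Information are met, so the information matrix equality $-E[L_{\theta\theta}(\theta_0)] = I(\theta_0)$ holds. Then
$$-E_N[L_{\theta\theta}(\bar\theta_N)] \xrightarrow{p} I(\theta_0).$$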
Asymptotic Normality

Now,
$$\sqrt{N}(\hat\theta_N - \theta_0) = \{-E_N[L_{\theta\theta}(\bar\theta_N)]\}^{-1} \sqrt{N}\, E_N[L_\theta(\theta_0)] \xrightarrow{d} I(\theta_0)^{-1} N[0, I(\theta_0)] \sim N(0, I(\theta_0)^{-1})$$
by Slutsky.

Square roots of nonsingular matrices are continuous functions of the elements of the matrix:
$$\{-E_N[L_{\theta\theta}(\hat\theta_N)]\}^{1/2} \xrightarrow{p} I(\theta_0)^{1/2}.$$
Then by Slutsky's Theorem
$$\{-E_N[L_{\theta\theta}(\hat\theta_N)]\}^{1/2} \sqrt{N}(\hat\theta_N - \theta_0) \xrightarrow{d} I(\theta_0)^{1/2} N[0, I(\theta_0)^{-1}] \sim N[0, I_K].$$
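A Monte Carlo sketch of both limits, assuming a scalar exponential model with hypothetical rate $\theta_0 = 2$ (chosen for this example only). There $L_\theta = 1/\theta - u$, $L_{\theta\theta} = -1/\theta^2$, $I(\theta_0) = 1/\theta_0^2$, and the MLE is $\hat\theta_N = 1/\bar u$.

```python
import numpy as np

rng = np.random.default_rng(4)
theta0, N, reps = 2.0, 500, 5000     # hypothetical rate, sample size, replications

u = rng.exponential(1.0 / theta0, size=(reps, N))   # scale = 1 / rate
theta_hat = 1.0 / u.mean(axis=1)                    # closed-form MLE of the rate

scaled = np.sqrt(N) * (theta_hat - theta0)          # sqrt(N) (theta_hat - theta0)
info_hat = 1.0 / theta_hat**2                       # -E_N[L_thetatheta(theta_hat)]
studentized = np.sqrt(info_hat) * scaled            # should be approximately N(0, 1)

print(scaled.var(), theta0**2)                      # compare with I(theta0)^{-1} = theta0^2
print(studentized.mean(), studentized.var())
```

The variance of the scaled estimator should be near $I(\theta_0)^{-1} = 4$, and the studentized statistic should have mean near 0 and variance near 1.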
Variance Estimation

Three consistent estimators of $I(\theta_0)$:
$$-E_N[L_{\theta\theta}(\hat\theta_N)], \qquad Var_N[L_\theta(\hat\theta_N)], \qquad E_N[I(\hat\theta_N)].$$
The empirical information estimator requires the population information function $I(\theta)$, which is known only when the log-likelihood is unconditional. When the log-likelihood is conditional, the conditional information function is
$$I(\theta_0 \mid V) = E[L_\theta(\theta_0; U \mid V)\, L_\theta(\theta_0; U \mid V)' \mid V];$$
because the marginal distribution of $V$ is unspecified, $I(\theta)$ is unknown.
Variance Estimation

The unconditional information matrix is obtained using the LIE (law of iterated expectations):
$$I(\theta_0) = E[I(\theta_0 \mid V)];$$
the empirical information matrix estimator is $E_N[I(\hat\theta_N \mid V)]$.

Consistency of the estimators: $\hat I(\theta) \xrightarrow{p} I(\theta)$ uniformly. Because $\hat\theta_N \xrightarrow{p} \theta_0$,
$$\hat I(\hat\theta_N) \xrightarrow{p} I(\theta_0),$$
and since the inverse is a continuous function,
$$\hat I(\hat\theta_N)^{-1} \xrightarrow{p} I(\theta_0)^{-1}.$$
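The three estimators can be compared in a toy case. This sketch assumes an unconditional exponential model with hypothetical rate $\theta_0 = 2$, where $I(\theta) = 1/\theta^2$ is known in closed form; in this particular model the Hessian-based and expected-information estimators coincide, while the outer-product estimator differs in finite samples.

```python
import numpy as np

rng = np.random.default_rng(5)
theta0, N = 2.0, 200_000                    # hypothetical rate and sample size
u = rng.exponential(1.0 / theta0, size=N)   # scale = 1 / rate
theta_hat = 1.0 / u.mean()                  # MLE of the rate

score = 1.0 / theta_hat - u                 # L_theta(theta_hat; u_t), averages to 0 at the MLE
hessian = np.full(N, -1.0 / theta_hat**2)   # L_thetatheta(theta_hat; u_t), constant in u_t here

info_hessian = -hessian.mean()              # -E_N[L_thetatheta(theta_hat)]
info_opg = np.mean(score**2)                # Var_N[L_theta(theta_hat)]
info_expected = 1.0 / theta_hat**2          # E_N[I(theta_hat)] with I(theta) = 1 / theta^2

print(info_hessian, info_opg, info_expected, 1.0 / theta0**2)
```

All three values should be close to $I(\theta_0) = 0.25$ for a sample this large.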
Efficiency

The asymptotic distribution of the MLE has no bias, and the variance matrix of its asymptotic distribution equals the Cramér-Rao Lower Bound.

The MLE and the efficient (but infeasible) Cramér-Rao estimator, denoted here by $\theta^*$, are asymptotically equivalent:
$$\sqrt{N}(\hat\theta_N - \theta_0) - \sqrt{N}(\theta^* - \theta_0) = \sqrt{N}(\hat\theta_N - \theta^*) \xrightarrow{p} 0$$
because
$$\{-E_N[L_{\theta\theta}(\bar\theta_N)]\}^{-1} \xrightarrow{p} I(\theta_0)^{-1}$$
$$\sqrt{N}\, E_N[L_\theta(\theta_0)] \xrightarrow{d} N(0, I(\theta_0)).$$
Efficiency

$$\theta^* \equiv \theta_0 + [I(\theta_0)]^{-1} E_N[L_\theta(\theta_0)]$$
$$\sqrt{N}(\hat\theta_N - \theta_0) - \sqrt{N}(\theta^* - \theta_0) = \sqrt{N}(\hat\theta_N - \theta^*)$$
$$= \sqrt{N}(\hat\theta_N - \theta_0) - \sqrt{N}\, I(\theta_0)^{-1} E_N[L_\theta(\theta_0)]$$
$$= \left[ \{-E_N[L_{\theta\theta}(\bar\theta_N)]\}^{-1} - I(\theta_0)^{-1} \right] \sqrt{N}\, E_N[L_\theta(\theta_0)] \xrightarrow{p} 0$$
using the equivalence of convergence in distribution to a constant and convergence in probability.
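The asymptotic equivalence can be checked numerically in a simple case. The sketch assumes an exponential model with hypothetical rate $\theta_0 = 2$, for which $I(\theta_0)^{-1} = \theta_0^2$ and $E_N[L_\theta(\theta_0)] = 1/\theta_0 - \bar u$; the name `theta_star` stands for the infeasible estimator, which uses the unknown $\theta_0$.

```python
import numpy as np

rng = np.random.default_rng(6)
theta0 = 2.0     # hypothetical true rate, known only inside the simulation

def scaled_gap(N):
    u = rng.exponential(1.0 / theta0, size=N)      # scale = 1 / rate
    theta_hat = 1.0 / u.mean()                     # MLE
    avg_score = 1.0 / theta0 - u.mean()            # E_N[L_theta(theta0)]
    theta_star = theta0 + theta0**2 * avg_score    # theta0 + I(theta0)^{-1} E_N[L_theta(theta0)]
    return np.sqrt(N) * (theta_hat - theta_star)   # sqrt(N) (theta_hat - theta_star)

gaps = {N: scaled_gap(N) for N in (100, 10_000, 1_000_000)}
for N, g in gaps.items():
    print(N, g)
```

In this model the scaled gap is of order $N^{-1/2}$, so it should be negligible at the largest sample size.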
Invariance

The ML estimator is invariant to nonsingular transformations of the parameters. If $\gamma = g(\theta)$ is a one-to-one reparameterization, then the MLE for $\gamma$ is
$$\hat\gamma = \arg\max_{\gamma \in \Gamma} E_N\{L[g^{-1}(\gamma)]\}$$
where $\Gamma$ is the parameter space $\{\gamma = g(\theta) \mid \theta \in \Theta\}$. Because the reparameterization is one-to-one,
$$\max_{\gamma \in \Gamma} E_N\{L[g^{-1}(\gamma)]\} = \max_{\theta \in \Theta} E_N[L(\theta)], \qquad \hat\gamma = g(\hat\theta).$$
Invariance: reparameterization does not alter the location of the MLE.
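A numerical sketch of invariance, assuming an exponential model parameterized either by its rate $\theta$ or by its mean $\gamma = g(\theta) = 1/\theta$ (an illustrative choice). Both maximizations are done by grid search, with the $\gamma$ grid taken as the image of the $\theta$ grid.

```python
import numpy as np

rng = np.random.default_rng(7)
u = rng.exponential(0.5, size=1000)      # hypothetical data with mean 0.5, i.e. rate 2

def avg_loglik_rate(theta):
    # E_N[L(theta)] for the exponential density theta * exp(-theta * u)
    return np.log(theta) - theta * u.mean()

theta_grid = np.linspace(0.5, 4.0, 3501)   # grid over Theta = [0.5, 4]
gamma_grid = 1.0 / theta_grid              # Gamma = {g(theta) : theta in Theta}, g(theta) = 1 / theta

theta_hat = theta_grid[np.argmax(avg_loglik_rate(theta_grid))]
# maximize E_N{L[g^{-1}(gamma)]} over gamma, with g^{-1}(gamma) = 1 / gamma
gamma_hat = gamma_grid[np.argmax(avg_loglik_rate(1.0 / gamma_grid))]

print(theta_hat, gamma_hat, 1.0 / theta_hat)
```

The maximizer over $\gamma$ coincides with $g(\hat\theta) = 1/\hat\theta$, and $\hat\theta$ matches the closed-form MLE $1/\bar u$ up to the grid spacing.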
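Delta Method

The delta method is used to find the asymptotic distribution of a transformation. Given a consistent estimator $\hat\Omega$ of $\Omega$, the approximate variance of $\sqrt{N}(\hat\theta_N - \theta_0)$, the approximate variance of $\sqrt{N}(g(\hat\theta_N) - g(\theta_0))$ is
$$g_\theta(\hat\theta_N)\, \hat\Omega\, g_\theta(\hat\theta_N)'.$$
Delta Method. If $\sqrt{N}(\hat\theta_N - \theta_0) \xrightarrow{d} N(0, \Omega)$ and $g(\theta)$ is continuously differentiable at $\theta_0$, then
$$\sqrt{N}(g(\hat\theta_N) - g(\theta_0)) \xrightarrow{d} N(0, J_0 \Omega J_0'),$$
$$g_\theta(\theta) \equiv J(\theta) \equiv \frac{\partial g(\theta)}{\partial \theta'}, \qquad J_0 \equiv J(\theta_0).$$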
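Delta Method

Proof. Expanding $g(\hat\theta_N)$ around $\theta_0$ we obtain
$$\sqrt{N}(g(\hat\theta_N) - g(\theta_0)) = J(\bar\theta_N) \sqrt{N}(\hat\theta_N - \theta_0)$$
for some
$$\bar\theta_N = \alpha_N \hat\theta_N + (1 - \alpha_N)\theta_0, \quad \alpha_N \in [0, 1].$$
$$\bar\theta_N \xrightarrow{p} \theta_0 \;\Rightarrow\; J(\bar\theta_N) \xrightarrow{p} J_0 \quad \text{(Slutsky's Theorem)}$$
$$\sqrt{N}(g(\hat\theta_N) - g(\theta_0)) \xrightarrow{d} J_0\, N(0, \Omega) \sim N(0, J_0 \Omega J_0').$$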
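A simulation sketch of the delta method, under assumptions chosen for this example only: an exponential model with hypothetical rate $\theta_0 = 2$, so that $\Omega = I(\theta_0)^{-1} = \theta_0^2$, and the transformation $g(\theta) = \log\theta$ with $J_0 = 1/\theta_0$.

```python
import numpy as np

rng = np.random.default_rng(8)
theta0, N, reps = 2.0, 500, 5000     # hypothetical rate, sample size, replications

u = rng.exponential(1.0 / theta0, size=(reps, N))   # scale = 1 / rate
theta_hat = 1.0 / u.mean(axis=1)                    # MLE of the rate in each replication

omega = theta0**2              # asymptotic variance of sqrt(N)(theta_hat - theta0)
J0 = 1.0 / theta0              # derivative of g(theta) = log(theta) at theta0
delta_var = J0 * omega * J0    # J0 Omega J0' (scalars here), equal to 1

simulated = np.sqrt(N) * (np.log(theta_hat) - np.log(theta0))
print(delta_var, simulated.var())
```

The Monte Carlo variance of $\sqrt{N}(g(\hat\theta_N) - g(\theta_0))$ should be close to the delta-method value $J_0 \Omega J_0' = 1$.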
Asymptotics MLE Normal Linear Regression Model

The empirical expectation of the log-likelihood:
$$E_N[L(\theta)] = -\frac{1}{2}\log(2\pi\sigma^2) - \frac{E_N[(y_t - x_t'\beta)^2]}{2\sigma^2} = -\frac{1}{2}\log(2\pi\sigma^2) - \frac{(y - X\beta)'(y - X\beta)/N}{2\sigma^2}$$
The log-likelihood is differentiable. F.O.C.s:
$$E_N[L_\beta(\theta)] = \frac{1}{\sigma^2} E_N[x_t(y_t - x_t'\beta)] = \frac{1}{N\sigma^2} X'(y - X\beta)$$
$$E_N[L_{\sigma^2}(\theta)] = -\frac{1}{2\sigma^4}\left\{\sigma^2 - E_N[(y_t - x_t'\beta)^2]\right\} = -\frac{1}{2\sigma^4}\left[\sigma^2 - \frac{1}{N}(y - X\beta)'(y - X\beta)\right]$$
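Asymptotics MLE Normal Linear Regression Model

Solutions:
$$\frac{1}{N\hat\sigma^2} X'(y - X\hat\beta) = 0 \;\Rightarrow\; \hat\beta = (X'X)^{-1}X'y$$
$$\hat\sigma^2 = \frac{1}{N}(y - X\hat\beta)'(y - X\hat\beta)$$
The Hessian matrix:
$$E_N[L_{\theta\theta}(\theta)] = \begin{pmatrix} -\dfrac{1}{\sigma^2 N} X'X & -\dfrac{1}{\sigma^4 N} X'(y - X\beta) \\ -\dfrac{1}{\sigma^4 N} (y - X\beta)'X & \dfrac{1}{2\sigma^4} - \dfrac{1}{\sigma^6 N}(y - X\beta)'(y - X\beta) \end{pmatrix}$$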
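The closed-form solutions can be verified on simulated data. In this sketch the design matrix, the coefficient vector `beta0` and the variance `sigma2_0` are hypothetical values chosen only to generate data; the check is that both average scores vanish at $(\hat\beta, \hat\sigma^2)$.

```python
import numpy as np

rng = np.random.default_rng(9)
N, K = 500, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])   # intercept plus two regressors
beta0, sigma2_0 = np.array([1.0, -0.5, 2.0]), 1.5                # hypothetical true values
y = X @ beta0 + rng.normal(scale=np.sqrt(sigma2_0), size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)     # (X'X)^{-1} X'y
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / N                   # the MLE divides by N, not N - K

# Average scores evaluated at the MLE: both should be zero
score_beta = X.T @ resid / (N * sigma2_hat)                              # E_N[L_beta]
score_sigma2 = -(sigma2_hat - resid @ resid / N) / (2 * sigma2_hat**2)   # E_N[L_sigma2]

s2 = resid @ resid / (N - K)                     # degrees-of-freedom corrected estimator
print(score_beta, score_sigma2, sigma2_hat, (N - K) / N * s2)
```

The last two printed numbers agree, reflecting the relation $\hat\sigma^2 = \frac{N-K}{N} s^2$ between the MLE and the usual unbiased variance estimator.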
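Asymptotics MLE Normal Linear Regression Model

$$E_N[L_{\theta\theta}(\hat\theta_N)] = \begin{pmatrix} -\dfrac{1}{\hat\sigma^2 N} X'X & -\dfrac{1}{\hat\sigma^4 N} X'(y - X\hat\beta) \\ -\dfrac{1}{\hat\sigma^4 N} (y - X\hat\beta)'X & \dfrac{1}{2\hat\sigma^4} - \dfrac{1}{\hat\sigma^6 N}(y - X\hat\beta)'(y - X\hat\beta) \end{pmatrix}$$
$$= \begin{pmatrix} -\dfrac{1}{\hat\sigma^2 N} X'X & 0 \\ 0 & \dfrac{1}{2\hat\sigma^4} - \dfrac{1}{\hat\sigma^6 N}(y - X\hat\beta)'(y - X\hat\beta) \end{pmatrix}$$
which is negative definite. The off-diagonal blocks vanish because $X'(y - X\hat\beta) = 0$, and the lower-right element equals $-1/(2\hat\sigma^4)$ because $(y - X\hat\beta)'(y - X\hat\beta)/N = \hat\sigma^2$.

The MLE of $\sigma^2$ is
$$\hat\sigma_N^2 = \frac{\hat\epsilon'\hat\epsilon}{N} = \frac{N - K}{N} s^2.$$
The second-order necessary condition for a point to be a local maximum of a twice continuously differentiable function is that the Hessian be negative semidefinite at the point.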
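Asymptotics MLE Normal Linear Regression Model

$$-E_N[L_{\theta\theta}(\hat\theta_N)] = \begin{pmatrix} \dfrac{1}{\hat\sigma_N^2 N} X'X & 0 \\ 0 & \dfrac{1}{2\hat\sigma_N^4} \end{pmatrix} = E_N[I(\hat\theta_N \mid x_t)]$$
$$Var_N[L_\theta(\hat\theta_N)] = \begin{pmatrix} \dfrac{1}{\hat\sigma_N^4} E_N[x_t (y_t - x_t'\hat\beta_N)^2 x_t'] & \dfrac{1}{2\hat\sigma_N^6} E_N[x_t (y_t - x_t'\hat\beta_N)^3] \\ \dfrac{1}{2\hat\sigma_N^6} E_N[(y_t - x_t'\hat\beta_N)^3 x_t'] & \dfrac{1}{4\hat\sigma_N^8} \left\{E_N[(y_t - x_t'\hat\beta_N)^4] - \hat\sigma_N^4\right\} \end{pmatrix}$$
$$\begin{pmatrix} \dfrac{1}{\hat\sigma_N} \left(\dfrac{X'X}{N}\right)^{1/2} \sqrt{N}(\hat\beta_N - \beta) \\ \dfrac{1}{\sqrt{2}\, \hat\sigma_N^2} \sqrt{N}(\hat\sigma_N^2 - \sigma^2) \end{pmatrix} \xrightarrow{d} N(0, I_{K+1})$$
and
$$\sqrt{N}(\hat\beta_N - \beta) \xrightarrow{d} N(0, \sigma^2 [E(x_t x_t')]^{-1}).$$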
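A sketch comparing the Hessian-based and outer-product information estimators for the normal linear regression model. The data-generating process (one standard normal regressor plus intercept, coefficients $(1, 2)$, $\sigma^2 = 1$) is hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(10)
N, K = 200_000, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])          # intercept and one regressor
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=1.0, size=N)   # hypothetical DGP with sigma^2 = 1

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat          # residuals y_t - x_t' beta_hat
s2 = e @ e / N                # MLE of sigma^2

# Minus the average Hessian at the MLE: block diagonal
info_hessian = np.zeros((K + 1, K + 1))
info_hessian[:K, :K] = X.T @ X / (N * s2)
info_hessian[K, K] = 1.0 / (2 * s2**2)

# Average outer product of the per-observation scores at the MLE
scores = np.column_stack([X * (e / s2)[:, None], (e**2 - s2) / (2 * s2**2)])
info_opg = scores.T @ scores / N

# The average Hessian itself should be negative definite
eigs = np.linalg.eigvalsh(-info_hessian)
print(np.round(info_hessian, 3))
print(np.round(info_opg, 3))
```

With normal errors the two matrices should agree up to sampling error; under non-normal errors the third- and fourth-moment terms of the outer-product estimator would generally differ from the Hessian-based values.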