Introduction to Estimation Methods for Time Series Models
Lecture 2

Fulvio Corsi
SNS Pisa
Estimators: Large Sample Properties

Purposes:
- study the behavior of $\hat\theta_n$ as $n \to \infty$
- approximate the unknown finite-sample distribution of $\hat\theta_n$

Since $\hat\theta_n$ has a different distribution for each $n$, how do we define $\hat\theta_n \to \theta_0$?

Convergence in probability (plim): the random variable $\hat\theta_n$ converges in probability to the constant $\theta_0$ if, for every $\epsilon > 0$,
$$\lim_{n\to\infty} P(|\hat\theta_n - \theta_0| < \epsilon) = 1.$$
If $\operatorname{plim} \hat\theta_n = \theta_0$ the estimator is consistent.

Convergence in quadratic mean:
$$\lim_{n\to\infty} E(\hat\theta_n - \theta_0)^2 = \lim_{n\to\infty} \operatorname{MSE}[\hat\theta_n] = 0.$$
Since $\operatorname{MSE}[\hat\theta_n] = \operatorname{Var}[\hat\theta_n] + \operatorname{Bias}[\hat\theta_n]^2$, convergence in quadratic mean is equivalent to
$$\lim_{n\to\infty} \operatorname{Bias}[\hat\theta_n] = 0 \quad\text{and}\quad \lim_{n\to\infty} \operatorname{Var}[\hat\theta_n] = 0.$$

Convergence in quadratic mean $\Rightarrow$ convergence in probability.
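The shrinking-MSE property can be checked by simulation. A minimal sketch (an assumed example, not from the slides): the sample mean of an Exponential(1) sample is a consistent estimator of $\theta_0 = 1$, and its Monte Carlo MSE falls roughly as $1/n$.

```python
import numpy as np

# Monte Carlo illustration (assumed example): the sample mean of an
# Exponential(1) sample is consistent for theta_0 = 1.  Its MSE shrinks
# as n grows -> convergence in quadratic mean, hence in probability.
rng = np.random.default_rng(0)
theta_0 = 1.0

def mse_of_mean(n, n_rep=2000):
    """MSE of the sample mean over n_rep simulated samples of size n."""
    samples = rng.exponential(theta_0, size=(n_rep, n))
    theta_hat = samples.mean(axis=1)
    return np.mean((theta_hat - theta_0) ** 2)

# Var of the mean is theta_0^2 / n, so the MSE should drop ~100x here.
mse_small, mse_large = mse_of_mean(10), mse_of_mean(1000)
```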
Consistency
OLS without Normality: Large Sample Theory

When H.4 (Normality) is violated, OLS is still unbiased and BLUE; however, confidence intervals and test statistics are no longer valid.

Assumptions for the large-sample theory:
A.1 $E[x_i \epsilon_i] = 0$: regressors uncorrelated with errors (weaker than strict exogeneity)
A.2 $\lim_{n\to\infty} \frac{1}{n} X'X = Q$, positive definite

Since
$$\hat\beta = \beta + (X'X)^{-1} X'\epsilon = \beta + \left( \sum_{i=1}^n x_i x_i' \right)^{-1} \left( \sum_{i=1}^n x_i \epsilon_i \right)$$

Consistency (convergence in probability): $\operatorname{plim} \hat\beta_n = \beta$, since
$$\hat\beta_n = \beta + \left( \frac{1}{n}\sum_{i=1}^n x_i x_i' \right)^{-1} \left( \frac{1}{n}\sum_{i=1}^n x_i \epsilon_i \right) \to \beta$$
where the first factor $\to Q^{-1}$ (A.2) and the second $\to 0$ (A.1 + LLN).

Asymptotic normality:
$$\sqrt{n}(\hat\beta_n - \beta) = \left( \frac{1}{n}\sum_{i=1}^n x_i x_i' \right)^{-1} \left( \frac{1}{\sqrt{n}}\sum_{i=1}^n x_i \epsilon_i \right) \to N(0,\, Q^{-1}(\sigma^2 Q) Q^{-1}) = N(0,\, \sigma^2 Q^{-1})$$
where the first factor $\to Q^{-1}$ (A.2) and the second $\to N(0, \sigma^2 Q)$ (A.2 + CLT).

Hence $\hat\beta_n \overset{a}{\sim} N(\beta, \sigma^2 (X'X)^{-1})$, as in the Normal case (H.4), but only asymptotically.
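A quick simulation sketch of the consistency claim (assumed example): with deliberately non-normal (uniform) errors satisfying A.1, the OLS estimate still approaches the true $\beta$ as $n$ grows.

```python
import numpy as np

# Sketch (assumed example): OLS with non-normal, mean-zero uniform errors
# is still consistent -- beta_hat approaches beta as n grows (A.1 + A.2 + LLN).
rng = np.random.default_rng(1)
beta = np.array([2.0, -1.0])

def ols(n):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    eps = rng.uniform(-3, 3, size=n)       # non-normal, mean-zero errors
    y = X @ beta + eps
    return np.linalg.solve(X.T @ X, X.T @ y)

b_small, b_large = ols(50), ols(200000)    # compare a small and a large sample
```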
NLS (the idea)

The nonlinear regression model is $y_i = h(x_i, \beta) + \epsilon_i$ with $E[\epsilon_i \mid x_i] = 0$.

Nonlinear least squares (NLS) estimator:
$$\hat\beta_{NLS} = \arg\min_\beta \sum_{i=1}^n \epsilon_i^2 = \arg\min_\beta \sum_{i=1}^n \left( y_i - h(x_i, \beta) \right)^2$$

FOC: the NLS estimator $\hat\beta_{NLS}$ satisfies
$$\sum_{i=1}^n \left[ y_i - h(x_i, \hat\beta_{NLS}) \right] \frac{\partial h(x_i, \hat\beta_{NLS})}{\partial \beta} = 0$$

In general there is no closed-form solution $\Rightarrow$ numerical minimization.
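A minimal numerical-minimization sketch, for an assumed illustrative model $h(x,\beta) = \beta_0 e^{\beta_1 x}$ (the model and names are not from the slides):

```python
import numpy as np
from scipy.optimize import least_squares

# NLS by numerical minimization (assumed example model: h = b0 * exp(b1 * x)).
rng = np.random.default_rng(2)
x = np.linspace(0.0, 2.0, 400)
beta_true = np.array([1.5, -0.8])
y = beta_true[0] * np.exp(beta_true[1] * x) + 0.05 * rng.normal(size=x.size)

def residuals(beta):
    """epsilon_i(beta) = y_i - h(x_i, beta); NLS minimizes their sum of squares."""
    return y - beta[0] * np.exp(beta[1] * x)

fit = least_squares(residuals, x0=np.array([1.0, -0.5]))
beta_nls = fit.x
```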
Maximum Likelihood

Basic, strong assumption: the distribution of the data is known up to $\theta$.

Likelihood function: the joint density of an i.i.d. random sample $(x_1, x_2, \dots, x_n)$ from $f(x; \theta_0)$ is
$$f(x_1, x_2, \dots, x_n; \theta) = f(x_1; \theta) f(x_2; \theta) \cdots f(x_n; \theta)$$

A different perspective: view the joint density as a function of the parameters $\theta$ (as opposed to the sample):
$$L(\theta \mid X) \equiv f(x_1, x_2, \dots, x_n; \theta) = \prod_{i=1}^n f(x_i; \theta)$$

It is usually simpler to work with the log of the likelihood:
$$\ell(\theta \mid x_1, \dots, x_n) \equiv \ln L(\theta \mid x_1, \dots, x_n) = \sum_{i=1}^n \ln f(x_i; \theta)$$

ML estimator: given sample data generated from a parametric model, find the parameters that maximize the probability of observing that sample:
$$\hat\theta_{ML} = \arg\max_\theta L(\theta \mid x_1, \dots, x_n) = \arg\max_\theta \ell(\theta)$$

F.O.C. (the score):
$$\frac{\partial \ell(\theta \mid x_1, \dots, x_n)}{\partial \theta} = 0$$
Maximum Likelihood: Example

Consider a univariate Normal model, $y_i \sim N(\mu, \sigma^2)$, with density
$$f(y_i; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2} \frac{(y_i - \mu)^2}{\sigma^2} \right)$$

The log-likelihood is
$$\ell(\mu, \sigma^2) = \sum_{i=1}^n \ln f(y_i; \theta) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{1}{2}\sum_{i=1}^n \frac{(y_i - \mu)^2}{\sigma^2}$$

and the scores for the $\mu$ and $\sigma^2$ parameters are
$$\frac{\partial \ell(\mu,\sigma^2)}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^n (y_i - \mu) = 0$$
$$\frac{\partial \ell(\mu,\sigma^2)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^n (y_i - \mu)^2 = 0$$

Therefore, by first solving for $\mu$ and substituting it into the score of $\sigma^2$, we get the ML estimators
$$\hat\mu_{ML} = \frac{1}{n}\sum_{i=1}^n y_i \qquad \hat\sigma^2_{ML} = \frac{1}{n}\sum_{i=1}^n (y_i - \hat\mu_{ML})^2$$
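The closed-form estimators above are easy to check on simulated data (an assumed example); note that $\hat\sigma^2_{ML}$ divides by $n$, not $n-1$.

```python
import numpy as np

# Closed-form ML estimators of the univariate Normal model on simulated
# data (assumed example): mu_hat is the sample mean, sigma2_hat the
# sample variance WITHOUT the (n-1) degrees-of-freedom correction.
rng = np.random.default_rng(3)
mu, sigma2, n = 0.5, 2.0, 100000
y = rng.normal(mu, np.sqrt(sigma2), size=n)

mu_ml = y.mean()
sigma2_ml = np.mean((y - mu_ml) ** 2)      # divides by n, not n-1
```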
Maximum Likelihood: Properties

Consistency: $\operatorname{plim} \hat\theta_{ML} = \theta_0$

Asymptotic normality:
$$\hat\theta_{ML} \overset{a}{\sim} N\left( \theta_0, [I(\theta_0)]^{-1} \right) \quad\text{where}\quad I(\theta) = -E\left[ \frac{\partial^2 \ln L(\theta)}{\partial\theta\, \partial\theta'} \right]$$
$I(\theta)$ is the Fisher information matrix.

Asymptotic efficiency: the MLE has the smallest asymptotic variance, $I(\theta)^{-1}$ being the Cramér-Rao lower bound.

Invariance: if $\hat\theta$ is the MLE for $\theta_0$, and $g(\theta)$ is any (invertible) transformation of $\theta$, then the MLE for $g(\theta_0)$ is $g(\hat\theta)$. Ex: for the precision parameter $\gamma^2 = 1/\sigma^2$, $\hat\gamma^2_{ML} = 1/\hat\sigma^2_{ML}$.

Bottom line: the MLE makes the best use of the information in the sample (asymptotically).
Maximum Likelihood: Properties of a Regular Density

Under some regularity conditions (whose purpose is to allow Taylor approximations and the interchange of differentiation and expectation):

D1: $\ln f(y_i;\theta)$, $g_i = \dfrac{\partial \ln f(y_i;\theta)}{\partial\theta}$ and $H_i = \dfrac{\partial^2 \ln f(y_i;\theta)}{\partial\theta\, \partial\theta'}$ form a random sample
D2: $E_0[g_i(\theta_0)] = 0$
D3: $\operatorname{Var}_0[g_i(\theta_0)] = -E_0[H_i(\theta_0)]$

D1 is implied by the assumption that $y_i$, $i = 1, \dots, n$, is a random sample.

D2 is a consequence of $\int f(y_i; \theta_0)\, dy_i = 1$: differentiating both sides w.r.t. $\theta_0$,
$$0 = \int \frac{\partial f(y_i;\theta_0)}{\partial\theta_0}\, dy_i = \int \frac{\partial \ln f(y_i;\theta_0)}{\partial\theta_0}\, f(y_i;\theta_0)\, dy_i = E_0[g_i(\theta_0)]$$

D3 is obtained by differentiating once more w.r.t. $\theta_0$.

By D1 (random sample), $\operatorname{Var}_0\left[ \sum_{i=1}^n g_i(\theta_0) \right] = n \operatorname{Var}_0[g_i(\theta_0)]$, thus
$$J(\theta_0) \equiv \operatorname{Var}_0\left[ \frac{\partial \ln L(\theta_0; y)}{\partial\theta_0} \right] = -E_0\left[ \frac{\partial^2 \ln L(\theta_0; y)}{\partial\theta_0\, \partial\theta_0'} \right] \equiv I(\theta_0) \quad\text{(information matrix equality)}$$
Asymptotic Normality of MLE

Being the maximizer of $\ln L$, the MLE satisfies by construction the likelihood equation $g(\hat\theta) = \sum_{i=1}^n g_i(\hat\theta) = 0$. Define
$$H(\theta_0) = \frac{\partial^2 \ln L(\theta_0; y)}{\partial\theta_0\, \partial\theta_0'} = \sum_{i=1}^n \frac{\partial^2 \ln f(y_i;\theta_0)}{\partial\theta_0\, \partial\theta_0'} = \sum_{i=1}^n H_i(\theta_0)$$

1. Take a first-order Taylor expansion of the score $g(\hat\theta)$ around $\theta_0$:
$$g(\hat\theta) = g(\theta_0) + H(\theta_0)(\hat\theta - \theta_0) + R_1 = 0$$
2. Rearrange and scale by $\sqrt{n}$:
$$\sqrt{n}(\hat\theta - \theta_0) = \left( -\frac{1}{n}\sum_{i=1}^n H_i(\theta_0) \right)^{-1} \frac{1}{\sqrt{n}}\sum_{i=1}^n g_i(\theta_0) + R_1^*, \qquad R_1^* \to 0$$
3. Use the LLN on the first term and the CLT on the second:
$$\sqrt{n}(\hat\theta - \theta_0) \;\to\; \left( -E_0\!\left[ \tfrac{1}{n} H(\theta_0) \right] \right)^{-1} N\!\left( 0,\, \operatorname{Var}_0\!\left[ \tfrac{1}{\sqrt{n}}\, g(\theta_0) \right] \right) = N\!\left( 0,\; n\, I(\theta_0)^{-1} J(\theta_0)\, I(\theta_0)^{-1} \right)$$

If the information matrix equality $J(\theta_0) = I(\theta_0)$ holds, then
$$\hat\theta \overset{a}{\sim} N\left( \theta_0,\, I(\theta_0)^{-1} \right)$$
Estimating the Asymptotic Covariance Matrix of the MLE

Three asymptotically equivalent estimators of $\operatorname{Asy.Var}[\hat\theta]$:

1. Calculate $E_0[H(\theta_0)]$ (very difficult) and evaluate it at $\hat\theta$:
$$\{I(\hat\theta)\}^{-1} = \left\{ -E_0\!\left[ \frac{\partial^2 \ln L(\hat\theta; y)}{\partial\hat\theta\, \partial\hat\theta'} \right] \right\}^{-1}$$
2. Calculate $H(\theta_0)$ directly (still quite difficult) and evaluate it at $\hat\theta$:
$$\{\hat I(\hat\theta)\}^{-1} = \left\{ -\frac{\partial^2 \ln L(\hat\theta; y)}{\partial\hat\theta\, \partial\hat\theta'} \right\}^{-1}$$
3. BHHH or OPG estimator (easy): use the information matrix equality $I(\theta_0) = J(\theta_0)$:
$$\{\tilde I(\hat\theta)\}^{-1} = \left\{ \sum_{i=1}^n g_i(\hat\theta)\, g_i(\hat\theta)' \right\}^{-1}$$
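Estimators 2 and 3 can be compared in a toy case where scores and Hessians are available analytically. A sketch under assumed conditions: $N(\mu, \sigma^2)$ with $\sigma^2$ known, so the only parameter is $\mu$ and both estimators should roughly agree at the MLE.

```python
import numpy as np

# Hessian-based (estimator 2) vs BHHH/OPG (estimator 3) variance estimates
# for the MLE of mu in N(mu, sigma2) with sigma2 known (assumed toy setup;
# per-observation scores and Hessians are analytic here).
rng = np.random.default_rng(4)
sigma2, n = 2.0, 50000
y = rng.normal(1.0, np.sqrt(sigma2), size=n)
mu_hat = y.mean()                          # MLE of mu

g_i = (y - mu_hat) / sigma2                # per-observation scores
H_i = -np.ones(n) / sigma2                 # per-observation Hessians

var_hessian = 1.0 / (-H_i.sum())           # {-H(theta_hat)}^{-1} = sigma2 / n
var_opg = 1.0 / np.sum(g_i ** 2)           # {sum g_i g_i'}^{-1}
```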
Hypothesis Testing (idea)

Test of the hypothesis $H_0: c(\theta) = 0$.
Three tests, asymptotically equivalent (but not in finite samples):

Likelihood ratio test: if $c(\theta) = 0$ is valid, then imposing it should not lead to a large reduction in the log-likelihood function. Therefore, we base the test on the difference
$$2(\ln L - \ln L_R) \sim \chi^2_{df}$$
where $df$ is the number of restrictions. Both the unrestricted ($\ln L$) and restricted ($\ln L_R$) ML estimators are required.

Wald test: if $c(\theta) = 0$ is valid, then $c(\hat\theta_{ML}) \approx 0$. Only the unrestricted (ML) estimator is required.

Lagrange multiplier test: if $c(\theta) = 0$ is valid, then the restricted estimator should be near the point that maximizes $\ln L$, so the slope of $\ln L$ should be near zero at the restricted estimator. Only the restricted estimator is required.
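A minimal likelihood ratio sketch (assumed example): testing $H_0: \mu = 0$ in a $N(\mu, \sigma^2)$ model, where both the restricted and unrestricted MLEs have closed forms.

```python
import numpy as np

# Likelihood ratio test sketch (assumed example): H0: mu = 0 in N(mu, sigma2).
# Both the unrestricted and the restricted ML estimators are needed.
rng = np.random.default_rng(5)
y = rng.normal(0.0, 1.0, size=500)         # data generated under H0
n = y.size

def norm_loglik(mu, sigma2):
    return -0.5 * n * np.log(2 * np.pi * sigma2) - np.sum((y - mu) ** 2) / (2 * sigma2)

# unrestricted MLEs
mu_u = y.mean()
s2_u = np.mean((y - mu_u) ** 2)
# restricted MLE under H0 (mu fixed at 0, sigma2 re-estimated)
s2_r = np.mean(y ** 2)

# Under H0, lr_stat ~ chi2 with 1 degree of freedom, asymptotically.
lr_stat = 2 * (norm_loglik(mu_u, s2_u) - norm_loglik(0.0, s2_r))
```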
Hypothesis Testing
Application of MLE: Linear Regression Model

Model: $y_i = x_i'\beta + \epsilon_i$ with $y_i \mid x_i \sim N(x_i'\beta, \sigma^2)$.

Log-likelihood based on $n$ conditionally independent observations:
$$\ln L = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2}\sum_{i=1}^n \frac{(y_i - x_i'\beta)^2}{\sigma^2} = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{(y - X\beta)'(y - X\beta)}{2\sigma^2}$$

Likelihood equations:
$$\frac{\partial \ln L}{\partial \beta} = \frac{X'(y - X\beta)}{\sigma^2} = 0 \qquad \frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{(y - X\beta)'(y - X\beta)}{2\sigma^4} = 0$$

Solving the likelihood equations:
$$\hat\beta_{ML} = (X'X)^{-1} X'y \qquad \hat\sigma^2_{ML} = \frac{e'e}{n}$$

$\hat\beta_{ML} = \hat\beta_{OLS}$ $\Rightarrow$ OLS has all the desirable asymptotic properties of the MLE.
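The equivalence can be verified numerically (assumed example): maximizing the normal log-likelihood of the linear model reproduces the OLS formula, with $\hat\sigma^2_{ML} = e'e/n$.

```python
import numpy as np
from scipy.optimize import minimize

# Numerical check (assumed example) that maximizing the normal
# log-likelihood of the linear model reproduces beta_hat = (X'X)^{-1} X'y
# and sigma2_hat = e'e / n.
rng = np.random.default_rng(6)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(size=n)

def neg_loglik(params):
    beta, log_s2 = params[:2], params[2]   # log-parameterize sigma2 > 0
    s2 = np.exp(log_s2)
    e = y - X @ beta
    return 0.5 * n * np.log(2 * np.pi * s2) + e @ e / (2 * s2)

res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
beta_ml, sigma2_ml = res.x[:2], np.exp(res.x[2])

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_ols
sigma2_ols = e @ e / n                     # ML divides by n, not n - k
```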
Maximum Likelihood in Time Series: AR Model

In a time series $y_t$ the innovations $\epsilon_t$ are usually not i.i.d. It is then very convenient to use the prediction error decomposition of the likelihood:
$$L(y_T, y_{T-1}, \dots, y_1; \theta) = f(y_T \mid \Omega_{T-1}; \theta)\, f(y_{T-1} \mid \Omega_{T-2}; \theta) \cdots f(y_1 \mid \Omega_0; \theta)$$

For example, for the AR(1) model $y_t = \phi_1 y_{t-1} + \epsilon_t$ the full log-likelihood can be written as
$$\ell(\phi) = \underbrace{\ln f_{Y_1}(y_1; \phi)}_{\text{marginal of 1st obs.}} + \underbrace{\sum_{t=2}^T \ln f_{Y_t \mid Y_{t-1}}(y_t \mid y_{t-1}; \phi)}_{\text{conditional likelihood}}$$
where, under normality, the conditional likelihood is
$$-\frac{T-1}{2}\ln(2\pi) - \frac{T-1}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{t=2}^T (y_t - \phi y_{t-1})^2$$

Hence, maximizing the conditional likelihood for $\phi$ is equivalent to minimizing
$$\sum_{t=2}^T (y_t - \phi y_{t-1})^2,$$
which is the OLS criterion $\Rightarrow$ under normality, conditional MLE = OLS.

In general, for AR(p) processes OLS is consistent and, under gaussianity, asymptotically equivalent to the MLE $\Rightarrow$ asymptotically efficient.
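A short sketch of the conditional MLE for the AR(1) (assumed simulated example): the OLS regression of $y_t$ on $y_{t-1}$ recovers $\phi$.

```python
import numpy as np

# Conditional MLE of an AR(1) via the OLS formula (assumed example):
# regressing y_t on y_{t-1} minimizes sum_{t=2}^T (y_t - phi * y_{t-1})^2.
rng = np.random.default_rng(7)
phi_true, T = 0.6, 20000
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi_true * y[t - 1] + rng.normal()

y_lag, y_cur = y[:-1], y[1:]
phi_hat = (y_lag @ y_cur) / (y_lag @ y_lag)   # OLS = conditional MLE
```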
Maximum Likelihood in Time Series: ARMA Model

For a general ARMA(p,q)
$$Y_t = \phi_1 Y_{t-1} + \dots + \phi_p Y_{t-p} + \epsilon_t + \theta_1 \epsilon_{t-1} + \dots + \theta_q \epsilon_{t-q}$$
$Y_{t-1}$ is correlated with $\epsilon_{t-1}, \dots, \epsilon_{t-q}$ $\Rightarrow$ OLS is not consistent $\Rightarrow$ MLE with numerical optimization procedures.
Maximum Likelihood in Time Series: GARCH Model

A GARCH process with gaussian innovations,
$$r_t \mid \Omega_{t-1} \sim N(\mu_t(\theta), \sigma_t^2(\theta)),$$
has conditional densities
$$f(r_t \mid \Omega_{t-1}; \theta) = \frac{1}{\sigma_t(\theta)\sqrt{2\pi}} \exp\left( -\frac{1}{2} \frac{(r_t - \mu_t(\theta))^2}{\sigma_t^2(\theta)} \right)$$

Using the prediction error decomposition, the log-likelihood becomes
$$\log L(r_T, r_{T-1}, \dots, r_1; \theta) = -\frac{T}{2}\log(2\pi) - \frac{1}{2}\sum_{t=1}^T \log \sigma_t^2(\theta) - \frac{1}{2}\sum_{t=1}^T \frac{(r_t - \mu_t(\theta))^2}{\sigma_t^2(\theta)}$$

This is a nonlinear function of $\theta$ $\Rightarrow$ numerical optimization techniques.
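A sketch of the numerical optimization, under assumed simplifications (not from the slides): $\mu_t = 0$ and a GARCH(1,1) recursion $\sigma^2_t = \omega + \alpha r_{t-1}^2 + \beta \sigma^2_{t-1}$ initialized at the unconditional variance.

```python
import numpy as np
from scipy.optimize import minimize

# GARCH(1,1) estimation by numerical maximization of the Gaussian
# log-likelihood (assumed example; mu_t = 0 for simplicity):
#   sigma2_t = omega + alpha * r_{t-1}^2 + beta * sigma2_{t-1}
rng = np.random.default_rng(8)
omega, alpha, beta, T = 0.1, 0.1, 0.8, 5000
r = np.zeros(T)
s2 = omega / (1 - alpha - beta)            # start at unconditional variance
for t in range(1, T):
    s2 = omega + alpha * r[t - 1] ** 2 + beta * s2
    r[t] = np.sqrt(s2) * rng.normal()

def neg_loglik(params):
    w, a, b = params
    if w <= 0 or a < 0 or b < 0 or a + b >= 1:
        return np.inf                      # crude positivity/stationarity guard
    s2 = np.empty(T)
    s2[0] = w / (1 - a - b)
    for t in range(1, T):
        s2[t] = w + a * r[t - 1] ** 2 + b * s2[t - 1]
    return 0.5 * np.sum(np.log(2 * np.pi * s2) + r ** 2 / s2)

start = np.array([0.2, 0.05, 0.7])
res = minimize(neg_loglik, start, method="Nelder-Mead",
               options={"xatol": 1e-6, "fatol": 1e-6, "maxiter": 5000})
omega_hat, alpha_hat, beta_hat = res.x
```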
Quasi Maximum Likelihood

ML requires the complete specification of $f(y_i \mid x_i; \theta)$; usually Normality is assumed. Nevertheless, even if the true distribution is not Normal, assuming Normality still gives consistency and asymptotic normality, provided that the conditional mean and variance processes are correctly specified.

However, the information matrix equality no longer holds, i.e. $J(\theta_0) \neq I(\theta_0)$; hence the covariance matrix of $\hat\theta_{ML}$ is not $I(\theta_0)^{-1} = \left( -E\left[ \frac{\partial^2 \ell(\theta_0)}{\partial\theta_0\, \partial\theta_0'} \right] \right)^{-1}$, but
$$\hat\theta_{QML} \overset{a}{\sim} N\left( \theta_0,\; [I(\theta_0)]^{-1} J(\theta_0) [I(\theta_0)]^{-1} \right)$$
where
$$J(\theta_0) = \lim_{T\to\infty} \frac{1}{T} \sum_{t=1}^T E\left[ \frac{\partial \ell_t(\theta_0)}{\partial\theta_0} \frac{\partial \ell_t(\theta_0)}{\partial\theta_0'} \right]$$

The standard errors computed from $\left[ \hat I(\theta_0) \right]^{-1} \hat J(\theta_0) \left[ \hat I(\theta_0) \right]^{-1}$ are called robust standard errors.
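A sandwich-variance sketch, under an assumed toy misspecification (not from the slides): QML of a mean $\mu$ under a $N(\mu, 1)$ likelihood when the data are actually exponential with variance 4. The naive $I^{-1}$ variance is wrong; the $I^{-1} J I^{-1}$ sandwich recovers the correct $\operatorname{Var}(y)/n$.

```python
import numpy as np

# Robust (sandwich) variance sketch (assumed example): QML of the mean mu
# under a misspecified N(mu, 1) likelihood, data actually Exponential with
# true variance 4.
rng = np.random.default_rng(9)
n = 100000
y = rng.exponential(scale=2.0, size=n)     # mean 2, variance 4
mu_hat = y.mean()                          # QML estimator of mu under N(mu, 1)

g = y - mu_hat                             # per-obs scores of the normal loglik
I_hat = float(n)                           # minus the summed Hessians (= n here)
J_hat = np.sum(g ** 2)                     # outer-product-of-gradients term

var_naive = 1.0 / I_hat                    # valid only if the model were correct
var_robust = J_hat / I_hat ** 2            # sandwich: approx Var(y)/n = 4/n
```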
GMM (idea)

Idea: sample moments $\overset{p}{\to}$ population moments = function(parameters).

A vector function $g(\theta, w_t)$ which under the true value $\theta_0$ satisfies
$$E_0[g(\theta_0, w_t)] = 0$$
is called a set of orthogonality or moment conditions.

Goal: estimate $\theta_0$ from the informational content of the moment conditions $\Rightarrow$ a semiparametric approach, i.e. no distributional assumptions!

Replace the population moment conditions with sample moments
$$\hat g_T(\theta) = \frac{1}{T} \sum_{t=1}^T g(\theta, w_t)$$
and minimize, w.r.t. $\theta$, a quadratic form of $\hat g_T(\theta)$ with a given weighting matrix $W$:
$$\hat\theta_{GMM} = \arg\min_\theta \left( \frac{1}{T}\sum_{t=1}^T g(\theta, w_t) \right)' W \left( \frac{1}{T}\sum_{t=1}^T g(\theta, w_t) \right)$$
GMM: Optimal Weighting Matrix

Exactly identified (method of moments, MM): # orthogonality conditions = # parameters $\Rightarrow$ the moment equations are satisfied exactly, i.e.
$$\frac{1}{T}\sum_{t=1}^T g(\hat\theta, w_t) = 0 \quad\Rightarrow\quad W \text{ is irrelevant}$$

Overidentified (generalized MM): # orthogonality conditions > # parameters $\Rightarrow$ $W$ is relevant. The optimal weighting matrix is the inverse of the asymptotic variance-covariance matrix of $g(\theta_0, w_t)$,
$$W^* = \operatorname{Var}[g(\theta_0, w_t)]^{-1},$$
but it depends on the unknown $\theta_0$.

Feasible two-step procedure:
Step 1. Use $W = I$ to obtain a consistent estimator $\hat\theta_1$, then estimate
$$\hat\Phi = \frac{1}{T}\sum_{t=1}^T g(\hat\theta_1, w_t)\, g(\hat\theta_1, w_t)'$$
Step 2. Compute the second-step GMM estimator using the weighting matrix $\hat\Phi^{-1}$:
$$\hat\theta_{GMM} = \arg\min_\theta \left( \frac{1}{T}\sum_{t=1}^T g(\theta, w_t) \right)' \hat\Phi^{-1} \left( \frac{1}{T}\sum_{t=1}^T g(\theta, w_t) \right)$$

The two-step estimator $\hat\theta_{GMM}$ is asymptotically efficient in the GMM class.
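The two-step procedure can be sketched on an assumed overidentified example (three moment conditions, two parameters, so the weighting matrix matters): estimating $(\mu, \sigma^2)$ of an i.i.d. sample from $E[y - \mu] = 0$, $E[(y-\mu)^2 - \sigma^2] = 0$ and $E[(y-\mu)^3] = 0$.

```python
import numpy as np
from scipy.optimize import minimize

# Two-step GMM sketch (assumed example): estimate (mu, sigma2) from three
# moment conditions -- overidentified, so the weighting matrix is relevant.
rng = np.random.default_rng(10)
y = rng.normal(1.0, 2.0, size=20000)
T = y.size

def g_t(theta):
    """T x 3 matrix of per-observation moment conditions."""
    mu, s2 = theta
    d = y - mu
    return np.column_stack([d, d ** 2 - s2, d ** 3])

def objective(theta, W):
    gbar = g_t(theta).mean(axis=0)
    return gbar @ W @ gbar

# Step 1: W = I gives a consistent first-step estimator
step1 = minimize(objective, x0=np.array([0.5, 1.0]), args=(np.eye(3),),
                 method="Nelder-Mead")
# Step 2: re-weight with the inverse of the estimated Var[g(theta, w_t)]
G = g_t(step1.x)
Phi = (G.T @ G) / T
step2 = minimize(objective, x0=step1.x, args=(np.linalg.inv(Phi),),
                 method="Nelder-Mead")
mu_gmm, sigma2_gmm = step2.x
```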
GMM, OLS and ML

Many estimation methods can be seen as special cases of GMM. Examples:

1) OLS is a GMM estimator with orthogonality condition
$$E[x_i \epsilon_i] = E\left[ x_i (y_i - x_i'\theta) \right] = 0$$

2) ML is a GMM estimator based on the score:
$$E\left[ \frac{\partial \log f(y_t; \theta)}{\partial\theta} \right] = E[g(\theta, w_t)] = 0$$