Asymptotic Variance of Test Statistics in ML and QML Frameworks


Anil K. Bera, Osman Doğan, Süleyman Taşpınar

August 5, 2017

Abstract

In this study, we consider test statistics that can be written as the sample averages of data and derive their limiting distribution under the maximum likelihood (ML) and the quasi-maximum likelihood (QML) frameworks. We first generalize the asymptotic variance formula suggested in Pierce (1982) in the ML framework and illustrate its applications through some well-known test statistics: (i) the skewness statistic, (ii) the kurtosis statistic, (iii) Cox's statistic, (iv) the information matrix test statistic, and (v) Durbin's h-statistic. We next provide a similar result in the QML setting and illustrate its applications with two examples. The illustrations show the simplicity and effectiveness of our results for the asymptotic variance of test statistics, and we therefore recommend them for practical applications.

JEL-Classification: C3, C2, C3.

Keywords: Variance, Asymptotic variance, MLE, QMLE, Inference, Test statistics, Skewness statistic, Kurtosis statistic, Cox's statistic, Information matrix test, Durbin's h-statistic.

An earlier version of this paper was presented in a seminar at the Stat-Math Unit (SMU) of the Indian Statistical Institute (ISI), Kolkata, in July 2017. We are grateful to the seminar participants for constructive comments and suggestions. Any remaining shortcomings and errors are, of course, ours.

Anil K. Bera: Economics Program, University of Illinois, Illinois, United States, abera@illinois.edu.
Osman Doğan: Economics Program, University of Illinois, Illinois, United States, odogan@illinois.edu.
Süleyman Taşpınar: Economics Program, Queens College, The City University of New York, United States, staspinar@qc.cuny.edu.

1 Introduction

Taking account of nuisance parameters, particularly in the context of hypothesis testing, is an age-old problem in statistics. Attempts to solve this problem go back as far as Student (1908), who explored and solved a finite sample problem relating to testing for the mean with the variance as the nuisance parameter. The problem also persists asymptotically, as investigated in a series of papers by Neyman (1935, 1957, 1959). One of the outcomes of Neyman's research effort was his celebrated $C(\alpha)$ test, in which the nuisance parameters are replaced with $\sqrt{n}$-consistent estimators without changing the underlying asymptotic distribution of the test statistic. Similarly, Pierce (1982) considered a simpler but immensely important practical problem: the effect of replacing the unknown parameter $\theta_0$ by its efficient estimator $\hat\theta$ in a test statistic $T(y, \theta_0)$, where $E[T(y, \theta_0)]$ is free of $\theta_0$. He provided an attractive practical solution, along with the condition under which no adjustment to the variance is necessary for the estimation of $\theta_0$.

Quite coincidentally, many testing problems in econometrics, both old and recent, fall under the Pierce (1982) framework. However, we see hardly any reference to Pierce (1982) in the econometric literature. Some exceptions are Newey and McFadden (1994), Bera and Zuo (1996), Tse (2002), Prokhorov and Schmidt (2009), Andreou and Werker 200 and Gorodnichenko et al. It appears that econometricians have tackled the problems of nuisance parameters in testing case by case, through extensive derivations and complex algebra. For instance, take the case of the Cox (1961, 1962) statistic for separate families of hypotheses. White (1982b) provided, for the first time, a rigorous derivation of the asymptotic distribution of the Cox statistic. We show that the essential part of the results in White (1982b) can be easily obtained using Pierce (1982). The same can be said about, for example, the Durbin (1970) h-test, the White (1982a) information matrix test, and the Jarque and Bera (1987) skewness and kurtosis statistics. Given such ubiquitous occurrence of the same issue, it seems very important to bring Pierce's work to the forefront of econometrics.

As we mentioned, Pierce (1982) replaced the unknown parameter $\theta_0$ by an efficient estimator, such as the maximum likelihood estimator (MLE). When the true data generating process (DGP) is unknown, the best we can hope for is a quasi-MLE (QMLE) obtained after assuming a parametric distribution $F(y, \theta_0)$. White (1982a) suggested a sandwich variance formula for the QMLE; the same formula has also been attributed to Eicker (1963, 1967) and Huber (1967). However, its history goes back to Koopmans et al. (1950), who derived the sandwich formula as part of the large sample properties of the full-information MLE (FMLE) of the parameters of a simultaneous equation system. It is fairly safe to say that almost all likelihood based estimators are QMLEs, as the true DGP is rarely known. Thus, a natural progression is to generalize Pierce (1982) to the case where $\theta_0$ in $T(y, \theta_0)$ is replaced by the QMLE rather than the MLE. In this study, we provide such a result with practical illustrations. Similar results are also considered in Newey (1985a,b) and Tauchen (1985). These studies do not focus on a general formula for the asymptotic variance of test statistics; instead, they show how to conduct certain tests through auxiliary regressions. Our focus, in contrast, is on the general variance formula and its practical implications for well-known test statistics.

The rest of this paper is organized as follows. In Section 2, we revisit Pierce (1982) and

generalize his result for a certain type of test statistic in the ML framework. In the following subsection, Section 2.1, we illustrate the practical aspect of our results by providing examples on well-known test statistics: the skewness statistic, the kurtosis statistic, Cox's statistic, the information matrix statistic, and Durbin's h-statistic. In Section 3, we generalize the Pierce (1982) result to the QMLE setting considered by White (1982a). In the following subsection, Section 3.1, we apply our result to the skewness and kurtosis statistics under distributional misspecification. We conclude in Section 4. Some technical results are relegated to an appendix.

2 The Asymptotic Variance of Statistics Based on MLE

In this section, we first state the assumptions needed to characterize the true DGP and define the MLE in a general setting, following White (1982a). We next define the test statistic and state the regularity conditions that are required for its limiting distribution. Our main interest is to generalize Pierce (1982) and to establish the limiting distribution of a test statistic formulated with the MLE.

Assumption 1. The random variables $y_i$, for $i = 1, \dots, n$, are i.i.d. with common joint distribution function $F(y, \theta)$, where $\theta$ is a $p \times 1$ parameter vector, on a measurable space $(A, \mathcal{F})$, where the $y_i$'s assume values in $A$ and $\mathcal{F}$ is the relevant sigma algebra. Let $\nu$ be a measure defined on $(A, \mathcal{F})$ such that it dominates $F$. Given $F$, there exists a measurable non-negative Radon-Nikodym density $f(y, \theta) = dF(y, \theta)/d\nu$.

Assumption 2. (i) Let $\Theta$ be a compact subset of $\mathbb{R}^p$. The density function $f(y, \theta)$ is measurable with respect to $y \in A$ and continuous with respect to all $\theta \in \Theta$. (ii) $|\log f(y, \theta)| \le m(y)$ for all $\theta \in \Theta$ and all $y \in A$, where $m$ is an integrable function with respect to $F$. (iii) $E[\log f(y, \theta)]$ has a unique maximum at $\theta_0$.

Under Assumptions 1 and 2, the log-likelihood function generated by $F(y, \theta)$ is defined by

\[ l(y, \theta) = \sum_{i=1}^{n} \log f(y_i, \theta). \tag{2.1} \]

Then, the MLE is defined by $\hat\theta = \arg\max_{\theta \in \Theta} l(y, \theta)$. Assumptions 1 and 2 ensure the existence of the MLE. Also under these two assumptions, it can be shown that $\hat\theta = \theta_0 + o_p(1)$. To establish asymptotic normality for the MLE, we need the following additional assumptions.

Assumption 3. (i) The first order derivatives $\partial \log f(y, \theta)/\partial\theta_j$, for $j = 1, \dots, p$, are $\mathcal{F}$-measurable for $\theta \in \Theta$, and continuously differentiable functions of $\theta$ for all $y \in A$. (ii) There are integrable functions with respect to $F$, for all $y \in A$ and $\theta \in \Theta$, that dominate $|\partial^2 \log f(y, \theta)/\partial\theta_i \partial\theta_j|$ and $|\partial \log f(y, \theta)/\partial\theta_i \cdot \partial \log f(y, \theta)/\partial\theta_j|$ for $i, j = 1, \dots, p$.

Assumption 4. The interior of $\Theta$ contains $\theta_0$.

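Using Assumption 3, we can define the following two information matrices:

\[ \hat A(\theta) = \left\{ -\frac{1}{n}\sum_{i=1}^{n} \frac{\partial^2 \log f(y_i, \theta)}{\partial\theta_i\,\partial\theta_j} \right\}, \qquad A(\theta) = \left\{ -E\,\frac{\partial^2 \log f(y, \theta)}{\partial\theta_i\,\partial\theta_j} \right\}. \tag{2.2} \]

The asymptotic distribution of $\hat\theta$ is based on the first order Taylor expansion of $\frac{1}{\sqrt n}\frac{\partial l(y, \hat\theta)}{\partial\theta}$ around $\theta_0$, which yields

\[ \sqrt{n}\,(\hat\theta - \theta_0) = A^{-1}(\theta_0)\,\frac{1}{\sqrt n}\frac{\partial l(y, \theta_0)}{\partial\theta} + o_p(1). \tag{2.3} \]

Our stated assumptions ensure that $\frac{1}{\sqrt n}\frac{\partial l(y, \theta_0)}{\partial\theta} \xrightarrow{d} N[0, A(\theta_0)]$ and $\hat A(\hat\theta) = A(\theta_0) + o_p(1)$. Thus, under the assumption that $A(\theta_0)$ is non-singular, we have $\sqrt{n}\,(\hat\theta - \theta_0) \xrightarrow{d} N[0, A^{-1}(\theta_0)]$.

Following Huber (1967), we consider test statistics that can be written as sample averages of the data. Specifically, we consider

\[ T(\hat\theta) = \frac{1}{n}\sum_{i=1}^{n} \rho(y_i, \hat\theta), \tag{2.4} \]

where $\rho(y_i, \theta)$ is a real valued function defined on $A \times \Theta$, satisfying the following assumption.

Assumption 5. (i) $\rho(y, \theta)$ is $\mathcal{F}$-measurable for all $\theta \in \Theta$ and continuously differentiable for all $\theta \in \Theta$. (ii) $E\rho(y, \theta_0) = \int \rho(y, \theta_0)\,dF(y, \theta_0) = \rho$ is independent of $\theta_0$. (iii) The first order derivatives $\partial\rho(y, \theta)/\partial\theta_j$, for $j = 1, \dots, p$, are $\mathcal{F}$-measurable for all $\theta \in \Theta$, and are dominated by an integrable function with respect to $F$.

Assumption 5 characterizes the statistics that we consider in this paper. In particular, 5(ii) plays a role in simplifying the variance formula, while 5(iii) allows the interchange of the order of differentiation and integration in our analysis and ensures that the asymptotic variance of the test statistic can be consistently estimated.

First, we present a short review of Pierce (1982). Pierce (1982) starts with the assumption that $\sqrt{n}\,(T(\theta_0) - \rho)$ and $\sqrt{n}\,(\hat\theta - \theta_0)$ have a joint asymptotic multivariate normal distribution, and derives a general asymptotic variance formula for $\sqrt{n}\,T(\hat\theta)$. Pierce (1982) sets $\rho = 0$ in Assumption 5, and starts with the following joint asymptotic multivariate normal distribution:

\[ \begin{bmatrix} \sqrt{n}\,(\hat\theta - \theta_0) \\ \sqrt{n}\,T(\theta_0) \end{bmatrix} \xrightarrow{d} N\!\left( 0, \begin{bmatrix} A^{-1}(\theta_0) & M'(\theta_0) \\ M(\theta_0) & C(\theta_0) \end{bmatrix} \right), \tag{2.5} \]

where $M(\theta_0)$ is the asymptotic covariance between $\sqrt{n}\,T(\theta_0)$ and $\sqrt{n}\,(\hat\theta - \theta_0)$. A first order Taylor expansion of $T(\hat\theta)$ around $\theta_0$ gives

\[ \sqrt{n}\,T(\hat\theta) = \sqrt{n}\,T(\theta_0) + D(\theta_0)\,\sqrt{n}\,(\hat\theta - \theta_0) + o_p(1), \tag{2.6} \]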

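where $D(\theta_0) = E\,\dfrac{\partial\rho(y, \theta_0)}{\partial\theta'}$. Thus, we have

\[ \mathrm{Var}\big(\sqrt{n}\,T(\hat\theta)\big) = C(\theta_0) + D(\theta_0)A^{-1}(\theta_0)D'(\theta_0) + M(\theta_0)D'(\theta_0) + D(\theta_0)M'(\theta_0). \tag{2.7} \]

With the assumption that $E\rho(y, \theta_0)$ is independent of $\theta_0$, Pierce (1982) demonstrates that

\[ \mathrm{Cov}\!\left( \sqrt{n}\,T(\theta_0),\ \frac{1}{\sqrt n}\frac{\partial l(y, \theta_0)}{\partial\theta} \right) = -D(\theta_0). \tag{2.8} \]

Since, from (2.3), $A^{-1}(\theta_0)\frac{1}{\sqrt n}\frac{\partial l(y, \theta_0)}{\partial\theta}$ is asymptotically equivalent to $\sqrt{n}\,(\hat\theta - \theta_0)$,

\[ -D(\theta_0)A^{-1}(\theta_0) = \mathrm{Cov}\!\left( \sqrt{n}\,T(\theta_0),\ A^{-1}(\theta_0)\frac{1}{\sqrt n}\frac{\partial l(y, \theta_0)}{\partial\theta} \right) \approx \mathrm{Cov}\!\left( \sqrt{n}\,T(\theta_0),\ \sqrt{n}\,(\hat\theta - \theta_0) \right) = M(\theta_0), \tag{2.9} \]

leading to

\[ M(\theta_0) \approx -D(\theta_0)A^{-1}(\theta_0). \tag{2.10} \]

Using (2.10) in (2.7), we have the asymptotic variance formula stated in Pierce (1982):

\[ \mathrm{Var}\big(\sqrt{n}\,T(\hat\theta)\big) = C(\theta_0) - D(\theta_0)A^{-1}(\theta_0)D'(\theta_0). \tag{2.11} \]

Following the asymptotic arguments given in Huber (1967) and White (1980), we generalize Pierce (1982) by directly establishing the joint asymptotic distribution of $\sqrt{n}\,(T(\hat\theta) - \rho)$ and $\sqrt{n}\,(\hat\theta - \theta_0)$, as stated in the following proposition.

Proposition 1. Under Assumptions 1-5, the joint limiting distribution of $\sqrt{n}\,(T(\hat\theta) - \rho)$ and $\sqrt{n}\,(\hat\theta - \theta_0)$ is given by

\[ \begin{bmatrix} \sqrt{n}\,(\hat\theta - \theta_0) \\ \sqrt{n}\,(T(\hat\theta) - \rho) \end{bmatrix} \xrightarrow{d} N\!\left( 0, \begin{bmatrix} A^{-1}(\theta_0) & 0 \\ 0 & C(\theta_0) - D(\theta_0)A^{-1}(\theta_0)D'(\theta_0) \end{bmatrix} \right), \tag{2.12} \]

where

\[ C(\theta_0) = E\big(\rho(y, \theta_0) - \rho\big)\big(\rho(y, \theta_0) - \rho\big)' \quad\text{and}\quad D(\theta_0) = E\,\frac{\partial\rho(y, \theta_0)}{\partial\theta'}. \tag{2.13} \]

Proof. See Appendix A.1.

Here we should note two important points. First, Proposition 1 indicates that the asymptotic covariance between $\sqrt{n}\,(T(\hat\theta) - \rho)$ and $\sqrt{n}\,(\hat\theta - \theta_0)$ is zero. This result highlights the fact that the MLE $\hat\theta$ is an efficient estimator. For details, see Rao (1973, Section 5a.2). Second, the simple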

asymptotic covariance matrix of $\sqrt{n}\,(T(\hat\theta) - \rho)$ is the same as that in Pierce (1982). As shown in the proof, this simplified asymptotic variance formula can be derived in two alternative ways. Consider the following covariance between the score and the test indicator:

\[ P(\theta_0) = E\left[ \frac{\partial \log f(y, \theta_0)}{\partial\theta}\,\big(\rho(y, \theta_0) - \rho\big)' \right]. \]

In the first approach, which we use in our proof in Appendix A.1, the assumption that $E\rho(y, \theta_0) = \rho$ is free of $\theta_0$ leads to the result $P'(\theta_0) = -D(\theta_0)$, which can be considered a type of information matrix equality (Neyman 1959, p.27, Equation 4). This equality directly implies the asymptotic covariance matrix of $\sqrt{n}\,(T(\hat\theta) - \rho)$ in (2.12). A second approach can be based on the efficiency argument for the MLE $\hat\theta$. In A.6, we have the asymptotic covariance matrix between $\sqrt{n}\,(\hat\theta - \theta_0)$ and $\sqrt{n}\,(T(\hat\theta) - \rho)$ as $V(\theta_0) = D(\theta_0)A^{-1}(\theta_0) + P'(\theta_0)A^{-1}(\theta_0)$, which is zero, and that implies the equality $P'(\theta_0) = -D(\theta_0)$.

Finally, under our assumptions, consistent estimators for the components of the asymptotic variance-covariance matrix in (2.12) can be constructed. We may use the plug-in method when $C(\theta_0)$ and $D(\theta_0)$ in (2.13) have closed forms. Moreover, the i.i.d. property of the data ensures that they can be estimated by their corresponding sample counterparts, as in the following proposition.

Proposition 2. Consider the following sample moments:

\[ \hat D(\hat\theta) = \frac{1}{n}\sum_{i=1}^{n} \frac{\partial\rho(y_i, \hat\theta)}{\partial\theta'}, \tag{2.14} \]

\[ \hat C(\hat\theta) = \frac{1}{n}\sum_{i=1}^{n} \big(\rho(y_i, \hat\theta) - T(\hat\theta)\big)\big(\rho(y_i, \hat\theta) - T(\hat\theta)\big)'. \tag{2.15} \]

Then, under our assumptions, we have $\hat D(\hat\theta) = D(\theta_0) + o_p(1)$ and $\hat C(\hat\theta) = C(\theta_0) + o_p(1)$.

Proof. See Appendix A.2.

2.1 Illustrations

In this section, we provide examples that illustrate the practical applications of the general result stated in Proposition 1. We use the asymptotic variance formula developed in the previous section to formulate the asymptotic variance of the following well-known statistics: (i) the skewness statistic, (ii) the kurtosis statistic, (iii) Cox's statistic, (iv) the information matrix statistic, and (v) Durbin's h-statistic.
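As a rough numerical companion to Propositions 1 and 2, the sketch below computes the plug-in variance $\hat C - \hat D \hat A^{-1} \hat D'$ for a scalar statistic. It is not taken from the paper: the function names `pierce_variance` and `skewness_variance_normal_regression`, the simulated design, and the sample size are all illustrative assumptions. The second function specializes the inputs to the skewness statistic of a normal linear regression, which is the first illustration in this section.

```python
import numpy as np

def pierce_variance(rho, grad_rho, A_hat):
    """Plug-in estimate of C - D A^{-1} D' for a scalar statistic.

    rho      : (n,) values rho(y_i, theta_hat)
    grad_rho : (n, p) rows d rho(y_i, theta_hat) / d theta'
    A_hat    : (p, p) minus the average Hessian of the log-likelihood
    """
    rho = np.asarray(rho, dtype=float)
    D_hat = np.asarray(grad_rho, dtype=float).mean(axis=0)   # sample mean gradient
    C_hat = np.mean((rho - rho.mean()) ** 2)                 # centred at T(theta_hat)
    return C_hat - D_hat @ np.linalg.solve(A_hat, D_hat)

def skewness_variance_normal_regression(y, X):
    """Hypothetical helper: skewness statistic in y = X beta + eps, eps normal."""
    n, k = X.shape
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]   # MLE of beta is OLS
    e = y - X @ beta_hat
    s2 = np.mean(e ** 2)                              # MLE of sigma^2
    s = np.sqrt(s2)
    rho = e ** 3 / s ** 3
    # Gradient of rho with respect to (beta', sigma^2).
    grad_beta = -3.0 * (e ** 2)[:, None] * X / s ** 3
    grad_s2 = -1.5 * e ** 3 / s ** 5
    grad_rho = np.column_stack([grad_beta, grad_s2])
    # Minus the average Hessian at the MLE; the cross block is zero
    # because the OLS residuals are orthogonal to X.
    A_hat = np.zeros((k + 1, k + 1))
    A_hat[:k, :k] = X.T @ X / (n * s2)
    A_hat[k, k] = 1.0 / (2.0 * s2 ** 2)
    return pierce_variance(rho, grad_rho, A_hat)

rng = np.random.default_rng(0)
n = 200_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # design with a constant
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=1.5, size=n)
v_hat = skewness_variance_normal_regression(y, X)
print(v_hat)
```

With normal errors and a constant in the design, the estimate should settle near the limiting value derived for the skewness statistic in this section, although the sixth-moment term makes it noisy in moderate samples.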

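2.1.1 The Skewness Statistic

Consider the following data generating process:

\[ y_i = x_i'\beta_0 + \varepsilon_i, \tag{2.16} \]

where $x_i$ is the $k \times 1$ vector of exogenous variables and $\varepsilon_i$ is an i.i.d. normal random variable with mean zero and variance $\sigma_0^2$. Let $\theta_0 = (\beta_0', \sigma_0^2)'$ be the true parameter vector and $\theta = (\beta', \sigma^2)'$ be an arbitrary value of the parameter vector in the parameter space. The log-likelihood function of (2.16) is

\[ l(y, \theta) = -\frac{n}{2}\log 2\pi - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\varepsilon_i^2(\theta), \tag{2.17} \]

where $\varepsilon_i(\theta) = y_i - x_i'\beta$. The first and second order derivatives are

\[ \frac{\partial l(y, \theta)}{\partial\beta} = \frac{1}{\sigma^2}\sum_{i=1}^{n}\varepsilon_i(\theta)x_i, \qquad \frac{\partial l(y, \theta)}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}\sum_{i=1}^{n}\varepsilon_i^2(\theta), \tag{2.18} \]

\[ \frac{\partial^2 l(y, \theta)}{\partial\beta\,\partial\beta'} = -\frac{1}{\sigma^2}\sum_{i=1}^{n}x_i x_i', \qquad \frac{\partial^2 l(y, \theta)}{\partial\sigma^2\,\partial\sigma^2} = \frac{n}{2(\sigma^2)^2} - \frac{1}{(\sigma^2)^3}\sum_{i=1}^{n}\varepsilon_i^2(\theta), \]

\[ \frac{\partial^2 l(y, \theta)}{\partial\beta\,\partial\sigma^2} = -\frac{1}{(\sigma^2)^2}\sum_{i=1}^{n}\varepsilon_i(\theta)x_i, \qquad \frac{\partial^2 l(y, \theta)}{\partial\sigma^2\,\partial\beta'} = -\frac{1}{(\sigma^2)^2}\sum_{i=1}^{n}\varepsilon_i(\theta)x_i'. \]

Let $X = (x_1, \dots, x_n)'$ be the $n \times k$ matrix of exogenous variables. We assume that $Q_x = \lim_{n\to\infty}\frac{1}{n}X'X$ exists and is nonsingular. Then, it can be shown that

\[ A(\theta_0) = \begin{bmatrix} \frac{1}{\sigma_0^2}Q_x & 0_{k\times 1} \\ 0_{1\times k} & \frac{1}{2\sigma_0^4} \end{bmatrix}. \tag{2.19} \]

Let $\hat\theta = \arg\max_{\theta\in\Theta} l(y, \theta)$ be the ML estimator. Then, the skewness statistic can be expressed as

\[ T(\hat\theta) = \frac{1}{n}\sum_{i=1}^{n}\rho(y_i, \hat\theta), \quad\text{where}\quad \rho(y_i, \hat\theta) = \hat\varepsilon_i^3/\hat\sigma^3. \tag{2.20} \]

Define $\xi_i = \varepsilon_i/\sigma_0$ as the i.i.d. random variable with mean zero and unit variance. Then, the variance of $\sqrt{n}\,T(\theta_0)$ is

\[ C(\theta_0) = \mathrm{Var}(\xi_i^3) = E\big[\xi_i^3 - E\xi_i^3\big]^2 = \mu_6/\mu_2^3 = 15, \tag{2.21} \]

where $\mu_6 = E\varepsilon_i^6$ and $\mu_2 = E\varepsilon_i^2 = \sigma_0^2$. Simple calculations show that

\[ D(\theta_0) = \left( -\frac{3}{\sigma_0}M_x',\ 0 \right), \tag{2.22} \]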

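where $M_x = \lim_{n\to\infty}\frac{1}{n}X'l_n$ and $l_n$ is the $n \times 1$ vector of ones. Then, using Proposition 1, the asymptotic variance of the skewness statistic is

\[ \mathrm{Var}\big(\sqrt{n}\,T(\hat\theta)\big) = 15 - 9M_x'Q_x^{-1}M_x. \tag{2.23} \]

Remark 1. An estimate of $M_x'Q_x^{-1}M_x = \big(\lim_{n\to\infty}\frac{1}{n}l_n'X\big)\big(\lim_{n\to\infty}\frac{1}{n}X'X\big)^{-1}\big(\lim_{n\to\infty}\frac{1}{n}X'l_n\big)$ is $\frac{1}{n}l_n'X(X'X)^{-1}X'l_n$. Consider the regression model $l_n = X\delta_0 + u$, where $\delta_0$ is a $k \times 1$ vector of parameters and $u$ is an $n \times 1$ vector of disturbance terms. The OLS estimator is $\hat\delta = (X'X)^{-1}X'l_n$. When $X$ includes a constant term, the residuals sum to zero, so that

\[ l_n'\hat u = l_n'(l_n - X\hat\delta) = n - l_n'X(X'X)^{-1}X'l_n = 0 \quad\Longrightarrow\quad \frac{1}{n}l_n'X(X'X)^{-1}X'l_n = 1. \]

Hence, the estimate of $M_x'Q_x^{-1}M_x$ is simply 1.

Using Remark 1, the variance formula in (2.23) simplifies to

\[ \mathrm{Var}\big(\sqrt{n}\,T(\hat\theta)\big) = 15 - 9M_x'Q_x^{-1}M_x = 6. \]

2.1.2 The Kurtosis Statistic

In this section, we investigate the asymptotic variance of the kurtosis statistic under the data generating process described in Section 2.1.1. The kurtosis statistic can be expressed as

\[ T(\hat\theta) = \frac{1}{n}\sum_{i=1}^{n}\rho(y_i, \hat\theta), \quad\text{where}\quad \rho(y_i, \hat\theta) = \hat\varepsilon_i^4/\hat\sigma^4. \]

Note that $E\rho(y_i, \theta_0) = \mu_4/\sigma_0^4 = 3$, where $\mu_4 = E\varepsilon_i^4$. The variance of $\sqrt{n}\,T(\theta_0)$ is

\[ C(\theta_0) = \mathrm{Var}(\xi_i^4) = E\big[\xi_i^4 - E\xi_i^4\big]^2 = \frac{\mu_8}{\mu_2^4} - \left(\frac{\mu_4}{\mu_2^2}\right)^2 = 105 - 9 = 96, \]

where $\mu_8 = E\varepsilon_i^8$. Simple calculations show that

\[ D(\theta_0) = \left( 0_{1\times k},\ -\frac{2\mu_4}{\sigma_0^6} \right) = \left( 0_{1\times k},\ -\frac{6}{\sigma_0^2} \right). \]

Then, Proposition 1 implies the following asymptotic variance:

\[ \mathrm{Var}\big(\sqrt{n}\,T(\hat\theta)\big) = 96 - \left( 0_{1\times k},\ -\frac{6}{\mu_2} \right)\begin{bmatrix} \mu_2 Q_x^{-1} & 0_{k\times 1} \\ 0_{1\times k} & 2\mu_2^2 \end{bmatrix}\left( 0_{1\times k},\ -\frac{6}{\mu_2} \right)' = 96 - 72 = 24. \]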
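Remark 1 is easy to check numerically. The sketch below is an illustration rather than part of the paper: the helper name `ones_projection_share` and the simulated regressors are assumptions. It computes $\frac{1}{n}l_n'X(X'X)^{-1}X'l_n$ by regressing a vector of ones on $X$, once with and once without a constant column, and then plugs the result into the skewness and kurtosis variance formulas.

```python
import numpy as np

def ones_projection_share(X):
    """Return (1/n) * l' X (X'X)^{-1} X' l, where l is a vector of ones."""
    n = X.shape[0]
    ones = np.ones(n)
    # OLS of the ones vector on X; X @ delta_hat is the projection of l on X.
    delta_hat = np.linalg.lstsq(X, ones, rcond=None)[0]
    return float(ones @ (X @ delta_hat)) / n

rng = np.random.default_rng(1)
n = 500
Z = rng.normal(loc=1.0, size=(n, 2))            # hypothetical regressors

with_const = ones_projection_share(np.column_stack([np.ones(n), Z]))
without_const = ones_projection_share(Z)

# Variances implied by the formulas above when the share equals one.
skew_var = 15.0 - 9.0 * with_const              # 15 - 9 M'Q^{-1}M
kurt_var = 96.0 - (6.0 ** 2) * 2.0              # 96 - (6/s^2)^2 * 2 s^4

print(with_const, without_const, skew_var, kurt_var)
```

The share equals one only because the ones vector lies in the column space of $X$; without a constant column it is strictly smaller, and the simplification of (2.23) to the value 6 would no longer apply.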

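2.1.3 Cox's Statistic

In this section, we consider the derivation of the asymptotic variance formula for the Cox test statistic (Cox 1961, 1962). We assume that we have i.i.d. observations $y_1, \dots, y_n$, and we aim to test the null hypothesis $H_f$ that $y_i$ has a density $f(y, \theta)$ for some $\theta \in \Theta$ against the alternative that $y_i$ has a density function $h(y, \gamma)$ for some $\gamma \in \Gamma$, where $\Gamma$ is a compact parameter space. Assume that $\theta$ is a $k \times 1$ and $\gamma$ is a $p \times 1$ vector of parameters. Under $H_f$, let $\hat\theta$ be the MLE and $\hat\gamma$ be the QMLE. We assume that $\hat\theta = \theta_0 + o_p(1)$ and $\hat\gamma = \gamma_0 + o_p(1)$. Note that the consistency of the QMLE $\hat\gamma$ can be verified by adopting the regularity conditions listed in Section 3 for $h(y, \gamma)$. Let $\delta = (\theta', \gamma')'$ be the combined parameter vector of dimension $(k + p) \times 1$. Then, the Cox test statistic is given by

\[ T(\hat\delta) = \frac{1}{n}\sum_{i=1}^{n}\rho(y_i, \hat\delta), \tag{2.30} \]

where

\[ \rho(y_i, \delta) = \log\big[f(y_i, \theta)/h(y_i, \gamma)\big] - \int \log\big[f(y, \theta)/h(y, \gamma)\big] f(y, \theta)\,d\nu. \tag{2.31} \]

Note that under $H_f$, we have $E\rho(y, \delta_0) = 0$, where the expectation is taken with respect to $f(y, \theta_0)$, which implies that

\[ C(\delta_0) = E\rho^2(y, \delta_0) = \int \big[\log f(y, \theta_0)/h(y, \gamma_0)\big]^2 f(y, \theta_0)\,d\nu - \left[ \int \log\big[f(y, \theta_0)/h(y, \gamma_0)\big] f(y, \theta_0)\,d\nu \right]^2. \tag{2.32} \]

The gradient of the test statistic is given by

\[ D(\delta_0) = \left( E\,\frac{\partial\rho(y, \delta)}{\partial\theta'}\bigg|_{\delta_0},\ E\,\frac{\partial\rho(y, \delta)}{\partial\gamma'}\bigg|_{\delta_0} \right) = \big( \psi(\delta_0),\ \phi(\delta_0) \big), \tag{2.33} \]

where $\psi(\delta_0) = E\,\frac{\partial\rho(y, \delta)}{\partial\theta'}\big|_{\delta_0}$ and $\phi(\delta_0) = E\,\frac{\partial\rho(y, \delta)}{\partial\gamma'}\big|_{\delta_0}$. In order to allow the exchange of the order of differentiation and integration, we adopt the following assumption.

Assumption 6. $\big|\log[f(y, \theta)/h(y, \gamma)]\,\partial f(y, \theta)/\partial\theta_i\big|$ and $\big|\partial\log[f(y, \theta)/h(y, \gamma)]/\partial\gamma_j \cdot f(y, \theta)\big|$, for $i = 1, \dots, k$ and $j = 1, \dots, p$, are dominated for all $\theta$ and $\gamma$ in $\Theta \times \Gamma$ by measurable functions that are integrable with respect to $\nu$.

Let $\nabla_\delta$ be the gradient operator with respect to $\delta$. Then

\[ \psi(\delta_0) = E\,\nabla_\theta \log\big[f(y_i, \theta_0)/h(y_i, \gamma_0)\big] - \nabla_\theta \int \log\big[f(y, \theta_0)/h(y, \gamma_0)\big] f(y, \theta_0)\,d\nu = -\int \nabla_\theta \log f(y, \theta_0)\,\log\big[f(y, \theta_0)/h(y, \gamma_0)\big] f(y, \theta_0)\,d\nu, \tag{2.34} \]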

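where we use the fact that $\nabla_\theta f(y, \theta) = \nabla_\theta \log f(y, \theta) \cdot f(y, \theta)$. For $\phi(\delta_0)$, we simply have

\[ \phi(\delta_0) = E\,\nabla_\gamma \log\big[f(y_i, \theta_0)/h(y_i, \gamma_0)\big] - \int \nabla_\gamma \log\big[f(y, \theta_0)/h(y, \gamma_0)\big] f(y, \theta_0)\,d\nu = 0_{1\times p}. \tag{2.35} \]

Hence, the gradient of the test statistic is simply given by

\[ D(\delta_0) = \big( \psi(\delta_0),\ 0_{1\times p} \big). \tag{2.36} \]

Under our regularity conditions in Sections 2 and 3 for $f(y, \theta)$ and $h(y, \gamma)$, it can be shown that

\[ \begin{bmatrix} \sqrt{n}\,(\hat\theta - \theta_0) \\ \sqrt{n}\,(\hat\gamma - \gamma_0) \end{bmatrix} \xrightarrow{d} N\!\left( 0_{(k+p)\times 1}, \begin{bmatrix} A^{-1}(\theta_0) & C_{\theta\gamma} \\ C_{\gamma\theta} & H(\gamma_0) \end{bmatrix} \right), \tag{2.37} \]

where $C_{\theta\gamma}$ is the asymptotic covariance between $\sqrt{n}\,(\hat\theta - \theta_0)$ and $\sqrt{n}\,(\hat\gamma - \gamma_0)$, and $H(\gamma_0)$ is the asymptotic covariance of $\sqrt{n}\,(\hat\gamma - \gamma_0)$. Then, using Proposition 1 with $A(\theta_0)$ positive definite as defined in (2.2), we get

\[ \mathrm{Var}\big(\sqrt{n}\,T(\hat\delta)\big) = C(\delta_0) - \big( \psi(\delta_0),\ 0_{1\times p} \big)\begin{bmatrix} A^{-1}(\theta_0) & C_{\theta\gamma} \\ C_{\gamma\theta} & H(\gamma_0) \end{bmatrix}\begin{bmatrix} \psi'(\delta_0) \\ 0_{p\times 1} \end{bmatrix} \]

\[ = \int \big[\log f(y, \theta_0)/h(y, \gamma_0)\big]^2 f(y, \theta_0)\,d\nu - \left[ \int \log\big[f(y, \theta_0)/h(y, \gamma_0)\big] f(y, \theta_0)\,d\nu \right]^2 - \psi(\delta_0)A^{-1}(\theta_0)\psi'(\delta_0). \tag{2.38} \]

The variance formula in (2.38) agrees with the one stated in White (1982b), once the sign convention for the information matrix is taken into account: with the Hessian-based matrix $-A(\theta_0)$, the last term enters with a plus sign. In order to get a consistent estimator of (2.38), we need the following condition (see Lemma 2).

Assumption 7. $[\log f(y, \theta)/h(y, \gamma)]^2$ is dominated by a measurable function that is integrable with respect to $\nu$ for all $\theta$ and $\gamma$ in $\Theta \times \Gamma$.

Assumptions 6 and 7 and Lemma 2 can be used to show that a consistent estimator of $\mathrm{Var}\big(\sqrt{n}\,T(\hat\delta)\big)$ is given by

\[ \widehat{\mathrm{Var}}\big(\sqrt{n}\,T(\hat\delta)\big) = \frac{1}{n}\sum_{i=1}^{n}\big[\log f(y_i, \hat\theta)/h(y_i, \hat\gamma)\big]^2 - \left[ \int \log\big[f(y, \hat\theta)/h(y, \hat\gamma)\big] f(y, \hat\theta)\,d\nu \right]^2 - \hat\psi(\hat\delta)\hat A^{-1}(\hat\theta)\hat\psi'(\hat\delta), \tag{2.39} \]

which is the estimator proposed by Cox (1962). To get another consistent estimator of $\mathrm{Var}\big(\sqrt{n}\,T(\hat\delta)\big)$, which avoids the evaluation of integrals, we follow White (1982b) and adopt the following assumption.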

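Assumption 8. $\big|\partial\log f(y, \theta)/\partial\theta_i \cdot \log[f(y, \theta)/h(y, \gamma)]\big|$, for $i = 1, \dots, k$, are dominated by measurable functions that are integrable with respect to $\nu$.

Assumptions 6-8 along with Lemma 2 ensure the following consistent estimator:

\[ \widehat{\mathrm{Var}}\big(\sqrt{n}\,T(\hat\delta)\big) = \frac{1}{n}\sum_{i=1}^{n}\big[\log f(y_i, \hat\theta)/h(y_i, \hat\gamma)\big]^2 - \left[ \frac{1}{n}\sum_{i=1}^{n}\log\big[f(y_i, \hat\theta)/h(y_i, \hat\gamma)\big] \right]^2 - \hat\psi(\hat\delta)\hat A^{-1}(\hat\theta)\hat\psi'(\hat\delta), \tag{2.40} \]

where

\[ \hat\psi(\hat\delta) = -\frac{1}{n}\sum_{i=1}^{n}\nabla_\theta \log f(y_i, \hat\theta)\,\log\big[f(y_i, \hat\theta)/h(y_i, \hat\gamma)\big]. \tag{2.41} \]

2.1.4 The Information Matrix Test

White (1982a) uses the information matrix equivalence to suggest a misspecification test, which is called the information matrix (IM) test. Consider the model specification stated in Section 2. The IM test statistic is given by

\[ T(\hat\theta) = \frac{1}{n}\sum_{i=1}^{n}\rho(y_i, \hat\theta), \tag{2.42} \]

where $\rho(y_i, \theta)$ is a $q \times 1$ vector containing the indicators of interest, with a typical element given by

\[ \rho_l(y_i, \theta) = \frac{\partial\log f(y_i, \theta)}{\partial\theta_i}\,\frac{\partial\log f(y_i, \theta)}{\partial\theta_j} + \frac{\partial^2\log f(y_i, \theta)}{\partial\theta_i\,\partial\theta_j}. \tag{2.43} \]

Note that under the null hypothesis of no misspecification, we have $E\rho(y, \theta_0) = 0$, where $\theta_0$ is the true parameter value. Then,

\[ C(\theta_0) = E\,\rho(y, \theta_0)\rho'(y, \theta_0). \tag{2.44} \]

We adopt the following assumption for the elements of $\rho(y, \theta)$.

Assumption 9. $\partial\rho_l(y, \theta)/\partial\theta_i$, for $l = 1, \dots, q$ and $i = 1, \dots, k$, exist and are continuous functions of $\theta$ for each $y$.

Under Assumption 9, the expected value of the gradient of the test statistic at the truth is

\[ D(\theta_0) = \big\{ E\,\nabla_\theta\,\rho_l(y, \theta_0) \big\}, \quad l = 1, \dots, q. \tag{2.45} \]

Under the null of no misspecification, we have $\sqrt{n}\,(\hat\theta - \theta_0) \xrightarrow{d} N[0, A^{-1}(\theta_0)]$, as shown in Section 2.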

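Then, applying Proposition 1, we obtain

\[ \mathrm{Var}\big(\sqrt{n}\,T(\hat\theta)\big) = C(\theta_0) - D(\theta_0)A^{-1}(\theta_0)D'(\theta_0). \tag{2.46} \]

Now we show that (2.46) is the same as the formula in White (1982a, p. 10). Written with the positive definite $A(\theta_0)$ of (2.2), that formula is

\[ V(\theta_0) = E\Big\{ \big[\rho(y, \theta_0) + D(\theta_0)A^{-1}(\theta_0)\nabla_\theta \log f(y, \theta_0)\big]\big[\rho(y, \theta_0) + D(\theta_0)A^{-1}(\theta_0)\nabla_\theta \log f(y, \theta_0)\big]' \Big\} \tag{2.47} \]

\[ = E\,\rho(y, \theta_0)\rho'(y, \theta_0) + D(\theta_0)A^{-1}(\theta_0)\,E\big[\nabla_\theta \log f(y, \theta_0)\,\rho'(y, \theta_0)\big] + E\big[\rho(y, \theta_0)\,\nabla_\theta' \log f(y, \theta_0)\big]A^{-1}(\theta_0)D'(\theta_0) \]

\[ \quad + D(\theta_0)A^{-1}(\theta_0)\,E\big[\nabla_\theta \log f(y, \theta_0)\,\nabla_\theta' \log f(y, \theta_0)\big]A^{-1}(\theta_0)D'(\theta_0) \]

\[ = C(\theta_0) - D(\theta_0)A^{-1}(\theta_0)D'(\theta_0), \tag{2.48} \]

where we used the facts that $E\big[\rho(y, \theta_0)\,\nabla_\theta' \log f(y, \theta_0)\big] = -D(\theta_0)$, as in (2.8), and that $E\big[\nabla_\theta \log f(y, \theta_0)\,\nabla_\theta' \log f(y, \theta_0)\big] = A(\theta_0)$ under correct specification. A consistent estimator of $V(\theta_0)$ is given by

\[ \hat V(\hat\theta) = \hat C(\hat\theta) - \hat D(\hat\theta)\hat A^{-1}(\hat\theta)\hat D'(\hat\theta), \tag{2.49} \]

where $\hat C(\hat\theta) = \frac{1}{n}\sum_{i=1}^{n}\rho(y_i, \hat\theta)\rho'(y_i, \hat\theta)$ and $\hat D(\hat\theta) = \big\{ \frac{1}{n}\sum_{i=1}^{n}\partial\rho_l(y_i, \hat\theta)/\partial\theta_i \big\}$ for $l = 1, \dots, q$ and $i = 1, \dots, k$.

2.1.5 Durbin's h-statistic

In this section, we consider the h-statistic suggested by Durbin (1970) for testing the presence of an autoregressive process in the disturbance terms of a linear regression model that includes lagged dependent variables. The purpose of this illustration is to show that the result in Proposition 1 is general enough to be applicable to time series models. Consider the following regression model:

\[ y_t = \beta_1 y_{t-1} + \dots + \beta_r y_{t-r} + \beta_{r+1}x_{1t} + \dots + \beta_{r+s}x_{st} + u_t, \quad t = 0, 1, \dots, n, \tag{2.50} \]

\[ u_t = \alpha u_{t-1} + \varepsilon_t, \quad t = 1, \dots, n, \tag{2.51} \]

where $|\alpha| < 1$ is the autoregressive parameter, and the $\varepsilon_t$ are i.i.d. normal random variables with mean zero and variance $\sigma_0^2$. As in Durbin (1970), we assume that $y_0, y_{-1}, \dots, y_{1-r}$ and $x_{10}, \dots, x_{s0}$ are known constants, and that $u_0$ is constant but unknown. We consider the null hypothesis $H_0: \alpha = 0$. Durbin's h-statistic is based on the following serial correlation statistic:

\[ \hat\alpha = \frac{\sum_{t=1}^{n}\hat u_t \hat u_{t-1}}{\sum_{t=1}^{n}\hat u_t^2}, \tag{2.52} \]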

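where $\hat u_t = y_t - \hat\beta_1 y_{t-1} - \dots - \hat\beta_r y_{t-r} - \hat\beta_{r+1}x_{1t} - \dots - \hat\beta_{r+s}x_{st}$ are the least squares residuals. Here, $\sqrt{n}\,\hat\alpha$ is the statistic of interest, and we use Proposition 1 to formulate its asymptotic variance. Let $\alpha$ denote the same statistic computed from the true disturbances $u_t$. Then, the asymptotic variance of $\sqrt{n}\,\alpha$ is given by

\[ \mathrm{Var}(\sqrt{n}\,\alpha) = 1 - \alpha^2. \tag{2.53} \]

Then, under $H_0: \alpha = 0$, we have $\mathrm{Var}(\sqrt{n}\,\alpha) = 1$. Now, consider

\[ \frac{\partial\alpha}{\partial\beta} = \frac{1 - \alpha^2}{n\sigma_0^2}\,\frac{\partial}{\partial\beta}\sum_{t=1}^{n}u_t u_{t-1}, \tag{2.54} \]

where

\[ \frac{\partial}{\partial\beta}\sum_{t=1}^{n}u_t u_{t-1} = -\begin{bmatrix} \sum_{t=1}^{n}\big(u_t y_{t-2} + u_{t-1}y_{t-1}\big) \\ \vdots \\ \sum_{t=1}^{n}\big(u_t y_{t-r-1} + u_{t-1}y_{t-r}\big) \\ \sum_{t=1}^{n}\big(u_t x_{1,t-1} + u_{t-1}x_{1t}\big) \\ \vdots \\ \sum_{t=1}^{n}\big(u_t x_{s,t-1} + u_{t-1}x_{st}\big) \end{bmatrix}. \tag{2.55} \]

Under $H_0$, only the term $E(u_{t-1}y_{t-1}) = \sigma_0^2$ is nonzero, so using (2.54) and (2.55) we have

\[ E\left( \frac{\partial\alpha}{\partial\beta} \right) = -\big(1, 0, \dots, 0\big)'. \tag{2.56} \]

So, using Proposition 1, we can write

\[ \mathrm{Var}(\sqrt{n}\,\hat\alpha) = 1 - n\,\mathrm{Var}(\hat\beta_1), \tag{2.57} \]

where $\mathrm{Var}(\hat\beta_1)$ is the asymptotic variance of the OLS estimator $\hat\beta_1$. Thus, Durbin's h-statistic is

\[ h = \frac{\sqrt{n}\,\hat\alpha}{\big(1 - n\,\mathrm{Var}(\hat\beta_1)\big)^{1/2}}, \tag{2.58} \]

which has an asymptotic standard normal distribution.

3 The Asymptotic Variance of Statistics Based on QMLE

In this section, we generalize the Pierce (1982) formula under distributional misspecification, i.e., under the QMLE setting as in White (1982a). We start with the assumption characterizing the true DGP.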

Assumption 10. The random variables $y_i$, for $i = 1, \dots, n$, are i.i.d. with common joint distribution function $G$ on a measurable space $(A, \mathcal{F})$, where the $y_i$'s assume values in $A$ and $\mathcal{F}$ is the relevant sigma algebra. Let $\nu$ be a measure defined on $(A, \mathcal{F})$ such that it dominates $G$. Given $G$, there exists a measurable non-negative Radon-Nikodym density $g = dG/d\nu$.

Since the true distribution function $G$ is rarely known, we choose to work with a parametric family of distribution functions $\mathbb{F} = \{F(y, \theta)\}$ on $(A, \mathcal{F})$, which may or may not contain $G$, where $\theta$ is a $p \times 1$ vector of parameters. The family $\mathbb{F}$ is correctly specified for $y$ if it contains $G$; otherwise it is misspecified. $F(y, \theta)$ satisfies the conditions of the following assumption.

Assumption 11. $F(y, \theta)$ has Radon-Nikodym density $f(y, \theta) = dF(y, \theta)/d\nu$. Let $\Theta$ be a compact subset of $\mathbb{R}^p$. The density function $f(y, \theta)$ is measurable with respect to $y \in A$ and continuous with respect to all $\theta \in \Theta$.

Under Assumptions 11 and 12, the quasi-log-likelihood function generated by $F(y, \theta)$ is defined by

\[ l(y, \theta) = \sum_{i=1}^{n}\log f(y_i, \theta). \tag{3.1} \]

Then, the QMLE is defined by $\hat\theta = \arg\max_{\theta\in\Theta} l(y, \theta)$. If $\mathbb{F}$ contains the true distribution function, that is, if $G(y) = F(y, \theta_0)$ for some $\theta_0 \in \Theta$, then the QMLE is just the MLE of $\theta_0$. If $\mathbb{F}$ does not include $G$, then the QMLE is an estimator of a parameter $\theta^*$ that minimizes the following Kullback-Leibler Information Criterion (KLIC):

\[ I(g : f, \theta) = E\big[\log\big(g(y_i)/f(y_i, \theta)\big)\big] = \int \log(g/f)\,g\,d\nu. \tag{3.2} \]

Throughout this section, the expectations are taken with respect to the true distribution function $G$. The KLIC measures the divergence of $f$ from $g$; hence the QMLE $\hat\theta$ of $\theta^*$ minimizes the discrepancy between $f$ and $g$. To ensure this interpretation of $\hat\theta$ as an estimator of $\theta^*$, we need the following assumption.

Assumption 12. (i) $E\log g(y)$ exists and $|\log f(y, \theta)| \le m(y)$ for all $\theta \in \Theta$ and all $y \in A$, where $m$ is an integrable function with respect to $G$. (ii) Identification condition: $I(g : f, \theta)$ has a unique minimum at the pseudo-true value $\theta^*$ in $\Theta$.

By Assumption 12, the KLIC is well-defined and $\theta^*$ is globally identifiable. If the matrix $A(\theta^*)$ (see equation (2.2)) is positive definite and if $\theta^*$ minimizes $I(g : f, \theta)$ on an open neighborhood $O \subset \Theta$, then $\theta^*$ is locally identifiable (White 1982a). This result indicates that if the sample analog $\hat A(\hat\theta)$ of $A(\theta^*)$ is singular or close to being singular, then we have an indication of an identification problem. Now we restate Assumption 3 in terms of the true distribution function $G$.

Assumption 13. (i) The first order derivatives $\partial\log f(y, \theta)/\partial\theta_j$, for $j = 1, \dots, p$, are $\mathcal{F}$-measurable for $\theta \in \Theta$, and continuously differentiable functions of $\theta$ for all $y \in A$. (ii) There are integrable functions with respect to $G$, for all $y \in A$ and $\theta \in \Theta$, that dominate $|\partial^2\log f(y, \theta)/\partial\theta_i\partial\theta_j|$ and $|\partial\log f(y, \theta)/\partial\theta_i \cdot \partial\log f(y, \theta)/\partial\theta_j|$ for $i, j = 1, \dots, p$.

Using Assumption 13, we can define the following matrices:

\[ \hat B(\theta) = \left\{ \frac{1}{n}\sum_{i=1}^{n}\frac{\partial\log f(y_i, \theta)}{\partial\theta_i}\,\frac{\partial\log f(y_i, \theta)}{\partial\theta_j} \right\}, \qquad B(\theta) = \left\{ E\left[ \frac{\partial\log f(y, \theta)}{\partial\theta_i}\,\frac{\partial\log f(y, \theta)}{\partial\theta_j} \right] \right\}. \tag{3.3} \]

Under our stated assumptions, it can be shown that $\hat\theta = \theta^* + o_p(1)$ (White 1982a). The asymptotic distribution of $\hat\theta$ is based on the first order Taylor expansion of $\frac{1}{\sqrt n}\frac{\partial l(y, \hat\theta)}{\partial\theta}$ around $\theta^*$, which can be written as

\[ \sqrt{n}\,(\hat\theta - \theta^*) = A^{-1}(\theta^*)\,\frac{1}{\sqrt n}\frac{\partial l(y, \theta^*)}{\partial\theta} + o_p(1). \tag{3.4} \]

Our stated assumptions ensure that $\frac{1}{\sqrt n}\frac{\partial l(y, \theta^*)}{\partial\theta} \xrightarrow{d} N[0, B(\theta^*)]$, $\hat A(\hat\theta) = A(\theta^*) + o_p(1)$ and $\hat B(\hat\theta) = B(\theta^*) + o_p(1)$. Thus, (3.4) implies that

\[ \sqrt{n}\,(\hat\theta - \theta^*) \xrightarrow{d} N\big[0,\ A^{-1}(\theta^*)B(\theta^*)A^{-1}(\theta^*)\big], \tag{3.5} \]

under the assumption that $A(\theta^*)$ is non-singular. If the model is correctly specified, that is, $g(y) = f(y, \theta_0)$ for some $\theta_0 \in \Theta$, then $I(g : f, \theta)$ attains its unique minimum at $\theta^* = \theta_0$, and thus the QMLE $\hat\theta$ is a consistent estimator of $\theta_0$.

The sandwich formula $A^{-1}(\theta^*)B(\theta^*)A^{-1}(\theta^*)$ is generally attributed to Eicker (1963, 1967) and Huber (1967). For example, Huber (1967, Corollary) derives the sandwich formula for the asymptotic distribution of a consistent estimator under some regularity conditions when the data are simply i.i.d., and then establishes the information matrix equivalence for a correctly specified model. However, long before Eicker (1963, 1967) and Huber (1967), Koopmans et al. (1950) derived the sandwich formula while studying the large-sample properties of the MLE of the parameters of a system of structural equations. Koopmans et al. (1950, p. 34), in their Assumption , assume joint normality of the disturbances. However, on p.35, they explicitly recognize the possibility that the assumed distribution function has no necessary connection with the distribution of the observations, and wrote:

"Nevertheless, we can use the [assumed distribution] function to define parameters by the same maximizing procedure. In these circumstances, we shall call it the quasi-likelihood function, and call the maximizing values of its parameters quasi-maximum-likelihood estimates."

Possibly, this is the first appearance of the terms quasi-likelihood function and quasi-maximum-likelihood estimates in the statistics and econometrics literature. In Section 3.3.0, Koopmans et al. (1950) provide asymptotic sampling variances and covariances of the maximum likelihood estimates, and on p.50, they derive the sandwich form of the covariance matrix as given in

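our equation (3.5).

In the QML framework, our test statistic defined in (2.4) satisfies the conditions of the following assumption, which is the counterpart of our earlier Assumption 5 under the true distribution function $G$.

Assumption 14. (i) $\rho(y, \theta)$ is $\mathcal{F}$-measurable for all $\theta \in \Theta$ and continuously differentiable for all $\theta \in \Theta$. (ii) $E\rho(y, \theta^*) = \int \rho(y, \theta^*)\,g(y)\,d\nu = \rho$ is independent of $\theta^*$. (iii) The first order derivatives $\partial\rho(y, \theta)/\partial\theta_j$, for $j = 1, \dots, p$, are $\mathcal{F}$-measurable for all $\theta \in \Theta$, and are dominated by an integrable function with respect to $G$.

Following the asymptotic argument used in Proposition 1, we establish the joint asymptotic distribution of $\sqrt{n}\,(\hat\theta - \theta^*)$ and $\sqrt{n}\,(T(\hat\theta) - \rho)$ in the QML framework, as stated in the following proposition.

Proposition 3. Under our regularity conditions, the joint limiting distribution of $\sqrt{n}\,(T(\hat\theta) - \rho)$ and $\sqrt{n}\,(\hat\theta - \theta^*)$ is given by

\[ \begin{bmatrix} \sqrt{n}\,(\hat\theta - \theta^*) \\ \sqrt{n}\,(T(\hat\theta) - \rho) \end{bmatrix} \xrightarrow{d} N\!\left( 0, \begin{bmatrix} A^{-1}(\theta^*)B(\theta^*)A^{-1}(\theta^*) & V'(\theta^*) \\ V(\theta^*) & S(\theta^*) \end{bmatrix} \right), \tag{3.6} \]

\[ V(\theta^*) = D(\theta^*)A^{-1}(\theta^*)B(\theta^*)A^{-1}(\theta^*) + P'(\theta^*)A^{-1}(\theta^*), \tag{3.7} \]

\[ S(\theta^*) = C(\theta^*) + D(\theta^*)A^{-1}(\theta^*)B(\theta^*)A^{-1}(\theta^*)D'(\theta^*) + P'(\theta^*)A^{-1}(\theta^*)D'(\theta^*) + D(\theta^*)A^{-1}(\theta^*)P(\theta^*), \tag{3.8} \]

where $D(\theta^*) = E\,\dfrac{\partial\rho(y, \theta^*)}{\partial\theta'}$, $P(\theta^*) = E\left[ \dfrac{\partial\log f(y, \theta^*)}{\partial\theta}\big(\rho(y, \theta^*) - \rho\big)' \right]$, and $C(\theta^*) = E\big(\rho(y, \theta^*) - \rho\big)\big(\rho(y, \theta^*) - \rho\big)'$.

Proof. See Appendix A.3.

Comparing Propositions 1 and 3, we first note that the off-diagonal block $V(\theta^*)$ is not a null matrix. Since there is no information matrix equality under distributional misspecification, $V(\theta^*)$ is not a null matrix even if $P'(\theta^*) = -D(\theta^*)$ in (3.7). Second, when there is correct specification, that is, $G(y) = F(y, \theta_0)$ for some $\theta_0 \in \Theta$, Proposition 3 reduces to Proposition 1. Thus, the simplification in the asymptotic variance $S(\theta^*)$ derived in Proposition 1 is possible. Under distributional misspecification, the argument given for the simplification in Proposition 1 is not applicable, since there is no parametric density function $f$ such that $g(y) = f(y, \theta_0)$. Finally, as shown in Proposition 2, the pertinent sample product moments or plug-in estimators can be formulated to estimate the asymptotic variance in (3.8) in the QML framework.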
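The variance $S(\theta^*)$ in (3.8) can be assembled mechanically once $C$, $D$, $P$, $A$ and $B$ are available. The sketch below is illustrative only and assumes a scalar statistic. The function names `qml_variance` and `skewness_pieces` are hypothetical, and the second one fills in the moment matrices for the skewness statistic in the simplest special case of an intercept-only model with $Q_x = M_x = 1$, using the expressions for $A$, $B$, $D$ and $P$ that follow from the first order conditions in Section 2.1.1.

```python
import numpy as np

def qml_variance(C, D, P, A, B):
    """S = C + D A^{-1} B A^{-1} D' + P' A^{-1} D' + D A^{-1} P (scalar statistic)."""
    D = np.asarray(D, dtype=float)
    P = np.asarray(P, dtype=float)
    A_inv_D = np.linalg.solve(A, D)   # A^{-1} D', valid because A is symmetric
    A_inv_P = np.linalg.solve(A, P)   # A^{-1} P
    return C + A_inv_D @ B @ A_inv_D + P @ A_inv_D + D @ A_inv_P

def skewness_pieces(s2, mu3, mu4, mu5, mu6):
    """Hypothetical helper: intercept-only model, theta = (beta, sigma^2)."""
    s = np.sqrt(s2)
    A = np.diag([1.0 / s2, 1.0 / (2.0 * s2 ** 2)])
    B = np.array([[1.0 / s2, mu3 / (2.0 * s2 ** 3)],
                  [mu3 / (2.0 * s2 ** 3), (mu4 - s2 ** 2) / (4.0 * s2 ** 4)]])
    D = np.array([-3.0 / s, -3.0 * mu3 / (2.0 * s ** 5)])
    P = np.array([mu4 / s ** 5, mu5 / (2.0 * s ** 7) - mu3 / (2.0 * s ** 5)])
    C = (mu6 - mu3 ** 2) / s2 ** 3
    return C, D, P, A, B

# Normal errors with unit variance: mu4 = 3, mu6 = 15, odd moments zero.
S_normal = qml_variance(*skewness_pieces(1.0, 0.0, 3.0, 0.0, 15.0))

# Uniform errors on [-sqrt(3), sqrt(3)]: unit variance, mu4 = 9/5, mu6 = 27/7.
S_uniform = qml_variance(*skewness_pieces(1.0, 0.0, 9.0 / 5.0, 0.0, 27.0 / 7.0))

print(S_normal, S_uniform)
```

Under normal moments the result collapses to the ML value of Proposition 1, while for the symmetric but non-normal uniform case it equals $\mu_6/\sigma_0^6 + 9 - 6\mu_4/\sigma_0^4$, which shows how much the variance can move under distributional misspecification.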

3.1 Illustrations

In this section, we illustrate the application of the formula stated in (3.8) within the context of two important examples: (i) the skewness statistic and (ii) the kurtosis statistic.

3.1.1 The Skewness Statistic

We consider the regression model stated in Section 2.1.1 under the assumption that $\varepsilon_i$ is an i.i.d. random variable with mean zero and variance $\sigma_0^2$. For notational simplicity, we denote the true parameter vector by $\theta_0 = (\beta_0', \sigma_0^2)'$ even when the model is misspecified. Using the first order conditions stated in Section 2.1.1, it can easily be shown that

\[ B(\theta_0) = \begin{bmatrix} \frac{1}{\sigma_0^2}Q_x & \frac{\mu_3}{2\sigma_0^6}M_x \\ \frac{\mu_3}{2\sigma_0^6}M_x' & \frac{\mu_4 - \sigma_0^4}{4\sigma_0^8} \end{bmatrix}, \tag{3.9} \]

where $\mu_3 = E\varepsilon_i^3$ and $\mu_4 = E\varepsilon_i^4$. Note that under the normal distribution assumption, we have $\mu_3 = 0$ and $\mu_4 = 3\sigma_0^4$, which lead to $B(\theta_0) = A(\theta_0)$. Using (2.19) and (3.9), it can be shown that

\[ A^{-1}(\theta_0)B(\theta_0)A^{-1}(\theta_0) = \begin{bmatrix} \sigma_0^2 Q_x^{-1} & \mu_3 Q_x^{-1}M_x \\ \mu_3 M_x'Q_x^{-1} & \mu_4 - \sigma_0^4 \end{bmatrix}. \tag{3.10} \]

Remark 2. Note that the diagonality of $A(\theta_0)$ indicates that inference about $\beta_0$ based on $A(\theta_0)$ will be correct even when there is distributional misspecification in the model. However, this is not the case for $\sigma_0^2$. Under distributional misspecification, the correct asymptotic variance of $\sqrt{n}\,(\hat\sigma^2 - \sigma_0^2)$ is $\mu_4 - \sigma_0^4$, not $2\sigma_0^4$. In addition, the result in (3.10) indicates that inference about both $\beta_0$ and $\sigma_0^2$ based on $B(\theta_0)$ will be affected and be incorrect under distributional misspecification.

Let $\hat\theta = (\hat\beta', \hat\sigma^2)'$ be the QMLE of $\theta_0$. We consider the following skewness statistic:

\[ T(\hat\theta) = \frac{1}{n}\sum_{i=1}^{n}\rho(y_i, \hat\theta), \quad\text{where}\quad \rho(y_i, \hat\theta) = \hat\varepsilon_i^3/\hat\sigma^3 - \mu_3\mu_2^{-3/2}. \tag{3.11} \]

Note that $E\rho(y_i, \theta_0) = E\varepsilon_i^3/\sigma_0^3 - \mu_3\mu_2^{-3/2} = 0$. The variance of the infeasible version $\sqrt{n}\,T(\theta_0)$ is

\[ C(\theta_0) = E\big(\varepsilon_i^3/\sigma_0^3 - \mu_3\mu_2^{-3/2}\big)^2 = \mu_2^{-3}\big(\mu_6 - \mu_3^2\big), \tag{3.12} \]

where $\mu_6 = E\varepsilon_i^6$. Simple calculations show that

\[ D(\theta_0) = \left( -\frac{3}{\sigma_0}M_x',\ -\frac{3\mu_3}{2\sigma_0^5} \right). \tag{3.13} \]

Next, we will find $P(\theta_0)$. Using the first order conditions stated in Section 2.1.1, we can easily

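calculate that

\[ P(\theta_0) = \begin{bmatrix} \frac{\mu_4}{\sigma_0^5}M_x \\ -\frac{\mu_3}{2\sigma_0^5} + \frac{\mu_5}{2\sigma_0^7} \end{bmatrix}, \tag{3.14} \]

where $\mu_5 = E\varepsilon_i^5$.

Remark 3. Note that, unlike in Section 2.1.1, here $D(\theta_0) \neq -P'(\theta_0)$ in general. Under the normality of the disturbance term, i.e., when there is no distributional misspecification, we have $\mu_4 = 3\sigma_0^4$ and $\mu_3 = \mu_5 = 0$, and we obtain the IM-type equality of Section 2: $D(\theta_0) = -P'(\theta_0)$.

Using Proposition 3, we obtain

\[ \mathrm{Var}\big(\sqrt{n}\,T(\hat\theta)\big) = \frac{\mu_6}{\sigma_0^6} - \frac{\mu_3^2}{\sigma_0^6} + 9M_x'Q_x^{-1}M_x + \frac{9\mu_3^2}{\sigma_0^6}M_x'Q_x^{-1}M_x + \frac{9\mu_3^2}{4\sigma_0^{10}}\big(\mu_4 - \sigma_0^4\big) - \frac{6\mu_4}{\sigma_0^4}M_x'Q_x^{-1}M_x - \frac{3\mu_3\mu_5}{\sigma_0^8} + \frac{3\mu_3^2}{\sigma_0^6}. \tag{3.15} \]

Using Remark 1, $\mathrm{Var}\big(\sqrt{n}\,T(\hat\theta)\big)$ in (3.15) simplifies to

\[ \mathrm{Var}\big(\sqrt{n}\,T(\hat\theta)\big) = \frac{\mu_6}{\sigma_0^6} - \frac{\mu_3^2}{\sigma_0^6} + 9 + \frac{9\mu_3^2}{\sigma_0^6} + \frac{9\mu_3^2}{4\sigma_0^{10}}\big(\mu_4 - \sigma_0^4\big) - \frac{6\mu_4}{\sigma_0^4} - \frac{3\mu_3\mu_5}{\sigma_0^8} + \frac{3\mu_3^2}{\sigma_0^6}. \tag{3.16} \]

Under the null hypothesis of no skewness, i.e., $\mu_3 = \mu_5 = 0$, the above expression further simplifies to

\[ \mathrm{Var}\big(\sqrt{n}\,T(\hat\theta)\big) = \frac{\mu_6}{\sigma_0^6} + 9 - \frac{6\mu_4}{\sigma_0^4}. \tag{3.17} \]

Remark 4. Under no misspecification, i.e., under the normality assumption for the disturbance term, we have $\mu_6 = 15\sigma_0^6$ and $\mu_4 = 3\sigma_0^4$. Then, from (3.17), we get $\mathrm{Var}\big(\sqrt{n}\,T(\hat\theta)\big) = 15 + 9 - 18 = 6$, as shown in Section 2.1.1.

3.1.2 The Kurtosis Statistic

In this section, we investigate the asymptotic variance of the kurtosis statistic when the disturbance terms of the regression model stated in Section 2.1.1 are simply i.i.d. with mean zero and variance $\sigma_0^2$. We consider the following kurtosis statistic:

\[ T(\hat\theta) = \frac{1}{n}\sum_{i=1}^{n}\rho(y_i, \hat\theta), \quad\text{where}\quad \rho(y_i, \hat\theta) = \hat\varepsilon_i^4/\hat\sigma^4 - \mu_4/\sigma_0^4, \tag{3.18} \]

where $\hat\theta = (\hat\beta', \hat\sigma^2)'$ is the QMLE of $\theta_0$. Then, the variance of $\sqrt{n}\,T(\theta_0)$ is

\[ C(\theta_0) = E\big(\varepsilon_i^4/\sigma_0^4 - \mu_4/\sigma_0^4\big)^2 = \frac{\mu_8}{\sigma_0^8} - \frac{\mu_4^2}{\sigma_0^8}. \tag{3.19} \]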

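Simple calculations show that

$$
D(\theta_0) = \left( -\frac{4\mu_3}{\sigma_0^4} M_x', \; -\frac{2\mu_4}{\sigma_0^6} \right). \tag{3.20}
$$

Using the first order conditions in Section 2.1.1, we obtain

$$
P(\theta_0) = \begin{pmatrix} \dfrac{\mu_5}{\sigma_0^6} M_x \\[2mm] -\dfrac{\mu_4}{2\sigma_0^6} + \dfrac{\mu_6}{2\sigma_0^8} \end{pmatrix}. \tag{3.21}
$$

Note that $P(\theta_0) \neq -D'(\theta_0)$, in general, and only under normality, i.e., when there is no distributional misspecification, do we get the IM-type equality of Section 2. An application of Proposition 3 gives

$$
\begin{aligned}
\mathrm{Var}\big(\sqrt{n}\,T(\hat\theta)\big) ={}& \sigma_0^{-8}\left(\mu_8 - \mu_4^2\right) + \frac{16\mu_3^2}{\sigma_0^6} M_x' Q_x^{-1} M_x + \frac{16\mu_4\mu_3^2}{\sigma_0^{10}} M_x' Q_x^{-1} M_x + \frac{4\mu_4^2}{\sigma_0^{12}}\left(\mu_4 - \sigma_0^4\right) \\
& - \frac{8\mu_3\mu_5}{\sigma_0^8} M_x' Q_x^{-1} M_x + \frac{4\mu_4^2}{\sigma_0^8} - \frac{4\mu_4\mu_6}{\sigma_0^{10}}.
\end{aligned} \tag{3.22}
$$

Then, by Remark 1, we obtain

$$
\mathrm{Var}\big(\sqrt{n}\,T(\hat\theta)\big) = \sigma_0^{-8}\left(\mu_8 - \mu_4^2\right) + \frac{16\mu_3^2}{\sigma_0^6} + \frac{16\mu_4\mu_3^2}{\sigma_0^{10}} + \frac{4\mu_4^2}{\sigma_0^{12}}\left(\mu_4 - \sigma_0^4\right) - \frac{8\mu_3\mu_5}{\sigma_0^8} + \frac{4\mu_4^2}{\sigma_0^8} - \frac{4\mu_4\mu_6}{\sigma_0^{10}}. \tag{3.23}
$$

Under the null hypothesis of no excess kurtosis, i.e., $\mu_4 = 3\sigma_0^4$, $\mu_6 = 15\sigma_0^6$ and $\mu_8 = 105\sigma_0^8$, the above expression simplifies to

$$
\mathrm{Var}\big(\sqrt{n}\,T(\hat\theta)\big) = 24 + \frac{64\mu_3^2}{\sigma_0^6} - \frac{8\mu_3\mu_5}{\sigma_0^8}. \tag{3.24}
$$

Remark 5. Under no misspecification, i.e., under the normality assumption for the disturbance term, we have $\mu_3 = \mu_5 = 0$. Then, from (3.24), we get $\mathrm{Var}\big(\sqrt{n}\,T(\hat\theta)\big) = 24$, as shown in Section 2.

4 Conclusion

In this study, we provide formulas for the asymptotic variance of test statistics that can be written as sample averages of the data. We first generalize the Pierce formula in the ML setting and illustrate its practical application within the context of some well-known test statistics. We next derive a similar formula in the QML setting for the same type of test statistics and provide two illustrations. We show that the asymptotic covariance between the MLE and the test statistic equals the expectation of the gradient of the test statistic. This information matrix type equality allows us to simplify the asymptotic variance formula in the ML setting. Since no such equality holds in the QML setting, the asymptotic variance formula cannot be simplified there. Our examples clearly indicate the usefulness of asymptotic variance formulas in hypothesis testing.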
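As a rough numerical illustration of the results summarized above, the sketch below simulates the residual-based skewness statistic of Section 3.1.1 under correctly specified normal errors and reports the Monte Carlo variance of $\sqrt{n}\,T(\hat\theta)$. Everything about the design is an assumption made for this example and is not taken from the paper: the intercept plus one standard normal regressor, the coefficient values, the sample size, the number of replications, and the function name `simulate_skewness_variance`.

```python
import numpy as np


def simulate_skewness_variance(n=400, reps=3000, seed=0):
    """Monte Carlo variance of sqrt(n) * T(theta_hat) for the skewness statistic.

    T(theta_hat) is the sample mean of residual^3 / sigma_hat^3, where the
    residuals come from OLS (the Gaussian QMLE of beta) and sigma_hat^2 is the
    mean squared residual (the Gaussian QMLE of sigma^2). Errors are drawn
    from N(0, 1), so mu_3 = 0 and no centering term is needed.
    """
    rng = np.random.default_rng(seed)
    # Assumed design: intercept and one regressor, held fixed across replications.
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    beta = np.array([1.0, 2.0])

    eps = rng.normal(size=(n, reps))            # one column per replication
    Y = (X @ beta)[:, None] + eps

    beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)   # shape (2, reps)
    resid = Y - X @ beta_hat

    sigma2_hat = np.mean(resid**2, axis=0)
    T = np.mean(resid**3, axis=0) / sigma2_hat**1.5
    return float(np.var(np.sqrt(n) * T))


if __name__ == "__main__":
    print(simulate_skewness_variance())
```

With normal errors the returned value should lie close to 6, up to simulation noise and a small finite-sample bias. Replacing the error draw with a heavier-tailed symmetric law would be one way to explore how the variance in (3.17) departs from 6, although larger samples are then likely to be needed for the asymptotic approximation to be accurate.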

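Appendix

A Some Useful Lemmas

Lemma 1. Let $g(x, \theta)$ be a continuous function of $\theta$ for each $x$ and a measurable function of $x$ for each $\theta$ on $X \times \Theta$, where $X$ is a Euclidean space and $\Theta$ is a compact subset of a Euclidean space. Assume that (i) $|g(x, \theta)| \leq h(x)$ for all $x$ and $\theta$, where $h$ is integrable with respect to a probability distribution function $F$ on $X$, and (ii) $x_1, x_2, \ldots, x_n$ is a random sample from $F$. Then:

$$
\frac{1}{n}\sum_{i=1}^{n} g(x_i, \theta) \to \int g(x, \theta)\,dF(x) = E\,g(x, \theta),
$$

almost everywhere, uniformly for all $\theta$ in $\Theta$.

Proof. See Jennrich (1969, Theorem 2).

Lemma 2. Let $Z_i$ for $i = 1, \ldots, n$ be i.i.d. random variables assuming values in some set $\Psi$ endowed with a sigma-field $\mathcal{A}$. Let $q : \Psi \times \Theta \to \mathbb{R}$, where $\Theta \subset \mathbb{R}^k$ is compact. Let $Q_n(z, \theta) = \frac{1}{n}\sum_{i=1}^{n} q(z_i, \theta)$ be a measurable function for all $\theta \in \Theta$, and a continuous function of $\theta$ for all $z \in \Psi$. Assume that (i) $Q_n(z, \theta) - \bar{Q}_n(\theta) \to 0$ a.e. uniformly for all $\theta \in \Theta$, where $\bar{Q}_n(\theta) = E\,q(z, \theta)$, and (ii) $\hat\theta - \theta_0 \to 0$ a.e. Then: $Q_n(z, \hat\theta) - \bar{Q}_n(\theta_0) \to 0$ a.e.

Proof. This lemma is a simple modification of White (1980, Lemma 2.6).

A.1 Proof of Proposition 1

The claim can be proved by following the asymptotic argument given in Huber (1967, Corollary, p. 231). Define the following vector:

$$
\xi(y, \theta, \rho) = \begin{pmatrix} \dfrac{\partial \log f(y, \theta)}{\partial \theta} \\[2mm] \rho(y, \theta) - \rho \end{pmatrix}. \tag{A.1}
$$

Note that $\operatorname{plim}_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} \rho(y_i, \hat\theta) = \rho$ by Lemmas 1 and 2. Thus, under our regularity conditions, we have $\operatorname{plim}_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} \xi(y_i, \hat\theta, \rho) = 0$. Note that

$$
E\,\frac{\partial \xi(y, \theta_0, \rho)}{\partial(\theta', \rho')} = E\begin{pmatrix} \dfrac{\partial^2 \log f(y, \theta_0)}{\partial\theta\,\partial\theta'} & 0 \\[2mm] \dfrac{\partial \rho(y, \theta_0)}{\partial\theta'} & -I \end{pmatrix} = \begin{pmatrix} -A(\theta_0) & 0 \\ D(\theta_0) & -I \end{pmatrix}, \tag{A.2}
$$

where $D(\theta_0) = E\,\dfrac{\partial \rho(y, \theta_0)}{\partial \theta'}$. Also note that

$$
E\left[\xi(y, \theta_0, \rho)\,\xi'(y, \theta_0, \rho)\right] = \begin{pmatrix} A(\theta_0) & P(\theta_0) \\ P'(\theta_0) & C(\theta_0) \end{pmatrix}, \tag{A.3}
$$

where $P(\theta_0) = E\left[\dfrac{\partial \log f(y, \theta_0)}{\partial\theta}\big(\rho(y, \theta_0) - \rho\big)'\right]$ and $C(\theta_0) = E\left[\big(\rho(y, \theta_0) - \rho\big)\big(\rho(y, \theta_0) - \rho\big)'\right]$. Then, an application of Huber (1967, Corollary, p. 231) yields:

$$
\sqrt{n}\begin{pmatrix} \hat\theta - \theta_0 \\ T(\hat\theta) - \rho \end{pmatrix} \xrightarrow{d} N\left(0, \begin{pmatrix} A^{-1}(\theta_0) & V'(\theta_0) \\ V(\theta_0) & S(\theta_0) \end{pmatrix}\right), \tag{A.4}
$$

where

$$
\begin{pmatrix} A^{-1}(\theta_0) & V'(\theta_0) \\ V(\theta_0) & S(\theta_0) \end{pmatrix} = \begin{pmatrix} -A(\theta_0) & 0 \\ D(\theta_0) & -I \end{pmatrix}^{-1} \begin{pmatrix} A(\theta_0) & P(\theta_0) \\ P'(\theta_0) & C(\theta_0) \end{pmatrix} \begin{pmatrix} -A(\theta_0) & D'(\theta_0) \\ 0 & -I \end{pmatrix}^{-1}.
$$

Using the inverse partitioned matrix formula,

$$
\begin{pmatrix} -A(\theta_0) & 0 \\ D(\theta_0) & -I \end{pmatrix}^{-1} = \begin{pmatrix} -A^{-1}(\theta_0) & 0 \\ -D(\theta_0)A^{-1}(\theta_0) & -I \end{pmatrix}, \tag{A.5}
$$

$V(\theta_0)$ and $S(\theta_0)$ in (A.4) can be expressed as

$$
V(\theta_0) = D(\theta_0)A^{-1}(\theta_0) + P'(\theta_0)A^{-1}(\theta_0), \tag{A.6}
$$

$$
S(\theta_0) = C(\theta_0) + D(\theta_0)A^{-1}(\theta_0)D'(\theta_0) + P'(\theta_0)A^{-1}(\theta_0)D'(\theta_0) + D(\theta_0)A^{-1}(\theta_0)P(\theta_0). \tag{A.7}
$$

The assumption that $E\rho(y, \theta_0)$ is independent of $\theta_0$ implies that

$$
\begin{aligned}
\frac{\partial E\,T(\theta)}{\partial\theta'}\bigg|_{\theta_0} = \frac{\partial E\rho(y, \theta)}{\partial\theta'}\bigg|_{\theta_0} &= \frac{\partial}{\partial\theta'}\int \rho(y, \theta) f(y, \theta)\,d\nu\,\bigg|_{\theta_0} \\
&= \int \rho(y, \theta_0)\,\frac{\partial \log f(y, \theta)}{\partial\theta'}\bigg|_{\theta_0} f(y, \theta_0)\,d\nu + \int \frac{\partial\rho(y, \theta)}{\partial\theta'}\bigg|_{\theta_0} f(y, \theta_0)\,d\nu = 0.
\end{aligned} \tag{A.8}
$$

Since $E\,\dfrac{\partial \log f(y, \theta)}{\partial\theta}\Big|_{\theta_0} = 0$, (A.8) can be expressed as

$$
\int \big(\rho(y, \theta_0) - \rho\big)\,\frac{\partial \log f(y, \theta)}{\partial\theta'}\bigg|_{\theta_0} f(y, \theta_0)\,d\nu + \int \frac{\partial\rho(y, \theta)}{\partial\theta'}\bigg|_{\theta_0} f(y, \theta_0)\,d\nu = 0, \tag{A.9}
$$

which implies that

$$
P'(\theta_0) = -D(\theta_0). \tag{A.10}
$$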
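To make the final step of the proof concrete, the substitution can be written out explicitly. This is a sketch under the sign conventions $D(\theta_0) = E\,\partial\rho(y, \theta_0)/\partial\theta'$ and $P(\theta_0) = E[\partial \log f(y, \theta_0)/\partial\theta\,(\rho(y, \theta_0) - \rho)']$, with the IM-type equality taken in the form $P'(\theta_0) = -D(\theta_0)$; the argument $\theta_0$ is suppressed for brevity.

```latex
\begin{align*}
V &= D A^{-1} + P' A^{-1}
   = D A^{-1} - D A^{-1} = 0, \\
S &= C + D A^{-1} D' + P' A^{-1} D' + D A^{-1} P \\
  &= C + D A^{-1} D' - D A^{-1} D' - D A^{-1} D' \\
  &= C - D A^{-1} D'.
\end{align*}
```

Under these conventions the MLE and the test statistic are asymptotically uncorrelated, and the variance of the feasible statistic is the infeasible variance $C$ reduced by $D A^{-1} D'$, which is the form of adjustment associated with Pierce (1982).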

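Using (A.10) in (A.6) and (A.7) for $V(\theta_0)$ and $S(\theta_0)$, respectively, yields the desired results.

A.2 Proof of Proposition 2

Using Lemma 2, the estimators for the components of $S(\theta_0)$ and $V(\theta_0)$ can be formulated from the sample counterparts. These sample counterparts are

$$
\hat{D}(\hat\theta) = \frac{1}{n}\sum_{i=1}^{n} \frac{\partial\rho(y_i, \theta)}{\partial\theta'}\bigg|_{\hat\theta}, \tag{A.11}
$$

$$
\hat{C}(\hat\theta) = \frac{1}{n}\sum_{i=1}^{n} \big(\rho(y_i, \hat\theta) - T(\hat\theta)\big)\big(\rho(y_i, \hat\theta) - T(\hat\theta)\big)'. \tag{A.12}
$$

Then, Lemmas 1 and 2 ensure that $\hat{D}(\hat\theta) = D(\theta_0) + o_p(1)$ and $\hat{C}(\hat\theta) = C(\theta_0) + o_p(1)$.

A.3 Proof of Proposition 3

The proof is similar to that of Proposition 1. Again, define the following vector:

$$
\vartheta(y, \theta, \rho) = \begin{pmatrix} \dfrac{\partial \log f(y, \theta)}{\partial\theta} \\[2mm] \rho(y, \theta) - \rho \end{pmatrix}. \tag{A.13}
$$

Note that $\operatorname{plim}_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} \rho(y_i, \hat\theta) = \rho$ by Lemmas 1 and 2. Thus, under our regularity conditions, we have $\operatorname{plim}_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} \vartheta(y_i, \hat\theta, \rho) = 0$. Also note that, with all quantities evaluated at the pseudo-true value $\theta_*$,

$$
E\,\frac{\partial \vartheta(y, \theta_*, \rho)}{\partial(\theta', \rho')} = E\begin{pmatrix} \dfrac{\partial^2 \log f(y, \theta_*)}{\partial\theta\,\partial\theta'} & 0 \\[2mm] \dfrac{\partial\rho(y, \theta_*)}{\partial\theta'} & -I \end{pmatrix} = \begin{pmatrix} -A(\theta_*) & 0 \\ D(\theta_*) & -I \end{pmatrix}, \tag{A.14}
$$

$$
E\left[\vartheta(y, \theta_*, \rho)\,\vartheta'(y, \theta_*, \rho)\right] = \begin{pmatrix} B(\theta_*) & P(\theta_*) \\ P'(\theta_*) & C(\theta_*) \end{pmatrix}, \tag{A.15}
$$

where $D(\theta_*) = E\,\dfrac{\partial\rho(y, \theta_*)}{\partial\theta'}$, $P(\theta_*) = E\left[\dfrac{\partial \log f(y, \theta_*)}{\partial\theta}\big(\rho(y, \theta_*) - \rho\big)'\right]$, and $C(\theta_*) = E\left[\big(\rho(y, \theta_*) - \rho\big)\big(\rho(y, \theta_*) - \rho\big)'\right]$. Then, an application of Huber (1967, Corollary, p. 231) yields:

$$
\sqrt{n}\begin{pmatrix} \hat\theta - \theta_* \\ T(\hat\theta) - \rho \end{pmatrix} \xrightarrow{d} N\left(0, \begin{pmatrix} A^{-1}(\theta_*)B(\theta_*)A^{-1}(\theta_*) & V'(\theta_*) \\ V(\theta_*) & S(\theta_*) \end{pmatrix}\right), \tag{A.16}
$$

where

$$
\begin{pmatrix} A^{-1}(\theta_*)B(\theta_*)A^{-1}(\theta_*) & V'(\theta_*) \\ V(\theta_*) & S(\theta_*) \end{pmatrix} = \begin{pmatrix} -A(\theta_*) & 0 \\ D(\theta_*) & -I \end{pmatrix}^{-1} \begin{pmatrix} B(\theta_*) & P(\theta_*) \\ P'(\theta_*) & C(\theta_*) \end{pmatrix} \begin{pmatrix} -A(\theta_*) & D'(\theta_*) \\ 0 & -I \end{pmatrix}^{-1}.
$$

Using the inverse partitioned matrix formula (see (A.5)), it can be shown that

$$
V(\theta_*) = D(\theta_*)A^{-1}(\theta_*)B(\theta_*)A^{-1}(\theta_*) + P'(\theta_*)A^{-1}(\theta_*), \tag{A.17}
$$

$$
S(\theta_*) = C(\theta_*) + D(\theta_*)A^{-1}(\theta_*)B(\theta_*)A^{-1}(\theta_*)D'(\theta_*) + P'(\theta_*)A^{-1}(\theta_*)D'(\theta_*) + D(\theta_*)A^{-1}(\theta_*)P(\theta_*). \tag{A.18}
$$

Since the information matrix type equality $D(\theta_0) = -P'(\theta_0)$ does not hold under the QML setting, the expressions in (A.17) and (A.18) cannot be simplified further.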
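The expression (A.18) is straightforward to evaluate numerically once its five ingredients are available. The sketch below codes (A.18) as a generic matrix function and applies it to the kurtosis statistic of Section 3.1.2. Several things here are assumptions made for illustration rather than statements from the paper: the function names `qml_variance` and `kurtosis_ingredients`, the reading of $M_x$ as the mean of the regressors and $Q_x$ as their second moment matrix, the presence of an intercept, and the particular numbers used for the regressor moments. The convention $D = E\,\partial\rho/\partial\theta'$ is used throughout.

```python
import numpy as np


def qml_variance(A, B, C, D, P):
    """S = C + D A^{-1} B A^{-1} D' + P' A^{-1} D' + D A^{-1} P, as in (A.18).

    Shapes: A, B are (k, k); C is (r, r); D is (r, k); P is (k, r).
    """
    A_inv = np.linalg.inv(A)
    return C + D @ A_inv @ B @ A_inv @ D.T + P.T @ A_inv @ D.T + D @ A_inv @ P


def kurtosis_ingredients(M, Q, s2, mu3, mu4, mu5, mu6, mu8):
    """A, B, C, D, P for the kurtosis statistic with theta = (beta', sigma^2)'.

    M and Q stand for M_x and Q_x (assumed: regressor mean and second moment
    matrix); s2 is sigma_0^2 and mu_k is the k-th central moment of the error.
    """
    k = M.size
    A = np.zeros((k + 1, k + 1))
    A[:k, :k] = Q / s2
    A[k, k] = 1.0 / (2.0 * s2**2)

    B = A.copy()
    B[:k, k] = mu3 * M / (2.0 * s2**3)
    B[k, :k] = mu3 * M / (2.0 * s2**3)
    B[k, k] = (mu4 - s2**2) / (4.0 * s2**4)

    C = np.array([[(mu8 - mu4**2) / s2**4]])
    D = np.append(-4.0 * mu3 * M / s2**2, -2.0 * mu4 / s2**3)[None, :]
    P = np.append(mu5 * M / s2**3,
                  mu6 / (2.0 * s2**4) - mu4 / (2.0 * s2**3))[:, None]
    return A, B, C, D, P


# Assumed regressor moments: intercept plus one regressor with mean 0.5 and
# variance 2, so that M'Q^{-1}M = 1.
M = np.array([1.0, 0.5])
Q = np.array([[1.0, 0.5], [0.5, 2.25]])

# Normal errors with sigma_0^2 = 2: mu4 = 3*s2^2, mu6 = 15*s2^3, mu8 = 105*s2^4.
S_normal = qml_variance(*kurtosis_ingredients(M, Q, 2.0, 0.0, 12.0, 0.0, 120.0, 1680.0))
print(S_normal[0, 0])
```

For normal errors the computed value is 24, matching Remark 5. Because the function takes the ingredients as plain arrays, the same call could in principle be fed sample counterparts of $A$, $B$, $C$, $D$ and $P$ to obtain a feasible variance estimate, although that use is not worked out here.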

References

Andreou, Elena and Bas J. M. Werker (2010). An Alternative Asymptotic Analysis of Residual-Based Statistics. In: The Review of Economics and Statistics 94.1.

Bera, Anil K. and Xiao-Lei Zuo (1996). Specification test for a linear regression model with ARCH process. In: Journal of Statistical Planning and Inference 50.2.

Cox, D. R. (1961). Tests of Separate Families of Hypotheses. In: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics. Berkeley, Calif.: University of California Press.

Cox, D. R. (1962). Further Results on Tests of Separate Families of Hypotheses. In: Journal of the Royal Statistical Society. Series B (Methodological) 24.2.

Durbin, J. (1970). Testing for Serial Correlation in Least-Squares Regression When Some of the Regressors are Lagged Dependent Variables. In: Econometrica 38.3.

Eicker, Friedhelm (1963). Asymptotic Normality and Consistency of the Least Squares Estimators for Families of Linear Regressions. In: The Annals of Mathematical Statistics 34.2.

Eicker, Friedhelm (1967). Limit theorems for regressions with unequal and dependent errors. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics. Ed. by L. M. LeCam and J. Neyman. University of California Press.

Gorodnichenko, Yuriy, Anna Mikusheva, and Serena Ng (2012). Estimators for persistent and possibly nonstationary data with classical properties. In: Econometric Theory 28.5.

Huber, Peter J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics. Berkeley, Calif.: University of California Press.

Jarque, Carlos M. and Anil K. Bera (1987). A Test for Normality of Observations and Regression Residuals. In: International Statistical Review / Revue Internationale de Statistique 55.2.

Jennrich, Robert I. (1969). Asymptotic Properties of Non-Linear Least Squares Estimators. In: The Annals of Mathematical Statistics 40.2.

Koopmans, T. C., H. Rubin, and R. B. Leipnik (1950). Measuring the Equation Systems of Dynamic Economics. In: Statistical Inference in Dynamic Economic Models. Cowles Commission Monograph no. 10. Ed. by T. C. Koopmans. John Wiley and Sons, Inc.

Newey, Whitney K. (1985a). Generalized method of moments specification testing. In: Journal of Econometrics 29.3.

Newey, Whitney K. (1985b). Maximum Likelihood Specification Testing and Conditional Moment Tests. In: Econometrica 53.5.

Newey, Whitney K. and Daniel McFadden (1994). Chapter 36 Large sample estimation and hypothesis testing. In: Handbook of Econometrics. Ed. by Robert F. Engle and Daniel L. McFadden. Vol. 4. Elsevier.

Neyman, Jerzy (1935). Sur la vérification des hypothèses statistiques composées. French. In: Bulletin de la Société Mathématique de France 63.

Neyman, Jerzy (1957). Current problems of mathematical statistics. Statistical Laboratory, University of California.

Neyman, Jerzy (1959). Optimal asymptotic tests of composite statistical hypotheses. In: Probability and Statistics, the Harald Cramer Volume. Ed. by U. Grenander. Wiley, New York.

Pierce, Donald A. (1982). The Asymptotic Effect of Substituting Estimators for Parameters in Certain Types of Statistics. In: The Annals of Statistics 10.2.

Prokhorov, Artem and Peter Schmidt (2009). GMM redundancy results for general missing data problems. In: Journal of Econometrics 151.1.

Rao, C. Radhakrishna (1973). Linear statistical inference and its applications. 2nd Edition. Wiley series in probability and mathematical statistics: Probability and mathematical statistics. John Wiley & Sons, Inc.

Student (1908). The Probable Error of a Mean. In: Biometrika 6.1, pp. 1-25.

Tauchen, George (1985). Diagnostic testing and evaluation of maximum likelihood models. In: Journal of Econometrics 30.1.

Tse, Y. K. (2002). Residual-based diagnostics for conditional heteroscedasticity models. In: The Econometrics Journal 5.2.

White, Halbert (1980). Nonlinear Regression on Cross-Section Data. In: Econometrica 48.3.

White, Halbert (1982a). Maximum Likelihood Estimation of Misspecified Models. In: Econometrica 50.1.

White, Halbert (1982b). Regularity conditions for Cox's test of non-nested hypotheses. In: Journal of Econometrics 19.2.
