Quasi Maximum Likelihood Theory


1 Quasi Maximum Likelihood Theory. CHUNG-MING KUAN, Department of Finance & CRETA, National Taiwan University. June 17, 2010.

2 Lecture Outline: 1 Kullback-Leibler Information Criterion; 2 Asymptotic Properties of the QMLE (Asymptotic Normality; Information Matrix Equality); 3 Large Sample Tests: Nested Models (Wald Test; LM (Score) Test; Likelihood Ratio Test); 4 Large Sample Tests: Non-Nested Models (Wald Encompassing Test; Score Encompassing Test; Pseudo-True Score Encompassing Test).

3 Lecture Outline (cont'd): 5 Example I: Discrete Choice Models (Binary Choice Models; Multinomial Models; Ordered Multinomial Models); 6 Example II: Limited Dependent Variable Models (Truncated Regression Models; Censored Regression Models; Sample Selection Models); 7 Example III: Time Series Models.

4 Quasi-Maximum Likelihood (QML) Theory. Drawbacks of the least-squares method: It leaves no room for modeling other conditional moments, such as the conditional variance, of the dependent variable. It fails to accommodate certain characteristics of the dependent variable, such as binary responses and data truncation. The quasi-maximum likelihood (QML) method: It specifies a likelihood function that admits specifications of different conditional moments and/or distribution characteristics. Model misspecification is allowed, cf. the conventional ML method.

5 Kullback-Leibler Information Criterion (KLIC). Given IP(A) = p, the message that A will surely occur would be more (less) valuable or more (less) surprising when p is small (large). The information content of this message ought to be a decreasing function of p. An information function is ι(p) = log(1/p), which decreases from positive infinity (p → 0) to zero (p = 1). Clearly, ι(1 − p) ≠ ι(p). The expected information is I = p ι(p) + (1 − p) ι(1 − p) = p log(1/p) + (1 − p) log(1/(1 − p)), which is also known as the entropy of the event A.

6 The information that IP(A) changes from p to q would be useful when p and q are very different. The information content of this change is ι(p) − ι(q) = log(q/p), which is positive (negative) when q > p (q < p). Given n mutually exclusive events A_1, ..., A_n, each with an information value log(q_i/p_i), the expected information value is I = Σ_{i=1}^n q_i log(q_i/p_i). The Kullback-Leibler Information Criterion (KLIC) of g relative to f is II(g:f) = ∫_R log[g(ζ)/f(ζ)] g(ζ) dζ, where g is the density function of z and f is another density function.
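
The following sketch is an added illustration (not part of the original slides): it computes II(g:f) by one-dimensional numerical integration for a true density g and a candidate density f, both chosen arbitrarily as normal distributions.

```python
# Numerical illustration of the KLIC: II(g:f) = integral of log(g/f) * g.
# The particular g and f below are illustrative choices only.
import numpy as np
from scipy import stats
from scipy.integrate import quad

g = stats.norm(loc=0.0, scale=1.0)   # "true" density g of z
f = stats.norm(loc=0.5, scale=1.5)   # quasi-likelihood density f

def integrand(z):
    return g.pdf(z) * (g.logpdf(z) - f.logpdf(z))

klic, _ = quad(integrand, -np.inf, np.inf)
print(f"II(g:f) = {klic:.4f}")       # strictly positive here, since f differs from g
```

By Theorem 9.1 on the next slide, this quantity is zero only when f coincides with g almost everywhere.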

7 Theorem 9.1. II(g:f) ≥ 0; the equality holds if, and only if, g = f almost everywhere. Proof: As log(1 + x) < x for all x > −1 with x ≠ 0, we have log(f/g) = log(1 + (f − g)/g) < (f − g)/g, so that log(g/f) > 1 − f/g wherever f ≠ g. It follows that ∫_R log[g(ζ)/f(ζ)] g(ζ) dζ > ∫_R [1 − f(ζ)/g(ζ)] g(ζ) dζ = 0. Remark: The KLIC measures the closeness between f and g, but it is not a metric because it is not symmetric in general, i.e., II(g:f) ≠ II(f:g), and it does not obey the triangle inequality.

8 Let z_t = (y_t w_t′)′ be ν × 1 and z^t = {z_1, z_2, ..., z_t}. Given a sample of T observations, specifying a complete probability model for z^T may be a formidable task in practice. It is practically more convenient to focus on the conditional density g_t(y_t | x_t), where x_t includes some elements of w_t and z^{t−1}. We thus specify a quasi-likelihood function f_t(y_t | x_t; θ) with θ ∈ Θ ⊆ R^k. The KLIC of g_t relative to f_t is II(g_t:f_t; θ) = ∫_R log[g_t(y_t | x_t)/f_t(y_t | x_t; θ)] g_t(y_t | x_t) dy_t.

9 For a sample of T observations, the average of the individual KLICs is ĪI_T({g_t:f_t}; θ) := (1/T) Σ_{t=1}^T II(g_t:f_t; θ) = (1/T) Σ_{t=1}^T (IE[log g_t(y_t | x_t)] − IE[log f_t(y_t | x_t; θ)]). Minimizing ĪI_T({g_t:f_t}; θ) is thus equivalent to maximizing L̄_T(θ) = (1/T) Σ_{t=1}^T IE[log f_t(y_t | x_t; θ)]; the maximizer of L̄_T(θ), θ*, is the minimizer of the average KLIC.

10 If there exists a θ_o ∈ Θ such that f_t(y_t | x_t; θ_o) = g_t(y_t | x_t) for all t, we say that {f_t} is correctly specified for {y_t | x_t}. In this case, II(g_t:f_t; θ_o) = 0, so that ĪI_T({g_t:f_t}; θ) is minimized at θ* = θ_o. Maximizing the sample counterpart of L̄_T(θ), L_T(y^T, x^T; θ) := (1/T) Σ_{t=1}^T log f_t(y_t | x_t; θ), the average of the individual quasi-log-likelihood functions, yields the solution θ̂_T, known as the quasi-maximum likelihood estimator (QMLE) of θ. When {f_t} is correctly specified for {y_t | x_t}, the QMLE is the standard MLE.

11 Concentrating on certain conditional attributes of y_t, we have, for example, y_t | x_t ~ N(μ_t(x_t; β), σ²), or more generally, y_t | x_t ~ N(μ_t(x_t; β), h(x_t; α)). Suppose y_t | x_t ~ N(μ_t(x_t; β), σ²) and let θ = (β′ σ²)′. The maximizer of (1/T) Σ_t log f_t(y_t | x_t; θ) also solves min_β (1/T) Σ_t [y_t − μ_t(x_t; β)]². That is, the NLS estimator is a QMLE under the specification of conditional normality with conditional homoskedasticity. Even when {μ_t} is correctly specified for the conditional mean, there is no guarantee that the specification σ² of the conditional variance is correct.
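
As a numerical check of this equivalence (an added sketch, not from the original slides; the simulated data, sample size, and optimizer are arbitrary), maximizing the Gaussian quasi-log-likelihood with a linear mean and constant variance reproduces the OLS coefficient estimates.

```python
# Gaussian QMLE with mean x_t'beta and constant variance vs. least squares:
# the beta-part of the QMLE coincides with the OLS estimator.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
T, k = 500, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(scale=1.3, size=T)

def neg_quasi_loglik(params):
    beta, log_s2 = params[:k], params[k]      # parameterize log(sigma^2) to keep it positive
    s2 = np.exp(log_s2)
    resid = y - X @ beta
    return 0.5 * np.sum(np.log(2 * np.pi * s2) + resid**2 / s2)

beta_qmle = minimize(neg_quasi_loglik, np.zeros(k + 1), method="BFGS").x[:k]
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.max(np.abs(beta_qmle - beta_ols)))   # close to zero (optimizer tolerance)
```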

12 Asymptotic Properties of the QMLE. The quasi-log-likelihood function is, in general, a nonlinear function of θ, so the QMLE θ̂_T must be computed numerically using a nonlinear optimization algorithm. [ID-3] There exists a unique θ* that minimizes the KLIC. Consistency: Suppose that L_T(y^T, x^T; θ) tends to L̄_T(θ) in probability, uniformly in θ ∈ Θ, i.e., L_T(y^T, x^T; θ) obeys a WULLN. Then, it is natural to expect the QMLE θ̂_T to converge in probability to θ*, the minimizer of the average KLIC ĪI_T({g_t:f_t}; θ).

13 Asymptotic Normality. When θ* is in the interior of Θ, the mean-value expansion of ∇L_T(y^T, x^T; θ̂_T) about θ* is ∇L_T(y^T, x^T; θ̂_T) = ∇L_T(y^T, x^T; θ*) + ∇²L_T(y^T, x^T; θ†_T)(θ̂_T − θ*), where θ†_T lies between θ̂_T and θ*, and the left-hand side is zero because θ̂_T must satisfy the first-order condition. Let H_T(θ) = IE[∇²L_T(y^T, x^T; θ)]. When ∇²L_T(y^T, x^T; θ) obeys a WULLN, ∇²L_T(y^T, x^T; θ†_T) − H_T(θ*) → 0 in probability.

14 When H_T(θ*) is nonsingular, √T(θ̂_T − θ*) = −H_T(θ*)^{-1} √T ∇L_T(y^T, x^T; θ*) + o_IP(1). Let B_T(θ) = var(√T ∇L_T(y^T, x^T; θ)) be the information matrix. When ∇log f_t(y_t | x_t; θ) obeys a CLT, B_T(θ*)^{-1/2} √T (∇L_T(y^T, x^T; θ*) − IE[∇L_T(y^T, x^T; θ*)]) →D N(0, I_k). Knowing that IE[∇L_T(y^T, x^T; θ)] = ∇IE[L_T(y^T, x^T; θ)] is zero at θ = θ*, the KLIC minimizer, we have B_T(θ*)^{-1/2} √T ∇L_T(y^T, x^T; θ*) →D N(0, I_k).

15 This shows that √T(θ̂_T − θ*) is asymptotically equivalent to −H_T(θ*)^{-1} B_T(θ*)^{1/2} [B_T(θ*)^{-1/2} √T ∇L_T(y^T, x^T; θ*)], which has an asymptotic normal distribution. Theorem 9.2. Letting C_T(θ*) = H_T(θ*)^{-1} B_T(θ*) H_T(θ*)^{-1}, we have C_T(θ*)^{-1/2} √T (θ̂_T − θ*) →D N(0, I_k).
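
The covariance matrix in Theorem 9.2 suggests the familiar "sandwich" estimator. The sketch below is an added illustration (not from the slides): it assumes the user supplies the T × k matrix of per-observation scores and the k × k average Hessian of the quasi-log-likelihood, and it further assumes no dynamic misspecification so that B_T can be estimated by the average outer product of the scores.

```python
# Sandwich estimator C_T = H^{-1} B H^{-1} for the asymptotic covariance of
# sqrt(T)(theta_hat - theta*), given per-observation scores and the average Hessian.
import numpy as np

def sandwich_cov(scores, hessian_avg):
    """scores: (T, k) array of s_t evaluated at theta_hat;
    hessian_avg: (k, k) average Hessian of L_T at theta_hat."""
    T = scores.shape[0]
    B_hat = scores.T @ scores / T            # (1/T) sum_t s_t s_t'
    H_inv = np.linalg.inv(hessian_avg)
    return H_inv @ B_hat @ H_inv

# Standard errors of theta_hat are sqrt(diag(sandwich_cov(...)) / T).
```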

16 Information Matrix Equality. When the likelihood is specified correctly in a proper sense, the information matrix equality holds: H_T(θ_o) + B_T(θ_o) = 0. In this case, C_T(θ_o) simplifies to −H_T(θ_o)^{-1} or B_T(θ_o)^{-1}. For the specification of {y_t | x_t}, define the score functions s_t(y_t, x_t; θ) = ∇_θ log f_t(y_t | x_t; θ) = f_t(y_t | x_t; θ)^{-1} ∇_θ f_t(y_t | x_t; θ), so that ∇_θ f_t(y_t | x_t; θ) = s_t(y_t, x_t; θ) f_t(y_t | x_t; θ).

17 By permitting the interchange of differentiation and integration, ∫_R s_t(y_t, x_t; θ) f_t(y_t | x_t; θ) dy_t = ∇_θ ∫_R f_t(y_t | x_t; θ) dy_t = 0. If {f_t} is correctly specified for {y_t | x_t}, IE[s_t(y_t, x_t; θ_o) | x_t] = 0, where the conditional expectation is taken with respect to g_t(y_t | x_t) = f_t(y_t | x_t; θ_o). Thus, the mean score is zero: IE[s_t(y_t, x_t; θ_o)] = 0. As ∇_θ f_t = s_t f_t, we have ∇²_θ f_t = (∇_θ s_t) f_t + (s_t s_t′) f_t, so that ∫_R [∇_θ s_t(y_t, x_t; θ) + s_t(y_t, x_t; θ) s_t(y_t, x_t; θ)′] f_t(y_t | x_t; θ) dy_t = ∇²_θ ∫_R f_t(y_t | x_t; θ) dy_t = 0.

18 We have shown that IE[∇_θ s_t(y_t, x_t; θ_o) | x_t] + IE[s_t(y_t, x_t; θ_o) s_t(y_t, x_t; θ_o)′ | x_t] = 0. It follows that (1/T) Σ_{t=1}^T IE[∇_θ s_t(y_t, x_t; θ_o)] + (1/T) Σ_{t=1}^T IE[s_t(y_t, x_t; θ_o) s_t(y_t, x_t; θ_o)′] = H_T(θ_o) + (1/T) Σ_{t=1}^T IE[s_t(y_t, x_t; θ_o) s_t(y_t, x_t; θ_o)′] = 0; i.e., the expected Hessian matrix is the negative of the average of the individual information matrices, which need not be the information matrix B_T(θ_o).

19 By definition, the information matrix is B_T(θ_o) = (1/T) IE[(Σ_{t=1}^T s_t(y_t, x_t; θ_o))(Σ_{t=1}^T s_t(y_t, x_t; θ_o))′] = (1/T) Σ_{t=1}^T IE[s_t(y_t, x_t; θ_o) s_t(y_t, x_t; θ_o)′] + (1/T) Σ_{τ=1}^{T−1} Σ_{t=τ+1}^T IE[s_{t−τ}(y_{t−τ}, x_{t−τ}; θ_o) s_t(y_t, x_t; θ_o)′] + (1/T) Σ_{τ=1}^{T−1} Σ_{t=τ+1}^T IE[s_t(y_t, x_t; θ_o) s_{t−τ}(y_{t−τ}, x_{t−τ}; θ_o)′]. That is, the information matrix involves the variances and autocovariances of the individual score functions.

20 A specification of {y_t | x_t} is said to have dynamic misspecification if it is not correctly specified for {y_t | w_t, z^{t−1}}; that is, there does not exist any θ_o such that f_t(y_t | x_t; θ_o) = g_t(y_t | w_t, z^{t−1}). When f_t(y_t | x_t; θ_o) = g_t(y_t | w_t, z^{t−1}), IE[s_t(y_t, x_t; θ_o) | x_t] = IE[s_t(y_t, x_t; θ_o) | w_t, z^{t−1}] = 0, and by the law of iterated expectations, IE[s_t(y_t, x_t; θ_o) s_{t+τ}(y_{t+τ}, x_{t+τ}; θ_o)′] = IE[s_t(y_t, x_t; θ_o) IE[s_{t+τ}(y_{t+τ}, x_{t+τ}; θ_o) | w_{t+τ}, z^{t+τ−1}]′] = 0, for τ ≥ 1. In this case, B_T(θ_o) = (1/T) Σ_{t=1}^T IE[s_t(y_t, x_t; θ_o) s_t(y_t, x_t; θ_o)′].

21 Theorem 9.3. Suppose that there exists a θ_o such that f_t(y_t | x_t; θ_o) = g_t(y_t | x_t) and that there is no dynamic misspecification. Then, H_T(θ_o) + B_T(θ_o) = 0, where H_T(θ_o) = (1/T) Σ_{t=1}^T IE[∇_θ s_t(y_t, x_t; θ_o)] and B_T(θ_o) = (1/T) Σ_{t=1}^T IE[s_t(y_t, x_t; θ_o) s_t(y_t, x_t; θ_o)′]. When Theorem 9.3 holds, B_T(θ_o)^{1/2} √T (θ̂_T − θ_o) →D N(0, I_k). That is, the QMLE is asymptotically efficient because it achieves the Cramér-Rao lower bound asymptotically.

22 Example: Assume y_t | x_t ~ N(x_t′β, σ²) for all t, so that L_T(y^T, x^T; θ) = −(1/2) log(2π) − (1/2) log(σ²) − (1/T) Σ_{t=1}^T (y_t − x_t′β)²/(2σ²). Straightforward calculation yields ∇L_T(y^T, x^T; θ) = [ (1/T) Σ_t x_t(y_t − x_t′β)/σ² ; −1/(2σ²) + (1/T) Σ_t (y_t − x_t′β)²/(2(σ²)²) ] and ∇²L_T(y^T, x^T; θ) = [ −(1/T) Σ_t x_tx_t′/σ² , −(1/T) Σ_t x_t(y_t − x_t′β)/(σ²)² ; −(1/T) Σ_t (y_t − x_t′β)x_t′/(σ²)² , 1/(2(σ²)²) − (1/T) Σ_t (y_t − x_t′β)²/(σ²)³ ]. Solving ∇L_T(y^T, x^T; θ) = 0, we obtain the QMLEs of β and σ².

23 If the specification above is correct for {y_t | x_t}, there exists θ_o = (β_o′ σ_o²)′ such that y_t | x_t ~ N(x_t′β_o, σ_o²). Taking expectations with respect to the true distribution, IE[x_t(y_t − x_t′β)] = IE[x_t(IE(y_t | x_t) − x_t′β)] = IE(x_tx_t′)(β_o − β), which is zero when evaluated at β = β_o. Similarly, IE[(y_t − x_t′β)²] = IE[(y_t − x_t′β_o + x_t′β_o − x_t′β)²] = IE[(y_t − x_t′β_o)²] + IE[(x_t′β_o − x_t′β)²] = σ_o² + IE[(x_t′β_o − x_t′β)²], where the second term on the right-hand side is zero if it is evaluated at β = β_o.

24 The results above show that H_T(θ) = IE[∇²L_T(θ)] = [ −(1/T) Σ_t IE(x_tx_t′)/σ² , −(1/T) Σ_t IE(x_tx_t′)(β_o − β)/(σ²)² ; −(1/T) Σ_t (β_o − β)′IE(x_tx_t′)/(σ²)² , 1/(2(σ²)²) − (σ_o² + (1/T) Σ_t IE[(x_t′β_o − x_t′β)²])/(σ²)³ ], and H_T(θ_o) = [ −(1/T) Σ_t IE(x_tx_t′)/σ_o² , 0 ; 0 , −1/(2(σ_o²)²) ].

25 Without dynamic misspecification, the information matrix B_T(θ) is (1/T) Σ_t IE of the matrix whose (1,1) block is (y_t − x_t′β)²x_tx_t′/(σ²)², whose (1,2) block is −x_t(y_t − x_t′β)/(2(σ²)²) + x_t(y_t − x_t′β)³/(2(σ²)³) (the (2,1) block is its transpose), and whose (2,2) block is 1/(4(σ²)²) − (y_t − x_t′β)²/(2(σ²)³) + (y_t − x_t′β)⁴/(4(σ²)⁴). Given conditional normality, the conditional third and fourth central moments of y_t are zero and 3(σ_o²)², respectively. Then, IE[(y_t − x_t′β)³] = 3σ_o² IE[x_t′β_o − x_t′β] + IE[(x_t′β_o − x_t′β)³], which is zero when evaluated at β = β_o. Similarly, IE[(y_t − x_t′β)⁴] = 3(σ_o²)² + 6σ_o² IE[(x_t′β_o − x_t′β)²] + IE[(x_t′β_o − x_t′β)⁴], which is 3(σ_o²)² when evaluated at β = β_o.

26 It is now easily seen that the information matrix equality holds because B_T(θ_o) = [ (1/T) Σ_t IE(x_tx_t′)/σ_o² , 0 ; 0 , 1/(2(σ_o²)²) ] = −H_T(θ_o). A typical consistent estimator of H_T(θ_o) is Ĥ_T = [ −Σ_t x_tx_t′/(T σ̂²_T) , 0 ; 0 , −1/(2(σ̂²_T)²) ]. When the information matrix equality holds, a consistent estimator of C_T(θ_o) is −Ĥ_T^{-1}.

27 Example: The specification is y_t | x_t ~ N(x_t′β, σ²), but the true conditional distribution is y_t | x_t ~ N(x_t′β_o, h(x_t, α_o)). That is, our specification is correct only for the conditional mean. Due to this misspecification, the KLIC minimizer is θ* = (β_o′ (σ*)²)′. Then, H_T(θ*) = [ −(1/T) Σ_t IE(x_tx_t′)/(σ*)² , 0 ; 0 , 1/(2(σ*)⁴) − (1/T) Σ_t IE[h(x_t, α_o)]/(σ*)⁶ ], and B_T(θ*) = [ (1/T) Σ_t IE[h(x_t, α_o)x_tx_t′]/(σ*)⁴ , 0 ; 0 , 1/(4(σ*)⁴) − (1/T) Σ_t IE[h(x_t, α_o)]/(2(σ*)⁶) + 3 (1/T) Σ_t IE[h(x_t, α_o)²]/(4(σ*)⁸) ]. The information matrix equality does not hold here, even though the conditional mean function is correctly specified.

28 The upper-left block of Ĥ_T is −Σ_t x_tx_t′/(T σ̂²_T), which remains a consistent estimator of the corresponding block of H_T(θ*). The information matrix B_T(θ*) can be consistently estimated by a block-diagonal matrix with upper-left block Σ_t ê_t²x_tx_t′/(T (σ̂²_T)²). The upper-left block of Ĉ_T = Ĥ_T^{-1} B̂_T Ĥ_T^{-1} is thus ((1/T) Σ_t x_tx_t′)^{-1} ((1/T) Σ_t ê_t²x_tx_t′) ((1/T) Σ_t x_tx_t′)^{-1}. This is precisely the Eicker-White heteroskedasticity-robust covariance matrix estimator.
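
A minimal sketch of the Eicker-White estimator just described (added here for illustration): given the regressor matrix X and dependent variable y, it returns the OLS estimate together with the heteroskedasticity-robust covariance of β̂_T.

```python
# Eicker-White heteroskedasticity-robust covariance for OLS:
# (X'X/T)^{-1} ((1/T) sum e_t^2 x_t x_t') (X'X/T)^{-1}.
import numpy as np

def eicker_white(X, y):
    T = X.shape[0]
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ beta_hat
    Sxx_inv = np.linalg.inv(X.T @ X / T)
    meat = (X * (e**2)[:, None]).T @ X / T        # (1/T) sum e_t^2 x_t x_t'
    C_upper = Sxx_inv @ meat @ Sxx_inv            # asy. cov of sqrt(T)(beta_hat - beta)
    return beta_hat, C_upper / T                  # variance estimate of beta_hat itself
```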

29 Wald Test. Consider the null hypothesis Rθ* = r, where R is a q × k matrix with full row rank. The Wald test checks whether Rθ̂_T is sufficiently close to r. By the asymptotic normality result C_T(θ*)^{-1/2} √T(θ̂_T − θ*) →D N(0, I_k), we have under the null hypothesis that [R C_T(θ*) R′]^{-1/2} √T (Rθ̂_T − r) →D N(0, I_q). This result remains valid when C_T(θ*) is replaced by a consistent estimator Ĉ_T = Ĥ_T^{-1} B̂_T Ĥ_T^{-1}. The Wald test is W_T = T (Rθ̂_T − r)′ (R Ĉ_T R′)^{-1} (Rθ̂_T − r) →D χ²(q).

30 Example: Specification: y_t | x_t ~ N(x_t′β, σ²). Write θ = (σ² β′)′ and β = (b_1′ b_2′)′, where b_1 is (k − s) × 1 and b_2 is s × 1. We are interested in the hypothesis b_2 = Rθ = 0, where R = [0 R_1] is s × (k + 1) and R_1 = [0 I_s] is s × k. With β̂_{2,T} = Rθ̂_T, the Wald test is W_T = T β̂_{2,T}′ (R Ĉ_T R′)^{-1} β̂_{2,T}. When the information matrix equality holds, Ĉ_T = −Ĥ_T^{-1}, so that R Ĉ_T R′ = −R Ĥ_T^{-1} R′ = σ̂²_T R_1 (X′X/T)^{-1} R_1′. The Wald test then becomes W_T = T β̂_{2,T}′ [R_1 (X′X/T)^{-1} R_1′]^{-1} β̂_{2,T} / σ̂²_T →D χ²(s).
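
The sketch below (an added illustration, not from the slides) computes the general Wald statistic W_T = T(Rθ̂ − r)′(RĈR′)^{-1}(Rθ̂ − r) for user-supplied θ̂_T, Ĉ_T, R, and r.

```python
# Wald test of R theta = r given the QMLE and a consistent estimator C_hat of
# the asymptotic covariance of sqrt(T)(theta_hat - theta*).
import numpy as np
from scipy import stats

def wald_test(theta_hat, C_hat, R, r, T):
    diff = R @ theta_hat - r
    V = R @ C_hat @ R.T
    W = T * diff @ np.linalg.solve(V, diff)
    q = R.shape[0]
    return W, stats.chi2.sf(W, df=q)              # statistic and asymptotic p-value
```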

31 LM (Score) Test. Maximizing L_T(θ) subject to the constraint Rθ = r yields the Lagrangian L_T(θ) + (Rθ − r)′λ, where λ is the vector of Lagrange multipliers. The constrained QMLEs are θ̃_T and λ̃_T; we want to check whether λ̃_T is sufficiently close to zero. First note the saddle-point condition: ∇L_T(θ̃_T) + R′λ̃_T = 0. The mean-value expansion of ∇L_T(θ̃_T) about θ* yields ∇L_T(θ*) + ∇²L_T(θ†_T)(θ̃_T − θ*) + R′λ̃_T = 0, where θ†_T is the mean value between θ̃_T and θ*.

32 Recalling from the discussion of the Wald test that √T(θ̂_T − θ*) = −H_T(θ*)^{-1} √T ∇L_T(θ*) + o_IP(1), we obtain 0 = −H_T(θ*)^{-1} √T ∇L_T(θ*) − H_T(θ*)^{-1} ∇²L_T(θ†_T) √T(θ̃_T − θ*) − H_T(θ*)^{-1} R′ √T λ̃_T = √T(θ̂_T − θ*) − √T(θ̃_T − θ*) − H_T(θ*)^{-1} R′ √T λ̃_T + o_IP(1). Pre-multiplying both sides by R and noting that R(θ̃_T − θ*) = 0 under the null, we have √T λ̃_T = [R H_T(θ*)^{-1} R′]^{-1} R √T(θ̂_T − θ*) + o_IP(1).

33 For Λ_T(θ*) = [R H_T(θ*)^{-1} R′]^{-1} R C_T(θ*) R′ [R H_T(θ*)^{-1} R′]^{-1}, the normalized Lagrange multiplier is Λ_T(θ*)^{-1/2} √T λ̃_T = Λ_T(θ*)^{-1/2} [R H_T(θ*)^{-1} R′]^{-1} R √T(θ̂_T − θ*) →D N(0, I_q). It follows that Λ̃_T^{-1/2} √T λ̃_T →D N(0, I_q), where Λ̃_T = (R Ḧ_T^{-1} R′)^{-1} R C̈_T R′ (R Ḧ_T^{-1} R′)^{-1}, and Ḧ_T and C̈_T are consistent estimators based on the constrained QMLE θ̃_T. The LM test is LM_T = T λ̃_T′ Λ̃_T^{-1} λ̃_T →D χ²(q).

34 In light of the saddle-point condition ∇L_T(θ̃_T) + R′λ̃_T = 0, LM_T = T λ̃_T′ R Ḧ_T^{-1} R′ (R C̈_T R′)^{-1} R Ḧ_T^{-1} R′ λ̃_T = T [∇L_T(θ̃_T)]′ Ḧ_T^{-1} R′ (R C̈_T R′)^{-1} R Ḧ_T^{-1} [∇L_T(θ̃_T)], which mainly depends on the score function ∇L_T evaluated at θ̃_T. When the information matrix equality holds, LM_T = −T λ̃_T′ R Ḧ_T^{-1} R′ λ̃_T = −T [∇L_T(θ̃_T)]′ Ḧ_T^{-1} [∇L_T(θ̃_T)]. The LM test is also known as the score test in the statistics literature.

35 Example: Specification: y_t | x_t ~ N(x_t′β, σ²). Let θ = (σ² β′)′ and β = (b_1′ b_2′)′, where b_1 is (k − s) × 1 and b_2 is s × 1. The null hypothesis is b_2 = Rθ = 0, where R = [0 R_1] is s × (k + 1) and R_1 = [0 I_s] is s × k. From the saddle-point condition, ∇L_T(θ̃_T) = −R′λ̃_T, and hence ∇L_T(θ̃_T) = [∇_{σ²}L_T(θ̃_T); ∇_{b_1}L_T(θ̃_T); ∇_{b_2}L_T(θ̃_T)] = [0; 0; −λ̃_T]. Thus, the LM test is mainly based on ∇_{b_2}L_T(θ̃_T) = (1/(T σ̃²_T)) Σ_t x_{2t} ε̃_t = X_2′ε̃/(T σ̃²_T), where σ̃²_T = ε̃′ε̃/T, with ε̃ the vector of constrained residuals.

36 The LM test statistic now reads LM_T = T [0, 0′, ε̃′X_2/(T σ̃²_T)] Ḧ_T^{-1} R′ (R C̈_T R′)^{-1} R Ḧ_T^{-1} [0, 0′, ε̃′X_2/(T σ̃²_T)]′, which converges in distribution to χ²(s) under the null. When the information matrix equality holds, LM_T = −T [∇L_T(θ̃_T)]′ Ḧ_T^{-1} [∇L_T(θ̃_T)] = T [0′, ε̃′X_2/T] (X′X/T)^{-1} [0′, ε̃′X_2/T]′ / σ̃²_T = T [ε̃′X(X′X)^{-1}X′ε̃ / ε̃′ε̃] = T R², where R² is from the auxiliary regression of ε̃_t on x_{1t} and x_{2t}.
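
Under the information matrix equality, the T R² form of the LM statistic can be computed as in the following added sketch: estimate the restricted model by OLS on X_1 only, regress the restricted residuals on all regressors, and multiply the resulting R² by T (X_1 is assumed to contain a constant, so the residuals have mean zero).

```python
# LM (score) test of b2 = 0 computed as T * R^2 from the auxiliary regression of
# the restricted residuals on [X1, X2]; valid under the information matrix equality.
import numpy as np
from scipy import stats

def lm_test_omitted(y, X1, X2):
    T = y.shape[0]
    e_tilde = y - X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]   # restricted residuals
    X = np.column_stack([X1, X2])
    fitted = X @ np.linalg.lstsq(X, e_tilde, rcond=None)[0]
    R2 = (fitted @ fitted) / (e_tilde @ e_tilde)
    LM = T * R2
    return LM, stats.chi2.sf(LM, df=X2.shape[1])
```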

37 Example (Breusch-Pagan Test): Let h: R → (0, ∞) be a differentiable function. Consider the specification y_t | x_t, ζ_t ~ N(x_t′β, h(ζ_t′α)), where ζ_t′α = α_0 + Σ_{i=1}^p ζ_{ti}α_i. Under this specification, L_T(y^T, x^T, ζ^T; θ) = −(1/2) log(2π) − (1/(2T)) Σ_{t=1}^T log(h(ζ_t′α)) − (1/T) Σ_{t=1}^T (y_t − x_t′β)²/(2h(ζ_t′α)). The null hypothesis is α_1 = ... = α_p = 0, so that h(α_0) = σ_0², i.e., conditional homoskedasticity.

38 For the LM test, the corresponding score vector is ∇_α L_T(y^T, x^T, ζ^T; θ) = (1/T) Σ_{t=1}^T [h_1(ζ_t′α)ζ_t/(2h(ζ_t′α))] [(y_t − x_t′β)²/h(ζ_t′α) − 1], where h_1(η) = dh(η)/dη. Under the null, h_1(ζ_t′α) = h_1(α_0) =: c. The constrained specification is y_t | x_t, ζ_t ~ N(x_t′β, σ²), and the constrained QMLEs are the OLS estimator β̂_T and σ̃²_T = Σ_t ê_t²/T. The score vector evaluated at the constrained QMLEs is ∇_α L_T(y^T, x^T, ζ^T; θ̃_T) = (c/T) Σ_{t=1}^T [ζ_t/(2σ̃²_T)] (ê_t²/σ̃²_T − 1).

39 It can be shown that the (p + 1) × (p + 1) block of the Hessian matrix corresponding to α is (1/T) Σ_{t=1}^T { [1/(2h(ζ_t′α)²) − (y_t − x_t′β)²/h(ζ_t′α)³] [h_1(ζ_t′α)]² + [(y_t − x_t′β)²/(2h(ζ_t′α)²) − 1/(2h(ζ_t′α))] h_2(ζ_t′α) } ζ_tζ_t′, where h_2(η) = dh_1(η)/dη. Evaluating the expectation of this block at θ_o = (β_o′ α_0 0′)′ and noting that σ_o² = h(α_0), we have −(c²/(2(σ_o²)²)) (1/T) Σ_{t=1}^T IE(ζ_tζ_t′).

40 This block of the expected Hessian matrix can be consistently estimated by −(c²/(2(σ̃²_T)²)) (1/T) Σ_t ζ_tζ_t′. Setting d_t = ê_t²/σ̃²_T − 1, the LM statistic under the information matrix equality is LM_T = (1/2) (Σ_t d_tζ_t′)(Σ_t ζ_tζ_t′)^{-1}(Σ_t ζ_td_t) →D χ²(p), i.e., one half of the (centered) regression sum of squares (RSS) from regressing d_t on ζ_t.

41 Remarks: In the Breusch-Pagan test, the function h does not show up in the statistic. As such, this test is capable of testing general conditional heteroskedasticity without specifying a functional form for h. When ζ_t contains the squares and all cross-product terms of the non-constant elements of x_t, i.e., x_{it}² and x_{it}x_{jt} (let there be n such terms), the resulting Breusch-Pagan test is also the test of heteroskedasticity of unknown form due to White (1980) and has the limiting distribution χ²(n). The Breusch-Pagan test is valid when the information matrix equality holds. Thus, the Breusch-Pagan test is not robust to dynamic misspecification, e.g., when the errors are serially correlated.

42 Koenker (1981): As (1/T) Σ_t ê_t⁴ → 3(σ_o²)² in probability under the null, (1/T) Σ_t d_t² = (1/T) Σ_t ê_t⁴/(σ̃²_T)² − (2/T) Σ_t ê_t²/σ̃²_T + 1 → 2 in probability. Thus, the test below is asymptotically equivalent to the original Breusch-Pagan test: LM_T = T (Σ_t d_tζ_t′)(Σ_t ζ_tζ_t′)^{-1}(Σ_t ζ_td_t) / (Σ_t d_t²), which can be computed as T R², where R² is obtained from regressing d_t on ζ_t. As Σ_{t=1}^T d_t = 0, the centered and non-centered R² are equivalent. Thus, the Breusch-Pagan test can be computed as T R² from the regression of ê_t² on ζ_t. (Why?)
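
A sketch (added here) of the Koenker form of the Breusch-Pagan test: regress the squared OLS residuals on the variance regressors ζ_t (a constant is appended inside the function) and refer T R² to χ²(p).

```python
# Breusch-Pagan test, Koenker's T*R^2 version: regress squared OLS residuals on zeta.
import numpy as np
from scipy import stats

def breusch_pagan(y, X, Z):
    """X: mean regressors (including a constant); Z: the p non-constant variance
    regressors zeta_t; a constant is added to the auxiliary regression below."""
    T = y.shape[0]
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    d = e**2
    Zc = np.column_stack([np.ones(T), Z])
    fitted = Zc @ np.linalg.lstsq(Zc, d, rcond=None)[0]
    R2 = np.sum((fitted - d.mean())**2) / np.sum((d - d.mean())**2)   # centered R^2
    LM = T * R2
    return LM, stats.chi2.sf(LM, df=Z.shape[1])
```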

43 Example (Breusch-Godfrey Test): Given the specification y_t | x_t ~ N(x_t′β, σ²), we are interested in testing whether the errors y_t − x_t′β are serially uncorrelated. For AR(1) errors: y_t − x_t′β = ρ(y_{t−1} − x_{t−1}′β) + u_t, with |ρ| < 1 and {u_t} a white noise; the null hypothesis is ρ = 0. Consider a general specification that admits serial correlation: y_t | y_{t−1}, x_t, x_{t−1} ~ N(x_t′β + ρ(y_{t−1} − x_{t−1}′β), σ_u²). Under the null, y_t | x_t ~ N(x_t′β, σ²), and the constrained QMLE of β is the OLS estimator β̂_T. Testing the null hypothesis amounts to testing whether y_{t−1} − x_{t−1}′β should be included in the mean specification.

44 When the information matrix equality holds, the LM test can be computed as T R², where R² is from the regression of ê_t = y_t − x_t′β̂_T on x_t and y_{t−1} − x_{t−1}′β. Replacing β with its constrained estimator β̂_T, we can also obtain R² from the regression of ê_t on x_t and ê_{t−1}. This test has the limiting χ²(1) distribution under the null. The Breusch-Godfrey test extends straightforwardly to checking whether the errors follow an AR(p) process: regressing ê_t on x_t and ê_{t−1}, ..., ê_{t−p}, the resulting T R² is the LM test when the information matrix equality holds and has a limiting χ²(p) distribution. Remark: If there is neglected conditional heteroskedasticity, the information matrix equality fails, and the Breusch-Godfrey test no longer has a limiting χ² distribution.
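
The AR(p)/MA(p) version of the Breusch-Godfrey test can be computed as in the following added sketch (pre-sample lagged residuals are set to zero, one common convention; X is assumed to contain a constant).

```python
# Breusch-Godfrey test: regress OLS residuals on X and p lagged residuals; T * R^2.
import numpy as np
from scipy import stats

def breusch_godfrey(y, X, p):
    T = y.shape[0]
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    lags = np.column_stack(
        [np.concatenate([np.zeros(i), e[:T - i]]) for i in range(1, p + 1)]
    )
    W = np.column_stack([X, lags])
    fitted = W @ np.linalg.lstsq(W, e, rcond=None)[0]
    R2 = (fitted @ fitted) / (e @ e)     # e has mean zero when X contains a constant
    LM = T * R2
    return LM, stats.chi2.sf(LM, df=p)
```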

45 If the specification is y_t − x_t′β = u_t + αu_{t−1}, i.e., the errors follow an MA(1) process, we can write y_t | x_t, u_{t−1} ~ N(x_t′β + αu_{t−1}, σ_u²). The null hypothesis is α = 0. Again, the constrained specification is the standard linear regression model y_t = x_t′β + u_t, and the constrained QMLE of β is still the OLS estimator β̂_T. The LM test of α = 0 can be computed as T R², with R² obtained from the regression of û_t = y_t − x_t′β̂_T on x_t and û_{t−1}. This is identical to the LM test for AR(1) errors. Similarly, the Breusch-Godfrey test for MA(p) errors is also the same as that for AR(p) errors.

46 Likelihood Ratio Test. The likelihood ratio (LR) test compares the log-likelihoods of the constrained and unconstrained specifications: LR_T = 2T [L_T(θ̂_T) − L_T(θ̃_T)]. Recalling from the discussion of the LM test that √T(θ̃_T − θ*) = √T(θ̂_T − θ*) − H_T(θ*)^{-1} R′ √T λ̃_T + o_IP(1) and √T λ̃_T = [R H_T(θ*)^{-1} R′]^{-1} R √T(θ̂_T − θ*) + o_IP(1), we have √T(θ̂_T − θ̃_T) = H_T(θ*)^{-1} R′ [R H_T(θ*)^{-1} R′]^{-1} R √T(θ̂_T − θ*) + o_IP(1).

47 By a Taylor expansion of L_T(θ̃_T) about θ̂_T, we have LR_T = 2T [L_T(θ̂_T) − L_T(θ̃_T)] = −2T ∇L_T(θ̂_T)′(θ̃_T − θ̂_T) − T (θ̃_T − θ̂_T)′ H_T(θ̂_T)(θ̃_T − θ̂_T) + o_IP(1) = −T (θ̂_T − θ̃_T)′ H_T(θ*)(θ̂_T − θ̃_T) + o_IP(1), because ∇L_T(θ̂_T) = 0. It follows that LR_T = −T (θ̂_T − θ*)′ R′ [R H_T(θ*)^{-1} R′]^{-1} R (θ̂_T − θ*) + o_IP(1). The first term on the right-hand side has an asymptotic χ²(q) distribution provided that the information matrix equality holds. (Why?)

48 Remarks: The three classical large-sample tests check different aspects of the likelihood function. The Wald test checks whether Rθ̂_T is close to r; the LM test checks whether ∇L_T(θ̃_T) is close to zero; the LR test checks whether the constrained and unconstrained log-likelihood values are close to each other. The Wald and LM tests are based on unconstrained and constrained estimation results, respectively, but the LR test requires both. The Wald and LM tests can be made robust to misspecification by employing a suitable consistent estimator of the asymptotic covariance matrix. Yet, the LR test is valid only when the information matrix equality holds.

49 Testing Non-Nested Models. Consider testing the non-nested specifications: H_0: y_t | x_t, ξ_t ~ f(y_t | x_t; θ), θ ∈ Θ ⊆ R^p, versus H_1: y_t | x_t, ξ_t ~ ϕ(y_t | ξ_t; ψ), ψ ∈ Ψ ⊆ R^q, where x_t and ξ_t are two sets of variables. These specifications are non-nested because they cannot be derived from each other by imposing restrictions on the parameters. The encompassing principle of Mizon (1984) and Mizon and Richard (1986) asserts that, if the model under the null is true, it should encompass the model under the alternative, such that a statistic of the alternative model should be close to its pseudo-true value, the probability limit evaluated under the null model.

50 Wald Encompassing Test. An encompassing test for non-nested hypotheses is based on the difference between a chosen statistic and the sample counterpart of its pseudo-true value. When the chosen statistic is the QMLE of the alternative model, the resulting test is the Wald encompassing test (WET). Consider now the non-nested specifications of the conditional mean function: H_0: y_t | x_t, ξ_t ~ N(x_t′β, σ²), β ∈ B ⊆ R^k, versus H_1: y_t | x_t, ξ_t ~ N(ξ_t′δ, σ²), δ ∈ D ⊆ R^r, where x_t and ξ_t do not have elements in common. Let β̂_T and δ̂_T denote the QMLEs of the parameters in the null and alternative models.

51 Under the null, y_t | x_t, ξ_t ~ N(x_t′β_o, σ_o²), IE(ξ_ty_t) = IE(ξ_tx_t′)β_o, and hence δ̂_T = ((1/T) Σ_t ξ_tξ_t′)^{-1} ((1/T) Σ_t ξ_ty_t) → M_ξξ^{-1} M_ξx β_o in probability, where M_ξξ = lim (1/T) Σ_t IE(ξ_tξ_t′) and M_ξx = lim (1/T) Σ_t IE(ξ_tx_t′). Clearly, the pseudo-true parameter δ(β_o) = M_ξξ^{-1} M_ξx β_o would not be the probability limit of δ̂_T if x_t′β is an incorrect specification of the conditional mean. Thus, whether the difference between δ̂_T and the sample counterpart of δ(β_o) is sufficiently close to zero constitutes evidence for or against the null hypothesis.

52 The WET is then based on δ̂_T − δ̂_T(β̂_T) = δ̂_T − ((1/T) Σ_t ξ_tξ_t′)^{-1} ((1/T) Σ_t ξ_tx_t′) β̂_T = ((1/T) Σ_t ξ_tξ_t′)^{-1} (1/T) Σ_t ξ_t(y_t − x_t′β̂_T), which in effect checks whether ε_t = y_t − x_t′β and ξ_t are correlated. Letting ê_t = y_t − x_t′β̂_T, we have (1/√T) Σ_t ξ_tê_t = (1/√T) Σ_t ξ_tε_t − ((1/T) Σ_t ξ_tx_t′) √T(β̂_T − β_o), which is (1/√T) Σ_t ξ_tε_t − ((1/T) Σ_t ξ_tx_t′)((1/T) Σ_t x_tx_t′)^{-1} (1/√T) Σ_t x_tε_t.

53 Setting ξ̂_t = M_ξx M_xx^{-1} x_t, we can write (1/√T) Σ_t ξ_tê_t = (1/√T) Σ_t (ξ_t − ξ̂_t)ε_t + o_IP(1). Under the null hypothesis, (1/√T) Σ_t ξ_tê_t →D N(0, Σ_o), where Σ_o = σ_o² (lim (1/T) Σ_t IE[(ξ_t − ξ̂_t)(ξ_t − ξ̂_t)′]) = σ_o² (M_ξξ − M_ξx M_xx^{-1} M_xξ).

54 Consequently, √T [δ̂_T − δ̂_T(β̂_T)] →D N(0, M_ξξ^{-1} Σ_o M_ξξ^{-1}), and hence T [δ̂_T − δ̂_T(β̂_T)]′ M_ξξ Σ_o^{-1} M_ξξ [δ̂_T − δ̂_T(β̂_T)] →D χ²(r). A consistent estimator of Σ_o is Σ̂_T = σ̂²_T [(1/T) Σ_t ξ_tξ_t′ − ((1/T) Σ_t ξ_tx_t′)((1/T) Σ_t x_tx_t′)^{-1}((1/T) Σ_t x_tξ_t′)].

55 The WET statistic reads WET_T = T [δ̂_T − δ̂_T(β̂_T)]′ ((1/T) Σ_t ξ_tξ_t′) Σ̂_T^{-1} ((1/T) Σ_t ξ_tξ_t′) [δ̂_T − δ̂_T(β̂_T)] →D χ²(r). When x_t and ξ_t have s (s < r) elements in common, Σ_t ξ_tê_t has s elements that are identically zero. Thus, rank(Σ_o) = r − s < r, and the WET should be computed with Σ̂_T^{-1} replaced by Σ̂_T^−, the generalized inverse of Σ̂_T.

56 Score Encompassing Test. Under H_1: y_t | x_t, ξ_t ~ N(ξ_t′δ, σ²), the score function evaluated at the pseudo-true parameter δ(β_o) is (apart from a constant σ^{-2}) (1/T) Σ_t ξ_t[y_t − ξ_t′δ(β_o)] = (1/T) Σ_t ξ_t[y_t − ξ_t′(M_ξξ^{-1} M_ξx β_o)]. When the pseudo-true parameter is replaced by its estimator δ̂_T(β̂_T), the score function becomes (1/T) Σ_t ξ_ty_t − ((1/T) Σ_t ξ_tξ_t′)((1/T) Σ_t ξ_tξ_t′)^{-1}((1/T) Σ_t ξ_tx_t′) β̂_T = (1/T) Σ_t ξ_tê_t.

57 The score encompassing test (SET) is based on (1/√T) Σ_t ξ_tê_t →D N(0, Σ_o), and the SET statistic is (1/T) (Σ_t ê_tξ_t′) Σ̂_T^{-1} (Σ_t ξ_tê_t) →D χ²(r). The WET and SET are based on the same ingredient, and both require evaluating the pseudo-true parameter. The WET and SET are difficult to implement when the pseudo-true parameter is not readily derived; this may happen when, e.g., the QMLE does not have an analytic form. (Examples?)

58 Pseudo-True Score Encompassing Test. Chen and Kuan (2002) propose the pseudo-true score encompassing test (PSET), which is based on the pseudo-true value of the score function under the alternative. Although it may be difficult to evaluate the pseudo-true value of a QMLE when it does not have a closed form, it is easier to evaluate the pseudo-true score because the analytic form of the score function is usually available. The null and alternative hypotheses are: H_0: y_t | x_t, ξ_t ~ f(y_t | x_t; θ), θ ∈ Θ ⊆ R^p, versus H_1: y_t | x_t, ξ_t ~ ϕ(y_t | ξ_t; ψ), ψ ∈ Ψ ⊆ R^q.

59 Let s_{f,t}(θ) = ∇_θ log f(y_t | x_t; θ), s_{ϕ,t}(ψ) = ∇_ψ log ϕ(y_t | ξ_t; ψ), and ∇L_{f,T}(θ) = (1/T) Σ_t s_{f,t}(θ), ∇L_{ϕ,T}(ψ) = (1/T) Σ_t s_{ϕ,t}(ψ). The pseudo-true score function of ∇L_{ϕ,T}(ψ) is J_ϕ(θ, ψ) := lim IE_{f(θ)}[∇L_{ϕ,T}(ψ)], where IE_{f(θ)} denotes the expectation taken under the null model with parameter θ. As the pseudo-true parameter ψ(θ_o) is the KLIC minimizer when the null hypothesis is correctly specified, we have J_ϕ(θ_o, ψ(θ_o)) = 0. Thus, whether Ĵ_ϕ(θ̂_T, ψ̂_T) is close to zero constitutes evidence for or against the null hypothesis.

60 Following Wooldridge (1990), we can incorporate the null model into the score function and write ∇L_{ϕ,T}(θ, ψ) = (1/T) Σ_t d_{1,t}(θ, ψ) + (1/T) Σ_t d_{2,t}(θ, ψ) c_t(θ), where IE_{f(θ)}[c_t(θ) | x_t, ξ_t] = 0. As such, J_ϕ(θ, ψ) = lim (1/T) Σ_t IE_{f(θ)}[d_{1,t}(θ, ψ)], and its sample counterpart is Ĵ_ϕ(θ̂_T, ψ̂_T) = (1/T) Σ_t d_{1,t}(θ̂_T, ψ̂_T) = −(1/T) Σ_t d_{2,t}(θ̂_T, ψ̂_T) c_t(θ̂_T), where the second equality follows because ∇L_{ϕ,T}(ψ̂_T) = 0.

61 For non-nested, linear specifications of the conditional mean, we can incorporate the null model (y_t = x_t′β + ε_t) into the score and obtain ∇_δL_{ϕ,T}(θ, ψ) = (1/T) Σ_t ξ_t(x_t′β − ξ_t′δ) + (1/T) Σ_t ξ_tε_t, with d_{1,t}(θ, ψ) = ξ_t(x_t′β − ξ_t′δ), d_{2,t}(θ, ψ) = ξ_t, and c_t(θ) = ε_t. Then, Ĵ_ϕ(θ̂_T, ψ̂_T) = (1/T) Σ_t ξ_t(x_t′β̂_T − ξ_t′δ̂_T) = −(1/T) Σ_t ξ_tê_t, as in the SET discussed earlier.

62 For nonlinear specifications, it can be seen that Ĵ_ϕ(θ̂_T, ψ̂_T) = (1/T) Σ_t ∇_δμ(ξ_t, δ̂_T)[m(x_t, β̂_T) − μ(ξ_t, δ̂_T)] = −(1/T) Σ_t ∇_δμ(ξ_t, δ̂_T) ê_t, with ê_t = y_t − m(x_t, β̂_T) the nonlinear OLS residuals. Yet, it is not easy to compute the SET here because evaluating the pseudo-true value of δ̂_T is a formidable task.

63 The linear expansion of √T Ĵ_ϕ(θ̂_T, ψ̂_T) about (θ_o, ψ(θ_o)) is √T Ĵ_ϕ(θ̂_T, ψ̂_T) ≈ −(1/√T) Σ_t d_{2,t}(θ_o, ψ(θ_o)) c_t(θ_o) − A_o √T(θ̂_T − θ_o), where A_o = lim (1/T) Σ_t IE_{f(θ_o)}[d_{2,t}(θ_o, ψ(θ_o)) ∇_θ c_t(θ_o)]. Note that the other terms in the expansion that involve c_t vanish in the limit because they have zero mean. Recall also that √T(θ̂_T − θ_o) = −H_T(θ_o)^{-1} (1/√T) Σ_t s_{f,t}(θ_o) + o_IP(1), where IE_{f(θ_o)}[s_{f,t}(θ_o) | x_t, ξ_t] = 0.

64 Collecting terms, we have √T Ĵ_ϕ(θ̂_T, ψ̂_T) = (1/√T) Σ_t b_t(θ_o, ψ(θ_o)) + o_IP(1), where b_t(θ_o, ψ(θ_o)) = −d_{2,t}(θ_o, ψ(θ_o)) c_t(θ_o) + A_o H_T(θ_o)^{-1} s_{f,t}(θ_o) and IE_{f(θ_o)}[b_t(θ_o, ψ(θ_o)) | x_t, ξ_t] = 0. By invoking a suitable CLT, √T Ĵ_ϕ(θ̂_T, ψ̂_T) has a limiting normal distribution with asymptotic covariance matrix Σ_o = lim (1/T) Σ_t IE_{f(θ_o)}[b_t(θ_o, ψ(θ_o)) b_t(θ_o, ψ(θ_o))′], which can be consistently estimated by Σ̂_T = (1/T) Σ_t b_t(θ̂_T, ψ̂_T) b_t(θ̂_T, ψ̂_T)′.

65 It follows that the PSET statistic is PSET_T = T Ĵ_ϕ(θ̂_T, ψ̂_T)′ Σ̂_T^− Ĵ_ϕ(θ̂_T, ψ̂_T) →D χ²(k), where k is the rank of Σ_o and Σ̂_T^− is the generalized inverse of Σ̂_T. The PSET can be understood as an extension of the conditional mean encompassing test of Wooldridge (1990).

66 Example: Consider non-nested specifications of the conditional variance: H_0: y_t | x_t, ξ_t ~ N(0, h(x_t, α)), versus H_1: y_t | x_t, ξ_t ~ N(0, κ(ξ_t, γ)). When h_t and κ_t are evaluated at the respective QMLEs α̂_T and γ̂_T, we write ĥ_t and κ̂_t. It can be verified that s_{h,t}(α) = [∇_αh_t/(2h_t²)](y_t² − h_t) and s_{κ,t}(γ) = [∇_γκ_t/(2κ_t²)](y_t² − κ_t) = [∇_γκ_t/(2κ_t²)](h_t − κ_t) + [∇_γκ_t/(2κ_t²)](y_t² − h_t), where the first term is d_{1,t} and, in the second term, d_{2,t} = ∇_γκ_t/(2κ_t²) and c_t = y_t² − h_t, with IE_{f(θ)}(y_t² − h_t | x_t, ξ_t) = 0.

67 The sample counterpart of the pseudo-true score function is thus −(1/T) Σ_t [∇_γκ̂_t/(2κ̂_t²)](y_t² − ĥ_t). Thus, the PSET amounts to checking whether ∇_γκ_t/(2κ_t²) is correlated with the generalized errors y_t² − h_t.

68 Binary Choice Models. Consider the binary dependent variable y_t, which equals 1 with probability IP(y_t = 1 | x_t) and 0 with probability 1 − IP(y_t = 1 | x_t). The density function of y_t given x_t is g(y_t | x_t) = IP(y_t = 1 | x_t)^{y_t} [1 − IP(y_t = 1 | x_t)]^{1−y_t}. Approximating IP(y_t = 1 | x_t) by F(x_t; θ), the quasi-likelihood function is f(y_t | x_t; θ) = F(x_t; θ)^{y_t} [1 − F(x_t; θ)]^{1−y_t}. The QMLE θ̂_T is obtained by maximizing L_T(θ) = (1/T) Σ_t [y_t log F(x_t; θ) + (1 − y_t) log(1 − F(x_t; θ))].

69 Probit model: F(x_t; θ) = Φ(x_t′θ) = ∫_{−∞}^{x_t′θ} (2π)^{-1/2} e^{−v²/2} dv, where Φ denotes the standard normal distribution function. Logit model: F(x_t; θ) = G(x_t′θ) = 1/[1 + exp(−x_t′θ)] = exp(x_t′θ)/[1 + exp(x_t′θ)], where G is the logistic distribution function with mean zero and variance π²/3. Note that the logistic distribution is more peaked around its mean and has slightly thicker tails than the standard normal distribution.
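
The logit QMLE can be obtained by direct numerical maximization of L_T(θ), as in the following added sketch; the simulated data, sample size, and optimizer choice are illustrative only.

```python
# Logit QMLE: maximize (1/T) sum [ y log G(x'theta) + (1 - y) log(1 - G(x'theta)) ].
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit               # logistic cdf G(u) = 1/(1 + exp(-u))

rng = np.random.default_rng(1)
T = 1000
X = np.column_stack([np.ones(T), rng.normal(size=T)])
theta_true = np.array([0.3, -1.0])
y = (rng.uniform(size=T) < expit(X @ theta_true)).astype(float)

def neg_loglik(theta):
    p = expit(X @ theta)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

theta_hat = minimize(neg_loglik, np.zeros(2), method="BFGS").x
print(theta_hat)                              # close to theta_true in large samples
```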

70 Global Concavity. The log-likelihood function L_T is globally concave provided that ∇²_θL_T(θ) is negative definite for all θ ∈ Θ. By a second-order Taylor expansion, L_T(θ) = L_T(θ̂_T) + ∇_θL_T(θ̂_T)′(θ − θ̂_T) + (1/2)(θ − θ̂_T)′∇²_θL_T(θ†)(θ − θ̂_T) = L_T(θ̂_T) + (1/2)(θ − θ̂_T)′∇²_θL_T(θ†)(θ − θ̂_T), where θ† is between θ and θ̂_T and the gradient term vanishes because ∇_θL_T(θ̂_T) = 0. Global concavity then implies that L_T(θ) must be less than L_T(θ̂_T) for any θ ≠ θ̂_T in Θ. Thus, θ̂_T must be a global maximizer.

71 For the logit model, as G′(u) = G(u)[1 − G(u)], we have ∇_θL_T(θ) = (1/T) Σ_t [y_t G′(x_t′θ)/G(x_t′θ) − (1 − y_t) G′(x_t′θ)/(1 − G(x_t′θ))] x_t = (1/T) Σ_t {y_t[1 − G(x_t′θ)] − (1 − y_t)G(x_t′θ)} x_t = (1/T) Σ_t [y_t − G(x_t′θ)] x_t, from which we can solve for the QMLE θ̂_T. Note that ∇²_θL_T(θ) = −(1/T) Σ_t G(x_t′θ)[1 − G(x_t′θ)] x_tx_t′, which is negative definite, so that L_T is globally concave in θ.

72 For the probit model, ∇_θL_T(θ) = (1/T) Σ_t [y_t φ(x_t′θ)/Φ(x_t′θ) − (1 − y_t) φ(x_t′θ)/(1 − Φ(x_t′θ))] x_t = (1/T) Σ_t {[y_t − Φ(x_t′θ)]/(Φ(x_t′θ)[1 − Φ(x_t′θ)])} φ(x_t′θ) x_t, where φ is the standard normal density function. It can be verified that ∇²_θL_T(θ) = −(1/T) Σ_t { y_t [φ(x_t′θ) + x_t′θ Φ(x_t′θ)]/Φ(x_t′θ)² + (1 − y_t) [φ(x_t′θ) − x_t′θ(1 − Φ(x_t′θ))]/[1 − Φ(x_t′θ)]² } φ(x_t′θ) x_tx_t′, which is also negative definite; see, e.g., Amemiya (1985).
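
Because both log-likelihoods are globally concave, Newton's method with the analytic score and Hessian converges reliably. The sketch below (added; the inputs y and X are assumed given) implements the logit case using the formulas from the previous slide.

```python
# Newton-Raphson for the logit QMLE with analytic score and Hessian:
#   grad = (1/T) sum (y_t - G(x_t'theta)) x_t
#   hess = -(1/T) sum G(x_t'theta)(1 - G(x_t'theta)) x_t x_t'
import numpy as np
from scipy.special import expit

def logit_newton(y, X, tol=1e-10, max_iter=50):
    T, k = X.shape
    theta = np.zeros(k)
    for _ in range(max_iter):
        G = expit(X @ theta)
        grad = X.T @ (y - G) / T
        hess = -(X * (G * (1 - G))[:, None]).T @ X / T
        theta = theta - np.linalg.solve(hess, grad)   # Newton step for a maximum
        if np.max(np.abs(grad)) < tol:
            break
    return theta
```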

73 As IE(y_t | x_t) = IP(y_t = 1 | x_t), we can write y_t = F(x_t; θ) + e_t. Note that when F is correctly specified for the conditional mean, y_t is conditionally heteroskedastic with var(y_t | x_t) = IP(y_t = 1 | x_t)[1 − IP(y_t = 1 | x_t)]. Thus, the probit and logit models are also different nonlinear mean specifications with conditional heteroskedasticity. The NLS estimator of θ is inefficient because the NLS objective function ignores conditional heteroskedasticity. A weighted NLS estimator that takes the conditional variance into account is still inefficient. (Why?)

75 Marginal response: For the probit model, ∂Φ(x_t′θ)/∂x_{tj} = φ(x_t′θ) θ_j; for the logit model, ∂G(x_t′θ)/∂x_{tj} = G(x_t′θ)[1 − G(x_t′θ)] θ_j. These marginal effects all depend on x_t. It is typical to evaluate the marginal response at a particular value of x_t, such as x_t = 0 or x_t = x̄, the sample average of the x_t. When x_t = 0, φ(0) ≈ 0.4 and G(0)[1 − G(0)] = 0.25. This suggests that the QMLE for the logit model is approximately 1.6 times the QMLE for the probit model when the x_t are close to zero.

76 Letting p = IP(y = 1 | x), p/(1 − p) is the odds ratio: the probability of y = 1 relative to the probability of y = 0. For the logit model, the log odds ratio is ln(p_t/(1 − p_t)) = ln(exp(x_t′θ)) = x_t′θ, so that ∂ln(p_t/(1 − p_t))/∂x_{tj} = θ_j. As the effect of a regressor on the log odds ratio, each coefficient in the logit model can also be understood as a semi-elasticity.

77 Measure of Goodness of Fit. Digression: Given the objective function Q_T, let Q_T(c) denote the value of Q_T when the model contains only a constant term, Q_T* the largest possible value of Q_T (if it exists), and Q_T(θ̂_T) the value of Q_T for the fitted model. Then, the R² of relative gain is R²_RG = [Q_T(θ̂_T) − Q_T(c)]/[Q_T* − Q_T(c)] = 1 − [Q_T* − Q_T(θ̂_T)]/[Q_T* − Q_T(c)]. For binary choice models, Q_T = L_T, and the largest possible value is L_T* = 0, attained when the predicted probability of the observed outcome equals one for every t. The measure of McFadden (1974) is R²_RG = 1 − L_T(θ̂_T)/L_T(ȳ) = 1 − Σ_t [y_t ln p̂_t + (1 − y_t) ln(1 − p̂_t)] / (T [ȳ ln ȳ + (1 − ȳ) ln(1 − ȳ)]), where p̂_t = G(x_t′θ̂_T) or p̂_t = Φ(x_t′θ̂_T).
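
A short added sketch of McFadden's measure, computed from the fitted probabilities p̂_t of a probit or logit model:

```python
# McFadden's pseudo-R^2: 1 - loglik(fitted model) / loglik(constant-only model).
import numpy as np

def mcfadden_r2(y, p_hat):
    ll_fit = np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
    ybar = y.mean()
    ll_const = len(y) * (ybar * np.log(ybar) + (1 - ybar) * np.log(1 - ybar))
    return 1.0 - ll_fit / ll_const
```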

78 Latent Index Model. Assume that y_t is determined by the latent index variable y_t*: y_t = 1 if y_t* > 0, and y_t = 0 if y_t* ≤ 0, where y_t* = x_t′β + e_t. Thus, IP(y_t = 1 | x_t) = IP(y_t* > 0 | x_t) = IP(e_t > −x_t′β | x_t), which is also IP(e_t < x_t′β | x_t) provided that e_t is symmetric about zero. The probit specification Φ or the logit specification G can then be understood as a specification of the conditional distribution of e_t.

79 Multinomial Models. Consider J + 1 mutually exclusive choices that do not have a natural ordering, e.g., employment status and commuting mode. The dependent variable y_t takes on the values j = 0, 1, ..., J, each with probability IP(y_t = j | x_t). Define the new binary variables d_{t,j}, j = 0, 1, ..., J, as d_{t,j} = 1 if y_t = j and d_{t,j} = 0 otherwise; note that Σ_{j=0}^J d_{t,j} = 1.

80 The density function of d_{t,0}, ..., d_{t,J} given x_t is then g(d_{t,0}, ..., d_{t,J} | x_t) = Π_{j=0}^J IP(y_t = j | x_t)^{d_{t,j}}. Approximating IP(y_t = j | x_t) by F_j(x_t; θ), we obtain the quasi-log-likelihood function L_T(θ) = (1/T) Σ_{t=1}^T Σ_{j=0}^J d_{t,j} ln F_j(x_t; θ). The first-order condition is ∇_θL_T(θ) = (1/T) Σ_{t=1}^T Σ_{j=0}^J d_{t,j} [1/F_j(x_t; θ)] ∇_θF_j(x_t; θ) = 0, from which we can solve for the QMLE θ̂_T.

81 Multinomial Logit Model. Common specifications of the conditional probabilities are F_j(x_t; θ) = G_{t,j} = exp(x_t′θ_j)/Σ_{k=0}^J exp(x_t′θ_k), j = 0, ..., J, where θ = (θ_0′ θ_1′ ... θ_J′)′ and x_t does not depend on the choices. Note, however, that the parameters are not identified because, for example, exp(x_t′θ_0)/[exp(x_t′θ_0) + Σ_{k=1}^J exp(x_t′θ_k)] = exp(x_t′(θ_0 + γ))/[exp(x_t′(θ_0 + γ)) + Σ_{k=1}^J exp(x_t′(θ_k + γ))] for any γ.

82 For normalization, we set θ_0 = 0, so that F_0(x_t; θ) = G_{t,0} = 1/[1 + Σ_{k=1}^J exp(x_t′θ_k)] and F_j(x_t; θ) = G_{t,j} = exp(x_t′θ_j)/[1 + Σ_{k=1}^J exp(x_t′θ_k)], j = 1, ..., J, with θ = (θ_1′ θ_2′ ... θ_J′)′. This leads to the multinomial logit model, and the quasi-log-likelihood function is L_T(θ) = (1/T) Σ_{t=1}^T Σ_{j=1}^J d_{t,j} x_t′θ_j − (1/T) Σ_{t=1}^T log(1 + Σ_{k=1}^J exp(x_t′θ_k)).

83 It is easy to see that ∇_{θ_j}G_{t,k} = −G_{t,k}G_{t,j} x_t for k ≠ j, and ∇_{θ_j}G_{t,j} = G_{t,j}[1 − G_{t,j}] x_t. It follows that ∇_{θ_j}L_T(θ) = (1/T) Σ_{t=1}^T (d_{t,j} − G_{t,j}) x_t, j = 1, ..., J. The Hessian matrix contains the blocks ∇_{θ_i}∇_{θ_j}L_T(θ) = (1/T) Σ_{t=1}^T G_{t,j}G_{t,i} x_tx_t′, i ≠ j, i, j = 1, ..., J, and ∇_{θ_j}∇_{θ_j}L_T(θ) = −(1/T) Σ_{t=1}^T G_{t,j}(1 − G_{t,j}) x_tx_t′, j = 1, ..., J.

84 The marginal responses of G_{t,j} to changes in x_t are ∇_{x_t}G_{t,0} = −G_{t,0} Σ_{i=1}^J G_{t,i}θ_i and ∇_{x_t}G_{t,j} = G_{t,j}(θ_j − Σ_{i=1}^J G_{t,i}θ_i), j = 1, ..., J. Thus, all coefficient vectors θ_i, i = 1, ..., J, enter ∇_{x_t}G_{t,j}. The log odds ratios (relative to the base choice j = 0) are ln(G_{t,j}/G_{t,0}) = x_t′(θ_j − θ_0) = x_t′θ_j, j = 1, ..., J, as θ_0 = 0 by construction. Thus, each coefficient θ_{j,k} can also be understood as the effect of x_{tk} on the log odds ratio ln(G_{t,j}/G_{t,0}).

85 Conditional Logit Model. In the conditional logit model, there are multiple choices with choice-dependent or alternative-varying regressors. For example, in choosing among several commuting modes, the regressors may include the in-vehicle time and waiting time, which vary with the vehicle. Let x_t = (x_{t,0}′ x_{t,1}′ ... x_{t,J}′)′. The specifications for the conditional probabilities are G_{t,j} = exp(x_{t,j}′θ)/Σ_{k=0}^J exp(x_{t,k}′θ), j = 0, 1, ..., J. In contrast with the multinomial model, θ is common to all j, so that there is no identification problem.

86 The quasi-log-likelihood function is L_T(θ) = (1/T) Σ_{t=1}^T Σ_{j=0}^J d_{t,j} x_{t,j}′θ − (1/T) Σ_{t=1}^T log(Σ_{k=0}^J exp(x_{t,k}′θ)). The gradient and Hessian matrix are ∇_θL_T(θ) = (1/T) Σ_{t=1}^T Σ_{j=0}^J (d_{t,j} − G_{t,j}) x_{t,j} and ∇²_θL_T(θ) = −(1/T) Σ_{t=1}^T [Σ_{j=0}^J G_{t,j} x_{t,j}x_{t,j}′ − (Σ_{j=0}^J G_{t,j}x_{t,j})(Σ_{j=0}^J G_{t,j}x_{t,j})′] = −(1/T) Σ_{t=1}^T Σ_{j=0}^J G_{t,j}(x_{t,j} − x̄_t)(x_{t,j} − x̄_t)′, where x̄_t = Σ_{j=0}^J G_{t,j}x_{t,j} is the weighted average of the x_{t,j}.

87 It can be shown that the Hessian matrix is negative definite, so that the quasi-log-likelihood function is globally concave. Each choice probability G_{t,j} is affected not only by x_{t,j} but also by the regressors of the other choices, x_{t,i}, because ∇_{x_{t,i}}G_{t,j} = −G_{t,j}G_{t,i}θ, i ≠ j, i = 0, ..., J, and ∇_{x_{t,j}}G_{t,j} = G_{t,j}(1 − G_{t,j})θ, j = 0, ..., J. Note that for positive θ, an increase in x_{t,j} increases the probability of the j-th choice but decreases the probabilities of the other choices.

88 Random Utility Interpretation. McFadden (1974): Consider the random utility of choice j: U_{t,j} = V_{t,j} + ε_{t,j}, j = 0, 1, ..., J. The alternative i would be chosen if it yields the highest utility: IP(y_t = i | x_t) = IP(U_{t,i} > U_{t,j} for all j ≠ i | x_t) = IP(ε_{t,j} − ε_{t,i} < V_{t,i} − V_{t,j} for all j ≠ i | x_t). Letting ε̃_{t,ji} = ε_{t,j} − ε_{t,i} and Ṽ_{t,ji} = V_{t,j} − V_{t,i}, we have (for i = 0) IP(y_t = 0 | x_t) = IP(ε̃_{t,j0} < −Ṽ_{t,j0}, j = 1, 2, ..., J | x_t) = ∫_{−∞}^{−Ṽ_{t,10}} ... ∫_{−∞}^{−Ṽ_{t,J0}} f(ε̃_{t,10}, ..., ε̃_{t,J0}) dε̃_{t,10} ... dε̃_{t,J0}.

89 This setup is consistent with the theory of decision making. Different models are obtained by imposing different assumptions on the joint distribution of the ε_{t,j}. Suppose that the ε_{t,j} are independent random variables across j with the type I extreme value distribution exp[−exp(−ε_{t,j})] and density f(ε_{t,j}) = exp(−ε_{t,j}) exp[−exp(−ε_{t,j})], j = 0, 1, ..., J. It can be shown that IP(y_t = j | x_t) = exp(V_{t,j})/Σ_{i=0}^J exp(V_{t,i}). We obtain the conditional logit model when V_{t,j} = x_{t,j}′θ and the multinomial logit model when V_{t,j} = x_t′θ_j.

90 Remarks: 1 In the conditional logit model, the choices must be quite different so that they are independent of each other. This is known as independence of irrelevant alternatives (IIA), which is a restrictive condition in practice. 2 The multinomial probit model is obtained by assuming that the ε_{t,j} are jointly normally distributed. This model permits correlations among the choices, yet it requires numerical or simulation methods to evaluate the J-fold integral for the choice probabilities.

91 Nested Logit Models. McFadden (1978): Generalized extreme value (GEV) distribution with F(ε_0, ε_1, ..., ε_J) = exp[−G(e^{−ε_0}, e^{−ε_1}, ..., e^{−ε_J})], where G is non-negative, homogeneous of degree one, and satisfies some other conditions. With this distributional assumption, the choice probabilities of the random utility model are e^{V_j} G_j(e^{V_0}, e^{V_1}, ..., e^{V_J}) / G(e^{V_0}, e^{V_1}, ..., e^{V_J}), j = 0, 1, ..., J, where G_j denotes the derivative of G with respect to its j-th argument. In particular, when G(e^{V_0}, e^{V_1}, ..., e^{V_J}) = Σ_{k=0}^J exp(V_k), we obtain the multinomial logit model.

92 Consider the case where there are J groups of choices, in which the j-th group has choices 1, ..., K_j. The random utility is U_{t,jk} = V_{t,jk} + ε_{t,jk}, j = 1, ..., J, k = 1, ..., K_j. For example, one must first choose between taking public transportation and driving and then select a vehicle within the chosen group. The nested logit model postulates the GEV distribution for the ε_{t,jk}: exp[−G(e^{−ε_{11}}, ..., e^{−ε_{1K_1}}, ..., e^{−ε_{J1}}, ..., e^{−ε_{JK_J}})] = exp[−Σ_{j=1}^J (Σ_{k=1}^{K_j} e^{−ε_{jk}/r_j})^{r_j}], where r_j = [1 − corr(ε_{jk}, ε_{jm})]^{1/2}. When the choices within the j-th group are uncorrelated, r_j = 1, and we obtain the multinomial logit model.

93 Define the binary variable d_{t,jk} as d_{t,jk} = 1 if y_t = jk and d_{t,jk} = 0 otherwise, j = 1, ..., J, k = 1, ..., K_j. A particular choice jk is chosen with probability p_{t,jk} = IP(d_{t,jk} = 1 | x_t) = IP(U_{t,jk} > U_{t,mn} for all (m, n) ≠ (j, k) | x_t). Let x_t be the collection of the z_{t,j} and w_{t,jk}, and let V_{t,jk} = z_{t,j}′α + w_{t,jk}′γ_j, j = 1, ..., J, k = 1, ..., K_j. Thus, we have group-specific regressors and regressors that depend on both j and k.

94 Let I_{t,j} = ln(Σ_{k=1}^{K_j} exp(w_{t,jk}′γ_j/r_j)) denote the inclusive value, where the r_j are known as scale parameters. We can write p_{t,jk} = p_{t,j} p_{t,k|j}, with p_{t,j} = exp(z_{t,j}′α + r_jI_{t,j}) / Σ_{m=1}^J exp(z_{t,m}′α + r_mI_{t,m}) and p_{t,k|j} = exp(w_{t,jk}′γ_j/r_j) / Σ_{n=1}^{K_j} exp(w_{t,jn}′γ_j/r_j); details can be found in Cameron and Trivedi (2005). Note that for a given j, p_{t,k|j} is a conditional logit model with the parameter γ_j/r_j.

95 The approximating density is f = Π_{j=1}^J Π_{k=1}^{K_j} (p_{t,j} p_{t,k|j})^{d_{t,jk}} = Π_{j=1}^J [p_{t,j}^{Σ_k d_{t,jk}} Π_{k=1}^{K_j} p_{t,k|j}^{d_{t,jk}}], and the quasi-log-likelihood function is L_T = (1/T) Σ_{t=1}^T Σ_{j=1}^J Σ_{k=1}^{K_j} d_{t,jk} ln(p_{t,j}) + (1/T) Σ_{t=1}^T Σ_{j=1}^J Σ_{k=1}^{K_j} d_{t,jk} ln(p_{t,k|j}). Maximizing this likelihood function yields the full information maximum likelihood (FIML) estimator. There is also a sequential (limited information maximum likelihood) estimator; we omit the details.

96 Ordered Multinomial Models. Suppose that the categories of the data have a natural ordering, such that y_t = 0 if y_t* ≤ c_1, y_t = 1 if c_1 < y_t* ≤ c_2, ..., and y_t = J if c_J < y_t*, where the y_t* are latent variables. For example, language proficiency status and education level have a natural ordering. Such models may be estimated using multinomial logit models; yet taking this ordering structure into account yields simpler models.


More information

Spring 2017 Econ 574 Roger Koenker. Lecture 14 GEE-GMM

Spring 2017 Econ 574 Roger Koenker. Lecture 14 GEE-GMM University of Illinois Department of Economics Spring 2017 Econ 574 Roger Koenker Lecture 14 GEE-GMM Throughout the course we have emphasized methods of estimation and inference based on the principle

More information

CALCULATION METHOD FOR NONLINEAR DYNAMIC LEAST-ABSOLUTE DEVIATIONS ESTIMATOR

CALCULATION METHOD FOR NONLINEAR DYNAMIC LEAST-ABSOLUTE DEVIATIONS ESTIMATOR J. Japan Statist. Soc. Vol. 3 No. 200 39 5 CALCULAION MEHOD FOR NONLINEAR DYNAMIC LEAS-ABSOLUE DEVIAIONS ESIMAOR Kohtaro Hitomi * and Masato Kagihara ** In a nonlinear dynamic model, the consistency and

More information

A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods. Jeff Wooldridge IRP Lectures, UW Madison, August 2008 A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. Linear-in-Parameters Models: IV versus Control Functions 2. Correlated

More information

Graduate Econometrics Lecture 4: Heteroskedasticity

Graduate Econometrics Lecture 4: Heteroskedasticity Graduate Econometrics Lecture 4: Heteroskedasticity Department of Economics University of Gothenburg November 30, 2014 1/43 and Autocorrelation Consequences for OLS Estimator Begin from the linear model

More information

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models University of Illinois Fall 2016 Department of Economics Roger Koenker Economics 536 Lecture 7 Introduction to Specification Testing in Dynamic Econometric Models In this lecture I want to briefly describe

More information

Binary Choice Models Probit & Logit. = 0 with Pr = 0 = 1. decision-making purchase of durable consumer products unemployment

Binary Choice Models Probit & Logit. = 0 with Pr = 0 = 1. decision-making purchase of durable consumer products unemployment BINARY CHOICE MODELS Y ( Y ) ( Y ) 1 with Pr = 1 = P = 0 with Pr = 0 = 1 P Examples: decision-making purchase of durable consumer products unemployment Estimation with OLS? Yi = Xiβ + εi Problems: nonsense

More information

MEI Exam Review. June 7, 2002

MEI Exam Review. June 7, 2002 MEI Exam Review June 7, 2002 1 Final Exam Revision Notes 1.1 Random Rules and Formulas Linear transformations of random variables. f y (Y ) = f x (X) dx. dg Inverse Proof. (AB)(AB) 1 = I. (B 1 A 1 )(AB)(AB)

More information

Appendix A: The time series behavior of employment growth

Appendix A: The time series behavior of employment growth Unpublished appendices from The Relationship between Firm Size and Firm Growth in the U.S. Manufacturing Sector Bronwyn H. Hall Journal of Industrial Economics 35 (June 987): 583-606. Appendix A: The time

More information

Classical Least Squares Theory

Classical Least Squares Theory Classical Least Squares Theory CHUNG-MING KUAN Department of Finance & CRETA National Taiwan University October 18, 2014 C.-M. Kuan (Finance & CRETA, NTU) Classical Least Squares Theory October 18, 2014

More information

Panel Data Seminar. Discrete Response Models. Crest-Insee. 11 April 2008

Panel Data Seminar. Discrete Response Models. Crest-Insee. 11 April 2008 Panel Data Seminar Discrete Response Models Romain Aeberhardt Laurent Davezies Crest-Insee 11 April 2008 Aeberhardt and Davezies (Crest-Insee) Panel Data Seminar 11 April 2008 1 / 29 Contents Overview

More information

Introductory Econometrics

Introductory Econometrics Based on the textbook by Wooldridge: : A Modern Approach Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna November 23, 2013 Outline Introduction

More information

Non-linear panel data modeling

Non-linear panel data modeling Non-linear panel data modeling Laura Magazzini University of Verona laura.magazzini@univr.it http://dse.univr.it/magazzini May 2010 Laura Magazzini (@univr.it) Non-linear panel data modeling May 2010 1

More information

Econometrics Multiple Regression Analysis: Heteroskedasticity

Econometrics Multiple Regression Analysis: Heteroskedasticity Econometrics Multiple Regression Analysis: João Valle e Azevedo Faculdade de Economia Universidade Nova de Lisboa Spring Semester João Valle e Azevedo (FEUNL) Econometrics Lisbon, April 2011 1 / 19 Properties

More information

Linear Regression with Time Series Data

Linear Regression with Time Series Data Econometrics 2 Linear Regression with Time Series Data Heino Bohn Nielsen 1of21 Outline (1) The linear regression model, identification and estimation. (2) Assumptions and results: (a) Consistency. (b)

More information

An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data

An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data Jae-Kwang Kim 1 Iowa State University June 28, 2012 1 Joint work with Dr. Ming Zhou (when he was a PhD student at ISU)

More information

Economics 582 Random Effects Estimation

Economics 582 Random Effects Estimation Economics 582 Random Effects Estimation Eric Zivot May 29, 2013 Random Effects Model Hence, the model can be re-written as = x 0 β + + [x ] = 0 (no endogeneity) [ x ] = = + x 0 β + + [x ] = 0 [ x ] = 0

More information

Linear Regression with Time Series Data

Linear Regression with Time Series Data u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f e c o n o m i c s Econometrics II Linear Regression with Time Series Data Morten Nyboe Tabor u n i v e r s i t y o f c o p e n h a g

More information

Elements of Probability Theory

Elements of Probability Theory Elements of Probability Theory CHUNG-MING KUAN Department of Finance National Taiwan University December 5, 2009 C.-M. Kuan (National Taiwan Univ.) Elements of Probability Theory December 5, 2009 1 / 58

More information

Lecture 14 More on structural estimation

Lecture 14 More on structural estimation Lecture 14 More on structural estimation Economics 8379 George Washington University Instructor: Prof. Ben Williams traditional MLE and GMM MLE requires a full specification of a model for the distribution

More information

1 Motivation for Instrumental Variable (IV) Regression

1 Motivation for Instrumental Variable (IV) Regression ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data

More information

Additional Topics on Linear Regression

Additional Topics on Linear Regression Additional Topics on Linear Regression Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Additional Topics 1 / 49 1 Tests for Functional Form Misspecification 2 Nonlinear

More information

Econometric Analysis of Cross Section and Panel Data

Econometric Analysis of Cross Section and Panel Data Econometric Analysis of Cross Section and Panel Data Jeffrey M. Wooldridge / The MIT Press Cambridge, Massachusetts London, England Contents Preface Acknowledgments xvii xxiii I INTRODUCTION AND BACKGROUND

More information

Statistics 910, #5 1. Regression Methods

Statistics 910, #5 1. Regression Methods Statistics 910, #5 1 Overview Regression Methods 1. Idea: effects of dependence 2. Examples of estimation (in R) 3. Review of regression 4. Comparisons and relative efficiencies Idea Decomposition Well-known

More information

Follow links for Class Use and other Permissions. For more information send to:

Follow links for Class Use and other Permissions. For more information send  to: COPYRIGH NOICE: Kenneth J. Singleton: Empirical Dynamic Asset Pricing is published by Princeton University Press and copyrighted, 00, by Princeton University Press. All rights reserved. No part of this

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1

MA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 MA 575 Linear Models: Cedric E Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 1 Within-group Correlation Let us recall the simple two-level hierarchical

More information

Econometrics Lecture 5: Limited Dependent Variable Models: Logit and Probit

Econometrics Lecture 5: Limited Dependent Variable Models: Logit and Probit Econometrics Lecture 5: Limited Dependent Variable Models: Logit and Probit R. G. Pierse 1 Introduction In lecture 5 of last semester s course, we looked at the reasons for including dichotomous variables

More information

Introduction Large Sample Testing Composite Hypotheses. Hypothesis Testing. Daniel Schmierer Econ 312. March 30, 2007

Introduction Large Sample Testing Composite Hypotheses. Hypothesis Testing. Daniel Schmierer Econ 312. March 30, 2007 Hypothesis Testing Daniel Schmierer Econ 312 March 30, 2007 Basics Parameter of interest: θ Θ Structure of the test: H 0 : θ Θ 0 H 1 : θ Θ 1 for some sets Θ 0, Θ 1 Θ where Θ 0 Θ 1 = (often Θ 1 = Θ Θ 0

More information

CCP Estimation. Robert A. Miller. March Dynamic Discrete Choice. Miller (Dynamic Discrete Choice) cemmap 6 March / 27

CCP Estimation. Robert A. Miller. March Dynamic Discrete Choice. Miller (Dynamic Discrete Choice) cemmap 6 March / 27 CCP Estimation Robert A. Miller Dynamic Discrete Choice March 2018 Miller Dynamic Discrete Choice) cemmap 6 March 2018 1 / 27 Criteria for Evaluating Estimators General principles to apply when assessing

More information

1 Outline. 1. Motivation. 2. SUR model. 3. Simultaneous equations. 4. Estimation

1 Outline. 1. Motivation. 2. SUR model. 3. Simultaneous equations. 4. Estimation 1 Outline. 1. Motivation 2. SUR model 3. Simultaneous equations 4. Estimation 2 Motivation. In this chapter, we will study simultaneous systems of econometric equations. Systems of simultaneous equations

More information

Spatial Regression. 9. Specification Tests (1) Luc Anselin. Copyright 2017 by Luc Anselin, All Rights Reserved

Spatial Regression. 9. Specification Tests (1) Luc Anselin.   Copyright 2017 by Luc Anselin, All Rights Reserved Spatial Regression 9. Specification Tests (1) Luc Anselin http://spatial.uchicago.edu 1 basic concepts types of tests Moran s I classic ML-based tests LM tests 2 Basic Concepts 3 The Logic of Specification

More information

I. Multinomial Logit Suppose we only have individual specific covariates. Then we can model the response probability as

I. Multinomial Logit Suppose we only have individual specific covariates. Then we can model the response probability as Econ 513, USC, Fall 2005 Lecture 15 Discrete Response Models: Multinomial, Conditional and Nested Logit Models Here we focus again on models for discrete choice with more than two outcomes We assume that

More information

Ch.10 Autocorrelated Disturbances (June 15, 2016)

Ch.10 Autocorrelated Disturbances (June 15, 2016) Ch10 Autocorrelated Disturbances (June 15, 2016) In a time-series linear regression model setting, Y t = x tβ + u t, t = 1, 2,, T, (10-1) a common problem is autocorrelation, or serial correlation of the

More information

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Winter 2016 Instructor: Victor Aguirregabiria

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Winter 2016 Instructor: Victor Aguirregabiria ECOOMETRICS II (ECO 24S) University of Toronto. Department of Economics. Winter 26 Instructor: Victor Aguirregabiria FIAL EAM. Thursday, April 4, 26. From 9:am-2:pm (3 hours) ISTRUCTIOS: - This is a closed-book

More information

Discrete Dependent Variable Models

Discrete Dependent Variable Models Discrete Dependent Variable Models James J. Heckman University of Chicago This draft, April 10, 2006 Here s the general approach of this lecture: Economic model Decision rule (e.g. utility maximization)

More information

Control Function and Related Methods: Nonlinear Models

Control Function and Related Methods: Nonlinear Models Control Function and Related Methods: Nonlinear Models Jeff Wooldridge Michigan State University Programme Evaluation for Policy Analysis Institute for Fiscal Studies June 2012 1. General Approach 2. Nonlinear

More information

Linear Regression. Junhui Qian. October 27, 2014

Linear Regression. Junhui Qian. October 27, 2014 Linear Regression Junhui Qian October 27, 2014 Outline The Model Estimation Ordinary Least Square Method of Moments Maximum Likelihood Estimation Properties of OLS Estimator Unbiasedness Consistency Efficiency

More information

Brief Review on Estimation Theory

Brief Review on Estimation Theory Brief Review on Estimation Theory K. Abed-Meraim ENST PARIS, Signal and Image Processing Dept. abed@tsi.enst.fr This presentation is essentially based on the course BASTA by E. Moulines Brief review on

More information

Intermediate Econometrics

Intermediate Econometrics Intermediate Econometrics Heteroskedasticity Text: Wooldridge, 8 July 17, 2011 Heteroskedasticity Assumption of homoskedasticity, Var(u i x i1,..., x ik ) = E(u 2 i x i1,..., x ik ) = σ 2. That is, the

More information

QED. Queen s Economics Department Working Paper No Russell Davidson Queen s University. James G. MacKinnon Queen s University

QED. Queen s Economics Department Working Paper No Russell Davidson Queen s University. James G. MacKinnon Queen s University QED Queen s Economics Department Working Paper No. 514 Convenient Specification Tests for Logit and Probit Models Russell Davidson Queen s University James G. MacKinnon Queen s University Department of

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)

More information

On the Power of Tests for Regime Switching

On the Power of Tests for Regime Switching On the Power of Tests for Regime Switching joint work with Drew Carter and Ben Hansen Douglas G. Steigerwald UC Santa Barbara May 2015 D. Steigerwald (UCSB) Regime Switching May 2015 1 / 42 Motivating

More information

Understanding Regressions with Observations Collected at High Frequency over Long Span

Understanding Regressions with Observations Collected at High Frequency over Long Span Understanding Regressions with Observations Collected at High Frequency over Long Span Yoosoon Chang Department of Economics, Indiana University Joon Y. Park Department of Economics, Indiana University

More information

Topic 7: Heteroskedasticity

Topic 7: Heteroskedasticity Topic 7: Heteroskedasticity Advanced Econometrics (I Dong Chen School of Economics, Peking University Introduction If the disturbance variance is not constant across observations, the regression is heteroskedastic

More information

ECON 594: Lecture #6

ECON 594: Lecture #6 ECON 594: Lecture #6 Thomas Lemieux Vancouver School of Economics, UBC May 2018 1 Limited dependent variables: introduction Up to now, we have been implicitly assuming that the dependent variable, y, was

More information

Max. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes

Max. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes Maximum Likelihood Estimation Econometrics II Department of Economics Universidad Carlos III de Madrid Máster Universitario en Desarrollo y Crecimiento Económico Outline 1 3 4 General Approaches to Parameter

More information

AGEC 661 Note Eleven Ximing Wu. Exponential regression model: m (x, θ) = exp (xθ) for y 0

AGEC 661 Note Eleven Ximing Wu. Exponential regression model: m (x, θ) = exp (xθ) for y 0 AGEC 661 ote Eleven Ximing Wu M-estimator So far we ve focused on linear models, where the estimators have a closed form solution. If the population model is nonlinear, the estimators often do not have

More information

Model comparison and selection

Model comparison and selection BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models M 1 = {f (x; θ), θ Θ 1 } and M 2 = {f (x; θ), θ Θ 2 } for a sample (X = x)

More information

Lecture 6: Hypothesis Testing

Lecture 6: Hypothesis Testing Lecture 6: Hypothesis Testing Mauricio Sarrias Universidad Católica del Norte November 6, 2017 1 Moran s I Statistic Mandatory Reading Moran s I based on Cliff and Ord (1972) Kelijan and Prucha (2001)

More information

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics Jiti Gao Department of Statistics School of Mathematics and Statistics The University of Western Australia Crawley

More information

Multiple Regression Analysis

Multiple Regression Analysis Multiple Regression Analysis y = 0 + 1 x 1 + x +... k x k + u 6. Heteroskedasticity What is Heteroskedasticity?! Recall the assumption of homoskedasticity implied that conditional on the explanatory variables,

More information

Testing Restrictions and Comparing Models

Testing Restrictions and Comparing Models Econ. 513, Time Series Econometrics Fall 00 Chris Sims Testing Restrictions and Comparing Models 1. THE PROBLEM We consider here the problem of comparing two parametric models for the data X, defined by

More information

Central Bank of Chile October 29-31, 2013 Bruce Hansen (University of Wisconsin) Structural Breaks October 29-31, / 91. Bruce E.

Central Bank of Chile October 29-31, 2013 Bruce Hansen (University of Wisconsin) Structural Breaks October 29-31, / 91. Bruce E. Forecasting Lecture 3 Structural Breaks Central Bank of Chile October 29-31, 2013 Bruce Hansen (University of Wisconsin) Structural Breaks October 29-31, 2013 1 / 91 Bruce E. Hansen Organization Detection

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple

More information

Inference Based on the Wild Bootstrap

Inference Based on the Wild Bootstrap Inference Based on the Wild Bootstrap James G. MacKinnon Department of Economics Queen s University Kingston, Ontario, Canada K7L 3N6 email: jgm@econ.queensu.ca Ottawa September 14, 2012 The Wild Bootstrap

More information

GARCH Models Estimation and Inference

GARCH Models Estimation and Inference GARCH Models Estimation and Inference Eduardo Rossi University of Pavia December 013 Rossi GARCH Financial Econometrics - 013 1 / 1 Likelihood function The procedure most often used in estimating θ 0 in

More information

Econometric Methods for Panel Data

Econometric Methods for Panel Data Based on the books by Baltagi: Econometric Analysis of Panel Data and by Hsiao: Analysis of Panel Data Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies

More information

h=1 exp (X : J h=1 Even the direction of the e ect is not determined by jk. A simpler interpretation of j is given by the odds-ratio

h=1 exp (X : J h=1 Even the direction of the e ect is not determined by jk. A simpler interpretation of j is given by the odds-ratio Multivariate Response Models The response variable is unordered and takes more than two values. The term unordered refers to the fact that response 3 is not more favored than response 2. One choice from

More information

ECONOMETRICS (I) MEI-YUAN CHEN. Department of Finance National Chung Hsing University. July 17, 2003

ECONOMETRICS (I) MEI-YUAN CHEN. Department of Finance National Chung Hsing University. July 17, 2003 ECONOMERICS (I) MEI-YUAN CHEN Department of Finance National Chung Hsing University July 17, 2003 c Mei-Yuan Chen. he L A EX source file is ec471.tex. Contents 1 Introduction 1 2 Reviews of Statistics

More information

simple if it completely specifies the density of x

simple if it completely specifies the density of x 3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely

More information

CRE METHODS FOR UNBALANCED PANELS Correlated Random Effects Panel Data Models IZA Summer School in Labor Economics May 13-19, 2013 Jeffrey M.

CRE METHODS FOR UNBALANCED PANELS Correlated Random Effects Panel Data Models IZA Summer School in Labor Economics May 13-19, 2013 Jeffrey M. CRE METHODS FOR UNBALANCED PANELS Correlated Random Effects Panel Data Models IZA Summer School in Labor Economics May 13-19, 2013 Jeffrey M. Wooldridge Michigan State University 1. Introduction 2. Linear

More information

A Primer on Asymptotics

A Primer on Asymptotics A Primer on Asymptotics Eric Zivot Department of Economics University of Washington September 30, 2003 Revised: October 7, 2009 Introduction The two main concepts in asymptotic theory covered in these

More information

Multivariate Time Series: VAR(p) Processes and Models

Multivariate Time Series: VAR(p) Processes and Models Multivariate Time Series: VAR(p) Processes and Models A VAR(p) model, for p > 0 is X t = φ 0 + Φ 1 X t 1 + + Φ p X t p + A t, where X t, φ 0, and X t i are k-vectors, Φ 1,..., Φ p are k k matrices, with

More information

Heteroskedasticity. Part VII. Heteroskedasticity

Heteroskedasticity. Part VII. Heteroskedasticity Part VII Heteroskedasticity As of Oct 15, 2015 1 Heteroskedasticity Consequences Heteroskedasticity-robust inference Testing for Heteroskedasticity Weighted Least Squares (WLS) Feasible generalized Least

More information

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation. CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.

More information

Statistics and econometrics

Statistics and econometrics 1 / 36 Slides for the course Statistics and econometrics Part 10: Asymptotic hypothesis testing European University Institute Andrea Ichino September 8, 2014 2 / 36 Outline Why do we need large sample

More information

Estimation and Testing for Common Cycles

Estimation and Testing for Common Cycles Estimation and esting for Common Cycles Anders Warne February 27, 2008 Abstract: his note discusses estimation and testing for the presence of common cycles in cointegrated vector autoregressions A simple

More information

Single Equation Linear GMM with Serially Correlated Moment Conditions

Single Equation Linear GMM with Serially Correlated Moment Conditions Single Equation Linear GMM with Serially Correlated Moment Conditions Eric Zivot October 28, 2009 Univariate Time Series Let {y t } be an ergodic-stationary time series with E[y t ]=μ and var(y t )

More information