Quasi-Maximum Likelihood: Applications

Size: px

Start display at page:

Download "Quasi-Maximum Likelihood: Applications"

Lisa Bridges
5 years ago
Views:

1 Chapter 10 Quasi-Maximum Likelihood: Applications 10.1 Binary Choice Models In many economic applications the dependent variables of interest may assume only finitely many integer values, each labeling a category of data. he simplest case of discrete dependent variables is the binary variable that takes on the values one and zero. For example, the ownership of a durable goods and the choice of participating a particular event may be represented by a binary dependent variable. Such variables are different from standard continuous variables such as GDP and income. It is then natural to take into account the data characteristics in the specification of quasilikelihood function. Conditional on the explanatory variables x t, the binary variable y t is such that { 1, with probability IP(y y t = t =1 x t ), 0, with probability 1 IP(y t =1 x t ). he density function of y t given x t is of the Bernoulli type: g(y t x t )=IP(y t =1 x t ) yt [1 IP(y t =1 x t )] 1 yt. A standard modeling approach is to find a function F such that F (x t ; θ) approximates the conditional probability IP(y t =1 x t ). he quasi-likelihood function is then: f(y t x t ; θ) =F (x t ; θ) yt [1 F (x t ; θ)] 1 yt. 267

2 268 CHAPER 10. QUASI-MAXIMUM LIKELIHOOD: APPLICAIONS he QMLE θ is then obtained by maximizing L (θ) = 1 [ yt log F (x t ; θ)+(1 y t )log(1 F (x t ; θ)) ]. As 0 IP(y t =1 x t ) 1, it is natural to choose F as a function bounded between zero and one, such as a distribution functions. When F (x t ; θ) =Φ(x tθ) withφthe standard normal distribution function: u 1 Φ(u) = e v2 /2 dv, 2π we have the probit model. For Φ(u) =p, itsinverseφ 1 (p) isknownastheprobit transformation. WhenF (x t ; θ) =G(x tθ)withg the logistic distribution function: 1 eu G(u) = = 1+e u 1+e u. it is the logit model. he logistic distribution has mean zero and variance π 2 /3, and it is more peaked around its mean and has slightly thicker tails than the standard normal distribution. For G(u) =p, itsinverse ( p ) G 1 (p) =log, 1 p is known as the logit transformation. ItiseasytoverifythatG (u) =G(u)[1 G(u)] which is convenient for estimating a logit model. and For the logit model, θ L (θ) = 1 = 1 2 θ L (θ) = 1 [ G (x y tθ) t G(x t θ) (1 y t) [y t G(x tθ)]x t, G(x t θ)[1 G(x t θ)]x t x t. G (x tθ) 1 G(x t θ) Given that G(x tθ)[1 G(x tθ)] > 0 for all θ, this matrix is is negative definite, so that the quasi-log-likelihood function is globally concave on the parameter space. When x t contains a constant term, the first order condition implies ] x t 1 y t = 1 G(x t θ );

3 10.1. BINARY CHOICE MODELS 269 that is, the average of fitted values is the relative frequency of y t =1. For the probit model, θ L (θ) = 1 = 1 [ φ(x y tθ) t Φ(x t θ) (1 y t) φ(x tθ) 1 Φ(x t θ) y t Φ(x t θ) Φ(x t θ)[1 Φ(x t θ)]φ(x tθ)x t, ] x t where φ is the standard normal density function. It can also be verified that 2 θ L (θ) = 1 [ y t φ(x tθ)+x tθφ(x tθ) Φ 2 (x t θ) +(1 y t ) φ(x tθ) x tθ[1 Φ(x tθ)] [1 Φ(x t θ)]2 which is also negative definite; see e.g., Amemiya (1985, pp ). ] φ(x tθ)x t x t, Clearly, the conditional mean of y t is just the conditional probability P (y t =1 x t ), and its conditional variance is var(y t x t )=IP(y t =1 x t )[1 IP(y t =1 x t )], which changes with x t.hus,f(x t ; θ) is also an approximation to the conditional mean function; F (x t ; θ)[1 F (x t ; θ)] is an approximation to the conditional variance function. Writing y t = F (x t ; θ)+e t, we can see that the probit and logit models are in effect different nonlinear mean specifications with conditional heteroskedasticity. Although θ may be estimated using the NLS method, the resulting estimator cannot be efficient because it ignores conditional heteroskedasticity. Even a weighted NLS estimator that takes into account the conditional variance is still inefficient because it does not consider the Bernoulli feature of y t. Note that the linear probability model in which F (x t ; θ) =x tθ (Section 4.4) is not appropriate because the fitted values may be outside the range of [0, 1]. It should be noted that the marginal response of the choice probability to the change of a particular variable x tj in a probit model is Φ(x t θ) x tj = ϕ(x tθ)θ j,

4 270 CHAPER 10. QUASI-MAXIMUM LIKELIHOOD: APPLICAIONS and the marginal response in a logit model is G(x tθ) = G(x t x θ)[1 G(x t θ)]θ j, tj which change with x t. By contrast, the marginal response to the change of a regressor in a linear regression model is the associated coefficient which is invariant with respect to t. hus, it is typical to evaluate the marginal response based on a particular value of x t,suchasx t = 0 or x t = x, the sample average of x t. Note that when x t = 0, ϕ(0) 0.4 andg(0)[1 G(0)] = his suggests that the QMLE for the logit model is approximately 1.6 times the QMLE for the probit model when x t are close to zero. An alternative view of the binary choice model is to assume that the observed variable y t is determined by the latent (index) variable yt : { 1, y y t = t > 0, 0, yt 0, where y t = x t β + e t.hus, IP(y t =1 x t )=IP(y t > 0 x t )=IP(e t > x t β x t ). he probability on the right-hand side is also IP(e t < x t β x t ) provided that e t is symmetric about zero. he probit specification Φ or the logit specification G can then be viewed as specifications of the conditional distribution of e t. his suggests that a possible modification of these two models is to consider an asymmetric distribution function for e t Models for Multiple Choices A more general discrete choice model would be needed when one faces multiple choices of job, event or transportation alternatives. In this case, the dependent variable y t takes on J + 1 integer values (0, 1,...,J) which correspond to different categories that are not overlapping and do not have a natural ordering. We have 0, with probability IP(y t =0 x t ), 1, with probability IP(y t =1 x t ), y t =. J, with probability IP(y t = J x t ). he number of choices J may depend on t because, for example, an individual who does not own a car can not have the choice of driving to work. We shall not consider this complication here, however.

5 10.2. MODELS FOR MULIPLE CHOICES Multinomial Logit Models Define the new binary variable d t,j for j =0, 1,...,J as d t,j = { 1, if y t = j, 0, otherwise. Note that J j=0 d t,j = 1. he density function of d t,0,...,d t,j given x t is then g(d t,0,...,d t,j x t )= J IP(y t = j x t ) d t,j. j=0 he conditional probabilities can be approximated by some functions F (x t ; θ j ), where the regressors are t-specific (individual specific) but not choice specific, yet their responses to the choices j may be different. Note that only J conditional probabilities need to be specified; otherwise, there would be an identification problem because the sum of J + 1 probabilities must be one. Common specifications of the conditional probabilities are F (x t ; θ 1,...,θ J )=G t,0 = 1 1+ J k=1 exp(x t θ k ), exp(x tθ j ) F (x t ; θ 1,...,θ J )=G t,j = 1+ J k=1 exp(x t θ k ), which lead to the so-called multinomial logit model. he quasi-log-likelihood function of this model reads ( ) L (θ 1,...,θ J )= 1 J d t,j x t θ j 1 J log 1+ exp(x t θ k ). j=1 he gradient vector with respect to θ j is θj L (θ 1,...,θ J )= 1 k=1 (d t,j G t,j )x t, j =1,...,J. Setting these equations to zero we can solve for the QMLE of θ j. he first order condition again implies that, when x t contains a constant term, the predicted relative frequency of the j th category equals its actual relative frequency. It can also be verified that the (i, j) th off-diagonal block of the Hessian matrix is θj θ i L (θ 1,...,θ J )= 1 (G t,j G t,i )x t x t, i j, i, j =1,...,J,

6 272 CHAPER 10. QUASI-MAXIMUM LIKELIHOOD: APPLICAIONS and the j th diagonal block is θj θ j L (θ 1,...,θ J )= 1 G t,j (1 G t,j )x t x t, j =1,...,J. We may then use these information to compute the asymptotic covariance matrix for the QMLEs. As in the binomial logit model, one should be careful about the the marginal response of the choice probability, G t,j, to the change of the variables x t.itiseasytoverifythat J xt G t,0 = G t,0 G t,i θ i, i=1 xt G t,j = G t,j (θ j ) J G t,i θ i, j =1,...,J. i=1 hus, when x t changes, all coefficient vectors θ i, i = 1,...,J, enter the marginal response of G t,j Conditional Logit Model A model closely related to the multinomial logit model is McFadden s conditional logit model, in which the regressors are choice-specific. For example, when an individual makes choice among several commuting modes, the regressors may include the in-vehicle time and waiting time that vary with the vehicle he/she chooses. As such, we shall consider x t =(x t,0 x t,1... x t,j ),wherex t,j is the vector of regressors characterizing the t th individual s attributes with respect to the j th choice. As in Section , we define the binary variable d t,j for j =0, 1,...,J as { 1, if yt = j, d t,j = 0, otherwise, and the density function of d t,0,...,d t,j given x t is J g(d t,0,...,d t,j x t )= IP(y t = j x t ) d t,j. j=0 In the conditional logit model, the conditional probabilities are approximated by G t,j = exp(x t,j θ) J k=0 exp(x t,k θ), j =0, 1,...,J.

7 10.2. MODELS FOR MULIPLE CHOICES 273 Note that θ is now common for all j, so that we may have J + 1 specifications for the conditional probabilities without causing an identification problem. his model would be convenient if one wants to predict the probability of a new choice. Similar to the preceding subsection, the quasi-log-likelihood function is ( L (θ) = 1 J d t,j x t,jθ 1 J ) log exp(x t,k θ). j=0 he first order condition is k=0 θ L (θ) = 1 J (d t,j G t,j )x t,j = 0, j=0 from which the QMLE for θ can be easily calculated. he Hessian matrix of the quasilog-likelihood function is 2 θ L (θ) = 1 J J J G t,j x t,jx t,j G t,j x t,j j=0 j=0 j=0 G t,j x t,j = 1 J G t,j (x t,j x t )(x t,j x t ), j=0 where x t = J j=0 G t,j x t,j is the weighted average of x t,j. It is then clear that the Hessian matrix is negative definite so that the quasi-log-likelihood function is globally concave. he marginal response of G t,j to x t,i, is xt,i G t,j = G t,j G t,i θ, i j, i =0,...,J, and the marginal response of G t,j to x t,j is xt,j G t,j = G t,j (1 G t,j )θ, j =0,...,J. his shows that each choice probability is affected not only by x t,j but also the regressors for other choices, x t,i. he conditional logit model may be understood under a random utility framework (McFadden, 1974). Given the random utility of the choice j: U t,j = x t,jθ + ε t,j, j =0, 1,...,J, the alternative i would be chosen if IP(y t = i x t )=IP(U t,i >U t,j, for all j i x t ).

8 274 CHAPER 10. QUASI-MAXIMUM LIKELIHOOD: APPLICAIONS Suppose that ε t,j are independent random variables across j such that they have the type I extreme value distribution: exp[ exp( ε t,j )]. o see how this setup leads to a conditional logit model, consider the simple case that there are 3 choices (J = 2). Letting μ t,j = x t,jθ we have IP(y t =2 x t )=IP(ε t,2 + μ t,2 μ t,1 >ε t,1 and ε t,2 + μ t,2 μ t,0 >ε t,0 ) ( εt,2 +μ t,2 μ t,1 ) = f(ε t,2 ) f(ε t,1 )dε t,1 ( εt,2 +μ t,2 μ t,0 f(ε t,0 )dε t,0 ) dε t,2, where f(ε t,j ) is the type I extreme value density function: exp( ε t,j )exp[ exp( ε t,j )]. hen, IP(y t =2 x t )= = exp( ε t,2 )exp[ exp( ε t,2 )] exp[ exp( ε t,2 μ t,2 + μ t,1 )] exp[ exp( ε t,2 μ t,2 + μ t,0 )] dε t,2 exp(μ t,2 ) exp(μ t,0 )+exp(μ t,1 )+exp(μ t,2 ). his is precisely the conditional logit specification of IP(y t =2 x t ). he specifications for other conditional probabilities can be obtained similarly. hus, the conditional logit model hinges on the condition that the choices must be quite different such that they are independent of each other. his is also known as the condition of independence of irrelevant alternatives (IIA). he IIA condition may not hold in applications and therefore must be tested empirically Ordered Data In some applications, the multiple categories of interest may have a natural ordering, such as the levels of education, opinion, and price. In particular, y t is said to be ordered if 0, yt c 1 1, c 1 <yt c 2 y t =. J, c J <yt ; otherwise, it is unordered. hevariabley t in Section is unordered.

9 10.3. MODELS OF COUN DAA 275 Setting yt = x t θ + e t, the conditional probabilities are: IP(y t = j x t )=IP(c j <y t <c j+1 x t ) = F (c j+1 x tθ) F (c j x tθ), j =0, 1,...,J, where c 0 =, c J+1 =, andf is the distribution of e t. When F is the logistic distribution function G, thisistheordered logit model; whenf is the standard normal distribution function Φ, it is the ordered probit model. he quasi-log-likelihood function of the ordered model is L (θ) = 1 J log[f (c j+1 x t θ) F (c j x t θ)], j=0 from which we can easily compute its gradient and Hessian matrix. Pratt (1981) showed that the Hessian matrix is negative definite so that the quasi-log-likelihood function is globally concave. he marginal responses of the choice probabilities to the change of the variables can be easily calculated. For the ordered probit model, these responses are φ(c 1 x tθ)θ, j =0 [ xt IP(y t = j x t )= φ(cj x tθ) φ(c j+1 x tθ) ] θ, j =1,...,J 1, φ(c J x tθ)θ, j = J. See Exercise 10.5 for the marginal responses of the ordered logit model Models of Count Data In practice, there are integer-valued dependent variables that are not categorical, such as the number of accidents of a plant or the number of patents filed by a company. Such data, also known as count data, differ from categorical data in that they represent the intensity of the occurrence of an event. It is typical to postulate a Poisson distribution for the conditional probability of the count data y t =0, 1, 2,...: IP(y t x t )=exp( λ t ) λyt t y t!, where the parameter λ t, which is also the conditional mean, has a log-linear form: log λ t = x tθ.

10 276 CHAPER 10. QUASI-MAXIMUM LIKELIHOOD: APPLICAIONS he quasi-log-likelihood function is then L (θ) = 1 = 1 [ λt + y t log(λ t ) log(y t!) ] [ exp(x t θ)+y t (x t θ) log(y t!)]. he gradient vector is θ L (θ) = 1 [y t exp(x tθ)]x t, and the Hessian matrix is 2 θ L (θ) = 1 exp(x tθ)x t x t. Since exp(x tθ) are non-negative, the Hessian matrix is also negative definite on the parameter space. A drawback of the Poisson specification is that it in effect restricts the conditional variance to be the same as the conditional mean λ t.writing y t =exp(x tθ)+e t, we simply have a nonlinear specification of the conditional mean function without restricting the conditional variance. Such a specification does not consider the feature that y t are discrete-valued count data, however Models of Limited Dependent Variables here are also numerous economic applications in which a dependent variable can only be observed within a limited range. Such a variable is known as a limited dependent variable. For example, in the study of the household expenditure on durable goods, obin (1958) noticed that positive expenditures can be observed only when households spend more than the cheapest available price; potential expenditures below the minimum price are not observable but appear as zeros. In this case, the expenditure data are said to be censored. On the other hand, data may be truncated if they are completely lost outside a given range; for example, income data may be truncated when they are below certain level. A proper specification for a limited dependent variable must take data censoring or truncation into account, as will be apparent in subsequent sub-sections.

11 10.4. MODELS OF LIMIED DEPENDEN VARIABLES runcated Regression Models First consider the dependent variable y t which is truncated from below at c, i.e., y t >c. he conditional density of truncated y t is g(y t y t >c,x t )=g(y t x t )/ IP(y t >c x t ), and we may approximate g(y t x t )using f(y t x t ; β,σ 2 )= Setting u t = y t /σ, wehave IP(y t >c x t )= 1 σ ( 1 exp (y t x tβ) 2 ) 2πσ 2 2σ 2 c ( yt x t φ β ) dy σ t = 1 ( σ φ yt x ) tβ. σ = φ(u t )du t (c x t β)/σ ( c x =1 Φ t β ). σ It follows that the truncated density function g(y t y t >c,x t ) can be approximated by f(y t y t >c,x t ; β,σ 2 )= φ[(y t x tβ)/σ] σ [ 1 Φ ( (c x t β)/σ)], which, of course, depends on the truncation parameter c. he quasi-log-likelihood function is therefore 1 2 [log(2π)+log(σ2 )] 1 2σ 2 (y t x tβ) 2 1 [ ( c x log 1 Φ t β ) ]. σ Reparameterizing by α = β/σ and γ = σ 1,wehave L (θ) = log(2π) 2 +log(γ) 1 2 (γy t x t α)2 1 log [ 1 Φ(γc x t α)], where θ =(α γ). he first-order condition is 1 [ ] (γy t x t θ L (θ) = α) φ(γc x t α) 1 Φ(γc x t α) x t 1 γ 1 [ ] (γy t x t α)y t φ(γc x t α) 1 Φ(γc x α)c t = 0, from which we can solve for the QMLE of θ. It can be verified that L (θ) is globally concave in θ.

12 278 CHAPER 10. QUASI-MAXIMUM LIKELIHOOD: APPLICAIONS When f(y t y t >c,x t ; θ o ) is the correct specification for g(y t y t >c,x t ), it is not too difficult to derive the conditional mean of truncated y t as IE(y t y t >c,x t )= y t f(y t y t >c,x t ; θ o )dy t c = x t β o + σ φ ( (c x t β o )/σ ) o o 1 Φ ( (c x t β ); o)/σ o (10.1) which differs from x tβ o by a nonlinear term; see Exercise hus, the OLS estimator of regressing y t on x t would be inconsistent for β o when y t are truncated. his also shows why a proper model must consider data truncation. Define the hazard function λ as λ(u) =φ(u)/[1 Φ(u)], which is also known as the inverse Mill s ratio. Setting u t =(c x tβ o )/σ o, the truncated mean can be expressed as IE(y t y t >c,x t )=x tβ o + σ o λ(u t ). It can also be shown that the truncated variance is var(y t y t >c,x t )=σo[ 2 1 λ 2 ] (u t )+λ(u t )u t, instead of σo; 2 see Exercise he marginal response of IE(y t y t >c,x t ) to a change of x t is β o [ 1 λ 2 (u t )+λ(u t )u t ] = β o σ 2 o var(y t y t >c,x t ), which depends on the truncated variance. Consider now the case that the dependent variable y t is truncated from above at c, i.e., y t <c. he truncated density function g(y t y t <c,x t ) can be approximated by f(y t y t <c,x t ; β,σ 2 )= φ[(y t x tβ)/σ] σ Φ ( (c x t β)/σ). hen, analogous to (10.1), we have IE(y t y t <c,x t )=x φ ( (c x ) tβ o σ tβ o )/σ o o Φ ( (c x t β o )/σ ), o with the inverse Mill s ratio λ(u) = φ(u)/φ(u).

13 10.4. MODELS OF LIMIED DEPENDEN VARIABLES Censored Regression Models Consider the following censored dependent variable: { yt, yt > 0, y t = 0, yt 0, where yt = x t β + e t is an index variable. It does not matter whether the threshold value of yt is zero or a non-zero constant c (e.g., the cheapest available price) because c can always be absorbed into the constant term in the specification of yt. Let g and g denote, respectively, the densities of y t and yt conditional on x t. When yt > 0, g(y t x t )=g (yt x t ), and when yt 0, censoring yields the conditional probability IP(y t =0 x t )= 0 g (y t x t )dy t. In the standard obit model (obin, 1958), g (yt x t ) is approximated by ( f(yt x t ; 1 β,σ2 )= exp (y t x tβ) 2 ) 2πσ 2 2σ 2 = 1 ( y σ φ t x ) tβ. σ For yt 0, IP{y t =0 x t } is approximated by 1 0 x φ((yt x σ tβ)/σ) dyt t β/σ = φ(v t )dv t =1 Φ ( x t β Letting α = β/σ and γ = σ 1, the model would be correctly specified for {yt x t } if there exists θ o =(α o γ o ) such that f(yt x t ; θ o )=g (yt x t ). Given the specification above, the quasi-log-likelihood function is 1 2 [log(2π)+log(σ2 )] + 1 {t:y t=0} log(1 Φ(x t β/σ)) 1 2 σ ). {t:y t>0} [(y t x t β)/σ]2, where 1 is the number of t such that y t > 0. he QMLE of θ =(α γ) can then be obtained by maximizing L (θ) = 1 log(1 Φ(x t α)) + 1 log γ 1 (γy 2 t x t α)2. {t:y t=0} he first order condition is θ L (θ) = 1 {t:y t=0} {t:y t>0} φ(x t α) 1 Φ(x t α)x t + {t:y t>0} (γy t x tα)x t 1 γ {t:y t>0} (γy t x t α)y t. = 0,

14 280 CHAPER 10. QUASI-MAXIMUM LIKELIHOOD: APPLICAIONS from which the QMLE can be computed. Again, L (θ) is globally concave in α and γ. he conditional mean of censored y t is IE(y t x t )=IE(y t y t > 0, x t )IP(y t > 0 x t )+IE(y t y t =0, x t )IP(y t =0 x t ) =IE(y t y t > 0, x t )IP(y t > 0 x t ). When f(yt x t ; β,σ2 ) is correctly specified for g (yt x t ), ( y IP(yt > 0 x t )=IP t x ) tβ o > x tβ o σ o σ x t =Φ(x t β o /σ o ). o his leads to the following conditional density: g(y t y t > 0, x t )= g (y t x t ) IP(y t > 0 x t ) = φ[(y t x tβ o )/σ o ] σ o Φ(x t β o /σ o ). hen, analogous to (10.1), we have IE(y t y t > 0, x t )= 0 y t g(y t y t > 0, x t )dy t = x tβ o + σ o φ(x t β o /σ o ) Φ(x t β o /σ o ); (10.2) see Exercise It follows that IE(y t x t )=x tβ o Φ(x tβ o /σ o )+σ o φ(x tβ o /σ o ). his shows that x tβ can not be the correct specification of the conditional mean of censored y t. Similar to truncated regression models, regressing y t on x t results in an inconsistent estimator for β o. It is also easy to calculate that the marginal response of IE(y t x t ) to a change of x t is β o Φ(x t β o /σ o ). Consider the case that y t is censored from above: { y t, yt < 0, y t = 0, y t 0, where y t = x tβ + e t.whenf(y t x t ; β,σ 2 ) is correctly specified for g (y t x t ), hen, IP(y t < 0 x t )=1 Φ(x tβ o /σ o ). g(y t y t < 0, x t )= g (y t x t ) IP(y t < 0 x t ) = φ[(y t x t β o )/σ o ] σ o [1 Φ(x t β o /σ o )],

15 10.4. MODELS OF LIMIED DEPENDEN VARIABLES 281 and hence, analogous to (10.2), IE(y t y t < 0, x t )=x tβ o σ o φ(x tβ o /σ o ) 1 Φ(x t β o /σ o ). Consequently, IE(y t x t )=x tβ o [1 Φ(x tβ o /σ o )] σ o φ(x tβ o /σ o ). and its marginal response to a change of x t is β o [1 Φ(x t β o /σ o )] Models of Sample Selection It is also common to encounter the problem of sample selection or incidental truncation in empirical studies. For example, in the study on how wages and individual characteristics determine working hours, the data of working hours are not resulted from random sampling but from those who select to work. In other words, only those whose market wages are above their reservation wages (the minimum wages for which they are willing to work) may be observed. hus, the decision of working must be related to working hours. Consider two variables y 1 and y t,wherey 1 is a binary choice variable. he selection problem arises when the former affects the latter. hese two variables are { 1, y y 1,t = 1,t > 0, 0, y1,t 0. y 2,t = y2,t, if y 1,t =1, where y1,t = x 1,t β + e 1,t and y 2,t = x 2,t γ + e 2,t are index variables. We observe y 1,t but not the index variable y1,t. Also, we observe y 2,t when y 1,t =1sothaty 2,t are incidentally truncated when y 1,t = 0. It is typical to model (y1,t y 2,t ) basedona bivariate normal distribution: (( ) [ ]) x N 1,t β σ 2 x 2,t γ, 1 σ 12. σ 12 σ2 2 When this is a correct specification for the conditional distribution of (y1,t y 2,t ),the corresponding true parameters are β o, γ o, σ1,o 2, σ2 2,o and σ 12,o. Computing the QMLE based on this bivariate specification is quite complicated. Heckman (1976) suggest a simpler, two-step estimator for the sample-selection model.

16 282 CHAPER 10. QUASI-MAXIMUM LIKELIHOOD: APPLICAIONS For notation simplicity, we shall, in what follows, omit the conditioning variables x 1,t and x 2,t. First recall that the conditional mean of y 2,t given y 1,t is IE(y 2,t y 1,t) =x 2,tγ o + σ 12,o (y 1,t x 1,tβ o )/σ 2 1,o. he result of the truncated regression model also shows that, when the truncation parameter c =0, IE(y 1,t y 1,t > 0) = x 1,tβ o + σ 1,o λ(x 1,tβ o /σ 1,o ), with λ(u) = φ(u)/φ(u). Ptting these results together we have IE(y2,t y 1,t > 0) = x 2,t γ o + σ ( 12,o x 1,t β ) o λ. σ 1,o σ 1,o his motivates the following specification: y 2,t = x 2,t γ + σ 12 σ 1 λ(x 1,t β/σ 1 )+e t, which is a complicated nonlinear regression with heteroskedastic errors. Heckman s two-step estimator is computed as follows. 1. Estimate α = β/σ 1 based on the probit model of y 1,t and denote the estimator as α. 2. Regress y 2,t on x 2,t and λ(x 1,t α ). As the QMLE α in the first step is consistent for α o, the second step is equivalent to regressing y 2,t on x 2,t and λ(x 1,t α o). hus, the OLS estimator of regressing y 2,t on x 2,t alone, which ignores the λ term, is inconsistent and may have severe sample-selection bias in finite samples.

17 10.4. MODELS OF LIMIED DEPENDEN VARIABLES 283 Exercises 10.1 In the logit model with the k 1 parameter vector θ, consider the null hypothesis that an s 1subvectorθ is zero, where s<k. Write down the Wald statistic with a hteroskedasticity-consistent covariance matrix estimator In the logit model with the k 1 parameter vector θ, consider the null hypothesis that an s 1 subvector θ is zero, where s<k. Write down the LM statistic with a hteroskedasticity-consistent covariance matrix estimator Construct a PSE test for testing the null hypothesis of a logit model with F (x t ; θ) = G(x tθ) against the alternative hypothesis of a probit model with F (x t ; θ) = Φ(x tθ) In the conditional logit model with the k 1 parameter vector θ, consider the null hypothesis that an s 1 subvector θ is zero, where s<k. Write down the Wald statistic with a hteroskedasticity-consistent covariance matrix estimator For the ordered logit model, find the marginal responses of the choice probabilities to the change of the variables In the Poisson regression model with the parameter vector θ, consider the null hypothesis Rθ = r, withr a q k given matrix. Write down the Wald statistic with a hteroskedasticity-consistent covariance matrix estimator In the Poisson regression model with the parameter vector θ, consider the null hypothesis Rθ = r, withr a q k given matrix. Write down the LM statistic with a hteroskedasticity-consistent covariance matrix estimator In the truncated regression model with the truncation parameter c, show that IE(y t y t >c,x t ) is given by (10.1) In the truncated regression model with the truncation parameter c, show that var(y t y t >c,x t )=σo[ 2 1 λ 2 ] (u t )+λ(u t )u t,whereλ(ut )=φ(u t )/[1 Φ(u t )] and u t =(c x tβ o )/σ o In the obit model, show that IE(y t y t > 0, x t ) is given by (10.2).

18 284 CHAPER 10. QUASI-MAXIMUM LIKELIHOOD: APPLICAIONS References Amemiya, akeshi (1985). Advanced Econometrics, Cambridge, MA: Harvard University Press. Heckman, James J. (1976). Common structure of statistical medels of truncation, sample selection, and limited dependent variables and a simple estimator for such models, Annals of Economic and Social Measurement, 5, McFadden, Daniel L. (1973). Conditional logit analysis of qualitative choice behavior, in P. Zarembka (ed.), Frontier of Econometrics, New York: Academic Press. obin, James (1958). Estimation of relationships for limited dependent variables, Econometrica, 26,

Quasi Maximum Likelihood Theory

Quasi Maximum Likelihood Theory Quasi Maximum Likelihood heory CHUNG-MING KUAN Department of Finance & CREA June 17, 2010 C.-M. Kuan (National aiwan Univ.) Quasi Maximum Likelihood heory June 17, 2010 1 / 119 Lecture Outline 1 Kullback-Leibler