Discrete Dependent Variable Models


James J. Heckman
University of Chicago
This draft, April 10, 2006

Here's the general approach of this lecture:

1. Economic model (e.g. utility maximization) and its decision rule (e.g. FOC) (Sec. 1, Motivation: index function and random utility models).
2. Econometric model: an underlying regression (e.g. solving the FOC for a dependent variable) combined with, depending on the observed data, a discrete or limited dependent variable model (Sec. 2, Setup).
3. Estimation (Sec. 4, Estimation).
4. Interpretation (Sec. 3, Marginal effects).

We assume that we have an economic model and have derived implications of the model, e.g. FOCs, which we can test. Converting these conditions into an underlying regression usually involves little more than rearranging terms to isolate a dependent variable. Often this dependent variable is not directly observed, in a way that we'll make clear later. In such cases, we cannot simply estimate the underlying regression. Instead, we need to formulate an econometric model that allows us to estimate the parameters of interest in the decision rule/underlying regression using what little information we have on the dependent variable.

We will present two models in part A which will help us bridge the gap between inestimable underlying regressions and an estimable econometric model. In part B, we will further develop the econometric model introduced in part A so that it is ready for estimation. In part C, we jump ahead to interpreting our results. In particular, we will explain why, unlike in the linear regression models, the estimated $\hat\beta$ does not give us the marginal effect of a change in the independent variables on the dependent variable. We jump ahead to this topic because it will give us some information we need when we estimate the model. Finally, part D will describe how to estimate the model.

1 Motivation

Discrete dependent variable models are often cast in the form of index function models or random utility models. Both models view the outcome of a discrete choice as a reflection of an underlying regression. The desire to inform econometric models with economic models suggests that the underlying regression be a marginal cost-benefit calculation. The difference between the two models is that the structure of the cost-benefit calculation in index function models is simpler than that in random utility models.

1.1 Index function models

Since marginal benefit calculations are not observable, we model the difference between benefit and cost as an unobserved (latent) variable $y^*$ such that:

$$y^* = x'\beta + \varepsilon$$

where $\varepsilon \sim F(0,1)$, with $F$ symmetric. While we do not observe $y^*$, we do observe $y$, which is related to $y^*$ in the sense that:

$$y = 0 \text{ if } y^* \le 0$$
$$y = 1 \text{ if } y^* > 0$$

In this formulation $x'\beta$ is called the index function. Note two things. First, our assumption that $\operatorname{Var}(\varepsilon) = 1$ could be changed to $\operatorname{Var}(\varepsilon) = \sigma^2$ instead, by multiplying our coefficients by $\sigma$. Our observed data will be unchanged; $y = 0$ or $1$ depending only on the sign of $y^*$, not its scale. Second, setting the threshold for $y^*$ given $x$ at $0$ is likewise innocent if the model contains a constant term. (In general, unless there is some compelling reason, binomial probability models should not be estimated without constant terms.) Now the probability that $y = 1$ is observed is:

$$\Pr\{y = 1\} = \Pr\{y^* > 0\} = \Pr\{x'\beta + \varepsilon > 0\} = \Pr\{\varepsilon > -x'\beta\}$$

Then, under the assumption that the distribution of $\varepsilon$ is symmetric, we can write:

$$\Pr\{y = 1\} = \Pr\{\varepsilon < x'\beta\} = F(x'\beta)$$

where $F$ is the cdf of $\varepsilon$. This provides the underlying structural model for estimation by MLE or NLLS.

1.2 Random utility models

Suppose the marginal cost-benefit calculation is slightly more complex. Let $U_0$ and $U_1$ be the net benefit or utility derived from taking actions $0$ and $1$, respectively. We can model this utility calculus with the unobserved variables $U_0$ and $U_1$ such that:

$$U_0 = x'\beta_0 + \varepsilon_0$$
$$U_1 = x'\beta_1 + \varepsilon_1$$

Now assume that $(\varepsilon_1 - \varepsilon_0) \sim F(0,1)$, where $F$ is symmetric. Again, although we do not observe $U_0$ and $U_1$, we do observe $y$, where:

$$y = 0 \text{ if } U_0 > U_1$$
$$y = 1 \text{ if } U_0 < U_1$$

In other words, if the utility from action $0$ is greater than the utility from action $1$, i.e. $U_0 > U_1$, then $y = 0$; $y = 1$ when the converse is true. Here the probability of observing action $1$ is:

$$\Pr\{y = 1\} = \Pr\{U_0 < U_1\} = \Pr\{x'\beta_0 + \varepsilon_0 < x'\beta_1 + \varepsilon_1\} = \Pr\{\varepsilon_1 - \varepsilon_0 > x'\beta_0 - x'\beta_1\} = F(x'(\beta_1 - \beta_0))$$

2 Setup

The index function and random utility models provide the link between an underlying regression and an econometric model. Now we'll begin the process of fleshing out the econometric model. First we'll consider different specifications for the distribution of $\varepsilon$ and later, in part C, examine how marginal effects are derived from our probability model. This will pave the way for our discussion of how to estimate the model.

2.1 Why $\Pr\{y = 1\}$?

In both index function and random utility models, the probability of observing $y = 1$ has the structure $\Pr\{y = 1\} = F(x'\beta)$. Why are we so interested in the probability that $y = 1$? Because the expected value of $y$ given $x$ is just that probability:

$$E[y|x] = 0 \cdot (1 - F(x'\beta)) + 1 \cdot F(x'\beta) = F(x'\beta)$$

2.2 Common specifications for $F(x'\beta)$

How do we specify $F(x'\beta)$? There are four basic specifications that dominate the literature.

(a) Linear probability model (LPM): $F(x'\beta) = x'\beta$

(b) Probit: $F(x'\beta) = \Phi(x'\beta) = \int_{-\infty}^{x'\beta} \phi(t)\,dt = \int_{-\infty}^{x'\beta} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt$

(c) Logit: $F(x'\beta) = \Lambda(x'\beta) = \dfrac{e^{x'\beta}}{1 + e^{x'\beta}}$

(d) Extreme Value Type I: $F(x'\beta) = W(x'\beta) = 1 - e^{-e^{x'\beta}}$
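To make the four specifications concrete, here is a minimal sketch in Python (NumPy and SciPy assumed; the function names are ours, and the extreme value form follows the complementary log-log expression in (d) above):

```python
import numpy as np
from scipy.stats import norm

def lpm(z):
    """Linear probability model: F(z) = z (not confined to [0, 1])."""
    return z

def probit(z):
    """Probit: F(z) = Phi(z), the standard normal cdf."""
    return norm.cdf(z)

def logit(z):
    """Logit: F(z) = e^z / (1 + e^z)."""
    return np.exp(z) / (1 + np.exp(z))

def ev1(z):
    """Extreme value type I: F(z) = 1 - exp(-e^z) (asymmetric)."""
    return 1 - np.exp(-np.exp(z))

z = np.linspace(-2, 2, 5)
for F in (lpm, probit, logit, ev1):
    print(F.__name__, np.round(F(z), 3))
```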

2.3 Deciding which specification to use

Each specification has its advantages and disadvantages.

(1) LPM. The linear probability model is popular because it is extremely simple to estimate. This simplicity, however, comes at a cost. To see what we mean, set up the NLLS regression model:

$$y = E[y|x] + (y - E[y|x]) = F(x'\beta) + \varepsilon = x'\beta + \varepsilon$$

Because $F$ is linear, this just collapses down to the CR model. Notice that the error term takes only two values:

$$\varepsilon = 1 - x'\beta \text{ with probability } F = x'\beta, \qquad \varepsilon = -x'\beta \text{ with probability } 1 - F = 1 - x'\beta$$

This implies that:

$$\operatorname{Var}[\varepsilon|x] = E[\varepsilon^2|x] - (E[\varepsilon|x])^2 = (1 - x'\beta)^2 (x'\beta) + (x'\beta)^2 (1 - x'\beta) = (x'\beta)(1 - x'\beta)$$

So our first problem is that $\varepsilon$ is heteroscedastic in a way that depends on $\beta$. Of course, absent any other problems, we could manage this with an FGLS estimator. A second, more serious problem is that, since $x'\beta$ is not confined to the $[0,1]$ interval, the LPM leaves open the possibility of predicted probabilities that lie outside the $[0,1]$ interval, which is nonsensical, and of negative variances:

$$x'\beta > 1 \;\Rightarrow\; E[y|x] = x'\beta > 1, \quad \operatorname{Var}[\varepsilon|x] = (x'\beta)(1 - x'\beta) < 0$$
$$x'\beta < 0 \;\Rightarrow\; E[y|x] < 0, \quad \operatorname{Var}[\varepsilon|x] < 0$$

This is a problem that is harder to correct. We could define $F = 1$ if $x'\beta \ge 1$ and $F = 0$ if $x'\beta \le 0$, but this procedure creates unrealistic kinks at the truncation points where $x'\beta = 0$ or $1$.

(2) Probit vs. Logit. The probit model, which uses the normal distribution, is sometimes (inappropriately) justified by appealing to a central limit theorem, while the logit model can be justified by the fact that it is similar to a normal distribution but has a much simpler form. The difference between the logit and the normal distribution is that the logit has slightly heavier tails. The standard normal has mean zero and variance $1$, while the logistic has mean zero and variance equal to $\pi^2/3$.

(3) Extreme Value Type I. The extreme value type I distribution is the least common of the four models. It is important to note that this is an asymmetric pdf.

3 Marginal effects

Unlike in linear models such as the CR or Neo-CR models, the marginal effect of a change in $x$ on $E[y|x]$ is not simply $\beta$. To see why, differentiate $E[y|x]$ with respect to $x$:

$$\frac{\partial E[y|x]}{\partial x} = \frac{dF(x'\beta)}{d(x'\beta)}\,\beta = f(x'\beta)\,\beta$$

These marginal effects look different in each of the four basic probability models.

1. LPM. Note that $f(x'\beta) = 1$, so $\partial E[y|x]/\partial x = \beta$, which is the same as in the CR-type models, as expected.

2. Probit. Now $f(x'\beta) = \phi(x'\beta) = \frac{1}{\sqrt{2\pi}}\, e^{-(x'\beta)^2/2}$, so $\partial E[y|x]/\partial x = \phi(x'\beta)\,\beta$.

3. Logit. In this case:

$$\frac{d\Lambda(x'\beta)}{d(x'\beta)} = \frac{e^{x'\beta}}{(1 + e^{x'\beta})^2} = \frac{e^{x'\beta}}{1 + e^{x'\beta}} \left( 1 - \frac{e^{x'\beta}}{1 + e^{x'\beta}} \right) = \Lambda(x'\beta)\left(1 - \Lambda(x'\beta)\right)$$

giving us the marginal effect $\partial E[y|x]/\partial x = \Lambda(x'\beta)(1 - \Lambda(x'\beta))\,\beta$.
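As an illustration, a minimal sketch of the probit and logit marginal effects evaluated at a single point $x$; the coefficient and regressor values are hypothetical:

```python
import numpy as np
from scipy.stats import norm

beta = np.array([0.5, -1.0])       # hypothetical coefficients (constant, slope)
x = np.array([1.0, 0.3])           # hypothetical evaluation point
z = x @ beta

me_probit = norm.pdf(z) * beta     # phi(x'b) * b
Lam = np.exp(z) / (1 + np.exp(z))
me_logit = Lam * (1 - Lam) * beta  # Lambda(1 - Lambda) * b
print("probit:", np.round(me_probit, 4))
print("logit: ", np.round(me_logit, 4))
```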

3.1 Converting probit marginal effects to logit marginal effects

To convert a probit coefficient estimate into a logit coefficient estimate, the discussion above comparing the variances of the probit and logit random variables suggests multiplying the probit coefficient estimate by $\pi/\sqrt{3} \approx 1.8$ (since the variance of the logistic is $\pi^2/3$ whereas the variance of the standard normal is $1$). But Amemiya suggests a different conversion factor. Through trial and error he found that $1.6$ works better at the center of the distribution, where $F = 0.5$ and $x'\beta = 0$ (which demarcates the mean value of the regressors). Now $\phi(0) = 0.3989$ while $\lambda(0) = 0.25$, where $\lambda$ is the logistic pdf. So we want to solve the equation $0.3989\,\beta_{probit} = 0.25\,\beta_{logit}$, which gives $\beta_{logit} \approx 1.6\,\beta_{probit}$.
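A quick numerical check of the two conversion factors (a sketch; SciPy's standard normal and logistic densities assumed):

```python
import numpy as np
from scipy.stats import norm, logistic

print(np.pi / np.sqrt(3))             # variance-matching factor, about 1.81
print(norm.pdf(0) / logistic.pdf(0))  # density matching at the center: 0.3989 / 0.25 = 1.6
```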

4 Estimation and hypothesis testing

There are two basic methods of estimation, MLE and NLLS. Since the former is far more popular, we'll spend most of our time on it.

4.1 MLE

Given our assumption that the $y_i$ are i.i.d., by the definition of independence we can write the joint probability of observing $\{y_i\}_{i=1}^n$ as:

$$\Pr\{y_1, y_2, \ldots, y_n\} = \prod_{y_i = 0} [1 - F(x_i'\beta)] \prod_{y_i = 1} F(x_i'\beta)$$

Using the notational simplification $F(x_i'\beta) = F_i$, $f(x_i'\beta) = f_i$, and $f'(x_i'\beta) = f_i'$, we can write the likelihood function as:

$$L = \prod_i (1 - F_i)^{1 - y_i} F_i^{y_i}$$

Since we are searching for a value of $\beta$ that maximizes the probability of observing what we have, monotonically increasing transformations will not affect our maximization result. Hence, and because maximizing a sum is easier than maximizing a product, we take the log of the likelihood function:

$$\ln L = \sum_i \{(1 - y_i) \ln[1 - F_i] + y_i \ln F_i\}$$

Now estimate $\hat\beta$ by:

$$\hat\beta = \arg\max_\beta \ln L$$
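For concreteness, a minimal sketch of this maximization for the probit model on simulated data (the data-generating values and names here are hypothetical, not from the notes; a general-purpose optimizer stands in for the Newton-type methods discussed below):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # constant plus one regressor
beta_true = np.array([0.25, 1.0])                      # hypothetical true parameters
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)

def neg_loglik(b):
    # ln L = sum{(1 - y) ln[1 - F] + y ln F} with F = Phi(x'b)
    F = np.clip(norm.cdf(X @ b), 1e-10, 1 - 1e-10)
    return -np.sum((1 - y) * np.log(1 - F) + y * np.log(F))

bhat = minimize(neg_loglik, x0=np.zeros(2)).x
print(bhat)  # should be close to beta_true in large samples
```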

Within the MLE framework, we shall now examine the following six (estimation and testing) procedures:

A. Estimating $\hat\beta$;
B. Estimating the asymptotic variance of $\hat\beta$;
C. Estimating the asymptotic variance of the predicted probabilities;
D. Estimating the asymptotic variance of the marginal effects;
E. Hypothesis testing; and
F. Measuring goodness of fit.

A. Estimating $\hat\beta$

To solve $\max_\beta \ln L$ we need to examine the first and second order conditions.

First Order Conditions (FOCs): A necessary condition for maximization is that the first derivative equal zero:

$$\frac{\partial \ln L}{\partial \beta} = \frac{\partial \ln L}{\partial F(x'\beta)} \cdot \frac{\partial F(x'\beta)}{\partial \beta} = 0$$

If we write:

$$\frac{\partial F(x_i'\beta)}{\partial \beta} = f(x_i'\beta)\, x_i = f_i x_i$$

and we plug in:

$$\ln L = \sum_i \{(1 - y_i) \ln[1 - F_i] + y_i \ln F_i\}$$

then we just need to solve:

$$\sum_i \left[ (1 - y_i)\, \frac{-f_i}{1 - F_i} + y_i\, \frac{f_i}{F_i} \right] x_i = 0$$

which simplifies to:

$$\sum_i \frac{(y_i - F_i)\, f_i}{F_i (1 - F_i)}\, x_i = 0 \quad \{\text{FOCs}\}$$

Now we look at the specific FOCs in three main models.

(1) LPM. Since $F_i = x_i'\beta$ and $f_i = 1$, our FOC becomes:

$$\sum_i \frac{(y_i - F_i)\, f_i}{F_i (1 - F_i)}\, x_i = \sum_i \frac{y_i - x_i'\beta}{(1 - x_i'\beta)(x_i'\beta)}\, x_i = 0$$

This is just a set of linear equations in $x$ and $y$, which we can solve explicitly for $\beta$ in two ways.

(i) Least squares. The first solution gives us a result that is reminiscent of familiar least squares predictors.

(a) GLS. Solving for the $y_i$ in the numerator, we get something resembling the generalized least squares estimator, where each observation is weighted by the variance of $\varepsilon_i$:

$$\hat\beta = \left[ \sum_i \frac{x_i x_i'}{(1 - x_i'\beta)(x_i'\beta)} \right]^{-1} \sum_i \frac{x_i y_i}{(1 - x_i'\beta)(x_i'\beta)} = \left[ \sum_i \frac{x_i x_i'}{\operatorname{Var}(\varepsilon_i|x_i)} \right]^{-1} \sum_i \frac{x_i y_i}{\operatorname{Var}(\varepsilon_i|x_i)}$$

(b) OLS. If we assume homoscedasticity, i.e.:

$$(1 - x_i'\beta)(x_i'\beta) = \operatorname{Var}(\varepsilon_i|x_i) = \operatorname{Var}(\varepsilon) = \sigma^2$$

then the equation above collapses into the standard OLS estimator of $\beta$:

$$\hat\beta = \left[ \sum_i x_i x_i' \right]^{-1} \sum_i x_i y_i$$

(ii) GMM. If we rewrite $y_i - x_i'\beta = \varepsilon_i$, then the FOCs resemble the generalized method of moments condition for solving the heteroscedastic linear LS model:

$$\sum_i \frac{x_i \varepsilon_i}{(1 - x_i'\beta)(x_i'\beta)} = \sum_i \frac{x_i \varepsilon_i}{\sigma_i^2} = 0$$

Again, if we assume homoskedasticity, we get the moment condition for solving the CR model:

$$\frac{1}{\sigma^2} \sum_i x_i \varepsilon_i = \sum_i x_i \varepsilon_i = 0$$

Note that each of these estimators is identical. Some may be more efficient than others in the presence of heteroscedasticity but, in general, they are just different ways of motivating the LS estimator.

(2) Probit. Noting that $F_i = \Phi_i$ and $f_i = \phi_i$, the FOC is just:

$$\sum_i \frac{(y_i - \Phi_i)\,\phi_i}{\Phi_i (1 - \Phi_i)}\, x_i = \sum_{y_i = 0} \frac{-\phi_i}{1 - \Phi_i}\, x_i + \sum_{y_i = 1} \frac{\phi_i}{\Phi_i}\, x_i = 0$$

If we define (refer to the results in the Roy Model handout):

$$\lambda_{0i} = \lambda_0(x_i'\beta) = \frac{-\phi_i}{1 - \Phi_i}, \qquad \lambda_{1i} = \lambda_1(x_i'\beta) = \frac{\phi_i}{\Phi_i}$$

then we can rewrite the FOC as:

$$\sum_i \lambda_i x_i = 0, \quad \text{where } \lambda_i = \lambda_{0i} \text{ if } y_i = 0 \text{ and } \lambda_i = \lambda_{1i} \text{ if } y_i = 1$$

Note that, unlike in the LPM, these FOCs are a set of nonlinear equations in $\beta$. They cannot be easily solved explicitly for $\beta$, so $\beta$ has to be estimated using the numerical methods outlined in the Asymptotic Theory notes.

(3) Logit. Here $F_i = \Lambda_i$ and $f_i = \Lambda_i (1 - \Lambda_i)$, so the FOC becomes:

$$\sum_i \frac{(y_i - \Lambda_i)\,\Lambda_i (1 - \Lambda_i)}{\Lambda_i (1 - \Lambda_i)}\, x_i = \sum_i (y_i - \Lambda_i)\, x_i = 0$$

Interestingly, note that we can write $\varepsilon_i = y_i - \Lambda_i$, so that the FOC can be written $\sum_i (y_i - \Lambda_i)\, x_i = \sum_i \varepsilon_i x_i = 0$, which is similar to the moment conditions for the LPM. Like the probit model, however, the FOCs for the logit model are nonlinear in $\beta$ and must therefore be solved using numerical methods.

Second Order Condition (SOC): Together, the FOCs and the SOC that the second derivative matrix (Hessian) be negative definite are necessary and sufficient conditions for maximization. To verify the second order condition, let:

$$f_i' = f'(x_i'\beta) = \frac{df(x_i'\beta)}{d(x_i'\beta)}$$

so that we need to check that the Hessian

$$\frac{\partial^2 \ln L}{\partial \beta\, \partial \beta'} = \frac{\partial}{\partial \beta'} \sum_i \frac{(y_i - F_i)\, f_i}{F_i (1 - F_i)}\, x_i$$

is negative definite.

(1) LPM. We can prove that the LPM satisfies the SOC. With $f_i = 1$ and $f_i' = 0$, differentiating the FOC gives:

$$\frac{\partial^2 \ln L}{\partial \beta\, \partial \beta'} = -\sum_i \left[ \frac{y_i}{(x_i'\beta)^2} + \frac{1 - y_i}{(1 - x_i'\beta)^2} \right] x_i x_i' = -\sum_i \frac{(y_i - x_i'\beta)^2}{(x_i'\beta)^2 (1 - x_i'\beta)^2}\, x_i x_i' < 0$$

(using the fact that $y_i \in \{0,1\}$ implies $y_i^2 = y_i$).

(2) Probit. The same can be said about the probit model, and the proof follows from the results in the Roy model. First, note that $\phi'(x'\beta) = -(x'\beta)\,\phi(x'\beta)$. Taking the derivative of the first derivative, we need to show:

$$\frac{\partial^2 \ln L}{\partial \beta\, \partial \beta'} = \sum_i \frac{\partial \lambda_i}{\partial (x_i'\beta)}\, x_i x_i' < 0$$

We can simplify this expression using results for the truncated normal (see the results on the truncated normal in the Roy Model handout):

$$\frac{\partial \lambda_{0i}}{\partial (x_i'\beta)} = \frac{-\phi_i' (1 - \Phi_i) - \phi_i^2}{(1 - \Phi_i)^2} = \frac{(x_i'\beta)\,\phi_i (1 - \Phi_i) - \phi_i^2}{(1 - \Phi_i)^2} = -\lambda_{0i}\,(x_i'\beta + \lambda_{0i})$$

$$\frac{\partial \lambda_{1i}}{\partial (x_i'\beta)} = \frac{\phi_i' \Phi_i - \phi_i^2}{\Phi_i^2} = \frac{-(x_i'\beta)\,\phi_i \Phi_i - \phi_i^2}{\Phi_i^2} = -\lambda_{1i}\,(x_i'\beta + \lambda_{1i})$$

So we can write the SOC as:

$$\frac{\partial^2 \ln L}{\partial \beta\, \partial \beta'} = -\sum_i \lambda_i (x_i'\beta + \lambda_i)\, x_i x_i' < 0$$

which holds because $\lambda_i (x_i'\beta + \lambda_i) \in (0,1)$ in both cases,

where, as before, $\lambda_i = \lambda_{0i} = -\phi_i/(1 - \Phi_i)$ if $y_i = 0$ and $\lambda_i = \lambda_{1i} = \phi_i/\Phi_i$ if $y_i = 1$.

(3) Logit. Taking the derivative of the FOC for the logit, we get the SOC:

$$\frac{\partial^2 \ln L}{\partial \beta\, \partial \beta'} = -\sum_i \Lambda_i (1 - \Lambda_i)\, x_i x_i' < 0$$

which clearly holds. Note that since this Hessian does not involve $y_i$, the Newton-Raphson method of numerical optimization, which uses $H$ in its iterative algorithm, and the method of scoring, which uses $E[H]$, are identical in the case of the logit model. Why? Because $E[H]$ is taken with respect to the distribution of $y$.

We've shown that the LPM, probit and logit models are globally concave. So the Newton-Raphson method of optimization will converge in just a few iterations for these three models unless the data are very badly conditioned.
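A sketch of Newton-Raphson for the logit model, using the gradient $\sum_i (y_i - \Lambda_i)\, x_i$ and Hessian $-\sum_i \Lambda_i (1 - \Lambda_i)\, x_i x_i'$ derived above (simulated data; all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.5])
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-(X @ beta_true)))).astype(float)

b = np.zeros(2)
for it in range(25):
    Lam = 1 / (1 + np.exp(-(X @ b)))
    grad = X.T @ (y - Lam)                       # FOC: sum (y_i - Lambda_i) x_i
    H = -(X * (Lam * (1 - Lam))[:, None]).T @ X  # Hessian, negative definite
    step = np.linalg.solve(H, grad)
    b = b - step                                 # Newton update: b - H^{-1} grad
    if np.max(np.abs(step)) < 1e-10:
        break
print(it, b)  # converges in a handful of iterations (global concavity)
```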

B. Estimating the Asy Cov matrix for $\hat\beta$

Recall the following two results from the MLE notes:

(a) $\sqrt{n}\,(\hat\beta - \beta_0) \stackrel{d}{\to} N(0, I(\beta_0)^{-1})$, where

$$I(\beta_0) = \operatorname{plim} \left( -\frac{1}{n}\, \frac{\partial^2 \ln L}{\partial \beta\, \partial \beta'} \bigg|_{\beta_0} \right)$$

(b) the information matrix equality:

$$\lim_{n \to \infty} \frac{1}{n}\, E\left[ \frac{\partial \ln L}{\partial \beta}\, \frac{\partial \ln L}{\partial \beta'} \right] = -\lim_{n \to \infty} \frac{1}{n}\, E\left[ \frac{\partial^2 \ln L}{\partial \beta\, \partial \beta'} \right]$$

We have three possible estimators for Asy.Var$[\hat\beta]$ based on these two facts.

(1) Asy.Var$[\hat\beta] = -\hat H^{-1}$, where $\hat H$ is the actual Hessian evaluated at $\hat\beta$:

$$\hat H = \frac{\partial^2 \ln L}{\partial \beta\, \partial \beta'} \bigg|_{\hat\beta}$$

(2) Asy.Var$[\hat\beta] = -E[\hat H]^{-1}$, where $E[\hat H]$ is the expected Hessian evaluated at $\hat\beta$.

In any model where $H$ does not depend on $y$, $E[\hat H] = \hat H$, since the expectation is taken over the distribution of $y$. So in models such as the logit, the first and second estimators are identical. In the probit model, $\hat H$ depends on $y$, so $\hat H \ne E[\hat H]$. Amemiya ("Qualitative Response Models: A Survey," Journal of Economic Literature, 19(4), 1981, pp. 1483-1536) showed that:

$$E[\hat H]_{probit} = \sum_i \lambda_{0i} \lambda_{1i}\, x_i x_i' = -\sum_i \frac{\phi_i^2}{\Phi_i (1 - \Phi_i)}\, x_i x_i'$$

(3) Berndt, Hall, Hall and Hausman took the following estimator from T.W. Anderson (1959), which we call the TWA estimator: Asy.Var$[\hat\beta] = \hat B^{-1}$, where

$$\hat B = \sum_i \left( \frac{(y_i - F_i)\, f_i}{F_i (1 - F_i)} \right)^2 x_i x_i'$$

Notice there is no negative sign before the $\hat B^{-1}$, as the two negative signs cancel each other out. Note that the three estimators listed here are the basic three variants on the gradient method of iterative numerical optimization explained in the numerical optimization notes.
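A sketch of two of these covariance estimators for a fitted logit, the Hessian-based one and the TWA/outer-product one (for the logit the actual and expected Hessians coincide, so estimators (1) and (2) are the same; the setup is hypothetical and $\hat\beta$ is taken as given rather than estimated):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
bhat = np.array([0.5, -1.0])  # pretend this is the MLE, for illustration only
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-(X @ bhat)))).astype(float)

Lam = 1 / (1 + np.exp(-(X @ bhat)))
H = -(X * (Lam * (1 - Lam))[:, None]).T @ X   # Hessian = expected Hessian for logit
scores = X * (y - Lam)[:, None]               # per-observation score (y_i - Lambda_i) x_i
B = scores.T @ scores                         # outer product of gradients
print(np.linalg.inv(-H))                      # estimators (1) and (2) for the logit
print(np.linalg.inv(B))                       # estimator (3), the TWA estimator
```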

C. Estimating the Asy Cov matrix for the predicted probabilities, $F(x'\hat\beta)$

For simplicity, let $F(x'\hat\beta) = \hat F$. Recall the delta method: if $g$ is continuously differentiable and $\sqrt{n}\,(z_n - \theta_0) \stackrel{d}{\to} N(0, \sigma^2)$, then:

$$\sqrt{n}\,(g(z_n) - g(\theta_0)) \stackrel{d}{\to} N(0, [g'(\theta_0)]^2 \sigma^2)$$

Applying this to $\hat F$, we get:

$$\sqrt{n}\,\left( F(x'\hat\beta) - F(x'\beta_0) \right) \stackrel{d}{\to} N\left(0,\; [f(x'\beta_0)\, x]'\, \operatorname{Asy.Var}[\hat\beta]\, [f(x'\beta_0)\, x] \right)$$

where $\beta_0$ is the true parameter value. So a natural estimator for the asymptotic covariance matrix of the predicted probabilities is:

$$\operatorname{Asy.Var}[\hat F] = \left( \frac{\partial \hat F}{\partial \hat\beta} \right)' V \left( \frac{\partial \hat F}{\partial \hat\beta} \right)$$

where $V = \operatorname{Asy.Var}[\hat\beta]$. Since $\partial \hat F / \partial \hat\beta = f(x'\hat\beta)\, x = \hat f x$, we can write the estimator as:

$$\operatorname{Asy.Var}[\hat F] = \hat f^2\; x' V x$$
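A sketch of this delta-method estimator for a probit predicted probability, where $\hat f = \phi(x'\hat\beta)$; the numbers are hypothetical:

```python
import numpy as np
from scipy.stats import norm

bhat = np.array([0.25, 1.0])    # hypothetical probit estimates
V = np.array([[0.010, 0.002],
              [0.002, 0.020]])  # hypothetical Asy.Var(beta_hat)
x = np.array([1.0, 0.5])        # evaluation point

z = x @ bhat
Fhat = norm.cdf(z)              # predicted probability
grad = norm.pdf(z) * x          # dF-hat / dbeta-hat = f-hat * x
var_Fhat = grad @ V @ grad      # f-hat^2 x'Vx
print(Fhat, var_Fhat)
```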

D. Estimating the Asy Cov matrix for the marginal effects, $f(x'\hat\beta)\,\hat\beta$

To recap, the marginal effects are given by:

$$\gamma = \frac{\partial E[y|x]}{\partial x} = \frac{dF(x'\beta)}{d(x'\beta)}\,\beta = f(x'\beta)\,\beta$$

To simplify notation, let $\hat\gamma = f(x'\hat\beta)\,\hat\beta = \hat f \hat\beta$. Again using the delta method as motivation, a sensible estimator for the asymptotic variance of $\hat\gamma$ would be:

$$\operatorname{Asy.Var}[\hat\gamma] = \left( \frac{\partial \hat\gamma}{\partial \hat\beta'} \right) V \left( \frac{\partial \hat\gamma}{\partial \hat\beta'} \right)'$$

where $V$ is as above. We can be more explicit in defining our estimator by noting that:

$$\frac{\partial \hat\gamma}{\partial \hat\beta'} = \hat f I + \hat\beta\, \frac{\partial \hat f}{\partial \hat\beta'} = \hat f I + \hat f'\, \hat\beta x'$$

where $\hat f' = f'(x'\hat\beta)$. This gives us:

$$\operatorname{Asy.Var}[\hat\gamma] = \left( \hat f I + \hat f'\, \hat\beta x' \right) V \left( \hat f I + \hat f'\, x \hat\beta' \right)$$

This equation still does not tell us much. It may be more interesting to look at what the estimator looks like under different specifications of $F$.

(1) LPM. Recall $F = x'\beta$, $f = 1$ and $f' = 0$, so:

$$\operatorname{Asy.Var}[\hat\gamma] = V = \operatorname{Asy.Var}[\hat\beta]$$

(2) Probit. Here $F = \Phi$, $f = \phi$ and $f' = -(x'\beta)\,\phi$, leaving us with:

$$\operatorname{Asy.Var}[\hat\gamma] = \hat\phi^2 \left[ I - (x'\hat\beta)\,\hat\beta x' \right] V \left[ I - (x'\hat\beta)\, x \hat\beta' \right]$$

(3) Logit. Now $F = \Lambda$, $f = \Lambda(1 - \Lambda)$, and $f' = \Lambda(1 - \Lambda)(1 - 2\Lambda)$, so:

$$\operatorname{Asy.Var}[\hat\gamma] = \left[ \hat\Lambda (1 - \hat\Lambda) \right]^2 \left[ I + (1 - 2\hat\Lambda)\,\hat\beta x' \right] V \left[ I + (1 - 2\hat\Lambda)\, x \hat\beta' \right]$$

E. Hypothesis testing

Suppose we want to test the following set of restrictions, $H_0: R\beta = q$. If we let $r$ be the number of restrictions in $R$, i.e. $r = \operatorname{rank}(R)$, then MLE provides us with three test statistics (refer also to the Asymptotic Theory notes).

(1) Wald test:

$$W = (R\hat\beta - q)' \left[ R\, \operatorname{Est.Asy.Var}(\hat\beta)\, R' \right]^{-1} (R\hat\beta - q) \sim \chi^2(r)$$

Example. Suppose $H_0$: the last $r$ coefficients or elements of $\beta$ are $0$. Define $R = [0 \;\; I_r]$ and $q = 0$, and let $\hat\beta_r$ be the last $r$ elements of $\hat\beta$. Then we get:

$$W = \hat\beta_r'\, V_r^{-1}\, \hat\beta_r$$

where $V_r$ is the corresponding block of $\operatorname{Est.Asy.Var}(\hat\beta)$.

(2) Likelihood ratio test:

$$LR = -2\left[ \ln L(\hat\beta_R) - \ln L(\hat\beta_U) \right] \sim \chi^2(r)$$

where $\ln L(\hat\beta_R)$ and $\ln L(\hat\beta_U)$ are the log likelihood function evaluated with and without the restrictions on $\hat\beta$, respectively.

Example. To test $H_0$: all slope coefficients except that on the constant term are $0$, let:

$$\ln L(\hat\beta_R) = \sum_i \{ y_i \ln \bar p + (1 - y_i) \ln(1 - \bar p) \} = n \left[ \bar p \ln \bar p + (1 - \bar p) \ln(1 - \bar p) \right]$$

where $\bar p$ is the proportion of observations with $y_i = 1$.
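A sketch of the LR statistic using the constant-only restricted log likelihood from the example above (the unrestricted value, sample size, and degrees of freedom are hypothetical):

```python
import numpy as np
from scipy.stats import chi2

loglik_u = -280.5                # hypothetical unrestricted log likelihood
n, pbar = 500, 0.42              # hypothetical sample size and share of ones
loglik_r = n * (pbar * np.log(pbar) + (1 - pbar) * np.log(1 - pbar))

LR = -2 * (loglik_r - loglik_u)  # ~ chi-squared(r) under H0
r = 3                            # hypothetical number of restrictions
print(LR, chi2.sf(LR, r))        # statistic and p-value
```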

(3) Score or Lagrange multiplier test. Write out the Lagrangian for the MLE problem given the restriction $R\beta = q$: $L^* = \ln L - \lambda'(R\beta - q)$. The first order condition is $\partial \ln L / \partial \beta = R'\lambda$. So the test statistic is:

$$LM = \hat g_R'\, \hat V_R\, \hat g_R \sim \chi^2(r)$$

where $\hat g_R$ is the score $\partial \ln L / \partial \beta$ evaluated at the restricted estimate $\hat\beta_R$, and $\hat V_R$ is an estimate of $\operatorname{Asy.Var}[\hat\beta]$ also evaluated at $\hat\beta_R$.

Example. In the logit model, suppose we want to test $H_0$: all slopes are $0$. Then $LM = nR^2$, where $R^2$ is the uncentered coefficient of determination in the regression of $(y_i - \bar p)$ on $x_i$, and $\bar p$ is the proportion of $y_i = 1$ observations in the sample. (Don't worry about how this is derived.)

F. Measuring goodness of fit

There are three basic ways to describe how well a limited dependent variable model fits the data.

(1) Log likelihood function, $\ln L$. The most basic way to describe how successful the model is at fitting the data is to report the value of $\ln L$ at $\hat\beta$. Since the hypothesis that all the slopes in the model are zero is also interesting, the log likelihood computed with only a constant term, $\ln L_0$, should be reported as well. Comparing $\ln L_0$ to $\ln L$ gives us an idea of how much the likelihood improves on adding the explanatory variables.

(2) Likelihood ratio index, LRI. An analog to the $R^2$ in the CR model is the likelihood ratio index,

$$LRI = 1 - \frac{\ln L}{\ln L_0}$$

This measure has an intuitive appeal in that it is bounded by $0$ and $1$: since $\ln L$ is a small negative number while $\ln L_0$ is a larger negative number, $\ln L / \ln L_0 < 1$. If $LRI = 1$, then $\hat F_i = 1$ whenever $y_i = 1$ and $\hat F_i = 0$ whenever $y_i = 0$, giving us a perfect fit. $LRI = 0$ when the fit is miserable, i.e. $\ln L = \ln L_0$. Unfortunately, values between $0$ and $1$ have no natural interpretation like they do in the $R^2$ measure.
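Computing the index is a one-liner (the log likelihood values are hypothetical):

```python
loglik, loglik0 = -280.5, -340.2  # hypothetical ln L and constant-only ln L0
LRI = 1 - loglik / loglik0        # likelihood ratio index
print(LRI)                        # about 0.18; no natural interpretation between 0 and 1
```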

(3) Hit and miss table. A useful summary of the predictive ability of the model is a $2 \times 2$ table of the hits and misses of a prediction rule: $\hat y = 1$ if $F(x'\hat\beta) > F^*$, and $0$ otherwise.

             y = 0                          y = 1
Hits         # of obs. where  y-hat = 0     # of obs. where  y-hat = 1
Misses       # of obs. where  y-hat = 1     # of obs. where  y-hat = 0

The usual value for $F^*$ is $0.5$. Note, however, that $0.5$ may seem reasonable but is arbitrary.
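Finally, a sketch of the hit-and-miss table for the threshold $F^* = 0.5$ (the fitted probabilities and outcomes are hypothetical):

```python
import numpy as np

Fhat = np.array([0.1, 0.8, 0.6, 0.3, 0.9])  # hypothetical fitted probabilities
y = np.array([0, 1, 0, 0, 1])
yhat = (Fhat > 0.5).astype(int)             # prediction rule with F* = 0.5

for actual in (0, 1):                       # one column of the 2x2 table per outcome
    hit = np.sum((y == actual) & (yhat == actual))
    miss = np.sum((y == actual) & (yhat != actual))
    print(f"y = {actual}: hits = {hit}, misses = {miss}")
```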