Binary Dependent Variables

In some cases the outcome of interest - rather than one of the right hand side variables - is discrete rather than continuous. The simplest example of this is when the Y variable is binary, so that it can take only two possible values (eg Pass/Fail, Profit/Loss, Win/Lose).

Representing this is easy: just use a dummy variable on the left hand side of your model, so that

Yi = β0 + β1X1i + β2X2i + ui    (1)

becomes

Di = β0 + β1X1i + β2X2i + ui    (2)

where Di = 1 if the individual (or firm or country etc) has the characteristic (eg Win)
      Di = 0 if not (eg Lose)
Equation (2) can be estimated by OLS. When we do this it is called a linear probability model, and we can interpret the coefficients in a similar way as with other OLS models (linear because it fits a straight line and probability because it implicitly models the probability of an event occurring).

[Figure: two (scatter first num_sems) (line phat num_sems) - a scatter of first against num_sems with the fitted values from the linear probability model.] The chance of getting a first can be seen to depend positively on the number of seminars attended in this linear probability model.

So for example the coefficient β1 = dDi/dX1 now gives the impact of a unit change in the value of X1 on the chances of belonging to the category coded D = 1 (eg of winning) - hence the name linear probability model.

Example: Using the dataset marks.dta we can work out how the chances of getting a first depend on class attendance and gender using a linear probability model.

gen first=mark>=70   /* first set up a dummy variable that will become the dependent variable */

tab first

      first |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         79       66.95       66.95
          1 |         39       33.05      100.00
------------+-----------------------------------
      Total |        118      100.00

So 39 students got a first out of 118. If we summarise this variable, the mean value of this (or any) binary variable is the proportion of the sample with that characteristic.

ci first

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
       first |        118    .3305085    .0434881        .2443825    .4166345

So in this case 33% of the course got a first class mark (.33 = 33%). To see what determines this, look at the OLS regression output.

reg first num_sems female

      Source |       SS       df       MS              Number of obs =     118
-------------+------------------------------           F(  2,   115) =   12.03
       Model |  4.51869886     2  2.25934943           Prob > F      =  0.0000
    Residual |  21.5914706   115  .187751919           R-squared     =  0.1731
-------------+------------------------------           Adj R-squared =  0.1587
       Total |  26.1101695   117  .223163842           Root MSE      =   .4333

------------------------------------------------------------------------------
       first |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    num_sems |   .0342389   .0096717     3.54   0.001     .0150811    .0533967
      female |   .2144069   .0837238     2.56   0.012     .0485662    .3802475
       _cons |  -.1796094   .1244661    -1.44   0.152    -.4261528     .066934
------------------------------------------------------------------------------

This says that the chances of getting a first rise by 3.4 percentage points for every extra class attended and that, on average, women are 21 percentage points more likely to get a first, even after allowing for the number of classes attended.

However, it can be shown that OLS estimates when the dependent variable is binary:

1. will suffer from heteroskedasticity, so that the t-statistics are biased
2. as the graph above shows, may not constrain the predicted values to lie between 0 and 1 (which we need if we are going to predict behaviour accurately)
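The first of these problems can be mitigated with heteroskedasticity-robust standard errors; a minimal sketch, re-using the regression above (the coefficients are unchanged, only the standard errors and t-statistics are recomputed):

* Linear probability model with heteroskedasticity-robust standard errors
reg first num_sems female, robust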

Using the example above:

predict phat
(option xb assumed; fitted values)

su phat

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+----------------------------------------------------------
        phat |       118    .3305085    .1965232   -.1796094   .6510978

We can see that there are some negative predictions, which is odd for a variable that is binary. Note that not many of the predicted values are near zero (and none are near 1). This is not unusual, since the model is actually predicting the probability of belonging to one of the two categories.
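A quick way to quantify the second problem (a sketch, assuming the phat variable created by the predict command above):

* How many fitted probabilities fall outside the unit interval?
count if phat < 0 | phat > 1
list num_sems female phat if phat < 0   /* inspect the offending observations */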

Because of this it is better to use a model that explicitly rather than implicitly models this probability and does not suffer from heteroskedasticity. So we model

Probability (Di = 1) = F(X1, X2 ... Xk)

as a function of the right hand side variables, rather than simply measuring whether D = 1 or 0. We therefore need to come up with a specific functional form for this relationship. There are 2 alternatives commonly used.

1. The Logit Model

Assumes that the probability model is given by

Prob(Di = 1) = 1 / (1 + e^-(β0 + β1X1 + β2X2))

so that the probability is the logistic function of the linear index β0 + β1X1 + β2X2.
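To see the functional form in action, here is a sketch (using the marks.dta variables from the example above; the helper variable names are illustrative) that rebuilds the fitted probabilities by hand from the linear index:

* Fit the logit, then reproduce Pr(first=1) manually as the logistic
* transformation of the linear index Xb
logit first num_sems female
predict xb_logit, xb                    /* linear index b0 + b1*X1 + b2*X2 */
gen p_manual = 1/(1 + exp(-xb_logit))   /* logistic function of the index */
predict p_built, pr                     /* Stata's own fitted probability */
su p_manual p_built                     /* the two should be identical */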

[Figure: the logit function F(Xβ), an S-shaped curve bounded by 0 and 1.] As the value of the index Xβ rises the probability asymptotes to one; as the value of Xβ falls the probability asymptotes to zero.

2. The Probit Model

Assumes that the probability model is given by the cumulative standard normal distribution,

Prob(Di = 1) = Φ(β0 + β1X1 + β2X2)   where   Φ(z) = ∫_-∞^z (1/√(2π)) e^(-t²/2) dt

In both cases the idea is to find the values of the coefficients β0, β1 etc that make this probability as close to the values in the dataset as possible. The technique is called maximum likelihood estimation.
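The same by-hand check as before works for the probit, swapping the logistic function for Stata's standard normal cdf normal() (again a sketch with illustrative variable names):

* Reproduce the probit's fitted probabilities from the linear index
probit first num_sems female
predict xb_probit, xb
gen p_manual2 = normal(xb_probit)   /* Phi(Xb), the standard normal cdf */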

Example: Using the marks.dta data set above, the logit and probit equivalents of the OLS linear probability estimates above are, respectively:

logit first num_sems female

Iteration 0:   log likelihood = -74.875501
Iteration 1:   log likelihood = -63.834471
Iteration 2:   log likelihood = -62.907479
Iteration 3:   log likelihood = -62.868479
Iteration 4:   log likelihood = -62.868381

Logistic regression                               Number of obs   =        118
                                                  LR chi2(2)      =      24.01
                                                  Prob > chi2     =     0.0000
Log likelihood = -62.868381                       Pseudo R2       =     0.1604

------------------------------------------------------------------------------
       first |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    num_sems |   .2397919   .0747599     3.21   0.001     .0932652    .3863185
      female |   1.054987   .4337783     2.43   0.015     .2047975    1.905177
       _cons |  -4.357462   1.084287    -4.02   0.000    -6.482626   -2.232298
------------------------------------------------------------------------------

probit first num_sems female

Iteration 0:   log likelihood = -74.875501
Iteration 1:   log likelihood = -63.637368
Iteration 2:   log likelihood = -63.127944
Iteration 3:   log likelihood = -63.122191
Iteration 4:   log likelihood = -63.12219

Probit regression                                 Number of obs   =        118
                                                  LR chi2(2)      =      23.51
                                                  Prob > chi2     =     0.0000
Log likelihood = -63.12219                        Pseudo R2       =     0.1570

------------------------------------------------------------------------------
       first |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    num_sems |   .1329781   .0397433     3.35   0.001     .0550827    .2108736
      female |   .6198004   .2607587     2.38   0.017     .1087228    1.130878
       _cons |   -2.44597   .5545824    -4.41   0.000    -3.532932   -1.359009
------------------------------------------------------------------------------

Note that the predicted values from the logit and probit regressions will lie between 0 and 1:

predict phat_probit
(option p assumed; Pr(first))

su phat_probit phat_logit

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+----------------------------------------------------------
 phat_probit |       118    .3330643    .1980187   .0072231   .7147911
  phat_logit |       118    .3305085         ...        ...        ...

While the means are almost the same, the predictions are not identical for the two estimation techniques.

The standard errors and z values on these variables should be free of the bias inherent in OLS, though they could still be subject to other forms of heteroskedasticity, so it is a good idea to use the , robust adjustment even with logit and probit estimators:

logit first num_sems female, robust

Iteration 0:   log pseudolikelihood = -74.875501
Iteration 1:   log pseudolikelihood = -63.834471
Iteration 2:   log pseudolikelihood = -62.907479
Iteration 3:   log pseudolikelihood = -62.868479
Iteration 4:   log pseudolikelihood = -62.868381

Logistic regression                               Number of obs   =        118
                                                  Wald chi2(2)    =      14.65
                                                  Prob > chi2     =     0.0007
Log pseudolikelihood = -62.868381                 Pseudo R2       =     0.1604

------------------------------------------------------------------------------
             |               Robust
       first |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    num_sems |   .2397919   .0770628     3.11   0.002     .0887516    .3908322
      female |   1.054987   .4304162     2.45   0.014     .2113872    1.898588
       _cons |  -4.357462   1.154993    -3.77   0.000    -6.621206   -2.093718
------------------------------------------------------------------------------

(In this example it makes little difference.)

In both cases the estimated coefficients look very different from those of the OLS linear probability estimates. This is because they do not have the same interpretation as with OLS (they are simply values that maximise the likelihood function). To obtain coefficients which can be interpreted in a similar way to OLS, we need marginal effects

∂Prob(Di = 1)/∂Xi

For the logit model this is given by

∂Prob(Di = 1)/∂Xi = βi * exp(β0 + β1X1 + β2X2) / [1 + exp(β0 + β1X1 + β2X2)]²
                  = βi * Prob(Di = 1) * (1 - Prob(Di = 1))

(In truth, since Prob(Di = 1) varies with the values of the X variables, this marginal effect is typically evaluated with all the X variables set to their mean values.)
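As a sketch of that formula at work (the local macro names are illustrative), the marginal effect of num_sems at the sample means can be computed by hand and compared with the Stata output below:

* Marginal effect of num_sems at the means: beta * p * (1 - p)
qui logit first num_sems female
qui su num_sems
local mean_sems = r(mean)
qui su female
local mean_fem = r(mean)
local xb = _b[_cons] + _b[num_sems]*`mean_sems' + _b[female]*`mean_fem'
local p_bar = 1/(1 + exp(-`xb'))
display _b[num_sems]*`p_bar'*(1 - `p_bar')   /* should match the mfx value below */
margins, dydx(num_sems) atmeans              /* modern (Stata 11+) alternative to mfx */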

In the case of the probit model this marginal effect is given by

∂Prob(Di = 1)/∂Xi = βi * f(β0 + β1X1 + β2X2)

(where f is the standard normal density, the derivative of the probit function above, again typically evaluated at the mean of each X variable). In both cases the interpretation of these marginal effects is the impact that a unit change in the variable Xi has on the probability of belonging to the category coded D = 1 (just like OLS coefficients).
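The corresponding by-hand probit calculation just replaces p(1 - p) with Stata's standard normal density normalden() evaluated at the index (again a sketch with illustrative macro names):

* Marginal effect of num_sems at the means for the probit
qui probit first num_sems female
qui su num_sems
local mean_sems = r(mean)
qui su female
local mean_fem = r(mean)
local xb = _b[_cons] + _b[num_sems]*`mean_sems' + _b[female]*`mean_fem'
display _b[num_sems]*normalden(`xb')   /* should match the dprobit/mfx value below */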

To obtain marginal effects in Stata, run either the logit or probit command and then simply type mfx:

logit first num_sems female
mfx

Marginal effects after logit
      y  = Pr(first) (predict)
         =  .27708689
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
num_sems |   .0480326      .01343    3.58   0.000   .021713  .074352   12.4576
 female* |   .2192534      .09097    2.41   0.016   .040952  .397554   .389831
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1

probit first num_sems female
mfx

Marginal effects after probit
      y  = Pr(first) (predict)
         =  .29192788
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
num_sems |   .0456601      .01301    3.51   0.000   .020157  .071163   12.4576
 female* |   .2177256      .09231    2.36   0.018   .036807  .398645   .389831
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1

(or with probit you can also type:

dprobit first num_sems female

Iteration 0:   log likelihood = -74.875501
Iteration 1:   log likelihood = -63.637368
Iteration 2:   log likelihood = -63.127944
Iteration 3:   log likelihood = -63.122191
Iteration 4:   log likelihood = -63.12219

Probit regression, reporting marginal effects     Number of obs   =        118
                                                  LR chi2(2)      =      23.51
                                                  Prob > chi2     =     0.0000
Log likelihood = -63.12219                        Pseudo R2       =     0.1570

------------------------------------------------------------------------------
   first |      df/dx   Std. Err.      z    P>|z|     x-bar  [    95% C.I.   ]
---------+--------------------------------------------------------------------
num_sems |   .0456601   .0130119    3.35   0.001   12.4576   .020157  .071163
 female* |   .2177256   .0923073    2.38   0.017   .389831   .036807  .398645
---------+--------------------------------------------------------------------
  obs. P |   .3305085
 pred. P |   .2919279  (at x-bar)
------------------------------------------------------------------------------
(*) df/dx is for discrete change of dummy variable from 0 to 1
    z and P>|z| correspond to the test of the underlying coefficient being 0 )

These marginal effects are similar to the OLS linear probability estimates (as they should be, since OLS, logit and probit are all estimating the same underlying effect of each X on the probability).

Goodness of Fit Tests

The maximum likelihood equivalent to the F test of goodness of fit of the model as a whole is given by the Likelihood Ratio (LR) test, which compares the value of the (log) likelihood when maximised with the value of the likelihood function when all coefficients are set to zero (much like the F test compares the model RSS with the RSS when all coefficients are set to zero).

It can be shown that

LR = 2 [Log L_max - Log L_0] ~ χ²(k-1)

where k is the number of right hand side variables including the constant. If the estimated chi-squared value exceeds the critical value then reject the null that the model has no explanatory power. (This value is given in the top right hand corner of the logit/probit output in Stata.)
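The statistic can be reproduced from Stata's stored results, where e(ll) is the maximised log likelihood and e(ll_0) the log likelihood with only a constant (a sketch using the model above):

qui logit first num_sems female
display 2*(e(ll) - e(ll_0))                /* LR chi2(2), matches 24.01 above */
display chi2tail(2, 2*(e(ll) - e(ll_0)))   /* the associated p-value */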

Can also use the LR test to test restrictions on subsets of the coefficients, in a similar way to the F test:

LR = 2 [Log L_max(unrestricted) - Log L_max(restricted)] ~ χ²(l)

(where l is the number of restrictions imposed)
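In Stata this comparison is automated by the lrtest command; a sketch testing whether female can be dropped from the logit:

qui logit first num_sems female
estimates store full        /* save the unrestricted model */
qui logit first num_sems    /* restricted model without female */
lrtest full .               /* LR test of the restriction, chi2(1) */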

A maximum likelihood equivalent of the R² is the pseudo-R²:

Pseudo R² = 1 - (Log L_max / Log L_0)

This value lies between 0 and 1, and the closer to one the better the fit of the model (again this value is given in the top right hand corner of the logit/probit output in Stata).
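This too can be verified from the stored results (a one-line check):

qui logit first num_sems female
display 1 - e(ll)/e(ll_0)   /* matches Pseudo R2 = 0.1604 above */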

It is also a good idea to try and look at the % of correct predictions of the model, ie how many observations are predicted to have a value of 1 and how many are predicted to have a value of 0. Can do this by assigning a rule:

Predict = 1 if p̂ >= .5
Predict = 0 if p̂ < .5

where p̂ is the individual predicted probability taken from the logit or probit model.

g predict=1 if phat>=.5
replace predict=0 if phat<.5

tab predict first, row

           |         first
   predict |         0          1 |     Total
-----------+----------------------+----------
         0 |        69         22 |        91
           |     75.82      24.18 |    100.00
-----------+----------------------+----------
         1 |        10         17 |        27
           |     37.04      62.96 |    100.00
-----------+----------------------+----------
     Total |        79         39 |       118
           |     66.95      33.05 |    100.00

So in this case, of those the model predicts to get a first, 63% actually do, and of those it predicts not to get a first, 76% actually do not. Compare this with a random guess, which would get only around 50% of each category correct.
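Stata will produce a classification table like this, together with sensitivity and specificity, directly after a logit with the estat classification postestimation command (which uses the same 0.5 cutoff by default); a sketch:

qui logit first num_sems female
estat classification   /* classification table at the default 0.5 cutoff */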