Applied Economics. Regression with a Binary Dependent Variable. Department of Economics Universidad Carlos III de Madrid


Applied Economics Regression with a Binary Dependent Variable Department of Economics Universidad Carlos III de Madrid See Stock and Watson (chapter 11) 1 / 28

Binary Dependent Variables: What is Different?
So far the dependent variable (Y) has been continuous:
- district-wide average test score
- traffic fatality rate
What if Y is binary?
- Y = get into college, or not; X = high school grades, SAT scores, demographic variables
- Y = person smokes, or not; X = cigarette tax rate, income, demographic variables
- Y = mortgage application is accepted, or not; X = race, income, house characteristics, marital status
2 / 28

Example: Mortgage Denial and Race. The Boston Fed HMDA Dataset
- Individual applications for single-family mortgages made in 1990 in the greater Boston area
- 2380 observations, collected under the Home Mortgage Disclosure Act (HMDA)
Variables:
- Dependent variable: Is the mortgage denied or accepted?
- Independent variables: income, wealth, employment status, other loan and property characteristics, and race of applicant
3 / 28

Linear Probability Model
A natural starting point is the linear regression model with a single regressor:
Y_i = β0 + β1 X_i + u_i
But:
- What does β1 mean when Y is binary?
- What does the line β0 + β1 X mean when Y is binary?
- What does the predicted value Ŷ mean when Y is binary? For example, what does Ŷ = 0.26 mean?
4 / 28

Linear Probability Model
When Y is binary:
E(Y|X) = 1·Pr(Y=1|X) + 0·Pr(Y=0|X) = Pr(Y=1|X)
Under the assumption E(u_i|X_i) = 0,
E(Y_i|X_i) = E(β0 + β1 X_i + u_i | X_i) = β0 + β1 X_i
so E(Y|X) = Pr(Y=1|X) = β0 + β1 X_i
In the linear probability model, the predicted value of Y is interpreted as the predicted probability that Y = 1, and β1 is the change in that predicted probability for a unit change in X.
5 / 28

Linear Probability Model
When Y is binary, the linear regression model Y_i = β0 + β1 X_i + u_i is called the linear probability model because
Pr(Y=1|X) = β0 + β1 X_i
The predicted value is a probability:
- E(Y|X=x) = Pr(Y=1|X=x) = the probability that Y = 1 given X = x
- Ŷ = the predicted probability that Y = 1 given X
- β1 = change in the probability that Y = 1 for a unit change in x:
  β1 = [Pr(Y=1|X=x+Δx) − Pr(Y=1|X=x)] / Δx
6 / 28

Example: linear probability model, HMDA data
Mortgage denial vs. ratio of debt payments to income (P/I ratio) in a subset of the HMDA data set (n = 127)
7 / 28

Example: linear probability model, HMDA data

deny_i = β0 + β1 PI_i + β2 black_i + u_i

Model 1: OLS, using observations 1-2380
Dependent variable: deny
Heteroskedasticity-robust standard errors, variant HC1

            Coefficient   Std. Error   t-ratio   p-value
  const      -0.0905136    0.0285996   -3.1649    0.0016
  pi_rat      0.559195     0.0886663    6.3067    0.0000
  black       0.177428     0.0249463    7.1124    0.0000

Mean dependent var    0.119748    S.D. dependent var    0.324735
Sum squared resid     231.8047    S.E. of regression    0.312282
R-squared             0.076003    Adjusted R-squared    0.075226
F(2, 2377)            49.38650    P-value(F)            9.67e-22
Log-likelihood       -605.6108    Akaike criterion      1217.222
Schwarz criterion     1234.546    Hannan-Quinn          1223.527
8 / 28
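The point estimates above can be used directly to compute predicted denial probabilities. A minimal Python sketch (note that the intercept in Stock and Watson's published version of this regression is negative, −0.0905; the minus sign is easily lost in transcription):

```python
# Predicted Pr(deny = 1) from the estimated linear probability model.
# Coefficients are the point estimates from the gretl output; the constant
# is negative (-0.0905) in Stock and Watson's published estimates.
b_const, b_pi_rat, b_black = -0.0905136, 0.559195, 0.177428

def lpm_prob(pi_ratio, black):
    """Predicted probability of mortgage denial under the LPM."""
    return b_const + b_pi_rat * pi_ratio + b_black * black

# Two applicants with the same P/I ratio of 0.3, differing only by race:
p_black = lpm_prob(0.3, black=1)
p_white = lpm_prob(0.3, black=0)
print(round(p_black, 3))            # 0.255
print(round(p_white, 3))            # 0.077
print(round(p_black - p_white, 3))  # 0.177: the coefficient on black
```

Because the model is linear, the racial gap in predicted denial probability is the coefficient on black at every value of the P/I ratio.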

The linear probability model: Summary
Advantages:
- simple to estimate and to interpret
- inference is the same as for multiple regression (but heteroskedasticity-robust standard errors are needed)
9 / 28

The linear probability model: Summary
Disadvantages:
- An LPM says that the change in the predicted probability for a given change in X is the same for all values of X, which does not always make sense (think about the HMDA example).
- Also, LPM predicted probabilities can be < 0 or > 1!
- These disadvantages can be addressed by using a nonlinear probability model: probit and logit regression.
10 / 28

Probit and Logit Regression
The problem with the linear probability model is that it models the probability that Y = 1 as linear:
Pr(Y=1|X) = β0 + β1 X
Instead, we want:
i. Pr(Y=1|X) to be increasing in X for β1 > 0, and
ii. 0 ≤ Pr(Y=1|X) ≤ 1 for all X
This requires a nonlinear functional form for the probability. How about an S-curve?
11 / 28

S-curve function candidates 12 / 28

Probit regression
Probit regression models the probability that Y = 1 using the cumulative standard normal distribution function, Φ(z), evaluated at z = β0 + β1 X. The probit regression model is
Pr(Y=1|X) = Φ(β0 + β1 X)
where Φ(·) is the cumulative standard normal distribution function and z = β0 + β1 X is the z-value or z-index of the probit model.
Example: Suppose β0 = −2, β1 = 3, X = 0.4, so
Pr(Y=1|X=0.4) = Φ(−2 + 3 × 0.4) = Φ(−0.8)
Pr(Y=1|X=0.4) = area under the standard normal density to the left of z = −0.8, which is...
13 / 28
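The value Φ(−0.8) can be checked numerically. A short Python sketch, writing the standard normal CDF in terms of the error function:

```python
import math

def Phi(z):
    """Cumulative standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# z-index from the slide: beta0 = -2, beta1 = 3, X = 0.4
z = -2 + 3 * 0.4
print(round(Phi(z), 4))  # 0.2119
```

So the predicted probability that Y = 1 at X = 0.4 is about 21.2%.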

Probit regression 14 / 28

Logit regression
Logit regression models the probability that Y = 1, given X, as the cumulative standard logistic distribution function, evaluated at z = β0 + β1 X:
Pr(Y=1|X) = F(β0 + β1 X)
where F is the cumulative logistic distribution function:
F(β0 + β1 X) = 1 / (1 + e^−(β0 + β1 X))
15 / 28

Logit regression
Example: Suppose β0 = −3, β1 = 2, X = 0.4, so
Pr(Y=1|X=0.4) = 1 / (1 + e^−(−3 + 2 × 0.4)) = 0.0998
Why bother with logit if we have probit?
- The main reason is historical: logit is computationally faster and easier, but that does not matter nowadays.
- In practice, logit and probit are very similar. Since empirical results typically do not hinge on the logit/probit choice, both tend to be used in practice.
16 / 28
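The slide's arithmetic is easy to reproduce in a few lines of Python:

```python
import math

def logistic_cdf(z):
    """Cumulative standard logistic distribution function F(z) = 1/(1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

# z-index from the slide: beta0 = -3, beta1 = 2, X = 0.4
z = -3 + 2 * 0.4
print(round(logistic_cdf(z), 4))  # 0.0998
```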

Understanding the Coefficients and the Slopes
In contrast to the linear model, in the probit and logit models the coefficients do not capture the marginal effect on the outcome when a control changes:
- if control x_j is continuous: ∂Pr(y=1)/∂x_j = f(βx) β_j
- if control x_j is discrete: ΔPr(y=1) = F(βx_1) − F(βx_0)
where f(·) and F(·) are the density and cumulative distribution functions.
17 / 28

Understanding the Coefficients and the Slopes
Specifically, with z = β0 + β1 x_1 + ... + β_k x_k:
Logit:
- f(z) = e^−z / (1 + e^−z)²
- F(z) = 1 / (1 + e^−z)
Probit:
- f(z) = φ(z)
- F(z) = Φ(z)
18 / 28
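Because the marginal effect of a continuous control is f(z)·β_j, it varies with the index z rather than being constant as in the LPM. A small illustration, using a hypothetical coefficient β_j = 0.5 (not from the text):

```python
import math

def logit_density(z):
    """Logistic density f(z) = e^-z / (1 + e^-z)^2."""
    e = math.exp(-z)
    return e / (1.0 + e) ** 2

def probit_density(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

beta_j = 0.5  # hypothetical coefficient, for illustration only
for z in (-2.0, 0.0, 2.0):
    me_logit = logit_density(z) * beta_j
    me_probit = probit_density(z) * beta_j
    print(f"z={z:+.1f}  logit ME={me_logit:.4f}  probit ME={me_probit:.4f}")
```

Both densities peak at z = 0, so the marginal effect is largest for observations whose predicted probability is near one half and shrinks in the tails.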

Estimation and Inference in the Logit and Probit Models
We will focus on the probit model:
Pr(Y=1|X) = Φ(β0 + β1 X)
We could use nonlinear least squares. However, a more efficient estimator (smaller variance) is the maximum likelihood estimator.
19 / 28

The Maximum Likelihood Estimator of the Coefficients in the Probit Model
- The likelihood function is the conditional density of Y_1,...,Y_n given X_1,...,X_n, treated as a function of the unknown parameters (the β's).
- The maximum likelihood estimator (MLE) is the value of the β's that maximizes the likelihood function.
- The MLE is the value of the β's that best describes the full distribution of the data.
- In large samples, the MLE is:
  - consistent
  - normally distributed
  - efficient (has the smallest variance of all estimators)
20 / 28

The Maximum Likelihood Estimator of the Coefficients in the Probit Model
Data: Y_1,...,Y_n, i.i.d.
Derivation of the likelihood starts with the density of Y_1:
Pr(Y_1=1|X_1) = Φ(β0 + β1 X_1) and Pr(Y_1=0|X_1) = 1 − Φ(β0 + β1 X_1), so
Pr(Y_1=y_1|X_1) = [Φ(β0 + β1 X_1)]^y_1 [1 − Φ(β0 + β1 X_1)]^(1−y_1),  y_1 = 0, 1
or, with z_1 = β0 + β1 X_1,
Pr(Y_1=y_1|X_1) = Φ(z_1)^y_1 (1 − Φ(z_1))^(1−y_1)
21 / 28

The Maximum Likelihood Estimator. Probit
The probit likelihood function is the joint density of Y_1,...,Y_n given X_1,...,X_n, treated as a function of the β's:
f(β; Y_1,...,Y_n | X_1,...,X_n) = {Φ(z_1)^y_1 (1 − Φ(z_1))^(1−y_1)} × {Φ(z_2)^y_2 (1 − Φ(z_2))^(1−y_2)} × ... × {Φ(z_n)^y_n (1 − Φ(z_n))^(1−y_n)}
- The MLE β̂ maximizes this likelihood function.
- But we cannot solve for the maximum explicitly! So the likelihood must be maximized using numerical methods.
- In large samples, the MLE β̂'s:
  - are consistent
  - are normally distributed
  - are asymptotically efficient among all estimators (assuming the probit model is the correct model)
22 / 28
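Taking logs, the objective is a sum of terms y_i·log Φ(z_i) + (1−y_i)·log(1−Φ(z_i)), maximized numerically. A minimal sketch on made-up data (not the HMDA data), with a crude grid search standing in for a real optimizer such as Newton-Raphson:

```python
import math

def Phi(z):
    """Cumulative standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_loglik(b0, b1, y, x):
    """Probit log-likelihood: sum of y*log(Phi(z)) + (1-y)*log(1-Phi(z))."""
    ll = 0.0
    for yi, xi in zip(y, x):
        p = min(max(Phi(b0 + b1 * xi), 1e-12), 1.0 - 1e-12)  # guard log(0)
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return ll

# Made-up sample with overlapping outcomes, so the MLE is finite:
y = [0, 1, 0, 1, 1]
x = [0.2, 0.3, 0.8, 0.9, 1.5]

# Crude grid search over (b0, b1); real software takes Newton-type steps.
grid = [i / 10.0 for i in range(-50, 51)]
ll_hat, b0_hat, b1_hat = max((probit_loglik(b0, b1, y, x), b0, b1)
                             for b0 in grid for b1 in grid)
print(b0_hat, b1_hat, round(ll_hat, 3))
```

Grid search is only for illustration; gretl's probit command maximizes the same log-likelihood with a proper iterative algorithm and reports standard errors from the Hessian.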

The Maximum Likelihood Estimator. Probit
- Standard errors of the MLE β̂'s are computed automatically
- Testing and confidence intervals proceed as usual
- Everything extends to multiple X's
23 / 28

The Maximum Likelihood Estimator. Logit
- The only difference between probit and logit is the functional form used for the probability: Φ is replaced by the cumulative logistic distribution function. Otherwise, the likelihood is similar.
- As with probit:
  - the MLE β̂'s are consistent
  - their standard errors can be computed
  - testing and confidence intervals proceed as usual
24 / 28

Measures of Fit for Logit and Probit
The R² and adjusted R² do not make sense here (why?). So two other specialized measures are used:
- The fraction correctly predicted = the fraction of Y's for which the predicted probability is > 50% when Y_i = 1, or < 50% when Y_i = 0.
- The pseudo-R² measures the improvement in the value of the log-likelihood relative to having no X's.
25 / 28
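Both measures are easy to compute from fitted probabilities and log-likelihoods. A small sketch with hypothetical numbers, using McFadden's version of the pseudo-R²:

```python
def fraction_correctly_predicted(y, p_hat):
    """Share of observations where p_hat > 0.5 and y = 1, or p_hat <= 0.5 and y = 0."""
    hits = sum((p > 0.5) == (yi == 1) for yi, p in zip(y, p_hat))
    return hits / len(y)

def pseudo_r2(loglik_model, loglik_null):
    """McFadden pseudo-R^2: 1 - lnL(model with X's) / lnL(intercept only)."""
    return 1.0 - loglik_model / loglik_null

# Hypothetical fitted probabilities for six observations:
y     = [1, 1, 0, 0, 1, 0]
p_hat = [0.80, 0.60, 0.30, 0.55, 0.70, 0.20]
print(fraction_correctly_predicted(y, p_hat))  # 5 of 6 correct

# Hypothetical log-likelihoods (full model vs. intercept-only model):
print(round(pseudo_r2(-100.0, -150.0), 3))  # 0.333
```

A pseudo-R² of 0 means the X's add nothing beyond the intercept; values closer to 1 mean a larger log-likelihood improvement.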

Basic Commands in gretl for Probit Estimation
- probit: computes the maximum likelihood probit estimates
- logit: computes the maximum likelihood logit estimates
- omit/add: tests joint significance
- $yhat: returns probability estimates
- $lnl: returns the log-likelihood of the last estimated model
- pdf(n, z): returns the density of the normal distribution
- cdf(n, z): returns the cdf of the normal distribution
26 / 28

probit depvar indvars --robust --verbose --p-values
- depvar must be binary {0, 1} (otherwise a different model is estimated or an error message is given)
- slopes are computed at the means by default; standard errors are computed using the negative inverse of the Hessian
- output shows a χ² test statistic for the null that all slopes are zero
Options:
1. --robust: covariance matrix robust to model misspecification
2. --p-values: shows p-values instead of slope estimates
3. --verbose: shows information from all numerical iterations
27 / 28

Impact of fertility on female labor participation
Using the data set fertility.gdt:
- Estimate a linear probability model that explains whether or not a woman worked during the last year as a function of the variables morekids, agem1, black, hispan, and othrace. Give an interpretation of the parameters.
- Using the previous model, what is the impact on the probability of working when a woman has more than two children?
- Using the same model and assuming that the age of the mother is a continuous variable, what is the impact on the probability of working of a marginal change in the age of the mother?
- Answer the previous two questions using probit and logit models.
28 / 28