ECON 482 / WH Hong Binary or Dummy Variables 1. Qualitative Information

1. Qualitative Information

Up to now, we have assumed that all variables have a quantitative meaning. In empirical work, however, we often must incorporate qualitative factors into the regression model.

Examples: gender, race, industry, region, rating grade, ...

One way to incorporate qualitative information is to use binary variables, which take the value zero or one. Binary variables are most commonly called dummy variables. They may appear as the dependent variable or as independent variables.

2. A Single Dummy Independent Variable

(Example) Wage equation

$$ wage = \beta_0 + \delta_0\, female + \beta_1\, educ + u $$

female is a dummy variable: female = 1 if the person is a woman; female = 0 if the person is a man.

$\delta_0$: the wage gain or loss if the person is a woman rather than a man (holding other factors fixed).

Holding educ and other factors constant:
If female (female = 1), the mean wage is $\beta_0 + \delta_0 + \beta_1\, educ$.
If male (female = 0), the mean wage is $\beta_0 + \beta_1\, educ$.

$$ \delta_0 = E(wage \mid female = 1, educ) - E(wage \mid female = 0, educ) $$

Thus, $\delta_0$ measures the difference in mean wage between women and men with the same level of education.

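Graphically, this is an intercept shift: the wage-education lines for men and women have the same slope $\beta_1$ and differ only in their intercepts.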

Dummy variable trap

Why not use two dummy variables instead of one when a qualitative variable has two categories? In the wage equation example:

$$ wage = \beta_0 + \gamma_0\, male + \delta_0\, female + \beta_1\, educ + u $$

This model cannot be estimated because of perfect collinearity (male + female = 1 for every observation, which duplicates the intercept). This is called the "dummy variable trap".

When using dummy variables, one category always has to be omitted:

$$ wage = \beta_0 + \delta_0\, female + \beta_1\, educ + u $$ => The base category is men.

$$ wage = \beta_0 + \gamma_0\, male + \beta_1\, educ + u $$ => The base category is women.

Alternatively, one could omit the intercept:

$$ wage = \gamma_0\, male + \delta_0\, female + \beta_1\, educ + u $$

Disadvantages of omitting the intercept:
i. It is more difficult to test for differences between the parameters.
ii. The R-squared formula is only valid when the regression contains an intercept.
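The collinearity behind the dummy variable trap can be checked directly. The following is a minimal sketch in plain Python, using a small made-up sample; the data and the helper name `is_perfectly_collinear` are illustrative assumptions, not part of the lecture. It shows that, with an intercept, the male and female columns add up to the constant column in every row.

```python
# Hypothetical sample of six people; 1 = woman, 0 = man (made-up data).
female = [1, 0, 0, 1, 1, 0]
male = [1 - f for f in female]      # male is fully determined by female
intercept = [1] * len(female)       # the constant regressor

def is_perfectly_collinear(const, d1, d2):
    """Return True if d1 + d2 equals the constant column in every row."""
    return all(a + b == c for a, b, c in zip(d1, d2, const))

# With an intercept and both dummies, one column is an exact linear
# combination of the others, so OLS has no unique solution.
trap = is_perfectly_collinear(intercept, male, female)
print("intercept = male + female in every row:", trap)
```

Dropping either dummy, or the intercept, removes this exact dependence.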

(Example) Wage equation (cont'd)

Estimated equation (standard errors in parentheses, in the same order as the coefficients):

$$ \widehat{wage} = -1.57 - 1.81\, female + 0.572\, educ + 0.025\, exper + 0.141\, tenure $$
(0.72) (0.26) (0.049) (0.012) (0.021)

$n = 526$, $R^2 = 0.364$

The t-statistic shows that the coefficient on female is statistically significant.

Interpretation: holding education, experience, and tenure fixed, women earn $1.81 less per hour than men.

Does that mean that women are discriminated against? Not necessarily. Being female may be correlated with other productivity characteristics that have not been controlled for. => Omitted variable bias
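As a rough check of the interpretation, the sketch below plugs the reported estimates into the fitted equation for a woman and a man with identical characteristics and computes the t-statistic for female. The worker profile (12 years of education, 10 of experience, 5 of tenure) and the function name `predicted_wage` are assumptions made for illustration only.

```python
# Coefficients and the standard error on female, taken from the estimated
# wage equation. The intercept cancels out of the gap computed below.
coef = {"const": -1.57, "female": -1.81, "educ": 0.572,
        "exper": 0.025, "tenure": 0.141}
se_female = 0.26

def predicted_wage(female, educ, exper, tenure):
    """Fitted hourly wage (dollars) from the estimated equation."""
    return (coef["const"] + coef["female"] * female + coef["educ"] * educ
            + coef["exper"] * exper + coef["tenure"] * tenure)

# Hypothetical profile, identical for both people except for gender.
wage_woman = predicted_wage(1, 12, 10, 5)
wage_man = predicted_wage(0, 12, 10, 5)
gap = wage_woman - wage_man             # equals the coefficient on female

t_female = coef["female"] / se_female   # t-statistic for H0: delta_0 = 0
print(f"predicted gap: {gap:.2f}, t-statistic: {t_female:.2f}")
```

Whatever profile is chosen, the gap is the same, because female enters the equation only as an intercept shift.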

(Example) Effect of training grants on hours of training

Estimated equation (standard errors in parentheses, in the same order as the coefficients):

$$ \widehat{frsemp} = 46.67 + 26.25\, grant - 0.98 \log(sales) - 6.07 \log(employ) $$
(43.41) (5.59) (3.54) (3.88)

$n = 105$, $R^2 = 0.237$

frsemp: hours of training per employee; grant: = 1 if the firm received a training grant, = 0 otherwise.

The coefficient on grant is statistically significant, implying that, holding other factors constant, receiving a training grant increases training by 26.25 hours per employee.

This is an example of program evaluation: treatment group (= grant receivers) vs. control group (= no grant).

Is the effect of the treatment on the outcome of interest causal? => Possibly not, because grants are not assigned randomly. Receiving a grant might therefore be correlated with other factors that have not been controlled for. => Omitted variable bias

3. Using Dummy Variables for Multiple Categories

(Example) City credit ratings and municipal bond interest rates

Estimation model 1:

$$ MBR = \beta_0 + \beta_1\, CR + \text{other factors} $$

MBR: municipal bond rate; CR: credit rating of a city, from 0 to 4 (0 = worst, 4 = best).

$\beta_1$ measures the percentage point change in MBR when CR increases by one unit.

Is the difference between four and three the same as the difference between one and zero? Unfortunately, a one-unit increase in CR is hard to interpret, because CR has an ordinal rather than a cardinal meaning. A better way to incorporate this information is therefore to define dummies.

Estimation model 2:

$$ MBR = \beta_0 + \delta_1 CR_1 + \delta_2 CR_2 + \delta_3 CR_3 + \delta_4 CR_4 + \text{other factors} $$

where, for example, $CR_1 = 1$ if $CR = 1$ and zero otherwise, etc.

Now the $\delta_i$, for $i = 1, \dots, 4$, are measured in comparison to the worst rating (= base category).
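The construction of the rating dummies in estimation model 2 can be sketched as follows. This is a minimal illustration in plain Python; the function name `rating_dummies` and the sample ratings are assumptions, not part of the lecture. The worst rating (CR = 0) gets no dummy of its own, which makes it the base category.

```python
def rating_dummies(cr, levels=(1, 2, 3, 4)):
    """Map an ordinal rating to dummies CR1..CR4; rating 0 is the base."""
    return {f"CR{k}": int(cr == k) for k in levels}

# Hypothetical credit ratings for six cities (made-up data).
ratings = [0, 3, 1, 4, 2, 3]
rows = [rating_dummies(cr) for cr in ratings]

for cr, row in zip(ratings, rows):
    # At most one dummy equals 1; a city with CR = 0 has all dummies at 0.
    print(cr, row)
```

Because no row ever has more than one dummy switched on and the base category has none, the dummies can be used together with an intercept without falling into the dummy variable trap.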

4. Interactions Involving Dummy Variables

(1) Interaction involving dummy variables

(Example) Wage equation

Estimation model:

$$ \log(wage) = \beta_0 + \delta_0\, female + \beta_1\, educ + \delta_1\, female \cdot educ + u $$
$$ \phantom{\log(wage)} = (\beta_0 + \delta_0\, female) + (\beta_1 + \delta_1\, female)\, educ + u $$

$\beta_0$ = intercept for men; $\beta_1$ = slope for men
$\beta_0 + \delta_0$ = intercept for women; $\beta_1 + \delta_1$ = slope for women

Interesting hypotheses:

$H_0: \delta_1 = 0$ => The return to education is the same for men and women.

$H_0: \delta_0 = 0, \delta_1 = 0$ => The whole wage equation is the same for men and women.

Graphically

Graph (a): $\delta_0 < 0$ and $\delta_1 < 0$ => Women earn less than men at all levels of education, and the gap increases as educ gets larger.

Graph (b): $\delta_0 < 0$ and $\delta_1 > 0$ => Women earn less than men at low levels of education, but the gap narrows as education increases.
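The two cases can be made concrete with a small numerical sketch. In the interaction model, the difference in mean log wage between women and men at a given education level is the intercept shift plus the slope shift times educ. The coefficient values below are hypothetical, picked only to reproduce the pattern of graph (b); they are not estimates from any data set.

```python
# Hypothetical coefficients (illustration only): d0 < 0 and d1 > 0, as in (b).
b0, d0 = 0.40, -0.30    # intercept for men, intercept shift for women
b1, d1 = 0.08, 0.02     # slope for men, slope shift for women

def intercept(female):
    """Group-specific intercept: b0 for men, b0 + d0 for women."""
    return b0 + d0 * female

def slope(female):
    """Group-specific return to education: b1 for men, b1 + d1 for women."""
    return b1 + d1 * female

def log_wage_gap(educ):
    """Mean log wage of women minus that of men at a given educ."""
    return d0 + d1 * educ

# With d0 < 0 < d1, the gap closes at educ = -d0 / d1.
crossover = -d0 / d1
for educ in (0, 8, 12, 16):
    print(educ, round(log_wage_gap(educ), 3))
print("gap is zero at educ =", round(crossover, 2))
```

Setting d1 to a negative value instead reproduces case (a), where the gap is negative at every education level and widens as educ grows.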

(2) Testing for differences in regression functions across groups

The regression function may differ across groups. To test this, proceed as follows.

Unrestricted model (contains the full set of interactions, which allows a different regression function for each group):

$$ cumgpa = \beta_0 + \delta_0\, female + \beta_1\, sat + \delta_1\, female \cdot sat + \beta_2\, hsperc + \delta_2\, female \cdot hsperc + \beta_3\, tothrs + \delta_3\, female \cdot tothrs + u $$

cumgpa: college grade point average; sat: SAT score; hsperc: high school rank percentile; tothrs: total hours spent in college courses

Restricted model:

$$ cumgpa = \beta_0 + \beta_1\, sat + \beta_2\, hsperc + \beta_3\, tothrs + u $$

Null hypothesis:

$$ H_0: \delta_0 = 0, \delta_1 = 0, \delta_2 = 0, \delta_3 = 0 $$

=> The female dummy and all interaction effects are zero, i.e., the same regression coefficients apply to men and women (no difference in the regression function).

Estimation of the unrestricted model (standard errors in parentheses, in the same order as the coefficients):

$$ \widehat{cumgpa} = 1.48 - 0.353\, female + 0.0011\, sat + 0.00075\, female \cdot sat - 0.0085\, hsperc + 0.00055\, female \cdot hsperc + 0.0023\, tothrs - 0.00012\, female \cdot tothrs $$
(0.21) (0.411) (0.0002) (0.00039) (0.0014) (0.00316) (0.0009) (0.00163)

Tested individually, the hypothesis that the interaction effects are zero cannot be rejected.

Joint test with the F-statistic:

$$ F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n - k - 1)} = \frac{(85.515 - 78.355)/4}{78.355/(366 - 7 - 1)} \approx 8.18 $$

The null is rejected.

Alternative way to compute the F-statistic in the given case:
Run separate regressions for men and for women; the unrestricted SSR is the sum of the SSRs of these two regressions: $SSR_{ur} = SSR_{men} + SSR_{women}$.
Run the regression for the restricted model and store its SSR: $SSR_r$.
If the test is computed in this way, it is called the Chow test.
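The arithmetic of the joint test can be reproduced in a few lines. This is a sketch using only the sums of squared residuals and sample size reported above; the function name `f_statistic` is an assumption for illustration.

```python
def f_statistic(ssr_r, ssr_ur, q, n, k):
    """F-statistic for q restrictions.

    ssr_r  : SSR of the restricted model
    ssr_ur : SSR of the unrestricted model
    n      : number of observations
    k      : number of slope regressors in the unrestricted model
    """
    numerator = (ssr_r - ssr_ur) / q
    denominator = ssr_ur / (n - k - 1)
    return numerator / denominator

# Values reported for the cumgpa example: 4 restrictions, 7 regressors.
F = f_statistic(ssr_r=85.515, ssr_ur=78.355, q=4, n=366, k=7)
print(f"F = {F:.2f}")
```

For the Chow version of the test, the same function could be called with `ssr_ur` set to the sum of the SSRs from the separate regressions for men and women.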

5. A Binary Dependent Variable: The Linear Probability Model

Linear regression when the dependent variable is binary:

$$ y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u \;\Rightarrow\; E(y \mid x) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k $$

When y is binary,

$$ E(y \mid x) = 1 \cdot P(y = 1 \mid x) + 0 \cdot P(y = 0 \mid x) = P(y = 1 \mid x). $$

Therefore,

$$ P(y = 1 \mid x) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k $$

This is the Linear Probability Model (LPM).

Interpretation:

$$ \beta_j = \frac{\partial P(y = 1 \mid x)}{\partial x_j} $$

The coefficient $\beta_j$ measures the effect of the explanatory variable $x_j$ on the probability that y = 1.

(Example) Labor force participation of married women

Estimated equation (standard errors in parentheses, in the same order as the coefficients):

$$ \widehat{inlf} = 0.586 - 0.0034\, nwifeinc + 0.038\, educ + 0.039\, exper - 0.00060\, exper^2 - 0.016\, age - 0.262\, kidslt6 + 0.0130\, kidsage6 $$
(0.154) (0.0014) (0.007) (0.006) (0.00018) (0.002) (0.034) (0.0132)

$n = 763$, $R^2 = 0.264$

inlf: = 1 if in the labor force, = 0 otherwise; nwifeinc: non-wife income (in thousands); kidslt6: number of kids under six years; kidsage6: number of kids between 6 and 18

The coefficient on kidslt6 is statistically significant, indicating that the probability that the woman works falls by 26.2 percentage points when the number of kids under six increases by one. The coefficient on kidsage6 is insignificant.

(Example) Labor force participation of married women (cont'd)

[Graph: predicted probability of labor force participation against educ, for nwifeinc = 50, exper = 5, age = 30, kidslt6 = 1, kidsage6 = 0]

The maximum of educ in the sample is 17. For the given case, this leads to a predicted probability of being in the labor force of about 50%.

Note that the predicted probability is negative at low levels of educ. This is not a problem here, because no woman in the sample has educ < 5.
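The predicted probabilities for this case can be reproduced from the estimated participation equation. The sketch below hard-codes the reported coefficients; the function name `lpm_probability` and the education levels chosen for evaluation are assumptions for illustration.

```python
def lpm_probability(nwifeinc, educ, exper, age, kidslt6, kidsage6):
    """Fitted P(inlf = 1 | x) from the estimated linear probability model."""
    return (0.586 - 0.0034 * nwifeinc + 0.038 * educ + 0.039 * exper
            - 0.00060 * exper ** 2 - 0.016 * age
            - 0.262 * kidslt6 + 0.0130 * kidsage6)

# The case shown in the graph: everything fixed except education.
case = dict(nwifeinc=50, exper=5, age=30, kidslt6=1, kidsage6=0)

for educ in (0, 5, 12, 17):
    p = lpm_probability(educ=educ, **case)
    # The fitted value is linear in educ, so it can leave the [0, 1] range.
    print(f"educ = {educ:2d}: predicted probability = {p:.3f}")
```

For this case the fitted line reduces to about -0.146 + 0.038 educ, which is negative only for education levels below roughly 3.8 years and reaches about 0.50 at educ = 17.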

Disadvantages of the linear probability model

Predicted probabilities may be larger than one or smaller than zero.

Marginal probability effects are sometimes logically impossible. => A probability cannot be linearly related to the independent variables for all their possible values.

$\Delta \widehat{inlf} = -0.262\, \Delta kidslt6$ implies that going from zero to one child under six has the same effect as going from three to four. But we would expect the marginal effect to decrease as the number of kids under six increases.

The linear probability model is necessarily heteroskedastic:

$$ Var(y \mid x) = P(y = 1 \mid x)\,[1 - P(y = 1 \mid x)] $$

Heteroskedasticity-consistent standard errors need to be computed. (We'll see this later.)

Advantages of the linear probability model

Easy estimation and interpretation.

Estimated effects and predictions are often reasonably good in practice.
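The heteroskedasticity of the linear probability model follows from the variance of a binary outcome, which equals p(1 - p) when the success probability is p. The short sketch below evaluates this for a few hypothetical fitted probabilities; the values and the function name `lpm_conditional_variance` are assumptions for illustration.

```python
def lpm_conditional_variance(p):
    """Var(y | x) = p * (1 - p) for binary y with P(y = 1 | x) = p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a valid probability in [0, 1]")
    return p * (1.0 - p)

# Hypothetical fitted probabilities for four different values of x.
probs = [0.1, 0.3, 0.5, 0.9]
variances = [lpm_conditional_variance(p) for p in probs]

for p, v in zip(probs, variances):
    # The error variance changes with p, and therefore with x.
    print(f"P(y=1|x) = {p:.1f}  ->  Var(y|x) = {v:.2f}")
```

Since the probability itself depends on x, the variance cannot be constant across observations unless all slope coefficients are zero.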