Econometrics I Lecture 7: Dummy Variables


Mohammad Vesal
Graduate School of Management and Economics, Sharif University of Technology
44716, Fall 1397

Introduction
Dummy variable: $d_i$ is a dummy variable if it takes only the values 0 and 1.
Useful for studying the effect of qualitative variables, events, treatments, etc.
Examples:
- Wage discrimination against women: $female$ is a dummy equal to 1 if the person is female and 0 otherwise.
- Effect of clean drinking water: $clean$ is a dummy equal to 1 if the individual has access to clean water and 0 otherwise.

Outline
- Single dummy as an explanatory variable
- Dummy for multiple categories
- Interaction terms with dummies
- Linear probability model
Reference: Wooldridge (2013), Ch 7.


Single dummy as a regressor
Consider the wage equation
$wage = \beta_0 + \beta_1 female + \beta_2 educ + u$
- $\beta_1$: the difference in expected wage if the person is a woman rather than a man, holding education constant.
- Easy to show (assuming $E[u \mid female, educ] = 0$) that
  $\beta_1 = E[wage \mid female = 1, educ] - E[wage \mid female = 0, educ]$,
  where the level of education is the same across the two expectations.
- Effectively, $\beta_1$ delivers a different intercept for females.
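The "easy to show" step can be written out in two lines. This is a sketch that assumes the zero conditional mean condition $E[u \mid female, educ] = 0$, which the slide does not state explicitly.

```latex
\begin{align*}
E[wage \mid female = 1, educ] &= \beta_0 + \beta_1 + \beta_2\, educ \\
E[wage \mid female = 0, educ] &= \beta_0 + \beta_2\, educ \\
\Rightarrow\quad E[wage \mid female = 1, educ] - E[wage \mid female = 0, educ] &= \beta_1
\end{align*}
```

The education term cancels only because both expectations are evaluated at the same level of education.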

Intuition: shifting intercept (figure not reproduced)

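Interpretation of dummies
Interpretation of dummies is with reference to the omitted group.
In
$wage = \beta_0 + \beta_1 female + \beta_2 educ + u$
we dropped the male category, and $\beta_1$ is the change in wage from being female relative to being male (holding education constant).
We must omit one category or the constant: including the constant together with both $male$ and $female$ gives perfect collinearity, since $male + female = 1$. Dropping the constant instead gives
$wage = \gamma_0 male + \gamma_1 female + \beta_2 educ + u$
$\beta_2$ is the same, but $\gamma_1 = \beta_0 + \beta_1$ and $\gamma_0 = \beta_0$.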
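The equivalence of the two parametrizations can be checked numerically. The following is a minimal sketch on simulated data; the data-generating process, the sample size, and the helper `ols` are assumptions made only for illustration and are not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
female = rng.integers(0, 2, size=n).astype(float)
male = 1.0 - female
educ = rng.integers(8, 19, size=n).astype(float)
# Hypothetical data-generating process, chosen only for illustration.
wage = 1.0 - 2.0 * female + 0.5 * educ + rng.normal(0, 1, size=n)


def ols(X, y):
    """Return the OLS coefficients from regressing y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Constant plus female dummy (male is the omitted group).
b = ols(np.column_stack([np.ones(n), female, educ]), wage)
# No constant, both dummies included.
g = ols(np.column_stack([male, female, educ]), wage)

print("beta_0, beta_1, beta_2  :", b)
print("gamma_0, gamma_1, beta_2:", g)
```

Both regressions span the same column space, so the fitted values are identical and only the labelling of the intercepts changes.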

Difference in means
If we are interested in wage discrimination, we could calculate $\bar{w}_{female} - \bar{w}_{male}$. How is this related to $\hat\beta_1$?
Notice that $\hat\delta_1 = \bar{w}_{female} - \bar{w}_{male}$ in
$wage = \delta_0 + \delta_1 female + u$
When we add more regressors, we are trying to achieve a causal interpretation.
The wage differential between men and women could be due to omitted characteristics that matter for wage:
- Women might on average be more educated.
- Women might on average have less experience.
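That the OLS coefficient on a single dummy equals the raw difference in group means is easy to confirm on simulated data. This is a sketch only; the numbers in the data-generating process are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
female = rng.integers(0, 2, size=n).astype(float)
# Hypothetical wages: no other regressors, so only the group means matter.
wage = 7.0 - 2.5 * female + rng.normal(0, 3, size=n)

# Regress wage on a constant and the female dummy.
X = np.column_stack([np.ones(n), female])
delta, *_ = np.linalg.lstsq(X, wage, rcond=None)

mean_female = wage[female == 1].mean()
mean_male = wage[female == 0].mean()
diff_in_means = mean_female - mean_male

print("delta_1 hat        :", delta[1])
print("difference in means:", diff_in_means)
```

The intercept estimate is the mean wage of the omitted group (men), and the slope is the gap between the two group means.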

Example: wage discrimination
Dep. var.    wage       wage       wage       lwage
             (1)        (2)        (3)        (4)
female      -2.512     -2.273     -1.811     -0.301
            (0.303)    (0.279)    (0.265)    (0.0372)
educ                    0.506      0.572      0.0875
                       (0.0504)   (0.0493)   (0.00694)
exper                              0.0254     0.00463
                                  (0.0116)   (0.00163)
tenure                             0.141      0.0174
                                  (0.0212)   (0.00298)
Constant     7.099      0.623     -1.568      0.501
            (0.210)    (0.673)    (0.725)    (0.102)
N            526        526        526        526
R^2          0.114      0.256      0.359      0.388
Mean y       5.896      5.896      5.896      1.623
Standard errors in parentheses.
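One point the table leaves implicit is how to read the coefficient on female when the dependent variable is lwage. A common reading, sketched here as an aside rather than as something the slide states, is that 100 times the coefficient approximates the percentage wage gap, while the exact gap is $100(e^{\hat\beta} - 1)$, roughly 26 percent for column (4).

```python
import math

# Coefficient on female from column (4), where the dependent variable is lwage.
b_female_log = -0.301

# Quick approximation: 100 * coefficient.
approx_pct = 100 * b_female_log
# Exact percentage difference implied by a log-level model.
exact_pct = 100 * (math.exp(b_female_log) - 1)

print(f"approximate gap: {approx_pct:.1f}%")
print(f"exact gap:       {exact_pct:.1f}%")
```

The approximation and the exact figure diverge more as the coefficient moves away from zero.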


Categorical variables
Often we want to include a categorical variable as a control in a regression.
Example: marital status (=0 single, =1 married, =2 divorced, =3 widowed), city of residence, ...
Including the categorical variable itself in a regression makes no sense, because its numeric codes are arbitrary labels. Instead, introduce a dummy for each category, omitting one as the reference group.
Correct specification:
$wage = \beta_0 + \beta_1 married + \beta_2 divorced + \beta_3 widow + \beta_4 educ + u$
Incorrect specification:
$wage = \beta_0 + \beta_1 marital\_status + \beta_4 educ + u$
Here we impose $\beta_2 = 2\beta_1$ and $\beta_3 = 3\beta_1$.
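The cost of the incorrect specification can be illustrated on simulated data. In this sketch the category effects are hypothetical and deliberately not proportional to the numeric codes; the helper `ssr` and all variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
# Codes: 0 single, 1 married, 2 divorced, 3 widowed.
marital = rng.integers(0, 4, size=n)
educ = rng.integers(8, 19, size=n).astype(float)
# Hypothetical category effects that are NOT proportional to the codes.
effects = np.array([0.0, 2.0, -1.0, 0.5])
wage = 3.0 + effects[marital] + 0.5 * educ + rng.normal(0, 1, size=n)

# One dummy per non-reference category (single is the omitted group).
dummies = (marital[:, None] == np.array([1, 2, 3])).astype(float)
X_dummies = np.column_stack([np.ones(n), dummies, educ])
# Incorrect specification: the numeric code enters as a single regressor.
X_codes = np.column_stack([np.ones(n), marital.astype(float), educ])


def ssr(X, y):
    """Sum of squared OLS residuals from regressing y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


ssr_dummies = ssr(X_dummies, wage)
ssr_codes = ssr(X_codes, wage)
print("SSR with dummies:", ssr_dummies)
print("SSR with codes  :", ssr_codes)
```

Because the code variable is a linear combination of the three dummies, the dummy specification can never fit worse; the gap in the sums of squared residuals reflects the restrictions the code variable imposes.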

Ordinal variables
For ordinal variables only the ordering matters. Example: credit rating (CR = 0, 1, 2, 3, 4).
1. Including CR itself in a regression makes little sense, since it forces every one-step change in the rating to have the same effect.
2. Including a dummy for each value of CR (omitting one) makes a lot of sense.
This is also a more flexible specification; in other words, (1) is a special case of (2)!
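Why (1) is a special case of (2) can be made explicit. In this sketch $CR_j$ denotes a dummy equal to 1 when $CR = j$ (notation assumed here, not taken from the slide), $CR = 0$ is the omitted category, and other regressors are left out for brevity.

```latex
\begin{align*}
\text{(1)}\quad y &= \beta_0 + \beta_1\, CR + u \\
\text{(2)}\quad y &= \beta_0 + \delta_1 CR_1 + \delta_2 CR_2 + \delta_3 CR_3 + \delta_4 CR_4 + u \\
CR &= CR_1 + 2\,CR_2 + 3\,CR_3 + 4\,CR_4 \\
\Rightarrow\quad \text{(1) is (2) with } \delta_j &= j\,\beta_1, \quad j = 1, \dots, 4
\end{align*}
```

Specification (1) therefore imposes three restrictions on (2), namely $\delta_2 = 2\delta_1$, $\delta_3 = 3\delta_1$, and $\delta_4 = 4\delta_1$, which could in principle be tested jointly.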

Fixed effects
Think about a dataset with $n$ individuals in $J$ cities. We want to model returns to education:
$\ln wage = \beta_0 + \beta_1 educ + u$
Cities might have special features that change returns to education. This could lead to omitted variable bias, because those living in cities with high returns to education would achieve higher levels of education!
Is there a way to correct for this?
City fixed effects: include $J - 1$ dummies $d_j$ that show whether the individual lives in city $j$:
$\ln wage = \beta_0 + \beta_1 educ + \sum_{j=1}^{J-1} \delta_j d_j + u$
Sometimes we simplify the notation and write
$\ln wage_{ij} = \beta_0 + \beta_1 educ_{ij} + \delta_j + u_{ij}$

Fixed effects
What does $\ln wage_{ij} = \beta_0 + \beta_1 educ_{ij} + \delta_j + u_{ij}$ do?
- Each city shifts the intercept of the regression.
- It controls for any observed and unobserved city-specific characteristics that matter for wage.
- Estimation of $\beta_1$ relies solely on within-city variation in wage and education.
- Effectively, $\hat\beta_1$ is a weighted average of the estimated returns to education from the $J$ separate city regressions, with more weight on cities that have more within-city variation in education.
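The points above can be checked on simulated data: the coefficient on education from the dummy-variable regression coincides with the estimate obtained after demeaning by city, and with an average of the city-by-city slopes weighted by within-city variation in education. This is a sketch under a hypothetical data-generating process; all names and numbers are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
J, n_per = 5, 200
n = J * n_per
city = np.repeat(np.arange(J), n_per)
city_effect = rng.normal(0, 1, size=J)  # hypothetical delta_j
# Education is correlated with the city effect, so pooled OLS is suspect.
educ = 12 + 2 * city_effect[city] + rng.normal(0, 2, size=n)
lwage = 0.5 + 0.08 * educ + city_effect[city] + rng.normal(0, 0.3, size=n)

# (a) Dummy-variable regression: constant, educ, and J-1 city dummies.
D = (city[:, None] == np.arange(1, J)).astype(float)
X = np.column_stack([np.ones(n), educ, D])
b_dummy = np.linalg.lstsq(X, lwage, rcond=None)[0][1]


def demean(v):
    """Subtract the city mean from each observation."""
    means = np.bincount(city, weights=v) / np.bincount(city)
    return v - means[city]


# (b) Within estimator: regress demeaned lwage on demeaned educ.
x_w, y_w = demean(educ), demean(lwage)
b_within = (x_w @ y_w) / (x_w @ x_w)

# (c) Weighted average of the J city-specific slopes.
sxx = np.array([x_w[city == j] @ x_w[city == j] for j in range(J)])
slopes = np.array([(x_w[city == j] @ y_w[city == j]) / sxx[j] for j in range(J)])
b_weighted = (sxx @ slopes) / sxx.sum()

# Pooled OLS without city dummies, for comparison.
b_pooled = np.polyfit(educ, lwage, 1)[0]
print(b_dummy, b_within, b_weighted, b_pooled)
```

The first three numbers agree up to floating-point error; the pooled slope will generally differ because it also uses between-city variation.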


Interaction terms with dummy variables
Consider
$wage = \beta_0 + \beta_1 female + \beta_2 educ + \beta_3 female \cdot educ + u$
- The single $female$ term shifts the intercept.
- The interaction term shifts the slope.
How do we test whether returns to education are the same across genders?
How do we test whether being female has no effect on the wage?
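Writing the model separately for each group suggests how the two questions translate into hypotheses. This is the standard reading of the interaction model, given here as a sketch under the assumption $E[u \mid female, educ] = 0$.

```latex
\begin{align*}
\text{men } (female = 0):\quad E[wage \mid educ] &= \beta_0 + \beta_2\, educ \\
\text{women } (female = 1):\quad E[wage \mid educ] &= (\beta_0 + \beta_1) + (\beta_2 + \beta_3)\, educ \\
\text{same returns to education:}\quad & H_0: \beta_3 = 0 \\
\text{no effect of being female:}\quad & H_0: \beta_1 = 0,\ \beta_3 = 0
\end{align*}
```

The first hypothesis involves a single coefficient and can be checked with a t test; the second is a joint hypothesis on two coefficients and calls for an F test.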

Graphical illustration (figure not reproduced)

Testing for differences in regression functions across groups
We could use the same interaction idea to test whether coefficients differ across groups. For example,
$wage = \beta_0 + \beta_1 female + \beta_2 educ + \beta_3 female \cdot educ + \beta_4 exper + \beta_5 female \cdot exper + u$
Testing $\beta_1 = \beta_3 = \beta_5 = 0$ is equivalent to testing that the same regression equation applies to men and women.
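One way to carry out this joint test is the usual F statistic built from restricted and unrestricted sums of squared residuals. The sketch below uses simulated data with a hypothetical data-generating process and assumes homoskedastic errors; the helper `ssr` is an illustrative name.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 600
female = rng.integers(0, 2, size=n).astype(float)
educ = rng.integers(8, 19, size=n).astype(float)
exper = rng.integers(0, 31, size=n).astype(float)
# Hypothetical process in which the wage equation differs by gender.
wage = (1.0 - 1.5 * female + 0.6 * educ - 0.1 * female * educ
        + 0.05 * exper + rng.normal(0, 2, size=n))


def ssr(X, y):
    """Sum of squared OLS residuals from regressing y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


ones = np.ones(n)
# Restricted model: beta_1 = beta_3 = beta_5 = 0 (one equation for everyone).
X_r = np.column_stack([ones, educ, exper])
# Unrestricted model: female dummy plus its interactions.
X_u = np.column_stack([ones, female, educ, female * educ, exper, female * exper])

q = X_u.shape[1] - X_r.shape[1]   # number of restrictions
df = n - X_u.shape[1]             # residual degrees of freedom
ssr_r, ssr_u = ssr(X_r, wage), ssr(X_u, wage)
F = ((ssr_r - ssr_u) / q) / (ssr_u / df)
print("F statistic:", F, "with", q, "and", df, "degrees of freedom")
```

The statistic is compared with a critical value from the $F_{q,\,df}$ distribution (about 2.6 at the 5% level for three restrictions and a large sample); a larger value rejects a common regression function.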


Linear probability model (LPM)
Consider
$y = \beta_0 + \beta_1 x_1 + \dots + \beta_K x_K + u$
where $y$ is a dummy variable.
The conditional expectation is now the conditional probability of $y = 1$:
$E(y \mid x) = 1 \cdot \Pr(y = 1 \mid x) + 0 \cdot \Pr(y = 0 \mid x) = \Pr(y = 1 \mid x) = \beta_0 + \beta_1 x_1 + \dots + \beta_K x_K$
Increasing $x_1$ by 1 unit, holding $x_2, \dots, x_K$ fixed, changes the probability of $y = 1$ by $\beta_1$ (that is, by $100\beta_1$ percentage points).
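A small simulation illustrates that OLS on a binary outcome recovers the coefficients of the conditional probability when that probability really is linear in $x$. This is a sketch; the true probability function and the sample size are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000
x = rng.integers(0, 11, size=n).astype(float)
# Hypothetical true probability, linear in x and always inside [0.2, 0.7].
p_true = 0.2 + 0.05 * x
# Draw the binary outcome: y = 1 with probability p_true.
y = (rng.random(n) < p_true).astype(float)

# LPM: ordinary least squares of the dummy y on a constant and x.
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ b  # estimated Pr(y = 1 | x)

print("intercept and slope:", b)
print("share of y = 1:", y.mean(), " mean fitted probability:", fitted.mean())
```

The slope estimate is read as the change in the probability of $y = 1$ per unit of $x$, and the average fitted probability matches the sample share of ones because the regression includes a constant.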

Pros and cons of LPM
Advantages:
- Easy estimation and interpretation.
- Often works well in practice.
Disadvantages:
- Predicted probabilities could be negative or greater than 1!
- The model gives heteroskedastic errors: $Var(y \mid x) = p(x)[1 - p(x)]$, where $p(x) = \Pr(y = 1 \mid x)$.
The logit transformation could solve the issue of unreasonable predicted probabilities:
$\Pr(y = 1 \mid x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}} \in [0, 1]$
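The two disadvantages and the logit remedy can be seen by evaluating both formulas on a grid. The coefficients and the grid below are hypothetical, and the same index is passed through both models purely to compare their shapes; in practice LPM and logit coefficients are on different scales.

```python
import numpy as np


def logistic(z):
    """Logit transformation: e^z / (1 + e^z), written as 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + np.exp(-z))


b0, b1 = 0.5, 0.15            # hypothetical coefficients
x = np.linspace(-10, 10, 9)   # hypothetical grid of regressor values

index = b0 + b1 * x
lpm_prob = index                 # LPM: the probability is the linear index itself
logit_prob = logistic(index)     # logit: the index is mapped into the unit interval
var_y = logit_prob * (1 - logit_prob)  # Var(y | x) = p(x)[1 - p(x)] varies with x

for xi, lp, lg, v in zip(x, lpm_prob, logit_prob, var_y):
    print(f"x={xi:6.2f}  LPM={lp:6.3f}  logit={lg:6.3f}  Var(y|x)={v:6.3f}")
```

The linear index leaves the unit interval at the ends of the grid, while the transformed values stay inside it; the conditional variance changes with $x$, which is the source of the heteroskedasticity.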

Example: Female labor force participation
What are the determinants of being employed?
Husband's income, education, experience, age, number of kids, ...
LPM:
$inlf = \beta_0 + \beta_1 nwifeinc + \beta_2 educ + \beta_3 exper + \beta_4 age + \beta_5 kids + \dots$
Use MROZ.dta to estimate this model.
Units of observation: married women.

Estimation results
Dep. var.: inlf (N = 753)
             (1)        (2)
nwifeinc    -0.0050    -0.0034
            (0.0015)   (0.0014)
educ                    0.0380
                       (0.0074)
exper                   0.0395
                       (0.0057)
expersq                -0.0006
                       (0.0002)
age                    -0.0161
                       (0.0025)
kidslt6                -0.2618
                       (0.0335)
kidsge6                 0.0130
                       (0.0132)
Constant     0.6692     0.5855
            (0.0359)   (0.1542)
R^2          0.0125     0.257
Standard errors in parentheses.

Predicted probabilities: LPM (figure not reproduced)

Predicted probabilities: Logit (figure not reproduced)

Estimation results: heteroskedasticity-adjusted standard errors
Dep. var.: inlf (N = 753)
             Unadjusted            Adjusted
             (1)        (2)        (3)        (4)
nwifeinc    -0.0050    -0.0034    -0.0050    -0.0034
            (0.0015)   (0.0014)   (0.0015)   (0.0015)
educ                    0.0380                0.0380
                       (0.0074)              (0.0073)
exper                   0.0395                0.0395
                       (0.0057)              (0.0058)
expersq                -0.0006               -0.0006
                       (0.0002)              (0.0002)
age                    -0.0161               -0.0161
                       (0.0025)              (0.0024)
kidslt6                -0.2618               -0.2618
                       (0.0335)              (0.0318)
kidsge6                 0.0130                0.0130
                       (0.0132)              (0.0135)
Constant     0.6692     0.5855     0.6692     0.5855
            (0.0359)   (0.1542)   (0.0355)   (0.1523)
R^2          0.0125     0.257      0.0125     0.257
Standard errors in parentheses.
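The adjustment leaves the coefficients unchanged and only replaces the standard errors. The slide does not say which robust formula was used; the sketch below assumes the basic White (HC0) sandwich estimator on simulated data, and many packages additionally apply a finite-sample scaling such as $n/(n-k)$.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 2000
x = rng.uniform(0, 10, size=n)
p = 0.1 + 0.08 * x                      # hypothetical Pr(y = 1 | x), inside [0.1, 0.9]
y = (rng.random(n) < p).astype(float)   # binary outcome

X = np.column_stack([np.ones(n), x])
k = X.shape[1]
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                   # OLS coefficients of the LPM
resid = y - X @ b

# Conventional standard errors: assume a constant error variance.
sigma2 = resid @ resid / (n - k)
se_conv = np.sqrt(np.diag(sigma2 * XtX_inv))

# HC0 robust standard errors: (X'X)^-1 [sum e_i^2 x_i x_i'] (X'X)^-1.
meat = X.T @ (X * resid[:, None] ** 2)
se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print("coefficients:", b)
print("conventional SE:", se_conv)
print("robust (HC0) SE:", se_robust)
```

As in the table, the two sets of standard errors are often close in LPM applications, but only the robust ones remain valid under the heteroskedasticity that a binary dependent variable implies.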

Summary
In this lecture we
- discussed the use of dummy variables as explanatory and dependent variables,
- discussed the interpretation of dummies,
- covered fixed effects and interaction terms,
- introduced the Linear Probability Model.