Lab 6 - Simple Regression

Spring 2017

Contents
1 Thinking About Regression
2 Regression Output
3 Fitted Values
4 Residuals
5 Functional Forms

Updated from Stata tutorials provided by Prof. Cichello

1 Thinking About Regression

Today we will look at the following simple linear regression:

    wage = β0 + β1 educ + u    (1)

which models the relationship between education and earnings in a linear fashion. It is important to keep in mind that the fact that more educated people tend to earn more than less educated people does not necessarily mean that schooling causes earnings to increase. Even without resolving the causality problem, it is clear that education predicts earnings, at least in a narrow statistical sense. This predictive power is summarized by the so-called conditional expectation function (CEF), which is the expectation (or population average) of wage with educ held fixed:

    CEF = E(wage | educ)    (2)

Note that in a dataset you will have multiple observations for your dependent and independent variables, and usually, when looking at the relationship between education and earnings, observations are at the individual level. Indexing each individual with the letter i, we can rewrite equation (1) as:

    wage_i = β0 + β1 educ_i + u_i    (3)

For simplicity we often omit the individual index, but it is always important to know what we are dealing with!

One can prove that:

    wage = E(wage | educ) + u    (4)

where E(u | educ) = 0, which implies that the error term is uncorrelated with any function of the variable educ. In words, equation (4) simply states that our random variable wage can be decomposed into a piece that is explained by educ, i.e. the CEF, and a piece left over, u. Of course, this decomposition applies for general random variables y, x and u. Going back to the linear regression in equation (1), we now understand that it is a particular case of equation (4), where the CEF is assumed to be a linear function. When you run a regression in Stata to estimate β̂0 and β̂1, it is always important to understand what you are doing and what you are trying to estimate!
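The decomposition in equation (4) is easy to see in a small simulation. The sketch below (in Python rather than Stata) draws data from a known linear CEF, E(wage | educ) = b0 + b1*educ, plus a mean-zero error; all numbers (b0, b1, the education range, the error spread) are made up purely for illustration. Because E(u | educ) = 0 by construction, the sample covariance between u and educ should be close to zero.

```python
# Simulate wage = CEF + u with a linear CEF and E(u | educ) = 0.
# Hypothetical parameters, chosen only for illustration.
import random

random.seed(42)
b0, b1 = -0.9, 0.54
n = 10_000
educ = [random.randint(8, 18) for _ in range(n)]
u = [random.gauss(0, 3) for _ in range(n)]             # mean-zero, independent of educ
wage = [b0 + b1 * e + err for e, err in zip(educ, u)]

# The CEF piece plus the leftover reconstructs wage exactly, as in equation (4).
assert all(abs(w - (b0 + b1 * e) - ui) < 1e-12
           for w, e, ui in zip(wage, educ, u))

# Sample covariance of u and educ: close to zero, as E(u | educ) = 0 implies.
mean_u = sum(u) / n
mean_e = sum(educ) / n
cov_ue = sum((ui - mean_u) * (ei - mean_e) for ui, ei in zip(u, educ)) / n
print(round(cov_ue, 3))
```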
If you are interested in understanding more about econometric methods, I suggest you have a look at Mostly Harmless Econometrics: An Empiricist's Companion by Angrist and Pischke, Princeton University Press, 2008.

2 Regression Output

We will use the wage1 dataset; load it using the bcuse command. Since we are estimating the equation by Ordinary Least Squares (OLS) and we are in the simple bivariate case, where there is a single regressor and a constant term, the coefficients in equation (1) will be equal to:

    β1 = Cov(wage, educ) / Var(educ)    (5)

and

    β0 = E(wage) - β1 E(educ)    (6)

What do you expect the sign of the slope coefficient (β1) to be?

Regress wage on educ:

. reg wage educ

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  1,   524) =  103.36
       Model |  1179.73204     1  1179.73204           Prob > F      =  0.0000
    Residual |  5980.68225   524  11.4135158           R-squared     =  0.1648
-------------+------------------------------           Adj R-squared =  0.1632
       Total |  7160.41429   525  13.6388844           Root MSE      =  3.3784

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .5413593    .053248    10.17   0.000     .4367534    .6459651
       _cons |  -.9048516   .6849678    -1.32   0.187    -2.250472    .4407687
------------------------------------------------------------------------------

First let's look at the coefficient table (today we will focus only on the coefficients):

        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .5413593    .053248    10.17   0.000     .4367534    .6459651
       _cons |  -.9048516   .6849678    -1.32   0.187    -2.250472    .4407687

Here wage is the dependent variable, while educ is the independent variable. A constant term (_cons) is added automatically, without the need to specify it.
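Formulas (5) and (6) can be checked by hand. The Python sketch below applies them to a tiny made-up sample (not the wage1 data): the slope is the sample covariance of wage and educ over the sample variance of educ, and the intercept follows from the sample means.

```python
# Compute OLS slope and intercept from formulas (5) and (6) on toy data.
# These seven observations are invented purely for illustration.
educ = [8, 10, 12, 12, 14, 16, 18]
wage = [4.0, 5.5, 6.0, 7.0, 8.5, 9.0, 11.0]

n = len(educ)
mean_e = sum(educ) / n
mean_w = sum(wage) / n
cov_we = sum((w - mean_w) * (e - mean_e) for w, e in zip(wage, educ)) / n
var_e = sum((e - mean_e) ** 2 for e in educ) / n

b1 = cov_we / var_e          # equation (5)
b0 = mean_w - b1 * mean_e    # equation (6)
print(round(b1, 3), round(b0, 3))   # → 0.681 -1.476
```

Any statistical package's OLS routine would return the same two numbers for these data; the formulas above are exactly what `reg wage educ` computes in the bivariate case.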

The estimated beta coefficients are:

    β̂0 = -.90    β̂1 = .54

How would you interpret the estimate of the slope coefficient (β̂1)? β̂1 measures the change in the hourly wage for an additional year of education, holding all other factors fixed. Note that the linearity of (1) implies that a one-unit change in education has the same effect on the hourly wage regardless of the initial value of education.

If you are asked to interpret one of the regression coefficients in a problem set or quiz, always make sure to mention the following:
- the magnitude of the coefficient (put a number!)
- the units of both the dependent variable and the independent variable of interest
- that the change in the dependent variable is ceteris paribus, or holding all other factors fixed
- generally, no causal interpretation

Make a scatter plot of wage and education, including a fitted line. To do so, use the following command (learned in Lab 4):

twoway (scatter wage educ) (lfit wage educ)

Figure 1: Wage and education relationship

The fitted line is the same as the regression line we estimated above: it has an intercept equal to β̂0 = -.90 and slope β̂1 = .54.
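The "same effect regardless of the initial value" point can be seen by plugging a few education levels into the fitted line. Using the rounded estimates from the text, each extra four years of education raises the predicted wage by the same amount:

```python
# Predicted hourly wage from the fitted line wagehat = -0.90 + 0.54 * educ,
# using the rounded point estimates from the regression above.
b0, b1 = -0.90, 0.54
for years in (8, 12, 16):
    print(years, round(b0 + b1 * years, 2))
# prints: 8 3.42 / 12 5.58 / 16 7.74 -- equal steps of 4*0.54 = 2.16
```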

3 Fitted Values

After regression estimation we can construct fitted values (ŵage). For each observation i in the dataset, the fitted values are constructed as:

    ŵage_i = β̂0 + β̂1 educ_i

To obtain the predicted values in Stata we use the command predict newvar after a regression estimation. Type in:

predict wagehat

where wagehat is the name of the new variable that will be created by the predict command (you choose what you want to call the newly created variable). Open the Data Browser to see the new variable. Now make a scatter plot of wagehat and educ, including a fitted line for wage and educ. The command for this is:

graph twoway (scatter wagehat educ) (lfit wage educ)

Figure 2: Relationship between predicted wage and education

By construction, each fitted value ŵage_i is on the OLS regression line!

4 Residuals

The residual for observation i is the difference between the actual value of the dependent variable (wage_i) and its fitted value (ŵage_i):

    û_i = wage_i - ŵage_i

In Stata, to obtain residuals we can again use the predict command, specifying the residuals option. For example, to create a variable resid containing the residuals from the above regression, type:

predict resid, residuals

Note that resid is only a variable name, and you can choose any other name for the residuals. Look at the first 10 observations for wage, wagehat, and resid:

. list wage wagehat resid in 1/10

     +------------------------------+
     |  wage    wagehat      resid  |
     |------------------------------|
  1. |   3.1     5.0501    -1.9501  |
  2. |  3.24   5.591459   -2.35146  |
  3. |     3     5.0501    -2.0501  |
  4. |     6   3.426023   2.573977  |
  5. |   5.3   5.591459  -.2914593  |
     |------------------------------|
  6. |  8.75   7.756896   .9931036  |
  7. | 11.25   8.839615   2.410385  |
  8. |     5   5.591459  -.5914595  |
  9. |   3.6   5.591459  -1.991459  |
 10. | 18.18   8.298256   9.881744  |
     +------------------------------+

Above we can see that the residual takes on both positive and negative values. When the residual (resid) is positive, the fitted value is less than the actual value of wage, i.e. the fitted value underpredicts wage_i. If the residual is negative, then ŵage_i > wage_i. The ideal case for observation i is û_i = 0, but in most cases each residual is NOT equal to zero. Can you tell the sign of the residuals by looking at Figure 1?

Let's look at a scatter plot of the residuals:

scatter resid educ

Figure 3: Scatter plot of residuals

You can combine your observations, the fitted values and the regression line in a single plot with the following command:

twoway (scatter wage wagehat educ) (lfit wage educ)

Figure 4: Wage, fitted values, and the regression line

Another way to obtain residuals is to use the predicted values for wage we obtained above:

gen resid2 = wage - wagehat

Summarize resid and resid2:

. sum resid resid2

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       resid |       526    4.43e-09     3.37517  -5.339615   16.60854
      resid2 |       526    2.79e-08     3.37517  -5.339615   16.60854

What do you think the mean of the residuals would be? What is it? Note that both means are zero up to floating-point rounding error: resid and resid2 are the same residuals, computed in two different ways.

5 Functional Forms

So far we have focused on linear relationships between the dependent and independent variables. For many economic applications linear relationships are not enough, and we may want to introduce nonlinearities into our econometric model. One simple way to do this is by appropriately redefining the dependent and independent variables. When we estimated the equation

    wage = β0 + β1 educ + u    (7)

we implicitly assumed that an additional year of education has the same effect on the hourly wage whether it is the first year of education or the twentieth (recall the interpretation of β1). A more reasonable characterization may be that each year of education increases wage by a constant percentage. A model that gives (approximately) a constant percentage effect is:

    ln(wage) = β0 + β1 educ + u    (8)

How would you interpret β1 in this case? 100 β1 measures the (approximate) percentage change in wage given one additional year of education.

. regress lwage educ

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  1,   524) =  119.58
       Model |  27.5606296     1  27.5606296           Prob > F      =  0.0000
    Residual |  120.769132   524  .230475443           R-squared     =  0.1858
-------------+------------------------------           Adj R-squared =  0.1843
       Total |  148.329762   525   .28253288           Root MSE      =  .48008

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0827444   .0075667    10.94   0.000     .0678796    .0976092
       _cons |   .5837726   .0973358     6.00   0.000     .3925562     .774989
------------------------------------------------------------------------------

In this case, one additional year of education increases wages by approximately 8.27%.

Another important use of the natural log in economics is in obtaining the constant elasticity model. To obtain this, we take the logarithm of both the dependent and the independent variable. For this example, let's use the dataset on CEO salaries and firm sales (which you can access with bcuse ceosal1). The constant elasticity model for CEO salaries and sales is:

    ln(salary) = β0 + β1 ln(sales) + u    (9)

. reg lsalary lsales

      Source |       SS       df       MS              Number of obs =     209
-------------+------------------------------           F(  1,   207) =   55.30
       Model |  14.0661688     1  14.0661688           Prob > F      =  0.0000
    Residual |  52.6559944   207  .254376785           R-squared     =  0.2108
-------------+------------------------------           Adj R-squared =  0.2070
       Total |  66.7221632   208  .320779631           Root MSE      =  .50436

------------------------------------------------------------------------------
     lsalary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lsales |   .2566717   .0345167     7.44   0.000     .1886224    .3247209
       _cons |   4.821997   .2883396    16.72   0.000     4.253538    5.390455
------------------------------------------------------------------------------

Now β1 is the elasticity of salary with respect to sales: a 1% increase in sales increases CEO salary by about 0.257%.
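The "100 β1 percent" reading of log models is a first-order approximation. The sketch below, using the point estimates from the two regressions above, compares it with the exact percentage effect, 100(exp(β1) - 1) for the log-level model; for small coefficients the two are close.

```python
# Approximate vs. exact percentage effects in log models, using the point
# estimates from the lwage and lsalary regressions above.
import math

b1_loglevel = 0.0827444   # ln(wage) on educ
b1_loglog = 0.2566717     # ln(salary) on ln(sales)

approx = 100 * b1_loglevel                   # the "100*beta" reading
exact = 100 * (math.exp(b1_loglevel) - 1)    # exact effect of one more year
print(round(approx, 2), round(exact, 2))     # → 8.27 8.63

# In the log-log model the coefficient is an elasticity: the exact effect of
# a 1% increase in sales on salary is 100*(1.01^beta - 1) percent.
print(round(100 * (1.01 ** b1_loglog - 1), 3))   # → 0.256
```

The gap between 8.27% and 8.63% grows with the size of the coefficient, which is why the approximation caveat matters more for large log-model coefficients.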