Lecture 3: Multivariate Regression


Rates, cont. Two weeks ago, we modeled state homicide rates as being dependent on one variable: poverty. In reality, we know that state homicide rates depend on numerous variables. Our estimation of homicide rates using multiple regression will look something like this:

Y_i = b_0 + b_1 X_i1 + b_2 X_i2 + ... + b_k X_ik + e_i

This allows us to estimate the effect of any one factor while holding all else constant.

Rates, cont. The true model:

Y_i = β_0 + β_1 E_i1 + β_2 E_i2 + ... + β_p E_ip + ε_i = β_0 + Σ_{j=1..p} β_j E_ij + ε_i

Our estimation model:

Y_i = b_0 + b_1 X_i1 + b_2 X_i2 + ... + b_k X_ik + e_i = b_0 + Σ_{j=1..k} b_j X_ij + e_i

Rates, cont. Usually, the independent variables in our estimation model are some subset of the true model. We can rewrite the true model in terms of k observed variables (X) and p-k unobserved variables (R):

Y_i = β_0 + Σ_{j=1..k} β_j X_ij + Σ_{j=k+1..p} β_j R_ij + ε_i

Rates, cont. Re-arranging the true equation:

Y_i - β_0 - Σ_{j=1..k} β_j X_ij = Σ_{j=k+1..p} β_j R_ij + ε_i

Re-arranging the estimation equation:

Y_i - b_0 - Σ_{j=1..k} b_j X_ij = e_i

And substituting:

e_i = (β_0 - b_0) + Σ_{j=1..k} (β_j - b_j) X_ij + Σ_{j=k+1..p} β_j R_ij + ε_i

Rates, cont. This means that the error term in a regression reflects both the random component in the dependent variable and the impact of all excluded variables. Variables besides poverty thought to influence homicide rates: region, high school graduation, incarceration, unemployment, gun ownership, female-headed households, population heterogeneity, income, welfare, law enforcement officers, IQ, smokers, other crime.

Rates, example. Recall, in a bivariate regression, we found the following:

E(homrate_i) = -.973 + .475 poverty_i

Adding the imprisonment rate and the rate of female-headed households to the model yields the following:

E(homrate_i) = -7.34 - .005 poverty_i + .0077 prison_i + .89 femhh_i

Rates, example. In the bivariate regression, imprisonment rates and rates of female-headed households were in the error term, and were assumed to be uncorrelated with poverty rates. This assumption was false. In fact, explicitly controlling for just these two variables reduces the estimate of the effect of poverty on homicide rates from .475 to -.005.
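The collapse of the poverty coefficient is omitted-variable bias at work. A minimal simulation can reproduce the mechanism; the variable names and data-generating numbers below are illustrative assumptions, not the homicide data itself.

```python
import random

random.seed(1)

# Illustrative data-generating process: z is an omitted variable
# correlated with x; the true effect of x on y is 0.5.
n = 5000
x = [random.gauss(0, 1) for _ in range(n)]
z = [0.8 * xi + random.gauss(0, 0.6) for xi in x]  # z correlated with x
y = [0.5 * xi + 1.0 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]

def slope(xs, ys):
    """Bivariate OLS slope: cov(x, y) / var(x)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var = sum((a - mx) ** 2 for a in xs)
    return cov / var

# Regressing y on x alone pushes z into the error term, so the slope
# converges to beta_x + beta_z * cov(x, z)/var(x) = 0.5 + 1.0 * 0.8 = 1.3.
biased = slope(x, y)
```

With z omitted, the estimated slope lands near 1.3 rather than the true 0.5, just as the poverty coefficient had absorbed part of the effects of prison and femhh.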

Rates, example.

E(homrate_i) = -7.34 - .005 poverty_i + .0077 prison_i + .89 femhh_i

It's important to know how to interpret the regression results. -7.34 is the expected homicide rate if poverty rates, imprisonment rates, and female-headed household rates were all zero. This is never the case, so it's not a meaningful estimate. .0077 is the effect of a 1-point increase in the imprisonment rate on the homicide rate, holding poverty and femhh constant. .89 is the effect of a 1-point increase in the female-headed household rate on the homicide rate, holding poverty and prison constant. See Wooldridge pp. 78-9 (partialling out).

Rates, example.

E(homrate_i) = -7.34 - .005 poverty_i + .0077 prison_i + .89 femhh_i

Is the effect of female-headed households 115 times bigger than the effect of the imprisonment rate? prison: mean = 404, s.d. = 141; femhh: mean = 10.2, s.d. = 1.4. Because the standard deviation of prison is 100 times larger than that of femhh, it's not easy to directly compare the two estimates unless we calculate standardized effects: prison: .422, femhh: .499.
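The standardized effects can be checked directly: each coefficient is rescaled by sd(x)/sd(y). The sd of homrate used below (about 2.58) is an assumption backed out from the Total MS of 6.638 in the regression output; the small gap from the slide's .499 comes from the rounded .89 coefficient.

```python
import math

sd_homrate = math.sqrt(6.63846936)  # assumed from the Total MS in the ANOVA table

def standardized(b, sd_x, sd_y):
    """Standardized coefficient: sd(y) change per one-sd change in x."""
    return b * sd_x / sd_y

beta_prison = standardized(0.0077, 141, sd_homrate)  # ~.42, slide reports .422
beta_femhh = standardized(0.89, 1.4, sd_homrate)     # ~.48, slide reports .499
```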

Rates, example.

E(homrate_i) = -7.34 - .005 poverty_i + .0077 prison_i + .89 femhh_i

The fitted value for each state is the expected homicide rate given the poverty, imprisonment, and female-headed household rates. For Arizona:

E(homrate) = -7.34 - .005(15.2) + .0077(529) + .89(10.06) = -7.34 - .076 + 4.07 + 8.95 = 5.60
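The Arizona calculation can be sketched as a dot product of the coefficient vector with the state's values, with a 1 prepended for the intercept:

```python
# Coefficients from the three-regressor model: intercept, poverty, prison, femhh
coefs = [-7.34, -0.005, 0.0077, 0.89]
arizona = [1, 15.2, 529, 10.06]  # leading 1 multiplies the intercept

# Fitted value: sum of coefficient * value pairs
fitted = sum(b * x for b, x in zip(coefs, arizona))  # ~5.6, as on the slide
```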

Rates, example.

E(homrate_i) = -7.34 - .005 poverty_i + .0077 prison_i + .89 femhh_i

The actual homicide rate in Arizona was 7.5, so the residual is 1.9:

u_i = y_i - y^_i = 7.5 - 5.6 = 1.9

That's just one of 50 residuals. The sum of all residuals is zero. The sum of the squares of all residuals is as small as possible. That's how the estimates are chosen.
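Both residual properties on this slide (they sum to zero, and their squared sum is minimized) hold by construction for OLS. A tiny bivariate fit on made-up numbers illustrates both:

```python
# Made-up (x, y) pairs, not the homicide data
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.0]

# Closed-form bivariate OLS: slope = cov(x, y)/var(x), intercept from the means
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b0 = my - b1 * mx

resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

sum_resid = sum(resid)            # zero, up to floating-point error
ssr = sum(u ** 2 for u in resid)  # minimized by the OLS estimates

# Any other line, e.g. one with a nudged slope, gives a larger squared sum
ssr_nudged = sum((y - (b0 + (b1 + 0.1) * x)) ** 2 for x, y in zip(xs, ys))
```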

R². Estimating and interpreting R² remains the same in multivariate regression:

R² = SSE / SST = Σ(y^_i - y_bar)² / Σ(y_i - y_bar)²

As more variables are included in the model, R² will either stay the same or increase. One danger is overfitting, where variables are included in the model that are explaining noise or random error in the dependent variable.

R², example.

. reg hom pov

      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  1,    48) =   21.36
       Model |  100.175656     1  100.175656           Prob > F      =  0.0000
    Residual |  225.109343    48  4.68977798           R-squared     =  0.3080
-------------+------------------------------           Adj R-squared =  0.2935
       Total |  325.284999    49  6.63846936           Root MSE      =  2.1656

------------------------------------------------------------------------------
     homrate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     poverty |    .475025   .1027807     4.62   0.000     .2683706    .6816795
       _cons |  -.9730529   1.279803    -0.76   0.451     -3.54627    1.600164
------------------------------------------------------------------------------
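The summary statistics in this output can be reproduced from the sums of squares alone; here k counts both estimated parameters (intercept and poverty), matching the residual df of 48.

```python
import math

# From the ANOVA block of the bivariate output
sse, ssr, sst = 100.175656, 225.109343, 325.284999
n, k = 50, 2  # 50 states, 2 estimated parameters

r2 = sse / sst                        # R-squared = 0.3080
root_mse = math.sqrt(ssr / (n - k))   # Root MSE  = 2.1656
```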

R², example.

. reg homrate pov IQ gdp leo welfare smokers income het gunowner fem_ unemp prison gradrate pop65

      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F( 14,    35) =    7.57
       Model |  244.511494    14  17.4651067           Prob > F      =  0.0000
    Residual |  80.7735048    35  2.30781442           R-squared     =  0.7517
-------------+------------------------------           Adj R-squared =  0.6524
       Total |  325.284999    49  6.63846936           Root MSE      =  1.5191

------------------------------------------------------------------------------
     homrate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     poverty |  -.1260969   .1570399    -0.80   0.427     -.444905    .1927111
          IQ |  -.2960415   .2012222    -1.47   0.150    -.7045442    .1124613
  gdp_percap |  -.0000843   .0000675    -1.25   0.220    -.0002214    .0000527
         leo |   .0078023   .0062672     1.24   0.221    -.0049209    .0205255
     welfare |   .0043498   .0046595     0.93   0.357    -.0051096    .0138091
     smokers |   .0731863   .0906833     0.81   0.425    -.1109106    .2572833
income_per~p |   .0000533   .0001005     0.53   0.599    -.0001508    .0002574
         het |   1.716118   3.287625     0.52   0.605    -4.958115    8.390351
    gunowner |   .0026661   .0301547     0.09   0.930    -.0585511    .0638834

R², example. Our R² went up to .75! We can explain 75% of the variance in homicide rates, or can we? It could be that our high R² is due to overfitting. Solutions: If you have enough cases, split your sample, build your model on half the cases, and test it once on the remaining cases. If you can't do that, avoid iterative or stepwise modeling, as it produces biased estimates. Assess your model by adjusted R².

Adjusted R².

R² = SSE / SST = 1 - SSR / SST

R²_adj = 1 - [SSR / (N - k)] / [SST / (N - 1)] = 1 - (1 - R²) (N - 1) / (N - k)

Adjusted R² penalizes our estimate of explained variance by the number of parameters used.
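Applied to the two models above, the penalty term is what separates them: with N = 50, the 14-regressor model's adjusted R² drops well below its raw R².

```python
def adj_r2(r2, n, k):
    """Adjusted R-squared; k = number of estimated parameters incl. intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

small = adj_r2(0.3080, 50, 2)   # bivariate model, Stata reports 0.2935
big = adj_r2(0.7517, 50, 15)    # 14 regressors + intercept, Stata reports 0.6524
```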

F-test.

F(k-1, N-k) = [SSE / (k - 1)] / [SSR / (N - k)] = [R² / (k - 1)] / [(1 - R²) / (N - k)]

The formula for the F-statistic remains the same in a multivariate context; we just have to adjust the degrees of freedom depending on how many parameters are in the model. You can use the last expression above to calculate the F-statistic if Stata doesn't provide it and all you have is R².
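The last expression, applied to the two regression outputs above, recovers the F values Stata printed; as with the degrees of freedom in those tables, k here counts all estimated parameters, including the intercept.

```python
def f_from_r2(r2, n, k):
    """F statistic with (k-1, n-k) degrees of freedom, computed from R-squared.
    k counts all estimated parameters, including the intercept."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))

f_small = f_from_r2(0.3080, 50, 2)   # Stata reports F(1, 48)  = 21.36
f_big = f_from_r2(0.7517, 50, 15)    # Stata reports F(14, 35) =  7.57
```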

F-test, cont. The F-test can be thought of as a formal test of the significance of R²:

H_0: β_1 = β_2 = ... = β_k = 0
H_1: ∃ j ∈ [1, k]: β_j ≠ 0

That last line reads: there exists j in the set of values from 1 to k such that β_j does not equal zero. In other words, at least one variable is statistically significant.

Gauss-Markov Assumptions
1) Linear in parameters: Y_i = β_0 + β_1 X_i1 + β_2 X_i2 + ... + β_k X_ik + ε_i
2) Random sampling: we have a random sample from the population that follows the above model.
3) No perfect collinearity: none of the independent variables is a constant, and there is no exact linear relationship among the independent variables.
4) Zero conditional mean: the error has zero expected value for each set of values of the k independent variables: E(ε_i | X_i1, ..., X_ik) = 0.
5) Unbiasedness of OLS: under assumptions 1-4, the expected value of our beta estimates is equal to the population values (the true model).

Next time: Homework: Problems 3.2, 3.4, C3.2, C3.4, C3.6. Read: Wooldridge Chapters 3 (again) & 4.