ECNS 561 Topics in Multiple Regression Analysis

Similar documents
ECO375 Tutorial 4 Wooldridge: Chapter 6 and 7

Inference in Regression Analysis

Multiple Regression Analysis: Further Issues

Outline. 2. Logarithmic Functional Form and Units of Measurement. Functional Form. I. Functional Form: log II. Units of Measurement

5. Let W follow a normal distribution with mean of μ and the variance of 1. Then, the pdf of W is

Answer Key: Problem Set 6

Multiple Regression Analysis: Inference MULTIPLE REGRESSION ANALYSIS: INFERENCE. Sampling Distributions of OLS Estimators

CHAPTER 4. > 0, where β

CHAPTER 7. + ˆ δ. (1 nopc) + ˆ β1. =.157, so the new intercept is = The coefficient on nopc is.157.

2. (3.5) (iii) Simply drop one of the independent variables, say leisure: GP A = β 0 + β 1 study + β 2 sleep + β 3 work + u.

7. Prediction. Outline: Read Section 6.4. Mean Prediction

Problem Set # 1. Master in Business and Quantitative Methods

Problem 4.1. Problem 4.3

Chapter 9: The Regression Model with Qualitative Information: Binary Variables (Dummies)

Lecture Module 6. Agenda. Professor Spearot. 1 P-values. 2 Comparing Parameters. 3 Predictions. 4 F-Tests

Answer Key: Problem Set 5

Wednesday, October 10 Handout: One-Tailed Tests, Two-Tailed Tests, and Logarithms

In Chapter 2, we learned how to use simple regression analysis to explain a dependent

In Chapter 2, we learned how to use simple regression analysis to explain a dependent

Introductory Econometrics Exercises for tutorials (Fall 2014)

ECON 482 / WH Hong Binary or Dummy Variables 1. Qualitative Information

Regression with Qualitative Information. Part VI. Regression with Qualitative Information

Solutions to Problem Set 5 (Due November 22) Maximum number of points for Problem set 5 is: 220. Problem 7.3

Econometrics Review questions for exam

Economics 113. Simple Regression Assumptions. Simple Regression Derivation. Changing Units of Measurement. Nonlinear effects

Answer Key. 9.1 Scatter Plots and Linear Correlation. Chapter 9 Regression and Correlation. CK-12 Advanced Probability and Statistics Concepts 1

Exercise sheet 3 The Multiple Regression Model

Correlation & Simple Regression

q3_3 MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Making sense of Econometrics: Basics

Wooldridge, Introductory Econometrics, 4th ed. Chapter 6: Multiple regression analysis: Further issues

Eco 391, J. Sandford, spring 2013 April 5, Midterm 3 4/5/2013

AP Final Review II Exploring Data (20% 30%)

ECNS 561 Multiple Regression Analysis

Final Exam - Solutions

Scatterplots. 3.1: Scatterplots & Correlation. Scatterplots. Explanatory & Response Variables. Section 3.1 Scatterplots and Correlation

Lesson 4 Linear Functions and Applications

(quantitative or categorical variables) Numerical descriptions of center, variability, position (quantitative variables)

CHAPTER 1 Functions Graphs, and Models; Linear Functions. Algebra Toolbox Exercises. 1. {1,2,3,4,5,6,7,8} and

H. Diagnostic plots of residuals

Can you tell the relationship between students SAT scores and their college grades?

Solutions to Problem Set 4 (Due November 13) Maximum number of points for Problem set 4 is: 66. Problem C 6.1

More on Specification and Data Issues

Announcements: You can turn in homework until 6pm, slot on wall across from 2202 Bren. Make sure you use the correct slot! (Stats 8, closest to wall)

Tables and Figures. This draft, July 2, 2007

Prob/Stats Questions? /32

HOMEWORK (due Wed, Jan 23): Chapter 3: #42, 48, 74

Section 5: Dummy Variables and Interactions

CHAPTER 4 & 5 Linear Regression with One Regressor. Kazu Matsuda IBEC PHBU 430 Econometrics

Ch 7: Dummy (binary, indicator) variables

Regression #8: Loose Ends

Topics in Applied Econometrics and Development - Spring 2014

Introduction to Linear Regression Analysis

Problem Set 1 ANSWERS

11 Correlation and Regression

Exercise Sheet 4 Instrumental Variables and Two Stage Least Squares Estimation

Sociology 593 Exam 2 March 28, 2002

Statistical Inference. Part IV. Statistical Inference

Unless provided with information to the contrary, assume for each question below that the Classical Linear Model assumptions hold.

Regression Analysis with Cross-Sectional Data

Correlation Coefficient: the quantity, measures the strength and direction of a linear relationship between 2 variables.

x i = 1 yi 2 = 55 with N = 30. Use the above sample information to answer all the following questions. Show explicitly all formulas and calculations.

Lecture 6: Linear Regression

Chapter 8. Linear Regression /71

Sociology 593 Exam 2 Answer Key March 28, 2002

1. Use Scenario 3-1. In this study, the response variable is

Recall, Positive/Negative Association:

9. Linear Regression and Correlation

In the previous chapter, we learned how to use the method of least-squares

Correlation & Regression Chapter 5

Multiple Linear Regression CIVL 7012/8012

Inference in Regression Model

Marketing Research Session 10 Hypothesis Testing with Simple Random samples (Chapter 12)

Linear Regression With Special Variables

Math 1101 Exam 3 Practice Problems

LECTURE NOTES II FUNCTIONAL FORMS OF REGRESSION MODELS 1. THE LOG-LINEAR MODEL

Important note: Transcripts are not substitutes for textbook assignments. 1

Regression Models REVISED TEACHING SUGGESTIONS ALTERNATIVE EXAMPLES

Universidad Carlos III de Madrid Econometría Nonlinear Regression Functions Problem Set 8

Lecture 6: Linear Regression (continued)

Lecture 3: Multiple Regression. Prof. Sharyn O Halloran Sustainable Development U9611 Econometrics II

Multiple Regression Analysis. Part III. Multiple Regression Analysis

Multiple Linear Regression

IF YOU HAVE DATA VALUES:

Lecture 10: Alternatives to OLS with limited dependent variables. PEA vs APE Logit/Probit Poisson

Brief Suggested Solutions

Lecture 19: Inference for SLR & Transformations

Model Specification and Data Problems. Part VIII

appstats8.notebook October 11, 2016

ECONOMETRIC MODEL WITH QUALITATIVE VARIABLES

Economics Introduction to Econometrics - Fall 2007 Final Exam - Answers

Notes 6: Multivariate regression ECO 231W - Undergraduate Econometrics

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)

Hypothesis testing Goodness of fit Multicollinearity Prediction. Applied Statistics. Lecturer: Serena Arima

18.0 Multiple and Nonlinear Regression

Lecture 4: Multivariate Regression, Part 2

Worksheet Topic 1 Order of operations, combining like terms 2 Solving linear equations 3 Finding slope between two points 4 Solving linear equations

ECON3150/4150 Spring 2015

3.2: Least Squares Regressions

CHAPTER 5 FUNCTIONAL FORMS OF REGRESSION MODELS

Transcription:

ECNS 561 Topics in Multiple Regression Analysis

Scaling Data For the simple regression case, we already discussed the effects of changing the units of measurement Nothing different here Coefficients, SEs, CIs, t and F stats change in ways that preserve all measured effects and testing outcomes Consider the textbook example of regression child birth weight in ounces vs. pounds on the number of cigarettes smoked by the mother while pregnant Insert more general proof

Functional Form Logarithmic function form Model Recall the following Dependent Variable Independent Variable Level-level y x Δy = βδx Interpretation of β Level-log y Log(x) Δy=(β/100)%Δx Log-level Log(y) x %Δy = (100β) Δx Log-log Log(y) Log(x) %Δy = β% Δx

Why again do we use logs in applied work? Linear interpretation may not be reasonable Suppose β 1 = -5.00 in a county-level regression on the relationship between crime and education (with an interpretation as follows an increase in the average level of education in the county by one year, leads to five fewer crimes per 1,000 population annually here, we have defined the dependent variable as a rate) This means that the marginal impact of an added year of education is the same if we go from 9 to 10 years or from, say, 13 to 14 years Perhaps a better characterization of how crime changes with education levels is that each year of education decreases crime by a constant percentage. We can be ignorant about the units of measurement of variables appearing in log form because the slope coefficients are invariant to rescalings When y > 0, models using log(y) as the dependent variable often satisfy the classical linear model assumptions more closely than models using the level of y Taking the log of a variable often will narrow its range. Ex. Annual sales, baseball players salaries, population variables. Narrowing the range of the dependent and independent variables can make OLS estimates less sensitive to outliers Not always the case. If y is a ratio b/w 0 and 1, then log(y) can be very large in magnitude, whereas the original variable, y, is bounded b/w 0 and 1.

Models with Quadratics Suppose we are interested in econometrically modeling the age-crime profile Or, the crime time of day profile

In both cases, crime depends on a factor in a non-linear fashion looks more like a quadratic relationship We would write the crime-age regression as Crime i = β 0 + β 1 Age i + β 2 Age i2 + ε i The marginal effect of Age on Crime depends on Age itself Crime Age = β 1 + 2β 2 Age For the relationship between age and crime, what do we expect the signs on β 1 and β 2 to be? We can easily solve for the turning point i.e., where the crimeage profile is at its peak how would we do this? Age = β 1 2 β 2

The relationship need not be parabolic e.g., we could have a U- shaped relationship In this case, β 1 < 0 and β 2 > 0 If β 1 and β 2 share the same sign, there is no turning point for values of the independent variable > 0. For example, if both estimates are positive, the smallest expected value of y is at x = 0, and increases in x always have a positive and increasing effect on y.

Interaction Terms Often, we want to allow for the partial effect of the dependent variable with respect to an explanatory variable to depend on the magnitude of yet another explanatory variable For example, perhaps we want to allow the education and crime relationship to differ by gender Crime i = β 0 + β 1 Educ i + β 2 Male i + β 3 Educ i *Male i + ε i Here, the marginal effect of education on crime would be Crime Educ = β 1 + β 3 Male If β 1 < 0 (as we would probably expect) and β 3 > 0, then this implies that one more year of education has less of a crime-reducing effect for males than it does for females i.e., there is an interaction effect between gender and education. [show example of triple interaction in MDA and crime paper]

Crime i = β 0 + β 1 Educ i + β 2 Male i + β 3 Educ i *Male i + ε i Interpreting the other variables in the model can get tricky when interaction terms have been included. Q. What is the interpretation of β 2? Ans. It is the relationship between gender and crime for individuals with zero years of education. Obviously, this is not a very interesting interpretation So, when evaluating the relationship between gender and crime, we must be sure to do so for interesting values of Educ. Q. What might be some interesting values of Educ?

Some practice problems (from Ch. 6) 1.) The following equation was estimated for firm salaries as a function of sales and return on equity log(salary) = 4.322 +.276log(sales) +.0215roe -.00008roe 2 (.324) (.033) (.0129) (.00026) N = 209, R 2 =.282, standard deviation of roe 25 This equation allows roe to have a diminishing effect on log(salary). Is this generally necessary? Explain why or why not. Ans. No, this is generally not necessary. The t state on roe 2 is only about -.30 i.e., this variable is nowhere near statistically significant In addition, the inclusion of this squared term has only a very minor effect on the slope even when the values of roe are large Approximate slope is.0215 -.00016roe. So, even a one standard deviation increase in roe doesn t matter much for the slope.

3.) The following equation was obtained via OLS N = 32 R 2 =.1484. rdintens = 2.613 +.00030sales -.0000000070sales 2 Note: sales are in millions of dollars (.429) (.00014) (.0000000037) (i) At what point does the marginal effect of sales on rdintens become negative? (ii) Would you keep the quadratic term in the model? (iii) Define salesbil = sales/1000. Rewrite the estimated equation. (iv) For the purpose of reporting the results, which equation do you prefer?

4.) The following model allows the return to education to depend upon the total amount of both parents education, called pareduc: log(wage) = β 0 + β 1 educ + β 2 educ*pareduc + β 3 exper + β 4 tenure + ε (i) Show that, in decimal form, the return to another year of education in this model is Δlog(wage)/Δeduc = β 1 + β 2 pareduc What sign do you expect for β 2? Why?

(ii) Suppose we estimate the following log(wage) = 5.65 +.047educ +.00078educ*pareduc +.019exper N = 722, R 2 =.169 (.13) (.010) (.00021) (.004) +.010tenure (.003) Interpret the coefficient on the interaction term. It might help to choose two specific values for pareduc and to compare the estimated return to educ what might be reasonable values to choose for evaluation?

(iii) When pareduc is added as a separate variable to the equation, we get: log(wage) = 4.94 +.097educ +.033pareduc -.0016educ*pareduc N = 722, R 2 =.174 (.38) (.027) (.017) (.0012) +.020exper +.010tenure (.004) (.003) Does the estimated return to education now depend positively on parent education? Test the null that the return to education does not depend on parent education.

8.) Suppose we want to estimate the effects of alcohol consumption (alcohol) on college grade point average (colgpa). In addition to collecting information on grade point averages and alcohol usage, we also obtain attendance information (say, percentage of lectures attended, called attend). A standardized test score (say, SAT) and high school GPA (hsgpa) are also available. (i) Should we include attend along with alcohol as explanatory variables in a multiple regression model? (ii) Should SAT and hsgpa be included as explanatory variables?