Interpreting coefficients for transformed variables


Interpreting coefficients for transformed variables

! Recall that when both the independent and dependent variables are untransformed, an estimated coefficient represents the change in the dependent variable expected when the independent variable changes by one unit, holding the other variables constant.
! The substantive interpretation of coefficients in such situations is accordingly fairly straightforward.
! Interpreting coefficients when variables have been transformed can be somewhat trickier.
! The most straightforward case involves transforms with logarithms.
! We will deal with that situation first, and talk about how to deal with some of the others later.

Logged variables

! There are two common bases used for logarithmic transformations.
! A natural logarithm is in base e. e, you may know, is a mathematical constant; its first few digits are 2.71828.
! The natural log of x is the y such that e^y = x.
! In Stata, log(x) and ln(x) both return the natural log of x.
! Another common base for the logarithm is 10.
! The log base 10 of x is the y such that 10^y = x.
! In Stata, log10(x) returns the base-10 log of x.
! One property of logarithms is that multiplying x by some constant a adds log a to its log.
! Thus if the natural log of a variable increases by 1, that implies that the original variable has been multiplied by e.
! If the log base 10 of a variable increases by 1, the original variable has been multiplied by 10.
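These properties are easy to check numerically. A minimal Python sketch of the same arithmetic (the values of x and a are arbitrary, chosen just for illustration):

```python
import math

x, a = 50.0, 3.0

# Multiplying x by a constant a adds log(a) to its log:
lhs = math.log(a * x)
rhs = math.log(x) + math.log(a)

# A one-unit increase in ln(x) means the original x was multiplied by e;
# a one-unit increase in log10(x) means it was multiplied by 10.
x_after_ln_step = math.exp(math.log(x) + 1)      # = x * e
x_after_log10_step = 10 ** (math.log10(x) + 1)   # = x * 10
```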

Either the independent or the dependent variable is logged

! If the dependent variable is raw and the independent variable is logged, the estimated coefficient b is the absolute change in the dependent variable expected when the original independent variable is multiplied by e or by 10, depending on the base of the transform.
! In this situation, you can work out the expected change in the dependent variable associated with an x percent increase in the independent variable by multiplying the coefficient by log([100+x]/100). Make sure to keep the bases the same.
! To work out the expected change associated with a 10% increase in the independent variable, therefore, multiply by log(110/100) = log(1.1).
! ln(1.1) = 0.09531
! log10(1.1) = 0.041393
! If the dependent variable is logged and the independent variable is not, every unit change in the independent variable is expected to multiply the original dependent variable by e^b or 10^b, depending on the base of the transform. b is the estimated coefficient.
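A quick numeric sketch of these two cases, in Python. The coefficient b = 0.5 here is made up for illustration, not taken from any regression in this handout, and natural logs are assumed throughout:

```python
import math

b = 0.5  # hypothetical estimated coefficient

# Raw y, ln-transformed x: expected change in y for a 10% increase in x
dy_for_10pct_x = b * math.log(1.1)       # b * 0.09531 ≈ 0.0477 units of y

# ln-transformed y, raw x: a one-unit change in x multiplies y by e^b
y_multiplier_per_unit_x = math.exp(b)    # ≈ 1.649, i.e. about +65%
```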

When both independent and dependent variables are logged

! If both the independent and dependent variables are logged, multiplying the original independent variable by e or by 10 will multiply the original dependent variable by e^b or 10^b, depending on the base.
! In this situation, where a proportional change in the independent variable is associated with a proportional change in the dependent variable, the coefficient is referred to as an elasticity.
! To get the proportional change in the dependent variable associated with an x percent increase in the independent variable, calculate a = log([100+x]/100) and take e^(ab) or 10^(ab), depending on the base.
! The predicted proportional change can be converted to a predicted percentage change by subtracting 1 and multiplying by 100.
! Be careful in all these calculations to keep your bases consistent.
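The elasticity arithmetic can be sketched the same way. The elasticity b = -0.5 is again a made-up value for illustration, with both variables in natural logs:

```python
import math

b = -0.5               # hypothetical elasticity
pct_increase_in_x = 10

# a = log((100 + x)/100); the dependent variable is multiplied by e^(a*b)
a = math.log((100 + pct_increase_in_x) / 100)
y_proportional_change = math.exp(a * b)      # equivalently 1.1 ** b

# Convert the multiplier to a percentage change
y_pct_change = (y_proportional_change - 1) * 100   # ≈ -4.65%
```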

Some examples

! Let's consider the relationship between the percentage urban and per capita GNP:

[scatterplot: % urban 95 (World Bank) against United Nations per capita GDP]

! This doesn't look too good. Let's try transforming the per capita GNP by logging it:

[scatterplot: % urban 95 (World Bank) against lpcgdp95]

! That looked pretty good. Now let's quantify the association between percentage urban and the logged per capita income:

. regress urb95 lpcgdp95

      Source |       SS       df       MS              Number of obs =     132
-------------+------------------------------          F(  1,   130) =  158.73
       Model |  38856.2103     1  38856.2103          Prob > F      =  0.0000
    Residual |  31822.7215   130  244.790165          R-squared     =  0.5498
-------------+------------------------------          Adj R-squared =  0.5463
       Total |  70678.9318   131  539.533831          Root MSE      =  15.646

       urb95 |      Coef.   Std. Err.       t     P>|t|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
    lpcgdp95 |   10.43004   .8278521     12.599   0.000     8.792235   12.06785
       _cons |  -24.42095   6.295892     -3.879   0.000    -36.87662  -11.96528

! The implication of this coefficient is that multiplying per capita income by e, roughly 2.71828, 'increases' the percentage urban by 10.43 percentage points.
! Increasing per capita income by 10% 'increases' the percentage urban by 10.43 * 0.09531 = 0.994 percentage points.

What about the situation where the dependent variable is logged?

! We could just as easily have considered the 'effect' on logged per capita income of increasing urbanization:

[scatterplot: lpcgdp95 against % urban 95 (World Bank)]

. regress lpcgdp95 urb95

      Source |       SS       df       MS              Number of obs =     132
-------------+------------------------------          F(  1,   130) =  158.73
       Model |  196.362646     1  196.362646          Prob > F      =  0.0000
    Residual |  160.818406   130  1.23706466          R-squared     =  0.5498
-------------+------------------------------          Adj R-squared =  0.5463
       Total |  357.181052   131  2.72657291          Root MSE      =  1.1122

    lpcgdp95 |      Coef.   Std. Err.       t     P>|t|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
       urb95 |    .052709   .0041836     12.599   0.000     .0444322   .0609857
       _cons |   4.630287   .2420303     19.131   0.000     4.151459   5.109115

! Every one-point increase in the percentage urban multiplies per capita income by e^0.052709 = 1.054. In other words, it increases per capita income by 5.4%.

Logged independent and dependent variables

! Let's look at infant mortality and per capita income:

[scatterplot: limr against lpcgdp95]

. regress limr lpcgdp95

      Source |       SS       df       MS              Number of obs =     194
-------------+------------------------------          F(  1,   192) =  404.52
       Model |  131.035233     1  131.035233          Prob > F      =  0.0000
    Residual |  62.1945021   192  .323929698          R-squared     =  0.6781
-------------+------------------------------          Adj R-squared =  0.6765
       Total |  193.229735   193  1.00119034          Root MSE      =  .56915

        limr |      Coef.   Std. Err.       t     P>|t|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
    lpcgdp95 |  -.4984531   .0247831    -20.113   0.000    -.5473352   -.449571
       _cons |   7.088676   .1908519     37.142   0.000      6.71224   7.465111

! Thus multiplying per capita income by 2.718 multiplies the infant mortality rate by e^(-0.4984531) = 0.607.
! A 10% increase in per capita income multiplies the infant mortality rate by e^(-0.4984531 * ln(1.1)) = 0.954.
! In other words, a 10% increase in per capita income reduces the infant mortality rate by 4.6%.
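The three worked examples can be checked with a few lines of Python, plugging in the estimated coefficients from the regressions above (this is just the interpretation arithmetic, not a Stata feature):

```python
import math

# Linear-log: urb95 on lpcgdp95
b1 = 10.43004
urb_change_10pct = b1 * math.log(1.1)            # ≈ 0.994 percentage points

# Log-linear: lpcgdp95 on urb95
b2 = 0.052709
income_multiplier = math.exp(b2)                 # ≈ 1.054 per point of urb95

# Log-log: limr on lpcgdp95
b3 = -0.4984531
imr_mult_e_fold = math.exp(b3)                   # ≈ 0.607
imr_mult_10pct = math.exp(b3 * math.log(1.1))    # ≈ 0.954, i.e. about -4.6%
```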

What about other transformations?

! The power and root transformations don't lead to such intuitive interpretations.
! The coefficient represents the effect, after all, of a change in the power or root of the original variable.
! One of the best things to do in such situations is to look at predicted values of the dependent variable for a range of values of the independent variable, most likely through a graphical plot of the predicted values against the untransformed variable.
! Consider the relationship between the IMR and the square root of the percentage of houses with running water:

[scatterplot: IMR against water2, the square root of the percentage with running water]

. regress IMR water2

      Source |       SS       df       MS              Number of obs =      92
-------------+------------------------------          F(  1,    90) =  134.76
       Model |  83700.8284     1  83700.8284          Prob > F      =  0.0000
    Residual |  55899.0412    90  621.100457          R-squared     =  0.5996
-------------+------------------------------          Adj R-squared =  0.5951
       Total |   139599.87    91   1534.0645          Root MSE      =  24.922

         IMR |      Coef.   Std. Err.       t     P>|t|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
      water2 |  -20.17469   1.737893    -11.609   0.000    -23.62732  -16.72206
       _cons |    217.738   14.52444     14.991   0.000     188.8826   246.5933

! So increasing the square root of the percentage of households with running water by 1 lowers the infant mortality rate by 20 per 1000.
! Let's vary the percentage from 0 to 100, predict values of the IMR, and look at the results:

. replace water95 = _n - 1
(216 real changes made)

. replace water2 = sqrt(water95)
(216 real changes made)

. predict pimr

. graph pimr water95 if water95 <= 100

[plot: predicted IMR (pimr), from 217.738 down to 15.9911, against water95 from 0 to 100 (World Bank)]

! Another approach is to consider derivatives.
! The prediction equation from the above estimation is:

    ŷ = 217.738 - 20.17·sqrt(x)

! If we differentiate that with respect to x, we get

    dŷ/dx = -0.5 · 20.17 · x^(-1/2) = -10.085 · x^(-1/2)

! If we evaluate that at a few locations:

    x (%)    dy/dx
      10     -3.19
      20     -2.26
      30     -1.84
      40     -1.59
      50     -1.43
      60     -1.30
      70     -1.21
      80     -1.13
      90     -1.06

! The effect of an increase in the percentage of houses with running water is much stronger when the percentage is small than when it is large.
! Typically, a root transformation of an independent variable implies 'diminishing returns.'
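The derivative table can be reproduced directly. A short Python sketch using the fitted values from the regression above (rounded to the handout's -10.085):

```python
# Prediction equation: y-hat = 217.738 - 20.17 * sqrt(x)
# Its derivative:      dy/dx = -10.085 * x**(-1/2)
slopes = {x: round(-10.085 * x ** -0.5, 2) for x in range(10, 100, 10)}
# The slope shrinks in magnitude as x grows: diminishing returns.
```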