Interpreting coefficients for transformed variables

- Recall that when both the independent and dependent variables are untransformed, an estimated coefficient represents the change in the dependent variable expected when the independent variable changes by one unit, holding the other variables constant.
- The substantive interpretation of coefficients in such situations is accordingly fairly straightforward.
- Interpreting coefficients when variables have been transformed can be somewhat trickier.
- The most straightforward case involves transforms with logarithms.
- We will deal with that situation first, and talk about how to deal with some of the others later.
Logged variables

- There are two common bases that are used for logarithmic transformations.
- A natural logarithm is in base e. e, you may know, is a mathematical constant. Its first few digits are 2.71828.
- The natural log of x is the y such that e^y = x.
- In STATA, log(x) and ln(x) both return the natural log of x.
- Another common base for the logarithm is 10.
- The log base 10 of x is the y such that 10^y = x.
- In STATA, log10(x) returns the log base 10 of x.
- One property of logarithms is that multiplying x by some constant a adds log(a) to its log.
- Thus if the natural log of a variable increases by 1, that implies that the original variable has been multiplied by e.
- If the log base 10 of a variable increases by 1, the original variable has been multiplied by 10.
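These properties are easy to verify numerically. A quick sketch in Python (the notes themselves use STATA; the value of x here is arbitrary):

```python
import math

x = 50.0

# Multiplying x by a constant a adds log(a) to its log, in either base.
assert math.isclose(math.log(3 * x), math.log(x) + math.log(3))
assert math.isclose(math.log10(3 * x), math.log10(x) + math.log10(3))

# Adding 1 to ln(x) multiplies x by e; adding 1 to log10(x) multiplies x by 10.
print(math.exp(math.log(x) + 1) / x)   # e, about 2.71828
print(10 ** (math.log10(x) + 1) / x)   # about 10
```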
Either the independent or dependent variable is logged

- If the dependent variable is raw, and the independent variable is logged, the estimated coefficient b is the absolute change in the dependent variable expected when the original independent variable is multiplied by e or 10, depending on the base of the transform.
- In this situation, you can work out the expected change in the dependent variable associated with an x percent increase in the independent variable by multiplying the coefficient by log([100+x]/100). Make sure to keep the bases the same.
- To work out the expected change associated with a 10% increase in the independent variable, therefore, multiply the coefficient by log(110/100) = log(1.1).
- ln(1.1) = 0.09531
- log10(1.1) = 0.041393
- If the dependent variable is logged, and the independent variable is not, every unit change in the independent variable is expected to multiply the original dependent variable by e^b or 10^b, depending on the base of the transform, where b is the estimated coefficient.
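To make the recipe concrete, here is a Python sketch with a hypothetical coefficient b = 5 (not from any regression in these notes), showing the calculation for both bases:

```python
import math

b = 5.0  # hypothetical coefficient from a model where x (but not y) is logged

# Expected absolute change in y when the original x increases by 10%:
change_ln = b * math.log(1.1)       # if the model used the natural log of x
change_log10 = b * math.log10(1.1)  # if the model used log base 10 of x

print(round(change_ln, 4))     # 0.4766  (= 5 * 0.09531)
print(round(change_log10, 4))  # 0.207   (= 5 * 0.041393)
```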
When both independent and dependent variables are logged

- If both the independent and dependent variables are logged, multiplying the original independent variable by e or 10 will multiply the original dependent variable by e^b or 10^b, depending on the base.
- In the latter situation, where a proportional change in the independent variable is associated with a proportional change in the dependent variable, the coefficient is referred to as an elasticity.
- To get the proportional change in the dependent variable associated with an x percent increase in the original independent variable, calculate a = log([100+x]/100) and take e^(ab) or 10^(ab), depending on the base.
- The predicted proportional change can be converted to a predicted percentage change by subtracting 1 and multiplying by 100.
- Be careful in all these calculations to keep your bases consistent.
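The elasticity calculation can be sketched the same way. Here b = -0.5 is a hypothetical elasticity from a natural-log log-log model, chosen only for illustration:

```python
import math

b = -0.5  # hypothetical elasticity from a log-log model (natural-log base)

# Proportional change in y for a 10% increase in the original x:
a = math.log(1.1)
prop_change = math.exp(a * b)

# Convert to a percentage change: subtract 1, multiply by 100.
pct_change = (prop_change - 1) * 100
print(round(pct_change, 2))  # about -4.65, i.e. a 4.65% decrease in y
```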
Some examples

- Let's consider the relationship between the percentage urban and per capita GNP:

[Scatterplot: % urban 1995 (World Bank) vs. United Nations per capita GDP]

- This doesn't look too good. Let's try transforming the per capita GNP by logging it:

[Scatterplot: % urban 1995 (World Bank) vs. lpcgdp95]
- That looked pretty good. Now let's quantify the association between percentage urban and the logged per capita income:

. regress urb95 lpcgdp95

      Source |       SS       df       MS               Number of obs =     132
-------------+------------------------------            F(  1,   130) =  158.73
       Model |  38856.2103     1  38856.2103            Prob > F      =  0.0000
    Residual |  31822.7215   130  244.790165            R-squared     =  0.5498
-------------+------------------------------            Adj R-squared =  0.5463
       Total |  70678.9318   131  539.533831            Root MSE      =  15.646

       urb95 |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
    lpcgdp95 |   10.43004   .8278521   12.599   0.000      8.792235    12.06785
       _cons |  -24.42095   6.295892   -3.879   0.000     -36.87662   -11.96528

- The implication of this coefficient is that multiplying per capita income by e, roughly 2.71828, 'increases' the percentage urban by 10.43 percentage points.
- Increasing per capita income by 10% 'increases' the percentage urban by 10.43 * 0.09531 = 0.994 percentage points.
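The arithmetic behind those two statements can be checked in a few lines of Python, using the fitted coefficient from the regression above:

```python
import math

b = 10.43004  # fitted coefficient on lpcgdp95 (natural log of per capita income)

# Multiplying income by e adds 1 to its natural log, so urb95 changes by b itself:
print(round(b, 2))  # 10.43 percentage points

# A 10% income increase adds ln(1.1) to the log, so urb95 changes by:
print(round(b * math.log(1.1), 3))  # 0.994 percentage points
```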
What about the situation where the dependent variable is logged?

- We could just as easily have considered the 'effect' on logged per capita income of increasing urbanization:

[Scatterplot: lpcgdp95 vs. % urban 1995 (World Bank)]

. regress lpcgdp95 urb95

      Source |       SS       df       MS               Number of obs =     132
-------------+------------------------------            F(  1,   130) =  158.73
       Model |  196.362646     1  196.362646            Prob > F      =  0.0000
    Residual |  160.818406   130  1.23706466            R-squared     =  0.5498
-------------+------------------------------            Adj R-squared =  0.5463
       Total |  357.181052   131  2.72657291            Root MSE      =  1.1122

    lpcgdp95 |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
       urb95 |    .052709   .0041836   12.599   0.000      .0444322    .0609857
       _cons |   4.630287   .2420303   19.131   0.000      4.151459    5.109115

- Every one point increase in the percentage urban multiplies per capita income by e^0.052709 = 1.054. In other words, it increases per capita income by 5.4%.
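Again, the multiplier follows directly from the fitted coefficient; a quick Python check:

```python
import math

b = 0.052709  # fitted coefficient on urb95 (dependent variable is logged)

# Each one-point increase in % urban multiplies per capita income by e^b:
multiplier = math.exp(b)
print(round(multiplier, 3))              # 1.054
print(round((multiplier - 1) * 100, 1))  # 5.4 percent increase
```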
Logged independent and dependent variables

- Let's look at infant mortality and per capita income:

[Scatterplot: limr vs. lpcgdp95]

. regress limr lpcgdp95

      Source |       SS       df       MS               Number of obs =     194
-------------+------------------------------            F(  1,   192) =  404.52
       Model |  131.035233     1  131.035233            Prob > F      =  0.0000
    Residual |  62.1945021   192  .323929698            R-squared     =  0.6781
-------------+------------------------------            Adj R-squared =  0.6765
       Total |  193.229735   193  1.00119034            Root MSE      =  .56915

        limr |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
    lpcgdp95 |  -.4984531   .0247831  -20.113   0.000     -.5473352    -.449571
       _cons |   7.088676   .1908519   37.142   0.000       6.71224    7.465111

- Thus multiplying per capita income by 2.718 multiplies the infant mortality rate by e^(-0.4984531) = 0.607.
- A 10% increase in per capita income multiplies the infant mortality rate by e^(-0.4984531 * ln(1.1)) = 0.954.
- In other words, a 10% increase in per capita income reduces the infant mortality rate by 4.6%.
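The elasticity arithmetic for this log-log model, verified in Python with the fitted coefficient:

```python
import math

b = -0.4984531  # fitted elasticity of IMR with respect to per capita income

# Multiplying income by e multiplies IMR by e^b:
print(round(math.exp(b), 3))  # 0.607

# A 10% income increase multiplies IMR by e^(b * ln(1.1)):
factor = math.exp(b * math.log(1.1))
print(round(factor, 3))              # 0.954
print(round((1 - factor) * 100, 1))  # 4.6 percent reduction
```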
What about other transformations?

- The power and root transformations don't lead to such intuitive interpretations.
- The coefficient represents the effect, after all, of a change in the power or root of the original variable.
- One of the best things to do in such situations is to look at predicted values of the dependent variable for a range of values of the independent variable, most likely through a graphical plot of the predicted variable against the untransformed variable.
- Consider the relationship between IMR and the square root of the percentage of houses with running water:

[Scatterplot: IMR vs. water2 (square root of % of houses with running water)]

. regress IMR water2

      Source |       SS       df       MS               Number of obs =      92
-------------+------------------------------            F(  1,    90) =  134.76
       Model |  83700.8284     1  83700.8284            Prob > F      =  0.0000
    Residual |  55899.0412    90  621.100457            R-squared     =  0.5996
-------------+------------------------------            Adj R-squared =  0.5951
       Total |   139599.87    91   1534.0645            Root MSE      =  24.922

         IMR |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
      water2 |  -20.17469   1.737893  -11.609   0.000     -23.62732   -16.72206
       _cons |    217.738   14.52444   14.991   0.000      188.8826    246.5933

- So increasing the square root of the percentage of households with running water by 1 lowers the infant mortality rate by 20 per 1000.
- Let's vary the percentage from 0 to 100, predict values of the IMR, and look at the results:

. replace water95 = _n - 1
(216 real changes made)
. replace water2 = sqrt(water95)
(216 real changes made)
. predict pimr
. graph pimr water95 if water95 <= 100

[Line plot: pimr (from 217.738 down to 15.9911) against water95, 0 to 100 (Water, World Bank)]

- Another approach is to consider derivatives.
- The prediction equation from the above estimation is:
yhat = 217.738 - 20.17 * sqrt(x)

- If we differentiate that with respect to x, we get

d(yhat)/dx = -0.5 * 20.17 * x^(-1/2) = -10.085 * x^(-1/2)

- If we evaluate that at a few locations:

    x (%)    dy/dx
      10     -3.19
      20     -2.26
      30     -1.84
      40     -1.59
      50     -1.43
      60     -1.30
      70     -1.21
      80     -1.13
      90     -1.06

- The effect of an increase in the percentage of houses with running water is much stronger when the percentage is small than when it is large.
- Typically, a root transformation of an independent variable implies 'diminishing returns.'
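The prediction equation and its derivative can be reproduced in a short Python sketch using the fitted coefficients above (STATA's predict/graph steps are replaced here by a simple loop):

```python
import math

b0, b1 = 217.738, -20.17469  # intercept and coefficient on sqrt(water95)

def predict_imr(water_pct):
    """Predicted IMR from the square-root model: yhat = b0 + b1*sqrt(x)."""
    return b0 + b1 * math.sqrt(water_pct)

def marginal_effect(water_pct):
    """d(yhat)/dx = 0.5 * b1 * x^(-1/2)."""
    return 0.5 * b1 / math.sqrt(water_pct)

# Reproduce the table of derivatives evaluated at x = 10, 20, ..., 90:
for x in range(10, 100, 10):
    print(f"{x:3d}  {marginal_effect(x):6.2f}")
```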