DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS
Posc/Uapp 816

MULTIPLE REGRESSION METHODS

I. AGENDA:
   A. Residuals
   B. Transformations
      1. A useful procedure for making transformations
   C. Reading: Agresti and Finlay, Statistical Methods for the Social Sciences, 3rd edition.

II. RESIDUALS:
   A. MINITAB identifies cases that exert leverage on (that is, disproportionately affect) the estimators, as well as very poorly fitting cases (that is, cases with large residuals).
      1. Normally one would examine each of these cases carefully to make sure there are no measurement errors or substantive reasons that should be taken into account.
   B. Partial regression plots: summary
      1. Assume Y and K independent or predictor variables.
      2. A partial plot shows the relationship between Y and one of the predictors after both have been adjusted for the influence of the remaining K - 1 variables.
      3. Method:
         i. Regress Y on the other K - 1 variables and obtain the residuals.
         ii. Regress X_k on the same K - 1 variables and obtain the residuals.
         iii. Plot the first set of residuals against the second set to obtain the partial regression plot.
            1) This plot may indicate the need to transform the data. (See below.)
         iv. Regress the first set of residuals on the second.
            1) The intercept will be 0.
            2) The regression coefficient will equal the partial regression coefficient obtained when Y is regressed on all K variables.

III. TRANSFORMING DATA
   A. Frequently plots will reveal patterns indicating that one or more variables should be transformed in order to meet the assumptions and requirements of regression analysis.
      1. OLS regression assumes that the model has been correctly specified; in particular, Y should be a linear function of the
Xs.
      2. Moreover, variables sometimes need to be transformed to make their observed distributions more symmetrical.
      3. The "raw" or "original" data can sometimes be transformed to new values, Y' and/or X', in a way that creates linear relations and/or symmetry.
      4. One way to find an appropriate transformation is to use the so-called "ladder of powers."
   B. Here's a motivating example:
      1. The next figure shows the relationship between sulfur dioxide and mortality.
      2. The relationship seems slightly curved, right?
   C. Sometimes a variable will be highly skewed.
      1. To see this let's switch to a new data set, one used last semester.
      2. It includes per capita crime rates and percent living in poverty (or classified as poor) for 506 districts in Boston.
         i. The data were drawn from the Data and Story Library at StatLib, located at Carnegie Mellon University.
      3. Here is a stem-and-leaf display of the per capita crime variable.
   (400)  0 | 00000000000000000000000000000000000000000000000000000000000000000+
     106  0 | 5555555555555666666667777777778888888888999999999999
      54  1 | 000011111122233333444444
      30  1 | 555555678889
      18  2 | 0022344
      11  2 | 558
       8  3 |
       8  3 | 78
       6  4 | 1
       5  4 | 5
       4  5 | 1
       3  5 |
       3  6 |
       3  6 | 7
       2  7 | 3
       1  7 |
       1  8 |
       1  8 | 8

      4. It's clear that the data are highly skewed. Most values are below 1.0.
      5. Moreover, it is hard to plot 500-plus data points.
      6. So I took a random sample of 50 cases from the file to use in a preliminary analysis.
      7. Here is the plot of crime versus percent poor.
         i. We can see that a linear model may not be appropriate, partly because Y is so highly skewed and perhaps because the relationship is not linear.
   D. What to do?
      1. We need a systematic way to decide how to transform variables.
      2. First let's consider bivariate relationships.
      3. Basic idea:
         i. Rank the X scores from lowest to highest.
         ii. Divide them into three roughly equal batches (i.e., each batch has about 1/3 of the cases).
            1) If N is evenly divisible by 3, each batch has the same number of cases.
            2) If N divided by 3 has a remainder of 1, put the extra case in the middle batch.
            3) If N divided by 3 has a remainder of 2, put an extra case in each end batch.
         iii. Find the median X in each of the three batches. Call these medians X_L, X_M, and X_H.
         iv. Find the median of the Y's that correspond to the X's in each batch. The median Y may or may not involve the same cases as the X median. In other words,
            1) The X's have been divided into three groups.
            2) Find the Y's that correspond to these X's.
            3) For each of the three batches of Y's find the medians: Y_L, Y_M, and Y_H.
            4) These medians need not be actual data points.
         v. Find the half slopes:
            1) The left or lower half slope is:

               b_L = (Y_M - Y_L)/(X_M - X_L)

            2) and the upper or right half slope is:

               b_R = (Y_H - Y_M)/(X_H - X_M)

      4. The half slopes can be used to check for linearity and to pick an appropriate transformation (if any exists) that will "straighten out" the relationship so that OLS can be applied:
         i. Find the half slopes and sketch them in the scatter plot. If the data are linear, the two half slopes will be roughly equal and their graph
will be a nearly straight line.
         ii. If, on the other hand, the relationship is not linear, then the graphs of the half slopes will form an "arrow" (see below), which you can use to pick a transformation.
         iii. Calculate the half-slope ratio by dividing b_L by b_R: if the relationship is linear, the ratio will be about 1.0; if not, it will be less than or greater than 1.0. If the half-slope ratio is negative, that means that one slope is positive and one is negative, and the ladder of powers will not help.
   E. Using the half slopes. Consider the following.
      1. Suppose data points were dispersed roughly as shown.
      2. There is a relationship between X and Y, but it is not linear.
      3. You can imagine finding the half slopes.
         i. I've sketched them in. They are of course not drawn to scale.
      4. You can also imagine obtaining their ratio, which in this figure is greater than zero.
         i. Both slopes have the same sign, here negative.
      5. The left slope is larger (steeper) than the right slope.
         i. So you can determine that the ratio is greater than 1.0.

Figure 3

      6. You can imagine drawing an arrow using the two half slopes, as I have done.
         i. This arrow points down the Y and X axes.
         ii. That in turn suggests that we transform either Y or X or both by taking powers down the ladder.
            1) See below. For now, going down means taking the square root or logarithm or some other lower power of X and/or Y.
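The three-batch half-slope procedure described above is easy to compute directly. Here is a minimal sketch; the function name and the small made-up data set are mine, not from the notes:

```python
import numpy as np

def half_slopes(x, y):
    """Compute the half slopes from three batches of cases ordered by x."""
    order = np.argsort(x)
    x, y = np.asarray(x, float)[order], np.asarray(y, float)[order]
    n = len(x)
    third, rem = divmod(n, 3)
    # Remainder 1: extra case goes in the middle batch.
    # Remainder 2: one extra case in each end batch.
    if rem == 0:
        sizes = (third, third, third)
    elif rem == 1:
        sizes = (third, third + 1, third)
    else:
        sizes = (third + 1, third, third + 1)
    lo = slice(0, sizes[0])
    mid = slice(sizes[0], sizes[0] + sizes[1])
    hi = slice(sizes[0] + sizes[1], n)
    xL, xM, xH = (np.median(x[s]) for s in (lo, mid, hi))
    yL, yM, yH = (np.median(y[s]) for s in (lo, mid, hi))
    bL = (yM - yL) / (xM - xL)   # left (lower) half slope
    bR = (yH - yM) / (xH - xM)   # right (upper) half slope
    return bL, bR, bL / bR       # ratio near 1.0 suggests linearity

# For y = x^2 the curve bends upward, so the right half slope is
# steeper and the half-slope ratio falls well below 1.0.
x = np.arange(1.0, 16.0)         # 15 cases -> three batches of 5
bL, bR, ratio = half_slopes(x, x**2)
```

With 15 cases the batch medians are X_L = 3, X_M = 8, X_H = 13 and Y_L = 9, Y_M = 64, Y_H = 169, giving b_L = 11 and b_R = 21, so the ratio signals a nonlinear (upward-bending) relationship.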
      7. It's possible that data would be related as indicated in Figure 4:
         i. Now there is a curved positive correlation.
         ii. The left half slope is smaller than the right, although they both have positive signs.
            1) So again the ratio is positive, and a transformation of either X or Y or both might help.
         iii. The arrow formed by sketching the half slopes points up the X axis and down the Y axis.
            1) As we will see, this implies converting X by taking a higher power, such as X^2, and/or lower powers of Y, such as log(Y).

Figure 4

      8. Now look at the next figure. We can analyze it in the same way by drawing half slopes and creating arrows.
Figure 5

         i. The arrow points up the Y axis and down the X axis, so we would reverse the transformations mentioned above.
         ii. We might have to push Y up and/or pull X down.
   F. Each of these figures contains an implied arrow that represents the half slopes. Since there are "bends" in the line (hence the arrows), we can see that the relationships are nonlinear.
      1. The direction that the implied arrow points indicates what transformations of X and/or Y may help make the relationship more nearly linear.
         i. The words "push up" mean take powers of the variable that are greater than 1.0. That is, "push up X" means transform X by squaring or cubing it, or perhaps taking the 2.5 power (that is, X^2.5); trial and error is necessary to see which transformation works best.
   G. The words "pull down" mean take a power that is less than 1.0; for example, one can take the square root (the 1/2 power) or the logarithm (the "0" power) of X or Y or both. Again, trial and error is necessary to find the best fit.
      1. Ladder of powers: when "pushing" or "pulling" a variable, one can use the so-called ladder of powers (named by John Tukey, a statistician at Bell Labs):
                        The Ladder of Powers

         Power (step on ladder)   Transformation              Result
               ...                      ...
                3                 X^3     = cube              pushes X "up"
                2                 X^2     = square
                1                 X       = "raw" score       no change
               1/2                X^(1/2) = square root
                0                 log(X)  (base 10)
              -1/2                reciprocal root
               -1                 -1/X                        pulls X "down"
               ...                      ...

      2. It is possible to take half or even more refined intermediate steps, such as raising X to the 3/4 power (i.e., X^(3/4)).

IV. AN EXAMPLE WITH SIMULATED DATA:
   A. Here is an example using simulated data.
      1. I created a population based on the model:

         Y_i = β_0 + β_1·X_i^2.9 + ε_i

      2. Note that β_0 = 0 and β_1 = 1.0. Y is simply X^2.9 plus an error term. That is, X has been raised to the 2.9 power.
      3. I then sampled 100 cases from this population.
      4. Assume then that I have 100 X-Y pairs and am trying to find the best-fitting model for them.
      5. Normally, I would plot Y against X. In this case the plot is:
Figure 6

      6. Since I am assuming that the "true" model is not known, my first guess is a simple linear equation:

         Y_i = β_0 + β_1·X_i + ε_i

      7. But the plot suggests that there is a non-linear relationship between Y and X.
         i. Indeed, if one imagined half slopes forming the head of an arrow, one would think to transform X by going up the ladder of powers--that is, transforming X by taking, say, X-squared--or by moving down the ladder of powers with Y--that is, using the square root of Y.
      8. But for now I can proceed as if using raw X and Y were satisfactory.
         i. Here are the results from a bare-bones regression analysis.
The regression equation is
SampleY = - 10180 + 998 SampleX

Predictor        Coef     StDev        T      P
Constant       -10180      1107    -9.20  0.000
SampleX        997.68     42.14    23.68  0.000

S = 5840     R-Sq = 85.1%

Analysis of Variance

Source           DF           SS           MS       F      P
Regression        1  19120576375  19120576375  560.60  0.000
Residual Error   98   3342509414     34107239
Total            99  22463085789

         ii. The sample data seem to fit the linear model quite well. Look at R^2 and s.
         iii. The estimated coefficient relating Y to X is 997.7, which we know is incorrect.
            1) Also, the constant is -10180, which we know is wrong, since we created the population to have β_0 = 0.
         iv. Still, the data provide a good fit.
         v. But if we use the half-slope ratios, or approximations of them, we can possibly improve the fit.
            1) The imaginary arrow suggests going up the ladder in X (or down in Y, but let's try X first), so we can create a variable, X*, which is simply X* = X^2.
            2) The plot of it against Y follows.
Figure 7

         vi. The points seem to lie on a straight line, so we use regression procedures to obtain:

The regression equation is
SampleY = - 2624 + 21.3 SampleX2

Predictor        Coef     StDev        T      P
Constant      -2623.8     325.7    -8.06  0.000
SampleX2      21.3132    0.3325    64.11  0.000

S = 2311     R-Sq = 97.7%     R-Sq(adj) = 97.6%

Analysis of Variance

Source           DF           SS           MS        F      P
Regression        1  21939895085  21939895085  4109.61  0.000
Residual Error   98    523190705      5338681
Total            99  22463085789
         vii. Although the R^2 has become nearly perfect, we know--because we created the population model--that the estimated coefficients are off.
            1) Of course, they are closer to the population values of β_0 = 0 and β_1 = 1.0.
            2) Were we to transform X still again, by taking, say, X^3, we would find the coefficients closer to the true values.
               a) Actually, the figure above hints at a slightly curved relationship.
            3) Also, don't forget that these data constitute a relatively small sample from the population in which Y = X^2.9 + error.
               a) So our transformation is not too bad.

V. NOTES ARE CONTINUED ON NEXT PAGES:
   A. The file is too large to fit on a single disk, so I split it into two parts.
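The simulated-data example in section IV can be reproduced in outline. This is only a sketch: the random seed, the range of X, and the error variance used for the class data are unknown, so the sample coefficients and R-squared values will differ from the printouts above, but the qualitative pattern (moving up the ladder in X improves the fit) should hold:

```python
import numpy as np

rng = np.random.default_rng(816)  # arbitrary seed, not the one used in class

# Population model from the notes: Y = b0 + b1 * X^2.9 + error, b0 = 0, b1 = 1.
# The uniform range and error scale below are guesses for illustration.
n = 100
x = rng.uniform(1, 50, size=n)
y = x**2.9 + rng.normal(scale=5000.0, size=n)

def ols_r2(pred, y):
    """Fit y = b0 + b1*pred by ordinary least squares; return (b0, b1, R^2)."""
    X = np.column_stack([np.ones_like(pred), pred])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return beta[0], beta[1], r2

# Raw X fits decently; X^2 (one step up the ladder) fits better, and X^3
# better still, since the true exponent 2.9 is close to 3.
_, _, r2_raw = ols_r2(x, y)
_, _, r2_sq = ols_r2(x**2, y)
_, _, r2_cube = ols_r2(x**3, y)
```

As in the notes, a high R-squared for the raw regression does not mean the specification is right; comparing the fits across rungs of the ladder is what reveals the better transformation.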