Scatterplot for Cedar Tree Example


1 Transformations: I Forbes happily had a physical argument to justify the linear model E(log(pressure) | boiling point) = β₀ + β₁ × boiling point; the usual (not so happy) situation: the parametric model behind MLR is a convenient approximation at best ("All models are wrong but some are useful," Box, 1979). Transformations allow MLR models to be extended to data that, in their original state, are poorly approximated by linear models. In the SLR case, the idea is to get a transformation Ỹ of the response Y and/or a transformation X̃ of the regressor X such that E(Ỹ | X̃ = x) = β₀ + β₁x. ALR 185 VIII 1

2 Transformations: II While transformations are a favorite tool of statisticians, their use is not without controversy (which arises in the physical sciences). Picking suitable transformations is part science and part art. Focusing on SLR to start with, the need for a transformation is manifested in a nonlinear appearance of the scatterplot. Let's look at two examples from Weisberg: height of cedar trees (response) versus diameter at 4.5 feet above the ground (regressor), see Figure 8.3 (p. 190); surface tension of liquid copper (response) versus dissolved sulfur (regressor), see Problem 8.1 (pp. 199–200). ALR 189, 190, 199, 200 VIII 2

3 [Figure: Scatterplot for Cedar Tree Example; Height (dm) versus Dbh (mm)] ALR 189, 190 VIII 3

4 [Figure: Scatterplot for Liquid Copper Example; Tension (dynes/cm) versus Sulfur (% of weight)] ALR 199, 200 VIII 4

5 Transforming One Regressor: I Assuming the regressor is positive (true for both examples), a widely used family of transformations is the scaled power transformations:

ψ_S(X, λ) = (X^λ − 1)/λ for λ ≠ 0, and ψ_S(X, λ) = log(X) for λ = 0.

The rationale for the "−1" and the division by λ when λ ≠ 0 is in part tied to the definition for the λ = 0 choice: can show that lim_{λ→0} (X^λ − 1)/λ = log(X) by making use of X^λ − 1 = e^{λ log(X)} − 1 along with e^z = 1 + z + z²/2! + z³/3! + ⋯. ALR 189 VIII 5
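The scaled power family is simple to code up directly. A minimal Python sketch (purely illustrative; the lecture's own computations are done in R):

```python
import math

def psi_s(x, lam):
    """Scaled power transformation psi_S(x, lambda):
    (x**lam - 1)/lam for lam != 0, log(x) for lam == 0; requires x > 0."""
    if x <= 0:
        raise ValueError("scaled power transform requires x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam
```

For lam very close to 0, the ratio (x**lam - 1)/lam is numerically close to log(x), which is exactly the limit argument on the slide.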

6 Transforming One Regressor: II For λ = 0, E(Y | X = x) = β₀ + β₁ log(x), whereas, for λ ≠ 0, E(Y | X = x) = β₀ + β₁ (x^λ − 1)/λ = γ₀ + γ₁ x^λ, where γ₀ = β₀ − β₁/λ and γ₁ = β₁/λ. In practice the unscaled power transformation achieves the same effect:

ψ(X, λ) = X^λ for λ ≠ 0, and ψ(X, λ) = log(X) for λ = 0.

Note: since ψ_S(X, λ) = (ψ(X, λ) − 1)/λ, for λ < 0 the scaled version is a decreasing function of the unscaled version, so the sign of the fitted slope flips between the two; λ = 1 is essentially no transformation. ALR 189, 186 VIII 6

7 Transforming One Regressor: III The idea is to find λ such that we approximately have E(Y | X = x) = β₀ + β₁ ψ_S(x, λ). Given data (xᵢ, yᵢ), consider the residual sum of squares function:

RSS(b₀, b₁, λ) = Σᵢ₌₁ⁿ [yᵢ − (b₀ + b₁ ψ_S(xᵢ, λ))]².

For fixed λ, the minimizer of the above is given by the OLS estimators β̂₀ and β̂₁ from the regression of yᵢ on ψ_S(xᵢ, λ), resulting in RSS(λ) = RSS(β̂₀, β̂₁, λ). ALR 189 VIII 7

8 Transforming One Regressor: IV Idea: select λ̂ such that RSS(λ̂) is minimized over λ. In theory, need a nonlinear optimizer to find the best λ, but, since we really don't need to know λ precisely, can often make do with a restricted grid search using, e.g., λ ∈ {−2, −1, −1/2, −1/3, 0, 1/3, 1/2, 1, 2}. Returning to the cedar tree example, the following scatterplots show Height versus ψ_S(Dbh, λ) along with the fitted regression line for the above 9 choices of λ. ALR 189 VIII 8
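The restricted grid search can be sketched as follows. The simple-regression fit is coded by hand and the grid is the one from the slide; the data in the check below are synthetic, not the cedar tree data:

```python
import math

def psi_s(x, lam):
    """Scaled power transform: (x^lam - 1)/lam, or log(x) at lam = 0."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam

def rss_slr(xs, ys):
    """Residual sum of squares from the simple OLS regression of ys on xs."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
         sum((x - xbar) ** 2 for x in xs)
    b0 = ybar - b1 * xbar
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))

GRID = (-2, -1, -1/2, -1/3, 0, 1/3, 1/2, 1, 2)

def best_lambda(xs, ys, grid=GRID):
    """Pick the grid value of lambda minimizing RSS(lambda)."""
    return min(grid, key=lambda lam: rss_slr([psi_s(x, lam) for x in xs], ys))
```

On data generated with an exactly logarithmic mean function, the grid search recovers λ = 0, since RSS(0) is (numerically) zero there.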

9 [Figure: scatterplots of Height (dm) versus x̃ = ψ_S(Dbh, λ) with fitted lines, one panel per grid choice of λ] ALR 191 VIII 9

10 [Figure: plot of RSS(λ) versus λ] VIII 10

11 Transforming One Regressor: V Alternative way of displaying the transform: regress yᵢ = Heightᵢ on x̃ᵢ = ψ(Dbhᵢ, λ), i = 1, ..., n, where x̃ᵢ is the transformed value of xᵢ = Dbhᵢ; find the min & max values Dbh_min & Dbh_max of the Dbhᵢ; form a dense grid of values xⱼ*, j = 1, ..., m, ranging from Dbh_min to Dbh_max; compute the predicted values ŷⱼ* corresponding to ψ(xⱼ*, λ); plot ŷⱼ* versus xⱼ* on the original scatterplot of yᵢ versus xᵢ. ALR 189, 190 VIII 11
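The display procedure above can be sketched in a few lines; this hand-rolled version (function name and grid size are my own, not from the lecture) returns the dense grid and the predictions so they can be overlaid on the raw scatterplot:

```python
import math

def fitted_curve(xs, ys, lam, m=200):
    """Fit ys on psi(xs, lam) (unscaled power transform), then return
    (grid, preds): a dense grid on the original x scale and the fitted
    values over it, for plotting on the original scatterplot."""
    def psi(x):
        return math.log(x) if lam == 0 else x ** lam
    xt = [psi(x) for x in xs]
    n = len(xs)
    xbar, ybar = sum(xt) / n, sum(ys) / n
    b1 = sum((u - xbar) * (y - ybar) for u, y in zip(xt, ys)) / \
         sum((u - xbar) ** 2 for u in xt)
    b0 = ybar - b1 * xbar
    lo, hi = min(xs), max(xs)
    grid = [lo + (hi - lo) * j / (m - 1) for j in range(m)]
    preds = [b0 + b1 * psi(g) for g in grid]
    return grid, preds
```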

12 [Figure: scatterplot of Height (dm) versus Dbh (mm) with fitted curve for λ̂] ALR 190 VIII 12

13 [Figure: scatterplot of Height (dm) versus Dbh (mm) with fitted curves for λ = 1, 0, …] ALR 190 VIII 13

14 [Figure: plot of Height versus Dbh created by the R function invTranPlot, with λ̂ indicated] ALR 190 VIII 14

15 [Figure: scatterplot for liquid copper example, Tension versus Sulfur, with fitted curve for λ̂] ALR 199, 200 VIII 15

16 Transforming Response: I Assuming the regressor X has been suitably transformed into X̃, now consider transforming the response Y. Will consider two methods, the first of which is based on the model E(Ŷ | Y = y) = γ₀ + γ₁ ψ_S(y, λ), where Ŷ is the fitted value from the regression of Y on X̃ (recall that Ŷ is a linear transformation of X̃; note: need to assume Y > 0). The model is analogous to E(Y | X = x) = β₀ + β₁ ψ_S(x, λ), so the idea is to use the same procedure as we did for selecting X̃. Leads to creation of an inverse fitted value plot of Ŷ versus Y (also called an inverse response plot). Let's look at three examples (using X̃ = log(X) for the first two). ALR 196, 197 VIII 16
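The first method reuses the earlier machinery with the roles swapped: the fitted values Ŷ play the part of the response and ψ_S(y, λ) the part of the regressor. A sketch under that reading (synthetic data; function names are my own, and in R this is what an inverse-response-plot tool automates with graphics attached):

```python
import math

def psi_s(y, lam):
    """Scaled power transform of a positive value."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def rss_slr(xs, ys):
    """RSS from the simple OLS regression of ys on xs."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
         sum((x - xbar) ** 2 for x in xs)
    b0 = ybar - b1 * xbar
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))

def inverse_response_lambda(y, yhat, grid=(-1, -1/2, 0, 1/3, 1/2, 1, 2)):
    """Choose lambda by regressing fitted values yhat on psi_S(y, lambda)
    and minimizing the residual sum of squares over the grid."""
    return min(grid, key=lambda lam: rss_slr([psi_s(v, lam) for v in y], yhat))
```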

17 [Figure: inverse fitted value plot for cedar tree example, fitted height versus Height, with curve for λ̂] VIII 17

18 [Figure: inverse fitted value plot for liquid copper example, fitted tension versus Tension, with curve for λ̂] VIII 18

19 [Figure: inverse fitted value plot for Forbes example, fitted pressure versus Pressure, with curve for λ̂] VIII 19

20 Transforming Response: II The 2nd method (called the Box–Cox method) makes use of the family of modified power transformations (note: again we need Y > 0):

ψ_M(Y, λ) = gm(Y)^{1−λ} ψ_S(Y, λ) = gm(Y)^{1−λ} (Y^λ − 1)/λ for λ ≠ 0, and ψ_M(Y, λ) = gm(Y) log(Y) for λ = 0,

where gm(Y) is the geometric mean of the untransformed responses y₁, ..., yₙ:

gm(Y) = (∏ᵢ₌₁ⁿ yᵢ)^{1/n} = exp((1/n) Σᵢ₌₁ⁿ log(yᵢ)).

The 2nd form is computationally preferable on a computer. So: what is the rationale for multiplying by a power of the geometric mean? ALR 198, 190, 191 VIII 20
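The modified transform acts on the whole response sample at once, since the geometric mean is computed from it. A minimal sketch using the slide's numerically preferable exp-of-mean-of-logs form (function names are mine):

```python
import math

def geometric_mean(ys):
    """gm(Y) via the numerically preferable form exp(mean(log y))."""
    return math.exp(sum(math.log(y) for y in ys) / len(ys))

def psi_m(ys, lam):
    """Modified power transform of the response sample:
    gm^(1-lam) * (y^lam - 1)/lam for lam != 0, gm * log(y) for lam == 0."""
    gm = geometric_mean(ys)
    if lam == 0:
        return [gm * math.log(y) for y in ys]
    return [gm ** (1.0 - lam) * (y ** lam - 1.0) / lam for y in ys]
```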

21 Transforming Response: III Residual sum of squares function when transforming the predictor:

RSS_{p,S}(b₀, b₁, λ) = Σᵢ₌₁ⁿ [yᵢ − (b₀ + b₁ ψ_S(xᵢ, λ))]²;

for each λ, this measures how well we predict the yᵢ's. Residual sum of squares function when transforming the response:

RSS_{r,S}(b₀, b₁, λ) = Σᵢ₌₁ⁿ [ψ_S(yᵢ, λ) − (b₀ + b₁ xᵢ)]²;

for each λ, this measures how well we predict the ψ_S(yᵢ, λ)'s. The units of ψ_S(yᵢ, λ) change as λ changes, leading to concerns about comparing apples & oranges (not a concern with RSS_{p,S}(b₀, b₁, λ), because there b₁ changes units implicitly). ALR 190, 191 VIII 21

22 Transforming Response: IV When λ ≠ 0, have

ψ_M(Y, λ) = [(∏ᵢ₌₁ⁿ Yᵢ)^{1/n}]^{1−λ} (Y^λ − 1)/λ.

If Y has units of m (meters), then ∏ᵢ₌₁ⁿ Yᵢ has units of m^n, (∏ᵢ₌₁ⁿ Yᵢ)^{1/n} has units of m, and [(∏ᵢ₌₁ⁿ Yᵢ)^{1/n}]^{1−λ} has units of m^{1−λ}; Y^λ has units of m^λ, so ψ_M(Y, λ) has units of m for all λ. Thus: transformed & untransformed responses have the same units. ALR 190, 191 VIII 22

23 Transforming Response: V The resulting residual sum of squares function, i.e.,

RSS_{r,M}(b₀, b₁, λ) = Σᵢ₌₁ⁿ [ψ_M(yᵢ, λ) − (b₀ + b₁ xᵢ)]²,

is measured in the same units for all λ, thus eliminating the apples and oranges concern. For fixed λ, the minimizer of the above is given by the OLS estimators β̂₀ and β̂₁ from the regression of ψ_M(yᵢ, λ) on xᵢ, resulting in RSS_{r,M}(λ) = RSS_{r,M}(β̂₀, β̂₁, λ). As before, select λ̂ such that RSS_{r,M}(λ̂) is minimized over λ. Let's consider the same three examples again (using X̃ = log(X) for the first two). ALR 198, 190, 191 VIII 23

24 [Figure: scatterplot for cedar tree example, Height (dm) versus x̃ = log(Dbh), with fitted curves for λ̂ = 0.6, λ = 1 and λ = 0] VIII 24

25 [Figure: scatterplot for liquid copper example, Tension versus x̃ = log(Sulfur), with fitted curves for λ̂ = 0.7, λ = 1 and λ = 0] VIII 25

26 [Figure: scatterplot for Forbes example, Pressure versus x̃ = Boiling point, with fitted curves for λ̂ = 0.4, λ = 1 and λ = 0] VIII 26

27 Transforming Response: VI The display of the transforms is done as follows: regress ỹᵢ = ψ_M(yᵢ, λ) on xᵢ, i = 1, ..., n; find the min & max values x_min & x_max of the xᵢ; form a dense grid of values xⱼ*, j = 1, ..., m, ranging from x_min to x_max; compute the predicted values ỹⱼ* over the dense grid; plot ψ_M^{−1}(ỹⱼ*, λ) versus xⱼ* on the original scatterplot, where

ψ_M^{−1}(Y, λ) = (1 + λY/gm(Y)^{1−λ})^{1/λ} for λ ≠ 0, and ψ_M^{−1}(Y, λ) = exp(Y/gm(Y)) for λ = 0

(note that ψ_M^{−1}(ψ_M(Y, λ), λ) = Y; here gm(Y) is held fixed at the geometric mean of the untransformed responses). VIII 27
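A sketch of the inverse, with the geometric mean passed in explicitly since it is held fixed at the value computed from the original responses (function names are mine; the forward transform is included only to check the round trip):

```python
import math

def psi_m_forward(y, lam, gm):
    """Modified power transform of a single y, with gm held fixed."""
    if lam == 0:
        return gm * math.log(y)
    return gm ** (1.0 - lam) * (y ** lam - 1.0) / lam

def psi_m_inverse(z, lam, gm):
    """Inverse of the modified power transform: solves
    z = gm^(1-lam) * (y^lam - 1)/lam for y (or z = gm*log(y) at lam = 0)."""
    if lam == 0:
        return math.exp(z / gm)
    return (1.0 + lam * z / gm ** (1.0 - lam)) ** (1.0 / lam)
```

The round-trip identity ψ_M^{−1}(ψ_M(y, λ), λ) = y holds for every λ, including negative values.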

28 Summary of Regressor and Response Transforms Here is a table showing the λ̂'s chosen so far:

Example          regressor   response (1st method)   response (2nd method)
Cedar tree       0 (log)     0.2                     0.6
Liquid copper    0 (log)     0.7                     0.7
Forbes           1 (none)    0.4                     0.4

The λ̂'s chosen by the two methods for transforming the response disagree for the cedar tree example, but agree for the other two examples. The following plots for λ = 0, 0.2, 0.6 and 1 for the cedar tree example suggest that using λ = 0 or 0.2 imparts some curvature, whereas λ = 0.6 or 1 (i.e., no transformation) does not. Going with no transformation is a simple (& reasonable) choice. VIII 28

29 [Figure: λ = 0 transformation with linear & quadratic fits; log(Height) versus log(Dbh)] VIII 29

30 [Figure: λ = 0.2 transformation with linear & quadratic fits; transformed Height versus log(Dbh)] VIII 30

31 [Figure: λ = 0.6 transformation with linear & quadratic fits; transformed Height versus log(Dbh)] VIII 31

32 [Figure: λ = 1 transformation with linear & quadratic fits; Height (dm) versus log(Dbh)] ALR 191 VIII 32

33 Transformations for Multiple Regressors: I Focusing now on MLR, the overall goal is to find transformations in which MLR matches the data to a reasonable approximation (Weisberg, p. 193). Theoretical arguments (Weisberg, pp. 193–194) suggest we can make progress towards this goal if the regressors in the mean function are all linearly related. If, through suitable transformations, we arrive at regressors X̃ that are approximately pairwise linear, theoretical arguments say that, under certain conditions (some restrictive!), fitting the mean function E(Y | X̃ = x̃) = β′x̃ using OLS allows us to identify the unknown function g(·) in the more general model E(Y | X̃ = x̃) = g(β′x̃) from a scatterplot of yᵢ versus the fitted values β̂′x̃ᵢ. ALR 193, 194 VIII 33

34 Transformations for Multiple Regressors: II The overall strategy is thus: 1. transform the regressors so that all pairwise scatterplots are approximately linear (don't worry about scatterplots of Y versus individual regressors); 2. regress Y on the transformed regressors X̃ to get estimates β̂; 3. determine a suitable transform for Y using one of the two methods discussed previously (see VIII 16 and VIII 20), leading to the MLR model E(ψ(Y, λ) | X̃ = x̃) = β′x̃. The starting point in achieving 1 is study of the scatterplot matrix, which quickly leads to the realization that achieving 1 can be daunting! Weisberg (§8.2) uses the Highway data as an illustration. ALR 194, 195, 191 VIII 34

35 Transformations for Multiple Regressors: III The goal is to predict the response rate (accidents per million vehicle miles for a particular highway segment) using as regressors:
len, length of highway segment in miles;
adt, average daily traffic count in thousands;
trks, truck volume as % of total volume;
slim, speed limit;
shld, width of outer shoulder on roadway;
sigs, number of interchanges with signals per mile. ALR 192 VIII 35

36 [Figure: scatterplot matrix for untransformed Highway data: len, adt, trks, slim, shld, sigs] ALR 193 VIII 36

37 [Figure: enhanced scatterplot matrix: len, adt, trks, slim, shld, sigs] ALR 193 VIII 37

38 [Figure: enhanced matrix for selected regressors I: len, adt, trks] ALR 193 VIII 38

39 [Figure: enhanced matrix for selected regressors II: len, slim, shld, sigs] ALR 193 VIII 39

40 [Figure: enhanced matrix for selected regressors III: adt, slim, shld, sigs] ALR 193 VIII 40

41 [Figure: enhanced matrix for selected regressors IV: trks, slim, shld, sigs] ALR 193 VIII 41

42 Transformations for Multiple Regressors: IV All regressors are positive except sigs (# of signaled interchanges per mile), which has some values equal to zero. Can handle sigs by defining the new regressor

sigs1 = (sigs × len + 1)/len

(i.e., bump up the signal count by 1 in every segment). slim (speed limit) doesn't vary much (most values between 50 and 60 mph, with the total range being between 40 and 70 mph), so it is unlikely any transformation ψ(slim, λ) will be effective. With sigs replaced by sigs1 and with slim left out as a candidate for transformation, can use the R function powerTransform to get initial guesses at suitable transformations (an implementation of the multivariate extension of Box–Cox due to Velilla). ALR 192, 195, 196 VIII 42

43 Transformations for Multiple Regressors: V Output from powerTransform for the Highway data:

       Est.Power  Std.Err.  Lower Bound  Upper Bound
len
adt
trks
shld
sigs1

Likelihood ratio tests about transformation parameters
                                LRT  df  pval
LR test, lambda = (0 0 0 0 0)
LR test, lambda = (1 1 1 1 1)
LR test, lambda = (0 0 0 1 0)

ALR 196 VIII 43

44 Transformations for Multiple Regressors: VI Can reject the null hypothesis of all log transformations and the null hypothesis of no transformations at all; cannot reject the null hypothesis of a log transformation for all regressors except shld (shoulder width). The suggestion for trks is a bit odd: sticking to the nearest integer would lead to the choice λ = −1 rather than λ = 0. Can test the feasibility of this modification using testTransform, which yields the following output for comparison with the λ = 0 choice:

                      LRT  df  pval
LR test, lambda = ( )
LR test, lambda = ( )

Cannot reject either of the two stated null hypotheses, so will go with the suggested (0, 0, 0, 1, 0) choice for the λ's. ALR 196 VIII 44

45 [Figure: scatterplot matrix for transformed Highway data: loglen, logadt, logtrks, slim, shld, logsigs] ALR 197 VIII 45

46 [Figure: enhanced scatterplot matrix: loglen, logadt, logtrks, slim, shld, logsigs] ALR 197 VIII 46

47 [Figure: enhanced matrix for selected regressors I: loglen, logadt, logtrks] ALR 193 VIII 47

48 [Figure: enhanced matrix for selected regressors II: loglen, slim, shld, logsigs] ALR 193 VIII 48

49 [Figure: enhanced matrix for selected regressors III: logadt, slim, shld, logsigs] ALR 193 VIII 49

50 [Figure: enhanced matrix for selected regressors IV: logtrks, slim, shld, logsigs] ALR 193 VIII 50

51 Transforming Response for Highway Data: I With the transformations for the regressors set, turn attention now to transformation of the response using 1. the inverse fitted value plot and 2. the Box–Cox method. For 1, start by fitting the model

E(rate | X) = β₀ + β₁ loglen + β₂ logadt + β₃ logtrks + β₄ slim + β₅ shld + β₆ logsigs1

to obtain fitted values Ŷ for rate; then fit the model E(Ŷ | Y = y) = γ₀ + γ₁ ψ_S(y, λ) for various choices of λ to create the inverse fitted value plot. The following plot suggests a log transform might be appropriate. ALR 196 VIII 51

52 [Figure: inverse fitted value plot for Highway example, fitted rate versus rate, with curve for λ̂] ALR 197 VIII 52

53 Transforming Response for Highway Data: II For 2 (the Box–Cox method), consider the residual sum of squares function

RSS_{r,M}(b, λ) = Σᵢ₌₁ⁿ [ψ_M(yᵢ, λ) − x′ᵢb]²,

where yᵢ is the rate for the ith case, and the vector xᵢ contains 1 followed by the values of loglen, logadt, ..., logsigs1 for the ith case. For fixed λ, the minimizer of the above is given by the OLS estimator β̂ from the regression of ψ_M(yᵢ, λ) on xᵢ, resulting in RSS_{r,M}(λ) = RSS_{r,M}(β̂, λ). As before, select λ̂ such that RSS_{r,M}(λ̂) is minimized over λ. The following summary plot shows a so-called log-likelihood versus λ, where the log-likelihood is −(n/2) log[RSS_{r,M}(λ)/n] (suggests a log transform, as did the inverse fitted value plot). ALR 198 VIII 53
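The profile log-likelihood in the summary plot is a decreasing function of RSS, so maximizing it over λ is equivalent to minimizing RSS_{r,M}(λ). A tiny sketch of that equivalence (the RSS values here are made up, not the Highway numbers):

```python
import math

def profile_loglik(rss, n):
    """Box-Cox profile log-likelihood, up to an additive constant:
    -(n/2) * log(RSS/n)."""
    return -0.5 * n * math.log(rss / n)

# Because -log is strictly decreasing, the lambda that maximizes the
# profile log-likelihood is exactly the lambda that minimizes RSS(lambda).
```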

54 [Figure: summary of Box–Cox method for Highway example; log-likelihood versus λ, with 95% confidence interval marked] ALR 197 VIII 54

55 Main Points: I Transformations allow application of SLR and MLR analysis to regressor/response data that, in their original state, are not well-suited for such analysis. In SLR, the need for a transformation is suggested by a scatterplot of Y versus X with points that aren't clustered about a line. A useful family of transformations is the scaled power transformations:

ψ_S(X, λ) = (X^λ − 1)/λ for λ ≠ 0, and ψ_S(X, λ) = log(X) for λ = 0

(the case λ = 1 is essentially no transformation at all). The above requires that the regressor X be positive (§8.4 of Weisberg discusses a (not entirely satisfactory) modification for handling data that can take both positive and negative values). ALR 185, 189, 198, 199 VIII 55

56 Main Points: II The appropriate λ is the value minimizing the residual sum of squares (RSS) of Y regressed on ψ_S(X, λ). Once X has been suitably transformed, can use either 1. the inverse fitted value plot or 2. the Box–Cox method to select a power transformation suitable for Y > 0 (the 2nd method makes use of a modified power transformation, which differs from a scaled power transformation by a normalizing factor involving the geometric mean of the responses yᵢ). ALR 189, 196, 198 VIII 56

57 Main Points: III In MLR, it is good to have regressors whose entries in the scatterplot matrix show a linear relationship; if the original regressors don't have this pattern, transformation of one or more regressors is called for. Identifying which transformation to apply to which regressor can be a daunting task; the task can be facilitated by an automatic transformation selection method due to Velilla, which can provide a useful starting point for picking suitable transformations (part art/part science!). ALR 193, 194, 195 VIII 57

58 Additional Reference G.E.P. Box (1979), "Robustness in the Strategy of Scientific Model Building," in Robustness in Statistics, edited by R.L. Launer and G.N. Wilkinson, New York: Academic Press, pp. 201–236. VIII 58


The scatterplot is the basic tool for graphically displaying bivariate quantitative data. Bivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data. Example: Some investors think that the performance of the stock market in January

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 1 Random Vectors Let a 0 and y be n 1 vectors, and let A be an n n matrix. Here, a 0 and A are non-random, whereas y is

More information

Chapter 10 Re-expressing Data: Get It Straight!

Chapter 10 Re-expressing Data: Get It Straight! Chapter 0 Re-expressing Data: Get It Straight! 23 Chapter 0 Re-expressing Data: Get It Straight!. s. a) The residuals plot shows no pattern. No re-expression is needed. b) The residuals plot shows a curved

More information

Business Statistics 41000: Homework # 5

Business Statistics 41000: Homework # 5 Business Statistics 41000: Homework # 5 Drew Creal Due date: Beginning of class in week # 10 Remarks: These questions cover Lectures #7, 8, and 9. Question # 1. Condence intervals and plug-in predictive

More information

Regression used to predict or estimate the value of one variable corresponding to a given value of another variable.

Regression used to predict or estimate the value of one variable corresponding to a given value of another variable. CHAPTER 9 Simple Linear Regression and Correlation Regression used to predict or estimate the value of one variable corresponding to a given value of another variable. X = independent variable. Y = dependent

More information

Statistics Univariate Linear Models Gary W. Oehlert School of Statistics 313B Ford Hall

Statistics Univariate Linear Models Gary W. Oehlert School of Statistics 313B Ford Hall Statistics 5401 14. Univariate Linear Models Gary W. Oehlert School of Statistics 313B ord Hall 612-625-1557 gary@stat.umn.edu Linear models relate a target or response or dependent variable to known predictor

More information

LAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION

LAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION LAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION In this lab you will learn how to use Excel to display the relationship between two quantitative variables, measure the strength and direction of the

More information

Binary Logistic Regression

Binary Logistic Regression The coefficients of the multiple regression model are estimated using sample data with k independent variables Estimated (or predicted) value of Y Estimated intercept Estimated slope coefficients Ŷ = b

More information

Lab 3 A Quick Introduction to Multiple Linear Regression Psychology The Multiple Linear Regression Model

Lab 3 A Quick Introduction to Multiple Linear Regression Psychology The Multiple Linear Regression Model Lab 3 A Quick Introduction to Multiple Linear Regression Psychology 310 Instructions.Work through the lab, saving the output as you go. You will be submitting your assignment as an R Markdown document.

More information

Statistical Methods for Data Mining

Statistical Methods for Data Mining Statistical Methods for Data Mining Kuangnan Fang Xiamen University Email: xmufkn@xmu.edu.cn Linear regression Linear regression is a simple approach to supervised learning. It assumes that the dependence

More information

CAS MA575 Linear Models

CAS MA575 Linear Models CAS MA575 Linear Models Boston University, Fall 2013 Midterm Exam (Correction) Instructor: Cedric Ginestet Date: 22 Oct 2013. Maximal Score: 200pts. Please Note: You will only be graded on work and answers

More information

Inference about Clustering and Parametric. Assumptions in Covariance Matrix Estimation

Inference about Clustering and Parametric. Assumptions in Covariance Matrix Estimation Inference about Clustering and Parametric Assumptions in Covariance Matrix Estimation Mikko Packalen y Tony Wirjanto z 26 November 2010 Abstract Selecting an estimator for the variance covariance matrix

More information

STA Module 10 Comparing Two Proportions

STA Module 10 Comparing Two Proportions STA 2023 Module 10 Comparing Two Proportions Learning Objectives Upon completing this module, you should be able to: 1. Perform large-sample inferences (hypothesis test and confidence intervals) to compare

More information

Multiple Regression Analysis

Multiple Regression Analysis Multiple Regression Analysis y = 0 + 1 x 1 + x +... k x k + u 6. Heteroskedasticity What is Heteroskedasticity?! Recall the assumption of homoskedasticity implied that conditional on the explanatory variables,

More information

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Multilevel Models in Matrix Form Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Today s Lecture Linear models from a matrix perspective An example of how to do

More information

Psychology 282 Lecture #4 Outline Inferences in SLR

Psychology 282 Lecture #4 Outline Inferences in SLR Psychology 282 Lecture #4 Outline Inferences in SLR Assumptions To this point we have not had to make any distributional assumptions. Principle of least squares requires no assumptions. Can use correlations

More information

Chapter 9 Re-expressing Data: Get It Straight!

Chapter 9 Re-expressing Data: Get It Straight! Chapter 9 Re-expressing Data: Get It Straight! 53 Chapter 9 Re-expressing Data: Get It Straight!. s. a) The residuals plot shows no pattern. No re-expression is needed. b) The residuals plot shows a curved

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.

More information

One sided tests. An example of a two sided alternative is what we ve been using for our two sample tests:

One sided tests. An example of a two sided alternative is what we ve been using for our two sample tests: One sided tests So far all of our tests have been two sided. While this may be a bit easier to understand, this is often not the best way to do a hypothesis test. One simple thing that we can do to get

More information

Regression Analysis. Regression: Methodology for studying the relationship among two or more variables

Regression Analysis. Regression: Methodology for studying the relationship among two or more variables Regression Analysis Regression: Methodology for studying the relationship among two or more variables Two major aims: Determine an appropriate model for the relationship between the variables Predict the

More information

Nonlinear Regression Functions

Nonlinear Regression Functions Nonlinear Regression Functions (SW Chapter 8) Outline 1. Nonlinear regression functions general comments 2. Nonlinear functions of one variable 3. Nonlinear functions of two variables: interactions 4.

More information

Homoskedasticity. Var (u X) = σ 2. (23)

Homoskedasticity. Var (u X) = σ 2. (23) Homoskedasticity How big is the difference between the OLS estimator and the true parameter? To answer this question, we make an additional assumption called homoskedasticity: Var (u X) = σ 2. (23) This

More information

Section 3: Simple Linear Regression

Section 3: Simple Linear Regression Section 3: Simple Linear Regression Carlos M. Carvalho The University of Texas at Austin McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Regression: General Introduction

More information

Part 8: GLMs and Hierarchical LMs and GLMs

Part 8: GLMs and Hierarchical LMs and GLMs Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course

More information

MATH 320, WEEK 6: Linear Systems, Gaussian Elimination, Coefficient Matrices

MATH 320, WEEK 6: Linear Systems, Gaussian Elimination, Coefficient Matrices MATH 320, WEEK 6: Linear Systems, Gaussian Elimination, Coefficient Matrices We will now switch gears and focus on a branch of mathematics known as linear algebra. There are a few notes worth making before

More information

Ch. 1: Data and Distributions

Ch. 1: Data and Distributions Ch. 1: Data and Distributions Populations vs. Samples How to graphically display data Histograms, dot plots, stem plots, etc Helps to show how samples are distributed Distributions of both continuous and

More information

Chapter 9. Correlation and Regression

Chapter 9. Correlation and Regression Chapter 9 Correlation and Regression Lesson 9-1/9-2, Part 1 Correlation Registered Florida Pleasure Crafts and Watercraft Related Manatee Deaths 100 80 60 40 20 0 1991 1993 1995 1997 1999 Year Boats in

More information

In the previous chapter, we learned how to use the method of least-squares

In the previous chapter, we learned how to use the method of least-squares 03-Kahane-45364.qxd 11/9/2007 4:40 PM Page 37 3 Model Performance and Evaluation In the previous chapter, we learned how to use the method of least-squares to find a line that best fits a scatter of points.

More information

Model Checking. Chapter 7

Model Checking. Chapter 7 Chapter 7 Model Checking In this chapter we consider methods for checking model assumptions and the use of transformations to correct problems with the assumptions. The primary method for checking model

More information

Diagnostics can identify two possible areas of failure of assumptions when fitting linear models.

Diagnostics can identify two possible areas of failure of assumptions when fitting linear models. 1 Transformations 1.1 Introduction Diagnostics can identify two possible areas of failure of assumptions when fitting linear models. (i) lack of Normality (ii) heterogeneity of variances It is important

More information

INTRODUCTION TO INTERSECTION-UNION TESTS

INTRODUCTION TO INTERSECTION-UNION TESTS INTRODUCTION TO INTERSECTION-UNION TESTS Jimmy A. Doi, Cal Poly State University San Luis Obispo Department of Statistics (jdoi@calpoly.edu Key Words: Intersection-Union Tests; Multiple Comparisons; Acceptance

More information

Sociology 593 Exam 2 Answer Key March 28, 2002

Sociology 593 Exam 2 Answer Key March 28, 2002 Sociology 59 Exam Answer Key March 8, 00 I. True-False. (0 points) Indicate whether the following statements are true or false. If false, briefly explain why.. A variable is called CATHOLIC. This probably

More information

Second Midterm Exam Name: Solutions March 19, 2014

Second Midterm Exam Name: Solutions March 19, 2014 Math 3080 1. Treibergs σιι Second Midterm Exam Name: Solutions March 19, 2014 (1. The article Withdrawl Strength of Threaded Nails, in Journal of Structural Engineering, 2001, describes an experiment to

More information

Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012

Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012 Problem Set #6: OLS Economics 835: Econometrics Fall 202 A preliminary result Suppose we have a random sample of size n on the scalar random variables (x, y) with finite means, variances, and covariance.

More information

The Simple Regression Model. Part II. The Simple Regression Model

The Simple Regression Model. Part II. The Simple Regression Model Part II The Simple Regression Model As of Sep 22, 2015 Definition 1 The Simple Regression Model Definition Estimation of the model, OLS OLS Statistics Algebraic properties Goodness-of-Fit, the R-square

More information

17. Introduction to Tree and Neural Network Regression

17. Introduction to Tree and Neural Network Regression 17. Introduction to Tree and Neural Network Regression As we have repeated often throughout this book, the classical multiple regression model is clearly wrong in many ways. One way that the model is wrong

More information

Transformations. Merlise Clyde. Readings: Gelman & Hill Ch 2-4, ALR 8-9

Transformations. Merlise Clyde. Readings: Gelman & Hill Ch 2-4, ALR 8-9 Transformations Merlise Clyde Readings: Gelman & Hill Ch 2-4, ALR 8-9 Assumptions of Linear Regression Y i = β 0 + β 1 X i1 + β 2 X i2 +... β p X ip + ɛ i Model Linear in X j but X j could be a transformation

More information

Regression Models. Chapter 4. Introduction. Introduction. Introduction

Regression Models. Chapter 4. Introduction. Introduction. Introduction Chapter 4 Regression Models Quantitative Analysis for Management, Tenth Edition, by Render, Stair, and Hanna 008 Prentice-Hall, Inc. Introduction Regression analysis is a very valuable tool for a manager

More information

HOMEWORK ANALYSIS #2 - STOPPING DISTANCE

HOMEWORK ANALYSIS #2 - STOPPING DISTANCE HOMEWORK ANALYSIS #2 - STOPPING DISTANCE Total Points Possible: 35 1. In your own words, summarize the overarching problem and any specific questions that need to be answered using the stopping distance

More information

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification, Likelihood Let P (D H) be the probability an experiment produces data D, given hypothesis H. Usually H is regarded as fixed and D variable. Before the experiment, the data D are unknown, and the probability

More information

Tutorial 6: Linear Regression

Tutorial 6: Linear Regression Tutorial 6: Linear Regression Rob Nicholls nicholls@mrc-lmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction to Simple Linear Regression................ 1 2 Parameter Estimation and Model

More information

STAT420 Midterm Exam. University of Illinois Urbana-Champaign October 19 (Friday), :00 4:15p. SOLUTIONS (Yellow)

STAT420 Midterm Exam. University of Illinois Urbana-Champaign October 19 (Friday), :00 4:15p. SOLUTIONS (Yellow) STAT40 Midterm Exam University of Illinois Urbana-Champaign October 19 (Friday), 018 3:00 4:15p SOLUTIONS (Yellow) Question 1 (15 points) (10 points) 3 (50 points) extra ( points) Total (77 points) Points

More information

A Note on Visualizing Response Transformations in Regression

A Note on Visualizing Response Transformations in Regression Southern Illinois University Carbondale OpenSIUC Articles and Preprints Department of Mathematics 11-2001 A Note on Visualizing Response Transformations in Regression R. Dennis Cook University of Minnesota

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression ST 370 Regression models are used to study the relationship of a response variable and one or more predictors. The response is also called the dependent variable, and the predictors

More information

So far our focus has been on estimation of the parameter vector β in the. y = Xβ + u

So far our focus has been on estimation of the parameter vector β in the. y = Xβ + u Interval estimation and hypothesis tests So far our focus has been on estimation of the parameter vector β in the linear model y i = β 1 x 1i + β 2 x 2i +... + β K x Ki + u i = x iβ + u i for i = 1, 2,...,

More information

Diagnostics and Transformations Part 2

Diagnostics and Transformations Part 2 Diagnostics and Transformations Part 2 Bivariate Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University Multilevel Regression Modeling, 2009 Diagnostics

More information

Correlation. A statistics method to measure the relationship between two variables. Three characteristics

Correlation. A statistics method to measure the relationship between two variables. Three characteristics Correlation Correlation A statistics method to measure the relationship between two variables Three characteristics Direction of the relationship Form of the relationship Strength/Consistency Direction

More information