1 Transformations

1.1 Introduction

Diagnostics can identify two possible areas of failure of assumptions when fitting linear models:

(i) lack of Normality;
(ii) heterogeneity of variances.

It is important to detect and correct a non-constant error variance. If this problem is not eliminated the least squares estimators will still be unbiased, but they will no longer have the minimum variance property. This means that the regression coefficient estimators will have larger standard errors than necessary. Both problems (i) and (ii) can be remedied by taking an appropriate transformation of the response variable Y. There may be a third reason for taking transformations, namely

(iii) to simplify the relationship between the response variable and the explanatory variables; fitting a polynomial of high degree should be avoided, if possible, since interpretation of such models is difficult, and transforming variables may simplify the relationship sufficiently to avoid the need for a high degree polynomial.

Transformations are also used

(iv) to turn non-linear models into linear models.

1.2 Ad-hoc methods

Consider the four curved relationships, shown below, between a response variable Y and an explanatory variable x. Notice that these curves have only one bend in them and that the bulge of the bend points either towards larger or towards smaller values of x (or Y). For example, in the first figure the bulge points towards larger values of x and smaller values of Y, whereas in the second figure the bulge points towards smaller values of both Y and x.

[Figure: the four one-bend relationships between Y and x, with the bulge of the bend pointing in different directions.]

A one-bend relationship can be made into a straight-line relationship by transforming either the Y-values or the x-values using a power transformation, i.e. by taking either Y* = Y^k or x* = x^k for some k. The value of k depends on the direction of the bulge. If the bulge points towards lower values of the variable you are transforming, take k < 1 for the relationship to be transformed into a straight line (the case k = 0 represents a logarithmic transformation). If the bulge points towards larger values of the variable you are transforming, take k > 1 to linearise the relationship. The exact value of k is determined by trial and error. It must be remembered, however, that transforming the response variable Y affects the Normality and homogeneity of variance assumptions. The usual values of k that are considered are

k = ..., -1, -3/4, -1/2, -1/4, 0, 1/4, 1/2, 3/4, 1.25, 1.5, ...
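This trial-and-error search can be automated by trying each candidate power on the explanatory variable and checking how close the transformed relationship is to a straight line, for example through the correlation between the transformed x and Y. The sketch below is only illustrative: the function name, the candidate grid and the simulated data are my own choices, not part of the notes.

import numpy as np

def best_power(x, y, ks=(-1, -0.75, -0.5, -0.25, 0, 0.25, 0.5, 0.75, 1.25, 1.5)):
    # Return the power k for which x^k (or log x when k = 0) is most nearly
    # linearly related to y, judged by the absolute correlation.
    def transform(x, k):
        return np.log(x) if k == 0 else x ** k
    scores = {k: abs(np.corrcoef(transform(x, k), y)[0, 1]) for k in ks}
    return max(scores, key=scores.get)

# Hypothetical one-bend data with Y roughly proportional to the square root of x
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=50)
y = np.sqrt(x) + rng.normal(0, 0.05, size=50)
print(best_power(x, y))        # expected to pick a power near 0.5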

Two-bend relationships, an example of which is displayed below, cannot be made into a straight-line relationship by a power transformation. In toxicity studies the proportion Y of a fixed number of organisms surviving a toxic substance is related to the dose x of the toxin; this relationship is S-shaped, i.e. a two-bend relationship.

Common transformations that change such a two-bend relationship into a straight-line relationship are:

Logit transformation:     Y* = log( Y / (1 − Y) )
Arc-sine transformation:  Y* = sin^{−1}( √Y ) = arcsin( √Y )
Probit transformation:    Y* = Φ^{−1}(Y), where Φ(.) is the standard Normal distribution function.

Once again it must be remembered that these transformations of the response variable may disturb, or may not rectify, the Normality and variance homogeneity assumptions.

1.3 Intrinsically linear models

The following models, although at first sight they may not appear to be linear, can in fact be converted into linear models without much difficulty.

(a) Y_i = α x_i^β ε_i  becomes  Y*_i = α* + β x*_i + ε*_i  under the transformation Y* = log Y, x* = log x.

(b) Y_i = α e^{β x_i} ε_i  becomes  Y*_i = α* + β x_i + ε*_i  by a logarithmic transformation of Y.

(c) Y_i = x_i / (α + β x_i + ε_i)  becomes  Y*_i = β + α x*_i + ε*_i  under the transformation Y* = 1/Y, x* = 1/x.

(d) Y_i = α / (1 + γ e^{β x_i + ε_i})  becomes  Y*_i = log γ + β x_i + ε_i  under the transformation Y* = log( α/Y − 1 ), provided α is known.

Note that, for the transformed models to be useful, the least squares assumptions on the error distribution must apply to the errors ε* after the transformation.
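For illustration, the three two-bend transformations can be computed directly; the sketch below uses scipy for the probit and made-up proportions rather than data from the notes.

import numpy as np
from scipy.stats import norm

p = np.array([0.05, 0.20, 0.50, 0.80, 0.95])   # hypothetical proportions surviving

logit   = np.log(p / (1 - p))                  # Y* = log(Y / (1 - Y))
arcsine = np.arcsin(np.sqrt(p))                # Y* = arcsin(sqrt(Y))
probit  = norm.ppf(p)                          # Y* = Phi^{-1}(Y)

Each transformation stretches proportions near 0 and 1, so that an S-shaped dose-response curve becomes approximately linear in the dose.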

1.4 Analytic ways of determining transformations

Transformations in the explanatory variable to simplify relationships: the Box-Tidwell method

Suppose we want to find the power transformation such that

Y_i = α + β w_i + ε_i,   i = 1, 2, ..., n,

i.e.

Y_i = α + β x_i^λ + ε_i,   i = 1, 2, ..., n,

where

w = x^λ  if λ ≠ 0,   and   w = log x  if λ = 0.

The Box-Tidwell method of determining the appropriate value λ_0 of λ argues as follows. For the appropriate value λ_0 we have the linear relationship between Y and x^{λ_0}

Y_i = α + β x_i^{λ_0} + ε_i,   i = 1, 2, ..., n,

with residual sum of squares RSS(λ_0). However, for an inappropriate choice λ* the relationship between Y and x^{λ*} is not linear, and if we insist on fitting

Y_i = α + β x_i^{λ*} + ε_i,   i = 1, 2, ..., n,

the fit will be poor; in particular, the RSS(λ*) of such a fit will have

RSS(λ*) ≥ RSS(λ_0).

Hence the appropriate choice λ_0 satisfies

RSS(λ_0) = min_λ RSS(λ),

i.e. λ_0 is that value of λ which minimises the RSS(λ) of the model

Y_i = α + β x_i^λ + ε_i,   i = 1, 2, ..., n.

The minimisation is achieved using the following iterative procedure.

Step 1. Let λ^(1) be a first approximation to the required value λ_0 and let

w_i^(1) = x_i^{λ^(1)},   i = 1, 2, ..., n

(usually we take λ^(1) = 1, so that w_i^(1) = x_i, i.e. no transformation is applied at the first iteration of the procedure). Fit the model

Y_i = α + β w_i^(1) + ε_i,   i = 1, 2, ..., n,

and let β̂^(1) be the l.s.e. of β in the fitted model.

If, in the correct relationship

Y_i = α + β x_i^{λ_0} + ε_i,   i = 1, 2, ..., n,

we take the first order Taylor expansion of x^{λ_0} about λ = λ^(1), we get to a first order approximation (recall that d/dλ x^λ = x^λ log x)

Y_i = α + β [ x_i^{λ^(1)} + (λ_0 − λ^(1)) x_i^{λ^(1)} log x_i ] + ε_i        (1)
    = α + β w_i^(1) + β ((λ_0 − λ^(1)) / λ^(1)) x_i^{λ^(1)} log x_i^{λ^(1)} + ε_i        (2)
    = α + β w_i^(1) + γ w_i^(1) log w_i^(1) + ε_i        (3)

where

γ = β ( λ_0 / λ^(1) − 1 ).        (4)

Step 2. Fit the model in (3) by regressing Y_i on the two explanatory variables w_i^(1) and w_i^(1) log w_i^(1). Let γ̂^(1) be the l.s.e. of γ in fitting the model in (3).

Aside: Notice that if we have taken, as recommended, λ^(1) = 1, then testing at this stage the hypothesis

H_0: γ = 0   against   H_1: γ ≠ 0

is equivalent to testing the hypothesis

H_0: λ_0 = 1   against   H_1: λ_0 ≠ 1,

i.e. it is equivalent to testing the null hypothesis that no transformation of the explanatory variable is required to linearise the relationship between the response variable and the explanatory variable, against the alternative that some transformation is required.

Result: In order to test whether the relationship between the response variable and an explanatory variable x is linear, fit the model which contains both the explanatory variable x and the generated explanatory variable x log x, and test the hypothesis that the coefficient γ of the variable x log x in the fitted model is equal to zero. If there is statistical evidence that γ ≠ 0, that is an indication that the relationship between the response variable and x is not linear. If there is no statistical evidence that γ ≠ 0, then that is an indication that the relationship between the response variable and x is linear.

Returning to the procedure for determining the value of the power λ_0 in the power transformation that linearises the relationship between the response and explanatory variables, we are ready to take the next step.

Step 3. From equation (4) we see that λ_0 is approximately equal to

λ^(2) = λ^(1) ( γ̂^(1)/β̂^(1) + 1 ),

i.e. λ^(2) is a better approximation to λ_0 than λ^(1). Note that β̂^(1) is the l.s.e. of β in Step 1 and γ̂^(1) is the l.s.e. of γ in Step 2.

Step 4. Repeat Steps 1 to 3, so that at the end of the rth iteration you have the improved approximation to λ_0

λ^(r+1) = ( γ̂^(r)/β̂^(r) + 1 ) λ^(r).

The iterations stop when λ^(r+1) agrees with λ^(r) to within the required degree of accuracy. Usually the convergence is fairly fast. Note that at the start of the (r+1)th iteration we have

w_i^(r+1) = x_i^{λ^(r+1)} = x_i^{λ^(r) (γ̂^(r)/β̂^(r) + 1)} = [ w_i^(r) ]^{(γ̂^(r)/β̂^(r) + 1)}.

Remark: A power transformation of the explanatory variable can succeed in linearising the relationship between the response variable and the explanatory variable only if the original relationship is a one-bend relationship.

Remark: This iterative procedure can be used unaltered to determine the required power transformation of a particular regressor in a multiple regression model, in order to linearise the relationship between the response variable and that particular regressor.
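The iteration is straightforward to program. The following is a minimal sketch in Python using numpy only; the function name, the convergence tolerance and the assumption that x > 0 are mine, not part of the notes. Each pass performs the same two regressions that are fitted with Minitab in the worked example below.

import numpy as np

def box_tidwell(x, y, lam=1.0, tol=1e-3, max_iter=20):
    # Estimate lambda_0 such that Y = alpha + beta * x**lambda_0 + error is
    # approximately linear, by the iteration described above.
    # Assumes x > 0 and that the iterates stay away from lambda = 0.
    for _ in range(max_iter):
        w = x ** lam
        # Step 1: regress Y on w to obtain beta_hat
        X1 = np.column_stack([np.ones_like(w), w])
        beta_hat = np.linalg.lstsq(X1, y, rcond=None)[0][1]
        # Step 2: regress Y on w and w*log(w) to obtain gamma_hat
        X2 = np.column_stack([np.ones_like(w), w, w * np.log(w)])
        gamma_hat = np.linalg.lstsq(X2, y, rcond=None)[0][2]
        # Step 3: improved approximation to lambda_0
        lam_new = (gamma_hat / beta_hat + 1.0) * lam
        if abs(lam_new - lam) < tol:      # Step 4: stop when converged
            return lam_new
        lam = lam_new
    return lam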

Example: An engineer is investigating the use of windmills for power generation. Below are the data collected by the engineer on the DC output from a new design of windmill and the corresponding wind velocity. The data are also plotted in Figure 1.

[Table of the windmill data, with columns Row, DCoutput and WVelocity.]

The scatter plot suggests that the relationship between DC output (Y) and wind velocity (x) is not a straight line but, being a one-bend relationship, it may be linearised using a power transformation of the regressor x. Starting with the initial guess λ^(1) = 1, we fit a straight-line model between DC output (Y) and wind velocity (x), with the following results.

Regression Analysis: DCoutput versus WVelocity

The regression equation is  DCoutput = ... + ... WVelocity

S = ...   R-Sq = 87.4%   R-Sq(adj) = 86.9%

Defining xlogx = WVelocity × log(WVelocity) and fitting the model

E(Y) = α + β x + γ (xlogx)

we get the results

Regression Analysis: DCoutput versus WVelocity, xlogx

The regression equation is  DCoutput = ... + ... WVelocity + ... xlogx

S = ...   R-Sq = 97.4%   R-Sq(adj) = 97.1%

The improved estimate of λ_0 is therefore

λ^(2) = γ̂^(1)/β̂^(1) + 1 = 0.92.

To perform the second iteration we define the new regressor w = x^0.92 and fit a straight-line model between DC output Y and w. The results are

Regression Analysis: DCoutput versus w

The regression equation is  DCoutput = ... + ... w

S = ...   R-Sq = 98.1%   R-Sq(adj) = 98.0%

Now define the second regressor wlogw = w log(w) and fit the model

E(Y) = α + β w + γ (wlogw).

The results are

Regression Analysis: DCoutput versus w, wlogw

The regression equation is  DCoutput = ... + ... w + ... wlogw

S = ...   R-Sq = 98.1%   R-Sq(adj) = 97.9%

The second-iteration estimate of λ_0 is

λ^(3) = ( γ̂^(2)/β̂^(2) + 1 ) λ^(2) = 0.84.

To perform the third iteration we define the new regressor w = x^0.84 and fit a straight-line model between DC output Y and w. The results are

Regression Analysis: DCoutput versus w

The regression equation is  DCoutput = ... + ... w

S = ...   R-Sq = 98.1%   R-Sq(adj) = 98.0%

Now define the second regressor wlogw and fit the model

E(Y) = α + β w + γ (wlogw).

The results are

Regression Analysis: DCoutput versus w, wlogw

The regression equation is  DCoutput = ... + ... w + ... wlogw

S = ...   R-Sq = 98.1%   R-Sq(adj) = 97.9%

The third-iteration estimate of λ_0 is

λ^(4) = ( γ̂^(3)/β̂^(3) + 1 ) λ^(3) = 0.83.

To perform the fourth iteration we define the new regressor w = x^0.83 and fit a straight-line model between DC output Y and w. The results are

Regression Analysis: DCoutput versus w

The regression equation is  DCoutput = ... + ... w

S = ...   R-Sq = 98.1%   R-Sq(adj) = 98.0%

Now define the second regressor wlogw and fit the model

E(Y) = α + β w + γ (wlogw).

The results are

Regression Analysis: DCoutput versus w, wlogw

The regression equation is  DCoutput = ... + ... w + ... wlogw

S = ...   R-Sq = 98.1%   R-Sq(adj) = 97.9%

The fourth-iteration estimate of λ_0 is

λ^(5) = ( γ̂^(4)/β̂^(4) + 1 ) λ^(4) = 0.83,

which, to two decimal places, is the same as λ^(4), the estimate from the previous iteration. The iterative procedure therefore terminates and λ_0 is taken as 0.83. Thus the relationship between E(Y) and w = x^0.83 is linear. The plot below confirms this.

[Figure: DC output plotted against w = x^0.83, showing an approximately linear relationship.]

1.5 Variance Stabilizing Transformations

1. Suppose that the variance σ² of the observations Y depends on the mean value of Y, i.e.

σ² = g(µ),  where µ = E(Y)

and g is a known function. We require a transformation Y* = T(Y) so that the variance of the transformed data Y* is constant, i.e. Var(Y*) = τ². As a first order approximation we have, through a Taylor series expansion about µ,

Y* = T(Y) ≈ T(µ) + (Y − µ) T′(µ),        (5)

where T′ denotes the first derivative. Thus

τ² = Var(Y*) = Var(Y) [T′(µ)]²,        (6)

i.e. τ² = g(µ) [T′(µ)]², or

T′(µ) = τ / [g(µ)]^{1/2}.

Hence

T(µ) ∝ ∫ dµ / [g(µ)]^{1/2}.        (7)

Examples

(a) If Y is Poisson with mean µ then σ² = µ, i.e. g(µ) = µ, and

T(y) ∝ ∫ y^{−1/2} dy ∝ √y,

i.e. if Y is Poisson the square root transformation will stabilize the variance.

(b) If nY is Binomial(n, µ) distributed then E(Y) = nµ/n = µ and

Var(Y) = (1/n²) nµ(1 − µ) = µ(1 − µ)/n = g(µ).

The variance stabilizing transformation is therefore

T(y) ∝ ∫ [y(1 − y)]^{−1/2} dy ∝ arcsin(√y).

(c) If g(µ) = µ^{2k}, for some k, then the variance stabilizing transformation is

T(y) ∝ ∫ y^{−k} dy ∝ y^{1−k},

i.e. a power transformation. The case k = 1 corresponds to the logarithmic transformation, i.e. T(y) = log y.

Warning: In examples (a) and (b), even though the transformations may turn the transformed values into Normal values, nonetheless, before rushing to take the suggested variance stabilizing transformation and regressing the transformed variable on a number of explanatory variables, consider first the possibility of using generalised linear regression models (covered in the second semester).
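A quick simulation illustrates (a) and (b). This is only an illustrative sketch; the means, the group size n and the number of replications are arbitrary choices.

import numpy as np

rng = np.random.default_rng(1)

# (a) Poisson: Var(Y) grows with the mean, while Var(sqrt(Y)) settles near 1/4
for mu in [2, 5, 10, 20, 50]:
    y = rng.poisson(mu, 100_000)
    print(mu, y.var().round(2), np.sqrt(y).var().round(3))

# (b) Binomial proportions: Var(arcsin(sqrt(Y))) settles near 1/(4n)
n = 50
for p in [0.1, 0.3, 0.5, 0.7, 0.9]:
    y = rng.binomial(n, p, 100_000) / n
    print(p, y.var().round(4), np.arcsin(np.sqrt(y)).var().round(4))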

2. Suppose that on inspecting the residual plots you find evidence that the variance of the observations depends on the size of one of the explanatory variables, say x_j, in the fitted model, i.e.

σ² = v(x_j) τ²,        (8)

where x_j is the jth explanatory variable in the fitted model. If the functional form v(.) is known, or can be guessed, then one can stabilize the variance of the observations by transforming all variables, response as well as explanatory variables, as follows:

Y*_i = Y_i / √v(x_ij),   i = 1, 2, ..., n,

and

x*_ir = x_ir / √v(x_ij),   i = 1, 2, ..., n;  r = 1, 2, ..., p.

Thus if the original model was

Y_i = β_0 + Σ_{r=1}^{p} β_r x_ir + ε_i,   i = 1, 2, ..., n,

then dividing throughout by √v(x_ij) we get

Y*_i = β_0 x*_i0 + Σ_{r=1}^{p} β_r x*_ir + ε*_i,   i = 1, 2, ..., n,

where x*_i0 = 1/√v(x_ij) and ε*_i = ε_i/√v(x_ij), so that

Var(ε*_i) = Var(ε_i) / v(x_ij) = τ².

Thus under the transformed model the observations have constant variance.
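In matrix terms this amounts to rescaling each case by 1/√v(x_ij) before fitting by ordinary least squares; it is the same as weighted least squares with weights 1/v(x_ij). A minimal sketch, assuming for illustration the guessed form v(x) = x (standard deviation growing like √x):

import numpy as np

def fit_with_stabilized_variance(y, X, xj, v=lambda x: x):
    # Divide the response and every column of the design matrix, including
    # the intercept column, by sqrt(v(x_j)), then fit by least squares.
    w = np.sqrt(v(xj))
    X0 = np.column_stack([np.ones(len(y)), X])   # prepend the intercept column
    y_star = y / w
    X_star = X0 / w[:, None]
    # The errors of the transformed model have (approximately) constant variance.
    return np.linalg.lstsq(X_star, y_star, rcond=None)[0]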

However, usually v(.) is not known. In the case when v(.) is monotonic, an approximation to it can be obtained as follows. Suppose that the plot of the residuals ê_i (or standardised residuals r_i) against the values x_ij of the jth explanatory variable x_j indicates that the variance of the observations either increases or decreases with the x_j values. (If the evidence is not compelling, or is ambiguous, it can be confirmed statistically with the following test. Order the observations/cases in the data set according to the size of the values of x_j. Remove the middle fifth of the ordered cases and keep the remaining cases as two distinct groups: the first two-fifths of the ordered cases in one group and the last two-fifths of the ordered cases in the second group. For each group of observations fit the model separately and calculate the residual mean squares RMS_1 and RMS_2 for the two groups. Clearly, if the observations are independent then so are RMS_1 and RMS_2 and, further, if the observations have constant variance σ² then both RMS_1 and RMS_2 estimate σ² and

F = RMS_1 / RMS_2 ~ F_{n_1 − k, n_2 − k},

where

n_1 = number of cases in the first group,
n_2 = number of cases in the second group,
k = number of parameters in the fitted model

(a proof of this will be provided in the Linear Models lectures). If, however, the variance of the observations is either increasing or decreasing with the values of x_j, then RMS_1/RMS_2 will be either too small or too large respectively. An F-test will therefore test the hypothesis of constant variance against the alternative of either increasing or decreasing variance.) If the test is significant [i.e. if F > F_{n_1 − k, n_2 − k; α/2} or F < F_{n_1 − k, n_2 − k; 1 − α/2}, where F_{n_1 − k, n_2 − k; γ} denotes the 100(1 − γ) percentile of the F-distribution and α is the level of significance you are working at], fit each of the five models

(i) ê_i = α_1 + α_2 x_ij + ε_i,
(ii) ê_i = α_1 + α_2 √x_ij + ε_i,
(iii) ê_i = α_1 + α_2 (1/x_ij) + ε_i,
(iv) ê_i = α_1 + α_2 (1/√x_ij) + ε_i,
(v) ê_i = α_1 + α_2 log x_ij + ε_i,

and test for the significance of α_1 and α_2 in each model. Choose the function v(.) to be the functional form in x_ij in the model, amongst the above five, for which α_1 is NOT significant and α_2 is significant. If none of the above models satisfies this condition, then choose the function v(.) to be the form on the right-hand side (less ε_i) of the model for which both α_1 and α_2 are significant.
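The grouping test described above can be coded directly. The sketch below refits the model in the lower and upper groups; the function name and the default significance level are my own choices.

import numpy as np
from scipy import stats

def grouped_variance_test(X, y, xj, alpha=0.05):
    # Order the cases by x_j, drop the middle fifth, refit the model in the
    # lower and upper two-fifths and compare residual mean squares by an F-test.
    n, k = X.shape                        # X includes the intercept column
    order = np.argsort(xj)
    m = 2 * n // 5
    rms = []
    for g in (order[:m], order[-m:]):
        beta = np.linalg.lstsq(X[g], y[g], rcond=None)[0]
        resid = y[g] - X[g] @ beta
        rms.append(resid @ resid / (len(g) - k))
    f = rms[0] / rms[1]
    lower = stats.f.ppf(alpha / 2, m - k, m - k)
    upper = stats.f.ppf(1 - alpha / 2, m - k, m - k)
    return f, not (lower <= f <= upper)   # statistic and whether it is significant

If the test is significant, one then fits the five residual models above and chooses v(.) accordingly.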

1.6 Transformations to improve Normality and stabilise the variance - the Box-Cox method

When fitting a linear model, if a probability plot and a residual plot indicate that the observations are not Normally distributed with constant variance, a transformation of the response variable may improve Normality and stabilise the variance. The Box-Cox method suggests the power transformation

Y^(λ) = (Y^λ − 1) / (λ c^{λ−1})   if λ ≠ 0,
Y^(λ) = c log Y                   if λ = 0,        (9)

where λ has to be decided by the experimenter so that the Y_i^(λ)'s are Normally distributed with constant variance σ_λ² and

E(Y_i^(λ)) = x_i^T θ_λ,   i = 1, 2, ..., n,

where x_i is the vector of covariate values associated with the ith case and θ_λ is the vector of their unknown coefficients.

To see how λ is chosen, note that if y = (y_1, y_2, ..., y_n)^T are the observed values of Y = (Y_1, Y_2, ..., Y_n)^T and y^(λ) = (y_1^(λ), y_2^(λ), ..., y_n^(λ))^T are the values of Y^(λ) = (Y_1^(λ), Y_2^(λ), ..., Y_n^(λ))^T corresponding to y, the likelihood of the values y is expressed in terms of the likelihood of the values y^(λ) as follows:

L(λ, θ_λ, σ_λ²) = f_Y(y) = f_{Y^(λ)}(y^(λ)) Π_{i=1}^{n} | ∂y_i^(λ)/∂y_i |
  = (2πσ_λ²)^{−n/2} exp( −(1/(2σ_λ²)) Σ_{i=1}^{n} (y_i^(λ) − x_i^T θ_λ)² ) Π_{i=1}^{n} (y_i/c)^{λ−1}
  = (2πσ_λ²)^{−n/2} exp( −S_λ/(2σ_λ²) ) (ẏ/c)^{n(λ−1)},

where ẏ = ( Π_{i=1}^{n} y_i )^{1/n} is the geometric mean of the y_i's and

S_λ = Σ_{i=1}^{n} (y_i^(λ) − x_i^T θ_λ)² = (y^(λ) − Xθ_λ)^T (y^(λ) − Xθ_λ)        (10)

is the sum of squares of the deviations of the model fitted to the transformed y_i^(λ)'s. Here X is the design matrix with x_i^T as its ith row. Choose the arbitrary constant c to be c = ẏ, so that the likelihood L(λ, θ_λ, σ_λ²) reduces to

L(λ, θ_λ, σ_λ²) = (2πσ_λ²)^{−n/2} exp( −S_λ/(2σ_λ²) )        (11)

and the log-likelihood to

l(λ, θ_λ, σ_λ²) = log L(λ, θ_λ, σ_λ²) = −(n/2) log σ_λ² − S_λ/(2σ_λ²) + constant.        (12)

This needs to be maximised with respect to θ_λ, σ_λ² and λ to make the observed data values y as likely as possible. Maximising first with respect to θ_λ, we see from (12) that this is equivalent to minimising S_λ = (y^(λ) − Xθ_λ)^T (y^(λ) − Xθ_λ) with respect to θ_λ. Thus the maximising value θ̂_λ is the l.s.e. of θ_λ in the linear model y^(λ) = Xθ_λ + ε, and the minimised value of S_λ is

min S_λ = (y^(λ) − Xθ̂_λ)^T (y^(λ) − Xθ̂_λ) = RSS(λ).        (13)

Following this maximisation with respect to θ_λ, the log-likelihood is now

l(λ, θ̂_λ, σ_λ²) = −(n/2) log σ_λ² − RSS(λ)/(2σ_λ²) + constant.        (14)

Now, maximising this with respect to σ_λ², we get the maximising value σ̃_λ² of σ_λ² as

σ̃_λ² = (1/n) RSS(λ) = ((n − 1)/n) σ̂_λ²,        (15)

where σ̂_λ² is the l.s.e. of σ_λ² in the linear model y^(λ) = Xθ_λ + ε. Following this maximisation with respect to σ_λ², the log-likelihood is now

l(λ, θ̂_λ, σ̃_λ²) = −(n/2) log RSS(λ) + κ = −(n/2) log σ̂_λ² + κ′,        (16)

where κ and κ′ are constants. Finally, as can be seen from (16), the maximising value λ̃ of λ minimises RSS(λ), or equivalently the l.s.e. σ̂_λ² of σ_λ², the common variance of the transformed values. In practice λ̃ is usually determined by calculating RSS(λ) (or σ̂_λ²) for different values of λ in the range −2 to 2, plotting RSS(λ) (or σ̂_λ²) against λ, and reading off the plot the point λ̃ at which the minimum is attained.

The Box-Cox procedure therefore involves three steps:

1. Transform the data y_1, y_2, ..., y_n by taking

y_i^(λ) = (y_i^λ − 1) / (λ ẏ^{λ−1})   if λ ≠ 0,
y_i^(λ) = ẏ log y_i                   if λ = 0,        (17)

for i = 1, 2, ..., n.

2. Fit the model y^(λ) = Xθ_λ + ε to the transformed data.

3. Calculate the RSS(λ) of the fitted model in (2).

Do these three steps for different values of λ in the range −2 to 2, plot RSS(λ) against λ, and read off the plot the minimising value λ̃.
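These three steps are easy to code. A minimal sketch (numpy only; the grid of λ values and the function name are my own choices, and X is assumed to contain an intercept column):

import numpy as np

def boxcox_rss(y, X, lambdas=np.linspace(-2, 2, 81)):
    # For each lambda, transform y as in (17), fit the linear model by least
    # squares and record RSS(lambda).  Requires y > 0.
    gm = np.exp(np.mean(np.log(y)))               # geometric mean, y-dot
    rss = []
    for lam in lambdas:
        if abs(lam) < 1e-12:
            y_lam = gm * np.log(y)
        else:
            y_lam = (y ** lam - 1) / (lam * gm ** (lam - 1))
        resid = y_lam - X @ np.linalg.lstsq(X, y_lam, rcond=None)[0]
        rss.append(resid @ resid)
    return lambdas, np.array(rss)

# lambdas, rss = boxcox_rss(y, X)
# lam_tilde = lambdas[np.argmin(rss)]             # the minimising value of lambda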

Confidence intervals for the appropriate power λ can be constructed as follows. Suppose we wished to test the hypothesis

H_0: appropriate λ = λ_0,

where λ_0 is a given value, against the alternative

H_1: appropriate λ ≠ λ_0.

When the null hypothesis is true, the best chance of getting the observed values y_1, y_2, ..., y_n is given by the maximised likelihood L(λ_0, θ̂_{λ_0}, σ̂²_{λ_0}), whose log-likelihood is given in (16), with θ̂_{λ_0} and σ̂²_{λ_0} the l.s.e.s respectively of θ_{λ_0} and σ²_{λ_0} in the linear model y^(λ_0) = Xθ_{λ_0} + ε. When indeed the null hypothesis H_0 is true, the log-likelihood l(λ_0, θ̂_{λ_0}, σ̂²_{λ_0}) will be close to l(λ̃, θ̂_{λ̃}, σ̂²_{λ̃}) and the difference

l(λ̃, θ̂_{λ̃}, σ̂²_{λ̃}) − l(λ_0, θ̂_{λ_0}, σ̂²_{λ_0}) = (n/2) [ log RSS(λ_0) − log RSS(λ̃) ]

will be small. Conversely, when this difference is small, that is an indication that the null hypothesis is acceptable. It can, in fact, be shown that

2 [ l(λ̃, θ̂_{λ̃}, σ̂²_{λ̃}) − l(λ_0, θ̂_{λ_0}, σ̂²_{λ_0}) ] = n [ log RSS(λ_0) − log RSS(λ̃) ]        (18)

is chi-squared distributed with 1 degree of freedom when H_0 is true. Thus one should accept the null hypothesis H_0 at the 5% level of significance if

n [ log RSS(λ_0) − log RSS(λ̃) ] ≤ χ²_{1;0.05},        (19)

or equivalently if

RSS(λ_0) ≤ RSS(λ̃) e^{χ²_{1;0.05}/n}.        (20)

But when n is large, e^{χ²_{1;0.05}/n} ≈ 1 + χ²_{1;0.05}/n. Thus one should accept the null hypothesis at the 5% level of significance if

RSS(λ_0) ≤ RSS(λ̃) ( 1 + χ²_{1;0.05}/n ).        (21)

We can therefore infer from the above that any λ_0 for which (21) is satisfied is an acceptable power, at the 5% level of significance, for a power transformation of the data. This is equivalent to saying that the set of values

C = { λ_0 : RSS(λ_0) ≤ RSS(λ̃) ( 1 + χ²_{1;0.05}/n ) }

is a 95% confidence interval

for λ. As can be seen from a Box-Cox plot, once λ̃ is identified and RSS(λ̃) is read from the vertical axis, a horizontal line at height RSS(λ̃)(1 + χ²_{1;0.05}/n) can be drawn on the plot. At the two points where this horizontal line intersects the curve of RSS(λ), two vertical lines are dropped onto the horizontal axis. The interval between these two vertical lines identifies the confidence interval C = { λ_0 : RSS(λ_0) ≤ RSS(λ̃)(1 + χ²_{1;0.05}/n) }. Clearly, if this confidence interval includes the value 1, then it would be advisable not to transform the data. On the other hand, if C does not include 1, then any value in it which is not awkward, makes the interpretation of the transformation easy, and is close to λ̃, can be used for the power transformation to Normalize the data and stabilize their variance.
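Continuing the boxcox_rss sketch above, the confidence set can also be read off numerically rather than from the plot. Note that χ²_{1;0.05} in the notes is the upper 5% point of the chi-squared distribution, i.e. its 95th percentile; the function name below is my own.

import numpy as np
from scipy import stats

def boxcox_confidence_set(lambdas, rss, n, level=0.95):
    # Approximate confidence set for lambda using the cut-off of (21):
    # all lambda_0 with RSS(lambda_0) <= RSS(lam_tilde) * (1 + chi2 / n).
    cutoff = rss.min() * (1 + stats.chi2.ppf(level, df=1) / n)
    keep = lambdas[rss <= cutoff]
    return keep.min(), keep.max()

# If the resulting interval contains 1, it is advisable not to transform the data.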

Comment: The Box-Cox transformation may not always achieve its aims. In fact this transformation is trying to achieve three things simultaneously:

1. fit the model E(Y^(λ)) = Xθ_λ;
2. stabilize the variance of the transformed data;
3. achieve Normality of the transformed response variable.

If the model E(Y^(λ)) = Xθ_λ that is fitted is too restrictive, then in trying to fit such a restrictive model the transformation may fail to achieve its latter two aims. It is therefore advisable to make the model that is fitted as flexible as possible to begin with; e.g. do not insist on a linear dependence on a given explanatory variable, but allow a quadratic term in this explanatory variable in the model. If there is more than one explanatory variable in the model, it may be helpful to allow an interaction term (to be discussed at length later on in the course) between two of the variables in your model.
