Ch. 5: Transformations and Weighting


1 Ch. 5: Transformations and Weighting

1. Variance stabilizing transformations; Box-Cox transformations - Sections 5.2, 5.4
2. Transformations to linearize the model - Section 5.3
3. Weighted regression - Section 5.5

2 Variance-Stabilizing Transformations

Model assumptions:
E[y | x] = β_0 + β_1 x
V(y | x) = σ^2

Set μ_y = E[y | x]. What if V(y | x) = σ^2 f(μ_y), where f is some non-constant function? Try to find a function g(y) so that V(g(y) | x) = constant.

3 Variance-Stabilizing Transformations (cont'd)

Then obtain a Taylor expansion of g(y) about μ_y:
g(y) = g(μ_y) + (y − μ_y) g'(μ_y) + ((y − μ_y)^2 / 2) g''(μ_y) + ...

Then
V(g(y)) ≈ V(y) (g'(μ_y))^2 = σ^2 f(μ_y) (g'(μ_y))^2

V(g(y)) will be constant if
g'(μ_y) = 1/√(f(μ_y)), i.e. g'(z) = 1/√(f(z))
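
The delta-method recipe above can be checked by simulation. A minimal Python sketch (the data here are simulated, not from the course notes): for Poisson data, f(μ) = μ, and with g(y) = √y the approximation predicts V(g(y)) ≈ μ · (1/(2√μ))^2 = 1/4, regardless of μ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Poisson: V(y) = mu, so f(mu) = mu and g'(z) = 1/sqrt(z), i.e. g(y) = sqrt(y)
# up to a constant factor.  With g(y) = sqrt(y), g'(mu) = 1/(2*sqrt(mu)), so
# the delta method predicts V(g(y)) ~ mu * 1/(4*mu) = 1/4 for every mu.
raw_var = {}
stab_var = {}
for mu in (5.0, 20.0, 80.0):
    y = rng.poisson(mu, size=200_000).astype(float)
    raw_var[mu] = y.var()            # grows roughly like mu
    stab_var[mu] = np.sqrt(y).var()  # roughly constant, near 1/4
```

The raw variances track the mean, while the variances after the square-root transformation stay near 0.25 across all three means.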

4 Examples

1. f(x) = x (e.g. Poisson data)
1/√(f(x)) = x^(−1/2), so g(y) = √y

[Figures: residuals vs fitted values for lm(yy ~ xx) and, after the square-root transformation, for lm(sqrt(yy) ~ xx)]

5 Examples (cont'd)

2. f(x) = x^2 (e.g. Exponential data)
1/√(f(x)) = 1/x, so g(y) = log(y)

[Figure: residuals vs fitted values for lm(yy ~ xx)]

6 Examples (cont'd)

3. f(x) = x(1 − x) (e.g. binomial data)
1/√(f(x)) = 1/√(x(1 − x))
Since d/dx sin^(−1)(√x) = 1/(2√(x(1 − x))), take g(y) = arcsin(√y)

7 5.4.1 Box-Cox Transformations (on response)

Select the power λ in the transformation g(y) = y^λ by maximum likelihood. Equivalent to minimizing the SSE with respect to λ (and the other parameters).

Caution: the residual sums of squares are not comparable for different values of λ. We need to ensure that comparisons are made according to the same standard:

y^(λ) = (y^λ − 1) / (λ ẏ^(λ−1)),  λ ≠ 0
      = ẏ log y,                   λ = 0

where ẏ = geometric mean of the y's

8 Strategy

1. Compute the transformed values y_1^(λ), ..., y_n^(λ) for several values of λ.
2. Compute the SSE for each value of λ.
3. Select the λ which gives the minimum SSE.
4. Fit y^(λ) = Xβ + ε.
5. Approximate confidence intervals for λ can also be obtained.
6. In R, use boxcox(y ~ x, data=dataset).
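
The strategy above can be sketched numerically. This is an illustrative Python sketch, not the MASS::boxcox implementation: it applies the geometric-mean scaling from the y^(λ) formula, scans a grid of λ values for the smallest SSE, and uses hypothetical data simulated so that the true λ is 0.

```python
import numpy as np

def boxcox_sse(y, X, lambdas):
    """SSE of the OLS fit to y^(lambda) for each candidate lambda, where
    y^(lambda) = (y**lam - 1) / (lam * gdot**(lam - 1)) for lam != 0,
               = gdot * log(y)                          for lam == 0,
    and gdot is the geometric mean of y -- the scaling that makes the
    residual sums of squares comparable across lambdas."""
    gdot = np.exp(np.mean(np.log(y)))
    out = {}
    for lam in lambdas:
        if lam == 0:
            z = gdot * np.log(y)
        else:
            z = (y**lam - 1.0) / (lam * gdot**(lam - 1.0))
        beta, *_ = np.linalg.lstsq(X, z, rcond=None)
        resid = z - X @ beta
        out[lam] = float(resid @ resid)
    return out

# Hypothetical data generated so that log(y) is linear in x, i.e. the
# true lambda is 0 (this is NOT the textbook bacteria data).
rng = np.random.default_rng(1)
x = np.linspace(1.0, 10.0, 60)
y = np.exp(2.0 - 0.3 * x + rng.normal(0.0, 0.1, x.size))
X = np.column_stack([np.ones_like(x), x])

sse = boxcox_sse(y, X, lambdas=[-1, -0.5, 0, 0.5, 1])
best = min(sse, key=sse.get)   # the lambda with the smallest scaled SSE
```

Because the data were generated with log(y) linear in x, the grid search picks λ = 0, i.e. the log transformation.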

9 Example 1

Bacteria data (Ex. 5.3): the average number of surviving bacteria (y) in a canned food product versus time (t) of exposure to 300°F heat.

10 Example 1 (cont'd)

> library(MPV)
> data(p5.3)
> bact.lm <- lm(bact ~ min, data=p5.3)
> plot(bact.lm, which=1)   # residuals vs fitted
> plot(bact.lm, which=2)   # normal Q-Q plot
> library(MASS)
> boxcox(bact.lm)          # Box-Cox log-likelihood plot
> bactlog.lm <- lm(log(bact) ~ min, data=p5.3)
> plot(bactlog.lm, which=1)   # residuals vs fitted, after log
> plot(bactlog.lm, which=2)   # normal Q-Q plot, after log

11 Residuals vs. Fitted

[Figure: residuals vs fitted values for lm(bact ~ min, data = p5.3)]

12 Q-Q Plot

[Figure: normal Q-Q plot of standardized residuals for lm(bact ~ min, data = p5.3)]

13 Box-Cox

[Figure: Box-Cox profile log-likelihood versus λ, with a 95% confidence interval for λ]

14 Residuals vs. Fitted (after log-transforming)

[Figure: residuals vs fitted values for lm(log(bact) ~ min, data = p5.3)]

15 Q-Q Plot (after log-transforming)

[Figure: normal Q-Q plot of standardized residuals for lm(log(bact) ~ min, data = p5.3)]

16 Example (cont'd)

A model of the form log(y) = β_0 + β_1 t + ε is reasonable, especially if β_1 is negative (β̂_1 = −.236).

17 Example 2

trees data: 31 observations on Girth (g), Height (h) and Volume (V).

A simple model (treating the trunk as a cylinder of circumference g):
V ≈ g^2 h / (4π)

or
log V = β_0 + β_1 log h + β_2 log g + ε

18 Example 2 (cont'd)

> library(DAAG)
> data(trees); attach(trees)
> trees.lm <- lm(log(Volume) ~ log(Girth) + log(Height))
> boxcox(trees.lm)   # lambda = 1 is OK
> summary(trees.lm)

[Coefficient table: all three terms highly significant; intercept p of order e-09, log(Height) p of order e-06, log(Girth) p < 2e-16]

19 Example 2 (cont'd) - Box-Cox after Transforming

[Figure: Box-Cox profile log-likelihood versus λ for the log-log model; λ = 1 lies inside the confidence interval]

Coefficient of log(height) is not distinguishable from 1, and coefficient of log(girth) is not distinguishable from 2.

20 5.3 Linearizing Transformations

Intrinsically linear model: the relationship between y and x is such that a simple transformation can produce a linear model.

Example: Fit the model E[y] = β_0 e^(β_1 x). Then
log E[y] = log β_0 + β_1 x
so fit log y_i = β_0' + β_1 x_i + ε_i, where β_0' = log β_0.

Note that this implies multiplicative errors, i.e.
y_i = e^(β_0' + β_1 x_i + ε_i) = β_0 e^(β_1 x_i) e^(ε_i)

If the error is additive, i.e. y_i = β_0 e^(β_1 x_i) + ε_i, then the transformation is not appropriate.
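
A quick simulated check of the multiplicative-error case (illustrative parameter values, not from the text): when the error enters multiplicatively, log y is exactly linear in x, and ordinary least squares on the log scale recovers the parameters.

```python
import numpy as np

rng = np.random.default_rng(2)
beta0, beta1 = 3.0, 0.8          # assumed true values for this sketch
x = np.linspace(0.0, 5.0, 200)

# Multiplicative error: y = beta0 * exp(beta1*x) * exp(eps),
# so log(y) = log(beta0) + beta1*x + eps is a genuine linear model.
eps = rng.normal(0.0, 0.05, x.size)
y = beta0 * np.exp(beta1 * x) * np.exp(eps)

X = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
# b[0] estimates log(beta0); b[1] estimates beta1.
# With ADDITIVE error, y = beta0*exp(beta1*x) + eps, log(y) would not be
# linear in x and this log-scale fit would be misspecified.
```

Here b[1] comes out close to 0.8 and exp(b[0]) close to 3, as the linearization predicts.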

21 Other possibilities from the text

E[y] = β_0 x^(β_1)
log E[y] = log β_0 + β_1 log x
New model: log y_i = β_0' + β_1 log x_i + ε_i

E[y] = x / (β_0 x − β_1)
1/E[y] = β_0 − β_1 (1/x)
New model: 1/y_i = β_0 − β_1 (1/x_i) + ε_i

22 Example - Windmill Data

These data concern the relation between the electrical output of a windmill and the wind velocity to which it is subjected. A decent model is:
DC output = β_0 + β_1/velocity + ε

23 Scatter Plots Before and After Transformation

[Figures: DC output vs wind velocity (untransformed) and DC output vs 1/wind velocity (transformed)]

24 Some models that are intrinsically nonlinear

Michaelis-Menten model (useful for modelling chemical reaction rates):
y = β_0 x / (β_1 + x) + ε

Mitscherlich law (useful for modelling chemical yield, etc.):
y = β_0 − β_1 γ^x + ε

Logistic growth model:
y = β_0 / (1 + β_1 e^(−kx)) + ε

25 Box-Tidwell transformation of a predictor variable

Consider the model y = β_0 + β_1 x^α + ε.
If α were known, β_0 and β_1 could be estimated by least squares. How can α be estimated?

26 Suppose we have a good guess: α_0. Taylor expand x^α about α_0:
x^α = x^(α_0) + (α − α_0) x^(α_0) log(x) + O((α − α_0)^2)

so if α_0 is close to α, we have
x^α ≈ x^(α_0) + (α − α_0) x^(α_0) log(x)

Our regression model then looks like
y ≈ β_0 + β_1 x^(α_0) + β_1 (α − α_0) x^(α_0) log(x) + ε

so consider
y ≈ β_0* + β_1* x^(α_0) + β_2* x^(α_0) log(x) + ε

where β_2* = β_1 (α − α_0). This gives the updating equation:
α = β_2*/β_1 + α_0

27 Box-Tidwell Procedure

1. Guess α: α_0
2. Fit y = β_0 + β_1 x^(α_0) + ε to obtain β̂_1
3. Fit y ≈ β_0* + β_1* x^(α_0) + β_2* x^(α_0) log(x) + ε to obtain β̂_2*
4. Update α: α_1 = β̂_2*/β̂_1 + α_0
5. Repeat the above steps to get α_2, α_3, ...
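
The steps above can be sketched in Python. This is a simplified illustration on simulated data with a known α (it is not the boxtidwell.lm function used later in the notes):

```python
import numpy as np

def box_tidwell(x, y, alpha0=1.0, n_iter=8):
    """Iteratively estimate alpha in y = b0 + b1 * x**alpha + eps.
    Each pass fits the plain model to get b1_hat, fits the linearized
    model with the extra x**a * log(x) term to get b2_hat, and applies
    the updating equation a <- b2_hat / b1_hat + a."""
    a = alpha0
    for _ in range(n_iter):
        xa = x**a
        # Step 2: fit y = b0 + b1 * x**a, keep b1_hat
        X1 = np.column_stack([np.ones_like(x), xa])
        b1_hat = np.linalg.lstsq(X1, y, rcond=None)[0][1]
        # Step 3: augmented fit with the x**a * log(x) term, keep b2_hat
        X2 = np.column_stack([np.ones_like(x), xa, xa * np.log(x)])
        b2_hat = np.linalg.lstsq(X2, y, rcond=None)[0][2]
        # Step 4: updating equation
        a = b2_hat / b1_hat + a
    return a

# Hypothetical data with true alpha = -1 (a reciprocal relationship,
# loosely in the spirit of the windmill example).
rng = np.random.default_rng(3)
x = np.linspace(2.0, 10.0, 100)
y = 3.0 - 4.0 / x + rng.normal(0.0, 0.02, x.size)
alpha_hat = box_tidwell(x, y, alpha0=-0.5)
```

Starting from α_0 = −0.5, the iterates settle near the true value −1.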

28 Box-Tidwell Procedure (cont'd)

Convergence usually occurs in about three iterations. There are instances where this procedure may not converge at all.

Note that the textbook implementation of the Box-Tidwell procedure is incorrect.

29 Example

Windmill generation of electricity. DC output is measured against wind velocity (v).

[Table of wind velocity (v) and DC output values]

30 Windmill Example (cont'd)

The scatterplot (windmill.pdf) indicates the need for a transformation. We saw earlier the usefulness of the reciprocal transformation of the velocity, 1/v:
y = β_0 + β_1 (1/v) + ε

Does the Box-Tidwell procedure agree?

31 Box-Tidwell

Initial guess: α_0 = 1

> boxtidwell.lm(dc ~ v, data=wind)
[Table: initial guess and iterates alpha_1, alpha_2, alpha_3, alpha_4, converging to −0.833]

y = β_0 + β_1 (1/v^0.833) + ε

> wind.lm <- lm(dc ~ I(v^(-0.833)), data=wind)
> summary(wind.lm)
[Coefficient table: both the intercept and I(v^(-0.833)) have Pr(>|t|) < 2e-16]

Fitted model: ŷ = β̂_0 + β̂_1 (1/v^0.833)

32 Windmill Example (cont'd)

[Figure: windmill data, DC output vs wind velocity, with transformed LS fits overlaid; red curve: reciprocal of v; black curve: v^(−0.833)]

33 Windmill Example (cont'd)

[Figures: normal Q-Q plot of standardized residuals, together with several simulated Q-Q plots for comparison]

These plots indicate that this model fits fairly well.

34 Exercises on Box-Cox and Box-Tidwell

Analyse the data in p5.4. Do you need to transform the response or the predictor? Check all diagnostics before and after transforming. Also, obtain a plot of the data with the fitted curve overlaid.

Analyze the data in p5.2; check the Box-Tidwell transformation: is it consistent with the theory described in Exercise 5.2 of the textbook?

Analyze the data in p5.3.

Analyze the data in p

35 5.5.2 Weighted Least Squares

Consider the regression-through-the-origin model y_i = β_1 x_i + ε_i with E[ε_i] = 0, and suppose V(y_i | x_i) = σ^2/w_i, where w_i is a known weight; i.e. E[ε_i^2] = σ^2/w_i.

The least squares estimate was previously found by minimizing Σ_{i=1}^n ε_i^2:
β̂_1 = Σ x_i y_i / Σ x_i^2

Gauss-Markov Theorem: when the variances are constant, β̂_1 has the smallest variance of any linear unbiased estimator of β_1.

36 Weighted Least Squares (cont'd)

β̂_1 is not the best linear unbiased estimator for β_1 when there are weights w_i. To find the BLUE now, multiply the model by a_i:
a_i y_i = β_1 a_i x_i + a_i ε_i
or
y_i* = β_1 x_i* + ε_i*

Compute β̃_1 for the new data (x_i*, y_i*):
β̃_1 = Σ x_i* y_i* / Σ (x_i*)^2

E[β̃_1] = β_1 (unbiased)

V(β̃_1) = σ^2 (Σ a_i^4 x_i^2 / w_i) / (Σ a_i^2 x_i^2)^2

37 Weighted Least Squares (cont'd)

How do we choose a_1, a_2, ..., a_n to make this as small as possible?

Recall the Cauchy-Schwarz inequality:
(Σ_{i=1}^n u_i v_i)^2 ≤ (Σ_{j=1}^n u_j^2)(Σ_{k=1}^n v_k^2)
(equality holds if the u_i's are proportional to the v_i's: u_i = c v_i)

Look at the denominator of our variance:
(Σ_{i=1}^n a_i^2 x_i^2)^2 ≤ (Σ_{i=1}^n a_i^4 x_i^2 / w_i)(Σ_{i=1}^n w_i x_i^2)
(equality holds when a_i^4 x_i^2 / w_i is proportional to w_i x_i^2, e.g. a_i = √w_i)

38 Weighted Least Squares (cont'd)

Thus V(β̃_1) is minimized if a_i = √w_i:
V(β̃_1) = σ^2 / Σ_{i=1}^n w_i x_i^2

Note also that E[√w_i ε_i] = 0 and V(√w_i ε_i) = σ^2, and that instead of minimizing
Σ_{i=1}^n ε_i^2          (Ordinary Least Squares)
we are now minimizing
Σ_{i=1}^n w_i ε_i^2      (Weighted Least Squares)
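
A small simulation illustrates the comparison (hypothetical data, with w_i = 1/x_i^2 so that the response standard deviation grows with x): both estimators are unbiased, but the weighted one has the smaller variance, attaining σ^2 / Σ w_i x_i^2.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(1.0, 10.0, 40)
w = 1.0 / x**2          # V(y_i) = sigma^2 / w_i = sigma^2 * x_i^2, sigma = 1
beta1 = 2.0

ols_est, wls_est = [], []
for _ in range(2000):
    y = beta1 * x + rng.normal(0.0, 1.0, x.size) * x      # sd grows with x
    ols_est.append(np.sum(x * y) / np.sum(x**2))          # minimizes sum e_i^2
    wls_est.append(np.sum(w * x * y) / np.sum(w * x**2))  # minimizes sum w_i e_i^2

var_ols = float(np.var(ols_est))
var_wls = float(np.var(wls_est))   # near 1 / sum(w_i * x_i^2)
```

Over the replications, both estimators average out to β_1 = 2, and var_wls sits near the theoretical minimum 1/Σ w_i x_i^2 while var_ols is noticeably larger.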

39 Example - roller data

Ordinary Least Squares:
roller.lm <- lm(depression ~ weight, data=roller)
plot(roller.lm, which=1)

40 Example (cont'd)

[Figure: residuals vs fitted values for lm(depression ~ weight, data = roller)]

The residual plot indicates that the variance might not be constant.

41 Weighted Least Squares

roller.wlm <- lm(depression ~ weight, data=roller, weights=1/weight^2)
plot(roller.wlm, which=1)

[Figure: residuals vs fitted values for the weighted fit; a more random pattern]

42 Weighted Least Squares

Comparing the fitted lines:

[Figure: roller data, depression vs weight, with the OLS and WLS fitted lines overlaid]

43 Generalized Least Squares

Model: y = Xβ + ε, with E[ε] = 0 and E[ε εᵀ] = Σ = σ^2 V.

Σ must be symmetric and positive definite. This implies, among other things, that Σ possesses an inverse, and that we can write V = K^2 for some symmetric nonsingular K.

Weighted Least Squares is a special case where Σ is a diagonal matrix with ii element σ^2/w_i.

44 Generalized Least Squares (cont'd)

Consider
K^(−1) y = K^(−1) X β + K^(−1) ε

Note
Var(K^(−1) ε) = E[K^(−1) ε εᵀ K^(−1)] = K^(−1) σ^2 V K^(−1) = σ^2 I

By multiplying through by K^(−1) we now have a constant variance, so β can be estimated by least squares:
β̂ = (Xᵀ K^(−2) X)^(−1) Xᵀ K^(−2) y

i.e.
β̂ = (Xᵀ V^(−1) X)^(−1) Xᵀ V^(−1) y

is the generalized least-squares estimator for β.
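
In matrix form, the equivalence between the GLS formula and OLS on the K^(−1)-transformed data can be verified directly. A NumPy sketch, with an assumed AR(1)-style V chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 60
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])

# A known positive definite V (AR(1)-style correlation, an assumption for
# this sketch), and its symmetric square root K, so that K @ K = V.
rho = 0.6
V = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
evals, evecs = np.linalg.eigh(V)
K = evecs @ np.diag(np.sqrt(evals)) @ evecs.T

beta = np.array([1.0, 0.5])
y = X @ beta + K @ rng.normal(0.0, 1.0, n)   # errors with covariance V

# GLS in closed form: (X' V^{-1} X)^{-1} X' V^{-1} y
Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)

# Equivalently: OLS on the transformed system K^{-1} y = K^{-1} X beta + K^{-1} eps
Kinv = np.linalg.inv(K)
beta_trans, *_ = np.linalg.lstsq(Kinv @ X, Kinv @ y, rcond=None)
```

The two estimates agree to numerical precision, since K symmetric with K^2 = V gives K^(−1) K^(−1) = V^(−1).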

45 Generalized Least Squares (cont'd)

Unbiased: E[β̂] = β

Variance:
Var(β̂) = (Xᵀ V^(−1) X)^(−1) Xᵀ V^(−1) Σ V^(−1) X (Xᵀ V^(−1) X)^(−1) = σ^2 (Xᵀ V^(−1) X)^(−1)


More information

Nonlinear Models. What do you do when you don t have a line? What do you do when you don t have a line? A Quadratic Adventure

Nonlinear Models. What do you do when you don t have a line? What do you do when you don t have a line? A Quadratic Adventure What do you do when you don t have a line? Nonlinear Models Spores 0e+00 2e+06 4e+06 6e+06 8e+06 30 40 50 60 70 longevity What do you do when you don t have a line? A Quadratic Adventure 1. If nonlinear

More information

Introduction to Estimation Methods for Time Series models. Lecture 1

Introduction to Estimation Methods for Time Series models. Lecture 1 Introduction to Estimation Methods for Time Series models Lecture 1 Fulvio Corsi SNS Pisa Fulvio Corsi Introduction to Estimation () Methods for Time Series models Lecture 1 SNS Pisa 1 / 19 Estimation

More information

Simple Linear Regression. Material from Devore s book (Ed 8), and Cengagebrain.com

Simple Linear Regression. Material from Devore s book (Ed 8), and Cengagebrain.com 12 Simple Linear Regression Material from Devore s book (Ed 8), and Cengagebrain.com The Simple Linear Regression Model The simplest deterministic mathematical relationship between two variables x and

More information

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari MS&E 226: Small Data Lecture 11: Maximum likelihood (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 18 The likelihood function 2 / 18 Estimating the parameter This lecture develops the methodology behind

More information

STK4900/ Lecture 5. Program

STK4900/ Lecture 5. Program STK4900/9900 - Lecture 5 Program 1. Checking model assumptions Linearity Equal variances Normality Influential observations Importance of model assumptions 2. Selection of predictors Forward and backward

More information

AMS 315/576 Lecture Notes. Chapter 11. Simple Linear Regression

AMS 315/576 Lecture Notes. Chapter 11. Simple Linear Regression AMS 315/576 Lecture Notes Chapter 11. Simple Linear Regression 11.1 Motivation A restaurant opening on a reservations-only basis would like to use the number of advance reservations x to predict the number

More information

1. Simple Linear Regression

1. Simple Linear Regression 1. Simple Linear Regression Suppose that we are interested in the average height of male undergrads at UF. We put each male student s name (population) in a hat and randomly select 100 (sample). Then their

More information

Lecture 2. The Simple Linear Regression Model: Matrix Approach

Lecture 2. The Simple Linear Regression Model: Matrix Approach Lecture 2 The Simple Linear Regression Model: Matrix Approach Matrix algebra Matrix representation of simple linear regression model 1 Vectors and Matrices Where it is necessary to consider a distribution

More information

Chapter 1. Linear Regression with One Predictor Variable

Chapter 1. Linear Regression with One Predictor Variable Chapter 1. Linear Regression with One Predictor Variable 1.1 Statistical Relation Between Two Variables To motivate statistical relationships, let us consider a mathematical relation between two mathematical

More information

Lecture 1 Intro to Spatial and Temporal Data

Lecture 1 Intro to Spatial and Temporal Data Lecture 1 Intro to Spatial and Temporal Data Dennis Sun Stanford University Stats 253 June 22, 2015 1 What is Spatial and Temporal Data? 2 Trend Modeling 3 Omitted Variables 4 Overview of this Class 1

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

Lecture 16 Solving GLMs via IRWLS

Lecture 16 Solving GLMs via IRWLS Lecture 16 Solving GLMs via IRWLS 09 November 2015 Taylor B. Arnold Yale Statistics STAT 312/612 Notes problem set 5 posted; due next class problem set 6, November 18th Goals for today fixed PCA example

More information

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression BSTT523: Kutner et al., Chapter 1 1 Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression Introduction: Functional relation between

More information

Fitting a regression model

Fitting a regression model Fitting a regression model We wish to fit a simple linear regression model: y = β 0 + β 1 x + ɛ. Fitting a model means obtaining estimators for the unknown population parameters β 0 and β 1 (and also for

More information

Tutorial 6: Linear Regression

Tutorial 6: Linear Regression Tutorial 6: Linear Regression Rob Nicholls nicholls@mrc-lmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction to Simple Linear Regression................ 1 2 Parameter Estimation and Model

More information

Lecture Notes 15 Prediction Chapters 13, 22, 20.4.

Lecture Notes 15 Prediction Chapters 13, 22, 20.4. Lecture Notes 15 Prediction Chapters 13, 22, 20.4. 1 Introduction Prediction is covered in detail in 36-707, 36-701, 36-715, 10/36-702. Here, we will just give an introduction. We observe training data

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from

More information

Summer School in Statistics for Astronomers V June 1 - June 6, Regression. Mosuk Chow Statistics Department Penn State University.

Summer School in Statistics for Astronomers V June 1 - June 6, Regression. Mosuk Chow Statistics Department Penn State University. Summer School in Statistics for Astronomers V June 1 - June 6, 2009 Regression Mosuk Chow Statistics Department Penn State University. Adapted from notes prepared by RL Karandikar Mean and variance Recall

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science

UNIVERSITY OF TORONTO Faculty of Arts and Science UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator

More information

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Motivation: Why Applied Statistics?

More information

Dealing with Heteroskedasticity

Dealing with Heteroskedasticity Dealing with Heteroskedasticity James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Dealing with Heteroskedasticity 1 / 27 Dealing

More information

Steps in Regression Analysis

Steps in Regression Analysis MGMG 522 : Session #2 Learning to Use Regression Analysis & The Classical Model (Ch. 3 & 4) 2-1 Steps in Regression Analysis 1. Review the literature and develop the theoretical model 2. Specify the model:

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression Reading: Hoff Chapter 9 November 4, 2009 Problem Data: Observe pairs (Y i,x i ),i = 1,... n Response or dependent variable Y Predictor or independent variable X GOALS: Exploring

More information

Formal Statement of Simple Linear Regression Model

Formal Statement of Simple Linear Regression Model Formal Statement of Simple Linear Regression Model Y i = β 0 + β 1 X i + ɛ i Y i value of the response variable in the i th trial β 0 and β 1 are parameters X i is a known constant, the value of the predictor

More information

Chapter 1 Linear Regression with One Predictor

Chapter 1 Linear Regression with One Predictor STAT 525 FALL 2018 Chapter 1 Linear Regression with One Predictor Professor Min Zhang Goals of Regression Analysis Serve three purposes Describes an association between X and Y In some applications, the

More information

Applied Regression. Applied Regression. Chapter 2 Simple Linear Regression. Hongcheng Li. April, 6, 2013

Applied Regression. Applied Regression. Chapter 2 Simple Linear Regression. Hongcheng Li. April, 6, 2013 Applied Regression Chapter 2 Simple Linear Regression Hongcheng Li April, 6, 2013 Outline 1 Introduction of simple linear regression 2 Scatter plot 3 Simple linear regression model 4 Test of Hypothesis

More information

IES 612/STA 4-573/STA Winter 2008 Week 1--IES 612-STA STA doc

IES 612/STA 4-573/STA Winter 2008 Week 1--IES 612-STA STA doc IES 612/STA 4-573/STA 4-576 Winter 2008 Week 1--IES 612-STA 4-573-STA 4-576.doc Review Notes: [OL] = Ott & Longnecker Statistical Methods and Data Analysis, 5 th edition. [Handouts based on notes prepared

More information

Chapter 13 Introduction to Nonlinear Regression( 非線性迴歸 )

Chapter 13 Introduction to Nonlinear Regression( 非線性迴歸 ) Chapter 13 Introduction to Nonlinear Regression( 非線性迴歸 ) and Neural Networks( 類神經網路 ) 許湘伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples of nonlinear

More information

STK4900/ Lecture 3. Program

STK4900/ Lecture 3. Program STK4900/9900 - Lecture 3 Program 1. Multiple regression: Data structure and basic questions 2. The multiple linear regression model 3. Categorical predictors 4. Planned experiments and observational studies

More information