Chapter 11 Building the Regression Model II: Remedial Measures
Chapter 11 Building the Regression Model II: Remedial Measures
許湘伶
Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li)
hsuhl (NUK) LR Chap 11 1 / 48
11.1 WLS
Remedial measures may need to be taken when:
- a regression model is not appropriate
- several cases are very influential
Previous remedies: transformations to linearize the regression relation, to make the error distributions more nearly normal, and to make the variances of the error terms more nearly equal.
In this chapter, remedial measures to deal with:
- unequal error variances
- a high degree of multicollinearity
- influential observations
- two methods for nonparametric regression: lowess and regression trees
- bootstrapping: for evaluating the precision of complex estimators
11.1 Unequal Error Variances Remedial Measures — Weighted Least Squares
Chap. 3, 6: transformation of Y can reduce or eliminate unequal variances; difficulty: it may create an inappropriate regression relationship.
Weighted least squares (WLS): used when an appropriate regression relationship has been found but the variances of the error terms are unequal.
Weighted Least Squares (cont.)
Model:
Y_i = β_0 + β_1 X_{i1} + ⋯ + β_{p-1} X_{i,p-1} + ε_i,  i = 1, …, n
- parameters: β_0, …, β_{p-1}
- known constants: X_{i1}, …, X_{i,p-1}
- ε_i indep. N(0, σ_i^2)
σ^2{ε}_{n×n} = diag(σ_1^2, σ_2^2, …, σ_n^2)
b = (X'X)^{-1} X'Y is unbiased and consistent but no longer has minimum variance.
Weighted Least Squares (cont.)
Consider: observations with small variances provide more reliable information about the regression function than those with large variances.
- Error variances known: w_i = 1/σ_i^2 (unrealistic)
- Error variances known up to a proportionality constant: w_i = k (1/σ_i^2)
- Error variances unknown: estimate the variance function or the standard deviation function; w_i = 1/(ŝ_i)^2 or w_i = 1/ν̂_i
WLS — error variances known
Maximum likelihood is used to obtain the estimators. With indep. ε_i ~ N(0, σ_i^2) and w_i = 1/σ_i^2:
L(β) = ∏_{i=1}^n (2πσ_i^2)^{-1/2} exp[ -(1/(2σ_i^2)) (Y_i - β_0 - β_1 X_{i1} - ⋯ - β_{p-1} X_{i,p-1})^2 ]
     = ∏_{i=1}^n (w_i/2π)^{1/2} exp[ -(w_i/2) (Y_i - β_0 - β_1 X_{i1} - ⋯ - β_{p-1} X_{i,p-1})^2 ]
b = arg max_β L(β) = arg min_β Q_w, where
Q_w = Σ_{i=1}^n w_i (Y_i - β_0 - β_1 X_{i1} - ⋯ - β_{p-1} X_{i,p-1})^2
Minimizing Q_w is called the weighted least squares criterion; with w_i ≡ 1 it reduces to OLS.
WLS — error variances known (cont.)
w_i = 1/σ_i^2 reflects the amount of information contained in observation Y_i: the larger σ_i^2, the smaller w_i.
W = diag(w_1, w_2, …, w_n)
The weighted normal equations: (X'WX) b_w = X'WY
The weighted least squares (and ML) estimator of β:
b_w = (X'WX)^{-1} X'WY
σ^2{b_w}_{p×p} = (X'WX)^{-1}   (since σ^2{Y} = W^{-1})
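The estimator above can be sketched in a few lines of NumPy. This is an illustrative sketch, not code from the text; the function name `wls` and all variable names are my own.

```python
import numpy as np

def wls(X, y, w):
    """Solve the weighted normal equations (X'WX) b_w = X'WY for known weights w_i."""
    W = np.diag(w)
    XtWX = X.T @ W @ X
    XtWy = X.T @ W @ y
    b_w = np.linalg.solve(XtWX, XtWy)
    # sigma^2{b_w} = (X'WX)^{-1} when the w_i are the true reciprocal variances
    cov_bw = np.linalg.inv(XtWX)
    return b_w, cov_bw
```

Equivalently, b_w is the OLS fit after scaling each row of X and y by sqrt(w_i), which is a convenient way to check the computation.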
WLS — error variances known (cont.)
b_w is unbiased, consistent, and has minimum variance among unbiased linear estimators.
When the weights are known, b_w generally exhibits less variability than b.
WLS — error variances known up to a proportionality constant
w_i = k (1/σ_i^2), with k unknown:
b_w = (X'WX)^{-1} X'WY   (unaffected by k)
σ^2{b_w}_{p×p} = k (X'WX)^{-1}
s^2{b_w}_{p×p} = MSE_w (X'WX)^{-1}, where
MSE_w = Σ w_i (Y_i - Ŷ_i)^2 / (n - p) = Σ w_i e_i^2 / (n - p)
MSE_w is an estimator of the proportionality constant k.
WLS — error variances unknown
Estimation of the variance function or standard deviation function: one rarely has knowledge of the variances σ_i^2, so they must be estimated.
σ_i^2 = E{ε_i^2} - (E{ε_i})^2 = E{ε_i^2}
- Estimator of σ_i^2: the squared residual e_i^2
- Estimator of σ_i = sqrt(σ_i^2): the absolute residual |e_i|
WLS — error variances unknown (cont.)
The estimation process:
1 Fit the regression model by OLS and analyze the residuals e_i.
2 Estimate the variance function or the standard deviation function by regressing e_i^2 or |e_i| on the appropriate predictor(s).
3 Use the fitted values ŝ_i (standard deviation function) or ν̂_i (variance function) to obtain the weights w_i.
4 Obtain b_w using the weights w_i.
(Sometimes the process is iterated several times until the estimates stabilize — iteratively reweighted least squares.)
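The four steps above can be sketched as follows. This is a minimal NumPy sketch assuming the standard deviation function is linear in the predictors; the function name `irls_variance` and the clipping guard are my own additions, not from the text.

```python
import numpy as np

def irls_variance(X, y, n_iter=5):
    """Iteratively reweighted least squares with weights estimated from the data:
    regress |e_i| on the predictors to get fitted standard deviations s_hat_i,
    set w_i = 1/s_hat_i^2, refit by WLS, and repeat until stable."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]          # step 1: OLS fit
    w = np.ones(len(y))
    for _ in range(n_iter):
        e = y - X @ b
        # step 2: standard deviation function: regress |e| on X
        g = np.linalg.lstsq(X, np.abs(e), rcond=None)[0]
        s_hat = np.clip(X @ g, 1e-8, None)            # guard against non-positive fits
        w = 1.0 / s_hat**2                            # step 3: weights
        sw = np.sqrt(w)
        b = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]  # step 4: WLS
    return b, w
```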
WLS — error variances unknown (cont.)
Reference: figure from Faraway's Linear Models with R (2005, p. 59)
WLS — error variances unknown (cont.)
Some possible variance and standard deviation functions:
1 Plot of |e_i| vs. X_1 shows a megaphone shape → regress |e_i| on X_1.
2 Plot of |e_i| vs. Ŷ shows a megaphone shape → regress |e_i| on Ŷ.
3 Plot of e_i^2 vs. X_3 shows an upward tendency → regress e_i^2 on X_3.
4 Plot of |e_i| vs. X_2 shows |e_i| increasing rapidly with X_2 up to a point and then increasing more slowly → regress |e_i| on X_2 and X_2^2.
w_i = 1/(ŝ_i)^2: ŝ_i is the fitted value from the standard deviation function.
w_i = 1/ν̂_i: ν̂_i is the fitted value from the variance function.
Form the weight matrix W and obtain b_w = (X'WX)^{-1} X'WY.
WLS — error variances unknown (cont.)
Use of Replicates or Near Replicates
- Replicate observations are made at each combination of levels of the X variables.
- If the number of replications (or near replications) is large, w_i may be obtained directly from the sample variances of the Y observations at each combination of levels of the X variables.
- Each case in a replicate group receives the same weight with this method.
Confidence interval (similar to the OLS case but approximate):
b_k ± t(1 - α/2; n - p) s{b_k}   (6.50)
WLS — error variances unknown (cont.)
Use of Ordinary Least Squares with Unequal Error Variances:
b = (X'X)^{-1} X'Y
σ^2{b} = (X'X)^{-1} (X' σ^2{ε} X) (X'X)^{-1}
S^2{b} = (X'X)^{-1} (X' S_0 X) (X'X)^{-1}, where S_0 = diag{e_1^2, e_2^2, …, e_n^2}
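The sandwich formula above can be computed directly. A minimal sketch, assuming a full-rank design matrix; the helper name `hc_cov` is mine, and the diagonal multiply avoids forming the n×n matrix S_0 explicitly.

```python
import numpy as np

def hc_cov(X, y):
    """OLS fit plus the heteroscedasticity-consistent covariance
    S^2{b} = (X'X)^{-1} (X' S_0 X) (X'X)^{-1} with S_0 = diag(e_i^2)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    meat = X.T @ (X * (e**2)[:, None])   # X' diag(e^2) X without building diag
    return b, XtX_inv @ meat @ XtX_inv
```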
Blood pressure example
54 subjects: women
Blood pressure example (cont.)
Linear regression function fitted by unweighted least squares:
Ŷ = … + … X   (estimated standard errors: 3.994, …)
Blood pressure example (cont.)
Regressing |e_i| on X: ŝ = … + … X;  e.g., ŝ_1 = …, so w_1 = 1/(ŝ_1)^2 = …
WLS: Ŷ = … + … X
C.I. for β_1, α = 0.05: b_w1 ± t(0.975; 52) s{b_w1}, with t(0.975; 52) = 2.007
Blood pressure example (cont.)
Unequal variances: heteroscedasticity; equal variances: homoscedasticity.
R^2 does not have a clear-cut meaning for WLS.
WLS may be viewed as OLS on transformed variables. Starting from
Y = Xβ + ε,  E{ε} = 0,  σ^2{ε} = W^{-1},
premultiply by W^{1/2}:
W^{1/2} Y = W^{1/2} X β + W^{1/2} ε,  i.e.  Y_w = X_w β + ε_w
Then E{ε_w} = 0;  σ^2{ε_w} = I;  and
b_w = (X_w' X_w)^{-1} X_w' Y_w = (X'WX)^{-1} X'WY
Blood pressure example (cont.)
The weighted least squares normal equations:
Σ w_i Y_i = b_w0 Σ w_i + b_w1 Σ w_i X_i
Σ w_i X_i Y_i = b_w0 Σ w_i X_i + b_w1 Σ w_i X_i^2
The weighted least squares estimators b_w0 and b_w1 in (11.9) are:
b_w1 = [ Σ w_i X_i Y_i - (Σ w_i X_i)(Σ w_i Y_i) / Σ w_i ] / [ Σ w_i X_i^2 - (Σ w_i X_i)^2 / Σ w_i ]
b_w0 = ( Σ w_i Y_i - b_w1 Σ w_i X_i ) / Σ w_i
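The closed-form solution of these normal equations is easy to verify numerically. A sketch with an illustrative helper name (`wls_simple`), computing the weighted sums directly:

```python
import numpy as np

def wls_simple(x, y, w):
    """Simple-regression WLS estimates b_w0, b_w1 from the weighted sums."""
    swx, swy, sw = (w * x).sum(), (w * y).sum(), w.sum()
    swxy = (w * x * y).sum()
    swxx = (w * x * x).sum()
    b1 = (swxy - swx * swy / sw) / (swxx - swx**2 / sw)
    b0 = (swy - b1 * swx) / sw
    return b0, b1
```

The result agrees with the matrix form b_w = (X'WX)^{-1} X'WY applied to a two-column design matrix.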
11.2 Ridge Regression
Ridge regression: a method of overcoming a serious multicollinearity problem by modifying the method of least squares, allowing biased estimators of the regression coefficients.
When an estimator has only a small bias and is substantially more precise than an unbiased estimator, it may well be the preferred estimator, since it will have a larger probability of being close to the true parameter value.
Ridge Regression (cont.)
- Estimator b is unbiased but imprecise.
- Estimator b_R is much more precise but has a small bias.
- The probability that b_R falls near β is much greater than that for b.
Ridge Regression (cont.)
A measure of the combined effect of bias and sampling variation is the mean squared error (MSE):
E{(b_R - β)^2} = σ^2{b_R} + (E{b_R} - β)^2
Correlation-transformed variables:
Y_i* = (1/√(n-1)) (Y_i - Ȳ)/s_Y,   X_ik* = (1/√(n-1)) (X_ik - X̄_k)/s_k
Y_i* = β_1* X_i1* + β_2* X_i2* + ⋯ + β_{p-1}* X_{i,p-1}* + ε_i*
Normal equations: r_XX b = r_YX
Ridge Regression (cont.)
The ridge standardized regression estimators are obtained from:
(r_XX + cI) b_R = r_YX
b_R = (r_XX + cI)^{-1} r_YX
Biasing constant c ≥ 0 reflects the amount of bias:
c = 0 → OLS; c > 0 → biased but more stable (less variable) than the OLS estimators.
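The ridge estimator on correlation-transformed variables can be sketched directly from these formulas. An illustrative sketch (the function name `ridge_standardized` is mine); it assumes predictors with nonzero sample standard deviation.

```python
import numpy as np

def ridge_standardized(X, y, c):
    """Ridge estimates b_R = (r_XX + cI)^{-1} r_YX on correlation-transformed data."""
    n = len(y)
    # correlation transformation: each column then has unit sum of squares
    Xs = (X - X.mean(0)) / (X.std(0, ddof=1) * np.sqrt(n - 1))
    ys = (y - y.mean()) / (y.std(ddof=1) * np.sqrt(n - 1))
    r_XX = Xs.T @ Xs          # correlation matrix of the X variables
    r_YX = Xs.T @ ys          # correlations of Y with each X
    return np.linalg.solve(r_XX + c * np.eye(X.shape[1]), r_YX)
```

With c = 0 this reproduces the OLS standardized coefficients; increasing c shrinks the coefficient vector, which is the behavior the ridge trace visualizes.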
Ridge Regression (cont.)
Choice of the biasing constant c:
- As c increases, the bias component (E{b_R} - β)^2 of E{(b_R - β)^2} increases while the variance component σ^2{b_R} becomes smaller.
- There always exists a c such that b_R has a smaller total MSE than b.
- Difficulty: the optimum value of c varies from one application to another and is unknown.
- Use the ridge trace and the (VIF)_k(c) values.
Ridge Regression (cont.)
Ridge trace: a simultaneous plot of the values of the (p - 1) estimates b_k^R for different values of c, 0 ≤ c ≤ 1.
Choose c where the ridge trace starts to become stable and the VIF values have become sufficiently small.
Ridge Regression (cont.)
The normal equations for the ridge estimators:
(1 + c) b_1^R + r_12 b_2^R + ⋯ + r_{1,p-1} b_{p-1}^R = r_Y1
r_21 b_1^R + (1 + c) b_2^R + ⋯ + r_{2,p-1} b_{p-1}^R = r_Y2
⋮
r_{p-1,1} b_1^R + r_{p-1,2} b_2^R + ⋯ + (1 + c) b_{p-1}^R = r_{Y,p-1}
VIF values for b_k^R are defined analogously to those for the OLS b_k: they measure how large the variance of b_k^R is relative to what the variance would be if the X variables were uncorrelated.
Ridge Regression (cont.)
VIF values for b_k^R: the diagonal elements of
(r_XX + cI)^{-1} r_XX (r_XX + cI)^{-1}
Ridge regression estimates can also be obtained by the method of penalized least squares:
Q = Σ_{i=1}^n [ Y_i* - (β_1* X_i1* + ⋯ + β_{p-1}* X_{i,p-1}*) ]^2 + c Σ_{j=1}^{p-1} (β_j*)^2
Ridge estimators are sometimes referred to as shrinkage estimators.
Limitation of ridge regression: ordinary inference procedures are not applicable and exact distributional properties are not known (bootstrapping can be used).
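The VIF formula above is a one-line matrix computation. A minimal sketch (the helper name `ridge_vif` is mine); at c = 0 it reduces to the ordinary VIFs, the diagonal of r_XX^{-1}, and the values decrease as c grows.

```python
import numpy as np

def ridge_vif(r_XX, c):
    """VIF values of the ridge estimators: diag of (r_XX + cI)^{-1} r_XX (r_XX + cI)^{-1}."""
    p = r_XX.shape[0]
    A = np.linalg.inv(r_XX + c * np.eye(p))
    return np.diag(A @ r_XX @ A)
```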
Ridge Regression (cont.)
(a) Table 11.2  (b) Table
Ridge Regression (cont.)
With c = 0.02:
Ŷ = … X_1 + … X_2 + … X_3
Ridge Regression (cont.)
LASSO: least absolute shrinkage and selection operator
β̂ = arg min_β Σ_{i=1}^n ( Y_i - Σ_{j=0}^{p-1} X_ij β_j )^2 + λ Σ_{j=0}^{p-1} |β_j|
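The lasso criterion can be minimized by coordinate descent with soft-thresholding. This is a generic textbook sketch, not the book's algorithm; the names `soft` and `lasso_cd` are mine, and the intercept is handled by centering X and y rather than by including a j = 0 column.

```python
import numpy as np

def soft(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for sum_i (y_i - x_i'b)^2 + lam * sum_j |b_j|."""
    n, p = X.shape
    b = np.zeros(p)
    col_ss = (X**2).sum(0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]           # partial residual excluding x_j
            b[j] = soft(X[:, j] @ r, lam / 2) / col_ss[j]
    return b
```

With lam = 0 this reduces to OLS; for large lam every coefficient is thresholded to exactly zero, which is the selection behavior that distinguishes the lasso from ridge.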
11.3 Robust Regression
Examine whether an outlying case is the result of a recording error, breakdown of a measurement instrument, or the like.
- Erroneous data that can be corrected → correct them.
- Erroneous data that cannot be corrected → discard them.
- Many times it is impossible to tell for certain whether the observations for an outlying case are erroneous → do not discard them.
- If an outlying influential case is not clearly erroneous → examine the adequacy of the model.
Robust Regression (cont.)
Numerous robust regression procedures have been developed.
LAR or LAD regression: least absolute residuals or least absolute deviations (minimum L_1-norm regression).
Criterion:
min_β L_1 = min_β Σ_{i=1}^n | Y_i - (β_0 + β_1 X_i1 + ⋯ + β_{p-1} X_{i,p-1}) |
- LAR places less emphasis on outlying observations than does the method of least squares.
- Linear programming methods can be used to obtain the LAR estimators.
- With the LAR method, the residuals will not sum to zero.
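Instead of linear programming, the L_1 criterion can be approximated by iteratively reweighted least squares with weights w_i = 1/|e_i|, so that the weighted SSE approximates the sum of absolute residuals. This is a common alternative technique, not the method named in the text; the function name `lad_irls` and the `delta` floor are mine.

```python
import numpy as np

def lad_irls(X, y, n_iter=100, delta=1e-6):
    """Approximate least-absolute-residuals fit via IRLS with w_i = 1/max(|e_i|, delta)."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]      # start from OLS
    for _ in range(n_iter):
        e = y - X @ b
        w = 1.0 / np.maximum(np.abs(e), delta)    # small residual -> large weight
        sw = np.sqrt(w)
        b = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return b
```

On data with a single gross outlier, the LAD-style fit stays with the bulk of the observations while the OLS slope is pulled toward the outlier, illustrating the reduced emphasis on outlying cases.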
Robust Regression (cont.)
LMS regression: least median of squares
min_β median{ [ Y_i - (β_0 + β_1 X_i1 + ⋯ + β_{p-1} X_{i,p-1}) ]^2 }
IRLS robust regression: iteratively reweighted least squares method.
11.4 Regression Tree
A powerful, conceptually simple method of nonparametric regression:
1 The range of the predictor is partitioned into segments.
2 Within each segment, the estimated regression fit is given by the mean of the responses in the segment.
Regression Tree (cont.)
For r = 2, the best split point is chosen to minimize SSE = SSE(R_21) + SSE(R_22), where
SSE(R_rj) = Σ_{i ∈ R_rj} (Y_i - Ȳ_{R_rj})^2
is the sum of squared residuals in region R_rj.
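The search for the best split can be sketched as a scan over candidate cut points, scoring each by the two-region SSE above. The function name `best_split` is illustrative; real tree implementations use incremental updates rather than recomputing each SSE.

```python
import numpy as np

def best_split(x, y):
    """Return (cut, total_sse): the split of x minimizing SSE(R1) + SSE(R2),
    with each region predicted by its response mean."""
    def sse(v):
        return ((v - v.mean())**2).sum() if len(v) else 0.0
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best = (None, np.inf)
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue                          # no cut between tied x values
        cut = (xs[i - 1] + xs[i]) / 2
        total = sse(ys[:i]) + sse(ys[i:])
        if total < best[1]:
            best = (cut, total)
    return best
```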
Regression Tree (cont.)
Determining the number of regions R
11.5 Bootstrapping
In many nonstandard situations (e.g., nonconstant error variances estimated by iteratively reweighted least squares), standard methods for evaluating precision may not be available, or may be only approximately applicable when the sample size is large.
Bootstrapping provides estimates of the precision of sample estimates for these complex cases.
Bootstrapping (cont.)
General procedure: obtain b_1 by some procedure and evaluate the precision of b_1 by the bootstrap method.
- The bootstrap method calls for selecting, from the observed sample data, a random sample of size n with replacement.
- The bootstrap sample may contain some duplicate data from the original sample and omit some other data in the original sample.
- Calculate the estimated regression coefficient b_1* from the bootstrap sample; repeat a large number of times.
- The estimated standard deviation of all the b_1*, denoted s*{b_1*}, is an estimate of the variability of the sampling distribution of b_1.
Bootstrapping (cont.)
Bootstrap sampling can be done in two basic ways.
Fixed X sampling is appropriate when the regression function being fitted is a good model for the data and the error terms have constant variance:
- The residuals e_i from the original fit are regarded as the sample data, to be sampled with replacement.
- After drawing a bootstrap sample e_1*, …, e_n* of size n, form Y_i* = Ŷ_i + e_i*.
- Regress Y* on the X variables to obtain b_1*.
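The fixed-X scheme can be sketched as follows for a simple regression slope. A minimal sketch; the function name `fixed_x_bootstrap` and the choice of 500 resamples are illustrative, not from the text.

```python
import numpy as np

def fixed_x_bootstrap(X, y, n_boot=500, seed=0):
    """Fixed-X bootstrap: resample residuals with replacement, form
    y* = yhat + e*, refit, and collect the bootstrap slope estimates b_1*."""
    rng = np.random.default_rng(seed)
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    yhat = X @ b
    e = y - yhat
    slopes = np.empty(n_boot)
    for k in range(n_boot):
        e_star = rng.choice(e, size=len(e), replace=True)  # resample residuals
        b_star = np.linalg.lstsq(X, yhat + e_star, rcond=None)[0]
        slopes[k] = b_star[1]
    return slopes
```

The standard deviation of the returned slopes is the bootstrap estimate s*{b_1*} of the variability of b_1.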
Bootstrapping (cont.)
Random X sampling is appropriate when there is some doubt about the adequacy of the regression function being fitted, the error variances are not constant, and/or the X values are random:
- The pairs (X_i, Y_i) in the original sample are considered the data, to be sampled with replacement.
- After drawing a bootstrap sample of n pairs (X_1*, Y_1*), …, (X_n*, Y_n*), regress Y* on X* to obtain b_1*.
Bootstrapping (cont.)
To check whether the number of bootstrap samples is adequate, observe the variability of s*{b_1*} as the number of bootstrap samples is increased.
Bootstrap Confidence Intervals
From the bootstrap distribution of b_1*, find the α/2 and 1 - α/2 quantiles b_1*(α/2) and b_1*(1 - α/2).
Percentile method: b_1*(α/2) ≤ β_1 ≤ b_1*(1 - α/2)
Reflection method (requires a larger number of bootstrap samples):
2b_1 - b_1*(1 - α/2) ≤ β_1 ≤ 2b_1 - b_1*(α/2)
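Both interval forms follow directly from the bootstrap quantiles. A minimal sketch (the function name `bootstrap_cis` is mine); for a symmetric bootstrap distribution centered at b_1, the two intervals nearly coincide.

```python
import numpy as np

def bootstrap_cis(b1, b1_star, alpha=0.05):
    """Percentile and reflection confidence limits for beta_1 from the
    bootstrap estimates b1_star, given the original estimate b1."""
    lo, hi = np.quantile(b1_star, [alpha / 2, 1 - alpha / 2])
    percentile = (lo, hi)
    reflection = (2 * b1 - hi, 2 * b1 - lo)
    return percentile, reflection
```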
11.3 & 11.4 Other Remedial Measures
Section 11.3 Robust regression
- Least absolute residuals (LAR): minimum L_1-norm regression
Section 11.4 Nonparametric regression
- Lowess method
- Regression trees
More informationRidge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation
Patrick Breheny February 8 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/27 Introduction Basic idea Standardization Large-scale testing is, of course, a big area and we could keep talking
More informationSTAT 704 Sections IRLS and Bootstrap
STAT 704 Sections 11.4-11.5. IRLS and John Grego Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1 / 14 LOWESS IRLS LOWESS LOWESS (LOcally WEighted Scatterplot Smoothing)
More informationLecture 16 Solving GLMs via IRWLS
Lecture 16 Solving GLMs via IRWLS 09 November 2015 Taylor B. Arnold Yale Statistics STAT 312/612 Notes problem set 5 posted; due next class problem set 6, November 18th Goals for today fixed PCA example
More informationRobust Variable Selection Methods for Grouped Data. Kristin Lee Seamon Lilly
Robust Variable Selection Methods for Grouped Data by Kristin Lee Seamon Lilly A dissertation submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree
More informationMulticollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response.
Multicollinearity Read Section 7.5 in textbook. Multicollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response. Example of multicollinear
More informationMultiple Regression Analysis. Part III. Multiple Regression Analysis
Part III Multiple Regression Analysis As of Sep 26, 2017 1 Multiple Regression Analysis Estimation Matrix form Goodness-of-Fit R-square Adjusted R-square Expected values of the OLS estimators Irrelevant
More informationLecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011
Lecture 2: Linear Models Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector
More informationEconometrics Multiple Regression Analysis: Heteroskedasticity
Econometrics Multiple Regression Analysis: João Valle e Azevedo Faculdade de Economia Universidade Nova de Lisboa Spring Semester João Valle e Azevedo (FEUNL) Econometrics Lisbon, April 2011 1 / 19 Properties
More informationAvailable online at (Elixir International Journal) Statistics. Elixir Statistics 49 (2012)
10108 Available online at www.elixirpublishers.com (Elixir International Journal) Statistics Elixir Statistics 49 (2012) 10108-10112 The detention and correction of multicollinearity effects in a multiple
More informationEstimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X.
Estimating σ 2 We can do simple prediction of Y and estimation of the mean of Y at any value of X. To perform inferences about our regression line, we must estimate σ 2, the variance of the error term.
More informationEffects of Outliers and Multicollinearity on Some Estimators of Linear Regression Model
204 Effects of Outliers and Multicollinearity on Some Estimators of Linear Regression Model S. A. Ibrahim 1 ; W. B. Yahya 2 1 Department of Physical Sciences, Al-Hikmah University, Ilorin, Nigeria. e-mail:
More informationThe Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1)
The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1) Authored by: Sarah Burke, PhD Version 1: 31 July 2017 Version 1.1: 24 October 2017 The goal of the STAT T&E COE
More informationIndian Statistical Institute
Indian Statistical Institute Introductory Computer programming Robust Regression methods with high breakdown point Author: Roll No: MD1701 February 24, 2018 Contents 1 Introduction 2 2 Criteria for evaluating
More informationImproved Liu Estimators for the Poisson Regression Model
www.ccsenet.org/isp International Journal of Statistics and Probability Vol., No. ; May 202 Improved Liu Estimators for the Poisson Regression Model Kristofer Mansson B. M. Golam Kibria Corresponding author
More informationHeteroskedasticity. Part VII. Heteroskedasticity
Part VII Heteroskedasticity As of Oct 15, 2015 1 Heteroskedasticity Consequences Heteroskedasticity-robust inference Testing for Heteroskedasticity Weighted Least Squares (WLS) Feasible generalized Least
More informationMath 5305 Notes. Diagnostics and Remedial Measures. Jesse Crawford. Department of Mathematics Tarleton State University
Math 5305 Notes Diagnostics and Remedial Measures Jesse Crawford Department of Mathematics Tarleton State University (Tarleton State University) Diagnostics and Remedial Measures 1 / 44 Model Assumptions
More informationRidge Regression. Summary. Sample StatFolio: ridge reg.sgp. STATGRAPHICS Rev. 10/1/2014
Ridge Regression Summary... 1 Data Input... 4 Analysis Summary... 5 Analysis Options... 6 Ridge Trace... 7 Regression Coefficients... 8 Standardized Regression Coefficients... 9 Observed versus Predicted...
More informationAdvanced Econometrics I
Lecture Notes Autumn 2010 Dr. Getinet Haile, University of Mannheim 1. Introduction Introduction & CLRM, Autumn Term 2010 1 What is econometrics? Econometrics = economic statistics economic theory mathematics
More informationAnalysis of Cross-Sectional Data
Analysis of Cross-Sectional Data Kevin Sheppard http://www.kevinsheppard.com Oxford MFE This version: November 8, 2017 November 13 14, 2017 Outline Econometric models Specification that can be analyzed
More informationLeast Squares Estimation-Finite-Sample Properties
Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions
More informationThe general linear regression with k explanatory variables is just an extension of the simple regression as follows
3. Multiple Regression Analysis The general linear regression with k explanatory variables is just an extension of the simple regression as follows (1) y i = β 0 + β 1 x i1 + + β k x ik + u i. Because
More informationMS&E 226. In-Class Midterm Examination Solutions Small Data October 20, 2015
MS&E 226 In-Class Midterm Examination Solutions Small Data October 20, 2015 PROBLEM 1. Alice uses ordinary least squares to fit a linear regression model on a dataset containing outcome data Y and covariates
More informationChapter 4 Experiments with Blocking Factors
Chapter 4 Experiments with Blocking Factors 許湘伶 Design and Analysis of Experiments (Douglas C. Montgomery) hsuhl (NUK) DAE Chap. 4 1 / 54 The Randomized Complete Block Design (RCBD; 隨機化完全集區設計 ) 1 Variability
More informationMa 3/103: Lecture 24 Linear Regression I: Estimation
Ma 3/103: Lecture 24 Linear Regression I: Estimation March 3, 2017 KC Border Linear Regression I March 3, 2017 1 / 32 Regression analysis Regression analysis Estimate and test E(Y X) = f (X). f is the
More informationStatistics 910, #5 1. Regression Methods
Statistics 910, #5 1 Overview Regression Methods 1. Idea: effects of dependence 2. Examples of estimation (in R) 3. Review of regression 4. Comparisons and relative efficiencies Idea Decomposition Well-known
More informationAdvanced Quantitative Methods: ordinary least squares
Advanced Quantitative Methods: Ordinary Least Squares University College Dublin 31 January 2012 1 2 3 4 5 Terminology y is the dependent variable referred to also (by Greene) as a regressand X are the
More informationLinear Regression. September 27, Chapter 3. Chapter 3 September 27, / 77
Linear Regression Chapter 3 September 27, 2016 Chapter 3 September 27, 2016 1 / 77 1 3.1. Simple linear regression 2 3.2 Multiple linear regression 3 3.3. The least squares estimation 4 3.4. The statistical
More informationA Short Introduction to Curve Fitting and Regression by Brad Morantz
A Short Introduction to Curve Fitting and Regression by Brad Morantz bradscientist@machine-cognition.com Overview What can regression do for me? Example Model building Error Metrics OLS Regression Robust
More informationSTAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method.
STAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method. Rebecca Barter May 5, 2015 Linear Regression Review Linear Regression Review
More informationIntroduction to Linear regression analysis. Part 2. Model comparisons
Introduction to Linear regression analysis Part Model comparisons 1 ANOVA for regression Total variation in Y SS Total = Variation explained by regression with X SS Regression + Residual variation SS Residual
More informationMath 3330: Solution to midterm Exam
Math 3330: Solution to midterm Exam Question 1: (14 marks) Suppose the regression model is y i = β 0 + β 1 x i + ε i, i = 1,, n, where ε i are iid Normal distribution N(0, σ 2 ). a. (2 marks) Compute the
More informationIEOR 165 Lecture 7 1 Bias-Variance Tradeoff
IEOR 165 Lecture 7 Bias-Variance Tradeoff 1 Bias-Variance Tradeoff Consider the case of parametric regression with β R, and suppose we would like to analyze the error of the estimate ˆβ in comparison to
More informationLeast Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions
Journal of Modern Applied Statistical Methods Volume 8 Issue 1 Article 13 5-1-2009 Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error
More informationIntroductory Econometrics
Based on the textbook by Wooldridge: : A Modern Approach Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna November 23, 2013 Outline Introduction
More informationNeed for Several Predictor Variables
Multiple regression One of the most widely used tools in statistical analysis Matrix expressions for multiple regression are the same as for simple linear regression Need for Several Predictor Variables
More information