Within Groups Comparisons of Least Squares Regression Lines When There Is Heteroscedasticity


Rand R. Wilcox
Dept of Psychology
University of Southern California

Florence Clark
Division of Occupational Science & Occupational Therapy
University of Southern California

WORD COUNT 3446
December 28,

ABSTRACT

Motivated by a problem that arose in the Well Elderly II study (Clark et al., 2012; Jackson et al., 2009), the paper deals with the situation where a least squares regression line is fitted to data at two different times and the goal is to test the hypothesis that the slopes and intercepts are equal, in a manner that allows a heteroscedastic error term. A bootstrap estimate of the standard errors could be used to deal with heteroscedasticity, followed by a simple modification of Hotelling's test. But evidently there are no simulation results regarding the resulting control over the probability of a Type I error. Three related goals are testing the hypothesis of equal intercepts, ignoring the slopes; testing the hypothesis of equal slopes, ignoring the intercepts; and testing whether the regression lines differ at a specified design point. This last goal corresponds to the classic Johnson-Neyman method when dealing with independent groups. Another unknown is the impact on the actual Type I error probability when leverage points are removed. Here it is found that for various situations, removing leverage points has a minimal impact, but for certain patterns of heteroscedasticity, there is a substantial improvement in the control of the Type I error probability when the sample size is small.

Keywords: analysis of covariance, bootstrap methods, heteroscedasticity, Hotelling's test, Johnson-Neyman method, Well Elderly II study.

1 Introduction

The paper deals with what in essence is a within groups analysis of covariance design with a single covariate. At time j (j = 1, 2), it is assumed that

$$Y_j = \beta_{0j} + \beta_{1j} X_j + \lambda(X_j)\,\epsilon_j, \tag{1}$$

where $\beta_{kj}$ (k = 0, 1; j = 1, 2) are unknown parameters, $\lambda(X_j)$ is some unknown function that models heteroscedasticity, and $\epsilon_j$ is a random variable having variance $\sigma_j^2$ and $E(\epsilon_j) = 0$. The paper considers the problem of testing

$$H_0: (\beta_{01}, \beta_{11}) = (\beta_{02}, \beta_{12}) \tag{2}$$

when using the ordinary least squares (OLS) estimator. Two related goals are testing

$$H_0: \beta_{01} = \beta_{02} \tag{3}$$

and

$$H_0: \beta_{11} = \beta_{12}. \tag{4}$$

Yet another goal is, for a chosen value X, to test

$$H_0: E(Y_1 \mid X) = E(Y_2 \mid X). \tag{5}$$

For the case of independent groups, this last goal corresponds to the classic Johnson and Neyman (1936) method when there is homoscedasticity and the error term has a normal distribution.

Two general remarks are in order. First, it is assumed that there is explicit interest in determining the mean of Y, given X, rather than some robust measure of location such as the median. For skewed distributions, a robust measure of location can be argued to better reflect the typical value. The presumption is that at least in some situations, the mean might still provide a useful perspective, in which case least squares regression is more appropriate than some robust regression estimator. Second, the methods considered here assume asymptotic normality for reasons that will be evident. Some robust regression estimators are known to be asymptotically normal. But for others, either it is known that this is not necessarily the case or asymptotic normality has not been established. One example is the Theil (1950) and Sen (1968) estimator: Peng et al. (2008) established that its slope estimator may or may not be asymptotically normal. The point is that some obvious extensions of the methods in section 2 to a robust regression estimator are not necessarily appropriate.

Let $b_{kj}$ be the least squares estimate of $\beta_{kj}$. Classic methods assume that $\epsilon_j$ has a normal distribution and that $\lambda(X) \equiv 1$ (homoscedasticity). It is well known, however, that these classic inferential methods are based on an incorrect estimate of the standard error of $b_{kj}$ when in fact there is heteroscedasticity (e.g., Godfrey, 2006; Long & Ervin, 2000).
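To make model (1) concrete, here is a minimal Python sketch (NumPy assumed) that draws one within-subjects sample and computes $(d_0, d_1) = (b_{01} - b_{02}, b_{11} - b_{12})$ from the OLS fit at each time. It borrows the three choices of $\lambda$ and the g-and-h errors that section 3 uses in its simulations. Two details are assumptions, not the authors' code and not a port of their R function rmul: the four generated variables are taken to be $(X_1, X_2, \epsilon_1, \epsilon_2)$, and the correlation ρ is imposed on standard normals before each margin is transformed. The names `g_and_h`, `LAMBDA`, `simulate_model1` and `ols` are hypothetical.

```python
import numpy as np

def g_and_h(z, g=0.0, h=0.0):
    """g-and-h transform (Hoaglin, 1985) of standard normal draws z; assumes g >= 0."""
    core = (np.exp(g * z) - 1.0) / g if g > 0 else z
    return core * np.exp(h * z**2 / 2.0)

# Variance patterns VP 1-3 of section 3: lambda(X) = 1, |X| + 1, 1 / (|X| + 1)
LAMBDA = {
    1: lambda x: np.ones_like(x),
    2: lambda x: np.abs(x) + 1.0,
    3: lambda x: 1.0 / (np.abs(x) + 1.0),
}

def simulate_model1(n, beta, g=0.0, h=0.0, rho=0.0, vp=1, rng=None):
    """Draw n vectors (Y1, X1, Y2, X2) from model (1).

    beta[j] = (intercept, slope) at time j + 1.
    Assumption (hypothetical design): (X1, X2, eps1, eps2) start as standard
    normals with common correlation rho, then each is g-and-h transformed.
    """
    rng = np.random.default_rng(rng)
    corr = np.full((4, 4), float(rho))
    np.fill_diagonal(corr, 1.0)
    z = rng.multivariate_normal(np.zeros(4), corr, size=n)
    x1, x2, e1, e2 = g_and_h(z, g, h).T
    lam = LAMBDA[vp]
    y1 = beta[0][0] + beta[0][1] * x1 + lam(x1) * e1
    y2 = beta[1][0] + beta[1][1] * x2 + lam(x2) * e2
    return y1, x1, y2, x2

def ols(x, y):
    """Ordinary least squares estimates (b0, b1) for a single covariate."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Null hypothesis (2) true: same intercept and slope at both times
y1, x1, y2, x2 = simulate_model1(20, [(0.0, 1.0), (0.0, 1.0)],
                                 g=0.2, h=0.2, rho=0.5, vp=2, rng=1)
d = ols(x1, y1) - ols(x2, y2)   # (d0, d1) = (b01 - b02, b11 - b12)
```

With g = h = 0 the transform leaves the normal draws unchanged, so VP 1 with g = h = 0 reproduces the classic homoscedastic, normal-error setting.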
For completeness, it is noted that several theoretically sound methods for estimating standard errors when there is heteroscedasticity have been derived (e.g., White, 1980; Hinkley, 1977; Cribari-Neto, 2004; Cribari-Neto, Souza & Vasconcellos, 2007; Cribari-Neto & da Silva, 2011). However, no details are given here because these estimators are not readily extended to the situation at hand, where dependent groups are being compared. That is, these estimators do not provide an estimate of the covariance between $b_{k1}$ and $b_{k2}$, which is needed for present purposes. The general strategy here is to use a basic bootstrap estimate of the standard errors that allows heteroscedasticity, followed by some obvious test statistics for testing (3) and (4). As for (2), a simple generalization of Hotelling's $T^2$ test statistic is used. A bootstrap estimate of the standard error is also used when testing (5).

Another goal in this study is to investigate the impact of removing leverage points (outliers among the independent variable) on the probability of a Type I error. When using the OLS estimator, a well-known concern is that even a single leverage point can result in a fit that poorly reflects the association among the bulk of the points (e.g., Rousseeuw & Leroy, 1987; Staudte & Sheather, 1990; Heritier et al., 2009; Wilcox, 2012). So the concerns about leverage points go beyond their impact on the probability of a Type I error: power can be affected as well. Simulation results reported here indicate that in many situations, removing leverage points does not alter the control over the Type I error probability by very much. However, for some situations, control over the Type I error probability is improved substantially, as will be seen.

The paper is organized as follows. Section 2 describes the methods used to test (2), (3), (4) and (5). Section 3 reports simulation results, and section 4 illustrates the methods using data from the Well Elderly II study (Clark et al., 2012; Jackson et al., 2009), which motivated this paper.

2 Description of the Methods

This section describes the details of the methods to be studied via simulation. The method for identifying leverage points is described first, followed by the bootstrap method for estimating the standard errors and then the methods for testing (2), (3), (4) and (5).

2.1 Identifying Leverage Points

Let $(Y_{11}, X_{11}, Y_{12}, X_{12}), \ldots, (Y_{n1}, X_{n1}, Y_{n2}, X_{n2})$ be a random sample from some four-variate distribution where all four random variables are possibly correlated. For a within subjects design, $(Y_{ij}, X_{ij})$ (i = 1, ..., n; j = 1, 2) represents the observations at time j. Let $M_j$ be the usual sample median based on $X_{1j}, \ldots, X_{nj}$. The median absolute deviation statistic at time j, $\mathrm{MAD}_j$, is the median of $|X_{1j} - M_j|, \ldots, |X_{nj} - M_j|$. At time j, the MAD-median rule declares the value X an outlier if, for some specified constant K,

$$\frac{|X - M_j|}{\mathrm{MAD}_j / 0.6745} > K.$$

A common choice for K is the square root of the .975 quantile of a chi-squared distribution with one degree of freedom (e.g., Rousseeuw & Leroy, 1987; Wilcox, 2012, p. 97), and this convention was used here. So K is approximately 2.24. Note that for the situation at hand, values declared outliers among the independent variable at time 1 are not necessarily the same as the values declared outliers at time 2. Here, removing leverage points means that for either possible value of j, the point $(X_{ij}, Y_{ij})$ is removed if $X_{ij}$ is declared an outlier among the values $X_{1j}, \ldots, X_{nj}$ using the MAD-median rule. That is, a point is removed if at either time 1 or time 2 the value of the independent variable is declared an outlier.

2.2 Estimating Standard Errors and Covariances

There is the issue of estimating the covariance matrix associated with $(d_0, d_1) = (b_{01} - b_{02}, b_{11} - b_{12})$ in a manner that allows heteroscedasticity. Here, a standard bootstrap method is used (e.g., Efron & Tibshirani, 1993). Generate a bootstrap sample by randomly sampling with replacement n vectors of observations from $(Y_{11}, X_{11}, Y_{12}, X_{12}), \ldots, (Y_{n1}, X_{n1}, Y_{n2}, X_{n2})$, yielding $(Y^*_{11}, X^*_{11}, Y^*_{12}, X^*_{12}), \ldots, (Y^*_{n1}, X^*_{n1}, Y^*_{n2}, X^*_{n2})$. Based on this bootstrap sample, compute the least squares estimates of the slopes and intercepts and take the differences, yielding $(d^*_0, d^*_1)$. Repeat this B times, yielding $(d^*_{0b}, d^*_{1b})$, b = 1, ..., B. The covariance matrix associated with $(d_0, d_1)$ is estimated with the sample covariance matrix based on $(d^*_{0b}, d^*_{1b})$, b = 1, ..., B, which is denoted by S.
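Sections 2.1 and 2.2 can be sketched in Python as follows (NumPy assumed). K comes from the standard library's `statistics.NormalDist`: if $W = Z^2$ with Z standard normal, then $P(W \le c) = .975$ exactly when $\sqrt{c}$ is the .9875 quantile of Z. The sketch assumes that when $X_{i1}$ or $X_{i2}$ is flagged, the whole vector i is dropped, which keeps the data paired for the bootstrap. The helper names and the illustrative data are hypothetical; this is not the authors' R code.

```python
import numpy as np
from statistics import NormalDist

# sqrt of the .975 chi-squared(1) quantile equals the .9875 standard normal quantile
K = NormalDist().inv_cdf(0.9875)          # about 2.24

def mad_median_outliers(x, k=K):
    """MAD-median rule: flag x_i with |x_i - M| / (MAD / .6745) > k."""
    m = np.median(x)
    mad = np.median(np.abs(x - m))
    return np.abs(x - m) / (mad / 0.6745) > k

def ols(x, y):
    """Least squares (intercept, slope) for one covariate."""
    design = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(design, y, rcond=None)[0]

def boot_cov_d(y1, x1, y2, x2, B=200, rng=None):
    """Bootstrap covariance matrix S of (d0, d1), resampling whole vectors."""
    rng = np.random.default_rng(rng)
    n = len(y1)
    draws = np.empty((B, 2))
    for b in range(B):
        i = rng.integers(0, n, size=n)    # n vectors drawn with replacement
        draws[b] = ols(x1[i], y1[i]) - ols(x2[i], y2[i])
    d = ols(x1, y1) - ols(x2, y2)
    S = np.cov(draws, rowvar=False)       # 2 x 2 sample covariance, divisor B - 1
    return d, S, draws

# Illustrative (hypothetical) paired data with VP 2 style heteroscedasticity
rng = np.random.default_rng(2)
n = 30
x1 = rng.standard_normal(n)
x2 = 0.5 * x1 + rng.standard_normal(n)
y1 = 1.0 + 0.5 * x1 + (np.abs(x1) + 1.0) * rng.standard_normal(n)
y2 = 1.0 + 0.5 * x2 + (np.abs(x2) + 1.0) * rng.standard_normal(n)

# Drop vector i if X_i1 or X_i2 is flagged (assumption: the whole vector is removed)
keep = ~(mad_median_outliers(x1) | mad_median_outliers(x2))
d, S, draws = boot_cov_d(y1[keep], x1[keep], y2[keep], x2[keep], B=200, rng=3)
```

Resampling rows rather than residuals is what lets S absorb heteroscedasticity and the dependence between times without modelling either; B = 200 matches the choice made in the text.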

Given some value for the covariate, X = x say, the squared standard error of $\hat{D} = \hat{Y}_1 - \hat{Y}_2$, where $\hat{Y}_j = b_{0j} + b_{1j} x$, is computed in a similar manner. Now let $\hat{Y}^*_j = b^*_{0j} + b^*_{1j} x$, where $b^*_{0j}$ and $b^*_{1j}$ are the bootstrap estimates of the intercept and slope, respectively. For B bootstrap samples, this yields $\hat{Y}^*_{jb}$, b = 1, ..., B. The squared standard error of $\hat{D}$ is estimated with

$$U^2 = \frac{1}{B-1} \sum_{b=1}^{B} \left(\hat{D}^*_b - \bar{D}^*\right)^2,$$

where $\hat{D}^*_b = \hat{Y}^*_{1b} - \hat{Y}^*_{2b}$ and $\bar{D}^* = \sum_{b=1}^{B} \hat{D}^*_b / B$.

Three choices for B were considered: 100, 200 and 500. Simulations indicated that increasing B from 100 to 200 offered some improvement in terms of the probability of a Type I error. Increasing B to 500 was found to provide little or no improvement, so B = 200 is assumed henceforth.

2.3 The Test Statistics

First consider testing (2). The test statistic is based on a simple modification of Hotelling's $T^2$ statistic for testing the hypothesis that a multivariate normal distribution has a mean of zero. From basic principles, under multivariate normality, the hypothesis that J dependent groups have a common mean of zero is rejected if

$$\frac{n(n-J)}{J(n-1)}\, \bar{X} S_m^{-1} \bar{X}'$$

exceeds the $1 - \alpha$ quantile of an F distribution with J and $n - J$ degrees of freedom, where $\bar{X}$ is the vector of sample means and $S_m$ is the usual sample covariance matrix. It is well known that under general conditions, $(d_0, d_1)$ is asymptotically bivariate normal. For the situation at hand, where J = 2, this suggests the test statistic

$$H = \frac{n(n-2)}{2(n-1)}\, (d_0, d_1)\, S^{-1} (d_0, d_1)' \tag{6}$$

and rejecting (2) at the α level if H exceeds the $1 - \alpha$ quantile of an F distribution with $\nu_1 = 2$ and $\nu_2 = n - 2$ degrees of freedom. For (3) and (4), the test statistic is taken to be

$$T_k = \frac{d_k}{\sqrt{s_{k+1,k+1}}} \tag{7}$$

where k = 0 or 1. So for k = 0, (3) is being tested, and $s_{k+1,k+1} = s_{1,1}$ is the estimated squared standard error given by S. The null hypothesis is rejected if $|T_k| \ge t$, where t is the $1 - \alpha/2$ quantile of a Student's t distribution with $n - 1$ degrees of freedom. (The degrees of freedom were taken to be $n - 1$ rather than $n - 2$ because we did not use the usual homoscedastic estimate of the standard errors based on the residuals. Rather, we are mimicking the usual Student's t test. Despite this, there may be some argument for using $n - 2$, but this remains to be determined.) Finally, (5) is rejected if $|W| \ge t$, where

$$W = \frac{\hat{Y}_1 - \hat{Y}_2}{U} \tag{8}$$

and again t is the $1 - \alpha/2$ quantile of a Student's t distribution with $n - 1$ degrees of freedom.

3 Simulation Results

Simulations were used to study the small-sample properties of the methods in section 2. The sample sizes considered were 20 and 40. Some additional simulations were run with n = 200 as a partial check on the R functions that were used to apply the methods. Estimated Type I error probabilities, $\hat{\alpha}$, were based on 4000 replications. Four types of marginal distributions were used: normal, symmetric and heavy-tailed, asymmetric and light-tailed, and asymmetric and heavy-tailed. More precisely, the marginal distributions were taken to be one of four g-and-h distributions (Hoaglin, 1985) that contain the standard normal distribution as a special case. If Z has a standard normal distribution, then

$$V = \begin{cases} \dfrac{\exp(gZ) - 1}{g}\, \exp(hZ^2/2), & \text{if } g > 0, \\[1ex] Z \exp(hZ^2/2), & \text{if } g = 0, \end{cases}$$

has a g-and-h distribution, where g and h are parameters that determine the first four moments. The four distributions used here were the standard normal (g = h = 0.0), a symmetric heavy-tailed distribution (h = 0.2, g = 0.0), an asymmetric distribution with relatively light tails (h = 0.0, g = 0.2), and an asymmetric distribution with heavy tails (g = h = 0.2). Table 1 shows the skewness ($\kappa_1$) and kurtosis ($\kappa_2$) for each distribution. Additional properties of the g-and-h distribution are summarized by Hoaglin (1985).

Table 1: Some properties of the g-and-h distribution. Columns: g, h, $\kappa_1$, $\kappa_2$.

The correlation among the four variables was taken to be ρ = 0 or .5. (The R function rmul in Wilcox, 2012, was used to generate data.) Three choices for λ were used: $\lambda(X) = 1$, $\lambda(X) = |X| + 1$ and $\lambda(X) = 1/(|X| + 1)$. For convenience, these three choices are denoted by variance patterns (VP) 1, 2 and 3. As is evident, VP 1 corresponds to the usual homoscedasticity assumption.

Table 2 summarizes the simulation results when testing (2) at the .05 level and the sample size is n = 20, where the columns headed by S are the results when leverage points are retained and LR indicates that leverage points are removed. Although the seriousness of a Type I error depends on the situation, Bradley (1978) has suggested that as a general guide, when testing at the .05 level, at a minimum the actual level should be between .025 and .075. In Table 2, when leverage points are retained, estimates range between .016 and .079. With leverage points removed, the range is .026 to .071. Note that in Table 2, when leverage points are retained, the lowest estimates occur for VP 3 when h = .2. For g = 0 the estimate is .016 and for g = .2 the estimate is .017. Increasing the number of bootstrap samples from 200 to 500, the estimates are now .016 for both situations. The highest estimate in Table 2 is .079; with B = 500 the estimate is .080. When leverage points are removed, the lowest estimate in Table 2 is .026; with B = 500 the estimate is .022. The highest estimates are .071 for VP 2 when g = h = 0 and when (g, h) = (.2, .2). For these two situations, with B = 500, the estimates are .076 and .071, respectively.

Table 2: Estimated Type I error probability when testing (2), α = .05, n = 20. Columns: g, h, VP; S with ρ = .0 and ρ = .5; LR with ρ = .0 and ρ = .5. S = leverage points retained; LR = leverage points removed.

Table 3 summarizes the results when testing (3) and (4). Note that when leverage points are retained, control over the Type I error probability is fairly good when testing (3), but when testing (4) the estimates range between .011 and .083. It is heteroscedasticity that results in estimates well below or above the nominal level. Removing leverage points, again testing (4), the estimates now range between .019 and .069. In Table 3, the lowest estimates occur for VP 3 when testing (4), leverage points are retained, ρ = 0 and h = .2, the two estimates being .012 and .011. Increasing B to 500, the estimates are .013 for both situations. The highest estimate is .082, which occurs for VP 2. With B = 500, the estimate is .080.

To add perspective, note that for VP 2, g = .2, h = 0, Table 3 indicates that when testing (4) without removing leverage points, the estimated Type I error probability is .075 when ρ = .5. Increasing n to 60, the estimate is .072, and for n = 100 the estimate is .062. So control over the Type I error probability improves as n increases, as expected, but simply removing leverage points yields an estimate of .066 with n = 20. All indications are that in terms of controlling the Type I error probability, little or nothing is lost by removing leverage points, and in some situations this has practical advantages.

Table 3: Estimates of α when testing (3) and (4), n = 20, α = .05. Columns: g, h, VP; then, for ρ = 0 and for ρ = .5, $\beta_0$ and $\beta_1$ under S and under LR. S = leverage points retained; LR = leverage points removed.

Finally, Table 4 reports estimated Type I error probabilities when testing (5) for three choices for X, namely, the estimated quartiles based on the covariate values in group 1. For brevity, only results for ρ = 0 are reported; with ρ = .5, no new insights were obtained. The results for the lower quartile, the median and the upper quartile are indicated by the columns headed by q1, q2 and q3, respectively. Generally, control over the Type I error probability is reasonably good. Again, when there is heteroscedasticity, removing leverage points can provide some improvement.

Table 4: Estimates of α when testing (5), n = 20, α = .05, ρ = 0. Columns: g, h, VP; q1, q2, q3 under S; q1, q2, q3 under LR. S = leverage points retained; LR = leverage points removed.
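The tests of (3), (4) and (5) from section 2.3 can be sketched in Python as follows, assuming NumPy and SciPy (`scipy.stats.t`) are available. The sketch works from the bootstrap draws $(d^*_{0b}, d^*_{1b})$ of section 2.2. Because $\hat{D}^*_b = \hat{Y}^*_{1b} - \hat{Y}^*_{2b} = d^*_{0b} + d^*_{1b} x$, U can be computed from those same draws, and $U^2$ then equals $(1, x)\, S\, (1, x)'$. The design points for (5) are taken as sample quartiles of the time-1 covariate using NumPy's default quantile rule, which may differ from the quartile estimator behind Table 4. All function names and data are hypothetical.

```python
import numpy as np
from scipy.stats import t as t_dist

def ols(x, y):
    """Least squares (intercept, slope) for one covariate."""
    design = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(design, y, rcond=None)[0]

def boot_d(y1, x1, y2, x2, B=200, rng=None):
    """B bootstrap draws of (d0*, d1*), resampling whole vectors (section 2.2)."""
    rng = np.random.default_rng(rng)
    idx = rng.integers(0, len(y1), size=(B, len(y1)))
    return np.array([ols(x1[i], y1[i]) - ols(x2[i], y2[i]) for i in idx])

def t_test_k(d, draws, n, k, alpha=0.05):
    """T_k = d_k / sqrt(s_{k+1,k+1}); k = 0 tests (3), k = 1 tests (4)."""
    s = np.var(draws[:, k], ddof=1)       # diagonal element of S
    T = d[k] / np.sqrt(s)
    p = 2 * t_dist.sf(abs(T), df=n - 1)
    return T, p, abs(T) >= t_dist.ppf(1 - alpha / 2, df=n - 1)

def w_test(d, draws, n, x, alpha=0.05):
    """W = (Yhat1 - Yhat2) / U at covariate value x; tests (5)."""
    D_star = draws[:, 0] + draws[:, 1] * x   # Dhat*_b = d0*_b + d1*_b x
    U = np.std(D_star, ddof=1)               # U^2 uses divisor B - 1
    W = (d[0] + d[1] * x) / U
    p = 2 * t_dist.sf(abs(W), df=n - 1)
    return W, p, abs(W) >= t_dist.ppf(1 - alpha / 2, df=n - 1)

# Illustrative (hypothetical) paired data with heteroscedastic errors
rng = np.random.default_rng(7)
n = 20
x1 = rng.standard_normal(n)
x2 = 0.5 * x1 + rng.standard_normal(n)
y1 = 2.0 + x1 + (np.abs(x1) + 1.0) * rng.standard_normal(n)
y2 = 2.0 + x2 + (np.abs(x2) + 1.0) * rng.standard_normal(n)

d = ols(x1, y1) - ols(x2, y2)
draws = boot_d(y1, x1, y2, x2, B=200, rng=8)
T0, p0, reject0 = t_test_k(d, draws, n, k=0)
T1, p1, reject1 = t_test_k(d, draws, n, k=1)
# Design points: sample quartiles of the time-1 covariate (NumPy default rule)
w_results = [w_test(d, draws, n, q) for q in np.quantile(x1, [0.25, 0.5, 0.75])]
```

A convenient sanity check: at x = 0, $\hat{Y}_1 - \hat{Y}_2 = d_0$, so W reduces to $T_0$.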

4 An Illustration

A general goal in the Well Elderly II study was to assess the efficacy of an intervention strategy aimed at improving the physical and emotional health of older adults. A portion of the study was aimed at understanding the impact of intervention on meaningful activities, as measured by the Meaningful Activity Participation Assessment (MAPA) instrument (Eakman et al., 2010). Higher MAPA scores reflect greater activity satisfaction. (Possible MAPA scores range between 6 and 42.) A covariate of interest was the cortisol awakening response (CAR), which is defined as the change in cortisol concentration that occurs during the first hour after waking from sleep. (CAR is taken to be the cortisol level after the participants had been awake for about an hour or less, minus the level of cortisol upon awakening.) Extant studies (e.g., Clow et al., 2004; Chida & Steptoe, 2009) indicate that various forms of stress are associated with the CAR. An issue was whether the association between the CAR and MAPA, measured before intervention, differed from the association after intervention.

Testing (2) based on the regression line for predicting MAPA, given CAR, the p-value is .011 with leverage points retained compared to .005 when leverage points are removed. Testing (3) and (4) with leverage points retained, the corresponding p-values are .003 and .673. So the results indicate that the intercepts differ, but no significant difference between the slopes is found. Removing leverage points and again testing (3) and (4), the p-values are now .021 and .061. So again the slopes are not significantly different at the .05 level, but when testing (4), the p-value is substantially different compared to when the leverage points are retained. Testing (5) indicates that the regression lines cross somewhere between CAR equal to and. But CAR equal to 5.62 falls well outside the range of observed CAR values.

From a substantive point of view, among participants whose cortisol levels increase after awakening, MAPA scores tend to be higher after intervention. For those whose cortisol levels decrease, there is no indication that intervention improves MAPA scores.
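For reference, the covariate value at which two fitted lines cross follows from setting $\hat{Y}_1 = \hat{Y}_2$ and solving for x. This is elementary algebra rather than a computation reported in the paper, and it gives only a point estimate of the crossing, not the test of (5):

$$b_{01} + b_{11}x^{*} = b_{02} + b_{12}x^{*} \quad\Longrightarrow\quad x^{*} = -\frac{b_{01} - b_{02}}{b_{11} - b_{12}} = -\frac{d_0}{d_1}, \qquad d_1 \neq 0.$$

Because $\hat{Y}_1 - \hat{Y}_2 = d_0 + d_1 x$ is linear in x, the estimated difference has one sign below $x^{*}$ and the opposite sign above it, which is why conclusions drawn from (5) can depend on the value of the covariate.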

5 Concluding Remarks

In summary, there are some obvious ways one might try to control the probability of a Type I error when there is heteroscedasticity and the goal is to compare the least squares regression lines at two different times. Simulations indicate that these methods perform reasonably well when there is homoscedasticity. When there is heteroscedasticity, there are some concerns when n is small, but they can be reduced by removing leverage points.

As noted in the introduction, if a distribution is skewed, a robust measure of location might be preferred over the mean. In this case a robust regression estimator is more appropriate than the OLS estimator. Moreover, robust estimators offer the potential of higher power when there are outliers among the dependent variable. A method for testing the hypotheses (2), (3), (4) and (5) via a robust estimator, one that does not assume asymptotic normality, is being investigated.

Finally, the R functions DregGOLS, difregols and Dancols are available for applying the methods in section 2. They are stored in the file Rallfun-24 on the first author's web page. Alternatively, these functions can be accessed via the R package WRS, which can be installed using a series of commands that are also described on this web page.

REFERENCES

Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31.

Chida, Y. & Steptoe, A. (2009). Cortisol awakening response and psychosocial factors: A systematic review and meta-analysis. Biological Psychology, 80.

Clark, F., Jackson, J., Carlson, M., Chou, C.-P., Cherry, B. J., Jordan-Marsh, M., Knight, B. G., Mandel, D., Blanchard, J., Granger, D. A., Wilcox, R. R., Lai, M. Y., White, B., Hay, J., Lam, C., Marterella, A., & Azen, S. P. (2012). Effectiveness of a lifestyle intervention in promoting the well-being of independently living older people: Results of the Well Elderly 2 Randomised Controlled Trial. Journal of Epidemiology and Community Health, 66. doi: /jech

Clow, A., Thorn, L., Evans, P. & Hucklebridge, F. (2004). The awakening cortisol response: Methodological issues and significance. Stress, 7.

Cribari-Neto, F. (2004). Asymptotic inference under heteroskedasticity of unknown form. Computational Statistics & Data Analysis, 45.

Cribari-Neto, F. & da Silva, W. B. (2011). A new heteroskedasticity-consistent covariance matrix estimator for the linear regression model. AStA Advances in Statistical Analysis. DOI /s

Cribari-Neto, F., Souza, T. C. & Vasconcellos, K. L. P. (2007). Inference under heteroskedasticity and leveraged data. Communications in Statistics: Theory and Methods, 36.

Eakman, A. M., Carlson, M. E. & Clark, F. A. (2010). The Meaningful Activity Participation Assessment: A measure of engagement in personally valued activities. International Journal of Aging & Human Development, 70.

Efron, B. & Tibshirani, R. J. (1993). An Introduction to the Bootstrap. New York: Chapman & Hall.

Godfrey, L. G. (2006). Tests for regression models with heteroskedasticity of unknown form. Computational Statistics & Data Analysis, 50.

Heritier, S., Cantoni, E., Copt, S. & Victoria-Feser, M.-P. (2009). Robust Methods in Biostatistics. New York: Wiley.

Hinkley, D. V. (1977). Jackknifing in unbalanced situations. Technometrics, 19.

Hoaglin, D. C. (1985). Summarizing shape numerically: The g-and-h distribution. In D. C. Hoaglin, F. Mosteller & J. W. Tukey (Eds.), Exploring Data Tables, Trends, and Shapes. New York: Wiley.

Jackson, J., Mandel, D., Blanchard, J., Carlson, M., Cherry, B., Azen, S., Chou, C.-P., Jordan-Marsh, M., Forman, T., White, B., Granger, D., Knight, B., & Clark, F. (2009). Confronting challenges in intervention research with ethnically diverse older adults: The USC Well Elderly II trial. Clinical Trials, 6.

Johnson, P. O. & Neyman, J. (1936). Tests of certain linear hypotheses and their application to some educational problems. Statistical Research Memoirs, 1.

Long, J. S. & Ervin, L. H. (2000). Using heteroscedasticity consistent standard errors in the linear regression model. American Statistician, 54.

Peng, H., Wang, S. & Wang, X. (2008). Consistency and asymptotic distribution of the Theil-Sen estimator. Journal of Statistical Planning and Inference, 138.

Rousseeuw, P. J. & Leroy, A. M. (1987). Robust Regression & Outlier Detection. New York: Wiley.

Sen, P. K. (1968). Estimates of the regression coefficient based on Kendall's tau. Journal of the American Statistical Association, 63.

Staudte, R. G. & Sheather, S. J. (1990). Robust Estimation and Testing. New York: Wiley.

Theil, H. (1950). A rank-invariant method of linear and polynomial regression analysis. Indagationes Mathematicae, 12.

White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48.

Wilcox, R. R. (2012). Introduction to Robust Estimation and Hypothesis Testing, 3rd Ed. San Diego, CA: Academic Press.
