COMPARING ROBUST REGRESSION LINES ASSOCIATED WITH TWO DEPENDENT GROUPS WHEN THERE IS HETEROSCEDASTICITY


Rand R. Wilcox
Dept of Psychology
University of Southern California

Florence Clark
Division of Occupational Science & Occupational Therapy
University of Southern California

January 8,

ABSTRACT

The paper deals with three approaches to comparing the regression lines corresponding to two dependent groups when using a robust estimator. The focus is on the Theil-Sen estimator, with some comments about alternative estimators that might be used. The first approach is to test the global hypothesis that the two groups have equal intercepts and slopes in a manner that allows a heteroscedastic error term. The second approach is to test the hypothesis of equal intercepts, ignoring the slopes, and to test the hypothesis of equal slopes, ignoring the intercepts. The third approach is to test the hypothesis that the regression lines differ at a specified design point. This last goal corresponds to the classic Johnson and Neyman method when dealing with independent groups and when using the ordinary least squares regression estimator. Extant studies suggest how to proceed in a manner that might provide reasonably accurate control over the Type I error probability: use some type of percentile bootstrap method. (Methods that assume the regression estimator is asymptotically normal were not considered for reasons reviewed in the paper.) But there are no simulation results providing some sense of how well such methods perform when dealing with a relatively small sample size. Data from the Well Elderly II study are used to illustrate that the choice between the ordinary least squares estimator and the Theil-Sen estimator can make a practical difference.

Keywords: analysis of covariance, bootstrap methods, heteroscedasticity, Well Elderly II study.

1 Introduction

Consider four random variables $\epsilon_1$, $\epsilon_2$, $X_1$ and $X_2$, where both $(\epsilon_1, \epsilon_2)$ and $(X_1, X_2)$ have some unknown bivariate distribution. Assume that

$$Y_j = \beta_{0j} + \beta_{1j} X_{1j} + \lambda(X_{1j})\epsilon_j, \tag{1}$$

where $\beta_{kj}$ ($k = 0, 1$; $j = 1, 2$) are unknown parameters, $\lambda(X_{1j})$ is some unknown function that models heteroscedasticity and $\sigma_j^2$ is the population variance associated with $\epsilon_j$. It is

assumed that $\epsilon_j$ has a median of zero. The paper considers the problem of testing

$$H_0: (\beta_{01}, \beta_{11}) = (\beta_{02}, \beta_{12}) \tag{2}$$

when using the robust regression estimator derived by Theil (1950) and Sen (1968). In principle, any robust estimator can be used with the approach taken here, and a few results based on alternative estimators (including a quantile regression estimator and the MM-estimator) are reported. But because there are so many robust estimators, considering all possible choices amounts to a monumental task. The goal here is a more modest one: use simulations to determine whether reasonably good control over the probability of a Type I error can be achieved for some estimator that is reasonably robust when the sample size is relatively small. Another goal is to use data from an actual study to determine whether the robust method studied here can make a practical difference, compared to using a least squares regression estimator, when the sample size is reasonably large.

The paper also reports results on methods aimed at testing

$$H_0: \beta_{01} = \beta_{02}, \tag{3}$$

$$H_0: \beta_{11} = \beta_{12}, \tag{4}$$

and

$$H_0: M(Y_1 \mid X) = M(Y_2 \mid X), \tag{5}$$

where $M(Y \mid X)$ indicates some conditional measure of location given $X$. For the case of independent groups, (5) corresponds to the classic Johnson and Neyman (1936) method when least squares regression is used, there is homoscedasticity, and the error term has a normal distribution.

Testing the hypotheses just described arose in connection with the Well Elderly II study (Jackson et al., 2009), which motivated this paper. A general goal in the Well Elderly II study was to assess the efficacy of an intervention strategy aimed at improving the physical and emotional health of older adults. A portion of the study was aimed at understanding the impact of intervention on a measure of meaningful activities as measured by the Meaningful Activity Participation Assessment (MAPA) instrument (Eakman et al., 2010).
Higher MAPA scores reflect greater activity satisfaction. A covariate of interest was the cortisol awakening

response (CAR), which is defined as the change in cortisol concentration that occurs during the first hour after waking from sleep. (CAR is taken to be the cortisol level after the participants were awake for about an hour or less minus the level of cortisol upon awakening.) Extant studies (e.g., Clow et al., 2004; Chida & Steptoe, 2009) indicate that various forms of stress are associated with the CAR. An issue was whether the association between the CAR and MAPA, measured before intervention, differed from the association after intervention.

In an unpublished paper, Wilcox and Clark (2013) studied heteroscedastic methods for testing the above hypotheses using the ordinary least squares (OLS) estimator. However, in the Well Elderly study, boxplots indicated that both CAR and MAPA have skewed distributions with outliers. Simple transformations (e.g., taking logs or using a Box-Cox transformation) did not deal effectively with this concern: the distributions remained skewed with outliers. It is well known that even a single outlier can result in a misleading summary of an association when using the OLS estimator (e.g., Hampel et al., 1986; Huber & Ronchetti, 2009; Staudte & Sheather, 1990; Wilcox, 2012). Outliers among the dependent variable can impact power as well. Consequently, there was interest in using a robust regression estimator, but there was concern regarding the lack of information about the finite sample properties of methods that might be used. Moreover, it seems that there are no published results dealing with a heteroscedastic error term when comparing the regression parameters of dependent groups via some robust estimator.

The strategies used here differ in obvious ways from the approach used by Wilcox and Clark. For example, when testing (2) using the OLS estimator, Wilcox and Clark used a version of Hotelling's $T^2$ test statistic coupled with a bootstrap estimate of the relevant standard errors and covariances.
One strategy is to mimic this approach here, using in part a bootstrap estimate of the standard errors and relevant covariances. However, any method that assumes that the Theil-Sen estimator is asymptotically normal appears to be ill-advised. The reason is that Peng et al. (2008) established that the slope estimator may or may not be asymptotically normal. Here, a percentile bootstrap method is used with a p-value computed based on the projection depth of the null vector in the bootstrap cloud. For the other hypotheses, again a percentile bootstrap method is used. A basic percentile bootstrap appears to perform well when using robust regression estimators in general when the goal is to compare independent groups, in contrast to situations where hypotheses about population

means are of interest (Wilcox, 2012). So an obvious speculation is that a percentile bootstrap method will perform reasonably well for the situation at hand, but there are no published results indicating the extent to which this is the case.

2 Description of the Methods

This section describes the Theil-Sen estimator and reviews some of its properties. This is followed by a description of the bootstrap methods that are used to test (2) - (5).

2.1 The Theil-Sen Estimator

Let $(Y_1, X_1), \ldots, (Y_n, X_n)$ be a random sample from some unknown bivariate distribution. Assuming that $X_l \ne X_m$ for any $l < m$, let

$$b_{lm} = \frac{Y_l - Y_m}{X_l - X_m}, \quad 1 \le l < m \le n.$$

The Theil-Sen estimate of the slope, say $\hat{\beta}_1$, is taken to be the usual sample median based on the $b_{lm}$ values. The intercept, $\beta_0$, is typically estimated with $\hat{\beta}_0 = M_y - \hat{\beta}_1 M_x$, where $M_y$ is the usual sample median based on $Y_1, \ldots, Y_n$ and $M_x$ is the usual sample median based on $X_1, \ldots, X_n$. This will be called the TS estimator henceforth. Its mean squared error and small-sample efficiency compare well to the OLS estimator as well as other robust estimators that have been derived (Dietz, 1987; Wilcox, 1998). Dietz (1989) established that its asymptotic breakdown point is approximately .29. Roughly, about 29% of the points must be changed in order to make the estimate of the slope arbitrarily large or small. Other asymptotic properties have been studied by Wang (2005) and Peng et al. (2008). This is not to suggest that it dominates in any way all of the other robust estimators that have been derived. All indications are that no single estimator dominates. The only point is that a reasonable estimator has been chosen that has been studied extensively.

It is noted that several strategies have been proposed regarding how the Theil-Sen estimator might be generalized to more than one predictor (Wilcox, 2012, section 10.2). Because the focus here is on a single predictor, the details are omitted.
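As a minimal sketch of the estimator just described, the following Python function computes the pairwise slopes, takes their median, and then forms the intercept from the marginal medians. The function name `theil_sen` is illustrative and is not one of the R functions used in the paper; skipping pairs with tied X values is an assumption made here so the sketch also runs on data with ties, whereas the text assumes there are none.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Return (intercept, slope) for the Theil-Sen (TS) estimator.

    The slope is the median of all pairwise slopes; the intercept is
    median(y) - slope * median(x).
    """
    slopes = [
        (y[l] - y[m]) / (x[l] - x[m])
        for l, m in combinations(range(len(x)), 2)
        if x[l] != x[m]  # assumption: pairs with tied X values are skipped
    ]
    slope = median(slopes)
    intercept = median(y) - slope * median(x)
    return intercept, slope


# One gross outlier in y (100) leaves the fit on the line y = 1 + 2x.
print(theil_sen([1, 2, 3, 4, 5], [3, 5, 7, 9, 100]))  # (1.0, 2.0)
```

In this small example, six of the ten pairwise slopes equal 2, so the median slope is unaffected by the outlier, which illustrates the robustness property discussed above.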

2.2 The Hypothesis Testing Techniques

First consider testing (2). A version of a basic percentile bootstrap is applied as follows. (For relevant theoretical details, see Liu & Singh, 1997.) Let $(X_{i1}, Y_{i1}, X_{i2}, Y_{i2})$, $i = 1, \ldots, n$, be a random sample of pairs of observations taken at two different times. The resulting estimates of the slopes at times 1 and 2 are denoted by $b_{11}$ and $b_{12}$, and the estimates of the intercepts are $b_{01}$ and $b_{02}$. Begin by resampling with replacement $n$ vectors of observations from $(X_{i1}, Y_{i1}, X_{i2}, Y_{i2})$, yielding $(X^*_{i1}, Y^*_{i1}, X^*_{i2}, Y^*_{i2})$, and let $b^*_{11}$ and $b^*_{12}$, and $b^*_{01}$ and $b^*_{02}$, be the resulting estimates of the slopes and intercepts, respectively, based on this bootstrap sample. Let $d^*_j = b^*_{j1} - b^*_{j2}$ ($j = 0, 1$). Repeat this process $B$ times, yielding $d^*_{jb}$ ($b = 1, \ldots, B$). In terms of controlling the probability of a Type I error, $B = 500$ seems to be adequate for a range of hypothesis testing methods based on robust estimators (Wilcox, 2012). Consequently, $B = 500$ was used here. But in terms of power, a larger choice for $B$ might have practical value (e.g., Jöckel, 1986; Racine & MacKinnon, 2007).

Let $V$ be the sample covariance matrix based on the $d^*_{jb}$ values. That is,

$$v_{jk} = \frac{1}{B-1} \sum_{b=1}^{B} (d^*_{jb} - \bar{d}^*_j)(d^*_{kb} - \bar{d}^*_k),$$

where $\bar{d}^*_j = \sum_b d^*_{jb}/B$. Let $d^*_b = (d^*_{0b}, d^*_{1b})$ and let

$$D_b^2 = (d^*_b - \bar{d}^*) V^{-1} (d^*_b - \bar{d}^*)'$$

be the squared Mahalanobis distance of $d^*_b$ from $\bar{d}^*$, where $\bar{d}^* = (\bar{d}^*_0, \bar{d}^*_1)$. Let $D_0$ be the Mahalanobis distance of the null vector. Then from general theoretical results in Liu and Singh (1997), a p-value is given by

$$\frac{1}{B} \sum_{b=1}^{B} I_{D_0 < D_b},$$

where the indicator function $I_{D_0 < D_b} = 1$ if $D_0 < D_b$, otherwise $I_{D_0 < D_b} = 0$. The strategy for computing a p-value, just described, has been used in a variety of settings where the goal is to test some global hypothesis associated with some multivariate distribution.
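The Mahalanobis-distance version of the bootstrap p-value just described can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the names `theil_sen` and `global_pvalue` are hypothetical, tied X values within a bootstrap sample are skipped when forming pairwise slopes, and a singular covariance matrix simply raises an error. It is not the R implementation used for the results reported here.

```python
import random
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Return (intercept, slope); pairs with tied X are skipped (assumption)."""
    slopes = [(y[l] - y[m]) / (x[l] - x[m])
              for l, m in combinations(range(len(x)), 2) if x[l] != x[m]]
    slope = median(slopes)
    return median(y) - slope * median(x), slope


def global_pvalue(data, B=500, seed=0):
    """Percentile bootstrap p-value for H0: equal intercepts and slopes.

    data: list of (x1, y1, x2, y2) tuples, one per participant.
    """
    rng = random.Random(seed)
    n = len(data)
    diffs = []
    for _ in range(B):
        # Resample whole vectors so the dependence between times is kept.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        x1, y1, x2, y2 = zip(*sample)
        b01, b11 = theil_sen(x1, y1)
        b02, b12 = theil_sen(x2, y2)
        diffs.append((b01 - b02, b11 - b12))
    # Mean vector and 2-by-2 sample covariance matrix of the differences.
    m0 = sum(d[0] for d in diffs) / B
    m1 = sum(d[1] for d in diffs) / B
    v00 = sum((d[0] - m0) ** 2 for d in diffs) / (B - 1)
    v11 = sum((d[1] - m1) ** 2 for d in diffs) / (B - 1)
    v01 = sum((d[0] - m0) * (d[1] - m1) for d in diffs) / (B - 1)
    det = v00 * v11 - v01 ** 2
    if det <= 0:
        raise ValueError("singular bootstrap covariance matrix")

    def dist2(p):
        # Squared Mahalanobis distance using the explicit 2-by-2 inverse.
        a, c = p[0] - m0, p[1] - m1
        return (v11 * a * a - 2 * v01 * a * c + v00 * c * c) / det

    d0 = dist2((0.0, 0.0))  # distance of the null vector
    return sum(d0 < dist2(d) for d in diffs) / B
```

Calling `global_pvalue(data)` with a list of `(x1, y1, x2, y2)` tuples returns the proportion of bootstrap points that lie farther from the center of the bootstrap cloud than the null vector does.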
Generally, this approach appears to perform well, in terms of controlling the Type I error probability, when using some multivariate estimator that is reasonably robust (Wilcox, 2012). However, Wilcox notes that situations are encountered where this is not

the case. More precisely, situations are encountered where the sample covariance matrix associated with the bootstrap sample, the matrix $V$ in the present context, is singular. Such situations seem to be rare, but to guard against this possibility here, an alternative measure of the distance of a point from the center of the bootstrap cloud is used that avoids this problem. The approach is a variation of the projection distance discussed by Donoho and Gasko (1992), which is described in Wilcox (2012, section 6.2.5). The complete computational details are not given here. Rather, a rough outline of the method is described.

Consider any $d^*_m$, where $1 \le m \le B$. The distance of $d^*_m$ from the center of the cloud is computed as follows. For each $b$ ($b = 1, \ldots, B$), project the points $d^*_1, \ldots, d^*_B$ onto the line connecting $d^*_b$ and the center of the bootstrap cloud. The center of the cloud can be estimated in many ways. Here, for simplicity, the marginal medians are used. Next, standardize the projected distance of $d^*_m$ from the center by dividing by some robust measure of scale based on the projected points. Here the interquartile range is used. Do this for each $b = 1, \ldots, B$. Then the projection distance of $d^*_m$ is the maximum value among the resulting $B$ standardized distances. Here, these distances were computed with the R function pdis, which is part of the R-Forge package WRS.

For (3) and (4), a basic percentile bootstrap method is used. For fixed $j$, let $\hat{P}$ be the proportion of times $d^*_{jb}$ is negative among the $B$ bootstrap samples. Then a p-value when testing (4) and (3), which correspond to $j = 1$ and $0$, respectively, is $2\min(\hat{P}, 1 - \hat{P})$.

As for (5), proceed as was done when testing (4) and (3), only now $\hat{P}$ is taken to be the proportion of times $\hat{Y}^*_1 < \hat{Y}^*_2$ for a given value of $X$, where $\hat{Y}^*_j$ is the estimate of $M(Y \mid X)$ for the $j$th group based on a bootstrap sample.

3 Simulation Results

Simulations were used to study the small-sample properties of the methods in section 2 when testing at the .05 level. The sample sizes considered were 20 and 40. Some additional simulations were run with $n = 200$ as a partial check on the R functions that were used to apply the methods. Estimated Type I error probabilities, $\hat{\alpha}$, were based on 2000 replications.

Although the seriousness of a Type I error depends on the situation, Bradley (1978) has

suggested that as a general guide, when testing at the .05 level, at a minimum the actual level should be between .025 and .075. Using results in Pratt (1968) to compute a (two-sided) .95 confidence interval for the actual level, based on 2000 replications, the hypothesis that the actual level is .075 would be rejected if the estimated level is less than or equal to .063. As will be seen, the estimated levels are less than .050 among all of the situations considered. Similarly, if the estimated level is less than or equal to .018, reject the hypothesis that the actual level is .025. If the actual level is less than or equal to .075, the margin of error is about .011 or less.

Serlin (2000) provides a more detailed analysis of how many replications should be used in a simulation study. Of course, power is a function in part of how small a difference one wants to detect. If, for example, a method is considered level robust if the actual Type I error probability is between .04 and .06 when testing at the .05 level, and if the goal is to have power equal to .8 when in fact the actual level satisfies Bradley's criterion, results derived in Serlin's paper indicate using over 13,000 replications. Serlin also notes that in various situations, such a large number of replications can be impractical in terms of execution time. When studying more traditional methods, such as least squares regression, 13,000 replications is a realistic number. But when combining bootstrap methods with the robust regression estimators used here, high execution time becomes a serious issue. For example, when testing (2), a single simulation based on replications would require about 30 hours of execution time on a 2.7 GHz MacBook Pro. No attempt is made to resolve the issue of how close the actual level should be to the nominal level. This goes beyond the scope of this paper. Rather, the goal is to characterize the accuracy of the simulations reported here based on 2000 replications.
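The rejection thresholds quoted above can be checked approximately with a simple calculation. The sketch below uses the ordinary normal approximation to the binomial rather than the Pratt (1968) approximation used in the paper, so it is an assumption-laden check and not a reproduction of the original computation; the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def half_width(level, reps, conf=0.95):
    """Normal-approximation half-width for an estimated rejection rate."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return z * sqrt(level * (1 - level) / reps)


def rejection_threshold(level, reps, conf=0.95):
    """Estimated level at or below which H0: actual level = `level` is rejected."""
    return level - half_width(level, reps, conf)


print(round(rejection_threshold(0.075, 2000), 3))  # 0.063
print(round(rejection_threshold(0.025, 2000), 3))  # 0.018
```

Under this approximation, the two thresholds agree with the values .063 and .018 given in the text to three decimal places.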
The results reported in this section strongly indicate that the actual level will not exceed .075 when testing at the .05 level. The reason is that all estimates were found to be significantly less than .075, so a Type II error regarding the hypothesis that the actual level is greater than or equal to .075 was not made. As for situations where the actual level might be less than .025, power is an issue. If, for example, the actual level is .02, power is .41 (based on a one-sided test at the .045 level where the hypothesis is that the level is greater than .025). If the actual level is .017, power is .79.

Four types of marginal distributions were used: normal, symmetric and heavy-tailed,

asymmetric and light-tailed, and asymmetric and heavy-tailed. More precisely, the marginal distributions were taken to be one of four g-and-h distributions (Hoaglin, 1985) that contain the standard normal distribution as a special case. If $Z$ has a standard normal distribution, then

$$W = \begin{cases} \dfrac{\exp(gZ) - 1}{g} \exp(hZ^2/2), & \text{if } g > 0 \\ Z \exp(hZ^2/2), & \text{if } g = 0 \end{cases}$$

has a g-and-h distribution, where $g$ and $h$ are parameters that determine the first four moments. The four distributions used here were the standard normal ($g = h = 0.0$), a symmetric heavy-tailed distribution ($h = 0.2$, $g = 0.0$), an asymmetric distribution with relatively light tails ($h = 0.0$, $g = 0.2$), and an asymmetric distribution with heavy tails ($g = h = 0.2$). Table 1 shows the skewness ($\kappa_1$) and kurtosis ($\kappa_2$) for each distribution. Additional properties of the g-and-h distribution are summarized by Hoaglin (1985).

Table 1: Some properties of the g-and-h distribution (columns: $g$, $h$, $\kappa_1$, $\kappa_2$).

The correlation among the four variables was taken to be $\rho = 0$ or .5. (The R function rmul in Wilcox, 2012, was used to generate data.)

Three choices for $\lambda$ were used: $\lambda(X) = 1$, $\lambda(X) = |X_1| + 1$ and $\lambda(X) = 1/(|X_1| + 1)$. For convenience, these three choices are denoted by variance patterns (VP) 1, 2, and 3. As is evident, VP 1 corresponds to the usual homoscedasticity assumption. These variance patterns have been used in a number of studies that are summarized in Wilcox (2012, Ch. 11). (The final section comments on the general problem of modeling heteroscedasticity.)

Table 2 summarizes the simulation results when testing (2) at the .05 level and the sample size is $n = 20$. In Table 2 the estimates range between .011 and .040. So all indications are that the actual level never exceeds the nominal .05 level, but situations are encountered where it drops below .025. Note that the lowest estimates occur for VP 3 when $\rho = 0$ and sampling is from a heavy-tailed distribution ($h = .2$). Increasing the sample size to $n = 40$,

the estimates for these two situations are .016 when $g = 0$ and .013 when $g = .2$. As will be seen, the methods for testing (3), (4) and (5) perform better in these same situations.

Table 2: Estimated Type I error probability when testing (2), $\alpha = .05$, $n = 20$ (columns: $g$, $h$, VP, $\rho = 0$, $\rho = .5$).

Simulations were run again, when testing (2), with the Theil-Sen estimator replaced by the MM-estimator derived by Yohai (1987), but the results were virtually the same as when using the Theil-Sen estimator. Using the least trimmed squares estimator made matters worse. Switching to the quantile regression estimator derived by Koenker and Bassett (1978) resulted in improved control over the Type I error probability when there is homoscedasticity. For example, for normal distributions the estimated Type I error probability was .056 with $\rho = 0$ compared to .027 when using Theil-Sen. For $h = .2$, the estimate based on the quantile regression estimator was .035 compared to .017 using Theil-Sen. However, for VP 2, control over the Type I error probability was less satisfactory. Under normality, for example, with $\rho = 0$, the estimate was .077. For $(g, h) = (.2, 0)$, the estimate was .078.

Table 3 shows the results when testing (3) and (4). Again the estimated probability of a Type I error is always less than the nominal level, but situations are encountered where the estimate is less than .025. This is particularly the case for VP 2 when testing (3). Increasing the sample size to $n = 40$, the lowest estimate when testing (3) is now .028 and the lowest

estimate when testing (4) is .036. So with $n \ge 40$ all indications are that Bradley's criterion is met.

Table 3: Estimates of $\alpha$ when testing (3) and (4), $n = 20$, $\alpha = .05$ (columns: $g$, $h$, VP, then $\beta_0$ and $\beta_1$ for $\rho = 0$, and $\beta_0$ and $\beta_1$ for $\rho = .5$).

Table 4 shows the estimated Type I error probabilities when testing (5). Three choices for $X$ were used: $q_1$, $q_2$ and $q_3$, which correspond to the estimated lower, middle and upper quartiles based on $X_{11}, \ldots, X_{n1}$. Again, estimated Type I error probabilities are always less than the nominal .05 level. The only difficulty is that the estimates drop below .025 in some situations. Increasing $n$ to 40 generally corrects this problem. For example, for $g = 0$, $h = .2$, $\rho = .5$ and VP 2, the estimates corresponding to $q_1$, $q_2$ and $q_3$ are .025, .032 and .031. The only exception was for $g = h = .2$, $\rho = .5$ and VP 2: the estimates were .024, .028 and

4 An Illustration

As indicated in the introduction, the motivation for this paper stems from the Well Elderly study. Extant papers indicate that there is an association between the CAR and various

measures of psychological stress and well being. But little is known about the impact of intervention on the association between cortisol and psychological measures of interest.

Table 4: Estimates of $\alpha$ when testing (5), $n = 20$, $\alpha = .05$ (columns: $g$, $h$, VP, then $q_1$, $q_2$, $q_3$ for $\rho = 0$, and $q_1$, $q_2$, $q_3$ for $\rho = .5$).

Both CAR and MAPA were measured before intervention and after six months of intervention. Before intervention, the sample size was 328. Eliminating all participants with missing values after intervention, the sample size was $n = 216$. If CAR is taken to be the dependent variable, the test of (2) with the method in this paper yields $p = .024$. In contrast, using OLS as in Wilcox and Clark (2013), $p = .28$, the only point being that the choice of method can make a practical difference due to the impact of outliers on the least squares estimator. Testing (3), $p = .09$, and for the test of (4), $p = .007$. For the test of (5) based on the Theil-Sen estimator, given that MAPA = 59, $p = .036$, suggesting that for MAPA $\ge$ 59, and after intervention, CAR tends to be more negative compared to the CAR prior to intervention. That is, among participants with higher MAPA scores, cortisol tends to increase more after intervention compared to increases prior to intervention. Indeed, prior to intervention, no association between the CAR and MAPA was found.

It seems fair to say that a common practice is to assume a straight line provides a

reasonably accurate approximation of the true regression line. However, look at Figure 1, which shows an approximation of the two regression lines using the smoother derived by Cleveland (1979) and later extended by Cleveland and Devlin (1988). The method is popularly known as LOESS and is designed to deal with curvature in a reasonably flexible manner. In Figure 1, the solid line corresponds to measures taken before intervention, which appears to be approximately straight. However, after intervention, there appears to be a distinct bend close to where MAPA is equal to 70. Testing the hypothesis that the regression line is straight, using the method in Wilcox (2012, section ), the p-value is $p = .04$. If possible curvature is taken into account by focusing on only the participants with MAPA less than 70, the test of (2) now has $p = .01$ and the Wilcox-Clark method yields $p = .16$. The test of (3), using the method in this paper, now has $p = .037$, and for the test of (4), $p = .01$. So taking curvature into account had only a small impact when testing (4), but when testing (3) the p-value dropped from .09 to .037. Also, ignoring curvature, neither of the slopes differs significantly from zero when testing at the .05 level. After intervention, $p = .057$. Focusing only on the data satisfying MAPA less than 70, p =

5 Concluding Remarks

In summary, there is an obvious speculation about how one might control the probability of a Type I error when testing the hypotheses considered in the paper. Simulations indicate that these methods perform reasonably well, in terms of avoiding an actual level above the nominal level, when $n = 20$. The main difficulty is that there are situations where the estimate drops below .025 when testing at the .05 level. Generally, with $n = 40$ this problem is corrected. An exception occurs for VP 3 when testing (2) and when sampling from a heavy-tailed distribution.
Perhaps some variation of the methods used here is better able to deal with this situation, but this remains to be determined. Although the variance patterns used here have been used in other studies, evidently there are no comprehensive empirical investigations that help characterize the degree of heteroscedasticity that might be encountered in practice. Perhaps more severe amounts of heteroscedasticity occur in practice that would alter the conclusions reported here. It is evident that dealing with this issue in a satisfactory manner is difficult at best.
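For readers who wish to explore other degrees of heteroscedasticity, the data-generating ingredients of the simulations in section 3 (the g-and-h transformation and the three variance patterns) can be sketched as follows. The names `g_and_h`, `VARIANCE_PATTERNS` and `simulate_y` are hypothetical and are not part of the WRS package; the sketch does not reproduce the correlation structure generated with rmul.

```python
import math


def g_and_h(z, g=0.0, h=0.0):
    """Transform a standard normal value z into a g-and-h value."""
    tail = math.exp(h * z * z / 2)
    if g > 0:
        return (math.exp(g * z) - 1) / g * tail
    return z * tail


# Variance patterns (VP) 1, 2 and 3 from section 3.
VARIANCE_PATTERNS = {
    1: lambda x: 1.0,                 # homoscedasticity
    2: lambda x: abs(x) + 1,          # error spread grows with |x|
    3: lambda x: 1 / (abs(x) + 1),    # error spread shrinks with |x|
}


def simulate_y(x, eps, vp, beta0=0.0, beta1=0.0):
    """Generate Y = beta0 + beta1 * x + lambda(x) * eps for one observation."""
    return beta0 + beta1 * x + VARIANCE_PATTERNS[vp](x) * eps
```

Replacing an entry of `VARIANCE_PATTERNS` with a more extreme function of `x` is one simple way to examine more severe heteroscedasticity than the patterns considered here.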

Figure 1: The estimated regression lines for predicting CAR given MAPA. The solid line is the regression line prior to intervention. Points associated with the group prior to intervention are indicated by o; points after intervention are indicated by a different symbol.

There are numerous robust regression estimators beyond the one considered here (e.g., Wilcox, 2012). Some additional simulations were run using the MM-estimator derived by Yohai (1987) as well as the least trimmed squares estimator. All indications are that the Type I error probability is controlled about as well as indicated here when using the MM-estimator, but a more detailed study is needed. Using the least trimmed squares estimator via the R package robustbase made matters worse.

Here it is assumed that the regression lines are reasonably straight. But it is suggested that this should not be taken for granted, as was illustrated in section 4.

Finally, the R functions DregG, difreg and Dancts apply the methods in section 2 and have been added to the R-Forge package WRS.

REFERENCES

Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31,
Chida, Y. & Steptoe, A. (2009). Cortisol awakening response and psychosocial factors: A systematic review and meta-analysis. Biological Psychology, 80,
Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association, 74,
Cleveland, W. S. & Devlin, S. J. (1988). Locally-weighted regression: An approach to regression analysis by local fitting. Journal of the American Statistical Association, 83,
Clow, A., Thorn, L., Evans, P. & Hucklebridge, F. (2004). The awakening cortisol response: Methodological issues and significance. Stress, 7,
Donoho, D. L. & Gasko, M. (1992). Breakdown properties of the location estimates based on halfspace depth and projected outlyingness. Annals of Statistics, 20,
Eakman, A. M., Carlson, M. E. & Clark, F. A. (2010). The meaningful activity participation assessment: A measure of engagement in personally valued activities. International Journal of Aging Human Development, 70,

Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. & Stahel, W. A. (1986). Robust Statistics. New York: Wiley.
Heritier, S., Cantoni, E., Copt, S. & Victoria-Feser, M.-P. (2009). Robust Methods in Biostatistics. New York: Wiley.
Hinkley, D. V. (1977). Jackknifing in unbalanced situations. Technometrics, 19,
Hoaglin, D. C. (1985). Summarizing shape numerically: The g-and-h distribution. In D. Hoaglin, F. Mosteller & J. Tukey (Eds.), Exploring Data Tables Trends and Shapes. New York: Wiley, pp
Huber, P. J. & Ronchetti, E. (2009). Robust Statistics, 2nd Ed. New York: Wiley.
Jackson, J., Mandel, D., Blanchard, J., Carlson, M., Cherry, B., Azen, S., Chou, C.-P., Jordan-Marsh, M., Forman, T., White, B., Granger, D., Knight, B. & Clark, F. (2009). Confronting challenges in intervention research with ethnically diverse older adults: The USC Well Elderly II trial. Clinical Trials,
Jöckel, K.-H. (1986). Finite sample properties and asymptotic efficiency of Monte Carlo tests. Annals of Statistics, 14,
Johnson, P. & Neyman, J. (1936). Tests of certain linear hypotheses and their application to some educational problems. Statistical Research Memoirs, 1,
Koenker, R. & Bassett, G. (1978). Regression quantiles. Econometrica,
Peng, H., Wang, S. & Wang, X. (2008). Consistency and asymptotic distribution of the Theil-Sen estimator. Journal of Statistical Planning and Inference, 138,
Pratt, J. W. (1968). A normal approximation for binomial, F, beta, and other common, related tail probabilities, I. Journal of the American Statistical Association, 63,
Racine, J. & MacKinnon, J. G. (2007). Simulation-based tests that can use any number of simulations. Communications in Statistics - Simulation and Computation, 36,
Rousseeuw, P. J. & Leroy, A. M. (1987). Robust Regression & Outlier Detection. New York: Wiley.
Sen, P. K. (1968). Estimate of the regression coefficient based on Kendall's tau. Journal of the American Statistical Association, 63,

Serlin, R. C. (2000). Testing for robustness in Monte Carlo studies. Psychological Methods, 5,
Staudte, R. G. & Sheather, S. J. (1990). Robust Estimation and Testing. New York: Wiley.
Theil, H. (1950). A rank-invariant method of linear and polynomial regression analysis. Indagationes Mathematicae, 12,
Wang, X. Q. (2005). Asymptotics of the Theil-Sen estimator in simple linear regression models with a random covariate. Nonparametric Statistics, 17,
Wilcox, R. R. (2012). Introduction to Robust Estimation and Hypothesis Testing, 3rd Ed. San Diego, CA: Academic Press.
Wilcox, R. R. & Clark, F. (2013). Within groups comparisons of least squares regression lines when there is heteroscedasticity. Technical report, Dept of Psychology, University of Southern California.
Yohai, V. J. (1987). High breakdown point and high efficiency robust estimates for regression. Annals of Statistics, 15,
