COMPARING ROBUST REGRESSION LINES ASSOCIATED WITH TWO DEPENDENT GROUPS WHEN THERE IS HETEROSCEDASTICITY


Rand R. Wilcox
Dept of Psychology
University of Southern California

Florence Clark
Division of Occupational Science & Occupational Therapy
University of Southern California

January 8

ABSTRACT

The paper deals with three approaches to comparing the regression lines corresponding to two dependent groups when using a robust estimator. The focus is on the Theil-Sen estimator, with some comments about alternative estimators that might be used. The first approach is to test the global hypothesis that the two groups have equal intercepts and slopes in a manner that allows a heteroscedastic error term. The second approach is to test the hypothesis of equal intercepts, ignoring the slopes, and to test the hypothesis of equal slopes, ignoring the intercepts. The third approach is to test the hypothesis that the regression lines differ at a specified design point. This last goal corresponds to the classic Johnson and Neyman method when dealing with independent groups and using the ordinary least squares regression estimator. Extant studies suggest how one might proceed in a manner that provides reasonably accurate control over the Type I error probability: use some type of percentile bootstrap method. (Methods that assume the regression estimator is asymptotically normal were not considered, for reasons reviewed in the paper.) But there are no simulation results indicating how well such methods perform when the sample size is relatively small. Data from the Well Elderly II study are used to illustrate that the choice between the ordinary least squares estimator and the Theil-Sen estimator can make a practical difference.

Keywords: analysis of covariance, bootstrap methods, heteroscedasticity, Well Elderly II study.

1 Introduction

Consider four random variables $\epsilon_1$, $\epsilon_2$, $X_1$ and $X_2$, where both $(\epsilon_1, \epsilon_2)$ and $(X_1, X_2)$ have some unknown bivariate distribution. Assume that

$$Y_j = \beta_{0j} + \beta_{1j} X_j + \lambda(X_j)\,\epsilon_j, \qquad (1)$$

where $\beta_{kj}$ ($k = 0, 1$; $j = 1, 2$) are unknown parameters, $\lambda(X_j)$ is some unknown function that models heteroscedasticity, and $\sigma_j^2$ is the population variance associated with $\epsilon_j$. It is assumed that $\epsilon_j$ has a median of zero. The paper considers the problem of testing

$$H_0: (\beta_{01}, \beta_{11}) = (\beta_{02}, \beta_{12}) \qquad (2)$$

when using the robust regression estimator derived by Theil (1950) and Sen (1968). In principle, any robust estimator can be used with the approach taken here, and a few results based on alternative estimators (including a quantile regression estimator and the MM-estimator) are reported. But because there are so many robust estimators, considering all possible choices amounts to a monumental task. The goal here is a more modest one: use simulations to determine whether reasonably good control over the probability of a Type I error can be achieved, for some estimator that is reasonably robust, when the sample size is relatively small. Another goal is to use data from an actual study to determine whether the robust method studied here can make a practical difference, compared to using a least squares regression estimator, when the sample size is reasonably large. The paper also reports results on methods aimed at testing

$$H_0: \beta_{01} = \beta_{02}, \qquad (3)$$

$$H_0: \beta_{11} = \beta_{12}, \qquad (4)$$

and

$$H_0: M(Y_1 \mid X) = M(Y_2 \mid X), \qquad (5)$$

where $M(Y \mid X)$ indicates some conditional measure of location given $X$. For the case of independent groups, testing (5) corresponds to the classic Johnson and Neyman (1936) method when least squares regression is used, there is homoscedasticity, and the error term has a normal distribution. The need to test the hypotheses just described arose in connection with the Well Elderly II study (Jackson et al., 2009), which motivated this paper. A general goal in the Well Elderly II study was to assess the efficacy of an intervention strategy aimed at improving the physical and emotional health of older adults. A portion of the study was aimed at understanding the impact of intervention on meaningful activities, as measured by the Meaningful Activity Participation Assessment (MAPA) instrument (Eakman et al., 2010).
Higher MAPA scores reflect greater activity satisfaction. A covariate of interest was the cortisol awakening response (CAR), which is defined as the change in cortisol concentration that occurs during the first hour after waking from sleep. (Here, the CAR is taken to be the cortisol level measured after participants had been awake for about an hour or less, minus the cortisol level upon awakening.) Extant studies (e.g., Clow et al., 2004; Chida & Steptoe, 2009) indicate that various forms of stress are associated with the CAR. An issue was whether the association between the CAR and MAPA, measured before intervention, differed from the association after intervention.

In an unpublished paper, Wilcox and Clark (2013) studied heteroscedastic methods for testing the above hypotheses using the ordinary least squares (OLS) estimator. However, in the Well Elderly study, boxplots indicated that both CAR and MAPA have skewed distributions with outliers. Simple transformations (e.g., taking logs or using a Box-Cox transformation) did not deal effectively with this concern: the distributions remained skewed with outliers. It is well known that even a single outlier can result in a misleading summary of an association when using the OLS estimator (e.g., Hampel et al., 1986; Huber & Ronchetti, 2009; Staudte & Sheather, 1990; Wilcox, 2012). Outliers among the dependent variable can affect power as well. Consequently, there was interest in using a robust regression estimator, but there was concern regarding the lack of information about the finite-sample properties of methods that might be used. Moreover, it seems that there are no published results dealing with a heteroscedastic error term when comparing the regression parameters of dependent groups via some robust estimator. The strategies used here differ in obvious ways from the approach used by Wilcox and Clark. For example, when testing (2) using the OLS estimator, Wilcox and Clark used a version of Hotelling's $T^2$ test statistic coupled with a bootstrap estimate of the relevant standard errors and covariances.
One strategy is to mimic this approach here, using in part a bootstrap estimate of the standard errors and relevant covariances. However, any method that assumes that the Theil-Sen estimator is asymptotically normal appears to be ill advised, because Peng et al. (2008) established that the slope estimator is not necessarily asymptotically normal. Here, a percentile bootstrap method is used, with a p-value computed based on the projection depth of the null vector in the bootstrap cloud. For the other hypotheses, a percentile bootstrap method is again used. A basic percentile bootstrap appears to perform well in general when the goal is to compare independent groups via robust regression estimators, in contrast to situations where hypotheses about population means are of interest (Wilcox, 2012). So an obvious speculation is that a percentile bootstrap method will perform reasonably well for the situation at hand, but there are no published results indicating the extent to which this is the case.

2 Description of the Methods

This section describes the Theil-Sen estimator and reviews some of its properties. This is followed by a description of the bootstrap methods that are used to test (2)-(5).

2.1 The Theil-Sen Estimator

Let $(Y_1, X_1), \ldots, (Y_n, X_n)$ be a random sample from some unknown bivariate distribution. Assuming that $X_l \neq X_m$ for any $l < m$, let

$$b_{lm} = \frac{Y_l - Y_m}{X_l - X_m}, \qquad 1 \le l < m \le n.$$

The Theil-Sen estimate of the slope, say $\hat\beta_1$, is taken to be the usual sample median of the $b_{lm}$ values. The intercept, $\beta_0$, is typically estimated with $\hat\beta_0 = M_y - \hat\beta_1 M_x$, where $M_y$ is the usual sample median based on $Y_1, \ldots, Y_n$ and $M_x$ is the sample median based on $X_1, \ldots, X_n$. This will be called the TS estimator henceforth. Its mean squared error and small-sample efficiency compare well to the OLS estimator as well as to other robust estimators that have been derived (Dietz, 1987; Wilcox, 1998). Dietz (1989) established that its asymptotic breakdown point is approximately .29. Roughly, about 29% of the points must be changed in order to make the estimate of the slope arbitrarily large or small. Other asymptotic properties have been studied by Wang (2005) and Peng et al. (2008). This is not to suggest that it dominates all of the other robust estimators that have been derived; all indications are that no single estimator dominates. The only point is that a reasonable estimator has been chosen that has been studied extensively. Several strategies have been proposed regarding how the Theil-Sen estimator might be generalized to more than one predictor (Wilcox, 2012, section 10.2). Because the focus here is on a single predictor, the details are omitted.
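To make the definition above concrete, here is a minimal standard-library Python sketch of the TS estimator, shown next to an OLS fit for comparison. The function names `theil_sen` and `ols` are illustrative only and are not taken from the WRS package; pairs with tied X values are simply skipped, which is an assumption (the text avoids the issue by requiring $X_l \neq X_m$).

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Return (intercept, slope) of the Theil-Sen fit of y on x.

    Slope: usual sample median of b_lm = (y_l - y_m) / (x_l - x_m), l < m.
    Intercept: M_y - slope * M_x, with M_y and M_x the sample medians.
    Pairs with tied x values are skipped (an assumption of this sketch).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    slopes = [(y[l] - y[m]) / (x[l] - x[m])
              for l, m in combinations(range(len(x)), 2)
              if x[l] != x[m]]
    if not slopes:
        raise ValueError("need at least two distinct x values")
    b1 = median(slopes)
    return median(y) - b1 * median(x), b1


def ols(x, y):
    """Ordinary least squares (intercept, slope), for comparison only."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1


# Points on y = 2 + 3x, except the last y value (14) replaced by an outlier.
x = [0, 1, 2, 3, 4]
y = [2, 5, 8, 11, 100]
print("TS: ", theil_sen(x, y))
print("OLS:", ols(x, y))
```

With this single gross outlier, six of the ten pairwise slopes still equal 3, so the TS fit recovers intercept 2 and slope 3, whereas the OLS slope is pulled to about 20.2. The quadratic number of pairwise slopes is fine for the sample sizes considered here.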

2.2 The Hypothesis Testing Techniques

First consider testing (2). A version of a basic percentile bootstrap is applied as follows. (For relevant theoretical details, see Liu & Singh, 1997.) Let $(X_{i1}, Y_{i1}, X_{i2}, Y_{i2})$, $i = 1, \ldots, n$, be a random sample of pairs of observations taken at two different times. The resulting estimates of the slopes at times 1 and 2 are denoted by $b_{11}$ and $b_{12}$, and the estimates of the intercepts are $b_{01}$ and $b_{02}$. Begin by resampling with replacement $n$ vectors of observations from $(X_{i1}, Y_{i1}, X_{i2}, Y_{i2})$, yielding $(X^*_{i1}, Y^*_{i1}, X^*_{i2}, Y^*_{i2})$, and let $b^*_{11}$, $b^*_{12}$ and $b^*_{01}$, $b^*_{02}$ be the resulting estimates of the slopes and intercepts, respectively, based on this bootstrap sample. Let $d^*_j = b^*_{j1} - b^*_{j2}$ ($j = 0, 1$). Repeat this process $B$ times, yielding $d^*_{jb}$ ($b = 1, \ldots, B$). In terms of controlling the probability of a Type I error, $B = 500$ seems to be adequate for a range of hypothesis testing methods based on robust estimators (Wilcox, 2012). Consequently, $B = 500$ was used here. But in terms of power, a larger choice for $B$ might have practical value (e.g., Jöckel, 1986; Racine & MacKinnon, 2007).

Let $V$ be the sample covariance matrix based on the $d^*_{jb}$ values. That is, $V$ has elements

$$v_{jk} = \frac{1}{B - 1}\sum_{b=1}^{B} (d^*_{jb} - \bar d_j)(d^*_{kb} - \bar d_k),$$

where $\bar d_j = \sum_{b=1}^{B} d^*_{jb}/B$. Let $\mathbf{d}^*_b = (d^*_{0b}, d^*_{1b})$ and let

$$D_b^2 = (\mathbf{d}^*_b - \bar{\mathbf{d}})\, V^{-1} (\mathbf{d}^*_b - \bar{\mathbf{d}})'$$

be the squared Mahalanobis distance of $\mathbf{d}^*_b$ from $\bar{\mathbf{d}}$, where $\bar{\mathbf{d}} = (\bar d_0, \bar d_1)$. Let $D_0$ be the Mahalanobis distance of the null vector $(0, 0)$ from $\bar{\mathbf{d}}$, computed in the same manner. Then, from general theoretical results in Liu and Singh (1997), a p-value is given by

$$\frac{1}{B}\sum_{b=1}^{B} I_{D_0 < D_b},$$

where the indicator function $I_{D_0 < D_b} = 1$ if $D_0 < D_b$; otherwise $I_{D_0 < D_b} = 0$. The strategy for computing a p-value, just described, has been used in a variety of settings where the goal is to test some global hypothesis associated with some multivariate distribution.
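The resampling scheme above can be sketched in standard-library Python as follows. This is a minimal, hypothetical illustration: `dependent_line_tests` is not one of the WRS functions (DregG, difreg, Dancts) and is not the authors' code. It resamples whole rows so the pairing of the two time points is preserved, refits Theil-Sen in each group, and computes the Mahalanobis-based p-value for (2). Assuming the percentile rule $2\min(\hat P, 1 - \hat P)$ that this section describes for (3), (4) and (5), it also returns those p-values, with (5) evaluated at a chosen point `x0`. It does not implement the projection-distance fallback (it simply raises an error when $V$ is singular) and applies no correction for bootstrap differences exactly equal to zero.

```python
import random
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Theil-Sen fit: slope = median pairwise slope; intercept = M_y - slope * M_x."""
    slopes = [(y[l] - y[m]) / (x[l] - x[m])
              for l, m in combinations(range(len(x)), 2) if x[l] != x[m]]
    b1 = median(slopes)
    return median(y) - b1 * median(x), b1


def dependent_line_tests(x1, y1, x2, y2, B=500, x0=None, seed=None):
    """Percentile bootstrap p-values comparing two dependent Theil-Sen lines.

    Keys: 'global' (H0 (2)), 'intercept' (H0 (3)), 'slope' (H0 (4)) and,
    if x0 is given, 'at_x0' (H0 (5) at X = x0).
    """
    rng = random.Random(seed)
    n = len(x1)
    d0, d1, dx = [], [], []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]  # resample whole rows
        a0, a1 = theil_sen([x1[i] for i in idx], [y1[i] for i in idx])
        c0, c1 = theil_sen([x2[i] for i in idx], [y2[i] for i in idx])
        d0.append(a0 - c0)  # d*_0b: intercept difference
        d1.append(a1 - c1)  # d*_1b: slope difference
        if x0 is not None:  # Yhat*_1 - Yhat*_2 at X = x0
            dx.append((a0 + a1 * x0) - (c0 + c1 * x0))

    # Sample covariance matrix V of the bootstrap cloud (divisor B - 1).
    m0, m1 = sum(d0) / B, sum(d1) / B
    v00 = sum((u - m0) ** 2 for u in d0) / (B - 1)
    v11 = sum((w - m1) ** 2 for w in d1) / (B - 1)
    v01 = sum((u - m0) * (w - m1) for u, w in zip(d0, d1)) / (B - 1)
    det = v00 * v11 - v01 ** 2
    if det <= 0:
        raise ValueError("V is singular; a projection distance would be needed")

    def dist2(u, w):
        # Squared Mahalanobis distance from the cloud mean, using the
        # closed-form inverse of the 2x2 matrix V.
        u, w = u - m0, w - m1
        return (v11 * u * u - 2 * v01 * u * w + v00 * w * w) / det

    # Comparing squared distances gives the same indicator as comparing distances.
    D0 = dist2(0.0, 0.0)
    p_global = sum(D0 < dist2(u, w) for u, w in zip(d0, d1)) / B

    def percentile_p(ds):
        # P-hat = proportion of negative bootstrap differences.
        p_hat = sum(v < 0 for v in ds) / len(ds)
        return 2 * min(p_hat, 1 - p_hat)

    out = {"global": p_global,
           "intercept": percentile_p(d0),
           "slope": percentile_p(d1)}
    if x0 is not None:
        out["at_x0"] = percentile_p(dx)
    return out


gen = random.Random(1)
x = [gen.gauss(0, 1) for _ in range(40)]
y1 = [xi + gen.gauss(0, 0.5) for xi in x]      # slope 1 at time 1
y2 = [3 * xi + gen.gauss(0, 0.5) for xi in x]  # slope 3 at time 2
print(dependent_line_tests(x, y1, x, y2, B=200, x0=1.0, seed=2))
```

In this toy example the slopes differ by about 2, far outside the bootstrap variability, so the global, slope and `at_x0` p-values should all be essentially zero. Using the closed-form $2 \times 2$ inverse keeps the sketch dependency-free; with more predictors a linear-algebra library would be the natural choice.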
Generally, this approach appears to perform well, in terms of controlling the Type I error probability, when using some multivariate estimator that is reasonably robust (Wilcox, 2012). However, Wilcox notes that situations are encountered where this is not the case. More precisely, situations are encountered where the sample covariance matrix associated with the bootstrap samples, the matrix $V$ in the present context, is singular. Such situations seem to be rare, but to guard against this possibility here, an alternative measure of the distance of a point from the center of the bootstrap cloud is used that avoids this problem. The approach is a variation of the projection distance discussed by Donoho and Gasko (1992), which is described in Wilcox (2012, section 6.2.5). The complete computational details are not given here; rather, a rough outline of the method is described.

Consider any $\mathbf{d}^*_m$, where $1 \le m \le B$. The distance of $\mathbf{d}^*_m$ from the center of the cloud is computed as follows. For each $b$ ($b = 1, \ldots, B$), project the points $\mathbf{d}^*_1, \ldots, \mathbf{d}^*_B$ onto the line connecting $\mathbf{d}^*_b$ and the center of the bootstrap cloud. The center of the cloud can be estimated in many ways; here, for simplicity, the marginal medians are used. Next, standardize the projected distance of $\mathbf{d}^*_m$ from the center by dividing by some robust measure of scale based on the projected points, yielding, say, $\delta_m$. Here the interquartile range is used. Do this for each $b = 1, \ldots, B$, and label the results $\delta_{mb}$. Then the projection distance of $\mathbf{d}^*_m$ is the maximum value among $\delta_{m1}, \ldots, \delta_{mB}$. Here, these distances were computed with the R function pdis, which is part of the Forge R package WRS.

For (3) and (4), a basic percentile bootstrap method is used. For fixed $j$, let $\hat P$ be the proportion of times $d^*_{jb}$ is negative among the $B$ bootstrap samples. Then a p-value when testing (4) and (3), which correspond to $j = 1$ and $j = 0$, respectively, is $2\min(\hat P, 1 - \hat P)$. As for (5), proceed as was done when testing (3) and (4), only now $\hat P$ is taken to be the proportion of times $\hat Y^*_1 < \hat Y^*_2$ for a given value of $X$, where $\hat Y^*_j$ is the estimate of $M(Y \mid X)$ for the $j$th group based on a bootstrap sample.

3 Simulation Results

Simulations were used to study the small-sample properties of the methods in section 2 when testing at the .05 level. The sample sizes considered were 20 and 40. Some additional simulations were run with n = 200 as a partial check on the R functions that were used to apply the methods. Estimated Type I error probabilities, $\hat\alpha$, were based on 2000 replications. Although the seriousness of a Type I error depends on the situation, Bradley (1978) has suggested that, as a general guide, when testing at the .05 level the actual level should at a minimum be between .025 and .075. Using results in Pratt (1968) to compute a (two-sided) .95 confidence interval for the actual level, based on 2000 replications, the hypothesis that the actual level is .075 would be rejected if the estimated level is less than or equal to .063. As will be seen, the estimated levels are less than .050 among all of the situations considered. Similarly, if the estimated level is less than or equal to .018, reject the hypothesis that the actual level is .025. If the actual level is less than or equal to .075, the margin of error is about .011 or less. Serlin (2000) provides a more detailed analysis of how many replications should be used in a simulation study.

Of course, power is a function in part of how small a difference one wants to detect. If, for example, a method is considered level robust if the actual Type I error probability is between .04 and .06 when testing at the .05 level, and if the goal is to have power equal to .8 when in fact the actual level satisfies Bradley's criterion, results derived in Serlin's paper indicate using over 13,000 replications. Serlin also notes that in various situations, such a large number of replications can be impractical in terms of execution time. When studying more traditional methods, such as least squares regression, 13,000 replications is a realistic number. But when combining bootstrap methods with the robust regression estimators used here, high execution time becomes a serious issue. For example, when testing (2), a single simulation based on replications would require about 30 hours of execution time on a 2.7 GHz MacBook Pro. No attempt is made to resolve the issue of how close the actual level should be to the nominal level; this goes beyond the scope of this paper. Rather, the goal is to characterize the accuracy of the simulations reported here, which are based on 2000 replications.
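The cutoffs quoted above can be checked approximately with a plain normal approximation to the binomial. This sketch is not Pratt's (1968) approximation, which is what the text uses, so agreement should only be expected to about the third decimal; `level_interval` is a hypothetical helper introduced for illustration.

```python
from math import sqrt


def level_interval(level, reps=2000, z=1.96):
    """Approximate two-sided .95 interval for the estimated Type I error
    rate when the true rate is `level` and `reps` replications are used.

    Plain normal approximation to the binomial (not Pratt's method).
    """
    se = sqrt(level * (1 - level) / reps)
    return level - z * se, level + z * se


for level in (0.025, 0.075):
    lo, hi = level_interval(level)
    print(f"true level {level}: ({lo:.4f}, {hi:.4f}), margin {hi - level:.4f}")
```

With 2000 replications this gives lower limits of about .0182 for a true level of .025 and .0635 for a true level of .075, with a half-width of about .0115 at .075, in line with the .018, .063 and .011 quoted above.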
The results reported in this section strongly indicate that the actual level will not exceed .075 when testing at the .05 level. The reason is that all estimates were found to be significantly less than .075, so a Type II error regarding the hypothesis that the actual level is greater than or equal to .075 was not made. As for situations where the actual level might be less than .025, power is an issue. If, for example, the actual level is .02, power is .41 (based on a one-sided test at the .045 level where the hypothesis is that the level is greater than .025). If the actual level is .017, power is .79.

Four types of marginal distributions were used: normal, symmetric and heavy-tailed, asymmetric and light-tailed, and asymmetric and heavy-tailed. More precisely, the marginal distributions were taken to be one of four g-and-h distributions (Hoaglin, 1985) that contain the standard normal distribution as a special case. If $Z$ has a standard normal distribution, then

$$W = \begin{cases} \dfrac{\exp(gZ) - 1}{g}\,\exp(hZ^2/2), & \text{if } g > 0, \\[1ex] Z\exp(hZ^2/2), & \text{if } g = 0, \end{cases}$$

has a g-and-h distribution, where $g$ and $h$ are parameters that determine the first four moments. The four distributions used here were the standard normal ($g = h = 0.0$), a symmetric heavy-tailed distribution ($h = 0.2$, $g = 0.0$), an asymmetric distribution with relatively light tails ($h = 0.0$, $g = 0.2$), and an asymmetric distribution with heavy tails ($g = h = 0.2$). Table 1 shows the skewness ($\kappa_1$) and kurtosis ($\kappa_2$) for each distribution. Additional properties of the g-and-h distribution are summarized by Hoaglin (1985).

Table 1: Some properties of the g-and-h distribution (columns: g, h, $\kappa_1$, $\kappa_2$).

The correlation among the four variables was taken to be $\rho = 0$ or .5. (The R function rmul in Wilcox, 2012, was used to generate data.) Three choices for $\lambda$ were used: $\lambda(X) = 1$, $\lambda(X) = |X| + 1$ and $\lambda(X) = 1/(|X| + 1)$. For convenience, these three choices are denoted by variance patterns (VP) 1, 2, and 3. As is evident, VP 1 corresponds to the usual homoscedasticity assumption. These variance patterns have been used in a number of studies that are summarized in Wilcox (2012, Ch. 11). (The final section comments on the general problem of modeling heteroscedasticity.)

Table 2 summarizes the simulation results when testing (2) at the .05 level with sample size n = 20. In Table 2 the estimates range between .011 and .040. So all indications are that the actual level never exceeds the nominal .05 level, but situations are encountered where it drops below .025. Note that the lowest estimates occur for VP 3 when $\rho = 0$ and sampling is from a heavy-tailed distribution (h = .2). Increasing the sample size to n = 40, the estimates for these two situations are .016 when g = 0 and .013 when g = .2. As will be seen, the methods for testing (3), (4) and (5) perform better in these same situations.

Table 2: Estimated Type I error probability when testing (2), $\alpha = .05$, n = 20 (columns: g, h, VP, $\rho = 0$, $\rho = .5$).

Simulations were run again, when testing (2), with the Theil-Sen estimator replaced by the MM-estimator derived by Yohai (1987), but the results were virtually the same as when using the Theil-Sen estimator. Using the least trimmed squares estimator made matters worse. Switching to the quantile regression estimator derived by Koenker and Bassett (1978) resulted in improved control over the Type I error probability when there is homoscedasticity. For example, for normal distributions the estimated Type I error probability was .056 with $\rho = 0$, compared to .027 when using Theil-Sen. For h = .2, the estimate based on the quantile regression estimator was .035, compared to .017 using Theil-Sen. However, for VP 2, control over the Type I error probability was less satisfactory. Under normality, for example, with $\rho = 0$, the estimate was .077. For (g, h) = (.2, 0), the estimate was .078.

Table 3 shows the results when testing (3) and (4). Again the estimated probability of a Type I error is always less than the nominal level, but situations are encountered where the estimate is less than .025. This is particularly the case for VP 2 when testing (3). Increasing the sample size to n = 40, the lowest estimate when testing (3) is now .028 and the lowest estimate when testing (4) is .036. So with $n \ge 40$ all indications are that Bradley's criterion is met.

Table 3: Estimates of $\alpha$ when testing (3) and (4), n = 20, $\alpha = .05$ (columns: g, h, VP, then $\beta_0$ and $\beta_1$ for $\rho = 0$ and for $\rho = .5$).

Table 4 shows the estimated Type I error probabilities when testing (5). Three choices for X were used: $q_1$, $q_2$ and $q_3$, which correspond to the estimated lower, middle and upper quartiles based on $X_{11}, \ldots, X_{n1}$. Again, estimated Type I error probabilities are always less than the nominal .05 level. The only difficulty is that the estimates drop below .025 in some situations. Increasing n to 40 generally corrects this problem. For example, for g = 0, h = .2, $\rho = .5$ and VP 2, the estimates corresponding to $q_1$, $q_2$ and $q_3$ are .025, .032 and .031. The only exception was for g = h = .2, $\rho = .5$ and VP 2: the estimates were .024, .028 and

Table 4: Estimates of $\alpha$ when testing (5), n = 20, $\alpha = .05$ (columns: g, h, VP, then $q_1$, $q_2$, $q_3$ for $\rho = 0$ and for $\rho = .5$).

4 An Illustration

As indicated in the introduction, the motivation for this paper stems from the Well Elderly study. Extant papers indicate that there is an association between the CAR and various measures of psychological stress and well-being. But little is known about the impact of intervention on the association between cortisol and psychological measures of interest. Both CAR and MAPA were measured before intervention and after six months of intervention. Before intervention, the sample size was 328. After eliminating all participants with missing values after intervention, the sample size was n = 216. If CAR is taken to be the dependent variable, the test of (2) with the method in this paper yields p = .024. In contrast, using OLS as in Wilcox and Clark (2013), p = .28; the only point is that the choice of method can make a practical difference due to the impact of outliers on the least squares estimator. Testing (3), p = .09, and for the test of (4), p = .007. For the test of (5) based on the Theil-Sen estimator, given that MAPA = 59, p = .036, suggesting that for MAPA ≥ 59, and after intervention, CAR tends to be more negative compared to the CAR prior to intervention. That is, among participants with higher MAPA scores, cortisol tends to increase more after intervention compared to increases prior to intervention. Indeed, prior to intervention, no association between the CAR and MAPA was found.

It seems fair to say that a common practice is to assume a straight line provides a reasonably accurate approximation of the true regression line. However, look at Figure 1, which shows an approximation of the two regression lines using the smoother derived by Cleveland (1979) and later extended by Cleveland and Devlin (1988). The method is popularly known as LOESS and is designed to deal with curvature in a reasonably flexible manner. In Figure 1, the solid line corresponds to measures taken before intervention and appears to be approximately straight. However, after intervention, there appears to be a distinct bend close to where MAPA is equal to 70. Testing the hypothesis that the regression line is straight, using the method in Wilcox (2012, section ), the p-value is p = .04. If possible curvature is taken into account by focusing only on the participants with MAPA less than 70, the test of (2) now has p = .01 and the Wilcox-Clark method yields p = .16. The test of (3), using the method in this paper, now has p = .037, and for the test of (4), p = .01. So taking curvature into account had only a small impact when testing (4), but when testing (3) the p-value dropped from .09 to .037. Also, ignoring curvature, neither slope differs significantly from zero when testing at the .05 level; after intervention, p = .057. Focusing only on the data satisfying MAPA less than 70, p =

5 Concluding Remarks

In summary, there is an obvious speculation about how one might control the probability of a Type I error when testing the hypotheses considered in the paper. Simulations indicate that these methods perform reasonably well, in terms of avoiding an actual level above the nominal level, when n = 20. The main difficulty is that there are situations where the estimate drops below .025 when testing at the .05 level. Generally, with n = 40 this problem is corrected. An exception occurs for VP 3 when testing (2) and sampling from a heavy-tailed distribution.
Perhaps some variation of the methods used here is better able to deal with this situation, but this remains to be determined. Although the variance patterns used here have been used in other studies, evidently there are no comprehensive empirical investigations that help characterize the degree of heteroscedasticity that might be encountered in practice. Perhaps more severe heteroscedasticity occurs in practice, which would alter the conclusions reported here. Dealing with this issue in a satisfactory manner is evidently difficult at best.
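One way to explore other degrees of heteroscedasticity is to start from the data-generating design used in the simulations. The sketch below is an assumption-laden reconstruction in Python, not the R function rmul: it draws $(\epsilon_1, \epsilon_2, X_1, X_2)$ from equicorrelated standard normals with correlation $\rho \ge 0$ (via a shared factor), transforms each to a g-and-h marginal, and sets $Y_j = \lambda(X_j)\epsilon_j$, so all $\beta_{kj} = 0$ and every null hypothesis holds. Here $\rho$ is the correlation of the underlying normals; after the g-and-h transform the Pearson correlation generally differs somewhat. The names `g_and_h`, `VARIANCE_PATTERNS` and `simulate_null` are hypothetical.

```python
import math
import random


def g_and_h(z, g=0.0, h=0.0):
    """Transform a standard normal deviate z into a g-and-h deviate (g >= 0)."""
    if g > 0:
        return (math.exp(g * z) - 1) / g * math.exp(h * z * z / 2)
    return z * math.exp(h * z * z / 2)


# VP 1-3 from the simulation design; add new lambda functions here to
# study other forms of heteroscedasticity.
VARIANCE_PATTERNS = {
    1: lambda x: 1.0,
    2: lambda x: abs(x) + 1,
    3: lambda x: 1 / (abs(x) + 1),
}


def simulate_null(n, g=0.0, h=0.0, rho=0.0, vp=1, rng=None):
    """One data set (x1, y1, x2, y2) with all beta_kj = 0.

    (eps1, eps2, X1, X2) come from equicorrelated normals (correlation rho,
    0 <= rho < 1), each transformed to a g-and-h marginal.
    """
    rng = rng or random.Random()
    lam = VARIANCE_PATTERNS[vp]
    x1, y1, x2, y2 = [], [], [], []
    for _ in range(n):
        common = rng.gauss(0, 1)  # shared factor induces correlation rho
        z = [math.sqrt(rho) * common + math.sqrt(1 - rho) * rng.gauss(0, 1)
             for _ in range(4)]
        e1, e2, a, b = (g_and_h(v, g, h) for v in z)
        x1.append(a)
        x2.append(b)
        y1.append(lam(a) * e1)
        y2.append(lam(b) * e2)
    return x1, y1, x2, y2


x1, y1, x2, y2 = simulate_null(20, g=0.2, h=0.2, rho=0.5, vp=3,
                               rng=random.Random(1))
print(len(x1), min(y1), max(y1))
```

Feeding many such data sets to a test and counting rejections at the .05 level gives an estimate of its actual Type I error probability under the chosen design.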

Figure 1: The estimated regression lines for predicting CAR given MAPA (axes: MAPA, CAR). The solid line is the regression line prior to intervention. Points associated with the group prior to intervention are indicated by o; points after intervention are indicated by

There are numerous robust regression estimators beyond the one considered here (e.g., Wilcox, 2012). Some additional simulations were run using the MM-estimator derived by Yohai (1987) as well as the least trimmed squares estimator. All indications are that the Type I error probability is controlled about as well as indicated here when using the MM-estimator, but a more detailed study is needed. Using the least trimmed squares estimator via the R package robustbase made matters worse. Here it is assumed that the regression lines are reasonably straight. But this should not be taken for granted, as was illustrated in section 4. Finally, the R functions DregG, difreg and Dancts apply the methods in section 2 and have been added to the Forge R package WRS.

REFERENCES

Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31.
Chida, Y. & Steptoe, A. (2009). Cortisol awakening response and psychosocial factors: A systematic review and meta-analysis. Biological Psychology, 80.
Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association, 74.
Cleveland, W. S. & Devlin, S. J. (1988). Locally-weighted regression: An approach to regression analysis by local fitting. Journal of the American Statistical Association, 83.
Clow, A., Thorn, L., Evans, P. & Hucklebridge, F. (2004). The awakening cortisol response: Methodological issues and significance. Stress, 7.
Donoho, D. L. & Gasko, M. (1992). Breakdown properties of the location estimates based on halfspace depth and projected outlyingness. Annals of Statistics, 20.
Eakman, A. M., Carlson, M. E. & Clark, F. A. (2010). The Meaningful Activity Participation Assessment: A measure of engagement in personally valued activities. International Journal of Aging and Human Development, 70.

Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. & Stahel, W. A. (1986). Robust Statistics. New York: Wiley.
Heritier, S., Cantoni, E., Copt, S. & Victoria-Feser, M.-P. (2009). Robust Methods in Biostatistics. New York: Wiley.
Hinkley, D. V. (1977). Jackknifing in unbalanced situations. Technometrics, 19.
Hoaglin, D. C. (1985). Summarizing shape numerically: The g-and-h distribution. In D. Hoaglin, F. Mosteller & J. Tukey (Eds.), Exploring Data Tables, Trends, and Shapes. New York: Wiley.
Huber, P. J. & Ronchetti, E. (2009). Robust Statistics, 2nd Ed. New York: Wiley.
Jackson, J., Mandel, D., Blanchard, J., Carlson, M., Cherry, B., Azen, S., Chou, C.-P., Jordan-Marsh, M., Forman, T., White, B., Granger, D., Knight, B. & Clark, F. (2009). Confronting challenges in intervention research with ethnically diverse older adults: The USC Well Elderly II trial. Clinical Trials.
Jöckel, K.-H. (1986). Finite sample properties and asymptotic efficiency of Monte Carlo tests. Annals of Statistics, 14.
Johnson, P. & Neyman, J. (1936). Tests of certain linear hypotheses and their application to some educational problems. Statistical Research Memoirs, 1.
Koenker, R. & Bassett, G. (1978). Regression quantiles. Econometrica.
Peng, H., Wang, S. & Wang, X. (2008). Consistency and asymptotic distribution of the Theil-Sen estimator. Journal of Statistical Planning and Inference, 138.
Pratt, J. W. (1968). A normal approximation for binomial, F, beta, and other common, related tail probabilities, I. Journal of the American Statistical Association, 63.
Racine, J. & MacKinnon, J. G. (2007). Simulation-based tests that can use any number of simulations. Communications in Statistics: Simulation and Computation, 36.
Rousseeuw, P. J. & Leroy, A. M. (1987). Robust Regression & Outlier Detection. New York: Wiley.
Sen, P. K. (1968). Estimate of the regression coefficient based on Kendall's tau. Journal of the American Statistical Association, 63.

Serlin, R. C. (2000). Testing for robustness in Monte Carlo studies. Psychological Methods, 5.
Staudte, R. G. & Sheather, S. J. (1990). Robust Estimation and Testing. New York: Wiley.
Theil, H. (1950). A rank-invariant method of linear and polynomial regression analysis. Indagationes Mathematicae, 12.
Wang, X. Q. (2005). Asymptotics of the Theil-Sen estimator in simple linear regression models with a random covariate. Nonparametric Statistics, 17.
Wilcox, R. R. (2012). Introduction to Robust Estimation and Hypothesis Testing, 3rd Ed. San Diego, CA: Academic Press.
Wilcox, R. R. & Clark, F. (2013). Within groups comparisons of least squares regression lines when there is heteroscedasticity. Technical report, Dept of Psychology, University of Southern California.
Yohai, V. J. (1987). High breakdown point and high efficiency robust estimates for regression. Annals of Statistics, 15.


INFLUENCE OF USING ALTERNATIVE MEANS ON TYPE-I ERROR RATE IN THE COMPARISON OF INDEPENDENT GROUPS ABSTRACT Mirtagioğlu et al., The Journal of Animal & Plant Sciences, 4(): 04, Page: J. 344-349 Anim. Plant Sci. 4():04 ISSN: 08-708 INFLUENCE OF USING ALTERNATIVE MEANS ON TYPE-I ERROR RATE IN THE COMPARISON OF

More information

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions Journal of Modern Applied Statistical Methods Volume 8 Issue 1 Article 13 5-1-2009 Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error

More information

Methods for Detection of Word Usage over Time

Methods for Detection of Word Usage over Time Methods for Detection of Word Usage over Time Ondřej Herman and Vojtěch Kovář Natural Language Processing Centre Faculty of Informatics, Masaryk University Botanická 68a, 6 Brno, Czech Republic {xherman,xkovar}@fi.muni.cz

More information

9 Correlation and Regression

9 Correlation and Regression 9 Correlation and Regression SW, Chapter 12. Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then retakes the

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Robust model selection criteria for robust S and LT S estimators

Robust model selection criteria for robust S and LT S estimators Hacettepe Journal of Mathematics and Statistics Volume 45 (1) (2016), 153 164 Robust model selection criteria for robust S and LT S estimators Meral Çetin Abstract Outliers and multi-collinearity often

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

Introduction and Single Predictor Regression. Correlation

Introduction and Single Predictor Regression. Correlation Introduction and Single Predictor Regression Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning Correlation A correlation

More information

Supplementary Material for Wang and Serfling paper

Supplementary Material for Wang and Serfling paper Supplementary Material for Wang and Serfling paper March 6, 2017 1 Simulation study Here we provide a simulation study to compare empirically the masking and swamping robustness of our selected outlyingness

More information

Introduction to Robust Statistics. Elvezio Ronchetti. Department of Econometrics University of Geneva Switzerland.

Introduction to Robust Statistics. Elvezio Ronchetti. Department of Econometrics University of Geneva Switzerland. Introduction to Robust Statistics Elvezio Ronchetti Department of Econometrics University of Geneva Switzerland Elvezio.Ronchetti@metri.unige.ch http://www.unige.ch/ses/metri/ronchetti/ 1 Outline Introduction

More information

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract Journal of Data Science,17(1). P. 145-160,2019 DOI:10.6339/JDS.201901_17(1).0007 WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION Wei Xiong *, Maozai Tian 2 1 School of Statistics, University of

More information

THE 'IMPROVED' BROWN AND FORSYTHE TEST FOR MEAN EQUALITY: SOME THINGS CAN'T BE FIXED

THE 'IMPROVED' BROWN AND FORSYTHE TEST FOR MEAN EQUALITY: SOME THINGS CAN'T BE FIXED THE 'IMPROVED' BROWN AND FORSYTHE TEST FOR MEAN EQUALITY: SOME THINGS CAN'T BE FIXED H. J. Keselman Rand R. Wilcox University of Manitoba University of Southern California Winnipeg, Manitoba Los Angeles,

More information

Fast and robust bootstrap for LTS

Fast and robust bootstrap for LTS Fast and robust bootstrap for LTS Gert Willems a,, Stefan Van Aelst b a Department of Mathematics and Computer Science, University of Antwerp, Middelheimlaan 1, B-2020 Antwerp, Belgium b Department of

More information

Regression Analysis for Data Containing Outliers and High Leverage Points

Regression Analysis for Data Containing Outliers and High Leverage Points Alabama Journal of Mathematics 39 (2015) ISSN 2373-0404 Regression Analysis for Data Containing Outliers and High Leverage Points Asim Kumer Dey Department of Mathematics Lamar University Md. Amir Hossain

More information


More information