COMPARING ROBUST REGRESSION LINES ASSOCIATED WITH TWO DEPENDENT GROUPS WHEN THERE IS HETEROSCEDASTICITY
Rand R. Wilcox, Dept of Psychology, University of Southern California
Florence Clark, Division of Occupational Science & Occupational Therapy, University of Southern California
January 8,
ABSTRACT

The paper deals with three approaches to comparing the regression lines corresponding to two dependent groups when using a robust estimator. The focus is on the Theil-Sen estimator, with some comments about alternative estimators that might be used. The first approach is to test the global hypothesis that the two groups have equal intercepts and slopes in a manner that allows a heteroscedastic error term. The second approach is to test the hypothesis of equal intercepts, ignoring the slopes, and the hypothesis of equal slopes, ignoring the intercepts. The third approach is to test the hypothesis that the regression lines differ at a specified design point. This last goal corresponds to the classic Johnson and Neyman method when dealing with independent groups and when using the ordinary least squares regression estimator. Based on extant studies, there are guesses about how to proceed in a manner that will provide reasonably accurate control over the Type I error probability: use some type of percentile bootstrap method. (Methods that assume the regression estimator is asymptotically normal were not considered, for reasons reviewed in the paper.) But there are no simulation results providing some sense of how well these methods perform when the sample size is relatively small. Data from the Well Elderly II study are used to illustrate that the choice between the ordinary least squares estimator and the Theil-Sen estimator can make a practical difference.

Keywords: analysis of covariance, bootstrap methods, heteroscedasticity, Well Elderly II study.

1 Introduction

Consider four random variables $\epsilon_1$, $\epsilon_2$, $X_1$ and $X_2$, where both $(\epsilon_1, \epsilon_2)$ and $(X_1, X_2)$ have some unknown bivariate distribution. Assume that

$$Y_j = \beta_{0j} + \beta_{1j} X_j + \lambda(X_j)\,\epsilon_j, \qquad (1)$$

where $\beta_{kj}$ ($k = 0, 1$; $j = 1, 2$) are unknown parameters, $\lambda(X_j)$ is some unknown function that models heteroscedasticity and $\sigma_j^2$ is the population variance associated with $\epsilon_j$. It is
assumed that $\epsilon_j$ has a median of zero. The paper considers the problem of testing

$$H_0: (\beta_{01}, \beta_{11}) = (\beta_{02}, \beta_{12}) \qquad (2)$$

when using the robust regression estimator derived by Theil (1950) and Sen (1968). In principle, any robust estimator can be used with the approach taken here, and a few results based on alternative estimators (including a quantile regression estimator and the MM-estimator) are reported. But because there are so many robust estimators, considering all possible choices amounts to a monumental task. The goal here is a more modest one: use simulations to determine whether reasonably good control over the probability of a Type I error can be achieved for some estimator that is reasonably robust when the sample size is relatively small. Another goal is to use data from an actual study to determine whether the robust method studied here can make a practical difference, compared to using a least squares regression estimator, when the sample size is reasonably large.

The paper also reports results on methods aimed at testing

$$H_0: \beta_{01} = \beta_{02}, \qquad (3)$$

$$H_0: \beta_{11} = \beta_{12}, \qquad (4)$$

and

$$H_0: M(Y_1 \mid X) = M(Y_2 \mid X), \qquad (5)$$

where $M(Y \mid X)$ indicates some conditional measure of location given $X$. For the case of independent groups, (5) corresponds to the classic Johnson and Neyman (1936) method when least squares regression is used, there is homoscedasticity and the error term has a normal distribution.

Testing the hypotheses just described arose in connection with the Well Elderly II study (Jackson et al., 2009), which motivated this paper. A general goal in the Well Elderly II study was to assess the efficacy of an intervention strategy aimed at improving the physical and emotional health of older adults. A portion of the study was aimed at understanding the impact of intervention on a measure of meaningful activities as measured by the Meaningful Activity Participation Assessment (MAPA) instrument (Eakman et al., 2010).
Higher MAPA scores reflect greater activity satisfaction. A covariate of interest was the cortisol awakening
response (CAR), which is defined as the change in cortisol concentration that occurs during the first hour after waking from sleep. (CAR is taken to be the cortisol level after the participants were awake for about an hour or less minus the level of cortisol upon awakening.) Extant studies (e.g., Clow et al., 2004; Chida & Steptoe, 2009) indicate that various forms of stress are associated with the CAR. An issue was whether the association between the CAR and MAPA, measured before intervention, differed from the association after intervention.

In an unpublished paper, Wilcox and Clark (2013) studied heteroscedastic methods for testing the above hypotheses using the ordinary least squares (OLS) estimator. However, in the Well Elderly study, boxplots indicated that both CAR and MAPA have skewed distributions with outliers. Simple transformations (e.g., taking logs or using a Box-Cox transformation) did not deal effectively with this concern: the distributions remained skewed with outliers. It is well known that even a single outlier can result in a misleading summary of an association when using the OLS estimator (e.g., Hampel et al., 1986; Huber & Ronchetti, 2009; Staudte & Sheather, 1990; Wilcox, 2012). Outliers in the dependent variable can adversely affect power as well. Consequently, there was interest in using a robust regression estimator, but there was concern regarding the lack of information about the finite-sample properties of methods that might be used. Moreover, it seems that there are no published results dealing with a heteroscedastic error term when comparing the regression parameters of dependent groups via some robust estimator.

The strategies used here differ in obvious ways from the approach used by Wilcox and Clark. For example, when testing (2) using the OLS estimator, Wilcox and Clark used a version of Hotelling's $T^2$ test statistic coupled with a bootstrap estimate of the relevant standard errors and covariances.
One strategy is to mimic this approach here, using in part a bootstrap estimate of the standard errors and relevant covariances. However, any method that assumes that the Theil-Sen estimator is asymptotically normal appears to be ill-advised. The reason is that Peng et al. (2008) established that the slope estimator may or may not be asymptotically normal. Here, a percentile bootstrap method is used, with a p-value computed based on the projection depth of the null vector in the bootstrap cloud. For the other hypotheses, again a percentile bootstrap method is used. When the goal is to compare independent groups, a basic percentile bootstrap appears to perform well with robust regression estimators in general, in contrast to situations where hypotheses about population
means are of interest (Wilcox, 2012). So an obvious speculation is that a percentile bootstrap method will perform reasonably well for the situation at hand, but there are no published results indicating the extent to which this is the case.

2 Description of the Methods

This section describes the Theil-Sen estimator and reviews some of its properties. This is followed by a description of the bootstrap methods that are used to test (2)-(5).

2.1 The Theil-Sen Estimator

Let $(Y_1, X_1), \ldots, (Y_n, X_n)$ be a random sample from some unknown bivariate distribution. Assuming that $X_l \neq X_m$ for any $l < m$, let

$$b_{lm} = \frac{Y_l - Y_m}{X_l - X_m}, \qquad 1 \le l < m \le n.$$

The Theil-Sen estimate of the slope, say $\hat{\beta}_1$, is taken to be the usual sample median of the $b_{lm}$ values. The intercept, $\beta_0$, is typically estimated with $\hat{\beta}_0 = M_y - \hat{\beta}_1 M_x$, where $M_y$ is the usual sample median based on $Y_1, \ldots, Y_n$ and $M_x$ is the sample median based on $X_1, \ldots, X_n$. This will be called the TS estimator henceforth. Its mean squared error and small-sample efficiency compare well to the OLS estimator as well as to other robust estimators that have been derived (Dietz, 1987; Wilcox, 1998). Dietz (1989) established that its asymptotic breakdown point is approximately .29. Roughly, about 29% of the points must be changed in order to make the estimate of the slope arbitrarily large or small. Other asymptotic properties have been studied by Wang (2005) and Peng et al. (2008). This is not to suggest that it dominates all of the other robust estimators that have been derived; all indications are that no single estimator dominates. The only point is that a reasonable estimator has been chosen that has been studied extensively. It is noted that several strategies have been proposed regarding how the Theil-Sen estimator might be generalized to more than one predictor (Wilcox, 2012, section 10.2). Because the focus here is on a single predictor, the details are omitted.
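The estimator just defined can be sketched in a few lines of Python. This is an illustration only, not the WRS implementation: the function name `theil_sen` is hypothetical, and skipping pairs with tied X values is an assumption (the text simply assumes there are no ties).

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Return (intercept, slope) from the Theil-Sen estimator with one predictor.

    slope     = median of b_lm = (Y_l - Y_m) / (X_l - X_m) over all pairs l < m
    intercept = M_y - slope * M_x, where M_y and M_x are sample medians
    Pairs with X_l == X_m are skipped (an assumption; the text assumes no ties).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    slopes = [
        (y[l] - y[m]) / (x[l] - x[m])
        for l, m in combinations(range(len(x)), 2)
        if x[l] != x[m]
    ]
    if not slopes:
        raise ValueError("need at least two distinct x values")
    b1 = median(slopes)
    b0 = median(y) - b1 * median(x)
    return b0, b1


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.0, 4.1, 5.9, 8.0, 40.0]  # the last point is a gross outlier in y
    b0, b1 = theil_sen(x, y)
    print(f"intercept {b0:.2f}, slope {b1:.2f}")
```

For these five hypothetical points the Theil-Sen slope is 2.1 (intercept -0.4), whereas the OLS slope is about 7.99, which illustrates the sensitivity of least squares to a single outlier noted in the introduction.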
2.2 The Hypothesis Testing Techniques

First consider testing (2). A version of a basic percentile bootstrap is applied as follows. (For relevant theoretical details, see Liu & Singh, 1997.) Let $(X_{i1}, Y_{i1}, X_{i2}, Y_{i2})$, $i = 1, \ldots, n$, be a random sample of $n$ vectors of observations taken at two different times. The resulting estimates of the slopes at times 1 and 2 are denoted by $b_{11}$ and $b_{12}$, and the estimates of the intercepts are $b_{01}$ and $b_{02}$. Begin by resampling with replacement $n$ vectors of observations from $(X_{i1}, Y_{i1}, X_{i2}, Y_{i2})$, yielding $(X^*_{i1}, Y^*_{i1}, X^*_{i2}, Y^*_{i2})$, and let $b^*_{11}$ and $b^*_{12}$, and $b^*_{01}$ and $b^*_{02}$, be the resulting estimates of the slopes and intercepts, respectively, based on this bootstrap sample. Let $d^*_j = b^*_{j1} - b^*_{j2}$ ($j = 0, 1$). Repeat this process $B$ times, yielding $d^*_{jb}$ ($b = 1, \ldots, B$). In terms of controlling the probability of a Type I error, $B = 500$ seems to be adequate for a range of hypothesis testing methods based on robust estimators (Wilcox, 2012). Consequently, $B = 500$ was used here. But in terms of power, a larger choice for $B$ might have practical value (e.g., Jöckel, 1986; Racine & MacKinnon, 2007).

Let $V$ be the sample covariance matrix based on the $d^*_{jb}$ values. That is,

$$v_{jk} = \frac{1}{B - 1}\sum_{b=1}^{B} (d^*_{jb} - \bar{d}_j)(d^*_{kb} - \bar{d}_k),$$

where $\bar{d}_j = \sum_{b=1}^{B} d^*_{jb}/B$. Let $\mathbf{d}^*_b = (d^*_{0b}, d^*_{1b})$ and let

$$D_b^2 = (\mathbf{d}^*_b - \bar{\mathbf{d}})\, V^{-1} (\mathbf{d}^*_b - \bar{\mathbf{d}})'$$

be the squared Mahalanobis distance of $\mathbf{d}^*_b$ from $\bar{\mathbf{d}}$, where $\bar{\mathbf{d}} = (\bar{d}_0, \bar{d}_1)$. Let $D_0$ be the Mahalanobis distance of the null vector $(0, 0)$ from $\bar{\mathbf{d}}$. Then, from general theoretical results in Liu and Singh (1997), a p-value is given by

$$\frac{1}{B}\sum_{b=1}^{B} I_{D_0 < D_b},$$

where the indicator function $I_{D_0 < D_b} = 1$ if $D_0 < D_b$; otherwise $I_{D_0 < D_b} = 0$. The strategy for computing a p-value just described has been used in a variety of settings where the goal is to test some global hypothesis associated with some multivariate distribution.
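The resampling scheme and the Mahalanobis-based p-value can be sketched in Python as follows, assuming NumPy is available. The names `theil_sen`, `bootstrap_diffs`, `global_pvalue` and `percentile_pvalue` are hypothetical and are not the WRS R functions. This sketch uses the Mahalanobis distance described above rather than the projection distance the paper substitutes to guard against a singular $V$, so `np.linalg.inv` will fail in that rare case. `percentile_pvalue` computes the $2\min(\hat{P}, 1 - \hat{P})$ p-value that section 2.2 goes on to use for (3) and (4).

```python
import numpy as np


def theil_sen(x, y):
    """Theil-Sen (intercept, slope); pairs with tied x values are skipped."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    l, m = np.triu_indices(len(x), k=1)              # all index pairs l < m
    keep = x[l] != x[m]
    slope = np.median((y[l] - y[m])[keep] / (x[l] - x[m])[keep])
    return np.median(y) - slope * np.median(x), slope


def bootstrap_diffs(x1, y1, x2, y2, B=500, seed=None):
    """Return a (B, 2) array whose rows are (d*_0b, d*_1b).

    Whole rows (X_i1, Y_i1, X_i2, Y_i2) are resampled together, so the
    dependence between the two time points is kept in each bootstrap sample.
    """
    rng = np.random.default_rng(seed)
    x1, y1, x2, y2 = (np.asarray(a, dtype=float) for a in (x1, y1, x2, y2))
    n = len(x1)
    d = np.empty((B, 2))
    for b in range(B):
        idx = rng.integers(0, n, size=n)              # resample row indices
        b01, b11 = theil_sen(x1[idx], y1[idx])
        b02, b12 = theil_sen(x2[idx], y2[idx])
        d[b] = (b01 - b02, b11 - b12)
    return d


def global_pvalue(d):
    """Mahalanobis-based p-value for H0: (beta01, beta11) = (beta02, beta12).

    Proportion of bootstrap points farther from the cloud's mean than the
    null vector (0, 0). np.linalg.inv raises LinAlgError if V is singular.
    """
    center = d.mean(axis=0)                           # (dbar_0, dbar_1)
    v_inv = np.linalg.inv(np.cov(d, rowvar=False))    # V uses divisor B - 1
    dev = d - center
    d2 = np.einsum("bj,jk,bk->b", dev, v_inv, dev)    # D_b^2 for every b
    d0_2 = center @ v_inv @ center                    # D_0^2 for the null vector
    return float(np.mean(d0_2 < d2))


def percentile_pvalue(dj):
    """2 * min(P, 1 - P), where P is the share of negative bootstrap differences."""
    p = float(np.mean(np.asarray(dj) < 0))
    return 2 * min(p, 1 - p)


if __name__ == "__main__":
    rng = np.random.default_rng(2013)
    n = 40
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)                # covariates correlated over time
    y1 = x1 + rng.normal(size=n)
    y2 = x2 + rng.normal(size=n)                      # same line at both times: H0 true
    d = bootstrap_diffs(x1, y1, x2, y2, B=500, seed=1)
    print("test of (2):", global_pvalue(d))
    print("test of (3):", percentile_pvalue(d[:, 0]))
    print("test of (4):", percentile_pvalue(d[:, 1]))
```

The demo uses one hypothetical, homoscedastic data set in which both time points share the same intercept and slope, so (2), (3) and (4) are all true; the printed p-values vary with the seed.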
Generally, this approach appears to perform well, in terms of controlling the Type I error probability, when using some multivariate estimator that is reasonably robust (Wilcox, 2012). However, Wilcox notes that situations are encountered where this is not
the case. More precisely, situations are encountered where the sample covariance matrix associated with the bootstrap samples, the matrix $V$ in the present context, is singular. Such situations seem to be rare, but to guard against this possibility here, an alternative measure of the distance of a point from the center of the bootstrap cloud is used that avoids this problem. The approach is a variation of the projection distance discussed by Donoho and Gasko (1992), which is described in Wilcox (2012, section 6.2.5). The complete computational details are not given here; rather, a rough outline of the method is described instead.

Consider any $\mathbf{d}^*_m$, where $1 \le m \le B$. The distance of $\mathbf{d}^*_m$ from the center of the cloud is computed as follows. For each $b$ ($b = 1, \ldots, B$), project the points $\mathbf{d}^*_1, \ldots, \mathbf{d}^*_B$ onto the line connecting $\mathbf{d}^*_b$ and the center of the bootstrap cloud. The center of the cloud can be estimated in many ways; here, for simplicity, the marginal medians are used. Next, standardize the projected distance of $\mathbf{d}^*_m$ from the center by dividing by some robust measure of scale based on the projected points; here the interquartile range is used. Doing this for each $b = 1, \ldots, B$ yields $B$ standardized distances for $\mathbf{d}^*_m$, and the projection distance of $\mathbf{d}^*_m$ is the maximum of these $B$ values. Here, these distances were computed with the R function pdis, which is part of the Forge R package WRS.

For (3) and (4), a basic percentile bootstrap method is used. For fixed $j$, let $\hat{P}$ be the proportion of times $d^*_{jb}$ is negative among the $B$ bootstrap samples. Then a p-value when testing (4) and (3), which corresponds to $j = 1$ and $j = 0$, respectively, is $2\min(\hat{P}, 1 - \hat{P})$. As for (5), proceed as was done when testing (4) and (3), only now $\hat{P}$ is taken to be the proportion of times $\hat{Y}_1 < \hat{Y}_2$ for a given value of $X$, where $\hat{Y}_j$ is the estimate of $M(Y \mid X)$ for the $j$th group based on a bootstrap sample.

3 Simulation Results

Simulations were used to study the small-sample properties of the methods in section 2 when testing at the .05 level. The sample sizes considered were 20 and 40. Some additional simulations were run with $n = 200$ as a partial check on the R functions that were used to apply the methods. Estimated Type I error probabilities, $\hat{\alpha}$, were based on 2000 replications. Although the seriousness of a Type I error depends on the situation, Bradley (1978) has
suggested that, as a general guide, when testing at the .05 level, the actual level should at a minimum be between .025 and .075. Using results in Pratt (1968) to compute a (two-sided) .95 confidence interval for the actual level based on 2000 replications, the hypothesis that the actual level is .075 would be rejected if the estimated level is less than or equal to .063. As will be seen, the estimated levels are less than .050 among all of the situations considered. Similarly, if the estimated level is less than or equal to .018, reject the hypothesis that the actual level is .025. If the actual level is less than or equal to .075, the margin of error is about .011 or less. Serlin (2000) provides a more detailed analysis of how many replications should be used in a simulation study. Of course, power is a function in part of how small a difference one wants to detect. If, for example, a method is considered level robust if the actual Type I error probability is between .04 and .06 when testing at the .05 level, and if the goal is to have power equal to .8 when in fact the actual level satisfies Bradley's criterion, results derived in Serlin's paper indicate using over 13,000 replications. Serlin also notes that in various situations, such a large number of replications can be impractical in terms of execution time. When studying more traditional methods, such as least squares regression, 13,000 replications is a realistic number. But when combining bootstrap methods with the robust regression estimators used here, high execution time becomes a serious issue. For example, when testing (2), a single simulation based on … replications would require about 30 hours of execution time on a 2.7 GHz MacBook Pro. No attempt is made to resolve the issue of how close the actual level should be to the nominal level; this goes beyond the scope of this paper. Rather, the goal is to characterize the accuracy of the simulations reported here, which are based on 2000 replications.
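The thresholds quoted above can be checked approximately with a short Python sketch. Note that this swaps in the plain normal (Wald) approximation to the binomial rather than the Pratt (1968) approximation used in the paper, and the function name `mc_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mc_interval(level, reps=2000, conf=0.95):
    """Normal-approximation interval for a rejection rate estimated from `reps` runs.

    If the true Type I error rate equals `level`, an estimate below `lower`
    (or above `upper`) rejects H0: actual level == level, two-sided.
    Returns (lower, upper, half_width).
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.96 when conf = .95
    half = z * sqrt(level * (1 - level) / reps)      # binomial standard error times z
    return level - half, level + half, half


if __name__ == "__main__":
    for level in (0.075, 0.025):
        lower, upper, half = mc_interval(level)
        print(f"level {level}: lower limit {lower:.4f}, margin of error {half:.4f}")
```

With 2000 replications the lower limits come out near .0635 for a level of .075 and .0182 for a level of .025, and the margin of error at .075 is about .0115, which agrees with the .063, .018 and .011 figures quoted above.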
The results reported in this section strongly indicate that the actual level will not exceed .075 when testing at the .05 level. The reason is that all estimates were found to be significantly less than .075, so a Type II error regarding the hypothesis that the actual level is greater than or equal to .075 was not made. As for situations where the actual level might be less than .025, power is an issue. If, for example, the actual level is .02, power is .41 (based on a one-sided test at the .045 level of the hypothesis that the level is greater than .025); if the actual level is .017, power is .79.

Four types of marginal distributions were used: normal, symmetric and heavy-tailed,
asymmetric and light-tailed, and asymmetric and heavy-tailed. More precisely, the marginal distributions were taken to be one of four g-and-h distributions (Hoaglin, 1985) that contain the standard normal distribution as a special case. If $Z$ has a standard normal distribution, then

$$W = \begin{cases} \dfrac{\exp(gZ) - 1}{g}\,\exp(hZ^2/2), & \text{if } g > 0,\\[4pt] Z\exp(hZ^2/2), & \text{if } g = 0, \end{cases}$$

has a g-and-h distribution, where $g$ and $h$ are parameters that determine the first four moments. The four distributions used here were the standard normal ($g = h = 0.0$), a symmetric heavy-tailed distribution ($h = 0.2$, $g = 0.0$), an asymmetric distribution with relatively light tails ($h = 0.0$, $g = 0.2$), and an asymmetric distribution with heavy tails ($g = h = 0.2$). Table 1 shows the skewness ($\kappa_1$) and kurtosis ($\kappa_2$) for each distribution. Additional properties of the g-and-h distribution are summarized by Hoaglin (1985).

Table 1: Some properties of the g-and-h distribution (columns: g, h, $\kappa_1$, $\kappa_2$).

The correlation among the four variables was taken to be $\rho = 0$ or .5. (The R function rmul in Wilcox, 2012, was used to generate data.) Three choices for $\lambda$ were used: $\lambda(X) = 1$, $\lambda(X) = |X| + 1$ and $\lambda(X) = 1/(|X| + 1)$. For convenience, these three choices are denoted by variance patterns (VP) 1, 2 and 3. As is evident, VP 1 corresponds to the usual homoscedasticity assumption. These variance patterns have been used in a number of studies that are summarized in Wilcox (2012, Ch. 11). (The final section comments on the general problem of modeling heteroscedasticity.)

Table 2 summarizes the simulation results when testing (2) at the .05 level with sample size $n = 20$. In Table 2 the estimates range between .011 and .040. So all indications are that the actual level never exceeds the nominal .05 level, but situations are encountered where it drops below .025. Note that the lowest estimates occur for VP 3 when $\rho = 0$ and sampling is from a heavy-tailed distribution ($h = .2$). Increasing the sample size to $n = 40$,
the estimates for these two situations are .016 when $g = 0$ and .013 when $g = .2$. As will be seen, the methods for testing (3), (4) and (5) perform better in these same situations.

Table 2: Estimated Type I error probabilities when testing (2), $\alpha = .05$, $n = 20$ (rows: g, h, VP; columns: $\rho = 0$, $\rho = .5$).

Simulations were run again, when testing (2), with the Theil-Sen estimator replaced by the MM-estimator derived by Yohai (1987), but the results were virtually the same as when using the Theil-Sen estimator. Using the least trimmed squares estimator made matters worse. Switching to the quantile regression estimator derived by Koenker and Bassett (1978) resulted in improved control over the Type I error probability when there is homoscedasticity. For example, for normal distributions the estimated Type I error probability was .056 with $\rho = 0$, compared to .027 when using Theil-Sen. For $h = .2$, the estimate based on the quantile regression estimator was .035, compared to .017 using Theil-Sen. However, for VP 2, control over the Type I error probability was less satisfactory. Under normality, for example, with $\rho = 0$, the estimate was .077. For $(g, h) = (.2, 0)$, the estimate was .078.

Table 3 shows the results when testing (3) and (4). Again the estimated probability of a Type I error is always less than the nominal level, but situations are encountered where the estimate is less than .025. This is particularly the case for VP 2 when testing (3). Increasing the sample size to $n = 40$, the lowest estimate when testing (3) is now .028 and the lowest
estimate when testing (4) is .036. So with $n \ge 40$ all indications are that Bradley's criterion is met.

Table 3: Estimates of $\alpha$ when testing (3) and (4), $n = 20$, $\alpha = .05$ (rows: g, h, VP; columns: $\beta_0$ and $\beta_1$ for $\rho = 0$ and for $\rho = .5$).

Table 4 shows the estimated Type I error probabilities when testing (5). Three choices for $X$ were used: $q_1$, $q_2$ and $q_3$, which correspond to the estimated lower, middle and upper quartiles based on $X_{11}, \ldots, X_{n1}$. Again, estimated Type I error probabilities are always less than the nominal .05 level. The only difficulty is that the estimates drop below .025 in some situations. Increasing $n$ to 40 generally corrects this problem. For example, for $g = 0$, $h = .2$, $\rho = .5$ and VP 2, the estimates corresponding to $q_1$, $q_2$ and $q_3$ are .025, .032 and .031. The only exception was for $g = h = .2$, $\rho = .5$ and VP 2: the estimates were .024, .028 and ….

Table 4: Estimates of $\alpha$ when testing (5), $n = 20$, $\alpha = .05$ (rows: g, h, VP; columns: $q_1$, $q_2$, $q_3$ for $\rho = 0$ and for $\rho = .5$).

4 An Illustration

As indicated in the introduction, the motivation for this paper stems from the Well Elderly study. Extant papers indicate that there is an association between the CAR and various measures of psychological stress and well-being. But little is known about the impact of intervention on the association between cortisol and psychological measures of interest. Both CAR and MAPA were measured before intervention and after six months of intervention. Before intervention, the sample size was 328. Eliminating all participants with missing values after intervention, the sample size was $n = 216$. If CAR is taken to be the dependent variable, the test of (2) with the method in this paper yields $p = .024$. In contrast, using OLS as in Wilcox and Clark (2013), $p = .28$; the only point is that the choice of method can make a practical difference due to the impact of outliers on the least squares estimator. Testing (3), $p = .09$, and for the test of (4), $p = .007$. For the test of (5) based on the Theil-Sen estimator, given that MAPA = 59, $p = .036$, suggesting that for MAPA $\ge 59$, and after intervention, CAR tends to be more negative compared to the CAR prior to intervention. That is, among participants with higher MAPA scores, cortisol tends to increase more after intervention compared to increases prior to intervention. Indeed, prior to intervention, no association between the CAR and MAPA was found.

It seems fair to say that a common practice is to assume a straight line provides a
reasonably accurate approximation of the true regression line. However, look at Figure 1, which shows an approximation of the two regression lines using the smoother derived by Cleveland (1979) and later extended by Cleveland and Devlin (1988). The method is popularly known as LOESS and is designed to deal with curvature in a reasonably flexible manner. In Figure 1, the solid line corresponds to measures taken before intervention and appears to be approximately straight. However, after intervention, there appears to be a distinct bend close to where MAPA is equal to 70. Testing the hypothesis that the regression line is straight, using the method in Wilcox (2012, section …), gives $p = .04$. If possible curvature is taken into account by focusing on only the participants with MAPA less than 70, the test of (2) now has $p = .01$, and the Wilcox-Clark method yields $p = .16$. The test of (3), using the method in this paper, now has $p = .037$, and for the test of (4), $p = .01$. So taking curvature into account had only a small impact when testing (4), but when testing (3) the p-value dropped from .09 to .037. Also, ignoring curvature, neither slope differs significantly from zero when testing at the .05 level. After intervention, $p = .057$. Focusing only on the data satisfying MAPA less than 70, $p = $ ….

5 Concluding Remarks

In summary, there is an obvious speculation about how one might control the probability of a Type I error when testing the hypotheses considered in the paper. Simulations indicate that these methods perform reasonably well, in terms of avoiding an actual level above the nominal level, when $n = 20$. The main difficulty is that there are situations where the estimate drops below .025 when testing at the .05 level. Generally, with $n = 40$ this problem is corrected. An exception occurs for VP 3 when testing (2) and when sampling from a heavy-tailed distribution.
Perhaps some variation of the methods used here is better able to deal with this situation, but this remains to be determined. Although the variance patterns used here have been used in other studies, there appear to be no comprehensive empirical investigations that help characterize the degree of heteroscedasticity that might be encountered in practice. Perhaps heteroscedasticity more severe than that considered here occurs in practice and would alter the conclusions reported here. It is evident that dealing with this issue in a satisfactory manner is difficult at best.
Figure 1: The estimated regression lines for predicting CAR given MAPA (horizontal axis: MAPA; vertical axis: CAR). The solid line is the regression line prior to intervention. Points associated with the group prior to intervention are indicated by o; points after intervention are indicated by …
There are numerous robust regression estimators beyond the one considered here (e.g., Wilcox, 2012). Some additional simulations were run using the MM-estimator derived by Yohai (1987) as well as the least trimmed squares estimator. All indications are that the Type I error probability is controlled about as well as indicated here when using the MM-estimator, but a more detailed study is needed. Using the least trimmed squares estimator, via the R package robustbase, made matters worse. Here it is assumed that the regression lines are reasonably straight. But it is suggested that this should not be taken for granted, as was illustrated in section 4. Finally, the R functions DregG, difreg and Dancts apply the methods in section 2 and have been added to the Forge R package WRS.

REFERENCES

Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31.
Chida, Y. & Steptoe, A. (2009). Cortisol awakening response and psychosocial factors: A systematic review and meta-analysis. Biological Psychology, 80.
Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association, 74.
Cleveland, W. S. & Devlin, S. J. (1988). Locally-weighted regression: An approach to regression analysis by local fitting. Journal of the American Statistical Association, 83.
Clow, A., Thorn, L., Evans, P. & Hucklebridge, F. (2004). The awakening cortisol response: Methodological issues and significance. Stress, 7.
Donoho, D. L. & Gasko, M. (1992). Breakdown properties of the location estimates based on halfspace depth and projected outlyingness. Annals of Statistics, 20.
Eakman, A. M., Carlson, M. E. & Clark, F. A. (2010). The meaningful activity participation assessment: A measure of engagement in personally valued activities. International Journal of Aging and Human Development, 70.
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. & Stahel, W. A. (1986). Robust Statistics. New York: Wiley.
Heritier, S., Cantoni, E., Copt, S. & Victoria-Feser, M.-P. (2009). Robust Methods in Biostatistics. New York: Wiley.
Hinkley, D. V. (1977). Jackknifing in unbalanced situations. Technometrics, 19.
Hoaglin, D. C. (1985). Summarizing shape numerically: The g-and-h distribution. In D. Hoaglin, F. Mosteller & J. Tukey (Eds.), Exploring Data Tables, Trends, and Shapes. New York: Wiley.
Huber, P. J. & Ronchetti, E. (2009). Robust Statistics, 2nd Ed. New York: Wiley.
Jackson, J., Mandel, D., Blanchard, J., Carlson, M., Cherry, B., Azen, S., Chou, C.-P., Jordan-Marsh, M., Forman, T., White, B., Granger, D., Knight, B. & Clark, F. (2009). Confronting challenges in intervention research with ethnically diverse older adults: The USC Well Elderly II trial. Clinical Trials.
Jöckel, K.-H. (1986). Finite sample properties and asymptotic efficiency of Monte Carlo tests. Annals of Statistics, 14.
Johnson, P. & Neyman, J. (1936). Tests of certain linear hypotheses and their application to some educational problems. Statistical Research Memoirs, 1.
Koenker, R. & Bassett, G. (1978). Regression quantiles. Econometrica.
Peng, H., Wang, S. & Wang, X. (2008). Consistency and asymptotic distribution of the Theil-Sen estimator. Journal of Statistical Planning and Inference, 138.
Pratt, J. W. (1968). A normal approximation for binomial, F, beta, and other common, related tail probabilities, I. Journal of the American Statistical Association, 63.
Racine, J. & MacKinnon, J. G. (2007). Simulation-based tests that can use any number of simulations. Communications in Statistics: Simulation and Computation, 36.
Rousseeuw, P. J. & Leroy, A. M. (1987). Robust Regression & Outlier Detection. New York: Wiley.
Sen, P. K. (1968). Estimate of the regression coefficient based on Kendall's tau. Journal of the American Statistical Association, 63.
Serlin, R. C. (2000). Testing for robustness in Monte Carlo studies. Psychological Methods, 5.
Staudte, R. G. & Sheather, S. J. (1990). Robust Estimation and Testing. New York: Wiley.
Theil, H. (1950). A rank-invariant method of linear and polynomial regression analysis. Indagationes Mathematicae, 12.
Wang, X. Q. (2005). Asymptotics of the Theil-Sen estimator in simple linear regression models with a random covariate. Journal of Nonparametric Statistics, 17.
Wilcox, R. R. (2012). Introduction to Robust Estimation and Hypothesis Testing, 3rd Ed. San Diego, CA: Academic Press.
Wilcox, R. R. & Clark, F. (2013). Within groups comparisons of least squares regression lines when there is heteroscedasticity. Technical report, Dept of Psychology, University of Southern California.
Yohai, V. J. (1987). High breakdown point and high efficiency robust estimates for regression. Annals of Statistics, 15.
More information401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.
401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis
More informationRobust model selection criteria for robust S and LT S estimators
Hacettepe Journal of Mathematics and Statistics Volume 45 (1) (2016), 153 164 Robust model selection criteria for robust S and LT S estimators Meral Çetin Abstract Outliers and multi-collinearity often
More informationGlossary. The ISI glossary of statistical terms provides definitions in a number of different languages:
Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the
More informationIntroduction and Single Predictor Regression. Correlation
Introduction and Single Predictor Regression Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning Correlation A correlation
More informationSupplementary Material for Wang and Serfling paper
Supplementary Material for Wang and Serfling paper March 6, 2017 1 Simulation study Here we provide a simulation study to compare empirically the masking and swamping robustness of our selected outlyingness
More informationIntroduction to Robust Statistics. Elvezio Ronchetti. Department of Econometrics University of Geneva Switzerland.
Introduction to Robust Statistics Elvezio Ronchetti Department of Econometrics University of Geneva Switzerland Elvezio.Ronchetti@metri.unige.ch http://www.unige.ch/ses/metri/ronchetti/ 1 Outline Introduction
More informationWEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract
Journal of Data Science,17(1). P. 145-160,2019 DOI:10.6339/JDS.201901_17(1).0007 WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION Wei Xiong *, Maozai Tian 2 1 School of Statistics, University of
More informationTHE 'IMPROVED' BROWN AND FORSYTHE TEST FOR MEAN EQUALITY: SOME THINGS CAN'T BE FIXED
THE 'IMPROVED' BROWN AND FORSYTHE TEST FOR MEAN EQUALITY: SOME THINGS CAN'T BE FIXED H. J. Keselman Rand R. Wilcox University of Manitoba University of Southern California Winnipeg, Manitoba Los Angeles,
More informationFast and robust bootstrap for LTS
Fast and robust bootstrap for LTS Gert Willems a,, Stefan Van Aelst b a Department of Mathematics and Computer Science, University of Antwerp, Middelheimlaan 1, B-2020 Antwerp, Belgium b Department of
More informationRegression Analysis for Data Containing Outliers and High Leverage Points
Alabama Journal of Mathematics 39 (2015) ISSN 2373-0404 Regression Analysis for Data Containing Outliers and High Leverage Points Asim Kumer Dey Department of Mathematics Lamar University Md. Amir Hossain
More informationParametric Probability Densities and Distribution Functions for Tukey g-and-h Transformations and their Use for Fitting Data
Applied Mathematical Sciences, Vol. 2, 2008, no. 9, 449-462 Parametric Probability Densities and Distribution Functions for Tukey g-and-h Transformations and their Use for Fitting Data Todd C. Headrick,
More informationROBUST ESTIMATION OF A CORRELATION COEFFICIENT: AN ATTEMPT OF SURVEY
ROBUST ESTIMATION OF A CORRELATION COEFFICIENT: AN ATTEMPT OF SURVEY G.L. Shevlyakov, P.O. Smirnov St. Petersburg State Polytechnic University St.Petersburg, RUSSIA E-mail: Georgy.Shevlyakov@gmail.com
More informationIntroduction Robust regression Examples Conclusion. Robust regression. Jiří Franc
Robust regression Robust estimation of regression coefficients in linear regression model Jiří Franc Czech Technical University Faculty of Nuclear Sciences and Physical Engineering Department of Mathematics
More informationAn Alternative to Cronbach s Alpha: A L-Moment Based Measure of Internal-consistency Reliability
Southern Illinois University Carbondale OpenSIUC Book Chapters Educational Psychology and Special Education 013 An Alternative to Cronbach s Alpha: A L-Moment Based Measure of Internal-consistency Reliability
More informationSimple Linear Regression
Simple Linear Regression ST 370 Regression models are used to study the relationship of a response variable and one or more predictors. The response is also called the dependent variable, and the predictors
More informationROBUSTNESS OF TWO-PHASE REGRESSION TESTS
REVSTAT Statistical Journal Volume 3, Number 1, June 2005, 1 18 ROBUSTNESS OF TWO-PHASE REGRESSION TESTS Authors: Carlos A.R. Diniz Departamento de Estatística, Universidade Federal de São Carlos, São
More informationTESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST
Econometrics Working Paper EWP0402 ISSN 1485-6441 Department of Economics TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST Lauren Bin Dong & David E. A. Giles Department
More informationInference for Single Proportions and Means T.Scofield
Inference for Single Proportions and Means TScofield Confidence Intervals for Single Proportions and Means A CI gives upper and lower bounds between which we hope to capture the (fixed) population parameter
More informationA Comparison of Robust Estimators Based on Two Types of Trimming
Submitted to the Bernoulli A Comparison of Robust Estimators Based on Two Types of Trimming SUBHRA SANKAR DHAR 1, and PROBAL CHAUDHURI 1, 1 Theoretical Statistics and Mathematics Unit, Indian Statistical
More information3 Joint Distributions 71
2.2.3 The Normal Distribution 54 2.2.4 The Beta Density 58 2.3 Functions of a Random Variable 58 2.4 Concluding Remarks 64 2.5 Problems 64 3 Joint Distributions 71 3.1 Introduction 71 3.2 Discrete Random
More informationMidwest Big Data Summer School: Introduction to Statistics. Kris De Brabanter
Midwest Big Data Summer School: Introduction to Statistics Kris De Brabanter kbrabant@iastate.edu Iowa State University Department of Statistics Department of Computer Science June 20, 2016 1/27 Outline
More informationScatter plot of data from the study. Linear Regression
1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25
More informationConventional And Robust Paired And Independent-Samples t Tests: Type I Error And Power Rates
Journal of Modern Applied Statistical Methods Volume Issue Article --3 Conventional And And Independent-Samples t Tests: Type I Error And Power Rates Katherine Fradette University of Manitoba, umfradet@cc.umanitoba.ca
More informationAccurate and Powerful Multivariate Outlier Detection
Int. Statistical Inst.: Proc. 58th World Statistical Congress, 11, Dublin (Session CPS66) p.568 Accurate and Powerful Multivariate Outlier Detection Cerioli, Andrea Università di Parma, Dipartimento di
More informationExperimental Design and Data Analysis for Biologists
Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1
More informationComparing the performance of modified F t statistic with ANOVA and Kruskal Wallis test
Appl. Math. Inf. Sci. 7, No. 2L, 403-408 (2013) 403 Applied Mathematics & Information Sciences An International ournal http://dx.doi.org/10.12785/amis/072l04 Comparing the performance of modified F t statistic
More informationContents. Acknowledgments. xix
Table of Preface Acknowledgments page xv xix 1 Introduction 1 The Role of the Computer in Data Analysis 1 Statistics: Descriptive and Inferential 2 Variables and Constants 3 The Measurement of Variables
More informationThe scatterplot is the basic tool for graphically displaying bivariate quantitative data.
Bivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data. Example: Some investors think that the performance of the stock market in January
More informationInferences for Regression
Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In
More informationON THE CALCULATION OF A ROBUST S-ESTIMATOR OF A COVARIANCE MATRIX
STATISTICS IN MEDICINE Statist. Med. 17, 2685 2695 (1998) ON THE CALCULATION OF A ROBUST S-ESTIMATOR OF A COVARIANCE MATRIX N. A. CAMPBELL *, H. P. LOPUHAA AND P. J. ROUSSEEUW CSIRO Mathematical and Information
More informationLecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression:
Biost 518 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture utline Choice of Model Alternative Models Effect of data driven selection of
More informationScatter plot of data from the study. Linear Regression
1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25
More information9. Robust regression
9. Robust regression Least squares regression........................................................ 2 Problems with LS regression..................................................... 3 Robust regression............................................................
More informationPrentice Hall Stats: Modeling the World 2004 (Bock) Correlated to: National Advanced Placement (AP) Statistics Course Outline (Grades 9-12)
National Advanced Placement (AP) Statistics Course Outline (Grades 9-12) Following is an outline of the major topics covered by the AP Statistics Examination. The ordering here is intended to define the
More information* Tuesday 17 January :30-16:30 (2 hours) Recored on ESSE3 General introduction to the course.
Name of the course Statistical methods and data analysis Audience The course is intended for students of the first or second year of the Graduate School in Materials Engineering. The aim of the course
More informationInstructions: Closed book, notes, and no electronic devices. Points (out of 200) in parentheses
ISQS 5349 Final Spring 2011 Instructions: Closed book, notes, and no electronic devices. Points (out of 200) in parentheses 1. (10) What is the definition of a regression model that we have used throughout
More informationIntroduction to Linear regression analysis. Part 2. Model comparisons
Introduction to Linear regression analysis Part Model comparisons 1 ANOVA for regression Total variation in Y SS Total = Variation explained by regression with X SS Regression + Residual variation SS Residual
More informationMULTIVARIATE TECHNIQUES, ROBUSTNESS
MULTIVARIATE TECHNIQUES, ROBUSTNESS Mia Hubert Associate Professor, Department of Mathematics and L-STAT Katholieke Universiteit Leuven, Belgium mia.hubert@wis.kuleuven.be Peter J. Rousseeuw 1 Senior Researcher,
More informationG. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication
G. S. Maddala Kajal Lahiri WILEY A John Wiley and Sons, Ltd., Publication TEMT Foreword Preface to the Fourth Edition xvii xix Part I Introduction and the Linear Regression Model 1 CHAPTER 1 What is Econometrics?
More informationChapter 1 Statistical Inference
Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations
More informationStatistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018
Statistics Boot Camp Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 March 21, 2018 Outline of boot camp Summarizing and simplifying data Point and interval estimation Foundations of statistical
More informationCOMPARISON OF THE ESTIMATORS OF THE LOCATION AND SCALE PARAMETERS UNDER THE MIXTURE AND OUTLIER MODELS VIA SIMULATION
(REFEREED RESEARCH) COMPARISON OF THE ESTIMATORS OF THE LOCATION AND SCALE PARAMETERS UNDER THE MIXTURE AND OUTLIER MODELS VIA SIMULATION Hakan S. Sazak 1, *, Hülya Yılmaz 2 1 Ege University, Department
More informationIntroduction to Statistical Analysis
Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive
More informationTransition Passage to Descriptive Statistics 28
viii Preface xiv chapter 1 Introduction 1 Disciplines That Use Quantitative Data 5 What Do You Mean, Statistics? 6 Statistics: A Dynamic Discipline 8 Some Terminology 9 Problems and Answers 12 Scales of
More informationTESTS FOR MEAN EQUALITY THAT DO NOT REQUIRE HOMOGENEITY OF VARIANCES: DO THEY REALLY WORK?
TESTS FOR MEAN EQUALITY THAT DO NOT REQUIRE HOMOGENEITY OF VARIANCES: DO THEY REALLY WORK? H. J. Keselman Rand R. Wilcox University of Manitoba University of Southern California Winnipeg, Manitoba Los
More informationAnalysing data: regression and correlation S6 and S7
Basic medical statistics for clinical and experimental research Analysing data: regression and correlation S6 and S7 K. Jozwiak k.jozwiak@nki.nl 2 / 49 Correlation So far we have looked at the association
More informationReview of Statistics 101
Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods
More informationAn Equivalency Test for Model Fit. Craig S. Wells. University of Massachusetts Amherst. James. A. Wollack. Ronald C. Serlin
Equivalency Test for Model Fit 1 Running head: EQUIVALENCY TEST FOR MODEL FIT An Equivalency Test for Model Fit Craig S. Wells University of Massachusetts Amherst James. A. Wollack Ronald C. Serlin University
More informationPractical Statistics for the Analytical Scientist Table of Contents
Practical Statistics for the Analytical Scientist Table of Contents Chapter 1 Introduction - Choosing the Correct Statistics 1.1 Introduction 1.2 Choosing the Right Statistical Procedures 1.2.1 Planning
More informationSubject CS1 Actuarial Statistics 1 Core Principles
Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and
More informationK. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =
K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing
More informationResearch Methodology Statistics Comprehensive Exam Study Guide
Research Methodology Statistics Comprehensive Exam Study Guide References Glass, G. V., & Hopkins, K. D. (1996). Statistical methods in education and psychology (3rd ed.). Boston: Allyn and Bacon. Gravetter,
More informationIMPROVING THE SMALL-SAMPLE EFFICIENCY OF A ROBUST CORRELATION MATRIX: A NOTE
IMPROVING THE SMALL-SAMPLE EFFICIENCY OF A ROBUST CORRELATION MATRIX: A NOTE Eric Blankmeyer Department of Finance and Economics McCoy College of Business Administration Texas State University San Marcos
More informationBusiness Statistics. Lecture 10: Course Review
Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,
More informationBootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator
Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator by Emmanuel Flachaire Eurequa, University Paris I Panthéon-Sorbonne December 2001 Abstract Recent results of Cribari-Neto and Zarkos
More informationStatistical. Psychology
SEVENTH у *i km m it* & П SB Й EDITION Statistical M e t h o d s for Psychology D a v i d C. Howell University of Vermont ; \ WADSWORTH f% CENGAGE Learning* Australia Biaall apan Korea Меяко Singapore
More informationOn Modifications to Linking Variance Estimators in the Fay-Herriot Model that Induce Robustness
Statistics and Applications {ISSN 2452-7395 (online)} Volume 16 No. 1, 2018 (New Series), pp 289-303 On Modifications to Linking Variance Estimators in the Fay-Herriot Model that Induce Robustness Snigdhansu
More informationA Monte Carlo Simulation of the Robust Rank- Order Test Under Various Population Symmetry Conditions
Journal of Modern Applied Statistical Methods Volume 12 Issue 1 Article 7 5-1-2013 A Monte Carlo Simulation of the Robust Rank- Order Test Under Various Population Symmetry Conditions William T. Mickelson
More informationCHAPTER 5. Outlier Detection in Multivariate Data
CHAPTER 5 Outlier Detection in Multivariate Data 5.1 Introduction Multivariate outlier detection is the important task of statistical analysis of multivariate data. Many methods have been proposed for
More informationAP Statistics Cumulative AP Exam Study Guide
AP Statistics Cumulative AP Eam Study Guide Chapters & 3 - Graphs Statistics the science of collecting, analyzing, and drawing conclusions from data. Descriptive methods of organizing and summarizing statistics
More informationA Brief Overview of Robust Statistics
A Brief Overview of Robust Statistics Olfa Nasraoui Department of Computer Engineering & Computer Science University of Louisville, olfa.nasraoui_at_louisville.edu Robust Statistical Estimators Robust
More informationTESTS FOR EQUIVALENCE BASED ON ODDS RATIO FOR MATCHED-PAIR DESIGN
Journal of Biopharmaceutical Statistics, 15: 889 901, 2005 Copyright Taylor & Francis, Inc. ISSN: 1054-3406 print/1520-5711 online DOI: 10.1080/10543400500265561 TESTS FOR EQUIVALENCE BASED ON ODDS RATIO
More informationAN IMPROVEMENT TO THE ALIGNED RANK STATISTIC
Journal of Applied Statistical Science ISSN 1067-5817 Volume 14, Number 3/4, pp. 225-235 2005 Nova Science Publishers, Inc. AN IMPROVEMENT TO THE ALIGNED RANK STATISTIC FOR TWO-FACTOR ANALYSIS OF VARIANCE
More informationStatistics 135 Fall 2008 Final Exam
Name: SID: Statistics 135 Fall 2008 Final Exam Show your work. The number of points each question is worth is shown at the beginning of the question. There are 10 problems. 1. [2] The normal equations
More informationA Modified M-estimator for the Detection of Outliers
A Modified M-estimator for the Detection of Outliers Asad Ali Department of Statistics, University of Peshawar NWFP, Pakistan Email: asad_yousafzay@yahoo.com Muhammad F. Qadir Department of Statistics,
More informationMultiple Linear Regression
Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2
MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and
More information1 A Review of Correlation and Regression
1 A Review of Correlation and Regression SW, Chapter 12 Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then
More informationSTAT Section 3.4: The Sign Test. The sign test, as we will typically use it, is a method for analyzing paired data.
STAT 518 --- Section 3.4: The Sign Test The sign test, as we will typically use it, is a method for analyzing paired data. Examples of Paired Data: Similar subjects are paired off and one of two treatments
More informationBootstrapping, Randomization, 2B-PLS
Bootstrapping, Randomization, 2B-PLS Statistics, Tests, and Bootstrapping Statistic a measure that summarizes some feature of a set of data (e.g., mean, standard deviation, skew, coefficient of variation,
More informationMedian Cross-Validation
Median Cross-Validation Chi-Wai Yu 1, and Bertrand Clarke 2 1 Department of Mathematics Hong Kong University of Science and Technology 2 Department of Medicine University of Miami IISA 2011 Outline Motivational
More informationBivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data.
Bivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data. Example: Some investors think that the performance of the stock market in January
More informationFinite-sample quantiles of the Jarque-Bera test
Finite-sample quantiles of the Jarque-Bera test Steve Lawford Department of Economics and Finance, Brunel University First draft: February 2004. Abstract The nite-sample null distribution of the Jarque-Bera
More informationBiost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation
Biost 58 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 5: Review Purpose of Statistics Statistics is about science (Science in the broadest
More informationReadings Howitt & Cramer (2014)
Readings Howitt & Cramer (014) Ch 7: Relationships between two or more variables: Diagrams and tables Ch 8: Correlation coefficients: Pearson correlation and Spearman s rho Ch 11: Statistical significance
More informationGeneralized Multivariate Rank Type Test Statistics via Spatial U-Quantiles
Generalized Multivariate Rank Type Test Statistics via Spatial U-Quantiles Weihua Zhou 1 University of North Carolina at Charlotte and Robert Serfling 2 University of Texas at Dallas Final revision for
More information11. Bootstrap Methods
11. Bootstrap Methods c A. Colin Cameron & Pravin K. Trivedi 2006 These transparencies were prepared in 20043. They can be used as an adjunct to Chapter 11 of our subsequent book Microeconometrics: Methods
More informationLawrence D. Brown* and Daniel McCarthy*
Comments on the paper, An adaptive resampling test for detecting the presence of significant predictors by I. W. McKeague and M. Qian Lawrence D. Brown* and Daniel McCarthy* ABSTRACT: This commentary deals
More informationResearch Article A Nonparametric Two-Sample Wald Test of Equality of Variances
Advances in Decision Sciences Volume 211, Article ID 74858, 8 pages doi:1.1155/211/74858 Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances David Allingham 1 andj.c.w.rayner
More informationDover- Sherborn High School Mathematics Curriculum Probability and Statistics
Mathematics Curriculum A. DESCRIPTION This is a full year courses designed to introduce students to the basic elements of statistics and probability. Emphasis is placed on understanding terminology and
More informationInstitute of Actuaries of India
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2018 Examinations Subject CT3 Probability and Mathematical Statistics Core Technical Syllabus 1 June 2017 Aim The
More informationExtending the Robust Means Modeling Framework. Alyssa Counsell, Phil Chalmers, Matt Sigal, Rob Cribbie
Extending the Robust Means Modeling Framework Alyssa Counsell, Phil Chalmers, Matt Sigal, Rob Cribbie One-way Independent Subjects Design Model: Y ij = µ + τ j + ε ij, j = 1,, J Y ij = score of the ith
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationApplication of Variance Homogeneity Tests Under Violation of Normality Assumption
Application of Variance Homogeneity Tests Under Violation of Normality Assumption Alisa A. Gorbunova, Boris Yu. Lemeshko Novosibirsk State Technical University Novosibirsk, Russia e-mail: gorbunova.alisa@gmail.com
More informationKey Algebraic Results in Linear Regression
Key Algebraic Results in Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) 1 / 30 Key Algebraic Results in
More information