Chapter 15 Confidence Intervals for Mean Difference Between Two Delta-Distributions


Karen V. Rosales and Joshua D. Naranjo

K.V. Rosales: MMS Holdings, Inc., Canton, MI, USA
J.D. Naranjo: Department of Statistics, Western Michigan University, Kalamazoo, MI 49008, USA

In: R.Y. Liu, J.W. McKean (eds.), Robust Rank-Based and Nonparametric Methods, Springer Proceedings in Mathematics & Statistics 168. Springer International Publishing Switzerland 2016.

Abstract Traditional two-sample estimation procedures like pooled-t, Welch's t, and the Wilcoxon-Hodges-Lehmann are often used for skewed data and data inflated with zero values. We investigate how well these work compared to dedicated procedures that consider the specialized nature of the data.

Keywords Two-sample estimation · Confidence intervals · Skewed distribution · Zero-inflated data · Delta distribution

15.1 Introduction

Some data are inherently nonnegative and contain a large number of zeros. Aitchison (1955) first described a distribution that contains both zero and positive values in an application to household expenditures. Some households spend nothing on, say, children's clothing, while others allocate high amounts that make the distribution skewed and approximately lognormal. In marine surveys, data are frequently inflated with zeros. Pennington (1983) examined a series of ichthyoplankton surveys aimed at estimating the total egg production of Atlantic mackerel in the study region. When zeros are mixed with lognormal positive values, the resulting distribution is referred to as a delta distribution (Aitchison 1955).

One-sample confidence intervals for the mean of a delta distribution were investigated by Owen and DeRouen (1980), Pennington (1983), Zhou and Tu (2000a), Fletcher (2008), and Rosales (2009). Zhou and Tu (2000a) explored different methods of constructing confidence intervals for the mean of a delta distribution, including a bootstrap and two likelihood-based intervals. Fletcher (2008) investigated a profile-likelihood
approach. Zhou and Tu (2000b) proposed a maximum likelihood-based method and a bootstrap method for constructing confidence intervals for the ratio of means of medical cost data that contained both lognormal and zero observations.

It remains unclear how well various two-sample confidence intervals work. For example, can we simply ignore the delta-distribution structure of the data and use traditional least-squares (LS) methods for estimating the difference between means? Will more robust versions work better? In this paper, we focus on commonly used two-sample confidence intervals and compare them to confidence intervals specifically derived under delta-distribution theory. We investigate how relative performance depends on sample size, proportion of zeros, the population means, and the population variances.

In Sect. 15.2, we set up notation and terminology. In Sect. 15.3, we describe the confidence intervals included in the simulation study. In Sect. 15.4, we discuss the results of the simulation study.

15.2 Notation and Terminology

Consider a population in which a proportion $\delta$ of the observations are zeros, and the nonzero values follow a lognormal distribution with parameters $\mu$ and $\sigma^2$. The population is said to have a delta distribution, denoted $\Delta(\delta, \mu, \sigma^2)$. We index the populations of interest by $j = 1, 2$. Thus the $j$th population has distribution $\Delta(\delta_j, \mu_j, \sigma_j^2)$, with mean $\kappa_j$ and variance $\rho_j^2$. The population mean and variance of the $j$th population are

$$\kappa_j = E[Y_j] = (1 - \delta_j)\, e^{\mu_j + \sigma_j^2/2} \tag{15.1}$$

$$\rho_j^2 = \mathrm{Var}[Y_j] = (1 - \delta_j)\, e^{2\mu_j + \sigma_j^2} \left( e^{\sigma_j^2} - (1 - \delta_j) \right) \tag{15.2}$$

Let $y_{1j}, \ldots, y_{n_j j}$ be a random sample from the $j$th population. Assume, without loss of generality, that the $n_{j1}$ nonzero observations are listed first and the $n_{j0} = n_j - n_{j1}$ zero observations are listed last.
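As a quick numerical check of the mean and variance formulas (15.1) and (15.2), the following minimal Python sketch evaluates them for parameter settings that appear later in the simulation study. The function names `delta_mean` and `delta_variance` are illustrative choices and are not part of the chapter.

```python
import math


def delta_mean(delta, mu, sigma2):
    """Population mean of a delta distribution, following Eq. (15.1)."""
    return (1.0 - delta) * math.exp(mu + sigma2 / 2.0)


def delta_variance(delta, mu, sigma2):
    """Population variance of a delta distribution, following Eq. (15.2)."""
    return (1.0 - delta) * math.exp(2.0 * mu + sigma2) * (
        math.exp(sigma2) - (1.0 - delta)
    )


# Difference in means for two parameter settings used in the chapter.
diff_a = delta_mean(0.2, 0.5, 1.0) - delta_mean(0.4, 0.5, 1.0)
diff_b = delta_mean(0.1, 0.5, 1.0) - delta_mean(0.5, 0.5, 1.0)
print(round(diff_a, 4), round(diff_b, 4))  # 0.5437 1.0873
```

The two printed differences agree with the values of the difference in means reported for the settings with unequal proportions of zeros.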
For the nonzero observations, let $x_{ij} = \log y_{ij}$ and

$$\hat{\delta}_j = \frac{n_{j0}}{n_j} \tag{15.3}$$

$$\hat{\mu}_j = \frac{\sum_{i=1}^{n_{j1}} \log y_{ij}}{n_{j1}} = \frac{\sum_{i=1}^{n_{j1}} x_{ij}}{n_{j1}} = \bar{x}_j \tag{15.4}$$

$$s_j^2 = \frac{\sum_{i=1}^{n_{j1}} (\log y_{ij} - \hat{\mu}_j)^2}{n_{j1} - 1} = \frac{\sum_{i=1}^{n_{j1}} (x_{ij} - \bar{x}_j)^2}{n_{j1} - 1} \tag{15.5}$$

Note that $\hat{\mu}_j$ and $s_j^2$ are simply the sample mean and variance of the log-transformed nonzero observations from the $j$th sample. The proportion of nonzero observations in the $j$th sample is $1 - \hat{\delta}_j$. Finney (1941) derived minimum-variance unbiased
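estimators for the lognormal mean and variance. Extending his results, Aitchison (1955) showed that the following is a minimum variance unbiased estimator of the mean of the delta distribution:

$$\hat{\kappa}_j = \begin{cases} \dfrac{n_{j1}}{n_j}\, e^{\hat{\mu}_j}\, G_{n_{j1}}\!\left(\dfrac{s_j^2}{2}\right) & \text{if } n_{j1} > 1 \\[1ex] \dfrac{y_{1j}}{n_j} & \text{if } n_{j1} = 1 \\[1ex] 0 & \text{if } n_{j1} = 0 \end{cases} \tag{15.6}$$

where, in the case $n_{j1} = 1$, $y_{1j}$ is the single nonzero observation on the original scale, and $G_{n_{j1}}(t)$ is a Bessel function defined as

$$G_{n_{j1}}(t) = 1 + \frac{n_{j1} - 1}{n_{j1}}\, t + \sum_{i=2}^{\infty} \frac{(n_{j1} - 1)^{2i-1}\, t^i}{n_{j1}^{\,i}\, (n_{j1} + 1)(n_{j1} + 3) \cdots (n_{j1} + 2i - 3)\; i!}$$

An estimate of the asymptotic variance is given by Aitchison and Brown (1969):

$$\widehat{\mathrm{Var}}_1(\hat{\kappa}_j) = \frac{e^{2\hat{\mu}_j + s_j^2}}{n_j} \left[ \hat{\delta}_j (1 - \hat{\delta}_j) + \frac{(1 - \hat{\delta}_j)(2 s_j^2 + s_j^4)}{2} \right] \tag{15.7}$$

Owen and DeRouen (1980) suggested confidence interval estimates based on these estimates of mean and variance. Pennington (1983) proposed an interval estimate using an alternative estimate of the variance, as follows:

$$\widehat{\mathrm{Var}}_{\mathrm{pen}}(\hat{\kappa}_j) = \begin{cases} \dfrac{n_{j1}}{n_j}\, e^{2\hat{\mu}_j} \left\{ \dfrac{n_{j1}}{n_j}\, G_{n_{j1}}^2\!\left(\dfrac{s_j^2}{2}\right) - \dfrac{n_{j1} - 1}{n_j - 1}\, G_{n_{j1}}\!\left(\dfrac{n_{j1} - 2}{n_{j1} - 1}\, s_j^2\right) \right\} & \text{if } n_{j1} > 1 \\[1ex] \left(\dfrac{y_{1j}}{n_j}\right)^2 & \text{if } n_{j1} = 1 \\[1ex] 0 & \text{if } n_{j1} = 0 \end{cases} \tag{15.8}$$

15.3 Two-Sample Confidence Intervals

We are interested in confidence interval estimates for the difference between means $\kappa_1 - \kappa_2$ of two delta distributions. We first consider traditional least-squares confidence intervals based on Student's t-distribution, using either the pooled-SD version or the unpooled-SD Welch-Satterthwaite version. The pooled-t $100(1 - \alpha)\%$ confidence interval is given by

$$\left[ (\bar{y}_1 - \bar{y}_2) - t_{\alpha/2,\,df}\, S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}},\;\; (\bar{y}_1 - \bar{y}_2) + t_{\alpha/2,\,df}\, S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \right] \tag{15.9}$$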
where $\bar{y}_j = \frac{1}{n_j} \sum_{i=1}^{n_j} y_{ij}$ is the sample mean for the $j$th sample, $t_{\alpha/2,\,df}$ is the upper $\alpha/2$ percentile of the t-distribution, $n_j$ is the sample size, $df = n_1 + n_2 - 2$, and $S_p$ is the pooled standard deviation. We refer to this method as Pooled-t in the simulation study.

A $100(1 - \alpha)\%$ confidence interval based on Welch's statistic is

$$\left[ (\bar{y}_1 - \bar{y}_2) - t_{\alpha/2,\,\nu} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},\;\; (\bar{y}_1 - \bar{y}_2) + t_{\alpha/2,\,\nu} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \right] \tag{15.10}$$

where, in this interval only, $s_j^2$ denotes the sample variance of the untransformed observations in the $j$th sample rather than the log-scale variance of (15.5). The degrees of freedom associated with this variance estimate are approximated using the Welch-Satterthwaite equation

$$\nu = \frac{\left( \dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} \right)^2}{\dfrac{s_1^4}{n_1^2 (n_1 - 1)} + \dfrac{s_2^4}{n_2^2 (n_2 - 1)}}$$

This method will be denoted as Welch-t in the simulation study.

Since the lognormal is right skewed, more robust alternatives might work better than the t-based methods. A rank-based alternative is the confidence interval based on the Wilcoxon rank sum test. See, for example, Hollander et al. (2014). The Wilcoxon interval may be computed as follows. Form all $n_1 n_2$ pairwise differences $y_{h1} - y_{i2}$ between the first group and the second group. Let $D_{(1)}, D_{(2)}, \ldots, D_{(n_1 n_2)}$ denote these ordered differences. The Hodges-Lehmann point estimator of $\kappa_1 - \kappa_2$ is the median of these differences. A $100(1 - \alpha)\%$ confidence interval is given by

$$\left( D_{(C_\alpha)},\; D_{(n_1 n_2 + 1 - C_\alpha)} \right) \tag{15.11}$$

where $C_\alpha = \dfrac{n_1 (2 n_2 + n_1 + 1)}{2} + 1 - w_{\alpha/2}$, and $w_{\alpha/2}$ is an appropriate percentile of the rank sum distribution. For large samples, a normal approximation of $C_\alpha$ is given by

$$C_\alpha = \frac{n_1 n_2}{2} - z_{\alpha/2} \left[ \frac{n_1 n_2 (n_1 + n_2 + 1)}{12} \right]^{1/2}$$

This method is denoted as Wilcoxon in the simulation study.

Both versions of the t-interval and the Wilcoxon interval ignore the zero-inflated nature of the data. One may construct a confidence interval based on Aitchison's minimum variance unbiased estimator $\hat{\kappa}$ and Pennington's estimator of the variance of $\hat{\kappa}$. A $100(1 - \alpha)\%$ confidence interval for $(\kappa_1 - \kappa_2)$ is

$$(\hat{\kappa}_1 - \hat{\kappa}_2) \pm z_{\alpha/2} \sqrt{\widehat{\mathrm{Var}}_{\mathrm{pen}}(\hat{\kappa}_1) + \widehat{\mathrm{Var}}_{\mathrm{pen}}(\hat{\kappa}_2)} \tag{15.12}$$

where $\hat{\kappa}$ and $\widehat{\mathrm{Var}}_{\mathrm{pen}}$ are given in Eqs. (15.6) and (15.8), respectively. This method will be referred to as MVUE1 in the simulation study.
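The MVUE1 construction can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' implementation: the helper names (`g_series`, `mvue_mean`, `mvue1_interval`) are hypothetical, the infinite series for G is truncated once its terms become negligible, and a negative variance sum is clipped at zero before taking the square root.

```python
import math
from statistics import NormalDist


def g_series(n1, t, tol=1e-12, max_terms=500):
    """Series G_{n1}(t) used in Eqs. (15.6) and (15.8), truncated numerically."""
    term = (n1 - 1) / n1 * t          # the i = 1 term
    total = 1.0 + term
    for i in range(2, max_terms):
        # Ratio of the i-th term to the (i-1)-th term of the series.
        term *= (n1 - 1) ** 2 * t / (n1 * (n1 + 2 * i - 3) * i)
        total += term
        if abs(term) < tol * abs(total):
            break
    return total


def mvue_mean(y):
    """Return (Aitchison's estimate, Pennington's variance estimate)."""
    n = len(y)
    positives = [v for v in y if v > 0]
    n1 = len(positives)
    if n1 == 0:
        return 0.0, 0.0
    if n1 == 1:
        est = positives[0] / n        # single nonzero value, original scale
        return est, est ** 2
    logs = [math.log(v) for v in positives]
    mu = sum(logs) / n1
    s2 = sum((x - mu) ** 2 for x in logs) / (n1 - 1)
    est = n1 / n * math.exp(mu) * g_series(n1, s2 / 2)
    var = n1 / n * math.exp(2 * mu) * (
        n1 / n * g_series(n1, s2 / 2) ** 2
        - (n1 - 1) / (n - 1) * g_series(n1, (n1 - 2) / (n1 - 1) * s2)
    )
    return est, var


def mvue1_interval(y1, y2, level=0.95):
    """Normal-theory interval (15.12) for the difference in delta means."""
    k1, v1 = mvue_mean(y1)
    k2, v2 = mvue_mean(y2)
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    half = z * math.sqrt(max(v1 + v2, 0.0))  # assumption: clip at zero
    return (k1 - k2) - half, (k1 - k2) + half
```

When all nonzero values equal 1, the log-scale variance is zero and the estimator reduces to the sample proportion of nonzero values, which gives a simple sanity check: for the sample (0, 1, 1) the estimate is 2/3 with estimated variance 1/9.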
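An alternative confidence interval can be constructed based on the variance estimate from Aitchison and Brown (1969). This $100(1 - \alpha)\%$ confidence interval for $(\kappa_1 - \kappa_2)$ is

$$(\hat{\kappa}_1 - \hat{\kappa}_2) \pm z_{\alpha/2} \sqrt{\widehat{\mathrm{Var}}_1(\hat{\kappa}_1) + \widehat{\mathrm{Var}}_1(\hat{\kappa}_2)} \tag{15.13}$$

where $\hat{\kappa}$ and $\widehat{\mathrm{Var}}_1$ are given in Eqs. (15.6) and (15.7), respectively. We refer to this method as MVUE2 for the rest of this chapter.

In addition to the above confidence intervals, we propose two additional robust confidence intervals. Since the sample mean and the sample variance lack robustness, Al-Khouli (1999) proposed to directly replace $\hat{\mu}$ and $s^2$ in (15.4) and (15.5) with robust M-estimators to obtain robust estimators of $\mu$ and $\sigma^2$. In his simulation, using $(T_H, S_b^2)$ in place of $(\hat{\mu}, s^2)$ seemed to work best, where $T_H$ is the one-step Huber M-estimator of location and $S_b^2$ is a biweight A-estimator of scale. Directly substituting $T_H$ and $S_b^2$ in place of $\hat{\mu}$ and $s^2$ in (15.6) and (15.8), we get a robust version of the MVUE1 interval (15.12). The confidence interval is

$$(\hat{\kappa}_{M1} - \hat{\kappa}_{M2}) \pm z_{\alpha/2} \sqrt{\widehat{\mathrm{Var}}_M(\hat{\kappa}_{M1}) + \widehat{\mathrm{Var}}_M(\hat{\kappa}_{M2})} \tag{15.14}$$

where

$$\hat{\kappa}_{Mj} = \begin{cases} \dfrac{n_{j1}}{n_j}\, e^{T_{Hj}}\, G_{n_{j1}}\!\left(\dfrac{S_{bj}^2}{2}\right) & \text{if } n_{j1} > 1 \\[1ex] \dfrac{y_{1j}}{n_j} & \text{if } n_{j1} = 1 \\[1ex] 0 & \text{if } n_{j1} = 0 \end{cases}$$

and

$$\widehat{\mathrm{Var}}_M(\hat{\kappa}_{Mj}) = \begin{cases} \dfrac{n_{j1}}{n_j}\, e^{2 T_{Hj}} \left\{ \dfrac{n_{j1}}{n_j}\, G_{n_{j1}}^2\!\left(\dfrac{S_{bj}^2}{2}\right) - \dfrac{n_{j1} - 1}{n_j - 1}\, G_{n_{j1}}\!\left(\dfrac{n_{j1} - 2}{n_{j1} - 1}\, S_{bj}^2\right) \right\} & \text{if } n_{j1} > 1 \\[1ex] \left(\dfrac{y_{1j}}{n_j}\right)^2 & \text{if } n_{j1} = 1 \\[1ex] 0 & \text{if } n_{j1} = 0 \end{cases}$$

with $y_{1j}$ the single nonzero observation on the original scale when $n_{j1} = 1$. This method is referred to as RMVUE1 in the simulation study.

Similarly, a robust version of the MVUE2 confidence interval (15.13) replaces $\hat{\mu}$ and $s^2$ in Eqs. (15.6) and (15.7) with their robust versions. The confidence interval is

$$(\hat{\kappa}_{M1} - \hat{\kappa}_{M2}) \pm z_{\alpha/2} \sqrt{\widehat{\mathrm{Var}}_1(\hat{\kappa}_{M1}) + \widehat{\mathrm{Var}}_1(\hat{\kappa}_{M2})} \tag{15.15}$$

where

$$\hat{\kappa}_{Mj} = \frac{n_{j1}}{n_j}\, e^{T_{Hj}}\, G_{n_{j1}}\!\left(\frac{S_{bj}^2}{2}\right)$$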
and

$$\widehat{\mathrm{Var}}_1(\hat{\kappa}_{Mj}) = \frac{e^{2 T_{Hj} + S_{bj}^2}}{n_j} \left[ \hat{\delta}_j (1 - \hat{\delta}_j) + \frac{(1 - \hat{\delta}_j)(2 S_{bj}^2 + S_{bj}^4)}{2} \right]$$

We denote this method as RMVUE2 in the simulation study.

15.4 Simulation

To assess the general performance and robustness of the interval estimators (15.9)-(15.15), we conducted a simulation study under various parameter combinations of the delta distribution. Performance of the different estimators is assessed using the following criteria:

Coverage Probability (CP): proportion of times that the 95% confidence interval contains the true value of $\kappa_1 - \kappa_2$.
Coverage Error (CE): absolute difference between the coverage probability and 95%.
Lower Error Rate (LER): proportion of times that the true value of $\kappa_1 - \kappa_2$ falls below the interval.
Upper Error Rate (UER): proportion of times that the true value of $\kappa_1 - \kappa_2$ falls above the interval.
Average Width (Width): average width of the 95% confidence interval.

Note that all confidence intervals have the confidence level set at 95%. Ideally an estimation procedure will have CP = 0.95, CE = 0.0, LER = 0.025, and UER = 0.025. We also report the average width of each method. We evaluate performance at balanced sample sizes of 15 and 50. Ten thousand simulations are done for each combination of parameters and sample size.

Table 15.1 shows simulation results when the two delta distributions are the same. MVUE1 and RMVUE1 seem to do best, achieving narrower intervals without sacrificing coverage probability. Coverage probabilities all exceed 0.95, possibly because skewness inflates the standard error estimates. The naive t-based intervals seem competitive, with reasonable width and coverage probability. The Wilcoxon interval has the shortest width.

Table 15.2 shows simulation results when $\delta_1 \neq \delta_2$. Again, MVUE1 and RMVUE1 seem to do best, with narrower intervals without sacrificing coverage probability. The naive t-based intervals remain competitive, with reasonable width and coverage probability.
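The five performance criteria defined for the simulation study can be computed from a collection of simulated intervals as in the following sketch. The function name `summarize_intervals` and the toy intervals are illustrative assumptions; in an actual study the list would hold one interval per simulated pair of samples.

```python
def summarize_intervals(intervals, truth, nominal=0.95):
    """Summarize simulated confidence intervals against a known true value."""
    n = len(intervals)
    # LER: the true value falls below the interval (truth < lower limit).
    ler = sum(1 for lo, hi in intervals if truth < lo) / n
    # UER: the true value falls above the interval (truth > upper limit).
    uer = sum(1 for lo, hi in intervals if truth > hi) / n
    # CP: the interval contains the true value.
    cp = sum(1 for lo, hi in intervals if lo <= truth <= hi) / n
    return {
        "CP": cp,
        "CE": abs(cp - nominal),
        "LER": ler,
        "UER": uer,
        "Width": sum(hi - lo for lo, hi in intervals) / n,
    }


# Four hand-made intervals evaluated against a true difference of 0.
toy = [(-1.0, 1.0), (0.5, 2.0), (-3.0, -0.5), (-0.2, 0.3)]
summary = summarize_intervals(toy, truth=0.0)
```

Reporting LER and UER separately, rather than only CP, shows whether an interval misses the target symmetrically or mostly on one side, which matters for skewed data.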
The Wilcoxon interval still has by far the shortest width, but achieves this at the price of unacceptably low coverage probability, especially for larger differences in $\delta$.

Table 15.3 shows simulation results when $\mu_1 \neq \mu_2$. MVUE1 and RMVUE1 still seem to do best, with RMVUE1 edging out MVUE1 in coverage probability
and width. MVUE2 and RMVUE2 attain better coverage probabilities at the cost of significantly wider intervals. The naive procedures pooled-t and Welch-t are surprisingly competitive, with reasonable width and coverage probability. The Wilcoxon interval has unacceptably low coverage probability, especially for larger differences in $\mu$.

Table 15.1 95% CI under equal distributions $\Delta_1(0.2, 0.5, 1)$ and $\Delta_2(0.2, 0.5, 1)$: $\kappa_1 - \kappa_2 = 0$
Columns: Method, Sample size, CP, CE, LER, UER, Width
Rows (for each sample size): Pooled-t, Welch-t, Wilcoxon, MVUE1, MVUE2, RMVUE1, RMVUE2
(Numeric entries are not preserved in this transcription.)

Table 15.4 shows simulation results when $\sigma_1^2 \neq \sigma_2^2$. All intervals have problems maintaining close to 95% coverage probability, especially for larger differences in $\sigma^2$.

The simulations show two notable features of Wilcoxon confidence intervals: they tend to be shorter and to have low coverage probability. Wilcoxon intervals are a function of the ordered pairwise differences between the two samples [see, e.g., Hollander et al. (2014)]. If $(\delta_1, \delta_2)$ are both large, then many pairwise differences are 0 regardless of the values of the positive observations. This seems to reduce the length of the Wilcoxon interval more than that of the others. Low coverage probability may be a result of the Wilcoxon interval estimating the wrong parameter. The Wilcoxon point estimator is the median of pairwise differences, which is naturally a better estimate of the true median of differences (i.e., the median of $F_{Y_1 - Y_2}$) than of the difference in means $\kappa_1 - \kappa_2$. For example, given two distributions $\Delta(0.1, 0.5, 1)$ and $\Delta(0.5, 0.5, 1)$, the difference in means is $\kappa_1 - \kappa_2 = 1.0873$ while the median of the difference is $m = 0.7988$. In Table 15.5, we reassess the performance of Wilcoxon by looking at the percentage of times it contains the median of differences $m$ instead of $\kappa_1 - \kappa_2$.
The Wilcoxon 95% interval's coverage probabilities for $\kappa_1 - \kappa_2 = 1.0873$ are quite low at both sample sizes, but its coverage probabilities for $m = 0.7988$ are markedly better, as
found in the entries labeled Wilcoxon (for m). In fact, in all cases (see the rest of Table 15.5), as long as we measure the percentage of times that the Wilcoxon interval contains the appropriate parameter $m$ instead of $\kappa_1 - \kappa_2$, the Wilcoxon interval has the best coverage probability and the narrowest width. Since MVUE2 and RMVUE2 trail MVUE1 and RMVUE1 in Tables 15.2, 15.3, and 15.4, they have been removed from Table 15.5 for space considerations.

Table 15.2 95% CI under varying proportion of zeros $\delta$
Panel 1: $\Delta_1(0.2, 0.5, 1)$ and $\Delta_2(0.4, 0.5, 1)$: $\kappa_1 - \kappa_2 = 0.5437$
Panel 2: $\Delta_1(0.1, 0.5, 1)$ and $\Delta_2(0.5, 0.5, 1)$: $\kappa_1 - \kappa_2 = 1.0873$
Columns: Method, Sample size, CP, CE, LER, UER, Width
Rows (for each sample size): Pooled-t, Welch-t, Wilcoxon, MVUE1, MVUE2, RMVUE1, RMVUE2
(Numeric entries are not preserved in this transcription.)
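The behavior of the Wilcoxon interval discussed above can be explored with a short sketch of the Hodges-Lehmann estimate and the large-sample version of interval (15.11). The function name `wilcoxon_interval` is hypothetical, and rounding the normal approximation of the cutoff to the nearest integer (with a floor of 1) is an assumption of this sketch, not something the chapter specifies.

```python
import math
from statistics import NormalDist, median


def wilcoxon_interval(y1, y2, level=0.95):
    """Hodges-Lehmann estimate and large-sample Wilcoxon interval (15.11)."""
    n1, n2 = len(y1), len(y2)
    # All n1 * n2 pairwise differences, in increasing order.
    diffs = sorted(a - b for a in y1 for b in y2)
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    c = n1 * n2 / 2.0 - z * math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    # Assumption: round to the nearest integer, never below 1.
    c = max(int(round(c)), 1)
    lower = diffs[c - 1]          # D_(C), 1-indexed in the text
    upper = diffs[n1 * n2 - c]    # D_(n1*n2 + 1 - C)
    return median(diffs), (lower, upper)


# Two zero-heavy toy samples: 16 of the 36 pairwise differences are exactly 0.
y1 = [0.0, 0.0, 0.0, 0.0, 1.5, 4.0]
y2 = [0.0, 0.0, 0.0, 0.0, 0.7, 2.2]
hl_estimate, (lo, hi) = wilcoxon_interval(y1, y2)
```

In this toy example the Hodges-Lehmann estimate is 0 because the middle of the ordered differences consists of zeros produced by zero-zero pairs, regardless of how large the positive observations are. This illustrates why the interval tracks the median of differences rather than the difference in means.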
Table 15.3 95% CI under varying lognormal parameter $\mu$
Panel 1: $\Delta_1(0.2, 0, 1)$ and $\Delta_2(0.2, 0.5, 1)$: $\kappa_1 - \kappa_2 = -0.8556$
Panel 2: $\Delta_1(0.2, 0, 1)$ and $\Delta_2(0.2, 0.9, 1)$: $\kappa_1 - \kappa_2 = -1.9252$
Columns: Method, Sample size, CP, CE, LER, UER, Width
Rows (for each sample size): Pooled-t, Welch-t, Wilcoxon, MVUE1, MVUE2, RMVUE1, RMVUE2
(Numeric entries are not preserved in this transcription.)

15.5 Conclusion

Traditional two-sample estimation procedures like pooled-t and Welch's t that assume a normal distribution are often used for skewed data and data inflated with zero values. Our simulations show that these naive nonrobust approaches do not do too badly compared to dedicated delta-distribution procedures, in terms of coverage probabilities and interval width.

Among the dedicated approaches, we would recommend MVUE1 and its robust version RMVUE1. The MVUE1 procedure is based on the mean estimator
$\hat{\kappa}$ by Aitchison (1955) and the variance estimator by Pennington (1983). RMVUE1 is similar to MVUE1 but uses M-estimates for the lognormal parameters $\mu$ and $\sigma^2$.

The Wilcoxon two-sample interval performed consistently badly, but only when it was asked to estimate the difference in means $\kappa_1 - \kappa_2$. When used to estimate the median of differences $m$, it performed very well in terms of coverage probability, and generally had the shortest interval width. Of course, the usefulness of the Wilcoxon interval will depend on whether the user wants to estimate the median of differences instead of the difference in means.

Table 15.4 95% CI under varying lognormal parameter $\sigma^2$
Panel 1: $\Delta_1(0.2, 0.5, 0.15)$ and $\Delta_2(0.2, 0.5, 1.0)$: $\kappa_1 - \kappa_2 = -0.7529$
Panel 2: $\Delta_1(0.2, 0.5, 0.15)$ and $\Delta_2(0.2, 0.5, 2.0)$: $\kappa_1 - \kappa_2 = -2.1636$
Columns: Method, Sample size, CP, CE, LER, UER, Width
Rows (for each sample size): Pooled-t, Welch-t, Wilcoxon, MVUE1, MVUE2, RMVUE1, RMVUE2
(Numeric entries are not preserved in this transcription.)
Table 15.5 95% CI under varying parameters and sample size
Panel 1 (varying $\delta$): $\Delta_1(0.1, 0.5, 1.0)$ and $\Delta_2(0.5, 0.5, 1.0)$: $\kappa_1 - \kappa_2 = 1.0873$, $m = 0.7988$
Panel 2 (varying $\mu$): $\Delta_1(0.2, 0, 1)$ and $\Delta_2(0.2, 0.9, 1)$: $\kappa_1 - \kappa_2 = -1.9252$ (value of $m$ not preserved)
Panel 3 (varying $\sigma^2$): $\Delta_1(0.2, 0.5, 0.15)$ and $\Delta_2(0.2, 0.5, 2.0)$: $\kappa_1 - \kappa_2 = -2.1636$, $m = 0.0$
Columns: Method, Sample size, CP, CE, LER, UER, Width
Rows (for each sample size): Pooled-t, Welch-t, Wilcoxon (for $\kappa_1 - \kappa_2$), Wilcoxon (for $m$), MVUE1, RMVUE1
Note: The Wilcoxon interval is assessed for containing both $\kappa_1 - \kappa_2$ and the median of differences $m$.
(Numeric entries are not preserved in this transcription.)
References

Aitchison, J. (1955). On the distribution of a positive random variable having a discrete probability mass at the origin. Journal of the American Statistical Association, 50(271).
Aitchison, J., & Brown, J. (1969). The lognormal distribution. Cambridge: Cambridge University Press.
Al-Khouli, A. (1999). Robust estimation and bootstrap testing for the delta distribution with applications in marine sciences. Ph.D. dissertation, Texas A&M University.
Finney, D. J. (1941). On the distribution of a variate whose logarithm is normally distributed. Journal of the Royal Statistical Society, Series B, 7.
Fletcher, D. (2008). Confidence intervals for the mean of the delta-lognormal distribution. Environmental and Ecological Statistics, 15(2).
Hollander, M., Wolfe, D., & Chicken, E. (2014). Nonparametric statistical methods. Hoboken: Wiley.
Owen, W., & DeRouen, T. (1980). Estimation of the mean for lognormal data containing zeroes and left-censored values, with applications to the measurement of worker exposure to air contaminants. Biometrics, 36(4).
Pennington, M. (1983). Efficient estimators of abundance, for fish and plankton surveys. Biometrics, 39(1).
Rosales, M. (2009). The robustness of confidence intervals for the mean of delta distribution. Ph.D. dissertation, Western Michigan University.
Zhou, X. H., & Tu, W. (2000a). Confidence intervals for the mean of diagnostic test charge data containing zeros. Biometrics, 56(4).
Zhou, X. H., & Tu, W. (2000b). Interval estimation for the ratio in means of log-normally distributed medical costs with zero values. Computational Statistics and Data Analysis, 35(2).
More informationAnalysis of 2x2 CrossOver Designs using TTests
Chapter 234 Analysis of 2x2 CrossOver Designs using TTests Introduction This procedure analyzes data from a twotreatment, twoperiod (2x2) crossover design. The response is assumed to be a continuous
More information9/2/2010. Wildlife Management is a very quantitative field of study. throughout this course and throughout your career.
Introduction to Data and Analysis Wildlife Management is a very quantitative field of study Results from studies will be used throughout this course and throughout your career. Sampling design influences
More informationA Nonparametric Estimator of Species Overlap
A Nonparametric Estimator of Species Overlap Jack C. Yue 1, Murray K. Clayton 2, and FengChang Lin 1 1 Department of Statistics, National Chengchi University, Taipei, Taiwan 11623, R.O.C. and 2 Department
More informationAvoiding Bias in Calculations of Relative Growth Rate
Annals of Botany 80: 37±4, 00 doi:0.093/aob/mcf40, available online at www.aob.oupjournals.org Avoiding Bias in Calculations of Relative Growth Rate WILLIAM A. HOFFMANN, * and HENDRIK POORTER Departamento
More informationWhat is Experimental Design?
One Factor ANOVA What is Experimental Design? A designed experiment is a test in which purposeful changes are made to the input variables (x) so that we may observe and identify the reasons for change
More informationAsymptotic StatisticsVI. Changliang Zou
Asymptotic StatisticsVI Changliang Zou KolmogorovSmirnov distance Example (KolmogorovSmirnov confidence intervals) We know given α (0, 1), there is a welldefined d = d α,n such that, for any continuous
More informationUncertainty due to Finite Resolution Measurements
Uncertainty due to Finite Resolution Measurements S.D. Phillips, B. Tolman, T.W. Estler National Institute of Standards and Technology Gaithersburg, MD 899 Steven.Phillips@NIST.gov Abstract We investigate
More informationSTATISTICAL INFERENCE FOR SURVEY DATA ANALYSIS
STATISTICAL INFERENCE FOR SURVEY DATA ANALYSIS David A Binder and Georgia R Roberts Methodology Branch, Statistics Canada, Ottawa, ON, Canada K1A 0T6 KEY WORDS: Designbased properties, Informative sampling,
More informationStatistical Applications in Genetics and Molecular Biology
Statistical Applications in Genetics and Molecular Biology Volume 6, Issue 1 2007 Article 28 A Comparison of Methods to Control Type I Errors in Microarray Studies Jinsong Chen Mark J. van der Laan Martyn
More informationChapters 46: Estimation
Chapters 46: Estimation Read sections 4. (except 4..3), 4.5.1, 5.1 (except 5.1.5), 6.1 (except 6.1.3) Point Estimation (4.1.1) Point Estimator  A formula applied to a data set which results in a single
More informationDouble Bootstrap Confidence Interval Estimates with Censored and Truncated Data
Journal of Modern Applied Statistical Methods Volume 13 Issue 2 Article 22 112014 Double Bootstrap Confidence Interval Estimates with Censored and Truncated Data Jayanthi Arasan University Putra Malaysia,
More informationLinear Models 1. Isfahan University of Technology Fall Semester, 2014
Linear Models 1 Isfahan University of Technology Fall Semester, 2014 References: [1] G. A. F., Seber and A. J. Lee (2003). Linear Regression Analysis (2nd ed.). Hoboken, NJ: Wiley. [2] A. C. Rencher and
More informationUnit 2. Describing Data: Numerical
Unit 2 Describing Data: Numerical Describing Data Numerically Describing Data Numerically Central Tendency Arithmetic Mean Median Mode Variation Range Interquartile Range Variance Standard Deviation Coefficient
More informationInference for Distributions Inference for the Mean of a Population
Inference for Distributions Inference for the Mean of a Population PBS Chapter 7.1 009 W.H Freeman and Company Objectives (PBS Chapter 7.1) Inference for the mean of a population The t distributions The
More informationAN EMPIRICAL LIKELIHOOD RATIO TEST FOR NORMALITY
Econometrics Working Paper EWP0401 ISSN 14856441 Department of Economics AN EMPIRICAL LIKELIHOOD RATIO TEST FOR NORMALITY Lauren Bin Dong & David E. A. Giles Department of Economics, University of Victoria
More informationIt's Only Fitting. Fitting model to data parameterizing model estimating unknown parameters in the model
It's Only Fitting Fitting model to data parameterizing model estimating unknown parameters in the model Likelihood: an example Cohort of 8! individuals observe survivors at times >œ 1, 2, 3,..., : 8",
More informationMarcia Gumpertz and Sastry G. Pantula Department of Statistics North Carolina State University Raleigh, NC
A Simple Approach to Inference in Random Coefficient Models March 8, 1988 Marcia Gumpertz and Sastry G. Pantula Department of Statistics North Carolina State University Raleigh, NC 276958203 Key Words
More informationSampling Distributions: Central Limit Theorem
Review for Exam 2 Sampling Distributions: Central Limit Theorem Conceptually, we can break up the theorem into three parts: 1. The mean (µ M ) of a population of sample means (M) is equal to the mean (µ)
More informationBootstrapBased T 2 Multivariate Control Charts
BootstrapBased T 2 Multivariate Control Charts Poovich Phaladiganon Department of Industrial and Manufacturing Systems Engineering University of Texas at Arlington Arlington, Texas, USA Seoung Bum Kim
More informationMidterm 1 and 2 results
Midterm 1 and 2 results Midterm 1 Midterm 2  Min. :40.00 Min. : 20.0 1st Qu.:60.00 1st Qu.:60.00 Median :75.00 Median :70.0 Mean :71.97 Mean :69.77 3rd Qu.:85.00 3rd Qu.:85.0
More informationOrdinary Least Squares Regression Explained: Vartanian
Ordinary Least Squares Regression Explained: Vartanian When to Use Ordinary Least Squares Regression Analysis A. Variable types. When you have an interval/ratio scale dependent variable.. When your independent
More informationMultiple Pairwise Comparison Procedures in OneWay ANOVA with Fixed Effects Model
Biostatistics 250 ANOVA Multiple Comparisons 1 ORIGIN 1 Multiple Pairwise Comparison Procedures in OneWay ANOVA with Fixed Effects Model When the omnibus FTest for ANOVA rejects the null hypothesis that
More informationDescription Syntax for predict Menu for predict Options for predict Remarks and examples Methods and formulas References Also see
Title stata.com logistic postestimation Postestimation tools for logistic Description Syntax for predict Menu for predict Options for predict Remarks and examples Methods and formulas References Also see
More informationFULL LIKELIHOOD INFERENCES IN THE COX MODEL
October 20, 2007 FULL LIKELIHOOD INFERENCES IN THE COX MODEL BY JIANJIAN REN 1 AND MAI ZHOU 2 University of Central Florida and University of Kentucky Abstract We use the empirical likelihood approach
More informationMulticollinearity and A Ridge Parameter Estimation Approach
Journal of Modern Applied Statistical Methods Volume 15 Issue Article 5 111016 Multicollinearity and A Ridge Parameter Estimation Approach Ghadban Khalaf King Khalid University, albadran50@yahoo.com
More informationStatistical comparison of univariate tests of homogeneity of variances
Submitted to the Journal of Statistical Computation and Simulation Statistical comparison of univariate tests of homogeneity of variances Pierre Legendre* and Daniel Borcard Département de sciences biologiques,
More informationPercentage point z /2
Chapter 8: Statistical Intervals Why? point estimate is not reliable under resampling. Interval Estimates: Bounds that represent an interval of plausible values for a parameter There are three types of
More informationA BIMODAL EXPONENTIAL POWER DISTRIBUTION
Pak. J. Statist. Vol. 6(), 379396 A BIMODAL EXPONENTIAL POWER DISTRIBUTION Mohamed Y. Hassan and Rafiq H. Hijazi Department of Statistics United Arab Emirates University P.O. Box 7555, AlAin, U.A.E.
More informationAppendix A Summary of Tasks. Appendix Table of Contents
Appendix A Summary of Tasks Appendix Table of Contents Reporting Tasks...357 ListData...357 Tables...358 Graphical Tasks...358 BarChart...358 PieChart...359 Histogram...359 BoxPlot...360 Probability Plot...360
More informationObtaining Uncertainty Measures on Slope and Intercept
Obtaining Uncertainty Measures on Slope and Intercept of a Least Squares Fit with Excel s LINEST Faith A. Morrison Professor of Chemical Engineering Michigan Technological University, Houghton, MI 39931
More informationPOLYNOMIAL REGRESSION AND ESTIMATING FUNCTIONS IN THE PRESENCE OF MULTIPLICATIVE MEASUREMENT ERROR
POLYNOMIAL REGRESSION AND ESTIMATING FUNCTIONS IN THE PRESENCE OF MULTIPLICATIVE MEASUREMENT ERROR Stephen J. Iturria and Raymond J. Carroll 1 Texas A&M University, USA David Firth University of Oxford,
More informationCentral Limit Theorem Confidence Intervals Worked example #6. July 24, 2017
Central Limit Theorem Confidence Intervals Worked example #6 July 24, 2017 10 8 Raw scores 6 4 Mean=71.4% 2 0 010 1020 2030 3040 4050 5060 6070 7080 8090 90+ Scaling is to add 3.6% to bring mean
More information4. STATISTICAL SAMPLING DESIGNS FOR ISM
IRTC Incremental Sampling Methodology February 2012 4. STATISTICAL SAMPLING DESIGNS FOR ISM This section summarizes results of simulation studies used to evaluate the performance of ISM in estimating the
More informationA Hypothesis Test for the End of a Common Source Outbreak
Johns Hopkins University, Dept. of Biostatistics Working Papers 9202004 A Hypothesis Test for the End of a Common Source Outbreak Ron Brookmeyer Johns Hopkins Bloomberg School of Public Health, Department
More informationCHAPTER 17 CHISQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)
FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NONPARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter
More informationA Note on Coverage Probability of Confidence Interval for the Difference between Two Normal Variances
pplied Mathematical Sciences, Vol 6, 01, no 67, 3313330 Note on Coverage Probability of Confidence Interval for the Difference between Two Normal Variances Saaat Niwitpong Department of pplied Statistics,
More information10/31/2012. OneWay ANOVA Ftest
PSY 511: Advanced Statistics for Psychological and Behavioral Research 1 1. Situation/hypotheses 2. Test statistic 3.Distribution 4. Assumptions OneWay ANOVA Ftest One factor J>2 independent samples
More informationCHAPTER 4. > 0, where β
CHAPTER 4 SOLUTIONS TO PROBLEMS 4. (i) and (iii) generally cause the t statistics not to have a t distribution under H. Homoskedasticity is one of the CLM assumptions. An important omitted variable violates
More informationTwobytwo ANOVA: Global and Graphical Comparisons Based on an Extension of the Shift Function
Journal of Data Science 7(2009), 459468 Twobytwo ANOVA: Global and Graphical Comparisons Based on an Extension of the Shift Function Rand R. Wilcox University of Southern California Abstract: When comparing
More informationSome Observations on the Wilcoxon Rank Sum Test
UW Biostatistics Working Paper Series 816011 Some Observations on the Wilcoxon Rank Sum Test Scott S. Emerson University of Washington, semerson@u.washington.edu Suggested Citation Emerson, Scott S.,
More informationChapter 1  Lecture 3 Measures of Location
Chapter 1  Lecture 3 of Location August 31st, 2009 Chapter 1  Lecture 3 of Location General Types of measures Median Skewness Chapter 1  Lecture 3 of Location Outline General Types of measures What
More informationBootstrapping, Permutations, and Monte Carlo Testing
Bootstrapping, Permutations, and Monte Carlo Testing Problem: Population of interest is extremely rare spatially and you are interested in using a 95% CI to estimate total abundance. The sampling design
More informationDownloaded from:
Hossain, A; DiazOrdaz, K; Bartlett, JW (2017) Missing binary outcomes under covariatedependent missingness in cluster randomised trials. Statistics in medicine. ISSN 02776715 DOI: https://doi.org/10.1002/sim.7334
More informationChapter 1 Statistical Inference
Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations
More informationChapter 3: Statistical methods for estimation and testing. Key reference: Statistical methods in bioinformatics by Ewens & Grant (2001).
Chapter 3: Statistical methods for estimation and testing Key reference: Statistical methods in bioinformatics by Ewens & Grant (2001). Chapter 3: Statistical methods for estimation and testing Key reference:
More informationA MONTE CARLO INVESTIGATION INTO THE PROPERTIES OF A PROPOSED ROBUST ONESAMPLE TEST OF LOCATION. Bercedis Peterson
A MONTE CARLO INVESTIGATION INTO THE PROPERTIES OF A PROPOSED ROBUST ONESAMPLE TEST OF LOCATION by Bercedis Peterson Department of Biostatistics University of North Carolina at Chapel Hill Institute of
More informationWiley. Methods and Applications of Linear Models. Regression and the Analysis. of Variance. Third Edition. Ishpeming, Michigan RONALD R.
Methods and Applications of Linear Models Regression and the Analysis of Variance Third Edition RONALD R. HOCKING PenHock Statistical Consultants Ishpeming, Michigan Wiley Contents Preface to the Third
More informationBasics on ttests Independent Sample ttests SingleSample ttests Summary of ttests Multiple Tests, Effect Size Proportions. Statistiek I.
Statistiek I ttests John Nerbonne CLCG, Rijksuniversiteit Groningen http://www.let.rug.nl/nerbonne/teach/statistieki/ John Nerbonne 1/46 Overview 1 Basics on ttests 2 Independent Sample ttests 3 SingleSample
More informationConcentrationbased Delta Check for Laboratory Error Detection
Northeastern University Department of Electrical and Computer Engineering Concentrationbased Delta Check for Laboratory Error Detection Biomedical Signal Processing, Imaging, Reasoning, and Learning (BSPIRAL)
More informationScore Normalization in Multimodal Biometric Systems
Score Normalization in Multimodal Biometric Systems Karthik Nandakumar and Anil K. Jain Michigan State University, East Lansing, MI Arun A. Ross West Virginia University, Morgantown, WV http://biometrics.cse.mse.edu
More information