# Chapter 15 Confidence Intervals for Mean Difference Between Two Delta-Distributions


Karen V. Rosales and Joshua D. Naranjo

K.V. Rosales, MMS Holdings, Inc., Canton, MI, USA. J.D. Naranjo, Department of Statistics, Western Michigan University, Kalamazoo, MI 49008, USA.

Springer International Publishing Switzerland 2016. In: R.Y. Liu, J.W. McKean (eds.), Robust Rank-Based and Nonparametric Methods, Springer Proceedings in Mathematics & Statistics 168.

Abstract. Traditional two-sample estimation procedures like pooled-t, Welch's t, and the Wilcoxon-Hodges-Lehmann are often used for skewed data and data inflated with zero values. We investigate how well these work compared to dedicated procedures that consider the specialized nature of the data.

Keywords: Two-sample estimation, Confidence intervals, Skewed distribution, Zero-inflated data, Delta distribution

## 15.1 Introduction

Some data are inherently nonnegative and contain a large number of zeros. Aitchison (1955) first described a distribution that contains both zero and positive values in an application to household expenditures. Some households spend nothing on, say, children's clothing, while others allocate high amounts, so that the distribution is skewed and approximately follows the lognormal curve. Marine survey data are also frequently inflated with zeros. Pennington (1983) examined a series of ichthyoplankton surveys aimed at estimating the total egg production of Atlantic mackerel in the study region. When zeros are mixed with lognormal positive values, the resulting distribution is referred to as a delta distribution (Aitchison 1955).

One-sample confidence intervals for the mean of a delta distribution were investigated by Owen and DeRouen (1980), Pennington (1983), Zhou and Tu (2000a), Fletcher (2008), and Rosales (2009). Zhou and Tu (2000a) explored different methods of constructing confidence intervals for the mean of a delta distribution, including a bootstrap and two likelihood-based intervals. Fletcher (2008) investigated a profile-likelihood approach. Zhou and Tu (2000b) proposed a maximum likelihood-based method and a bootstrap method for constructing confidence intervals for the ratio of means of medical cost data that contained both lognormal and zero observations.

It remains unclear how well various two-sample confidence intervals work. For example, can we simply ignore the delta distribution structure of the data and use traditional least-squares (LS) methods for estimating the difference between means? Will more robust versions work better? In this paper, we focus on commonly used two-sample confidence intervals and compare them to confidence intervals specifically derived under delta-distribution theory. We investigate how relative performance depends on sample size, proportion of zeros, the population means, and the population variances.

In Sect. 15.2, we set up notation and terminology. In Sect. 15.3, we describe the confidence intervals included in the simulation study. In Sect. 15.4, we discuss results of a simulation study.

## 15.2 Notation and Terminology

Consider a population in which a proportion $\delta$ of the observations are zeros, and the nonzero values follow a lognormal distribution with parameters $\mu$ and $\sigma^2$. The population is said to have a delta distribution, denoted as $\Delta(\delta, \mu, \sigma^2)$. We will index the populations of interest by $j = 1, 2$. Thus the $j$th population is said to have distribution $\Delta(\delta_j, \mu_j, \sigma_j^2)$, with mean $\kappa_j$ and variance $\rho_j^2$. The population mean and variance of the $j$th population are

$$\kappa_j = E[Y_j] = (1-\delta_j)\, e^{\mu_j + \sigma_j^2/2} \tag{15.1}$$

$$\rho_j^2 = \operatorname{Var}[Y_j] = (1-\delta_j)\, e^{2\mu_j + \sigma_j^2}\left(e^{\sigma_j^2} - (1-\delta_j)\right) \tag{15.2}$$

Let $y_{1j}, \ldots, y_{n_j j}$ be a random sample from the $j$th population. Assume, without loss of generality, that the $n_{j1}$ nonzero observations are listed first and the $n_{j0} = n_j - n_{j1}$ zero observations are listed last.
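As a quick numerical check of Eqs. (15.1) and (15.2), the following minimal Python sketch evaluates the population mean and variance of a delta distribution. The function names `delta_mean` and `delta_variance` are illustrative choices, not names from the chapter; the parameter values are one of the configurations used later in the simulation study.

```python
import math


def delta_mean(delta, mu, sigma2):
    """Population mean of Delta(delta, mu, sigma2), following Eq. (15.1)."""
    return (1.0 - delta) * math.exp(mu + sigma2 / 2.0)


def delta_variance(delta, mu, sigma2):
    """Population variance of Delta(delta, mu, sigma2), following Eq. (15.2)."""
    return (1.0 - delta) * math.exp(2.0 * mu + sigma2) * (
        math.exp(sigma2) - (1.0 - delta)
    )


# Difference in means for Delta(0.1, 0.5, 1) versus Delta(0.5, 0.5, 1).
diff = delta_mean(0.1, 0.5, 1.0) - delta_mean(0.5, 0.5, 1.0)
print(round(diff, 4))  # 1.0873
```

With $\delta = 0$ both functions reduce to the ordinary lognormal mean and variance, which is a convenient sanity check.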
For the nonzero observations let $x_{ij} = \log y_{ij}$ and

$$\hat\delta_j = \frac{n_{j0}}{n_j} \tag{15.3}$$

$$\hat\mu_j = \frac{\sum_{i=1}^{n_{j1}} \log y_{ij}}{n_{j1}} = \frac{\sum_{i=1}^{n_{j1}} x_{ij}}{n_{j1}} = \bar x_j \tag{15.4}$$

$$s_j^2 = \frac{\sum_{i=1}^{n_{j1}} (\log y_{ij} - \hat\mu_j)^2}{n_{j1}-1} = \frac{\sum_{i=1}^{n_{j1}} (x_{ij} - \bar x_j)^2}{n_{j1}-1} \tag{15.5}$$

Note that $\hat\mu_j$ and $s_j^2$ are simply the sample mean and variance of the log-transformed nonzero observations from the $j$th sample. The proportion of nonzero observations in the $j$th sample is $1 - \hat\delta_j$.

Finney (1941) derived minimum-variance unbiased estimators for the lognormal mean and variance. Extending his results, Aitchison (1955) showed that the following is a minimum variance unbiased estimator of the mean of the $\Delta$-distribution:

$$\hat\kappa_j = \begin{cases} \dfrac{n_{j1}}{n_j}\, e^{\hat\mu_j}\, G_{n_{j1}}\!\left(\dfrac{s_j^2}{2}\right) & \text{if } n_{j1} > 1 \\[2ex] \dfrac{y_{1j}}{n_j} & \text{if } n_{j1} = 1 \\[2ex] 0 & \text{if } n_{j1} = 0 \end{cases} \tag{15.6}$$

where, for $n_{j1} = 1$, $y_{1j}$ is the single nonzero observation, and $G_{n_{j1}}(t)$ is a Bessel function defined as

$$G_{n_{j1}}(t) = 1 + \frac{n_{j1}-1}{n_{j1}}\, t + \sum_{i=2}^{\infty} \frac{(n_{j1}-1)^{2i-1}\, t^i}{n_{j1}^{\,i}\,(n_{j1}+1)(n_{j1}+3)\cdots(n_{j1}+2i-3)\; i!}$$

An estimate of the asymptotic variance is given by Aitchison and Brown (1969):

$$\widehat{\operatorname{Var}}_1(\hat\kappa_j) = \frac{e^{2\hat\mu_j + s_j^2}}{n_j}\left[\hat\delta_j(1-\hat\delta_j) + \frac{(1-\hat\delta_j)(2 s_j^2 + s_j^4)}{2}\right] \tag{15.7}$$

Owen and DeRouen (1980) suggested confidence interval estimates based on these estimates of mean and variance. Pennington (1983) proposed an interval estimate using an alternative estimate of the variance, as follows:

$$\widehat{\operatorname{Var}}_{\text{pen}}(\hat\kappa_j) = \begin{cases} \dfrac{n_{j1}}{n_j}\, e^{2\hat\mu_j}\left\{\dfrac{n_{j1}}{n_j}\, G_{n_{j1}}^2\!\left(\dfrac{s_j^2}{2}\right) - \dfrac{n_{j1}-1}{n_j-1}\, G_{n_{j1}}\!\left(\dfrac{n_{j1}-2}{n_{j1}-1}\, s_j^2\right)\right\} & \text{if } n_{j1} > 1 \\[2ex] \left(\dfrac{y_{1j}}{n_j}\right)^2 & \text{if } n_{j1} = 1 \\[2ex] 0 & \text{if } n_{j1} = 0 \end{cases} \tag{15.8}$$

## 15.3 Two-Sample Confidence Intervals

We are interested in confidence interval estimates for the difference between means $\kappa_1 - \kappa_2$ of two delta distributions. We first consider traditional least-squares confidence intervals based on Student's t-distribution, using either the pooled-sd version or the unpooled-sd Welch-Satterthwaite version. The pooled-t $100(1-\alpha)\%$ confidence interval is given by

$$\left[(\bar y_1 - \bar y_2) - t_{\alpha/2,\,df}\, S_p \sqrt{\frac{1}{n_1}+\frac{1}{n_2}},\;\; (\bar y_1 - \bar y_2) + t_{\alpha/2,\,df}\, S_p \sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\right] \tag{15.9}$$

where $\bar y_j = \frac{1}{n_j}\sum_{i=1}^{n_j} y_{ij}$ is the sample mean for the $j$th sample, $t_{\alpha/2,\,df}$ is the upper $\alpha/2$ percentile of the t-distribution, $n_j$ is the sample size, $df = n_1 + n_2 - 2$, and $S_p$ is the pooled standard deviation. We refer to this method as Pooled-t in the simulation study.

A $100(1-\alpha)\%$ confidence interval based on Welch's statistic is

$$\left[(\bar y_1 - \bar y_2) - t_{\alpha/2,\,\nu} \sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}},\;\; (\bar y_1 - \bar y_2) + t_{\alpha/2,\,\nu} \sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}\right] \tag{15.10}$$

Here $s_j^2$ denotes the sample variance of the raw observations $y_{ij}$ in the $j$th sample, not the log-scale variance of (15.5). The degrees of freedom $\nu$ associated with this variance estimate is approximated using the Welch-Satterthwaite equation

$$\nu = \frac{\left(\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}\right)^2}{\dfrac{s_1^4}{n_1^2 (n_1-1)} + \dfrac{s_2^4}{n_2^2 (n_2-1)}}$$

This method will be denoted as Welch-t in the simulation study.

Since the lognormal is right skewed, more robust alternatives might work better than the t-based methods. A rank-based alternative is the confidence interval based on the Wilcoxon rank sum test; see, for example, Hollander et al. (2014). The Wilcoxon interval may be computed as follows. Form all $n_1 n_2$ pairwise differences $y_{h1} - y_{i2}$ between the first group and the second group. Let $\hat\theta_{(1)}, \hat\theta_{(2)}, \ldots, \hat\theta_{(n_1 n_2)}$ denote these ordered differences. The Hodges-Lehmann point estimator of $\kappa_1 - \kappa_2$ is the median of these differences. A $100(1-\alpha)\%$ confidence interval is given by

$$\left(\hat\theta_{(C_\alpha)},\; \hat\theta_{(n_1 n_2 + 1 - C_\alpha)}\right) \tag{15.11}$$

where $C_\alpha = \dfrac{n_1(2 n_2 + n_1 + 1)}{2} + 1 - w_{\alpha/2}$, and $w_{\alpha/2}$ is an appropriate percentile of the rank sum distribution. For large samples, a normal approximation of $C_\alpha$ is given by

$$C_\alpha = \frac{n_1 n_2}{2} - z_{\alpha/2}\left[\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}\right]^{1/2}$$

This method is denoted as Wilcoxon in the simulation study.

Both versions of the t-interval and the Wilcoxon interval ignore the zero-inflated nature of the data. One may construct a confidence interval based on Aitchison's minimum variance unbiased estimator $\hat\kappa$ and Pennington's estimator of the variance of $\hat\kappa$. A $100(1-\alpha)\%$ confidence interval for $\kappa_1 - \kappa_2$ is

$$(\hat\kappa_1 - \hat\kappa_2) \pm z_{\alpha/2}\sqrt{\widehat{\operatorname{Var}}_{\text{pen}}(\hat\kappa_1) + \widehat{\operatorname{Var}}_{\text{pen}}(\hat\kappa_2)} \tag{15.12}$$

where $\hat\kappa$ and $\widehat{\operatorname{Var}}_{\text{pen}}$ are given in Eqs. (15.6) and (15.8), respectively. This method will be referred to as MVUE1 in the simulation study.
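The MVUE1 construction of Eqs. (15.6), (15.8), and (15.12) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names (`finney_g`, `mvue_and_pennington_var`, `mvue1_interval`) are hypothetical, the infinite series for $G_n(t)$ is truncated once terms fall below a tolerance, the $n_{j1} = 1$ branch uses the single nonzero observation on its original scale, and a negative variance estimate (possible for an unbiased estimator) is clamped at zero before taking the square root.

```python
import math
from statistics import NormalDist


def finney_g(n, t, tol=1e-12, max_terms=200):
    """Series G_n(t) = 1 + (n-1)t/n + sum_{i>=2} (n-1)^(2i-1) t^i /
    (n^i (n+1)(n+3)...(n+2i-3) i!), truncated when terms drop below tol."""
    total = 1.0 + (n - 1) * t / n
    term = (n - 1) ** 3 * t ** 2 / (n ** 2 * (n + 1) * 2.0)  # i = 2 term
    i = 2
    while abs(term) > tol and i < max_terms:
        total += term
        i += 1
        # Ratio of consecutive series terms (term_i / term_{i-1}).
        term *= (n - 1) ** 2 * t / (n * (n + 2 * i - 3) * i)
    return total


def mvue_and_pennington_var(y):
    """Return (kappa_hat, var_pen) for one sample, per Eqs. (15.6) and (15.8)."""
    n = len(y)
    pos = [v for v in y if v > 0]
    n1 = len(pos)
    if n1 == 0:
        return 0.0, 0.0
    if n1 == 1:
        est = pos[0] / n  # single nonzero observation divided by n
        return est, est ** 2
    logs = [math.log(v) for v in pos]
    mu_hat = sum(logs) / n1
    s2 = sum((x - mu_hat) ** 2 for x in logs) / (n1 - 1)
    g_half = finney_g(n1, s2 / 2.0)
    est = (n1 / n) * math.exp(mu_hat) * g_half
    var = (n1 / n) * math.exp(2.0 * mu_hat) * (
        (n1 / n) * g_half ** 2
        - ((n1 - 1) / (n - 1)) * finney_g(n1, (n1 - 2) / (n1 - 1) * s2)
    )
    return est, var


def mvue1_interval(y1, y2, level=0.95):
    """Normal-theory interval of Eq. (15.12) for kappa_1 - kappa_2."""
    k1, v1 = mvue_and_pennington_var(y1)
    k2, v2 = mvue_and_pennington_var(y2)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half = z * math.sqrt(max(v1 + v2, 0.0))  # clamp: estimate can be negative
    return (k1 - k2) - half, (k1 - k2) + half


print(mvue1_interval([0.0, 0.0, 1.2, 3.4, 0.7, 5.1], [0.0, 2.2, 0.9, 0.0, 0.0, 1.5]))
```

When all nonzero values in a sample are equal, the log-scale variance is zero, $G_n(0) = 1$, and the estimator reduces to the ordinary sample mean, which gives a simple way to check an implementation by hand.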

An alternative confidence interval can be constructed based on the variance estimate from Aitchison and Brown (1969). This $100(1-\alpha)\%$ confidence interval for $\kappa_1 - \kappa_2$ is

$$(\hat\kappa_1 - \hat\kappa_2) \pm z_{\alpha/2}\sqrt{\widehat{\operatorname{Var}}_1(\hat\kappa_1) + \widehat{\operatorname{Var}}_1(\hat\kappa_2)} \tag{15.13}$$

where $\hat\kappa$ and $\widehat{\operatorname{Var}}_1$ are given in Eqs. (15.6) and (15.7), respectively. We refer to this method as MVUE2 in the simulation study.

In addition to the above confidence intervals, we propose two additional robust confidence intervals. Since the sample mean and the sample variance lack robustness, Al-Khouli (1999) proposed to directly replace $\hat\mu$ and $s^2$ in (15.4) and (15.5) with robust M-estimators to obtain robust estimators of $\mu$ and $\sigma^2$. In his simulation, using $(T_H, S_b^2)$ in place of $(\hat\mu, s^2)$ seemed to work best, where $T_H$ is the one-step Huber M-estimator of location and $S_b^2$ is a bi-weight A-estimator of scale. Directly substituting $T_H$ and $S_b^2$ in place of $\hat\mu$ and $s^2$ in (15.6) and (15.8), we get a robust version of the MVUE1 interval (15.12). The confidence interval is

$$(\hat\kappa_{M1} - \hat\kappa_{M2}) \pm z_{\alpha/2}\sqrt{\widehat{\operatorname{Var}}_M(\hat\kappa_{M1}) + \widehat{\operatorname{Var}}_M(\hat\kappa_{M2})} \tag{15.14}$$

where

$$\hat\kappa_{Mj} = \begin{cases} \dfrac{n_{j1}}{n_j}\, e^{T_{Hj}}\, G_{n_{j1}}\!\left(\dfrac{S_{bj}^2}{2}\right) & \text{if } n_{j1} > 1 \\[2ex] \dfrac{y_{1j}}{n_j} & \text{if } n_{j1} = 1 \\[2ex] 0 & \text{if } n_{j1} = 0 \end{cases}$$

and

$$\widehat{\operatorname{Var}}_M(\hat\kappa_{Mj}) = \begin{cases} \dfrac{n_{j1}}{n_j}\, e^{2 T_{Hj}}\left\{\dfrac{n_{j1}}{n_j}\, G_{n_{j1}}^2\!\left(\dfrac{S_{bj}^2}{2}\right) - \dfrac{n_{j1}-1}{n_j-1}\, G_{n_{j1}}\!\left(\dfrac{n_{j1}-2}{n_{j1}-1}\, S_{bj}^2\right)\right\} & \text{if } n_{j1} > 1 \\[2ex] \left(\dfrac{y_{1j}}{n_j}\right)^2 & \text{if } n_{j1} = 1 \\[2ex] 0 & \text{if } n_{j1} = 0 \end{cases}$$

This method is referred to as RMVUE1 in the simulation study.

Similarly, a robust version of the MVUE2 confidence interval (15.13) replaces $\hat\mu$ and $s^2$ in Eqs. (15.6) and (15.7) with their robust versions. The confidence interval is

$$(\hat\kappa_{M1} - \hat\kappa_{M2}) \pm z_{\alpha/2}\sqrt{\widehat{\operatorname{Var}}_1(\hat\kappa_{M1}) + \widehat{\operatorname{Var}}_1(\hat\kappa_{M2})} \tag{15.15}$$

where

$$\hat\kappa_{Mj} = \frac{n_{j1}}{n_j}\, e^{T_{Hj}}\, G_{n_{j1}}\!\left(\frac{S_{bj}^2}{2}\right)$$

and

$$\widehat{\operatorname{Var}}_1(\hat\kappa_{Mj}) = \frac{e^{2 T_{Hj} + S_{bj}^2}}{n_j}\left[\hat\delta_j(1-\hat\delta_j) + \frac{(1-\hat\delta_j)(2 S_{bj}^2 + S_{bj}^4)}{2}\right]$$

We denote this method as RMVUE2 in the simulation study.

## 15.4 Simulation

To assess the general performance and robustness of the interval estimators (15.9)–(15.15), we conducted a simulation study under various parameter combinations of the $\Delta$-distribution. Performance of the different estimates will be assessed using the following criteria:

- Coverage Probability (CP): proportion of times that the 95% confidence interval contains the true value of $\kappa_1 - \kappa_2$.
- Coverage Error (CE): absolute difference between the coverage probability and 95%.
- Lower Error Rate (LER): proportion of times that the true value $\kappa_1 - \kappa_2$ falls below the interval.
- Upper Error Rate (UER): proportion of times that the true value $\kappa_1 - \kappa_2$ falls above the interval.
- Average Width (Width): average width of the 95% confidence interval.

Note that all confidence intervals have confidence level set at 95%. Ideally an estimation procedure will have CP = 0.95, CE = 0.0, LER = 0.025, and UER = 0.025. We also report the average width of each method. We evaluate performance at balanced sample sizes of 15 and 50. Ten thousand simulations are done for each combination of parameters and sample size.

Table 15.1 shows simulation results when the two delta distributions are the same. MVUE1 and RMVUE1 seem to do best, achieving narrower intervals without sacrificing coverage probability. Coverage probabilities all exceed 0.95, possibly because skewness inflates the standard error estimates. The naive t-based intervals seem competitive, with reasonable width and coverage probability. The Wilcoxon interval has the shortest width.

Table 15.2 shows simulation results when $\delta_1 \neq \delta_2$. Again, MVUE1 and RMVUE1 seem to do best, with narrower intervals without sacrificing coverage probability. The naive t-based intervals remain competitive, with reasonable width and coverage probability.
The Wilcoxon interval still has the shortest width by a clear margin, but achieves this at the price of unacceptably low coverage probability, especially for larger differences in $\delta$.

Table 15.3 shows simulation results when $\mu_1 \neq \mu_2$. MVUE1 and RMVUE1 still seem to do best, with RMVUE1 edging out MVUE1 in coverage probability and width. MVUE2 and RMVUE2 attain better coverage probabilities at the cost of significantly wider intervals. The naive procedures pooled-t and Welch-t are surprisingly competitive, with reasonable width and coverage probability. The Wilcoxon interval has unacceptably low coverage probability, especially for larger differences in $\mu$.

Table 15.4 shows simulation results when $\sigma_1^2 \neq \sigma_2^2$. All intervals have problems maintaining close to 95% coverage probability, especially for larger differences in $\sigma^2$.

Table 15.1: 95% CI under equal distributions $\Delta_1(0.2, 0.5, 1)$ and $\Delta_2(0.2, 0.5, 1)$: $\kappa_1 - \kappa_2 = 0$. Columns: Method, Sample size, CP, CE, LER, UER, Width. Rows, for each sample size: Pooled-t, Welch-t, Wilcoxon, MVUE1, MVUE2, RMVUE1, RMVUE2. [Numerical entries missing.]

The simulations show two notable features of Wilcoxon confidence intervals: they tend to be shorter and have low coverage probability. Wilcoxon intervals are a function of the ordered pairwise differences between the two samples [see e.g. Hollander et al. (2014)]. If $(\delta_1, \delta_2)$ are both large, then enough pairwise differences are 0 regardless of the values of the positive observations. This seems to reduce the length of the Wilcoxon interval more than the others.

Low coverage probability may be a result of the Wilcoxon interval estimating the wrong parameter. The Wilcoxon point estimator is the median of pairwise differences, which is naturally a better estimate of the true median of differences (i.e. the median of $F_{Y_1 - Y_2}$) than of the difference in means $\kappa_1 - \kappa_2$. For example, given two distributions $\Delta(0.1, 0.5, 1)$ and $\Delta(0.5, 0.5, 1)$, the difference in means is $\kappa_1 - \kappa_2 = 1.0873$ while the median of the difference is $m = 0.7988$. In Table 15.5, we reassess the performance of Wilcoxon by looking at the percentage of time it contains the median of differences $m$ instead of $\kappa_1 - \kappa_2$.
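The gap between the difference in means and the median of differences in the example above can be checked by Monte Carlo. The sketch below is illustrative only: the helper names (`rdelta`, `mc_mean_and_median_of_difference`), the number of draws, and the seed are assumptions, and the printed values are simulation estimates that should land near, not exactly on, 1.0873 and 0.7988.

```python
import math
import random
from statistics import median


def rdelta(rng, delta, mu, sigma2):
    """One draw from Delta(delta, mu, sigma2): zero with probability delta,
    otherwise lognormal with log-scale mean mu and variance sigma2."""
    if rng.random() < delta:
        return 0.0
    return rng.lognormvariate(mu, math.sqrt(sigma2))


def mc_mean_and_median_of_difference(pars1, pars2, n_draws=200_000, seed=2016):
    """Monte Carlo estimates of E[Y1 - Y2] and median(Y1 - Y2)."""
    rng = random.Random(seed)
    diffs = [rdelta(rng, *pars1) - rdelta(rng, *pars2) for _ in range(n_draws)]
    return sum(diffs) / n_draws, median(diffs)


mean_diff, med_diff = mc_mean_and_median_of_difference(
    (0.1, 0.5, 1.0), (0.5, 0.5, 1.0)
)
print(f"mean of Y1 - Y2   ~ {mean_diff:.3f}")
print(f"median of Y1 - Y2 ~ {med_diff:.3f}")
```

Because the two targets differ by roughly 0.3 in this configuration, an interval centered on the median of pairwise differences will tend to miss the difference in means as sample size grows and the interval narrows.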
The Wilcoxon 95% interval coverage probabilities for $\kappa_1 - \kappa_2 = 1.0873$ are quite low at [value missing] and [value missing], respectively, but the coverage probabilities for $m = 0.7988$ are [value missing] and [value missing], respectively, as found in the entry labeled Wilcoxon (for $m$). In fact, in all cases (see the rest of Table 15.5), as long as we measure the percentage of times that the Wilcoxon interval contains the appropriate parameter $m$ instead of $\kappa_1 - \kappa_2$, the Wilcoxon has the best coverage probability and narrowest width. Since the performance of MVUE2 and RMVUE2 trails that of MVUE1 and RMVUE1 in Tables 15.2, 15.3, and 15.4, they have been removed from Table 15.5 for space considerations.

Table 15.2: 95% CI under varying proportion of zeros $\delta$. Configurations: $\Delta_1(0.2, 0.5, 1)$ and $\Delta_2(0.4, 0.5, 1)$: $\kappa_1 - \kappa_2 = 0.5437$; $\Delta_1(0.1, 0.5, 1)$ and $\Delta_2(0.5, 0.5, 1)$: $\kappa_1 - \kappa_2 = 1.0873$. Columns: Method, Sample size, CP, CE, LER, UER, Width. Rows, for each configuration and sample size: Pooled-t, Welch-t, Wilcoxon, MVUE1, MVUE2, RMVUE1, RMVUE2. [Numerical entries missing.]
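The performance criteria of this section (CP, CE, LER, UER, Width) can be computed with a small simulation harness such as the one sketched below. Everything here is an assumption made for illustration: the helper names, the seed, the reduced number of replications, and in particular the interval being assessed, which is an unpooled normal-approximation interval that uses a $z$ quantile as a stand-in for the Welch-t interval of Eq. (15.10), since the Python standard library has no t quantile function.

```python
import math
import random
from statistics import NormalDist, mean, variance


def rdelta_sample(rng, n, delta, mu, sigma2):
    """Sample of size n from Delta(delta, mu, sigma2)."""
    sd = math.sqrt(sigma2)
    return [0.0 if rng.random() < delta else rng.lognormvariate(mu, sd)
            for _ in range(n)]


def delta_mean(delta, mu, sigma2):
    """Population mean, Eq. (15.1)."""
    return (1.0 - delta) * math.exp(mu + sigma2 / 2.0)


def z_interval(y1, y2, level=0.95):
    """Unpooled normal-approximation interval (stand-in for Welch-t)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    d = mean(y1) - mean(y2)
    se = math.sqrt(variance(y1) / len(y1) + variance(y2) / len(y2))
    return d - z * se, d + z * se


def assess(interval, pars1, pars2, n, reps=1000, level=0.95, seed=15):
    """Estimate CP, CE, LER, UER and average Width for an interval method."""
    rng = random.Random(seed)
    true_diff = delta_mean(*pars1) - delta_mean(*pars2)
    below = above = 0
    total_width = 0.0
    for _ in range(reps):
        lo, hi = interval(rdelta_sample(rng, n, *pars1),
                          rdelta_sample(rng, n, *pars2))
        if true_diff < lo:       # true value falls below the interval
            below += 1
        elif true_diff > hi:     # true value falls above the interval
            above += 1
        total_width += hi - lo
    cp = 1.0 - (below + above) / reps
    return {"CP": cp, "CE": abs(cp - level), "LER": below / reps,
            "UER": above / reps, "Width": total_width / reps}


result = assess(z_interval, (0.2, 0.5, 1.0), (0.2, 0.5, 1.0), n=15)
print(result)
```

Any other method, for example a function implementing Eq. (15.12), can be passed in place of `z_interval` as long as it takes two samples and returns a `(lower, upper)` pair.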

Table 15.3: 95% CI under varying lognormal parameter $\mu$. Configurations: $\Delta_1(0.2, 0, 1)$ and $\Delta_2(0.2, 0.5, 1)$: $\kappa_1 - \kappa_2 = -0.8556$; $\Delta_1(0.2, 0, 1)$ and $\Delta_2(0.2, 0.9, 1)$: $\kappa_1 - \kappa_2 = -1.9252$. Columns: Method, Sample size, CP, CE, LER, UER, Width. Rows, for each configuration and sample size: Pooled-t, Welch-t, Wilcoxon, MVUE1, MVUE2, RMVUE1, RMVUE2. [Numerical entries missing.]

## 15.5 Conclusion

Traditional two-sample estimation procedures like pooled-t and Welch-t, which assume normality, are often used for skewed data and data inflated with zero values. Our simulations show that these naive nonrobust approaches do not do too badly compared to dedicated delta distribution procedures, in terms of coverage probabilities and interval width.

Among the dedicated approaches, we would recommend MVUE1 and its robust version RMVUE1. The MVUE1 procedure is based on the mean estimator $\hat\kappa$ by Aitchison (1955) and the variance estimator by Pennington (1983). RMVUE1 is similar to MVUE1 but uses M-estimates for the lognormal parameters $\mu$ and $\sigma^2$.

The Wilcoxon two-sample interval performed consistently badly, but only when it was asked to estimate the difference in means $\kappa_1 - \kappa_2$. When used to estimate the median of differences $m$, it performed very well in terms of coverage probability, and generally had the shortest interval width. Of course, the usefulness of the Wilcoxon interval will depend on whether the user wants to estimate the median of differences instead of the difference in means.

Table 15.4: 95% CI under varying lognormal parameter $\sigma^2$. Configurations: $\Delta_1(0.2, 0.5, 0.15)$ and $\Delta_2(0.2, 0.5, 1.0)$: $\kappa_1 - \kappa_2 = -0.7529$; $\Delta_1(0.2, 0.5, 0.15)$ and $\Delta_2(0.2, 0.5, 2.0)$: $\kappa_1 - \kappa_2 = -2.1636$. Columns: Method, Sample size, CP, CE, LER, UER, Width. Rows, for each configuration and sample size: Pooled-t, Welch-t, Wilcoxon, MVUE1, MVUE2, RMVUE1, RMVUE2. [Numerical entries missing.]
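For readers who want to experiment with the Wilcoxon interval of Eq. (15.11), a minimal Python sketch using the large-sample normal approximation of $C_\alpha$ is given below. The function name `wilcoxon_interval` is hypothetical, and rounding $C_\alpha$ down to an integer (with a floor of 1) is an assumption made here to err on the side of a wider interval; exact rank-sum percentiles are not used.

```python
import math
from statistics import NormalDist, median


def wilcoxon_interval(y1, y2, level=0.95):
    """Hodges-Lehmann estimate and the interval of Eq. (15.11), using the
    normal approximation C = n1*n2/2 - z*sqrt(n1*n2*(n1+n2+1)/12)."""
    n1, n2 = len(y1), len(y2)
    diffs = sorted(a - b for a in y1 for b in y2)  # all n1*n2 differences
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    c = n1 * n2 / 2.0 - z * math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    c = max(1, math.floor(c))        # assumption: round down, at least 1
    lower = diffs[c - 1]             # order statistic number c
    upper = diffs[n1 * n2 - c]       # order statistic number n1*n2 + 1 - c
    return median(diffs), (lower, upper)


# Two heavily zero-inflated samples: most pairwise differences are 0.
y1 = [0.0] * 8 + [1.0, 2.0]
y2 = [0.0] * 9 + [3.0]
print(wilcoxon_interval(y1, y2))  # (0.0, (0.0, 0.0))
```

The toy example illustrates the point made in the simulation discussion: when both samples contain many zeros, so many pairwise differences equal 0 that the interval can collapse regardless of the positive observations.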

Table 15.5: 95% CI under varying parameters and sample size. The Wilcoxon interval is assessed for containing both the difference in means $\kappa_1 - \kappa_2$ and the median of difference $m$. Configurations: varying $\delta$: $\Delta_1(0.1, 0.5, 1.0)$ and $\Delta_2(0.5, 0.5, 1.0)$, $\kappa_1 - \kappa_2 = 1.0873$, $m = 0.7988$; varying $\mu$: $\Delta_1(0.2, 0, 1)$ and $\Delta_2(0.2, 0.9, 1)$, $\kappa_1 - \kappa_2 = -1.9252$, $m =$ [value missing]; varying $\sigma^2$: $\Delta_1(0.2, 0.5, 0.15)$ and $\Delta_2(0.2, 0.5, 2.0)$, $\kappa_1 - \kappa_2 = -2.1636$, $m = 0.0$. Columns: Method, Sample size, CP, CE, LER, UER, Width. Rows, for each configuration and sample size: Pooled-t, Welch-t, Wilcoxon (for $\kappa_1 - \kappa_2$), Wilcoxon (for $m$), MVUE1, RMVUE1. [Numerical entries missing.]

## References

Aitchison, J. (1955). On the distribution of a positive random variable having a discrete probability mass at the origin. Journal of the American Statistical Association, 50(271).

Aitchison, J., & Brown, J. (1969). The lognormal distribution. Cambridge: Cambridge University Press.

Al-Khouli, A. (1999). Robust estimation and bootstrap testing for the delta distribution with applications in marine sciences. Ph.D. dissertation, Texas A&M University.

Finney, D. J. (1941). On the distribution of a variate whose logarithm is normally distributed. Journal of the Royal Statistical Society, Series B, 7.

Fletcher, D. (2008). Confidence intervals for the mean of the delta-lognormal distribution. Environmental and Ecological Statistics, 15(2).

Hollander, M., Wolfe, D., & Chicken, E. (2014). Nonparametric statistical methods. Hoboken: Wiley.

Owen, W., & DeRouen, T. (1980). Estimation of the mean for lognormal data containing zeroes and left-censored values, with applications to the measurement of worker exposure to air contaminants. Biometrics, 36(4).

Pennington, M. (1983). Efficient estimators of abundance, for fish and plankton surveys. Biometrics, 39(1).

Rosales, M. (2009). The robustness of confidence intervals for the mean of delta distribution. Ph.D. dissertation, Western Michigan University.

Zhou, X. H., & Tu, W. (2000a). Confidence intervals for the mean of diagnostic test charge data containing zeros. Biometrics, 56(4).

Zhou, X. H., & Tu, W. (2000b). Interval estimation for the ratio in means of log-normally distributed medical costs with zero values. Computational Statistics and Data Analysis, 35(2).

### Robust Outcome Analysis for Observational Studies Designed Using Propensity Score Matching

The work of Kosten and McKean was partially supported by NIAAA Grant 1R21AA017906-01A1 Robust Outcome Analysis for Observational Studies Designed Using Propensity Score Matching Bradley E. Huitema Western

### AN IMPROVEMENT TO THE ALIGNED RANK STATISTIC

Journal of Applied Statistical Science ISSN 1067-5817 Volume 14, Number 3/4, pp. 225-235 2005 Nova Science Publishers, Inc. AN IMPROVEMENT TO THE ALIGNED RANK STATISTIC FOR TWO-FACTOR ANALYSIS OF VARIANCE

### Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions

Journal of Modern Applied Statistical Methods Volume 8 Issue 1 Article 13 5-1-2009 Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error

### Increasing Power in Paired-Samples Designs. by Correcting the Student t Statistic for Correlation. Donald W. Zimmerman. Carleton University

Power in Paired-Samples Designs Running head: POWER IN PAIRED-SAMPLES DESIGNS Increasing Power in Paired-Samples Designs by Correcting the Student t Statistic for Correlation Donald W. Zimmerman Carleton

### Asymptotic Relative Efficiency in Estimation

Asymptotic Relative Efficiency in Estimation Robert Serfling University of Texas at Dallas October 2009 Prepared for forthcoming INTERNATIONAL ENCYCLOPEDIA OF STATISTICAL SCIENCES, to be published by Springer

### Extending the Robust Means Modeling Framework. Alyssa Counsell, Phil Chalmers, Matt Sigal, Rob Cribbie

Extending the Robust Means Modeling Framework Alyssa Counsell, Phil Chalmers, Matt Sigal, Rob Cribbie One-way Independent Subjects Design Model: Y ij = µ + τ j + ε ij, j = 1,, J Y ij = score of the ith

### A nonparametric two-sample wald test of equality of variances

University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 211 A nonparametric two-sample wald test of equality of variances David

### Rank-sum Test Based on Order Restricted Randomized Design

Rank-sum Test Based on Order Restricted Randomized Design Omer Ozturk and Yiping Sun Abstract One of the main principles in a design of experiment is to use blocking factors whenever it is possible. On

### Contents 1. Contents

Contents 1 Contents 1 One-Sample Methods 3 1.1 Parametric Methods.................... 4 1.1.1 One-sample Z-test (see Chapter 0.3.1)...... 4 1.1.2 One-sample t-test................. 6 1.1.3 Large sample

### Joseph W. McKean 1. INTRODUCTION

Statistical Science 2004, Vol. 19, No. 4, 562 570 DOI 10.1214/088342304000000549 Institute of Mathematical Statistics, 2004 Robust Analysis of Linear Models Joseph W. McKean Abstract. This paper presents

### One-Sample and Two-Sample Means Tests

One-Sample and Two-Sample Means Tests 1 Sample t Test The 1 sample t test allows us to determine whether the mean of a sample data set is different than a known value. Used when the population variance

### A Simulation Comparison Study for Estimating the Process Capability Index C pm with Asymmetric Tolerances

Available online at ijims.ms.tku.edu.tw/list.asp International Journal of Information and Management Sciences 20 (2009), 243-253 A Simulation Comparison Study for Estimating the Process Capability Index

### On Selecting Tests for Equality of Two Normal Mean Vectors

MULTIVARIATE BEHAVIORAL RESEARCH, 41(4), 533 548 Copyright 006, Lawrence Erlbaum Associates, Inc. On Selecting Tests for Equality of Two Normal Mean Vectors K. Krishnamoorthy and Yanping Xia Department

### Model Fitting. Jean Yves Le Boudec

Model Fitting Jean Yves Le Boudec 0 Contents 1. What is model fitting? 2. Linear Regression 3. Linear regression with norm minimization 4. Choosing a distribution 5. Heavy Tail 1 Virus Infection Data We

### CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8

CIVL - 7904/8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8 Chi-square Test How to determine the interval from a continuous distribution I = Range 1 + 3.322(logN) I-> Range of the class interval

### Lecture 12: Small Sample Intervals Based on a Normal Population Distribution

Lecture 12: Small Sample Intervals Based on a Normal Population MSU-STT-351-Sum-17B (P. Vellaisamy: MSU-STT-351-Sum-17B) Probability & Statistics for Engineers 1 / 24 In this lecture, we will discuss (i)

### Nonparametric tests. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 704: Data Analysis I

1 / 16 Nonparametric tests Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I Nonparametric one and two-sample tests 2 / 16 If data do not come from a normal

### Regression Analysis for Data Containing Outliers and High Leverage Points

Alabama Journal of Mathematics 39 (2015) ISSN 2373-0404 Regression Analysis for Data Containing Outliers and High Leverage Points Asim Kumer Dey Department of Mathematics Lamar University Md. Amir Hossain

### Two-Sample Inferential Statistics

The t Test for Two Independent Samples 1 Two-Sample Inferential Statistics In an experiment there are two or more conditions One condition is often called the control condition in which the treatment is

### Rank-Based Estimation and Associated Inferences. for Linear Models with Cluster Correlated Errors

Rank-Based Estimation and Associated Inferences for Linear Models with Cluster Correlated Errors John D. Kloke Bucknell University Joseph W. McKean Western Michigan University M. Mushfiqur Rashid FDA Abstract

### Inference for Distributions Inference for the Mean of a Population. Section 7.1

Inference for Distributions Inference for the Mean of a Population Section 7.1 Statistical inference in practice Emphasis turns from statistical reasoning to statistical practice: Population standard deviation,

### On the Existence and Uniqueness of the Maximum Likelihood Estimators of Normal and Lognormal Population Parameters with Grouped Data

Florida International University FIU Digital Commons Department of Mathematics and Statistics College of Arts, Sciences & Education 6-16-2009 On the Existence and Uniqueness of the Maximum Likelihood Estimators

### Non-parametric tests, part A:

Two types of statistical test: Non-parametric tests, part A: Parametric tests: Based on assumption that the data have certain characteristics or "parameters": Results are only valid if (a) the data are

### Bootstrap Procedures for Testing Homogeneity Hypotheses

Journal of Statistical Theory and Applications Volume 11, Number 2, 2012, pp. 183-195 ISSN 1538-7887 Bootstrap Procedures for Testing Homogeneity Hypotheses Bimal Sinha 1, Arvind Shah 2, Dihua Xu 1, Jianxin

### Interval Estimation for the Ratio and Difference of Two Lognormal Means

UW Biostatistics Working Paper Series 12-7-2005 Interval Estimation for the Ratio and Difference of Two Lognormal Means Yea-Hung Chen University of Washington, yeahung@u.washington.edu Xiao-Hua Zhou University

### Comparison of Two Samples

2 Comparison of Two Samples 2.1 Introduction Problems of comparing two samples arise frequently in medicine, sociology, agriculture, engineering, and marketing. The data may have been generated by observation

### Bootstrap tests. Patrick Breheny. October 11. Bootstrap vs. permutation tests Testing for equality of location

Bootstrap tests Patrick Breheny October 11 Patrick Breheny STA 621: Nonparametric Statistics 1/14 Introduction Conditioning on the observed data to obtain permutation tests is certainly an important idea

### AN ALTERNATIVE APPROACH TO EVALUATION OF POOLABILITY FOR STABILITY STUDIES

Journal of Biopharmaceutical Statistics, 16: 1 14, 2006 Copyright Taylor & Francis, LLC ISSN: 1054-3406 print/1520-5711 online DOI: 10.1080/10543400500406421 AN ALTERNATIVE APPROACH TO EVALUATION OF POOLABILITY

### Multiple Regression Methods

Chapter 1: Multiple Regression Methods Hildebrand, Ott and Gray Basic Statistical Ideas for Managers Second Edition 1 Learning Objectives for Ch. 1 The Multiple Linear Regression Model How to interpret

### Confidence Intervals of the Simple Difference between the Proportions of a Primary Infection and a Secondary Infection, Given the Primary Infection

Biometrical Journal 42 (2000) 1, 59±69 Confidence Intervals of the Simple Difference between the Proportions of a Primary Infection and a Secondary Infection, Given the Primary Infection Kung-Jong Lui

### Generalized Multivariate Rank Type Test Statistics via Spatial U-Quantiles

Generalized Multivariate Rank Type Test Statistics via Spatial U-Quantiles Weihua Zhou 1 University of North Carolina at Charlotte and Robert Serfling 2 University of Texas at Dallas Final revision for

### Confidence Intervals for the Coefficient of Variation in a Normal Distribution with a Known Mean and a Bounded Standard Deviation

KMUTNB Int J Appl Sci Technol, Vol. 10, No. 2, pp. 79 88, 2017 Research Article Confidence Intervals for the Coefficient of Variation in a Normal Distribution with a Known Mean and a Bounded Standard Deviation

### Inferences About the Difference Between Two Means

7 Inferences About the Difference Between Two Means Chapter Outline 7.1 New Concepts 7.1.1 Independent Versus Dependent Samples 7.1. Hypotheses 7. Inferences About Two Independent Means 7..1 Independent

### Statistics for Managers Using Microsoft Excel Chapter 10 ANOVA and Other C-Sample Tests With Numerical Data

Statistics for Managers Using Microsoft Excel Chapter 10 ANOVA and Other C-Sample Tests With Numerical Data 1999 Prentice-Hall, Inc. Chap. 10-1 Chapter Topics The Completely Randomized Model: One-Factor

### Application of Variance Homogeneity Tests Under Violation of Normality Assumption

Application of Variance Homogeneity Tests Under Violation of Normality Assumption Alisa A. Gorbunova, Boris Yu. Lemeshko Novosibirsk State Technical University Novosibirsk, Russia e-mail: gorbunova.alisa@gmail.com

### In Defence of Score Intervals for Proportions and their Differences

In Defence of Score Intervals for Proportions and their Differences Robert G. Newcombe a ; Markku M. Nurminen b a Department of Primary Care & Public Health, Cardiff University, Cardiff, United Kingdom

### Analysis of Regression and Bayesian Predictive Uncertainty Measures

Analysis of and Predictive Uncertainty Measures Dan Lu, Mary C. Hill, Ming Ye Florida State University, dl7f@fsu.edu, mye@fsu.edu, Tallahassee, FL, USA U.S. Geological Survey, mchill@usgs.gov, Boulder,

### Estimation of AUC from 0 to Infinity in Serial Sacrifice Designs

Estimation of AUC from 0 to Infinity in Serial Sacrifice Designs Martin J. Wolfsegger Department of Biostatistics, Baxter AG, Vienna, Austria Thomas Jaki Department of Statistics, University of South Carolina,

### Robustness of location estimators under t- distributions: a literature review

IOP Conference Series: Earth and Environmental Science PAPER OPEN ACCESS Robustness of location estimators under t- distributions: a literature review o cite this article: C Sumarni et al 07 IOP Conf.

### ON THE CONSEQUENCES OF MISSPECIFING ASSUMPTIONS CONCERNING RESIDUALS DISTRIBUTION IN A REPEATED MEASURES AND NONLINEAR MIXED MODELLING CONTEXT

ON THE CONSEQUENCES OF MISSPECIFING ASSUMPTIONS CONCERNING RESIDUALS DISTRIBUTION IN A REPEATED MEASURES AND NONLINEAR MIXED MODELLING CONTEXT Rachid el Halimi and Jordi Ocaña Departament d Estadística

### Overall Plan of Simulation and Modeling I. Chapters

Overall Plan of Simulation and Modeling I Chapters Introduction to Simulation Discrete Simulation Analytical Modeling Modeling Paradigms Input Modeling Random Number Generation Output Analysis Continuous

### One Factor Experiments

One Factor Experiments Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 Jain@cse.wustl.edu These slides are available on-line at: http://www.cse.wustl.edu/~jain/cse567-06/ 20-1 Overview!

### A Type of Sample Size Planning for Mean Comparison in Clinical Trials

Journal of Data Science 13(2015), 115-126 A Type of Sample Size Planning for Mean Comparison in Clinical Trials Junfeng Liu 1 and Dipak K. Dey 2 1 GCE Solutions, Inc. 2 Department of Statistics, University

### A Note on Bayesian Inference After Multiple Imputation

A Note on Bayesian Inference After Multiple Imputation Xiang Zhou and Jerome P. Reiter Abstract This article is aimed at practitioners who plan to use Bayesian inference on multiplyimputed datasets in

### A Comparison of Two Approaches For Selecting Covariance Structures in The Analysis of Repeated Measurements. H.J. Keselman University of Manitoba

A Comparison of Two Approaches For Selecting Covariance Structures in The Analysis of Repeated Measurements by H.J. Keselman University of Manitoba James Algina University of Florida Rhonda K. Kowalchuk

### Modelling skewed data with many zeros: A simple approach combining ordinary and logistic regression

Environmental and Ecological Statistics 12, 45 54, 2005 Modelling skewed data with many zeros: A simple approach combining ordinary and logistic regression DAVID FLETCHER, 1,2,* DARRYL MACKENZIE 2 and

### On robust and efficient estimation of the center of symmetry

On robust and efficient estimation of the center of symmetry Howard D. Bondell Department of Statistics, North Carolina State University Raleigh, NC 27695-8203, U.S.A (email: bondell@stat.ncsu.edu) Abstract

### FINITE MIXTURES OF LOGNORMAL AND GAMMA DISTRIBUTIONS

The 7th International Days of Statistics and Economics, Prague, September 9-, 03 FINITE MIXTURES OF LOGNORMAL AND GAMMA DISTRIBUTIONS Ivana Malá Abstract In the contribution the finite mixtures of distributions

### High-dimensional regression

High-dimensional regression Advanced Methods for Data Analysis (36-402/36-608) Spring 2014 1 Back to linear regression 1.1 Shortcomings Suppose that we are given outcome measurements y_1, ..., y_n ∈ R, and

### Based on this ANOVA model representation, Sobol (1993) proposed global sensitivity index, S_{i1...is} = D_{i1...is}/D, w

Chinese Journal of Applied Probability and Statistics Vol.29 No.6 Dec. 2013 Optimal Properties of Orthogonal Arrays Based on ANOVA High-Dimensional Model Representation Chen Xueping

### Analysis of 2x2 Cross-Over Designs using T-Tests

Chapter 234 Analysis of 2x2 Cross-Over Designs using T-Tests Introduction This procedure analyzes data from a two-treatment, two-period (2x2) cross-over design. The response is assumed to be a continuous

### 9/2/2010. Wildlife Management is a very quantitative field of study. throughout this course and throughout your career.

Introduction to Data and Analysis Wildlife Management is a very quantitative field of study Results from studies will be used throughout this course and throughout your career. Sampling design influences

### A Nonparametric Estimator of Species Overlap

A Nonparametric Estimator of Species Overlap Jack C. Yue 1, Murray K. Clayton 2, and Feng-Chang Lin 1 1 Department of Statistics, National Chengchi University, Taipei, Taiwan 11623, R.O.C. and 2 Department

### Avoiding Bias in Calculations of Relative Growth Rate

Annals of Botany 80: 37±4, 00 doi:0.093/aob/mcf40, available online at www.aob.oupjournals.org Avoiding Bias in Calculations of Relative Growth Rate WILLIAM A. HOFFMANN, * and HENDRIK POORTER Departamento

### What is Experimental Design?

One Factor ANOVA What is Experimental Design? A designed experiment is a test in which purposeful changes are made to the input variables (x) so that we may observe and identify the reasons for change

### Asymptotic Statistics-VI. Changliang Zou

Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α ∈ (0, 1), there is a well-defined d = d_{α,n} such that, for any continuous

### Uncertainty due to Finite Resolution Measurements

Uncertainty due to Finite Resolution Measurements S.D. Phillips, B. Tolman, T.W. Estler National Institute of Standards and Technology Gaithersburg, MD 899 Steven.Phillips@NIST.gov Abstract We investigate

### STATISTICAL INFERENCE FOR SURVEY DATA ANALYSIS

STATISTICAL INFERENCE FOR SURVEY DATA ANALYSIS David A Binder and Georgia R Roberts Methodology Branch, Statistics Canada, Ottawa, ON, Canada K1A 0T6 KEY WORDS: Design-based properties, Informative sampling,

### Statistical Applications in Genetics and Molecular Biology

Statistical Applications in Genetics and Molecular Biology Volume 6, Issue 1 2007 Article 28 A Comparison of Methods to Control Type I Errors in Microarray Studies Jinsong Chen Mark J. van der Laan Martyn

### Chapters 4-6: Estimation

Chapters 4-6: Estimation Read sections 4. (except 4..3), 4.5.1, 5.1 (except 5.1.5), 6.1 (except 6.1.3) Point Estimation (4.1.1) Point Estimator - A formula applied to a data set which results in a single

### Double Bootstrap Confidence Interval Estimates with Censored and Truncated Data

Journal of Modern Applied Statistical Methods Volume 13 Issue 2 Article 22 11-2014 Double Bootstrap Confidence Interval Estimates with Censored and Truncated Data Jayanthi Arasan University Putra Malaysia,

### Linear Models 1. Isfahan University of Technology Fall Semester, 2014

Linear Models 1 Isfahan University of Technology Fall Semester, 2014 References: [1] G. A. F., Seber and A. J. Lee (2003). Linear Regression Analysis (2nd ed.). Hoboken, NJ: Wiley. [2] A. C. Rencher and

### Unit 2. Describing Data: Numerical

Unit 2 Describing Data: Numerical Describing Data Numerically Describing Data Numerically Central Tendency Arithmetic Mean Median Mode Variation Range Interquartile Range Variance Standard Deviation Coefficient

### Inference for Distributions Inference for the Mean of a Population

Inference for Distributions Inference for the Mean of a Population PBS Chapter 7.1 009 W.H Freeman and Company Objectives (PBS Chapter 7.1) Inference for the mean of a population The t distributions The

### AN EMPIRICAL LIKELIHOOD RATIO TEST FOR NORMALITY

Econometrics Working Paper EWP0401 ISSN 1485-6441 Department of Economics AN EMPIRICAL LIKELIHOOD RATIO TEST FOR NORMALITY Lauren Bin Dong & David E. A. Giles Department of Economics, University of Victoria

### It's Only Fitting. Fitting model to data parameterizing model estimating unknown parameters in the model

It's Only Fitting Fitting model to data parameterizing model estimating unknown parameters in the model Likelihood: an example Cohort of 8! individuals observe survivors at times >œ 1, 2, 3,..., : 8",

### Marcia Gumpertz and Sastry G. Pantula Department of Statistics North Carolina State University Raleigh, NC

A Simple Approach to Inference in Random Coefficient Models March 8, 1988 Marcia Gumpertz and Sastry G. Pantula Department of Statistics North Carolina State University Raleigh, NC 27695-8203 Key Words

### Sampling Distributions: Central Limit Theorem

Review for Exam 2 Sampling Distributions: Central Limit Theorem Conceptually, we can break up the theorem into three parts: 1. The mean (µ_M) of a population of sample means (M) is equal to the mean (µ)

### Bootstrap-Based T² Multivariate Control Charts

Bootstrap-Based T² Multivariate Control Charts Poovich Phaladiganon Department of Industrial and Manufacturing Systems Engineering University of Texas at Arlington Arlington, Texas, USA Seoung Bum Kim

### Midterm 1 and 2 results

Midterm 1 and 2 results Midterm 1 Midterm 2 ------------------------------ Min. :40.00 Min. : 20.0 1st Qu.:60.00 1st Qu.:60.00 Median :75.00 Median :70.0 Mean :71.97 Mean :69.77 3rd Qu.:85.00 3rd Qu.:85.0

### Ordinary Least Squares Regression Explained: Vartanian

Ordinary Least Squares Regression Explained: Vartanian When to Use Ordinary Least Squares Regression Analysis A. Variable types. When you have an interval/ratio scale dependent variable.. When your independent

### Multiple Pairwise Comparison Procedures in One-Way ANOVA with Fixed Effects Model

Biostatistics 250 ANOVA Multiple Comparisons 1 ORIGIN 1 Multiple Pairwise Comparison Procedures in One-Way ANOVA with Fixed Effects Model When the omnibus F-Test for ANOVA rejects the null hypothesis that

### Description Syntax for predict Menu for predict Options for predict Remarks and examples Methods and formulas References Also see

Title stata.com logistic postestimation Postestimation tools for logistic Description Syntax for predict Menu for predict Options for predict Remarks and examples Methods and formulas References Also see

### FULL LIKELIHOOD INFERENCES IN THE COX MODEL

October 20, 2007 FULL LIKELIHOOD INFERENCES IN THE COX MODEL BY JIAN-JIAN REN 1 AND MAI ZHOU 2 University of Central Florida and University of Kentucky Abstract We use the empirical likelihood approach

### Multicollinearity and A Ridge Parameter Estimation Approach

Journal of Modern Applied Statistical Methods Volume 15 Issue Article 5 11-1-016 Multicollinearity and A Ridge Parameter Estimation Approach Ghadban Khalaf King Khalid University, albadran50@yahoo.com

### Statistical comparison of univariate tests of homogeneity of variances

Submitted to the Journal of Statistical Computation and Simulation Statistical comparison of univariate tests of homogeneity of variances Pierre Legendre* and Daniel Borcard Département de sciences biologiques,

### Percentage point z /2

Chapter 8: Statistical Intervals Why? point estimate is not reliable under resampling. Interval Estimates: Bounds that represent an interval of plausible values for a parameter There are three types of

### A BIMODAL EXPONENTIAL POWER DISTRIBUTION

Pak. J. Statist. Vol. 6(), 379-396 A BIMODAL EXPONENTIAL POWER DISTRIBUTION Mohamed Y. Hassan and Rafiq H. Hijazi Department of Statistics United Arab Emirates University P.O. Box 7555, Al-Ain, U.A.E.

Appendix A Summary of Tasks Appendix Table of Contents Reporting Tasks...357 ListData...357 Tables...358 Graphical Tasks...358 BarChart...358 PieChart...359 Histogram...359 BoxPlot...360 Probability Plot...360

### Obtaining Uncertainty Measures on Slope and Intercept

Obtaining Uncertainty Measures on Slope and Intercept of a Least Squares Fit with Excel's LINEST Faith A. Morrison Professor of Chemical Engineering Michigan Technological University, Houghton, MI 39931

### POLYNOMIAL REGRESSION AND ESTIMATING FUNCTIONS IN THE PRESENCE OF MULTIPLICATIVE MEASUREMENT ERROR

POLYNOMIAL REGRESSION AND ESTIMATING FUNCTIONS IN THE PRESENCE OF MULTIPLICATIVE MEASUREMENT ERROR Stephen J. Iturria and Raymond J. Carroll 1 Texas A&M University, USA David Firth University of Oxford,

### Central Limit Theorem Confidence Intervals Worked example #6. July 24, 2017

Central Limit Theorem Confidence Intervals Worked example #6 July 24, 2017 10 8 Raw scores 6 4 Mean=71.4% 2 0 0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-80 80-90 90+ Scaling is to add 3.6% to bring mean

### 4. STATISTICAL SAMPLING DESIGNS FOR ISM

IRTC Incremental Sampling Methodology February 2012 4. STATISTICAL SAMPLING DESIGNS FOR ISM This section summarizes results of simulation studies used to evaluate the performance of ISM in estimating the

### A Hypothesis Test for the End of a Common Source Outbreak

Johns Hopkins University, Dept. of Biostatistics Working Papers 9-20-2004 A Hypothesis Test for the End of a Common Source Outbreak Ron Brookmeyer Johns Hopkins Bloomberg School of Public Health, Department

### CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)

FROM: PAGANO, R. R. (2007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NON-PARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter

### A Note on Coverage Probability of Confidence Interval for the Difference between Two Normal Variances

Applied Mathematical Sciences, Vol 6, 01, no 67, 3313-330 A Note on Coverage Probability of Confidence Interval for the Difference between Two Normal Variances Sa-aat Niwitpong Department of Applied Statistics,

### 10/31/2012. One-Way ANOVA F-test

PSY 511: Advanced Statistics for Psychological and Behavioral Research 1 1. Situation/hypotheses 2. Test statistic 3.Distribution 4. Assumptions One-Way ANOVA F-test One factor J>2 independent samples

### CHAPTER 4. > 0, where β

CHAPTER 4 SOLUTIONS TO PROBLEMS 4. (i) and (iii) generally cause the t statistics not to have a t distribution under H. Homoskedasticity is one of the CLM assumptions. An important omitted variable violates

### Two-by-two ANOVA: Global and Graphical Comparisons Based on an Extension of the Shift Function

Journal of Data Science 7(2009), 459-468 Two-by-two ANOVA: Global and Graphical Comparisons Based on an Extension of the Shift Function Rand R. Wilcox University of Southern California Abstract: When comparing

### Some Observations on the Wilcoxon Rank Sum Test

UW Biostatistics Working Paper Series 8-16-011 Some Observations on the Wilcoxon Rank Sum Test Scott S. Emerson University of Washington, semerson@u.washington.edu Suggested Citation Emerson, Scott S.,

### Chapter 1 - Lecture 3 Measures of Location

Chapter 1 - Lecture 3 Measures of Location August 31st, 2009 Outline General Types of measures Median Skewness What

### Bootstrapping, Permutations, and Monte Carlo Testing

Bootstrapping, Permutations, and Monte Carlo Testing Problem: Population of interest is extremely rare spatially and you are interested in using a 95% CI to estimate total abundance. The sampling design

Hossain, A; DiazOrdaz, K; Bartlett, JW (2017) Missing binary outcomes under covariate-dependent missingness in cluster randomised trials. Statistics in medicine. ISSN 0277-6715 DOI: https://doi.org/10.1002/sim.7334

### Chapter 1 Statistical Inference

Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

### Chapter 3: Statistical methods for estimation and testing. Key reference: Statistical methods in bioinformatics by Ewens & Grant (2001).

Chapter 3: Statistical methods for estimation and testing Key reference: Statistical methods in bioinformatics by Ewens & Grant (2001).

### A MONTE CARLO INVESTIGATION INTO THE PROPERTIES OF A PROPOSED ROBUST ONE-SAMPLE TEST OF LOCATION. Bercedis Peterson

A MONTE CARLO INVESTIGATION INTO THE PROPERTIES OF A PROPOSED ROBUST ONE-SAMPLE TEST OF LOCATION by Bercedis Peterson Department of Biostatistics University of North Carolina at Chapel Hill Institute of

### Wiley. Methods and Applications of Linear Models. Regression and the Analysis. of Variance. Third Edition. Ishpeming, Michigan RONALD R.

Methods and Applications of Linear Models Regression and the Analysis of Variance Third Edition RONALD R. HOCKING PenHock Statistical Consultants Ishpeming, Michigan Wiley Contents Preface to the Third

### Basics on t-tests Independent Sample t-tests Single-Sample t-tests Summary of t-tests Multiple Tests, Effect Size Proportions. Statistiek I.

Statistiek I t-tests John Nerbonne CLCG, Rijksuniversiteit Groningen http://www.let.rug.nl/nerbonne/teach/statistiek-i/ John Nerbonne 1/46 Overview 1 Basics on t-tests 2 Independent Sample t-tests 3 Single-Sample