In Defence of Score Intervals for Proportions and their Differences

Robert G. Newcombe (a); Markku M. Nurminen (b)
(a) Department of Primary Care & Public Health, Cardiff University, Cardiff, United Kingdom
(b) Department of Public Health, Hjelt Institute, University of Helsinki, Helsinki, Finland
Online publication date: 08 February 2011

IN DEFENCE OF LIKELIHOOD-BASED INTERVALS FOR PROPORTIONS AND THEIR DIFFERENCES*

ROBERT G. NEWCOMBE (1), MARKKU M. NURMINEN (2)
(1) Cardiff University, United Kingdom
(2) University of Helsinki, Finland

*Abstract for the Nordic-Baltic Biometric Conference, Turku, Finland, 5-6 June 2011.

Miettinen and Nurminen (1985) developed interval estimation methods for a comparative analysis of two binomial proportions for application in epidemiology and medical research. The MN method, as it has become known in the biometric literature, has been theoretically and empirically demonstrated to possess desirable statistical properties. A recent article by Santner et al. (2007) asserted that the MN likelihood-based interval for a difference of two unpaired proportions may have inadequate coverage. We revisit the properties of likelihood-based intervals for proportions and their differences. Published results indicate that these methods produce a mean coverage probability slightly above the nominal confidence level. We argue (Newcombe and Nurminen, 2011) that it is appropriate to align the mean rather than the minimum coverage with the nominal one, based on a moving average over an interval of parameter values which represents a reasonable expression of prior uncertainty. We will show under various simulation scenarios that the poor coverage properties claimed by Santner et al. actually relate to an ostensibly similar but in fact inferior version of the method (Mee, 1984). In conclusion, likelihood-based intervals, including the MN method, have gained a well-deserved place as appropriate tools for useful application in epidemiologic and medical research practice. We call for a wider availability of these procedures in statistical software, including StatXact (2007), based on a correct test statistic that employs the unbiased estimator of variance in the MN interval method.

References

Mee, R.W. (1984). Confidence bounds for the difference between two probabilities. Biometrics, 40, pp. 1175-1176.
Miettinen, O., Nurminen, M. (1985). Comparative analysis of two rates. Statistics in Medicine, 4, pp. 213-226.
Newcombe, R.G., Nurminen, M.M. (2011). In defence of score intervals for proportions and their differences. Communications in Statistics - Theory and Methods, 40(7), pp. 1271-1282. Online publication: http://markstat.net/en/images/stories/cstm.pdf
Santner, T.J., Pradhan, V., Senchaudhuri, P., Mehta, C.R., Tamhane, A. (2007). Small-sample comparisons of confidence intervals for the difference of two independent binomial proportions. Computational Statistics & Data Analysis, 51, pp. 5791-5799.
StatXact 8. (2007). Statistical Software for Exact Nonparametric Inference. User Manual. Equation 14.96, p. 405. Cambridge, MA: CYTEL Software Corporation.

Introduction

This communication is a response to an article by Santner et al. (2007), who criticized the score method for calculating confidence intervals for differences between two independent binomial proportions, originally developed by Miettinen and Nurminen (MN) (1985).

Objectives

(1) It is necessary to distinguish between different versions of score intervals when discussing their coverage probability, with particular regard to Santner et al.'s assertion that the MN intervals may have inadequate coverage.
(2) In evaluating the performance of the various methods, it is more appropriate to consider the moving average of coverage probabilities than their minimum, in view of the oscillating pattern that typically occurs for binomial confidence intervals.

Asymptotic and Exact Approaches to the Construction of Interval Estimates for Binomial Proportions

A general procedure for asymptotic interval estimation described by Cox and Hinkley (1974), the score method, is based on a test statistic that contrasts the parameter of interest θ with its empirical value θ̃, usually taken as the maximum likelihood estimate, and that has the variance of θ̃ as its denominator. The interval estimates are obtained by inverting a chi-squared test with 1 degree of freedom, corresponding to a large-sample Gaussian distribution rendered free of any nuisance parameters. Evaluative studies commonly report both mean and minimum coverage probability. We argue in this article that it is appropriate to align the mean coverage with the nominal level, 1-α. Confidence interval methods such as the exact method (Clopper & Pearson, 1934) that align the minimum coverage with 1-α may be regarded as leading to unnecessarily wide intervals or, conversely, as requiring unnecessarily large sample sizes in order to attain a desired degree of precision, as shown below.
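As an illustration of test inversion in the simplest case, a single binomial proportion with no nuisance parameter, the sketch below computes the score interval in closed form (this interval is commonly known as the Wilson interval). The function name and the choice of Python are illustrative assumptions and do not come from the poster.

```python
import math
from statistics import NormalDist


def wilson_interval(x, n, conf=0.95):
    """Score (Wilson) interval for a binomial proportion x/n.

    Illustrative sketch: inverts the large-sample test whose variance is
    evaluated at the hypothesised proportion rather than at the estimate.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # two-sided normal quantile
    p_hat = x / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb rounding error at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    print(wilson_interval(15, 148))
    print(wilson_interval(0, 20))
```

Unlike the simple Wald interval, this interval does not collapse to zero width when x = 0 or x = n.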

Score Intervals for the Difference between Independent Proportions

Miettinen and Nurminen (1985) derived score intervals for three effect size measures commonly used to compare two independent binomial proportions: their difference, their ratio, and the odds ratio. This study (for elaboration, see Nurminen, 1984; Miettinen, 1985) and several subsequent ones (Nurminen, 1986, 1988, 1990; Nurminen and Miettinen, 1990; Nurminen and Nurminen, 1990; Newcombe, 1998b; Liu et al., 2006) confirmed that this approach produces average coverage probabilities close to 1-α. We show that the poor coverage properties claimed by Santner et al. actually relate to an ostensibly similar but in fact inferior version of the score interval proposed by Mee (1984). The crucial difference between the Mee and Miettinen-Nurminen (MN) formulations is that in the latter the variance incorporates a correction factor (n1 + n2)/(n1 + n2 - 1), where n1 and n2 denote the two sample sizes. This seemingly small difference has a substantial impact on coverage. In a matched-pairs design, where each pair contributes n1 = n2 = 1, the factor equals 2/(2 - 1) = 2.
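The distinction between the two formulations can be made concrete with a minimal numerical sketch. It assumes the usual form of the score statistic for the difference: the squared distance between the observed and hypothesised difference, divided by the binomial variance evaluated at the maximum likelihood estimates constrained to that hypothesised difference. The constrained estimate is found here by bisection rather than by the closed-form solution, and all function names are hypothetical. The only difference between the two variants is the factor (n1 + n2)/(n1 + n2 - 1).

```python
import math
from statistics import NormalDist


def _constrained_p2(x1, n1, x2, n2, delta, eps=1e-12):
    """MLE of p2 subject to p1 - p2 = delta, by bisection on the score."""
    lo, hi = max(0.0, -delta) + eps, min(1.0, 1.0 - delta) - eps
    if lo >= hi:
        return 0.5 * (lo + hi)

    def score(p2):  # derivative of the log-likelihood; decreasing in p2
        p1 = p2 + delta
        return (x1 / p1 - (n1 - x1) / (1 - p1)
                + x2 / p2 - (n2 - x2) / (1 - p2))

    if score(lo) <= 0:
        return lo
    if score(hi) >= 0:
        return hi
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def score_statistic(x1, n1, x2, n2, delta, mn_correction=True):
    """Squared score statistic for the hypothesis p1 - p2 = delta."""
    p2 = _constrained_p2(x1, n1, x2, n2, delta)
    p1 = p2 + delta
    var = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
    if mn_correction:  # MN variant; omit the factor for the Mee variant
        var *= (n1 + n2) / (n1 + n2 - 1)
    diff = x1 / n1 - x2 / n2 - delta
    if var <= 0:
        return 0.0 if diff == 0 else math.inf
    return diff * diff / var


def score_interval_diff(x1, n1, x2, n2, conf=0.95, mn_correction=True):
    """Invert the score test by bisection on each side of the estimate."""
    crit = NormalDist().inv_cdf(1 - (1 - conf) / 2) ** 2
    edge = 1 - 1e-9  # stay just inside the parameter space [-1, 1]
    d_hat = min(max(x1 / n1 - x2 / n2, -edge), edge)

    def limit(inside, outside):
        if score_statistic(x1, n1, x2, n2, outside, mn_correction) <= crit:
            return outside
        for _ in range(60):
            mid = 0.5 * (inside + outside)
            if score_statistic(x1, n1, x2, n2, mid, mn_correction) <= crit:
                inside = mid
            else:
                outside = mid
        return inside

    return limit(d_hat, -edge), limit(d_hat, edge)


if __name__ == "__main__":
    print("MN :", score_interval_diff(56, 70, 48, 80))
    print("Mee:", score_interval_diff(56, 70, 48, 80, mn_correction=False))
```

Because the correction factor inflates the variance, the MN limits lie slightly outside the Mee limits for the same data. The bisection assumes that the statistic increases as the hypothesised difference moves away from the observed one.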

Generation of Parameter Space Points

One essential way in which our evaluation differs from that of Santner et al. is the algorithm used to generate the proportions (p1, p2). They generated these from two independent uniform distributions. While this is obviously the simplest choice, it has limited relevance to the differences encountered in practice, e.g. in clinical trials: under independent uniform sampling the expected absolute difference |p1 - p2| is 1/3. It would be very unreasonable to plan the size of a clinical trial on the assumption that such a large difference would occur. In practice, when two proportions are compared, they are generally broadly similar in magnitude. We generated random parameter space points (PSPs) in a more appropriate way. Defining the risk difference parameter as RD = p1 - p2 and the nuisance parameter as Pa = (p1 + p2)/2, we first chose Pa from U(0, 1) and then RD > 0 from U(0, 1 - |2Pa - 1|).
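The sampling scheme can be sketched as follows, assuming the upper bound for RD is 1 - |2Pa - 1|, which is the largest difference that keeps both p1 = Pa + RD/2 and p2 = Pa - RD/2 inside [0, 1]. The function name and the seed are illustrative only.

```python
import random


def sample_psp(rng):
    """Draw one parameter space point (p1, p2) with p1 >= p2.

    P_a ~ U(0, 1) is the mean of the two proportions, and
    RD ~ U(0, 1 - |2 P_a - 1|) is their difference.
    """
    p_a = rng.random()
    rd = rng.uniform(0.0, 1.0 - abs(2.0 * p_a - 1.0))
    return p_a + rd / 2.0, p_a - rd / 2.0


if __name__ == "__main__":
    rng = random.Random(2011)  # arbitrary seed, for reproducibility only
    for p1, p2 in (sample_psp(rng) for _ in range(5)):
        print(f"p1 = {p1:.4f}, p2 = {p2:.4f}, RD = {p1 - p2:.4f}")
```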

Proportion of parameter space points with coverage probability greater than 1-α, based on 10,000 parameter space points for each sample size pair (n1, n2).

90% intervals
Sample sizes        Wald     Mee      MN       Exact    Mid-P    Square & add
n1 = n2 = 5         0.0005   0.7135   0.8564   0.9983   0.8203   0.7096
n1 = n2 = 15        0.0242   0.4321   0.5658   1        0.6070   0.7099
n1 = n2 = 30        0.0354   0.3746   0.5264   1        0.5492   0.6880
n1 = 5, n2 = 15     0.0178   0.6867   0.8082   0.9997   0.7729   0.7488
n1 = 15, n2 = 25    0.0144   0.4514   0.6027   0.9456   0.4999   0.6606
n1 = 25, n2 = 35    0.0299   0.3888   0.5437   0.9275   0.4297   0.6214
n1 = 20, n2 = 50    0.0242   0.4916   0.6466   0.9700   0.4764   0.6595

95% intervals
Sample sizes        Wald     Mee      MN       Exact    Mid-P    Square & add
n1 = n2 = 5         0        0.8140   0.9494   0.9991   0.8899   0.7351
n1 = n2 = 15        0.0010   0.5485   0.7222   1        0.7047   0.7200
n1 = n2 = 30        0.0026   0.4306   0.5926   1        0.6232   0.7231
n1 = 5, n2 = 15     0.0032   0.7676   0.8777   0.9999   0.8539   0.7205
n1 = 15, n2 = 25    0.0078   0.5897   0.7525   0.9561   0.6018   0.6873
n1 = 25, n2 = 35    0.0052   0.4759   0.6454   0.9358   0.5056   0.6447
n1 = 20, n2 = 50    0.0197   0.6335   0.7938   0.9787   0.5837   0.7040

Results on Coverage Probabilities

Santner et al.'s "percentage of PSPs conservative" criterion turns out to be particularly sensitive in detecting differences in coverage properties between methods. Clearly, for several combinations of n1, n2, and 1-α, the MN interval has coverage over 1-α for more than 50% of PSPs, whereas the Mee interval has coverage over 1-α for less than 50% of the same PSPs. This criterion shows two methods in a favorable light here: the MN interval and, arguably even more so, the square-and-add interval. The exact interval (based on profiling out the nuisance parameter) is simply too wide, whereas even the otherwise excellent though computationally laborious mid-p interval does not fare quite so well by this criterion. In fact, Santner et al. used the minimum proximity criterion, according to which the nominal level is taken to represent an absolute minimum, but with coverage probabilities (CPs) exceeding the nominal level as little as possible. This stance favors methods that are strictly conservative, and led the assessors to rate the performance of the asymptotic MN method unfavorably. Thus it is hardly surprising that Santner et al.'s comments come across as unfavorable, and that they found the MN method performed poorly according to their implied criterion of strict conservatism. However, their stated primary criterion of nearness of achieved coverage, interpreted literally, would mean that the MN method was the best of the five they assessed.
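For a given parameter space point, the coverage probability of any interval method can be computed exactly by summing the joint binomial probabilities of all outcomes (x1, x2) whose interval contains the true difference. The sketch below illustrates that calculation. The Wald interval is used only as a simple stand-in method, and the function names and fixed 95% quantile are illustrative assumptions rather than the software used for the table.

```python
import math


def binom_pmf(x, n, p):
    """Binomial probability of x successes in n trials."""
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)


def wald_interval(x1, n1, x2, n2, z=1.959964):
    """Simple Wald interval for p1 - p2 (stand-in method, 95% by default)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - z * se, d + z * se


def coverage_probability(interval, n1, n2, p1, p2):
    """Exact coverage of `interval` at the parameter space point (p1, p2)."""
    rd = p1 - p2
    total = 0.0
    for x1 in range(n1 + 1):
        w1 = binom_pmf(x1, n1, p1)
        for x2 in range(n2 + 1):
            lo, hi = interval(x1, n1, x2, n2)
            if lo <= rd <= hi:  # outcome counts only if the interval covers RD
                total += w1 * binom_pmf(x2, n2, p2)
    return total


if __name__ == "__main__":
    print(coverage_probability(wald_interval, 5, 5, 0.5, 0.4))
```

Repeating this calculation over many sampled points and recording how often the result exceeds 1-α gives a proportion of the kind reported in the table. Averaging it over neighbouring parameter values gives a moving average coverage.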

[Figure: Moving Average Coverage Probabilities for the Miettinen-Nurminen Confidence Interval. Courtesy of Pete Laud, AstraZeneca UK Limited.]

Conclusions

We emphasize the desirability of a comprehensible quantitative appraisal of the precision of empirical findings in applied research. The better the users of statistical methods see how these function to achieve the desired objectives, the more likely the practitioners will be satisfied in applying them again. We regard the arguments in favor of aligning typical rather than minimum coverage with the nominal level as very cogent ones. What really matters is not the theoretical coverage probability at a point value for the parameter, but rather an average over an interval of parameter values which represents a reasonable expression of prior uncertainty. We call for a wider availability of score-based intervals in statistical software. These have gained a well-deserved place as appropriate tools for useful application in epidemiological and medical research practice. A correct form of the MN statistic, one that employs the unbiased estimator of variance, will be incorporated into the next release of StatXact (Cyrus Mehta, personal communication, 2011).