ADJUSTED POWER ESTIMATES IN. Ji Zhang. Biostatistics and Research Data Systems. Merck Research Laboratories. Rahway, NJ


ADJUSTED POWER ESTIMATES IN MONTE CARLO EXPERIMENTS

Ji Zhang
Biostatistics and Research Data Systems, Merck Research Laboratories, Rahway, NJ 07065-0914

and

Dennis D. Boos
Department of Statistics, North Carolina State University, Raleigh, NC 27695-8203

Key Words and Phrases: critical value; empirical level; McNemar's test; true level.

ABSTRACT

Critical values and powers of competing tests are often evaluated through Monte Carlo simulations. Since the true levels of those tests are often very different, comparisons of the power estimates are not valid. We suggest that Monte Carlo estimates of critical values be used to create adjusted power estimates, which are then comparable. The main contribution is to analyze the variability of the adjusted estimates and to point out implications for the planning of Monte Carlo power studies.

1 INTRODUCTION

Monte Carlo experiments are often the simplest way to estimate and compare the power functions of complex test statistics. Unfortunately, at the end of such studies one may find results such as the following:

TABLE I

           $\theta = 0$   $\theta = 1.5$   $\theta = 3.0$   $\theta = 4.5$
Test 1:        .08             .24              .37              .74
Test 2:        .03             .20              .30              .62

where $\theta = 0$ corresponds to the null hypothesis, .05 is the nominal level of the tests, and entries such as .08 are the proportion of test rejections in $N$ Monte Carlo replications. If $N$ is large enough that the difference $.08 - .03$ is not the result of sampling variability, then these two power functions are not comparable in the usual sense because Test 1 has a "head start" over Test 2. Thus it would seem important to adjust the power curves if comparing them is of interest.

The simple solution is to estimate the true critical values by Monte Carlo simulation and use those estimates throughout the study. Many experimenters would automatically take this approach. The purpose of this paper is to analyze such power estimates when the critical values themselves have been estimated by Monte Carlo methods. We use the term "adjusted power estimates" because test statistics often come with standard critical values (usually from asymptotic arguments) such as normal or chi-squared quantiles. We then think of the use of estimated critical values as an adjustment to make the power curves comparable. Table II in Section 2 shows how we prefer to display the final results.

The paper is organized as follows. Section 2 introduces the notation and definitions. Section 3 gives an asymptotic analysis of the mean and variance of the adjusted power estimate, and Section 4 comments on the effect that estimating critical values has on McNemar's test for equality of power curves. Results in both Sections 3 and 4 have implications for the planning of Monte Carlo power studies.

2 THE ADJUSTED POWER ESTIMATE

Consider a test statistic $T$ for which a critical value $C_\alpha$ of nominal level $\alpha$ has been proposed. A typical Monte Carlo experiment might draw $N_0$ independent samples at the null hypothesis and $N_1, \ldots, N_k$ samples at $k$ points in the alternative hypothesis. For simplicity we shall often assume $k = 1$. The estimated true level of the test is then

$$\hat p_0 = \frac{1}{N_0} \sum_{i=1}^{N_0} I(T_{0i} \ge C_\alpha), \tag{1}$$

where $T_{01}, \ldots, T_{0N_0}$ are the test statistics for the $N_0$ Monte Carlo samples under the null hypothesis, and $I(A)$ is the indicator function, equal to 1 if $A$ is true and 0 otherwise. Thus the sum above just counts the number of test rejections using the critical value $C_\alpha$. Similarly, the estimated power at an alternative is

$$\hat p_1 = \frac{1}{N_1} \sum_{i=1}^{N_1} I(T_{1i} \ge C_\alpha), \tag{2}$$

where $T_{11}, \ldots, T_{1N_1}$ are the test statistics for the $N_1$ Monte Carlo samples under the alternative hypothesis.

Typical results for $\hat p_0$ and $\hat p_1$ are the values .08 and .24 in row 1 of Table I. If $N_0$ is larger than about 200, then a simple binomial test of $H_0$: true level = .05 would suggest that the true level is bigger than the nominal level $\alpha = .05$. This makes the value .24 not comparable to power results for other tests with true level = .05 or for a test such as the one in the second row of Table I. At this point the experimenters appear forced to rerun the experiment unless they have saved the individual outcomes from each Monte Carlo replication (which is always good practice if the cost of running the experiment is fairly high). We shall assume that they have saved those values: $T_{01}, \ldots, T_{0N_0}$ under the null and $T_{11}, \ldots, T_{1N_1}$ under the alternative. In this case they can estimate the critical value for $T$ from the $(1 - \alpha)$ quantile of the null observed values, i.e.,

$\hat C_\alpha$ = the $(1 - \alpha)N_0$th value of $T_{01}, \ldots, T_{0N_0}$ when these are placed in order from smallest to largest.

The adjusted power estimate is found by replacing $C_\alpha$ by $\hat C_\alpha$ in (2), resulting in

$$\hat p_1^* = \frac{1}{N_1} \sum_{i=1}^{N_1} I(T_{1i} \ge \hat C_\alpha). \tag{3}$$

Of course the adjusted power at the null hypothesis is just $\alpha$ because of the way $\hat C_\alpha$ is chosen. If we carried out these calculations for the two statistics in Table I, we might replace Table I by

TABLE II

           $\theta = 0$   $\theta = 1.5$   $\theta = 3.0$   $\theta = 4.5$
Test 1:     .08 (.05)       .24 (.19)        .37 (.30)        .74 (.62)
Test 2:     .03 (.05)       .20 (.22)        .30 (.33)        .62 (.68)

The adjusted powers in parentheses are then comparable. On the other hand, we have left the original power estimates in Table II because they show what kind of power one would obtain by using the standard critical value $C_\alpha$ even though such a test does not have level $\alpha$.
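The calculations in (1) to (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the simulated statistic (normal with standard deviation 1.3, compared against the standard normal critical value 1.645), the sample sizes, the seed, and all function names are hypothetical and not taken from the paper, and rounding $(1 - \alpha)N_0$ to the nearest integer rank is our own choice for the case where it is not an integer. The binomial tail at the end checks the level for 16 rejections in 200 replications, i.e., an observed level of .08.

```python
import math
import random


def empirical_rejection_rate(stats, critical_value):
    """Proportion of statistics at or above the critical value, as in (1) and (2)."""
    return sum(t >= critical_value for t in stats) / len(stats)


def estimated_critical_value(null_stats, alpha):
    """C-hat: the (1 - alpha) * N0-th smallest null statistic (1-based rank)."""
    ordered = sorted(null_stats)
    n = len(ordered)
    # Assumption: round to the nearest integer rank when (1 - alpha) * N0 is not an integer.
    rank = max(1, min(n, round((1 - alpha) * n)))
    return ordered[rank - 1]


def adjusted_power(null_stats, alt_stats, alpha):
    """Adjusted power estimate (3): rejection rate at the estimated critical value."""
    return empirical_rejection_rate(alt_stats, estimated_critical_value(null_stats, alpha))


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); suitable for moderate n such as 200."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Hypothetical setup: T ~ N(theta, 1.3^2), but the standard critical value 1.645
# assumes unit variance, so the unadjusted test is too liberal.
rng = random.Random(12345)
alpha, c_standard, theta = 0.05, 1.645, 1.5
n0, n1 = 10000, 1000
null_stats = [rng.gauss(0.0, 1.3) for _ in range(n0)]
alt_stats = [rng.gauss(theta, 1.3) for _ in range(n1)]

p0_hat = empirical_rejection_rate(null_stats, c_standard)  # estimated true level (1)
p1_hat = empirical_rejection_rate(alt_stats, c_standard)   # unadjusted power (2)
c_hat = estimated_critical_value(null_stats, alpha)        # Monte Carlo critical value
p1_adj = adjusted_power(null_stats, alt_stats, alpha)      # adjusted power (3)
print(f"level {p0_hat:.3f}, power {p1_hat:.3f}, C-hat {c_hat:.3f}, adjusted {p1_adj:.3f}")

# One-sided binomial test of H0: true level = .05 for 16 rejections out of 200.
p_value_level = binomial_upper_tail(16, 200, 0.05)
print(f"binomial p-value for 16/200 rejections: {p_value_level:.3f}")
```

Saving `null_stats` and `alt_stats` is what makes the adjustment possible without rerunning the experiment; only the rejection counts at the standard critical value would not be enough.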

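In the next section we will analyze the adjusted power estimate $\hat p_1^*$.

3 PROPERTIES OF THE ADJUSTED POWER ESTIMATE

Let $F_0(x) = P(T \le x)$ and $F_1(x) = P(T \le x)$ be the distribution functions of the test statistic $T$ under the null and alternative hypotheses, respectively. The true level-$\alpha$ critical value $C_\alpha^*$ satisfies $F_0(C_\alpha^*) = 1 - \alpha$, and $1 - F_1(C_\alpha^*)$ is the power which $\hat p_1^*$ tries to estimate. We assume that $F_0$ and $F_1$ are twice differentiable at $C_\alpha^*$ and that their densities satisfy $0 < f_0(C_\alpha^*) < \infty$ and $0 < f_1(C_\alpha^*) < \infty$. Then using results from the Bahadur representation theory found in Serfling (1980, Sec. 2.5), we have that

$$\hat p_1^* - [1 - F_1(C_\alpha^*)] = -f_1(C_\alpha^*) \, \frac{1}{N_0} \sum_{i=1}^{N_0} \left[ \frac{(1 - \alpha) - I(T_{0i} \le C_\alpha^*)}{f_0(C_\alpha^*)} \right] + \frac{1}{N_1} \sum_{i=1}^{N_1} \left\{ I(T_{1i} \ge C_\alpha^*) - [1 - F_1(C_\alpha^*)] \right\} + R_{N_0} + R_{N_1}, \tag{4}$$

where $\sqrt{N_0}\, R_{N_0} \xrightarrow{p} 0$ and $\sqrt{N_1}\, R_{N_1} \xrightarrow{p} 0$ as $N_0 \to \infty$, $N_1 \to \infty$. Then by the CLT applied to (4), $\hat p_1^*$ is asymptotically normal with mean $1 - F_1(C_\alpha^*)$ and variance

$$\mathrm{AVAR} = \frac{f_1^2(C_\alpha^*)}{f_0^2(C_\alpha^*)} \, \frac{\alpha(1 - \alpha)}{N_0} + \frac{F_1(C_\alpha^*)[1 - F_1(C_\alpha^*)]}{N_1}. \tag{5}$$

Using Taylor expansions we can give further insight by directly approximating the mean and variance of $\hat p_1^*$:

$$E(\hat p_1^*) = E\left[ E(\hat p_1^* \mid T_{01}, \ldots, T_{0N_0}) \right] = E[1 - F_1(\hat C_\alpha)] = [1 - F_1(C_\alpha^*)] + O(N_0^{-1}).$$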
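Formula (5) can be evaluated directly once $F_0$ and $F_1$ are specified. The following is a minimal sketch under an assumed setting that we choose for illustration: $T \sim N(0, 1)$ under the null, $T \sim N(\theta, 1)$ under the alternative, and rejection for large $T$, so that $C_\alpha^*$ is the standard normal $(1 - \alpha)$ quantile. The function names and the sample sizes are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: pdf, cdf, inv_cdf


def avar_terms(alpha, theta):
    """Return (a, b, power) such that AVAR in (5) equals a / N0 + b / N1.

    Assumed setting: T ~ N(0, 1) under the null, T ~ N(theta, 1) under the
    alternative, rejecting for large T.
    """
    c_star = STD_NORMAL.inv_cdf(1 - alpha)    # true critical value: F0(C*) = 1 - alpha
    f0 = STD_NORMAL.pdf(c_star)               # null density at C*
    f1 = STD_NORMAL.pdf(c_star - theta)       # alternative density at C*
    cdf1 = STD_NORMAL.cdf(c_star - theta)     # F1(C*)
    a = (f1 / f0) ** 2 * alpha * (1 - alpha)  # term due to estimating the critical value
    b = cdf1 * (1 - cdf1)                     # usual binomial variance term
    return a, b, 1 - cdf1


def avar(alpha, theta, n0, n1):
    """Asymptotic variance (5) of the adjusted power estimate."""
    a, b, _ = avar_terms(alpha, theta)
    return a / n0 + b / n1


for alpha in (0.10, 0.05, 0.01):
    theta_half = STD_NORMAL.inv_cdf(1 - alpha)  # alternative at which the power is 0.5
    a, b, power = avar_terms(alpha, theta_half)
    print(f"alpha={alpha:.2f} power={power:.2f} first/second coefficient={a / b:.2f}")

# Effect of enlarging the null sample while holding N1 fixed (hypothetical sizes).
theta = STD_NORMAL.inv_cdf(0.95)
print(avar(0.05, theta, 1000, 1000), avar(0.05, theta, 10000, 1000))
```

In this setting the first coefficient grows relative to the second as $\alpha$ decreases, and increasing $N_0$ with $N_1$ fixed shrinks only the first term of (5).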

For the variance, conditioning on the null sample in the same way gives

$$\begin{aligned}
\operatorname{Var}(\hat p_1^*) &= \operatorname{Var}\left[ E(\hat p_1^* \mid T_{01}, \ldots, T_{0N_0}) \right] + E\left[ \operatorname{Var}(\hat p_1^* \mid T_{01}, \ldots, T_{0N_0}) \right] \\
&= \operatorname{Var}[1 - F_1(\hat C_\alpha)] + \frac{E\{ F_1(\hat C_\alpha)[1 - F_1(\hat C_\alpha)] \}}{N_1} \\
&= \frac{f_1^2(C_\alpha^*)}{f_0^2(C_\alpha^*)} \, \frac{\alpha(1 - \alpha)}{N_0} + O(N_0^{-2}) + \frac{F_1(C_\alpha^*)[1 - F_1(C_\alpha^*)]}{N_1} + O(N_1^{-2}) \\
&= \mathrm{AVAR} + O(N_0^{-2}) + O(N_1^{-2}).
\end{aligned}$$

These latter results actually require further assumptions on $f_0$ and $f_1$ but show more clearly the error of approximation. In particular we see that the square of the bias is of lower order than the terms of AVAR, and thus we can concentrate attention on AVAR.

Note first that AVAR has two components: the first depends on the squared ratio of the densities at $C_\alpha^*$ divided by $N_0$, and the second is the usual binomial variance for when $C_\alpha^*$ is known instead of estimated. To see the relative importance of the two terms of AVAR, let AVAR $= A/N_0 + B/N_1$. Figure 1 plots the ratio $A/B$ versus power for a one-sided test of a normal mean with known variance for three values of $\alpha$: .10, .05, and .01. The results are in ascending order, with the maximum of the ratio equal to 1.86 at $\alpha = .10$, 2.84 at $\alpha = .05$, and 8.87 at $\alpha = .01$. The calculations in Figure 1 are simple: at $\alpha = .05$, $f_1(C_\alpha^*) = \phi(1.645 - \theta)$, $f_0(C_\alpha^*) = \phi(1.645)$, and power$(\theta) = 1 - F_1(C_\alpha^*) = 1 - \Phi(1.645 - \theta)$, where $\phi$ and $\Phi$ are the standard normal density and distribution functions, respectively, and $\theta$ is the alternative. Similar computations were also carried out for chi-squared type tests. Surprisingly, results very similar to Figure 1 hold true for any test which has a non-central chi-squared power function. Thus for such cases it would appear that the variance of $\hat p_1^*$ is about three times the variance of $\hat p_1$ when $\alpha$ is near .05 and $N_0 = N_1$, and about nine times as much when $\alpha$ is near .01.

Figure 1: Ratio of terms in AVAR (vertical axis: Ratio; horizontal axis: Power) for one-sided mean test with normal data and known variance. In ascending order the curves are for $\alpha = .10$, .05, and .01.

If this analysis is made after the experiment is completed, then not much else can be said. However, if one knows before the experiment that adjusted power will be needed (or equivalently that Monte Carlo critical values will be required), then we might try to optimize the choice of sample sizes $N_0, \ldots, N_k$, at least if the value of $\alpha$ is specified (and using the variance ratios suggested by Figure 1). Rather than give formal procedures, we merely suggest that $N_0$ be chosen considerably larger than $N_1, \ldots, N_k$. For example, if $k > 1$ a rough rule of thumb might be $N_0 = 10 N_1$, $N_1 = N_2 = \cdots = N_k$.

4 TESTS FOR EQUALITY OF TWO POWER FUNCTIONS

Sometimes it is of interest to provide a formal statistical test of whether two power functions are equal. Consider first the results in Table I. Suppose that one wanted to compare the estimates .24 and .20 at $\theta = 1.5$ (ignoring for the moment that the true levels are not equal). If the individual test results are available for each of the $N_1$ Monte Carlo replications, then one would form the two-by-two table

                          Test 2
                   Reject    Accept    Total
Test 1   Reject      a         b       a + b
         Accept      c         d       c + d
         Total     a + c     b + d      N1

and use McNemar's test to test for equal power (see, e.g., Agresti, 1990, p. 350). The approximate normal statistic typically used is

$$Z = \frac{b - c}{\sqrt{b + c}}. \tag{6}$$

An exact test may be obtained by noting that $b$ conditioned on $b + c$ is binomially distributed with $b + c$ trials and $p = 1/2$. McNemar's test is required here because the rejection results for Test 1 and Test 2 are paired, since the statistics are both computed on the same Monte Carlo samples.

Now suppose that we want to use McNemar's test to compare the adjusted power estimates. For notation we let $\hat a, \hat b, \hat c, \hat d$ be the counts in the 2 x 2 table when Monte Carlo estimated critical values are used, and $\hat Z = (\hat b - \hat c)/\sqrt{\hat b + \hat c}$. Using expansions similar to (4) we can show that

$$\hat Z \xrightarrow{d} N(0, 1 + V)$$

as $N_0 \to \infty$ and $N_1 \to \infty$ with $N_1/N_0 \to \lambda$, $0 \le \lambda < \infty$, where $V$ is a positive constant depending on $\lambda$ and on the joint density of the test statistics under both null and alternative hypotheses. This result tells us that we need to choose $N_0$ large relative to $N_1$ in order to justify the use of $N(0, 1)$ critical values with $\hat Z$. Of course the magnitude of $V$ is also important but hard to characterize in general because of the way that it depends on the joint densities of the test statistics.
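As an illustration of statistic (6) and the exact conditional test, the sketch below computes both from paired rejection indicators. The function names are ours, and the counts are hypothetical: they are chosen so that, with $N_1 = 100$, the marginal rejection rates are .24 and .20. The normal p-value uses $N(0, 1)$ critical values, which for adjusted estimates is justified only when $N_0$ is large relative to $N_1$, as discussed above.

```python
import math
from statistics import NormalDist


def discordant_counts(reject_1, reject_2):
    """Return (b, c): b = only Test 1 rejects, c = only Test 2 rejects."""
    b = sum(r1 and not r2 for r1, r2 in zip(reject_1, reject_2))
    c = sum(r2 and not r1 for r1, r2 in zip(reject_1, reject_2))
    return b, c


def mcnemar_z(b, c):
    """Statistic (6) and its two-sided p-value from the N(0, 1) approximation."""
    if b + c == 0:
        raise ValueError("no discordant pairs: the statistic is undefined")
    z = (b - c) / math.sqrt(b + c)
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


def mcnemar_exact(b, c):
    """Exact two-sided p-value using b | b + c ~ Binomial(b + c, 1/2)."""
    n, k = b + c, min(b, c)
    lower_tail = sum(math.comb(n, j) for j in range(k + 1)) / 2**n
    return min(1.0, 2 * lower_tail)


# Hypothetical paired outcomes for N1 = 100 replications: a = 15, b = 9, c = 5, d = 71,
# giving marginal rejection rates (a + b) / 100 = .24 and (a + c) / 100 = .20.
reject_1 = [True] * 15 + [True] * 9 + [False] * 5 + [False] * 71
reject_2 = [True] * 15 + [False] * 9 + [True] * 5 + [False] * 71

b, c = discordant_counts(reject_1, reject_2)
z, p_normal = mcnemar_z(b, c)
p_exact = mcnemar_exact(b, c)
print(f"b={b}, c={c}, Z={z:.3f}, normal p={p_normal:.3f}, exact p={p_exact:.3f}")
```

Only the discordant pairs $b$ and $c$ enter either test, which is why the per-replication rejection indicators, not just the two marginal rejection rates, must be saved.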

BIBLIOGRAPHY

Agresti, A. (1990). Categorical Data Analysis. New York: John Wiley.

Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. New York: John Wiley.