A SAS/AF Application For Sample Size And Power Determination

Fiona Portwood, Software Product Services Ltd.

Abstract

When planning a study, such as a clinical trial or toxicology experiment, the choice of sample size (i.e. the number of patients or animals to use in the study) is critical to ensuring that the desired conclusions of efficacy or equivalence of the test treatment can be successfully demonstrated at the end of the study. Since there are no SAS procedures for performing these calculations, the task of tracking down the necessary background information and the correct statistical formulae can become an obstacle, and researchers may be tempted to guess at an appropriate sample size. The author has developed an application using SAS/AF software which places the various statistical formulae for sample size selection and power determination behind a user-friendly, point-and-click graphical interface. The application allows the user to estimate the sample size, power, and minimum detectable difference between treatments for various statistical hypotheses, including tests of equivalence and tests of differences between means and proportions. Results are given in both graphical and tabular formats. This paper gives an introduction to the statistical concepts behind sample size and power determination, together with a demonstration of the SAS/AF application.

Introduction

When planning a study or experiment, for example a clinical trial, a toxicology study or a preclinical research experiment, one of the first questions a statistician is often asked is, "How many patients/animals should be used in the study?". The choice of sample size is a critical part of the study in order to ensure that the desired conclusions of efficacy or equivalence of the test treatment are successfully demonstrated at the end of the study using the appropriate statistical methodologies.
The task of tracking down the necessary background information and the correct statistical formulae for determining the sample size can be difficult, and thus researchers may be tempted to use their best guess at an appropriate sample size. The number of patients or animals that should be used in a study will depend on a number of factors, including:

- the purpose of the study (for example, whether you wish to test for a difference or an equivalence between treatments);
- the study design (for example, a cross-over design or a parallel group design);
- the variance of the underlying population;
- the size of the difference between treatments.
Definitions

The following statistical terms will be used frequently in this paper:

1. Null Hypothesis. This is a basic assumption or theory about the data which we assume to be true in the absence of any other information. The alternative hypothesis is the converse of the null hypothesis. During any study or experiment, we hope to collect enough information to provide evidence against the null hypothesis and for the alternative hypothesis. For example, suppose we wish to carry out a study to test for a difference between a control and a test treatment. The null hypothesis would be that the population means for the control and test treatments are equal, and the alternative hypothesis would be that the population means are not equal.

2. Significance Level. For tests of difference between treatments, this is the probability of incorrectly rejecting the null hypothesis, i.e. the probability of rejecting the null hypothesis of no difference between the treatments when in reality there is truly no difference. We usually want this probability to be small, and it is commonly set to 0.05 (or 5%).

3. Power. This is the probability of correctly rejecting the null hypothesis, i.e. the probability of rejecting the null hypothesis when the alternative hypothesis is true. Ideally we would like the power to be as large as possible.

Calculation of Sample Size and Power

There are various statistical formulae, depending on the null hypothesis of the study, which define the relationship between the sample size, significance level, power, population variance (or an estimate of it) and the size of the detectable difference between treatments. Consider the simplest example, where we wish to perform a parallel group study to investigate the difference between a control and a test treatment.
Suppose that the control population has a Normal distribution with mean µ1 and variance σ², and the test population has a Normal distribution with mean µ2 but the same variance σ². The hypotheses are as follows:

  Null Hypothesis:        H0: µ1 - µ2 = 0  (no difference between treatments)
  Alternative Hypothesis: H1: µ1 - µ2 ≠ 0  (there is a difference between treatments)

Suppose a sample of n1 observations is taken from the control population, and a sample of n2 observations is taken from the test population. The significance level is 100α% and the probability of incorrectly accepting the null hypothesis is β (so the power is 100(1-β)%).
Suppose d = µ1 - µ2 is the true difference between the population means, and d̂ is the estimated difference. Then, using a two-sample t-test to compare the sample means, we will incorrectly accept the null hypothesis (with probability β) if the test statistic is less than the critical value at the 100(1-α/2)% level (since this is a 2-tailed test). So

  Prob( d̂ / (σ √(1/n1 + 1/n2)) < Z_{1-α/2} | d ≠ 0 ) = β

i.e.

  Prob( Z < Z_{1-α/2} - d / (σ √(1/n1 + 1/n2)) ) = β,   where Z ~ N(0,1)

so that

  d² / (σ² (1/n1 + 1/n2)) = (Z_β - Z_{1-α/2})² = (Z_{α/2} + Z_β)²,   since Z_{α/2} = -Z_{1-α/2}

Therefore

  n1 = (Z_{α/2} + Z_β)² σ² (1 + 1/R) / d²

where R = n2/n1 is the ratio of the second sample size to the first.

In general, suppose S is a sample estimate of f(µ), Normally distributed with mean f(µ) and variance V, and that the two samples have equal variances. Then for a test to measure a difference d with power 100(1-β)% at the 100α% significance level (or with a 100(1-α)% confidence coefficient for tests of equivalence), we have the relationship:

  V = d² / (g(α,β))²

The variance V of the statistic S will depend on the sample size n, and so expressing V as v/n gives:

  n = (g(α,β))² v / d²            (1)

g(α,β) will depend on the nature of the test, and v will depend on the model parameter f(µ) which is being tested. The tables below give the values of (g(α,β))² and v respectively for a selection of common tests:

  Type of Test            (g(α,β))²
  1-tailed difference     (Z_α + Z_β)²
  1-sided equivalence     (Z_α + Z_β)²
  2-tailed difference     (Z_{α/2} + Z_β)²
  2-sided equivalence     (Z_{α/2} + Z_{β/2})²

  f(µ)                             v
  Single mean                      σ²
  Difference of two means          (1 + 1/R) σ²
  Difference of two proportions    p1(1-p1) + p2(1-p2)/R

Equation (1) can also be rearranged to solve for d and/or β.
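To make equation (1) concrete, the sketch below evaluates it in Python for a 2-tailed test of the difference of two means using the Normal approximation. This is an illustration only, not code from the SAS/AF application, and the function names are invented for the example:

```python
import math
from statistics import NormalDist  # standard-library Normal quantiles

def sample_size_two_means(alpha, beta, variance, d, R=1.0):
    """Equation (1): n1 = (g(alpha,beta))^2 * v / d^2, with
    (g)^2 = (Z_{alpha/2} + Z_beta)^2 for a 2-tailed difference test
    and v = (1 + 1/R) * sigma^2 for a difference of two means."""
    z = NormalDist().inv_cdf
    g2 = (z(1 - alpha / 2) + z(1 - beta)) ** 2
    v = (1 + 1 / R) * variance
    return g2 * v / d ** 2          # round up to the next integer in practice

def power_two_means(alpha, n1, variance, d, R=1.0):
    """Equation (1) rearranged for beta: Z_beta = d / sqrt(v/n1) - Z_{1-alpha/2}."""
    z = NormalDist().inv_cdf
    v = (1 + 1 / R) * variance
    return NormalDist().cdf(d / math.sqrt(v / n1) - z(1 - alpha / 2))

# 5% significance, 80% power, estimated variance 9, detectable difference 2
n1 = sample_size_two_means(0.05, 0.20, variance=9, d=2)
print(math.ceil(n1))                              # → 36 per group
print(round(power_two_means(0.05, 36, 9, 2), 3))  # → 0.807 with n1 = 36
```

Rounding n1 up to the next whole observation per group is what makes the achieved power (here about 81%) slightly exceed the nominal 80%.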
There is currently no standard SAS procedure for performing these calculations, and so a statistician or researcher might look up these formulae in a statistical textbook and then perform some hand calculations, or perhaps write some DATA step code. To overcome these difficulties, the author has written an application using SAS/AF which places the various options and statistical formulae behind an easy-to-use graphical user interface.
Tests of Proportions

For tests involving proportions, the accuracy of the Normal approximation improves as the sample size increases. For accuracy at small sample sizes, however, Fisher's Exact Test should be used. Computing sample sizes for this method involves an iterative procedure, and the application uses an approximation due to Casagrande, Pike and Smith, 1980 (Ref. [1]). Note that estimates of p1 and p2, the proportions in samples 1 and 2 respectively, are needed to estimate the variance of the statistic. In practice, we assume that p1 = p2 for equivalence tests, and that p2 differs from p1 by an amount d for difference tests. The exact values of p1 and p2 make little difference to the results except for values very close to 0 or 1; in general the difference between the proportions, p1 - p2, is much more important.

Estimating Variance

Usually the variance is not known in advance, and it is therefore estimated from the sample. For large sample sizes the Normal approximation is adequate, but for small samples we should be using the t distribution. However, in order to use the t distribution, we need to know the degrees of freedom (df) in terms of the sample sizes which we are trying to calculate. For example, for a 2-tailed test of difference with estimated variance s², equation (1) becomes:

  n = (t_{df,α/2} + t_{df,β})² s² (1 + 1/R) / d²

where the degrees of freedom df are a function of the sample sizes (for example, df = n1 + n2 - 2 for a parallel group design), and hence of the n we are trying to calculate. If the sample size n is known, it is as straightforward as before to calculate the power or the difference given the other values. However, calculating n requires an iterative procedure, since the degrees of freedom of the t-values on the right-hand side of the equation vary with n. An initial estimate of n is obtained using the Normal approximation; this is used to determine a value for the degrees of freedom, and hence a new value of n. This sequence is iterated until convergence is obtained. For further details, see Dupont & Plummer, 1990 (Ref. [2]).

The Application

The sample size and power determination application combines the various formulae for sample size estimation behind a point-and-click graphical user interface, designed to be as easy to use as possible. By default, the application calculates sample sizes for a test of differences between two treatment means using a 2-tailed test at the 5% significance level, assuming the sample sizes of the two treatment groups are equal. For these defaults, the user simply needs to enter an estimate of the population variance. The results are available in both graphical and tabular formats; the user selects the format they require, together with the attribute (power, minimum detectable difference or sample size) to be used for the curves of the graph or the cells of the table. Once the desired options have been selected, the user clicks on the Run button to produce the results.
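The iterative t-based calculation described above (a Normal-approximation starting value, then repeated t-based updates until convergence) can be sketched as follows. This is an illustration rather than the application's own code; it assumes a parallel group design (df = n1 + n2 - 2) and uses SciPy for the t quantile function:

```python
import math
from statistics import NormalDist
from scipy.stats import t   # t.ppf is the t-distribution quantile function

def sample_size_t(alpha, beta, s2, d, R=1.0, tol=1e-6, max_iter=50):
    """Iteratively solve n1 = (t_{df,alpha/2} + t_{df,beta})^2 * s2 * (1 + 1/R) / d^2,
    where df = n1 + n2 - 2 = n1*(1 + R) - 2 depends on the unknown n1
    (parallel group design with n2 = R * n1)."""
    z = NormalDist().inv_cdf
    factor = s2 * (1 + 1 / R) / d ** 2
    n = (z(1 - alpha / 2) + z(1 - beta)) ** 2 * factor   # Normal-approximation start
    for _ in range(max_iter):
        df = n * (1 + R) - 2                             # df implied by current n
        n_new = (t.ppf(1 - alpha / 2, df) + t.ppf(1 - beta, df)) ** 2 * factor
        if abs(n_new - n) < tol:                         # iterate until convergence
            break
        n = n_new
    return n_new

# Estimated variance 9, 5% significance, 80% power, detectable difference 2
n1 = sample_size_t(0.05, 0.20, s2=9, d=2)
print(math.ceil(n1))   # → 37, slightly above the Normal approximation of 36
```

For this example the t correction adds roughly one observation per group compared with the Normal approximation, which is typical for moderate sample sizes.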
Figure 1: Main Screen with Graphical Output

Figure 1 above shows the main screen of the sample size and power determination application, with graphical output for the default test options when the estimated population variance is 9. The graph shows sample size plotted against minimum detectable difference, and the curves show points of constant power. Options are available for printing, editing, or saving the graph to a graphics file. The same information can also be presented in a tabular format, as shown in Figure 2 below; this table can be printed or saved to a SAS data set.

Figure 2: Main Screen with Tabular Output
Figure 3: Sample Size Estimation Options

Figure 3 shows the options available for performing sample size estimation. These options are by no means exhaustive, but they give an idea of what is possible using SAS/AF software. The test type (hypothesis), the significance level, and the ratio of the sample sizes in the two groups can all be modified. Either the Normal or the t approximation can be selected for use in the sample size calculations, and for tests of proportions the Casagrande-Pike-Smith approximation is also available. If the t approximation is selected, the user is asked to specify a formula for the degrees of freedom in terms of the sample sizes (m and n) of the two groups. For example, for a parallel group study design the degrees of freedom are (m + n - 2), and for a cross-over study design they are (m - 2). Options for customising the graphical output are shown in Figure 4: the axis and curve variables can be selected, and the range of the axes, the number of tick marks, and the number and values of the curves plotted can be modified. Figure 5 shows the options available for customising the tabular output: the row, column and cell variables can be selected, and the user can specify whether a continuous range of values or specific user-supplied values are used to define the rows and columns of the table.
Figure 4: Graphical Output Options

Figure 5: Tabular Output Options
Conclusion

SAS/AF software provides excellent facilities for combining the various formulae for sample size estimation behind an easy-to-use, point-and-click graphical user interface. Use of this application greatly simplifies the task of sample size estimation and power determination.

References

[1] Casagrande, Pike & Smith, 1980, "A Simple Approximation for Calculating Sample Sizes for Comparing Independent Proportions", Biometrics, Vol. 36, pp. 343-346.
[2] Dupont & Plummer, 1990, "Power and Sample Size Calculations", Controlled Clinical Trials, Vol. 11, pp. 116-128.

Further Information

For further information, please contact:
Fiona Portwood, Software Product Services Ltd., Regent House, 19-20 The Broadway, Woking, Surrey, GU21 5AP, UK.
Tel: +44 (0) 1483 730771
Fax: +44 (0) 1483 727417
Email: software@spsuk.demon.co.uk

SAS and SAS/AF are registered trademarks of SAS Institute Inc., Cary, NC, USA.