BIOS 312: Precision of Statistical Inference


Chris Slaughter
Department of Biostatistics, Vanderbilt University School of Medicine
January 3, 2013

Outline
1 Overview
2 Sampling distributions and confidence intervals
3 Hypothesis testing
4 Precision and power/sample size
5 Precision and standard errors
6 Summary

Bias and precision

The goal of statistical inference is to estimate parameters accurately (unbiased) and with high precision.

Measures of precision:
- Standard error (not standard deviation)
- Width of confidence intervals
- Power (equivalently, type II error rate)

Summary measures

Scientific hypotheses are typically refined into statistical hypotheses by identifying some parameter, θ, measuring differences in the distribution of the response variable. Often we are interested in whether θ differs across levels of a categorical (e.g., treatment/control) or continuous (e.g., age) predictor variable.

θ could be any summary measure, such as:
- Difference/ratio of means
- Difference/ratio of medians
- Ratio of geometric means
- Difference/ratio of proportions
- Odds ratio, relative risk, risk difference
- Hazard ratio

Choosing a summary measure

How to select θ? In order of importance:
1. Scientific (clinical) importance. May be based on the current state of knowledge.
2. Is θ likely to vary across the predictor of interest? This affects the ability to detect a difference, if one exists.
3. Statistical precision. Only relevant if all other factors are equal.

Statistical inference

Statistics is concerned with making inference about population parameters (θ) based on a sample of data.
- Frequentist estimation includes both point estimates (θ̂) and interval estimates (confidence intervals).
- Bayesian analysis estimates the posterior distribution of θ given the sampled data, p(θ | data). The posterior distribution can then be summarized by quantities like the posterior mean and a 95% credible interval.
- Likelihood analysis focuses on using the likelihood function to obtain maximum likelihood estimates. The likelihood function can be used directly to obtain upper and lower confidence-type intervals for estimates.

Example

Consider the following results from 5 clinical trials of three drugs (A, B, C) designed to lower cholesterol relative to baseline. Assume a 10 unit drop in cholesterol (relative to baseline) is clinically meaningful.

Trial  Drug  Pts   Mean diff  Std dev  Std error  95% CI for diff  p-value
1      A     30    -30        191.7    49.5       [-129, 69]       0.55
2      A     1000  -30        223.6    10         [-49.6, -10.4]   0.002
3      B     40    -20        147.6    33         [-85, 45]        0.55
4      B     4000  -2         147.6    3.3        [-8.5, 4.5]      0.54
5      C     5000  -6         100.0    2          [-9.9, -2.1]     0.002

Which drug is effective at reducing cholesterol? Why is study 4 more informative than study 3 (even though the p-values are similar)?

Moral: Hypothesis tests and p-values are often insufficient for making proper decisions. The confidence interval provides more useful information.

Sampling distribution defined

The sampling distribution is the probability distribution of a statistic; e.g., the sampling distribution of the sample mean is N(µ, σ²/n).

Most often we choose estimators that are asymptotically Normally distributed: for large n, θ̂ ~ N(θ, V/n).
- θ̂ is our estimate of θ; the hat indicates it is an estimate.
- Mean: θ
- Variance: V/n, where V is related to the average amount of statistical information available from each observation. Often V depends on θ.
- How large n must be depends on the distribution of the underlying data. If n is large enough, approximate Normality of θ̂ will hold.

Confidence intervals when n is large

Calculating a 100(1 − α)% confidence interval (θ_L, θ_U) using approximate Normality:

θ_L = θ̂ − z_{1−α/2} √(V/n)
θ_U = θ̂ + z_{1−α/2} √(V/n)

that is, (estimate) ± (critical value) × (std err of estimate).

We can similarly calculate approximate two-sided p-values using

Z = (estimate − hypothesized value) / (std err of estimate)

p-value in Stata: 2 * normprob(-abs(Z))
p-value in R: 2 * pnorm(-abs(Z)), using the pnorm() function
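As a concrete illustration, here is a minimal R sketch of these calculations; the estimate, standard error, and null value are hypothetical numbers, not taken from the slides.

# Large-sample (Wald) confidence interval and two-sided p-value
# from an estimate and its standard error (all inputs hypothetical)
estimate <- -30     # e.g., an observed mean change
se       <- 10      # standard error of the estimate, sqrt(V/n)
null.val <- 0       # hypothesized value under H0
alpha    <- 0.05

crit <- qnorm(1 - alpha/2)              # critical value, approximately 1.96
estimate + c(-1, 1) * crit * se         # (theta_L, theta_U)
z <- (estimate - null.val) / se         # Wald Z statistic
2 * pnorm(-abs(z))                      # approximate two-sided p-value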

Comparing independent estimates

If the estimates are independent and Normally distributed,

θ̂1 ~ N(θ1, se1²) and θ̂2 ~ N(θ2, se2²),

then
θ̂1 − θ̂2 ~ N(θ1 − θ2, se1² + se2²)
θ̂1 + θ̂2 ~ N(θ1 + θ2, se1² + se2²)
θ̂1 / θ̂2 ≈ N(θ1/θ2, se1²/θ2² + θ1² se2²/θ2⁴)   (by the delta method)

Comparing correlated estimates

If the estimates are correlated and Normally distributed,

θ̂1 ~ N(θ1, se1²), θ̂2 ~ N(θ2, se2²), ρ = corr(θ̂1, θ̂2),

then
θ̂1 − θ̂2 ~ N(θ1 − θ2, se1² + se2² − 2ρ se1 se2)
θ̂1 + θ̂2 ~ N(θ1 + θ2, se1² + se2² + 2ρ se1 se2)

Example: comparing results from the same study
- The paper may not report the comparison that is interesting from your point of view
- The comparison can be difficult because the correlation is usually not reported
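A small R sketch of these formulas, with hypothetical estimates and standard errors; the correlation rho is an assumed value, since it is rarely reported.

# 95% CI for the difference of two Normally distributed estimates,
# with and without accounting for their correlation (all inputs hypothetical)
theta1 <- -30; se1 <- 10
theta2 <- -20; se2 <- 12
rho    <- 0.4                                        # assumed correlation

diff    <- theta1 - theta2
se.ind  <- sqrt(se1^2 + se2^2)                       # if independent
se.corr <- sqrt(se1^2 + se2^2 - 2 * rho * se1 * se2) # if correlated
diff + c(-1, 1) * qnorm(0.975) * se.ind
diff + c(-1, 1) * qnorm(0.975) * se.corr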

Classical hypothesis testing

Classical hypothesis testing is stated in terms of the null hypothesis (H0); the alternative hypothesis (H1) is the complement of H0.
- Two-sided: H0: θ = θ0 vs H1: θ ≠ θ0
- One-sided: H0: θ ≥ θ0 vs H1: θ < θ0
- One-sided: H0: θ ≤ θ0 vs H1: θ > θ0

Inference is based on either rejecting or failing to reject the null hypothesis. Typically, the null hypothesis is stated so as to indicate no association.

Classical hypothesis testing: thought process
- Assume H0 is true
- Conceive of the data as one of many datasets that might have happened
- See whether the data are consistent with H0: are the data extreme or unlikely if H0 is really true?
- Proof by contradiction: if assuming H0 is true leads to results that are bizarre or unlikely to have been observed, this casts doubt on the premise
- Evidence is summarized through a single statistic capturing a tendency of the data, e.g., x̄
- Look at the probability of obtaining a statistic as or more extreme than the calculated one (results as or more impressive than ours) if H0 is true: the p-value

Classical hypothesis testing: thought process (cont.)
- If the statistic has a low probability of being observed to be this extreme, then, if H0 is true, we have acquired data that are very improbable, i.e., we have witnessed a low-probability event
- Evidence then mounts against H0 and we might reject it
- A failure to reject does not imply that we have gathered evidence in favor of H0; there are many reasons for a study not to be impressive, including small sample size (n)

Key limitation: classical hypothesis testing ignores clinical significance. An approach that allows us to make informed decisions is preferable.

Decision theoretic approach

Stated in terms of the null hypothesis and a suitably chosen design alternative. Summarize the design alternative through θ1 (θ1 > 0):
- Two-sided: H0: θ = θ0 vs H1: θ ≤ −θ1 or θ ≥ θ1
- One-sided: H0: θ ≤ θ0 vs H1: θ ≥ θ1
- One-sided: H0: θ ≥ θ0 vs H1: θ ≤ −θ1

Using the decision theoretic approach, we can conclude:
- Reject the null hypothesis: the data are atypical of what we would expect if the null hypothesis were true
- Reject the alternative hypothesis: the data are atypical of what we would expect if the alternative hypothesis were true

Decision theoretic approach (cont.)

Key difference from the classical approach: the design alternative (θ1) is ideally chosen to be the minimal important difference to detect, based on scientific or clinical criteria.
- Clinical significance: in the cholesterol example, the important difference was assumed to be 10 mg/dl
- Economic impact: a new drug is not marketable unless it has a large effect
- Feasibility of study: limited availability of subjects may restrict investigators to searching for interventions with a large impact

Recall the cholesterol example: studies 2, 4, and 5 follow the decision theoretic approach because they allow us to discriminate between scientifically meaningful hypotheses.

Measures of high precision

What are the measures of (high) precision?
- Estimators are less variable across studies, which is most often measured by a smaller standard error
- Narrower confidence intervals: estimators are consistent with fewer hypotheses when the CIs are narrow
- Ability to reject false hypotheses: the Z statistic tends to be larger when the alternative hypothesis is true

Translation into sample size
- Based on the width of the confidence interval: choose a sample size such that a 95% CI will not contain both the null and the design alternative. If θ0 and θ1 cannot both be in the CI, we have discriminated between those hypotheses.
- Based on statistical power: when the alternative is true, have a high probability of rejecting the null; in other words, keep the type II error rate low.

Statistical power: quick review

Power is the probability of rejecting the null hypothesis when the alternative is true: Pr(reject H0 | θ = θ1).

Most often θ̂ ~ N(θ, V/n), so that the test statistic Z = (θ̂ − θ0)/√(V/n) follows a Normal distribution.
- Under H0, Z ~ N(0, 1), so we reject H0 if |Z| > z_{1−α/2}
- Under H1, Z ~ N((θ1 − θ0)/√(V/n), 1)

Power curves
- The power function (power curve) is a function of the true value of θ
- We can compute power for every value of θ
- As θ moves away from θ0, power increases (for two-sided alternatives)
- For any choice of desired power, there is always some θ such that the study has that power
- Pwr(θ0) = α, the type I error rate
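Here is a minimal R sketch of a two-sided power calculation based directly on the Normal approximation above; the effect size, variance, and sample size are illustrative assumptions.

# Two-sided power under the Normal approximation:
# Z ~ N((theta1 - theta0)/sqrt(V/n), 1) under H1 (illustrative inputs)
theta0 <- 0
theta1 <- 10          # design alternative
V      <- 40^2        # per-observation variance
n      <- 200
alpha  <- 0.05

ncp  <- (theta1 - theta0) / sqrt(V / n)   # mean of Z under H1
crit <- qnorm(1 - alpha/2)
pnorm(ncp - crit) + pnorm(-ncp - crit)    # Pr(|Z| > crit) when Z ~ N(ncp, 1)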

[Figure: power curves for a two-sample, equal-variance t-test with n = 100 per group; power plotted against the true difference in means (theta), with curves for σ = 1 and σ = 1.2.]

Code for generating the example power curves:

# Power of a two-sample t-test (n = 100 per group) across a range of true
# differences, for two values of the common standard deviation
mydiffs  <- seq(-0.8, 0.8, 0.05)
mypower  <- vector("numeric", length(mydiffs))
mypower2 <- vector("numeric", length(mydiffs))
for (i in 1:length(mydiffs)) {
  mypower[i]  <- power.t.test(n = 100, sd = 1,   delta = mydiffs[i])$power
  mypower2[i] <- power.t.test(n = 100, sd = 1.2, delta = mydiffs[i])$power
}
plot(mydiffs, mypower, xlab = "True difference in means (theta)",
     ylab = "Power", type = "l", main = "")
lines(mydiffs, mypower2, lty = 2)
legend("top", c(expression(sigma == 1), expression(sigma == 1.2)),
       lty = 1:2, inset = 0.05)

Precision and standard errors

Standard errors are the key to precision:
- Greater precision is achieved with smaller standard errors
- Standard errors are decreased by either decreasing V or increasing n

Typically:
se(θ̂) = √(V/n)
Width of CI: 2 × (critical value) × se(θ̂)
Test statistic: Z = (θ̂ − θ0) / se(θ̂)

Example: one-sample mean

Observations are independent and identically distributed (iid):

Y_i ~ iid (µ, σ²), i = 1, ..., n
θ = µ,  θ̂ = (1/n) Σ_{i=1}^n Y_i = Ȳ
V = σ²,  se(θ̂) = √(σ²/n)

Notes:
- We are not assuming a specific distribution for Y_i, just that the distribution has a mean and a variance
- We are assuming that n is large, so that asymptotic results are applicable
- The distribution of Y_i could then be binary, Poisson, exponential, normal, etc., and the results will hold

There are ways to decrease V, including:
- Restrict the sample by age, gender, etc.
- Take repeated measures on each subject, summarize, and perform the test on the summary measures
- Better ideas (this course): adjust for age and gender; use all the data while modeling the correlation
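A quick R simulation sketch of the distribution-free point above, using exponential (clearly non-Normal) data; the sample size and rate parameter are arbitrary choices.

# Sampling distribution of the mean for non-Normal (exponential) data:
# the empirical standard error should match sigma/sqrt(n), and the
# distribution of the sample mean should look approximately Normal
set.seed(1)
n    <- 200
rate <- 0.5                          # exponential with mean 2 and sd 2
xbar <- replicate(5000, mean(rexp(n, rate = rate)))

sd(xbar)                             # empirical standard error of the mean
(1 / rate) / sqrt(n)                 # theoretical se = sigma / sqrt(n)
qqnorm(xbar); qqline(xbar)           # check approximate Normality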

Example: two-sample mean

Difference of independent means. Observations are no longer identically distributed, just independent: group 1 has a different mean and variance than group 2.

Y_ij ~ ind (µ_j, σ_j²), j = 1, 2; i = 1, ..., n_j
n = n1 + n2;  r = n1/n2
θ = µ1 − µ2,  θ̂ = Ȳ1 − Ȳ2
V = (r + 1)(σ1²/r + σ2²)
se(θ̂) = √(V/n) = √(σ1²/n1 + σ2²/n2)

Comments on the optimal ratio of sample sizes (r)

If we are constrained by the maximal sample size n = n1 + n2:
- V is smallest when r = n1/n2 = σ1/σ2
- In other words, V is smaller if we sample more subjects from the more variable group

If we are unconstrained by the maximal sample size, there is a point of diminishing returns:
- Example: a case-control study where finding cases is difficult/expensive but finding controls is easy/cheap
- A ratio of r = 5 is often quoted

[Figure: standard error versus sample size ratio r = n1/n2 for a fixed total sample size n1 + n2, with curves for s1 = s2, s1 = 2*s2, and s1 = 3*s2; the minimum standard error occurs at r = s1/s2, marked at r = 1, 2, and 3.]

[Figure: standard error versus sample size ratio r = n1/n2 when the total sample size is allowed to grow with r, with curves for s1 = s2, s1 = 2*s2, and s1 = 3*s2; there are diminishing returns for r > 5.]

Code for the optimal sample size ratio for fixed total sample size:

# Variance factor for the difference in means as a function of the allocation ratio r
var.fn <- function(r, s1, s2) {
  (r + 1) * (s1^2 / r + s2^2)
}
n  <- 100
s2 <- 10
# Standard error vs. r for fixed total n, for three ratios of s1 to s2
plot(function(r) sqrt(var.fn(r, s1 = s2, s2 = s2) / n), 0, 20,
     ylim = c(1, 6), xlim = c(0, 25), ylab = "Standard Error",
     xlab = "Sample Size Ratio r = n1/n2",
     main = "Optimal r for Fixed (n1 + n2): r = s1 / s2")
plot(function(r) sqrt(var.fn(r, s1 = 2 * s2, s2 = s2) / n), 0, 20, add = TRUE, lty = 2)
plot(function(r) sqrt(var.fn(r, s1 = 3 * s2, s2 = s2) / n), 0, 20, add = TRUE, lty = 3)
text(20, 4.7, "s1 = s2", pos = 4)
text(20, 5.1, "s1 = 2*s2", pos = 4)
text(20, 5.5, "s1 = 3*s2", pos = 4)
# Mark the optimal ratios r = s1/s2 on each curve
points(c(1, 2, 3), sqrt(var.fn(c(1, 2, 3), s1 = c(1, 2, 3) * s2, s2 = s2) / n), pch = 2)
text(1, 1.8, "r = 1")
text(2, 2.8, "r = 2")
text(3, 3.8, "r = 3")

Code for diminishing returns with an increasing sample size ratio:

# Standard error vs. r when the total sample size grows with r
# (uses var.fn and s2 from the previous block)
n1 <- 200
plot(function(r) sqrt(var.fn(r, s1 = s2, s2 = s2) / (n1 + r * n1)), 0, 20,
     ylim = c(0.5, 3), xlim = c(0, 25), ylab = "Standard Error",
     xlab = "Sample Size Ratio r = n1/n2",
     main = "Diminishing returns for r > 5")
plot(function(r) sqrt(var.fn(r, s1 = 2 * s2, s2 = s2) / (n1 + r * n1)), 0, 20,
     add = TRUE, lty = 2)
plot(function(r) sqrt(var.fn(r, s1 = 3 * s2, s2 = s2) / (n1 + r * n1)), 0, 20,
     add = TRUE, lty = 3)
text(20, 0.7, "s1 = s2", pos = 4)
text(20, 0.8, "s1 = 2*s2", pos = 4)
text(20, 0.9, "s1 = 3*s2", pos = 4)

Example: paired means

Difference of paired means. No longer iid: group 1 has a different mean and variance than group 2, and observations are paired (correlated).

Y_ij ~ (µ_j, σ_j²), j = 1, 2; i = 1, ..., n
corr(Y_i1, Y_i2) = ρ;  corr(Y_ij, Y_mk) = 0 if i ≠ m
θ = µ1 − µ2,  θ̂ = Ȳ1 − Ȳ2
V = σ1² + σ2² − 2ρσ1σ2
se(θ̂) = √(V/n)

Precision gains are made when matched observations are positively correlated (ρ > 0). This is usually the case, but possible exceptions include:
- Sleep on successive nights
- Intrauterine growth of litter-mates
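A small R sketch contrasting the paired and two-independent-group standard errors; the standard deviations and correlation are illustrative assumptions.

# Standard error of a difference in means: paired vs. independent designs
# (illustrative inputs)
n   <- 50
s1  <- 8; s2 <- 8
rho <- 0.6                                 # assumed within-pair correlation

se.paired <- sqrt((s1^2 + s2^2 - 2 * rho * s1 * s2) / n)
se.indep  <- sqrt(s1^2 / n + s2^2 / n)     # two independent groups of size n
c(paired = se.paired, independent = se.indep)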

Example: clustered data

Clustered data arise in experiments where treatments/interventions are assigned on the basis of households, schools, clinics, cities, etc.

Mean of clustered data:
Y_ij ~ (µ, σ²), i = 1, ..., n; j = 1, ..., m
Up to n clusters, each of which has m subjects
corr(Y_ij, Y_ik) = ρ if j ≠ k;  corr(Y_ij, Y_mk) = 0 if i ≠ m
θ = µ,  θ̂ = (1/(nm)) Σ_{i=1}^n Σ_{j=1}^m Y_ij = Ȳ
V = σ² (1 + (m − 1)ρ) / m
se(θ̂) = √(V/n)

What is V if...
- ρ = 0 (independent)?
- m = 1?
- m is large (e.g., m = 1000) and ρ is 0, 1, or 0.01?

Clustered data (cont.)

With clustered data, even small correlations can be very important to consider. Equal precision is achieved with:

Clusters (n)   m     ρ      Total N
1000           1     0.01   1000
650            2     0.30   1300
550            2     0.10   1100
190            10    0.10   1900
109            10    0.01   1090
20             100   0.01   2000

Always consider practical issues: is it easier/cheaper to collect 1 observation on 1000 different subjects, or 100 observations on 20 different subjects?
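A short R sketch of the design-effect calculation behind this table; it computes how many clusters of size m (and the resulting total N) give the same precision as 1000 independent observations.

# Design effect 1 + (m - 1)*rho: clusters and total N needed to match
# the precision of n.indep independent observations
equal.precision <- function(m, rho, n.indep = 1000) {
  deff <- 1 + (m - 1) * rho                 # variance inflation for clusters of size m
  clusters <- ceiling(n.indep * deff / m)
  c(clusters = clusters, total.N = clusters * m)
}
equal.precision(m = 2,   rho = 0.30)   # about 650 clusters, total N about 1300
equal.precision(m = 10,  rho = 0.10)   # about 190 clusters, total N about 1900
equal.precision(m = 100, rho = 0.01)   # about 20 clusters,  total N about 2000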

Example: independent odds ratios

Binary outcomes:
Y_ij ~ ind B(1, p_j), i = 1, ..., n_j; j = 1, 2
n = n1 + n2;  r = n1/n2
θ = log[ (p1/(1 − p1)) / (p2/(1 − p2)) ];  θ̂ = log[ (p̂1/(1 − p̂1)) / (p̂2/(1 − p̂2)) ]
σ_j² = 1 / (p_j (1 − p_j)) = 1 / (p_j q_j)
V = (r + 1)(σ1²/r + σ2²)
se(θ̂) = √(V/n) = √( 1/(n1 p1 q1) + 1/(n2 p2 q2) )

Notes on maximum precision:
- Maximum precision is achieved when the underlying odds are near 1 (proportions near 0.5)
- If we were instead considering differences in proportions, maximum precision is achieved when the underlying proportions are near 0 or 1
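A brief R sketch of this standard error for the log odds ratio, with hypothetical group sizes and proportions.

# Standard error and 95% CI for the odds ratio from two independent
# binomial samples (hypothetical inputs)
n1 <- 200; p1 <- 0.30
n2 <- 400; p2 <- 0.20

log.or <- log((p1 / (1 - p1)) / (p2 / (1 - p2)))
se     <- sqrt(1 / (n1 * p1 * (1 - p1)) + 1 / (n2 * p2 * (1 - p2)))
c(OR = exp(log.or),
  lower = exp(log.or - qnorm(0.975) * se),
  upper = exp(log.or + qnorm(0.975) * se))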

Example: hazard ratios

Independent censored time-to-event outcomes:
(T_ij, δ_ij), i = 1, ..., n_j; j = 1, 2
n = n1 + n2;  r = n1/n2
θ = log(HR);  θ̂ = β̂ from proportional hazards (PH) regression
V = (r + 1)(1/r + 1) / Pr(δ_ij = 1)
se(θ̂) = √(V/n) = √( (r + 1)(1/r + 1) / d ),  where d is the number of observed events

- In the PH model, statistical information is roughly proportional to d, the number of observed events
- Papers always report the number of events
- Study design must consider how long it will take to observe the events (e.g., deaths), starting from randomization
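A small R sketch of how precision for the log hazard ratio depends on the number of events; the event counts are illustrative, and the detectable hazard ratio uses the design-alternative formula from the summary of this lecture.

# se(log HR) depends (approximately) only on the number of events d and
# the allocation ratio r = n1/n2 (illustrative event counts)
se.loghr <- function(d, r = 1) sqrt((r + 1) * (1/r + 1) / d)

d <- c(50, 100, 200, 400)
cbind(events = d,
      se = se.loghr(d),
      # hazard ratio detectable with about 90% power at two-sided alpha = 0.05
      detectable.HR = exp((qnorm(0.975) + qnorm(0.90)) * se.loghr(d)))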

Example: linear regression

Independent continuous outcomes associated with covariates:
Y_i | X_i ~ ind (β0 + β1 X_i, σ²_{Y|X}), i = 1, ..., n
θ = β1,  θ̂ = β̂1 from least squares regression
V = σ²_{Y|X} / Var(X)
se(θ̂) = √( σ̂²_{Y|X} / (n · Var(X)) ), with σ²_{Y|X} and Var(X) estimated from the data

- Precision tends to increase as the predictor (X) is measured over a wider range
- Precision is also related to the within-group variance σ²_{Y|X}
- What happens to the formulas when X is a binary variable? See the two-sample mean.
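A quick R sketch checking this slope standard-error formula against lm() on simulated data; note that n·Var(X) here is the sum of squares Σ(X − X̄)², i.e., Var(X) uses the n divisor.

# se(beta1.hat) = sqrt(sigma^2_{Y|X} / (n * Var(X))), checked against lm()
# (simulated data; coefficients and error sd are arbitrary)
set.seed(2)
n <- 100
x <- runif(n, 0, 10)
y <- 5 + 0.3 * x + rnorm(n, sd = 2)

fit    <- lm(y ~ x)
sigma2 <- sum(residuals(fit)^2) / (n - 2)          # estimate of sigma^2_{Y|X}
se.formula <- sqrt(sigma2 / sum((x - mean(x))^2))  # n * Var(X) = sum of squares of X
c(formula = se.formula,
  lm = summary(fit)$coefficients["x", "Std. Error"])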

Summary

Options for increasing precision:
- Increase the sample size
- Decrease V
- (Decrease the confidence level)

Criteria for precision:
- Standard error
- Width of confidence intervals
- Statistical power: select a suitable design alternative and the desired power

Summary (cont.)

Sample size calculation: the number of sampling units needed to obtain the desired precision, given
- the level of significance α when θ = θ0
- the power β when θ = θ1
- the variability V within one sampling unit

n = (z_{1−α/2} + z_β)² V / (θ1 − θ0)²

When the sample size is constrained (the usual case), either:
- Compute the power to detect a specified alternative:
  β = Φ( (θ1 − θ0)/√(V/n) − z_{1−α/2} )
  where Φ is the standard Normal cdf (in Stata, use normprob for the Φ function)
- Compute the alternative that can be detected with high power:
  θ1 = θ0 + (z_{1−α/2} + z_β) √(V/n)
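A compact R sketch of these three calculations (required n, power at a fixed n, and the detectable alternative), using the normal approximation with illustrative inputs.

# Sample size, power, and detectable alternative under the normal approximation
# (alpha, power, effect size, and V are illustrative choices)
alpha  <- 0.05
power  <- 0.90
theta0 <- 0
theta1 <- 10            # design alternative (e.g., a 10 mg/dl drop)
V      <- 40^2          # variability within one sampling unit

z.a <- qnorm(1 - alpha/2)
z.b <- qnorm(power)

(z.a + z.b)^2 * V / (theta1 - theta0)^2           # required sample size n

n <- 150                                          # a constrained sample size
pnorm((theta1 - theta0) / sqrt(V / n) - z.a)      # power to detect theta1 with n
theta0 + (z.a + z.b) * sqrt(V / n)                # alternative detectable with n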

General comments

- The required sample size scales with the inverse square of the CI width: to cut the width of the CI in half, you need to quadruple the sample size.
- Positively correlated observations within the same group provide less precision than the same number of independent observations.
- Positively correlated observations across groups (as with paired designs) provide more precision.
- What power should you use? The most popular choices are 80% (too low) or 90%.
- The key is to be able to discriminate between scientifically meaningful hypotheses.