Power and Sample Size (StatPrimer Draft)

Similar documents
Tests for the Odds Ratio in a Matched Case-Control Design with a Quantitative X

Person-Time Data. Incidence. Cumulative Incidence: Example. Cumulative Incidence. Person-Time Data. Person-Time Data

Describing Contingency tables

Part IV Statistics in Epidemiology

Chapter 2: Describing Contingency Tables - I

Sample Size. Vorasith Sornsrivichai, MD., FETP Epidemiology Unit, Faculty of Medicine Prince of Songkla University

Welcome! Webinar Biostatistics: sample size & power. Thursday, April 26, 12:30 1:30 pm (NDT)

Marginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal

TESTS FOR EQUIVALENCE BASED ON ODDS RATIO FOR MATCHED-PAIR DESIGN

The Inclusion-Exclusion Principle

CHAPTER 10 Comparing Two Populations or Groups

Tests for the Odds Ratio in Logistic Regression with One Binary X (Wald Test)

Tests for Two Correlated Proportions in a Matched Case- Control Design

E509A: Principle of Biostatistics. GY Zou

REVIEW: Midterm Exam. Spring 2012

Hypothesis testing. Data to decisions

Harvard University. Rigorous Research in Engineering Education

Assistant Prof. Abed Schokry. Operations and Productions Management. First Semester

Glossary for the Triola Statistics Series

IE 316 Exam 1 Fall 2011

IE 316 Exam 1 Fall 2011

Superiority by a Margin Tests for One Proportion

Chapter 9 Regression. 9.1 Simple linear regression Linear models Least squares Predictions and residuals.

Survey on Population Mean

Case-control studies C&H 16

Estimating a Population Mean. Section 7-3

Testing Independence

Chapter 13: Forecasting

Stat 231 Exam 2 Fall 2013

8 Nominal and Ordinal Logistic Regression

Introduction 1. STA442/2101 Fall See last slide for copyright information. 1 / 33

Binomial Data, Axioms of Probability, and Bayes Rule. n! Pr(k successes in n draws)' k!(n&k)! Bk (1&B) n&k

Confidence Intervals for the Sample Mean

Logistic regression: Miscellaneous topics

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

Applied Regression Modeling: A Business Approach Chapter 2: Simple Linear Regression Sections

Big Data Analysis with Apache Spark UC#BERKELEY

A First Course on Kinetics and Reaction Engineering Example 13.2

DIRECTED NUMBERS ADDING AND SUBTRACTING DIRECTED NUMBERS

a Sample By:Dr.Hoseyn Falahzadeh 1

Answer keys for Assignment 10: Measurement of study variables (The correct answer is underlined in bold text)

Introduction to Basic Statistics Version 2

PubH 5450 Biostatistics I Prof. Carlin. Lecture 13

Lecture 3: Measures of effect: Risk Difference Attributable Fraction Risk Ratio and Odds Ratio

Statistics for Managers Using Microsoft Excel 7 th Edition

Linkage Methods for Environment and Health Analysis General Guidelines

STAT 705: Analysis of Contingency Tables

United Kingdom (UK) Road Safety Development. Fatalities

Hypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006

REPORTING OF INFERRED MINERAL RESOURCES

Sequences of Real Numbers

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology, Kharagpur

Ignoring the matching variables in cohort studies - when is it valid, and why?

AP Statistics Ch 12 Inference for Proportions

E509A: Principle of Biostatistics. GY Zou

Comparing p s Dr. Don Edwards notes (slightly edited and augmented) The Odds for Success

CS 5014: Research Methods in Computer Science. Statistics: The Basic Idea. Statistics Questions (1) Statistics Questions (2) Clifford A.

Part 13: The Central Limit Theorem

Chapter One. Quadratics 20 HOURS. Introduction

Chapter 1 Statistical Inference

Section 1.1. Data - Collections of observations (such as measurements, genders, survey responses, etc.)

Interval estimation. October 3, Basic ideas CLT and CI CI for a population mean CI for a population proportion CI for a Normal mean

STAT Section 2.1: Basic Inference. Basic Definitions

Probability and Probability Distributions. Dr. Mohammed Alahmed

Review of Statistics 101

«Infuse mathematical thinking in a lesson on Pythagoras Theorem»

Solving Recurrence Relations 1. Guess and Math Induction Example: Find the solution for a n = 2a n 1 + 1, a 0 = 0 We can try finding each a n : a 0 =

STEP Support Programme. Hints and Partial Solutions for Assignment 17

Agreement Coefficients and Statistical Inference

Lecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.

Sampling and Sample Size. Shawn Cole Harvard Business School

MS&E 226: Small Data

Fourier and Stats / Astro Stats and Measurement : Stats Notes

Causal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification. Todd MacKenzie, PhD

An introduction to biostatistics: part 1

Lecturer: Dr. Adote Anum, Dept. of Psychology Contact Information:

Big-oh stuff. You should know this definition by heart and be able to give it,

1 Interaction models: Assignment 3

Stat 529 (Winter 2011) Experimental Design for the Two-Sample Problem. Motivation: Designing a new silver coins experiment

Day 5. Friday May 25, 2012

The Difference in Proportions Test

In case (1) 1 = 0. Then using and from the previous lecture,

Chapter 4. Probability Distributions Continuous

Using Geographic Information Systems for Exposure Assessment

Investigating mediation when counterfactuals are not metaphysical: Does sunlight exposure mediate the effect of eye-glasses on cataracts?

1 Overview. 2 Learning from Experts. 2.1 Defining a meaningful benchmark. AM 221: Advanced Optimization Spring 2016

Using Scientific Measurements

2.4. Characterising Functions. Introduction. Prerequisites. Learning Outcomes

Model comparison. Patrick Breheny. March 28. Introduction Measures of predictive power Model selection

Note that r = 0 gives the simple principle of induction. Also it can be shown that the principle of strong induction follows from simple induction.

Multiple regression and inference in ecology and conservation biology: further comments on identifying important predictor variables

Nemours Biomedical Research Statistics Course. Li Xie Nemours Biostatistics Core October 14, 2014

Math 1040 Final Exam Form A Introduction to Statistics Fall Semester 2010

Power; sample size; significance level; detectable difference; standard error; influence function.

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur

Quantitative Methods for Economics, Finance and Management (A86050 F86050)

ST3241 Categorical Data Analysis I Two-way Contingency Tables. 2 2 Tables, Relative Risks and Odds Ratios

Working with Stata Inference on the mean

Transcription:

Power and Sample Size (StatPrimer Draft) To achieve meaningful results, statistical studies must be carefully planned and designed. Study design has many aspects. Here s just a sampling of questions you must ask yourself when planning a study: How will the research question be operationalized? How will the study outcome be measured? Will it be objective? Will it be repeatable? How will relations be quantified? What parameter will be estimated? Will the study be experimental or non-experimental? If experimental, will blinding be employed? Will randomization be employed? What is the nature of the control group? If non-experimental, will data be prospective or retrospective? Will the sample be crossnaturalistic, cohort, or case-control? What are you going to do about confounders? How large a sample will be needed? These notes address the last question for studies that seek to infer means, mean differences, proportions, and differences in proportions. Selected estimation and testing methods are considered. Confidence Interval for a Mean The goal is to estimate μ with stated plus or minus margin of error m. In estimating μ with 95% confidence, m σ n. Therefore, n = 4σ m In applying this method, considerable effort should be put into getting a good estimate of σ. (via prior studies, pilot studies, or Gestalt ). Example: To obtain a margin of error of 5 when studying a variable with a standard deviation of 15, use n = (4)(15 )/(5 ) = 36. To obtain a margin of error of.5 when studying this variable, use n = (4)(15 )/(.5 ) = 144. Notice that requiring a smaller margin of error required a larger sample size; half the margin of error required us to quadruple the sample size. Testing Two Means In testing H 0 : μ 1 = μ with equal sized groups, use n = σ z β + z α per group, where 1 β = the desired power, α = the significance-level, σ = the within group standard deviation, and = a difference worth detecting. Page 1 of 5 C:\data\StatPrimer\sampsize.doc Last printed 5/3/007 6:05:00 AM

Example: We want to test H 0 : μ 1 = μ at α = 0.05 (two-sided) with 90% power and are looking for a mean difference of 1 mmol/l. We assume a within group standard deviation of 0.5 mmol. σ ( z ) β + z α 0.5 ( 1.8 + 1.96) Therefore, n = = = 5.. Use 6 per group. 1 Notes: Maximum efficiency is gained when the groups have equal sizes. If n 1 is restricted, apply the formula n = nn 1 / (n 1 n) to determine the size of the group. Always round-up results to ensure adequate power. Use of a software utilities (e.g., WipPepi, www.openepi.com) is strongly encouraged, see this: The sample size formula for can be rearranged to determine the power of a test = Φ z β α + σ where Φ(z) is the cumulative probability of a standard Normal random variable. Use of a software utility is strongly encouraged. Example: What is the power of a test of H 0 : μ 1 = μ when α = 0.05 (two-sided), n = 30 per group, μ 1 μ = 0.5, and σ = 0.5? Solution: 0.5 β = Φ = Φ + z α + 1.96 =Φ( 0.0) = (0.5) σ n 30 0.490 (about 50/50). Note: You can get the same result with www.openepi.com. n Page of 5 C:\data\StatPrimer\sampsize.doc Last printed 5/3/007 6:05:00 AM

Estimating a Proportion Specify your best guess for population proportion (parameter) p. If you have no idea, use p = 0.5 to derive ensure adequate precision. State your desired level of confidence (e.g., 95%) and c computations a software utility such as OpenEpi.com. Example: How many people do I need to estimate a proportion within a margin of error no greater than ±3%? I do not have a good idea of p, so will assume p = 0.5 for the sake of calculations. OpenEpi.com derives this output: suggesting the need for 1066 observations. How many people do I need to study to the proportion within a margin of error no greater than ±5%? This larger margin of error allowed for a smaller sample size. Page 3 of 5 C:\data\StatPrimer\sampsize.doc Last printed 5/3/007 6:05:00 AM

Testing two proportions Sample size: In testing H 0 : p 1 = p, the sample size depends on α, power (1 β), the difference worth detecting (), the relative sizes of the samples (n / n 1 ), and the proportion in the non-exposed group (p ). Because computations are tedious, we rely on a software utility for computations. Example: We want to test proportions from a cohort study. We use an equal number of individuals in the two groups, let alpha = 0.05 (two-sided), and power = 80%. We wish to detect a doubling of risk and assume the risk in the non-exposed group is 10%. The results from www.openepi.com are to the right. Three different methods of calculation are reported. Pick one (and report its source). For example, the Fleiss method (middle column) states that we need 199 individuals per group. Power: sample size formula can be rearranged to determine the power of a test based on a given sample size. For example, we can ask what would the power of the aforementioned example had we used only 100 individuals per group. Here the www.openepi.com output. This says that the power would have been only 51% (Normal approximation). That would have been inadequate. Page 4 of 5 C:\data\StatPrimer\sampsize.doc Last printed 5/3/007 6:05:00 AM

Testing proportions from case control studies In testing H 0 : OR = 1 for a case-control study, specify the expected odds ratio and expected proportion of controls who are classified as exposed. Here s output for an illustration that assumes α = 0.05, 1 β = 0.8, an equal number of cases and controls, an exposure proportion of 0.5 in controls, and an expected odds ratio of. These conditions require 15 cases and 15 controls. Page 5 of 5 C:\data\StatPrimer\sampsize.doc Last printed 5/3/007 6:05:00 AM