Sample Size Estimation in the Proportional Hazards Model for K-sample or Regression Settings Scott S. Emerson, M.D., Ph.D.

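Sample Size Formula for a Normally Distributed Statistic

Suppose a statistic $S$ is known to be normally distributed with mean $\phi$ and variance $V/n$. A hypothesis test having one-sided type I error $\alpha$ might be based on a critical function which rejects $H_0: \phi \le \phi_0$ in favor of alternative hypothesis $H_1: \phi > \phi_0$ when

$$Z = \frac{\sqrt{n}\,(S - \phi_0)}{\sqrt{V}} > z_{1-\alpha}, \quad \text{where } \Phi(z_{1-\alpha}) = 1 - \alpha; \quad \Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du.$$

The power function for this hypothesis test is then

$$Pwr(\phi) = \Pr(Z > z_{1-\alpha} \mid \phi) = \Pr\left( \frac{\sqrt{n}\,(S - \phi)}{\sqrt{V}} > z_{1-\alpha} - \frac{\sqrt{n}\,(\phi - \phi_0)}{\sqrt{V}} \right) = 1 - \Phi\left( z_{1-\alpha} - \frac{\sqrt{n}\,(\phi - \phi_0)}{\sqrt{V}} \right).$$

This power function can be used to compute the power $\beta$ with which the hypothesis test rejects a specific alternative $\phi_1$ when the sample size is at some given value of $n$; compute the sample size $n$ for which a hypothesis test would have prescribed power $\beta$ to detect a specific design alternative $\phi_1$; or compute the alternative $\phi_1$ which is rejected with prescribed power $\beta$ when performing the hypothesis test with some given sample size $n$. For instance, when desiring to compute a sample size such that the hypothesis test has power $\beta$, we merely want

$$Pwr(\phi_1) = 1 - \Phi\left( z_{1-\alpha} - \frac{\sqrt{n}\,(\phi_1 - \phi_0)}{\sqrt{V}} \right) = \beta,$$

which then suggests

$$n = \frac{(z_{1-\alpha} + z_\beta)^2\, V}{(\phi_1 - \phi_0)^2}, \quad \text{where } \Phi(z_\beta) = \beta.$$

Another approach to sample size estimation is based on the precision with which some parameter can be estimated. For instance, a $100(1-2\alpha)\%$ confidence interval for $\phi$ might be computed as

$$\left( S - z_{1-\alpha} \sqrt{V/n},\; S + z_{1-\alpha} \sqrt{V/n} \right).$$

If we want the width of the confidence interval to be $\phi_1 - \phi_0$ (so the CI will discriminate between the null and alternative hypotheses), then we use sample size formula

$$n = \frac{(z_{1-\alpha} + z_{1-\alpha})^2\, V}{(\phi_1 - \phi_0)^2},$$

which corresponds to the same sample size formula as derived from the hypothesis test, providing we choose power $\beta = 1 - \alpha$ (my religion).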
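The three uses of the power function described above can be sketched in a few lines of Python. This is only an illustration under the stated normal model (a statistic with mean phi and variance V/n): the function names and the default values alpha = 0.025 and beta = 0.975 (power equal to 1 - alpha) are assumptions made for the example, not part of the original notes or of S+SeqTrial.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def power(n, V, phi0, phi1, alpha=0.025):
    """Power of the one-sided level-alpha test at the alternative phi1."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(z_alpha - sqrt(n) * (phi1 - phi0) / sqrt(V))


def sample_size(V, phi0, phi1, alpha=0.025, beta=0.975):
    """Sample size with power beta: (z_{1-alpha} + z_beta)^2 V / (phi1 - phi0)^2."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    z_beta = STD_NORMAL.inv_cdf(beta)
    return (z_alpha + z_beta) ** 2 * V / (phi1 - phi0) ** 2


def detectable_alternative(n, V, phi0, alpha=0.025, beta=0.975):
    """Alternative phi1 > phi0 that is rejected with power beta at sample size n."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    z_beta = STD_NORMAL.inv_cdf(beta)
    return phi0 + (z_alpha + z_beta) * sqrt(V / n)


# Hypothetical design: unit variance per observation, detect a shift of 0.5.
n_exact = sample_size(V=1.0, phi0=0.0, phi1=0.5)
n_planned = ceil(n_exact)  # round up to a whole number of observations
```

The formula returns a real number, so rounding up is left to the caller. With beta = 1 - alpha, the width of the confidence interval at `n_exact` equals phi1 - phi0, which is the correspondence noted above.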

Modified Sample Size Formula in the Presence of a Mean-Variance Relationship

Suppose a statistic $S$ is known to be normally distributed with mean $\phi$ and variance $V(\phi)/n$. In this case, the variance of the distribution of $S$ depends upon the mean of that distribution: a mean-variance relationship. The above formulas need to be modified when the variance of the normally distributed statistic depends on the mean. As a general rule, most statisticians ignore this issue because either 1) the sample size will be such that the variance will not differ by very much over the range of alternatives for which the power is, say, between 1% and 99%, or 2) the sample size computation is based on such crude estimates of the variability of the data that any error due to ignoring the mean-variance relationship is negligible, or 3) both. Nevertheless, for completeness I present the modified formulas here for mean-variance relationships.

In these formulas, I presume that the power function is higher at $\phi_0$ than at any $\phi < \phi_0$. I note that there are sample size requirements in order to guarantee that the power curve achieves its maximum over the null hypothesis region at this boundary between the null and alternative hypotheses. This requirement is that for all $\phi < \phi_0$, we must have

$$\sqrt{n} \ge \frac{z_{1-\alpha} \left( \sqrt{V(\phi)} - \sqrt{V(\phi_0)} \right)}{\phi_0 - \phi}.$$

This is clearly satisfied when $V(\phi)$ is an increasing function of $\phi$, because in that case, the numerator is negative.

Suppose a statistic $S$ is known to be normally distributed with mean $\phi$ and variance $V(\phi)/n$. A hypothesis test having one-sided type I error $\alpha$ might be based on a critical function which rejects $H_0: \phi \le \phi_0$ in favor of alternative hypothesis $H_1: \phi > \phi_0$ when

$$Z = \frac{\sqrt{n}\,(S - \phi_0)}{\sqrt{V(\phi_0)}} > z_{1-\alpha}, \quad \text{where } \Phi(z_{1-\alpha}) = 1 - \alpha; \quad \Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du.$$

The power function for this hypothesis test is then

$$Pwr(\phi) = \Pr(Z > z_{1-\alpha} \mid \phi) = 1 - \Phi\left( \frac{z_{1-\alpha} \sqrt{V(\phi_0)} - \sqrt{n}\,(\phi - \phi_0)}{\sqrt{V(\phi)}} \right).$$

This power function can be used to compute the power $\beta$ with which the hypothesis test rejects a specific alternative $\phi_1$ when the sample size is at some given value of $n$; compute the sample size $n$ for which a hypothesis test would have prescribed power $\beta$ to detect a specific design alternative $\phi_1$; or compute the alternative $\phi_1$ which is rejected with prescribed power $\beta$ when performing the hypothesis test with some given sample size $n$. For instance, when desiring to compute a sample size such that the hypothesis test has power $\beta$, we merely want

$$Pwr(\phi_1) = 1 - \Phi\left( \frac{z_{1-\alpha} \sqrt{V(\phi_0)} - \sqrt{n}\,(\phi_1 - \phi_0)}{\sqrt{V(\phi_1)}} \right) = \beta,$$

which then suggests

$$n = \frac{\left( z_{1-\alpha} \sqrt{V(\phi_0)} + z_\beta \sqrt{V(\phi_1)} \right)^2}{(\phi_1 - \phi_0)^2}.$$

As before, the choice of power $\beta = 1 - \alpha$ (my religion) corresponds exactly to choosing sample size according to the precision with which some parameter can be estimated as judged by a $100(1-2\alpha)\%$ confidence interval for $\phi$. When inverting the above power and/or sample size formulas to find the alternative for which a design has prescribed power, it may be the case that an iterative search is necessary.

General Sample Size Formula for 1-sample, 2-sample, and Regression Settings

The S+SeqTrial Technical Overview describes a general sample size formula which can be used in the data analysis models most commonly used in the analysis of clinical trial data. (The notation in this document differs slightly from the notation used in the technical overview.) In these models, we let $\theta$ represent the measure of treatment effect, which is most often a contrast (difference or ratio) of some within group summary measure $\mu$ computed independently for each treatment arm. Statistical analysis can usually be based on an estimate of the treatment effect $\theta$. Most often, either the estimate of $\theta$ or the logarithmic transformation of that estimate is approximately normally distributed in a fixed sample study (i.e., one without interim analyses). We thus let $\phi = g(\theta)$ be the transformed treatment effect measure which is commonly estimated with an approximately normally distributed estimate. The link function $g(\cdot)$ is typically the identity function (so $\phi = \theta$) or the logarithmic transformation (so $\phi = \log(\theta)$). We thus assume that the estimate of $\phi$ is approximately normally distributed with

$$\hat{\phi} \sim N\left( \phi, \frac{V}{n} \right),$$

where $V$ is the (average) variability contributed to the estimate by a single observation, and $n$ is the sample size. In general, $V$ can be a function of the within group summary measures $\mu$, as well as other nuisance parameters that are independent of $\mu$. In the rest of this document, we ignore any mean-variance relationship. When implementing these formulas, it will generally be necessary to decide whether to make calculations using the value of $V$ under the null, alternative, or some intermediate hypothesis.

Suppose we are interested in discriminating between a null hypothesis $H_0: \phi = \phi_0$ and an alternative hypothesis $H_1: \phi = \phi_1$ in a hypothesis test having one-sided type I error $\alpha$ and statistical power $\beta$. When the above approximate distribution holds, sample size computations are most often effected using

$$n = \frac{\delta_\beta^2\, V}{\Delta^2},$$

where $\Delta = \phi_1 - \phi_0$ and $\delta_\beta$ is a standardized alternative, which in a fixed sample study (i.e., one without interim analyses) is $\delta_\beta = z_{1-\alpha} + z_\beta$.

The same general formula can be used in a group sequential test, providing the estimate of treatment effect can be viewed as a weighted sum of uncorrelated, approximately normally distributed statistics computed on the groups accrued between analyses. This is often referred to as independent increment structure, and this holds in a wide variety of common clinical trial settings. In these group sequential settings, the standardized alternative must be computed using recursive numerical integration of convolutions of densities. (S+SeqTrial will do this for us.)

Use of the General Formula in Common 1-sample Analysis Models

1. Testing means of continuous distributions: $Y_i \sim (\mu, \sigma^2)$, $i = 1, \ldots, n$
   $\theta = \mu$
   $\phi = \theta$
   $V = \sigma^2$

2. Testing geometric means of continuous distributions: $\log Y_i \sim (\mu, \sigma^2)$, $i = 1, \ldots, n$
   $\theta = e^{\mu}$
   $\phi = \log(\theta)$
   $V = \sigma^2$

3. Testing proportions of Bernoulli distributions: $Y_i \sim B(1, \mu)$, $i = 1, \ldots, n$
   $\theta = \mu$
   $\phi = \theta$
   $V = p(1-p)$, where $p = \mu$

Use of the General Formula in Common Independent 2-sample Analysis Models

1. Testing means of continuous distributions: $Y_{ki} \sim (\mu_k, \sigma_k^2)$, $i = 1, \ldots, m_k$; $k = 1, 2$
   $n = m_1 + m_2$
   Randomization ratio $r = m_1 / m_2$
   $\theta = \mu_1 - \mu_2$ (difference of means)
   $\phi = \theta$
   $V = (r+1) \left[ \sigma_1^2 / r + \sigma_2^2 \right]$

2. Testing geometric means of continuous distributions: $\log Y_{ki} \sim (\mu_k, \sigma_k^2)$, $i = 1, \ldots, m_k$; $k = 1, 2$
   $n = m_1 + m_2$
   Randomization ratio $r = m_1 / m_2$
   $\theta = \exp(\mu_1) / \exp(\mu_2) = \exp(\mu_1 - \mu_2)$ (ratio of geometric means)
   $\phi = \log(\theta)$
   $V = (r+1) \left[ \sigma_1^2 / r + \sigma_2^2 \right]$

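3. Testing proportions of Bernoulli distributions: $Y_{ki} \sim B(1, \mu_k)$, $i = 1, \ldots, m_k$; $k = 1, 2$
   a. $n = m_1 + m_2$
   b. Randomization ratio $r = m_1 / m_2$
   c. $\theta = \mu_1 - \mu_2$ (difference of proportions)
   d. $\phi = \theta$
   e. $V = (r+1) \left[ p_1 (1 - p_1) / r + p_2 (1 - p_2) \right]$ (under the alternative)

4. Testing odds of Bernoulli distributions: $Y_{ki} \sim B(1, p_k)$, $i = 1, \ldots, m_k$; $k = 1, 2$
   a. $n = m_1 + m_2$
   b. Randomization ratio $r = m_1 / m_2$
   c. Odds $\mu_k = p_k / (1 - p_k)$
   d. $\theta = \mu_1 / \mu_2$ (odds ratio)
   e. $\phi = \log(\theta)$
   f. $V = (r+1) \left[ 1 / (r\, p_1 (1 - p_1)) + 1 / (p_2 (1 - p_2)) \right]$ (under the alternative)

5. Testing hazard ratios in survival distributions: $Y_{ki} \sim S_k(t)$, $i = 1, \ldots, m_k$; $k = 1, 2$
   $n$ = number of observed events in both groups combined
   Randomization ratio $r = m_1 / m_2$
   Hazard function $h_k(t) = -\frac{d}{dt} \log S_k(t)$
   $\theta = h_1(t) / h_2(t)$ (constant ratio of hazard functions)
   $\phi = \log(\theta)$
   $V = (r+1) \left[ 1/r + 1 \right]$ (under the null)

Use of the General Formula When Comparing Means with Correlated Observations

1. ("Repeated Measures"): Suppose the $k$th treatment group ($k = 1, 2$) has $m_k$ independent subjects, each of whom has $J$ measurements, and subjects in different groups are independent.
   $Y_{kij} \sim (\mu_k, \sigma_k^2)$, $k = 1, 2$; $i = 1, \ldots, m_k$; $j = 1, \ldots, J$
   $corr(Y_{kij}, Y_{k'i'j'}) = \rho$ if $k = k'$, $i = i'$, $j \ne j'$
   $corr(Y_{kij}, Y_{k'i'j'}) = 1$ if $k = k'$, $i = i'$, $j = j'$
   $corr(Y_{kij}, Y_{k'i'j'}) = 0$ if $k \ne k'$ or $i \ne i'$
   Randomization ratio $r = m_1 / m_2$
   $\theta = \mu_1 - \mu_2$ (difference of means)
   $\phi = \theta$
   $V = (r+1) \left\{ \sigma_1^2 [1 + (J-1)\rho] / (J r) + \sigma_2^2 [1 + (J-1)\rho] / J \right\}$

2. ("Crossover"): Suppose $m$ independent pairs of subjects are randomized such that one member of each pair is in treatment group 1 and one is in treatment group 2.
   $Y_{ki} \sim (\mu_k, \sigma_k^2)$, $k = 1, 2$; $i = 1, \ldots, m$
   $corr(Y_{ki}, Y_{k'i'}) = \rho$ if $k \ne k'$, $i = i'$
   $corr(Y_{ki}, Y_{k'i'}) = 1$ if $k = k'$, $i = i'$
   $corr(Y_{ki}, Y_{k'i'}) = 0$ if $i \ne i'$
   $\theta = \mu_1 - \mu_2$ (difference of means)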

   $\phi = \theta$
   $V = \sigma_1^2 + \sigma_2^2 - 2 \rho \sigma_1 \sigma_2$

Use of the General Formula in Common Regression Analysis Models

1. Linear regression (means): $(Y_i \mid X_i = x_i) \sim (\beta_0 + \beta_1 x_i, \sigma^2)$, $i = 1, \ldots, n$
   $\theta = E(Y \mid X = x+1) - E(Y \mid X = x) = \beta_1$ (linear contrast of means)
   $\phi = \theta$
   $V = \sigma^2 / Var(x)$

2. Linear regression on log transformed data (geometric means): $(\log Y_i \mid X_i = x_i) \sim (\beta_0 + \beta_1 x_i, \sigma^2)$, $i = 1, \ldots, n$
   $\theta = GM(Y \mid X = x+1) / GM(Y \mid X = x) = \exp(\beta_1)$
   $\phi = \log(\theta)$
   $V = \sigma^2 / Var(x)$

3. Logistic regression (odds): $Y_{ki} \sim B(1, p_k)$, $i = 1, \ldots, m_k$; $k = 1, 2$
   a. $n = m_1 + m_2$
   b. Randomization ratio $r = m_1 / m_2$
   c. Odds $\mu_k = p_k / (1 - p_k)$
   d. $\theta = \mu_1 / \mu_2$ (odds ratio)
   e. $\phi = \log(\theta)$
   f. $V = 1 / \left[ p (1 - p)\, Var(x) \right]$ (using an average value for $p$)

4. Proportional hazards regression (hazard ratios): $Y_i \sim S_i(t)$, $i = 1, \ldots, n$
   $n$ = number of observed events in both groups combined
   Hazard $h_i(t \mid X_i = x_i) = -\frac{d}{dt} \log S_i(t \mid X_i = x_i) = h_0(t) \exp(\beta x_i)$
   $\theta = h(t \mid X = x+1) / h(t \mid X = x) = \exp(\beta)$
   $\phi = \log(\theta)$
   $V = 1 / Var(x)$ (under the null)

Sample Size Formula for K-sample Setting

The S+SeqTrial Technical Overview also provides a sample size formula appropriate when comparing means or geometric means across $K$ independent samples in a fixed sample (no interim analyses) setting. In this setting, we again consider some within group summary measure $\mu_k$ computed independently for the $k$th treatment arm, $k = 1, \ldots, K$. The null and alternative hypotheses are classically stated as $H_0: \mu_1 = \mu_2 = \cdots = \mu_K$ and $H_1: \mu_i \ne \mu_j$ for some $i, j$.

Testing of the hypotheses is generally based on the variance of the within group summary measures. That is, the parameter measuring treatment effect is $\theta = Var((\mu_1, \mu_2, \ldots, \mu_K))$. When all groups have equal summary measures, this variance is 0. When the alternative hypothesis is true, the variance across the group summary measures is nonzero.

The exact formula and code used to compute sample sizes in the K-sample setting are given in the technical overview.

Using S+SeqTrial to Compute Number of Events for Proportional Hazards Models

S+SeqTrial provides explicit functions for the computation of sample sizes in the two sample setting for both fixed sample and group sequential trials using the proportional hazards model. Although no explicit facility is provided for proportional hazards regression with a continuous predictor, examination of the results given above for the geometric mean and hazard ratio regressions reveals a similarity of the formulas. In fact, we merely need to use the geometric mean model with a standard deviation of 1 in order to estimate the number of observed events needed for the proportional hazards model.

This also suggests that when planning to use the K-sample logrank statistic, we can merely use the geometric mean model in order to find the number of events needed to provide desired power. In this case, we can use the command line functions (there is a bug in the dialog) to provide a vector of hazard ratios across the K groups. All hazard ratios should be specified relative to the control group, and it will be necessary to include a hazard ratio of 1 reflecting the comparison of the control group to itself.

Computing the Number of Subjects to Accrue to a Survival Study

The above sample size formulas for proportional hazards models provide the number of events needed, rather than the number of subjects. Several approximate approaches are used to determine the number of subjects to accrue:

1. Assume that subjects are accrued uniformly over, say, $(0, a)$, and that data analysis will occur at time $\tau + a$. Further assume an exponential survival distribution (a constant hazard) in each group. We can then derive the probability of a subject having an event by the time of analysis, and by dividing the number of events by that probability, derive the number of subjects to accrue. (See the S+SeqTrial Technical Overview.)

2. Under the same assumptions, use the rate of observed events and the average time of follow-up in a Poisson type model.
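The first approach can be sketched as follows. This is a minimal illustration and not the S+SeqTrial implementation. It assumes uniform accrual over (0, a), analysis at time a + tau, exponential survival in each arm, and no dropout or other censoring. The function names, the example hazards, and the choice to average the event probability over the two arms by their allocation fractions are all assumptions made for the example. The number of events uses the two-sample hazard ratio variance (r + 1)(1/r + 1) given earlier.

```python
from math import ceil, exp, log
from statistics import NormalDist


def events_needed(hazard_ratio, r=1.0, alpha=0.025, beta=0.975):
    """Events for a two-sample hazard ratio test: delta^2 V / log(theta)^2."""
    std_normal = NormalDist()
    delta = std_normal.inv_cdf(1 - alpha) + std_normal.inv_cdf(beta)
    V = (r + 1) * (1 / r + 1)  # variance term under the null, r = m1 / m2
    return delta ** 2 * V / log(hazard_ratio) ** 2


def event_probability(hazard, a, tau):
    """P(event by analysis time a + tau) with entry time uniform on (0, a)."""
    # A subject entering at time u is followed for a + tau - u and survives
    # that long with probability exp(-hazard * (a + tau - u)); averaging over
    # u ~ Uniform(0, a) gives the closed form below.
    mean_survival = (exp(-hazard * tau) - exp(-hazard * (a + tau))) / (hazard * a)
    return 1 - mean_survival


def subjects_needed(events, hazard_1, hazard_2, a, tau, r=1.0):
    """Subjects to accrue: events divided by the average event probability."""
    w1, w2 = r / (r + 1), 1 / (r + 1)  # allocation fractions for arms 1 and 2
    p_event = (w1 * event_probability(hazard_1, a, tau)
               + w2 * event_probability(hazard_2, a, tau))
    return events / p_event


# Hypothetical design: hazard ratio 0.5, 2 years accrual, 3 years further follow-up.
n_events = events_needed(hazard_ratio=0.5)
n_subjects = ceil(subjects_needed(n_events, hazard_1=0.1, hazard_2=0.2, a=2.0, tau=3.0))
```

Because the event probability depends on the hazards assumed in each arm, the number of subjects is more sensitive to the design assumptions than the number of events is.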