Bios 6649: Clinical Trials - Statistical Design and Monitoring

Spring Semester 2015
John M. Kittelson
Department of Biostatistics & Informatics, Colorado School of Public Health, University of Colorado Denver
© 2015 John M. Kittelson, PhD

6. Other topics

Other topics in design of biomedical studies:
6.1 Trials with more than 2 treatment groups
  (a) > 2 unrelated treatments (one-way ANOVA)
  (b) Factorial designs
  (c) Dose-response designs
  (d) Designs for regression
  (e) Designs using change from baseline endpoints
6.2 Design of non-inferiority trials
6.3 Prevention of bias (randomization and blinding)
6.4 Prevention and treatment of missing data

6.1(a) Trials with more than two unrelated treatments
Evaluating more than two unrelated treatments (1-way ANOVA)

Recall the inferential objective with a 2-group RCT:
* Parameterization of treatment effect: $\theta$ (true mean treatment effect)
* Magnitude of $\hat\theta$ used to decide whether the new treatment is suited for use in practice
* Parameter space divided into regions of clinically important effects (in both directions)

What is the question with more than 2 treatment groups?
* Which treatment (of many) is best?
* Is there a dose-response relationship?
* Is this several studies with a shared control group?
* Is a combination treatment better than each of its components?

Evaluating more than two unrelated treatments (1-way ANOVA)

Inferential objective:
* Is there a difference between $K$ treatment groups or are they all the same?
* (My observation: investigators are more often interested in identifying the particular treatments that differ from the rest, so the ANOVA F-test does not usually satisfy the scientific/clinical objectives.)

Statistical setting:
* Hypotheses:
  * Null hypothesis: the mean outcome is the same in all groups ($\theta_k = \theta$ for $k = 1, \ldots, K$)
  * Alternative hypothesis: at least one mean differs from the others.

Evaluating more than two unrelated treatments (1-way ANOVA)

Statistical inference focuses on the F-test in a 1-way ANOVA:

$$F = \frac{SS_b/(K-1)}{SS_w/[K(N-1)]}$$

where $N$ is the number of subjects in each treatment group, and $SS_b$ and $SS_w$ denote the between-group and within-group sums of squares:

$$SS_b = \sum_{i=1}^{N}\sum_{k=1}^{K} (\bar{Y}_k - \bar{Y})^2, \qquad SS_w = \sum_{i=1}^{N}\sum_{k=1}^{K} (Y_{ik} - \bar{Y}_k)^2$$

Under the null hypothesis the statistic follows an F-distribution with numerator and denominator degrees of freedom of $K-1$ and $K(N-1)$, respectively. (Recall that the F-distribution comes from the ratio of two independent chi-square random variables, each divided by its degrees of freedom.)
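As a check on the formula above, the F-statistic can be computed directly from its sums of squares and compared with R's built-in ANOVA. This is a minimal sketch on simulated data: the group means (0, 4, 8), the standard deviation of 10, N = 40 per group, and the seed are illustrative assumptions, not values taken from this slide.

```r
# Sketch: one-way ANOVA F-statistic by hand (balanced design, simulated data).
# All numeric inputs below are illustrative assumptions.
set.seed(6649)
K <- 3                       # number of treatment groups
N <- 40                      # subjects per group
group <- factor(rep(seq_len(K), each = N))
true_means <- c(0, 4, 8)
y <- rnorm(K * N, mean = true_means[as.integer(group)], sd = 10)

group_mean_per_obs <- ave(y, group)   # each subject's own group mean
grand_mean <- mean(y)

# Double sums over subjects i and groups k, as in the definitions above
ss_b <- sum((group_mean_per_obs - grand_mean)^2)
ss_w <- sum((y - group_mean_per_obs)^2)

f_manual <- (ss_b / (K - 1)) / (ss_w / (K * (N - 1)))
f_builtin <- anova(lm(y ~ group))[["F value"]][1]

# p-value from the F-distribution with K - 1 and K(N - 1) degrees of freedom
p_value <- pf(f_manual, df1 = K - 1, df2 = K * (N - 1), lower.tail = FALSE)
```

The hand-computed `f_manual` should agree with `f_builtin` up to floating-point error, which is a quick way to confirm the degrees of freedom used in the formula.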

Evaluating more than two unrelated treatments: sample size evaluation (one-way ANOVA)

The ANOVA setting does not fit into the usual sample size formulation (the F-statistic is not asymptotically normal).

For the purposes of sample size evaluation:
* Assume $SS_w$ is fixed and known, and let $\sigma_w^2 = \frac{SS_w}{K(N-1)}$. This is analogous to assuming $\sigma^2$ is known in the usual sample size evaluation.
* With $\sigma_w^2$ known: $F = \frac{SS_b}{(K-1)\,\sigma_w^2}$, and the rescaled statistic $(K-1)F = SS_b/\sigma_w^2 \sim \chi^2_{K-1}$ under the null hypothesis.

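Evaluating more than two unrelated treatments: sample size evaluation (one-way ANOVA)

Under the null hypothesis, the critical value for the chi-square statistic $(K-1)F = SS_b/\sigma_w^2$ is

    cv = qchisq(0.95, K-1)

Finding power = $\Pr\{(K-1)F > \text{cv}\}$:
* Under the alternative, $(K-1)F$ follows a non-central chi-square distribution with $K-1$ degrees of freedom.
* The non-centrality parameter is $KN\delta^2$, where $\delta^2 = \sigma_b^2/\sigma_w^2$.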

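Evaluating more than two unrelated treatments: sample size evaluation (one-way ANOVA)

* The statistic $(K-1)F = SS_b/\sigma_w^2$ is chi-square with $K-1$ degrees of freedom (non-central under the alternative), so with $f$ equal to the critical value, power is given by:

    1 - Pr((K-1)F <= f) = 1 - pchisq(f, K-1, ncp = K*N*delta2)

  where delta2 $= \delta^2 = \sigma_b^2/\sigma_w^2$.
* To find the sample size that gives power $\beta$ for a given $\sigma_b^2/\sigma_w^2$, we find $N$ such that $1 - \Pr\{(K-1)F \le f\} = \beta$.
* For sample size evaluation, we specify the alternative hypothesis in terms of $\sigma_b^2$. Suppose we want adequate power for a particular vector of $\theta_k$'s: $(\theta_1, \ldots, \theta_K)$. This alternative determines $\sigma_b^2$ as follows:

$$\sigma_b^2 = \frac{\sum_{k=1}^{K} (\theta_k - \bar\theta)^2}{K}$$

  * NOTICE: this is not the standard variance estimator. The standard estimator would have $K-1$ in the denominator.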

Evaluating more than two unrelated treatments: sample size evaluation (one-way ANOVA)

Example: Suppose that a trial is planned with 3 treatment groups.
* Suppose that the primary outcome is LDL-C reduction (measured as percent).
* Assume $\sigma_w = 10\%$.
* Want power to detect a difference between treatment groups of 8%.
* Finding $\sigma_b^2$:

    $(\mu_1, \mu_2, \mu_3)$    $\sigma_b^2$
    (0, 0, 8)                  14.22
    (0, 2, 8)                  11.56
    (0, 4, 8)                  10.67
    (0, 6, 8)                  11.56
    (0, 8, 8)                  14.22

* The critical value is cv <- qchisq(0.95, 2), which gives 5.991.
* Set $\delta^2 = \sigma_b^2/\sigma_w^2 = 10.67/100 = 0.1067$ (the smallest $\sigma_b^2$ among the alternatives listed).
* For a particular choice of N, the power is given by 1 - pchisq(cv, 2, ncp = 3*N*0.1067).
* There is just over 90% power when N = 40.
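The example above can be reproduced with base R's chi-square functions. The following is a sketch under the slide's working assumption that $\sigma_w^2$ is known; the function names `sigma2_between` and `anova_power` are illustrative and do not come from RCTdesign.

```r
# Sketch: power for the one-way ANOVA design using the chi-square approximation
# (sigma_w^2 treated as known). Function names are illustrative assumptions.

# Between-group variance of the hypothesized means (divides by K, not K - 1)
sigma2_between <- function(theta) mean((theta - mean(theta))^2)

# Power with N subjects per group
anova_power <- function(N, theta, sigma2_w, alpha = 0.05) {
  K <- length(theta)
  cv <- qchisq(1 - alpha, df = K - 1)
  ncp <- K * N * sigma2_between(theta) / sigma2_w
  1 - pchisq(cv, df = K - 1, ncp = ncp)
}

# The five alternatives considered in the example
alternatives <- list(c(0, 0, 8), c(0, 2, 8), c(0, 4, 8), c(0, 6, 8), c(0, 8, 8))
sigma2_b_values <- sapply(alternatives, sigma2_between)

# Power at N = 40 per group for the alternative (0, 4, 8)
power_at_40 <- anova_power(40, theta = c(0, 4, 8), sigma2_w = 100)

# Smallest per-group N with at least 90% power
n_grid <- 2:200
n_required <- n_grid[which(anova_power(n_grid, c(0, 4, 8), 100) >= 0.90)[1]]
```

Searching over a grid of N is a simple alternative to inverting the non-central chi-square distribution, and it makes the dependence of power on N easy to inspect.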

Evaluating more than two unrelated treatments: sample size evaluation (one-way ANOVA)

Example in seqdesign (here sgma2w holds $\sigma_w^2 = 100$):

    > dsgn <- seqdesign(prob.model="mean",arms=3,variance=sgma2w,
    +                   alt.hypo=c(0,4,8), alpha=0.05,power = 0.90335)
    > dsgn
    Call:
    seqdesign(prob.model = "mean", arms = 3, alt.hypothesis = c(0, 4, 8),
        variance = sgma2w, power = 0.90335, alpha = 0.05)

    PROBABILITY MODEL and HYPOTHESES:
       Theta is between group variance of means
       One-sided hypothesis test of a greater alternative:
               Null hypothesis : Theta <= 0.00   (size  = 0.0500)
        Alternative hypothesis : Theta >= 10.67  (power = 0.9033)
       (Fixed sample test)

    STOPPING BOUNDARIES: Sample Mean scale
                           Futility  Efficacy
        Time 1 (N= 119.96)   4.9946    4.9946

    > dsgn$std.bounds
    STOPPING BOUNDARIES: Standardized Cumulative Sum scale
                      Futility  Efficacy
        Time 1 (N= 1)   5.9915    5.9915

Evaluating more than two unrelated treatments: sample size evaluation (one-way ANOVA)

Comments on ANOVA designs in seqdesign:
* There are some quirks:
  * Not possible to specify sample size and then calculate power
  * By default power = 1 - alpha, which can be confusing
* ANOVA designs are uncommon in RCTs.
* Interim decision-making is not implemented in RCTdesign:
  * It is difficult to specify reasons for early termination with more than two treatment groups.
  * See above comments regarding inferential objective.

Evaluating more than two unrelated treatments

Definition: The term "factorial design" refers to an experiment that evaluates several factors simultaneously.
* The most common application in the health sciences setting is a 2 × 2 factorial design of two different treatments (A vs B) and 4 treatment groups:
  * no A; no B
  * no A; B
  * A; no B
  * A; B
* I have also seen 2-factor designs with more than 2 levels (e.g., a 3 × 4 factorial design) or with more than 2 factors (e.g., a 2 × 2 × 2 factorial design).

Inferential objective: There are 3 common reasons for using a factorial design:
* (Classical) An efficient way to study 2 treatments separately.
* Study the interaction between two drugs.
* Selecting the best treatment from 4 different options.

Evaluating more than two unrelated treatments (statistical setting in a 2 × 2 factorial design)

Data: $Y_{ijk}$ = outcome for:
* $i$th subject ($i = 1, \ldots, N$)
* $j$th level of treatment A ($j = 0, 1$)
* $k$th level of treatment B ($k = 0, 1$)

Distribution: $\bar{Y}_{jk} \sim N(\theta_{jk}, \sigma^2/N)$.

Tests: Potential tests include (see next page):
* Effect of treatment A (marginal)
* Effect of treatment B (marginal)
* Interaction between treatment A and B.

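Evaluating more than two unrelated treatments (statistical setting in a 2 × 2 factorial design)

Effect of treatment A:

$$\frac{\bar{Y}_{10} + \bar{Y}_{11}}{2} - \frac{\bar{Y}_{00} + \bar{Y}_{01}}{2} \sim N\!\left(\theta_{1\cdot} - \theta_{0\cdot},\ \frac{\sigma^2}{N}\right) \quad \text{where } \theta_{j\cdot} = (\theta_{j0} + \theta_{j1})/2.$$

Effect of treatment B:

$$\frac{\bar{Y}_{01} + \bar{Y}_{11}}{2} - \frac{\bar{Y}_{00} + \bar{Y}_{10}}{2} \sim N\!\left(\theta_{\cdot 1} - \theta_{\cdot 0},\ \frac{\sigma^2}{N}\right) \quad \text{where } \theta_{\cdot k} = (\theta_{0k} + \theta_{1k})/2.$$

Test interaction: Is the effect of A if you receive B the same as its effect if you do not receive B?

$$(\bar{Y}_{11} - \bar{Y}_{01}) - (\bar{Y}_{10} - \bar{Y}_{00}) \sim N\!\left((\theta_{11} - \theta_{01}) - (\theta_{10} - \theta_{00}),\ \frac{4\sigma^2}{N}\right)$$

Selecting best of the 4 treatments: test for any evidence of group differences in a 1-way ANOVA.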
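The variances quoted for the main-effect and interaction contrasts follow from treating the four cell means as independent, each with variance $\sigma^2/N$. A short worked derivation, under that independence assumption:

```latex
\begin{align*}
\operatorname{Var}\!\left[\tfrac{1}{2}(\bar Y_{10}+\bar Y_{11})
      - \tfrac{1}{2}(\bar Y_{00}+\bar Y_{01})\right]
  &= \tfrac{1}{4}\cdot 4\cdot\frac{\sigma^2}{N} = \frac{\sigma^2}{N}, \\[4pt]
\operatorname{Var}\!\left[(\bar Y_{11}-\bar Y_{01})
      - (\bar Y_{10}-\bar Y_{00})\right]
  &= 4\cdot\frac{\sigma^2}{N} = \frac{4\sigma^2}{N}.
\end{align*}
```

Each main-effect contrast puts weight $\pm\tfrac{1}{2}$ on every cell mean, while the interaction contrast puts weight $\pm 1$ on every cell mean, which is why its variance is four times larger.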

Evaluating more than two unrelated treatments (statistical setting in a 2 × 2 factorial design)

Sample size evaluation: Let $\theta_+$ denote the smallest important difference.
* Test effects of treatment A and (separately) the effects of treatment B:

$$N_{main} = \left(\frac{z_\alpha + z_\beta}{\theta_+}\right)^2 \sigma^2$$

* Test for interaction:

$$N_{Xact} = \left(\frac{z_\alpha + z_\beta}{\theta_+}\right)^2 4\sigma^2$$

* Notes on designing for interaction:
  * $N_{Xact} = 4N_{main}$ when the same design alternative is used.
  * Interactions are usually smaller than main effects. When designing to test interactions, the design alternative ($\theta_+$) could be smaller, in which case $N_{Xact} > 4N_{main}$.
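The two formulas above are easy to compare numerically. This is a minimal sketch: the one-sided $\alpha = 0.025$, 90% power, $\sigma = 10$ and $\theta_+ = 5$ are hypothetical inputs chosen for illustration, and $N$ is the number of subjects in each of the 4 cells.

```r
# Sketch: per-cell sample sizes in a 2 x 2 factorial design.
# alpha (one-sided), power, sigma and theta_plus are illustrative assumptions.
n_main_effect <- function(theta_plus, sigma, alpha = 0.025, power = 0.90) {
  ((qnorm(1 - alpha) + qnorm(power)) / theta_plus)^2 * sigma^2
}

n_interaction <- function(theta_plus, sigma, alpha = 0.025, power = 0.90) {
  ((qnorm(1 - alpha) + qnorm(power)) / theta_plus)^2 * 4 * sigma^2
}

sigma <- 10
theta_plus <- 5

n_main      <- n_main_effect(theta_plus, sigma)
n_xact_same <- n_interaction(theta_plus, sigma)       # same design alternative
n_xact_half <- n_interaction(theta_plus / 2, sigma)   # interaction half as large
```

Halving the design alternative for the interaction multiplies the requirement by a further factor of 4, so `n_xact_half` is 16 times `n_main`.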

Evaluating more than two unrelated treatments (statistical setting in a 2 × 2 factorial design)

One-way ANOVA approach:
* $\delta^2 = \sigma_b^2/\sigma^2$, where $\sigma_b^2$ is calculated from the $\theta_{jk}$.
* If you want to select the sample size to detect when at least one treatment has effect $\theta_+$, then:
  * choosing $(\theta_{00}, \theta_{01}, \theta_{10}, \theta_{11}) = (0, 0, \theta_+, \theta_+)$ gives the smallest sample size ($\sigma_b^2 = \theta_+^2/4$).
  * choosing $(\theta_{00}, \theta_{01}, \theta_{10}, \theta_{11}) = (0, \tfrac{1}{3}\theta_+, \tfrac{2}{3}\theta_+, \theta_+)$ gives a larger sample size ($\sigma_b^2 = 5\theta_+^2/36$).

Evaluating more than two unrelated treatments (inferential objective)

Inferential objective:
* Treatment groups represent different doses of a drug.
* The FDA often requires evidence of a dose-response relationship.
* Some approaches to inference:
  * Identify two dose levels as being of primary interest.
  * Ask if any group differs from the others (ANOVA design).
  * Compare all dose levels to 0-dose (placebo) to determine the minimally effective dose.
  * Ask if there is evidence that response increases with dose.
* I will discuss the last setting. In my experience this is a common FDA motivation for dose-response studies.

Evaluating more than two unrelated treatments (statistical setting)

* Treatment groups are ordered by dose. Let $\mu_d$, $d = 0, \ldots, D$, denote the mean outcome in the ordered dose groups.
* The dose-response hypothesis is that response increases with dose. Usually monotonicity is also hypothesized; that is, $\mu_0 < \mu_D$ with

$$\mu_0 \le \mu_1 \le \cdots \le \mu_{D-1} \le \mu_D$$

* Notice that this definition divides the parameter space into null and alternative regions:
  * Null region: any non-monotonic or decreasing response.
  * Alternative region: monotonic increasing response.
* As with ANOVA, these regions are multi-dimensional, and so it is not possible to express the null and alternative hypotheses by one-dimensional inequalities.
* In contrast to ANOVA, the null region is bigger than $\mu_0 = \mu_1 = \cdots = \mu_D$.

Evaluating more than two unrelated treatments (statistical setting)

* The multi-dimensional regions can be reduced to a single parameter using an appropriate contrast, $\theta = \sum_d w_d \mu_d$.
* A linear-trend contrast is interpreted as the first-order approximation to the (potentially non-linear) dose-response function.
* If the true dose-response function is non-linear, then a linear-trend contrast will still generalize to populations treated with the same dose levels.
* Linear-trend contrast (equally-spaced dose levels): $w_d = d - \bar{d}$.
* Once the weights are rescaled so that $\sum_d w_d\, d = 1$, the contrast $\theta = \sum_d w_d \mu_d$ is interpretable as the change in response for a 1-level change in dose.

Evaluating more than two unrelated treatments (statistical setting)

Notes:
* There are other "linear" contrasts (e.g., orthogonal polynomials), but the above contrast is the linear contrast for a linear trend.
* The contrast can be rescaled to change the interpretation (e.g., change units).
* Unless you know that the dose-response relationship is exactly linear, it is better to fit a linear contrast across dose levels than to treat dose as continuous and estimate the slope parameter in a regression model. If dose is treated as continuous, then systematic departures from linearity are counted as residual variation, thereby decreasing power.

Evaluating more than two unrelated treatments (sample size evaluation)

Suppose that you have $D + 1$ dose groups ($d = 0, \ldots, D$), and that you want adequate power for dose-response functions with $\mu_D - \mu_0 \ge \mu_+$.

As above, consider a linear-trend contrast:
* $\theta = \sum_d w_d \mu_d$ where $w_d = d - \bar{d}$.
* Estimate $\theta$ by $\hat\theta = \sum_d w_d \bar{Y}_d$.
* If the variance is constant across dose groups, then with $N$ patients per group, $\hat\theta \sim N\!\left(\theta,\ \sum_d w_d^2\, \sigma^2/N\right)$.
* Assume $\mu_0 = 0$. If $\mu_d$ is in fact linear (i.e., $\mu_d = \frac{d}{D}\mu_D$), then the sample size (in each dose group) required for power $\beta$ when $\mu_D = \mu_+$ is given by:

$$N = \left(\frac{z_\alpha + z_\beta}{\sum_{d=0}^{D} w_d\, \frac{d}{D}\, \mu_+}\right)^2 \sum_{d=0}^{D} w_d^2\, \sigma^2$$

* You can also evaluate power under non-linear dose-response relationships; for example, if $\mu_d = \mu_D (d/D)^P$ for several choices of $P \ne 1$.

Evaluating more than two unrelated treatments (example)

Example: Consider a trial in which children are randomized to different doses of inhaled steroid for asthma control (Busse WW (1999) J. Allergy Clin Immunol; 1215-22).
* Suppose that doses will be 100µg/d, 400µg/d, and 800µg/d.
* Response will be measured by FEV1 (% predicted) at week 6, and the standard deviation is about 10 (% predicted).
* Suppose that you want to detect a linear trend in dose-response of 1 (% predicted) for every 100µg/d increase in steroid dose. This is similar to the magnitude of the dose-response relationship for FEV1 in the paper.

Evaluating more than two unrelated treatments (example)

Approach:
* Linear-trend contrast:
  * With 3 equally-spaced dose groups the contrast is $w_d = (-1, 0, 1)$.
  * This study does not have equally-spaced doses, so we use $w_d^* = (-1, -0.1, 1.1)$, which is proportional to $d - \bar{d}$ with dose $d$ in units of 100µg/d.
  * To get the correct interpretation we use $w_d = w_d^*/7.4$, so that $\sum_d w_d\, d = 1$.
  * If response increases by 1 (% predicted) for every 100µg/d increase in steroid, then, for example, $\mu_{100} = 20$, $\mu_{400} = 23$, and $\mu_{800} = 27$, so that $\sum_d w_d \mu_d = 1$.
* Sample size for linear trend (90% power):
  * variance $= \sum_d w_d^2\, \sigma^2 = 0.04054 \times 10^2 = 4.054$
  * Design alternative $= \theta_+ = 1$:

$$N = \left(\frac{z_\alpha + z_\beta}{1}\right)^2 \times 4.054 = \left(\frac{1.96 + 1.28}{1}\right)^2 \times 4.054 = 42.56$$

Evaluating more than two unrelated treatments (example)

Evaluate sensitivity of power to non-linear dose-response relationships:
* Suppose $\mu_d = 20 + 7\,(d/D)^P$, where $d/D$ denotes the fraction of the dose range covered, (dose - 100)/700, so that the mean response is 20 at 100µg/d and 27 at 800µg/d.
* A study with 43 subjects per group has the following power for non-linear relationships:

    P       θ+      Power
    0.20    0.961   0.879
    0.50    0.979   0.890
    0.80    0.993   0.898
    1.00    1.000   0.903
    1.25    1.008   0.907
    2.00    1.023   0.915
    5.00    1.039   0.923
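The sensitivity analysis can be sketched in base R. The curve family used below, $20 + 7 \times \text{frac}^P$ with frac the fraction of the dose range covered, is an assumption about how the table was generated; it appears to reproduce the tabulated $\theta_+$ and power values to rounding. The helper names `mean_response` and `contrast_power` are illustrative, and a one-sided $\alpha = 0.025$ is assumed.

```r
# Sketch: linear-trend contrast for the inhaled-steroid example.
dose  <- c(100, 400, 800) / 100   # dose in units of 100 ug/d
sigma <- 10                        # SD of FEV1 (% predicted)

# Contrast scaled so that sum(w * dose) = 1 (slope per 100 ug/d)
w <- (dose - mean(dose)) / sum((dose - mean(dose))^2)

# Var(theta_hat) = var_factor / N with N subjects per dose group
var_factor <- sum(w^2) * sigma^2

# Per-group sample size for 90% power at theta = 1 (one-sided alpha = 0.025)
n_per_group <- ((qnorm(0.975) + qnorm(0.90)) / 1)^2 * var_factor

# Assumed curve family: 20 at the lowest dose, 27 at the highest, with
# curvature P applied to the fraction of the dose range covered.
mean_response <- function(P) {
  frac <- (dose - min(dose)) / (max(dose) - min(dose))
  20 + 7 * frac^P
}

# Contrast value and power for a given curvature P and per-group N
contrast_power <- function(P, N, alpha = 0.025) {
  theta <- sum(w * mean_response(P))
  se <- sqrt(var_factor / N)
  c(P = P, theta = theta, power = pnorm(theta / se - qnorm(1 - alpha)))
}

P_values <- c(0.20, 0.50, 0.80, 1.00, 1.25, 2.00, 5.00)
sensitivity <- as.data.frame(t(sapply(P_values, contrast_power, N = 43)))
```

Using unrounded normal quantiles gives a per-group sample size slightly above the hand calculation with 1.96 and 1.28; either way the design rounds up to 43 subjects per group.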

Evaluating more than two unrelated treatments (example): Implementation in RCTdesign

R code for above calculations:

    > # Linear contrast:
    > d <- c(100,400,800)/100
    > wd <- d - mean(d)
    > wd <- wd/sum(wd^2)
    > # With this contrast thetaa is 1.0:
    > thetaa <- sum(wd*c(20,23,27))
    > thetaa
    [1] 1

Equivalent seqdesign command:

    > fxd <- seqdesign(arms=0,variance=100,ratio=c(1,4,8),
    +                  alt.hypo=1,power="calculate",sample.size=42.56*3)

    PROBABILITY MODEL and HYPOTHESES:
       Theta is difference in means per unit difference in treatment level
       One-sided hypothesis test of a greater alternative:
               Null hypothesis : Theta <= 0  (size  = 0.0250)
        Alternative hypothesis : Theta >= 1  (power = 0.8997)
       (Fixed sample test)

    STOPPING BOUNDARIES: Sample Mean scale
                           Futility  Efficacy
        Time 1 (N= 127.68)   0.6049    0.6049

Evaluating more than two unrelated treatments

Notes on normed contrasts:
* A normed contrast has length = 1; i.e., divide by $\sqrt{\sum_d w_d^2}$.
* A normed contrast may not preserve the units of interest (i.e., $\theta_A$ may not have the desired interpretation).

Dose-response example with normed contrast:

    > d <- c(100,400,800)/100
    > wd <- d - mean(d)
    > wdn <- wd/sqrt(sum(wd^2))
    > sum(wdn^2)
    [1] 1
    >
    > thetaa <- sum(wdn*c(20,23,27))
    > thetaa
    [1] 4.9666

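Consider a study for estimating the linear trend ($\theta_1$) in outcome $Y$ with a continuous explanatory variable ($X$):

$$E(Y_i) = \theta_0 + \theta_1 X_i$$

Least squares estimator:

$$\hat\theta_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2} \sim N\!\left(\theta_1,\ \frac{V}{N}\right)$$

where:

$$\frac{V}{N} = \frac{\sigma^2_{Y|X}}{\sum_i (X_i - \bar{X})^2} \approx \frac{\sigma^2_{Y|X}}{N\sigma^2_X}$$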

Sample size evaluation for regression designs

The usual sample size formula can be used with the variance of $\hat\theta_1$:

$$N = \left(\frac{z_\alpha + z_\beta}{\theta_{1+}}\right)^2 V = \left(\frac{z_\alpha + z_\beta}{\theta_{1+}}\right)^2 \frac{\sigma^2_{Y|X}}{\sigma^2_X}$$

For example, we require about 100 patients if we want 97.5% power for $\theta_{1+} = 0.25$ when $\sigma^2_{Y|X} = 2$ and $\sigma^2_X = 5$ (with $z_\alpha = z_\beta = 1.96$):

$$N = \left(\frac{3.92}{0.25}\right)^2 \times 0.4 \approx 98.3$$

The corresponding seqdesign command, assuming $X_i \sim N(0, \text{ratio})$, is:

    dsgn <- seqdesign(arms=0,variance=2,ratio=5,alt.hypo=0.25)

Logistic regression designs can be specified using the odds probability model (see user's guide).
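The regression sample size formula can be evaluated directly in base R. This is a minimal sketch: the function name `regression_n` is illustrative, and the defaults assume a one-sided $\alpha = 0.025$ with power $= 1 - \alpha = 0.975$, matching the 97.5% power used in the example.

```r
# Sketch: total sample size for testing a regression slope.
# regression_n is an illustrative helper, not an RCTdesign function.
regression_n <- function(theta_plus, sigma2_y_given_x, sigma2_x,
                         alpha = 0.025, power = 0.975) {
  V <- sigma2_y_given_x / sigma2_x          # variance factor for theta1_hat
  ((qnorm(1 - alpha) + qnorm(power)) / theta_plus)^2 * V
}

# Example from the text: theta+ = 0.25, residual variance 2, Var(X) = 5
n_total <- regression_n(theta_plus = 0.25, sigma2_y_given_x = 2, sigma2_x = 5)

# Spreading out the covariate lowers the requirement: doubling Var(X) halves N
n_wider_x <- regression_n(theta_plus = 0.25, sigma2_y_given_x = 2, sigma2_x = 10)
```

Because $N$ is inversely proportional to $\sigma^2_X$, a design that samples a wider range of the explanatory variable needs proportionally fewer subjects for the same slope alternative.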

(e) Designs using change from baseline endpoints

* In many trials the primary endpoint is the change from baseline in a continuous outcome measure.
* Examples:
  * Change in FEV1
  * Change in 6-minute walk distance
  * Change in weight (or BMI)
* Motivation for analyzing change:
  * Looking at change within a subject removes between-subject variability
  * May reduce variation (increase power)
* Which is the most precise estimate of treatment effect?
  * T-test of difference in mean at follow-up?
  * T-test of difference in change from baseline?
  * Regression (ANCOVA) of difference in follow-up given baseline?

(e) Designs using change from baseline endpoints

Setting:
* Let $Y_{i0k}$ and $Y_{i1k}$ denote a measurement on the $i$th subject in treatment group $k$ ($k = 0, 1$) at baseline and follow-up.
* Let $D_{ik} = Y_{i1k} - Y_{i0k}$.
* Consider the following statistical models for testing the treatment effect ($Tx = 1_{[k=1]}$):
  (a) Test difference at follow-up time: $E(Y_{i1k}) = \alpha_0 + \alpha_1 Tx$
  (b) Test change from baseline: $E(D_{ik}) = \beta_0 + \beta_1 Tx$
  (c) Test follow-up measure given baseline measure (ANCOVA): $E(Y_{i1k} \mid Y_{i0k}) = \gamma_0 + \gamma_1 Tx + \gamma_2 Y_{i0k}$

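Designs using change from baseline endpoints

Probability model: Suppose that

$$\begin{pmatrix} Y_{i0k} \\ Y_{i1k} \end{pmatrix} \sim N\!\left[\begin{pmatrix} \mu_{00} \\ \mu_{1k} \end{pmatrix},\ \Sigma\right] \quad \text{where} \quad \Sigma = \begin{bmatrix} \sigma^2 & \rho\sigma^2 \\ \rho\sigma^2 & \sigma^2 \end{bmatrix}$$

Notice that $\mu_{0k} = \mu_{01} = \mu_{00}$ by randomization.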

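Designs using change from baseline endpoints

Distribution of estimated treatment effects from the 3 statistical models with $i = 1, \ldots, N$ subjects per group:

(a) $E(Y_{i1k}) = \alpha_0 + \alpha_1 Tx$:

$$\hat\alpha_1 \sim N\!\left(\mu_{11} - \mu_{10},\ \frac{2\sigma^2}{N}\right)$$

(b) $E(D_{ik}) = \beta_0 + \beta_1 Tx$:

$$\hat\beta_1 \sim N\!\left(\mu_{11} - \mu_{10},\ \frac{4\sigma^2(1-\rho)}{N}\right)$$

(c) $E(Y_{i1k} \mid Y_{i0k}) = \gamma_0 + \gamma_1 Tx + \gamma_2 Y_{i0k}$:

$$\hat\gamma_1 \sim N\!\left(\mu_{11} - \mu_{10},\ \frac{2\sigma^2(1-\rho^2)}{N}\right)$$

All three estimators target the difference in follow-up means, $\mu_{11} - \mu_{10}$; notice:
* $\alpha_1 = \mu_{11} - \mu_{10}$
* $\beta_1 = (\mu_{11} - \mu_{00}) - (\mu_{10} - \mu_{00}) = \mu_{11} - \mu_{10}$
* $\gamma_1 = \mu_{11} - \mu_{10}$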
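The three variances follow from the bivariate normal model for baseline and follow-up. A short worked derivation for a single subject, with the ANCOVA result stated as a large-sample approximation that ignores estimation of $\gamma_2$:

```latex
\begin{align*}
\operatorname{Var}(Y_{i1k}) &= \sigma^2, \\
\operatorname{Var}(D_{ik}) &= \operatorname{Var}(Y_{i1k}) + \operatorname{Var}(Y_{i0k})
      - 2\operatorname{Cov}(Y_{i1k}, Y_{i0k})
   = 2\sigma^2(1-\rho), \\
\operatorname{Var}(Y_{i1k} \mid Y_{i0k}) &= \sigma^2(1-\rho^2).
\end{align*}
```

Each treatment-effect estimator is a difference between two independent group means of $N$ subjects, so each per-subject variance is multiplied by $2/N$, giving $2\sigma^2/N$, $4\sigma^2(1-\rho)/N$ and $2\sigma^2(1-\rho^2)/N$.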

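Designs using change from baseline endpoints

Relative efficiency of the designs using the 3 statistical models:

(a) vs (b): When is $\operatorname{var}(\hat\beta_1) < \operatorname{var}(\hat\alpha_1)$?

$$\operatorname{var}(\hat\beta_1) < \operatorname{var}(\hat\alpha_1) \iff 4\sigma^2(1-\rho) < 2\sigma^2 \iff 0.5 < \rho < 1.0$$

(a) vs (c): When is $\operatorname{var}(\hat\gamma_1) < \operatorname{var}(\hat\alpha_1)$?

$$\operatorname{var}(\hat\gamma_1) < \operatorname{var}(\hat\alpha_1) \iff 2\sigma^2(1-\rho^2) < 2\sigma^2 \iff \rho \ne 0 \quad (\text{in particular, } 0.0 < \rho < 1.0)$$

(b) vs (c): When is $\operatorname{var}(\hat\gamma_1) < \operatorname{var}(\hat\beta_1)$?

$$\operatorname{var}(\hat\gamma_1) < \operatorname{var}(\hat\beta_1) \iff 2\sigma^2(1-\rho^2) < 4\sigma^2(1-\rho) \iff \rho < 1 \quad (\text{in particular, } 0.0 < \rho < 1.0)$$

So, conditioning on baseline (ANCOVA) is more efficient (see also examples/proof in handout).

Note: $\hat\gamma_1$ is the same regardless of whether $Y_{i1k}$ or $D_{ik}$ is used as the response variable (try it).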
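Both the efficiency comparison and the closing "try it" note can be checked numerically. This is a sketch with hypothetical inputs: the correlations, the treatment effect of 0.5, N = 200 per group, unit variances, and the seed are illustrative assumptions, and `estimator_variances` is an illustrative helper name.

```r
# Sketch: variances of the three estimators as a function of rho.
estimator_variances <- function(rho, sigma2 = 1, N = 1) {
  c(follow_up = 2 * sigma2 / N,
    change    = 4 * sigma2 * (1 - rho) / N,
    ancova    = 2 * sigma2 * (1 - rho^2) / N)
}

v_low  <- estimator_variances(0.3)   # change score is worse than follow-up only
v_half <- estimator_variances(0.5)   # change score and follow-up only tie
v_high <- estimator_variances(0.8)   # change score beats follow-up only

# Simulated trial (all values hypothetical) to check that the ANCOVA
# treatment coefficient does not depend on the choice of response.
set.seed(2015)
N <- 200
rho <- 0.6
effect <- 0.5
tx <- rep(c(0, 1), each = N)                  # treatment indicator
y0 <- rnorm(2 * N)                            # baseline
y1 <- rho * y0 + sqrt(1 - rho^2) * rnorm(2 * N) + effect * tx   # follow-up
d  <- y1 - y0                                 # change from baseline

gamma1_followup <- unname(coef(lm(y1 ~ tx + y0))["tx"])
gamma1_change   <- unname(coef(lm(d  ~ tx + y0))["tx"])
```

The two fitted treatment coefficients agree because subtracting the baseline from the response only shifts the baseline coefficient by 1; the treatment coefficient is unchanged.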