BIOS 6649: Handout Exercise Solution

NOTE: I encourage you to work together, but the work you submit must be your own. Any plagiarism will result in loss of all marks.

This assignment is based on weight-loss data in the file WeightLoss.csv. The data set has 11 columns: a subject id, the treatment group, and weights at times (0, 6, 12, 18, 24, 30, 36, 42, 48) months. Read the data into R.

Note: The file WeightLoss.R contains relevant R code. In that file I have standardized the contrasts that are discussed in the problem so that the results are on the same scale. You can submit either the standardized or non-standardized contrasts for your answers.

1. Use the observed weight loss data to answer the following:

(a) Plot the data:
  i. Plot the data for each subject in the control group on a single graph (one line per subject).
  ii. Plot the data for each subject in the intervention group on a single graph (one line per subject).

Answer:

[Figure: two panels, "Group 1" and "Group 0", each showing Weight (axis from 60 to 180) against Months post randomization (axis from 0 to 40), one line per subject.]

(b) Calculate the contrast corresponding to the average change from baseline and its variance for each treatment group (separately). Specifically (using R):
  - Get the average in each group.
  - Get the covariance matrix in each group.
  - Create the contrast vector: $w = (-1, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8)$.
  - Use the above elements to answer the question (see source file).

(c) What is the between-group difference in the above contrast; specifically:
  i. What is the value of the between-group difference in the average change contrast?
  ii. What is the standard error of the between-group difference in the average change contrast?
  iii. What is the 95% confidence interval for the between-group difference in the average change contrast?

(d) Repeat part (c) using the 48-month change contrast: $w = (-1, 0, 0, 0, 0, 0, 0, 0, 1)$. Submit the value for the contrast, its standard error, and the ratio between the contrast value and the standard error.

(e) Repeat part (c) using the 48-month linear-trend (slope) contrast: $w = (-4, -3, -2, -1, 0, 1, 2, 3, 4)$. Submit the value for the contrast, its standard error, and the ratio between the contrast value and the standard error.

Answers to problem 1 (with standardized contrast):

Contrast         Control avg (sd)     Intervention avg (sd)   Difference   SE       95% CI                 Diff/SE*
Average change   -0.122 (3.633)       -4.725 (5.117)          -4.603       0.583    (-5.746, -3.460)       -7.895
48-mo change     -0.218 (3.975)       -1.765 (3.930)          -1.547       0.523    (-2.571, -0.522)       -2.958
Linear trend     -0.0003 (0.1651)     -0.0051 (0.1521)        -0.0049      0.0210   (-0.0461, 0.0363)      -0.2322

* Between-group difference divided by the standard error (i.e., the t-statistic).

Answers to problem 1 (without standardization of contrast):

Contrast         Control avg (sd)     Intervention avg (sd)   Difference   SE       95% CI                 Diff/SE*
Average change   -0.138 (4.087)       -5.316 (5.757)          -5.179       0.656    (-6.464, -3.893)       -7.895
48-mo change     -0.436 (7.950)       -3.529 (7.860)          -3.093       1.046    (-5.143, -1.043)       -2.958
Linear trend     -0.0909 (59.4347)    -1.8487 (54.7637)       -1.7578      7.5707   (-16.5964, 13.0808)    -0.2322
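The recipe in parts (b) and (c) can be sketched in a few lines. This is a minimal illustration in Python rather than the course's WeightLoss.R, and it uses a made-up three-visit example: the function names, the toy means, the toy covariance matrix, and the group sizes are all hypothetical and are not taken from WeightLoss.csv.

```python
import math

def contrast_value(w, mean):
    """Contrast w'ybar for one group."""
    return sum(wi * mi for wi, mi in zip(w, mean))

def contrast_variance(w, cov):
    """Per-subject variance of the contrast, w' S w."""
    k = len(w)
    return sum(w[i] * cov[i][j] * w[j] for i in range(k) for j in range(k))

def between_group_summary(w, mean0, cov0, n0, mean1, cov1, n1, z=1.96):
    """Difference in contrasts (group 1 minus group 0), its SE, 95% CI, and Diff/SE."""
    diff = contrast_value(w, mean1) - contrast_value(w, mean0)
    se = math.sqrt(contrast_variance(w, cov0) / n0 + contrast_variance(w, cov1) / n1)
    return {"diff": diff, "se": se, "ci": (diff - z * se, diff + z * se), "ratio": diff / se}

# Hypothetical toy data: baseline plus two follow-up visits (NOT the handout's data).
w = [-1.0, 0.5, 0.5]                      # average change from baseline
mean0 = [100.0, 100.0, 100.0]             # control group means
mean1 = [100.0, 96.0, 94.0]               # intervention group means
cov = [[16.0, 12.0, 12.0],
       [12.0, 16.0, 12.0],
       [12.0, 12.0, 16.0]]                # same toy covariance in both groups

result = between_group_summary(w, mean0, cov, 50, mean1, cov, 50)
print(result)
```

With the real data, the group means and covariance matrices would come from the nine weight columns, and w would be one of the three nine-element contrast vectors.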

Suppose that you are considering a new weight intervention study. It will enroll 500 subjects per group. Problems 2-4 illustrate how you might evaluate the effect size for various time-trajectory contrasts over different values for the true mean weight loss trajectory. Use the observed covariance matrix from the data in problem 1 to answer these questions.

2. Suppose that the mean weight change at the measurement times is $\mu_0 = (0, 0, 0, 0, 0, 0, 0, 0, 0)$ (control group) and $\mu_1 = (0, -8, -8, -8, -8, -8, -8, -8, -8)$ in the treatment group.

(a) What is $\theta$ if the time trajectory is summarized by the average change contrast: $w = (-1, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8)$? What is the anticipated standard error of $\hat\theta$? What is the ratio between the anticipated effect and its standard error?

(b) What is $\theta$ if the time trajectory is summarized by the 48-month change contrast: $w = (-1, 0, 0, 0, 0, 0, 0, 0, 1)$? What is the anticipated standard error of $\hat\theta$? What is the ratio between the anticipated effect and its standard error?

(c) What is $\theta$ if the time trajectory is summarized by the linear-trend (slope) contrast: $w = (-4, -3, -2, -1, 0, 1, 2, 3, 4)$? What is the anticipated standard error of $\hat\theta$? What is the ratio between the anticipated effect and its standard error?

Answers to problem 2:

Contrast         Control avg (sd)     Intervention avg (sd)   Difference   SE       95% CI                 Diff/SE*
Average change   0.000 (3.633)        -7.111 (5.117)          -7.111       0.281    (-7.661, -6.561)       -25.339
48-mo change     0.000 (3.975)        -4.000 (3.930)          -4.000       0.250    (-4.490, -3.510)       -16.001
Linear trend     0.0000 (0.1651)      -0.0889 (0.1521)        -0.0889      0.0100   (-0.1086, -0.0692)     -8.8537

* Between-group difference divided by the standard error (i.e., the t-statistic).
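As a quick check on how the anticipated standard error scales with 500 subjects per group, the sketch below recomputes the SE, 95% interval, and Diff/SE for the average change row of the problem 2 table from the tabulated difference and standard deviations. It is written in Python for illustration; the function name `two_group_se` is my own, and the 1.96 multiplier assumes a normal approximation.

```python
import math

def two_group_se(sd0, sd1, n_per_group):
    """SE of a difference in group means of a contrast, equal group sizes."""
    return math.sqrt((sd0 ** 2 + sd1 ** 2) / n_per_group)

# Values copied from the "Average change" row of the problem 2 answer table.
diff = -7.111
sd_control = 3.633
sd_intervention = 5.117
n_per_group = 500

se = two_group_se(sd_control, sd_intervention, n_per_group)
ci = (diff - 1.96 * se, diff + 1.96 * se)
ratio = diff / se

print(f"SE = {se:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), Diff/SE = {ratio:.2f}")
```

The result agrees with the tabulated row up to rounding of the inputs.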

3. Suppose that the mean weight change at the measurement times is $\mu_0 = (0, 0, 0, 0, 0, 0, 0, 0, 0)$ (control group) and $\mu_1 = (0, -8, -7, -6, -5, -4, -3, -2, -1)$ in the treatment group.

(a) What is $\theta$ if the time trajectory is summarized by the average change contrast: $w = (-1, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8)$? What is the anticipated standard error of $\hat\theta$? What is the ratio between the anticipated effect and its standard error?

(b) What is $\theta$ if the time trajectory is summarized by the 48-month change contrast: $w = (-1, 0, 0, 0, 0, 0, 0, 0, 1)$? What is the anticipated standard error of $\hat\theta$? What is the ratio between the anticipated effect and its standard error?

(c) What is $\theta$ if the time trajectory is summarized by the linear-trend (slope) contrast: $w = (-4, -3, -2, -1, 0, 1, 2, 3, 4)$? What is the anticipated standard error of $\hat\theta$? What is the ratio between the anticipated effect and its standard error?

Answers to problem 3:

Contrast         Control avg (sd)     Intervention avg (sd)   Difference   SE       95% CI                 Diff/SE*
Average change   0.000 (3.633)        -4.000 (5.117)          -4.000       0.281    (-4.550, -3.450)       -14.253
48-mo change     0.000 (3.975)        -0.500 (3.930)          -0.500       0.250    (-0.990, -0.010)       -2.000
Linear trend     0.0000 (0.1651)      0.0667 (0.1521)         0.0667       0.0100   (0.0470, 0.0863)       6.6403

* Between-group difference divided by the standard error (i.e., the t-statistic).

4. Suppose that the mean weight change at the measurement times is $\mu_0 = (0, 0, 0, 0, 0, 0, 0, 0, 0)$ (control group) and $\mu_1 = (0, -1, -2, -3, -4, -5, -6, -7, -8)$ in the treatment group.

(a) What is $\theta$ if the time trajectory is summarized by the average change contrast: $w = (-1, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8)$? What is the anticipated standard error of $\hat\theta$? What is the ratio between the anticipated effect and its standard error?

(b) What is $\theta$ if the time trajectory is summarized by the 48-month change contrast: $w = (-1, 0, 0, 0, 0, 0, 0, 0, 1)$? What is the anticipated standard error of $\hat\theta$? What is the ratio between the anticipated effect and its standard error?

(c) What is $\theta$ if the time trajectory is summarized by the linear-trend (slope) contrast: $w = (-4, -3, -2, -1, 0, 1, 2, 3, 4)$? What is the anticipated standard error of $\hat\theta$? What is the ratio between the anticipated effect and its standard error?

Answers to problem 4:

Contrast         Control avg (sd)     Intervention avg (sd)   Difference   SE       95% CI                 Diff/SE*
Average change   0.000 (3.633)        -4.000 (5.117)          -4.000       0.281    (-4.550, -3.450)       -14.253
48-mo change     0.000 (3.975)        -4.000 (3.930)          -4.000       0.250    (-4.490, -3.510)       -16.001
Linear trend     0.0000 (0.1651)      -0.1667 (0.1521)        -0.1667      0.0100   (-0.1863, -0.1470)     -16.6008

* Between-group difference divided by the standard error (i.e., the t-statistic).
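To see how each contrast orders the three hypothetical trajectories in problems 2-4, the sketch below evaluates the raw (non-standardized) contrasts $\theta = w'\mu_1 - w'\mu_0$ in Python. It takes weight losses as negative changes, an assumption chosen to match the negative values in the answer tables. The tables report standardized contrasts, so the raw values differ from them by a constant factor per contrast; dividing the raw slope contrast by 360 happens to reproduce the tabulated linear-trend values, but that scaling is my inference from the numbers, not something the handout states.

```python
# Raw contrast vectors from the handout (nine visits: 0, 6, ..., 48 months).
contrasts = {
    "average change": [-1.0] + [0.125] * 8,
    "48-mo change":   [-1.0, 0, 0, 0, 0, 0, 0, 0, 1.0],
    "linear trend":   [-4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0],
}

# Hypothetical treatment-group mean changes; losses entered as negative (assumption).
mu0 = [0.0] * 9
trajectories = {
    "problem 2": [0.0] + [-8.0] * 8,                                # lose and maintain
    "problem 3": [0.0, -8.0, -7.0, -6.0, -5.0, -4.0, -3.0, -2.0, -1.0],  # lose, then regain
    "problem 4": [0.0, -1.0, -2.0, -3.0, -4.0, -5.0, -6.0, -7.0, -8.0],  # slow steady loss
}

def theta(w, mu1, mu0):
    """Between-group difference in the contrast, w'mu1 - w'mu0."""
    return sum(wi * (m1 - m0) for wi, m1, m0 in zip(w, mu1, mu0))

raw = {
    (problem, name): theta(w, mu1, mu0)
    for problem, mu1 in trajectories.items()
    for name, w in contrasts.items()
}

for (problem, name), value in raw.items():
    print(f"{problem:10s} {name:15s} theta = {value:7.2f}")

# Assumed rescaling of the slope contrast to the tabulated scale (inferred, not sourced).
slope_scaled = {p: raw[(p, "linear trend")] / 360 for p in trajectories}
```

Note that only the slope contrast changes sign across the trajectories, which is the behavior discussed in problem 5.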

5. Use your answers to the above problems in answering the following:

(a) Which of the above hypothetical mean weight loss trajectories ($\mu_1$) is most likely to be associated with beneficial health outcomes? Which trajectory is not very likely to result in good outcomes? Give a reason for your answer.

Answer: In general we think that losing weight and keeping it off has the most benefit (trajectory in problem 2). Weight rebound is not good, although the trajectory in problem 3 shows gradual long-term weight regain rather than short-term loss followed by immediate regain. The trajectory in problem 4 is probably of intermediate health benefit; certainly slow steady weight loss is better than slow steady weight gain.

(b) Based on the calculations in problems 1-4, recommend a contrast w for use in the new trial. Give a reason for your choice.

Answer: As we have discussed in class, our choice of summary measure (contrast) will induce an order on the longitudinal outcome space. The orders will differ according to the summary measure that is selected. This is true regardless of whether you use summary measures (as above) or a mixed-model repeated-measures ANOVA summary (as demonstrated in class). The contrast should be selected to be sensitive (large) to true mean trajectories that are felt to convey health benefits, and not sensitive (small) for trajectories that are not likely to have long-term health benefits. As argued above, it is probably best to lose weight and keep it off (problem 2), it is probably bad to lose weight and regain it (problem 3), and of intermediate benefit to lose weight slowly over time (problem 4).

The following conclusions are based on the last column of the results tables for problems 2-4:

  - The slope contrast is very sensitive to trends. In problem 3 it would find that the intervention group was significantly worse than control (positive value in the last column). It would find that the intervention was highly beneficial if weight decreased linearly over time (problem 4). It is not as sensitive as the other contrasts to long-term maintenance of weight loss (problem 2).
  - The average change contrast is sensitive to all of these weight loss trajectories (particularly the long-term weight loss in problem 2). It might not be desirable that it is sensitive to the weight rebound example (problem 3).
  - The 48-month change contrast seems to be most sensitive to the changes that are likely to have the greatest health benefits and least sensitive to trajectories that are less likely to be beneficial.

I am inclined toward the 48-month change contrast. I would work with study investigators to make sure I understood the potential health benefits associated with each of the trajectories. I would discuss the above evaluation with the investigator team to make sure that we were all in agreement as to the best contrast.

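6. Suppose that you have a randomized clinical trial with 2 groups and n participants per group. Suppose that you have a baseline measurement and a follow-up measurement on every participant. Denote the data by $Y_{0ik}$ and $Y_{1ik}$ for the baseline and follow-up measurements in the ith participant ($i = 1, \ldots, n$) of the kth treatment group ($k = 0, 1$). Suppose that

$$\begin{pmatrix} Y_{0ik} \\ Y_{1ik} \end{pmatrix} \sim N\left[ \begin{pmatrix} \mu_{0k} \\ \mu_{1k} \end{pmatrix}, \Sigma \right] \qquad (1)$$

where

$$\Sigma = \begin{pmatrix} \sigma^2 & \rho\sigma^2 \\ \rho\sigma^2 & \sigma^2 \end{pmatrix}.$$

Below, $\bar{Y}_{jk}$ denotes the sample mean of $Y_{jik}$ over the n participants in group k.

(a) Find the variance of the between-group mean difference at the last measurement time; i.e., $\mathrm{var}(\bar{Y}_{11} - \bar{Y}_{10})$.

(b) Find the variance of the between-group mean difference in the change from baseline to follow-up; i.e., $\mathrm{var}[(\bar{Y}_{11} - \bar{Y}_{01}) - (\bar{Y}_{10} - \bar{Y}_{00})]$.

(c) Find the conditions under which the variance in 6a is smaller than the variance in 6b.

(d) What are the implications of this result for study design (or data analysis)?

Answer:

(a) $\mathrm{var}(\bar{Y}_{11} - \bar{Y}_{10}) = 2\sigma^2/n$

(b) Because the two groups are independent,

$$\begin{aligned}
\mathrm{var}[(\bar{Y}_{11} - \bar{Y}_{01}) - (\bar{Y}_{10} - \bar{Y}_{00})]
&= \mathrm{var}(\bar{Y}_{11} - \bar{Y}_{01}) + \mathrm{var}(\bar{Y}_{10} - \bar{Y}_{00}) \\
&= 2\sigma^2/n - 2\rho\sigma^2/n + 2\sigma^2/n - 2\rho\sigma^2/n \\
&= 4\sigma^2(1 - \rho)/n
\end{aligned}$$

(c) The second measure is more efficient when its variance is smaller than that of the first measure:

$$\begin{aligned}
\mathrm{var}[(\bar{Y}_{11} - \bar{Y}_{01}) - (\bar{Y}_{10} - \bar{Y}_{00})] &< \mathrm{var}(\bar{Y}_{11} - \bar{Y}_{10}) \\
4\sigma^2(1 - \rho)/n &< 2\sigma^2/n \\
(1 - \rho) &< 0.5 \\
\rho &> 0.5
\end{aligned}$$

Conversely, the variance in 6a is the smaller of the two when $\rho < 0.5$.

(d) Thus, the correlation must be fairly strong before it is more efficient to measure the difference between treatment groups by the difference in the average change as opposed to the difference in the average outcome at follow-up.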
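The two variances in problem 6 are both of the form $2\,w'\Sigma w/n$, with $w = (0, 1)$ for the follow-up comparison and $w = (-1, 1)$ for the change comparison. A minimal Python sketch of that calculation follows; the function name and the numeric values of $\sigma^2$, $\rho$, and n are illustrative assumptions.

```python
def mean_contrast_variance(w, sigma2, rho, n):
    """Variance of the between-group difference in the mean of w'(Y0, Y1).

    Uses the compound-symmetric covariance from equation (1); the factor 2
    reflects two independent groups of n participants each.
    """
    cov = [[sigma2, rho * sigma2],
           [rho * sigma2, sigma2]]
    per_subject = sum(w[i] * cov[i][j] * w[j] for i in range(2) for j in range(2))
    return 2 * per_subject / n

followup = [0.0, 1.0]    # follow-up measurement only
change = [-1.0, 1.0]     # follow-up minus baseline

# Illustrative values (assumed, not from the handout's data).
for rho in (0.2, 0.5, 0.8):
    v_follow = mean_contrast_variance(followup, sigma2=1.0, rho=rho, n=100)
    v_change = mean_contrast_variance(change, sigma2=1.0, rho=rho, n=100)
    print(f"rho = {rho}: follow-up {v_follow:.4f}, change {v_change:.4f}")
```

The change analysis has the smaller variance only for the largest of the three correlations, in line with the $\rho > 0.5$ condition.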

7. Use the distribution from equation (1) when answering the following:

(a) What is the interpretation of $\beta_2$ in the following regression model?

$$E(Y_{1ik}) = \beta_0 + \beta_1 Y_{0ik} + \beta_2 1_{[k=1]}$$

where $1_{[k=1]}$ is the indicator function for treatment group 1.

(b) What is the variance of $\hat\beta_2$ (estimated using linear regression)? [Hint: See last page for a summary of the variance of regression coefficients.]

(c) Find the conditions under which the variance of $\hat\beta_2$ is smaller than either the variance of 6a or 6b.

(d) What are the implications of this result for study design (or data analysis)?

Answer:

(a) $\beta_2$ is the expected difference between treatment groups in two populations that have the same mean baseline level of the outcome variable.

(b) To apply these identities in this problem you need to recognize that X in the identity on the last page represents the treatment group indicator ($1_{[k=1]}$) and Z represents the baseline measure ($Y_{0ik}$). You then need to recognize that $\rho$ is the correlation between baseline and follow-up measures and that it is the same in both treatment groups; thus, $\rho_{YZ|X} = \rho$. Furthermore, within each treatment group the variance of the outcome measure is the same; thus, $\sigma^2_{Y|X} = \sigma^2$. Finally, you need to recognize that the sample size in the identities is the total number of participants in the trial; thus $N = 2n$, where n is the number per group. Now from the identities:

$$\mathrm{var}(\hat\beta_2) = \frac{\sigma^2_{Y|XZ}}{N\sigma^2_{X|Z}} = \frac{\sigma^2_{Y|X}(1 - \rho^2_{YZ|X})}{N\sigma^2_{X|Z}} = \frac{\sigma^2(1 - \rho^2)}{2n\sigma^2_{X|Z}}$$

Now, note that $\sigma^2_{X|Z}$ represents the variance of the treatment group indicator given the baseline level of the outcome. Since treatment groups were randomly assigned, the treatment group indicator takes the value 1 with probability 0.5 and the value 0 with probability 0.5 regardless of baseline value; thus $E(X \mid Z) = 0.5$ and:

$$\sigma^2_{X|Z} = E[(X - E(X))^2] = (1 - 0.5)^2 \cdot 0.5 + (0 - 0.5)^2 \cdot 0.5 = 0.25$$

It follows that

$$\mathrm{var}(\hat\beta_2) = 2\sigma^2(1 - \rho^2)/n$$
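The substitution in part (b) can be checked numerically. The sketch below, in Python, plugs $N = 2n$, $\sigma^2_{Y|XZ} = \sigma^2(1 - \rho^2)$, and $\sigma^2_{X|Z} = p(1 - p)$ into the identity from the last page; the function name `var_beta2` and the parameter `p_treat` (the randomization probability, 0.5 here) are my own labels, and the numeric inputs are arbitrary illustrations.

```python
def var_beta2(sigma2, rho, n_per_group, p_treat=0.5):
    """Large-sample variance of the adjusted treatment coefficient.

    Applies var(beta2_hat) = sigma2_{Y|XZ} / (N * sigma2_{X|Z}) with
    N = 2n total participants and a Bernoulli(p_treat) treatment indicator
    that is independent of baseline because of randomization.
    """
    total_n = 2 * n_per_group
    var_y_given_xz = sigma2 * (1 - rho ** 2)
    var_x_given_z = p_treat * (1 - p_treat)   # 0.25 under 1:1 randomization
    return var_y_given_xz / (total_n * var_x_given_z)

# Arbitrary illustrative inputs (assumed).
sigma2, rho, n = 4.0, 0.6, 50
via_identity = var_beta2(sigma2, rho, n)
closed_form = 2 * sigma2 * (1 - rho ** 2) / n
print(via_identity, closed_form)
```

Both expressions give the same number, which is the point of the algebra in part (b).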

(c) The variance of the regression analysis is smaller than that of the analysis of the difference in change whenever:

$$\begin{aligned}
\mathrm{var}(\hat\beta_2) &< \mathrm{var}[(\bar{Y}_{11} - \bar{Y}_{01}) - (\bar{Y}_{10} - \bar{Y}_{00})] \\
2\sigma^2(1 - \rho^2) &< 4\sigma^2(1 - \rho) \\
(1 - \rho)(1 + \rho) &< 2(1 - \rho) \\
1 + \rho &< 2 \\
\rho &< 1
\end{aligned}$$

which holds for any correlation short of perfect; therefore the regression analysis is more efficient than the analysis of change.

The variance of the regression analysis is smaller than that of the analysis of follow-up measures whenever:

$$\begin{aligned}
\mathrm{var}(\hat\beta_2) &< \mathrm{var}(\bar{Y}_{11} - \bar{Y}_{10}) \\
2\sigma^2(1 - \rho^2) &< 2\sigma^2 \\
(1 - \rho^2) &< 1
\end{aligned}$$

which holds whenever $\rho \neq 0$ (the two variances are equal at $\rho = 0$); therefore the regression analysis is never less efficient than the analysis of follow-up measures.

(d) The above proof shows that, in large samples, it is at least as efficient to condition on the baseline value in a regression analysis when analyzing pre-post (before-after) outcomes. There are two caveats that you should realize before applying this uniformly:

  - The above result assumes that n is large because I have ignored degrees of freedom. If you include degrees of freedom, then the regression method is more efficient as long as there are more than about 15 subjects per group.
  - I recommend using the regression method in a randomized trial, where the distribution of baseline values is the same in both treatment groups. In an observational study, where the distribution of the baseline value may differ between two exposure categories, the regression analysis may estimate a different quantity than the one estimated by an analysis of change (i.e., one of the analyses is biased relative to the other).
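To make the comparison in part (c) concrete, the following Python sketch tabulates the three large-sample variances from problems 6 and 7 over a grid of correlations. The function name, the grid, and the choice $\sigma^2 = 1$, $n = 100$ are illustrative assumptions; degrees-of-freedom corrections are ignored, as in the derivation.

```python
def analysis_variances(sigma2, rho, n):
    """Large-sample variances of the three treatment-effect estimators."""
    return {
        "followup": 2 * sigma2 / n,                   # problem 6(a)
        "change": 4 * sigma2 * (1 - rho) / n,         # problem 6(b)
        "regression": 2 * sigma2 * (1 - rho ** 2) / n,  # problem 7(b)
    }

rho_grid = [0.0, 0.25, 0.5, 0.75, 0.9]
table = {rho: analysis_variances(1.0, rho, 100) for rho in rho_grid}

for rho, v in table.items():
    print(f"rho = {rho:4.2f}  follow-up {v['followup']:.4f}  "
          f"change {v['change']:.4f}  regression {v['regression']:.4f}")
```

Across the grid the regression variance never exceeds either alternative, matches the follow-up analysis at $\rho = 0$, and approaches the change analysis as $\rho$ nears 1.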

8. In problem 7, what is the probability model, the functional, and the contrast that define the statistical model for scientific inference?

Answer: It is not necessary to assume that the data are normally distributed; estimated regression coefficients will be approximately normally distributed by a form of the central limit theorem. Similarly, it is not necessary to assume that there is a linear relationship between baseline and follow-up measurements; the coefficient $\beta_1$ is the first-order approximation to the nonlinear relationship between baseline and follow-up measures.

Probability Model: Non-parametric (assuming only that the estimated regression coefficients are approximately normally distributed).
Functional: Mean level of outcome, conditional on baseline level.
Contrast: Difference in mean outcome levels (conditional on baseline).

Variance of a conditional coefficient in multiple linear regression

Consider the linear regression model:

$$E(Y) = \beta_0 + \beta_1 Z + \beta_2 X$$

Given data $(Y_i, X_i, Z_i)$ on $i = 1, \ldots, N$ subjects, it is possible to show:

(a) Variance of $\hat\beta_2$:

$$\mathrm{var}(\hat\beta_2) = \frac{\sigma^2_{Y|XZ}}{N\sigma^2_{X|Z}}$$

where $\sigma^2_{Y|XZ}$ denotes the variance of Y given X and Z, $\sigma^2_{X|Z}$ denotes the variance of X given Z, and N denotes the total sample size.

(b) Conditional variance of Y given X and Z:

$$\sigma^2_{Y|XZ} = \sigma^2_{Y|X}(1 - \rho^2_{YZ|X})$$

where $\rho_{YZ|X}$ denotes the correlation of Y and Z given X.