BIOS 6649: Handout Exercise Solution

NOTE: I encourage you to work together, but the work you submit must be your own. Any plagiarism will result in loss of all marks.

This assignment is based on the weight-loss data in the file WeightLoss.csv. The data set has 11 columns: a subject id, the treatment group, and weights at months 0, 6, 12, 18, 24, 30, 36, 42, and 48. Read the data into R.

Note: The file WeightLoss.R contains relevant R code. In that file I have standardized the contrasts discussed in the problem so that the results are on the same scale. You can submit either the standardized or the non-standardized contrasts for your answers.

1. Use the observed weight-loss data to answer the following:
(a) Plot the data:
   i. Plot the data for each subject in the control group on a single graph (one line per subject).
   ii. Plot the data for each subject in the intervention group on a single graph (one line per subject).
Answer:
[Figure: two panels, titled "Group 1" and "Group 0", each showing weight (y-axis) against months post randomization (x-axis), one line per subject.]
(b) Calculate the contrast corresponding to the average change from baseline, and its variance, for each treatment group (separately). Specifically (using R):
   - Get the average in each group.
   - Get the covariance matrix in each group.
   - Create the contrast vector: $w = (-1, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8)$.
   - Use the above elements to answer the question (see source file).
(c) What is the between-group difference in the above contrast? Specifically:
   i. What is the value of the between-group difference in the average change contrast?
   ii. What is the standard error of the between-group difference in the average change contrast?
   iii. What is the 95% confidence interval for the between-group difference in the average change contrast?
(d) Repeat part (c) using the 48-month change contrast: $w = (-1, 0, 0, 0, 0, 0, 0, 0, 1)$. Submit the value for the contrast, its standard error, and the ratio between the contrast value and the standard error.
(e) Repeat part (c) using the 48-month linear-trend (slope) contrast: $w = (-4, -3, -2, -1, 0, 1, 2, 3, 4)$. Submit the value for the contrast, its standard error, and the ratio between the contrast value and the standard error.

Answers to problem 1 (with standardized contrast):

                  Control               Intervention
                  Avg (sd)              Avg (sd)              Difference  SE       95% CI                 Diff/SE*
Average change    -0.122 (3.633)        -4.725 (5.117)        -4.603      0.583    (-5.746, -3.460)       -7.895
48-mo change      -0.218 (3.975)        -1.765 (3.930)        -1.547      0.523    (-2.571, -0.522)       -2.958
Linear trend      -0.0003 (0.1651)      -0.0051 (0.1521)      -0.0049     0.0210   (-0.0461, 0.0363)      -0.2322

* Between-group difference divided by the standard error (i.e., the t-statistic).

Answers to problem 1 (without standardization of contrast):

                  Control               Intervention
                  Avg (sd)              Avg (sd)              Difference  SE       95% CI                 Diff/SE*
Average change    -0.138 (4.087)        -5.316 (5.757)        -5.179      0.656    (-6.464, -3.893)       -7.895
48-mo change      -0.436 (7.950)        -3.529 (7.860)        -3.093      1.046    (-5.143, -1.043)       -2.958
Linear trend      -0.0909 (59.4347)     -1.8487 (54.7637)     -1.7578     7.5707   (-16.5964, 13.0808)    -0.2322
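The recipe in part (b) (group mean vector, group covariance matrix, contrast vector) can be sketched in a few lines. What follows is an illustrative Python translation, not the code in WeightLoss.R. The three-visit mean vector and covariance matrix are made-up numbers chosen only to keep the arithmetic easy to follow, and the function names are hypothetical.

```python
import math

def contrast_value(w, mean):
    """Contrast applied to a mean vector: w'mean."""
    return sum(wi * mi for wi, mi in zip(w, mean))

def contrast_variance(w, cov):
    """Per-subject variance of the contrast: w' cov w."""
    k = len(w)
    return sum(w[i] * cov[i][j] * w[j] for i in range(k) for j in range(k))

# Hypothetical 3-visit example (baseline plus two follow-ups), NOT the course data.
w = [-1.0, 0.5, 0.5]            # average change from baseline over two follow-ups
mean = [100.0, 96.0, 94.0]      # made-up visit means for one group
cov = [[4.0, 2.0, 1.0],
       [2.0, 4.0, 2.0],
       [1.0, 2.0, 4.0]]         # made-up covariance matrix for one group

value = contrast_value(w, mean)             # average change from baseline
sd = math.sqrt(contrast_variance(w, cov))   # per-subject sd of the contrast
print(value, sd)
```

With the real data, `mean` would hold the nine visit means for one treatment group and `cov` the 9 x 9 sample covariance matrix for that group; the same two functions are then applied once per group.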
Suppose that you are considering a new weight intervention study. It will enroll 500 subjects per group. Problems 2-4 illustrate how you might evaluate the effect size for various time-trajectory contrasts over different values for the true mean weight-loss trajectory. Use the observed covariance matrix from the data in problem 1 to answer these questions.

2. Suppose that the mean weight change at the measurement times is $\mu_0 = (0, 0, 0, 0, 0, 0, 0, 0, 0)$ in the control group and $\mu_1 = (0, -8, -8, -8, -8, -8, -8, -8, -8)$ in the treatment group.
(a) What is $\theta$ if the time trajectory is summarized by the average change contrast: $w = (-1, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8)$? What is the anticipated standard error of $\hat{\theta}$? What is the ratio between the anticipated effect and its standard error?
(b) What is $\theta$ if the time trajectory is summarized by the 48-month change contrast: $w = (-1, 0, 0, 0, 0, 0, 0, 0, 1)$? What is the anticipated standard error of $\hat{\theta}$? What is the ratio between the anticipated effect and its standard error?
(c) What is $\theta$ if the time trajectory is summarized by the linear-trend (slope) contrast: $w = (-4, -3, -2, -1, 0, 1, 2, 3, 4)$? What is the anticipated standard error of $\hat{\theta}$? What is the ratio between the anticipated effect and its standard error?

Answers to problem 2:

                  Control               Intervention
                  Avg (sd)              Avg (sd)              Difference  SE       95% CI                 Diff/SE*
Average change    0.000 (3.633)         -7.111 (5.117)        -7.111      0.281    (-7.661, -6.561)       -25.339
48-mo change      0.000 (3.975)         -4.000 (3.930)        -4.000      0.250    (-4.490, -3.510)       -16.001
Linear trend      0.0000 (0.1651)       -0.0889 (0.1521)      -0.0889     0.0100   (-0.1086, -0.0692)     -8.8537

* Between-group difference divided by the standard error (i.e., the t-statistic).
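The standard errors in the answer table for problem 2 can be reproduced from the per-group contrast standard deviations carried over from problem 1 and the planned 500 subjects per group. The Python sketch below assumes the two groups are independent and that the sd columns are per-subject standard deviations of the contrast; the function name is hypothetical.

```python
import math

def anticipated_se(sd_control, sd_intervention, n_per_group):
    """SE of the between-group difference in a contrast, equal group sizes."""
    return math.sqrt((sd_control ** 2 + sd_intervention ** 2) / n_per_group)

# (sd in control, sd in intervention, anticipated theta), as tabulated for problem 2.
rows = {
    "Average change": (3.633, 5.117, -7.111),
    "48-mo change":   (3.975, 3.930, -4.000),
    "Linear trend":   (0.1651, 0.1521, -0.0889),
}

for name, (sd0, sd1, theta) in rows.items():
    se = anticipated_se(sd0, sd1, 500)
    print(f"{name:15s} SE = {se:.4f}   theta/SE = {theta / se:.3f}")
```

Because the inputs are the rounded values from the table, the printed ratios agree with the tabulated ones only up to rounding.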
3. Suppose that the mean weight change at the measurement times is $\mu_0 = (0, 0, 0, 0, 0, 0, 0, 0, 0)$ in the control group and $\mu_1 = (0, -8, -7, -6, -5, -4, -3, -2, -1)$ in the treatment group.
(a) What is $\theta$ if the time trajectory is summarized by the average change contrast: $w = (-1, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8)$? What is the anticipated standard error of $\hat{\theta}$? What is the ratio between the anticipated effect and its standard error?
(b) What is $\theta$ if the time trajectory is summarized by the 48-month change contrast: $w = (-1, 0, 0, 0, 0, 0, 0, 0, 1)$? What is the anticipated standard error of $\hat{\theta}$? What is the ratio between the anticipated effect and its standard error?
(c) What is $\theta$ if the time trajectory is summarized by the linear-trend (slope) contrast: $w = (-4, -3, -2, -1, 0, 1, 2, 3, 4)$? What is the anticipated standard error of $\hat{\theta}$? What is the ratio between the anticipated effect and its standard error?

Answers to problem 3:

                  Control               Intervention
                  Avg (sd)              Avg (sd)              Difference  SE       95% CI                 Diff/SE*
Average change    0.000 (3.633)         -4.000 (5.117)        -4.000      0.281    (-4.550, -3.450)       -14.253
48-mo change      0.000 (3.975)         -0.500 (3.930)        -0.500      0.250    (-0.990, -0.010)       -2.000
Linear trend      0.0000 (0.1651)       0.0667 (0.1521)       0.0667      0.0100   (0.0470, 0.0863)       6.6403

* Between-group difference divided by the standard error (i.e., the t-statistic).
4. Suppose that the mean weight change at the measurement times is $\mu_0 = (0, 0, 0, 0, 0, 0, 0, 0, 0)$ in the control group and $\mu_1 = (0, -1, -2, -3, -4, -5, -6, -7, -8)$ in the treatment group.
(a) What is $\theta$ if the time trajectory is summarized by the average change contrast: $w = (-1, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8)$? What is the anticipated standard error of $\hat{\theta}$? What is the ratio between the anticipated effect and its standard error?
(b) What is $\theta$ if the time trajectory is summarized by the 48-month change contrast: $w = (-1, 0, 0, 0, 0, 0, 0, 0, 1)$? What is the anticipated standard error of $\hat{\theta}$? What is the ratio between the anticipated effect and its standard error?
(c) What is $\theta$ if the time trajectory is summarized by the linear-trend (slope) contrast: $w = (-4, -3, -2, -1, 0, 1, 2, 3, 4)$? What is the anticipated standard error of $\hat{\theta}$? What is the ratio between the anticipated effect and its standard error?

Answers to problem 4:

                  Control               Intervention
                  Avg (sd)              Avg (sd)              Difference  SE       95% CI                 Diff/SE*
Average change    0.000 (3.633)         -4.000 (5.117)        -4.000      0.281    (-4.550, -3.450)       -14.253
48-mo change      0.000 (3.975)         -4.000 (3.930)        -4.000      0.250    (-4.490, -3.510)       -16.001
Linear trend      0.0000 (0.1651)       -0.1667 (0.1521)      -0.1667     0.0100   (-0.1863, -0.1470)     -16.6008

* Between-group difference divided by the standard error (i.e., the t-statistic).
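To see how each contrast ranks the three hypothetical trajectories in problems 2-4, it can help to compute the raw (non-standardized) value of w'(mu1 - mu0) directly. The Python sketch below rests on two assumptions: the contrast weights are negative on baseline (and on the early visits for the slope contrast), and the mean changes are weight losses, i.e. negative numbers, as the negative entries in the answer tables imply. The results are on the non-standardized scale, so they differ from the tabulated values by a constant factor per contrast. The names `CONTRASTS`, `TRAJECTORIES`, and `theta` are hypothetical.

```python
# Contrast weights, assuming the signs described above.
CONTRASTS = {
    "average change":  [-1.0] + [1.0 / 8.0] * 8,
    "48-month change": [-1.0, 0, 0, 0, 0, 0, 0, 0, 1.0],
    "linear trend":    [-4, -3, -2, -1, 0, 1, 2, 3, 4],
}

# Hypothetical intervention-group mean changes (negative = weight loss).
TRAJECTORIES = {
    "problem 2": [0, -8, -8, -8, -8, -8, -8, -8, -8],  # lose and maintain
    "problem 3": [0, -8, -7, -6, -5, -4, -3, -2, -1],  # lose, then regain
    "problem 4": [0, -1, -2, -3, -4, -5, -6, -7, -8],  # slow steady loss
}

def theta(w, mu1, mu0=None):
    """Between-group contrast w'(mu1 - mu0); mu0 defaults to all zeros."""
    if mu0 is None:
        mu0 = [0.0] * len(mu1)
    return sum(wi * (a - b) for wi, a, b in zip(w, mu1, mu0))

for traj_name, mu1 in TRAJECTORIES.items():
    for con_name, w in CONTRASTS.items():
        print(f"{traj_name}  {con_name:16s} theta = {theta(w, mu1):8.3f}")
```

The slope contrast changes sign for the rebound trajectory in problem 3 while the other two contrasts stay negative, which matches the sign pattern in the answer tables.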
5. Use your answers to the above problems in answering the following:
(a) Which of the above hypothetical mean weight-loss trajectories ($\mu_1$) is most likely to be associated with beneficial health outcomes? Which trajectory is not very likely to result in good outcomes? Give a reason for your answer.

Answer: In general we think that losing weight and keeping it off has the most benefit (the trajectory in problem 2). Weight rebound is not good, although the trajectory in problem 3 shows long-term weight regain rather than short-term loss and immediate regain. The trajectory in problem 4 is probably of intermediate health benefit; certainly slow, steady weight loss is better than slow, steady weight gain.

(b) Based on the calculations in problems 1-4, recommend a contrast $w$ for use in the new trial. Give a reason for your choice.

Answer: As we have discussed in class, our choice of summary measure (contrast) will induce an order on the longitudinal outcome space. The orders will differ according to the summary measure that is selected. This is true regardless of whether you use summary measures (as above) or a mixed-model repeated-measures ANOVA summary (as demonstrated in class). The contrast should be selected to be sensitive (large) to true mean trajectories that are felt to convey health benefits, and not sensitive (small) to trajectories that are not likely to have long-term health benefits.

As argued above, it is probably best to lose weight and keep it off (problem 2), it is probably bad to lose weight and regain it (problem 3), and of intermediate benefit to lose weight slowly over time (problem 4). The following conclusions follow from the last column of the results tables for problems 2-4:
   - The slope contrast is very sensitive to trends. In problem 3 it would find that the intervention group was significantly worse than control (positive value in the last column). It would find that the intervention was highly beneficial if weight decreased linearly over time (problem 4). It is not as sensitive as the other contrasts to long-term maintenance of weight loss (problem 2).
   - The average change contrast is sensitive to all of these weight-loss trajectories (particularly the long-term weight loss in problem 2). It might not be desirable that it is sensitive to the weight rebound example (problem 3).
   - The 48-month change contrast seems to be most sensitive to the changes that are likely to have the greatest health benefits and least sensitive to trajectories that are less likely to be beneficial.

I am inclined toward the 48-month change contrast. I would work with study investigators to make sure I understood the potential health benefits associated with each of the trajectories. I would discuss the above evaluation with the investigator team to make sure that we were all in agreement as to the best contrast.
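6. Suppose that you have a randomized clinical trial with 2 groups and $n$ participants per group. Suppose that you have a baseline measurement and a follow-up measurement on every participant. Denote the data by $Y_{0ik}$ and $Y_{1ik}$ for the baseline and follow-up measurements in the $i$th participant ($i = 1, \ldots, n$) of the $k$th treatment group ($k = 0, 1$). Suppose that

\[ \begin{pmatrix} Y_{0ik} \\ Y_{1ik} \end{pmatrix} \sim N\left[ \begin{pmatrix} \mu_{0k} \\ \mu_{1k} \end{pmatrix}, \Sigma \right] \qquad (1) \]

where

\[ \Sigma = \begin{bmatrix} \sigma^2 & \rho\sigma^2 \\ \rho\sigma^2 & \sigma^2 \end{bmatrix}. \]

Below, $\bar{Y}_{jk}$ denotes the sample mean of $Y_{jik}$ over the $n$ participants in group $k$ ($j = 0$ for baseline, $j = 1$ for follow-up).

(a) Find the variance of the between-group mean difference at the last measurement time; i.e., $\mathrm{var}(\bar{Y}_{11} - \bar{Y}_{10})$.
(b) Find the variance of the between-group mean difference in the change from baseline to follow-up; i.e., $\mathrm{var}[(\bar{Y}_{11} - \bar{Y}_{01}) - (\bar{Y}_{10} - \bar{Y}_{00})]$.
(c) Find the conditions under which the variance in 6a is smaller than the variance in 6b.
(d) What are the implications of this result for study design (or data analysis)?

Answer:
(a) $\mathrm{var}(\bar{Y}_{11} - \bar{Y}_{10}) = 2\sigma^2/n$.
(b) The two treatment groups are independent, and within each group $\mathrm{var}(\bar{Y}_{1k} - \bar{Y}_{0k}) = 2\sigma^2/n - 2\rho\sigma^2/n$, so

\[ \mathrm{var}[(\bar{Y}_{11} - \bar{Y}_{01}) - (\bar{Y}_{10} - \bar{Y}_{00})] = \mathrm{var}(\bar{Y}_{11} - \bar{Y}_{01}) + \mathrm{var}(\bar{Y}_{10} - \bar{Y}_{00}) \]
\[ = \left(2\sigma^2/n - 2\rho\sigma^2/n\right) + \left(2\sigma^2/n - 2\rho\sigma^2/n\right) = 4\sigma^2(1 - \rho)/n. \]

(c) The second measure is more efficient when its variance is smaller than that of the first measure:

\[ \mathrm{var}[(\bar{Y}_{11} - \bar{Y}_{01}) - (\bar{Y}_{10} - \bar{Y}_{00})] < \mathrm{var}(\bar{Y}_{11} - \bar{Y}_{10}) \]
\[ 4\sigma^2(1 - \rho)/n < 2\sigma^2/n \]
\[ (1 - \rho) < 0.5 \]
\[ \rho > 0.5. \]

Equivalently, the variance in 6a is the smaller of the two when $\rho < 0.5$.
(d) Thus, the correlation must be fairly strong before it is more efficient to measure the difference between treatment groups by the difference in the average change as opposed to the difference in the average outcome at follow-up.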
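The two variance formulas in problem 6 can be checked by simulation. The following Python sketch is illustrative only: it draws baseline and follow-up pairs from the bivariate normal model (1) with all means set to zero (the variances do not depend on the means), and compares the empirical variances of the two estimators with $2\sigma^2/n$ and $4\sigma^2(1-\rho)/n$. The function name, seed, and parameter values are arbitrary choices, not part of the handout.

```python
import math
import random
from statistics import variance

def simulate_variances(n, rho, sigma=1.0, reps=2000, seed=1):
    """Empirical variances of the follow-up difference and the change difference."""
    rng = random.Random(seed)

    def group_means():
        # Sample means of baseline and follow-up for one group of n participants.
        s_base = s_follow = 0.0
        for _ in range(n):
            z0 = rng.gauss(0.0, 1.0)
            z1 = rng.gauss(0.0, 1.0)
            s_base += sigma * z0
            # Follow-up has variance sigma^2 and correlation rho with baseline.
            s_follow += sigma * (rho * z0 + math.sqrt(1.0 - rho ** 2) * z1)
        return s_base / n, s_follow / n

    follow_diffs, change_diffs = [], []
    for _ in range(reps):
        base0, follow0 = group_means()   # control group
        base1, follow1 = group_means()   # intervention group
        follow_diffs.append(follow1 - follow0)
        change_diffs.append((follow1 - base1) - (follow0 - base0))
    return variance(follow_diffs), variance(change_diffs)

n, rho, sigma = 20, 0.8, 1.0
emp_follow, emp_change = simulate_variances(n, rho, sigma)
print("follow-up:", emp_follow, "theory:", 2 * sigma ** 2 / n)
print("change:   ", emp_change, "theory:", 4 * sigma ** 2 * (1 - rho) / n)
```

With a correlation above 0.5, as in this example, the change-score estimator should show the smaller empirical variance; rerunning with `rho` below 0.5 should reverse the ordering.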
7. Use the distribution from equation (1) when answering the following:
(a) What is the interpretation of $\beta_2$ in the following regression model?

\[ E(Y_{1ik}) = \beta_0 + \beta_1 Y_{0ik} + \beta_2 1_{[k=1]} \]

where $1_{[k=1]}$ is the indicator function for treatment group 1.
(b) What is the variance of $\hat{\beta}_2$ (estimated using linear regression)? [Hint: See the last page for a summary of the variance of regression coefficients.]
(c) Find the conditions under which the variance of $\hat{\beta}_2$ is smaller than either the variance of 6a or 6b.
(d) What are the implications of this result for study design (or data analysis)?

Answer:
(a) $\beta_2$ is the expected difference between treatment groups in two populations that have the same mean baseline level of the outcome variable.
(b) To apply the identities in this problem you need to recognize that $X$ in the identity on the last page represents the treatment group indicator ($1_{[k=1]}$) and $Z$ represents the baseline measure ($Y_{0ik}$). You then need to recognize that $\rho$ is the correlation between baseline and follow-up measures and that it is the same in both treatment groups; thus, $\rho_{YZ|X} = \rho$. Furthermore, within each treatment group the variance of the outcome measure is the same; thus, $\sigma^2_{Y|X} = \sigma^2$. Finally, you need to recognize that the sample size in the identities is the total number of participants in the trial; thus $N = 2n$, where $n$ is the number per group. Now from the identities:

\[ \mathrm{var}(\hat{\beta}_2) = \frac{\sigma^2_{Y|XZ}}{N\sigma^2_{X|Z}} = \frac{\sigma^2_{Y|X}\,(1 - \rho^2_{YZ|X})}{N\sigma^2_{X|Z}} = \frac{\sigma^2 (1 - \rho^2)}{2n\,\sigma^2_{X|Z}} \]

Now, note that $\sigma^2_{X|Z}$ represents the variance of the treatment group indicator given the baseline level of the outcome. Since treatment groups were randomly assigned, the treatment group indicator takes the value 1 with probability 0.5 and the value 0 with probability 0.5 regardless of the baseline value; thus $E(X \mid Z) = 0.5$ and:

\[ \sigma^2_{X|Z} = E[(X - E(X))^2] = (1 - 0.5)^2 (0.5) + (0 - 0.5)^2 (0.5) = 0.25 \]

It follows that

\[ \mathrm{var}(\hat{\beta}_2) = \frac{\sigma^2 (1 - \rho^2)}{2n (0.25)} = 2\sigma^2 (1 - \rho^2)/n. \]
(c) The variance of the regression analysis is smaller than that of the analysis of the difference in change whenever:

\[ \mathrm{var}(\hat{\beta}_2) < \mathrm{var}[(\bar{Y}_{11} - \bar{Y}_{01}) - (\bar{Y}_{10} - \bar{Y}_{00})] \]
\[ 2\sigma^2 (1 - \rho^2) < 4\sigma^2 (1 - \rho) \]
\[ (1 - \rho)(1 + \rho) < 2(1 - \rho) \]
\[ 1 + \rho < 2 \]
\[ \rho < 1 \]

which is true for every $\rho < 1$ (that is, unless baseline and follow-up are perfectly correlated); therefore the regression analysis is always more efficient.

The variance of the regression analysis is smaller than that of the analysis of follow-up measures whenever:

\[ \mathrm{var}(\hat{\beta}_2) < \mathrm{var}(\bar{Y}_{11} - \bar{Y}_{10}) \]
\[ 2\sigma^2 (1 - \rho^2) < 2\sigma^2 \]
\[ (1 - \rho^2) < 1 \]

which is true whenever $\rho \neq 0$ (since $-1 \le \rho \le 1$; the two variances are equal when $\rho = 0$); therefore the regression analysis is never less efficient.

(d) The above proof shows that it is always more efficient to condition on the baseline value in a regression analysis when analyzing pre-post (before-after) outcomes. There are two caveats that you should realize before applying this uniformly:
   - The above result assumes that $n$ is large, because I have ignored degrees of freedom. If you include degrees of freedom, then the regression method is more efficient as long as there are more than about 15 subjects per group.
   - I recommend using the regression method in a randomized trial, where the distribution of baseline values is the same in both treatment groups. In an observational study, where the distribution of the baseline value may differ between two exposure categories, the regression analysis may estimate a different quantity than the one you are estimating with an analysis of change (i.e., one of the analyses is biased relative to the other).
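The three large-sample variances derived in problems 6 and 7 can be placed side by side to see how the ranking changes with the baseline and follow-up correlation. A minimal Python sketch, using only the closed-form expressions above (the function names are hypothetical, and degrees of freedom are ignored as in the derivation):

```python
def var_followup(sigma2, rho, n):
    """Difference in mean follow-up values: 2*sigma^2/n (rho does not enter)."""
    return 2.0 * sigma2 / n

def var_change(sigma2, rho, n):
    """Difference in mean change from baseline: 4*sigma^2*(1 - rho)/n."""
    return 4.0 * sigma2 * (1.0 - rho) / n

def var_regression(sigma2, rho, n):
    """Baseline-adjusted regression estimate: 2*sigma^2*(1 - rho^2)/n."""
    return 2.0 * sigma2 * (1.0 - rho ** 2) / n

sigma2, n = 1.0, 100
print(" rho   follow-up    change   regression")
for rho in (0.0, 0.25, 0.5, 0.75, 0.9):
    print(f"{rho:4.2f}  {var_followup(sigma2, rho, n):9.5f} "
          f"{var_change(sigma2, rho, n):9.5f} {var_regression(sigma2, rho, n):11.5f}")
```

The follow-up and change-score columns cross at a correlation of 0.5, while the regression column never exceeds either of them.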
8. In problem 7, what is the probability model, the functional, and the contrast that define the statistical model for scientific inference?

Answer: It is not necessary to assume that the data are normally distributed; estimated regression coefficients will be normally distributed by a form of the central limit theorem. Similarly, it is not necessary to assume that there is a linear relationship between baseline and follow-up measurements; the coefficient $\beta_1$ is the first-order approximation to the nonlinear relationship between baseline and follow-up measures.

Probability model: Non-parametric (assuming only that the regression coefficients are normally distributed).
Functional: Mean level of outcome, conditional on baseline level.
Contrast: Difference in mean outcome levels (conditional on baseline).
Variance of a conditional coefficient in multiple linear regression

Consider the linear regression model:

\[ E(Y) = \beta_0 + \beta_1 Z + \beta_2 X \]

Given data $(Y_i, X_i, Z_i)$ on $i = 1, \ldots, N$ subjects, it is possible to show:

(a) Variance of $\hat{\beta}_2$:

\[ \mathrm{var}(\hat{\beta}_2) = \frac{\sigma^2_{Y|XZ}}{N\sigma^2_{X|Z}} \]

where $\sigma^2_{Y|XZ}$ denotes the variance of $Y$ given $X$ and $Z$, $\sigma^2_{X|Z}$ denotes the variance of $X$ given $Z$, and $N$ denotes the total sample size.

(b) Conditional variance of $Y$ given $X$ and $Z$:

\[ \sigma^2_{Y|XZ} = \sigma^2_{Y|X}\,(1 - \rho^2_{YZ|X}) \]

where $\rho_{YZ|X}$ denotes the correlation of $Y$ and $Z$ given $X$.