Postgraduate course: Anova and Repeated measurements Day 2 (part 2) Niels Trolle Andersen, Dept. of Biostatistics, Aarhus University, June 2009


[Figure: CVP (mean and sd) over time, shown with two versions of the individual curves: sd_w (within) version 1 and sd_w (within) version 2.]

The mean curve and the sd are the same for the two versions. The within-subject variation is the relevant variation when analyzing changes over time, so how can we estimate the within-subject variation and the between-subject variation?

    $sd_{total} = \sqrt{sd_w^2 + sd_b^2}$

sd_b (between subjects) is relevant for comparison of group levels.

Univariate Repeated Measurements ANOVA, using the anova command

Notation (comparison of k groups, each subject measured p times):

Data $y_{ijt}$ where
    i = 1, ..., k      (group i)
    j = 1, ..., n_i    (n_i is the sample size in group i)
    t = 1, ..., p      (timepoint t)

Assumptions: Data between subjects (within and between groups) are independent and

    $y_{ijt} = \alpha_{ij} + \tau_{it} + \varepsilon_{ijt}$

where
    $(\tau_{i1}, \ldots, \tau_{ip})$ is the mean curve for group i,
    $\alpha_{ij}$ is the random level for subject j in group i,
    $\varepsilon_{ijt}$ is random variation (within-subject measurement error).

Assumptions:
1. The $\varepsilon_{ijt}$ are independent and normally distributed with mean 0 and standard deviation $\sigma_W$ (within subject).
2. The $\alpha_{ij}$ are independent with mean 0 and standard deviation $\sigma_B$ (between subjects).
3. The $\alpha_{ij}$ are normally distributed.

Remark: Only assumption 1 is required in comparisons of changes (equal changes in the two groups, or no changes at all), while all three assumptions are required in comparisons of levels in the groups.
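As a sketch of how the within- and between-subject variation combine under the model $y_{ijt} = \alpha_{ij} + \tau_{it} + \varepsilon_{ijt}$, the variances and covariances can be written out as follows. This assumes in addition that the $\alpha_{ij}$ and the $\varepsilon_{ijt}$ are independent of each other, which the slides do not state explicitly; $\sigma_T$ denotes the total standard deviation and $s \neq t$ are two timepoints for the same subject.

```latex
\begin{align*}
\operatorname{Var}(y_{ijt})
  &= \operatorname{Var}(\alpha_{ij}) + \operatorname{Var}(\varepsilon_{ijt})
   = \sigma_B^2 + \sigma_W^2 = \sigma_T^2, \\
\operatorname{Cov}(y_{ijt}, y_{ijs})
  &= \operatorname{Var}(\alpha_{ij}) = \sigma_B^2, \\
\operatorname{Corr}(y_{ijt}, y_{ijs})
  &= \frac{\sigma_B^2}{\sigma_B^2 + \sigma_W^2} = \frac{\sigma_B^2}{\sigma_T^2}, \\
\operatorname{Var}(y_{ijt} - y_{ijs})
  &= \operatorname{Var}(\varepsilon_{ijt} - \varepsilon_{ijs}) = 2\sigma_W^2 .
\end{align*}
```

The last line shows why only the within-subject variation matters for changes over time: the subject level $\alpha_{ij}$ cancels in any difference between two measurements on the same subject.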

If all three assumptions are fulfilled, then the variation of the data can be described by normal distributions with mean $\mu_{ijt}$ and standard deviation $\sigma_T$, where

    $\mu_{ijt} = \tau_{it}$ and $\sigma_T^2 = \sigma_B^2 + \sigma_W^2$ (independent of i, j, t).

The correlation between any two different measurements on the same subject is $\sigma_B^2 / \sigma_T^2$.

Example (evf): EVF measured 6 times, two groups.

When we used the manova command, our measurements were given in 6 variables: evf1, evf2, ..., evf6 (number of rows = number of subjects).

The use of anova requires that the data are organized in another way: one variable containing all the evf measurements and three variables describing which subject, group and timepoint the measurement refers to.

If id is a variable which identifies the subjects (the rows), then the Stata command

    reshape long evf, i(id) j(time)

will produce a data set (evf data) with two new variables, evf and time, while keeping e.g. the group variable (number of rows = 6 * number of subjects). (reshape wide does the reverse: it spreads the measurements over several variables, one row per subject.)

    . reshape long evf, i(id) j(time)
    (note: j = 1 2 3 4 5 6)

    Data                        wide   ->   long
    Number of obs.                30   ->   180
    Number of variables            8   ->   4
    j variable (6 values)              ->   time
    xij variables:
               evf1 evf2 ... evf6      ->   evf

(reshape wide can produce dataset example 5 from example 6.)

If we have the data in long format, i.e. one variable containing all the measurements, then Stata can be used to produce figures with individual and/or mean curves.

Plotting the individual curves (in Stata):

    sort group id time
    scatter evf time, connect(L) by(group)

(connect(L) joins consecutive points only while time increases, so the line is broken between subjects; with connect(l) the last point of one subject is joined to the first point of the next.)

[Figure: individual curves, evf against time, graphs by group.]

Plotting the mean curves (in Stata):

    generate gw = 100*group + time
    egen groupmean = mean(evf), by(gw)
    sort group time id
    twoway ///
        (connected groupmean time if group==1, symbol(oh)) ///
        (connected groupmean time if group==2, symbol(o)), ///
        legend(label(1 "CPB") label(2 "Sham"))

[Figure: mean curves, groupmean against time, for CPB and Sham.]

We want to test the hypothesis

    H2: Equal changes over time for the two groups, i.e. two parallel curves.

This test can be performed by a 3-way ANOVA with id (subject identification), time, group and group*time (interaction) in the model.

The Univariate Repeated Measurements Anova:

    set matsize 800, permanently
    anova evf group / id|group time time*group, repeated(time)

(set matsize may be necessary; the value must be between 10 and 800.)

[Stata output: Number of obs = 180. ANOVA table (Source, Partial SS, df, MS, F, Prob > F) with rows Model, group, id|group, time, time*group, Residual and Total. The p-value in the time*group row is the p-value for testing H2: equal changes over time for the two groups, i.e. two parallel curves.]

Output (continued):

    Between-subjects error term:  id|group
                         Levels:  30 (28 df)
        Lowest b.s.e. variable:  id
        Covariance pooled over:  group (for repeated variable)

[Stata output: Huynh-Feldt epsilon, Greenhouse-Geisser epsilon and Box's conservative epsilon, followed by a table with columns Source, df, F and Prob > F (Regular, H-F, G-G, Box) and rows time, time*group and Residual (140 df).]

How can we use the four/three p-values?

Some corrections of the p-value have been proposed for the case where the assumptions (about equal sd and correlation) aren't fulfilled. The Regular column repeats the p-values from the first part of the output (the uncorrected p-values); the three others will normally be larger than these.

The following has been proposed (Kirk, 1982):

    If the regular/uncorrected p-value is not significant (> 0.05), then stop and accept (fail to reject) the hypothesis;
    else if the G-G p-value is significant (< 0.05), then stop and reject the hypothesis;
    else if the Box p-value is significant (< 0.05), then stop and reject the hypothesis;
    else stop and accept (fail to reject) the hypothesis.
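The H-F, G-G and Box columns all come from the same kind of adjustment. The following is a sketch based on the standard theory of epsilon corrections rather than on the slides themselves: the F statistic is left unchanged, and both degrees of freedom are multiplied by an estimate of $\varepsilon$. Here $p$ is the number of timepoints, $k$ the number of groups and $N$ the total number of subjects.

```latex
\begin{align*}
\text{regular:}\quad
  & F_{\text{time}} \sim F\bigl(p-1,\ (p-1)(N-k)\bigr), \qquad
    F_{\text{time}\times\text{group}} \sim F\bigl((p-1)(k-1),\ (p-1)(N-k)\bigr), \\
\text{corrected:}\quad
  & F_{\text{time}} \approx F\bigl(\varepsilon(p-1),\ \varepsilon(p-1)(N-k)\bigr), \qquad
    F_{\text{time}\times\text{group}} \approx F\bigl(\varepsilon(p-1)(k-1),\ \varepsilon(p-1)(N-k)\bigr), \\
  & \frac{1}{p-1} \le \varepsilon \le 1 .
\end{align*}
```

For the evf data, $p = 6$, $k = 2$ and $N = 30$, so the residual degrees of freedom are $(p-1)(N-k) = 5 \cdot 28 = 140$, as in the output. Box's conservative epsilon is the lower bound $\varepsilon = 1/(p-1) = 0.2$, so the Box p-value refers the same F to an $F(1, 28)$ distribution for both time and time*group. A smaller $\varepsilon$ gives fewer degrees of freedom and hence a larger p-value, which is why the corrected p-values are normally larger than the regular one.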

If the hypothesis H2 (time + group, i.e. parallel curves) is rejected, i.e. the changes over time depend on the group, we can test for each group:

    H: no changes over time.

Group 1:

    anova evf id time if group==1, repeated(time)

[Stata output: Number of obs = 90. ANOVA table (Source, Partial SS, df, MS, F, Prob > F) with rows Model, id, time and Residual. Between-subjects error term: id; Levels: 15 (14 df); Lowest b.s.e. variable: id. Huynh-Feldt, Greenhouse-Geisser and Box's conservative epsilon, followed by a table with columns Source, df, F, Regular, H-F, G-G and Box and rows time and Residual (70 df). The slide marks the upper bound for the p-value in this table.]

Group 2:

    anova evf id time if group==2, repeated(time)

[Stata output: Number of obs = 90. ANOVA table with rows Model, id, time, Residual and Total. Between-subjects error term: id; Levels: 15 (14 df); Lowest b.s.e. variable: id. Huynh-Feldt, Greenhouse-Geisser and Box's conservative epsilon, followed by a table with columns Source, df, F, Regular, H-F, G-G and Box and rows time and Residual (70 df). The slide marks the lower bound for the p-value in this table.]

Conclusion: We found a significant difference between the groups with respect to changes over time (p<0.004). We found statistically significant changes over time in the CPB group (p<0.006) but not in the Sham group (p>0.19).

If we want to estimate the within-subject and the between-subject variation, we can use the xtmixed command. We have four variables: evf, group, id and time.

    xi: xtmixed evf i.time*i.group || _all: R.id ///
        , nofetable noheader nogroup nostderr nolrtest

Part of the output:

    Random-effects Parameters  |  Estimate   Std. Err.   [95% Conf. Interval]
    _all: Identity
                    sd(R.id)   |  estimate of the between-subject variation
                sd(Residual)   |  estimate of the within-subject variation

From the output we read off $s_W$ and $s_B$ and can calculate

    $s_T^2 = s_B^2 + s_W^2$

The (estimated) correlation between two measurements on the same subject is $s_B^2 / s_T^2 = 0.53$.

We can look at each group separately:

    xi: xtmixed evf i.time if group==1 || _all: R.id ///
        , nofetable noheader nogroup nostderr nolrtest

[Stata output: Random-effects Parameters table (Estimate, Std. Err., 95% Conf. Interval) with rows sd(R.id) and sd(Residual) for group 1.]

    xi: xtmixed evf i.time if group==2 || _all: R.id ///
        , nofetable noheader nogroup nostderr nolrtest

[Stata output: Random-effects Parameters table (Estimate, Std. Err., 95% Conf. Interval) with rows sd(R.id) and sd(Residual) for group 2.]

We can calculate estimates for both groups:

[Table: s_W, s_B, s_T and the correlation for group 1 and group 2.]

We can see that the estimates for the between-subject variation (s_B) are almost equal, but the within-subject variations (s_W) are different, and hence so are the total variation and the correlation.

Remarks:
The correlations are expected to be positive (why?), but in special cases one might get negative correlations (weight of mice with a limited amount of food, ...).
We can compare the estimates above with the standard deviations and correlations calculated from the 6 variables evf1, evf2, ..., evf6.

In order to check part of assumption 1 (the $\varepsilon_{ijt}$), one can calculate the residuals and the predicted values and make the figures as on day 1:

    predict evf_pre if e(sample), xb
    predict evf_res if e(sample), resid
    scatter evf_res evf_pre, saving(q1,replace)
    qnorm evf_res, title("residual probability-plot") saving(q2,replace)
    graph combine q1.gph q2.gph

[Figure: residuals against the linear prediction (left) and residual probability plot against the inverse normal (right).]

Remarks: Are the probability plots nice? Does the variation in the residuals depend on the predicted values?

Or we can make the figures for the groups separately:

    scatter evf_res evf_pre if group==1, saving(q1,replace)
    qnorm evf_res if group==1, title("residual probability-plot") saving(q2,replace)
    scatter evf_res evf_pre if group==2, saving(q3,replace)
    qnorm evf_res if group==2, title("residual probability-plot") saving(q4,replace)
    graph combine q1.gph q2.gph q3.gph q4.gph

[Figure: for each group, residuals against the linear prediction and residual probability plot against the inverse normal.]

Remarks: We can see that the variation in the residuals is larger in group 1 than in group 2. The probability plots are nice. Does the variation in the residuals depend on the predicted values?

Example (dist):

    anova dist sex / id|sex time sex*time, repeated(time)

[Figure: mean curves, groupmean against time, for Boys and Girls.]

[Stata output: Number of obs = 108. ANOVA table (Source, Partial SS, df, MS, F, Prob > F) with rows Model, sex, id|sex, time, sex*time, Residual and Total.]

Output (continued):

    Between-subjects error term:  id|sex
                         Levels:  27 (25 df)
        Lowest b.s.e. variable:  id
        Covariance pooled over:  sex (for repeated variable)

[Stata output: Huynh-Feldt epsilon (with the note that the Huynh-Feldt epsilon was reset), Greenhouse-Geisser epsilon and Box's conservative epsilon, followed by a table with columns Source, df, F, Regular, H-F, G-G and Box and rows time, sex*time and Residual (75 df).]

Conclusion: We found no significant difference between the groups (sex) with respect to changes over time (p>.078).

If we accept H2, we can test whether the two mean curves coincide (the sex row of the ANOVA table). It is exactly the same test as on day 2, i.e. equal to a t-test on the average of the 4 measurements of distance. All three assumptions should be fulfilled.

If we accept H2 (parallel curves), we can test H4 (no changes over time) for both groups in one test (the time row of the table). If we perform a test for each of the groups, we can get two different answers, or we can accept H4 for both groups separately due to low power.

Remarks:
We now have more than one way to analyze the data. Which one (if any) shall we choose?
How can we describe the analysis?
How can we describe the results?
First of all we need to check the assumptions (Day 3).
Depending on what we can assume, we can try to answer the questions (Day 4).
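To see why the test of coinciding curves reduces to a t-test on subject averages, and why it needs all three assumptions, here is a short sketch under the model $y_{ijt} = \alpha_{ij} + \tau_{it} + \varepsilon_{ijt}$. It assumes in addition that the $\alpha_{ij}$ and $\varepsilon_{ijt}$ are independent of each other, which the slides do not state explicitly; a dot in place of an index denotes an average over that index.

```latex
\begin{align*}
\bar{y}_{ij\cdot}
  &= \frac{1}{p}\sum_{t=1}^{p} y_{ijt}
   = \bar{\tau}_{i\cdot} + \alpha_{ij} + \bar{\varepsilon}_{ij\cdot}, \\
\operatorname{E}(\bar{y}_{ij\cdot}) &= \bar{\tau}_{i\cdot}, \qquad
\operatorname{Var}(\bar{y}_{ij\cdot}) = \sigma_B^2 + \frac{\sigma_W^2}{p} .
\end{align*}
```

If the curves are parallel, they coincide exactly when $\bar{\tau}_{1\cdot} = \bar{\tau}_{2\cdot}$, so the hypothesis can be tested by comparing the subject averages in the two groups with a two-sample t-test (here $p = 4$ distance measurements per subject). The variance of a subject average involves $\sigma_B$ as well as $\sigma_W$, and the average is normally distributed only if the $\alpha_{ij}$ are, which is where assumptions 2 and 3 enter.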
