Mixed Models for Assessing Correlation in the Presence of Replication


Journal of the Air & Waste Management Association

To cite this article: Anthony Hamlett, Louise Ryan, Paulina Serrano-Trespalacios & Russ Wolfinger (2003) Mixed Models for Assessing Correlation in the Presence of Replication, Journal of the Air & Waste Management Association, 53:4.

TECHNICAL PAPER
J. Air & Waste Manage. Assoc., Volume 53, April 2003. Copyright 2003 Air & Waste Management Association

Anthony Hamlett and Louise Ryan
Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts

Paulina Serrano-Trespalacios
Department of Environmental Health, Harvard School of Public Health, Boston, Massachusetts

Russ Wolfinger
SAS Institute, Inc., Cary, North Carolina

ABSTRACT
The need to assess correlation in settings where multiple measurements are available on each of the variables of interest often arises in environmental science. However, this topic is not covered in introductory statistics texts. Although several ad hoc approaches can be used, they can easily lead to invalid conclusions and leave the analyst with a difficult choice of an appropriate measure of the correlation. Lam et al. approached this problem by using maximum likelihood estimation in cases where the replicate measurements are linked over time, but their method requires specialized software. We reanalyze the data of Lam et al. using PROC MIXED in SAS and show how to obtain the parameter estimates of interest with just a few lines of code. We then extend Lam et al.'s method to settings where the replicate measurements are not linked. Analysis of the unlinked case is illustrated with data from a study designed to assess correlations between indoor and outdoor measurements of benzene concentration in the air.

IMPLICATIONS
While replicate measurements are commonly taken in environmental science research settings, it is unclear how to use these replicates to assess correlations. When the number of replicates varies by subject, ad hoc approaches to correlation lose efficiency and, hence, give unreliable correlation estimates. Formulating the problem as a mixed model leads to results that are more reliable and that overcome the problems of the ad hoc approaches. In addition, the SAS approach is very user-friendly and lends itself to extensions for more complex settings.

INTRODUCTION
An important first step in any environmental science research project is assessing the accuracy and reliability of the measurement tools to be used. For example, researchers may wish to compare the results of air pollution measurements based on two different sampling or analytical techniques. Correlation analysis is the primary statistical tool used in this context. As discussed in almost every introductory statistics book, the Pearson correlation coefficient is the appropriate measure of association between two variables when these two variables are jointly normally distributed. The Spearman correlation coefficient provides a nonparametric alternative based on ranks. As discussed by Rosner,1 the more specialized intraclass correlation coefficient is appropriate in settings where the two variables of interest are expected to have the same means and variances. Pearson and Spearman correlations are easily and directly obtained from most statistical packages. While some packages (e.g., Stata) also provide commands to compute intraclass correlations directly, this quantity is easily obtained in packages such as SAS by formulating the problem as a mixed model (see the SAS manual2).

In this paper, we address a question not covered in introductory texts, namely, the assessment of correlation in settings where multiple measurements are available on each of the variables of interest. Two settings, linked and unlinked, are considered. In the linked setting, the repeated measurements are linked together in some way; for example, the repeats are taken together on different days. Table 1 (taken from Bland and Altman3) shows a linked data set of repeated measurements of intramural pH and PaCO2 from a study designed to assess within-subject correlations of the clinical information gained from blood gas analysis and from gastric pH in critically ill patients. Each pair of measurements was taken on a different day. In the unlinked setting, the repeated measurements are not linked together.

Table 1. Repeated measures of intramural pH (X) and PaCO2 (Y) for eight critically ill patients (PATI). Columns: PATI, pH, PaCO2 (three side-by-side column groups).

Table 2 shows such a data set, corresponding to replicate measurements of benzene concentration in indoor and outdoor air, measured on 35 Mexican families. Note that while some families (e.g., family 1) have a single measurement each of indoor and outdoor air, others (e.g., family 6) have two replicate measurements of each.

Several ad hoc approaches can be taken to compute correlations in the presence of replication. One naive approach is to ignore the repeated measurements, treat the data as if they were a simple random sample, and compute the standard Pearson correlation coefficient. Another choice is to compute the mean response for each variable for each subject, and then compute the standard Pearson correlation coefficient using the subject-specific averages. Yet another choice is to compute a weighted correlation coefficient,4 using the subject-specific averages for each variable and the number of repeated measurements for each subject as weights.

There are inherent problems with each of these approaches. The simple correlation coefficient ignores the number of subjects as the (correct) sample size and instead uses the total number of observations as the (incorrect) sample size. This erroneously inflates the degrees of freedom, which can lead to overly frequent rejection of the null hypothesis when it is in fact true (i.e., an inflated type I error rate5). The simple correlation coefficient based on subject means avoids this problem but does not take into account the different numbers of replicate measurements per subject. In addition, it tends to underestimate the true between-subject correlation.6 The weighted correlation coefficient does take into account the different numbers of replicate measurements per subject; however, the number of replicate measurements per variable for a given subject must be the same. In addition, because the subject means are used in the computation, it too tends to underestimate the true between-subject correlation.
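The three ad hoc estimators described above can be made concrete with a short sketch. It is written in Python with only the standard library and assumes a hypothetical layout in which each subject maps to a list of linked (x, y) replicate pairs; the function names and toy data are illustrative, and the weighting follows the verbal description above (subject means weighted by the number of replicates) rather than any particular published formula.

```python
from math import sqrt

def pearson(x, y):
    """Ordinary Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def naive_pearson(data):
    """Pool all N pairs as if they were independent (ignores subjects)."""
    xs = [x for pairs in data.values() for x, _ in pairs]
    ys = [y for pairs in data.values() for _, y in pairs]
    return pearson(xs, ys)

def subject_means(data):
    """Per-subject (mean x, mean y, number of replicates n_i)."""
    out = []
    for pairs in data.values():
        k = len(pairs)
        out.append((sum(x for x, _ in pairs) / k, sum(y for _, y in pairs) / k, k))
    return out

def means_pearson(data):
    """Pearson correlation of the subject means (one point per subject)."""
    m = subject_means(data)
    return pearson([a for a, _, _ in m], [b for _, b, _ in m])

def weighted_means_pearson(data):
    """Correlation of subject means, each weighted by its replicate count n_i."""
    m = subject_means(data)
    w = sum(k for _, _, k in m)
    cx = sum(k * a for a, _, k in m) / w
    cy = sum(k * b for _, b, k in m) / w
    sxy = sum(k * (a - cx) * (b - cy) for a, b, k in m)
    sxx = sum(k * (a - cx) ** 2 for a, _, k in m)
    syy = sum(k * (b - cy) ** 2 for _, b, k in m)
    return sxy / sqrt(sxx * syy)

if __name__ == "__main__":
    # Hypothetical data: subject id -> linked (x, y) replicate pairs
    data = {
        1: [(7.1, 5.0), (7.2, 4.8), (7.0, 5.3)],
        2: [(7.3, 4.6), (7.4, 4.5)],
        3: [(7.2, 5.1), (7.1, 5.4), (7.3, 4.9), (7.2, 5.0)],
        4: [(7.0, 5.6), (6.9, 5.8)],
    }
    print("naive pooled   :", round(naive_pearson(data), 3))
    print("subject means  :", round(means_pearson(data), 3))
    print("weighted means :", round(weighted_means_pearson(data), 3))
```

The naive version treats all pooled pairs as independent even though only the subjects are, which is the degrees-of-freedom problem described above; the two mean-based versions reduce each subject to a single point.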
Several authors have proposed more technical solutions to the problem of measuring the correlation between two variables in the presence of replication. For example, Bland and Altman3 proposed using a partial correlation coefficient, which requires removing differences between subjects. The partial correlation coefficient is useful if we want to know whether an increase (decrease) in one variable within a subject is associated with an increase (decrease) in the other variable. However, if there are many subjects, there is a loss of power, because a separate parameter must be estimated for each subject. Chinchilli et al.7 proposed a weighted correlation coefficient that uses the sample variances and covariances to compute the weights. While they considered both the unlinked and the linked cases, their method is complicated. Furthermore, it is empirically based and does not arise naturally from an underlying statistical model. Lam et al.6 used maximum likelihood (ML) estimation to estimate the true correlation between the variables when the repeated measurements are linked over time. They derived their estimates by formulating the problem as a mixed-effects model. Unfortunately, their approach is rather technical and requires the use of specialized software.

An important purpose of this paper is to show how to reproduce the parameter estimates of Lam et al.6 using PROC MIXED in SAS. We show that the analysis can be easily achieved with just a few lines of code. We then extend Lam et al.'s approach to the unlinked setting and again obtain parameter estimates using PROC MIXED in SAS. In addition to reanalyzing the data of Lam et al., we illustrate the analysis in the unlinked setting with data from a study designed to assess the correlation between indoor and outdoor measurements of benzene concentration in the air.
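The within-subject (partial) correlation of Bland and Altman can be sketched by centering each subject's values at that subject's own means and correlating the deviations. This Python sketch assumes the same hypothetical subject-to-pairs layout; as we understand it, the resulting point estimate coincides with the one from their analysis-of-covariance formulation, and the residual degrees of freedom, N − n − 1, show where the power goes when there are many subjects.

```python
from math import sqrt

def within_subject_corr(data):
    """Correlation of within-subject deviations from the subject means.

    data: dict mapping subject id -> list of linked (x, y) pairs.
    Returns (r, df) with df = N - n - 1 (N pairs, n subjects).
    """
    sxy = sxx = syy = 0.0
    n_pairs = 0
    for pairs in data.values():
        k = len(pairs)
        mx = sum(x for x, _ in pairs) / k
        my = sum(y for _, y in pairs) / k
        for x, y in pairs:
            # Subject-level differences are removed before correlating
            sxy += (x - mx) * (y - my)
            sxx += (x - mx) ** 2
            syy += (y - my) ** 2
        n_pairs += k
    r = sxy / sqrt(sxx * syy)
    return r, n_pairs - len(data) - 1

if __name__ == "__main__":
    # Hypothetical data: within each subject y falls as x rises,
    # even though subjects with larger x also have larger y overall.
    data = {
        1: [(1.0, 10.0), (2.0, 9.0), (3.0, 8.5)],
        2: [(5.0, 20.0), (6.0, 19.0)],
        3: [(9.0, 30.0), (10.0, 29.5), (11.0, 28.0)],
    }
    r, df = within_subject_corr(data)
    print(f"within-subject r = {r:.3f} on {df} df")
```

The toy data show why this coefficient answers a different question from the between-subject correlation: the within-subject association is negative here while the subject means rise together.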
Table 2. Repeated measures of benzene concentration (µg/m³) in indoor (X) and outdoor (Y) air taken at the homes of 35 Mexican families. Columns: Family, Benzene, Location (In/Out), in three side-by-side column groups.

STATISTICAL MODELS

Linked Repeated Measurements of Two Variates

To proceed, some notation must be introduced. We begin with the linked case, in which the repeated observations on the variables of interest, X and Y, are linked, for example, by being taken at the same point in time. Let $(X_{ij}, Y_{ij})$ be the jth repeated observation ($j = 1, \ldots, n_i$) of the X, Y variables taken on the ith subject ($i = 1, \ldots, n$), in a sample of n individuals, and define N to be the total number of observations. Suppose that the pair $(X_{ij}, Y_{ij})$ has a bivariate normal distribution with mean $(\mu_X, \mu_Y)$ and variance-covariance matrix $\Sigma$. The parameters $\mu_X$ and $\mu_Y$ represent the overall mean values of the variables of interest. Assumptions about $\Sigma$ are important, because this is where the correlations of interest are defined. Following Lam et al., it is assumed that

$$\Sigma = \begin{pmatrix} \sigma_X^2 & \sigma_X\sigma_Y\rho_{XY} \\ \sigma_X\sigma_Y\rho_{XY} & \sigma_Y^2 \end{pmatrix} \tag{1}$$

where $\sigma_X^2$ and $\sigma_Y^2$ are the variances of X and Y, respectively, and $\rho_{XY}$ is their correlation. Note that $\rho_{XY}$ is our main parameter of interest. For notational convenience later, we define the covariance $\sigma_{XY} = \sigma_X\sigma_Y\rho_{XY}$.

Full specification of the model also requires assumptions regarding the relationships between X's and Y's measured at different times. Like Lam et al., we assume that the correlations between measurements taken at two different times, $j$ and $j'$ with $j \neq j'$, are given by

$$\operatorname{Corr}(X_{ij}, X_{ij'}) = \rho_X, \qquad \operatorname{Corr}(Y_{ij}, Y_{ij'}) = \rho_Y, \qquad \operatorname{Corr}(X_{ij}, Y_{ij'}) = \lambda\rho_{XY} \tag{2}$$

Heuristically, we would expect the term $\lambda$ generally to be less than 1, indicating that correlations between variables measured at different times are lower in magnitude than those between variables measured at the same time. The assumed correlation structure is depicted in Figure 1.

Figure 1. Correlation structure.

To better visualize the covariance structure, it is helpful to write out the full covariance matrix for the entire set of $n_i$ repeated measurements for the ith subject:

$$C_i = \operatorname{Cov}\begin{pmatrix} X_{i1} \\ Y_{i1} \\ \vdots \\ X_{in_i} \\ Y_{in_i} \end{pmatrix} =
\begin{pmatrix}
\sigma_X^2 & \sigma_{XY} & \rho_X\sigma_X^2 & \lambda\sigma_{XY} & \cdots & \rho_X\sigma_X^2 & \lambda\sigma_{XY} \\
\sigma_{XY} & \sigma_Y^2 & \lambda\sigma_{XY} & \rho_Y\sigma_Y^2 & \cdots & \lambda\sigma_{XY} & \rho_Y\sigma_Y^2 \\
\rho_X\sigma_X^2 & \lambda\sigma_{XY} & \sigma_X^2 & \sigma_{XY} & \cdots & \rho_X\sigma_X^2 & \lambda\sigma_{XY} \\
\lambda\sigma_{XY} & \rho_Y\sigma_Y^2 & \sigma_{XY} & \sigma_Y^2 & \cdots & \lambda\sigma_{XY} & \rho_Y\sigma_Y^2 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
\rho_X\sigma_X^2 & \lambda\sigma_{XY} & \rho_X\sigma_X^2 & \lambda\sigma_{XY} & \cdots & \sigma_X^2 & \sigma_{XY} \\
\lambda\sigma_{XY} & \rho_Y\sigma_Y^2 & \lambda\sigma_{XY} & \rho_Y\sigma_Y^2 & \cdots & \sigma_{XY} & \sigma_Y^2
\end{pmatrix} \tag{3}$$

Note that, to allow for a more parsimonious expression, we are using the covariance term $\sigma_{XY}$. Note also the block structure of this matrix, with $2 \times 2$ submatrices corresponding to $\Sigma$ down the main diagonal. The covariance matrix has the same structure for each subject, except that its dimension varies. For example, Table 1 shows that person 1 has eight observations, four on each variable; hence, the covariance matrix for person 1 has eight rows and eight columns ($8 \times 8$), with the structure given above.
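Equation 3 can be checked numerically by assembling $C_i$ block by block. The following Python sketch assumes the pair ordering $X_{i1}, Y_{i1}, X_{i2}, Y_{i2}, \ldots$ used above; the function name and the parameter values in the example are hypothetical.

```python
def linked_cov(n_i, s2x, s2y, rho_x, rho_y, rho_xy, lam):
    """Covariance matrix C_i of eq 3, ordered X_i1, Y_i1, X_i2, Y_i2, ...

    s2x, s2y: variances sigma_X^2 and sigma_Y^2; rho_x, rho_y: between-time
    correlations; rho_xy: same-time X-Y correlation; lam: lambda.
    """
    sxy = rho_xy * (s2x * s2y) ** 0.5          # sigma_XY = sigma_X sigma_Y rho_XY
    same = [[s2x, sxy], [sxy, s2y]]            # Sigma of eq 1 (j == j')
    diff = [[rho_x * s2x, lam * sxy],          # eq 2 (j != j')
            [lam * sxy, rho_y * s2y]]
    size = 2 * n_i
    C = [[0.0] * size for _ in range(size)]
    for j in range(n_i):
        for k in range(n_i):
            block = same if j == k else diff
            for a in range(2):                 # a, b: 0 for X, 1 for Y
                for b in range(2):
                    C[2 * j + a][2 * k + b] = block[a][b]
    return C

if __name__ == "__main__":
    # Hypothetical subject with four linked pairs (an 8 x 8 matrix)
    C = linked_cov(4, s2x=4.0, s2y=9.0, rho_x=0.5, rho_y=0.25, rho_xy=0.5, lam=0.2)
    for row in C:
        print(" ".join(f"{v:5.2f}" for v in row))
```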

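Further insight into the structure of $C_i$ is gained if the data are reordered. Instead of setting up the covariance matrix in terms of successive X, Y pairs (i.e., $X_{i1}, Y_{i1}, X_{i2}, Y_{i2}, \ldots, X_{in_i}, Y_{in_i}$), suppose the $n_i$ X values are written first, followed by the $n_i$ Y values. With this rearrangement, the covariance matrix becomes

$$\operatorname{Cov}\begin{pmatrix} X_{i1} \\ \vdots \\ X_{in_i} \\ Y_{i1} \\ \vdots \\ Y_{in_i} \end{pmatrix} =
\begin{pmatrix}
\sigma_X^2\left[(1-\rho_X) I_{n_i} + \rho_X J_{n_i}\right] & \sigma_{XY}\left[(1-\lambda) I_{n_i} + \lambda J_{n_i}\right] \\
\sigma_{XY}\left[(1-\lambda) I_{n_i} + \lambda J_{n_i}\right] & \sigma_Y^2\left[(1-\rho_Y) I_{n_i} + \rho_Y J_{n_i}\right]
\end{pmatrix} \tag{4}$$

where $I_{n_i}$ is the $n_i \times n_i$ identity matrix and $J_{n_i}$ is the $n_i \times n_i$ matrix of ones. The covariance matrix can now be seen to fall into four distinct blocks. The upper-left block shows a constant covariance $\rho_X\sigma_X^2$ between the $n_i$ repeated X values taken on the ith subject. Similarly, the lower-right block shows a constant covariance $\rho_Y\sigma_Y^2$ between the $n_i$ repeated Y values taken on the ith subject. The off-diagonal blocks show a compound symmetric covariance structure between the $n_i$ X and Y values, with $\sigma_{XY}$ on the main diagonal and $\lambda\sigma_{XY}$ off the diagonal.

Unlinked Repeated Measurements of Two Variates

An appropriate model for the unlinked repeated measures design is easily obtained by a simple alteration of the model for the linked design. The fundamental difference between the linked and the unlinked settings is that $X_{ij}$ and $Y_{ij}$ are no longer linked together. That is, there is no time effect in the problem and, hence, the correlation between any X and Y measurements should be the same, regardless of when they are taken. The correlation structure thus becomes

$$\operatorname{Corr}(X_{ij}, X_{ij'}) = \rho_X, \qquad \operatorname{Corr}(Y_{ij}, Y_{ij'}) = \rho_Y, \qquad \operatorname{Corr}(X_{ij}, Y_{ij'}) = \rho_{XY} = \operatorname{Corr}(X_{ij}, Y_{ij}) \tag{5}$$

It follows that one can think of the unlinked case as a special case of the linked setting, with $\lambda$ set equal to 1. The covariance matrix for the ith subject in this setting is given by

$$C_i^{*} = \operatorname{Cov}\begin{pmatrix} X_{i1} \\ Y_{i1} \\ \vdots \\ X_{in_i} \\ Y_{in_i} \end{pmatrix} =
\begin{pmatrix}
\sigma_X^2 & \sigma_{XY} & \rho_X\sigma_X^2 & \sigma_{XY} & \cdots & \rho_X\sigma_X^2 & \sigma_{XY} \\
\sigma_{XY} & \sigma_Y^2 & \sigma_{XY} & \rho_Y\sigma_Y^2 & \cdots & \sigma_{XY} & \rho_Y\sigma_Y^2 \\
\rho_X\sigma_X^2 & \sigma_{XY} & \sigma_X^2 & \sigma_{XY} & \cdots & \rho_X\sigma_X^2 & \sigma_{XY} \\
\sigma_{XY} & \rho_Y\sigma_Y^2 & \sigma_{XY} & \sigma_Y^2 & \cdots & \sigma_{XY} & \rho_Y\sigma_Y^2 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
\rho_X\sigma_X^2 & \sigma_{XY} & \rho_X\sigma_X^2 & \sigma_{XY} & \cdots & \sigma_X^2 & \sigma_{XY} \\
\sigma_{XY} & \rho_Y\sigma_Y^2 & \sigma_{XY} & \rho_Y\sigma_Y^2 & \cdots & \sigma_{XY} & \sigma_Y^2
\end{pmatrix} \tag{6}$$

$C_i^{*}$ can also be written with the $n_i$ X values first, followed by the $n_i$ Y values. Note that the difference between $C_i$ and $C_i^{*}$ is that there are no terms in $C_i^{*}$ involving $\lambda$ and, hence, the off-diagonal blocks are now constant.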
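The reordering behind eq 4, and the way setting $\lambda = 1$ turns the linked matrix into the unlinked $C_i^{*}$ of eq 6, can be illustrated by permuting rows and columns. This Python sketch is self-contained, so it repeats a compact builder for the pair-ordered matrix; all names and example values are hypothetical.

```python
def pair_ordered_cov(n_i, s2x, s2y, rho_x, rho_y, rho_xy, lam):
    """C_i ordered X_i1, Y_i1, X_i2, Y_i2, ...; lam = 1 gives the unlinked C_i*."""
    sxy = rho_xy * (s2x * s2y) ** 0.5
    C = [[0.0] * (2 * n_i) for _ in range(2 * n_i)]
    for r in range(2 * n_i):
        for c in range(2 * n_i):
            same_time = r // 2 == c // 2        # same replicate index j
            vr, vc = r % 2, c % 2               # 0 = X, 1 = Y
            if vr == vc == 0:
                C[r][c] = s2x if same_time else rho_x * s2x
            elif vr == vc == 1:
                C[r][c] = s2y if same_time else rho_y * s2y
            else:
                C[r][c] = sxy if same_time else lam * sxy
    return C

def reorder_x_first(C):
    """Permute to X_i1..X_in_i, Y_i1..Y_in_i (the ordering of eq 4)."""
    n_i = len(C) // 2
    order = [2 * j for j in range(n_i)] + [2 * j + 1 for j in range(n_i)]
    return [[C[r][c] for c in order] for r in order]

def cross_block(C_x_first):
    """Upper-right n_i x n_i block: Cov(X_ij, Y_ij') for all j, j'."""
    n_i = len(C_x_first) // 2
    return [row[n_i:] for row in C_x_first[:n_i]]

if __name__ == "__main__":
    args = dict(n_i=3, s2x=4.0, s2y=9.0, rho_x=0.5, rho_y=0.25, rho_xy=0.5)
    linked = cross_block(reorder_x_first(pair_ordered_cov(lam=0.2, **args)))
    unlinked = cross_block(reorder_x_first(pair_ordered_cov(lam=1.0, **args)))
    print("linked cross block  :", linked)    # sigma_XY on diagonal, lambda*sigma_XY off it
    print("unlinked cross block:", unlinked)  # constant sigma_XY everywhere
```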
MODEL FITTING IN SAS

Models for both the linked and unlinked settings can be easily fit using PROC MIXED in SAS. To use PROC MIXED, the data must be entered in univariate form; that is, each row of data must correspond to a single measurement. A variable, called Vtype, is defined to indicate whether each line of data corresponds to an X or a Y observation. A Replicate variable keeps track of the repeated measurements within subjects; note that Replicate is nested within subjects. The appropriate SAS data format is illustrated below by Example 1, for the data in Table 1, where pH is coded as Vtype 1 and PaCO2 as Vtype 2. Response is the value of Vtype 1 or Vtype 2, and Persnum is the subject number. Assigning pH to Vtype 1 and PaCO2 to Vtype 2 has no significance, because the coding scheme is arbitrary.

Example 1

    input Persnum Vtype Response Replicate;
    cards;
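Because PROC MIXED needs one row per measurement, data stored one row per replicate have to be stacked into the Persnum, Vtype, Response, Replicate layout of Example 1. This Python sketch assumes hypothetical (persnum, replicate, x, y) records in which a missing measurement is represented by None, as can happen with unlinked data such as the benzene measurements; it only writes whitespace-separated lines for a CARDS/DATALINES block and does not call SAS.

```python
def to_long(records):
    """Stack (persnum, replicate, x, y) records into univariate rows.

    Returns (persnum, vtype, response, replicate) tuples with vtype 1 = X and
    vtype 2 = Y; a value of None (missing measurement) produces no row.
    """
    rows = []
    for persnum, replicate, x, y in records:
        for vtype, value in ((1, x), (2, y)):
            if value is not None:
                rows.append((persnum, vtype, value, replicate))
    return rows

def as_datalines(rows):
    """Format rows as whitespace-separated lines for a SAS data step."""
    return "\n".join(" ".join(str(v) for v in row) for row in rows)

if __name__ == "__main__":
    # Hypothetical records: subject 2 has no Y value on its second replicate
    records = [(1, 1, 7.19, 5.1), (2, 1, 7.25, 4.9), (2, 2, 7.30, None)]
    print(as_datalines(to_long(records)))
```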

The appropriate formulation of the PROC MIXED code, however, is not immediately obvious, because of the relative complexity of the covariance matrices $C_i$ and $C_i^{*}$. As described in the SAS documentation, PROC MIXED fits regression models in which the covariance of the response is the sum of two components: a matrix G involving the random effects in the model, specified through the RANDOM statement, and a matrix R corresponding to the error term in the model, specified through the REPEATED statement. While most familiar mixed models use either the RANDOM or the REPEATED statement, the models described in the previous sections require both.

We begin with the linked case. To see how the SAS code should be written, it is useful to note that for each subject the covariance matrix $C_i$ can be written as the sum of two matrices: a matrix of constants whose values depend on whether the corresponding pair is two X's, two Y's, or an X, Y pair, and a block-diagonal matrix with blocks corresponding to X, Y pairs measured at the same time. Hence, $C_i$ can be written as

$$C_i =
\begin{pmatrix}
\rho_X\sigma_X^2 & \lambda\sigma_{XY} & \cdots & \rho_X\sigma_X^2 & \lambda\sigma_{XY} \\
\lambda\sigma_{XY} & \rho_Y\sigma_Y^2 & \cdots & \lambda\sigma_{XY} & \rho_Y\sigma_Y^2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\rho_X\sigma_X^2 & \lambda\sigma_{XY} & \cdots & \rho_X\sigma_X^2 & \lambda\sigma_{XY} \\
\lambda\sigma_{XY} & \rho_Y\sigma_Y^2 & \cdots & \lambda\sigma_{XY} & \rho_Y\sigma_Y^2
\end{pmatrix}
+
\begin{pmatrix}
\tilde\sigma_X^2 & \tilde\sigma_{XY} & \cdots & 0 & 0 \\
\tilde\sigma_{XY} & \tilde\sigma_Y^2 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & \tilde\sigma_X^2 & \tilde\sigma_{XY} \\
0 & 0 & \cdots & \tilde\sigma_{XY} & \tilde\sigma_Y^2
\end{pmatrix} \tag{7}$$

where $\tilde\sigma_X^2 = \sigma_X^2(1-\rho_X)$, $\tilde\sigma_Y^2 = \sigma_Y^2(1-\rho_Y)$, and $\tilde\sigma_{XY} = \sigma_{XY}(1-\lambda)$. These two matrices can be set up through judicious use of the RANDOM and REPEATED statements in PROC MIXED. Consider first the matrix of constants (the first term in eq 7). Careful scrutiny indicates that this matrix can be constructed by assigning X- and Y-specific random effects to individual i and allowing these random effects to be correlated. This can be achieved by declaring the variable Vtype (i.e., the indicator of whether a particular observation is an X or a Y) to be random across individual subjects. Covariance between the X- and Y-specific random effects is obtained by specifying an unstructured covariance matrix. Now consider the block-diagonal matrix (the second term in eq 7). This structure is relatively straightforward and can be achieved by declaring Vtype to be repeated within each individual-specific replicate (i.e., declaring the subject to be replicate nested within individual) and using an unstructured covariance.
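The decomposition in eq 7 can be verified numerically: the first term is $ZGZ'$, where each observation loads on the random effect for its own Vtype, and the second term is the block-diagonal R with one $2 \times 2$ block per replicate. The Python sketch below, with hypothetical names and parameter values, builds both sides and compares them with a direct construction of $C_i$; it illustrates the algebra only and does not reproduce how PROC MIXED estimates G and R.

```python
def direct_cov(n_i, s2x, s2y, rho_x, rho_y, rho_xy, lam):
    """C_i of eq 3, ordered X_i1, Y_i1, X_i2, Y_i2, ..."""
    sxy = rho_xy * (s2x * s2y) ** 0.5
    var, rho = (s2x, s2y), (rho_x, rho_y)
    C = [[0.0] * (2 * n_i) for _ in range(2 * n_i)]
    for r in range(2 * n_i):
        for c in range(2 * n_i):
            same_time = r // 2 == c // 2
            if r % 2 == c % 2:                        # X-X or Y-Y
                v = var[r % 2]
                C[r][c] = v if same_time else rho[r % 2] * v
            else:                                     # X-Y
                C[r][c] = sxy if same_time else lam * sxy
    return C

def g_plus_r(n_i, s2x, s2y, rho_x, rho_y, rho_xy, lam):
    """Z G Z' + R, with G (RANDOM, subject = Persnum) and R (REPEATED, subject = Replicate)."""
    sxy = rho_xy * (s2x * s2y) ** 0.5
    G = [[rho_x * s2x, lam * sxy],
         [lam * sxy, rho_y * s2y]]
    R = [[s2x * (1 - rho_x), sxy * (1 - lam)],
         [sxy * (1 - lam), s2y * (1 - rho_y)]]
    V = [[0.0] * (2 * n_i) for _ in range(2 * n_i)]
    for r in range(2 * n_i):
        for c in range(2 * n_i):
            V[r][c] = G[r % 2][c % 2]          # Z G Z' depends only on the two Vtypes
            if r // 2 == c // 2:               # same replicate: add the R block
                V[r][c] += R[r % 2][c % 2]
    return V

if __name__ == "__main__":
    p = dict(n_i=4, s2x=4.0, s2y=9.0, rho_x=0.5, rho_y=0.25, rho_xy=0.5, lam=0.2)
    A, B = direct_cov(**p), g_plus_r(**p)
    gap = max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))
    print("largest difference:", gap)
```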
In the case of linked repeated measurements, the SAS code to obtain the parameter estimates is given by

    data dataname;
      input persnum vtype response replicate;
      datalines;
    ;
    proc mixed data=dataname;
      class persnum vtype replicate;
      model response = vtype / solution ddfm=kr;
      random vtype / type=un subject=persnum g gcorr v vcorr;
      repeated vtype / type=un subject=replicate(persnum) r rcorr;
    run;

where Persnum corresponds to the subject number; Vtype refers to the two variables, which are coded as 1 and 2; Response corresponds to the values of the two variables; and Replicate indexes the repeated measurements within each subject, whose number need not be the same for every subject. The CLASS statement specifies Persnum, Vtype, and Replicate as classification (categorical) effects, and the MODEL statement specifies the mean (regression) model for the data. SOLUTION requests that the estimates of the fixed effects (specified on the right side of the equal sign in the MODEL statement, before the slash) be printed, and DDFM=KR specifies the Kenward-Roger8 method for computing the denominator degrees of freedom for the fixed effects. Note that while this latter option is not necessary, it tends to yield more reliable results in general (see the SAS manual2 for more details).

As indicated earlier, the RANDOM and REPEATED statements are used to set up the structure of the G and R matrices. Declaring SUBJECT=Persnum after the specification of Vtype as random instructs PROC MIXED to make the N × N variance-covariance matrix for the entire data vector block diagonal, with one block for each subject. The size of each block depends on the number of measurements the subject has. Within G, each subject contributes a 2 × 2 block of random-effect covariances whose structure is specified by the TYPE option.

For example, from the data in Table 1, the first person has a total of eight measurements; hence, the size of the block for the first person is 8 × 8, while the third person has 16 measurements and, thus, the size of the block for the third person is 16 × 16. TYPE=UN specifies a general (unstructured) variance-covariance matrix and makes the subject-specific X and Y random effects correlated. On the REPEATED statement, SUBJECT=Replicate(Persnum) instructs PROC MIXED to make the N × N variance-covariance matrix R block diagonal, with one 2 × 2 block for each replicate within each subject. Each of these blocks has the structure specified by the TYPE option; in this case, TYPE=UN again specifies a general variance-covariance matrix. G and GCORR request that the estimated random-effect variance-covariance and correlation matrices be printed, respectively. V and VCORR request that the estimated variance-covariance and correlation matrices of the response be printed, respectively. R and RCORR request that the variance-covariance and correlation matrices of the within-subject replicate X, Y pairs be printed, respectively. The V matrix combines the G and R matrices (V = ZGZ′ + R, where Z is the random-effects design matrix). By default, for R, RCORR, V, and VCORR, the first block, determined by the SUBJECT effect, is printed; the default can be changed by specifying a particular block number for R, RCORR, V, and VCORR (see the SAS manual2).

In the PROC MIXED statement, a METHOD option can be given to specify the method of estimation for the covariance parameters. If no METHOD option is given, the covariance parameters are estimated by restricted maximum likelihood (REML), the default. Similarly, in the MODEL statement, the method of computing the denominator degrees of freedom can be specified with the DDFM option. If no DDFM option is given in the MODEL statement, the CONTAINMENT method is used for the SAS code given here. For further details on the METHOD and DDFM options, see the SAS manual.2

For the unlinked case, the code is the same as that described previously, except that the REPEATED statement is replaced by

    repeated vtype / type=un(1) subject=replicate(persnum) r rcorr;

where TYPE=UN(1) specifies a variance-covariance matrix whose off-diagonal element is zero. Equivalently, one can use the following REPEATED statement:

    repeated / group=vtype r rcorr;

where GROUP=vtype specifies heterogeneous variances for observations with Vtype 1 and Vtype 2 (i.e., for X and Y).
EXAMPLES

Linked Data

Table 1, provided by Bland and Altman3 and reproduced in Lam et al.,6 shows linked repeated measurements of intramural pH and PaCO2 for eight subjects. Table 3 gives the simple Pearson correlation, the simple Pearson correlation based on subject means, the weighted correlation (Bland and Altman4), and the 95% bootstrapped confidence interval (CI) for each, for this data set. It is important to note here that bootstrapping was accomplished by resampling individuals, thus maintaining the appropriate correlation structure of the data. Inspection of Table 3 reveals that these correlation measures differ in both magnitude and sign. Thus, one is faced with the dilemma of choosing one of these measures as the appropriate measure of the true correlation. The values presented here differ from those of Bland and Altman4 because of rounding. Of the three correlation measures, the naive Pearson correlation has the shortest interval.

Lam et al.6 obtained parameter estimates (Table 4) for these data using an ML estimation program. These results can be reproduced with the SAS code given above by specifying METHOD=ML in the PROC MIXED statement. The main difference between ML and REML (the default) is that ML estimates of the covariance parameters are biased downward in small samples, because ML does not account for the degrees of freedom used in estimating the fixed effects; REML corrects for this. For comparison with the naive estimates reported in Table 3, we provide bootstrap confidence intervals for the correlation parameter estimates obtained using SAS's PROC MIXED.
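The ML bias mentioned above is easiest to see in the simplest special case: a single normal sample with an unknown mean. There the ML estimate of the variance divides the sum of squares by n, whereas the REML estimate divides by n − 1, so ML is biased downward by the factor (n − 1)/n. The Python sketch below illustrates that textbook case with a small simulation; it is not the PROC MIXED algorithm, and the sample size, number of replications, and seed are arbitrary choices.

```python
import random

def ml_variance(xs):
    """ML estimate of sigma^2 when the mean is also estimated: SS / n."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def reml_variance(xs):
    """REML estimate for the same model: SS / (n - 1)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def average_estimates(n, sigma2=1.0, reps=20000, seed=2003):
    """Average both estimators over simulated N(0, sigma2) samples of size n."""
    rng = random.Random(seed)
    ml_total = reml_total = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
        ml_total += ml_variance(xs)
        reml_total += reml_variance(xs)
    return ml_total / reps, reml_total / reps

if __name__ == "__main__":
    n = 8  # a small sample, comparable to the eight patients in Table 1
    ml, reml = average_estimates(n)
    print(f"mean ML estimate   {ml:.3f}  (theory: {(n - 1) / n:.3f})")
    print(f"mean REML estimate {reml:.3f}  (theory: 1.000)")
```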
Table 3. Simple correlations between pH (X) and PaCO2 (Y), with 95% bootstrap confidence intervals, for the data in Table 1.

    Correlation                           Value    95% CI
    Naive Pearson correlation
    Pearson correlation based on means
    Weighted correlation                           ( , 0.813)

Table 4. Parameter estimates from Lam et al.6 for the pH (X)–PaCO2 (Y) data in Table 1, with 95% bootstrap confidence interval for $\rho_{XY}$.

    Parameter       Estimate    95% CI
    $\mu_X$
    $\mu_Y$         5.008
    $\sigma_X^2$
    $\sigma_Y^2$
    $\rho_X$
    $\rho_Y$        0.654
    $\lambda$
    $\rho_{XY}$                 ( , )
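Resampling individuals rather than single observations keeps each subject's replicates together in every bootstrap sample, which is what preserves the within-subject correlation structure. The Python sketch below assumes a percentile interval and a hypothetical list-of-subjects layout; the text above does not state which bootstrap interval or how many resamples were used, so both are assumptions, and any statistic (for example, the naive Pearson correlation of Table 3) can be plugged in.

```python
import random
from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def naive_pearson(subjects):
    """Pooled Pearson correlation; subjects is a list of lists of (x, y) pairs."""
    xs = [x for pairs in subjects for x, _ in pairs]
    ys = [y for pairs in subjects for _, y in pairs]
    return pearson(xs, ys)

def cluster_bootstrap_ci(subjects, stat, n_boot=2000, alpha=0.05, seed=1):
    """Percentile CI from resampling whole subjects with replacement."""
    rng = random.Random(seed)
    n = len(subjects)
    reps = []
    for _ in range(n_boot):
        sample = [subjects[rng.randrange(n)] for _ in range(n)]
        try:
            reps.append(stat(sample))
        except ZeroDivisionError:  # degenerate resample with no spread; skip it
            continue
    reps.sort()
    lo = reps[int(alpha / 2 * (len(reps) - 1))]
    hi = reps[int((1 - alpha / 2) * (len(reps) - 1))]
    return lo, hi

if __name__ == "__main__":
    # Hypothetical subjects, each a list of linked (x, y) replicate pairs
    subjects = [
        [(7.1, 5.0), (7.2, 4.8), (7.0, 5.3)],
        [(7.3, 4.6), (7.4, 4.5)],
        [(7.2, 5.1), (7.1, 5.4), (7.3, 4.9), (7.2, 5.0)],
        [(7.0, 5.6), (6.9, 5.8)],
        [(7.25, 4.7), (7.35, 4.4), (7.3, 4.8)],
    ]
    print("estimate:", round(naive_pearson(subjects), 3))
    print("95% CI  :", cluster_bootstrap_ci(subjects, naive_pearson))
```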

Selected portions of the SAS output are given in Tables 5–8, where labels have been added for clarity. Table 5 gives the results obtained from the SAS code for the R and G matrices. From Table 5, $\hat\sigma^2_{XR}$ and $\hat\sigma^2_{YR} = 0.547$ are the estimated variances of X and Y, respectively, obtained from the R matrix. Similarly, $\hat\sigma^2_{XG}$ and $\hat\sigma^2_{YG} = 0.45$ are the estimated variances of X and Y, respectively, obtained from the G matrix. The corresponding covariances from the R and G matrices are also given in Table 5 (0.05 for G). The correlations derived from the R ($\rho_R$) and G ($\rho_G$) matrices are 0.509 and 0.1416, respectively.

The results in Table 4 are obtained from Tables 6, 7, and 8. Table 6 gives the V matrix obtained from the SAS code, and Table 7 gives the corresponding correlations. From Table 6, the diagonal entries are the overall estimated variances ($\sigma_X^2$; $\sigma_Y^2$) of X and Y, respectively. Note that $\sigma_X^2 = \sigma^2_{XG} + \sigma^2_{XR}$ and $\sigma_Y^2 = \sigma^2_{YG} + \sigma^2_{YR}$. Note also that the elements of G appear in V. Table 7 gives the estimated correlation between X and Y ($\rho_{XY}$). For $j \neq j'$, Table 7 also gives the estimated correlation between $X_{ij}$ and $X_{ij'}$ ($\rho_X$), and the estimated correlation between $Y_{ij}$ and $Y_{ij'}$ ($\rho_Y$) is 0.654. The estimated correlation between $X_{ij}$ and $Y_{ij'}$ ($\lambda\rho_{XY}$) is 0.104; thus, the estimate of $\lambda$ is this value divided by the estimate of $\rho_{XY}$.

The means $\mu_X$ and $\mu_Y$ are obtained from Table 8. In SAS, when a classification variable appears in the MODEL statement, its last (here, highest) level is taken as the reference level, which in this case is the variable labeled PaCO2 (Y). The estimate for $\mu_X$ is 7.1151, which is the sum of the estimated values for the intercept and pH (X). The estimate for $\mu_Y$ is 5.008, the intercept value.

Table 5. Estimated R and G matrices obtained for the data in Table 1 using the SAS PROC MIXED procedure. Rows and columns of each matrix: pH, PaCO2.

Table 6. Estimated variance-covariance matrix for the pH (X)–PaCO2 (Y) data in Table 1, for PATI 1. Rows and columns: $X_1, Y_1, X_2, Y_2, X_3, Y_3, X_4, Y_4$.

Table 7. Estimated correlation matrix between pH (X) and PaCO2 (Y) for the data in Table 1, for PATI 1. Rows and columns: $X_1, Y_1, X_2, Y_2, X_3, Y_3, X_4, Y_4$.

Table 8. Regression results for the pH (X)–PaCO2 (Y) data in Table 1. Columns: Effect, Estimate, SE, DF, t Value, Pr > |t|. Rows: Intercept, pH, PaCO2 (estimate 0, the reference level).
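The conversions described above (adding the G and R variances to obtain the overall variances, dividing the G entries by those totals to obtain $\rho_X$ and $\rho_Y$, and taking the ratio that gives $\lambda$) can be collected in one small function. This Python sketch assumes the $2 \times 2$ G and R matrices have already been read off the PROC MIXED output; the function name and example matrices are hypothetical. For the unlinked model one passes an R matrix with zero covariance, in which case the sketch returns $\lambda = 1$.

```python
from math import sqrt

def params_from_g_r(G, R):
    """Map 2 x 2 G (subject level) and R (replicate level) estimates to model parameters.

    G = [[s2_XG, s_XYG], [s_XYG, s2_YG]] and R = [[s2_XR, s_XYR], [s_XYR, s2_YR]].
    """
    s2x = G[0][0] + R[0][0]                 # sigma_X^2 = sigma_XG^2 + sigma_XR^2
    s2y = G[1][1] + R[1][1]                 # sigma_Y^2 = sigma_YG^2 + sigma_YR^2
    sd = sqrt(s2x * s2y)
    rho_xy = (G[0][1] + R[0][1]) / sd       # Corr(X_ij, Y_ij), same replicate
    cross = G[0][1] / sd                    # Corr(X_ij, Y_ij'), j != j' (= lambda * rho_XY)
    return {
        "sigma2_X": s2x,
        "sigma2_Y": s2y,
        "rho_X": G[0][0] / s2x,             # Corr(X_ij, X_ij')
        "rho_Y": G[1][1] / s2y,             # Corr(Y_ij, Y_ij')
        "rho_XY": rho_xy,
        "lambda": cross / rho_xy if rho_xy != 0 else float("nan"),
        "rho_G": G[0][1] / sqrt(G[0][0] * G[1][1]),
        "rho_R": R[0][1] / sqrt(R[0][0] * R[1][1]),
    }

if __name__ == "__main__":
    # Hypothetical estimates, not the values in Table 5
    G = [[2.0, 0.6], [0.6, 2.25]]
    R = [[2.0, 2.4], [2.4, 6.75]]
    for name, value in params_from_g_r(G, R).items():
        print(f"{name:9s} {value:.4f}")
```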
Unlinked Data

The data in Table 2 are from an environmental study that measured benzene concentrations (in µg/m³) in the air inside and outside the homes of several Mexican families. The data are entered into SAS in the same way as for Example 1. Table 9 gives the simple correlation coefficients along with the 95% bootstrapped confidence intervals. Of the two correlation measures, the naive Pearson correlation has the shorter confidence interval. Inspection of Table 9 indicates that it is much more difficult to choose an appropriate measure of the true correlation in this setting, because the number of observations is not the same in the two cases. In addition, not all of the data (95 observations) are used in computing these correlations. The discrepancy in the number of observations arises because, for some subjects, measurements are missing. Consequently, a weighted correlation would be difficult to compute. These problems do not occur if the SAS code is used to obtain the correlation.

Table 10 gives the results obtained from the SAS code for the R and G matrices. From Table 10, $\sigma^2_{XR}$ and $\sigma^2_{YR}$ are the estimated variances of X and Y, respectively, obtained from the R matrix, and $\sigma^2_{XG}$ and $\sigma^2_{YG}$ are the estimated variances of X and Y, respectively, obtained from the G matrix. The correlation derived from the G matrix ($\rho_G$) is 0.655. Note that the R matrix has no covariance and, hence, no correlation, because TYPE=UN(1) was specified in the REPEATED statement.

Table 9. Simple correlations between indoor (X) and outdoor (Y) air, with 95% bootstrap confidence intervals, for the benzene data in Table 2. Columns: Correlation, # of Obs, Value, 95% CI. Rows: Naive Pearson correlation, Pearson correlation based on means.

Table 10. Estimated R and G matrices obtained for the data in Table 2 using the SAS PROC MIXED procedure. Rows and columns of each matrix: Indoor, Outdoor.

The results in Tables 11, 12, and 13 are used to obtain the results in Table 14. Table 11 gives the V matrix obtained from the SAS code, and Table 12 gives the corresponding correlations. From Table 11, the diagonal entries are the overall estimated variances ($\sigma_X^2$; $\sigma_Y^2$) of X and Y, respectively. Note that $\sigma_X^2 = \sigma^2_{XG} + \sigma^2_{XR}$ and $\sigma_Y^2 = \sigma^2_{YG} + \sigma^2_{YR}$. Note also that the elements of G appear in V. Table 12 gives the estimated correlation between X and Y ($\rho_{XY}$). For $j \neq j'$, the estimated correlation between $X_{ij}$ and $X_{ij'}$ ($\rho_X$) is 0.88, and the estimated correlation between $Y_{ij}$ and $Y_{ij'}$ ($\rho_Y$) is also given in Table 12. From Table 13, $\mu_X$ is the sum of the intercept and Indoor estimates, and $\mu_Y$ is the intercept, Outdoor being the reference level. Table 14 also provides a bootstrap confidence interval for the parameter of interest, $\rho_{XY}$.

Table 11. Estimated variance-covariance matrix for the indoor (X)–outdoor (Y) benzene data in Table 2, for family 6. Rows and columns: $X_1, Y_1, X_2, Y_2$.

Table 12. Estimated correlation matrix between indoor (X) and outdoor (Y) air for the benzene data in Table 2, for family 6. Rows and columns: $X_1, Y_1, X_2, Y_2$.

Table 13. Regression results for the indoor (X)–outdoor (Y) benzene data in Table 2. Columns: Effect, Estimate, SE, DF, t Value, Pr > |t|. Rows: Intercept, Indoor, Outdoor (estimate 0, the reference level).

Table 14. Parameter estimates for the indoor (X)–outdoor (Y) benzene data in Table 2, with 95% bootstrap confidence interval for $\rho_{XY}$.

    Parameter       Estimate    95% CI
    $\mu_X$
    $\mu_Y$
    $\sigma_X^2$
    $\sigma_Y^2$
    $\rho_X$        0.88
    $\rho_Y$
    $\rho_{XY}$                 ( , )
DISCUSSION

In this paper, we investigated methods to assess the correlation between two variates, X and Y, in the presence of repeated measures or replicates. Both linked and unlinked settings were considered, in both cases under the assumption that the two variates follow a multivariate normal distribution. Ad hoc approaches as well as PROC MIXED in SAS were used to estimate the correlation for two examples. Of the ad hoc approaches, the bootstrapped confidence interval was shortest for the naive Pearson approach. Bootstrapped confidence intervals for the mixed-model formulation were approximately equal to or shorter than those for the ad hoc approaches. This suggests that the mixed-model approach uses the data more efficiently. The mixed-model formulation overcomes some of the inherent problems with the ad hoc approaches and is very easy to apply using PROC MIXED in SAS.

Although not of direct relevance to the topic of this paper, our data examples revealed some interesting features in relation to the effects of outliers. For both the ad hoc approaches and the SAS PROC MIXED approach, the estimates were sensitive to the exclusion of an extreme observation. For example, in the benzene analysis, the correlation coefficient was reduced by 69% when an influential observation was removed. This finding suggests that users should check that the results are not driven by extreme values before interpreting them.
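One simple way to follow this advice is to drop one observation at a time and see how far the estimate moves. The Python sketch below does this for the naive pooled Pearson correlation on hypothetical data, treating each linked (x, y) pair as an observation; the same loop could wrap any estimator, including a refit of the mixed model, and the choice of statistic and data layout are assumptions.

```python
from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def naive_pearson(data):
    xs = [x for pairs in data.values() for x, _ in pairs]
    ys = [y for pairs in data.values() for _, y in pairs]
    return pearson(xs, ys)

def leave_one_out(data, stat):
    """Recompute stat with each (subject, replicate index) pair removed.

    Returns the full-data estimate and a dict mapping (subject, index) to the
    change in the estimate caused by dropping that pair.
    """
    full = stat(data)
    changes = {}
    for s, pairs in data.items():
        for idx in range(len(pairs)):
            reduced = {k: list(v) for k, v in data.items()}
            del reduced[s][idx]
            if not reduced[s]:
                del reduced[s]          # subject had only this pair
            changes[(s, idx)] = stat(reduced) - full
    return full, changes

if __name__ == "__main__":
    # Hypothetical data with one extreme pair in subject 3
    data = {1: [(1.0, 1.2), (2.0, 2.1)], 2: [(3.0, 2.9), (4.0, 4.2)],
            3: [(5.0, 5.1), (40.0, -10.0)]}
    full, changes = leave_one_out(data, naive_pearson)
    worst = max(changes, key=lambda k: abs(changes[k]))
    print(f"full-data r = {full:.3f}; dropping pair {worst} shifts r by {changes[worst]:+.3f}")
```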

10 Downloaded by [ ] at 13:59 06 December 017 observation For example, in the benzene analysis, the correlation coefficient was reduced by 69% when an influential observation was removed This finding suggests that users should be cautious to make sure the results are not driven by extreme values before interpreting them If the data are skewed, thereby violating the normality assumption, a transformation, such as the log, might be appropriate before applying the SAS PROC MIXED procedure On the other hand, one can compute Spearman s correlation 9 in the simple case (no repeats) However, it is not clear how one would generalize our method to compute a Spearman correlation in the presence of replication We have treated X and Y as being distinct variables, each having its own mean and variance However, in many instances, one may not be able to distinguish between X and Y For example, consider two different devices used to measure the lung capacity of a subject In this situation, one is more interested in the agreement of measurement of the two devices A measure of this agreement is the concordance correlation 1,7 On the other hand, interest may focus on the degree to which a single measure of an event describes the mean of repeated measurements of that event In this case, an intraclass correlation 1 can be computed For both of the data sets presented here, intraclass correlations can be computed Finally, one can compute a correlation for each subject and then use the subject correlations in the computation of an overall correlation 7 This procedure would work well if there were several repeated measurements per subject per variable ACKNOWLEDGMENTS This work was supported by NIH grants ES0000, ES0714, and ES05947 REFERENCES 1 Rosner, B Fundamentals of Biostatistics, 5th ed; Duxbury: Pacific Grove, CA, 000 SAS Institute Inc SAS/STAT User s Guide: Version 8, Volume ; SAS Institute, Inc: Cary, NC, Bland, JM; Altman, DG Calculating Correlation Coefficients with Repeated 
Observations: Part 1 Correlation within Subjects; Brit Med J 1995, 310, Bland, JM; Altman, DG Calculating Correlation Coefficients with Repeated Observations: Part Correlation between Subjects; Brit Med J 1995, 310, Bland, JM; Altman, DG Correlation, Regression and Repeated Data; Brit Med J 1994, 308, Lam, M; Webb, CA; O Donnell, DE Correlation between Two Variables in Repeated Measures In American Statistical Association, Proceedings of the Biometric Section; American Statistical Association: Alexandria, VA, 1999; pp Chinchilli, VM; Martel, JK; Kumanyika, S; Lloyd, T A Weighted Concordance Correlation Coefficient for Repeated Measures Designs; Biometrics 1996, 5, Kenward, MG; Roger, JH Small Sample Inference for Fixed Effects from Restricted Maximum Likelihood; Biometrics 1997, 53, Zar, JH Biostatistical Analysis, 4th ed; Prentice Hall: Upper Saddle River, NJ, 1999 About the Authors Anthony Hamlett is a research fellow and Louise Ryan is a professor of biostatistics in the Department of Biostatistics, Harvard School of Public Health, 655 Huntington Avenue, Boston, MA 0115 Paulina Serrano-Trespalacios is a doctoral student in the Department of Environmental Science, Harvard School of Public Health, 655 Huntington Avenue, Boston, MA 0115 Russ Wolfinger is the director of geonomics at SAS Institute Inc, SAS Campus Drive, Cary, NC Journal of the Air & Waste Management Association Volume 53 April 003
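A simple leave-one-out check can reveal the outlier sensitivity noted in the discussion. The following Python sketch works under stated assumptions. It uses a plain Pearson correlation on one pair per subject, not the mixed-model estimate. The concentrations are made-up values, not the benzene data. The code recomputes r with each pair dropped in turn, flags the pair whose removal changes r the most, and compares the result on the log scale.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def leave_one_out(x, y):
    """Pearson r recomputed with each pair dropped in turn."""
    return [pearson(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(len(x))]

# Hypothetical paired concentrations (made up, not the study data):
# five unremarkable pairs plus one extreme pair in the last position.
indoor = [1.2, 0.8, 1.5, 1.0, 1.3, 9.0]
outdoor = [0.9, 1.1, 1.0, 1.4, 0.7, 6.0]

r_all = pearson(indoor, outdoor)
r_drop = leave_one_out(indoor, outdoor)
# Index of the pair whose removal moves r furthest from the all-pairs value.
most_influential = max(range(len(r_drop)), key=lambda i: abs(r_drop[i] - r_all))
# Log-transforming positive, right-skewed data reduces (but need not remove)
# the pull of the extreme pair.
r_log = pearson([math.log(v) for v in indoor], [math.log(v) for v in outdoor])

print(f"r, all pairs:          {r_all:.3f}")
print(f"r, without pair {most_influential}:     {r_drop[most_influential]:.3f}")
print(f"r, log scale:          {r_log:.3f}")
```

In this toy example, dropping the extreme last pair turns a strong positive correlation into a negative one. The log transform lowers r but leaves it high, because the extreme pair still dominates.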
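In the simple case with no replicates, Spearman's correlation is the Pearson correlation computed on ranks. The following minimal Python sketch assumes one (x, y) pair per subject and gives tied values the average of the ranks they occupy.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # 0-based positions -> 1-based ranks
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman's rank correlation: Pearson's r applied to average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    return pearson(average_ranks(x), average_ranks(y))

# A monotone but nonlinear relation has Spearman correlation 1.
print(spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))  # prints 1.0
```

The statistic depends only on ranks, so a single extreme pair typically moves it less than it moves Pearson's r. Pooling all replicates into one call would ignore the within-subject dependence discussed above.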
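For the two-device setting, a common agreement measure is Lin's concordance correlation coefficient:

ρc = 2σXY / (σX² + σY² + (μX − μY)²)

This equals Pearson's r times a factor, at most 1, that penalizes differences in location and scale between the devices. The following Python sketch assumes one measurement per subject from each device and uses moment estimates with divisor n. It does not implement the weighted repeated-measures version of Chinchilli et al. (reference 7). The device readings are hypothetical.

```python
def concordance(x, y):
    """Lin's concordance correlation coefficient (moment estimates, divisor n)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # The squared mean difference penalizes a systematic offset between devices.
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)

# Hypothetical readings: device B reads a constant 0.4 units higher than device A,
# so Pearson's r would be 1 while the concordance is below 1.
device_a = [3.1, 2.8, 3.6, 4.0, 3.3]
device_b = [v + 0.4 for v in device_a]
print(round(concordance(device_a, device_b), 3))
```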
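The intraclass correlation mentioned in the discussion can be estimated from a one-way random-effects ANOVA:

ICC = (MSB − MSW) / (MSB + (k − 1)MSW)

Here, MSB and MSW are the between-subject and within-subject mean squares, and k is the number of replicates per subject. The following minimal Python sketch assumes a balanced design with the same k for every subject. Unbalanced data, where the number of replicates varies by subject, need an adjusted k and are not handled here. The example values are made up.

```python
def one_way_icc(groups):
    """One-way random-effects intraclass correlation for a balanced design.

    groups: list of per-subject lists, each holding the same number k >= 2
    of replicate measurements.
    """
    n = len(groups)
    k = len(groups[0])
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need >= 2 subjects, each with the same k >= 2 replicates")
    means = [sum(g) / k for g in groups]
    grand = sum(means) / n  # equals the overall mean when the design is balanced
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g) / (n * (k - 1))
    # The estimate is negative when msb < msw; some texts truncate it at 0.
    return (msb - msw) / (msb + (k - 1) * msw)

# Made-up data: three subjects with two replicates each.
print(one_way_icc([[1, 3], [5, 7], [9, 11]]))
```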
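The final suggestion in the discussion is to combine subject-specific correlations into an overall correlation. One way to do this is to Fisher-transform each subject's correlation, average the z values with weights n_i − 3, and back-transform. This is an assumed approach, not the procedure of reference 7. The result summarizes the within-subject association, which can differ from the between-subject correlation (references 3 and 4). The following Python sketch assumes that every subject contributes at least four paired measurements. The example subjects are hypothetical.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pooled_within_subject_r(subjects):
    """Weighted average of per-subject correlations on Fisher's z scale.

    subjects: list of (x, y) tuples of equal-length sequences, one per subject,
    each with at least 4 points so that the weight n_i - 3 is positive.
    """
    num = den = 0.0
    for x, y in subjects:
        n = len(x)
        if n != len(y) or n < 4:
            raise ValueError("each subject needs at least 4 paired measurements")
        r = pearson(x, y)
        if abs(r) >= 1.0:
            raise ValueError("Fisher's z is undefined for |r| = 1")
        w = n - 3  # approximate inverse variance of atanh(r)
        num += w * math.atanh(r)
        den += w
    return math.tanh(num / den)

# Hypothetical subjects with four and five paired measurements.
subjects = [([1, 2, 3, 4], [1, 2, 3, 5]), ([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])]
print(round(pooled_within_subject_r(subjects), 3))
```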


More information

with the usual assumptions about the error term. The two values of X 1 X 2 0 1

with the usual assumptions about the error term. The two values of X 1 X 2 0 1 Sample questions 1. A researcher is investigating the effects of two factors, X 1 and X 2, each at 2 levels, on a response variable Y. A balanced two-factor factorial design is used with 1 replicate. The

More information

The bootstrap. Patrick Breheny. December 6. The empirical distribution function The bootstrap

The bootstrap. Patrick Breheny. December 6. The empirical distribution function The bootstrap Patrick Breheny December 6 Patrick Breheny BST 764: Applied Statistical Modeling 1/21 The empirical distribution function Suppose X F, where F (x) = Pr(X x) is a distribution function, and we wish to estimate

More information

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression INTRODUCTION TO CLINICAL RESEARCH Introduction to Linear Regression Karen Bandeen-Roche, Ph.D. July 17, 2012 Acknowledgements Marie Diener-West Rick Thompson ICTR Leadership / Team JHU Intro to Clinical

More information

Sigmaplot di Systat Software

Sigmaplot di Systat Software Sigmaplot di Systat Software SigmaPlot Has Extensive Statistical Analysis Features SigmaPlot is now bundled with SigmaStat as an easy-to-use package for complete graphing and data analysis. The statistical

More information

DESIGNING EXPERIMENTS AND ANALYZING DATA A Model Comparison Perspective

DESIGNING EXPERIMENTS AND ANALYZING DATA A Model Comparison Perspective DESIGNING EXPERIMENTS AND ANALYZING DATA A Model Comparison Perspective Second Edition Scott E. Maxwell Uniuersity of Notre Dame Harold D. Delaney Uniuersity of New Mexico J,t{,.?; LAWRENCE ERLBAUM ASSOCIATES,

More information

Describing Change over Time: Adding Linear Trends

Describing Change over Time: Adding Linear Trends Describing Change over Time: Adding Linear Trends Longitudinal Data Analysis Workshop Section 7 University of Georgia: Institute for Interdisciplinary Research in Education and Human Development Section

More information

Repeated ordinal measurements: a generalised estimating equation approach

Repeated ordinal measurements: a generalised estimating equation approach Repeated ordinal measurements: a generalised estimating equation approach David Clayton MRC Biostatistics Unit 5, Shaftesbury Road Cambridge CB2 2BW April 7, 1992 Abstract Cumulative logit and related

More information

Odor attraction CRD Page 1

Odor attraction CRD Page 1 Odor attraction CRD Page 1 dm'log;clear;output;clear'; options ps=512 ls=99 nocenter nodate nonumber nolabel FORMCHAR=" ---- + ---+= -/\*"; ODS LISTING; *** Table 23.2 ********************************************;

More information

Factor Analysis. Qian-Li Xue

Factor Analysis. Qian-Li Xue Factor Analysis Qian-Li Xue Biostatistics Program Harvard Catalyst The Harvard Clinical & Translational Science Center Short course, October 7, 06 Well-used latent variable models Latent variable scale

More information