Computation of Variances of Functions of Parameter Estimates for Mixed Models in GLM

Ramon C. Littell and Stephen B. Linda
University of Florida

1. Introduction

Estimates of functions of parameters in linear models may be obtained from the ESTIMATE statement in PROC GLM. The standard error and hypothesis test produced by the ESTIMATE statement are correct for fixed effects models. However, when fitting mixed models, the standard error produced by the ESTIMATE statement may not be correct. This is true, for example, in many split-plot and repeated measures applications.

A special class of estimable functions is the least squares means, also referred to as adjusted means, which may be obtained in GLM from the LSMEANS statement. GLM allows some flexibility when fitting mixed models by providing the E= option in the LSMEANS statement, by which an effect in the model may be specified as an error term for calculating standard errors and hypothesis tests of least squares means. However, an appropriate error term is not always clearly evident, and may not be available as an effect in the model. For example, in a split-plot model, no effect in the model is appropriate as an error term for computing the standard error of the mean for a given level of the sub-plot treatment.

The CONTRAST and RANDOM statements can be used along with the ESTIMATE statement to determine the correct variance expression of estimable functions of parameters (e.g., see Milliken and Johnson 1984, Allen 1989). Mean squares can then be combined to obtain an estimate of the variance expression. The purpose of this paper is to illustrate and extend the methodology and to provide the theory to justify it.

2. Motivating Example

Consider the following example, from Freund and Littell (1981). Each subject (SUBJ) in a sample was assigned to one of the two levels, H and L, of a factor TRT. There are three subjects at level H and two subjects at level L. Measurements (Y) are made on each subject at three time phases (PHASE), 1, 2, and 3.
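The data are presented in Table 1.

Table 1. Data used in Example.

OBS  TRT  SUBJ  PHASE      Y
  1   H     1     1     7.15
  2   H     1     2    15.30
  3   H     1     3    13.25
  4   H     2     1     5.61
  5   H     2     2    18.00
  6   H     2     3    12.18
  7   H     3     1     5.58
  8   H     3     2    15.81
  9   H     3     3     9.33
 10   L     1     1     6.98
 11   L     1     2    16.11
 12   L     1     3    10.35
 13   L     2     1     5.03
 14   L     2     2    16.12
 15   L     2     3     9.56

A statistical model is

$$Y_{ijk} = \mu + \alpha_i + \delta_{ij} + \tau_k + (\alpha\tau)_{ik} + \epsilon_{ijk},$$

where $Y_{ijk}$ is the response from SUBJECT $j$ in level $i$ of TRT at PHASE $k$,

$$E(Y_{ijk}) = \mu + \alpha_i + \tau_k + (\alpha\tau)_{ik},$$

and $\delta_{ij}$ and $\epsilon_{ijk}$ are independent random parameters with

$$E(\delta_{ij}) = 0, \quad V(\delta_{ij}) = \sigma_\delta^2, \quad E(\epsilon_{ijk}) = 0, \quad V(\epsilon_{ijk}) = \sigma_\epsilon^2.$$

An analysis of variance (ANOVA) of the form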
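
Source       df   Mean Square        F
TRT           1         1.592     0.70
SUBJ(TRT)     3         2.285
PHASE         2       130.09     83.93
TRT*PHASE     2         0.847     0.55
ERROR         6         1.550

can be obtained from the statements

    PROC GLM;
      CLASSES TRT SUBJ PHASE;
      MODEL Y = TRT SUBJ(TRT) PHASE TRT*PHASE / SS3;
      TEST H=TRT E=SUBJ(TRT);

Suppose you want to estimate the mean response for PHASE 1, averaged across TRT H and TRT L. In terms of the model parameters, you want an estimate of

$$\bar\mu_{\cdot 1} = \mu + \tfrac{1}{2}(\alpha_H + \alpha_L) + \tau_1 + \tfrac{1}{2}\left[(\alpha\tau)_{H1} + (\alpha\tau)_{L1}\right].$$

This is the expression estimated by the LSMEAN for PHASE 1. So you run the statement

    LSMEANS PHASE / STDERR E;

and obtain Table 2.

Table 2. Output from LSMEANS/STDERR.

              Y         Std Err      Pr > |T|
PHASE      LSMEAN       LSMEAN     H0:LSMEAN=0
  1       6.0591667    0.5681883      0.0001
  2      16.2425000    0.5681883      0.0001
  3      10.7708333    0.5681883      0.0001

The STDERR option in the LSMEANS statement prints the standard errors of the LSMEANS. The standard errors, however, are not correct because they do not incorporate the between-subject variance $\sigma_\delta^2$. You can see this by looking at the terms in the model that are involved in the LSMEAN. For this simple example, you can compute the LSMEAN for PHASE 1 as

$$\text{PHASE 1 LSMEAN} = \tfrac{1}{2}\{\bar y_{H\cdot 1} + \bar y_{L\cdot 1}\} = \tfrac{1}{2}\left\{\tfrac{1}{3}(7.15 + 5.61 + 5.58) + \tfrac{1}{2}(6.98 + 5.03)\right\} = \tfrac{1}{2}\{6.11 + 6.01\} = 6.06.$$

In terms of the model, you get

$$\text{PHASE 1 LSMEAN} = \tfrac{1}{2}\{\bar y_{L\cdot 1} + \bar y_{H\cdot 1}\} = \tfrac{1}{2}\left\{(\mu + \alpha_L + \bar\delta_{L\cdot} + \tau_1 + (\alpha\tau)_{L1} + \bar\epsilon_{L\cdot 1}) + (\mu + \alpha_H + \bar\delta_{H\cdot} + \tau_1 + (\alpha\tau)_{H1} + \bar\epsilon_{H\cdot 1})\right\}$$

$$= \mu + \tfrac{1}{2}(\alpha_L + \alpha_H) + \tau_1 + \tfrac{1}{2}\left[(\alpha\tau)_{L1} + (\alpha\tau)_{H1}\right] + \tfrac{1}{2}\left(\bar\delta_{L\cdot} + \bar\delta_{H\cdot} + \bar\epsilon_{L\cdot 1} + \bar\epsilon_{H\cdot 1}\right).$$

The $\delta$'s and $\epsilon$'s are the random quantities in this expression, so

$$V(\text{PHASE 1 LSMEAN}) = V\left\{\tfrac{1}{2}\left(\bar\delta_{L\cdot} + \bar\delta_{H\cdot} + \bar\epsilon_{L\cdot 1} + \bar\epsilon_{H\cdot 1}\right)\right\} = \tfrac{1}{4}\left\{\sigma_\delta^2\left(\tfrac{1}{3} + \tfrac{1}{2}\right) + \sigma_\epsilon^2\left(\tfrac{1}{3} + \tfrac{1}{2}\right)\right\}.$$

The estimated standard error as printed by GLM involves only $\sigma_\epsilon^2(1/3 + 1/2)$. To check this, compute

$$\tfrac{1}{4}\,\hat\sigma_\epsilon^2\left(\tfrac{1}{3} + \tfrac{1}{2}\right) = \tfrac{1}{4}\,\text{MS(ERROR)}\left(\tfrac{1}{3} + \tfrac{1}{2}\right).$$

This gives

$$\tfrac{1}{4}(1.550)(0.8333) = 0.323, \qquad \text{Std. err. PHASE 1 LSMEAN} = \sqrt{0.323} = 0.568,$$

which is the same as printed. So you see that this computation has not involved the $\sigma_\delta^2$ term in the correct expression for the variance of the LSMEAN.

The correct expression for the variance of the LSMEAN is easy to derive for this simple example.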
But in more complex data sets it is typically impossible to derive a variance expression algebraically. The main objective of this paper is to show how this can be done using existing facilities in GLM. A secondary objective is to indicate that the essential computations already exist within GLM. It should therefore be relatively easy to enhance GLM to print the appropriate standard errors.
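
The arithmetic of the motivating example can be replayed in a few lines. The following is only a sketch in Python (not part of the GLM workflow); the function name `lsmean_variance` and its argument names are illustrative assumptions, while the formula and MS(ERROR) = 1.550 are taken from the example above.

```python
import math

def lsmean_variance(sigma2_delta, sigma2_eps, n_h=3, n_l=2):
    """Variance of the PHASE 1 LSMEAN for n_h subjects in H and n_l in L:
    (1/4) * (sigma2_delta + sigma2_eps) * (1/n_h + 1/n_l)."""
    return 0.25 * (sigma2_delta + sigma2_eps) * (1.0 / n_h + 1.0 / n_l)

mse = 1.550  # MS(ERROR) from the ANOVA table, used as the estimate of sigma2_eps

# What the STDERR option reproduces: the between-subject variance is left out.
naive_var = lsmean_variance(0.0, mse)
naive_se = math.sqrt(naive_var)

# Any positive between-subject variance makes the true variance larger.
# The value 0.5 is a hypothetical number chosen only for illustration.
with_subject_var = lsmean_variance(0.5, mse)

print(round(naive_var, 3), round(naive_se, 3))
```

Setting `sigma2_delta` to zero recovers the standard error that GLM prints, which is the sense in which the printed value omits the between-subject component.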

3. Summary of Method

Let $\hat\theta = L\hat\beta$ be a linear function of parameter estimates from an ESTIMATE or LSMEANS statement in GLM. Obtain an estimate $\hat V(\hat\theta)$ of the variance of $\hat\theta$ as follows:

1. Use an ESTIMATE or LSMEANS statement to compute $\hat\theta$. If an LSMEANS statement was used to compute $\hat\theta$, specify the E option to obtain coefficients of the "estimable function." Then use these coefficients in an ESTIMATE statement to duplicate the value of $\hat\theta$ from the LSMEANS statement. (This is a check to make sure you have the right coefficients.)

2. Include a CONTRAST statement specifying the same coefficients as those in the ESTIMATE statement.

3. Include a RANDOM statement with the Q option.

4. Obtain the quantity $L(A'A)^{-}L'$ in one of two ways:

   i. $L(A'A)^{-}L' = h^2/\text{MS(ERROR)}$, where $h$ is the reported standard error of $\hat\theta$ from the ESTIMATE statement, or

   ii. $L(A'A)^{-}L' = L_{0i}^2/q_{ii}$, where $L_{0i}$ is the coefficient of the $i$th fixed parameter, and $q_{ii}$ is the coefficient reported in the $i$th row and $i$th column of the matrix from the Q option in the RANDOM statement, for $L_{0i} \neq 0$, $i = 1, \ldots, p$. Note: it does not matter which $L_{0i}$ is used as long as $L_{0i} \neq 0$.

5. Obtain the coefficients $r_i$, $i = 1, \ldots, k$, of the variance components in the expected mean square of the CONTRAST from the Expected Mean Square Table.

6. Obtain estimates $\hat\sigma_i^2$, $i = 1, \ldots, k$, of the variance components by

   i. setting expected mean squares equal to observed mean squares, and solving for the variance components, or

   ii. using PROC VARCOMP.

7. Compute $\hat V(\hat\theta)$ as

$$\hat V(\hat\theta) = [L(A'A)^{-}L'] \sum_{i=1}^{k} r_i \hat\sigma_i^2.$$

Approximate degrees of freedom for $\hat V(\hat\theta)$ to be used in hypothesis testing and confidence interval construction may be obtained from Satterthwaite's formula.

4. Solution to Example Problem

For the example, the linear function to be estimated is $\theta = \bar\mu_{\cdot 1}$. We obtain the estimate $\hat\theta$ from the LSMEANS statement. Here are the steps in Section 3 for this example.

1. Run the LSMEANS statement with the E option to get the coefficients of the "estimable function."

    LSMEANS PHASE / STDERR E;

This gives the output in Tables 2 and 3. Use the coefficients of the "estimable function" in an ESTIMATE statement to duplicate the LSMEANS computation:

    ESTIMATE 'PHASE ADJ MEAN' INTERCEPT 1 TRT .5 .5
             SUBJ(TRT) .166667 .166667 .166667 .25 .25
             PHASE 1 0 0 TRT*PHASE .5 0 0 .5 0 0;

This gives the output in Table 4, which confirms the printed LSMEAN values and verifies that you have the correct coefficients.

2. Run a CONTRAST statement with the same coefficients as in the ESTIMATE statement:

    CONTRAST 'PHASE ADJ MEAN' INTERCEPT 1 TRT .5 .5
             SUBJ(TRT) .166667 .166667 .166667 .25 .25
             PHASE 1 0 0 TRT*PHASE .5 0 0 .5 0 0;

3. Include a RANDOM statement with the Q option:

    RANDOM SUBJ(TRT) / Q;

obtaining the output in Tables 5, 6, and 7.

4. Obtain $L(A'A)^{-}L'$ from either i. or ii.:

   i. $$L(A'A)^{-}L' = \frac{(\text{reported standard error of } \hat\theta \text{ from ESTIMATE statement})^2}{\text{MS(ERROR)}} = \frac{(0.5682)^2}{1.550} = 0.2083.$$

   ii. Take $L_{0i} = L_{01} = 1$, the coefficient in the ESTIMATE statement of the first fixed effect parameter, which in this case is the "intercept." Now get $q_{11} = 4.8$ from the (1,1) position in the matrix of the quadratic form for the fixed effects (obtained from the Q option of the RANDOM statement, output given in Table 6). Then

$$\frac{L_{01}^2}{q_{11}} = \frac{1}{4.8} = 0.2083.$$

Note that both approaches i. and ii. give $L(A'A)^{-}L' = 0.2083 = \tfrac{1}{4}\left(\tfrac{1}{3} + \tfrac{1}{2}\right)$, which we calculated directly in Section 2.

5. From Table 5,

$$\text{Expected mean square of contrast} = \sigma_\epsilon^2 + \sigma_\delta^2 + \text{QUADRATIC},$$

giving $r_1 = 1$ and $r_2 = 1$. This tells you that

$$\sum_i r_i \sigma_i^2 = \text{Var(ERROR)} + \text{Var(SUBJ(TRT))} = \sigma_\epsilon^2 + \sigma_\delta^2.$$

Then the variance expression we want is

$$V(\hat{\bar\mu}_{\cdot 1}) = 0.2083(\sigma_\epsilon^2 + \sigma_\delta^2).$$

6. and 7. From the RANDOM statement you get the expected mean squares given in Table 7. Now MS(ERROR) is an estimate of $\sigma_\epsilon^2$, and you see that MS(SUBJ(TRT)) is an estimate of $\sigma_\epsilon^2 + 3\sigma_\delta^2$. So an estimate of $V(\hat{\bar\mu}_{\cdot 1}) = 0.2083(\sigma_\epsilon^2 + \sigma_\delta^2)$ is

$$\hat V(\hat{\bar\mu}_{\cdot 1}) = 0.2083\left(\frac{\text{MS[SUBJ(TRT)]}}{3} + \frac{2\,\text{MSE}}{3}\right) = 0.2083\left(\frac{2.284967}{3} + \frac{2(1.5496222)}{3}\right) = 0.3739.$$

Satterthwaite's formula for degrees of freedom for $\hat V(\hat{\bar\mu}_{\cdot 1})$ gives

$$d = \frac{(0.3739)^2}{(0.2083)^2\left[\dfrac{(\text{MS[SUBJ(TRT)]}/3)^2}{3} + \dfrac{(2\,\text{MSE}/3)^2}{6}\right]} = \frac{(0.3739)^2}{(0.2083)^2\left[\dfrac{(2.284967/3)^2}{3} + \dfrac{[2(1.5496222)/3]^2}{6}\right]} = 8.7,$$

and an approximate 95% confidence interval for $\bar\mu_{\cdot 1}$ is

$$\hat{\bar\mu}_{\cdot 1} \pm t_{8,\,0.025}\sqrt{\hat V(\hat{\bar\mu}_{\cdot 1})} = 6.0592 \pm 2.306\sqrt{0.3739} = (4.6491,\ 7.4693),$$

where $t_{8,\,0.025}$ is the 0.975 quantile of a Student's t distribution with eight degrees of freedom.

5. Theoretical Basis for Method

Consider the mixed model

$$Y = Xb + \sum_{i=1}^{k} U_i e_i, \qquad (1)$$

where
Y is the vector of n observations,
X is the n x p fixed effects design matrix,
b is a vector of p fixed effects parameters,
$U_i$ is the $n \times m_i$ design matrix for the $i$th random effect, with $U_k = I_n$, and
$e_i$ is a vector of $m_i$ independent random errors with mean 0 and variance $\sigma_i^2$, $i = 1, \ldots, k$.

Then $E(Y) = Xb$, and

$$V(Y) = \sum_{i=1}^{k} U_i U_i' \sigma_i^2. \qquad (2)$$

PROC GLM uses the method of ordinary least squares (OLS), and computes an "estimate" of $\beta = (b', e_1', e_2', \ldots, e_{k-1}')'$ as

$$\hat\beta = (A'A)^{-}A'Y, \qquad (3)$$

where $A = (X\ U_1\ \cdots\ U_{k-1})$, and $(A'A)^{-}$ is a generalized inverse of $(A'A)$. GLM judges estimability of linear functions $L\beta$ assuming all $e_i$ except $e_k$ are vectors of fixed effects parameters, according to the criterion $L(A'A)^{-}A'A = L$. Likewise, the standard errors of estimable functions of $\beta$ reported by the ESTIMATE and LSMEANS statements are computed under the assumption that $V(Y) = I_n\sigma_k^2$. Therefore, when fitting a mixed linear model, the standard error reported for estimates produced by the ESTIMATE and LSMEANS statements can be incorrect. However, values necessary for calculating the appropriate standard errors of these estimates can be obtained directly from GLM output, as follows.

We wish to compute $\hat\theta = L\hat\beta$, where $L$ is a row vector such that $L(A'A)^{-}A'A = L$. Partition $L$ into $L_0, L_1, \ldots, L_{k-1}$, conformably with $b, e_1, \ldots, e_{k-1}$, such that

$$L = (L_0\ L_1\ \cdots\ L_{k-1}).$$

Since $L(A'A)^{-}A'A = L$, we have

$$L(A'A)^{-}A'(X\ U_1\ \cdots\ U_{k-1}) = (L_0\ L_1\ \cdots\ L_{k-1}). \qquad (4)$$

It follows that $L(A'A)^{-}A'U_i = L_i$, $i = 1, \ldots, k-1$, and $L(A'A)^{-}A'X = L_0$. The expected value of $\hat\theta = L\hat\beta$ is

$$E(\hat\theta) = E(L\hat\beta) = E[L(A'A)^{-}A'Y] = L(A'A)^{-}A'Xb = L_0 b, \qquad (5)$$

and the variance of $\hat\theta$ is

$$V(\hat\theta) = \sum_{i=1}^{k} c_i \sigma_i^2, \qquad (6)$$

where

$$c_i = L(A'A)^{-}A'U_i U_i'A(A'A)^{-}L' = \begin{cases} L_i L_i' & \text{for } i = 1, \ldots, k-1 \\ L(A'A)^{-}L' & \text{for } i = k. \end{cases}$$

The coefficients $c_1, \ldots, c_k$ are the coefficients needed to compute a valid estimate of $V(\hat\theta)$ from estimates of the variance components $\hat\sigma_1^2, \ldots, \hat\sigma_k^2$. We proceed to derive their computation using output from CONTRAST and RANDOM statements.

Elements of $L$ may be specified as the coefficients in both the CONTRAST and ESTIMATE statements. There is no requirement that coefficients sum to zero for either statement. The mean square for $\hat\theta = L\hat\beta$ computed from the CONTRAST statement is

$$\text{MS}(\hat\theta) = \frac{\hat\theta^2}{L(A'A)^{-}L'}. \qquad (7)$$

Therefore

$$E[\text{MS}(\hat\theta)] = \frac{E[\hat\theta^2]}{L(A'A)^{-}L'} = \frac{V(\hat\theta) + [E(\hat\theta)]^2}{L(A'A)^{-}L'} = R + Q, \qquad (8)$$

where

$$R = \sum_{i=1}^{k} r_i \sigma_i^2, \qquad r_i = \frac{c_i}{L(A'A)^{-}L'}, \quad i = 1, \ldots, k, \qquad (9)$$

$$Q = \frac{b'L_0'L_0 b}{L(A'A)^{-}L'} = \frac{\sum_{i=1}^{p}\sum_{j=1}^{p} b_i b_j L_{0i} L_{0j}}{L(A'A)^{-}L'} = \sum_{i=1}^{p}\sum_{j=1}^{p} b_i b_j q_{ij}, \qquad (10)$$

$b_i$ and $L_{0i}$ are the $i$th elements of $b$ and $L_0$, respectively, and

$$q_{ij} = \frac{L_{0i} L_{0j}}{L(A'A)^{-}L'}.$$

The coefficients $c_1, \ldots, c_k$ are therefore obtained as $c_i = r_i[L(A'A)^{-}L']$. It remains to obtain $r_1, \ldots, r_k$, and $L(A'A)^{-}L'$.

If the elements of $L$ are specified in a CONTRAST statement, and $e_1, \ldots, e_{k-1}$ are specified in a RANDOM statement, then GLM produces $r_1, \ldots, r_k$ as the coefficients of the variance components in the expected mean squares.
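
For the motivating example, the coefficients $c_i$, $r_i$, and $q_{11}$ can be checked without inverting anything. The sketch below is an illustration in Python, not GLM output. It assumes that the row vector $a' = L(A'A)^{-}A'$ is simply the set of weights that turns the observations into the PHASE 1 LSMEAN (1/6 on each H observation at PHASE 1, 1/4 on each L observation at PHASE 1), which holds here because the LSMEAN is an average of TRT by PHASE cell means. All variable names are hypothetical.

```python
from fractions import Fraction as F

# One row per observation: (treatment, subject, phase); 3 subjects in H, 2 in L.
n_subj = {"H": 3, "L": 2}
obs = [(t, s, p) for t in ("H", "L")
       for s in range(1, n_subj[t] + 1) for p in (1, 2, 3)]

# Weights a with theta_hat = sum(a * y) = (1/2)(ybar_H.1 + ybar_L.1).
a = [F(1, 2 * n_subj[t]) if p == 1 else F(0) for (t, s, p) in obs]

# First element of L_0 (intercept): a' times a column of ones.
L01 = sum(a)

# L_1 = a'U_1: the total weight that falls on each subject.
subjects = sorted({(t, s) for (t, s, p) in obs})
L1 = [sum(w for w, (t, s, p) in zip(a, obs) if (t, s) == subj)
      for subj in subjects]

c_subj = sum(x * x for x in L1)  # c_1 = L_1 L_1'
c_err = sum(w * w for w in a)    # c_k = a'a = L(A'A)^- L'

r_subj = c_subj / c_err          # coefficient of Var(SUBJ(TRT)) in the contrast EMS
r_err = c_err / c_err            # coefficient of Var(Error)
q11 = L01 ** 2 / c_err           # (1,1) element of the quadratic form

print(L1, c_subj, c_err, r_subj, r_err, q11)
```

Under these assumptions the subject weights match the SUBJ(TRT) coefficients of the estimable function, both $c_i$ equal 5/24 (about 0.2083), both $r_i$ equal 1, and $q_{11}$ equals 24/5 = 4.8.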

The estimate of the standard error of $\hat\theta$ reported by the ESTIMATE statement is

$$h = \sqrt{[L(A'A)^{-}L']\,\text{MSE}}.$$

So $L(A'A)^{-}L' = h^2/\text{MSE}$.

An alternate method for obtaining the value of $L(A'A)^{-}L'$ is to use output from the Q option in the RANDOM statement. If the Q option is specified, then the $q_{ij}$, $i, j = 1, \ldots, p$, are produced as elements of the matrix of the quadratic form for the fixed effects in the expected mean squares. This yields

$$L(A'A)^{-}L' = \frac{L_{0i}^2}{q_{ii}}$$

for any $L_{0i} \neq 0$, $i = 1, \ldots, p$.

Thus, when CONTRAST, ESTIMATE, and RANDOM statements are used in PROC GLM, SAS reports the MSE, $\sqrt{[L(A'A)^{-}L']\,\text{MSE}}$, $r_i$, $i = 1, \ldots, k$, and $q_{ij}$, $i, j = 1, \ldots, p$. All that is needed to estimate $V(\hat\theta)$ according to equation (6) are estimates of the variance components, or an estimate of $R$. If not easily computed from combinations of mean squares, the variance component estimates may be obtained from PROC VARCOMP. Given estimates $\hat\sigma_i^2$ of $\sigma_i^2$, $i = 1, \ldots, k$,

$$\hat V(\hat\theta) = [L(A'A)^{-}L'] \sum_{i=1}^{k} r_i \hat\sigma_i^2 = \frac{L_{0j}^2}{q_{jj}} \sum_{i=1}^{k} r_i \hat\sigma_i^2 \qquad (11)$$

for any $L_{0j} \neq 0$, $j = 1, \ldots, p$. In some cases, an estimate of $R$ is directly available as a mean square, say $\hat R$. Then

$$\hat V(\hat\theta) = \frac{L_{0j}^2}{q_{jj}}\,\hat R = \frac{h^2}{\text{MSE}}\,\hat R \qquad (12)$$

for any $L_{0j} \neq 0$, $j = 1, \ldots, p$.

The usual hypothesis tests and confidence limits for $\theta$ may be constructed by assuming that $d\,\hat V(\hat\theta)/V(\hat\theta)$ has, approximately, a chi-square distribution with $d$ degrees of freedom. The quantity $d$ can be estimated using the Satterthwaite approach (1941), i.e.,

$$d = \frac{2\{E[\hat V(\hat\theta)]\}^2}{\text{Var}[\hat V(\hat\theta)]} \approx \frac{2[\hat V(\hat\theta)]^2}{\widehat{\text{Var}}[\hat V(\hat\theta)]}, \qquad (13)$$

where $\widehat{\text{Var}}[\hat V(\hat\theta)]$ is an estimate of $\text{Var}[\hat V(\hat\theta)]$. If $\hat V(\hat\theta)$ is a linear combination of mean squares,

$$\hat V(\hat\theta) = \sum_{i=1}^{k} w_i\,\text{MS}_i,$$

where $\text{MS}_i$ is the mean square for the $i$th random effect, $i = 1, \ldots, k$, then

$$\widehat{\text{Var}}[\hat V(\hat\theta)] = 2\sum_{i=1}^{k} \frac{w_i^2\,\text{MS}_i^2}{df_i}, \qquad (14)$$

where $df_i$ are the degrees of freedom associated with the $i$th mean square. It should be noted that (14) ignores nonzero covariance between the $\text{MS}_i$.

References

Allen, R. 1989. Analysis of a repeated measures experiment with missing data. Fourteenth Annual SUGI Conference, 294-298.

Freund, R. J., and Littell, R. C. 1981. SAS for Linear Models. Cary, NC: SAS Institute Inc.

Milliken, G. A., and Johnson, D. E. 1984. Analysis of Messy Data. New York: Van Nostrand Reinhold.

Satterthwaite, F. E. 1941. Synthesis of variance. Psychometrika 6, 309-316.

Table 3. Output from the E option in the LSMEANS statement.

General Linear Models Procedure
Least Squares Means
Coefficients for PHASE Least Square Means

                          PHASE Level
Effect               1              2              3
INTERCEPT            1              1              1
TRT        H         0.5            0.5            0.5
           L         0.5            0.5            0.5
SUBJ(TRT)  1 H       0.1666666667   0.1666666667   0.1666666667
           2 H       0.1666666667   0.1666666667   0.1666666667
           3 H       0.1666666667   0.1666666667   0.1666666667
           1 L       0.25           0.25           0.25
           2 L       0.25           0.25           0.25
PHASE      1         1              0              0
           2         0              1              0
           3         0              0              1
TRT*PHASE  H 1       0.5            0              0
           H 2       0              0.5            0
           H 3       0              0              0.5
           L 1       0.5            0              0
           L 2       0              0.5            0
           L 3       0              0              0.5

Table 4. Output from ESTIMATE statement.

                                T for H0:                  Std Error of
Parameter         Estimate      Parameter=0    Pr > |T|    Estimate
phase adj mean    6.05916778    10.66          0.0001      0.56818832

Table 5. Expected mean square of contrast.

Contrast          Contrast Expected Mean Square
phase adj mean    Var(Error) + Var(SUBJ(TRT)) + Q(INTERCEPT,TRT,PHASE,TRT*PHASE)

Table 6. Output from Q option in RANDOM statement.

General Linear Models Procedure
Quadratic Forms of Fixed Effects in the Expected Mean Squares
Source: Contrast Mean Square for phase adj mean

                  INTERCEPT    TRT H        TRT L        PHASE 1      PHASE 2      PHASE 3
INTERCEPT         4.80000000   2.40000000   2.40000000   4.80000000   0.00000000   0.00000000
TRT H             2.40000000   1.20000000   1.20000000   2.40000000   0.00000000   0.00000000
TRT L             2.40000000   1.20000000   1.20000000   2.40000000   0.00000000   0.00000000
PHASE 1           4.80000000   2.40000000   2.40000000   4.80000000   0.00000000   0.00000000
PHASE 2           0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000
PHASE 3           0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000
TRT*PHASE H 1     2.40000000   1.20000000   1.20000000   2.40000000   0.00000000   0.00000000
TRT*PHASE H 2     0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000
TRT*PHASE H 3     0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000
TRT*PHASE L 1     2.40000000   1.20000000   1.20000000   2.40000000   0.00000000   0.00000000
TRT*PHASE L 2     0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000
TRT*PHASE L 3     0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000

                  TRT*PHASE    TRT*PHASE    TRT*PHASE    TRT*PHASE    TRT*PHASE    TRT*PHASE
                  H 1          H 2          H 3          L 1          L 2          L 3
INTERCEPT         2.40000000   0.00000000   0.00000000   2.40000000   0.00000000   0.00000000
TRT H             1.20000000   0.00000000   0.00000000   1.20000000   0.00000000   0.00000000
TRT L             1.20000000   0.00000000   0.00000000   1.20000000   0.00000000   0.00000000
PHASE 1           2.40000000   0.00000000   0.00000000   2.40000000   0.00000000   0.00000000
PHASE 2           0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000
PHASE 3           0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000
TRT*PHASE H 1     1.20000000   0.00000000   0.00000000   1.20000000   0.00000000   0.00000000
TRT*PHASE H 2     0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000
TRT*PHASE H 3     0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000
TRT*PHASE L 1     1.20000000   0.00000000   0.00000000   1.20000000   0.00000000   0.00000000
TRT*PHASE L 2     0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000
TRT*PHASE L 3     0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000

Table 7. Expected mean squares.

General Linear Models Procedure

Source        Type III Expected Mean Square
TRT           Var(Error) + 3 Var(SUBJ(TRT)) + Q(TRT,TRT*PHASE)
SUBJ(TRT)     Var(Error) + 3 Var(SUBJ(TRT))
PHASE         Var(Error) + Q(PHASE,TRT*PHASE)
TRT*PHASE     Var(Error) + Q(TRT*PHASE)
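
The expected mean squares in Table 7 are what turn the observed mean squares into variance component estimates. As a rough sketch in Python (not SAS output), the code below equates the mean squares used in the worked example to their expectations, then forms the variance estimate, the Satterthwaite degrees of freedom, and the confidence interval. The variable names are assumptions made for illustration; the mean squares, the LSMEAN 6.0592, and the t quantile 2.306 are the values quoted in the example.

```python
import math

ms_subj, df_subj = 2.284967, 3   # MS[SUBJ(TRT)] and its degrees of freedom
mse, df_err = 1.5496222, 6       # MS(ERROR) and its degrees of freedom
c = 0.25 * (1 / 3 + 1 / 2)       # L(A'A)^- L' = 0.2083 for the PHASE 1 LSMEAN
estimate = 6.0592                # PHASE 1 LSMEAN
t_crit = 2.306                   # t quantile quoted in the example for 8 df

# Table 7: E[MS(ERROR)] = s2_eps and E[MS(SUBJ(TRT))] = s2_eps + 3 * s2_delta.
s2_eps = mse
s2_delta = (ms_subj - mse) / 3

# Contrast expected mean square has r = 1 for both components.
v_hat = c * (s2_eps + s2_delta)

# The same estimate written as a linear combination of mean squares:
# v_hat = (c/3) * MS[SUBJ(TRT)] + (2c/3) * MSE.
terms = [(c / 3, ms_subj, df_subj), (2 * c / 3, mse, df_err)]

# Satterthwaite approximation, ignoring covariance between the mean squares.
d = v_hat ** 2 / sum((w * ms) ** 2 / df for w, ms, df in terms)

half_width = t_crit * math.sqrt(v_hat)
ci = (estimate - half_width, estimate + half_width)

print(round(v_hat, 4), round(d, 1), tuple(round(x, 4) for x in ci))
```

A negative `s2_delta` would signal that equating mean squares to their expectations is not adequate for the data at hand; the sketch does not guard against that case.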