Use of Transformations and the Repeated Statement in PROC GLM in SAS

Ed Stanek


Introduction

We describe how the Repeated Statement in PROC GLM in SAS transforms the data to provide tests of hypotheses of interest. A good reference for multivariate methods is Timm (1975). We use the SAS manual as a reference for the repeated statement. First, we briefly review notation and the context for multivariate data.

The Population, Sample, and Context

Consider a simple random sample of n subjects from a very large population. Assume that there are p measures of response made on each selected subject. The measures may correspond to measures of different characteristics of the subjects (such as age, height, weight, systolic blood pressure, etc.), or repeated measures of the same variable (total cholesterol) at different times or conditions. With these assumptions, we represent the vector of responses for the i-th selected subject as $Y_i = (Y_{i1} \; Y_{i2} \; \cdots \; Y_{ip})'$. We assume that selections of subjects are independent, but that measures on a selected subject may be correlated, representing $\mathrm{var}(Y_i) = \Sigma$.

In experimental settings, factor levels may be assigned to subjects (blocks) or to occasions within a subject (plots). Factors assigned to subjects are modeled with a design matrix such that

$$Y = X\beta + E,$$

where $Y = (Y_1 \; Y_2 \; \cdots \; Y_n)'$ is the $n \times p$ response matrix, $X$ is an $n \times k$ design matrix representing the levels of the factor assigned to the selected subjects, and

$$\beta = \begin{pmatrix} \beta_{11} & \beta_{12} & \cdots & \beta_{1p} \\ \beta_{21} & \beta_{22} & \cdots & \beta_{2p} \\ \vdots & & & \vdots \\ \beta_{k1} & \beta_{k2} & \cdots & \beta_{kp} \end{pmatrix}$$

is a $k \times p$ matrix of parameters, with rows corresponding to levels of factor A, and columns corresponding to the p measures, or occasions. With these assumptions, $\mathrm{var}(\mathrm{vec}(Y)) = \Sigma \otimes I_n$, or equivalently $\mathrm{var}(\mathrm{vec}(Y')) = I_n \otimes \Sigma$.

Estimation under a General Linear Multivariate Model

Estimates of the parameters corresponding to least squares estimates can be obtained in a manner similar to estimates in univariate models. The estimates are given by

$$\hat{\beta} = (X'X)^{-1} X'Y.$$

Since $\mathrm{vec}(ABC) = (C' \otimes A)\,\mathrm{vec}(B)$, we can express $\mathrm{vec}(\hat{\beta}) = \left(I_p \otimes (X'X)^{-1}X'\right)\mathrm{vec}(Y)$, so that

$$\mathrm{var}\!\left(\mathrm{vec}(\hat{\beta})\right) = \left(I_p \otimes (X'X)^{-1}X'\right)\left(\Sigma \otimes I_n\right)\left(I_p \otimes X(X'X)^{-1}\right),$$

which simplifies to

$$\mathrm{var}\!\left(\mathrm{vec}(\hat{\beta})\right) = \Sigma \otimes (X'X)^{-1}.$$

The variance matrix is estimated by

$$\hat{\Sigma} = \frac{(Y - X\hat{\beta})'(Y - X\hat{\beta})}{n - k}.$$

Hypotheses under a General Linear Multivariate Normal Model

Traditionally, hypotheses for a general linear multivariate model have been specified as linear combinations of rows and columns of the $k \times p$ parameter matrix $\beta$. Such hypotheses are expressed as

$$\underset{g \times k}{L}\;\underset{k \times p}{\beta}\;\underset{p \times u}{M} = 0,$$

where L and M are matrices of constants. For example, the hypothesis that the average of the p parameters in the first row equals the average of the parameters in the second row is specified by setting $L = (1 \;\; {-1} \;\; 0 \;\cdots\; 0)$ and $M = \frac{1}{p}(1 \; 1 \;\cdots\; 1)'$. It should be clear that there are limitations in defining hypotheses in terms of $L\beta M = 0$: with such a structure it is not possible, for example, to choose L and M so as to test only the single hypothesis $\beta_{11} = \beta_{22}$. Using mixed models, any linear hypothesis concerning the elements of $\beta$ can be tested.

When $u = p$ and M is of full rank, M is a square matrix that can be viewed as a transformation matrix. The standard multivariate model can then be written as a model on a set of transformed random variables. This is the idea behind the Repeated Statement in PROC GLM.
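The vec/Kronecker algebra behind the variance derivation can be checked numerically. The following is an illustrative numpy sketch (not SAS); the dimensions and data are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small illustrative data: n = 8 subjects, k = 2 design columns, p = 3 measures.
n, k, p = 8, 2, 3
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])  # made-up design
Y = rng.normal(size=(n, p))                                   # made-up responses

# Least-squares estimate, computed as in univariate models, one column at a time:
#   beta_hat = (X'X)^{-1} X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

def vec(A):
    return A.flatten(order="F")   # column-stacking vec operator

# The identity vec(ABC) = (C' kron A) vec(B) underlies the variance derivation.
A, B, C = rng.normal(size=(3, 4)), rng.normal(size=(4, 2)), rng.normal(size=(2, 5))
assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))

# vec(beta_hat) = (I_p kron (X'X)^{-1}X') vec(Y), the expression that yields
# var(vec(beta_hat)) = Sigma kron (X'X)^{-1}.
H = np.kron(np.eye(p), np.linalg.solve(X.T @ X, X.T))
assert np.allclose(H @ vec(Y), vec(beta_hat))
```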

the non-singular matrix M. The model is given by

$$YM = X\beta M + EM,$$

which we express as

$$Y^* = X\beta^* + E^*,$$

where $Y^* = YM$, $\beta^* = \beta M$, and $E^* = EM$. Note that $\mathrm{vec}(Y^*) = \mathrm{vec}(YM) = (M' \otimes I_n)\,\mathrm{vec}(Y)$, while $\mathrm{vec}(\beta^*) = \mathrm{vec}(\beta M) = (M' \otimes I_k)\,\mathrm{vec}(\beta)$. Then in the transformed model, $E(Y^*) = X\beta^*$ and

$$\mathrm{var}\!\left(\mathrm{vec}(Y^*)\right) = (M' \otimes I_n)(\Sigma \otimes I_n)(M \otimes I_n) = (M'\Sigma M) \otimes I_n = \Sigma^* \otimes I_n,$$

where $\Sigma^* = M'\Sigma M$. Thus, all the analysis tools used for the usual multivariate model can be applied to the transformed multivariate model.

Example: Special Case for a Simple Response Error Model

Suppose that p independent measures are made on each selected subject in a simple random sample of n subjects from a large population. Let the model for the k-th measure on the i-th selected subject be given by

$$Y_{ik} = \mu_i + E_{ik},$$

where $\mathrm{var}(Y_i) = \sigma_e^2 I_p + \sigma^2 J_p = \Sigma$, with $\sigma_e^2 = \frac{1}{N}\sum_{s=1}^N \sigma_s^2$ representing the average response error variance in the population, and $\sigma^2 = \frac{1}{N}\sum_{s=1}^N (\mu_s - \mu)^2$ representing the variance between subjects.

Consider a transformation matrix $M = \left(\frac{1}{p}\mathbf{1}_p \;\; M_1\right)$, where we assume that $\mathbf{1}_p' M_1 = 0$. Thus, the column space spanned by the remaining columns is orthogonal to the column space of the one-vector. Using this transformation matrix, $Y^* = X\beta^* + E^*$ and $\mathrm{var}(\mathrm{vec}(Y^*)) = \Sigma^* \otimes I_n$, where $\Sigma^* = M'\Sigma M$. Using

the definitions of these terms,

$$\Sigma^* = M'\Sigma M = \begin{pmatrix} \frac{1}{p}\mathbf{1}' \\ M_1' \end{pmatrix} \left(\sigma_e^2 I_p + \sigma^2 J_p\right) \begin{pmatrix} \frac{1}{p}\mathbf{1} & M_1 \end{pmatrix} = \begin{pmatrix} \dfrac{\sigma_e^2 + p\sigma^2}{p} & 0 \\ 0 & \sigma_e^2\, M_1' M_1 \end{pmatrix}.$$

Frequently, we will require the columns of the second part of the transformation matrix to be orthogonal and normal (such that the inner product of each column vector with itself is 1). When this is true, $M_1'M_1 = I_{p-1}$, and hence

$$\Sigma^* = \begin{pmatrix} \dfrac{\sigma_e^2 + p\sigma^2}{p} & 0 \\ 0 & \sigma_e^2\, I_{p-1} \end{pmatrix}.$$

Notice that with this choice of transformation, the columns of the transformed random variables are independent. Thus, univariate methods can be used to test hypotheses concerning the transformed random variables.

Using the Repeated Statement in PROC GLM in SAS

In repeated measures studies, the same variable is measured under different conditions. Often, simple functions of the measures are of interest. For example, if there are 2 measures on each selected subject corresponding to a pretest and a posttest, then the difference between the two measures is of interest. This difference is specified in terms of an M matrix. If there is a factorial structure on the repeated measures, the main effects and interactions can be specified in terms of linear combinations. We consider two examples as illustrations.

First, suppose that p = 6 measures were made on each subject, one under each combination of factor levels in a two-factor study where Factor A had 2 levels and Factor B had 3 levels. Let the responses be organized as follows:

Level of Factor A:   A1     A1     A1     A2     A2     A2
Level of Factor B:   B1     B2     B3     B1     B2     B3
Response:            Y_i11  Y_i12  Y_i13  Y_i21  Y_i22  Y_i23

Suppose an appropriate model for the m-th measure on subject s under level j of Factor A and level k of Factor B is given by

$$Y_{sjkm} = \mu_{sjk} + E_{sjkm},$$

where $E(Y_{sjkm}) = \mu_{sjk}$ and $\mathrm{var}(Y_{sjkm}) = \sigma^2$. Let us define parameters corresponding to the average expected response under level j of Factor A and level k of Factor B by $\mu_{jk} = \frac{1}{N}\sum_{s=1}^N \mu_{sjk}$, and re-parameterize these averages in terms of a grand mean, main effects, and interactions (see bem032.sas):

$$\mu_{jk} = \mu + \alpha_j + \beta_k + (\alpha\beta)_{jk},$$

where $\mu = \frac{1}{ab}\sum_{j=1}^a \sum_{k=1}^b \mu_{jk}$, $\alpha_j = \bar{\mu}_{j\cdot} - \mu$, $\beta_k = \bar{\mu}_{\cdot k} - \mu$, and $(\alpha\beta)_{jk} = \mu_{jk} - \bar{\mu}_{j\cdot} - \bar{\mu}_{\cdot k} + \mu$.

Finally, let us define $\delta_{sjk} = \mu_{sjk} - \mu_s - \mu_{jk} + \mu$, where $\mu_s = \frac{1}{ab}\sum_{j=1}^a \sum_{k=1}^b \mu_{sjk}$ and $\mu = \frac{1}{Nab}\sum_{s=1}^N \sum_{j=1}^a \sum_{k=1}^b \mu_{sjk}$. Then

$$\mu_{sjk} = \mu_{jk} + (\mu_s - \mu) + \delta_{sjk}.$$

The term $\delta_{sjk}$ is the subject-by-treatment interaction. It is the difference between the subject effect under a specific treatment and the average subject effect. If a treatment affects subjects differently, this term will be non-zero. If the subject-by-treatment interaction is zero, the model simplifies. The mean structure is then given by

$$\mu_{sjk} = \mu_{jk} + (\mu_s - \mu) = \mu + \alpha_j + \beta_k + (\alpha\beta)_{jk} + \delta_s,$$

where $\delta_s = \mu_s - \mu$.
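The decomposition of cell means into a grand mean, main effects, and an interaction can be checked numerically. A minimal numpy sketch, using a made-up 2 x 3 table of cell means:

```python
import numpy as np

# Hypothetical 2x3 table of cell means mu_jk (rows: levels of A, cols: levels of B).
mu = np.array([[10.0, 12.0, 17.0],
               [11.0, 15.0, 19.0]])

grand = mu.mean()                    # overall mean
alpha = mu.mean(axis=1) - grand      # Factor A main effects
beta  = mu.mean(axis=0) - grand      # Factor B main effects
ab = mu - grand - alpha[:, None] - beta[None, :]   # AB interaction

# The re-parameterization reproduces every cell mean ...
recon = grand + alpha[:, None] + beta[None, :] + ab
assert np.allclose(recon, mu)
# ... and the effects satisfy the usual sum-to-zero constraints.
assert np.isclose(alpha.sum(), 0) and np.isclose(beta.sum(), 0)
assert np.allclose(ab.sum(axis=0), 0) and np.allclose(ab.sum(axis=1), 0)
```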

Post-multiplying $Y_i$ by the matrix

$$M = \begin{pmatrix} 1 & 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 & 0 & 1 \\ 1 & 1 & -1 & -1 & -1 & -1 \\ 1 & -1 & 1 & 0 & -1 & 0 \\ 1 & -1 & 0 & 1 & 0 & -1 \\ 1 & -1 & -1 & -1 & 1 & 1 \end{pmatrix}$$

results in

$$Z_i = Y_i M = \left( \sum_{j=1}^a \sum_{k=1}^b Y_{ijk}, \;\; \sum_{k=1}^b \left(Y_{i1k} - Y_{i2k}\right), \;\; \sum_{j=1}^a \left(Y_{ij1} - Y_{ij3}\right), \;\; \sum_{j=1}^a \left(Y_{ij2} - Y_{ij3}\right), \;\; \left(Y_{i11} - Y_{i13}\right) - \left(Y_{i21} - Y_{i23}\right), \;\; \left(Y_{i12} - Y_{i13}\right) - \left(Y_{i22} - Y_{i23}\right) \right).$$

The columns of this vector correspond to functions that represent main effects and interactions of the two factors. For example, simply testing the hypothesis that $E(Z_{i2}) = 0$ will test for a main effect of Factor A. In such settings, some of the univariate tests on the transformed data will be interpretable.
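To see how the columns of such an M act, and why the between-subject variance drops out of the contrast columns under the response error model above, consider a small numpy sketch (an illustration, not SAS; the response values and variance components are made up):

```python
import numpy as np

# One subject's responses, ordered (A1B1, A1B2, A1B3, A2B1, A2B2, A2B3).
y = np.array([5.0, 7.0, 6.0, 4.0, 9.0, 8.0])

# Columns: overall sum, Factor A contrast, two B contrasts (B1-B3, B2-B3),
# and the two corresponding AB interaction contrasts.
M = np.array([
    [1,  1,  1,  0,  1,  0],
    [1,  1,  0,  1,  0,  1],
    [1,  1, -1, -1, -1, -1],
    [1, -1,  1,  0, -1,  0],
    [1, -1,  0,  1,  0, -1],
    [1, -1, -1, -1,  1,  1],
], dtype=float)

z = y @ M
# Second element is the Factor A contrast: sum over A1 cells minus sum over A2 cells.
assert np.isclose(z[1], y[:3].sum() - y[3:].sum())

# Every contrast column (all but the first) is orthogonal to the one-vector ...
C = M[:, 1:]
assert np.allclose(C.sum(axis=0), 0)

# ... so under compound symmetry Sigma = se2*I + s2*J, the between-subject
# component s2 drops out of the contrast covariances: C' Sigma C = se2 * C'C.
se2, s2 = 2.0, 5.0
Sigma = se2 * np.eye(6) + s2 * np.ones((6, 6))
assert np.allclose(C.T @ Sigma @ C, se2 * (C.T @ C))
```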

Polynomial Regression

Polynomial regression poses no special problems for the analysis of data. Polynomial functions of $x_i$ (i.e., $x_i^2$, $x_i^3$, etc.) can be treated simply as different regression variables. There are two peculiarities in analysis, however, that deserve special discussion with respect to polynomial regression, or polynomial trends. The first relates to a computing problem. The second relates to testing contrasts in ANOVA applications that represent polynomial trends. We discuss these two issues here.

Example: Cubic Polynomial Model

Suppose heart rate is measured at P = 6 walking speeds, corresponding to $x_i$ = 1, 2, 3, 4, 5 and 6 mph, for i = 1, ..., 6 = P. Let $y_{ij}$ = heart rate at walking speed i on the j-th measure. If we assume that heart rate is a cubic function of walking speed, then

$$y_{ij} = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 + e_{ij}$$

is a cubic polynomial model. When considering polynomial models, it is customary to include all lower-order polynomial terms along with the highest-order polynomial term in the model. We will always follow this convention. We can estimate the regression coefficients in the usual manner with a design matrix corresponding to

$$X = \begin{pmatrix} 1 & x_1 & x_1^2 & x_1^3 \\ 1 & x_2 & x_2^2 & x_2^3 \\ 1 & x_3 & x_3^2 & x_3^3 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & x_P & x_P^2 & x_P^3 \end{pmatrix}.$$

Polynomial Models as a Simple Transformation of the Cell Mean Model

If a polynomial model is of degree (P-1) for i = 1, ..., P times, then there will be as many polynomial parameters as time points. For example, with P = 6 walking speeds, a 5th-degree polynomial model will have 6 parameters. In such cases, the polynomial parameters are simply a re-parameterization of the cell mean response at each time. The ideas can be illustrated with a simple example with P = 4. Suppose that mean responses at four doses are recorded, where the doses are 1, 2, 3, and 4. The four means can be represented as their individual means. Alternatively, the four means

can be represented by four regression parameters corresponding to a constant, linear, quadratic, and cubic parameter. The cell mean parameters can be transformed to form the regression parameters. The transformation is a re-parameterization. The connection between the parameters can be seen by considering the relationship

$$\mu_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3.$$

Using matrix notation,

$$\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \\ \mu_4 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 4 & 8 \\ 1 & 3 & 9 & 27 \\ 1 & 4 & 16 & 64 \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix} = X\beta, \quad \text{or} \quad \beta = X^{-1}\mu.$$

This is a simple transformation of the means. If we had n measures at each dose, we could first estimate the population mean response based on the sample mean, and then transform these mean responses to regression coefficients. Estimates of the regression coefficients in a model with n measures per dose can be fit directly using a design matrix expressed as a stack of individual design matrices (like the one given above):

$$X_0 = \begin{pmatrix} X \\ X \\ \vdots \\ X \end{pmatrix}.$$

Then $X_0'X_0 = n(X'X)$. As a result, when the same number of measures are made at each dose level (or time point), the $X'X$ matrix is a function of the polynomial matrix for a single set of measures.

Problems with Polynomial Design Matrices: Example with 10 Times

Suppose a child is measured at P = 10 times. Let the times be given by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, where the measures correspond to the weight at each of the 10 ages. Suppose an 8th-order ($x_i^8$) regression model is to be fit to these data. To fit this, we need to invert the matrix $X'X$. This turns out to be a difficult computing problem. For example, in PROC REG, the program will not form the regression coefficients. In PROC IML, a check of $(X'X)^{-1}(X'X)$ results in a matrix that is not an identity matrix. Orthogonal polynomials enable tests to be constructed for the regression coefficients

in this problem by re-parameterizing the regression parameters, thereby avoiding inversion of an ill-conditioned matrix. The test results avoid the inaccuracies that may be introduced by matrix inversion. Although tests can be constructed based on orthogonal polynomials (as can predicted values), the matrix inversion is necessary to estimate regression parameters on the original metric. For this reason, this re-parameterization is often not done.

Cholesky Decomposition of a Symmetric Full Rank Matrix and its Relationship to Orthogonal Polynomials

Orthogonal polynomials were introduced as a way of making tests more precise and not dependent on the inversion of $X'X$. While their use improves the accuracy of the tests, it transforms the parameters into quantities that are not easily interpretable. We illustrate orthogonal polynomials in the context of a polynomial regression with P time points, where a polynomial of degree (P-1) (having P parameters) is fit. We assume that the matrix X is a P x P square polynomial matrix. When X represents a matrix of polynomials, the matrix $X'X$ will be positive definite (of full rank) and symmetric. In such cases, it is possible to factor the matrix using a Cholesky decomposition (see Timm (1975), 73-75). Cholesky factoring of a matrix A results in an upper triangular matrix T such that

$$A = T'T.$$

In PROC IML, the Cholesky decomposition is specified via the ROOT function, which returns the matrix T = ROOT(A). Of course, the question is how this factorization helps in estimating sums of squares for regression problems. To answer this question, we consider a transformation of the original X matrix that will result in a T matrix. First, note that

$$X'X = T'T.$$

Both X and T are P x P matrices. Let us indicate the relationship between X and T via a matrix P such that

$$X = PT, \quad \text{or} \quad P = XT^{-1},$$

which implies that

$$X'X = T'T = T'P'PT, \quad \text{so that} \quad P'P = I.$$

Simple Example with P = 3 (x = 1, 2, 3):

$$X = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \end{pmatrix}, \quad X'X = \begin{pmatrix} 3 & 6 & 14 \\ 6 & 14 & 36 \\ 14 & 36 & 98 \end{pmatrix}, \quad (X'X)^{-1} = \begin{pmatrix} 19 & -21 & 5 \\ -21 & 24.5 & -6 \\ 5 & -6 & 1.5 \end{pmatrix}$$

$$T = \begin{pmatrix} 1.7320508 & 3.4641016 & 8.0829038 \\ 0 & 1.4142136 & 5.6568542 \\ 0 & 0 & 0.8164966 \end{pmatrix}, \quad (T'T)^{-1} = \begin{pmatrix} 19 & -21 & 5 \\ -21 & 24.5 & -6 \\ 5 & -6 & 1.5 \end{pmatrix}$$

$$P = XT^{-1} = \begin{pmatrix} 0.5773503 & -0.7071068 & 0.4082483 \\ 0.5773503 & 0 & -0.8164966 \\ 0.5773503 & 0.7071068 & 0.4082483 \end{pmatrix}$$

Column scale factors: $\sqrt{3}$, $\sqrt{2}$, $\sqrt{6}$ (the columns of P are $(1,1,1)'/\sqrt{3}$, $(-1,0,1)'/\sqrt{2}$, and $(1,-2,1)'/\sqrt{6}$). The coefficients in the matrix P are orthogonal polynomial coefficients.

Aside: Note that the solution we have given for P requires inversion of the matrix T. If the design matrix is ill-conditioned, then the matrix T will also be ill-conditioned. Hence, it may be questionable how much the problem of inverting the matrix has been resolved. Another function in SAS, ORPOL, will generate orthogonal polynomials without inverting T. This function uses a slightly different algorithm, but does not appear to be better than simple use of the Cholesky decomposition.

Summary of Transformations of Regression Coefficients with Orthogonal Polynomials

To summarize, if the model is $Y = X\beta + e$ where $X = PT$, we can transform the model such that

$$Y = PT\beta + e$$

$$= P\beta^* + e,$$

where $\beta^* = T\beta$. The model with transformed parameters will fit identically to the original model. It is easy to fit this model, since the design matrix is orthogonal ($P'P = I$), and hence

$$\hat{\beta}^* = P'y.$$

Also,

$$\mathrm{var}(\hat{\beta}^*) = I\sigma^2,$$

where $\sigma^2$ is estimated via the MSE. [Aside: It is this transformation that is used in the SAS procedure PROC GLM with the REPEATED statement's POLYNOMIAL option.] Note that the diagonal form of the variance of the transformed parameters indicates the independence of the parameters.
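The P = 3 example above can be reproduced numerically. A minimal numpy sketch, using numpy's lower-triangular Cholesky factor in place of the SAS/IML ROOT function (which returns the upper factor):

```python
import numpy as np

# Polynomial design matrix for x = 1, 2, 3 (columns: constant, linear, quadratic).
x = np.array([1.0, 2.0, 3.0])
X = np.vander(x, 3, increasing=True)

# SAS/IML ROOT(A) returns upper-triangular T with A = T'T; numpy's cholesky
# gives the lower factor L with A = L L', so T = L'.
A = X.T @ X
T = np.linalg.cholesky(A).T
assert np.allclose(T.T @ T, A)

# P = X T^{-1} has orthonormal columns: the normalized orthogonal
# polynomial coefficients.
P = X @ np.linalg.inv(T)
assert np.allclose(P.T @ P, np.eye(3))

# Columns match (1,1,1)/sqrt(3), (-1,0,1)/sqrt(2), (1,-2,1)/sqrt(6),
# the values tabulated in the text.
expected = np.column_stack([
    np.ones(3) / np.sqrt(3),
    np.array([-1.0, 0.0, 1.0]) / np.sqrt(2),
    np.array([1.0, -2.0, 1.0]) / np.sqrt(6),
])
assert np.allclose(P, expected)
```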

A similar analysis can be made when measures are made over time. Suppose three measures of pulse rate are made on each selected subject: immediately following exercise, 1 minute following exercise, and 2 minutes following exercise. We may consider a transformation matrix similar to a polynomial matrix given by

$$M = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 4 \end{pmatrix},$$

where the columns correspond to a mean, a linear, and a quadratic trend. The transformation used in SAS is a normalized orthogonal polynomial matrix, not the polynomial matrix given above. A normalized matrix means that the sum of the squared values of the coefficients in any column adds up to 1. Orthogonal means that the inner product of any two columns of the matrix is zero. The orthonormal polynomial matrix is formed by taking the Cholesky decomposition of the polynomial matrix (see Timm (1975), 73-75). The decomposition results in $M'M = T'T$, where T is an upper triangular matrix; all elements of T below the diagonal are zero. In PROC IML, this can be obtained by the command T = ROOT(A);. With this transformation, the orthonormal polynomial matrix is $MT^{-1}$.

References

Timm, N.H. (1975). Multivariate Analysis with Applications in Education and Psychology. Wadsworth Publishing Company, Belmont, California.

be740-trans-repeatedsas.doc 1/25/2006