Random Coefficient Model (a.k.a. multilevel model) (Adapted from UCLA Statistical Computing Seminars)

STAT:5201 Applied Statistics II

School math achievement scores

The data file consists of 7185 students nested in 160 schools. The outcome variable of interest is the student-level math achievement score (MATHACH). Variable SES is the socioeconomic status of a student and is therefore a student-level variable. Variable MEANSES is the school (group) mean of SES and is therefore a school-level variable. Both SES and MEANSES are centered at the grand mean (they both have means of 0). Variable SECTOR is an indicator of whether a school is public or Catholic and is therefore a school-level variable. There are 90 public schools (SECTOR=0) and 70 Catholic schools (SECTOR=1) in the sample.

A segment of the data file:

    SCHOOL  MATHACH     SES  MEANSES  SECTOR
      1296    6.588  -0.178   -0.420       0
      1296   11.026   0.392   -0.420       0
      1296    7.095  -0.358   -0.420       0
      1296   12.721  -0.628   -0.420       0
      1296    5.520  -0.038   -0.420       0
      1296    7.353   0.972   -0.420       0
      1296    7.095   0.252   -0.420       0
      1296    9.999   0.332   -0.420       0
      1296   10.715  -0.308   -0.420       0
      1308   13.233   0.422    0.534       1
      1308   13.952   0.562    0.534       1
      1308   13.757  -0.058    0.534       1
      1308   13.970   0.952    0.534       1
      ...

1. Is there a significant school effect? (Model that disregards covariates.)

Let i = student and j = school.

$$y_{ij} = \mu + u_{0j} + r_{ij}$$

with $u_{0j} \sim N(0, \sigma_\tau^2)$ and $r_{ij} \sim N(0, \sigma^2)$, where $u_{0j}$ and $r_{ij}$ are independent.

This is the usual one-way ANOVA random effects model, which some call the unconditional means model.

As a random coefficient model with a random intercept:

    proc mixed data = rc.hsb12 covtest noclprint;
      model mathach = / solution;
      random intercept / subject = school;
    run;

    Unconditional Means Model
    The Mixed Procedure

    Covariance Parameter Estimates
    Cov Parm   Subject  Estimate  Standard Error  Z Value  Pr > Z
    Intercept  SCHOOL     8.6097          1.0778     7.99  <.0001
    Residual             39.1487          0.6607    59.26  <.0001

    Solution for Fixed Effects
    Effect     Estimate  Standard Error   DF  t Value  Pr > |t|
    Intercept   12.6370          0.2443  159    51.72    <.0001

As a mixed model with a random SCHOOL effect:

    proc mixed data = rc.hsb12 covtest;
      class school;  /* added: school must be a CLASS variable for "random school" to fit one effect per school */
      model mathach = / ddfm = satterth solution;
      random school;
    run;

    Covariance Parameter Estimates
    Cov Parm  Estimate  Standard Error  Z Value  Pr > Z
    SCHOOL      8.6097          1.0778     7.99  <.0001
    Residual   39.1487          0.6607    59.26  <.0001

    Solution for Fixed Effects
    Effect     Estimate  Standard Error   DF  t Value  Pr > |t|
    Intercept   12.6370          0.2443  157    51.72    <.0001

2. Do schools with high socioeconomic status (SES) tend to have high math achievement scores (MATHACH)?

Model 2: Include a fixed effect for MEANSES (continuous variable) while accounting for the random SCHOOL effect (as compound symmetry).

Let i = student and j = school.

$$y_{ij} = \mu + \beta_1 x_j + u_{0j} + r_{ij}$$

with $u_{0j} \sim N(0, \sigma_\tau^2)$ and $r_{ij} \sim N(0, \sigma^2)$, where $u_{0j}$ and $r_{ij}$ are independent and $x_j$ is MEANSES for school j.

This model assumes a linear relationship between math achievement and school MEANSES, allows for a separate random intercept for each school, and assumes constant variance across schools. The correlation structure is compound symmetry.

    proc mixed data = rc.hsb12 covtest noclprint;
      model mathach = meanses / solution ddfm = satterth;
      random intercept / subject = school;
    run;

    Covariance Parameter Estimates
    Cov Parm   Subject  Estimate  Standard Error  Z Value  Pr > Z
    Intercept  SCHOOL     2.6357          0.4036     6.53  <.0001
    Residual             39.1578          0.6608    59.26  <.0001

    Solution for Fixed Effects
    Effect     Estimate  Standard Error   DF  t Value  Pr > |t|
    Intercept   12.6495          0.1492  154    84.77    <.0001
    MEANSES      5.8635          0.3613  154    16.23    <.0001

If we compare the estimated SCHOOL variance component (representing variation between schools) from Model 1 and Model 2, the magnitude decreases greatly (from 8.6097 to 2.6357, a reduction of about 69%). This means that MEANSES explains a large portion of the school-to-school variation in mean math achievement.

Do school achievement means still vary significantly once MEANSES is controlled? From the Model 2 output, we see that the test that the between-school variance is zero is highly significant (Z = 6.53, p < .0001). Therefore, we conclude that after controlling for MEANSES, significant variation among school mean math achievement still remains to be explained.

Again, the same results would be found from the following SAS code:

    proc mixed data = rc.hsb12 covtest noclprint;
      class school;  /* added: school must be a CLASS variable for "random school" to fit one effect per school */
      model mathach = meanses / solution ddfm = satterth;
      random school;
    run;

3. What if we include the student-level socioeconomic status (SES)? (Random coefficient model: random intercept and random slope.)

$$y_{ij} = \mu + \beta_1 x_{ij} + u_{0j} + u_{1j} x_{ij} + r_{ij}$$

with $\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix} \sim N(0, D)$ and $r_{ij} \sim N(0, \sigma^2)$, where $(u_{0j}, u_{1j})$ and $r_{ij}$ are independent. Here $x_{ij}$ is SES centered at the school mean (SES minus MEANSES).

This model allows for a separate regression line for each school (both intercept and slope). It assumes constant residual variance for each school, but it allows for heteroscedasticity at the marginal level (because the school-specific lines can, for instance, spread out as SES increases).
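The marginal heteroscedasticity described above can be made explicit with a short derivation. As a sketch, write the elements of D as $\tau_{00}$ (intercept variance), $\tau_{11}$ (slope variance), and $\tau_{01}$ (their covariance); these labels are introduced here for illustration and do not appear in the original notes.

```latex
\begin{aligned}
\operatorname{Var}(y_{ij})
  &= \operatorname{Var}(u_{0j} + u_{1j} x_{ij} + r_{ij}) \\
  &= \tau_{00} + 2\,\tau_{01}\, x_{ij} + \tau_{11}\, x_{ij}^2 + \sigma^2 , \\[4pt]
\operatorname{Cov}(y_{ij},\, y_{i'j})
  &= \tau_{00} + \tau_{01}\,(x_{ij} + x_{i'j}) + \tau_{11}\, x_{ij}\, x_{i'j},
  \qquad i \neq i' .
\end{aligned}
```

The marginal variance is a quadratic function of $x_{ij}$, so it changes with SES whenever $\tau_{11} > 0$, even though the residual variance $\sigma^2$ is constant.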

The motivating questions for this model:

(a) What would be the average of the 160 regression equations (both intercept and slope)?
(b) How much do the regression equations vary from school to school?
(c) What is the correlation between the intercepts and slopes?

    data rc.hsb12;
      set rc.hsb12;
      cses = ses - meanses;  /* SES centered at the school mean */
    run;

    proc mixed data = rc.hsb12 noclprint covtest noitprint;
      model mathach = cses / solution ddfm = satterth notest;
      random intercept cses / subject = school type = un gcorr;
    run;

    Covariance Parameter Estimates
    Cov Parm  Subject  Estimate  Standard Error  Z Value    Pr Z
    UN(1,1)   SCHOOL     8.6769          1.0786     8.04  <.0001
    UN(2,1)   SCHOOL    0.05075          0.4062     0.12  0.9006
    UN(2,2)   SCHOOL     0.6940          0.2808     2.47  0.0067
    Residual            36.7006          0.6258    58.65  <.0001

    Estimated G Correlation Matrix
    Row  Effect     SCHOOL     Col1     Col2
      1  Intercept    1224   1.0000  0.02068
      2  cses         1224  0.02068   1.0000

    Solution for Fixed Effects
    Effect     Estimate  Standard Error   DF  t Value  Pr > |t|
    Intercept   12.6493          0.2445  157    51.75    <.0001
    cses         2.1932          0.1283  155    17.10    <.0001
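Question (c) can be checked by hand from the covariance parameter estimates above. The off-diagonal entry of the Estimated G Correlation Matrix is the covariance UN(2,1) divided by the square roots of the two variances UN(1,1) and UN(2,2). The symbol $\hat\rho$ is notation introduced here, not part of the SAS output.

```latex
\hat\rho
  = \frac{\mathrm{UN}(2,1)}{\sqrt{\mathrm{UN}(1,1)\,\mathrm{UN}(2,2)}}
  = \frac{0.05075}{\sqrt{8.6769 \times 0.6940}}
  \approx \frac{0.05075}{2.4539}
  \approx 0.0207
```

This agrees with the value 0.02068 reported by the gcorr option, so the estimated correlation between school intercepts and slopes is close to zero.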

In the output for this model, the parameter corresponding to UN(2,2) is the variance of the cses slopes across schools. The estimate is 0.6940 with standard error 0.2808, which yields a p-value of 0.0067 for a one-tailed test. Because the test is significant, we reject the hypothesis that the slopes do not differ among schools.

NOTE: When you have a longitudinal study and the measurement times are not uniform, the random coefficients model is one way to model the data without treating observations as missing.
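The UN(2,2) test reported above can be reproduced by hand. This is a sketch that assumes the Wald Z test printed by the covtest option, with $\Phi$ denoting the standard normal distribution function (notation introduced here). The test is one-tailed because a variance cannot be negative.

```latex
z = \frac{\widehat{\mathrm{UN}}(2,2)}{\mathrm{SE}}
  = \frac{0.6940}{0.2808}
  \approx 2.47 ,
\qquad
p = 1 - \Phi(2.47) \approx 0.0067
```

Both values match the Z Value and Pr Z columns of the SAS output for UN(2,2).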