Multilevel Modeling of Non-Normal Data. Don Hedeker Department of Public Health Sciences University of Chicago.


1 Multilevel Modeling of Non-Normal Data
Don Hedeker, Department of Public Health Sciences, University of Chicago

Hedeker, D. (2005). Generalized linear mixed models. In B. Everitt & D. Howell (Eds.), Encyclopedia of Statistics in Behavioral Science. Wiley.

2 What are Multilevel Data?
- Data that are hierarchically structured, nested, clustered
- Data collected from units organized or observed within units at a higher level (from which data are also obtained)

data collected on         who are clustered within
students                  classrooms
siblings                  families
repeated observations     individuals

==> these are examples of two-level data
- level 1 (students): measurement of primary outcome and important mediating variables
- level 2 (classrooms): provides context or organization of level-1 units, which may influence the outcome; other mediating variables

3 What is Multilevel Data Analysis?
"any set of analytical procedures that involve data gathered from individuals and from the social structure in which they are embedded and are analyzed in a manner that models the multilevel structure" (L. Burstein, Units of Analysis, 1985, Int Ency of Educ)

- analysis that models the multilevel structure
- recognizes influence of structure on individual outcome

structure      may influence response from
classroom      students
family         siblings
individual     repeated observations

4 Why do Multilevel Data Analysis?
- assess amount of variability due to each level (e.g., family variance and individual variance)
- model level-1 outcome in terms of effects at both levels: individual var. = fn(individual var. + family var.)
- assess interaction between level effects (e.g., individual outcome influenced by family SES for males, not females)
- responses are not independent: individuals within clusters share influencing factors

Multilevel analysis is another example of the Golden Rule of Statistics: one person's error term is another person's (or many persons') career

5 Multilevel models, aka
- random-effects models
- random-coefficient models
- mixed-effects models
- hierarchical linear models

Useful for analyzing
- Clustered data: subjects (level-1) within clusters (level-2), e.g., clinics, hospitals, families, worksites, schools, classrooms, city wards
- Longitudinal data: repeated obs. (level-1) within subjects (level-2)

6 [Data layout for clustered data. Columns: cluster, subject, cluster variables (tx group, size), subject variables (outcome, sex, age).]
i = 1, ..., N clusters
j = 1, ..., n_i subjects in cluster i

7 [Data layout for longitudinal data. Columns: subject, time, time-invariant variables, time-varying variables; the variables listed are tx group, sex, age, outcome, dose.]
i = 1, ..., N subjects
j = 1, ..., n_i timepoints for subject i

8 Multilevel models for categorical outcomes
- dichotomous outcomes: mixed-effects logistic regression
- ordinal outcomes: mixed-effects ordinal logistic regression
  - proportional odds model
  - partial or non-proportional odds model
- nominal outcomes: mixed-effects nominal logistic regression
- discrete or grouped time-to-event data: mixed-effects dichotomous or ordinal regression
  - complementary log-log link for proportional (and non-proportional) hazards models

9 Logistic Regression Model

$$\log\left[\frac{P(Y_i = 1)}{1 - P(Y_i = 1)}\right] = x_i'\beta$$

- Dichotomous outcome (Y = 0 absence, Y = 1 presence).
- The function that links probabilities to regressors is the logit (or log odds) function $\log[P/(1 - P)]$. The logit is called the link function.
- The model can be written in terms of probabilities:

$$P(Y_i = 1) = \frac{1}{1 + \exp(-x_i'\beta)}$$

- The model is a linear model for the logits, not for the probabilities. Logits can take on any value between negative and positive infinity; probabilities can only take on values between 0 and 1.
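As a small illustration of the logit link and its inverse described above, here is a minimal Python sketch. The coefficient values are hypothetical and chosen only for illustration; they are not estimates from these slides.

```python
import math

def logit(p):
    """Log odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Probability implied by a logit value z (any real number)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients: intercept and one slope (not from the slides).
beta0, beta1 = -0.5, 1.2

def prob_y1(x):
    """P(Y = 1) for a single regressor x under the logistic model."""
    return inv_logit(beta0 + beta1 * x)

p0 = prob_y1(0.0)
p1 = prob_y1(1.0)
# The model is linear on the logit scale: a one-unit change in x
# shifts the logit by beta1, whatever the starting probability.
logit_shift = logit(p1) - logit(p0)
```

The shift on the logit scale is constant in x, while the corresponding change in probability is not.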


11 The model can also be written in terms of the odds:

$$\frac{P(Y_i = 1)}{1 - P(Y_i = 1)} = \exp(x_i'\beta)$$

- $\exp\beta$ = multiplicative change in the odds for Y per unit change of x
- β = 0 yields no effect on the odds
- β > 0 increases the odds that Y is present with increasing x
- β < 0 decreases the odds that Y is present with increasing x
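The odds interpretation can be checked numerically. The sketch below uses hypothetical coefficients (not from the slides) and shows that the odds ratio for a one-unit change in x equals exp(β) regardless of where x starts.

```python
import math

def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)

# Hypothetical intercept and slope, for illustration only.
beta0, beta1 = -1.0, 0.4

def prob(x):
    """P(Y = 1) under the logistic model with one regressor."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

# Odds ratio for a one-unit increase in x at two different starting points.
ratio_low = odds(prob(1.0)) / odds(prob(0.0))
ratio_high = odds(prob(5.0)) / odds(prob(4.0))
```

Both ratios equal exp(0.4), about 1.49, even though the probabilities themselves change by different amounts.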

12 Dichotomous Response and Threshold Concept
- Continuous $y_i$, an unobservable latent variable, is related to the dichotomous response $Y_i$ via the threshold concept
- A response occurs ($Y_i = 1$) if $\gamma < y_i$; otherwise, a response does not occur ($Y_i = 0$)

13 The Threshold Concept in Practice How was your day? (what is your satisfaction level today?) Satisfaction may be continuous, but we usually emit a dichotomous response: 13

14 Model for Latent Continuous Responses
Consider the model with p covariates for the latent response strength $y_i$ ($i = 1, 2, \ldots, N$):

$$y_i = x_i'\beta + \varepsilon_i$$

- probit: $\varepsilon_i \sim$ standard normal (mean = 0, variance = 1)
- logistic: $\varepsilon_i \sim$ standard logistic (mean = 0, variance = $\pi^2/3$)
- β estimates from logistic regression are larger (in abs. value) than from probit regression by approximately $\sqrt{\pi^2/3} \approx 1.8$

Underlying latent variable:
- useful way of thinking of the problem
- not an essential assumption of the model
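The approximate scaling between probit and logistic coefficients follows from the ratio of the two error standard deviations. A minimal Python sketch of that arithmetic; the helper name `probit_to_logistic` and the example coefficient are hypothetical, and the conversion is only the approximation stated above.

```python
import math

logistic_var = math.pi ** 2 / 3   # variance of the standard logistic
probit_var = 1.0                  # variance of the standard normal

# Ratio of error standard deviations on the two latent scales.
scale = math.sqrt(logistic_var / probit_var)

def probit_to_logistic(beta_probit):
    """Approximate logistic-scale coefficient for a probit-scale coefficient."""
    return beta_probit * scale

example = probit_to_logistic(0.5)   # hypothetical probit coefficient
```

Here `scale` is about 1.81, which the slides round to 1.8.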

15 Random-intercept Logistic Regression Model
Consider the model with p covariates for the response $Y_{ij}$ for subject j ($j = 1, 2, \ldots, n_i$) in cluster i ($i = 1, 2, \ldots, N$):

$$\log\left[\frac{P(Y_{ij} = 1)}{1 - P(Y_{ij} = 1)}\right] = x_{ij}'\beta + \upsilon_{0i}$$

where
- $Y_{ij}$ = dichotomous response for subject j in cluster i
- $x_{ij}$ = $(p + 1) \times 1$ covariate vector (includes 1 for intercept)
- $\beta$ = $(p + 1) \times 1$ vector of unknown parameters
- $\upsilon_{0i}$ = cluster effects distributed $NID(0, \sigma^2_\upsilon)$ and assumed independent of the x variables

16 Characteristics of $\upsilon_{0i} \sim NID(0, \sigma^2_\upsilon)$
- separates the model from the usual (fixed-effects) multiple logistic regression model
- takes on i = 1, 2, ..., N values
- assesses the impact of cluster i on individual outcome; represents the degree of subject clustering
- common to each cluster member, but changes from cluster to cluster
- if $\upsilon_{0i} = 0$, then there is no cluster effect for cluster i
- if $\upsilon_{0i} = 0$ for all clusters, the cluster structure has no impact on individual data ($\sigma^2_\upsilon = 0$): no need for a multilevel approach, and ordinary logistic regression is OK
- if subject clustering has a strong effect, estimates of $\upsilon_{0i} \neq 0$ and $\sigma^2_\upsilon$ will increase from 0

17 Model for Latent Continuous Responses
Consider the model with p covariates for the $n_i \times 1$ latent response strength $y_{ij}$:

$$y_{ij} = x_{ij}'\beta + \upsilon_{0i} + \varepsilon_{ij}$$

where assuming
- $\varepsilon_{ij} \sim$ standard normal (mean 0 and $\sigma^2 = 1$) leads to multilevel probit regression
- $\varepsilon_{ij} \sim$ standard logistic (mean 0 and $\sigma^2 = \pi^2/3$) leads to multilevel logistic regression

18 Underlying latent variable
- not an essential assumption of the model
- useful for obtaining the intra-class correlation (r) and the design effect (d)

$$r = \frac{\sigma^2_\upsilon}{\sigma^2_\upsilon + \sigma^2} \qquad d = \frac{\sigma^2_\upsilon + \sigma^2}{\sigma^2} = \frac{1}{1 - r}$$

d is the ratio of the actual variance to the variance that would be obtained by simple random sampling (holding sample size constant)
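The two quantities above are simple functions of the variance components. The following Python sketch computes both for a logistic model, where the level-1 variance is fixed at π²/3; the cluster variance of 0.5 is a hypothetical value used only to exercise the functions.

```python
import math

LOGISTIC_RESID_VAR = math.pi ** 2 / 3   # level-1 variance on the latent logistic scale

def icc(var_cluster, var_resid=LOGISTIC_RESID_VAR):
    """Intra-class correlation r = var_cluster / (var_cluster + var_resid)."""
    return var_cluster / (var_cluster + var_resid)

def design_effect(var_cluster, var_resid=LOGISTIC_RESID_VAR):
    """Design effect d = (var_cluster + var_resid) / var_resid, as defined on the slide."""
    return (var_cluster + var_resid) / var_resid

var_cluster = 0.5                # hypothetical random-intercept variance
r = icc(var_cluster)
d = design_effect(var_cluster)   # equals 1 / (1 - r)
```

For a probit model the same functions apply with `var_resid=1.0`.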

19 Scaling of regression coefficients
Fixed-effects model: β estimates from logistic regression are larger (in abs. value) than from probit regression by approximately

$$\sqrt{\frac{\pi^2/3}{1}} \approx 1.8$$

because
- $V(y) = \sigma^2 = \pi^2/3$ for logistic
- $V(y) = \sigma^2 = 1$ for probit

20 Mixed-effects model: β estimates from the mixed-effects model are larger (in abs. value) than from the fixed-effects model by approximately

$$\sqrt{d} = \sqrt{\frac{\sigma^2_\upsilon + \sigma^2}{\sigma^2}}$$

because
- $V(y) = \sigma^2_\upsilon + \sigma^2$ in the mixed-effects model
- $V(y) = \sigma^2$ in the fixed-effects model

The difference depends on the size of the random-effects variance $\sigma^2_\upsilon$.

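21 Within-Clusters / Between-Clusters models
Within-clusters model - level 1 ($j = 1, \ldots, n_i$)

observed response:
$$\log\left[\frac{P(Y_{ij} = 1)}{1 - P(Y_{ij} = 1)}\right] = b_{0i} + b_{1i}\,\mathrm{Sex}_{ij}$$

latent response:
$$y_{ij} = b_{0i} + b_{1i}\,\mathrm{Sex}_{ij} + \varepsilon_{ij}$$

Between-clusters model - level 2 ($i = 1, \ldots, N$)

$$b_{0i} = \beta_0 + \beta_2\,\mathrm{Grp}_i + \upsilon_{0i}$$
$$b_{1i} = \beta_1 + \beta_3\,\mathrm{Grp}_i$$

with $\upsilon_{0i} \sim NID(0, \sigma^2_\upsilon)$ and $\varepsilon_{ij} \sim LID(0, \pi^2/3)$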

22 Put together,

$$\mathrm{logit}_{ij} = b_{0i} + b_{1i}\,\mathrm{Sex}_{ij}$$
$$= (\beta_0 + \beta_2\,\mathrm{Grp}_i + \upsilon_{0i}) + (\beta_1 + \beta_3\,\mathrm{Grp}_i)\,\mathrm{Sex}_{ij}$$
$$= \beta_0 + \beta_1\,\mathrm{Sex}_{ij} + \beta_2\,\mathrm{Grp}_i + \beta_3\,(\mathrm{Grp}_i \times \mathrm{Sex}_{ij}) + \upsilon_{0i}$$

- $\beta_0$ = logit when Sex = Grp = 0
- $\beta_1$ = Sex effect when Grp = 0
- $\beta_2$ = Grp effect when Sex = 0
- $\beta_3$ = difference between the Sex effect for Grp = 1 vs Grp = 0; or difference between the Grp effect for Sex = 1 vs Sex = 0

Coding of variables is very important for correct interpretation. Also, these effects are controlling for the cluster effect ("cluster-specific" effects).
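To make the substitution of the level-2 equations into the level-1 equation concrete, here is a minimal Python sketch that computes the logit both ways and reads off β3 as a difference of Sex effects. All coefficient values and the random effect are hypothetical.

```python
# Hypothetical fixed effects (beta0, beta1, beta2, beta3) and one cluster effect.
BETA = (-0.3, 0.5, 0.8, -0.2)
U0 = 0.25

def logit_two_stage(sex, grp, u0=U0, beta=BETA):
    """Level-2 equations first (b0i, b1i), then the level-1 equation."""
    b0, b1, b2, b3 = beta
    b0i = b0 + b2 * grp + u0
    b1i = b1 + b3 * grp
    return b0i + b1i * sex

def logit_combined(sex, grp, u0=U0, beta=BETA):
    """Single-equation form with the Grp x Sex interaction."""
    b0, b1, b2, b3 = beta
    return b0 + b1 * sex + b2 * grp + b3 * grp * sex + u0

# Sex effect within each group; their difference is beta3.
sex_effect_grp0 = logit_combined(1, 0) - logit_combined(0, 0)
sex_effect_grp1 = logit_combined(1, 1) - logit_combined(0, 1)
interaction = sex_effect_grp1 - sex_effect_grp0
```

The cluster effect cancels in each within-cluster contrast, which is why the fixed effects are described as cluster-specific.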

23 Effects of a School-based Intervention
The Television School and Family Smoking Prevention and Cessation Project (Flay et al., 1988); a subsample:
- sample: 1,600 7th-graders, 135 classes, 28 schools; 1 to 13 classes per school, 2 to 28 students per class
- outcome: knowledge of the effects of tobacco use
- timing: students tested at pre- and post-intervention
- design: schools exposed to
  - a social-resistance classroom curriculum (CC)
  - a media (television) intervention (TV)
  - CC combined with TV
  - a no-treatment control group

24 Main question of interest: What is the influence of the intervention on the tobacco health knowledge scores (THKS)?

Challenges in the analysis:
- outcome variable (THKS) is the number correct of 7 items
- controlling for intra-school and intra-class variability
- potential explanatory variables are at different levels

25 Tobacco and Health Knowledge Scale Post-Intervention Scores ≥ 3 (out of 7)
Subgroup Descriptive Statistics

                 CC = no              CC = yes
              TV=no    TV=yes      TV=no    TV=yes
n
proportions
odds
logits

26 Within-Clusters / Between-Clusters components
Within-clusters model - level 1 ($j = 1, \ldots, n_i$ subjects)

$$\mathrm{logit}_{ij} = b_{0i}$$

Between-clusters model - level 2 ($i = 1, \ldots, N$ clusters)

$$b_{0i} = \beta_0 + \beta_1\,CC_i + \beta_2\,TV_i + \beta_3\,(CC_i \times TV_i) + \upsilon_{0i}, \qquad \upsilon_{0i} \sim NID(0, \sigma^2_\upsilon)$$

27
- $\beta_0$ = THKS logit for the CC=no TV=no subgroup
- $\beta_1$ = logit diff. between CC=yes vs CC=no (for TV=no): $b_{0i} = \beta_0 + (\beta_1 + \beta_3\,TV_i)\,CC_i + \beta_2\,TV_i + \upsilon_{0i}$
- $\beta_2$ = logit diff. between TV=yes vs TV=no (for CC=no): $b_{0i} = \beta_0 + (\beta_2 + \beta_3\,CC_i)\,TV_i + \beta_1\,CC_i + \upsilon_{0i}$
- $\beta_3$ = difference in logit attributable to interaction
- $\upsilon_{0i}$ = random cluster deviation

Note: interpretation depends on the coding of variables, and the βs are adjusted for the cluster effects ("cluster-specific" effects)

28 3-level model
Within-classrooms (and schools) model - level 1 ($k = 1, \ldots, n_{ij}$ students)

$$\mathrm{logit}_{ijk} = b_{0ij}$$

Between-classrooms (within-schools) model - level 2 ($j = 1, \ldots, n_i$ classrooms)

$$b_{0ij} = b_{0i} + \upsilon_{0ij}$$

Between-schools model - level 3 ($i = 1, \ldots, N$ schools)

$$b_{0i} = \beta_0 + \beta_1\,CC_i + \beta_2\,TV_i + \beta_3\,(CC_i \times TV_i) + \upsilon_{0i}$$

with $\upsilon_{0ij} \sim NID(0, \sigma^2_{\upsilon(2)})$ and $\upsilon_{0i} \sim NID(0, \sigma^2_{\upsilon(3)})$

29
- $\beta_0$ = THKS logit for the CC=no TV=no subgroup
- $\beta_1$ = logit diff. between CC=yes vs CC=no (for TV=no)
- $\beta_2$ = logit diff. between TV=yes vs TV=no (for CC=no)
- $\beta_3$ = difference in logit attributable to interaction
- $\upsilon_{0ij}$ = random classroom deviation
- $\upsilon_{0i}$ = random school deviation

30 Stata for multilevel analysis of dichotomous outcomes: melogit (version 13 and thereafter)
- Multiple levels of nesting, crossed random effects
- Full likelihood estimation using numerical quadrature for integration over the random effects
  - non-adaptive, mode/curvature adaptive, or mean/variance adaptive (the default, except for crossed random effects)
  - 7 points per dimension are the default; more points provide greater accuracy, but also more computation time
- Laplace approximation (default for crossed random effects models)
  - same as mode/curvature adaptive quadrature with one point
  - can produce biased estimates, especially when the ICC is high and the number of clusters and/or subjects is small

31 Stata Example: tvsfp_binary.do

log using u:\stata_examples\tvsfp_binary.log, replace
infile school class thkso thksb ones thkspre cc tv cctv using clear
summarize
codebook school class

* ordinary logistic regression
logit thksb cc tv cctv, nolog

* 2-level logistic regression
melogit thksb cc tv cctv, nolog || class:
scalar m2 = e(ll)
estat icc

* 3-level logistic regression
melogit thksb cc tv cctv, nolog || school: || class:
scalar m3 = e(ll)
estat icc

32
* random school and class effects with std errors
predict u3 u2, reffects reses(u3se u2se)

* assign a value of 1 for one obs in each class
* & rank the RE estimates & class ids
egen pick1class = tag(class)
egen u2rank = rank(u2) if pick1class==1
egen classrank = rank(class) if pick1class==1
list class u2 u2se u2rank if pick1class==1 & classrank <= 10

* histogram of class random effects
histogram u2 if pick1class==1, normal

* std error bar chart (caterpillar plot) of class random effects
serrbar u2 u2se u2rank if pick1class==1, scale(1.96) yline(0)

* get LR test for comparing 2- and 3-level models
display "chibar2(01) = " 2*(m3-m2)
display "Prob > chibar2(01) = " chi2tail(1, 2*(m3-m2))/2

log close

33
. infile school class thkso thksb ones thkspre cc tv cctv using clear
* cannot be read as a number for school[1601]
(eof not at end of obs)
(1,601 observations read)

. summarize

Variable    Obs      Mean   Std. Dev.   Min   Max
school      1,600
class       1,600
thkso       1,600
thksb       1,600
ones        1,600
thkspre     1,600
cc          1,600
tv          1,600
cctv        1,600

34
. codebook school class

school
  type: numeric (float)
  range: [193,515]            units: 1
  unique values: 28           missing .: 1/1,601
  mean:
  std. dev:
  percentiles: 10% 25% 50% 75% 90%

class
  type: numeric (float)
  range: [193101,515113]      units: 1
  unique values: 135          missing .: 1/1,601
  mean:
  std. dev:
  percentiles: 10% 25% 50% 75% 90%

35
. * ordinary logistic regression
. logit thksb cc tv cctv, nolog

Logistic regression               Number of obs = 1,600
                                  LR chi2(3)    =
                                  Prob > chi2   =
Log likelihood =                  Pseudo R2     =

thksb    Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
cc
tv
cctv
_cons

36
. * 2-level logistic regression
. melogit thksb cc tv cctv, nolog || class:

Mixed-effects logistic regression     Number of obs    = 1,600
Group variable: class                 Number of groups = 135
                                      Obs per group: min = 1, avg = 11.9, max = 28
Integration method: mvaghermite       Integration pts. = 7
                                      Wald chi2(3) =
Log likelihood =                      Prob > chi2  =

thksb    Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
cc
tv
cctv
_cons
class
  var(_cons)

LR test vs. logistic model: chibar2(01) =      Prob >= chibar2 =

37
. scalar m2 = e(ll)
. estat icc

Residual intraclass correlation
Level    ICC   Std. Err.   [95% Conf. Interval]
class

38
. * 3-level logistic regression
. melogit thksb cc tv cctv, nolog || school: || class:

Mixed-effects logistic regression     Number of obs = 1,600

                   No. of     Observations per Group
Group Variable     Groups     Minimum   Average   Maximum
school
class

Integration method: mvaghermite       Integration pts. = 7
                                      Wald chi2(3) =
Log likelihood =                      Prob > chi2  =

thksb    Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
cc
tv
cctv
_cons

39
school
  var(_cons)
school>class
  var(_cons)

LR test vs. logistic model: chi2(2) =      Prob > chi2 =
Note: LR test is conservative and provided only for reference.

. scalar m3 = e(ll)
. estat icc

Residual intraclass correlation
Level           ICC   Std. Err.   [95% Conf. Interval]
school
class|school

40
. * random school and class effects with std errors
. predict u3 u2, reffects reses(u3se u2se)
. * assign a value of 1 for one obs in each class
. * & rank the RE estimates & class ids
. egen pick1class = tag(class)
. egen u2rank = rank(u2) if pick1class==1
. egen classrank = rank(class) if pick1class==1
. list class u2 u2se u2rank if pick1class==1 & classrank <= 10

class   u2   u2se   u2rank

41
* histogram of class random effects
histogram u2 if pick1class==1, normal

[Figure: histogram with normal density overlay. x-axis: empirical Bayes means for _cons[school>class]; y-axis: Density.]

42
* standard error bar chart (caterpillar plot) of class random effects
serrbar u2 u2se u2rank if pick1class==1, scale(1.96) yline(0)

[Figure: caterpillar plot. x-axis: rank of (u2); y-axis: empirical Bayes means for _cons[school>class].]

43 Model comparisons - Likelihood Ratio (LR) tests

Comparing mixed logistic to ordinary (fixed) logistic regression
LR test vs. logistic regression: chibar2(01) =      Prob >= chibar2 =
- $H_0: \sigma^2_{\upsilon(2)} = 0$, $H_A: \sigma^2_{\upsilon(2)} > 0$ (one-sided test)
- chibar2(01) refers to a 50:50 mixture of a $\chi^2_0$ and a $\chi^2_1$ distribution (chi-bar-square distribution); the p-value is obtained from $\chi^2_1$, but is halved

Comparing 3-level mixed logistic to ordinary (fixed) logistic regression
LR test vs. logistic regression: chi2(2) =      Prob > chi2 =
Note: LR test is conservative and provided only for reference.
- $H_0: \sigma^2_{\upsilon(2)} = \sigma^2_{\upsilon(3)} = 0$
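The halved p-value for the one-sided variance test can be sketched in Python without any statistics library, using the identity P(χ²₁ > x) = erfc(√(x/2)). The function names are hypothetical, and the log-likelihood values in the usage lines are made up for illustration; they are not the values from the Stata output.

```python
import math

def chi2_tail_1df(x):
    """Upper-tail probability of a chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def lr_statistic(ll_full, ll_reduced):
    """Likelihood ratio statistic 2 * (logL_full - logL_reduced)."""
    return 2.0 * (ll_full - ll_reduced)

def chibar01_pvalue(lr):
    """P-value under a 50:50 mixture of chi2(0) and chi2(1)."""
    if lr <= 0.0:
        return 1.0          # the point mass at zero covers the whole tail
    return 0.5 * chi2_tail_1df(lr)

# Hypothetical log likelihoods for a reduced and a fuller model.
lr = lr_statistic(ll_full=-1000.0, ll_reduced=-1002.5)   # 5.0
p = chibar01_pvalue(lr)
```

This mirrors the `chi2tail(1, 2*(m3-m2))/2` expression in the do-file: a single variance is tested on the boundary of its parameter space, so the usual chi-square p-value is halved.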

44 Comparing 3-level to 2-level mixed logistic

* 2-level logistic regression
melogit thksb cc tv cctv || class:
scalar m2 = e(ll)
* 3-level logistic regression
melogit thksb cc tv cctv || school: || class:
scalar m3 = e(ll)

m2 = 2-level Log likelihood =
m3 = 3-level Log likelihood =

display "chibar2(01) = " 2*(m3-m2)
chibar2(01) =
display "Prob > chibar2(01) = " chi2tail(1, 2*(m3-m2))/2
Prob > chibar2(01) =

$H_0: \sigma^2_{\upsilon(3)} = 0$, $H_A: \sigma^2_{\upsilon(3)} > 0$ (one-sided test)

45 THKS Post-Int (dichotomized) Scores - LR Estimates (std errs)

                              Multilevel
              Fixed       2-level     3-level
intercept     (.099)      (.140)      (.192)
CC            (.145)      (.203)      (.278)
TV            (.139)      (.199)      (.270)
CC × TV       (.204)      (.287)      (.390)
class var                 (.087)      (.081)
school var                            .120 (.077)
-2 log L

p < .01, p < .05, p < .10 (Wald tests not done for vars)

46 SAS for multilevel analysis of dichotomous outcomes

PROC GLIMMIX (version and thereafter)
- Multiple levels of nesting, crossed random effects
- Pseudo-likelihood estimation (by default)
  - Linearization to avoid integration over the random effects
  - Produces biased estimates if the number of level-1 or level-2 units is small and/or the ICC is large
- Full likelihood estimation using numerical quadrature for integration over the random effects
  - METHOD=QUAD; however, for 3-level models can only use METHOD=QUAD(QPOINTS=1) or METHOD=LAPLACE (these are equivalent)

PROC NLMIXED
- Full likelihood estimation using numerical quadrature for integration over the random effects
- Only for 2-level models; allows programming features (can do 3-level models with SAS/STAT 13.2; 2nd maintenance release for SAS 9.4)

47 SAS Example: tvsfp_binary.sas

FILENAME TvsfpDat URL ;
DATA one;
  INFILE TvsfpDat;
  INPUT sid cid thkso thksb int thkspre cc tv cctv;

This sometimes doesn't seem to work:

ERROR: The connection has timed out..
NOTE: The SAS System stopped processing this step because of errors.

In this case, it is easiest just to go to the URL and download the data:

FILENAME TvsfpDat 'u:/mixdemo/tvsfpors.dat';
DATA one;
  INFILE TvsfpDat;
  INPUT sid cid thkso thksb int thkspre cc tv cctv;
RUN;

48
/* logistic regression ignoring clustering */
PROC LOGISTIC;
  MODEL thksb (DESCENDING) = cc tv cctv;

/* GLIMMIX: students in classrooms - Quasi-Like */
PROC GLIMMIX NOCLPRINT;
  CLASS cid;
  MODEL thksb (DESCENDING) = cc tv cctv / DIST=BINARY SOLUTION;
  RANDOM INTERCEPT / SUBJECT=cid TYPE=CHOL;
RUN;

TYPE=CHOL requests estimation of the cluster standard deviation ($\sigma_\upsilon$) rather than the variance ($\sigma^2_\upsilon$). This is more stable computationally if the variance is close to zero.

49
/* GLIMMIX: students in classrooms - Full-Like */
PROC GLIMMIX NOCLPRINT METHOD=QUAD;
  CLASS cid;
  MODEL thksb (DESCENDING) = cc tv cctv / DIST=BINARY SOLUTION;
  RANDOM INTERCEPT / SUBJECT=cid TYPE=CHOL SOLUTION;
  COVTEST 'class variance' GLM;
  ODS OUTPUT SOLUTIONR=ClassEffects;
RUN;

- METHOD=QUAD requests full-likelihood estimation (using numerical quadrature)
- SOLUTION on the RANDOM statement produces estimates of the random classroom effects; the ODS statement directs these to the data set ClassEffects
- The COVTEST 'class variance' GLM statement yields a likelihood ratio test of $H_0: \sigma^2_{\upsilon(2)} = 0$, $H_A: \sigma^2_{\upsilon(2)} > 0$ (one-sided test)

50 The GLIMMIX Procedure

Model Information
Data Set                       WORK.ONE
Response Variable              thksb
Response Distribution          Binary
Link Function                  Logit
Variance Function              Default
Variance Matrix Blocked By     cid
Estimation Technique           Maximum Likelihood
Likelihood Approximation       Gauss-Hermite Quadrature
Degrees of Freedom Method      Containment

Number of Observations Read    1600
Number of Observations Used

51 Response Profile
Ordered Value   thksb   Total Frequency

The GLIMMIX procedure is modeling the probability that thksb = 1.

Dimensions
G-side Cov. Parameters      1
Columns in X                4
Columns in Z per Subject    1
Subjects (Blocks in V)      135
Max Obs per Subject         28

52 Optimization Information
Optimization Technique        Dual Quasi-Newton
Parameters in Optimization    5
Lower Boundaries              1
Upper Boundaries              0
Fixed Effects                 Not Profiled
Starting From                 GLM estimates
Quadrature Points             3

Iteration History
Iteration   Restarts   Evaluations   Objective Function   Change   Max Gradient

Convergence criterion (GCONV=1E-8) satisfied.

53 Fit Statistics
-2 Log Likelihood
AIC (smaller is better)
AICC (smaller is better)
BIC (smaller is better)
CAIC (smaller is better)
HQIC (smaller is better)

Fit Statistics for Conditional Distribution
-2 log L(thksb | r. effects)
Pearson Chi-Square
Pearson Chi-Square / DF      0.93

Covariance Parameter Estimates
Cov Parm     Subject   Estimate   Standard Error
CHOL(1,1)    cid

54 Solutions for Fixed Effects
Effect       Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept
cc                                                      <.0001
tv
cctv

Type III Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
cc                                   <.0001
tv
cctv

55 Solution for Random Effects
Effect      Subject   Estimate   Std Err Pred   DF   t Value   Pr > |t|
Intercept   cid
[one Intercept row per classroom (cid)]

56 Tests of Covariance Parameters Based on the Likelihood
Label             DF   -2 Log Like   ChiSq   Pr > ChiSq   Note
class variance                               <.0001       MI
MI: P-value based on a mixture of chi-squares.

Comparing mixed logistic to ordinary (fixed) logistic regression
- $H_0: \sigma^2_{\upsilon(2)} = 0$, $H_A: \sigma^2_{\upsilon(2)} > 0$ (one-sided test)
- "mixture" refers to a 50:50 mixture of a $\chi^2_0$ and a $\chi^2_1$ distribution (chi-bar-square distribution); the p-value is obtained from $\chi^2_1$, but is halved

57
/* NLMIXED: students in classrooms - Full-Like */
PROC NLMIXED DATA=one;
  PARMS b0=-.34 b_cc=.88 b_tv=.27 b_cctv=-.39 sd=1;
  z = b0 + b_cc*cc + b_tv*tv + b_cctv*cctv + sd*u;
  IF (thksb=1) THEN p = 1/(1 + EXP(-z));
  ELSE IF (thksb=0) THEN p = 1 - (1/(1 + EXP(-z)));
  logl = LOG(p);
  MODEL thksb ~ GENERAL(logl);
  RANDOM u ~ NORMAL(0,1) SUBJECT=cid;
RUN;

The programming features of PROC NLMIXED make it very flexible, though somewhat difficult to use; it is not really necessary for the 2-level mixed logistic model.
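The NLMIXED program states the conditional log-likelihood and leaves the integration over the random effect u to the procedure. As a rough sketch of what that integration involves, the Python below evaluates one cluster's marginal log-likelihood with a fixed (non-adaptive) 3-point Gauss-Hermite rule for a standard normal u. This is an illustration under stated assumptions, not the algorithm SAS or Stata actually runs: the function names and the data values are hypothetical, and real software uses more points and adaptive centering.

```python
import math

# 3-point Gauss-Hermite rule for u ~ N(0, 1): nodes 0 and +/- sqrt(3),
# with weights 2/3 and 1/6 (the weights sum to 1).
NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def cluster_loglik(y, eta, sd):
    """Marginal log-likelihood of one cluster.

    y   : list of 0/1 responses for the cluster's subjects
    eta : list of fixed-effects linear predictors x'beta, same length
    sd  : random-intercept standard deviation (z = eta + sd * u)
    """
    marginal = 0.0
    for u, w in zip(NODES, WEIGHTS):
        conditional = 1.0   # product of Bernoulli terms given u
        for y_j, eta_j in zip(y, eta):
            p = inv_logit(eta_j + sd * u)
            conditional *= p if y_j == 1 else (1.0 - p)
        marginal += w * conditional
    return math.log(marginal)

# Hypothetical cluster with three subjects.
y = [1, 0, 1]
eta = [0.2, -0.1, 0.5]
ll_mixed = cluster_loglik(y, eta, sd=1.0)
ll_fixed = cluster_loglik(y, eta, sd=0.0)   # reduces to ordinary logistic
```

Summing `cluster_loglik` over clusters gives the objective that a full-likelihood method maximizes; with `sd=0` it collapses to the ordinary logistic log-likelihood.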

58
/* GLIMMIX: 3-level - quasi-likelihood */
PROC GLIMMIX NOCLPRINT DATA=one;
  CLASS cid sid;
  MODEL thksb (DESCENDING) = cc tv cctv / DIST=BINARY SOLUTION;
  RANDOM INTERCEPT / SUBJECT=cid(sid) TYPE=CHOL;
  RANDOM INTERCEPT / SUBJECT=sid TYPE=CHOL;
RUN;

/* GLIMMIX: 3-level - full-likelihood */
PROC GLIMMIX NOCLPRINT METHOD=QUAD(QPOINTS=1) DATA=one;
  CLASS cid sid;
  MODEL thksb (DESCENDING) = cc tv cctv / DIST=BINARY SOLUTION;
  RANDOM INTERCEPT / SUBJECT=cid(sid) TYPE=CHOL;
  RANDOM INTERCEPT / SUBJECT=sid TYPE=CHOL;
  COVTEST 'class & school variances' GLM;
  COVTEST 'school variance' . 0;
RUN;

59 Tests of Covariance Parameters Based on the Likelihood
Label                       DF   -2 Log Like   ChiSq   Pr > ChiSq   Note
class & school variances                               <            --
school variance                                                     MI
MI: P-value based on a mixture of chi-squares.
--: Standard test with unadjusted p-values.

- COVTEST 'class & school variances' GLM compares the 3-level mixed logistic to ordinary (fixed) logistic regression (a test of independence): $H_0: \sigma^2_{\upsilon(2)} = \sigma^2_{\upsilon(3)} = 0$
- COVTEST 'school variance' . 0 compares the 3-level mixed logistic to the 2-level mixed logistic with random classroom effects. "Mixture" refers to a 50:50 mixture of a $\chi^2_0$ and a $\chi^2_1$ distribution. $H_0: \sigma^2_{\upsilon(3)} = 0$, $H_A: \sigma^2_{\upsilon(3)} > 0$ (one-sided test)

60
PROC SGPLOT DATA=ClassEffects;
  HISTOGRAM Estimate;
  DENSITY Estimate;
RUN;

61 THKS Post-Int (dichotomized) Scores - LR Estimates (std errs)

                          GLIMMIX full            GLIMMIX quasi
             Fixed      2-level    3-level      2-level    3-level
intercept    (.099)     (.140)     (.190)       (.137)     (.204)
CC           (.145)     (.203)     (.277)       (.199)     (.293)
TV           (.139)     (.199)     (.268)       (.195)     (.286)
CC × TV      (.204)     (.287)     (.387)       (.281)     (.409)
class sd                (.083)     (.097)       (.093)     (.078)
school sd                          (.110)                  (.115)
-2 log L

p < .01, p < .05, p < .10 (Wald tests not done for sds)


63 Under SSI, Inc > SuperMix (English) or SuperMix (English) Student:
- Under File, click on Open Spreadsheet
- Open C:\SuperMixEn Examples\Workshop\Binary\tvsfpors.ss3 (or C:\SuperMixEn Student Examples\Workshop\Binary\tvsfpors.ss3)

64 C:\SuperMixEn Examples\Workshop\Binary\tvsfpors.ss3 64


65
- Under File, click on Open Existing Model Setup
- Open C:\SuperMixEn Examples\Workshop\Binary\tvbc.mum (or C:\SuperMixEn Student Examples\Workshop\Binary\tvbc.mum)

66 Note Dependent Variable Type should be binary 66

67 For the moment, unselect PreTHKS as an explanatory variable 67

68 Note Optimization Method should be adaptive quadrature 68


72 Empirical Bayes Estimates of Random Effects Select Analysis > View Level-2 Bayes Results Class ID, random effect number, estimate, variance, name 72

73 Select File > Model-based Graphs > Confidence Intervals 73

74 order of classes on x-axis is the same as order in the dataset 74


75
- Under File, click on Open Existing Model Setup
- Open C:\SuperMixEn Examples\Workshop\Binary\tvbsc.mum (or C:\SuperMixEn Student Examples\Workshop\Binary\tvbsc.mum)

76 Note Dependent Variable Type should be binary 76

77 For the moment, unselect PreTHKS as an explanatory variable 77

78 Note Optimization Method should be adaptive quadrature 78


82 Empirical Bayes Estimates of Random Class Effects Select Analysis > View Level-2 Bayes Results School ID, Class ID, random effect number, estimate, variance, name 82

83 Empirical Bayes Estimates of Random School Effects Select Analysis > View Level-3 Bayes Results School ID, random effect number, estimate, variance, name 83

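84 Calculation of ICC - 2 level model

$$r = \frac{\sigma^2_\upsilon}{\sigma^2_\upsilon + \sigma^2}$$

Random classrooms model ($\sigma^2 = \pi^2/3 \approx 3.29$):

$$r = \frac{\hat\sigma^2_\upsilon}{\hat\sigma^2_\upsilon + \pi^2/3}$$

$r \times 100$% of the unexplained variation is at the classroom level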

85 Calculation of ICC - 3 level model (with $\sigma^2 = \pi^2/3$)

Level-3 (likeness of students in the same school)
$$r = \frac{\sigma^2_{\upsilon(3)}}{\sigma^2_{\upsilon(3)} + \sigma^2_{\upsilon(2)} + \sigma^2} = .034$$

Level-2 (likeness of students in the same classroom & school)
$$r = \frac{\sigma^2_{\upsilon(3)} + \sigma^2_{\upsilon(2)}}{\sigma^2_{\upsilon(3)} + \sigma^2_{\upsilon(2)} + \sigma^2} = .081$$

Level-2 (likeness of classes in the same school)
$$r = \frac{\sigma^2_{\upsilon(3)}}{\sigma^2_{\upsilon(3)} + \sigma^2_{\upsilon(2)}} = .415$$

r < .5: the school level contributes slightly less to variability than the class level; average classroom post THKS scores are moderately similar within schools
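The three intraclass correlations for a 3-level logistic model can be computed from the two variance components. A minimal Python sketch follows: the school variance .120 is the estimate reported in the 3-level results table, while the class variance of .169 is an assumed value chosen for illustration, since the estimate itself is not legible in this copy.

```python
import math

RESID_VAR = math.pi ** 2 / 3   # level-1 variance on the latent logistic scale

def three_level_iccs(var_school, var_class, var_resid=RESID_VAR):
    """Return (school ICC, class-within-school ICC, classes-in-school ICC)."""
    total = var_school + var_class + var_resid
    r_school = var_school / total                        # students in same school
    r_class = (var_school + var_class) / total           # students in same class & school
    r_classes = var_school / (var_school + var_class)    # classes in same school
    return r_school, r_class, r_classes

var_school = 0.120   # from the 3-level results table
var_class = 0.169    # assumption for illustration only
r_school, r_class, r_classes = three_level_iccs(var_school, var_class)
```

With these inputs the function returns values that round to .034, .081 and .415.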

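86 Estimated subgroup probabilities, using the logistic $\Psi(z) = [1 + \exp(-z)]^{-1}$
[Table with columns CC, TV, logistic, estimate; one row per CC × TV subgroup for each model. Most entries are not legible in this copy.]

- Fixed-effects model: CC = 0, TV = 0 row is $\Psi(-.341)$; CC = 1, TV = 1 estimate is .603
- Random-classrooms model, $\hat d = (\hat\sigma^2_\upsilon + \pi^2/3)/(\pi^2/3)$: CC = 0, TV = 0 row is $\Psi(-.384/\sqrt{\hat d})$
- 3-level model, $\hat d = (\hat\sigma^2_{\upsilon(3)} + \hat\sigma^2_{\upsilon(2)} + \pi^2/3)/(\pi^2/3)$: CC = 0, TV = 0 row is $\Psi(-.391/\sqrt{\hat d})$; CC = 1, TV = 1 estimate is .597

d = design effect = $(\sigma^2_\upsilon + \sigma^2)/\sigma^2$ or $(\sigma^2_{\upsilon(3)} + \sigma^2_{\upsilon(2)} + \sigma^2)/\sigma^2$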

87 Stata mata script: tvsfp_binary_mataest1.do

* 3-level model with intercept, cc, tv, cc*tv
mata
beta = ( \ \ \ )
xmat = (1, 0, 0, 0 \ 1, 0, 1, 0 \ 1, 1, 0, 0 \ 1, 1, 1, 1)
xbeta = xmat*beta
varc =
vars =
var1 = pi()^2/3
d = (vars + varc + var1)/var1
xbetad = xbeta/sqrt(d)
estprob = invlogit(xbetad)
estprob
end
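The same marginalization can be sketched in Python. This follows the steps of the Mata script (linear predictor, design effect, division by the square root of d, inverse logit), but every numeric input here is a hypothetical placeholder, because the coefficient and variance values are not legible in this copy.

```python
import math

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical cluster-specific estimates: intercept, cc, tv, cc*tv.
beta = [-0.4, 0.9, 0.3, -0.4]
# Rows follow the Mata xmat: (cc, tv) = (0,0), (0,1), (1,0), (1,1).
xmat = [
    [1, 0, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
]
var_class = 0.17              # hypothetical class variance
var_school = 0.12             # hypothetical school variance
var1 = math.pi ** 2 / 3       # level-1 logistic variance

xbeta = [sum(x * b for x, b in zip(row, beta)) for row in xmat]
d = (var_school + var_class + var1) / var1          # design effect
conditional = [inv_logit(v) for v in xbeta]         # random effects set to 0
marginal = [inv_logit(v / math.sqrt(d)) for v in xbeta]
```

Dividing by the square root of d pulls each probability toward .5, which is why the marginal estimates are less extreme than the cluster-specific ones.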

88 Supermix population average estimates

Population Average Estimates
Parameter    Estimate   Standard Error   z Value   P Value
intercept
CC
TV
CC*TV

Population average estimates from the 3-level analysis, using the logistic $\Psi(z) = [1 + \exp(-z)]^{-1}$
[Table with columns CC, TV, logistic, estimate; the CC = 0, TV = 0 row is $\Psi(-.367)$. Other entries are not legible in this copy.]

89 Stata mata script: tvsfp_binary_mataest1b.do

* 3-level model with intercept, cc, tv, cc*tv - PA estimates
mata
beta = ( \ \ \ )
xmat = (1, 0, 0, 0 \ 1, 0, 1, 0 \ 1, 1, 0, 0 \ 1, 1, 1, 1)
xbeta = xmat*beta
estprob = invlogit(xbeta)
estprob
end

90 Within-Clusters / Between-Clusters components
Within-clusters model - level 1 ($j = 1, \ldots, n_i$ subjects)

$$\mathrm{logit}_{ij} = b_{0i} + b_{1i}\,PRETHKS_{ij}$$

Between-clusters model - level 2 ($i = 1, \ldots, N$ clusters)

$$b_{0i} = \beta_0 + \beta_2\,CC_i + \beta_3\,TV_i + \beta_4\,(CC_i \times TV_i) + \upsilon_{0i}$$
$$b_{1i} = \beta_1$$

with $\upsilon_{0i} \sim NID(0, \sigma^2_\upsilon)$

91
- $\beta_0$ = (PRETHKS adjusted) logit for the CC=no TV=no subgroup
- $\beta_1$ = effect of PRETHKS on POSTTHKS
- $\beta_2$ = (PRETHKS adjusted) logit diff. between CC=yes vs CC=no (for TV=no)
- $\beta_3$ = (PRETHKS adjusted) logit diff. between TV=yes vs TV=no (for CC=no)
- $\beta_4$ = (PRETHKS adjusted) difference in logit attributable to interaction
- $\upsilon_{0i}$ = random cluster deviation

92 3-level model
Within-classrooms (and schools) model - level 1 ($k = 1, \ldots, n_{ij}$ students)

$$\mathrm{logit}_{ijk} = b_{0ij} + b_{1ij}\,PRETHKS_{ijk}$$

Between-classrooms (within-schools) model - level 2 ($j = 1, \ldots, n_i$ classrooms)

$$b_{0ij} = b_{0i} + \upsilon_{0ij}$$
$$b_{1ij} = b_{1i}$$

Between-schools model - level 3 ($i = 1, \ldots, N$ schools)

$$b_{0i} = \beta_0 + \beta_2\,CC_i + \beta_3\,TV_i + \beta_4\,(CC_i \times TV_i) + \upsilon_{0i}$$
$$b_{1i} = \beta_1$$

with $\upsilon_{0ij} \sim NID(0, \sigma^2_{\upsilon(2)})$ and $\upsilon_{0i} \sim NID(0, \sigma^2_{\upsilon(3)})$

93 Stata code: tvsfp_binary.do
Add thkspre to the explanatory variable list:

melogit thksb thkspre cc tv cctv || class:
melogit thksb thkspre cc tv cctv || school: || class:

or

meqrlogit thksb thkspre cc tv cctv || class:, intp(11)
meqrlogit thksb thkspre cc tv cctv || school: || class:, intp(11)

- meqrlogit uses the Cholesky (matrix square root) of the random-effects variance-covariance matrix in estimation (more stable if variances are close to zero)
- intp(11) changes the default of 7 quadrature points to 11

94 SAS code: tvsfp_binary.sas
Add thkspre to the explanatory variable list on the MODEL statement:

/* GLIMMIX: 3-level - full-likelihood */
PROC GLIMMIX NOCLPRINT METHOD=QUAD(QPOINTS=1);
  CLASS cid sid;
  MODEL thksb (DESCENDING) = thkspre cc tv cctv / DIST=BINARY SOLUTION;
  RANDOM INTERCEPT / SUBJECT=cid(sid) TYPE=CHOL;
  RANDOM INTERCEPT / SUBJECT=sid TYPE=CHOL;
RUN;

95 In Supermix, reopening TVBSC.mum and selecting PreTHKS as an explanatory variable 95

96 THKS Post-Int (dichotomized) Scores - LR Estimates (std err)

                              Multilevel
              Fixed       2-level     3-level
intercept     (.141)      (.170)      (.196)
PRETHKS       (.044)      (.046)      (.046)
CC            (.150)      (.197)      (.245)
TV            (.143)      (.192)      (.236)
CC × TV       (.210)      (.277)      (.343)
class var                 (.080)      (.081)
school var                            .063 (.062)
-2 log L

p < .01, p < .05, p < .10 (Wald tests not done for vars)

97 Calculation of ICC - 2 level models

$$r = \frac{\sigma^2_\upsilon}{\sigma^2_\upsilon + \sigma^2}$$

Random classrooms model:

$$r = \frac{.219}{.219 + \pi^2/3} \approx .062$$

About 6.2% of the unexplained variation is at the classroom level

98 Calculation of ICC - 3 level model (with $\sigma^2 = \pi^2/3$)

Level-3 (likeness of students in the same school)
$$r = \frac{\sigma^2_{\upsilon(3)}}{\sigma^2_{\upsilon(3)} + \sigma^2_{\upsilon(2)} + \sigma^2} = .018$$

Level-2 (likeness of students in the same classroom & school)
$$r = \frac{\sigma^2_{\upsilon(3)} + \sigma^2_{\upsilon(2)}}{\sigma^2_{\upsilon(3)} + \sigma^2_{\upsilon(2)} + \sigma^2} = .063$$

Level-2 (likeness of classes in the same school)
$$r = \frac{\sigma^2_{\upsilon(3)}}{\sigma^2_{\upsilon(3)} + \sigma^2_{\upsilon(2)}} = .276$$

r < .5: the school level contributes less to variability than the class level; average classroom post THKS scores are moderately similar within schools

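99 Estimated subgroup probabilities, using the logistic $\Psi(z) = [1 + \exp(-z)]^{-1}$
[Table with columns CC, TV, logistic, estimate; one row per CC × TV subgroup for each model. Most entries are not legible in this copy.]

- Fixed-effects model: $\Psi(x'\hat\beta)$; last estimate shown is .610
- Random-classrooms model: $\Psi(x'\hat\beta/\sqrt{\hat d})$, with $\hat d = (\hat\sigma^2_\upsilon + \pi^2/3)/(\pi^2/3)$
- 3-level model: $\Psi(x'\hat\beta/\sqrt{\hat d})$, with $\hat d = (\hat\sigma^2_{\upsilon(3)} + \hat\sigma^2_{\upsilon(2)} + \pi^2/3)/(\pi^2/3)$; last estimate shown is .605

d = design effect = $(\sigma^2_\upsilon + \sigma^2)/\sigma^2$ or $(\sigma^2_{\upsilon(3)} + \sigma^2_{\upsilon(2)} + \sigma^2)/\sigma^2$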

100 Supermix population average estimates

Population Average Estimates
Parameter    Estimate   Standard Error   z Value   P Value
intercept
CC
TV
CC*TV
PreTHKS

Population average estimates from the 3-level analysis, using the logistic $\Psi(z) = [1 + \exp(-z)]^{-1}$
[Table with columns CC, TV, logistic, estimate; entries are not legible in this copy.]

101 Stata mata script: tvsfp_binary_mataest2.do

* 3-level model with intercept, prethks, cc, tv, cc*tv
mata
beta = ( \ \ \ \ )
xmat = (1, 2.152, 0, 0, 0 \ 1, 2.087, 0, 1, 0 \ 1, 2.050, 1, 0, 0 \ 1, 1.979, 1, 1, 1)
xbeta = xmat*beta
varc =
vars =
var1 = pi()^2/3
d = (vars + varc + var1)/var1
xbetad = xbeta/sqrt(d)
estprob = invlogit(xbetad)
estprob
end

102 Stata mata script: tvsfp_binary_mataest2b.do

* 3-level model with int, prethks, cc, tv, cc*tv - PA estimates
mata
beta = ( \ \ \ \ )
xmat = (1, 2.152, 0, 0, 0 \ 1, 2.087, 0, 1, 0 \ 1, 2.050, 1, 0, 0 \ 1, 1.979, 1, 1, 1)
xbeta = xmat*beta
estprob = invlogit(xbeta)
estprob
end
