I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN


1 for Clustered
Edps/Psych/Stat 587, Fall 2010
Carolyn J. Anderson, Department of Educational Psychology
University of Illinois at Urbana-Champaign
© Board of Trustees, University of Illinois
Slide 1 of 98

2 Outline
In this set of notes:
- Quick introduction to logistic regression.
- Marginal model: population-average model.
- Random effects model: subject-specific model.
- Three-level models.
Reading/References:
- Snijders & Bosker, Chapter 14.
- Molenberghs, G. & Verbeke, G. (2005). Models for Discrete Longitudinal Data. Springer.
- Agresti, A. (2002). Categorical Data Analysis, Second Edition. NY: Wiley.
- Agresti, A. (2007). An Introduction to Categorical Data Analysis, Second Edition. NY: Wiley.

3 More References
- Skrondal, A. & Rabe-Hesketh, S. (2004). Generalized Latent Variable Modeling. NY: Chapman & Hall/CRC.
- de Boeck, P. & Wilson, M. (Eds.) (2004). Explanatory Item Response Models. Springer.
- Molenberghs, G. & Verbeke, G. (2004). An introduction to (Generalized Non) Linear Mixed Models, Chapter 3. In de Boeck, P. & Wilson, M. (Eds.), Explanatory Item Response Models. Springer.
- Anderson, C.J., Verkuilen, J.V., & Johnson, T. (after my sabbatical). Applied Generalized Linear Mixed Models: Continuous and Discrete. Springer.

4 Clustered, nested, hierarchical, longitudinal data
The response/outcome variable is dichotomous. Examples:
- Longitudinal study of patients in treatment for depression: normal or abnormal.
- Responses to items on an exam (correct/incorrect).
- Admission decisions for graduate programs in different departments.
- Longitudinal study of respiratory infection in children.
- Whether basketball players make free-throw shots.
- Whether cool kids are tough kids.
- Others.

5 Respiratory Infection
From Skrondal & Rabe-Hesketh (2004); also analyzed by Zeger & Karim (1991) and Diggle et al. (2002), but originally from Sommer et al. (1983).
Preschool children from Indonesia who were examined for up to 6 consecutive quarters for respiratory infection.
Predictors/explanatory variables/covariates:
- Age in months
- Xerophthalmia as an indicator of chronic vitamin A deficiency (dummy variable): night blindness & dryness of membranes, dryness of cornea, softening of cornea
- Cosine of annual cycle (i.e., season of year)
- Sine of annual cycle (i.e., season of year)
- Gender
- Height (as a percent)
- Stunted

6 Longitudinal Depression Example
From Agresti (2002), who got it from Koch et al. (1977).
Comparison of a new drug with a standard drug for treating depression. Patients were classified as N = Normal or A = Abnormal at 1, 2 and 4 weeks.
Response at Each of 3 Time Points
Diagnosis  Rx    NNN  NNA  NAN  NAA  ANN  ANA  AAN  AAA
Mild       std
Mild       new
Severe     std
Severe     new

7 Cool Kids
Rodkin, P.C., Farmer, T.W., Pearl, R. & Acker, R.V. (2006). They're cool: social status and peer group for aggressive boys and girls. 15.
Clustering: kids within peer groups within classrooms.
Response variable: whether a kid nominated by peers is classified as a model (ideal) student.
Predictors:
- Nominator's popularity, gender, race
- Classroom aggression level

8 LSAT6
Law School Admissions data: 5 items, N = 1000.
Y1  Y2  Y3  Y4  Y5  Frequency

9 The logistic regression model is a generalized linear model with
- Random component: The response variable is binary, $Y_i = 1$ or $0$ (an event occurs or it doesn't). We are interested in the probability that $Y_i = 1$, $\pi(x_i)$. The distribution of $Y_i$ is Binomial.
- Systematic component: A linear predictor such as $\alpha + \beta_1 x_{1i} + \ldots + \beta_j x_{ji}$. The explanatory or predictor variables may be quantitative (continuous), qualitative (discrete), or both (mixed).
- Link function: The log of the odds that an event occurs, otherwise known as the logit:
$$\mathrm{logit}(\pi) = \log\left(\frac{\pi}{1-\pi}\right)$$
The logistic regression model is
$$\mathrm{logit}(\pi(x_i)) = \log\left(\frac{\pi(x_i)}{1-\pi(x_i)}\right) = \alpha + \beta_1 x_{1i} + \ldots + \beta_j x_{ji}$$

10 The Binomial Distribution
We now assume that the number of trials is fixed and we count the number of successes or events that occur.
Preliminaries: Bernoulli random variables
- $X$ is a random variable where $X = 1$ or $0$.
- The probability that $X = 1$ is $\pi$.
- The probability that $X = 0$ is $(1-\pi)$.
Such variables are called Bernoulli random variables.

11 Bernoulli Random Variable
The mean of a Bernoulli random variable is
$$\mu_X = E(X) = 1\cdot\pi + 0\cdot(1-\pi) = \pi$$
The variance of $X$ is
$$\mathrm{var}(X) = \sigma_X^2 = E[(X-\mu_X)^2] = (1-\pi)^2\pi + (0-\pi)^2(1-\pi) = \pi(1-\pi)$$

12 Bernoulli Variance vs Mean
[Figure: Bernoulli variance vs mean]

13 Example of Bernoulli Random Variable
Suppose that a coin is not fair, or is loaded. The probability that it lands on heads equals .40 and the probability that it lands on tails equals .60.
If this coin is flipped many, many, many times, then we would expect it to land on heads 40% of the time and tails 60% of the time.
We define our Bernoulli random variable as
$$X = \begin{cases} 1 & \text{if Heads} \\ 0 & \text{if Tails} \end{cases}$$
where $\pi = P(X = 1) = .40$ and $(1-\pi) = P(X = 0) = .60$.
Note: Once you know $\pi$, you know the mean and variance of the distribution of $X$.
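As a quick check of the note above, the mean and variance of the loaded coin can be computed straight from the definitions of expectation and variance. This is only an illustrative sketch; the function name `bernoulli_moments` is made up for this example.

```python
def bernoulli_moments(pi):
    """Mean and variance of a Bernoulli(pi) variable, from the definitions."""
    outcomes = {1: pi, 0: 1.0 - pi}  # value of X -> probability
    mean = sum(x * p for x, p in outcomes.items())
    var = sum((x - mean) ** 2 * p for x, p in outcomes.items())
    return mean, var

# Loaded coin from the slide: P(heads) = .40
mean, var = bernoulli_moments(0.40)
print(mean, var)  # mean equals pi, variance equals pi * (1 - pi)
```

Both quantities depend on $\pi$ alone, which is the point of the note on the slide.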

14 A binomial random variable is the sum of $n$ independent Bernoulli random variables.
We will let $Y$ represent a binomial random variable, and by definition
$$Y = \sum_{i=1}^{n} X_i$$
The mean of a Binomial random variable is
$$\mu_Y = E(Y) = E\left(\sum_{i=1}^{n} X_i\right) = E(X_1) + E(X_2) + \ldots + E(X_n) = \overbrace{\mu_X + \mu_X + \ldots + \mu_X}^{n} = \overbrace{\pi + \pi + \ldots + \pi}^{n} = n\pi$$

15 Variance of Binomial Random Variable
... and the variance of a Binomial random variable is
$$\mathrm{var}(Y) = \sigma_Y^2 = \mathrm{var}(X_1 + X_2 + \ldots + X_n) = \overbrace{\mathrm{var}(X) + \mathrm{var}(X) + \ldots + \mathrm{var}(X)}^{n} = \overbrace{\pi(1-\pi) + \pi(1-\pi) + \ldots + \pi(1-\pi)}^{n} = n\pi(1-\pi)$$
Note: Once you know $\pi$ and $n$, you know the mean and variance of the Binomial distribution.

16 Variance vs Mean
[Figure: variance vs mean]

17 Binomial Distribution Function
Toss the unfair coin with $\pi = .40$ $n = 3$ times. $Y$ = number of heads. The tosses are independent of each other.
Possible Outcomes    Probability of a Sequence                Prob(Y)
X1 + X2 + X3 = Y     P(X1, X2, X3)                            P(Y)
1 + 1 + 1 = 3        (.4)(.4)(.4) = (.4)^3 (.6)^0 = .064      .064
1 + 1 + 0 = 2        (.4)(.4)(.6) = (.4)^2 (.6)^1 = .096
1 + 0 + 1 = 2        (.4)(.6)(.4) = (.4)^2 (.6)^1 = .096      3(.096) = .288
0 + 1 + 1 = 2        (.6)(.4)(.4) = (.4)^2 (.6)^1 = .096
1 + 0 + 0 = 1        (.4)(.6)(.6) = (.4)^1 (.6)^2 = .144
0 + 1 + 0 = 1        (.6)(.4)(.6) = (.4)^1 (.6)^2 = .144      3(.144) = .432
0 + 0 + 1 = 1        (.6)(.6)(.4) = (.4)^1 (.6)^2 = .144
0 + 0 + 0 = 0        (.6)(.6)(.6) = (.4)^0 (.6)^3 = .216      .216
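The table can be reproduced by brute force: list every sequence of three tosses, multiply the per-toss probabilities (the tosses are independent), and add up the sequences that give the same number of heads. A minimal sketch; `binomial_by_enumeration` is a name invented here.

```python
from itertools import product

def binomial_by_enumeration(pi, n):
    """P(Y = y) for y = 0..n, found by enumerating all 2**n toss sequences."""
    probs = {y: 0.0 for y in range(n + 1)}
    for seq in product((1, 0), repeat=n):      # 1 = heads, 0 = tails
        p_seq = 1.0
        for x in seq:                          # independence: multiply
            p_seq *= pi if x == 1 else 1.0 - pi
        probs[sum(seq)] += p_seq               # Y = number of heads
    return probs

dist = binomial_by_enumeration(0.40, 3)
for y in sorted(dist, reverse=True):
    print(y, round(dist[y], 3))
```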

18 Binomial Distribution Function
The formula for the probability of a Binomial random variable is
$$P(Y = a) = \left(\text{the number of ways that } Y = a \text{ out of } n \text{ trials}\right) P(X = 1)^a\, P(X = 0)^{(n-a)} = \binom{n}{a} \pi^a (1-\pi)^{n-a}$$
where
$$\binom{n}{a} = \frac{n!}{a!\,(n-a)!} = \frac{n(n-1)(n-2)\cdots 1}{a(a-1)\cdots 1\,\big((n-a)(n-a-1)\cdots 1\big)}$$
which is called the binomial coefficient.
For example, the number of ways that you can get $Y = 2$ out of 3 tosses is
$$\binom{3}{2} = \frac{3(2)(1)}{2(1)(1)} = 3$$
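The closed-form formula can be evaluated directly and then used to confirm the mean $n\pi$ and variance $n\pi(1-\pi)$. The sketch below uses Python's standard `math.comb` for the binomial coefficient; `binomial_pmf` is an illustrative name.

```python
from math import comb

def binomial_pmf(a, n, pi):
    """P(Y = a) for Y ~ Binomial(n, pi)."""
    return comb(n, a) * pi ** a * (1.0 - pi) ** (n - a)

n, pi = 3, 0.40
pmf = [binomial_pmf(a, n, pi) for a in range(n + 1)]

# Mean and variance computed from the pmf itself
mean_y = sum(a * p for a, p in enumerate(pmf))
var_y = sum((a - mean_y) ** 2 * p for a, p in enumerate(pmf))
print(comb(3, 2), pmf, mean_y, var_y)
```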

19 The Linear Predictor
A linear function of the explanatory variables:
$$\eta_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \ldots + \beta_K x_{Ki}$$
The $x$'s could be
- Metric (numerical, "continuous")
- Discrete (dummy or effect codes)
- Products (interactions): e.g., $x_{3i} = x_{1i} x_{2i}$
- Quadratic, cubic terms, etc.: e.g., $x_{3i} = x_{2i}^2$
- Transformations: e.g., $x^*_{3i} = \log(x_{3i})$, $x^*_{3i} = \exp(x_{3i})$
Foreshadowing random effects models:
$$\eta_{ij} = \beta_{0j} + \beta_{1j} x_{1ij} + \beta_{2j} x_{2ij} + \ldots + \beta_{Kj} x_{Kij}$$
where $i$ is the index of level 1 and $j$ is the index of level 2.

20 The Link Function
Problem: Probabilities must be between 0 and 1, but $\eta_i$ could be between $-\infty$ and $\infty$.
Solution: Use the (inverse of the) cumulative distribution function (cdf) of a continuous variable to link the linear predictor and the mean of the response variable.
cdf's are $P(\text{random variable} \le \text{specific value})$, which are between 0 and 1.
Distribution               Link
Normal                     probit
Logistic                   logit
Gumbel (extreme value)     complementary log-log, $\log[-\log(1-\pi)]$

21 Some Example cdf's
[Figure: some example cdf's]

22 Putting All the Components Together
$$\log\left(\frac{P(Y_i = 1 \mid x_i)}{P(Y_i = 0 \mid x_i)}\right) = \mathrm{logit}(P(Y_i = 1 \mid x_i)) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \ldots + \beta_K x_{Ki}$$
where $x_i = (x_{0i}, x_{1i}, \ldots, x_{Ki})$,
or in terms of probabilities
$$E(Y_i \mid x_i) = P(Y_i = 1 \mid x_i) = \frac{\exp[\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \ldots + \beta_K x_{Ki}]}{1 + \exp[\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \ldots + \beta_K x_{Ki}]}$$
Implicit assumption (for identification): for $P(Y_i = 0 \mid x_i)$, $\beta_0 = \beta_1 = \ldots = \beta_K = 0$.

23 Interpretation of the Parameters
Simple example:
$$P(Y_i = 1 \mid x_i) = \frac{\exp[\beta_0 + \beta_1 x_i]}{1 + \exp[\beta_0 + \beta_1 x_i]}$$
The ratio of the probabilities is the odds:
$$(\text{odds of } Y_i = 1 \text{ vs } Y_i = 0) = \frac{P(Y_i = 1 \mid x_i)}{P(Y_i = 0 \mid x_i)} = \exp[\beta_0 + \beta_1 x_i]$$
For a 1 unit increase in $x_i$ the odds equal
$$\frac{P(Y_i = 1 \mid (x_i + 1))}{P(Y_i = 0 \mid (x_i + 1))} = \exp[\beta_0 + \beta_1 (x_i + 1)]$$
The odds ratio for a 1 unit increase in $x_i$ equals
$$\frac{P(Y_i = 1 \mid (x_i + 1)) / P(Y_i = 0 \mid (x_i + 1))}{P(Y_i = 1 \mid x_i) / P(Y_i = 0 \mid x_i)} = \frac{\exp[\beta_0 + \beta_1 (x_i + 1)]}{\exp[\beta_0 + \beta_1 x_i]} = \exp(\beta_1)$$
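The derivation says the odds ratio for a one-unit increase is $\exp(\beta_1)$ no matter where on the $x$ scale the increase happens. The sketch below checks this numerically; the coefficient values are arbitrary, chosen only for illustration and not taken from any fitted model in these notes.

```python
import math

def prob(x, b0, b1):
    """P(Y = 1 | x) under the simple logit model."""
    eta = b0 + b1 * x
    return math.exp(eta) / (1.0 + math.exp(eta))

def odds(x, b0, b1):
    """Odds of Y = 1 versus Y = 0 at x."""
    p = prob(x, b0, b1)
    return p / (1.0 - p)

b0, b1 = -1.0, 0.5  # hypothetical values for illustration
ratios = [odds(x + 1.0, b0, b1) / odds(x, b0, b1) for x in (-2.0, 0.0, 3.5)]
print(ratios, math.exp(b1))  # every ratio matches exp(b1)
```

The probabilities themselves change by different amounts at different $x$; only the odds ratio is constant.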

24 Example 1: Respiratory
One with a continuous explanatory variable (for now).
Response variable: $Y$ = whether a person has had a respiratory infection, $P(Y = 1)$. Binomial with $n = 1$.
Note: Models can be fit to data at the level of the individual (i.e., $Y_i = 1$ where $n = 1$) or to collapsed data (i.e., $i$ indexes everyone with the same value on the explanatory variable, and $Y_i = y$ where $n = n_i$).
Systematic component: $\beta_0 + \beta_1 (\text{age})_i$, where age has been centered around 36 (I don't know why).
Link: logit.

25 Example 1: The model
Our logit model:
$$P(Y_i = 1 \mid \text{age}_i) = \frac{\exp(\beta_0 + \beta_1 (\text{age})_i)}{1 + \exp(\beta_0 + \beta_1 (\text{age})_i)}$$
We'll ignore the clustering and use MLE to estimate this model, which yields
Analysis Of Parameter Estimates
Parameter   Estimate   Standard Error   95% Conf. Limits   Chi-Square   Pr > ChiSq
Intercept                                                                <.0001
age                                                                      <.0001
Interpretation: The odds of an infection equal $\exp(-.0248) = 0.98$ times the odds for a person one unit of age younger (age is recorded in months).
Alternatively: The odds of no infection equal $\exp(0.0248) = 1.03$ times the odds of no infection for a person one unit of age younger.

26 Probability of Infection
[Figure: probability of infection]

27 Probability of NO infection
[Figure: probability of no infection]

28 Example 2: Longitudinal Depression
From Agresti (2002), who got it from Koch et al. (1977).
Model Normal versus Abnormal at 1, 2 and 4 weeks. Also, whether mild/severe ($s = 1$ for severe) and standard/new drug ($d = 1$ for new).
Parameter    DF   Estimate   exp(β̂)   Std. Error   X²   Pr > χ²
Diagnose     1                                           <.0001
Drug
Time                                                     <.0001
Drug*Time                                                <.0001
The odds of normal when diagnosis is severe are 0.27 times the odds when diagnosis is mild (or 1/.27 = 3.72).
For the new drug, the odds ratio of normal for 1 week later: exp[ ] = exp[1.4402] = 4.22.
For the standard drug, the odds ratio of normal for 1 week later: exp[0.4824] = 1.62.
What does exp( ) exp(0.4824) exp(1.0174) equal?

29 SAS and fitting Logit models
proc genmod descending;
  model outcome = diagnose treat time treat*time
        / dist=bin link=logit type3 obstats;
  output out=fitted pred=fitvalues StdResChi=haberman;
  title 'MLE ignoring repeated aspect of the data';
run;
Or
proc genmod descending;
  class diagnose(ref=first) treat(ref=first);
  model outcome = diagnose treat time treat*time
        / dist=bin link=logit type3 obstats;
  output out=fitted pred=fitvalues StdResChi=haberman;
  title 'MLE ignoring repeated aspect of the data';
run;
Or
proc logistic descending;
  model outcome = diagnose treat time treat*time / lackfit influence;
  title 'MLE ignoring repeated aspect of the data';
run;
Can also use the class statement in proc logistic.

30 Approaches to deal with Clustering
Population-averaged (marginal):
$$P(Y_{ij} = 1 \mid x_{ij}) = \frac{\exp(\beta_0 + \beta_1 x_{1ij} + \ldots + \beta_K x_{Kij})}{1 + \exp(\beta_0 + \beta_1 x_{1ij} + \ldots + \beta_K x_{Kij})}$$
Clustering is a nuisance. Use generalized estimating equations (GEEs). Only estimate the first 2 moments.
Random effects (subject-specific):
$$P(Y_{ij} = 1 \mid x_{ij}, U_j) = \frac{\exp(\beta_{0j} + \beta_{1j} x_{1ij} + \ldots + \beta_{Kj} x_{Kij})}{1 + \exp(\beta_{0j} + \beta_{1j} x_{1ij} + \ldots + \beta_{Kj} x_{Kij})}$$
In the level 2 model, we specify models for the $\beta_{kj}$'s.
The implied marginal of this random effects model, when there is only a random intercept, is
$$P(Y_{ij} = 1 \mid x_{ij}) = \int_{U_0} \frac{\exp(\gamma_{00} + \gamma_{10} x_{1ij} + \ldots + \gamma_{K0} x_{Kij} + U_0)}{1 + \exp(\gamma_{00} + \gamma_{10} x_{1ij} + \ldots + \gamma_{K0} x_{Kij} + U_0)}\, f(U_0)\, dU_0$$

31 Demonstration via Simulation
The following random intercept model was simulated:
$$P(Y_{ij} = 1 \mid x_{ij}) = \frac{\exp(\beta_0 + \beta_1 x_{ij} + U_{0j})}{1 + \exp(\beta_0 + \beta_1 x_{ij} + U_{0j})}$$
- $x_{ij} = x_i + \epsilon_{ij}$, where $x_i \sim N(0, 4)$ and $\epsilon_{ij} \sim N(0, .01)$.
- $U_{0j} \sim N(0, 4)$.
- $x_i$, $\epsilon_{ij}$ and $U_{0j}$ are all independent.
- Number of macro units $j = 1, \ldots, 50$.
- Number of replications (micro units) $i = 1, \ldots, 4$.
The logit models were fit by
- MLE ignoring clustering (PROC GENMOD).
- GEE using an exchangeable correlation matrix (PROC GENMOD).
- MLE of the random effects model (PROC NLMIXED).
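A data set with the same structure can be generated in a few lines. The sketch below follows the stated design (50 macro units, 4 micro units each, variances 4, .01 and 4), but the intercept and slope used here (`beta0=1.0`, `beta1=2.0`) are placeholder values chosen for illustration, not the values used in the original simulation, and the function name and seed are likewise invented.

```python
import math
import random

def simulate(n_clusters=50, n_per_cluster=4, beta0=1.0, beta1=2.0, seed=587):
    """Simulate clustered binary data from a random-intercept logit model.

    beta0 and beta1 are hypothetical; variances follow the slide:
    cluster-level x ~ N(0, 4), eps ~ N(0, .01), U0 ~ N(0, 4).
    """
    rng = random.Random(seed)
    rows = []
    for j in range(n_clusters):
        u0 = rng.gauss(0.0, 2.0)          # variance 4 -> sd 2
        x_center = rng.gauss(0.0, 2.0)    # variance 4 -> sd 2
        for i in range(n_per_cluster):
            x = x_center + rng.gauss(0.0, 0.1)   # variance .01 -> sd .1
            eta = beta0 + beta1 * x + u0
            p = 1.0 / (1.0 + math.exp(-eta))
            y = 1 if rng.random() < p else 0
            rows.append((j, i, x, y))
    return rows

data = simulate()
print(len(data), sum(y for _, _, _, y in data))
```

Each row is `(cluster, replication, x, y)`; such a file could then be passed to the three fitting approaches listed on the slide.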

32 Simulation: Parameter Estimates
             MLE ignoring clustering    GEE (exchangeable)     MLE random effects
Parameter    Estimate   Std Error       Estimate   Std Error   Estimate   Std Error
Intercept
x
From GEE: correlation = .42
From random effects: $\hat\tau_0^2$ = (s.e. = .6018) and $\hat\tau_0^2$ =
What do you notice?

33 Simulation: Fitted Values
[Figure: simulation fitted values]

34 Conditional vs Marginal Models
[Figure: conditional vs marginal models]

35 Explanation of Difference
or, why the marginal model (GEE) has weaker effects than the random effects model:
- The subject- (or cluster-) specific or conditional curves $P(Y_{ij} = 1 \mid x_{ij}, U_{0j})$ exhibit quite a bit of variability (& dependency within cluster).
- For a fixed $x$, there is considerable variability in the probability, $P(Y_{ij} = 1 \mid U_{0j})$. For example, at $x = 0$ the fitted probabilities range from about .3 to almost 1.0.
- The average of the $P(Y_{ij} = 1)$, averaged over $j$, has a less steep slope, i.e., a weaker effect.
- The greater the variability between the cluster-specific curves (i.e., the larger $\tau_0$ and the larger the correlation within cluster), the greater the difference.
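The flattening can be seen numerically by averaging the cluster-specific curves over the random intercept distribution. The sketch below integrates with a plain midpoint rule (a stand-in for proper quadrature) and uses made-up values `b0 = 0`, `b1 = 1`, `tau = 2`; none of these come from the fitted models in these notes.

```python
import math

def expit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def logit(p):
    return math.log(p / (1.0 - p))

def marginal_prob(x, b0, b1, tau, n_steps=4000, width=8.0):
    """Average P(Y=1 | x, u) over u ~ N(0, tau^2) with a midpoint rule."""
    lo = -width * tau
    h = 2.0 * width * tau / n_steps
    total = 0.0
    for k in range(n_steps):
        u = lo + (k + 0.5) * h
        density = math.exp(-0.5 * (u / tau) ** 2) / (tau * math.sqrt(2.0 * math.pi))
        total += expit(b0 + b1 * x + u) * density * h
    return total

b0, b1, tau = 0.0, 1.0, 2.0   # hypothetical values for illustration
conditional_slope = b1        # change in conditional logit per unit x
marginal_slope = (logit(marginal_prob(1.0, b0, b1, tau))
                  - logit(marginal_prob(0.0, b0, b1, tau)))
print(conditional_slope, marginal_slope)
```

On the logit scale the averaged curve rises more slowly than any single cluster's curve, and increasing `tau` widens the gap.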

36 Have repeated measures data or nested data, hence correlated observations.
- Use the Generalized Estimating Equations (GEE) method (in some cases MLE is possible).
- In GLM, we assumed a binomial distribution for binary data, which determines the relationship between the mean $E(Y)$ and the variance $\mathrm{var}(Y)$ of the response variable.
- For the GEE part, we need to specify (guess) what the correlational structure is for the observations: the "working correlation matrix".
  - Independent: no correlation between observations.
  - Exchangeable: correlations between pairs of observations are the same within clusters (and the same across all clusters).
  - Autoregressive: for times $t$ and $t'$, the correlation between $Y_t$ and $Y_{t'}$ equals $\rho^{|t - t'|}$.
  - Unstructured: correlations between all pairs within clusters can differ.
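The first three structures are fully determined by the cluster size and at most one parameter, so they are easy to write out. A small sketch with illustrative function names and an arbitrary $\rho = 0.5$:

```python
def independent(n):
    """No correlation between observations."""
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]

def exchangeable(n, rho):
    """Every pair of observations in a cluster shares the same correlation."""
    return [[1.0 if r == c else rho for c in range(n)] for r in range(n)]

def ar1(n, rho):
    """Correlation decays as rho ** |t - t'| with distance in time."""
    return [[rho ** abs(r - c) for c in range(n)] for r in range(n)]

for name, mat in (("independent", independent(3)),
                  ("exchangeable", exchangeable(3, 0.5)),
                  ("AR(1)", ar1(3, 0.5))):
    print(name)
    for row in mat:
        print("  ", row)
```

An unstructured matrix has no such formula: each of its n(n-1)/2 off-diagonal correlations is a separate parameter.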

37 The Working Correlation Matrix
- GEE assumes a distribution for each marginal (e.g., $P(Y_{ij} = 1)$ for all $j$) but does not assume a distribution for the joint (i.e., $P(Y_{i1}, Y_{i2}, \ldots, Y_{in})$).
- ... there are no multivariate generalizations of discrete data distributions like there are of the normal for continuous data.
- The data are used to estimate the dependency between observations within a cluster (the dependency is assumed to be the same within all clusters).
Choosing a Working Correlation Matrix
- If available, use information you know.
- If you lack information and $n$ is small, then try unstructured to give you an idea of what might be appropriate.
- If you lack information and $n$ is large, then unstructured might require (too) many parameters.
- If you choose wrong, you still get valid standard errors because these are based on the data (empirical).
- If the correlation/dependency is small, all choices will yield very similar results.

38 GEE Example: Longitudinal Depression
              Initial       Exchangeable   Unstructured
Intercept     (0.1639)      (0.1742)       (0.1726)
diagnose      (0.1464)      (0.1460)       (0.1450)
treat         (0.2222)      (0.2286)       (0.2271)
time          (0.1148)      (0.1199)       (0.1190)
treat*time    (0.1888)      (0.1877)       (0.1865)
Working correlation for exchangeable = .0034
Correlation matrix for unstructured:
Working Correlation Matrix
        Col1   Col2   Col3
Row1
Row2
Row3
(Interpretation is the same as when we ignored clustering.)

39 SAS and GEE
proc genmod descending data=depress;
  class case;
  model outcome = diagnose treat time treat*time
        / dist=bin link=logit type3;
  repeated subject=case / type=exch corrw;
  title 'GEE with Exchangeable';
run;
Other correlational structures:
  repeated subject=case / type=ar(1) corrw;
  title 'GEE with AR(1)';

  repeated subject=case / type=unstr corrw;
  title 'GEE with Unstructured';

40 GEE Example 2: Respiratory
We'll do simple (just time) and then complex (lots of predictors).
Exchangeable Working Correlation
Correlation
Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
Parameter   Estimate   Standard Error   95% Confidence Limits   Z   Pr > |Z|
Intercept                                                           <.0001
age                                                                 <.0001
Score Statistics For Type 3 GEE Analysis
Source   DF   Chi-Square   Pr > ChiSq
age                        <.0001
Estimated odds ratio = exp(-.0243) = 0.98 (or exp(.0243) = 1.02).
Note: ignoring correlation, odds ratio = 0.98 (or 1.03).

41 Complex Model
Exchangeable working correlation =
Some model refinement needed...
Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
Parameter   Estimate   exp(beta)   Std. Error   Z   Pr > |Z|
Intercept                                           <.01
age                                                 <.01
xero
xero
female
female
cosine                                              <.01
sine
height
stunted
stunted

42 Miscellaneous Comments on Marginal Models
With GEE
- There is no likelihood being maximized, so there are no likelihood-based tests. (Information criteria statistics: QIC & UQIC.)
- Can do Wald type tests and confidence intervals for parameters. Score tests are also available.
There are other ways to model the marginal distribution(s) of discrete variables that depend on the number of observations per group (macro unit), e.g.,
- For matched pairs of binary variables, McNemar's test.
- Loglinear models of quasi-symmetry and symmetry to test marginal homogeneity in square tables.
- Transition models.
- Others.

43 GLM, but we're going to allow random parameters in the systematic component:
$$\eta_{ij} = \beta_{0j} + \beta_{1j} x_{1ij} + \beta_{2j} x_{2ij} + \ldots + \beta_{Kj} x_{Kij}$$
where $i$ is the index of level 1 and $j$ is the index of level 2.
Level 1: Model conditional on $x_{ij}$ and $U_j$:
$$P(Y_{ij} = 1 \mid x_{ij}, U_j) = \frac{\exp[\beta_{0j} + \beta_{1j} x_{1ij} + \beta_{2j} x_{2ij} + \ldots + \beta_{Kj} x_{Kij}]}{1 + \exp[\beta_{0j} + \beta_{1j} x_{1ij} + \beta_{2j} x_{2ij} + \ldots + \beta_{Kj} x_{Kij}]}$$
where $Y$ is binomial with $n = 1$ (i.e., Bernoulli).
Level 2: Model for intercept and slopes:
$$\begin{aligned} \beta_{0j} &= \gamma_{00} + U_{0j} \\ \beta_{1j} &= \gamma_{10} + U_{1j} \\ &\;\;\vdots \\ \beta_{Kj} &= \gamma_{K0} + U_{Kj} \end{aligned}$$
where $U_j \sim N(0, T)$ i.i.d.

44 Putting Levels 1 & 2 Together
$$P(Y_{ij} = 1 \mid x_{ij}, U_j) = \frac{\exp[\gamma_{00} + \gamma_{10} x_{1ij} + \ldots + \gamma_{K0} x_{Kij} + U_{0j} + U_{1j} x_{1ij} + \ldots + U_{Kj} x_{Kij}]}{1 + \exp[\gamma_{00} + \gamma_{10} x_{1ij} + \ldots + \gamma_{K0} x_{Kij} + U_{0j} + U_{1j} x_{1ij} + \ldots + U_{Kj} x_{Kij}]}$$
Marginalizing...
$$P(Y_{ij} = 1 \mid x_{ij}) = \int_{U_0} \cdots \int_{U_K} \frac{\exp(\gamma_{00} + \gamma_{10} x_{1ij} + \ldots + U_0 + \ldots + U_K x_{Kij})}{1 + \exp(\gamma_{00} + \gamma_{10} x_{1ij} + \ldots + U_0 + \ldots + U_K x_{Kij})}\, f(U)\, dU$$

45 A Simple Random Intercept Model
Level 1:
$$P(Y_{ij} = 1 \mid x_{ij}) = \frac{\exp[\beta_{0j} + \beta_{1j} x_{1ij}]}{1 + \exp[\beta_{0j} + \beta_{1j} x_{1ij}]}$$
where $Y_{ij}$ is Binomial (Bernoulli).
Level 2:
$$\beta_{0j} = \gamma_{00} + U_{0j}, \qquad \beta_{1j} = \gamma_{01}$$
where $U_{0j} \sim N(0, \tau_0^2)$ i.i.d.
Random effects model for micro unit $i$ and macro unit $j$:
$$P(Y_{ij} = 1 \mid x_{ij}, U_{0j}) = \frac{\exp[\gamma_{00} + \gamma_{01} x_{1ij} + U_{0j}]}{1 + \exp[\gamma_{00} + \gamma_{01} x_{1ij} + U_{0j}]}$$

46 Example 1: A Simple Random Intercept Model
The respiratory data of children.
The NLMIXED Procedure
Specifications
Data Set                               WORK.RESPIRE
Dependent Variable                     resp
Distribution for Dependent Variable    Binary
Random Effects                         u
Distribution for Random Effects        Normal
Subject Variable                       id
Optimization Technique                 Dual Quasi-Newton
Integration Method                     Adaptive Gaussian Quadrature

47 Example 1: Dimensions Table
Dimensions
Observations Used        1200
Observations Not Used    0
Total Observations       1200
Subjects                 275
Max Obs Per Subject      6
Parameters               3
Quadrature Points        10

48 Example 1: Input and Iteration History
Parameters
lam   bage   sigma   NegLogLike
Iteration History
Iter   Calls   NegLogLike   Diff   MaxGrad
NOTE: GCONV convergence criterion satisfied.

49 Example 1: Fit Statistics & Parameter Estimates
Fit Statistics
-2 Log Likelihood
AIC (smaller is better)
AICC (smaller is better)
BIC (smaller is better)
Parameter Estimates
Parameter   Est   Standard Error   DF   t Value   Pr > |t|   Gradient
gamma                                             <
gage                                              <
tau                                               <
Note: I cut out Alpha, Lower & Upper.

50 Example 1: Additional Parameter Estimates
Additional Estimates
Label        Est   Standard Error   DF   t Value   Pr > |t|   Alpha   Lower   Upper
Var(Uo)
odds ratio                                         <.0001
I requested these.

51 Example 1: Estimated Probabilities
[Figure: estimated probabilities]


52 Example 1: Estimated Probabilities
[Figure: estimated probabilities]

53 SAS PROC NLMIXED & GLIMMIX input
proc nlmixed data=respire qpoints=10;
  parms gamma0=-2.3 gage=0.02 tau0=0.8;
  eta = gamma0 + gage*age + u;
  p = exp(eta)/(1 + exp(eta));
  model resp ~ binary(p);
  random u ~ normal(0, tau0*tau0) subject=id out=ebur;
  estimate 'Var(Uo)' tau0**2;
  estimate 'odds ratio' exp(gage);
  title 'Random Intercept, Simple Model';
run;

proc glimmix data=respire method=quad;
  class id;
  model resp = age / solution link=logit dist=bin;
  random intercept / subject=id;
  title 'Random Intercept, Simple Model';
run;

54 Different Estimation Methods: Different Results

Some GLIMMIX estimation options (est., with s.e. in parentheses):

    Param    MMPL        RMPL        RSPL
    γ̂        (0.1163)    (0.1167)    (0.1160)
    γ̂        (0.0061)    (0.0061)    (0.0061)
    τ̂        (0.2775)    (0.2810)    (0.2292)

             GLIMMIX     GLIMMIX     NLMIXED
    Param    Laplace     quad        gauss
    γ̂        (0.1844)    (0.1723)    (0.1723)
    γ̂        (0.0067)    (0.0066)    (0.0066)
    τ̂        (0.3961)    (0.3559)    (0.3560)

What's going on?

55 Estimation of GLIMMs

Pseudo-likelihood
  Turn into a linear mixed model problem.
  Implemented in SAS PROC GLIMMIX.
Maximum likelihood
  Laplace: implemented in HLM6 & GLIMMIX (SAS v9.2)
  Approximate the integral (numerical integration)
    Gaussian quadrature
    Adaptive quadrature
    Implemented in SAS v9.2: PROC NLMIXED & GLIMMIX
Bayesian: WinBugs, R, SAS v9.2 PROC MCMC (experimental)
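To make "approximate the integral" concrete, here is a minimal Python sketch (Python is used only for illustration; the course software is SAS) of non-adaptive Gauss-Hermite quadrature for one cluster's marginal likelihood in a random intercept logistic model. It assumes a fixed 3-point rule, whereas the NLMIXED call in these notes requests 10 points, and the function names and toy data are hypothetical.

```python
import math

# 3-point Gauss-Hermite rule for integrals of the form  ∫ g(x) exp(-x^2) dx.
GH_NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
GH_WEIGHTS = (math.sqrt(math.pi) / 6,
              2 * math.sqrt(math.pi) / 3,
              math.sqrt(math.pi) / 6)


def inv_logit(eta):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def normal_expectation(f, sd):
    """Approximate E[f(u)] for u ~ N(0, sd^2) by Gauss-Hermite quadrature."""
    total = sum(w * f(math.sqrt(2.0) * sd * x)
                for x, w in zip(GH_NODES, GH_WEIGHTS))
    return total / math.sqrt(math.pi)


def cluster_marginal_likelihood(y, x, gamma0, gamma1, tau):
    """Marginal likelihood of one cluster's binary responses.

    The conditional likelihood given the random intercept u is a product of
    Bernoulli terms; the random intercept is then integrated out numerically.
    """
    def conditional(u):
        like = 1.0
        for yi, xi in zip(y, x):
            p = inv_logit(gamma0 + gamma1 * xi + u)
            like *= p if yi == 1 else (1.0 - p)
        return like

    return normal_expectation(conditional, tau)


# Hypothetical cluster: three binary responses with ages 10, 20, 30,
# evaluated at the starting values used in the NLMIXED code.
lik = cluster_marginal_likelihood([0, 0, 1], [10, 20, 30], -2.3, 0.02, 0.8)
print(lik)
```

Maximum likelihood would multiply such contributions over clusters and maximize over the parameters; adaptive quadrature additionally recenters and rescales the nodes for each cluster, which this sketch does not attempt.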

56 Comparison of PLE and MLE

PLE: approx. integrand
  Fits a wider range of models (e.g., 3+ levels, more than a random intercept)
  Estimation approximates the integrand (pseudo-likelihood)
  Parameter estimates are downward biased
  Estimation can be very poor for small n per macro-unit
  Faster
  Easier to use (like PROC MIXED)
  No LR tests

MLE: approx. integral
  Narrower range of models (only 2 levels for QUAD, more with Laplace)
  Estimation uses numerical integration (Gaussian or adaptive quadrature)
  Parameter estimates aren't biased
  Estimation can be fine for small n
  Slower
  Harder to use
  This is maximum likelihood estimation

57 Cool Kid Example: The empty/null model

A good starting point for the cool kid data:

Level 1: $\text{ideal}_{ij} = y_{ij} \sim \text{Binomial}(\pi_{ij}, n_{ij})$ and
$$\ln\left(\frac{\pi_{ij}}{1-\pi_{ij}}\right) = \eta_{ij} = \beta_{0j}$$

Level 2: $\beta_{0j} = \gamma_{00} + u_{0j}$, where $u_{0j} \sim N(0, \tau_{00})$ i.i.d.

Linear Mixed Predictor:
$$\ln\left(\frac{\pi_{ij}}{1-\pi_{ij}}\right) = \gamma_{00} + u_{0j}$$

Useful information from this model:
  An estimate of the classroom-specific odds (& probability) of choosing an ideal student.
  The amount of between-classroom variability in the odds (& probability).

58 Results and Interpretation

From adaptive quadrature, $\hat\gamma_{00}$ = (s.e. = ) and $\hat\tau_{00}$ = (s.e. = )

Interpretation:
  Based on our model, the odds that a student in classroom $j$ nominates an ideal student equal $\exp[\gamma_{00} + u_{0j}]$.
  For a classroom with $u_{0j} = 0$, the estimated odds of nominating an ideal student equal $\exp[\hat\gamma_{00}] = .64$.
  The 95% confidence interval for the classroom-specific odds is $(\exp[\hat\gamma_{00} - 1.96(\text{s.e.})],\ \exp[\hat\gamma_{00} + 1.96(\text{s.e.})]) \rightarrow (.63, .65)$.
  The interval covering 95% of the estimated variability in odds over classrooms is $(\exp[\hat\gamma_{00} - 1.96\sqrt{\hat\tau_{00}}],\ \exp[\hat\gamma_{00} + 1.96\sqrt{\hat\tau_{00}}]) \rightarrow (0.01, 8.93)$.

What does this imply?

59 Probability Estimates

$\hat\gamma_{00}$ = (s.e. = ) and $\hat\tau_{00}$ = (s.e. = )

We can also compute estimated probabilities from the estimated linear predictor by using the inverse of the logit:
$$\pi = \frac{\exp(\eta)}{1+\exp(\eta)}$$

For a classroom with $u_{0j} = 0$, the probability that a student nominates an ideal student is
$$\hat\pi = \frac{\exp(\hat\gamma_{00})}{1+\exp(\hat\gamma_{00})} = .39$$

A 95% confidence interval for this classroom-specific probability is $(\text{logit}^{-1}(.63),\ \text{logit}^{-1}(.65)) \rightarrow (.28, .52)$.

95% of the classrooms have probabilities ranging from .01 to .90.
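As a quick check of the odds-to-probability step, the following Python sketch (illustrative only, with hypothetical function names) converts the estimated odds of .64 for a classroom with $u_{0j} = 0$ into a probability in two equivalent ways: directly from the odds, and through the inverse logit of the log odds.

```python
import math


def odds_to_probability(odds):
    """Convert odds pi/(1 - pi) back to the probability pi."""
    return odds / (1.0 + odds)


def inv_logit(eta):
    """Inverse logit, written as on the slide: exp(eta) / (1 + exp(eta))."""
    return math.exp(eta) / (1.0 + math.exp(eta))


odds = 0.64                                   # estimated odds for u_0j = 0
p_from_odds = odds_to_probability(odds)       # direct conversion
p_from_logit = inv_logit(math.log(odds))      # same value via the log odds
print(round(p_from_odds, 2), round(p_from_logit, 2))
```

Both routes give a probability that rounds to .39, the value reported for the average classroom.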

60 Intraclass Correlations

For the empty/null random intercept model there are at least two ways to define an intraclass correlation. The following definition extends to the residual intraclass correlation case:
$$\text{ICC} = \frac{\tau_{00}}{\tau_{00} + \pi^2/3}$$
where $\pi = 3.14159\ldots$ (the mathematical constant, not a probability).

For the empty model,
$$\text{ICC} = \frac{\hat\tau_{00}}{\hat\tau_{00} + \pi^2/3} = .48$$

Lots of variability between classrooms.
Lots of dependency within classrooms.

61 Random Intercept Model

Level 1: $y_{ij} \sim \text{Binomial}(\pi_{ij}, n_{ij})$ where $\text{logit}(\pi_{ij}) = \eta_{ij} = \beta_{0j} + \beta_{1j} x_{ij}$

Level 2:
  $\beta_{0j} = \gamma_{00} + u_{0j}$, where $u_{0j} \sim N(0, \tau_{00})$
  $\beta_{1j} = \gamma_{10}$

For interpretation:
$$\frac{(\pi_{ij} \mid u_{0j})}{1 - (\pi_{ij} \mid u_{0j})} = \exp\left[\gamma_{00} + \gamma_{10} x_{ij} + u_{0j}\right]$$

The intercept:
  When $x_{ij} = 0$, the odds in cluster $j$ equal $\exp(\gamma_{00} + u_{0j})$.
  When $x_{ij} = 0$, the odds within an average cluster (i.e., $u_{0j} = 0$) equal $\exp(\gamma_{00})$.

The slope: The odds ratio within a cluster for a 1 unit change in $x_{ij}$ equals
$$\frac{\exp(\gamma_{00})\exp(\gamma_{10}(x_{ij}+1))\exp(u_{0j})}{\exp(\gamma_{00})\exp(\gamma_{10}x_{ij})\exp(u_{0j})} = \exp(\gamma_{10})$$

62 Example of Random Intercept Model

In the last set of notes, we fit a random intercept model to the cool kid data set with only Level 1 predictors. The estimated model is
$$\frac{\hat\pi_{ij}}{1-\hat\pi_{ij}} = \exp\left[\hat\gamma_{00} + \hat\gamma_{10}\,\text{Popularity}_{ij} + \hat\gamma_{20}\,\text{Gender}_{ij} + \hat\gamma_{30}\,\text{Race}_{ij}\right]$$

Holding other predictors constant,
  Popularity: WITHIN A CLUSTER, the odds that a highly popular student nominates an ideal student are exp(0.1080) = 1.11 times the odds for a low popular student.
  Gender: WITHIN A CLUSTER, the odds that a girl nominates an ideal student are exp(0.6486) = 1.91 times the odds for a boy.
  Race: WITHIN A CLUSTER, the odds that a white student nominates an ideal student are exp(1.3096) = 3.70 times the odds for a black student.

63 Probabilities within a Cluster

[Figure: probabilities within a cluster]

64 Random Intercept with Predictors

Level 1: $y_{ij} \sim \text{Binomial}(\pi_{ij}, n_{ij})$ where
$$\frac{\pi_{ij}}{1-\pi_{ij}} = \exp\left[\beta_{0j} + \beta_{1j}\,\text{Popularity}_{ij} + \beta_{2j}\,\text{Gender}_{ij} + \beta_{3j}\,\text{Race}_{ij}\right]$$

Level 2:
  $\beta_{0j} = \gamma_{00} + \gamma_{01}\,\text{ClassAggress}_j + u_{0j}$
  $\beta_{1j} = \gamma_{10}$
  $\beta_{2j} = \gamma_{20}$
  $\beta_{3j} = \gamma_{30}$
where $u_{0j} \sim N(0, \tau_{00})$.

For interpretation:
$$\frac{(\pi_{ij} \mid u_{0j})}{1 - (\pi_{ij} \mid u_{0j})} = \exp\left[\gamma_{00} + \gamma_{10}\,\text{Popularity}_{ij} + \gamma_{20}\,\text{Gender}_{ij} + \gamma_{30}\,\text{Race}_{ij} + \gamma_{01}\,\text{ClassAggress}_j + u_{0j}\right]$$

65 Results

Covariance Parameter Estimates

    Cov Parm     Subject    Estimate    Standard Error
    Intercept    CLASSID

Solutions for Fixed Effects

    Effect       Estimate    Standard Error    DF    t Value    Pr > |t|
    Intercept
    DPOP
    SEX
    black
    class_agg

66 Results and Interpretation

$$\frac{\hat\pi_{ij}}{1-\hat\pi_{ij}} = \exp\left[\hat\gamma_{00} + \hat\gamma_{10}\,\text{Popularity}_{ij} + \hat\gamma_{20}\,\text{Gender}_{ij} + \hat\gamma_{30}\,\text{Race}_{ij} + \hat\gamma_{01}\,\text{ClassAggress}_j\right]$$

Classroom aggression helps to explain the differences between cluster intercepts.

For a student in class $j$ and a student in class $k$ with the same popularity, gender and race, the odds of the student in class $j$ choosing an ideal student are
$$\frac{\exp[\hat\gamma_{01}\,\text{ClassAggress}_j + u_{0j}]}{\exp[\hat\gamma_{01}\,\text{ClassAggress}_k + u_{0k}]} = \exp\left[\hat\gamma_{01}(\text{ClassAggress}_j - \text{ClassAggress}_k) + (u_{0j} - u_{0k})\right]$$
times those of the student in class $k$.

So the systematic differences between classrooms can be explained in part by mean classroom aggression: the lower the classroom aggression, the greater the tendency for ideal students to be nominated as cool.

67 Results and Interpretation

For students with the same popularity, gender and race, from two different classrooms where $u_{0j} = u_{0k}$ but the classrooms differ by one unit of classroom aggression, the odds ratio of nominating an ideal student equals $\exp[\hat\gamma_{01}] = .52$.

Interpretations of Popularity, Gender and Race are basically the same, but for the sake of completeness:
  Popularity: Within a classroom and holding other variables constant, the odds that a highly popular student nominates an ideal student are exp(0.0722) = 1.07 times the odds for a low popular student.
  Gender: Within a classroom and holding other variables constant, the odds that a girl nominates an ideal student are exp(0.6366) = 1.89 times the odds for a boy.
  Race: Within a classroom and holding other variables constant, the odds that a white student nominates an ideal student are exp(1.0709) = 2.92 times the odds for a black student.
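The within-classroom odds ratios quoted above are just exponentiated coefficients. A small Python sketch (illustrative only; the dictionary name is hypothetical, the three coefficients are the ones quoted on this slide) reproduces them:

```python
import math

# Coefficients on the logit scale, as quoted in the interpretations above.
estimates = {"Popularity": 0.0722, "Gender": 0.6366, "Race": 1.0709}

# Exponentiating a logit-scale coefficient gives the within-cluster odds ratio
# for a one-unit change in that predictor, holding the others constant.
odds_ratios = {name: math.exp(b) for name, b in estimates.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: {ratio:.2f}")
```

Rounded to two decimals these are 1.07, 1.89 and 2.92, matching the values on the slide.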

68 Residual Intraclass Correlation

We can use our estimate of $\tau_{00}$ to see what the intraclass correlation now equals, given that we have both Level 1 and Level 2 predictors in the model, using
$$\text{ICC} = \frac{\tau_{00}}{\tau_{00} + \pi^2/3}$$
where $\pi = 3.14159\ldots$

For the three random intercept models we have fit so far:

    Model                               τ̂00    ICC
    Null/Empty
    Popularity + Gender + Minority
    + Class Aggression

Note: $\hat\tau_{00}$ doesn't drop too much when we add Level 1 variables. What does this imply?
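The ICC formula is easy to evaluate directly. The Python sketch below (illustrative only; the function name is hypothetical) applies it to the two intercept variances that are legible in the Results and Comparisons table of these notes, 2.31 for the model with Level 1 predictors and 1.99 after adding class aggression. The resulting ICCs are computed here from those table entries and are not copied from the slide.

```python
import math


def latent_icc(tau00):
    """Intraclass correlation for a random intercept logistic model.

    Uses the latent-variable definition: the Level 1 residual variance of
    the standard logistic distribution is pi^2 / 3.
    """
    return tau00 / (tau00 + math.pi ** 2 / 3.0)


# Intercept variances taken from the Results and Comparisons table.
models = [("Popularity + Gender + Minority", 2.31),
          ("+ Class Aggression", 1.99)]

for label, tau00 in models:
    print(f"{label}: tau00 = {tau00:.2f}, ICC = {latent_icc(tau00):.2f}")
```

Under these inputs the residual ICC stays close to .4, which is in line with the note that $\hat\tau_{00}$ does not drop much when Level 1 variables are added.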

69
Level 1: $y_{ij} \sim \text{Binomial}(\pi_{ij}, n_{ij})$ where
$$\frac{\pi_{ij}}{1-\pi_{ij}} = \exp\left[\beta_{0j} + \beta_{1j}\,\text{Popularity}_{ij} + \beta_{2j}\,\text{Gender}_{ij} + \beta_{3j}\,\text{Race}_{ij}\right]$$

Level 2:
  $\beta_{0j} = \gamma_{00} + \gamma_{01}\,\text{ClassAggress}_j + u_{0j}$
  $\beta_{1j} = \gamma_{10}$
  $\beta_{2j} = \gamma_{20} + u_{2j}$
  $\beta_{3j} = \gamma_{30}$

To help interpretation:
$$\frac{(\pi_{ij} \mid u_{0j}, u_{2j})}{1 - (\pi_{ij} \mid u_{0j}, u_{2j})} = \exp\left[\gamma_{00} + \gamma_{10}\,\text{Popularity}_{ij} + \gamma_{20}\,\text{Gender}_{ij} + \gamma_{30}\,\text{Race}_{ij} + \gamma_{01}\,\text{ClassAggress}_j + u_{0j} + u_{2j}\,\text{Gender}_{ij}\right]$$

70 Results and Comparisons

                  Empty           Model 2         Model 3         Model 4
    Effects       Est.   s.e.     Est.   s.e.     Est.   s.e.     Est.   s.e.
    Intercept     0.44  (0.26)    0.31  (0.29)    0.24  (0.28)    0.26  (0.36)
    Popularity                    0.10  (0.21)    0.07  (0.21)    0.10  (0.24)
    Gender                        0.65  (0.21)    0.64  (0.21)    0.48  (0.42)
    Race                          1.30  (0.33)    1.07  (0.33)    1.14  (0.36)
    ClassAgg                                      0.64  (0.24)    0.71  (0.27)
    τ00                 (0.85)    2.31  (0.73)    1.99  (0.64)    3.74  (1.41)
    τ (cov.)                                                            (1.42)
    τ11 gender                                                    4.87  (1.96)
    # param
    −lnLike
    AIC
    BIC

71 Some Model Refinements

Popularity is clearly not significant, with t = 0.10/0.24 ≈ 0.42.

Gender is no longer significant, with t = 0.48/0.42 ≈ 1.14. Should we drop gender?

Test $H_o: \tau_{11} = \tau_{01} = 0$ versus $H_a$: not $H_o$.

Use the same method as we did for HLM: compute LR and compare it to a mixture of chi-square distributions.

    Model    −2lnLike    LR    p-value χ²(2)    p-value χ²(1)    p-value for test
    Null
    H_a      tiny  <.01

Drop Popularity from the model but keep the random Gender effect.
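The mixture comparison can be sketched in a few lines of Python (illustrative only). The sketch assumes the usual 50:50 mixture of $\chi^2_1$ and $\chi^2_2$ for adding one random slope (one variance plus one covariance with the intercept); the function names and the LR value of 10.0 are hypothetical, since the statistic itself is not legible on the slide.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square with 2 df."""
    return math.exp(-x / 2.0)


def mixture_p_value(lr):
    """p-value from an equal mixture of chi-square(1) and chi-square(2)."""
    return 0.5 * chi2_sf_df1(lr) + 0.5 * chi2_sf_df2(lr)


lr = 10.0  # hypothetical likelihood ratio statistic, for illustration only
print(chi2_sf_df2(lr), chi2_sf_df1(lr), mixture_p_value(lr))
```

The mixture p-value always lies between the two single-distribution p-values, which is why the table reports all three.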

72 Results and Comparisons

                  Model 2         Model 3         Model 4         Refined
    Effects       Est.   s.e.     Est.   s.e.     Est.   s.e.     Est.   s.e.
    Intercept     0.31  (0.29)    0.24  (0.28)    0.26  (0.36)    0.22  (0.35)
    Popularity    0.10  (0.21)    0.07  (0.21)    0.10  (0.24)
    Gender        0.65  (0.21)    0.64  (0.21)    0.48  (0.42)    0.48  (0.41)
    Race          1.30  (0.33)    1.07  (0.33)    1.14  (0.36)    1.14  (0.36)
    ClassAgg                      0.64  (0.24)    0.71  (0.27)    0.69  (0.27)
    τ00           2.31  (0.73)    1.99  (0.64)    3.74  (1.41)    3.63  (1.35)
    τ (cov.)                                            (1.42)    2.73  (1.38)
    τ22 gender                                    4.87  (1.96)    4.76  (1.91)
    # param
    −lnLike
    AIC
    BIC

73 Comments on Results

A likelihood ratio test for popularity, with LR = 0.18 compared to a $\chi^2_1$, has p = .67.

Fixed parameter estimates and their standard errors are the same.

Estimated variances and their standard errors changed a little.

Empirical standard errors are very similar to the model-based ones reported on the previous slide.

Before getting serious about this model, we consider 3 level models because students are nested within peer groups nested within classrooms.
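The p-value for the popularity test can be reproduced with the standard library alone. This Python sketch (illustrative only; the function name is hypothetical) uses the identity that a $\chi^2_1$ variable is the square of a standard normal:

```python
import math


def chi2_df1_p_value(lr):
    """Upper-tail probability of a chi-square with 1 df.

    Since a chi-square(1) is a squared standard normal,
    P(X > lr) = P(|Z| > sqrt(lr)) = erfc(sqrt(lr / 2)).
    """
    return math.erfc(math.sqrt(lr / 2.0))


p = chi2_df1_p_value(0.18)  # LR statistic reported for dropping popularity
print(round(p, 2))
```

This rounds to .67, in agreement with the slide.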

74 Three Level Models

Regardless of the type of response variable (e.g., normal, binomial, etc.), additional levels of nesting can be included.

A very simple example:

Level 1: $y_{ijk} \sim \text{Binomial}(\pi_{ijk}, n_{ijk})$ and $\text{logit}(\pi_{ijk}) = \beta_{0jk}$

Level 2: $\beta_{0jk} = \gamma_{00k} + u_{0jk}$, where $u_{0jk} \sim N(0, \tau_{00})$ i.i.d.

Level 3: $\gamma_{00k} = \xi_{00} + w_{0k}$, where $w_{0k} \sim N(0, \psi)$ i.i.d. and independent of $u_{0jk}$.

Linear Mixed Predictor:
$$\text{logit}(\pi_{ijk}) = \underbrace{\xi_{00}}_{\text{fixed}} + \underbrace{u_{0jk} + w_{0k}}_{\text{random}}$$

75 Adding Predictors

Predictors can be added at every level.
Predictors at lower levels can have random coefficients that are modeled at a higher level.
Can have cross-level interactions.

Predictors for the cool kid data:

Level 1
  Black_ijk = 1 for black student, 0 for white
  Zpop_ijk = standardized popularity score
  Zagg_ijk = standardized aggression score
Level 2
  Gnom_jk = peer group centrality
  Gagg_jk = peer group aggression score
  sex_jk = 1 boy group and 0 girl group
Level 3
  ClassAgg_k = mean class aggression score
  Majority_k = 1 more white and 0 more Black

76 Three Level Random Intercept

To show what happens when we enter variables, I'll do this one set at a time.

Level 1: $y_{ijk} \sim \text{Binomial}(\pi_{ijk}, n_{ijk})$ and
$$\text{logit}(\pi_{ijk}) = \beta_{0jk} + \beta_{1jk}\,\text{Zpop}_{ijk} + \beta_{2jk}\,\text{Zagg}_{ijk}$$

Level 2:
  $\beta_{0jk} = \gamma_{00k} + u_{0jk}$
  $\beta_{1jk} = \gamma_{10k}$
  $\beta_{2jk} = \gamma_{20k}$
where $u_{0jk} \sim N(0, \tau_{00})$ i.i.d.

Level 3:
  $\gamma_{00k} = \xi_{00} + w_{0k}$
  $\gamma_{10k} = \xi_{10}$
  $\gamma_{20k} = \xi_{20}$
where $w_{0k} \sim N(0, \psi)$ i.i.d. and independent of $u_{0jk}$.

What's the Linear Mixed Predictor?
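One way to work out the linear mixed predictor is to substitute the Level 3 equations into Level 2 and the result into Level 1. The sketch below assumes the slope equations are $\beta_{1jk} = \gamma_{10k}$, $\beta_{2jk} = \gamma_{20k}$, $\gamma_{10k} = \xi_{10}$ and $\gamma_{20k} = \xi_{20}$ (fixed slopes, random intercepts at Levels 2 and 3):

```latex
\begin{align*}
\text{logit}(\pi_{ijk})
  &= \beta_{0jk} + \beta_{1jk}\,\text{Zpop}_{ijk} + \beta_{2jk}\,\text{Zagg}_{ijk} \\
  &= (\gamma_{00k} + u_{0jk}) + \gamma_{10k}\,\text{Zpop}_{ijk} + \gamma_{20k}\,\text{Zagg}_{ijk} \\
  &= \underbrace{\xi_{00} + \xi_{10}\,\text{Zpop}_{ijk} + \xi_{20}\,\text{Zagg}_{ijk}}_{\text{fixed}}
     + \underbrace{u_{0jk} + w_{0k}}_{\text{random}}
\end{align*}
```

The random part is unchanged from the model without predictors; only the fixed part gains the two Level 1 slopes.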
