Logistic And Probit Regression


Categorical Outcomes: Logit And Probit Regression

Probability varies as a function of x variables (here $x_1$, $x_2$):

$P(u = 1 \mid x_1, x_2) = F[\beta_0 + \beta_1 x_1 + \beta_2 x_2]$, (22)

$P(u = 0 \mid x_1, x_2) = 1 - P(u = 1 \mid x_1, x_2)$,

where $F[z]$ is either the standard normal ($\Phi[z]$) or logistic ($1/[1 + e^{-z}]$) distribution function.

Example: Lung cancer and smoking among coal miners
  u: lung cancer (u = 1) or not (u = 0)
  x1: smoker (x1 = 1), non-smoker (x1 = 0)
  x2: years spent in coal mine

[Figure: two panels plotted against $x_2$, each with separate lines for $x_1 = 0$ and $x_1 = 1$: $P(u = 1 \mid x_1, x_2)$, and the corresponding probit / logit.]
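As a rough numerical illustration of the model above, the following Python sketch evaluates $P(u = 1 \mid x_1, x_2)$ under both choices of $F$. The coefficient values are hypothetical and chosen only so the arithmetic is easy to follow; they are not estimates from the coal miner example.

```python
import math


def logistic_cdf(z):
    """Logistic distribution function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_u1(x1, x2, b0, b1, b2, link="logit"):
    """P(u = 1 | x1, x2) = F[b0 + b1*x1 + b2*x2] for the chosen link."""
    z = b0 + b1 * x1 + b2 * x2
    return logistic_cdf(z) if link == "logit" else normal_cdf(z)


# Hypothetical coefficients (assumed for illustration only).
B0, B1, B2 = -3.0, 1.0, 0.1

# Smoker (x1 = 1) versus non-smoker (x1 = 0), both with 20 years in the mine.
p_smoker = prob_u1(1, 20, B0, B1, B2)       # linear predictor is 0
p_nonsmoker = prob_u1(0, 20, B0, B1, B2)    # linear predictor is -1
p_smoker_probit = prob_u1(1, 20, B0, B1, B2, link="probit")

print(p_smoker, p_nonsmoker, p_smoker_probit)
```

Both links return 0.5 when the linear predictor is zero; away from zero the same coefficients give different probabilities, which is why logit and probit coefficients are on different scales.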

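Interpreting Logit And Probit Coefficients

  Sign and significance
  Odds and odds ratios
  Probabilities

Logistic Regression And Log Odds

Odds $(u = 1 \mid x) = P(u = 1 \mid x) / P(u = 0 \mid x) = P(u = 1 \mid x) / (1 - P(u = 1 \mid x))$.

The logistic function

$P(u = 1 \mid x) = \dfrac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$

gives a log odds linear in x:

logit $= \log[\text{odds}(u = 1 \mid x)] = \log[P(u = 1 \mid x) / (1 - P(u = 1 \mid x))]$

$= \log\left[\dfrac{1}{1 + e^{-(\beta_0 + \beta_1 x)}} \Big/ \left(1 - \dfrac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}\right)\right]$

$= \log\left[\dfrac{1}{1 + e^{-(\beta_0 + \beta_1 x)}} \cdot \dfrac{1 + e^{-(\beta_0 + \beta_1 x)}}{e^{-(\beta_0 + \beta_1 x)}}\right]$

$= \log\left[e^{(\beta_0 + \beta_1 x)}\right] = \beta_0 + \beta_1 x$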

Logistic Regression And Log Odds (Continued)

logit = log odds = $\beta_0 + \beta_1 x$

  When x changes one unit, the logit (log odds) changes by $\beta_1$ units
  When x changes one unit, the odds are multiplied by $e^{\beta_1}$

Two-Level Logistic Regression

With j denoting cluster,

$\text{logit}_{ij} = \log\left(P(u_{ij} = 1) / P(u_{ij} = 0)\right) = \alpha_j + \beta_j x_{ij}$

where

$\alpha_j = \alpha + u_{0j}$
$\beta_j = \beta + u_{1j}$

A high/low $\alpha_j$ value means a high/low logit (high/low log odds)
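The interpretation of the slope on the odds scale can be checked numerically. The sketch below uses hypothetical values for the intercept, the slope, and two cluster-level intercept deviations; none of them come from the examples in these slides.

```python
import math


def prob_from_logit(logit):
    """Invert the logit: P = 1 / (1 + e^(-logit))."""
    return 1.0 / (1.0 + math.exp(-logit))


def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


# Hypothetical single-level coefficients (assumed for illustration).
B0, B1 = -2.0, 0.7

odds_at_2 = odds(prob_from_logit(B0 + B1 * 2.0))
odds_at_3 = odds(prob_from_logit(B0 + B1 * 3.0))
odds_ratio = odds_at_3 / odds_at_2   # one-unit change in x

# Two-level version: cluster-specific intercepts alpha_j = alpha + u_0j.
ALPHA = -2.0                 # hypothetical mean intercept
U0_HIGH, U0_LOW = 0.9, -0.9  # hypothetical cluster deviations
p_high_cluster = prob_from_logit(ALPHA + U0_HIGH + B1 * 2.0)
p_low_cluster = prob_from_logit(ALPHA + U0_LOW + B1 * 2.0)

print(odds_ratio, math.exp(B1))
print(p_high_cluster, p_low_cluster)
```

The ratio of odds for a one-unit change in x equals exp(B1) regardless of where x starts, while the change in probability depends on the starting point and on the cluster intercept.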

Predicting Juvenile Delinquency From First Grade Aggressive Behavior

  Cohort data from the Johns Hopkins University Preventive Intervention Research Center
  n=,84 students in 4 classrooms, Fall first grade
  Covariates: gender and teacher-rated aggressive behavior

Input For Two-Level Logistic Regression

  TITLE:     Hopkins Cohort 2-level logistic regression
  DATA:      FILE = Cohort_classroom_ALL.DAT;
  VARIABLE:  NAMES = prcid juv99 gender stubf bkrulef harmof bkthinf
                     yellf takepf fightf liesf teasef;
             CLUSTER = classrm;
             USEVAR = juv99 male aggress;
             CATEGORICAL = juv99;
             MISSING = ALL (999);
             WITHIN = male aggress;
  DEFINE:    ! male = 1 when gender = 1, male = 0 when gender = 2
             male = 2 - gender;
             ! aggress = sum of the nine teacher-rated behavior items
             aggress = stubf + bkrulef + harmof + bkthinf + yellf +
                       takepf + fightf + liesf + teasef;

Input For Two-Level Logistic Regression (Continued)

  ANALYSIS:  TYPE = TWOLEVEL MISSING;
             PROCESS = 2;
  MODEL:     %WITHIN%
             juv99 ON male aggress;
             %BETWEEN%
  OUTPUT:    TECH1 TECH8;

Output Excerpts Two-Level Logistic Regression

  MODEL RESULTS
                        Estimates     S.E.    Est./S.E.
  Within Level
   JUV99      ON
      MALE                 .7         .49       7.93
      AGGRESS              .6         .         6.9
  Between Level
   Thresholds
      JUV99$1             2.98        .25       4.562
   Variances
      JUV99               0.807      0.250      3.228

Understanding The Between-Level Intercept Variance

Intra-class correlation

  $\text{ICC} = 0.807 / (\pi^2/3 + 0.807)$

Odds ratios

  Larsen & Merlo (2005). Appropriate assessment of neighborhood effects on individual health: Integrating random and fixed effects in multilevel logistic regression. American Journal of Epidemiology, 161, 81-88.

  Larsen proposes MOR: "Consider two persons with the same covariates, chosen randomly from two different clusters. The MOR is the median odds ratio between the person of higher propensity and the person of lower propensity."

  $\text{MOR} = \exp\left(\sqrt{2\sigma^2}\,\Phi^{-1}(0.75)\right)$

  In the current example, ICC = 0.20, MOR = 2.36

Probabilities

  Compare $\alpha_j$ = 1 SD and $\alpha_k$ = -1 SD from the mean

Two-Level Path Analysis
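The ICC and MOR formulas for the between-level intercept variance can be reproduced in a few lines of Python. This sketch assumes a between-level variance of 0.807, which is the value implied by the reported Est./S.E. of 3.228 with an S.E. of .25 and which reproduces the ICC and MOR quoted on the slide; the function names are illustrative only.

```python
import math
from statistics import NormalDist


def icc_logistic(var_between):
    """ICC for a random-intercept logistic model.

    The level-1 residual variance of the logistic distribution is pi^2 / 3.
    """
    return var_between / (math.pi ** 2 / 3.0 + var_between)


def median_odds_ratio(var_between):
    """MOR = exp( sqrt(2 * sigma^2) * Phi^{-1}(0.75) )."""
    z75 = NormalDist().inv_cdf(0.75)
    return math.exp(math.sqrt(2.0 * var_between) * z75)


# Assumed between-level intercept variance (3.228 * 0.25 = 0.807).
VAR_BETWEEN = 0.807

icc = icc_logistic(VAR_BETWEEN)
mor = median_odds_ratio(VAR_BETWEEN)

print(round(icc, 2), round(mor, 2))
```

An MOR near 2.4 says that, for two students with identical covariates in two randomly chosen classrooms, the median ratio of the higher odds to the lower odds is about 2.4.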

A Path Model With A Binary Outcome And A Mediator With Missing Data

[Path diagrams, panels "Logistic Regression" and "Path Model": covariates female, mothed, homeres, expect, lunch, expel, arrest, droptht7, hisp, black, math7; mediator math; binary outcome hsdrop.]

Two-Level Path Analysis

[Path diagram, panels "Within" and "Between": the Within part contains female, mothed, homeres, expect, lunch, expel, arrest, droptht7, hisp, black, math7, math, hsdrop; the Between part contains math and hsdrop.]

Input For A Two-Level Path Analysis Model With A Categorical Outcome And Missing Data On The Mediating Variable

  TITLE:     a twolevel path analysis with a categorical outcome
             and missing data on the mediating variable
  DATA:      FILE = lsayfull_dropout.dat;
  VARIABLE:  NAMES = female mothed homeres math7 math expel arrest
                     hisp black hsdrop expect lunch droptht7 schcode;
             MISSING = ALL (9999);
             CATEGORICAL = hsdrop;
             CLUSTER = schcode;
             WITHIN = female mothed homeres expect math7 lunch expel
                      arrest droptht7 hisp black;
  ANALYSIS:  TYPE = TWOLEVEL MISSING;
             ESTIMATOR = ML;
             ALGORITHM = INTEGRATION;
             INTEGRATION = MONTECARLO (5);

Input For A Two-Level Path Analysis Model With A Categorical Outcome And Missing Data On The Mediating Variable (Continued)

  MODEL:     %WITHIN%
             hsdrop ON female mothed homeres expect math7 math lunch
                       expel arrest droptht7 hisp black;
             math ON female mothed homeres expect math7 lunch expel
                     arrest droptht7 hisp black;
             %BETWEEN%
             hsdrop*; math*;
  OUTPUT:    PATTERNS SAMPSTAT STANDARDIZED TECH1 TECH8;

Output Excerpts A Two-Level Path Analysis Model With A Categorical Outcome And Missing Data On The Mediating Variable

  Summary Of Data
    Number of patterns     2
    Number of clusters    44

    Size (s):                2 3 36 38 39 4 4 42 43 44 45
    Cluster ID with Size s:  34 35 37 6 38 3 38 46 2 33 4 22 2 9 2 43

Output Excerpts A Two-Level Path Analysis Model With A Categorical Outcome And Missing Data On The Mediating Variable (Continued)

    Size (s) / Cluster ID with Size s:
    46 47 49 5 5 52 53 55 57 44 4 8 26 27 37 42 45 35 24 7 3 23 5 47 8 3 36 58 2 59 9 73 4 89 32 93 8 39 5

Output Excerpts A Two-Level Path Analysis Model With A Categorical Outcome And Missing Data On The Mediating Variable (Continued)

  Model Results
                   Estimates    S.E.    Est./S.E.     Std      StdYX
  Within Level
   HSDROP   ON
      FEMALE          .323      .7        .887        .323      .77
      MOTHED         -.253      .3      -2.457       -.253     -.2
      HOMERES        -.77       .55      -.4         -.77      -.6
      EXPECT         -.244      .65     -3.756       -.244     -.59
      MATH7          -.         .5       -.754       -.        -.55
      MATH           -.3        .       -2.76        -.3       -.97
      LUNCH           .8        .6        .324        .8        .74
      EXPEL           .947      .225     4.2          .947      .2
      ARREST          .68       .32       .22         .68       .7
      DROPTHT7        .757      .284     2.665        .757      .74
      HISP           -.8        .274     -.43        -.8       -.6
      BLACK          -.86       .253     -.34        -.86      -.3

Output Excerpts A Two-Level Path Analysis Model With A Categorical Outcome And Missing Data On The Mediating Variable (Continued)

                   Estimates    S.E.    Est./S.E.     Std      StdYX
   MATH     ON
      FEMALE         -.84       .398    -2.          -.84      -.3
      MOTHED          .263      .25       .222        .263      .2
      HOMERES         .568      .36      4.69         .568      .7
      EXPECT          .985      .62      6.9          .985      .
      MATH7           .94       .23      4.23         .94       .697
      LUNCH          -.39       .7      -2.38        -.39      -.59
      EXPEL          -.293      .825     -.567       -.293     -.26
      ARREST        -3.426      .22     -3.353      -3.426     -.54
      DROPTHT7       -.424      .49      -.358       -.424     -.22
      HISP           -.5        .728     -.689       -.5       -.
      BLACK          -.369      .733     -.53        -.369     -.9

Output Excerpts A Two-Level Path Analysis Model With A Categorical Outcome And Missing Data On The Mediating Variable (Continued)

                   Estimates    S.E.    Est./S.E.     Std      StdYX
   Residual Variances
      MATH           62.        2.62    28.683       62.        .34
  Between Level
   Means
      MATH            .226      .34      7.632        .226     5.276
   Thresholds
      HSDROP$1       -.76       .56      -.92
   Variances
      HSDROP          .286      .33      2.5          .286      .
      MATH           3.757      .248     3.          3.757      .

Two-Level Mediation

[Path diagram: x to m with random slope $a_j$, m to y with random slope $b_j$, x to y with random slope $c_j$.]

Indirect effect: $\alpha\beta + \text{Cov}(a_j, b_j)$, where $\alpha$ and $\beta$ are the means of $a_j$ and $b_j$

Bauer, Preacher & Gil (2006). Conceptualizing and testing random indirect effects and moderated mediation in multilevel models: New procedures and recommendations. Psychological Methods, 11, 142-163.

Input For Two-Level Mediation

  MONTECARLO:
             NAMES ARE y m x;
             WITHIN = x;
             NOBSERVATIONS = ;
             NCSIZES = ;
             CSIZES = ();
             NREP = ;
  MODEL POPULATION:
             %WITHIN%
             c | y ON x;
             b | y ON m;
             a | m ON x;
             x*; m*; y*;
             %BETWEEN%
             y WITH m*. b*. a*. c*.;
             m WITH b*. a*. c*.;
             a WITH b*. c*.;
             b WITH c*.;
             y* m* a* b* c*;
             [a*.4 b*.5 c*.6];

Input For Two-Level Mediation (Continued)

  ANALYSIS:  TYPE = TWOLEVEL RANDOM;
  MODEL:     %WITHIN%
             c | y ON x;
             b | y ON m;
             a | m ON x;
             m*; y*;
             %BETWEEN%
             y WITH m*. b*. a*. c*.;
             m WITH b*. a*. c*.;
             a WITH b*. (cab);
             a WITH c*.;
             b WITH c*.;
             y* m* a* b* c*;
             [a*.4] (ma);
             [b*.5] (mb);
             [c*.6];
  MODEL CONSTRAINT:
             NEW(m*.3);
             m=ma*mb+cab;
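The MODEL CONSTRAINT statement above defines the new parameter as ma*mb+cab, that is, the product of the mean slopes plus their between-level covariance. The Python sketch below checks by simulation that this equals the average of the cluster-specific products a_j * b_j. The means 0.4 and 0.5 are taken from the input; the covariance of 0.1 is an assumption inferred from the starting value of .3 for the new parameter (0.4 * 0.5 + 0.1 = 0.3), and the slope variances of 1.0 are purely hypothetical.

```python
import math
import random


def random_indirect_effect(mean_a, mean_b, cov_ab):
    """Average indirect effect with random slopes: mean_a * mean_b + Cov(a_j, b_j)."""
    return mean_a * mean_b + cov_ab


def simulated_mean_product(mean_a, mean_b, var_a, var_b, cov_ab, n_clusters, seed):
    """Average of a_j * b_j over simulated clusters with bivariate normal slopes."""
    rng = random.Random(seed)
    # Cholesky factor of the 2 x 2 slope covariance matrix.
    l11 = math.sqrt(var_a)
    l21 = cov_ab / l11
    l22 = math.sqrt(var_b - l21 ** 2)
    total = 0.0
    for _ in range(n_clusters):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        a_j = mean_a + l11 * z1
        b_j = mean_b + l21 * z1 + l22 * z2
        total += a_j * b_j
    return total / n_clusters


# Means from the input; covariance and variances are assumptions (see lead-in).
analytic = random_indirect_effect(0.4, 0.5, 0.1)
simulated = simulated_mean_product(0.4, 0.5, 1.0, 1.0, 0.1, n_clusters=200_000, seed=1)

print(analytic, simulated)
```

Ignoring the covariance term and reporting only the product of the mean slopes would understate the average indirect effect whenever the two slopes covary positively across clusters.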

Output Excerpts Two-Level Mediation

                           Estimates                  S.E.     M. S. E.    95%     % Sig
               Population   Average   Std.Dev.      Average               Cover    Coeff
  Within Level
   Residual variances
      Y           .          .2        .53           .53        .28        .96      .
      M           .          .         .538          .496       .29        .9       .
  Between Level
   Y      WITH
      B           .          .22       .246          .4         .58        .9       .2
      A           .          .86       .38           .62        .73        .9       .9
      C           .          .868      .2            .237       .26        .94      .9
   M      WITH
      B           .          .33       .29           .85        .5         .94      .2
      A           .          .85       .8            .6         .9         .95      .7
      C           .          .38       .47           .65        .32        .97      .6
   A      WITH
      B           .          .964      .74           .          .37        .92      .5
      C           .          .756      .376          .32        .93        .9       .

Output Excerpts Two-Level Mediation (Continued)

                           Estimates                  S.E.     M. S. E.    95%     % Sig
               Population   Average   Std.Dev.      Average               Cover    Coeff
   B      WITH
      C           .          .892      .56           .56        .2         .96      .7
   Y      WITH
      M           .          .34       .342          .285       .78        .94      .4
   Means
      Y           .          .7        .5            .3         .32        .95      .5
      M           .         -.3        .2            .56        .2         .95      .5
      C           .6         .5979     .229          .25        .5         .93      .
      B           .5         .522      .279          .6         .62        .89      .
      A           .4         .3854     .972          .72        .96        .97      .97
   Variances
      Y           .          .7        .68           .689       .28        .9       .
      M           .          .3        .782          .57        .36        .93      .
      C           .          .982      .43           .78        .2         .98      .
      B           .          .9768     .443          .545       .22        .95      .
      A           .          .88       .54           .587       .239       .95      .
   New/Additional Parameters
      M           .3         .294      .422          .36        .2         .95      .55

Two-Level Factor Analysis

Recall random effects ANOVA (individual i in cluster j):

$y_{ij} = \nu + \eta_j + \varepsilon_{ij} = y_{B_j} + y_{W_{ij}}$

Two-level factor analysis (r = 1, 2, …, p items):

$y_{rij} = \nu_r + \lambda_{B_r} \eta_{B_j} + \varepsilon_{B_{rj}}$   (between-cluster variation)
$\qquad\quad + \lambda_{W_r} \eta_{W_{ij}} + \varepsilon_{W_{rij}}$   (within-cluster variation)

Two-Level Factor Analysis (Continued)

Covariance structure:

$V(y) = V(y_B) + V(y_W) = \Sigma_B + \Sigma_W$,
$\Sigma_B = \Lambda_B \Psi_B \Lambda_B' + \Theta_B$,
$\Sigma_W = \Lambda_W \Psi_W \Lambda_W' + \Theta_W$.

Two interpretations:
  variance decomposition, including decomposing the residual
  random intercept model

Two-Level Factor Analysis And Design Effects

Muthén & Satorra (1995; Sociological Methodology): Monte Carlo study using two-level data (200 clusters of varying size and varying intraclass correlations), a latent variable model with 10 variables, 2 factors, conventional ML using the regular sample covariance matrix $S_T$, and 1,000 replications (d.f. = 34).

$\Lambda_B = \Lambda_W$; $\Psi_B$, $\Theta_B$ reflecting different ICCs

$y_{ij} = \nu + \Lambda(\eta_{B_j} + \eta_{W_{ij}}) + \varepsilon_{B_j} + \varepsilon_{W_{ij}}$

$V(y) = \Sigma_B + \Sigma_W = \Lambda(\Psi_B + \Psi_W)\Lambda' + \Theta_B + \Theta_W$
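The covariance structure can be made concrete with a small Python sketch. It uses a one-factor model with three items and equal between and within loadings; all parameter values are hypothetical and chosen only to show that the between and within matrices add up to the aggregated form when the loadings are shared.

```python
import math


def factor_cov(loadings, factor_var, resid_vars):
    """Sigma = Lambda * Psi * Lambda' + Theta for one factor, diagonal Theta."""
    p = len(loadings)
    return [
        [
            loadings[r] * factor_var * loadings[c] + (resid_vars[r] if r == c else 0.0)
            for c in range(p)
        ]
        for r in range(p)
    ]


def add_matrices(m1, m2):
    """Elementwise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(row1, row2)] for row1, row2 in zip(m1, m2)]


# Hypothetical parameter values (assumed for illustration only).
LAMBDA = [1.0, 0.8, 0.6]          # same loadings on both levels
PSI_B, PSI_W = 0.2, 0.8           # between and within factor variances
THETA_B = [0.05, 0.05, 0.05]      # between residual variances
THETA_W = [0.5, 0.5, 0.5]         # within residual variances

sigma_b = factor_cov(LAMBDA, PSI_B, THETA_B)
sigma_w = factor_cov(LAMBDA, PSI_W, THETA_W)
total = add_matrices(sigma_b, sigma_w)

# Aggregated form: Lambda (Psi_B + Psi_W) Lambda' + Theta_B + Theta_W.
aggregated = factor_cov(LAMBDA, PSI_B + PSI_W, [b + w for b, w in zip(THETA_B, THETA_W)])

# Item-level intraclass correlations: between variance over total variance.
item_icc = [sigma_b[r][r] / total[r][r] for r in range(len(LAMBDA))]

print(total)
print(item_icc)
```

When the loadings differ across levels the aggregated form no longer holds, which is the situation in which only a two-level analysis gives the correct structure.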

Two-Level Factor Analysis And Design Effects (Continued)

Inflation of $\chi^2$ due to clustering

  Intraclass                              Cluster Size
  Correlation                        7       15       30       60
  0.05     Chi-square mean          35       36       38       41
           Chi-square var           68       72       80       96
           5%                      5.6      7.6     10.6     20.4
           1%                      1.4      1.6      2.8      7.7
  0.10     Chi-square mean          36       40       46       58
           Chi-square var           75       89      117      189
           5%                      8.5     16.0     37.6     73.6
           1%                      1.0      5.2     17.6     52.1
  0.20     Chi-square mean          42       52       73      114
           Chi-square var          100      152      302      734
           5%                     23.5     57.7     93.1     99.9
           1%                      8.6     35.0     83.1     99.4

Two-Level Factor Analysis And Design Effects (Continued)

  Regular analysis, ignoring clustering
    Inflated chi-square, underestimated SEs
  TYPE = COMPLEX
    Correct chi-square and SEs but only if model aggregates, e.g. $\Lambda_B = \Lambda_W$
  TYPE = TWOLEVEL
    Correct chi-square and SEs

Two-Level Factor Analysis (IRT)

[Path diagram, panels "Within" and "Between": factor fw measured by u1-u4 on the within level; factor fb measured by u1-u4 on the between level.]

$u^*_{ij} = \lambda (f_{B_j} + f_{W_{ij}}) + \varepsilon_{ij}$

Input For A Two-Level Factor Analysis (IRT) Model With Categorical Outcomes

  TITLE:     this is an example of a two-level factor analysis
             model with categorical outcomes
  DATA:      FILE = catrep.dat;
  VARIABLE:  NAMES ARE u1-u6 clus;
             CATEGORICAL = u1-u6;
             CLUSTER = clus;
  ANALYSIS:  TYPE = TWOLEVEL;
             ESTIMATOR = ML;
             ALGORITHM = INTEGRATION;
  MODEL:     %WITHIN%
             fw BY u1@1
                   u2 (1)
                   u3 (2)
                   u4 (3)
                   u5 (4)
                   u6 (5);

Input For A Two-Level Factor Analysis (IRT) Model With Categorical Outcomes (Continued)

             %BETWEEN%
             fb BY u1@1
                   u2 (1)
                   u3 (2)
                   u4 (3)
                   u5 (4)
                   u6 (5);
  OUTPUT:    TECH1 TECH8;

Output Excerpts A Two-Level Factor Analysis (IRT) Model With Categorical Outcomes

  Tests Of Model Fit
    Loglikelihood
      H0 Value                          -3696.7
    Information Criteria
      Number of Free Parameters         3
      Akaike (AIC)                      748.235
      Bayesian (BIC)                    748.55
      Sample-Size Adjusted BIC          744.27
        (n* = (n + 2) / 24)

Output Excerpts A Two-Level Factor Analysis (IRT) Model With Categorical Outcomes (Continued)

  Model Results
                   Estimates    S.E.    Est./S.E.
  Within Level
   FW       BY
      U1              .          .        .
      U2              .95        .46     6.264
      U3              .87        .69     6.437
      U4              .58        .64     6.44
      U5              .9         .85     6.449
      U6              .43        .78     6.439
   Variances
      FW              .834       .9      4.36

Output Excerpts Two-Level Factor Analysis (IRT) Model With Categorical Outcomes (Continued)

                   Estimates    S.E.    Est./S.E.
  Between Level
   FB       BY
      U1              .          .        .
      U2              .95        .46     6.264
      U3              .87        .69     6.437
      U4              .58        .64     6.44
      U5              .9         .85     6.449
      U6              .43        .78     6.439
   Thresholds
      U1$1           -.26        .96    -2.5
      U2$1            .          .9       .7
      U3$1           -.6         .       -.56
      U4$1           -.64        .98     -.652
      U5$1           -.33        .5      -.35
      U6$1           -.2         .2      -.29
   Variances
      FB              .496       .39     3.562

SIMS Variance Decomposition

The Second International Mathematics Study (SIMS; Muthén, 1991, JEM).

  National probability sample of school districts selected proportional to size; a probability sample of schools selected proportional to size within school district, and two classes randomly drawn within each school
  3,724 students observed in 197 classes from 113 schools with class sizes varying from 2 to 38; typical class size of around 20
  Eight variables corresponding to various areas of eighth-grade mathematics
  Same set of items administered as a pretest in the Fall of eighth grade and as a posttest in the Spring

SIMS Variance Decomposition (Continued)

Muthén (1991). Multilevel factor analysis of class and student achievement components. Journal of Educational Measurement, 28, 338-354.

Research questions: The substantive questions of interest in this article are the variance decomposition of the subscores with respect to within-class student variation and between-class variation and the change of this decomposition from pretest to posttest. In the SIMS such variance decomposition relates to the effects of tracking and differential curricula in eighth-grade math.

On the one hand, one may hypothesize that effects of selection and instruction tend to increase between-class variation relative to within-class variation, assuming that the classes are homogeneous, have different performance levels to begin with, and show faster growth for higher initial performance level.

On the other hand, one may hypothesize that eighth-grade exposure to new topics will increase individual differences among students within each class so that posttest within-class variation will be sizable relative to posttest between-class variation.

SIMS Variance Decomposition (Continued)

$y_{rij} = \nu_r + \lambda_{B_r} \eta_{B_j} + \varepsilon_{B_{rj}} + \lambda_{W_r} \eta_{W_{ij}} + \varepsilon_{W_{rij}}$

$V(y_{rij}) = \text{BF} + \text{BE} + \text{WF} + \text{WE}$

Between reliability: BF / (BF + BE)
  BE often small (can be fixed at 0)

Within reliability: WF / (WF + WE)
  sum of a small number of items gives a large WE

Intraclass correlation: ICC = (BF + BE) / (BF + BE + WF + WE)
  Large measurement error → large WE → small ICC

True ICC = BF / (BF + WF)

[Path diagram, panels "Between" and "Within": between-level factors fb_pre and fb_post and within-level factors fw_pre and fw_post, each measured by rpp, fract, eqexp, intnum, testi, aeravol, coorvis, and pfigure at pretest (_pre) and posttest (_post).]
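The effect of within-level measurement error on the observed ICC can be illustrated with a short Python sketch of the four-part decomposition above. The variance components below are hypothetical values, not SIMS estimates; the two scenarios differ only in the size of WE.

```python
import math


def decompose(bf, be, wf, we):
    """Reliabilities and intraclass correlations from the four variance parts.

    bf, be: between-level factor and error variance
    wf, we: within-level factor and error variance
    """
    return {
        "between_reliability": bf / (bf + be),
        "within_reliability": wf / (wf + we),
        "icc": (bf + be) / (bf + be + wf + we),
        "true_icc": bf / (bf + wf),
    }


# Hypothetical variance components (assumed for illustration only).
small_error = decompose(bf=0.3, be=0.0, wf=0.7, we=0.5)
large_error = decompose(bf=0.3, be=0.0, wf=0.7, we=2.0)

print(small_error)
print(large_error)
```

In both scenarios the true ICC stays at 0.3, while the observed ICC drops from 0.2 to 0.1 as WE grows, which is the attenuation the slide describes.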