Spring RMC Professional Development Series, January 14, 2016. Generalized Linear Mixed Models (GLMMs): Concepts and some Demonstrations


Spring RMC Professional Development Series, January 14, 2016
Generalized Linear Mixed Models (GLMMs): Concepts and some Demonstrations
Ann A. O'Connell, Ed.D.
Professor, Educational Studies (QREM); Director, Research Methodology Center
College of Education and Human Ecology, OSU

What are GLMs?
Generalized linear models refer to an approach used to model response variables that are:
- Discrete (dichotomous, ordinal, nominal) [logistic or logit models]
- Limited within a range (proportions or rates, time to events) [binomial, hazard, or survival models]
- Counts of events [Poisson or negative binomial models]
When the (conditional) distribution is normal, we end up with our familiar linear model (for example, ANOVA or regression analysis); this special case is called the general linear model.

Clustered Data
- Designs in education and the social or behavioral sciences often rely on clusters (classrooms, schools, hospitals, neighborhoods) from which to collect data.
- These hierarchies bring an additional level of complexity to studies analyzing discrete or limited outcomes.

Why worry about clustering?
- Research has consistently demonstrated that people within the same setting or context tend to be more similar to each other than they are to people in a different group or context.
- This violates the assumption of independence in the data.
- Repeated observations share this same phenomenon (correlated observations).
- Statistically, the standard errors for parameter estimates in a model that ignores clustering are biased downwards, often severely.
- Standard errors that are too small make nearly every effect look statistically significant!

HLMs versus HGLMs (1)
What are HLMs?
- HLMs are used to fit models to hierarchical data with normally distributed errors at level one.
- Linear mixed model: fixed and random effects.
- Hierarchical models capture the structure of data obtained from many naturalistic settings.
- They allow us to model data that are not independent (i.e., we can model the covariance structure): clustering, or repeated measures.
- HLM is a special case of the HGLM.

HLMs versus HGLMs (2)
What are HGLMs?
- Hierarchical generalized linear models are used to fit models to hierarchical data where the errors at level one are not (necessarily) normally distributed.
- Mixed model: includes both fixed and random effects.
- Multilevel logistic, multilevel Poisson, etc.
- Generalized linear mixed models: HGLM is the general case and GLMM is a special case.
- In a GLMM, the higher-order random effects are assumed to be normally distributed.

Examples
Outcomes for which a GLMM would be appropriate:
- Data on whether or not an adolescent drops out of school, across multiple schools
- Data on the proportion of school-aged children attending public schools, across multiple neighborhoods
- Counts of the number and type of suspensions in a school in the past year
- Student proficiency in reading or math (below basic, basic, proficient, goal, advanced), across schools
- Community differences in time to first drinking episode among adolescents (age, months, days, years)

Goals for today
- Provide an introduction to GLMMs
- Clarify some of the complexity involved in estimating and interpreting GLMMs
- Illustrate several models using HLM v7.1 software: multilevel logistic, ordinal, Poisson, overdispersed Poisson
- Highlight challenges and limitations to working with these kinds of models

Background
- The approach to building GLMMs parallels that of GLMs in single-level (fixed-effects) analyses.
- Some non-GLM strategies for dealing with non-normally distributed outcomes involve transformations of the DV:

  Severe skew     Proportions                  Counts
  ln(y)           2 arcsin(p^(1/2))            y^(1/2)
  1/y             ln[p/(1-p)] (logit)
                  Probit (inverse normal)

- But: empirical transformations don't always work well; and for dichotomies, there is no fix towards normality!

GLM Approach
Embed information about the distribution of the DV and a desired transformation directly within the statistical model. Three related model components:
- A. Sampling model: What is the distribution of interest?
- B. Link function: What transformation might be used to link the predicted values to the observed values for Y, and appropriately constrain model results to be within a certain interval (e.g., between 0 and 1 for proportions)?
- C. Structural (linear) component: How are the predicted values from the link function related to the covariates in the model?

Estimation Concerns
- GLMs (like logistic regression, etc.) are estimated through ML methods.
- Most use a distribution from the exponential family (canonical links).
- Multilevel models are also estimated through ML.
- The combined complexity for GLMMs leads to complex models as well as complex estimation procedures.
- There are pros and cons to the available choices of estimation.
- The choice affects other aspects of the model.

Issues Particular to GLMM
- Estimation
- Deviance
- Variance partitioning and estimating the ICC
- Unit-specific versus population-average inferences
- Over- or underdispersion (for counts)
I will cover these briefly; we can revisit them after looking at some examples.

Model Representation
- Some software (including HLM) uses separate models to describe relationships among predictors at each level.
- Others use a mixed-model approach: conceptually, substitute the level-two equations into the level-one model and gather fixed and random effects:

  $\eta = X\beta + Zu$

- X and Z are the fixed- and random-effects design matrices, and $\beta$ and $u$ are the vectors of fixed- and random-effects parameters.

Distribution of Random Effects
- The random effects, $u$, are assumed to follow a normal distribution: $u \sim N(0, G)$.
- G represents the covariance structure for the random effects.
- We need to estimate the fixed effects and the elements of G.
- Dispersion at level one is given by the underlying distribution (for logistic, this is Bernoulli; for ordinal, it's multinomial; for counts, it's Poisson or some variation on Poisson).

Estimation
- In the normal case, $V = \mathrm{var}(y) = ZGZ^{T} + R$, where R is the covariance matrix for the level-one residuals.
- For GLMMs, V is not as easily specified.
- The likelihood function is a non-linear function of the fixed effects, and the residuals at level one are heteroscedastic and depend on the mean.
- The integration required for a solution is intractable.
- Two approaches are dominant in the literature: approximate the model, or approximate the log-likelihood.

Approximating the Model
- Pseudo-likelihood techniques:
  - PL: pseudo-likelihood
  - PQL: penalized quasi-likelihood, which compensates for initial estimation of the variance structure
- Linearization methods (e.g., Taylor series) serve as the algorithm for a model based on pseudo-data: start with a normal mixed model and iterate until convergence is reached.
- Advantages: generally converges quickly; default in many statistical packages.
- Disadvantages: does not yield a reliable deviance for model comparison (Snijders & Bosker, 2012).
- Research has also shown these estimates to be biased (in logistic HLM), more so in small samples; this suggests carry-over to GLMMs of other forms.
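
As a small illustration of the normal-case covariance $V = ZGZ^{T} + R$ from the Estimation slide, the sketch below builds V for a single cluster under a random-intercept model. The function name and the values of `tau00` (intercept variance) and `sigma2` (level-one residual variance) are assumptions chosen for illustration, not values from the talk.

```python
# Sketch: marginal covariance V = Z G Z^T + R for ONE cluster of a
# random-intercept linear mixed model. tau00 and sigma2 are hypothetical.

def marginal_covariance(n, tau00, sigma2):
    """Return V = Z G Z^T + R for a cluster of size n.

    Z is an n x 1 column of ones (random intercept only), G = [tau00],
    and R = sigma2 * I, so V has tau00 + sigma2 on the diagonal and
    tau00 everywhere else.
    """
    Z = [[1.0] for _ in range(n)]
    G = [[tau00]]
    V = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            zgz = Z[i][0] * G[0][0] * Z[j][0]   # element (i, j) of Z G Z^T
            r = sigma2 if i == j else 0.0       # element (i, j) of R
            V[i][j] = zgz + r
    return V


V = marginal_covariance(n=3, tau00=0.5, sigma2=1.0)
# Correlation between two members of the same cluster (the ICC in the normal case)
icc = V[0][1] / V[0][0]
```

In the normal case the off-diagonal element is the shared intercept variance, which is why ignoring clustering understates the dependence among observations. For GLMMs no such closed form is available, which motivates the approximations that follow.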

Approximating the log-likelihood
- The integral in the log-likelihood is approximated numerically:
  - Quadrature methods: Gauss-Hermite quadrature, adaptive quadrature
  - Laplace
  - MCMC
- Advantages: these methods yield deviance statistics that can be used for LR tests.
- Disadvantages: may have convergence problems.

Deviance
- Deviance comparisons may be unreliable under PQL.
- PQL is not recommended when the number of groups is small.
- HLM has an option for Laplace iterations to obtain a model deviance for some models.
- HLM can also fit models through adaptive quadrature, although this is often hard to converge with complex or poorly specified models.
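
To make "approximating the log-likelihood" concrete, here is a minimal sketch of (non-adaptive) Gauss-Hermite quadrature for one cluster of a random-intercept logistic model. It is not how HLM or any other package implements the method: it uses only a 3-point rule, and the responses `y`, the intercept `beta0`, and the variance `tau00` are hypothetical. A brute-force trapezoid integral is included as a reference value.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def cluster_likelihood_given_u(y, beta0, u):
    """Bernoulli likelihood of one cluster's 0/1 responses at a fixed random intercept u."""
    p = expit(beta0 + u)
    lik = 1.0
    for yi in y:
        lik *= p if yi == 1 else (1.0 - p)
    return lik


# 3-point Gauss-Hermite rule for integrals of exp(-x^2) * f(x)
GH_NODES = [-math.sqrt(1.5), 0.0, math.sqrt(1.5)]
GH_WEIGHTS = [math.sqrt(math.pi) / 6, 2 * math.sqrt(math.pi) / 3, math.sqrt(math.pi) / 6]


def marginal_likelihood_gh(y, beta0, tau00):
    """Integrate u ~ N(0, tau00) out of the cluster likelihood with 3-point Gauss-Hermite."""
    total = 0.0
    for x, w in zip(GH_NODES, GH_WEIGHTS):
        u = math.sqrt(2.0 * tau00) * x   # change of variable u = sqrt(2 * tau00) * x
        total += w * cluster_likelihood_given_u(y, beta0, u)
    return total / math.sqrt(math.pi)


def marginal_likelihood_brute(y, beta0, tau00, steps=4000):
    """Reference value: trapezoid rule over +/- 8 standard deviations of u."""
    sd = math.sqrt(tau00)
    lo, hi = -8 * sd, 8 * sd
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        u = lo + i * h
        dens = math.exp(-u * u / (2 * tau00)) / math.sqrt(2 * math.pi * tau00)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * dens * cluster_likelihood_given_u(y, beta0, u)
    return total * h


y = [1, 1, 0, 1]   # hypothetical responses for one cluster
lik_gh = marginal_likelihood_gh(y, beta0=1.0, tau00=0.34)
lik_ref = marginal_likelihood_brute(y, beta0=1.0, tau00=0.34)
```

The full log-likelihood would sum the log of this quantity over clusters and maximize over the fixed effects and `tau00`. Adaptive quadrature recenters and rescales the nodes for each cluster, which this sketch does not attempt.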

ICC
- ICC for non-normal models: Raudenbush & Bryk (2002); Snijders & Bosker (1999); Goldstein, Browne & Rasbash (2002); Browne, Subramanian, Jones & Goldstein (2005).
- Assume the outcome is a latent variable that was discretized.
- Implicit in the logit model, the level-1 errors are heteroscedastic, but assumed to have a standard logistic distribution with mean 0 and variance $\pi^2/3$.
- This follows from the scaling of the probability density and cumulative distribution functions for the logistic distribution.
- It can be used for ICC calculation for logistic models.

Unit Specific vs. Population Average Inferences
- A situation that occurs with non-linear link functions.
- The decision on which to use is based on research aims.
- The unit-specific model (hierarchically structured model) describes a process that is occurring within each group or level-2 unit. In most situations we want to see how these processes vary across the level-2 units.
- Population-average models ask a different question, not focusing on specifics occurring within contexts.
- We may only want an average for the population without regard to group: e.g., how does the risk of being at or below category 3 differ by gender (averaging across all schools)?
- Is one characteristic of a unit-specific model
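
The difference between the two kinds of inference can be sketched numerically. Assuming a random-intercept logistic model, the unit-specific probability is evaluated at a random effect of zero, while the population-average probability averages over the random-intercept distribution. The intercept 1.499 and intercept variance .340 are borrowed from the final logistic model reported later in these slides, purely for illustration; the function name and the trapezoid integration are assumptions of this sketch.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def population_average_probability(eta, tau00, steps=4000):
    """Average expit(eta + u) over u ~ N(0, tau00) using a trapezoid rule."""
    sd = math.sqrt(tau00)
    lo, hi = -8 * sd, 8 * sd
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        u = lo + i * h
        dens = math.exp(-u * u / (2 * tau00)) / math.sqrt(2 * math.pi * tau00)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * dens * expit(eta + u)
    return total * h


# Unit-specific: probability for a child in a school whose random effect is 0
unit_specific = expit(1.499)
# Population-average: probability averaged over all schools
pop_average = population_average_probability(1.499, 0.340)
print(f"unit-specific = {unit_specific:.4f}, population-average = {pop_average:.4f}")
```

Because the inverse logit is non-linear, the averaged probability is pulled toward .5 relative to the unit-specific value; with an identity link the two would coincide.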

Illustrations of GLMMs
- Methodological considerations and structure of the models:
  - Logistic for dichotomous outcomes
  - Cumulative logistic for ordinal outcomes
  - Poisson and over-dispersed Poisson for counts
- Substantive examples using the ECLS-K:
  - Numeracy proficiency for first-graders at the end of first grade
  - 0, 1, 2, 3, 4, 5 = 6 proficiency levels
  - 6539 students
  - 569 schools

Proficiency Categories

  Proficiency category   Brief description
  0                      Did not pass level 1
  1                      Can identify shapes and numbers
  2                      Can understand relative size and recognize patterns
  3                      Can understand ordinality and sequencing
  4                      Can solve simple addition and subtraction problems
  5                      Can solve simple multiplication and division problems

Student Level Descriptives
Student Level Descriptive Statistics for the Analytic Sample, N = 6539

  ProfMath        0        1        2        3        4        5        Total
  n               9        53       192      1231     3188     1866     6539
  Cum. P          .11%     .95%     3.88%    22.71%   71.47%   100%     ---
  % Male          33.33%   52.83%   48.44%   49.39%   46.55%   54.39%   49.4%
  NumRisks M      1.11     1.15     .90      .55      .40      .19      .39
  NumRisks (SD)   (1.17)   (1.06)   (.97)    (.82)    (.72)    (.50)    (.72)

School Level Variables
School Level Descriptive Statistics for the Analytic Sample, J = 569

  Variable                             Mean    SD     Minimum   Maximum
  Neighborhood Problems (NBHOODCLIM)   1.73    2.49   0.00      12.00
  Private (PUBPRIV2)                   18%     ---    ---       ---

  Public: Pubpriv2 = 0; Private: Pubpriv2 = 1

Proficiency Outcomes
- For the multilevel logistic models, I used the highest two categories as the target of interest (Prof45).
- For the multilevel ordinal models, I used the entire scale, 0 to 5 (profmath).
- For the count models, I assumed the outcomes represented actual counts (for demonstration).
- All variables were measured at the end of first grade.

Multilevel Logistic Model
- We want to model proficiency in terms of scoring in categories 4 or 5 for children within schools.
- $Y_{ij}$ = response for the i-th child in the j-th school.
- $Y_{ij}$ = 0 or 1 (1 = success, i.e., a score of 4 or 5).
- Are there child- or school-level characteristics that can help explain the likelihood of success?

Building the Model
- We specify three features: sampling model (level one), link function, and structural model.
- For HLM (normally distributed residuals), the sampling model at level one is normal:

  $E(Y_{ij}) = \mu_{ij}, \qquad Y_{ij} \mid \mu_{ij} \sim \mathrm{NID}(\mu_{ij}, \sigma^2)$

- The link is the identity link because no transformation is necessary.
- The structural model is the regression model:

  $Y_{ij} = \beta_{0j} + \beta_{1j} X_{1ij} + \beta_{2j} X_{2ij} + \dots + \beta_{pj} X_{pij} + r_{ij}$

Dichotomous Data: A. Sampling Model
- For binary outcomes, the sampling model is Bernoulli, which is a special case of the binomial distribution.
- A binomial random variable counts the number of successes in m trials.
- When m = 1, the outcome is binary (0, 1), rather than a count.

Binomial Distribution

  $Y_{ij} \mid p_{ij} \sim B(m_{ij}, p_{ij})$

- $m_{ij}$ refers to the number of trials (for Bernoulli, $m_{ij} = 1$).
- $p_{ij}$ refers to the probability of success on each trial, i.e., the success probability for the i-th person in the j-th group.
- Specifying the sampling distribution as binomial (or Bernoulli) identifies the nature of the level-1 variation as binomial.

Expected Values and Variance for Bernoulli Models

  $E(Y_{ij} \mid p_{ij}) = p_{ij}, \qquad \mathrm{Var}(Y_{ij} \mid p_{ij}) = p_{ij}(1 - p_{ij})$

- Level-one errors are heteroscedastic: the variance depends on each person's P(success).
- It is not constant across individuals; it depends on $\hat{p}_{ij}$.

Dichotomous Data: B. Level-1 Link function
- We call the transformed predicted values $\eta_{ij}$. This transformation process is called linking.
- For binary outcomes, the link is (typically) the logit link (the canonical link for binary data).
- It indicates how the transformed variable relates back to the original data:

  $\eta_{ij} = \mathrm{logit}(p_{ij}) = \ln\left(\dfrac{p_{ij}}{1 - p_{ij}}\right)$

Dichotomous Data: C. Structural Model
- The structural model describes how the transformed predicted value is related to the predictors (a linear structural model).
- In this example, I am just looking at level one for now:

  $\eta_{ij} = \beta_{0j} + \beta_{1j} X_{1ij} + \beta_{2j} X_{2ij} + \dots + \beta_{pj} X_{pij}$

- Rarely do we need to write these three components out for a normal distribution (i.e., HLM), but for GLMM the distinction among these three elements of the model (sampling, link, structure) becomes quite useful.

Level 2 Models
- Level 2 models for GLMM have the same form as the standard HLM.
- Coefficients from the level-1 model can be fixed, randomly varying, or non-randomly varying.
- Similar to the standard HLM, model building usually starts with the empty model to approximate the ICC and estimate the overall probability of success.
- More on the ICC later.

Level 1 and Level 2 models
- Clusters are considered a random sample from some population of clusters.
- The success probabilities within the clusters, $p_{ij}$, are regarded as random variables.
- Thus we have our level 1 and level 2 models:

  $\eta_{ij} = \beta_{0j} + \beta_{1j} X_{1ij} + \beta_{2j} X_{2ij} + \dots + \beta_{pj} X_{pij}$

  $\beta_{qj} = \gamma_{q0} + \sum_{s=1}^{S_q} \gamma_{qs} W_{sj} + u_{qj}$

- Level-2 random effects are assumed normally distributed in the GLMM.

Interpreting the logit
$\hat{\eta}_{ij}$ is the model's logit prediction.
- If logit = 0, P(success) = P(failure); odds = 1
- If logit < 0, P(success) < P(failure); odds < 1
- If logit > 0, P(success) > P(failure); odds > 1

Estimating Probability
To get from $\hat{\eta}_{ij}$ (the predicted logit) to the estimated probability, use the back transformation:

  $\text{odds} = \exp(\hat{\eta}_{ij}), \qquad \hat{p}_{ij} = \dfrac{\text{odds}}{1 + \text{odds}}$

or, equivalently,

  $\hat{p}_{ij} = \dfrac{1}{1 + \exp(-\hat{\eta}_{ij})}$

Example: ECLS-K data
- $Y_{ij}$ = whether or not the i-th child in the j-th school is proficient in numeracy at the end of first grade (prof45).
- IVs at level 1: number of family risk factors (Numrisks); gender (n.s.).
- IVs at level 2: NbhoodClimate; PubPriv2 (private = 1, public = 0).

ICC
- Based on the hierarchical linear probability model:

  $\mathrm{ICC} = \dfrac{\tau_{00}}{\tau_{00} + \sigma^2} = .1575$

- Based on the mean and variance of the logistic distribution:

  $\mathrm{ICC} = \dfrac{\tau_{00}}{\tau_{00} + \pi^2/3} = .1435$
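
The latent-variable ICC for a logistic model, intercept variance divided by intercept variance plus $\pi^2/3$, is easy to compute directly. The sketch below is illustrative only: the function names are made up here, and because the empty-model intercept variance is not shown in this transcript, the code back-calculates the value implied by the reported ICC of .1435 rather than quoting one.

```python
import math

# Variance of the standard logistic distribution (about 3.29)
LOGISTIC_VARIANCE = math.pi ** 2 / 3


def latent_icc(tau00):
    """ICC for a random-intercept logistic model under the latent-variable formulation."""
    return tau00 / (tau00 + LOGISTIC_VARIANCE)


def tau00_from_icc(icc):
    """Invert the ICC formula to recover the implied intercept variance."""
    return icc * LOGISTIC_VARIANCE / (1.0 - icc)


# Back-calculated from the reported ICC of .1435; NOT a value reported in the slides
implied_tau00 = tau00_from_icc(0.1435)
print(f"implied tau00 = {implied_tau00:.3f}, ICC = {latent_icc(implied_tau00):.4f}")
```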

Final Logistic Model Results

  Fixed Effects                      Coefficient (SE)   Odds Ratio
  Model for the Intercepts (β0)
    Intercept (γ00)                  1.499 (.055)       4.477 **
    Neighborhood Problems (γ01)      -.102 (.016)       .903 **
    Public/Private (γ02)             .473 (.111)        1.604 **
  Model for NumRisks Slope (β1)
    Intercept (γ10)                  -.330 (.043)       .719 **

  Random Effects (Var. Components)   Variance
    Intercept (τ00)                  .340 **

Probability Estimates

  Explanatory variables               Model predictions
  NumRisks   PubPriv2   NbhoodClim    Logit    Odds     Estimated probability
  0          Public     0             1.499    4.4772   .8174
  2          Public     6             .2270    1.2548   .5565
  4          Public     12            -1.045   .3517    .2602
  0          Private    0             1.972    7.1850   .8778
  2          Private    6             .7000    2.0138   .6682
  4          Private    12            -.5720   .5644    .3608
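
The Probability Estimates table can be reproduced from the fixed-effect coefficients in the Final Logistic Model Results. The sketch below plugs the reported coefficients into the structural model with the school random effect set to zero (unit-specific predictions for a typical school); the function and variable names are illustrative, not from the talk.

```python
import math

# Fixed effects as reported in the Final Logistic Model Results table
GAMMA_00 = 1.499    # intercept
GAMMA_01 = -0.102   # Neighborhood Problems (NbhoodClim)
GAMMA_02 = 0.473    # Public/Private (PubPriv2: private = 1, public = 0)
GAMMA_10 = -0.330   # NumRisks slope


def predict(num_risks, private, nbhood_clim):
    """Return (logit, odds, probability) with the school random effect u0j set to 0."""
    logit = (GAMMA_00 + GAMMA_01 * nbhood_clim
             + GAMMA_02 * private + GAMMA_10 * num_risks)
    odds = math.exp(logit)
    prob = odds / (1.0 + odds)
    return logit, odds, prob


# (NumRisks, PubPriv2, NbhoodClim) combinations shown in the table
rows = [(0, 0, 0), (2, 0, 6), (4, 0, 12), (0, 1, 0), (2, 1, 6), (4, 1, 12)]
table = [predict(*row) for row in rows]
for row, (logit, odds, prob) in zip(rows, table):
    print(row, f"logit={logit:.4f}  odds={odds:.4f}  p={prob:.4f}")
```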

Summary: Logistic Model
- NumRisks has a negative effect on being in the higher categories 4 and 5, γ10 = -.330. Children with more family risks are less likely to be in the two higher categories.
- NbhoodClim, γ01 = -.102: with a more severe climate, children are less likely to be in the highest proficiency categories.
- PubPriv2, γ02 = .473: children in private schools are more likely to be in the higher two categories.

MULTILEVEL ORDINAL MODEL

Examples of Ordinal Variables
- ECLS-K proficiency in early numeracy (6 levels)
- Teachers' stages of concern for adoption of an instructional innovation (8 levels)
- CBO capacity for implementation of effective HIV prevention interventions (8 levels)
- No Child Left Behind: states set proficiency standards based on educational assessments (5 levels)
- Transtheoretical model of behavior change (5 levels)

Proportional Odds Model
- One of several regression models appropriate for ordinal data, and also the most common, is the proportional or cumulative odds model.
- The model predicts the logit = ln(odds) of being in category k or below.
- These ln(odds) can be back-transformed into odds and then into cumulative probabilities.
- We are generally interested in the odds (or probability) of being at or below a specific category (relative to being in higher categories).

Student-level Responses
- $R_{ij}$ = proficiency of the i-th student in the j-th school.
- We need K-1 dummy variables $Y_k$, each equal to 1 if R is at or below the corresponding category and 0 otherwise.
- With K = 6 proficiency categories (0 to 5), we have:
  - Y1 = 1 if R = 0
  - Y2 = 1 if R ≤ 1
  - Y3 = 1 if R ≤ 2
  - Y4 = 1 if R ≤ 3
  - Y5 = 1 if R ≤ 4
  - [Y6 = 1 if R ≤ 5, so Y6 = 1 always!]

Cumulative Probabilities
Using this approach, the probabilities of response are cumulative probabilities:
- P(Y1) = P(R = 0)
- P(Y2) = P(R ≤ 1)
- P(Y3) = P(R ≤ 2)
- P(Y4) = P(R ≤ 3)
- P(Y5) = P(R ≤ 4)
- P(Y6) = P(R ≤ 5) = 1.0

Cumulative Odds
- The odds of an event is the ratio of the probability that the event happens to the probability that it does not happen:

  $\text{Odds} = \dfrac{P(R \le k)}{1 - P(R \le k)} = \dfrac{P(R \le k)}{P(R > k)}$

- If odds = 1.0, there is a 50/50 chance of the event occurring.
- If odds < 1.0, the numerator is less likely than the denominator, so there is a higher probability that the event does not occur [consider .4/.6 = .67].
- If odds > 1.0, the numerator is more likely than the denominator, indicating a higher probability that the event does occur [consider .6/.4 = 1.5].

Cumulative Comparisons

  Category              Cumulative probability   Cumulative odds [Y_kj]   Probability comparison
  k=0 (Proficiency 0)   P(R ≤ 0)                 P(R ≤ 0) / P(R > 0)      Proficiency 0 versus all levels above
  k=1 (Proficiency 1)   P(R ≤ 1)                 P(R ≤ 1) / P(R > 1)      Proficiency 0 and 1 combined versus all levels above
  k=2 (Proficiency 2)   P(R ≤ 2)                 P(R ≤ 2) / P(R > 2)      Proficiency 0, 1, 2 combined versus 3, 4, 5 combined
  k=3 (Proficiency 3)   P(R ≤ 3)                 P(R ≤ 3) / P(R > 3)      Proficiency 0, 1, 2, 3 combined versus 4, 5 combined
  k=4 (Proficiency 4)   P(R ≤ 4)                 P(R ≤ 4) / P(R > 4)      Proficiency 0, 1, 2, 3, 4 versus proficiency 5

Proportionality Assumption
- Proportional odds is sometimes referred to as the equal slopes assumption.
- The effect of an explanatory variable remains the same across all simultaneous comparisons or splits of the DV.
- A very restrictive, but parsimonious, assumption.
- There is a straightforward test in single-level models.
- Only ad-hoc approaches are available for multilevel models.

Level-1 Model

  $\eta_{kij} = \mathrm{logit}_k = \ln\left(\dfrac{P(R_{ij} \le k)}{P(R_{ij} > k)}\right) = \beta_{0j} + \sum_{q=1}^{Q} \beta_{qj} X_{qij} + \sum_{k=2}^{K-1} D_{kij}\,\delta_k$

- This model assumes proportional odds, which means that the effect of the predictor variables on the odds doesn't depend on the category k ("parallel odds").
- The deltas are the thresholds (like intercepts) for each category (the common $\beta_{0j}$ is the intercept for the first category).

Level-2 Model

  $\beta_{qj} = \gamma_{q0} + \sum_{s=1}^{S_q} \gamma_{qs} W_{sj} + u_{qj}$

- Assume the random effects are multivariate normal, with $\mathrm{var}(u_{qj}) = \tau_{qq}$.

Logit for the Cumulative Distribution
- Similar to logistic, the ordinal model uses the logit link.
- $\eta_{kij}$ = logit prediction for being at or below the k-th category for the i-th child in the j-th school.
- To estimate the cumulative probability:

  $P(R_{ij} \le k) = \dfrac{\exp(\eta_{kij})}{1 + \exp(\eta_{kij})}$

Series of Models
- Ordinal empty model
- Ordinal contextual model with NumRisks as a fixed level-1 predictor and NbhoodClim and PubPriv2 as school-level predictors of the intercept
- This model is parallel to the earlier logistic model.

ICC
- Based on the hierarchical linear probability model:

  $\mathrm{ICC} = \dfrac{\tau_{00}}{\tau_{00} + \sigma^2} = .1451$

- Based on the mean and variance of the logistic distribution:

  $\mathrm{ICC} = \dfrac{\tau_{00}}{\tau_{00} + \pi^2/3} = .1488$

Ordinal Contextual Model
Level 1:

  $\eta_{kij} = \ln(Y_{kij}) = \ln\left(\dfrac{P(R_{ij} \le k)}{P(R_{ij} > k)}\right) = \beta_{0j} + \beta_{1j}\,\mathrm{NUMRISKS}_{ij} + \sum_{k=2}^{K-1} D_{kij}\,\delta_k$

Level 2:

  $\beta_{0j} = \gamma_{00} + \gamma_{01}\,\mathrm{NBHOODCLIM}_j + \gamma_{02}\,\mathrm{PUBPRIV2}_j + u_{0j}$

  $\beta_{1j} = \gamma_{10}$

Ordinal Model Results

  Fixed Effects                      Coefficient (SE)   Odds Ratio
  Model for the Intercepts (β0)
    Intercept (γ00)                  -7.132 (.344)      .001 **
    Neighborhood Problems (γ01)      .122 (.015)        1.129 **
    Public/Private (γ02)             -.424 (.088)       .655 **
  Model for NUMRISKS Slope (β1)
    Intercept (γ10)                  .394 (.038)        1.483 **
  For thresholds:
    δ2                               1.951 (.315)       7.034 **
    δ3                               3.427 (.335)       30.781 **
    δ4                               5.542 (.340)       255.104 **
    δ5                               7.919 (.342)       2750.280 **

  Random Effects (Var. Components)   Variance
    Var. in Intercepts (τ00)         .309 **

Probability Estimates

  School type   NumRisks   NbhoodClim   P(R ≤ 0)   P(R ≤ 1)   P(R ≤ 2)   P(R ≤ 3)   P(R ≤ 4)
  Public        0          0            .0008      .0056      .0240      .1694      .6872
                0          6            .0017      .0116      .0487      .2978      .8204
                2          12           .0075      .0507      .1895      .6597      .9543
                2          0            .0018      .0122      .0513      .3096      .8285
                4          6            .0080      .0535      .1983      .6722      .9567
                4          12           .0164      .1052      .3396      .8100      .9787
  Private       0          0            .0005      .0037      .0158      .1177      .5898
                0          6            .0011      .0076      .0324      .2172      .7493
                2          12           .0049      .0338      .1327      .5592      .9318
                2          0            .0011      .0080      .0342      .2269      .7597
                4          6            .0052      .0357      .1393      .5730      .9353
                4          12           .0108      .0714      .2518      .7361      .9678

Summary: Ordinal Model
- NumRisks has a positive effect, γ10 = .394. As the number of family risks increases, the probability of being at or below a given category, rather than beyond that category, tends to increase. Children with greater family risks are more likely to be at or below a given category.
- NbhoodClim, γ01 = .122: with a more severe climate, there is an increased likelihood of being at or below a given category.
- PubPriv2, γ02 = -.424: children in private schools are less likely to be at or below a given category.
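
The cumulative probabilities in the ordinal Probability Estimates table follow from the reported ordinal coefficients: the intercept gives the logit for the first split (R ≤ 0), and each threshold is added for the later splits. The sketch below reproduces this with the school random effect set to zero; the names are illustrative, and `category_probabilities` is an added convenience that differences the cumulative values (it is not shown in the slides).

```python
import math

# Fixed effects as reported in the Ordinal Model Results table
GAMMA_00 = -7.132   # intercept (logit for R <= 0)
GAMMA_01 = 0.122    # Neighborhood Problems (NbhoodClim)
GAMMA_02 = -0.424   # Public/Private (PubPriv2: private = 1, public = 0)
GAMMA_10 = 0.394    # NumRisks slope
# Amount added to the intercept for the splits R<=0, R<=1, ..., R<=4
THRESHOLDS = [0.0, 1.951, 3.427, 5.542, 7.919]


def cumulative_probabilities(num_risks, private, nbhood_clim):
    """Return [P(R<=0), ..., P(R<=4)] with the school random effect set to 0."""
    base = (GAMMA_00 + GAMMA_01 * nbhood_clim
            + GAMMA_02 * private + GAMMA_10 * num_risks)
    return [1.0 / (1.0 + math.exp(-(base + delta))) for delta in THRESHOLDS]


def category_probabilities(cum):
    """Difference the cumulative probabilities to get P(R = 0), ..., P(R = 5)."""
    full = cum + [1.0]   # P(R <= 5) is always 1
    return [full[0]] + [full[k] - full[k - 1] for k in range(1, len(full))]


public_low_risk = cumulative_probabilities(num_risks=0, private=0, nbhood_clim=0)
private_high_risk = cumulative_probabilities(num_risks=4, private=1, nbhood_clim=12)
print([round(p, 4) for p in public_low_risk])
print([round(p, 4) for p in private_high_risk])
```

Because the slopes are shared across splits (proportional odds), only the threshold changes from column to column in the table.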

Ad-hoc Investigation of PO

  Estimated fixed effects (OR)             R ≤ 0     R ≤ 1     R ≤ 2     R ≤ 3     R ≤ 4
  Model for the Intercepts (β0)
    Intercept (γ00)                        0.00 **   0.01 **   0.03 **   0.22 **   1.97 **
    NBHOODCLIM (γ01)                       1.03      1.10 *    1.10 **   1.11 **   1.16 **
    PUBPRIV2 (γ02)                         0.00      0.23 *    0.26 **   0.62 **   0.69 **
  Model for the NUMRISKS slopes (β1)
    Intercept (γ10)                        1.99 *    1.95 **   1.76 **   1.39 **   1.68 **

- Each entry is the OR for that split.
- The average tends to match the OR for the cumulative model. Example: average for NBHOODCLIM = 1.11; cumulative OR = 1.129.
- Note: some software (SuperMix, SAS) can do this test directly.

MODELS FOR COUNTS

A. Sampling Model: Poisson Distribution

  $Y_{ij} \mid \lambda_{ij} \sim P(m_{ij}, \lambda_{ij})$

where
- $Y_{ij}$ = number of events occurring during an interval of length $m_{ij}$
- $m_{ij}$ = interval for the rate (i.e., time span, length, population size, etc.)
  - Must be greater than zero
  - May be constant for every unit
  - Referred to as exposure, i.e., the interval during which you could be exposed to or experience the event (included in the model as an offset)
- $\lambda_{ij}$ = event rate (e.g., 5 times in the past year)

Expected Values and Variance for Poisson Models

  $E(Y_{ij} \mid \lambda_{ij}) = m_{ij}\lambda_{ij}, \qquad \mathrm{Var}(Y_{ij} \mid \lambda_{ij}) = m_{ij}\lambda_{ij}$

- The mean and variance are assumed to be equal.
- The smaller the mean event rate, the smaller the variability of the counts.
- This often leads to data that are overdispersed (more variability than expected if the data followed a true Poisson process); it could also yield underdispersion.
- Replication is needed in order to estimate the overdispersion parameter.

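B. Level-1 Link function
- For binary outcomes, the link is (typically) the logit link.
- For counts, the link is the log link.
- The link indicates how the transformed variable relates back to the original data:

  Bernoulli: $\eta_{ij} = \mathrm{logit}(p_{ij}) = \ln\left(\dfrac{p_{ij}}{1 - p_{ij}}\right)$

  Poisson: $\eta_{ij} = \log(\lambda_{ij})$

C. Structural Model
Similar to the previous (logistic) models, but the link is now the log link:

  $\eta_{ij} = \beta_{0j} + \beta_{1j} X_{1ij} + \beta_{2j} X_{2ij} + \dots + \beta_{pj} X_{pij}$

  $\beta_{qj} = \gamma_{q0} + \sum_{s=1}^{S_q} \gamma_{qs} W_{sj} + u_{qj}$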

Interpreting the log(event rate)
- $\hat{\eta}_{ij}$ is the Poisson model's prediction: $\hat{\eta}_{ij} = \log(\hat{\lambda}_{ij})$, the log of the event rate (expected count) for the collection of covariates.
- $\hat{\lambda}_{ij} = \exp(\hat{\eta}_{ij})$ = the rate parameter, or the average number of events expected in the time period.
- If log = 0, event rate = 1
- If log < 0, event rate < 1 (but non-negative)
- If log > 0, event rate > 1

Estimating Event Rates for Poisson Model (1)
(assuming a constant unit of exposure)
- The constant term (level one) gives the prediction for the log of the event rate when all predictors are zero (baseline rate). This may not be substantively meaningful for the set of predictors.
- The coefficient of X is the expected difference in log(event rate) when X increases by one unit.

Estimating Event Rates for Poisson Model (2)
- exp(b) = expected multiplicative change in the rate (i.e., in the expected number of events) for a one-unit increase in X.
- This is why exp(b) is referred to as an event rate ratio.
- Interpret as percent fewer, or percent more, in terms of number of events.
- (1 - exp(b)) × 100% = percent change in the rate for an increase of one unit on X (a positive value is a percent decrease; a negative value is a percent increase).

Results: Poisson

  Fixed Effects                      Coefficient (SE)   Event Rate Ratio
  Model for the Intercepts (β0)
    Intercept (γ00)                  1.418 (.009)       4.13 **
    Neighborhood Problems (γ01)      -.012 (.003)       .988 **
    Public/Private (γ02)             .042 (.016)        1.043 **
  Model for NumRisks Slope (β1)
    Intercept (γ10)                  -.055 (.010)       .947 **

  Random Effects (Var. Components)   Variance
    Intercept (τ00)                  .000 n.s.
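
The event rate ratios in the Poisson results are simply the exponentiated coefficients. The sketch below recomputes them and expresses each as a signed percent change in the expected count, (exp(b) - 1) × 100, which is the slide's (1 - exp(b)) × 100% with the sign flipped so that negative values mean fewer events. The dictionary keys and function names are illustrative.

```python
import math

# Coefficients as reported in the Poisson results table
coefficients = {
    "Intercept": 1.418,
    "Neighborhood Problems": -0.012,
    "Public/Private": 0.042,
    "NumRisks": -0.055,
}


def event_rate_ratio(b):
    """Multiplicative change in the expected count for a one-unit increase in X."""
    return math.exp(b)


def percent_change(b):
    """Signed percent change in the rate: negative means fewer events."""
    return (math.exp(b) - 1.0) * 100.0


for name, b in coefficients.items():
    print(f"{name:22s} b={b:+.3f}  exp(b)={event_rate_ratio(b):.3f}  "
          f"change={percent_change(b):+.2f}%")
```

For example, each additional family risk multiplies the expected count by about .947, i.e., roughly 5% fewer events per additional risk factor. For the intercept, exp(b) is the baseline rate rather than a ratio.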

Example: Poisson Model Estimates

  NumRisks   PubPriv2   NbhoodClim   ln(event rate)   Expected no. of events
  0          Public     0            1.42             4.13
  2          Public     6            1.26             3.53
  4          Public     12           1.10             3.00
  0          Private    0            1.46             4.31
  2          Private    6            1.30             3.67
  4          Private    12           1.14             3.13

- Note that the Poisson assumptions truly do not fit this example; it is for demonstration and interpretation.
- The expected number of events mimics the expected score for a child.

Over- or Underdispersion
- An issue for non-linear models, particularly for counts.
- Actual data may not follow the strict Poisson model, where the variance is equal to the expected value.
- HLM allows you to specify a scale factor for the level-1 variance: $\mathrm{Var}(Y) = \sigma^2 \lambda$ for Poisson.
- Note: $\sigma^2$ is not a variance, it's a scaling factor!
- If there is no over- or underdispersion, the scaling factor = 1; if the data are under- or overdispersed, it is less than or greater than 1, respectively.

Scaling factor
- The scaling factor is used to better estimate the standard errors and variances in the model.
- There is little change in the fixed effects.
- For our example, the intercept variance was 0 in the Poisson model, and is still small but significant in the scaled model (page 13 of the handout).

Factors affecting dispersion
- Overdispersion:
  - Unaccounted-for clustering at level 1
  - Extreme outliers, missing levels, or small group sizes (<3)
- Underdispersion:
  - Level-one variance may be smaller than assumed
  - Misspecification, such as omission of important variables or large interaction effects
- In our example, we see underdispersion: $\sigma^2 = .14904$.
- You can see in the handout that the standard errors on page 12 are now slightly smaller than they were in the Poisson model.
- Better option: negative binomial (for another day).

THANK YOU!
Oconnell.87@osu.edu

SOME EXTRA

SuperMix test for Proportional Odds
- Create interaction terms between predictors and thresholds. SuperMix has an option to perform this test directly.
- PO model: -2LL = 14,701.48 (9 params)
- Non-PO model: -2LL = 14,650.32 (21 params)
- DIFF = $\chi^2(12)$ = 51.16, p < .0001
- Evidence for non-proportional odds across at least one predictor.
- Based on the previous slide, the effect for private schools is steadily increasing.
- The effect of private schooling is not noticeable for kids at lower levels, but it is increasingly strong for private-school kids to be beyond, rather than at or below, the higher categories.

Summary Considerations (1)
- We know from logistic HLM that pseudo-likelihood methods are not recommended when the number of groups is small, or when target probabilities are either very small or very large.
- AQ is recommended for logistic HLM, but Pinheiro & Chao (2006) warn against assuming this is also the case for other GLMMs.
- More research is needed to establish estimation validity under PQL versus AQ or other methods for non-dichotomous models.
- The student (freeware) version of HLM makes it an attractive option; PQL is the default in v6 and v7.
- Researchers need to be wary of defaults.
- Consider and be aware of the implications of the choice of estimation method.
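
The likelihood-ratio comparison on the SuperMix slide can be checked by hand: the difference in -2LL between the PO and non-PO models is referred to a chi-square distribution with degrees of freedom equal to the difference in parameter counts. The sketch below uses a closed form for the chi-square upper tail that holds only for even degrees of freedom (which applies here, df = 12); the helper name is made up for this illustration, and in practice a statistics library would normally be used instead.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with EVEN df.

    Uses the closed form exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term = 1.0    # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Values as reported on the slide
deviance_po, params_po = 14701.48, 9
deviance_nonpo, params_nonpo = 14650.32, 21

lr = deviance_po - deviance_nonpo    # likelihood-ratio statistic
df = params_nonpo - params_po        # difference in number of parameters
p_value = chi2_sf_even_df(lr, df)
print(f"LR = {lr:.2f}, df = {df}, p = {p_value:.2e}")
```

This comparison is only meaningful when both deviances come from a likelihood-based method (for example Laplace or quadrature), not from PQL.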

Summary Considerations (2)
- Choice of software may impact the quality of the model and inferences.
- Packages vary in terms of default estimation methods and options for additional estimation approaches. As with most statistical packages, the default may not be the best strategy to pursue!
- Deviances are often reported under PL options, so caution is required.
- Stata, HLM v7, SuperMix, R, GLIMMIX: all have different estimation options and different output characteristics.
- Laplace and AQ may have convergence problems and often take longer to reach a solution.
- SAS PROC GLIMMIX likely has the most extensive array of options, although it is a challenge to learn and use; its syntax mirrors PROC MIXED and SPSS MIXED.
- Good reference: Stroup (2013).