Multilevel/Mixed Models and Longitudinal Analysis Using Stata


Isaac J. Washburn, PhD
Research Associate, Oregon Social Learning Center
Summer Workshop Series, July 2010

Longitudinal Analysis

Longitudinal Example

A child's growth in weight

Variables
  id: child's id
  weight: child's weight in kg
  age: child's age in years
  male: 1 = male; 0 = female

Plot of Growth Trajectories by Gender

Looking for a possible random intercept and slope, as well as any functional form.

    * plot observed growth trajectories
    sort id age
    graph twoway (line weight age, ///
        connect(ascending)), by(male) ///
        xtitle(age in years) ytitle(weight in kg)

Steps for Estimating the Model

Unconditional Means Model (Singer and Willett, 2003)
  Get the ICC
Random Intercept Model
  Allow the intercept to vary across individuals
  See the general growth pattern
Random Coefficient Model (also called a Random Intercept and Slope Model or Unconditional Growth Model)
  Allow the intercept and slope to vary across individuals

Unconditional Means Model

This model gives us an idea of the dependence among our observations.

    * Unconditional Means Model
    xtmixed weight || id:, mle

[Slide: Intra-Class Correlation]

Random Intercept Model

    xtmixed weight age age2 || id:, cov(un) mle

Results: Random Intercept Model

The children are estimated to weigh 3.43 kg at the start (age 0). Weight increases by 7.82 kg per year, but the quadratic term dampens this growth with a coefficient of -1.71 kg per year squared.

This model does much better than a linear regression model, χ²(3) = 78.07, p < .001.

The standard deviation of the intercept (_cons) is .92. This is large: about 95% of the children's intercepts lie within 1.84 kg (two standard deviations) of the average intercept of 3.43 kg, which is quite a range.
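As a rough check on this interpretation, the rounded estimates can be plugged into the quadratic growth equation by hand. The sketch below uses only the numbers quoted on the slide; the scalar names and the choice of ages 0 to 2 are illustrative assumptions and are not part of the workshop do-file.

```stata
* Rounded estimates quoted on the slide (scalar names are illustrative)
* b0: intercept, b1: linear age effect, b2: quadratic age effect
scalar b0  = 3.43
scalar b1  = 7.82
scalar b2  = -1.71
* sd0: standard deviation of the random intercept
scalar sd0 = 0.92

* Average fitted weight at a few ages (ages chosen for illustration)
forvalues a = 0/2 {
    display "age `a': " %5.2f (b0 + b1*`a' + b2*`a'^2) " kg"
}

* Approximate 95% range of child-specific intercepts (mean +/- 2 SD)
display "intercepts: " %4.2f (b0 - 2*sd0) " to " %4.2f (b0 + 2*sd0) " kg"
```

The second display line simply restates the "within 1.84 kg of 3.43" claim as an interval.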

Random Coefficient Model

    xtmixed weight age age2 || id: age, cov(un) mle

Results: Random Coefficient Model

The estimates for the growth don't change much, but there is substantial variance in the slope of age.

Although the slope of age and the intercept are correlated, r = .275, this correlation is not significantly different from zero. We could replace the unstructured covariance matrix with a simpler structure. (I tried this and it didn't seem to matter much.)

For a true statistical test of model differences we use the log-likelihood (likelihood-ratio) test.

Quick Comment on Covariance

Four possible covariance structures in Stata:
  Independent: a separate variance is estimated for each random effect, and all pairwise covariances are set to zero.
  Exchangeable: a single variance is shared by all random effects, and a single covariance is shared by all pairs.
  Identity: a single variance is shared by all random effects, and all pairwise covariances are set to zero.
  Unstructured: every variance and every pairwise covariance is estimated separately.
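One way to see what each structure costs is to count the variance and covariance parameters it estimates for q random effects at one level. The sketch below is plain arithmetic based on the definitions above; the loop range and the choice q = 2 (a random intercept plus a random slope of age) are assumptions made for illustration.

```stata
* cov(unstructured) estimates q variances plus q*(q-1)/2 covariances
forvalues q = 1/3 {
    display "q = `q': unstructured estimates " (`q'*(`q'+1)/2) " parameters"
}

* The four structures compared for q = 2 (random intercept and slope of age)
local q = 2
display "independent:  " `q' " (one variance per random effect)"
display "exchangeable: " 2 " (one shared variance, one shared covariance)"
display "identity:     " 1 " (one shared variance)"
display "unstructured: " (`q'*(`q'+1)/2) " (all variances and covariances)"
```

With an unstructured matrix, moving from one to two random effects adds 2 parameters and moving from two to three adds 3, which is where the 2 and 3 degrees of freedom in the likelihood-ratio tests on these slides come from.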

Log-Likelihood Test

    * Log-likelihood test
    quietly: xtmixed weight age age2 || id:, mle cov(un)
    estimates store intercept
    quietly: xtmixed weight age age2 || id: age, cov(un) mle
    estimates store slope
    lrtest intercept slope

    Likelihood-ratio test                      LR chi2(2)  = 37.51
    (Assumption: intercept nested in slope)    Prob > chi2 = 0.000

    Note: The reported degrees of freedom assumes the null hypothesis is not on the boundary of the parameter space. If this is not true, then the reported test is conservative.

[Slide: Graph of Results]
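The p-value for the random-slope test above can be recomputed from the reported statistic with Stata's chi2tail() function. The second calculation is a sketch of a commonly used boundary adjustment, a 50:50 mixture of chi-square distributions with 1 and 2 degrees of freedom; that adjustment is an assumption added here to illustrate the "conservative" note and is not something the slides compute.

```stata
* Reported statistic for adding the random slope of age
scalar lr = 37.51

* Naive p-value from a chi-square with 2 df, as reported by lrtest
display "chi2(2) p-value: " chi2tail(2, lr)

* Boundary-adjusted p-value (assumed 50:50 mixture of chi2(1) and chi2(2))
display "mixture p-value: " (0.5*chi2tail(1, lr) + 0.5*chi2tail(2, lr))
```

The mixture p-value is never larger than the naive one, which is the sense in which the reported test is conservative.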

[Slide: Graph of Results, continued]

[Slide: Graph of Results, continued]

Graphs: Stata Commands

Fitted results only:

    predict pred
    twoway (line pred age, sort connect(ascending))

Fitted results and individual results:

    predict fitted, fitted
    twoway (connected fitted age, lwidth(thin) ///
        lpattern(dash) connect(ascending)) (line pred ///
        age, sort lcolor(green) lwidth(thick) ///
        lpattern(solid) connect(ascending))

    twoway (connected fitted age in 1/30, ///
        lpattern(dash) connect(ascending)) (line pred ///
        age, sort lcolor(green) lwidth(thick) ///
        lpattern(solid) connect(ascending))

Adding a Time-Invariant Covariate

We have several possible models:
  A random slope model in which girls and boys differ only in their intercept.
  A random coefficient model in which the intercept varies not only by child but also by gender.
  A model with interactions between gender and age and between gender and age squared.

[Slide: Results, Time-Invariant Covariate, No Random Coefficient]

[Slide: Results, Time-Invariant Covariate, Random Coefficient]

Log-Likelihood Test

    xtmixed weight age age2 male || id: age, cov(unstr) mle
    estimates store invar
    xtmixed weight age age2 male || id: age male, cov(unstr) mle
    estimates store invarr
    lrtest invar invarr

    Likelihood-ratio test                      LR chi2(3)  = 1.09
    (Assumption: invar nested in invarr)       Prob > chi2 = 0.7795

    Note: The reported degrees of freedom assumes the null hypothesis is not on the boundary of the parameter space. If this is not true, then the reported test is conservative.

[Slide: Factors in Stata]

[Slide: Results, Time-Invariant Covariate, Interaction]

Log-Likelihood Test

    xtmixed weight age age2 male || id: age, cov(unstr) mle
    estimates store invar
    xtmixed weight c.age##c.age##c.male || id: age, cov(unstr) mle
    estimates store inter
    lrtest invar inter

    Likelihood-ratio test                      LR chi2(2)  = 3.39
    (Assumption: invar nested in inter)        Prob > chi2 = 0.1837

Adding a Time-Varying Covariate

We have several possible models:
  A random intercept model
  A model with an interaction between gender and being sick
  A random coefficient model

[Slide: Results, Time-Varying Covariate, No Random Coefficient]

[Slide: Results, Time-Varying Covariate, Interaction]

Log-Likelihood Test

    * Random slope model
    xtmixed weight age age2 male sick || id: age, cov(unstr) mle
    estimates store sick
    * Interaction between gender and sick
    xtmixed weight age age2 c.male##c.sick || id: age, cov(unstr) mle
    estimates store sick_inter
    lrtest sick sick_inter

    Likelihood-ratio test                      LR chi2(1)  = 0.71
    (Assumption: sick nested in sick_inter)    Prob > chi2 = 0.4007

[Slide: Results, Time-Varying Covariate, Random Coefficient]

Log-Likelihood Test

    * Random slope model
    xtmixed weight age age2 male sick || id: age, cov(unstr) mle
    estimates store sick
    * Check if sick has a random component
    xtmixed weight age age2 male sick || id: age sick, cov(unstr) mle
    estimates store sick_rand
    lrtest sick sick_rand

    Likelihood-ratio test                      LR chi2(3)  = 10.94
    (Assumption: sick nested in sick_rand)     Prob > chi2 = 0.0120

3-Level Model: Early Years of Marriage Project

Level 1: observations across time
  happy: variable for marital happiness (y_ijk)
  year: year of interview
  year2: year squared

Level 2: individuals
  id: identifier for individuals (j)
  gender: gender of individual
  educ: number of years of education
  age: age of participant at first interview

Level 3: family
  couple: identifies couples (k)
  race: with no interracial marriages, this is either black or white for the couple
  kids: dummy for having kids under age 12
  m_income: household income
  income_diff: difference between spouses' incomes
  educ_diff: difference between spouses' education
  age_diff: difference between spouses' ages

[Slide: Data Structure]

An OLS Regression

This regression ignores all of the dependence in the data.

Multilevel Test of Different Models

    * Model a, full model
    xtmixed hap year year2 gender race kids age ///
        age_diff educ educ_diff m_income ///
        income_diff if use == 0 || couple: ///
        || id: year, var mle cov(un)
    estat ic
    estimates store a

    * Model b, remove non-significant covariates
    xtmixed hap year year2 gender race kids ///
        age_diff educ m_income if use == 0 ///
        || couple: || id: year, var mle cov(un)
    estat ic
    estimates store b
    lrtest a b

[Slide: Model A]

[Slide: Model B]

Multilevel Test of Different Models

    * Model c, add Gender by Year and Gender by Year^2 interactions
    xtmixed hap c.gender##c.year##c.year race ///
        kids age_diff educ m_income if use == 0 ///
        || couple: || id: year, var mle cov(un)
    estat ic
    estimates store c
    lrtest b c

    * Model d, remove Gender by Year^2 interaction
    xtmixed hap c.gender##c.year year2 race ///
        kids age_diff educ m_income if use == 0 ///
        || couple: || id: year, var mle cov(un)
    estat ic
    estimates store d
    lrtest b d

[Slide: Model C]

[Slide: Model D]

Choosing a Model: Information Obtained from estat and lrtest

                    Model A      Model B      Model C      Model D
    Obs                2325         2325         2325         2325
    Log-likelihood -2480.051    -2481.351    -2478.826    -2479.420
    df                   17           14           16           15
    AIC             4994.103     4990.702     4989.652     4988.839
    BIC             5091.878     5071.223     5081.675     5075.111

    Comparison   Adding to or taking from model   Likelihood-ratio test   df   p
    A vs B       Taking                           2.60                     3   0.458
    B vs C       Adding                           5.05                     2   0.080
    B vs D       Adding                           3.86                     1   0.049
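The entries in the table follow from the log-likelihoods, the parameter counts, and the number of observations. As a minimal sketch, the lines below recompute the Model A column and the A vs B comparison using the standard definitions AIC = -2 lnL + 2k and BIC = -2 lnL + k ln(N); the scalar names are illustrative, and the input values are copied from the table.

```stata
* Values copied from the table (scalar names are illustrative)
scalar ll_a = -2480.051
scalar k_a  = 17
scalar ll_b = -2481.351
scalar k_b  = 14
scalar n    = 2325

* Information criteria for Model A
display "AIC (A): " %9.3f (-2*ll_a + 2*k_a)
display "BIC (A): " %9.3f (-2*ll_a + k_a*ln(n))

* Likelihood-ratio test of Model B (restricted) against Model A (full)
scalar lr_ab = 2*(ll_a - ll_b)
display "LR (A vs B): " %5.2f lr_ab
display "df:          " (k_a - k_b)
display "p-value:     " %5.3f chi2tail(k_a - k_b, lr_ab)
```

Because the log-likelihoods in the table are rounded to three decimals, the recomputed AIC and BIC can differ from the tabled values in the last digit.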

[Slide: Graphical Results of Model D, by Gender]

[Slide: Graphical Results of Model D, by Kids]

Stata Commands

    predict path
    twoway (qfit path year if gender==0) ///
        (qfit path year if gender==1), ///
        ytitle(Predicted Marital Happiness) xtitle(Year) ///
        legend(order(1 "Female" 2 "Male"))

    twoway (qfit path year if kids==0) ///
        (qfit path year if kids==1), ///
        ytitle(Predicted Marital Happiness) xtitle(Year) ///
        legend(order(1 "No Young Kids" 2 "Young Kids"))

Logistic Multilevel Models

Both cross-sectional and longitudinal
Same basic structure as regression multilevel models, just using logistic regression instead of OLS
Examples
  Cross-sectional: Immunization
  Longitudinal: Limit Breaking and Parenting Style

[Slide: Data Structure]

[Slide: Simple Logistic]

[Slide: Same Model in Multilevel]

[Slide: Odds Ratios]

Results

The father having some education has an odds ratio of 1.55. The odds that a child will be immunized are 55% greater [(OR - 1) × 100] if the father has some education. This is stronger than the effect for mothers.

The effect of proportion indigenous is tricky to interpret. The odds ratio is a bit misleading because this variable ranges from 0 to 1, so you would be comparing a community with no indigenous people to one that is 100% indigenous. The mean of pcind81 is .467 and the SD is .375. One useful way of seeing the effect of a continuous variable like this is to express a one standard deviation change (.375) as an odds ratio.

    di exp(.375*-1.232638)
    .62987163

Thus, a one standard deviation increase in the proportion indigenous reduces the odds of a child being immunized by 37% [(1 - OR) × 100]. Whether you use 1.0, one SD, or something else is subjective. On this basis it is fair to say that rural has a bigger effect, since a rural community has an odds ratio of .429, reducing the odds by 57%.

Limit Breaking: NLSY

Predict youth-reported limit breaking from the mother's parenting style and the risk of physical violence in the home.

Parenting style is a categorical variable describing the mother's parenting style as uninvolved, permissive, authoritarian, or authoritative.

Risk of physical violence in the home is a generic scale that is transformed to run from 0 to 10, with more risk as you get closer to 10.

We also control for the age of the youth.
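The percent-change conversions used for the immunization results above can be reproduced in a few lines. This is a minimal sketch that uses only the coefficient, standard deviation, and odds ratios quoted in the text; the scalar names are illustrative assumptions.

```stata
* Coefficient and SD of pcind81 as quoted in the text
scalar b_pcind  = -1.232638
scalar sd_pcind = .375

* Odds ratio for a one-SD increase, and the implied percent change in odds
scalar or_sd = exp(sd_pcind*b_pcind)
display "OR per SD of pcind81:   " %6.4f or_sd
display "percent change in odds: " %5.1f (100*(or_sd - 1))

* Same conversion for the other odds ratios quoted in the text
display "father has some education: " %5.1f (100*(1.55 - 1))
display "rural community:           " %5.1f (100*(.429 - 1))
```

Negative values are reductions in the odds, so the pcind81 and rural lines correspond to the 37% and 57% decreases described in the text.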

[Slide: Graph of Limit Breaking by Style]

[Slide: Simple Logistic Regression for Wave 1]

Longitudinal Structure

    2677 * 3 = 8031
    (342 + 263 + 184 + 72) * 2 = 1722
    (17 + 14) * 1 = 31
    8031 + 1722 + 31 = 9784

Multilevel Analysis: Odds Ratios

Why the difference? FIML allows the use of available information on the dependent variable, but not on the independent variables.

[Slide: Predicted Probabilities by Style]

Randomized Trial Example

Evaluation of a school-based intervention
Done in high-risk elementary schools in the Hawai`i school district
Five waves of data with a total of 7,347 observations from 2,646 children distributed over 20 schools, with an average of 2.8 waves of data for each student

3-Level Random Intercept Model

  Condition (0 = Control, 1 = Treatment)
  Gender (0 = Girl, 1 = Boy)
  Cohort (0 = 2nd grade in Wave 1, 1 = 1st grade in Wave 1)

First, More on Intra-Class Correlation

In the three-level model there is not a single ICC, but rather several possible ones. We will discuss two:
  Proportion of Variance between Schools
  Proportion of Variance between Scores

[Slide: Unconditional Means Model]

Two Possible Intraclass Correlations

    . *Finding the ICC
    . *Proportion of Variance due to school membership
    . dis (6.232196)/(6.232196+166.0717+414.3931)
    .01062251

    . *Proportion of Variance due to children within same school
    . dis (6.232196+166.0717)/(6.232196+166.0717+414.3931)
    .29368464

[Slide: Full Model Results]

The Time by Condition Interactions

The effect of condition is split into three pieces here:
  1) Main effect (effect on the intercept)
  2) Linear effect (year by condition)
  3) Quadratic effect (year squared by condition)

To get the effect of condition at any one wave you need all three. The comparison is the intercept, the year effect, and the quadratic effect.
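Combining the three pieces amounts to evaluating a quadratic in year. The sketch below shows the arithmetic only: the three coefficients are HYPOTHETICAL placeholders, since the fitted values from the full model slide are not reproduced in this transcript, and coding year as 0 to 4 for the five waves is likewise an assumption.

```stata
* HYPOTHETICAL coefficients, for illustration only (not from the slides)
* b_cond: main effect, b_cond_y: year by condition,
* b_cond_y2: year squared by condition
scalar b_cond    = 1.0
scalar b_cond_y  = 2.0
scalar b_cond_y2 = -0.3

* Treatment minus control difference at each wave (year assumed coded 0-4)
forvalues y = 0/4 {
    display "year `y': " %5.2f (b_cond + b_cond_y*`y' + b_cond_y2*`y'^2)
}
```

Substituting the estimates from the fitted model for the placeholders gives the condition effect at each wave.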

Stata Command for Graph

    twoway (qfit path y if cond==1 & boy==0) ///
        (qfit path y if cond==0 & boy==0) ///
        (qfit path y if cond==1 & boy==1) ///
        (qfit path y if cond==0 & boy==1), ///
        ytitle(Percent of Maximum Possible) ///
        xtitle(Year) ///
        title(Estimated Change in Positive Behavior) ///
        subtitle(Student Reports) ///
        legend(order(1 "PA Group Girls" ///
        2 "Control Group Girls" 3 "PA Group Boys" ///
        4 "Control Group Boys")) ylabel(#5)

Full Commands

In 2010_SIRM_july_15,afternoon.do

Used the following datasets:
  asian.dta
  couple1234v4.dta
  guatemala.dta
  LimitsV2.dta
  hawaii.dta