Lab 11. Multilevel Models. Description of Data

Lab 11 Multilevel Models
Henian Chen, M.D., Ph.D.

Description of Data

MULTILEVEL.TXT is clustered data for 386 women distributed across 40 groups. The groups meet regularly to discuss diet and weight control; group size ranges from 5 to 15 women.

ID: 386 women, id from 1 to 386, individual level (level 1).
GROUP: 40 groups, group from 1 to 40, group level (level 2).
MOTIVATC: motivation to lose weight, measured on a six-point scale and centered around the grand mean of the 386 cases.
WEIGHTL: weight loss in pounds.
WEIGHTG: 0 = light weight loss (weight loss < 15 pounds), 1 = heavy weight loss (weight loss >= 15 pounds).

Linear Regression Model for n=386 (analysis based on individual level--level 1)

    proc import datafile='a:multilevel.txt' out=multilevel dbms=tab replace;
    run;
    proc print;
    run;
    proc reg;
      model weightl=;           /* Model 1: intercept only */
      model weightl=motivatc;   /* Model 2: adds motivation */
    run;
    quit;

Dependent Variable: WEIGHTL

    Model 1
    Variable    DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
    Intercept    1            15.00259         0.23034    65.13    <.0001

    Model 2
    Variable    DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
    Intercept    1            14.99158         0.15561    96.34    <.0001
    MOTIVATC     1             3.27029         0.15255    21.44    <.0001

Linear Regression Models by group (analysis based on group level--level 2)

    proc reg outest=out;
      model weightl=motivatc;
      by group;
    run;
    quit;
    data groups;
      set out;
      keep group intercept motivatc;
    run;
    proc print data=groups;
    run;
    proc univariate data=groups;
      var intercept motivatc;
      histogram intercept motivatc / normal;
    run;

Linear Regression Models by group cont.

    GROUP  INTERCEPT  MOTIVATC (SLOPE)
        1    16.5071          -0.14286
        2    15.6097           4.75000
        3    13.8069           3.86207
        4    11.4107           2.50000
        5    15.7500           1.66667
        6    15.3688           1.37500
        7    13.8386           1.80000
        8    12.8114           1.50000
        9    12.1250           2.11538
       10    13.0294           1.79817
      ...
       19    14.8769           4.00000
       20    20.8812           3.62500
      ...
       39    13.8574           4.97872
       40    15.1863           3.52500

40 intercepts: from 11.4107 to 20.8812
40 slopes: from 1.4000 to 9.1667

[Figure: Distribution of 40 Intercepts]

[Figure: Distribution of 40 Slopes]

Linear Regression Models by group cont.

There is no need to use a two-level model if:
1) all 40 intercepts are identical (no variance among the intercepts), and
2) all 40 slopes are identical (no variance among the slopes).

In this case, however, both the intercepts and the slopes vary across groups, so clustering (group membership) has an effect on weight loss.

Unconditional Two-Level Linear Regression Model (1)

    proc mixed data=multilevel covtest noitprint;
      class group;
      model weightl= / solution;
      random intercept / subject=group;
    run;

The PROC MIXED statement calls the procedure.
The COVTEST option tests the variance components (random effects).
The NOITPRINT option tells SAS not to print the iteration history.
The CLASS statement specifies that GROUP is a classification variable rather than a continuous variable.
The MODEL statement is an equation whose left-hand side contains the name of the dependent variable, in this case WEIGHTL. The right-hand side contains the list of fixed-effect variables (predictors). An intercept is included in every model by default, so this unconditional model tests only the intercept.
The RANDOM statement contains the list of random effects, in this case the intercept.

Unconditional Two-Level Linear Regression Model (2)

The Mixed Procedure

    Model Information
    Data Set                      WORK.MULTILEVEL
    Dependent Variable            WEIGHTL
    Covariance Structure          Variance Components
    Subject Effect                GROUP
    Estimation Method             REML
    Residual Variance Method      Profile
    Fixed Effects SE Method       Model-Based
    Degrees of Freedom Method     Containment

    Dimensions
    Covariance Parameters           2
    Columns in X                    1
    Columns in Z Per Subject        1
    Subjects                       40
    Max Obs Per Subject            15
    Observations Used             386
    Observations Not Used           0
    Total Observations            386

Unconditional Two-Level Linear Regression Model (3)

    Covariance Parameter Estimates (Random Effects)
    Cov Parm   Subject  Estimate  Standard Error  Z Value    Pr Z
    Intercept  GROUP      4.9062          1.5603     3.14  0.0008
    Residual             16.0695          1.2251    13.12  <.0001

4.9062 is the variance among the 40 group intercepts (between-group variance). This value is significantly greater than zero. It tells us that there is random variation among the intercepts of the individual groups, so we should not ignore clustering.

16.0695 is the level 1 residual variance (within-group variance). This value is significantly greater than zero.

The degree of clustering is measured by the intraclass correlation (ICC). The ICC measures the proportion of the total variance of a variable that is accounted for by the clustering. It ranges from 0 for complete independence of observations to 1 for complete dependence. Ordinary linear regression assumes independent observations, that is, ICC = 0.

ICC = 4.9062 / (4.9062 + 16.0695) = 0.23. An estimated ICC of 0.23 is very substantial.

Unconditional Two-Level Linear Regression Model (4)

    The Mixed Procedure: Solution for Fixed Effects
    Effect     Estimate  Standard Error  DF  t Value  Pr > |t|
    Intercept   15.1154          0.4090  39    36.95    <.0001

15.1154 tells us the average group-level weight loss in our sample (DF=39).

    Model 1 (n=386)
    Variable    DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
    Intercept    1            15.00259         0.23034    65.13    <.0001

15.00259 tells us the average individual-level weight loss.

In the OLS model the standard error is 0.23034, smaller than the standard error from the multilevel model (0.4090). Because OLS ignores the clustering, it underestimates the standard error and therefore overstates the precision and statistical significance of the estimate.
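The ICC arithmetic above can be checked with a few lines of code. The following is a minimal sketch in Python rather than SAS, and it is not part of the original lab; the function name `intraclass_correlation` is a hypothetical helper, and the two inputs are the variance components taken from the PROC MIXED output of the unconditional model.

```python
def intraclass_correlation(between_var: float, within_var: float) -> float:
    """Proportion of total variance that lies between groups."""
    total = between_var + within_var
    if total <= 0:
        raise ValueError("total variance must be positive")
    return between_var / total


# Variance components from the unconditional two-level model:
# intercept (between-group) variance and residual (within-group) variance.
icc = intraclass_correlation(4.9062, 16.0695)
print(f"ICC = {icc:.4f}")
```

The result is about 0.234, so roughly 23% of the variance in weight loss lies between groups.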

Two-Level Model Including Level-1 Predictor (motivatc) (1)

    proc mixed data=multilevel covtest noitprint;
      class group;
      model weightl=motivatc / solution notest;
      random intercept motivatc / subject=group;
    run;

    Solution for Fixed Effects
    Effect     Estimate  Standard Error  DF  t Value  Pr > |t|
    Intercept   15.1401          0.2846  39    53.19    <.0001
    MOTIVATC     3.0835          0.2157  39    14.29    <.0001

For every one-unit increase in motivation (6-point scale), weight loss increases by 3.08 pounds. The intercept, 15.14 pounds, is the average group weight loss at the grand-mean level of motivation (MOTIVATC = 0).

Two-Level Model Including Level-1 Predictor (motivatc) (2)

    Covariance Parameter Estimates (Random Effects)
    Cov Parm   Subject  Estimate  Standard Error  Z Value    Pr Z
    Intercept  GROUP      2.5031          0.7683     3.26  0.0006
    MOTIVATC   GROUP      1.0024          0.4043     2.48  0.0066
    Residual              5.8828          0.4694    12.53  <.0001

2.5031 is the variance among the 40 group intercepts. This value is significantly greater than zero. Significant random variation among the intercepts indicates that we need to account for clustering in our analysis.

1.0024 is the variance among the 40 group slopes. This value is significantly greater than zero. Significant random variation among the slopes indicates that the effect of motivation differs across groups, so the random slope should also be kept in the model.

5.8828 is the level 1 residual variance. This value is significantly greater than zero.

Two-Level Model Including Level-1 Predictor (motivatc) (3)

               Unconditional Model  Conditional Model (+ motivatc)
    Intercept               4.9062                          2.5031
    Residual               16.0695                          5.8828

(16.0695 - 5.8828) / 16.0695 = 63.4%
63.4% of the within-group variance in weight loss is accounted for by the level 1 motivation variable.

(4.9062 - 2.5031) / 4.9062 = 49.0%
49.0% of the between-group variance in average weight loss is accounted for by the level 1 motivation variable.
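The two percentages above are proportional reductions in variance between the unconditional and conditional models. As a minimal sketch (in Python, not part of the original SAS lab; `proportion_reduced` is a hypothetical helper name), the calculation can be written as:

```python
def proportion_reduced(unconditional_var: float, conditional_var: float) -> float:
    """Share of the unconditional variance removed by adding predictors."""
    if unconditional_var <= 0:
        raise ValueError("unconditional variance must be positive")
    return (unconditional_var - conditional_var) / unconditional_var


# Residual (within-group) variance before and after adding motivatc,
# using the values printed in the two covariance parameter tables.
within = proportion_reduced(16.0695, 5.8828)

# Intercept (between-group) variance before and after adding motivatc.
between = proportion_reduced(4.9062, 2.5031)

print(f"within-group variance explained:  {within:.1%}")
print(f"between-group variance explained: {between:.1%}")
```

The same helper applies to any pair of nested models, provided both are fitted to the same observations.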

Logistic Regression Model

    proc logistic data=multilevel descending;
      model weightg=motivate;
    run;

    WEIGHTG  Total Frequency
          1              216
          0              170

    Parameter  DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
    Intercept   1   -6.0476          0.6753          80.2067      <.0001
    MOTIVATE    1    1.8616          0.1990          87.5221      <.0001

    Odds Ratio Estimates
    Effect    Point Estimate  95% Wald Confidence Limits
    MOTIVATE           6.434      4.356            9.503

Two-Level Logistic Regression Model (1)

SAS Program

    %inc 'c:\apps\sas institute\sas\v8\stat\sample\glimmix.sas' / nosource;
    %glimmix(data=multilevel,
             stmts=%str(
               class group;
               model weightg=motivate / notest solution;
               random intercept / subject=group;
             ),
             error=binomial)

Two-Level Logistic Regression Model (2)

    Covariance Parameter Estimates (Random Effects)
    Cov Parm   Subject  Estimate
    Intercept  GROUP      1.3227
    Residual              0.7175

ICC = 1.3227 / (1.3227 + 0.7175) = 64.8%. With clustering this strong, we have to use the two-level logistic regression model.

    Solution for Fixed Effects
    Effect     Estimate  Standard Error   DF  t Value  Pr > |t|
    Intercept   -6.7345          0.6901   39    -9.76    <.0001
    MOTIVATE     2.0699          0.1941  345    10.66    <.0001

OR = exp(2.0699) = 7.92: the odds of heavy weight loss are multiplied by 7.92 for each one-unit increase in motivation.
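The odds ratios and Wald confidence limits above follow directly from the logit coefficients and their standard errors. The sketch below, in Python rather than SAS and not part of the original lab, reproduces them; `odds_ratio_with_ci` is a hypothetical helper, and the multiplier 1.96 is the usual normal approximation for a 95% interval.

```python
import math


def odds_ratio_with_ci(beta: float, se: float, z: float = 1.96):
    """Return (odds ratio, lower limit, upper limit) for a logit coefficient."""
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )


# Single-level logistic regression: MOTIVATE estimate and standard error.
or_single, lower_single, upper_single = odds_ratio_with_ci(1.8616, 0.1990)

# Two-level logistic regression: MOTIVATE estimate and standard error.
or_multi, lower_multi, upper_multi = odds_ratio_with_ci(2.0699, 0.1941)

print(f"single-level OR = {or_single:.3f} ({lower_single:.3f}, {upper_single:.3f})")
print(f"two-level OR    = {or_multi:.2f} ({lower_multi:.2f}, {upper_multi:.2f})")
```

The single-level values match the Odds Ratio Estimates table (6.434, with limits 4.356 and 9.503). The interval for the two-level model is not reported in the output above, so treat it as an illustration of the same formula only.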

Multilevel Growth Models

Models that represent individual trajectories over age or time. A common form of growth trajectory is specified for all individuals, but individuals may vary in the parameters that characterize the growth (e.g., slope and intercept for linear growth).

    ID       AGE    WAVE  XFIN2  DXPDSUMA  SEX
    101011  -5.00      1  20.00         0    0
    101011  -4.92      2  20.00         0    0
    101011  -4.83      3  20.00         0    0
    101011  -4.75      4  20.00         0    0
    101011  -4.67      5  20.00         0    0
    101011  -4.58      6  20.00         0    0
    101011  -4.50      7  20.00         0    0
    101011  -4.42      8  20.00         0    0
    101011  -4.33      9  20.00         0    0
    101011  -4.25     10  20.00         0    0
    ..
    101011   3.25    100  62.50         0    0
    101612  -5.00      1   0.00         1    1
    101612  -4.92      2   0.00         1    1
    101612  -4.83      3   0.00         1    1
    101612  -4.75      4   0.00         1    1
    ..
    329110   3.92    108  62.50         1    0
    329110   4.00    109  62.50         1    0
    329110   4.08    110  62.50         1    0
    329110   4.17    111  62.50         1    0
    329110   4.25    112  62.50         1    0
    329110   4.33    113  62.50         1    0

Our Transitions data consist of 27244 monthly records for 233 subjects aged 17 to 27. Subject 101011 is the first subject in our data and has 100 records, only some of which are shown here. Subject 329110 is the last subject in our data and is represented here by a few of the 120 months of data.

We are looking at the relationship between Finance Transitional Level (XFIN2, from 0 to 100), sex (male=1), and any PD diagnosis in adolescence (DXPDSUMA, yes=1). Age is centered at 22, so AGE = 0 corresponds to age 22.

Unconditional Linear Growth Model (1)

SAS Program

    proc mixed method=ml noclprint covtest noitprint;
      class id wave;
      model xfin2 = age / solution ddfm=bw notest;
      random intercept age / type=un subject=id;
      repeated wave / type=ar(1) subject=id;
    run;

RANDOM / TYPE=un specifies the structure of the variance-covariance matrix for the intercepts and slopes (unstructured).
REPEATED / TYPE=ar(1) specifies the structure of the variance-covariance matrix within person (the error-covariance structure, here first-order autoregressive).

Unconditional Linear Growth Model (2)

    Covariance Parameter Estimates (Random Effects)
    Cov Parm         Subject  Estimate  Standard Error  Z Value     Pr Z
    Intercept        ID        81.0862         16.9389     4.79  <0.0001
    Slope            ID         1.0290          0.7877     1.31   0.0957
    Intercept*Slope  ID        14.8738          2.3227     6.40  <0.0001
    AR(1)            ID         0.9436          0.0037   257.41  <0.0001
    Residual                  318.5400         20.7905    15.32  <0.0001

81.0862 is the variance among the 233 intercepts. This value is significantly greater than zero; intercepts vary across persons.

1.0290 is the variance among the 233 slopes. This value is not significantly greater than zero (P = 0.0957), so there is no clear evidence that slopes vary across persons.

ICC for intercepts = 81.0862 / (81.0862 + 14.8738 + 1.0290 + 0.9436 + 318.5400) = 0.195

Unconditional Linear Growth Model (3)

    Solution for Fixed Effects
    Effect     Estimate  Standard Error    DF  t Value  Pr > |t|
    Intercept   44.2848          0.8150   232    54.34    <.0001
    age          3.9377          0.1690  27E3    23.30    <.0001

44.2848 is our estimate of the average intercept across persons (the average finance transition level at age 22).

3.9377 is our estimate of the average slope across persons.

The average young adult has a finance transition level score of 44.2848 at age 22, increasing by about 4 points per year.

Multilevel Growth Model including Level-2 Predictors (1)

SAS Program

    proc mixed method=ml noclprint covtest noitprint;
      class id wave;
      model xfin2 = age sex dxpdsuma age*sex age*dxpdsuma / solution ddfm=bw notest;
      random intercept age / type=un subject=id;
      repeated wave / type=ar(1) subject=id;
    run;

This model adds sex, PD diagnosis, and their interactions with age (their effects on the slope) to the previous model.

Multilevel Growth Model including Level-2 Predictors (2)

    Covariance Parameter Estimates (Random Effects)
    Cov Parm         Subject  Estimate  Standard Error  Z Value     Pr Z
    Intercept        ID        68.6310         16.0446     4.28  <0.0001
    Slope            ID         0.8316          0.7749     1.07   0.1416
    Intercept*Slope  ID        13.3323          2.1768     6.12  <0.0001
    AR(1)            ID         0.9439          0.0037   256.95  <0.0001
    Residual                  319.1200         20.9073    15.26  <0.0001

The variance for the intercepts changed from 81.0862 to 68.6310. Computing (81.0862 - 68.6310) / 81.0862 = 0.154, we find a 15.4% reduction. In other words, sex, PD, and their interactions with age account for 15.4% of the variation in intercepts.

The variance for the slopes changed from 1.0290 to 0.8316. Computing (1.0290 - 0.8316) / 1.0290 = 0.192, we find a 19.2% reduction. In other words, sex, PD, and their interactions with age account for 19.2% of the variation in slopes.

Multilevel Growth Model including Level-2 Predictors (3)

    Effect        Estimate  Standard Error    DF  t Value  Pr > |t|
    Intercept      42.7143          1.1861   230    36.01    <.0001
    age             3.7394          0.2523  27E3    14.82    <.0001
    sex             5.4264          1.5777   230     3.44    0.0007
    dxpdsuma       -4.6373          1.9198   230    -2.42    0.0165
    age*sex         0.6918          0.3361  27E3     2.06    0.0396
    age*dxpdsuma   -0.6013          0.4099  27E3    -1.47    0.1425

42.7143 is our estimate of the intercept for the reference group: the average finance transition level for females without PD at age 22.

3.7394 is our estimate of the slope for the reference group (females without PD), an increase of about 3.7 points per year.

5.4264 is the difference in finance transition level between males and females at age 22. Males score 5.4264 points higher than females.

-4.6373 is the difference in finance transition level between subjects with PD and subjects without PD at age 22. Subjects with PD score 4.6373 points lower than subjects without PD.

0.6918 is the difference in linear slope between males and females. The slope for males is 3.7394 + 0.6918 = 4.4312; the slope for females is 3.7394.
[Figure: Finance Transitional Level (vertical axis labeled SEXPNT) versus AGE from 17 to 27, with separate lines for SEX = Female and SEX = Male. Annotations: at age 22 the female intercept = 42.714; the male intercept is 5.426 higher.]

Here, graphically, is the difference between males and females.

[Figure: Finance Transitional Level versus AGE from 17 to 27, with separate lines for DX = No diagnosis and DX = PD diagnosed in adolescence.]

Here, graphically, is the difference between subjects with adolescent PD and those without.

-4.6373 is the difference in finance transition level between subjects with PD and subjects without PD. Subjects with PD score 4.6373 points lower than subjects without PD.

-0.6013 is the difference in linear slope between subjects with PD and subjects without PD. The slope for subjects with PD is 3.7394 - 0.6013 = 3.1381; the slope for subjects without PD is 3.7394.
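The group-specific intercepts and slopes discussed above can be obtained by plugging the fixed-effect estimates into the model equation. The following is a minimal sketch in Python rather than SAS and is not part of the original lab; the names `FIXED`, `slope`, and `predicted_xfin2` are hypothetical, and the sketch uses the fixed effects only (random effects and the AR(1) error structure are ignored).

```python
# Fixed-effect estimates from the growth model with level-2 predictors.
FIXED = {
    "intercept": 42.7143,
    "age": 3.7394,
    "sex": 5.4264,
    "dxpdsuma": -4.6373,
    "age_sex": 0.6918,
    "age_dxpdsuma": -0.6013,
}
CENTER_AGE = 22  # AGE in the data is age in years minus 22


def slope(male: int, pd_dx: int) -> float:
    """Predicted yearly change in XFIN2 for a given sex and PD status."""
    return (FIXED["age"]
            + FIXED["age_sex"] * male
            + FIXED["age_dxpdsuma"] * pd_dx)


def predicted_xfin2(age_years: float, male: int, pd_dx: int) -> float:
    """Predicted finance transition level at a given age in years."""
    age_c = age_years - CENTER_AGE
    level_at_22 = (FIXED["intercept"]
                   + FIXED["sex"] * male
                   + FIXED["dxpdsuma"] * pd_dx)
    return level_at_22 + slope(male, pd_dx) * age_c


for label, male, pd_dx in [("female, no PD", 0, 0),
                           ("male, no PD", 1, 0),
                           ("female, PD", 0, 1)]:
    print(f"{label}: slope = {slope(male, pd_dx):.4f}, "
          f"level at 22 = {predicted_xfin2(22, male, pd_dx):.4f}")
```

Evaluating `predicted_xfin2` over ages 17 to 27 for each group gives lines of the kind shown in the two figures.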