Newsom Psy 510/610 Multilevel Regression, Spring


Psy 510/610 Multilevel Regression, Spring 2017

Diagnostics

Chapter 10 of Snijders and Bosker (2012) provides a nice overview of the assumption tests that are available. I will illustrate only one statistical test here, a test of whether the within-group variance varies randomly across groups or is a function of the predictors in the model. Formulas for this test are available in Snijders and Bosker (2012) on pp. 159-160 and Raudenbush & Bryk (2002) on p. 264. Heterogeneity problems may arise because of omitted variables, omitted effects, outliers (individuals or groups), or non-normal data.

I illustrate the chi-square test of homogeneity used by Raudenbush and Bryk in the HLM package below. This test follows the Bartlett test (Bartlett & Kendall, 1946) and should be used only when data are normally distributed and there are 10 or more cases per group. I tested a model with SES (group-centered) as a predictor of MATHACH in the HSB data. The rest of the output is the same as usual, so I have omitted it here. When you request the test, a new, small section appears in the output that looks like the following.

    Test of homogeneity of level-1 variance
    χ² statistic = 245.76576
    degrees of freedom = 159
    p-value = 0.000

A significant test indicates that the within-group variance is not equal across groups, as seems to be the case here. As with many variance tests, there can be considerable power when there are many groups, and, thus, it can be difficult to distinguish a minor violation from a major one without some supplemental information, such as visual exploration methods.

Plotting

HLM will produce some residual plots through the Graph Equations option under the File menu. For example, you can check whether the residual variance is equal at all values of X or appears to be equal in two groups (assumption of homogeneity), as shown in the following box-and-whiskers plot of Level-1 residuals. (Use of the sector variable as the Z-focus variable is optional.)
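For reference, the homogeneity statistic reported in the HLM output above can be sketched as follows. This is the form of the statistic as I understand it from Raudenbush and Bryk (2002, p. 264), so check the notation against the source. Here S_j^2 is the residual variance from a separate OLS regression within group j, f_j = n_j - Q - 1 is its degrees of freedom (with Q Level-1 predictors), and J is the number of groups with enough cases to be included.

```latex
d_j = \frac{\ln\!\left(S_j^2\right) - \left[\sum_{j} f_j \ln\!\left(S_j^2\right) \Big/ \sum_{j} f_j\right]}{\left(2 / f_j\right)^{1/2}},
\qquad
H = \sum_{j=1}^{J} d_j^2 \;\overset{\text{approx.}}{\sim}\; \chi^2_{J-1}
\quad \text{under homogeneity of the Level-1 variance.}
```

The 159 degrees of freedom in the output are consistent with J - 1 for the 160 schools in the HSB data.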

[Figure: box-and-whiskers plot of Level-1 residuals by sector (SECTOR = 0, SECTOR = 1)]

Another useful built-in graph for diagnosing assumption violations, such as nonlinearity, heteroscedasticity, or outliers, is a scatter plot of the predicted values and residuals (Graph Equations, level-1 residual v. predicted value). I illustrate this graph using SPSS on the next page, so I do not reproduce the HLM version here.

HLM also produces SPSS, SAS, SYSTAT, STATA, or ASCII files with residuals and other statistics that are useful for examining outliers and regression assumptions. Under the Basic Settings menu you can choose additional variables, file type, and location of the residual file (Level-1 Residual File and Level-2 Residual File buttons). These files allow for many other options for diagnostic graphs.

Here is a list of the variables in the Level-1 residual file by default (when I tested a model with SES as a Level-1 predictor):

    l1resid = Level-1 residual
    fitval = Level-1 fitted (predicted) value
    sigma = square root of σ²
    ses = SES values
    mathach = MATHACH values

Here is a list of variables in the Level-2 residual file:

    nj = number of cases per group
    chipct = expected values on the chi-square distribution (used with Mahalanobis distance for a Q-Q normal probability plot)
    mdist = Mahalanobis distance of the Empirical Bayes coefficients from the fitted value (plot against chipct to check the normality assumption)
    lntotvar = the natural logarithm of the total standard deviation within each unit
    olsrsvar = the natural logarithm of the residual standard deviation within each unit based on its least squares regression
    mdrsvar = the natural logarithm of the residual standard deviation from the final fitted fixed effects model
    ebintrcp = Empirical Bayes intercept estimate
    ebses = Empirical Bayes slope estimate for SES
    olintrcp = Ordinary Least Squares intercept estimate
    olses = Ordinary Least Squares slope estimate
    fvintrcp = fitted (predicted) value of the intercept
    fvses = fitted (predicted) value of the SES slope
    ecintrcp =
    ecses =
    pv00, pv10, pv11, pvc00, pvc10, pvc11 = posterior variance and covariance estimates of τ₀², τ₁₂, and τ₁²

SPSS

Below I generated a few plots using the HLM output values.

[Figure: normal probability plot (scatter plot of chipct against mdist)]

[Figure: scatter plot of EBINTRCP against FVINTRCP (EB residual vs. predicted values, from the Level-2 residual file), used to examine heteroscedasticity or outliers]

You can also save the residuals and predicted values for Level-1 equations from the MIXED command. Here is example syntax that saves residual and predicted values using the HSB data set. [1]

    MIXED mathach WITH cses sector
      /CRITERIA=MXITER(1000) SCORING(1)
      /METHOD = REML
      /PRINT = SOLUTION TESTCOV HISTORY
      /FIXED = cses sector cses*sector | SSTYPE(3)
      /RANDOM = INTERCEPT cses | SUBJECT(schoolid) COVTYPE(UN)
      /SAVE=PRED RESID.

[1] Note: Standard errors and degrees of freedom can also be saved. FIXPRED, PRED, SEFIXP, SEPRED, DFFIXP, DFPRED, and RESID are the possible keywords on the SAVE subcommand. Level-2 residuals are more difficult to obtain, but see Leyland's SPSS review for some SPSS code that will obtain them (http://www.cmm.bristol.ac.uk/learning-training/multilevel-m-software/spss.shtml).

R

Here are a couple of diagnostic plots in R. The HLMdiag package has several nice features.

    library(lme4)
    fm1 <- lmer(mathach ~ ses + (ses | schoolid), data = mydata, REML = FALSE)

    # residual plots from Loy & Hofmann (2014)
    # install before first use
    # install.packages("HLMdiag")
    library(HLMdiag)
    # HLMresid() is the interface in older HLMdiag releases; newer releases provide hlm_resid()
    resid1_fm1 <- HLMresid(fm1, level = 1, type = "LS", standardize = TRUE)

    # level-1 residual plot
    library(ggplot2)
    qplot(x = ses, y = LS.resid, data = resid1_fm1, geom = c("point", "smooth")) +
      ylab("LS level-1 residuals")

[Figure: LS level-1 residuals plotted against ses, with a smoother]

    require("lattice")
    # normal probability plot of level-1 residuals
    # model1: a fitted lmer() model that includes sector (not defined in this handout)
    qqmath(model1, id = 0.05)

    if (require("ggplot2")) {
      ## we can create the same plots using ggplot2 and the fortify() function
      ## (print() is needed because ggplot objects inside a block are not auto-printed)
      model1f <- fortify(model1)
      print(ggplot(model1f, aes(.fitted, .resid)) + geom_point(colour = "blue") +
        facet_grid(. ~ sector) + geom_hline(yintercept = 0))
      ## note: schoolids are ordered by mean mathach
      print(ggplot(model1f, aes(schoolid, .resid)) + geom_boxplot() + coord_flip())
      print(ggplot(model1f, aes(.fitted, mathach)) + geom_point(colour = "blue") +
        facet_wrap(~schoolid) + geom_abline(intercept = 0, slope = 1))
      print(ggplot(model1f, aes(ses, .resid)) + geom_point(colour = "blue") +
        facet_grid(. ~ sector) + geom_hline(yintercept = 0) +
        geom_line(aes(group = schoolid), alpha = 0.4) + geom_smooth(method = "loess"))
      ## (warnings about loess are due to having only 4 unique x values)
      detach("package:ggplot2")
    }

[Figure: normal Q-Q plot of standardized residuals against standard normal quantiles, with extreme points labeled by schoolid]

    # histogram of level-2 residuals
    # resid2_fm2 holds the level-2 residuals (its creation is not shown in this handout)
    hist(resid2_fm2)
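The homogeneity test reported by HLM earlier in this handout has no direct counterpart in the lme4 code above. What follows is a minimal base-R sketch of that kind of statistic, not HLM's actual implementation: it assumes the Raudenbush and Bryk form (standardized deviations of the log OLS residual variances, summed as a chi-square with J - 1 degrees of freedom). The data are simulated, and the names sim and homog_test are hypothetical.

```r
# Sketch of a level-1 variance homogeneity statistic on simulated data.
# Assumption: H = sum(d_j^2), where d_j standardizes ln(S_j^2) around its
# df-weighted mean. All object names here are hypothetical.
set.seed(510)
J <- 40   # number of groups (schools)
n <- 30   # cases per group
sim <- data.frame(
  schoolid = rep(seq_len(J), each = n),
  ses      = rnorm(J * n)
)
# The residual SD differs across schools, so homogeneity is violated by construction
sd_j <- rep(runif(J, 3, 9), each = n)
sim$mathach <- 12 + 2 * sim$ses + rnorm(J * n, sd = sd_j)

homog_test <- function(formula, group, data, min_n = 10) {
  # Fit a separate OLS regression in every group with at least min_n cases
  per_group <- lapply(split(data, data[[group]]), function(d) {
    if (nrow(d) < min_n) return(NULL)
    fit <- lm(formula, data = d)
    c(f = df.residual(fit), s2 = sum(resid(fit)^2) / df.residual(fit))
  })
  per_group <- do.call(rbind, per_group)   # NULL entries are dropped
  f   <- per_group[, "f"]                  # residual df per group
  ls2 <- log(per_group[, "s2"])            # log residual variance per group
  d   <- (ls2 - sum(f * ls2) / sum(f)) / sqrt(2 / f)
  H   <- sum(d^2)
  df  <- length(d) - 1
  list(H = H, df = df, p = pchisq(H, df, lower.tail = FALSE))
}

res <- homog_test(mathach ~ ses, "schoolid", sim)
print(res)
```

With real data you would pass your own data frame and Level-1 formula, for example homog_test(mathach ~ ses, "schoolid", mydata). The result may not match HLM's output exactly, because HLM's handling of small groups and centering may differ from this sketch.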

Normality assumptions are addressed in the handout on robust estimation methods.