Value Added Modeling


Value Added Modeling
Dr. J. Kyle Roberts
Southern Methodist University, Simmons School of Education and Human Development, Department of Teaching and Learning

Background for VAMs

Recall from previous lectures that we can define a growth model as

    Y_ti = π_0i + π_1i t + e_ti
    π_0i = β_0 + u_0i
    π_1i = β_1 + u_1i

This is a two-level model with random intercepts and a random component for the time variable t. It can also be written as a single combined model:

    Y_ti = (β_0 + u_0i) + (β_1 + u_1i) t + e_ti
         = [β_0 + β_1 t] + [u_0i + u_1i t + e_ti]

with δ_i ~ N(0, Ω) and ε_ti ~ N(0, σ²I).
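As a quick sketch, the combined model above can be simulated in base R. All parameter values below are arbitrary illustrations chosen for this sketch, not estimates from the slides:

```r
## Simulate the two-level growth model
##   Y_ti = (beta0 + u0i) + (beta1 + u1i) * t + e_ti
## All numeric values are arbitrary, for illustration only.
set.seed(1)
n_id  <- 100          # number of students
times <- 0:3          # measurement occasions t

beta0 <- 480          # fixed intercept
beta1 <- 43           # fixed slope for time
u0 <- rnorm(n_id, 0, 35)   # random intercepts u_0i
u1 <- rnorm(n_id, 0, 8)    # random slopes u_1i

dat <- expand.grid(t = times, id = seq_len(n_id))
dat$y <- (beta0 + u0[dat$id]) + (beta1 + u1[dat$id]) * dat$t +
  rnorm(nrow(dat), 0, 25)  # level-1 residual e_ti

nrow(dat)   # 400: each of the 100 students contributes 4 occasions
```

A model like fm1 below could then be fit to `dat` to recover these components.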

Setting Up the Model

Before fitting the model, we use two diagnostic plots to examine the data. First, we examine the functional form of the data with within-subject linear fits estimated by OLS; this is accomplished with the lmList function. Second, we examine the variability of the intercepts and slopes to decide which random effects will be allowed to vary; this is accomplished with confint. The dataset we will use is the star dataset in the mlmRev package, which can be loaded by running

> library(mlmRev)

Within-Subject Linear Fits

Before fitting a linear model, it is useful to examine the functional form of the data. Instead of plotting the entire dataset, we take a random sample of 50 students (because the dataset is so large):

> samp <- sample(names(which(table(subset(star,
+     !is.na(math))$id) > 1)), 50)
> datsamp <- subset(star, id %in% samp)
> child.lm <- lmList(math ~ yrs | id, datsamp, na.action = na.omit)

The first two lines take the subset; the last command fits the within-subject OLS regressions and stores them in the object child.lm.

> xyplot(math ~ yrs | id, datsamp, type = c("g",
+     "r", "p"), aspect = "xy", layout = c(17, 3))

[Trellis plot: math vs. yrs for the 50 sampled students, one panel per student, each with a fitted regression line]

> xyplot(..., index.cond = function(x, y) predict(lm(y ~
+     x), list(x = -2.5)))

[Same trellis plot, with the panels reordered by the predicted value at x = -2.5]

Within-Subject Confidence Intervals

In some multilevel packages, one typically fits the unconditional growth model and examines the chi-square statistic associated with the variance components; if the p-value is statistically significant, the random effects are retained. In R this is accomplished a little differently: first we examine the data graphically, and then we use a likelihood ratio test. Creating the visual displays requires the confint command. We have already created an object with the within-subject linear fits, so we can now use the plot function to plot the intervals.

> plot(confint(child.lm))

[Dotplot of the confidence intervals for the (Intercept) and yrs coefficients from the 50 within-subject fits]

Fitting the VAM

First, we fit a linear growth model with random effects for time:

    Y_ti = [µ + β_1(year)] + [δ_0i + δ_1i(year) + ε_ti]

> nograd <- list(gradient = FALSE, niterEM = 0)
> fm1 <- lmer(math ~ yrs + (yrs | id), star, control = nograd)

The portion of code after the control statement turns off the analytic gradient and tells R to use no EM iterations. This speeds up computation when the number of grouping factors becomes exceedingly large.

Model Results

> fm1
Linear mixed model fit by REML
Formula: math ~ yrs + (yrs | id)
   Data: star
    AIC    BIC  logLik deviance REMLdev
 247325 247373 -123656   247311  247313
Random effects:
 Groups   Name        Variance  Std.Dev. Corr
 id       (Intercept) 1506.605  38.8150
          yrs           68.418   8.2715  -0.405
 Residual              600.937  24.5140
Number of obs: 24613, groups: id, 10767

Fixed effects:
            Estimate Std. Error t value
(Intercept) 484.2708     0.4947   978.9
yrs          43.2031     0.1962   220.2

Correlation of Fixed Effects:
    (Intr)
yrs -0.612

Model Modifications

We can now test our original random-effects model against a model in which the time variable (yrs) is not allowed to have random effects:

> fm2 <- lmer(math ~ yrs + (1 | id), star, control = nograd)
> anova(fm1, fm2)
Data: star
Models:
fm2: math ~ yrs + (1 | id)
fm1: math ~ yrs + (yrs | id)
    Df    AIC    BIC  logLik Chisq Chi Df Pr(>Chisq)
fm2  4 247579 247612 -123786
fm1  6 247323 247372 -123656 259.8      2  < 2.2e-16

Testing these models against each other reveals that, indeed, the random effect for yrs is warranted.
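The chi-square statistic in the anova() table is simply twice the difference in log-likelihoods. Using the (rounded) logLik values printed above, we can verify the reported 259.8 by hand:

```r
## Likelihood ratio test by hand, using the rounded logLik values
## from the anova() output above.
ll_fm2 <- -123786     # random intercept only (4 parameters)
ll_fm1 <- -123656     # random intercept and slope (6 parameters)

chisq <- 2 * (ll_fm1 - ll_fm2)
chisq                 # 260, matching the reported 259.8 up to rounding

## Compare to a chi-square distribution with 6 - 4 = 2 df
pchisq(chisq, df = 6 - 4, lower.tail = FALSE)   # far below .001
```

(The small discrepancy from 259.8 comes from the log-likelihoods being printed to the nearest integer.)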

Checking Assumptions

The distributional assumptions underlying fm1 are that the within-group residuals are distributed as ε_ti ~ N(0, σ²I) and that the random effects are distributed as δ_i ~ N(0, Ω). Both of these assumptions must be examined.

> with(fm1, xyplot(resid(.) ~ fitted(.) | gr, type = c("g",
+     "p")))

[Residuals vs. fitted values, conditioned on grade (K, 1, 2, 3)]

> qqmath(~resid(fm1))

[Normal Q-Q plot of the within-group residuals]

> qqmath(~ranef(fm1)$id$`(Intercept)`, cex = 0.7)

[Normal Q-Q plot of the random intercepts for id]

> qqmath(~ranef(fm1)$id$yrs, cex = 0.7)

[Normal Q-Q plot of the random slopes for yrs]

> plot(ranef(fm1))

[Plot of the random effects for id: (Intercept) and yrs]

Extending the Model

The Project STAR dataframe includes unique school IDs, so we can extend this to a three-level model and examine the school effects. We previously noted how the structure of the random effects can be modified to handle additional levels; R can handle an arbitrary number of levels when fitting models with nested random effects. The portion of code (yrs | sch:id) denotes that students are nested within schools. We could also run this portion of the code as (yrs | id) + (yrs | sch), such that we have defined yrs to have random effects for both individuals and schools.
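To see what the nesting notation does, here is a toy illustration with made-up data: the interaction sch:id yields one grouping level per student-within-school combination, which is the grouping that a term like (yrs | sch:id) operates on.

```r
## Toy illustration (hypothetical data): ids 1-2 are reused in two
## schools, but sch:id distinguishes the four distinct students.
toy <- data.frame(
  sch = factor(rep(c("A", "B"), each = 4)),
  id  = factor(rep(1:2, times = 4))
)

## interaction() builds the combined grouping factor that the
## sch:id notation refers to; drop = TRUE keeps only observed cells.
nested <- with(toy, interaction(sch, id, drop = TRUE))
nlevels(nested)   # 4 distinct student-within-school groups
levels(nested)
```

If id values were already unique across schools, (yrs | id) and (yrs | sch:id) would group the data identically.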

Extending the Model (cont.)

We specify the model as:

    Y_ti = [µ + β_1(time)] + [θ_0j(i) + θ_1j(i)(time) + δ_0i + δ_1i(time) + ε_ti]

    θ_j ~ N(0, Φ),  δ_i ~ N(0, Ω),  ε_ti ~ N(0, σ²I)

Examining the Variance Components

As we did before, we use the lmList function to produce within-school confidence intervals. The only difference here is that we have 80 schools, so we do not take a subset.

> school.lm <- lmList(math ~ yrs | sch, data = star)

> plot(confint(school.lm))

[Dotplot of the confidence intervals for the (Intercept) and yrs coefficients for each of the 80 schools]

Fitting the VAM

The trellis plots suggest sufficient variability to proceed with random intercepts and random slopes at the school level.

> sch.lmer <- lmer(math ~ yrs + (yrs | id) + (yrs | sch),
+     star, control = nograd)
> sch.lmer
Linear mixed model fit by REML
Formula: math ~ yrs + (yrs | id) + (yrs | sch)
   Data: star
    AIC    BIC  logLik deviance REMLdev
 244619 244692 -122301   244606  244601
Random effects:
 Groups   Name        Variance  Std.Dev. Corr
 id       (Intercept) 1134.316  33.6796
          yrs           19.026   4.3619  -0.339
 sch      (Intercept)  341.751  18.4865
          yrs           58.019   7.6170  -0.614
 Residual              600.161  24.4982
Number of obs: 24613, groups: id, 10767; sch, 80

Model Testing

Suppose we wanted to modify the structure of the random effects so that we do not model random slopes for yrs:

    Y_ti = [µ + β_1(time)] + [θ_0j(i) + δ_0i + ε_ti]

> sch2.lmer <- lmer(math ~ yrs + (1 | id) + (1 | sch),
+     star, control = nograd)
> anova(sch.lmer, sch2.lmer)
Data: star
Models:
sch2.lmer: math ~ yrs + (1 | id) + (1 | sch)
sch.lmer: math ~ yrs + (yrs | id) + (yrs | sch)
          Df    AIC    BIC  logLik  Chisq Chi Df Pr(>Chisq)
sch2.lmer  5 246225 246265 -123107
sch.lmer   9 244624 244697 -122303 1609.2      4  < 2.2e-16

Looking at Effects

> head(ranef(sch.lmer)$sch)
  (Intercept)       yrs
1   11.003624 -4.254397
2  -30.828722  4.528853
3   16.818051 -5.660053
4   -9.848673  9.203151
5  -17.785526  0.764398
6  -15.086948  3.816181
> head(ranef(sch.lmer)$id)
       (Intercept)          yrs
100017  93.5158906 -4.107405381
100028 -40.9552884  1.798838369
100045  10.4070394 -0.014233537
100064  -0.1676666  0.007364254
100070  49.1877278 -1.999841478
100096  30.7611035  0.212732219