Introduction to the Analysis of Hierarchical and Longitudinal Data


Introduction to the Analysis of Hierarchical and Longitudinal Data
Georges Monette, York University, with Ye Sun
SPIDA, June 7, 2004

Graphical overview of selected concepts:
- Nature of hierarchical models
- Interpretation of components of the models: data space and beta space
- Understanding T
- BLUPs
- Mixed model: marginal or conditional effects
- Contextual and compositional effects

We use one data example: MathAch ~ SES in a sample of Public and Catholic schools used by Bryk and Raudenbush (1992) from the 1982 High School and Beyond survey. The data comprise 7185 U.S. high-school students from 160 schools: 90 public and 70 Catholic.

WARNING: Babel of notation. Early bilingualism an asset.

                          HLM Notation   SAS Notation
Level 1 true effects      β              b
Population fixed effects  γ              β
Random effects variance   T              G matrix
Within-cluster variance   σ²             σ²

Looking at a single school: Public School P4458. Relationship between MathAch and SES.

Regression output:

> summary(lm(MathAch ~ SES, zz1))
Call: lm(formula = MathAch ~ SES, data = zz1)
Residuals:
   Min      1Q   Median    3Q   Max
 -8.66  -2.839  -0.5862 2.428 12.46
Coefficients:
             Value Std. Error t value Pr(>|t|)
(Intercept) 6.9992     1.2380  5.6534   0.0000
SES         1.1318     1.0074  1.1235   0.2671
Residual standard error: 4.463 on 46 degrees of freedom
Multiple R-Squared: 0.02671
F-statistic: 1.262 on 1 and 46 degrees of freedom, the p-value is 0.2671
Correlation of Coefficients:
    (Intercept)
SES       0.854

Model: Y_i = β_0 + β_1 X_i + ε_i, with ε_i iid N(0, σ²)
Fit:   Ŷ_i = b_0 + b_1 X_i
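The within-school fit above can be reproduced in R with lm(). A minimal sketch; since the SPIDA data set is not reproduced here, 'ses' and 'mathach' are simulated stand-ins for one school's data:

```r
# Minimal sketch of the within-school least-squares fit.
# 'ses' and 'mathach' are simulated stand-ins for one school's data.
set.seed(1)
ses <- rnorm(48)                               # 48 students, as in P4458
mathach <- 7 + 1.1 * ses + rnorm(48, sd = 4.5) # Y = beta_0 + beta_1*X + error
fit <- lm(mathach ~ ses)
coef(fit)            # b_0 and b_1: estimates of beta_0 and beta_1
summary(fit)$sigma   # estimate of sigma
```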

Figure 1: Data in one school (Public School 4458, MathAch vs. SES)

Figure 2: Data with least squares line

Figure 3: Data with least squares line and vertical axis at SES = 0

Figure 4: Beta space (P 4458; axes b(SES), b0): every line in data space is represented by a point in beta space. The least squares line corresponds to the point β̂.

Figure 5: Beta space: every line in data space is represented by a point in beta space. The slope of the least squares line, β̂_SES, is obtained by projecting the point β̂ onto the horizontal axis.

Figure 6: Beta space: every line in data space is represented by a point in beta space. The intercept of the least squares line, β̂_0, is obtained by projecting the point β̂ onto the vertical axis.

Figure 7: 95% Scheffe simultaneous confidence ellipse for the intercept and slope of the true line.

Figure 8: The 95% 2-dimensional Scheffe confidence ellipse rejects the hypothesis with 2 constraints: H_0: β_0 = β_1 = 0

Figure 9: 95% confidence interval ellipse (in red). The shadow on any line is a 95% confidence interval for the projection of the intercept and slope onto that line.

Figure 10: Shadows are ordinary 95% confidence intervals. The shadow on the horizontal axis is a 95% confidence interval for the true slope. Note that the interval includes β_1 = 0, so we accept H_0: β_1 = 0.
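The shadow of the ellipse on each axis is the ordinary confidence interval, which R computes directly with confint(). A sketch with simulated stand-in data:

```r
# The shadow of the 95% confidence ellipse on an axis is the ordinary
# 95% confidence interval for that coefficient, given by confint().
set.seed(5)
ses <- rnorm(48)
mathach <- 7 + 1.1 * ses + rnorm(48, sd = 4.5)
fit <- lm(mathach ~ ses)
ci <- confint(fit)   # one row per coefficient: 2.5% and 97.5% limits
ci
# If 0 lies inside the slope row, we do not reject H0: beta_1 = 0.
(ci["ses", 1] < 0) & (0 < ci["ses", 2])
```

The ellipse itself can be drawn with confidenceEllipse() in John Fox's car package.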

Figure 11: Shadows of the 2-dim confidence ellipse are Scheffe confidence intervals adjusted for fishing in 2-D space (but we don't need that now)

Figure 12: Another approach to testing H_0: β_1 = 0: does the line β_1 = 0 cross the ellipse?

Comparing 2 schools: P4458 and C1433

Regression output:

> summary(lm(MathAch ~ SES * Sector, zz))
Call: lm(formula = MathAch ~ SES * Sector, data = zz)
Residuals:
   Min     1Q Median    3Q   Max
 -8.66 -2.797 -0.362 3.031 12.46
Coefficients:
              Value Std. Error t value Pr(>|t|)
(Intercept)  6.9992     1.1618  6.0244   0.0000
SES          1.1318     0.9454  1.1972   0.2348
Sector      11.3997     1.6096  7.0823   0.0000
SES:Sector   0.7225     1.5340  0.4710   0.6390
Residual standard error: 4.188 on 79 degrees of freedom
Multiple R-Squared: 0.7418
F-statistic: 75.67 on 3 and 79 degrees of freedom, the p-value is 0
Correlation of Coefficients:
           (Intercept)     SES  Sector
SES             0.8540
Sector         -0.7218 -0.6164
SES:Sector     -0.5263 -0.6163 -0.0410

> fit$contrasts
$Sector:
         Catholic
Public          0
Catholic        1

Figure 13: Data space: Public school = o, Catholic school = +

Figure 14: Data space: Public school = o, Catholic school = triangles

Figure 15: Data with least squares lines

Figure 16: Interpreting the least squares estimated coefficients

Testing no school (= sector) effect. Two ways:

Method 1: Compare 2 models, with and without the Sector factor.

With Sector (full model):
              Value Std. Error t value Pr(>|t|)
(Intercept)  6.9992     1.1618  6.0244   0.0000
SES          1.1318     0.9454  1.1972   0.2348
Sector      11.3997     1.6096  7.0823   0.0000
SES:Sector   0.7225     1.5340  0.4710   0.6390
Residual standard error: 4.188 on 79 degrees of freedom
Multiple R-Squared: 0.7418
F-statistic: 75.67 on 3 and 79 degrees of freedom, the p-value is 0

Without Sector:
              Value Std. Error t value Pr(>|t|)
(Intercept) 13.4307     0.6054 22.1867   0.0000
SES          5.7215     0.5454 10.4904   0.0000
Residual standard error: 5.3 on 81 degrees of freedom
Multiple R-Squared: 0.576
F-statistic: 110 on 1 and 81 degrees of freedom, the p-value is 1.11e-016

Comparison:

Analysis of Variance Table
Response: MathAch
  Terms           Res.Df       RSS  Test                Df  Sum of Sq   F Value         Pr(F)
1 SES                 81  2275.568
2 SES * Sector        79  1385.619  +Sector+SES:Sector   2   889.9487  25.36987  3.089479e-009

Method 2: Linear hypothesis: H_0: β_Sector = β_Sector:SES = 0

With Sector (full model):
              Value Std. Error t value Pr(>|t|)
(Intercept)  6.9992     1.1618  6.0244   0.0000
SES          1.1318     0.9454  1.1972   0.2348
Sector      11.3997     1.6096  7.0823   0.0000
SES:Sector   0.7225     1.5340  0.4710   0.6390
Residual standard error: 4.188 on 79 degrees of freedom
Multiple R-Squared: 0.7418
F-statistic: 75.67 on 3 and 79 degrees of freedom, the p-value is 0

Test of Linear Hypothesis:

> library(car) # John Fox's Companion to Applied Regression
> L <- diag(4)[c(3, 4), ]
> L
     [,1] [,2] [,3] [,4]
[1,]    0    0    1    0
[2,]    0    0    0    1
> coef(fitfull)
(Intercept)      SES   Sector SES:Sector
   6.999212 1.131837 11.39967  0.7224572
> linear.hypothesis(fitfull, L)
F-Test
SS = 889.9486  SSE = 1385.6188  F = 25.3698  Df = 2 and 79  p = 3.089474e-009
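In current versions of the car package, linear.hypothesis() has been renamed linearHypothesis(). A sketch of the same 2-df test, assuming car is installed; the data here are simulated stand-ins with coefficients roughly matching the output above:

```r
# Sketch of the 2-df linear hypothesis test with the car package.
# In current car versions, linear.hypothesis() is called linearHypothesis().
library(car)
set.seed(2)
ses    <- rnorm(83)
sector <- rep(c(0, 1), length.out = 83)
mathach <- 7 + 1.1 * ses + 11.4 * sector + 0.7 * ses * sector +
           rnorm(83, sd = 4.2)
fitfull <- lm(mathach ~ ses * sector)
L <- diag(4)[c(3, 4), ]          # picks out the sector and ses:sector rows
linearHypothesis(fitfull, L)     # F test of H0: both coefficients are 0
```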

The dual in Beta space

Figure 17: Least squares lines in data space

Figure 18: Least squares lines in beta space

Figure 19: Least squares lines in beta space

Figure 20: Least squares lines in beta space with 95% confidence ellipses

Code in S-Plus (even easier using library car in R):

> plot(0, 0, xlim = c(-5, 10), ylim = c(-5, 25), xlab = "b(ses)", ylab = "b0",
+   main = "P 4458 and C1433: Estimates of int. and slope", type = "n")
> summary(fitfull <- lm(MathAch ~ SES * Sector, zz))
Call: lm(formula = MathAch ~ SES * Sector, data = zz)
Residuals:
   Min     1Q Median    3Q   Max
 -8.66 -2.797 -0.362 3.031 12.46
Coefficients:
              Value Std. Error t value Pr(>|t|)
(Intercept)  6.9992     1.1618  6.0244   0.0000
SES          1.1318     0.9454  1.1972   0.2348
Sector      11.3997     1.6096  7.0823   0.0000
SES:Sector   0.7225     1.5340  0.4710   0.6390
Residual standard error: 4.188 on 79 degrees of freedom
Multiple R-Squared: 0.7418
F-statistic: 75.67 on 3 and 79 degrees of freedom, the p-value is 0
Correlation of Coefficients:
           (Intercept)     SES  Sector
SES             0.8540
Sector         -0.7218 -0.6164
SES:Sector     -0.5263 -0.6163 -0.0410
> coef(fitfull)

(Intercept)      SES   Sector SES:Sector
   6.999212 1.131837 11.39967  0.7224572
> L.Public <- diag(4)[c(2, 1), ]
> L.Public
     [,1] [,2] [,3] [,4]
[1,]    0    1    0    0
[2,]    1    0    0    0
> L.Catholic <- diag(4)[c(2, 1), ] + diag(4)[c(4, 3), ]
> L.Catholic
     [,1] [,2] [,3] [,4]
[1,]    0    1    0    1
[2,]    1    0    1    0
> lines(ell(h.pub <- glh(fitfull, L.Public)), col = 6)
> lines(ell(h.cat <- glh(fitfull, L.Catholic)), col = 8)
> abline(v = h.pub[[1]][1], col = 6)
> abline(v = h.pub[[2]][1], col = 6)
> abline(h = h.pub[[1]][2], col = 8)
> abline(h = h.pub[[2]][2], col = 8)

Comparing 2 Sectors. Some approaches:

Two-stage or "derived variables":
Stage 1: Perform a regression on each school to get β̂ for each school.
Stage 2: Analyze the β̂'s from the schools using multivariate methods to estimate the mean line in each sector; use multivariate methods to compare the two lines.

Doing this graphically:
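The two-stage ("derived variables") idea can be sketched in a few lines of R. Everything here is simulated stand-in data; 'school' plays the role of the school id:

```r
# Two-stage ("derived variables") sketch: fit a separate least-squares
# line in each school, then analyze the per-school coefficients.
set.seed(3)
school <- rep(1:10, each = 20)
ses    <- rnorm(200)
b0 <- rnorm(10, mean = 12, sd = 4)     # true intercepts
b1 <- rnorm(10, mean = 2,  sd = 1)     # true slopes
mathach <- b0[school] + b1[school] * ses + rnorm(200, sd = 4)
# Stage 1: least-squares fit within each school
betas <- t(sapply(split(data.frame(mathach, ses), school),
                  function(d) coef(lm(mathach ~ ses, data = d))))
# Stage 2: summarize the beta-hats (here, their unweighted mean)
colMeans(betas)
```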

Figure 21: Public school regressions in data space

Figure 22: Mean public school regressions

Figure 23: Catholic school regressions in data space

Figure 24: All schools: least squares lines in data space

Figure 25: Mean regression in each sector in data space

Figure 26: All schools: least squares lines in beta space

Figure 27: Standard dispersion ellipse for each sector

Figure 28: Standard dispersion ellipse and 95% confidence ellipse for each sector

What's wrong with this?

If every school sample had:
1: the same number of subjects
2: the same set of values of SES
3: the same σ² (we've been tacitly assuming this anyway)
then the shape of the confidence ellipses in beta space would be identical for every school, and the 2-stage procedure would give the same result as a classical repeated measures analysis.

With the present data, with different N's and varying distributions of SES in each school, the 2-stage process ignores the fact that some β̂_i's are measured with greater precision than others (e.g. if the ith school has a large N or a large spread in SES). We could attempt to correct this problem by using weights equal to Var(β̂_i) = σ²(X_i'X_i)^(-1), but this is the variance of β̂_i as an estimator of the true β_i for the ith school, NOT of the underlying β for the Sector. Thus using σ²(X_i'X_i)^(-1) would provide weights that are too variable from school to school.

Note: To the extent that the schools are similar with respect to predictors, the approach above is plausible -- but we'll see that mixed models are often easier.

Other approach: Fixed effects model: Use School as a categorical predictor. Construct contrasts with 1 degree of freedom for between-Sector comparisons. Drawback: inferences generalize to new students from the same schools, not to new schools from a population of schools.

The Hierarchical Model

Basic structure:
1. Each school has a true regression line that is not directly observed.
2. The observations from each school are generated as random observations around the school's true regression line.
3. The true regression lines come from a population of regression lines.

Within-school model, for school i (for now we suppose all schools come from the same population, e.g. only one Sector):
1) The true but unknown coefficients β_i = (β_0i, β_1i)' for each school.
2) The data are generated as

Y_ij = β_0i + β_1i X_ij + ε_ij,  ε_ij ~ N(0, σ²) independent of the β_i's

Between-school model: the β_i are sampled from a population of schools:

β_i = γ + u_i,  u_i ~ N(0, T)

where

T = [ τ_00  τ_01 ]
    [ τ_10  τ_11 ]

is a variance matrix. Note: Var(β_0i) = τ_00, Var(β_1i) = τ_11, Cov(β_0i, β_1i) = τ_10.

Example: A simulated sample. To generate an example we need to do something with SES, although its distribution is not part of the model: in the model the values of SES are taken as given constants. We will take:

γ = [12]    T = [16  8]    σ² = 20
    [ 2]        [ 8 25]

Once we have generated β_i we generate N_i ~ Poisson(30) and SES ~ N(0, 1).

Here's our first simulated school in detail. For i = 1:

u_i = [-6.756]    β_i = γ + u_i = [12] + [-6.756] = [ 5.244]
      [-6.612]                    [ 2]   [-6.612]   [-4.612]

N_i = 23

SES: -1.05 -0.78 1.05 -1.01 0.77 1.85 0.87 -1.18 0.18 2.08 -1.14 -1.71 -0.64 -0.41 0.86 1.29 0.04 0.23 0.90 0.50 -2.10 -1.89 0.38

ε: 4.46 -0.73 0.30 7.63 -7.03 1.20 -6.23 -4.66 6.17 0.75 -1.43 0.46 3.64 -2.39 2.24 2.60 3.96 0.71 -3.74 3.30 4.42 -4.59 -3.61

Y_ij = β_0i + β_1i SES_ij + ε_ij: 14.53 8.09 0.70 17.56 -5.34 -2.10 -4.99 6.03 10.58 -3.59 9.09 13.57 11.83 4.75 3.51 1.88 9.00 4.91 -2.66 6.24 19.37 9.38 -0.13
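The generating steps above can be sketched in R with the stated parameter values (γ = (12, 2)', T = [16 8; 8 25], σ² = 20); the random draws will of course differ from the ones shown:

```r
# Sketch of the simulation: gamma = (12, 2)', T = [16 8; 8 25],
# sigma^2 = 20, N_i ~ Poisson(30), SES ~ N(0, 1).
set.seed(4)
gamma  <- c(12, 2)
T_mat  <- matrix(c(16, 8, 8, 25), 2, 2)
sigma2 <- 20
# Draw u_i ~ N(0, T) via the Cholesky factor of T
u      <- drop(t(chol(T_mat)) %*% rnorm(2))
beta_i <- gamma + u                    # true line for this school
N_i    <- rpois(1, 30)                 # number of students
ses    <- rnorm(N_i)
y      <- beta_i[1] + beta_i[2] * ses + rnorm(N_i, sd = sqrt(sigma2))
```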

Figure 29: Simulation: mean population regression line γ

Figure 30: Simulated school: true regression line in School 1: β_i = γ + u_i

Figure 31: Simulated school: true regression line and data

Figure 32: Simulated school with true line (dashed), data and least-squares line (solid)

Figure 33: Simulated school: population mean line in beta space

Figure 34: Simulated school: population mean 'line' and standard dispersion ellipse (shadows are means +/- 1 SD)

Figure 35: Simulated school: population mean line and 'true' line for School 1

Figure 36: Simulated school: true line with dispersion ellipse for the least-squares line around the true line.

Figure 37: Simulated school: true line with dispersion ellipse for the least-squares line around the true line, and the least-squares line from School 1.

Figure 38: Simulated school: dispersion ellipse of least-squares lines around the population regression line, which is almost coincident with the dispersion ellipse of the true lines around the population line.

Note that with smaller N, larger σ², or smaller dispersion for SES, the dispersion ellipse for the true β_i (with matrix T) and the dispersion ellipse for β̂_i (with matrix T + σ²(X'X)^(-1)) could differ much more than they do here.

What does T mean? Subtitle: T and the variance of Y

It's fairly clear what T = Var(β_i) means in beta space. What does it mean for data space? The following graphs show 50 true lines in beta space and then in data space.

Figure 39: 50 simulated β_i's in beta space

Figure 40: 50 simulated β_i's in beta space

Figure 41: 50 simulated β_i's in beta space and their mean

Figure 42: Standard data ellipse (blue).

Figure 43: 95% confidence ellipse for the mean regression line (we happen to be lucky to include the true value this time)

What does this look like in data space? i.e. what does T really mean? Frequently: theory is easier in beta space, interpretation is easier in data space.

Figure 44: Population mean line γ = (12, 2)'

Figure 45: Population mean line γ = (12, 2)', and 50 values of β_i = γ + u_i with Var(u_i) = T = [16 8; 8 25]

Figure 46: Standard dispersion bands: over any value of SES, approximately 68% of sampled lines should lie between the bands

Figure 47: Standard dispersion bands with 50 sampled true lines

Figure 48: Width of standard dispersion bands at 3 values of SES

Figure 49: Point of minimum variance: x_min = -τ_01/τ_11 = -8/25 = -0.32

Figure 50: SD at x = 0: √τ_00 = √16 = 4. SD at x = x_min: √(τ_00 - τ_01²/τ_11) = √(16 - 8²/25) = √13.44 = 3.66

To complete the simulation we would still need to generate X's and Y's for the remaining 49 schools and analyze the results, but let's analyze the actual data.

Recap: 50 Simulated Schools: data space

Level 1: Y_ij = β_0i + β_1i X_ij + ε_ij, with ε_ij independent N(0, σ²)

Level 2:
β_0i = γ_00 + γ_01 Sector_i + u_0i
β_1i = γ_10 + γ_11 Sector_i + u_1i

with (u_0i, u_1i)' independently ~ N(0, T), T = [τ_00 τ_01; τ_10 τ_11], and independent of the ε_ij's.

So

E(Y_ij) = E(β_0i + β_1i X_ij + ε_ij)
        = E(β_0i) + E(β_1i X_ij) + E(ε_ij)
        = γ_0 + γ_1 X_ij

and

Var(Y_ij) = Var(β_0i + β_1i X_ij + ε_ij)
          = Var(β_0i + β_1i X_ij) + Var(ε_ij)
          = [1 X_ij] [τ_00 τ_01; τ_10 τ_11] [1 X_ij]' + σ²
          = τ_00 + 2 τ_01 X_ij + τ_11 X_ij² + σ²

The variance of the height of the true line is

Var(β_0i + β_1i X_ij) = [1 X_ij] [τ_00 τ_01; τ_10 τ_11] [1 X_ij]'
                      = τ_00 + 2 τ_01 X_ij + τ_11 X_ij²

This is a quadratic in X which has a minimum of τ_00 - τ_01²/τ_11 at X_min = -τ_01/τ_11.
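The quadratic and its minimum can be checked numerically with the simulation values (τ_00 = 16, τ_01 = 8, τ_11 = 25), recovering the numbers quoted in Figures 49 and 50:

```r
# Variance of the height of the true line as a function of X,
# using the simulation values tau_00 = 16, tau_01 = 8, tau_11 = 25.
tau00 <- 16; tau01 <- 8; tau11 <- 25
var_height <- function(x) tau00 + 2 * tau01 * x + tau11 * x^2
x_min <- -tau01 / tau11                 # point of minimum variance: -0.32
c(x_min     = x_min,
  sd_at_0   = sqrt(var_height(0)),     # sqrt(16) = 4
  sd_at_min = sqrt(var_height(x_min))) # sqrt(13.44) = 3.66
```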

Analysis of data (Public Schools only)

Model:
Level 1: Y_ij = β_0i + β_1i X_ij + ε_ij, with ε_ij independent N(0, σ²)
Level 2:
β_0i = γ_00 + γ_01 Sector_i + u_0i
β_1i = γ_10 + γ_11 Sector_i + u_1i
with (u_0i, u_1i)' independently ~ N(0, [τ_00 τ_01; τ_10 τ_11]) and independent of the ε_ij's.

Reduced model:
Y_ij = β_0i + β_1i X_ij + ε_ij
     = (γ_00 + γ_01 Sector_i + u_0i) + (γ_10 + γ_11 Sector_i + u_1i) X_ij + ε_ij
     = γ_00 + γ_10 X_ij + γ_01 Sector_i + γ_11 Sector_i X_ij   [fixed effects]
       + u_0i + u_1i X_ij                                      [random effects]
       + ε_ij                                                  [micro level error]

If we analyse only one Sector with Sector = 0, then the model is:

Level 1: Same
Level 2:
β_0i = γ_00 + u_0i
β_1i = γ_10 + u_1i
with (u_0i, u_1i)' independently ~ N(0, [τ_00 τ_01; τ_10 τ_11]) and independent of the ε_ij's.

Reduced model:
Y_ij = β_0i + β_1i X_ij + ε_ij
     = (γ_00 + u_0i) + (γ_10 + u_1i) X_ij + ε_ij
     = γ_00 + γ_10 X_ij      [fixed effects]
       + u_0i + u_1i X_ij    [random effects]
       + ε_ij                [micro level error]

A look at the data:

Figure 51: Uniform quantile plots and bar charts (School, MEANSES, Minority, Sex, SES, MathAch, Size, Sector, PRACAD, DISCLIM, HIMINTY, PropFemale)

Figure 52: Uniform quantile plots and bar charts (SecSchool, PropMinority, MeanSES)

Figure 53: Trellis plot of data in each school ordered by mean SES (lowest 30 Public schools)

Figure 54: Trellis plot of data in each school ordered by mean SES (middle 30 Public schools)

Figure 55: Trellis plot of data in each school ordered by mean SES (highest 30 Public schools)


Figure 56: Trellis plot of data in each school ordered by mean SES (lowest 30 Public schools)

Figure 57: Trellis plot of data in each school ordered by mean SES (middle 30 Public schools)

Figure 58: Trellis plot of data in each school ordered by mean SES (highest 30 Public schools)

Figure 59: Least squares estimates of intercept and slope in beta space

Figure 60: Least-squares lines from each school in data space

Each point β̂_i has variance:

Var(β̂_i) = T + σ²(X_i'X_i)^(-1)

If the X_i's are all the same, then all the β̂_i's have the same variance and each gets the same weight in estimating γ = (γ_00, γ_10)'. We can use classical methods. Otherwise... mixed models!!! Here even N_i differs from school to school, so we use a mixed model analysis.

S-Plus library(nlme) output

> fitp <- lme(MathAch ~ SES, dsp, random = ~ 1 + SES | School)
> summary(fitp)
Linear mixed-effects model fit by REML
Data: dsp
      AIC      BIC    loglik
 23914.51 23951.71 -11951.25

Random effects:
Formula: ~ 1 + SES | School
Structure: General positive-definite
               StdDev  Corr
(Intercept) 1.8056132           <- sqrt(τ̂_00)
SES         0.5103541 0.999     <- sqrt(τ̂_11); Corr = τ̂_01/sqrt(τ̂_00 τ̂_11)
Residual    6.3285603           <- σ̂

Fixed effects: MathAch ~ SES
               Value Std.Error   DF  t-value p-value
(Intercept) 11.71820 0.2204700 3551 53.15098  <.0001   <- γ̂_00
SES          3.01583 0.1544383 3551 19.52773  <.0001   <- γ̂_10
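The same random-intercept-and-slope specification runs in R's nlme package. Since the SPIDA data frame dsp is not reproduced here, this sketch uses the Orthodont data that ships with nlme; note the grouping factor after the vertical bar in the random formula:

```r
# Sketch of the lme() call on data bundled with nlme (Orthodont),
# standing in for the MathAch ~ SES fit in the slides.
library(nlme)
fm <- lme(distance ~ age, data = Orthodont,
          random = ~ 1 + age | Subject)   # random intercept and slope
VarCorr(fm)   # estimates of tau_00, tau_11, their correlation, and sigma
fixef(fm)     # gamma-hat: fixed-effect intercept and slope
```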

Figure 61: Least-squares estimates (open circles) and mixed model estimate (solid circle)

Figure 62: Least-squares estimates, mixed model estimate and unweighted mean of least-squares estimates

Figure 63: Least-squares estimates from each school (data space)

Figure 64: Least-squares estimates and mixed model overall estimate

Figure 65: Least-squares estimates, mixed model overall estimate and unweighted average of least-squares lines

Interpreting estimated variances: T̂, the estimated variance of the true β_i

Figure 66: Least-squares estimates and mixed model overall estimate

Figure 67: above + estimated dispersion ellipse for T: note that it has almost collapsed to a line

T̂ has almost collapsed to a line. What does this mean?

The variability in β̂_i has two components: T and σ²(X_i'X_i)^(-1). The REML process finds that it can explain the variability in one dimension entirely with σ²(X_i'X_i)^(-1) and no contribution from T. In this data, the departure from parallelism of the β̂_i's can be explained by the random variation within schools. There is little evidence of differences in true slope from school to school. We can test this hypothesis by fitting a model with only a random intercept and no random slope. But we leave the discussion to later.

Estimating β_i: is least squares all there is?

If we had only one school, the data from that school would be the only source of information (except to a Bayesian) about the true β_i for that school. With a sample of many schools we have two sources of information for the ith school:
1. the data from the school
2. the data from the other schools

The data from the school are generated from the true line; the error of estimation of the true line with β̂_i has variance σ²(X_i'X_i)^(-1). But γ̂ is also an estimate of β_i, with variance T. We can combine them into an optimal estimator by using inverse-variance weights.

If we knew the variances and population parameters:

Source          Estimator   Variance
ith school      β̂_i         σ²(X_i'X_i)^(-1)
Other schools   γ           T

Optimal combination: Best Linear Unbiased Predictor (contrast with Best Linear Unbiased Estimator).

BLUP:

β̃_i = [ (σ²(X_i'X_i)^(-1))^(-1) + T^(-1) ]^(-1) [ (σ²(X_i'X_i)^(-1))^(-1) β̂_i + T^(-1) γ ]
     = [ σ^(-2) X_i'X_i + T^(-1) ]^(-1) [ σ^(-2) X_i'X_i β̂_i + T^(-1) γ ]

EBLUP (Empirical BLUP): same, but use σ̂², T̂, γ̂.

What does this look like in practice?

Note: Small T implies a big T^(-1), and γ gets a large weight.
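The inverse-variance-weighted combination can be computed directly with base R matrix algebra. A sketch using the simulation values; 'beta_hat_i' and 'XtX_i' are illustrative stand-ins for one school's least-squares estimate and design cross-product:

```r
# Sketch of the BLUP as an inverse-variance-weighted combination of the
# school's least-squares estimate and the population estimate gamma.
gamma  <- c(12, 2)
T_mat  <- matrix(c(16, 8, 8, 25), 2, 2)
sigma2 <- 20
XtX_i  <- matrix(c(23, 1.5, 1.5, 30), 2, 2)  # illustrative X_i'X_i
beta_hat_i <- c(9, -1)                       # illustrative LS estimate
V_i  <- sigma2 * solve(XtX_i)                # Var(beta_hat_i | beta_i)
W    <- solve(solve(V_i) + solve(T_mat))     # inverse of combined precision
blup <- W %*% (solve(V_i) %*% beta_hat_i + solve(T_mat) %*% gamma)
drop(blup)   # shrinks beta_hat_i toward gamma; the weights depend on T
```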