Hierarchical Linear Models
Statistics 220, Spring 2005
Copyright © 2005 by Mark E. Irwin

The linear regression model

$$y \sim N(X\beta, \Sigma_y)$$
$$\beta \mid \sigma^2 \sim p(\beta \mid \sigma^2)$$
$$\sigma^2 \sim p(\sigma^2)$$

can be extended to more complex situations. We can put more complex structures on the βs to better describe the structure in the data. In addition to allowing for more structure on the βs, it can also be used to model the measurement error structure Σy.

For example, consider the one-way random effects model discussed earlier:

$$y_{ij} \mid \theta, \sigma^2 \stackrel{ind}{\sim} N(\theta_j, \sigma^2)$$
$$\theta_j \mid \mu, \tau^2 \stackrel{iid}{\sim} N(\mu, \tau^2)$$

This is an equivalent model to (after integrating out the θs)

$$y \mid \mu, \Sigma_y \sim N(\mu \mathbf{1}, \Sigma_y)$$

where

$$\operatorname{Var}(y_i) = \sigma^2 + \tau^2 = \eta^2, \qquad
\operatorname{Cov}(y_{i_1}, y_{i_2}) =
\begin{cases}
\rho\eta^2 & \text{if } i_1 \text{ and } i_2 \text{ are in the same group } j \\
0 & \text{if } i_1 \text{ and } i_2 \text{ are in different groups}
\end{cases}$$
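These moments follow directly by writing each observation as its group mean plus noise; a short derivation (added here for completeness):

$$y_{ij} = \theta_j + \epsilon_{ij}, \qquad \epsilon_{ij} \stackrel{iid}{\sim} N(0, \sigma^2) \text{ independent of } \theta_j \stackrel{iid}{\sim} N(\mu, \tau^2)$$
$$\operatorname{Var}(y_{ij}) = \operatorname{Var}(\theta_j) + \operatorname{Var}(\epsilon_{ij}) = \tau^2 + \sigma^2 = \eta^2$$
$$\operatorname{Cov}(y_{i_1 j}, y_{i_2 j}) = \operatorname{Cov}(\theta_j + \epsilon_{i_1 j},\; \theta_j + \epsilon_{i_2 j}) = \operatorname{Var}(\theta_j) = \tau^2 = \frac{\tau^2}{\eta^2}\,\eta^2 = \rho\eta^2$$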

and

$$\rho = \frac{\tau^2}{\sigma^2 + \tau^2}.$$

In this framework, ρ is often referred to as the intraclass correlation. Note the correspondence with the usual ANOVA formulation of the model. See the text for the regression formulation of the equivalence.

This approach can be used to model the equal correlation structure

$$\Sigma_y = \sigma^2 \begin{pmatrix}
1 & \rho & \rho & \rho \\
\rho & 1 & \rho & \rho \\
\rho & \rho & 1 & \rho \\
\rho & \rho & \rho & 1
\end{pmatrix}$$

discussed last time, as long as ρ ≥ 0 (the case ρ = 0 corresponds to each observation being in its own group). (Note that in general ρ can be negative; however, this hierarchical model cannot be used to deal with that case.)

General Hierarchical Linear Model

$$y \mid X, \beta, \Sigma_y \sim N(X\beta, \Sigma_y)$$
$$\beta \mid X_\beta, \alpha, \Sigma_\beta \sim N(X_\beta \alpha, \Sigma_\beta)$$
$$\alpha \mid \alpha_0, \Sigma_\alpha \sim N(\alpha_0, \Sigma_\alpha)$$

The first term is the likelihood, the second term is the population distribution (process), and the third term is the hyperprior distribution. X is the set of covariates for the responses y and Xβ is the set of covariates for the βs. Often Σy = σ²I, Xβ = 1, and Σβ = σβ²I. Usually the hyperprior parameters α0 and Σα are treated as fixed. Often the noninformative prior p(α) ∝ 1 is used.
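Multiplying the three stages gives the joint posterior, which is the basis for all of the computation below (stated here for reference):

$$p(\beta, \alpha \mid y) \propto N(y \mid X\beta, \Sigma_y)\; N(\beta \mid X_\beta\alpha, \Sigma_\beta)\; N(\alpha \mid \alpha_0, \Sigma_\alpha)$$

Since every factor is normal in β and α, the full conditional distributions of the regression parameters are normal as well.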

Note that this can be treated as a single linear regression with the structure

$$y_* \mid X_*, \gamma, \Sigma_* \sim N(X_* \gamma, \Sigma_*)$$

where γ = (β, α)ᵀ and

$$y_* = \begin{pmatrix} y \\ 0 \\ \alpha_0 \end{pmatrix} \qquad
X_* = \begin{pmatrix} X & 0 \\ I_J & -X_\beta \\ 0 & I_K \end{pmatrix} \qquad
\Sigma_* = \begin{pmatrix} \Sigma_y & 0 & 0 \\ 0 & \Sigma_\beta & 0 \\ 0 & 0 & \Sigma_\alpha \end{pmatrix}$$

While this is sometimes useful for computation, as many conditional distributions just fall out, it is less useful in terms of interpretation.
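As a computational illustration (not from the original slides; names and shapes are hypothetical), here is a minimal numpy sketch of the stacked formulation. Given the variance matrices, the posterior of γ is the usual weighted least squares result N(γ̂, Vγ) with Vγ = (X∗ᵀ Σ∗⁻¹ X∗)⁻¹, so all regression parameters can be drawn jointly in one multivariate normal draw; this is exactly the "all-at-once" Gibbs step discussed later.

    import numpy as np

    rng = np.random.default_rng(0)

    def draw_gamma(y, X, X_beta, alpha0, Sigma_y, Sigma_beta, Sigma_alpha):
        """One all-at-once draw of gamma = (beta, alpha) from its full
        conditional, using the single-regression (stacked) formulation."""
        n, J = X.shape            # J regression coefficients beta
        K = X_beta.shape[1]       # K hyperparameters alpha

        # Stacked response, design matrix, and block-diagonal covariance
        y_star = np.concatenate([y, np.zeros(J), alpha0])
        X_star = np.block([
            [X,                np.zeros((n, K))],
            [np.eye(J),        -X_beta         ],
            [np.zeros((K, J)), np.eye(K)       ],
        ])
        Sigma_star = np.block([
            [Sigma_y,          np.zeros((n, J)), np.zeros((n, K))],
            [np.zeros((J, n)), Sigma_beta,       np.zeros((J, K))],
            [np.zeros((K, n)), np.zeros((K, J)), Sigma_alpha     ],
        ])

        # GLS posterior: V = (X*' Sigma*^-1 X*)^-1, mean = V X*' Sigma*^-1 y*
        W = np.linalg.solve(Sigma_star, X_star)   # Sigma*^-1 X*
        V = np.linalg.inv(X_star.T @ W)
        mean = V @ (W.T @ y_star)
        return rng.multivariate_normal(mean, V)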

Regression Example

Soap Production Waste
y: Amount of scrap
x1: Line speed
x2: Production line (1 or 2)

There are n1 = 15 observations on Line 1 and n2 = 12 observations on Line 2.

[Figure: scatterplot of Amount of Scrap versus Line Speed, with separate point symbols for Line 1 and Line 2.]

Want to fit a model allowing different slopes and intercepts for each production line (i.e. an interaction model).

We can use the following model:

$$y_{ij} \mid \beta_j, \sigma_j^2 \stackrel{ind}{\sim} N(\beta_{0j} + \beta_{1j} x_{1ij},\, \sigma_j^2); \quad i = 1, \ldots, n_j,\; j = 1, 2$$
$$\beta_{0j} \mid \alpha_0 \stackrel{iid}{\sim} N(\alpha_0, 100) \qquad \beta_{1j} \mid \alpha_1 \stackrel{iid}{\sim} N(\alpha_1, 1)$$
$$\alpha_0 \sim N(0, 10^6) \qquad \alpha_1 \sim N(0, 10^6)$$

This model forces the two regression lines to be somewhat similar, though the prior form for the lines is vague.

Note that this does fit into the framework mentioned earlier, with β = (β01, β11, β02, β12)ᵀ, α = (α0, α1)ᵀ, and X and Xβ of the forms

$$X = \begin{pmatrix}
1 & x_{111} & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots \\
1 & x_{1n_1 1} & 0 & 0 \\
0 & 0 & 1 & x_{112} \\
\vdots & \vdots & \vdots & \vdots \\
0 & 0 & 1 & x_{1n_2 2}
\end{pmatrix} \qquad
X_\beta = \begin{pmatrix}
1 & 0 \\
0 & 1 \\
1 & 0 \\
0 & 1
\end{pmatrix}$$
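For this example the design matrices can be assembled directly; a hypothetical numpy sketch, assuming x1 and x2 are the observed line-speed and production-line vectors:

    import numpy as np

    def design_matrices(x1, x2):
        """X (n x 4) with a separate intercept and slope per line, and
        X_beta (4 x 2) mapping (alpha_0, alpha_1) to the four betas."""
        line1 = (x2 == 1).astype(float)
        line2 = (x2 == 2).astype(float)
        X = np.column_stack([line1, line1 * x1, line2, line2 * x1])
        X_beta = np.array([[1.0, 0.0],
                           [0.0, 1.0],
                           [1.0, 0.0],
                           [0.0, 1.0]])
        return X, X_beta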

[Figure: scatterplot of Amount of Scrap versus Line Speed with the posterior mean regression line for each production line.]

The posterior mean lines suggest that the intercepts are quite different but the slopes of the lines are similar, though the slope for line 1 appears to be a bit flatter.

[Figure: posterior histograms of β01, β11, and σ1 for Line 1 (top row) and β02, β12, and σ2 for Line 2 (bottom row).]

The similarity of the slopes is also suggested by the previous histograms and the following posterior summary statistics. It also appears that the variation around the regression lines is similar for the two lines, though the standard deviation appears to be larger for line 1.

Parameter   Mean    SD
β01        95.57   22.56
β02        10.44   19.60
β11         1.156   0.107
β12         1.310   0.088
σ1         23.68    4.82
σ2         20.24    5.03

We can examine whether there is a difference between the slopes by examining the distribution of β11 − β12, and a difference in variation about the regression lines by looking at the distribution of σ1/σ2.

[Figure: posterior histograms of β11 − β12 and σ1/σ2.]

There is marginal evidence for a difference in slopes, as

P[β11 > β12 | y] = 0.125
E[β11 − β12 | y] = −0.154
Med(β11 − β12 | y) = −0.150

There is less evidence for a difference in the σs, as

P[σ1 > σ2 | y] = 0.694
E[σ1/σ2 | y] = 1.24
Med(σ1/σ2 | y) = 1.17

This is also supported by comparing this model with the model where σ1² = σ2²:

Model          DIC    pD
Common σ²      247.2  5.6
Different σ²   249.2  7.0

In this case the smaller model with σ1² = σ2² appears to give the better fit, though it is not a big difference.
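Summaries like those above are simple functionals of the MCMC output; a small hypothetical sketch, assuming beta11, beta12, sigma1, and sigma2 are arrays of posterior draws:

    import numpy as np

    def slope_and_sd_summaries(beta11, beta12, sigma1, sigma2):
        """Posterior probability/mean/median summaries for the slope
        difference and the ratio of residual standard deviations."""
        diff, ratio = beta11 - beta12, sigma1 / sigma2
        return {
            "P(beta11 > beta12 | y)":   np.mean(beta11 > beta12),
            "E(beta11 - beta12 | y)":   np.mean(diff),
            "Med(beta11 - beta12 | y)": np.median(diff),
            "P(sigma1 > sigma2 | y)":   np.mean(sigma1 > sigma2),
            "E(sigma1/sigma2 | y)":     np.mean(ratio),
            "Med(sigma1/sigma2 | y)":   np.median(ratio),
        }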

Fitting Hierarchical Linear Models

Not surprisingly, exact distributional results for these hierarchical models do not exist and Monte Carlo methods are required. The usual approach is some form of Gibbs sampler. There is a wide range of approaches that can be used for sampling the regression parameters.

All-at-once Gibbs: All regression parameters γ = (β, α)ᵀ are drawn jointly given y and the variance parameters. While this is simple in theory, for some problems the dimensionality can be huge and this can be inefficient.

Scalar Gibbs: Draw each parameter separately. This can be much faster, as the dimension of each draw is small. Unfortunately, the chain may mix slowly in some cases.

Blocking Gibbs: Sample the regression parameters in blocks. This helps with the dimensionality problems and will tend to mix faster than scalar Gibbs.

Scalar Gibbs with a linear transformation: By rotating the parameter space, the Markov chain will tend to mix more quickly. Working with ξ = A⁻¹(γ − γ0), where A = Vγ^{1/2}, will mix much better. After this transformation, sample the components of ξ one by one and then transform back at the end of each scan to give γ. This approach can also be used with the blocking Gibbs form. A sketch of the idea is given below.
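To make the transformation idea concrete, here is a minimal numpy sketch (not from the original slides; the bivariate normal target is a stand-in for the full conditional of the regression parameters given the variances). With A taken as the Cholesky factor of V, the components of ξ are independent N(0, 1), so each scalar full conditional is exact and a single scan yields an independent draw of γ.

    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-in target: gamma ~ N(m, V) with strong correlation
    m = np.array([1.0, -2.0])
    V = np.array([[1.0, 0.95],
                  [0.95, 1.0]])

    def scalar_gibbs(n_iter, start):
        """Naive scalar Gibbs: update each coordinate from its full
        conditional. With correlation 0.95 this chain mixes slowly."""
        g = np.asarray(start, dtype=float).copy()
        draws = np.empty((n_iter, 2))
        for t in range(n_iter):
            g[0] = rng.normal(m[0] + V[0, 1] / V[1, 1] * (g[1] - m[1]),
                              np.sqrt(V[0, 0] - V[0, 1] ** 2 / V[1, 1]))
            g[1] = rng.normal(m[1] + V[0, 1] / V[0, 0] * (g[0] - m[0]),
                              np.sqrt(V[1, 1] - V[0, 1] ** 2 / V[0, 0]))
            draws[t] = g
        return draws

    def rotated_scalar_gibbs(n_iter):
        """Scalar Gibbs on xi = A^{-1}(gamma - m) with A = chol(V): the xi
        components are independent N(0, 1), so one scan gives an
        independent draw of gamma."""
        A = np.linalg.cholesky(V)
        draws = np.empty((n_iter, 2))
        for t in range(n_iter):
            xi = rng.standard_normal(2)  # both full conditionals are N(0, 1)
            draws[t] = m + A @ xi        # transform back at the end of the scan
        return draws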

ANOVA

Many hierarchical linear models are examples of ANOVA models. This should not be surprising, as any ANOVA model can be written as a regression model where all predictor variables are indicator variables. In this situation, the βs will fall in blocks corresponding to the different factors in the study.

For example, consider a two-way design with interaction terms:

$$y_{ijk} \stackrel{ind}{\sim} N(\mu + \phi_i + \theta_j + (\phi\theta)_{ij},\, \sigma^2)$$

In this case there are three blocks: the main effects φi and θj, and the interactions (φθ)ij.

A common approach is to put a separate prior structure on each block. For this example, put the following priors on the treatment effects:

$$\phi_i \stackrel{iid}{\sim} N(0, \sigma_\phi^2) \qquad \theta_j \stackrel{iid}{\sim} N(0, \sigma_\theta^2) \qquad (\phi\theta)_{ij} \stackrel{iid}{\sim} N(0, \sigma_{\phi\theta}^2)$$

For the variance parameters, use the conjugate hyperpriors

$$\sigma_\phi^2 \sim \text{Inv-}\chi^2(\nu_\phi, \sigma_{0\phi}^2) \qquad \sigma_\theta^2 \sim \text{Inv-}\chi^2(\nu_\theta, \sigma_{0\theta}^2) \qquad \sigma_{\phi\theta}^2 \sim \text{Inv-}\chi^2(\nu_{\phi\theta}, \sigma_{0\phi\theta}^2)$$

These conjugate forms give scaled inverse-χ² full conditionals, as sketched below.
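As a sketch of why these hyperpriors are convenient (added here, following the usual conjugate algebra): if φ1, …, φI ~iid N(0, σφ²) and σφ² ~ Inv-χ²(νφ, σ0φ²), the full conditional of σφ² is again a scaled inverse-χ², which is easy to draw from.

    import numpy as np

    rng = np.random.default_rng(1)

    def draw_scaled_inv_chisq(nu, s2):
        """One draw from Inv-chi^2(nu, s2), i.e. nu*s2 / chi^2_nu."""
        return nu * s2 / rng.chisquare(nu)

    def draw_sigma2_phi(phi, nu_phi, s2_phi):
        """Full conditional of sigma_phi^2 given the current phi block:
        Inv-chi^2 with df nu_phi + I and scale
        (nu_phi*s2_phi + sum(phi^2)) / (nu_phi + I)."""
        I = len(phi)
        nu_n = nu_phi + I
        s2_n = (nu_phi * s2_phi + np.sum(phi ** 2)) / nu_n
        return draw_scaled_inv_chisq(nu_n, s2_n)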

Note that, as in standard ANOVA analyses, the interaction terms are only included if all of the lower-order effects contained in the interaction are in the model. For example, in a three-way ANOVA, the three-way interaction will only be included if all the main effects and two-way interactions are included in the model.

ANOVA Example

MPG: The effects of driver (4 levels) and car (5 levels) were examined. Each driver drove each car over a 40-mile test course twice.

From the plot of the data, it appears that both driver and car have an effect on gas mileage. As the pattern of MPG for each driver seems to be the same for each car (points are roughly shifted up or down as the car changes), it appears that the interaction effects are small.

[Figure: MPG versus Car (1-5), with separate point symbols for each driver.]

The standard ANOVA analysis agrees with this hypothesis:

Analysis of Variance Table

Response: MPG
            Df  Sum Sq  Mean Sq  F value     Pr(>F)
Car          4  94.713   23.678   134.73  3.664e-14 ***
Driver       3 280.285   93.428   531.60  < 2.2e-16 ***
Car:Driver  12   2.446    0.204     1.16     0.3715
Residuals   20   3.515    0.176
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Inference for Bugs model at "mpg.bug", 5 chains, each with 32000 iterations (first 16000 discarded), n.thin = 40, n.sims = 2000 iterations saved
Time difference of 23 secs

          mean  sd 2.5%  25%  50%  75% 97.5% Rhat n.eff
mu        29.8 2.1 25.4 28.5 30.0 31.2  33.4  1.1    84
phi[1]    -3.0 1.9 -6.3 -4.1 -3.1 -1.9   1.5  1.0    99
phi[2]     4.2 1.9  1.0  3.1  4.1  5.3   8.8  1.0    98
phi[3]    -1.1 1.9 -4.4 -2.2 -1.2  0.0   3.5  1.1    98
phi[4]     0.4 1.9 -3.0 -0.8  0.2  1.4   4.9  1.0    99
theta[1]  -0.9 1.1 -3.0 -1.6 -1.0 -0.4   1.5  1.0   230
theta[2]   2.3 1.1  0.2  1.7  2.2  2.8   4.9  1.0   250
theta[3]  -2.0 1.1 -4.1 -2.6 -2.0 -1.4   0.5  1.0   250
theta[4]   1.2 1.1 -0.9  0.6  1.1  1.8   3.7  1.0   220
theta[5]   0.0 1.1 -2.1 -0.6  0.0  0.6   2.5  1.0   220

              mean  sd 2.5%  25%  50%  75% 97.5% Rhat n.eff
phitheta[1,1] -0.1 0.2 -0.6 -0.2 -0.1  0.0   0.1  1.0  2000
phitheta[1,2]  0.1 0.2 -0.2  0.0  0.0  0.1   0.5  1.0  2000
phitheta[1,3]  0.0 0.1 -0.3  0.0  0.0  0.1   0.4  1.0  2000
phitheta[1,4]  0.0 0.1 -0.3  0.0  0.0  0.1   0.4  1.0  2000
phitheta[1,5]  0.0 0.2 -0.3 -0.1  0.0  0.1   0.3  1.0  1300
phitheta[2,1]  0.0 0.1 -0.3  0.0  0.0  0.1   0.4  1.0  2000
phitheta[2,2]  0.1 0.2 -0.2  0.0  0.0  0.1   0.4  1.0  1100
phitheta[2,3]  0.0 0.1 -0.4 -0.1  0.0  0.0   0.2  1.0  1600
phitheta[2,4]  0.0 0.1 -0.3 -0.1  0.0  0.1   0.3  1.0  2000
phitheta[2,5]  0.0 0.2 -0.4 -0.1  0.0  0.0   0.2  1.0  2000
phitheta[3,1]  0.1 0.2 -0.2  0.0  0.0  0.1   0.5  1.0  2000
phitheta[3,2] -0.1 0.2 -0.5 -0.2 -0.1  0.0   0.2  1.0  2000
phitheta[3,3]  0.0 0.1 -0.4 -0.1  0.0  0.0   0.3  1.0  2000
phitheta[3,4]  0.0 0.2 -0.3 -0.1  0.0  0.1   0.3  1.0  2000
phitheta[3,5]  0.1 0.2 -0.2  0.0  0.0  0.1   0.5  1.0  2000

              mean  sd 2.5%  25%  50%  75% 97.5% Rhat n.eff
phitheta[4,1]  0.0 0.1 -0.3 -0.1  0.0  0.1   0.3  1.0  1900
phitheta[4,2]  0.0 0.1 -0.3 -0.1  0.0  0.1   0.3  1.0  2000
phitheta[4,3]  0.0 0.1 -0.2  0.0  0.0  0.1   0.4  1.0  2000
phitheta[4,4]  0.0 0.1 -0.3 -0.1  0.0  0.0   0.3  1.0  2000
phitheta[4,5]  0.0 0.1 -0.3 -0.1  0.0  0.1   0.3  1.0  2000
sigma          0.4 0.1  0.3  0.4  0.4  0.5   0.6  1.0  2000
sigmaphi       4.0 2.2  1.7  2.5  3.4  4.7  10.2  1.0   890
sigmatheta     2.2 1.2  1.0  1.5  1.9  2.6   5.0  1.0  2000
sigmaphitheta  0.1 0.1  0.0  0.1  0.1  0.2   0.4  1.0  1000
deviance      44.9 6.3 32.4 40.7 44.7 48.9  57.7  1.0  2000

pD = 20.1 and DIC = 65 (using the rule pD = var(deviance)/2)
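The pD rule quoted above is simple to reproduce from the saved deviance draws; a minimal sketch (function name hypothetical):

    import numpy as np

    def dic_from_deviance(deviance):
        """DIC and pD from posterior draws of the deviance, using the
        R2WinBUGS rule pD = var(deviance)/2 and DIC = mean(deviance) + pD."""
        deviance = np.asarray(deviance)
        pd = deviance.var() / 2
        return deviance.mean() + pd, pd

    # With mean(deviance) = 44.9 and sd = 6.3, pD = 6.3^2/2 = 19.8 and
    # DIC = 64.7, close to the reported pD = 20.1 and DIC = 65 (the small
    # gap is rounding in the printed output).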

Bugs model at "C:/Documents and Settings/Mark Irwin/My Documents/Harvard/Courses/Stat 220/R/mpg.bug", 5 chains, each with 32000 iterations mu phi[1] [2] [3] [4] theta[1] [2] [3] [4] [5] phitheta[1,1] [1,2] [1,3] [1,4] [1,5] [2,1] [2,2] [2,3] [2,4] [2,5] [3,1] [3,2] [3,3] [3,4] [3,5] [4,1] [4,2] [4,3] [4,4] [4,5] sigma sigmaphi sigmatheta sigmaphitheta 80% interval for each chain R hat 20 20 0 0 20 20 40 40 1 1.5 2+ 1 1.5 2+ mu phi theta phitheta sigma sigmaphi sigmatheta 35 30 25 10 5 0 5 10 5 0 5 0.5 0 0.5 0.6 0.5 0.4 0.3 8 6 4 2 4 3 2 1 medians and 80% intervals 1 2 3 4 1 2 3 4 5 11 2 3 4 5 2 1 2 3 4 5 3 1 2 3 4 5 4 1 2 3 4 5 0.3 0.2 sigmaphitheta 0.1 0 ANOVA Example 24 deviance 60 50 40 30