De-mystifying random effects models

Similar documents
Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression

Lecture 18: Simple Linear Regression

ST430 Exam 1 with Answers

Hierarchical Linear Models (HLM) Using R Package nlme. Interpretation. 2 = ( x 2) u 0j. e ij

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model

Models for Clustered Data

Models for Clustered Data

Unit 6 - Introduction to linear regression

Lecture 2. The Simple Linear Regression Model: Matrix Approach

Joint modelling of longitudinal measurements and event time data

Correlation and simple linear regression S5

7. Do not estimate values for y using x-values outside the limits of the data given. This is called extrapolation and is not reliable.

Inference for Regression

AMS 7 Correlation and Regression Lecture 8

MS&E 226: Small Data

STAT 3022 Spring 2007

Ch Inference for Linear Regression

ECON The Simple Regression Model

Lecture 8 CORRELATION AND LINEAR REGRESSION

Unit 6 - Simple linear regression

Exam Applied Statistical Regression. Good Luck!

Chapter 27 Summary Inferences for Regression

Linear regression and correlation

Random Intercept Models

Bios 6648: Design & conduct of clinical research

Lecture 4 Multiple linear regression

Regression Models - Introduction

MATH 644: Regression Analysis Methods

Area1 Scaled Score (NAPLEX) .535 ** **.000 N. Sig. (2-tailed)

Lecture (chapter 13): Association between variables measured at the interval-ratio level

Categorical Predictor Variables

Mixed-Effects Pattern-Mixture Models for Incomplete Longitudinal Data. Don Hedeker University of Illinois at Chicago

A Significance Test for the Lasso

INTRODUCING LINEAR REGRESSION MODELS Response or Dependent variable y

Regression Models - Introduction

Linear Regression Model. Badr Missaoui

Stat 411/511 ESTIMATING THE SLOPE AND INTERCEPT. Charlotte Wickham. stat511.cwick.co.nz. Nov

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

Longitudinal Data Analysis

Chapter 10. Simple Linear Regression and Correlation

Simple Linear Regression

Lecture 11: Simple Linear Regression

1. (Rao example 11.15) A study measures oxygen demand (y) (on a log scale) and five explanatory variables (see below). Data are available as

appstats27.notebook April 06, 2017

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

Psych 230. Psychological Measurement and Statistics

A discussion on multiple regression models

UNIVERSITY OF TORONTO Faculty of Arts and Science

Section 3: Simple Linear Regression

Applied Regression Modeling: A Business Approach Chapter 2: Simple Linear Regression Sections

Analysing data: regression and correlation S6 and S7

Asymptotic standard errors of MLE

Practice 2 due today. Assignment from Berndt due Monday. If you double the number of programmers the amount of time it takes doubles. Huh?

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

Simple Linear Regression

1 The Classic Bivariate Least Squares Model

Thursday Morning. Growth Modelling in Mplus. Using a set of repeated continuous measures of bodyweight

Specification Errors, Measurement Errors, Confounding

Linear regression. We have that the estimated mean in linear regression is. ˆµ Y X=x = ˆβ 0 + ˆβ 1 x. The standard error of ˆµ Y X=x is.

Introduction and Background to Multilevel Analysis

Ch 2: Simple Linear Regression

Warm-up Using the given data Create a scatterplot Find the regression line

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X.

BIOSTATISTICAL METHODS

Outline. Statistical inference for linear mixed models. One-way ANOVA in matrix-vector form

Objectives Simple linear regression. Statistical model for linear regression. Estimating the regression parameters

Coping with Additional Sources of Variation: ANCOVA and Random Effects

Multiple Regression. Inference for Multiple Regression and A Case Study. IPS Chapters 11.1 and W.H. Freeman and Company

Introduction to Within-Person Analysis and RM ANOVA

ECON3150/4150 Spring 2015

Chapter 12: Linear regression II

AP Statistics L I N E A R R E G R E S S I O N C H A P 7

11 Correlation and Regression

t-test for b Copyright 2000 Tom Malloy. All rights reserved. Regression

Example: 1982 State SAT Scores (First year state by state data available)

Lecture 19 Multiple (Linear) Regression

Introduction to the Analysis of Hierarchical and Longitudinal Data

Applied Regression Modeling: A Business Approach Chapter 3: Multiple Linear Regression Sections

VIII. ANCOVA. A. Introduction

PAPER 30 APPLIED STATISTICS

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

WU Weiterbildung. Linear Mixed Models

Multiple Linear Regression

: The model hypothesizes a relationship between the variables. The simplest probabilistic model: or.

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z).

Lecture 13. Simple Linear Regression

LECTURE 15: SIMPLE LINEAR REGRESSION I

Chapter 10. Regression. Understandable Statistics Ninth Edition By Brase and Brase Prepared by Yixun Shi Bloomsburg University of Pennsylvania

Testing a Claim about the Difference in 2 Population Means Independent Samples. (there is no difference in Population Means µ 1 µ 2 = 0) against

Correlation and Regression

Global Sensitivity Analysis for Repeated Measures Studies with. Informative Drop-out: A Fully Parametric Approach

ECON Introductory Econometrics. Lecture 17: Experiments

Multiple Linear Regression

An Introduction to Multilevel Models. PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 25: December 7, 2012

Mixed effects models

A (Brief) Introduction to Crossed Random Effects Models for Repeated Measures Data

Lectures 5 & 6: Hypothesis Testing

PAPER 218 STATISTICAL LEARNING IN PRACTICE

Transcription:

De-mystifying random effects models Peter J Diggle Lecture 4, Leahurst, October 2012

Linear regression input variable x factor, covariate, explanatory variable,... output variable y response, end-point, primary outcome,...

A synthetic example y x relationship between x and y can be captured approximately by a straight line scatter about the line is approximately the same at all values of x

Interpreting the linear regression model y y = α + βx+z x intercept α: predicted value of y when x = 0 slope β: predicted change in y for unit change in x residual z: difference between actual and predicted y

Precision how precisely can we estimate the straight-line relationship? how precisely can we predict a future value of y? y x

The sting in the tail: longitudinal studies y x simple linear regression software assumes that data are uncorrelated in longitudinal studies, with repeated measurements on each subject, this is rarely true as a consequence, nominal standard errors, p-values,... are WRONG

> fit1<-lm(y~1+x) > summary(fit1)... plus lots of stuff you don t want to know Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) 3.26789 0.10274 31.81 <2e-16 *** x 0.39286 0.01924 20.41 <2e-16 *** Residual standard error: 0.7817 on 198 degrees of freedom Multiple R-squared: 0.6779, Adjusted R-squared: 0.6763 F-statistic: 416.7 on 1 and 198 DF, p-value: < 2.2e-16

> library(nlme) > fit2<-lme(y~1+x,random=~1 id) > summary(fit2) Linear mixed-effects model fit by REML... plus lots of stuff... Random effects: Formula: ~1 id (Intercept) Residual StdDev: 0.7477531 0.2730349 Fixed effects: y ~ 1+x Value Std.Error DF t-value p-value (Intercept) 3.267887 0.17100989 179 19.10935 0 x 0.392856 0.00672165 179 58.44637 0 Number of Observations: 200 Number of Groups: 20

Random effects random effects can be thought of as missing information on individual subjects that, were it available, would be included in the statistical model to reflect our not knowing what values to use for the random effects, we model them as a random sample from a distribution this induces correlation amongst repeated measurements on the same subjects Example: some subjects are intrinsically high responders, others intrinsically low responders

Correlation doesn t always hurt you Model ˆα SE ˆβ SE fixed effects 3.268 0.103 0.393 0.019 random effects 3.268 0.171 0.393 0.007

Random effects or fixed effects? For our synthetic example, write: Y ij x ij = j th response from i th subject: i = 1,...,n = corresponding value of explanatory variable A random effects model 1. Y ij = α + βx ij + U i + Z ij 2. U i = random effect for subject i 3. Z ij = residual 4. all U i and all Z ij mutually independent 5. different responses on same subject positively correlated: ρ = Var(U) Var(U) + Var(Z)

Random effects or fixed effects? For our synthetic example, write: Y ij x ij = j th response from i th subject: i = 1,...,n = corresponding value of explanatory variable A fixed effects model 1. Y ij = α i + βx ij + Z ij 2. α i = intercept for subject i 3. Z ij = residual 4. all Z ij mutually independent 5. different responses on same subject uncorrelated

Which model is correct? 1. they both are 2. the choice between them depends primarily on why you are analysing the data (a) to find out about the particular subjects in your data (b) to find out about the population from which your subjects were drawn Cases (a) and (b) call for fixed effects and random effects models, respectively 3. and secondarily on considerations of statistical efficiency: for large n, fixed effects model has many more parameters

A philosophical objection to fixed effects? Y ij = mark for student i on exam paper j = 1,...,p Question: what overall mark should you give to student i? Fixed effects model: Y ij = α i + Z ij Z ij N(0,τ 2 ), independent Answer: ˆα i = Ȳ i (observed average mark for student i)) Random effects model: Y ij = A i + Z ij Z ij N(0,τ 2 ), independent A i N(α,σ 2 ), independent Answer: Â i = c ȳ i + (1 c) ȳ c = p/(p + τ 2 /σ 2 )

An RCT of drug therapies for schizophrenia randomised clinical trial of drug therapies three treatments: haloperidol (standard) placebo risperidone (novel) dropout due to inadequate response to treatment Treatment Number of non-dropouts at week 0 1 2 4 6 8 haloperidol 85 83 74 64 46 41 placebo 88 86 70 56 40 29 risperidone 345 340 307 276 229 199 total 518 509 451 396 315 269

The schizophrenia trial data PANSS 0 50 100 150 time (weeks since randomisation)

A summary of the schizophrenia trial data average PANSS score 70 75 80 85 90 95 100 placebo haloperidol risperidone Time (weeks)

A model for the schizophrenia trial data mean response depends on treatment and time two random effects: between subjects (high or low responders) between times within subjects (good and bad days) method of analysis allows for dropouts

PANSS mean response profiles average PANSS score 70 75 80 85 90 95 100 placebo haloperidol risperidone Time (weeks)

What s going on? dropout is selective (high responders more likely to leave) but the data are correlated and this allows the model to infer what you would have seen, had there not been any dropouts which may or may not be what you want

Closing remarks fixed effects describe the variation in average responses of groups of subjects according to their measured characteristics (age, sex, treatment,...) random effects describe variation in subject-specific responses according to their unmeasured characteristics both kinds of model can easily be fitted using open-source software (R), or in various proprietary packages dropout in longitudinal studies can have surprising consequences random effects and parameters are different things: parameters don t change if you re-run an experiment random effects do