Lecture 9 STK3100/4100

27 October 2014

Plan for lecture:
1. Linear mixed models cont.
   - Models accounting for time dependencies (Ch. 6.1)
2. Generalized linear mixed models (GLMM, Ch. 13.1-13.3)
   - Examples
   - R code
   - GLMM - general formulation of the model
   - Likelihood and estimation

Models accounting for time dependencies (Ch. 6.1)

Ex: Abundance of a bird species in Hawaii, 1956-2003 (square root transformed)

[Figure: Moorhen abundance on Kauai plotted against Year, 1956-2003]

Many consecutive observations lie above or below the trend.

Linear regression

$$y_s = \alpha + \beta_1\,\mathrm{Rainfall}_s + \beta_2\,\mathrm{Year}_s + \varepsilon_s, \qquad \varepsilon_s \sim N(0, \sigma^2)$$

> library(nlme)
> fit = gls(Birds ~ Rainfall + Year, na.action = na.omit, data = Hawaii)
> summary(fit)
Generalized least squares fit by REML
  Model: Birds ~ Rainfall + Year
  Data: Hawaii
       AIC      BIC    logLik
  228.4798 235.4305 -110.2399

Coefficients:
                Value Std.Error   t-value p-value
(Intercept) -477.6634  56.41907 -8.466346  0.0000
Rainfall       0.0009   0.04989  0.017245  0.9863
Year           0.2450   0.02847  8.604858  0.0000

Residual standard error: 2.608391
Degrees of freedom: 45 total; 42 residual

- The p-values cannot be trusted if the residuals are dependent.
- Positive dependence gives too small uncertainty estimates and too small p-values.

Time dependencies in residuals

[Figure: residuals from the linear regression plotted against Year]

Still many consecutive observations above or below the trend.

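Time dependency

Uncorrelated (= independent if Gaussian) if

$$\operatorname{cov}[\varepsilon_s, \varepsilon_t] = \sigma^2 \times \begin{cases} 1 & \text{if } s = t \\ 0 & \text{if } s \neq t \end{cases}$$

Extension:

$$\operatorname{cov}[\varepsilon_s, \varepsilon_t] = \sigma^2 \times \begin{cases} 1 & \text{if } s = t \\ h(s,t) & \text{if } s \neq t \end{cases}$$

We will typically assume $h(s,t) = h(|s - t|)$.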

Autocorrelation function (ACF)

$$\hat{h}(v) = \frac{\frac{1}{n-v} \sum_{s=1}^{n-v} (y_s - \bar{y})(y_{s+v} - \bar{y})}{\hat{\sigma}^2}$$

Can be calculated and plotted by the R function acf.
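The formula can be checked numerically on simulated data. The sketch below is only an illustration: the series is simulated, the helper name acf_slide is made up here, and it assumes that the variance estimate in the denominator uses divisor n. Under that assumption the only difference from acf() is that acf() divides the lag-v sum by n rather than n - v.

```r
# Sample autocorrelation at lag v, following the formula on the slide.
# Assumption: sigma^2-hat is the variance estimate with divisor n.
acf_slide <- function(y, v) {
  n <- length(y)
  ybar <- mean(y)
  sigma2 <- mean((y - ybar)^2)
  num <- sum((y[1:(n - v)] - ybar) * (y[(1 + v):n] - ybar)) / (n - v)
  num / sigma2
}

set.seed(1)
y <- as.numeric(arima.sim(list(ar = 0.7), n = 200))  # simulated AR(1) series
n <- length(y)
lags <- 1:5

h_slide <- sapply(lags, function(v) acf_slide(y, v))
h_R <- acf(y, lag.max = 5, plot = FALSE)$acf[lags + 1]  # drop lag 0

# acf() uses divisor n for the lag-v sum, so rescaling makes the two agree
cbind(lag = lags, slide = h_slide, acf = h_R,
      rescaled = h_slide * (n - lags) / n)
```

For short lags and a long series the two versions are close; the difference grows with the lag.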

Ex: ACF for Bird residuals

M0 <- gls(Birds ~ Rainfall + Year, na.action = na.omit, data = Hawaii)
E <- residuals(M0, type = "normalized")
# Put the residuals back in their original positions, with NA where Birds is missing
I1 <- !is.na(Hawaii$Birds)
Efull <- rep(NA, length(Hawaii$Birds))
Efull[I1] <- E
acf(Efull, na.action = na.pass, main = "Auto-correlation plot for residuals")

[Figure: auto-correlation plot for residuals, ACF against Lag (0-15)]

Models accounting for time dependency: Compound symmetry

Marginal model

$$\operatorname{cov}[\varepsilon_s, \varepsilon_t] = \sigma^2 \times \begin{cases} 1 & \text{if } s = t \\ \phi & \text{if } s \neq t \end{cases}$$

M1 <- gls(Birds ~ Rainfall + Year, na.action = na.omit,
          correlation = corCompSymm(form = ~ Year), data = Hawaii)

Same covariance structure as within a group in a random intercept model.
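The link to the random intercept model can be written out in a few lines. This is a sketch in notation introduced here for illustration (group effect $b_i$ with variance $\sigma_b^2$, independent errors $e_{ij}$ with variance $\sigma_e^2$), not notation taken from the slides:

```latex
\begin{align*}
y_{ij} &= x_{ij}^\top \beta + b_i + e_{ij},
  \qquad b_i \sim N(0, \sigma_b^2), \quad e_{ij} \sim N(0, \sigma_e^2) \\
\operatorname{var}[y_{ij}] &= \sigma_b^2 + \sigma_e^2 \\
\operatorname{cov}[y_{ij}, y_{ik}] &= \operatorname{var}[b_i] = \sigma_b^2
  \qquad (j \neq k) \\
\Rightarrow \quad \sigma^2 &= \sigma_b^2 + \sigma_e^2,
  \qquad \phi = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_e^2}
\end{align*}
```

Every pair of observations in a group shares the same correlation, regardless of how far apart they are, which is why this structure cannot describe correlation that decays with the time lag.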

Residual plot for model with compound symmetry

[Figure: auto-correlation plot for residuals, ACF against Lag (0-15)]

A covariance matrix with compound symmetry does not remove the positive autocorrelation.

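Models accounting for time dependency: AR(1)

Assume

$$\varepsilon_s = \phi\,\varepsilon_{s-1} + \eta_s, \qquad \eta_s \overset{\text{iid}}{\sim} N(0, \sigma^2), \qquad -1 < \phi < 1$$

This is called an autoregressive model of order 1, AR(1).

$$\operatorname{cov}[\varepsilon_s, \varepsilon_t] = \sigma^2 \times \begin{cases} 1 & \text{if } s = t \\ \phi^{|s-t|} & \text{if } s \neq t \end{cases}$$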
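The geometric decay of the AR(1) correlation can be checked against the theoretical ACF returned by ARMAacf() in base R. The value of phi below is purely illustrative, and the 5 x 5 matrix assumes equally spaced time points.

```r
phi <- 0.77                      # illustrative value only
lags <- 0:5

rho_formula <- phi^lags          # phi^|s - t| for |s - t| = 0, ..., 5
rho_stats <- as.numeric(ARMAacf(ar = phi, lag.max = 5))  # theoretical AR(1) ACF

# Implied correlation matrix for 5 equally spaced time points
R <- phi^abs(outer(1:5, 1:5, "-"))
round(R, 3)
```

In a stationary AR(1) the variance of the errors equals the innovation variance divided by 1 - phi^2, so the sigma^2 in the covariance formula is best read as that marginal variance.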

AR(1) - R code

M2 <- gls(Birds ~ Rainfall + Year, na.action = na.omit,
          correlation = corAR1(form = ~ Year), data = Hawaii)

[Figure: auto-correlation plot for residuals, ACF against Lag (0-15)]

Correlation structure is removed!

AR(1) - R code

> summary(M2)
Generalized least squares fit by REML
  Model: Birds ~ Rainfall + Year
  Data: Hawaii
       AIC      BIC   logLik
  199.1394 207.8277 -94.5697

Correlation Structure: ARMA(1,0)
 Formula: ~Year
 Parameter estimate(s):
     Phi1
0.7734303

Coefficients:
                Value Std.Error   t-value p-value
(Intercept) -436.4326 138.74948 -3.145472  0.0030
Rainfall      -0.0098   0.03268 -0.300964  0.7649
Year           0.2241   0.07009  3.197828  0.0026

Residual standard error: 2.928588
Degrees of freedom: 45 total; 42 residual

$$\hat{\phi} = \widehat{\operatorname{cor}}[\varepsilon_s, \varepsilon_{s+1}] = 0.77, \qquad \hat{\phi}^2 = \widehat{\operatorname{cor}}[\varepsilon_s, \varepsilon_{s+2}] = 0.598$$

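Autoregressive (AR) models

AR(1):

$$\varepsilon_s = \phi\,\varepsilon_{s-1} + \eta_s, \qquad \eta_s \overset{\text{iid}}{\sim} N(0, \sigma^2), \qquad \operatorname{cor}[\varepsilon_s, \varepsilon_t] = \phi^{|s-t|}$$

AR(p):

$$\varepsilon_s = \phi_1 \varepsilon_{s-1} + \phi_2 \varepsilon_{s-2} + \cdots + \phi_p \varepsilon_{s-p} + \eta_s, \qquad \eta_s \overset{\text{iid}}{\sim} N(0, \sigma^2)$$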

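Moving average (MA) models

MA(1):

$$\varepsilon_s = \theta_1 \eta_{s-1} + \eta_s$$

$$\operatorname{cor}[\varepsilon_s, \varepsilon_t] = \begin{cases} \dfrac{\theta_1}{1 + \theta_1^2} & \text{if } |s - t| = 1 \\ 0 & \text{if } |s - t| > 1 \end{cases}$$

MA(q):

$$\varepsilon_s = \theta_1 \eta_{s-1} + \theta_2 \eta_{s-2} + \cdots + \theta_q \eta_{s-q} + \eta_s$$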
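The cut-off in the MA(1) correlation after lag 1 can be seen from the theoretical ACF computed by ARMAacf() in base R. The value of theta is illustrative only. For an MA(1) process the lag-1 correlation is theta / (1 + theta^2), and all higher lags are zero.

```r
theta <- 0.5   # illustrative value only

# Theoretical autocorrelations of an MA(1) process at lags 0, ..., 4
rho <- as.numeric(ARMAacf(ma = theta, lag.max = 4))
rho

# Lag-1 correlation from the closed-form expression
theta / (1 + theta^2)
```

In contrast to the AR(1) model, where the correlation decays geometrically but never reaches zero, an MA(q) model has exactly zero correlation beyond lag q.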

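ARMA models

ARMA(p,q):

$$\varepsilon_s = \phi_1 \varepsilon_{s-1} + \phi_2 \varepsilon_{s-2} + \cdots + \phi_p \varepsilon_{s-p} + \theta_1 \eta_{s-1} + \theta_2 \eta_{s-2} + \cdots + \theta_q \eta_{s-q} + \eta_s$$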

Fitting different models

# No correlation, compound symmetry, AR(1)
M0 <- gls(Birds ~ Rainfall + Year, na.action = na.omit, data = Hawaii)
M1 <- gls(Birds ~ Rainfall + Year, na.action = na.omit,
          correlation = corCompSymm(form = ~ Year), data = Hawaii)
M2 <- gls(Birds ~ Rainfall + Year, na.action = na.omit,
          correlation = corAR1(form = ~ Year), data = Hawaii)

# ARMA(p,q) structures; the first argument gives starting values for the p + q parameters
arma2 <- corARMA(c(.2, .2), p = 2, q = 0, form = ~ Year)
Marma20 <- gls(Birds ~ Rainfall + Year, na.action = na.omit,
               correlation = arma2, data = Hawaii)
arma21 <- corARMA(c(.2, .2, .2), p = 2, q = 1, form = ~ Year)
Marma21 <- gls(Birds ~ Rainfall + Year, na.action = na.omit,
               correlation = arma21, data = Hawaii)
arma22 <- corARMA(c(.2, .2, .2, .2), p = 2, q = 2, form = ~ Year)
Marma22 <- gls(Birds ~ Rainfall + Year, na.action = na.omit,
               correlation = arma22, data = Hawaii)
arma3 <- corARMA(c(.2, .2, .2), p = 3, q = 0, form = ~ Year)
Marma30 <- gls(Birds ~ Rainfall + Year, na.action = na.omit,
               correlation = arma3, data = Hawaii)

Model selection among several models

> AIC(M0, M1, M2, Marma20, Marma21, Marma22, Marma30)
        df      AIC
M0       4 228.4798
M1       5 230.4798
M2       5 199.1394
Marma20  6 196.8777
Marma21  7 198.8578
Marma22  8 199.6768
Marma30  7 198.8621
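The AIC table can be unpacked a little further using only the numbers printed above. The sketch below re-enters the table by hand (the data frame name tab is introduced here), recovers each log-likelihood from AIC = -2 logLik + 2 df, and ranks the models by their difference from the smallest AIC.

```r
# AIC table copied from the output above
tab <- data.frame(
  model = c("M0", "M1", "M2", "Marma20", "Marma21", "Marma22", "Marma30"),
  df    = c(4, 5, 5, 6, 7, 8, 7),
  AIC   = c(228.4798, 230.4798, 199.1394, 196.8777, 198.8578, 199.6768, 198.8621)
)

tab$logLik <- (2 * tab$df - tab$AIC) / 2   # invert AIC = -2 logLik + 2 df
tab$dAIC   <- tab$AIC - min(tab$AIC)       # difference from the best model

tab[order(tab$AIC), ]
best <- tab$model[which.min(tab$AIC)]
best
```

M0 and M1 have the same log-likelihood, so the compound symmetry parameter costs 2 AIC units without improving the fit, while every AR and ARMA structure lowers the AIC by about 30. All models share the same fixed effects, which is what makes AIC values from REML fits comparable here.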

GLMM related to other model classes

                            Normal   Exponential family
Fixed effects               lm       glm
Fixed and random effects    lmm      glmm

glmm = Generalized Linear Mixed Models

GLMM

- Allow dependencies between observations
- Model structure similar to linear mixed models
- Theory and methods still under development
- Many approaches for estimation
- Documentation is rather technical

Ex: Species richness

- No. of species, RIKZ, measured at 9 beaches/areas
- 5 observations at each beach
- Want to explain variation in RIKZ by NAP and Exposure
- When we analysed these data by a linear mixed model, we assumed that the response was normally distributed
- However, the response is count data, and a Poisson distribution is a more natural assumption

Ex. Species richness: Quasi-Poisson

> Mglm2 = glm(Richness ~ NAP, family = quasipoisson, data = RIKZ)
> summary(Mglm2)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.7910     0.1104  16.218  < 2e-16 ***
NAP          -0.5560     0.1250  -4.448 6.02e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasipoisson family taken to be 3.044178)

    Null deviance: 179.75  on 44  degrees of freedom
Residual deviance: 113.18  on 43  degrees of freedom
AIC: NA

- This model ignores dependencies within beaches.
- With quasipoisson we can account for overdispersion, but we still ignore dependencies among observations in the same group.
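What the dispersion parameter does can be illustrated on simulated data (the RIKZ data are not reproduced here, so all numbers below are made up). The sketch computes the dispersion estimate by hand as the Pearson chi-square statistic divided by the residual degrees of freedom, and shows that the quasi-Poisson standard errors are the Poisson standard errors multiplied by the square root of that estimate.

```r
set.seed(42)
n <- 200
x <- rnorm(n)
mu <- exp(1 + 0.5 * x)
y <- rnbinom(n, mu = mu, size = 2)      # simulated overdispersed counts

fit_quasi <- glm(y ~ x, family = quasipoisson)
fit_pois  <- glm(y ~ x, family = poisson)

# Dispersion estimate: Pearson chi-square / residual degrees of freedom
disp_hand <- sum(residuals(fit_quasi, type = "pearson")^2) / df.residual(fit_quasi)
disp_R <- summary(fit_quasi)$dispersion

# Ratio of quasi-Poisson to Poisson standard errors
se_ratio <- coef(summary(fit_quasi))[, "Std. Error"] /
            coef(summary(fit_pois))[, "Std. Error"]

c(by_hand = disp_hand, summary = disp_R, sqrt_disp = sqrt(disp_R))
se_ratio
```

The point estimates are identical in the two fits; only the standard errors are inflated. Nothing in this adjustment uses the grouping of the observations, which is why the dependence within groups is still ignored.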

Ex. Species richness: GLMM and glmmPQL function

> library(MASS)
> MglmmPQL = glmmPQL(Richness ~ NAP, random = ~ 1 | fbeach, family = poisson, data [call truncated in the original slide]
> summary(MglmmPQL)
Linear mixed-effects model fit by maximum likelihood
Random effects:
 Formula: ~1 | fbeach
        (Intercept) Residual
StdDev:   0.4590787 1.112673

Variance function:
 Structure: fixed weights
 Formula: ~invwt
Fixed effects: Richness ~ NAP
                 Value  Std.Error DF   t-value  p-value
(Intercept)  1.6887218 0.17517518 35  9.640189 2.19e-11
NAP         -0.5058049 0.08592218 35 -5.886779 1.09e-06

Estimates and standard errors differ from the quasi-Poisson fit.

GLMM

- $Y_{ij} \mid b_i$ ($Y_{ij}$ conditioned on $b_i$) are independent and from the same distribution in the exponential family
- $E[Y_{ij} \mid b_i] = \mu_{ij}$
- $g(\mu_{ij}) = X_{ij}\beta + Z_{ij} b_i$
- $b_i \overset{\text{iid}}{\sim} N(0, D)$
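One way to read this formulation is as a recipe for generating data. The sketch below simulates from a Poisson GLMM with log link and a single random intercept per group (so Z_ij = 1 and D is a scalar variance). The 9 x 5 layout mirrors the species richness example, but the covariate, the coefficients and the random-effect standard deviation are all made-up illustrative values.

```r
set.seed(123)
n_groups <- 9                     # groups i = 1, ..., 9
n_per    <- 5                     # observations j = 1, ..., 5 per group
beta     <- c(1.7, -0.5)          # illustrative fixed effects (intercept, slope)
sigma_b  <- 0.5                   # illustrative sd of the random intercept

group <- rep(seq_len(n_groups), each = n_per)
x <- rnorm(n_groups * n_per)      # made-up covariate

b   <- rnorm(n_groups, mean = 0, sd = sigma_b)   # b_i iid N(0, D)
eta <- beta[1] + beta[2] * x + b[group]          # g(mu_ij) = X_ij beta + Z_ij b_i
mu  <- exp(eta)                                  # inverse of the log link
y   <- rpois(length(mu), lambda = mu)            # Y_ij | b_i independent Poisson

sim <- data.frame(y = y, x = x, group = factor(group))
head(sim)
```

Observations in the same group share the same b_i, so they are dependent marginally even though they are independent given b_i.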

Ex: E. cervi L1 in deer

- Ecervi.01: 1 if a deer has E. cervi L1, 0 if not
- fSex: sex of deer
- Length: length of deer
- Farm: farm (24 different farms)

Ex. E. cervi L1 in deer: Ordinary logistic regression

> DE.glm <- glm(Ecervi.01 ~ CLength * fSex, data = DeerEcervi,
+               family = binomial)
> summary(DE.glm)
Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.652409   0.109602   5.953 2.64e-09 ***
CLength       0.025112   0.005576   4.504 6.68e-06 ***
fSex2         0.163873   0.174235   0.941   0.3469
CLength:fSex2 0.020109   0.009722   2.068   0.0386 *

- The data are repeated measurements within the same farm.
- But the model ignores dependencies within farms.

Ex. E. cervi L1 in deer: Ordinary logistic regression including farm as a fixed effect factor

> DE.glm <- glm(Ecervi.01 ~ CLength * fSex + ffarm, data = DeerEcervi,
+               family = binomial)
> anova(DE.glm, test = "Chisq")
             Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                           825    1073.13
CLength       1   64.815       824    1008.31 8.225e-16 ***
fSex          1    0.191       823    1008.12  0.662216
ffarm        23  252.638       800     755.48 < 2.2e-16 ***
CLength:fSex  1    9.984       799     745.50  0.001579 **

Problems

- ffarm is clearly significant, but uses 23 parameters
- Can show that the interaction between ffarm and CLength is also significant, and uses an additional 22 parameters
- How can we predict for a farm without data?

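Ex. E. cervi L1 in deer: GLM vs. GLMM

GLM:

$$Y_{ij} \sim \mathrm{Bin}(1, p_{ij})$$
$$\operatorname{logit}(p_{ij}) = \alpha + \beta_1 \mathrm{Length}_{ij} + \beta_2 \mathrm{Sex}_{ij} + \beta_3 \mathrm{Length}_{ij} \times \mathrm{Sex}_{ij} + \alpha_{\mathrm{Farm},i}$$

GLMM:

$$Y_{ij} \sim \mathrm{Bin}(1, p_{ij})$$
$$\operatorname{logit}(p_{ij}) = \alpha + \beta_1 \mathrm{Length}_{ij} + \beta_2 \mathrm{Sex}_{ij} + \beta_3 \mathrm{Length}_{ij} \times \mathrm{Sex}_{ij} + a_i, \qquad a_i \sim N(0, \sigma_a^2)$$

The size of $\sigma_a^2$ indicates the importance of Farm.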

Ex. E. cervi L1 in deer: GLMM and glmmPQL

glmmPQL is one of several R functions for estimating a GLMM:

> library(MASS)
> DE.PQL <- glmmPQL(Ecervi.01 ~ CLength * fSex,
+                   random = ~ 1 | ffarm, family = binomial, data = DeerEcervi)
> summary(DE.PQL)
Random effects:
 Formula: ~1 | ffarm
        (Intercept)  Residual
StdDev:    1.462108 0.9620576

Variance function:
 Structure: fixed weights
 Formula: ~invwt
Fixed effects: Ecervi.01 ~ CLength * fSex
                  Value Std.Error  DF  t-value p-value
(Intercept)   0.8883697 0.3373283 799 2.633547  0.0086
CLength       0.0378608 0.0065269 799 5.800768  0.0000
fSex2         0.6104570 0.2137293 799 2.856216  0.0044
CLength:fSex2 0.0350666 0.0108558 799 3.230228  0.0013

glmmPQL - interpretation of output

Random effects:
 Formula: ~1 | ffarm
        (Intercept)  Residual
StdDev:    1.462108 0.9620576

$$\hat{\sigma}_a^2 = 1.462108^2 = 2.14$$

Residual StdDev: standard deviation of the working residuals. It does not correspond directly to a parameter in the model!
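To get a feel for what a farm standard deviation of 1.46 means, the estimates can be moved to the probability scale. The sketch below uses the glmmPQL intercept and random-intercept standard deviation from the output above and looks at the central 95% range of farm intercepts. It assumes that CLength is length centred at its mean (the slides do not say so), so that the intercept refers to a deer of average length and the reference sex.

```r
intercept <- 0.8883697    # glmmPQL fixed intercept (logit scale)
sd_farm   <- 1.462108     # glmmPQL StdDev of the farm random intercept

var_farm <- sd_farm^2     # estimated sigma_a^2
var_farm

# Farm-specific intercepts at the 2.5%, 50% and 97.5% points of N(0, sigma_a^2),
# converted to probabilities with the inverse logit
a <- qnorm(c(0.025, 0.5, 0.975), mean = 0, sd = sd_farm)
p <- plogis(intercept + a)
round(p, 3)
```

Under these assumptions the infection probability for an otherwise identical deer ranges from roughly 0.1 on a low-prevalence farm to above 0.95 on a high-prevalence farm, which is a large farm effect.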

GLMM and likelihood

- GLM: Likelihood can be written directly
- LMM: $Y_i$ multivariate normal, likelihood can be written directly
- GLMM: Likelihood contribution from the observations in the i-th group is

$$f(y_i \mid \beta, \theta) = \int_{b_i} f(y_i \mid b_i, \beta)\, f(b_i \mid D)\, db_i = \int_{b_i} \prod_j f(y_{ij} \mid b_i, \beta)\, f(b_i \mid D)\, db_i$$

- Difficult to compute the integral
- Must in addition optimise

$$L(\beta, \theta) = \prod_i f(y_i \mid \beta, \theta)$$

  wrt. $\beta, \theta$, where $D = D(\theta)$
- Very complicated numerical problem
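For a single random intercept the integral is one-dimensional and can be evaluated with integrate() in base R. The sketch below does this for one group in a logistic GLMM. The function name group_lik, the linear predictor values and the random-effect standard deviation are hypothetical and only serve the illustration.

```r
# Marginal likelihood contribution of one group in a random-intercept logistic GLMM:
# integral over a of prod_j f(y_j | a) * f(a), with a ~ N(0, sigma_a^2)
group_lik <- function(y, eta_fixed, sigma_a) {
  integrand <- function(a) {
    sapply(a, function(ai) {
      p <- plogis(eta_fixed + ai)                 # P(Y_j = 1 | a)
      prod(dbinom(y, size = 1, prob = p)) * dnorm(ai, mean = 0, sd = sigma_a)
    })
  }
  integrate(integrand, lower = -Inf, upper = Inf)$value
}

eta_fixed <- c(0.2, -0.4, 1.0)   # hypothetical X_ij beta for three observations
sigma_a   <- 1.5                 # hypothetical random-intercept sd

group_lik(c(1, 0, 1), eta_fixed, sigma_a)

# Sanity check: the marginal probabilities of all 2^3 outcomes should sum to 1
outcomes <- expand.grid(0:1, 0:1, 0:1)
total <- sum(apply(outcomes, 1, group_lik, eta_fixed = eta_fixed, sigma_a = sigma_a))
total
```

A full likelihood evaluation repeats this for every group, and the optimiser repeats that for every candidate value of the parameters. With several random effects per group the integral becomes multi-dimensional, which is where the approximations on the next slide come in.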

Estimation methods

- Maximum likelihood
- REML - difficult, not well understood, probably not much used in practice yet
- Penalised quasi-likelihood
  - optimises a function that is simpler than the likelihood
  - here, quasi-likelihood has a different meaning than earlier in the course
- MCMC and Bayesian methods - not in this course

In addition, numerical approximations are used within these methods:

- Laplace approximation
- Gauss-Hermite integration - approximates integrals by sums
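The idea of approximating an integral by a sum can be made concrete for a one-dimensional normal random effect. Base R has no Gauss-Hermite routine, so the sketch below builds the nodes and weights with the Golub-Welsch eigenvalue method. This is a generic construction, not the internal code of lme4 or glmmML. The function names and the numerical values are illustrative.

```r
# Gauss-Hermite nodes and weights for the weight function exp(-x^2),
# obtained from the eigen-decomposition of the Jacobi matrix (Golub-Welsch)
gauss_hermite <- function(n) {
  off <- sqrt(seq_len(n - 1) / 2)
  J <- matrix(0, n, n)
  J[cbind(1:(n - 1), 2:n)] <- off
  J[cbind(2:n, 1:(n - 1))] <- off
  e <- eigen(J, symmetric = TRUE)
  list(nodes = e$values, weights = sqrt(pi) * e$vectors[1, ]^2)
}

# E[f(b)] for b ~ N(0, sigma^2), approximated by a weighted sum over n nodes
gh_expect <- function(f, sigma, n = 20) {
  gh <- gauss_hermite(n)
  sum(gh$weights * f(sqrt(2) * sigma * gh$nodes)) / sqrt(pi)
}

sigma <- 1.5                                   # illustrative random-effect sd
f <- function(b) plogis(0.9 + b)               # illustrative integrand

gh_second_moment <- gh_expect(function(b) b^2, sigma)   # exact answer is sigma^2
gh_value  <- gh_expect(f, sigma)
int_value <- integrate(function(b) f(b) * dnorm(b, 0, sigma), -Inf, Inf)$value

c(second_moment = gh_second_moment, gauss_hermite = gh_value, integrate = int_value)
```

With 20 nodes the sum reproduces polynomial moments essentially exactly and agrees closely with adaptive numerical integration for the logistic integrand. The Laplace approximation can be viewed as the crudest version of the same idea, using a single evaluation point at the mode of the integrand.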

Ex: Comparison of estimation results

# Penalized quasi-likelihood
> library(MASS)
> DE.PQL <- glmmPQL(Ecervi.01 ~ CLength * fSex,
+                   random = ~ 1 | ffarm, family = binomial, data = DeerEcervi)

# ML: Laplace approximation with lme4
# (glmer replaces lmer(..., family = ) in current versions of lme4)
> library(lme4)
> DE.lme4 <- glmer(Ecervi.01 ~ CLength * fSex + (1 | ffarm),
+                  family = binomial, data = DeerEcervi)

# ML: Laplace approximation with glmmML
> library(glmmML)
> DE.glmmML <- glmmML(Ecervi.01 ~ CLength * fSex,
+                     cluster = ffarm, family = binomial, data = DeerEcervi)

# ML: Gauss-Hermite with glmmML
> DE.glmmML2 <- glmmML(Ecervi.01 ~ CLength * fSex, method = "ghq",
+                      cluster = ffarm, family = binomial, data = DeerEcervi)

Note: None of these use REML.

Ex: Comparison of results

                         Default           Gauss-Hermite (20)
                         Estimate  SE      Estimate  SE
GLM      Intercept       0.652     0.109
         Length          0.025     0.005
         Sex             0.163     0.174
         Length x Sex    0.020     0.009
glmmPQL  Intercept       0.888     0.337
         Length          0.027     0.006
         Sex             0.610     0.213
         Length x Sex    0.034     0.010
lmer     Intercept       0.941     0.354   0.940     0.355
         Length          0.038     0.006   0.039     0.007
         Sex             0.624     0.222   0.624     0.224
         Length x Sex    0.035     0.011   0.036     0.011
glmmML   Intercept       0.939     0.357   0.942     0.361
         Length          0.038     0.006   0.039     0.007
         Sex             0.624     0.224   0.624     0.224
         Length x Sex    0.035     0.111   0.036     0.011