Generalised Linear Mixed Models

1 University of Edinburgh November 4, 2014

2 ANCOVA, Bradley-Terry models, MANCOVA, Meta-analysis, Multi-membership models, Pedigree analysis (animal models), Phylogenetic analysis (comparative approach), Random Regression, Rasch Models, Regression, Ridge Regression, Splines, Survival analysis, Threshold models, Time-series, Varying coefficient models

3 Outline: What is a linear model? What is a random effect? MCMCglmm. Non-Gaussian data. Structured random effects.

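5 Linear Model
> data(Traffic, package="MASS")
> Traffic$year<-as.factor(Traffic$year)  # assumed: year is treated as a factor, as the year1962 design-matrix column implies
A Swedish experiment: on some days everyone is made to drive at the speed limit; on other days everyone may drive as fast as they want. Count how many citizens are killed.
> Traffic[c(1,2,184),]
Columns: year, day, limit, y. Rows 1 and 2 have limit = no; row 184 has limit = yes and y = 9.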

9 Linear Model
Model syntax:
y ~ limit + year + day
Set of simultaneous equations:
$E[y[1]] = 1\beta_1 + (\texttt{limit[1]=="yes"})\beta_2 + (\texttt{year[1]=="1962"})\beta_3 + \texttt{day[1]}\beta_4$
$E[y[2]] = 1\beta_1 + (\texttt{limit[2]=="yes"})\beta_2 + (\texttt{year[2]=="1962"})\beta_3 + \texttt{day[2]}\beta_4$
$\vdots$
$E[y[184]] = 1\beta_1 + (\texttt{limit[184]=="yes"})\beta_2 + (\texttt{year[184]=="1962"})\beta_3 + \texttt{day[184]}\beta_4$
Compact representation (design matrix and parameter vector):
$E[y] = X\beta$
> X<-model.matrix(y~limit+year+day, data=Traffic)
> X[c(1,2,184),]
Columns: (Intercept), limityes, year1962, day
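As a minimal sketch of how the design matrix turns the simultaneous equations into a single matrix product, the following base-R example builds X for a made-up four-row data frame and checks that X times a parameter vector matches the equations written out by hand. The data frame `toy` and the values in `beta` are hypothetical and chosen only for illustration; they are not taken from the Traffic data.

```r
# Toy stand-in for the Traffic data (all values are made up).
toy <- data.frame(
  y     = c(9, 11, 9, 12),
  limit = factor(c("no", "yes", "yes", "no"), levels = c("no", "yes")),
  year  = factor(c("1961", "1961", "1962", "1962")),
  day   = c(1, 2, 1, 2)
)

# Design matrix: intercept, dummy for limit == "yes",
# dummy for year == "1962", and the numeric covariate day.
X <- model.matrix(y ~ limit + year + day, data = toy)

# Hypothetical parameter vector (beta_1, ..., beta_4).
beta <- c(10, -2, 1, 0.5)

# Compact form: E[y] = X beta
Ey_matrix <- as.vector(X %*% beta)

# The same expectations, written out equation by equation.
Ey_manual <- 1 * beta[1] +
  (toy$limit == "yes") * beta[2] +
  (toy$year == "1962") * beta[3] +
  toy$day * beta[4]

print(X)
print(cbind(Ey_matrix, Ey_manual))
```

The two columns printed at the end should agree row by row, which is all that the compact representation claims.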

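12 Linear Model
$E[y] = X\beta$
The full model:
$y \sim N(X\beta, \sigma^2_e I)$
Error structure:
$\sigma^2_e I = \begin{bmatrix} \sigma^2_e & 0 & \cdots & 0 \\ 0 & \sigma^2_e & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2_e \end{bmatrix}$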

14 Linear Model
> library(MCMCglmm)
> m1<-MCMCglmm(y ~ limit + year + day, data=Traffic)
> summary(m1)
Iterations = 3001:12991
Thinning interval = 10
Sample size = 1000
R-structure: ~units (variance component: units)
Location effects: y ~ limit + year + day
Columns: post.mean, l-95% CI, u-95% CI, eff.samp, pMCMC
(Intercept): pMCMC <0.001 ***
limityes: **
year: no significance flag
day: *
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[Numeric estimates and the DIC value were not preserved.]

15 Linear Mixed Model
Random effects:
$E[y[1]] = X[1,]\beta$
$E[y[2]] = X[2,]\beta$
$\vdots$
$E[y[184]] = X[184,]\beta$

20 Linear Mixed Model
Random effects:
$E[y[1]] = X[1,]\beta + (\texttt{day[1]=="1"})u_1 + (\texttt{day[1]=="2"})u_2 + \dots + (\texttt{day[1]=="92"})u_{92}$
$E[y[2]] = X[2,]\beta + (\texttt{day[2]=="1"})u_1 + (\texttt{day[2]=="2"})u_2 + \dots + (\texttt{day[2]=="92"})u_{92}$
$\vdots$
$E[y[184]] = X[184,]\beta + (\texttt{day[184]=="1"})u_1 + (\texttt{day[184]=="2"})u_2 + \dots + (\texttt{day[184]=="92"})u_{92}$
Compact representation (design matrix and parameter vector):
$E[y] = X\beta + Zu = W\theta$
$\theta = \begin{bmatrix} \beta \\ u \end{bmatrix}$
$W = [X, Z]$
> Z<-model.matrix(~as.factor(day)-1, data=Traffic)
> W<-cbind(X,Z)
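The identity Xβ + Zu = Wθ can be checked numerically. The sketch below uses base R only, with a made-up four-row data frame and hypothetical values for `beta` and `u`; it builds Z with one indicator column per day, binds it to X, and compares the two sides.

```r
# Toy predictors (made up); two distinct days, each observed twice.
toy <- data.frame(
  limit = factor(c("no", "yes", "yes", "no")),
  year  = factor(c("1961", "1961", "1962", "1962")),
  day   = c(1, 2, 1, 2)
)

# Fixed-effect design matrix and random-effect (day) incidence matrix.
X <- model.matrix(~ limit + year + day, data = toy)
Z <- model.matrix(~ as.factor(day) - 1, data = toy)
W <- cbind(X, Z)

# Hypothetical fixed effects and day effects.
beta  <- c(10, -2, 1, 0.5)
u     <- c(0.3, -0.3)
theta <- c(beta, u)

lhs <- as.vector(X %*% beta + Z %*% u)  # X beta + Z u
rhs <- as.vector(W %*% theta)           # W theta

print(cbind(lhs, rhs))
```

Each row of Z contains a single 1, in the column of the day that observation belongs to, so Zu simply picks out the day effect for each observation.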

22 Linear Mixed Model
Fixed effects: $\beta \sim N(0, \sigma^2_\beta I)$. $\sigma^2_\beta$ is not estimated, and is usually assumed to be large (or, often, infinite in non-Bayesian models).
Random effects: $u \sim N(0, \sigma^2_u I)$. $\sigma^2_u$ is estimated.

24 Linear Mixed Model
> m2<-MCMCglmm(y ~ limit + year + day, random=~day, data=Traffic)
> summary(m2)
Iterations = 3001:12991
Thinning interval = 10
Sample size = 1000
G-structure: ~day (variance component: day)
R-structure: ~units (variance component: units)
Location effects: y ~ limit + year + day
Columns: post.mean, l-95% CI, u-95% CI, eff.samp, pMCMC
(Intercept): pMCMC <0.001 ***
limityes: pMCMC <0.001 ***
year: no significance flag
day: no significance flag
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[Numeric estimates and the DIC value were not preserved.]

25 Linear Mixed Model: Credible Intervals
> plot(m2$VCV)
Figure: Time-series of MCMC output (left) and smoothed posterior distribution (right) for the variance components. Panels: trace and density of day, trace and density of units (N = 1000).

26 Linear Mixed Model: Credible Intervals
> plot(cbind(m2$VCV), type="l")
Figure: MCMC trace through the joint posterior distribution for the two variance components (axes: units and day).

27 Linear Mixed Model: Credible Intervals
> r2<-m2$VCV[,"day"]/(m2$VCV[,"day"]+m2$VCV[,"units"])
Figure: Time-series of MCMC output (left) and smoothed posterior distribution (right) for the proportion of variance explained by day (N = 1000).
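Because the ratio is computed draw by draw, its posterior mean and credible interval come straight from the resulting vector. The sketch below illustrates this with simulated draws standing in for the posterior samples of the two variance components; the matrix `vcv` and its gamma-distributed columns are purely hypothetical and are not output from m2. It reports an equal-tailed interval with `quantile()`; a highest-posterior-density interval (for example via `HPDinterval()` in the coda package) would be an alternative.

```r
set.seed(1)

# Hypothetical stand-in for 1000 posterior draws of the two variance
# components (columns named as in the model output).
vcv <- cbind(
  day   = rgamma(1000, shape = 4,  rate = 2),
  units = rgamma(1000, shape = 20, rate = 2)
)

# Proportion of variance explained by day, computed for every draw.
r2 <- vcv[, "day"] / (vcv[, "day"] + vcv[, "units"])

# Posterior summary of the derived quantity.
post_mean <- mean(r2)
ci_95     <- quantile(r2, probs = c(0.025, 0.975))

print(post_mean)
print(ci_95)
```

Working with the draws directly means the uncertainty in both variance components is carried through to the ratio without any approximation.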

28 Linear Model Diagnostics
> hist(Traffic$y-predict(m2))
Figure: Histogram of residuals (x-axis: Residual, y-axis: Frequency) from model m1 which assumes they followed a Normal distribution.

31 Generalised Linear Model
Link function g(): log
$\log(E[y]) = X\beta$
$E[y] = \log^{-1}(X\beta)$
$E[y] = \exp(X\beta)$
Distribution: Poisson
$y \sim \mathrm{Pois}(\lambda = \exp(X\beta))$
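To make the log link concrete, here is a small sketch that simulates Poisson counts with a known linear predictor and fits a standard (non-Bayesian) Poisson GLM with base R's `glm()`. The sample size, the covariate `x` and the coefficients in `beta` are assumptions made for the illustration and have nothing to do with the Traffic data.

```r
set.seed(42)

n    <- 500
x    <- rnorm(n)
beta <- c(1, 0.5)                      # hypothetical intercept and slope (log scale)

# E[y] = exp(X beta); the counts are Poisson around that mean.
lambda <- exp(beta[1] + beta[2] * x)
y      <- rpois(n, lambda)

# Poisson GLM with the log link.
fit <- glm(y ~ x, family = poisson(link = "log"))

# The linear predictor lives on the log scale; exponentiating it
# gives the fitted means.
link_scale <- predict(fit)             # X beta-hat
mean_scale <- fitted(fit)              # exp(X beta-hat)

print(coef(fit))
```

The estimated coefficients should lie close to the values used in the simulation, and the fitted means are exactly the exponentiated linear predictor.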

35 Generalised Linear Mixed Model: MCMCglmm
A latent variable $l$, where $g^{-1}(l)$ is the distribution parameter:
$y \sim \mathrm{Pois}(\lambda = \exp(l))$
then apply a standard linear model for the latent variables:
$l \sim N(W\theta, I\sigma^2_e)$
Standard Poisson glm assumes $\sigma^2_e = 0$.
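The role of the residual variance on the latent scale can be seen in a quick simulation. The sketch below, which uses base R and made-up values for the linear predictor `eta` and for `sigma2_e`, compares counts generated with no latent residual (a standard Poisson) against counts generated with a normally distributed latent residual. With a residual, the counts are overdispersed, and their mean is exp(eta + sigma2_e / 2), the mean of a log-normal variable, rather than exp(eta).

```r
set.seed(1)

n        <- 1e5
eta      <- 1      # hypothetical value of the linear predictor W theta
sigma2_e <- 0.5    # hypothetical latent residual variance

# Standard Poisson: latent variable equals the linear predictor.
y_pois <- rpois(n, exp(eta))

# Latent-variable formulation: l ~ N(eta, sigma2_e), y ~ Pois(exp(l)).
l     <- rnorm(n, mean = eta, sd = sqrt(sigma2_e))
y_lat <- rpois(n, exp(l))

# Mean and variance under each formulation.
summary_tab <- rbind(
  poisson = c(mean = mean(y_pois), var = var(y_pois)),
  latent  = c(mean = mean(y_lat),  var = var(y_lat))
)

print(summary_tab)
print(exp(eta + sigma2_e / 2))   # theoretical mean with a latent residual
```

In the first row the variance is close to the mean; in the second it is several times larger, which is the overdispersion that the units variance absorbs.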

37 Generalised Linear Model: Poisson
> m3<-MCMCglmm(y ~ limit + year + day, data=Traffic, family="poisson")
> summary(m3)
Iterations = 3001:12991
Thinning interval = 10
Sample size = 1000
R-structure: ~units (variance component: units)
Location effects: y ~ limit + year + day
Columns: post.mean, l-95% CI, u-95% CI, eff.samp, pMCMC
(Intercept): pMCMC <0.001 ***
limityes: **
year: no significance flag
day: *
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[Numeric estimates and the DIC value were not preserved.]

39 Generalised Linear Model: Poisson
> prior<-list(R=list(V=0.01, fix=1))  # fix the residual (units) variance at 0.01
> m4<-MCMCglmm(y ~ limit + year + day, data=Traffic, family="poisson", prior=prior)
> summary(m4)
Iterations = 3001:12991
Thinning interval = 10
Sample size = 1000
R-structure: ~units (variance component: units)
Location effects: y ~ limit + year + day
Columns: post.mean, l-95% CI, u-95% CI, eff.samp, pMCMC
(Intercept): pMCMC <0.001 ***
limityes: pMCMC <0.001 ***
year: no significance flag
day: pMCMC <0.001 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[Numeric estimates and the DIC value were not preserved.]

41 Generalised Linear Mixed Model: Poisson
> m5<-MCMCglmm(y ~ limit + year + day, random=~day, data=Traffic, family="poisson")
> summary(m5)
Iterations = 3001:12991
Thinning interval = 10
Sample size = 1000
G-structure: ~day (variance component: day)
R-structure: ~units (variance component: units)
Location effects: y ~ limit + year + day
Columns: post.mean, l-95% CI, u-95% CI, eff.samp, pMCMC
(Intercept): pMCMC <0.001 ***
limityes: pMCMC <0.001 ***
year: **
day: no significance flag
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[Numeric estimates and the DIC value were not preserved.]

42 Generalised Linear Mixed Model: Poisson
Figure: Time-series of MCMC output (left) and smoothed posterior distribution (right) with flat (improper) priors on the variance components. Panels: trace and density of day, trace and density of units (N = 1000).

44 Generalised Linear Model
> # R: prior for the residual variance; G1: parameter-expanded prior for the day variance
> prior<-list(R=list(V=1, nu=0.002), G=list(G1=list(V=1, nu=1, alpha.mu=0, alpha.V=1000)))
> m5.b<-MCMCglmm(y ~ limit + year + day, random=~day, data=Traffic, family="poisson",
+ prior=prior, nitt=13000*10, thin=10*10, burnin=3000*10)
> plot(m5.b$VCV)
Figure: Time-series of MCMC output (left) and smoothed posterior distribution (right) with proper priors on the variance components. Panels: trace and density of day, trace and density of units (N = 1000).
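Multiplying nitt, thin and burnin by the same factor lengthens the chain while keeping the number of stored samples unchanged, which is why the plots still report N = 1000. The arithmetic can be sketched in base R as follows; `stored_iterations()` is a hypothetical helper written for this illustration and is not part of MCMCglmm.

```r
# Hypothetical helper: which iterations are kept, given the total number
# of iterations (nitt), the burn-in and the thinning interval.
stored_iterations <- function(nitt, burnin, thin) {
  seq(burnin + 1, nitt, by = thin)
}

# Settings matching the summaries earlier in the slides
# (Iterations = 3001:12991, Thinning interval = 10, Sample size = 1000).
short_run <- stored_iterations(nitt = 13000, burnin = 3000, thin = 10)

# Settings used for m5.b: everything multiplied by 10.
long_run <- stored_iterations(nitt = 13000 * 10, burnin = 3000 * 10, thin = 10 * 10)

print(range(short_run)); print(length(short_run))
print(range(long_run));  print(length(long_run))
```

Both runs keep 1000 samples, but in the longer run successive samples are 100 iterations apart rather than 10, which reduces autocorrelation between them.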

45 Distributions: Binomial, Multinomial, Gaussian, Poisson, Ordinal, Exponential, Geometric, Threshold, Zero-inflated Poisson, Zero-altered Poisson, Hurdle Poisson, Zero-inflated Binomial, Censored Gaussian, Censored Poisson, Censored Exponential

47 Generalised Linear Mixed Model: Binary
A latent variable $l$, where $g^{-1}(l)$ is the distribution parameter:
$y \sim \mathrm{Binom}(\Pr = \mathrm{probit}^{-1}(l))$
then apply a standard linear model for the latent variables:
$l \sim N(W\theta, I\sigma^2_e)$

49 Generalised Linear Mixed Model: Binary
> Traffic$y2<-as.numeric(Traffic$y>20)  # binary response: more than 20 killed
> m6<-MCMCglmm(y2 ~ limit + year + day, random=~day, data=Traffic,
+ family="ordinal", slice=TRUE)
> plot(m6$VCV)
Figure: Time-series of MCMC output (left) and smoothed posterior distribution (right) for variance components. Panels: trace and density of day (bandwidth = 6.716e+13), trace and density of units (bandwidth = 8.95e+13); N = 1000; the variance axes run from 0 to about 4e+15.

51 Generalised Linear Mixed Model: Binary
> # fix the residual (units) variance at 1
> prior<-list(R=list(V=1, fix=1), G=list(G1=list(V=1, nu=1, alpha.mu=0, alpha.V=1000)))
> m7<-MCMCglmm(y2 ~ limit + year + day, random=~day, data=Traffic,
+ family="ordinal", slice=TRUE, prior=prior)
> plot(m7$VCV)
Figure: Time-series of MCMC output (left) and smoothed posterior distribution (right) for variance components. Panels: trace and density of day (N = 1000), trace and density of units.

53 Generalised Linear Mixed Model: Binary
> # fix the residual (units) variance at 0.5
> prior<-list(R=list(V=0.5, fix=1), G=list(G1=list(V=1, nu=1, alpha.mu=0, alpha.V=1000)))
> m8<-MCMCglmm(y2 ~ limit + year + day, random=~day, data=Traffic,
+ family="ordinal", slice=TRUE, prior=prior)
> plot(mcmc.list(m7$VCV, m8$VCV))
Figure: Time-series of MCMC output (left) and smoothed posterior distribution (right) for variance components with $\sigma^2_e = 1$ (black) and $\sigma^2_e = 0.5$ (red). Panels: trace and density of day, trace and density of units (N = 1000).

54 Generalised Linear Mixed Model: Binary
> plot(mcmc.list(m7$Sol[,"limityes"], m8$Sol[,"limityes"]))
Figure: Time-series of MCMC output (left) and smoothed posterior distribution (right) for the effect of a speed limit with $\sigma^2_e = 1$ (black) and $\sigma^2_e = 0.5$ (red); N = 1000.

56 Generalised Linear Mixed Model: Binary
> res.7<-m7$Sol[,"limityes"]/sqrt(m7$VCV[,"units"]+1)
> res.8<-m8$Sol[,"limityes"]/sqrt(m8$VCV[,"units"]+1)
> plot(mcmc.list(res.7, res.8))
Figure: Time-series of MCMC output (left) and smoothed posterior distribution (right) for the scaled effect of a speed limit with $\sigma^2_e = 1$ (black) and $\sigma^2_e = 0.5$ (red); N = 1000.
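The division by the square root of the units variance plus one follows from averaging the probit probability over the latent residual: if l = η + e with e ~ N(0, σ²_e), then the expected value of probit⁻¹(l) is probit⁻¹(η / sqrt(1 + σ²_e)). The base-R sketch below checks this numerically with `integrate()`. The helper names and the value of `eta_scaled` are hypothetical, chosen only to show that two different residual variances give the same probability once the linear predictor is rescaled.

```r
# Marginal probability obtained by integrating the probit probability
# over the latent residual e ~ N(0, sigma2_e).
marginal_prob <- function(eta, sigma2_e) {
  integrate(
    function(e) pnorm(eta + e) * dnorm(e, sd = sqrt(sigma2_e)),
    lower = -Inf, upper = Inf
  )$value
}

# Closed form: rescale the linear predictor by sqrt(1 + sigma2_e).
scaled_prob <- function(eta, sigma2_e) {
  pnorm(eta / sqrt(1 + sigma2_e))
}

# A hypothetical effect on the rescaled (sigma2_e = 0) scale ...
eta_scaled <- 0.8

# ... corresponds to different raw linear predictors under the two
# residual variances, yet implies the same probability.
eta_1  <- eta_scaled * sqrt(1 + 1)     # raw scale when sigma2_e = 1
eta_05 <- eta_scaled * sqrt(1 + 0.5)   # raw scale when sigma2_e = 0.5

p_1  <- marginal_prob(eta_1, 1)
p_05 <- marginal_prob(eta_05, 0.5)

print(c(p_1, p_05, pnorm(eta_scaled)))
```

This is why the raw effects differ between the two fixed residual variances while the rescaled effects can be compared directly.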

57 Generalised Linear Mixed Model: Binary
$\Pr = \mathrm{probit}^{-1}(l)$
[Figure, built up over five slides: probability Pr plotted against liability, with the linear predictor Wθ, the residual e, their sum Wθ + e, the deviate ε and σ_e marked.]

63 Generalised Linear Mixed Model: Binary
> prior<-list(R=list(V=1, fix=1), G=list(G1=list(V=1, nu=1, alpha.mu=0, alpha.V=1000)))
> m9<-MCMCglmm(y2 ~ limit + year + day, random=~day, data=Traffic,
+ family="threshold", prior=prior)
> plot(mcmc.list(res.7, res.8, m9$Sol[,"limityes"]))
Figure: Time-series of MCMC output (left) and smoothed posterior distribution (right) for the scaled effect of a speed limit with $\sigma^2_e = 1$ (black), $\sigma^2_e = 0.5$ (red) and $\sigma^2_e = 0$ (green); N = 1000.
