Hierarchical generalized linear models: a Lego approach to mixed models


1 Hierarchical generalized linear models: a Lego approach to mixed models. Lars Rönnegård, Högskolan Dalarna and the Swedish University of Agricultural Sciences. Trondheim Seminar.

2 Affiliations: Borlänge and Uppsala.

3 Outline. Hierarchical Generalized Linear Models: principle, method, and the iterative GLM fitting algorithm. The hglm package in R: fitting a spatial CAR model. Using extensions of GLMs (DGLM, HGLM and DHGLM) in genetics: finding genes for uniformity; modelling uniformity in animal breeding.

4 Principles, Methods and Algorithms. Table: Overview of statistical principles.
Principle | Method | Algorithm
Bayesian | min. loss / max. posterior | MCMC, INLA
Extended likelihood | h-likelihood | N-R, IRWLS
Likelihood | maximum likelihood | N-R, Fisher scoring
Frequentist | method of moments | Monte Carlo

5 Books on the extended likelihood principle: Pawitan Y (2001) In All Likelihood; Lee Y, Nelder JA, Pawitan Y (2006) Generalized Linear Models with Random Effects.

6 Some h-likelihood people

7 Definition of the h-likelihood. Lee & Nelder's (1996) hierarchical log-likelihood (h-likelihood): $h(\beta, \theta, u) = \log f(y \mid u) + \log f(u)$. Classical inference: uses the marginal likelihood (random effects integrated out), $\int f(y, u)\,du$; includes fixed parameters, and only observations are treated as random. Bayesian inference: a probabilistic framework that combines likelihood and prior information; treats all parameters and observations as random. Extended likelihood inference: the h-likelihood is based on the extended likelihood principle, which states that all information in the data about the random and fixed effects is included in a joint likelihood (of which the h-likelihood is an implementation); includes fixed parameters, unobserved random effects, and observations as random.

9 Extended likelihood. Likelihood Principle: Birnbaum (1962) showed that the classical likelihood function contains all the information about the value of the fixed parameter. Extended Likelihood Principle: Bjørnstad (1996) showed that all information in the data $y$ for parameters $\theta$ and unobservables $u$ is in the extended likelihood.

10 The h-likelihood is the extended likelihood applied to HGLMs. Hierarchical Generalized Linear Models: generalized linear models with random effects. Both the response $y$ and the random effects $u$ can come from a wide range of distributions. Inference and model selection tools are available.

11 h-likelihood estimation for HGLM. Estimating fixed and random effects: $\partial h/\partial\beta = 0$ and $\partial h/\partial u = 0$. Estimating variance components using the adjusted profile likelihood: $$h_p = \left(h + \tfrac{1}{2}\log|2\pi D^{-1}|\right)\Big|_{\beta=\hat\beta,\,u=\hat u},$$ solving $\partial h_p/\partial\theta = 0$, where $D$ is the matrix of second derivatives of $h$ around $\beta = \hat\beta$, $u = \hat u$.

13 The h-likelihood for a linear mixed model. For a linear mixed model $y = X\beta + Zu + e$ with $u \sim N(0, I_k\sigma_u^2)$ and $e \sim N(0, I_n\sigma_e^2)$, all we need is the normal density function ($n$ iid observations with mean 0): $$f(x) = (2\pi\sigma^2)^{-n/2}\exp\left(-\frac{1}{2\sigma^2}x'x\right), \quad \text{i.e.} \quad \log f(x) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}x'x.$$ We have $e = y - X\beta - Zu$, so $$h(\beta,\theta,u) = \log f(y \mid u) + \log f(u) = \log f(e) + \log f(u) = \left\{-\frac{n}{2}\log(\sigma_e^2) - \frac{1}{2\sigma_e^2}e'e\right\} + \left\{-\frac{k}{2}\log(\sigma_u^2) - \frac{1}{2\sigma_u^2}u'u\right\}.$$

16 h-likelihood estimation for a linear mixed model. For the linear mixed model we have $$h = -\frac{n}{2}\log(\sigma_e^2) - \frac{1}{2\sigma_e^2}e'e - \frac{k}{2}\log(\sigma_u^2) - \frac{1}{2\sigma_u^2}u'u.$$ Setting the first derivatives equal to zero (i.e. $\partial h/\partial\beta = 0$ and $\partial h/\partial u = 0$) gives the standard (Henderson's) mixed model equations for estimating fixed and random effects. Estimating the variance components (REML): solve $\partial h_p/\partial\theta = 0$ with $$h_p = \left(h + \tfrac{1}{2}\log|2\pi D^{-1}|\right)\Big|_{\beta=\hat\beta,\,u=\hat u} = -\frac{n}{2}\log(\sigma_e^2) - \frac{1}{2\sigma_e^2}\hat e'\hat e - \frac{k}{2}\log(\sigma_u^2) - \frac{1}{2\sigma_u^2}\hat u'\hat u - \frac{1}{2}\log|D| + \text{const},$$ where $$D = \begin{pmatrix} X'X/\sigma_e^2 & X'Z/\sigma_e^2 \\ Z'X/\sigma_e^2 & Z'Z/\sigma_e^2 + I_k/\sigma_u^2 \end{pmatrix}.$$

18 A linear model. To start with, consider a linear model with only fixed effects: $y \sim N(X\beta, \sigma_e^2)$, which can also be written as $y = X\beta + e$, $e \sim N(0, \sigma_e^2)$. How can this model be fitted? Maximum likelihood: $\hat\beta = (X'X)^{-1}X'y$ and $\hat\sigma_e^2 = \frac{1}{n}(y - X\hat\beta)'(y - X\hat\beta)$. Unbiased residual variance estimate: $\hat\sigma_e^2 = \frac{1}{n-p}(y - X\hat\beta)'(y - X\hat\beta)$.
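
As a concrete illustration, here is a minimal R sketch of these two estimators on simulated data (variable names and sizes are illustrative, not from the slides):

set.seed(1)
n <- 100; p <- 2
X <- cbind(1, rnorm(n))                    # design matrix with intercept
y <- drop(X %*% c(2, 0.5)) + rnorm(n)      # simulate y = X beta + e
beta.hat <- solve(t(X) %*% X, t(X) %*% y)  # (X'X)^{-1} X'y
rss <- sum((y - X %*% beta.hat)^2)
rss / n        # maximum likelihood estimate of sigma_e^2 (biased)
rss / (n - p)  # unbiased residual variance estimate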

20 Using GLM. Basic idea: if the fixed effects in the mean part of the model (i.e. $\beta$) were known, then the squared residuals satisfy $e_i^2 \sim \sigma_e^2\chi_1^2$ (for observation $i$), i.e. they are gamma distributed. So the squared residuals may be fitted using a GLM with a gamma distribution and a log link function. But $\beta$ is estimated, not known, and $V(\hat e_i) = (1 - h_{ii})\sigma_e^2$, where $h_{ii}$ are the diagonal elements of the hat matrix $H = X(X'X)^{-1}X'$ (so that $\hat y = Hy$). So $\hat e_i^2/(1 - h_{ii})$ can be fitted using a GLM with a gamma distribution, a log link function, and weights $(1 - h_{ii})/2$.
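
The following R sketch shows the idea on simulated data (a hedged example; with an intercept-only gamma GLM, exp of the intercept recovers a single sigma_e^2):

set.seed(1)
n <- 100
x <- rnorm(n)
y <- 2 + 0.5 * x + rnorm(n)
fit <- lm(y ~ x)
h <- hatvalues(fit)          # diagonal elements of the hat matrix
d <- resid(fit)^2 / (1 - h)  # leverage-corrected squared residuals
disp <- glm(d ~ 1, family = Gamma(link = "log"), weights = (1 - h) / 2)
exp(coef(disp))              # estimate of sigma_e^2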

22 A heteroscedastic linear model. Consider now a linear model with only fixed effects, both in the mean and dispersion parts: $y \sim N(X\beta, \exp(X_d\beta_d))$, which can also be written as $y = X\beta + e$, $e \sim N(0, \sigma_e^2)$, $\log(\sigma_e^2) = X_d\beta_d$. How can this model be fitted? Iterate between a linear model and a GLM (implemented in the R package dglm): estimate $\beta$ for a given residual variance; estimate $\beta_d$ by fitting $\hat e_i^2/(1 - h_{ii})$ as response variable in a gamma GLM (log link) with linear predictor $X_d\beta_d$.
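
A minimal R sketch of this iteration on simulated data (variable names, effect sizes and the fixed number of iterations are assumptions for illustration; the dglm package packages the same loop):

set.seed(2)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = exp(0.25 + 0.5 * x))  # log-linear variance
w <- rep(1, n)                   # working weights, 1/sigma_i^2
for (i in 1:10) {
  mfit <- lm(y ~ x, weights = w) # mean model given the variances
  h <- hatvalues(mfit)
  d <- resid(mfit)^2 / (1 - h)
  dfit <- glm(d ~ x, family = Gamma(link = "log"), weights = (1 - h) / 2)
  w <- 1 / fitted(dfit)          # update weights from fitted variances
}
coef(mfit)  # estimates of beta
coef(dfit)  # estimates of beta_d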

24 Can this be used for linear mixed models? $y = X\beta + Zu + e$, $V = ZZ'\sigma_u^2 + I_n\sigma_e^2$. Re-write it as an augmented weighted linear model: $y_a = T\delta + e_a$, where $$y_a = \begin{pmatrix} y \\ 0 \end{pmatrix}, \quad T = \begin{pmatrix} X & Z \\ 0 & I_k \end{pmatrix}, \quad \delta = \begin{pmatrix} \beta \\ u \end{pmatrix}, \quad e_a = \begin{pmatrix} e \\ -u \end{pmatrix}.$$ The variance-covariance matrix of the augmented residual vector is given by $$V(e_a) = W^{-1} = \begin{pmatrix} I_n\sigma_e^2 & 0 \\ 0 & I_k\sigma_u^2 \end{pmatrix}.$$ The estimates from weighted least squares are given by $T'WT\hat\delta = T'Wy_a$. This is identical to Henderson's mixed model equations, where the left-hand side can be verified to be $$T'WT = \begin{pmatrix} X'X/\sigma_e^2 & X'Z/\sigma_e^2 \\ Z'X/\sigma_e^2 & Z'Z/\sigma_e^2 + I_k/\sigma_u^2 \end{pmatrix}.$$
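
A small self-contained R sketch of this augmented construction, with the variance components taken as known for simplicity (all names are illustrative):

set.seed(3)
n <- 120; k <- 10
grp <- rep(1:k, each = n / k)
Z <- model.matrix(~ factor(grp) - 1)  # incidence matrix (n x k)
X <- cbind(1, rnorm(n))
y <- drop(X %*% c(1, 0.5) + Z %*% rnorm(k, sd = 2) + rnorm(n))
sig2e <- 1; sig2u <- 4                # assumed known here
ya <- c(y, rep(0, k))                 # augmented response
Taug <- rbind(cbind(X, Z),
              cbind(matrix(0, k, ncol(X)), diag(k)))
W <- diag(c(rep(1 / sig2e, n), rep(1 / sig2u, k)))
delta.hat <- solve(t(Taug) %*% W %*% Taug, t(Taug) %*% W %*% ya)
delta.hat  # first ncol(X) elements: beta-hat; last k: BLUPs of u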

28 Use the same method as before. $\sigma_e^2$ is estimated by applying a gamma GLM to the response $\hat e_i^2/(1 - h_{ii})$ with weights $(1 - h_{ii})/2$, where the index $i$ goes from 1 to $n$; similarly for $\sigma_u^2$. Hat values are given by the diagonal elements of $H = T(T'WT)^{-1}T'W$. It is possible to have fixed effects in the linear predictor for estimating $\sigma_e^2$ (and $\sigma_u^2$). Random effects can also be added in this gamma GLM: Double Hierarchical Generalized Linear Models (DHGLM), which can be estimated using a second layer in the iterative GLM algorithm.
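
Continuing the augmented-model sketch above, the hat values and the two gamma-GLM variance updates could look as follows (a sketch of one update step under those assumptions, not the hglm internals):

H <- Taug %*% solve(t(Taug) %*% W %*% Taug, t(Taug) %*% W)
hv <- diag(H)                            # augmented hat values
ea <- drop(ya - Taug %*% delta.hat)      # augmented residuals
de <- ea[1:n]^2 / (1 - hv[1:n])          # responses for sigma_e^2
du <- ea[n + 1:k]^2 / (1 - hv[n + 1:k])  # responses for sigma_u^2
sig2e <- exp(coef(glm(de ~ 1, family = Gamma(link = "log"),
                      weights = (1 - hv[1:n]) / 2)))
sig2u <- exp(coef(glm(du ~ 1, family = Gamma(link = "log"),
                      weights = (1 - hv[n + 1:k]) / 2)))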

30 Hierarchical Generalized Linear Models

31

32 hglm notation. Linear mixed model with heteroscedastic residual variance:

library(hglm)
model2 <- hglm(fixed = y ~ x, disp = ~ x, random = ~ 1 | ID,
               family = gaussian(link = identity))
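
To make this call reproducible, a hedged simulated-data setup could be (the names y, x and ID simply match the formula above):

library(hglm)
set.seed(4)
n <- 200
ID <- factor(rep(1:20, each = 10))          # grouping factor
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(20, sd = 1)[ID] +  # random intercept per ID
     rnorm(n, sd = exp(0.2 * x))            # heteroscedastic residual
model2 <- hglm(fixed = y ~ x, disp = ~ x, random = ~ 1 | ID,
               family = gaussian(link = identity))
summary(model2)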

33 Other possibilities in hglm. Notation using design matrices:

model2 <- hglm(X, y, Z, X.disp, family = gaussian(link = identity))

Possible to fit an animal model, random regression, etc. For instance:

animal.model <- hglm(X, y, Z = t(chol(A)), family = gaussian(link = identity))

Possible to fit several random effects:

model3 <- hglm(X, y, Z = cbind(Z1, Z2), RandC = c(ncol(Z1), ncol(Z2)),
               family = gaussian(link = identity))

Possible to fit other distributions:

model4 <- hglm(X, y, Z, family = poisson(link = log))

Possible to fit other distributions for the random effects too:

negative_binomial.model <- hglm(X, y, Z, family = poisson(link = log),
                                rand.family = Gamma(link = log))

38 Playing with Lego: Fitting a DHGLM using the hglm package. Model: $y = X\beta + Zu + e$ with $u \sim N(0, I\sigma_u^2)$, $e_i \sim N(0, \sigma_{e,i}^2)$, $\log(\sigma_e^2) = X\beta_d + Zu_d$ and $u_d \sim N(0, I\sigma_{u_d}^2)$. Easy to fit using the hglm package:

w <- rep(1, length(y))
for (i in 1:20) {
  mmean <- hglm(y = y, X = X, Z = Z, weights = w)
  mdisp <- hglm(y = mmean$resid^2, X = X, Z = Z,
                family = Gamma(link = "log"),
                weights = (1 - mmean$hv) / 2)
  w <- 1 / mdisp$fv  # prior weights = inverse fitted variances
}

39 Playing with Lego: Fitting a spatial CAR model. Linear mixed model $y = X\beta + Zu + e$ with $e \sim N(0, I_n\sigma_e^2)$ and $u \sim N(0, \Sigma)$, where $\Sigma = \tau(I_n - \rho D)^{-1}$. Here $D$ is the neighbourhood matrix specifying which areas have common borders; $\tau$ and $\rho$ are the parameters to be estimated. Eigen-decompose $D$, with eigenvalues $w$ and eigenvectors $\Gamma$. Then the eigen decomposition of the covariance matrix is $\Sigma = \Gamma\Lambda\Gamma^T$, with the diagonal matrix $\Lambda$ having elements $\tau/(1 - \rho w_i)$.

40 Playing with Lego: Fitting a spatial CAR model. Re-write the model as $y = X\beta + \Gamma^T Z\tilde u + e$ with $e \sim N(0, I_n\sigma_e^2)$ and $\tilde u \sim N(0, \Lambda)$. Use a gamma GLM with inverse link and linear predictor $\theta_0 + \theta_1 w$ to estimate the random-effect variances. Then the estimates of $\tau$ and $\rho$ are $\hat\tau = 1/\hat\theta_0$ and $\hat\rho = -\hat\theta_1/\hat\theta_0$. Possible to fit in hglm:

G <- eigen(nbr)$vectors
w <- eigen(nbr)$values
CAR.model_ugly <- hglm(X, y, Z = t(G) %*% Z,
                       X.rand.disp = model.matrix(~ w),
                       rand.family = Gamma(link = "inverse"))

Implementation in version 2.0 of hglm:

CAR.model_nice <- hglm(X, y, Z = diag(n), rand.family = CAR(D = nbr))

43 A tree genetics trial example. Figure 1: Location of each tree, with height shown in grey scale. Darkness increases with height and white indicates missing phenotype.

44 Figure 2: Estimated spatial and genetic random effects for each tree. Darkness increases with higher values.

45 Ordinary (mean-controlling) genes

46 Variance-controlling genes

47 GWAS for variance-controlling genes. Shen, Pettersson, Rönnegård and Carlborg (2012). Inheritance beyond plain heritability: variance-controlling genes in Arabidopsis thaliana. PLoS Genetics 8(8). An Arabidopsis thaliana study including 199 individuals; trait: molybdenum content; 216,130 SNPs. The most significant SNP is located within the ion transporter gene MOT1.

48 Figure 2: A gene controlling robustness of molybdenum contents in Arabidopsis. The top figure (a) shows $-\log_{10} P$ values for mean-controlling SNPs (yellow) and variance-controlling SNPs (different colours for different chromosomes). The bottom figure (b) shows the substitution effect of the MOT1 allele. (Shen et al. 2012, PLoS Genetics)

49 Using Double GLM to fit a parametric model

50 Model. Traditional model for SNP regression: $y = \mu + x_j b + e$, $e \sim N(0, \sigma_e^2)$:

model1 <- glm(y ~ SNP)

Model to detect variance-controlling genes: $y = \mu + x_j b + e$, $e_i \sim N(0, \sigma_{e,i}^2)$, $\log(\sigma_e^2) = c + x_j v$. This model is easy to fit using Gordon K. Smyth's dglm package in R:

library(dglm)
model2 <- dglm(y ~ SNP, ~ SNP)
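
A self-contained simulated example of this call (genotype coding and effect sizes are illustrative assumptions):

library(dglm)
set.seed(5)
n <- 500
SNP <- sample(0:2, n, replace = TRUE)  # genotype coded 0/1/2
y <- 10 + 0.3 * SNP + rnorm(n, sd = exp(0.2 * SNP))  # mean and variance effects
model2 <- dglm(y ~ SNP, ~ SNP)         # mean formula, then dispersion formula
summary(model2)                        # dispersion part tests v = 0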

53 Using DHGLM for animal breeding models. Rönnegård, L., Felleki, M., Fikse, W.F., Mulder, H.A. & Strandberg, E. (2010) Genetic heterogeneity of residual variance: estimation of variance components using double hierarchical generalized linear models. Genetics Selection Evolution 42:8. Rönnegård, L., Felleki, M., Fikse, W.F., Mulder, H.A. & Strandberg, E. Variance component and breeding value estimation for genetic heterogeneity of residual variance in Swedish Holstein dairy cattle. Journal of Dairy Science 96. Felleki, M., Lee, D., Lee, Y., Gilmour, A. & Rönnegård, L. Estimation of breeding values for mean and dispersion, their variance and correlation using double hierarchical generalized linear models. Genetics Research 94. Rönnegård, L. & Lee, Y. (2013) Editorial: Exploring the potential of hierarchical generalized linear models in animal breeding and genetics. Journal of Animal Breeding and Genetics 130.

54 Linear mixed model using pedigree information. Animal model: $y = X\beta + Za + e$, $a \sim N(0, A\sigma_a^2)$, $e \sim N(0, \sigma_e^2)$, where $a_i$ is the additive genetic effect for individual $i$ and $A$ is the relationship matrix (calculated from pedigree information). Estimated breeding values = best linear unbiased predictor (BLUP) of $a_i$.

55 Extending the animal model. $y = X\beta + Za + e$, $a \sim N(0, A\sigma_a^2)$, $e_i \sim N(0, \sigma_{e,i}^2)$, $\log(\sigma_e^2) = X_d\beta_d + Za_d$, $a_d \sim N(0, A\sigma_{a_d}^2)$, $\rho = \mathrm{cor}(a, a_d)$. We fit a DHGLM to the pig litter size data previously studied with MCMC in Sorensen and Waagepetersen (2003). The DHGLM can be fitted using existing variance-component estimation software (ASReml).

57 Data description. Data from Danish Pig Production: pig litter size from 4,149 sows (mean litter size 10.3). The data include 10,060 records from these 4,149 sows in 82 farms. Fixed effects: farm, season, type of insemination, parity. The number of litters per sow varies from 1 to 9.

58 Simulation results

59 Thank you! Lars Rönnegård. Special thanks to my student Majbritt Felleki and collaborators: Dalarna University: Moudud Alam; Carlborg lab, SLU: Xia Shen, Örjan Carlborg; Animal Breeding and Genetics, SLU: Erling Strandberg, Freddy Fikse; Seoul National University, Korea: Youngjo Lee; Wageningen University, The Netherlands: Herman A. Mulder; University of North Carolina at Chapel Hill, USA: William Valdar; Reindeer Unit, SLU: Anna Skarin.

60 A last illustrative example - Image reconstruction

63 Noise added

64 70% of pixels missing at random

65 Clustered 4x4 pixels missing

66

67 Deriving the algorithm directly from the h-likelihood. Estimating the variance components: solve $\partial h_p/\partial\theta = 0$ with $$h_p = \left(h + \tfrac{1}{2}\log|2\pi D^{-1}|\right)\Big|_{\beta=\hat\beta,\,u=\hat u} = C - \frac{n}{2}\log(\sigma_e^2) - \frac{1}{2\sigma_e^2}\hat e'\hat e - \frac{k}{2}\log(\sigma_u^2) - \frac{1}{2\sigma_u^2}\hat u'\hat u - \frac{1}{2}\log|D|.$$ When we take the first derivative of $\log|D|$, the hat values of the augmented model appear: $$\frac{\partial}{\partial\sigma_e^2}\log|D| = \mathrm{tr}\left(D^{-1}\frac{\partial D}{\partial\sigma_e^2}\right) = -\frac{1}{(\sigma_e^2)^2}\,\mathrm{tr}\left([X,Z](T'WT)^{-1}[X,Z]'\right) = -\frac{1}{\sigma_e^2}\sum_{i=1}^n h_{ii}.$$ So, for the residual variance we have $$\frac{\partial h_p}{\partial\sigma_e^2} = -\frac{n}{2\sigma_e^2} + \frac{1}{2(\sigma_e^2)^2}\sum_{i=1}^n \hat e_i^2 + \frac{1}{2\sigma_e^2}\sum_{i=1}^n h_{ii},$$ which can be re-written as the score function of a gamma GLM with response $\hat e_i^2/(1 - h_{ii})$. And similarly for $\sigma_u^2$.

70 Introduction to genome-wide association studies. Example: 3 individuals and 5 SNPs. Linear model $y = \mu + x_j b + e$, with an example response vector $y$ shown on the slide.

72 GWAS. Example: 3 individuals and 5 SNPs. Linear model $y = \mu + x_j b + e$, with the first SNP coded $x_1 = (1, 0, 2)'$.

73 GWAS. Fitting the first SNP, $x_1 = (1, 0, 2)'$, gives $\hat b = 6.5$ (P = 0.56).

74 GWAS. The second SNP is coded $x_2 = (0, 1, 2)'$. Calculate a P-value for each $\hat b$ and plot $-\log_{10} P$; a sketch of the full scan follows below.
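
A compact R sketch of this per-SNP scan on simulated data (the genotype matrix, trait and plotting details are illustrative):

set.seed(6)
n <- 100; m <- 5
geno <- matrix(sample(0:2, n * m, replace = TRUE), n, m)  # n individuals, m SNPs
y <- rnorm(n) + 0.8 * geno[, 3]          # SNP 3 carries a true effect
pvals <- sapply(1:m, function(j) {
  fit <- lm(y ~ geno[, j])               # y = mu + x_j b + e
  summary(fit)$coefficients[2, 4]        # P-value for b
})
plot(-log10(pvals), type = "h", xlab = "SNP", ylab = "-log10(P)")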

76 GWAS plot: $-\log_{10}(P)$ plotted against SNP index.

77 Manhattan plot. Example from Weedon et al. 2008, Nature Genetics 40.

78 Classical likelihood inference. A solution to inference for fixed unknowns $\theta$ was proposed by Fisher (1922), who developed a likelihood theory expressing the probability of observing the data as a function of the parameter value. Consider a statistical model consisting of two types of objects, data $y$ and parameter $\theta$, and two related processes on them. Statistical model for data generation: generate an instance of the data $y$ from a probability function with fixed parameters $\theta$, $f_\theta(y)$. Statistical inference: given the data $y$, make an inference about the unknown fixed $\theta$ in the stochastic model by using the likelihood $L(\theta; y)$. The connection between these two processes is $L(\theta; y) \equiv f_\theta(y)$, where $L$ and $f$ are algebraically identical, but on the left-hand side $y$ is fixed while $\theta$ varies, and on the right-hand side $\theta$ is fixed while $y$ varies. However, this approach avoids inference about any random effects.

80 Classical marginal likelihood for a linear mixed model. Consider the linear mixed model $y = X\beta + Zu + e$. In the classical likelihood approach the random effects are integrated out, $f_\theta(y) = \int f_\theta(u) f_\theta(y \mid u)\,du$, and the data generation process is given by a multivariate normal distribution $N(X\beta, \sigma_u^2 ZZ' + \sigma_e^2 I)$, with the corresponding likelihood $$L(\theta; y) = |2\pi V|^{-0.5}\exp\left(-0.5\,(y - X\beta)'V^{-1}(y - X\beta)\right).$$ Note, however, that the random effect $u$ is not included, and that the classical likelihood does not give estimates of, or inference about, the random effects.

81 Inference. For model comparisons and testing, Lee & Nelder (1996) proposed to use: the h-likelihood for random effects; the marginal likelihood $f_\theta(y)$ for fixed effects; and $f_\theta(y \mid \hat\beta)$ for the dispersion parameters. When the Laplace approximation is used, $f_\theta(y)$ is replaced by $p_v(h)$ and $f_\theta(y \mid \hat\beta)$ by $p_{\beta,v}(h)$. Model selection for nested HGLMs: we can use the above likelihoods to perform likelihood ratio tests. Model selection for non-nested HGLMs: the conditional AIC (cAIC) is defined as $-2\log f(y \mid v) + 2p_D$, where $p_D$ is the estimated number of parameters (computed from the trace of the hat matrix).
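
As a rough sketch of the cAIC computation, assuming a Gaussian model where the fitted quantities from the augmented-model sketches earlier (X, Z, delta.hat, sig2e, H, n) are available:

mu <- drop(cbind(X, Z) %*% delta.hat)  # conditional mean given u-hat
cond.loglik <- sum(dnorm(y, mean = mu, sd = sqrt(sig2e), log = TRUE))
pD <- sum(diag(H)[1:n])                # effective number of parameters
cAIC <- -2 * cond.loglik + 2 * pD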

83 Criticism of h-likelihood - example


Regression Estimation - Least Squares and Maximum Likelihood. Dr. Frank Wood Regression Estimation - Least Squares and Maximum Likelihood Dr. Frank Wood Least Squares Max(min)imization Function to minimize w.r.t. β 0, β 1 Q = n (Y i (β 0 + β 1 X i )) 2 i=1 Minimize this by maximizing

More information

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction

More information

Chapter 4 - Fundamentals of spatial processes Lecture notes

Chapter 4 - Fundamentals of spatial processes Lecture notes Chapter 4 - Fundamentals of spatial processes Lecture notes Geir Storvik January 21, 2013 STK4150 - Intro 2 Spatial processes Typically correlation between nearby sites Mostly positive correlation Negative

More information

Linear Methods for Prediction

Linear Methods for Prediction This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

Business Statistics. Tommaso Proietti. Model Evaluation and Selection. DEF - Università di Roma 'Tor Vergata'

Business Statistics. Tommaso Proietti. Model Evaluation and Selection. DEF - Università di Roma 'Tor Vergata' Business Statistics Tommaso Proietti DEF - Università di Roma 'Tor Vergata' Model Evaluation and Selection Predictive Ability of a Model: Denition and Estimation We aim at achieving a balance between parsimony

More information

REGRESSION WITH SPATIALLY MISALIGNED DATA. Lisa Madsen Oregon State University David Ruppert Cornell University

REGRESSION WITH SPATIALLY MISALIGNED DATA. Lisa Madsen Oregon State University David Ruppert Cornell University REGRESSION ITH SPATIALL MISALIGNED DATA Lisa Madsen Oregon State University David Ruppert Cornell University SPATIALL MISALIGNED DATA 10 X X X X X X X X 5 X X X X X 0 X 0 5 10 OUTLINE 1. Introduction 2.

More information

Lecture 3: Linear Models. Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012

Lecture 3: Linear Models. Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012 Lecture 3: Linear Models Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector of observed

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

Linear Regression. September 27, Chapter 3. Chapter 3 September 27, / 77

Linear Regression. September 27, Chapter 3. Chapter 3 September 27, / 77 Linear Regression Chapter 3 September 27, 2016 Chapter 3 September 27, 2016 1 / 77 1 3.1. Simple linear regression 2 3.2 Multiple linear regression 3 3.3. The least squares estimation 4 3.4. The statistical

More information

Random vectors X 1 X 2. Recall that a random vector X = is made up of, say, k. X k. random variables.

Random vectors X 1 X 2. Recall that a random vector X = is made up of, say, k. X k. random variables. Random vectors Recall that a random vector X = X X 2 is made up of, say, k random variables X k A random vector has a joint distribution, eg a density f(x), that gives probabilities P(X A) = f(x)dx Just

More information

Estimation in Generalized Linear Models with Heterogeneous Random Effects. Woncheol Jang Johan Lim. May 19, 2004

Estimation in Generalized Linear Models with Heterogeneous Random Effects. Woncheol Jang Johan Lim. May 19, 2004 Estimation in Generalized Linear Models with Heterogeneous Random Effects Woncheol Jang Johan Lim May 19, 2004 Abstract The penalized quasi-likelihood (PQL) approach is the most common estimation procedure

More information

Linear Regression (1/1/17)

Linear Regression (1/1/17) STA613/CBB540: Statistical methods in computational biology Linear Regression (1/1/17) Lecturer: Barbara Engelhardt Scribe: Ethan Hada 1. Linear regression 1.1. Linear regression basics. Linear regression

More information

Modelling heterogeneous variance-covariance components in two-level multilevel models with application to school effects educational research

Modelling heterogeneous variance-covariance components in two-level multilevel models with application to school effects educational research Modelling heterogeneous variance-covariance components in two-level multilevel models with application to school effects educational research Research Methods Festival Oxford 9 th July 014 George Leckie

More information

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari MS&E 226: Small Data Lecture 11: Maximum likelihood (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 18 The likelihood function 2 / 18 Estimating the parameter This lecture develops the methodology behind

More information

Bayes: All uncertainty is described using probability.

Bayes: All uncertainty is described using probability. Bayes: All uncertainty is described using probability. Let w be the data and θ be any unknown quantities. Likelihood. The probability model π(w θ) has θ fixed and w varying. The likelihood L(θ; w) is π(w

More information

BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation

BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation Yujin Chung November 29th, 2016 Fall 2016 Yujin Chung Lec13: MLE Fall 2016 1/24 Previous Parametric tests Mean comparisons (normality assumption)

More information

The Poisson transform for unnormalised statistical models. Nicolas Chopin (ENSAE) joint work with Simon Barthelmé (CNRS, Gipsa-LAB)

The Poisson transform for unnormalised statistical models. Nicolas Chopin (ENSAE) joint work with Simon Barthelmé (CNRS, Gipsa-LAB) The Poisson transform for unnormalised statistical models Nicolas Chopin (ENSAE) joint work with Simon Barthelmé (CNRS, Gipsa-LAB) Part I Unnormalised statistical models Unnormalised statistical models

More information

F & B Approaches to a simple model

F & B Approaches to a simple model A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 215 http://www.astro.cornell.edu/~cordes/a6523 Lecture 11 Applications: Model comparison Challenges in large-scale surveys

More information