Course topics (tentative) The role of random effects

Size: px

Start display at page:

Download "Course topics (tentative) The role of random effects"

Martin Freeman
5 years ago
Views:

1 Course topics (tentative) random effects linear mixed models analysis of variance frequentist likelihood-based inference (MLE and REML) prediction Bayesian inference The role of random effects Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark February 5, / 33 2 / 33 Outline for today examples of data sets. analysis of variance multivariate normal distribution density for multivariate normal distribution Reflectance (colour) measurements for samples of cardboard (egg trays) (project at Department of Biotechnology, Chemistry and Environmental Engineering) Four replications at same position on each cardboard reflectance For five cardboards: four replications at four positions at each cardboard Reflektans / nr. Pap.nr. Colour variation between/within cardboards? 4 / 33

2 Orthodontic growth curves Distance between pituitary and the pterygomaxillary fissure for children of age 8-14 Distance versus age: Orthodontic growth curves Distance between pituitary and the pterygomaxillary fissure for children of age 8-14 Distance versus age grouped Distance versus age: according to child distance distance distance age age age Different intercepts for different children 5 / 33 6 / 33 Compression of mats for cows Electricity consumption and temperature in Sweden Compression vs. pressure for two brands of mats compression pressure Non-linear relation y = ab + cx d b + x d, Random variation between mats of same brand, small measurement noise. Temperature and electricity consumption: elforbrug temperatur Index Index 7 / 33 8 / 33

3 Consumption versus temperature and residuals from linear regression: elforbrug + D * temperatur Serial correlation! fejlled y_t Model for reflectances: one-way anova Four replications on each cardboard reflectance nr. Models: Y ij = ξ + ǫ ij i = 1,..., k j = 1,..., m 9 / / 33 Model for reflectances: one-way anova Model for reflectances: one-way anova Models: Models: Four replications on each cardboard Y ij = ξ + ǫ ij i = 1,..., k j = 1,..., m Four replications on each cardboard Y ij = ξ + ǫ ij i = 1,..., k j = 1,..., m reflectance or Y ij = ξ + α i + ǫ ij where ξ and α i are fixed unknown parameters and ǫ ij stochastic noise reflectance or Y ij = ξ + α i + ǫ ij where ξ and α i are fixed unknown parameters and ǫ ij stochastic noise or Y ij = ξ + U i + ǫ ij nr nr. where U i are random variables Which is most relevant? 11 / / 33

4 One role of random effects: parsimonious and population relevant models With fixed effects α i : many parameters (ξ, σ 2, α 2,..., α 34 ). Parameters α 2,..., α 34 not interesting as they just represent intercepts for specific card boards which are individually not of interest. With random effects: just three parameters (ξ, σ 2 = Varǫ ij and τ 2 = VarU i ). Hence parsimonious model. Variance parameters interesting for several reasons. Second role of random effects: quantify sources of variation Quantify sources of variation (e.g. quality control): is pulp for paper production too heterogeneous? With random effects model Y ij = ξ + U i + ǫ ij we have decomposition of variance: VarY ij = VarU i + Varǫ ij = τ 2 + σ 2 Hence we can quantify variation between (τ 2 ) cardboard pieces and within (σ 2 ) cardboard. 13 / / 33 Third role: modeling of covariance and correlation Ratio γ = τ 2 /σ 2 is signal to noise. Proportion of variance is called intra-class correlation. τ 2 σ 2 + τ 2 = γ γ + 1 High proportion of between cardboard variance leads to high correlation (next slide). Covariances: 0 i i Cov[Y ij, Y i j ] = VarU i i = i, j j VarU i + Varǫ ij i = i, j = j Correlations: 0 i i Corr[Y ij, Y i j ] = τ 2 /(σ 2 + τ 2 ) i = i, j j 1 i = i, j = j That is, observations for same cardboard are correlated! Correct modeling of correlation is important for correct evaluation of uncertainty. 15 / / 33

5 Fourth role: correct evalution of uncertainty Suppose we wish to estimate ξ = EY ij. Due to correlation, observations on same cardboard to some extent redundant. Estimate is empirical average ˆξ = Ȳ. Evaluation of VarȲ : Model erroneously ignoring variation between cardboards Y ij = ξ + ǫ ij Varǫ ij = σ 2 total = σ2 + τ 2 Naive variance expression is VarȲ = σ2 total n = σ2 + τ 2 km With first model, variance is underestimated! For VarȲ 0 is it enough that km? Correct model with random cardboard effects Y ij = ξ + U i + ǫ ij, VarU i = τ 2, Varǫ ij = σ 2 Correct variance expression is VarȲ = τ 2 k + σ2 km 17 / 33 Balanced one-way ANOVA (analysis of variance) Decomposition of empirical variance/sums of squares (i = 1,..., k, j = 1,..., m): SST = ij (Y ij Ȳ ) 2 = ij Expected sums of squares: Moment-based estimates: ˆσ 2 = (Y ij Ȳ i ) 2 +m i ESSE = k(m 1)σ 2 ESSB = m(k 1)τ 2 + (k 1)σ 2 SSE k(m 1) NB: ˆτ 2 may be negative. ˆτ 2 = SSB/(k 1) ˆσ2 m (Ȳ i Ȳ ) 2 = SSE+SSB Slightly less nice formulae in the unbalanced case (Thm 5.3 in M&T) 18 / 33 Design of experiment - one-way ANOVA Two levels of random effects Suppose Y ij is outcome of analysis of jth sample in ith lab, τ 2 = 2 variance between labs and σ 2 = 1 measurement variance. Suppose we want to analyze in total 100 samples. What is then the optimal number of labs that makes VarȲ minimal? Suppose instead we have available 5000 kr., there is an initial cost of 200 kr. for each lab and subsequently 10 kr. for the analysis of each sample. What is then the optimal number k of labs that gives the smallest VarȲ? Exercise! For five cardboards we have 4 replications at 4 positions. Hierarchical model (nested random effects) Y ipj = ξ + U i + U ip + ǫ ipj VarY ipj = τ 2 + ω 2 + σ 2 19 / / 33

6 Covariance structure for nested random effects model Correlation structure for nested random effects model Y ipj = ξ + U i + U ip + ǫ ipj 0 i l τ 2 i = l, p q same card Cov(Y ipj, Y lqk ) = τ 2 + ω 2 i = l, p = q same card and pos. τ 2 + ω 2 + σ 2 i = 1, p = q, k = j (VarY ipj ) Y ipj = ξ + U i + U ip + ǫ ipj 0 i l τ 2 i = l, p q Corr(Y ipj, Y lqk ) = σ 2 +ω 2 +τ 2 τ 2 +ω 2 i = l, p = q σ 2 +ω 2 +τ 2 1 i = 1, p = q, k = j 21 / / 33 Model for longitudinal growth data Summary - role of random effects Y ij = ξ i + η i x ij + ǫ ij i: child, j: time. Random intercepts and slopes? Correlated error ǫ ij? e.g. AR(1) ǫ ij = φǫ i(j 1) + ν ij Random effects models are useful for: quantifying different sources of variation appropriate modelling of correlation ( correct evalution of uncertainty of parameter estimates) sometimes individual subject specific effects not of interest - the population variation of effects more interesting more parsimonious models (replace many systematic effects by just one variance) 23 / / 33

7 Likelihood based inference Multivariate normal distribution Let µ R p and Σ a p p symmetric and positive semidefinite p p matrix. Spectral decomposition of Σ: Need probability density Normal distribution allows to build tractable and (reasonably) flexible models Σ = OΛO T where O orthonormal matrix (columns=eigen vectors) and Λ diagonal matrix of eigen values. Definition: a p-variate random p 1 vector Y is p-variate normal N p (µ, Σ) if Y is distributed as µ + OΛ 1/2 Z where Z = (Z 1,..., Z n ) is a vector of independent standard normal random variables. 25 / / 33 Geometric interpretation and PCA (principal component analysis) Assume µ = 0. Λ 1/2 : scaling. O rotation. I.e. Y scaled and rotated Z. Starting with Y : Let v i ith eigen vector in O. Then v T i Y = λ i Z i (projection on v i ) ith principal component with variance λ i. Principal components are independent (uncorrelated if Y not normal). Since λ 1 > λ 2,..., λ p, v1 T Y explains most of the variance in Y ( i VarY i = i λ i) etc. v i is called loading vector for ith PC. 27 / 33 Equivalent definitions: Definition: a random p 1 vector Y is p-variate normal with mean µ and covariance matrix Σ if a T Y is univariate normal with mean a T µ and variance a T Σa for any a R p. Definition: a random p 1 vector Y is p-variate normal with mean µ and covariance matrix Σ if Y has characteristic function φ(t) = E exp(it T Y ) = exp(it T µ t T Σt/2). From the last definition it follows easily that for any m p matrix A. Y N p (µ, Σ) AY N m (Aµ, AΣA T ) Since Vara T Y = a T Σa 0 it follows that Σ must be positive semi-definite. NB: the distribution of a random vector is uniquely determined by the characteristic function. 28 / 33

8 Density of multivariate normal Suppose Z i are independent standard normal. Then Z = (Z 1,..., Z p ) N p (0, I ) with joint density f Z (z 1,..., z p ) = (2π) n/2 exp( z 2 /2) Suppose further that Y N p (µ, Σ) where Σ positive definite. Then Σ = LL T for some invertible matrix L (Cholesky or spectral decomposition). Thus Y µ + LZ and Jacobian of transformation is L = Σ 1/2. By multivariate transformation theorem f Y (y 1,..., y p ) = (2π) n/2 Σ 1/2 exp( 1 2 (y µ)t Σ 1 (y µ)) Density if Σ not positive definite Suppose Σ has rank r < p. Then λ r+1 = = λ p = 0 and Y µ lives on subspace L = span{v 1,..., v r } R p. Possible to define density function on L with respect to r -dimensional Lebesgue measure on L. It is of the form: (2π) r/2 ( r λ i ) 1/2 exp[ 1 2 (y µ)t Σ (y µ)] i=1 where Σ is generalized inverse Σ = Odiag(λ 1 1,..., λ 1 r, 0,..., 0)O T (we will see several examples of this during the course) 29 / / 33 Exercises 1. Show results regarding covariances and correlations in one-way and two-level ANOVA (slide 16 and 21-22). 2. compute VarȲ for one way ANOVA. 3. compute expectations of SSB and SSE in one-way ANOVA. 4. fit linear models for the orthodontic growth curves with subject specific intercepts. Draw histograms of the fitted intercepts (can be extracted using coef()). Check residuals from the model. Also fit model with common intercept and plot residuals against subject. 5. compute covariance and correlation structure of observations from linear models with random intercepts and random slopes: More exercises 6. show Y N p (µ, Σ) AY N m (Aµ, AΣA T ) 7. show that the three definitions of multivariate normal distribution are equivalent 8. solve the design of experiment problem regarding optimal choice of number of laboratories for the one-way ANOVA Y ij = α + U i + βx ij + V i x ij + ǫ ij where U i N(0, τu 2 ) and V i N(0, τv 2 ) and Corr(U i, V i ) = ρ (while (U i, V i ) and (U i, V i ) independent when i i ). What can you say about the variance structure of Y ij? 31 / / 33

9 One more exercise 9. In Bayesian statistics the following pairwise-difference density is often used as a smoothing prior (promotes similar values of y i and y i 1 ): f (y 1,..., y n ) exp( 1 2 n (y i y i 1 ) 2 ) 9.1 Find Q playing the role as Σ 1 so that the above is of the form of a multivariate Gaussian density. Is Q invertible? 9.2 Can you find a square-root of Q? 9.3 Find the eigen space for the eigen value The pairwise-difference density can be viewed as a density on an n 1-dimensional subspace L. Characterize L. i=2 33 / 33

Outline. Statistical inference for linear mixed models. One-way ANOVA in matrix-vector form

Outline. Statistical inference for linear mixed models. One-way ANOVA in matrix-vector form Outline Statistical inference for linear mixed models Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark general form of linear mixed models examples of analyses using linear mixed