Introduction to General and Generalized Linear Models
Mixed effects models - Part II

Henrik Madsen, Poul Thyregod
Informatics and Mathematical Modelling
Technical University of Denmark, DK-2800 Kgs. Lyngby

January 2011

This lecture, continued

More examples of hierarchical variation.

Estimation of parameters: Confidence interval for the variance ratio

In the balanced case one may construct a confidence interval for the variance ratio γ. A 1−α confidence interval for γ, i.e. an interval (γ_L, γ_U) satisfying P[γ_L < γ < γ_U] = 1−α, is obtained by using

\[
\gamma_L = \frac{1}{n}\left(\frac{Z}{F(k-1,\,N-k)_{1-\alpha/2}} - 1\right), \qquad
\gamma_U = \frac{1}{n}\left(\frac{Z}{F(k-1,\,N-k)_{\alpha/2}} - 1\right)
\]
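As a minimal sketch, the interval can be computed with scipy's F quantiles (the function and its name are illustrative, not from the book; z denotes the observed variance ratio):

```python
from scipy.stats import f

def gamma_confint(z, k, N, n, alpha=0.05):
    """(1 - alpha) confidence interval for gamma = sigma_u^2 / sigma^2 in the
    balanced one-way random effects model; z = [SSB/(k-1)] / [SSE/(N-k)]."""
    gamma_l = (z / f.ppf(1 - alpha / 2, k - 1, N - k) - 1) / n
    gamma_u = (z / f.ppf(alpha / 2, k - 1, N - k) - 1) / n
    return gamma_l, gamma_u
```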

Estimation of parameters

Theorem (Moment estimates in the random effects model)
Moment estimates for the parameters μ, σ² and σ_u² are

\[
\hat{\mu} = \bar{Y}_{\cdot\cdot}, \qquad
\hat{\sigma}^2 = \mathrm{SSE}/(N-k), \qquad
\hat{\sigma}_u^2 = \frac{\mathrm{SSB}/(k-1) - \hat{\sigma}^2}{n_0}
\]

where the weighted average group size n_0 is given by

\[
n_0 = \frac{\sum_{i=1}^{k} n_i - \sum_{i=1}^{k} n_i^2 \Big/ \sum_{i=1}^{k} n_i}{k-1}
    = \frac{N - \sum_i n_i^2 / N}{k-1}
\]
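A direct transcription of the theorem into Python (illustrative code, assuming the data arrive as a list of per-group numpy arrays):

```python
import numpy as np

def moment_estimates(groups):
    """Moment estimates (mu, sigma2, sigma2_u) in the one-way random
    effects model; `groups` is a list of 1-D arrays, one per group."""
    ni = np.array([len(g) for g in groups])
    N, k = ni.sum(), len(groups)
    ybar = np.array([g.mean() for g in groups])
    mu = np.concatenate(groups).mean()                # grand average
    sse = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ssb = (ni * (ybar - mu) ** 2).sum()
    sigma2 = sse / (N - k)
    n0 = (N - (ni ** 2).sum() / N) / (k - 1)          # weighted average group size
    sigma2_u = (ssb / (k - 1) - sigma2) / n0
    return mu, sigma2, sigma2_u
```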

Estimation of parameters: Distribution of the sums of squares

In the balanced case we have that

\[
\mathrm{SSE} \sim \sigma^2\, \chi^2\big(k(n-1)\big), \qquad
\mathrm{SSB} \sim \{\sigma^2 / w(\gamma)\}\, \chi^2(k-1)
\]

and that SSE and SSB are independent.

Estimation of parameters: Unbiased estimates for the variance ratio in the balanced case

In the balanced case, n_1 = n_2 = ... = n_k = n, we can provide explicit unbiased estimators for γ and w(γ) = 1/(1+nγ). One has that

\[
\hat{w} = \frac{\mathrm{SSE}/\{k(n-1)\}}{\mathrm{SSB}/(k-3)}, \qquad
\hat{\gamma} = \frac{1}{n}\left(\frac{\mathrm{SSB}/(k-1)}{\mathrm{SSE}/\{k(n-1)-2\}} - 1\right)
\]

are unbiased estimators for w(γ) = 1/(1+nγ) and for γ = σ_u²/σ², respectively.
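In code, under the same assumptions (balanced design; k > 3 so that the relevant expectations exist; hypothetical helper name):

```python
def unbiased_w_gamma(sse, ssb, k, n):
    """Unbiased estimators of w(gamma) = 1/(1 + n*gamma) and of
    gamma = sigma_u^2 / sigma^2 in the balanced case (k groups of size n)."""
    w_hat = (sse / (k * (n - 1))) / (ssb / (k - 3))
    gamma_hat = ((ssb / (k - 1)) / (sse / (k * (n - 1) - 2)) - 1) / n
    return w_hat, gamma_hat
```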

Example - Wool data

Variation        Sum of squares    f     s² = SS/f   E[s²]
Between bales    SSB = 65.9628     6     10.9938     σ² + 4σ_u²
Within bales     SSE = 131.4726    21    6.2606      σ²

Table: ANOVA table for the baled wool data.

Example - Wool data

The test statistic for the hypothesis H_0: σ_u² = 0 is

\[
z = \frac{10.9938}{6.2606} = 1.76 < F(6,21)_{0.95} = 2.57
\]

The p-value is P[F(6,21) ≥ 1.76] = 0.16. Thus, the test fails to reject the hypothesis of no variation in purity between the bales when testing at a 5% significance level. However, as the purpose is to describe the variation in the shipment, we estimate the parameters in the random effects model irrespective of the test result.

Example - Wool data

Now let us find a 95% confidence interval for the ratio γ = σ_u²/σ². As F(6,21)_{0.025} = 1/F(21,6)_{0.975}, one finds the interval

\[
\gamma_L = \frac{1}{4}\left(\frac{1.76}{F(6,21)_{0.975}} - 1\right)
         = 0.25\left(\frac{1.76}{3.09} - 1\right) = -0.11
\]
\[
\gamma_U = \frac{1}{4}\left(\frac{1.76}{F(6,21)_{0.025}} - 1\right)
         = 0.25\,(1.76 \cdot 5.15 - 1) = 2.02
\]
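These numbers are easy to reproduce; a quick check with scipy (here k = 7 bales, N = 28, n = 4):

```python
from scipy.stats import f

z = 10.9938 / 6.2606                          # observed variance ratio, about 1.76
p = f.sf(z, 6, 21)                            # p-value, about 0.16
gamma_l = (z / f.ppf(0.975, 6, 21) - 1) / 4   # about -0.11
gamma_u = (z / f.ppf(0.025, 6, 21) - 1) / 4   # about 2.02
```

Since γ is non-negative, the negative lower limit would in practice be truncated to zero.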

Maximum likelihood estimates

Theorem (Maximum likelihood estimates for the parameters under the random effects model)
The maximum likelihood estimates for μ, σ² and σ_u² = σ²γ are determined as follows. For

\[
\sum_i n_i^2 (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 < \mathrm{SSE} + \mathrm{SSB}
\]

one obtains

\[
\hat{\mu} = \bar{y}_{\cdot\cdot} = \frac{1}{N}\sum_i n_i \bar{y}_{i\cdot}, \qquad
\hat{\sigma}^2 = \frac{1}{N}(\mathrm{SSE} + \mathrm{SSB}), \qquad
\hat{\gamma} = 0
\]

Maximum likelihood estimates

Theorem (Maximum likelihood estimates for the parameters under the random effects model, continued)
For \(\sum_i n_i^2 (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 > \mathrm{SSE} + \mathrm{SSB}\) the estimates are determined as the solution to

\[
\hat{\mu} = \frac{1}{W(\hat{\gamma})} \sum_{i=1}^{k} n_i w_i(\hat{\gamma})\, \bar{y}_{i\cdot}
\]
\[
\hat{\sigma}^2 = \frac{1}{N}\left\{\mathrm{SSE} + \sum_{i=1}^{k} n_i w_i(\hat{\gamma})(\bar{y}_{i\cdot} - \hat{\mu})^2\right\}
\]
\[
\frac{1}{W(\hat{\gamma})} \sum_{i=1}^{k} n_i^2 w_i(\hat{\gamma})^2 (\bar{y}_{i\cdot} - \hat{\mu})^2
= \frac{1}{N}\left\{\mathrm{SSE} + \sum_{i=1}^{k} n_i w_i(\hat{\gamma})(\bar{y}_{i\cdot} - \hat{\mu})^2\right\}
\]

where

\[
W(\gamma) = \sum_{i=1}^{k} n_i w_i(\gamma).
\]
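These equations are coupled and are usually solved iteratively. An equivalent numerical route, sketched below in my own code (not from the book), is to maximise the marginal log-likelihood directly: it splits into a within-group term based on SSE and a term for the group averages, with Ȳ_i ~ N(μ, σ²/(n_i w_i(γ))) as on the next slide.

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(theta, ybar, ni, sse):
    """Negative marginal log-likelihood (up to an additive constant) of the
    one-way random effects model; theta = (mu, log sigma2, log gamma)."""
    mu, sigma2, gamma = theta[0], np.exp(theta[1]), np.exp(theta[2])
    N, k = ni.sum(), len(ni)
    wi = 1.0 / (1.0 + ni * gamma)
    ll = -0.5 * ((N - k) * np.log(sigma2) + sse / sigma2)   # within groups
    ll -= 0.5 * np.sum(np.log(sigma2 / (ni * wi))           # group averages
                       + ni * wi * (ybar - mu) ** 2 / sigma2)
    return -ll

# res = minimize(neg_loglik, x0=[ybar.mean(), 0.0, 0.0],
#                args=(ybar, ni, sse), method="Nelder-Mead")
```

The log-parametrisation keeps σ² and γ positive; the boundary solution γ̂ = 0 of the previous slide shows up as log γ drifting towards −∞.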

Maximum likelihood estimates: μ̂ is a weighted average of the group averages

The maximum likelihood estimate μ̂ is a weighted average of the group averages ȳ_i, with the estimated marginal precisions

\[
n_i w_i(\gamma) = \frac{\sigma^2}{\mathrm{Var}[\bar{Y}_{i\cdot}]}
\]

as weights. We have the marginal variances

\[
\mathrm{Var}[\bar{Y}_{i\cdot}] = \sigma_u^2 + \frac{\sigma^2}{n_i}
= \frac{\sigma^2}{n_i}(1 + n_i\gamma) = \frac{\sigma^2}{n_i w_i(\gamma)}
\]

When the experiment is balanced, i.e. when n_1 = n_2 = ... = n_k, all weights are equal, and one obtains the simple result that μ̂ is the crude average of the group averages.

Maximum likelihood estimates: The estimate for σ² also utilizes the variation between groups

We observe that the estimate for σ² is not only based upon the variation within groups, SSE; it also utilizes the knowledge of the variation between groups, as

\[
E[(\bar{Y}_{i\cdot} - \mu)^2] = \mathrm{Var}[\bar{Y}_{i\cdot}] = \frac{\sigma^2}{n_i w_i(\gamma)}
\]

and therefore the terms (ȳ_i − μ̂)² contain information about σ² as well as γ.

Maximum likelihood estimates: The estimate for σ² is not necessarily unbiased

We observe further that, as usual for ML estimates of variances, the estimate for σ² is not necessarily unbiased. Instead of the maximum likelihood estimate above, it is common practice to adjust the estimate. Later we shall introduce the so-called residual maximum likelihood (REML) estimates for σ² and σ_u², obtained by considering the distribution of the residuals.

Maximum likelihood estimates: Maximum likelihood estimates in the balanced case

In the balanced case, n_1 = n_2 = ... = n_k = n, the weights do not depend on i:

\[
w_i(\gamma) = \frac{1}{1+n\gamma}
\]

and then

\[
\hat{\mu} = \frac{1}{k}\sum_{i=1}^{k} \bar{y}_{i+} = \bar{y}_{++}
\]

which is the same as the moment estimate. When (n−1) SSB > SSE, the maximum likelihood estimate corresponds to an inner point.

Maximum likelihood estimates: Maximum likelihood estimates in the balanced case, continued

The remaining likelihood equations reduce to

\[
N\hat{\sigma}^2 = \mathrm{SSE} + \frac{\mathrm{SSB}}{1+n\hat{\gamma}}, \qquad
\frac{N}{k}\,\frac{\mathrm{SSB}}{1+n\hat{\gamma}} = \mathrm{SSE} + \frac{\mathrm{SSB}}{1+n\hat{\gamma}}
\]

with the solution

\[
\hat{\sigma}^2 = \frac{\mathrm{SSE}}{N-k}, \qquad
\hat{\gamma} = \frac{1}{n}\left[\frac{\mathrm{SSB}}{k\hat{\sigma}^2} - 1\right], \qquad
\hat{\sigma}_u^2 = \frac{\mathrm{SSB}/k - \hat{\sigma}^2}{n}
\]

Estimation of random effects: BLUP estimation

In a mixed effects model it is not immediately clear what fitted values and residuals are: our best prediction for subject i is not given by the mean relationship μ alone. It may therefore be of interest to estimate the random effects. The best linear unbiased predictor (BLUP) in the one-way case is

\[
\hat{\mu}_i = \big(1 - w_i(\gamma)\big)\,\bar{y}_{i\cdot} + w_i(\gamma)\,\hat{\mu}
\]

Thus, the estimate for μ_i is a weighted average of the individual group average ȳ_i and the overall average μ̂, with weights (1 − w_i(γ)) and w_i(γ), where

\[
w_i(\gamma) = \frac{1}{1+n_i\gamma}
\]
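A one-line implementation of this shrinkage (illustrative helper, names my own):

```python
import numpy as np

def blup_group_levels(ybar, ni, mu_hat, gamma):
    """BLUP of the group levels mu_i in the one-way model: each group
    average is shrunk towards the overall estimate mu_hat."""
    wi = 1.0 / (1.0 + ni * gamma)
    return (1.0 - wi) * ybar + wi * mu_hat
```

Groups with many observations (large n_i) get a small w_i and stay close to their own average; small groups are pulled towards μ̂.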

Definition (Linear mixed effects model)
The model

\[
Y = X\beta + ZU + \epsilon
\]

with X and Z denoting known matrices, and where ε ∼ N(0, Σ) and U ∼ N(0, Ψ) are independent, is called a mixed general linear model. In the general case the covariance matrices Σ and Ψ may depend on some unknown parameters, ψ, that also need to be estimated. The parameters β are called fixed effects or systematic effects, while the quantities U are called random effects.

It follows from the independence of U and ε that

\[
D\begin{bmatrix} \epsilon \\ U \end{bmatrix}
= \begin{pmatrix} \Sigma & 0 \\ 0 & \Psi \end{pmatrix}
\]

The model may also be interpreted as a hierarchical model:

\[
U \sim N(0, \Psi), \qquad Y \mid U = u \sim N(X\beta + Zu,\ \Sigma)
\]

The marginal distribution of Y is a normal distribution with

\[
E[Y] = X\beta, \qquad D[Y] = \Sigma + Z\Psi Z^T
\]

We shall introduce the symbol V for the dispersion matrix in the marginal distribution of Y, i.e.

\[
V = \Sigma + Z\Psi Z^T
\]

The matrix V may grow rather large and become cumbersome to handle.

One-way model with random effects - example

The one-way model with random effects,

\[
Y_{ij} = \mu + U_i + \epsilon_{ij},
\]

can be formulated as Y = Xβ + ZU + ε with

\[
X = \mathbf{1}_N, \quad \beta = \mu, \quad U = (U_1, U_2, \ldots, U_k)^T, \quad
\Sigma = \sigma^2 I_N, \quad \Psi = \sigma_u^2 I_k
\]

where 1_N is a column of 1's. Each row of the N × k matrix Z corresponds to one observation: the element in column i is 1 if the observation belongs to the i'th group, and zero otherwise.
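As an illustration, the design matrices and the marginal covariance V = Σ + ZΨZ^T can be built as follows (a sketch, assuming the group sizes n_i are known):

```python
import numpy as np

def one_way_design(ni):
    """Design matrices X (N x 1) and Z (N x k) for the one-way
    random effects model with group sizes ni."""
    N, k = sum(ni), len(ni)
    X = np.ones((N, 1))                  # column of ones
    Z = np.zeros((N, k))
    row = 0
    for i, n in enumerate(ni):
        Z[row:row + n, i] = 1.0          # rows of group i point at column i
        row += n
    return X, Z

# X, Z = one_way_design([4, 4, 4])
# V = sigma2 * np.eye(X.shape[0]) + sigma2_u * Z @ Z.T
```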

Estimation of fixed effects and variance parameters

The fixed effect parameters β and the variance parameters ψ are estimated from the marginal distribution of Y. For fixed ψ the estimate of β is found as the solution of

\[
(X^T V^{-1} X)\,\hat{\beta} = X^T V^{-1} y
\]

This is the well-known weighted least squares (WLS) formula; in some software systems the solution is called the generalised least squares (GLS) solution. Note, however, that the solution may depend on the unknown variance parameters ψ, as we saw in the case of the unbalanced one-way random effects model.
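A numerically sensible sketch solves linear systems in V rather than forming V^{-1} explicitly (helper name my own):

```python
import numpy as np

def gls(X, V, y):
    """Solve (X' V^{-1} X) beta = X' V^{-1} y for beta."""
    Vinv_X = np.linalg.solve(V, X)       # V^{-1} X without inverting V
    Vinv_y = np.linalg.solve(V, y)       # V^{-1} y
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)
```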

Estimation of fixed effects and variance parameters

The observed Fisher information for β̂ is

\[
I(\hat{\beta}) = X^T V^{-1} X
\]

and an estimate for the dispersion matrix of β̂ is determined as

\[
\widehat{\mathrm{Var}}[\hat{\beta}] = (X^T V^{-1} X)^{-1}
\]

Estimation of fixed effects and variance parameters

In order to determine estimates for the variance parameters ψ, we shall modify the profile likelihood for ψ in order to compensate for the estimation of β. The modified profile log-likelihood is

\[
l_m(\psi) = -\frac{1}{2}\log|V|
            - \frac{1}{2}\log|X^T V^{-1} X|
            - \frac{1}{2}(Y - X\hat{\beta})^T V^{-1} (Y - X\hat{\beta})
\]

When β̂ depends on ψ it is necessary to determine the solution by iteration.
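For concreteness, the objective can be evaluated with log-determinants (a sketch; in practice one wraps this in an optimiser over ψ, recomputing V(ψ) and β̂(ψ) at each step):

```python
import numpy as np

def reml_objective(V, X, y, beta_hat):
    """Modified profile log-likelihood l_m(psi), up to an additive
    constant; V = V(psi), beta_hat the GLS estimate under this V."""
    _, logdet_V = np.linalg.slogdet(V)
    Vinv_X = np.linalg.solve(V, X)
    _, logdet_XVX = np.linalg.slogdet(X.T @ Vinv_X)
    r = y - X @ beta_hat
    quad = r @ np.linalg.solve(V, r)
    return -0.5 * (logdet_V + logdet_XVX + quad)
```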

Estimation of fixed effects and variance parameters

The modification to the profile likelihood equals the so-called residual maximum likelihood (REML) method, which uses the marginal distribution of the residual y − Xβ̂_ψ. In REML the problem of biased variance components is solved by setting the fixed effects estimates equal to the WLS solution above in the likelihood function and then maximising it with respect to the variance components only. The reasoning is that the fixed effects cannot contribute information on the random effects, which justifies not estimating these parameters in the same likelihood. The method is also termed the restricted maximum likelihood method, because the model may be embedded in a more general model for the group observation vector Y_i in which the random effects model restricts the correlation coefficient of the general model.

Estimation of fixed effects and variance parameters

It is observed that the REML estimates are obtained by minimising

\[
(Y - X\beta)^T V^{-1}(\psi)(Y - X\beta) + \log|V(\psi)| + \log|X^T V^{-1}(\psi) X|
\]

A comparison with the full likelihood function shows that it is the variance term log|X^T V^{-1}(ψ)X|, associated with the estimation of β, which causes the REML-estimated variance components to be unbiased. If accuracy of the estimates of the variance terms is of greater importance than bias, then full maximum likelihood should be considered instead. An optimal weighting between bias and variance of the estimators is obtained by the estimators optimising the so-called Godambe information. In balanced designs REML gives the classical moment estimates of the variance components (constrained to be non-negative).

Estimation of random effects

Formally, the random effects U are not parameters in the model, and the usual likelihood approach does not make much sense for estimating these random quantities. It is, however, often of interest to assess these latent, or state, variables. We formulate a so-called hierarchical likelihood by writing the joint density for the observable as well as the unobservable random quantities. Setting the derivative of the hierarchical likelihood equal to zero and solving with respect to u, one finds that the estimate û is the solution to

\[
(Z^T \Sigma^{-1} Z + \Psi^{-1})\,u = Z^T \Sigma^{-1} (y - X\beta)
\]

where the estimate β̂ is inserted in place of β. The solution is termed the best linear unbiased predictor (BLUP).

Simultaneous estimation of β and u

The estimates for β and for u are those values that simultaneously maximize l(β, ψ, u) for a fixed value of ψ. The mixed model equations are

\[
\begin{pmatrix}
X^T \Sigma^{-1} X & X^T \Sigma^{-1} Z \\
Z^T \Sigma^{-1} X & Z^T \Sigma^{-1} Z + \Psi^{-1}
\end{pmatrix}
\begin{pmatrix} \hat{\beta} \\ \hat{u} \end{pmatrix}
=
\begin{pmatrix} X^T \Sigma^{-1} y \\ Z^T \Sigma^{-1} y \end{pmatrix}
\]

The equations facilitate the estimation of β and u without calculation of the marginal variance V, or its inverse. The estimation may be performed by an iterative back-fitting algorithm.
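A direct sketch of assembling and solving these equations (avoiding V entirely, as the slide notes; function name my own):

```python
import numpy as np

def mixed_model_equations(X, Z, Sigma_inv, Psi_inv, y):
    """Solve Henderson's mixed model equations for (beta_hat, u_hat)."""
    C = np.block([
        [X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ Z],
        [Z.T @ Sigma_inv @ X, Z.T @ Sigma_inv @ Z + Psi_inv],
    ])
    rhs = np.concatenate([X.T @ Sigma_inv @ y, Z.T @ Sigma_inv @ y])
    sol = np.linalg.solve(C, rhs)
    p = X.shape[1]
    return sol[:p], sol[p:]              # fixed effects, random effects
```

Here Σ is often σ²I, so Σ^{-1} is cheap, and the coefficient matrix has dimension p + q (fixed plus random effects) rather than N.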

Interpretation as empirical Bayes estimate

It is seen from

\[
(Z^T \Sigma^{-1} Z + \Psi^{-1})\,u = Z^T \Sigma^{-1} (y - X\hat{\beta})
\]

that the BLUP estimate û of the random effects has been shrunk towards zero, as it is a weighted average of the direct estimate based on (y − Xβ̂) and the prior mean, E[U] = 0, where the weights involve the precision Ψ^{-1} in the distribution of U.