MA 575 Linear Models (Cedric E. Ginestet, Boston University)
Mixed Effects Estimation, Residuals Diagnostics
Week 11, Lecture 1


1 Within-group Correlation

Let us recall the simple two-level hierarchical model that we studied as a motivating example for mixed effects models. In this case, we had assumed that the single intercept in our original model was drawn from a population of intercepts, such that

    y_{ij} = α_j + β x_{ij} + e_{ij},      α_j = α + b_j,

for i = 1, ..., n_j, and j = 1, ..., m. In addition, we will make the following standard distributional assumptions,

    b_j ~ N(0, σ_b^2),      and      e_{ij} ~ N(0, σ_e^2).

Moreover, we also assume that the random effects and error terms are uncorrelated, Cov[b_j, e_{ij}] = 0. Clearly, this model contains two different sources of variation: within- and between-clusters. We may therefore wish to ask what the correlation is between two observations, y_{1j} and y_{2j}, belonging to the same group j. Here, we are assuming that homoscedasticity holds within and between groups, so we can drop the dependence on x_{1j} and x_{2j}, such that

    Cov[Y_{1j}, Y_{2j} | x_{1j}, x_{2j}] = Cov[b_j + e_{1j}, b_j + e_{2j}]
                                         = Cov[b_j, b_j] + Cov[b_j, e_{2j}] + Cov[e_{1j}, b_j] + Cov[e_{1j}, e_{2j}]
                                         = Var[b_j],

since we have assumed that the e_{ij} are statistically independent of each other and statistically independent of the b_j. It then suffices to normalize with respect to the product of the standard deviations of Y_{1j} and Y_{2j}, which is

    (Var[Y_{1j} | x_{1j}] Var[Y_{2j} | x_{2j}])^{1/2} = Var[b_j + e_{ij}] = σ_e^2 + σ_b^2.

The within-group correlation between any two elements in a cluster is then given by

    ρ := Var[b_j] / Var[b_j + e_{ij}] = σ_b^2 / (σ_e^2 + σ_b^2).

Here, ρ weighs the relative contribution of each independent group to the overall mean. It is sometimes described in terms of shrinkage, since as ρ increases, the group-specific observations shrink toward their group means.
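The expression for ρ can be checked by simulation. The following is a minimal sketch, not part of the original notes: it draws data from the random-intercept model above and compares the empirical correlation of two observations from the same group with σ_b^2 / (σ_e^2 + σ_b^2). The parameter values, variable names, and the use of NumPy are illustrative assumptions.

```python
# Minimal sketch (not from the original notes): simulate the random-intercept model
# y_ij = alpha + b_j + beta * x_ij + e_ij and compare the empirical within-group
# correlation with the theoretical value rho = sigma_b^2 / (sigma_e^2 + sigma_b^2).
import numpy as np

rng = np.random.default_rng(575)
m, n = 2000, 2                        # many groups, two observations per group
alpha, beta = 1.0, 2.0                # fixed effects (illustrative values)
sigma_b, sigma_e = 1.5, 1.0           # random-intercept and error standard deviations

b = rng.normal(0.0, sigma_b, size=m)              # one random intercept per group
x = rng.normal(size=(m, n))
e = rng.normal(0.0, sigma_e, size=(m, n))
y = alpha + b[:, None] + beta * x + e             # y_ij = alpha_j + beta x_ij + e_ij

# Remove the fixed-effects part, so that only b_j + e_ij remains in each entry.
u = y - alpha - beta * x
empirical_rho = np.corrcoef(u[:, 0], u[:, 1])[0, 1]
theoretical_rho = sigma_b**2 / (sigma_e**2 + sigma_b**2)
print(empirical_rho, theoretical_rho)             # both approximately 0.69
```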

Moreover, we may have the following two extreme situations:

1. As σ_b^2 → 0, we have ρ → 0, and we recover a standard regression with no random effects.

2. As σ_e^2 → 0, we have ρ → 1, and each group element is equal to its group mean.

Thus, the within-cluster correlation quantifies how influential the group structure is on the estimation of the fixed effects. This particular quantity is related to the shrinkage coefficient in Bayesian statistics, which is usually defined as 1 - ρ. The latter represents the shrinkage of the group means toward the overall mean of the observations.

2 Mixed Models Estimation

2.1 General Formulation

The main mixed effects model equation is generally written as follows,

    y_j = X_j β + Z_j b_j + e_j,      j = 1, ..., m.                (1)

This general expression includes different choices of the Z_j's, such as a model with both random intercepts and random slope coefficients,

    y_{ij} = α_j + β_j x_{ij} + e_{ij},      α_j = α + a_j,      β_j = β + b_j,

for i = 1, ..., n_j, and j = 1, ..., m. This model would fit a separate regression line for every group.

2.2 Balanced Mixed Effects Designs

A mixed effects model is called balanced if, for every j = 1, ..., m, we have both Z_j = Z and n_j = n; the total number of observations is still denoted by N := Σ_{j=1}^m n_j = mn. Moreover, we have made the following two distributional assumptions,

    e_j ~ MVN_n(0, σ^2 I_n),      b_j ~ MVN_k(0, σ^2 D),

for every j = 1, ..., m. Recall that we have expressed the general mixed effects model by stacking the group-specific matrices, such that

    y = Xβ + Zb + e,                (2)

where y, X, Z, b and e are of dimensions (N x 1), (N x p′), (N x mk), (mk x 1), and (N x 1), respectively. This was further simplified using the definition η := Zb + e as y = Xβ + η, where y is an N x 1 vector of observations, X is a design matrix for the fixed effects of order N x p′, and β is of order p′ x 1. Finally, the error vector combines both the random effects and error terms, stacking the group-specific vectors η_j := Z b_j + e_j, for j = 1, ..., m, into

    η := (η_1, ..., η_m).
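To make the stacking in equation (2) concrete, the following minimal sketch, not part of the original notes, builds the stacked matrices X and Z for a small balanced design and forms the combined error term η = Zb + e. The group size, the number of random effects, and the use of NumPy are illustrative assumptions.

```python
# Minimal sketch (not from the original notes) of the stacked formulation
# y = X beta + Z b + e for a balanced design with m groups of size n.
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 4, 3, 2                         # groups, group size, random effects per group
N = m * n

# Same covariate values in every group (e.g. common measurement times), so that
# Z_j = Z and n_j = n, as required for a balanced design.
Xj = np.column_stack([np.ones(n), np.arange(n, dtype=float)])  # n x p' fixed-effects design
Zj = Xj.copy()                            # random intercept and random slope: Z_j = X_j here

X = np.vstack([Xj] * m)                   # N x p'
Z = np.kron(np.eye(m), Zj)                # N x (m k): block-diagonal, m copies of Zj

beta = np.array([1.0, 2.0])               # fixed effects, p' x 1
b = rng.normal(size=m * k)                # stacked random effects, (m k) x 1
e = rng.normal(size=N)                    # errors, N x 1

eta = Z @ b + e                           # eta_j = Z b_j + e_j, stacked over groups
y = X @ beta + eta                        # y = X beta + Z b + e, as in equation (2)
print(X.shape, Z.shape, b.shape, eta.shape)   # (12, 2) (12, 8) (8,) (12,)
```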

Moreover, we had seen that the variance of the error term is an N x N matrix with a block-diagonal structure,

    Var[η | X] = σ^2 [ I_n + ZDZ^T        0         ...        0
                            0        I_n + ZDZ^T    ...        0
                           ...            ...       ...       ...
                            0              0        ...   I_n + ZDZ^T ].

Here, we may use the Kronecker product for matrices. For any two matrices A and B, with A of order (r x c), the product A ⊗ B is given by the block matrix,

    A ⊗ B = [ a_{11} B  ...  a_{1c} B
                 ...    ...     ...
              a_{r1} B  ...  a_{rc} B ].

Hence, the covariance matrix of η in a balanced mixed effects model can simply be written as

    Var[η | X] = σ^2 (I_m ⊗ (I_n + ZDZ^T)).

Altogether, this gives the following mean and covariance functions for the general mixed effects model,

    E[y | X] = Xβ,      and      Var[η | X] = σ^2 (I_m ⊗ (I_n + ZDZ^T)).

For notational convenience, let us denote the variance/covariance matrix of the error term by Σ^{-1} := Var[η | X] (so that Σ is its inverse). One can then formulate an RSS criterion for estimating β, such that

    RSS(β; Σ) := (y - Xβ)^T Σ (y - Xβ).

This expression can be straightforwardly minimized using the GLS framework that we studied in lecture 6.2, in order to obtain

    β̂ = (X^T Σ X)^{-1} X^T Σ y.

However, this GLS estimator depends on D, which is unknown. Therefore, we need to resort to a more sophisticated estimation procedure.

2.3 MLE Estimation

We can express the full likelihood of our balanced model using the variance of η, such that

    y_j ~ind MVN_n(X_j β, σ^2 (I_n + ZDZ^T)),      j = 1, ..., m.

Here, there are three parameters to estimate, which can be summarized as follows,

    Θ := {θ = (β, σ^2, D) : β ∈ R^{p′}, σ^2 > 0, D ⪰ 0},

where the last condition states that D should be positive semi-definite. Strictly speaking, we are here dealing with a constrained maximization problem. However, note that the space Θ is convex, since any convex combination of two positive semi-definite matrices is itself positive semi-definite. The standard way of estimating the parameters of such general mixed effects models is through restricted maximum likelihood.
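Given the Kronecker form of Var[η | X], the GLS estimator above can be computed directly when σ^2 and D are treated as known. The following minimal sketch is not part of the original notes; the dimensions, the particular choice of D, and the variable names are illustrative assumptions, and in practice D must itself be estimated, e.g. by (restricted) maximum likelihood.

```python
# Minimal sketch (not from the original notes): the GLS estimator of beta in a
# balanced mixed effects model, treating sigma^2 and D as known. Following the
# notes' convention, Sigma denotes the *inverse* of Var[eta | X], so that
# beta_hat = (X^T Sigma X)^{-1} X^T Sigma y.
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 50, 4, 2
Zj = np.column_stack([np.ones(n), np.arange(n, dtype=float)])  # common n x k design
Xj = Zj.copy()                                                  # fixed-effects design, n x p'
X = np.vstack([Xj] * m)

sigma2 = 1.0
D = np.array([[1.0, 0.2],
              [0.2, 0.5]])                                      # k x k, positive definite

# Var[eta | X] = sigma^2 (I_m kron (I_n + Zj D Zj^T)), block-diagonal over groups.
V_block = sigma2 * (np.eye(n) + Zj @ D @ Zj.T)
V = np.kron(np.eye(m), V_block)
Sigma = np.linalg.inv(V)                                        # Sigma := Var[eta | X]^{-1}

# Simulate one data set and compute the GLS estimate.
beta = np.array([1.0, -0.5])
b = rng.multivariate_normal(np.zeros(k), sigma2 * D, size=m)    # b_j ~ MVN_k(0, sigma^2 D)
e = rng.normal(0.0, np.sqrt(sigma2), size=(m, n))
y = (Xj @ beta + (Zj @ b.T).T + e).ravel()                      # groups stacked row-wise

beta_gls = np.linalg.solve(X.T @ Sigma @ X, X.T @ Sigma @ y)
print(beta_gls)                                                 # approximately (1.0, -0.5)
```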

2.4 Identifiability

In general, we want to ensure that the model of interest is identifiable. This simply means that once we know the probability distribution (or likelihood) L(θ; y, X) of our model, we also necessarily know the parameter θ. This condition is therefore equivalent to the requirement that the map

    L( · ; y, X) : Θ → R_+

is injective, or one-to-one. In that case, L(θ_1; y, X) = L(θ_2; y, X) implies that θ_1 = θ_2 (see Demidenko, 2004, p. 117). An example of a non-identifiable model would be a linear regression with two intercepts,

    y_i = α_1 + α_2 + β x_i + e_i,      i = 1, ..., n.

If one assumes in addition that e_i ~ N(0, σ^2), we can compute the likelihood function of this model. However, several combinations of the parameters α_1 and α_2 would give the same likelihood, since only their sum α_1 + α_2 enters the mean function. The injectivity requirement above constitutes the formal definition of identifiability. In statistical practice, however, the term is usually used in a more general sense. Often, one may use the phrase degrees of identifiability, which could be interpreted as the number of degrees of freedom still available after fitting the specified parameters.

3 Regression Diagnostics: Residuals

3.1 Errors and Residuals

In standard multiple regression, we make the following assumptions on the error terms,

    E[e | X] = 0,      and      Var[e | X] = σ^2 I.

This should be contrasted with the moments of the residuals, which were found to be

    E[ê | X] = 0,      and      Var[ê | X] = σ^2 (I - H).

Observe that the moments of the error terms do not depend on the values of the x_i's. However, the residuals do depend on the x_i's, through the hat matrix. Whereas the variance of the error terms is assumed to be homoscedastic, the residuals will have different variances, depending on the diagonal entries of H. Moreover, the variance of each individual residual is given by

    Var[ê_i | X] = σ^2 (1 - h_{ii}),

where h_{ii} is the i-th diagonal entry of H; and since the off-diagonal entries of I are zero, the covariance of any two distinct residuals is given by

    Cov[ê_i, ê_j | X] = -σ^2 h_{ij},      i ≠ j.

Thus, the properties of the residuals are intrinsically linked with those of the hat matrix.

3.2 Reminder: Properties of the Hat Matrix

Recall that for any multiple regression, the hat matrix is defined as H := X(X^T X)^{-1} X^T. It satisfies

    ŷ = Xβ̂ = Hy,      and      ê = y - ŷ = (I - H)y.

Putting these two equalities together, we obtain

    y = ŷ + ê = Hy + (I - H)y.

Since H is an orthogonal projection onto the column space of X, it also follows that

    HX = X      and      (I - H)X = 0,

where 0 is here a matrix of order n x p′.
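The relationship between the residuals and the hat matrix can be illustrated numerically. The sketch below is not part of the original notes: it computes H, verifies that HX = X (and hence that the residuals are orthogonal to the columns of X), and evaluates the residual variances σ^2 (1 - h_{ii}). The simulated data and the NumPy implementation are illustrative assumptions.

```python
# Minimal sketch (not from the original notes): the hat matrix and the residual
# variances sigma^2 (1 - h_ii) for an arbitrary design matrix.
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # includes an intercept
beta = np.array([1.0, 2.0, -1.0])
sigma = 0.5
y = X @ beta + rng.normal(0.0, sigma, size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)        # H = X (X^T X)^{-1} X^T
h = np.diag(H)                               # leverages h_ii
y_hat = H @ y                                # fitted values
resid = y - y_hat                            # residuals, e_hat = (I - H) y

print(np.allclose(H @ X, X))                 # HX = X, hence (I - H) X = 0
print(np.allclose(X.T @ resid, 0))           # residuals orthogonal to the columns of X
print(sigma**2 * (1.0 - h))                  # Var[e_hat_i | X] = sigma^2 (1 - h_ii)
```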

3.3 Leverages

Let x_i := [x_{i0}, ..., x_{ip}]^T denote the i-th row of X, here defined as a column vector. Each diagonal entry in the hat matrix takes the form

    h_{ii} = x_i^T (X^T X)^{-1} x_i,

and moreover it can be shown that, for any i = 1, ..., n,

    1/n ≤ x_i^T (X^T X)^{-1} x_i ≤ 1/r,

where r is the number of rows of X that are identical to x_i (the lower bound requiring the mean function to include an intercept). This quantity is called the leverage, because it controls the relative importance of each data point, through the following relationships:

1. As h_{ii} → 0, that particular data point becomes irrelevant in the computation of ŷ_i, since the variance of its residual, Var[ê_i | X], approaches its maximum value, σ^2.

2. As h_{ii} → 1, the residual for that data point becomes increasingly close to 0, because its variance also approaches 0. Therefore, ŷ_i ≈ y_i.

A good example of this upper bound is a standard ANOVA model with m groups, for instance. In that case, the leverage for any point labelled in the j-th group is given by h_{ii} = 1/n_j. For every data point, the fitted value can be shown to be a function of the form

    ŷ_i = Σ_{j=1}^n h_{ij} y_j = h_{ii} y_i + Σ_{j≠i} h_{ij} y_j,

for every i = 1, ..., n. If the mean function includes an intercept, every h_{ii} admits the following decomposition,

    h_{ii} = 1/n + (x_i* - x̄*)^T (X_c^T X_c)^{-1} (x_i* - x̄*),

where x_i = [1, (x_i*)^T]^T, so that x_i* = [x_{i1}, ..., x_{ip}]^T collects the p predictors; the mean covariate profile is defined as x̄* := (1/n) Σ_{i=1}^n x_i*; and X_c is the corrected (mean-centered) matrix of predictors,

    X_c := [ x_{11} - x̄_1   ...   x_{1p} - x̄_p
                  ...        ...        ...
             x_{n1} - x̄_1   ...   x_{np} - x̄_p ],

whose i-th row is (x_i* - x̄*)^T. In simple linear regression, we obtain the following formula, where we observe that the h_{ii}'s are only a function of the x_i's,

    h_{ii} = 1/n + (x_i - x̄)^2 / Σ_{j=1}^n (x_j - x̄)^2.

A quick numerical check of this identity is sketched after the references below.

References

Demidenko, E. (2004). Mixed Models: Theory and Applications. Wiley, London.
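As a brief numerical aside to Section 3.3, and not part of the original notes, the simple linear regression expression for h_{ii} can be checked against the diagonal of the hat matrix; the simulated data below are an illustrative assumption.

```python
# Numerical check (not from the original notes) of the simple linear regression
# leverage formula h_ii = 1/n + (x_i - x_bar)^2 / sum_j (x_j - x_bar)^2.
import numpy as np

rng = np.random.default_rng(3)
n = 25
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])               # design matrix with an intercept

H = X @ np.linalg.solve(X.T @ X, X.T)              # hat matrix
h_from_H = np.diag(H)
h_from_formula = 1.0 / n + (x - x.mean())**2 / np.sum((x - x.mean())**2)
print(np.allclose(h_from_H, h_from_formula))       # True
```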