Lecture 3: Linear Models. Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012


Quick Review of the Major Points
The general linear model can be written as y = Xβ + e
y = vector of observed dependent values
X = design matrix: observations of the variables in the assumed linear model
β = vector of unknown parameters to estimate
e = vector of residuals (deviations from the model fit), e = y - Xβ

y = Xβ + e
The solution for β depends on the covariance structure (= covariance matrix) of the vector e of residuals.
Ordinary least squares (OLS): e ~ MVN(0, σ² I). Residuals are homoscedastic and uncorrelated, so that we can write the covariance matrix of e as Cov(e) = σ² I. The OLS estimate is OLS(β) = (X^T X)^-1 X^T y.
Generalized least squares (GLS): e ~ MVN(0, V). Residuals are heteroscedastic and/or dependent; the GLS estimate is GLS(β) = (X^T V^-1 X)^-1 X^T V^-1 y.
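As a quick numerical illustration, here is a minimal numpy sketch of both estimators (the data, the V matrix, and the variable names below are made up for illustration, not taken from these notes):

```python
import numpy as np

# Minimal sketch: OLS and GLS estimates of beta for y = X*beta + e.
rng = np.random.default_rng(0)
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one predictor
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(size=n)

# OLS: assumes Cov(e) = sigma^2 * I
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# GLS: assumes Cov(e) = V (an arbitrary diagonal V here, purely for illustration)
V = np.diag(rng.uniform(0.5, 2.0, size=n))
Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)

print(beta_ols, beta_gls)
```

When V is proportional to I the two estimates coincide, which is the OLS special case described above.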

BLUE
Both the OLS and GLS solutions are also called the Best Linear Unbiased Estimator (or BLUE for short).
Whether the OLS or GLS form is used depends on the assumed covariance structure for the residuals:
Special case of Var(e) = σ²_e I -- OLS
All others, i.e., Var(e) = R -- GLS

Linear Models
One tries to explain a dependent variable y as a linear function of a number of independent (or predictor) variables. A multiple regression is a typical linear model,
y = µ + β_1 x_1 + β_2 x_2 + ... + β_n x_n + e
Here e is the residual, the deviation between the observed value and the value predicted by the linear model. The (partial) regression coefficients are interpreted as follows: a unit change in x_i, while holding all other variables constant, results in a change of β_i in y.

Linear Models
As with a univariate regression (y = a + bx + e), the model parameters are typically chosen by least squares, wherein they are chosen to minimize the sum of squared residuals, Σ e_i².
This unweighted sum of squared residuals assumes an OLS error structure, so all residuals are equally weighted (homoscedastic) and uncorrelated.
If the residuals differ in variances and/or some are correlated (GLS conditions), then we need to minimize the weighted sum e^T V^-1 e, which removes correlations and gives all residuals equal variance.

Predictor and Indicator Variables
Suppose we are measuring the offspring of p sires. One linear model would be
y_ij = µ + s_i + e_ij
y_ij = trait value of offspring j from sire i
µ = overall mean. This term is included to give the s_i terms a mean value of zero, i.e., they are expressed as deviations from the mean
s_i = the effect of sire i (the mean of its offspring). Recall that the variance in the s_i estimates Cov(half sibs) = V_A/4
e_ij = the deviation of the jth offspring from the family mean of sire i. The variance of the e's estimates the within-family variance.

Predictor and Indicator Variables
In a regression, the predictor variables are typically continuous, although they need not be.
y_ij = µ + s_i + e_ij
Note that the predictor variables here are the s_i (the value associated with sire i), something that we are trying to estimate. We can write this in linear model form by using indicator variables,
x_ik = 1 if sire k = i, 0 otherwise

Models consisting entirely of indicator variables are typically called ANOVA, or analysis of variance, models.
Models that contain no indicator variables (other than for the mean), but rather consist of observed values of continuous or discrete variables, are typically called regression models.
Both are special cases of the General Linear Model (or GLM)
y_ijk = µ + s_i + d_ij + β x_ijk + e_ijk
Example: nested half-sib/full-sib design with an age correction β on the trait

Example: nested half-sib/full-sib design with an age correction β on the trait
y_ijk = µ + s_i + d_ij + β x_ijk + e_ijk
s_i = effect of sire i
d_ij = effect of dam j crossed to sire i
x_ijk = age of the kth offspring from the i x j cross
The s_i and d_ij terms are the ANOVA (indicator-variable) part of the model, while the β x_ijk term is the regression part.

Linear Models in Matrix Form
Suppose we have 3 variables in a multiple regression, with four (y, x) vectors of observations,
y_i = µ + β_1 x_i1 + β_2 x_i2 + β_3 x_i3 + e_i
In matrix form, y = Xβ + e, with
y = [y_1, y_2, y_3, y_4]^T,  β = [µ, β_1, β_2, β_3]^T,  e = [e_1, e_2, e_3, e_4]^T
X =
[ 1  x_11  x_12  x_13 ]
[ 1  x_21  x_22  x_23 ]
[ 1  x_31  x_32  x_33 ]
[ 1  x_41  x_42  x_43 ]
The design matrix X: details of both the experimental design and the observed values of the predictor variables all reside solely in X.

The Sire Model in Matrix Form
The model here is y_ij = µ + s_i + e_ij
Consider three sires. Sire 1 has 2 offspring, sire 2 has one, and sire 3 has three. The GLM form is y = Xβ + e, with
y = [y_11, y_12, y_21, y_31, y_32, y_33]^T,  β = [µ, s_1, s_2, s_3]^T,  e = [e_11, e_12, e_21, e_31, e_32, e_33]^T
X =
[ 1  1  0  0 ]
[ 1  1  0  0 ]
[ 1  0  1  0 ]
[ 1  0  0  1 ]
[ 1  0  0  1 ]
[ 1  0  0  1 ]
We still need to specify the covariance structure of the residuals. For example, there could be common-family effects that make some of them correlated.
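A small sketch of how one might build this indicator design matrix programmatically (numpy; the sire-to-record assignment is the one from the slide, the variable names are mine):

```python
import numpy as np

# Sketch: sire-model design matrix for 3 sires with 2, 1, and 3 offspring.
# Column 0 is the overall mean; columns 1-3 are 0/1 indicators for sires 1-3.
sire_of_obs = [1, 1, 2, 3, 3, 3]          # sire index for each of the 6 records
X = np.zeros((6, 4))
X[:, 0] = 1.0                              # mu column
for row, sire in enumerate(sire_of_obs):
    X[row, sire] = 1.0                     # indicator x_ik = 1 if record row is from sire k
print(X)
```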

In-class Exercise
Suppose you measure height and sprint speed for five individuals, with heights (x) of 9, 10, 11, 12, 13 and associated sprint speeds (y) of 60, 138, 131, 170, 221.
1) Write in matrix form (i.e., the design matrix X and vector β of unknowns) the following models:
y = bx
y = a + bx
y = bx^2
y = a + bx + cx^2
2) Using the X and y associated with these models, compute the OLS BLUE, β = (X^T X)^-1 X^T y, for each.
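For checking answers afterwards, one possible way to set the computation up in numpy is sketched below (the dictionary and variable names are mine; the numbers are from the slide):

```python
import numpy as np

# Sketch of one way to set up the in-class exercise numerically.
x = np.array([9., 10., 11., 12., 13.])
y = np.array([60., 138., 131., 170., 221.])

designs = {
    "y = bx":            np.column_stack([x]),
    "y = a + bx":        np.column_stack([np.ones_like(x), x]),
    "y = bx^2":          np.column_stack([x**2]),
    "y = a + bx + cx^2": np.column_stack([np.ones_like(x), x, x**2]),
}

for name, X in designs.items():
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # OLS BLUE (X^T X)^-1 X^T y
    print(name, np.round(beta_hat, 3))
```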

Rank of the design matrix
With n observations and p unknowns, X is an n x p matrix, so that X^T X is p x p.
Thus, X can provide unique estimates for at most p < n parameters.
The rank of X is the number of independent rows of X. If X is of full rank, then rank = p.
A parameter is said to be estimable if we can provide a unique estimate of it. If the rank of X is k < p, then exactly k parameters are estimable (some as linear combinations, e.g., β_1 - 3β_3 = 4).
If det(X^T X) = 0, then X is not of full rank.
The number of nonzero eigenvalues of X^T X gives the rank of X.
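A short numpy sketch of these rank checks, using the sire-model X from the earlier slide (which is not of full rank, since the µ column equals the sum of the three sire indicator columns):

```python
import numpy as np

# Sketch: rank and estimability checks for the sire-model design matrix.
X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [1, 0, 0, 1]], dtype=float)
XtX = X.T @ X
eigvals = np.linalg.eigvalsh(XtX)
print("det(X^T X) =", np.linalg.det(XtX))              # ~0: X is not of full rank
print("rank =", int(np.sum(eigvals > 1e-10)))          # number of nonzero eigenvalues
print("rank via numpy =", np.linalg.matrix_rank(X))    # same answer (3 of the 4 parameters estimable)
```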

Experimental design and X
The structure of X determines not only which parameters are estimable, but also the expected sample variances, as Var(β) = k (X^T X)^-1.
Experimental design determines the structure of X before an experiment (of course, missing data almost always mean the final X is different from the proposed X).
Different criteria are used for an optimal design. Let V = (X^T X)^-1. The idea is to choose a design for X, given the constraints of the experiment, according to one of these criteria:
A-optimality: minimizes tr(V)
D-optimality: minimizes det(V)
E-optimality: minimizes the leading eigenvalue of V
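A sketch of how the three criteria might be compared for two candidate designs (the two sets of predictor values are arbitrary illustrations, not designs from the notes):

```python
import numpy as np

# Sketch: compare two candidate designs by the A-, D-, and E-criteria.
def design_criteria(X):
    V = np.linalg.inv(X.T @ X)          # V = (X^T X)^-1
    eigvals = np.linalg.eigvalsh(V)
    return {"A: tr(V)": np.trace(V),
            "D: det(V)": np.linalg.det(V),
            "E: leading eigenvalue of V": eigvals.max()}

x_spread  = np.array([1., 2., 5., 8., 10.])     # predictor values spread out
x_clumped = np.array([4., 5., 5., 6., 6.])      # predictor values clumped together
for x in (x_spread, x_clumped):
    X = np.column_stack([np.ones_like(x), x])   # simple intercept + slope design
    print(design_criteria(X))
```

In this toy comparison the spread-out design gives smaller values of all three criteria, i.e., it is the better design under any of them.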

Ordinary Least Squares (OLS)
When the covariance structure of the residuals has a certain form, we solve for the vector β using OLS.
If the residuals follow a MVN distribution, the OLS solution is also the ML solution.
If the residuals are homoscedastic and uncorrelated, σ²(e_i) = σ²_e and σ(e_i, e_j) = 0, then each residual is equally weighted, and the sum of squared residuals can be written as
Σ_{i=1}^n ê_i² = ê^T ê = (y - Xβ)^T (y - Xβ)
where Xβ gives the predicted values of the y's.

Ordinary Least Squares (OLS)
Σ_{i=1}^n ê_i² = ê^T ê = (y - Xβ)^T (y - Xβ)
Taking (matrix) derivatives shows this is minimized by
β = (X^T X)^-1 X^T y
This is the OLS estimate of the vector β. The variance-covariance matrix of the sample estimates is
V_β = (X^T X)^-1 σ²_e
Its ij-th element gives the covariance between the estimates of β_i and β_j.

Sample Variances/Covariances
The residual variance can be estimated as
σ²_e = (1/(n - rank(X))) Σ_{i=1}^n ê_i²
The estimated residual variance can be substituted into V_β = (X^T X)^-1 σ²_e to give an approximation for the sampling variances and covariances of our estimates. Confidence intervals follow since the vector of estimates ~ MVN(β, V_β).
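A numpy sketch of these two steps on simulated data (the data and names are made up):

```python
import numpy as np

# Sketch: estimate sigma^2_e and the sampling (co)variances of beta-hat.
rng = np.random.default_rng(1)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(scale=1.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_e = resid @ resid / (n - np.linalg.matrix_rank(X))   # SSE / (n - rank(X))
V_beta = np.linalg.inv(X.T @ X) * sigma2_e                  # sampling variance-covariance matrix
se_beta = np.sqrt(np.diag(V_beta))                          # standard errors of the estimates
print(np.round(beta_hat, 3), np.round(se_beta, 3))
```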

Example: Regression Through the Origin
y_i = β x_i + e_i
Here y = [y_1, ..., y_n]^T,  X = [x_1, ..., x_n]^T,  β = (β)
X^T X = Σ_{i=1}^n x_i²,  X^T y = Σ_{i=1}^n x_i y_i
b = (X^T X)^-1 X^T y = Σ x_i y_i / Σ x_i²
σ²(b) = (X^T X)^-1 σ²_e = σ²_e / Σ x_i²
σ²_e = (1/(n - 1)) Σ (y_i - b x_i)²
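As a check, a short numpy sketch confirming that the matrix formula reduces to these scalar expressions (made-up numbers):

```python
import numpy as np

# Sketch: regression through the origin, scalar vs. matrix form.
x = np.array([1., 2., 3., 4., 5.])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b_scalar = np.sum(x * y) / np.sum(x**2)            # sum(x_i y_i) / sum(x_i^2)
X = x.reshape(-1, 1)                               # single-column design matrix
b_matrix = np.linalg.solve(X.T @ X, X.T @ y)[0]    # (X^T X)^-1 X^T y
sigma2_e = np.sum((y - b_scalar * x)**2) / (len(x) - 1)
var_b = sigma2_e / np.sum(x**2)                    # sigma^2_e / sum(x_i^2)
print(b_scalar, b_matrix, var_b)
```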

Polynomial Regressions
The GLM can easily handle any function of the observed predictor variables, provided the parameters to estimate are still linear, e.g.,
y = α + β_1 f(x) + β_2 g(x) + ... + e
Quadratic regression: y_i = α + β_1 x_i + β_2 x_i² + e_i
β = [α, β_1, β_2]^T
X =
[ 1  x_1  x_1² ]
[ 1  x_2  x_2² ]
...
[ 1  x_n  x_n² ]

Interaction Effects
Interaction terms (e.g., sex x age) are handled similarly
y_i = α + β_1 x_i1 + β_2 x_i2 + β_3 x_i1 x_i2 + e_i
β = [α, β_1, β_2, β_3]^T
X =
[ 1  x_11  x_12  x_11 x_12 ]
[ 1  x_21  x_22  x_21 x_22 ]
...
[ 1  x_n1  x_n2  x_n1 x_n2 ]
With x_1 held constant, a unit change in x_2 changes y by β_2 + β_3 x_1 (i.e., the slope in x_2 depends on the current value of x_1). Likewise, a unit change in x_1 changes y by β_1 + β_3 x_2.

The GLM lets you build your own model!
Suppose you want a quadratic regression forced through the origin, where the slope of the quadratic term can vary over the sexes:
y_i = β_1 x_i + β_2 x_i² + β_3 s_i x_i² + e_i
s_i is an indicator (0/1) variable for the sex (0 = male, 1 = female).
Male quadratic slope = β_2, female quadratic slope = β_2 + β_3
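A sketch of the corresponding design matrix in numpy (the x and sex values are made up; only the construction of the columns is the point):

```python
import numpy as np

# Sketch: design matrix for a quadratic through the origin with a
# sex-specific quadratic slope (columns for beta_1, beta_2, beta_3).
x   = np.array([1., 2., 3., 4., 5., 6.])
sex = np.array([0,  0,  0,  1,  1,  1])        # indicator s_i: 0 = male, 1 = female
X = np.column_stack([x, x**2, sex * x**2])
print(X)
# After fitting, the male quadratic slope is beta_2 and the female slope is beta_2 + beta_3.
```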

Fixed vs. Random Effects
In linear models we are trying to accomplish two goals: estimate the values of the model parameters and estimate any appropriate variances.
For example, in the simplest regression model, y = α + βx + e, we estimate the values of α and β and also the variance of e. We can, of course, also estimate the e_i = y_i - (α + βx_i).
Note that α and β are fixed constants we are trying to estimate (fixed factors or fixed effects), while the e_i values are drawn from some probability distribution (typically normal with mean 0 and variance σ²_e). The e_i are random effects.

This distinction between fixed and random effects is extremely important in terms of how we analyze a model. If a parameter is a fixed constant we wish to estimate, it is a fixed effect. If a parameter is drawn from some probability distribution and we are trying to make inferences about either that distribution and/or specific realizations from it, it is a random effect.
We generally speak of estimating fixed factors and predicting random effects.
Mixed models contain both fixed and random factors:
y = Xb + Zu + e,  u ~ MVN(0, R),  e ~ MVN(0, σ²_e I)
Key: we need to specify the covariance structures for a mixed model.

Example: Sire model
y_ij = µ + s_i + e_ij
Here µ is a fixed effect and e a random effect. Is the sire effect s fixed or random? It depends. If we have (say) 10 sires, and we are ONLY interested in the values of these particular 10 sires and don't care to make any other inferences about the population from which the sires are drawn, then we can treat them as fixed effects. In this case, the model is fully specified by the covariance structure of the residuals. Thus, we need to estimate µ, s_1 to s_10, and σ²_e, and we write the model as
y_ij = µ + s_i + e_ij,  σ²(e) = σ²_e I

y_ij = µ + s_i + e_ij
Conversely, if we are not only interested in these 10 particular sires but also wish to make some inference about the population from which they were drawn (such as the additive variance, since σ²_A = 4σ²_s), then the s_i are random effects. In this case we wish to estimate µ and the variances σ²_s and σ²_w. Since 2s_i also estimates (or predicts) the breeding value of sire i, we also wish to estimate (predict) these as well. Under a random-effects interpretation, we write the model as
y_ij = µ + s_i + e_ij,  σ²(e) = σ²_e I,  σ²(s) = σ²_A A

Generalized Least Squares (GLS)
Suppose the residuals no longer have the same variance (i.e., display heteroscedasticity). Clearly we do not wish to minimize the unweighted sum of squared residuals, because those residuals with smaller variance should receive more weight.
Likewise, in the event the residuals are correlated, we also wish to take this into account (i.e., perform a suitable transformation to remove the correlations) before minimizing the sum of squares.
Either of the above settings leads to a GLS solution in place of an OLS solution.

In the GLS setting, the covariance matrix for the vector e of residuals is written as R, where R_ij = σ(e_i, e_j). The linear model becomes
y = Xβ + e,  Cov(e) = R
The GLS solution for β is
b = (X^T R^-1 X)^-1 X^T R^-1 y
The variance-covariance matrix of the estimated model parameters is given by
V_b = (X^T R^-1 X)^-1 σ²_e

Example
One common setting is where the residuals are uncorrelated but heteroscedastic, with σ²(e_i) = σ²_e / w_i. For example, if sample i is the mean value of n_i individuals, then σ²(e_i) = σ²_e / n_i, so here w_i = n_i.
R = Diag(w_1^-1, w_2^-1, ..., w_n^-1)
Consider the model y_i = α + β x_i, with
β = [α, β]^T
X =
[ 1  x_1 ]
...
[ 1  x_n ]

Here R^-1 = Diag(w_1, w_2, ..., w_n), giving
X^T R^-1 y = w [ y_w, (xy)_w ]^T
X^T R^-1 X = w [ 1     x_w   ]
               [ x_w   (x²)_w ]
where the weighted averages are
w = Σ_{i=1}^n w_i,  x_w = Σ w_i x_i / w,  y_w = Σ w_i y_i / w,  (xy)_w = Σ w_i x_i y_i / w,  (x²)_w = Σ w_i x_i² / w

This gives the GLS estimators of α and β as
a = y_w - b x_w
b = [ (xy)_w - x_w y_w ] / [ (x²)_w - (x_w)² ]
Likewise, the resulting variances and covariance for these estimators are
σ²(a) = σ²_e (x²)_w / [ w ( (x²)_w - (x_w)² ) ]
σ²(b) = σ²_e / [ w ( (x²)_w - (x_w)² ) ]
σ(a, b) = -σ²_e x_w / [ w ( (x²)_w - (x_w)² ) ]
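A numpy sketch of this weighted (GLS) regression, checking that the matrix solution (X^T R^-1 X)^-1 X^T R^-1 y matches the weighted-average formulas above (all numbers simulated purely for illustration):

```python
import numpy as np

# Sketch: GLS when residuals are uncorrelated but heteroscedastic,
# with sigma^2(e_i) = sigma^2_e / w_i (e.g. record i is a mean of w_i individuals).
rng = np.random.default_rng(2)
n = 12
x = rng.uniform(0, 10, size=n)
w = rng.integers(2, 20, size=n).astype(float)       # weights w_i = n_i
e = rng.normal(scale=1.0 / np.sqrt(w))              # Var(e_i) proportional to 1/w_i
y = 1.0 + 0.5 * x + e

X = np.column_stack([np.ones(n), x])
Rinv = np.diag(w)                                   # R^-1 = Diag(w_1, ..., w_n)
beta_gls = np.linalg.solve(X.T @ Rinv @ X, X.T @ Rinv @ y)

# Weighted-average form from the slides
wsum = w.sum()
xw, yw = (w * x).sum() / wsum, (w * y).sum() / wsum
xyw, x2w = (w * x * y).sum() / wsum, (w * x**2).sum() / wsum
b = (xyw - xw * yw) / (x2w - xw**2)
a = yw - b * xw
print(beta_gls, (a, b))                             # the two routes agree
```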

Chi-square and F distributions
Let U_i ~ N(0,1), i.e., a unit normal. The sum U_1² + U_2² + ... + U_k² is a chi-square random variable with k degrees of freedom.
Under appropriate normality assumptions, the sums of squares that appear in linear models are also chi-square distributed. In particular,
Σ_{i=1}^n (x_i - x̄)² / σ²  ~  χ²_{n-1}
The ratio of two chi-squares is an F distribution.

In particular, an F distribution with k numerator degrees of freedom and n denominator degrees of freedom is given by
(χ²_k / k) / (χ²_n / n)  ~  F_{k,n}
Thus, F distributions frequently arise in tests of linear models, as these usually involve ratios of sums of squares.

Sums of Squares in linear models
The total sum of squares SS_T of a linear model can be written as the sum of the error (or residual) sum of squares and the model (or regression) sum of squares,
SS_T = SS_M + SS_E
SS_T = Σ (y_i - ȳ)²,  SS_M = Σ (ŷ_i - ȳ)²,  SS_E = Σ (y_i - ŷ_i)²
r², the coefficient of determination, is the fraction of variation accounted for by the model,
r² = SS_M / SS_T = 1 - SS_E / SS_T

Sums of Squares are quadratic products
SS_T = Σ_{i=1}^n (y_i - ȳ)² = Σ y_i² - n ȳ² = Σ y_i² - (1/n) (Σ y_i)²
We can write this as a quadratic product,
SS_T = y^T y - (1/n) y^T J y = y^T ( I - (1/n) J ) y
where J is a matrix all of whose elements are 1's. Similarly,
SS_E = Σ_{i=1}^n (y_i - ŷ_i)² = Σ ê_i²
SS_E = y^T ( I - X (X^T X)^-1 X^T ) y
SS_M = SS_T - SS_E = y^T ( X (X^T X)^-1 X^T - (1/n) J ) y
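A numpy sketch verifying these quadratic-product forms against the direct sums on simulated data (the data are made up):

```python
import numpy as np

# Sketch: SS_T, SS_E, and SS_M computed both directly and as quadratic products.
rng = np.random.default_rng(3)
n = 25
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

J = np.ones((n, n))                                   # matrix of all 1's
P = X @ np.linalg.inv(X.T @ X) @ X.T                  # projection ("hat") matrix X(X^T X)^-1 X^T
I = np.eye(n)

SST = y @ (I - J / n) @ y
SSE = y @ (I - P) @ y
SSM = y @ (P - J / n) @ y

yhat = P @ y
print(np.isclose(SST, np.sum((y - y.mean())**2)),
      np.isclose(SSE, np.sum((y - yhat)**2)),
      np.isclose(SST, SSM + SSE))
print("r^2 =", SSM / SST)
```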

Expected value of sums of squares
In ANOVA tables, the E(MS), or expected value of the mean squares (scaled SS, or sums of squares), often appears. This follows directly from the quadratic product: if E(x) = µ and Var(x) = V, then
E(x^T A x) = tr(AV) + µ^T A µ

Hypothesis testing
Provided the residual errors in the model are MVN, then for a model with n observations and p estimated parameters,
SS_E / σ²_e  ~  χ²_{n-p}
Consider the comparison of a full model (p parameters) and a reduced model (q < p parameters), where SS_Er = error SS for the reduced model and SS_Ef = error SS for the full model. Then
[ (SS_Er - SS_Ef) / (p - q) ] / [ SS_Ef / (n - p) ]  =  [ (n - p) / (p - q) ] ( SS_Er / SS_Ef - 1 )
The difference in the error sums of squares for the full and reduced models provides a test of whether the model fit is the same; this ratio follows an F_{p-q, n-p} distribution.

Does our model account for a significant fraction of the variation?
Here the reduced model is just y_i = µ + e_i.
In this case, the error sum of squares for the reduced model is just the total sum of squares, and the F-test ratio becomes
[ (n - p) / (p - 1) ] ( SS_T / SS_Ef - 1 )  =  [ (n - p) / (p - 1) ] ( r² / (1 - r²) )
This ratio follows an F_{p-1, n-p} distribution.
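A sketch of this F test in numpy/scipy, comparing a full model (intercept plus slope, p = 2) against the mean-only reduced model (q = 1) on simulated data (the data and the scipy call for the F tail probability are my additions, not from the notes):

```python
import numpy as np
from scipy import stats

# Sketch: F test of a full model vs. the reduced model y_i = mu + e_i.
rng = np.random.default_rng(4)
n = 30
x = rng.normal(size=n)
y = 1.0 + 0.8 * x + rng.normal(size=n)

Xf = np.column_stack([np.ones(n), x])       # full model
Xr = np.ones((n, 1))                        # reduced model (mean only)

def sse(X, y):
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    r = y - X @ beta
    return r @ r

p, q = Xf.shape[1], Xr.shape[1]
SSE_f, SSE_r = sse(Xf, y), sse(Xr, y)
F = ((SSE_r - SSE_f) / (p - q)) / (SSE_f / (n - p))
p_value = stats.f.sf(F, p - q, n - p)       # upper tail of F_{p-q, n-p}
print(F, p_value)
```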

Model diagnostics
It's all about the residuals:
Plot the residuals -- a quick and easy screen for outliers.
Test for normality among the estimated residuals -- Q-Q plot, Wilk-Shapiro test.
If non-normal, try transformations, such as log.
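A sketch of these diagnostics in Python (scipy's Shapiro-Wilk test plus the coordinates one would plot for a Q-Q plot; the simulated data are made up):

```python
import numpy as np
from scipy import stats

# Sketch: residual diagnostics for a fitted linear model.
rng = np.random.default_rng(5)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, 1.5]) + rng.normal(size=n)

beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta

W, p_value = stats.shapiro(resid)                     # Wilk-Shapiro (Shapiro-Wilk) normality test
print("Shapiro-Wilk W =", W, "p =", p_value)

# Q-Q plot coordinates: sorted residuals vs. normal quantiles;
# plotting observed against theoretical should be roughly linear if the residuals are normal.
theoretical = stats.norm.ppf((np.arange(1, n + 1) - 0.5) / n)
observed = np.sort(resid)
```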

OLS, GLS summary

Different statistical models
GLM = general linear model
OLS (ordinary least squares): e ~ MVN(0, cI)
GLS (generalized least squares): e ~ MVN(0, R)
Mixed models: both fixed and random effects (beyond the residual)
Mixture models: a weighted mixture of distributions
Generalized linear models: nonlinear functions, non-normality

Mixture models
Under a mixture model, an observation potentially comes from one of several different distributions, so that the density function is
π_1 φ_1 + π_2 φ_2 + π_3 φ_3
The mixture proportions π_i sum to one. The φ_i represent the different distributions, e.g., normal with mean µ_i and variance σ².
Mixture models come up in QTL mapping -- an individual could have QTL genotype QQ, Qq, or qq (see Lynch & Walsh, Chapter 13).
They also come up in codon models of evolution, where a site may be neutral, deleterious, or advantageous, each with a different distribution of selection coefficients (see Walsh & Lynch, volume 2A website, Chapters 10, 11).

Generalized linear models
Typically assume a non-normal distribution for the residuals, e.g., Poisson, binomial, gamma, etc.

Likelihoods for GLMs
Under the assumption of MVN, x ~ MVN(β, V), the likelihood function becomes
L(β, V | x) = (2π)^{-n/2} |V|^{-1/2} exp[ -(1/2) (x - β)^T V^-1 (x - β) ]
Variance components (e.g., σ²_A, σ²_e, etc.) are included in V.
REML = restricted maximum likelihood. This is the method of choice for variance components, as it maximizes that part of the likelihood function that is independent of the fixed effects, β.
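A sketch of evaluating this log-likelihood numerically (the β, V, and x values are made up; scipy's multivariate normal is used only as a cross-check of the formula):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Sketch: evaluate the MVN log-likelihood L(beta, V | x) for given beta and V.
n = 4
beta = np.zeros(n)                              # mean vector (X*beta in a linear model)
V = 0.5 * np.eye(n) + 0.5 * np.ones((n, n))     # a simple illustrative covariance structure
x = np.array([0.2, -0.1, 0.4, 0.0])

# Direct formula from the slide (in log form)
sign, logdet = np.linalg.slogdet(V)
quad = (x - beta) @ np.linalg.solve(V, x - beta)
loglik = -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

# Same value via scipy
print(loglik, multivariate_normal(mean=beta, cov=V).logpdf(x))
```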