1 Generalised Inverse

For the GLM

y = Xβ + e (1)

where X is an N × k design matrix and p(e) = N(0, σ^2 I_N), we can estimate the coefficients from the normal equations

(X^T X)β = X^T y (2)

If the rank of X, denoted r(X), is k (ie. X is full rank) then X^T X has an inverse (it is nonsingular) and

β̂ = (X^T X)^{-1} X^T y (3)

But if r(X) < k we can have Xβ_1 = Xβ_2 (ie. the same predictions) with β_1 ≠ β_2 (different parameters). The parameters are then not unique, identifiable or estimable. For example, a design matrix sometimes used in the Analysis of Variance (ANOVA)

X = [ 1 0 1
      1 0 1
      1 0 1
      0 1 1      (4)
      0 1 1
      0 1 1
      0 1 1 ]

has k = 3 columns but rank r(X) = 2, ie. only two linearly independent columns (any column can be expressed as a linear combination of the other two). For models such as these X^T X is not invertible, so we must resort to the generalised inverse, X^-. This is defined as any matrix X^- such that X X^- X = X. It can be shown that in the general case

β̂ = (X^T X)^- X^T y (5)
  = X^- y

If X is full rank, X^T X is invertible and X^- = (X^T X)^{-1} X^T.

There are many generalised inverses. We would often choose the pseudo-inverse (pinv in MATLAB)

β̂ = X^+ y (6)

Take-home message: avoid rank-deficient designs. If X is full rank, then X^+ = X^- = (X^T X)^{-1} X^T.
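As a minimal numerical sketch of this (using NumPy's np.linalg.pinv as the equivalent of MATLAB's pinv; the simulated data, noise level and variable names are illustrative, not from the notes), the ANOVA design of equation (4) gives unique fitted values Xβ̂ but not unique parameters:

    import numpy as np

    # Rank-deficient one-way ANOVA design from equation (4):
    # two group-indicator columns plus a constant column.
    X = np.array([[1, 0, 1],
                  [1, 0, 1],
                  [1, 0, 1],
                  [0, 1, 1],
                  [0, 1, 1],
                  [0, 1, 1],
                  [0, 1, 1]], dtype=float)
    print(np.linalg.matrix_rank(X))              # 2, although k = 3

    rng = np.random.default_rng(0)
    beta_true = np.array([1.0, 3.0, 0.0])        # illustrative parameters
    y = X @ beta_true + 0.5 * rng.standard_normal(X.shape[0])

    beta_hat = np.linalg.pinv(X) @ y             # beta_hat = X^+ y, equation (6)

    # Another solution to the normal equations, shifted along the null space of X
    # (column 3 = column 1 + column 2, so X @ [1, 1, -1] = 0).
    beta_alt = beta_hat + np.array([1.0, 1.0, -1.0])
    print(np.allclose(X @ beta_hat, X @ beta_alt))   # True: same fit, different parameters

Among the infinitely many parameter vectors that fit equally well, the pseudo-inverse returns the minimum-norm one.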

2 Estimating error variance

An unbiased estimate of the error variance σ^2 can be derived as follows. Let

X β̂ = P y (7)

where P is the projection matrix

P = X (X^T X)^- X^T (8)
  = X X^-

P y projects the data y into the space of X. P has two important properties: (i) it is symmetric, P^T = P, and (ii) it is idempotent, P P = P. This second property follows from it being a projection: if what is being projected is already in the space of X (ie. P y), then looking for that component of it that lies in the space of X will give the same thing, ie. P P y = P y. The residuals are then

ê = y - X β̂ (9)
  = (I - P) y
  = R y

where R = I_N - X X^- is the residual-forming matrix. Remember, ê is that component of the data orthogonal to the space of X. R is another projection matrix, one that projects the data y onto the orthogonal complement of X. Similarly, R has the two properties (i) R^T = R and (ii) R R = R. We now seek an unbiased estimator of the variance by first looking at the expected sum of squares

E[ê^T ê] = E[y^T R^T R y] (10)
         = E[y^T R y]

We now use the standard result: if p(a) = N(µ, V) then E[a^T B a] = µ^T B µ + Tr(B V). So, if p(y) = N(Xβ, σ^2 I_N) then

E[y^T R y] = β^T X^T R X β + Tr(σ^2 R) (11)
           = β^T (X^T X - X^T X X^- X) β + Tr(σ^2 R)
           = Tr(σ^2 (I - P))
           = σ^2 (N - r(P))
           = σ^2 (N - k)

where the second line uses R = I_N - X X^-, the first term then vanishes because X X^- X = X, and the last line uses r(P) = r(X) = k for a full-rank design. So, an unbiased estimate of the variance is

σ̂^2 = (y^T R y)/(N - k) (12)
    = RSS/(N - k)

where RSS is the Residual Sum of Squares.
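A short sketch of equation (12) (the helper name and the use of NumPy are my own choices, not from the notes):

    import numpy as np

    def sigma2_hat(X, y):
        # Unbiased error variance estimate y^T R y / (N - r(X)); this equals
        # RSS/(N - k), equation (12), when X has full column rank.
        N = X.shape[0]
        P = X @ np.linalg.pinv(X)              # projection matrix P = X X^-
        assert np.allclose(P, P.T) and np.allclose(P @ P, P)   # symmetric, idempotent
        R = np.eye(N) - P                      # residual-forming matrix R = I_N - X X^-
        rss = y @ R @ y                        # residual sum of squares
        return rss / (N - np.linalg.matrix_rank(X))

    # e.g. sigma2_hat(X, y) with the design and data from the sketch above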

Remember, the ML variance estimate is

σ̂^2_ML = (y^T R y)/N (13)

3 Comparing nested GLMs

Full model:

y = X_0 β_0 + X_1 β_1 + e (14)

Reduced model:

y = X_0 β_0 + e_0 (15)

where the full model has k regressors in total and the reduced model has p of them. Consider the test statistic

f = [(RSS_red - RSS_full)/(k - p)] / [RSS_full/(N - k)] (16)

where the Residual Sums of Squares (RSS) are

RSS_full = ê^T ê (17)
RSS_red = ê_0^T ê_0 (18)

We can re-write in terms of Extra Sum of Squares where f = ESS/(k p) RSS full /(N k) (19) ESS = RSS red RSS full (20) We can compute these quantities using RSS full = y T Ry (21) RSS red = y T R 0 y We expect the denominator to be E[RSS full /(N k)] = σ 2 (22) and, under the null (β 1 = 0), we have σ 2 0 = σ 2 and therefore expect the numerator to be E[(RSS red RSS full )/(k p)] = σ 2 (23) where r(r 0 R) = k p (mirroring the earlier expectation calculation). Under the null, we therefore expect a 8

test statistic of unity < f >= σ2 σ 2 (24) as both numerator and denominator are unbiased estimates of error variance. We might naively expect to get a numerator of zero, under the null. But this is not the case because, in any finite sample, ESS will be non zero. When we then divide by (k p) we get E[ESS/(k p)] = σ 2. When the full model is better we get a larger f value. 4 Partial correlation and R 2 The square of the partial correlaton coefficient R 2 y,x 1 X 0 = RSS red RSS full RSS red (25) is the (square) of the correlation between y and X 1 β 1 after controlling for the effect of X 0 β 0. Abbreviating the 9

above to R 2, the F-statistic can be re-written as f = R 2 /(k p) (1 R 2 )/(N k) (26) Model comparison tests are identical to tests of partial correlation. In X 0 explains no variance eg. it is a constant or empty matrix then R 2 = Y T Y Y T RY (27) Y T Y which is the proportion of variance explained by the model with design matrix X. More generally, if X 0 is not the empty matrix then R 2 is that proportion of the variability unexplained by the reduced model X 0 that is explained by the full model X. 5 Examples 10
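As a stand-in worked example (the simulated data, effect size and variable names are my own, purely illustrative), comparing a constant-only reduced model against a full model with one regressor of interest:

    import numpy as np

    rng = np.random.default_rng(1)
    N = 40
    x1 = rng.standard_normal(N)
    y = 2.0 + 0.8 * x1 + rng.standard_normal(N)     # illustrative data

    X0 = np.ones((N, 1))                            # reduced model: constant only (p = 1)
    X = np.column_stack([X0, x1])                   # full model: constant + regressor (k = 2)
    k, p = X.shape[1], X0.shape[1]

    def rss(design, y):
        # y^T R y with R = I - (design)(design)^-
        R = np.eye(len(y)) - design @ np.linalg.pinv(design)
        return float(y @ R @ y)

    rss_full, rss_red = rss(X, y), rss(X0, y)
    ess = rss_red - rss_full                        # equation (20)
    f = (ess / (k - p)) / (rss_full / (N - k))      # equations (16)/(19)
    R2 = ess / rss_red                              # partial R^2, equation (25)
    print(f, R2)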

6 How large must f be for a significant improvement?

Under the null (β_1 = 0), f follows an F-distribution with k - p numerator degrees of freedom (DF) and N - k denominator DF. Background on probability density functions (PDFs), and on how to transform them, is given elsewhere.
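For example, the critical value and p-value can be read off the F-distribution (a sketch using scipy.stats; the degrees of freedom and the observed f are illustrative numbers, not from the notes):

    from scipy import stats

    dfn, dfd = 1, 38                         # k - p and N - k from the sketch above (illustrative)
    f_obs = 25.0                             # an observed test statistic (illustrative value)
    f_crit = stats.f.ppf(0.95, dfn, dfd)     # f must exceed this value for p < 0.05
    p_value = stats.f.sf(f_obs, dfn, dfd)    # probability of an f at least this large under the null
    print(f_crit, p_value)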

7 Contrasts

We can also compare nested models using contrasts. This is more efficient, as we only need to estimate the parameters of the full model. For a contrast matrix C we wish to test the hypothesis C^T β = 0. This can correspond to a model comparison, as before, if C is chosen appropriately. But it is also more general, as we can test any effect which can be expressed as

C^T β = H^T X β (28)

for some H. This defines the space of estimable contrasts. The contrast C defines a subspace X_c = XC. As before, we can think of the hypothesis C^T β = 0 as comparing a full model, X, versus a reduced model which is now given by X_0 = X C_0, where C_0 is a contrast orthogonal to C, ie.

C_0 = I_k - C C^- (29)

A test statistic can then be generated as before: with R_0 = I_N - X_0 X_0^- and M = R_0 - R,

f = [y^T M y / r(M)] / [y^T R y / r(R)] (30)

In fMRI, the use of contrasts allows us to test for (i) main effects and interactions in factorial designs, and (ii) the choice of hemodynamic basis set. Importantly, we do not need to refit models. The numerator can be calculated efficiently as

y^T M y = ĉ^T [C^T (X^T X)^- C]^- ĉ (31)

where ĉ = C^T β̂ is the estimated effect size. See Christensen [1] for details.
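A sketch of this equivalence for a simple full-rank design and a single-column contrast (all data and names here are illustrative): it builds the reduced model X_0 = X C_0 explicitly and checks the result against the efficient form of equation (31):

    import numpy as np

    rng = np.random.default_rng(2)
    N, k = 30, 3
    X = np.column_stack([np.ones(N), rng.standard_normal(N), rng.standard_normal(N)])
    y = X @ np.array([1.0, 0.5, 0.0]) + rng.standard_normal(N)   # illustrative data

    C = np.array([[0.0], [1.0], [0.0]])      # test the second regressor: C^T beta = 0

    beta_hat = np.linalg.pinv(X) @ y
    R = np.eye(N) - X @ np.linalg.pinv(X)    # residual-forming matrix of the full model

    # Reduced model implied by the contrast: X0 = X C0 with C0 = I_k - C C^-, equation (29)
    C0 = np.eye(k) - C @ np.linalg.pinv(C)
    X0 = X @ C0
    R0 = np.eye(N) - X0 @ np.linalg.pinv(X0)
    M = R0 - R

    num_modelcomp = float(y @ M @ y)         # numerator via explicit model comparison, equation (30)

    # Efficient form, equation (31): no refit of the reduced model is needed
    c_hat = C.T @ beta_hat
    num_contrast = float(c_hat @ np.linalg.pinv(C.T @ np.linalg.pinv(X.T @ X) @ C) @ c_hat)

    print(np.allclose(num_modelcomp, num_contrast))   # True (up to numerical error)

    f = (num_modelcomp / np.linalg.matrix_rank(M)) / (float(y @ R @ y) / np.linalg.matrix_rank(R))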

8 Hemodynamic basis functions

If C(t, u) is the canonical basis function for event offset u then, using a first-order Taylor series approximation about u = u_0,

C(t, u_0 + h) ≈ C(t, u_0) + h dC(t, u)/du (32)
             = C(t, u_0) + h D(t, u_0)

where the derivative D is evaluated at u = u_0. This will allow us to accommodate small errors in event timings, or earlier/later rises in the hemodynamic response.
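A small numerical illustration of equation (32) (the gamma-shaped curve below is only a stand-in for the canonical response, not the actual canonical HRF, and the shift h is an arbitrary small value):

    import numpy as np
    from scipy import stats

    t = np.linspace(0, 30, 301)            # peristimulus time in seconds
    h = 0.5                                # small timing error in seconds (illustrative)

    def canonical(t, u):
        # Stand-in for the canonical response to an event at offset u:
        # a gamma-shaped bump, not the actual canonical HRF.
        return stats.gamma.pdf(t - u, a=6.0)

    u0 = 0.0
    c0 = canonical(t, u0)
    d0 = -np.gradient(c0, t)               # dC/du = -dC/dt when C depends on t - u only
    approx = c0 + h * d0                   # first-order Taylor approximation, equation (32)
    exact = canonical(t, u0 + h)
    print(np.max(np.abs(exact - approx)))  # small: the derivative term absorbs the timing error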

9 References

[1] R. Christensen. Plane Answers to Complex Questions: The Theory of Linear Models. Springer, New York, 2002.