B90.330 / C.005 NOTES for Wednesday 0.APR.7

Suppose that the model is Y = Xβ + ε, but ε does not have the desired variance matrix. Say that ε is normal, but Var(ε) = σ²W. The form of W is

    W = ( 1/w_1    0       0     ...    0   )
        (   0    1/w_2     0     ...    0   )
        (   0      0     1/w_3   ...    0   )
        (  ...                              )
        (   0      0       0     ...  1/w_n )

We assume that we know the w_i's, but we don't know σ². This would be described as W = diag(1/w_1, 1/w_2, 1/w_3, ..., 1/w_n).

The solution is easy. Define W^(−1/2) = diag(√w_1, √w_2, √w_3, ..., √w_n). Then

    W^(−1/2) Y = W^(−1/2) X β + W^(−1/2) ε

The distribution of W^(−1/2) ε is normal with mean 0 and variance σ²I. This gets us right back to the standard model. The problem is called weighted least squares.

The same idea works if Var(ε) = σ²W in which W is some other known symmetric positive definite matrix. The only hangup is that W^(−1/2) is more difficult to define and calculate. This problem is known as generalized least squares.

Next is material on matrix rank. Please see the handout.
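
To see the weighted least squares trick numerically, here is a minimal numpy sketch (added for illustration, not part of the notes; the data, weights, and coefficients are all invented). Transforming by W^(−1/2) and running ordinary least squares reproduces the generalized least squares formula b = (X′W⁻¹X)⁻¹X′W⁻¹Y.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 50, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    beta = np.array([1.0, 2.0, -0.5])
    w = rng.uniform(0.5, 2.0, size=n)            # known weights: Var(eps_i) = sigma^2 / w_i
    eps = rng.normal(scale=np.sqrt(1.0 / w))     # heteroscedastic errors, sigma = 1 here
    y = X @ beta + eps                           # model Y = X beta + eps, Var(eps) = sigma^2 W

    # Route 1: multiply through by W^(-1/2) = diag(sqrt(w_i)) and use ordinary least squares
    Xs = np.sqrt(w)[:, None] * X
    ys = np.sqrt(w) * y
    b_wls = np.linalg.lstsq(Xs, ys, rcond=None)[0]

    # Route 2: the generalized least squares formula b = (X' W^-1 X)^-1 X' W^-1 y, with W^-1 = diag(w)
    Winv = np.diag(w)
    b_gls = np.linalg.solve(X.T @ Winv @ X, X.T @ Winv @ y)

    print(np.allclose(b_wls, b_gls))             # True: the two routes give the same estimate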

We need to ask what we lose if X does not have full rank. It's simple. The fitted vector Ŷ is still unique. The residual sum of squares is identified (though its degrees of freedom get confused). The estimated coefficient b is not identifiable.

Let's do the definition of positive definite. For ordinary numbers, the concept a > 0 is very clear. What should we mean by having a positive matrix? It's too much to ask that every entry be positive. We'll restrict our notions here to square matrices. There is no material consequence to the distinction symmetric versus non-symmetric, so we'll assume symmetric matrices.

An n × n matrix W is called positive definite if

    ( x is n × 1 and x ≠ 0 )  implies  ( x′Wx > 0 )

The matrix

    W = ( 4  0 )
        ( 0  1 )

is positive definite. If x = (x_1, x_2)′ and x ≠ 0, then x′Wx = 4x_1² + x_2² > 0.

The matrix

    W = ( 10   1 )
        (  1  40 )

is also positive definite. If x = (x_1, x_2)′ and x ≠ 0, then x′Wx = 10x_1² + 2x_1x_2 + 40x_2² > 0.

An n × n matrix W is called positive semi-definite if

    ( x is n × 1 )  implies  ( x′Wx ≥ 0 )

The concepts negative definite and negative semi-definite are defined in the obvious way, but these ideas are much less useful.
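
The definition is easy to probe numerically. Below is a small numpy sketch (added for illustration, not part of the notes): it checks x′Wx > 0 on many random non-zero x for the two example matrices above, and shows the same probe rejecting an indefinite matrix.

    import numpy as np

    W1 = np.array([[ 4.0,  0.0],
                   [ 0.0,  1.0]])
    W2 = np.array([[10.0,  1.0],
                   [ 1.0, 40.0]])
    M  = np.array([[ 1.0,  3.0],
                   [ 3.0,  1.0]])          # not positive definite: x = (1, -1) gives x'Mx = -4

    def probe_positive_definite(W, trials=10_000, seed=0):
        # Numerically probe x'Wx > 0 over many random non-zero x (a check, not a proof).
        rng = np.random.default_rng(seed)
        xs = rng.normal(size=(trials, W.shape[0]))
        quad = np.einsum('ij,jk,ik->i', xs, W, xs)   # quad[i] = x_i' W x_i
        return bool(np.all(quad > 0))

    print(probe_positive_definite(W1), probe_positive_definite(W2), probe_positive_definite(M))
    # True True False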

There are these results about matrices that should be part of the knowledge base of every statistical person.

* Covariance matrices are automatically positive semi-definite.

* If Var(Y) = Ω, then Var(a′Y) = a′Ωa ≥ 0.

* If there is a vector a for which Var(a′Y) = a′Ωa = 0, then the linear combination a′Y is a constant. Equivalently, the matrix Ω must be singular.

* Square matrices have eigenvalues. An eigenvalue is any λ for which there is a non-zero vector u for which Ωu = λu. Rewrite this as (Ω − λI)u = 0. Thus λ is an eigenvalue if and only if (Ω − λI) is a singular matrix. One consequence is that det(Ω − λI) = 0 is a condition that can be used to find eigenvalues. We won't prove that here. Also, this is usually not the best computational way to find eigenvalues, at least for large matrices. In passing, we observe that an n × n matrix has n eigenvalues, not necessarily all different.

Do eigenvalues have to exist? The matrix

    Ω = ( cos θ   −sin θ )
        ( sin θ    cos θ )

is a pure rotation. You can check that Ωu rotates u by the angle θ. Consider then the condition det(Ω − λI) = 0. The condition is then

    det ( cos θ − λ    −sin θ     )  =  0
        ( sin θ         cos θ − λ )

that is, (cos θ − λ)² + sin²θ = 0. This is cos²θ − 2λ cos θ + λ² + sin²θ = 0, or

    λ² − 2λ cos θ + 1 = 0

The roots for λ are

    λ = [ 2 cos θ ± √(4 cos²θ − 4) ] / 2  =  cos θ ± √(cos²θ − 1)  =  cos θ ± √(−sin²θ)  =  cos θ ± i sin θ

It looks like eigenvalues can be imaginary!
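
A quick numerical confirmation of this calculation (a numpy sketch added here; the angle 0.7 is arbitrary): np.linalg.eigvals returns exactly the pair cos θ ± i sin θ.

    import numpy as np

    theta = 0.7                                        # any angle, in radians
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])    # the pure rotation by theta

    lam = np.linalg.eigvals(R)                         # roots of det(R - lambda I) = 0
    expected = np.array([np.cos(theta) + 1j * np.sin(theta),
                         np.cos(theta) - 1j * np.sin(theta)])
    print(np.allclose(np.sort_complex(lam), np.sort_complex(expected)))   # True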

As a fun little proof, you can show that symmetric matrices only have real eigenvalues. (It's assumed that the matrix in question has all real entries.) Here's how. Say that A′ = A and that u is an eigenvector with eigenvalue λ. While A has all real entries, we have made no such assumptions on λ or u. Now Au = λu. Write this in transpose form as u′A = λu′. Take complex conjugates throughout; the bar denotes complex conjugate, so the conjugate of a + ib is a − ib, and the operation for matrices and vectors is entry-by-entry. Since A has real entries this gives ū′A = λ̄ū′. Note this about squared length: ū′u = ‖u‖², which is real and positive since u ≠ 0. Multiply ū′A = λ̄ū′ into u, getting

    ū′Au = λ̄ ū′u = λ̄ ‖u‖²

In the original Au = λu, multiply on the left by ū′ to get

    ū′Au = λ ū′u = λ ‖u‖²

This gets to λ = λ̄, so λ must be a real number.
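
And the matching check for the symmetric case (again a small numpy sketch, not from the notes): a randomly generated real symmetric matrix comes back with purely real eigenvalues.

    import numpy as np

    rng = np.random.default_rng(1)
    B = rng.normal(size=(5, 5))
    A = (B + B.T) / 2                      # a random real symmetric matrix, A' = A

    lam = np.linalg.eigvals(A)             # general routine, allowed to return complex values
    print(np.allclose(lam.imag, 0.0))      # True: every eigenvalue is real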

* Matrix Ω is positive semi-definite if and only if all its eigenvalues are ≥ 0. An eigenvalue is any λ for which there is a non-zero vector u with Ωu = λu. Take this condition and multiply on the left by u′ to get u′Ωu = λ u′u = λ ‖u‖². Since ‖u‖², the squared length of u, is non-negative, it follows that u′Ωu ≥ 0 whenever λ ≥ 0. This proves the positive semi-definite inequality at the eigenvectors; however, every vector is a linear combination of the eigenvectors, so the result is true in general. (In the other direction, if Ω is positive semi-definite, then λ ‖u‖² = u′Ωu ≥ 0 forces λ ≥ 0.)

* If M is idempotent, then its eigenvalues are all 0's and 1's. This is easy to prove. Suppose that Mu = λu. Multiply both sides by M to get

    MMu = Mu = λu    and also    MMu = M(λu) = λ Mu = λ(λu) = λ²u

This shows that λ² = λ, which is satisfied only for 0 and 1.

See the handout on the Gauss-Markov theorem. The important finding is that the least squares estimate b = (X′X)⁻¹X′Y is BLUE (best linear unbiased estimate). It's important to note that this

* uses Var(ε) = σ²I
* does not use normality (it's only about first and second moments)
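
The idempotent fact shows up concretely in regression through the hat matrix H = X(X′X)⁻¹X′. In this numpy sketch (the design matrix is invented for illustration), H is idempotent and its eigenvalues come out as p ones and n − p zeros.

    import numpy as np

    rng = np.random.default_rng(2)
    n, p = 20, 4
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])   # n x p design, full rank

    H = X @ np.linalg.solve(X.T @ X, X.T)          # hat matrix H = X (X'X)^{-1} X'
    print(np.allclose(H @ H, H))                   # True: H is idempotent

    lam = np.linalg.eigvalsh(H)                    # H is symmetric, so eigenvalues are real
    print(np.allclose(np.sort(lam), np.r_[np.zeros(n - p), np.ones(p)]))   # True: p ones, n-p zeros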

We have a nice handout on the reduced sum of squares. Here's a matrix version. Let's suppose the usual model Y = Xβ + ε, and suppose that the matrix X is partitioned as

    X = ( U   V )        X is n × p, U is n × s, V is n × (p − s)

We'll suppose that the column of 1's is part of U. In similar style, partition the coefficient vector as

    β = ( β_U )          β is p × 1, β_U is s × 1, β_V is (p − s) × 1
        ( β_V )

We'll assume, to preserve sanity, that everything has full rank. We'd like to test the null hypothesis H_0: β_V = 0 versus H_1: β_V ≠ 0. The likelihood ratio test works very well for this. It comes down to comparing the two residual sums of squares:

    SS_Resid(H_0), based on the model Y = U β_U + ε
    SS_Resid(H_1), based on the model Y = X β + ε = ( U  V ) ( β_U ) + ε
                                                              ( β_V )

Observe that SS_Resid(H_0) ≥ SS_Resid(H_1). The difference SS_Resid(H_0) − SS_Resid(H_1) will be distributed as σ²χ²_(p−s) when H_0 is true. This will be independent of SS_Resid(H_1), as it is in an orthogonal space. Of course SS_Resid(H_1) ~ σ²χ²_(n−p). These two together give us the basis for the partial F test. This test is

    F = [ ( SS_Resid(H_0) − SS_Resid(H_1) ) / (p − s) ] / MS_Resid(H_1)

where MS_Resid(H_1) = SS_Resid(H_1) / (n − p), and it has (p − s, n − p) degrees of freedom. Since SS_Total = SS_Regr + SS_Resid (for any model), the test can also be written as

    F = [ ( SS_Regr(H_1) − SS_Regr(H_0) ) / (p − s) ] / MS_Resid(H_1)
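
As a numerical illustration of the partial F test (a sketch with simulated data; scipy.stats is used only to convert the statistic into a p-value), the two residual sums of squares can be computed directly.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    n, p, s = 60, 5, 3                             # X has p columns; U keeps the first s of them
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    U, V = X[:, :s], X[:, s:]                      # the column of 1's sits inside U
    beta = np.array([1.0, 0.5, -0.3, 0.0, 0.0])    # beta_V = 0, so H_0 happens to be true here
    y = X @ beta + rng.normal(size=n)

    def ss_resid(M, y):
        # Residual sum of squares from least squares on the design matrix M.
        resid = y - M @ np.linalg.lstsq(M, y, rcond=None)[0]
        return resid @ resid

    ss0, ss1 = ss_resid(U, y), ss_resid(X, y)      # SS_Resid(H_0) >= SS_Resid(H_1)
    F = ((ss0 - ss1) / (p - s)) / (ss1 / (n - p))  # partial F with (p - s, n - p) df
    print(F, stats.f.sf(F, p - s, n - p))          # the statistic and its p-value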

Let's make a note on conditional mean and variance. Suppose that random variables X and Y have a joint density f_{X,Y}(x, y). The conditional mean E(Y | X = x) and conditional variance Var(Y | X = x) will come from the conditional density

    f_{Y|X}(y | x) = f_{X,Y}(x, y) / f_X(x)

What are these things for the normal distribution? Let's suppose that W is a k × 1 random multivariate normal with mean E W = μ and Var(W) = Σ. This use of Σ is fairly common for this context. The density is

    f(w) = (2π)^(−k/2) |Σ|^(−1/2) exp( −(w − μ)′ Σ⁻¹ (w − μ) / 2 )

In the special case with two coordinates (k = 2), for the bivariate random variable (X, Y), write the covariance matrix as

    Σ = ( σ_X²         ρ σ_X σ_Y )
        ( ρ σ_X σ_Y    σ_Y²      )

The inverse is

    Σ⁻¹ = 1 / [ σ_X² σ_Y² (1 − ρ²) ]  ·  (  σ_Y²          −ρ σ_X σ_Y )
                                         ( −ρ σ_X σ_Y      σ_X²      )

Then the density is

    f_{X,Y}(x, y) = 1 / [ 2π σ_X σ_Y √(1 − ρ²) ]  ·  e^(−z/2)

where

    z = 1/(1 − ρ²) · [ (y − μ_Y)²/σ_Y²  −  2ρ (y − μ_Y)(x − μ_X)/(σ_X σ_Y)  +  (x − μ_X)²/σ_X² ]

The marginal distribution of X is normal, with mean μ_X and variance σ_X². So here is the conditional distribution of Y given X = x:

    f_{Y|X}(y | x) = f_{X,Y}(x, y) / f_X(x)

                   = { 1/[2π σ_X σ_Y √(1 − ρ²)] e^(−z/2) }  /  { 1/[√(2π) σ_X] e^(−(x − μ_X)²/(2σ_X²)) }

                   = 1/[√(2π) σ_Y √(1 − ρ²)]  ·  e^( −½ [ z − (x − μ_X)²/σ_X² ] )

We need to examine the exponent.

    z − (x − μ_X)²/σ_X²

      = 1/(1 − ρ²) [ (y − μ_Y)²/σ_Y² − 2ρ(y − μ_Y)(x − μ_X)/(σ_X σ_Y) + (x − μ_X)²/σ_X² ] − (x − μ_X)²/σ_X²

      = 1/(1 − ρ²) [ (y − μ_Y)²/σ_Y² − 2ρ(y − μ_Y)(x − μ_X)/(σ_X σ_Y) + ρ²(x − μ_X)²/σ_X² ]

      = 1/(1 − ρ²) [ (y − μ_Y) − ρ(σ_Y/σ_X)(x − μ_X) ]² / σ_Y²

As a convenient notation, let's use β = ρ σ_Y / σ_X. Finally,

    f_{Y|X}(y | x) = 1/[√(2π) σ_Y √(1 − ρ²)]  ·  exp( −[ (y − μ_Y) − β(x − μ_X) ]² / [ 2 σ_Y² (1 − ρ²) ] )

This conditional density is of course normal. The mean, meaning E(Y | X = x), is μ_Y + β(x − μ_X). The variance, meaning Var(Y | X = x), is σ_Y² (1 − ρ²).
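
These two formulas are easy to check by simulation (a numpy sketch with arbitrary parameter values): draw a large bivariate normal sample, keep the pairs whose x lands in a narrow window around a chosen x_0, and compare the mean and variance of the retained y's with μ_Y + β(x_0 − μ_X) and σ_Y²(1 − ρ²). The last lines preview the regression link discussed next: the least squares slope from the full sample estimates the same β = ρσ_Y/σ_X.

    import numpy as np

    rng = np.random.default_rng(4)
    mu_x, mu_y = 2.0, -1.0
    sd_x, sd_y, rho = 1.5, 2.0, 0.6

    Sigma = np.array([[sd_x**2,           rho * sd_x * sd_y],
                      [rho * sd_x * sd_y, sd_y**2          ]])
    xy = rng.multivariate_normal([mu_x, mu_y], Sigma, size=1_000_000)

    x0 = 3.0
    keep = np.abs(xy[:, 0] - x0) < 0.02            # condition (approximately) on X = x0
    y_near_x0 = xy[keep, 1]

    beta = rho * sd_y / sd_x
    print(y_near_x0.mean(), mu_y + beta * (x0 - mu_x))    # conditional mean: both near -0.2
    print(y_near_x0.var(),  sd_y**2 * (1 - rho**2))       # conditional variance: both near 2.56

    C = np.cov(xy[:, 0], xy[:, 1])                 # sample covariance matrix of (X, Y)
    print(C[0, 1] / C[0, 0], beta)                 # least squares slope vs. beta = rho*sd_y/sd_x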

This has many links to the model for simple linear regression!

* The model is Y_i = β_0 + β_1 x_i + ε_i, where the ε_i's are independent, each with mean zero and with variance σ².

* The estimate of slope is b_1 = S_xy / S_xx = (S_xy / n) / (S_xx / n). Observe that the numerator estimates Cov(X, Y) = ρ σ_X σ_Y, and the denominator estimates Var(X) = σ_X². Thus b_1 estimates ρ σ_Y / σ_X, which is exactly the β of the conditional mean above.

* Var(ε) = σ² = σ_Y² (1 − ρ²) = Var(Y | X = x). Curiously, this does not depend on the value (lower case) x on which the conditioning was done. (More just below.) In a simple linear regression R² = r², so that (1 − R²) Var(Y) estimates Var(Y | X = x). This is almost the definition of adjusted R².

* The simple linear regression model gives the same variance to every ε_i, so that this model is consistent with a bivariate normal distribution. That is, the bivariate normal satisfies the condition that Var(Y | X = x) is the same for all (lower case) x. If it happens that Var(Y | X = x) seems to depend on x, then we try to transform the problem to achieve equi-variance.

Suppose that we have three or more coordinates to our multivariate normal random vector. We would then want, for example, Corr(X_1, X_2 | X_3 = x_3) to be the correlation in the conditional distribution. The result that makes all this go through is this one. Suppose that the vector partitioned as

    ( X_q )          X_q is q × 1, X_r is r × 1
    ( X_r )

is a (q + r) × 1 multivariate normal random vector with mean

    ( μ_q )
    ( μ_r )

and with variance matrix

    ( Σ_qq   Σ_qr )      Σ_qq is q × q, Σ_qr is q × r, Σ_rq is r × q, Σ_rr is r × r
    ( Σ_rq   Σ_rr )

The conditional distribution of X_q, given X_r = x_r, is then multivariate normal with mean

    μ_q + Σ_qr Σ_rr⁻¹ ( x_r − μ_r )

and with variance

    Σ_qq − Σ_qr Σ_rr⁻¹ Σ_rq

It should be regarded as amazing (!!!) that the conditional variance matrix does not depend on x_r, the value on which the conditioning was done.

Suppose that r = 1, so that X_r just represents one variable. Then Σ_rr is just a scalar, and Σ_rr⁻¹ is trivial to compute. This idea greatly simplifies the calculation of stepwise regression.

See the handout on the column space of X. Here's the definition:

    col(X) = { Xa : a is any p-by-1 vector }

This is, of course, a p-dimensional space (we assume, as usual, that X has full rank), since there are exactly p free choices in the vector a. Since E Y = Xβ, the expected value of the data vector Y lies in a p-dimensional space. However, Y contains n pieces of information, and n is much greater than p. The remaining n − p pieces of information in Y can be relevant only to noise, and indeed will furnish the basis for estimating σ, the noise standard deviation. This is exactly the reason that the residual line in the analysis of variance table has n − p = n − K − 1 degrees of freedom.
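
Returning to the conditioning formula stated earlier in this section, here is a compact numpy version (a sketch; the 3 × 3 covariance matrix is arbitrary, with q = 2 and r = 1).

    import numpy as np

    mu = np.array([1.0, 2.0, 3.0])                 # mean of (X_q, X_r), with q = 2, r = 1
    Sigma = np.array([[4.0, 1.0, 0.8],
                      [1.0, 2.0, 0.5],
                      [0.8, 0.5, 1.0]])            # a symmetric positive definite choice

    q = 2
    mu_q, mu_r = mu[:q], mu[q:]
    S_qq, S_qr = Sigma[:q, :q], Sigma[:q, q:]
    S_rq, S_rr = Sigma[q:, :q], Sigma[q:, q:]

    x_r = np.array([3.5])                          # the observed value of X_r

    # Conditional distribution of X_q given X_r = x_r
    cond_mean = mu_q + S_qr @ np.linalg.solve(S_rr, x_r - mu_r)
    cond_var  = S_qq - S_qr @ np.linalg.solve(S_rr, S_rq)

    print(cond_mean)                               # mu_q pulled toward the information in x_r
    print(cond_var)                                # note: does not depend on x_r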