STAT 350: Geometry of Least Squares


The Geometry of Least Squares

Mathematical Basics

Inner / dot product: for column vectors a and b,

  a · b = aᵀb = Σᵢ aᵢ bᵢ,   and   a ⊥ b  means  aᵀb = 0.

Matrix product: if A is r × s and B is s × t, then AB is r × t with entries

  (AB)ᵢₖ = Σⱼ Aᵢⱼ Bⱼₖ,   the sum running over j = 1, ..., s.
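As a quick illustration (my own numpy sketch, not part of the original notes), the two definitions in code:

import numpy as np

a = np.array([1.0, 2.0, 2.0])
b = np.array([2.0, -1.0, 0.0])

# Inner product three ways: a . b = a^T b = sum_i a_i b_i
print(a @ b, np.dot(a, b), np.sum(a * b))   # all 0.0 here, so a is perpendicular to b

# Matrix product: A is r x s, B is s x t, so AB is r x t
A = np.arange(6.0).reshape(2, 3)    # 2 x 3
B = np.arange(12.0).reshape(3, 4)   # 3 x 4
print((A @ B).shape)                # (2, 4)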

Partitioned Matrices

Partitioned matrices are like ordinary matrices except that the entries are themselves matrices. They add and multiply (if the dimensions match properly) just like regular matrices, but(!) you must remember that matrix multiplication is not commutative. Here is an example:

  A = [ A₁₁  A₁₂  A₁₃ ]        B = [ B₁₁  B₁₂ ]
      [ A₂₁  A₂₂  A₂₃ ]            [ B₂₁  B₂₂ ]
                                   [ B₃₁  B₃₂ ]

Think of A as a 2 × 3 matrix and B as a 3 × 2 matrix. Multiply them to get C = AB, a 2 × 2 matrix, as follows:

  AB = [ A₁₁B₁₁ + A₁₂B₂₁ + A₁₃B₃₁    A₁₁B₁₂ + A₁₂B₂₂ + A₁₃B₃₂ ]
       [ A₂₁B₁₁ + A₂₂B₂₁ + A₂₃B₃₁    A₂₁B₁₂ + A₂₂B₂₂ + A₂₃B₃₂ ]

BUT: this only works if each of the matrix products in the formulas makes sense. So A₁₁ must have the same number of columns as B₁₁ has rows, and many other similar restrictions apply.
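Here is a small numpy sketch (not from the notes) that checks the block formula against ordinary matrix multiplication; the block shapes are chosen arbitrarily so that every product makes sense:

import numpy as np

rng = np.random.default_rng(0)

# Blocks of A (2 block-rows, 3 block-columns) and B (3 block-rows, 2 block-columns).
# Column widths of A's blocks must match row heights of B's blocks.
A_blocks = [[rng.normal(size=(2, k)) for k in (3, 1, 2)] for _ in range(2)]
B_blocks = [[rng.normal(size=(k, 2)) for _ in range(2)] for k in (3, 1, 2)]

A = np.block(A_blocks)
B = np.block(B_blocks)

# Block formula: C_ij = sum_k A_ik B_kj
C_blocks = [[sum(A_blocks[i][k] @ B_blocks[k][j] for k in range(3)) for j in range(2)]
            for i in range(2)]

print(np.allclose(np.block(C_blocks), A @ B))   # True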

First application: write X = [X₁ X₂ ... Xₚ], where each Xⱼ is a column of X. Then

  Xβ = [X₁ X₂ ... Xₚ] (β₁, ..., βₚ)ᵀ = X₁β₁ + X₂β₂ + ... + Xₚβₚ,

which is a linear combination of the columns of X.

Definition: the column space of X, written col(X), is the vector space of all linear combinations of the columns of X, also called the space spanned by the columns of X.

SO: µ̂ = Xβ̂ is in col(X).

Back to the normal equations:

  XᵀY = XᵀXβ̂,   that is,   Xᵀ(Y - Xβ̂) = 0,

or, writing X₁, ..., Xₚ for the columns of X,

  Xᵢᵀ(Y - Xβ̂) = 0   for i = 1, ..., p.

So Y - Xβ̂ is perpendicular to every column of X, and hence to every vector in col(X).
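A minimal numpy sketch (my own, on simulated data) of the normal equations and the orthogonality they express:

import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])   # n x p design
Y = rng.normal(size=n)

# Solve the normal equations X^T X beta_hat = X^T Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# The residual Y - X beta_hat is orthogonal to every column of X,
# hence to everything in col(X).
print(np.allclose(X.T @ (Y - X @ beta_hat), 0))   # True (up to rounding)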

Definition: ε̂ = Y - Xβ̂ is the fitted residual vector.

SO: ε̂ ⊥ col(X), and in particular ε̂ ⊥ µ̂.

Pythagoras' Theorem: if a ⊥ b then ‖a‖² + ‖b‖² = ‖a + b‖².

Definition: ‖a‖ is the length or norm of a: ‖a‖ = √(Σᵢ aᵢ²) = √(aᵀa).

Moreover, if a, b, c, ... are all mutually perpendicular then ‖a‖² + ‖b‖² + ... = ‖a + b + ...‖².

Application: Y = (Y - Xβ̂) + Xβ̂ = ε̂ + µ̂, so

  ‖Y‖² = ‖ε̂‖² + ‖µ̂‖²,   that is,   Σᵢ Yᵢ² = Σᵢ ε̂ᵢ² + Σᵢ µ̂ᵢ².

Definitions:
  Σᵢ Yᵢ² = Total Sum of Squares (unadjusted)
  Σᵢ ε̂ᵢ² = Error or Residual Sum of Squares
  Σᵢ µ̂ᵢ² = Regression Sum of Squares
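A short numpy check (again on simulated data, not from the notes) of this Pythagoras decomposition:

import numpy as np

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(15), rng.normal(size=(15, 2))])
Y = rng.normal(size=15)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
mu_hat = X @ beta_hat              # fitted values
eps_hat = Y - mu_hat               # fitted residuals

# Pythagoras: since eps_hat is perpendicular to mu_hat,
# sum(Y_i^2) = sum(eps_hat_i^2) + sum(mu_hat_i^2)
print(np.isclose(np.sum(Y**2), np.sum(eps_hat**2) + np.sum(mu_hat**2)))   # True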

Alternative formulas for the Regression SS:

  Σᵢ µ̂ᵢ² = µ̂ᵀµ̂ = (Xβ̂)ᵀ(Xβ̂) = β̂ᵀXᵀXβ̂.

Notice the matrix identity, which I will use regularly: (AB)ᵀ = BᵀAᵀ.

What is least squares? Choose β̂ to minimize

  Σᵢ (Yᵢ - µ̂ᵢ)² = ‖Y - µ̂‖²,

that is, to minimize ‖ε̂‖². The resulting µ̂ is called the orthogonal projection of Y onto the column space of X.

Extension: partition

  X = [X₁ X₂],   β = (β₁, β₂),   p = p₁ + p₂,

where β₁ has p₁ components and β₂ has p₂ components. Imagine we fit 2 models:

1. The FULL model:    Y = Xβ + ε (= X₁β₁ + X₂β₂ + ε)
2. The REDUCED model: Y = X₁β₁ + ε
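Before going on to the full and reduced fits, the projection interpretation above can be checked numerically; the sketch below (my own, on simulated data) compares µ̂ with other candidate points Xb in the column space:

import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(25, 4))
Y = rng.normal(size=25)

beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
mu_hat = X @ beta_hat

# Any other point X b in the column space is farther from Y than mu_hat is.
other = [np.linalg.norm(Y - X @ (beta_hat + rng.normal(size=4))) for _ in range(1000)]
print(np.linalg.norm(Y - mu_hat) <= min(other))   # True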

If we fit the full model we get β̂_F, µ̂_F, ε̂_F, with

  ε̂_F ⊥ col(X).                          (1)

If we fit the reduced model we get β̂_R, µ̂_R, ε̂_R, with

  µ̂_R ∈ col(X₁) ⊆ col(X).                (2)

Notice that

  ε̂_F ⊥ µ̂_R.                             (3)

(The vector µ̂_R is in the column space of X₁, so it is in the column space of X, and ε̂_F is orthogonal to everything in the column space of X.) So:

  Y = ε̂_F + µ̂_F = ε̂_F + µ̂_R + (µ̂_F - µ̂_R) = ε̂_R + µ̂_R.

You know ε̂_F ⊥ µ̂_R (from (3) above) and ε̂_F ⊥ µ̂_F (from (1) above), so ε̂_F ⊥ (µ̂_F - µ̂_R). Also

  µ̂_R ⊥ ε̂_R = ε̂_F + (µ̂_F - µ̂_R),

so

  0 = (ε̂_F + µ̂_F - µ̂_R)ᵀ µ̂_R = ε̂_Fᵀ µ̂_R + (µ̂_F - µ̂_R)ᵀ µ̂_R,

and the first term is 0, so (µ̂_F - µ̂_R) ⊥ µ̂_R.

Summary. We have

  Y = µ̂_R + (µ̂_F - µ̂_R) + ε̂_F,

and all three vectors on the right-hand side are perpendicular to each other. This gives

  ‖Y‖² = ‖µ̂_R‖² + ‖µ̂_F - µ̂_R‖² + ‖ε̂_F‖²,

which is an Analysis of Variance (ANOVA) table!
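A numpy sketch (simulated data, my own illustration) of this three-term decomposition from fitting a full and a reduced model:

import numpy as np

rng = np.random.default_rng(4)
n = 30
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])   # reduced-model columns
X2 = rng.normal(size=(n, 2))                             # extra columns
X = np.column_stack([X1, X2])                            # full-model design
Y = rng.normal(size=n)

def fit(design, y):
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ b

mu_F, mu_R = fit(X, Y), fit(X1, Y)
eps_F = Y - mu_F

# ||Y||^2 = ||mu_R||^2 + ||mu_F - mu_R||^2 + ||eps_F||^2
lhs = np.sum(Y**2)
rhs = np.sum(mu_R**2) + np.sum((mu_F - mu_R)**2) + np.sum(eps_F**2)
print(np.isclose(lhs, rhs))   # True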

Here is the most basic version of the above: X = [1 X₁], so the full model is

  Y = β₀1 + X₁β₁ + ε.

The notation here is that 1 = (1, ..., 1)ᵀ is a column vector with all entries equal to 1. The coefficient of this column, β₀, is called the intercept term in the model.

To find µ̂_R we minimize Σᵢ (Yᵢ - β₀)² over β₀ and get simply β̂₀ = Ȳ and µ̂_R = (Ȳ, ..., Ȳ)ᵀ = Ȳ1. Our ANOVA identity is now

  ‖Y‖² = ‖µ̂_R‖² + ‖µ̂_F - µ̂_R‖² + ‖ε̂_F‖² = nȲ² + ‖µ̂_F - µ̂_R‖² + ‖ε̂_F‖².

This identity is usually rewritten in subtracted form:

  ‖Y‖² - nȲ² = ‖µ̂_F - µ̂_R‖² + ‖ε̂_F‖².

Remembering the identity Σᵢ (Yᵢ - Ȳ)² = Σᵢ Yᵢ² - nȲ², we find

  Σᵢ (Yᵢ - Ȳ)² = Σᵢ (µ̂_F,i - Ȳ)² + Σᵢ ε̂²_F,i.

These terms are, respectively: the Adjusted or Corrected Total Sum of Squares, the Regression or Model Sum of Squares, and the Error Sum of Squares.
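A short numpy check (simulated simple-regression data, not from the notes) of the corrected sum-of-squares identity:

import numpy as np

rng = np.random.default_rng(5)
n = 40
x = rng.normal(size=n)
Y = 2 + 3 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
mu_F = X @ beta_hat
Ybar = Y.mean()

# Corrected total SS = Regression SS + Error SS
total = np.sum((Y - Ybar)**2)
regression = np.sum((mu_F - Ybar)**2)
error = np.sum((Y - mu_F)**2)
print(np.isclose(total, regression + error))   # True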

Simple Linear Regression: I filled the gas tank 107 times, recording the distance driven since the last fill and the gas needed to fill the tank. Question for discussion: what is a natural model? Look at the JMP analysis.

The sum of squares decomposition in one example

This is the example discussed in the Introduction. Consider the model

  Yᵢⱼ = µ + αᵢ + εᵢⱼ,   with the constraint α₄ = -(α₁ + α₂ + α₃).

The data consist of blood coagulation times for 24 animals, each fed one of 4 different diets. Now I write the data in a table and decompose that table into a sum of several tables. The 4 columns of each table correspond to Diets A, B, C and D. You should think of the entries in each table as being stacked up into a column vector, but the tables save space.

The design matrix can be partitioned into a column of 1s and 3 other columns. You should compute the product XᵀX and get

  24  -4  -2  -2
  -4  12   8   8
  -2   8  14   8
  -2   8   8  14

The vector XᵀY is just

  ( Σᵢⱼ Yᵢⱼ,  Σⱼ Y₁ⱼ - Σⱼ Y₄ⱼ,  Σⱼ Y₂ⱼ - Σⱼ Y₄ⱼ,  Σⱼ Y₃ⱼ - Σⱼ Y₄ⱼ )ᵀ.

The matrix XᵀX can be inverted using a program like Maple. I found that

  384 (XᵀX)⁻¹ =  17    7   -1   -1
                  7   65  -23  -23
                 -1  -23   49  -15
                 -1  -23  -15   49

It now takes quite a bit of algebra to verify that the vector of fitted values can be computed by simply averaging the data in each column.
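For readers who would rather not push this through Maple by hand, here is a numpy sketch (my own; it assumes the coagulation data and the α₄ = -(α₁ + α₂ + α₃) coding described above) that reproduces XᵀX, 384(XᵀX)⁻¹ and the fitted values:

import numpy as np

# Coagulation data from the notes, one list per diet (A, B, C, D).
groups = [[62, 60, 63, 59],
          [63, 67, 71, 64, 65, 66],
          [68, 66, 71, 67, 68, 68],
          [56, 62, 60, 61, 63, 64, 63, 59]]
Y = np.concatenate([np.asarray(g, float) for g in groups])
labels = np.concatenate([[i] * len(g) for i, g in enumerate(groups)])

# Columns for alpha1, alpha2, alpha3 with alpha4 = -(alpha1 + alpha2 + alpha3):
# column j is +1 in group j, -1 in group 4, 0 elsewhere.
X = np.column_stack([np.ones(len(Y))] +
                    [(labels == j).astype(float) - (labels == 3).astype(float)
                     for j in range(3)])

print(X.T @ X)                                 # the 4 x 4 matrix shown above
print(np.round(384 * np.linalg.inv(X.T @ X)))  # 384 (X^T X)^{-1}
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(X @ beta_hat)                            # fitted values: each observation gets its group mean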

That is, the fitted value µ̂ is the table (columns are Diets A, B, C, D):

   A   B   C   D
  61  66  68  61
  61  66  68  61
  61  66  68  61
  61  66  68  61
      66  68  61
      66  68  61
              61
              61

On the other hand, fitting the model with a design matrix consisting only of a column of 1s leads to µ̂_R (notation from the lecture) given by

   A   B   C   D
  64  64  64  64
  64  64  64  64
  64  64  64  64
  64  64  64  64
      64  64  64
      64  64  64
              64
              64

Earlier I gave the identity Y = µ̂_R + (µ̂_F - µ̂_R) + ε̂_F, which corresponds to the following identity of tables (columns are Diets A, B, C, D):

  62 63 68 56     64 64 64 64     -3  2  4 -3      1 -3  0 -5
  60 67 66 62     64 64 64 64     -3  2  4 -3     -1  1 -2  1
  63 71 71 60     64 64 64 64     -3  2  4 -3      2  5  3 -1
  59 64 67 61  =  64 64 64 64  +  -3  2  4 -3  +  -2 -2 -1  0
     65 68 63        64 64 64         2  4 -3         -1  0  2
     66 68 64        64 64 64         2  4 -3          0  0  3
           63              64               -3                2
           59              64               -3               -2

Pythagoras' identity: ANOVA

The sums of squares of the entries of each of these arrays are as follows.

Uncorrected total sum of squares: on the left-hand side, 62² + 63² + ... = 98644.

The first term on the right-hand side gives 24(64²) = 98304. This term is sometimes put in ANOVA tables as the Sum of Squares due to the Grand Mean, but it is usually subtracted from the total to produce the Total Sum of Squares, which we usually put at the bottom of the table. This is often called the Corrected (or Adjusted) Total Sum of Squares.

In this case the corrected sum of squares is the squared length of the table

  -2  -1   4  -8
  -4   3   2  -2
  -1   7   7  -4
  -5   0   3  -3
       1   4  -1
       2   4   0
              -1
              -5

which is 340.

Treatment Sum of Squares: the second term on the right-hand side of the equation has squared length

  4(-3)² + 6(2)² + 6(4)² + 8(-3)² = 228.

The general formula for this Sum of Squares is

  Σᵢ Σⱼ (Ȳᵢ - Ȳ)² = Σᵢ nᵢ (Ȳᵢ - Ȳ)²,

where i runs over the I groups and j over the nᵢ observations in group i, but I want you to see that it is just the squared length of the vector of individual sample means minus the grand mean.

The last vector in the decomposition is called the residual vector. It has squared length 1² + (-3)² + 0² + ... = 112.
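A numpy sketch (my own, using the coagulation data above) that reproduces the three sums of squares 340 = 228 + 112:

import numpy as np

# Coagulation data again, one list per diet (A, B, C, D).
groups = [[62, 60, 63, 59],
          [63, 67, 71, 64, 65, 66],
          [68, 66, 71, 67, 68, 68],
          [56, 62, 60, 61, 63, 64, 63, 59]]
Y = np.concatenate([np.asarray(g, float) for g in groups])
fitted = np.concatenate([np.full(len(g), np.mean(g)) for g in groups])   # mu_hat_F
grand = np.full(len(Y), Y.mean())                                        # mu_hat_R

print(np.sum((Y - grand) ** 2))        # 340.0  corrected total SS
print(np.sum((fitted - grand) ** 2))   # 228.0  treatment SS
print(np.sum((Y - fitted) ** 2))       # 112.0  error SS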

Degrees of freedom: dimensions of spaces

Corresponding to the decomposition of the total squared length of the data vector is a decomposition of its dimension, 24, into the dimensions of subspaces. For instance, the grand-mean table is always a multiple of the single vector all of whose entries are 1; this describes a one-dimensional space, which is just another way of saying that the reduced fit µ̂_R is in the column space of the reduced-model design matrix. The second vector, of deviations of the fitted group means from the grand mean, lies in the three-dimensional subspace of tables which are constant within each column and whose entries total 0. Similarly, the vector of residuals lies in a 20-dimensional subspace: the set of all tables whose columns each sum to 0.

Degrees of Freedom

This decomposition of dimensions is the decomposition of degrees of freedom. So 24 = 1 + 3 + 20, and the degrees of freedom for treatment and error are 3 and 20 respectively. The vector whose squared length is the Corrected Total Sum of Squares lies in the 23-dimensional subspace of vectors whose entries sum to 0. This produces the 23 total degrees of freedom in the usual ANOVA table.
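A small numpy sketch (my own illustration for this layout) that recovers the degrees of freedom 1, 3 and 20 as dimensions (ranks) of the corresponding subspaces:

import numpy as np

# Same one-way layout: 4 groups with 4, 6, 6, 8 observations (24 in all).
sizes = [4, 6, 6, 8]
labels = np.concatenate([[i] * n for i, n in enumerate(sizes)])
n = len(labels)

ones = np.ones((n, 1))                                                 # grand-mean space
G = np.column_stack([(labels == i).astype(float) for i in range(4)])   # group-mean space

df_mean = np.linalg.matrix_rank(ones)                # 1
df_trt = np.linalg.matrix_rank(G) - df_mean          # 3  (group-mean space minus grand-mean space)
df_err = n - np.linalg.matrix_rank(G)                # 20 (orthogonal complement of the group-mean space)
print(df_mean, df_trt, df_err, df_mean + df_trt + df_err)   # 1 3 20 24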