The Gauss-Markov Linear Model

Copyright © 2010 Dan Nettleton (Iowa State University)

The Gauss-Markov linear model is

$$y = X\beta + \epsilon,$$

where

- $y$ is an $n \times 1$ random vector of responses,
- $X$ is an $n \times p$ matrix of constants with columns corresponding to explanatory variables ($X$ is sometimes referred to as the design matrix),
- $\beta$ is an unknown parameter vector in $\mathbb{R}^p$, and
- $\epsilon$ is an $n \times 1$ random vector of errors with $E(\epsilon) = 0$ and $\mathrm{Var}(\epsilon) = \sigma^2 I$, where $\sigma^2$ is an unknown parameter in $\mathbb{R}^+$.

The Column Space of the Design Matrix

$X\beta$ is a linear combination of the columns of $X$:

$$X\beta = [x_1, \ldots, x_p]\begin{bmatrix} \beta_1 \\ \vdots \\ \beta_p \end{bmatrix} = \beta_1 x_1 + \cdots + \beta_p x_p.$$

The set of all possible linear combinations of the columns of $X$ is called the column space of $X$ and is denoted by $C(X) = \{Xa : a \in \mathbb{R}^p\}$.

The Gauss-Markov linear model says $y$ is a random vector whose mean is in the column space of $X$ and whose variance is $\sigma^2 I$ for some positive real number $\sigma^2$; i.e., $E(y) \in C(X)$ and $\mathrm{Var}(y) = \sigma^2 I$ for some $\sigma^2 \in \mathbb{R}^+$.

An Example Column Space

For $X = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$,

$$C(X) = \{Xa : a \in \mathbb{R}\} = \left\{ \begin{bmatrix} a \\ a \end{bmatrix} : a \in \mathbb{R} \right\},$$

the line through the origin of $\mathbb{R}^2$ on which both coordinates are equal.

Another Example Column Space

For

$$X = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{bmatrix},$$

$$C(X) = \left\{ a_1 \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} + a_2 \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} : a_1, a_2 \in \mathbb{R} \right\} = \left\{ \begin{bmatrix} a_1 \\ a_1 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ a_2 \\ a_2 \end{bmatrix} : a_1, a_2 \in \mathbb{R} \right\} = \left\{ \begin{bmatrix} a_1 \\ a_1 \\ a_2 \\ a_2 \end{bmatrix} : a_1, a_2 \in \mathbb{R} \right\}.$$
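To make the model concrete, here is a minimal numpy sketch simulating one draw of $y$ from the Gauss-Markov model with the two-group design above. The values of `beta` and `sigma` are illustrative assumptions, and normal errors are just one convenient choice satisfying $E(\epsilon) = 0$, $\mathrm{Var}(\epsilon) = \sigma^2 I$ (the model itself does not require normality):

```python
import numpy as np

rng = np.random.default_rng(0)

# Design matrix from the two-group example above: two observations per group.
X = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [0.0, 1.0]])

beta = np.array([2.0, 5.0])   # hypothetical group means
sigma = 1.0                   # hypothetical error standard deviation

# Gauss-Markov model: y = X beta + epsilon, E(epsilon) = 0, Var(epsilon) = sigma^2 I.
epsilon = rng.normal(loc=0.0, scale=sigma, size=X.shape[0])
y = X @ beta + epsilon

# E(y) = X beta is a point in the column space C(X).
print("E(y) =", X @ beta)
print("observed y =", y)
```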

Another Column Space Example

Let

$$X_1 = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad X_2 = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{bmatrix}.$$

First,

$$x \in C(X_1) \implies x = X_1 a \text{ for some } a \in \mathbb{R}^2 \implies x = X_2 \begin{bmatrix} 0 \\ a_1 \\ a_2 \end{bmatrix} \implies x = X_2 b \text{ for some } b \in \mathbb{R}^3 \implies x \in C(X_2).$$

Thus, $C(X_1) \subseteq C(X_2)$.

Another Column Space Example (continued)

Conversely, if $x \in C(X_2)$, then $x = X_2 a$ for some $a \in \mathbb{R}^3$, so

$$x = a_1 \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} + a_2 \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} + a_3 \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} a_1 + a_2 \\ a_1 + a_2 \\ a_1 + a_3 \\ a_1 + a_3 \end{bmatrix} = X_1 \begin{bmatrix} a_1 + a_2 \\ a_1 + a_3 \end{bmatrix},$$

i.e., $x = X_1 b$ for some $b \in \mathbb{R}^2$, so $x \in C(X_1)$ and $C(X_2) \subseteq C(X_1)$. We previously showed that $C(X_1) \subseteq C(X_2)$. Thus, it follows that $C(X_1) = C(X_2)$.

Estimation of E(y)

A fundamental goal of linear model analysis is to estimate $E(y)$.

We could, of course, use $y$ itself to estimate $E(y)$. $y$ is obviously an unbiased estimator of $E(y)$, but it is often not a very sensible estimator. For example, suppose

$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} \mu \\ \mu \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \end{bmatrix}$$

and we observe a $y$ whose two components differ. Should we estimate $E(y) = (\mu, \mu)'$, whose components are equal, by a vector with two unequal entries?
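The equality $C(X_1) = C(X_2)$ can also be checked numerically: two column spaces coincide exactly when stacking the columns together adds no new directions, i.e., $\mathrm{rank}[X_1\; X_2] = \mathrm{rank}\, X_1 = \mathrm{rank}\, X_2$. A small sketch of that check (not from the slides):

```python
import numpy as np

X1 = np.array([[1, 0],
               [1, 0],
               [0, 1],
               [0, 1]], dtype=float)

X2 = np.array([[1, 1, 0],
               [1, 1, 0],
               [1, 0, 1],
               [1, 0, 1]], dtype=float)

r1 = np.linalg.matrix_rank(X1)
r2 = np.linalg.matrix_rank(X2)
r12 = np.linalg.matrix_rank(np.hstack([X1, X2]))

# C(X1) = C(X2) iff neither matrix contributes directions outside the other.
print(r1, r2, r12)                               # 2 2 2
print("C(X1) == C(X2):", r1 == r12 and r2 == r12)  # True
```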

Estimation of E(y)

The Gauss-Markov linear model says that $E(y) \in C(X)$, so we should use that information when estimating $E(y)$.

Consider estimating $E(y)$ by the point in $C(X)$ that is closest to $y$, as measured by the usual Euclidean distance. This unique point is called the orthogonal projection of $y$ onto $C(X)$ and is denoted by $\hat{y}$ (although it could be argued that $\widehat{E(y)}$ might be better notation). By definition,

$$\|y - \hat{y}\| = \min_{z \in C(X)} \|y - z\|, \quad \text{where } \|a\| = \sqrt{\textstyle\sum_{i=1}^n a_i^2}.$$

Orthogonal Projection Matrices

It can be shown that for all $y \in \mathbb{R}^n$, $\hat{y} = P_X y$, where $P_X$ is a unique $n \times n$ matrix known as an orthogonal projection matrix:

1. $P_X$ is idempotent: $P_X P_X = P_X$.
2. $P_X$ is symmetric: $P_X' = P_X$.
3. $P_X X = X$.
4. $X' P_X = X'$.
5. $P_X = X(X'X)^- X'$, where $(X'X)^-$ is any generalized inverse of $X'X$.

Why Does $P_X X = X$?

Each column of $X$ already lies in $C(X)$, and the orthogonal projection of a point in $C(X)$ is that point itself. Hence

$$P_X X = P_X [x_1, \ldots, x_p] = [P_X x_1, \ldots, P_X x_p] = [x_1, \ldots, x_p] = X.$$

Generalized Inverses

$G$ is a generalized inverse of a matrix $A$ if $AGA = A$. We usually denote a generalized inverse of $A$ by $A^-$.

If $A$ is nonsingular, i.e., if $A^{-1}$ exists, then $A^{-1}$ is the one and only generalized inverse of $A$:

$$A A^{-1} A = A(A^{-1}A) = AI = A.$$

If $A$ is singular, i.e., if $A^{-1}$ does not exist, then there are infinitely many generalized inverses of $A$.
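A numerical sketch of these facts, using the singular matrix $A = X_2'X_2$ from the column space example. The Moore-Penrose pseudoinverse is one generalized inverse; a second, different one can be built by inverting a nonsingular block of $A$ and padding with zeros (valid here because that block has the same rank as $A$), illustrating non-uniqueness:

```python
import numpy as np

X2 = np.array([[1, 1, 0],
               [1, 1, 0],
               [1, 0, 1],
               [1, 0, 1]], dtype=float)
A = X2.T @ X2                       # singular: rank(A) = 2 < 3

# One generalized inverse: the Moore-Penrose pseudoinverse.
G1 = np.linalg.pinv(A)

# Another: invert a nonsingular 2x2 block of A and pad with zeros
# (a standard construction when rank(A) equals the rank of that block).
G2 = np.zeros((3, 3))
G2[:2, :2] = np.linalg.inv(A[:2, :2])

for G in (G1, G2):
    print(np.allclose(A @ G @ A, A))   # True for both: A G A = A
print(np.allclose(G1, G2))             # False: the two g-inverses differ
```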

Invariance of $P_X = X(X'X)^- X'$ to the Choice of $(X'X)^-$

If $X'X$ is nonsingular, then $P_X = X(X'X)^{-1} X'$ because the only generalized inverse of $X'X$ is $(X'X)^{-1}$.

If $X'X$ is singular, then the choice of the generalized inverse $(X'X)^-$ does not matter: $X(X'X)^- X'$ turns out to be the same matrix no matter which generalized inverse of $X'X$ is used. That is, if $(X'X)^-_1$ and $(X'X)^-_2$ are any two generalized inverses of $X'X$, then

$$X(X'X)^-_1 X' = X(X'X)^-_2 X'.$$

An Example Orthogonal Projection Matrix

Suppose

$$y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} \mu \\ \mu \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \end{bmatrix},$$

so that $X = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$. Then

$$P_X = X(X'X)^- X' = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \left( \begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right)^{-1} \begin{bmatrix} 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \frac{1}{2} \begin{bmatrix} 1 & 1 \end{bmatrix} = \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{bmatrix}.$$

An Example Orthogonal Projection

Thus, the orthogonal projection of an observed $y$ onto the column space of $X = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ is

$$P_X y = \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} (y_1 + y_2)/2 \\ (y_1 + y_2)/2 \end{bmatrix} = \bar{y} \begin{bmatrix} 1 \\ 1 \end{bmatrix};$$

each component of $E(y) = (\mu, \mu)'$ is estimated by the sample mean $\bar{y}$.
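The invariance claim is easy to check numerically. This sketch reuses the two generalized inverses constructed above for the singular $X'X$ of the overparameterized design and confirms that both yield the same $P_X$, along with the idempotence and symmetry properties:

```python
import numpy as np

X = np.array([[1, 1, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 0, 1]], dtype=float)
A = X.T @ X                         # X'X, singular here

# Two different generalized inverses of X'X (as in the previous sketch).
G1 = np.linalg.pinv(A)
G2 = np.zeros((3, 3))
G2[:2, :2] = np.linalg.inv(A[:2, :2])

P1 = X @ G1 @ X.T
P2 = X @ G2 @ X.T

print(np.allclose(P1, P2))          # True: P_X does not depend on the g-inverse
print(np.allclose(P1 @ P1, P1))     # True: idempotent
print(np.allclose(P1, P1.T))        # True: symmetric
```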

[Figures: four panels for the example $X = (1, 1)'$, plotting the observed vector $y$ in $\mathbb{R}^2$, the line $C(X)$, the orthogonal projection $\hat{y} = P_X y$, and the residual vector $y - \hat{y}$.]
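The plotted quantities do not survive the transcription, so here is a small sketch computing them for a hypothetical observed vector (the value of `y` is made up for illustration):

```python
import numpy as np

X = np.array([[1.0],
              [1.0]])
y = np.array([2.0, 6.0])                # hypothetical observed responses

PX = X @ np.linalg.inv(X.T @ X) @ X.T   # [[0.5, 0.5], [0.5, 0.5]]
y_hat = PX @ y                          # (4, 4): both components equal ybar
resid = y - y_hat                       # (-2, 2)

print(y_hat, resid)
print("orthogonal:", np.isclose(y_hat @ resid, 0.0))  # True
```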

Optimality of $\hat{y}$ as an Estimator of E(y)

The angle between $\hat{y}$ and $y - \hat{y}$ is 90°; i.e., the vectors $\hat{y}$ and $y - \hat{y}$ are orthogonal:

$$\hat{y}'(y - \hat{y}) = (P_X y)'(y - P_X y) = (P_X y)'(I - P_X)y = y' P_X' (I - P_X) y = y' P_X (I - P_X) y = y'(P_X - P_X P_X) y = y'(P_X - P_X) y = 0.$$

$\hat{y}$ is an unbiased estimator of $E(y)$:

$$E(\hat{y}) = E(P_X y) = P_X E(y) = P_X X\beta = X\beta = E(y).$$

It can be shown that $\hat{y} = P_X y$ is the best estimator of $E(y)$ in the class of linear unbiased estimators, i.e., estimators of the form $My$ for $M$ satisfying

$$E(My) = E(y) \;\forall\, \beta \in \mathbb{R}^p \iff MX\beta = X\beta \;\forall\, \beta \in \mathbb{R}^p \iff MX = X.$$

Under the normal theory Gauss-Markov linear model, $\hat{y} = P_X y$ is best among all unbiased estimators of $E(y)$.

Ordinary Least Squares (OLS) Estimation of E(y)

OLS: find a vector $b^* \in \mathbb{R}^p$ such that $Q(b^*) \le Q(b)$ for all $b \in \mathbb{R}^p$, where

$$Q(b) = \sum_{i=1}^n (y_i - x_{(i)}' b)^2.$$

Note that

$$Q(b) = \sum_{i=1}^n (y_i - x_{(i)}' b)^2 = (y - Xb)'(y - Xb) = \|y - Xb\|^2.$$

To minimize this sum of squares, we need to choose $b \in \mathbb{R}^p$ such that $Xb$ is the point in $C(X)$ that is closest to $y$. In other words, we need to choose $b$ such that

$$Xb = P_X y = X(X'X)^- X' y.$$

Clearly, choosing $b = (X'X)^- X' y$ will work.

Ordinary Least Squares and the Normal Equations

Often calculus is used to show that $Q(b^*) \le Q(b)$ for all $b \in \mathbb{R}^p$ if and only if $b^*$ is a solution to the normal equations:

$$X'Xb = X'y.$$

If $X'X$ is nonsingular, multiplying both sides of the normal equations by $(X'X)^{-1}$ shows that the only solution to the normal equations is $b = (X'X)^{-1} X' y$.

If $X'X$ is singular, there are infinitely many solutions, which include $(X'X)^- X' y$ for every choice of generalized inverse of $X'X$:

$$X'X[(X'X)^- X' y] = X'[X(X'X)^- X'] y = X' P_X y = X' y.$$

Henceforth, we will use $\hat{\beta}$ to denote any solution to the normal equations.
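A sketch of the non-uniqueness point, assuming the singular design from the earlier example and made-up responses: two different generalized inverses give two different solutions to the normal equations, yet both produce the same fitted vector $X\hat{\beta}$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Overparameterized design from the column space example: X'X is singular.
X = np.array([[1, 1, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 0, 1]], dtype=float)
y = rng.normal(size=4)               # hypothetical responses
A = X.T @ X

# Two solutions b = (X'X)^- X'y built from two different g-inverses.
b1 = np.linalg.pinv(A) @ X.T @ y
G2 = np.zeros((3, 3))
G2[:2, :2] = np.linalg.inv(A[:2, :2])
b2 = G2 @ X.T @ y

print(np.allclose(A @ b1, X.T @ y))  # True: b1 solves the normal equations
print(np.allclose(A @ b2, X.T @ y))  # True: so does b2
print(np.allclose(b1, b2))           # typically False: betahat is not unique...
print(np.allclose(X @ b1, X @ b2))   # True: ...but X betahat = P_X y is
```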

Ordinary Least Squares Estimator of $E(y) = X\beta$

We call

$$X\hat{\beta} = P_X X\hat{\beta} = X(X'X)^- X' X \hat{\beta} = X(X'X)^- X' y = P_X y = \hat{y}$$

the OLS estimator of $E(y) = X\beta$. (The third equality uses the normal equations $X'X\hat{\beta} = X'y$.)

It might be more appropriate to use $\widehat{X\beta}$ rather than $X\hat{\beta}$ to denote our estimator, because we are estimating $X\beta$ rather than pre-multiplying an estimator of $\beta$ by $X$. As we shall soon see, it does not make sense to estimate $\beta$ when $X'X$ is singular. However, it does make sense to estimate $E(y) = X\beta$ whether $X'X$ is singular or nonsingular.
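As a closing illustration of why $\hat{y} = X\hat{\beta}$ is preferable to the raw $y$, here is a Monte Carlo sketch (not from the slides; `beta`, normal errors, and the replication count are all assumptions) for the mean-only example: $\hat{y}$ is approximately unbiased for $E(y)$ and has smaller total mean squared error than $y$ itself:

```python
import numpy as np

rng = np.random.default_rng(2)

# Mean-only model y_i = mu + eps_i, i = 1, 2, with a made-up mu and sigma = 1.
X = np.array([[1.0], [1.0]])
beta = np.array([3.0])               # hypothetical mu
mean = X @ beta
PX = X @ np.linalg.inv(X.T @ X) @ X.T

reps = 100_000
ys = mean + rng.normal(size=(reps, 2))
yhats = ys @ PX.T                    # P_X y for every simulated y (P_X symmetric)

print(yhats.mean(axis=0))                        # approx (3, 3): unbiased
print(((ys - mean) ** 2).sum(axis=1).mean())     # approx 2: total MSE of y
print(((yhats - mean) ** 2).sum(axis=1).mean())  # approx 1: smaller for yhat
```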