Regression Analysis
MIT 18.472
Dr. Kempthorne
Spring 2015

Multiple Linear Regression: Setup

Data Set: n cases, i = 1, 2, ..., n
- 1 response (dependent) variable: y_i, i = 1, 2, ..., n
- p explanatory (independent) variables: x_i = (x_{i,1}, x_{i,2}, ..., x_{i,p})^T, i = 1, 2, ..., n

Goal of Regression Analysis: extract/exploit the relationship between y_i and x_i.

Examples: prediction, causal inference, approximation, functional relationships.

General Linear Model: For each case i, the conditional distribution [y_i | x_i] is given by

  y_i = ŷ_i + E_i,  where  ŷ_i = β_1 x_{i,1} + β_2 x_{i,2} + ... + β_p x_{i,p}

- β = (β_1, β_2, ..., β_p)^T are p regression parameters (constant over all cases)
- E_i is the residual (error) variable (varies over cases)

Extensive breadth of possible models:
- Polynomial approximation: x_{i,j} = (x_i)^j, so the explanatory variables are different powers of the same variable x = x_i
- Fourier series: x_{i,j} = sin(j x_i) or cos(j x_i), so the explanatory variables are different sine/cosine terms of a Fourier series expansion
- Time series regressions: time is indexed by i, and the explanatory variables include lagged response values

Note: linearity of ŷ_i (in the regression parameters) is maintained even when the x's enter non-linearly; the sketch below illustrates this for the polynomial case.
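
The following is a minimal numpy sketch (not part of the original slides; the data, seed, and degree are arbitrary illustrative choices) of how a polynomial regression is still a linear model: each power of the raw predictor becomes one column of the design matrix.

```python
import numpy as np

# Hypothetical data: a single raw predictor x with n = 50 cases.
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=50)

# Polynomial regression as a *linear* model: column j of X is x**j.
# The model y_i = beta_1 x_i + beta_2 x_i^2 + beta_3 x_i^3 + E_i is
# non-linear in x but linear in the parameters beta.
p = 3
X = np.column_stack([x**j for j in range(1, p + 1)])   # n x p design matrix
print(X.shape)  # (50, 3)
```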

Steps for Fitting a Model

(1) Propose a model in terms of:
    - the response variable Y (specify the scale)
    - explanatory variables X_1, X_2, ..., X_p (include different functions of the explanatory variables if appropriate)
    - assumptions about the distribution of E over the cases
(2) Specify/define a criterion for judging different estimators.
(3) Characterize the best estimator and apply it to the given data.
(4) Check the assumptions in (1).
(5) If necessary, modify the model and/or assumptions and go to (1).

Specifying Assumptions in (1) for the Residual Distribution

- Gauss-Markov: zero mean, constant variance, uncorrelated
- Normal-linear models: the E_i are i.i.d. N(0, σ²) r.v.s
- Generalized Gauss-Markov: zero mean, general covariance matrix (possibly correlated, possibly heteroscedastic)
- Non-normal/non-Gaussian distributions (e.g., Laplace, Pareto, contaminated normal: some fraction (1 − δ) of the E_i are i.i.d. N(0, σ²) r.v.s and the remaining fraction δ follows some contamination distribution)

Specifying the Estimator Criterion in (2)

- Least Squares
- Maximum Likelihood
- Robust (contamination-resistant)
- Bayes (assume the β_j are r.v.s with a known prior distribution)
- Accommodating incomplete/missing data

Case Analyses for (4): Checking Assumptions

- Residual analysis
    - Model errors E_i are unobservable
    - Model residuals for the fitted regression parameters β̂_j are:
      e_i = y_i − [β̂_1 x_{i,1} + β̂_2 x_{i,2} + ... + β̂_p x_{i,p}]
- Influence diagnostics (identify cases which are highly influential)
- Outlier detection

Ordinary Least Squares Estimates

Least Squares Criterion: For β = (β_1, β_2, ..., β_p)^T, define

  Q(β) = ∑_{i=1}^n [y_i − ŷ_i]²
       = ∑_{i=1}^n [y_i − (β_1 x_{i,1} + β_2 x_{i,2} + ... + β_p x_{i,p})]²

The Ordinary Least-Squares (OLS) estimate β̂ minimizes Q(β).

Matrix Notation:

  y = (y_1, y_2, ..., y_n)^T   (n × 1 response vector)
  X = [x_{i,j}]                (n × p design matrix with rows x_i^T)
  β = (β_1, β_2, ..., β_p)^T   (p × 1 parameter vector)

Solving for the OLS Estimate β̂

  ŷ = (ŷ_1, ŷ_2, ..., ŷ_n)^T = Xβ

and

  Q(β) = ∑_{i=1}^n (y_i − ŷ_i)² = (y − ŷ)^T (y − ŷ) = (y − Xβ)^T (y − Xβ)

The OLS estimate β̂ solves ∂Q(β)/∂β_j = 0, j = 1, 2, ..., p:

  ∂Q(β)/∂β_j = ∂/∂β_j ∑_{i=1}^n [y_i − (x_{i,1} β_1 + x_{i,2} β_2 + ... + x_{i,p} β_p)]²
             = ∑_{i=1}^n 2(−x_{i,j})[y_i − (x_{i,1} β_1 + x_{i,2} β_2 + ... + x_{i,p} β_p)]
             = −2 (X_{[j]})^T (y − Xβ),  where X_{[j]} is the jth column of X

Solving for the OLS Estimate β̂

Stacking the partial derivatives:

  ∂Q/∂β = (∂Q/∂β_1, ∂Q/∂β_2, ..., ∂Q/∂β_p)^T
        = −2 ( X_{[1]}^T (y − Xβ), X_{[2]}^T (y − Xβ), ..., X_{[p]}^T (y − Xβ) )^T
        = −2 X^T (y − Xβ)

So the OLS estimate β̂ solves the Normal Equations:

  X^T (y − Xβ̂) = 0
  X^T X β̂ = X^T y
  ⟹ β̂ = (X^T X)^{-1} X^T y

N.B. For β̂ to exist (uniquely), (X^T X) must be invertible, i.e., X must have full column rank.
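
A small numpy sketch of the normal-equations solution (an illustration, not from the slides; the simulated data and parameter values are hypothetical). In practice a QR-based solver is preferred to forming (X^T X)^{-1} explicitly, but the algebra matches the formula above.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 3
X = rng.normal(size=(n, p))                 # full column rank with probability 1
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Normal equations: (X^T X) beta_hat = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Numerically safer equivalent (QR/SVD-based least squares)
beta_hat_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_hat_lstsq)
print(beta_hat)
```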

(Ordinary) Least Squares Fit

OLS Estimate:

  β̂ = (β̂_1, β̂_2, ..., β̂_p)^T = (X^T X)^{-1} X^T y

Fitted Values:

  ŷ = (ŷ_1, ŷ_2, ..., ŷ_n)^T,  where ŷ_i = x_{i,1} β̂_1 + ... + x_{i,p} β̂_p
    = X β̂ = X (X^T X)^{-1} X^T y = H y

where H = X (X^T X)^{-1} X^T is the n × n Hat Matrix.

(Ordinary) Least Squares Fit

The Hat Matrix H projects R^n onto the column space of X.

Residuals: Ê_i = y_i − ŷ_i, i = 1, 2, ..., n

  ε̂ = (Ê_1, Ê_2, ..., Ê_n)^T = y − ŷ = (I_n − H) y

Normal Equations: X^T (y − X β̂) = X^T ε̂ = 0_p (the p × 1 zero vector)

N.B. The least-squares residual vector ε̂ is orthogonal to the column space of X.
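
A quick numerical check of these facts (a sketch, not part of the slides, reusing the same kind of hypothetical simulated data): H should be a symmetric, idempotent projection, and the residual vector should be orthogonal to every column of X.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T         # n x n hat matrix
y_hat = H @ y                                # fitted values
resid = (np.eye(n) - H) @ y                  # residuals (I - H) y

assert np.allclose(H @ H, H)                 # H is idempotent (a projection)
assert np.allclose(H, H.T)                   # ... and symmetric
assert np.allclose(X.T @ resid, 0)           # residuals orthogonal to col(X)
```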

Random Vector and Mean Vector

  Y = (Y_1, Y_2, ..., Y_n)^T,  E[Y] = μ_Y = (μ_1, μ_2, ..., μ_n)^T

where Y_1, Y_2, ..., Y_n have joint pdf f(y_1, y_2, ..., y_n) and E(Y_i) = μ_i, i = 1, 2, ..., n.

Covariance Matrix

  Var(Y_i) = σ_{ii}, i = 1, ..., n
  Cov(Y_i, Y_j) = σ_{ij}, i, j = 1, ..., n
  Σ = [σ_{ij}]: the (n × n) matrix with (i, j) element σ_{ij}

Covariance Matrix

  Cov(Y) = Σ =
    [ σ_{1,1}  σ_{1,2}  ...  σ_{1,n} ]
    [ σ_{2,1}  σ_{2,2}  ...  σ_{2,n} ]
    [   ...      ...    ...    ...   ]
    [ σ_{n,1}  σ_{n,2}  ...  σ_{n,n} ]

Theorem. Suppose Y is a random n-vector with E(Y) = μ_Y and Cov(Y) = Σ_{YY}, A is a fixed (m × n) matrix, and c is a fixed (m × 1) vector. Then for the random m-vector Z = c + A Y:

  E(Z)   = c + A E(Y) = c + A μ_Y
  Cov(Z) = Σ_{ZZ} = A Σ_{YY} A^T

Random m-vector: Z = c + A Y

Example 1: Y_i i.i.d. with mean μ and variance σ². c = 0 and A = [1, 1, ..., 1] (1 × n, so m = 1), giving Z = ∑_i Y_i.

Example 2: Y_i i.i.d. with mean μ and variance σ². c = 0 and A = [1/n, 1/n, ..., 1/n] (1 × n, so m = 1), giving Z = Ȳ.

Example 3: Y_i i.i.d. with mean μ and variance σ². c = 0 and

  A = [ 1 0 0 0 0 ]
      [ 1 1 0 0 0 ]
      [ 1 1 1 0 0 ]

so Z is the vector of partial sums Z_j = Y_1 + ... + Y_j.

Quadratic Form

For A an (n × n) symmetric matrix and x an n-vector (an n × 1 matrix),

  QF(x, A) = x^T A x = ∑_{i=1}^n ∑_{j=1}^n x_i A_{ij} x_j

Theorem. Let X be a random n-vector with mean μ and covariance Σ. For a fixed n × n matrix A,

  E[X^T A X] = trace(AΣ) + μ^T A μ

(the trace of a square matrix is the sum of its diagonal terms).

Example: If Σ = σ² I and A = I − (1/n) 1 1^T, then X^T A X = ∑_{i=1}^n (X_i − X̄)², and

  E[ ∑_{i=1}^n (X_i − X̄)² ] = (n − 1) σ²
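
A Monte Carlo check of the quadratic-form identity for this example (a sketch with arbitrary assumed values for n, μ, σ and the seed, not from the slides): with A = I − (1/n) 1 1^T and i.i.d. X_i, the average of X^T A X should be close to (n − 1)σ².

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma, mu = 10, 2.0, 5.0
A = np.eye(n) - np.ones((n, n)) / n          # A = I - (1/n) 1 1^T

# Theory: E[X^T A X] = trace(A Sigma) + mu^T A mu
Sigma = sigma**2 * np.eye(n)
mu_vec = mu * np.ones(n)
theory = np.trace(A @ Sigma) + mu_vec @ A @ mu_vec   # = (n - 1) sigma^2 = 36 here

# Simulation
reps = 200_000
X = rng.normal(mu, sigma, size=(reps, n))
qf = np.einsum('ri,ij,rj->r', X, A, X)       # X^T A X for each replicate
print(theory, qf.mean())                     # both approximately 36
```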

Theorem. Let X be a random n-vector with mean μ and covariance Σ. For a fixed p × n matrix A let Y = A X, and for a fixed m × n matrix B let Z = B X. Then the cross-covariance matrix of Y and Z is

  Σ_{YZ} = A Σ B^T

Example: Suppose X is a random n-vector with mean μ = μ 1 and covariance Σ = σ² I. Take A = I − (1/n) 1 1^T and B = (1/n) 1^T. Solve for Y, Z and Cov(Y, Z).
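
For this example, Y = A X is the vector of deviations from the sample mean and Z = B X = X̄, and the theorem gives Cov(Y, Z) = A(σ² I)B^T = (σ²/n)(1 − 1) = 0: the centered values are uncorrelated with the sample mean. Below is a small simulation sketch of that fact (assumed parameter values, not from the slides).

```python
import numpy as np

rng = np.random.default_rng(4)
n, mu, sigma = 5, 3.0, 1.5
A = np.eye(n) - np.ones((n, n)) / n     # centering matrix: Y = A X = X - X_bar
B = np.ones((1, n)) / n                 # averaging row:    Z = B X = X_bar

# Theorem: Sigma_YZ = A Sigma B^T with Sigma = sigma^2 I  -> exactly zero
Sigma_YZ = A @ (sigma**2 * np.eye(n)) @ B.T
print(np.allclose(Sigma_YZ, 0))         # True

# Simulation check of the same fact
reps = 100_000
X = rng.normal(mu, sigma, size=(reps, n))
Y = X - X.mean(axis=1, keepdims=True)   # A X for each replicate
Z = X.mean(axis=1)                      # B X for each replicate
emp_cov = ((Y - Y.mean(axis=0)) * (Z - Z.mean())[:, None]).mean(axis=0)
print(np.round(emp_cov, 3))             # approximately zero
```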

Least Squares Estimate

  β̂ = (β̂_1, β̂_2, ..., β̂_p)^T = (X^T X)^{-1} X^T Y = A Y,  with A = (X^T X)^{-1} X^T

Mean:

  E(β̂) = E(A Y) = A E(Y) = A X β = (X^T X)^{-1} X^T X β = β

Covariance:

  Cov(β̂) = A Cov(Y) A^T = A (σ² I) A^T = σ² A A^T = σ² (X^T X)^{-1}
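
A Monte Carlo sketch of these two identities under an assumed design and noise level (illustrative values, not from the slides): over replicated noise with X held fixed, the empirical mean of β̂ should approach β and its empirical covariance should approach σ²(X^T X)^{-1}.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, sigma = 50, 3, 0.5
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])
XtX_inv = np.linalg.inv(X.T @ X)

reps = 20_000
betas = np.empty((reps, p))
for r in range(reps):
    y = X @ beta + rng.normal(scale=sigma, size=n)   # fresh noise, fixed X
    betas[r] = XtX_inv @ X.T @ y                      # OLS estimate

print(betas.mean(axis=0))                 # approximately beta
print(np.cov(betas, rowvar=False))        # approximately sigma^2 (X^T X)^{-1}
print(sigma**2 * XtX_inv)
```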

Normal Linear Regression Models: Distribution Theory

  Y_i = x_{i,1} β_1 + x_{i,2} β_2 + ... + x_{i,p} β_p + E_i = μ_i + E_i

Assume {E_1, E_2, ..., E_n} are i.i.d. N(0, σ²). Then

  [Y_i | x_{i,1}, x_{i,2}, ..., x_{i,p}, β, σ²] ~ N(μ_i, σ²), independent over i = 1, 2, ..., n.

Conditioning on X, β, and σ²:

  Y = X β + E,  where E = (E_1, E_2, ..., E_n)^T ~ N_n(0_n, σ² I_n).

Distribution Theory

  μ = (μ_1, ..., μ_n)^T = E(Y | X, β, σ²) = X β

  Σ = Cov(Y | X, β, σ²) = diag(σ², σ², ..., σ²) = σ² I_n

That is, Σ_{i,j} = Cov(Y_i, Y_j | X, β, σ²) = σ² δ_{i,j}.

Apply Moment-Generating Functions (MGFs) to derive:
- the joint distribution of Y = (Y_1, Y_2, ..., Y_n)^T
- the joint distribution of β̂ = (β̂_1, β̂_2, ..., β̂_p)^T

MGF of Y

For the n-variate r.v. Y and a constant n-vector t = (t_1, ..., t_n)^T,

  M_Y(t) = E(e^{t^T Y}) = E(e^{t_1 Y_1 + t_2 Y_2 + ... + t_n Y_n})
         = E(e^{t_1 Y_1}) E(e^{t_2 Y_2}) ... E(e^{t_n Y_n})      (by independence)
         = M_{Y_1}(t_1) M_{Y_2}(t_2) ... M_{Y_n}(t_n)
         = ∏_{i=1}^n exp( t_i μ_i + (1/2) t_i² σ² )
         = exp( ∑_{i=1}^n t_i μ_i + (1/2) ∑_{i,k=1}^n t_i Σ_{i,k} t_k )
         = exp( t^T μ + (1/2) t^T Σ t )

⟹ Y ~ N_n(μ, Σ), multivariate normal with mean μ and covariance Σ.

MGF of β̂

For the p-variate r.v. β̂ and a constant p-vector τ = (τ_1, ..., τ_p)^T,

  M_{β̂}(τ) = E(e^{τ^T β̂}) = E(e^{τ_1 β̂_1 + τ_2 β̂_2 + ... + τ_p β̂_p})

Defining A = (X^T X)^{-1} X^T, we can express β̂ = (X^T X)^{-1} X^T Y = A Y, and

  M_{β̂}(τ) = E(e^{τ^T β̂}) = E(e^{τ^T A Y}) = E(e^{t^T Y}), with t = A^T τ
            = M_Y(t) = exp( t^T μ + (1/2) t^T Σ t )

MGF of β̂

For M_{β̂}(τ) = E(e^{τ^T β̂}) = exp( t^T μ + (1/2) t^T Σ t ), plug in:

  t = A^T τ = X (X^T X)^{-1} τ
  μ = X β
  Σ = σ² I_n

This gives:

  t^T μ = τ^T β
  t^T Σ t = τ^T (X^T X)^{-1} X^T [σ² I_n] X (X^T X)^{-1} τ = τ^T [σ² (X^T X)^{-1}] τ

So the MGF of β̂ is

  M_{β̂}(τ) = exp( τ^T β + (1/2) τ^T [σ² (X^T X)^{-1}] τ )

⟹ β̂ ~ N_p(β, σ² (X^T X)^{-1})

Marginal Distributions of Least Squares Estimates

Because β̂ ~ N_p(β, σ² (X^T X)^{-1}), the marginal distribution of each β̂_j is

  β̂_j ~ N(β_j, σ² C_{j,j})

where C_{j,j} is the jth diagonal element of (X^T X)^{-1}.
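
In code, the implied standard deviations of the β̂_j are the square roots of the diagonal of σ²(X^T X)^{-1}. A brief sketch using the same kind of hypothetical simulated data as above (σ is treated as known here; in practice it is replaced by an estimate):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, sigma = 50, 3, 0.5
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + rng.normal(scale=sigma, size=n)

C = np.linalg.inv(X.T @ X)                  # C = (X^T X)^{-1}
beta_hat = C @ X.T @ y

# Marginal distribution: beta_hat_j ~ N(beta_j, sigma^2 C[j, j])
se = sigma * np.sqrt(np.diag(C))            # known-sigma standard deviations
for j in range(p):
    print(f"beta_hat_{j+1} = {beta_hat[j]: .3f}  (sd = {se[j]:.3f})")
```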

MIT OpenCourseWare
http://ocw.mit.edu

18.443 Statistics for Applications
Spring 2015

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms