Regression Analysis
MIT 18.443
Dr. Kempthorne
Spring 2015
Multiple Linear Regression: Setup

Data set: $n$ cases, $i = 1, 2, \ldots, n$
- Response (dependent) variable: $y_i$, $i = 1, 2, \ldots, n$
- $p$ explanatory (independent) variables: $x_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,p})^T$, $i = 1, 2, \ldots, n$

Goal of regression analysis: extract/exploit the relationship between $y_i$ and $x_i$.

Examples:
- Prediction
- Causal inference
- Approximation
- Functional relationships
General Linear Model: For each case $i$, the conditional distribution $[y_i \mid x_i]$ is given by
$$y_i = \hat{y}_i + \epsilon_i, \quad \text{where} \quad \hat{y}_i = \beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_p x_{i,p}$$
- $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^T$ are $p$ regression parameters (constant over all cases)
- $\epsilon_i$ is the residual (error) variable (varies over all cases)

Extensive breadth of possible models:
- Polynomial approximation: $x_{i,j} = (x_i)^j$; the explanatory variables are different powers of the same variable $x = x_i$.
- Fourier series: $x_{i,j} = \sin(j x_i)$ or $\cos(j x_i)$; the explanatory variables are different sine/cosine terms of a Fourier series expansion.
- Time series regressions: time is indexed by $i$, and the explanatory variables include lagged response values.

Note: Linearity of $\hat{y}_i$ (in the regression parameters) is maintained even when the explanatory variables are non-linear functions of $x$. See the design-matrix sketch below.
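To make this concrete, here is a minimal numpy sketch (an added illustration, not from the slides; the grid of $x$ values and the choices of degree and order are arbitrary) building polynomial and Fourier design matrices, both of which leave the model linear in $\beta$:

```python
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 50)  # hypothetical scalar explanatory variable

# Polynomial design matrix: column j is x**j, i.e. x_{i,j} = (x_i)^j
degree = 3
X_poly = np.column_stack([x**j for j in range(1, degree + 1)])

# Fourier design matrix: columns are sin(j*x) and cos(j*x)
order = 2
X_fourier = np.column_stack(
    [np.sin(j * x) for j in range(1, order + 1)]
    + [np.cos(j * x) for j in range(1, order + 1)]
)

# In both cases the model y = X @ beta + eps is linear in beta,
# even though the columns of X are non-linear functions of x.
print(X_poly.shape, X_fourier.shape)  # (50, 3) (50, 4)
```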
Steps for Fitting a Model

(1) Propose a model in terms of:
- Response variable $Y$ (specify the scale)
- Explanatory variables $X_1, X_2, \ldots, X_p$ (include different functions of the explanatory variables if appropriate)
- Assumptions about the distribution of $\epsilon$ over the cases

(2) Specify/define a criterion for judging different estimators.
(3) Characterize the best estimator and apply it to the given data.
(4) Check the assumptions in (1).
(5) If necessary, modify the model and/or assumptions and go to (1).
Specifying Assumptions in (1) for the Residual Distribution
- Gauss-Markov: zero mean, constant variance, uncorrelated
- Normal-linear models: the $\epsilon_i$ are i.i.d. $N(0, \sigma^2)$ random variables
- Generalized Gauss-Markov: zero mean and general covariance matrix (possibly correlated, possibly heteroscedastic)
- Non-normal/non-Gaussian distributions (e.g., Laplace, Pareto)
- Contaminated normal: some fraction $(1 - \delta)$ of the $\epsilon_i$ are i.i.d. $N(0, \sigma^2)$ random variables; the remaining fraction $(\delta)$ follows some contamination distribution.
Specifying Estimator Criterion in (2)
- Least squares
- Maximum likelihood
- Robust (contamination-resistant)
- Bayes (assume the $\beta_j$ are random variables with a known prior distribution)
- Accommodating incomplete/missing data

Case Analyses for (4): Checking Assumptions
- Residual analysis: the model errors $\epsilon_i$ are unobservable; the model residuals for fitted regression parameters $\hat{\beta}_j$ are
$$e_i = y_i - [\hat{\beta}_1 x_{i,1} + \hat{\beta}_2 x_{i,2} + \cdots + \hat{\beta}_p x_{i,p}]$$
- Influence diagnostics (identify cases that are highly influential)
- Outlier detection
Ordinary Least Squares Estimates

Least Squares Criterion: For $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^T$, define
$$Q(\beta) = \sum_{i=1}^{n} [y_i - \hat{y}_i]^2 = \sum_{i=1}^{n} [y_i - (\beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_p x_{i,p})]^2$$
The Ordinary Least Squares (OLS) estimate $\hat{\beta}$ minimizes $Q(\beta)$.

Matrix notation:
$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\ \vdots & & & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,p} \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}$$
Solving for the OLS Estimate $\hat{\beta}$
$$\hat{y} = \begin{pmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_n \end{pmatrix} = X\beta
\quad \text{and} \quad
Q(\beta) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = (y - \hat{y})^T (y - \hat{y}) = (y - X\beta)^T (y - X\beta)$$
The OLS estimate $\hat{\beta}$ solves $\dfrac{\partial Q(\beta)}{\partial \beta_j} = 0$, $j = 1, 2, \ldots, p$:
$$\frac{\partial Q(\beta)}{\partial \beta_j}
= \sum_{i=1}^{n} \frac{\partial}{\partial \beta_j}[y_i - (x_{i,1}\beta_1 + x_{i,2}\beta_2 + \cdots + x_{i,p}\beta_p)]^2
= \sum_{i=1}^{n} 2(-x_{i,j})[y_i - (x_{i,1}\beta_1 + x_{i,2}\beta_2 + \cdots + x_{i,p}\beta_p)]
= -2(X_{[j]})^T (y - X\beta)$$
where $X_{[j]}$ is the $j$th column of $X$.
Solving for the OLS Estimate $\hat{\beta}$
$$\frac{\partial Q}{\partial \beta}
= \begin{pmatrix} \frac{\partial Q}{\partial \beta_1} \\ \frac{\partial Q}{\partial \beta_2} \\ \vdots \\ \frac{\partial Q}{\partial \beta_p} \end{pmatrix}
= -2 \begin{pmatrix} X_{[1]}^T (y - X\beta) \\ X_{[2]}^T (y - X\beta) \\ \vdots \\ X_{[p]}^T (y - X\beta) \end{pmatrix}
= -2X^T (y - X\beta)$$
So the OLS estimate $\hat{\beta}$ solves the Normal Equations:
$$X^T (y - X\hat{\beta}) = 0 \iff X^T X \hat{\beta} = X^T y \implies \hat{\beta} = (X^T X)^{-1} X^T y$$
N.B. For $\hat{\beta}$ to exist (uniquely), $(X^T X)$ must be invertible, i.e., $X$ must have full column rank.
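As a numerical check (an added sketch with simulated data; the design, coefficients, and noise scale are all arbitrary), the normal equations can be solved directly and compared against a library least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))          # full column rank with probability 1
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Solve the normal equations X^T X beta = X^T y.
# np.linalg.solve is preferred over forming the explicit inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Compare with numpy's least-squares solver (more numerically stable,
# since it avoids forming X^T X, whose condition number is squared).
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))  # True
```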
(Ordinary) Least Squares Fit

OLS estimate:
$$\hat{\beta} = \begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_p \end{pmatrix} = (X^T X)^{-1} X^T y$$
Fitted values:
$$\hat{y} = \begin{pmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_n \end{pmatrix}
= \begin{pmatrix} x_{1,1}\hat{\beta}_1 + \cdots + x_{1,p}\hat{\beta}_p \\ x_{2,1}\hat{\beta}_1 + \cdots + x_{2,p}\hat{\beta}_p \\ \vdots \\ x_{n,1}\hat{\beta}_1 + \cdots + x_{n,p}\hat{\beta}_p \end{pmatrix}
= X\hat{\beta} = X(X^T X)^{-1} X^T y = Hy$$
where $H = X(X^T X)^{-1} X^T$ is the $n \times n$ Hat Matrix.
(Ordinary) Least Squares Fit

The Hat Matrix $H$ projects $\mathbb{R}^n$ onto the column space of $X$.

Residuals: $\hat{\epsilon}_i = y_i - \hat{y}_i$, $i = 1, 2, \ldots, n$
$$\hat{\epsilon} = \begin{pmatrix} \hat{\epsilon}_1 \\ \hat{\epsilon}_2 \\ \vdots \\ \hat{\epsilon}_n \end{pmatrix} = y - \hat{y} = (I_n - H)y$$
Normal Equations: $X^T (y - X\hat{\beta}) = X^T \hat{\epsilon} = 0_p$

N.B. The least-squares residual vector $\hat{\epsilon}$ is orthogonal to the column space of $X$.
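These projection facts are easy to verify numerically. The sketch below (an addition, with arbitrary simulated data) checks that $H$ is symmetric and idempotent, and that the residuals are orthogonal to every column of $X$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix H = X (X^T X)^{-1} X^T
y_hat = H @ y
resid = y - y_hat                        # residuals (I - H) y

print(np.allclose(H, H.T))          # True: H is symmetric
print(np.allclose(H @ H, H))        # True: H is idempotent (a projection)
print(np.allclose(X.T @ resid, 0))  # True: residuals orthogonal to col(X)
```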
Random Vector and Mean Vector
$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad
E[Y] = \mu_Y = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix}$$
where $Y_1, Y_2, \ldots, Y_n$ have joint pdf $f(y_1, y_2, \ldots, y_n)$ and $E(Y_i) = \mu_i$, $i = 1, 2, \ldots, n$.

Covariance Matrix
- $Var(Y_i) = \sigma_{ii}$, $i = 1, \ldots, n$
- $Cov(Y_i, Y_j) = \sigma_{ij}$, $i, j = 1, \ldots, n$
- $\Sigma = (\sigma_{ij})$: an $n \times n$ matrix with $(i, j)$ element $\sigma_{ij}$
Covariance Matrix
$$Cov(Y) = \Sigma = \begin{pmatrix} \sigma_{1,1} & \sigma_{1,2} & \cdots & \sigma_{1,n} \\ \sigma_{2,1} & \sigma_{2,2} & \cdots & \sigma_{2,n} \\ \vdots & & & \vdots \\ \sigma_{n,1} & \sigma_{n,2} & \cdots & \sigma_{n,n} \end{pmatrix}$$
Theorem. Suppose
- $Y$ is a random $n$-vector with $E(Y) = \mu_Y$ and $Cov(Y) = \Sigma_{YY}$
- $A$ is a fixed $m \times n$ matrix
- $c$ is a fixed $m \times 1$ vector.

Then for the random $m$-vector $Z = c + AY$:
$$E(Z) = c + AE(Y) = c + A\mu_Y$$
$$Cov(Z) = \Sigma_{ZZ} = A\Sigma_{YY}A^T$$
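A quick Monte Carlo sanity check of this theorem (an added illustration; the choices of $\mu$, $\Sigma$, $A$, and $c$ below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, n_sims = 4, 2, 200_000

# An arbitrary mean vector and a valid covariance matrix Sigma = L L^T.
mu = np.array([1.0, -1.0, 0.5, 2.0])
L = rng.normal(size=(n, n))
Sigma = L @ L.T

A = rng.normal(size=(m, n))   # fixed m x n matrix
c = np.array([3.0, -2.0])     # fixed m-vector

Y = rng.multivariate_normal(mu, Sigma, size=n_sims)  # rows are draws of Y
Z = c + Y @ A.T                                      # Z = c + A Y, per draw

print(np.allclose(Z.mean(axis=0), c + A @ mu, atol=0.05))   # E(Z) = c + A mu
print(np.allclose(np.cov(Z.T), A @ Sigma @ A.T, atol=0.1))  # Cov(Z) = A Sigma A^T
```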
Random $m$-vector: $Z = c + AY$

Example 1: $Y_i$ i.i.d. with mean $\mu$ and variance $\sigma^2$; $c = 0$ and $A = (1, 1, \ldots, 1)$, a $1 \times n$ row vector ($m = 1$), so $Z$ is the sum of the $Y_i$.

Example 2: $Y_i$ i.i.d. with mean $\mu$ and variance $\sigma^2$; $c = 0$ and $A = (1/n, 1/n, \ldots, 1/n)$ ($m = 1$), so $Z$ is the sample mean.

Example 3: $Y_i$ i.i.d. with mean $\mu$ and variance $\sigma^2$; $c = 0$ and
$$A = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 \end{pmatrix}$$
so each $Z_k$ is the partial sum $Y_1 + \cdots + Y_k$.
Quadratic Forms
- $A$: an $n \times n$ symmetric matrix
- $x$: an $n$-vector (an $n \times 1$ matrix)
$$QF(x, A) = x^T A x = \sum_{i=1}^{n} \sum_{j=1}^{n} x_i A_{ij} x_j$$
Theorem. Let $X$ be a random $n$-vector with mean $\mu$ and covariance $\Sigma$. For a fixed $n \times n$ matrix $A$,
$$E[X^T A X] = \text{trace}(A\Sigma) + \mu^T A \mu$$
(the trace of a square matrix is the sum of its diagonal terms).

Example: If $\Sigma = \sigma^2 I$, then $E\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right] = (n-1)\sigma^2$. Take
$$A = I - \frac{1}{n}\mathbf{1}\mathbf{1}^T, \quad \text{so that} \quad X^T A X = \sum_{i=1}^{n} (X_i - \bar{X})^2$$
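The quadratic-form identity, and the $(n-1)\sigma^2$ example in particular, can be checked by simulation (an added sketch; all parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n, mu, sigma, n_sims = 6, 2.0, 1.5, 500_000

# A = I - (1/n) 1 1^T, so that x^T A x = sum_i (x_i - xbar)^2
A = np.eye(n) - np.ones((n, n)) / n

X = rng.normal(loc=mu, scale=sigma, size=(n_sims, n))
qf = np.einsum("si,ij,sj->s", X, A, X)   # x^T A x for each simulated row

# Theorem: E[X^T A X] = trace(A Sigma) + mu^T A mu
#        = sigma^2 * trace(A) + 0 = (n - 1) * sigma^2
# (the mean vector mu*1 lies in the null space of A, so mu^T A mu = 0)
print(qf.mean(), (n - 1) * sigma**2)     # both close to 11.25
```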
Theorem. Let $X$ be a random $n$-vector with mean $\mu$ and covariance $\Sigma$. For a fixed $p \times n$ matrix $A$, let $Y = AX$, and for a fixed $m \times n$ matrix $B$, let $Z = BX$. Then the cross-covariance matrix of $Y$ and $Z$ is
$$\Sigma_{YZ} = A\Sigma B^T$$
Example: Suppose $X$ is a random $n$-vector with mean $\mu = \mu\mathbf{1}$ and covariance $\Sigma = \sigma^2 I$. Let
$$A = I - \frac{1}{n}\mathbf{1}\mathbf{1}^T \quad \text{and} \quad B = \frac{1}{n}\mathbf{1}^T$$
Solve for $Y$, $Z$, and $Cov(Y, Z)$. A worked solution follows.
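Working the example through (this solution is an addition to the slide; it follows directly from the theorem): $Y = AX = X - \bar{X}\mathbf{1}$ is the vector of deviations from the sample mean, and $Z = BX = \bar{X}$ is the sample mean itself. Then
$$Cov(Y, Z) = A\Sigma B^T = \sigma^2 \left(I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^T\right)\tfrac{1}{n}\mathbf{1} = \tfrac{\sigma^2}{n}\left(\mathbf{1} - \tfrac{1}{n}\mathbf{1}(\mathbf{1}^T\mathbf{1})\right) = \tfrac{\sigma^2}{n}(\mathbf{1} - \mathbf{1}) = 0_n$$
So the deviations are uncorrelated with the sample mean (and, when $X$ is multivariate normal, independent of it).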
Least Squares Estimates
$$\hat{\beta} = \begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_p \end{pmatrix} = (X^T X)^{-1} X^T Y = AY$$
Mean:
$$E(\hat{\beta}) = E(AY) = AE(Y) = AX\beta = (X^T X)^{-1} X^T X\beta = \beta$$
Covariance:
$$Cov(\hat{\beta}) = A\,Cov(Y)\,A^T = A(\sigma^2 I)A^T = \sigma^2 AA^T = \sigma^2 (X^T X)^{-1}$$
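Both moment formulas can be confirmed by simulation (an added sketch; the fixed design, true coefficients, and noise scale are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, sigma, n_sims = 40, 3, 0.5, 100_000
X = rng.normal(size=(n, p))                  # fixed design across simulations
beta = np.array([1.0, -2.0, 0.5])
XtX_inv = np.linalg.inv(X.T @ X)
A = XtX_inv @ X.T                            # beta_hat = A Y

Y = X @ beta + sigma * rng.normal(size=(n_sims, n))
beta_hats = Y @ A.T                          # each row is one draw of beta_hat

print(np.allclose(beta_hats.mean(axis=0), beta, atol=0.01))            # E = beta
print(np.allclose(np.cov(beta_hats.T), sigma**2 * XtX_inv, atol=1e-3)) # Cov
```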
Normal Linear Regression Models: Distribution Theory
$$Y_i = x_{i,1}\beta_1 + x_{i,2}\beta_2 + \cdots + x_{i,p}\beta_p + \epsilon_i = \mu_i + \epsilon_i$$
Assume $\{\epsilon_1, \epsilon_2, \ldots, \epsilon_n\}$ are i.i.d. $N(0, \sigma^2)$.
$$\implies [Y_i \mid x_{i,1}, x_{i,2}, \ldots, x_{i,p}, \beta, \sigma^2] \sim N(\mu_i, \sigma^2), \text{ independent over } i = 1, 2, \ldots, n$$
Conditioning on $X$, $\beta$, and $\sigma^2$:
$$Y = X\beta + \epsilon, \quad \text{where} \quad \epsilon = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix} \sim N_n(0_n, \sigma^2 I_n)$$
Distribution Theory
$$\mu = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_n \end{pmatrix} = E(Y \mid X, \beta, \sigma^2) = X\beta$$
$$\Sigma = Cov(Y \mid X, \beta, \sigma^2) = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{pmatrix} = \sigma^2 I_n$$
That is, $\Sigma_{i,j} = Cov(Y_i, Y_j \mid X, \beta, \sigma^2) = \sigma^2 \delta_{i,j}$.

Apply Moment-Generating Functions (MGFs) to derive:
- The joint distribution of $Y = (Y_1, Y_2, \ldots, Y_n)^T$
- The joint distribution of $\hat{\beta} = (\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_p)^T$
MGF of $Y$

For the $n$-variate r.v. $Y$ and a constant $n$-vector $t = (t_1, \ldots, t_n)^T$,
$$M_Y(t) = E(e^{t^T Y}) = E(e^{t_1 Y_1 + t_2 Y_2 + \cdots + t_n Y_n}) = E(e^{t_1 Y_1}) \cdot E(e^{t_2 Y_2}) \cdots E(e^{t_n Y_n}) \quad \text{(by independence)}$$
$$= M_{Y_1}(t_1) \cdot M_{Y_2}(t_2) \cdots M_{Y_n}(t_n) = \prod_{i=1}^{n} e^{t_i \mu_i + \frac{1}{2} t_i^2 \sigma_i^2} = e^{\sum_{i=1}^{n} t_i \mu_i + \frac{1}{2}\sum_{i,k=1}^{n} t_i \Sigma_{i,k} t_k} = e^{t^T \mu + \frac{1}{2} t^T \Sigma t}$$
$$\implies Y \sim N_n(\mu, \Sigma), \text{ multivariate normal with mean } \mu \text{ and covariance } \Sigma$$
MGF of $\hat{\beta}$

For the $p$-variate r.v. $\hat{\beta}$ and a constant $p$-vector $\tau = (\tau_1, \ldots, \tau_p)^T$,
$$M_{\hat{\beta}}(\tau) = E(e^{\tau^T \hat{\beta}}) = E(e^{\tau_1 \hat{\beta}_1 + \tau_2 \hat{\beta}_2 + \cdots + \tau_p \hat{\beta}_p})$$
Defining $A = (X^T X)^{-1} X^T$, we can express $\hat{\beta} = (X^T X)^{-1} X^T Y = AY$, so
$$M_{\hat{\beta}}(\tau) = E(e^{\tau^T AY}) = E(e^{t^T Y}), \quad \text{with } t = A^T \tau$$
$$= M_Y(t) = e^{t^T \mu + \frac{1}{2} t^T \Sigma t}$$
MGF of $\hat{\beta}$

Plug $t = A^T \tau = X(X^T X)^{-1}\tau$, $\mu = X\beta$, and $\Sigma = \sigma^2 I_n$ into
$$M_{\hat{\beta}}(\tau) = E(e^{\tau^T \hat{\beta}}) = e^{t^T \mu + \frac{1}{2} t^T \Sigma t}$$
This gives:
$$t^T \mu = \tau^T (X^T X)^{-1} X^T X\beta = \tau^T \beta$$
$$t^T \Sigma t = \tau^T (X^T X)^{-1} X^T [\sigma^2 I_n] X (X^T X)^{-1} \tau = \tau^T [\sigma^2 (X^T X)^{-1}] \tau$$
So the MGF of $\hat{\beta}$ is
$$M_{\hat{\beta}}(\tau) = e^{\tau^T \beta + \frac{1}{2}\tau^T [\sigma^2 (X^T X)^{-1}]\tau} \implies \hat{\beta} \sim N_p(\beta, \sigma^2 (X^T X)^{-1})$$
Marginal Distributions of Least Squares Estimates

Because $\hat{\beta} \sim N_p(\beta, \sigma^2 (X^T X)^{-1})$, the marginal distribution of each $\hat{\beta}_j$ is
$$\hat{\beta}_j \sim N(\beta_j, \sigma^2 C_{j,j})$$
where $C_{j,j}$ is the $j$th diagonal element of $(X^T X)^{-1}$.
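In practice $\sigma^2$ is unknown; the sketch below (an addition, with simulated data) estimates it by the standard unbiased estimator $\hat{\sigma}^2 = \frac{1}{n-p}\sum_i \hat{\epsilon}_i^2$, a step not stated on this slide, and reads off the standard error of each $\hat{\beta}_j$ from the diagonal of $\hat{\sigma}^2 (X^T X)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 100, 3
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + 0.5 * rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat

sigma2_hat = resid @ resid / (n - p)         # unbiased estimate of sigma^2
se = np.sqrt(sigma2_hat * np.diag(XtX_inv))  # se(beta_hat_j) = sqrt(sigma^2 C_jj)

for j in range(p):
    print(f"beta_{j + 1}: {beta_hat[j]:.3f} (se {se[j]:.3f})")
```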
MIT OpenCourseWare
http://ocw.mit.edu

18.443 Statistics for Applications
Spring 2015

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms