Summary of Part II: Key Concepts & Formulas
Christopher Ting
November 11, 2017
christopherting@smu.edu.sg
http://www.mysmu.edu/faculty/christophert/
Christopher Ting 1 of 16
Why Regression Analysis?
Understand the relationship between two variables $(x_i, y_i)$ of interest:
  $y = X\beta + u$  (1)
- Is the relationship captured by $\beta$ statistically significant?
- Is the statistical significance robust against heteroskedasticity and correlations?
- Once the estimate $\hat{\beta}$ is obtained and given any new information $x$, what is the point forecast of $y$? Also, what is the range of the forecast?
Matrix-Vector Framework
The linear regression model with any number of explanatory variables is most general when written as
  $y = X\beta + u.$  (2)
The column vector $y$ contains $n$ observations that are to be explained by the $n \times K$ matrix $X$ of observations, inclusive of a column vector of ones ($K := k + 1$). The parameter vector $\beta$ has $K$ parameters. The noise vector $u$ has $n$ rows.
Regression by Minimizing the Sum of Squared Noise
Find $\beta$ such that the sum of squared noise is as small as possible:
  $\hat{\beta} = \arg\min_{\beta}\, u'u = \arg\min_{\beta}\, (y - X\beta)'(y - X\beta)$  (3)
Perform vector differentiation with respect to $\beta$ to obtain the first-order condition (FOC):
  $-2\,(X'y - X'X\hat{\beta}) = 0.$
The resulting vector of parameter estimates is
  $\hat{\beta} = (X'X)^{-1}X'y.$  (4)
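As a sketch, the estimator in (4) can be computed with NumPy; the data below are simulated purely for illustration, and solving the normal equations directly is numerically preferable to forming the explicit inverse:

```python
import numpy as np

# Simulated illustration: n observations, k regressors plus an intercept.
rng = np.random.default_rng(0)
n, k = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # column of ones first
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Solve the normal equations X'X beta = X'y (more stable than inverting X'X).
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to beta_true
```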
Variance-Covariance Matrix of $\hat{\beta}$
Under the assumptions of homoskedasticity, i.e., $V(u) = \sigma_u^2 I$, and $E(u \mid X) = 0$, the variance-covariance matrix of $\hat{\beta}$ is
  $V(\hat{\beta}) = E\big((\hat{\beta} - \beta)(\hat{\beta} - \beta)'\big) = \sigma_u^2\,(X'X)^{-1}.$  (5)
The variance of $\hat{\beta}_i$ is the $i$-th diagonal element of the variance-covariance matrix $\sigma_u^2\,(X'X)^{-1}$.
Steps to Compute OLS Standard Errors
The variance of the noise, $\sigma_u^2$, is unknown; it can be estimated as follows:
1. Compute the fitted values:  $\hat{y} = X\hat{\beta}$  (6)
2. Compute the residuals, or surprises:  $\hat{u} = y - \hat{y}$  (7)
3. Compute the residual sum of squares (RSS):  $\text{RSS} = \hat{u}'\hat{u} = \sum_{i=1}^{n} \hat{u}_i^2$  (8)
4. The variance of the residuals is  $\hat{\sigma}_u^2 = \dfrac{1}{n-K}\,\hat{u}'\hat{u}$  (9)
5. Let $\Omega := (X'X)^{-1}$. The variance of $\hat{\beta}_i$ is  $V(\hat{\beta}_i) = \hat{\sigma}_u^2\,\Omega_{ii}.$  (10)
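The five steps above translate almost line by line into NumPy; again the data are simulated only to make the sketch runnable:

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_fit = X @ beta_hat                      # step 1: fitted values (6)
u_hat = y - y_fit                         # step 2: residuals (7)
RSS = u_hat @ u_hat                       # step 3: residual sum of squares (8)
sigma2_u = RSS / (n - K)                  # step 4: unbiased noise variance (9)
Omega = np.linalg.inv(X.T @ X)            # step 5: Omega = (X'X)^{-1}
se = np.sqrt(sigma2_u * np.diag(Omega))   # standard errors of beta_hat (10)
```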
Insight!
The first-order conditions can be written as $X'(y - X\hat{\beta}) = 0$, which is
  $X'\hat{u} = 0.$  (11)
The OLS residuals are orthogonal to $X$. Consequently, if $X$ has a column vector of ones, then the average of the residuals is zero. This is because that column becomes a row of ones in $X'$, and hence $\mathbf{1}'\hat{u} = \sum_{i=1}^{n} \hat{u}_i = 0$.
Ordinary Least Squares (OLS) Goodness of Fit
The explained sum of squares (ESS) is
  $\text{ESS} := \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
The total sum of squares (TSS) is
  $\text{TSS} := \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} \hat{u}_i^2$  (12)
    $= \text{ESS} + \text{RSS}.$  (13)
The cross term $\sum_{i=1}^{n} (\hat{y}_i - \bar{y})(y_i - \hat{y}_i)$ is zero because of the orthogonality $X'\hat{u} = 0$.
Proof
$\sum_{i=1}^{n} (\hat{y}_i - \bar{y})(y_i - \hat{y}_i) = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})\hat{u}_i = \sum_{i=1}^{n} \hat{y}_i \hat{u}_i$  (since $\bar{y}\sum_{i=1}^{n} \hat{u}_i = 0$)
$= \sum_{i=1}^{n} (X\hat{\beta})_i\,\hat{u}_i = \hat{u}'X\hat{\beta} = \hat{\beta}'X'\hat{u} = 0.$
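The orthogonality $X'\hat{u} = 0$ and the resulting decomposition TSS = ESS + RSS can be verified numerically; this sketch uses simulated data for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 2 + 3 * X[:, 1] + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
# Orthogonality (11): X'u_hat = 0, so the residuals also sum to zero.
assert np.allclose(X.T @ u_hat, 0)

# Hence the cross term vanishes and TSS = ESS + RSS holds exactly.
y_fit = X @ beta_hat
TSS = np.sum((y - y.mean()) ** 2)
ESS = np.sum((y_fit - y.mean()) ** 2)
RSS = np.sum(u_hat ** 2)
assert np.isclose(TSS, ESS + RSS)
```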
Properties of OLS Regression
- The estimates are unbiased, i.e., $E(\hat{\beta}) = \beta$.
- The estimated variance of the residuals is unbiased: $E(\hat{\sigma}_u^2) = \sigma_u^2$.
- Efficiency: according to the Gauss-Markov theorem, among the classical linear regression models, the OLS estimator is the linear unbiased estimator of $\beta$ with the minimum variance.
- Conditional normality:  $\hat{\beta} \mid X \sim N\big(\beta,\; \sigma_u^2 (X'X)^{-1}\big)$  (14)
Statistical Inference
For all $j = 1, 2, \ldots, K$, the $t$ test statistic for $\beta_j$ is
  $\dfrac{\hat{\beta}_j - \beta_j}{\hat{\sigma}_u \sqrt{\Omega_{jj}}} \sim t_{n-K}$  (15)
Here, $\Omega := (X'X)^{-1}$, and $\Omega_{jj}$ is the $j$-th diagonal element. The $100(1-\alpha)\%$ confidence interval for $\beta_j$ is
  $\hat{\beta}_j - q\,\hat{\sigma}_u \sqrt{\Omega_{jj}} \;\le\; \beta_j \;\le\; \hat{\beta}_j + q\,\hat{\sigma}_u \sqrt{\Omega_{jj}},$
where $q$ is the $(1 - \alpha/2)$-th quantile of the $t_{n-K}$ distribution.
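A minimal sketch of the $t$ statistic and confidence interval, on simulated data; rather than looking up the $t_{n-K}$ quantile, the value $q \approx 1.98$ (roughly the 0.975 quantile for about 98 degrees of freedom) is hard-coded for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
n, K = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
sigma2_u = (u_hat @ u_hat) / (n - K)
Omega = np.linalg.inv(X.T @ X)

j = 1                                        # test the slope coefficient
se_j = np.sqrt(sigma2_u * Omega[j, j])
t_stat = beta_hat[j] / se_j                  # t statistic for H0: beta_j = 0
q = 1.98                                     # approx. 0.975 quantile of t_{n-K}
ci = (beta_hat[j] - q * se_j, beta_hat[j] + q * se_j)
```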
Confidence Interval for Mean Response
For a given observation $x$, which is a $K \times 1$ vector, the in-sample mean response is $x'\beta$. Given the unbiased estimate $\hat{\beta}$, the variance of $x'\hat{\beta}$ is
  $V(x'\hat{\beta}) = x'\,V(\hat{\beta})\,x = \sigma_u^2\, x'(X'X)^{-1}x.$  (16)
Hence, a $100(1-\alpha)\%$ confidence interval for the in-sample mean response $x'\beta$ is
  $x'\hat{\beta} \pm q\,\hat{\sigma}_u \sqrt{x'(X'X)^{-1}x}$  (17)
Prediction Interval for a New Observation
Suppose a future observation of $x$ is obtained. Then, by the assumption $u \sim N(0, \sigma_u^2)$, we have
  $V(y - x'\hat{\beta}) = V(u) + V(x'\hat{\beta})$  (18)
Hence, a $100(1-\alpha)\%$ prediction interval for $y$ is
  $x'\hat{\beta} \pm q\,\hat{\sigma}_u \sqrt{1 + x'(X'X)^{-1}x}$  (19)
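Both intervals, (17) and (19), can be sketched together; the new observation $x_0$ below is hypothetical, and $q \approx 1.98$ again stands in for the exact $t_{n-K}$ quantile. The prediction interval is always wider because of the extra $V(u)$ term in (18):

```python
import numpy as np

rng = np.random.default_rng(5)
n, K = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([2.0, 1.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
sigma2_u = (u_hat @ u_hat) / (n - K)
Omega = np.linalg.inv(X.T @ X)

x0 = np.array([1.0, 0.3])        # hypothetical new observation (leading 1 = intercept)
m = x0 @ beta_hat                # point forecast x0' beta_hat
q = 1.98                         # approx. 0.975 quantile of t_{n-K}
h = x0 @ Omega @ x0              # x0' (X'X)^{-1} x0
ci = (m - q * np.sqrt(sigma2_u * h), m + q * np.sqrt(sigma2_u * h))        # eq. (17)
pi = (m - q * np.sqrt(sigma2_u * (1 + h)), m + q * np.sqrt(sigma2_u * (1 + h)))  # eq. (19)
```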
R² and Adjusted R²
The coefficient of determination is
  $R^2 := \dfrac{\text{ESS}}{\text{TSS}} = 1 - \dfrac{\text{RSS}}{\text{TSS}}$  (20)
Denoted by $\bar{R}^2$, the adjusted $R^2$ is based on the unbiased variances:
  $\bar{R}^2 = 1 - \dfrac{\text{RSS}/(n-K)}{\text{TSS}/(n-1)}$  (21)
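Equations (20) and (21) in code, on simulated data; since $(n-1)/(n-K) \ge 1$ when $K > 1$, the adjusted $\bar{R}^2$ never exceeds $R^2$:

```python
import numpy as np

rng = np.random.default_rng(6)
n, K = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
RSS = u_hat @ u_hat
TSS = np.sum((y - y.mean()) ** 2)
R2 = 1 - RSS / TSS                              # eq. (20)
R2_adj = 1 - (RSS / (n - K)) / (TSS / (n - 1))  # eq. (21)
```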
Simple Linear Regression: Special Case When K = 2
Slope and intercept estimators:
  $\hat{b} = \dfrac{S_{xy}}{S_{xx}} := \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}; \qquad \hat{a} = \bar{y} - \hat{b}\,\bar{x}.$  (22)
The OLS distribution is
  $\begin{pmatrix} \hat{a} \\ \hat{b} \end{pmatrix} \sim N\!\left( \begin{pmatrix} a \\ b \end{pmatrix},\; \sigma_u^2 \begin{pmatrix} \dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}} & -\dfrac{\bar{x}}{S_{xx}} \\[1ex] -\dfrac{\bar{x}}{S_{xx}} & \dfrac{1}{S_{xx}} \end{pmatrix} \right)$  (23)
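As a closing sketch, the closed-form estimators in (22) agree with the general matrix formula (4); the data are simulated for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
x = rng.normal(size=n)
y = 1.5 + 0.8 * x + 0.2 * rng.normal(size=n)

# Closed-form slope and intercept, eq. (22).
Sxx = np.sum((x - x.mean()) ** 2)
Sxy = np.sum((x - x.mean()) * (y - y.mean()))
b_hat = Sxy / Sxx
a_hat = y.mean() - b_hat * x.mean()

# Same result from the matrix formula (X'X)^{-1} X'y with K = 2.
X = np.column_stack([np.ones(n), x])
assert np.allclose(np.linalg.solve(X.T @ X, X.T @ y), [a_hat, b_hat])
```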