THE UNIVERSITY OF CHICAGO Booth School of Business Business 41912, Spring Quarter 2016, Mr. Ruey S. Tsay

Size: px

Start display at page:

Download "THE UNIVERSITY OF CHICAGO Booth School of Business Business 41912, Spring Quarter 2016, Mr. Ruey S. Tsay"

Ami Ferguson
5 years ago
Views:

1 THE UNIVERSITY OF CHICAGO Booth School of Business Business 41912, Spring Quarter 2016, Mr. Ruey S. Tsay Lecture 5: Multivariate Multiple Linear Regression The model is Y n m = Z n (r+1) β (r+1) m + ɛ n m, (1) where E(ɛ (i) ) = 0 and Cov(ɛ (i), ɛ (j) ) = σ ij I for i, j = 1,..., m with ɛ (i) being the ith column of ɛ. The design matrix Z is of full rank r + 1 and n > r + 1. Here we have n observations in Y and Z. Each observation Y i = (Y i1,..., Y im ) contains m variables. Each observation Z i = (1, Z i1,..., Z ir ) contains r measurements of the design matrix. Let ɛ i = (ɛ i1,..., ɛ im ) be the error term for the i observation and we have Cov(ɛ i ) = Σ = [σ jk ]. Note that in Eq.(1), the ith row of Y is Y i, the ith row of ɛ is ɛ i and for the ith data point, we have Y i = β Z i + ɛ i, where Z i is the ith row of Z. That is, Z i is a (r +1)-dimensional vector. The prior equation states the equation for the ith data point. We can also consider the jth component of Y i over all sample. Then, we have Y j = Zβ j + ɛ j, where Y j, β j and ɛ j are the jth column of Y, β and ɛ, respectively. The prior equation is a multiple linear regression discussed before. From Eq.(1), we have vec(y ) = (I m Z)vec(β) + vec(ɛ), (2) where Cov(vec(ɛ)) = Σ I n. Here we use the identity vec(ab) = (I A)vec(B). The generalized least squares estimate of β is obtained by minimizing the objective function S(β) = [vec(ɛ)] [Σ I n ] 1 vec(ɛ) = [vec(y Zβ)] [Σ 1 I n ]vec(y Zβ) = tr[(y Zβ)Σ 1 (Y Zβ) ]. The last equality holds because Σ is symmetric and we use tr(dbc) = [vec(c )] (B I)vec(D). Next, using vec(zβ) = (I m Z)vec(β) and Eq.(1), we have S(β) = [vec(y ) vec(β) (I m Z )][Σ 1 I n ][vec(y ) (I m Z)vec(β))] = vec(y ) (Σ 1 I n )vec(y ) 2vec(β) (Σ 1 Z )vec(y ) + vec(β) (Σ 1 Z Z)vec(β).

2 Taking partial derivative of S(β) with respect to vec(β), we have S(β) vec(β) = 2(Σ 1 Z )vec(y ) + 2(Σ 1 Z Z)vec(β). Equating to zero gives the normal equations Consequently, the GLS estimate is (Σ 1 Z Z)vec( β) = (Σ 1 Z )vec(y ). vec( β) = (Σ 1 Z Z) 1 (Σ 1 Z )vec(y ) = [Σ (Z Z) 1 ](Σ 1 Z )vec(y ) = [I m (Z Z) 1 Z ]vec(y ) (3) = vec[(z Z) 1 Z Y ]. Therefore, β = (Z Z) 1 (Z Y ). Note that this is the ordinary least squares estimate. Also, from Eq. (3), we have Cov[vec( β)] = [I m (Z Z) 1 Z ][Σ I n ][I m (Z Z) 1 Z ] = [I m (Z Z) 1 Z ][Σ I n ][I m Z(Z Z) 1 ] = Σ (Z Z) 1. In a sense, we put m multiple linear regressions with the same design matrix Z together. The errors between the ith and kth multiple linear regressions might be correated as σ ik I n. Let the ith column of β as β (i) for i = 1,..., m. We can think of the model as one that consists of m multiple lienar regressions with the same design matrix Z. As such, we can estimate β (i) as β (i) = (Z Z) 1 Z Y (i). Collecting the estimates, we obtain β = [ β (1), β (2),..., β (m) ] = (Z Z) 1 Z [Y (1), Y (2),..., Y (m) ], or, for simplicity, β = (Z Z) 1 Z Y. For any choice of parameters B = [B (1), B (2),..., B (m) ], the error matrix is Y ZB. The error sum of squares and cross-products matrix is (Y ZB) (Y ZB) = 2

3 (Y (1) ZB (1) ) (Y (1) ZB (1) ) (Y (1) ZB (1) ) (Y (m) ZB (m) ).. (Y (m) ZB (m) ) (Y (1) ZB (1) ) (Y (m) ZB (m) ) (Y (m) ZB (m) ) The least squares estimates β (i) thus minimize tr[(y ZB) (Y ZB)]. The estimates also minimize the generalized variance (Y ZB) (Y ZB). Let Ŷ = Z β be the fitted values and ɛ = Y Ŷ be the residuals. It follows that ɛ = (I H)Y, where H is the hat-matrix as defined in the multiple linear regression. The orthogonality properties of the multiple linear regression continue to hold for the multivariate multiple linear regression. Specfically, 1. Z ɛ = 0, 2. Ŷ ɛ = 0, 3. Y Y = Ŷ Ŷ + ɛ ɛ, i.e. ɛ ɛ = Y Y β Z Z β. The second property says that Ŷ (i) is perpendicular to all residual vetors ɛ (k). Result 7.9. For the LSE β with Z being of full rank r + 1, 1. E( β (i) ) = β (i), i.e. E( β) = β. 2. Cov( β (i), β (k) ) = σ ik (Z Z) 1, i, j = 1,..., m. 3. E( ɛ) = 0 and E( ɛ (i), ɛ (k) ) = (n r 1)σ ik, so E( ɛ ɛ) = (n r 1)Σ. 4. ɛ and β are uncorrelated. Proof: Use the results of multiple linear regression discussed before. In addition, β (i) β (i) = (Z Z) 1 Z Y (i) β (i) = (Z Z) 1 Z ɛ (i),. and ɛ (i) = Y (i) Ŷ (i) = [I H]ɛ (i). Result Assume that Z is of full rank r + 1 and n (r + 1) + m. If the errors ɛ have a multivariate normal distribution, then the LSE β is also the maximum likelihood estimator of β, and β has a normal distribution with mean β and Cov( β (i), β (k) ) = σ ik (Z Z) 1. In addition, β is independent of the MLE of the positive definite matrix Σ given by Σ = 1 n ɛ ɛ = 1 n (Y Z β) (Y Z β), and n Σ is distributed as W p,n r 1 (Σ). The maximized likelihood L( µ, Σ) = (2π) mn/2 σ n/2 e mn/2. 3

4 1 Inference Partitioning Z = [Z 1, Z 2 ], where Z 1 and Z 2 are n (q + 1) and n (r q) matrix, respectively. The model becomes Y = Z 1 β 1 + Z 2 β 2 + ɛ. Test H o : β 2 = 0 vs H a : β 2 0. As before, consider the submodel Y = Z 1 β 1 + ɛ. The LSE of β 1 is β 1 = (Z 1Z 1 ) 1 Z 1Y and, under normality, the estimate of the residual covariance matrix is Σ 1 = n 1 (Y Z 1 β 1 ) (Y Z 1 β 1 ). The likelihood ratio is Λ = max β 1,Σ L(β 1, Σ) max β,σ L(β, Σ) = L( β 1, Σ 1 ) L( β, Σ) ( Σ ) n/2 = Σ. 1 Result Assume that Z is of full rank r + 1 and n > m + r + 1. Also, ɛ are normally distributed. Under H o : β 2 = 0, n Σ is distributed as W p,n r 1 (Σ) independently of n( Σ 1 Σ) which, in turn, is distributed as W p,r q (Σ). The likelihood ratio test of H o is equivalent to rejecting H o for large values of ( Σ ) 2 ln(λ) = n ln Σ. 1 Furthermore, for large n, the modified statistic [n r 1 1 ] ( Σ ) 2 (m r + q + 1) ln Σ 1 has, to a close approximation, a chi-square distribution with m(r q) degrees of freedom. Remarks: Some alternative test statistics have been proposed in the literature for testing H o : β 2 = 0. Let E = n Σ, G = n( Σ 1 Σ) and η 1 η 2... η s, where s = min(p, r q), be the non-zero eigenvalues of GE 1. Then, Wilk s lambda = s i=1 1 1+η i = E E+G, Pillai s trace = s i=1 η i 1+η i = tr[g(g + E) 1 ], Hotelling-Lawley trace = s i=1 η i = tr[ge 1 ], Roy s greatest root = η 1 1+η 1. 4

5 Prediction: To predict the mean response at a fixed values Z 0 of the predictor variables, we have Z 0 β N m (Z 0β, Z 0(Z Z) 1 Z 0 Σ), and n Σ W n r 1 (Σ), where the above two statistics are independent. The 100(1 α)% confidence ellipsoid for Z 0β is Z 0(β β) ( n n r 1 Z 0(Z Z) 1 Z 0 [( m(n r 1) n r m ) 1 Σ (β β) Z 0 ) F m,n r m (α) where F m,n r m (α) is the upper 100αth percentile of an F-distribution with m and n r m degrees of freedom. The 100(1 α)% simultaneous confidence intervals for E(Y i ) = Z 0 β (i) are Z β ( ) m(n r 1) 0 (i) ± F m,n r m (α) Z 0(Z ( ) n Z) n r m 1 Z 0 n r 1 ˆσ ii, where β (i) is the ith column of β and ˆσ ii is the (i, i)th element of Σ. Forecasting: To forecast the response Y 0 = Z 0β + ɛ 0 at Z 0, we have Y 0 Z 0 β = Z 0(β β) + ɛ 0 N m [0, (1 + Z 0(Z Z) 1 Z 0 )Σ], which is independent of n Σ. Thus, the 100(1 α)% prediction ellipsoid for Y 0 is (Y 0 Z β) ( 0 n 1 Σ) (Y 0 Z β) 0 n r 1 [( )] m(n r 1) (1 + Z 0(Z Z) 1 Z 0 ) n r m F m,n r m(α). The 100(1 α)% simultaneous prediction intervals for the individual responses Y 0i are Z β ( ) m(n r 1) ( ) 0 (i) ± n r m F m,n r m(α) (1 + Z 0(Z n Z) 1 Z 0 ) n r 1 ˆσ ii, for i = 1,..., m. ], 5

6 2 Regression models with time-series errors See R demonstration, using the Natural Gas Data on Table 7.4 of the textbook. In particular, a multiplicative seasonal model seems to fit the data better than the one used in the text. Example 7.6 of the textbook: natural gas for heating. Daily data and the variables are 1. Y : (Dependent variable) sendouts of natural gas 2. X 1 : Degree of heating days (DHD) which is defined as 65 o daily average temperature. 3. X 2 : Lagged DHD, i.e. DHD Lag. 4. X 3 : Windspeed (24-hour average) 5. X 4 : Weekend indicator. There are 63 observations shown in Table

Multivariate Linear Regression Models

Multivariate Linear Regression Models Regression analysis is used to predict the value of one or more responses from a set of predictors. It can also be used to estimate the linear association between