Zellner's Seemingly Unrelated Regressions Model

James L. Powell
Department of Economics
University of California, Berkeley

Overview

The seemingly unrelated regressions (SUR) model, proposed by Zellner, can be viewed as a special case of the generalized regression model $E(y) = X\beta$, $V(y) = \sigma^2\Omega$; however, it does not share all of the features or problems of other leading special cases (e.g., models of heteroskedasticity or serial correlation). While, like those models, the matrix $\Omega$ generally involves unknown parameters which must be estimated, the usual estimators for the covariance matrix of the least squares estimator $\hat\beta_{LS}$ are valid, so that the usual inference procedures based on normal theory are valid if the dependent variable $y$ is multinormal or if the sample size $N$ is large and suitable limit theorems are applicable. Also, unlike those other models, there is little reason to test the null hypothesis $H_0\colon \Omega = I$; the form of $\Omega$ is straightforward and its parameters are easy to estimate consistently, so a feasible version of Aitken's GLS estimator is an attractive alternative to the asymptotically-inefficient LS estimator.

The basic SUR model assumes that, for each individual observation $i$, there are $M$ dependent variables $y_{i1}, \ldots, y_{ij}, \ldots, y_{iM}$ available, each with its own linear regression model:
\[
y_{ij} = x_{ij}'\beta_j + \varepsilon_{ij}, \qquad i = 1, \ldots, N,
\]
or, with the usual stacking of observations over $i$,
\[
y_j = X_j\beta_j + \varepsilon_j
\]
for $j = 1, \ldots, M$, where $y_j$ and $\varepsilon_j$ are $N$-vectors and $X_j$ is an $N \times K_j$ matrix, where $K_j = \dim(\beta_j)$ is the number of regressors for the $j$th regression. The standard conditions for the classical regression model are assumed to hold for each $j$: namely,
\[
E(y_j) = X_j\beta_j, \qquad V(y_j) = \sigma_{jj} I_N,
\]
with $X_j$ nonstochastic and $\operatorname{rank}(X_j) = K_j$. Under these conditions, and the additional condition of multinormality of $y_j$, the usual inference theory is valid for the classical LS estimator of $\beta_j$, applied separately to each equation.

However, the SUR model permits nonzero covariance between the error terms $\varepsilon_{ij}$ and $\varepsilon_{ik}$ for a given individual $i$ across equations $j$ and $k$, i.e.,
\[
\operatorname{Cov}(\varepsilon_{ij}, \varepsilon_{ik}) = \sigma_{jk},
\]
while assuming $\operatorname{Cov}(\varepsilon_{ij}, \varepsilon_{i'k}) = 0$ if $i \neq i'$. This can be expressed more compactly in matrix form:
\[
C(\varepsilon_j, \varepsilon_k) = \sigma_{jk} I_N.
\]
It is the potential nonzero covariance across equations $j$ and $k$ that allows for an improvement in efficiency of GLS relative to the classical LS estimator of each $\beta_j$.

Kronecker Product Notation

Zellner's insight was that, like the usual stacking of the individual dependent variables $y_{ij}$ into an $N$-vector $y_j$, those latter vectors can themselves be stacked into an $MN$-dimensional vector $y$, with a corresponding arrangement for the error terms, coefficient vectors, and regressors:
\[
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{pmatrix} \ (MN \times 1), \qquad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_M \end{pmatrix} \ (MN \times 1), \qquad
\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_M \end{pmatrix} \ (K \times 1),
\]
and
\[
X = \begin{pmatrix}
X_1 & 0 & \cdots & 0 \\
0 & X_2 & & \vdots \\
\vdots & & \ddots & 0 \\
0 & \cdots & 0 & X_M
\end{pmatrix} \ (MN \times K),
\]
with $K = \sum_{j=1}^{M} K_j$. With this notation, and the individual assumptions for each equation $j$, it follows that
\[
E(y) = X\beta
\]
and
\[
V(y) = \begin{pmatrix}
\sigma_{11} I_N & \sigma_{12} I_N & \cdots & \sigma_{1M} I_N \\
\sigma_{21} I_N & \sigma_{22} I_N & & \vdots \\
\vdots & & \ddots & \\
\sigma_{M1} I_N & \cdots & & \sigma_{MM} I_N
\end{pmatrix} \ (MN \times MN).
\]
This nonscalar covariance matrix is a particular mixture of the matrix
\[
\Sigma = \begin{pmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1M} \\
\sigma_{21} & \sigma_{22} & & \vdots \\
\vdots & & \ddots & \\
\sigma_{M1} & \cdots & & \sigma_{MM}
\end{pmatrix} \ (M \times M)
\]
and the $(N \times N)$ identity matrix $I_N$. A notation system for such combinations was proposed by the otherwise-despicable mathematician Kronecker, the so-called Kronecker product notation; for two matrices $A = [a_{ij}]$ ($i = 1, \ldots, L$, $j = 1, \ldots, M$) and $B$, the Kronecker product of $A$ and $B$ is defined as
\[
A \otimes B \equiv \begin{pmatrix}
a_{11} B & a_{12} B & \cdots & a_{1M} B \\
a_{21} B & a_{22} B & & \vdots \\
\vdots & & \ddots & \\
a_{L1} B & \cdots & & a_{LM} B
\end{pmatrix}.
\]
With this notation, clearly
\[
V(y) = \Sigma \otimes I_N
\]
for the stacked SUR model. Kronecker products satisfy a distributive rule, which will come in handy later:
\[
(A \otimes B)(C \otimes D) = AC \otimes BD,
\]
assuming all matrix products are well defined. From this rule follows another for inverses of Kronecker products:
\[
(A \otimes B)^{-1} = A^{-1} \otimes B^{-1},
\]
assuming both $A$ and $B$ are invertible.
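These rules are easy to check numerically. The short sketch below (Python with NumPy, which is not part of the original notes) verifies the mixed-product and inverse rules for small, arbitrarily chosen invertible matrices, and confirms the block structure of $V(y) = \Sigma \otimes I_N$ for a hypothetical $2 \times 2$ matrix $\Sigma$.

```python
import numpy as np

# Small invertible matrices, chosen only for illustration
A = np.array([[2.0, 1.0], [0.0, 3.0]])
C = np.array([[1.0, -1.0], [2.0, 0.5]])
B = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]])
D = np.array([[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]])

# Mixed-product rule: (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD)
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))

# Inverse rule: (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}
assert np.allclose(np.linalg.inv(np.kron(A, B)),
                   np.kron(np.linalg.inv(A), np.linalg.inv(B)))

# The SUR covariance V(y) = Sigma ⊗ I_N has (j, k) block sigma_jk * I_N;
# here M = 2 and N = 4, so V is 8 x 8 and its off-diagonal block is
# sigma_12 times the identity.
Sigma = np.array([[1.0, 0.7], [0.7, 2.0]])
V = np.kron(Sigma, np.eye(4))
assert np.allclose(V[:4, 4:], Sigma[0, 1] * np.eye(4))
```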
Least Squares and Generalized Least Squares

With the foregoing notation, the classical least squares estimator for the vector $\beta$ can be expressed as
\[
\hat\beta_{LS} = (X'X)^{-1}X'y = \begin{pmatrix}
(X_1'X_1)^{-1} X_1'y_1 \\
(X_2'X_2)^{-1} X_2'y_2 \\
\vdots \\
(X_M'X_M)^{-1} X_M'y_M
\end{pmatrix}.
\]
In contrast, the GLS estimator of $\beta$ (assuming $\Sigma$ is known) is
\[
\hat\beta_{GLS} = \left( X'(\Sigma \otimes I_N)^{-1} X \right)^{-1} X'(\Sigma \otimes I_N)^{-1} y
= \left( X'(\Sigma^{-1} \otimes I_N) X \right)^{-1} X'(\Sigma^{-1} \otimes I_N) y
\]
\[
= \begin{pmatrix}
\sigma^{11}(X_1'X_1) & \sigma^{12}(X_1'X_2) & \cdots & \sigma^{1M}(X_1'X_M) \\
\sigma^{21}(X_2'X_1) & \sigma^{22}(X_2'X_2) & & \vdots \\
\vdots & & \ddots & \\
\sigma^{M1}(X_M'X_1) & \cdots & & \sigma^{MM}(X_M'X_M)
\end{pmatrix}^{-1}
\begin{pmatrix}
\sum_j \sigma^{1j} X_1'y_j \\
\sum_j \sigma^{2j} X_2'y_j \\
\vdots \\
\sum_j \sigma^{Mj} X_M'y_j
\end{pmatrix},
\]
where $\sigma^{ij}$ is defined to be the element in the $i$th row and $j$th column of $\Sigma^{-1}$, i.e., $\Sigma^{-1} \equiv [\sigma^{ij}]$.

To get a better idea of what is going on with the GLS estimator, consider the special case $M = 2$, with
\[
\hat\beta_{LS} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \quad \text{and} \quad
\hat\beta_{GLS} = \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix};
\]
then it can be shown that the GLS estimators $\hat\beta_1$ and $\hat\beta_2$ satisfy the two equations
\[
\hat\beta_1 = b_1 - \frac{\sigma_{12}}{\sigma_{22}} (X_1'X_1)^{-1} X_1'\left( y_2 - X_2\hat\beta_2 \right),
\]
\[
\hat\beta_2 = b_2 - \frac{\sigma_{12}}{\sigma_{11}} (X_2'X_2)^{-1} X_2'\left( y_1 - X_1\hat\beta_1 \right).
\]
Thus the GLS estimators can be viewed as adjusted versions of classical LS, where the adjustment involves the regression of the GLS residuals from the other equation on the regressors from each equation. As noted by Telser (JASA, 1964), the GLS estimator for this model can be calculated sequentially by including appropriately-reweighted residuals from all other equations as additional regressors for each equation.

Another important special case is when the matrix $\Sigma$ is diagonal, i.e., $\sigma_{ij} = 0$ if $i \neq j$. In this case,
since $\Sigma^{-1} = \operatorname{diag}[1/\sigma_{ii}]$, it follows that
\[
\hat\beta_{GLS} = \begin{pmatrix}
\sigma^{11}(X_1'X_1) & & 0 \\
& \ddots & \\
0 & & \sigma^{MM}(X_M'X_M)
\end{pmatrix}^{-1}
\begin{pmatrix}
\sigma^{11} X_1'y_1 \\ \vdots \\ \sigma^{MM} X_M'y_M
\end{pmatrix}
= \begin{pmatrix}
(X_1'X_1)^{-1} X_1'y_1 \\ \vdots \\ (X_M'X_M)^{-1} X_M'y_M
\end{pmatrix}
= \hat\beta_{LS}.
\]
Not surprisingly, then, if there is no covariance across equations in the error terms, there is no prospect for an efficiency improvement in the GLS estimator relative to LS, applied equation by equation.

Still another important special case is when the matrix of regressors is identical for each equation, i.e., $X_j = X_0$ for some $N \times K_0$ matrix $X_0$, with $K_0 = K/M$. Here the stacked matrix $X$ takes the form
\[
X = (I_M \otimes X_0) \ (MN \times K),
\]
and the GLS estimator also reduces to classical LS:
\[
\hat\beta_{GLS} = \left( (I_M \otimes X_0)'(\Sigma^{-1} \otimes I_N)(I_M \otimes X_0) \right)^{-1} (I_M \otimes X_0)'(\Sigma^{-1} \otimes I_N) y
\]
\[
= \left( \Sigma^{-1} \otimes X_0'X_0 \right)^{-1} \left( \Sigma^{-1} \otimes X_0' \right) y
= \left( \Sigma \otimes (X_0'X_0)^{-1} \right) \left( \Sigma^{-1} \otimes X_0' \right) y
\]
\[
= \left( I_M \otimes (X_0'X_0)^{-1} X_0' \right) y
= \begin{pmatrix}
(X_0'X_0)^{-1} X_0'y_1 \\ (X_0'X_0)^{-1} X_0'y_2 \\ \vdots \\ (X_0'X_0)^{-1} X_0'y_M
\end{pmatrix}
= \hat\beta_{LS}.
\]
Some intuition for this reduction can be obtained by considering Telser's result that GLS can be obtained iteratively, by starting from classical LS estimators and including the residuals from other equations as
regressors for each equation. Since the LS residuals are, by construction, orthogonal to the common matrix $X_0$ of regressors, their inclusion in each equation will not affect the LS estimates of the $\beta_j$ coefficients. An extension of this argument implies that, if each matrix of regressors $X_j$ has a submatrix in common (for example, if all equations have an intercept term), then the GLS coefficients corresponding to those common regressors will be identical to their LS counterparts.

Feasible GLS

An obvious estimator of the unknown covariance matrix $\Sigma$ in $V(y) = \Sigma \otimes I_N$, with $\Sigma = [\sigma_{jk}]$, would be $\hat\Sigma = [\hat\sigma_{jk}]$, with
\[
\hat\sigma_{jk} \equiv \frac{1}{N} \left( y_j - X_j\hat\beta_j \right)'\left( y_k - X_k\hat\beta_k \right),
\]
where $\hat\beta_j$ is the classical LS estimator for equation $j$; while these estimators are not unbiased for $\sigma_{jk}$, they are consistent under the usual conditions, and obtaining unbiased estimators for $\sigma_{jk}$ when $j \neq k$ involves more than a simple degrees-of-freedom adjustment. Again imposing reasonable regularity conditions, it can be shown that the feasible GLS estimator
\[
\hat\beta_{FGLS} = \left( X'(\hat\Sigma^{-1} \otimes I_N) X \right)^{-1} X'(\hat\Sigma^{-1} \otimes I_N) y
\]
is asymptotically equivalent to the infeasible GLS estimator which assumes $\Sigma$ is known:
\[
\sqrt{N}\left( \hat\beta_{FGLS} - \hat\beta_{GLS} \right) \stackrel{p}{\to} 0.
\]
Thus
\[
\sqrt{N}\left( \hat\beta_{FGLS} - \beta \right) \stackrel{d}{\to} N(0, V),
\]
where
\[
V = \left( \operatorname{plim} \frac{1}{N} X'\left(\Sigma^{-1} \otimes I_N\right) X \right)^{-1}
= \operatorname{plim} N \left( X'\left(\hat\Sigma^{-1} \otimes I_N\right) X \right)^{-1}
\equiv \operatorname{plim} N\hat V
\]
for $\hat V \equiv \left( X'(\hat\Sigma^{-1} \otimes I_N) X \right)^{-1}$, so inference on the parameter vector $\beta$ can be carried out using the approximate normality of $\hat\beta_{FGLS}$:
\[
\hat\beta_{FGLS} \stackrel{A}{\sim} N\left( \beta, \hat V \right).
\]
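As a numerical companion to the notes, the sketch below (Python with NumPy; the simulated data, dimensions, and variable names are all hypothetical, not from the notes) carries out the two-step feasible GLS procedure just described for an $M = 2$ system, and then checks that the FGLS estimates satisfy the two adjustment equations from the earlier $M = 2$ example, with $\hat\Sigma$ in place of $\Sigma$.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 400

# Simulated two-equation SUR system; regressors, coefficients, and the
# cross-equation error covariance are illustrative choices.
X1 = np.column_stack([np.ones(N), rng.normal(size=N)])
X2 = np.column_stack([np.ones(N), rng.normal(size=N)])
Sigma = np.array([[1.0, 0.7],
                  [0.7, 1.5]])
eps = rng.multivariate_normal([0.0, 0.0], Sigma, size=N)
y1 = X1 @ np.array([1.0, 2.0]) + eps[:, 0]
y2 = X2 @ np.array([0.5, -1.0]) + eps[:, 1]

# Step 1: classical LS, equation by equation, and the residual
# cross-products hat(sigma)_jk = (1/N) e_j' e_k
b1 = np.linalg.solve(X1.T @ X1, X1.T @ y1)
b2 = np.linalg.solve(X2.T @ X2, X2.T @ y2)
E = np.column_stack([y1 - X1 @ b1, y2 - X2 @ b2])
Sigma_hat = E.T @ E / N

# Step 2: feasible GLS on the stacked system, using the Kronecker
# structure (Sigma_hat^{-1} ⊗ I_N) of the weight matrix
y = np.concatenate([y1, y2])
X = np.block([[X1, np.zeros_like(X2)],
              [np.zeros_like(X1), X2]])
Oinv = np.kron(np.linalg.inv(Sigma_hat), np.eye(N))
beta_fgls = np.linalg.solve(X.T @ Oinv @ X, X.T @ Oinv @ y)
bh1, bh2 = beta_fgls[:2], beta_fgls[2:]

# The FGLS estimates satisfy the two adjustment equations from the
# M = 2 example, with Sigma replaced by Sigma_hat:
adj1 = (Sigma_hat[0, 1] / Sigma_hat[1, 1]) * \
    np.linalg.solve(X1.T @ X1, X1.T @ (y2 - X2 @ bh2))
adj2 = (Sigma_hat[0, 1] / Sigma_hat[0, 0]) * \
    np.linalg.solve(X2.T @ X2, X2.T @ (y1 - X1 @ bh1))
print(np.allclose(bh1, b1 - adj1), np.allclose(bh2, b2 - adj2))  # True True
```

For large $MN$, forming $\hat\Sigma^{-1} \otimes I_N$ explicitly is wasteful; the blockwise formula in terms of the elements $\sigma^{jk}$ of $\hat\Sigma^{-1}$ gives the same estimator without ever building an $MN \times MN$ matrix.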