
Doubly Constrained Factor Models with Applications

Henghsiu Tsai(1), Institute of Statistical Science, Academia Sinica, Taiwan, R.O.C.
Ruey S. Tsay, Booth School of Business, University of Chicago, Illinois, U.S.A.
Edward M. H. Lin, Institute of Statistical Science, Academia Sinica, Taiwan, R.O.C.
Ching-Wei Cheng, Department of Statistics, Purdue University, Indiana, U.S.A.

Summary

This paper focuses on factor analysis of high-dimensional data. We propose statistical methods that enable an analyst to make use of prior knowledge or substantive information to sharpen the estimation of common factors. Specifically, we consider a doubly constrained factor model that enables analysts to specify both row and column constraints of the data matrix to improve the estimation of common factors. The row constraints may represent classifications of individual subjects whereas the column constraints may show the categories of variables. We derive both the maximum likelihood and least squares estimates of the proposed doubly constrained factor model and use simulation to study the performance of the analysis in finite samples. The Akaike information criterion is used for model selection. Monthly U.S. housing starts of nine geographical divisions are used to demonstrate the application of the proposed model.

Keywords: Akaike information criterion, Constrained factor model, Eigenvalues, Factor model, Housing starts, Principal component analysis, Seasonality.

(1) Corresponding author: Henghsiu Tsai, Institute of Statistical Science, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 11529, Taiwan, R.O.C. E-mail: htsai@stat.sinica.edu.tw.

1 Introduction

Big data have become common in statistical applications. In many situations, it is natural to entertain the data as a 2-dimensional array with rows representing subjects and columns denoting variables. See, for instance, the large panel data in the econometric literature and the multivariate time series analysis in statistics. Consider specifically the United States (U.S.) housing markets. The U.S. Census Bureau publishes monthly housing starts of the nine geographical divisions shown in Figure 1. We employ 10 years of data from January 1997 to December 2006. Here the data matrix $Z$ is a 120-by-9 matrix with each column representing a division and each row denoting a particular calendar month. Figure 2 shows the time plots of the logarithms of monthly housing starts of the nine divisions. From the plots, it is clear that U.S. housing starts have strong seasonality. Furthermore, the housing starts also exhibit some common characteristics. It is then natural to consider both the seasonality and the geographical divisions in searching for the common factors driving the U.S. housing markets. In this particular example, the seasonality leads naturally to row constraints whereas the geographical considerations give rise to column constraints. The goal of this paper is to consider such constraints when we search for common factors in a big data set.

Factor models are widely used in econometric and statistical applications, and constrained factor models have also been studied in the literature. See, for instance, Lam and Yao (2012) and the references therein. Tsai and Tsay (2010) proposed constrained and partially constrained factor models for high-dimensional time series analysis. They show that column constraints can be used effectively to obtain parsimonious factor models for high-dimensional series. Only column constraints are considered in that paper, however. On the other hand, as illustrated by the U.S. housing starts data, both row and column constraints might be informative in some applications. Therefore, we investigate doubly constrained factor models in this paper. The theoretical framework of the proposed model is the constrained principal component analysis of Takane and Hunter (2001), and our study focuses on estimation of the proposed model. Principal component analysis was proposed originally for independent data, but it has been widely used in time series analysis. See, for instance, Peña and Box (1987) and Tiao, Tsay, and Wang (1993).

Consider a $T$ by $N$ data matrix $Z$, rows and columns of which often represent subjects and
variables, respectively. Let $G$ be a $T$ by $m$ matrix of row constraints of rank $m$, and $H$ be an $N$ by $s$ matrix of column constraints of rank $s$. Both $G$ and $H$ are known a priori based on some prior knowledge or substantive information of the problem at hand. For instance, Tsai and Tsay (2010) use $H$ to obtain the level, slope, and curvature of interest rates. They also use $H$ to denote the industrial classification of stock returns. Let $\omega_1$ ($s$ by $r$), $\omega_2$ ($N$ by $p$) and $\omega_3$ ($s$ by $q$) be the loading matrices of full rank, and $E$ ($T$ by $N$) a matrix of residuals, where $p < N$, $\max\{r, q\} \le s < N$, and $q \le \min\{r, p\}$. The postulated doubly constrained factor (DCF) model for $Z = [Z_{i,j}] = [Z_1, \ldots, Z_T]'$ is

$$Z = F_1 \omega_1' H' + G F_2 \omega_2' + G F_3 \omega_3' H' + E, \qquad (1)$$

where $A'$ denotes the transpose of a matrix $A$, $F_1 = [F_1^{(1)}, \ldots, F_1^{(T)}]'$ ($T$ by $r$), $F_2 = [F_2^{(1)}, \ldots, F_2^{(m)}]'$ ($m$ by $p$), $F_3 = [F_3^{(1)}, \ldots, F_3^{(m)}]'$ ($m$ by $q$), and $E = [e_1, \ldots, e_T]'$ ($T$ by $N$) with $E(e_i) = 0$ and $\mathrm{var}(e_i) = \Psi$. We refer to the model in Equation (1) as a DCF model of order $(r, p, q)$, with $r$, $p$ and $q$ denoting the numbers of common factors in $F_1$, $F_2$ and $F_3$, respectively. For statistical factor models, one further assumes that $\Psi$ is a diagonal matrix. In the econometric and finance literature, $\Psi$ is not necessarily diagonal and the model becomes an approximate factor model.

For the DCF model in Equation (1), the $F_i$ are common factors. The first term of model (1) pertains to what in $Z$ can be explained by $H$ but not by $G$, the second term to what can be explained by $G$ but not by $H$, the third term to what can be explained by both $G$ and $H$, and the last term to what can be explained by neither $G$ nor $H$. Often the third term of model (1) denotes the interaction between the constraints $G$ and $H$. Thus, $F_1$, $F_2$ and $F_3$ can be interpreted as column, row, and interaction factors, respectively. Similar to conventional factor models, the scales and orderings of the latent common factors $F_i$ are not identifiable.

The paper is organized as follows. In Section 2 we consider estimation of the proposed DCF model, including model selection and estimation of the common factors. We use simulation in Section 3 to investigate the efficacy of the estimation methods. Section 4 applies the proposed analysis to the monthly U.S. housing starts, and Section 5 concludes.
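To make the structure of Equation (1) concrete, the following minimal NumPy sketch simulates one data matrix $Z$ from a small DCF model. The dimensions, the constraint matrices, and the randomly drawn loadings below are chosen purely for illustration; they are not the settings used in the paper's simulation study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not the paper's simulation design).
T, N, m, s = 120, 9, 12, 3      # observations, variables, row groups, column groups
r, p, q = 2, 2, 1               # numbers of factors in F1, F2, F3

# Row constraints: monthly indicators stacked over 10 years, G = 1_{T/m} (x) I_m.
G = np.kron(np.ones((T // m, 1)), np.eye(m))            # T x m
# Column constraints: indicator matrix assigning the N variables to s groups.
H = np.zeros((N, s))
H[np.arange(N), np.repeat(np.arange(s), N // s)] = 1.0   # N x s

# Latent factors and loadings (arbitrary values for the illustration).
F1 = rng.standard_normal((T, r))
F2 = rng.standard_normal((m, p))
F3 = rng.standard_normal((m, q))
w1 = rng.standard_normal((s, r))    # omega_1: s x r
w2 = rng.standard_normal((N, p))    # omega_2: N x p
w3 = rng.standard_normal((s, q))    # omega_3: s x q
Psi = np.diag(rng.uniform(0.5, 1.5, size=N))
E = rng.multivariate_normal(np.zeros(N), Psi, size=T)

# Equation (1): Z = F1 w1' H' + G F2 w2' + G F3 w3' H' + E.
Z = F1 @ w1.T @ H.T + G @ F2 @ w2.T + G @ F3 @ w3.T @ H.T + E
print(Z.shape)   # (120, 9)
```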

2 Estimation

The proposed doubly constrained factor model in Equation (1) can be estimated by either the least squares (LS) method or the maximum likelihood method. The LS estimates are less efficient, but easier to obtain. We begin with the LS method.

2.1 Least Squares Estimation

Least squares estimates (LSE) of the doubly constrained factor model in (1) can be obtained in several steps under certain assumptions. Consider first the simple factor model, $Z = F\omega' + E$, with the assumption that the factor matrix $F$ satisfies $F'F = T I_r$, where $I_r$ denotes the $r$ by $r$ identity matrix with $r$ being the rank of $\omega$. In this case, LSEs of $F$ and $\omega$ can be obtained by minimizing the objective function

$$\ell(F, \omega) = \mathrm{tr}\{(Z - F\omega')(Z - F\omega')'\} = \mathrm{tr}(ZZ') + \mathrm{tr}(F\omega'\omega F') - 2\,\mathrm{tr}(Z\omega F').$$

Therefore, it can easily be seen that

$$\hat F = [g_1, \ldots, g_r], \qquad (2)$$

where $g_i$ is an eigenvector corresponding to the $i$th largest eigenvalue of the matrix $ZZ'$, and the estimator for $\omega$ is

$$\hat\omega = \frac{1}{T} Z' \hat F. \qquad (3)$$

See, for instance, Proposition A.5 of Lütkepohl (2005).

Consider the doubly constrained factor model in (1) with the following assumptions: $F_1'F_1 = T I_r$, $F_2'F_2 = m I_p$, $F_3'F_3 = m I_q$, $G'F_1\omega_1' = O$, and $F_2\omega_2'H = O$. The proposed LS estimation procedure is as follows:

(i) Pre- and post-multiplying (1) by $(G'G)^{-1}G'$ and $H(H'H)^{-1}$, we have
$$(G'G)^{-1}G'ZH(H'H)^{-1} = F_3\omega_3' + (G'G)^{-1}G'EH(H'H)^{-1},$$
which becomes an unconstrained factor model, and therefore we can solve for $F_3$ and $\omega_3$ using (2) and (3) by substituting $(G'G)^{-1}G'ZH(H'H)^{-1}$ for $Z$.

(ii) Let $Z^* = Z - G\hat F_3\hat\omega_3'H'$ and the model becomes
$$Z^* = F_1\omega_1'H' + GF_2\omega_2' + E. \qquad (4)$$

(iii) Post-multiplying (4) by $H(H'H)^{-1}$, we have $Z^*H(H'H)^{-1} = F_1\omega_1' + EH(H'H)^{-1}$. Solve for $F_1$ and $\omega_1$ using (2) and (3) by substituting $Z^*H(H'H)^{-1}$ for $Z$.

(iv) Let $Z^{**} = Z^* - \hat F_1\hat\omega_1'H'$ and the model becomes
$$Z^{**} = GF_2\omega_2' + E. \qquad (5)$$

(v) Pre-multiplying (5) by $(G'G)^{-1}G'$, we have $(G'G)^{-1}G'Z^{**} = F_2\omega_2' + (G'G)^{-1}G'E$. Solve for $F_2$ and $\omega_2$ using (2) and (3) by substituting $(G'G)^{-1}G'Z^{**}$ for $Z$.

(vi) Finally, $\hat E = Z - \hat F_1\hat\omega_1'H' - G\hat F_2\hat\omega_2' - G\hat F_3\hat\omega_3'H'$ and $\hat\Psi = \hat E'\hat E/T$. It is understood that $\hat\Psi = \mathrm{diag}(\hat E'\hat E/T)$ if $\Psi$ is diagonal.
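The solution (2)-(3) and the six-step procedure above translate directly into matrix operations. The sketch below is our own NumPy illustration of Subsection 2.1 (the helper names factor_lse and dcf_lse are not from the paper); it assumes a data matrix Z and constraint matrices G and H such as those built in the earlier simulation sketch. Eigenvectors are rescaled so that the fitted factor matrix satisfies the normalization assumed above.

```python
import numpy as np

def factor_lse(Z, k):
    """LS estimates (2)-(3) for the simple factor model Z = F w' + E.
    Fhat is scaled so that Fhat' Fhat = (row count) * I_k, matching F'F = T I_r."""
    n = Z.shape[0]
    _, vecs = np.linalg.eigh(Z @ Z.T)            # eigenvalues in ascending order
    Fhat = np.sqrt(n) * vecs[:, ::-1][:, :k]     # k leading eigenvectors, rescaled
    what = Z.T @ Fhat / n                        # Equation (3)
    return Fhat, what

def dcf_lse(Z, G, H, r, p, q, diagonal_psi=True):
    """Steps (i)-(vi) of the LS procedure for the DCF model (1); a sketch only."""
    Gp = np.linalg.solve(G.T @ G, G.T)           # (G'G)^{-1} G'
    Hp = H @ np.linalg.inv(H.T @ H)              # H (H'H)^{-1}
    F3, w3 = factor_lse(Gp @ Z @ Hp, q)          # (i)  interaction factors
    Zs = Z - G @ F3 @ w3.T @ H.T                 # (ii)
    F1, w1 = factor_lse(Zs @ Hp, r)              # (iii) column-constrained factors
    Zss = Zs - F1 @ w1.T @ H.T                   # (iv)
    F2, w2 = factor_lse(Gp @ Zss, p)             # (v)  row-constrained factors
    E = Z - F1 @ w1.T @ H.T - G @ F2 @ w2.T - G @ F3 @ w3.T @ H.T
    Psi = E.T @ E / Z.shape[0]                   # (vi)
    if diagonal_psi:
        Psi = np.diag(np.diag(Psi))
    return F1, F2, F3, w1, w2, w3, Psi
```

For instance, dcf_lse(Z, G, H, 2, 2, 1) would return the step-by-step estimates for a DCF model of order (2,2,1), which can also serve as starting values for the likelihood-based estimation of Subsection 2.2.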

2.2 Maximum Likelihood Estimation

For maximum likelihood estimation, we assume that, for $1 \le k \le T$, $\mathrm{var}(e_k) = \Psi$ is a diagonal $N$ by $N$ matrix. We further assume that $E(F_i^{(k)}) = 0$ and $\mathrm{var}(F_i^{(k)}) = I$, the identity matrix, for $1 \le i \le 3$ and all $k$. We also assume $\mathrm{cov}(F_i^{(k)}, F_j^{(l)}) = 0$ for $k \ne l$ or $i \ne j$, $\mathrm{cov}(e_i, e_j) = 0$ for all $i \ne j$, $\mathrm{cov}(F_i^{(k)}, e_j) = 0$ for all $i$, $j$, and $k$, and $e_k$ is an $N$-dimensional Gaussian random vector with mean zero and diagonal covariance matrix $\Psi$. For the purpose of identifiability, we adopt the approach of Anderson (2003) by imposing the restrictions that the matrices $\Gamma_1$, $\Gamma_2$, and $\Gamma_3$ are all diagonal, where

$$\Gamma_1 = \omega_1' H' \Psi^{-1} H \omega_1, \qquad \Gamma_2 = \omega_2' \Psi^{-1} \omega_2, \qquad \Gamma_3 = \omega_3' H' \Psi^{-1} H \omega_3. \qquad (6)$$

We also assume that the diagonal elements of $\Gamma_1$, $\Gamma_2$ and $\Gamma_3$ are ordered and distinct ($\gamma^1_{11} > \gamma^1_{22} > \cdots > \gamma^1_{rr}$, $\gamma^2_{11} > \gamma^2_{22} > \cdots > \gamma^2_{pp}$, and $\gamma^3_{11} > \gamma^3_{22} > \cdots > \gamma^3_{qq}$), and that the first non-zero element in each column of the matrix $\omega_i$, $i = 1, 2, 3$, is positive, so that $\omega_1$, $\omega_2$ and $\omega_3$ are uniquely determined. It can readily be checked that the covariance matrix of $\mathrm{vec}(Z')$ is $\Sigma = I_T \otimes A + GG' \otimes B$, where $A = H\omega_1\omega_1'H' + \Psi$ and $B = \omega_2\omega_2' + H\omega_3\omega_3'H'$. For the definitions of the matrix operators $\mathrm{vec}(\cdot)$ and $\otimes$, see, for example, Schott (1997).

We divide the discussion of maximum likelihood estimation into subsections to better understand the flexibility of the proposed model. Also, the existence of row constraints requires an additional condition to simplify the estimation.

2.2.1 Case 1: $\omega_2 = \omega_3 = 0$

In this particular case, the proposed model becomes $Z = F_1\omega_1'H' + E$, which is the column-constrained factor model of Tsai and Tsay (2010). An iterated procedure was proposed there to perform estimation.

2.2.2 Case 2: $\omega_1 = \omega_3 = 0$

When $\omega_1 = \omega_3 = 0$, the doubly constrained factor model becomes $Z = GF_2\omega_2' + E$. Here the model can be estimated by an iterated procedure similar to that of Tsai and Tsay (2010). Let $Y = (G'G)^{-1}G'Z$ and $C_Y = Y'Y/m$. The estimating procedure is as follows:

1. Compute initial estimates of the diagonal matrix $\Psi = [\Psi_{ij}]$. Following Jöreskog (1975), we set $\Psi_{ii} = (1 - r/(2N))/s^{ii}$, $i = 1, \ldots, N$, where $s^{ii}$ is the $i$th diagonal element of $S^{-1}$ and $S = Z'Z/(T - 1)$.

2. Construct the symmetric matrix $R_B = \Psi^{-1/2}(C_Y - m\Psi/T)\Psi^{-1/2}$ and perform a spectral decomposition of $R_B$, say $R_B = L_B W_B L_B'$, where $W_B = \mathrm{diag}(\hat\gamma_j)$ and $\hat\gamma_1 > \hat\gamma_2 > \cdots > \hat\gamma_N$ are the ordered eigenvalues of $R_B$.

3. Let $\hat\Gamma_B = W_B$ and $\hat\Gamma_2 = W_2$, where $W_2$ is the upper-left $r \times r$ submatrix of $W_B$. Obtain $\omega_2$ from $\Psi^{-1/2}\omega_2 = L_2$, where $L_2$ consists of the first $r$ columns of $L_B$. The eigenvectors
are normalized such that $\omega_2'\Psi^{-1}\omega_2 = \hat\Gamma_2$. More precisely, $\omega_2$ is a normalized version of $\omega_2^* = \Psi^{1/2}L_2$, where the normalization is to ensure that $\omega_2'\Psi^{-1}\omega_2 = \hat\Gamma_2$, a diagonal matrix.

4. Substitute the $\omega_2$ obtained in Step 3 into the objective function

$$\frac{m}{T}\ln|Q| + \frac{T - m}{T}\ln|\Psi| + \mathrm{tr}(C_Y Q^{-1}) + \mathrm{tr}\{(C - C_Y)\Psi^{-1}\}, \qquad (7)$$

where $Q = T\omega_2\omega_2'/m + \Psi$ and $C = Z'Z/T$, and minimize (7) with respect to $\Psi_{11}, \ldots, \Psi_{NN}$. A numerical search routine must be used. The resulting values $\Psi_{11}, \ldots, \Psi_{NN}$ are employed at Steps 2 and 3 to create a new $\omega_2$. Steps 2, 3 and 4 are repeated until convergence, i.e., until the differences between successive values of $\omega_{2,ij}$ in $\omega_2 = [\omega_{2,ij}]$ and $\Psi_{ii}$ are negligible.

2.2.3 Case 3: The Full Model

In this case, the log-likelihood function of $\mathrm{vec}(Z')$ is

$$\log f(\mathrm{vec}(Z')) = -\frac{TN}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma| - \frac{1}{2}\{\mathrm{vec}(Z')\}'\Sigma^{-1}\mathrm{vec}(Z').$$

To evaluate the above log-likelihood function, we assume, for simplicity, that the row constraint $G$ satisfies

$$G'G = \frac{T}{m} I_m. \qquad (8)$$

This is not a strong condition and it can be met easily. For example, if (i) $G = I_m \otimes 1_{T/m}$, where $1_{T/m}$ is the $T/m$-dimensional vector of ones, or if (ii) $G = 1_{T/m} \otimes I_m$, then Assumption (8) holds. Note that (ii) is the case for the U.S. housing starts data.

Lemma 1. If $G'G = \frac{T}{m} I_m$, then
(a) $|\Sigma| = |Q|^m |A|^{T-m}$, where $Q = A + \frac{T}{m} B$;
(b) $\Sigma^{-1} = I_T \otimes A^{-1} + GG' \otimes U$, where $U = \frac{m}{T}(Q^{-1} - A^{-1})$;
(c) $\{\mathrm{vec}(Z')\}'\Sigma^{-1}\mathrm{vec}(Z') = \mathrm{tr}(ZA^{-1}Z') + \mathrm{tr}(ZUZ'GG')$.

Proof: See the Appendix.
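Lemma 1 is easy to verify numerically. The short standalone sketch below, with small arbitrary dimensions that are not from the paper, builds $\Sigma = I_T \otimes A + GG' \otimes B$ for a $G$ of the form $1_{T/m} \otimes I_m$ and checks parts (a) and (b) directly.

```python
import numpy as np

rng = np.random.default_rng(1)
T, m, N = 12, 4, 3                      # small illustrative dimensions

G = np.kron(np.ones((T // m, 1)), np.eye(m))      # satisfies G'G = (T/m) I_m

def random_spd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

A = random_spd(N)                        # plays the role of H w1 w1' H' + Psi
B = random_spd(N)                        # plays the role of w2 w2' + H w3 w3' H'
Q = A + (T / m) * B
U = (m / T) * (np.linalg.inv(Q) - np.linalg.inv(A))

Sigma = np.kron(np.eye(T), A) + np.kron(G @ G.T, B)

# Part (a): log|Sigma| = m log|Q| + (T - m) log|A|
lhs = np.linalg.slogdet(Sigma)[1]
rhs = m * np.linalg.slogdet(Q)[1] + (T - m) * np.linalg.slogdet(A)[1]
print(np.isclose(lhs, rhs))             # True

# Part (b): Sigma^{-1} = I_T (x) A^{-1} + GG' (x) U
Sigma_inv = np.kron(np.eye(T), np.linalg.inv(A)) + np.kron(G @ G.T, U)
print(np.allclose(Sigma @ Sigma_inv, np.eye(T * N)))   # True
```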

Let $C = Z'Z/T$, $Y = (G'G)^{-1}G'Z$, and $C_Y = Y'Y/m$; then by Equation (8) and Lemma 1 (a) and (c), the log-likelihood function of $\omega_1$, $\omega_2$, $\omega_3$ and $\Psi$ based on $Z$ is

$$\ln L(\omega_1, \omega_2, \omega_3, \Psi) = -\frac{TN}{2}\ln(2\pi) - \frac{m}{2}\ln|Q| - \frac{T-m}{2}\ln|A| - \frac{T^2}{2m}\mathrm{tr}(C_Y U) - \frac{T}{2}\mathrm{tr}(CA^{-1})$$
$$= -\frac{TN}{2}\ln(2\pi) - \frac{m}{2}\ln|Q| - \frac{T-m}{2}\ln|A| - \frac{T}{2}\mathrm{tr}(C_Y Q^{-1}) - \frac{T}{2}\mathrm{tr}[(C - C_Y)A^{-1}].$$

Thus the objective function can be written as

$$-2\ln L(\theta) = TN\ln(2\pi) + m\ln|Q| + (T - m)\ln|A| + T\,\mathrm{tr}(C_Y Q^{-1}) + T\,\mathrm{tr}[(C - C_Y)A^{-1}], \qquad (9)$$

where $A = H\omega_1\omega_1'H' + \Psi$, $B = \omega_2\omega_2' + H\omega_3\omega_3'H'$, $Q = A + TB/m$, and we minimize (9) with respect to $\theta = (\omega_1, \omega_2, \omega_3, \Psi)$ to obtain the maximum likelihood estimate $\hat\theta$. Note that equation (7) is a special case of equation (9).

2.2.4 Case 4: $\omega_3 = 0$

In this case, there is no interaction between the row and column constraints, and the model becomes

$$Z = F_1\omega_1'H' + GF_2\omega_2' + E.$$

The associated objective function is

$$-2\ln L(\theta) = TN\ln(2\pi) + m\ln|Q| + (T - m)\ln|A| + T\,\mathrm{tr}(C_Y Q^{-1}) + T\,\mathrm{tr}\{(C - C_Y)A^{-1}\}, \qquad (10)$$

where $\theta = (\omega_1, \omega_2, \Psi)$, $A = H\omega_1\omega_1'H' + \Psi$, and $Q = A + T\omega_2\omega_2'/m$. We minimize (10) to obtain the estimate $\hat\theta$.

2.2.5 Initial Estimates for Cases 3 and 4

For Cases 1 and 2, the MLEs are computed by iterated procedures. For Cases 3 and 4, no iterative procedure is available, and the MLE must be obtained by a numerical optimization method with certain initial estimates. We use the LS estimates of Subsection 2.1 as the initial estimates.
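As a concrete illustration of how (9) can be handed to a general-purpose optimizer, the sketch below codes the Case 3 objective in Python, with SciPy in place of the IMSL routine used in the paper. The function name, the flat packing of $\theta$, and the log-parameterization of the diagonal of $\Psi$ are our own choices, and the identification restrictions of (6) are not imposed; this is a sketch, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def neg2_loglik(theta, Z, G, H, r, p, q):
    """-2 ln L(theta) of Equation (9) for the full DCF model (Case 3).
    theta packs (omega1, omega2, omega3, log-diagonal of Psi) as a flat vector."""
    T, N = Z.shape
    m = G.shape[1]
    s = H.shape[1]
    # Unpack the parameter vector.
    i1, i2, i3 = s * r, s * r + N * p, s * r + N * p + s * q
    w1 = theta[:i1].reshape(s, r)
    w2 = theta[i1:i2].reshape(N, p)
    w3 = theta[i2:i3].reshape(s, q)
    Psi = np.diag(np.exp(theta[i3:]))         # log-parameterized to keep Psi > 0
    A = H @ w1 @ w1.T @ H.T + Psi
    B = w2 @ w2.T + H @ w3 @ w3.T @ H.T
    Q = A + (T / m) * B
    C = Z.T @ Z / T
    Y = np.linalg.solve(G.T @ G, G.T @ Z)     # (G'G)^{-1} G' Z
    CY = Y.T @ Y / m
    Ainv, Qinv = np.linalg.inv(A), np.linalg.inv(Q)
    return (T * N * np.log(2 * np.pi)
            + m * np.linalg.slogdet(Q)[1]
            + (T - m) * np.linalg.slogdet(A)[1]
            + T * np.trace(CY @ Qinv)
            + T * np.trace((C - CY) @ Ainv))

# A hypothetical call, starting from the LS estimates of Subsection 2.1:
# theta0 = np.concatenate([w1_ls.ravel(), w2_ls.ravel(), w3_ls.ravel(),
#                          np.log(np.diag(Psi_ls))])
# fit = minimize(neg2_loglik, theta0, args=(Z, G, H, r, p, q), method="BFGS")
```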

2.3 Estimation of Latent Factors for the Maximum Likelihood Approach

Treating the ML estimates of the $\omega_i$ as given, we can estimate the latent factors $F_i$ by using the weighted least squares method. Specifically, given $\omega_1$, $\omega_2$, and $\omega_3$, the weighted least squares estimates of $F_1$, $F_2$, and $F_3$ can be obtained by minimizing

$$f(F_1, F_2, F_3) = \mathrm{tr}(E\Psi^{-1}E') = \mathrm{tr}\{(Z - F_1\omega_1'H' - GF_2\omega_2' - GF_3\omega_3'H')\Psi^{-1}(Z - F_1\omega_1'H' - GF_2\omega_2' - GF_3\omega_3'H')'\}.$$

Taking the partial derivative of $f(F_1, F_2, F_3)$ with respect to $F_1$, and equating the result to zero, we obtain

$$\frac{\partial f}{\partial F_1} = \frac{\partial}{\partial F_1}\,\mathrm{tr}\big(-2F_1\omega_1'H'\Psi^{-1}(Z - GF_2\omega_2' - GF_3\omega_3'H')' + F_1\omega_1'H'\Psi^{-1}H\omega_1F_1'\big) = -2(Z - GF_2\omega_2' - GF_3\omega_3'H')\Psi^{-1}H\omega_1 + 2F_1\omega_1'H'\Psi^{-1}H\omega_1 = 0. \qquad (11)$$

The second equality follows from the facts that

$$\frac{\partial}{\partial X}\,\mathrm{tr}(AX) = A', \qquad \frac{\partial}{\partial X}\,\mathrm{tr}(XAX'B) = BXA + B'XA'.$$

Equation (11) implies that

$$F_1 = (Z - GF_2\omega_2' - GF_3\omega_3'H')\Psi^{-1}H\omega_1(\omega_1'H'\Psi^{-1}H\omega_1)^{-1}. \qquad (12)$$

Similarly,

$$\frac{\partial f}{\partial F_2} = \frac{\partial}{\partial F_2}\,\mathrm{tr}\big(-2GF_2\omega_2'\Psi^{-1}(Z - F_1\omega_1'H' - GF_3\omega_3'H')' + GF_2\omega_2'\Psi^{-1}\omega_2F_2'G'\big) = -2G'(Z - F_1\omega_1'H' - GF_3\omega_3'H')\Psi^{-1}\omega_2 + 2G'GF_2\omega_2'\Psi^{-1}\omega_2 = 0.$$

Let $\tilde G = (G'G)^{-1}G'$ and $\bar G = G(G'G)^{-1}G'$; then

$$F_2 = \tilde G(Z - F_1\omega_1'H' - GF_3\omega_3'H')\Psi^{-1}\omega_2(\omega_2'\Psi^{-1}\omega_2)^{-1}. \qquad (13)$$

Thirdly,

$$\frac{\partial f}{\partial F_3} = \frac{\partial}{\partial F_3}\,\mathrm{tr}\big(-2GF_3\omega_3'H'\Psi^{-1}(Z - F_1\omega_1'H' - GF_2\omega_2')' + GF_3\omega_3'H'\Psi^{-1}H\omega_3F_3'G'\big) = -2G'(Z - F_1\omega_1'H' - GF_2\omega_2')\Psi^{-1}H\omega_3 + 2G'GF_3\omega_3'H'\Psi^{-1}H\omega_3 = 0.$$

Therefore,

$$F_3 = \tilde G(Z - F_1\omega_1'H' - GF_2\omega_2')\Psi^{-1}H\omega_3(\omega_3'H'\Psi^{-1}H\omega_3)^{-1}. \qquad (14)$$

Using (6) and letting $\Gamma_{12} = \Gamma_{21}' = \omega_1'H'\Psi^{-1}\omega_2$, $\Gamma_{13} = \Gamma_{31}' = \omega_1'H'\Psi^{-1}H\omega_3$, $\Gamma_{23} = \Gamma_{32}' = \omega_2'\Psi^{-1}H\omega_3$, $\Gamma_{01} = Z\Psi^{-1}H\omega_1$, $\Gamma_{02} = Z\Psi^{-1}\omega_2$, and $\Gamma_{03} = Z\Psi^{-1}H\omega_3$, Equations (12), (13), and (14) become

$$F_1 = (\Gamma_{01} - GF_2\Gamma_{21} - GF_3\Gamma_{31})\Gamma_1^{-1}, \qquad (15)$$
$$F_2 = (\tilde G\Gamma_{02} - \tilde GF_1\Gamma_{12} - F_3\Gamma_{32})\Gamma_2^{-1}, \qquad (16)$$
$$F_3 = (\tilde G\Gamma_{03} - \tilde GF_1\Gamma_{13} - F_2\Gamma_{23})\Gamma_3^{-1}. \qquad (17)$$

Multiplying both sides of (17) by $\Gamma_3$, we get

$$F_3\Gamma_3 = \tilde G\Gamma_{03} - \tilde GF_1\Gamma_{13} - F_2\Gamma_{23}. \qquad (18)$$

Plugging (16) into (18), we get

$$F_3\Gamma_3 = \tilde G\Gamma_{03} - \tilde GF_1\Gamma_{13} - (\tilde G\Gamma_{02} - \tilde GF_1\Gamma_{12} - F_3\Gamma_{32})\Gamma_2^{-1}\Gamma_{23} = \tilde G\{F_1(\Gamma_{12}\Gamma_2^{-1}\Gamma_{23} - \Gamma_{13}) + \Gamma_{03} - \Gamma_{02}\Gamma_2^{-1}\Gamma_{23}\} + F_3\Gamma_{32}\Gamma_2^{-1}\Gamma_{23}.$$

Subtracting $F_3\Gamma_{32}\Gamma_2^{-1}\Gamma_{23}$ from both sides, and then post-multiplying by $\Delta_{32} = (\Gamma_3 - \Gamma_{32}\Gamma_2^{-1}\Gamma_{23})^{-1}$, we get

$$F_3 = \tilde G\{F_1(\Gamma_{12}\Gamma_2^{-1}\Gamma_{23} - \Gamma_{13}) + \Gamma_{03} - \Gamma_{02}\Gamma_2^{-1}\Gamma_{23}\}\Delta_{32}. \qquad (19)$$

Similarly, multiplying both sides of (16) by $\Gamma_2$ gives

$$F_2\Gamma_2 = \tilde G\Gamma_{02} - \tilde GF_1\Gamma_{12} - F_3\Gamma_{32}. \qquad (20)$$

Plugging (17) into (20), we get

$$F_2\Gamma_2 = \tilde G\Gamma_{02} - \tilde GF_1\Gamma_{12} - (\tilde G\Gamma_{03} - \tilde GF_1\Gamma_{13} - F_2\Gamma_{23})\Gamma_3^{-1}\Gamma_{32} = \tilde G\{F_1(\Gamma_{13}\Gamma_3^{-1}\Gamma_{32} - \Gamma_{12}) + \Gamma_{02} - \Gamma_{03}\Gamma_3^{-1}\Gamma_{32}\} + F_2\Gamma_{23}\Gamma_3^{-1}\Gamma_{32}.$$

Subtracting $F_2\Gamma_{23}\Gamma_3^{-1}\Gamma_{32}$ from both sides, and then post-multiplying by $\Delta_{23} = (\Gamma_2 - \Gamma_{23}\Gamma_3^{-1}\Gamma_{32})^{-1}$, we get

$$F_2 = \tilde G\{F_1(\Gamma_{13}\Gamma_3^{-1}\Gamma_{32} - \Gamma_{12}) + \Gamma_{02} - \Gamma_{03}\Gamma_3^{-1}\Gamma_{32}\}\Delta_{23}. \qquad (21)$$

Now, multiplying both sides of (15) by $\Gamma_1$, we have

$$F_1\Gamma_1 = \Gamma_{01} - GF_2\Gamma_{21} - GF_3\Gamma_{31}. \qquad (22)$$

Plugging (19) and (21) into (22), we obtain

$$F_1\Gamma_1 = \Gamma_{01} - \bar G\{F_1(\Gamma_{13}\Gamma_3^{-1}\Gamma_{32} - \Gamma_{12}) + \Gamma_{02} - \Gamma_{03}\Gamma_3^{-1}\Gamma_{32}\}\Delta_{23}\Gamma_{21} - \bar G\{F_1(\Gamma_{12}\Gamma_2^{-1}\Gamma_{23} - \Gamma_{13}) + \Gamma_{03} - \Gamma_{02}\Gamma_2^{-1}\Gamma_{23}\}\Delta_{32}\Gamma_{31}. \qquad (23)$$

Pre-multiplying both sides of (23) by $\tilde G$, and noting that $\tilde G\bar G = \tilde G$, we have

$$\tilde GF_1\Gamma_1 = \tilde G\Gamma_{01} - \tilde G\{F_1(\Gamma_{13}\Gamma_3^{-1}\Gamma_{32} - \Gamma_{12}) + \Gamma_{02} - \Gamma_{03}\Gamma_3^{-1}\Gamma_{32}\}\Delta_{23}\Gamma_{21} - \tilde G\{F_1(\Gamma_{12}\Gamma_2^{-1}\Gamma_{23} - \Gamma_{13}) + \Gamma_{03} - \Gamma_{02}\Gamma_2^{-1}\Gamma_{23}\}\Delta_{32}\Gamma_{31}. \qquad (24)$$

One solution to equation (24) is

$$F_1\Gamma_1 = \Gamma_{01} - \{F_1(\Gamma_{13}\Gamma_3^{-1}\Gamma_{32} - \Gamma_{12}) + \Gamma_{02} - \Gamma_{03}\Gamma_3^{-1}\Gamma_{32}\}\Delta_{23}\Gamma_{21} - \{F_1(\Gamma_{12}\Gamma_2^{-1}\Gamma_{23} - \Gamma_{13}) + \Gamma_{03} - \Gamma_{02}\Gamma_2^{-1}\Gamma_{23}\}\Delta_{32}\Gamma_{31}. \qquad (25)$$

From equation (25), we obtain

$$F_1 = \big\{\Gamma_{01} - (\Gamma_{02} - \Gamma_{03}\Gamma_3^{-1}\Gamma_{32})\Delta_{23}\Gamma_{21} - (\Gamma_{03} - \Gamma_{02}\Gamma_2^{-1}\Gamma_{23})\Delta_{32}\Gamma_{31}\big\}\big\{\Gamma_1 + (\Gamma_{13}\Gamma_3^{-1}\Gamma_{32} - \Gamma_{12})\Delta_{23}\Gamma_{21} + (\Gamma_{12}\Gamma_2^{-1}\Gamma_{23} - \Gamma_{13})\Delta_{32}\Gamma_{31}\big\}^{-1}. \qquad (26)$$

Therefore, we use Equation (26) to compute $F_1$ first, then we use Equation (21) to compute $F_2$, and Equation (19) to compute $F_3$.
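Given ML estimates of the loadings and of $\Psi$, Equations (26), (21), and (19) can be evaluated directly. The sketch below transcribes them in NumPy; the function and variable names are ours, and the inputs are assumed to be conformable arrays such as those produced by the earlier sketches.

```python
import numpy as np

def dcf_factors(Z, G, H, w1, w2, w3, Psi):
    """Weighted LS estimates of F1, F2, F3 via Equations (26), (21), and (19),
    given the ML loading estimates; a sketch with our own variable names."""
    Pi = np.linalg.inv(Psi)
    Gt = np.linalg.solve(G.T @ G, G.T)                 # (G'G)^{-1} G'
    HP = H.T @ Pi
    # Gamma matrices of (6) and Section 2.3.
    G1, G2, G3 = w1.T @ HP @ H @ w1, w2.T @ Pi @ w2, w3.T @ HP @ H @ w3
    G12, G13, G23 = w1.T @ HP @ w2, w1.T @ HP @ H @ w3, w2.T @ Pi @ H @ w3
    G21, G31, G32 = G12.T, G13.T, G23.T
    G01, G02, G03 = Z @ Pi @ H @ w1, Z @ Pi @ w2, Z @ Pi @ H @ w3
    G2i, G3i = np.linalg.inv(G2), np.linalg.inv(G3)
    D32 = np.linalg.inv(G3 - G32 @ G2i @ G23)
    D23 = np.linalg.inv(G2 - G23 @ G3i @ G32)
    # Equation (26): F1.
    num = (G01 - (G02 - G03 @ G3i @ G32) @ D23 @ G21
               - (G03 - G02 @ G2i @ G23) @ D32 @ G31)
    den = (G1 + (G13 @ G3i @ G32 - G12) @ D23 @ G21
              + (G12 @ G2i @ G23 - G13) @ D32 @ G31)
    F1 = num @ np.linalg.inv(den)
    # Equation (21): F2, and Equation (19): F3.
    F2 = Gt @ (F1 @ (G13 @ G3i @ G32 - G12) + G02 - G03 @ G3i @ G32) @ D23
    F3 = Gt @ (F1 @ (G12 @ G2i @ G23 - G13) + G03 - G02 @ G2i @ G23) @ D32
    return F1, F2, F3
```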

2.4 Model Selection

In applications, the data generating process is unknown and one needs to select a proper constrained factor model based on the available data. In particular, the validity of row and/or column constraints must be verified. To this end, we consider the Akaike information criterion (AIC) (Akaike, 1974) for each of the fitted models,

$$\mathrm{AIC} = -2\ln L(\hat\theta) + 2\lambda,$$

where $\lambda$ is the number of parameters of the model and $\hat\theta$ is the MLE. Our simulation study and empirical example show that AIC works well in model selection.

Tsai and Tsay (2010) used hypothesis testing to check the validity of column constraints. The testing procedure becomes complicated for doubly constrained factor models because it would involve non-nested hypothesis testing. For instance, the model with only column constraints is not a sub-model of the one with only row constraints.

3 Simulation Study

In this section, we report the finite-sample performance of the MLE and the AIC in Subsections 3.1 and 3.2, respectively. All computations in this section were performed using Fortran code with IMSL subroutines.

3.1 Finite Sample Properties of the MLE

We are interested in evaluating the performance of the numerical optimization in finding the MLE. The true model considered is the full model described in Subsection 2.2.3 with $N = 6$, $r = 2$, $p = 2$, $q = 1$, $s = 2$, $m = 12$, $G = 1_{T/m} \otimes I_m$, and a known $N \times s$ constraint matrix $H$. This is a DCF model of order (2,2,1). The values of $\omega_i$, $i = 1, 2, 3$, and $\Psi$ used in the simulation are given in Tables 1 and 2, and the sample sizes employed are $T = 480$, 960 and 1,920. The estimation results of 10,000 repetitions are also summarized in Tables 1 and 2. From the tables, the maximum likelihood estimation works well in general and, as expected, the biases and the standard errors of the MLEs become smaller when the sample size increases. The estimates of $\omega_2$ are relatively uncertain and their standard errors improve slowly with the increase in sample size.
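Model selection over a set of candidate orders can be organized as a simple loop: fit each order by minimizing (9) and keep the order with the smallest AIC. The sketch below reuses the hypothetical neg2_loglik function from the earlier Case 3 sketch; the parameter count $\lambda$ is approximated by the length of the packed parameter vector, and the crude starting values merely stand in for the LS initial estimates recommended above.

```python
import numpy as np
from scipy.optimize import minimize

def fit_and_aic(Z, G, H, r, p, q):
    """Fit a DCF(r, p, q) model by minimizing (9) and return its AIC (a sketch)."""
    N, s = Z.shape[1], H.shape[1]
    theta0 = np.concatenate([np.full(s * r + N * p + s * q, 0.1),  # crude start
                             np.zeros(N)])                          # log diag(Psi)
    res = minimize(neg2_loglik, theta0, args=(Z, G, H, r, p, q), method="BFGS")
    return res.fun + 2 * theta0.size        # AIC = -2 ln L + 2 * lambda

# Candidate orders with 0 <= r, p, q <= 3 and q <= min(r, p).
candidates = [(r, p, q) for r in range(4) for p in range(4)
              for q in range(min(r, p) + 1)]
# best = min(candidates, key=lambda order: fit_and_aic(Z, G, H, *order))
```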

Table 1: Simulation results of $\Psi$ and $\omega_1$ estimation via the maximum likelihood method for the doubly constrained factor model of order (2,2,1), with least squares estimates used as initial values. Panel (a) reports the true values $\Psi_{ii}$, $i = 1, \ldots, 6$, and the estimates $\hat\Psi_{ii}$ (with standard errors) for $T = 480$, 960, and 1,920; panel (b) reports the true values and estimates (with standard errors) of the two columns $\omega_1[\,,1]$ and $\omega_1[\,,2]$. The numbers shown are the sample means and standard deviations of the estimates from 10,000 repetitions.

Table 2: Simulation results of $\omega_2$ and $\omega_3$ estimation via the maximum likelihood method for the doubly constrained factor model of order (2,2,1), with least squares estimates used as initial values. Panel (c) reports the true values and estimates (with standard errors) of the two columns $\omega_2[\,,1]$ and $\omega_2[\,,2]$ for $T = 480$, 960, and 1,920; panel (d) reports the same for the single column $\omega_3[\,,1]$. The numbers shown are the sample means and standard deviations of the estimates from 10,000 repetitions.

3.2 Performance of AIC

As mentioned in Subsection 2.4, the testing procedure of Tsai and Tsay (2010) becomes complicated for doubly constrained factor models because it would involve non-nested hypothesis testing. Therefore, AIC is used for checking the adequacy of the column and/or row constraints. In this subsection, we consider the finite-sample performance of the AIC in selecting the data generating model among Cases 1-4 below. The data generating models considered are

Case 1: $\omega_2 = \omega_3 = 0$, and $\omega_1$ is the same as that of Subsection 3.1;
Case 2: $\omega_1 = \omega_3 = 0$, and $\omega_2$ is the same as the one used in Subsection 3.1;
Case 3: the full model studied in Subsection 3.1;
Case 4: $\omega_3 = 0$, and $\omega_1$ and $\omega_2$ are the same as those studied in Subsection 3.1.

The matrices $H$ and $G$, and the values of $\Psi$, $N$, $r$, $p$, $q$, $m$, and $s$ are also the same as those of Subsection 3.1. For singly constrained factor models (Cases 1 and 2), we implement the estimation procedures as described in Subsections 2.2.1 and 2.2.2, respectively. For doubly constrained factor models (Cases 3 and 4), we estimate all parameters via the objective functions (9) and (10), using the optimizing subroutine DNCONF from Fortran's IMSL library. The least squares estimates discussed in Subsection 2.1 are used as the initial values. The percentages of correct model specification based on 1,000 repetitions are reported in Table 3. The results show that the AIC works well in selecting a proper doubly constrained factor model.

Table 3: The percentages of correct model specifications of the doubly constrained factor model via the AIC criterion. The results are based on 1,000 repetitions. Columns give the true model; rows give the model selected by AIC.

                         True model
  Fitted model   Case 1   Case 2   Case 3   Case 4
  T=480
    Case 1        99.8%     0.0%     0.0%     0.0%
    Case 2         0.0%    98.4%     0.0%     0.0%
    Case 3         0.0%     0.0%    84.5%    17.6%
    Case 4         0.2%     1.6%    15.5%    82.4%
  T=960
    Case 1       100.0%     0.0%     0.0%     0.0%
    Case 2         0.0%    98.8%     0.0%     0.0%
    Case 3         0.0%     0.0%    98.0%    17.7%
    Case 4         0.0%     1.2%     2.0%    82.3%
  T=1,920
    Case 1        99.9%     0.0%     0.0%     0.0%
    Case 2         0.0%    99.1%     0.0%     0.0%
    Case 3         0.0%     0.1%    99.8%    19.2%
    Case 4         0.1%     0.8%     0.2%    80.8%

4 Application

To demonstrate the application of the proposed doubly constrained factor model, we consider the total housing starts of the United States, obtained from the U.S. Census Bureau website. The data period is from January 1997 to December 2006, so that we have 120 monthly observations for each of the nine main divisions of the U.S. shown in Figure 1. The LOESS regression is applied
to the log-transformed data before fitting the doubly constrained factor model. This step is taken to remove the trend of the series.

To specify the constraint matrix $H$, prior experience or geographical clustering may be helpful. In this paper, we apply hierarchical clustering to the variables to specify $H$. It turns out that the result is consistent with the geographical clustering. Therefore, we employ three groups for the variables (divisions) and they are as follows:

Group 1: New England, Middle Atlantic, East North Central, West North Central;
Group 2: South Atlantic, East South Central, West South Central;
Group 3: Mountain, Pacific.

The $H$ matrix simply consists of the indicator variables for the three groups. From Figure 1, Group 1 consists of the Northeast and Midwest of the U.S., Group 2 denotes the South, whereas Group 3 is the West. The time plots of Figure 2 show that the housing starts exhibit strong seasonality of period 12. Therefore, we let $G = 1_{10} \otimes I_{12}$. Consequently, for this particular instance, we have $m = 12$, $T = 120$, $N = 9$, and $s = 3$.
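For this application the two constraint matrices have a very simple form; the sketch below shows one way they could be assembled, assuming the columns of $Z$ are ordered as the divisions are listed in Figure 2 (that ordering is our assumption for the illustration).

```python
import numpy as np

divisions = ["New England", "Middle Atlantic", "East North Central",
             "West North Central", "South Atlantic", "East South Central",
             "West South Central", "Mountain", "Pacific"]
groups = {                     # column constraints: three geographical groups
    "New England": 0, "Middle Atlantic": 0, "East North Central": 0,
    "West North Central": 0, "South Atlantic": 1, "East South Central": 1,
    "West South Central": 1, "Mountain": 2, "Pacific": 2,
}

N, s = len(divisions), 3
H = np.zeros((N, s))                           # 9 x 3 indicator matrix
for j, d in enumerate(divisions):
    H[j, groups[d]] = 1.0

m, years = 12, 10
G = np.kron(np.ones((years, 1)), np.eye(m))    # 120 x 12 monthly indicators: G = 1_10 (x) I_12
print(H.shape, G.shape)                        # (9, 3) (120, 12)
print(np.allclose(G.T @ G, (G.shape[0] / m) * np.eye(m)))   # Assumption (8) holds: True
```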

Figure 1: The census regions and divisions of the United States.

We consider the DCF models of order $(r, p, q)$ with $0 \le r, p, q \le 3$ and $q \le \min\{r, p\}$. Therefore, a total of 30 models were entertained. Table 4 shows the ranking of the entertained DCF models based on the AIC criterion, where the model of order (0,0,0) means an unrestricted model. Based on the AIC criterion, the doubly constrained factor model of order (2,2,1) is selected, with the model of order (2,2,2) as a close second. Model checking shows that the residuals of the fitted DCF model of order (2,2,1) have some minor serial correlations, but those of the model of order (2,2,2) are close to being white noise. Therefore, we adopt the DCF model of order (2,2,2).

Figure 3 shows the time plots of the residuals, $\hat E$, of the entertained DCF(2,2,2) model. The left panel consists of the residuals of least squares estimation whereas the right panel shows those of the maximum likelihood estimation. The two sets of residuals show similar patterns, but also contain certain differences. However, their sample autocorrelation functions confirm that the residuals have no significant serial dependence; see Figure 4. Table 5 gives the maximum likelihood estimates of the $\omega_i$ for the selected DCF model of order (2,2,2). The corresponding
LSE of the $\omega_i$ are given in Table 6. These estimates are different from those of the MLE in Table 5 because different normalizations are used. Figure 5 shows the time plots of the fitted common factors. The upper three panels show the common factors obtained by the least squares method whereas the lower three panels give the corresponding results for the maximum likelihood estimation. Care must be exercised in comparing the fitted common factors because their scales and orderings are not identifiable. For instance, consider the fitted common factors $\hat F_3$: the orderings seem to be interchanged between the two estimation methods. Overall, the common factors $\hat F_1$ of the maximum likelihood estimation appear to have some seasonality. We shall return to this point in our discussion later.

Figure 2: Time plots of monthly housing starts (in logarithms) of the nine U.S. divisions.

4.1 Discussion

To gain insight into the decomposition of the housing starts implied by the fitted DCF model of order (2,2,2), we consider in detail the results of maximum likelihood estimation. Figures 6 to 8 show the time plots of the decompositions of the housing starts series. The plots in Figure 6 consist of $G\hat F_2\hat\omega_2'$ of Equation (1). Since the row constraints used are monthly indicator variables, these plots signify the deterministic seasonal pattern of each housing starts series that is orthogonal to the geographical divisions. From the plots, the deterministic
seasonality varies from series to series, but those of the East North Central and West North Central divisions are similar. This seems reasonable as these two divisions are in the Midwest and share close weather characteristics. The New England and Middle Atlantic divisions have their own deterministic seasonal patterns. Finally, the Mountain and West South Central divisions also share a similar deterministic seasonal pattern.

Table 4: The rankings of AIC for the proposed constrained factor models, listing the 30 entertained models of order $(r, p, q)$, from (0,0,0), the unrestricted model, to (3,3,3), together with their AIC ranks.

Table 5: Maximum likelihood estimates of the doubly constrained factor model of order (2,2,2) for the U.S. housing starts data from 1997 to 2006. Panel (a) gives the MLE of $\omega_1$ ($\omega_1[\,,1]$ and $\omega_1[\,,2]$), panel (b) the MLE of $\omega_2$, and panel (c) the MLE of $\omega_3$.

The plots in Figure 7 consist of $\hat F_1\hat\omega_1'H'$ of Equation (1), which denotes housing variations due to the geographical locations but is orthogonal to the deterministic seasonality. The
column constraints essentially pool information within each group to obtain the geographical housing variations. The series in Figure 7 also contain certain seasonality and we believe that they describe the stochastic seasonality of the three geographical groups. These stochastic seasonalities differ from group to group. Figure 8 shows the interactions $G\hat F_3\hat\omega_3'H'$ between geographical grouping and deterministic seasonality of Equation (1). The plots show marked differences between the three interactions.

Table 6: Least squares estimates of the doubly constrained factor model of order (2,2,2) for the U.S. housing data from 1997 to 2006, giving the LSE of $\omega_1$, $\omega_2$, and $\omega_3$.

For this particular example, the proposed DCF model is capable of describing the seasonal and geographical patterns of U.S. housing starts. The example also demonstrates that the row and column constraints can be used to gain insight into the common structure of a multivariate time series.

5 Concluding Remarks

In this paper, we considered both the least squares and maximum likelihood estimation of a doubly constrained factor model, and demonstrated the proposed methods by analyzing nine U.S. monthly housing starts series. The decomposition of the housing starts series shows that the proposed model is capable of describing the characteristics of the data. Much work on constrained factor models, however, remains open. For instance, the maximum likelihood estimation is obtained under the normality assumption. In real applications, such an assumption might not be valid and the innovations of Equation (1) may contain conditional heteroscedasticity. In addition, we only consider deterministic constraints in this paper. It is of interest to investigate the proposed analysis when the constraints are stochastic.

Appendix: Proof of Lemma 1

To prove part (a), write $B = \omega_B\omega_B'$, where $\omega_B = [\omega_2 \;\; H\omega_3]$. Then we have

$|\Sigma| = |I_T \otimes A + (G \otimes \omega_B)(G \otimes \omega_B)'|$  (by the definition of $\Sigma$ and Theorem 7.7 of Schott, 1997)
$= |I_T \otimes A|\,\big|I_{m(p+q)} + (G \otimes \omega_B)'(I_T \otimes A^{-1})(G \otimes \omega_B)\big|$  (by a determinant identity of Harville, 1997, and Theorem 7.9(a) of Schott, 1997)
$= |A|^T\,\big|I_{m(p+q)} + \tfrac{T}{m} I_m \otimes \omega_B'A^{-1}\omega_B\big|$  (by Equation (8), Theorems 7.7 and 7.11 of Schott, 1997)
$= |A|^T\,\big|I_{mN} + \tfrac{T}{m} I_m \otimes A^{-1/2}BA^{-1/2}\big|$  (by Theorem 7.7 of Schott, 1997, and a result of Harville, 1997)
$= |A|^T\,|I_m \otimes A^{-1/2}|^2\,\big|(I_m \otimes A^{1/2})\big(I_m \otimes I_N + \tfrac{T}{m} I_m \otimes A^{-1/2}BA^{-1/2}\big)(I_m \otimes A^{1/2})\big|$
$= |A|^{T-m}\,\big|I_m \otimes A + I_m \otimes \tfrac{T}{m}B\big|$  (by Theorems 7.7 and 7.11 of Schott, 1997)
$= |A|^{T-m}\,|I_m \otimes Q|$  (by the definition of $Q$ and Theorem 7.6(e) of Schott, 1997)
$= |Q|^m |A|^{T-m}$  (by Theorem 7.11 of Schott, 1997).

This proves part (a).

Now we prove part (b) by showing that $\Sigma\Sigma^{-1} = \Sigma^{-1}\Sigma = I_{NT}$. First note that, by the definitions of $U$ and $Q$, we have $QUA = -B$, and so $QU = -BA^{-1}$. Therefore,

$\Sigma\Sigma^{-1} = (I_T \otimes A + GG' \otimes B)(I_T \otimes A^{-1} + GG' \otimes U)$
$= (I_T \otimes I_N) + (GG' \otimes AU) + (GG' \otimes BA^{-1}) + \big(\tfrac{T}{m} GG' \otimes BU\big)$
$= I_{NT} + (GG' \otimes QU) + (GG' \otimes BA^{-1})$
$= I_{NT}$.

Similarly, it can be shown that $\Sigma^{-1}\Sigma = I_{NT}$. This proves (b). Part (c) follows from part (b) and Theorem 7.17 of Schott (1997).
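Part (c) can also be checked numerically in the same small setting used earlier to illustrate parts (a) and (b); the standalone sketch below simply compares the quadratic form with the two trace terms.

```python
import numpy as np

rng = np.random.default_rng(3)
T, m, N = 8, 4, 3
G = np.kron(np.ones((T // m, 1)), np.eye(m))           # G'G = (T/m) I_m

def random_spd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

A, B = random_spd(N), random_spd(N)
Q = A + (T / m) * B
U = (m / T) * (np.linalg.inv(Q) - np.linalg.inv(A))
Sigma_inv = np.kron(np.eye(T), np.linalg.inv(A)) + np.kron(G @ G.T, U)

Z = rng.standard_normal((T, N))
v = Z.ravel()                                          # vec(Z'): the rows of Z stacked
lhs = v @ Sigma_inv @ v
rhs = np.trace(Z @ np.linalg.inv(A) @ Z.T) + np.trace(Z @ U @ Z.T @ G @ G.T)
print(np.isclose(lhs, rhs))                            # True
```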

References

[1] Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control 19.

[2] Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis. N.J.: Wiley-Interscience.

[3] Harville, D. A. (1997). Matrix Algebra From a Statistician's Perspective. New York: Springer.

[4] Jöreskog, K. G. (1975). Factor analysis by least squares and maximum likelihood. In Statistical Methods for Digital Computers (eds. K. Enslein, A. Ralston, and H. S. Wilf). New York: John Wiley.

[5] Lam, C. and Yao, Q. W. (2012). Factor modeling for high-dimensional time series: inference for the number of factors. The Annals of Statistics 40.

[6] Lütkepohl, H. (2005). New Introduction to Multiple Time Series Analysis. Berlin: Springer-Verlag.

[7] Peña, D. and Box, G. E. P. (1987). Identifying a simplifying structure in time series. Journal of the American Statistical Association 82.

[8] Schott, J. R. (1997). Matrix Analysis for Statistics. New York: Wiley.

[9] Takane, Y. and Hunter, M. A. (2001). Constrained principal component analysis: a comprehensive theory. Applicable Algebra in Engineering, Communication and Computing 12.

[10] Tiao, G. C., Tsay, R. S., and Wang, T. C. (1993). Usefulness of linear transformations in multiple time series analysis. Empirical Economics 18.

[11] Tsai, H. and Tsay, R. S. (2010). Constrained factor models. Journal of the American Statistical Association 105.

Figure 3: Time series plots of (a) the least squares residuals and (b) the maximum likelihood residuals of the DCF model of order (r,p,q) = (2,2,2).

Figure 4: Sample autocorrelation functions of the residuals of the DCF model of order (r,p,q) = (2,2,2). Results of the least squares estimation and the maximum likelihood estimation are shown.

Figure 5: Time series plots of the common factors for a DCF model of order (r,p,q) = (2,2,2) via least squares estimation and maximum likelihood estimation.

Figure 6: Time series plots for $G\hat F_2\hat\omega_2'$ of a fitted DCF model of order (2,2,2). Maximum likelihood estimation is used.

Figure 7: Time series plots for $\hat F_1\hat\omega_1'H'$ of a fitted DCF model of order (2,2,2). Maximum likelihood estimation is used.

Figure 8: Time series plots for $G\hat F_3\hat\omega_3'H'$ of a fitted DCF model of order (2,2,2). Maximum likelihood estimation is used.


More information

Principal Component Analysis-I Geog 210C Introduction to Spatial Data Analysis. Chris Funk. Lecture 17

Principal Component Analysis-I Geog 210C Introduction to Spatial Data Analysis. Chris Funk. Lecture 17 Principal Component Analysis-I Geog 210C Introduction to Spatial Data Analysis Chris Funk Lecture 17 Outline Filters and Rotations Generating co-varying random fields Translating co-varying fields into

More information

Moment Aberration Projection for Nonregular Fractional Factorial Designs

Moment Aberration Projection for Nonregular Fractional Factorial Designs Moment Aberration Projection for Nonregular Fractional Factorial Designs Hongquan Xu Department of Statistics University of California Los Angeles, CA 90095-1554 (hqxu@stat.ucla.edu) Lih-Yuan Deng Department

More information

Lecture 2: Univariate Time Series

Lecture 2: Univariate Time Series Lecture 2: Univariate Time Series Analysis: Conditional and Unconditional Densities, Stationarity, ARMA Processes Prof. Massimo Guidolin 20192 Financial Econometrics Spring/Winter 2017 Overview Motivation:

More information

Booth School of Business, University of Chicago Business 41914, Spring Quarter 2017, Mr. Ruey S. Tsay. Solutions to Midterm

Booth School of Business, University of Chicago Business 41914, Spring Quarter 2017, Mr. Ruey S. Tsay. Solutions to Midterm Booth School of Business, University of Chicago Business 41914, Spring Quarter 017, Mr Ruey S Tsay Solutions to Midterm Problem A: (51 points; 3 points per question) Answer briefly the following questions

More information

Principal Component Analysis (PCA) Our starting point consists of T observations from N variables, which will be arranged in an T N matrix R,

Principal Component Analysis (PCA) Our starting point consists of T observations from N variables, which will be arranged in an T N matrix R, Principal Component Analysis (PCA) PCA is a widely used statistical tool for dimension reduction. The objective of PCA is to find common factors, the so called principal components, in form of linear combinations

More information

Akaike criterion: Kullback-Leibler discrepancy

Akaike criterion: Kullback-Leibler discrepancy Model choice. Akaike s criterion Akaike criterion: Kullback-Leibler discrepancy Given a family of probability densities {f ( ; ψ), ψ Ψ}, Kullback-Leibler s index of f ( ; ψ) relative to f ( ; θ) is (ψ

More information

Kalman-Filter-Based Time-Varying Parameter Estimation via Retrospective Optimization of the Process Noise Covariance

Kalman-Filter-Based Time-Varying Parameter Estimation via Retrospective Optimization of the Process Noise Covariance 2016 American Control Conference (ACC) Boston Marriott Copley Place July 6-8, 2016. Boston, MA, USA Kalman-Filter-Based Time-Varying Parameter Estimation via Retrospective Optimization of the Process Noise

More information

Empirical Approach to Modelling and Forecasting Inflation in Ghana

Empirical Approach to Modelling and Forecasting Inflation in Ghana Current Research Journal of Economic Theory 4(3): 83-87, 2012 ISSN: 2042-485X Maxwell Scientific Organization, 2012 Submitted: April 13, 2012 Accepted: May 06, 2012 Published: June 30, 2012 Empirical Approach

More information

Minimax design criterion for fractional factorial designs

Minimax design criterion for fractional factorial designs Ann Inst Stat Math 205 67:673 685 DOI 0.007/s0463-04-0470-0 Minimax design criterion for fractional factorial designs Yue Yin Julie Zhou Received: 2 November 203 / Revised: 5 March 204 / Published online:

More information

FORECASTING. Methods and Applications. Third Edition. Spyros Makridakis. European Institute of Business Administration (INSEAD) Steven C Wheelwright

FORECASTING. Methods and Applications. Third Edition. Spyros Makridakis. European Institute of Business Administration (INSEAD) Steven C Wheelwright FORECASTING Methods and Applications Third Edition Spyros Makridakis European Institute of Business Administration (INSEAD) Steven C Wheelwright Harvard University, Graduate School of Business Administration

More information

Panel Threshold Regression Models with Endogenous Threshold Variables

Panel Threshold Regression Models with Endogenous Threshold Variables Panel Threshold Regression Models with Endogenous Threshold Variables Chien-Ho Wang National Taipei University Eric S. Lin National Tsing Hua University This Version: June 29, 2010 Abstract This paper

More information

Arma-Arch Modeling Of The Returns Of First Bank Of Nigeria

Arma-Arch Modeling Of The Returns Of First Bank Of Nigeria Arma-Arch Modeling Of The Returns Of First Bank Of Nigeria Emmanuel Alphonsus Akpan Imoh Udo Moffat Department of Mathematics and Statistics University of Uyo, Nigeria Ntiedo Bassey Ekpo Department of

More information

Regression. Oscar García

Regression. Oscar García Regression Oscar García Regression methods are fundamental in Forest Mensuration For a more concise and general presentation, we shall first review some matrix concepts 1 Matrices An order n m matrix is

More information

STAT 730 Chapter 9: Factor analysis

STAT 730 Chapter 9: Factor analysis STAT 730 Chapter 9: Factor analysis Timothy Hanson Department of Statistics, University of South Carolina Stat 730: Multivariate Data Analysis 1 / 15 Basic idea Factor analysis attempts to explain the

More information

FINANCIAL ECONOMETRICS AND EMPIRICAL FINANCE -MODULE2 Midterm Exam Solutions - March 2015

FINANCIAL ECONOMETRICS AND EMPIRICAL FINANCE -MODULE2 Midterm Exam Solutions - March 2015 FINANCIAL ECONOMETRICS AND EMPIRICAL FINANCE -MODULE2 Midterm Exam Solutions - March 205 Time Allowed: 60 minutes Family Name (Surname) First Name Student Number (Matr.) Please answer all questions by

More information

Factor Analysis Edpsy/Soc 584 & Psych 594

Factor Analysis Edpsy/Soc 584 & Psych 594 Factor Analysis Edpsy/Soc 584 & Psych 594 Carolyn J. Anderson University of Illinois, Urbana-Champaign April 29, 2009 1 / 52 Rotation Assessing Fit to Data (one common factor model) common factors Assessment

More information

Estimating AR/MA models

Estimating AR/MA models September 17, 2009 Goals The likelihood estimation of AR/MA models AR(1) MA(1) Inference Model specification for a given dataset Why MLE? Traditional linear statistics is one methodology of estimating

More information

Chapter 2:Determinants. Section 2.1: Determinants by cofactor expansion

Chapter 2:Determinants. Section 2.1: Determinants by cofactor expansion Chapter 2:Determinants Section 2.1: Determinants by cofactor expansion [ ] a b Recall: The 2 2 matrix is invertible if ad bc 0. The c d ([ ]) a b function f = ad bc is called the determinant and it associates

More information

Ross (1976) introduced the Arbitrage Pricing Theory (APT) as an alternative to the CAPM.

Ross (1976) introduced the Arbitrage Pricing Theory (APT) as an alternative to the CAPM. 4.2 Arbitrage Pricing Model, APM Empirical evidence indicates that the CAPM beta does not completely explain the cross section of expected asset returns. This suggests that additional factors may be required.

More information

On the equivalence of confidence interval estimation based on frequentist model averaging and least-squares of the full model in linear regression

On the equivalence of confidence interval estimation based on frequentist model averaging and least-squares of the full model in linear regression Working Paper 2016:1 Department of Statistics On the equivalence of confidence interval estimation based on frequentist model averaging and least-squares of the full model in linear regression Sebastian

More information

Switching Regime Estimation

Switching Regime Estimation Switching Regime Estimation Series de Tiempo BIrkbeck March 2013 Martin Sola (FE) Markov Switching models 01/13 1 / 52 The economy (the time series) often behaves very different in periods such as booms

More information

Multivariate Time Series: VAR(p) Processes and Models

Multivariate Time Series: VAR(p) Processes and Models Multivariate Time Series: VAR(p) Processes and Models A VAR(p) model, for p > 0 is X t = φ 0 + Φ 1 X t 1 + + Φ p X t p + A t, where X t, φ 0, and X t i are k-vectors, Φ 1,..., Φ p are k k matrices, with

More information

Elementary Linear Algebra

Elementary Linear Algebra Matrices J MUSCAT Elementary Linear Algebra Matrices Definition Dr J Muscat 2002 A matrix is a rectangular array of numbers, arranged in rows and columns a a 2 a 3 a n a 2 a 22 a 23 a 2n A = a m a mn We

More information

Multilevel Modeling Day 2 Intermediate and Advanced Issues: Multilevel Models as Mixed Models. Jian Wang September 18, 2012

Multilevel Modeling Day 2 Intermediate and Advanced Issues: Multilevel Models as Mixed Models. Jian Wang September 18, 2012 Multilevel Modeling Day 2 Intermediate and Advanced Issues: Multilevel Models as Mixed Models Jian Wang September 18, 2012 What are mixed models The simplest multilevel models are in fact mixed models:

More information

Illustration of the Varying Coefficient Model for Analyses the Tree Growth from the Age and Space Perspectives

Illustration of the Varying Coefficient Model for Analyses the Tree Growth from the Age and Space Perspectives TR-No. 14-06, Hiroshima Statistical Research Group, 1 11 Illustration of the Varying Coefficient Model for Analyses the Tree Growth from the Age and Space Perspectives Mariko Yamamura 1, Keisuke Fukui

More information

Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA

Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA Principle Components Analysis: Uses one group of variables (we will call this X) In

More information

1. Introduction. Hang Qian 1 Iowa State University

1. Introduction. Hang Qian 1 Iowa State University Users Guide to the VARDAS Package Hang Qian 1 Iowa State University 1. Introduction The Vector Autoregression (VAR) model is widely used in macroeconomics. However, macroeconomic data are not always observed

More information

Chapter 12 REML and ML Estimation

Chapter 12 REML and ML Estimation Chapter 12 REML and ML Estimation C. R. Henderson 1984 - Guelph 1 Iterative MIVQUE The restricted maximum likelihood estimator (REML) of Patterson and Thompson (1971) can be obtained by iterating on MIVQUE,

More information