Sparse seasonal and periodic vector autoregressive modeling


Changryong Baek (Sungkyunkwan University), Richard A. Davis (Columbia University), Vladas Pipiras (University of North Carolina)

Abstract

Seasonal and periodic vector autoregressions are two common approaches to modeling vector time series exhibiting cyclical variations. The total number of parameters in these models increases rapidly with the dimension and the order of the model, making it difficult to interpret the fitted model and calling into question the stability of the parameter estimates. To address these and other issues, two methodologies for sparse modeling are presented in this work: the first is based on regularization involving the adaptive lasso, and the second extends the approach of Davis, Zang and Zheng (2016) for vector autoregressions based on partial spectral coherences. The methods are shown to work well on simulated data, and to perform well on several examples of real vector time series exhibiting cyclical variations.

Keywords and phrases: seasonal vector autoregressive (SVAR) model, periodic vector autoregressive (PVAR) model, sparsity, partial spectral coherence (PSC), adaptive lasso, variable selection. The work of the first author was supported in part by the Basic Science Research Program of the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT & Future Planning. The third author was supported in part by an NSA grant.

1 Introduction

In this work, we introduce methodologies for sparse modeling of stationary vector ($q$-dimensional) time series data exhibiting cyclical variations. Sparse models are gaining traction in the time series literature for the same reasons that sparse (generalized) linear models are used in the traditional setting of i.i.d. errors. Such models are particularly suitable in a high-dimensional context, where the number of parameters often grows as $q^2$ (as, for example, with the vector autoregressive models considered below) and becomes prohibitively large compared to the sample size even for moderate $q$. Sparse models also ensure better interpretability of the fitted models and numerical stability of the estimates, and tend to improve prediction.

In the vector time series context, sparse modeling has been considered for the class of vector autoregressive (VAR) models:

$$X_n - \mu = A_1 (X_{n-1} - \mu) + \ldots + A_p (X_{n-p} - \mu) + \epsilon_n, \quad n \in \mathbb{Z}, \tag{1.1}$$

where $X_n = (X_{1,n}, \ldots, X_{q,n})'$ is a $q$-vector time series, $A_1, \ldots, A_p$ are $q \times q$ matrices, $\mu$ is the overall constant mean vector and the $\epsilon_n$ are white noise (WN) error terms. Regularization approaches based on the lasso and its variants were taken in Hsu, Hung and Chang (2008), Shojaie and Michailidis (2010), Song and Bickel (2011), Medeiros and Mendes (2016), Basu and Michailidis (2015), Nicholson, Matteson and Bien (2017) and Kock and Callot (2015).

Applications of such models include economics, neuroscience (e.g. functional connectivity among brain regions), biology (e.g. reconstructing gene regulatory networks from time course data) and environmental science (e.g. pollutant levels over time). As usual, the model (1.1) will be abbreviated as

$$\Phi(B)(X_n - \mu) = \epsilon_n, \quad n \in \mathbb{Z}, \tag{1.2}$$

where $\Phi(B) = I_q - A_1 B - \ldots - A_p B^p$ and $B$ is the backshift operator.

In a different approach, Davis et al. (2016) introduced an alternative two-stage procedure for sparse VAR modeling. In the first stage, all pairs of component series are ranked based on the estimated values of their partial spectral coherences (PSCs), defined as

$$\sup_{\lambda} \mathrm{PSC}^X_{jk}(\lambda) := \sup_{\lambda} \left| \frac{g^X_{jk}(\lambda)}{\sqrt{g^X_{jj}(\lambda)\, g^X_{kk}(\lambda)}} \right|, \quad j, k = 1, \ldots, q, \ j \neq k, \tag{1.3}$$

where $g^X(\lambda) = f^X(\lambda)^{-1}$, with $f^X$ being the spectral density matrix of $X$. Then, the order $p$ and the top $M$ pairs are found which minimize the BIC$(p, M)$ value, and the coefficients of the matrices $A_r$ are set to $0$ for all pairs of indices $j, k$ not included in the top $M$. In the second stage, the estimates of the remaining non-zero coefficients are ranked according to their $t$-statistic values. Again, the top $m$ coefficients are selected that minimize a suitable BIC, and the rest of the coefficients are set to $0$. As shown in Davis et al. (2016), this two-stage procedure outperforms the regular lasso. The basic idea of this approach is that small PSCs do not increase the likelihood sufficiently to warrant the inclusion of the respective coefficients of the matrices $A_r$ in the model.

We shall extend here the regularization approach based on the lasso and the approach of Davis et al. (2016) based on PSCs to sparse modeling of vector time series data exhibiting cyclical variations. The motivation is straightforward. Consider, for example, the benchmark flu trends and pollutants series studied through sparse VAR models by Davis et al. (2016) and others. Figure 1 depicts the plots of (the logs of) two of their component series with the respective sample ACFs and PACFs. The cyclical nature of the series can clearly be seen from the figure. The same holds for the other component series (not illustrated here).

Cyclical features of component series are commonly built into a larger vector model by using one of the following two approaches. A seasonal VAR model (SVAR$(p, P)$ model, for short; not to be confused with the so-called structural VAR) is one possibility, defined as

$$\Phi(B)\Phi_s(B^s)(X_n - \mu) = \epsilon_n, \quad n \in \mathbb{Z}, \tag{1.4}$$

where $\Phi(B)$ and $\epsilon_n$ are as in (1.2), $\Phi_s(B^s) = I_q - A_{s,1} B^s - \ldots - A_{s,P} B^{Ps}$ with $q \times q$ matrices $A_{s,1}, \ldots, A_{s,P}$, $\mu$ denotes the overall mean and $s$ denotes the period. This is the vector version of the multiplicative seasonal AR model proposed by Box and Jenkins (1970). Another possibility is a periodic VAR model (PVAR$(p)$ model, for short) defined as

$$\Phi_m(B)(X_n - \mu_m) = \epsilon_{m,n}, \quad n \in \mathbb{Z}, \tag{1.5}$$

where $\Phi_m(B) = I_q - A_{m,1} B - \ldots - A_{m,p} B^p$ with $q \times q$ matrices $A_{m,1}, \ldots, A_{m,p}$ which depend on the season $m = 1, \ldots, s$ wherein the time $n$ falls (that is, there are in fact $sp$ matrices $A$ of dimension $q \times q$), and $\mu_m$ refers to the seasonal mean. One could also allow $p$ to depend on the season $m = 1, \ldots, s$. Note that whereas the overall mean $\mu$ and the covariance matrix $E\epsilon_n\epsilon_n' = \Sigma$ are constant in (1.4), the mean $\mu_m$ in (1.5) and the covariance matrix $E\epsilon_{m,n}\epsilon_{m,n}' = \Sigma_m$ are allowed to depend on the season $m$.

Both seasonal and periodic VAR models are widely used. For SVAR models, including the univariate case, see Brockwell and Davis (1991) and Ghysels and Osborn (2001).

Figure 1: Top: monthly flu trend in NC. Bottom: hourly ozone levels at a CA location. The respective sample ACFs and PACFs are also shown.

These models form the basis for the U.S. Census X-12-ARIMA seasonal adjustment program. PVAR models, again with the focus on univariate series, are considered in the monographs by Ghysels and Osborn (2001), Franses and Paap (2004), Lütkepohl (2005) and Hurd and Miamee (2007). These references barely scratch the surface; the vast amount of work should not be surprising, as most economic, environmental and other time series naturally exhibit cyclical variations.

Sparse modeling is proposed below for both SVAR and PVAR models. These are two different classes of models. Both classes are considered here because of their central role in the analysis of time series with cyclical variations, and because the real time series of the flu and pollutants data discussed above are, in fact, better modeled by one of the two types of models. Indeed, for example, the one-step-ahead mean square prediction errors for the flu data are smaller for the best seasonal and periodic AR models than for the best (non-seasonal) AR model, and for the pollutants series, the prediction errors are smaller for periodic AR models when longer horizons are considered. In this work, we thus decide between SVAR and PVAR models just based on how well they fit the data and perform in prediction. For more systematic approaches to choosing between seasonal and periodic models, see, e.g., Lund and references therein.

The regularization approach to SVAR and PVAR models is based on the adaptive lasso, and is somewhat standard. The regular lasso of Tibshirani (1996) is well known to overestimate the number of non-zero coefficients (e.g. Bühlmann and van de Geer (2011)). The adaptive lasso of Zou (2006) corrects this tendency by estimating fewer non-zero coefficients. While the application of the adaptive lasso to PVAR models is straightforward, a linearization and an iterative version of the adaptive lasso are used for SVAR models, which are nonlinear by their construction.

Our extension of the Davis et al. (2016) approach based on PSCs to sparse SVAR and PVAR models involves the $(qs)$-vector series

$$Y_t = \big(X'_{(t-1)s+1}, X'_{(t-1)s+2}, \ldots, X'_{(t-1)s+s}\big)',$$

where $s$ is the period as above and $t$ now refers to a cycle. For the PVAR model, the series $\{Y_t\}$ is now (second-order) stationary. The Davis et al. (2016) approach can then be applied, though not directly, since a VAR model for $\{Y_t\}$ is too complex for our purposes. For SVAR models, it is natural to first estimate a sparse seasonal filter $\Phi_s(B^s)$ by considering the between-period (between-cycle) series $X_{(t-1)s+m}$ as a series in $t$, for fixed season $m = 1, \ldots, s$. A non-trivial and new issue is how to deal with the results across the seasons $m = 1, \ldots, s$. Once a sparse seasonal filter $\Phi_s(B^s)$ is estimated, its seasonal effect on $X_n$ can be removed, and the between-season filter $\Phi(B)$ can then be estimated sparsely by following the approach of Davis et al. (2016).

In the simulations and data applications presented below, the adaptive lasso and the PSC approaches are found to perform similarly, but the latter approach provides great computational advantages, especially as the dimension increases. In the data applications, both of these sparse modeling approaches outperform non-seasonal (non-periodic) VAR alternatives, as well as non-sparse seasonal (periodic) models.

The rest of the paper is organized as follows. Preliminaries on vector time series, partial spectral coherences, SVAR and PVAR models can be found in Section 2. Our approaches to fitting sparse SVAR and PVAR models are presented in Section 3. Finite sample properties of the proposed methods are studied in Section 4. An application to two real data sets is given in Section 5. Conclusions are in Section 6. Finally, Appendices A and B contain details on several estimation methods employed for our sparse SVAR and PVAR modeling.

2 Preliminaries

We focus throughout on $q$-vector time series models $X_n = (X_{1,n}, \ldots, X_{q,n})'$, $n \in \mathbb{Z}$, with component univariate series $\{X_{j,n}\}_{n \in \mathbb{Z}}$, $j = 1, \ldots, q$. The prime above indicates transpose. If the series $X = \{X_n\}_{n \in \mathbb{Z}}$ is second-order stationary, it has a constant mean $EX_n =: \mu_X$ and the autocovariance function

$$\mathrm{Cov}(X_n, X_{n+h}) = EX_nX'_{n+h} - EX_n\,EX'_{n+h} =: \gamma^X(h), \quad h \in \mathbb{Z},$$

does not depend on $n$. The spectral density, if it exists, is a complex- and matrix-valued function $f^X(\lambda)$, $\lambda \in (-\pi, \pi]$, characterized by

$$\int_{-\pi}^{\pi} e^{ih\lambda} f^X(\lambda)\, d\lambda = \gamma^X(h), \quad h \in \mathbb{Z}.$$

For more information, see Hannan (1970), Lütkepohl (2005) and Brockwell and Davis (1991).

As described in Section 1, in the approach of Davis et al. (2016), the quantity of interest in the initial step is the partial spectral coherence (PSC) between two component series $X_{j,n}$ and $X_{k,n}$, defined as (see (1.3)):

$$\mathrm{PSC}^X_{jk}(\lambda) = \frac{g^X_{jk}(\lambda)}{\sqrt{g^X_{jj}(\lambda)\, g^X_{kk}(\lambda)}}, \quad \lambda \in (-\pi, \pi], \tag{2.1}$$

where $g^X(\lambda) = (g^X_{jk}(\lambda))_{j,k=1,\ldots,q}$ satisfies $g^X(\lambda) = f^X(\lambda)^{-1}$, supposing the latter exists. PSCs are related to pairwise conditional correlations of the series $X$. Denote by $X_{-jk,n}$ the $(q-2)$-vector series obtained from $X_n$ by removing its $j$th and $k$th components $X_{j,n}$ and $X_{k,n}$. Set

$$\{D^{\mathrm{opt}}_{j,m} \in \mathbb{R}^{q-2},\ m \in \mathbb{Z}\} = \operatorname*{argmin}_{D_{j,m},\, m \in \mathbb{Z}} E\Big(X_{j,n} - \sum_{m} D'_{j,m} X_{-jk,n-m}\Big)^2 \tag{2.2}$$

and consider the residual series $\epsilon_{j,n} = X_{j,n} - \sum_{m} D^{\mathrm{opt}\,\prime}_{j,m} X_{-jk,n-m}$. Define similarly the residual series $\{\epsilon_{k,n}\}$.
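To make the PSC computation concrete, the following sketch (our own illustration, not the authors' code) estimates the sup-PSCs in (2.1) for all pairs of components from a Daniell-smoothed periodogram; the function name and the smoothing span are assumptions:

    import numpy as np

    def sup_psc(X, span=11):
        # Estimate sup_lambda |PSC_jk(lambda)| for all pairs from the data
        # matrix X (N x q), via a Daniell-smoothed periodogram.
        N, q = X.shape
        Xc = X - X.mean(axis=0)
        d = np.fft.rfft(Xc, axis=0)                  # DFT of each component series
        I = np.einsum('fj,fk->fjk', d, np.conj(d)) / (2.0 * np.pi * N)
        kernel = np.ones(span) / span                # Daniell (moving-average) window
        f = np.empty_like(I)
        for j in range(q):                           # smooth each cross-periodogram
            for k in range(q):
                f[:, j, k] = np.convolve(I[:, j, k], kernel, mode='same')
        psc = np.zeros((q, q))
        for fm in f[1:]:                             # skip the zero frequency
            g = np.linalg.inv(fm)                    # g(lambda) = f(lambda)^{-1}
            denom = np.sqrt(np.abs(np.outer(np.diag(g), np.diag(g))))
            psc = np.maximum(psc, np.abs(g) / denom) # running sup over frequencies
        np.fill_diagonal(psc, 0.0)
        return psc

The returned matrix can then be used to rank the pairs $(j, k)$ by their estimated $\sup_\lambda|\mathrm{PSC}^X_{jk}(\lambda)|$.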

The conditional correlation between $\{X_{j,n}\}$ and $\{X_{k,n}\}$ is characterized by the correlation between the two residual series $\{\epsilon_{j,n}\}$ and $\{\epsilon_{k,n}\}$. The component series $\{X_{j,n}\}$ and $\{X_{k,n}\}$ are called conditionally uncorrelated when $\mathrm{Cov}(\epsilon_{j,n+m}, \epsilon_{k,n}) = 0$ for any lag $m \in \mathbb{Z}$. It can be shown (Davis et al. (2016)) that $\{X_{j,n}\}$ and $\{X_{k,n}\}$, $j \neq k$, are

conditionally uncorrelated if and only if $\mathrm{PSC}^X_{jk}(\lambda) = 0$ for all $\lambda \in (-\pi, \pi]$. (2.3)

In sparse modeling of the VAR time series (1.1), Davis et al. (2016) set

$$A_r(j,k) = A_r(k,j) = 0, \quad r = 1, \ldots, p, \tag{2.4}$$

with $A_r(j,k)$ denoting the entries of the matrix $A_r$, whenever

$$\mathrm{PSC}^X_{jk}(\lambda) = 0, \quad \text{for all } \lambda \in (-\pi, \pi]. \tag{2.5}$$

From a practical perspective, the rule (2.4) is used when the corresponding PSCs in (1.3) are small. Strictly speaking, the relation between (2.4) and (2.5) is not true in either direction. But using (2.5) (or small PSCs in practice) to select a sparse model according to (2.4) seems to work very well in practice. As suggested in Davis et al. (2016), this is because, if the PSCs are near zero, the corresponding AR coefficients do not increase the likelihood sufficiently to merit their inclusion in the model based on BIC.

A seasonal VAR model is defined as (see (1.4))

$$(I_q - A_1B - \ldots - A_pB^p)(I_q - A_{s,1}B^s - \ldots - A_{s,P}B^{sP})X_n = \epsilon_n, \quad n \in \mathbb{Z}, \tag{2.6}$$

where $s$ is the period, $A_1, \ldots, A_p, A_{s,1}, \ldots, A_{s,P}$ are $q \times q$ matrices, $\{\epsilon_n\}$ is a white noise series with $E\epsilon_n\epsilon_n' = \Sigma$, and we assume for simplicity that the overall mean is $\mu_X = EX_n = 0$. The model (2.6) will be abbreviated as in (1.4) and denoted SVAR$(p, P)_s$. We suppose that it is stationary and causal, that is, $\det(I_q - \sum_{r=1}^{p} A_rz^r) \neq 0$ and $\det(I_q - \sum_{R=1}^{P} A_{s,R}z^{Rs}) \neq 0$ for $z \in \mathbb{C}$, $|z| \leq 1$. The elements of the matrices $A_r$ and $A_{s,R}$ are denoted $A_r(j,k)$ and $A_{s,R}(j,k)$, $j, k = 1, \ldots, q$, respectively.

A periodic VAR model is defined as (see (1.5))

$$(I_q - A_{m,1}B - \ldots - A_{m,p_m}B^{p_m})(X_n - \mu_m) = \epsilon_{m,n}, \quad n \in \mathbb{Z}, \tag{2.7}$$

where $A_{m,r}$, $r = 1, \ldots, p_m$, $m = 1, \ldots, s$, are $q \times q$ matrices, $s$ is the period, $n = m + ts$ for some cycle $t \in \mathbb{Z}$ so that $m$ denotes which season from $1, \ldots, s$ the time instance $n$ belongs to, $\{\epsilon_{m,n}\}_{n \in \mathbb{Z}}$, $m = 1, \ldots, s$, are uncorrelated white noise series with $E\epsilon_{m,n}\epsilon_{m,n}' = \Sigma_m$, and $\mu_m$, $m = 1, \ldots, s$, denote the seasonal means. The model (2.7) will be abbreviated as in (1.5) and denoted PVAR$(p_1, \ldots, p_s)$ (or PVAR$(p)_s$ when $p_1 = \ldots = p_s = p$). Note that a PVAR model is not stationary. But, as already indicated in Section 1, the $(qs)$-vector series

$$Y_t = \big(X'_{(t-1)s+1}, X'_{(t-1)s+2}, \ldots, X'_{ts}\big)', \quad t \in \mathbb{Z}, \tag{2.8}$$

is (second-order) stationary.
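The stacking in (2.8) is a simple reshaping operation. A minimal sketch (helper name is ours), assuming the series is stored with one row per time point:

    import numpy as np

    def stack_cycles(X, s):
        # X: (N x q) array, one row per time point; returns a (T x qs) array
        # whose row t is Y_t = (X'_{(t-1)s+1}, ..., X'_{ts})'.
        N, q = X.shape
        T = N // s                          # number of complete cycles
        return X[:T * s].reshape(T, s * q)

For example, hourly data with a daily period $s = 24$ becomes a $(24q)$-variate day-indexed series, which is second-order stationary under a PVAR model.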

The elements of the matrices $A_{m,r}$ are denoted $A_{m,r}(j,k)$, $j, k = 1, \ldots, q$.

We shall not provide any preliminaries regarding the lasso and its variants. The interested reader can consult any of the available textbooks on the subject, including Hastie, Tibshirani and Friedman (2009), Bühlmann and van de Geer (2011) and Giraud (2014).

3 Fitting sparse SVAR and PVAR models

In this section, we propose methods to fit sparse SVAR and PVAR models by using PSCs or regularization. The suggested approaches based on PSCs extend the two-stage method for sparse VAR models considered by Davis et al. (2016). The regularization approaches also extend those previously used for sparse VAR models. For notational convenience, we assume throughout this section that the observed $q$-vector time series is $X_1, \ldots, X_N$ with sample size

$$N = Ts, \tag{3.1}$$

that is, an integer multiple of $T$ cycles of length $s$, where $s$ is the period. Each cycle has $s$ seasons, denoted $m = 1, \ldots, s$.

3.1 Sparse SVAR models

3.1.1 Two-stage approach based on PSCs

The idea to fit sparse SVAR models based on PSCs is straightforward. The seasonal filter $\Phi_s(B^s)$ in (1.4) (or (2.6)) can be thought of as accounting for the between-cycle dependence structure, that is, the dependence structure in the series

$$Y^{(m)}_t = X_{(t-1)s+m}, \quad t = 1, \ldots, T, \tag{3.2}$$

where $m = 1, \ldots, s$ is a fixed season. The sparse filter $\Phi_s(B^s)$ can then be estimated by following the approach of Davis et al. (2016) applied to the series (3.2). One new issue arising here is how to combine the information across the different seasons $m$. Once the sparse seasonal filter is estimated, its seasonal effect on $X_n$ can be removed, and the between-season filter $\Phi(B)$ can then be estimated sparsely by again following the approach of Davis et al. (2016). We next give the details of the described method.

First, for each season $m = 1, \ldots, s$, calculate the PSCs for the $\binom{q}{2}$ pairs of component series of $Y^{(m)}$ in (3.2). Denote by $R^{(m)}(j,k)$ the rank of the PSC for the pair $(j, k)$. We then define the rank of the seasonal PSC for the pair $(j, k)$ as

$$r(j,k) = \sum_{m=1}^{s} R^{(m)}(j,k). \tag{3.3}$$

We will use the $r(j,k)$'s to rank the conditional correlations in the between-cycle time series $Y^{(m)}_t$ in (3.2); that is, the top seasonal ranks will indicate the strongest conditional between-cycle correlations. Perhaps somewhat surprisingly, we also investigated the possibility of considering the average PSCs across the $s$ seasons and then ranking them, but this approach did not lead to satisfactory results in practice.
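The rank aggregation in (3.3) can be sketched as follows (an illustration under our own naming conventions, with sup-PSC matrices as produced by a helper like `sup_psc` above assumed given per season):

    import numpy as np
    from scipy.stats import rankdata

    def seasonal_ranks(psc_by_season):
        # psc_by_season: list of s (q x q) matrices of sup-PSCs, one per season.
        q = psc_by_season[0].shape[0]
        iu = np.triu_indices(q, k=1)            # the q(q-1)/2 pairs (j, k)
        per_season = [rankdata(-psc[iu]) for psc in psc_by_season]  # rank 1 = largest PSC
        r = np.sum(per_season, axis=0)          # r(j, k) in (3.3)
        order = np.argsort(r)                   # smallest total rank = strongest pair
        return [(int(iu[0][i]), int(iu[1][i])) for i in order]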

Second, having the seasonal ranks of the conditional correlations, we proceed as in Davis et al. (2016). A sparse SVAR filter $\Phi_s(B^s)$ is fitted as follows. For the $M$ pairs $(j,k)$ with the top $M$ seasonal ranks, the coefficients $A_{s,R}(j,k)$ and $A_{s,R}(k,j)$ of the matrices $A_{s,R}$, $R = 1, \ldots, P$, can be non-zero. The $(j,k)$ and $(k,j)$ coefficients are set to zero for all other $\binom{q}{2} - M$ pairs $(j,k)$ with smaller ranks. The order $P$ and the top $M$ pairs are chosen by minimizing the following BIC over a prespecified range $(P, M) \in \mathcal{P} \times \mathcal{M}$:

$$\mathrm{BIC}_S(P, M) = \sum_{m=1}^{s} \big(-2\log L(\hat A^{(m)}_{s,1}, \ldots, \hat A^{(m)}_{s,P})\big) + (q + 2M)P\log N,$$

where $L(\hat A^{(m)}_{s,1}, \ldots, \hat A^{(m)}_{s,P})$ is the Gaussian maximum likelihood of the VAR$(P)$ model based on the series $Y^{(m)}_t$ in (3.2), assuming the model is sparse (with $2M$ non-zero off-diagonal elements in the matrices $A^{(m)}_{s,R}$), and $(q + 2M)P$ is the number of non-zero coefficients. With the sparse seasonal VAR filter $\Phi_s(B^s)$ chosen, consider the deseasonalized series

$$Z_n = (I_q - \bar A_{s,1}B^s - \ldots - \bar A_{s,\hat P}B^{\hat Ps})X_n, \tag{3.4}$$

where $\bar A_{s,R} = \frac{1}{s}\sum_{m=1}^{s}\hat A^{(m)}_{s,R}$, $R = 1, \ldots, \hat P$, are the average estimators of $A_{s,R}$ across the $s$ seasons. The matrices $\bar A_{s,R}$ have only $2\hat M$ non-zero off-diagonal elements. The rest of the method is essentially the Davis et al. (2016) approach applied to the series $\{Z_n\}$ in (3.4).

Third (this is the first stage of the Davis et al. (2016) method), calculate and rank the $\binom{q}{2}$ PSCs of the series $\{Z_n\}$ in (3.4). Select the order $p$ and the top $m$ pairs (with the top $m$ PSCs) to minimize the following BIC over $(p, m) \in \mathcal{P} \times \mathcal{M}$:

$$\mathrm{BIC}(p, m) = -2\log L(\hat A_1, \ldots, \hat A_p) + (q + 2m)p\log(N - \hat Ps),$$

where $L(\hat A_1, \ldots, \hat A_p)$ is the Gaussian maximum likelihood of the VAR$(p)$ model for the series $Z_n$ in (3.4), assuming the model is sparse (with $2m$ non-zero off-diagonal elements in the matrices $A_r$), and $(q + 2m)p$ is the number of non-zero coefficients.

Fourth and finally, we adapt the second stage of the Davis et al. (2016) approach as follows. We re-estimate the sparse matrices $A_1, \ldots, A_{\hat p}$ and $A_{s,1}, \ldots, A_{s,\hat P}$ by using the estimated GLS procedure described in Appendix A. The $t$-statistic for a non-zero SVAR coefficient is

$$t(\tilde\gamma_C(i)) = \frac{\tilde\gamma_C(i)}{\widehat{\mathrm{s.e.}}(\tilde\gamma_C(i))}, \tag{3.5}$$

where $\gamma = R\alpha$ with a constraints matrix $R$ (imposing sparsity), $\gamma := \mathrm{vec}(\hat A_1, \ldots, \hat A_{\hat p}, \hat A_{s,1}, \ldots, \hat A_{s,\hat P})$ is the $q^2(\hat p + \hat P)$-vector of all coefficients, and $\tilde\gamma_C$ is the estimated GLS estimator for the constrained SVAR. By ranking the absolute values of the $t$-statistics $t(\tilde\gamma_C(i))$ from the highest to the lowest, we finally select the top $r$ non-zero coefficients from the SVAR by finding the smallest BIC value, given by

$$\mathrm{BIC}_C(r) = -2\log L_C(\tilde\alpha_C) + r\log N,$$

where $L_C(\tilde\alpha_C)$ is the likelihood evaluated at the estimated GLS estimator $\tilde\alpha_C$ obtained by selecting the top $r$ non-zero coefficients from the ranked $t$-statistics. In fact, to give a more balanced treatment to the non-seasonal and seasonal parts of the model, it is natural to consider the ranking of the $t$-statistics separately for the non-seasonal and seasonal coefficients, and to take the top $r$ successively from the two lists of coefficients. We found this ranking to perform better and use it throughout the paper.
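The deseasonalizing filter in (3.4) is a plain linear filter at seasonal lags. A minimal sketch, assuming the averaged estimates $\bar A_{s,R}$ are stored as a list of matrices:

    import numpy as np

    def deseasonalize(X, A_bar, s):
        # Z_n = X_n - sum_R A_bar[R] X_{n-Rs}, defined for n > P*s;
        # X is (N x q), A_bar is a list of P (q x q) matrices.
        N, q = X.shape
        P = len(A_bar)
        Z = X[P * s:].copy()
        for R, A in enumerate(A_bar, start=1):
            Z = Z - X[P * s - R * s : N - R * s] @ A.T
        return Z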

3.1.2 Adaptive lasso approach

The adaptive lasso procedure of Zou (2006) applies to linear models. In order to apply it to the nonlinear SVAR models, we linearize the SVAR model (2.6) around $A_r = A^0_r$ and $A_{s,R} = A^0_{s,R}$ as in (A.3) of Appendix A:

$$Y = X\gamma + \epsilon, \tag{3.6}$$

where $\gamma = \mathrm{vec}(A_1, \ldots, A_p, A_{s,1}, \ldots, A_{s,P}) = (\gamma_1, \ldots, \gamma_{(p+P)q^2})'$, $Y = \mathrm{vec}(Y_1, \ldots, Y_N)$ with $Y_n$ given by (A.4), and $X$ is the design matrix determined by (A.3). For fixed $A^0_r$ and $A^0_{s,R}$, the adaptive lasso solution for SVAR$(p, P)$ in the linearized form (3.6) is given by

$$\operatorname*{argmin}_{\gamma} \frac{1}{N}\|Y - X\gamma\|_2^2 + \lambda_l\sum_{j=1}^{(p+P)q^2} w^{(l)}_j|\gamma_j|. \tag{3.7}$$

The estimation (3.7) is applied iteratively in $l$, where $A^0_r = \hat A_r(l)$ and $A^0_{s,R} = \hat A_{s,R}(l)$ are the estimates of $A_r$ and $A_{s,R}$ from the previous step $l$, and the weights are $w^{(l)}_j = |\hat\gamma^{(l-1)}_j|^{-1}$, with $\hat\gamma^{(l-1)}$ the estimator of $\gamma$ from the previous stage $l-1$. One additional consideration in applying the adaptive lasso to SVAR is that the components of the innovations $\epsilon_n$ in the SVAR model are not identically distributed. To incorporate possible correlations in $\epsilon_n$, we modify the adaptive lasso as

$$\hat\gamma^{(l)}_{al} = \operatorname*{argmin}_{\gamma} \frac{1}{N}\big\|(I_N \otimes \hat\Sigma^{-1/2}_{(l)})Y - (I_N \otimes \hat\Sigma^{-1/2}_{(l)})X\gamma\big\|_2^2 + \lambda_l\sum_{j=1}^{(p+P)q^2} w^{(l)}_j|\gamma_j|, \tag{3.8}$$

where $\hat\Sigma_{(l)}$ is the estimated covariance matrix of the innovations $\epsilon_n$ from the previous step $l$. Cross-validation is used to select the tuning penalty parameter $\lambda_l$. Finally, the iterative procedure is stopped when $\|\hat\Sigma_{(l+1)} - \hat\Sigma_{(l)}\| \leq \varepsilon$ for some predetermined $\varepsilon$. We have used the original lasso estimator, with its tuning parameter selected by cross-validation, as the initial estimator $\hat\gamma^{(0)}$. The order of the SVAR model is chosen by selecting

$$(\hat p, \hat P) = \operatorname*{argmin}_{(p,P) \in \mathcal{P} \times \mathcal{P}} \mathrm{CV}(p, P),$$

where $\mathrm{CV}(p, P)$ is the average cross-validation error.
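One iteration of the whitened adaptive lasso (3.8) can be sketched as below (an illustration, not the authors' implementation). The standard rescaling trick reduces the weighted penalty to a plain lasso, here solved with scikit-learn; note that scikit-learn's objective scales the squared error by $1/(2n)$ rather than $1/N$, which only reparameterizes $\lambda_l$. All variable names are assumptions:

    import numpy as np
    from sklearn.linear_model import Lasso

    def adaptive_lasso_step(Y, Xd, Sigma, gamma_prev, lam, N, eps=1e-8):
        # Whitening matrix W = inv(L) with L L' = Sigma, so that W eps_n has
        # identity covariance (plays the role of Sigma^{-1/2} in (3.8)).
        L = np.linalg.cholesky(Sigma)
        W = np.linalg.inv(L)
        q = Sigma.shape[0]
        Yw = (W @ Y.reshape(N, q).T).T.ravel()   # (I_N kron W) Y
        Xw = np.kron(np.eye(N), W) @ Xd          # (I_N kron W) X
        w = 1.0 / (np.abs(gamma_prev) + eps)     # weights w_j = 1/|gamma_j^(l-1)|
        Xs = Xw / w                              # rescale columns: adaptive -> plain lasso
        fit = Lasso(alpha=lam, fit_intercept=False).fit(Xs, Yw)
        return fit.coef_ / w                     # undo the rescaling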

3.2 Sparse PVAR models

3.2.1 Two-stage approach based on PSCs

Recall from Section 2 that PVAR models can be cast into the framework of stationary time series by considering the $(qs)$-vector time series $\{Y_t\}$ in (2.8). Moreover, the series $\{Y_t\}$ has a VAR representation (see, e.g., Franses and Paap (2004) in the case of quarterly data). The VAR representation, however, is a nonlinear and involved transformation of the PVAR coefficients, making a direct application of the Davis et al. (2016) approach not plausible. For the reader less familiar with the subject, the VAR representation of the series $\{Y_t\}$ is more complex because, for example, a component of $Y_t$ in a given season is regressed on components of $Y_t$ in earlier seasons of the same cycle $t$.

To fit a sparse PVAR model, we can nevertheless proceed straightforwardly by following the same principle as in Davis et al. (2016), described around (2.3)-(2.5). That is, if the PSC between two component series of $\{Y_t\}$ is zero (or small), we would set the corresponding coefficient in the PVAR representation between the two components to zero, even if it accounts for the regression of one on the other in the same (or any) cycle $t$. Before expressing this rule in the general case, we illustrate it through an example. Consider a PVAR(1) model with period $s = 4$ and dimension $q = 2$. Index the components of $Y_t$ by $(1,1), (1,2), (2,1), (2,2), (3,1), (3,2), (4,1)$ and $(4,2)$, where the first index refers to the season $m = 1, 2, 3, 4$ and the second to the dimension $j = 1, 2$. Table 1 presents the coefficients of the matrices $A_{m,1}$, $m = 1, 2, 3, 4$, in the PVAR(1) model between two component series in the above indexing, where the indices of the response and the regressor are given, respectively, on the left and on the right.

Table 1: Corresponding PVAR(1) coefficients in the case of period $s = 4$ and dimension $q = 2$. Left: the indices $(m, j)$ of the response variable. Right: the coefficient attached to each regressor variable with indices $(l, k)$; for the rows with $m = 1$, the regressors belong to the previous cycle $t-1$.

(1,1): $A_{1,1}(1,1)$ on $(4,1)$, $A_{1,1}(1,2)$ on $(4,2)$
(1,2): $A_{1,1}(2,1)$ on $(4,1)$, $A_{1,1}(2,2)$ on $(4,2)$
(2,1): $A_{2,1}(1,1)$ on $(1,1)$, $A_{2,1}(1,2)$ on $(1,2)$
(2,2): $A_{2,1}(2,1)$ on $(1,1)$, $A_{2,1}(2,2)$ on $(1,2)$
(3,1): $A_{3,1}(1,1)$ on $(2,1)$, $A_{3,1}(1,2)$ on $(2,2)$
(3,2): $A_{3,1}(2,1)$ on $(2,1)$, $A_{3,1}(2,2)$ on $(2,2)$
(4,1): $A_{4,1}(1,1)$ on $(3,1)$, $A_{4,1}(1,2)$ on $(3,2)$
(4,2): $A_{4,1}(2,1)$ on $(3,1)$, $A_{4,1}(2,2)$ on $(3,2)$

For example, in this PVAR(1) model, the component series $Y_{(2,1),t}$ in season 2 is regressed on the component series $Y_{(1,1),t}$ and $Y_{(1,2),t}$ in season 1 of the same cycle $t$, with the respective coefficients $A_{2,1}(1,1)$ and $A_{2,1}(1,2)$. Similarly, $Y_{(1,1),t}$ in season 1 is regressed on $Y_{(4,1),t-1}$ and $Y_{(4,2),t-1}$ in season 4 of the previous cycle $t-1$, with the respective coefficients $A_{1,1}(1,1)$ and $A_{1,1}(1,2)$. Then, for example, if the PSC between the component series $\{Y_{(2,1),t}\}$ and $\{Y_{(1,2),t}\}$ is small, we would set the coefficient $A_{2,1}(1,2)$ to zero, even if the regression is within the same cycle $t$. Likewise, if the PSC between $\{Y_{(1,1),t}\}$ and $\{Y_{(4,2),t}\}$ is small, we would set the coefficient $A_{1,1}(1,2)$ to zero.

In general, we index the components of the vector $Y_t$ through $(m, j)$, where $m = 1, \ldots, s$ refers to the season and $j = 1, \ldots, q$ to the dimension of the $q$-vector time series $X_n$. That is,

$$Y_t = \mathrm{vec}\big(X_{(t-1)s+1}, X_{(t-1)s+2}, \ldots, X_{(t-1)s+s}\big) = \big(Y_{(1,1),t}, \ldots, Y_{(1,q),t}, Y_{(2,1),t}, \ldots, Y_{(2,q),t}, \ldots, Y_{(s,1),t}, \ldots, Y_{(s,q),t}\big)'.$$

The periodic PSC between the component series $\{Y_{(m,j),t}\}$ and $\{Y_{(l,k),t}\}$ is $\sup_{\lambda}|\mathrm{PSC}^Y_{(m,j),(l,k)}(\lambda)|$, $m, l = 1, \ldots, s$, $j, k = 1, \ldots, q$, with $(m, j) \neq (l, k)$. The pairs of component series are ranked according to their PSCs.

At the first stage, as in the VAR and SVAR models, we always take the diagonal coefficients of the PVAR model to be non-zero. These are the coefficients $A_{m,r}(j,j)$, $r = s, 2s, \ldots$, that is, the coefficients in the PVAR regression when $Y_{(m,j),t}$ is regressed on its own values $Y_{(m,j),t-1}, Y_{(m,j),t-2}, \ldots$ in the previous cycles $t-1, t-2, \ldots$. If a pair $(m, j)$ and $(l, k)$, with $m \geq l$, is included in the PVAR model based on the rank of its PSC, the following coefficients are then naturally set to non-zero: if $m > l$, the coefficients $A_{m,r}(j,k)$, $r = m-l, (m-l)+s, \ldots$, and, if $m = l$, the coefficients $A_{m,r}(j,k)$, $r = s, 2s, \ldots$.

For a possible range of $(p, m) \in \mathcal{P} \times \mathcal{M}$, the order $p$ and the number $m$ of non-zero coefficients in the PVAR model are selected by minimizing the BIC value

$$\mathrm{BIC}(p, m) = -2\log L_P(\hat A_{1,1}, \ldots, \hat A_{s,p}) + m\log N, \tag{3.9}$$

where $L_P$ is the likelihood computed with the GLS estimates $\hat A_{1,1}, \ldots, \hat A_{s,p}$ for the constrained PVAR model. The calculation of the GLS estimators is detailed in Appendix B. The refinement step can also be applied similarly to PVAR: by ranking the absolute values of the $t$-statistics for the non-zero PVAR coefficients, the top $r$ non-zero coefficients are finally selected through the BIC criterion, as in Davis et al. (2016).
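The lag rule just described can be expressed compactly in code. The sketch below (function name is ours; it covers the cases $m \geq l$ stated above) lists the lags $r$ whose coefficients $A_{m,r}(j,k)$ are allowed to be non-zero for an included pair:

    def nonzero_lags(m, l, s, p):
        # Lags r for an included pair (m, j), (l, k) with m >= l:
        # r = m - l, (m - l) + s, ... if m > l, and r = s, 2s, ... if m = l
        # (same season: previous cycles only); p is the autoregressive order.
        start = (m - l) if m > l else s
        return list(range(start, p + 1, s))

For the PVAR(1) example with $s = 4$, `nonzero_lags(2, 1, 4, 1)` returns `[1]`, matching the coefficient $A_{2,1}(j,k)$ in Table 1.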

3.2.2 Adaptive lasso approach

An adaptive lasso procedure for PVAR is straightforward, applying the lasso to each season. Let $X^*_n = X_n - \mu_m$ be the seasonally centered observations and set $X^* := (X^*_1, \ldots, X^*_N)$. Let also $Q_m$, $m = 1, \ldots, s$, be the $N \times T$ matrix operator extracting the observations falling into the $m$th season from the indices $1, \ldots, N$. Then, the PVAR model for the $m$th season can be written as

$$Y_m = B_mU_m + Z_m, \tag{3.10}$$

where $Y_m = X^*Q_m$, $B_m = (A_{m,1}, \ldots, A_{m,p_m})$, $U_m = UQ_m$ and $Z_m = ZQ_m$, with $U = (U_1, \ldots, U_N)$, $Z = (\epsilon_1, \ldots, \epsilon_N)$ and $U_n = \mathrm{vec}(X^*_{n-1}, X^*_{n-2}, \ldots, X^*_{n-p_m})$. Vectorizing (3.10) gives the linear model for PVAR: $y_m = (U_m' \otimes I_q)\beta_m + z_m$ with $y_m = \mathrm{vec}(Y_m)$, $\beta_m = \mathrm{vec}(B_m)$ and $z_m = \mathrm{vec}(Z_m)$. Similarly to the case of SVAR, possible correlations in $\epsilon_{m,n}$ can be incorporated in the adaptive lasso by using the estimated covariance matrix $\hat\Sigma_{(l)}$. This leads to the adaptive lasso estimator

$$\hat\beta^{(l)}_m = \operatorname*{argmin}_{\beta_m} \frac{1}{T}\big\|(I_T \otimes \hat\Sigma^{-1/2}_{(l)})y_m - (U_m' \otimes \hat\Sigma^{-1/2}_{(l)})\beta_m\big\|_2^2 + \lambda_l\sum_{j=1}^{p_mq^2} w^{(l)}_j|\beta_{m,j}|, \tag{3.11}$$

where $\beta_{m,j}$ represents the $j$th component of $\beta_m$ and the weights are given by $w^{(l)}_j = |\hat\beta^{(l-1)}_{m,j}|^{-1}$. The estimation (3.11) is iterated over $l$. For the $l$th iteration, the covariance matrix is obtained as

$$\hat\Sigma_{(l)} = \frac{1}{T}\big(Y_m - \hat B^{(l-1)}_mU_m\big)\big(Y_m - \hat B^{(l-1)}_mU_m\big)',$$

where $\hat B^{(l-1)}_m$ is the estimator from the previous step $l-1$. Cross-validation is used to select the tuning penalty parameter $\lambda_l$, and the iterations are stopped when the covariance matrix estimate changes by less than a specified margin of error. We used the original lasso estimator, tuned by cross-validation, as the initial estimator. The order of the PVAR model is selected by finding the orders that minimize the average cross-validation error, as in the SVAR model selection.

4 Finite sample properties

In this section, we study the finite sample properties of the proposed methods through a simulation study.

4.1 Sparse SVAR models

We examine here the performance of our proposed two-stage and adaptive lasso procedures for sparse SVAR models. Consider an SVAR(1,1) model with period $s$ given by

$$(I_q - A_1B)(I_q - A_{s,1}B^s)X_n = \epsilon_n \tag{4.1}$$

for specified sparse coefficient matrices $A_1$ and $A_{s,1}$, in which most of the entries are set to zero. To indicate the number $k$ of non-zero coefficients, we also write the model as sparse SVAR(1,1;$k$). A sequence of Gaussian i.i.d. innovations $\{\epsilon_n\}$ with zero mean and a covariance matrix $\Sigma$ parameterized by a noise level $\delta$ is considered, for three different values of $\delta$. The order of the SVAR$(p, P)$ model is searched within a pre-specified range of $p$ and $P$. The sample size is $N = Ts$, and all results are based on the simulation replications.

We first evaluate the performance of our proposed two-stage approach based on PSCs and the adaptive lasso (A-LASSO) approach in Table 2, by considering the average values (over the replications) of the estimated orders $\hat p$, $\hat P$, the numbers of non-zero coefficients at the two stages, and the bias, variance and MSE. Observe from the table that our proposed methods find the correct order of the SVAR$(p, P)$ model in all the cases considered. (The number of non-zero coefficients targeted at stage 1 based on PSCs includes the diagonal and symmetric entries.) It can also be seen that the two-stage approach based on PSCs finds the non-zero coefficients reasonably well at both stages. For the adaptive lasso approach, first observe that it also estimates the orders $p$ and $P$ correctly.

Table 2: Estimated orders $\hat p$ and $\hat P$, with the numbers of non-zero coefficients at each stage, and the bias, variance and MSE for the sparse SVAR(1,1) model, for the PSC and A-LASSO approaches at each noise level $\delta$.

Figure 2: The coefficients of the stacked matrices $A_1$ and $A_{s,1}$ for the simulated sparse SVAR(1,1) model (left), the relative frequency of selecting the non-zero coefficients (middle) and the estimated coefficients (right) for the two-stage approach based on PSCs, at two noise levels $\delta$.

Figure 3: The coefficients of the stacked matrices $A_1$ and $A_{s,1}$ for the simulated sparse SVAR(1,1) model (left), the relative frequency of selecting the non-zero coefficients (middle) and the estimated coefficients (right) for the adaptive lasso approach, at two noise levels $\delta$.

The adaptive lasso performs quite similarly to the two-stage approach based on PSCs in terms of MSE, but it has slightly larger bias than the PSC approach and tends to find a larger number of non-zero coefficients. The sparse SVAR modeling based on both PSCs and the adaptive lasso works successfully in this simulation, but the PSC approach performs slightly better and is computationally less demanding than the adaptive lasso. When the signal-to-noise ratio parameter $\delta$ is larger, our method tends to find a sparser model. This can be observed more clearly in Figures 2 and 3.

The left panels of Figures 2 and 3 represent the true coefficients in (4.1), where the top matrix is $A_1$ and the bottom matrix is $A_{s,1}$, and where the size and brightness of a circle are proportional to the absolute value of the average and its color corresponds to the sign. The middle panels show the proportion of $(j,k)$ entries set to non-zero over the replications at stage 2, and the right panels are the averages of the estimated coefficients over the replications. For the smallest value of $\delta$, the selected proportions are essentially one and the estimated coefficients are quite close to the true values. Even when $\delta$ is increased, so that larger noise now obscures the underlying true dynamics, our method still finds the correct locations of the non-zero coefficients together with good estimates, but tends to find a sparser model. For example, one diagonal term of $A_1$ and one of $A_{s,1}$ are missed most of the time. For the latter, this might be because the corresponding innovation variances are such that the model is almost the same when that coefficient of $A_{s,1}$ is set to zero.

We also compared the forecasting errors among the following five models: a sparse VAR model (based on PSCs) not taking into account seasonal variations, a sparse PVAR model (based on PSCs), an unrestricted SVAR model, and sparse SVAR models obtained with the proposed two-stage procedure and the adaptive lasso. The order of the sparse VAR$(p)$ model is searched over a pre-specified range of $p$, as in Davis et al. (2016). The out-of-sample forecast performance is measured by computing the empirical $h$-step-ahead forecast mean squared prediction error (MSPE)

$$\mathrm{MSPE}(h) = \frac{1}{N_r}\sum_{i=1}^{N_r}\big(X^{(i)}_{n+h} - \hat X^{(i)}_{n+h}\big)'\big(X^{(i)}_{n+h} - \hat X^{(i)}_{n+h}\big),$$

where $\hat X_{n+h}$ is the best linear predictor based on $\{X_1, \ldots, X_n\}$ and $N_r$ is the number of replications.

Table 3: MSPE(h) for the sparse SVAR(1,1) model and five different fitted models, namely, sparse VAR, sparse PVAR, SVAR and sparse SVARs based on the PSC and adaptive lasso approaches.

Table 3 indicates the MSPE for the sparse SVAR(1,1) model. Observe that the SVAR models outperform the sparse VAR and sparse PVAR models, as expected, in terms of forecasting. Note that our proposed sparse SVAR models and the SVAR without coefficient constraints perform quite similarly, but the sparse SVAR based on PSCs achieves the smallest MSPE for all lags $h$ considered (some intermediate lags are omitted from the table for brevity). However, note that the sparse SVAR uses only a small fraction of the SVAR(1,1) model coefficients, both with the PSC approach and with the adaptive lasso. Sparse modeling thus makes the model interpretation much easier and improves the forecasting performance.
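For completeness, the empirical MSPE above is just an average of squared forecast-error norms over replications; a minimal sketch, with the forecasts assumed precomputed:

    import numpy as np

    def mspe(forecasts, actuals):
        # forecasts, actuals: arrays of shape (N_r, q) across replications.
        err = actuals - forecasts
        return float(np.mean(np.sum(err ** 2, axis=1)))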

4.2 Sparse PVAR models

To evaluate the performance of the proposed methods on PVAR models, we consider a PVAR(1) model with period $s = 4$ given by

$$X_n = A_{m,1}X_{n-1} + \epsilon_{m,n}, \quad m = 1, \ldots, 4, \tag{4.2}$$

where $A_{1,1}, A_{2,1}, A_{3,1}, A_{4,1}$ are specified sparse coefficient matrices and the errors $\epsilon_{m,n}$ are i.i.d. Gaussian noise with zero mean and a covariance matrix $\Sigma$ parameterized by the noise level $\delta$. Note that the simulated model has an autoregressive order of one with a fixed number $k$ of non-zero coefficients, and we write it as sparse PVAR(1;$k$). A pre-specified range of the order $p$ and three levels of $\delta$ are considered. The sample size is $N = Ts$, and all results are based on the simulation replications.

Table 4: Estimated order $\hat p$, the numbers of non-zero coefficients at stages 1 and 2, and the bias, variance and MSE for the sparse PVAR(1) model.

Table 4 presents the five measures analogous to those in Table 2. First, observe that our two-stage approach selects the true order of the autoregression in all the cases considered. Regarding the estimated non-zero coefficients, stage 1 always includes the diagonal entries, and the coefficients are set to non-zero symmetrically, as explained in Section 3.2.1; the resulting number of non-zero coefficients is overestimated by the stage-1 procedure. However, the refinement stage leads to a much smaller number of non-zero coefficients. The sparse PVAR modeling based on the adaptive lasso also successfully finds the true order, and the number of estimated non-zero coefficients is very close to the true one. However, it has more variability than the PSC approach, leading to a larger MSE.

Figures 4 and 5 depict which non-zero coefficients are selected by our two-stage procedure and the adaptive lasso. Observe that the estimated non-zero coefficients are close to the true values. Observe from Table 4 that the MSE increases as $\delta$ gets larger. The effect of increasing $\delta$ is seen more clearly in Figure 4: fewer non-zero coefficients are selected, while the estimated non-zero coefficients remain close to the true values.

Forecasting performance is also compared for the sparse VAR (based on PSCs), sparse SVAR, unconstrained PVAR(1) and sparse PVAR models in Table 5. Observe that our sparse PVAR models achieve the smallest MSPE in all the cases considered. The estimation based on PSCs performs slightly better than the adaptive lasso approach. As in the case of SVAR, the unconstrained PVAR and the sparse PVAR perform quite similarly in terms of forecasting.

Table 5: MSPE(h) for the sparse PVAR(1) model and five different fitted models, namely, sparse VAR, sparse SVAR, PVAR and sparse PVARs based on PSCs and the adaptive lasso.

Figure 4: The coefficients of the stacked matrices $A_{1,1}$, $A_{2,1}$, $A_{3,1}$ and $A_{4,1}$ for the simulated sparse PVAR(1) model (left), the relative frequency of selecting the non-zero coefficients (middle) and the corresponding estimated coefficients (right) for the two-stage PSC approach.

Figure 5: The coefficients of the stacked matrices $A_{1,1}$, $A_{2,1}$, $A_{3,1}$ and $A_{4,1}$ for the simulated sparse PVAR(1) model (left), the relative frequency of selecting the non-zero coefficients (middle) and the corresponding estimated coefficients (right) for the adaptive lasso approach.

However, the sparse PVAR model uses only a small fraction of the non-zero coefficients of the unconstrained PVAR(1). Thus, the sparse PVAR makes the model interpretation simpler while keeping the same forecasting performance as the unconstrained PVAR.

Figure 6: Time plot and sample ACF and PACF plots for the detrended NO2 concentration. The seasonal mean and standard deviation are also depicted.

5 Applications to real data

5.1 Air quality in CA

The air quality data observed hourly at Azusa, California, are analyzed here based on the methods proposed in Section 3. We have considered four concentration levels of air pollutants, namely, CO, NO, NO2 and ozone, as well as solar radiation as a natural forcing variable. All data can be downloaded from the Air Quality and Meteorological Information System (AQMIS). Since the stations do not operate during the early morning hours, only part of the 24 hourly observations is available for each day, and we applied linear interpolation for the missing values. The series is thus 5-dimensional. Before fitting a model, we applied a log transformation and a cubic polynomial regression to remove heteroscedasticity and the deterministic trend.

Figure 6 shows the time plot of the detrended NO2 concentration together with several diagnostic plots. First, observe that the sample ACF and PACF indicate the presence of cyclical variations; hence, a seasonal model seems plausible. However, as the bottom seasonal mean and standard deviation plots show, the seasons have varying means and standard deviations, suggesting a periodic model for the air quality data. The other series have similar properties and are not shown here for brevity.

We applied our proposed two-stage method of Section 3.2 to fit a sparse PVAR model over a pre-specified range of the order $p$. Figure 7 depicts the BIC curves for stages 1 and 2, showing the best selected sparse PVAR model. When the adaptive lasso approach is used, a sparse PVAR model is likewise selected.
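The preprocessing described above can be sketched as follows (illustrative only; the exact regression specification used for the data is not reproduced here):

    import numpy as np

    def log_detrend(x):
        # Log-transform, then remove a cubic time trend fitted by least squares;
        # interpolation of missing values is assumed done beforehand.
        y = np.log(x)
        t = np.arange(len(y))
        coef = np.polyfit(t, y, deg=3)       # cubic polynomial trend
        return y - np.polyval(coef, t)       # detrended log series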

Figure 7: BIC plots for the two-stage procedure in finding the best sparse PVAR model for the air quality data.

Table 6: The $h$-step forecast MSE for the air quality data with sparse VAR, sparse SVAR based on PSCs, (non-sparse) PVAR and sparse PVAR models. The sparse PVAR model achieves the smallest $h$-step forecast MSE in all cases considered.

We also examined the fitted sparse PVAR model for the air quality data by comparing out-of-sample forecasts, as in Davis et al. (2016). The $h$-step-ahead forecast mean squared error (MSE) is calculated as

$$\mathrm{MSE}(h) = \frac{1}{q(T_t - h + 1)}\sum_{t=T}^{T+T_t-h}\big(\hat Y_{t+h} - Y_{t+h}\big)'\big(\hat Y_{t+h} - Y_{t+h}\big),$$

where $\hat Y_{t+h}$ is the $h$-step-ahead best linear forecast of $Y_{t+h}$, with training sample size $T$ and test sample size $T_t$. In this analysis, we used an initial stretch of the observations as the training set and the remainder as the test set. We compared five different models: sparse VAR, sparse SVAR based on PSCs, unconstrained PVAR, and sparse PVARs based on PSCs and the adaptive lasso, with the orders of the competing models selected analogously.

The $h$-step-ahead forecast MSE for several horizons $h$ is reported in Table 6. First, the sparse VAR model has an increasing MSE as $h$ increases; it performs poorly, since cyclical variations are not incorporated in the model. The best sparse SVAR model does incorporate cyclical variations, but its forecasting performance is considerably worse compared to the PVAR models. While the PVAR models provide similar and reasonably small MSEs, our proposed sparse PVARs achieve slightly smaller MSEs for all the considered lags $h$. This is in addition to the better model interpretability a sparse PVAR model provides.
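The rolling MSE(h) above can be computed as in the following sketch, where `predict_h` is an assumed callback producing the $h$-step-ahead best linear forecast from a training sample:

    import numpy as np

    def rolling_mse(Y, T, T_t, h, predict_h):
        # Y: array with one row per time point; T training size, T_t test size.
        q = Y.shape[1]
        errs = []
        for t in range(T, T + T_t - h + 1):
            y_hat = predict_h(Y[:t], h)      # forecast of Y_{t+h} given Y_1..Y_t
            e = Y[t + h - 1] - y_hat
            errs.append(float(e @ e))
        return sum(errs) / (q * (T_t - h + 1))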

Figure 8: Monthly Google flu trend data for five states (CA, GA, IL, NJ, TX), with the sample ACF for CA in the bottom right panel.

5.2 Google flu trend

In this section, we consider the celebrated Google flu trend data on the weekly predicted number of influenza-like-illness (ILI) related visits among outpatients in a US region. More specifically, we consider the monthly data obtained by aggregating the weekly observations over the available sample period. Though the Google flu data are available for all states, the District of Columbia and major cities across the US, we consider only five states (CA, GA, IL, NJ, TX) for illustration. Thus, the dimension of the data is $q = 5$. We also take a log transformation to make the series more stationary.

Figure 8 shows the time plots of the data; the bottom right plot shows the sample ACF for CA. Observe that the Google flu data exhibit cyclical behavior, which we model through a sparse SVAR model as proposed in Section 3.1. The best model for the two-stage procedure based on PSCs is a sparse SVAR, and a sparse SVAR is likewise selected by the adaptive lasso approach. If we ignore cyclical variations, then the procedure of Davis et al. (2016) finds a sparse VAR as the best model. Figure 9 shows the estimated coefficients for the sparse VAR and sparse SVAR models.

To see which model explains the observed data better, we conducted the out-of-sample forecasting comparison described in the previous section. The results are more interesting here. Table 7 reports MSPE(h) for several horizons $h$ and five different models, namely, sparse VAR, sparse PVAR based on PSCs, (non-sparse) SVAR and sparse SVAR models based on PSCs and the adaptive lasso. We used the first part of the sample as the training set and the subsequent observations as the test data.

Figure 9: Estimated coefficients for the sparse VAR and the sparse SVAR (AR and seasonal AR parts) for the monthly flu data.

Table 7: The $h$-step forecast MSE for the monthly flu data with sparse VAR, sparse PVAR based on PSCs, (non-sparse) SVAR and sparse SVAR models.

Note that the SVAR models outperform the sparse VAR model, showing the advantages of modeling cyclical variations. Also, our proposed sparse SVAR models achieve the smallest forecasting errors, in turn indicating that they not only provide an easier model interpretation but also better forecasting performance. The sparse PVAR model performs the worst amongst the fitted models. This may be due to the small sample size and the relatively large number of selected non-zero coefficients.

6 Conclusions

In this work, we studied the estimation of sparse seasonal and periodic vector autoregressive models. These SVAR and PVAR models play a central role in modeling time series with cyclical variations. We considered two popular approaches available for the modeling of high-dimensional time series: first, the regularization approach based on the adaptive lasso and, second, the variable selection approach based on PSCs. However, neither approach applies directly to SVAR and PVAR models, because SVAR models are nonlinear and aggregating results across different seasons in PVAR models is not immediate. These issues are resolved by using a linearization and new ways of using the information from PSCs, as detailed in Section 3. Finite sample simulations show good performance of our methods, as reported in Section 4. In particular, the numbers of non-zero coefficients and the parameter and model order estimates are reasonably close to the true values, and the forecasting is shown to be superior to that of the other models considered. Our methods are illustrated on real data applications and shown to outperform other approaches, especially in terms of out-of-sample forecasting.

A Estimation in constrained SVAR models

In this appendix, we present an algorithm to estimate the coefficients of SVAR models with (sparsity) constraints. The SVAR$(p, P)_s$ model can be written as (assuming $\mu = 0$)

$$X_n = \sum_{r=1}^{p} A_rX_{n-r} + \sum_{R=1}^{P} A_{s,R}X_{n-Rs} - \sum_{r=1}^{p}\sum_{R=1}^{P} A_rA_{s,R}X_{n-Rs-r} + \epsilon_n, \tag{A.1}$$

where $\{\epsilon_n\}$ is WN$(0, \Sigma)$. Due to the cross terms in (A.1), the usual estimation algorithm for VAR with constraints (e.g. Lütkepohl (2005)) cannot be applied. We shall adapt this approach after linearizing (A.1). Vectorizing (A.1) yields

$$\begin{aligned}
\mathrm{vec}(X_n) &= \sum_{r=1}^{p}\mathrm{vec}(A_rX_{n-r}) + \sum_{R=1}^{P}\mathrm{vec}(A_{s,R}X_{n-Rs}) - \sum_{r=1}^{p}\sum_{R=1}^{P}\mathrm{vec}(A_rA_{s,R}X_{n-Rs-r}) + \mathrm{vec}(\epsilon_n) \\
&= \sum_{r=1}^{p}(X'_{n-r} \otimes I_q)\mathrm{vec}(A_r) + \sum_{R=1}^{P}(X'_{n-Rs} \otimes I_q)\mathrm{vec}(A_{s,R}) - \sum_{r=1}^{p}\sum_{R=1}^{P}(X'_{n-Rs-r} \otimes I_q)\mathrm{vec}(A_rA_{s,R}) + \mathrm{vec}(\epsilon_n).
\end{aligned} \tag{A.2}$$

Now, linearize the cross term in (A.2), in a neighborhood of $A^0_r$, $A^0_{s,R}$, as

$$\mathrm{vec}(A_rA_{s,R}) \approx \mathrm{vec}(A^0_rA^0_{s,R}) + \frac{\partial\,\mathrm{vec}(A_rA_{s,R})}{\partial\,\mathrm{vec}(A_r)'}\bigg|_{A^0_r,A^0_{s,R}}\big(\mathrm{vec}(A_r) - \mathrm{vec}(A^0_r)\big) + \frac{\partial\,\mathrm{vec}(A_rA_{s,R})}{\partial\,\mathrm{vec}(A_{s,R})'}\bigg|_{A^0_r,A^0_{s,R}}\big(\mathrm{vec}(A_{s,R}) - \mathrm{vec}(A^0_{s,R})\big).$$

By using this linearization and the identities

$$\mathrm{vec}(A_rA_{s,R}) = (A'_{s,R} \otimes I_q)\,\mathrm{vec}(A_r) = (I_q \otimes A_r)\,\mathrm{vec}(A_{s,R})$$

to compute the derivatives in the linearization, the relation (A.2) can be linearized as

$$Y_n = \sum_{r=1}^{p}\Big\{(X'_{n-r} \otimes I_q) - \sum_{R=1}^{P}(X'_{n-Rs-r} \otimes I_q)(A^{0\prime}_{s,R} \otimes I_q)\Big\}\mathrm{vec}(A_r) + \sum_{R=1}^{P}\Big\{(X'_{n-Rs} \otimes I_q) - \sum_{r=1}^{p}(X'_{n-Rs-r} \otimes I_q)(I_q \otimes A^0_r)\Big\}\mathrm{vec}(A_{s,R}) + \mathrm{vec}(\epsilon_n), \tag{A.3}$$

where

$$Y_n = \mathrm{vec}(X_n) + \sum_{r=1}^{p}\sum_{R=1}^{P}(X'_{n-Rs-r} \otimes I_q)\Big\{\mathrm{vec}(A^0_rA^0_{s,R}) - (A^{0\prime}_{s,R} \otimes I_q)\mathrm{vec}(A^0_r) - (I_q \otimes A^0_r)\mathrm{vec}(A^0_{s,R})\Big\}. \tag{A.4}$$

The model (A.3) is the vectorized and linearized version of the relation (A.1) around the points $A^0_r$ and $A^0_{s,R}$. Rewriting (A.3) in matrix form and viewing $Y_n$ as the response, we have

$$Y = X\gamma + \epsilon, \tag{A.5}$$

where $Y = \mathrm{vec}(Y_1, \ldots, Y_N)$, $\gamma = \mathrm{vec}(A_1, \ldots, A_p, A_{s,1}, \ldots, A_{s,P})$ and $\epsilon = \mathrm{vec}(\epsilon_1, \ldots, \epsilon_N)$, with the corresponding design matrix $X$.
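The vec identities used above are easy to verify numerically; a small self-contained check (ours, for illustration):

    import numpy as np

    # vec(AB) = (B' kron I) vec(A) = (I kron A) vec(B), with column-stacking vec.
    q = 3
    rng = np.random.default_rng(0)
    A = rng.normal(size=(q, q))
    B = rng.normal(size=(q, q))
    v = (A @ B).flatten(order='F')
    assert np.allclose(v, np.kron(B.T, np.eye(q)) @ A.flatten(order='F'))
    assert np.allclose(v, np.kron(np.eye(q), A) @ B.flatten(order='F'))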

23 where Y = vec(y,..., Y N ), γ = vec ( A,..., A p, A s,,..., A s,p ), and ɛ = vec(ɛ,..., ɛ N ) with the corresponding design matrix X. Write also the sparsity constraints as γ = Rα, (A.) so that α contains only the non-zero coefficients of the SVAR model. Then, the GLS estimator of γ is given by γ = R ( (XR) Σ (XR) ) ( (XR) Σ Y ) (A.) with the covariance Var( γ) = R ( (XR) Σ (XR) ) R (A.) (e.g. Lütkepohl (), p. ). The estimated GLS (EGLS) estimator of γ in (A.) is then defined by replacing Σ with a suitable estimate. Turning back to the multiplicative constrained SVAR model (A.), we propose to estimate its coefficients through the following iterative procedure. STEP : First, we obtain initial estimates for constrained SVAR based on the approximation of (A.) by discarding the cross terms, namely, p P X n = A r X n r + A s,r X n Rs + z n. r= R= The model can be written in a compact form as W = (U I q )γ + Z, where W = vec(x P s+,..., X N ), U = (U P s+,..., U N ) with U t = vec(x t,..., X t p, X t s,..., X t P s ), γ = vec(a,..., A p, A s,,..., A s,p ) and Z = vec(z P s+,..., z N ). Then, the least squares (LS) estimator of γ is given by γ () := vec(â(),..., Â() p, Â() s,,..., Â() s,p ) = R ( R (UU I q )R ) R (U I q ) W (A.) (e.g. Lütkepohl (), Section.). Set i =. STEP : For a given estimate of γ (i) = vec(â(i),..., Â(i) p, Â(i) s,,..., Â(i) s,p ), define Σ (i+) by where X (i) n = p r= Σ (i+) = N  (i) r X n r + N n=max(p,p s)+ P R= ( X n  (i) s,r X n Rs p ) ( (i) X n X n P r= R= ) (i) X n (A.)  (i) r  (i) s,r X n Rs r. Then, update the EGLS estimates of the constrained SVAR model through (A.) as ( ) ) γ (i+) = R (X (i) R) Σ (i+) (X (i) R) ((X (i) R) Σ (i+) Y (i), where the X (i) and Y (i) are obtained by applying linearization at points A r = Â(i) r A s,r = Â(i) s,r in (A.). STEP : Stop if γ (i+) γ (i) ε (A.) for predetermined tolerance level ε. Otherwise, set i to i + and go back to STEP. and

B Estimated GLS for constrained PVAR models

The estimated GLS (EGLS) procedure for constrained PVAR models applies EGLS to the constrained VAR for each season. Suppose that the constraint on the coefficients of the $m$th season is represented as $\delta_m = R_m\beta_m$, $m = 1, \ldots, s$, where $\delta_m := \mathrm{vec}(B_m) := \mathrm{vec}(A_{m,1}, \ldots, A_{m,p})$. Let $X^*_n = X_n - \mu_m$ be the seasonally centered observations and write $X^* := (X^*_1, \ldots, X^*_N)$. Let also $Q_m$ be the matrix operator extracting the observations falling into the $m$th season from the indices $1, \ldots, N$. Then, for the $m$th season, the PVAR model can be represented as $Y_m = B_mU_m + Z_m$, where $Y_m = X^*Q_m$, $U_m = UQ_m$ and $Z_m = ZQ_m$, with $U = (U_1, \ldots, U_N)$, $Z = (\epsilon_1, \ldots, \epsilon_N)$ and $U_n = \mathrm{vec}(X^*_{n-1}, X^*_{n-2}, \ldots, X^*_{n-p})$. The GLS estimator of $\delta_m$ is given by

$$\hat\delta_m = R_m\big(R_m'(U_mU_m' \otimes \Sigma_m^{-1})R_m\big)^{-1}R_m'(U_m \otimes \Sigma_m^{-1})y_m,$$

where $y_m = \mathrm{vec}(Y_m)$ and $\Sigma_m$ is the covariance matrix of the $m$th season innovations. The EGLS estimator is then obtained by using estimates of $\Sigma_m$ iteratively, until a prespecified tolerance level is achieved. For an initial estimator of $\Sigma_m$, the OLS estimator $\hat\delta^{(0)}_m := \mathrm{vec}(\hat B^{(0)}_m) := \big((U_mU_m')^{-1}U_m \otimes I_q\big)y_m$ can be used to obtain

$$\hat\Sigma^{(0)}_m = \frac{1}{T}\big(Y_m - \hat B^{(0)}_mU_m\big)\big(Y_m - \hat B^{(0)}_mU_m\big)',$$

where $T$ is the number of observations falling into the $m$th season. The EGLS estimator is consistent and asymptotically normal, and its variance can be estimated through

$$\widehat{\mathrm{Var}}(\hat\delta^C_m) = R_m\big(R_m'(U_mU_m' \otimes (\hat\Sigma^C_m)^{-1})R_m\big)^{-1}R_m'.$$

References

Basu, S. and Michailidis, G. (2015), Regularized estimation in sparse high-dimensional time series models, The Annals of Statistics, 43(4).

Box, G. and Jenkins, G. (1970), Time Series Analysis: Forecasting and Control, Holden-Day.

Brockwell, P. and Davis, R. A. (1991), Time Series: Theory and Methods, Springer Series in Statistics, Springer, New York. Reprint of the second (1991) edition.

Bühlmann, P. and van de Geer, S. (2011), Statistics for High-Dimensional Data: Methods, Theory and Applications, Springer Series in Statistics, Springer, Berlin, Heidelberg.

Davis, R. A., Zang, P. and Zheng, T. (2016), Sparse vector autoregressive modeling, Journal of Computational and Graphical Statistics, to appear.

Franses, P. H. and Paap, R. (2004), Periodic Time Series Models, Advanced Texts in Econometrics, Oxford University Press, Oxford.

Ghysels, E. and Osborn, D. R. (2001), The Econometric Analysis of Seasonal Time Series, Themes in Modern Econometrics, Cambridge University Press, Cambridge.


More information

Forecasting 1 to h steps ahead using partial least squares

Forecasting 1 to h steps ahead using partial least squares Forecasting 1 to h steps ahead using partial least squares Philip Hans Franses Econometric Institute, Erasmus University Rotterdam November 10, 2006 Econometric Institute Report 2006-47 I thank Dick van

More information

X t = a t + r t, (7.1)

X t = a t + r t, (7.1) Chapter 7 State Space Models 71 Introduction State Space models, developed over the past 10 20 years, are alternative models for time series They include both the ARIMA models of Chapters 3 6 and the Classical

More information

Time Series I Time Domain Methods

Time Series I Time Domain Methods Astrostatistics Summer School Penn State University University Park, PA 16802 May 21, 2007 Overview Filtering and the Likelihood Function Time series is the study of data consisting of a sequence of DEPENDENT

More information

Chapter 12: An introduction to Time Series Analysis. Chapter 12: An introduction to Time Series Analysis

Chapter 12: An introduction to Time Series Analysis. Chapter 12: An introduction to Time Series Analysis Chapter 12: An introduction to Time Series Analysis Introduction In this chapter, we will discuss forecasting with single-series (univariate) Box-Jenkins models. The common name of the models is Auto-Regressive

More information

A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression

A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression Noah Simon Jerome Friedman Trevor Hastie November 5, 013 Abstract In this paper we purpose a blockwise descent

More information

The autocorrelation and autocovariance functions - helpful tools in the modelling problem

The autocorrelation and autocovariance functions - helpful tools in the modelling problem The autocorrelation and autocovariance functions - helpful tools in the modelling problem J. Nowicka-Zagrajek A. Wy lomańska Institute of Mathematics and Computer Science Wroc law University of Technology,

More information

A test for improved forecasting performance at higher lead times

A test for improved forecasting performance at higher lead times A test for improved forecasting performance at higher lead times John Haywood and Granville Tunnicliffe Wilson September 3 Abstract Tiao and Xu (1993) proposed a test of whether a time series model, estimated

More information

7. Forecasting with ARIMA models

7. Forecasting with ARIMA models 7. Forecasting with ARIMA models 309 Outline: Introduction The prediction equation of an ARIMA model Interpreting the predictions Variance of the predictions Forecast updating Measuring predictability

More information

Penalized Estimation of Panel VARs: A Lasso Approach. Annika Schnücker DIW Berlin Graduate Center and Freie Universität Berlin Draft - February 2017

Penalized Estimation of Panel VARs: A Lasso Approach. Annika Schnücker DIW Berlin Graduate Center and Freie Universität Berlin Draft - February 2017 Penalized Estimation of Panel VARs: A Lasso Approach Annika Schnücker DIW Berlin Graduate Center and Freie Universität Berlin Draft - February 2017 Abstract Panel vector autoregressive (PVAR) models account

More information

TAKEHOME FINAL EXAM e iω e 2iω e iω e 2iω

TAKEHOME FINAL EXAM e iω e 2iω e iω e 2iω ECO 513 Spring 2015 TAKEHOME FINAL EXAM (1) Suppose the univariate stochastic process y is ARMA(2,2) of the following form: y t = 1.6974y t 1.9604y t 2 + ε t 1.6628ε t 1 +.9216ε t 2, (1) where ε is i.i.d.

More information

{ } Stochastic processes. Models for time series. Specification of a process. Specification of a process. , X t3. ,...X tn }

{ } Stochastic processes. Models for time series. Specification of a process. Specification of a process. , X t3. ,...X tn } Stochastic processes Time series are an example of a stochastic or random process Models for time series A stochastic process is 'a statistical phenomenon that evolves in time according to probabilistic

More information

Econ 423 Lecture Notes: Additional Topics in Time Series 1

Econ 423 Lecture Notes: Additional Topics in Time Series 1 Econ 423 Lecture Notes: Additional Topics in Time Series 1 John C. Chao April 25, 2017 1 These notes are based in large part on Chapter 16 of Stock and Watson (2011). They are for instructional purposes

More information

5: MULTIVARATE STATIONARY PROCESSES

5: MULTIVARATE STATIONARY PROCESSES 5: MULTIVARATE STATIONARY PROCESSES 1 1 Some Preliminary Definitions and Concepts Random Vector: A vector X = (X 1,..., X n ) whose components are scalarvalued random variables on the same probability

More information

Lecture 2: Univariate Time Series

Lecture 2: Univariate Time Series Lecture 2: Univariate Time Series Analysis: Conditional and Unconditional Densities, Stationarity, ARMA Processes Prof. Massimo Guidolin 20192 Financial Econometrics Spring/Winter 2017 Overview Motivation:

More information

Lecture 2: ARMA(p,q) models (part 2)

Lecture 2: ARMA(p,q) models (part 2) Lecture 2: ARMA(p,q) models (part 2) Florian Pelgrin University of Lausanne, École des HEC Department of mathematics (IMEA-Nice) Sept. 2011 - Jan. 2012 Florian Pelgrin (HEC) Univariate time series Sept.

More information

Long-range dependence

Long-range dependence Long-range dependence Kechagias Stefanos University of North Carolina at Chapel Hill May 23, 2013 Kechagias Stefanos (UNC) Long-range dependence May 23, 2013 1 / 45 Outline 1 Introduction to time series

More information

Vector Autoregressive Model. Vector Autoregressions II. Estimation of Vector Autoregressions II. Estimation of Vector Autoregressions I.

Vector Autoregressive Model. Vector Autoregressions II. Estimation of Vector Autoregressions II. Estimation of Vector Autoregressions I. Vector Autoregressive Model Vector Autoregressions II Empirical Macroeconomics - Lect 2 Dr. Ana Beatriz Galvao Queen Mary University of London January 2012 A VAR(p) model of the m 1 vector of time series

More information

ECON 4160: Econometrics-Modelling and Systems Estimation Lecture 9: Multiple equation models II

ECON 4160: Econometrics-Modelling and Systems Estimation Lecture 9: Multiple equation models II ECON 4160: Econometrics-Modelling and Systems Estimation Lecture 9: Multiple equation models II Ragnar Nymoen Department of Economics University of Oslo 9 October 2018 The reference to this lecture is:

More information

Econometrics II Heij et al. Chapter 7.1

Econometrics II Heij et al. Chapter 7.1 Chapter 7.1 p. 1/2 Econometrics II Heij et al. Chapter 7.1 Linear Time Series Models for Stationary data Marius Ooms Tinbergen Institute Amsterdam Chapter 7.1 p. 2/2 Program Introduction Modelling philosophy

More information

Elements of Multivariate Time Series Analysis

Elements of Multivariate Time Series Analysis Gregory C. Reinsel Elements of Multivariate Time Series Analysis Second Edition With 14 Figures Springer Contents Preface to the Second Edition Preface to the First Edition vii ix 1. Vector Time Series

More information

Econometric Forecasting

Econometric Forecasting Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna October 1, 2014 Outline Introduction Model-free extrapolation Univariate time-series models Trend

More information

TIME SERIES ANALYSIS. Forecasting and Control. Wiley. Fifth Edition GWILYM M. JENKINS GEORGE E. P. BOX GREGORY C. REINSEL GRETA M.

TIME SERIES ANALYSIS. Forecasting and Control. Wiley. Fifth Edition GWILYM M. JENKINS GEORGE E. P. BOX GREGORY C. REINSEL GRETA M. TIME SERIES ANALYSIS Forecasting and Control Fifth Edition GEORGE E. P. BOX GWILYM M. JENKINS GREGORY C. REINSEL GRETA M. LJUNG Wiley CONTENTS PREFACE TO THE FIFTH EDITION PREFACE TO THE FOURTH EDITION

More information

1 Teaching notes on structural VARs.

1 Teaching notes on structural VARs. Bent E. Sørensen February 22, 2007 1 Teaching notes on structural VARs. 1.1 Vector MA models: 1.1.1 Probability theory The simplest (to analyze, estimation is a different matter) time series models are

More information

Time Series Analysis

Time Series Analysis Time Series Analysis hm@imm.dtu.dk Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby 1 Outline of the lecture Chapter 9 Multivariate time series 2 Transfer function

More information

Applied time-series analysis

Applied time-series analysis Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna October 18, 2011 Outline Introduction and overview Econometric Time-Series Analysis In principle,

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com

More information

8.6 Bayesian neural networks (BNN) [Book, Sect. 6.7]

8.6 Bayesian neural networks (BNN) [Book, Sect. 6.7] 8.6 Bayesian neural networks (BNN) [Book, Sect. 6.7] While cross-validation allows one to find the weight penalty parameters which would give the model good generalization capability, the separation of

More information

Discussion of High-dimensional autocovariance matrices and optimal linear prediction,

Discussion of High-dimensional autocovariance matrices and optimal linear prediction, Electronic Journal of Statistics Vol. 9 (2015) 1 10 ISSN: 1935-7524 DOI: 10.1214/15-EJS1007 Discussion of High-dimensional autocovariance matrices and optimal linear prediction, Xiaohui Chen University

More information

Predictive spatio-temporal models for spatially sparse environmental data. Umeå University

Predictive spatio-temporal models for spatially sparse environmental data. Umeå University Seminar p.1/28 Predictive spatio-temporal models for spatially sparse environmental data Xavier de Luna and Marc G. Genton xavier.deluna@stat.umu.se and genton@stat.ncsu.edu http://www.stat.umu.se/egna/xdl/index.html

More information

10. Time series regression and forecasting

10. Time series regression and forecasting 10. Time series regression and forecasting Key feature of this section: Analysis of data on a single entity observed at multiple points in time (time series data) Typical research questions: What is the

More information

Autoregressive Moving Average (ARMA) Models and their Practical Applications

Autoregressive Moving Average (ARMA) Models and their Practical Applications Autoregressive Moving Average (ARMA) Models and their Practical Applications Massimo Guidolin February 2018 1 Essential Concepts in Time Series Analysis 1.1 Time Series and Their Properties Time series:

More information

Input Selection for Long-Term Prediction of Time Series

Input Selection for Long-Term Prediction of Time Series Input Selection for Long-Term Prediction of Time Series Jarkko Tikka, Jaakko Hollmén, and Amaury Lendasse Helsinki University of Technology, Laboratory of Computer and Information Science, P.O. Box 54,

More information

Ch. 15 Forecasting. 1.1 Forecasts Based on Conditional Expectations

Ch. 15 Forecasting. 1.1 Forecasts Based on Conditional Expectations Ch 15 Forecasting Having considered in Chapter 14 some of the properties of ARMA models, we now show how they may be used to forecast future values of an observed time series For the present we proceed

More information

Univariate Nonstationary Time Series 1

Univariate Nonstationary Time Series 1 Univariate Nonstationary Time Series 1 Sebastian Fossati University of Alberta 1 These slides are based on Eric Zivot s time series notes available at: http://faculty.washington.edu/ezivot Introduction

More information

Linear Model Selection and Regularization

Linear Model Selection and Regularization Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In

More information

Part II. Time Series

Part II. Time Series Part II Time Series 12 Introduction This Part is mainly a summary of the book of Brockwell and Davis (2002). Additionally the textbook Shumway and Stoffer (2010) can be recommended. 1 Our purpose is to

More information

On Autoregressive Order Selection Criteria

On Autoregressive Order Selection Criteria On Autoregressive Order Selection Criteria Venus Khim-Sen Liew Faculty of Economics and Management, Universiti Putra Malaysia, 43400 UPM, Serdang, Malaysia This version: 1 March 2004. Abstract This study

More information

Stability and the elastic net

Stability and the elastic net Stability and the elastic net Patrick Breheny March 28 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/32 Introduction Elastic Net Our last several lectures have concentrated on methods for

More information

APPLIED ECONOMETRIC TIME SERIES 4TH EDITION

APPLIED ECONOMETRIC TIME SERIES 4TH EDITION APPLIED ECONOMETRIC TIME SERIES 4TH EDITION Chapter 2: STATIONARY TIME-SERIES MODELS WALTER ENDERS, UNIVERSITY OF ALABAMA Copyright 2015 John Wiley & Sons, Inc. Section 1 STOCHASTIC DIFFERENCE EQUATION

More information

On Mixture Regression Shrinkage and Selection via the MR-LASSO

On Mixture Regression Shrinkage and Selection via the MR-LASSO On Mixture Regression Shrinage and Selection via the MR-LASSO Ronghua Luo, Hansheng Wang, and Chih-Ling Tsai Guanghua School of Management, Peing University & Graduate School of Management, University

More information

Topic 4 Unit Roots. Gerald P. Dwyer. February Clemson University

Topic 4 Unit Roots. Gerald P. Dwyer. February Clemson University Topic 4 Unit Roots Gerald P. Dwyer Clemson University February 2016 Outline 1 Unit Roots Introduction Trend and Difference Stationary Autocorrelations of Series That Have Deterministic or Stochastic Trends

More information

A SARIMAX coupled modelling applied to individual load curves intraday forecasting

A SARIMAX coupled modelling applied to individual load curves intraday forecasting A SARIMAX coupled modelling applied to individual load curves intraday forecasting Frédéric Proïa Workshop EDF Institut Henri Poincaré - Paris 05 avril 2012 INRIA Bordeaux Sud-Ouest Institut de Mathématiques

More information

AR, MA and ARMA models

AR, MA and ARMA models AR, MA and AR by Hedibert Lopes P Based on Tsay s Analysis of Financial Time Series (3rd edition) P 1 Stationarity 2 3 4 5 6 7 P 8 9 10 11 Outline P Linear Time Series Analysis and Its Applications For

More information

1.4 Properties of the autocovariance for stationary time-series

1.4 Properties of the autocovariance for stationary time-series 1.4 Properties of the autocovariance for stationary time-series In general, for a stationary time-series, (i) The variance is given by (0) = E((X t µ) 2 ) 0. (ii) (h) apple (0) for all h 2 Z. ThisfollowsbyCauchy-Schwarzas

More information

Using all observations when forecasting under structural breaks

Using all observations when forecasting under structural breaks Using all observations when forecasting under structural breaks Stanislav Anatolyev New Economic School Victor Kitov Moscow State University December 2007 Abstract We extend the idea of the trade-off window

More information

of seasonal data demonstrating the usefulness of the devised tests. We conclude in "Conclusion" section with a discussion.

of seasonal data demonstrating the usefulness of the devised tests. We conclude in Conclusion section with a discussion. DOI 10.1186/s40064-016-3167-4 RESEARCH Open Access Portmanteau test statistics for seasonal serial correlation in time series models Esam Mahdi * *Correspondence: emahdi@iugaza.edu.ps Department of Mathematics,

More information

Cointegrated VAR s. Eduardo Rossi University of Pavia. November Rossi Cointegrated VAR s Financial Econometrics / 56

Cointegrated VAR s. Eduardo Rossi University of Pavia. November Rossi Cointegrated VAR s Financial Econometrics / 56 Cointegrated VAR s Eduardo Rossi University of Pavia November 2013 Rossi Cointegrated VAR s Financial Econometrics - 2013 1 / 56 VAR y t = (y 1t,..., y nt ) is (n 1) vector. y t VAR(p): Φ(L)y t = ɛ t The

More information

A time series is called strictly stationary if the joint distribution of every collection (Y t

A time series is called strictly stationary if the joint distribution of every collection (Y t 5 Time series A time series is a set of observations recorded over time. You can think for example at the GDP of a country over the years (or quarters) or the hourly measurements of temperature over a

More information

Ross Bettinger, Analytical Consultant, Seattle, WA

Ross Bettinger, Analytical Consultant, Seattle, WA ABSTRACT DYNAMIC REGRESSION IN ARIMA MODELING Ross Bettinger, Analytical Consultant, Seattle, WA Box-Jenkins time series models that contain exogenous predictor variables are called dynamic regression

More information

Time Series Outlier Detection

Time Series Outlier Detection Time Series Outlier Detection Tingyi Zhu July 28, 2016 Tingyi Zhu Time Series Outlier Detection July 28, 2016 1 / 42 Outline Time Series Basics Outliers Detection in Single Time Series Outlier Series Detection

More information

Analysis of Fast Input Selection: Application in Time Series Prediction

Analysis of Fast Input Selection: Application in Time Series Prediction Analysis of Fast Input Selection: Application in Time Series Prediction Jarkko Tikka, Amaury Lendasse, and Jaakko Hollmén Helsinki University of Technology, Laboratory of Computer and Information Science,

More information

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY Time Series Analysis James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY PREFACE xiii 1 Difference Equations 1.1. First-Order Difference Equations 1 1.2. pth-order Difference Equations 7

More information

Minwise hashing for large-scale regression and classification with sparse data

Minwise hashing for large-scale regression and classification with sparse data Minwise hashing for large-scale regression and classification with sparse data Nicolai Meinshausen (Seminar für Statistik, ETH Zürich) joint work with Rajen Shah (Statslab, University of Cambridge) Simons

More information

A Note on Bootstraps and Robustness. Tony Lancaster, Brown University, December 2003.

A Note on Bootstraps and Robustness. Tony Lancaster, Brown University, December 2003. A Note on Bootstraps and Robustness Tony Lancaster, Brown University, December 2003. In this note we consider several versions of the bootstrap and argue that it is helpful in explaining and thinking about

More information

Variable Selection for Highly Correlated Predictors

Variable Selection for Highly Correlated Predictors Variable Selection for Highly Correlated Predictors Fei Xue and Annie Qu arxiv:1709.04840v1 [stat.me] 14 Sep 2017 Abstract Penalty-based variable selection methods are powerful in selecting relevant covariates

More information

On Monitoring Shift in the Mean Processes with. Vector Autoregressive Residual Control Charts of. Individual Observation

On Monitoring Shift in the Mean Processes with. Vector Autoregressive Residual Control Charts of. Individual Observation Applied Mathematical Sciences, Vol. 8, 14, no. 7, 3491-3499 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/.12988/ams.14.44298 On Monitoring Shift in the Mean Processes with Vector Autoregressive Residual

More information

Classical Decomposition Model Revisited: I

Classical Decomposition Model Revisited: I Classical Decomposition Model Revisited: I recall classical decomposition model for time series Y t, namely, Y t = m t + s t + W t, where m t is trend; s t is periodic with known period s (i.e., s t s

More information

High-dimensional Ordinary Least-squares Projection for Screening Variables

High-dimensional Ordinary Least-squares Projection for Screening Variables 1 / 38 High-dimensional Ordinary Least-squares Projection for Screening Variables Chenlei Leng Joint with Xiangyu Wang (Duke) Conference on Nonparametric Statistics for Big Data and Celebration to Honor

More information

Booth School of Business, University of Chicago Business 41914, Spring Quarter 2013, Mr. Ruey S. Tsay. Midterm

Booth School of Business, University of Chicago Business 41914, Spring Quarter 2013, Mr. Ruey S. Tsay. Midterm Booth School of Business, University of Chicago Business 41914, Spring Quarter 2013, Mr. Ruey S. Tsay Midterm Chicago Booth Honor Code: I pledge my honor that I have not violated the Honor Code during

More information

Nonlinear Support Vector Machines through Iterative Majorization and I-Splines

Nonlinear Support Vector Machines through Iterative Majorization and I-Splines Nonlinear Support Vector Machines through Iterative Majorization and I-Splines P.J.F. Groenen G. Nalbantov J.C. Bioch July 9, 26 Econometric Institute Report EI 26-25 Abstract To minimize the primal support

More information

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY Time Series Analysis James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY & Contents PREFACE xiii 1 1.1. 1.2. Difference Equations First-Order Difference Equations 1 /?th-order Difference

More information

Multivariate Time Series: VAR(p) Processes and Models

Multivariate Time Series: VAR(p) Processes and Models Multivariate Time Series: VAR(p) Processes and Models A VAR(p) model, for p > 0 is X t = φ 0 + Φ 1 X t 1 + + Φ p X t p + A t, where X t, φ 0, and X t i are k-vectors, Φ 1,..., Φ p are k k matrices, with

More information

A Non-Parametric Approach of Heteroskedasticity Robust Estimation of Vector-Autoregressive (VAR) Models

A Non-Parametric Approach of Heteroskedasticity Robust Estimation of Vector-Autoregressive (VAR) Models Journal of Finance and Investment Analysis, vol.1, no.1, 2012, 55-67 ISSN: 2241-0988 (print version), 2241-0996 (online) International Scientific Press, 2012 A Non-Parametric Approach of Heteroskedasticity

More information

Finite Time Analysis of Vector Autoregressive Models under Linear Restrictions

Finite Time Analysis of Vector Autoregressive Models under Linear Restrictions Finite Time Analysis of Vector Autoregressive Models under Linear Restrictions Yao Zheng collaboration with Guang Cheng Department of Statistics, Purdue University Talk at Math, IUPUI October 23, 2018

More information

Levinson Durbin Recursions: I

Levinson Durbin Recursions: I Levinson Durbin Recursions: I note: B&D and S&S say Durbin Levinson but Levinson Durbin is more commonly used (Levinson, 1947, and Durbin, 1960, are source articles sometimes just Levinson is used) recursions

More information

A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED

A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED by W. Robert Reed Department of Economics and Finance University of Canterbury, New Zealand Email: bob.reed@canterbury.ac.nz

More information

Notes on Time Series Modeling

Notes on Time Series Modeling Notes on Time Series Modeling Garey Ramey University of California, San Diego January 17 1 Stationary processes De nition A stochastic process is any set of random variables y t indexed by t T : fy t g

More information

Problem Set 2 Solution Sketches Time Series Analysis Spring 2010

Problem Set 2 Solution Sketches Time Series Analysis Spring 2010 Problem Set 2 Solution Sketches Time Series Analysis Spring 2010 Forecasting 1. Let X and Y be two random variables such that E(X 2 ) < and E(Y 2 )

More information

Time Series Forecasting: A Tool for Out - Sample Model Selection and Evaluation

Time Series Forecasting: A Tool for Out - Sample Model Selection and Evaluation AMERICAN JOURNAL OF SCIENTIFIC AND INDUSTRIAL RESEARCH 214, Science Huβ, http://www.scihub.org/ajsir ISSN: 2153-649X, doi:1.5251/ajsir.214.5.6.185.194 Time Series Forecasting: A Tool for Out - Sample Model

More information

Time Series: Theory and Methods

Time Series: Theory and Methods Peter J. Brockwell Richard A. Davis Time Series: Theory and Methods Second Edition With 124 Illustrations Springer Contents Preface to the Second Edition Preface to the First Edition vn ix CHAPTER 1 Stationary

More information

Lesson 13: Box-Jenkins Modeling Strategy for building ARMA models

Lesson 13: Box-Jenkins Modeling Strategy for building ARMA models Lesson 13: Box-Jenkins Modeling Strategy for building ARMA models Facoltà di Economia Università dell Aquila umberto.triacca@gmail.com Introduction In this lesson we present a method to construct an ARMA(p,

More information

Performance of Autoregressive Order Selection Criteria: A Simulation Study

Performance of Autoregressive Order Selection Criteria: A Simulation Study Pertanika J. Sci. & Technol. 6 (2): 7-76 (2008) ISSN: 028-7680 Universiti Putra Malaysia Press Performance of Autoregressive Order Selection Criteria: A Simulation Study Venus Khim-Sen Liew, Mahendran

More information

Econ 623 Econometrics II Topic 2: Stationary Time Series

Econ 623 Econometrics II Topic 2: Stationary Time Series 1 Introduction Econ 623 Econometrics II Topic 2: Stationary Time Series In the regression model we can model the error term as an autoregression AR(1) process. That is, we can use the past value of the

More information

APPLIED TIME SERIES ECONOMETRICS

APPLIED TIME SERIES ECONOMETRICS APPLIED TIME SERIES ECONOMETRICS Edited by HELMUT LÜTKEPOHL European University Institute, Florence MARKUS KRÄTZIG Humboldt University, Berlin CAMBRIDGE UNIVERSITY PRESS Contents Preface Notation and Abbreviations

More information