

Bayesian Analysis of Vector ARMA Models using Gibbs Sampling

Nalini Ravishanker
Department of Statistics
University of Connecticut
Storrs, CT

Bonnie K. Ray
Department of Mathematics and Center for Applied Math and Statistics
New Jersey Institute of Technology
Newark, NJ

June 12, 1996

Address for Correspondence: Bonnie K. Ray, Department of Mathematics, New Jersey Institute of Technology, Newark, NJ. Phone: (201) ; Fax: (201)


Abstract

We present a methodology for estimation, prediction and model assessment of multivariate autoregressive moving-average (VARMA) models in the Bayesian framework using Markov chain Monte Carlo algorithms. The sampling-based Bayesian framework for inference allows for the incorporation of parameter restrictions, such as stationarity restrictions or zero constraints, through appropriate prior specifications. It also facilitates extensive posterior and predictive analyses through the use of numerical summary statistics and graphical displays, such as box plots and density plots for estimated parameters. We present a method for computationally feasible evaluation of the joint posterior density of the model parameters using the exact likelihood function, and discuss the use of backcasting to approximate the exact likelihood function in certain cases. We also show how to incorporate indicator variables as additional parameters for use in coefficient selection. The sampling is facilitated through a Metropolis-Hastings algorithm. Graphical techniques based on predictive distributions are used to assess model adequacy and select among models. The methods are illustrated using two data sets from business and economics. The first example consists of quarterly fixed investment, disposable income, and consumption rates for West Germany, which are known to have correlation and feedback relationships between series. The second example consists of monthly revenue data from seven different geographic areas of IBM. The revenue data exhibit seasonality, strong inter-regional dependence, and feedback relationships between certain regions.

KEYWORDS: VARMA models; Gibbs sampling; time series analysis; parameter selection

Multivariate autoregressive moving-average (VARMA) models have been widely used for modeling multiple time series, in order to incorporate relationships between series as well as within series. There is a considerable literature on inference for these models using frequentist approaches, such as least squares or maximum likelihood methods (see, e.g., Reinsel, 1993, for a review). A Bayesian modeling framework has the advantage of being able to incorporate available prior information in a natural way. Recently, Bayesian inference has been facilitated by the use of Markov chain Monte Carlo algorithms (Gelfand and Smith, 1990; Tanner, 1993) to generate samples from the required joint posterior distributions, thereby rendering unnecessary the tedious and sometimes intractable analytical posterior and predictive computations. The Gibbs sampling algorithm has been used for the Bayesian analysis of univariate time series by Marriott et al. (1993), McCulloch and Tsay (1994), and Pai and Ravishanker (1996). For multivariate time series, West and Harrison (1989) discuss ARMA model estimation in the Bayesian framework using a state-space formulation and direct computation of posterior distributions based on conjugate prior specifications, without resorting to Monte Carlo methods. Using Gibbs sampling, Pai et al. (1994) present an analysis of contemporaneously correlated multiple time series data, while Li (1995) discusses parsimonious model selection for VARMA models using a conditional likelihood function and assuming conjugate prior distributions for certain process parameters. In this paper, we present a Bayesian inference methodology for VARMA processes using Markov chain Monte Carlo methods. We use a Metropolis-Hastings algorithm to generate samples from the joint posterior density of the model parameters based on the exact Gaussian likelihood function and discuss the use of backcasting to approximate the exact likelihood function in certain cases.
We also show how indicator variables can be incorporated into the likelihood function as additional coefficient selection parameters. Indicator variables having large posterior probability values dictate which VARMA coefficients to include in the model. The samples are used to obtain numerical and graphical summary features of interest. We also discuss graphical methods for model assessment using the output from the sampler. Finally, we present forecasts and prediction intervals for future observations obtained using the predictive density. The techniques are illustrated using two sets of data. The first consists of quarterly, seasonally adjusted fixed investment, disposable income and consumption expenditures in billions of DM for West Germany; correlation and feedback relationships between series exist, as discussed in Lutkepohl (1993).

The second is a set of monthly IBM revenues collected from seven geographical regions. The revenue data is similar in stochastic behavior to the data analyzed by Pai et al. (1994), though more recent. The revenues exhibit seasonal autoregressive behavior and strong inter-regional dependence, with feedback relationships between certain of the regions. Multiplicative seasonal modeling of vector time series has received little attention in the literature relative to univariate seasonal modeling (Reinsel, 1993, p. 187). We analyze the seasonal behavior of the revenue data using a full VARMA model, whereas the contemporaneous model of Pai et al. ignores the feedback mechanism. The outline of the paper is as follows. In the second section, we present the Bayesian modeling framework for VARMA time series. The third section presents the posterior density corresponding to the exact Gaussian likelihood and illustrates the use of latent variables to obtain a form of the likelihood that is computationally feasible for repeated evaluations in a Metropolis-Hastings sampling algorithm. The use of backcasting in the Bayesian framework to approximate the exact likelihood function in certain cases, such as when the incorporation of latent variables leads to an unmanageable dimension of the parameter space, is also discussed. Bayesian methods for prediction and model assessment are presented in the fourth section. In the fifth section, we present two examples of our approach. The first example is an exact likelihood analysis of the West German data; we fit a full VAR(4) and a full VARMA(1,1) model to the data and use indicator variables for initial selection of coefficients. We then analyze models in which zero constraints are imposed on those parameters associated with indicator variables having small posterior probability.
The second example is an analysis of IBM revenue data using a seasonal VAR(3) model of seasonal period 3, in which the posterior is derived using the backcasting approach and certain parameters are constrained to be zero on the basis of cross-correlation analysis.

Bayesian Framework for a Multivariate ARMA Model

Let $\tilde{Z}_t$ denote a k-variate time series generated by a VARMA(p, q) process. (We use ~ to denote a vector.) Then the process can be written as

$$\Phi(B)(\tilde{Z}_t - \tilde{\mu}) = \Theta(B)\tilde{a}_t, \qquad (1)$$

where the $\tilde{a}_t$ are k-variate iid $N(0, \Sigma)$ random variables, $\Phi(B) = I - \Phi_1 B - \cdots - \Phi_p B^p$ is a matrix polynomial of degree p, $\Theta(B) = I - \Theta_1 B - \cdots - \Theta_q B^q$ is a matrix polynomial of degree q, and $\tilde{\mu} = (\mu_1, \ldots, \mu_k)$ is the mean vector. Here we assume that $\Phi$ and $\Theta$ obey the usual stationarity and invertibility conditions and

are left coprime. The potentially large number of parameters, as well as the possible non-uniqueness of the model representation, require that further constraints on the parameter matrices of a VARMA(p, q) model must often be imposed (see, e.g., Reinsel, 1993, pp. 36-37). Here, we identify parsimonious models having certain coefficients constrained to zero by using a Bayesian variable selection idea (Kuo and Mallick, 1994). To do this, we consider an expanded form of the VARMA(p, q) model in (1), where $\tilde{y}_t = \tilde{z}_t - \tilde{\mu}$ denotes the mean-subtracted process and $\gamma_{ijl}$ and $\delta_{ijm}$ denote indicator variables supported at the points 1 and 0. Then the VARMA(p, q) process having some parameter elements constrained to be zero may be expressed as

$$y_{i,t} = \sum_{l=1}^{p}\sum_{j=1}^{k} \gamma_{ijl}\,\phi_{ijl}\, y_{j,t-l} + a_{i,t} - \sum_{m=1}^{q}\sum_{j=1}^{k} \delta_{ijm}\,\theta_{ijm}\, a_{j,t-m}. \qquad (2)$$

When $\gamma_{ijl} = 1$, the $(i,j)$th coefficient of the $l$th AR matrix is included in the model; when $\delta_{ijm} = 1$, the $(i,j)$th coefficient of the $m$th MA matrix is included. When these indicator variables are 0, the corresponding coefficients are omitted from the model. Let $\tilde{Z}_n = (\tilde{z}_1^T, \ldots, \tilde{z}_n^T)^T$ denote a sample of n observations from a k-variate VARMA(p, q) process and let $\tilde{\gamma}$ and $\tilde{\delta}$ denote the vectors of indicator variables for the AR coefficients and the MA coefficients, respectively. The exact likelihood function is given by

$$f(\tilde{Z}_n \mid \psi, \Sigma) = (2\pi)^{-kn/2}\, \lvert\Gamma\rvert^{-1/2} \exp\!\Big[ -\frac{(\tilde{Z}_n - \tilde{\mu}_n)^T \Gamma^{-1} (\tilde{Z}_n - \tilde{\mu}_n)}{2} \Big], \qquad (3)$$

where $\psi = (\phi, \theta, \tilde{\mu}, \tilde{\gamma}, \tilde{\delta})$, with $\phi = \mathrm{Vec}(\Phi_1, \ldots, \Phi_p)$, $\theta = \mathrm{Vec}(\Theta_1, \ldots, \Theta_q)$, and $\Gamma$ is the covariance matrix of $\tilde{Z}_n$ with $k \times k$ block elements $\Gamma_Z(i-j) = E(\tilde{z}_i - \tilde{\mu})(\tilde{z}_j - \tilde{\mu})^T$; $\Gamma$ is a function of $\psi$ and $\Sigma$. The vector $\tilde{\mu}_n = (\mu_1, \ldots, \mu_k, \ldots, \mu_1, \ldots, \mu_k)^T$ has length kn. Given data $\tilde{Z}_n$, along with parameters $\psi$ and the elements of $\Sigma$, the Bayesian model specification requires a likelihood $f(\tilde{Z}_n \mid \psi, \Sigma)$ and a prior $\pi(\psi, \Sigma)$. By Bayes' theorem, we obtain the posterior density as $\pi(\psi, \Sigma \mid \tilde{Z}_n) \propto f(\tilde{Z}_n \mid \psi, \Sigma)\,\pi(\psi, \Sigma)$. We assume that $\pi(\psi, \Sigma) = \pi(\psi)\,\pi(\Sigma)$ in general, and that $\pi(\psi) = \pi(\tilde{\mu})\,\pi(\phi, \theta)\,\pi(\tilde{\gamma})\,\pi(\tilde{\delta})$.
In the absence of specific subjective prior information, we may adopt a uniform prior for $\phi$ and $\theta$ over their respective stationarity and invertibility regions, and an improper prior for $\tilde{\mu}$ on $R^k$. We let $\pi(\Sigma) \propto \lvert\Sigma\rvert^{-1/2}$. We assume that the priors on each of the elements

of $\tilde{\gamma}$ and $\tilde{\delta}$ are independent and have Bernoulli distributions with success probabilities $p_{ijl}$ and $q_{ijm}$; in the absence of additional information, these prior probabilities can be chosen to be 1/2 to reflect equal prior weight for all possible $2^{k^2(p+q)}$ submodels. However, alternate weights can be incorporated into the prior specification as appropriate. Hence we have $\pi(\psi, \Sigma) \propto \prod_{i,j,l} p_{ijl} \prod_{i,j,m} q_{ijm}\, \lvert\Sigma\rvert^{-1/2}$. Subjective information about model parameters may be incorporated via informative prior distributions when available. The posterior distribution of $\gamma_{ijl}$ ($\delta_{ijm}$) measures the likelihood that the $(i,j)$th element of $\Phi_l$ ($\Theta_m$) is nonzero. Jointly, the posterior distributions of the $\gamma_{ijl}$ and $\delta_{ijm}$ determine which submodel of the full VARMA(p, q) model will be selected. Note that this approach is attractive, since we do not actually need to compute the posterior probabilities of each of the submodels. However, the posterior density that results as the product of the likelihood function and the prior densities is analytically intractable; hence a sampling-based approach is necessary. The Gibbs sampling approach to estimating the model parameters involves sampling from the complete conditional distribution of each parameter in a systematic manner, conditional on the previously sampled values of the other parameters. This approach is always possible, since the complete conditional densities are available, up to a normalizing constant, from the form of the likelihood times the prior (see Gelfand and Smith, 1990, for details). When these conditional densities do not have standard form, as is often the case, the Metropolis-Hastings algorithm may be used to obtain realizations from a Markov chain having the required stationary distribution (see, e.g., Gelfand and Smith, 1990; Hastings, 1970). Additionally, complete distributions of the estimated parameters, residuals and forecasts can easily be obtained.
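The systematic scan described above can be illustrated with a toy Metropolis-within-Gibbs sampler. The bivariate normal target below is a hypothetical stand-in for the intractable VARMA posterior: one block has a closed-form conditional and is drawn directly, while the other is treated as nonstandard and updated with a Metropolis step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy log-posterior: bivariate normal with correlation 0.8
# (a stand-in for the intractable VARMA posterior; any log-density works).
def log_post(a, b):
    return -(a**2 - 2 * 0.8 * a * b + b**2) / (2 * (1 - 0.8**2))

a, b = 0.0, 0.0
draws = []
for it in range(4000):
    # Block 1: the conditional of a given b is available in closed form
    # (Gaussian), so we draw from it directly, as in plain Gibbs sampling.
    a = rng.normal(0.8 * b, np.sqrt(1 - 0.8**2))
    # Block 2: pretend b given a is nonstandard; use a Metropolis step
    # with a Gaussian proposal centered at the current value.
    prop = b + rng.normal(0.0, 1.0)
    if np.log(rng.uniform()) < log_post(a, prop) - log_post(a, b):
        b = prop
    draws.append((a, b))

draws = np.array(draws[1000:])  # discard burn-in
print(np.round(np.corrcoef(draws.T)[0, 1], 2))
```

In the VARMA setting each "block" is one of the parameter groups described later, and the log-density evaluations use the likelihood recursion of the next section.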
Computationally Feasible Evaluation of the Posterior Density

The posterior distribution corresponding to the exact likelihood function given in (3) requires evaluation of the $(nk)^2$ elements of $\Gamma$ within each Gibbs iteration, as well as computation of $\Gamma^{-1}$. Although the use of an approximate likelihood may be computationally simpler, the exact likelihood is preferable for greater accuracy of parameter estimates, especially for small samples and for nearly nonstationary and/or noninvertible models. The following approach for the computation of the exact Gaussian likelihood eliminates the need to compute and invert $\Gamma$.

Sampling-based Inference using the Exact Likelihood

In the Bayesian framework, the exact likelihood function may be easily evaluated through the incorporation of latent variables denoting the unobserved history of the observed process and the error process. Let $Y_0 = (\tilde{y}_0, \ldots, \tilde{y}_{1-p})$ denote the history of the mean-subtracted data process and $A_0 = (\tilde{a}_0, \ldots, \tilde{a}_{1-q})$ denote the history of the error process. Then $\psi^* = (\psi, \Sigma, Y_0, A_0)$ denotes the augmented parameter vector incorporating the latent variables. As in Marriott et al. (1996) for the univariate case, the multivariate conditional likelihood $f(\tilde{Z}_n \mid \psi^*)$ may be factorized in the following manner:

$$f(\tilde{Z}_n \mid \psi^*) = f(\tilde{z}_1 \mid \psi^*)\, f(\tilde{z}_2 \mid \tilde{z}_1, \psi^*) \cdots f(\tilde{z}_n \mid \tilde{z}_1, \ldots, \tilde{z}_{n-1}, \psi^*) = (2\pi)^{-kn/2}\, \lvert\Sigma\rvert^{-n/2} \exp\!\Big\{ -\tfrac{1}{2} \sum_{t=1}^{n} (\tilde{y}_t - \tilde{\mu}_t)^T \Sigma^{-1} (\tilde{y}_t - \tilde{\mu}_t) \Big\}, \qquad (4)$$

where

$$\begin{aligned}
\tilde{\mu}_1 &= \sum_{i=1}^{p} \Phi_i^*\, \tilde{y}_{1-i} - \sum_{i=1}^{q} \Theta_i^*\, \tilde{a}_{1-i}, \\
\tilde{\mu}_t &= \sum_{i=1}^{p} \Phi_i^*\, \tilde{y}_{t-i} - \sum_{i=1}^{t-1} \Theta_i^* (\tilde{y}_{t-i} - \tilde{\mu}_{t-i}) - \sum_{i=t}^{q} \Theta_i^*\, \tilde{a}_{t-i}, \quad t = 2, \ldots, q, \qquad (5) \\
\tilde{\mu}_t &= \sum_{i=1}^{p} \Phi_i^*\, \tilde{y}_{t-i} - \sum_{i=1}^{q} \Theta_i^* (\tilde{y}_{t-i} - \tilde{\mu}_{t-i}), \quad t = q+1, \ldots, n,
\end{aligned}$$

with the $(l,m)$th element of $\Phi_i^*$ equal to $\gamma_{lmi}\phi_{lmi}$, $l, m = 1, \ldots, k$, $i = 1, \ldots, p$, and the $(l,m)$th element of $\Theta_i^*$ equal to $\delta_{lmi}\theta_{lmi}$, $l, m = 1, \ldots, k$, $i = 1, \ldots, q$. With the incorporation of the latent variables, we factor $\pi(\psi^*)$ as $\pi(\psi^*) = \pi(Y_0, A_0 \mid \psi, \Sigma)\,\pi(\psi)\,\pi(\Sigma)$. The form of $\pi(Y_0, A_0 \mid \psi, \Sigma)$ is that of a Gaussian likelihood function, with the covariance of $(Y_0, A_0)$ computed assuming a VARMA(p, q) process.

Sampling-based Inference using an Approximate Likelihood Function

Although calculation of the exact likelihood function in the Bayesian framework through incorporation of latent variables allows for easier computation of the posterior density than can be obtained using the form given in (3), the number of parameters to be estimated may become infeasible for short time series when p and q are even moderately large and/or the model is seasonal. Since the $Y_0$ and $A_0$

needed to compute the posterior distribution are not of much direct interest, it may be preferable to obtain them via backcasting in such cases (see Pai et al., 1994). The backcasted values at a particular iteration are generated in the usual way (see, e.g., Box et al., 1994, p. 230) based on the sampled parameters at that iteration. Although inappropriate in a strict Bayesian paradigm, backcasting yields a good approximation to Eq. (4) and can reduce the model dimension drastically.

Implementation of the Metropolis-Hastings Algorithm

The Gibbs sampling algorithm requires sampling from the complete conditional distributions associated with the elements of $(\psi, \Sigma, Y_0, A_0)$ in some systematic order (see Gelfand and Smith, 1990). Let $\tilde{\gamma}_{(-ijl)}$ denote the vector of indicator variables for the AR coefficients excluding $\gamma_{ijl}$. The posterior conditional distribution of $\gamma_{ijl}$ given $\tilde{\gamma}_{(-ijl)}, \tilde{\delta}, \phi, \theta, \tilde{\mu}, \Sigma, \tilde{Z}_n$ is Bernoulli $B(1, p^*_{ijl})$ with $p^*_{ijl} = c_{ijl}/(c_{ijl} + d_{ijl})$, where

$$c_{ijl} = p_{ijl} \exp\!\Big\{ -\tfrac{1}{2} \sum_{t=1}^{n} (\tilde{y}_t - \tilde{\mu}_t)^T \Sigma^{-1} (\tilde{y}_t - \tilde{\mu}_t) \Big\} \quad \text{with } \gamma_{ijl} = 1, \qquad (6)$$

and

$$d_{ijl} = (1 - p_{ijl}) \exp\!\Big\{ -\tfrac{1}{2} \sum_{t=1}^{n} (\tilde{y}_t - \tilde{\mu}_t)^T \Sigma^{-1} (\tilde{y}_t - \tilde{\mu}_t) \Big\} \quad \text{with } \gamma_{ijl} = 0, \qquad (7)$$

and $\tilde{\mu}_t$ is computed as in (5). The posterior conditional distribution of $\delta_{ijm}$ is computed in an analogous manner. For all elements of $\psi^*$ except $\tilde{\gamma}$ and $\tilde{\delta}$, we use the Metropolis-Hastings algorithm with appropriate Gaussian proposals (Hastings, 1970) to obtain samples from the required stationary distributions. An inverse Wishart proposal (Press, 1982) is used to obtain samples from the required stationary distribution of $\Sigma$. For each draw, we run Metropolis trajectories within each Gibbs sampler updating step (see Pai et al., 1994, for a discussion of the Metropolis-Hastings updating scheme for estimating contemporaneously correlated ARMA models). To implement the sampling algorithm, we block the parameters into the following groups: $(\tilde{\gamma}, \tilde{\delta})$, $(\phi, \theta)$, $(Y_0, A_0)$, $\tilde{\mu}$, and $\Sigma$.
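The indicator update in (6)-(7) amounts to evaluating the Gaussian exponent twice, once with the coefficient switched on and once with it switched off, and drawing a Bernoulli variable from the normalized odds. A minimal sketch follows; the `neg_half_quadform` function and the residual matrices are stand-ins for the VARMA recursion (5), not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(1)

def neg_half_quadform(resids, sigma_inv):
    # -0.5 * sum_t (y_t - mu_t)' Sigma^{-1} (y_t - mu_t)
    return -0.5 * np.einsum('ti,ij,tj->', resids, sigma_inv, resids)

def draw_indicator(p_prior, resid_on, resid_off, sigma_inv):
    """One draw of gamma_ijl per Eqs. (6)-(7).

    resid_on / resid_off are the (n x k) residual matrices computed with
    the indicator set to 1 and to 0, respectively."""
    # Work on the log scale and subtract the max to avoid underflow.
    lc = np.log(p_prior) + neg_half_quadform(resid_on, sigma_inv)
    ld = np.log(1 - p_prior) + neg_half_quadform(resid_off, sigma_inv)
    m = max(lc, ld)
    c, d = np.exp(lc - m), np.exp(ld - m)
    return rng.uniform() < c / (c + d)

# Hypothetical example: including the coefficient shrinks the residuals,
# so the indicator should come up 1 nearly every time.
sigma_inv = np.eye(2)
resid_off = rng.normal(0, 1.0, size=(50, 2))
resid_on = 0.5 * resid_off
draws = [draw_indicator(0.5, resid_on, resid_off, sigma_inv) for _ in range(200)]
print(np.mean(draws))
```

Working on the log scale matters in practice: for moderate n the two exponents in (6)-(7) underflow to zero if computed directly.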
Within each Gibbs iteration, we draw samples of $(\tilde{\gamma}, \tilde{\delta})$ from the Bernoulli distribution described above, and use draws based on a Metropolis scheme for each of the other groups; for each draw, the Metropolis algorithm runs a conditional Markov chain whose stationary distribution is the required posterior density. For example, for $(\phi, \theta)$ we use a Gaussian proposal denoted by $h_{k^2(p+q)}(\phi, \theta)$, whose covariance matrix is obtained from

the observed Fisher information for $(\phi, \theta)$. Let $U$ denote the current value of $(\phi, \theta)$; we draw $V$ from $h_{k^2(p+q)}$ centered at $U$ and restricted to the stationary and invertible region. We calculate the ratio

$$\alpha(U, V) = f(V, \tilde{\mu}, \Sigma, Y_0, A_0, \tilde{\gamma}, \tilde{\delta}) \,/\, f(U, \tilde{\mu}, \Sigma, Y_0, A_0, \tilde{\gamma}, \tilde{\delta}).$$

If $\alpha \geq 1$ we move to $V$; if $\alpha < 1$ we move to $V$ with probability $\alpha$. The stationary distribution of this Markov chain is the normalized density associated with the conditional distribution for $(\phi, \theta)$ given $(\tilde{\mu}, \Sigma, Y_0, A_0, \tilde{\gamma}, \tilde{\delta})$ (Hastings, 1970; Tierney, 1994). Sampling subject to stationarity and invertibility conditions on $\phi$ and $\theta$ is handled by rejection sampling (see Pai et al., 1994). In the univariate case, an alternate choice is to employ a reparametrization (Marriott et al., 1996); in the multivariate case, reparametrization is not as straightforward. Rejection sampling may also be used to force parameter estimates to obey certain constraints. After updating $(\tilde{\gamma}, \tilde{\delta})$ and $(\phi, \theta)$, we make draws for $(Y_0, A_0)$, $\tilde{\mu}$, and $\Sigma$ by running suitable Metropolis schemes, to complete one iteration of the Metropolis-Hastings sampler. The covariance used for sampling $\tilde{\mu}$ is that obtained from the observed Fisher information for $\hat{\mu}$, an initial estimate of $\tilde{\mu}$. The parameter $\Sigma$ is sampled using an inverse Wishart proposal having parameters $k$ and $\hat{\Sigma}$, where $\hat{\Sigma}$ is an initial estimate of $\Sigma$. The covariance used for sampling $(Y_0, A_0)$ is the covariance of a VARMA(p, q) model with parameters $(\hat{\phi}, \hat{\theta}, \hat{\Sigma})$. We conduct the estimation, including model selection, in two steps. In the first, model selection, step, we incorporate $(\tilde{\gamma}, \tilde{\delta})$ as parameters in the model. The posterior means of the elements of $(\tilde{\gamma}, \tilde{\delta})$ are used to determine which elements in the AR and MA matrices should be included in the model. In the second step, we obtain final estimates for the model(s) selected after the first step by constraining to zero those elements of the AR and MA matrices for which the corresponding $\tilde{\gamma}$ and $\tilde{\delta}$ parameters have small posterior means.
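The rejection step for stationarity mentioned above can be implemented by checking that all eigenvalues of the VAR companion matrix lie inside the unit circle; a proposed draw failing the check is simply discarded and redrawn. A sketch, with illustrative coefficient values (not the authors' code):

```python
import numpy as np

def is_stationary(phi_list):
    """Check stationarity of a VAR(p) with coefficient matrices
    phi_list = [Phi_1, ..., Phi_p] via the companion-matrix eigenvalues."""
    k, p = phi_list[0].shape[0], len(phi_list)
    comp = np.zeros((k * p, k * p))
    comp[:k, :] = np.hstack(phi_list)    # top block row: Phi_1 ... Phi_p
    comp[k:, :-k] = np.eye(k * (p - 1))  # sub-diagonal identity blocks
    return np.max(np.abs(np.linalg.eigvals(comp))) < 1.0

# A clearly stationary VAR(1) and a clearly explosive one:
print(is_stationary([np.array([[0.5, 0.1], [0.0, 0.3]])]))   # True
print(is_stationary([np.array([[1.2, 0.0], [0.0, 0.9]])]))   # False
```

The analogous check on the $\Theta_i$ matrices enforces invertibility.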
For the implementation of the second step, $\tilde{\gamma}$ and $\tilde{\delta}$ are fixed, having elements taking value 1 or 0 as determined in the first step, and are not updated as parameters in the Metropolis-Hastings scheme. As a set of initial values for the sampler, we use estimates obtained via maximum likelihood estimation or conditional least squares. These estimates provide initial values for a single Markov chain, which must usually be run for a fairly large number of iterations, with convergence monitored. After the chain has converged, say at the $j$th iteration, a set of samples (of size R) is obtained by choosing every $h$th sample, where the sample autocorrelations of the chain at lags $k \geq h$ are very small. Alternately, the initial parameter estimates may be perturbed based on Gelman and Rubin's (1992) overdispersion criterion to start independent parallel Markov chains. The transition of the Markov chains is investigated

by monitoring the autocorrelations of each chain and the mixing between the chains. The convergence of the iterative simulation may be monitored by estimating a potential scale reduction factor (Gelman and Rubin, 1992). A large value of the factor suggests the importance of further simulations; once the factor is near 1 for all scalar estimands of interest, we collect samples from selected iterations of the converged chains to ensure that we have independent and identically distributed samples from the joint posterior density.

Forecasting and Model Determination

In Bayesian analysis, predictions are obtained via the predictive density

$$f(\tilde{z}_F \mid \tilde{Z}_n) = \int f(\tilde{z}_F \mid \tilde{Z}_n, \psi^*)\, \pi(\psi^* \mid \tilde{Z}_n)\, d\psi^*.$$

Using the output of the Gibbs sampler, the density is approximated using Monte Carlo integration. For example, to estimate the predictive density of $\tilde{Z}_{n+k} = (\tilde{z}_{n+1}, \ldots, \tilde{z}_{n+k})$, we use

$$f(\tilde{Z}_{n+k} \mid \tilde{Z}_n, \psi^*_j) = \prod_{i=1}^{k} f(\tilde{z}_{n+i} \mid \tilde{z}_{n+i-1}, \ldots, \tilde{z}_{n+1}, \tilde{Z}_n, \psi^*_j), \qquad (8)$$

where $f(\tilde{z}_{n+i} \mid \tilde{z}_{n+i-1}, \ldots, \tilde{z}_{n+1}, \tilde{Z}_n, \psi^*_j)$ is a normal density function with variance-covariance matrix $\Sigma_j$ and mean $\tilde{\mu}_{n+i,j}$ computed using (4) with the parameters obtained at iteration $j$ of the sampler. To obtain a sample of predictions from the density (8), random values $\tilde{Z}_{n+k,j}$ are drawn from $f(\tilde{Z}_{n+k} \mid \tilde{Z}_n, \psi^*_j)$, $j = 1, \ldots, R$. Prediction intervals are obtained using specified upper and lower percentiles of the sampled predictions. Bayesian assessment of model determination is also based on predictive distributions; thus pairwise choice of models in a Bayesian framework based on the formal Bayes criterion requires calculation of the Bayes factor (the ratio of marginal predictive distributions). Pai and Ravishanker (1996) discuss the use of Bayes factors for model choice among univariate long memory models in the context of improper prior specification on the parameters.
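The Monte Carlo approximation of (8) can be sketched for a VAR(1): for each retained posterior draw we simulate a future path forward through the one-step conditional normals, then read prediction intervals off the empirical percentiles. The jittered fixed parameters below are hypothetical stand-ins for actual sampler output.

```python
import numpy as np

rng = np.random.default_rng(2)
k, R, horizon = 2, 500, 4
z_n = np.array([0.1, -0.2])  # last observation (toy values)

paths = np.empty((R, horizon, k))
for j in range(R):
    # In practice (phi_j, sigma_j) come from iteration j of the sampler;
    # here we jitter fixed values to mimic posterior draws.
    phi_j = np.array([[0.5, 0.1], [0.0, 0.3]]) + rng.normal(0, 0.02, (k, k))
    sigma_j = np.diag(rng.uniform(0.04, 0.06, k))
    z = z_n
    for h in range(horizon):
        # One-step-ahead conditional normal, as in Eq. (8)
        z = phi_j @ z + rng.multivariate_normal(np.zeros(k), sigma_j)
        paths[j, h] = z

# 95% prediction intervals from the sampled predictions
lo, hi = np.percentile(paths, [2.5, 97.5], axis=0)
print(lo.shape, hi.shape)
```

Because each path carries a different parameter draw, the resulting intervals reflect parameter uncertainty as well as innovation variance, which is the point of integrating over the posterior.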
In this paper, we suggest an alternate approach to assess model adequacy and determine model choice for "candidate" models selected using the indicator variables $\tilde{\gamma}$ and $\tilde{\delta}$. This is based on EDA-style graphical displays of summary statistics for the predictive performance of competing models at each time point and for each series, given the observations at hand (see Marriott et al., 1996, for the univariate analog).

Suppose we denote the observed series by $\tilde{z}_{obs} = (\tilde{z}_{1,obs}, \ldots, \tilde{z}_{n,obs})$ and we wish to predict a replication of the series $\tilde{z} = (\tilde{z}_1, \ldots, \tilde{z}_n)$ given $\tilde{z}_{obs}$ (and a particular VARMA model); then the predictive distribution of $\tilde{z}$ given $\tilde{z}_{obs}$ is, analogous to (8),

$$f(\tilde{z} \mid \tilde{z}_{obs}) = \int f(\tilde{z} \mid \tilde{z}_{obs}, \psi^*)\, \pi(\psi^* \mid \tilde{z}_{obs})\, d\psi^*. \qquad (9)$$

Thus samples from the predictive distribution are drawn by first sampling $\psi^*_j$, $j = 1, \ldots, R$, and then drawing $\tilde{z}$ from $f(\tilde{z} \mid \psi^*_j)$. The $\tilde{z}_t$, $t = 1, \ldots, n$, are drawn sequentially using (4). To construct EDA-style diagnostics, the sample $\{\tilde{z}_{t,j},\, j = 1, \ldots, R\}$ is used to obtain $\lVert \tilde{z}_{t,obs} - E(\tilde{z}_t \mid \tilde{z}_{obs}) \rVert$, where $\lVert\cdot\rVert$ denotes the norm of a vector. The sample is also used to obtain a measure of the dispersion $V_t$ in the predictive distributions, where

$$V_t = \frac{\sum_{j=1}^{R} \sum_{i=1}^{k} \big( (z_{t,i,j} \mid z_{obs,i}) - E(z_{t,i} \mid z_{obs,i}) \big)^2}{R - 1}, \qquad (10)$$

which is analogous to the sample variance in the univariate case. A scatter plot of $\lVert \tilde{z}_{t,obs} - E(\tilde{z}_t \mid \tilde{z}_{obs}) \rVert$ versus $V_t^{1/2}$ for $t = 1, \ldots, n$ gives an informal model choice criterion. Models having points lying in a cloud around the origin correspond to smaller dispersion in the predictive distributions and suggest that the observations are consonant with these distributions. Thus a poor model will perform poorly on both the x and y scales, a satisfactory parsimonious model will do well on both scales, while an overfitted model will perform well on the y scale alone. This approach works with nested as well as non-nested models, although the choice is not clear-cut when the point clouds overlap considerably. In such cases, single-number summaries of the predictive distributions or plots of kernel density estimates of the model parameters provide additional information useful for model determination. See Marriott et al. (1996) for a discussion of a similar criterion in the univariate case. The examples in the following section illustrate these ideas.
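Given R replicated series drawn from (9), both axes of the diagnostic scatter plot reduce to a few array operations. The replicates below are synthetic, standing in for draws from the predictive distribution of a fitted model.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, R = 60, 3, 400
z_obs = rng.normal(0, 1, (n, k))               # stand-in observed series
z_rep = z_obs + rng.normal(0, 0.5, (R, n, k))  # stand-in predictive replicates

post_mean = z_rep.mean(axis=0)                 # E(z_t | z_obs), shape (n, k)

# x-axis: norm of the deviation of each observation from its
# predictive mean, one value per time point t
x = np.linalg.norm(z_obs - post_mean, axis=1)

# y-axis: square root of the dispersion V_t of Eq. (10),
# summed over replicates j and series i
V_t = ((z_rep - post_mean) ** 2).sum(axis=(0, 2)) / (R - 1)
y = np.sqrt(V_t)

print(x.shape, y.shape)  # one (x, y) point per time point for the scatter
```

Plotting `x` against `y` for each candidate model on common axes then gives the informal comparison described above: a good parsimonious model clusters near the origin on both scales.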
Illustrative Examples

VARMA Bayesian Modeling of West German Economic Data using the Exact Likelihood

The data consist of quarterly, seasonally adjusted data on fixed investment, disposable income and consumption expenditure (in millions of DM) for West Germany (Lutkepohl, 1993) from 1960 to

The first n = 75 observations on the first differences of the logarithms of these data ( to ) are used for model fitting, while forecasts are obtained and evaluated for the quarters to . We analyze the data in two steps; first we fit a full VAR(4) model to the data, similar to the analysis of Lutkepohl, using an exact Bayesian approach through the incorporation of four latent variables, each of dimension 3, corresponding to the four autoregressive lags. In the initial step, we do not constrain any elements of the AR matrices to be zero, but rather incorporate the additional parameters $(\tilde{\gamma}, \tilde{\delta})$ into the model to select submodels having high posterior probability. The Metropolis-Hastings algorithm is started with initial estimates provided by the maximum likelihood estimates of the VAR(4) model obtained using the SCA statistical system. In fitting the model, we ran 5000 iterations for each of 4 parallel chains, with 20 sub-trajectories for the Metropolis scheme. Convergence was monitored by looking at the Gelman and Rubin statistics for each parameter; these were all close to one, indicating convergence. After obtaining convergence, we select every 10th sample from each parallel chain (the sample serial correlations of the sampled parameter values were close to zero at lags 10 or greater) to yield a total of 1000 samples for each parameter. On the basis of the posterior means of the $\tilde{\gamma}$ elements, we considered two reduced VAR(4) models in the second step. First we estimated the reduced model for which all the posterior means for $\gamma_{ijl}$ in Step 1 were greater than 0.5. The posterior means and standard deviations for the estimated $\Phi_j$ parameters, as well as for the elements of the error process covariance matrix, are shown in the first part of Table 1 (Model 1). For comparison, we also fit a reduced VAR(4) model with the (2,1) element of $\Phi_1$ and the (1,1) element of $\Phi_3$ estimated, in addition to the elements estimated in Model 1.
These additional elements had $\gamma_{ijl}$ posterior means in Step 1 of 0.46 and 0.33, respectively. The results are given in the second part of Table 1 (Model 2). We see that the additional parameters do not differ significantly from zero, although the residual variance for each of the three series is smaller than for Model 1. We note also that in Lutkepohl's analysis, the model chosen using the Hannan-Quinn model selection criterion is exactly the same as our Model 1, with parameter estimates very close to the posterior means obtained using our Bayesian approach.

INSERT Table 1 ABOUT HERE

We also fit an unconstrained VARMA(1,1) model to the data. On the basis of the posterior means of the corresponding $\tilde{\gamma}$ and $\tilde{\delta}$ elements, the constrained model shown in the bottom part of Table 1 (Model 3) was estimated. Only the (3,2) element of the $\Phi_1$ matrix and the (3,3) element of the $\Theta_1$ matrix differ

significantly from zero. Small positive estimates of the mean of each of the three logged differenced series were obtained for each of the models, with $\hat{\mu}_1 = 0.018\ (0.005)$, $\hat{\mu}_2 = 0.021\ (0.002)$, $\hat{\mu}_3 = 0.020\ (0.002)$ for Model 1, indicating that fixed investments, disposable income, and consumption expenditures all have significant positive long-term drift. Estimates of the series means for the other models were very similar. For all models, there is evidence of significant cross-correlation in the residual series. We select among the three models on the basis of their predictive densities below. Figure 1 shows a plot of the mean absolute deviations of the observations from their average predicted values versus the dispersion measure for the predicted values, $V_t^{1/2}$, as described in the previous section, for the three models of the West German data. All the models have point clouds close to the origin, and are almost indistinguishable. Model 1 has points lying a little further to the right, indicating a less adequate model. Parameter estimates for Model 3 indicate that while changes in fixed investments (disposable incomes) depend only to a small extent on fixed investment rates (disposable incomes) in previous quarters, and are uncorrelated with changes in the other series, changes in consumption expenditures depend both on consumption expenditures in previous quarters and on the previous quarters' disposable incomes.

INSERT Figure 1 ABOUT HERE

We also investigate the performance of the out-of-sample forecasts obtained from the three models for each of the three series over the period to . Figure 2 presents features of the forecasts relative to the observed data over the sixteen quarters from to , with forecasting carried out as described in the previous section.

INSERT Figure 2 ABOUT HERE

The 95% prediction intervals are shown for each series for the 16 quarters, based on each of the three models. Model 3 appears to produce the best forecasts, although Model 2 is competitive.
VARMA Modeling of IBM Revenue Data in the Bayesian Framework

The data consist of monthly observations on the revenue of seven different geographical regions within IBM, extending over 34 months (see Figure 3).

INSERT Figure 3 ABOUT HERE

To maintain confidentiality, the exact years are not disclosed and the numbers have all been rescaled by a constant; the rescaling, however, does not alter the dependence structure between and within the series. The seven time series exhibit remarkably similar stochastic patterns. Ravishanker et al. (1995) used shrinkage estimation with contemporaneously correlated ARMA processes to model these series, ignoring possible feedback mechanisms that can be handled using a more general VARMA model. In this section, we investigate the usefulness of such a general model for the revenue data in a Bayesian framework. We carry out model estimation based on the first 30 months, reserving the last 4 months for forecast evaluation. As in Pai et al. (1994) for a similar set of revenue series, we apply a seasonal difference of order 3 to the data and fit to the seasonally differenced series a seasonal AR model of order P = 3. The mean of the seasonally differenced data is assumed to be zero. Using exact maximum likelihood estimation with latent variables, all AR parameters estimated, and $\Sigma$ unstructured, would require estimation of 238 parameters! Examination of the sample cross-correlations between residuals obtained from univariate SAR(3) models fit to the seasonally differenced series indicates that feedback relationships occur only between certain regions. The AR parameters corresponding to regions having no feedback relationships are constrained to be zero, resulting in 12 nonzero coefficients estimated for each $\Phi_j$, $j = 1, 2, 3$. Additionally, the method of backcasting is used to approximate the exact likelihood function, further reducing the number of parameters by $9 \times 7 = 63$. Following the analysis of Pai et al. (1994), homogeneous correlation among the regions is reasonable; thus $\Sigma$ is structured as $\Sigma_{ii} = \sigma_i^2$, $\Sigma_{ij} = \rho\,\sigma_i\sigma_j$, bringing the total number of parameters estimated to $(12 \times 3) + 8 = 44$.
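The homogeneous-correlation structure $\Sigma_{ii} = \sigma_i^2$, $\Sigma_{ij} = \rho\,\sigma_i\sigma_j$ is easy to assemble from its free parameters; a sketch with illustrative values (k = 3 here, versus the seven regions in the application):

```python
import numpy as np

def homogeneous_sigma(sigmas, rho):
    """Build Sigma with Sigma_ii = sigma_i^2 and Sigma_ij = rho*sigma_i*sigma_j."""
    s = np.asarray(sigmas, dtype=float)
    sigma = rho * np.outer(s, s)      # off-diagonal structure
    np.fill_diagonal(sigma, s ** 2)   # diagonal variances
    return sigma

sig = homogeneous_sigma([1.0, 2.0, 0.5], 0.47)
print(sig.shape)
# Positive definite whenever -1/(k-1) < rho < 1, so the structure
# is a valid error covariance over that range.
print(np.all(np.linalg.eigvals(sig) > 0))
```

With seven regional standard deviations $\sigma_i$ and a single $\rho$, this structure accounts for the 8 covariance parameters in the count above.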
To implement the sampling algorithm, we block the parameters into the following groups: $(\Phi_j,\ j = 1, 2, 3)$, $(\sigma_i,\ i = 1, \ldots, 7)$, and $\rho$, with nonroutine draws from a normal proposal used to sample transformed values of $\sigma_i$ and $\rho$ (see Pai et al., 1994). Summary results on the means and standard deviations of the marginal posterior distributions of the parameters are shown in Table 2 (Model 1), along with results from fitting a contemporaneous ARMA model, i.e., with all off-diagonal elements of the AR matrices constrained to be zero (Model 2). There appear to be small, but significant, relationships between the revenues for Regions 2 and 1 and for Regions 5 and 3. The estimated diagonal elements of the AR parameters are similar for both models, with the exception of the $\Phi_3$ diagonal values for Regions 3 and 6.

INSERT Table 2 ABOUT HERE

Figure 4 shows the posterior distribution of ρ under Model 1 and Model 2. The posterior mean is 0.47 with a posterior standard deviation of 0.07 under Model 1, while under Model 2 the corresponding posterior mean is 0.38. The distribution of ρ appears to be slightly skewed to the right under Model 1. The box plots of the samples from the posteriors of each of the σ_i² under Model 1 and Model 2 are shown in Figure 5. These plots are useful in assessing the homogeneity of variances for the seven regions. In each figure, the plot on the left corresponds to Model 1. The plots indicate that an assumption of identical variances for all regions is questionable. INSERT Figure 4 ABOUT HERE INSERT Figure 5 ABOUT HERE Figures 6, 7, 8 and 9 pertain to a discussion of forecasting in the sampling-based Bayesian framework. Figure 6 presents the forecasting features of the future data z̃_F under Model 1 and Model 2, carried out as described in the previous section. The 95% prediction intervals are shown for each region for months 31, 32, 33 and 34. The bold line indicates the actual revenues for these time periods. The forecasts for all 4 months under both models appear to be reasonably good; however, for Region 3 and Region 5, the actual observed revenue values lie closer to the upper prediction limits under Model 2. This may be a result of the apparent feedback relationship between Regions 3 and 5, which is unaccounted for in Model 2. Other than this, the intervals are virtually indistinguishable. Figure 7 presents, for each of the two models, forecasts for the same periods aggregated over all seven regions. Again the tight upper limit under Model 2 is evident. INSERT Figure 6 ABOUT HERE INSERT Figure 7 ABOUT HERE Figures 8 and 9 illustrate an attractive feature of the sampling-based Bayesian framework: various forecasting features of interest that may be complicated to obtain in the frequentist framework can be routinely computed in the Bayesian framework.
For example, there is often interest in looking at predictions of the regional revenues as proportions of the total revenue for all regions. Although in the frequentist approach the derivation of the corresponding prediction intervals is complicated by the nonlinearity involved, it is straightforward in the sampling-based Bayesian framework. The forecast proportions for the revenue data are shown for Model 1 in Figure 8. Figure 9 presents the forecasts and prediction intervals for the revenues aggregated over time for each region, also under Model 1.
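The point about functionals of the predictive distribution can be made concrete with a small sketch: given posterior predictive draws, intervals for regional proportions or aggregates are just quantiles of the transformed draws. The array shapes and the simulated draws below are purely illustrative, not the revenue data:

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical posterior predictive draws: (n_draws, n_regions, n_months)
draws = rng.lognormal(mean=2.0, sigma=0.3, size=(5000, 7, 4))

# proportions of total revenue: transform each draw, then take quantiles
props = draws / draws.sum(axis=1, keepdims=True)
prop_lo, prop_hi = np.quantile(props, [0.025, 0.975], axis=0)

# revenues aggregated over regions, with 95% intervals per month
agg = draws.sum(axis=1)
agg_lo, agg_hi = np.quantile(agg, [0.025, 0.975], axis=0)
```

Because each functional is computed draw by draw before the quantiles are taken, the resulting intervals automatically reflect the nonlinearity that complicates the corresponding frequentist derivation.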

INSERT Figure 8 ABOUT HERE
INSERT Figure 9 ABOUT HERE

References

Box, G.E.P., Jenkins, G.M., and Reinsel, G.C., Time Series Analysis: Forecasting and Control, 3rd edn., Englewood Cliffs, NJ: Prentice-Hall (1994).
Gelfand, A.E. and Smith, A.F.M., "Sampling based approaches to calculating marginal densities", Journal of the American Statistical Association, 85 (1990), 398-409.
Gelman, A. and Rubin, D.B., "Inference from iterative simulation using multiple sequences (with discussion)", Statistical Science, 7 (1992).
George, E. and McCulloch, R.E., "Variable selection via Gibbs sampling", Journal of the American Statistical Association, 88 (1993).
Hastings, W.K., "Monte Carlo sampling methods using Markov chains and their applications", Biometrika, 57 (1970).
Kuo, L. and Mallick, B., "Variable selection for regression models", Technical Report 94-26, University of Connecticut (1994).
Li, H., "Markov variance-shift model, random intervention model, Bayesian VARMA model and their application", preprint, University of Chicago (1995).
Lutkepohl, H., Introduction to Multiple Time Series Analysis, 2nd edn., New York: Springer-Verlag (1993).
Marriott, J.M., Ravishanker, N., Gelfand, A.E. and Pai, J.S., "Bayesian analysis for ARMA processes: Complete sampling based inference under exact likelihoods", in Bayesian Statistics and Econometrics: Essays in Honor of Arnold Zellner, eds. D. Barry, K. Chaloner and J. Geweke, New York: John Wiley (1996).
McCulloch, R.E. and Tsay, R.S., "Bayesian analysis of autoregressive time series via the Gibbs sampler", Journal of Time Series Analysis, 15 (1994).
Pai, J.S., Ravishanker, N. and Gelfand, A.E., "Bayesian analysis of concurrent time series with application to regional IBM revenue data", Journal of Forecasting, 13 (1994).
Pai, J.S. and Ravishanker, N., "Bayesian modeling of ARFIMA processes by Markov chain Monte Carlo methods", Journal of Forecasting, 15 (1996).
Press, S.J., Applied Multivariate Analysis, 2nd edn., New York: Holt, Rinehart and Winston (1982).
Ravishanker, N., Wu, L. S.-Y. and Dey, D.K., "Shrinkage estimation in time series using a bootstrapped covariance estimate", Journal of Statistical Computation and Simulation, 53 (1995).

Reinsel, G.C., Elements of Multivariate Time Series Analysis, Springer Series in Statistics, New York: Springer-Verlag (1993).
Tanner, M., Tools for Statistical Inference: Observed Data and Data Augmentation Methods, Lecture Notes in Statistics, New York: Springer-Verlag (1993).
Tierney, L., "Markov chains for exploring posterior distributions (with discussion)", Annals of Statistics, 22 (1994).
West, M. and Harrison, J., Bayesian Forecasting and Dynamic Models, New York: Springer-Verlag (1989).

Table 1: Summary of fitted model parameters for West German economic data for constrained VAR(4) and constrained VARMA(1,1) models

Table 2: Summary of fitted seasonal AR parameters by model and by region for IBM revenue data

Figure 1: Model choice plot for West German economic data (absolute deviation vs. standard deviation; models 1, 2 and 3)

Figure 2: 16-month-ahead 95% forecast intervals for West German data relative to actual (solid line), by series

Figure 3: IBM regional revenue data for 7 regions.

Figure 4: Posterior distribution of common correlation ρ for Model 1 (full model) and Model 2 (contemporaneous model)

Figure 5: Box plots of posterior samples for σ_i² under Model 1 (full) and Model 2 (contemporaneous)

Figure 6: Four-month-ahead 95% forecast intervals relative to actual (solid line) by region for Model 1 (full, denoted by dotted line) and Model 2 (contemporaneous, denoted by dashed line)

Figure 7: Four-month-ahead 95% aggregate (over regions) forecast intervals for Model 1 (full, denoted by dotted line) and Model 2 (contemporaneous, denoted by dashed line) compared with actual (solid line)

Figure 8: Four-month-ahead 95% forecast intervals for proportions for Model 1, compared with actual (solid line)

Figure 9: Four-month-ahead 95% aggregate (over time) forecast intervals for Model 1, compared with actual (solid line)


More information

Sea Surface. Bottom OBS

Sea Surface. Bottom OBS ANALYSIS OF HIGH DIMENSIONAL TIME SERIES: OCEAN BOTTOM SEISMOGRAPH DATA Genshiro Kitagawa () and Tetsuo Takanami (2) () The Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku, Tokyo 06-8569

More information

Structural Macroeconometrics. Chapter 4. Summarizing Time Series Behavior

Structural Macroeconometrics. Chapter 4. Summarizing Time Series Behavior Structural Macroeconometrics Chapter 4. Summarizing Time Series Behavior David N. DeJong Chetan Dave The sign of a truly educated man is to be deeply moved by statistics. George Bernard Shaw This chapter

More information

Rank Regression with Normal Residuals using the Gibbs Sampler

Rank Regression with Normal Residuals using the Gibbs Sampler Rank Regression with Normal Residuals using the Gibbs Sampler Stephen P Smith email: hucklebird@aol.com, 2018 Abstract Yu (2000) described the use of the Gibbs sampler to estimate regression parameters

More information

Statistical Inference for Stochastic Epidemic Models

Statistical Inference for Stochastic Epidemic Models Statistical Inference for Stochastic Epidemic Models George Streftaris 1 and Gavin J. Gibson 1 1 Department of Actuarial Mathematics & Statistics, Heriot-Watt University, Riccarton, Edinburgh EH14 4AS,

More information

Toutenburg, Fieger: Using diagnostic measures to detect non-mcar processes in linear regression models with missing covariates

Toutenburg, Fieger: Using diagnostic measures to detect non-mcar processes in linear regression models with missing covariates Toutenburg, Fieger: Using diagnostic measures to detect non-mcar processes in linear regression models with missing covariates Sonderforschungsbereich 386, Paper 24 (2) Online unter: http://epub.ub.uni-muenchen.de/

More information

Bayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida

Bayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida Bayesian Statistical Methods Jeff Gill Department of Political Science, University of Florida 234 Anderson Hall, PO Box 117325, Gainesville, FL 32611-7325 Voice: 352-392-0262x272, Fax: 352-392-8127, Email:

More information

Vector Autoregressive Model. Vector Autoregressions II. Estimation of Vector Autoregressions II. Estimation of Vector Autoregressions I.

Vector Autoregressive Model. Vector Autoregressions II. Estimation of Vector Autoregressions II. Estimation of Vector Autoregressions I. Vector Autoregressive Model Vector Autoregressions II Empirical Macroeconomics - Lect 2 Dr. Ana Beatriz Galvao Queen Mary University of London January 2012 A VAR(p) model of the m 1 vector of time series

More information

Non-homogeneous Markov Mixture of Periodic Autoregressions for the Analysis of Air Pollution in the Lagoon of Venice

Non-homogeneous Markov Mixture of Periodic Autoregressions for the Analysis of Air Pollution in the Lagoon of Venice Non-homogeneous Markov Mixture of Periodic Autoregressions for the Analysis of Air Pollution in the Lagoon of Venice Roberta Paroli 1, Silvia Pistollato, Maria Rosa, and Luigi Spezia 3 1 Istituto di Statistica

More information

outlier posterior probabilities outlier sizes AR(3) artificial time series t (c) (a) (b) φ

outlier posterior probabilities outlier sizes AR(3) artificial time series t (c) (a) (b) φ DETECTION OF OUTLIER PATCHES IN AUTOREGRESSIVE TIME SERIES Ana Justel? 2 Daniel Pe~na?? and Ruey S. Tsay???? Department ofmathematics, Universidad Autonoma de Madrid, ana.justel@uam.es?? Department of

More information

ARIMA Models. Richard G. Pierse

ARIMA Models. Richard G. Pierse ARIMA Models Richard G. Pierse 1 Introduction Time Series Analysis looks at the properties of time series from a purely statistical point of view. No attempt is made to relate variables using a priori

More information

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning for for Advanced Topics in California Institute of Technology April 20th, 2017 1 / 50 Table of Contents for 1 2 3 4 2 / 50 History of methods for Enrico Fermi used to calculate incredibly accurate predictions

More information

Inference in VARs with Conditional Heteroskedasticity of Unknown Form

Inference in VARs with Conditional Heteroskedasticity of Unknown Form Inference in VARs with Conditional Heteroskedasticity of Unknown Form Ralf Brüggemann a Carsten Jentsch b Carsten Trenkler c University of Konstanz University of Mannheim University of Mannheim IAB Nuremberg

More information

Heteroskedasticity-Robust Inference in Finite Samples

Heteroskedasticity-Robust Inference in Finite Samples Heteroskedasticity-Robust Inference in Finite Samples Jerry Hausman and Christopher Palmer Massachusetts Institute of Technology December 011 Abstract Since the advent of heteroskedasticity-robust standard

More information

Partially Collapsed Gibbs Samplers: Theory and Methods. Ever increasing computational power along with ever more sophisticated statistical computing

Partially Collapsed Gibbs Samplers: Theory and Methods. Ever increasing computational power along with ever more sophisticated statistical computing Partially Collapsed Gibbs Samplers: Theory and Methods David A. van Dyk 1 and Taeyoung Park Ever increasing computational power along with ever more sophisticated statistical computing techniques is making

More information

Bayesian Inference. Chapter 9. Linear models and regression

Bayesian Inference. Chapter 9. Linear models and regression Bayesian Inference Chapter 9. Linear models and regression M. Concepcion Ausin Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master in Mathematical Engineering

More information

A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED

A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED by W. Robert Reed Department of Economics and Finance University of Canterbury, New Zealand Email: bob.reed@canterbury.ac.nz

More information

Chapter 8: Model Diagnostics

Chapter 8: Model Diagnostics Chapter 8: Model Diagnostics Model diagnostics involve checking how well the model fits. If the model fits poorly, we consider changing the specification of the model. A major tool of model diagnostics

More information

Dynamic Matrix-Variate Graphical Models A Synopsis 1

Dynamic Matrix-Variate Graphical Models A Synopsis 1 Proc. Valencia / ISBA 8th World Meeting on Bayesian Statistics Benidorm (Alicante, Spain), June 1st 6th, 2006 Dynamic Matrix-Variate Graphical Models A Synopsis 1 Carlos M. Carvalho & Mike West ISDS, Duke

More information

1 Random walks and data

1 Random walks and data Inference, Models and Simulation for Complex Systems CSCI 7-1 Lecture 7 15 September 11 Prof. Aaron Clauset 1 Random walks and data Supposeyou have some time-series data x 1,x,x 3,...,x T and you want

More information

SOME COMMENTS ON THE THEOREM PROVIDING STATIONARITY CONDITION FOR GSTAR MODELS IN THE PAPER BY BOROVKOVA et al.

SOME COMMENTS ON THE THEOREM PROVIDING STATIONARITY CONDITION FOR GSTAR MODELS IN THE PAPER BY BOROVKOVA et al. J. Indones. Math. Soc. (MIHMI) Vol. xx, No. xx (20xx), pp. xx xx. SOME COMMENTS ON THE THEOREM PROVIDING STATIONARITY CONDITION FOR GSTAR MODELS IN THE PAPER BY BOROVKOVA et al. SUHARTONO and SUBANAR Abstract.

More information

STAT 425: Introduction to Bayesian Analysis

STAT 425: Introduction to Bayesian Analysis STAT 425: Introduction to Bayesian Analysis Marina Vannucci Rice University, USA Fall 2017 Marina Vannucci (Rice University, USA) Bayesian Analysis (Part 2) Fall 2017 1 / 19 Part 2: Markov chain Monte

More information

The Bias-Variance dilemma of the Monte Carlo. method. Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel

The Bias-Variance dilemma of the Monte Carlo. method. Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel The Bias-Variance dilemma of the Monte Carlo method Zlochin Mark 1 and Yoram Baram 1 Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel fzmark,baramg@cs.technion.ac.il Abstract.

More information

A Single Series from the Gibbs Sampler Provides a False Sense of Security

A Single Series from the Gibbs Sampler Provides a False Sense of Security A Single Series from the Gibbs Sampler Provides a False Sense of Security Andrew Gelman Department of Statistics University of California Berkeley, CA 9472 Donald B. Rubin Department of Statistics Harvard

More information

Bayesian Estimation of Input Output Tables for Russia

Bayesian Estimation of Input Output Tables for Russia Bayesian Estimation of Input Output Tables for Russia Oleg Lugovoy (EDF, RANE) Andrey Polbin (RANE) Vladimir Potashnikov (RANE) WIOD Conference April 24, 2012 Groningen Outline Motivation Objectives Bayesian

More information

Time Series Analysis -- An Introduction -- AMS 586

Time Series Analysis -- An Introduction -- AMS 586 Time Series Analysis -- An Introduction -- AMS 586 1 Objectives of time series analysis Data description Data interpretation Modeling Control Prediction & Forecasting 2 Time-Series Data Numerical data

More information

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain

More information

Topic 4 Unit Roots. Gerald P. Dwyer. February Clemson University

Topic 4 Unit Roots. Gerald P. Dwyer. February Clemson University Topic 4 Unit Roots Gerald P. Dwyer Clemson University February 2016 Outline 1 Unit Roots Introduction Trend and Difference Stationary Autocorrelations of Series That Have Deterministic or Stochastic Trends

More information

MCMC and Gibbs Sampling. Sargur Srihari

MCMC and Gibbs Sampling. Sargur Srihari MCMC and Gibbs Sampling Sargur srihari@cedar.buffalo.edu 1 Topics 1. Markov Chain Monte Carlo 2. Markov Chains 3. Gibbs Sampling 4. Basic Metropolis Algorithm 5. Metropolis-Hastings Algorithm 6. Slice

More information

G. Larry Bretthorst. Washington University, Department of Chemistry. and. C. Ray Smith

G. Larry Bretthorst. Washington University, Department of Chemistry. and. C. Ray Smith in Infrared Systems and Components III, pp 93.104, Robert L. Caswell ed., SPIE Vol. 1050, 1989 Bayesian Analysis of Signals from Closely-Spaced Objects G. Larry Bretthorst Washington University, Department

More information