Bayesian inference for the mixed conditional heteroskedasticity model


Econometrics Journal (2007), volume 10, pp. 408–425. doi: 10.1111/j.1368-423X.2007.00213.x

Bayesian inference for the mixed conditional heteroskedasticity model

L. BAUWENS AND J.V.K. ROMBOUTS
CORE and Department of Economics, Université Catholique de Louvain; Institute of Applied Economics at HEC Montréal, CIRANO, CIRPEE, CORE and CREF

First version received: November 2005; final version accepted: February 2007

Summary. We estimate by Bayesian inference the mixed conditional heteroskedasticity model of Haas et al. (2004a, Journal of Financial Econometrics 2, 211–50). We construct a Gibbs sampler algorithm to compute posterior and predictive densities. The number of mixture components is selected by the marginal likelihood criterion. We apply the model to the S&P 500 daily returns.

Key words: Bayesian inference, Finite mixture, ML estimation, Value at risk.

1. INTRODUCTION

Finite mixture models (see e.g. McLachlan and Peel 2000) are increasingly used in statistics and econometrics. Their main advantage lies in the flexibility they provide in model specification, compared with the use of a simpler distribution. On the other hand, these models are more difficult to estimate than the corresponding models without a mixture, although their estimation becomes more feasible as computational power increases. Computational power alone is not sufficient, however: one also needs good algorithms. Maximum likelihood estimation of mixture models is not nearly as easy as for non-mixture models, and it is not very reliable in some cases. The EM algorithm was initially developed in this perspective (see Dempster et al. 1977). Bayesian estimation is also very efficient for mixture models (see Marin et al. 2005; Geweke and Keane 2005).

Conditionally heteroskedastic models are very widespread for modelling time series of financial returns. The most widely used class is the GARCH family (see e.g. Bollerslev et al. 1994 for a survey). A lot of research has been devoted to refining the dynamic specification of the conditional variance equation, for which the benchmark is the linear GARCH specification of Bollerslev (1986). The conditional distribution of the model error term is chosen by most researchers among the normal, the Student-t, skewed versions of these, and the GED distribution (see Nelson 1991). Empirical models typically include around five parameters to fit time series of a few thousand observations, which may be considered a powerful way to represent the data. Such models fit the most important stylized facts of financial returns, namely volatility clustering and fat tails. However, a typical result of the estimation of such models is that the conditional variance is almost integrated of order one and therefore very persistent, at least for relatively long time series at the daily frequency. Several authors have argued that this could be an artefact of structural changes (see e.g. Diebold 1986; Mikosch and Starica 2004).

Furthermore, it has also been observed that volatility is less persistent around crisis periods than during normal periods. Such empirical regularities can be captured by using a finite mixture approach.

Finite mixture GARCH models have been developed recently by Haas et al. (2004a), who build on the results of Wong and Li (2000, 2001), Haas et al. (2004b) and Alexander and Lazar (2004). All these authors use ML estimation, while Bauwens et al. (2004) propose a particular two-component mixture GARCH model and estimate it by Bayesian inference. Bayesian estimation of GARCH models has been studied by Geweke (1989), Kleibergen and van Dijk (1993) and Bauwens and Lubrano (1998). Note that finite mixtures are different from continuous mixtures. An example of a continuous mixture GARCH model is a GARCH equation combined with a Student-t distribution for the error term, since the latter distribution is a continuous mixture of normal distributions whose variance follows an inverted-gamma distribution. Thus, a t-GARCH model results in fatter tails than a Gaussian GARCH, but it does not increase the flexibility of the conditional variance equation, whereas a finite mixture GARCH model permits this.

Bayesian inference for the mixed normal GARCH model of Haas et al. (2004a) is the subject of this paper. The model is defined in Section 2. In Section 3, we explain how this model can be estimated in the Bayesian framework: we design a Gibbs sampler and discuss how to obtain predictive densities and how to choose the number of components of the mixture. In Section 4, we apply the approach to returns of the S&P 500 index.

2. MIXED CONDITIONAL HETEROSKEDASTICITY

Haas et al. (2004a) define a mixture model on a demeaned series $\epsilon_t = y_t - E(y_t \mid F_{t-1})$, where $F_{t-1}$ is the information set up to time $t-1$ and the conditional mean does not depend on the components of the mixture. They call this model (diagonal) MN-GARCH (MN for mixed normal). The conditional CDF of $\epsilon_t$ is the $K$-component mixture

$$F(\epsilon_t \mid F_{t-1}) = \sum_{k=1}^{K} \pi_k \, \Phi\!\left(\frac{\epsilon_t - \mu_k}{\sqrt{h_{k,t}}}\right), \qquad (1)$$

where

$$h_{k,t} = \omega_k + \alpha_k \epsilon_{t-1}^2 + \beta_k h_{k,t-1} \qquad (2)$$

and $\Phi(\cdot)$ is the standard Gaussian cdf. The parameter $\pi_k$ is positive for all $k$ and $\sum_{k=1}^{K} \pi_k = 1$, which is imposed by setting $\pi_K = 1 - \sum_{k=1}^{K-1} \pi_k$. The other Greek letters denote the remaining parameters. The zero-mean assumption on $\epsilon_t$ is ensured by the restriction

$$\mu_K = -\frac{\sum_{k=1}^{K-1} \pi_k \mu_k}{\pi_K}. \qquad (3)$$

Haas et al. (2004a) also consider a more general model where the $h_{k,t}$'s are GARCH($p_k$, $q_k$) and, more importantly, may depend on the other $h_{j,t}$'s, $k \neq j$ (contrary to the diagonal specification defined above). The weak stationarity condition for a (diagonal) MN-GARCH model is

$$\sum_{k=1}^{K} \frac{\pi_k (1 - \alpha_k - \beta_k)}{\bar\beta_k} > 0, \qquad (4)$$

where $\bar\beta_k = 1 - \beta_k$. Its unconditional variance is then given by

$$E(\epsilon_t^2) = \frac{c + \sum_{k=1}^{K} \pi_k \omega_k / \bar\beta_k}{\sum_{k=1}^{K} \pi_k (1 - \alpha_k - \beta_k)/\bar\beta_k}, \qquad (5)$$

where $c = \sum_{k=1}^{K} \pi_k \mu_k^2$. One can check that the process may be stationary even if some components are not stationary, provided that the corresponding component weights are sufficiently small. Strict stationarity conditions are not known for this model.
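To make the dynamics of (1)–(5) concrete, the following minimal sketch simulates a diagonal two-component MN-GARCH(1,1) path. The function and the parameter values are our own illustration (not taken from the paper) and are chosen so that the restriction (3) and the weak stationarity condition (4) hold.

```python
import numpy as np

def simulate_mn_garch(T, pi, mu, omega, alpha, beta, seed=0):
    """Simulate a diagonal K-component MN-GARCH(1,1) series, eqs. (1)-(3).

    pi, mu, omega, alpha, beta are length-K arrays; mu must satisfy the
    zero-mean restriction sum_k pi_k * mu_k = 0 of eq. (3).
    """
    rng = np.random.default_rng(seed)
    K = len(pi)
    h = omega / (1.0 - beta)            # crude initial values for the K variances
    eps = np.zeros(T)
    for t in range(T):
        k = rng.choice(K, p=pi)                          # pick a mixture component
        eps[t] = mu[k] + np.sqrt(h[k]) * rng.standard_normal()
        h = omega + alpha * eps[t] ** 2 + beta * h       # update all K variances, eq. (2)
    return eps

# Illustrative parameters: a persistent stable regime and an explosive one.
pi = np.array([0.85, 0.15])
mu = np.array([0.05, -0.05 * 0.85 / 0.15])               # enforces eq. (3)
eps = simulate_mn_garch(3000, pi, mu,
                        omega=np.array([0.005, 0.05]),
                        alpha=np.array([0.04, 0.30]),
                        beta=np.array([0.94, 0.85]))
```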

3. BAYESIAN INFERENCE

We specify the conditional mean $E(y_t \mid F_{t-1})$ as an AR($p$) model with a constant term. The model is then written as

$$y_t = \rho_0 + \rho_1 y_{t-1} + \cdots + \rho_p y_{t-p} + \epsilon_t, \qquad (6)$$

where $\epsilon_t$ follows the MN-GARCH specification defined by (1). In the sequel we replace (6) by the shorter notation

$$y_t = \rho' x_t + \epsilon_t, \qquad (7)$$

where $\rho = (\rho_0, \rho_1, \ldots, \rho_p)'$ and $x_t = (1, y_{t-1}, \ldots, y_{t-p})'$. The likelihood of the MN-GARCH model for $T$ observations is given by

$$L(\Theta \mid y) = \prod_{t=1}^{T} \sum_{k=1}^{K} \pi_k \, \phi(y_t \mid \mu_k + \rho' x_t, \theta_k), \qquad (8)$$

where $\Theta$ is the vector regrouping the parameters $\rho$ and $\pi_k$, $\mu_k$, $\theta_k$ for $k = 1, \ldots, K$, $y = (y_1, y_2, \ldots, y_T)$, and $\phi(\cdot \mid \mu_k + \rho' x_t, \theta_k)$ denotes a normal density with mean $\mu_k + \rho' x_t$ and variance $h_{k,t}$, which depends on $\theta_k = (\omega_k, \alpha_k, \beta_k)'$.

A direct evaluation of the likelihood function is difficult because it consists of a product of sums. To alleviate this evaluation, we introduce for each observation a state variable $S_t \in \{1, 2, \ldots, K\}$ that takes the value $k$ if the observation $y_t$ belongs to component $k$. The vector $S_T$ contains the state variables for the $T$ observations. We assume that the state variables are independent given the group probabilities, and that the probability that $S_t$ is equal to $k$ is $\pi_k$:

$$\varphi(S_T \mid \pi) = \prod_{t=1}^{T} \varphi(S_t \mid \pi) = \prod_{t=1}^{T} \pi_{S_t}, \qquad (9)$$

where $\pi = (\pi_1, \pi_2, \ldots, \pi_K)$. Given $S_T$ and $y$, the likelihood function is

$$L(\Theta \mid S_T, y) = \prod_{t=1}^{T} \pi_{S_t} \, \phi(y_t \mid \mu_{S_t} + \rho' x_t, \theta_{S_t}), \qquad (10)$$

which is easier to evaluate than (8). Since $S_T$ is not observed, we treat it as a parameter of the model. This technique is called data augmentation (see Tanner and Wong 1987 for more details). Although the augmented model contains more parameters, inference becomes easier by making use of Markov chain Monte Carlo (MCMC) methods.
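As a concrete counterpart to (8), the sketch below evaluates the mixture log-likelihood of the AR($p$) MN-GARCH(1,1) model by a single pass over the data; the function name and argument layout are ours, and the sketch only illustrates the product-of-sums structure that data augmentation is designed to avoid.

```python
import numpy as np

def mn_garch_loglik(y, X, rho, pi, mu, omega, alpha, beta):
    """Log of (8): sum_t log( sum_k pi_k * N(y_t | mu_k + rho'x_t, h_{k,t}) ).

    y is the T-vector of returns, X the T x (p+1) matrix with rows x_t'.
    """
    h = omega / (1.0 - beta)      # crude start values for the K variance recursions
    loglik = 0.0
    for t in range(len(y)):
        if t > 0:
            eps_prev = y[t - 1] - X[t - 1] @ rho
            h = omega + alpha * eps_prev ** 2 + beta * h          # eq. (2)
        m = mu + X[t] @ rho                                       # component means
        dens = pi * np.exp(-0.5 * (y[t] - m) ** 2 / h) / np.sqrt(2.0 * np.pi * h)
        loglik += np.log(dens.sum())
    return loglik
```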

In this paper, we implement a Gibbs sampling algorithm that allows us to sample from the posterior distribution by sampling from its conditional posterior densities, which are called blocks. The blocks of the Gibbs sampler, and the prior densities, are explained in the next subsections, using the parameter vectors $\rho$, $\pi$, $\theta = (\theta_1, \theta_2, \ldots, \theta_K)$ and $\mu = (\mu_1, \mu_2, \ldots, \mu_K)$. The joint posterior distribution is given by

$$\varphi(S_T, \rho, \mu, \theta, \pi \mid y) \propto \varphi(\rho)\,\varphi(\mu)\,\varphi(\theta)\,\varphi(\pi) \prod_{t=1}^{T} \pi_{S_t}\,\phi(y_t \mid \mu_{S_t} + \rho' x_t, \theta_{S_t}), \qquad (11)$$

where $\varphi(\rho)$, $\varphi(\mu)$, $\varphi(\theta)$, $\varphi(\pi)$ are the corresponding prior densities. Thus, we assume prior independence between $\rho$, $\pi$, $\mu$ and $\theta$. We define these prior densities below when we explain the different blocks of the Gibbs sampler.

3.1. Sampling $S_T$ from $\varphi(S_T \mid \rho, \mu, \theta, \pi, y)$

Given $\rho$, $\mu$, $\theta$, $\pi$ and $y$, the posterior density of $S_T$ is proportional to $L(\Theta \mid S_T, y)$. It turns out that the $S_t$'s are mutually independent, so that we can write the relevant conditional posterior density as

$$\varphi(S_T \mid \rho, \mu, \theta, \pi, y) = \prod_{t=1}^{T} \varphi(S_t \mid \rho, \mu, \theta, \pi, y). \qquad (12)$$

As the sequence $\{S_t\}_{t=1}^{T}$ is equivalent to a multinomial process, we simply have to sample from a discrete distribution where the $K$ probabilities are given by

$$P(S_t = k \mid \rho, \mu, \theta, \pi, y) = \frac{\pi_k\,\phi(y_t \mid \mu_k + \rho' x_t, \theta_k)}{\sum_{j=1}^{K} \pi_j\,\phi(y_t \mid \mu_j + \rho' x_t, \theta_j)}, \qquad k = 1, \ldots, K. \qquad (13)$$

To sample $S_t$, we draw one observation from a uniform distribution on (0, 1) and decide which group $k$ to take according to (13).

3.2. Sampling $\pi$ from $\varphi(\pi \mid S_T, \rho, \mu, \theta, y)$

The full conditional posterior density of $\pi$ depends only on $S_T$ and $y$ and is given by

$$\varphi(\pi \mid S_T, y) = \varphi(\pi \mid S_T) \propto \varphi(\pi) \prod_{k=1}^{K} \pi_k^{x_k}, \qquad (14)$$

where $x_k$ is the number of times that $S_t = k$. The prior $\varphi(\pi)$ is chosen to be a Dirichlet distribution, Di($a_{10}, a_{20}, \ldots, a_{K0}$), with parameter vector $a_0 = (a_{10}, a_{20}, \ldots, a_{K0})$ (see the Appendix for more details). As a consequence, $\varphi(\pi \mid S_T, y)$ is also a Dirichlet distribution, Di($a_1, a_2, \ldots, a_K$), with $a_k = a_{k0} + x_k$, $k = 1, 2, \ldots, K$. Both updates are illustrated in the code sketch after Section 3.3.

3.3. Sampling $\mu$ from $\varphi(\mu \mid S_T, \rho, \pi, \theta, y)$

We show in the Appendix that the conditional distribution of $\tilde\mu = (\mu_1, \mu_2, \ldots, \mu_{K-1})$ is Gaussian with a non-diagonal covariance matrix. Once $\tilde\mu$ has been drawn, the last mean $\mu_K$ is obtained from (3).
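The two conjugate updates of Sections 3.1 and 3.2 are straightforward to code. A minimal sketch, assuming components are indexed 0,...,K−1, that `h` is a T × K array of conditional variances evaluated at the current draws, and that the helper names are ours:

```python
import numpy as np

def draw_states(y, X, rho, pi, mu, h, rng):
    """Draw S_1,...,S_T from the discrete conditional posterior (13)."""
    m = mu[None, :] + (X @ rho)[:, None]                  # T x K component means
    dens = pi * np.exp(-0.5 * (y[:, None] - m) ** 2 / h) / np.sqrt(2.0 * np.pi * h)
    prob = dens / dens.sum(axis=1, keepdims=True)
    u = rng.random(len(y))[:, None]                       # one uniform per observation
    return (u > np.cumsum(prob, axis=1)).sum(axis=1)      # invert the discrete cdf

def draw_pi(states, a0, rng):
    """Draw pi from the Dirichlet posterior (14): Di(a_k0 + x_k)."""
    counts = np.bincount(states, minlength=len(a0))
    return rng.dirichlet(a0 + counts)
```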

3.4. Sampling $\rho$ from $\varphi(\rho \mid S_T, \mu, \pi, \theta, y)$

Given that the conditional variances $h_{k,t}$ depend on $\rho$, the conditional posterior distribution for this block does not belong to a family that can be simulated easily. We can, for example, employ the Metropolis–Hastings algorithm (illustrated in the sketch following Section 3.6). For the latter, we use a Gaussian proposal $q(\cdot)$, the functional form of which is given in the Appendix. The acceptance probability at iteration $n + 1$ for a candidate $\rho^{*}$ has the form

$$\min\!\left(\frac{\varphi(S_T, \rho^{*}, \mu, \theta, \pi \mid y)\; q(\rho^{(n)};\, \bar\rho = \rho^{*})}{\varphi(S_T, \rho^{(n)}, \mu, \theta, \pi \mid y)\; q(\rho^{*};\, \bar\rho = \rho^{(n)})},\; 1\right). \qquad (15)$$

Apart from $\rho$, the other parameters in the posterior $\varphi(\cdot)$ are fixed at their latest draw.

3.5. Sampling $\theta$ from $\varphi(\theta \mid S_T, \rho, \mu, \pi, y)$

By assuming prior independence between the $\theta_k$'s, i.e. $\varphi(\theta) = \prod_{k=1}^{K} \varphi(\theta_k)$, it follows that

$$\varphi(\theta \mid S_T, \rho, \mu, \pi, y) = \varphi(\theta \mid S_T, \rho, \mu, y) = \varphi(\theta_1 \mid \rho, \mu_1, \tilde y^{1})\,\varphi(\theta_2 \mid \rho, \mu_2, \tilde y^{2}) \cdots \varphi(\theta_K \mid \rho, \mu_K, \tilde y^{K}), \qquad (16)$$

where $\tilde y^{k} = \{y_t \mid S_t = k\}$ and

$$\varphi(\theta_k \mid \rho, \mu_k, \tilde y^{k}) \propto \varphi(\theta_k) \prod_{t:\, S_t = k} \phi(y_t \mid \mu_k + \rho' x_t, \theta_k). \qquad (17)$$

Since we condition on the state variables, we can simulate each block $\theta_k$ separately. We do this with the griddy-Gibbs sampler (see the Appendix, and for further details Bauwens et al. 1999). Note that intervals of values for $\omega_k$, $\alpha_k$ and $\beta_k$ must be defined. The choice of these bounds needs to be finely tuned in order to cover the range of each parameter over which the posterior is relevant. For the deterministic integration we used 33 points, which proved to be enough according to several experiments.

3.6. Label switching

In mixture models, the labelling of the components is arbitrary and one can shuffle the labels without changing the likelihood function. The latter has as many modes as there are permutations of the regime labels. In the Bayesian framework, one can either run an algorithm that explores all the modes, which may not be easy and may take a lot of computing time, or impose an identification condition through the prior information. The solution used by Haas et al. (2004a) in the ML framework is to impose $\pi_1 > \pi_2 > \cdots > \pi_K$, but this solution destroys the result that the full conditional posterior of $\pi$ is Dirichlet, so the sampling of $\pi$ would become more difficult. We choose instead to impose that the component-specific parameters have sufficiently different prior densities (e.g. through non-overlapping supports, although this is an extreme solution that is not necessary).
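Returning to the $\rho$ block of Section 3.4, the sketch below shows one Metropolis–Hastings update. For brevity it uses a symmetric Gaussian proposal centred at the current draw, so the q-terms in (15) cancel; the paper's tailored proposal (constructed in the Appendix) would be used in the same accept/reject step. `log_post` stands for the log of the joint posterior kernel (11) with the other blocks held fixed, and the names are ours.

```python
import numpy as np

def mh_step_rho(rho_curr, log_post, proposal_cov, rng):
    """One Metropolis-Hastings update of rho (Section 3.4), symmetric proposal."""
    chol = np.linalg.cholesky(proposal_cov)
    rho_prop = rho_curr + chol @ rng.standard_normal(len(rho_curr))
    log_ratio = log_post(rho_prop) - log_post(rho_curr)   # q-terms cancel by symmetry
    if np.log(rng.random()) < log_ratio:
        return rho_prop, True                             # accept the candidate
    return rho_curr, False                                # keep the current draw
```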

3.7. Predictive densities

Predictive densities are essential for financial applications such as portfolio optimization and risk management. Unlike prediction in the classical framework, predictive densities take parameter uncertainty into account by construction. The predictive density of $y_{T+1}$ is given by

$$f(y_{T+1} \mid y) = \int f(y_{T+1} \mid \Theta, y)\,\varphi(\Theta \mid y)\, d\Theta, \qquad (18)$$

where $f(y_{T+1} \mid \Theta, y) = \sum_{k=1}^{K} \pi_k\,\phi(y_{T+1} \mid \mu_k + \rho' x_{T+1}, \theta_k)$, as implied by (1). An analytical solution to (18) is not available but, extending the algorithm of Geweke (1989), it can be approximated by

$$\frac{1}{N} \sum_{j=1}^{N} \left( \sum_{k=1}^{K} \pi_k^{(j)}\,\phi\!\left(y_{T+1} \mid \mu_k^{(j)} + \rho^{(j)\prime} x_{T+1},\, \theta_k^{(j)}\right) \right), \qquad (19)$$

where the superscript $(j)$ indexes the draws generated with the Gibbs sampler and $N$ is the number of draws. Therefore, simultaneously with the Gibbs sampler, we repeat $N$ times the following two-step algorithm:

Step 1: simulate $\Theta^{(j)} \sim \varphi(\Theta \mid y)$. This is done by the Gibbs sampler.
Step 2: simulate $y_{T+1}^{(j)} \sim f(y_{T+1} \mid \Theta^{(j)}, y)$. Go to step 1.

Extending the idea used for $y_{T+1}$, the predictive density for $y_{T+s}$ may be written as

$$f(y_{T+s} \mid y) = \int \left[ \int \cdots \int f(y_{T+s} \mid y_{T+s-1}, \ldots, y_{T+1}, y, \Theta)\, f(y_{T+s-1} \mid y_{T+s-2}, \ldots, y_{T+1}, y, \Theta) \cdots f(y_{T+1} \mid y, \Theta)\, dy_{T+s-1}\, dy_{T+s-2} \cdots dy_{T+1} \right] \varphi(\Theta \mid y)\, d\Theta, \qquad (20)$$

for which draws can be obtained by extending the above algorithm to an (s+1)-step algorithm. The draw of $y_{T+1}$ serves as conditioning information to draw $y_{T+2}$, both realisations serve to draw $y_{T+3}$, and so on. All these draws are easily generated from the finite mixture of normal densities. A non-Bayesian procedure typically proceeds by conditioning on a point estimate of $\Theta$, which ignores the estimation uncertainty.
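The (s+1)-step algorithm above is cheap to run alongside the Gibbs sampler. A sketch, assuming each retained draw is stored as a dict of parameter arrays, that `y_tail` holds the last p observations (oldest first), and that `h_next` holds the K conditional variances for time T+1 implied by each draw; all names are ours.

```python
import numpy as np

def predictive_draws(draws, y_tail, h_next, horizon, rng):
    """Simulate y_{T+1},...,y_{T+horizon} for each Gibbs draw (Section 3.7)."""
    out = np.empty((len(draws), horizon))
    for j, d in enumerate(draws):
        lags = list(y_tail)                      # last p observations, oldest first
        h = h_next[j].copy()                     # K variances for time T+1
        for s in range(horizon):
            x = np.concatenate(([1.0], lags[::-1]))           # x_t = (1, y_{t-1}, ..., y_{t-p})
            k = rng.choice(len(d['pi']), p=d['pi'])           # mixture component
            eps = d['mu'][k] + np.sqrt(h[k]) * rng.standard_normal()
            y_new = d['rho'] @ x + eps
            out[j, s] = y_new
            h = d['omega'] + d['alpha'] * eps ** 2 + d['beta'] * h   # variances for next step
            lags = lags[1:] + [y_new]            # earlier draws condition later ones, as in (20)
    return out
```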

3.8. Marginal likelihood

The marginal likelihood of $y$, also called the predictive density of the data, is useful for selecting the number of components $K$ in the mixture. It is well defined when the prior density is integrable, which is the case for the prior we use in this paper. For example, Bayes factors are ratios of marginal likelihoods (see Kass and Raftery 1995 for a detailed explanation). The marginal likelihood is defined as the integral of the likelihood with respect to the prior density:

$$m(y) = \int L(\Theta \mid y)\,\varphi(\Theta)\, d\Theta. \qquad (21)$$

Since this is the normalizing constant in Bayes' theorem, we can also write

$$m(y) = \frac{L(\Theta \mid y)\,\varphi(\Theta)}{\varphi(\Theta \mid y)}. \qquad (22)$$

Note that (22) is an identity that holds for every $\Theta$. Deterministic numerical integration of (21) is computationally too demanding for the finite mixture model of this paper. Instead, we calculate the marginal likelihood by the Laplace approximation (see Tierney and Kadane 1986). To explain this, let us define $\exp(h(\Theta)) = L(\Theta \mid y)\,\varphi(\Theta)$. The Laplace approximation is based on a second-order Taylor expansion of $h(\Theta)$ around the posterior mode $\hat\Theta = \arg\max_{\Theta} \ln \varphi(\Theta \mid y)$, so that the first-order term in the expansion vanishes:

$$h(\Theta) \approx h(\hat\Theta) + \tfrac{1}{2}(\Theta - \hat\Theta)' \left.\frac{\partial^2 h(\Theta)}{\partial\Theta\,\partial\Theta'}\right|_{\Theta = \hat\Theta} (\Theta - \hat\Theta). \qquad (23)$$

Therefore, the marginal likelihood can be computed as

$$\int \exp(h(\Theta))\, d\Theta \approx \exp(h(\hat\Theta)) \int \exp\!\left[\tfrac{1}{2}(\Theta - \hat\Theta)' \left.\frac{\partial^2 h(\Theta)}{\partial\Theta\,\partial\Theta'}\right|_{\Theta = \hat\Theta} (\Theta - \hat\Theta)\right] d\Theta, \qquad (24)$$

or

$$m(y) \approx L(\hat\Theta \mid y)\,\varphi(\hat\Theta)\,(2\pi)^{k/2}\,|\Sigma(\hat\Theta)|^{1/2}, \qquad (25)$$

where $k$ is the dimension of $\Theta$ and

$$\Sigma(\hat\Theta) = \left[-\left.\frac{\partial^2 \ln L(\Theta \mid y)\,\varphi(\Theta)}{\partial\Theta\,\partial\Theta'}\right|_{\Theta = \hat\Theta}\right]^{-1}. \qquad (26)$$

We choose the model with the highest marginal likelihood value. Another possibility for choosing the number of components is to treat $K$ as an additional parameter of the model, as is done by Richardson and Green (1997), who make use of reversible jump MCMC methods. In this way, the prior information on the number of components can be taken into account explicitly, by specifying for example a Poisson distribution on $K$ that favours a small number of components.
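A sketch of the Laplace approximation (25)–(26) in log form, with the Hessian of $h(\Theta)$ obtained by central finite differences; `log_post_kernel` stands for $\ln L(\Theta \mid y) + \ln \varphi(\Theta)$ and `theta_hat` for the posterior mode found by a numerical optimiser. Both names are ours.

```python
import numpy as np

def laplace_log_marginal(log_post_kernel, theta_hat, step=1e-4):
    """Log of (25): h(theta_hat) + (k/2) log(2 pi) + 0.5 log|Sigma(theta_hat)|."""
    k = len(theta_hat)
    H = np.empty((k, k))
    for i in range(k):                 # finite-difference Hessian of h(.) at the mode
        for j in range(k):
            ei = np.zeros(k); ei[i] = step
            ej = np.zeros(k); ej[j] = step
            H[i, j] = (log_post_kernel(theta_hat + ei + ej)
                       - log_post_kernel(theta_hat + ei - ej)
                       - log_post_kernel(theta_hat - ei + ej)
                       + log_post_kernel(theta_hat - ei - ej)) / (4.0 * step ** 2)
    _, logdet_sigma = np.linalg.slogdet(np.linalg.inv(-H))   # Sigma = (-H)^{-1}, eq. (26)
    return log_post_kernel(theta_hat) + 0.5 * k * np.log(2.0 * np.pi) + 0.5 * logdet_sigma
```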

4. APPLICATION TO S&P 500 DATA

We fit the two-component mixture model to daily S&P 500 percentage return data from 01/03/1994 to 09/06/2005 (3047 observations). Descriptive statistics are given in Table 1. Panel (a) of Figure 1 displays the sample path of the returns (the other panels are explained further down in this section). It is clear that excess kurtosis and volatility clustering are present in the data. We analysed whether a dynamic specification for the conditional mean is necessary and found evidence for an autoregressive model of order three. Thus, we estimate the model defined by (6) with p = 3 and by (1) with K = 2. The ML estimates and the first two Bayesian marginal posterior moments are given in Table 2.

Table 1. Descriptive statistics, S&P 500 returns.
  Observations          3047
  Mean                  0.039
  Standard deviation    1.07
  Maximum               5.58
  Minimum              -7.1
  Skewness             -0.1
  Kurtosis              6.74
Statistics for S&P 500 percentage daily returns from 01/03/1994 to 09/06/2005.

The parameters a_k0 of the Dirichlet prior for π are all equal to 1, which means that the prior density for the probability π_1 is uniform on (0, 1). The prior densities for the other parameters are all independent. For the parameters in ρ and μ, these prior densities are flat on wide intervals (their bounds need not be specified). For the GARCH parameters, the densities are uniform on finite intervals given by

0.000 < ω_1 < 0.004,   0.0005 < α_1 < 0.08,   0.89 < β_1 < 0.99,
0.009 < ω_2 < 0.6,     0.08 < α_2 < 0.65,     0.73 < β_2 < 0.97.

These values are the bounds used in the griddy-Gibbs part of the algorithm described in Section 3.5. The posterior marginal distributions of all the parameters are shown in Figure 2; the x-axes for the GARCH parameters are the prior intervals reported above. Note that the posterior marginals of ω_1 and ω_2 are somewhat truncated at zero, given that these parameters are restricted to be positive. A scatterplot of the α and β draws for both components is given in Figure 3. A clear conclusion from the figures is that the data are much less informative on the explosive regime than on the stable one. We checked the convergence of the Gibbs sampler for all parameters with CUMSUM plots of the draws (see Bauwens et al. 1999 for details).

From Table 2, we conclude that the ML and Bayesian parameter estimates are close to each other. The posterior standard deviations (SD) are in most cases a little smaller than the ML standard errors (SE) computed from the Hessian matrix evaluated at the ML estimates. These differences come to some extent from the use of finite intervals as supports of some prior densities. The estimated probability is about 0.83 for the first component, which is driven by a persistent stationary GARCH process (α_1 + β_1 = 0.98). The second component of the mixture has an explosive conditional variance (α_2 + β_2 = 1.17) with a probability of about 0.17.

To illustrate the interest of the Bayesian estimation of the two-component model, we report in panel (b) of Figure 1 the sample path of the posterior means of the state variables (mean states), i.e. for each observation we count the proportion of the Gibbs-sampler-generated state values that correspond to the explosive regime. The mean of these proportions is equal to 0.16, which is close to the probability of being in the second component of the mixture. Panel (c) of the figure contains the scatter plot of these mean states and the corresponding returns. From these graphs, one can identify a clear association between the explosive regime and the extreme returns, especially the negative ones. The asymmetric shape of this relation can be interpreted as the leverage effect, i.e. the association of large negative returns with a higher volatility than for positive returns of the same absolute value.

As a comparison, we report estimates of the single-component mixture model, i.e. the conventional GARCH(1,1) model. The ML estimates and the first two Bayesian marginal posterior moments are given in Table 3. The process looks integrated in variance, given that α + β is estimated at 0.995. This may be interpreted as a compromise between the less persistent and the explosive components of the mixture model. We obtained a similar result when we estimated the GARCH(1,1) model with data simulated from a two-component mixture.

Figure 1. Information on states. (a) S&P 500 returns; (b) mean states; (c) scatterplot of returns and mean states.

Table 2. Estimation results, S&P 500.
                 MLE                  Bayes
            Estimate     SE        Mean       SD
  ρ_0        0.06       0.05      0.040      0.02
  ρ_1        0.05       0.09      0.0        0.05
  ρ_2        0.026      0.09      0.029      0.05
  ρ_3        0.046      0.08      0.048      0.05
  π_1        0.87       0.2       0.83       0.098
  μ_1        0.067      0.023     0.073      0.027
  ω_1        0.0030     0.0025    0.0032     0.002
  α_1        0.042      0.07      0.040      0.05
  β_1        0.94       0.07      0.94       0.07
  ω_2        0.032      0.035     0.044      0.030
  α_2        0.30       0.8       0.32       0.3
  β_2        0.87       0.054     0.85       0.044
Results for the AR two-component normal mixture GARCH(1,1) model. Sample of 3047 observations from 01/03/1994 to 09/06/2005.

Thus, the observation that a quasi-integrated GARCH model (α + β ≈ 1) is obtained in many empirical studies can be explained by a lack of flexibility of this model.

In Table 4, we report the marginal likelihood and the Bayesian information criterion (BIC) values for the single- and two-component models. The results indicate a strong preference for the two-component model.

As for any time series model, prediction is essential. As explained in Section 3.7, Bayesian inference allows us to obtain predictive densities that incorporate parameter uncertainty by construction. Furthermore, they can easily be computed together with the posterior densities during the application of the Gibbs sampler for the model parameters. We report in Figure 4 the computed predictive densities for a horizon of up to five days out of sample (September 7, 2005 until September 11, 2005). Eyeballing the graphs, we see that the left tail of the predictive densities is fatter for the two-component model than for the simple GARCH model. In Table 5, we report the skewness and kurtosis coefficients, plus the value-at-risk (VaR) at 1 per cent, for the five days. Judging from the skewness and kurtosis values, the single-component model yields close-to-normal predictive densities, while the two-component model produces predictive densities with fatter tails and negative skewness. Because of the fatter left tail of the two-component model predictive densities, their VaR values are smaller than for the one-component model.

We also computed a sequence of one-step-ahead VaRs from the end of the sample until September 2006 (250 new observations). We computed the failure rates for the 1, 5, 10, 90, 95 and 99 per cent VaR levels. Likelihood ratio tests for each VaR level, for both models, do not reject, hence both models are able to fit the tails of the distribution well. This similar performance can be explained by the fact that the distribution of the returns in the covered forecast period is very close to normal, that is, very symmetrical and with low excess kurtosis.
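The quantities reported in Table 5 and the failure-rate check can be read directly off the predictive draws. A minimal sketch, where `pred` is a vector of simulated y_{T+h} values for one horizon, and `realised` and `var_series` are hypothetical arrays of out-of-sample returns and one-step-ahead 1 per cent VaR forecasts:

```python
import numpy as np
from scipy.stats import kurtosis, skew

def predictive_summary(pred):
    """Skewness, kurtosis and 1% VaR quantile of a predictive sample (Table 5)."""
    return {
        "skewness": skew(pred),
        "kurtosis": kurtosis(pred, fisher=False),   # raw kurtosis: 3 under normality
        "VaR_1pct": np.quantile(pred, 0.01),        # 1 per cent value-at-risk quantile
    }

def failure_rate(realised, var_series):
    """Share of days on which the realised return falls below the 1% VaR forecast."""
    return np.mean(realised < var_series)
```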

Figure 2. Posterior densities (kernel estimates from the Gibbs output) for the two-component normal mixture GARCH(1,1) model. Panels: (a) ρ_0, (b) ρ_1, (c) ρ_2, (d) ρ_3, (e) π_1, (f) μ_1, (g) ω_1 and ω_2, (h) α_1 and α_2, (i) β_1 and β_2.

Figure 3. MCMC draws in α–β space.

Table 3. Estimation results (one component), S&P 500.
                 MLE                  Bayes
            Estimate     SE        Mean       SD
  ρ_0        0.067      0.05      0.039      0.00
  ρ_1        0.009      0.09      0.05       0.04
  ρ_2        0.0049     0.09      0.004      0.04
  ρ_3        0.035      0.09      0.042      0.04
  ω          0.0056     0.0020    0.0060     0.008
  α          0.064      0.0085    0.065      0.0079
  β          0.93       0.0087    0.93       0.0082
Results for the Gaussian GARCH(1,1) model. Sample of 3047 observations from 01/03/1994 to 09/06/2005.

5. CONCLUSION

We have shown how a certain type of mixture GARCH model can be estimated by Bayesian inference. ML estimation is typically not easy because of the complexity of the likelihood function. In Bayesian estimation, this is taken care of by enlarging the parameter space with state variables, so that a Gibbs sampling algorithm is easy to implement. Despite a higher computing time, the Bayesian solution is reliable since estimation does not fail, while this may happen with MLE. Moreover, the Gibbs algorithm automatically delivers posterior results on the state variables, which can be used for interpreting the nature of the second regime, as we illustrate in Section 4. Finally, the Gibbs algorithm can be extended to include the computation of predictive densities, which takes care of estimation uncertainty.

Table 4. Model choice criteria, S&P 500 data.
  K    Marginal log-lik.    Maximized log-lik.    No. par.    BIC
  1        -4146.9              -4126.1               7       8308.4
  2        -4100.1              -4071.8              12       8239.8
K is the number of components of the normal mixture GARCH(1,1) model.

Figure 4. Kernel density estimates of the predictive densities for September 7 to 11, 2005, at horizons T+1 to T+5 (dotted line: two-component model; solid line: single-component model).

Table 5. Features of the predictive densities.
                  One component    Two components
  Skewness
    h = 1            0.08             -0.41
    h = 2            0.09             -0.43
    h = 3            0.06             -0.38
    h = 4            0.04             -0.30
    h = 5            0.02             -0.35
  Kurtosis
    h = 1            2.95              3.97
    h = 2            3.14              4.1
    h = 3            3.08              4.00
    h = 4            2.99              3.78
    h = 5            3.1               4.02
  VaR
    h = 1           -1.43             -1.73
    h = 2           -1.48             -1.82
    h = 3           -1.58             -1.76
    h = 4           -1.46             -1.65
    h = 5           -1.58             -1.77
h is the post-sample prediction horizon. VaR is the 1 per cent value-at-risk quantile.

Prediction in the ML approach is typically done by conditioning on the ML estimate and therefore ignores estimation uncertainty. Bayesian estimation of other types of mixture GARCH models can be handled in a similar way as in this paper. A bivariate mixture GARCH model is estimated by Bauwens et al. (2006).

ACKNOWLEDGMENTS

We thank Viorel Maxim for research assistance and Arie Preminger and anonymous referees for useful comments. Bauwens's work was supported in part by the European Community's Human Potential Programme under contract HPRN-CT-2002-00232, MICFINMA, and by an FSR grant from UCL. Rombouts's work was supported by a HEC Montréal Fonds de démarrage and by the Centre for Research on e-Finance. This text presents research results of the Belgian Program on Interuniversity Poles of Attraction initiated by the Belgian State, Prime Minister's Office, Science Policy Programming. The scientific responsibility is assumed by the authors.

REFERENCES

Alexander, C. and E. Lazar (2004). Normal mixture GARCH(1,1): Applications to exchange rate modelling. Journal of Applied Econometrics 21, 307–36.
Bauwens, L., C. Bos, R. van Oest and H. van Dijk (2004). Adaptive radial-based direction sampling: A class of flexible and robust Monte Carlo integration methods. Journal of Econometrics 123, 201–25.

Bauwens, L., C. Hafner and J. Rombouts (2006). Multivariate mixed normal conditional heteroskedasticity. CORE Discussion Paper 2006/2, Louvain-la-Neuve. Computational Statistics and Data Analysis (forthcoming).
Bauwens, L. and M. Lubrano (1998). Bayesian inference on GARCH models using the Gibbs sampler. Econometrics Journal 1, C23–46.
Bauwens, L., M. Lubrano and J. Richard (1999). Bayesian Inference in Dynamic Econometric Models. Oxford: Oxford University Press.
Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31, 307–27.
Bollerslev, T., R. Engle and D. Nelson (1994). ARCH models. In R. Engle and D. McFadden (eds), Handbook of Econometrics, chap. 4, pp. 2959–3038. Amsterdam: North-Holland Press.
Dempster, A., N. Laird and D. Rubin (1977). Maximum likelihood for incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B 39, 1–38.
Diebold, F. (1986). Comment on "Modeling the persistence of conditional variances". Econometric Reviews 5, 51–6.
Geweke, J. (1989). Exact predictive densities in linear models with ARCH disturbances. Journal of Econometrics 40, 63–86.
Geweke, J. and M. Keane (2005). Smoothly mixing regressions. Working paper, University of Iowa. Journal of Econometrics (forthcoming).
Haas, M., S. Mittnik and M. Paolella (2004a). Mixed normal conditional heteroskedasticity. Journal of Financial Econometrics 2, 211–50.
Haas, M., S. Mittnik and M. Paolella (2004b). A new approach to Markov-switching GARCH models. Journal of Financial Econometrics 2, 493–530.
Kass, R. and A. Raftery (1995). Bayes factors. Journal of the American Statistical Association 90, 773–95.
Kleibergen, F. and H. van Dijk (1993). Non-stationarity in GARCH models: A Bayesian analysis. Journal of Applied Econometrics 8, S41–61.
Marin, J., K. Mengersen and C. Robert (2005). Bayesian modelling and inference on mixtures of distributions. In D. Dey and C.R. Rao (eds), Handbook of Statistics 25. Elsevier Sciences.
McLachlan, G. and D. Peel (2000). Finite Mixture Models. New York: Wiley Interscience.
Mikosch, T. and C. Starica (2004). Nonstationarities in financial time series, the long-range dependence, and the IGARCH effects. Review of Economics and Statistics 86, 378–90.
Nelson, D. (1991). Conditional heteroskedasticity in asset returns: A new approach. Econometrica 59, 349–70.
Richardson, S. and P. Green (1997). On Bayesian analysis of mixtures with an unknown number of components. Journal of the Royal Statistical Society, Series B 59, 731–92.
Tanner, M. and W. Wong (1987). The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association 82, 528–40.
Tierney, L. and J. Kadane (1986). Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association 81, 82–6.
Wilks, S. (1962). Mathematical Statistics. New York: Wiley.
Wong, C. and W. Li (2000). On a mixture autoregressive model. Journal of the Royal Statistical Society, Series B 62, 95–115.
Wong, C. and W. Li (2001). On a mixture autoregressive conditional heteroscedastic model. Journal of the American Statistical Association 96, 982–95.

APPENDIX

The Dirichlet distribution

The Dirichlet density function is given by

$$f_{Di}(\pi \mid a_1, a_2, \ldots, a_K) = \frac{\Gamma(A)}{\prod_{k=1}^{K}\Gamma(a_k)} \prod_{k=1}^{K} \pi_k^{a_k - 1}\, \mathbb{1}_{S_K}(\pi), \qquad (A.1)$$

where $a_k > 0$ ($k = 1, \ldots, K$), $A = \sum_{i=1}^{K} a_i$ and $S_K = \{\pi_k,\ k = 1, \ldots, K \mid \pi_k > 0\ \forall k,\ \sum_{k=1}^{K}\pi_k = 1\}$. The first two moments are given by

$$E(\pi_i \mid a) = \frac{a_i}{A}, \qquad V(\pi_i \mid a) = \frac{a_i (A - a_i)}{A^2 (A+1)}, \qquad \mathrm{cov}(\pi_i, \pi_j \mid a) = \frac{-a_i a_j}{A^2 (A+1)},$$

respectively. We sample a Dirichlet distribution by sampling $K$ independent gamma random variables, $X_k \sim G(a_k, 1)$, and transforming them to

$$\pi_i = \frac{X_i}{X_1 + \cdots + X_K}, \quad i = 1, \ldots, K-1, \qquad \pi_K = 1 - \pi_1 - \pi_2 - \cdots - \pi_{K-1}.$$

It follows that $(\pi_1, \ldots, \pi_K) \sim Di(a_1, \ldots, a_K)$. Other properties of the Dirichlet distribution can be found in Wilks (1962).

Proof that $\varphi(\tilde\mu \mid S_T, \rho, \pi, \theta, y)$ is Gaussian

We illustrate this for $K = 3$. Minus two times the log-kernels for the first two components are given by

$$\sum_{(k)} \frac{(\epsilon_t - \mu_k)^2}{h_{k,t}} = c_k + \mu_k^2 \sum_{(k)} \frac{1}{h_{k,t}} - 2\mu_k \sum_{(k)} \frac{\epsilon_t}{h_{k,t}} \qquad (k = 1, 2), \qquad (A.2)$$

where $\sum_{(k)}$ means summation over all $t$ for which $S_t = k$, and $c_k$ does not depend on $\mu_k$. The third mixture component contributes in the following way:

$$\sum_{(3)} \frac{\left(\epsilon_t + \frac{\pi_1 \mu_1 + \pi_2 \mu_2}{\pi_3}\right)^2}{h_{3,t}} = c_3 + \mu_1^2 \left(\frac{\pi_1}{\pi_3}\right)^2 \sum_{(3)} \frac{1}{h_{3,t}} + 2\mu_1 \frac{\pi_1}{\pi_3} \sum_{(3)} \frac{\epsilon_t}{h_{3,t}} + \mu_2^2 \left(\frac{\pi_2}{\pi_3}\right)^2 \sum_{(3)} \frac{1}{h_{3,t}} + 2\mu_2 \frac{\pi_2}{\pi_3} \sum_{(3)} \frac{\epsilon_t}{h_{3,t}} + 2\frac{\pi_1 \pi_2}{\pi_3^2}\, \mu_1 \mu_2 \sum_{(3)} \frac{1}{h_{3,t}}. \qquad (A.3)$$

The sum of (A.2) and (A.3) can be written compactly as

$$(\tilde\mu - \bar\mu)' A (\tilde\mu - \bar\mu) + c, \qquad (A.4)$$

where $\tilde\mu = (\mu_1, \mu_2)'$ and $c$ is a constant not depending on $\tilde\mu$, by defining the matrix $A$ as

$$A = \begin{pmatrix} \sum_{(1)} \frac{1}{h_{1,t}} + \left(\frac{\pi_1}{\pi_3}\right)^2 \sum_{(3)} \frac{1}{h_{3,t}} & \frac{\pi_1 \pi_2}{\pi_3^2} \sum_{(3)} \frac{1}{h_{3,t}} \\ \frac{\pi_1 \pi_2}{\pi_3^2} \sum_{(3)} \frac{1}{h_{3,t}} & \sum_{(2)} \frac{1}{h_{2,t}} + \left(\frac{\pi_2}{\pi_3}\right)^2 \sum_{(3)} \frac{1}{h_{3,t}} \end{pmatrix}, \qquad (A.5)$$

and the vector $\bar\mu$ as $A^{-1} b$, where

$$b = \begin{pmatrix} \sum_{(1)} \frac{\epsilon_t}{h_{1,t}} - \frac{\pi_1}{\pi_3} \sum_{(3)} \frac{\epsilon_t}{h_{3,t}} \\ \sum_{(2)} \frac{\epsilon_t}{h_{2,t}} - \frac{\pi_2}{\pi_3} \sum_{(3)} \frac{\epsilon_t}{h_{3,t}} \end{pmatrix}. \qquad (A.6)$$

Minus one half times the first term of (A.4) is the log-kernel of a bivariate Gaussian density with mean $\bar\mu$ and covariance matrix $A^{-1}$. For $K$ components, with $\tilde\pi = (\pi_1, \ldots, \pi_{K-1})'$,

$$A = \mathrm{diag}\!\left(\sum_{(1)} \frac{1}{h_{1,t}}, \ldots, \sum_{(K-1)} \frac{1}{h_{K-1,t}}\right) + \frac{\tilde\pi \tilde\pi'}{\pi_K^2} \sum_{(K)} \frac{1}{h_{K,t}} \qquad (A.7)$$

and

$$b = \begin{pmatrix} \sum_{(1)} \frac{\epsilon_t}{h_{1,t}} - \frac{\pi_1}{\pi_K} \sum_{(K)} \frac{\epsilon_t}{h_{K,t}} \\ \vdots \\ \sum_{(K-1)} \frac{\epsilon_t}{h_{K-1,t}} - \frac{\pi_{K-1}}{\pi_K} \sum_{(K)} \frac{\epsilon_t}{h_{K,t}} \end{pmatrix}. \qquad (A.8)$$
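A sketch of the μ block implied by (A.7)–(A.8): it assembles A and b for general K from the current residuals, states and conditional variances, draws μ̃ from the resulting Gaussian and recovers μ_K from (3). The helper name and array layout are ours.

```python
import numpy as np

def draw_mu(eps, states, pi, h, rng):
    """Draw (mu_1,...,mu_K) from the Gaussian conditional posterior (A.7)-(A.8).

    eps    : T-vector of residuals y_t - rho'x_t
    states : T-vector of sampled S_t in {0,...,K-1}
    pi     : K mixture weights
    h      : T x K conditional variances
    """
    K = len(pi)
    inv_sum = np.array([np.sum(1.0 / h[states == k, k]) for k in range(K)])
    eps_sum = np.array([np.sum(eps[states == k] / h[states == k, k]) for k in range(K)])
    pit = pi[:K - 1]
    A = np.diag(inv_sum[:K - 1]) + np.outer(pit, pit) / pi[K - 1] ** 2 * inv_sum[K - 1]
    b = eps_sum[:K - 1] - pit / pi[K - 1] * eps_sum[K - 1]
    cov = np.linalg.inv(A)
    mu_tilde = rng.multivariate_normal(cov @ b, cov)
    mu_K = -np.sum(pit * mu_tilde) / pi[K - 1]            # restriction (3)
    return np.append(mu_tilde, mu_K)
```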

Gaussian proposal for $\varphi(\rho \mid S_T, \mu, \pi, \theta, y)$

We illustrate this for $K = 3$. We condition on the previous draw of $\rho$, denoted by $\bar\rho$, and we compute $h_{k,t}$ conditionally on $\bar\rho$; we therefore use the notation $\bar h_{k,t}$. Minus two times the log-kernels for the first two components is given by

$$\sum_{(k)} \frac{(y_t - \rho' x_t - \mu_k)^2}{\bar h_{k,t}} = c_k + \rho' \left(\sum_{(k)} \frac{x_t x_t'}{\bar h_{k,t}}\right) \rho - 2\rho' \sum_{(k)} \frac{x_t y_t}{\bar h_{k,t}} + 2\mu_k\, \rho' \sum_{(k)} \frac{x_t}{\bar h_{k,t}} \qquad (k = 1, 2), \qquad (A.9)$$

where $\sum_{(k)}$ means summation over all $t$ for which $S_t = k$, and $c_k$ does not depend on $\rho$. The third mixture component contributes in the following way:

$$\sum_{(3)} \frac{\left(y_t - \rho' x_t + \frac{\pi_1 \mu_1 + \pi_2 \mu_2}{\pi_3}\right)^2}{\bar h_{3,t}} = c_3 + \rho' \left(\sum_{(3)} \frac{x_t x_t'}{\bar h_{3,t}}\right) \rho - 2\rho' \sum_{(3)} \frac{x_t y_t}{\bar h_{3,t}} - 2\mu_1 \frac{\pi_1}{\pi_3}\, \rho' \sum_{(3)} \frac{x_t}{\bar h_{3,t}} - 2\mu_2 \frac{\pi_2}{\pi_3}\, \rho' \sum_{(3)} \frac{x_t}{\bar h_{3,t}}. \qquad (A.10)$$

The sum of (A.9) for $k = 1$ and $2$ and of (A.10) can be written compactly as

$$(\rho - \hat\rho)' A (\rho - \hat\rho) + c, \qquad (A.11)$$

where $c$ is a constant not depending on $\rho$. The matrix $A$ is defined by

$$A = \sum_{k=1}^{3} \sum_{(k)} \frac{x_t x_t'}{\bar h_{k,t}},$$

and the vector $\hat\rho$ is equal to $A^{-1} b$, where

$$b = \sum_{k=1}^{3} \sum_{(k)} \frac{x_t y_t}{\bar h_{k,t}} - \sum_{k=1}^{2} \mu_k \left[\sum_{(k)} \frac{x_t}{\bar h_{k,t}} - \frac{\pi_k}{\pi_3} \sum_{(3)} \frac{x_t}{\bar h_{3,t}}\right].$$

Minus one half times the first term of (A.11) is the log-kernel of a multivariate Gaussian density with mean $\hat\rho$ and covariance matrix $A^{-1}$; this Gaussian is used as the proposal density in Section 3.4. To generalize the last two formulas to $K > 3$, replace 3 by $K$ and 2 by $K - 1$.
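Under our reading of (A.9)–(A.11), the proposal mean and covariance for ρ can be assembled as follows; `h_bar` holds the conditional variances evaluated at the previous draw of ρ, and the helper name and layout are ours.

```python
import numpy as np

def rho_proposal(y, X, states, pi, mu, h_bar):
    """Mean and covariance of the Gaussian proposal for rho, eqs. (A.9)-(A.11).

    h_bar : T x K conditional variances computed at the previous draw of rho
    """
    T, K = h_bar.shape
    d = X.shape[1]
    A = np.zeros((d, d))
    b = np.zeros(d)
    for k in range(K):
        idx = states == k
        w = 1.0 / h_bar[idx, k]
        A += X[idx].T @ (X[idx] * w[:, None])     # sum_k sum_(k) x_t x_t' / h_bar
        b += X[idx].T @ (y[idx] * w)              # sum_k sum_(k) x_t y_t / h_bar
    idx_K = states == K - 1
    wK = 1.0 / h_bar[idx_K, K - 1]
    for k in range(K - 1):                        # mu-dependent correction terms
        idx = states == k
        b -= mu[k] * (X[idx].T @ (1.0 / h_bar[idx, k])
                      - pi[k] / pi[K - 1] * (X[idx_K].T @ wK))
    cov = np.linalg.inv(A)
    return cov @ b, cov                           # proposal mean and covariance A^{-1}
```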

Griddy-Gibbs sampler for $\varphi(\theta_k \mid \rho, \mu_k, \tilde y^{k})$

The algorithm works as follows at iteration $n + 1$ (for lighter notation, we drop the index $k$ and the conditioning variables $\rho$, $\mu_k$ and $\tilde y^{k}$):

(1) Using (17), compute $\kappa(\omega \mid \alpha^{(n)}, \beta^{(n)})$, the kernel of the conditional posterior density of $\omega$ given the values of $\alpha$ and $\beta$ sampled at iteration $n$, over a grid $(\omega_1, \omega_2, \ldots, \omega_G)$, to obtain the $G$-vector $\kappa = (\kappa_1, \kappa_2, \ldots, \kappa_G)$.

(2) By a deterministic integration rule using $M$ points, compute the $G$-vector $f = (0, f_2, \ldots, f_G)$, where

$$f_i = \int_{\omega_1}^{\omega_i} \kappa(\omega \mid \alpha^{(n)}, \beta^{(n)})\, d\omega, \qquad i = 2, \ldots, G. \qquad (A.12)$$

(3) Generate $u \sim U(0, f_G)$ and invert $f(\omega \mid \alpha^{(n)}, \beta^{(n)})$ by numerical interpolation to obtain a draw $\omega^{(n+1)} \sim \varphi(\omega \mid \alpha^{(n)}, \beta^{(n)})$.

(4) Repeat steps 1–3 for $\varphi(\alpha \mid \omega^{(n+1)}, \beta^{(n)})$ and $\varphi(\beta \mid \omega^{(n+1)}, \alpha^{(n+1)})$.
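A sketch of one griddy-Gibbs draw for a single GARCH parameter, following steps (1)–(3) above with a trapezoidal integration rule; `log_kernel` evaluates the log of (17) as a function of that parameter with the other two held at their current draws, and the grid would span the corresponding prior interval reported in Section 4. The function name is ours.

```python
import numpy as np

def griddy_gibbs_draw(log_kernel, grid, rng):
    """One draw of a GARCH parameter from its conditional posterior on a grid."""
    logk = np.array([log_kernel(w) for w in grid])
    kappa = np.exp(logk - logk.max())             # stabilised kernel values over the grid
    # trapezoidal approximation of the (unnormalised) cdf over the grid, as in (A.12)
    f = np.concatenate(([0.0],
                        np.cumsum(0.5 * (kappa[1:] + kappa[:-1]) * np.diff(grid))))
    u = rng.uniform(0.0, f[-1])
    return np.interp(u, f, grid)                  # invert the cdf by linear interpolation
```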