Generalized Autoregressive Score Smoothers


Giuseppe Buccheri¹, Giacomo Bormetti², Fulvio Corsi³,⁴ and Fabrizio Lillo²

¹ Scuola Normale Superiore, Italy. ² University of Bologna, Italy. ³ University of Pisa, Italy. ⁴ City University of London, UK

Very Preliminary and Incomplete.

Abstract

Motivated by the observation that Generalized Autoregressive Score (GAS) models can be viewed as approximate filters, we introduce a new class of simple approximate smoothers for nonlinear non-Gaussian state-space models, which we name Generalized Autoregressive Score smoothers (sGAS). The newly proposed sGAS improves on GAS filtered estimates because it uses all available observations when reconstructing time-varying parameters. In contrast to complex and computationally demanding simulation-based methods, the sGAS has a structure similar to the Kalman backward smoothing recursions but uses the score of the non-Gaussian observation density. Through an extensive Monte Carlo study, we provide evidence that the performance of the approximation is very close to that of simulation-based techniques, with average differences in mean square error lower than 2.5%, while requiring a significantly lower computational burden.

Keywords: GAS models, Smoothing, Kalman filter, State-space models, GARCH, MEM

1 Introduction

Observation-driven models like the GARCH model of Bollerslev (1986), where time-varying parameters are driven by functions of lagged observations, are typically viewed as data generating processes. As such, all relevant information is encoded in previous observations and there is no room for using current and future observations. However, they can also be viewed as predictive filters, since time-varying parameters are one-step-ahead measurable. This idea was largely exploited by Daniel B. Nelson, who explored the asymptotic properties of conditional covariance estimates generated by GARCH processes under the assumption that the true data generating process is a diffusion;¹ see e.g. Nelson (1992), Nelson and Foster (1994), Nelson and Foster (1995) and Nelson (1996). In particular, Nelson (1996) showed how to efficiently use information in both lagged and led GARCH residuals to estimate the unobserved components of a stochastic volatility process. However, despite the large number of observation-driven models, which allow one to filter out the time-varying parameters of the true data generating process, the literature lacks observation-driven smoothers that estimate parameters using all available information. In this paper we aim to fill this gap by introducing a smoothing method for a general class of observation-driven models, namely the Generalized Autoregressive Score (GAS) models of Creal et al. (2013) and Harvey (2013), also known as score-driven models. We start from the observation that the Kalman filtering and smoothing recursions for time-invariant linear Gaussian models can be rewritten in terms of the score of the predictive likelihood and a set of static parameters. Since the predictive filtering recursion is known to have the same form as GAS models, the latter can be viewed as approximate filters for nonlinear non-Gaussian models.
Thanks to the use of the score of the non-Gaussian density, GAS filters provide forecasting performance similar to that of correctly specified parameter-driven models, as shown by Koopman et al. (2016). Based on the same logic, we build a new class of smoothers that maintain the same simple form as the Kalman backward smoothing recursions but use the score of the non-Gaussian density. The resulting smoothing method is very general, as it can be applied to any observation density, in a similar fashion to GAS filters. We name the newly proposed methodology the Generalized Autoregressive Score smoother (sGAS). Smoothing with the sGAS requires performing a backward recursion after the standard GAS forward recursion that filters out the time-varying parameters. While going backward, the sGAS updates filtered GAS estimates by including the effect of current and future observations, leading to a more efficient reconstruction of time-varying parameters. Since the likelihood of observation-driven models can typically be written down in closed form, sGAS smoothing is particularly advantageous from a computational point of view. In contrast, the classical theory of filtering and smoothing for nonlinear non-Gaussian models requires the use of computationally demanding simulation-based techniques (see e.g. Durbin and Koopman 2012).

GAS models have been successfully applied in the recent econometric literature. For instance, Creal et al. (2011) developed a multivariate dynamic model for volatilities and correlations using fat-tailed distributions. Oh and Patton (2017) introduced high-dimensional factor copula models based on GAS dynamics for systemic risk assessment, while Harvey and Luati (2014) described a new framework for filtering with heavy tails. Compared to other observation-driven models, GAS models are locally optimal from an information-theoretic perspective, as shown by Blasques et al. (2015). For any GAS model, a companion sGAS can be devised to improve the estimation of time-varying parameters. The sGAS is therefore useful for offline signal reconstruction and analysis. In particular, we examine in detail the companion sGAS of some of the most popular observation-driven models, namely the GARCH model, the MEM model of Engle (2002) and Engle and Gallo (2006), and an AR(1) model with a time-varying autoregressive coefficient.

¹ The interpretation of GARCH processes as filters is well described in this statement by Nelson (1992): "Note that our use of the term estimate corresponds to its use in the filtering literature rather than the statistics literature; that is, an ARCH model with (given) fixed parameters produces estimates of the true underlying conditional covariance matrix at each point in time in the same sense that a Kalman filter produces estimates of unobserved state variables in a linear system."
By performing extensive Monte Carlo simulations of nonlinear non-Gaussian state-space models, we compare the performance of the sGAS to that of correctly specified parameter-driven models. In particular, we consider two stochastic volatility models and a stochastic intensity model. Importance sampling methods allow one to evaluate the full likelihood of these models; increasing the number of simulations from the importance density leads to precise parameter estimates and accurate smoothed estimates of the time-varying parameters. We also use the Quasi Maximum Likelihood (QML) method of Harvey et al. (1994) to estimate the two stochastic volatility models. Compared to correctly specified models, the losses incurred by the sGAS are very small in all the simulated scenarios, always lower on average than 2.5% in mean square error, and the sGAS systematically outperforms the QML. Computational times are decisively in favour of the sGAS: for the first stochastic volatility model used in the simulation, we found that smoothing with the sGAS is on average 179.74 times faster than smoothing with importance sampling, and similar computational advantages are observed for the remaining models.

The rest of the paper is organized as follows: Section 2 introduces the sGAS and conveys the main theoretical ideas; Section 3 describes in detail three examples of sGAS models; Section 4 shows the results of the Monte Carlo study; Section 5 concludes.

2 sGAS

In this section we discuss the main theoretical ideas leading to the formulation of the sGAS. We start by observing that the classical Kalman filtering and smoothing recursions for time-invariant linear Gaussian models can be rewritten in an equivalent form that involves only the score of the conditional likelihood and a set of static parameters. By the same logic as GAS filters, which have the same form as the Kalman forward filtering recursion, we introduce the sGAS as having the same form as the Kalman backward smoothing recursion. sGAS models are nevertheless able to provide robust estimates, since they use the score of the non-Gaussian density.

2.1 Kalman filtering and smoothing

Let us consider a linear Gaussian state-space representation:

y_t = Z α_t + ε_t,    ε_t ∼ NID(0, H)    (1)
α_{t+1} = c + T α_t + η_t,    η_t ∼ NID(0, Q)    (2)

where α_t is an n-dimensional column vector of state variables and y_t is an m-dimensional column vector of observations. The parameters Z, H, T and Q are system matrices of appropriate dimensions. Let ℱ_t denote the set of observations up to time t, namely ℱ_t = {y_1, ..., y_t}. We are interested in updating our knowledge of the underlying state variable α_t when a new observation y_t

becomes available, and in predicting α_{t+1} based on the observations y_1, ..., y_t. Thus, we define:

a_{t|t} = E[α_t | ℱ_t],    P_{t|t} = Var[α_t | ℱ_t]    (3)
a_{t+1} = E[α_{t+1} | ℱ_t],    P_{t+1} = Var[α_{t+1} | ℱ_t]    (4)

The Kalman filter computes a_{t|t}, P_{t|t}, a_{t+1} and P_{t+1} recursively. Assuming α_1 ∼ N(a_1, P_1), with a_1 and P_1 known, for t = 1, ..., N we have (see e.g. Durbin and Koopman 2012):

v_t = y_t − Z a_t,    F_t = Z P_t Z′ + H    (5)
a_{t|t} = a_t + P_t Z′ F_t⁻¹ v_t,    P_{t|t} = P_t − P_t Z′ F_t⁻¹ Z P_t    (6)
a_{t+1} = c + T a_t + K_t v_t,    P_{t+1} = T P_t (T − K_t Z)′ + Q    (7)

where K_t = T P_t Z′ F_t⁻¹. The log-likelihood can be computed in prediction error decomposition form, namely:

log p(y_t | ℱ_{t−1}) = const − (1/2) (log|F_t| + v_t′ F_t⁻¹ v_t)    (8)

Smoothed estimates α̂_t = E[α_t | ℱ_N] and P̂_t = Var[α_t | ℱ_N], N > t, can be computed through the following backward recursions:

r_{t−1} = Z′ F_t⁻¹ v_t + L_t′ r_t,    N_{t−1} = Z′ F_t⁻¹ Z + L_t′ N_t L_t    (9)
α̂_t = a_t + P_t r_{t−1},    P̂_t = P_t − P_t N_{t−1} P_t    (10)

where L_t = T − K_t Z, r_N = 0, N_N = 0 and t = N, ..., 1.

2.2 A more general representation

We now rewrite the Kalman filtering and smoothing recursions in an equivalent but more general representation. To this end, consider the score of the log-likelihood (8) with respect to a_t:

∇_t = [∂ log p(y_t | ℱ_{t−1}) / ∂a_t]′    (11)

Computing the derivative via the chain rule gives:

∇_t = [(∂ log p(y_t | ℱ_{t−1}) / ∂v_t)(∂v_t / ∂a_t)]′ = [v_t′ F_t⁻¹ Z]′ = Z′ F_t⁻¹ v_t    (12)

The information matrix is computed as:

I_{t|t−1} = E_{t−1}[∇_t ∇_t′] = Z′ F_t⁻¹ Z    (13)

Thus, we can rewrite the recursions for a_{t|t} and a_{t+1} as:

a_{t|t} = a_t + P_t ∇_t    (14)
a_{t+1} = c + T a_t + T P_t ∇_t    (15)

and the backward recursion for α̂_t as:

r_{t−1} = ∇_t + L_t′ r_t    (16)
α̂_t = a_t + P_t r_{t−1}    (17)

where L_t = T − T P_t I_{t|t−1}. Since the system matrices are constant, a steady state exists and P_t converges to the fixed point P̄ of the iteration within a few steps (see e.g. Durbin and Koopman 2012). Defining R = T P̄ and Ī = Z′ (Z P̄ Z′ + H)⁻¹ Z, we can rewrite the Kalman filter and smoother recursions for the mean in the steady state as:

a_{t|t} = a_t + T⁻¹ R ∇_t    (18)
a_{t+1} = c + T a_t + R ∇_t    (19)

and

r_{t−1} = ∇_t + L̄′ r_t    (20)
α̂_t = a_t + T⁻¹ R r_{t−1}    (21)

where L̄ = T − R Ī. The Kalman filtering and smoothing recursions for the mean are thus re-parametrized in terms of the score ∇_t. This representation is equivalent to the one in equations (5)-(7) and (9)-(10), but it is more general in the sense that it relies only on the predictive density p(y_t | ℱ_{t−1}). In principle, the forward recursions (18)-(19) and the backward recursions (20)-(21) can be applied to any observation-driven model for which a predictive density p(y_t | ℱ_{t−1}) is defined.
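For concreteness, the filtering recursions (5)-(7) and the backward pass (9)-(10) can be sketched in a few lines of code. The univariate implementation below is our own illustrative sketch (scalar system matrices and hypothetical parameter values; the multivariate case replaces divisions with matrix inverses), not code from the paper:

```python
import numpy as np

def kalman_filter_smoother(y, Z, H, c, T, Q, a1, P1):
    """Kalman filter, eqs. (5)-(7), and backward smoother, eqs. (9)-(10),
    for a univariate linear Gaussian state-space model."""
    N = len(y)
    a = np.zeros(N + 1); P = np.zeros(N + 1)
    v = np.zeros(N); F = np.zeros(N); K = np.zeros(N)
    a[0], P[0] = a1, P1
    for t in range(N):
        v[t] = y[t] - Z * a[t]                    # innovation
        F[t] = Z * P[t] * Z + H                   # innovation variance
        K[t] = T * P[t] * Z / F[t]                # Kalman gain
        a[t + 1] = c + T * a[t] + K[t] * v[t]     # eq. (7)
        P[t + 1] = T * P[t] * (T - K[t] * Z) + Q
    r, n = 0.0, 0.0                               # r_N = 0, N_N = 0
    alpha_hat = np.zeros(N); V = np.zeros(N)
    for t in range(N - 1, -1, -1):                # backward pass
        L = T - K[t] * Z
        r = Z * v[t] / F[t] + L * r               # eq. (9)
        n = Z * Z / F[t] + L * n * L
        alpha_hat[t] = a[t] + P[t] * r            # eq. (10)
        V[t] = P[t] - P[t] * n * P[t]
    return a[:N], P[:N], alpha_hat, V

# demo on data simulated from (1)-(2) with hypothetical parameters
rng = np.random.default_rng(1)
N = 400; Tc, Qc, Hc, cc = 0.9, 0.5, 1.0, 0.01
alpha, y = 0.0, np.zeros(N)
for t in range(N):
    y[t] = alpha + rng.normal(0.0, Hc ** 0.5)
    alpha = cc + Tc * alpha + rng.normal(0.0, Qc ** 0.5)
a, P, alpha_hat, V = kalman_filter_smoother(y, 1.0, Hc, cc, Tc, Qc, 0.0, 1.0)
```

Because N_{t−1} is nonnegative, the smoothed variance P̂_t in (10) never exceeds the one-step-ahead variance P_t, which is the sense in which smoothing sharpens the filtered reconstruction.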

2.3 sGAS specification

Note that the predictive filter (19) has an autoregressive structure and is driven by the score of the conditional likelihood. Thus, if one views GAS models as filters, the GAS turns out to be correctly specified in the case of linear Gaussian state-space models. In the case of nonlinear non-Gaussian state-space models, it can be regarded as an approximate filter that provides robust estimates, as it uses the score of the non-Gaussian observation density. Indeed, as shown by Koopman et al. (2016), the GAS has predictive accuracy similar to that of correctly specified parameter-driven models while providing large computational gains: the likelihood can be written in closed form and standard quasi-Newton techniques can be employed for optimization. Based on the same principle, we introduce an approximate smoother that estimates time-varying parameters using all the available observations. Since the GAS filter is correctly specified for linear Gaussian state-space models, we define our smoother so that it has the same structure as the Kalman smoother in the linear Gaussian case, while in the nonlinear non-Gaussian case it maintains the same simple form but uses the score of the non-Gaussian observation density.

Let us assume that the observations y_t ∈ R^n, t = 1, ..., N, are generated by the following predictive density:

y_t | f_t ∼ p(y_t | f_t, Θ)    (22)

where f_t ∈ R^k is a vector of time-varying parameters and Θ is a vector of static parameters. We generalize the filtering and smoothing recursions (18)-(21) for the predictive density p(y_t | f_t, Θ) as:

f_{t|t} = f_t + B⁻¹ A s_t    (23)
f_{t+1} = ω + A s_t + B f_t    (24)

for t = 1, ..., N, and:

r_{t−1} = s_t + (B − A)′ r_t    (25)
f̂_t = f_t + B⁻¹ A r_{t−1}    (26)

where r_N = 0 and t = N, ..., 1. The predictive filter in equation (24) has the same form as a GAS filter. The term s_t = S_t ∇_t, with ∇_t = ∂ log p(y_t | f_t, Θ) / ∂f_t, is the scaled score of the predictive likelihood. As

discussed by Creal et al. (2013), the most common choice for the scaling matrix is S_t = I⁻¹_{t|t−1}, where I_{t|t−1} = E_{t−1}[∇_t ∇_t′] is the information matrix. The vector ω ∈ R^k and the two matrices A, B ∈ R^{k×k} are static parameters included in Θ, which are estimated by maximizing the log-likelihood, namely:

Θ̂ = argmax_Θ Σ_{t=1}^{N} log p(y_t | f_t, Θ)    (27)

Thus, after estimating Θ and computing the forward filtering recursions (23)-(24), one can run the backward smoothing recursions (25)-(26), in the same fashion as the Kalman forward and backward recursions. Compared to the latter, we use s_t in place of ∇_t in order to correct for the curvature of the log-likelihood, as typically done in GAS models. The score in equation (24) is now multiplied by A I⁻¹_{t|t−1}, and therefore the term L̄ = T − R Ī in equation (20) generalizes to B − (A I⁻¹_{t|t−1}) I_{t|t−1} = B − A: the information matrix I_{t|t−1} disappears because its effect is already taken into account when scaling the score. We term the approximate smoother obtained through recursions (25)-(26) the Generalized Autoregressive Score smoother (sGAS).

3 Examples of sGAS models

In this section we show some examples of sGAS models. We compare filtered estimates obtained through the GAS to the corresponding smoothed estimates obtained through the companion sGAS model. We focus on three time-varying parameter models that are quite popular in the econometric literature: the GARCH model of Bollerslev (1986), the multiplicative error model (MEM) of Engle (2002) and Engle and Gallo (2006), and an AR(1) model with a time-varying autoregressive coefficient. In all three cases we first estimate the static parameters, then compute filtered estimates through the forward recursion (24), and finally compute smoothed estimates by running the backward recursions (25)-(26).

Example 1: sGARCH. Consider the model:

y_t = σ_t ε_t,    ε_t ∼ NID(0, 1)    (28)

The predictive density is thus:

p(y_t | σ_t²) = (2π σ_t²)^{−1/2} exp(−y_t² / (2σ_t²))    (29)

Setting f_t = σ_t² and S_t = I⁻¹_{t|t−1}, equation (24) reduces to the GARCH(1,1) model:

f_{t+1} = ω + A(y_t² − f_t) + B f_t    (30)

while the smoothing recursions (25)-(26) reduce to:

r_{t−1} = y_t² − f_t + (B − A) r_t    (31)
f̂_t = f_t + B⁻¹ A r_{t−1}    (32)

for t = N, ..., 1.

Example 2: sMEM. Consider the model:

y_t = μ_t ε_t    (33)

where ε_t has a gamma distribution with density p(ε_t | α) = Γ(α)⁻¹ ε_t^{α−1} α^α e^{−α ε_t}. The predictive density is thus given by:

p(y_t | μ_t, α) = Γ(α)⁻¹ y_t^{α−1} α^α μ_t^{−α} exp(−α y_t / μ_t)    (34)

Setting f_t = μ_t and S_t = I⁻¹_{t|t−1}, equation (24) reduces to the MEM(1,1) model:

f_{t+1} = ω + A(y_t − f_t) + B f_t    (35)

while the smoothing recursions (25)-(26) reduce to:

r_{t−1} = y_t − f_t + (B − A) r_t    (36)
f̂_t = f_t + B⁻¹ A r_{t−1}    (37)

for t = N, ..., 1.

Example 3: sAR(1). Consider the model:

y_t = c + α_t y_{t−1} + ε_t,    ε_t ∼ N(0, q²)    (38)

The predictive density is thus given by:

p(y_t | α_t) = (2π q²)^{−1/2} exp[ −(1/2) ((y_t − c − α_t y_{t−1}) / q)² ]    (39)

Setting f_t = α_t and S_t = I⁻¹_{t|t−1}, equation (24) reduces to:

f_{t+1} = ω + A (y_t − c − f_t y_{t−1}) / y_{t−1} + B f_t    (40)

while the smoothing recursions (25)-(26) reduce to:

r_{t−1} = (y_t − c − f_t y_{t−1}) / y_{t−1} + (B − A) r_t    (41)
f̂_t = f_t + B⁻¹ A r_{t−1}    (42)

for t = N, ..., 1.

We simulate N = 4000 observations with different dynamic patterns for σ_t², μ_t and α_t. Figures 1, 2 and 3 show filtered and smoothed estimates of the time-varying parameters obtained through the three models. As expected, sGAS estimates are less noisy than filtered GAS estimates and are characterized by lower mean square errors (MSE). For instance, the average MSE of the sGARCH estimates in Figure 1 is 0.6401 times that of the GARCH estimates, and similar values are obtained for the other models.

4 Monte Carlo analysis

In this section we perform extensive Monte Carlo simulations to test the performance of the sGAS under different dynamic specifications for the time-varying parameters. Since we interpret the sGAS as an approximate smoother for parameter-driven models, we compare its performance to that of correctly specified parameter-driven models. The main idea is to examine the extent to which the approximation leads to results similar to those of correctly specified parameter-driven models. In this case, the use of the sGAS would be particularly advantageous, as the likelihood can be written in closed form and smoothing is performed through a simple backward recursion. The computational burden is thus much lower than that required for parameter-driven models, where computationally demanding simulation-based techniques are employed for both estimation and smoothing. This analysis is similar to that of Koopman et al. (2016), who compared the GAS to correctly specified parameter-driven models and found that the two classes of models have similar predictive accuracy, with very small average losses. We will find a similar result for sGAS models.
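Before turning to the experiments, note that recursions (24)-(26), which all of the examples above instantiate, can be implemented once and for all given a routine for the scaled score. The sketch below is our own (scalar f_t; `scaled_score` is a hypothetical callback, not notation from the paper), and it assumes the static parameters ω, A, B have already been estimated via (27):

```python
import numpy as np

def sgas(y, omega, A, B, f1, scaled_score):
    """Forward GAS filter, eq. (24), then the sGAS backward smoothing
    recursions, eqs. (25)-(26), for a scalar time-varying parameter.

    scaled_score(y_t, f_t) must return s_t, i.e. the score of
    log p(y_t | f_t) multiplied by the inverse information matrix."""
    N = len(y)
    f = np.zeros(N); s = np.zeros(N)
    f[0] = f1
    for t in range(N):                       # forward pass
        s[t] = scaled_score(y[t], f[t])
        if t < N - 1:
            f[t + 1] = omega + A * s[t] + B * f[t]
    f_hat = np.zeros(N)
    r = 0.0                                  # r_N = 0
    for t in range(N - 1, -1, -1):           # backward pass
        r = s[t] + (B - A) * r               # eq. (25)
        f_hat[t] = f[t] + (A / B) * r        # eq. (26); B^{-1} A = A / B for scalars
    return f, f_hat

# deterministic demo with a Gaussian scaled score s_t = y_t - f_t
# and constant data, so the fixed points can be computed by hand
f, f_hat = sgas(np.ones(400), omega=0.01, A=0.3, B=0.9, f1=0.0,
                scaled_score=lambda yt, ft: yt - ft)
```

Plugging in y_t² − f_t as the scaled score gives the sGARCH of Example 1, and y_t − f_t with positive data gives the sMEM of Example 2; only the score routine changes across models.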

4.1 Linear Gaussian models

As a first step, we consider an AR(1)-plus-noise model:

y_t = α_t + ε_t,    ε_t ∼ N(0, H)    (43)
α_{t+1} = γ + T α_t + η_t,    η_t ∼ N(0, Q)    (44)

The signal-to-noise ratio is defined as δ = Q/H and the constant is chosen as γ = 0.01. The model is linear and Gaussian, so the classical Kalman recursions can be applied to obtain smoothed estimates. To apply the sGAS, we consider a Gaussian predictive density:

p(y_t | f_t; σ²) = (2π σ²)^{−1/2} exp[ −(y_t − f_t)² / (2σ²) ]    (45)

Setting S_t = I⁻¹_{t|t−1}, equation (24) reduces to:

f_{t+1} = ω + A(y_t − f_t) + B f_t    (46)

while the smoothing recursions (25)-(26) reduce to:

r_{t−1} = y_t − f_t + (B − A) r_t    (47)
f̂_t = f_t + B⁻¹ A r_{t−1}    (48)

for t = N, ..., 1. We generate 1000 time series of 4000 observations each, using the first 2000 observations for estimation and the last 2000 for testing. Since the sGAS has the same form as the Kalman smoother backward recursions, we expect the two methods to provide very similar results. Indeed, this is confirmed by the results in Table 1, which compares, for a wide range of autoregressive coefficients T and signal-to-noise ratios δ, the average MSE and MAE of GAS and sGAS estimates to those obtained through the Kalman filtering and smoothing recursions. The GAS provides the same results as the Kalman filter and the sGAS provides the same results as the Kalman smoother, confirming that the two methods are equivalent for linear Gaussian models.
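The equivalence reported in Table 1 can also be checked numerically: initializing the Kalman recursions at the steady-state variance P̄ and setting ω = γ, B = T and A equal to the steady-state Kalman gain makes recursions (46)-(48) reproduce the Kalman smoother exactly, as shown in Section 2.2. The script below is our own check with hypothetical parameter values, not part of the paper:

```python
import numpy as np

def simulate_ar1_plus_noise(N, gamma, T, H, Q, seed=0):
    rng = np.random.default_rng(seed)
    alpha, y = 0.0, np.zeros(N)
    for t in range(N):
        y[t] = alpha + rng.normal(0.0, H ** 0.5)               # eq. (43)
        alpha = gamma + T * alpha + rng.normal(0.0, Q ** 0.5)  # eq. (44)
    return y

def kalman_smoother(y, gamma, T, H, Q, a1, P1):
    """Exact recursions (5)-(10) with Z = 1."""
    N = len(y)
    a = np.zeros(N + 1); P = np.zeros(N + 1)
    v = np.zeros(N); F = np.zeros(N); K = np.zeros(N)
    a[0], P[0] = a1, P1
    for t in range(N):
        v[t] = y[t] - a[t]; F[t] = P[t] + H; K[t] = T * P[t] / F[t]
        a[t + 1] = gamma + T * a[t] + K[t] * v[t]
        P[t + 1] = T * P[t] * (T - K[t]) + Q
    r, alpha_hat = 0.0, np.zeros(N)
    for t in range(N - 1, -1, -1):
        r = v[t] / F[t] + (T - K[t]) * r
        alpha_hat[t] = a[t] + P[t] * r
    return alpha_hat

def sgas_smoother(y, omega, A, B, f1):
    """Score-driven recursions (46)-(48) with scaled score s_t = y_t - f_t."""
    N = len(y)
    f = np.zeros(N); s = np.zeros(N)
    f[0] = f1
    for t in range(N):
        s[t] = y[t] - f[t]
        if t < N - 1:
            f[t + 1] = omega + A * s[t] + B * f[t]
    r, f_hat = 0.0, np.zeros(N)
    for t in range(N - 1, -1, -1):
        r = s[t] + (B - A) * r
        f_hat[t] = f[t] + (A / B) * r
    return f_hat

gamma, T, H, Q = 0.01, 0.9, 1.0, 0.5
Pbar = 1.0
for _ in range(200):                                # steady-state Riccati iteration
    Pbar = T * Pbar * (T - T * Pbar / (Pbar + H)) + Q
Kbar = T * Pbar / (Pbar + H)                        # steady-state gain

y = simulate_ar1_plus_noise(2000, gamma, T, H, Q)
ah = kalman_smoother(y, gamma, T, H, Q, a1=0.0, P1=Pbar)
fh = sgas_smoother(y, omega=gamma, A=Kbar, B=T, f1=0.0)
max_gap = np.max(np.abs(ah - fh))
```

With P_1 = P̄ the variance recursion is stationary, so the time-varying gains coincide with the steady-state gain and the two smoothed paths agree up to floating-point rounding.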

4.2 Linear non-Gaussian models

We add non-Gaussianity to the previous model by considering a t-distributed measurement error. The new model reads:

y_t = α_t + ε_t,    ε_t ∼ t(0, H, ν)    (49)
α_{t+1} = γ + T α_t + η_t,    η_t ∼ N(0, Q)    (50)

We choose γ = 0.01 and T = 0.95. The corresponding observation-driven model has a t-distributed predictive density:

p(y_t | f_t; φ, β) = Γ[(β+1)/2] / (Γ(β/2) φ √(πβ)) · [1 + (y_t − f_t)² / (β φ²)]^{−(β+1)/2}    (51)

Setting S_t = I⁻¹_{t|t−1}, equation (24) reduces to (see e.g. Harvey 2013):

f_{t+1} = ω + A (β + 3) (y_t − f_t) / [β + ((y_t − f_t)/φ)²] + B f_t    (52)

while the smoothing recursions (25)-(26) reduce to:

r_{t−1} = (β + 3) (y_t − f_t) / [β + ((y_t − f_t)/φ)²] + (B − A) r_t    (53)
f̂_t = f_t + B⁻¹ A r_{t−1}    (54)

for t = N, ..., 1. We compare standard Kalman filtered and smoothed estimates with GAS and sGAS estimates. The simulation setting is the same as in Section 4.1. Table 2 shows relative MSE and MAE for different values of ν. In contrast to the previous case, the GAS and sGAS now provide better estimates than the standard Kalman filter and smoother. In particular, we observe large differences for low values of ν, where the t-distribution strongly deviates from the Gaussian, and for low values of δ, at which accounting for the non-Gaussianity of the measurement error becomes more important. Note that the gains of the sGAS over Kalman smoother estimates are larger than the gains of the GAS over the Kalman filter for low ν and δ. These results confirm the ability of the sGAS to provide robust smoothed estimates of time-varying parameters, to the same extent as the GAS provides robust filtered estimates, in the presence of a non-Gaussian predictive density.
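A useful way to see the robustness of (52)-(54) is that the scaled t score is a bounded, redescending function of the prediction error, so a single outlier can move the filtered and smoothed paths only by a limited amount. A small sketch of ours (hypothetical parameter values β = 5, φ = 1):

```python
import numpy as np

def t_scaled_score(v, beta, phi):
    """Scaled score of the t predictive density (51) at prediction error v = y_t - f_t."""
    return (beta + 3.0) * v / (beta + (v / phi) ** 2)

def t_gas_filter(y, omega, A, B, beta, phi, f1=0.0):
    """Forward recursion (52): a location filter driven by the bounded t score."""
    f = np.zeros(len(y)); f[0] = f1
    for t in range(len(y) - 1):
        f[t + 1] = omega + A * t_scaled_score(y[t] - f[t], beta, phi) + B * f[t]
    return f

beta, phi = 5.0, 1.0
typical = t_scaled_score(1.0, beta, phi)    # ordinary prediction error
outlier = t_scaled_score(100.0, beta, phi)  # extreme prediction error
# the score redescends: the outlier's pull on f_{t+1} is smaller than
# that of an ordinary observation
```

The score attains its maximum (β + 3) φ √β / (2β) at v = φ √β and then decays to zero, whereas the Gaussian score y_t − f_t grows without bound in the outlier; this is the mechanism behind the gains over the Kalman recursions in Table 2.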

4.3 Nonlinear non-Gaussian models

We now examine the behaviour of the sGAS for nonlinear non-Gaussian parameter-driven models. In particular, we consider the following three specifications, which are quite popular in the econometric literature:

1. Stochastic volatility model with Gaussian measurement density:

r_t = σ e^{0.5 θ_t} ε_t,    ε_t ∼ N(0, 1)
θ_{t+1} = γ + φ θ_t + η_t,    η_t ∼ N(0, σ_η²)

2. Stochastic volatility model with non-Gaussian measurement density:

r_t = σ e^{0.5 θ_t} ε_t,    ε_t ∼ t(0, 1, ν)
θ_{t+1} = γ + φ θ_t + η_t,    η_t ∼ N(0, σ_η²)

3. Stochastic intensity model with Poisson measurement density:

p(y_t | λ_t) = λ_t^{y_t} e^{−λ_t} / y_t!,    θ_t = log λ_t
θ_{t+1} = γ + φ θ_t + η_t,    η_t ∼ N(0, σ_η²)

Harvey et al. (1994) proposed a Quasi Maximum Likelihood (QML) method to estimate stochastic volatility models 1 and 2 based on a Gaussian quasi-likelihood obtained through linearization. As the linearized model is assumed to be normal, it is amenable to treatment with the Kalman filter and smoother, and the method can thus be viewed as providing approximate filtered and smoothed estimates. As such, it is interesting to compare the performance of the QML to that of the sGAS. Sandmann and Koopman (1998) devised a Monte Carlo approach based on importance sampling to evaluate the full likelihood function. We estimate the two stochastic volatility models by employing the same importance sampling approach, but use the recently developed Numerically Accelerated Importance Sampling (NAIS) technique of Koopman et al. (2015) to choose the parameters of the importance density. This method has been shown to provide several efficiency gains compared to existing approaches. The stochastic intensity model can also be estimated through importance sampling, as described e.g. by Durbin and Koopman (1997). Similarly to the previous

cases, we use importance sampling but choose the parameters of the importance density through the NAIS. More details on importance sampling techniques for nonlinear non-Gaussian state-space models can be found in Durbin and Koopman (2012). We choose the predictive densities of the corresponding observation-driven models as indicated below.

1. For the two stochastic volatility models:

p(y_t | f_t) = [Γ((β+1)/2) / (Γ(β/2) √(πβφ²))] e^{−f_t/2} [1 + (1/β) (y_t / (φ e^{0.5 f_t}))²]^{−(β+1)/2}    (55)

2. For the stochastic intensity model:

p(y_t | f_t) = e^{−e^{f_t}} e^{f_t y_t} / y_t!    (56)

The use of a t distribution for the first model is motivated by the fact that even Gaussian stochastic volatility models are able to generate a predictive density with fat tails and overdispersion (see e.g. Carnero et al. 2004). Thus, in order for the observation-driven model to capture these features, we adopt a more flexible specification for the predictive density. This is in line with Koopman et al. (2016), who compared GAS models to correctly specified parameter-driven models.

In the case of the predictive density in equation (55), setting S_t = I⁻¹_{t|t−1} and writing z_t = y_t / (φ e^{0.5 f_t}), the filtering recursion (24) reduces to (see e.g. Harvey 2013):

f_{t+1} = ω + A [(β+3)/β] [ ((β+1)/β) z_t² / (1 + z_t²/β) − 1 ] + B f_t    (57)

while the smoothing recursions (25)-(26) reduce to:

r_{t−1} = [(β+3)/β] [ ((β+1)/β) z_t² / (1 + z_t²/β) − 1 ] + (B − A) r_t    (58)
f̂_t = f_t + B⁻¹ A r_{t−1}    (59)

In the case of the predictive density in equation (56), setting S_t = I⁻¹_{t|t−1}, the filtering recursion (24) reduces to:

f_{t+1} = ω + A (e^{−f_t} y_t − 1) + B f_t    (60)

while the smoothing recursions (25), (26) reduce to:

   r_{t-1} = e^{-f_t} y_t - 1 + (B - A)\,r_t    (61)

   \hat{f}_t = f_t + B^{-1} A\, r_{t-1}    (62)

Importance sampling is implemented with S = 200 simulations, and we also use control variables as described by Koopman et al. (2015). In all experiments we generate 1000 time series of 4000 observations: the first 2000 observations are used for estimation, while the last 2000 are used for testing. Figure 4 compares smoothed estimates obtained through the NAIS and the sGAS for the three models at hand. A simple visual inspection shows that the estimates provided by the two methods are very close. To examine the differences between sGAS and NAIS estimates in more detail, Tables 3, 4 and 5 report the results of Monte Carlo experiments for the three models. In the case of the stochastic volatility models, we consider scenarios in which we vary the autoregressive coefficient φ and the coefficient of variation CV, as defined in Sandmann and Koopman (1998):

   \mathrm{CV} = \exp\left(\frac{\sigma^2_\eta}{1 - \phi^2}\right) - 1    (63)

Note that, as CV increases, the signal-to-noise ratio increases. The values of φ, CV and the remaining parameters are chosen to be close to those estimated on real financial time series, as discussed in Durbin and Koopman (1997). For the stochastic intensity model, we consider scenarios characterized by different autoregressive coefficients φ and different values of the signal variance σ²_η.

In the case of the two stochastic volatility models, the sGAS largely outperforms the QML in all scenarios. The performance of the latter worsens as CV decreases, consistent with the fact that the non-Gaussianity of the measurement equation becomes more important. Compared to the NAIS, the relative MSE loss of the sGAS is very small: it is always less than 2% in the Gaussian case and always lower than 2.5% in the non-Gaussian case.
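As an illustration, the scaled-score recursions (57)-(62) and the mapping (63) used to set up the simulation scenarios can be sketched in Python. This is a minimal sketch under the paper's notation: the function names, the initializations f0 = 0 and r_T = 0, and the NumPy layout are our own assumptions rather than part of the original.

```python
import numpy as np

def sgas_t(y, c, A, B, beta, phi, f0=0.0):
    """Forward GAS filter (57) and backward sGAS smoother (58)-(59)
    for the Student-t predictive density (55)."""
    T = len(y)
    f = np.empty(T + 1)
    f[0] = f0                                   # hypothetical initialization
    u = np.empty(T)                             # scaled scores
    for t in range(T):                          # filtering pass, eq. (57)
        z2 = (y[t] / (phi * np.exp(0.5 * f[t]))) ** 2
        u[t] = (beta + 3.0) / beta * (
            (beta + 1.0) / beta * z2 / (1.0 + z2 / beta) - 1.0)
        f[t + 1] = c + A * u[t] + B * f[t]
    fhat = np.empty(T)
    r = 0.0                                     # backward pass starts from r_T = 0
    for t in range(T - 1, -1, -1):              # smoothing pass, eqs. (58)-(59)
        r = u[t] + (B - A) * r                  # this is r_{t-1}
        fhat[t] = f[t] + (A / B) * r
    return f[:T], fhat

def sgas_poisson(y, c, A, B, f0=0.0):
    """Same scheme for the Poisson predictive density (56),
    with f_t = log(lambda_t); eqs. (60)-(62)."""
    T = len(y)
    f = np.empty(T + 1)
    f[0] = f0
    u = np.empty(T)
    for t in range(T):                          # filtering pass, eq. (60)
        u[t] = np.exp(-f[t]) * y[t] - 1.0       # scaled score
        f[t + 1] = c + A * u[t] + B * f[t]
    fhat = np.empty(T)
    r = 0.0
    for t in range(T - 1, -1, -1):              # smoothing pass, eqs. (61)-(62)
        r = u[t] + (B - A) * r
        fhat[t] = f[t] + (A / B) * r
    return f[:T], fhat

def cv_to_sigma2_eta(cv, phi):
    """Invert eq. (63): the signal variance implied by a target CV."""
    return (1.0 - phi ** 2) * np.log(1.0 + cv)
```

Note that the two smoothers share the same backward pass; only the scaled score changes with the observation density, which is what makes the scheme applicable to arbitrary densities. For instance, the scenario φ = 0.95 with CV = 1 corresponds to σ²_η = (1 − 0.95²) ln 2 ≈ 0.068.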
In contrast to the QML, larger losses are observed for large values of CV, where the signal-to-noise ratio is larger and non-Gaussianity is less relevant. It is worth emphasizing that further increasing the number of simulations S in the NAIS does not lead to significant improvements over the sGAS. In the case of the stochastic intensity model, we observe a similar behaviour, but the relative MSE loss is slightly larger for φ = 0.98 and σ²_η = 0.01, where it is found to be around 5%. Overall, average MSE losses are below 2.5% when averaging across all scenarios. This result is in agreement with what Koopman et al. (2016) found by comparing the prediction performance of the GAS to that of correctly specified parameter-driven models.

Finally, it is interesting to look at computational times. In the case of the stochastic volatility model with Gaussian measurement density, the estimation time of the NAIS with S = 200 simulations is on average 45.35 times larger than that of the corresponding observation-driven model, while the smoothing time is 179.74 times larger. Similar values are obtained for the remaining models. Thus, the sGAS delivers smoothed estimates that are very close to those of correctly specified parameter-driven models at a considerably reduced computational cost.

5 Conclusions

In this paper we have introduced the sGAS, a new class of approximate smoothers for nonlinear non-Gaussian state-space models. The sGAS is based on interpreting observation-driven models, and in particular GAS models, as filters rather than as data-generating processes. As such, current and future observations can be used to improve GAS filtered estimates of time-varying parameters. The sGAS has a structure similar to the Kalman backward smoothing recursions for linear Gaussian state-space models, but uses the score of the observation density at hand. It is therefore particularly advantageous from a computational point of view and can be applied to any observation density, in the same fashion as GAS models. We have examined three examples of the sGAS corresponding to popular observation-driven models, namely GARCH, MEM and an AR(1) model with a time-varying autoregressive coefficient. The sGAS updates GAS estimates based on all available observations and thus leads to more efficient estimates. As such, it is useful for signal reconstruction and analysis.
Extensive Monte Carlo simulations of nonlinear non-Gaussian state-space models show that sGAS estimates are very similar to those of correctly specified parameter-driven models: average losses are always smaller than 2.5%. At the same time, the sGAS is much more appealing from a computational point of view, being considerably faster than the simulation-based techniques that are typically employed to estimate nonlinear non-Gaussian state-space models.

                 GAS / Kalman filter        sGAS / Kalman smoother
δ               0.1      1       10        0.1      1       10
T = 0.90  MSE   1.0001   1.0001  1.0000    1.0001   1.0000   1.0000
          MAE   1.0000   1.0000  1.0000    1.0000   1.0000   1.0000
T = 0.95  MSE   1.0005   1.0003  1.0000    1.0004   1.0001   1.0000
          MAE   1.0002   1.0001  1.0000    1.0002   1.0001   1.0000
T = 0.98  MSE   1.0023   1.0010  1.0001    1.0011   1.0004   1.0000
          MAE   1.0008   1.0004  1.0000    1.0005   1.0001   1.0000

Table 1: Average MSE and MAE of GAS filtered estimates (relative to the Kalman filter) and of sGAS smoothed estimates (relative to the Kalman smoother).

                 GAS / Kalman filter        sGAS / Kalman smoother
δ               0.1      1       10        0.1      1       10
ν = 3     MSE   0.8610   0.9522  0.9991    0.8093   0.8876   0.9618
          MAE   0.9389   0.9859  1.0036    0.9128   0.9634   1.0169
ν = 5     MSE   0.9552   0.9912  1.0032    0.9376   0.9880   1.0058
          MAE   0.9792   0.9973  0.9999    0.9698   0.9949   1.0112
ν = 8     MSE   0.9877   0.9981  1.0029    0.9844   0.9954   1.0117
          MAE   0.9939   0.9992  1.0039    0.9917   0.9982   1.0136

Table 2: Average MSE and MAE of GAS filtered estimates (relative to the Kalman filter) and of sGAS smoothed estimates (relative to the Kalman smoother).

                        MSE                                  MAE
CV             0.1      1       5       10        0.1      1       5       10
φ = 0.98
  NAIS        1.0000   1.0000  1.0000  1.0000    1.0000   1.0000  1.0000  1.0000
  sGAS        0.9988   1.0050  1.0001  1.0162    1.0004   1.0043  1.0017  1.0097
  QML         1.4153   1.3880  1.3333  1.3138    1.1797   1.1739  1.1564  1.1475
φ = 0.95
  NAIS        1.0000   1.0000  1.0000  1.0000    1.0000   1.0000  1.0000  1.0000
  sGAS        1.0057   0.9983  0.9988  1.0059    1.0034   1.0023  1.0024  1.0059
  QML         1.3131   1.3737  1.3246  1.3168    1.1450   1.1758  1.1567  1.1524
φ = 0.90
  NAIS        1.0000   1.0000  1.0000  1.0000    1.0000   1.0000  1.0000  1.0000
  sGAS        1.0076   0.9956  0.9974  1.0086    1.0044   1.0010  1.0033  1.0093
  QML         1.2371   1.3157  1.2893  1.2750    1.1109   1.1508  1.1422  1.1370

Table 3: Average MSE and MAE of NAIS, sGAS and QML smoothed estimates, normalized by the NAIS loss, for the stochastic volatility model with Gaussian measurement density.

                        MSE                                  MAE
CV             0.1      1       5       10        0.1      1       5       10
φ = 0.98, ν = 3
  NAIS        1.0000   1.0000  1.0000  1.0000    1.0000   1.0000  1.0000  1.0000
  sGAS        1.0026   0.9950  1.0140  1.0169    1.0015   0.9997  1.0077  1.0098
  QML         1.3962   1.2553  1.2125  1.1998    1.1735   1.1184  1.1013  1.0939
φ = 0.95, ν = 3
  NAIS        1.0000   1.0000  1.0000  1.0000    1.0000   1.0000  1.0000  1.0000
  sGAS        1.0014   1.0049  1.0121  1.0200    1.0008   1.0031  1.0064  1.0105
  QML         1.3058   1.2639  1.2447  1.2246    1.1354   1.1230  1.1158  1.1056
φ = 0.90, ν = 3
  NAIS        1.0000   1.0000  1.0000  1.0000    1.0000   1.0000  1.0000  1.0000
  sGAS        1.0020   1.0033  1.0149  1.0221    1.0016   1.0023  1.0081  1.0117
  QML         1.2306   1.2325  1.2262  1.2200    1.1026   1.1075  1.1062  1.1034

Table 4: Average MSE and MAE of NAIS, sGAS and QML smoothed estimates, normalized by the NAIS loss, for the stochastic volatility model with non-Gaussian measurement density.

                     MSE                        MAE
σ²_η × 100    0.1      0.5     1         0.1      0.5     1
φ = 0.98
  NAIS       1.0000   1.0000  1.0000    1.0000   1.0000  1.0000
  sGAS       1.0149   1.0281  1.0521    1.0067   1.0132  1.0244
φ = 0.95
  NAIS       1.0000   1.0000  1.0000    1.0000   1.0000  1.0000
  sGAS       1.0203   1.0176  1.0254    1.0097   1.0083  1.0120
φ = 0.90
  NAIS       1.0000   1.0000  1.0000    1.0000   1.0000  1.0000
  sGAS       1.0310   1.0160  1.0205    1.0142   1.0079  1.0099

Table 5: Average MSE and MAE of NAIS and sGAS smoothed estimates, normalized by the NAIS loss, for the stochastic intensity model with Poisson measurement density.

[Four-panel time-series plot omitted.]
Figure 1: Comparison among simulated (black line), filtered (blue dotted) and smoothed (red line) variance σ²_t of the GARCH(1,1) model.

[Four-panel time-series plot omitted.]
Figure 2: Comparison among simulated (black line), filtered (blue dotted) and smoothed (red line) mean μ_t of the MEM(1,1) model.

[Four-panel time-series plot omitted.]
Figure 3: Comparison among simulated (black line), filtered (blue dotted) and smoothed (red line) autoregressive coefficient α_t of the AR(1) model.

[Three-panel time-series plot omitted; panels: SV Gaussian, SV non-Gaussian, Stochastic intensity.]
Figure 4: Comparison among simulated unobserved components (black dotted), NAIS smoothed estimates (red dashed) and sGAS smoothed estimates (blue dash-dotted).

References

Blasques, F., Koopman, S. J., Lucas, A., 2015. Information-theoretic optimality of observation-driven time series models for continuous responses. Biometrika 102 (2), 325.

Bollerslev, T., 1986. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31 (3), 307–327.

Carnero, M. A., Peña, D., Ruiz, E., 2004. Persistence and kurtosis in GARCH and stochastic volatility models. Journal of Financial Econometrics 2 (2), 319–342.

Creal, D., Koopman, S. J., Lucas, A., 2011. A dynamic multivariate heavy-tailed model for time-varying volatilities and correlations. Journal of Business & Economic Statistics 29 (4), 552–563.

Creal, D., Koopman, S. J., Lucas, A., 2013. Generalized autoregressive score models with applications. Journal of Applied Econometrics 28 (5), 777–795.

Durbin, J., Koopman, S. J., 1997. Monte Carlo maximum likelihood estimation for non-Gaussian state space models. Biometrika 84 (3), 669–684.

Durbin, J., Koopman, S. J., 2012. Time Series Analysis by State Space Methods: Second Edition. Oxford Statistical Science Series. Oxford University Press.

Engle, R., 2002. New frontiers for ARCH models. Journal of Applied Econometrics 17 (5), 425–446.

Engle, R. F., Gallo, G. M., 2006. A multiple indicators model for volatility using intra-daily data. Journal of Econometrics 131 (1), 3–27.

Harvey, A., Luati, A., 2014. Filtering with heavy tails. Journal of the American Statistical Association 109 (507), 1112–1122.

Harvey, A., Ruiz, E., Shephard, N., 1994. Multivariate stochastic variance models. The Review of Economic Studies 61 (2), 247–264.

Harvey, A. C., 2013. Dynamic Models for Volatility and Heavy Tails: With Applications to Financial and Economic Time Series. Econometric Society Monographs. Cambridge University Press.

Koopman, S. J., Lucas, A., Scharth, M., 2015. Numerically accelerated importance sampling for nonlinear non-Gaussian state-space models. Journal of Business & Economic Statistics 33 (1), 114–127.

Koopman, S. J., Lucas, A., Scharth, M., 2016. Predicting time-varying parameters with parameter-driven and observation-driven models. The Review of Economics and Statistics 98 (1), 97–110.

Nelson, D. B., 1992. Filtering and forecasting with misspecified ARCH models I: Getting the right variance with the wrong model. Journal of Econometrics 52 (1), 61–90.

Nelson, D. B., 1996. Asymptotically optimal smoothing with ARCH models. Econometrica 64 (3), 561–573.

Nelson, D. B., Foster, D. P., 1994. Asymptotic filtering theory for univariate ARCH models. Econometrica 62 (1), 1–41.

Nelson, D. B., Foster, D. P., 1995. Filtering and forecasting with misspecified ARCH models II: Making the right forecast with the wrong model. Journal of Econometrics 67 (2), 303–335.

Oh, D. H., Patton, A. J., 2017. Time-varying systemic risk: Evidence from a dynamic copula model of CDS spreads. Journal of Business & Economic Statistics 0 (0), 1–15.

Sandmann, G., Koopman, S. J., 1998. Estimation of stochastic volatility models via Monte Carlo maximum likelihood. Journal of Econometrics 87 (2), 271–301.