Estimating Expected Shortfall Using a Conditional Autoregressive Model: CARES

Yin Liao and Daniel Smith

March 23, 2014

Abstract

In financial risk management, expected shortfall (ES) has become an increasingly popular downside risk measure because it has the desirable subadditivity property that Value at Risk (VaR) lacks. This paper proposes a new conditional autoregressive model for estimating ES. The model specifies the evolution of ES over time using an autoregressive process and estimates the model parameters by jointly solving two minimization problems. We derive the asymptotic properties of the estimators and illustrate their attractive finite-sample properties through a simulation study. As an illustration, we apply the model to evaluate the ES of a stock market index and of individual stocks.

1 Introduction

In the past few years, financial markets have experienced an unprecedented crisis. This turmoil has emphasized the need for accurate risk measures for financial institutions. Value at risk (VaR), a measure of how much an asset or portfolio can lose within a given time period at a given confidence level, has gained great popularity among financial practitioners due to its conceptual simplicity. However, VaR has several shortcomings that have long been criticized. First, it reports only a quantile of the return distribution and disregards the expected loss beyond that quantile. In addition, VaR is not a coherent risk measure because it fails to be subadditive. To address these conceptual issues, Artzner, Delbaen, Eber, and Heath (1999) introduced a new measure of financial risk referred to as expected shortfall (ES).

ES is defined as the conditional expectation of the loss given that it exceeds the VaR. More precisely, let $X_t$, $t = 1, \ldots, n$, denote the price of an asset or a portfolio over $n$ periods, and let $y_t = -\log(X_t / X_{t-1})$ be the negative log return over the $t$th period. Suppose $\{y_t\}$ is a stationary process with marginal distribution function $F$. The VaR at a given probability $\tau$ is $\mathrm{VaR}_\tau = \inf\{u : F(u) \ge \tau\}$, the $\tau$th quantile of $F$, and the ES associated with probability $\tau$ is defined as $\mathrm{ES}_\tau = E(y_t \mid y_t > \mathrm{VaR}_\tau)$. ES overcomes these weaknesses of VaR and is increasingly widely used in the market. Although ES is conceptually superior to VaR, how to model and measure it remains an open research problem without a consensus. Because ES is the expected loss beyond the extreme value measured by VaR, its estimation cannot be separated from the measurement of VaR. Moreover, because the distribution of returns typically changes over time, the challenge in measuring ES is to find a suitable way to model simultaneously the time-varying conditional VaR and the time-varying conditional expectation of exceedances beyond the VaR. Therefore, any reasonable methodology should provide formulas for calculating $\mathrm{VaR}_t$ and $\mathrm{ES}_t$ as functions of variables known at time $t-1$ and a set of parameters, as well as a procedure to estimate those unknown parameters. Most existing models for VaR and ES model the whole time-varying distribution of the return and then compute the corresponding quantile as the VaR and the expected value beyond that quantile as the ES. A recent development in the VaR literature is the class of conditional autoregressive value at risk (CAViaR) models (see Engle and Manganelli (2004)).
The CAViaR model specifies the evolution of the quantile over time using a special type of autoregressive process and estimates the parameters by regression quantiles. This approach is appealing because it focuses directly on the tail of the return distribution and does not rely on any distributional assumption. Following this line of work, we propose a conditional autoregressive specification for ES, which we call the conditional autoregressive expected shortfall (CARES) model. It specifies the evolution of the ES itself over time and estimates the unknown parameters by minimizing the loss function $E\big(I(y_t < \mathrm{VaR}_t(\tau))\,(y_t - \mathrm{ES}_t(\tau))^2\big)$ through jointly solving two minimization problems. The first is a quantile estimation problem with objective function
$$\min_\beta \frac{1}{T}\sum_{t=1}^{T} \big(\tau - I(y_t < f_t(\beta))\big)\big(y_t - f_t(\beta)\big),$$
where $f_t(\beta)$ is a dynamic specification for $\mathrm{VaR}_t$ and $\beta$ is its parameter vector. The second is a least squares problem that focuses only on the tail of the return distribution,
$$\min_\gamma \frac{1}{T}\sum_{t=1}^{T} I(y_t < f_t(\beta))\big(y_t - g_t(\gamma)\big)^2,$$
where $g_t(\gamma)$ is a dynamic specification for $\mathrm{ES}_t$ and $\gamma$ is its parameter vector. The first-order conditions of the two minimization problems imply two moment conditions for the unknown parameters $\beta$ and $\gamma$, so the parameter estimators can be regarded as generalized method of moments (GMM) estimators. We therefore extend the standard asymptotic theory of GMM estimators to establish consistency and asymptotic normality of our estimators. We also conduct a simulation study to examine the finite-sample properties of the new model. Compared with several widely used ES estimation approaches, the CARES model provides better out-of-sample ES forecasts once the sample is moderately large.

Lastly, it is worth noting that Taylor (2008) developed a conditional ES model, which he calls the CARE model, that is similar to our CARES model. It also specifies a conditional dynamic model for ES itself and estimates ES by estimating the model parameters. However, the CARE model differs from the CARES model in essence and has one more parameter, which brings in extra estimation uncertainty. Taylor (2008) uses an expectile as the quantile (and VaR) estimator, exploiting the fact that for each $\tau$-quantile there is a corresponding $\alpha$-expectile, and then links the conditional ES to the conditional expectile through
$$\mathrm{ES}_t(\tau) = \Big(1 + \frac{\alpha}{(1-2\alpha)\tau}\Big)\mathrm{VaR}_t(\tau)$$
to obtain the parameter estimators of the CARE model. Therefore, the success of the CARE model for VaR and ES estimation relies on the value of $\alpha$ that one selects to ensure that the proportion of observations lying below the conditional $\alpha$-expectile is $\tau$. Moreover, because the return distribution is time-varying, the value of $\alpha$ corresponding to a given $\tau$ varies over time. The need to estimate $\alpha$ at each time point makes the CARE model not only more computationally demanding but also subject to additional estimation error from the uncertainty in $\alpha$. We further illustrate in Section 4 how the extra estimation step for $\alpha$ affects forecasting performance.
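The expectile-to-ES link used by the CARE model can be checked numerically in the one case where everything is available in closed form. The sketch below assumes a standard Gaussian return (the mapping requires a zero-mean distribution), finds the expectile level $\alpha$ whose $\alpha$-expectile coincides with the $\tau$-quantile, and compares Taylor's (2008) mapping with the exact Gaussian ES. The function names are hypothetical and the Gaussian assumption is ours; for real data $\alpha$ is unknown and must be estimated, which is the source of the extra uncertainty discussed above.

```python
from scipy.stats import norm


def gaussian_expectile_level(q):
    """Expectile level alpha whose alpha-expectile of N(0, 1) equals q.

    Solves the asymmetric least squares first-order condition
    alpha * E[(X - q)^+] = (1 - alpha) * E[(q - X)^+] for X ~ N(0, 1).
    """
    below = q * norm.cdf(q) + norm.pdf(q)  # E[(q - X)^+]
    above = below - q                      # E[(X - q)^+], using E[X] = 0
    return below / (below + above)


def care_es_from_expectile(expectile, alpha, tau):
    """Taylor (2008) mapping ES = (1 + alpha / ((1 - 2 alpha) tau)) * expectile (zero-mean case)."""
    return (1.0 + alpha / ((1.0 - 2.0 * alpha) * tau)) * expectile


tau = 0.05
q = norm.ppf(tau)                    # tau-quantile, which is also the alpha-expectile
alpha = gaussian_expectile_level(q)
es_care = care_es_from_expectile(q, alpha, tau)
es_exact = -norm.pdf(q) / tau        # E[X | X <= q] for X ~ N(0, 1)
print(f"alpha = {alpha:.5f}, CARE ES = {es_care:.5f}, exact ES = {es_exact:.5f}")
```

Because the mapping is an exact identity for zero-mean distributions, the two ES values agree to machine precision here; the practical difficulty lies entirely in recovering $\alpha$ when the return distribution is unknown.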
The rest of the article is structured as follows. Section 2 reviews current approaches to ES estimation, and Section 3 introduces the CARES model and establishes consistency and asymptotic normality of the model estimator. Section 4 conducts a Monte Carlo simulation to study the finite-sample properties of the CARES model and compares its out-of-sample forecasts with those of other commonly used ES estimation approaches. Section 5 presents an empirical application to real data. Section 6 concludes.

2 Expected Shortfall Models

Existing approaches to calculating expected shortfall mainly model the whole return distribution and then derive the ES from that distribution. These approaches can be divided into three categories: parametric, semiparametric, and nonparametric. Parametric approaches parameterize the time-varying stochastic behavior of financial asset prices. Conditional VaR and ES are estimated from a conditional volatility forecast together with an assumption about the distribution of asset returns. GARCH models are often used to forecast volatility, and the distribution is typically assumed to be Gaussian or Student-t. Nonparametric methods, in contrast, estimate the distribution of asset returns from the data without distributional assumptions; VaR and ES are then calculated as the quantile of the estimated distribution and the corresponding expected loss beyond it. The most widely used methods so far are historical simulation and kernel smoothing estimation, neither of which requires distributional assumptions. The former estimates VaR as the quantile of the empirical distribution of historical returns in a moving window of the most recent observations, and ES as the mean of the returns that exceed the VaR estimate. The latter applies kernel smoothing with an optimal bandwidth to historical returns to estimate the distribution of returns, from which VaR and ES are subsequently estimated. There are, however, two serious problems with these approaches. On the one hand, parametric approaches must invoke an assumption about the return distribution, and the assumption imposed is often at odds with real data. On the other hand, nonparametric methods are notoriously hard to apply when little data are available; moreover, they assume that returns are independent and identically distributed and hence do not allow for time-varying volatility. Semiparametric approaches have consequently emerged to address these problems. They include methods based on extreme value analysis and on quantile or expectile regression. A recent proposal for VaR using quantile regression is the class of CAViaR models introduced by Engle and Manganelli (2004).
Kuan, Yeh, and Hsu (2009) proposed expectile-based VaR (EVaR), which is more sensitive to the magnitude of extreme losses than quantile-based VaR (QVaR). However, an undesirable feature of these models is that it is not clear how to estimate the corresponding ES. Taylor (2008) extended expectile theory to deliver estimates of ES. His method first builds a conditional autoregressive expectile model to estimate VaR and then converts the estimated conditional expectile into the conditional ES through a specific function. Although this model allows for time variation without any distributional assumption, the extra parameter linking the expectile to the quantile increases estimation uncertainty.
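The two nonparametric estimators described in this section can be sketched as follows. This is a minimal illustration, not the implementation used in the paper: it works with the lower tail of returns (the convention of the later sections), uses NumPy's inverted-CDF quantile so that the historical-simulation VaR is $\inf\{u : F_n(u) \ge \tau\}$, and uses Silverman's rule-of-thumb bandwidth as a stand-in for the "optimal bandwidth", which the text does not specify. The kernel estimator follows the form attributed to Scaillet (2004) in Section 4. Function names are hypothetical.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm


def hs_var_es(returns, tau, window=250):
    """Historical simulation: empirical tau-quantile and tail mean of the last `window` returns."""
    sample = np.asarray(returns, dtype=float)[-window:]
    var = np.quantile(sample, tau, method="inverted_cdf")  # inf{u : F_n(u) >= tau}
    es = sample[sample <= var].mean()                        # mean of returns at or below VaR
    return var, es


def kernel_var_es(returns, tau, h=None):
    """Gaussian-kernel VaR and ES.

    VaR solves mean(G_h(VaR - r_t)) = tau, where G_h(u) = Phi(u / h);
    ES = mean(r_t * G_h(VaR - r_t)) / tau.
    """
    r = np.asarray(returns, dtype=float)
    if h is None:
        # Assumption: Silverman's rule of thumb stands in for the "optimal" bandwidth.
        h = 1.06 * r.std(ddof=1) * r.size ** (-0.2)

    def smoothed_cdf_gap(u):
        return norm.cdf((u - r) / h).mean() - tau

    # The smoothed CDF is essentially 0 at the left end of the bracket and 1 at the right end.
    var = brentq(smoothed_cdf_gap, r.min() - 10 * h, r.max() + 10 * h)
    es = np.mean(r * norm.cdf((var - r) / h)) / tau
    return var, es


rng = np.random.default_rng(0)
returns = rng.standard_normal(2000)
print("HS  (VaR, ES):", hs_var_es(returns, 0.05))
print("KDE (VaR, ES):", kernel_var_es(returns, 0.05))
```

Both estimators treat the window as an i.i.d. sample, which is exactly the limitation noted above: neither reacts to volatility clustering except through the choice of window.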

3 CARES Model

3.1 Model Description

In this section, we propose a new approach to ES estimation. In contrast to modeling the whole distribution or the quantile, we model the ES directly. Before presenting the model, it is worth explaining the rationale behind it. Assume that an asset return $r_t$ follows a Gaussian distribution with mean $\mu_t$ and standard deviation $\sigma_t$, and let $\phi(r_t; \mu_t, \sigma_t^2)$ and $\Phi(r_t; \mu_t, \sigma_t^2)$ denote the density and distribution functions of the return. The $\mathrm{VaR}_t$ for a given probability $\tau$ is
$$\mathrm{VaR}_t = \mu_t + \sigma_t \Phi^{-1}(\tau), \qquad (1)$$
where $\Phi^{-1}(\cdot)$ and $\phi(\cdot)$ without further arguments denote the standard Gaussian quantile and density functions, and the corresponding ES is
$$\mathrm{ES}_t = E(r_t \mid r_t \le \mathrm{VaR}_t) = \int_{-\infty}^{\mathrm{VaR}_t} r\,\frac{\phi(r; \mu_t, \sigma_t^2)}{\Phi(\mathrm{VaR}_t; \mu_t, \sigma_t^2)}\,dr = \mu_t + \int_{-\infty}^{\mathrm{VaR}_t - \mu_t} u\,\frac{\phi(u; 0, \sigma_t^2)}{\Phi(\mathrm{VaR}_t - \mu_t; 0, \sigma_t^2)}\,du = \mu_t - \sigma_t^2\,\frac{\phi(\mathrm{VaR}_t - \mu_t; 0, \sigma_t^2)}{\Phi(\mathrm{VaR}_t - \mu_t; 0, \sigma_t^2)},$$
where the last step uses $\int_{-\infty}^{a} u\,\phi(u; 0, \sigma^2)\,du = -\sigma^2\phi(a; 0, \sigma^2)$. Since $(\mathrm{VaR}_t - \mu_t)/\sigma_t = \Phi^{-1}(\tau)$, $\sigma_t^2\,\phi(\sigma_t x; 0, \sigma_t^2) = \sigma_t\,\phi(x)$, and $\Phi(\sigma_t x; 0, \sigma_t^2) = \Phi(x)$ with $\Phi(\Phi^{-1}(\tau)) = \tau$, we can rearrange the above expression as
$$\mathrm{ES}_t = E(r_t \mid r_t \le \mathrm{VaR}_t) = \mu_t - \sigma_t\,\frac{\phi(\Phi^{-1}(\tau))}{\tau}. \qquad (2)$$
It is now clear that both $\mathrm{VaR}_t$ and $\mathrm{ES}_t$ are linear in the standard deviation $\sigma_t$ (and proportional to it when $\mu_t = 0$), which suggests that the evolution of both is driven by time-varying volatility and that the same functional form would be appropriate for VaR and ES. We use the Gaussian distribution only as an example to reveal the link between volatility and ES (or VaR); an analogous relationship, with different constants, holds for other location-scale distributions. Consequently, we propose a conditional autoregressive model, referred to as CARES, to formalize the dynamics of ES. Recall the CAViaR model of Engle and Manganelli (2004), in which the conditional quantile is specified as an autoregressive function $f_t(\beta)$ that depends on the parameter vector $\beta$:
$$f_t(\beta) = \beta_0 + \sum_{i=1}^{q} \beta_i f_{t-i}(\beta) + \sum_{i=1}^{r} \beta_{q+i}\, l(x_{t-i}), \qquad (3)$$
where $\beta_i f_{t-i}(\beta)$, $i = 1, \ldots, q$, are the autoregressive terms, which ensure that the quantile changes smoothly over time, and the role of $l(x_{t-i})$ is to link $f_t(\beta)$ to observable variables in the information set. Some examples of CAViaR models are as follows:

Symmetric absolute value: $f_t(\beta) = \beta_1 + \beta_2 f_{t-1}(\beta) + \beta_3 |r_{t-1}|$
Asymmetric slope: $f_t(\beta) = \beta_1 + \beta_2 f_{t-1}(\beta) + \beta_3 (r_{t-1})^+ + \beta_4 (r_{t-1})^-$
Indirect GARCH(1,1): $f_t(\beta) = \big(\beta_1 + \beta_2 f_{t-1}^2(\beta) + \beta_3 r_{t-1}^2\big)^{1/2}$

We therefore introduce an analogous model for ES:
$$g_t(\gamma) = \gamma_0 + \sum_{i=1}^{q} \gamma_i g_{t-i}(\gamma) + \sum_{i=1}^{r} \gamma_{q+i}\, m(x_{t-i}), \qquad (4)$$
where $\gamma_i g_{t-i}(\gamma)$, $i = 1, \ldots, q$, are the autoregressive terms and the term $m(x_{t-i})$ links $g_t(\gamma)$ to observable variables in the information set. Examples of the CARES model follow directly from the three CAViaR specifications above by replacing $f$ and $\beta$ with their ES analogues $g$ and $\gamma$.

3.2 Model Estimation

Next, we estimate the parameters of the CARES model by jointly solving two problems. Assume that the $\tau$-level quantile of a sample of return observations $y_1, \ldots, y_T$ follows a CAViaR model, that is,
$$\mathrm{VaR}_t(\tau) = f_t(\beta_0(\tau)), \qquad (5)$$
where $f$ is assumed known up to the parameter vector $\beta_0$, and that the corresponding ES depends on another parameter vector $\gamma_0$ as
$$\mathrm{ES}_t(\tau) = g_t(\gamma_0(\tau)), \qquad (6)$$

then both the $\tau$-quantile and the ES can be defined by $\theta_0 = (\theta_{01}', \theta_{02}')' = (\beta_0(\tau)', \gamma_0(\tau)')'$, which minimizes the loss function
$$E\big(I(y_t < \mathrm{VaR}_t(\tau))\,(y_t - \mathrm{ES}_t(\tau))^2\big). \qquad (7)$$
The estimator of $\theta_0$, denoted $\hat\theta$, can then be obtained by minimizing the sample counterpart
$$T^{-1}\sum_{t=1}^{T} I(y_t < \mathrm{VaR}_t(\tau))\,(y_t - \mathrm{ES}_t(\tau))^2 \qquad (8)$$
through a two-stage procedure. In the first stage, we estimate equation (5) by solving
$$\min_\beta \frac{1}{T}\sum_{t=1}^{T} \big(\tau - I(y_t < \mathrm{VaR}_t(\tau))\big)\big(y_t - \mathrm{VaR}_t(\tau)\big) \qquad (9)$$
to obtain $\hat\beta$ and $\widehat{\mathrm{VaR}}_t(\tau)$. In the second stage, the estimated $\widehat{\mathrm{VaR}}_t(\tau)$ is treated as observed and used to estimate the parameters of (6) by solving
$$\min_\gamma \frac{1}{T}\sum_{t=1}^{T} I(y_t < \widehat{\mathrm{VaR}}_t(\tau))\big(y_t - \mathrm{ES}_t(\tau)\big)^2. \qquad (10)$$
Alternatively, the parameters in (5) and (6) can be estimated jointly by solving problems (9) and (10) together. The two first-order conditions involved are
$$\frac{1}{T}\sum_{t=1}^{T} \nabla f_t(\beta(\tau))\big(\tau - I(y_t < f_t(\beta(\tau)))\big) = 0,$$
$$\frac{1}{T}\sum_{t=1}^{T} \nabla g_t(\gamma(\tau))\big(y_t - g_t(\gamma(\tau))\big)\, I(y_t < f_t(\beta(\tau))) = 0, \qquad (11)$$
where
$$\nabla f_t(\beta) = \frac{d}{d\beta} f_t(\beta), \qquad (12)$$
$$\nabla g_t(\gamma) = \frac{d}{d\gamma} g_t(\gamma). \qquad (13)$$
Hence $\hat\theta$ is the generalized method of moments (GMM) estimator based on the two moment conditions implied by these first-order conditions, and its asymptotic distribution can be established within the GMM framework. Theorems 3.1 and 3.2 show that $\hat\theta$ is consistent and asymptotically normal, and Theorem 3.3 provides a consistent estimator of its variance-covariance matrix. The related assumptions and detailed proofs are provided in Appendix A.
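The two-stage procedure in (9) and (10) can be sketched for the indirect GARCH(1,1) specification. This is an illustrative sketch under our own assumptions, not the authors' code: the recursion carries a minus sign so that VaR and ES stay in the lower tail of returns, both recursions are started from the empirical $\tau$-quantile and tail mean of the first 100 observations, and Nelder-Mead is used because the stage-1 objective is not differentiable. The helper names (`indirect_garch_path`, `fit_cares_two_stage`, and so on) are hypothetical.

```python
import math

import numpy as np
from scipy.optimize import minimize


def indirect_garch_path(params, r, init):
    """Lower-tail indirect GARCH(1,1) recursion x_t = -sqrt(c0 + c1 x_{t-1}^2 + c2 r_{t-1}^2).

    Used for VaR_t = f_t(beta) in stage 1 and for ES_t = g_t(gamma) in stage 2.
    """
    c0, c1, c2 = params
    x = np.empty_like(r)
    x[0] = init
    for t in range(1, r.size):
        # Floor the argument so the square root stays defined if the optimizer tries negative values.
        x[t] = -math.sqrt(max(c0 + c1 * x[t - 1] ** 2 + c2 * r[t - 1] ** 2, 1e-12))
    return x


def quantile_loss(beta, r, tau, init):
    """Stage-1 objective (9): mean of (tau - I(r_t < VaR_t)) * (r_t - VaR_t)."""
    u = r - indirect_garch_path(beta, r, init)
    return np.mean((tau - (u < 0)) * u)


def tail_ls_loss(gamma, r, var_hat, init):
    """Stage-2 objective (10): mean of I(r_t < VaR_hat_t) * (r_t - ES_t)^2."""
    es = indirect_garch_path(gamma, r, init)
    return np.mean(np.where(r < var_hat, (r - es) ** 2, 0.0))


def fit_cares_two_stage(r, tau, beta_start, gamma_start, n_init=100):
    """Estimate beta from (9), then gamma from (10) given the fitted VaR path."""
    head = r[:n_init]
    init_var = np.quantile(head, tau)
    init_es = head[head <= init_var].mean()
    res_beta = minimize(quantile_loss, beta_start, args=(r, tau, init_var), method="Nelder-Mead")
    var_hat = indirect_garch_path(res_beta.x, r, init_var)
    res_gamma = minimize(tail_ls_loss, gamma_start, args=(r, var_hat, init_es), method="Nelder-Mead")
    return res_beta.x, res_gamma.x, var_hat


# Usage on simulated GARCH(1,1)-Gaussian returns with (a0, a1, a2) = (0.025, 0.05, 0.925).
rng = np.random.default_rng(0)
T, a0, a1, a2 = 2000, 0.025, 0.05, 0.925
r, s2 = np.empty(T), a0 / (1 - a1 - a2)
for t in range(T):
    r[t] = math.sqrt(s2) * rng.standard_normal()
    s2 = a0 + a1 * r[t] ** 2 + a2 * s2
beta_hat, gamma_hat, var_hat = fit_cares_two_stage(r, 0.05, [0.1, 0.9, 0.1], [0.1, 0.9, 0.2])
print("beta_hat:", beta_hat, "gamma_hat:", gamma_hat)
```

The second stage conditions on the fitted VaR path, mirroring the "treated as observed" step in (10); the joint GMM alternative would instead solve both moment conditions in (11) simultaneously.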

Theorem 3.1. (Consistency) Under Assumptions 6.1 and 6.2, $\hat\theta(\tau) \xrightarrow{P} \theta_0(\tau)$ as $T \to \infty$.

Proof. See Appendix A.

Theorem 3.2. (Asymptotic normality) Under Assumptions 6.3-6.5, as $T \to \infty$,
$$\sqrt{T}(\hat\theta - \theta_0) \xrightarrow{D} N(0, \Sigma(\theta_0)), \qquad (14)$$
where $\Sigma(\theta_0) = D(\theta_0)^{-1} S(\theta_0) (D(\theta_0)^{-1})'$ with
$$D(\theta_0) = \begin{bmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{bmatrix} = \begin{bmatrix} E\big(\nabla_\beta f_t(\beta_0(\tau))\nabla_\beta f_t(\beta_0(\tau))'\, h(0)\big) & 0 \\ E\big(\nabla_\gamma g_t(\gamma_0(\tau))\nabla_\beta f_t(\beta_0(\tau))' (f_t(\beta_0(\tau)) - g_t(\gamma_0(\tau)))\, h(0)\big) & E\big(\nabla_\gamma g_t(\gamma_0(\tau))\nabla_\gamma g_t(\gamma_0(\tau))'\, \tau\big) \end{bmatrix} \qquad (15)$$
and
$$S(\theta_0) = \begin{bmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{bmatrix} = \begin{bmatrix} \tau(1-\tau)\, E\big(\nabla_\beta f_t(\beta_0(\tau))\nabla_\beta f_t(\beta_0(\tau))'\big) & 0 \\ 0 & E\big(\nabla_\gamma g_t(\gamma_0(\tau))\nabla_\gamma g_t(\gamma_0(\tau))'\big)\, TV \end{bmatrix}, \qquad (16)$$
where $h(\cdot)$ is the density function and $TV = E\big((y_t - g_t(\gamma_0(\tau)))^2\, I(y_t - f_t(\beta_0(\tau)) < 0)\big)$.

Proof. See Appendix A. The basic idea is to approximate the (discontinuous) gradient of the objective function by its continuously differentiable expectation and then relate this approximation to the asymptotic first-order condition, which sets the approximated gradient asymptotically equal to zero, so that a standard Taylor expansion can be used to derive the asymptotic distribution of the estimators. Such an approximation is provided by the theorem of Huber (1967). This technique is widely used in quantile and expectile regression; see Engle and Manganelli (2004) and Kuan et al. (2009) for recent applications.
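Given estimates of the blocks in (15) and (16), the sandwich form of $\Sigma(\theta_0)$ in Theorem 3.2 and the implied standard errors can be assembled as below. The helper names are hypothetical; the block structure ($D_{12} = 0$, $S_{12} = S_{21} = 0$) is taken directly from (15) and (16). $D_{12}$ vanishes because the quantile moment condition in (11) does not involve $\gamma$, whereas $D_{21}$ does not vanish because the tail moment depends on $\beta$ through the indicator.

```python
import numpy as np


def assemble_D(D11, D21, D22):
    """D = [[D11, 0], [D21, D22]] as in (15); D11 is k x k, D21 is m x k, D22 is m x m."""
    k, m = D11.shape[0], D22.shape[0]
    return np.block([[D11, np.zeros((k, m))], [D21, D22]])


def assemble_S(S11, S22):
    """S = diag(S11, S22) as in (16)."""
    k, m = S11.shape[0], S22.shape[0]
    return np.block([[S11, np.zeros((k, m))], [np.zeros((m, k)), S22]])


def sandwich_cov(D, S):
    """Sigma = D^{-1} S (D^{-1})', the asymptotic covariance of sqrt(T)(theta_hat - theta_0)."""
    D_inv = np.linalg.inv(D)
    return D_inv @ S @ D_inv.T


def standard_errors(Sigma, T):
    """Standard errors implied by (14): sqrt(diag(Sigma) / T)."""
    return np.sqrt(np.diag(Sigma) / T)


# Toy example with one beta and one gamma parameter (k = m = 1).
D = assemble_D(np.array([[2.0]]), np.array([[1.0]]), np.array([[4.0]]))
S = assemble_S(np.array([[1.0]]), np.array([[1.0]]))
Sigma = sandwich_cov(D, S)
print(Sigma)
print(standard_errors(Sigma, T=100))
```

Even with a block-diagonal $S$, the nonzero $D_{21}$ produces a nonzero covariance between $\hat\beta$ and $\hat\gamma$, which is how first-stage estimation error feeds into the ES parameters.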

Theorem 3.3. (Variance-covariance matrix estimation) Under the assumptions and the conditions of Theorems 3.1 and 3.2, the asymptotic variance-covariance matrix $\Sigma(\theta)$ can be consistently estimated by $\hat\Sigma(\hat\theta) = \hat D(\hat\theta)^{-1} \hat S(\hat\theta) (\hat D(\hat\theta)^{-1})'$, where
$$\hat D(\hat\theta) = \begin{bmatrix} \hat D_{11}(\hat\theta) & \hat D_{12}(\hat\theta) \\ \hat D_{21}(\hat\theta) & \hat D_{22}(\hat\theta) \end{bmatrix} = \begin{bmatrix} \frac{1}{2Tc_T}\sum_{t=1}^{T} \nabla_\beta f_t(\hat\beta(\tau))\nabla_\beta f_t(\hat\beta(\tau))'\, I(|y_t - f_t(\hat\beta(\tau))| < c_T) & 0 \\ \frac{1}{2Tc_T}\sum_{t=1}^{T} \nabla_\gamma g_t(\hat\gamma(\tau))\nabla_\beta f_t(\hat\beta(\tau))' (f_t(\hat\beta(\tau)) - g_t(\hat\gamma(\tau)))\, I(|y_t - f_t(\hat\beta(\tau))| < c_T) & \frac{1}{T}\sum_{t=1}^{T} \nabla_\gamma g_t(\hat\gamma(\tau))\nabla_\gamma g_t(\hat\gamma(\tau))'\, \tau \end{bmatrix} \xrightarrow{P} D(\theta_0),$$
$$\hat S(\hat\theta) = \begin{bmatrix} \hat S_{11}(\hat\theta) & \hat S_{12}(\hat\theta) \\ \hat S_{21}(\hat\theta) & \hat S_{22}(\hat\theta) \end{bmatrix} = \begin{bmatrix} \frac{1}{T}\sum_{t=1}^{T} \tau(1-\tau)\nabla_\beta f_t(\hat\beta(\tau))\nabla_\beta f_t(\hat\beta(\tau))' & 0 \\ 0 & \frac{1}{T}\sum_{t=1}^{T} \nabla_\gamma g_t(\hat\gamma(\tau))\nabla_\gamma g_t(\hat\gamma(\tau))' (y_t - g_t(\hat\gamma(\tau)))^2\, I(y_t - f_t(\hat\beta(\tau)) < 0) \end{bmatrix} \xrightarrow{P} S(\theta_0), \qquad (17)$$
and $c_T$ is a bandwidth, which can be defined in two ways. The first is the k-nearest neighbor estimator used in Engle and Manganelli (2004), with $k = 40$ for 1% VaR and ES and $k = 60$ for 5% VaR and ES. The second follows Koenker (2005):
$$c_T = \hat s\big(\Phi^{-1}(\tau + h_T) - \Phi^{-1}(\tau - h_T)\big), \qquad (18)$$
where $\hat s = \min\big(\mathrm{sd}(y_t - f_t(\hat\beta)),\ \mathrm{IQR}(y_t - f_t(\hat\beta))/1.34\big)$, and either $h_T = T^{-1/5}\big[4.5\,\phi^4(\Phi^{-1}(\tau)) / (2\Phi^{-1}(\tau)^2 + 1)^2\big]^{1/5}$ following Bofinger (1975), or $h_T = T^{-1/3}\,\Phi^{-1}(1 - 0.025)^{2/3}\big[1.5\,\phi^2(\Phi^{-1}(\tau)) / (2\Phi^{-1}(\tau)^2 + 1)\big]^{1/3}$ following Hall and Sheather (1988). The proof of this theorem (including the assumptions) closely follows that of Theorem 3 of Engle and Manganelli (2004), so we omit the details. In addition, we conduct a small simulation study to investigate the finite-sample properties of the parameter estimators and their behavior as the sample size increases.
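The bandwidth $c_T$ in (18) and the kernel-type estimate of $D_{11}$ in (17) can be sketched as follows. Function names are hypothetical, the sample standard deviation uses `ddof=1` (the text does not say which convention is meant), and the inputs (residuals $y_t - f_t(\hat\beta)$ and gradients $\nabla_\beta f_t(\hat\beta)$) are assumed to come from an already fitted CAViaR model.

```python
import numpy as np
from scipy.stats import norm


def bofinger_h(T, tau):
    """Bofinger (1975) bandwidth h_T."""
    z = norm.ppf(tau)
    return T ** (-1 / 5) * (4.5 * norm.pdf(z) ** 4 / (2 * z ** 2 + 1) ** 2) ** (1 / 5)


def hall_sheather_h(T, tau, alpha=0.05):
    """Hall and Sheather (1988) bandwidth h_T; alpha = 0.05 gives the Phi^{-1}(1 - 0.025) factor."""
    z = norm.ppf(tau)
    return (T ** (-1 / 3) * norm.ppf(1 - alpha / 2) ** (2 / 3)
            * (1.5 * norm.pdf(z) ** 2 / (2 * z ** 2 + 1)) ** (1 / 3))


def koenker_c(residuals, tau, h):
    """c_T = s_hat * (Phi^{-1}(tau + h) - Phi^{-1}(tau - h)) as in (18)."""
    if not 0 < tau - h < tau + h < 1:
        raise ValueError("tau +/- h must lie in (0, 1)")
    u = np.asarray(residuals, dtype=float)
    q75, q25 = np.quantile(u, [0.75, 0.25])
    s_hat = min(u.std(ddof=1), (q75 - q25) / 1.34)
    return s_hat * (norm.ppf(tau + h) - norm.ppf(tau - h))


def d11_hat(grad_f, residuals, c):
    """(2 T c)^{-1} * sum_t grad_t grad_t' I(|u_t| < c), the kernel estimate of D11 in (17)."""
    g = np.asarray(grad_f, dtype=float)          # T x k matrix of gradients
    inside = np.abs(np.asarray(residuals)) < c   # indicator I(|u_t| < c_T)
    return (g[inside].T @ g[inside]) / (2 * g.shape[0] * c)


u = np.random.default_rng(0).standard_normal(1000)
h = hall_sheather_h(1000, 0.05)
print(h, bofinger_h(1000, 0.05), koenker_c(u, 0.05, h))
```

The uniform-kernel indicator $I(|u_t| < c_T)/(2c_T)$ is what replaces the unknown density $h(0)$ in $D_{11}$ and $D_{21}$.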

To investigate these finite-sample properties, we generate an asset or portfolio return from a GARCH(1,1) model,
$$r_t = \sigma_t z_t, \qquad \sigma_t^2 = a_0 + a_1 r_{t-1}^2 + a_2 \sigma_{t-1}^2,$$
where the parameters are set to $a_0 = 0.025$, $a_1 = 0.0500$, and $a_2 = 0.9250$, and the disturbance $z_t$ follows a standard Gaussian distribution. Based on the relationship between the conditional VaR/ES and the standard deviation of the return shown in Section 3.1, we can derive the true values of the parameters of the indirect GARCH(1,1) specification of the above model as
$$\beta_0 = a_0(\Phi^{-1}(\tau))^2,\quad \beta_1 = a_2,\quad \beta_2 = a_1(\Phi^{-1}(\tau))^2,\quad \gamma_0 = a_0\big(-\phi(\Phi^{-1}(\tau))/\tau\big)^2,\quad \gamma_1 = a_2,\quad \gamma_2 = a_1\big(-\phi(\Phi^{-1}(\tau))/\tau\big)^2,$$
where $\Phi$ and $\phi$ are the cumulative distribution function and probability density function of the standard Gaussian distribution and $\tau$ is the coverage probability. See Appendix B for the derivation. We generate 10000 samples of sizes 1000, 2000, 5000, and 10000 from this GARCH(1,1) model, with the initial values of the return and volatility drawn from the corresponding unconditional distributions implied by the model. For each sample, we estimate the parameters by the two-stage procedure for coverage probabilities of 5% and 1%; the mean and standard deviation of each parameter estimator across the 10000 replications are reported for each sample size in Panels A and B of Table 1. The estimators perform reasonably well even when the sample size is moderate ($T = 1000$), although the intercept $\gamma_0$ is noticeably biased upward in small samples, and the bias and standard deviations of the estimators decline with the sample size, as expected. Each parameter estimator converges toward the true parameter value as $T$ increases, in line with the consistency result of Theorem 3.1.
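The mapping from GARCH(1,1) parameters to the true indirect-GARCH CAViaR and CARES parameters, and the Gaussian ES formula (2) it relies on, can be checked numerically. In this sketch (function names hypothetical) the simulation starts at the unconditional variance rather than drawing initial values from the unconditional distribution, a simplification of the paper's design. The recursion check uses the lower-tail sign convention $\mathrm{VaR}_t = -(\beta_0 + \beta_1 \mathrm{VaR}_{t-1}^2 + \beta_2 r_{t-1}^2)^{1/2}$, which is our assumption about how the indirect GARCH form is applied to a negative quantile.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm


def true_cares_params(a0, a1, a2, tau):
    """Indirect-GARCH CAViaR (beta) and CARES (gamma) parameters implied by a Gaussian GARCH(1,1)."""
    z = norm.ppf(tau)
    k_var = z ** 2                     # (Phi^{-1}(tau))^2
    k_es = (norm.pdf(z) / tau) ** 2    # (phi(Phi^{-1}(tau)) / tau)^2
    return (a0 * k_var, a2, a1 * k_var), (a0 * k_es, a2, a1 * k_es)


def simulate_garch(T, a0, a1, a2, rng):
    """r_t = sigma_t z_t with sigma_t^2 = a0 + a1 r_{t-1}^2 + a2 sigma_{t-1}^2 and z_t ~ N(0, 1)."""
    r, sigma2 = np.empty(T), np.empty(T)
    s2 = a0 / (1 - a1 - a2)  # assumption: start at the unconditional variance
    for t in range(T):
        sigma2[t] = s2
        r[t] = np.sqrt(s2) * rng.standard_normal()
        s2 = a0 + a1 * r[t] ** 2 + a2 * s2
    return r, sigma2


def gaussian_es(mu, sigma, tau):
    """Equation (2): E(r | r <= VaR) = mu - sigma * phi(Phi^{-1}(tau)) / tau."""
    return mu - sigma * norm.pdf(norm.ppf(tau)) / tau


# Check (2) against direct numerical integration of the truncated Gaussian mean.
mu, sigma, tau = 0.1, 2.0, 0.05
var = mu + sigma * norm.ppf(tau)  # equation (1)
tail_integral, _ = quad(lambda x: x * norm.pdf(x, mu, sigma), -np.inf, var)
es_numeric = tail_integral / tau

# Check that the recursion with the implied beta reproduces sigma_t * Phi^{-1}(tau).
beta, gamma = true_cares_params(0.025, 0.05, 0.925, tau)
r, sigma2 = simulate_garch(2000, 0.025, 0.05, 0.925, np.random.default_rng(42))
var_path = np.empty_like(r)
var_path[0] = norm.ppf(tau) * np.sqrt(sigma2[0])
for t in range(1, r.size):
    var_path[t] = -np.sqrt(beta[0] + beta[1] * var_path[t - 1] ** 2 + beta[2] * r[t - 1] ** 2)
max_gap = np.max(np.abs(var_path - norm.ppf(tau) * np.sqrt(sigma2)))
print("beta:", beta, "gamma:", gamma)
print("ES check:", es_numeric - gaussian_es(mu, sigma, tau), "VaR path gap:", max_gap)
```

Rounded to four decimals, `true_cares_params` reproduces the true values listed in Table 1 for both coverage probabilities.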
We also calculate the average theoretical standard error of each parameter estimator (reported in square brackets in Table 1) by applying the asymptotic theory above to the parameter estimates from each simulation, and compare it with the standard deviation of each estimator across the simulation replications (reported in parentheses in Table 1). The two are close, particularly for the larger sample sizes, which supports the validity of the asymptotic distribution derived above for the parameter estimators. Moreover, to investigate the degree of efficiency loss in CARES model estimation, we alternatively compute the theoretical standard errors of the CARES parameter estimators from the asymptotic standard errors of the GARCH(1,1) parameters ($a_0$, $a_1$, and $a_2$), with appropriate scaling based on the relationship between the GARCH(1,1) and CARES parameters. With 10000 samples and sample size $T = 10000$, the implied standard errors of the CARES parameter estimators from the GARCH(1,1) model are 0.0124 for $\beta_0$, 0.0081 for $\beta_1$, 0.0134 for $\beta_2$, 0.0194 for $\gamma_0$, 0.0081 for $\gamma_1$, and 0.0211 for $\gamma_2$ when $\tau = 0.05$, and 0.0248 for $\beta_0$, 0.0081 for $\beta_1$, 0.0269 for $\beta_2$, 0.0325 for $\gamma_0$, 0.0081 for $\gamma_1$, and 0.0353 for $\gamma_2$ when $\tau = 0.01$. These implied standard errors are reasonably close to those reported in Table 1, which suggests that our procedure estimates the CARES model efficiently.

Table 1: Finite-Sample Properties of the CARES Model Parameter Estimators

Panel A: τ = 0.05

Parameter (true value)    T = 1000    T = 2000    T = 5000    T = 10000
β0 = 0.0676                0.0790      0.0743      0.0703      0.0685
                          (0.0470)    (0.0329)    (0.0194)    (0.0140)
                          [0.0356]    [0.0314]    [0.0192]    [0.0123]
β1 = 0.9250                0.9219      0.9231      0.9246      0.9248
                          (0.0248)    (0.0173)    (0.0108)    (0.0086)
                          [0.0211]    [0.0111]    [0.0102]    [0.0082]
β2 = 0.1353                0.1310      0.1332      0.1333      0.1348
                          (0.0561)    (0.0388)    (0.0248)    (0.0171)
                          [0.0402]    [0.0285]    [0.0214]    [0.0141]
γ0 = 0.1064                0.2119      0.1555      0.1216      0.1139
                          (0.1881)    (0.0914)    (0.0432)    (0.0283)
                          [0.1054]    [0.0825]    [0.0394]    [0.0223]
γ1 = 0.9250                0.8924      0.9093      0.9199      0.9223
                          (0.0496)    (0.0264)    (0.0147)    (0.0103)
                          [0.0309]    [0.0213]    [0.0134]    [0.0101]
γ2 = 0.2127                0.2382      0.2276      0.2186      0.2163
                          (0.0938)    (0.0633)    (0.0401)    (0.0294)
                          [0.0828]    [0.0529]    [0.0392]    [0.0261]

Panel B: τ = 0.01

Parameter (true value)    T = 1000    T = 2000    T = 5000    T = 10000
β0 = 0.1353                0.1534      0.1531      0.1420      0.1398
                          (0.0989)    (0.0815)    (0.0523)    (0.0366)
                          [0.2360]    [0.1218]    [0.0685]    [0.0281]
β1 = 0.9250                0.9248      0.9221      0.9239      0.9243
                          (0.0279)    (0.0217)    (0.0144)    (0.0099)
                          [0.0489]    [0.0250]    [0.0141]    [0.0099]
β2 = 0.2706                0.2492      0.2658      0.2684      0.2689
                          (0.1395)    (0.0963)    (0.0611)    (0.0312)
                          [0.1826]    [0.1175]    [0.0780]    [0.0302]
γ0 = 0.1776                0.2342      0.2139      0.1897      0.1852
                          (0.2089)    (0.1382)    (0.0762)    (0.0406)
                          [0.3049]    [0.2156]    [0.0950]    [0.0404]
γ1 = 0.9250                0.9151      0.9191      0.9229      0.9238
                          (0.0388)    (0.0144)    (0.0139)    (0.0096)
                          [0.0711]    [0.0308]    [0.0217]    [0.0095]
γ2 = 0.3552                0.3564      0.3559      0.3555      0.3553
                          (0.2177)    (0.1416)    (0.0860)    (0.0498)
                          [0.4288]    [0.2618]    [0.1073]    [0.0470]

Notes: True parameter values are $\beta_0 = a_0(\Phi^{-1}(\tau))^2$, $\beta_1 = a_2$, $\beta_2 = a_1(\Phi^{-1}(\tau))^2$, $\gamma_0 = a_0(-\phi(\Phi^{-1}(\tau))/\tau)^2$, $\gamma_1 = a_2$, and $\gamma_2 = a_1(-\phi(\Phi^{-1}(\tau))/\tau)^2$. Each entry gives the mean estimated parameter, followed by (a) the standard deviation of the parameter estimators across the simulations, in parentheses, and (b) the average theoretical standard error, in square brackets.

4 Simulation Study

To illustrate the finite-sample properties of the CARES model, we conduct several simple simulation studies that demonstrate its advantages over other popular models in ES forecasting. In all cases, performance is measured by the root mean squared error (RMSE). The RMSE of an ES forecast $\widehat{ES}$ from an arbitrary model has the standard definition $\sqrt{E\big((\widehat{ES} - ES)^2\big)}$, where $ES$ is the true value of ES. In all the following simulations, the RMSE is approximated by the square root of the average of $10^4$ realizations of $(ES - \widehat{ES})^2$.

We begin by generating data from a simple model. Assume that an asset or portfolio return follows the GARCH(1,1) model described in Section 3.2. This model allows for time-varying volatility and thereby time-varying VaR and ES. Figure 1 displays the empirical density of an asset return obtained from this model with parameters $a_0 = 0.025$, $a_1 = 0.0500$, and $a_2 = 0.9250$. Compared with the standard Gaussian density, the density of the GARCH(1,1)-GAUSSIAN model has fatter tails. To study the out-of-sample ES forecasting properties of the CARES model, we simulate data from this model with total sample sizes $T + 500 + 1 = 751$, 1001, 1501, and 2501 (that is, $T = 250$, 500, 1000, and 2000). For every sample, the first 500 observations are discarded to allow for a sufficiently long burn-in period.
We then use the next $T$ observations to fit the three CARES models discussed in Section 3 and keep the $(T+1)$th observation for evaluating the one-step-ahead out-of-sample 1% and 5% ES forecasts. The forecast RMSE is computed over $10^4$ replications of the simulation. Figure 2 shows the 5% ES forecasts from the CARES model (indirect GARCH(1,1) specification) against the true 5% ES for the GARCH(1,1)-GAUSSIAN model when the sample size is 2000. The true ES exhibits strong dynamic clustering, and the forecasts capture this pattern and track the true ES closely.
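The forecast-evaluation design (a 500-observation burn-in, an estimation sample of $T$, and a one-step-ahead target) can be sketched as follows. As a hedged illustration only, the forecaster here is historical simulation on the last 250 observations rather than a CARES fit, and the number of replications is far below the $10^4$ used in the paper; the true ES of observation $T+1$ comes from equation (2) with $\mu_t = 0$ and the known $\sigma_{T+1}$. All names are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def rmse_and_bias(es_hat, es_true):
    """RMSE sqrt(mean((ES_hat - ES)^2)) and bias mean(ES_hat - ES) over replications."""
    err = np.asarray(es_hat, dtype=float) - np.asarray(es_true, dtype=float)
    return np.sqrt(np.mean(err ** 2)), np.mean(err)


def one_replication(T, tau, rng, a=(0.025, 0.05, 0.925), burn=500, window=250):
    """Simulate burn + T + 1 GARCH(1,1)-Gaussian returns and forecast the ES of the last one."""
    a0, a1, a2 = a
    n = burn + T + 1
    r, sigma2 = np.empty(n), np.empty(n)
    s2 = a0 / (1 - a1 - a2)
    for t in range(n):
        sigma2[t] = s2
        r[t] = np.sqrt(s2) * rng.standard_normal()
        s2 = a0 + a1 * r[t] ** 2 + a2 * s2
    estimation_sample = r[burn:burn + T]
    # True one-step-ahead ES of observation T + 1 (after burn-in): equation (2) with mu = 0.
    es_true = -np.sqrt(sigma2[burn + T]) * norm.pdf(norm.ppf(tau)) / tau
    # Placeholder forecaster: historical simulation on the most recent `window` observations.
    recent = estimation_sample[-window:]
    var_hs = np.quantile(recent, tau, method="inverted_cdf")
    es_hat = recent[recent <= var_hs].mean()
    return es_hat, es_true


rng = np.random.default_rng(7)
draws = [one_replication(500, 0.05, rng) for _ in range(200)]
rmse, bias = rmse_and_bias(*zip(*draws))
print(f"HS(250) 5% ES: RMSE = {rmse:.4f}, bias = {bias:.4f}")
```

Swapping the placeholder forecaster for a fitted CARES model changes only the last two lines of `one_replication`; the target and the RMSE computation stay the same.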

[Figure 1: The density of an asset return obtained from a GARCH(1,1)-GAUSSIAN model (GARCH(1,1) GAUSSIAN density vs. standard GAUSSIAN density).]

For comparison, we also compute the RMSE of ES forecasts from two commonly used nonparametric methods and from the CARE model of Taylor (2008). The first nonparametric method is historical simulation (HS). Assuming that asset returns are independent and identically distributed, HS obtains the empirical distribution of the return from past observations and uses a given percentile of that empirical distribution, and the expected loss beyond it, as the VaR and ES for the next period. Because the performance of HS depends heavily on the length of the historical data used to form the empirical distribution, we use windows of the past 250 and the past 500 observations in our simulations. The second is the kernel-based nonparametric ES estimator (KDE) of Scaillet (2004). The KDE uses the historical returns $r_1, r_2, \ldots, r_n$ as the sample and takes the form
$$\mathrm{ES}_{KDE} = (np)^{-1}\sum_{t=1}^{n} r_t\, G_h(\mathrm{VaR}_{KDE} - r_t),$$
where $\mathrm{VaR}_{KDE}$ is the kernel-based VaR estimator, $G_h(t) = G(t/h)$, and $G(t) = \int_{-\infty}^{t} K(u)\,du$, with $K$ the standard Gaussian kernel and $h$ the optimal bandwidth. Because the standard KDE estimator is generally biased¹, in the simulation we apply a jackknife bias correction to it. For the CARE model, the relationship between $\alpha$ and $\tau$ (1% or 5%) is available in closed form because we know the return follows a GARCH(1,1)-GAUSSIAN process; in other words, the true value of $\alpha$ is known. We therefore provide ES forecasts from the CARE model under two scenarios: when $\alpha$ is estimated ($\hat\alpha$) by grid search² and when $\alpha$ takes its true value $\alpha_0$. The difference between the ES forecasts in the two scenarios helps us understand the extra uncertainty introduced by estimating $\alpha$.

¹ See Theorem 2 of Chen (2008) for more discussion of this bias and its detailed expression.

[Figure 2: The ES forecasts of CARES models vs. true ES for a GARCH(1,1)-GAUSSIAN model. Panels: CAViaR IGARCH VaR vs. true VaR; CARES IGARCH ES vs. true ES.]

The results for 5% and 1% ES forecasts are shown in Table 2³. When the sample size is small ($T = 250$), the CARES models generally perform even worse than historical simulation and the nonparametric estimators, the only exception being the 5% ES forecasts of the indirect GARCH specification. This is not surprising: we use tail observations to fit the CARES models, and there are very few tail observations when both the sample size and the coverage probability are small. The data limitation outweighs the advantage of the dynamic specification in the CARES models, which makes them inferior to historical simulation and the kernel-based estimators in these cases. This is further corroborated by the fact that the 1% ES forecasts from the CARES models are even worse than the 5% ES forecasts. However, the CARES models outperform HS and KDE as the sample size increases. The RMSEs of 5% ES forecasts from the CARES models are smaller than those from historical simulation and the nonparametric estimators once the sample size reaches 500, and the RMSEs of 1% ES forecasts are smaller than those of these two methods once the sample size reaches 1000. The advantage of the CARES models becomes more pronounced when the sample size increases to 2000. Compared with the CARE model with $\alpha$ set to its true value $\alpha_0$, our CARES models perform quite similarly. However, when the estimated value $\hat\alpha$ is used, both versions of our CARES models perform better than the corresponding CARE models regardless of the sample size.
These results provide evidence that the need to estimate $\alpha$ for a given coverage probability introduces extra estimation error and deteriorates the forecasting performance of the CARE model.

² Following Taylor (2008), we find the optimal value of $\alpha$ by estimating models for different values of $\alpha$ over a grid with a step size of 0.0001; the final optimal value of $\alpha$ is obtained by linear interpolation between grid values.

³ Consistent with Taylor (2008), we found that the asymmetric slope CARE and CARES models were outperformed by the symmetric versions of these models, so we do not consider the asymmetric slope versions further in the remainder of this paper.

Table 2: ES Forecasts When Data Are Generated from the GARCH(1,1)-GAUSSIAN Model

5% ES
                           T = 250            T = 500           T = 1000           T = 2000
Model                Bias     RMSE     Bias     RMSE     Bias     RMSE     Bias     RMSE
HS(250)           -0.0212   0.3332  -0.0297   0.3413  -0.0416   0.3595  -0.0215   0.3458
HS(500)                NA       NA  -0.0438   0.3298  -0.0449   0.3441  -0.0212   0.3307
KDE                0.1015   0.3466   0.0391   0.3387   0.0127   0.3361   0.0054   0.3327
KDE-JK            -0.0301   0.3351  -0.0109   0.3347  -0.0121   0.3336  -0.0501   0.3317
CARE-SAV(α0)       0.0205   0.3571   0.0067   0.2545  -0.0020   0.2095   0.1729   0.1849
CARE-IG(α0)        0.0232   0.3079   0.0021   0.2534   0.0054   0.1942   0.0325   0.1603
CARE-SAV(α̂)        0.0234   0.4036   0.0077   0.2895  -0.0026   0.2360   0.1967   0.2088
CARE-IG(α̂)         0.0264   0.3481   0.0023   0.2857   0.0062   0.2209   0.0370   0.1824
CARES-SAV          0.0207   0.3577   0.0068   0.2566  -0.0023   0.2092   0.1743   0.1851
CARES-IG           0.0234   0.3085   0.0020   0.2532   0.0055   0.1958   0.0328   0.1616

1% ES
                           T = 250            T = 500           T = 1000           T = 2000
Model                Bias     RMSE     Bias     RMSE     Bias     RMSE     Bias     RMSE
HS(250)           -0.0616   0.5060  -0.0690   0.5083  -0.0675   0.5413  -0.0518   0.5253
HS(500)                NA       NA  -0.0866   0.4691  -0.0855   0.4961  -0.0506   0.4807
KDE                0.1377   0.5142   0.1296   0.4682  -0.0428   0.4686  -0.0684   0.4629
KDE-JK            -0.1057   0.4905  -0.1216   0.4815  -0.0412   0.4873  -0.0647   0.4784
CARE-SAV(α0)      -1.5641   2.5858  -0.4956   0.7421  -0.0594   0.4075  -0.0020   0.3090
CARE-IG(α0)       -0.5295   1.3861  -0.3209   0.6170  -0.0134   0.3795   0.0049   0.3086
CARE-SAV(α̂)       -1.7797   2.6620  -0.5639   0.8356 -0.06759   0.4637  -0.0023   0.3516
CARE-IG(α̂)        -0.6025   1.5658  -0.3652   0.6975  -0.0152   0.4280   0.0055   0.3477
CARES-SAV         -1.5770   2.5867  -0.4997   0.7405  -0.0599   0.4109  -0.0020   0.3116
CARES-IG          -0.5339   1.3876  -0.3236   0.6181  -0.0135   0.3793   0.0049   0.3082

Notes: HS(250) and HS(500): historical simulation with the most recent 250 and 500 observations; NA: not available because T = 250 is shorter than the 500-observation window. KDE: kernel-based ES estimator. KDE-JK: kernel-based ES estimator with jackknife bias correction. CARE-SAV and CARE-IG: CARE model with symmetric absolute value and indirect GARCH(1,1) specifications, using the true α (α0) or the estimated α (α̂). CARES-SAV and CARES-IG: CARES model with symmetric absolute value and indirect GARCH(1,1) specifications.

We study two further examples to investigate the advantages of our CARES model, particularly when the data come from a more complex distribution. Consider a normal mixture (NM) stochastic volatility (SV) model,
$$r_t \sim NM(p_1, \ldots, p_K; \mu_1, \ldots, \mu_K; \sigma_{1t}^2, \ldots, \sigma_{Kt}^2), \qquad \sum_{i=1}^{K} p_i = 1, \qquad \sigma_{it}^2 = \omega_i + \alpha_i \sigma_{i,t-1}^2 + \epsilon_{it}, \quad i = 1, \ldots, K, \qquad (19)$$
where $K$ is the number of components in the normal mixture and the volatility disturbance $\epsilon$ follows a standard Gaussian distribution. For simplicity, we use an NM(2) SV model in the simulation. Following the EUR exchange rate analysis of Alexander and Lazar (2006), the parameters are set to $p = 0.6927$, $\omega_1 = 0.000046$, $\alpha_1 = 0.0248$, $\omega_2 = 0.0004$, $\alpha_2 = 0.1066$, $\mu_1 = 1$, and $\mu_2 = 1$. The data generated from this model have a skewed, leptokurtic conditional density. Figure 3 displays the empirical density of an asset return obtained from this model. We investigate the forecasting performance of the CARES models by comparing their RMSEs of 1% and 5% ES forecasts with those of historical simulation, the kernel-based nonparametric estimator, and the CARE models for total sample sizes $500 + T + 1 = 751$, 1001, 1501, and 2501. The first 500 observations are discarded as a burn-in period; we then use the next $T$ observations to fit the CARES models and keep the last observation for evaluating the one-step-ahead out-of-sample 1% and 5% ES forecasts. Table 3 presents the simulation results. The RMSEs of 5% and 1% ES forecasts from the CARES models are smaller than those of the two nonparametric methods when $T$ is at least 500. Moreover, the advantage of the CARES models over the two nonparametric methods becomes more pronounced as $T$ increases. The reduction in RMSE from HS and KDE to the CARES models reflects the benefit of exploiting the dynamic pattern of the tail observations in ES forecasting. Again, the CARES models outperform the corresponding CARE models⁴ in all cases. As a last example, consider a GARCH(1,1) model with time-varying skewness and kurtosis.
This model is defined as

$$ r_t = \sigma_t z_t, \qquad \sigma_t^2 = a_0 + b_0^{+} (r_{t-1}^{+})^2 + b_0^{-} (r_{t-1}^{-})^2 + c_0 \sigma_{t-1}^2 . $$

The disturbance $z_t$ follows a generalized Student-t distribution⁵ with time-varying asymmetry parameter $\lambda_t$ and tail-fatness parameter $\eta_t$, $z_t \sim GT(z_t \mid \eta_t, \lambda_t)$, where

$$ \tilde\eta_t = a_1 + b_1^{+} y_{t-1}^{+} + b_1^{-} y_{t-1}^{-} + c_1 \tilde\eta_{t-1}, \qquad \tilde\lambda_t = a_2 + b_2 y_{t-1}^{2} + c_2 \tilde\lambda_{t-1}, \qquad (20) $$
$$ \eta_t = g_{[2,+30]}(\tilde\eta_t), \qquad \lambda_t = g_{[-1,1]}(\tilde\lambda_t), $$

and $g$ denotes the logistic map. Following the S&P 500 index return analysis of Jondeau and Rockinger (2003), the parameters in our simulation are set to $a_0 = 0.0074$, $b_0^{+} = 0.0384$, $b_0^{-} = 0.0759$, $c_0 = 0.9366$, $a_1 = 0.5191$, $b_1^{+} = 0.5615$, $b_1^{-} = 0.0653$, $c_1 = 0.5999$, $a_2 = 0.0062$, $b_2 = 0.0626$ and $c_2 = 0.6961$. This model accommodates not only time-varying volatility but also time-varying higher-order moments, namely skewness and kurtosis. Figure 4 displays the empirical density of an asset return obtained from this model. The density clearly has a very unusual shape, with strong skewness and kurtosis that a simple distribution cannot capture.

Figure 4: The density of an asset return obtained from a GARCH model with time-varying skewness and kurtosis, shown together with the standard Gaussian density.

To study the forecasting performance of the CARES models, we compare their RMSEs of 1% and 5% ES forecasts with those of historical simulation, the kernel-based nonparametric estimator and the CARE models for total sample sizes 500 + T + 1 = 751, 1001, 1501 and 2501. In each simulation, the first 500 observations are discarded as a burn-in period, the next T observations are used to fit the CARES models, and the last observation is kept for the out-of-sample forecast comparison. The RMSE is computed over $10^4$ replications of the simulation. The results are reported in Table 4. With little data (T = 250), the RMSEs of the 1% and 5% ES forecasts from the CARES models are larger than those from the two nonparametric methods. However, as the sample size increases (even for a moderate sample size of T = 500), the CARES models outperform the two nonparametric methods, attaining the smallest RMSEs. Intuitively, the CARES models specify a dynamic parametric structure for the tail observations, whereas HS and KDE ignore the time-varying volatility of the returns. Meanwhile, the CARES models again outperform the corresponding CARE models⁶ in all situations.

⁴ The true value of α is unknown when the return follows the NM SV model, so we implement the CARE models only with the estimated value of α.

⁵ The density of the generalized t (GT) distribution is

$$ gt(z \mid \eta, \lambda) = \begin{cases} bc\left(1 + \dfrac{1}{\eta-2}\left(\dfrac{bz+a}{1-\lambda}\right)^{2}\right)^{-(\eta+1)/2}, & z < -a/b, \\[2ex] bc\left(1 + \dfrac{1}{\eta-2}\left(\dfrac{bz+a}{1+\lambda}\right)^{2}\right)^{-(\eta+1)/2}, & z \ge -a/b, \end{cases} $$

where $a = 4\lambda c\,\dfrac{\eta-2}{\eta-1}$, $b^2 = 1 + 3\lambda^2 - a^2$ and $c = \dfrac{\Gamma((\eta+1)/2)}{\sqrt{\pi(\eta-2)}\,\Gamma(\eta/2)}$.

⁶ The true value of α is also unknown in this case, so we again implement the CARE models only with the estimated value of α.

Table 3: ES forecasts when data are from an NM(2) SV model

5% ES

| Method | T=250 Bias | T=250 RMSE | T=500 Bias | T=500 RMSE | T=1000 Bias | T=1000 RMSE | T=2000 Bias | T=2000 RMSE |
|---|---|---|---|---|---|---|---|---|
| HS(250) | -0.5968 | 1.9950 | -0.6505 | 1.2530 | -0.6156 | 1.2430 | -0.6021 | 1.2431 |
| HS(500) | NA | NA | -0.6351 | 1.2315 | -0.6003 | 1.2231 | -0.5953 | 1.2244 |
| KDE | -0.6286 | 1.2172 | -0.6726 | 1.2677 | -0.6298 | 1.2423 | -0.6129 | 1.2272 |
| KDE-JK | -0.4028 | 1.2160 | -0.4988 | 1.2520 | -0.4992 | 1.2334 | -0.5152 | 1.2229 |
| CARE-SAV | 0.3125 | 1.2521 | 0.6086 | 1.2701 | -0.6473 | 1.2323 | 0.6437 | 1.2196 |
| CARE-IG | 0.2987 | 1.2529 | 0.6167 | 1.2767 | -0.7665 | 1.2333 | 0.6018 | 1.2212 |
| CARES-SAV | -0.5627 | 1.2299 | -0.6237 | 1.2500 | -0.5899 | 1.2189 | -0.5923 | 1.2154 |
| CARES-IG | -0.5588 | 1.2222 | -0.6259 | 1.2541 | -0.5907 | 1.2242 | -0.5924 | 1.2188 |

1% ES

| Method | T=250 Bias | T=250 RMSE | T=500 Bias | T=500 RMSE | T=1000 Bias | T=1000 RMSE | T=2000 Bias | T=2000 RMSE |
|---|---|---|---|---|---|---|---|---|
| HS(250) | -0.8674 | 1.4642 | -0.9180 | 1.5338 | -0.8926 | 1.5268 | -0.8898 | 1.5377 |
| HS(500) | NA | NA | -0.8801 | 1.4517 | -0.8431 | 1.4364 | -0.8414 | 1.4475 |
| KDE | -1.0164 | 1.5433 | -1.0292 | 1.5686 | -0.9487 | 1.4946 | -0.9126 | 1.4746 |
| KDE-JK | -0.5730 | 1.5395 | -0.7225 | 1.5435 | -0.7427 | 1.4850 | -0.7719 | 1.4567 |
| CARE-SAV | -1.2134 | 2.0986 | -0.9982 | 1.4987 | -0.7896 | 1.4754 | -0.8576 | 1.4621 |
| CARE-IG | -1.3125 | 2.1321 | -0.8765 | 1.4890 | -0.8654 | 1.4896 | -0.8976 | 1.4687 |
| CARES-SAV | -1.0397 | 1.8914 | -0.8267 | 1.4500 | -0.8290 | 1.4591 | -0.8453 | 1.4389 |
| CARES-IG | -0.7957 | 1.6220 | -0.8990 | 1.4541 | -0.8314 | 1.4549 | -0.8486 | 1.4447 |

Figure 3: The density of an asset return obtained from an NM(2) SV model, shown together with the standard Gaussian density.
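Footnotes 2, 4 and 6 concern the value of α used by the CARE models: when α is estimated, footnote 2 follows Taylor (2008) with a grid search plus linear interpolation. The sketch below illustrates only that grid-and-interpolate step under loudly stated simplifying assumptions: a constant (unconditional) expectile stands in for the dynamic CARE model, and the "optimal" α is taken to be the one whose in-sample share of observations below the α-expectile equals the target coverage τ. The function names are hypothetical, and the default grid step of 0.001 is coarser than the 0.0001 in footnote 2 purely for speed.

```python
import random


def expectile(y, alpha, tol=1e-12, max_iter=1000):
    """Sample alpha-expectile by fixed-point (iteratively reweighted) iteration.

    The alpha-expectile mu solves sum_i w_i * (y_i - mu) = 0, with weight
    w_i = 1 - alpha for y_i < mu and w_i = alpha otherwise.
    """
    mu = sum(y) / len(y)
    for _ in range(max_iter):
        num = den = 0.0
        for v in y:
            w = (1.0 - alpha) if v < mu else alpha
            num += w * v
            den += w
        new_mu = num / den
        if abs(new_mu - mu) <= tol:
            return new_mu
        mu = new_mu
    return mu


def coverage(y, alpha):
    """Share of observations lying below the alpha-expectile."""
    mu = expectile(y, alpha)
    return sum(v < mu for v in y) / len(y)


def alpha_for_coverage(y, tau, step=0.001, upper=0.5):
    """Scan alpha over a grid; linearly interpolate where coverage first reaches tau."""
    prev = None  # (alpha, coverage) at the previous grid point
    for k in range(1, int(round(upper / step)) + 1):
        a = k * step
        c = coverage(y, a)
        if c >= tau:
            if prev is None:
                return a
            a0, c0 = prev
            # Linear interpolation between the two grid points bracketing tau.
            return a0 + (tau - c0) * (a - a0) / (c - c0)
        prev = (a, c)
    raise ValueError("coverage never reached tau on the grid")


if __name__ == "__main__":
    rng = random.Random(0)
    returns = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    print(alpha_for_coverage(returns, tau=0.05))
```

For Gaussian data the selected α is well below τ, because expectiles lie closer to the centre than quantiles at the same level; coverage is non-decreasing in α, which is what makes the bracketing-and-interpolation step well defined.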

Table 4: ES forecasts when data are from a GARCH(1,1) model with time-varying skewness and kurtosis

5% ES

| Method | T=250 Bias | T=250 RMSE | T=500 Bias | T=500 RMSE | T=1000 Bias | T=1000 RMSE | T=2000 Bias | T=2000 RMSE |
|---|---|---|---|---|---|---|---|---|
| HS(250) | -0.0238 | 0.4930 | -0.0482 | 0.4598 | -0.0015 | 0.3100 | 0.0525 | 0.2934 |
| HS(500) | NA | NA | -0.0661 | 0.4575 | -0.0245 | 0.3095 | 0.0220 | 0.2648 |
| KDE | 0.0108 | 0.4863 | -0.0685 | 0.4560 | -0.0930 | 0.3277 | -0.0241 | 0.2721 |
| KDE-JK | -0.0077 | 0.4862 | -0.0555 | 0.4579 | -0.0844 | 0.3295 | -0.0176 | 0.2724 |
| CARE-SAV | 0.3614 | 2.2076 | 0.0321 | 0.4507 | 0.4534 | 0.4621 | 0.3756 | 0.3702 |
| CARE-IG | 0.3457 | 2.1987 | -0.0487 | 0.4514 | 0.5866 | 0.4637 | 0.3902 | 0.3678 |
| CARES-SAV | -0.2914 | 0.7197 | -0.0958 | 0.4300 | -0.1002 | 0.3031 | -0.0120 | 0.1683 |
| CARES-IG | -0.3611 | 2.1917 | -0.1031 | 0.4384 | -0.1583 | 0.3074 | 0.0102 | 0.1764 |

1% ES

| Method | T=250 Bias | T=250 RMSE | T=500 Bias | T=500 RMSE | T=1000 Bias | T=1000 RMSE | T=2000 Bias | T=2000 RMSE |
|---|---|---|---|---|---|---|---|---|
| HS(250) | -0.2034 | 1.2972 | 0.0879 | 0.9296 | 0.1112 | 1.0771 | -0.0565 | 1.0329 |
| HS(500) | NA | NA | 0.1314 | 0.8265 | 0.1453 | 0.8101 | 0.0265 | 0.7536 |
| KDE | -0.0188 | 1.1222 | 0.1386 | 0.8289 | 0.1178 | 0.7989 | -0.1107 | 0.5472 |
| KDE-JK | -0.0177 | 1.1184 | 0.1314 | 0.8264 | 0.1127 | 0.7970 | -0.1069 | 0.5475 |
| CARE-SAV | 1.8780 | 1.4538 | 1.6347 | 1.3014 | 0.6698 | 0.7503 | 0.2324 | 0.5381 |
| CARE-IG | 1.9629 | 1.4079 | 1.7694 | 1.3231 | 0.7926 | 0.7761 | 0.1356 | 0.5247 |
| CARES-SAV | -2.3325 | 1.3282 | -1.0849 | 1.2801 | 0.0212 | 0.7129 | -0.0872 | 0.5058 |
| CARES-IG | -2.2489 | 1.3477 | -1.4831 | 1.1887 | 0.0163 | 0.7085 | -0.1073 | 0.5027 |
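The data-generating process behind Table 4 combines the asymmetric GARCH variance equation, recursion (20) and the GT density of footnote 5. The sketch below is one way to simulate it, under assumptions that are ours rather than the paper's: $r^{+} = \max(r, 0)$ and $r^{-} = \min(r, 0)$; the drivers $y_{t-1}$ in (20) are the lagged returns $r_{t-1}$; the logistic map is $g_{[L,U]}(x) = L + (U-L)/(1+e^{-x})$; the start values are arbitrary and washed out by the 500-observation burn-in; and the parameter values are used exactly as listed in the text. The sampler `gt_draw` relies on a two-piece representation we derived from footnote 5: $w = bz + a$ has density $s(w/(1-\lambda))$ for $w<0$ and $s(w/(1+\lambda))$ for $w \ge 0$, where $s$ is the unit-variance Student-t density. All function names are hypothetical.

```python
import math
import random


def gt_constants(eta, lam):
    """Constants a, b, c of the GT density in footnote 5 (eta > 2, |lam| < 1)."""
    c = math.exp(math.lgamma((eta + 1) / 2) - math.lgamma(eta / 2)) / math.sqrt(math.pi * (eta - 2))
    a = 4 * lam * c * (eta - 2) / (eta - 1)
    b = math.sqrt(1 + 3 * lam ** 2 - a ** 2)
    return a, b, c


def gt_pdf(z, eta, lam):
    """GT density gt(z | eta, lam) exactly as written in footnote 5."""
    a, b, c = gt_constants(eta, lam)
    scale = (1 - lam) if z < -a / b else (1 + lam)
    return b * c * (1 + ((b * z + a) / scale) ** 2 / (eta - 2)) ** (-(eta + 1) / 2)


def gt_draw(eta, lam, rng):
    """One GT draw via the two-piece representation of w = b*z + a."""
    a, b, _ = gt_constants(eta, lam)
    # Student-t with eta degrees of freedom: N(0,1) / sqrt(chi2_eta / eta).
    t = rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(eta / 2, 2.0) / eta)
    x = abs(t) * math.sqrt((eta - 2) / eta)  # |unit-variance t|
    w = -(1 - lam) * x if rng.random() < (1 - lam) / 2 else (1 + lam) * x
    return (w - a) / b


def logistic_map(x, lo, hi):
    """Assumed form of g_[lo,hi]: a numerically stable logistic squashing into (lo, hi)."""
    if x >= 0:
        s = 1.0 / (1.0 + math.exp(-x))
    else:
        e = math.exp(x)
        s = e / (1.0 + e)
    return lo + (hi - lo) * s


# (a0, b0+, b0-, c0, a1, b1+, b1-, c1, a2, b2, c2) as listed in the text.
PARAMS = (0.0074, 0.0384, 0.0759, 0.9366, 0.5191, 0.5615, 0.0653, 0.5999, 0.0062, 0.0626, 0.6961)


def simulate(n, params=PARAMS, burn=500, seed=0):
    """Simulate n post-burn-in returns; returns lists (r, sigma2, eta, lam)."""
    a0, b0p, b0m, c0, a1, b1p, b1m, c1, a2, b2, c2 = params
    rng = random.Random(seed)
    sig2, eta_tilde, lam_tilde = 1.0, 0.0, 0.0  # arbitrary start values
    out = ([], [], [], [])
    for t in range(burn + n):
        eta = logistic_map(eta_tilde, 2.0, 30.0)
        lam = logistic_map(lam_tilde, -1.0, 1.0)
        r = math.sqrt(sig2) * gt_draw(eta, lam, rng)
        if t >= burn:
            for lst, v in zip(out, (r, sig2, eta, lam)):
                lst.append(v)
        pos, neg = max(r, 0.0), min(r, 0.0)  # assumed r^+ and r^-
        sig2 = a0 + b0p * pos ** 2 + b0m * neg ** 2 + c0 * sig2
        eta_tilde = a1 + b1p * pos + b1m * neg + c1 * eta_tilde
        lam_tilde = a2 + b2 * r ** 2 + c2 * lam_tilde
    return out
```

By construction $E[w] = a$ and $\operatorname{Var}(w) = 1 + 3\lambda^2 - a^2 = b^2$, so $z$ has mean zero and unit variance; numerically integrating `gt_pdf` is a quick check that the reconstructed density in footnote 5 is consistent.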

5 Empirical Analysis

To apply our CARES model to real data, we conduct a simple empirical study of the expected shortfall of a stock index and individual stocks. We try different CARES specifications for each index and stock and evaluate both their in-sample and out-of-sample forecasting performance.

5.1 Data

We consider two individual stocks, General Motors (GM) and IBM, and one stock index, the S&P 500. Following Engle and Manganelli (2004), we first take a sample of 3,392 daily prices from Datastream for each series, spanning April 7, 1986 to April 7, 1999, to see whether the ES estimates from our CARES model provide the same risk indication as the VaR estimates from the CAViaR model. Second, we take a recent sample of daily prices for the same two stocks and index from Wharton Research Data Services (WRDS), ranging from January 1, 2005 to December 31, 2011. This period covers the recent global financial crisis, so it allows us to study whether these assets were riskier during the crisis and whether our CARES model captures this effect. Daily returns are computed as 100 times the first difference of the log prices.

5.2 Empirical Results

For the first sample, we use the first 2,892 observations to estimate the CARES models and keep the last 500 observations for out-of-sample forecasting. We estimate 1-day-ahead 1% and 5% ESs using the CARES specifications discussed in Section 3.1. The 5% VaR and ES estimates for GM are plotted in Figure 5, and all estimation results are reported in Table 5. The top panel of Figure 5 plots the 5% VaR and ES estimates from the CARES symmetric absolute value specification for GM⁷, and the bottom panel plots those from the CARES indirect GARCH specification.
The ES plot has a pattern very similar to that of the VaR plot, with a spike at the beginning of the sample marking the 1987 crash and an increase toward the end of the sample reflecting the higher volatility that followed the Russian and Asian crises.

⁷ The plot exhibits the same pattern as Figure 1 in Engle and Manganelli (2004); the only difference is that VaR is reported here as a negative rather than a positive number.
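The data handling in Sections 5.1 and 5.2 amounts to two small steps: returns are 100 times the first difference of the log prices, and the last 500 observations are held out for out-of-sample forecasting. A minimal sketch of that bookkeeping follows; the function names are hypothetical and no data are loaded.

```python
import math


def log_returns_pct(prices):
    """Daily returns: 100 times the first difference of the log prices."""
    return [100.0 * (math.log(p1) - math.log(p0)) for p0, p1 in zip(prices, prices[1:])]


def split_sample(series, n_out=500):
    """Split into an estimation window and the last n_out out-of-sample observations."""
    if n_out >= len(series):
        raise ValueError("series is too short for the requested hold-out")
    return series[:-n_out], series[-n_out:]
```

For the second sample, for example, 1,762 observations split into the 1,262 used for estimation and the 500 used for forecasting.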

Table 5: Estimation results of the CARES models (Part A)

1% ES

| Parameter | SAV: GM | SAV: IBM | SAV: S&P500 | IG: GM | IG: IBM | IG: S&P500 |
|---|---|---|---|---|---|---|
| γ1 | 2.1648 | -0.0802 | -0.0440 | -0.9024 | 1.1747 | -0.2470 |
| (Std1) | (1.5962) | (1.2069) | (0.5396) | (1.0177) | (5.4666) | (0.0703) |
| (Std2) | (1.4586) | (1.2582) | (0.6173) | (1.0245) | (9.1299) | (0.0710) |
| (Std3) | (1.6313) | (1.1975) | (0.5791) | (1.0161) | (12.2810) | (0.0705) |
| γ2 | 0.2143 | 0.9001 | 0.6981 | 0.9171 | 0.9101 | 0.8919 |
| (Std1) | (0.2300) | (0.9532) | (0.4026) | (0.0878) | (0.1034) | (0.0214) |
| (Std2) | (0.2058) | (1.0297) | (0.4128) | (0.0883) | (0.1744) | (0.0216) |
| (Std3) | (0.2391) | (1.0202) | (0.4085) | (0.0876) | (0.2327) | (0.0215) |
| γ3 | 1.8901 | 0.6174 | 1.8698 | 1.3730 | 0.9595 | 2.8273 |
| (Std1) | (0.8261) | (4.7204) | (2.4376) | (7.5860) | (5.3782) | (13.1030) |
| (Std2) | (0.8147) | (5.1286) | (2.4309) | (7.5899) | (5.4315) | (13.0990) |
| (Std3) | (0.8416) | (5.1226) | (2.4376) | (7.5832) | (5.3252) | (13.1010) |

Notes: SAV: symmetric absolute value specification; IG: indirect GARCH specification. Std1: k-nearest neighbour estimator; Std2: Koenker's bandwidth with Bofinger's $h_T$; Std3: Koenker's bandwidth with Hall and Sheather's $h_T$.
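Std2 and Std3 in Tables 5 and 6 rely on Bofinger's and Hall and Sheather's bandwidths $h_T$. The sketch below uses the standard textbook formulas for these two bandwidths (the same two that R's `quantreg::bandwidth.rq` implements) and, as an assumed illustration of how such a bandwidth is typically used, a difference-quotient (Siddiqui-type) estimate of the sparsity $1/f(F^{-1}(\tau))$ from empirical quantiles. How exactly the paper plugs $h_T$ into its standard errors is not shown in this section, so treat the `sparsity` helper as hypothetical.

```python
import math
import random  # used only in the usage example
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def bofinger_bandwidth(tau, n):
    """Bofinger bandwidth at quantile level tau for sample size n."""
    x0 = _STD_NORMAL.inv_cdf(tau)
    f0 = _STD_NORMAL.pdf(x0)
    return n ** (-1 / 5) * (4.5 * f0 ** 4 / (2 * x0 ** 2 + 1) ** 2) ** (1 / 5)


def hall_sheather_bandwidth(tau, n, alpha=0.05):
    """Hall and Sheather bandwidth; alpha is the confidence-level parameter."""
    x0 = _STD_NORMAL.inv_cdf(tau)
    f0 = _STD_NORMAL.pdf(x0)
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return n ** (-1 / 3) * z ** (2 / 3) * (1.5 * f0 ** 2 / (2 * x0 ** 2 + 1)) ** (1 / 3)


def sparsity(residuals, tau, h):
    """Difference-quotient estimate of 1 / f(F^{-1}(tau)) from empirical quantiles."""
    xs = sorted(residuals)
    n = len(xs)

    def q(p):
        # Empirical p-quantile: the ceil(n p)-th order statistic, clipped to the sample.
        return xs[min(n - 1, max(0, math.ceil(n * p) - 1))]

    return (q(tau + h) - q(tau - h)) / (2 * h)


if __name__ == "__main__":
    rng = random.Random(0)
    e = [rng.gauss(0.0, 1.0) for _ in range(50000)]
    print(sparsity(e, 0.05, bofinger_bandwidth(0.05, len(e))))
```

With the in-sample size of the first empirical sample (2,892 observations), both bandwidths are smaller than τ at the 1% and 5% levels, so $\tau \pm h$ stays inside (0, 1).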

5% ES

| Parameter | SAV: GM | SAV: IBM | SAV: S&P500 | IG: GM | IG: IBM | IG: S&P500 |
|---|---|---|---|---|---|---|
| γ1 | 0.6092 | 0.0717 | 0.2367 | 2.0529 | 0.6679 | 1.0458 |
| (Std1) | (0.5707) | (0.3806) | (0.3854) | (1.8022) | (2.2157) | (3.3992) |
| (Std2) | (0.9244) | (0.3916) | (0.4094) | (2.1404) | (1.0939) | (3.1835) |
| (Std3) | (0.9737) | (0.3896) | (0.3812) | (2.5239) | (1.2997) | (3.1485) |
| γ2 | 0.6488 | 0.9190 | 0.5610 | 0.6886 | 0.8764 | 0.3302 |
| (Std1) | (0.2449) | (0.2156) | (0.2775) | (0.0993) | (0.1320) | (0.3156) |
| (Std2) | (0.3578) | (0.2084) | (0.2769) | (0.1141) | (0.0628) | (0.2621) |
| (Std3) | (0.3710) | (0.2095) | (0.2588) | (0.1336) | (0.0756) | (0.2628) |
| γ3 | 0.5755 | 0.1924 | 1.1267 | 0.8747 | 0.3883 | 3.8413 |
| (Std1) | (0.4205) | (0.3890) | (0.8126) | (1.2039) | (0.5361) | (2.6250) |
| (Std2) | (0.5077) | (0.3584) | (0.8020) | (1.2336) | (0.4246) | (2.6027) |
| (Std3) | (0.5125) | (0.3629) | (0.7828) | (1.2024) | (0.4433) | (2.6138) |

Notes: SAV: symmetric absolute value specification; IG: indirect GARCH specification. Std1: k-nearest neighbour estimator; Std2: Koenker's bandwidth with Bofinger's $h_T$; Std3: Koenker's bandwidth with Hall and Sheather's $h_T$.

Figure 5: 5% VaR and ES estimates from CARES models for GM

This shows that the ES estimates from the CARES model produce the same risk indication as VaR and can be regarded as an alternative risk measure to the VaR estimates from the CAViaR model. For the second sample, we use the first 1,262 observations to estimate the CARES models and again keep the last 500 observations for out-of-sample forecasting. We estimate 1-day-ahead 1% and 5% ESs using the two CARES specifications discussed in Section 3.1. The 5% VaR and ES estimates for IBM and the S&P 500 are plotted in Figures 6 and 7, respectively, where VaR and ES are reported as negative numbers. The common spike in the middle of the sample (between the end of 2008 and 2009) corresponds to the global financial crisis, and the increased risk toward the end of the sample reflects the recent euro zone crisis. All estimation results are reported in Table 6, which presents the estimated parameters and the corresponding standard errors. The most striking result is that the coefficient of the autoregressive term in the CARES model is always highly significant, which confirms that volatility clustering is also relevant in the tails.

Figure 6: 5% VaR and ES estimates from CARES models for IBM

Figure 7: 5% VaR and ES estimates from CARES models for S&P 500

Table 6: Estimation results of the CARES models (Part B)

1% ES

| Parameter | SAV: GM | SAV: IBM | SAV: S&P500 | IG: GM | IG: IBM | IG: S&P500 |
|---|---|---|---|---|---|---|
| γ1 | 0.1538 | 0.3703 | 0.0304 | 0.5621 | 0.8743 | 0.3303 |
| (Std1) | (0.2907) | (0.5324) | (0.0626) | (0.1690) | (0.8722) | (0.2458) |
| (Std2) | (0.1524) | (0.5650) | (0.0592) | (0.1440) | (0.8765) | (0.2391) |
| (Std3) | (0.3747) | (0.5801) | (0.0592) | (0.1440) | (0.8877) | (0.2374) |
| γ2 | 0.8731 | 0.7000 | 0.9250 | 0.9456 | 0.6490 | 0.8902 |
| (Std1) | (0.0563) | (0.3401) | (0.1532) | (0.0975) | (0.1930) | (0.0201) |
| (Std2) | (0.0629) | (0.3631) | (0.1272) | (0.0974) | (0.1966) | (0.0210) |
| (Std3) | (0.0967) | (0.3708) | (0.1293) | (0.0974) | (0.2000) | (0.0211) |
| γ3 | 0.5127 | 0.4603 | 0.2768 | 0.5515 | 1.0359 | 0.7826 |
| (Std1) | (0.3464) | (0.4445) | (0.6114) | (0.3690) | (0.5886) | (0.6683) |
| (Std2) | (0.2868) | (0.4727) | (0.5070) | (0.3690) | (0.6323) | (0.6334) |
| (Std3) | (0.1213) | (0.4749) | (0.5157) | (0.3690) | (0.6338) | (0.6341) |

Notes: SAV: symmetric absolute value specification; IG: indirect GARCH specification. Std1: k-nearest neighbour estimator; Std2: Koenker's bandwidth with Bofinger's $h_T$; Std3: Koenker's bandwidth with Hall and Sheather's $h_T$.

5% ES

| Parameter | SAV: GM | SAV: IBM | SAV: S&P500 | IG: GM | IG: IBM | IG: S&P500 |
|---|---|---|---|---|---|---|
| γ1 | 0.0139 | 0.1123 | 0.0442 | 0.0736 | 0.2058 | 0.0757 |
| (Std1) | (0.0560) | (0.1458) | (0.0483) | (0.0320) | (0.1246) | (0.0465) |
| (Std2) | (0.0560) | (0.2173) | (0.0562) | (0.0372) | (0.1154) | (0.0505) |
| (Std3) | (0.0560) | (0.1956) | (0.0550) | (0.0330) | (0.1168) | (0.0522) |
| γ2 | 0.9192 | 0.7875 | 0.9081 | 0.9322 | 0.7016 | 0.9219 |
| (Std1) | (0.0604) | (0.1871) | (0.0507) | (0.0065) | (0.0521) | (0.0061) |
| (Std2) | (0.0594) | (0.2700) | (0.0473) | (0.0065) | (0.0471) | (0.0060) |
| (Std3) | (0.0594) | (0.2424) | (0.0502) | (0.0065) | (0.0469) | (0.0059) |
| γ3 | 0.2480 | 0.3686 | 0.2297 | 0.3837 | 0.8893 | 0.3533 |
| (Std1) | (0.1490) | (0.2720) | (0.1266) | (0.2158) | (0.4671) | (0.2101) |
| (Std2) | (0.1089) | (0.3709) | (0.1139) | (0.2153) | (0.4078) | (0.2212) |
| (Std3) | (0.1156) | (0.3344) | (0.1241) | (0.2156) | (0.4231) | (0.1954) |

Notes: SAV: symmetric absolute value specification; IG: indirect GARCH specification. Std1: k-nearest neighbour estimator; Std2: Koenker's bandwidth with Bofinger's $h_T$; Std3: Koenker's bandwidth with Hall and Sheather's $h_T$.
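A quick way to read the significance statements about Tables 5 and 6 is through t-ratios, estimate divided by standard error, under a normal approximation. The sketch below transcribes the 5% ES panel of Table 6 and computes the t-ratios for each of the three standard-error estimators; the 1.96 cut-off and the helper names are our choices, and which γ is the autoregressive coefficient is defined by the specifications in Section 3.1, not here.

```python
# Estimates and standard errors transcribed from the 5% ES panel of Table 6.
# Column order: SAV GM, SAV IBM, SAV S&P500, IG GM, IG IBM, IG S&P500.
# Each entry: (point estimates, [Std1 row, Std2 row, Std3 row]).
TABLE6_5PCT = {
    "gamma1": ([0.0139, 0.1123, 0.0442, 0.0736, 0.2058, 0.0757],
               [[0.0560, 0.1458, 0.0483, 0.0320, 0.1246, 0.0465],
                [0.0560, 0.2173, 0.0562, 0.0372, 0.1154, 0.0505],
                [0.0560, 0.1956, 0.0550, 0.0330, 0.1168, 0.0522]]),
    "gamma2": ([0.9192, 0.7875, 0.9081, 0.9322, 0.7016, 0.9219],
               [[0.0604, 0.1871, 0.0507, 0.0065, 0.0521, 0.0061],
                [0.0594, 0.2700, 0.0473, 0.0065, 0.0471, 0.0060],
                [0.0594, 0.2424, 0.0502, 0.0065, 0.0469, 0.0059]]),
    "gamma3": ([0.2480, 0.3686, 0.2297, 0.3837, 0.8893, 0.3533],
               [[0.1490, 0.2720, 0.1266, 0.2158, 0.4671, 0.2101],
                [0.1089, 0.3709, 0.1139, 0.2153, 0.4078, 0.2212],
                [0.1156, 0.3344, 0.1241, 0.2156, 0.4231, 0.1954]]),
}


def t_ratios(estimates, std_rows):
    """t-ratios estimate / std. error, one row per standard-error estimator."""
    return [[e / s for e, s in zip(estimates, row)] for row in std_rows]


def share_significant(param, crit=1.96):
    """Fraction of (column, std estimator) pairs with |t| > crit."""
    ts = [abs(t) for row in t_ratios(*TABLE6_5PCT[param]) for t in row]
    return sum(t > crit for t in ts) / len(ts)


if __name__ == "__main__":
    for name in TABLE6_5PCT:
        print(name, share_significant(name))
```

In this panel all eighteen t-ratios for γ2 exceed 2.9 in absolute value (the smallest is 0.7875 / 0.2700 for IBM under Std2), while many of those for γ1 fall below 1.96.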

6 Conclusion

We have proposed a new model for ES estimation. Most existing methods estimate the distribution of the returns and then recover its quantile and the expected value of the exceedances beyond that quantile indirectly. In contrast, we model the quantile and the expected value of the exceedances beyond the quantile directly. To do this, we introduce a new class of models, the CARES models, which use the CAViaR model for quantile estimation and specify the evolution over time of the expected exceedance beyond the quantile through a special type of autoregressive process. We estimate the unknown parameters by a two-stage procedure and derive the limiting theory of the parameter estimators within a GMM framework. A simulation study comparing the new model with several existing methods shows that it performs well with a moderate sample size, and applications to real data illustrate its ability to adapt to new risk environments.

Appendix A

Since the estimator $\hat\theta = (\hat\beta, \hat\gamma)$ can asymptotically be regarded as a GMM estimator, its asymptotic distribution can be established within a GMM framework. In our particular problem, $\hat\theta$ can be identified as

$$ \hat\theta = \arg\min_{\theta} Q_n(\theta), \qquad Q_n(\theta) = m_n(\theta)'\, V_n^{-1}\, m_n(\theta), $$

where

$$ m_n(\theta) = \frac{1}{n}\sum_{t=1}^{n} \phi_t(\theta) = \begin{bmatrix} \dfrac{1}{n}\sum_{t=1}^{n} \nabla_\beta f_t(\beta)\,\big(\tau - I(y_t < f_t(\beta))\big) \\[2ex] \dfrac{1}{n}\sum_{t=1}^{n} \nabla_\gamma g_t(\gamma)\,\big(y_t - g_t(\gamma)\big)\, I(y_t < f_t(\beta)) \end{bmatrix}, \qquad E_0[\phi_t(\theta_0)] = 0, \qquad V_n \xrightarrow{P} V, $$

$E_0$ denotes expectation under the true model, and $V$ is the weighting matrix.

Proof of Theorem 3.1

To establish the consistency of the estimator $\hat\theta$, we require the following assumptions.

Assumption 6.1. Let $m_0(\theta) = E_0[\phi_t(\theta)]$. Then $\sup_{\theta \in \Theta} \| m_n(\theta) - m_0(\theta) \| \xrightarrow{P} 0$, where $\|\cdot\|$ is the Euclidean norm.

This assumption ensures that $m_n(\theta)$ converges uniformly in probability to $m_0(\theta)$.
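To make the moment conditions in $m_n(\theta)$ concrete, the sketch below evaluates them for a deliberately simple, hypothetical specification that is not one of the paper's models: a constant quantile $f_t(\beta) = \beta$ and a constant tail mean $g_t(\gamma) = \gamma$, so both gradients equal one. It also assumes an identity weighting matrix. The first moment then pins $\beta$ at the empirical τ-quantile and the second sets $\gamma$ to the mean of the observations below it, which is why $Q_n$ can be driven exactly to zero when $n\tau$ is an integer.

```python
import random  # used only in the usage example


def moments(theta, y, tau):
    """m_n(theta) for the hypothetical constant specification f_t = beta, g_t = gamma."""
    beta, gamma = theta
    n = len(y)
    m1 = sum(tau - (v < beta) for v in y) / n  # quantile moment
    m2 = sum((v - gamma) * (v < beta) for v in y) / n  # tail-mean moment
    return [m1, m2]


def gmm_objective(theta, y, tau, v_inv=((1.0, 0.0), (0.0, 1.0))):
    """Q_n(theta) = m_n(theta)' V_n^{-1} m_n(theta); identity weighting by default."""
    m = moments(theta, y, tau)
    return sum(m[i] * v_inv[i][j] * m[j] for i in range(2) for j in range(2))


def exact_minimiser(y, tau):
    """When k = n * tau is an integer (and there are no ties), both moments vanish at
    beta = (k+1)-th order statistic and gamma = mean of the k smallest observations."""
    xs = sorted(y)
    k = int(round(len(xs) * tau))
    return xs[k], sum(xs[:k]) / k


if __name__ == "__main__":
    rng = random.Random(0)
    ys = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    theta_hat = exact_minimiser(ys, 0.05)
    print(theta_hat, gmm_objective(theta_hat, ys, 0.05))
```

Because the indicator makes $m_n(\theta)$ a step function of $\beta$, $Q_n$ is not smooth, which is one reason the consistency argument works with uniform convergence of $Q_n$ rather than with derivatives.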

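Assumption 6.2. For every $\varepsilon > 0$, $\inf_{\theta \in \Theta:\ \|\theta - \theta_0\| > \varepsilon} Q_0(\theta) - Q_0(\theta_0) > 0$.

This assumption ensures that the population objective function $Q_0(\theta)$ has a unique (well-separated) minimum at $\theta_0$, where the population objective function is defined as

$$ Q_0(\theta) = E_0[\phi_t(\theta)]'\, V^{-1}\, E_0[\phi_t(\theta)] = m_0(\theta)'\, V^{-1}\, m_0(\theta). $$

Under Assumption 6.1,

$$ \sup_{\theta} |Q_n(\theta) - Q_0(\theta)| = \sup_{\theta} \big| m_n(\theta)' V_n^{-1} m_n(\theta) - m_0(\theta)' V^{-1} m_0(\theta) \big|. $$

Adding and subtracting $m_n(\theta)' V_n^{-1} m_0(\theta)$ and $m_n(\theta)' V^{-1} m_0(\theta)$, and suppressing the argument $\theta$,

$$ m_n' V_n^{-1} m_n - m_0' V^{-1} m_0 = m_n' V_n^{-1} (m_n - m_0) + m_n' (V_n^{-1} - V^{-1})\, m_0 + (m_n - m_0)' V^{-1} m_0 , $$

so that

$$ \sup_{\theta} |Q_n(\theta) - Q_0(\theta)| \le \sup_{\theta}\|m_n\|\, \|V_n^{-1}\|\, \sup_{\theta}\|m_n - m_0\| + \sup_{\theta}\|m_n\|\, \|V_n^{-1} - V^{-1}\|\, \sup_{\theta}\|m_0\| + \sup_{\theta}\|m_n - m_0\|\, \|V^{-1}\|\, \sup_{\theta}\|m_0\| . $$

Since $V_n \xrightarrow{P} V$ with $V$ nonsingular, $V_n^{-1} \xrightarrow{P} V^{-1}$. Together with Assumption 6.1 and a finite $\sup_\theta \|m_0(\theta)\|$ (which also gives $\sup_\theta \|m_n(\theta)\| = O_p(1)$), each term converges to zero in probability. That is,

$$ \sup_{\theta} |Q_n(\theta) - Q_0(\theta)| \xrightarrow{P} 0 . $$

Next, let $\varepsilon > 0$ be an arbitrarily small real number. By Assumption 6.2, there exists $\delta > 0$ such that $Q_0(\theta) - Q_0(\theta_0) > \delta$ whenever $\|\theta - \theta_0\| > \varepsilon$. Since $\hat\theta$ minimizes $Q_n$, $Q_n(\hat\theta) \le Q_n(\theta_0) + o_p(1)$. Then

$$ \begin{aligned} \Pr(\|\hat\theta - \theta_0\| > \varepsilon) &\le \Pr\big(Q_0(\hat\theta) - Q_0(\theta_0) \ge \delta\big) \\ &= \Pr\big(Q_0(\hat\theta) - Q_n(\hat\theta) + Q_n(\hat\theta) - Q_0(\theta_0) \ge \delta\big) \\ &\le \Pr\big(Q_0(\hat\theta) - Q_n(\hat\theta) + Q_n(\theta_0) + o_p(1) - Q_0(\theta_0) \ge \delta\big) \\ &\le \Pr\big(|Q_0(\hat\theta) - Q_n(\hat\theta)| + |Q_n(\theta_0) - Q_0(\theta_0)| + |o_p(1)| \ge \delta\big) \\ &\le \Pr\big(2 \sup_{\theta} |Q_0(\theta) - Q_n(\theta)| + |o_p(1)| \ge \delta\big). \end{aligned} $$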