Dynamic Semiparametric Models for Expected Shortfall (and Value-at-Risk)
Supplemental Appendix to: Dynamic Semiparametric Models for Expected Shortfall (and Value-at-Risk)

Andrew J. Patton (Duke University), Johanna F. Ziegel (University of Bern), Rui Chen (Duke University)

September 2018

This appendix contains three parts. Part 1 presents lemmas that provide further details on the proof of the asymptotic normality theorem presented in the main paper. Part 2 presents a detailed verification that the high-level assumptions made in the theorems of the paper hold for the widely-used GARCH(1,1) process. Part 3 contains additional tables of analysis.

Appendix SA.1: Detailed proofs

Throughout this appendix, we suppress the subscript on $\hat\theta_T$ for simplicity of presentation, and we denote the conditional distribution and density functions as $F_t$ and $f_t$ rather than $F_t(\cdot\mid\mathcal F_{t-1})$ and $f_t(\cdot\mid\mathcal F_{t-1})$. In Lemmas 1 and 3 below, we will refer to the expected score, defined as
$$\lambda(\theta) \equiv E[g_t(\theta)] = E\bigg[\frac{\alpha - F_t(v_t(\theta))}{\alpha\,e_t(\theta)}\nabla' v_t(\theta) + \frac{1}{e_t(\theta)^2}\Big(\frac{F_t(v_t(\theta))}{\alpha}\big(v_t(\theta) - E_t[Y_t\mid Y_t\le v_t(\theta)]\big) - v_t(\theta) + e_t(\theta)\Big)\nabla' e_t(\theta)\bigg].$$

Lemma 1. Let $\Lambda(\theta) \equiv \partial\lambda(\theta)/\partial\theta'$. Then under Assumptions 1-2,
$$\sqrt T\,(\hat\theta - \theta_0) = -\Lambda(\theta_0)^{-1}\,\frac{1}{\sqrt T}\sum_{t=1}^T g_t(\theta_0) + o_p(1).$$
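For readers who wish to experiment with the objects defined above, the following sketch implements the FZ0 loss in the form used in this appendix (additive constants are immaterial for minimization) and checks numerically that the sample-average loss is smaller at the true (VaR, ES) pair than at perturbed values, for standard normal data. All variable names are illustrative.

```python
import numpy as np
from statistics import NormalDist

def fz0_loss(y, v, e, alpha):
    """FZ0 loss, L(y, v, e) = -1{y<=v}(v - y)/(alpha*e) + v/e + log(-e),
    which jointly elicits (VaR_alpha, ES_alpha); requires e < 0."""
    y = np.asarray(y)
    hit = (y <= v).astype(float)
    return -hit * (v - y) / (alpha * e) + v / e + np.log(-e)

alpha = 0.05
q = NormalDist().inv_cdf(alpha)        # true VaR of N(0,1), about -1.645
es = -NormalDist().pdf(q) / alpha      # true ES of N(0,1), about -2.063
y = np.random.default_rng(0).standard_normal(500_000)

at_truth = fz0_loss(y, q, es, alpha).mean()
off_v = fz0_loss(y, 1.2 * q, es, alpha).mean()   # mis-specified VaR
off_e = fz0_loss(y, q, 1.2 * es, alpha).mean()   # mis-specified ES
assert at_truth < off_v and at_truth < off_e
```

This is only a consistency check of the loss function, not a re-implementation of any estimator from the paper.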
Proof of Lemma 1. Consider a mean-value expansion of $\lambda(\hat\theta)$ around $\theta_0$:
$$\lambda(\hat\theta) = \lambda(\theta_0) + \Lambda(\theta^*)(\hat\theta - \theta_0) = \Lambda(\theta^*)(\hat\theta - \theta_0),$$
where $\theta^*$ lies between $\hat\theta$ and $\theta_0$, noting that $\lambda(\theta_0) = 0$ and using the definition of $\Lambda(\theta)$ given in the statement of the lemma. Proving the claim involves two results: (I) $\Lambda(\theta^*) = \Lambda(\theta_0) + o_p(1)$; and (II) $\sqrt T\,\lambda(\hat\theta) = -T^{-1/2}\sum_{t=1}^T g_t(\theta_0) + o_p(1)$. Part (I) is easy to verify: since $v_t(\theta)$ and $e_t(\theta)$ are twice continuously differentiable and $e_t(\theta_0) < 0$, $\lambda(\theta)$ is continuous in $\theta$ and $\Lambda(\theta)$ is non-singular in a neighborhood of $\theta_0$. Then, by the continuous mapping theorem, $\theta^* \overset{p}{\to} \theta_0$ implies $\Lambda(\theta^*) \overset{p}{\to} \Lambda(\theta_0)$. Establishing (II) builds on Theorem 3 of Huber (1967) and Lemma A.1 of Weiss (1991), which extends Huber's conclusion to the case of non-iid dependent random variables. We verify the conditions of Weiss's Lemma A.1. Since the other conditions are easily checked, we only need to show that $T^{-1/2}\sum_{t=1}^T g_t(\hat\theta) = o_p(1)$, which we show in Lemma 2, and that his assumptions N3 and N4 hold, which we show in Lemmas 3-6.

Lemma 2. Under Assumptions 1-2, $T^{-1/2}\sum_{t=1}^T g_t(\hat\theta) = o_p(1)$.

Proof of Lemma 2. Let $\{e_j\}_{j=1}^p$ be the standard basis of $\mathbb R^p$ and define
$$L_T^j(a) = \frac1T\sum_{t=1}^T L_{FZ0}\big(Y_t;\,v_t(\hat\theta + ae_j),\,e_t(\hat\theta + ae_j);\,\alpha\big),$$
where $a$ is a scalar. Let $G_T^j(a)$ (a scalar) be the right partial derivative of $L_T^j(a)$, that is,
$$G_T^j(a) = \frac1T\sum_{t=1}^T\bigg[\frac{\nabla_j v_t(\hat\theta + ae_j)}{e_t(\hat\theta + ae_j)}\Big(1 - \frac{\mathbb 1\{Y_t\le v_t(\hat\theta + ae_j)\}}{\alpha}\Big) + \frac{\nabla_j e_t(\hat\theta + ae_j)}{e_t(\hat\theta + ae_j)^2}\Big(\frac{\mathbb 1\{Y_t\le v_t(\hat\theta + ae_j)\}}{\alpha}\big(v_t(\hat\theta + ae_j) - Y_t\big) - v_t(\hat\theta + ae_j) + e_t(\hat\theta + ae_j)\Big)\bigg].$$
$G_T^j(0) = \lim_{a\to 0^+}G_T^j(a)$ is the right partial derivative of $L_T(\theta)$ at $\hat\theta$ in the direction $j$, while $\lim_{a\to 0^+}G_T^j(-a)$ is the left partial derivative of $L_T(\theta)$ at $\hat\theta$ in the direction $j$. Because $L_T(\theta)$ achieves its minimum at $\hat\theta$, and its left and right partial derivatives exist, its left derivative must be non-positive and its right derivative must be non-negative. Thus $|G_T^j(0)|$ is bounded by the difference of the two one-sided derivatives, and only observations with $Y_t = v_t(\hat\theta)$, where the indicator jumps, contribute to this difference:
$$|G_T^j(0)| \le \frac1T\sum_{t=1}^T\bigg(\Big|\frac{\nabla_j v_t(\hat\theta)}{\alpha\,e_t(\hat\theta)}\Big|\,\mathbb 1\{Y_t = v_t(\hat\theta)\} + \Big|\frac{\nabla_j e_t(\hat\theta)}{\alpha\,e_t(\hat\theta)^2}\Big|\,\mathbb 1\{Y_t = v_t(\hat\theta)\}\big(v_t(\hat\theta) - Y_t\big)\bigg).$$
The second term vanishes as $\mathbb 1\{Y_t = v_t(\hat\theta)\}(v_t(\hat\theta) - Y_t)$ is always zero. By Assumption (C), for all $t$, $|\nabla_j v_t(\hat\theta)| \le \|\nabla v_t(\hat\theta)\| \le V_1(\mathcal F_{t-1})$ and $1/|e_t(\hat\theta)| \le H$, thus:
$$\sqrt T\,|G_T^j(0)| \le \frac H\alpha\Big[T^{-1/2}\max_{t\le T}V_1(\mathcal F_{t-1})\Big]\sum_{t=1}^T\mathbb 1\{Y_t = v_t(\hat\theta)\},$$
where $H$ is finite by Assumption (C). Next note that for all $\varepsilon > 0$,
$$\Pr\Big[T^{-1/2}\max_{t\le T}V_1(\mathcal F_{t-1}) > \varepsilon\Big] \le \sum_{t=1}^T\Pr\big[V_1(\mathcal F_{t-1}) > \varepsilon T^{1/2}\big] \le \sum_{t=1}^T\frac{E[V_1(\mathcal F_{t-1})^3]}{\varepsilon^3 T^{3/2}} \to 0,$$
with the latter inequality following from Markov's inequality. Since $E[V_1(\mathcal F_{t-1})^3]$ is finite by Assumption (D), we then have $T^{-1/2}\max_{t\le T}V_1(\mathcal F_{t-1}) = o_p(1)$. Finally, by Assumption (G) we have $\sum_{t=1}^T\mathbb 1\{Y_t = v_t(\hat\theta)\} = O_{a.s.}(1)$. We therefore have $\sqrt T\,G_T^j(0) \overset{p}{\to} 0$. Since this holds for every $j$, we have $T^{-1/2}\sum_{t=1}^T g_t(\hat\theta) = o_p(1)$.

The following three lemmas show that each of the three parts of Assumption N3 of Weiss (1991) holds. In the proofs below we make repeated use of mean-value expansions, and we use $\theta^*$ to denote a point on the line connecting $\hat\theta$ and $\theta_0$, and $\theta^{**}$ to denote a point on the line connecting $\theta^*$ and $\theta_0$. The particular point on the line can vary from expansion to expansion.

Lemma 3. Under Assumptions 1-2, Assumption N3(i) of Weiss (1991) holds:
$$\|\lambda_T(\theta)\| \ge a\,\|\theta - \theta_0\|, \quad\text{for } \|\theta - \theta_0\| \le d_0,$$
for $T$ sufficiently large, where $\lambda_T(\theta) \equiv T^{-1}\sum_{t=1}^T E[g_t(\theta)]$ and where $a$ and $d_0$ are strictly positive numbers.
Proof of Lemma 3. A mean-value expansion yields
$$\lambda_T(\theta) = \lambda_T(\theta_0) + \Lambda_T(\theta^*)(\theta - \theta_0) = \Lambda_T(\theta^*)(\theta - \theta_0),$$
since $\lambda_T(\theta_0) = 0$, where $\Lambda_T(\theta) \equiv \partial\lambda_T(\theta)/\partial\theta'$. Using the facts that
$$\frac{\partial}{\partial\theta}E\big[\mathbb 1\{Y_t \le v_t(\theta)\}\mid\mathcal F_{t-1}\big] = f_t(v_t(\theta))\nabla v_t(\theta) \quad\text{and}\quad \frac{\partial}{\partial\theta}\int_{-\infty}^{v_t(\theta)} y\,f_t(y)\,dy = v_t(\theta)f_t(v_t(\theta))\nabla v_t(\theta),$$
we can write
$$\Lambda(\theta) = E\bigg[\frac{f_t(v_t(\theta))}{-\alpha\,e_t(\theta)}\nabla'v_t(\theta)\nabla v_t(\theta) + \frac{\nabla'e_t(\theta)\nabla e_t(\theta)}{e_t(\theta)^2}\bigg] + E\bigg[\frac{\alpha - F_t(v_t(\theta))}{\alpha}\Big\{\frac{\nabla^2 v_t(\theta)}{e_t(\theta)} - \frac{\nabla'v_t(\theta)\nabla e_t(\theta) + \nabla'e_t(\theta)\nabla v_t(\theta)}{e_t(\theta)^2}\Big\}\bigg] + E\bigg[B_t(\theta)\Big\{\nabla^2 e_t(\theta) - \frac{2\nabla'e_t(\theta)\nabla e_t(\theta)}{e_t(\theta)}\Big\}\bigg],$$
where $B_t(\theta) \equiv e_t(\theta)^{-2}\big[\alpha^{-1}F_t(v_t(\theta))\big(v_t(\theta) - E_t[Y_t\mid Y_t\le v_t(\theta)]\big) - v_t(\theta) + e_t(\theta)\big]$ is the coefficient on $\nabla'e_t(\theta)$ in the expected score. Evaluated at $\theta_0$, the last two terms drop out because $F_t(v_t(\theta_0)) = \alpha$ and $E_t[Y_t\mid Y_t\le v_t(\theta_0)] = e_t(\theta_0)$, so that $B_t(\theta_0) = 0$. Define
$$D_T \equiv \frac1T\sum_{t=1}^T E\bigg[\frac{f_t(v_t(\theta_0))}{-\alpha\,e_t(\theta_0)}\nabla'v_t(\theta_0)\nabla v_t(\theta_0) + \frac{\nabla'e_t(\theta_0)\nabla e_t(\theta_0)}{e_t(\theta_0)^2}\bigg].$$
Below we show that $\Lambda_T(\theta^*) = D_T + O(\|\theta - \theta_0\|)$ by decomposing $\Lambda_T(\theta^*) - D_T$ into four terms and showing that each is bounded by an $O(\|\theta - \theta_0\|)$ term.

First term: the term multiplying $(\alpha - F_t(v_t(\theta^*)))/\alpha$. A mean-value expansion gives $F_t(v_t(\theta^*)) - \alpha = f_t(v_t(\theta^{**}))\nabla v_t(\theta^{**})(\theta^* - \theta_0)$, so by Assumptions (C)-(D) and Hölder's inequality,
$$\bigg\|E\bigg[\frac{\alpha - F_t(v_t(\theta^*))}{\alpha}\Big\{\frac{\nabla^2 v_t(\theta^*)}{e_t(\theta^*)} - \frac{\nabla'v_t(\theta^*)\nabla e_t(\theta^*) + \nabla'e_t(\theta^*)\nabla v_t(\theta^*)}{e_t(\theta^*)^2}\Big\}\bigg]\bigg\| \le \frac K\alpha\,E\Big[V_1(\mathcal F_{t-1})\big(HV_2(\mathcal F_{t-1}) + 2H^2V_1(\mathcal F_{t-1})H_1(\mathcal F_{t-1})\big)\Big]\,\|\theta - \theta_0\|,$$
and the expectation is finite by the moment conditions in Assumption (D).

Second term: the term multiplying $B_t(\theta^*)$. Since $B_t(\theta_0) = 0$, a mean-value expansion of $B_t(\cdot)$ around $\theta_0$, combined with the envelope and moment bounds in Assumptions (C)-(D), shows that this term is likewise bounded by a constant times $\|\theta - \theta_0\|$.

Third term:
$$\bigg\|\frac1T\sum_{t=1}^T E\bigg[\frac{f_t(v_t(\theta^*))}{-\alpha\,e_t(\theta^*)}\nabla'v_t(\theta^*)\nabla v_t(\theta^*) - \frac{f_t(v_t(\theta_0))}{-\alpha\,e_t(\theta_0)}\nabla'v_t(\theta_0)\nabla v_t(\theta_0)\bigg]\bigg\| \le K\|\theta - \theta_0\|.$$
This follows by adding and subtracting terms in which one argument at a time is moved from $\theta^*$ to $\theta_0$, bounding each resulting difference by a mean-value expansion, and using the boundedness and Lipschitz continuity of $f_t$ (Assumption (B)) together with the bounds in Assumptions (C)-(D).

Fourth term: the bound on the analogous difference for the $e_t^{-2}\nabla'e_t\nabla e_t$ component follows the same steps as for the third term.

Therefore $\Lambda_T(\theta^*) = D_T + O(\|\theta - \theta_0\|)$; that is, $\|\Lambda_T(\theta^*) - D_T\| \le K\|\theta - \theta_0\|$, where $K$ is some constant $< \infty$, for $T$ sufficiently large. By Assumption (E), $D_T$ has eigenvalues bounded below by a strictly positive constant, denoted $a$, for $T$ sufficiently large. Thus
$$\|\lambda_T(\theta)\| = \|\Lambda_T(\theta^*)(\theta - \theta_0)\| \ge \|D_T(\theta - \theta_0)\| - \|(D_T - \Lambda_T(\theta^*))(\theta - \theta_0)\| \ge \big(a - K\|\theta - \theta_0\|\big)\,\|\theta - \theta_0\|.$$
The first inequality holds by the triangle inequality, and the second follows from Assumption (E) on the minimum eigenvalue of $D_T$. Thus, choosing $d_0$ small enough that $a - Kd_0 > 0$, the result follows for $T$ sufficiently large.

Lemma 4. Define
$$u_t(\theta, d) \equiv \sup_{\tau:\,\|\tau - \theta\|\le d}\|g_t(\tau) - g_t(\theta)\|.$$
Then under Assumptions 1-2, Assumption N3(ii) of Weiss (1991) holds:
$$E[u_t(\theta, d)] \le b\,d, \quad\text{for } \|\theta - \theta_0\| + d \le d_0,\ d \ge 0,$$
for $T$ sufficiently large, where $b$ and $d_0$ are strictly positive numbers.

Proof of Lemma 4. In this proof, the strictly positive constant $c$ and the mean-value expansion point $\theta^*$ can change from line to line. Pick $d_0$ such that for any $\theta$ that satisfies $\|\theta - \theta_0\| \le d_0$, all the conditions in Assumptions (C) and (D) hold, as well as $e_t(\theta) \le v_t(\theta) \le 0$. Let us expand $g_t(\theta)$ into six terms:
$$g_t(\theta) = -\frac{\nabla'v_t(\theta)}{\alpha\,e_t(\theta)}\mathbb 1\{Y_t\le v_t(\theta)\} + \frac{\nabla'v_t(\theta)}{e_t(\theta)} + \frac{v_t(\theta)\nabla'e_t(\theta)}{\alpha\,e_t(\theta)^2}\mathbb 1\{Y_t\le v_t(\theta)\} - \frac{v_t(\theta)\nabla'e_t(\theta)}{e_t(\theta)^2} - \frac{\nabla'e_t(\theta)}{\alpha\,e_t(\theta)^2}\mathbb 1\{Y_t\le v_t(\theta)\}\,Y_t + \frac{\nabla'e_t(\theta)}{e_t(\theta)}.$$
We bound $u_t(\theta, d)$ by considering the six corresponding terms $u_t(\theta, d)^{(i)}$, $i = 1,\dots,6$; each is shown to be bounded in expectation by a constant times $d$.

First term:
$$u_t(\theta, d)^{(1)} = \sup_{\|\tau - \theta\|\le d}\bigg\|\frac{\nabla'v_t(\tau)}{\alpha\,e_t(\tau)}\mathbb 1\{Y_t\le v_t(\tau)\} - \frac{\nabla'v_t(\theta)}{\alpha\,e_t(\theta)}\mathbb 1\{Y_t\le v_t(\theta)\}\bigg\|.$$
Set $\tau_1 = \arg\min_{\|\tau - \theta\|\le d}v_t(\tau)$ and $\tau_2 = \arg\max_{\|\tau - \theta\|\le d}v_t(\tau)$. Since $v_t(\cdot)$ and $e_t(\cdot)$ are assumed to be twice continuously differentiable, $\tau_1$ and $\tau_2$ exist. We want to take the indicator function out of the supremum: if $Y_t \le v_t(\tau_1)$ or $Y_t > v_t(\tau_2)$, the indicator is constant over the neighborhood and only the smooth part varies; if $v_t(\tau_1) < Y_t \le v_t(\tau_2)$, the indicator may switch. In all cases,
$$u_t(\theta, d)^{(1)} \le \frac1\alpha\,\mathbb 1\{v_t(\tau_1) < Y_t \le v_t(\tau_2)\}\bigg(\Big\|\frac{\nabla'v_t(\theta)}{e_t(\theta)}\Big\| + \sup_{\|\tau - \theta\|\le d}\Big\|\frac{\nabla'v_t(\tau)}{e_t(\tau)} - \frac{\nabla'v_t(\theta)}{e_t(\theta)}\Big\|\bigg) + \frac1\alpha\sup_{\|\tau - \theta\|\le d}\Big\|\frac{\nabla'v_t(\tau)}{e_t(\tau)} - \frac{\nabla'v_t(\theta)}{e_t(\theta)}\Big\|,$$
where, using $f_t \le K$ and a mean-value expansion,
$$E\big[\mathbb 1\{v_t(\tau_1) < Y_t \le v_t(\tau_2)\}\mid\mathcal F_{t-1}\big] = \int_{v_t(\tau_1)}^{v_t(\tau_2)} f_t(y)\,dy \le K\,|v_t(\tau_2) - v_t(\tau_1)| \le 2K\,V_1(\mathcal F_{t-1})\,d.$$
Further, $\|\nabla'v_t(\theta)/e_t(\theta)\| \le HV_1(\mathcal F_{t-1})$, and by the mean-value theorem,
$$\sup_{\|\tau - \theta\|\le d}\Big\|\frac{\nabla'v_t(\tau)}{e_t(\tau)} - \frac{\nabla'v_t(\theta)}{e_t(\theta)}\Big\| \le \big(HV_2(\mathcal F_{t-1}) + H^2V_1(\mathcal F_{t-1})H_1(\mathcal F_{t-1})\big)\,d.$$
By Assumption (D), $E[V_1(\mathcal F_{t-1})^2]$, $E[V_2(\mathcal F_{t-1})]$ and $E[V_1(\mathcal F_{t-1})H_1(\mathcal F_{t-1})]$ are finite, so $E[u_t(\theta, d)^{(1)}] \le cd$, where $c$ is a strictly positive constant.

Second term: $u_t(\theta, d)^{(2)} = \sup_{\|\tau - \theta\|\le d}\|\nabla'v_t(\tau)/e_t(\tau) - \nabla'v_t(\theta)/e_t(\theta)\|$. It was shown in the derivations for the first term that $E[u_t(\theta, d)^{(2)}] \le cd$.

Third term: $u_t(\theta, d)^{(3)}$ is the analogue of $u_t(\theta, d)^{(1)}$ with $v_t(\cdot)\nabla'e_t(\cdot)/e_t(\cdot)^2$ in place of $\nabla'v_t(\cdot)/e_t(\cdot)$. Using $e_t(\theta) \le v_t(\theta) \le 0$, so that $|v_t(\theta)/e_t(\theta)| \le 1$, we have $\|v_t(\theta)\nabla'e_t(\theta)/e_t(\theta)^2\| \le HH_1(\mathcal F_{t-1})$, and by the mean-value theorem,
$$\sup_{\|\tau - \theta\|\le d}\Big\|\frac{v_t(\tau)\nabla'e_t(\tau)}{e_t(\tau)^2} - \frac{v_t(\theta)\nabla'e_t(\theta)}{e_t(\theta)^2}\Big\| \le \big(H^2V_1(\mathcal F_{t-1})H_1(\mathcal F_{t-1}) + HH_2(\mathcal F_{t-1}) + 2H^2H_1(\mathcal F_{t-1})^2\big)\,d.$$
By Assumption (D), $E[V_1H_1]$, $E[H_2]$ and $E[H_1^2]$ are finite. Therefore $E[u_t(\theta, d)^{(3)}] \le cd$.

Fourth term: $u_t(\theta, d)^{(4)} = \sup_{\|\tau - \theta\|\le d}\|v_t(\tau)\nabla'e_t(\tau)/e_t(\tau)^2 - v_t(\theta)\nabla'e_t(\theta)/e_t(\theta)^2\|$, which was bounded in the derivations for the third term.

Fifth term: here the indicator-switching events carry the factor $|Y_t|$:
$$E\big[\mathbb 1\{v_t(\tau_1) < Y_t \le v_t(\tau_2)\}\,|Y_t|\mid\mathcal F_{t-1}\big] = \int_{v_t(\tau_1)}^{v_t(\tau_2)}|y|\,f_t(y)\,dy \le K\,|v_t(\tau_1)|\,|v_t(\tau_2) - v_t(\tau_1)| \le c\,V(\mathcal F_{t-1})V_1(\mathcal F_{t-1})\,d,$$
while $\|\nabla'e_t(\theta)/e_t(\theta)^2\| \le H^2H_1(\mathcal F_{t-1})$ and, by the mean-value theorem,
$$\sup_{\|\tau - \theta\|\le d}\Big\|\frac{\nabla'e_t(\tau)}{e_t(\tau)^2} - \frac{\nabla'e_t(\theta)}{e_t(\theta)^2}\Big\| \le \big(H^2H_2(\mathcal F_{t-1}) + 2H^3H_1(\mathcal F_{t-1})^2\big)\,d,$$
with the latter difference multiplied by $\mathbb 1\{\cdot\}\,|Y_t| \le |Y_t|$. By Assumption (D), $E[VV_1]$, $E[H_2|Y_t|]$ and $E[H_1^2|Y_t|]$ are finite. Therefore $E[u_t(\theta, d)^{(5)}] \le cd$.

Sixth term: by the mean-value theorem,
$$u_t(\theta, d)^{(6)} = \sup_{\|\tau - \theta\|\le d}\Big\|\frac{\nabla'e_t(\tau)}{e_t(\tau)} - \frac{\nabla'e_t(\theta)}{e_t(\theta)}\Big\| \le \big(HH_2(\mathcal F_{t-1}) + H^2H_1(\mathcal F_{t-1})^2\big)\,d.$$
By Assumption (D), $E[H_2]$ and $E[H_1^2]$ are finite. Therefore $E[u_t(\theta, d)^{(6)}] \le cd$.

Thus we have shown that $u_t(\theta, d) \le \sum_{i=1}^6 u_t(\theta, d)^{(i)}$ with $E[u_t(\theta, d)^{(i)}] \le cd$ for all $i = 1,\dots,6$, where $c$ is a strictly positive constant, proving the lemma.
Lemma 5. Under Assumptions 1-2, Assumption N3(iii) of Weiss (1991) holds:
$$E[u_t(\theta, d)^q] \le c\,d, \quad\text{for } \|\theta - \theta_0\| + d \le d_0,\ \text{and some } q > 2,$$
for $T$ sufficiently large, and where $c > 0$, $d \ge 0$ and $d_0 > 0$.

Proof of Lemma 5. In this proof, the strictly positive constant $c$ and the mean-value expansion point can change from line to line. Pick $d_0$ such that for any $\theta$ that satisfies $\|\theta - \theta_0\| \le d_0$, all the conditions in Assumptions (C) and (D) hold, as well as $e_t(\theta) \le v_t(\theta) \le 0$. Similar to Lemma 4, we decompose $u_t(\theta, d)$ into six terms $u_t(\theta, d)^{(i)}$, $i = 1,\dots,6$. By Jensen's inequality, $E[u_t(\theta, d)^q] \le 6^{q-1}\sum_{i=1}^6 E[(u_t(\theta, d)^{(i)})^q]$ for $q > 1$. We will show that for some $0 < \delta < 1$, $E[(u_t(\theta, d)^{(i)})^{2+\delta}] \le cd$ for all $i = 1,\dots,6$, where $c$ is a strictly positive constant.

First term: with $\tau_1 = \arg\min_{\|\tau - \theta\|\le d}v_t(\tau)$ and $\tau_2 = \arg\max_{\|\tau - \theta\|\le d}v_t(\tau)$, and following the same argument as in the proof of Lemma 4, we obtain
$$\big(u_t(\theta, d)^{(1)}\big)^{2+\delta} \le \mathbb 1\{v_t(\tau_1) < Y_t \le v_t(\tau_2)\}\,\frac{2^{1+\delta}}{\alpha^{2+\delta}}\Big(\Big\|\frac{\nabla'v_t(\theta)}{e_t(\theta)}\Big\|^{2+\delta} + S_v^{2+\delta}\Big) + \frac{1}{\alpha^{2+\delta}}\,S_v^{2+\delta},$$
where $S_v \equiv \sup_{\|\tau - \theta\|\le d}\|\nabla'v_t(\tau)/e_t(\tau) - \nabla'v_t(\theta)/e_t(\theta)\|$. As before, $E_t[\mathbb 1\{v_t(\tau_1) < Y_t \le v_t(\tau_2)\}] \le 2KV_1(\mathcal F_{t-1})\,d$ and $\|\nabla'v_t(\theta)/e_t(\theta)\| \le HV_1(\mathcal F_{t-1})$. For $S_v^{2+\delta}$, we need to combine the following two results:
$$S_v \le \big(HV_2(\mathcal F_{t-1}) + H^2V_1(\mathcal F_{t-1})H_1(\mathcal F_{t-1})\big)\,d \quad\text{and}\quad S_v \le 2HV_1(\mathcal F_{t-1}),$$
giving $S_v^{2+\delta} \le \big(HV_2 + H^2V_1H_1\big)\big(2HV_1\big)^{1+\delta}\,d$. Combining with Assumption (D), we thus have $E[(u_t(\theta, d)^{(1)})^{2+\delta}] \le cd$.

Second term: it was shown in the derivations for the first term that $E[(u_t(\theta, d)^{(2)})^{2+\delta}] \le cd$.

Third term: replacing $\nabla'v_t/e_t$ by $v_t\nabla'e_t/e_t^2$ and using $|v_t(\theta)/e_t(\theta)| \le 1$, the same argument applies with level bound $HH_1(\mathcal F_{t-1})$ and difference bound $(H^2V_1H_1 + HH_2 + 2H^2H_1^2)\,d$; combining with Assumption (D), $E[(u_t(\theta, d)^{(3)})^{2+\delta}] \le cd$.

Fourth term: it was shown in the derivations for the third term that $E[(u_t(\theta, d)^{(4)})^{2+\delta}] \le cd$.

Fifth term: the indicator-switching events now carry $|Y_t|^{2+\delta}$, and
$$E_t\big[\mathbb 1\{v_t(\tau_1) < Y_t \le v_t(\tau_2)\}\,|Y_t|^{2+\delta}\big] = \int_{v_t(\tau_1)}^{v_t(\tau_2)}|y|^{2+\delta}f_t(y)\,dy \le K\,|v_t(\tau_1)|^{2+\delta}\,|v_t(\tau_2) - v_t(\tau_1)| \le c\,V(\mathcal F_{t-1})^{2+\delta}V_1(\mathcal F_{t-1})\,d,$$
while for the smooth part we combine $\|\nabla'e_t(\tau)/e_t(\tau)^2 - \nabla'e_t(\theta)/e_t(\theta)^2\| \le (H^2H_2 + 2H^3H_1^2)\,d$ with the level bound $2H^2H_1(\mathcal F_{t-1})$. Combining with Assumption (D), $E[(u_t(\theta, d)^{(5)})^{2+\delta}] \le cd$.

Sixth term: combining $\|\nabla'e_t(\tau)/e_t(\tau) - \nabla'e_t(\theta)/e_t(\theta)\| \le (HH_2 + H^2H_1^2)\,d$ with the level bound $2HH_1(\mathcal F_{t-1})$, and using Assumption (D), $E[(u_t(\theta, d)^{(6)})^{2+\delta}] \le cd$.

Thus $E[(u_t(\theta, d)^{(i)})^{2+\delta}] \le cd$ for all $i = 1,\dots,6$, proving the lemma.

Lemma 6. Under Assumptions 1-2, $E\|g_t(\theta_0)\|^{2+\delta} \le M$, for all $t$ and some $M > 0$.

Proof of Lemma 6. Grouping the six terms of $g_t(\theta_0)$ into four and applying Jensen's inequality,
$$E\|g_t(\theta_0)\|^{2+\delta} \le 4^{1+\delta}\bigg\{E\bigg[\Big\|\frac{\nabla'v_t(\theta_0)}{e_t(\theta_0)}\Big\|^{2+\delta}\Big|1 - \frac{\mathbb 1\{Y_t\le v_t(\theta_0)\}}{\alpha}\Big|^{2+\delta}\bigg] + E\bigg[\Big\|\frac{v_t(\theta_0)\nabla'e_t(\theta_0)}{e_t(\theta_0)^2}\Big\|^{2+\delta}\Big|1 - \frac{\mathbb 1\{Y_t\le v_t(\theta_0)\}}{\alpha}\Big|^{2+\delta}\bigg] + E\bigg[\Big\|\frac{\nabla'e_t(\theta_0)}{\alpha\,e_t(\theta_0)^2}\Big\|^{2+\delta}|Y_t|^{2+\delta}\bigg] + E\bigg[\Big\|\frac{\nabla'e_t(\theta_0)}{e_t(\theta_0)}\Big\|^{2+\delta}\bigg]\bigg\}$$
$$\le 4^{1+\delta}\Big\{(1 + \alpha^{-1})^{2+\delta}H^{2+\delta}E\big[V_1(\mathcal F_{t-1})^{2+\delta}\big] + (1 + \alpha^{-1})^{2+\delta}H^{2+\delta}E\big[H_1(\mathcal F_{t-1})^{2+\delta}\big] + \alpha^{-(2+\delta)}H^{4+2\delta}E\big[H_1(\mathcal F_{t-1})^{2+\delta}|Y_t|^{2+\delta}\big] + H^{2+\delta}E\big[H_1(\mathcal F_{t-1})^{2+\delta}\big]\Big\} \le M,$$
since all four expectations in the penultimate inequality are finite by Assumption (D). Assumption N4 of Weiss (1991) only requires $E\|g_t(\theta_0)\|^2 \le M$, which is implied by the above.

Lemma 7. Under Assumptions 1-2, we have $T^{-1/2}\sum_{t=1}^T g_t(\theta_0) \overset{d}{\to} N(0, A_0)$ as $T \to \infty$, where $A_0 \equiv E\big[g_t(\theta_0)g_t(\theta_0)'\big]$.

Proof of Lemma 7. First note that the sequence $g_t(\theta_0)$ is stationary by Assumption (B)(ii), and has zero mean. Under Assumption (F) and Lemma 6, we can use Corollary 5.1 of Hall and Heyde (1980) and the Cramer-Wold device to obtain the result.
Appendix SA.2: Estimating a GARCH(1,1) model by FZ loss minimization

In this appendix we show that we can estimate the popular GARCH(1,1) model via FZ loss minimization. We then verify that the assumptions required to show this are implied by Assumptions 1-2 in the main paper. Throughout, $\|x\|$ refers to the Euclidean norm if $x$ is a vector and to the Frobenius norm if $x$ is a matrix.

Appendix SA.2.1: Model specification

Assume that the data generating process for $Y_t$ is:
$$Y_t = \sigma_t\,\varepsilon_t, \qquad \varepsilon_t \perp \mathcal F_{t-1}, \qquad \varepsilon_t \sim iid\ F(0,1)$$
$$\sigma^2_t = \omega_0 + \beta_0\,\sigma^2_{t-1} + \gamma_0\,Y^2_{t-1}.$$
Under this model, the conditional VaR and ES of $Y_t$ at a probability level $\alpha \in (0,1)$, that is, $VaR_\alpha(Y_t\mid\mathcal F_{t-1})$ and $ES_\alpha(Y_t\mid\mathcal F_{t-1})$, follow the dynamics:
$$VaR_\alpha(Y_t\mid\mathcal F_{t-1}) = c_0\,ES_\alpha(Y_t\mid\mathcal F_{t-1}), \qquad ES_\alpha(Y_t\mid\mathcal F_{t-1}) = b_0\,\sigma_t,$$
where
$$c_0 \equiv \frac{F^{-1}(\alpha)}{E[\varepsilon_t\mid\varepsilon_t\le F^{-1}(\alpha)]}, \qquad b_0 \equiv E[\varepsilon_t\mid\varepsilon_t\le F^{-1}(\alpha)].$$
We fix the level $\alpha \in (0,1)$ throughout this appendix. Our goal is to estimate the parameter vector $\theta_0 = [\beta_0, \gamma_0, b_0, c_0]$ by minimizing the FZ loss function. Note that the parameters do not include $\omega_0$ because only two of the three parameters $\omega_0, b_0, \gamma_0$ are identifiable under this model. A detailed discussion of the identification of the GARCH model via FZ loss minimization is provided in Section SA.2.3 of this appendix. In the simulation study (Section 4 of the main paper), for estimating the GARCH model via FZ loss minimization, we fix $\omega$ at its true value $\omega_0$. Put $\theta = [\beta, \gamma, b, c]$, with true value $\theta_0 = [\beta_0, \gamma_0, b_0, c_0]$. We will estimate $\theta_0$ by
$$\hat\theta = \arg\min_{\theta\in\Theta} L_T(\theta), \quad\text{where } L_T(\theta) = \frac1T\sum_{t=1}^T L_{FZ0}\big(Y_t;\,v_t(\theta),\,e_t(\theta);\,\alpha\big),$$
$$\sigma^2_t(\theta) = \omega_0 + \beta\,\sigma^2_{t-1}(\theta) + \gamma\,Y^2_{t-1}, \qquad v_t(\theta) = c\,e_t(\theta), \qquad e_t(\theta) = b\,\sigma_t(\theta),$$
and the FZ loss function $L_{FZ0}$ is defined in equation (6) of the main paper.

Appendix SA.2.2: Assumptions to estimate GARCH by FZ minimization

GARCH Assumption 1: $F$ has zero mean, unit variance, finite fourth moment, and a unique $\alpha$-quantile, which is non-positive. It has a density $f(\cdot)$ that satisfies $f(x) \le K$ and $|f(x_1) - f(x_2)| \le K|x_1 - x_2|$.

The distributions we often assume for the innovations of GARCH models, like the normal distribution or t-distributions with degrees of freedom greater than four, all satisfy this assumption.

GARCH Assumption 2: $0 < \omega_0 < \infty$. The true parameter vector $\theta_0 = [\beta_0, \gamma_0, b_0, c_0] \in \Theta \subset \mathbb R^4$ is in the interior of $\Theta$, a compact and convex parameter space. Specifically, for any vector $[\beta, \gamma, b, c] \in \Theta$, assume that $\beta, \gamma \in [\delta_1, 1-\delta_1]$ for some constant $\delta_1 > 0$, $c \in [\delta_2, 1-\delta_2]$ and $-B_1 \le b \le -B_2$, for some constants $\delta_2, B_1, B_2 > 0$, and $(\beta + \gamma)^2 + \gamma^2\big(E[\varepsilon^4_t] - 1\big) \le 1 - \delta_3$ for some constant $\delta_3 > 0$.

This assumption is similar to the parameter restrictions of Lumsdaine (1996), with the exception of the last condition on the parameter vector, which is used to validate the mixing condition in Assumption (F), which we now discuss. It is not hard to show that $\big(Y_t, v_t(\theta), e_t(\theta), \nabla'v_t(\theta), \nabla'e_t(\theta)\big)$ is a measurable function of $\big(Y_t, \sigma^2_t(\theta), \partial\sigma^2_t(\theta)/\partial\theta'\big)$, and thus we need to consider the mixing properties of $\big(Y_t, \sigma^2_t(\theta), \partial\sigma^2_t(\theta)/\partial\theta'\big)$. Using Definition 3 of Carrasco and Chen (2002), $\big(Y_t, \sigma^2_t(\theta), \partial\sigma^2_t(\theta)/\partial\theta';\ t \ge 0\big)$ is a generalized hidden Markov model with hidden chain $\big(\sigma^2_t(\theta), \partial\sigma^2_t(\theta)/\partial\theta';\ t \ge 0\big)$. By their Proposition 4, if $\big(\sigma^2_t(\theta), \partial\sigma^2_t(\theta)/\partial\theta'\big)$ is stationary and $\beta$-mixing, then $\big(Y_t, \sigma^2_t(\theta), \partial\sigma^2_t(\theta)/\partial\theta'\big)$ is stationary and $\beta$-mixing with a decay rate at least as fast as that of $\big(\sigma^2_t(\theta), \partial\sigma^2_t(\theta)/\partial\theta';\ t \ge 0\big)$.
We use Proposition 3 of Carrasco and Chen (2002). First, we express the hidden chain in polynomial random-coefficient form: substituting $Y_t = \sigma_t(\theta_0)\varepsilon_t$, each component of the hidden chain obeys a linear recursion whose random coefficients are polynomials in $\varepsilon^2_{t-1}$, so that the stacked chain satisfies
$$X_t = A(\varepsilon_{t-1})\,X_{t-1} + B,$$
where $A(\varepsilon_t)$ is a triangular random coefficient matrix whose diagonal entries are $\beta + \gamma\varepsilon^2_t$ and $\beta$. First, note that by GARCH Assumption 1, $\varepsilon_t$ satisfies their condition (e), and by GARCH Assumption 2 their drift condition is obviously satisfied: the spectral radius of $E[A(\varepsilon_t)]$ equals $\beta + \gamma < 1$, and
$$E\big[(\beta + \gamma\varepsilon^2_t)^2\big] = (\beta + \gamma)^2 + \gamma^2\big(E[\varepsilon^4_t] - 1\big) < 1$$
by the last condition in GARCH Assumption 2. Then, if $\big(\sigma^2_t(\theta), \partial\sigma^2_t(\theta)/\partial\theta'\big)$ is initialized from the invariant distribution (which we did in our simulations), then $\big(\sigma^2_t(\theta), \partial\sigma^2_t(\theta)/\partial\theta';\ t \ge 0\big)$ is strictly stationary and $\beta$-mixing with exponential decay. It is well known that $\beta$-mixing implies $\alpha$-mixing, and so Assumption (F) of the paper is satisfied.

GARCH Assumptions 1-2 imply that the distribution and density of $Y_t$ conditional on $\mathcal F_{t-1}$ satisfy Assumption (B)(i). Since $Y_t = \sigma_t\varepsilon_t$ and $\sigma_t \in \mathcal F_{t-1}$,
$$F_t(x\mid\mathcal F_{t-1}) = F(x/\sigma_t), \qquad f_t(x\mid\mathcal F_{t-1}) = \frac{1}{\sigma_t}f(x/\sigma_t).$$
Thus $|f_t(x\mid\mathcal F_{t-1})| \le K/\sqrt{\omega_0}$, since $\sigma^2_t = \omega_0 + \beta_0\sigma^2_{t-1} + \gamma_0 Y^2_{t-1} \ge \omega_0 > 0$, and
$$|f_t(x_1\mid\mathcal F_{t-1}) - f_t(x_2\mid\mathcal F_{t-1})| = \frac{1}{\sigma_t}\Big|f\Big(\frac{x_1}{\sigma_t}\Big) - f\Big(\frac{x_2}{\sigma_t}\Big)\Big| \le \frac{K}{\sigma^2_t}|x_1 - x_2| \le \frac{K}{\omega_0}|x_1 - x_2|.$$

GARCH Assumption 3: $E|Y_t|^{5+\delta} < \infty$, for some $\delta > 0$.

GARCH Assumption 3 is needed to show the uniform LLN in Assumption (A) of the paper, and also to ensure the moment conditions in Assumptions (C) and (D). For the GARCH model it is possible to obtain the results of the paper under a weaker version of Assumption (D). An inspection of the proofs shows that it is sufficient to replace Assumption (D) by the following.

Assumption (D'): For some $0 < \delta < 1$ and all $t$:
(i) $E\big[V_1(\mathcal F_{t-1})^{3+\delta}\big]$, $E\big[H_1(\mathcal F_{t-1})^{3+\delta}\big]$, $E\big[V_2(\mathcal F_{t-1})^{3/2+\delta}\big]$, $E\big[H_2(\mathcal F_{t-1})^{3/2+\delta}\big] \le K$;
(ii) $E\big[V(\mathcal F_{t-1})^{1+\delta}V_1(\mathcal F_{t-1})^{1+\delta}\big] \le K$;
(iii) $E\big[H_1(\mathcal F_{t-1})^{1+\delta}|Y_t|^{1+\delta}\big]$, $E\big[H_2(\mathcal F_{t-1})^{1+\delta}|Y_t|^{1+\delta}\big] \le K$.

Assumption (D') is in turn fulfilled if $E|Y_t|^{4+\delta} < \infty$, for some $\delta > 0$. For reasons of brevity, we omit the arguments and work instead with the stronger GARCH Assumption 3.

Appendix SA.2.3: Identification

In Theorem 1, we discussed the identification of a general dynamic model for ES and VaR by minimizing the FZ loss, with the form of the general model given by equation (4). Under correct specification of the model, that is, $\big(VaR_\alpha(Y_t\mid\mathcal F_{t-1}), ES_\alpha(Y_t\mid\mathcal F_{t-1})\big) = \big(v_t(\theta_0), e_t(\theta_0)\big)$ for all $t$ a.s., the condition required for identification is given by Assumption (B)(iv):
$$\Pr\big[\{v_t(\theta) = v_t(\theta_0)\}\cap\{e_t(\theta) = e_t(\theta_0)\}\big] = 1,\ \forall t \;\Rightarrow\; \theta = \theta_0.$$
In the case of the GARCH model we have:
$$\Pr\big[\{v_t(\theta) = v_t(\theta_0)\}\cap\{e_t(\theta) = e_t(\theta_0)\}\big] = 1,\ \forall t \;\Rightarrow\; \Pr\big[\{c\,e_t(\theta) = c_0\,e_t(\theta_0)\}\cap\{e_t(\theta) = e_t(\theta_0)\}\big] = 1,\ \forall t \;\Rightarrow\; c = c_0,\ \Pr\big[b\,\sigma_t(\theta) = b_0\,\sigma_t(\theta_0)\big] = 1,\ \forall t,$$
where the last implication holds because $e_t(\theta_0) = b_0\sigma_t(\theta_0)$ and $b_0 < 0$, thus $e_t(\theta_0) < 0$. Squaring and substituting the variance recursion,
$$\Pr\big[b^2(\omega + \beta\sigma^2_{t-1}(\theta) + \gamma Y^2_{t-1}) = b_0^2(\omega_0 + \beta_0\sigma^2_{t-1}(\theta_0) + \gamma_0 Y^2_{t-1})\big] = 1,\ \forall t,$$
and, replacing $b^2\sigma^2_{t-1}(\theta)$ by $b_0^2\sigma^2_{t-1}(\theta_0)$ (because we started with $b\,\sigma_{t-1}(\theta) = b_0\,\sigma_{t-1}(\theta_0)$ for all $t$ almost surely),
$$\Pr\big[b^2\omega + \beta\,b_0^2\sigma^2_{t-1}(\theta_0) + b^2\gamma\,Y^2_{t-1} = b_0^2\omega_0 + \beta_0\,b_0^2\sigma^2_{t-1}(\theta_0) + b_0^2\gamma_0\,Y^2_{t-1}\big] = 1,\ \forall t.$$
Since the GARCH model assumes that $Y_t\mid\sigma_t(\theta_0) \sim F(0, \sigma^2_t(\theta_0))$ with $F$ continuous, matching the coefficients on $Y^2_{t-1}$ gives $b^2\gamma = b_0^2\gamma_0$ and $\Pr\big[b^2\omega + \beta\,b_0^2\sigma^2_{t-1}(\theta_0) = b_0^2\omega_0 + \beta_0\,b_0^2\sigma^2_{t-1}(\theta_0)\big] = 1$ for all $t$. If $\beta \ne \beta_0$, the latter implies
$$\Pr\bigg[\sigma^2_t(\theta_0) = \frac{b_0^2\omega_0 - b^2\omega}{(\beta - \beta_0)\,b_0^2}\bigg] = 1,\ \forall t,$$
i.e., $\sigma^2_t(\theta_0)$ is almost surely constant. This contradicts the assumption of the GARCH model that $Y_t\mid\sigma_t(\theta_0) \sim F(0, \sigma^2_t(\theta_0))$ with $\sigma^2_t(\theta_0) = \omega_0 + \beta_0\sigma^2_{t-1}(\theta_0) + \gamma_0 Y^2_{t-1}$. Hence $\beta = \beta_0$ and $b^2\omega = b_0^2\omega_0$. In summary, we have shown that
$$\Pr\big[\{v_t(\theta) = v_t(\theta_0)\}\cap\{e_t(\theta) = e_t(\theta_0)\}\big] = 1,\ \forall t \;\Rightarrow\; c = c_0,\ \beta = \beta_0,\ b^2\gamma = b_0^2\gamma_0,\ b^2\omega = b_0^2\omega_0.$$
Therefore, Assumption (B)(iv) holds if we normalize one of the three parameters $b, \gamma, \omega$. We choose to normalize $\omega$.

Appendix SA.2.4: Uniform LLN

In this section, we show that under the GARCH assumptions made in Section SA.2.2, Assumption (A) is satisfied: $L_{FZ0}(Y_t; v_t(\theta), e_t(\theta); \alpha)$ obeys the uniform law of large numbers. Since the parameter space $\Theta$ is assumed to be compact, we can establish the uniform LLN by combining the pointwise LLN with stochastic equicontinuity.

Appendix SA.2.4.1: LLN

The LLN is based on Davidson (1994, Corollary 19.3), which we restate here as Theorem 4 for convenience.

Theorem 4 (Davidson) Suppose that $(X_t)_{t\in\mathbb N}$ satisfies $\sup_{t\in\mathbb N}E|X_t|^{1+\delta} < \infty$ for some $\delta > 0$, and $(X_t)_{t\in\mathbb N}$ is $\alpha$-mixing with $\sum_{m=1}^\infty \alpha(m)^{\delta/(1+\delta)} < \infty$. Then
$$\frac1n\sum_{t=1}^n\big(X_t - E[X_t]\big) \overset{L_1}{\longrightarrow} 0.$$
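The scale non-identifiability discussed in the identification section above (only two of $\omega, \gamma, b$ are identified, since mapping $(\omega, \gamma, b)$ to $(k\omega, k\gamma, b/\sqrt k)$ leaves the implied $(v_t, e_t)$ paths unchanged) can be checked numerically. A minimal sketch; the function and variable names are illustrative, not from the paper:

```python
import numpy as np

def var_es_paths(y, omega, beta, gamma, b, c, sigma2_0):
    """(v_t, e_t) implied by GARCH(1,1) dynamics with e_t = b*sigma_t, v_t = c*e_t."""
    v, e = np.empty_like(y), np.empty_like(y)
    s2 = sigma2_0
    for t in range(len(y)):
        e[t] = b * np.sqrt(s2)
        v[t] = c * e[t]
        s2 = omega + beta * s2 + gamma * y[t] ** 2
    return v, e

y = np.random.default_rng(1).standard_normal(1000)
omega, beta, gamma, b, c = 0.05, 0.9, 0.05, -2.0, 0.8
k = 4.0  # any k > 0 works; the initial variance must be scaled by k as well
v1, e1 = var_es_paths(y, omega, beta, gamma, b, c, sigma2_0=1.0)
v2, e2 = var_es_paths(y, k * omega, beta, k * gamma, b / np.sqrt(k), c, sigma2_0=k * 1.0)
assert np.allclose(v1, v2) and np.allclose(e1, e2)
```

The two parameterizations are observationally equivalent for (VaR, ES), which is why the appendix normalizes $\omega$.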
Assumption (F), which we discussed in the context of the GARCH model in Section SA.2.2 above, implies that $L_{FZ0}(Y_t; v_t(\theta), e_t(\theta))$ is $\alpha$-mixing with a decay rate no slower than that required by Theorem 4. Also, $L_{FZ0}(Y_t; v_t(\theta), e_t(\theta))$ is strictly stationary, as we have shown that $\big(Y_t, \sigma^2_t(\theta), \partial\sigma^2_t(\theta)/\partial\theta'\big)$ is strictly stationary. We then need only show that $E|L_{FZ0}(Y_t; v_t(\theta), e_t(\theta); \alpha)|^{1+\delta} < \infty$:
$$|L_{FZ0}(Y_t; v_t(\theta), e_t(\theta); \alpha)| = \bigg|{-\frac{1}{\alpha\,e_t(\theta)}}\mathbb 1\{Y_t\le v_t(\theta)\}\big(v_t(\theta) - Y_t\big) + \frac{v_t(\theta)}{e_t(\theta)} + \log(-e_t(\theta))\bigg|$$
$$= \bigg|\frac{v_t(\theta)}{e_t(\theta)}\Big(1 - \frac{\mathbb 1\{Y_t\le v_t(\theta)\}}{\alpha}\Big) + \frac{Y_t}{\alpha\,e_t(\theta)}\mathbb 1\{Y_t\le v_t(\theta)\} + \log(-e_t(\theta))\bigg|$$
$$= \bigg|c\Big(1 - \frac{\mathbb 1\{Y_t\le v_t(\theta)\}}{\alpha}\Big) + \frac{Y_t}{\alpha\,b\,\sigma_t(\theta)}\mathbb 1\{Y_t\le v_t(\theta)\} + \log(-b) + \log\sigma_t(\theta)\bigg|$$
$$\le c\Big(1 + \frac1\alpha\Big) + \frac{1}{\alpha\,|b|}\frac{|Y_t|}{\sigma_t(\theta)} + |\log(-b)| + |\log\sigma_t(\theta)|.$$
By the $c_r$-inequality, it is sufficient to show that $E|Y_t/\sigma_t(\theta)|^{1+\delta} < \infty$ and $E|\log\sigma_t(\theta)|^{1+\delta} < \infty$. The first condition follows from $\sigma_t(\theta) \ge \sqrt{\omega_0}$ and GARCH Assumption 3. For the second, recall that $\sigma^2_t(\theta) = \omega_0 + \beta\sigma^2_{t-1}(\theta) + \gamma Y^2_{t-1} \ge \omega_0 > 0$. Therefore, if $\sigma_t(\theta) < 1$ then $|\log\sigma_t(\theta)| \le |\log\sqrt{\omega_0}|$, and if $\sigma_t(\theta) \ge 1$ then $|\log\sigma_t(\theta)| \le \sigma_t(\theta)$. In summary, $|\log\sigma_t(\theta)| \le |\log\sqrt{\omega_0}| + \sigma_t(\theta)$. Therefore, by the $c_r$-inequality,
$$E|\log\sigma_t(\theta)|^{1+\delta} \le E\big(|\log\sqrt{\omega_0}| + \sigma_t(\theta)\big)^{1+\delta} \le 2^\delta\big(|\log\sqrt{\omega_0}|^{1+\delta} + E\big[\sigma_t(\theta)^{1+\delta}\big]\big).$$
Thus, a sufficient condition for $E|\log\sigma_t(\theta)|^{1+\delta} < \infty$ is $E\big[\sigma_t(\theta)^{1+\delta}\big] < \infty$, which is implied by GARCH Assumption 3. Hence, $L_{FZ0}(Y_t; v_t(\theta), e_t(\theta))$ obeys the law of large numbers for any fixed $\theta$, by Theorem 4.

Appendix SA.2.4.2: Stochastic equicontinuity

The stochastic equicontinuity condition is derived using Davidson (1994, Theorem 21.10), which we restate here as Theorem 5 for convenience.
Theorem 5 (Davidson) Let $Q_n(\theta)$ be the objective function for an M-estimator. Suppose there exists $N \in \mathbb N$ such that
$$|Q_n(\theta) - Q_n(\theta')| \le a_n\,h(\|\theta - \theta'\|)\ \text{a.s.}$$
holds for all $\theta, \theta' \in \Theta$ and $n \ge N$, where $h$ is a deterministic function with $h(x) \downarrow 0$ as $x \downarrow 0$, and $a_n = O_p(1)$. Then $(Q_n)_{n\in\mathbb N}$ is stochastically equicontinuous.

Observe that
$$\mathbb 1\{Y_t\le v_t(\theta)\}\big(v_t(\theta) - Y_t\big) = \tfrac12\big(v_t(\theta) - Y_t + |v_t(\theta) - Y_t|\big). \tag{70}$$
Let $\theta_1 = [\beta_1, \gamma_1, b_1, c_1]$ and $\theta_2 = [\beta_2, \gamma_2, b_2, c_2]$. Then, using (70), we obtain
$$\bigg|\frac{\mathbb 1\{Y_t\le v_t(\theta_1)\}(v_t(\theta_1) - Y_t)}{e_t(\theta_1)} - \frac{\mathbb 1\{Y_t\le v_t(\theta_2)\}(v_t(\theta_2) - Y_t)}{e_t(\theta_2)}\bigg| \le \frac12\bigg|\frac{v_t(\theta_1) - Y_t}{e_t(\theta_1)} - \frac{v_t(\theta_2) - Y_t}{e_t(\theta_2)}\bigg| + \frac12\bigg|\frac{|v_t(\theta_1) - Y_t|}{e_t(\theta_1)} - \frac{|v_t(\theta_2) - Y_t|}{e_t(\theta_2)}\bigg| \tag{71}$$
$$\le \bigg|\frac{v_t(\theta_1) - Y_t}{e_t(\theta_1)} - \frac{v_t(\theta_2) - Y_t}{e_t(\theta_2)}\bigg|. \tag{72}$$
The inequality between (71) and (72) holds because, by the reverse triangle inequality applied to $|v_t(\theta_i) - Y_t|/|e_t(\theta_i)|$, the second term in (71) is no larger than the first. Since $v_t(\theta)/e_t(\theta) = c$ and $e_t(\theta) = b\,\sigma_t(\theta)$,
$$\bigg|\frac{v_t(\theta_1) - Y_t}{e_t(\theta_1)} - \frac{v_t(\theta_2) - Y_t}{e_t(\theta_2)}\bigg| \le |c_1 - c_2| + |Y_t|\,\bigg|\frac{1}{b_1\sigma_t(\theta_1)} - \frac{1}{b_2\sigma_t(\theta_2)}\bigg|.$$
By the mean-value theorem applied to $\theta \mapsto 1/(b\,\sigma_t(\theta))$, and using $|b| \ge B_2$ and $\sigma_t(\theta) \ge \sqrt{\omega_0}$,
$$\bigg|\frac{1}{b_1\sigma_t(\theta_1)} - \frac{1}{b_2\sigma_t(\theta_2)}\bigg| \le \bigg[\frac{1}{B_2^2\sqrt{\omega_0}} + \frac{1}{B_2\sqrt{\omega_0}}\,\frac{\|\nabla\sigma_t(\theta^*)\|}{\sigma_t(\theta^*)}\bigg]\,\|\theta_1 - \theta_2\|,$$
for some $\theta^*$ between $\theta_1$ and $\theta_2$. By Taylor's theorem,
$$|\log(-e_t(\theta_1)) - \log(-e_t(\theta_2))| = \frac{|\nabla e_t(\theta^*)(\theta_1 - \theta_2)|}{|e_t(\theta^*)|},$$
for some $\theta^*$ between $\theta_1$ and $\theta_2$. Since $\nabla'e_t(\theta) = b\,\nabla'\sigma_t(\theta) + \sigma_t(\theta)\,[0, 0, 1, 0]'$, we have
$$\frac{\|\nabla e_t(\theta)\|}{|e_t(\theta)|} \le \frac{|b|\,\|\nabla\sigma_t(\theta)\| + \sigma_t(\theta)}{|b|\,\sigma_t(\theta)} \le \frac{\|\nabla\sigma_t(\theta)\|}{\sigma_t(\theta)} + \frac{1}{B_2},$$
so that $|\log(-e_t(\theta_1)) - \log(-e_t(\theta_2))| \le \big[\|\nabla\sigma_t(\theta^*)\|/\sigma_t(\theta^*) + 1/B_2\big]\,\|\theta_1 - \theta_2\|$. Therefore,
$$|L_{FZ0}(Y_t; v_t(\theta_1), e_t(\theta_1); \alpha) - L_{FZ0}(Y_t; v_t(\theta_2), e_t(\theta_2); \alpha)| \le \bigg\{\Big(1 + \frac1\alpha\Big) + \frac{|Y_t|}{\alpha\sqrt{\omega_0}}\Big(\frac{1}{B_2^2} + \frac{1}{B_2}\frac{\|\nabla\sigma_t(\theta^*)\|}{\sigma_t(\theta^*)}\Big) + \frac{\|\nabla\sigma_t(\theta^{**})\|}{\sigma_t(\theta^{**})} + \frac{1}{B_2}\bigg\}\,\|\theta_1 - \theta_2\|.$$
Using Lemma 8 in Section SA.2.5 and $\sigma_t(\theta) \ge \sqrt{\omega_0}$, we obtain
$$\frac{\|\nabla\sigma_t(\theta)\|}{\sigma_t(\theta)} \le \frac12\bigg[\frac{\gamma^{1/2}}{\beta}\,\frac{X_2(t,\beta)}{\sigma_t(\theta)} + \frac1\gamma\bigg] \le \frac12\bigg[\frac{(1-\delta_1)^{1/2}}{\delta_1\sqrt{\omega_0}}\,X_2(t, 1-\delta_1) + \frac{1}{\delta_1}\bigg] \equiv Z_t,$$
using the bounds on $\beta$ and $\gamma$ in GARCH Assumption 2, where $X_2(t,\beta)$ is defined in Section SA.2.5. Then
$$|L_T(\theta_1) - L_T(\theta_2)| \le \bigg[C_1 + \frac{C_2}{T}\sum_{t=1}^T|Y_t| + \frac{C_3}{T}\sum_{t=1}^T(1 + |Y_t|)\,Z_t\bigg]\,\|\theta_1 - \theta_2\|,$$
where the constants $C_1, C_2, C_3$ depend only on $\alpha$, $\omega_0$, $\delta_1$ and $B_2$. We have $E|Y_t| < \infty$, and $E[(1 + |Y_t|)Z_t] < \infty$ by the Cauchy-Schwarz inequality and $E[Y^2_t] < \infty$ (GARCH Assumptions 1 and 3). Then $T^{-1}\sum_{t=1}^T|Y_t| = O_p(1)$ and $T^{-1}\sum_{t=1}^T(1 + |Y_t|)Z_t = O_p(1)$. Therefore, by Theorem 5, $L_T$ is stochastically equicontinuous.

Appendix SA.2.5: Assumptions (C) and (D)
We define
$$X_1(t,\beta) = \sum_{i=2}^\infty (i-1)\beta^{i-2}\,Y^2_{t-i}, \qquad X_2(t,\beta) = \sum_{i=2}^\infty (i-1)\beta^{(i-1)/2}\,|Y_{t-i}|, \qquad X_3(t,\beta) = \sum_{i=3}^\infty (i-1)(i-2)\beta^{i-3}\,Y^2_{t-i}.$$

Lemma 8. Under GARCH Assumption 2, we have
$$\|\nabla\sigma^2_t(\theta)\| \le \gamma\,X_1(t,\beta) + \gamma^{-1}\sigma^2_t(\theta)$$
$$\|\nabla\sigma_t(\theta)\| \le \tfrac12\big[\gamma^{1/2}\beta^{-1}X_2(t,\beta) + \gamma^{-1}\sigma_t(\theta)\big]$$
$$\|\nabla^2\sigma^2_t(\theta)\| \le \frac{2\omega_0}{(1-\beta)^3} + \gamma\,X_3(t,\beta) + 2X_1(t,\beta)$$
$$\|\nabla^2\sigma_t(\theta)\| \le \frac{1}{\sqrt{\omega_0}}\bigg[\frac14\big(\gamma^{1/2}\beta^{-1}X_2(t,\beta) + \gamma^{-1}\sigma_t(\theta)\big)^2 + \frac{\omega_0}{(1-\beta)^3} + \frac\gamma2 X_3(t,\beta) + X_1(t,\beta)\bigg].$$

Proof of Lemma 8. Iterating the variance recursion yields
$$\sigma^2_t(\theta) = \omega_0 + \beta\,\sigma^2_{t-1}(\theta) + \gamma\,Y^2_{t-1} = \frac{\omega_0}{1-\beta} + \gamma\sum_{i=1}^\infty \beta^{i-1}Y^2_{t-i}. \tag{73}$$
Therefore,
$$\nabla\sigma^2_t(\theta) = \bigg[\frac{\omega_0}{(1-\beta)^2} + \gamma\sum_{i=2}^\infty(i-1)\beta^{i-2}Y^2_{t-i},\ \sum_{i=1}^\infty\beta^{i-1}Y^2_{t-i},\ 0,\ 0\bigg] = \bigg[\frac{\omega_0}{(1-\beta)^2} + \gamma\,X_1(t,\beta),\ \gamma^{-1}\Big(\sigma^2_t(\theta) - \frac{\omega_0}{1-\beta}\Big),\ 0,\ 0\bigg], \tag{74}$$
so that
$$\|\nabla\sigma^2_t(\theta)\| \le \frac{\omega_0}{(1-\beta)^2} + \gamma\,X_1(t,\beta) + \frac{\sigma^2_t(\theta)}{\gamma} - \frac{\omega_0}{\gamma(1-\beta)} \le \gamma\,X_1(t,\beta) + \frac{\sigma^2_t(\theta)}{\gamma},$$
since $\beta + \gamma \le 1$ implies $\frac{1}{1-\beta} \le \frac1\gamma$, and hence $\frac{\omega_0}{(1-\beta)^2} - \frac{\omega_0}{\gamma(1-\beta)} \le 0$. Furthermore, $\nabla\sigma_t(\theta) = \nabla\sigma^2_t(\theta)/(2\sigma_t(\theta))$, and for all $j \ge 2$,
$$\frac{\gamma(j-1)\beta^{j-2}\,Y^2_{t-j}}{2\sqrt{\omega_0/(1-\beta) + \gamma\sum_{i=1}^\infty\beta^{i-1}Y^2_{t-i}}} \le \frac{\gamma(j-1)\beta^{j-2}\,Y^2_{t-j}}{2\gamma^{1/2}\beta^{(j-1)/2}\,|Y_{t-j}|} = \frac{\gamma^{1/2}}{2\beta}(j-1)\beta^{(j-1)/2}\,|Y_{t-j}|,$$
as all summands in the denominator are positive. This implies
$$\|\nabla\sigma_t(\theta)\| \le \frac12\Big[\frac{\gamma^{1/2}}{\beta}X_2(t,\beta) + \frac{\sigma_t(\theta)}{\gamma}\Big].$$
Using equation (74), we obtain
$$\nabla^2\sigma^2_t(\theta) = \begin{bmatrix} a_1 & a_2 & 0 & 0\\ a_2 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\end{bmatrix}, \qquad a_1 = \frac{2\omega_0}{(1-\beta)^3} + \gamma\,X_3(t,\beta), \qquad a_2 = X_1(t,\beta).$$
Thus, since the Frobenius norm is always less than the sum of the absolute values of the matrix entries,
$$\|\nabla^2\sigma^2_t(\theta)\| \le \frac{2\omega_0}{(1-\beta)^3} + \gamma\,X_3(t,\beta) + 2X_1(t,\beta).$$
Using $\nabla\sigma_t(\theta) = \nabla\sigma^2_t(\theta)/(2\sigma_t(\theta))$, we find that
$$\nabla^2\sigma_t(\theta) = \frac{\nabla^2\sigma^2_t(\theta)}{2\sigma_t(\theta)} - \frac{\nabla'\sigma_t(\theta)\nabla\sigma_t(\theta)}{\sigma_t(\theta)}, \quad\text{and therefore}\quad \|\nabla^2\sigma_t(\theta)\| \le \frac{\|\nabla\sigma_t(\theta)\|^2}{\sigma_t(\theta)} + \frac{\|\nabla^2\sigma^2_t(\theta)\|}{2\sigma_t(\theta)}.$$
Since $\sigma_t(\theta) \ge \sqrt{\omega_0} > 0$, and using our previous results, we obtain the claimed bound on $\|\nabla^2\sigma_t(\theta)\|$.
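The closed forms (73)-(74) used in the proof of Lemma 8 can be cross-checked numerically: differentiating $\sigma^2_t = \omega_0 + \beta\sigma^2_{t-1} + \gamma Y^2_{t-1}$ gives the gradient recursions $\partial\sigma^2_t/\partial\beta = \sigma^2_{t-1} + \beta\,\partial\sigma^2_{t-1}/\partial\beta$ and $\partial\sigma^2_t/\partial\gamma = Y^2_{t-1} + \beta\,\partial\sigma^2_{t-1}/\partial\gamma$, which should agree with finite differences. A sketch, treating the initial variance as a fixed constant (names are illustrative):

```python
import numpy as np

def sigma2_path(y, omega, beta, gamma, s2_init):
    """sigma^2_T(theta) and its analytic gradient wrt (beta, gamma), obtained by
    differentiating the recursion; s2_init is treated as a fixed constant."""
    s2, d_beta, d_gamma = s2_init, 0.0, 0.0
    for yt2 in y[:-1] ** 2:  # sigma^2_T depends on Y_1, ..., Y_{T-1}
        d_beta = s2 + beta * d_beta       # uses sigma^2_{t-1} before the update
        d_gamma = yt2 + beta * d_gamma
        s2 = omega + beta * s2 + gamma * yt2
    return s2, d_beta, d_gamma

y = np.random.default_rng(2).standard_normal(300)
omega, beta, gamma = 0.05, 0.9, 0.05
s2, db, dg = sigma2_path(y, omega, beta, gamma, s2_init=1.0)
h = 1e-6  # central finite differences
db_fd = (sigma2_path(y, omega, beta + h, gamma, 1.0)[0]
         - sigma2_path(y, omega, beta - h, gamma, 1.0)[0]) / (2 * h)
dg_fd = (sigma2_path(y, omega, beta, gamma + h, 1.0)[0]
         - sigma2_path(y, omega, beta, gamma - h, 1.0)[0]) / (2 * h)
assert abs(db - db_fd) < 1e-4 * max(1.0, abs(db))
assert abs(dg - dg_fd) < 1e-4 * max(1.0, abs(dg))
```

This recursive form is equivalent to the infinite-sum expressions in (74) when the recursion is started in the infinite past.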
Lemma 9. Under GARCH Assumption 2, it holds that
$$|v_t(\theta)| \le |e_t(\theta)| \le V(\mathcal F_{t-1}) \equiv B_1\,S_1(\mathcal F_{t-1})$$
$$\|\nabla e_t(\theta)\| \le H_1(\mathcal F_{t-1}) \equiv B_1\,S_2(\mathcal F_{t-1}) + S_1(\mathcal F_{t-1})$$
$$\|\nabla v_t(\theta)\| \le V_1(\mathcal F_{t-1}) \equiv H_1(\mathcal F_{t-1}) + V(\mathcal F_{t-1})$$
$$\|\nabla^2 e_t(\theta)\| \le H_2(\mathcal F_{t-1}) \equiv B_1\,S_3(\mathcal F_{t-1}) + 2S_2(\mathcal F_{t-1})$$
$$\|\nabla^2 v_t(\theta)\| \le V_2(\mathcal F_{t-1}) \equiv H_2(\mathcal F_{t-1}) + 2H_1(\mathcal F_{t-1}),$$
where
$$S_1(\mathcal F_{t-1}) = \sqrt{\frac{\omega_0}{\delta_1} + \sum_{i=1}^\infty(1-\delta_1)^i\,Y^2_{t-i}}$$
$$S_2(\mathcal F_{t-1}) = \frac12\Big[(1-\delta_1)^{1/2}\delta_1^{-1}\,X_2(t, 1-\delta_1) + \delta_1^{-1}\,S_1(\mathcal F_{t-1})\Big]$$
$$S_3(\mathcal F_{t-1}) = \frac{1}{\sqrt{\omega_0}}\Big[S_2(\mathcal F_{t-1})^2 + \frac{\omega_0}{\delta_1^3} + \frac{1-\delta_1}{2}\,X_3(t, 1-\delta_1) + X_1(t, 1-\delta_1)\Big].$$

Proof of Lemma 9. As a function of $\beta$ and $\gamma$, $\sigma^2_t(\theta)$ is increasing in both arguments (see equation (73)), and, in fact, it does not depend on the parameters $b$ and $c$. Therefore $\sigma_t(\theta) \le S_1(\mathcal F_{t-1})$. The quantities $X_1(t,\beta)$, $X_2(t,\beta)$, $X_3(t,\beta)$ defined in the beginning of this section are all increasing in $\beta$, and thus bounded by $X_1(t, 1-\delta_1)$, $X_2(t, 1-\delta_1)$, $X_3(t, 1-\delta_1)$, respectively. Recall that $|b| \le B_1$ under GARCH Assumption 2. The first inequality holds because $|v_t(\theta)| \le |e_t(\theta)| = |b|\,\sigma_t(\theta)$, since $|c| \le 1$. The remaining ones are implied by Lemma 8 and
$$\nabla'e_t(\theta) = b\,\nabla'\sigma_t(\theta) + \sigma_t(\theta)\,[0,0,1,0]' \;\Rightarrow\; \|\nabla e_t(\theta)\| \le |b|\,\|\nabla\sigma_t(\theta)\| + \sigma_t(\theta)$$
$$\nabla'v_t(\theta) = c\,\nabla'e_t(\theta) + e_t(\theta)\,[0,0,0,1]' \;\Rightarrow\; \|\nabla v_t(\theta)\| \le \|\nabla e_t(\theta)\| + |e_t(\theta)|$$
$$\nabla^2 e_t(\theta) = b\,\nabla^2\sigma_t(\theta) + [0,0,1,0]'\nabla\sigma_t(\theta) + \nabla'\sigma_t(\theta)\,[0,0,1,0] \;\Rightarrow\; \|\nabla^2 e_t(\theta)\| \le |b|\,\|\nabla^2\sigma_t(\theta)\| + 2\|\nabla\sigma_t(\theta)\|$$
$$\nabla^2 v_t(\theta) = c\,\nabla^2 e_t(\theta) + [0,0,0,1]'\nabla e_t(\theta) + \nabla'e_t(\theta)\,[0,0,0,1] \;\Rightarrow\; \|\nabla^2 v_t(\theta)\| \le \|\nabla^2 e_t(\theta)\| + 2\|\nabla e_t(\theta)\|.$$
Lemma 10 Let \{X_i\}_{i\in\mathbb{N}} be a sequence of random variables and define X = \sum_{i=1}^{\infty} a_i |X_i|, where a_i > 0 for all i \in \mathbb{N} and \sum_{i=1}^{\infty} a_i < \infty. Let p > 1. If \sup_{i\in\mathbb{N}} E|X_i|^p \le K < \infty for some constant K, then E|X|^p \le (\sum_{i=1}^{\infty} a_i)^p K.

Proof of Lemma 10. By Jensen's inequality, E|Z|^p \ge |EZ|^p. We rewrite X as

X = \sum_{i=1}^{\infty} a_i |X_i| = \Big(\sum_{j=1}^{\infty} a_j\Big) \sum_{i=1}^{\infty} \Big(\sum_{j=1}^{\infty} a_j\Big)^{-1} a_i |X_i|.

Note that \sum_{i=1}^{\infty} (\sum_{j=1}^{\infty} a_j)^{-1} a_i = 1, namely \{(\sum_{j=1}^{\infty} a_j)^{-1} a_i\}_{i=1}^{\infty} is a probability measure. Then, using Jensen's inequality for the convex function x \mapsto x^p,

X^p = \Big(\sum_{j} a_j\Big)^p \Big( \sum_{i} \Big(\sum_{j} a_j\Big)^{-1} a_i |X_i| \Big)^p \le \Big(\sum_{j} a_j\Big)^p \sum_{i} \Big(\sum_{j} a_j\Big)^{-1} a_i |X_i|^p.

Thus,

E[X^p] \le \Big(\sum_{j} a_j\Big)^p \Big(\sum_{j} a_j\Big)^{-1} \sum_{i} a_i E|X_i|^p \le \Big(\sum_{j} a_j\Big)^p \Big(\sum_{j} a_j\Big)^{-1} \Big(\sum_{i} a_i\Big) K = \Big(\sum_{j} a_j\Big)^p K.

Lemma 11 Under GARCH Assumptions 1-2 and for any p > 1, p_1, \ldots, p_6 > 0, the following statements hold for all t:

(1) If E|Y_t|^p < \infty, then the following quantities are all finite: E[V^p(F_{t-1})], E[V_1^p(F_{t-1})], E[H_1^p(F_{t-1})], E[V_2^{p/2}(F_{t-1})], E[H_2^{p/2}(F_{t-1})].

(2) If p = p_1 + p_2 + p_3 + 2p_4 + 2p_5 + p_6 and E|Y_t|^p < \infty, then

E[ V^{p_1}(F_{t-1}) V_1^{p_2}(F_{t-1}) H_1^{p_3}(F_{t-1}) V_2^{p_4}(F_{t-1}) H_2^{p_5}(F_{t-1}) |Y_t|^{p_6} ] < \infty.

(3) If E|Y_t|^{4+\iota} < \infty for some \iota > 0, then all the moment conditions in Assumption 2(D) are satisfied.

Proof of Lemma 11. Part (1) follows by combining Lemma 9 with Lemma 10, and part (2) is a consequence of part (1) and Hölder's inequality. Lemma 11 implies that GARCH Assumption 3 implies Assumption 2(D) of the paper.
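Lemma 10 is easy to illustrate by Monte Carlo; the weights and the distribution of the X_i below are arbitrary choices for illustration, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
p = 3.0
a = 0.5 ** np.arange(1, 21)                # a_i > 0, truncated geometric weights
A = a.sum()                                # plays the role of sum_i a_i

# X_i i.i.d. |N(0,1)|, so E|X_i|^p = K is the same constant for every i.
n_sim = 100_000
Xi = np.abs(rng.standard_normal((n_sim, a.size)))
X = Xi @ a                                 # X = sum_i a_i |X_i|

K = np.mean(np.abs(rng.standard_normal(n_sim)) ** p)   # Monte Carlo estimate of K
lhs = np.mean(X ** p)                      # Monte Carlo estimate of E|X|^p
rhs = A ** p * K                           # the bound of Lemma 10
print(lhs, rhs)                            # lhs should not exceed rhs
```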
Appendix SA.2.6: Assumption 2(E)

D_0 is the Hessian of the expected loss at \theta_0, so it is positive semi-definite. Let x = (x_1,\ldots,x_4)' \in \mathbb{R}^4 be such that x' D_0 x = 0. This implies that x'\nabla v_t(\theta_0) = 0 and x'\nabla e_t(\theta_0) = 0 almost surely. We have x'\nabla v_t(\theta_0) = c\, x'\nabla e_t(\theta_0) + x_4 e_t(\theta_0). Therefore, x_4 = 0. Furthermore,

2\sigma_t(\theta_0)\, x'\nabla e_t(\theta_0) = 2\sigma_t(\theta_0)\, b\, x'\nabla\sigma_t(\theta_0) + 2x_3\sigma_t^2(\theta_0) = b\, x'\nabla\sigma_t^2(\theta_0) + 2x_3\sigma_t^2(\theta_0) = 0, \quad a.s.   (75)

The stationarity of \{Y_t\} implies that \sigma_t^2(\theta_0) is stationary. Therefore it also holds that

b\, x'\nabla\sigma_{t-1}^2(\theta_0) + 2x_3\sigma_{t-1}^2(\theta_0) = 0, \quad a.s.   (76)

Computing (75) minus \beta_0 times (76), and using \nabla\sigma_t^2(\theta_0) = [\sigma_{t-1}^2(\theta_0), Y_{t-1}^2, 0, 0]' + \beta_0\nabla\sigma_{t-1}^2(\theta_0) together with \sigma_t^2(\theta_0) = \omega_0 + \beta_0\sigma_{t-1}^2(\theta_0) + \gamma_0 Y_{t-1}^2, we obtain that, a.s.,

0 = b\, x'[\sigma_{t-1}^2(\theta_0), Y_{t-1}^2, 0, 0]' + 2x_3(\omega_0 + \gamma_0 Y_{t-1}^2) = (b x_2 + 2\gamma_0 x_3) Y_{t-1}^2 + (2\omega_0 x_3 + b x_1\sigma_{t-1}^2(\theta_0)).   (77)

By the assumption that Y_t | F_{t-1} \sim F(0,\sigma_t^2(\theta_0)) with F absolutely continuous, Y_{t-1}^2 is continuously distributed conditional on F_{t-2}, while \sigma_{t-1}^2(\theta_0) is F_{t-2}-measurable; we can therefore conclude from the above equation that x_1 = x_2 = x_3 = 0. Thus D_0 is positive definite.

Appendix SA.2.7: Assumption 2(G)

We now verify this assumption for the GARCH(1,1) model. Set a = bc, so that v_t = a\sigma_t. Then for T \ge 4, a necessary condition for Y_t = v_t(\theta), t = 1,\ldots,T, is given by the set of equations

Y_t^2 = a^2\beta^t\sigma_0^2 + a^2\sum_{i=1}^{t}\beta^{t-i}(\omega_0 + \gamma Y_{i-1}^2), \quad t = 1,\ldots,4,

or, equivalently,

Y_1^2 = a^2\beta\sigma_0^2 + a^2(\omega_0 + \gamma Y_0^2)   (78)
Y_2^2 = \beta Y_1^2 + a^2(\omega_0 + \gamma Y_1^2)   (79)
Y_3^2 = \beta Y_2^2 + a^2(\omega_0 + \gamma Y_2^2)   (80)
Y_4^2 = \beta Y_3^2 + a^2(\omega_0 + \gamma Y_3^2).   (81)

Differencing consecutive equations in (79)-(81) eliminates the common coefficient \beta + a^2\gamma; solving each pair of equations for a^2\omega_0 and equating the results, we obtain

a^2\omega_0 = \frac{Y_2^4 - Y_1^2 Y_3^2}{Y_2^2 - Y_1^2} = \frac{Y_3^4 - Y_2^2 Y_4^2}{Y_3^2 - Y_2^2}.   (82)
Therefore, a necessary condition such that Y_t = v_t(\theta), t = 1,\ldots,T, for some parameter \theta is that (Y_1,\ldots,Y_T) lies in the set p^{-1}(0) = \{(y_1,\ldots,y_T) \in \mathbb{R}^T : p(y_1,\ldots,y_T) = 0\}, where p is the polynomial function

p(y_1,\ldots,y_T) = (y_2^4 - y_1^2 y_3^2)(y_3^2 - y_2^2) - (y_3^4 - y_2^2 y_4^2)(y_2^2 - y_1^2).

The set p^{-1}(0) has Hausdorff dimension less than T. Therefore, as the distribution of (Y_1,\ldots,Y_T) is assumed to be absolutely continuous by GARCH Assumption 1, we obtain the claim with K = 4.

Appendix SA.2.8: Summary

We summarize the arguments showing that Assumptions 1 and 2 of the paper are satisfied under GARCH Assumptions 1-3.

Assumption 1: Part (A) holds as it has been shown in Section SA.2.4 that the uniform law of large numbers holds under our GARCH Assumptions. Parts (B)(i)-(ii) are satisfied under GARCH Assumptions 1-2. Part (B)(iii) is easy to check. Concerning Part (B)(iv), we have shown in Section SA.2.3 that the GARCH model is identifiable when \omega is normalized.

Assumption 2: Part (A)(i) is easy to check, and (ii) is satisfied by GARCH Assumption 1. Part (B)(i) is satisfied by GARCH Assumption 1, and (ii) is clearly weaker than GARCH Assumption 3. Part (C)(i) follows easily from \sigma_t^2(\theta) \ge \omega_0 > 0 and the bounds on the parameter |b|. Part (C)(ii) has been shown in Lemma 9. Part (D) is implied by Lemma 11. Part (E) is discussed in Section SA.2.6. Part (F) is satisfied under GARCH Assumptions 1-3 as discussed in Section SA.2.5, and Part (G) is satisfied by GARCH Assumption 1 as discussed in Section SA.2.7.
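The degeneracy argument of Section SA.2.7 can be illustrated numerically: on a path with Y_t = a\sigma_t exactly, squared returns obey an affine recursion, so a polynomial of the form above (reconstructed here from that recursion; treat the exact form as illustrative) must vanish, while it is generically nonzero for continuously distributed data. Parameter values are illustrative:

```python
import numpy as np

def p(y1, y2, y3, y4):
    # Polynomial whose zero set contains every path with Y_t = v_t(theta) = a*sigma_t
    return (y2**4 - y1**2 * y3**2) * (y3**2 - y2**2) - (y3**4 - y2**2 * y4**2) * (y2**2 - y1**2)

omega0, beta, gamma = 1.0, 0.8, 0.1        # illustrative GARCH(1,1) parameters
a = -1.5                                   # a = b*c (negative: VaR forecasts are negative)
sigma2 = 2.0                               # arbitrary initial variance
Y = []
for _ in range(4):
    Y.append(a * np.sqrt(sigma2))          # returns exactly on the VaR path Y_t = a*sigma_t
    sigma2 = omega0 + beta * sigma2 + gamma * Y[-1] ** 2

print(p(*Y))                               # zero (up to rounding) on such a degenerate path

rng = np.random.default_rng(1)
print(p(*rng.standard_normal(4)))          # generically nonzero otherwise
```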
Appendix SA.3: Additional tables

Table S1: Finite-sample performance of (Q)MLE

[Entries for Panel A (N(0,1) innovations) and Panel B (Skew t(5,-0.5) innovations), at T = 500 and T = 5000, are not recoverable.]

Notes: This table presents results from 1,000 replications of the estimation of the parameters of a GARCH(1,1) model, using the Normal likelihood. In Panel A the innovations are standard Normal, and so estimation is then ML. In Panel B the innovations are standardized skew t, and so estimation is QML. Details are described in Section 4 of the main paper. The top row of each panel presents the true values of the parameters. The second, third, and fourth rows present the median estimated parameters, the average bias, and the standard deviation (across simulations) of the estimated parameters. The last row of each panel presents the coverage rates for 95% confidence intervals constructed using estimated standard errors.
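A minimal sketch of the Monte Carlo design behind this table: simulate a GARCH(1,1) and maximize the Gaussian (quasi-)likelihood. The sample size, starting values, and optimizer settings here are illustrative, not those of the paper:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
omega, beta, gamma = 0.05, 0.90, 0.05      # illustrative "true" parameters
T = 3000
y = np.empty(T)
s2 = omega / (1 - beta - gamma)            # start at the unconditional variance
for t in range(T):
    y[t] = np.sqrt(s2) * rng.standard_normal()
    s2 = omega + beta * s2 + gamma * y[t] ** 2

def neg_loglik(theta, y):
    # Gaussian (quasi-)log-likelihood of a GARCH(1,1); theta = (omega, beta, gamma)
    w, b, g = theta
    if w <= 0 or b < 0 or g < 0 or b + g >= 1:
        return np.inf
    s2 = np.var(y)                         # a common choice of start value
    nll = 0.0
    for yt in y:
        nll += 0.5 * (np.log(2 * np.pi * s2) + yt ** 2 / s2)
        s2 = w + b * s2 + g * yt ** 2
    return nll

res = minimize(neg_loglik, x0=np.array([0.1, 0.8, 0.1]), args=(y,),
               method="Nelder-Mead", options={"maxiter": 3000, "xatol": 1e-6, "fatol": 1e-6})
print(res.x)                               # should be close to (0.05, 0.90, 0.05)
```

Repeating this over many simulated samples, and recording medians, biases, standard deviations, and confidence-interval coverage of the estimates, reproduces the structure of the table.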
Table S2: Simulation results for Normal innovations, estimation by CAViaR

[Estimates for the five panels (values of alpha), at T = 500 and T = 5000, are not recoverable.]

Notes: This table presents results from 1,000 replications of the estimation of VaR from a GARCH(1,1) DGP with standard Normal innovations. Details are described in Section 4 of the main paper. The top row of each panel presents the true values of the parameters. The second, third, and fourth rows present the median estimated parameters, the average bias, and the standard deviation (across simulations) of the estimated parameters. The last row of each panel presents the coverage rates for 95% confidence intervals constructed using estimated standard errors.
Table S3: Simulation results for skew t innovations, estimation by CAViaR

[Estimates for the five panels (values of alpha), at T = 500 and T = 5000, are not recoverable.]

Notes: This table presents results from 1,000 replications of the estimation of VaR from a GARCH(1,1) DGP with skew t innovations. Details are described in Section 4 of the main paper. The top row of each panel presents the true values of the parameters. The second, third, and fourth rows present the median estimated parameters, the average bias, and the standard deviation (across simulations) of the estimated parameters. The last row of each panel presents the coverage rates for 95% confidence intervals constructed using estimated standard errors.
Table S4: Diebold-Mariano t-statistics on average out-of-sample loss differences for the DJIA, NIKKEI and FTSE 100 (alpha = 0.05)

[t-statistics for Panel A (DJIA) and Panel B (NIKKEI), with rows and columns RW125, RW250, RW500, G-N, G-St, G-EDF, FZ-1F, FZ-2F, G-FZ, Hybrid, are not recoverable.]

Table continued on next page.
Table S4 (cont'd): Diebold-Mariano t-statistics on average out-of-sample loss differences for the DJIA, NIKKEI and FTSE 100 (alpha = 0.05)

[t-statistics for Panel C (FTSE 100) are not recoverable.]

Notes: This table presents t-statistics from Diebold-Mariano tests comparing the average losses, using the FZ0 loss function, over the out-of-sample period from January 2000 to December 2016, for ten different forecasting models. A positive value indicates that the row model has higher average loss than the column model. Values greater than 1.96 in absolute value indicate that the average loss difference is significantly different from zero at the 95% confidence level. Values along the main diagonal are all identically zero and are omitted for interpretability. The first three rows correspond to rolling window forecasts, the next three rows correspond to GARCH forecasts based on different models for the standardized residuals, and the last four rows correspond to models introduced in Section 2 of the main paper.
Table S5: Out-of-sample average losses and goodness-of-fit tests (alpha = 0.05)

[Average losses and goodness-of-fit p-values for VaR and ES, for the S&P 500, DJIA, NIKKEI and FTSE 100, across the models RW125, RW250, RW500, GCH-N, GCH-St, GCH-EDF, FZ-1F, FZ-2F, GCH-FZ and Hybrid, are not recoverable.]

Notes: The left panel of this table presents the average losses, using the FZ0 loss function, for four daily equity return series, over the out-of-sample period from January 2000 to December 2016, for ten different forecasting models. The lowest average loss in each column is highlighted in bold, and the second-lowest is highlighted in italics. The first three rows correspond to rolling window forecasts, the next three rows correspond to GARCH forecasts based on different models for the standardized residuals, and the last four rows correspond to models introduced in Section 2 of the main paper. The middle and right panels of this table present p-values from goodness-of-fit tests of the VaR and ES forecasts respectively. Values that are greater than 0.10 (indicating no evidence against optimality at the 0.10 level) are in bold, and values between 0.05 and 0.10 are in italics.
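The FZ0 loss used throughout these tables is straightforward to evaluate. The sketch below applies it to N(0,1) "returns" and confirms that the true (VaR, ES) pair attains a lower average loss than a misspecified one (the 1.3 scaling is an arbitrary perturbation for illustration):

```python
import numpy as np
from scipy.stats import norm

def fz0_loss(y, v, e, alpha):
    # FZ0 loss of a (VaR, ES) forecast pair (v, e); requires e < 0
    hit = (y <= v).astype(float)
    return -hit * (v - y) / (alpha * e) + v / e + np.log(-e) - 1

alpha = 0.05
rng = np.random.default_rng(4)
y = rng.standard_normal(1_000_000)

v_true = norm.ppf(alpha)                   # VaR of N(0,1): about -1.645
e_true = -norm.pdf(v_true) / alpha         # ES of N(0,1): about -2.063

loss_true = fz0_loss(y, v_true, e_true, alpha).mean()
loss_bad = fz0_loss(y, 1.3 * v_true, 1.3 * e_true, alpha).mean()
print(loss_true, loss_bad)
```

At the true forecasts the expected FZ0 loss reduces to log(-ES), about 0.724 here, which the first printed value approximates.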
Table S6: Diebold-Mariano t-statistics on average out-of-sample loss differences for the S&P 500, DJIA, NIKKEI and FTSE 100 (alpha = 0.05)

[t-statistics for Panel A (S&P 500) and Panel B (DJIA), with rows and columns RW125, RW250, RW500, G-N, G-St, G-EDF, FZ-1F, FZ-2F, G-FZ, Hybrid, are not recoverable.]

Table continued on next page.
Table S6 (cont'd): Diebold-Mariano t-statistics on average out-of-sample loss differences for the S&P 500, DJIA, NIKKEI and FTSE 100 (alpha = 0.05)

[t-statistics for Panel C (NIKKEI) and Panel D (FTSE 100) are not recoverable.]

Notes: This table presents t-statistics from Diebold-Mariano tests comparing the average losses, using the FZ0 loss function, over the out-of-sample period from January 2000 to December 2016, for ten different forecasting models. A positive value indicates that the row model has higher average loss than the column model. Values greater than 1.96 in absolute value indicate that the average loss difference is significantly different from zero at the 95% confidence level. Values along the main diagonal are all identically zero and are omitted for interpretability. The first three rows correspond to rolling window forecasts, the next three rows correspond to GARCH forecasts based on different models for the standardized residuals, and the last four rows correspond to models introduced in Section 2 of the main paper.
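A minimal sketch of the Diebold-Mariano statistic reported in these tables, using a Newey-West (Bartlett kernel) long-run variance; the lag choice and the simulated loss series are illustrative:

```python
import numpy as np

def dm_tstat(loss_a, loss_b, n_lags=10):
    # DM t-statistic on the loss difference d_t = L_a,t - L_b,t;
    # positive values mean model A has the higher average loss.
    d = np.asarray(loss_a) - np.asarray(loss_b)
    T = d.size
    d_bar = d.mean()
    u = d - d_bar
    lrv = u @ u / T                        # long-run variance, Bartlett weights
    for k in range(1, n_lags + 1):
        w = 1 - k / (n_lags + 1)
        lrv += 2 * w * (u[k:] @ u[:-k]) / T
    return d_bar / np.sqrt(lrv / T)

rng = np.random.default_rng(5)
base = rng.standard_normal(4000)
loss_a = base + 0.1 + 0.05 * rng.standard_normal(4000)  # model A is worse on average
loss_b = base
t_stat = dm_tstat(loss_a, loss_b)
print(t_stat)                              # large and positive here
```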
References

[1] Carrasco, M. and X. Chen, 2002, Mixing and moment properties of various GARCH and stochastic volatility models, Econometric Theory, 18(1), 17-39.

[2] Davidson, J., 1994, Stochastic Limit Theory: An Introduction for Econometricians, Oxford University Press.

[3] Lumsdaine, R., 1996, Consistency and asymptotic normality of the quasi-maximum likelihood estimator in IGARCH(1,1) and covariance stationary GARCH(1,1) models, Econometrica, 64(3), 575-596.
Measuring robustness 1 Introduction While in the classical approach to statistics one aims at estimates which have desirable properties at an exactly speci ed model, the aim of robust methods is loosely
More informationLikelihood inference for a nonstationary fractional autoregressive model
Likelihood inference for a nonstationary fractional autoregressive model Søren Johansen y University of Copenhagen and CREATES and Morten Ørregaard Nielsen z Cornell University and CREATES Preliminary
More informationSpectral representations and ergodic theorems for stationary stochastic processes
AMS 263 Stochastic Processes (Fall 2005) Instructor: Athanasios Kottas Spectral representations and ergodic theorems for stationary stochastic processes Stationary stochastic processes Theory and methods
More informationFluid Heuristics, Lyapunov Bounds and E cient Importance Sampling for a Heavy-tailed G/G/1 Queue
Fluid Heuristics, Lyapunov Bounds and E cient Importance Sampling for a Heavy-tailed G/G/1 Queue J. Blanchet, P. Glynn, and J. C. Liu. September, 2007 Abstract We develop a strongly e cient rare-event
More informationCAE Working Paper # A New Asymptotic Theory for Heteroskedasticity-Autocorrelation Robust Tests. Nicholas M. Kiefer and Timothy J.
CAE Working Paper #05-08 A New Asymptotic Theory for Heteroskedasticity-Autocorrelation Robust Tests by Nicholas M. Kiefer and Timothy J. Vogelsang January 2005. A New Asymptotic Theory for Heteroskedasticity-Autocorrelation
More informationIntroduction to Linear Algebra. Tyrone L. Vincent
Introduction to Linear Algebra Tyrone L. Vincent Engineering Division, Colorado School of Mines, Golden, CO E-mail address: tvincent@mines.edu URL: http://egweb.mines.edu/~tvincent Contents Chapter. Revew
More informationNonlinear Programming (NLP)
Natalia Lazzati Mathematics for Economics (Part I) Note 6: Nonlinear Programming - Unconstrained Optimization Note 6 is based on de la Fuente (2000, Ch. 7), Madden (1986, Ch. 3 and 5) and Simon and Blume
More informationECON2285: Mathematical Economics
ECON2285: Mathematical Economics Yulei Luo Economics, HKU September 17, 2018 Luo, Y. (Economics, HKU) ME September 17, 2018 1 / 46 Static Optimization and Extreme Values In this topic, we will study goal
More informationA CONDITIONAL-HETEROSKEDASTICITY-ROBUST CONFIDENCE INTERVAL FOR THE AUTOREGRESSIVE PARAMETER. Donald W.K. Andrews and Patrik Guggenberger
A CONDITIONAL-HETEROSKEDASTICITY-ROBUST CONFIDENCE INTERVAL FOR THE AUTOREGRESSIVE PARAMETER By Donald W.K. Andrews and Patrik Guggenberger August 2011 Revised December 2012 COWLES FOUNDATION DISCUSSION
More informationLikelihood Ratio Based Test for the Exogeneity and the Relevance of Instrumental Variables
Likelihood Ratio Based est for the Exogeneity and the Relevance of Instrumental Variables Dukpa Kim y Yoonseok Lee z September [under revision] Abstract his paper develops a test for the exogeneity and
More informationReal Analysis Problems
Real Analysis Problems Cristian E. Gutiérrez September 14, 29 1 1 CONTINUITY 1 Continuity Problem 1.1 Let r n be the sequence of rational numbers and Prove that f(x) = 1. f is continuous on the irrationals.
More informationThe properties of L p -GMM estimators
The properties of L p -GMM estimators Robert de Jong and Chirok Han Michigan State University February 2000 Abstract This paper considers Generalized Method of Moment-type estimators for which a criterion
More informationConsistent estimation of asset pricing models using generalized spectral estimator
Consistent estimation of asset pricing models using generalized spectral estimator Jinho Choi Indiana University April 0, 009 Work in progress Abstract This paper essentially extends the generalized spectral
More informationMATHEMATICAL PROGRAMMING I
MATHEMATICAL PROGRAMMING I Books There is no single course text, but there are many useful books, some more mathematical, others written at a more applied level. A selection is as follows: Bazaraa, Jarvis
More informationApproximately Most Powerful Tests for Moment Inequalities
Approximately Most Powerful Tests for Moment Inequalities Richard C. Chiburis Department of Economics, Princeton University September 26, 2008 Abstract The existing literature on testing moment inequalities
More informationUsing OLS to Estimate and Test for Structural Changes in Models with Endogenous Regressors
Using OLS to Estimate and Test for Structural Changes in Models with Endogenous Regressors Pierre Perron y Boston University Yohei Yamamoto z Hitotsubashi University August 22, 2011; Revised November 16,
More informationDichotomy Of Poincare Maps And Boundedness Of Some Cauchy Sequences
Applied Mathematics E-Notes, 2(202), 4-22 c ISSN 607-250 Available free at mirror sites of http://www.math.nthu.edu.tw/amen/ Dichotomy Of Poincare Maps And Boundedness Of Some Cauchy Sequences Abar Zada
More informationStochastic integral. Introduction. Ito integral. References. Appendices Stochastic Calculus I. Geneviève Gauthier.
Ito 8-646-8 Calculus I Geneviève Gauthier HEC Montréal Riemann Ito The Ito The theories of stochastic and stochastic di erential equations have initially been developed by Kiyosi Ito around 194 (one of
More informationEconomics 241B Estimation with Instruments
Economics 241B Estimation with Instruments Measurement Error Measurement error is de ned as the error resulting from the measurement of a variable. At some level, every variable is measured with error.
More informationStatistical analysis of time series: Gibbs measures and chains with complete connections
Statistical analysis of time series: Gibbs measures and chains with complete connections Rui Vilela Mendes (Institute) 1 / 38 Statistical analysis of time series data Time series X 2 Y : the state space
More informationEconomics 620, Lecture 8: Asymptotics I
Economics 620, Lecture 8: Asymptotics I Nicholas M. Kiefer Cornell University Professor N. M. Kiefer (Cornell University) Lecture 8: Asymptotics I 1 / 17 We are interested in the properties of estimators
More informationX (z) = n= 1. Ã! X (z) x [n] X (z) = Z fx [n]g x [n] = Z 1 fx (z)g. r n x [n] ª e jnω
3 The z-transform ² Two advantages with the z-transform:. The z-transform is a generalization of the Fourier transform for discrete-time signals; which encompasses a broader class of sequences. The z-transform
More informationConvergence of Square Root Ensemble Kalman Filters in the Large Ensemble Limit
Convergence of Square Root Ensemble Kalman Filters in the Large Ensemble Limit Evan Kwiatkowski, Jan Mandel University of Colorado Denver December 11, 2014 OUTLINE 2 Data Assimilation Bayesian Estimation
More information