Rank-Based Estimation for GARCH Processes

Beth Andrews
Northwestern University
September 7, 2011

Abstract

We consider a rank-based technique for estimating GARCH model parameters, some of which are scale transformations of conventional GARCH parameters. The estimators are obtained by minimizing a rank-based residual dispersion function similar to the one given in Jaeckel (1972). They are useful for GARCH order selection and preliminary estimation. We give a limiting distribution for the rank estimators which holds when the true parameter vector is in the interior of its parameter space, and when some GARCH parameters are zero. The limiting theory is used to show that the rank estimators are robust, can have the same asymptotic efficiency as maximum likelihood estimators, and are relatively efficient compared to traditional Gaussian and Laplace quasi-maximum likelihood estimators. The behavior of the estimators for finite samples is studied via simulation, and we use rank estimation to fit a GARCH model to exchange rate log-returns.

The author is very grateful to editor Peter C.B. Phillips, co-editor Pentti Saikkonen, and two anonymous referees for their helpful comments. This work was supported in part by NSF Grant DMS. Address correspondence to Beth Andrews, Department of Statistics, Northwestern University, Evanston, Illinois 60208, USA; bandrews@northwestern.edu

1 Introduction

Observed time series processes frequently appear uncorrelated, yet exhibit volatility clustering. Volatility clustering is the tendency of observations relatively small in absolute value to be followed by other small observations, and the tendency of observations relatively large in absolute value to be followed by other large observations. Hence, these series appear uncorrelated, but dependent. Nonlinear models with time-dependent conditional variances, most notably generalized autoregressive conditionally heteroskedastic (GARCH) models, are often used to describe time series with these features. GARCH models were first developed for modeling inflation rates (Engle, 1982; Bollerslev, 1986), and have also appeared for analyzing the returns of exchange rates (Bollerslev, 1987; Engle and González-Rivera, 1991; Shephard, 1996) and stock prices (Bollerslev, 1987; Shephard, 1996; Fan and Yao, 2003). Applications for GARCH models are not limited to finance, however. Time series processes exhibiting GARCH-type behavior have also appeared, for example, in speech signals (Abramson and Cohen, 2008), daily and monthly mean temperatures (Campbell and Diebold, 2005; Romilly, 2005; Huang, Shiu, and Lin, 2008), wind speeds (Ewing, Kruse, and Thompson, 2008), and atmospheric carbon dioxide concentrations (Hoti, McAleer, and Chan, 2005).

In this paper, we consider a rank-based technique for estimating GARCH model parameters, some of which are scale transformations of conventional GARCH parameters. The rank (R) estimators are obtained by minimizing the sum of mean-corrected model residuals weighted by a function of residual rank. They are similar to the R-estimators proposed by Jaeckel (1972) for estimating linear regression parameters, and can be used for GARCH order selection and preliminary estimation. As discussed in Jurečková and Sen (1996), R-estimators are, in general, robust and relatively efficient, and results in this paper indicate this is true in the case of GARCH estimation. The technique is robust because the R-estimators are $n^{1/2}$-consistent ($n$ represents sample size) under general conditions and, since the weight function can be chosen so that R-estimators have the same asymptotic efficiency as maximum likelihood (ML) estimators, it is also relatively efficient. In addition, R-estimation dominates traditional techniques such as Gaussian and Laplace quasi-ML (QML) with respect to asymptotic efficiency. We, therefore, recommend that R-estimation be used for preliminary GARCH estimation and order selection when the noise distribution is unknown. Once an R-estimate has been found, corresponding model residuals can be used to identify an appropriate noise distribution, and then the conventional GARCH parameters can be estimated via ML.

Another rank-based technique for estimating the parameters of conditionally heteroskedastic processes is given in Mukherjee (2007). However, the class of models considered in Mukherjee (2007) includes ARCH but not GARCH models, and $n^{1/2}$-consistency for the ARCH parameter estimates is established only when the noise distribution has a finite fourth moment. While it is traditional to assume the noise distribution is Gaussian when fitting an ARCH/GARCH model to an observed series, many series appear to have noise distributions that are heavier-tailed than Gaussian (Bollerslev, 1987; Engle and González-Rivera, 1991; Shephard, 1996; Fan and Yao, 2003). Given the many applications for GARCH models, it is, therefore, important that robust statistical theory be developed. In this paper, we consider both ARCH and GARCH models and show that, when the true parameter vector lies in the interior of the parameter space, higher-order moment conditions are not required for the $n^{1/2}$-consistency of R-estimators of ARCH model parameters.

The R-estimators are introduced in Section 2 and, in Section 3.1, we show that as sample size $n \to \infty$, they converge in distribution to the minimizer of a random quadratic function on a convex space. This limiting result holds when the parameter vector lies in the interior of its parameter space, in which case it follows that the estimators are asymptotically normal, and also when some GARCH parameters are zero, and hence the parameter vector lies on the boundary of its parameter space. In Section 3.2, we show the limiting distribution for the R-estimators can be used for GARCH order selection and confidence interval estimation. Proofs of the lemmas used to establish the results of Section 3 are in the Appendix. The quality of the asymptotic approximations for finite samples is studied via simulation in Section 4.1, and we use R-estimation to fit a GARCH model to exchange rate log-returns in Section 4.2.

2 Preliminaries

A series $\{X_t\}_{t=-\infty}^{\infty}$ is a GARCH(p,q) process if

$$X_t = \sigma_t Z_t, \tag{2.1}$$

where $\{Z_t\}$ is a sequence of independent and identically distributed (iid) random variables with $E\{Z_t\} = 0$ and $\mathrm{Var}\{Z_t\} = 1$, $\{\sigma_t\}$ is a non-negative process satisfying

$$\sigma_t^2 = \alpha_{00} + \sum_{i=1}^{p} \alpha_{0i} X_{t-i}^2 + \sum_{j=1}^{q} \beta_{0j} \sigma_{t-j}^2, \tag{2.2}$$

and $Z_t$ is independent of $\{X_{t-k}, k \ge 1\}$ for every $t$ (Bollerslev, 1986). The parameter $\alpha_{00}$ is positive, $\alpha_{0i}$, $i \in \{1,\ldots,p\}$, and $\beta_{0j}$, $j \in \{1,\ldots,q\}$, are non-negative, and $\sigma_t^2$ represents $\mathrm{Var}\{X_t \mid X_s, s < t\}$. When $q = 0$, $\{X_t\}$ is an ARCH(p) process (Engle, 1982).

In the case of a GARCH(1,1) process, $E\{\ln(\alpha_{01} Z_t^2 + \beta_{01})\} < 0$ is necessary and sufficient for the stationarity and ergodicity of $\{X_t\}$ (Nelson, 1990) and, for $p \ge 2$ and $q \ge 2$, a stationary, ergodic solution to (2.1)-(2.2) exists if and only if the top Lyapunov exponent associated with the $(p+q-1) \times (p+q-1)$ matrices $\{A_{0t}\}$, where

$$A_{0t} := \begin{pmatrix}
\alpha_{01}Z_t^2+\beta_{01} & \beta_{02} & \cdots & \beta_{0,q-1} & \beta_{0q} & \alpha_{02} & \cdots & \alpha_{0,p-1} & \alpha_{0p} \\
1 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0 & 0 & \cdots & 0 & 0 \\
Z_t^2 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & \cdots & 0 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 1 & 0
\end{pmatrix},$$

is negative (Bougerol and Picard, 1992). That is, $\{X_t\}$ is stationary and ergodic if and only if

$$\gamma_L := \inf_{1 \le t < \infty} \left[ (t+1)^{-1} E\{\ln \|A_{00} \cdots A_{0t}\|_{op}\} \right] < 0,$$

where, for a matrix $A$, $\|A\|_{op}$ represents the matrix operator norm $\sup_{x \ne 0} (\|Ax\| / \|x\|)$, and $\|\cdot\|$ is the Euclidean norm. Note that $\gamma_L$ can also be used to assess the stationarity/ergodicity of a GARCH process for which $p < 2$ or $q < 2$ by setting $\alpha_{0i} = 0$ for $p < i \le 2$ and $\beta_{0j} = 0$ for $q < j \le 2$. Additionally, if $\sum_{i=1}^{p} \alpha_{0i} + \sum_{j=1}^{q} \beta_{0j} < 1$, it can be shown that the process $\{X_t\}$ is not only stationary and ergodic, but also has finite variance $\alpha_{00} / (1 - \sum_{i=1}^{p} \alpha_{0i} - \sum_{j=1}^{q} \beta_{0j})$ (Bollerslev, 1986). We assume throughout that

A1 $\gamma_L < 0$,

so that $\{X_t\}$ is stationary and ergodic, but we do not assume the variance of $\{X_t\}$ is necessarily finite. Following Straumann (2005, page 76), so that the GARCH parameter values $\alpha_{00}, \alpha_{01}, \ldots, \alpha_{0p}, \beta_{01}, \ldots, \beta_{0q}$ are unique, we also assume

A2 there exists at least one $i > 0$ for which $\alpha_{0i} > 0$, $\alpha_{0p} + \beta_{0q} \ne 0$, and the polynomials $\alpha_0(x) := \sum_{i=1}^{p} \alpha_{0i} x^i$ and $\beta_0(x) := 1 - \sum_{j=1}^{q} \beta_{0j} x^j$ have no common roots.

If we assume

A3 $P(Z_t \ne 0) = 1$,

it follows from (2.1)-(2.2) that $P(\cap_{t=-\infty}^{\infty} \{X_t^2 > 0\}) = 1$. Therefore, we have $\ln(X_t^2) = \ln(\sigma_t^2/\alpha_{00}) + \ln(\alpha_{00} Z_t^2)$ or $\ln(\alpha_{00} Z_t^2) = \ln(X_t^2) - \ln(\sigma_t^2/\alpha_{00})$, with $\sigma_t^2/\alpha_{00} = 1 + \sum_{i=1}^{p} (\alpha_{0i}/\alpha_{00}) X_{t-i}^2 + \sum_{j=1}^{q} \beta_{0j} (\sigma_{t-j}^2/\alpha_{00})$, and, for arbitrary values $\alpha_0, \alpha_1, \ldots, \alpha_p, \beta_1, \ldots, \beta_q$, we define

$$\epsilon_t(\theta) = \ln(X_t^2) - \ln(\tilde\sigma_t^2(\theta)), \quad t = 1, \ldots, n, \tag{2.3}$$

with

$$\tilde\sigma_t^2(\theta) := \begin{cases} 1, & t = \min\{1, p-q+1\}, \ldots, p, \\ 1 + \sum_{i=1}^{p} \theta_i X_{t-i}^2 + \sum_{j=1}^{q} \theta_{p+j} \tilde\sigma_{t-j}^2(\theta), & t = p+1, \ldots, n, \end{cases} \tag{2.4}$$

and $\theta := (\alpha_1/\alpha_0, \ldots, \alpha_p/\alpha_0, \beta_1, \ldots, \beta_q)'$. Let $\theta_0 = (\alpha_{01}/\alpha_{00}, \ldots, \alpha_{0p}/\alpha_{00}, \beta_{01}, \ldots, \beta_{0q})'$ and $\epsilon_t = \ln(\alpha_{00} Z_t^2)$, and note that $\{\epsilon_t(\theta_0)\}_{t=p+1}^{n}$ closely approximates $\{\epsilon_t\}_{t=p+1}^{n}$; the error is due to the initialization with ones in (2.4). Since A1 implies $\sum_{j=1}^{q} \beta_{0j} < 1$ (Bougerol and Picard, 1992, Corollary 2.3), $\theta_0$ is in the parameter space $\Theta := [0, \infty)^p \times \{(x_1, \ldots, x_q) \in [0,1)^q : x_1 + \cdots + x_q < 1\}$. Now suppose

A4 $\lambda$ is a nonconstant and nondecreasing function from $(0,1)$ to $\mathbb{R}$.

For $\theta \in \Theta$, we introduce the R-function

$$D_n(\theta) = \sum_{t=p+1}^{n} \lambda\left( \frac{R_t(\theta)}{n-p+1} \right) [\epsilon_t(\theta) - \bar\epsilon(\theta)], \tag{2.5}$$

where $\{R_t(\theta)\}_{t=p+1}^{n}$ contains the ranks of the residuals $\{\epsilon_t(\theta)\}_{t=p+1}^{n}$ and $\bar\epsilon(\theta) := (n-p)^{-1} \sum_{t=p+1}^{n} \epsilon_t(\theta)$. $D_n$ is similar to the R-function introduced in Jaeckel (1972) for estimating linear regression parameters. We, however, consider a weighted sum of the mean-corrected $\{\epsilon_t(\theta)\}_{t=p+1}^{n}$, instead of a weighted sum of the residuals (as in Jaeckel, 1972), to avoid assuming $\sum_{t=p+1}^{n} \lambda((t-p)/(n-p+1)) = 0$, which is required in Jaeckel (1972). Note that, if $\{\epsilon_{(t)}(\theta)\}_{t=p+1}^{n}$ is the series $\{\epsilon_t(\theta)\}_{t=p+1}^{n}$ ordered from smallest to largest, (2.5) can also be written as $D_n(\theta) = \sum_{t=p+1}^{n} \lambda((t-p)/(n-p+1)) [\epsilon_{(t)}(\theta) - \bar\epsilon(\theta)]$ and, if $\bar\lambda := (n-p)^{-1} \sum_{t=p+1}^{n} \lambda((t-p)/(n-p+1))$, $D_n(\theta) = \sum_{t=p+1}^{n} [\lambda((t-p)/(n-p+1)) - \bar\lambda] [\epsilon_{(t)}(\theta) - \bar\epsilon(\theta)]$. Because it tends to be near zero when the elements of $\{\epsilon_t(\theta)\}_{t=p+1}^{n}$ are similar and gets larger as the values of $\{|\epsilon_{(t)}(\theta) - \bar\epsilon(\theta)|\}_{t=p+1}^{n}$ increase, $D_n$ is a measure of the dispersion of the residuals $\{\epsilon_t(\theta)\}_{t=p+1}^{n}$.

Given a realization of length $n$ from (2.1), $\{X_t\}_{t=1}^{n}$, we plan to estimate $\theta_0$ by minimizing $D_n$. Our motivation for using the residuals $\{\epsilon_t(\theta)\}_{t=p+1}^{n}$ is that, given an appropriately chosen loss function, M-estimation with respect to $\{\epsilon_t(\theta)\}_{t=p+1}^{n}$ can be equivalent to ML or QML estimation for GARCH model parameters (Muler and Yohai, 2008). Additionally, it is not possible to estimate $\theta_0$ by minimizing the dispersion of the residuals $\{X_t / \sqrt{\tilde\sigma_t^2(\theta)}\}_{t=p+1}^{n}$, since the values of $\{X_t / \sqrt{\tilde\sigma_t^2(\theta)}\}_{t=p+1}^{n}$ become more clustered about zero as the elements of $\theta$ increase. By Theorem 2.1, $D_n$ is a non-negative, continuous function on $\Theta$. Choices for the weight function $\lambda$ are discussed in Section 3.1.

Theorem 2.1 Assume A1-A4 hold. If, for $\theta \in \Theta$, $\{P_1(\theta), \ldots, P_{(n-p)!}(\theta)\} = \{\{\epsilon_{1,p+1}(\theta), \ldots, \epsilon_{1,n}(\theta)\}, \ldots, \{\epsilon_{(n-p)!,p+1}(\theta), \ldots, \epsilon_{(n-p)!,n}(\theta)\}\}$ contains the $(n-p)!$ permutations of the sequence $\{\epsilon_t(\theta)\}_{t=p+1}^{n}$ (so, for $j \in \{1, \ldots, (n-p)!\}$, $t \in \{p+1, \ldots, n\}$, $\epsilon_{j,t}(\theta)$ represents the $(t-p)$th element of permutation $P_j(\theta)$), then

$$D_n(\theta) = \sup_{j \in \{1, \ldots, (n-p)!\}} \sum_{t=p+1}^{n} \lambda\left( \frac{t-p}{n-p+1} \right) [\epsilon_{j,t}(\theta) - \bar\epsilon(\theta)].$$

In addition, $D_n$ is a non-negative, continuous function on $\Theta$.

Proof Recall that $D_n(\theta) = \sum_{t=p+1}^{n} [\lambda((t-p)/(n-p+1)) - \bar\lambda] [\epsilon_{(t)}(\theta) - \bar\epsilon(\theta)]$, and let $a_n(t) = \lambda((t-p)/(n-p+1)) - \bar\lambda$ and $z_t(\theta) = \epsilon_t(\theta) - \bar\epsilon(\theta)$. The results of this theorem follow from the proof of Theorem 1 in Jaeckel (1972), where properties are given for $\sum_{t=p+1}^{n} a_n(t) z_{(t)}(\theta)$.

We are, therefore, estimating $\theta_0 = (\alpha_{01}/\alpha_{00}, \ldots, \alpha_{0p}/\alpha_{00}, \beta_{01}, \ldots, \beta_{0q})'$ by minimizing the rank-based residual dispersion function (2.5). Other robust GARCH estimation techniques considered in the literature, M-estimation (Mukherjee, 2008; Muler and Yohai, 2008), least absolute deviations (Peng and Yao, 2003), and QML corresponding to noise distributions other than Gaussian (Berkes and Horváth, 2004), are also not used to directly estimate $\eta_0 := (\alpha_{00}, \alpha_{01}, \ldots, \alpha_{0p}, \beta_{01}, \ldots, \beta_{0q})'$. Those methods are used instead to estimate $(\alpha_{00}/c, \alpha_{01}/c, \ldots, \alpha_{0p}/c, \beta_{01}, \ldots, \beta_{0q})'$, where $c > 0$ is unknown when the noise distribution is unknown. Gaussian QML (Berkes, Horváth, and Kokoszka, 2003; Francq and Zakoïan, 2004) and ML (Berkes and Horváth, 2004) can be used to directly estimate $\eta_0$ but, because Gaussian QMLEs have a rate of convergence slower than $n^{1/2}$ when $E\{Z_t^4\} = \infty$ (Hall and Yao, 2003) and the noise distribution is unknown in practice, they are not robust techniques. Therefore, for estimating $\eta_0$, we recommend that R-estimation be used as a preliminary technique. Once an R-estimate $\hat\theta_R$ of $\theta_0$ has been found, an appropriate noise distribution can be identified from the residuals $\{X_t / \sqrt{\tilde\sigma_t^2(\hat\theta_R)}\}$, which resemble $\{\sqrt{\alpha_{00}} Z_t\}$ when $\hat\theta_R$ is near $\theta_0$, and $\eta_0$ can be estimated via ML.

3 Asymptotic Results

3.1 Limiting Distribution for R-Estimators

Let $f$ and $F$ denote the density and distribution functions for $\ln(Z_t^2)$. In order to obtain the limiting distribution for R-estimators of $\theta_0$, we make the following additional assumptions:

A5 $F$ is strictly increasing and differentiable on $\mathbb{R}$.

A6 $f$ is uniformly continuous on $\mathbb{R}$.

A7 The weight function $\lambda$ is bounded and left-continuous on $(0,1)$.

We also consider the following conditions:

A8 $\theta_{0i} > 0$ for all $i \in \{1, \ldots, p+q\}$.

A9 The set $\{j : \beta_{0j} > 0\} \ne \emptyset$, and $\prod_{i=1}^{j_0} \alpha_{0i} > 0$ for $j_0 := \min\{j : \beta_{0j} > 0\}$.

A10 $E X_t^4 < \infty$.

Finally, for values of $\theta = (\theta_1, \ldots, \theta_{p+q})' \in \Theta$, we define the series $\{\epsilon_t^*(\theta)\}_{t=-\infty}^{\infty}$ so that, for all $t$, $\epsilon_t^*(\theta) = \ln(X_t^2) - \ln(\sigma_t^{*2}(\theta))$ with

$$\sigma_t^{*2}(\theta) = 1 + \sum_{i=1}^{p} \theta_i X_{t-i}^2 + \sum_{j=1}^{q} \theta_{p+j} \sigma_{t-j}^{*2}(\theta).$$
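The residuals (2.3)-(2.4) and the dispersion function (2.5) of Section 2 can be sketched in a few lines of code. The following is a minimal illustration rather than the author's implementation: it uses only the Python standard library, assumes Gaussian noise for the simulated GARCH(1,1) series, and uses the weight $\lambda(u) = 2u - 1$ purely because it is a simple function satisfying A4. The function names (`simulate_garch11`, `residuals`, `dispersion`, `lam_wilcoxon`) and the burn-in length are hypothetical choices made for this sketch.

```python
import math
import random


def simulate_garch11(n, alpha0, alpha1, beta1, seed=0, burn=500):
    """Simulate X_t = sigma_t * Z_t from (2.1)-(2.2) with p = q = 1.

    Gaussian noise is an illustrative assumption, not a requirement of the method.
    """
    rng = random.Random(seed)
    sig2 = alpha0 / (1.0 - alpha1 - beta1)  # stationary variance as a start value
    xs = []
    for _ in range(n + burn):
        x = math.sqrt(sig2) * rng.gauss(0.0, 1.0)
        xs.append(x)
        sig2 = alpha0 + alpha1 * x * x + beta1 * sig2
    return xs[burn:]  # drop the burn-in so the start value has little effect


def residuals(theta, xs, p, q):
    """Return eps_t(theta) = ln(X_t^2) - ln(sigma_t^2(theta)) for t = p+1, ..., n."""
    n = len(xs)
    x2 = [x * x for x in xs]
    sig2 = [1.0] * n  # initialization with ones, as in (2.4)
    for t in range(p, n):  # zero-based index t corresponds to time t + 1
        arch = sum(theta[i] * x2[t - 1 - i] for i in range(p))
        # pre-sample values of sigma^2 (negative index) are also set to one
        garch = sum(
            theta[p + j] * (sig2[t - 1 - j] if t - 1 - j >= 0 else 1.0)
            for j in range(q)
        )
        sig2[t] = 1.0 + arch + garch
    return [math.log(x2[t]) - math.log(sig2[t]) for t in range(p, n)]


def dispersion(theta, xs, p, q, lam):
    """Rank-based dispersion D_n(theta) from (2.5)."""
    eps = residuals(theta, xs, p, q)
    m = len(eps)  # m = n - p residuals
    mean = sum(eps) / m
    order = sorted(range(m), key=lambda k: eps[k])
    ranks = [0] * m
    for r, k in enumerate(order, start=1):
        ranks[k] = r  # rank 1 goes to the smallest residual
    return sum(lam(ranks[k] / (m + 1.0)) * (eps[k] - mean) for k in range(m))


def lam_wilcoxon(u):
    """Weight 2u - 1: nonconstant and nondecreasing on (0, 1), as A4 requires."""
    return 2.0 * u - 1.0


xs = simulate_garch11(500, alpha0=0.01, alpha1=0.5, beta1=0.4)
print(dispersion([50.0, 0.4], xs, 1, 1, lam_wilcoxon))  # at theta_0 = (50, 0.4)
print(dispersion([5.0, 0.9], xs, 1, 1, lam_wilcoxon))   # at another point of Theta
```

Minimizing `dispersion` over $\Theta$ with a derivative-free optimizer would give an R-estimate; the optimizer is omitted here to keep the sketch short.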

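The sequence $\{\epsilon_t^*(\theta)\}_{t=p+1}^{n}$ is a stationary (stationarity follows from Berkes, Horváth, and Kokoszka, 2003, Section 2) approximation of the residuals $\{\epsilon_t(\theta)\}_{t=p+1}^{n}$. Note that $\{\sigma_t^{*2}(\theta_0)\} = \{\sigma_t^2/\alpha_{00}\}$ and $\{\epsilon_t^*(\theta_0)\} = \{\epsilon_t\}$, with $\{\sigma_t^2\}$ defined in (2.2) and $\epsilon_t = \ln(\alpha_{00} Z_t^2)$. Since $\sum_{j=1}^{q} \theta_{p+j} < 1$ for $\theta = (\theta_1, \ldots, \theta_{p+q})' \in \Theta$, $1/(1 - \theta_{p+1} x - \cdots - \theta_{p+q} x^q)$ has a Laurent series expansion of the form $\sum_{j=0}^{\infty} c_j(\theta) x^j$, where the coefficients $\{c_j(\theta)\}_{j=0}^{\infty}$ are geometrically decaying (Berkes, Horváth, and Kokoszka, 2003, Section 2). Because

$$\frac{\partial \sigma_t^{*2}(\theta)}{\partial \theta_i} = \begin{cases} X_{t-i}^2 + \sum_{j=1}^{q} \theta_{p+j} \left( \partial \sigma_{t-j}^{*2}(\theta) / \partial \theta_i \right), & i \in \{1, \ldots, p\}, \\ \sigma_{t+p-i}^{*2}(\theta) + \sum_{j=1}^{q} \theta_{p+j} \left( \partial \sigma_{t-j}^{*2}(\theta) / \partial \theta_i \right), & i \in \{p+1, \ldots, p+q\}, \end{cases}$$

it follows that the first partial derivatives of $\epsilon_t^*(\theta)$ are given by

$$\frac{\partial \epsilon_t^*(\theta)}{\partial \theta_i} = -\frac{1}{\sigma_t^{*2}(\theta)} \frac{\partial \sigma_t^{*2}(\theta)}{\partial \theta_i} = \begin{cases} -\left( \sum_{j=0}^{\infty} c_j(\theta) X_{t-i-j}^2 \right) / \sigma_t^{*2}(\theta), & i \in \{1, \ldots, p\}, \\ -\left( \sum_{j=0}^{\infty} c_j(\theta) \sigma_{t+p-i-j}^{*2}(\theta) \right) / \sigma_t^{*2}(\theta), & i \in \{p+1, \ldots, p+q\}. \end{cases} \tag{3.1}$$

When $\theta = \theta_0$,

$$\frac{\partial \epsilon_t^*(\theta_0)}{\partial \theta_i} = \begin{cases} -\alpha_{00} \left( \sum_{j=0}^{\infty} c_j(\theta_0) X_{t-i-j}^2 \right) / \sigma_t^2, & i \in \{1, \ldots, p\}, \\ -\left( \sum_{j=0}^{\infty} c_j(\theta_0) \sigma_{t+p-i-j}^2 \right) / \sigma_t^2, & i \in \{p+1, \ldots, p+q\} \end{cases}$$

(these are right derivatives when $\theta_{0i} = 0$), with $\sum_{j=0}^{\infty} c_j(\theta_0) x^j = 1/(1 - \beta_{01} x - \cdots - \beta_{0q} x^q)$. We are now able to give the limiting distribution for R-estimators of $\theta_0$.

Theorem 3.1 If A1-A7 and either A8, A9 or A10 hold, then there exists a sequence of minimizers $\hat\theta_R \in \Theta$ of $D_n(\cdot)$ in (2.5) such that

$$n^{1/2} (\hat\theta_R - \theta_0) \xrightarrow{d} \xi := \arg\min_{u \in \Lambda} (u - Y)' K \Gamma (u - Y), \tag{3.2}$$

where $\Lambda = \Lambda_1 \times \cdots \times \Lambda_{p+q}$, with $\Lambda_i = \mathbb{R}$ if $\theta_{0i} > 0$ and $\Lambda_i = [0, \infty)$ if $\theta_{0i} = 0$, $Y \sim N(0, \Sigma)$, $\Sigma := J K^{-2} \Gamma^{-1}$, $J := \int_0^1 \lambda^2(x)\,dx - \left( \int_0^1 \lambda(x)\,dx \right)^2$, $K := \int_{-\infty}^{\infty} f(x)\,d\lambda(F(x))$, and

$$\Gamma := E\left\{ \left[ \frac{\partial \epsilon_t^*(\theta_0)}{\partial \theta} - E\left\{ \frac{\partial \epsilon_t^*(\theta_0)}{\partial \theta} \right\} \right] \left[ \frac{\partial \epsilon_t^*(\theta_0)}{\partial \theta} - E\left\{ \frac{\partial \epsilon_t^*(\theta_0)}{\partial \theta} \right\} \right]' \right\}.$$

The limiting random vector $\xi$ is a unique, finite value almost surely.

Proof By Lemma 5.4 in the Appendix, $S_n(u) = D_n(\theta_0 + n^{-1/2} u) - D_n(\theta_0) \xrightarrow{d} S(u) = u'N + K u' \Gamma u / 2$ on $C(\Lambda)$, where $N \sim N(0, J\Gamma)$ and $C(\Lambda)$ is the space of continuous functions on $\Lambda$ where convergence is equivalent to uniform convergence on every compact subset. The derivatives $\partial \epsilon_t^*(\theta_0) / \partial \theta_i$, $i \in \{1, \ldots, p+q\}$, have finite second moments (in the case of A8 this follows from the proof of Theorem 2.2(i) in Francq and Zakoïan, 2004, and in the case of A9 or A10 this follows from the proof of Lemma 8 in Francq and Zakoïan, 2007) so the matrix $\Gamma$ exists, and, following the proof of Theorem 2.2(ii) in Francq and Zakoïan (2004), $\Gamma$ is positive definite. Consequently, since $K > 0$ and $\Lambda$ is a convex space, $S(u)$ has a unique minimum on $\Lambda$ almost surely. Because $n^{1/2}(\hat\theta_R - \theta_0)$ minimizes $S_n(u)$, it follows from the proof of Lemma 2.2 and Remark 1 in Davis, Knight, and Liu (1992) that $n^{1/2}(\hat\theta_R - \theta_0) \xrightarrow{d} \arg\min_{u \in \Lambda} S(u)$. The result of this theorem holds since $N \stackrel{d}{=} -K \Gamma Y$, which implies that $\arg\min_{u \in \Lambda} S(u) \stackrel{d}{=} \arg\min_{u \in \Lambda} (Y' K \Gamma Y - 2 u' K \Gamma Y + u' K \Gamma u) = \arg\min_{u \in \Lambda} (u - Y)' K \Gamma (u - Y)$.

If all parameter values are positive (i.e., A8 holds), it follows from Theorem 3.1 that $\hat\theta_R$ is asymptotically normal.

Corollary 3.1 If A1-A8 hold, then

$$n^{1/2} (\hat\theta_R - \theta_0) \xrightarrow{d} Y \sim N(0, \Sigma). \tag{3.3}$$

Proof This result follows from (3.2), since $K\Gamma$ is positive definite and, under A8, $\Lambda = \mathbb{R}^{p+q}$.

Remark 1: Assumptions A5 and A6 are mild. They hold, for example, if $Z_t$ has a Laplace, logistic, N(0,1), or rescaled Student's t (rescaled to have unit variance) distribution.

Remark 2: As discussed in the proof of Theorem 3.1, given A8, A9 or A10, the derivatives $\partial \epsilon_t^*(\theta_0) / \partial \theta_i$, $i \in \{1, \ldots, p+q\}$, have finite second moments. Since A8 is required for GARCH parameter estimates to be asymptotically unbiased and normal, it is a standard assumption in the literature. Conditions A9 and A10 are similar to assumptions introduced by Francq and Zakoïan (2007), who derived the limiting distribution for Gaussian QMLEs when some GARCH parameter values may be zero. In the absence of A8 and A9, $E X_t^6 < \infty$, instead of A10 ($E X_t^4 < \infty$), is required for the $n^{1/2}$-consistency of the Gaussian QMLEs, however (Francq and Zakoïan, 2007). Note that $\alpha_{01} > 0$ and $\beta_{01} > 0$ are sufficient for A9.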
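The constant $J$ in Theorem 3.1 depends only on the weight function $\lambda$, so it can be evaluated before any data are seen. As a small numerical sketch (not part of the original paper), the code below approximates $J = \int_0^1 \lambda^2(x)\,dx - (\int_0^1 \lambda(x)\,dx)^2$ with a midpoint rule. The weight $\lambda(u) = 2u - 1$ is an illustrative assumption chosen because the exact answer, $1/3$, is easy to verify by hand; the function name `j_constant` and the grid size are likewise hypothetical.

```python
def j_constant(lam, grid_size=100_000):
    """Midpoint-rule approximation of J = int_0^1 lam^2 - (int_0^1 lam)^2."""
    total = 0.0
    total_sq = 0.0
    for k in range(grid_size):
        u = (k + 0.5) / grid_size  # midpoint of the k-th subinterval of (0, 1)
        v = lam(u)
        total += v
        total_sq += v * v
    mean = total / grid_size
    return total_sq / grid_size - mean * mean


def lam_wilcoxon(u):
    """Illustrative bounded, nondecreasing weight on (0, 1)."""
    return 2.0 * u - 1.0


print(j_constant(lam_wilcoxon))  # close to the exact value 1/3
```

A midpoint rule is adequate here because A7 requires $\lambda$ to be bounded; an unbounded weight would need a more careful quadrature near the endpoints.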

Remark 3: When a parameter vector may lie on a boundary of its parameter space, it is standard for estimators to converge in distribution to the minimizer of a random quadratic function on a convex space (D.W.K. Andrews, 1999). Francq and Zakoïan (2007) show this is true for Gaussian QMLEs of GARCH model parameters. The form of the limiting distribution for $\hat\theta_R$ in (3.2) is, therefore, to be expected. Let the vector $(Y_1, \ldots, Y_{p+q})'$ contain the Gaussian random elements of $Y \sim N(0, \Sigma)$. If only the $j$th element of $\theta_0$, $\theta_{0j}$, equals zero and all other elements of $\theta_0$ are positive, it follows from (3.2) that $n^{1/2}(\hat\theta_{j,R} - \theta_{0j}) = n^{1/2} \hat\theta_{j,R} \xrightarrow{d} Y_j I\{Y_j \ge 0\}$ ($I\{\cdot\}$ represents the indicator function), so $\hat\theta_{j,R}$ is asymptotically half-normal in this case.

Remark 4: If $f$, the density function for $\ln(Z_t^2)$, is almost everywhere differentiable on $\mathbb{R}$, using integration by parts, it can be shown that $K = -\int_{-\infty}^{\infty} f'(x) \lambda(F(x))\,dx = -\int_0^1 [f'(F^{-1}(x)) / f(F^{-1}(x))] \lambda(x)\,dx$. In practice, these integrals can be easier to evaluate than $\int_{-\infty}^{\infty} f(x)\,d\lambda(F(x))$.

Remark 5: Let $f_Z$ represent the density function for the iid noise process $\{Z_t\}$ and, for $\eta = (\eta_0, \eta_1, \ldots, \eta_{p+q})' \in (0, \infty) \times \Theta$, let $\epsilon_t(\eta) = \ln(X_t^2) - \ln(\sigma_t^2(\eta))$ with $\sigma_t^2(\eta) = \eta_0 + \sum_{i=1}^{p} \eta_i X_{t-i}^2 + \sum_{j=1}^{q} \eta_{p+j} \sigma_{t-j}^2(\eta)$ for all $t$. From Berkes and Horváth (2004), under general conditions which include A8, ML estimators of $\eta_0 = (\alpha_{00}, \alpha_{01}, \ldots, \alpha_{0p}, \beta_{01}, \ldots, \beta_{0q})'$ are asymptotically normal with mean $\eta_0$ and covariance matrix $4 n^{-1} \tau^2 A^{-1}$, where $\tau^2 = (E\{[1 + Z_t f_Z'(Z_t) / f_Z(Z_t)]^2\})^{-1}$ and $A = E\{[\partial \epsilon_t(\eta_0) / \partial \eta][\partial \epsilon_t(\eta_0) / \partial \eta]'\}$. Using the delta method (see, for example, Rao, 1973), the corresponding estimators of $\theta_0$, $\hat\theta_{ML} := (\hat\alpha_{1,ML}/\hat\alpha_{0,ML}, \ldots, \hat\alpha_{p,ML}/\hat\alpha_{0,ML}, \hat\beta_{1,ML}, \ldots, \hat\beta_{q,ML})'$, are asymptotically Gaussian with mean $\theta_0$ and covariance matrix $4 n^{-1} \tau^2 B A^{-1} B'$, where $B$ is the $(p+q) \times (p+q+1)$ Jacobian matrix

$$B = \begin{pmatrix}
-\alpha_{01}/\alpha_{00}^2 & 1/\alpha_{00} & 0 & \cdots & 0 & 0 & \cdots & 0 \\
-\alpha_{02}/\alpha_{00}^2 & 0 & 1/\alpha_{00} & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
-\alpha_{0p}/\alpha_{00}^2 & 0 & 0 & \cdots & 1/\alpha_{00} & 0 & \cdots & 0 \\
0 & 0 & 0 & \cdots & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 0 & 0 & \cdots & 1
\end{pmatrix}.$$

For any $p$ and $q$, matrix algebra can be used to show that $B A^{-1} B' = \Gamma^{-1}$ and, therefore, under assumptions A1-A8, the asymptotic relative efficiency (ARE) for R-estimation with respect to ML is $4 \tau^2 J^{-1} K^2$. If the weight function $\lambda$ is proportional to $-f'(F^{-1}(x)) / f(F^{-1}(x))$, then $J^{-1} K^2 = E\{[f'(\ln(Z_t^2)) / f(\ln(Z_t^2))]^2\}$ (Jurečková and Sen, 1996, Section 3.4). In addition, since $f(x) = e^{x/2} f_Z(e^{x/2})$ when the distribution for $Z_t$ is symmetric about zero, it can be shown that $f'(\ln(Z_t^2)) / f(\ln(Z_t^2)) = [1 + Z_t f_Z'(Z_t) / f_Z(Z_t)] / 2$ in the symmetric case. Consequently, when the distribution for $Z_t$ is symmetric about zero and a weight function $\lambda_f(x) \propto -f'(F^{-1}(x)) / f(F^{-1}(x))$ is used, $J^{-1} K^2 = E\{[1 + Z_t f_Z'(Z_t) / f_Z(Z_t)]^2\} / 4 = (4\tau^2)^{-1}$ and, if A1-A8 hold, R-estimation has the same asymptotic efficiency as ML.

Remark 6: When $Z_t \sim N(0,1)$, the optimal weight function $\lambda_f(x) \propto [\Phi^{-1}((x+1)/2)]^2 - 1$, where $\Phi$ represents the standard normal distribution function. Hence, $J^{-1} K^2 = (4\tau^2)^{-1}$ when $Z_t \sim N(0,1)$ and $\lambda(x) = [\Phi^{-1}((x+1)/2)]^2 - 1$. However, the function $\lambda_N(x) := [\Phi^{-1}((x+1)/2)]^2 - 1$ does not satisfy assumption A7, since it is unbounded as $x \to 1$, so limiting results (3.2) and (3.3) do not necessarily hold for R-estimates obtained using $\lambda_N$. Bounded weight functions closely approximating $\lambda_N$ which do satisfy the assumptions can, however, be found. For example, let $\lambda_{m,N}(x) = \lambda_N(x) I\{0 < x \le 1 - 1/m\} + \lambda_N(1 - 1/m) I\{x > 1 - 1/m\}$, with $m \ge 2$. This weight function satisfies assumptions A4 and A7 and, as $m \to \infty$, $\lambda_{m,N}$ converges pointwise to $\lambda_N$ on $(0,1)$. In addition, $J_m := \int_0^1 \lambda_{m,N}^2(x)\,dx - (\int_0^1 \lambda_{m,N}(x)\,dx)^2 \to \int_0^1 \lambda_N^2(x)\,dx - (\int_0^1 \lambda_N(x)\,dx)^2$ and $K_m := \int_{-\infty}^{\infty} f(x)\,d\lambda_{m,N}(F(x)) \to \int_{-\infty}^{\infty} f(x)\,d\lambda_N(F(x))$. Therefore, in the case of Gaussian noise, large $m \ge 2$ can be chosen so that R-estimation with weight function $\lambda_{m,N}$ has essentially the same asymptotic efficiency as ML.

Gaussian ML estimators of GARCH model parameters are also consistent when the noise distribution is non-Gaussian (Francq and Zakoïan, 2004), and when all parameter values are positive and $E\{Z_t^4\} < \infty$, Gaussian QMLEs of $\eta_0$ are asymptotically normal with mean $\eta_0$ and covariance matrix $n^{-1} (E\{Z_t^4\} - 1) A^{-1}$ (Berkes and Horváth, 2004; Francq and Zakoïan, 2004). It follows that the corresponding estimators of $\theta_0$, $\hat\theta_{GML} := (\hat\alpha_{1,GML}/\hat\alpha_{0,GML}, \ldots, \hat\alpha_{p,GML}/\hat\alpha_{0,GML}, \hat\beta_{1,GML}, \ldots, \hat\beta_{q,GML})'$, have a limiting normal distribution with mean $\theta_0$ and covariance matrix $n^{-1} (E\{Z_t^4\} - 1) B A^{-1} B' = n^{-1} (E\{Z_t^4\} - 1) \Gamma^{-1}$, so the ARE for R-estimation with respect to Gaussian QML is $(E\{Z_t^4\} - 1) J^{-1} K^2$. The rate of convergence for Gaussian QMLEs is slower than $n^{1/2}$ when $E\{Z_t^4\} = \infty$, however (Hall and Yao, 2003). Since $E\{Z_t^4\} < \infty$ is not required for the $n^{1/2}$-consistency of $\hat\theta_R$, R-estimation is more robust than traditional Gaussian QML.

Table 3.1: AREs for R-estimation with $\lambda(x) \approx \lambda_N(x)$ with respect to Gaussian QML

Distribution for Noise $\{Z_t\}$        ARE
Laplace                                  1.221
logistic                                 1.118
N(0,1)                                   1.000
$t_3$                                    $\infty$
rescaled $t$, other degrees of freedom   ...

In Table 3.1 we give the values of ARE $(E\{Z_t^4\} - 1) J^{-1} K^2$ (rounded to the nearest three decimal places) when the weight function is $\lambda_{m,N}$ with $m$ large (i.e., $\lambda(x) \approx \lambda_N(x)$) and the noise distribution is Laplace, logistic, N(0,1), and rescaled $t$ with various degrees of freedom. Note that, in the case of rescaled $t_3$ noise, we have an ARE of $\infty$, since in this case $E\{Z_t^4\} = \infty$ and so Gaussian QMLEs have a rate of convergence slower than $n^{1/2}$. Since all AREs in Table 3.1 are greater than or equal to one, with equality only when the noise distribution is Gaussian, R-estimation is not only more robust than Gaussian QML, but also tends to be more efficient. In the case of R-estimation for linear model parameters, R-estimation with the weight function that is optimal for Gaussian noise is always at least as asymptotically efficient as Gaussian QML (Chernoff and Savage, 1958; Gastwirth and Wolff, 1968; see Hettmansperger and McKean, 1998, for discussion). As can be seen in Table 3.1, this is true in the case of GARCH parameter estimation for commonly used noise distributions, but it is possible for $(E\{Z_t^4\} - 1) J^{-1} K^2$ to be much less than one when $\lambda(x) \propto [\Phi^{-1}((x+1)/2)]^2 - 1$, however (for example, when $\ln(Z_t^2) \sim N(\mu = -0.125, \sigma^2 = 0.25)$, $(E\{Z_t^4\} - 1) J^{-1} K^2 = 0.787$).

Remark 7: R-estimation is also relatively efficient when compared to more robust GARCH estimation techniques. To demonstrate, in Table 3.2, we give AREs for R-estimation with weight function $\lambda_{t_7}(x) := [7\{F_{t_7}^{-1}((x+1)/2)\}^2 - 5] / [\{F_{t_7}^{-1}((x+1)/2)\}^2 + 5]$, where $F_{t_7}$ represents the distribution function for rescaled $t_7$ noise, with respect to (A) Laplace QML (Berkes and Horváth, 2004; Mukherjee, 2008), (B) rescaled Student's $t_7$ QML (Muler and Yohai, 2008), and (C) the least absolute deviations estimation technique proposed by Peng and Yao (2003) involving log-transformed GARCH residuals.

Table 3.2: AREs for R-estimation with weight function $\lambda_{t_7}$ with respect to (A) Laplace QML, (B) rescaled Student's $t_7$ QML, (C) least absolute deviations, and (D) Gaussian QML. Rows: Laplace, logistic, N(0,1), and rescaled $t$ noise with various degrees of freedom; columns: (A), (B), (C), (D).

The weight function $\lambda_{t_7}$ is optimal when the noise distribution is rescaled Student's $t_7$ and satisfies assumptions A4 and A7. Under general conditions, which include A8, estimators (A)-(C) are asymptotically normal and $n^{1/2}$-consistent for $(\alpha_{00}/c, \alpha_{01}/c, \ldots, \alpha_{0p}/c, \beta_{01}, \ldots, \beta_{0q})'$, for different values of $c > 0$. These techniques can, therefore, also be used to obtain $n^{1/2}$-consistent and asymptotically normal estimators of $\theta_0$. For the noise distributions considered in Table 3.2, R-estimation with weight function $\lambda_{t_7}$ is more efficient than Laplace QML for all distributions except Laplace, and R-estimation uniformly dominates rescaled Student's $t_7$ QML and least absolute deviations with respect to asymptotic efficiency. AREs for R-estimation with weight function $\lambda_{t_7}$ with respect to Gaussian QML are also given in Table 3.2 (column (D)). Since, when compared to other techniques, R-estimation with weight function $\lambda_{t_7}$ performs well for light, medium, and heavier-tailed noise distributions, we recommend that it or R-estimation with a similar weight function be used in practice when the noise distribution is unknown. The weight function $\lambda_{t_7}$ is plotted in Figure 3.1, along with $\lambda_N$. Note that $\lambda_{t_7}(x)$ and $\lambda_N(x)$ are fairly similar except near $x = 1$.

Figure 3.1: The weight functions $\lambda_{t_7}$, $\lambda_N$, and Wilcoxon weight function $\lambda_W$

Remark 8: The Wilcoxon weight function $\lambda_W(x) = 2x - 1$ is also shown in Figure 3.1. When estimating linear regression or linear time series model parameters via R-estimation, it is generally the recommended weight function when the noise distribution is unknown, since the corresponding R-estimators tend to be relatively efficient (see, for example, Hettmansperger and McKean, 1998, Andrews, Davis, and Breidt, 2007, and Andrews, 2008). In the case of linear model estimation, $\lambda_W$ is the optimal weight function when the noise distribution is logistic and, for R-estimation of GARCH model parameters, $\lambda_W$ is optimal when $\ln(Z_t^2)$ is logistic. R-estimation with the Wilcoxon function is not relatively efficient for GARCH models with commonly used noise distributions, however. To demonstrate, in Table 3.3, we give AREs for R-estimation with the Wilcoxon function with respect to R-estimation with weight function $\lambda_{t_7}$. All AREs are less than one.

Table 3.3: AREs for R-estimation with the Wilcoxon function with respect to R-estimation with weight function $\lambda_{t_7}$

Distribution for Noise $\{Z_t\}$        ARE
Laplace                                  0.760
logistic                                 0.736
N(0,1)                                   0.634
rescaled $t$, various degrees of freedom ...

Remark 9: For estimating $\eta_0 = (\alpha_{00}, \alpha_{01}, \ldots, \alpha_{0p}, \beta_{01}, \ldots, \beta_{0q})'$, an alternative to using Gaussian QML or ML with a noise distribution resembling the empirical distribution for R-estimation residuals is to estimate $\alpha_{00}$ via $\hat\alpha_0 := n^{-1} \sum_{t=1}^{n} \exp(\epsilon_t(\hat\theta_R))$ and then let $\hat\eta = (\hat\alpha_0, \hat\alpha_0 \hat\theta_{1,R}, \ldots, \hat\alpha_0 \hat\theta_{p,R}, \hat\theta_{p+1,R}, \ldots, \hat\theta_{p+q,R})'$. Since

$$\hat\alpha_0 = n^{-1} \sum_{t=1}^{n} \exp(\epsilon_t(\hat\theta_R)) = n^{-1} \sum_{t=1}^{n} \frac{X_t^2}{\tilde\sigma_t^2(\hat\theta_R)} = n^{-1} \sum_{t=1}^{n} \alpha_{00} Z_t^2 \, \frac{\sigma_t^2/\alpha_{00}}{\tilde\sigma_t^2(\hat\theta_R)} \tag{3.4}$$

and, using the proof of Lemma 5.8 in Berkes, Horváth, and Kokoszka (2003), it can be shown that the right-hand side of (3.4) equals $n^{-1} \sum_{t=1}^{n} \alpha_{00} Z_t^2 + o_p(1)$, $\hat\alpha_0 \xrightarrow{P} \alpha_{00}$. By Theorem 3.1, $\hat\theta_R \xrightarrow{P} \theta_0 = (\alpha_{01}/\alpha_{00}, \ldots, \alpha_{0p}/\alpha_{00}, \beta_{01}, \ldots, \beta_{0q})'$, and so $\hat\eta$ is consistent for $\eta_0$. From Theorem 3.1 and Corollary 3.1, we have the limiting distributions for $\hat\theta_{i,R}$, $i = p+1, \ldots, p+q$, and when $E\{Z_t^4\} < \infty$, it can be shown that $n^{1/2}(\hat\alpha_0 - \alpha_{00}) \xrightarrow{d} \alpha_{00} (\xi' E\{\partial \epsilon_t^*(\theta_0) / \partial \theta\} + \tilde{Y})$ and, for $i \in \{1, \ldots, p\}$, $n^{1/2}(\hat\alpha_0 \hat\theta_{i,R} - \alpha_{0i}) \xrightarrow{d} \alpha_{00} \xi_i + \alpha_{0i} (\xi' E\{\partial \epsilon_t^*(\theta_0) / \partial \theta\} + \tilde{Y})$, with the random vector $\xi$ as defined in equation (3.2), $\tilde{Y} \sim N(0, \mathrm{Var}\{Z_t^2\})$ independent of $\xi$, and $\xi_i$ denotes the $i$th element of $\xi$. It follows that, if A1-A8 hold and $E\{Z_t^4\} < \infty$, then $n^{1/2}(\hat\alpha_0 - \alpha_{00}) \xrightarrow{d} N(0, \alpha_{00}^2 [(E\{\partial \epsilon_t^*(\theta_0) / \partial \theta\})' \Sigma E\{\partial \epsilon_t^*(\theta_0) / \partial \theta\} + \mathrm{Var}\{Z_t^2\}])$ and, for $i \in \{1, \ldots, p\}$, $n^{1/2}(\hat\alpha_0 \hat\theta_{i,R} - \alpha_{0i}) \xrightarrow{d} N(0, C_i' \Sigma C_i + \alpha_{0i}^2 \mathrm{Var}\{Z_t^2\})$, where

$$C_i := (\underbrace{0, \ldots, 0}_{i-1 \text{ times}}, \alpha_{00}, \underbrace{0, \ldots, 0}_{p+q-i \text{ times}})' + \alpha_{0i} E\left\{ \frac{\partial \epsilon_t^*(\theta_0)}{\partial \theta} \right\}.$$

However, as is the case for Gaussian QMLEs of $\eta_0$, when $E\{Z_t^4\} = \infty$, the estimators $\hat\alpha_0$ and $\hat\alpha_0 \hat\theta_{i,R}$, $i \in \{1, \ldots, p\}$, can be shown to have a rate of convergence slower than $n^{1/2}$, so this is not a robust technique.

3.2 Order Selection and Interval Estimation

To use Theorem 3.1 to make inferences about $\theta_0$, estimates of $K$ and $\Gamma$ are needed. Since $\lambda$ is determined by the practitioner, $J$ is known. For estimating $\Gamma$, we propose using

$$\hat\Gamma_n := \frac{1}{n} \sum_{t=p+1}^{n} \left[ \frac{\partial \epsilon_t(\hat\theta_R)}{\partial \theta} - \frac{1}{n} \sum_{s=p+1}^{n} \frac{\partial \epsilon_s(\hat\theta_R)}{\partial \theta} \right] \left[ \frac{\partial \epsilon_t(\hat\theta_R)}{\partial \theta} - \frac{1}{n} \sum_{s=p+1}^{n} \frac{\partial \epsilon_s(\hat\theta_R)}{\partial \theta} \right]',$$

with $\{\epsilon_t(\theta)\}$ as defined in (2.3). Since $\hat\theta_R \xrightarrow{P} \theta_0$, by the proofs of Lemmas 11 and 12 in Francq and Zakoïan (2007), $\hat\Gamma_n \xrightarrow{P} \Gamma$. A consistent estimator for $K$ is given in the following theorem.

Theorem 3.2 Consider the empirical distribution function $\hat{F}_n(x) := (n-p)^{-1} \sum_{t=p+1}^{n} I\{\epsilon_t(\hat\theta_R) \le x\}$ and the kernel density estimator

$$\hat{f}_n(x) := \frac{1}{b_n n} \sum_{t=p+1}^{n} \kappa\left( \frac{\epsilon_t(\hat\theta_R) - x}{b_n} \right), \tag{3.5}$$

where $\kappa$ is a uniformly continuous, differentiable kernel density function on $\mathbb{R}$ such that $\kappa'$ is uniformly continuous on $\mathbb{R}$ and $\int_{-\infty}^{\infty} |x \ln|x||^{1/2} |\kappa'(x)|\,dx < \infty$, and the bandwidth sequence $\{b_n\}$ is chosen so that $b_n \xrightarrow{P} 0$ and $b_n^2 n^{1/2} \xrightarrow{P} \infty$ as $n \to \infty$. If A1-A7 and either A8, A9 or A10 hold, then

$$\hat{K}_n := \int_{-\infty}^{\infty} \hat{f}_n(x)\,d\lambda(\hat{F}_n(x)) = \sum_{t=p+1}^{n} \hat{f}_n(\epsilon_{(t)}(\hat\theta_R)) \left[ \lambda\left( \frac{t-p}{n-p} \right) - \lambda\left( \frac{t-p-1}{n-p} \right) \right] \xrightarrow{P} K.$$

Proof Let $f_\epsilon$ and $F_\epsilon$ denote the density and distribution functions for $\epsilon_t = \ln(\alpha_{00} Z_t^2)$. By Lemma 5.5 in the Appendix, $\sup_{x \in \mathbb{R}} |\hat{f}_n(x) - f_\epsilon(x)| \xrightarrow{P} 0$ and, using the Glivenko-Cantelli Theorem, $\sup_{x \in \mathbb{R}} |\hat{F}_n(x) - F_\epsilon(x)| \xrightarrow{P} 0$. As a result, $\hat{K}_n \xrightarrow{P} \int_{-\infty}^{\infty} f_\epsilon(x)\,d\lambda(F_\epsilon(x))$ by the proof of Theorem 4.5.3 in Koul (2002). Since $f_\epsilon(x) = f(x - \ln(\alpha_{00}))$ and $F_\epsilon(x) = F(x - \ln(\alpha_{00}))$, $K = \int_{-\infty}^{\infty} f_\epsilon(x)\,d\lambda(F_\epsilon(x))$, and so the proof is complete.

It follows that $\hat\Sigma_n := J \hat{K}_n^{-2} \hat\Gamma_n^{-1}$ is consistent for $\Sigma = J K^{-2} \Gamma^{-1}$. Note that the Gaussian and Student's t densities satisfy the conditions for the kernel density function $\kappa$ in Theorem 3.2.

For GARCH order selection, we can test null hypotheses of the form

$$H_0 : \theta_{0 i_1} = \cdots = \theta_{0 i_m} = 0, \tag{3.6}$$

with $1 \le i_1 < \cdots < i_m \le p+q$. Following Corollary 3.1, a corresponding Wald test statistic is given by

$$W_n := n (\mathbf{W} \hat\theta_R)' (\mathbf{W} \hat\Sigma_n \mathbf{W}')^{-1} \mathbf{W} \hat\theta_R, \tag{3.7}$$

where $\mathbf{W} = [w_{j,k}]$ is the $m \times (p+q)$ matrix with $w_{l, i_l} = 1$ for $l \in \{1, \ldots, m\}$ and $w_{j,k} = 0$ otherwise. The limiting distribution for $W_n$ under $H_0$ is given in the following theorem. This limiting distribution has a simple form when we are testing the nullity of just one parameter.

Theorem 3.3 Under (3.6), if A1-A7 and either A9 or A10 hold, then $W_n \xrightarrow{d} W := \xi' \mathbf{W}' (\mathbf{W} \Sigma \mathbf{W}')^{-1} \mathbf{W} \xi$ as $n \to \infty$, with $\xi$ as defined in (3.2). If $m = 1$ and $\theta_{0j} > 0$ for $j \ne i_1$, then $W \stackrel{d}{=} V^2 I\{V \ge 0\}$ with $V \sim N(0,1)$.
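For the single-parameter case of Theorem 3.3, a critical value follows directly from the limiting law: if $W \stackrel{d}{=} V^2 I\{V \ge 0\}$ with $V \sim N(0,1)$, then for $c > 0$, $P(W > c) = P(V > \sqrt{c}) = 1 - \Phi(\sqrt{c})$, so a level-$\alpha$ test with $\alpha < 1/2$ uses $c = [\Phi^{-1}(1-\alpha)]^2$, which is the $1 - 2\alpha$ quantile of the chi-squared distribution with one degree of freedom. The sketch below is an illustration under stated assumptions, not the author's code: the function names are hypothetical, the numerical inputs are made up solely to exercise the functions, and the Monte Carlo step only checks the level under the limiting distribution.

```python
import random
from statistics import NormalDist


def wald_single(theta_hat_i, sigma_hat_ii, n):
    """Wald statistic (3.7) when m = 1: n * theta_hat_i^2 / Sigma_hat_ii."""
    return n * theta_hat_i ** 2 / sigma_hat_ii


def critical_value(alpha):
    """Value c with P(V^2 I{V >= 0} > c) = alpha for V ~ N(0,1), alpha < 1/2."""
    return NormalDist().inv_cdf(1.0 - alpha) ** 2


# Hypothetical inputs, chosen only to exercise the functions.
w_n = wald_single(theta_hat_i=0.05, sigma_hat_ii=0.25, n=1000)
c = critical_value(0.05)
print(w_n, c, w_n > c)

# Monte Carlo check of the level under the limiting law V^2 I{V >= 0}.
rng = random.Random(1)
draws = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
rate = sum(1 for v in draws if v >= 0 and v * v > c) / len(draws)
print(rate)  # should be near alpha = 0.05
```

Using the usual $1 - \alpha$ chi-squared quantile instead would give a test with level $\alpha/2$, since half of the limiting mass sits at zero.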

Proof. By Theorem 3.1, under these conditions, $n^{1/2} W\hat\theta_R \xrightarrow{d} W\xi$. Since $\hat\Sigma_n \xrightarrow{P} \Sigma$, it follows that $W_n \xrightarrow{d} W$. If $m = 1$ and $\theta_{0j} > 0$ for $j \ne i_1$, then, following Remark 3, $W\xi = \xi_{i_1} = Y_{i_1} I\{Y_{i_1} \ge 0\}$ with $Y_{i_1} \sim N(0, \Sigma_{i_1,i_1})$ ($\Sigma_{i_1,i_1}$ represents element $i_1,i_1$ of $\Sigma$), so $W = \Sigma_{i_1,i_1}^{-1} Y_{i_1}^2 I\{Y_{i_1} \ge 0\} \overset{d}{=} V^2 I\{V \ge 0\}$.

Therefore, assuming $\theta_{0j} > 0$ for $j \ne i_1$, we can reject the null hypothesis $H_0: \theta_{0i_1} = 0$ at level of significance $\alpha < 1/2$ if $W_n > \chi^2_{1,1-2\alpha}$, where $\chi^2_{1,1-2\alpha}$ represents the $1-2\alpha$ quantile of the chi-squared distribution with one degree of freedom. Finding a critical value for testing the nullity of multiple parameters $\theta_{0i_1},\ldots,\theta_{0i_m}$ is not as simple since, in general, the distribution of the limiting random variable $W$ under $H_0$ depends on the values of the other model parameters and on the distribution of the iid noise $\{Z_t\}$. However, one can examine the distribution of $W_n$ under $H_0$ in practice by simulating GARCH series of length $n$ with parameter vector $\theta = (\theta_1,\ldots,\theta_{p+q})$, with $\theta_j = 0$ for $j \in \{i_1,\ldots,i_m\}$ and $\theta_j = \hat\theta_{j,R}$ otherwise, and iid noise following the empirical distribution of the R-estimation residuals, and then obtaining the corresponding values of $W_n$.

Test statistics for testing the nullity of GARCH model parameters using Gaussian QMLEs are considered in Francq and Zakoïan (2009). The limiting results in Francq and Zakoïan (2009) require $E\{Z_t^4\} < \infty$, however. In contrast, when A9 holds, higher-order moment conditions are not required for the limiting result in Theorem 3.3.

Although parameter estimation for the ARCH($p$) model with $\theta_{01} = \cdots = \theta_{0p} = 0$ is not considered in this paper since A2 is not satisfied, our results can be used to show that, under $H_0: \theta_{01} = \cdots = \theta_{0p} = 0$, if $E\{Z_t^4\} < \infty$, then $W_n \xrightarrow{d} W = \xi'\Sigma^{-1}\xi$, where, in this case, $\xi = (Y_1 I\{Y_1 \ge 0\},\ldots,Y_p I\{Y_p \ge 0\})'$, $Y = (Y_1,\ldots,Y_p)' \sim N(0,\Sigma)$, and $\Sigma = J K^{-2} (\mathrm{Var}\{Z_t^2\})^{-1} I$ ($I$ represents the identity matrix). It follows that, under these conditions, $W_n \xrightarrow{d} V_1^2 I\{V_1 \ge 0\} + \cdots + V_p^2 I\{V_p \ge 0\}$, where $V_1,\ldots,V_p$ are iid $N(0,1)$. Consequently, R-estimation and the Wald test statistic (3.7) can be used to identify ARCH-type conditional heteroskedasticity in an observed time series. However, because $E\{Z_t^4\} < \infty$ is required, this technique is no more robust than more traditional Gaussian likelihood-based techniques (see Francq and Zakoïan, 2009). Therefore, in practice, the rank-based Wald test statistic is most useful for choosing between GARCH models of different orders that are under consideration for an observed conditionally heteroskedastic series. Once appropriate GARCH model orders $p$ and $q$ have been identified, confidence intervals for the elements of $\theta_0$

can be obtained using the limiting normal result in Corollary 3.1 and the consistent estimate $\hat\Sigma_n$ of $\Sigma$.

4 Numerical Results

4.1 Simulation Study

In this section, we give the results of a simulation study to assess the quality of the asymptotic approximations for finite samples. First, for each of 1000 replicates, we simulated a GARCH(1,1) series with parameters $\alpha_{00} = 0.01$, $\alpha_{01} = 0.5$, and $\beta_{01} = 0.4$, and found the R-estimate $\hat\theta_R$ of $\theta_0 = (0.5/0.01, 0.4) = (50, 0.4)$ by minimizing $D_n$ in (2.5). To reduce the possibility of the optimizer getting trapped at local minima, we used 100 starting values for each replicate. Starting values for $\alpha_1$ and $\beta_1 = \theta_2$, with $\alpha_1 + \beta_1 < 1$, were randomly chosen, and then, since $\alpha_1 + \beta_1 < 1$ implies $\mathrm{Var}\{X_t\} = \alpha_0/(1 - \alpha_1 - \beta_1)$, we used the values of $\alpha_1/[s_X^2 (1 - \alpha_1 - \beta_1)]$, where $s_X^2$ represents the sample variance of $\{X_t\}_{t=1}^n$, for the starting values of $\theta_1 = \alpha_1/\alpha_0$. We evaluated $D_n$ at each of the 100 candidate values and then reduced the collection of initial values to the three with the smallest values of $D_n$. Using these three initial values as starting points, we found optimized values by implementing the Nelder-Mead algorithm (Nelder and Mead, 1965). The optimized value for which $D_n$ was smallest was chosen to be $\hat\theta_R$.

By Corollary 3.1, in this GARCH(1,1) case, $\hat\theta_R$ is asymptotically normal with mean $\theta_0$ and covariance matrix $n^{-1}\Sigma = n^{-1} J K^{-2} \Gamma^{-1}$, with

$$\Gamma = \mathrm{Var}\left\{\frac{\partial \epsilon_t(\theta_0)}{\partial\theta}\right\} = \mathrm{Var}\left\{\begin{pmatrix} \alpha_{00}\left(\sum_{j=0}^{\infty} \beta_{01}^j X_{t-1-j}^2\right)/\sigma_t^2 \\ \left(\sum_{j=0}^{\infty} \beta_{01}^j \sigma_{t-1-j}^2\right)/\sigma_t^2 \end{pmatrix}\right\}.$$

Confidence intervals for the elements of $\theta_0$ were, therefore, constructed using the asymptotic normality and $\hat\Sigma_n$, the consistent estimator of $\Sigma$. For the kernel density estimator (3.5), we used the standard Gaussian kernel density function and, following the recommendation in Silverman (1986, page 48), the bandwidth $b_n = 0.9\, n^{-1/5} \min\{s_\epsilon, \mathrm{IQR}_\epsilon/1.34\}$, where $s_\epsilon$ and $\mathrm{IQR}_\epsilon$ represent the sample standard deviation and interquartile range of $\{\epsilon_t(\hat\theta_R)\}_{t=1}^n$.

Results of these simulations, for $N(0,1)$ and rescaled $t_3$ noise, and weight functions $\lambda_{t_7}(x) = [7\{F_{t_7}^{-1}((x+1)/2)\}^2 - 5]/[\{F_{t_7}^{-1}((x+1)/2)\}^2 + 5]$ and $\lambda_W(x) = 2x - 1$, are given in Tables 4.1 and 4.2. We show the empirical means, standard deviations, and percent coverages of nominal 95% confidence intervals for the

n     Asymptotic mean          Asymptotic std dev   Empirical mean   Empirical std dev   % coverage
                               (N(0,1), t_3)        (N(0,1), t_3)    (N(0,1), t_3)       (N(0,1), t_3)
250   θ_1 = α_1/α_0 = 50       ,                    ,                ,                   , 890
      θ_2 = β_1 = 0.4          ,                    ,                ,                   ,
      θ_1 = α_1/α_0 = 50       ,                    ,                ,                   , 910
      θ_2 = β_1 = 0.4          ,                    ,                ,                   ,
      θ_1 = α_1/α_0 = 50       ,                    ,                ,                   , 956
      θ_2 = β_1 = 0.4          ,                    ,                ,                   , 976

Table 4.1: Empirical means, standard deviations, and percent coverages of nominal 95% confidence intervals for R-estimates of GARCH model parameters. The $N(0,1)$ and rescaled $t_3$ noise distributions and weight function $\lambda_{t_7}(x) = [7\{F_{t_7}^{-1}((x+1)/2)\}^2 - 5]/[\{F_{t_7}^{-1}((x+1)/2)\}^2 + 5]$ were used.

n     Asymptotic mean          Asymptotic std dev   Empirical mean   Empirical std dev   % coverage
                               (N(0,1), t_3)        (N(0,1), t_3)    (N(0,1), t_3)       (N(0,1), t_3)
250   θ_1 = α_1/α_0 = 50       ,                    ,                ,                   , 885
      θ_2 = β_1 = 0.4          ,                    ,                ,                   ,
      θ_1 = α_1/α_0 = 50       ,                    ,                ,                   , 929
      θ_2 = β_1 = 0.4          ,                    ,                ,                   ,
      θ_1 = α_1/α_0 = 50       ,                    ,                ,                   , 955
      θ_2 = β_1 = 0.4          ,                    ,                ,                   , 967

Table 4.2: Empirical means, standard deviations, and percent coverages of nominal 95% confidence intervals for R-estimates of GARCH model parameters. The $N(0,1)$ and rescaled $t_3$ noise distributions and Wilcoxon weight function $\lambda_W(x) = 2x - 1$ were used.

R-estimates of $\theta_0$. The asymptotic means and standard deviations are also given in Tables 4.1 and 4.2. We see that the R-estimates appear nearly unbiased, and the asymptotic standard deviations fairly accurately reflect the true variability of the estimates, so R-estimation with weight function $\lambda_{t_7}$ is more precise than R-estimation with the Wilcoxon weight function. We also see that the confidence interval coverages are close to the nominal 95% level, especially when $n = 2000$. Normal probability plots show that the R-estimates are approximately normal, particularly for large $n$.

We also ran simulations to assess the accuracy of the asymptotic approximations in the case of one null parameter value, by fitting GARCH(2,1) and GARCH(1,2) models to GARCH(1,1) series with $\theta_0 = (50, 0.4)$. When $\theta_{0j} = 0$, we obtained the Wald statistic (3.7) for testing $H_0: \theta_{0j} = 0$ and compared it to $\chi^2_{1,0.9} = 2.706$, the level 0.05 critical value for testing the nullity of one parameter assuming all other parameters are positive.
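The critical value used here can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, only the Python standard library is used, and the limiting variable is simulated directly as $V_1^2 I\{V_1 \ge 0\} + \cdots + V_p^2 I\{V_p \ge 0\}$ with iid $N(0,1)$ terms rather than obtained from fitted GARCH series. It uses the fact that a chi-squared variable with one degree of freedom is a squared standard normal, so $\chi^2_{1,1-2\alpha}$ is the squared $1-\alpha$ quantile of $N(0,1)$.

```python
import random
from statistics import NormalDist


def one_sided_wald_critical_value(alpha):
    """Return chi^2_{1, 1 - 2*alpha}, the level-alpha critical value for one null parameter."""
    if not 0.0 < alpha < 0.5:
        raise ValueError("alpha must lie in (0, 1/2)")
    # The q-quantile of chi-squared(1) is the squared (1 + q)/2 quantile of N(0,1);
    # with q = 1 - 2*alpha this is the squared (1 - alpha) normal quantile.
    z = NormalDist().inv_cdf(1.0 - alpha)
    return z * z


def simulated_size(critical_value, n_params=1, reps=100_000, seed=0):
    """Monte Carlo estimate of P(sum_i V_i^2 I{V_i >= 0} > critical_value)."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(reps):
        w = 0.0
        for _ in range(n_params):
            v = rng.gauss(0.0, 1.0)
            if v >= 0.0:  # only non-negative draws contribute to the limit
                w += v * v
        if w > critical_value:
            exceed += 1
    return exceed / reps


if __name__ == "__main__":
    c = one_sided_wald_critical_value(0.05)
    print(round(c, 3))        # critical value for alpha = 0.05
    print(simulated_size(c))  # should be near 0.05 when n_params = 1
```

With `n_params` greater than one, the same simulation approximates tail probabilities of the limiting variable given earlier for the ARCH($p$) null hypothesis, which can be used to pick a critical value by trial.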

n     Parameter              Empirical mean   Empirical std dev   % Wald stats > 2.706
                             (N(0,1), t_3)    (N(0,1), t_3)       (N(0,1), t_3)
250   θ_1 = α_1/α_0 = 50     ,                , 263
      θ_2 = α_2/α_0 = 0      715,             ,                   , 70
      θ_3 = β_1 = 0.4        ,                ,
      θ_1 = α_1/α_0 = 50     ,                , 190
      θ_2 = α_2/α_0 = 0      487,             ,                   , 114
      θ_3 = β_1 = 0.4        ,                ,
      θ_1 = α_1/α_0 = 50     ,                , 100
      θ_2 = α_2/α_0 = 0      266,             ,                   , 94
      θ_3 = β_1 = 0.4        ,                ,
      θ_1 = α_1/α_0 = 50     ,                , 289
      θ_2 = β_1 = 0.4        ,                , 0.188
      θ_3 = β_2 = 0          ,                ,                   ,
      θ_1 = α_1/α_0 = 50     ,                , 188
      θ_2 = β_1 = 0.4        ,                , 0.144
      θ_3 = β_2 = 0          ,                ,                   ,
      θ_1 = α_1/α_0 = 50     ,                , 913
      θ_2 = β_1 = 0.4        ,                , 0.081
      θ_3 = β_2 = 0          ,                ,                   , 65

Table 4.3: Empirical means, standard deviations, and the percentages of Wald statistics above 2.706 for R-estimates of GARCH model parameters. The $N(0,1)$ and rescaled $t_3$ noise distributions and weight function $\lambda_{t_7}(x) = [7\{F_{t_7}^{-1}((x+1)/2)\}^2 - 5]/[\{F_{t_7}^{-1}((x+1)/2)\}^2 + 5]$ were used.

Empirical means and standard deviations for the estimates, and the percentages of Wald statistics above 2.706, are given in Tables 4.3 and 4.4. The R-estimators appear consistent in both the GARCH(2,1) and GARCH(1,2) cases, and the test sizes are close to the nominal 0.05 level, particularly when $n$ is large.

4.2 GARCH Modeling

In Figure 4.1(a), we give the daily log-returns for the Japanese yen to US dollar exchange rate for January 4, 1993 to December 31, 2002. Sample autocorrelation functions for these data, $\{X_t\}_{t=1}^{2514}$, and their absolute values and squares are given in Figure 4.1(b)-(d) with the bounds $\pm 1.96/\sqrt{2514}$, approximate 95% confidence bounds for the sample correlations assuming the observations are iid (see, for example, Brockwell and Davis, 1991, Section 7.2). Since the log-returns $\{X_t\}$ exhibit volatility clustering and appear uncorrelated but dependent, a GARCH model appears appropriate for this series. To find a suitable fitted model, we replaced the seventeen values of $X_t = 0$ with $X_t = 10^{-8}$ (a very slight alteration of the series which allows us to assume

n     Parameter              Empirical mean   Empirical std dev   % Wald stats > 2.706
                             (N(0,1), t_3)    (N(0,1), t_3)       (N(0,1), t_3)
250   θ_1 = α_1/α_0 = 50     ,                , 299
      θ_2 = α_2/α_0 = 0      848,             ,                   , 34
      θ_3 = β_1 = 0.4        ,                ,
      θ_1 = α_1/α_0 = 50     ,                , 209
      θ_2 = α_2/α_0 = 0      644,             ,                   , 91
      θ_3 = β_1 = 0.4        ,                ,
      θ_1 = α_1/α_0 = 50     ,                , 102
      θ_2 = α_2/α_0 = 0      291,             ,                   , 84
      θ_3 = β_1 = 0.4        ,                ,
      θ_1 = α_1/α_0 = 50     ,                , 337
      θ_2 = β_1 = 0.4        ,                , 0.201
      θ_3 = β_2 = 0          ,                ,                   ,
      θ_1 = α_1/α_0 = 50     ,                , 199
      θ_2 = β_1 = 0.4        ,                , 0.156
      θ_3 = β_2 = 0          ,                ,                   ,
      θ_1 = α_1/α_0 = 50     ,                , 944
      θ_2 = β_1 = 0.4        ,                , 0.085
      θ_3 = β_2 = 0          ,                ,                   , 71

Table 4.4: Empirical means, standard deviations, and the percentages of Wald statistics above 2.706 for R-estimates of GARCH model parameters. The $N(0,1)$ and rescaled $t_3$ noise distributions and Wilcoxon weight function $\lambda_W(x) = 2x - 1$ were used.

Figure 4.1: (a) The daily log-returns for the Japanese yen to US dollar exchange rate for January 4, 1993 to December 31, 2002, with sample autocorrelation functions for (b) the log-returns, (c) their absolute values, and (d) their squares. [Panels: (a) Log Returns against t; (b) ACF, (c) ACF of Absolute Values, and (d) ACF of Squares against Lag.]

A3 and compute $\{\ln(X_t^2)\}$), and fit low order ARCH/GARCH models to the data via R-estimation with weight function $\lambda_{t_7}(x) = [7\{F_{t_7}^{-1}((x+1)/2)\}^2 - 5]/[\{F_{t_7}^{-1}((x+1)/2)\}^2 + 5]$. ARCH(1) and ARCH(2) models were not appropriate, since those residuals were dependent, but, as is often the case with log-returns series, GARCH(1,1) residuals appeared independent. To demonstrate, in Figure 4.2, we give sample autocorrelation functions for the absolute values and squares of the residuals $\{X_t/\sqrt{\sigma_t^2(\hat\theta_R)}\}$ from the GARCH(1,1) fitted model; note that the values of $\{X_t/\sqrt{\sigma_t^2(\hat\theta_R)}\}$ resemble $\{X_t/\sqrt{\sigma_t^2/\alpha_{00}}\} = \{\sqrt{\alpha_{00}}\, Z_t\}$ when $\hat\theta_R$ is close to the true parameter vector $\theta_0$. The corresponding R-estimates for the GARCH(1,1) parameter values are $\hat\theta_1 = \widehat{\alpha_1/\alpha_0} =$ and $\hat\theta_2 = \hat\beta_1 = 0.9388$, with approximate 95% confidence intervals (20793, 69755) and (0.9170, 0.9606). Higher order GARCH(2,1) and GARCH(1,2) models were also considered, but low values of the Wald test statistic led us to fail to reject the null hypotheses $H_0: \alpha_{02}/\alpha_{00} = 0$ and $H_0: \beta_{02} = 0$ at the 0.05 level of significance. This analysis, therefore, indicates that a GARCH(1,1) model is suitable for the exchange rate log-returns.

Figure 4.2: Sample autocorrelation functions for the (a) absolute values and (b) squares of the GARCH(1,1) residuals obtained via R-estimation. [Panels: (a) ACF of Absolute Values and (b) ACF of Squares against Lag.]

We then considered using ML to fit a GARCH(1,1) model to the data and obtain individual estimates for $\alpha_{00}$ and $\alpha_{01}$. A kernel estimate of the density for the standardized GARCH(1,1) residuals, i.e. $\{X_t/\sqrt{\sigma_t^2(\hat\theta_R)}\}$ standardized to have variance one, is given in Figure 4.3(a), along with the $N(0,1)$ density. Since the distribution for the residuals appears roughly symmetric, but more peaked and heavier-tailed than Gaussian, we considered modeling the GARCH(1,1) noise distribution as rescaled Student's t. The ML estimate of degrees of freedom is 5.266 and, in Figure 4.3(b), it can be seen that the kernel density estimate for the standardized residuals and the $t_{5.266}$ density are close, so it appears reasonable to model the log-returns series as GARCH(1,1) with iid rescaled $t_{5.266}$ noise. Corresponding ML estimates of the model parameters are $\hat\alpha_0 =$ , $\hat\alpha_1 =$ , and $\hat\beta_1 = 0.9393$, and, using the theory of Berkes and Horváth (2004), approximate 95% confidence intervals for the parameter values are ( , ), (0.02663, 0.06050), and (0.9176, 0.9610). Note that the ML estimate for $\beta_{01}$ and the corresponding confidence interval are nearly the same as those obtained via R-estimation, even though no specific distributional information was used for R-estimation. Also, $\hat\alpha_1/\hat\alpha_0 = 45231$, which is quite close to the R-estimate of $\theta_{01} = \alpha_{01}/\alpha_{00}$.

Figure 4.3: Kernel estimate of the density for the standardized GARCH(1,1) residuals with (a) the $N(0,1)$ density function and (b) the rescaled $t_{5.266}$ density function. [Panels: (a) Density, Kernel Estimate and N(0,1); (b) Density, Kernel Estimate and t(5.266).]

Finally, to verify that the rescaled $t_{5.266}$ distribution is suitable for the fitted GARCH(1,1) noise process, we used the ML residuals and the Kolmogorov-Smirnov test described in Koul and Ling (2006) to test this null hypothesis. The test statistic $K_n$ equals 1194 and, via simulation, we found a corresponding p-value of 0.409. We, therefore, failed to reject $H_0$ (the noise distribution is $t_{5.266}$) at the 0.05 level of significance, and so this test result indicates that the rescaled $t_{5.266}$ distribution is appropriate.
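The R-estimation step behind the fits in this section can be sketched for the GARCH(1,1) case. This is a minimal illustration, not the author's implementation, and it rests on several assumptions: the dispersion is taken to be the rank-weighted sum of mean-corrected residuals, $\sum_{t=p+1}^n \lambda(R_t(\theta)/(n-p+1))[\epsilon_t(\theta) - \bar\epsilon(\theta)]$, with $\epsilon_t(\theta) = \ln X_t^2 - \ln \sigma_t^2(\theta)$ and $\sigma_t^2(\theta) = 1 + \theta_1 X_{t-1}^2 + \theta_2 \sigma_{t-1}^2(\theta)$; the initial scaled variance is set to 1; the noise is $N(0,1)$; and the Wilcoxon weight $\lambda_W(x) = 2x - 1$ is used. All function names are hypothetical.

```python
import math
import random


def simulate_garch11(n, alpha0, alpha1, beta1, seed=0, burn_in=500):
    """Simulate X_t = sigma_t Z_t with N(0,1) noise and GARCH(1,1) variance."""
    rng = random.Random(seed)
    sigma2 = alpha0 / (1.0 - alpha1 - beta1)  # start at the stationary variance
    x = []
    for _ in range(n + burn_in):
        xt = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        x.append(xt)
        sigma2 = alpha0 + alpha1 * xt * xt + beta1 * sigma2
    return x[burn_in:]


def starting_theta1(alpha1, beta1, sample_var):
    """Starting value alpha1 / [s_X^2 (1 - alpha1 - beta1)] for theta1 = alpha1/alpha0."""
    return alpha1 / (sample_var * (1.0 - alpha1 - beta1))


def residuals(x, theta1, theta2, init=1.0):
    """eps_t = ln X_t^2 - ln s_t, with s_t = 1 + theta1 X_{t-1}^2 + theta2 s_{t-1}."""
    s = init  # assumed initial value of the scaled conditional variance
    eps = []
    for t in range(1, len(x)):
        s = 1.0 + theta1 * x[t - 1] ** 2 + theta2 * s
        eps.append(math.log(x[t] ** 2) - math.log(s))
    return eps


def wilcoxon_weight(u):
    """Wilcoxon weight function lambda_W(u) = 2u - 1."""
    return 2.0 * u - 1.0


def dispersion(x, theta1, theta2, weight=wilcoxon_weight):
    """Rank-weighted sum of mean-corrected residuals (assumed form of D_n)."""
    eps = residuals(x, theta1, theta2)
    m = len(eps)  # m = n - p residuals, so ranks are scaled by n - p + 1 = m + 1
    mean = sum(eps) / m
    order = sorted(range(m), key=eps.__getitem__)
    total = 0.0
    for rank, idx in enumerate(order, start=1):
        total += weight(rank / (m + 1)) * (eps[idx] - mean)
    return total


if __name__ == "__main__":
    series = simulate_garch11(1000, 0.01, 0.5, 0.4, seed=1)
    print(dispersion(series, 50.0, 0.4))
```

Because the Wilcoxon weights are increasing in the rank and sum to zero, this dispersion is non-negative, so candidate parameter values can be screened by evaluating it before a derivative-free optimizer such as Nelder-Mead is started.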

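Appendix

This section contains proofs of the lemmas used to establish Theorems 3.1 and 3.2. Assume conditions A1–A7 and either A8, A9 or A10 hold throughout. To begin, note that following (2.3)–(2.4), the first partial derivatives for the residuals $\{\epsilon_t(\theta)\}_{t=1}^n$ are given by $\partial\epsilon_t(\theta)/\partial\theta_i = -[\partial\sigma_t^2(\theta)/\partial\theta_i]/\sigma_t^2(\theta)$, $i \in \{1,\ldots,p+q\}$, with

$$\frac{\partial\sigma_t^2(\theta)}{\partial\theta_i} = \begin{cases} 0, & t \le p, \\ X_{t-i}^2 + \sum_{j=1}^{q} \theta_{p+j}\left(\partial\sigma_{t-j}^2(\theta)/\partial\theta_i\right), & t \in \{p+1,\ldots,n\},\ i \in \{1,\ldots,p\}, \\ \sigma_{t+p-i}^2(\theta) + \sum_{j=1}^{q} \theta_{p+j}\left(\partial\sigma_{t-j}^2(\theta)/\partial\theta_i\right), & t \in \{p+1,\ldots,n\},\ i \in \{p+1,\ldots,p+q\}. \end{cases}$$

And we let

$$\frac{\partial\bar\epsilon(\theta)}{\partial\theta} = \frac{1}{n-p}\sum_{t=p+1}^{n}\frac{\partial\epsilon_t(\theta)}{\partial\theta} \quad\text{and}\quad \frac{\partial\bar\epsilon^*(\theta)}{\partial\theta} = \frac{1}{n-p}\sum_{t=p+1}^{n}\frac{\partial\epsilon_t^*(\theta)}{\partial\theta}$$

(recall that the partial derivatives $\partial\epsilon_t^*(\theta)/\partial\theta_i$, $i \in \{1,\ldots,p+q\}$, are given in (3.1)).

Lemma 5.1. As $n \to \infty$,

$$\frac{1}{\sqrt{n}}\sum_{t=p+1}^{n}\lambda(F_\epsilon(\epsilon_t))\left[\frac{\partial\epsilon_t(\theta_0)}{\partial\theta} - \frac{\partial\bar\epsilon(\theta_0)}{\partial\theta}\right] \xrightarrow{d} N \sim N(0, J\Gamma),$$

where $\epsilon_t = \ln(\alpha_{00} Z_t^2)$ and $F_\epsilon$ represents the distribution function for $\epsilon_t$.

Proof. By the proof of Lemma 12 in Francq and Zakoïan (2007), $n^{-1/2}\sum_{t=p+1}^{n}|\partial\epsilon_t(\theta_0)/\partial\theta_i - \partial\epsilon_t^*(\theta_0)/\partial\theta_i| \xrightarrow{P} 0$ for any $i \in \{1,\ldots,p+q\}$. It, therefore, suffices to show that

$$\begin{aligned} &\frac{1}{\sqrt{n}}\sum_{t=p+1}^{n}\lambda(F_\epsilon(\epsilon_t))\left[\frac{\partial\epsilon_t^*(\theta_0)}{\partial\theta} - \frac{\partial\bar\epsilon^*(\theta_0)}{\partial\theta}\right] \\ &\quad= \frac{1}{\sqrt{n}}\sum_{t=p+1}^{n}\left[\lambda(F_\epsilon(\epsilon_t)) - E\{\lambda(F_\epsilon(\epsilon_t))\}\right]\left[\frac{\partial\epsilon_t^*(\theta_0)}{\partial\theta} - \frac{\partial\bar\epsilon^*(\theta_0)}{\partial\theta}\right] \\ &\quad= \frac{1}{\sqrt{n}}\sum_{t=p+1}^{n}\left[\lambda(F_\epsilon(\epsilon_t)) - E\{\lambda(F_\epsilon(\epsilon_t))\}\right]\left[\frac{\partial\epsilon_t^*(\theta_0)}{\partial\theta} - E\left\{\frac{\partial\epsilon_t^*(\theta_0)}{\partial\theta}\right\}\right] \\ &\qquad- \left(\frac{1}{\sqrt{n}}\sum_{t=p+1}^{n}\left[\lambda(F_\epsilon(\epsilon_t)) - E\{\lambda(F_\epsilon(\epsilon_t))\}\right]\right)\left(\frac{\partial\bar\epsilon^*(\theta_0)}{\partial\theta} - E\left\{\frac{\partial\epsilon_t^*(\theta_0)}{\partial\theta}\right\}\right) \end{aligned} \tag{5.1}$$

converges in distribution to $N$. By the central limit theorem, $n^{-1/2}\sum_{t=p+1}^{n}[\lambda(F_\epsilon(\epsilon_t)) - E\{\lambda(F_\epsilon(\epsilon_t))\}] \xrightarrow{d} N(0, J)$ and, since $\{\partial\epsilon_t^*(\theta_0)/\partial\theta\}$ is stationary ergodic with $E|\partial\epsilon_t^*(\theta_0)/\partial\theta_i| < \infty$ for all $i$ (Francq and Zakoïan, 2007),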

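$\partial\bar\epsilon^*(\theta_0)/\partial\theta \xrightarrow{P} E\{\partial\epsilon_t^*(\theta_0)/\partial\theta\}$. Hence, (5.1) equals

$$\frac{1}{\sqrt{n}}\sum_{t=p+1}^{n}\left[\lambda(F_\epsilon(\epsilon_t)) - E\{\lambda(F_\epsilon(\epsilon_t))\}\right]\left[\frac{\partial\epsilon_t^*(\theta_0)}{\partial\theta} - E\left\{\frac{\partial\epsilon_t^*(\theta_0)}{\partial\theta}\right\}\right] + o_p(1). \tag{5.2}$$

Equation (5.2) converges in distribution to $N$ by the central limit theorem for martingale differences (Billingsley, 1961).

Lemma 5.2. For any $T \in (0,\infty)$, as $n \to \infty$,

$$\sup_{u \in \Lambda,\ \|u\| \le T}\left|\frac{1}{\sqrt{n}}\sum_{t=p+1}^{n} u'\left[\lambda\left(\frac{R_t(\theta_0 + n^{-1/2}u)}{n-p+1}\right) - \lambda(F_\epsilon(\epsilon_t))\right]\left[\frac{\partial\epsilon_t(\theta_0)}{\partial\theta} - \frac{\partial\bar\epsilon(\theta_0)}{\partial\theta}\right] - K u'\Gamma u\right| \tag{5.3}$$

is $o_p(1)$. (Recall that $\Lambda = \Lambda_1 \times \cdots \times \Lambda_{p+q}$, with $\Lambda_i = \mathbb{R}$ if $\theta_{0i} > 0$ and $\Lambda_i = [0,\infty)$ if $\theta_{0i} = 0$.)

Proof. If $R_{t,n}(u) := R_t(\theta_0 + n^{-1/2}u)$ and $V_{t,n} := \partial\epsilon_t(\theta_0)/\partial\theta - \partial\bar\epsilon(\theta_0)/\partial\theta$, then (5.3) can be expressed as

$$\sup_{u \in \Lambda,\ \|u\| \le T}\left|\frac{1}{\sqrt{n}}\sum_{t=p+1}^{n} u'\left[\lambda\left(\frac{R_{t,n}(u)}{n-p+1}\right) - \lambda(F_\epsilon(\epsilon_t))\right]V_{t,n} - K u'\Gamma u\right|.$$

Because the weight function $\lambda$ is left-continuous and $K = \int f_\epsilon(x)\,d\lambda(F_\epsilon(x)) = \int_0^1 f_\epsilon(F_\epsilon^{-1}(y))\,d\lambda(y)$, (5.3) equals

$$\sup_{u \in \Lambda,\ \|u\| \le T}\left|\int_0^1\left(\frac{1}{\sqrt{n}}\sum_{t=p+1}^{n} u'\left[I\left\{\frac{R_{t,n}(u)}{n-p+1} \le y\right\} - I\{F_\epsilon(\epsilon_t) \le y\}\right]V_{t,n} + f_\epsilon(F_\epsilon^{-1}(y))\,u'\Gamma u\right)d\lambda(y)\right|, \tag{5.4}$$

which is bounded above by

$$[\lambda(1) - \lambda(0)]\sup_{u \in \Lambda,\ \|u\| \le T,\ y \in (0,1)}\left|\frac{1}{\sqrt{n}}\sum_{t=p+1}^{n} u'\left[I\left\{\frac{R_{t,n}(u)}{n-p+1} \le y\right\} - I\{F_\epsilon(\epsilon_t) \le y\}\right]V_{t,n} + f_\epsilon(F_\epsilon^{-1}(y))\,u'\Gamma u\right|. \tag{5.5}$$

Equation (5.4) is $o_p(1)$; a related result is obtained in Andrews (2008, proof of Lemma 55) in the case of rank-based estimation for autoregressive-moving average models and a similar proof can be used here, so we omit the details. Since $\lambda$ is bounded, the proof of this lemma is complete.

Next consider the mixed partial derivatives of $\{\epsilon_t(\theta)\}_{t=1}^n$:

$$\frac{\partial^2\epsilon_t(\theta)}{\partial\theta_i\,\partial\theta_j} = \frac{1}{\sigma_t^4(\theta)}\left(\frac{\partial\sigma_t^2(\theta)}{\partial\theta_i}\frac{\partial\sigma_t^2(\theta)}{\partial\theta_j} - \sigma_t^2(\theta)\frac{\partial^2\sigma_t^2(\theta)}{\partial\theta_i\,\partial\theta_j}\right), \quad i,j \in \{1,\ldots,p+q\},$$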

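with

$$\frac{\partial^2\sigma_t^2(\theta)}{\partial\theta_i\,\partial\theta_j} = \begin{cases} 0, & i,j \in \{1,\ldots,p\}, \\ \dfrac{\partial\sigma_{t+p-j}^2(\theta)}{\partial\theta_i} + \sum_{k=1}^{q}\theta_{p+k}\dfrac{\partial^2\sigma_{t-k}^2(\theta)}{\partial\theta_i\,\partial\theta_j}, & i \in \{1,\ldots,p\},\ j \in \{p+1,\ldots,p+q\}, \\ \dfrac{\partial\sigma_{t+p-j}^2(\theta)}{\partial\theta_i} + \dfrac{\partial\sigma_{t+p-i}^2(\theta)}{\partial\theta_j} + \sum_{k=1}^{q}\theta_{p+k}\dfrac{\partial^2\sigma_{t-k}^2(\theta)}{\partial\theta_i\,\partial\theta_j}, & i,j \in \{p+1,\ldots,p+q\}, \end{cases}$$

for $t \in \{p+1,\ldots,n\}$, and $\partial^2\sigma_t^2(\theta)/(\partial\theta_i\,\partial\theta_j) = 0$ for $t \le p$. Let

$$\frac{\partial^2\bar\epsilon(\theta)}{\partial\theta\,\partial\theta'} = \frac{1}{n-p}\sum_{t=p+1}^{n}\frac{\partial^2\epsilon_t(\theta)}{\partial\theta\,\partial\theta'}.$$

Lemma 5.3. For any $T \in (0,\infty)$, as $n \to \infty$,

$$\sup_{u,v \in \Lambda,\ \|u\|,\|v\| \le T}\left|\frac{1}{n}\sum_{t=p+1}^{n} u'\,\lambda\left(\frac{R_t(\theta_0 + n^{-1/2}u)}{n-p+1}\right)\left[\frac{\partial^2\epsilon_t(\theta_0 + n^{-1/2}v)}{\partial\theta\,\partial\theta'} - \frac{\partial^2\bar\epsilon(\theta_0 + n^{-1/2}v)}{\partial\theta\,\partial\theta'}\right]u\right| \xrightarrow{P} 0.$$

Proof. For any $i,j \in \{1,\ldots,p+q\}$, the sequence $\{\partial^2\epsilon_t^*(\theta_0)/(\partial\theta_i\,\partial\theta_j)\}$ of mixed partial derivatives of $\{\epsilon_t^*(\theta)\}$ at $\theta = \theta_0$ is stationary, ergodic with $E|\partial^2\epsilon_t^*(\theta_0)/(\partial\theta_i\,\partial\theta_j)| < \infty$ and $n^{-1}\sum_{t=p+1}^{n}|\partial^2\epsilon_t(\theta_0)/(\partial\theta_i\,\partial\theta_j) - \partial^2\epsilon_t^*(\theta_0)/(\partial\theta_i\,\partial\theta_j)| \xrightarrow{P} 0$ (Francq and Zakoïan, 2007). Following Lemma 5.1, it can therefore be shown that

$$\sup_{u \in \Lambda,\ \|u\| \le T}\left|\frac{1}{n}\sum_{t=p+1}^{n} u'\,\lambda(F_\epsilon(\epsilon_t))\left[\frac{\partial^2\epsilon_t(\theta_0)}{\partial\theta\,\partial\theta'} - \frac{\partial^2\bar\epsilon(\theta_0)}{\partial\theta\,\partial\theta'}\right]u\right| \xrightarrow{P} 0,$$

and following Lemma 5.2,

$$\sup_{u \in \Lambda,\ \|u\| \le T}\left|\frac{1}{n}\sum_{t=p+1}^{n} u'\left[\lambda\left(\frac{R_t(\theta_0 + n^{-1/2}u)}{n-p+1}\right) - \lambda(F_\epsilon(\epsilon_t))\right]\left[\frac{\partial^2\epsilon_t(\theta_0)}{\partial\theta\,\partial\theta'} - \frac{\partial^2\bar\epsilon(\theta_0)}{\partial\theta\,\partial\theta'}\right]u\right| \xrightarrow{P} 0.$$

Since

$$\sup_{v \in \Lambda,\ \|v\| \le T}\frac{1}{n}\sum_{t=p+1}^{n}\left|\frac{\partial^2\epsilon_t(\theta_0 + n^{-1/2}v)}{\partial\theta_i\,\partial\theta_j} - \frac{\partial^2\epsilon_t(\theta_0)}{\partial\theta_i\,\partial\theta_j}\right| \xrightarrow{P} 0 \quad \forall\, i,j \in \{1,\ldots,p+q\}$$

(see Francq and Zakoïan, 2007, proof of Lemma 11), the proof is complete.

Now, for $u \in \Lambda$ and $\delta_1,\delta_2 \in [0,1]$, let

$$\tilde U_n(u,\delta_1,\delta_2) = \sum_{t=p+1}^{n}\lambda\left(\frac{R_t(\theta_0 + n^{-1/2}\delta_1 u)}{n-p+1}\right)\left[\epsilon_t\left(\theta_0 + \frac{\delta_2 u}{\sqrt{n}}\right) - \bar\epsilon\left(\theta_0 + \frac{\delta_2 u}{\sqrt{n}}\right)\right] - \sum_{t=p+1}^{n}\lambda\left(\frac{R_t(\theta_0 + n^{-1/2}\delta_1 u)}{n-p+1}\right)\left[\epsilon_t\left(\theta_0 + \frac{\delta_1 u}{\sqrt{n}}\right) - \bar\epsilon\left(\theta_0 + \frac{\delta_1 u}{\sqrt{n}}\right)\right]$$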
