Financial Econometrics and Volatility Models
Estimation of Stochastic Volatility Models
Eric Zivot
April 26, 2010
Outline
Likelihood of SV Models
Survey of Estimation Techniques for SV Models
GMM Estimation of SV Models
State Space Estimation of SV Models
Forecasting Volatility from SV Models
Reading
FMUND, chapter 4 (section 7)
MFTS, chapters 14 (section 4.3), 21 (section 7.4), 23 (section 4.2)
APDVP, chapter 11
Estimation Techniques for SV Models
Likelihood function
Estimation methods
Standard SV Model

$$r_t = \mu + \sigma_t u_t$$
$$\ln(\sigma_t) - \alpha = \phi(\ln(\sigma_{t-1}) - \alpha) + \eta_t, \quad |\phi| < 1$$
$$\begin{pmatrix} u_t \\ \eta_t \end{pmatrix} \sim \text{iid } N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & \sigma_\eta^2 \end{pmatrix}\right)$$

Equivalently, in terms of the log-variance $w_t = \ln(\sigma_t^2) = 2\ln(\sigma_t)$:

$$r_t = \mu + \exp(w_t/2) u_t$$
$$w_t - \alpha_w = \phi(w_{t-1} - \alpha_w) + \eta_{w,t}, \quad \eta_{w,t} \sim \text{iid } N(0, \sigma_{\eta_w}^2)$$
$$\begin{pmatrix} u_t \\ \eta_{w,t} \end{pmatrix} \sim \text{iid } N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & \sigma_{\eta_w}^2 \end{pmatrix}\right)$$
Typically, $\mu$ is first estimated using
$$\hat{\mu} = \frac{1}{n}\sum_{t=1}^n r_t$$
and the SV model is fit to the demeaned returns $y_t = r_t - \hat{\mu}$. The parameters to be estimated are then
$$\theta = (\alpha, \phi, \sigma_\eta^2)' \quad \text{or} \quad (\alpha, \phi, \beta^2)', \quad \beta^2 = \sigma_\eta^2/(1-\phi^2)$$
$$\theta_w = (\alpha_w, \phi, \sigma_{\eta_w}^2)' \quad \text{or} \quad (\alpha_w, \phi, \beta_w^2)', \quad \beta_w^2 = \sigma_{\eta_w}^2/(1-\phi^2)$$
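For concreteness, here is a minimal simulation sketch of the standard SV model in Python (the function name and parameter values are illustrative assumptions, not from the text):

```python
import numpy as np

def simulate_sv(n, mu=0.0, alpha=-0.7, phi=0.95, sigma_eta=0.2, seed=0):
    """Simulate returns from the standard log-normal SV model:
    r_t = mu + sigma_t u_t,
    ln(sigma_t) - alpha = phi (ln(sigma_{t-1}) - alpha) + eta_t."""
    rng = np.random.default_rng(seed)
    beta2 = sigma_eta**2 / (1 - phi**2)      # stationary variance of ln(sigma_t)
    h = np.empty(n)                          # h_t = ln(sigma_t)
    h[0] = alpha + np.sqrt(beta2) * rng.standard_normal()  # stationary draw
    for t in range(1, n):
        h[t] = alpha + phi * (h[t - 1] - alpha) + sigma_eta * rng.standard_normal()
    u = rng.standard_normal(n)
    return mu + np.exp(h) * u                # sigma_t = exp(h_t)

r = simulate_sv(2000)
y = r - r.mean()                             # demeaned returns y_t = r_t - mu_hat
```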
Likelihood Function

$\{r_1,\ldots,r_n\}$ = sample of observed returns from the SV model. The likelihood function for $\{r_1,\ldots,r_n\}$ given $\theta$ is
$$L(r_1,\ldots,r_n \mid \theta) = \int \cdots \int f(r_1,\ldots,r_n \mid \sigma_1,\ldots,\sigma_n; \theta)\, f(\sigma_1,\ldots,\sigma_n)\, d\sigma_1 \cdots d\sigma_n$$
which cannot be evaluated analytically. However,
$$f(r_1,\ldots,r_n \mid \sigma_1,\ldots,\sigma_n; \theta) = \prod_{i=1}^n f(r_i \mid \sigma_i; \theta), \quad f(r_i \mid \sigma_i) = N(\mu, \sigma_i^2)$$
and $f(\sigma_1,\ldots,\sigma_n)$ can be evaluated using the fact that $\ln \sigma_t$ follows a Gaussian AR(1) model.
Estimation methods:
- Method of moments
- Generalized method of moments (GMM)
- Quasi-MLE from state space model
- Simulated MLE
- Indirect inference
- Bayesian MCMC
Method of Moments

Idea: estimate parameters by matching population moments to sample moments.

Population parameters: $\alpha$, $\beta^2$, $\phi$. Population moments:
$$E[|r_t - \mu|] = E[a_t] = \sqrt{2/\pi}\, \exp\left(\alpha + \tfrac{1}{2}\beta^2\right)$$
$$\text{var}(r_t) = \exp(2\alpha + 2\beta^2)$$
$$\text{kurt}(r_t) = 3\exp(4\beta^2)$$
Method of moments estimators for $\alpha$ and $\beta^2$ using $\widehat{\text{var}}(r_t)$ and $\widehat{\text{kurt}}(r_t)$:
$$\hat{\beta}^2 = \frac{1}{4}\ln\left(\frac{\widehat{\text{kurt}}(r_t)}{3}\right), \quad \hat{\alpha} = \frac{1}{2}\ln\left(\widehat{\text{var}}(r_t)\right) - \hat{\beta}^2$$

Problem: $\widehat{\text{kurt}}(r_t)$ is very sensitive to outliers.
Method of moments estimators for $\alpha$ and $\beta^2$ using $E[|r_t - \mu|]$ and $\text{var}(r_t)$:
$$\hat{\alpha} = \ln\left(\frac{\pi \bar{a}^2}{2\sqrt{\widehat{\text{var}}(r_t)}}\right), \quad \hat{\beta}^2 = \ln\left(\frac{2\,\widehat{\text{var}}(r_t)}{\pi \bar{a}^2}\right)$$
where
$$\bar{a} = \frac{1}{n}\sum_{t=1}^n |r_t - \bar{r}|, \quad \bar{r} = \frac{1}{n}\sum_{t=1}^n r_t$$

There is no simple method of moments estimator for $\phi$.

Note: method of moments estimators are not unique.
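A minimal sketch of these estimators in Python (the function name is mine; r is assumed to be a numpy array of returns):

```python
import numpy as np

def sv_method_of_moments(r):
    """Method of moments estimates of (alpha, beta^2) based on
    E|r_t - mu| and var(r_t)."""
    rbar = r.mean()
    a_bar = np.abs(r - rbar).mean()     # sample analogue of E|r_t - mu|
    v = r.var()                         # sample variance of returns
    beta2_hat = np.log(2 * v / (np.pi * a_bar**2))
    alpha_hat = np.log(np.pi * a_bar**2 / (2 * np.sqrt(v)))
    return alpha_hat, beta2_hat
```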
Review of GMM Estimation

Idea: GMM optimally combines moment conditions to estimate population parameters.

Let $\{w_t\}$ be a covariance stationary and ergodic vector process representing the underlying data, and let the $p \times 1$ vector $\theta$ denote the population parameters. The moment conditions $g(w_t, \theta)$ are $K \geq p$ possibly nonlinear functions satisfying
$$E[g(w_t, \theta_0)] = 0$$
where $\theta_0$ represents the true parameter vector.
Global identification of $\theta_0$ requires that
$$E[g(w_t, \theta_0)] = 0 \quad \text{and} \quad E[g(w_t, \theta)] \neq 0 \ \text{for} \ \theta \neq \theta_0$$
Local identification requires that the $K \times p$ matrix
$$G = E\left[\frac{\partial g(w_t, \theta_0)}{\partial \theta'}\right]$$
have full column rank $p$.
The sample moment condition for an arbitrary $\theta$ is
$$g_n(\theta) = n^{-1}\sum_{t=1}^n g(w_t, \theta)$$
If $K = p$, then $\theta_0$ is apparently just identified and the GMM objective function is
$$J(\theta) = n\, g_n(\theta)' g_n(\theta)$$
which does not depend on a weight matrix. The corresponding GMM estimator is then
$$\hat{\theta} = \arg\min_\theta J(\theta)$$
and satisfies $g_n(\hat{\theta}) = 0$.
If $K > p$, then $\theta_0$ is apparently overidentified. Let $\hat{W}$ denote a $K \times K$ symmetric and positive definite weight matrix, possibly dependent on the data, such that $\hat{W} \xrightarrow{p} W$ as $n \to \infty$, with $W$ symmetric and positive definite. The GMM estimator of $\theta_0$, denoted $\hat{\theta}(\hat{W})$, is defined as
$$\hat{\theta}(\hat{W}) = \arg\min_\theta J(\theta, \hat{W}) = \arg\min_\theta n\, g_n(\theta)' \hat{W} g_n(\theta)$$
The first order conditions are
$$\frac{\partial J(\hat{\theta}(\hat{W}), \hat{W})}{\partial \theta} = 2\, G_n(\hat{\theta}(\hat{W}))' \hat{W} g_n(\hat{\theta}(\hat{W})) = 0$$
where
$$G_n(\theta) = \frac{\partial g_n(\theta)}{\partial \theta'}$$
Asymptotic Properties of Nonlinear GMM

Under standard regularity conditions, it can be shown that
$$\hat{\theta}(\hat{W}) \xrightarrow{p} \theta_0$$
$$\sqrt{n}\left(\hat{\theta}(\hat{W}) - \theta_0\right) \xrightarrow{d} N\left(0, \text{avar}(\hat{\theta}(\hat{W}))\right)$$
where
$$\text{avar}(\hat{\theta}(\hat{W})) = (G'WG)^{-1} G'WSWG\, (G'WG)^{-1}$$
$$G = E\left[\frac{\partial g(w_t, \theta_0)}{\partial \theta'}\right], \quad S = \text{avar}(\sqrt{n}\, g_n(\theta_0))$$
The efficient GMM estimator uses a weight matrix $W$ that minimizes $\text{avar}(\hat{\theta}(\hat{W}))$. Hansen (1982) showed that the optimal weight matrix is $W = S^{-1}$. In this case,
$$\text{avar}(\hat{\theta}(S^{-1})) = (G'S^{-1}G)^{-1}$$
If $\{g(w_t, \theta_0)\}$ is an ergodic stationary MDS, then $S = E[g(w_t, \theta_0)\, g(w_t, \theta_0)']$, and a consistent estimator of $S$ takes the form
$$\hat{S}_{HC} = n^{-1}\sum_{t=1}^n g(w_t, \hat{\theta})\, g(w_t, \hat{\theta})', \quad \hat{\theta} \xrightarrow{p} \theta_0$$
If $\{g(w_t, \theta_0)\}$ is a mean-zero serially correlated ergodic stationary process, then $S$ equals the long-run variance
$$S = \text{LRV} = \Gamma_0 + \sum_{j=1}^{\infty}(\Gamma_j + \Gamma_j'), \quad \Gamma_j = E[g(w_t, \theta_0)\, g(w_{t-j}, \theta_0)']$$
and a consistent estimator has the form
$$\hat{S}_{HAC} = \hat{\Gamma}_0(\hat{\theta}) + \sum_{j=1}^{n-1} k\!\left(\frac{j}{q(n)}\right)\left(\hat{\Gamma}_j(\hat{\theta}) + \hat{\Gamma}_j(\hat{\theta})'\right)$$
$$\hat{\Gamma}_j(\hat{\theta}) = \frac{1}{n}\sum_{t=j+1}^{n} g(w_t, \hat{\theta})\, g(w_{t-j}, \hat{\theta})', \quad \hat{\theta} \xrightarrow{p} \theta_0$$
where $k(\cdot)$ is a kernel weight function and $q(n)$ is a bandwidth (truncation lag) parameter.
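A minimal sketch of $\hat{S}_{HAC}$ with the Bartlett kernel $k(j/q) = 1 - j/(q+1)$ (the function name and the fixed truncation lag q are my assumptions for illustration):

```python
import numpy as np

def S_hac(g, q):
    """Newey-West style HAC estimate of S = LRV from the n x K matrix of
    moment contributions g_t (evaluated at a consistent theta_hat)."""
    n, K = g.shape
    g = g - g.mean(axis=0)              # center: moments are mean zero at theta_0
    S = g.T @ g / n                     # Gamma_0 hat
    for j in range(1, q + 1):
        Gj = g[j:].T @ g[:-j] / n       # Gamma_j hat
        w = 1.0 - j / (q + 1.0)         # Bartlett kernel weight
        S += w * (Gj + Gj.T)
    return S
```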
The efficient GMM estimator may be computed using:
- the two-step procedure
- the iterated procedure
- the continuous updating procedure
A sketch of the two-step procedure is given below.
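This sketch reuses S_hac from above; g_fun, the starting value, and the choice of optimizer are assumptions for illustration:

```python
import numpy as np
from scipy.optimize import minimize

def gmm_two_step(g_fun, theta0, q=10):
    """Two-step efficient GMM sketch. g_fun(theta) returns the n x K matrix
    of moment contributions g(w_t, theta). Step 1 uses W = I_K; step 2 uses
    W = S_hac^{-1} evaluated at the step-1 estimate."""
    def J(theta, W):
        g = g_fun(theta)
        gbar = g.mean(axis=0)                    # g_n(theta)
        return g.shape[0] * gbar @ W @ gbar      # J = n g_n' W g_n
    K = g_fun(np.asarray(theta0)).shape[1]
    step1 = minimize(J, theta0, args=(np.eye(K),), method="Nelder-Mead")
    W_opt = np.linalg.inv(S_hac(g_fun(step1.x), q))  # efficient weight matrix
    step2 = minimize(J, step1.x, args=(W_opt,), method="Nelder-Mead")
    return step2.x
```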
Application to log-normal SV model

Consider the alternative parameterization of the simple log-normal stochastic volatility (SV) model assuming $\mu = 0$:
$$r_t = \sigma_t u_t = \exp(w_t/2) u_t, \quad t = 1, \ldots, n$$
$$w_t = \ln \sigma_t^2 = \omega + \phi w_{t-1} + \eta_{w,t}$$
$$(u_t, \eta_{w,t})' \sim \text{iid } N(0, \text{diag}(1, \sigma_{\eta_w}^2))$$
$$\theta_w = (\omega, \phi, \sigma_{\eta_w})'$$
For $0 < \phi < 1$ and $\sigma_{\eta_w} \geq 0$, the series $r_t$ is strictly stationary and ergodic, and unconditional moments of all orders exist.
GMM estimation of the SV model is surveyed in Andersen and Sorensen (1996). They recommended basing GMM estimation on lower-order moments of $r_t$, since higher-order moments tend to exhibit erratic finite sample behavior. They considered GMM estimation based on (subsets of) the 24 moments used by Jacquier, Polson, and Rossi (1994). To describe these moment conditions, first define
$$\alpha_w = \frac{\omega}{1-\phi}, \quad \beta_w^2 = \frac{\sigma_{\eta_w}^2}{1-\phi^2}$$
The moment conditions, which follow from properties of the log-normal distribution and the Gaussian AR(1) model, are
$$E[|r_t|] = (2/\pi)^{1/2} E[\sigma_t]$$
$$E[r_t^2] = E[\sigma_t^2]$$
$$E[|r_t^3|] = 2\sqrt{2/\pi}\, E[\sigma_t^3]$$
$$E[r_t^4] = 3 E[\sigma_t^4]$$
$$E[|r_t r_{t-j}|] = (2/\pi) E[\sigma_t \sigma_{t-j}], \quad j = 1, \ldots, 10$$
$$E[r_t^2 r_{t-j}^2] = E[\sigma_t^2 \sigma_{t-j}^2], \quad j = 1, \ldots, 10$$
where, for any positive integer $j$ and positive constants $p$ and $s$,
$$E[\sigma_t^p] = \exp\left(\frac{p\alpha_w}{2} + \frac{p^2\beta_w^2}{8}\right)$$
$$E[\sigma_t^p \sigma_{t-j}^s] = E[\sigma_t^p]\, E[\sigma_t^s]\, \exp\left(\frac{ps\phi^j\beta_w^2}{4}\right)$$
Let
$$w_t = (|r_t|,\, r_t^2,\, |r_t^3|,\, r_t^4,\, |r_t r_{t-1}|, \ldots, |r_t r_{t-10}|,\, r_t^2 r_{t-1}^2, \ldots, r_t^2 r_{t-10}^2)'$$
and define the $24 \times 1$ vector
$$g(w_t, \theta_w) = \begin{pmatrix} |r_t| - (2/\pi)^{1/2}\exp\left(\frac{\alpha_w}{2} + \frac{\beta_w^2}{8}\right) \\ r_t^2 - \exp\left(\alpha_w + \frac{\beta_w^2}{2}\right) \\ \vdots \\ r_t^2 r_{t-10}^2 - \exp\left(\alpha_w + \frac{\beta_w^2}{2}\right)^2 \exp\left(\phi^{10}\beta_w^2\right) \end{pmatrix}$$
Then $E[g(w_t, \theta_w)] = 0$ is the population moment condition used for GMM estimation of the model parameters $\theta_w = (\alpha_w, \phi, \beta_w^2)'$. Since the elements of $w_t$ are serially correlated, the efficient weight matrix $S = \text{avar}(\sqrt{n}\,\bar{g})$ must be estimated using an HAC estimator.
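A sketch of the 24 moment contributions in Python, suitable for passing to the two-step GMM routine above (the function name and implementation details, such as dropping the first 10 observations so that all lags exist, are mine):

```python
import numpy as np

def g_sv(theta, r):
    """Moment contributions g(w_t, theta_w) for the log-normal SV model,
    theta_w = (alpha_w, phi, beta2_w), using demeaned returns r."""
    aw, phi, b2 = theta
    Esig = lambda p: np.exp(p * aw / 2 + p**2 * b2 / 8)   # E[sigma_t^p]
    x = r[10:]
    cols = [np.abs(x) - np.sqrt(2 / np.pi) * Esig(1),
            x**2 - Esig(2),
            np.abs(x)**3 - 2 * np.sqrt(2 / np.pi) * Esig(3),
            x**4 - 3 * Esig(4)]
    for j in range(1, 11):
        lag = r[10 - j:-j]
        # E[sigma_t^p sigma_{t-j}^s] = E[sigma^p] E[sigma^s] exp(p s phi^j b2 / 4)
        cols.append(np.abs(x * lag) - (2 / np.pi) * Esig(1)**2 * np.exp(phi**j * b2 / 4))
        cols.append(x**2 * lag**2 - Esig(2)**2 * np.exp(phi**j * b2))
    return np.column_stack(cols)                 # (n - 10) x 24 matrix

# Usage with the two-step routine above (illustrative starting values):
# theta_hat = gmm_two_step(lambda th: g_sv(th, r), theta0=[-0.7, 0.9, 0.1])
```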
Review of Linear Gaussian State Space Models

Defn: A state space model for an $N$-dimensional time series $y_t$ consists of a measurement equation relating the observed data to an $m$-dimensional state vector $\alpha_t$, and a Markovian transition equation that describes the evolution of the state vector over time.
The measurement equation has the form
$$\underset{(N \times 1)}{y_t} = \underset{(N \times m)}{Z_t}\,\underset{(m \times 1)}{\alpha_t} + \underset{(N \times 1)}{d_t} + \underset{(N \times 1)}{\varepsilon_t}, \quad \varepsilon_t \sim \text{iid } N(0, H_t), \quad t = 1, \ldots, T$$
The transition equation for the state vector $\alpha_t$ is the first order Markov process
$$\underset{(m \times 1)}{\alpha_t} = \underset{(m \times m)}{T_t}\,\underset{(m \times 1)}{\alpha_{t-1}} + \underset{(m \times 1)}{c_t} + \underset{(m \times g)}{R_t}\,\underset{(g \times 1)}{\eta_t}, \quad \eta_t \sim \text{iid } N(0, Q_t), \quad t = 1, \ldots, T$$
For most applications, it is assumed that the measurement equation errors $\varepsilon_t$ are independent of the transition equation errors:
$$E[\varepsilon_t \eta_s'] = 0 \quad \text{for all } s, t = 1, \ldots, T$$
The state space representation is completed by specifying the behavior of the initial state:
$$\alpha_0 \sim N(a_0, P_0)$$
$$E[\varepsilon_t a_0'] = 0, \quad E[\eta_t a_0'] = 0 \quad \text{for } t = 1, \ldots, T$$
The matrices $Z_t$, $d_t$, $H_t$, $T_t$, $c_t$, $R_t$ and $Q_t$ are called the system matrices, and contain non-random elements. If these matrices do not depend deterministically on $t$, the state space system is called time invariant. Note: if $y_t$ is covariance stationary, then the state space system will be time invariant.
Initial State Distribution for Covariance Stationary Models

If the state space model is covariance stationary, then the state vector $\alpha_t$ is covariance stationary. The unconditional mean of $\alpha_t$, $a_0$, may be determined using
$$E[\alpha_t] = T E[\alpha_{t-1}] + c = T E[\alpha_t] + c$$
Solving for $E[\alpha_t]$, assuming $I_m - T$ is invertible, gives
$$a_0 = E[\alpha_t] = (I_m - T)^{-1} c$$
Similarly, $\text{var}(\alpha_0)$ may be determined using
$$P_0 = \text{var}(\alpha_t) = T\, \text{var}(\alpha_t)\, T' + R\, \text{var}(\eta_t)\, R' = T P_0 T' + RQR'$$
Then, vec(p 0 ) = vec(tp 0 T 0 ) + vec(rqr 0 ) = (T T)vec(P 0 )+vec(rqr 0 ) which implies that vec(p 0 )=(I m 2 T T) 1 vec(rqr 0 )
Example: Unobserved component AR(2) model
$$y_t = \mu + c_t$$
$$c_t = \phi_1 c_{t-1} + \phi_2 c_{t-2} + \eta_t, \quad \eta_t \sim \text{iid } N(0, \sigma^2)$$
The state vector is $\alpha_t = (c_t, c_{t-1})'$, which is unobservable, and the transition equation is
$$\begin{pmatrix} c_t \\ c_{t-1} \end{pmatrix} = \begin{pmatrix} \phi_1 & \phi_2 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} c_{t-1} \\ c_{t-2} \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \end{pmatrix}\eta_t$$
so that
$$T = \begin{pmatrix} \phi_1 & \phi_2 \\ 1 & 0 \end{pmatrix}, \quad R = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad Q = \sigma^2$$
This representation has measurement equation matrices
$$Z_t = (1, 0), \quad d_t = \mu, \quad \varepsilon_t = 0, \quad H_t = 0$$
Distribution of the initial state $\alpha_0 \sim N(a_0, P_0)$: here $\alpha_t = (c_t, c_{t-1})'$ is stationary and has mean zero, so
$$a_0 = E[\alpha_t] = 0$$
For the state variance, solve
$$\text{vec}(P_0) = (I_4 - T \otimes T)^{-1}\,\text{vec}(RQR')$$
Simple algebra gives
$$I_4 - T \otimes T = \begin{pmatrix} 1-\phi_1^2 & -\phi_1\phi_2 & -\phi_1\phi_2 & -\phi_2^2 \\ -\phi_1 & 1 & -\phi_2 & 0 \\ -\phi_1 & -\phi_2 & 1 & 0 \\ -1 & 0 & 0 & 1 \end{pmatrix}, \quad \text{vec}(RQR') = \begin{pmatrix} \sigma^2 \\ 0 \\ 0 \\ 0 \end{pmatrix}$$
and so
$$\text{vec}(P_0) = \begin{pmatrix} 1-\phi_1^2 & -\phi_1\phi_2 & -\phi_1\phi_2 & -\phi_2^2 \\ -\phi_1 & 1 & -\phi_2 & 0 \\ -\phi_1 & -\phi_2 & 1 & 0 \\ -1 & 0 & 0 & 1 \end{pmatrix}^{-1}\begin{pmatrix} \sigma^2 \\ 0 \\ 0 \\ 0 \end{pmatrix}$$
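A quick numerical check of this calculation with numpy (the parameter values are illustrative):

```python
import numpy as np

phi1, phi2, sigma2 = 0.6, 0.2, 1.0               # illustrative AR(2) parameters
T = np.array([[phi1, phi2],
              [1.0,  0.0]])
R = np.array([[1.0], [0.0]])
Q = np.array([[sigma2]])

# vec(P0) = (I_4 - T kron T)^{-1} vec(R Q R'); vec stacks columns (order="F")
A = np.eye(4) - np.kron(T, T)
vecP0 = np.linalg.solve(A, (R @ Q @ R.T).reshape(-1, order="F"))
P0 = vecP0.reshape(2, 2, order="F")              # unconditional var of (c_t, c_{t-1})'
```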
The Kalman Filter

The Kalman filter is a set of recursion equations for determining the optimal estimates of the state vector $\alpha_t$ given information available at time $t$, $I_t$. The filter consists of two sets of equations:
1. Prediction equations
2. Updating equations
To describe the filter, let
$$a_t = E[\alpha_t \mid I_t] = \text{optimal estimator of } \alpha_t \text{ based on } I_t$$
$$P_t = E[(\alpha_t - a_t)(\alpha_t - a_t)' \mid I_t] = \text{MSE matrix of } a_t$$
Prediction Equations

Given $a_{t-1}$ and $P_{t-1}$ at time $t-1$, the optimal predictor of $\alpha_t$ and its associated MSE matrix are
$$a_{t|t-1} = E[\alpha_t \mid I_{t-1}] = T_t a_{t-1} + c_t$$
$$P_{t|t-1} = E[(\alpha_t - a_{t|t-1})(\alpha_t - a_{t|t-1})' \mid I_{t-1}] = T_t P_{t-1} T_t' + R_t Q_t R_t'$$
The corresponding optimal predictor of $y_t$ given information at $t-1$ is
$$y_{t|t-1} = Z_t a_{t|t-1} + d_t$$
The prediction error and its MSE matrix are
$$v_t = y_t - y_{t|t-1} = y_t - Z_t a_{t|t-1} - d_t = Z_t(\alpha_t - a_{t|t-1}) + \varepsilon_t$$
$$E[v_t v_t'] = F_t = Z_t P_{t|t-1} Z_t' + H_t$$
These are the components that are required to form the prediction error decomposition of the log-likelihood function.
Updating Equations

When the new observation $y_t$ becomes available, the optimal predictor $a_{t|t-1}$ and its MSE matrix are updated using
$$a_t = a_{t|t-1} + P_{t|t-1} Z_t' F_t^{-1}(y_t - Z_t a_{t|t-1} - d_t) = a_{t|t-1} + P_{t|t-1} Z_t' F_t^{-1} v_t$$
$$P_t = P_{t|t-1} - P_{t|t-1} Z_t' F_t^{-1} Z_t P_{t|t-1}$$
The value $a_t$ is referred to as the filtered estimate of $\alpha_t$, and $P_t$ is the MSE matrix of this estimate. It is the optimal estimate of $\alpha_t$ given information available at time $t$.
Prediction Error Decomposition

Let $\theta$ denote the parameters of the state space model. These parameters are embedded in the system matrices. For the state space model with a fixed value of $\theta$, the Kalman filter produces the prediction errors $v_t(\theta)$ and the prediction error variances $F_t(\theta)$ from the prediction equations. The prediction error decomposition of the log-likelihood function follows immediately:
$$\ln L(\theta \mid y) = -\frac{NT}{2}\ln(2\pi) - \frac{1}{2}\sum_{t=1}^T \ln|F_t(\theta)| - \frac{1}{2}\sum_{t=1}^T v_t(\theta)' F_t^{-1}(\theta)\, v_t(\theta)$$
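A minimal sketch of the filter and the prediction error decomposition for a time-invariant model (array shapes follow the system matrix dimensions above; this is illustrative, not production code):

```python
import numpy as np

def kalman_loglik(y, Z, d, H, T, c, R, Q, a0, P0):
    """Kalman filter for a time-invariant state space model.
    y: (n, N) array of observations. Returns the prediction error
    decomposition of the Gaussian log-likelihood."""
    a, P = a0, P0
    N = Z.shape[0]
    loglik = 0.0
    for yt in y:
        # Prediction equations
        a_pred = T @ a + c                        # a_{t|t-1}
        P_pred = T @ P @ T.T + R @ Q @ R.T        # P_{t|t-1}
        v = yt - (Z @ a_pred + d)                 # prediction error v_t
        F = Z @ P_pred @ Z.T + H                  # prediction error variance F_t
        _, logdetF = np.linalg.slogdet(F)
        loglik += -0.5 * (N * np.log(2 * np.pi) + logdetF
                          + v @ np.linalg.solve(F, v))
        # Updating equations
        K = P_pred @ Z.T @ np.linalg.inv(F)       # gain P_{t|t-1} Z' F^{-1}
        a = a_pred + K @ v                        # filtered state a_t
        P = P_pred - K @ Z @ P_pred               # filtered MSE P_t
    return loglik
```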
State Space Representation of Log-Normal AR(1) SV Model

Parameterization 1: let $y_t = r_t - \mu$, so that
$$y_t = \sigma_t u_t$$
$$\ln \sigma_t = \omega + \phi \ln \sigma_{t-1} + \eta_t$$
Then
$$\ln|y_t| = \ln \sigma_t + \ln|u_t|$$
$$E[\ln|u_t|] = -0.63518, \quad \text{var}(\ln|u_t|) = \pi^2/8$$
The measurement equation is
$$\ln|y_t| = -0.63518 + \ln \sigma_t + \xi_t, \quad \xi_t \sim \text{iid } (0, \pi^2/8)$$
The transition equation is
$$\ln \sigma_t = \omega + \phi \ln \sigma_{t-1} + \eta_t, \quad \eta_t \sim \text{iid } N(0, \sigma_\eta^2)$$
State space parameters:
$$\alpha_t = \ln \sigma_t = \text{state variable}$$
$$Z_t = 1, \quad d_t = -0.63518, \quad T = \phi$$
$$a_0 = \omega/(1-\phi), \quad P_0 = \sigma_\eta^2/(1-\phi^2)$$
Parameterization 2:
$$r_t = \exp(w_t/2) u_t, \quad w_t = \omega + \phi w_{t-1} + \eta_{w,t}$$
Then
$$\ln r_t^2 = w_t + \ln u_t^2$$
$$E[\ln u_t^2] = -1.27, \quad \text{var}(\ln u_t^2) = \pi^2/2$$
The measurement equation is
$$\ln r_t^2 = -1.27 + w_t + \zeta_t, \quad \zeta_t \sim \text{iid } (0, \pi^2/2)$$
The transition equation is
$$w_t = \omega + \phi w_{t-1} + \eta_{w,t}, \quad \eta_{w,t} \sim \text{iid } N(0, \sigma_{\eta_w}^2)$$
State space parameters:
$$\alpha_t = w_t = \text{state variable}$$
$$Z_t = 1, \quad d_t = -1.27, \quad T = \phi$$
$$a_0 = \omega/(1-\phi), \quad P_0 = \sigma_{\eta_w}^2/(1-\phi^2)$$
Complications
- $\xi_t$ and $\zeta_t$ are not Gaussian random variables.
- The Kalman filter only provides minimum MSE linear estimators of the state variables.
- Quasi-MLE can be performed from the prediction error decomposition of the Gaussian log-likelihood.
- A sandwich asymptotic variance must be used for valid standard errors.
- Returns $r_t$ near 0 make $\ln|r_t|$ and $\ln r_t^2$ explode toward $-\infty$, which causes numerical problems.
Transformation to improve numerical stability (Breidt and Carriquiry, 1996):
$$x_t = \ln(r_t^2 + s) - \frac{s}{r_t^2 + s}, \quad s = 0.02 \cdot \widehat{\text{var}}(r_t)$$
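Putting the pieces together, a sketch of quasi-MLE for parameterization 2 using the transformed series $x_t$ and the kalman_loglik routine from the earlier sketch (the function name, starting values, and optimizer choice are my assumptions):

```python
import numpy as np
from scipy.optimize import minimize

def sv_qmle(r):
    """Quasi-MLE of the log-normal SV model (parameterization 2) from the
    Breidt-Carriquiry transformed observations x_t."""
    s = 0.02 * r.var()                            # s = 0.02 * var-hat(r_t)
    x = (np.log(r**2 + s) - s / (r**2 + s))[:, None]
    def negloglik(params):
        omega, phi, sig_eta = params
        if abs(phi) >= 1 or sig_eta <= 0:         # enforce stationarity
            return np.inf
        Z = np.array([[1.0]]); d = np.array([-1.27])
        H = np.array([[np.pi**2 / 2]])            # var(zeta_t)
        T = np.array([[phi]]); c = np.array([omega])
        R = np.array([[1.0]]); Q = np.array([[sig_eta**2]])
        a0 = np.array([omega / (1 - phi)])        # unconditional mean of w_t
        P0 = np.array([[sig_eta**2 / (1 - phi**2)]])  # unconditional variance
        return -kalman_loglik(x, Z, d, H, T, c, R, Q, a0, P0)
    res = minimize(negloglik, x0=[-0.2, 0.95, 0.2], method="Nelder-Mead")
    return res.x   # (omega, phi, sigma_eta); sandwich SEs are still required
```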
Log-Normal AR(1) SV Model with Student-t Errors

$$y_t = \sigma_t u_t, \quad u_t \sim \text{iid St}(v), \quad v > 2$$
$$\ln(\sigma_t) - \alpha = \phi(\ln(\sigma_{t-1}) - \alpha) + \eta_t, \quad \eta_t \sim \text{iid } N(0, \sigma_\eta^2)$$
with $u_t$ independent of $\eta_t$ for all $t$. It follows that
$$u_t = \omega_t^{1/2} v_t, \quad v_t \sim \text{iid } N(0,1), \quad (v-2)\omega_t^{-1} \sim \chi_v^2$$
$$\ln|y_t| = \ln \sigma_t + \tfrac{1}{2}\ln \omega_t + \ln|v_t| = \mu_\xi(v) + \ln \sigma_t + \xi_t$$
$$\xi_t = \tfrac{1}{2}\ln \omega_t + \ln|v_t| - \mu_\xi(v), \quad E[\xi_t] = 0$$
$$\mu_\xi(v) = E\left[\tfrac{1}{2}\ln \omega_t + \ln|v_t|\right] = \tfrac{1}{2}\left[\ln(v-2) + \psi(\tfrac{1}{2}) - \psi(\tfrac{v}{2})\right]$$
$$\text{var}(\xi_t) = \sigma_\xi^2(v) = \frac{\pi^2}{8} + \frac{1}{4}\psi'(\tfrac{v}{2})$$
where $\psi$ and $\psi'$ denote the digamma and trigamma functions. As $v \to \infty$ these reduce to the Gaussian values $-0.63518$ and $\pi^2/8$.
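Under this representation, $\mu_\xi(v)$ and $\sigma_\xi^2(v)$ can be evaluated with standard special functions; a small sketch (function name mine):

```python
import numpy as np
from scipy.special import digamma, polygamma

def t_error_moments(v):
    """Mean and variance of (1/2) ln(omega_t) + ln|v_t| for the SV model
    with standardized Student-t(v) errors (requires v > 2)."""
    mu_xi = 0.5 * (np.log(v - 2) + digamma(0.5) - digamma(v / 2))
    var_xi = np.pi**2 / 8 + 0.25 * polygamma(1, v / 2)   # polygamma(1, .) = trigamma
    return mu_xi, var_xi

# As v grows these approach the Gaussian values -0.63518 and pi^2/8:
print(t_error_moments(1e6))
```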