Lecture 4: Dynamic models


Hedibert Freitas Lopes
The University of Chicago Booth School of Business
5807 South Woodlawn Avenue, Chicago, IL 60637
http://faculty.chicagobooth.edu/hedibert.lopes
hlopes@chicagobooth.edu

Outline

1. 1st order DLM
2. Dynamic linear models
3. Nonlinear dynamic models
4. Stochastic volatility (SV)

Dynamic models (DMs)

1st order DLM

The local level model (West and Harrison, 1997) has

Observation equation: $y_{t+1} \mid x_{t+1}, \theta \sim N(x_{t+1}, \sigma^2)$

System equation: $x_{t+1} \mid x_t, \theta \sim N(x_t, \tau^2)$

where $x_0 \sim N(m_0, C_0)$ and $\theta = (\sigma^2, \tau^2)$ is fixed (for now).

It is worth noticing that the model can be rewritten as

$$y \mid x, \theta \sim N(x, \sigma^2 I_n), \qquad x \mid x_0, \theta \sim N(x_0 1_n, \tau^2 \Omega), \qquad x_0 \sim N(m_0, C_0),$$

where $\Omega$ is the $n \times n$ matrix with entries $\Omega_{ij} = \min(i, j)$:

$$\Omega = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 2 & \cdots & 2 \\ 1 & 2 & 3 & \cdots & 3 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 2 & 3 & \cdots & n \end{pmatrix}.$$

Therefore, the prior of $x$ given $\theta$ is

$$x \mid \theta \sim N(m_0 1_n, \; C_0 1_n 1_n' + \tau^2 \Omega),$$

while its full conditional posterior distribution is $x \mid y, \theta \sim N(m_1, C_1)$, where

$$C_1^{-1} = (C_0 1_n 1_n' + \tau^2 \Omega)^{-1} + \sigma^{-2} I_n$$

and

$$C_1^{-1} m_1 = (C_0 1_n 1_n' + \tau^2 \Omega)^{-1} m_0 1_n + \sigma^{-2} y.$$

Let $y^t = (y_1, \ldots, y_t)$. The joint posterior of $x$ given $y^n$ (omitting $\theta$ for now) can be constructed backwards as

$$p(x \mid y^n) = p(x_n \mid y^n) \prod_{t=1}^{n-1} p(x_t \mid y^t, x_{t+1}),$$

which is obtained from $p(x_n \mid y^n)$ by noticing that, given $y^t$ and $x_{t+1}$, $x_t$ is conditionally independent of the future states $x_{t+h}$, $h > 1$, and of the future observations $y_{t+h}$, $h \geq 1$.

1st order DLM: Kalman filter

Therefore, we first need to derive these filtering distributions, and this is done forward via the well-known Kalman recursions $p(x_t \mid y^t) \rightarrow p(x_{t+1} \mid y^t) \rightarrow p(y_{t+1} \mid y^t) \rightarrow p(x_{t+1} \mid y^{t+1})$:

Posterior at $t$: $(x_t \mid y^t) \sim N(m_t, C_t)$

Prior at $t+1$: $(x_{t+1} \mid y^t) \sim N(m_t, R_{t+1})$, with $R_{t+1} = C_t + \tau^2$

Marginal likelihood: $(y_{t+1} \mid y^t) \sim N(m_t, Q_{t+1})$, with $Q_{t+1} = R_{t+1} + \sigma^2$

Posterior at $t+1$: $(x_{t+1} \mid y^{t+1}) \sim N(m_{t+1}, C_{t+1})$, with

$$m_{t+1} = (1 - A_{t+1}) m_t + A_{t+1} y_{t+1}, \qquad C_{t+1} = A_{t+1}\sigma^2, \qquad A_{t+1} = R_{t+1}/Q_{t+1}.$$
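The recursions above translate directly into code. Below is a minimal Python sketch (the function name, defaults and the simulated example are mine, not from the slides) of the forward filter for the local level model.

```python
# Minimal sketch of the Kalman filter for the local level model
#   y_t | x_t ~ N(x_t, sig2),  x_t | x_{t-1} ~ N(x_{t-1}, tau2),  x_0 ~ N(m0, C0).
# Returns the filtered moments (m_t, C_t) and the one-step prior moments (a_t, R_t).
import numpy as np

def kalman_local_level(y, sig2, tau2, m0=0.0, C0=10.0):
    n = len(y)
    m = np.empty(n); C = np.empty(n)          # moments of p(x_t | y^t)
    a = np.empty(n); R = np.empty(n)          # moments of p(x_t | y^{t-1})
    m_prev, C_prev = m0, C0
    for t in range(n):
        a[t] = m_prev                         # prior mean
        R[t] = C_prev + tau2                  # prior variance
        Q = R[t] + sig2                       # predictive variance of y_t
        A = R[t] / Q                          # Kalman gain
        m[t] = (1 - A) * m_prev + A * y[t]    # posterior mean
        C[t] = A * sig2                       # posterior variance
        m_prev, C_prev = m[t], C[t]
    return m, C, a, R

# Example on simulated data (sigma^2 = 1, tau^2 = 0.5, as in the slides):
rng = np.random.default_rng(0)
x_true = np.cumsum(rng.normal(scale=np.sqrt(0.5), size=100))
y_sim = x_true + rng.normal(size=100)
m, C, a, R = kalman_local_level(y_sim, sig2=1.0, tau2=0.5)
```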

1st order DLM: smoothing

For $t = n$, $x_n \mid y^n \sim N(m_n^n, C_n^n)$, where $m_n^n = m_n$ and $C_n^n = C_n$. For $t < n$,

$$x_t \mid y^n \sim N(m_t^n, C_t^n) \qquad \text{and} \qquad x_t \mid x_{t+1}, y^n \sim N(a_t^n, R_t^n),$$

where

$$m_t^n = (1 - B_t) m_t + B_t m_{t+1}^n, \qquad C_t^n = (1 - B_t) C_t + B_t^2 C_{t+1}^n,$$
$$a_t^n = (1 - B_t) m_t + B_t x_{t+1}, \qquad R_t^n = B_t \tau^2, \qquad B_t = C_t/(C_t + \tau^2).$$
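A matching sketch of the backward smoothing pass (again, names are mine), to be run on the output of kalman_local_level above:

```python
# Backward recursion for the smoothed moments of p(x_t | y^n) in the local level model.
import numpy as np

def smooth_local_level(m, C, tau2):
    n = len(m)
    ms = m.copy(); Cs = C.copy()              # at t = n: m_n^n = m_n, C_n^n = C_n
    for t in range(n - 2, -1, -1):
        B = C[t] / (C[t] + tau2)
        ms[t] = (1 - B) * m[t] + B * ms[t + 1]
        Cs[t] = (1 - B) * C[t] + B**2 * Cs[t + 1]
    return ms, Cs
```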

Simulated data: $n = 100$, $\sigma^2 = 1.0$, $\tau^2 = 0.5$ and $x_0 = 0$. (Figure: time series of $y_t$ and $x_t$.)

Filtered distributions $p(x_t \mid y^t)$ via the Kalman filter, with $m_0 = 0.0$ and $C_0 = 10.0$, given $\tau^2$ and $\sigma^2$. (Figure.)

Smoothed distributions $p(x_t \mid y^n)$ via the Kalman recursions, with $m_0 = 0.0$ and $C_0 = 10.0$, given $\tau^2$ and $\sigma^2$. (Figure.)

Posterior of $\theta$

We showed earlier that $(y_t \mid y^{t-1}) \sim N(m_{t-1}, Q_t)$, where both $m_{t-1}$ and $Q_t$ were presented before and are functions of $\theta = (\sigma^2, \tau^2)$, $y^{t-1}$, $m_0$ and $C_0$. Therefore, by Bayes' rule,

$$p(\theta \mid y^n) \propto p(\theta)\, p(y^n \mid \theta) = p(\theta) \prod_{t=1}^{n} f_N(y_t; m_{t-1}, Q_t).$$
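The product above is the integrated likelihood of $\theta$ and is a by-product of the forward filter. A sketch (names are mine) that accumulates it on the log scale:

```python
# Log integrated likelihood log p(y^n | theta), theta = (sig2, tau2), via the
# predictive decomposition prod_t N(y_t; m_{t-1}, Q_t) of the local level model.
import numpy as np
from scipy.stats import norm

def loglik_local_level(y, sig2, tau2, m0=0.0, C0=10.0):
    m, C, ll = m0, C0, 0.0
    for t in range(len(y)):
        R = C + tau2                              # prior variance of x_t
        Q = R + sig2                              # predictive variance of y_t
        ll += norm.logpdf(y[t], loc=m, scale=np.sqrt(Q))
        A = R / Q
        m = (1 - A) * m + A * y[t]
        C = A * sig2
    return ll
```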

Prior of $(\sigma^2, \tau^2)$

The joint posterior is proportional to $p(y \mid \sigma^2, \tau^2)\, p(\sigma^2)\, p(\tau^2)$, with

$$\sigma^2 \sim IG(\nu_0/2, \nu_0\sigma_0^2/2), \text{ where } \nu_0 = 5 \text{ and } \sigma_0^2 = 1,$$
$$\tau^2 \sim IG(n_0/2, n_0\tau_0^2/2), \text{ where } n_0 = 5 \text{ and } \tau_0^2 = 0.5.$$

(Figure: contour plot of the prior over $(\sigma^2, \tau^2)$.)

Gibbs sampler

Sample $\theta$ from $p(\theta \mid y^n, x^n)$:

$$p(\theta \mid y^n, x^n) \propto p(\theta) \prod_{t=1}^{n} p(y_t \mid x_t, \theta)\, p(x_t \mid x_{t-1}, \theta).$$

Sample $x^n$ from $p(x^n \mid y^n, \theta)$:

$$p(x^n \mid y^n, \theta) = \prod_{t=1}^{n} f_N(x_t; a_t^n, R_t^n).$$
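The second step is the forward filtering, backward sampling (FFBS) draw of the whole state vector. A minimal sketch for the local level model (names are mine; it reuses kalman_local_level from the earlier sketch):

```python
# FFBS draw of x_1, ..., x_n from p(x^n | y^n, sig2, tau2) in the local level model.
import numpy as np

def ffbs_local_level(y, sig2, tau2, m0=0.0, C0=10.0, rng=None):
    rng = rng or np.random.default_rng()
    m, C, a, R = kalman_local_level(y, sig2, tau2, m0, C0)  # forward pass (defined above)
    n = len(y)
    x = np.empty(n)
    x[n - 1] = rng.normal(m[n - 1], np.sqrt(C[n - 1]))       # x_n ~ N(m_n, C_n)
    for t in range(n - 2, -1, -1):                           # backward sampling
        B = C[t] / (C[t] + tau2)
        a_t = (1 - B) * m[t] + B * x[t + 1]                  # a_t^n
        R_t = B * tau2                                       # R_t^n
        x[t] = rng.normal(a_t, np.sqrt(R_t))
    return x
```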

MCMC output for $(\sigma^2, \tau^2)$. (Figure: joint draws and marginal posterior densities of $\sigma^2$ and $\tau^2$.)

Comparison: $p(x_t \mid y^n)$ versus $p(x_t \mid y^n, \sigma^2 = 0.87, \tau^2 = 0.63)$. (Figure.)

Beyond the 1st order DLM: learning in non-normal and nonlinear dynamic models

For general $p(y_{t+1} \mid x_{t+1})$ and $p(x_{t+1} \mid x_t)$, inference is in general rather difficult, since

$$p(x_{t+1} \mid y^t) = \int p(x_{t+1} \mid x_t)\, p(x_t \mid y^t)\, dx_t$$
$$p(x_{t+1} \mid y^{t+1}) \propto p(y_{t+1} \mid x_{t+1})\, p(x_{t+1} \mid y^t)$$

are usually unavailable in closed form.

Over the last 20 years: FFBS for conditionally Gaussian DLMs; Gamerman (1998) for generalized DLMs; Carlin, Polson and Stoffer (1992) for more general dynamic models.

Dynamic linear models

Dynamic linear models form a large class of models with time-varying parameters. They are defined by a pair of equations, the observation equation and the evolution/system equation:

$$y_t = F_t'\beta_t + \epsilon_t, \qquad \epsilon_t \sim N(0, V)$$
$$\beta_t = G_t\beta_{t-1} + \omega_t, \qquad \omega_t \sim N(0, W)$$

where
$y_t$: sequence of observations;
$F_t$: vector of explanatory variables;
$\beta_t$: $d$-dimensional state vector;
$G_t$: $d \times d$ evolution matrix;
$\beta_1 \sim N(a, R)$.

The linear growth model is slightly more elaborate, incorporating an extra time-varying parameter $\beta_2$ that represents the growth of the level of the series:

$$y_t = \beta_{1,t} + \epsilon_t, \qquad \epsilon_t \sim N(0, V)$$
$$\beta_{1,t} = \beta_{1,t-1} + \beta_{2,t} + \omega_{1,t}$$
$$\beta_{2,t} = \beta_{2,t-1} + \omega_{2,t}$$

where $\omega_t = (\omega_{1,t}, \omega_{2,t})' \sim N(0, W)$,

$$F_t = (1, 0)' \qquad \text{and} \qquad G_t = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.$$

Prior, updated and smoothed distributions

Prior distributions: $p(\beta_t \mid y^{t-k})$, $k > 0$
Updated/online distributions: $p(\beta_t \mid y^t)$
Smoothed distributions: $p(\beta_t \mid y^{t+k})$, $k > 0$

Sequential inference

Let $y^t = \{y_1, \ldots, y_t\}$.

Posterior at time $t-1$: $\beta_{t-1} \mid y^{t-1} \sim N(m_{t-1}, C_{t-1})$

Prior at time $t$: $\beta_t \mid y^{t-1} \sim N(a_t, R_t)$, with $a_t = G_t m_{t-1}$ and $R_t = G_t C_{t-1} G_t' + W$.

One-step-ahead predictive at time $t$: $y_t \mid y^{t-1} \sim N(f_t, Q_t)$, with $f_t = F_t' a_t$ and $Q_t = F_t' R_t F_t + V$.

Posterior at time $t$:

$$p(\beta_t \mid y^t) = p(\beta_t \mid y_t, y^{t-1}) \propto p(y_t \mid \beta_t)\, p(\beta_t \mid y^{t-1}).$$

The resulting posterior distribution is $\beta_t \mid y^t \sim N(m_t, C_t)$, with

$$m_t = a_t + A_t e_t, \qquad C_t = R_t - A_t A_t' Q_t, \qquad A_t = R_t F_t / Q_t, \qquad e_t = y_t - f_t.$$

By induction, these distributions are valid for all times.
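These recursions generalize the earlier scalar filter to a $d$-dimensional state. A sketch in Python (names are mine; for simplicity $G$ and $W$ are kept constant over time and $y_t$ is univariate):

```python
# Kalman filter for the DLM  y_t = F_t' beta_t + eps_t, eps_t ~ N(0, V),
#                            beta_t = G beta_{t-1} + w_t,  w_t  ~ N(0, W).
import numpy as np

def dlm_filter(y, F, G, V, W, m0, C0):
    """y: (n,), F: (n, d), G: (d, d), V: scalar, W: (d, d)."""
    n, d = F.shape
    m = np.empty((n, d)); C = np.empty((n, d, d))
    a = np.empty((n, d)); R = np.empty((n, d, d))
    m_prev, C_prev = np.asarray(m0, float), np.asarray(C0, float)
    for t in range(n):
        a[t] = G @ m_prev                      # prior mean a_t
        R[t] = G @ C_prev @ G.T + W            # prior variance R_t
        f = F[t] @ a[t]                        # predictive mean f_t
        Q = F[t] @ R[t] @ F[t] + V             # predictive variance Q_t
        A = R[t] @ F[t] / Q                    # gain A_t
        m[t] = a[t] + A * (y[t] - f)           # posterior mean m_t
        C[t] = R[t] - np.outer(A, A) * Q       # posterior variance C_t
        m_prev, C_prev = m[t], C[t]
    return m, C, a, R
```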

Smoothed distribution

In dynamic models, the smoothed distribution $\pi(\beta \mid y^n)$ is more commonly used:

$$\pi(\beta \mid y^n) = p(\beta_n \mid y^n) \prod_{t=1}^{n-1} p(\beta_t \mid \beta_{t+1}, \ldots, \beta_n, y^n) = p(\beta_n \mid y^n) \prod_{t=1}^{n-1} p(\beta_t \mid \beta_{t+1}, y^t).$$

Integrating with respect to $(\beta_1, \ldots, \beta_{t-1})$:

$$\pi(\beta_t, \ldots, \beta_n \mid y^n) = p(\beta_n \mid y^n) \prod_{k=t}^{n-1} p(\beta_k \mid \beta_{k+1}, y^k)$$
$$\pi(\beta_t, \beta_{t+1} \mid y^n) = p(\beta_{t+1} \mid y^n)\, p(\beta_t \mid \beta_{t+1}, y^t)$$

for $t = 1, \ldots, n-1$.

Smoothed marginals: $p(\beta_t \mid y^n)$

It can be shown that $\beta_t \mid V, W, y^n \sim N(m_t^n, C_t^n)$, where

$$m_t^n = m_t + C_t G_{t+1}' R_{t+1}^{-1}(m_{t+1}^n - a_{t+1})$$
$$C_t^n = C_t - C_t G_{t+1}' R_{t+1}^{-1}(R_{t+1} - C_{t+1}^n) R_{t+1}^{-1} G_{t+1} C_t.$$

Conditional distribution: $p(\beta_t \mid \beta_{t+1}, y^n)$

It can be shown that $(\beta_t \mid \beta_{t+1}, V, W, y^n)$ is normally distributed with mean

$$(G_{t+1}' W^{-1} G_{t+1} + C_t^{-1})^{-1}(G_{t+1}' W^{-1}\beta_{t+1} + C_t^{-1} m_t)$$

and variance $(G_{t+1}' W^{-1} G_{t+1} + C_t^{-1})^{-1}$.

Forward filtering, backward sampling (FFBS)

Sampling from $\pi(\beta \mid y^n)$ can be performed by

1. Sampling $\beta_n$ from $N(m_n, C_n)$, and then
2. Sampling $\beta_t$ from $(\beta_t \mid \beta_{t+1}, V, W, y^t)$, for $t = n-1, \ldots, 1$.

The above scheme is known as the forward filtering, backward sampling (FFBS) algorithm (Carter and Kohn, 1994; Frühwirth-Schnatter, 1994).
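A sketch of FFBS for the general DLM, built on dlm_filter above and on the conditional distribution from the previous slide (names are mine; $G$ and $W$ constant for simplicity):

```python
# FFBS draw of beta_1, ..., beta_n from pi(beta | y^n, V, W) in the DLM.
import numpy as np

def dlm_ffbs(y, F, G, V, W, m0, C0, rng=None):
    rng = rng or np.random.default_rng()
    m, C, a, R = dlm_filter(y, F, G, V, W, m0, C0)        # forward pass (defined above)
    n, d = m.shape
    beta = np.empty((n, d))
    beta[-1] = rng.multivariate_normal(m[-1], C[-1])      # beta_n ~ N(m_n, C_n)
    Winv = np.linalg.inv(W)
    for t in range(n - 2, -1, -1):                        # backward sampling
        Cinv = np.linalg.inv(C[t])
        B = np.linalg.inv(G.T @ Winv @ G + Cinv)          # conditional variance
        b = B @ (G.T @ Winv @ beta[t + 1] + Cinv @ m[t])  # conditional mean
        beta[t] = rng.multivariate_normal(b, B)
    return beta
```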

Single-move sampling from $\pi(\beta_t \mid \beta_{-t}, y^n)$

Let $\beta_{-t} = (\beta_1, \ldots, \beta_{t-1}, \beta_{t+1}, \ldots, \beta_n)$ and write $V = \sigma^2$. For $t = 2, \ldots, n-1$,

$$\pi(\beta_t \mid \beta_{-t}, y^n) \propto p(y_t \mid \beta_t)\, p(\beta_{t+1} \mid \beta_t)\, p(\beta_t \mid \beta_{t-1})$$
$$\propto f_N(y_t; F_t'\beta_t, V)\, f_N(\beta_{t+1}; G_{t+1}\beta_t, W)\, f_N(\beta_t; G_t\beta_{t-1}, W) = f_N(\beta_t; b_t, B_t),$$

where

$$b_t = B_t(\sigma^{-2} F_t y_t + G_{t+1}' W^{-1}\beta_{t+1} + W^{-1} G_t\beta_{t-1})$$
$$B_t = (\sigma^{-2} F_t F_t' + G_{t+1}' W^{-1} G_{t+1} + W^{-1})^{-1}.$$

For $t = 1$ and $t = n$,

$$\pi(\beta_1 \mid \beta_{-1}, y^n) = f_N(\beta_1; b_1, B_1) \qquad \text{and} \qquad \pi(\beta_n \mid \beta_{-n}, y^n) = f_N(\beta_n; b_n, B_n),$$

where

$$b_1 = B_1(\sigma^{-2} F_1 y_1 + G_2' W^{-1}\beta_2 + R^{-1} a), \qquad B_1 = (\sigma^{-2} F_1 F_1' + G_2' W^{-1} G_2 + R^{-1})^{-1},$$
$$b_n = B_n(\sigma^{-2} F_n y_n + W^{-1} G_n\beta_{n-1}), \qquad B_n = (\sigma^{-2} F_n F_n' + W^{-1})^{-1}.$$

linear s SV Sampling from π(v, W y n, β) Assume that φ = V 1 Gamma(n σ /2, n σ S σ /2) Φ = W 1 Wishart(n W /2, n W S W /2) Full conditionals π(φ β, Φ) π(φ β, φ) n f N (y t ; F t β t, φ 1 ) f G (φ; n σ /2, n σ S σ /2) t=1 f G (φ; nσ/2, nσs σ/2) n f N (β t ; G t β t 1, Φ 1 ) f W (Φ; n W /2, n W S W /2) t=2 f W (Φ; nw /2, nw SW /2) where nσ = n σ + n, nw = n W + n 1, nσs σ = n σ S σ + σ(y t F t β t ) 2 nw SW = n W S W + Σ n t=2(β t G t β t 1 )(β t G t β t 1 )

MCMC for $p(\beta, V, W \mid y^n)$

1. Sample $V^{-1}$ from its full conditional $f_G(\phi; n_\sigma^*/2, n_\sigma^* S_\sigma^*/2)$.
2. Sample $W^{-1}$ from its full conditional $f_W(\Phi; n_W^*/2, n_W^* S_W^*/2)$.
3. Sample $\beta$ from its full conditional $\pi(\beta \mid y^n, V, W)$ by the FFBS algorithm.
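Putting the pieces together for the simplest case, here is a sketch of the full Gibbs sampler for the local level model with inverse-gamma priors on $\sigma^2$ and $\tau^2$ (names, defaults and the explicit draw of $x_0$ are mine; it reuses ffbs_local_level from above):

```python
# Gibbs sampler for (x_{0:n}, sig2, tau2) in the local level model with
# sig2 ~ IG(nu0/2, nu0*sig02/2) and tau2 ~ IG(n0/2, n0*tau02/2).
import numpy as np

def gibbs_local_level(y, niter=2000, nu0=5, sig02=1.0, n0=5, tau02=0.5,
                      m0=0.0, C0=10.0, rng=None):
    rng = rng or np.random.default_rng()
    n = len(y)
    sig2, tau2 = sig02, tau02                              # starting values
    out = {"sig2": [], "tau2": []}
    for _ in range(niter):
        # 1. states: x_{1:n} by FFBS, then x_0 from p(x_0 | x_1, tau2)
        x = ffbs_local_level(y, sig2, tau2, m0, C0, rng)
        C = 1.0 / (1.0 / C0 + 1.0 / tau2)
        x0 = rng.normal(C * (m0 / C0 + x[0] / tau2), np.sqrt(C))
        # 2. sig2 | x, y ~ IG((nu0 + n)/2, (nu0*sig02 + sum (y_t - x_t)^2)/2)
        sig2 = 1.0 / rng.gamma((nu0 + n) / 2, 2.0 / (nu0 * sig02 + np.sum((y - x)**2)))
        # 3. tau2 | x_{0:n} ~ IG((n0 + n)/2, (n0*tau02 + sum (x_t - x_{t-1})^2)/2)
        ss = (x[0] - x0)**2 + np.sum(np.diff(x)**2)
        tau2 = 1.0 / rng.gamma((n0 + n) / 2, 2.0 / (n0 * tau02 + ss))
        out["sig2"].append(sig2); out["tau2"].append(tau2)
    return out
```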

Likelihood for $(V, W)$

It is easy to see that

$$p(y^n \mid V, W) = \prod_{t=1}^{n} f_N(y_t; f_t, Q_t),$$

which is the integrated likelihood of $(V, W)$.

Jointly sampling $(\beta, V, W)$

$(\beta, V, W)$ can be sampled jointly by

1. Sampling $(V, W)$ from its marginal posterior $\pi(V, W \mid y^n) \propto l(V, W \mid y^n)\,\pi(V, W)$ by a rejection or Metropolis-Hastings step;
2. Sampling $\beta$ from its full conditional $\pi(\beta \mid y^n, V, W)$ by the FFBS algorithm.

Jointly sampling $(\beta, V, W)$ avoids MCMC convergence problems associated with the posterior correlation between parameters (Gamerman and Moreira, 2002).

Comparing sampling schemes¹

First order DLM with $V = 1$:

$$y_t = \beta_t + \epsilon_t, \qquad \epsilon_t \sim N(0, 1)$$
$$\beta_t = \beta_{t-1} + \omega_t, \qquad \omega_t \sim N(0, W),$$

with $(n, W) \in \{(100, .01), (100, .5), (1000, .01), (1000, .5)\}$.

400 runs: 100 replications per combination.

Priors: $\beta_1 \sim N(0, 10)$; $V$ and $W$ have inverse gamma priors with means set at the true values and coefficients of variation set at 10.

Posterior inference: based on 20,000 MCMC draws.

¹ Gamerman, Reis and Salazar (2006). Comparison of sampling schemes for dynamic linear models. International Statistical Review, 74, 203-214.

Effective sample size

For a given $\theta$, let $t^{(n)} = t(\theta^{(n)})$, $\gamma_k = Cov_\pi(t^{(n)}, t^{(n+k)})$, the variance of $t^{(n)}$ be $\sigma^2 = \gamma_0$, the lag-$k$ autocorrelation be $\rho_k = \gamma_k/\sigma^2$, and $\tau_n^2/n = Var_\pi(\bar{t}_n)$. It can be shown that, as $n \to \infty$,

$$\tau_n^2 = \sigma^2\left(1 + 2\sum_{k=1}^{n-1}\frac{n-k}{n}\rho_k\right) \longrightarrow \sigma^2\underbrace{\left(1 + 2\sum_{k=1}^{\infty}\rho_k\right)}_{\text{inefficiency factor}}.$$

The inefficiency factor measures how far the $t^{(n)}$'s are from being a random sample and how much $Var_\pi(\bar{t}_n)$ increases because of that. The effective sample size,

$$n_{\text{eff}} = \frac{n}{1 + 2\sum_{k=1}^{\infty}\rho_k},$$

is the size of a random sample with the same variance.
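A sketch of how $n_{\text{eff}}$ can be estimated from an MCMC trace (names are mine; the sum of autocorrelations is truncated at the first negative estimate, a simple rule that is my choice, not the slides'):

```python
# Estimate the effective sample size n_eff = n / (1 + 2*sum_k rho_k) of a chain.
import numpy as np

def effective_sample_size(chain):
    x = np.asarray(chain, float)
    x = x - x.mean()
    n = len(x)
    acov = np.correlate(x, x, mode="full")[n - 1:] / n   # autocovariances gamma_k
    rho = acov / acov[0]                                 # autocorrelations rho_k
    s = 0.0
    for k in range(1, n):
        if rho[k] < 0:                                   # truncate at first negative rho_k
            break
        s += rho[k]
    return n / (1.0 + 2.0 * s)
```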

Sampling schemes

Scheme I: Sampling $\beta_1, \ldots, \beta_n$, $V$, $W$ from their full conditionals (single moves).
Scheme II: Sampling $\beta$ (as a block), $V$ and $W$ from their full conditionals.
Scheme III: Jointly sampling $(\beta, V, W)$.

Computing times relative to Scheme I:

Scheme   n = 100   n = 1000
II       1.7       1.9
III      1.9       7.2

Sample averages (based on the 100 replications) of $n_{\text{eff}}$ for $V$:

W      n      I      II      III
0.01   1000   242    8938    2983
0.01   100    3283   13685   12263
0.50   1000   409    3043    963
0.50   100    1694   3404    923

A nonlinear dynamic model

Let $y_t$, for $t = 1, \ldots, n$, be generated by

$$y_t = \frac{x_t^2}{20} + \epsilon_t, \qquad \epsilon_t \sim N(0, \sigma^2)$$
$$x_t = \alpha x_{t-1} + \beta\frac{x_{t-1}}{1 + x_{t-1}^2} + \gamma\cos(1.2(t-1)) + u_t, \qquad u_t \sim N(0, \tau^2),$$

where $x_0 \sim N(m_0, C_0)$ and $\theta = (\alpha, \beta, \gamma)$.

Prior distribution:

$$\sigma^2 \sim IG(n_0/2, n_0\sigma_0^2/2)$$
$$(\theta, \tau^2) \sim N(\theta_0, \tau^2 V_0)\, IG(\nu_0/2, \nu_0\tau_0^2/2)$$
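A sketch that simulates data from this model (names are mine; the parameter values follow the simulation setup used later in the lecture):

```python
# Simulate the nonlinear benchmark model:
#   y_t = x_t^2/20 + eps_t,  x_t = a*x_{t-1} + b*x_{t-1}/(1+x_{t-1}^2) + g*cos(1.2(t-1)) + u_t.
import numpy as np

def simulate_nonlinear(n=100, theta=(0.5, 25.0, 8.0), sig2=1.0, tau2=10.0,
                       x0=0.1, rng=None):
    rng = rng or np.random.default_rng(1)
    a, b, g = theta
    x = np.empty(n); y = np.empty(n)
    x_prev = x0
    for t in range(n):                      # index t = 0 here corresponds to time 1
        x[t] = (a * x_prev + b * x_prev / (1 + x_prev**2)
                + g * np.cos(1.2 * t) + rng.normal(scale=np.sqrt(tau2)))
        y[t] = x[t]**2 / 20 + rng.normal(scale=np.sqrt(sig2))
        x_prev = x[t]
    return y, x
```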

Sampling $(\sigma^2, \theta, \tau^2 \mid x_0, x^n, y^n)$

It follows that

$$(\sigma^2 \mid y^n, x^n) \sim IG(n_1/2, n_1\sigma_1^2/2),$$

where $n_1 = n_0 + n$ and

$$n_1\sigma_1^2 = n_0\sigma_0^2 + \sum_{t=1}^{n}(y_t - x_t^2/20)^2.$$

Also, $(\theta, \tau^2 \mid x_{0:n}) \sim N(\theta_1, \tau^2 V_1)\, IG(\nu_1/2, \nu_1\tau_1^2/2)$, where $\nu_1 = \nu_0 + n$ and

$$V_1^{-1} = V_0^{-1} + Z'Z$$
$$V_1^{-1}\theta_1 = V_0^{-1}\theta_0 + Z'x_{1:n}$$
$$\nu_1\tau_1^2 = \nu_0\tau_0^2 + (x_{1:n} - Z\theta_1)'(x_{1:n} - Z\theta_1) + (\theta_1 - \theta_0)'V_0^{-1}(\theta_1 - \theta_0)$$
$$Z = (G_{x_0}, \ldots, G_{x_{n-1}})', \qquad G_{x_t} = \left(x_t,\; \frac{x_t}{1 + x_t^2},\; \cos(1.2\,t)\right)'.$$

Sampling $x_0, x_1, \ldots, x_n$

Let $x_{-t} = (x_0, \ldots, x_{t-1}, x_{t+1}, \ldots, x_n)$ for $t = 1, \ldots, n-1$, $x_{-0} = x_{1:n}$, $x_{-n} = x_{0:(n-1)}$ and $y^0 = \emptyset$, and let $\psi$ collect the fixed parameters $(\theta, \sigma^2, \tau^2)$.

For $t = 0$:
$$p(x_0 \mid x_{-0}, y^0, \psi) \propto f_N(x_0; m_0, C_0)\, f_N(x_1; G_{x_0}'\theta, \tau^2)$$

For $t = 1, \ldots, n-1$:
$$p(x_t \mid x_{-t}, y^t, \psi) \propto f_N(y_t; x_t^2/20, \sigma^2)\, f_N(x_t; G_{x_{t-1}}'\theta, \tau^2)\, f_N(x_{t+1}; G_{x_t}'\theta, \tau^2)$$

For $t = n$:
$$p(x_n \mid x_{-n}, y^n, \psi) \propto f_N(y_n; x_n^2/20, \sigma^2)\, f_N(x_n; G_{x_{n-1}}'\theta, \tau^2)$$

Metropolis-Hastings algorithm

A simple random walk Metropolis algorithm with tuning variance $v_x^2$ works as follows. For $t = 0, \ldots, n$:

1. Current state: $x_t^{(j)}$.
2. Sample $x_t^*$ from $N(x_t^{(j)}, v_x^2)$.
3. Compute the acceptance probability
$$\alpha = \min\left\{1, \frac{p(x_t^* \mid x_{-t}, y^t, \psi)}{p(x_t^{(j)} \mid x_{-t}, y^t, \psi)}\right\}.$$
4. New state:
$$x_t^{(j+1)} = \begin{cases} x_t^* & \text{with probability } \alpha \\ x_t^{(j)} & \text{with probability } 1 - \alpha. \end{cases}$$
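A sketch of one such random walk Metropolis sweep over the states $x_0, \ldots, x_n$, holding the parameters fixed (all names are mine; densities are evaluated on the log scale for numerical stability):

```python
# One random walk Metropolis sweep over the states of the nonlinear model.
import numpy as np
from scipy.stats import norm

def state_mean(x_prev, t, a, b, g):
    """E(x_t | x_{t-1}) for the nonlinear state equation."""
    return a * x_prev + b * x_prev / (1 + x_prev**2) + g * np.cos(1.2 * (t - 1))

def log_fullcond(xt, t, x, y, a, b, g, sig2, tau2, m0=0.0, C0=10.0):
    """log p(x_t | x_{-t}, y^t, psi) up to a constant; x holds x_0..x_n, y holds y_1..y_n."""
    lp = 0.0
    if t == 0:
        lp += norm.logpdf(xt, m0, np.sqrt(C0))
    else:
        lp += norm.logpdf(y[t - 1], xt**2 / 20, np.sqrt(sig2))
        lp += norm.logpdf(xt, state_mean(x[t - 1], t, a, b, g), np.sqrt(tau2))
    if t < len(x) - 1:
        lp += norm.logpdf(x[t + 1], state_mean(xt, t + 1, a, b, g), np.sqrt(tau2))
    return lp

def rwm_sweep(x, y, a, b, g, sig2, tau2, vx=0.1, rng=None):
    rng = rng or np.random.default_rng()
    for t in range(len(x)):
        prop = rng.normal(x[t], vx)
        logr = (log_fullcond(prop, t, x, y, a, b, g, sig2, tau2)
                - log_fullcond(x[t], t, x, y, a, b, g, sig2, tau2))
        if np.log(rng.uniform()) < logr:      # accept with probability min(1, exp(logr))
            x[t] = prop
    return x
```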

Simulation setup

We simulated $n = 100$ observations based on $\theta = (0.5, 25, 8)$, $\sigma^2 = 1$, $\tau^2 = 10$ and $x_0 = 0.1$. (Figure: time series of $y_t$ and $x_t$, and scatter plot of $x_t$ against $x_{t-1}$.)

Prior hyperparameters

$x_0 \sim N(m_0, C_0)$: $m_0 = 0.0$ and $C_0 = 10$.
$\theta \mid \tau^2 \sim N(\theta_0, \tau^2 V_0)$: $\theta_0 = (0.5, 25, 8)$ and $V_0 = \mathrm{diag}(0.0025, 0.1, 0.04)$.
$\tau^2 \sim IG(\nu_0/2, \nu_0\tau_0^2/2)$: $\nu_0 = 6$ and $\tau_0^2 = 20/3$, such that $E(\tau^2) = SD(\tau^2) = 10$.
$\sigma^2 \sim IG(n_0/2, n_0\sigma_0^2/2)$: $n_0 = 6$ and $\sigma_0^2 = 2/3$, such that $E(\sigma^2) = SD(\sigma^2) = 1$.

MCMC setup

Metropolis-Hastings tuning parameter: $v_x^2 = (0.1)^2$.

Burn-in period, thinning step and MCMC sample size: $M_0 = 1{,}000$, $L = 20$ and $M = 950$ (20,000 draws in total).

Initial values: $\theta = (0.5, 25, 8)$, $\tau^2 = 10$, $\sigma^2 = 1$ and $x_{0:n} = x_{0:n}^{true}$.

Parameters. (Figure: MCMC trace plots, autocorrelation functions and posterior histograms for $\alpha$, $\beta$, $\gamma$, $\tau^2$ and $\sigma^2$.)

States. (Figure: posterior summaries of the states $x_t$ over time.)

(Figure: autocorrelation functions of the sampled states.)

Parameters, longer run: $M_0 = 100{,}000$, $L = 50$ and $M = 1{,}000$ (150,000 draws in total). (Figure: trace plots, autocorrelation functions and posterior histograms for $\alpha$, $\beta$, $\gamma$, $\tau^2$ and $\sigma^2$.)

(Figure: autocorrelation functions of the sampled states for the longer run.)

Stochastic volatility

The canonical stochastic volatility model (SVM) is

$$y_t = e^{h_t/2}\varepsilon_t$$
$$h_t = \mu + \phi h_{t-1} + \tau\eta_t$$

where $\varepsilon_t$ and $\eta_t$ are $N(0, 1)$ shocks with $E(\varepsilon_t\eta_{t+h}) = 0$ for all $h$ and $E(\varepsilon_t\varepsilon_{t+l}) = E(\eta_t\eta_{t+l}) = 0$ for all $l \neq 0$.

$\tau^2$ is the volatility of the log-volatility. If $|\phi| < 1$ then $h_t$ is a stationary process.

Let $y^n = (y_1, \ldots, y_n)$, $h^n = (h_1, \ldots, h_n)$ and $h_{a:b} = (h_a, \ldots, h_b)$.
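A sketch that simulates from the SVM (names are mine; the parameter values are those used for the simulated data on the next slide):

```python
# Simulate the canonical SV model: y_t = exp(h_t/2) eps_t, h_t = mu + phi h_{t-1} + tau eta_t.
import numpy as np

def simulate_sv(n=500, mu=0.00645, phi=0.99, tau2=0.0225, h0=0.0, rng=None):
    rng = rng or np.random.default_rng(2)
    h = np.empty(n); y = np.empty(n)
    h_prev = h0
    for t in range(n):
        h[t] = mu + phi * h_prev + rng.normal(scale=np.sqrt(tau2))
        y[t] = np.exp(h[t] / 2) * rng.normal()
        h_prev = h[t]
    return y, h
```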

Simulated data: $n = 500$, $h_0 = 0.0$ and $(\mu, \phi, \tau^2) = (0.00645, 0.99, 0.0225)$. (Figure: time series of $y_t$ and of the standard deviations $e^{h_t/2}$.)

Prior information

Uncertainty about the initial log-volatility is $h_0 \sim N(m_0, C_0)$. Let $\theta = (\mu, \phi)$; then the prior distribution of $(\theta, \tau^2)$ is normal-inverse gamma, i.e. $(\theta, \tau^2) \sim NIG(\theta_0, V_0, \nu_0, s_0^2)$:

$$\theta \mid \tau^2 \sim N(\theta_0, \tau^2 V_0), \qquad \tau^2 \sim IG(\nu_0/2, \nu_0 s_0^2/2).$$

For example, if $\nu_0 = 10$ and $s_0^2 = 0.018$ then

$$E(\tau^2) = \frac{\nu_0 s_0^2/2}{\nu_0/2 - 1} = 0.0225, \qquad Var(\tau^2) = \frac{(\nu_0 s_0^2/2)^2}{(\nu_0/2 - 1)^2(\nu_0/2 - 2)} = (0.013)^2.$$

Hyperparameters: $m_0$, $C_0$, $\theta_0$, $V_0$, $\nu_0$ and $s_0^2$.

Simulated data

Simulation setup: $h_0 = 0.0$, $\mu = 0.00645$, $\phi = 0.99$ and $\tau^2 = 0.0225$.

Prior distribution: $h_0 \sim N(0, 100)$, $\mu \sim N(0, 100)$, $\phi \sim N(0, 100)$ and $\tau^2 \sim IG(5, 0.14)$ (mode = 0.0234; 95% c.i. = (0.014, 0.086)).

Posterior inference

The SVM is a dynamic model, and posterior inference via MCMC for the latent log-volatility states $h_t$ can be performed in at least two ways. Let $h_{-t} = (h_{0:(t-1)}, h_{(t+1):n})$, for $t = 1, \ldots, n-1$, and $h_{-n} = h_{1:(n-1)}$.

Single moves for $h_t$:
$(\theta, \tau^2 \mid h^n, y^n)$
$(h_t \mid h_{-t}, \theta, \tau^2, y^n)$, for $t = 1, \ldots, n$

Block move for $h^n$:
$(\theta, \tau^2 \mid h^n, y^n)$
$(h^n \mid \theta, \tau^2, y^n)$

Sampling $(\theta, \tau^2 \mid h^n, y^n)$

Conditional on $h_{0:n}$, the posterior distribution of $(\theta, \tau^2)$ is also normal-inverse gamma, $(\theta, \tau^2 \mid y^n, h_{0:n}) \sim NIG(\theta_1, V_1, \nu_1, s_1^2)$, where $X = (1_n, h_{0:(n-1)})$, $\nu_1 = \nu_0 + n$ and

$$V_1^{-1} = V_0^{-1} + X'X$$
$$V_1^{-1}\theta_1 = V_0^{-1}\theta_0 + X'h_{1:n}$$
$$\nu_1 s_1^2 = \nu_0 s_0^2 + (h_{1:n} - X\theta_1)'(h_{1:n} - X\theta_1) + (\theta_1 - \theta_0)'V_0^{-1}(\theta_1 - \theta_0).$$
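This conditional draw is a standard normal-inverse gamma update for the regression of $h_t$ on $(1, h_{t-1})$. A sketch (names are mine):

```python
# Draw (theta, tau2) = ((mu, phi), tau2) from the NIG conditional posterior given h_0, ..., h_n.
import numpy as np

def sample_theta_tau2(h, theta0, V0, nu0, s02, rng=None):
    """h: array of length n+1 holding h_0, ..., h_n."""
    rng = rng or np.random.default_rng()
    n = len(h) - 1
    X = np.column_stack([np.ones(n), h[:-1]])           # rows (1, h_{t-1})
    resp = h[1:]                                        # responses h_t
    V0inv = np.linalg.inv(V0)
    V1 = np.linalg.inv(V0inv + X.T @ X)
    theta1 = V1 @ (V0inv @ theta0 + X.T @ resp)
    nu1 = nu0 + n
    nu1s12 = (nu0 * s02 + (resp - X @ theta1) @ (resp - X @ theta1)
              + (theta1 - theta0) @ V0inv @ (theta1 - theta0))
    tau2 = 1.0 / rng.gamma(nu1 / 2, 2.0 / nu1s12)       # tau2 | h ~ IG(nu1/2, nu1*s1^2/2)
    theta = rng.multivariate_normal(theta1, tau2 * V1)  # theta | tau2, h ~ N(theta1, tau2*V1)
    return theta, tau2
```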

Sampling $(h_0 \mid \theta, \tau^2, h_1)$

Combining $h_0 \sim N(m_0, C_0)$ and $h_1 \mid h_0 \sim N(\mu + \phi h_0, \tau^2)$ leads to (by Bayes' theorem) $h_0 \mid h_1 \sim N(m_1, C_1)$, where

$$C_1^{-1} m_1 = C_0^{-1} m_0 + \phi\tau^{-2}(h_1 - \mu) \qquad \text{and} \qquad C_1^{-1} = C_0^{-1} + \phi^2\tau^{-2}.$$

Conditional prior distribution of $h_t$

Given $h_{t-1}$, $\theta$ and $\tau^2$, it can be shown that

$$\begin{pmatrix} h_t \\ h_{t+1} \end{pmatrix} \sim N\left\{\begin{pmatrix} \mu + \phi h_{t-1} \\ (1 + \phi)\mu + \phi^2 h_{t-1} \end{pmatrix},\; \tau^2\begin{pmatrix} 1 & \phi \\ \phi & 1 + \phi^2 \end{pmatrix}\right\},$$

so that $E(h_t \mid h_{t-1}, h_{t+1}, \theta, \tau^2)$ and $V(h_t \mid h_{t-1}, h_{t+1}, \theta, \tau^2)$ are

$$\mu_t = \left(\frac{1 - \phi}{1 + \phi^2}\right)\mu + \left(\frac{\phi}{1 + \phi^2}\right)(h_{t-1} + h_{t+1}) \qquad \text{and} \qquad \nu^2 = \tau^2(1 + \phi^2)^{-1}.$$

Therefore,

$$(h_t \mid h_{t-1}, h_{t+1}, \theta, \tau^2) \sim N(\mu_t, \nu^2), \quad t = 1, \ldots, n-1,$$
$$(h_n \mid h_{n-1}, \theta, \tau^2) \sim N(\mu_n, \tau^2), \quad \text{where } \mu_n = \mu + \phi h_{n-1}.$$

Sampling $h_t$ via RWM

Let $\nu_t^2 = \nu^2$ for $t = 1, \ldots, n-1$ and $\nu_n^2 = \tau^2$; then

$$p(h_t \mid h_{-t}, y^n, \theta, \tau^2) \propto f_N(h_t; \mu_t, \nu_t^2)\, f_N(y_t; 0, e^{h_t})$$

for $t = 1, \ldots, n$. Random walk Metropolis (RWM) with tuning variance $v_h^2$, for $t = 1, \ldots, n$:

1. Current state: $h_t^{(j)}$.
2. Sample $h_t^*$ from $N(h_t^{(j)}, v_h^2)$.
3. Compute the acceptance probability
$$\alpha = \min\left\{1, \frac{f_N(h_t^*; \mu_t, \nu_t^2)\, f_N(y_t; 0, e^{h_t^*})}{f_N(h_t^{(j)}; \mu_t, \nu_t^2)\, f_N(y_t; 0, e^{h_t^{(j)}})}\right\}.$$
4. New state:
$$h_t^{(j+1)} = \begin{cases} h_t^* & \text{with probability } \alpha \\ h_t^{(j)} & \text{with probability } 1 - \alpha. \end{cases}$$
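A sketch of one RWM sweep over $h_1, \ldots, h_n$ with $(\mu, \phi, \tau^2)$ and $h_0$ held fixed (names are mine; note that $e^{h_t}$ is the variance of $y_t$, so the normal scale is $e^{h_t/2}$):

```python
# One random walk Metropolis sweep over h_1, ..., h_n of the SV model.
import numpy as np
from scipy.stats import norm

def sv_rwm_sweep(h, y, h0, mu, phi, tau2, vh=0.1, rng=None):
    """h: current draws of h_1..h_n; y: observations y_1..y_n."""
    rng = rng or np.random.default_rng()
    n = len(h)
    nu2 = tau2 / (1 + phi**2)
    for t in range(n):
        h_prev = h0 if t == 0 else h[t - 1]
        if t < n - 1:                     # interior state: prior N(mu_t, nu2)
            mu_t = ((1 - phi) * mu + phi * (h_prev + h[t + 1])) / (1 + phi**2)
            v_t = nu2
        else:                             # last state: prior N(mu + phi*h_{n-1}, tau2)
            mu_t, v_t = mu + phi * h_prev, tau2
        prop = rng.normal(h[t], vh)
        def logpost(ht):
            return norm.logpdf(ht, mu_t, np.sqrt(v_t)) + norm.logpdf(y[t], 0.0, np.exp(ht / 2))
        if np.log(rng.uniform()) < logpost(prop) - logpost(h[t]):
            h[t] = prop
    return h
```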

MCMC setup: $M_0 = 1{,}000$ and $M = 1{,}000$. (Figure: autocorrelations of the sampled $h_t$'s for the simulated data.)

Sampling $h_t$ via IMH

The full conditional distribution of $h_t$ is given by

$$p(h_t \mid h_{-t}, y^n, \theta, \tau^2) \propto p(h_t \mid h_{t-1}, h_{t+1}, \theta, \tau^2)\, p(y_t \mid h_t) = f_N(h_t; \mu_t, \nu^2)\, f_N(y_t; 0, e^{h_t}).$$

Kim, Shephard and Chib (1998) explored the fact that

$$\log p(y_t \mid h_t) = \text{const} - \frac{1}{2}h_t - \frac{y_t^2}{2}\exp(-h_t)$$

and that a Taylor expansion of $\exp(-h_t)$ around $\mu_t$ leads to

$$\log p(y_t \mid h_t) \approx \text{const} - \frac{1}{2}h_t - \frac{y_t^2}{2}\left(e^{-\mu_t} - (h_t - \mu_t)e^{-\mu_t}\right),$$

i.e.

$$g(h_t) = \exp\left\{-\frac{1}{2}h_t\left(1 - y_t^2 e^{-\mu_t}\right)\right\}.$$

Proposal distribution

Let $\nu_t^2 = \nu^2$ for $t = 1, \ldots, n-1$ and $\nu_n^2 = \tau^2$. Then combining $f_N(h_t; \mu_t, \nu_t^2)$ and $g(h_t)$, for $t = 1, \ldots, n$, leads to the following proposal distribution:

$$q(h_t \mid h_{-t}, y^n, \theta, \tau^2) \equiv N(h_t; \tilde{\mu}_t, \nu_t^2),$$

where $\tilde{\mu}_t = \mu_t + 0.5\,\nu_t^2\,(y_t^2 e^{-\mu_t} - 1)$.

IMH algorithm

For $t = 1, \ldots, n$:

1. Current state: $h_t^{(j)}$.
2. Sample $h_t^*$ from $N(\tilde{\mu}_t, \nu_t^2)$.
3. Compute the acceptance probability
$$\alpha = \min\left\{1, \frac{f_N(h_t^*; \mu_t, \nu_t^2)\, f_N(y_t; 0, e^{h_t^*})}{f_N(h_t^{(j)}; \mu_t, \nu_t^2)\, f_N(y_t; 0, e^{h_t^{(j)}})} \cdot \frac{f_N(h_t^{(j)}; \tilde{\mu}_t, \nu_t^2)}{f_N(h_t^*; \tilde{\mu}_t, \nu_t^2)}\right\}.$$
4. New state:
$$h_t^{(j+1)} = \begin{cases} h_t^* & \text{with probability } \alpha \\ h_t^{(j)} & \text{with probability } 1 - \alpha. \end{cases}$$
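A sketch of the corresponding independence MH sweep, differing from sv_rwm_sweep above only in the proposal and in the proposal-density correction of the acceptance ratio (names are mine):

```python
# One independence MH sweep over h_1, ..., h_n using the linearised proposal N(mu_tilde_t, nu_t^2).
import numpy as np
from scipy.stats import norm

def sv_imh_sweep(h, y, h0, mu, phi, tau2, rng=None):
    rng = rng or np.random.default_rng()
    n = len(h)
    nu2 = tau2 / (1 + phi**2)
    for t in range(n):
        h_prev = h0 if t == 0 else h[t - 1]
        if t < n - 1:
            mu_t = ((1 - phi) * mu + phi * (h_prev + h[t + 1])) / (1 + phi**2)
            v_t = nu2
        else:
            mu_t, v_t = mu + phi * h_prev, tau2
        mu_tilde = mu_t + 0.5 * v_t * (y[t]**2 * np.exp(-mu_t) - 1.0)   # proposal mean
        prop = rng.normal(mu_tilde, np.sqrt(v_t))
        def logtarget(ht):
            return norm.logpdf(ht, mu_t, np.sqrt(v_t)) + norm.logpdf(y[t], 0.0, np.exp(ht / 2))
        logr = (logtarget(prop) - logtarget(h[t])
                + norm.logpdf(h[t], mu_tilde, np.sqrt(v_t))
                - norm.logpdf(prop, mu_tilde, np.sqrt(v_t)))
        if np.log(rng.uniform()) < logr:
            h[t] = prop
    return h
```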

ACF for both schemes. (Figure: autocorrelations of the sampled $h_t$'s under the random walk and the independence MH schemes.)

Sampling $h^n$: normal approximation and FFBS

Let $y_t^* = \log y_t^2$ and $\epsilon_t = \log\varepsilon_t^2$. The SV-AR(1) is a DLM with non-normal observational errors, i.e.

$$y_t^* = h_t + \epsilon_t$$
$$h_t = \mu + \phi h_{t-1} + \tau\eta_t$$

where $\eta_t \sim N(0, 1)$. The distribution of $\epsilon_t$ is $\log\chi_1^2$, with

$$E(\epsilon_t) = -1.27 \qquad \text{and} \qquad V(\epsilon_t) = \frac{\pi^2}{2} = 4.935.$$

Normal approximation

Let $\epsilon_t$ be approximated by $N(\alpha, \sigma^2)$, and let $z_t = y_t^* - \alpha$, with $\alpha = -1.27$ and $\sigma^2 = \pi^2/2$. Then

$$z_t = h_t + \sigma v_t$$
$$h_t = \mu + \phi h_{t-1} + \tau\eta_t$$

is a simple DLM where $v_t$ and $\eta_t$ are $N(0, 1)$. Sampling from $p(h^n \mid \theta, \tau^2, \sigma^2, z^n)$ can be performed by the FFBS algorithm.

(Figure: density of $\log\chi_1^2$ versus the $N(-1.27, \pi^2/2)$ approximation.)

ACF for the three schemes. (Figure: autocorrelations of the sampled $h_t$'s under the random walk, independence MH and normal-approximation schemes.)

Sampling $h^n$: mixtures of normals and FFBS

The $\log\chi_1^2$ distribution can be approximated by $\sum_{i=1}^{7}\pi_i\, N(\mu_i, \omega_i^2)$, where

i    π_i       µ_i         ω_i²
1    0.00730   -11.40039   5.79596
2    0.10556   -5.24321    2.61369
3    0.00002   -9.83726    5.17950
4    0.04395   1.50746     0.16735
5    0.34001   -0.65098    0.64009
6    0.24566   0.52478     0.34023
7    0.25750   -2.35859    1.26261

(Figure: density of $\log\chi_1^2$ versus the 7-component mixture $\sum_{i=1}^{7}\pi_i N(\mu_i, \omega_i^2)$.)

Mixture of normals

Using an argument from the Bayesian analysis of mixtures of normals, let $z_1, \ldots, z_n$ be unobservable (latent) indicator variables such that $z_t \in \{1, \ldots, 7\}$ and $Pr(z_t = i) = \pi_i$, for $i = 1, \ldots, 7$. Therefore, conditional on the $z$'s, $y_t$ is transformed into $\log y_t^2$,

$$\log y_t^2 = h_t + \log\varepsilon_t^2$$
$$h_t = \mu + \phi h_{t-1} + \tau_\eta\eta_t,$$

which can be rewritten as a normal DLM:

$$\log y_t^2 = h_t + v_t, \qquad v_t \sim N(\mu_{z_t}, \omega_{z_t}^2)$$
$$h_t = \mu + \phi h_{t-1} + w_t, \qquad w_t \sim N(0, \tau_\eta^2),$$

where $\mu_{z_t}$ and $\omega_{z_t}^2$ are provided in the previous table. Then $h^n$ is jointly sampled using the FFBS algorithm.
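The extra Gibbs step in this scheme draws the mixture indicators, with $P(z_t = i \mid h_t, y_t^*) \propto \pi_i\, f_N(y_t^*; h_t + \mu_i, \omega_i^2)$. A sketch (names are mine; the numerical values are those in the table above):

```python
# Sample the mixture indicators z_1, ..., z_n given h_t and y*_t = log(y_t^2).
import numpy as np

KSC_P  = np.array([0.00730, 0.10556, 0.00002, 0.04395, 0.34001, 0.24566, 0.25750])
KSC_MU = np.array([-11.40039, -5.24321, -9.83726, 1.50746, -0.65098, 0.52478, -2.35859])
KSC_V  = np.array([5.79596, 2.61369, 5.17950, 0.16735, 0.64009, 0.34023, 1.26261])

def sample_indicators(ystar, h, rng=None):
    rng = rng or np.random.default_rng()
    n = len(ystar)
    z = np.empty(n, dtype=int)
    for t in range(n):
        # P(z_t = i | .) proportional to pi_i * N(y*_t; h_t + mu_i, omega_i^2)
        logw = (np.log(KSC_P) - 0.5 * np.log(2 * np.pi * KSC_V)
                - 0.5 * (ystar[t] - h[t] - KSC_MU)**2 / KSC_V)
        w = np.exp(logw - logw.max())
        z[t] = rng.choice(7, p=w / w.sum())
    return z
```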

ACF for the four schemes. (Figure: autocorrelations of the sampled $h_t$'s under the random walk, independence MH, normal-approximation and mixture schemes.)

Posterior means: volatilities. (Figure: true volatilities and posterior means under the random walk, independence MH, normal and mixture schemes, over time.)