Lecture 4: Dynamic models


Hedibert Freitas Lopes
The University of Chicago Booth School of Business
5807 South Woodlawn Avenue, Chicago, IL 60637
http://faculty.chicagobooth.edu/hedibert.lopes
hlopes@chicagobooth.edu

Outline

1. 1st order DLM
2. Dynamic linear models
3. Nonlinear dynamic models
4. Stochastic volatility (SV)

Dynamic models (DMs)

1st order DLM

The local level model (West and Harrison, 1997) has

Observation equation: $y_{t+1} \mid x_{t+1}, \theta \sim N(x_{t+1}, \sigma^2)$

System equation: $x_{t+1} \mid x_t, \theta \sim N(x_t, \tau^2)$

where $x_0 \sim N(m_0, C_0)$ and $\theta = (\sigma^2, \tau^2)$ is fixed (for now).

It is worth noticing that the model can be rewritten as

$$y \mid x, \theta \sim N(x, \sigma^2 I_n), \qquad x \mid x_0, \theta \sim N(x_0 1_n, \tau^2 \Omega), \qquad x_0 \sim N(m_0, C_0),$$

where $\Omega$ is the $n \times n$ matrix with entries $\Omega_{ij} = \min(i, j)$:

$$\Omega = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 2 & \cdots & 2 \\ 1 & 2 & 3 & \cdots & 3 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 2 & 3 & \cdots & n \end{pmatrix}.$$

Therefore, the prior of $x$ given $\theta$ is

$$x \mid \theta \sim N(m_0 1_n, \; C_0 1_n 1_n' + \tau^2 \Omega),$$

while its full conditional posterior distribution is $x \mid y, \theta \sim N(m_1, C_1)$, where

$$C_1^{-1} = (C_0 1_n 1_n' + \tau^2 \Omega)^{-1} + \sigma^{-2} I_n$$

and

$$C_1^{-1} m_1 = (C_0 1_n 1_n' + \tau^2 \Omega)^{-1} m_0 1_n + \sigma^{-2} y.$$

Let $y^t = (y_1, \ldots, y_t)$. The joint posterior of $x$ given $y^n$ (omitting $\theta$ for now) can be constructed backwards as

$$p(x \mid y^n) = p(x_n \mid y^n) \prod_{t=1}^{n-1} p(x_t \mid y^t, x_{t+1}),$$

which is obtained from $p(x_n \mid y^n)$ by noticing that, given $y^t$ and $x_{t+1}$, $x_t$ is conditionally independent of the future states $x_{t+h}$, $h > 1$, and of the future observations $y_{t+h}$, $h \geq 1$.

1st order DLM: Kalman filter

Therefore, we first need to derive these filtering distributions, and this is done forward via the well-known Kalman recursions $p(x_t \mid y^t) \rightarrow p(x_{t+1} \mid y^t) \rightarrow p(y_{t+1} \mid y^t) \rightarrow p(x_{t+1} \mid y^{t+1})$:

Posterior at $t$: $(x_t \mid y^t) \sim N(m_t, C_t)$

Prior at $t+1$: $(x_{t+1} \mid y^t) \sim N(m_t, R_{t+1})$, with $R_{t+1} = C_t + \tau^2$

Marginal likelihood: $(y_{t+1} \mid y^t) \sim N(m_t, Q_{t+1})$, with $Q_{t+1} = R_{t+1} + \sigma^2$

Posterior at $t+1$: $(x_{t+1} \mid y^{t+1}) \sim N(m_{t+1}, C_{t+1})$, with

$$m_{t+1} = (1 - A_{t+1}) m_t + A_{t+1} y_{t+1}, \qquad C_{t+1} = A_{t+1}\sigma^2, \qquad A_{t+1} = R_{t+1}/Q_{t+1}.$$
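The recursions above translate directly into code. Below is a minimal Python sketch (the function name, defaults and the simulated example are mine, not from the slides) of the forward filter for the local level model.

```python
# Minimal sketch of the Kalman filter for the local level model
#   y_t | x_t ~ N(x_t, sig2),  x_t | x_{t-1} ~ N(x_{t-1}, tau2),  x_0 ~ N(m0, C0).
# Returns the filtered moments (m_t, C_t) and the one-step prior moments (a_t, R_t).
import numpy as np

def kalman_local_level(y, sig2, tau2, m0=0.0, C0=10.0):
    n = len(y)
    m = np.empty(n); C = np.empty(n)          # moments of p(x_t | y^t)
    a = np.empty(n); R = np.empty(n)          # moments of p(x_t | y^{t-1})
    m_prev, C_prev = m0, C0
    for t in range(n):
        a[t] = m_prev                         # prior mean
        R[t] = C_prev + tau2                  # prior variance
        Q = R[t] + sig2                       # predictive variance of y_t
        A = R[t] / Q                          # Kalman gain
        m[t] = (1 - A) * m_prev + A * y[t]    # posterior mean
        C[t] = A * sig2                       # posterior variance
        m_prev, C_prev = m[t], C[t]
    return m, C, a, R

# Example on simulated data (sigma^2 = 1, tau^2 = 0.5, as in the slides):
rng = np.random.default_rng(0)
x_true = np.cumsum(rng.normal(scale=np.sqrt(0.5), size=100))
y_sim = x_true + rng.normal(size=100)
m, C, a, R = kalman_local_level(y_sim, sig2=1.0, tau2=0.5)
```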

1st order DLM: smoothing

For $t = n$, $x_n \mid y^n \sim N(m_n^n, C_n^n)$, where $m_n^n = m_n$ and $C_n^n = C_n$. For $t < n$,

$$x_t \mid y^n \sim N(m_t^n, C_t^n) \qquad \text{and} \qquad x_t \mid x_{t+1}, y^n \sim N(a_t^n, R_t^n),$$

where

$$m_t^n = (1 - B_t) m_t + B_t m_{t+1}^n, \qquad C_t^n = (1 - B_t) C_t + B_t^2 C_{t+1}^n,$$
$$a_t^n = (1 - B_t) m_t + B_t x_{t+1}, \qquad R_t^n = B_t \tau^2, \qquad B_t = C_t/(C_t + \tau^2).$$
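A matching sketch of the backward smoothing pass (again, names are mine), to be run on the output of kalman_local_level above:

```python
# Backward recursion for the smoothed moments of p(x_t | y^n) in the local level model.
import numpy as np

def smooth_local_level(m, C, tau2):
    n = len(m)
    ms = m.copy(); Cs = C.copy()              # at t = n: m_n^n = m_n, C_n^n = C_n
    for t in range(n - 2, -1, -1):
        B = C[t] / (C[t] + tau2)
        ms[t] = (1 - B) * m[t] + B * ms[t + 1]
        Cs[t] = (1 - B) * C[t] + B**2 * Cs[t + 1]
    return ms, Cs
```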

Simulated data: $n = 100$, $\sigma^2 = 1.0$, $\tau^2 = 0.5$ and $x_0 = 0$. (Figure: time series of $y_t$ and $x_t$.)

Filtered distributions $p(x_t \mid y^t)$ via the Kalman filter, with $m_0 = 0.0$ and $C_0 = 10.0$, given $\tau^2$ and $\sigma^2$. (Figure.)

Smoothed distributions $p(x_t \mid y^n)$ via the Kalman recursions, with $m_0 = 0.0$ and $C_0 = 10.0$, given $\tau^2$ and $\sigma^2$. (Figure.)

Posterior of $\theta$

We showed earlier that $(y_t \mid y^{t-1}) \sim N(m_{t-1}, Q_t)$, where both $m_{t-1}$ and $Q_t$ were presented before and are functions of $\theta = (\sigma^2, \tau^2)$, $y^{t-1}$, $m_0$ and $C_0$. Therefore, by Bayes' rule,

$$p(\theta \mid y^n) \propto p(\theta)\, p(y^n \mid \theta) = p(\theta) \prod_{t=1}^{n} f_N(y_t; m_{t-1}, Q_t).$$
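The product above is the integrated likelihood of $\theta$ and is a by-product of the forward filter. A sketch (names are mine) that accumulates it on the log scale:

```python
# Log integrated likelihood log p(y^n | theta), theta = (sig2, tau2), via the
# predictive decomposition prod_t N(y_t; m_{t-1}, Q_t) of the local level model.
import numpy as np
from scipy.stats import norm

def loglik_local_level(y, sig2, tau2, m0=0.0, C0=10.0):
    m, C, ll = m0, C0, 0.0
    for t in range(len(y)):
        R = C + tau2                              # prior variance of x_t
        Q = R + sig2                              # predictive variance of y_t
        ll += norm.logpdf(y[t], loc=m, scale=np.sqrt(Q))
        A = R / Q
        m = (1 - A) * m + A * y[t]
        C = A * sig2
    return ll
```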

Prior of $(\sigma^2, \tau^2)$

The joint posterior is proportional to $p(y \mid \sigma^2, \tau^2)\, p(\sigma^2)\, p(\tau^2)$, with

$$\sigma^2 \sim IG(\nu_0/2, \nu_0\sigma_0^2/2), \text{ where } \nu_0 = 5 \text{ and } \sigma_0^2 = 1,$$
$$\tau^2 \sim IG(n_0/2, n_0\tau_0^2/2), \text{ where } n_0 = 5 \text{ and } \tau_0^2 = 0.5.$$

(Figure: contour plot of the prior over $(\sigma^2, \tau^2)$.)

Gibbs sampler

Sample $\theta$ from $p(\theta \mid y^n, x^n)$:

$$p(\theta \mid y^n, x^n) \propto p(\theta) \prod_{t=1}^{n} p(y_t \mid x_t, \theta)\, p(x_t \mid x_{t-1}, \theta).$$

Sample $x^n$ from $p(x^n \mid y^n, \theta)$:

$$p(x^n \mid y^n, \theta) = \prod_{t=1}^{n} f_N(x_t; a_t^n, R_t^n).$$
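The second step is the forward filtering, backward sampling (FFBS) draw of the whole state vector. A minimal sketch for the local level model (names are mine; it reuses kalman_local_level from the earlier sketch):

```python
# FFBS draw of x_1, ..., x_n from p(x^n | y^n, sig2, tau2) in the local level model.
import numpy as np

def ffbs_local_level(y, sig2, tau2, m0=0.0, C0=10.0, rng=None):
    rng = rng or np.random.default_rng()
    m, C, a, R = kalman_local_level(y, sig2, tau2, m0, C0)  # forward pass (defined above)
    n = len(y)
    x = np.empty(n)
    x[n - 1] = rng.normal(m[n - 1], np.sqrt(C[n - 1]))       # x_n ~ N(m_n, C_n)
    for t in range(n - 2, -1, -1):                           # backward sampling
        B = C[t] / (C[t] + tau2)
        a_t = (1 - B) * m[t] + B * x[t + 1]                  # a_t^n
        R_t = B * tau2                                       # R_t^n
        x[t] = rng.normal(a_t, np.sqrt(R_t))
    return x
```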

MCMC output for $(\sigma^2, \tau^2)$. (Figure: joint draws and marginal posterior densities of $\sigma^2$ and $\tau^2$.)

Comparison: $p(x_t \mid y^n)$ versus $p(x_t \mid y^n, \sigma^2 = 0.87, \tau^2 = 0.63)$. (Figure.)

Beyond the 1st order DLM: learning in non-normal and nonlinear dynamic models

For general $p(y_{t+1} \mid x_{t+1})$ and $p(x_{t+1} \mid x_t)$, inference is in general rather difficult, since

$$p(x_{t+1} \mid y^t) = \int p(x_{t+1} \mid x_t)\, p(x_t \mid y^t)\, dx_t$$
$$p(x_{t+1} \mid y^{t+1}) \propto p(y_{t+1} \mid x_{t+1})\, p(x_{t+1} \mid y^t)$$

are usually unavailable in closed form.

Over the last 20 years: FFBS for conditionally Gaussian DLMs; Gamerman (1998) for generalized DLMs; Carlin, Polson and Stoffer (1992) for more general dynamic models.

Dynamic linear models

Dynamic linear models form a large class of models with time-varying parameters. They are defined by a pair of equations, the observation equation and the evolution/system equation:

$$y_t = F_t'\beta_t + \epsilon_t, \qquad \epsilon_t \sim N(0, V)$$
$$\beta_t = G_t\beta_{t-1} + \omega_t, \qquad \omega_t \sim N(0, W)$$

where
$y_t$: sequence of observations;
$F_t$: vector of explanatory variables;
$\beta_t$: $d$-dimensional state vector;
$G_t$: $d \times d$ evolution matrix;
$\beta_1 \sim N(a, R)$.

The linear growth model is slightly more elaborate, incorporating an extra time-varying parameter $\beta_2$ that represents the growth of the level of the series:

$$y_t = \beta_{1,t} + \epsilon_t, \qquad \epsilon_t \sim N(0, V)$$
$$\beta_{1,t} = \beta_{1,t-1} + \beta_{2,t} + \omega_{1,t}$$
$$\beta_{2,t} = \beta_{2,t-1} + \omega_{2,t}$$

where $\omega_t = (\omega_{1,t}, \omega_{2,t})' \sim N(0, W)$,

$$F_t = (1, 0)' \qquad \text{and} \qquad G_t = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.$$

Prior, updated and smoothed distributions

Prior distributions: $p(\beta_t \mid y^{t-k})$, $k > 0$
Updated/online distributions: $p(\beta_t \mid y^t)$
Smoothed distributions: $p(\beta_t \mid y^{t+k})$, $k > 0$

Sequential inference

Let $y^t = \{y_1, \ldots, y_t\}$.

Posterior at time $t-1$: $\beta_{t-1} \mid y^{t-1} \sim N(m_{t-1}, C_{t-1})$

Prior at time $t$: $\beta_t \mid y^{t-1} \sim N(a_t, R_t)$, with $a_t = G_t m_{t-1}$ and $R_t = G_t C_{t-1} G_t' + W$.

One-step-ahead predictive at time $t$: $y_t \mid y^{t-1} \sim N(f_t, Q_t)$, with $f_t = F_t' a_t$ and $Q_t = F_t' R_t F_t + V$.

Posterior at time $t$:

$$p(\beta_t \mid y^t) = p(\beta_t \mid y_t, y^{t-1}) \propto p(y_t \mid \beta_t)\, p(\beta_t \mid y^{t-1}).$$

The resulting posterior distribution is $\beta_t \mid y^t \sim N(m_t, C_t)$, with

$$m_t = a_t + A_t e_t, \qquad C_t = R_t - A_t A_t' Q_t, \qquad A_t = R_t F_t / Q_t, \qquad e_t = y_t - f_t.$$

By induction, these distributions are valid for all times.
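These recursions generalize the earlier scalar filter to a $d$-dimensional state. A sketch in Python (names are mine; for simplicity $G$ and $W$ are kept constant over time and $y_t$ is univariate):

```python
# Kalman filter for the DLM  y_t = F_t' beta_t + eps_t, eps_t ~ N(0, V),
#                            beta_t = G beta_{t-1} + w_t,  w_t  ~ N(0, W).
import numpy as np

def dlm_filter(y, F, G, V, W, m0, C0):
    """y: (n,), F: (n, d), G: (d, d), V: scalar, W: (d, d)."""
    n, d = F.shape
    m = np.empty((n, d)); C = np.empty((n, d, d))
    a = np.empty((n, d)); R = np.empty((n, d, d))
    m_prev, C_prev = np.asarray(m0, float), np.asarray(C0, float)
    for t in range(n):
        a[t] = G @ m_prev                      # prior mean a_t
        R[t] = G @ C_prev @ G.T + W            # prior variance R_t
        f = F[t] @ a[t]                        # predictive mean f_t
        Q = F[t] @ R[t] @ F[t] + V             # predictive variance Q_t
        A = R[t] @ F[t] / Q                    # gain A_t
        m[t] = a[t] + A * (y[t] - f)           # posterior mean m_t
        C[t] = R[t] - np.outer(A, A) * Q       # posterior variance C_t
        m_prev, C_prev = m[t], C[t]
    return m, C, a, R
```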

Smoothed distribution

In dynamic models, the smoothed distribution $\pi(\beta \mid y^n)$ is more commonly used:

$$\pi(\beta \mid y^n) = p(\beta_n \mid y^n) \prod_{t=1}^{n-1} p(\beta_t \mid \beta_{t+1}, \ldots, \beta_n, y^n) = p(\beta_n \mid y^n) \prod_{t=1}^{n-1} p(\beta_t \mid \beta_{t+1}, y^t).$$

Integrating with respect to $(\beta_1, \ldots, \beta_{t-1})$:

$$\pi(\beta_t, \ldots, \beta_n \mid y^n) = p(\beta_n \mid y^n) \prod_{k=t}^{n-1} p(\beta_k \mid \beta_{k+1}, y^k)$$
$$\pi(\beta_t, \beta_{t+1} \mid y^n) = p(\beta_{t+1} \mid y^n)\, p(\beta_t \mid \beta_{t+1}, y^t)$$

for $t = 1, \ldots, n-1$.

Smoothed marginals: $p(\beta_t \mid y^n)$

It can be shown that $\beta_t \mid V, W, y^n \sim N(m_t^n, C_t^n)$, where

$$m_t^n = m_t + C_t G_{t+1}' R_{t+1}^{-1}(m_{t+1}^n - a_{t+1})$$
$$C_t^n = C_t - C_t G_{t+1}' R_{t+1}^{-1}(R_{t+1} - C_{t+1}^n) R_{t+1}^{-1} G_{t+1} C_t.$$

Conditional distribution: $p(\beta_t \mid \beta_{t+1}, y^n)$

It can be shown that $(\beta_t \mid \beta_{t+1}, V, W, y^n)$ is normally distributed with mean

$$(G_{t+1}' W^{-1} G_{t+1} + C_t^{-1})^{-1}(G_{t+1}' W^{-1}\beta_{t+1} + C_t^{-1} m_t)$$

and variance $(G_{t+1}' W^{-1} G_{t+1} + C_t^{-1})^{-1}$.

Forward filtering, backward sampling (FFBS)

Sampling from $\pi(\beta \mid y^n)$ can be performed by

1. Sampling $\beta_n$ from $N(m_n, C_n)$, and then
2. Sampling $\beta_t$ from $(\beta_t \mid \beta_{t+1}, V, W, y^t)$, for $t = n-1, \ldots, 1$.

The above scheme is known as the forward filtering, backward sampling (FFBS) algorithm (Carter and Kohn, 1994; Frühwirth-Schnatter, 1994).
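A sketch of FFBS for the general DLM, built on dlm_filter above and on the conditional distribution from the previous slide (names are mine; $G$ and $W$ constant for simplicity):

```python
# FFBS draw of beta_1, ..., beta_n from pi(beta | y^n, V, W) in the DLM.
import numpy as np

def dlm_ffbs(y, F, G, V, W, m0, C0, rng=None):
    rng = rng or np.random.default_rng()
    m, C, a, R = dlm_filter(y, F, G, V, W, m0, C0)        # forward pass (defined above)
    n, d = m.shape
    beta = np.empty((n, d))
    beta[-1] = rng.multivariate_normal(m[-1], C[-1])      # beta_n ~ N(m_n, C_n)
    Winv = np.linalg.inv(W)
    for t in range(n - 2, -1, -1):                        # backward sampling
        Cinv = np.linalg.inv(C[t])
        B = np.linalg.inv(G.T @ Winv @ G + Cinv)          # conditional variance
        b = B @ (G.T @ Winv @ beta[t + 1] + Cinv @ m[t])  # conditional mean
        beta[t] = rng.multivariate_normal(b, B)
    return beta
```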

Single-move sampling from $\pi(\beta_t \mid \beta_{-t}, y^n)$

Let $\beta_{-t} = (\beta_1, \ldots, \beta_{t-1}, \beta_{t+1}, \ldots, \beta_n)$ and write $V = \sigma^2$. For $t = 2, \ldots, n-1$,

$$\pi(\beta_t \mid \beta_{-t}, y^n) \propto p(y_t \mid \beta_t)\, p(\beta_{t+1} \mid \beta_t)\, p(\beta_t \mid \beta_{t-1})$$
$$\propto f_N(y_t; F_t'\beta_t, V)\, f_N(\beta_{t+1}; G_{t+1}\beta_t, W)\, f_N(\beta_t; G_t\beta_{t-1}, W) = f_N(\beta_t; b_t, B_t),$$

where

$$b_t = B_t(\sigma^{-2} F_t y_t + G_{t+1}' W^{-1}\beta_{t+1} + W^{-1} G_t\beta_{t-1})$$
$$B_t = (\sigma^{-2} F_t F_t' + G_{t+1}' W^{-1} G_{t+1} + W^{-1})^{-1}.$$

For $t = 1$ and $t = n$,

$$\pi(\beta_1 \mid \beta_{-1}, y^n) = f_N(\beta_1; b_1, B_1) \qquad \text{and} \qquad \pi(\beta_n \mid \beta_{-n}, y^n) = f_N(\beta_n; b_n, B_n),$$

where

$$b_1 = B_1(\sigma^{-2} F_1 y_1 + G_2' W^{-1}\beta_2 + R^{-1} a), \qquad B_1 = (\sigma^{-2} F_1 F_1' + G_2' W^{-1} G_2 + R^{-1})^{-1},$$
$$b_n = B_n(\sigma^{-2} F_n y_n + W^{-1} G_n\beta_{n-1}), \qquad B_n = (\sigma^{-2} F_n F_n' + W^{-1})^{-1}.$$

linear s SV Sampling from π(v, W y n, β) Assume that φ = V 1 Gamma(n σ /2, n σ S σ /2) Φ = W 1 Wishart(n W /2, n W S W /2) Full conditionals π(φ β, Φ) π(φ β, φ) n f N (y t ; F t β t, φ 1 ) f G (φ; n σ /2, n σ S σ /2) t=1 f G (φ; nσ/2, nσs σ/2) n f N (β t ; G t β t 1, Φ 1 ) f W (Φ; n W /2, n W S W /2) t=2 f W (Φ; nw /2, nw SW /2) where nσ = n σ + n, nw = n W + n 1, nσs σ = n σ S σ + σ(y t F t β t ) 2 nw SW = n W S W + Σ n t=2(β t G t β t 1 )(β t G t β t 1 )

MCMC for $p(\beta, V, W \mid y^n)$

1. Sample $V^{-1}$ from its full conditional $f_G(\phi; n_\sigma^*/2, n_\sigma^* S_\sigma^*/2)$.
2. Sample $W^{-1}$ from its full conditional $f_W(\Phi; n_W^*/2, n_W^* S_W^*/2)$.
3. Sample $\beta$ from its full conditional $\pi(\beta \mid y^n, V, W)$ by the FFBS algorithm.
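Putting the pieces together for the simplest case, here is a sketch of the full Gibbs sampler for the local level model with inverse-gamma priors on $\sigma^2$ and $\tau^2$ (names, defaults and the explicit draw of $x_0$ are mine; it reuses ffbs_local_level from above):

```python
# Gibbs sampler for (x_{0:n}, sig2, tau2) in the local level model with
# sig2 ~ IG(nu0/2, nu0*sig02/2) and tau2 ~ IG(n0/2, n0*tau02/2).
import numpy as np

def gibbs_local_level(y, niter=2000, nu0=5, sig02=1.0, n0=5, tau02=0.5,
                      m0=0.0, C0=10.0, rng=None):
    rng = rng or np.random.default_rng()
    n = len(y)
    sig2, tau2 = sig02, tau02                              # starting values
    out = {"sig2": [], "tau2": []}
    for _ in range(niter):
        # 1. states: x_{1:n} by FFBS, then x_0 from p(x_0 | x_1, tau2)
        x = ffbs_local_level(y, sig2, tau2, m0, C0, rng)
        C = 1.0 / (1.0 / C0 + 1.0 / tau2)
        x0 = rng.normal(C * (m0 / C0 + x[0] / tau2), np.sqrt(C))
        # 2. sig2 | x, y ~ IG((nu0 + n)/2, (nu0*sig02 + sum (y_t - x_t)^2)/2)
        sig2 = 1.0 / rng.gamma((nu0 + n) / 2, 2.0 / (nu0 * sig02 + np.sum((y - x)**2)))
        # 3. tau2 | x_{0:n} ~ IG((n0 + n)/2, (n0*tau02 + sum (x_t - x_{t-1})^2)/2)
        ss = (x[0] - x0)**2 + np.sum(np.diff(x)**2)
        tau2 = 1.0 / rng.gamma((n0 + n) / 2, 2.0 / (n0 * tau02 + ss))
        out["sig2"].append(sig2); out["tau2"].append(tau2)
    return out
```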

Likelihood for $(V, W)$

It is easy to see that

$$p(y^n \mid V, W) = \prod_{t=1}^{n} f_N(y_t; f_t, Q_t),$$

which is the integrated likelihood of $(V, W)$.

Jointly sampling $(\beta, V, W)$

$(\beta, V, W)$ can be sampled jointly by

1. Sampling $(V, W)$ from its marginal posterior $\pi(V, W \mid y^n) \propto l(V, W \mid y^n)\,\pi(V, W)$ by a rejection or Metropolis-Hastings step;
2. Sampling $\beta$ from its full conditional $\pi(\beta \mid y^n, V, W)$ by the FFBS algorithm.

Jointly sampling $(\beta, V, W)$ avoids MCMC convergence problems associated with the posterior correlation between parameters (Gamerman and Moreira, 2002).

Comparing sampling schemes¹

First order DLM with $V = 1$:

$$y_t = \beta_t + \epsilon_t, \qquad \epsilon_t \sim N(0, 1)$$
$$\beta_t = \beta_{t-1} + \omega_t, \qquad \omega_t \sim N(0, W),$$

with $(n, W) \in \{(100, .01), (100, .5), (1000, .01), (1000, .5)\}$.

400 runs: 100 replications per combination.

Priors: $\beta_1 \sim N(0, 10)$; $V$ and $W$ have inverse gamma priors with means set at the true values and coefficients of variation set at 10.

Posterior inference: based on 20,000 MCMC draws.

¹ Gamerman, Reis and Salazar (2006). Comparison of sampling schemes for dynamic linear models. International Statistical Review, 74, 203-214.

Effective sample size

For a given $\theta$, let $t^{(n)} = t(\theta^{(n)})$, $\gamma_k = Cov_\pi(t^{(n)}, t^{(n+k)})$, the variance of $t^{(n)}$ be $\sigma^2 = \gamma_0$, the lag-$k$ autocorrelation be $\rho_k = \gamma_k/\sigma^2$, and $\tau_n^2/n = Var_\pi(\bar{t}_n)$. It can be shown that, as $n \to \infty$,

$$\tau_n^2 = \sigma^2\left(1 + 2\sum_{k=1}^{n-1}\frac{n-k}{n}\rho_k\right) \longrightarrow \sigma^2\underbrace{\left(1 + 2\sum_{k=1}^{\infty}\rho_k\right)}_{\text{inefficiency factor}}.$$

The inefficiency factor measures how far the $t^{(n)}$'s are from being a random sample and how much $Var_\pi(\bar{t}_n)$ increases because of that. The effective sample size,

$$n_{\text{eff}} = \frac{n}{1 + 2\sum_{k=1}^{\infty}\rho_k},$$

is the size of a random sample with the same variance.
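A sketch of how $n_{\text{eff}}$ can be estimated from an MCMC trace (names are mine; the sum of autocorrelations is truncated at the first negative estimate, a simple rule that is my choice, not the slides'):

```python
# Estimate the effective sample size n_eff = n / (1 + 2*sum_k rho_k) of a chain.
import numpy as np

def effective_sample_size(chain):
    x = np.asarray(chain, float)
    x = x - x.mean()
    n = len(x)
    acov = np.correlate(x, x, mode="full")[n - 1:] / n   # autocovariances gamma_k
    rho = acov / acov[0]                                 # autocorrelations rho_k
    s = 0.0
    for k in range(1, n):
        if rho[k] < 0:                                   # truncate at first negative rho_k
            break
        s += rho[k]
    return n / (1.0 + 2.0 * s)
```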

Sampling schemes

Scheme I: Sampling $\beta_1, \ldots, \beta_n$, $V$, $W$ from their full conditionals (single moves).
Scheme II: Sampling $\beta$ (as a block), $V$ and $W$ from their full conditionals.
Scheme III: Jointly sampling $(\beta, V, W)$.

Computing times relative to Scheme I:

Scheme   n = 100   n = 1000
II       1.7       1.9
III      1.9       7.2

Sample averages (based on the 100 replications) of $n_{\text{eff}}$ for $V$:

W      n      I      II      III
0.01   1000   242    8938    2983
0.01   100    3283   13685   12263
0.50   1000   409    3043    963
0.50   100    1694   3404    923

A nonlinear dynamic model

Let $y_t$, for $t = 1, \ldots, n$, be generated by

$$y_t = \frac{x_t^2}{20} + \epsilon_t, \qquad \epsilon_t \sim N(0, \sigma^2)$$
$$x_t = \alpha x_{t-1} + \beta\frac{x_{t-1}}{1 + x_{t-1}^2} + \gamma\cos(1.2(t-1)) + u_t, \qquad u_t \sim N(0, \tau^2),$$

where $x_0 \sim N(m_0, C_0)$ and $\theta = (\alpha, \beta, \gamma)$.

Prior distribution:

$$\sigma^2 \sim IG(n_0/2, n_0\sigma_0^2/2)$$
$$(\theta, \tau^2) \sim N(\theta_0, \tau^2 V_0)\, IG(\nu_0/2, \nu_0\tau_0^2/2)$$
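A sketch that simulates data from this model (names are mine; the parameter values follow the simulation setup used later in the lecture):

```python
# Simulate the nonlinear benchmark model:
#   y_t = x_t^2/20 + eps_t,  x_t = a*x_{t-1} + b*x_{t-1}/(1+x_{t-1}^2) + g*cos(1.2(t-1)) + u_t.
import numpy as np

def simulate_nonlinear(n=100, theta=(0.5, 25.0, 8.0), sig2=1.0, tau2=10.0,
                       x0=0.1, rng=None):
    rng = rng or np.random.default_rng(1)
    a, b, g = theta
    x = np.empty(n); y = np.empty(n)
    x_prev = x0
    for t in range(n):                      # index t = 0 here corresponds to time 1
        x[t] = (a * x_prev + b * x_prev / (1 + x_prev**2)
                + g * np.cos(1.2 * t) + rng.normal(scale=np.sqrt(tau2)))
        y[t] = x[t]**2 / 20 + rng.normal(scale=np.sqrt(sig2))
        x_prev = x[t]
    return y, x
```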

Sampling $(\sigma^2, \theta, \tau^2 \mid x_0, x^n, y^n)$

It follows that

$$(\sigma^2 \mid y^n, x^n) \sim IG(n_1/2, n_1\sigma_1^2/2),$$

where $n_1 = n_0 + n$ and

$$n_1\sigma_1^2 = n_0\sigma_0^2 + \sum_{t=1}^{n}(y_t - x_t^2/20)^2.$$

Also, $(\theta, \tau^2 \mid x_{0:n}) \sim N(\theta_1, \tau^2 V_1)\, IG(\nu_1/2, \nu_1\tau_1^2/2)$, where $\nu_1 = \nu_0 + n$ and

$$V_1^{-1} = V_0^{-1} + Z'Z$$
$$V_1^{-1}\theta_1 = V_0^{-1}\theta_0 + Z'x_{1:n}$$
$$\nu_1\tau_1^2 = \nu_0\tau_0^2 + (x_{1:n} - Z\theta_1)'(x_{1:n} - Z\theta_1) + (\theta_1 - \theta_0)'V_0^{-1}(\theta_1 - \theta_0)$$
$$Z = (G_{x_0}, \ldots, G_{x_{n-1}})', \qquad G_{x_t} = \left(x_t,\; \frac{x_t}{1 + x_t^2},\; \cos(1.2\,t)\right)'.$$

Sampling $x_0, x_1, \ldots, x_n$

Let $x_{-t} = (x_0, \ldots, x_{t-1}, x_{t+1}, \ldots, x_n)$ for $t = 1, \ldots, n-1$, $x_{-0} = x_{1:n}$, $x_{-n} = x_{0:(n-1)}$ and $y^0 = \emptyset$, and let $\psi$ collect the fixed parameters $(\theta, \sigma^2, \tau^2)$.

For $t = 0$:
$$p(x_0 \mid x_{-0}, y^0, \psi) \propto f_N(x_0; m_0, C_0)\, f_N(x_1; G_{x_0}'\theta, \tau^2)$$

For $t = 1, \ldots, n-1$:
$$p(x_t \mid x_{-t}, y^t, \psi) \propto f_N(y_t; x_t^2/20, \sigma^2)\, f_N(x_t; G_{x_{t-1}}'\theta, \tau^2)\, f_N(x_{t+1}; G_{x_t}'\theta, \tau^2)$$

For $t = n$:
$$p(x_n \mid x_{-n}, y^n, \psi) \propto f_N(y_n; x_n^2/20, \sigma^2)\, f_N(x_n; G_{x_{n-1}}'\theta, \tau^2)$$

Metropolis-Hastings algorithm

A simple random walk Metropolis algorithm with tuning variance $v_x^2$ works as follows. For $t = 0, \ldots, n$:

1. Current state: $x_t^{(j)}$.
2. Sample $x_t^*$ from $N(x_t^{(j)}, v_x^2)$.
3. Compute the acceptance probability
$$\alpha = \min\left\{1, \frac{p(x_t^* \mid x_{-t}, y^t, \psi)}{p(x_t^{(j)} \mid x_{-t}, y^t, \psi)}\right\}.$$
4. New state:
$$x_t^{(j+1)} = \begin{cases} x_t^* & \text{with probability } \alpha \\ x_t^{(j)} & \text{with probability } 1 - \alpha. \end{cases}$$
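A sketch of one such random walk Metropolis sweep over the states $x_0, \ldots, x_n$, holding the parameters fixed (all names are mine; densities are evaluated on the log scale for numerical stability):

```python
# One random walk Metropolis sweep over the states of the nonlinear model.
import numpy as np
from scipy.stats import norm

def state_mean(x_prev, t, a, b, g):
    """E(x_t | x_{t-1}) for the nonlinear state equation."""
    return a * x_prev + b * x_prev / (1 + x_prev**2) + g * np.cos(1.2 * (t - 1))

def log_fullcond(xt, t, x, y, a, b, g, sig2, tau2, m0=0.0, C0=10.0):
    """log p(x_t | x_{-t}, y^t, psi) up to a constant; x holds x_0..x_n, y holds y_1..y_n."""
    lp = 0.0
    if t == 0:
        lp += norm.logpdf(xt, m0, np.sqrt(C0))
    else:
        lp += norm.logpdf(y[t - 1], xt**2 / 20, np.sqrt(sig2))
        lp += norm.logpdf(xt, state_mean(x[t - 1], t, a, b, g), np.sqrt(tau2))
    if t < len(x) - 1:
        lp += norm.logpdf(x[t + 1], state_mean(xt, t + 1, a, b, g), np.sqrt(tau2))
    return lp

def rwm_sweep(x, y, a, b, g, sig2, tau2, vx=0.1, rng=None):
    rng = rng or np.random.default_rng()
    for t in range(len(x)):
        prop = rng.normal(x[t], vx)
        logr = (log_fullcond(prop, t, x, y, a, b, g, sig2, tau2)
                - log_fullcond(x[t], t, x, y, a, b, g, sig2, tau2))
        if np.log(rng.uniform()) < logr:      # accept with probability min(1, exp(logr))
            x[t] = prop
    return x
```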

Simulation setup

We simulated $n = 100$ observations based on $\theta = (0.5, 25, 8)$, $\sigma^2 = 1$, $\tau^2 = 10$ and $x_0 = 0.1$. (Figure: time series of $y_t$ and $x_t$, and scatter plot of $x_t$ against $x_{t-1}$.)

Prior hyperparameters

$x_0 \sim N(m_0, C_0)$: $m_0 = 0.0$ and $C_0 = 10$.
$\theta \mid \tau^2 \sim N(\theta_0, \tau^2 V_0)$: $\theta_0 = (0.5, 25, 8)$ and $V_0 = \mathrm{diag}(0.0025, 0.1, 0.04)$.
$\tau^2 \sim IG(\nu_0/2, \nu_0\tau_0^2/2)$: $\nu_0 = 6$ and $\tau_0^2 = 20/3$, such that $E(\tau^2) = SD(\tau^2) = 10$.
$\sigma^2 \sim IG(n_0/2, n_0\sigma_0^2/2)$: $n_0 = 6$ and $\sigma_0^2 = 2/3$, such that $E(\sigma^2) = SD(\sigma^2) = 1$.

MCMC setup

Metropolis-Hastings tuning parameter: $v_x^2 = (0.1)^2$.

Burn-in period, thinning step and MCMC sample size: $M_0 = 1{,}000$, $L = 20$ and $M = 950$ (20,000 draws in total).

Initial values: $\theta = (0.5, 25, 8)$, $\tau^2 = 10$, $\sigma^2 = 1$ and $x_{0:n} = x_{0:n}^{true}$.

Parameters. (Figure: MCMC trace plots, autocorrelation functions and posterior histograms for $\alpha$, $\beta$, $\gamma$, $\tau^2$ and $\sigma^2$.)

States. (Figure: posterior summaries of the states $x_t$ over time.)

(Figure: autocorrelation functions of the sampled states.)

Parameters, longer run: $M_0 = 100{,}000$, $L = 50$ and $M = 1{,}000$ (150,000 draws in total). (Figure: trace plots, autocorrelation functions and posterior histograms for $\alpha$, $\beta$, $\gamma$, $\tau^2$ and $\sigma^2$.)

(Figure: autocorrelation functions of the sampled states for the longer run.)

Stochastic volatility

The canonical stochastic volatility model (SVM) is

$$y_t = e^{h_t/2}\varepsilon_t$$
$$h_t = \mu + \phi h_{t-1} + \tau\eta_t$$

where $\varepsilon_t$ and $\eta_t$ are $N(0, 1)$ shocks with $E(\varepsilon_t\eta_{t+h}) = 0$ for all $h$ and $E(\varepsilon_t\varepsilon_{t+l}) = E(\eta_t\eta_{t+l}) = 0$ for all $l \neq 0$.

$\tau^2$ is the volatility of the log-volatility. If $|\phi| < 1$ then $h_t$ is a stationary process.

Let $y^n = (y_1, \ldots, y_n)$, $h^n = (h_1, \ldots, h_n)$ and $h_{a:b} = (h_a, \ldots, h_b)$.
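A sketch that simulates from the SVM (names are mine; the parameter values are those used for the simulated data on the next slide):

```python
# Simulate the canonical SV model: y_t = exp(h_t/2) eps_t, h_t = mu + phi h_{t-1} + tau eta_t.
import numpy as np

def simulate_sv(n=500, mu=0.00645, phi=0.99, tau2=0.0225, h0=0.0, rng=None):
    rng = rng or np.random.default_rng(2)
    h = np.empty(n); y = np.empty(n)
    h_prev = h0
    for t in range(n):
        h[t] = mu + phi * h_prev + rng.normal(scale=np.sqrt(tau2))
        y[t] = np.exp(h[t] / 2) * rng.normal()
        h_prev = h[t]
    return y, h
```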

Simulated data: $n = 500$, $h_0 = 0.0$ and $(\mu, \phi, \tau^2) = (0.00645, 0.99, 0.0225)$. (Figure: time series of $y_t$ and of the standard deviations $e^{h_t/2}$.)

Prior information

Uncertainty about the initial log-volatility is $h_0 \sim N(m_0, C_0)$. Let $\theta = (\mu, \phi)$; then the prior distribution of $(\theta, \tau^2)$ is normal-inverse gamma, i.e. $(\theta, \tau^2) \sim NIG(\theta_0, V_0, \nu_0, s_0^2)$:

$$\theta \mid \tau^2 \sim N(\theta_0, \tau^2 V_0), \qquad \tau^2 \sim IG(\nu_0/2, \nu_0 s_0^2/2).$$

For example, if $\nu_0 = 10$ and $s_0^2 = 0.018$ then

$$E(\tau^2) = \frac{\nu_0 s_0^2/2}{\nu_0/2 - 1} = 0.0225, \qquad Var(\tau^2) = \frac{(\nu_0 s_0^2/2)^2}{(\nu_0/2 - 1)^2(\nu_0/2 - 2)} = (0.013)^2.$$

Hyperparameters: $m_0$, $C_0$, $\theta_0$, $V_0$, $\nu_0$ and $s_0^2$.

Simulated data

Simulation setup: $h_0 = 0.0$, $\mu = 0.00645$, $\phi = 0.99$ and $\tau^2 = 0.0225$.

Prior distribution: $h_0 \sim N(0, 100)$, $\mu \sim N(0, 100)$, $\phi \sim N(0, 100)$ and $\tau^2 \sim IG(5, 0.14)$ (mode = 0.0234; 95% c.i. = (0.014, 0.086)).

Posterior inference

The SVM is a dynamic model, and posterior inference via MCMC for the latent log-volatility states $h_t$ can be performed in at least two ways. Let $h_{-t} = (h_{0:(t-1)}, h_{(t+1):n})$, for $t = 1, \ldots, n-1$, and $h_{-n} = h_{1:(n-1)}$.

Single moves for $h_t$:
$(\theta, \tau^2 \mid h^n, y^n)$
$(h_t \mid h_{-t}, \theta, \tau^2, y^n)$, for $t = 1, \ldots, n$

Block move for $h^n$:
$(\theta, \tau^2 \mid h^n, y^n)$
$(h^n \mid \theta, \tau^2, y^n)$

Sampling $(\theta, \tau^2 \mid h^n, y^n)$

Conditional on $h_{0:n}$, the posterior distribution of $(\theta, \tau^2)$ is also normal-inverse gamma, $(\theta, \tau^2 \mid y^n, h_{0:n}) \sim NIG(\theta_1, V_1, \nu_1, s_1^2)$, where $X = (1_n, h_{0:(n-1)})$, $\nu_1 = \nu_0 + n$ and

$$V_1^{-1} = V_0^{-1} + X'X$$
$$V_1^{-1}\theta_1 = V_0^{-1}\theta_0 + X'h_{1:n}$$
$$\nu_1 s_1^2 = \nu_0 s_0^2 + (h_{1:n} - X\theta_1)'(h_{1:n} - X\theta_1) + (\theta_1 - \theta_0)'V_0^{-1}(\theta_1 - \theta_0).$$
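This conditional draw is a standard normal-inverse gamma update for the regression of $h_t$ on $(1, h_{t-1})$. A sketch (names are mine):

```python
# Draw (theta, tau2) = ((mu, phi), tau2) from the NIG conditional posterior given h_0, ..., h_n.
import numpy as np

def sample_theta_tau2(h, theta0, V0, nu0, s02, rng=None):
    """h: array of length n+1 holding h_0, ..., h_n."""
    rng = rng or np.random.default_rng()
    n = len(h) - 1
    X = np.column_stack([np.ones(n), h[:-1]])           # rows (1, h_{t-1})
    resp = h[1:]                                        # responses h_t
    V0inv = np.linalg.inv(V0)
    V1 = np.linalg.inv(V0inv + X.T @ X)
    theta1 = V1 @ (V0inv @ theta0 + X.T @ resp)
    nu1 = nu0 + n
    nu1s12 = (nu0 * s02 + (resp - X @ theta1) @ (resp - X @ theta1)
              + (theta1 - theta0) @ V0inv @ (theta1 - theta0))
    tau2 = 1.0 / rng.gamma(nu1 / 2, 2.0 / nu1s12)       # tau2 | h ~ IG(nu1/2, nu1*s1^2/2)
    theta = rng.multivariate_normal(theta1, tau2 * V1)  # theta | tau2, h ~ N(theta1, tau2*V1)
    return theta, tau2
```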

Sampling $(h_0 \mid \theta, \tau^2, h_1)$

Combining $h_0 \sim N(m_0, C_0)$ and $h_1 \mid h_0 \sim N(\mu + \phi h_0, \tau^2)$ leads to (by Bayes' theorem) $h_0 \mid h_1 \sim N(m_1, C_1)$, where

$$C_1^{-1} m_1 = C_0^{-1} m_0 + \phi\tau^{-2}(h_1 - \mu) \qquad \text{and} \qquad C_1^{-1} = C_0^{-1} + \phi^2\tau^{-2}.$$

Conditional prior distribution of $h_t$

Given $h_{t-1}$, $\theta$ and $\tau^2$, it can be shown that

$$\begin{pmatrix} h_t \\ h_{t+1} \end{pmatrix} \sim N\left\{\begin{pmatrix} \mu + \phi h_{t-1} \\ (1 + \phi)\mu + \phi^2 h_{t-1} \end{pmatrix},\; \tau^2\begin{pmatrix} 1 & \phi \\ \phi & 1 + \phi^2 \end{pmatrix}\right\},$$

so that $E(h_t \mid h_{t-1}, h_{t+1}, \theta, \tau^2)$ and $V(h_t \mid h_{t-1}, h_{t+1}, \theta, \tau^2)$ are

$$\mu_t = \left(\frac{1 - \phi}{1 + \phi^2}\right)\mu + \left(\frac{\phi}{1 + \phi^2}\right)(h_{t-1} + h_{t+1}) \qquad \text{and} \qquad \nu^2 = \tau^2(1 + \phi^2)^{-1}.$$

Therefore,

$$(h_t \mid h_{t-1}, h_{t+1}, \theta, \tau^2) \sim N(\mu_t, \nu^2), \quad t = 1, \ldots, n-1,$$
$$(h_n \mid h_{n-1}, \theta, \tau^2) \sim N(\mu_n, \tau^2), \quad \text{where } \mu_n = \mu + \phi h_{n-1}.$$

Sampling $h_t$ via RWM

Let $\nu_t^2 = \nu^2$ for $t = 1, \ldots, n-1$ and $\nu_n^2 = \tau^2$; then

$$p(h_t \mid h_{-t}, y^n, \theta, \tau^2) \propto f_N(h_t; \mu_t, \nu_t^2)\, f_N(y_t; 0, e^{h_t})$$

for $t = 1, \ldots, n$. Random walk Metropolis (RWM) with tuning variance $v_h^2$, for $t = 1, \ldots, n$:

1. Current state: $h_t^{(j)}$.
2. Sample $h_t^*$ from $N(h_t^{(j)}, v_h^2)$.
3. Compute the acceptance probability
$$\alpha = \min\left\{1, \frac{f_N(h_t^*; \mu_t, \nu_t^2)\, f_N(y_t; 0, e^{h_t^*})}{f_N(h_t^{(j)}; \mu_t, \nu_t^2)\, f_N(y_t; 0, e^{h_t^{(j)}})}\right\}.$$
4. New state:
$$h_t^{(j+1)} = \begin{cases} h_t^* & \text{with probability } \alpha \\ h_t^{(j)} & \text{with probability } 1 - \alpha. \end{cases}$$
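A sketch of one RWM sweep over $h_1, \ldots, h_n$ with $(\mu, \phi, \tau^2)$ and $h_0$ held fixed (names are mine; note that $e^{h_t}$ is the variance of $y_t$, so the normal scale is $e^{h_t/2}$):

```python
# One random walk Metropolis sweep over h_1, ..., h_n of the SV model.
import numpy as np
from scipy.stats import norm

def sv_rwm_sweep(h, y, h0, mu, phi, tau2, vh=0.1, rng=None):
    """h: current draws of h_1..h_n; y: observations y_1..y_n."""
    rng = rng or np.random.default_rng()
    n = len(h)
    nu2 = tau2 / (1 + phi**2)
    for t in range(n):
        h_prev = h0 if t == 0 else h[t - 1]
        if t < n - 1:                     # interior state: prior N(mu_t, nu2)
            mu_t = ((1 - phi) * mu + phi * (h_prev + h[t + 1])) / (1 + phi**2)
            v_t = nu2
        else:                             # last state: prior N(mu + phi*h_{n-1}, tau2)
            mu_t, v_t = mu + phi * h_prev, tau2
        prop = rng.normal(h[t], vh)
        def logpost(ht):
            return norm.logpdf(ht, mu_t, np.sqrt(v_t)) + norm.logpdf(y[t], 0.0, np.exp(ht / 2))
        if np.log(rng.uniform()) < logpost(prop) - logpost(h[t]):
            h[t] = prop
    return h
```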

MCMC setup: $M_0 = 1{,}000$ and $M = 1{,}000$. (Figure: autocorrelations of the sampled $h_t$'s for the simulated data.)

Sampling $h_t$ via IMH

The full conditional distribution of $h_t$ is given by

$$p(h_t \mid h_{-t}, y^n, \theta, \tau^2) \propto p(h_t \mid h_{t-1}, h_{t+1}, \theta, \tau^2)\, p(y_t \mid h_t) = f_N(h_t; \mu_t, \nu^2)\, f_N(y_t; 0, e^{h_t}).$$

Kim, Shephard and Chib (1998) explored the fact that

$$\log p(y_t \mid h_t) = \text{const} - \frac{1}{2}h_t - \frac{y_t^2}{2}\exp(-h_t)$$

and that a Taylor expansion of $\exp(-h_t)$ around $\mu_t$ leads to

$$\log p(y_t \mid h_t) \approx \text{const} - \frac{1}{2}h_t - \frac{y_t^2}{2}\left(e^{-\mu_t} - (h_t - \mu_t)e^{-\mu_t}\right),$$

i.e.

$$g(h_t) = \exp\left\{-\frac{1}{2}h_t\left(1 - y_t^2 e^{-\mu_t}\right)\right\}.$$

Proposal distribution

Let $\nu_t^2 = \nu^2$ for $t = 1, \ldots, n-1$ and $\nu_n^2 = \tau^2$. Then combining $f_N(h_t; \mu_t, \nu_t^2)$ and $g(h_t)$, for $t = 1, \ldots, n$, leads to the following proposal distribution:

$$q(h_t \mid h_{-t}, y^n, \theta, \tau^2) \equiv N(h_t; \tilde{\mu}_t, \nu_t^2),$$

where $\tilde{\mu}_t = \mu_t + 0.5\,\nu_t^2\,(y_t^2 e^{-\mu_t} - 1)$.

IMH algorithm

For $t = 1, \ldots, n$:

1. Current state: $h_t^{(j)}$.
2. Sample $h_t^*$ from $N(\tilde{\mu}_t, \nu_t^2)$.
3. Compute the acceptance probability
$$\alpha = \min\left\{1, \frac{f_N(h_t^*; \mu_t, \nu_t^2)\, f_N(y_t; 0, e^{h_t^*})}{f_N(h_t^{(j)}; \mu_t, \nu_t^2)\, f_N(y_t; 0, e^{h_t^{(j)}})} \cdot \frac{f_N(h_t^{(j)}; \tilde{\mu}_t, \nu_t^2)}{f_N(h_t^*; \tilde{\mu}_t, \nu_t^2)}\right\}.$$
4. New state:
$$h_t^{(j+1)} = \begin{cases} h_t^* & \text{with probability } \alpha \\ h_t^{(j)} & \text{with probability } 1 - \alpha. \end{cases}$$
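A sketch of the corresponding independence MH sweep, differing from sv_rwm_sweep above only in the proposal and in the proposal-density correction of the acceptance ratio (names are mine):

```python
# One independence MH sweep over h_1, ..., h_n using the linearised proposal N(mu_tilde_t, nu_t^2).
import numpy as np
from scipy.stats import norm

def sv_imh_sweep(h, y, h0, mu, phi, tau2, rng=None):
    rng = rng or np.random.default_rng()
    n = len(h)
    nu2 = tau2 / (1 + phi**2)
    for t in range(n):
        h_prev = h0 if t == 0 else h[t - 1]
        if t < n - 1:
            mu_t = ((1 - phi) * mu + phi * (h_prev + h[t + 1])) / (1 + phi**2)
            v_t = nu2
        else:
            mu_t, v_t = mu + phi * h_prev, tau2
        mu_tilde = mu_t + 0.5 * v_t * (y[t]**2 * np.exp(-mu_t) - 1.0)   # proposal mean
        prop = rng.normal(mu_tilde, np.sqrt(v_t))
        def logtarget(ht):
            return norm.logpdf(ht, mu_t, np.sqrt(v_t)) + norm.logpdf(y[t], 0.0, np.exp(ht / 2))
        logr = (logtarget(prop) - logtarget(h[t])
                + norm.logpdf(h[t], mu_tilde, np.sqrt(v_t))
                - norm.logpdf(prop, mu_tilde, np.sqrt(v_t)))
        if np.log(rng.uniform()) < logr:
            h[t] = prop
    return h
```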

ACF for both schemes. (Figure: autocorrelations of the sampled $h_t$'s under the random walk and the independence MH schemes.)

Sampling $h^n$: normal approximation and FFBS

Let $y_t^* = \log y_t^2$ and $\epsilon_t = \log\varepsilon_t^2$. The SV-AR(1) is a DLM with non-normal observational errors, i.e.

$$y_t^* = h_t + \epsilon_t$$
$$h_t = \mu + \phi h_{t-1} + \tau\eta_t$$

where $\eta_t \sim N(0, 1)$. The distribution of $\epsilon_t$ is $\log\chi_1^2$, with

$$E(\epsilon_t) = -1.27 \qquad \text{and} \qquad V(\epsilon_t) = \frac{\pi^2}{2} = 4.935.$$

Normal approximation

Let $\epsilon_t$ be approximated by $N(\alpha, \sigma^2)$, and let $z_t = y_t^* - \alpha$, with $\alpha = -1.27$ and $\sigma^2 = \pi^2/2$. Then

$$z_t = h_t + \sigma v_t$$
$$h_t = \mu + \phi h_{t-1} + \tau\eta_t$$

is a simple DLM where $v_t$ and $\eta_t$ are $N(0, 1)$. Sampling from $p(h^n \mid \theta, \tau^2, \sigma^2, z^n)$ can be performed by the FFBS algorithm.

(Figure: density of $\log\chi_1^2$ versus the $N(-1.27, \pi^2/2)$ approximation.)

ACF for the three schemes. (Figure: autocorrelations of the sampled $h_t$'s under the random walk, independence MH and normal-approximation schemes.)

Sampling $h^n$: mixtures of normals and FFBS

The $\log\chi_1^2$ distribution can be approximated by $\sum_{i=1}^{7}\pi_i\, N(\mu_i, \omega_i^2)$, where

i    π_i       µ_i         ω_i²
1    0.00730   -11.40039   5.79596
2    0.10556   -5.24321    2.61369
3    0.00002   -9.83726    5.17950
4    0.04395   1.50746     0.16735
5    0.34001   -0.65098    0.64009
6    0.24566   0.52478     0.34023
7    0.25750   -2.35859    1.26261

(Figure: density of $\log\chi_1^2$ versus the 7-component mixture $\sum_{i=1}^{7}\pi_i N(\mu_i, \omega_i^2)$.)

Mixture of normals

Using an argument from the Bayesian analysis of mixtures of normals, let $z_1, \ldots, z_n$ be unobservable (latent) indicator variables such that $z_t \in \{1, \ldots, 7\}$ and $Pr(z_t = i) = \pi_i$, for $i = 1, \ldots, 7$. Therefore, conditional on the $z$'s, $y_t$ is transformed into $\log y_t^2$,

$$\log y_t^2 = h_t + \log\varepsilon_t^2$$
$$h_t = \mu + \phi h_{t-1} + \tau_\eta\eta_t,$$

which can be rewritten as a normal DLM:

$$\log y_t^2 = h_t + v_t, \qquad v_t \sim N(\mu_{z_t}, \omega_{z_t}^2)$$
$$h_t = \mu + \phi h_{t-1} + w_t, \qquad w_t \sim N(0, \tau_\eta^2),$$

where $\mu_{z_t}$ and $\omega_{z_t}^2$ are provided in the previous table. Then $h^n$ is jointly sampled using the FFBS algorithm.
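The extra Gibbs step in this scheme draws the mixture indicators, with $P(z_t = i \mid h_t, y_t^*) \propto \pi_i\, f_N(y_t^*; h_t + \mu_i, \omega_i^2)$. A sketch (names are mine; the numerical values are those in the table above):

```python
# Sample the mixture indicators z_1, ..., z_n given h_t and y*_t = log(y_t^2).
import numpy as np

KSC_P  = np.array([0.00730, 0.10556, 0.00002, 0.04395, 0.34001, 0.24566, 0.25750])
KSC_MU = np.array([-11.40039, -5.24321, -9.83726, 1.50746, -0.65098, 0.52478, -2.35859])
KSC_V  = np.array([5.79596, 2.61369, 5.17950, 0.16735, 0.64009, 0.34023, 1.26261])

def sample_indicators(ystar, h, rng=None):
    rng = rng or np.random.default_rng()
    n = len(ystar)
    z = np.empty(n, dtype=int)
    for t in range(n):
        # P(z_t = i | .) proportional to pi_i * N(y*_t; h_t + mu_i, omega_i^2)
        logw = (np.log(KSC_P) - 0.5 * np.log(2 * np.pi * KSC_V)
                - 0.5 * (ystar[t] - h[t] - KSC_MU)**2 / KSC_V)
        w = np.exp(logw - logw.max())
        z[t] = rng.choice(7, p=w / w.sum())
    return z
```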

ACF for the four schemes. (Figure: autocorrelations of the sampled $h_t$'s under the random walk, independence MH, normal-approximation and mixture schemes.)

Posterior means: volatilities. (Figure: true volatilities and posterior means under the random walk, independence MH, normal and mixture schemes, over time.)