Bayesian Model Comparison: Modeling Petrobrás log-returns
Hedibert Freitas Lopes
February 2014
Log price: $y_t = \log p_t$. Time span: 12/29/2000-12/31/2013 ($n = 3268$ days).
[Figure: time series of the log price; y-axis LOG PRICE, x-axis DAYS.]
Scatterplot of $y_{t-1}$ versus $y_t$.
[Figure: scatterplot; x-axis $y_{t-1}$, y-axis $y_t$.]
Log return: $r_t = y_t - y_{t-1} = \log(p_t/p_{t-1})$. Time span: 01/02/2001-12/31/2013 ($n = 3267$ days).
[Figure: time series of the log return; y-axis LOG RETURN, x-axis DAYS.]
Histogram of $r_t$.
[Figure: histogram of the log returns.]
Models
Model $M_0$: $r_1, \ldots, r_n \stackrel{iid}{\sim} N(0, \sigma^2)$.
Model $M_1$: $r_1, \ldots, r_n \stackrel{iid}{\sim} N(\mu, \sigma^2)$.
Model $M_2$: for $t = 2, \ldots, n$,
$y_t \mid y_{t-1} \sim N(\alpha + \beta y_{t-1}, \sigma^2)$,
for $\alpha, \beta \in \mathbb{R}$ and $\sigma^2 > 0$.
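For concreteness, here is a minimal R sketch of the log-likelihood of each model (the function names are illustrative, not from the original slides):

loglik.M0 = function(sig2, r) sum(dnorm(r, 0, sqrt(sig2), log = TRUE))
loglik.M1 = function(mu, sig2, r) sum(dnorm(r, mu, sqrt(sig2), log = TRUE))
loglik.M2 = function(alpha, beta, sig2, y) {
  n = length(y)
  sum(dnorm(y[2:n], alpha + beta*y[1:(n-1)], sqrt(sig2), log = TRUE))
}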
Prior
First $n_0 = 1506$ days (2001-2006): prior knowledge. Let $\tilde{y}_t$ and $\tilde{r}_t$ be the log prices and log returns in the training sample, and let $m_r$ and $V_r$ be the sample mean and the sample variance of the $\tilde{r}_t$'s.
Last $n = 1760$ days (2007-2013): inference and model comparison.
Model $M_0$: $\sigma^2 \sim IG(5, 4m_{r^2})$, where $m_{r^2}$ is the sample mean of the $\tilde{r}_t^2$'s.
Model $M_1$: $\mu \sim N(m_r, 100V_r/(n_0-1))$ and $\sigma^2 \sim IG(5, V_r)$.
Model $M_2$: $\alpha \sim N(a, 100V_a)$, $\beta \sim N(b, 100V_b)$ and $\sigma^2 \sim IG(5, 4s^2)$, where $(a, b, s^2)$ are the MLEs of $(\alpha, \beta, \sigma^2)$ based on the training sample, and $V_a$ and $V_b$ are the MLE variances of the estimators $a$ and $b$, respectively.
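A minimal R sketch of how these hyperparameters could be computed from the training sample (y0 and r0 denote the training log prices and log returns; all names are illustrative):

n0 = length(r0)
m.r = mean(r0); V.r = var(r0)                  # m_r and V_r
m.r2 = mean(r0^2)                              # sample mean of the squared returns
fit = lm(y0[2:n0] ~ y0[1:(n0-1)])              # AR(1) regression for model M2
a = coef(fit)[1]; b = coef(fit)[2]             # MLEs of alpha and beta
s2 = sum(residuals(fit)^2)/(n0-1)              # MLE of sigma^2 (n0-1 regression observations)
V.a = vcov(fit)[1,1]; V.b = vcov(fit)[2,2]     # (approximate) variances of a and b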
Model $M_0$: prior, posterior, MLE.
[Figure: prior and posterior densities of $\sigma^2$, with the MLE marked.]
Model $M_1$: Gibbs sampler output.
[Figure: for $\mu$ and $\sigma^2$, trace plots (iterations), autocorrelation functions (Lag) and marginal posterior histograms.]
Model $M_1$: prior, posterior, MLE.
[Figure: prior and posterior densities of $\mu$ and $\sigma^2$, with the MLEs marked.]
Model $M_2$: Gibbs sampler output.
[Figure: for $\alpha$, $\beta$ and $\sigma^2$, trace plots (iterations), autocorrelation functions (Lag) and marginal posterior histograms.]
Model $M_2$: prior, posterior, MLE.
[Figure: prior and posterior densities of $\alpha$, $\beta$ and $\sigma^2$, with the MLEs marked.]
Comparing the models
Recall that the posterior of $\theta_i$ under model $M_i$ is
$p(\theta_i \mid \text{data}, M_i) \propto p(\theta_i \mid M_i)\, p(\text{data} \mid \theta_i, M_i)$,
while the marginal likelihood under model $M_i$ is
$p(\text{data} \mid M_i) = \int p(\text{data} \mid \theta_i, M_i)\, p(\theta_i \mid M_i)\, d\theta_i$.
Therefore, the posterior model probability of $M_i$ is
$\Pr(M_i \mid \text{data}) \propto \Pr(M_i)\, p(\text{data} \mid M_i)$,
and the posterior odds of model $M_i$ against model $M_j$ are
$\dfrac{\Pr(M_i \mid \text{data})}{\Pr(M_j \mid \text{data})} = \dfrac{\Pr(M_i)\, p(\text{data} \mid M_i)}{\Pr(M_j)\, p(\text{data} \mid M_j)}$,
which reduce to the Bayes factor $B_{ij} = p(\text{data} \mid M_i)/p(\text{data} \mid M_j)$ when the prior model probabilities are equal.
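Given the log marginal likelihoods, posterior model probabilities under equal prior weights can be computed stably in R by subtracting the maximum before exponentiating; a sketch using the Monte Carlo column of the table two slides ahead:

logml = c(3508.33, 3509.47, 3509.71)   # log p(data | M_i) for M0, M1, M2
w = exp(logml - max(logml))            # unnormalized weights, computed stably
round(100*w/sum(w), 1)                 # Pr(M_i | data) in %: 12.3, 38.6, 49.1

This essentially reproduces the Monte Carlo column (12.4, 38.5, 49.1) up to the rounding of the inputs.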
Monte Carlo integration
A major (computational) difficulty is the calculation of the marginal likelihood (or predictive) under model $M$, i.e. $p(\text{data} \mid M)$.
The simplest estimator of $p(\text{data} \mid M)$ is obtained by Monte Carlo integration. More specifically, when $\{\theta^{(1)}, \ldots, \theta^{(N)}\}$ is a sample from the prior $p(\theta \mid M)$, then (for large $N$)
$\hat{p}_N(\text{data} \mid M) = \frac{1}{N} \sum_{j=1}^N p(\text{data} \mid \theta^{(j)}, M)$
is the simple Monte Carlo estimator of $p(\text{data} \mid M)$.
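A minimal R sketch of this estimator for model $M_0$, assuming the $IG(5, 4m_{r^2})$ prior above; since the log-likelihood here is around +3500, the average is taken on the log scale via log-sum-exp:

N = 10000
b0 = 4*mean(r0^2)                          # prior scale from the training sample
sig2 = 1/rgamma(N, shape = 5, rate = b0)   # sigma^2 ~ IG(5, b0)
logl = sapply(sig2, function(s2) sum(dnorm(r, 0, sqrt(s2), log = TRUE)))
m = max(logl)
logpred.MC = m + log(mean(exp(logl - m)))  # log of the simple MC estimator

Here r denotes the returns of the evaluation sample (2007-2013).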
Harmonic mean identity (and estimator)
An alternative way of calculating $p(\text{data} \mid M)$ is via the harmonic mean identity:
$\frac{1}{p(\text{data} \mid M)} = \int \frac{1}{p(\text{data} \mid \theta, M)}\, p(\theta \mid \text{data}, M)\, d\theta.$
Monte Carlo integration: when $\{\theta^{(1)}, \ldots, \theta^{(N)}\}$ is a sample from the posterior $p(\theta \mid \text{data}, M)$, then (for large $N$)
$\hat{p}_N(\text{data} \mid M) = \left[ \frac{1}{N} \sum_{j=1}^N \frac{1}{p(\text{data} \mid \theta^{(j)}, M)} \right]^{-1}$
is the harmonic mean estimator of $p(\text{data} \mid M)$.
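A corresponding R sketch, assuming logl.post holds the log-likelihood evaluated at each posterior draw (e.g. from the Gibbs sampler), again computed on the log scale via log-sum-exp:

log.harmonic = function(logl.post) {
  m = max(-logl.post)
  -(m + log(mean(exp(-logl.post - m))))  # = -log( (1/N) sum_j 1/p(data|theta_j) )
}

The harmonic mean estimator is known to be unstable (its variance can be infinite), which may help explain the disagreement between the two columns of the tables that follow.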
Comparing $M_0$, $M_1$ and $M_2$

Log predictive, $\log p(\text{data} \mid M_i)$:

  M_i   Monte Carlo   Harmonic Mean   Computation
  M_0   3508.33       3508.33         Closed form
  M_1   3509.47       3506.30         Gibbs sampler
  M_2   3509.71       3504.69         Gibbs sampler

Posterior model probability, $\Pr(M_i \mid \text{data})$ (in %):

  M_i   Monte Carlo   Harmonic Mean   Computation
  M_0   12.4          86.4            Closed form
  M_1   38.5          11.3            Gibbs sampler
  M_2   49.1           2.3            Gibbs sampler

$\log \hat{p}_{MC}(\text{data} \mid M_i) = 3508.34$ and $\log \hat{p}_{HM}(\text{data} \mid M_i) = 3509.8$.
$M_3$: GARCH(1,1) with $t$ errors - let us get more serious!
The GARCH(1,1) model with Student-$t$ innovations:
$r_t \sim t_\nu(0, \rho h_t)$
$h_t = \alpha_0 + \alpha_1 r_{t-1}^2 + \beta h_{t-1}$,
where $\alpha_0 > 0$, $\alpha_1 \ge 0$ and $\beta > 0$. We set the initial variance to $h_0 = 0$ for convenience. We let $\rho = (\nu-2)/\nu$ so that
$V(r_t \mid h_t) = \frac{\nu}{\nu-2}\, \rho h_t = h_t$.
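A minimal R sketch of simulating from this model (the parameter values are illustrative):

sim.garch.t = function(n, a0 = 1e-5, a1 = 0.10, beta = 0.85, nu = 8) {
  rho = (nu-2)/nu
  h = numeric(n); r = numeric(n)
  h[1] = a0                                # h_0 = 0, so h_1 = a0
  r[1] = sqrt(rho*h[1])*rt(1, df = nu)
  for (t in 2:n) {
    h[t] = a0 + a1*r[t-1]^2 + beta*h[t-1]  # GARCH(1,1) recursion
    r[t] = sqrt(rho*h[t])*rt(1, df = nu)   # r_t ~ t_nu(0, rho*h_t), so V(r_t|h_t) = h_t
  }
  list(r = r, h = h)
}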
Prior
Let $\psi = (\alpha, \beta, \nu)$ and $\alpha = (\alpha_0, \alpha_1)$. The prior distribution of $\psi$ is such that $p(\alpha, \beta, \nu) = p(\alpha)\, p(\beta)\, p(\nu)$, where
$\alpha \sim N_2(\mu_\alpha, \Sigma_\alpha)\, I_{(\alpha > 0)}$,
$\beta \sim N(\mu_\beta, \Sigma_\beta)\, I_{(\beta > 0)}$
and $p(\nu) = \lambda \exp\{-\lambda(\nu - \delta)\}\, I_{(\nu > \delta)}$,
for $\lambda > 0$ and $\delta \ge 2$, so that $E(\nu) = \delta + 1/\lambda$.
Normal case: $\lambda = 100$ and $\delta = 500$.
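A two-line R sketch of this translated exponential prior on $\nu$ (using the bayesGARCH defaults shown next):

lambda = 0.01; delta = 2
nu = delta + rexp(10000, rate = lambda)  # p(nu) = lambda exp{-lambda(nu-delta)} I(nu>delta)
mean(nu)                                 # approx. E(nu) = delta + 1/lambda = 102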
bayesGARCH
bayesGARCH: Bayesian Estimation of the GARCH(1,1) Model with Student-t Innovations

bayesGARCH(y, mu.alpha = c(0,0), Sigma.alpha = 1000*diag(1,2),
           mu.beta = 0, Sigma.beta = 1000,
           lambda = 0.01, delta = 2, control = list())

Paper: Ardia and Hoogerheide (2010) Bayesian Estimation of the GARCH(1,1) Model with Student-t Innovations. The R Journal, 2, 41-47.
http://cran.r-project.org/web/packages/bayesGARCH
Example of R script
Recall that r0 are the Petrobras returns for the first part of the data.

library(bayesGARCH)
M0 = 10000   # to be discarded (burn-in)
M  = 10000   # kept for posterior inference
niter = M0 + M
MCMC.initial = bayesGARCH(r0, mu.alpha = c(0,0), Sigma.alpha = 1000*diag(1,2),
                          mu.beta = 0, Sigma.beta = 1000, lambda = 0.01, delta = 2,
                          control = list(n.chain = 1, l.chain = niter, refresh = 100))
draws = MCMC.initial$chain1
range = (M0+1):niter
par(mfrow = c(2,2))
ts.plot(draws[range,1], xlab = "iterations", main = expression(alpha[0]), ylab = "")
ts.plot(draws[range,2], xlab = "iterations", main = expression(alpha[1]), ylab = "")
ts.plot(draws[range,3], xlab = "iterations", main = expression(beta), ylab = "")
ts.plot(draws[range,4], xlab = "iterations", main = expression(nu), ylab = "")
Model $M_3$: MCMC output.
[Figure: trace plots (iterations) and autocorrelation functions (Lag) for $\alpha_0$, $\alpha_1$, $\beta$ and $\nu$.]
Model $M_3$: marginal posterior distributions.
[Figure: histograms of the marginal posteriors of $\alpha_0$, $\alpha_1$, $\beta$ and $\nu$.]
Model $M_3$: $p(\alpha_1 + \beta \mid \text{data})$
$\Pr(\alpha_1 + \beta > 1 \mid \text{data}) = 0.0034$
[Figure: histogram of the posterior of $\alpha_1 + \beta$, concentrated between 0.92 and 1.02.]
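A sketch of how this probability could be computed from the (post burn-in) bayesGARCH draws of the previous slides:

persist = draws[range,2] + draws[range,3]  # draws of alpha1 + beta
mean(persist > 1)                          # estimate of Pr(alpha1 + beta > 1 | data)
hist(persist, main = expression(alpha[1] + beta))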
Model $M_3$: quantiles from $p(h_t^{1/2} \mid \text{data})$
Percentiles 2.5%, 50% and 97.5% of $p(h_t^{1/2} \mid \text{data})$. Black vertical lines: $r_t^2$.
[Figure: posterior quantile bands of the standard deviations $h_t^{1/2}$ over the roughly 1760 days of the evaluation sample.]
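A sketch of how such bands could be obtained: for each posterior draw, rebuild the variance recursion over the evaluation sample (r denotes those returns; thin the draws if this is slow):

n = length(r)
H = matrix(0, length(range), n)
for (j in seq_along(range)) {
  a0 = draws[range[j],1]; a1 = draws[range[j],2]; be = draws[range[j],3]
  h = numeric(n); h[1] = a0               # h_0 = 0 and r_0 treated as 0
  for (t in 2:n) h[t] = a0 + a1*r[t-1]^2 + be*h[t-1]
  H[j,] = sqrt(h)                         # draws of h_t^{1/2}
}
q = apply(H, 2, quantile, c(0.025, 0.5, 0.975))  # pointwise quantile bands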
Comparing all 4 models

Log predictive, $\log p(\text{data} \mid M_i)$:

  M_i   Monte Carlo   Harmonic Mean   MCMC Scheme
  M_0   3508.33       3508.33         Closed form
  M_1   3509.47       3506.30         Gibbs sampler
  M_2   3509.71       3504.69         Gibbs sampler
  M_3   --            3901.46         Metropolis algorithm
$M_4$: SV-AR(1) - for later in the course.
Model: the basic stochastic volatility model can be written as
$r_t \mid h_t \sim N(0, \exp\{h_t\})$
$h_t \mid h_{t-1} \sim N(\mu + \phi(h_{t-1} - \mu), \sigma^2)$
$h_0 \sim N(\mu, \sigma^2/(1-\phi^2))$,
where $\mu \in \mathbb{R}$, $\phi \in (-1, 1)$ and $\sigma^2 > 0$.
Prior: a possible prior distribution is
$\mu \sim N(b_\mu, B_\mu)$
$\frac{\phi+1}{2} \sim \text{Beta}(a_0, b_0)$
$\sigma^2 \sim G(1/2, 1/(2B_\sigma))$,
so that $E(\sigma^2) = B_\sigma$,
$E(\phi) = \frac{2a_0}{a_0+b_0} - 1$ and $V(\phi) = \frac{4a_0 b_0}{(a_0+b_0)^2(a_0+b_0+1)}$.
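A small R sketch checking the prior moments of $\phi$ by simulation ($a_0 = 5$ and $b_0 = 1.5$ match the priorphi default of svsample below):

a0 = 5; b0 = 1.5
phi = 2*rbeta(100000, a0, b0) - 1            # (phi+1)/2 ~ Beta(a0, b0)
c(mean(phi), 2*a0/(a0+b0) - 1)               # simulated vs. analytic E(phi)
c(var(phi), 4*a0*b0/((a0+b0)^2*(a0+b0+1)))   # simulated vs. analytic V(phi)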
The SV-AR(1) model falls into the more general class of dynamic models (also known as state-space models). The usual estimation technique is the Kalman filter, which cannot be directly implemented here because of the nonlinearity of the observation equation. We will introduce (much later) the well-known forward filtering, backward sampling (FFBS) scheme, which makes posterior inference in the SV-AR(1) model straightforward.
stochvol
stochvol: Efficient Bayesian Inference for SV Models

svsample(y, draws = 10000, burnin = 1000, priormu = c(-10,3),
         priorphi = c(5,1.5), priorsigma = 1, thinpara = 1,
         thinlatent = 1, thintime = 1, quiet = FALSE,
         startpara, startlatent, expert, ...)

Example:
library(stochvol)
# Simulate a highly persistent SV process
sim = svsim(500, mu = -10, phi = 0.99, sigma = 0.2)
# Obtain 5000 draws from the sampler
draws = svsample(sim$y, draws = 5000, burnin = 100,
                 priormu = c(-10,1), priorphi = c(20,1.2), priorsigma = 0.2)

Paper: Kastner and Frühwirth-Schnatter (2013) Ancillarity-sufficiency interweaving strategy for boosting MCMC estimation of stochastic volatility models. Computational Statistics & Data Analysis, http://dx.doi.org/10.1016/j.csda.2013.01.002.
Model $M_4$: MCMC output.
[Figure: trace plots (iterations), autocorrelation functions (Lag) and histograms for $\mu$, $\phi$ and $\sigma^2$.]
Model $M_4$: marginal prior and posterior distributions.
[Figure: prior and posterior densities of $\mu$, $\phi$ and $\sigma^2$.]
Model $M_4$: quantiles from $p(\exp\{h_t/2\} \mid \text{data})$
Percentiles 2.5%, 50% and 97.5% of $p(\exp\{h_t/2\} \mid \text{data})$. Black vertical lines: $r_t^2$.
[Figure: posterior quantile bands of the standard deviations $\exp\{h_t/2\}$ over the roughly 1760 days of the evaluation sample.]
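Assuming the evaluation-sample returns r were passed to svsample and that the latent log-volatility draws sit in the $latent component of the result (an assumption; check the svdraws documentation), such bands could be sketched as:

fit = svsample(r, draws = 10000, burnin = 1000)
vol = exp(fit$latent/2)                           # draws of exp{h_t/2}
q = apply(vol, 2, quantile, c(0.025, 0.5, 0.975))
ts.plot(t(q), col = c("gray", "black", "gray"),
        xlab = "Days", ylab = "Standard deviation")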