Bayesian Model Comparison: Modeling Petrobrás log-returns
Hedibert Freitas Lopes, February 2014

Log price: y_t = log p_t. Time span: 12/29/2000 - 12/31/2013 (n = 3268 days).
[Figure: time series plot of the log price; y-axis LOG PRICE, x-axis DAYS]

Scatterplot of y_{t-1} versus y_t
[Figure: scatterplot; x-axis y_{t-1}, y-axis y_t]

Log return: r_t = y_t - y_{t-1} = log(p_t / p_{t-1}). Time span: 01/02/2001 - 12/31/2013 (n = 3267 days).
[Figure: time series plot of the log return; y-axis LOG RETURN, x-axis DAYS]

Histogram of r_t
[Figure: histogram of the log returns; x-axis LOG RETURN]

Models
Model M0: r_1, ..., r_n iid N(0, σ²)
Model M1: r_1, ..., r_n iid N(µ, σ²)
Model M2: y_t | y_{t-1} ~ N(α + β y_{t-1}, σ²), for t = 2, ..., n, with α, β ∈ R and σ² > 0.

Prior
First n_0 = 1506 days (2001-2006): prior knowledge. Let ỹ_t and r̃_t be the log prices and log returns in the training sample, and let m_r and V_r be the sample mean and sample variance of the r̃_t's.
Last n = 1760 days (2007-2013): inference and model comparison.
Model M0: σ² ~ IG(5, 4 m_{r²}), where m_{r²} is the training-sample mean of the r̃_t².
Model M1: µ ~ N(m_r, 100 V_r/(n_0 - 1)) and σ² ~ IG(5, V_r).
Model M2: α ~ N(a, 100 V_a), β ~ N(b, 100 V_b), σ² ~ IG(5, 4 s²),
where (a, b, s²) are the MLEs of (α, β, σ²) based on the training sample, and V_a and V_b are the MLE variances of the estimators a and b, respectively.
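A minimal R sketch (not from the original slides) of how these training-sample quantities could be computed; the names y0 (training log prices) and r0 (training log returns, used again later) are assumptions:

## Sketch: prior hyperparameters from the training sample (2001-2006).
n0  <- length(r0)
m.r <- mean(r0)                          # sample mean of training returns
V.r <- var(r0)                           # sample variance of training returns

## MLE of (alpha, beta, sigma2) in y_t = alpha + beta*y_{t-1} + e_t
fit  <- lm(y0[-1] ~ y0[-length(y0)])
a    <- coef(fit)[1]                     # MLE of alpha
b    <- coef(fit)[2]                     # MLE of beta
s2   <- mean(residuals(fit)^2)           # MLE of sigma2
V.ab <- diag(vcov(fit))                  # variances of the estimators a and b
V.a  <- V.ab[1]; V.b <- V.ab[2]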

Model M0: Prior, posterior, MLE
[Figure: prior, posterior and MLE for σ²]

Model M1: Gibbs sampler output
[Figure: trace plots, autocorrelation functions and histograms of the draws of µ and σ²]
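For reference, a minimal sketch of a Gibbs sampler for M1 consistent with the prior above; the conjugate full conditionals are standard, but the function name and starting values are mine, not the author's code (m.r, V.r, n0 as in the earlier sketch):

## Sketch: Gibbs sampler for M1 with mu ~ N(m.r, 100*V.r/(n0-1)) and sig2 ~ IG(5, V.r).
gibbs.M1 <- function(r, m.r, V.r, n0, niter = 10000) {
  n    <- length(r)
  tau2 <- 100 * V.r / (n0 - 1)                    # prior variance of mu
  mu   <- mean(r); sig2 <- var(r)                 # starting values
  out  <- matrix(0, niter, 2, dimnames = list(NULL, c("mu", "sig2")))
  for (i in 1:niter) {
    prec <- 1 / tau2 + n / sig2                   # mu | sig2, r is normal
    mu   <- rnorm(1, (m.r / tau2 + sum(r) / sig2) / prec, sqrt(1 / prec))
    sig2 <- 1 / rgamma(1, shape = 5 + n / 2,      # sig2 | mu, r is inverse gamma
                       rate  = V.r + sum((r - mu)^2) / 2)
    out[i, ] <- c(mu, sig2)
  }
  out
}
post <- gibbs.M1(r, m.r, V.r, n0)                 # r = inference-sample returns (name assumed)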

Model M1: Prior, posterior, MLE
[Figure: prior, posterior and MLE for µ and σ²]

Model M2: Gibbs sampler output
[Figure: trace plots, autocorrelation functions and histograms of the draws of α, β and σ²]

Model M2: Prior, posterior, MLE
[Figure: prior, posterior and MLE for α, β and σ²]

Comparing the models
Recall that the posterior of θ_i under model M_i is
p(θ_i | data, M_i) ∝ p(θ_i | M_i) p(data | θ_i, M_i),
while the marginal likelihood under model M_i is
p(data | M_i) = ∫ p(data | θ_i, M_i) p(θ_i | M_i) dθ_i.
Therefore, the posterior model probability of M_i is
Pr(M_i | data) ∝ Pr(M_i) p(data | M_i),
and the posterior odds of model M_i against model M_j are
Pr(M_i | data) / Pr(M_j | data) = [Pr(M_i) / Pr(M_j)] × B_ij,
where B_ij = p(data | M_i) / p(data | M_j) is the Bayes factor.
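A small R sketch (mine, not the slides') of how posterior model probabilities and Bayes factors follow from log marginal likelihoods, assuming equal prior model probabilities:

## Sketch: posterior model probabilities from log marginal likelihoods (equal priors).
post.prob <- function(logml) {
  w <- exp(logml - max(logml))          # subtract the max for numerical stability
  w / sum(w)
}
bayes.factor <- function(logml.i, logml.j) exp(logml.i - logml.j)

## e.g. with the Monte Carlo estimates reported below for M0, M1, M2:
round(100 * post.prob(c(3508.33, 3509.47, 3509.71)), 1)
## close to the 12.4 / 38.5 / 49.1 percentages in the table (differences due to rounding)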

Monte Carlo integration
A major (computational) difficulty is calculating the marginal likelihood (or predictive) under model M, i.e. p(data | M).
The simplest estimator of p(data | M) is obtained by Monte Carlo integration. More specifically, when {θ^(1), ..., θ^(N)} is a sample from the prior p(θ | M), then (for large N)
p̂_N(data | M) = (1/N) Σ_{j=1}^{N} p(data | θ^(j), M)
is the simple Monte Carlo estimator of p(data | M).
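A minimal sketch of this estimator for model M1, drawing (µ, σ²) from the prior given earlier; working on the log scale is an implementation choice (not from the slides) that avoids numerical underflow. The name r for the inference-sample returns is assumed:

## Sketch: simple Monte Carlo estimate of log p(data | M1), theta drawn from the prior.
set.seed(1)
N    <- 100000
mu   <- rnorm(N, m.r, sqrt(100 * V.r / (n0 - 1)))
sig2 <- 1 / rgamma(N, shape = 5, rate = V.r)      # sigma2 ~ IG(5, V.r)
ll   <- sapply(1:N, function(j) sum(dnorm(r, mu[j], sqrt(sig2[j]), log = TRUE)))
## log of (1/N) * sum_j exp(ll_j), computed stably on the log scale
logml.mc <- max(ll) + log(mean(exp(ll - max(ll))))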

Harmonic mean identity (and estimator)
An alternative way of calculating p(data | M) is via the harmonic mean identity:
1 / p(data | M) = ∫ [1 / p(data | θ, M)] p(θ | data, M) dθ.
Monte Carlo integration: when {θ^(1), ..., θ^(N)} is a sample from the posterior p(θ | data, M), then (for large N)
p̂_N(data | M) = [ (1/N) Σ_{j=1}^{N} 1 / p(data | θ^(j), M) ]^{-1}
is the harmonic mean estimator of p(data | M).
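And the corresponding sketch of the harmonic mean estimator, assuming posterior draws mu.post and sig2.post (names assumed, e.g. the columns of the gibbs.M1 output above):

## Sketch: harmonic mean estimate of log p(data | M1) from posterior draws.
ll.post <- sapply(seq_along(mu.post),
                  function(j) sum(dnorm(r, mu.post[j], sqrt(sig2.post[j]), log = TRUE)))
## log of [ (1/N) * sum_j exp(-ll_j) ]^(-1), again on the log scale
neg <- -ll.post
logml.hm <- -(max(neg) + log(mean(exp(neg - max(neg)))))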

Comparing M0, M1 and M2

Log predictive, log p(data | M_i):
M_i    Monte Carlo    Harmonic Mean    Computation
M0     3508.33        3508.33          Closed form
M1     3509.47        3506.30          Gibbs sampler
M2     3509.71        3504.69          Gibbs sampler

Posterior model probability, Pr(M_i | data), in %:
M_i    Monte Carlo    Harmonic Mean    Computation
M0     12.4           86.4             Closed form
M1     38.5           11.3             Gibbs sampler
M2     49.1           2.3              Gibbs sampler

log p_MC(data | M_i) = 3508.34    log p_HM(data | M_i) = 3509.8

M3: GARCH(1,1) with t errors. Let us get more serious!
The GARCH(1,1) model with Student-t innovations:
r_t | h_t ~ t_ν(0, ρ h_t)
h_t = α_0 + α_1 r_{t-1}² + β h_{t-1},
where α_0 > 0, α_1 ≥ 0 and β ≥ 0. We set the initial variance to h_0 = 0 for convenience.
We let ρ = (ν - 2)/ν so that V(r_t | h_t) = [ν/(ν - 2)] ρ h_t = h_t.
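To make the recursion concrete, a small simulation sketch of this GARCH(1,1)-t process; the parameter values are illustrative, not estimates:

## Sketch: simulate r_t from the GARCH(1,1) model with Student-t innovations.
sim.garch.t <- function(n, alpha0, alpha1, beta, nu, h0 = 0) {
  rho <- (nu - 2) / nu
  r <- h <- numeric(n)
  h.prev <- h0; r.prev <- 0
  for (t in 1:n) {
    h[t] <- alpha0 + alpha1 * r.prev^2 + beta * h.prev
    r[t] <- sqrt(rho * h[t]) * rt(1, df = nu)   # t_nu(0, rho*h_t), so V(r_t | h_t) = h_t
    h.prev <- h[t]; r.prev <- r[t]
  }
  list(r = r, h = h)
}
sim <- sim.garch.t(1000, alpha0 = 2e-5, alpha1 = 0.10, beta = 0.85, nu = 8)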

Prior
Let ψ = (α, β, ν) and α = (α_0, α_1). The prior distribution of ψ is such that p(α, β, ν) = p(α) p(β) p(ν), where
α ~ N_2(µ_α, Σ_α) I(α > 0)
β ~ N(µ_β, Σ_β) I(β > 0)
and
p(ν) = λ exp{-λ(ν - δ)} I(ν > δ)
for λ > 0 and δ ≥ 2, so that E(ν) = δ + 1/λ.
Normal case: λ = 100 and δ = 500.
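A quick sketch of this translated exponential prior for ν (function names are mine); it checks E(ν) = δ + 1/λ for the two settings mentioned in the slides:

## Sketch: translated exponential prior, p(nu) = lambda*exp(-lambda*(nu - delta)) on nu > delta.
dnu <- function(nu, lambda, delta) ifelse(nu > delta, lambda * exp(-lambda * (nu - delta)), 0)
rnu <- function(N, lambda, delta) delta + rexp(N, rate = lambda)

mean(rnu(1e5, lambda = 0.01, delta = 2))    # approx E(nu) = delta + 1/lambda = 102 (default)
mean(rnu(1e5, lambda = 100, delta = 500))   # "normal case": nu concentrated near 500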

bayesGARCH
bayesGARCH: Bayesian Estimation of the GARCH(1,1) Model with Student-t Innovations

bayesGARCH(r, mu.alpha = c(0,0), Sigma.alpha = 1000*diag(1,2),
           mu.beta = 0, Sigma.beta = 1000,
           lambda = 0.01, delta = 2, control = list())

Paper: Ardia and Hoogerheide (2010) Bayesian Estimation of the GARCH(1,1) Model with Student-t Innovations. The R Journal, 2, 41-47.
http://cran.r-project.org/web/packages/bayesGARCH

Example of R script
Recall that r0 are the Petrobras returns for the first part of the data.

M0 = 10000        # to be discarded (burn-in)
M  = 10000        # kept for posterior inference
niter = M0 + M
MCMC.initial = bayesGARCH(r0, mu.alpha = c(0,0), Sigma.alpha = 1000*diag(1,2),
                          mu.beta = 0, Sigma.beta = 1000, lambda = 0.01, delta = 2,
                          control = list(n.chain = 1, l.chain = niter, refresh = 100))
draws = MCMC.initial$chain1
range = (M0+1):niter
par(mfrow = c(2,2))
ts.plot(draws[range,1], xlab = "iterations", main = expression(alpha[0]), ylab = "")
ts.plot(draws[range,2], xlab = "iterations", main = expression(alpha[1]), ylab = "")
ts.plot(draws[range,3], xlab = "iterations", main = expression(beta), ylab = "")
ts.plot(draws[range,4], xlab = "iterations", main = expression(nu), ylab = "")

Model M3: MCMC output
[Figure: trace plots and autocorrelation functions of the draws of α_0, α_1, β and ν]

Model M3: Marginal posterior distributions
[Figure: histograms of the posterior draws of α_0, α_1, β and ν]

Model M3: p(α_1 + β | data)
Pr(α_1 + β > 1 | data) = 0.0034
[Figure: histogram of the posterior draws of α_1 + β]
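Given bayesGARCH draws such as those produced by the script above, this probability is just a Monte Carlo average over the retained draws (a sketch, assuming the column order α_0, α_1, β, ν):

## Sketch: P(alpha1 + beta > 1 | data) and draws of the persistence alpha1 + beta.
persist <- draws[range, 2] + draws[range, 3]   # columns 2 and 3: alpha1 and beta
mean(persist > 1)                              # Monte Carlo estimate of the probability
hist(persist, breaks = 50, main = expression(alpha[1] + beta), xlab = "")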

Model M3: Quantiles from p(h_t^{1/2} | data)
Percentiles 2.5%, 50% and 97.5% of p(h_t^{1/2} | data). Black vertical lines: r_t².
[Figure: posterior bands of the standard deviation h_t^{1/2} over the inference sample; y-axis Standard deviations, x-axis Days]
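A sketch of how such bands could be computed: run the variance recursion over the inference-sample returns r for each retained draw, then take pointwise quantiles (names r, draws, range as above; this construction is my assumption, not the author's code):

## Sketch: pointwise 2.5%, 50%, 97.5% quantiles of h_t^{1/2} across posterior draws.
n    <- length(r)
keep <- draws[range, ]                         # retained draws: alpha0, alpha1, beta, nu
sdev <- matrix(0, nrow(keep), n)
for (j in 1:nrow(keep)) {
  a0 <- keep[j, 1]; a1 <- keep[j, 2]; b <- keep[j, 3]
  h <- numeric(n); h.prev <- 0; r.prev <- 0
  for (t in 1:n) {
    h[t] <- a0 + a1 * r.prev^2 + b * h.prev    # same recursion as in the model, h0 = 0
    h.prev <- h[t]; r.prev <- r[t]
  }
  sdev[j, ] <- sqrt(h)
}
q <- apply(sdev, 2, quantile, probs = c(0.025, 0.5, 0.975))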

Comparing all 4 models

Model   log p(data | M_i)                  MCMC scheme
        Monte Carlo    Harmonic Mean
M0      3508.33        3508.33             Closed form
M1      3509.47        3506.30             Gibbs sampler
M2      3509.71        3504.69             Gibbs sampler
M3      —              3901.46             Metropolis algorithm

M4: SV-AR(1). For later in the course.
Model: The basic stochastic volatility model can be written as
r_t | h_t ~ N(0, exp{h_t})
h_t | h_{t-1} ~ N(µ + φ(h_{t-1} - µ), σ²)
h_0 ~ N(µ, σ²/(1 - φ²)),
where µ ∈ R, φ ∈ (-1, 1) and σ² > 0.
Prior: A possible prior distribution is
µ ~ N(b_µ, B_µ)
(φ + 1)/2 ~ Beta(a_0, b_0), with E(φ) = 2a_0/(a_0 + b_0) - 1 and V(φ) = 4a_0 b_0 / [(a_0 + b_0)²(a_0 + b_0 + 1)]
σ² ~ G(1/2, 1/(2B_σ)), with E(σ²) = B_σ.
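A small sketch (mine) of drawing from this prior, checking the stated moments; the hyperparameter values are illustrative only:

## Sketch: draws from the SV-AR(1) prior.
b.mu <- -10; B.mu <- 9            # mu ~ N(b.mu, B.mu)
a0 <- 20; b0 <- 1.5               # (phi + 1)/2 ~ Beta(a0, b0)
B.sigma <- 0.1                    # sigma2 ~ G(1/2, 1/(2*B.sigma)), E(sigma2) = B.sigma

mu   <- rnorm(1e5, b.mu, sqrt(B.mu))
phi  <- 2 * rbeta(1e5, a0, b0) - 1
sig2 <- rgamma(1e5, shape = 0.5, rate = 1 / (2 * B.sigma))
c(mean(phi), 2 * a0 / (a0 + b0) - 1)   # both approx 0.86
c(mean(sig2), B.sigma)                 # both approx 0.1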

The SV-AR(1) model falls into the more general class of dynamic models (also known as state-space models). The usual estimation technique is the Kalman filter, which cannot be implemented directly here because the observation equation is nonlinear. We will introduce (much later) the well-known forward filtering, backward sampling (FFBS) scheme, which makes posterior inference in the SV-AR(1) model straightforward.

stochvol
stochvol: Efficient Bayesian Inference for SV Models

svsample(y, draws = 10000, burnin = 1000, priormu = c(-10, 3),
         priorphi = c(5, 1.5), priorsigma = 1, thinpara = 1,
         thinlatent = 1, thintime = 1, quiet = FALSE,
         startpara, startlatent, expert, ...)

Example:
# Simulate a highly persistent SV process
# Obtain 5000 draws from the sampler
sim   = svsim(500, mu = -10, phi = 0.99, sigma = 0.2)
draws = svsample(sim$y, draws = 5000, burnin = 100,
                 priormu = c(-10, 1), priorphi = c(20, 1.2), priorsigma = 0.2)

Paper: Kastner and Frühwirth-Schnatter (2013) Ancillarity-sufficiency interweaving strategy for boosting MCMC estimation of stochastic volatility models. CSDA, http://dx.doi.org/10.1016/j.csda.2013.01.002.

Model M4: MCMC output
[Figure: trace plots, autocorrelation functions and histograms of the draws of µ, φ and σ²]

Model M4: Marginal prior and posterior distributions
[Figure: prior and posterior densities of µ, φ and σ²]

Model M4: Quantiles from p(exp{h_t/2} | data)
Percentiles 2.5%, 50% and 97.5% of p(exp{h_t/2} | data). Black vertical lines: r_t².
[Figure: posterior bands of the standard deviation exp{h_t/2} over the inference sample; y-axis Standard deviation, x-axis Days]
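A sketch of how these bands can be read off a stochvol fit like the one above: svsample stores the latent log-volatility draws, so quantiles of exp{h_t/2} follow directly (the $latent component is assumed here; details may vary across package versions):

## Sketch: pointwise quantiles of exp(h_t/2) from the stochvol draws.
vol <- exp(draws$latent / 2)                               # one column per time point
q   <- apply(vol, 2, quantile, probs = c(0.025, 0.5, 0.975))
matplot(t(q), type = "l", lty = c(2, 1, 2), col = 1,
        ylab = "Standard deviation", xlab = "Days")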