Likelihood-free MCMC


Bayesian inference for stable distributions with applications in finance
Department of Mathematics, University of Leicester
MSc project final presentation, September 2, 2011


Classical Monte Carlo integration

Bayes formula: f(θ|D) ∝ π(θ) P(D|θ), where f(θ|D) is the posterior, π(θ) the prior and P(D|θ) the likelihood.

Evaluating integrals:
Normalisation: Z = ∫ π(θ) P(D|θ) dθ
Marginalisation: f(θ|D) = ∫ f(θ, x|D) dx
Expectation: E_f[h(θ)] = ∫ h(θ) f(θ|D) dθ

Suppose we can draw samples θ^(j) ~ f(θ|D), j = 1, ..., m. Then E_f[h(θ)] ≈ (1/m) Σ_{j=1}^m h(θ^(j)).
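As a minimal illustration of the sample-average estimator, consider a hypothetical Beta posterior (the distribution and h are illustrative choices, not from the talk), where the exact expectation is available for comparison:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior: theta | D ~ Beta(3, 5); estimate E[h(theta)] with h(theta) = theta^2.
m = 100_000
theta = rng.beta(3.0, 5.0, size=m)      # draws theta^(j) ~ f(theta | D)

mc_estimate = np.mean(theta**2)         # (1/m) * sum_j h(theta^(j))
exact = (3 * 4) / (8 * 9)               # E[theta^2] for Beta(a,b) = a(a+1)/((a+b)(a+b+1)) = 1/6
print(mc_estimate, exact)
```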

Rejection sampling
1. Sample x^(i) ~ g(x)
2. Accept x^(i) with probability f(x^(i)) / (M g(x^(i))); then go to 1.
Here M is a constant chosen so that f(x) ≤ M g(x) for all x.
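A minimal sketch of the two steps above, with an illustrative target (Beta(2, 5)) and a uniform proposal; the envelope constant M is chosen by bounding the target density:

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # Target density (illustrative): Beta(2, 5), f(x) = 30 x (1 - x)^4 on [0, 1]
    return 30.0 * x * (1.0 - x) ** 4

M = 2.5  # envelope: f(x) <= M * g(x) with g = Uniform(0, 1) (max of f is ~2.458)

def rejection_sample(n):
    samples = []
    while len(samples) < n:
        x = rng.uniform()                 # step 1: draw from proposal g
        if rng.uniform() < f(x) / M:      # step 2: accept w.p. f(x) / (M g(x))
            samples.append(x)
    return np.array(samples)

draws = rejection_sample(50_000)
print(draws.mean())   # Beta(2, 5) has mean 2/7
```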

Importance sampling

A different way to view E_f[h(θ)]:

E_f[h(θ)] = ∫ h(θ) [f(θ)/g(θ)] g(θ) dθ = E_g[(f(θ)/g(θ)) h(θ)] ≈ (1/m) Σ_{j=1}^m [f(θ^(j))/g(θ^(j))] h(θ^(j))   (1)

for θ^(j) drawn from g(θ).

Importance sampling does not throw away samples; it assigns them different weights (importances) f(θ^(j)|D)/g(θ^(j)).

Importance sampling

In the Bayesian context, with the normalising constant unknown:

E_f[h(θ)] = E_g[(f(θ|D)/g(θ)) h(θ)] = (1/P(D)) E_g[(π(θ) P(D|θ)/g(θ)) h(θ)]
         ≈ [Σ_{j=1}^m π(θ^(j)) P(D|θ^(j)) h(θ^(j))/g(θ^(j))] / [Σ_{j=1}^m π(θ^(j)) P(D|θ^(j))/g(θ^(j))]

where θ^(j) ~ g(θ).

This self-normalised form can also be used in the general setting, i.e. use [Σ_{j=1}^m h(θ^(j)) f(θ^(j))/g(θ^(j))] / [Σ_{j=1}^m f(θ^(j))/g(θ^(j))] as an alternative to (1), with an improvement in variance.
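The self-normalised estimator above can be sketched on a hypothetical conjugate model (normal data, normal prior — illustrative, not from the talk), so the exact posterior mean is available for comparison:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: y_i ~ N(theta, 1), prior theta ~ N(0, 1);
# exact posterior mean = sum(y) / (n + 1).
y = np.array([1.2, 0.8, 1.5, 1.0])
exact_mean = y.sum() / (len(y) + 1)

m = 200_000
theta = rng.normal(0.0, 3.0, size=m)          # proposal g = N(0, 3^2)

log_prior = -0.5 * theta**2
log_lik = -0.5 * ((y[:, None] - theta) ** 2).sum(axis=0)
log_g = -0.5 * (theta / 3.0) ** 2 - np.log(3.0)

log_w = log_prior + log_lik - log_g           # unnormalised weights pi * P / g
w = np.exp(log_w - log_w.max())               # stabilise before normalising
is_mean = (w * theta).sum() / w.sum()         # self-normalised IS estimate of E[theta | D]
print(is_mean, exact_mean)
```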

What if likelihoods are unavailable?

Approximating the posterior while avoiding likelihood evaluation is known as approximate Bayesian computation (ABC).

Some early literature:
LF-RS:
Tavaré et al. (1997), Inferring coalescence times from DNA sequence data — replacing the full dataset with summary statistics.
Fu and Li (1997), Estimating the age of the common ancestor of a sample of DNA sequences — simulating a new dataset and comparing it with the observed one.
LF-MCMC:
Marjoram et al. (2003), Markov chain Monte Carlo without likelihoods — an MCMC approach generalised from LF-RS.

Likelihood-free rejection sampling

The idea can be seen in the following algorithm.

LF-RS
1. Simulate from the prior: θ' ~ π
2. Generate D' under the model with parameter θ'
3. Accept θ' if D' = D; go to 1

D: observed dataset; D': simulated dataset.

In practice one replaces D and D' with corresponding summary statistics S and S'. The acceptance condition can then be relaxed to ρ(S, S') ≤ ε for some distance measure ρ (e.g. Euclidean). This results in an approximate posterior f(θ | ρ(S, S') ≤ ε).

LF-RS example

Example. Suppose y_1, y_2, ..., y_n are observations from Exp(θ) with density f(y|θ) = θ e^{-θy}, y > 0. The prior for θ is the conjugate gamma distribution θ ~ Gamma(α, β); the posterior is then gamma with updated parameters, θ|D ~ Gamma(n + α, β/(β Σ y_i + 1)). Let α = 3, β = 1, take n = 5 observations from Exp(2), and choose the sample mean ȳ as a sufficient statistic. We simulate the posterior distribution using the LF-RS algorithm with ε = 1 and ε = 0.1.
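A sketch of LF-RS on this example (the tolerance and number of draws are illustrative choices), with the exact conjugate posterior mean printed for comparison:

```python
import numpy as np

rng = np.random.default_rng(3)

# Setup from the example: y_i ~ Exp(theta), prior theta ~ Gamma(alpha, scale=beta);
# summary statistic = sample mean; accept if |ybar_sim - ybar_obs| <= eps.
alpha, beta_scale, n = 3.0, 1.0, 5
y_obs = rng.exponential(1.0 / 2.0, size=n)              # n observations from Exp(2)
s_obs = y_obs.mean()

def lf_rs(eps, n_draws):
    accepted = []
    while len(accepted) < n_draws:
        theta = rng.gamma(alpha, beta_scale)            # 1. simulate from the prior
        y_sim = rng.exponential(1.0 / theta, size=n)    # 2. generate D' under the model
        if abs(y_sim.mean() - s_obs) <= eps:            # 3. accept if summaries are close
            accepted.append(theta)
    return np.array(accepted)

post = lf_rs(eps=0.1, n_draws=2000)
# Exact posterior: Gamma(n + alpha, scale = beta / (beta * sum(y) + 1))
exact_mean = (n + alpha) * beta_scale / (beta_scale * y_obs.sum() + 1)
print(post.mean(), exact_mean)
```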

LF-RS example: simulation results

[Figure: posterior approximations for ε = 1 (left) and ε = 0.1 (right).]


Markov chain Monte Carlo (MCMC)

About MCMC:
Algorithms that realise a Markov chain.
We want the invariant distribution of the chain to be our target distribution.
Samples can be taken as drawn from the target distribution after running the chain for a long time.

MCMC history:
Metropolis et al. (1953). Equation of state calculations by fast computing machines. J. Chem. Phys. 21, 1087-1092.
Hastings, W. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97-109.
Gelfand, A. E. and Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. J. Amer. Statist. Assoc. 85, 398-409.

Constructing MCMC algorithms

The ergodic theorem guarantees convergence. From Markov chain theory, global balance means (in the discrete setting)

f P = f   (2)

where f is the invariant distribution and P is the transition matrix with elements P_ij = P(x_{t+1} = j | x_t = i) =: P(i → j); each row of P sums to one.

Detailed balance: P(x → x') f(x) = P(x' → x) f(x'). Summing both sides over x yields (2).

Metropolis-Hastings algorithm

Metropolis-Hastings
1. If now at θ, propose a move to θ' according to a proposal distribution q(θ'|θ)
2. Accept θ' with probability A(θ, θ') = min{1, [f(θ') q(θ|θ')] / [f(θ) q(θ'|θ)]}
3. Go to 1 until the desired number of iterations
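A minimal random-walk sketch of the algorithm (the target density and step size are illustrative; with a symmetric proposal, the q terms in the acceptance ratio cancel):

```python
import numpy as np

rng = np.random.default_rng(4)

def log_f(theta):
    # Illustrative target: standard normal (up to a constant)
    return -0.5 * theta**2

def metropolis_hastings(n_iter, step=1.0, theta0=0.0):
    theta = theta0
    chain = np.empty(n_iter)
    for t in range(n_iter):
        prop = theta + step * rng.normal()      # 1. symmetric random-walk proposal
        # 2. accept w.p. min{1, f(prop)/f(theta)} (q cancels by symmetry)
        if np.log(rng.uniform()) < log_f(prop) - log_f(theta):
            theta = prop
        chain[t] = theta                        # on reject, the chain stays put
    return chain

chain = metropolis_hastings(50_000)
print(chain[5000:].mean(), chain[5000:].std())  # should be near 0 and 1 for N(0, 1)
```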

Theorem. The invariant distribution of the chain is f(θ).

Proof. We show that detailed balance is satisfied. For θ' ≠ θ, the M-H transition probability is

P(θ → θ') = q(θ'|θ) A(θ, θ').

Choose (w.l.o.g.)

f(θ') q(θ|θ') / [f(θ) q(θ'|θ)] ≤ 1,

so that A(θ, θ') equals this ratio while A(θ', θ) = 1. Then

P(θ → θ') f(θ) = q(θ'|θ) · [f(θ') q(θ|θ') / (f(θ) q(θ'|θ))] · f(θ) = f(θ') q(θ|θ') = f(θ') q(θ|θ') A(θ', θ) = P(θ' → θ) f(θ'). ∎

Marjoram et al. (2003) proposed an MCMC method without likelihood evaluation.

LF-MCMC
1. If now at θ, propose a move to θ' according to a proposal distribution q(θ'|θ)
2. Generate D' under the model with θ'
3. If D' = D, go to 4; otherwise return to 1
4. Accept θ' with probability A(θ, θ') = min{1, [π(θ') q(θ|θ')] / [π(θ) q(θ'|θ)]}, then go to 1

One can prove that the invariant distribution is f(θ|D). Approximate posterior: replace the condition D' = D with ρ(S, S') ≤ ε.
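A sketch of LF-MCMC on the earlier exponential example (the tolerance, step size and iteration count are illustrative choices; with a symmetric Gaussian proposal, step 4 reduces to an MH step on the prior alone):

```python
import numpy as np

rng = np.random.default_rng(5)

# Exponential example: y_i ~ Exp(theta), prior theta ~ Gamma(alpha, scale=beta),
# summary statistic = sample mean.
alpha, beta_scale, n = 3.0, 1.0, 5
y_obs = rng.exponential(1.0 / 2.0, size=n)      # data from Exp(2)
s_obs = y_obs.mean()

def log_prior(theta):
    return (alpha - 1) * np.log(theta) - theta / beta_scale if theta > 0 else -np.inf

def lf_mcmc(n_iter, eps=0.2, step=0.5, theta0=2.0):
    theta, chain = theta0, []
    for _ in range(n_iter):
        prop = theta + step * rng.normal()                # 1. propose
        if prop > 0:
            y_sim = rng.exponential(1.0 / prop, size=n)   # 2. simulate D'
            if abs(y_sim.mean() - s_obs) <= eps:          # 3. ABC check rho(S, S') <= eps
                # 4. MH step on the prior (symmetric proposal, q cancels)
                if np.log(rng.uniform()) < log_prior(prop) - log_prior(theta):
                    theta = prop
        chain.append(theta)                               # chain stays put on rejection
    return np.array(chain)

chain = lf_mcmc(30_000)
exact_mean = (n + alpha) * beta_scale / (beta_scale * y_obs.sum() + 1)
print(chain[5000:].mean(), exact_mean)
```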

Stable distributions

No closed-form densities in general.
Four parameters: α ∈ (0, 2] determines the tail behaviour, β ∈ [-1, 1] the skewness, γ > 0 the scale and δ ∈ R the location.
Special cases: Cauchy (α = 1, β = 0), Normal (α = 2, β = 0), Lévy (α = 1/2, β = 1).
Infinite variance (except α = 2); the mean exists only if 1 < α ≤ 2.
Generalized CLT: stable laws are the possible limits of normalised sums of i.i.d. random variables.
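Simulating stable variates is the model-specific ingredient LF-MCMC needs here. A sketch of the Chambers-Mallows-Stuck sampler (restricted to α ≠ 1 for simplicity; the sanity check uses the fact that the α = 2, β = 0 case reduces to Normal(δ, 2γ²)):

```python
import numpy as np

rng = np.random.default_rng(6)

def stable_rvs(alpha, beta, gamma, delta, size):
    """Chambers-Mallows-Stuck sampler for stable variates (alpha != 1)."""
    U = rng.uniform(-np.pi / 2, np.pi / 2, size)   # uniform angle
    W = rng.exponential(1.0, size)                 # unit exponential
    B = np.arctan(beta * np.tan(np.pi * alpha / 2)) / alpha
    S = (1 + beta**2 * np.tan(np.pi * alpha / 2) ** 2) ** (1 / (2 * alpha))
    X = (S * np.sin(alpha * (U + B)) / np.cos(U) ** (1 / alpha)
         * (np.cos(U - alpha * (U + B)) / W) ** ((1 - alpha) / alpha))
    return gamma * X + delta                       # scale and shift

# Sanity check: alpha = 2, beta = 0 gives Normal(delta, 2 * gamma^2)
z = stable_rvs(2.0, 0.0, 1.0, 0.0, 200_000)
print(z.mean(), z.std())   # near 0 and sqrt(2)
```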

Some literature: Bayesian inference for stable models

Buckle, D.J. (1995). Bayesian inference for stable distributions. Journal of the American Statistical Association 90, 605-613. [Auxiliary variable Gibbs sampler]
Lombardi, M.J. (2007). Bayesian inference for α-stable distributions: a random walk MCMC approach. Computational Statistics & Data Analysis 51, 2688-2700. [Evaluates the likelihood via inverse Fourier transform combined with a series expansion]
Peters, G.W., Sisson, S.A., Fan, Y. (2010). Likelihood-free Bayesian inference for α-stable models. Computational Statistics and Data Analysis. doi:10.1016/j.csda.2010.10.004 [Likelihood-free sequential Monte Carlo sampler]

Before implementing LF-MCMC

Assumptions of the simulation:
Estimate one parameter with the other three parameters fixed.*
Use a flat prior for the parameter to be estimated.
Use a Gaussian transition kernel centred at the current state. If the parameter lies in some interval, simply truncate proposed values that fall outside it.
Use quantiles and the Kolmogorov-Smirnov statistic as summary statistics.
Use a fixed ε value during computation.*

*: These assumptions will be dropped later.

Simulation results: parameters β, γ, δ fixed

[Figure: simulation results for α based on 200 observations from Stable(1.5, 0.5, 10, 10) using a fixed ε = 25. (Left) Sample path of α; true value 1.5. (Right) Trace of the sample average.]

Simulation results: parameters β, γ, δ fixed

[Figure: sample path and ergodic average plots for α. Top: ε = 15, acceptance rate 1.3%. Bottom: ε = 50, acceptance rate 34.2%.]

Modified LF-MCMC

Motivation: a fixed ε forces a trade-off between acceptance rate and approximation quality (cf. the previous simulation results).

Modification: dynamically define ε_t as a monotonically decreasing sequence:

ε_t = max{ε_min, min{ε', ε_{t-1}}}  if θ' is accepted
ε_t = ε_{t-1}                       otherwise

with ε_0 = ρ(S, S_0) and ε' = ρ(S, S'), where S_0 denotes the summary statistics of the dataset generated at the initial value, and ε_min is the target ε value.

Before: compare against the target ε value (global comparison).
Now: compare against the previous ε value (local comparison).
Adaptively change the variance of the proposal distribution to accelerate/control chain mixing.
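The adaptive-ε rule can be grafted onto the earlier exponential-example sampler; a sketch (all settings are illustrative, and the proposal variance is kept fixed here for brevity):

```python
import numpy as np

rng = np.random.default_rng(7)

# Exponential example again: y_i ~ Exp(theta), prior theta ~ Gamma(alpha, scale=beta),
# summary statistic = sample mean.
alpha, beta_scale, n = 3.0, 1.0, 5
y_obs = rng.exponential(0.5, size=n)
s_obs = y_obs.mean()
eps_min = 0.05                              # target epsilon value

def log_prior(theta):
    return (alpha - 1) * np.log(theta) - theta / beta_scale if theta > 0 else -np.inf

theta = 2.0
eps = abs(rng.exponential(1.0 / theta, size=n).mean() - s_obs)  # eps_0 = rho(S, S_0)
chain, eps_trace = [], []
for _ in range(20_000):
    prop = theta + 0.5 * rng.normal()
    if prop > 0:
        s_sim = rng.exponential(1.0 / prop, size=n).mean()
        eps_prop = abs(s_sim - s_obs)                      # eps' = rho(S, S')
        if eps_prop <= eps:                                # local comparison
            if np.log(rng.uniform()) < log_prior(prop) - log_prior(theta):
                theta = prop
                eps = max(eps_min, min(eps_prop, eps))     # shrink, floored at eps_min
    chain.append(theta)
    eps_trace.append(eps)

print(eps_trace[-1], np.mean(chain[5000:]))
```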

Simulation results: all four parameters unknown

[Figure: simulation results for Stable(α, β, γ, δ) based on 500 observations from Stable(1.5, 0.5, 10, 10), using 10000 iterations and ε_min = 15.]

LF-MCMC for the variance gamma (VG) distribution

For comparison, we apply the method to the VG distribution.

VG process (Madan and Seneta, 1990; Madan, Carr and Chang, 1998):

X_t^(VG) = θ G_t + σ W_{G_t}

where G_t is a gamma process with unit mean rate and variance rate ν, and W_t is standard Brownian motion.

The unit-period distribution is VG(σ, ν, θ); its pdf can be written in terms of the modified Bessel function of the second kind.
The VG distribution has finite moments of all orders.
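As with the stable case, LF-MCMC only needs forward simulation. A sketch of the standard gamma time-change construction for unit-period VG variates (treating µ as the location shift of the four-parameter version used below; the moment checks follow from conditioning on the gamma time G):

```python
import numpy as np

rng = np.random.default_rng(8)

def vg_rvs(sigma, nu, theta, mu, size):
    """Unit-period VG variates via gamma time change: X = mu + theta*G + sigma*W_G."""
    G = rng.gamma(shape=1.0 / nu, scale=nu, size=size)  # gamma subordinator, E[G] = 1, Var[G] = nu
    Z = rng.normal(size=size)
    return mu + theta * G + sigma * np.sqrt(G) * Z      # Brownian motion evaluated at time G

x = vg_rvs(0.8, 1.0, 0.5, 10.0, 200_000)
print(x.mean(), x.var())   # mean = mu + theta = 10.5; var = sigma^2 + theta^2 * nu = 0.89
```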

LF-MCMC for the VG distribution: simulation results

[Figure: simulation results for VG(σ, ν, θ, µ) based on 500 observations from VG(0.8, 1, 0.5, 10), using 10000 iterations and ε_min = 1. Added summary statistics: mean and variance.]

Application to financial data

We fit a stable distribution to real financial data: the S&P 500 index from January 2009 to July 2011, giving 629 daily log returns computed from adjusted close prices. We run 10000 iterations of LF-MCMC, discard the first 2000 as burn-in, and average over the remaining samples to obtain the posterior estimates:

α: 1.3542, β: 0.0741, γ: 0.0070, δ: 0.0019

[Figure: fitted density; blue = stable fit, green = smoothed data.] The figure was produced using J.P. Nolan's STABLE program, available at http://academic2.american.edu/~jpnolan

Concluding remarks

Our results:
applied LF-MCMC to inference for stable models
made the method applicable to general cases
relatively low computational cost

Pitfalls:
need to specify a proper target ε value
convergence time is unknown in advance, so more iterations may be needed
the choice of summary statistics can crucially affect sampler performance

Acknowledgement

Thanks to my supervisor, Dr. Ray Kawai.