Bayesian inference for stable distributions with applications in finance
Department of Mathematics, University of Leicester
MSc project final presentation, September 2, 2011
Classical Monte Carlo Integration

Bayes formula: f(θ|D) ∝ π(θ) P(D|θ), where f(θ|D) is the posterior, π(θ) the prior and P(D|θ) the likelihood.

Evaluating integrals:
Normalisation: Z = ∫ π(θ) P(D|θ) dθ
Marginalisation: f(θ|D) = ∫ f(θ, x|D) dx
Expectation: E_f[h(θ)] = ∫ h(θ) f(θ|D) dθ

Suppose we can draw samples θ^(j) ~ f(θ|D), j = 1, ..., m. Then E_f[h(θ)] ≈ (1/m) Σ_{j=1}^m h(θ^(j)).
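As an illustrative sketch (not part of the slides), the sample-average estimator above takes a few lines of Python; the standard normal target and h(θ) = θ² are choices made here for the example:

```python
import random

def mc_expectation(h, sampler, m=100_000):
    """Plain Monte Carlo: average h over draws theta_j ~ f."""
    return sum(h(sampler()) for _ in range(m)) / m

random.seed(0)
# Example target: theta ~ N(0, 1) with h(theta) = theta^2, so E_f[h] = 1.
est = mc_expectation(lambda t: t * t, lambda: random.gauss(0.0, 1.0))
```

The estimator's error shrinks like 1/sqrt(m) regardless of the dimension of θ, which is what makes the method attractive for the Bayesian integrals above.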
Rejection sampling
1. Sample x^(i) ~ g(x)
2. Accept x^(i) with probability f(x^(i)) / (M g(x^(i))), then go to 1
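A minimal sketch of the rejection sampler; the Beta(2,2) target, uniform proposal and envelope constant M = 1.5 are all choices made for this example, not taken from the slides:

```python
import random

def rejection_sample(f, g_sampler, g_pdf, M):
    """Propose x ~ g, accept with probability f(x)/(M g(x)), repeat until accepted."""
    while True:
        x = g_sampler()
        if random.random() < f(x) / (M * g_pdf(x)):
            return x

random.seed(1)
f = lambda x: 6.0 * x * (1.0 - x)   # Beta(2,2) density; its maximum is 1.5
xs = [rejection_sample(f, random.random, lambda x: 1.0, M=1.5)
      for _ in range(20_000)]
mean = sum(xs) / len(xs)            # Beta(2,2) has mean 0.5
```

M must satisfy f(x) ≤ M g(x) everywhere; a loose M still gives exact samples but wastes proposals, since the acceptance rate is 1/M.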
Importance sampling

A different way to view E_f[h(θ)]:

E_f[h(θ)] = ∫ h(θ) (f(θ)/g(θ)) g(θ) dθ = E_g[(f(θ)/g(θ)) h(θ)] ≈ (1/m) Σ_{j=1}^m (f(θ^(j))/g(θ^(j))) h(θ^(j))   (1)

for θ^(j) drawn from g(θ). Importance sampling does not throw away samples; it assigns them different weights (importances) f(θ^(j))/g(θ^(j)).
Importance sampling

In the Bayesian context, with the normalising constant unknown:

E_f[h(θ)] = E_g[(f(θ|D)/g(θ)) h(θ)]
          = (1/P(D)) E_g[(π(θ) P(D|θ)/g(θ)) h(θ)]
          ≈ Σ_{j=1}^m π(θ^(j)) P(D|θ^(j)) h(θ^(j)) / g(θ^(j))  ÷  Σ_{j=1}^m π(θ^(j)) P(D|θ^(j)) / g(θ^(j))

where θ^(j) ~ g(θ). This self-normalised form can also be used in the general setting, i.e. use Σ_{j=1}^m h(θ^(j)) f(θ^(j))/g(θ^(j)) ÷ Σ_{j=1}^m f(θ^(j))/g(θ^(j)) as an alternative to (1), often with an improvement in variance.
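The self-normalised estimator can be sketched as follows; the unnormalised N(0,1) target and N(0,2) proposal are illustrative choices, not from the slides:

```python
import math
import random

def snis_expectation(h, target_unnorm, g_sampler, g_pdf, m=100_000):
    """Self-normalised importance sampling:
    sum(w_j h(x_j)) / sum(w_j) with weights w_j = f(x_j)/g(x_j).
    Works when f is known only up to a constant."""
    num = den = 0.0
    for _ in range(m):
        x = g_sampler()
        w = target_unnorm(x) / g_pdf(x)
        num += w * h(x)
        den += w
    return num / den

random.seed(2)
# Target: N(0,1) without its normalising constant; proposal: N(0, sd=2).
f_unnorm = lambda x: math.exp(-0.5 * x * x)
g_pdf = lambda x: math.exp(-0.5 * (x / 2.0) ** 2) / (2.0 * math.sqrt(2.0 * math.pi))
est = snis_expectation(lambda x: x * x, f_unnorm,
                       lambda: random.gauss(0.0, 2.0), g_pdf)  # E[x^2] = 1
```

The proposal here is heavier-tailed than the target, which keeps the weight variance finite; a proposal lighter-tailed than the target can make the estimator unstable.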
What if likelihoods are unavailable?

Approximating the posterior while avoiding likelihood evaluation is known as approximate Bayesian computation. Some early literature:

LF-RS
Tavaré et al., 1997. Inferring coalescence times from DNA sequence data — replacing the full dataset with summary statistics.
Fu and Li, 1997. Estimating the age of the common ancestor of a sample of DNA sequences — simulating a new dataset and comparing it with the observed one.

LF-MCMC
Marjoram et al., 2003. Markov chain Monte Carlo without likelihoods — an MCMC approach generalised from LF-RS.
Likelihood-free rejection sampling

The idea can be seen in the following algorithm.

LF-RS
1. Simulate from the prior: θ ~ π
2. Generate D' under the model with parameter θ
3. Accept θ if D' = D; go to 1
(D: observed dataset, D': simulated dataset)

In practice one replaces D and D' with corresponding summary statistics S and S'. The acceptance condition becomes ρ(S, S') ≤ ε for some distance measure ρ (e.g. Euclidean). This results in an approximate posterior f(θ | ρ(S, S') ≤ ε).
LF-RS example

Example: Suppose y_1, y_2, ..., y_n are observations from Exp(θ) with density f(y|θ) = θ e^(-θy), y > 0. The prior for θ is the conjugate gamma distribution θ ~ Gamma(α, β); the posterior is then gamma with updated parameters, θ|D ~ Gamma(n + α, β/(β Σ y_i + 1)). Let α = 3, β = 1, take n = 5 observations from Exp(2), and choose the sample mean ȳ as a sufficient statistic. We simulate the posterior distribution using the LF-RS algorithm with ε = 1 and ε = 0.1.
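A sketch of LF-RS for this exponential example; the function name, tolerance and number of accepted draws are choices made here for illustration:

```python
import random

def abc_rejection(y_obs, alpha, beta, eps, n_accept=2000):
    """LF-RS for Exp(theta) data with a Gamma(alpha, scale=beta) prior,
    using the sample mean as the summary statistic."""
    n = len(y_obs)
    s_obs = sum(y_obs) / n
    accepted = []
    while len(accepted) < n_accept:
        theta = random.gammavariate(alpha, beta)          # draw from the prior
        y_sim = [random.expovariate(theta) for _ in range(n)]
        if abs(sum(y_sim) / n - s_obs) <= eps:            # compare summaries
            accepted.append(theta)
    return accepted

random.seed(3)
y = [random.expovariate(2.0) for _ in range(5)]
post = abc_rejection(y, alpha=3.0, beta=1.0, eps=0.1)
abc_mean = sum(post) / len(post)
```

For this conjugate model the exact posterior mean is (n + α)·β/(β Σ y_i + 1), so the ABC output can be checked directly against it as ε shrinks.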
LF-RS example: simulation results for ε = 1 (left) and ε = 0.1 (right).
Markov chain Monte Carlo (MCMC)

About MCMC
Algorithms that realise a Markov chain.
We want the invariant distribution of the chain to be our target distribution.
After running the chain for a long time, samples can be taken as drawn from the target distribution.

MCMC history
Metropolis et al. (1953). Equation of state calculations by fast computing machines. J. Chem. Phys. 21, 1087-1092.
Hastings, W. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97-109.
Gelfand, A. E. and Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. J. Amer. Statist. Assoc. 85, 398-409.
Constructing MCMC algorithms

The ergodic theorem guarantees convergence. From Markov chain theory, general balance requires (in the discrete setting)

f P = f   (2)

where f is the invariant distribution and P the transition matrix with elements P_ij = P(x_{t+1} = j | x_t = i) := P(i → j); each row sums to one.

Detailed balance: P(x → x') f(x) = P(x' → x) f(x'). Summing both sides over x gives (2).
Metropolis-Hastings algorithm

Metropolis-Hastings
1. If now at θ, propose a move to θ' according to a proposal distribution q(θ'|θ)
2. Accept θ' with probability A(θ, θ') = min{1, [f(θ') q(θ|θ')] / [f(θ) q(θ'|θ)]}
3. Go to 1 until the desired number of iterations
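A random-walk Metropolis-Hastings sketch; with a symmetric Gaussian proposal the q-ratio cancels, so the acceptance probability reduces to min{1, f(θ')/f(θ)}. The standard normal target is an illustrative choice, not from the slides:

```python
import math
import random

def metropolis_hastings(log_f, x0, step, n_iter=50_000):
    """Random-walk M-H with a symmetric Gaussian proposal.
    Works with log densities, which only need to be known up to a constant."""
    x, chain = x0, []
    for _ in range(n_iter):
        x_new = random.gauss(x, step)
        # accept with probability min(1, f(x')/f(x)), done on the log scale
        if math.log(random.random()) < log_f(x_new) - log_f(x):
            x = x_new
        chain.append(x)
    return chain

random.seed(4)
# Target: standard normal, log f(x) = -x^2/2 up to a constant.
chain = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, step=1.0)
chain_mean = sum(chain) / len(chain)
```

Note that a rejected proposal still contributes the current state to the chain; dropping rejections would bias the invariant distribution.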
Theorem. The invariant distribution of the chain is f(θ).

Proof. We show that detailed balance is satisfied. The M-H transition probability is P(θ → θ') = q(θ'|θ) A(θ, θ'). Choose (w.l.o.g.) f(θ') q(θ|θ') ≤ f(θ) q(θ'|θ), so that A(θ, θ') = [f(θ') q(θ|θ')] / [f(θ) q(θ'|θ)] and A(θ', θ) = 1. Then

P(θ → θ') f(θ) = q(θ'|θ) · [f(θ') q(θ|θ')] / [f(θ) q(θ'|θ)] · f(θ) = f(θ') q(θ|θ') = f(θ') q(θ|θ') A(θ', θ) = P(θ' → θ) f(θ').
Marjoram et al. (2003) proposed an MCMC method without likelihood evaluation.

LF-MCMC
1. If now at θ, propose a move to θ' according to a proposal distribution q(θ'|θ)
2. Generate D' under the model with θ'
3. If D' = D, go to 4; otherwise return to 1
4. Accept θ' with probability A(θ, θ') = min{1, [π(θ') q(θ|θ')] / [π(θ) q(θ'|θ)]}, then go to 1

One can prove that the invariant distribution is f(θ|D). Approximate posterior: replace the condition D' = D with ρ(S, S') ≤ ε.
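A hedged sketch of LF-MCMC on the earlier exponential example, using the sample mean as summary, a flat prior on θ > 0 and a symmetric proposal (so the acceptance ratio in step 4 equals one); the tolerance and step size are choices made here:

```python
import random

def lf_mcmc(y_obs, eps, n_iter=20_000, step=0.5):
    """LF-MCMC (in the spirit of Marjoram et al., 2003) for Exp(theta) data.
    Flat prior on theta > 0 and symmetric Gaussian proposal, so a proposal is
    accepted exactly when its simulated summary is within eps of the observed one."""
    n = len(y_obs)
    s_obs = sum(y_obs) / n
    theta, chain = 1.0, []
    for _ in range(n_iter):
        theta_new = random.gauss(theta, step)
        if theta_new > 0:
            y_sim = [random.expovariate(theta_new) for _ in range(n)]
            if abs(sum(y_sim) / n - s_obs) <= eps:   # likelihood-free check
                theta = theta_new                    # prior/proposal ratio is 1
        chain.append(theta)
    return chain

random.seed(5)
y = [random.expovariate(2.0) for _ in range(50)]
chain = lf_mcmc(y, eps=0.1)
chain_mean = sum(chain) / len(chain)
```

The summary here (the sample mean) is sufficient for this model, so for small ε the chain targets a close approximation of the true posterior, whose mean is near 1/ȳ.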
Stable distributions

No closed-form densities in general.

Four parameters: α ∈ (0, 2] determines the tail behaviour, β ∈ [-1, 1] the skewness, γ > 0 the scale and δ ∈ R the location.

Special cases: Cauchy (α = 1, β = 0), Normal (α = 2, β = 0), Lévy (α = 1/2, β = 1).

Infinite variance (except when α = 2); the mean exists only if 1 < α ≤ 2.

Generalised CLT: stable laws are the only possible limits of suitably normalised sums of i.i.d. random variables.
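Although the densities lack closed forms, stable variates are easy to simulate, which is what makes likelihood-free inference feasible here. A sketch for the symmetric case (β = 0, γ = 1, δ = 0) using the Chambers-Mallows-Stuck method; this is illustrative and not part of the slides:

```python
import math
import random

def stable_symmetric(alpha):
    """One draw from a standard symmetric alpha-stable law via the
    Chambers-Mallows-Stuck method (beta = 0, gamma = 1, delta = 0)."""
    u = random.uniform(-math.pi / 2.0, math.pi / 2.0)
    w = random.expovariate(1.0)
    if alpha == 1.0:
        return math.tan(u)                       # Cauchy special case
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

random.seed(6)
xs = [stable_symmetric(2.0) for _ in range(50_000)]
# For alpha = 2 the law reduces to a normal with variance 2*gamma^2 = 2.
var = sum(x * x for x in xs) / len(xs)
```

Checking against the α = 2 (Gaussian) and α = 1 (Cauchy) special cases is a convenient sanity test for any stable simulator.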
Some literature

Bayesian inference for stable models:

Buckle, D.J., 1995. Bayesian inference for stable distributions. Journal of the American Statistical Association 90, 605-613. (Auxiliary variable Gibbs sampler)

Lombardi, M.J., 2007. Bayesian inference for α-stable distributions: a random walk MCMC approach. Computational Statistics & Data Analysis 51, 2688-2700. (Evaluates the likelihood via inverse Fourier transform combined with a series expansion)

Peters, G.W., Sisson, S.A., Fan, Y., 2010. Likelihood-free Bayesian inference for α-stable models. Computational Statistics and Data Analysis. doi:10.1016/j.csda.2010.10.004 (Likelihood-free sequential Monte Carlo sampler)
Before implementing LF-MCMC

Assumptions for the simulation:
Estimate one parameter with the other three parameters fixed.*
Use a flat prior for the parameter to be estimated.
Use a Gaussian transition kernel centred at the current state; if the parameter lies in some interval, simply truncate values that fall outside it.
Use quantiles and the Kolmogorov-Smirnov statistic as summary statistics.
Use a fixed ε value during computation.*

*: These assumptions will be dropped later.
Simulation results (β, γ, δ fixed)

Simulation results for α based on 200 observations from Stable(1.5, 0.5, 10, 10) using a fixed ε = 25. (Left) Sample path of α; the true value is 1.5. (Right) Trace of the sample average.
Simulation results (β, γ, δ fixed)

Sample path and ergodic average plots for α. Top: ε = 15, acceptance rate 1.3%. Bottom: ε = 50, acceptance rate 34.2%.
Modified LF-MCMC

Motivation: before, the simulated summary was compared with the target ε value (a global comparison); now it is compared with the previous ε value (a local comparison). This adaptively changes the variance of the proposal distribution and accelerates/controls chain mixing.

Modification: dynamically define ε_t as a monotonically decreasing sequence:

ε_t = max{ε_min, min{ε', ε_{t-1}}}   if θ' is accepted
ε_t = ε_{t-1}                        otherwise

where ε_0 = ρ(S, S'_0) and ε' = ρ(S, S'), S'_0 is the summary statistic for the dataset generated by the initial value, and ε_min is the target ε value.
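The ε-update rule can be expressed as a small helper; the function name is invented here for illustration:

```python
def next_epsilon(eps_prev, eps_prime, eps_min, accepted):
    """Monotone epsilon schedule from the modified LF-MCMC:
    on acceptance, tighten toward eps_min (never below it, never above the
    previous value); otherwise keep the previous tolerance."""
    if accepted:
        return max(eps_min, min(eps_prime, eps_prev))
    return eps_prev

print(next_epsilon(10.0, 7.0, 1.0, accepted=True))   # tightens to 7.0
print(next_epsilon(10.0, 0.2, 1.0, accepted=True))   # floored at eps_min: 1.0
print(next_epsilon(10.0, 0.2, 1.0, accepted=False))  # unchanged: 10.0
```

Because ε_t never increases, the chain's effective target sharpens over time toward f(θ | ρ(S, S') ≤ ε_min).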
Simulation results (all four parameters unknown)

Simulation results for Stable(α, β, γ, δ) based on 500 observations from Stable(1.5, 0.5, 10, 10), using 10000 iterations and ε_min = 15.
LF-MCMC for the variance gamma (VG) distribution

For comparison, we apply the method to the VG distribution.

VG process (Madan and Seneta, 1990; Madan, Carr and Chang, 1998):

X_t^(VG) = θ G_t + σ W_{G_t}

where G_t is a gamma process with mean rate unity and variance rate ν, and W_t is standard Brownian motion.

The unit-period distribution is VG(σ, ν, θ); its pdf can be written in terms of the modified Bessel function of the second kind. The VG distribution has finite moments of all orders.
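Sampling from the VG law reduces to drawing the gamma subordinator and then a conditionally normal increment. A sketch under the parameterisation above (G_t with shape t/ν and scale ν, plus a location µ as in the later VG(σ, ν, θ, µ) simulations); this is illustrative, not the project's code:

```python
import random

def vg_sample(sigma, nu, theta, mu=0.0, t=1.0):
    """One draw of X_t = mu*t + theta*G_t + sigma*W_{G_t}, where
    G_t ~ Gamma(shape=t/nu, scale=nu) has mean t and variance t*nu."""
    g = random.gammavariate(t / nu, nu)               # gamma subordinator
    return mu * t + theta * g + sigma * random.gauss(0.0, 1.0) * g ** 0.5

random.seed(7)
xs = [vg_sample(0.8, 1.0, 0.5) for _ in range(100_000)]
# E[X_1] = mu + theta * E[G_1] = 0 + 0.5 for these parameters.
mean = sum(xs) / len(xs)
```

This exact-simulation route is what makes the VG model a natural likelihood-free benchmark: drawing datasets is cheap even though the pdf involves Bessel functions.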
LF-MCMC for the variance gamma (VG) distribution: simulation results

Simulation results for VG(σ, ν, θ, µ) based on 500 observations from VG(0.8, 1, 0.5, 10), using 10000 iterations and ε_min = 1. Added summary statistics: mean and variance.
Application to financial data

We fit a stable distribution to real financial data: the S&P 500 index from January 2009 to July 2011, giving 629 daily log returns computed from adjusted close prices. We run 10000 iterations of LF-MCMC, discard the first 2000 as burn-in, and average over the remaining samples to obtain the posterior estimates: α = 1.3542, β = 0.0741, γ = 0.0070, δ = 0.0019.
Blue = stable fit, green = smoothed data. The figure is produced using J.P. Nolan's STABLE program, available at http://academic2.american.edu/~jpnolan
Concluding remarks

Our results:
applied LF-MCMC to inference for stable models
made the method applicable to general cases
relatively low computational cost

Pitfalls:
need to specify a proper target ε value
we don't know when convergence will occur, so more iterations may be needed
the choice of summary statistics can crucially affect sampler performance
Acknowledgement

Thanks to my supervisor, Dr. Ray Kawai.