Nonparametric Drift Estimation for Stochastic Differential Equations


Nonparametric Drift Estimation for Stochastic Differential Equations. Gareth Roberts, Department of Statistics, University of Warwick. Brazilian Bayesian meeting, March 2010. Joint work with O. Papaspiliopoulos, Y. Pokern and A. Stuart.

Centre for Research in Statistical Methodology http://www2.warwick.ac.uk/fac/sci/statistics/crism/ Conferences and workshops. Academic visitor programme. Currently preparing for a pre-Valencia meeting on Model Uncertainty to take place between 30th May and 1st June.

Fitting SDEs to Molecular Dynamics (MD) data X(m Δt) ∈ R^d. Multiple timescales. High-frequency data. High dimension, but only a few dimensions of chemical interest. A diffusion is a good description at some timescales only.

Plan. Start from the SDE dX_t = b(X_t) dt + dB_t, X(0) = x_0, and high-frequency discrete-time observations x_i. Write down the likelihood for b(·). Manipulate the likelihood to make the local time L an (almost) sufficient statistic. Specify a prior on a function space H, compute the posterior. Make the Bayesian framework rigorous. Make the numerics robust. Application: toy example from Molecular Dynamics.

SDE properties - Girsanov. dX_t = b(X_t) dt + dB_t generates a measure P on path space C([0, T], [0, 2π)). P is absolutely continuous w.r.t. the measure W generated by Brownian motion. The likelihood (Radon-Nikodym derivative) is dP/dW = exp(I[b]), with I[b] viewed as a functional of the drift: I[b] = -\frac{1}{2}\int_0^T \bigl( b^2(X_t)\,dt - 2 b(X_t)\,dX_t \bigr).

A hint from the discrete problem. An n-state Markov chain in discrete time, with transition matrix (p_{ij})_{i,j=1}^n. Data X_0, X_1, ..., X_T. The likelihood is \prod_{t=1}^T p_{X_{t-1}, X_t}. Information about the ith row is only available from visits to i. So, let L_i = \sum_{t=1}^T 1_{X_{t-1}=i} be the local time at i, and factorise: \prod_{t=1}^T p_{X_{t-1}, X_t} = \prod_{i=1}^n \prod_{t:\, X_{t-1}=i} p_{i, X_t}. Conditional on L, we have n independent inference problems.
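
A minimal numerical illustration of this factorisation (hypothetical code, not from the talk): the log-likelihood computed transition by transition agrees with the version grouped by current state, and the local times L_i are the only way the path length enters each row's inference.

import numpy as np

rng = np.random.default_rng(0)

# Toy 3-state chain and a simulated path X_0, ..., X_T
P = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
T = 1000
X = [0]
for _ in range(T):
    X.append(rng.choice(3, p=P[X[-1]]))
X = np.array(X)

# Log-likelihood as a product over time steps
loglik_time = sum(np.log(P[X[t - 1], X[t]]) for t in range(1, T + 1))

# Same quantity factorised over rows: L_i counts visits to i,
# and row i only uses transitions that start in i
L = np.array([(X[:-1] == i).sum() for i in range(3)])
loglik_rows = 0.0
for i in range(3):
    succ = X[1:][X[:-1] == i]            # states visited immediately after i
    loglik_rows += np.log(P[i, succ]).sum()

print(L, np.allclose(loglik_time, loglik_rows))   # identical likelihoods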

Diffusion Local Time. Local time is defined (at least in one dimension) as the occupation density at a location: L(a) = \lim_{\epsilon \to 0} \frac{1}{2\epsilon} \int_0^T 1\bigl( X_s \in (a - \epsilon, a + \epsilon) \bigr)\,ds. This gives a natural way to replace time averages by space averages: \int_0^T f(X_t)\,dt = \int_{\mathbb{R}} f(a)\,L(a)\,da.
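
A quick sanity check of the occupation identity (illustrative sketch, assuming a simple Euler-discretised path and a histogram estimate of L; the drift and test function below are arbitrary toy choices):

import numpy as np

rng = np.random.default_rng(1)
dt, T = 1e-4, 20.0
n = int(T / dt)

# Euler-Maruyama path of dX = -X dt + dB (any ergodic diffusion will do)
X = np.empty(n + 1); X[0] = 0.0
for k in range(n):
    X[k + 1] = X[k] - X[k] * dt + np.sqrt(dt) * rng.standard_normal()

# Histogram estimate of local time: time spent per bin / bin width
edges = np.linspace(X.min(), X.max(), 201)
counts, _ = np.histogram(X[:-1], bins=edges)
L_hat = counts * dt / np.diff(edges)
centres = 0.5 * (edges[:-1] + edges[1:])

f = lambda x: np.cos(x) ** 2
time_avg  = np.sum(f(X[:-1])) * dt                       # integral of f(X_t) dt
space_avg = np.sum(f(centres) * L_hat * np.diff(edges))  # integral of f(a) L(a) da
print(time_avg, space_avg)   # close for small dt and narrow bins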

Inference for SDEs - parametric. dX_t = b(X_t, θ) dt + σ(X_t) dB_t, for example with b(x, θ) = \sum_{j=1}^m θ_j b_j(x) for a chosen basis {b_j}. Estimate the drift function b(·) from observations {x_i}_{i=1}^M, x_i = X(i Δt). High-frequency setup (Δt → 0), discretely observed case (Δt = O(1)), or more generally... Drift functions parametrised by θ ∈ R^N: could just choose a basis {b_j}. Substantial theory and methodology are available for these cases, and huge numbers of successful applications. BUT the choice of basis functions can be far from obvious.
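
For the high-frequency parametric case, a textbook estimator (not the talk's method) regresses Euler increments on the basis functions; the sketch below assumes unit diffusion and a hypothetical two-function basis:

import numpy as np

def fit_drift_basis(x, dt, basis):
    """Least-squares estimate of theta in dX = sum_j theta_j b_j(X) dt + dB,
    using the Euler approximation  X_{i+1} - X_i ~ sum_j theta_j b_j(X_i) dt.
    `basis` is a list of callables b_j; unit diffusion coefficient assumed."""
    dX = np.diff(x)
    B = np.column_stack([b(x[:-1]) * dt for b in basis])   # design matrix
    theta, *_ = np.linalg.lstsq(B, dX, rcond=None)
    return theta

# Example: recover theta = (-1, 0.5) for b_1(x) = x, b_2(x) = sin(x)
rng = np.random.default_rng(2)
dt, n = 1e-3, 100_000
x = np.empty(n + 1); x[0] = 0.0
for k in range(n):
    drift = -x[k] + 0.5 * np.sin(x[k])
    x[k + 1] = x[k] + drift * dt + np.sqrt(dt) * rng.standard_normal()
print(fit_drift_basis(x, dt, [lambda z: z, lambda z: np.sin(z)]))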

The Likelihood Functional I[b] = log dp T ) (b dw = 1 2 (X t )dt 2b(X t )dx t 2 0 Start from the log-likelihood. We write V (x) = x 0 b(u)du. Apply Ito s formula for V (x) to rewrite the stochastic integral as boundary terms plus correction. Replace Integrals along the trajectory by integral against local time L(a)da.

The Likelihood Functional. Applying Ito's formula to V(X_t) (boundary terms plus correction): I[b] = W\,\bigl(V(2\pi) - V(0)\bigr) + V(X_T) - V(X_0) - \frac{1}{2}\int_0^T \bigl( b^2(X_t) + b'(X_t) \bigr)\,dt, where W is the winding number of the path around the circle.

The Likelihood Functional. Replacing integrals along the trajectory by integrals against local time: I[b] = W\,\bigl(V(2\pi) - V(0)\bigr) + V(X_T) - V(X_0) - \frac{1}{2}\int_0^{2\pi} \bigl( b^2(a) + b'(a) \bigr) L(a)\,da.

The Likelihood Functional. Writing V(X_T) - V(X_0) as an integral against b as well: I[b] = W\,\bigl(V(2\pi) - V(0)\bigr) - \frac{1}{2}\int_0^{2\pi} \Bigl[ \bigl( b^2(a) + b'(a) \bigr) L(a) - 2\,\chi_{X_0,X_T}(a)\, b(a) \Bigr]\,da, where \chi_{X_0,X_T} is the indicator of the arc from X_0 to X_T.
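
For reference, the step that turns the boundary contribution into the \chi and W terms (my reconstruction of the notation, under the convention that \chi_{X_0,X_T} is the indicator of the arc from X_0 to X_T):

\begin{align*}
W\,\bigl(V(2\pi)-V(0)\bigr) + V(X_T)-V(X_0)
  &= W\int_0^{2\pi} b(a)\,da + \int_{X_0}^{X_T} b(a)\,da \\
  &= \int_0^{2\pi} \bigl(W + \chi_{X_0,X_T}(a)\bigr)\, b(a)\,da ,
\end{align*}

so that, combined with \int_0^T g(X_t)\,dt = \int_0^{2\pi} g(a)\,L(a)\,da, the log-likelihood is quadratic-plus-linear in b. This is what makes a Gaussian prior conjugate later on.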

If local time were smooth... the log-likelihood I[b] would be bounded above on b ∈ L^2(0, 2π). Taking the functional derivative yields the MLE \hat b = L'/(2L).

Infinite Dimensional Trouble. Up to boundary terms, -I[b] = \frac{1}{2}\int_0^{2\pi} \bigl( b^2(a) + b'(a) \bigr) L(a)\,da. For smooth L this functional is positive definite and bounded below. BUT L is not differentiable! The likelihood is easily shown to be almost surely unbounded.

Various Options. I[b] = \text{Boundary} - \frac{1}{2}\int_0^{2\pi} \bigl( b^2(a) + b'(a) \bigr) L(a)\,da. 1. Use a non-likelihood-based approach. 2. Assume a parametric form b(x, θ). 3. Adopt some kind of penalised-likelihood approach. 4. Introduce a prior measure on drift functions b(·) and perform Bayesian estimation.

Gaussian Prior on drift functions. Specify a prior Gaussian measure for zero-mean drift functions by its mean b_0 ∈ H^2_per([0, 2π]) and its precision (operator) A ≥ 0 on [0, 2π] with periodic boundary conditions. Thus a continuum Gaussian Markov random field. Smoothness imposed through A persists in the posterior.

Choice of A. A simple choice would take A = -Δ (= -d^2/da^2). This is (approximately...) assuming independent increments of db(a) for different a, and would lead to continuous, non-differentiable sample paths with probability 1. We mostly use instead A = Δ^2, whereby sample paths have the smoothness of once-integrated diffusions. Intuitively: b ∼ exp\bigl( -\tfrac12 \int_0^{2\pi} | b''(a) - b_0''(a) |^2\,da \bigr).

Finding the Posterior. Multiply the prior density by the likelihood: \exp\bigl( -\tfrac12 \int_0^{2\pi} | b''(a) |^2\,da \bigr) \times \exp\Bigl( -\tfrac12 \int_0^{2\pi} \bigl[ (b^2(a) + b'(a)) L(a) - 2\,\chi_{X_0,X_T}(a)\, b(a) \bigr]\,da + W \text{ term} \Bigr). Complete the square to find that the posterior is Gaussian with mean \hat b = (\Delta^2 + L)^{-1} \bigl( \tfrac12 L' + \chi_{X_0,X_T} + W \bigr) and posterior covariance (\Delta^2 + L)^{-1}.
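
A grid-based sketch of the complete-the-square formula (illustrative only, using periodic finite differences in place of the talk's finite elements; the local time L, arc indicator chi and winding number W are taken as given arrays, and the toy inputs below are invented):

import numpy as np

def posterior_mean(L, chi, W):
    """Solve (Delta^2 + L) u = 0.5*L' + W + chi on a periodic grid over [0, 2*pi).
    L, chi: arrays on the grid; W: winding number (scalar)."""
    n = len(L)
    h = 2 * np.pi / n
    # periodic second-difference matrix D2 ~ d^2/da^2
    D2 = (np.diag(-2 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
          + np.diag(np.ones(n - 1), -1))
    D2[0, -1] = D2[-1, 0] = 1.0
    D2 /= h ** 2
    A = D2 @ D2                                   # Delta^2 ~ d^4/da^4
    Lp = (np.roll(L, -1) - np.roll(L, 1)) / (2 * h)   # periodic centred L'
    rhs = 0.5 * Lp + W + chi
    Q = A + np.diag(L)                            # discrete posterior precision
    u = np.linalg.solve(Q, rhs)                   # posterior mean on the grid
    var = np.diag(np.linalg.inv(Q))               # discrete analogue of (Delta^2 + L)^{-1}
    return u, var

# toy inputs: a smooth fake local time, no winding, arc indicator from 1 to 3
a = np.linspace(0, 2 * np.pi, 200, endpoint=False)
L = 5.0 + 4.0 * np.cos(a)
chi = ((a > 1.0) & (a < 3.0)).astype(float)
u, v = posterior_mean(L, chi, W=0)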

Towards the Posterior, rigorously. Standard PDE theory shows that the posterior mean is the weak solution of a PDE. Theorem. Let L ∈ C([0, 2π]) be continuous, periodic and not identically zero. Then the PDE \Delta^2 u + L u = \tfrac12 L' + W + \chi_{x_0,x_T} has a unique weak solution u ∈ H^2_per([0, 2π]).

Robustness of the Posterior Mean. Theorem. There exists a constant C(W, \|L\|_\infty) > 0 such that for all admissible perturbed local times \tilde L the deviation of the perturbed posterior mean \tilde u from the unperturbed posterior mean u is bounded in the H^2-norm: \| \tilde u - u \|_{H^2} \le C(W, \|L\|_\infty)\, \| \tilde L - L \|_{L^2}.

Cleanup. Observe absolute continuity of the posterior and prior measures: \Delta^2 and \Delta^2 + L differ only in lower-order differential parts. Compute the Radon-Nikodym derivative and identify it with the likelihood. Obtain a simple estimate of the local time from pointwise estimates combined with Hölder continuity of local time, so that \| \hat L - L \|_{L^2} is small.

Estimating local time. Choose N equal-sized bins. Count realisations falling in each bin. The width of the bins adapts to M (i.e. to Δt), rather like bandwidth selection in kernel density estimation. Pointwise estimates converge to local time in probability. Use Hölder continuity of local time to obtain an L^2 error bound, using point values to construct piecewise-constant approximants of L.

Numerical Analysis. Fourth-order elliptic PDE with a non-regular right-hand side. Use piecewise cubic polynomial basis functions on each finite element: b(a) = \sum_{e=1}^{N} \sum_{f=1}^{4} B_{e,f}\, \phi_{e,f}(a). These span the approximation space H^2_h \subset H^2.

Finite Elements 2. The finite-element representation turns the weak PDE into a collection of linear equations for the coefficients B_{e,f}: \sum_{(i,j),(e,f)} \Psi_{(i,j)}\, M_{(i,j),(e,f)}\, B_{e,f} = \sum_{(i,j)} \Psi_{(i,j)}\, F_{(i,j)} for all test coefficients \Psi, i.e. M B = F. Standard numerical analysis guarantees the accuracy of these methods.

Numerics: Samples from the Posterior. dX_t = \bigl( \sin(X_t) + 3\cos^2(X_t)\sin(X_t) \bigr)\,dt + dB_t. Second-order prior covariance operator: A = \Delta^2.

Numerics: Samples from the Posterior. dX_t = \bigl( \sin(X_t) + 3\cos^2(X_t)\sin(X_t) \bigr)\,dt + dB_t. First-order prior covariance operator: A = -\Delta.

Convergence as T → ∞. Gaussian boundary conditions with second-order covariance operator. T = 0.02

Convergence as T → ∞. Gaussian boundary conditions with second-order covariance operator. T = 50

Convergence as T → ∞. Gaussian boundary conditions with second-order covariance operator. T = 5000

Rates of posterior contraction. Consider Z_1 = \hat b(0.38\pi) - b(0.38\pi) and Z_2 = \int_0^{2\pi} \hat b(a)\,\sin(a)\,da. Questions: Do we have a law of large numbers, Z_i → 0 as T → ∞? Do we get CLT-like convergence, Var(Z_i) = O(1/T)? (Numerical) answers: numerically, lim_{T→∞} Z_i = 0 is observed. Decay of the variance: the answer depends on i! High-frequency components of L can dominate the convergence.

Rate of Posterior Contraction Smooth Functional

Rate of Posterior Contraction Point Evaluation

Marginal likelihood. More general prior precision operator: A(η) = η\,\Delta^k + ε. How to choose the hyper-parameter η? A fully Bayesian approach is clearly possible; we choose instead to maximise the marginal likelihood: \int P(\{x_t\}_{t=0}^T \mid b)\, P_0(db) = \Bigl( \frac{|A(\eta)|}{|A(\eta) + L_T|} \Bigr)^{1/2} \exp\Bigl( -\tfrac12 \int_0^{2\pi} \bigl[ -(A b_0 + f)^\top (A(\eta) + L_T)^{-1} (A b_0 + f) + b_0^\top A b_0 \bigr]\,da \Bigr), where f is the linear term appearing in the likelihood.
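
A rough grid-search sketch of this hyper-parameter choice (illustrative only, under simplifying assumptions: zero prior mean b_0 = 0, discretised operators, f taken as the linear term 0.5*L' + W + chi from the posterior-mean formula, and additive constants dropped):

import numpy as np

def log_marginal(eta, L, f, k=2, eps=1e-3):
    """Gaussian-integral approximation of the log marginal likelihood on a grid,
    with prior precision A(eta) = eta * Delta^k + eps and zero prior mean.
    Constants independent of eta are dropped."""
    n = len(L); h = 2 * np.pi / n
    D2 = (np.diag(-2 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
          + np.diag(np.ones(n - 1), -1))
    D2[0, -1] = D2[-1, 0] = 1.0
    D2 /= h ** 2
    A = eta * np.linalg.matrix_power(D2, k) + eps * np.eye(n)
    Q = A + np.diag(L)                       # posterior precision
    _, logdet_a = np.linalg.slogdet(A)
    _, logdet_q = np.linalg.slogdet(Q)
    return 0.5 * logdet_a - 0.5 * logdet_q + 0.5 * f @ np.linalg.solve(Q, f)

# choose eta by a simple grid search (L, chi as in the posterior-mean sketch above)
a = np.linspace(0, 2 * np.pi, 200, endpoint=False)
L = 5.0 + 4.0 * np.cos(a)
f = 0.5 * np.gradient(L, a) + ((a > 1.0) & (a < 3.0)).astype(float)
etas = np.logspace(-2, 2, 41)
best_eta = etas[np.argmax([log_marginal(e, L, f) for e in etas])]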

Optimising Smoothness η

Molecular Dynamics. M\ddot X(t) = -\nabla V(X(t)) - \gamma M \dot X(t) + \sqrt{2\gamma k_B T M}\, \dot B(t), observed through an angle X \mapsto \omega(X) \in [0, 2\pi).
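
A toy simulation of such dynamics (illustrative only: one coordinate, a hypothetical potential standing in for the molecular force field, a crude Euler scheme, and a trivial angle map x mod 2*pi standing in for omega):

import numpy as np

rng = np.random.default_rng(3)

# Langevin dynamics: M x'' = -V'(x) - gamma*M*x' + sqrt(2*gamma*kB*T*M) * noise
M, gamma, kBT = 1.0, 5.0, 0.5
Vprime = lambda x: np.sin(x)          # hypothetical toy potential gradient
dt, n = 1e-3, 200_000
x, v = 0.0, 0.0
omega = np.empty(n)                   # observed angle omega(X) in [0, 2*pi)
for k in range(n):
    acc = (-Vprime(x) - gamma * M * v) / M
    v += acc * dt + np.sqrt(2 * gamma * kBT / M * dt) * rng.standard_normal()
    x += v * dt
    omega[k] = x % (2 * np.pi)

# subsample at the observation spacing before any drift estimation;
# whether this looks like a diffusion depends on the subsampling timescale
obs = omega[::100]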

Fitting Result. Whether the data looks like a diffusion depends on the timescale.

Fitting Result Posterior mean and standard deviation band for k = 1000:

Future Work / In Progress. The case where the state space is R. Extension to higher dimensions (2, 3). Consistency issues: the rate of posterior contraction depends on the smoothness of the prior, of the truth, and of the functional to be estimated. Extend to O(1)-spaced data using perfect simulation of Brownian local times with a Metropolis-Hastings correction. A heterogeneous diffusion coefficient is (in principle) straightforward for high-frequency data, though there would be associated numerical issues to resolve. In the discretely-observed case, this requires reparameterisation techniques (see Roberts and Stramer, 2001; Durham and Gallant, 2002).

Summary. Nonparametric drift estimation for diffusions on the circle can be performed rigorously with a Gaussian (conjugate) prior. A finite-element implementation enables error control from discrete-time high-frequency samples all the way to numerically obtained posterior means. Applications in molecular dynamics and many other areas...

Samples from the Posterior are usable

Nonexistence. Local time has the same regularity as Brownian motion: C^α for α < 1/2. The unboundedness of the log-likelihood functional is linked to the regularity of local time. Substituting a Brownian bridge for the local time, we get the following. Theorem. Let W be a realisation of the Brownian bridge on [0, 1]. Then, with probability one, the functional I[b] = -\frac{1}{2} \int_0^1 \bigl( b^2(s) + b'(s) \bigr) W(s)\,ds is not bounded above on b ∈ H^1([0, 1]).

Classic example: Gibbs sampler for drift and diffusivity. Algorithm: use a sequential Gibbs sampler to estimate the diffusivity σ and the drift parameters θ_j: 1. θ_j ∼ P(θ_j | x_i, σ); 2. σ ∼ P(σ | x_i, θ_j). Observations: the algorithm works fine provided good approximations of the conditional densities are available. Frequently, the approximations are good only for Δt → 0. Simple fix: augment the data {x_i} by imputed data points x_{i,k} at times t_i < t_{i,1} < t_{i,2} < ... < t_{i,K} < t_{i+1}.

Classic example: Gibbs sampler for drift and diffusivity - Augmented Data. Algorithm: use a sequential Gibbs sampler to estimate the diffusivity σ, the drift parameters θ_j and the imputed data points {x_{i,k}}: 1. θ_j ∼ P(θ_j | x_i, x_{i,k}, σ); 2. σ ∼ P(σ | x_i, x_{i,k}, θ_j); 3. x_{i,k} ∼ P(x_{i,k} | x_i, σ, θ_j). A sketch of this scheme is given below. Observations: the algorithm grinds to a halt as the augmentation is increased; bad mixing for σ is observed. Reason: the imputed data points determine σ, and σ determines the quadratic variation of the imputed data points. Analysis via continuous-time asymptotically equivalent diffusions (time rescaling).
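
A stripped-down sketch of such an augmented scheme, for the toy model dX = θ X dt + σ dB (not the talk's code: the imputation step uses a plain Brownian bridge as an approximation, whereas an exact sampler would add a Metropolis-Hastings correction for the drift). Running it with K increased from 1 to, say, 50 imputed points per interval illustrates the growing autocorrelation of the σ draws described above.

import numpy as np

rng = np.random.default_rng(4)

def gibbs_sde(x_obs, Dt, K, n_iter=2000):
    """Data-augmentation Gibbs sampler for dX = theta*X dt + sigma dB,
    observed at spacing Dt, with K imputed points per interval (Euler likelihood)."""
    delta = Dt / (K + 1)
    theta, sig2 = 0.0, 1.0
    draws = []
    # initial imputation: linear interpolation between observations
    path = np.concatenate([np.linspace(a, b, K + 2)[:-1]
                           for a, b in zip(x_obs[:-1], x_obs[1:])] + [x_obs[-1:]])
    for _ in range(n_iter):
        # 1) theta | path, sigma  (normal full conditional, flat prior)
        xp, dx = path[:-1], np.diff(path)
        prec = (xp ** 2).sum() * delta / sig2
        mean = (xp * dx).sum() / sig2 / prec
        theta = mean + rng.standard_normal() / np.sqrt(prec)
        # 2) sigma^2 | path, theta  (inverse gamma, Jeffreys prior)
        resid = dx - theta * xp * delta
        sig2 = 1.0 / rng.gamma(len(dx) / 2, 2 * delta / (resid ** 2).sum())
        # 3) re-impute interior points: Brownian bridge between each data pair
        segs = []
        for a, b in zip(x_obs[:-1], x_obs[1:]):
            w = np.cumsum(np.sqrt(sig2 * delta) * rng.standard_normal(K + 1))
            t = np.arange(1, K + 2) / (K + 1)
            bridge = a + w - t * w[-1] + t * (b - a)     # pinned at a and b
            segs.append(np.concatenate([[a], bridge[:-1]]))
        path = np.concatenate(segs + [x_obs[-1:]])
        draws.append((theta, np.sqrt(sig2)))
    return np.array(draws)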
