Bayesian inference for stochastic differential mixed effects models - initial steps

Bayesian inference for stochastic differential mixed effects models - initial steps
Gavin Whitaker
2nd May 2012
Supervisors: RJB and AG

Outline
- Mixed effects stochastic differential equations (SDEs)
- Bayesian inference for SDEs
- Toy model (Roberts and Stramer 2001 paper)

SDE Models
Consider an Itô process $\{X_t, t \geq 0\}$ satisfying
$$dX_t = \alpha(X_t, \theta)\,dt + \sqrt{\beta(X_t, \theta)}\,dW_t$$
- $X_t$ is the value of the process at time $t$
- $\alpha(X_t, \theta)$ is the drift
- $\beta(X_t, \theta)$ is the diffusion coefficient
- $W_t$ is standard Brownian motion
- $X_0$ is the vector of initial conditions
Seek a numerical solution via (for example) the Euler-Maruyama approximation
$$\Delta X_t \equiv X_{t+\Delta t} - X_t = \alpha(X_t, \theta)\,\Delta t + \sqrt{\beta(X_t, \theta)}\,\Delta W_t, \qquad \Delta W_t \sim N(0, I\,\Delta t)$$

SDE Models: CIR Model
$$dX_t = (\theta_1 - \theta_2 X_t)\,dt + \theta_3\sqrt{X_t}\,dW_t$$
Used to model short-term interest rates. The process is mean reverting (towards $\theta_1/\theta_2$).

SDE Models: CIR Model
[Figure: Numerical solution for CIR model, $\theta_1 = 1$, $\theta_2 = 0.1$, $X_0 = 15$; simulated paths of $X_t$ against time ($0 \le t \le 20$) for $\theta_3 = 0.5$ and $\theta_3 = 0.2$.]
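A minimal Python sketch of the Euler-Maruyama approximation applied to the CIR model, using the parameter values quoted in the figure caption. The function name `simulate_cir`, the step size and the random seed are illustrative assumptions, and the truncation `max(x, 0)` inside the square root is one common fix-up (an assumption here) for Euler steps that land below zero.

```python
import numpy as np


def simulate_cir(theta1, theta2, theta3, x0, t_end, dt, rng):
    """Euler-Maruyama path of dX = (theta1 - theta2 X) dt + theta3 sqrt(X) dW on [0, t_end]."""
    n_steps = int(round(t_end / dt))
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        # Assumed fix-up: an Euler step may land below 0, where sqrt(X) is undefined,
        # so the diffusion term uses max(X, 0).
        x_pos = max(x[k], 0.0)
        dw = rng.normal(0.0, np.sqrt(dt))  # Delta W ~ N(0, dt)
        drift = (theta1 - theta2 * x[k]) * dt
        x[k + 1] = x[k] + drift + theta3 * np.sqrt(x_pos) * dw
    return x


# Parameter values from the figure caption; step size and seed are arbitrary.
rng = np.random.default_rng(1)
paths = {th3: simulate_cir(1.0, 0.1, th3, 15.0, t_end=20.0, dt=0.01, rng=rng)
         for th3 in (0.5, 0.2)}
```

With $\theta_3 = 0$ the scheme reduces to Euler's method for the ODE $dx/dt = 1 - 0.1x$, whose solution decays from 15 towards the mean-reversion level 10, which is a convenient sanity check.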

SDE Models: Aphid Growth Model
Also known as plant lice, or greenfly. They are small sap-sucking insects. Some species of ants farm aphids for the honeydew they release. These dairying ants milk the aphids by stroking the …

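SDE Models: Aphid Growth Model
$$d\begin{pmatrix} N_t \\ C_t \end{pmatrix} = \begin{pmatrix} \lambda N_t - \mu N_t C_t \\ \lambda N_t \end{pmatrix} dt + \begin{pmatrix} \lambda N_t + \mu N_t C_t & \lambda N_t \\ \lambda N_t & \lambda N_t \end{pmatrix}^{1/2} dW_t$$
- $N_t$ is the aphid population size at time $t$
- $C_t$ is the cumulative population at time $t$
- This model is an SDE approximation to an underlying stochastic kinetic model with a birth rate of $\lambda N_t$ and a death rate of $\mu N_t C_t$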

SDE Models: Aphid Growth Model
[Figure: Numerical solution for aphid model, $\lambda = 1.75$, $\mu = 0.001$; population sizes $N_t$ and $C_t$ (0 to 3500) against time ($0 \le t \le 10$).]
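A sketch of the same Euler-Maruyama scheme for the two-dimensional aphid model. The matrix power $1/2$ is taken here as a lower-triangular (Cholesky) factor $S$ with $SS^\top$ equal to the diffusion matrix; any such square root gives the same Gaussian increment distribution. The function names, the initial conditions $N_0 = C_0 = 5$ and the clipping at zero are hypothetical choices, not taken from the slides.

```python
import numpy as np


def aphid_sqrt_diffusion(n, c, lam, mu):
    """Lower-triangular S with S @ S.T = [[lam*n + mu*n*c, lam*n], [lam*n, lam*n]]."""
    a = lam * n  # birth hazard, lambda N_t
    b = mu * n * c  # death hazard, mu N_t C_t
    s11 = np.sqrt(a + b)
    s21 = a / s11 if s11 > 0 else 0.0
    # a - s21**2 = a*b/(a+b) >= 0; max() guards against tiny negative rounding
    s22 = np.sqrt(max(a - s21 ** 2, 0.0))
    return np.array([[s11, 0.0], [s21, s22]])


def simulate_aphids(lam, mu, n0, c0, t_end, dt, rng):
    """Euler-Maruyama path of (N_t, C_t); rows are time points."""
    n_steps = int(round(t_end / dt))
    out = np.empty((n_steps + 1, 2))
    out[0] = (n0, c0)
    for k in range(n_steps):
        n, c = out[k]
        drift = np.array([lam * n - mu * n * c, lam * n])
        dw = rng.normal(0.0, np.sqrt(dt), size=2)
        step = out[k] + drift * dt + aphid_sqrt_diffusion(n, c, lam, mu) @ dw
        out[k + 1] = np.maximum(step, 0.0)  # assumed fix-up: keep sizes non-negative
    return out


# lambda and mu from the figure caption; N_0 = C_0 = 5 is a hypothetical start.
path = simulate_aphids(1.75, 0.001, n0=5.0, c0=5.0, t_end=10.0, dt=0.01,
                       rng=np.random.default_rng(42))
```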

Mixed Effects SDE Models
What if experimental units are not identical? Suppose the units have common parameters $\theta$ but different parameters $b^i$. We treat the $b^i$ as random effects with a population profile. This gives us a stochastic differential mixed-effects model for the experimental units:
$$dX_t^i = \alpha(X_t^i, \theta, b^i)\,dt + \sqrt{\beta(X_t^i, \theta, b^i)}\,dW_t^i, \qquad i = 1, \ldots, M$$
Differences between units are down to different realisations of the Brownian motion paths $W_t^i$ and the random effects $b^i$. This allows us to split the total variation into within- and between-individual components.
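To make the mixed-effects structure concrete, here is a sketch that simulates $M$ units from a hypothetical mean-reverting model $dX_t^i = (\theta_1 + b^i - \theta_2 X_t^i)\,dt + \theta_3\,dW_t^i$ with $b^i \sim N(0, \sigma_b^2)$. This drift, diffusion and random-effect distribution are assumptions chosen for illustration; the point is that every unit shares $\theta$ but draws its own $b^i$ and its own Brownian path.

```python
import numpy as np


def simulate_mixed_effects(theta, sigma_b, x0, m_units, t_end, dt, rng):
    """Euler-Maruyama paths for M units of a hypothetical mixed-effects SDE.

    Returns the random effects b (length M) and paths x (M rows, one per unit).
    """
    theta1, theta2, theta3 = theta  # common (fixed-effect) parameters
    b = rng.normal(0.0, sigma_b, size=m_units)  # unit-specific random effects b^i
    n_steps = int(round(t_end / dt))
    x = np.empty((m_units, n_steps + 1))
    x[:, 0] = x0
    for k in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=m_units)  # independent W^i increments
        x[:, k + 1] = x[:, k] + (theta1 + b - theta2 * x[:, k]) * dt + theta3 * dw
    return b, x


b, paths = simulate_mixed_effects((1.0, 0.1, 0.2), sigma_b=0.3, x0=15.0,
                                  m_units=5, t_end=20.0, dt=0.01,
                                  rng=np.random.default_rng(0))
```

Setting $\sigma_b = 0$ and $\theta_3 = 0$ removes both sources of between-unit variation, so all units then follow the same path.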

Bayesian inference for SDEs
This is problematic due to the intractability of the transition density characterising the process; in other words, we typically can't solve an SDE analytically. So we could just work with the Euler approximation. However, given data $d$ at equidistant times $t_0, t_1, \ldots, t_n$, the Euler approximation may be unsatisfactory over the inter-observation interval $\Delta t = t_{i+1} - t_i$ unless $\Delta t$ is small. We therefore adopt a data augmentation scheme.

Bayesian inference for SDEs
Introduce a partition of $[t_i, t_{i+1}]$ as
$$t_i = \tau_{im} < \tau_{im+1} < \cdots < \tau_{(i+1)m} = t_{i+1}, \qquad \Delta\tau \equiv \tau_{k+1} - \tau_k = \frac{t_{i+1} - t_i}{m}$$
Apply the Euler approximation over each interval of width $\Delta\tau$. This introduces $m - 1$ latent values between every pair of observations.

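Bayesian inference for SDEs
Formulate the joint posterior for parameters and latent values:
- $d = (x_{t_0}, x_{t_1}, \ldots, x_{t_n})$ = observed data
- $x = (x_{\tau_1}, x_{\tau_2}, \ldots, x_{\tau_{m-1}}, x_{\tau_{m+1}}, \ldots, x_{\tau_{nm-1}})$ = latent path
- $(x, d) = (x_{\tau_0}, x_{\tau_1}, \ldots, x_{\tau_m}, x_{\tau_{m+1}}, \ldots, x_{\tau_{nm}})$ = augmented path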
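The bookkeeping for the augmented path can be sketched as follows: build the fine grid $\tau_0, \ldots, \tau_{nm}$ and a mask marking which points are observed ($\tau_{im} = t_i$). `augmented_grid` is a hypothetical helper; it also copes with unequal observation spacing, which the equidistant setting above does not need.

```python
import numpy as np


def augmented_grid(obs_times, m):
    """Fine grid tau_0..tau_{nm} with m - 1 latent times between observations.

    Returns (tau, observed) where observed[k] is True when tau_k is an observation time.
    """
    obs_times = np.asarray(obs_times, dtype=float)
    n = len(obs_times) - 1
    tau = np.empty(n * m + 1)
    for i in range(n):
        # tau_{im}, ..., tau_{(i+1)m - 1} split [t_i, t_{i+1}) into m equal steps
        width = obs_times[i + 1] - obs_times[i]
        tau[i * m:(i + 1) * m] = obs_times[i] + width * np.arange(m) / m
    tau[-1] = obs_times[-1]
    observed = np.zeros(n * m + 1, dtype=bool)
    observed[::m] = True  # every m-th point is an observation
    return tau, observed


tau, observed = augmented_grid([0.0, 1.0, 2.0], m=4)
```

For observation times $0, 1, 2$ and $m = 4$ this gives the grid $0, 0.25, \ldots, 2$, with $n(m-1) = 6$ latent points.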

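Bayesian inference for SDEs
Formulate the joint posterior for parameters and latent data as
$$\pi(\theta, x \mid d) \propto \pi(\theta)\,\pi(x, d \mid \theta) \propto \pi(\theta)\prod_{i=0}^{nm-1}\pi(x_{\tau_{i+1}} \mid x_{\tau_i}, \theta)$$
where
$$\pi(x_{\tau_{i+1}} \mid x_{\tau_i}, \theta) = \phi\left(x_{\tau_{i+1}};\, x_{\tau_i} + \alpha(x_{\tau_i}, \theta)\,\Delta\tau,\, \beta(x_{\tau_i}, \theta)\,\Delta\tau\right)$$
and $\phi(\cdot\,; \mu, \Sigma)$ denotes the Gaussian density with mean $\mu$ and variance $\Sigma$. The posterior distribution is typically analytically intractable.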
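A sketch of the Euler log-likelihood $\log \pi(x, d \mid \theta)$ for a scalar SDE on an equally spaced augmented grid; adding $\log \pi(\theta)$ gives the unnormalised log posterior. The helper name and the convention that `alpha` and `beta` are vectorised Python callables taking `(x, theta)` are assumptions.

```python
import numpy as np


def euler_log_lik(path, dtau, alpha, beta, theta):
    """Sum over i of log phi(x_{tau_{i+1}}; x_{tau_i} + alpha*dtau, beta*dtau).

    path: augmented path (x_{tau_0}, ..., x_{tau_{nm}}) of a scalar SDE.
    """
    path = np.asarray(path, dtype=float)
    x_now, x_next = path[:-1], path[1:]
    mean = x_now + alpha(x_now, theta) * dtau
    var = beta(x_now, theta) * dtau
    return np.sum(-0.5 * np.log(2 * np.pi * var) - 0.5 * (x_next - mean) ** 2 / var)


def log_posterior(path, dtau, alpha, beta, theta, log_prior):
    """Unnormalised log pi(theta, x | d) = log pi(theta) + log pi(x, d | theta)."""
    return log_prior(theta) + euler_log_lik(path, dtau, alpha, beta, theta)
```

For the toy model later on ($\alpha = 0$, $\beta = 1/\theta$, $\Delta\tau = 1/m$) this reduces to $\frac{m}{2}\log\frac{m\theta}{2\pi} - \frac{\theta}{2}\, m\sum_i (x_{i/m} - x_{(i-1)/m})^2$.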

A Gibbs sampling approach
We therefore sample via an MCMC scheme, e.g. a Gibbs sampler, alternating between draws of
- $\theta \mid x, d$
- $x \mid \theta, d$
The last step can be done (for example) in blocks of length $m - 1$ between observations; Metropolis within Gibbs updates may be needed. Problem: the mixing is poor for large $m$.

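Toy model
Consider the SDE
$$dX_t = \frac{1}{\sqrt{\theta}}\,dW_t$$
Suppose that we have observations $X_0 = x_0 = 0$ and $X_1 = x_1$. Set $\tau_i = i/m$ for $i = 0, 1, \ldots, m$, so that
$$(x, d) = (\underbrace{x_0}_{\text{obs}}, \underbrace{x_{1/m}, x_{2/m}, \ldots, x_{(m-1)/m}}_{\text{path (bridge)}}, \underbrace{x_1}_{\text{obs}})$$
Under the Euler approximation
$$X_{i/m} \mid x_{(i-1)/m}, \theta \sim N\left(x_{(i-1)/m}, \frac{1}{m\theta}\right)$$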

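Toy model
Hence
$$\pi(x, d \mid \theta) \propto \prod_{i=1}^{m} \sqrt{\frac{m\theta}{2\pi}} \exp\left\{-\frac{m\theta}{2}\left(x_{i/m} - x_{(i-1)/m}\right)^2\right\} \propto \theta^{m/2} \exp\left\{-\frac{\theta}{2}\, m\sum_{i=1}^{m}\left(x_{i/m} - x_{(i-1)/m}\right)^2\right\}$$
Take prior $\theta \sim \text{Exp}(1)$. The full conditional for $\theta$ is
$$\pi(\theta \mid x, d) \propto \pi(\theta)\,\pi(x, d \mid \theta)$$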

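Toy model
$$\pi(\theta \mid x, d) \propto \theta^{m/2} \exp\left\{-\frac{\theta}{2}\,\Sigma_X\right\} e^{-\theta} = \theta^{m/2} \exp\left\{-\theta\left(\frac{\Sigma_X}{2} + 1\right)\right\}$$
where
$$\Sigma_X = m\sum_{i=1}^{m}\left(x_{i/m} - x_{(i-1)/m}\right)^2$$
Therefore, in the shape-rate parameterisation,
$$\theta \mid x, d \sim \Gamma\left(\frac{m}{2} + 1, \frac{\Sigma_X}{2} + 1\right)$$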
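Drawing from this full conditional is short, but NumPy's `Generator.gamma` is parameterised by shape and *scale*, so the rate $\Sigma_X/2 + 1$ must be inverted. The helper names below are hypothetical.

```python
import numpy as np


def sigma_x(path, m):
    """Sigma_X = m * sum_i (x_{i/m} - x_{(i-1)/m})^2 for the path x_0, x_{1/m}, ..., x_1."""
    return m * np.sum(np.diff(path) ** 2)


def draw_theta(path, m, rng):
    """One draw from theta | x, d ~ Gamma(shape = m/2 + 1, rate = Sigma_X/2 + 1)."""
    rate = sigma_x(path, m) / 2 + 1
    return rng.gamma(shape=m / 2 + 1, scale=1.0 / rate)  # NumPy uses scale = 1/rate
```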

Toy model
Under the linear Gaussian structure of the simple SDE, the full conditional $x \mid \theta, d$ can be sampled using
$$X_{i/m} \mid x_1, \theta = \frac{i}{m}x_1 + \frac{1}{\sqrt{\theta}} Z_{i/m}, \qquad i = 1, 2, \ldots, m-1$$
where $\{Z_t, 0 \le t \le 1\}$ is a standard Brownian bridge, that is, a standard Brownian motion conditioned to hit 0 at time 0 and at time 1.
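A sketch of this bridge draw, assuming the standard construction $Z_t = W_t - tW_1$ from a Brownian motion $W$ simulated on the grid $t = i/m$ (one of several equivalent ways to sample a Brownian bridge). The function names are illustrative.

```python
import numpy as np


def brownian_bridge(m, rng):
    """Standard Brownian bridge at t = 0, 1/m, ..., 1 via Z_t = W_t - t W_1."""
    t = np.arange(m + 1) / m
    w = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(1.0 / m), size=m))])
    return w - t * w[-1]


def draw_path(x1, theta, m, rng):
    """x | theta, d for the toy model: X_{i/m} = (i/m) x1 + Z_{i/m} / sqrt(theta)."""
    t = np.arange(m + 1) / m
    return t * x1 + brownian_bridge(m, rng) / np.sqrt(theta)
```

The returned array includes both observed endpoints, $x_0 = 0$ and $x_1$, so it can be passed straight to the $\theta$ update.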

Toy model
Simulated data
- Take $x_0 = 0$, $\theta = 1$
- Simulate $x_1$ using the Euler scheme
- Get $x_1 = 0.6947$, so $d = (0, 0.6947)$
MCMC scheme
We perform a run of 1000 iterations of the scheme with no thinning. Initialise with $\theta^{(0)} = 1$, the prior mean.
- Step 1: Update the discretised Brownian bridge which hits $x_1$ at $t = 1$
- Step 2: Draw $\theta$ from its full conditional distribution, $\theta \mid x, d \sim \Gamma\left(\frac{m}{2} + 1, \frac{\Sigma_X}{2} + 1\right)$
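Putting the two steps together gives a minimal sketch of the Gibbs sampler for the toy model, plus a sample autocorrelation function for $\log(1/\theta)$. It follows the scheme above (1000 iterations, no thinning, $\theta^{(0)} = 1$), but the seed and function names are assumptions, so its output will not reproduce the figures exactly.

```python
import numpy as np


def gibbs_toy(x1, m, n_iter, rng, theta0=1.0):
    """Gibbs sampler for dX_t = theta^{-1/2} dW_t with X_0 = 0, X_1 = x1 observed."""
    t = np.arange(m + 1) / m
    theta = theta0  # initialise at the prior mean
    trace = np.empty(n_iter)
    for it in range(n_iter):
        # Step 1: x | theta, d, a discretised Brownian bridge from 0 to x1
        w = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(1.0 / m), size=m))])
        path = t * x1 + (w - t * w[-1]) / np.sqrt(theta)
        # Step 2: theta | x, d ~ Gamma(m/2 + 1, rate = Sigma_X/2 + 1)
        sigma_x = m * np.sum(np.diff(path) ** 2)
        theta = rng.gamma(shape=m / 2 + 1, scale=1.0 / (sigma_x / 2 + 1))
        trace[it] = theta
    return trace


def acf(series, max_lag):
    """Sample autocorrelations at lags 0..max_lag."""
    s = np.asarray(series, dtype=float) - np.mean(series)
    denom = np.dot(s, s)
    return np.array([np.dot(s[:len(s) - k], s[k:]) / denom for k in range(max_lag + 1)])


rng = np.random.default_rng(2012)
for m in (10, 100, 1000):
    trace = gibbs_toy(x1=0.6947, m=m, n_iter=1000, rng=rng)
    print(m, round(acf(np.log(1 / trace), 1)[1], 3))  # lag-1 autocorrelation
```

The printed lag-1 autocorrelation should move towards 1 as $m$ grows, in line with the trace and autocorrelation plots.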

Toy model: Results
[Figure: Trace plots for $\log(1/\theta)$ over 1000 iterations, one panel each for $m = 10$, $m = 100$ and $m = 1000$.]

Toy model: Results
[Figure: Autocorrelation plots for $\log(1/\theta)$, lags 0 to 150, for $m = 10$, $m = 100$ and $m = 1000$.]

Toy model: Results
Mixing gets even worse for larger $m$. Why is this happening? We try to quantify the mixing time by considering the parameter update, which can be rewritten in a clever way...

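Toy model: What's going wrong?
Recall that
$$X_{i/m} \mid x_1, \theta = \frac{i}{m}x_1 + \frac{1}{\sqrt{\theta}} Z_{i/m}$$
which also holds at $i = 0$ and $i = m$ since $Z_0 = Z_1 = 0$. Now
$$\Sigma_X = m\sum_{i=1}^{m}\left(\frac{i}{m}x_1 + \frac{1}{\sqrt{\theta}}Z_{i/m} - \left[\frac{i-1}{m}x_1 + \frac{1}{\sqrt{\theta}}Z_{(i-1)/m}\right]\right)^2 = m\sum_{i=1}^{m}\left(\frac{1}{\sqrt{\theta}}\left[Z_{i/m} - Z_{(i-1)/m}\right] + \frac{x_1}{m}\right)^2$$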

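Toy model: What's going wrong?
Expanding out
$$\Sigma_X = \frac{m}{\theta}\sum_{i=1}^{m}\left[Z_{i/m} - Z_{(i-1)/m}\right]^2 + m\sum_{i=1}^{m}\frac{x_1^2}{m^2} + \frac{2x_1}{\sqrt{\theta}}\sum_{i=1}^{m}\left[Z_{i/m} - Z_{(i-1)/m}\right] = \frac{1}{\theta}\Sigma_Z + x_1^2 + \frac{2x_1}{\sqrt{\theta}}\sum_{i=1}^{m}\left[Z_{i/m} - Z_{(i-1)/m}\right]$$
where $\Sigma_Z = m\sum_{i=1}^{m}\left(Z_{i/m} - Z_{(i-1)/m}\right)^2$. Now the last sum telescopes:
$$\sum_{i=1}^{m}\left[Z_{i/m} - Z_{(i-1)/m}\right] = \left(Z_{1/m} - Z_0\right) + \left(Z_{2/m} - Z_{1/m}\right) + \cdots + \left(Z_{(m-1)/m} - Z_{(m-2)/m}\right) + \left(Z_1 - Z_{(m-1)/m}\right)$$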

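Toy model: What's going wrong?
So
$$\Sigma_X = \frac{1}{\theta}\Sigma_Z + x_1^2 + \frac{2x_1}{\sqrt{\theta}}\left[Z_1 - Z_0\right]$$
But $Z_0 = Z_1 = 0$ since $Z$ is a standard Brownian bridge. Thus
$$\Sigma_X = \frac{1}{\theta}\Sigma_Z + x_1^2, \qquad \text{where } \Sigma_Z = m\sum_{i=1}^{m}\left(Z_{i/m} - Z_{(i-1)/m}\right)^2$$
Using properties of Gaussian random variables we have $\Sigma_Z \sim \chi^2_{m-1}$: the scaled increments $\sqrt{m}\,(Z_{i/m} - Z_{(i-1)/m})$, $i = 1, \ldots, m$, are jointly Gaussian with covariance $I - \frac{1}{m}\mathbf{1}\mathbf{1}^\top$, a projection of rank $m - 1$.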
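The two facts used here can be checked numerically in a short sketch: (a) the identity $\Sigma_X = \Sigma_Z/\theta + x_1^2$ holds exactly (up to rounding) for any bridge draw, and (b) $\Sigma_Z$ has the mean $m - 1$ and variance $2(m - 1)$ of a $\chi^2_{m-1}$ variable. The values of $m$, $\theta$ and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(7)
m, theta, x1 = 50, 2.0, 0.6947
t = np.arange(m + 1) / m


def bridge(rng, m):
    """Standard Brownian bridge on t = i/m via Z_t = W_t - t W_1."""
    w = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(1.0 / m), size=m))])
    return w - t * w[-1]


# (a) the identity holds path by path
z = bridge(rng, m)
path = t * x1 + z / np.sqrt(theta)
sigma_x = m * np.sum(np.diff(path) ** 2)
sigma_z = m * np.sum(np.diff(z) ** 2)
print(np.isclose(sigma_x, sigma_z / theta + x1 ** 2))  # prints True

# (b) Sigma_Z matches the chi-squared(m - 1) mean and variance
draws = np.array([m * np.sum(np.diff(bridge(rng, m)) ** 2) for _ in range(20000)])
print(draws.mean(), draws.var())  # compare with m - 1 = 49 and 2(m - 1) = 98
```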

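Toy model: What's going wrong?
Now
$$\theta \mid x, d \sim \Gamma\left(\frac{m}{2} + 1, \frac{\Sigma_X}{2} + 1\right)$$
If $H \sim \Gamma\left(\frac{m}{2} + 1, 1\right)$, then
$$\theta_{\text{new}} = \frac{H}{\frac{\Sigma_X}{2} + 1} = \frac{H}{\frac{x_1^2}{2} + \frac{\chi^2_{m-1}}{2\theta_{\text{old}}} + 1}$$
where the $\chi^2_{m-1}$ variable ($\Sigma_Z$) is independent of $H$.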

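Toy model: What's going wrong?
For large $m$, approximate $H$ and $\chi^2_{m-1}$ with Normal random variables. Roberts and Stramer then use a suitable Taylor expansion of the expression for $\theta_{\text{new}}$ to give
$$\theta_{\text{new}} \approx \theta_{\text{old}}\left\{1 + \left(\frac{2}{m}\right)^{1/2}(W_1 - W_2) + O\left(\frac{1}{m}\right)\right\}$$
where $W_1$ and $W_2$ are independent $N(0, 1)$ random variables. Hence
$$\theta_{\text{new}} = \theta_{\text{old}}\left\{1 + O\left(m^{-1/2}\right)\right\}$$
Each update changes $\theta$ by a relative amount of order $m^{-1/2}$, so the chain needs $O(m)$ iterations to move an $O(1)$ distance: the mixing time is $O(m)$.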
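A quick Monte Carlo check of the leading-order term, using the exact reformulated update with independent $H \sim \Gamma(m/2 + 1, 1)$ and $\chi^2_{m-1}$ draws: the standard deviation of $\theta_{\text{new}}/\theta_{\text{old}} - 1$ should be close to that of $(2/m)^{1/2}(W_1 - W_2)$, namely $2/\sqrt{m}$. The helper name, $\theta_{\text{old}} = 1$ and the sample size are assumptions for illustration.

```python
import numpy as np


def theta_update(theta_old, x1, m, rng, size):
    """theta_new = H / (x1^2/2 + chi2_{m-1}/(2 theta_old) + 1), vectorised over `size` draws."""
    h = rng.gamma(shape=m / 2 + 1, scale=1.0, size=size)  # H ~ Gamma(m/2 + 1, 1)
    chi2 = rng.chisquare(m - 1, size=size)  # Sigma_Z ~ chi-squared(m - 1)
    return h / (x1 ** 2 / 2 + chi2 / (2 * theta_old) + 1)


rng = np.random.default_rng(0)
theta_old = 1.0
for m in (100, 1000, 10000):
    rel = theta_update(theta_old, 0.6947, m, rng, size=200_000) / theta_old - 1
    # Leading-order prediction for the spread of the relative step: 2 / sqrt(m)
    print(m, rel.std(), 2 / np.sqrt(m))
```

A relative step of size about $2/\sqrt{m}$ means a random walk in $\log\theta$ needs of order $m/4$ steps to move a distance of 1, which is the $O(m)$ mixing time.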

Future work
- Construct MCMC schemes for arbitrary nonlinear diffusion processes:
  - naive schemes with a block update for the path
  - better schemes that use a reparameterisation
  - joint update of path and parameters (pMCMC)
- Application to mixed effects SDEs: aphid model, real data examples

References
Roberts, G. O. and Stramer, O. (2001). On inference for partially observed nonlinear diffusion models using the Metropolis-Hastings algorithm. Biometrika, 88(3), 603-621.
Gillespie, C. S. and Golightly, A. (2010). Bayesian inference for generalized stochastic population growth models with application to aphids. JRSS Series C, Applied Statistics, 59(2), 341-357.