Lecture 5: Importance sampling and Hamilton-Jacobi equations


Lecture 5: Importance sampling and Hamilton-Jacobi equations
Henrik Hult, Department of Mathematics, KTH Royal Institute of Technology, Sweden
Summer School on Monte Carlo Methods and Rare Events, Brown University, June 13-17, 2016

Outline
1. Large deviations and Hamilton-Jacobi equations
2. Exponential decay of the second moment
3. Construction of subsolutions

The subsolution approach to efficient importance sampling. Performance is quantified as the exponential rate of decay of the second moment of the estimator. This rate is given by the initial value of the solution to a Hamilton-Jacobi (HJ) equation. Constructing efficient importance sampling algorithms is then essentially equivalent to constructing classical subsolutions of the corresponding HJ equation.

Large deviations and Hamilton-Jacobi equations. Consider a process $\{X^n\}$ with continuous trajectories, satisfying a large deviations principle (LDP). The LDP says, roughly, that given $T > 0$, an absolutely continuous $\varphi : [0,T] \to \mathbb{R}^d$ and a small $\delta > 0$,
\[ P\Big\{ \sup_{t \in [0,T]} |X^n(t) - \varphi(t)| < \delta \Big\} \approx e^{-n I_T(\varphi)}, \]
where the rate function $I_T$ takes the form
\[ I_T(\varphi) = \int_0^T L(\varphi(t), \dot\varphi(t))\, dt, \]
and $L$ is the local rate function. For an expectation $E[\exp\{-n F(X^n(T))\}]$ we have, similarly, that
\[ E[\exp\{-n F(X^n(T))\}] \approx \exp\Big\{ -n \inf_\varphi \{ I_T(\varphi) + F(\varphi(T)) \} \Big\}. \]
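As a quick numerical illustration of this heuristic (a side check, not part of the lecture): for the mean of $n$ iid $N(0,1)$ variables the tail probability is available exactly via `erfc`, and $-\tfrac{1}{n}\log P\{X^n(1) \ge b\}$ approaches the rate $L(b) = b^2/2$:

```python
import math

def tail_prob(n: int, b: float) -> float:
    # the mean of n iid N(0,1) variables is N(0, 1/n): exact tail via erfc
    return 0.5 * math.erfc(b * math.sqrt(n / 2.0))

def decay_rate(n: int, b: float) -> float:
    # -(1/n) log P{ X^n(1) >= b }; the LDP predicts the limit L(b) = b^2/2
    return -math.log(tail_prob(n, b)) / n

for n in (10, 100, 1000):
    print(n, decay_rate(n, 1.0))  # decreases toward b^2/2 = 0.5
```

The $O(\log n / n)$ prefactor correction is visible at moderate $n$, which is one reason rate estimates from direct simulation converge slowly.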

Recall the Markov random walk model. Let $\{v_i(x),\ x \in \mathbb{R}^d,\ i \geq 0\}$ be independent and identically distributed random vector fields with distribution $P\{v_i(x) \in \cdot\} = \theta(\cdot \mid x)$, where $\theta$ is a regular conditional probability distribution. Let
\[ X^n_{i+1} = X^n_i + \tfrac{1}{n} v_i(X^n_i), \qquad X^n_0 = x_0. \]
Denote the log moment generating function of $\theta(\cdot \mid x)$ by
\[ H(x, \alpha) = \log E[\exp\{\langle \alpha, v_1(x) \rangle\}] \]
and suppose $H(x, \alpha) < \infty$ for all $x$ and $\alpha$ in $\mathbb{R}^d$. The Fenchel-Legendre transform (convex conjugate) of $H(x, \cdot)$ is denoted by
\[ L(x, \beta) = \sup_{\alpha \in \mathbb{R}^d} \big[ \langle \alpha, \beta \rangle - H(x, \alpha) \big]. \]
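The convex conjugate rarely needs to be known in closed form; a crude numerical Legendre transform is often enough. A sketch (my own, assuming a one-dimensional $N(0,1)$ step, for which $H(\alpha) = \alpha^2/2$ and hence $L(\beta) = \beta^2/2$):

```python
import math

def H_gauss(alpha: float) -> float:
    # log moment generating function of a N(0,1) step
    return 0.5 * alpha * alpha

def legendre(H, beta: float, lo: float = -20.0, hi: float = 20.0,
             steps: int = 40000) -> float:
    # grid search for L(beta) = sup_alpha [ alpha*beta - H(alpha) ]
    best = -math.inf
    for k in range(steps + 1):
        alpha = lo + (hi - lo) * k / steps
        best = max(best, alpha * beta - H(alpha))
    return best

print(legendre(H_gauss, 1.5))  # close to 1.5**2 / 2 = 1.125
```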

The backward equation. Let $\mathcal{A}^n$ denote the backward evolution operator associated with $X^n$, that is,
\[ \mathcal{A}^n f(i, x) = E_{i,x}\big[ f(i+1, X^n_{i+1}) - f(i, x) \big] = \int \Big[ f\big(i+1, x + \tfrac{1}{n} z\big) - f(i, x) \Big]\, \theta(dz \mid x). \]
The (Kolmogorov) backward equation implies that $V^n(i, x) = E_{i,x}[\exp\{-n F(X^n_n)\}]$ satisfies
\[ \mathcal{A}^n V^n(i, x) = 0, \qquad V^n(n, x) = \exp\{-n F(x)\}, \]
where $V^n(0, x_0) = E[\exp\{-n F(X^n_n)\}]$ is the quantity we are interested in computing.
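The backward equation can be checked directly on a toy model. Below is a sketch (my own, with a hypothetical two-point step distribution $\theta(\{+1\} \mid x) = \theta(\{-1\} \mid x) = 1/2$ and $F(x) = x^2$) comparing the backward recursion $V^n(i,x) = E[V^n(i+1, X^n_{i+1})]$ against brute-force enumeration of all paths:

```python
import math
from itertools import product

n = 6
F = lambda x: x * x

def direct(x0: float = 0.0) -> float:
    # brute-force E[exp(-n F(X^n_n))] over all 2^n step sequences z_i = +/-1
    total = 0.0
    for steps in product((1.0, -1.0), repeat=n):
        x = x0
        for z in steps:
            x += z / n
        total += math.exp(-n * F(x))
    return total / 2 ** n

def backward(x0: float = 0.0) -> float:
    # terminal condition V^n(n, x) = exp(-n F(x)); then
    # A^n V^n = 0  <=>  V^n(i, x) = E[ V^n(i+1, x + z/n) ]
    # reachable positions are x0 + k/n, so key the table by the integer k
    V = {k: math.exp(-n * F(x0 + k / n)) for k in range(-n, n + 1)}
    for i in range(n - 1, -1, -1):
        V = {k: 0.5 * (V[k + 1] + V[k - 1]) for k in range(-i, i + 1)}
    return V[0]

print(direct(), backward())  # identical up to rounding
```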

The Hamilton-Jacobi equation. Redefine $V^n$ by
\[ V^n\big(\tfrac{i}{n}, x\big) := -\tfrac{1}{n} \log E_{i,x}\big[ \exp\{-n F(X^n_n)\} \big]. \]
Plugging this transformation into the backward equation leads to the equation for $V^n$:
\[ 0 = \int \Big[ \exp\Big\{ -n \Big[ V^n\big(\tfrac{i}{n} + \tfrac{1}{n}, x + \tfrac{1}{n} z\big) - V^n\big(\tfrac{i}{n}, x\big) \Big] \Big\} - 1 \Big]\, \theta(dz \mid x). \]

The Hamilton-Jacobi equation. Assuming $V^n \to V$, with $V$ smooth and $i/n \to t$, we may approximate
\[ V^n\big(\tfrac{i}{n} + \tfrac{1}{n}, x + \tfrac{1}{n} z\big) - V^n\big(\tfrac{i}{n}, x\big) \approx V\big(\tfrac{i}{n} + \tfrac{1}{n}, x + \tfrac{1}{n} z\big) - V\big(\tfrac{i}{n}, x\big) \approx \tfrac{1}{n} \big[ V_t(t, x) + \langle DV(t, x), z \rangle \big], \]
which leads to
\[ 1 = \int \exp\{ -V_t(t, x) - \langle DV(t, x), z \rangle \}\, \theta(dz \mid x) = e^{-V_t(t,x) + H(x, -DV(t,x))}, \]
and taking logarithms on both sides yields
\[ V_t(t, x) - H(x, -DV(t, x)) = 0, \]
with terminal condition $V(T, x) = F(x)$.

The Hamilton-Jacobi equation. We have formally derived the Hamilton-Jacobi equation for the exponential rate of decay $V(t, x)$:
\[ V_t(t, x) - H(x, -DV(t, x)) = 0, \qquad V(T, x) = F(x). \]
Using the theory of viscosity solutions and variational representations, one can show that the unique viscosity solution admits the variational representation
\[ V(t, x) = \inf \Big\{ \int_t^T L(\varphi(s), \dot\varphi(s))\, ds + F(\varphi(T)) \Big\}, \]
where the infimum is taken over all absolutely continuous $\varphi$ with $\varphi(t) = x$. In particular, we can identify the exponential rate as the initial value:
\[ V(0, x_0) = \inf\{ I_T(\varphi) + F(\varphi(T)) \}. \]
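In simple cases the variational representation can be evaluated by direct minimization. A sketch (my own, not from the lecture): for state-independent $H(\alpha) = \alpha^2/2$ we have $L(\beta) = \beta^2/2$, Jensen's inequality makes straight-line paths optimal for a fixed endpoint, and with $T = 1$ the value reduces to $V(0, x_0) = \min_y [\,(y - x_0)^2/2 + F(y)\,]$. Here $F(y) = y^2$ and $x_0 = 1$ are hypothetical choices:

```python
def F(y: float) -> float:
    # hypothetical terminal cost
    return y * y

def V0(x0: float, lo: float = -5.0, hi: float = 5.0,
       steps: int = 100000) -> float:
    # V(0, x0) = min_y [ (y - x0)^2 / 2 + F(y) ] by grid search;
    # the straight line from x0 to y has I_1 cost (y - x0)^2 / 2
    best = float("inf")
    for k in range(steps + 1):
        y = lo + (hi - lo) * k / steps
        best = min(best, (y - x0) ** 2 / 2.0 + F(y))
    return best

print(V0(1.0))  # calculus gives the minimizer y = 1/3 and value 1/3
```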

Exponential decay of the second moment. Recall from Lecture 2 that the importance sampling estimator for computing $E[\exp\{-n F(X^n_n)\}]$ is given by
\[ \frac{dP}{dP^{\bar\alpha}}\, e^{-n F(\bar X^n_n)} = \exp\Big\{ \sum_{i=0}^{n-1} \big[ -\langle \bar\alpha^n_i, Z_i \rangle + H(\bar X^n_i, \bar\alpha^n_i) \big] \Big\} \exp\{-n F(\bar X^n_n)\}. \]
Writing $S^{\bar\alpha}_k = \sum_{i=0}^{k-1} [ -\langle \bar\alpha^n_i, Z_i \rangle + H(\bar X^n_i, \bar\alpha^n_i) ]$, the second moment of the estimator, starting from $x$ at time $j$, is given by
\[ E^{\bar\alpha}_{j,x}\Big[ \Big( \exp\Big\{ \sum_{i=j}^{n-1} \big[ -\langle \bar\alpha^n_i, Z_i \rangle + H(\bar X^n_i, \bar\alpha^n_i) \big] \Big\} \exp\{-n F(\bar X^n_n)\} \Big)^2 \Big] = E_{j,x}\big[ \exp\{ S^{\bar\alpha}_n - S^{\bar\alpha}_j - 2n F(X^n_n) \} \big]. \]
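As a concrete instance (my own sketch, not from the slides): estimating $P\{X^n_n \ge b\}$ for the Gaussian random walk with the constant tilt $\bar\alpha = b$, each tilted step is $N(b, 1)$ and the accumulated likelihood ratio is $\exp\{-b S_n + n H(b)\}$ with $H(b) = b^2/2$:

```python
import math
import random

def is_estimate(n: int, b: float, n_samples: int = 20000,
                seed: int = 1) -> float:
    # importance sampling for P{ X^n_n >= b } with steps drawn from the
    # exponentially tilted measure: v_i ~ N(b, 1), H(b) = b^2 / 2
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        s = sum(rng.gauss(b, 1.0) for _ in range(n))  # S_n under P^alpha
        if s / n >= b:
            # likelihood ratio dP/dP^alpha = exp{ -b S_n + n H(b) }
            total += math.exp(-b * s + n * 0.5 * b * b)
    return total / n_samples

n, b = 20, 1.0
est = is_estimate(n, b)
exact = 0.5 * math.erfc(b * math.sqrt(n / 2.0))  # exact Gaussian tail
print(est, exact)
```

With the tilt the event $\{S_n/n \ge b\}$ is no longer rare under the sampling measure, so a modest sample size gives small relative error.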

Exponential decay of the second moment. We expect the second moment to decay exponentially fast in $n$; as above, we abuse notation (from Lecture 2) and redefine $W^n$ by
\[ W^n\big(\tfrac{j}{n}, x\big) := -\tfrac{1}{n} \log E_{j,x}\big[ e^{S^{\bar\alpha}_n - S^{\bar\alpha}_j - 2n F(X^n_n)} \big]. \]
Using the backward equation for the second moment from Lecture 2 gives
\[ 0 = \int \Big[ e^{ -n [ W^n(\frac{j}{n} + \frac{1}{n}, x + \frac{1}{n} z) - W^n(\frac{j}{n}, x) ] - \langle \bar\alpha_j(x), z \rangle + H(x, \bar\alpha_j(x)) } - 1 \Big]\, \theta(dz \mid x). \]
Assuming $W^n \to W$, with $W$ smooth, $j/n \to t$, and $\bar\alpha^n_j(x) \to \bar\alpha(t, x)$, we may approximate
\[ W^n\big(\tfrac{j}{n} + \tfrac{1}{n}, x + \tfrac{1}{n} z\big) - W^n\big(\tfrac{j}{n}, x\big) \approx W\big(\tfrac{j}{n} + \tfrac{1}{n}, x + \tfrac{1}{n} z\big) - W\big(\tfrac{j}{n}, x\big) \approx \tfrac{1}{n} \big[ W_t(t, x) + \langle DW(t, x), z \rangle \big]. \]

Exponential decay of the second moment. This leads to
\[ 1 = \int e^{ -W_t(t,x) - \langle DW(t,x) + \bar\alpha(t,x), z \rangle + H(x, \bar\alpha(t,x)) }\, \theta(dz \mid x) = e^{ -W_t(t,x) + H(x, -DW(t,x) - \bar\alpha(t,x)) + H(x, \bar\alpha(t,x)) }, \]
and taking logarithms on both sides yields
\[ W_t(t, x) - H(x, -DW(t, x) - \bar\alpha(t, x)) - H(x, \bar\alpha(t, x)) = 0, \]
with terminal condition $W(T, x) = 2F(x)$.

Exponential decay of the second moment. We have formally derived the Hamilton-Jacobi equation for the exponential rate of decay of the second moment of an importance sampling algorithm based on the change of measure $\bar\alpha$:
\[ W_t(t, x) - H(x, -DW(t, x) - \bar\alpha(t, x)) - H(x, \bar\alpha(t, x)) = 0, \qquad W(T, x) = 2F(x). \]
We have already established a Hamilton-Jacobi equation for $V(t, x)$ such that $E[\exp\{-n F(X^n(T))\}] \approx e^{-n V(0, x_0)}$. Moreover, given a change of measure $\bar\alpha^n_j(x) \to \bar\alpha(t, x)$, as $n \to \infty$ and $j/n \to t$ the second moment of the importance sampling estimator is approximately equal to $\exp\{-n W(0, x_0)\}$.

The role of subsolutions. Let $\bar V$ be a classical subsolution: continuously differentiable and such that
\[ \bar V_t(t, x) - H(x, -D\bar V(t, x)) \geq 0, \qquad \bar V(T, x) \leq F(x). \]
Consider the importance sampling algorithm designed by taking $\bar\alpha(t, x) = -D\bar V(t, x)$. Then $\bar W(t, x) = V(t, x) + \bar V(t, x)$ is a viscosity subsolution of the HJ equation for $W$.

The role of subsolutions. Indeed, since $V$ is a (viscosity) solution and $\bar V$ is a subsolution,
\[ \bar W_t(t, x) - H(x, -D\bar W(t, x) - \bar\alpha(t, x)) - H(x, \bar\alpha(t, x)) = \big[ V_t(t, x) - H(x, -DV(t, x)) \big] + \big[ \bar V_t(t, x) - H(x, -D\bar V(t, x)) \big] \geq 0, \]
and for the terminal condition we have
\[ \bar W(T, x) = V(T, x) + \bar V(T, x) \leq 2F(x). \]
Hence $V + \bar V$ is a viscosity subsolution and $V(t, x) + \bar V(t, x) \leq W(t, x)$ for all $t \leq T$ and all $x$. In particular, at the starting point $(0, x_0)$ we have $V(0, x_0) + \bar V(0, x_0) \leq W(0, x_0)$.

The role of subsolutions. The subsolution property leads to an asymptotic upper bound on the second moment:
\[ \exp\{-n W(0, x_0)\} \leq \exp\{-n V(0, x_0)\} \exp\{-n \bar V(0, x_0)\}. \]
We also have, from Jensen's inequality, that the second moment is larger than the square of the first moment,
\[ \exp\{-n W(0, x_0)\} \geq \exp\{-2n V(0, x_0)\}, \]
which leads to $W(0, x_0) \leq 2 V(0, x_0)$. Consequently, if we can find a classical subsolution $\bar V$ with $\bar V(0, x_0) = V(0, x_0)$, then the importance sampling algorithm based on taking $\bar\alpha^n_j(x) = -D\bar V(j/n, x)$ will be asymptotically optimal.
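This optimality criterion can be checked numerically (my own sketch, not from the slides). For the one-sided Gaussian problem $P\{X^n_n \ge b\}$, the affine subsolution $\bar V(t,x) = b(b - x) - (1 - t)H(b)$ satisfies $\bar V(0,0) = b^2/2 = V(0,0)$, so the constant tilt $\bar\alpha = -D\bar V = b$ should be asymptotically optimal: the second moment should decay at rate $2V(0,0) = b^2$.

```python
import math
import random

def sm_rate(n: int, b: float, n_samples: int = 20000,
            seed: int = 2) -> float:
    # -(1/n) log of the estimated second moment of the tilted estimator of
    # P{ X^n_n >= b }; asymptotic optimality predicts the limit 2V(0,0) = b^2
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        s = sum(rng.gauss(b, 1.0) for _ in range(n))
        if s / n >= b:
            # square of the per-path likelihood-ratio weight times e^{-2nF}
            total += math.exp(2.0 * (-b * s + n * 0.5 * b * b))
    return -math.log(total / n_samples) / n

print(sm_rate(20, 1.0), sm_rate(40, 1.0))  # above, and approaching, b^2 = 1
```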

Construction of subsolutions: the Gaussian random walk. Consider the mean of $n$ iid $N(0,1)$ random variables and the probability $P\{X^n_n \in (-\infty, a] \cup [b, \infty)\}$, with $a < 0 < b$; here $H(\alpha) = \alpha^2/2$. Let
\[ \bar V^1(t, x) = a(b - x) - (1 - t)H(a), \qquad \bar V^2(t, x) = b(a - x) - (1 - t)H(b). \]
For $\bar V^i$, $i = 1, 2$, we have
\[ \bar V^i_t(t, x) - H(-D\bar V^i(t, x)) = H(a) - H(a) = 0 \quad \text{(and similarly with } H(b) \text{)}. \]

Construction of subsolutions: the Gaussian random walk. We propose to take $\bar V(t, x) = \bar V^1(t, x) \wedge \bar V^2(t, x)$. For the terminal condition we have
\[ \bar V(1, x) = ab + (-ax) \wedge (-bx) \leq 0, \qquad x \notin (a, b). \]
Thus $\bar V$ is a subsolution (in fact a viscosity subsolution), but the minimum of two affine functions is not continuously differentiable. A mollification argument leads us to take instead
\[ \bar V^\delta(t, x) = -\delta \log\Big( e^{-\frac{1}{\delta} \bar V^1(t,x)} + e^{-\frac{1}{\delta} \bar V^2(t,x)} \Big), \]
for some small $\delta > 0$.
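A sketch of the resulting sampler (my own, with hypothetical values $a = -1$, $b = 1$, $n = 20$, $\delta = 0.1$): the tilt $\bar\alpha(t,x) = -D\bar V^\delta(t,x)$ is a softmin-weighted average of the two constant tilts $a$ and $b$, pushing the walk toward the nearer boundary, and the estimate can be compared with the exact two-sided Gaussian tail:

```python
import math
import random

a, b, delta = -1.0, 1.0, 0.1  # hypothetical levels and mollification parameter

def V1(t: float, x: float) -> float:
    return a * (b - x) - (1.0 - t) * 0.5 * a * a

def V2(t: float, x: float) -> float:
    return b * (a - x) - (1.0 - t) * 0.5 * b * b

def tilt(t: float, x: float) -> float:
    # alpha(t, x) = -D Vbar_delta(t, x): a softmin-weighted average of the
    # two constant tilts a and b (since D V1 = -a and D V2 = -b)
    m1, m2 = V1(t, x) / delta, V2(t, x) / delta
    c = min(m1, m2)  # stabilise the exponentials
    w1, w2 = math.exp(c - m1), math.exp(c - m2)
    return (w1 * a + w2 * b) / (w1 + w2)

def is_two_sided(n: int, n_samples: int = 20000, seed: int = 3) -> float:
    # IS estimate of P{ X^n(1) <= a or X^n(1) >= b } for N(0,1) steps
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x, log_lr = 0.0, 0.0
        for i in range(n):
            alpha = tilt(i / n, x)
            z = rng.gauss(alpha, 1.0)  # tilted step, N(alpha, 1)
            log_lr += -alpha * z + 0.5 * alpha * alpha
            x += z / n
        if x <= a or x >= b:
            total += math.exp(log_lr)
    return total / n_samples

n = 20
est = is_two_sided(n)
exact = math.erfc(math.sqrt(n / 2.0))  # both tails, by symmetry
print(est, exact)
```

At the symmetric starting point the tilt is zero, and it commits to one boundary only once the path drifts away from the origin; a fixed tilt toward a single boundary would instead produce unbounded relative error here.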

Construction of subsolutions: the epidemic model. Consider the probability that the infection, starting from $x_0 > 1 - \rho^{-1}$, reaches a high level $x_1 > x_0$ before returning to $1 - \rho^{-1}$. This is a stationary problem, and the design of an importance sampling algorithm is related to finding a classical subsolution of the stationary Hamilton-Jacobi equation
\[ H(x, -DV(x)) = 0, \qquad V(x_1) = 0, \]
where
\[ H(x, \alpha) = \lambda_1(x)(e^{\alpha} - 1) + \lambda_{-1}(x)(e^{-\alpha} - 1). \]

Construction of subsolutions: the epidemic model. In this case we can actually work out the quasipotential $V(x)$. Indeed, first consider $\alpha(x)$ as a solution to $H(x, \alpha) = 0$. We note that the solutions are
\[ \alpha(x) = \log\left[ \frac{\lambda_1(x) + \lambda_{-1}(x)}{2\lambda_1(x)} \pm \sqrt{ \left( \frac{\lambda_1(x) + \lambda_{-1}(x)}{2\lambda_1(x)} \right)^2 - \frac{\lambda_{-1}(x)}{\lambda_1(x)} } \right]. \]
To obtain $V(x)$ we simply need to integrate:
\[ V(x) = \int_x^{x_1} \log\left[ \frac{\lambda_1(z) + \lambda_{-1}(z)}{2\lambda_1(z)} + \sqrt{ \left( \frac{\lambda_1(z) + \lambda_{-1}(z)}{2\lambda_1(z)} \right)^2 - \frac{\lambda_{-1}(z)}{\lambda_1(z)} } \right] dz. \]
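A numerical sketch (my own, with hypothetical SIS-type rates $\lambda_1(x) = \rho x(1-x)$ and $\lambda_{-1}(x) = x$, whose deterministic fixed point is $1 - \rho^{-1}$): evaluate $\alpha(z)$ from the formula above and integrate by the trapezoidal rule. For these rates the root simplifies to $\alpha(z) = -\log(\rho(1-z))$ above the fixed point, so the quadrature can be checked against the closed-form integral:

```python
import math

rho = 2.0  # hypothetical infection pressure, rho > 1

def lam1(x: float) -> float:
    return rho * x * (1.0 - x)  # up-rate (new infections)

def lam_1(x: float) -> float:
    return x                    # down-rate (recoveries)

def alpha(z: float) -> float:
    # the nonzero root of H(z, alpha) = 0, written as on the slide
    m = (lam1(z) + lam_1(z)) / (2.0 * lam1(z))
    d = max(0.0, m * m - lam_1(z) / lam1(z))  # clip rounding noise at the root
    return math.log(m + math.sqrt(d))

def quasipotential(x: float, x1: float, steps: int = 4000) -> float:
    # V(x) = integral of alpha(z) from x to x1, by the trapezoidal rule
    h = (x1 - x) / steps
    s = 0.5 * (alpha(x) + alpha(x1))
    for k in range(1, steps):
        s += alpha(x + k * h)
    return s * h

# alpha vanishes at the fixed point 1 - 1/rho = 0.5, where H(x, 0) = 0
print(quasipotential(0.5, 0.9))  # about 0.2391
```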