Lecture 5: Importance sampling and Hamilton-Jacobi equations

Henrik Hult
Department of Mathematics, KTH Royal Institute of Technology, Sweden
Summer School on Monte Carlo Methods and Rare Events, Brown University, June 13-17, 2016

Outline
1. Large deviations and Hamilton-Jacobi equations
2. Exponential decay of the second moment
3. Construction of subsolutions

The subsolution approach to efficient importance sampling
- Quantify performance as the exponential rate of decay of the estimator's second moment.
- That rate is given as the initial value of the solution to a Hamilton-Jacobi equation.
- Constructing efficient importance sampling algorithms is then essentially equivalent to constructing classical subsolutions of the corresponding HJ equation.

Large deviations and Hamilton-Jacobi equations

Consider a process {X^n} with continuous trajectories, satisfying a large deviations principle (LDP). The LDP says, roughly, that given T > 0, an absolutely continuous $\varphi : [0, T] \to \mathbb{R}^d$, and a small δ > 0,

$$P\Big\{\sup_{t \in [0,T]} \|X^n(t) - \varphi(t)\| < \delta\Big\} \approx e^{-n I_T(\varphi)},$$

where $I_T$ is the rate function and takes the form

$$I_T(\varphi) = \int_0^T L(\varphi(t), \dot{\varphi}(t))\, dt,$$

and L is the local rate function. For an expectation $E[\exp\{-nF(X^n(T))\}]$ we have, similarly, that

$$E[\exp\{-n F(X^n(T))\}] \approx \exp\Big\{-n \inf_{\varphi}\{I_T(\varphi) + F(\varphi(T))\}\Big\}.$$
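
A worked special case (my addition, not from the slides): for the running mean of iid N(0,1) variables, the Hamiltonian, the local rate function, and the pathwise rate function are all explicit.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Gaussian example: H, L and I_T in closed form for iid N(0,1) increments.
\begin{align*}
H(\alpha) &= \log E\, e^{\alpha Z} = \tfrac12 \alpha^2,
  && Z \sim N(0,1), \\
L(\beta)  &= \sup_{\alpha \in \mathbb{R}} \bigl[\alpha\beta - H(\alpha)\bigr]
           = \tfrac12 \beta^2,
  && \text{maximizer } \alpha^{*} = \beta, \\
I_T(\varphi) &= \int_0^T L\bigl(\dot{\varphi}(t)\bigr)\, dt
             = \tfrac12 \int_0^T \dot{\varphi}(t)^2\, dt.
\end{align*}
\end{document}
```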

Recall the Markov random walk model

Let $\{v_i(x),\ x \in \mathbb{R}^d,\ i \ge 0\}$ be independent and identically distributed random vector fields with distribution $P\{v_i(x) \in \cdot\} = \theta(\cdot \mid x)$, where θ is a regular conditional probability distribution. Let

$$X^n_{i+1} = X^n_i + \frac{1}{n} v_i(X^n_i), \qquad X^n_0 = x_0.$$

Denote the log moment generating function of $\theta(\cdot \mid x)$ by

$$H(x, \alpha) = \log E[\exp\{\langle \alpha, v_1(x) \rangle\}]$$

and suppose $H(x, \alpha) < \infty$ for all x and α in $\mathbb{R}^d$. The Fenchel-Legendre transform (convex conjugate) of $H(x, \cdot)$ is denoted by

$$L(x, \beta) = \sup_{\alpha \in \mathbb{R}^d} [\langle \alpha, \beta \rangle - H(x, \alpha)].$$
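
A minimal simulation sketch (my own illustration, not part of the lecture), assuming state-dependent Gaussian increments $v_i(x) \sim N(b(x), I)$, for which $H(x, \alpha) = \langle \alpha, b(x)\rangle + |\alpha|^2/2$ and $L(x, \beta) = |\beta - b(x)|^2/2$; the drift b below is a hypothetical choice.

```python
import numpy as np

def simulate_path(n, x0, b, rng):
    """Simulate X^n_{i+1} = X^n_i + (1/n) v_i(X^n_i) with v_i(x) ~ N(b(x), I)."""
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for _ in range(n):
        v = b(x) + rng.standard_normal(len(x))   # one increment v_i(x)
        x = x + v / n
        path.append(x.copy())
    return np.array(path)                        # shape (n + 1, d)

def H(x, alpha, b):
    """Log-mgf of N(b(x), I): H(x, alpha) = <alpha, b(x)> + |alpha|^2 / 2."""
    return alpha @ b(x) + 0.5 * alpha @ alpha

def L(x, beta, b):
    """Legendre transform of H(x, .): L(x, beta) = |beta - b(x)|^2 / 2."""
    diff = beta - b(x)
    return 0.5 * diff @ diff

rng = np.random.default_rng(0)
b = lambda x: -x                                 # hypothetical drift field
path = simulate_path(n=1000, x0=[1.0], b=b, rng=rng)
print(path[-1])                                  # X^n_n, near 0 for this drift
```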

The backward equation

Let $\mathcal{A}^n$ denote the backward evolution operator associated with $X^n$, that is,

$$\mathcal{A}^n f(i, x) = E_{i,x}\big[f(i+1, X^n_{i+1}) - f(i, x)\big] = \int \Big[f\Big(i+1, x + \tfrac{1}{n} z\Big) - f(i, x)\Big]\, \theta(dz \mid x).$$

The (Kolmogorov) backward equation implies that $V^n(i, x) = E_{i,x}[\exp\{-nF(X^n_n)\}]$ satisfies

$$\mathcal{A}^n V^n(i, x) = 0, \qquad V^n(n, x) = \exp\{-nF(x)\},$$

where $V^n(0, x_0) = E[\exp\{-nF(X^n_n)\}]$ is the quantity we are interested in computing.
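
To make the target quantity concrete, here is a plain Monte Carlo sketch (again my own, same Gaussian model and a hypothetical F) for $V^n(0, x_0) = E[\exp\{-nF(X^n_n)\}]$.

```python
import numpy as np

def mc_estimate(n, x0, F, b, num_samples, seed=0):
    """Plain Monte Carlo estimate of V^n(0, x0) = E[exp{-n F(X^n_n)}],
    with Gaussian increments v_i(x) ~ N(b(x), I) as above."""
    rng = np.random.default_rng(seed)
    vals = np.empty(num_samples)
    for k in range(num_samples):
        x = np.array(x0, dtype=float)
        for _ in range(n):                       # X^n_{i+1} = X^n_i + v_i / n
            x += (b(x) + rng.standard_normal(len(x))) / n
        vals[k] = np.exp(-n * F(x))
    return vals.mean(), vals.std() / np.sqrt(num_samples)

F = lambda x: 0.5 * float(x @ x)                 # hypothetical terminal cost
est, se = mc_estimate(200, [1.0], F, lambda x: -x, num_samples=2000)
print(f"V^n(0, x0) estimate: {est:.4e} +/- {se:.1e}")
# In rare-event scalings the relative error of plain Monte Carlo grows
# exponentially in n, which is what motivates the schemes that follow.
```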

The Hamilton-Jacobi equation

Redefine $V^n$ by

$$V^n\Big(\tfrac{i}{n}, x\Big) := -\tfrac{1}{n} \log E_{i,x}\big[\exp\{-nF(X^n_n)\}\big].$$

Plugging this transformation into the backward equation leads to the equation for $V^n$:

$$0 = \int \Big[\exp\Big\{-n\Big[V^n\Big(\tfrac{i}{n} + \tfrac{1}{n}, x + \tfrac{1}{n} z\Big) - V^n\Big(\tfrac{i}{n}, x\Big)\Big]\Big\} - 1\Big]\, \theta(dz \mid x).$$

The Hamilton-Jacobi equation

Assuming $V^n \to V$, with V smooth and $i/n \to t$, we may approximate

$$V^n\Big(\tfrac{i}{n} + \tfrac{1}{n}, x + \tfrac{1}{n} z\Big) - V^n\Big(\tfrac{i}{n}, x\Big) \approx V\Big(\tfrac{i}{n} + \tfrac{1}{n}, x + \tfrac{1}{n} z\Big) - V\Big(\tfrac{i}{n}, x\Big) \approx \tfrac{1}{n}\big[V_t(t, x) + \langle DV(t, x), z \rangle\big],$$

... which leads to

$$1 = \int \exp\{-V_t(t, x) - \langle DV(t, x), z \rangle\}\, \theta(dz \mid x) = e^{-V_t(t, x) + H(x, -DV(t, x))},$$

... and taking logarithms on both sides yields

$$V_t(t, x) - H(x, -DV(t, x)) = 0,$$

with terminal condition $V(T, x) = F(x)$.
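
For the Gaussian case (an illustration of mine, with H(x, α) = |α|²/2 independent of x), the terminal value problem can be solved in closed form by the Hopf-Lax formula, previewing the variational representation on the next slide.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Gaussian case: H(x, p) = |p|^2/2, so the equation reads
% V_t - |DV|^2/2 = 0 with V(T, x) = F(x). Substituting U(s, x) = V(T - s, x)
% gives U_s + |DU|^2/2 = 0, U(0, x) = F(x), solved by the Hopf--Lax formula:
\[
V(t, x) = \inf_{y \in \mathbb{R}^d}
  \left[ \frac{\lVert y - x \rVert^2}{2(T - t)} + F(y) \right],
\]
% i.e. the variational representation with straight-line optimal paths; the
% running cost of the line from x to y over [t, T] is |y - x|^2 / (2(T - t)).
\end{document}
```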

The Hamilton-Jacobi equation

We have formally derived the Hamilton-Jacobi equation for the exponential rate of decay $V(t, x)$:

$$V_t(t, x) - H(x, -DV(t, x)) = 0, \qquad V(T, x) = F(x).$$

Using the theory of viscosity solutions and variational representations, one can show that the unique viscosity solution can be given the variational representation

$$V(t, x) = \inf\Big\{\int_t^T L(\varphi(s), \dot{\varphi}(s))\, ds + F(\varphi(T))\Big\},$$

where the infimum is taken over all absolutely continuous φ with $\varphi(t) = x$. In particular we can identify the exponential rate as the initial value:

$$V(0, x_0) = \inf\{I_T(\varphi) + F(\varphi(T))\}.$$
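
A numerical sketch (mine, under the Gaussian assumptions above, so L(x, β) = |β − b(x)|²/2) that approximates the variational representation by discretizing φ on a time grid; taking F equal to 0 at a fixed endpoint and +∞ elsewhere reduces V(0, x_0) to the two-point minimal-action problem coded here.

```python
import numpy as np
from scipy.optimize import minimize

def min_action(x0, xT, T, b, m=50):
    """Minimize int_0^T L(phi, phi') dt, L(x, beta) = |beta - b(x)|^2 / 2,
    over discretized paths with phi(0) = x0 and phi(T) = xT fixed."""
    dt = T / m

    def action(interior):
        phi = np.concatenate(([x0], interior, [xT]))
        beta = np.diff(phi) / dt                 # phi'(t), piecewise constant
        return 0.5 * np.sum((beta - b(phi[:-1])) ** 2) * dt

    guess = np.linspace(x0, xT, m + 1)[1:-1]     # straight line to start
    res = minimize(action, guess, method="BFGS")
    return res.fun

# iid N(0,1) increments: b = 0, the optimal path is the straight line and
# the minimal action from 0 to y in unit time is y^2 / 2.
val = min_action(x0=0.0, xT=1.5, T=1.0, b=lambda x: 0.0 * x, m=40)
print(val, 1.5 ** 2 / 2)                         # both approximately 1.125
```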

Exponential decay of the second moment

Recall from Lecture 2 that the importance sampling estimator for computing $E[\exp\{-nF(X^n_n)\}]$ is given by

$$\frac{dP}{d\bar P^{\bar\alpha}}\, e^{-nF(\bar X^n_n)} = \exp\Big\{\sum_{i=0}^{n-1}\big[-\langle \bar\alpha^n_i, Z_i \rangle + H(\bar X^n_i, \bar\alpha^n_i)\big]\Big\}\, \exp\{-nF(\bar X^n_n)\}.$$

The second moment of the estimator, starting from x at time j, is given by

$$\bar E^{\bar\alpha}_{j,x}\Big[\Big(\exp\Big\{\sum_{i=j}^{n-1}\big[-\langle \bar\alpha^n_i, Z_i \rangle + H(\bar X^n_i, \bar\alpha^n_i)\big]\Big\}\, \exp\{-nF(\bar X^n_n)\}\Big)^2\Big] = E_{j,x}\big[\exp\{S^{\bar\alpha}_n - S^{\bar\alpha}_j - 2nF(X^n_n)\}\big],$$

where, in the notation of Lecture 2, $S^{\bar\alpha}_k = \sum_{i=0}^{k-1}\big[H(X^n_i, \bar\alpha^n_i) - \langle \bar\alpha^n_i, Z_i \rangle\big]$.
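
A sketch of the estimator (my own, under the same Gaussian assumptions, where tilting by $\bar\alpha$ simply shifts the increment mean and H(x, α) = ⟨α, b(x)⟩ + |α|²/2); it also returns the empirical second moment, whose exponential decay rate is the $W^n(0, x_0)$ studied on the following slides.

```python
import numpy as np

def is_estimate(n, x0, F, b, alpha_fn, num_samples, seed=0):
    """Importance sampling estimate of E[exp{-n F(X^n_n)}] for the Gaussian
    model v_i(x) ~ N(b(x), I); tilting by alpha shifts the mean by alpha."""
    rng = np.random.default_rng(seed)
    vals = np.empty(num_samples)
    for k in range(num_samples):
        x = np.array(x0, dtype=float)
        log_lr = 0.0                              # log dP/dPbar along the path
        for i in range(n):
            a = alpha_fn(i / n, x)                # feedback control alpha(t, x)
            z = b(x) + a + rng.standard_normal(len(x))     # tilted increment Z_i
            log_lr += -(a @ z) + (a @ b(x) + 0.5 * a @ a)  # -<a, Z> + H(x, a)
            x = x + z / n
        vals[k] = np.exp(log_lr - n * F(x))
    m1 = vals.mean()
    m2 = (vals ** 2).mean()                       # -log(m2)/n estimates W(0, x0)
    return m1, m2

# alpha_fn = lambda t, x: np.zeros_like(x) recovers plain Monte Carlo; an
# efficient alpha_fn comes from a subsolution, alpha = -D Vbar (see below).
```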

Exponential decay of the second moment

We expect the second moment to decay exponentially fast in n, and as above we abuse notation (from Lecture 2) and redefine $W^n$ by

$$W^n\Big(\tfrac{j}{n}, x\Big) := -\tfrac{1}{n} \log E_{j,x}\big[e^{S^{\bar\alpha}_n - S^{\bar\alpha}_j - 2nF(X^n_n)}\big].$$

Using the backward equation for the second moment from Lecture 2 gives

$$0 = \int \Big[e^{-n[W^n(\frac{j}{n} + \frac{1}{n},\, x + \frac{1}{n} z) - W^n(\frac{j}{n}, x)] - \langle \bar\alpha_j(x), z \rangle + H(x, \bar\alpha_j(x))} - 1\Big]\, \theta(dz \mid x).$$

Assuming $W^n \to W$, with W smooth, $j/n \to t$, and $\bar\alpha^n_j(x) \to \bar\alpha(t, x)$, we may approximate

$$W^n\Big(\tfrac{j}{n} + \tfrac{1}{n}, x + \tfrac{1}{n} z\Big) - W^n\Big(\tfrac{j}{n}, x\Big) \approx W\Big(\tfrac{j}{n} + \tfrac{1}{n}, x + \tfrac{1}{n} z\Big) - W\Big(\tfrac{j}{n}, x\Big) \approx \tfrac{1}{n}\big[W_t(t, x) + \langle DW(t, x), z \rangle\big],$$

Exponential decay of the second moment

... which leads to

$$1 = \int e^{-W_t(t, x) - \langle DW(t, x) + \bar\alpha(t, x),\, z \rangle + H(x, \bar\alpha(t, x))}\, \theta(dz \mid x) = e^{-W_t(t, x) + H(x, -DW(t, x) - \bar\alpha(t, x)) + H(x, \bar\alpha(t, x))},$$

... and taking logarithms on both sides yields

$$W_t(t, x) - H(x, -DW(t, x) - \bar\alpha(t, x)) - H(x, \bar\alpha(t, x)) = 0,$$

with terminal condition $W(T, x) = 2F(x)$.

Exponential decay of the second moment

We have formally derived the Hamilton-Jacobi equation for the exponential rate of decay of the second moment of an importance sampling algorithm based on the change of measure $\bar\alpha$:

$$W_t(t, x) - H(x, -DW(t, x) - \bar\alpha(t, x)) - H(x, \bar\alpha(t, x)) = 0, \qquad W(T, x) = 2F(x).$$

To summarize: we have established a Hamilton-Jacobi equation for $V(t, x)$ such that $E[\exp\{-nF(X^n(T))\}] \approx e^{-nV(0, x_0)}$. Moreover, given a change of measure $\bar\alpha^n_j(x) \to \bar\alpha(t, x)$ as $n \to \infty$ and $j/n \to t$, the second moment of the importance sampling estimator is approximately equal to $\exp\{-nW(0, x_0)\}$.

The role of subsolutions

Let $\bar V$ be a classical subsolution: a continuously differentiable function such that

$$\bar V_t(t, x) - H(x, -D\bar V(t, x)) \ge 0, \qquad \bar V(T, x) \le F(x).$$

Consider the importance sampling algorithm designed by taking $\bar\alpha(t, x) = -D\bar V(t, x)$. Then

$$\bar W(t, x) = V(t, x) + \bar V(t, x)$$

is a viscosity subsolution of the HJ equation for W.

The role of subsolutions

Indeed, since V is a (viscosity) solution and $\bar V$ is a subsolution, and since $\bar\alpha = -D\bar V$ implies $-D\bar W - \bar\alpha = -DV$,

$$\bar W_t(t, x) - H(x, -D\bar W(t, x) - \bar\alpha(t, x)) - H(x, \bar\alpha(t, x)) = \big[V_t(t, x) - H(x, -DV(t, x))\big] + \big[\bar V_t(t, x) - H(x, -D\bar V(t, x))\big] \ge 0,$$

... and for the terminal condition we have

$$\bar W(T, x) = V(T, x) + \bar V(T, x) \le 2F(x).$$

Hence $V + \bar V$ is a viscosity subsolution and

$$V(t, x) + \bar V(t, x) \le W(t, x)$$

for all $t \le T$ and all x. In particular, at the starting point $(0, x_0)$ we have $V(0, x_0) + \bar V(0, x_0) \le W(0, x_0)$.

The role of subsolutions

The subsolution property leads to an asymptotic upper bound on the second moment:

$$\exp\{-nW(0, x_0)\} \le \exp\{-nV(0, x_0)\}\, \exp\{-n\bar V(0, x_0)\}.$$

We also have, from Jensen's inequality, that the second moment is larger than the square of the first moment,

$$\exp\{-nW(0, x_0)\} \ge \exp\{-n \cdot 2V(0, x_0)\},$$

which leads to $W(0, x_0) \le 2V(0, x_0)$. Consequently, if we can find a classical subsolution $\bar V$ with $\bar V(0, x_0) = V(0, x_0)$, then the importance sampling algorithm based on taking $\bar\alpha^n_j(x) = -D\bar V(j/n, x)$ will be asymptotically optimal.

The Gaussian random walk: construction of subsolutions

Consider the mean of iid N(0,1) random variables and the probability

$$P\{X^n_n \in (-\infty, a] \cup [b, \infty)\},$$

with $a < 0 < b$, so that $H(\alpha) = \alpha^2/2$. Let

$$V_1(t, x) = a(b - x) - (1 - t)H(a), \qquad V_2(t, x) = b(a - x) - (1 - t)H(b).$$

For $V_i$, $i = 1, 2$, we have

$$V_{i,t}(t, x) - H(-DV_i(t, x)) = 0;$$

for $V_1$, for instance, the left-hand side equals $H(a) - H(a) = 0$.

The Gaussian random walk: construction of subsolutions

We propose to take

$$\bar V(t, x) = V_1(t, x) \wedge V_2(t, x).$$

For the terminal condition we have

$$\bar V(1, x) = ab + (-ax) \wedge (-bx) \le 0, \qquad x \notin (a, b).$$

Thus $\bar V$ is a subsolution (in fact a viscosity subsolution). Since the minimum is not continuously differentiable, a mollification argument leads us to take $\bar V$ as

$$\bar V^{\delta}(t, x) = -\delta \log\big(e^{-V_1(t, x)/\delta} + e^{-V_2(t, x)/\delta}\big),$$

for some small δ > 0.
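
Combining the pieces, a sketch (my own; the levels a, b and the mollification parameter δ are hypothetical choices) of the resulting algorithm for the two-sided event. The change of measure is $\bar\alpha(t, x) = -D\bar V^{\delta}(t, x)$, which works out to a softmin-weighted average of the two tilts a and b.

```python
import numpy as np

a, b, delta = -0.25, 0.2, 0.02           # hypothetical levels (a < 0 < b)
H = lambda alpha: 0.5 * alpha ** 2       # log-mgf of N(0,1)

def V1(t, x): return a * (b - x) - (1 - t) * H(a)
def V2(t, x): return b * (a - x) - (1 - t) * H(b)

def alpha_bar(t, x):
    """alpha(t, x) = -D Vbar^delta(t, x), a softmin-weighted mix of the
    tilts a and b, since DV1 = -a and DV2 = -b."""
    s = np.clip((V1(t, x) - V2(t, x)) / delta, -60.0, 60.0)
    w1 = 1.0 / (1.0 + np.exp(s))         # weight of the V1 (level-a) branch
    return w1 * a + (1.0 - w1) * b

def is_probability(n, num_samples, seed=0):
    """IS estimate of P{X^n_n in (-inf, a] U [b, inf)} for iid N(0,1) steps."""
    rng = np.random.default_rng(seed)
    vals = np.empty(num_samples)
    for k in range(num_samples):
        x, log_lr = 0.0, 0.0
        for i in range(n):
            al = alpha_bar(i / n, x)
            z = al + rng.standard_normal()        # tilted step: N(alpha, 1)
            log_lr += -al * z + H(al)             # -<alpha, Z> + H(alpha)
            x += z / n
        vals[k] = np.exp(log_lr) if (x <= a or x >= b) else 0.0
    return vals.mean(), (vals ** 2).mean()

est, m2 = is_probability(n=100, num_samples=5000)
print(est, -np.log(m2) / 100)            # decay rate of the second moment
```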

The epidemic model: construction of subsolutions

Consider the probability that the infection, starting from $x_0 > 1 - \rho^{-1}$, reaches a high level $x_1 > x_0$ before returning to $1 - \rho^{-1}$. This is a stationary problem, and the design of an importance sampling algorithm is related to finding a classical subsolution of the stationary Hamilton-Jacobi equation

$$H(x, -DV(x)) = 0, \qquad V(x_1) = 0, \quad V(1 - \rho^{-1}) = \infty,$$

where

$$H(x, \alpha) = \lambda_1(x)(e^{\alpha} - 1) + \lambda_{-1}(x)(e^{-\alpha} - 1).$$

The epidemic model: construction of subsolutions

In this case we can actually work out the quasi-potential V(x). Indeed, first consider $\alpha(x)$ as a solution to $H(x, \alpha) = 0$. Solving the quadratic equation in $e^{\alpha}$, the solution is

$$\alpha(x) = \log\Bigg[\frac{\lambda_1(x) + \lambda_{-1}(x)}{2\lambda_1(x)} \pm \sqrt{\Big(\frac{\lambda_1(x) + \lambda_{-1}(x)}{2\lambda_1(x)}\Big)^2 - \frac{\lambda_{-1}(x)}{\lambda_1(x)}}\Bigg];$$

the discriminant simplifies to $|\lambda_1(x) - \lambda_{-1}(x)|/(2\lambda_1(x))$, so the "+" root is $\alpha(x) = \log(\lambda_{-1}(x)/\lambda_1(x))$ when $\lambda_{-1}(x) \ge \lambda_1(x)$. To obtain V(x) we simply need to integrate:

$$V(x) = \int_x^{x_1} \log\Bigg[\frac{\lambda_1(z) + \lambda_{-1}(z)}{2\lambda_1(z)} + \sqrt{\Big(\frac{\lambda_1(z) + \lambda_{-1}(z)}{2\lambda_1(z)}\Big)^2 - \frac{\lambda_{-1}(z)}{\lambda_1(z)}}\Bigg]\, dz.$$
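
A numerical sketch (my own; the rates are an assumption, using the SIS-type choice $\lambda_1(x) = \rho x(1 - x)$, $\lambda_{-1}(x) = x$, whose rest point is $1 - \rho^{-1}$, in the spirit of the epidemic model from the earlier lectures): compute $\alpha(z) = \log(\lambda_{-1}(z)/\lambda_1(z))$ and the quasi-potential V(x) by quadrature.

```python
import numpy as np
from scipy.integrate import quad

rho, x1 = 2.0, 0.8                         # assumed parameters; rest point 0.5
lam_up   = lambda x: rho * x * (1.0 - x)   # lambda_1(x), assumed SIS form
lam_down = lambda x: x                     # lambda_{-1}(x)

def alpha(x):
    """Nontrivial root of H(x, alpha) = 0: alpha(x) = log(lam_down/lam_up)."""
    return np.log(lam_down(x) / lam_up(x))

def quasi_potential(x):
    """V(x) = int_x^{x1} alpha(z) dz, normalized so that V(x1) = 0."""
    val, _ = quad(alpha, x, x1)
    return val

for x in (0.55, 0.6, 0.7, 0.8):
    print(f"V({x:.2f}) = {quasi_potential(x):+.4f}")
# The change of measure alpha = -DV flips the jump rates: the tilted up-rate
# is lam_up * e^alpha = lam_down and the tilted down-rate is lam_up, i.e. the
# dynamics are time-reversed so that upward excursions become typical.
```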
