Lecture 5: Importance sampling and Hamilton-Jacobi equations
Henrik Hult
Department of Mathematics, KTH Royal Institute of Technology, Sweden
Summer School on Monte Carlo Methods and Rare Events, Brown University, June 13-17, 2016
Outline
1. Large deviations and Hamilton-Jacobi equations
2. Exponential decay of the second moment
3. Construction of subsolutions
The subsolution approach to efficient importance sampling

- Quantify performance as the exponential rate of decay of the second moment of the estimator.
- This rate is given as the initial value of a solution to a Hamilton-Jacobi equation.
- Constructing efficient importance sampling algorithms is essentially equivalent to constructing classical subsolutions of the corresponding Hamilton-Jacobi equation.
Large deviations and Hamilton-Jacobi equations

Consider a process $\{X^n\}$ with continuous trajectories satisfying a large deviations principle (LDP). The LDP says, roughly, that given $T > 0$, an absolutely continuous $\varphi : [0,T] \to \mathbb{R}^d$, and a small $\delta > 0$,
$$P\Big\{\sup_{t \in [0,T]} |X^n(t) - \varphi(t)| \le \delta\Big\} \approx e^{-n I_T(\varphi)},$$
where $I_T$ is the rate function and takes the form
$$I_T(\varphi) = \int_0^T L(\varphi(t), \dot\varphi(t))\,dt,$$
and $L$ is the local rate function. For an expectation $E[\exp\{-nF(X^n(T))\}]$ we have, similarly, that
$$E[\exp\{-nF(X^n(T))\}] \approx \exp\Big\{-n \inf_{\varphi} \{I_T(\varphi) + F(\varphi(T))\}\Big\}.$$
Recall the Markov random walk model

Let $\{v_i(x),\ x \in \mathbb{R}^d,\ i \ge 0\}$ be independent and identically distributed random vector fields with distribution $P\{v_i(x) \in \cdot\} = \theta(\cdot \mid x)$, where $\theta$ is a regular conditional probability distribution. Let
$$X^n_{i+1} = X^n_i + \frac{1}{n} v_i(X^n_i), \quad X^n_0 = x_0.$$
Denote the log moment generating function of $\theta(\cdot \mid x)$ by
$$H(x, \alpha) = \log E[\exp\{\langle \alpha, v_1(x)\rangle\}]$$
and suppose $H(x, \alpha) < \infty$ for all $x$ and $\alpha$ in $\mathbb{R}^d$. The Fenchel-Legendre transform (convex conjugate) of $H(x, \cdot)$ is denoted by
$$L(x, \beta) = \sup_{\alpha \in \mathbb{R}^d} \big[\langle \alpha, \beta\rangle - H(x, \alpha)\big].$$
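As a concrete sketch, assume state-independent $N(0,1)$ increments, so $H(x,\alpha) = \alpha^2/2$; the hypothetical snippet below evaluates the convex conjugate $L(x,\beta)$ by brute-force maximization over a grid (not how one would do it in practice, just an illustration of the definition):

```python
# Hypothetical instance of the model: state-independent N(0,1) increments
# v_i(x), so theta(.|x) is standard Gaussian and H does not depend on x.

def H(x, alpha):
    # log moment generating function of a N(0,1) increment
    return 0.5 * alpha ** 2

def L(x, beta):
    # Fenchel-Legendre transform L(x, beta) = sup_alpha [alpha*beta - H(x, alpha)],
    # approximated by a brute-force search over a grid of alpha values
    grid = [k / 100.0 for k in range(-500, 501)]
    return max(a * beta - H(x, a) for a in grid)

# for the Gaussian case the supremum is attained at alpha = beta,
# giving L(x, beta) = beta^2 / 2
print(L(0.0, 1.0))  # 0.5
```
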
The backward equation

Let $\mathcal{A}^n$ denote the backward evolution operator associated with $X^n$, that is,
$$\mathcal{A}^n f(i, x) = E_{i,x}[f(i+1, X^n_{i+1}) - f(i, x)] = \int \Big[f\Big(i+1, x + \frac{1}{n} z\Big) - f(i, x)\Big]\, \theta(dz \mid x).$$
The (Kolmogorov) backward equation implies that $V^n(i, x) = E_{i,x}[\exp\{-nF(X^n_n)\}]$ satisfies
$$\mathcal{A}^n V^n(i, x) = 0, \quad V^n(n, x) = \exp\{-nF(x)\},$$
where $V^n(0, x_0) = E[\exp\{-nF(X^n_n)\}]$ is the quantity we are interested in computing.
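To see the backward equation in action, here is a minimal sketch for a hypothetical walk with increments $\pm 1$ (so $\theta$ puts mass $1/2$ on each): the value $V^n(0, x_0)$ computed by the backward recursion $\mathcal{A}^n V^n = 0$ agrees with the direct expectation obtained by enumerating all paths.

```python
import math
from itertools import product

# Hypothetical instance: increments z = +/-1 with probability 1/2 each,
# X^n_{i+1} = X^n_i + z/n starting from x_0 = 0, and F(x) = x^2.

n = 8
F = lambda x: x ** 2

def backward_value():
    # terminal condition V^n(n, x) = exp(-n F(x)) on the lattice x = k/n
    V = {k: math.exp(-n * F(k / n)) for k in range(-n, n + 1)}
    # backward recursion: A^n V^n = 0 means V^n(i, x) is the average
    # of V^n(i+1, x + 1/n) and V^n(i+1, x - 1/n)
    for i in range(n - 1, -1, -1):
        V = {k: 0.5 * (V[k + 1] + V[k - 1]) for k in range(-i, i + 1)}
    return V[0]

def direct_value():
    # E[exp(-n F(X^n_n))] by enumerating all 2^n paths
    return sum(0.5 ** n * math.exp(-n * F(sum(p) / n))
               for p in product((-1, 1), repeat=n))

print(abs(backward_value() - direct_value()) < 1e-12)  # True
```
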
The Hamilton-Jacobi equation

Redefine $V^n$ by
$$V^n\Big(\frac{i}{n}, x\Big) := -\frac{1}{n} \log E_{i,x}[\exp\{-nF(X^n_n)\}].$$
Plugging this transformation into the backward equation leads to the equation for $V^n$:
$$0 = \int \Big[\exp\Big\{-n\Big[V^n\Big(\frac{i}{n} + \frac{1}{n}, x + \frac{1}{n} z\Big) - V^n\Big(\frac{i}{n}, x\Big)\Big]\Big\} - 1\Big]\, \theta(dz \mid x).$$
The Hamilton-Jacobi equation

Assuming $V^n \to V$, with $V$ smooth, and $i/n \to t$, we may approximate
$$V^n\Big(\frac{i}{n} + \frac{1}{n}, x + \frac{1}{n}z\Big) - V^n\Big(\frac{i}{n}, x\Big) \approx V\Big(\frac{i}{n} + \frac{1}{n}, x + \frac{1}{n}z\Big) - V\Big(\frac{i}{n}, x\Big) \approx \frac{1}{n}\big[V_t(t,x) + \langle DV(t,x), z\rangle\big],$$
which leads to
$$1 = \int \exp\{-V_t(t,x) - \langle DV(t,x), z\rangle\}\, \theta(dz \mid x) = e^{-V_t(t,x) + H(x, -DV(t,x))},$$
and taking logarithms on both sides yields
$$V_t(t,x) - H(x, -DV(t,x)) = 0,$$
with terminal condition $V(T,x) = F(x)$.
The Hamilton-Jacobi equation

We have formally derived the Hamilton-Jacobi equation for the exponential rate of decay $V(t,x)$:
$$V_t(t,x) - H(x, -DV(t,x)) = 0, \quad V(T,x) = F(x).$$
Using the theory of viscosity solutions and variational representations, one can show that the unique viscosity solution admits the variational representation
$$V(t,x) = \inf \Big\{ \int_t^T L(\varphi(s), \dot\varphi(s))\,ds + F(\varphi(T)) \Big\},$$
where the infimum is taken over all absolutely continuous $\varphi$ with $\varphi(t) = x$. In particular we can identify the exponential rate as the initial value:
$$V(0, x_0) = \inf\{ I_T(\varphi) + F(\varphi(T)) \}.$$
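As a sanity check of the variational representation, consider again the hypothetical Gaussian case with $L(\beta) = \beta^2/2$ and, say, $F(y) = (y-1)^2$. Since $L$ is convex and state-independent, the infimum is attained on straight lines, so $V(0,0)$ reduces to a one-dimensional minimization over terminal points:

```python
# Hypothetical example: L(beta) = beta^2/2 (Gaussian walk), F(y) = (y-1)^2.
# A straight line from 0 to y on [0, 1] has cost I_1 = y^2/2, so
# V(0, 0) = inf_y [ y^2/2 + (y - 1)^2 ].

def cost(y):
    return 0.5 * y ** 2 + (y - 1.0) ** 2

# brute-force minimization over a fine grid of terminal points
V0 = min(cost(k / 10000.0) for k in range(-10000, 20001))
print(round(V0, 4))  # 0.3333, attained near y = 2/3
```
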
Exponential decay of the second moment

Recall from Lecture 2 that the importance sampling estimator for computing $E[\exp\{-nF(X^n_n)\}]$ is given by
$$\frac{dP}{d\bar P^{\bar\alpha}}\, e^{-nF(\bar X^n_n)} = \exp\Big\{\sum_{i=0}^{n-1}\big[-\langle \bar\alpha^n_i, Z_i\rangle + H(\bar X^n_i, \bar\alpha^n_i)\big]\Big\}\exp\{-nF(\bar X^n_n)\}.$$
The second moment of the estimator, starting from $x$ at time $j$, is given by
$$\bar E^{\bar\alpha}_{j,x}\Big[\Big(\exp\Big\{\sum_{i=j}^{n-1}\big[-\langle \bar\alpha^n_i, Z_i\rangle + H(\bar X^n_i, \bar\alpha^n_i)\big]\Big\}\exp\{-nF(\bar X^n_n)\}\Big)^2\Big] = E_{j,x}\big[\exp\{S^{\bar\alpha}_n - S^{\bar\alpha}_j - 2nF(X^n_n)\}\big],$$
where, as in Lecture 2, $S^{\bar\alpha}_k = \sum_{i=0}^{k-1}\big[H(X^n_i, \bar\alpha^n_i) - \langle \bar\alpha^n_i, Z_i\rangle\big]$.
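For intuition, the second moment can be computed exactly in a toy case. In a hypothetical $\pm 1$ walk (so $H(\alpha) = \log\cosh\alpha$) with a constant tilt $\bar\alpha$, enumerating all paths gives the second moment of the importance sampling estimator of $p = P\{S_n/n \ge 0.8\}$ exactly; Jensen's lower bound $p^2$ and the plain Monte Carlo value ($\bar\alpha = 0$, second moment $p$) bracket it:

```python
import math
from itertools import product

# Toy model (hypothetical): increments z = +/-1 with probability 1/2,
# so H(alpha) = log cosh(alpha).  We estimate p = P(S_n / n >= 0.8)
# under a constant exponential tilt alpha and compute the estimator's
# second moment exactly by enumerating all 2^n paths.

n = 10
H = lambda a: math.log(math.cosh(a))

def second_moment(alpha):
    # E_alpha[(lr * 1_A)^2] = E[lr * 1_A], lr = exp{-(alpha*S_n - n*H(alpha))}
    sm = 0.0
    for path in product((-1, 1), repeat=n):
        s = sum(path)
        if s / n >= 0.8:
            sm += (0.5 ** n) * math.exp(-(alpha * s - n * H(alpha)))
    return sm

p = second_moment(0.0)  # with alpha = 0 the estimator is the plain indicator
print(p ** 2 <= second_moment(1.0) < p)  # True
```

The tilt $\alpha = 1$ pushes the walk toward the rare set, and its second moment sits between the Jensen bound and the plain Monte Carlo second moment, illustrating the variance reduction quantified by $W$.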
Exponential decay of the second moment

We expect the second moment to decay exponentially fast in $n$, and as above we abuse notation (from Lecture 2) and redefine $W^n$ by
$$W^n\Big(\frac{j}{n}, x\Big) := -\frac{1}{n}\log E_{j,x}\big[e^{S^{\bar\alpha}_n - S^{\bar\alpha}_j - 2nF(X^n_n)}\big].$$
Using the backward equation for the second moment from Lecture 2 gives
$$0 = \int \Big[e^{-n[W^n(\frac{j}{n}+\frac{1}{n},\, x+\frac{1}{n}z) - W^n(\frac{j}{n},\, x)] - \langle \bar\alpha_j(x), z\rangle + H(x, \bar\alpha_j(x))} - 1\Big]\,\theta(dz \mid x).$$
Assuming $W^n \to W$, with $W$ smooth, $j/n \to t$, and $\bar\alpha^n_j(x) \to \bar\alpha(t,x)$, we may approximate
$$W^n\Big(\frac{j}{n}+\frac{1}{n}, x+\frac{1}{n}z\Big) - W^n\Big(\frac{j}{n}, x\Big) \approx W\Big(\frac{j}{n}+\frac{1}{n}, x+\frac{1}{n}z\Big) - W\Big(\frac{j}{n}, x\Big) \approx \frac{1}{n}\big[W_t(t,x) + \langle DW(t,x), z\rangle\big],$$
Exponential decay of the second moment

... which leads to
$$1 = \int e^{-W_t(t,x) - \langle DW(t,x) + \bar\alpha(t,x),\, z\rangle + H(x, \bar\alpha(t,x))}\,\theta(dz \mid x) = e^{-W_t(t,x) + H(x, -DW(t,x) - \bar\alpha(t,x)) + H(x, \bar\alpha(t,x))},$$
and taking logarithms on both sides yields
$$W_t(t,x) - H(x, -DW(t,x) - \bar\alpha(t,x)) - H(x, \bar\alpha(t,x)) = 0,$$
with terminal condition $W(T,x) = 2F(x)$.
Exponential decay of the second moment

We have formally derived the Hamilton-Jacobi equation for the exponential rate of decay of the second moment of an importance sampling algorithm based on the change of measure $\bar\alpha$:
$$W_t(t,x) - H(x, -DW(t,x) - \bar\alpha(t,x)) - H(x, \bar\alpha(t,x)) = 0, \quad W(T,x) = 2F(x).$$
To summarize: we have established a Hamilton-Jacobi equation for $V(t,x)$ such that $E[\exp\{-nF(X^n(T))\}] \approx e^{-nV(0,x_0)}$. Moreover, given a change of measure $\bar\alpha^n_j(x) \to \bar\alpha(t,x)$, as $n \to \infty$ and $j/n \to t$ the second moment of the importance sampling estimator is approximately equal to $\exp\{-nW(0,x_0)\}$.
The role of subsolutions

Let $\bar V$ be a classical subsolution: a continuously differentiable function such that
$$\bar V_t(t,x) - H(x, -D\bar V(t,x)) \ge 0, \quad \bar V(T,x) \le F(x).$$
Consider the importance sampling algorithm designed by taking $\bar\alpha(t,x) = -D\bar V(t,x)$. Then $\bar W(t,x) = V(t,x) + \bar V(t,x)$ is a viscosity subsolution of the HJ equation for $W$.
The role of subsolutions

Indeed, since $V$ is a (viscosity) solution and $\bar V$ is a subsolution,
$$\bar W_t(t,x) - H(x, -D\bar W(t,x) - \bar\alpha(t,x)) - H(x, \bar\alpha(t,x)) = \big[V_t(t,x) - H(x, -DV(t,x))\big] + \big[\bar V_t(t,x) - H(x, -D\bar V(t,x))\big] \ge 0,$$
and for the terminal condition we have
$$\bar W(T,x) = V(T,x) + \bar V(T,x) \le 2F(x).$$
Hence $V + \bar V$ is a viscosity subsolution and
$$V(t,x) + \bar V(t,x) \le W(t,x) \quad \text{for all } t \le T \text{ and all } x.$$
In particular, at the starting point $(0, x_0)$ we have $V(0,x_0) + \bar V(0,x_0) \le W(0,x_0)$.
The role of subsolutions

The subsolution property leads to an asymptotic upper bound on the second moment:
$$\exp\{-nW(0,x_0)\} \le \exp\{-nV(0,x_0)\}\exp\{-n\bar V(0,x_0)\}.$$
We also have, from Jensen's inequality, that the second moment is larger than the square of the first moment,
$$\exp\{-nW(0,x_0)\} \ge \exp\{-n\, 2V(0,x_0)\},$$
which leads to $W(0,x_0) \le 2V(0,x_0)$. Consequently, if we can find a classical subsolution $\bar V$ with $\bar V(0,x_0) = V(0,x_0)$, then the importance sampling algorithm based on taking $\bar\alpha^n_j(x) = -D\bar V(j/n, x)$ will be asymptotically optimal.
The Gaussian random walk: construction of subsolutions

Consider the mean of iid $N(0,1)$ random variables and the probability $P\{\bar X^n_n \in (-\infty, a] \cup [b, \infty)\}$, where $a < 0 < b$. Let
$$\bar V_1(t,x) = a(a-x) - (1-t)H(a), \quad \bar V_2(t,x) = b(b-x) - (1-t)H(b),$$
with $H(\alpha) = \alpha^2/2$. For $\bar V_i$, $i = 1, 2$, we have
$$\frac{\partial \bar V_1}{\partial t}(t,x) - H(-D\bar V_1(t,x)) = H(a) - H(a) = 0, \quad \frac{\partial \bar V_2}{\partial t}(t,x) - H(-D\bar V_2(t,x)) = H(b) - H(b) = 0.$$
The Gaussian random walk: construction of subsolutions

We propose to take $\bar V(t,x) = \bar V_1(t,x) \wedge \bar V_2(t,x)$. For the terminal condition we have
$$\bar V(1,x) = a(a-x) \wedge b(b-x) \le 0, \quad x \notin (a,b).$$
Thus $\bar V$ is a subsolution (in fact a viscosity subsolution), but it is not classical at the points where the minimum fails to be differentiable. A mollification argument leads us to take $\bar V^\delta$ as
$$\bar V^\delta(t,x) = -\delta \log\Big(e^{-\frac{1}{\delta}\bar V_1(t,x)} + e^{-\frac{1}{\delta}\bar V_2(t,x)}\Big),$$
for some small $\delta > 0$.
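The construction can be checked numerically. The sketch below (with hypothetical values $a = -1$, $b = 1.5$, $\delta = 0.1$, and $H(\alpha) = \alpha^2/2$) evaluates the mollified function and verifies the classical subsolution inequality $\bar V^\delta_t - H(-D\bar V^\delta) \ge 0$ on a grid, using the fact that the derivatives of the soft-min are exponentially weighted averages of the derivatives of the affine pieces:

```python
import math

# Hypothetical parameters for the two-sided Gaussian example (a < 0 < b)
a, b, delta = -1.0, 1.5, 0.1
H = lambda alpha: 0.5 * alpha ** 2  # N(0,1) increments

def V1(t, x): return a * (a - x) - (1 - t) * H(a)
def V2(t, x): return b * (b - x) - (1 - t) * H(b)

def V_delta(t, x):
    # smooth soft-min mollification of min(V1, V2)
    return -delta * math.log(math.exp(-V1(t, x) / delta)
                             + math.exp(-V2(t, x) / delta))

def derivatives(t, x):
    # d/dt and d/dx of V_delta: exponentially weighted averages of the
    # derivatives of the two affine pieces, with weights r and 1 - r
    r = 1.0 / (1.0 + math.exp((V1(t, x) - V2(t, x)) / delta))
    dt = r * H(a) + (1 - r) * H(b)
    dx = r * (-a) + (1 - r) * (-b)
    return dt, dx

# classical subsolution property dt - H(-dx) >= 0, which holds here
# because H is convex (Jensen on the weighted average of gradients)
ok = True
for ti in range(11):
    for xi in range(-30, 31):
        dt, dx = derivatives(ti / 10.0, xi / 10.0)
        ok = ok and (dt - H(-dx) >= -1e-12)
print(ok, V_delta(1.0, a) <= 0.0)  # True True
```

The tilt for the simulation is then read off as $\bar\alpha(t,x) = -D\bar V^\delta(t,x)$, i.e. the `dx` returned above with opposite sign: a state-dependent mixture of the two tilts $a$ and $b$.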
The epidemic model: construction of subsolutions

Consider the probability that the infection, starting from $x_0 > 1 - \rho^{-1}$, reaches a high level $x_1 > x_0$ before returning to $1 - \rho^{-1}$. This is a stationary problem, and the design of an importance sampling algorithm is related to finding a classical subsolution of the stationary Hamilton-Jacobi equation
$$H(x, -DV(x)) = 0, \quad V(x_1) = 0,$$
where
$$H(x, \alpha) = \lambda_1(x)(e^{\alpha} - 1) + \lambda_{-1}(x)(e^{-\alpha} - 1),$$
and $V(1 - \rho^{-1})$ gives the exponential decay rate of the probability starting from the equilibrium.
The epidemic model: construction of subsolutions

In this case we can actually work out the quasipotential $V(x)$. Indeed, first consider $\alpha(x)$ as a (nonzero) solution to $H(x, \alpha) = 0$. Solving the quadratic equation in $e^{\alpha}$, we note that the solution is
$$\alpha(x) = \log\Bigg[\frac{\lambda_1(x) + \lambda_{-1}(x)}{2\lambda_1(x)} \pm \sqrt{\Big(\frac{\lambda_1(x) + \lambda_{-1}(x)}{2\lambda_1(x)}\Big)^2 - \frac{\lambda_{-1}(x)}{\lambda_1(x)}}\Bigg].$$
To obtain $V(x)$ we simply need to integrate:
$$V(x) = \int_x^{x_1} \log\Bigg[\frac{\lambda_1(z) + \lambda_{-1}(z)}{2\lambda_1(z)} + \sqrt{\Big(\frac{\lambda_1(z) + \lambda_{-1}(z)}{2\lambda_1(z)}\Big)^2 - \frac{\lambda_{-1}(z)}{\lambda_1(z)}}\Bigg]\, dz.$$
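As a numerical sketch, assume (as in the earlier lectures) the SIS-type rates $\lambda_1(x) = \rho x(1-x)$ and $\lambda_{-1}(x) = x$, for which the stable equilibrium is indeed $1 - \rho^{-1}$; the snippet below evaluates $\alpha(x)$ from the quadratic formula and integrates it to obtain $V(x)$. The parameter values $\rho = 4$, $x_1 = 0.9$ are hypothetical.

```python
import math

# Assumed rates (from the earlier lectures on the epidemic model):
# lambda_1(x) = rho*x*(1-x) (infection), lambda_{-1}(x) = x (recovery),
# with stable equilibrium 1 - 1/rho.  Parameter values are hypothetical.
rho, x1 = 4.0, 0.9

lam1 = lambda x: rho * x * (1 - x)
lamm1 = lambda x: x

def alpha(x):
    # larger root of H(x, alpha) = 0 via the quadratic formula for e^alpha
    m = (lam1(x) + lamm1(x)) / (2 * lam1(x))
    disc = max(m * m - lamm1(x) / lam1(x), 0.0)  # guard against roundoff
    return math.log(m + math.sqrt(disc))

def V(x, steps=2000):
    # quasipotential V(x) = int_x^{x1} alpha(z) dz (trapezoidal rule)
    h = (x1 - x) / steps
    s = 0.5 * (alpha(x) + alpha(x1))
    s += sum(alpha(x + k * h) for k in range(1, steps))
    return s * h

x0 = 1 - 1 / rho  # equilibrium 1 - rho^{-1} = 0.75
print(round(V(x0), 4))  # 0.0584
```

For these rates the root simplifies to $\alpha(z) = \log(\lambda_{-1}(z)/\lambda_1(z))$, so the integral can also be checked in closed form.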