Monte Carlo Methods: Lecture 3: Importance Sampling


Monte Carlo Methods: Lecture 3: Importance Sampling. Nick Whiteley, 16.10.2008. Course material originally by Adam Johansen and Ludger Evers, 2007.

Overview of this lecture. What we have seen: rejection sampling. This lecture will cover importance sampling: basic importance sampling, importance sampling using self-normalised weights, finite variance estimators, optimal proposals, and an example.

Recall rejection sampling. Algorithm 2.1: Rejection sampling. Given two densities f, g with f(x) < M g(x) for all x, we can generate a sample from f by:
1. Draw X ~ g.
2. Accept X as a sample from f with probability f(X) / (M g(X)); otherwise go back to step 1.
Drawbacks: we need that f(x) < M g(x), and on average we need to repeat the first step M times before we can accept a value proposed by g.
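
A minimal Python sketch of rejection sampling (an illustration added here, not part of the original slides), assuming a normalised target density f, a proposal density g with a sampler, and an envelope constant M with f(x) < M g(x):

import numpy as np

def rejection_sample(f, g_pdf, g_sample, M, n, rng=None):
    """Draw n samples from density f via rejection sampling with proposal g."""
    rng = np.random.default_rng() if rng is None else rng
    samples = []
    while len(samples) < n:
        x = g_sample(rng)                            # step 1: propose X ~ g
        if rng.uniform() < f(x) / (M * g_pdf(x)):    # step 2: accept w.p. f(X)/(M g(X))
            samples.append(x)
    return np.array(samples)

# Hypothetical usage: target Beta(2, 2) with a Uniform(0, 1) proposal; M = 1.6 bounds f/g.
# from scipy.stats import beta
# xs = rejection_sample(lambda x: beta.pdf(x, 2, 2), lambda x: 1.0,
#                       lambda rng: rng.uniform(), M=1.6, n=1000)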

The fundamental identities behind importance sampling (1). Assume that g(x) > 0 for (almost) all x with f(x) > 0. Then for a measurable set A:
$$P(X \in A) = \int_A f(x)\,dx = \int_A g(x)\,\underbrace{\frac{f(x)}{g(x)}}_{=:w(x)}\,dx = \int_A g(x)\,w(x)\,dx.$$
For some integrable test function h, assume that g(x) > 0 for (almost) all x with f(x) h(x) ≠ 0. Then
$$E_f(h(X)) = \int f(x)\,h(x)\,dx = \int g(x)\,\underbrace{\frac{f(x)}{g(x)}}_{=:w(x)}\,h(x)\,dx = \int g(x)\,w(x)\,h(x)\,dx = E_g(w(X)\,h(X)).$$

The fundamental identities behind importance sampling (2). How can we make use of $E_f(h(X)) = E_g(w(X)\,h(X))$? Consider $X_1, \dots, X_n \sim g$ and $E_g|w(X)\,h(X)| < +\infty$. Then
$$\frac{1}{n}\sum_{i=1}^{n} w(X_i)\,h(X_i) \xrightarrow{a.s.} E_g(w(X)\,h(X)) \quad \text{(law of large numbers)},$$
which implies
$$\frac{1}{n}\sum_{i=1}^{n} w(X_i)\,h(X_i) \xrightarrow{a.s.} E_f(h(X)).$$
Thus we can estimate $\mu := E_f(h(X))$ by:
1. Sample $X_1, \dots, X_n \sim g$.
2. Set $\tilde\mu := \frac{1}{n}\sum_{i=1}^{n} w(X_i)\,h(X_i)$.

The importance sampling algorithm. Algorithm 2.1a: Importance sampling. Choose g such that supp(g) ⊇ supp(f · h).
1. For i = 1, ..., n:
   i. Generate $X_i \sim g$.
   ii. Set $w(X_i) = \frac{f(X_i)}{g(X_i)}$.
2. Return $\tilde\mu = \frac{1}{n}\sum_{i=1}^{n} w(X_i)\,h(X_i)$ as an estimate of $E_f(h(X))$.
Contrary to rejection sampling, importance sampling does not yield realisations from f, but a weighted sample $(X_i, W_i)$. The weighted sample can be used for estimating expectations $E_f(h(X))$ (and thus probabilities, etc.).
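
As a minimal Python sketch of Algorithm 2.1a (an illustration added here, not from the slides), assuming a normalised target density f, an instrumental density g with a sampler, and a vectorised test function h:

import numpy as np

def importance_sampling(f, g_pdf, g_sample, h, n, rng=None):
    """Basic importance sampling estimate of E_f[h(X)] with proposal g."""
    rng = np.random.default_rng() if rng is None else rng
    x = g_sample(rng, n)          # X_1, ..., X_n ~ g
    w = f(x) / g_pdf(x)           # weights w(X_i) = f(X_i) / g(X_i)
    return np.mean(w * h(x))      # (1/n) * sum_i w(X_i) h(X_i)

Note that this unnormalised estimator requires f to be a properly normalised density; the next slides relax this requirement.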

Basic properties of the importance sampling estimate. We have already seen that $\tilde\mu$ is consistent if supp(g) ⊇ supp(f · h) and $E_g|w(X)\,h(X)| < +\infty$, as
$$\tilde\mu := \frac{1}{n}\sum_{i=1}^{n} w(X_i)\,h(X_i) \xrightarrow{a.s.} E_f(h(X)).$$
The expected value of the weights is $E_g(w(X)) = 1$, and $\tilde\mu$ is unbiased (see theorem below).
Theorem 2.2: Bias and variance of importance sampling.
$$E_g(\tilde\mu) = \mu, \qquad \operatorname{Var}_g(\tilde\mu) = \frac{\operatorname{Var}_g(w(X)\,h(X))}{n}.$$

Is it enough to know f up to a multiplicative constant? Assume f(x) = C π(x). Then
$$\tilde\mu = \frac{1}{n}\sum_{i=1}^{n} w(X_i)\,h(X_i) = \frac{1}{n}\sum_{i=1}^{n} \frac{C\,\pi(X_i)}{g(X_i)}\,h(X_i).$$
C does not cancel out, so knowing π(·) is not enough. Idea: replace normalisation by n with normalisation by $\sum_{i=1}^{n} w(X_i)$, i.e. consider the self-normalised estimator
$$\hat\mu = \frac{\sum_{i=1}^{n} w(X_i)\,h(X_i)}{\sum_{i=1}^{n} w(X_i)}.$$
Now we have that
$$\hat\mu = \frac{\sum_{i=1}^{n} w(X_i)\,h(X_i)}{\sum_{i=1}^{n} w(X_i)} = \frac{\sum_{i=1}^{n} \frac{\pi(X_i)}{g(X_i)}\,h(X_i)}{\sum_{i=1}^{n} \frac{\pi(X_i)}{g(X_i)}},$$
so $\hat\mu$ does not depend on C: it is enough to know f up to a multiplicative constant.

The importance sampling algorithm (2). Algorithm 2.1b: Importance sampling using self-normalised weights. Choose g such that supp(g) ⊇ supp(f · h).
1. For i = 1, ..., n:
   i. Generate $X_i \sim g$.
   ii. Set $w(X_i) = \frac{f(X_i)}{g(X_i)}$.
2. Return $\hat\mu = \frac{\sum_{i=1}^{n} w(X_i)\,h(X_i)}{\sum_{i=1}^{n} w(X_i)}$ as an estimate of $E_f(h(X))$.
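
The self-normalised version differs only in the final step; here is a corresponding sketch (again an added illustration, not from the slides), where pi only needs to be proportional to f:

import numpy as np

def self_normalised_is(pi, g_pdf, g_sample, h, n, rng=None):
    """Self-normalised IS estimate of E_f[h(X)], with pi proportional to f."""
    rng = np.random.default_rng() if rng is None else rng
    x = g_sample(rng, n)                  # X_1, ..., X_n ~ g
    w = pi(x) / g_pdf(x)                  # weights known only up to the constant C
    return np.sum(w * h(x)) / np.sum(w)   # C cancels in the ratio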

Basic properties of the self-normalised estimate. $\hat\mu$ is consistent, as
$$\hat\mu = \frac{\frac{1}{n}\sum_{i=1}^{n} w(X_i)\,h(X_i)}{\frac{1}{n}\sum_{i=1}^{n} w(X_i)} \xrightarrow{a.s.} \frac{E_f(h(X))}{1} = E_f(h(X))$$
(provided supp(g) ⊇ supp(f · h) and $E_g|w(X)\,h(X)| < +\infty$). $\hat\mu$ is biased, but asymptotically unbiased (see theorem below).
Theorem 2.2: Bias and variance (ctd.).
$$E_g(\hat\mu) = \mu + \frac{\mu\,\operatorname{Var}_g(w(X)) - \operatorname{Cov}_g(w(X),\, w(X)\,h(X))}{n} + O(n^{-2}),$$
$$\operatorname{Var}_g(\hat\mu) = \frac{\operatorname{Var}_g(w(X)\,h(X)) - 2\mu\,\operatorname{Cov}_g(w(X),\, w(X)\,h(X)) + \mu^2\,\operatorname{Var}_g(w(X))}{n} + O(n^{-2}).$$

Finite variance estimators. The importance sampling estimate is consistent for a large choice of g (we only need that supp(g) ⊇ supp(f · h) and $E_g|w(X)\,h(X)| < +\infty$). More important in practice are finite variance estimators, i.e.
$$\operatorname{Var}(\tilde\mu) = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} w(X_i)\,h(X_i)\right) < +\infty.$$
Sufficient (albeit very restrictive) conditions for finite variance of $\tilde\mu$ are: f(x) < M g(x) and $\operatorname{Var}_f(h(X)) < \infty$; or E is compact, f is bounded above on E, and g is bounded below on E. Note: if f has heavier tails than g, then the weights will have infinite variance!

Optimal proposals. Theorem 2.3: Optimal proposal. The proposal distribution g that minimises the variance of $\tilde\mu$ is
$$g^*(x) = \frac{|h(x)|\,f(x)}{\int |h(t)|\,f(t)\,dt}.$$
The theorem is of little practical use: the optimal proposal involves $\int |h(t)|\,f(t)\,dt$, which is essentially the integral we want to estimate! Practical relevance of Theorem 2.3: choose g such that it is close to $|h(x)|\,f(x)$.
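
For completeness, a brief sketch of the standard argument behind Theorem 2.3 (not spelled out on the slide): only the second moment of $w(X)\,h(X)$ depends on g, and
$$\int \frac{h(x)^2 f(x)^2}{g(x)}\,dx = E_g\!\left[\left(\frac{|h(X)|\,f(X)}{g(X)}\right)^{2}\right] \ge \left(E_g\!\left[\frac{|h(X)|\,f(X)}{g(X)}\right]\right)^{2} = \left(\int |h(x)|\,f(x)\,dx\right)^{2}$$
by Jensen's inequality, with equality when $|h(x)|\,f(x)/g(x)$ is constant in x, i.e. when $g(x) \propto |h(x)|\,f(x)$, which is exactly $g^*$.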

Super-efficiency of importance sampling. For the optimal $g^*$ we have that
$$\operatorname{Var}_f\!\left(\frac{h(X_1) + \dots + h(X_n)}{n}\right) > \operatorname{Var}_{g^*}(\tilde\mu)$$
if h is not almost surely constant. Super-efficiency of importance sampling: the variance of the importance sampling estimate can be less than the variance obtained when sampling directly from the target f. Intuition: importance sampling allows us to choose g such that we focus on the areas which contribute most to the integral $\int h(x)\,f(x)\,dx$. Even sub-optimal proposals can be super-efficient.

Example 2.5: Setup. Compute $E_f|X|$ for $X \sim t_3$ by (a) sampling directly from $t_3$; (b) using a $t_1$ distribution as instrumental distribution; (c) using a N(0, 1) distribution as instrumental distribution.
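
A small Python sketch reproducing this comparison (my own reconstruction, not part of the slides), using scipy.stats for the densities; the true value is $E_f|X| = 2\sqrt{3}/\pi \approx 1.10$:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 1500
f = stats.t(df=3)        # target: t_3
h = np.abs               # test function h(x) = |x|

# (a) direct sampling from t_3
x_a = f.rvs(size=n, random_state=rng)
est_a = np.mean(h(x_a))

# (b) IS with a t_1 (Cauchy) instrumental distribution (heavier tails than f)
g_b = stats.t(df=1)
x_b = g_b.rvs(size=n, random_state=rng)
est_b = np.mean(f.pdf(x_b) / g_b.pdf(x_b) * h(x_b))

# (c) IS with a N(0, 1) instrumental distribution (lighter tails than f:
#     the weights have infinite variance and the estimate is unstable)
g_c = stats.norm()
x_c = g_c.rvs(size=n, random_state=rng)
est_c = np.mean(f.pdf(x_c) / g_c.pdf(x_c) * h(x_c))

print(est_a, est_b, est_c)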

Example 2.5: Densities. [Figure: the target density f(x) of the $t_3$ distribution (used for direct sampling) together with the two instrumental densities $g_{t_1}(x)$ (IS with $t_1$) and $g_{N(0,1)}(x)$ (IS with N(0, 1)), plotted for x in [-4, 4].]

Example 2.5: Estimates obtained. [Figure: the estimate as a function of the iteration (0 to 1500) for the three approaches: sampling directly from $t_3$, IS using $t_1$ as instrumental distribution, and IS using N(0, 1) as instrumental distribution.]

Example 2.5: Weights. [Figure: the weights $W_i$ plotted against the samples $X_i$ from the instrumental distribution, for the three approaches: sampling directly from $t_3$, IS using $t_1$ as instrumental distribution, and IS using N(0, 1) as instrumental distribution.]