
IMPORTANCE SAMPLING

Background: let $x = (x_1, \ldots, x_n)$,

$$\theta = E[h(X)] = \int h(x) f(x)\,dx \approx \frac{1}{N}\sum_{i=1}^{N} h(X_i) = \hat{\Theta},$$

if $X_i \sim F(x)$, where $F(x)$ is the cdf for $f(x)$. For many problems, $F(x)$ is difficult to sample from and/or $\mathrm{Var}(h)$ is large. If a related, easily sampled pdf $g(x)$ is available, one could use

$$\theta = E_g\left[\frac{h(X) f(X)}{g(X)}\right] = \int \frac{h(x) f(x)}{g(x)}\, g(x)\,dx \approx \frac{1}{N}\sum_{i=1}^{N} \frac{h(X_i) f(X_i)}{g(X_i)},$$

with $X_i \sim G(x)$, for the cdf $G(x)$ associated with $g(x)$. Importance sampling: if $\mathrm{Var}(h(X) f(X)/g(X))$ is small, the $g(x)$ samples are concentrated where $h(x) f(x)$ is "important".
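A minimal generic sketch of the weighted estimator above (not from the original notes; the pdfs f and g and the function h are arbitrary illustrative choices):

N = 100000;
f = @(x) 2*x;                        % target pdf on [0,1]
g = @(x) ones(size(x));              % sampling pdf: Uniform(0,1)
h = @(x) x.^2;                       % integrand
X = rand(1,N);                       % draws from g
W = h(X).*f(X)./g(X);                % h times the importance weight f/g
disp([mean(W) 2*std(W)/sqrt(N)])     % estimates E_f[h(X)] = 1/2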

Importance Sampling Example: $\theta = \int_0^1 e^{x^2}\,dx$. Try $g(x) \propto e^x$; so $\theta = \int_0^1 e^{x^2 - x}\, e^x\,dx$. To find $G(x)$, note $\int_0^1 e^x\,dx = e - 1$, so

$$G(x) = \frac{1}{e-1}\int_0^x e^t\,dt = \frac{e^x - 1}{e - 1};$$

then, using $X_i = \ln(1 + (e-1)U_i)$,

$$\theta = (e-1)\int_0^1 e^{x^2 - x}\,\frac{e^x}{e-1}\,dx \approx \frac{e-1}{N}\sum_{i=1}^{N} e^{X_i^2 - X_i}.$$

N = 10000; U = rand(1,N); Y = exp(U.^2);
disp([mean(Y) 2*std(Y)/sqrt(N)])   % simple MC
1.4672    0.009463
e = exp(1); X = log(1+(e-1)*U);
T = (e-1)*exp(X.*(X-1));
disp([mean(T) 2*std(T)/sqrt(N)])   % importance
1.4628    0.0022348

The standard error is reduced by a factor of about 4.

[Figure: exp(x), exp(x^2), and 1 + x^2 + x^4/2 plotted for x in [0, 1].]
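As a sanity check (not in the original notes), Matlab's built-in integral evaluates $\theta$ directly for comparison with both estimates:

th = integral(@(x) exp(x.^2), 0, 1);   % reference value for theta
disp(th)                               % approx. 1.4627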

Alternative: $g(x) \propto 1 + x^2$; what is $G(x)$? Since $\int_0^1 (1+x^2)\,dx = 4/3$, the normalized density is $\frac{3}{4}(1+x^2)$ and $G(x) = \frac{3}{4}x + \frac{1}{4}x^3$. Then

$$\theta = \frac{4}{3}\int_0^1 \frac{e^{x^2}}{1+x^2}\cdot\frac{3(1+x^2)}{4}\,dx \approx \frac{4}{3N}\sum_{i=1}^{N} \frac{e^{X_i^2}}{1+X_i^2},$$

with $X_i \sim G(x) = \frac{3}{4}x + \frac{1}{4}x^3$, sampled from the mixture: with probability $3/4$ take $U$, otherwise $U^{1/3}$.

N = 10000; U = rand(1,N); I = rand(1,N) < 3/4;
X = I.*U + (1-I).*U.^(1/3);
T = 4*exp(X.^2)./(3*(1+X.^2));
disp([mean(T) 2*std(T)/sqrt(N)])   % importance
1.4627    0.0028178

Better: $g(x) \propto 1 + x^2 + x^4/2$, with $G(x) = \frac{30}{43}x + \frac{10}{43}x^3 + \frac{3}{43}x^5$;

$$\theta = \frac{43}{30}\int_0^1 \frac{e^{x^2}}{1+x^2+x^4/2}\cdot\frac{30(1+x^2+x^4/2)}{43}\,dx \approx \frac{43}{30N}\sum_{i=1}^{N} \frac{e^{X_i^2}}{1+X_i^2+X_i^4/2},$$

with $X_i \sim G(x)$ (a three-part mixture: $U$, $U^{1/3}$, or $U^{1/5}$ with probabilities $30/43$, $10/43$, $3/43$).

N = 100000; U = rand(1,N); V = rand(1,N);
I = V < 30/43; J = V > 40/43;
X = I.*U + (1-I).*(1-J).*U.^(1/3) + J.*U.^(1/5);
T = 43*exp(X.^2)./(30*(1+X.^2+X.^4/2));
disp([mean(T) 2*std(T)/sqrt(N)])   % importance
1.4623    0.00072228
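To see why the richer $g(x)$ helps, note that here $h(x)f(x) = e^{x^2}$, so $\mathrm{Var}(h(X)f(X)/g(X)) = \int_0^1 e^{2x^2}/g(x)\,dx - \theta^2$. A numerical comparison of the two choices (a check, not from the original notes):

th = integral(@(x) exp(x.^2), 0, 1);
g1 = @(x) 3*(1+x.^2)/4;                   % first alternative
g2 = @(x) 30*(1+x.^2+x.^4/2)/43;          % better alternative
v1 = integral(@(x) exp(2*x.^2)./g1(x), 0, 1) - th^2;
v2 = integral(@(x) exp(2*x.^2)./g2(x), 0, 1) - th^2;
disp([v1 v2])   % v2 < v1, consistent with the standard errors above
                % (note the two runs above use different N)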

Importance Sampling Example: $\theta = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, z^3 e^z\,dz$.

N = 100000; Z = randn(1,N); Y = Z.^3.*exp(Z);
disp([mean(Y) 2*std(Y)/sqrt(N)])   % simple MC
6.5418    0.33644

Try $g(z) = \frac{1}{\sqrt{2\pi}} e^{-(z-1)^2/2} = \frac{1}{\sqrt{2\pi}} e^{-w^2/2}$, with $w = z - 1$; so

$$\theta = e^{1/2}\int z^3\, g(z)\,dz = e^{1/2}\int \frac{1}{\sqrt{2\pi}}\,(w+1)^3\, e^{-w^2/2}\,dw.$$

Simulation uses $\theta \approx \frac{e^{1/2}}{N}\sum_{i=1}^{N} (Z_i + 1)^3$, with $Z_i \sim \mathrm{Normal}(0,1)$.

N = 100000; Z = randn(1,N);
Y = exp(1/2)*(Z+1).^3;
disp([mean(Y) 2*std(Y)/sqrt(N)])   % importance
6.6338    0.081528

The standard error is reduced by a factor of about 4. Note

$$\theta = e^{1/2}\int \frac{1}{\sqrt{2\pi}}\,(3w^2 + 1)\, e^{-w^2/2}\,dw = 4e^{1/2} \approx 6.5948851.$$
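A quick numerical confirmation of the closed form $4e^{1/2}$ (not part of the original notes):

th = integral(@(z) exp(-z.^2/2)/sqrt(2*pi).*z.^3.*exp(z), -Inf, Inf);
disp([th 4*exp(1/2)])   % both approx. 6.5949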

Higher dimensional problems: often one chooses $g(x) = g_1(x_1) g_2(x_2) \cdots g_n(x_n)$, so samples come from a sequence of 1-d samples.

2-d example: $\theta = \int_0^1\!\int_0^1 e^{(x_1+x_2)^2}\,dx$; if $g(x) \propto e^{x_1} e^{x_2}$,

$$\theta = \int_0^1\!\int_0^1 e^{(x_1+x_2)^2 - x_1 - x_2}\, e^{x_1+x_2}\,dx.$$

After scaling, with $X_{ij} = \ln(1 + (e-1)U_{ij})$,

$$\theta = (e-1)^2\int_0^1\!\int_0^1 e^{(x_1+x_2)^2 - x_1 - x_2}\,\frac{e^{x_1+x_2}}{(e-1)^2}\,dx \approx \frac{(e-1)^2}{N}\sum_{i=1}^{N} e^{(X_{1i}+X_{2i})^2 - X_{1i} - X_{2i}}.$$

N = 10000; U = rand(2,N); T = exp(sum(U).^2);
disp([mean(T) 2*std(T)/sqrt(N)])   % simple MC
4.9204    0.12261
e = exp(1); X = log(1+(e-1)*U);
T = (e-1)^2*exp(sum(X).^2 - sum(X));
disp([mean(T) 2*std(T)/sqrt(N)])
4.8863    0.065169

Better: $g(x) \propto e^{2x_1} e^{2x_2}$, with $g(1,1) = f(1,1)$? Then $G_i(x) = \frac{e^{2x} - 1}{e^2 - 1}$, $X_{ij} = \ln(1 + (e^2-1)U_{ij})/2$, and

$$\theta = \frac{(e^2-1)^2}{4}\int_0^1\!\int_0^1 e^{(x_1+x_2)^2 - 2(x_1+x_2)}\,\frac{4\, e^{2(x_1+x_2)}}{(e^2-1)^2}\,dx.$$

e = exp(1); X = log(1+(e^2-1)*U)/2;
T = (e^2-1)^2*exp(sum(X).^2 - 2*sum(X))/4;
disp([mean(T) 2*std(T)/sqrt(N)])
4.9008    0.0082436

Better $g(x) \propto 1 + (x_1+x_2)^2$? (A sampling sketch follows below.)
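One way to try the closing question (a sketch, not from the original notes): on $[0,1]^2$ the normalized density is $g(x) = \frac{6}{13}(1 + (x_1+x_2)^2)$, since $\int_0^1\!\int_0^1 (1 + (x_1+x_2)^2)\,dx = 13/6$. Expanding $(x_1+x_2)^2 = x_1^2 + 2x_1x_2 + x_2^2$ gives a four-component mixture of product densities, each sampled with the same $U^{1/k}$ inversions used above:

N = 100000; U = rand(2,N); V = rand(1,N);
X = U;                               % weight 6/13: uniform on the square
i2 = V >= 6/13 & V < 8/13;           % weight 2/13: density 3*x1^2, X1 = U^(1/3)
X(1,i2) = U(1,i2).^(1/3);
i3 = V >= 8/13 & V < 11/13;          % weight 3/13: density 4*x1*x2, both sqrt(U)
X(:,i3) = U(:,i3).^(1/2);
i4 = V >= 11/13;                     % weight 2/13: density 3*x2^2, X2 = U^(1/3)
X(2,i4) = U(2,i4).^(1/3);
S = sum(X);
T = 13*exp(S.^2)./(6*(1+S.^2));      % h(x)f(x)/g(x)
disp([mean(T) 2*std(T)/sqrt(N)])

Whether this polynomial $g$ competes with the $e^{2x_1}e^{2x_2}$ tilt can be judged from the reported standard error.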

Tilted Densities: given a pdf $f(x)$, let $M(t) = \int e^{tx} f(x)\,dx$ (the moment generating function). The tilted density for $f(x)$ is

$$f_t(x) = \frac{e^{tx} f(x)}{M(t)}.$$

Examples.

Exponential densities: if $f(x) = \lambda e^{-\lambda x}$, $x \in [0, \infty)$, then $f_t(x) = (\lambda - t)\, e^{-(\lambda - t)x}$, for $t < \lambda$.

Bernoulli pmfs: $f(x) = p^x (1-p)^{1-x}$, $x = 0, 1$. $M(t) = E_f[e^{tX}] = e^t p + (1-p)$, so

$$f_t(x) = \frac{e^{tx}\, p^x (1-p)^{1-x}}{e^t p + (1-p)} = \left(\frac{e^t p}{e^t p + (1-p)}\right)^{x} \left(\frac{1-p}{e^t p + (1-p)}\right)^{1-x},$$

a Bernoulli RV with $p_t = \frac{e^t p}{e^t p + (1-p)}$. So $f/f_t = M(t)/e^{tx} = e^{-tx}\,(e^t p + (1-p))$.

Generalization: if $f(x)$ is a Binomial$(n, p)$ pmf, then $f_t(x)$ is Binomial$(n, p_t)$, with $M(t) = (e^t p + 1 - p)^n$.

Normal densities: if $f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$, $x \in (-\infty, \infty)$, then

$$e^{tx} f(x) = \frac{1}{\sqrt{2\pi}}\, e^{tx - x^2/2} = \frac{1}{\sqrt{2\pi}}\, e^{-(x-t)^2/2}\, e^{t^2/2},$$

so $f_t(x) = \frac{1}{\sqrt{2\pi}} e^{-(x-t)^2/2}$, a Normal$(t, 1)$ density, with $M(t) = e^{t^2/2}$.

Generalization: if $f(x)$ is a Normal$(\mu, \sigma^2)$ pdf, then $f_t(x)$ is a Normal$(\mu + \sigma^2 t, \sigma^2)$ pdf.

Choosing $t$: pick $t$ with small $\mathrm{Var}\left(\frac{h(X) f(X)}{f_t(X)}\right)$. Text heuristic for exponentials and Bernoullis: if $h = I\{\sum X_i > a\}$, choose $t = t^*$ with $E_{t^*}[\sum X_i] \approx a$.
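A small numerical check of the exponential tilt (not in the original notes): $f_t$ should integrate to 1, and its mean should shift from $1/\lambda$ to $1/(\lambda - t)$.

lambda = 1/3; t = 0.1;                       % requires t < lambda
f  = @(x) lambda*exp(-lambda*x);
M  = integral(@(x) exp(t*x).*f(x), 0, Inf);  % mgf: lambda/(lambda-t)
ft = @(x) exp(t*x).*f(x)/M;                  % tilted density
disp([integral(ft, 0, Inf), ...              % equals 1
      integral(@(x) x.*ft(x), 0, Inf), 1/(lambda-t)])  % tilted mean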

Examples:

1. Bernoulli RV Example: if the $X_i$'s are independent Bernoulli$(p_i)$ RVs and $\theta = E[I\{\sum_{i=1}^n X_i > a\}] = E[I\{S > a\}]$, then

$$\hat\theta = I\{S > a\}\, e^{-tS} \prod_{i=1}^{n} (e^t p_i + (1-p_i)), \quad\text{with}\quad E_t\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \frac{e^t p_i}{e^t p_i + (1-p_i)}.$$

Example with $n = 20$, $p_i = .4$, $a = 16$; choose $t$ so that

$$E_t[S] = 20\,\frac{.4\, e^t}{.4\, e^t + .6} = 16,$$

with solution $e^t = 6$; then $p_t = .8$, $e^t p + (1-p) = 3$, and the estimator is

$$\hat\theta = I\{\textstyle\sum X_i > a\}\; 6^{-S}\, 3^{20} = 3^{20-S}\, I\{\textstyle\sum X_i > a\}/2^{S}.$$

Matlab

N = 100000; p = .4; n = 20;
I = sum( rand(n,N) < p ) > 16;     % simple MC
disp([mean(I) 2*std(I)/sqrt(N)])
6e-05    4.8989e-05
S = sum( rand(n,N) < .8 );         % importance
I = 3.^(20-S).*( S > 16 )./2.^S;
disp([mean(I) 2*std(I)/sqrt(N)])
4.7575e-05    5.1608e-07
N = 10000000; p = .4; n = 20;
I = sum( rand(n,N) < p ) > 16;     % simple MC
disp([mean(I) 2*std(I)/sqrt(N)])
4.82e-05    4.3908e-06

Note: $\theta = \sum_{i=17}^{20}\binom{20}{i} (.4)^i (.6)^{20-i} \approx 4.7345\times 10^{-5}$.
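The exact tail probability can be confirmed in base Matlab (a check, not from the original notes):

k = 17:20;                         % S > 16 means S >= 17
th = sum( arrayfun(@(j) nchoosek(20,j), k) .* .4.^k .* .6.^(20-k) );
disp(th)                           % approx. 4.7345e-05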

2. Exponential RV Example: if $X_i \sim \mathrm{Exp}(\frac{1}{i+2})$, $i = 1, \ldots, 4$, and $S(X) = \sum_{i=1}^4 X_i$, find

$$\theta = \int_0^\infty\!\!\cdots\!\int_0^\infty h(x)\,\frac{e^{-\sum_{i=1}^4 x_i/(i+2)}}{3\cdot 4\cdot 5\cdot 6}\,dx,$$

with $h(x) = S(x)\, I\{S(x) > 62\}$. Raw simulation uses $X_{ij} \sim \mathrm{Exp}(\frac{1}{i+2})$ to estimate

$$\theta \approx \frac{1}{N}\sum_{j=1}^{N} h(X_j).$$

Matlab

N = 100000; U = rand(4,N);
X = -diag([3:6])*log(1-U);
S = sum(X); h = S.*( S > 62 );
disp([mean(h) 2*std(h)/sqrt(N)])
0.066974    0.013647

Note: to find $E[S \mid S > 62]$, divide by $E[I(S > 62)]$:

E = h/mean((S>62));
disp([mean(E) 2*std(E)/sqrt(N)])
66.948    15.237
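Since $S$ is a sum of independent exponentials with distinct rates, its density has the closed (hypoexponential) form $f_S(s) = \sum_i c_i \lambda_i e^{-\lambda_i s}$ with $c_i = \prod_{j\neq i} \lambda_j/(\lambda_j - \lambda_i)$, which gives exact reference values for $\theta$ and $E[S \mid S > 62]$ (a check, not part of the original notes):

lam = 1./(3:6);                          % rates 1/3, 1/4, 1/5, 1/6
c = zeros(1,4);
for i = 1:4
    j = setdiff(1:4,i);
    c(i) = prod(lam(j)./(lam(j)-lam(i)));
end
fS = @(s) (c.*lam)*exp(-lam'*s);         % density of S = X1+...+X4
th = integral(@(s) s.*fS(s), 62, Inf);   % theta = E[S*I{S > 62}]
ES = th/integral(fS, 62, Inf);           % E[S | S > 62]
disp([th ES])                            % compare with the estimates above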

For the tilted density, use a common tilt parameter $t$, so that $X_i \sim \mathrm{Exp}(\frac{1}{i+2} - t)$, and

$$\theta = \int_{[0,\infty)^4} h(x)\, e^{-tS(x)}\, C \prod_{i=1}^{4}\left(\tfrac{1}{i+2} - t\right) e^{-(\frac{1}{i+2} - t)\, x_i}\,dx \approx \frac{C}{N}\sum_{j=1}^{N} h(X_j)\, e^{-tS(X_j)},$$

with $C = \prod_{i=1}^{4} \frac{1}{1 - (i+2)t}$.

Text estimates a good $t = .14$ by approximately solving

$$\sum_{i=1}^{4} E_t[X_i] = \frac{3}{1-3t} + \frac{4}{1-4t} + \frac{5}{1-5t} + \frac{6}{1-6t} = 62,$$

but guess-and-check with Matlab finds a better $t \approx .136$. (The tilt equation can also be solved directly; see the sketch below.)

Matlab tests:

t = .14; Cd = 1./(1-[3:6]*t); C = prod(Cd);
St = -([3:6].*Cd)*log(1-U);
ht = C*St.*( St > 62 ).*exp(-t*St);
disp([mean(ht) 2*std(ht)/sqrt(N)])
0.063281    0.0010059
t = .136; Cd = 1./(1-[3:6]*t); C = prod(Cd);
St = -([3:6].*Cd)*log(1-U);
ht = C*St.*( St > 62 ).*exp(-t*St);
disp([mean(ht) 2*std(ht)/sqrt(N)])
0.06201    0.00099896
E = ht/mean(C*(St > 62).*exp(-t*St));
disp([mean(E) 2*std(E)/sqrt(N)])   % expected value
68.215    1.0931

Note the smaller standard errors compared to raw sampling.
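The tilt equation can be handed to Matlab's fzero instead of being solved by hand (a sketch, not from the original notes); bracketing on $[0, 0.16]$ keeps all tilted rates positive:

m = 3:6;                                  % tilted means are m./(1-m*t)
tstar = fzero(@(t) sum(m./(1-m*t)) - 62, [0 0.16]);
disp(tstar)                               % approx. .14, as in the text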

Tilting for Normal Densities: if $f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-(x-\mu)^2/2}$, the tilted density $f_t(x) = f(x)\, e^{xt}/M(t)$ is a shifted normal.

Example: $\theta = \int \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, z^3 e^z\,dz$. Pick $t$ to match the point (mode) where the integrand $a(z) = K\, e^{-(z-1)^2/2}\, z^3$ is maximized.

[Figure: the integrand $e^{x - x^2/2}\, x^3/\sqrt{2\pi}$ plotted for $x \in [-4, 4]$.]
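Setting $\frac{d}{dz}\log a(z) = -(z-1) + 3/z = 0$ gives $z^2 - z - 3 = 0$, so the mode is $t = (1+\sqrt{13})/2 \approx 2.3028$, the "mode tilt" used below. A numerical confirmation (not from the original notes):

a = @(z) exp(-(z-1).^2/2).*z.^3;   % integrand, up to a constant
tmode = fminbnd(@(z) -a(z), 0, 4); % maximize a(z)
disp([tmode (1+sqrt(13))/2])       % both approx. 2.3028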

Or, pick $t$ to match the mean, $t = 2.5$. (Here the mean is $E[Z^4 e^Z]/E[Z^3 e^Z] = 10e^{1/2}/(4e^{1/2}) = 2.5$.) For either case,

$$\theta = \int \frac{1}{\sqrt{2\pi}}\, e^{-(z-t)^2/2}\, z^3\, e^{z - zt + t^2/2}\,dz = \int \frac{1}{\sqrt{2\pi}}\, e^{-w^2/2}\,(w+t)^3\, e^{(w+t)(1-t) + t^2/2}\,dw,$$

using $z = w + t$.

Matlab tests:

N = 100000; W = randn(1,N);
t = (1+sqrt(13))/2;
Y = (W+t).^3.*exp((W+t)*(1-t)+t^2/2);
disp([mean(Y) 2*std(Y)/sqrt(N)])   % mode tilt
6.6408    0.035346
t = 2.5;
Y = (W+t).^3.*exp((W+t)*(1-t)+t^2/2);
disp([mean(Y) 2*std(Y)/sqrt(N)])   % mean tilt
6.6571    0.037741

Compare:

t = 1;
Y = (W+t).^3.*exp((W+t)*(1-t)+t^2/2);
disp([mean(Y) 2*std(Y)/sqrt(N)])   % tilt = 1
6.5775    0.079949
t = 0;
Y = (W+t).^3.*exp((W+t)*(1-t)+t^2/2);
disp([mean(Y) 2*std(Y)/sqrt(N)])   % t = 0, raw MC
6.3272    0.29958

Tilting for Multidimensional Normal Density Problems: use a vector $t$. Choice of $t$? Try to make $\mathrm{Var}\left(\frac{h(x) f(x)}{f(x-t)}\right)$ small:

a) choose the point $t$ where $h(x) f(x)$ is maximum (mode), or
b) choose $t = E[X h(X)]/E[h(X)]$ (mean).

Asian Option example: here

$$S_m = S_{m-1}\, e^{(r - \frac{\sigma^2}{2})\delta + \sigma\sqrt{\delta}\, Z_m},$$

with $\delta = T/M$, $Z_m \sim \mathrm{Normal}(0,1)$, and the expected profit is

$$\theta = E\left[e^{-rT} \max\left(\frac{1}{M}\sum_{i=1}^{M} S_i(Z) - K,\ 0\right)\right] = \int e^{-rT} \max\left(\frac{1}{M}\sum_{i=1}^{M} S_i(z) - K,\ 0\right) \frac{e^{-\sum_i z_i^2/2}}{(\sqrt{2\pi})^M}\,dz = \int h(z)\, \frac{e^{-\sum_i z_i^2/2}}{(\sqrt{2\pi})^M}\,dz,$$

with $h(z) = e^{-rT}\max(\frac{1}{M}\sum_{i=1}^{M} S_i(z) - K,\ 0)$.

For method a), find $t$ to maximize $h(z)\, e^{-\sum_i z_i^2/2}$. For method b), $t$ can be estimated from data. Given $t$, use

$$\hat\theta = \int h(z)\, \frac{e^{-\sum_i z_i^2/2}}{e^{-\sum_i (z_i - t_i)^2/2}}\cdot\frac{e^{-\sum_i (z_i - t_i)^2/2}}{(\sqrt{2\pi})^M}\,dz = \int h(y+t)\, \frac{e^{-\sum_i (y_i + t_i)^2/2}}{e^{-\sum_i y_i^2/2}}\cdot\frac{e^{-\sum_i y_i^2/2}}{(\sqrt{2\pi})^M}\,dy.$$

Example with $M = 16$, $S_0 = K = 50$, $T = 1$, $r = .05$, $\sigma = .1$. Matlab test using method b):

M = 16; S0 = 50; K = 50; T = 1; dlt = T/M;
r = 0.05; s = 0.1; rd = ( r - s^2/2 )*dlt;
N = 10000; z = randn(M,N);                 % simple MC
S = S0*exp(cumsum(rd + s*sqrt(dlt)*z));
h = exp(-r*T)*max( mean(S)-K, 0 );
disp([mean(h) var(h) 2*std(h)/sqrt(N)])
1.9465    4.825    0.043932
t = z*h'/sum(h);                           % approx. mean tilt vector
y = z; z = y + t*ones(1,N);
S = S0*exp(cumsum(rd + s*sqrt(dlt)*z));
h = exp(-r*T)*max( mean(S)-K, 0 );
ht = h.*exp(sum(y.*y-z.*z)/2);             % importance
disp([mean(ht) var(ht) 2*std(ht)/sqrt(N)])
1.9136    0.66366    0.016293

Notice the variance reduction from tilted sampling.

Using method a), with Matlab fminsearch to find $t$:

Sf = @(z)S0*exp(cumsum(rd+s*sqrt(dlt)*z));
hf = @(z)exp(-z'*z/2)*max(mean(Sf(z))-K,0);
t = fminsearch(@(z)-hf(z), ones(M,1) );    % tilt t
y = randn(M,N); z = y + t*ones(1,N);       % fresh standard normals, shifted
S = S0*exp(cumsum(rd + s*sqrt(dlt)*z));
h = exp(-r*T)*max( mean(S)-K, 0 );
ht = h.*exp(sum(y.*y-z.*z)/2);             % importance
disp([mean(ht) var(ht) 2*std(ht)/sqrt(N)])
1.8868    0.35124    0.011853