Hochdimensionale Integration


Oliver Ernst, Institut für Numerische Mathematik und Optimierung. Hochdimensionale Integration (High-Dimensional Integration). Fortnightly lecture in the winter semester 2010/11, part of the module Ausgewählte Kapitel der Numerik (Selected Topics in Numerical Analysis).

Contents
1. Introduction
   1.1 An Example
   1.2 A Selection of Strategies
2. Monte Carlo Integration
   2.1 Convergence and Accuracy
   2.2 Sampling Methods
   2.3 Variance Reduction Methods
3. Sparse Grids
4. Quasi-Monte Carlo Integration
5. Extensions
   5.1 ANOVA Decomposition
   5.2 Concentration of Measure


The Integral as an Expected Value

Let $x$ be a random variable which is uniformly distributed on $[0,1]$:
$(\Omega, \mathcal{A}, \mathbb{P})$ a probability space, $x : \Omega \to \mathbb{R}$ a measurable function, with (cumulative) distribution function
$$F(\xi) := \mathbb{P}(\{x(\omega) \le \xi\}) = \begin{cases} 0, & \xi < 0, \\ \xi, & 0 \le \xi \le 1, \\ 1, & \xi > 1, \end{cases}$$
and (probability) density (function) $p(\xi) = I_{[0,1]}(\xi)$.

The Integral as an Expected Value

For any $f \in L^1(0,1)$ the expectation (expected value) of the random variable $f(x(\omega))$ is given by
$$\mathbb{E}[f(x)] = \int_\Omega f(x(\omega)) \, d\mathbb{P}(\omega) = \int_0^1 f(x) \, dx.$$
In the same way, for a $d$-dimensional random vector $x : \Omega \to [0,1]^d$ uniformly distributed on $[0,1]^d$ and a function $f \in L^1([0,1]^d)$ we have
$$\mathbb{E}[f(x)] = \int_{[0,1]^d} f(x) \, dx.$$
For an ensemble $\{x_j = x(\omega_j)\}_{j=1}^N$ sampled from the uniform distribution on $[0,1]^d$, an empirical estimate of the expectation is
$$\mathbb{E}[f(x)] \approx Q_N(f) := \frac{1}{N} \sum_{j=1}^N f(x_j).$$
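To make the estimator concrete, here is a minimal Python sketch of $Q_N(f)$; the integrand, dimension, and sample size are illustrative choices, not from the lecture.

```python
import numpy as np

def mc_integrate(f, d, N, seed=0):
    """Monte Carlo estimate Q_N(f) of the integral of f over [0,1]^d."""
    rng = np.random.default_rng(seed)
    x = rng.random((N, d))     # N uniform sample points in [0,1]^d
    return f(x).mean()         # Q_N(f) = (1/N) * sum_j f(x_j)

# Illustrative integrand f(x) = exp(-|x|^2) in d = 10 dimensions
f = lambda x: np.exp(-np.sum(x**2, axis=1))
print(mc_integrate(f, d=10, N=100_000))
```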

Convergence of Monte Carlo Integration

Theorem 1 (Strong Law of Large Numbers). Let $\{X_n\}_{n \in \mathbb{N}}$ be a sequence of mutually independent, identically distributed random variables with finite expected value $\mu := \mathbb{E}[X_n]$. Then the arithmetic averages of the partial sums $S_n := X_1 + \dots + X_n$ converge to $\mu$ almost surely, i.e.,
$$\mathbb{P}\left(\left\{\frac{S_n}{n} \to \mu\right\}\right) = 1.$$
Consequence: $Q_N(f) \to I(f)$ almost surely as $N \to \infty$.
Moreover, $\mathbb{E}[Q_N(f)] = I(f)$ for all $N \in \mathbb{N}$, i.e., $Q_N(f)$ is an unbiased estimator of $I(f)$.

Convergence Rate of Monte Carlo Integration

Theorem 2 (Central Limit Theorem). Let $\{X_n\}_{n \in \mathbb{N}}$ be a sequence of mutually independent, identically distributed random variables with finite expected value and variance $\mu := \mathbb{E}[X_n]$ and $\sigma^2 := \operatorname{Var}[X_n]$. Then the sequence of random variables
$$\frac{S_n - n\mu}{\sigma \sqrt{n}}, \qquad n \in \mathbb{N}, \quad S_n := X_1 + \dots + X_n,$$
converges in distribution to the standard normal distribution, i.e., for all $x \in \mathbb{R}$ there holds
$$\lim_{n \to \infty} \mathbb{P}\left(\left\{\frac{S_n - n\mu}{\sigma \sqrt{n}} < x\right\}\right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^x e^{-t^2/2} \, dt =: \Phi(x).$$

Convergence Rate of Monte Carlo Integration

Consequence: denoting the MC error by
$$\epsilon_N = \epsilon_N(f) := I(f) - Q_N(f),$$
we see that for $N$ sufficiently large
$$\epsilon_N \approx \frac{\sigma}{\sqrt{N}} \, \xi, \qquad \sigma = \sigma(f), \quad \xi \sim N(0,1).$$
More precisely, for $a, b \in \mathbb{R}$, $a < b$, we have
$$\lim_{N \to \infty} \mathbb{P}\left(a < \frac{\sqrt{N}}{\sigma} \, \epsilon_N < b\right) = \mathbb{P}(a < \xi < b) = \Phi(b) - \Phi(a).$$
Note: this is a probabilistic result; there are no deterministic bounds on the MC error, only the statement that it is of a certain size with some probability.

Convergence Rate of Monte Carlo Integration

Error: $\epsilon_N(f) := I(f) - Q_N(f)$.
Bias: $\mathbb{E}[\epsilon_N(f)]$.
Root mean-square error (RMSE): $\sqrt{\mathbb{E}[\epsilon_N(f)^2]}$.

For random variables $\{x_i\}$ independent and uniformly distributed in $[0,1]^d$:
$$\sqrt{\mathbb{E}[\epsilon_N(f)^2]} = \sigma N^{-1/2}.$$
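The $N^{-1/2}$ rate can be checked empirically; a sketch with an illustrative one-dimensional integrand (the replication count is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.exp(x)            # I(f) = e - 1 on [0,1]
I_exact = np.e - 1

for N in (100, 400, 1600, 6400):
    # estimate the RMSE from 200 independent replications of Q_N(f)
    Q = np.array([f(rng.random(N)).mean() for _ in range(200)])
    print(N, np.sqrt(np.mean((I_exact - Q) ** 2)))  # roughly halves as N quadruples
```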

Convergence Rate of Monte Carlo Integration
Sample Size for Given Accuracy

Probabilistic error bound:
$$\epsilon_N \approx \frac{\sigma}{\sqrt{N}} \, \xi, \qquad \xi \sim N(0,1).$$
For a standard normal RV $\xi$ and $\epsilon > 0$,
$$\mathbb{P}(|\xi| < \epsilon) = \Phi(\epsilon) - \Phi(-\epsilon) = 2\Phi(\epsilon) - 1 = \operatorname{erf}\left(\frac{\epsilon}{\sqrt{2}}\right),$$
where the error function $\operatorname{erf}$ is defined by
$$\operatorname{erf}(x) := \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2} \, dt.$$
For what value of the sample size $N$ is
$$\mathbb{P}\left(\left|\frac{\sigma}{\sqrt{N}} \, \xi\right| < \epsilon\right) \ge c$$
for a given confidence level $c \in [0,1]$?

Convergence Rate of Monte Carlo Integration
Confidence Levels

Answer:
$$N \ge \frac{\sigma^2}{\epsilon^2} \, s(c)^2, \qquad \text{where } s(c) \text{ is defined by } c = 2\Phi(s) - 1 = \operatorname{erf}\left(\frac{s}{\sqrt{2}}\right).$$

[Figure: graph of $2\Phi(x) - 1$ indicating the confidence level $c$ and the corresponding threshold $s(c)$, next to the standard normal density.]
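In code, $s(c)$ is the inverse error function scaled by $\sqrt{2}$; a sketch using scipy (the numbers are illustrative):

```python
import numpy as np
from scipy.special import erfinv

def sample_size(sigma, eps, c):
    """Smallest N with P(|sigma/sqrt(N) * xi| < eps) >= c, xi ~ N(0,1)."""
    s = np.sqrt(2.0) * erfinv(c)   # s(c) solves c = erf(s / sqrt(2))
    return int(np.ceil((sigma / eps) ** 2 * s ** 2))

# e.g. sigma = 1, accuracy eps = 0.01, confidence c = 0.95
print(sample_size(1.0, 0.01, 0.95))   # about 38,415 samples
```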


Sampling Methods

"A random sequence is a vague notion ... in which each term is unpredictable to the uninitiated and whose digits pass a certain number of tests traditional with statisticians ..." (D. H. Lehmer, Berkeley, 1951)

Sampling Methods
Random Number Generators

Random number generators (RNGs) are computer programs which generate a sequence of random numbers that can be taken to be uniformly distributed on $[0,1]$. These sequences are not random but deterministic (pseudo-random); however, they pass certain statistical tests which verify properties of random number sequences.

References:
D. E. Knuth. The Art of Computer Programming. Vol. II, Chapter 3.
Cleve Moler. Numerical Computing with MATLAB. Chapter 9.

We will view the RNG as a black box. Fixing the internal state of the RNG will produce the same sequence of numbers. MATLAB: rand. Note: MATLAB now has three different RNG algorithms to choose from.
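The black-box view and the role of the internal state can be illustrated in Python (NumPy's generator standing in for MATLAB's rand; the seed value is arbitrary):

```python
import numpy as np

# Fixing the internal state (the seed) reproduces the same sequence.
a = np.random.default_rng(seed=42).random(3)
b = np.random.default_rng(seed=42).random(3)
assert np.array_equal(a, b)    # identical pseudo-random sequences
print(a)
```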

Sampling Methods
Inversion Method

If $X$ is a random variable with cdf $F$, i.e., $\mathbb{P}(X \le x) = F(x)$, then $F$ is nondecreasing and we may define its (generalized) inverse function for all $u \in (0,1)$ by
$$F^{-1}(u) := \inf\{x \in \mathbb{R} : F(x) \ge u\}.$$
If $U \sim U[0,1]$, then $X := F^{-1}(U)$ has cdf $F$.

Consequence: if the evaluation of $F^{-1}$ is computationally feasible, then samples $\{x_n\}$ of the random variable $X$ may be generated from samples $\{u_n\}$ of a uniformly distributed random variable $U \sim U[0,1]$ as $x_n = F^{-1}(u_n)$.
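A standard illustration (not from the slides) is the exponential distribution, whose cdf $F(x) = 1 - e^{-\lambda x}$ inverts in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0                       # illustrative rate parameter
u = rng.random(100_000)         # samples of U ~ U[0,1]
x = -np.log(1.0 - u) / lam      # x_n = F^{-1}(u_n) for F(x) = 1 - exp(-lam*x)
print(x.mean())                 # should be close to E[X] = 1/lam = 0.5
```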

Sampling Methods
Example: Univariate Gaussians

If $X \sim N(0,1)$, then
$$F(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^x e^{-t^2/2} \, dt = \frac{1}{2} + \frac{1}{2} \operatorname{erf}\left(\frac{x}{\sqrt{2}}\right).$$
Samples of $X$ may thus be obtained as
$$X = \sqrt{2} \, \operatorname{erf}^{-1}(2U - 1), \qquad U \sim U[0,1].$$
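In code (a sketch, with scipy's erfinv assumed for the inverse error function):

```python
import numpy as np
from scipy.special import erfinv

rng = np.random.default_rng(0)
u = rng.random(100_000)                     # U ~ U[0,1]
x = np.sqrt(2.0) * erfinv(2.0 * u - 1.0)    # X = sqrt(2) * erfinv(2U - 1)
print(x.mean(), x.var())                    # near 0 and 1 for N(0,1)
```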

Sampling Methods
Example: Box-Muller Transform

Given two independent uniformly distributed RVs $U_1, U_2 \sim U[0,1]$, define
$$X_1 := \sqrt{-2 \log U_1} \, \cos(2\pi U_2), \qquad (1a)$$
$$X_2 := \sqrt{-2 \log U_1} \, \sin(2\pi U_2). \qquad (1b)$$
Claim: $X_1, X_2 \sim N(0,1)$, independent.

Polar coordinates: $(x_1, x_2) = (r \cos \vartheta, r \sin \vartheta)$. Transformed density:
$$\frac{1}{2\pi} e^{-(x_1^2 + x_2^2)/2} \, dx_1 \, dx_2 = \frac{1}{2\pi} e^{-r^2/2} \, r \, dr \, d\vartheta.$$

Sampling Methods
Example: Box-Muller Transform

Angular variable: $\vartheta/(2\pi) \sim U[0,1]$, which accounts for $U_2$ in (1). The radial variable $r$ has density $r e^{-r^2/2}$ with cdf
$$F(r) = \int_0^r \rho e^{-\rho^2/2} \, d\rho = 1 - e^{-r^2/2}$$
and inverse function
$$r = F^{-1}(u) = \sqrt{-2 \log(1 - u)}.$$
Noting that $U_1 \sim U[0,1]$ implies $1 - U_1 \sim U[0,1]$, we obtain (1).
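A direct implementation sketch of the transform (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
u1 = 1.0 - rng.random(50_000)      # in (0,1], avoids log(0)
u2 = rng.random(50_000)
r = np.sqrt(-2.0 * np.log(u1))                        # radial part
x1 = r * np.cos(2.0 * np.pi * u2)                     # (1a)
x2 = r * np.sin(2.0 * np.pi * u2)                     # (1b)
print(x1.var(), x2.var(), np.corrcoef(x1, x2)[0, 1])  # ~1, ~1, ~0
```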

Sampling Methods
Acceptance-Rejection Method
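In the standard formulation of this method, one samples from a density $p$ by drawing candidates from a proposal density $q$ satisfying the envelope condition $p(x) \le M q(x)$ and accepting each candidate with probability $p(x)/(M q(x))$. A sketch under these assumptions (the target and proposal are illustrative):

```python
import numpy as np

def accept_reject(p, q_sample, q_pdf, M, n, seed=0):
    """Draw n samples from density p by acceptance-rejection.

    Candidates come from the proposal (sampler q_sample, density q_pdf),
    assuming the envelope condition p(x) <= M * q_pdf(x) for all x.
    """
    rng = np.random.default_rng(seed)
    out = []
    while len(out) < n:
        x = q_sample(rng)
        if rng.random() * M * q_pdf(x) <= p(x):   # accept w.p. p(x)/(M q(x))
            out.append(x)
    return np.array(out)

# Illustrative target p(x) = 6x(1-x) on [0,1]; uniform proposal, M = 1.5
p = lambda x: 6.0 * x * (1.0 - x)
xs = accept_reject(p, lambda rng: rng.random(), lambda x: 1.0, 1.5, 10_000)
print(xs.mean())   # p is symmetric about 1/2, so the mean should be near 0.5
```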


Variance Reduction

Monte Carlo integration: error $\epsilon$ and sample size $N$ are related by
$$\epsilon = O\left(\frac{\sigma}{\sqrt{N}}\right), \qquad \text{computing time} \sim N \approx \left(\frac{\sigma}{\epsilon}\right)^2.$$
Options for acceleration:
(1) reduce $\sigma$;
(2) modify the statistics, i.e., replace the random sample sequence by an alternative sequence which improves the exponent $\frac{1}{2}$ (QMC).

Variance Reduction
Antithetic Variables

Idea: for each sample point $x$, also use $-x$. The resulting quadrature rule is
$$Q_N(f) = \frac{1}{2N} \sum_{n=1}^N [f(x_n) + f(-x_n)].$$
Advantage: existing symmetries are preserved.

Example: computing $\mathbb{E}[f(x)]$, $x \sim N(0, \sigma)$, $\sigma$ small. Setting $x = \sigma \hat{x}$, the Taylor expansion of $f(\sigma \hat{x})$ about $0$ is
$$f(x) = f(0) + f'(0) \sigma \hat{x} + O(\sigma^2).$$
Since the distribution of $\hat{x}$ is symmetric about zero, $\mathbb{E}[\hat{x}] = 0$. For standard MC the first-order terms don't cancel, i.e., the MC error is $\sim \sigma$; here they cancel exactly within each antithetic pair, so the MC error is $\sim \sigma^2$.
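A sketch of the antithetic rule for a standard normal input (the integrand is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.exp(0.1 * x)      # illustrative smooth integrand
x = rng.standard_normal(10_000)

q_plain = f(x).mean()                     # standard MC
q_anti = 0.5 * (f(x) + f(-x)).mean()      # antithetic pairs (x_n, -x_n)
print(q_plain, q_anti, np.exp(0.005))     # exact value E[f] = exp(0.1**2 / 2)
```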

Variance Reduction
Control Variates

Idea: use an auxiliary integrand $g \approx f$ for which $I(g)$ is known, write
$$I(f) = I(f - g) + I(g),$$
and use MC to approximate $I(f - g)$. The resulting quadrature formula is
$$I(f) \approx Q_N(f) := \frac{1}{N} \sum_{n=1}^N [f(x_n) - g(x_n)] + I(g).$$
Error:
$$\epsilon_N(f) = I(f) - Q_N(f) \approx \frac{\sigma_{f-g}}{\sqrt{N}},$$
where
$$\sigma_{f-g}^2 = I\left([(f - g) - I(f - g)]^2\right) = I\left((\tilde{f} - \tilde{g})^2\right), \qquad \tilde{f}(x) := f(x) - I(f), \quad \tilde{g}(x) := g(x) - I(g).$$
Thus: improvement whenever $\sigma_{f-g} \ll \sigma_f$.

Variance Reduction
Control Variates

Optimal use employs a multiplier $\lambda$ such that
$$I(f) = I(f - \lambda g) + \lambda I(g).$$
The error of MC applied to the first integral is proportional to
$$\sigma_{f - \lambda g}^2 = I\left([\tilde{f}(x) - \lambda \tilde{g}(x)]^2\right).$$
The optimal value of $\lambda$ is found by minimizing $\sigma_{f - \lambda g}$, which is obtained for
$$\lambda = \frac{I(\tilde{f} \tilde{g})}{I(\tilde{g}^2)}.$$
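A sketch with an illustrative pair on $[0,1]$ ($g(x) = x$ with known $I(g) = 1/2$); estimating $\lambda$ from the sample, as done here, is a common practical shortcut rather than the exact optimum:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.exp(x)        # illustrative; I(f) = e - 1
g = lambda x: x                # control variate with known I(g) = 1/2
x = rng.random(10_000)

fx, gx = f(x), g(x)
lam = np.cov(fx, gx)[0, 1] / gx.var(ddof=1)  # sample version of I(~f ~g)/I(~g^2)
q_cv = (fx - lam * gx).mean() + lam * 0.5    # Q_N(f - lam*g) + lam*I(g)
print(fx.mean(), q_cv, np.e - 1)             # control-variate estimate is closer
```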

Variance Reduction
Moment Matching

The MC error is due to the statistical sampling error, i.e., the difference between the desired density $p$ and the empirical density associated with the sample points $\{x_n\}$. This is manifested, e.g., in deviating moments: in general
$$m_1 := \mathbb{E}[x] = \int x \, p(x) \, dx \ne \mu_1 := \frac{1}{N} \sum_{n=1}^N x_n, \qquad m_2 := \mathbb{E}[x^2] = \int x^2 p(x) \, dx \ne \mu_2 := \frac{1}{N} \sum_{n=1}^N x_n^2.$$
Partial correction: modify $\{x_n\}$ so that the sample moments $\mu_1, \mu_2$ match $m_1, m_2$. The first moment is matched by
$$x_n \mapsto \tilde{x}_n := (x_n - \mu_1) + m_1.$$

Variance Reduction
Moment Matching

The first two moments are matched by the modified sequence
$$x_n \mapsto \tilde{x}_n := \frac{x_n - \mu_1}{c} + m_1, \qquad c := \sqrt{\frac{\mu_2 - \mu_1^2}{m_2 - m_1^2}}.$$
Caution: the modified samples are no longer independent, which makes error estimates more difficult; the CLT is no longer applicable, and the method is possibly biased.
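A sketch of the two-moment correction, targeting a standard normal ($m_1 = 0$, $m_2 = 1$; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000)
m1, m2 = 0.0, 1.0                                  # desired moments

mu1, mu2 = x.mean(), (x**2).mean()                 # empirical moments
c = np.sqrt((mu2 - mu1**2) / (m2 - m1**2))
x_mm = (x - mu1) / c + m1                          # modified sequence
print(x_mm.mean(), (x_mm**2).mean())               # exactly 0 and 1 (up to rounding)
```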

Variance Reduction
Stratified Sampling

Idea: split the integration region $D = [0,1]$ into $M$ subregions
$$D_k = \left[\frac{k-1}{M}, \frac{k}{M}\right], \qquad |D_k| = \frac{1}{M}, \qquad k = 1, \dots, M.$$
Local averages:
$$\bar{f}(x) := \bar{f}_k := \frac{1}{|D_k|} \int_{D_k} f(x) \, dx, \qquad x \in D_k.$$
For each $k$ sample $N_k := N/M$ points $\{x_i^{(k)}\}$ distributed uniformly in $D_k$. Stratified quadrature formula:
$$Q_N(f) := \frac{1}{N} \sum_{k=1}^M \sum_{i=1}^{N/M} f(x_i^{(k)}).$$
Error:
$$\epsilon \approx \frac{\sigma_s}{\sqrt{N}}, \qquad \sigma_s^2 = I\left((f - \bar{f})^2\right) = \sum_{k=1}^M \int_{D_k} \left(f(x) - \bar{f}_k\right)^2 dx.$$

Variance Reduction
Stratified Sampling

Claim: $\sigma_s \le \sigma$.

More generally: split $D$ into $M$ subregions $D_k$ such that $D = \bigcup_{k=1}^M D_k$. Take $N_k$ random samples on $D_k$ such that $\sum_{k=1}^M N_k = N$. In each $D_k$, choose $x_n^{(k)}$ distributed with density $p^{(k)}$, where
$$p^{(k)}(x) := \frac{p(x)}{p_k}, \qquad p_k := \int_{D_k} p(x) \, dx.$$
Stratified quadrature formula:
$$Q_N(f) := \sum_{k=1}^M \frac{p_k}{N_k} \sum_{n=1}^{N_k} f(x_n^{(k)}).$$
Error:
$$\epsilon_N(f) = I(f) - Q_N(f) = \sum_{k=1}^M \epsilon_{N_k}^{(k)}(f).$$

Variance Reduction
Stratified Sampling

Components:
$$\epsilon_{N_k}^{(k)}(f) \approx \frac{p_k}{\sqrt{N_k}} \left(\int_{D_k} \left(f(x) - \bar{f}_k\right)^2 p^{(k)}(x) \, dx\right)^{1/2} = \sqrt{\frac{p_k}{N_k}} \, \sigma^{(k)},$$
with variances
$$\left[\sigma^{(k)}\right]^2 := p_k \int_{D_k} \left(f(x) - \bar{f}_k\right)^2 p^{(k)}(x) \, dx = \int_{D_k} \left(f(x) - \bar{f}_k\right)^2 p(x) \, dx$$
and averages
$$\bar{f}_k := \frac{1}{p_k} \int_{D_k} f(x) \, p(x) \, dx.$$
One can show: stratification lowers the integration error if the balance condition
$$\frac{p_k}{N_k} = \frac{1}{N}, \qquad k = 1, \dots, M,$$
is satisfied.

Variance Reduction
Stratified Sampling

Resulting error:
$$\epsilon_N \approx \frac{\sigma_s}{\sqrt{N}}, \qquad \sigma_s^2 = \sum_{k=1}^M \left[\sigma^{(k)}\right]^2.$$
Since the variance over a subdomain is always less than that over the entire domain, we have $\sigma_s \le \sigma$.
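A sketch of the equal-probability case on $[0,1]$ (uniform $p$, $p_k = 1/M$, $N_k = N/M$; the integrand is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.exp(x)            # illustrative; I(f) = e - 1
N, M = 10_000, 100                 # N/M points per stratum
k = np.repeat(np.arange(M), N // M)
x = (k + rng.random(N)) / M        # uniform within each D_k = [(k-1)/M, k/M]

q_strat = f(x).mean()              # here Q_N reduces to the plain average
q_plain = f(rng.random(N)).mean()
print(abs(q_strat - (np.e - 1)), abs(q_plain - (np.e - 1)))
```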

Variance Reduction
Importance Sampling

Idea: to approximate $I(f)$, introduce a density $p$ and write
$$I(f) = \int f(x) \, dx = \int \frac{f(x)}{p(x)} \, p(x) \, dx,$$
sample points $\{x_n\}$ from the density $p$ and form the MC approximation
$$Q_N(f) = \frac{1}{N} \sum_{n=1}^N \frac{f(x_n)}{p(x_n)}.$$
Error:
$$\epsilon_N(f) = I(f) - Q_N(f) \approx \frac{\sigma_p}{\sqrt{N}},$$
where
$$\sigma_p^2 = \int \left(\frac{f(x)}{p(x)} - I(f)\right)^2 p(x) \, dx.$$

Variance Reduction
Importance Sampling

Remarks:
- Effective when $f/p$ is nearly constant.
- The most widely used MC variance reduction method.
- One needs to be able to sample from the distribution with density $p$ (e.g., by acceptance-rejection).
- Can be used for rare but important events, i.e., small regions of space where the integrand $f$ is large.
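A closing sketch (the concentrated integrand and the truncated-exponential proposal are illustrative assumptions; here $f/p$ is exactly constant, so the importance-sampling error essentially vanishes):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 10.0
f = lambda x: np.exp(-lam * x)     # integrand concentrated near x = 0
Z = 1.0 - np.exp(-lam)
p = lambda x: lam * np.exp(-lam * x) / Z   # proposal density on [0,1]

N = 10_000
u = rng.random(N)
x = -np.log(1.0 - u * Z) / lam     # inversion sampling from p
q_is = (f(x) / p(x)).mean()        # importance-sampling estimate
q_plain = f(rng.random(N)).mean()  # plain MC for comparison
I_exact = Z / lam                  # I(f) = (1 - exp(-10))/10
print(abs(q_is - I_exact), abs(q_plain - I_exact))
```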