
Adaptive Monte Carlo methods

Jean-Michel Marin, Projet Select, INRIA Futurs, Université Paris-Sud

Joint work with Randal Douc (École Polytechnique), Arnaud Guillin (Université de Marseille) and Christian Robert (Université Paris Dauphine)

Seminar, Montpellier, 17 December 2006

Introduction

Let $(\mathsf{X}, \mathcal{B}(\mathsf{X}), \Pi)$ be a probability space.

(A1) $\Pi \ll \mu$ and $\Pi(dx) = \pi(x)\,\mu(dx)$.

(A2) $\pi$ is known up to a normalizing constant:
$$\pi(x) = \tilde\pi(x) \Big/ \int \tilde\pi(x)\,\mu(dx);$$
$\tilde\pi$ is known; the calculation of $\int \tilde\pi(x)\,\mu(dx) < \infty$ is intractable.

Problem: for some $\Pi$-measurable functions $h$, approximate
$$\Pi(h) = \int h(x)\,\pi(x)\,\mu(dx) = \frac{\int h(x)\,\tilde\pi(x)\,\mu(dx)}{\int \tilde\pi(x)\,\mu(dx)}.$$

(A3) the calculation of $\int h(x)\,\tilde\pi(x)\,\mu(dx)$ is intractable.

More concisely, with $X \sim \Pi$ and $\mu(dx) = dx$, we would like to approximate
$$\Pi(h) = \mathbb{E}_\Pi(h(X)) = \frac{\int h(x)\,\tilde\pi(x)\,dx}{\int \tilde\pi(x)\,dx}.$$

Applications in Bayesian inference, where the target distribution is the posterior distribution of the parameter of interest:
$$\pi(\theta \mid x) \propto f(x \mid \theta)\,\pi_1(\theta),$$
where $f(x \mid \theta)$ ($\theta \in \Theta$) is the likelihood and $\pi_1(\theta)$ the prior distribution of $\theta$. A Bayesian estimator of $\theta$ is the posterior mean of $\theta$, that is,
$$\mathbb{E}^\pi(\theta \mid x) = \frac{\int \theta\, f(x \mid \theta)\,\pi_1(\theta)\,d\theta}{\int f(x \mid \theta)\,\pi_1(\theta)\,d\theta}.$$

Monte Carlo methods (MC)

$\Rightarrow$ Generate an iid sample $x_1, \dots, x_N$ from $\Pi$ and estimate $\mathbb{E}_\Pi(h(X))$ by
$$\hat\Pi^{MC}_N(h) = N^{-1} \sum_{i=1}^N h(x_i).$$

(i) $\hat\Pi^{MC}_N(h) \xrightarrow{\text{a.s.}} \mathbb{E}_\Pi(h(X))$;

(ii) if $\mathbb{E}_\Pi(h^2(X)) < \infty$,
$$\sqrt{N}\left(\hat\Pi^{MC}_N(h) - \mathbb{E}_\Pi(h(X))\right) \xrightarrow{\mathcal{L}} \mathcal{N}\big(0,\ \mathbb{V}_\Pi(h(X))\big).$$

Often impossible to simulate directly from $\Pi$!
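As an illustration of the plain Monte Carlo estimator and its CLT error bar, here is a minimal Python sketch (not part of the slides; the target, test function and all names are our own choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_estimate(h, sampler, N=10_000):
    """Plain Monte Carlo: average h over an iid sample from the target."""
    hx = h(sampler(N))
    # CLT-based standard error, valid when E[h^2(X)] < infinity
    return hx.mean(), hx.std(ddof=1) / np.sqrt(N)

# Example: E[X^2] = 1 under the standard normal target
est, se = mc_estimate(lambda x: x**2, lambda n: rng.standard_normal(n))
print(f"estimate = {est:.4f} +/- {1.96 * se:.4f}")
```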

Markov Chain Monte Carlo methods (MCMC)

$\Rightarrow$ Generate $x^{(1)}, \dots, x^{(T)}$ from a Markov chain $(x^{(t)})_{t \in \mathbb{N}}$ with stationary distribution $\Pi$ and estimate $\mathbb{E}_\Pi(h(X))$ by the average over the last $N$ iterations,
$$\hat\Pi^{MCMC}_N(h) = N^{-1} \sum_{i=T-N+1}^{T} h\big(x^{(i)}\big).$$

Convergence to the stationary distribution can be very slow!

Metropolis-Hastings algorithms

Metropolis-Hastings algorithms are generic (or off-the-shelf) MCMC algorithms, compared with the Gibbs sampler, in the sense that they can be tuned with a much wider range of possibilities.

If the target distribution has density $\pi$, the generic Metropolis-Hastings algorithm is:

Initialization: choose an arbitrary $x^{(0)}$.

Iteration $t$:
1. Given $x^{(t-1)}$, generate $\tilde{x} \sim q(x^{(t-1)}, \cdot)$.
2. Calculate
$$\rho(x^{(t-1)}, \tilde{x}) = \min\left(\frac{\pi(\tilde{x}) / q(x^{(t-1)}, \tilde{x})}{\pi(x^{(t-1)}) / q(\tilde{x}, x^{(t-1)})},\ 1\right).$$
3. With probability $\rho(x^{(t-1)}, \tilde{x})$, accept $\tilde{x}$ and set $x^{(t)} = \tilde{x}$; otherwise reject $\tilde{x}$ and set $x^{(t)} = x^{(t-1)}$.
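A minimal Python sketch of this generic loop (illustrative, not from the slides; it assumes the target density pi can be evaluated up to a constant and the proposal density q can be evaluated pointwise):

```python
import numpy as np

rng = np.random.default_rng(1)

def metropolis_hastings(pi, q_sample, q_density, x0, T):
    """Generic MH: pi may be unnormalized; q_sample(x) draws from q(x, .),
    q_density(x, y) evaluates the proposal density q(x, y)."""
    chain = np.empty(T + 1)
    chain[0] = x0
    for t in range(1, T + 1):
        x = chain[t - 1]
        y = q_sample(x)
        # acceptance probability: min(1, [pi(y)/q(x,y)] / [pi(x)/q(y,x)])
        rho = min(1.0, (pi(y) * q_density(y, x)) / (pi(x) * q_density(x, y)))
        chain[t] = y if rng.uniform() < rho else x
    return chain
```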

This algorithm only needs to simulate from $q$, which we can choose almost arbitrarily, as long as $q$ is capable of reaching all areas of positive probability under $\pi$. While the theoretical guarantees of convergence are strong, the choice of $q$ remains paramount in practice.

The random walk sampler

A random walk proposal has a symmetric transition density $q(x, y) = q_{RW}(y - x)$ where $q_{RW}(x) = q_{RW}(-x)$. In this case the acceptance probability $\rho(x, y)$ reduces to the simpler form
$$\rho(x, y) = \min\left(1,\ \frac{\pi(y)}{\pi(x)}\right).$$

Example

Consider the standard normal distribution $\mathcal{N}(0, 1)$ as a target. If we use a random walk Metropolis-Hastings algorithm with a normal random walk, i.e.
$$\tilde{x} \mid x^{(t-1)} \sim \mathcal{N}(x^{(t-1)}, \sigma^2), \qquad q_{RW}(\tilde{x} - x^{(t-1)}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}\big(\tilde{x} - x^{(t-1)}\big)^2\right),$$
the performance of the sampler depends on the value of $\sigma^2$.
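Reusing the sketch above with a Gaussian random walk proposal makes the dependence on $\sigma^2$ easy to reproduce (illustrative; the values are our own choices meant to mimic the too-small / adequate / too-large regimes of the figures below):

```python
from scipy.stats import norm

pi = lambda x: np.exp(-0.5 * x**2)   # N(0,1) target, unnormalized

for sigma2 in (1e-4, 2.0, 1e3):      # too small / about right / too large
    s = np.sqrt(sigma2)
    chain = metropolis_hastings(
        pi,
        q_sample=lambda x, s=s: x + s * rng.standard_normal(),
        q_density=lambda x, y, s=s: norm.pdf(y, loc=x, scale=s),
        x0=0.0, T=10_000)
    acc = np.mean(chain[1:] != chain[:-1])   # rejections repeat the state
    print(f"sigma^2 = {sigma2:g}: acceptance rate = {acc:.2f}")
```

A tiny $\sigma^2$ gives near-certain acceptance but almost no movement; a huge $\sigma^2$ gives frequent rejections: both show up as slowly decaying autocorrelations.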

Figure 1: (left) $\sigma^2 = 10^{-4}$ and (right) $\sigma^2 = 10^{3}$. Top: sequence of 10,000 iterations subsampled at every 10th iteration; middle: histogram of the last 2,000 iterations compared with the target density; bottom: empirical autocorrelations.

Figure 2: $\sigma^2 = 2$. Top: sequence of 10,000 iterations subsampled at every 10th iteration; middle: histogram of the last 2,000 iterations compared with the target density; bottom: empirical autocorrelations.

Importance sampling

Let $Q$ be a probability distribution on $(\mathsf{X}, \mathcal{B}(\mathsf{X}))$. Suppose that $Q(dx) = q(x)\,dx$ and that $[q(x) = 0] \Rightarrow [\pi(x) = 0]$. Then
$$\Pi(h) = \mathbb{E}_\Pi(h(X)) = \int h(x)\,\frac{\pi(x)}{q(x)}\,q(x)\,dx = \mathbb{E}_Q\left(\frac{\pi(X)}{q(X)}\,h(X)\right) = Q\left(\frac{\pi}{q}\,h\right).$$

$\Rightarrow$ Generate an iid sample $x_1, \dots, x_N$ from $Q$, called the proposal distribution, and estimate $\Pi(h)$ by
$$\hat\Pi^{IS}_{Q,N}(h) = N^{-1} \sum_{i=1}^N \frac{\pi(x_i)}{q(x_i)}\,h(x_i).$$
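A minimal sketch of the unnormalised IS estimator (illustrative; it assumes $\pi$ is fully normalized, and the heavier-tailed Student-t proposal is our own choice):

```python
import numpy as np
from scipy.stats import norm, t as student_t

rng = np.random.default_rng(2)

def importance_sampling(h, pi, q_pdf, q_sample, N=100_000):
    """Unnormalised IS: requires pi fully known, normalizing constant included."""
    x = q_sample(N)
    w = pi(x) / q_pdf(x)            # importance weights pi/q
    return np.mean(w * h(x))

# E[X^2] = 1 under N(0,1), with a Student-t(3) proposal (thicker tails than pi)
est = importance_sampling(lambda x: x**2, norm.pdf,
                          lambda x: student_t.pdf(x, df=3),
                          lambda n: student_t.rvs(df=3, size=n, random_state=rng))
print(est)
```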

(i) $\hat\Pi^{IS}_{Q,N}(h) \xrightarrow{\text{a.s.}} \mathbb{E}_\Pi(h(X))$;

(ii) if $\mathbb{E}_Q\left(\frac{\pi^2(X)}{q^2(X)}\,h^2(X)\right) < \infty$,
$$\sqrt{N}\left(\hat\Pi^{IS}_{Q,N}(h) - \mathbb{E}_\Pi(h(X))\right) \xrightarrow{\mathcal{L}} \mathcal{N}\left(0,\ \mathbb{V}_Q\left(\frac{\pi(X)}{q(X)}\,h(X)\right)\right).$$

For many $h$, a sufficient condition for $\mathbb{E}_Q\left(\frac{\pi^2(X)}{q^2(X)}\,h^2(X)\right) < \infty$ is that $\pi/q$ is bounded.

When the normalizing constant of $\Pi$ is unknown, it is not possible to use $\hat\Pi^{IS}_{Q,N}$. It is then natural to use the self-normalized version of the IS estimator,
$$\hat\Pi^{SNIS}_{Q,N}(h) = \left(\sum_{i=1}^N \frac{\tilde\pi(x_i)}{q(x_i)}\right)^{-1} \sum_{i=1}^N \frac{\tilde\pi(x_i)}{q(x_i)}\,h(x_i).$$
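The self-normalized variant changes only the averaging; a short sketch continuing the one above, where pi_tilde may omit the normalizing constant:

```python
def snis(h, pi_tilde, q_pdf, q_sample, N=100_000):
    """Self-normalized IS: pi_tilde known only up to a constant."""
    x = q_sample(N)
    w = pi_tilde(x) / q_pdf(x)
    return np.sum((w / w.sum()) * h(x))   # normalized weights
```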

(i) $\hat\Pi^{SNIS}_{Q,N}(h) \xrightarrow{\text{a.s.}} \mathbb{E}_\Pi(h(X))$;

(ii) if $\mathbb{E}_Q\left(\frac{\pi^2(X)}{q^2(X)}\,\big(1 + h^2(X)\big)\right) < \infty$,
$$\sqrt{N}\left(\hat\Pi^{SNIS}_{Q,N}(h) - \mathbb{E}_\Pi(h(X))\right) \xrightarrow{\mathcal{L}} \mathcal{N}\left(0,\ \mathbb{V}_Q\left(\frac{\pi(X)}{q(X)}\,\big(h(X) - \Pi(h)\big)\right)\right).$$

The quality of the SNIS approximation depends on the choice of the proposal distribution $Q$.

It is well known that the importance distribution
$$q^\star(x) = |h(x)|\,\pi(x) \Big/ \int |h(y)|\,\pi(y)\,dy$$
minimizes the variance of $\hat\Pi^{IS}_{Q,N}(h)$. It produces a zero-variance estimator when $h$ is either positive or negative (indeed, in both cases, $\hat\Pi^{IS}_{Q,N}(h) = \mathbb{E}_\Pi(h(X))$). $q^\star$ cannot be used in practice because it depends on the integral $\int |h(y)|\,\pi(y)\,dy$. This result is thus rather understood as providing a goal when choosing an importance function $q$ tailored to the approximation of $\mathbb{E}_\Pi(h(X))$.
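To see why $q^\star$ gives a zero-variance estimator when, say, $h > 0$, note that every term of the IS average is then constant:
$$\frac{\pi(x_i)}{q^\star(x_i)}\, h(x_i) = \frac{\pi(x_i)\,\int h(y)\,\pi(y)\,dy}{h(x_i)\,\pi(x_i)}\; h(x_i) = \int h(y)\,\pi(y)\,dy = \mathbb{E}_\Pi\big(h(X)\big),$$
so $\hat\Pi^{IS}_{Q^\star,N}(h)$ equals $\mathbb{E}_\Pi(h(X))$ for every sample, hence has zero variance.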

Similarly,
$$q^{\star\star}(x) = |h(x) - \Pi(h)|\,\pi(x) \Big/ \int |h(y) - \mathbb{E}_\Pi(h(X))|\,\pi(y)\,dy$$
minimizes the asymptotic variance of $\hat\Pi^{SNIS}_{Q,N}(h)$. This second optimum is not available either, because it still depends on $\mathbb{E}_\Pi(h(X))$. There is little in the literature besides general recommendations that the support of $q$ should be the support of $|h(x)|\,\pi(x)$ or of $|h(y) - \Pi(h)|\,\pi(y)$, or that the tails of $q$ should be at least as thick as those of $|h(x)|\,\pi(x)$.

PMC algorithms

The notion of importance sampling can actually be greatly generalized to encompass much more adaptive and local schemes than previously thought. The idea of this extension is to learn from experience, that is, to build an importance sampling function based on the performance of earlier importance sampling proposals. By introducing a temporal dimension to the selection of the importance function, an adaptive perspective can be achieved at little cost, for a potentially large gain in efficiency.

D-kernel PMC algorithm

Let $Q_{i,t}$ be the proposal distribution used at iteration $t$ of the algorithm for particle $x_{i,t}$. Obviously, the quasi-total freedom in the construction of the $Q_{i,t}$'s has drawbacks, namely that some proposals do not necessarily lead to improvements in terms of variance reduction. We now restrict the family of proposals from which to select the new $Q_{i,t}$'s to mixtures of fixed proposals.

We assume from now on that we use in parallel $D$ fixed kernels $Q_d(\cdot, \cdot)$ with densities $q_d$, and that the proposal is a mixture of those kernels,
$$q_{i,t}(x) = \sum_{d=1}^D \alpha_d^{t,N}\, q_d(x_{i,t-1}, x), \qquad \sum_d \alpha_d^{t,N} = 1,$$
where the weights $\alpha_d^{t,N} > 0$ can be modified at each iteration. The amount of adaptivity we allow in this version of PMC is thus restricted to a possible modification of the weights $\alpha_d^{t,N}$.

The importance weight associated with this mixture proposal is
$$\pi(x_{i,t}) \Big/ \sum_{d=1}^D \alpha_d^{t,N}\, q_d(x_{i,t-1}, x_{i,t}),$$
while simulation from $q_{i,t}$ can be decomposed into the two usual mixture steps: first pick the component $d$, then simulate from the corresponding kernel $Q_d$.

Generic D-kernel PMC algorithm

At time 0, produce the sample $(\tilde{x}_{i,0})_{1 \le i \le N}$ and set $\alpha_d^{1,N} = 1/D$ ($1 \le d \le D$).

At time $1 \le t \le T$:

a) Conditionally on the $\alpha_d^{t,N}$'s, generate
$$(K_{i,t})_{1 \le i \le N} \overset{\text{iid}}{\sim} \mathcal{M}\big(1, (\alpha_d^{t,N})_{1 \le d \le D}\big);$$

b) Conditionally on $(x_{i,t-1}, K_{i,t})_{1 \le i \le N}$, generate independently $(\tilde{x}_{i,t})_{1 \le i \le N}$, $\tilde{x}_{i,t} \sim Q_{K_{i,t}}(x_{i,t-1}, \cdot)$, and set
$$\omega_{i,t} = \pi(\tilde{x}_{i,t}) \Big/ \sum_{d=1}^D \alpha_d^{t,N}\, q_d(x_{i,t-1}, \tilde{x}_{i,t});$$

c) Conditionally on $(x_{i,t-1}, K_{i,t}, \tilde{x}_{i,t})_{1 \le i \le N}$, generate
$$(J_{i,t})_{1 \le i \le N} \overset{\text{iid}}{\sim} \mathcal{M}\big(1, (\bar\omega_{i,t})_{1 \le i \le N}\big)$$
(with $\bar\omega_{i,t} = \omega_{i,t} / \sum_{j=1}^N \omega_{j,t}$ the normalized weights), set $x_{i,t} = \tilde{x}_{J_{i,t},t}$ and
$$\alpha_d^{t+1,N} = \Psi_d\big((x_{i,t-1}, \tilde{x}_{i,t}, K_{i,t})_{1 \le i \le N}\big) \quad \text{such that} \quad \sum_{d=1}^D \alpha_d^{t+1,N} = 1.$$
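For concreteness, here is a compact Python sketch of one sweep of this algorithm (our own illustrative code, not the authors' implementation; the weight-update function Psi is left abstract and is supplied by the criteria below):

```python
import numpy as np

rng = np.random.default_rng(3)

def pmc_sweep(x_prev, alpha, pi, kernels_sample, kernels_pdf, Psi):
    """One iteration of the D-kernel PMC algorithm.
    x_prev: (N,) particles; alpha: (D,) mixture weights;
    kernels_sample[d](x) draws from Q_d(x, .); kernels_pdf[d](x, y) = q_d(x, y)."""
    N, D = len(x_prev), len(alpha)
    # a) pick a kernel index K_i for each particle
    K = rng.choice(D, size=N, p=alpha)
    # b) propose from the selected kernels; mixture importance weights
    x_tilde = np.array([kernels_sample[K[i]](x_prev[i]) for i in range(N)])
    mix = sum(alpha[d] * kernels_pdf[d](x_prev, x_tilde) for d in range(D))
    w = pi(x_tilde) / mix
    w_bar = w / w.sum()
    # c) multinomial resampling, then update of the mixture weights
    J = rng.choice(N, size=N, p=w_bar)
    return x_tilde[J], Psi(w_bar, K, D), w, x_tilde
```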

$\Psi_d$ ($1 \le d \le D$) denotes an update function that depends upon the past iteration.

(A1) For all $d \in \{1, \dots, D\}$, $\Pi \otimes \Pi\,\{q_d(x, x') = 0\} = 0$ (the individual kernel importance weights are almost surely finite).

Theorem 1. Under (A1) and (A2), for any function $h$ in $L^1_\pi$ and for all $t \ge 0$, both the unnormalised and the self-normalised PMC estimators are convergent:
$$\hat\Pi^{PMC}_{t,N}(h) = \frac{1}{N} \sum_{i=1}^N \omega_{i,t}\, h(\tilde{x}_{i,t}) \xrightarrow[N \to \infty]{P} \mathbb{E}_\Pi(h(X))$$
and
$$\hat\Pi^{SNPMC}_{t,N}(h) = \left(\sum_{i=1}^N \omega_{i,t}\right)^{-1} \sum_{i=1}^N \omega_{i,t}\, h(\tilde{x}_{i,t}) \xrightarrow[N \to \infty]{P} \mathbb{E}_\Pi(h(X)).$$

As noted earlier, the unnormalised PMC estimator can only be used when $\pi$ is completely known.

(A2) For all $d \in \{1, \dots, D\}$,
$$\Pi \otimes \Pi\left\{\big(1 + h^2(x')\big)\,\frac{\pi(x')}{q_d(x, x')}\right\} < \infty \quad \text{(integrability condition)}.$$

Theorem 2. Under (A1) and (A2), if for all $t \ge 1$ and $1 \le d \le D$, $\alpha_d^{t,N} \xrightarrow[N \to \infty]{P} \alpha_d^t > 0$, then both
$$\sqrt{N}\left(\hat\Pi^{SNPMC}_{t,N}(h) - \mathbb{E}_\Pi(h(X))\right) \quad \text{and} \quad \sqrt{N}\left(\hat\Pi^{PMC}_{t,N}(h) - \mathbb{E}_\Pi(h(X))\right)$$
converge in distribution as $N$ goes to infinity to centered normal distributions with variances

$$\sigma^2_{1,t} = \Pi \otimes \Pi\left(\big(h(x') - \mathbb{E}_\Pi(h(X))\big)^2\, \frac{\pi(x')}{\sum_{d=1}^D \alpha_d^t\, q_d(x, x')}\right)$$
and
$$\sigma^2_{2,t} = \Pi \otimes \Pi\left(\frac{\pi(x')}{\sum_{d=1}^D \alpha_d^t\, q_d(x, x')}\left(h(x') - \mathbb{E}_\Pi(h(X))\,\frac{\sum_{d=1}^D \alpha_d^t\, q_d(x, x')}{\pi(x')}\right)^2\right).$$

A first Kullback-Leibler criterion

Let
$$\mathcal{S} = \left\{\alpha = (\alpha_1, \dots, \alpha_D);\ \forall d \in \{1, \dots, D\},\ \alpha_d \ge 0 \ \text{and} \ \sum_{d=1}^D \alpha_d = 1\right\}.$$

For $\alpha \in \mathcal{S}$, let us denote by $KL_1(\alpha)$ the Kullback-Leibler divergence between the target distribution $\Pi$ and the mixture:
$$KL_1(\alpha) = \int \log\left[\frac{\pi(x)\,\pi(x')}{\pi(x)\sum_{d=1}^D \alpha_d\, q_d(x, x')}\right] \Pi \otimes \Pi(dx, dx').$$

First Kullback-Leibler divergence criterion: the best mixture of transition kernels is the one that minimizes $KL_1(\alpha)$.

Theorem 3. Under (A1) and (A2), for both the unnormalised and the self-normalised cases, the updates $\Psi_d$ of the mixture weights given by
$$\alpha_d^{t+1,N} = \sum_{i=1}^N \bar\omega_{i,t}\, \mathbb{I}_d(K_{i,t})$$
guarantee a systematic decrease of $KL_1$, a long-term run of the algorithm providing the mixture that is $KL_1$-closest to the target.
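In code, this update is one line per kernel: each $\alpha_d$ collects the normalized weights of the particles that kernel $d$ proposed (a sketch that plugs into the Psi slot of the sweep above):

```python
def Psi_kl1(w_bar, K, D):
    """Theorem 3 update: alpha_d <- sum of normalized weights w_bar_i
    over the particles i whose proposal came from kernel d."""
    alpha = np.array([w_bar[K == d].sum() for d in range(D)])
    return alpha / alpha.sum()   # guard against rounding drift
```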

A first toy example

Target: $\pi(x) = \frac{1}{3} f_{\mathcal{N}(-1, 0.1)}(x) + \frac{1}{3} f_{\mathcal{N}(0, 1)}(x) + \frac{1}{3} f_{\mathcal{N}(3, 10)}(x)$.

3 proposal distributions: $\mathcal{N}(-1, 0.1)$, $\mathcal{N}(0, 1)$ and $\mathcal{N}(3, 10)$ (simpler than transition kernels).

Use of the Rao-Blackwellized 3-kernel algorithm with $N = 100{,}000$; a code sketch follows below.
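Wiring the sweep and the $KL_1$ update together for this example might look as follows (illustrative only: position-independent proposals are encoded as kernels that ignore their first argument, the basic Theorem 3 update replaces the Rao-Blackwellized variant of the slides, and a smaller $N$ is used):

```python
from scipy.stats import norm

means, sds = [-1.0, 0.0, 3.0], [np.sqrt(0.1), 1.0, np.sqrt(10.0)]
pi = lambda x: sum(norm.pdf(x, m, s) for m, s in zip(means, sds)) / 3.0

kernels_sample = [lambda x, m=m, s=s: rng.normal(m, s) for m, s in zip(means, sds)]
kernels_pdf = [lambda x, y, m=m, s=s: norm.pdf(y, m, s) for m, s in zip(means, sds)]

x = rng.normal(0.0, 3.0, size=10_000)       # N = 10^4 here vs 10^5 in the slides
alpha = np.full(3, 1.0 / 3.0)
for t in range(10):
    x, alpha, w, x_tilde = pmc_sweep(x, alpha, pi, kernels_sample,
                                     kernels_pdf, Psi_kl1)
    print(t + 1, np.round(alpha, 4))
```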

t    $\alpha_1^t$   $\alpha_2^t$   $\alpha_3^t$
1    0.0500000   0.0500000   0.9000000
2    0.1815457   0.1002131   0.7182413
3    0.3142235   0.1490543   0.5367222
4    0.3632511   0.1910549   0.4456940
5    0.3674450   0.2301243   0.4024307
6    0.3529385   0.2858416   0.3612199
7    0.3405808   0.3119567   0.3474625
8    0.3414635   0.3208192   0.3377173
9    0.3356331   0.3295758   0.3347911
10   0.3357052   0.3312788   0.3330160

Table 1: Evolution of the proposal mixture weights over the PMC iterations.

A second toy example

Target: $\Pi = \mathcal{N}(0, 1)$.

3 Gaussian random walk proposals: $q_1(x, x') = f_{\mathcal{N}(x, 0.1)}(x')$, $q_2(x, x') = f_{\mathcal{N}(x, 2)}(x')$ and $q_3(x, x') = f_{\mathcal{N}(x, 10)}(x')$.

Use of the Rao-Blackwellized 3-kernel algorithm with $N = 100{,}000$.

t    $\alpha_1^t$   $\alpha_2^t$   $\alpha_3^t$
1    0.33333   0.33333   0.33333
2    0.24415   0.43145   0.32443
3    0.19525   0.52445   0.28031
4    0.10725   0.72955   0.16324
5    0.08223   0.83092   0.08691
6    0.06155   0.88355   0.05490
7    0.04255   0.92950   0.02795
8    0.03790   0.93760   0.02450
9    0.03130   0.94505   0.02365
10   0.03460   0.94875   0.01665

Table 2: Evolution of the proposal mixture weights over the PMC iterations.

A second Kullback-Leibler criterion in the unnormalised case

For $\alpha \in \mathcal{S}$, let us denote by $KL_2(\alpha)$ the Kullback-Leibler divergence between $|h(x)|\,\pi(x)$ and the mixture:
$$KL_2(\alpha) = \int \log\left[\frac{\pi(x)\,|h(x')|\,\pi(x')}{\pi(x)\sum_{d=1}^D \alpha_d\, q_d(x, x')}\right] \Pi \otimes \Pi(dx, dx').$$

Second Kullback-Leibler divergence criterion: the best mixture of transition kernels is the one that minimizes $KL_2(\alpha)$.

Theorem 4. Under (A1), for the unnormalised case, the updates $\Psi_d$ of the mixture weights given by
$$\alpha_d^{t+1,N} = \sum_{i=1}^N \omega_{i,t}\,|h(\tilde{x}_{i,t})|\, \mathbb{I}_d(K_{i,t}) \Big/ \sum_{i=1}^N \omega_{i,t}\,|h(\tilde{x}_{i,t})|$$
guarantee a systematic decrease of $KL_2$, a long-term run of the algorithm providing the mixture that is $KL_2$-closest to the target.

Asymptotic variance criterion

For $\alpha \in \mathcal{S}$, let us define
$$\sigma^2_1(\alpha) = \Pi \otimes \Pi\left(\big(h(x') - \Pi(h)\big)^2\, \frac{\pi(x')}{\sum_{d=1}^D \alpha_d\, q_d(x, x')}\right) \quad \text{(self-normalised case)}$$
and
$$\sigma^2_2(\alpha) = \Pi \otimes \Pi\left(\frac{\pi(x')}{\sum_{d=1}^D \alpha_d\, q_d(x, x')}\left(h(x') - \Pi(h)\,\frac{\sum_{d=1}^D \alpha_d\, q_d(x, x')}{\pi(x')}\right)^2\right) \quad \text{(unnormalised case)}.$$

Asymptotic variance criterion: the best mixture of transition kernels is the one that minimizes $\sigma^2_1(\alpha)$ or $\sigma^2_2(\alpha)$.

Theorem 5. Under (A1) and (A2), for the unnormalised case, the updates $\Psi_d$ of the mixture weights given by
$$\alpha_d^{t+1,N} = \sum_{i=1}^N \omega_{i,t}^2\, h^2(\tilde{x}_{i,t})\, \mathbb{I}_d(K_{i,t}) \Big/ \sum_{i=1}^N \omega_{i,t}^2\, h^2(\tilde{x}_{i,t})$$
guarantee a systematic decrease of $\sigma^2_2$, a long-term run of the algorithm providing the mixture that is $\sigma^2_2$-closest to the target.

Theorem 6. Under (A1) and (A2), for the self-normalised case, the updates $\Psi_d$ of the mixture weights given by
$$\alpha_d^{t+1,N} = \frac{\displaystyle\sum_{i=1}^N \omega_{i,t}^2 \left(h(\tilde{x}_{i,t}) - \sum_{j=1}^N \bar\omega_{j,t}\, h(\tilde{x}_{j,t})\right)^2 \mathbb{I}_d(K_{i,t})}{\displaystyle\sum_{i=1}^N \omega_{i,t}^2 \left(h(\tilde{x}_{i,t}) - \sum_{j=1}^N \bar\omega_{j,t}\, h(\tilde{x}_{j,t})\right)^2}$$
guarantee a systematic decrease of $\sigma^2_1$, a long-term run of the algorithm providing the mixture that is $\sigma^2_1$-closest to the target.

A final example

Target $\mathcal{N}(0, 1)$ and $h(x) = x$.

In this case, it is well known that the optimal importance distribution minimizing the variance of the unnormalised importance sampling estimator is
$$q^\star(x) \propto |x| \exp(-x^2/2).$$

We choose $q^\star$ as one of $D = 3$ independent kernels, the other kernels being the $\mathcal{N}(0, 1)$ and the $\mathcal{C}(0, 1)$ (Cauchy) distributions. $N = 100{,}000$ and $T = 20$.

t    $\hat\pi^{PMC}_{t,N}(x)$   $\alpha_1^{t+1,N}$   $\alpha_2^{t+1,N}$   $\alpha_3^{t+1,N}$   $\sigma_{2,t}$
1     0.0000   0.1000   0.8000   0.1000   0.9524
2    -0.0030   0.1144   0.7116   0.1740   0.9192
3    -0.0017   0.1191   0.6033   0.2776   0.8912
4    -0.0006   0.1189   0.4733   0.4078   0.8608
5    -0.0035   0.1084   0.3545   0.5371   0.8394
10    0.0065   0.0519   0.0622   0.8859   0.8016
15    0.0033   0.0305   0.0136   0.9559   0.7987
20   -0.0042   0.0204   0.0041   0.9755   0.7984

Figure 3: Estimation of $\mathbb{E}[X] = 0$ for a normal variate: decrease of the standard deviation $\sigma_{2,t}$ to its optimal value over the PMC iterations.