Using systematic sampling for approximating Feynman-Kac solutions by Monte Carlo methods


Using systematic sampling for approximating Feynman-Kac solutions by Monte Carlo methods

Ivan Gentil and Bruno Rémillard

Abstract. While convergence properties of many sampling selection methods can be proven to hold in the context of approximating Feynman-Kac solutions by sequential Monte Carlo simulations, one particular sampling selection method, introduced by Baker (1987) and closely related to systematic sampling in statistics, has been treated exclusively on an empirical basis. The main motivation of the paper is to start to study formally its convergence properties, since in practice it is by far the fastest selection method available. One will show that convergence results for the systematic sampling selection method are related to properties of peculiar Markov chains.

Keywords: Feynman-Kac formulae, sequential Monte Carlo, genetic algorithms, systematic sampling, Markov chains. MSC (2000): 65C35, 60G35, 60J10.

1 Introduction

Let (X_k)_{k≥0} be a non-homogeneous Markov chain on a locally compact metric space E, with transition kernels (K_n)_{n≥1} and initial law η_0 defined on the Borel σ-field B(E). Further let B_b(E) be the set of bounded B(E)-measurable functions. Given a sequence (g_n)_{n≥1} of positive functions in B_b(E), suppose that one wants to calculate recursively the following Feynman-Kac formulae (η_n)_{n≥1}:

(1)  η_n(f) = γ_n(f) / γ_n(1),  f ∈ B_b(E),

where

(2)  γ_n(f) = E( f(X_n) ∏_{k=1}^{n} g_k(X_{k−1}) ).

Note that most nonlinear filtering problems are particular cases of Feynman-Kac formulae. Following Crisan et al. (1999) and Del Moral and Miclo (2000), let M_1(E) denote the set of probability measures on (E, B(E)). If µ ∈ M_1(E) and n ≥ 0, let µK_n be the probability measure defined on B_b(E) by

µK_n(f) = µ(K_n f) = ∫∫_E f(z) K_n(x, dz) µ(dx).
In order to understand the relation between the η_n's, for any n ≥ 1, let ψ_n : M_1(E) → M_1(E) be defined by

ψ_n(η)f = η(g_n f) / η(g_n),  η ∈ M_1(E), f ∈ B_b(E),

and let Φ_n denote the mapping from M_1(E) to M_1(E) defined by

Φ_n(η) = ψ_n(η) K_n.

CEREMADE, Université Paris-Dauphine, Place du Maréchal de Lattre de Tassigny, F-75775 Paris cedex 16, France, gentil@ceremade.dauphine.fr

HEC Montréal, Service de l'enseignement des méthodes quantitatives de gestion, 3000, chemin de la Côte-Sainte-Catherine, Canada H3T 2A7, Bruno.Remillard@hec.ca

Then it is easy to check that for any n ≥ 1,

(3)  η_n = Φ_n(η_{n−1}).

Note also that for any n ≥ 0, the mapping Φ_{n+1} can be decomposed into

(4)  η̂_n = ψ_{n+1}(η_n),  η_{n+1} = η̂_n K_{n+1},  n ≥ 0, η_0 ∈ M_1(E).

Further remark that the first transformation, η_n ↦ η̂_n, is non-linear, while the second one, η̂_n ↦ η_{n+1}, is linear. Even if the forward system of equations (3) looks simple, it can rarely be solved analytically, and even when it can, it requires extensive calculations. This is why algorithms for approximating (η_n)_{n≥1}, starting from η_0, are so important. One such method, presented in the remarkable surveys of Del Moral and Miclo (2000) and Crisan and Doucet (2002) and in the book of Del Moral (2004), is to build approximations of the measures (η_n)_{n≥1} using interacting particle systems. The algorithm uses decomposition (4), and by analogy with genetics, the first step, which is related to a sampling selection method, is often referred to as the selection step, while the second one is termed the mutation step, although in reality it is a Markovian evolution of the particles. The speed of the latter cannot be improved in general, so the speed of any algorithm depends on the rapidity of the sampling selection process. In this paper, one discusses the properties of a particular algorithm that is called the systematic sampling selection herein, while in the genetic algorithms literature it has been strangely called stochastic universal sampling selection. It seems to have appeared first in Baker (1987). It was reintroduced in the filtering literature by Künsch (2005). In what follows, a description of the general algorithm is given in Section 2, with a few examples of sampling selection methods, together with some tools for studying its convergence. In Section 3, one focuses on the systematic sampling selection method, giving some of its properties and stating some convergence results and a conjecture, based on results for Markov chains proved in the appendix.
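When E is finite, the recursion (3)-(4) can be computed exactly, which gives a reference point for the particle approximations discussed below. Here is a minimal sketch; the two-state chain, the potential g and the kernel K are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Exact computation of the recursion (3)-(4) on a finite state space.
def feynman_kac_step(eta, g, K):
    """One step: selection eta -> psi(eta) (reweight by g), then mutation by K."""
    weighted = eta * g                  # unnormalized reweighting eta(g f)
    psi = weighted / weighted.sum()     # psi(eta): the non-linear step
    return psi @ K                      # mutation by the kernel: the linear step

eta0 = np.array([0.5, 0.5])             # initial law eta_0
g = np.array([1.0, 2.0])                # bounded positive potential (assumption)
K = np.array([[0.9, 0.1],
              [0.2, 0.8]])              # Markov transition kernel (assumption)
eta1 = feynman_kac_step(eta0, g, K)     # eta_1 = Phi_1(eta_0)
assert abs(eta1.sum() - 1.0) < 1e-12    # still a probability measure
```

Iterating `feynman_kac_step` with the successive potentials g_n and kernels K_n produces the whole sequence (η_n).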
Finally, in Section 4, numerical comparisons between sampling selection methods are made through a simple model of nonlinear filtering for noisy black-and-white images.

2 Algorithm and sampling selection methods

The general algorithm for approximating the solution of (3) is given first, following the exposition in Crisan et al. (1999) and Del Moral and Miclo (2000); particular sampling selection methods are presented next. Throughout the rest of the paper, it is assumed that for any n ≥ 1, inf_{x∈E} g_n(x) > 0.

2.1 General algorithm

Let N ≥ 1 be an integer representing the number of particles, and for any n ≥ 0, let ξ_n = {ξ_n^1, …, ξ_n^N} denote the particles at time n and set

η_n^N = (1/N) Σ_{i=1}^N δ_{ξ_n^i}.

At time n = 0, the initial particle system ξ_0 = {ξ_0^1, …, ξ_0^N} consists of N independent and identically distributed particles with common law η_0.

For each n ≥ 1, the particle system ξ_n = {ξ_n^1, …, ξ_n^N}, consisting of N particles, is obtained in the following way:

(Sampling/Selection) First calculate the weights vector W_n ∈ (0, 1)^N, where

(5)  W_n^i = g_n(ξ_{n−1}^i) / Σ_{j=1}^N g_n(ξ_{n−1}^j),  i = 1, …, N.

Then, according to a given sampling selection method, select a sample ξ̂_{n−1} = {ξ̂_{n−1}^1, …, ξ̂_{n−1}^N} of size N from ξ_{n−1}.

(Evolution/Mutation) Given ξ̂_{n−1}, the new particle system ξ_n consists of particles ξ_n^i chosen independently from the law K_n(ξ̂_{n−1}^i, dx), 1 ≤ i ≤ N. In other words, for any z = (z^1, …, z^N) ∈ E^N,

P( ξ_n ∈ dx | ξ̂_{n−1} = z ) = ∏_{i=1}^N K_n(z^i, dx^i).

Note that in order to describe a sampling selection method, it suffices to define how the numbers M_n^1, …, M_n^N ∈ {0, 1, …, N} are randomly selected, with M_n^i representing the number of times particle ξ_{n−1}^i appears in the new sample. Therefore, one can write

η̂_{n−1}^N = (1/N) Σ_{i=1}^N δ_{ξ̂_{n−1}^i} = (1/N) Σ_{i=1}^N M_n^i δ_{ξ_{n−1}^i}.

A sampling selection method will be said to be conditionally unbiased if, for any i ∈ {1, …, N} and any k ≥ 1,

E( M_k^i | ξ_{k−1} ) = N W_k^i.

Remark 2.1. Conditional unbiasedness yields the following property:

(6)  E( η_k^N f | ξ_{k−1} ) = Φ_k(η_{k−1}^N)(f),  f ∈ B_b(E).

For, in that case,

E( η_n^N f | ξ_{n−1} ) = E( E( η_n^N f | ξ_{n−1}, ξ̂_{n−1} ) | ξ_{n−1} ) = E( (1/N) Σ_{i=1}^N M_n^i K_n f(ξ_{n−1}^i) | ξ_{n−1} ) = Σ_{i=1}^N W_n^i K_n f(ξ_{n−1}^i) = Φ_n(η_{n−1}^N)(f).

The mean square error of a particular sampling selection method can be obtained using the following useful result of (Del Moral and Miclo, 2000, Theorem 2.36). Before stating the result, define, for any measurable η with values in M_1(E),

‖η‖_2^2 = sup_{f∈B_b(E), ‖f‖≤1} E( (ηf)^2 ).

Theorem 2.2. Assume that (i) the sampling selection method is conditionally unbiased, and (ii) the following condition is verified for all 1 ≤ k ≤ n: there exists a constant C_k such that for all N-dimensional vectors (q^1, …, q^N) ∈ R^N,

(7)  E[ ( Σ_{i=1}^N (M_k^i − N W_k^i) q^i )^2 | ξ_{k−1} ] ≤ C_k N max_{1≤i≤N} |q^i|^2.

Then, for all 1 ≤ k ≤ n, there exists a constant C_k such that ‖η_k^N − η_k‖_2^2 ≤ C_k / N.
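One full selection/mutation step of the general algorithm can be sketched as follows. The potential g and the Gaussian mutation kernel are illustrative assumptions; multinomial selection (the simple random sampling of Section 2.3.1) is used here as the selection method.

```python
import numpy as np

rng = np.random.default_rng(0)

# One selection/mutation step of the general particle algorithm (sketch).
def particle_step(xi, g, rng):
    w = g(xi)
    W = w / w.sum()                          # weights (5)
    M = rng.multinomial(len(xi), W)          # counts M^1, ..., M^N (multinomial selection)
    selected = np.repeat(xi, M)              # hat(xi): particle i kept M^i times
    # Mutation: independent moves; a Gaussian kernel is an assumption here.
    return selected + rng.normal(scale=0.1, size=len(selected))

xi = rng.normal(size=1000)                   # N i.i.d. particles from eta_0
new_xi = particle_step(xi, lambda x: np.exp(-x**2), rng)
assert len(new_xi) == 1000                   # the particle number is preserved
```

Swapping the `rng.multinomial` line for another rule that produces counts M^1, …, M^N yields the other selection methods of Section 2.3.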

In what follows, only conditionally unbiased sampling selection methods are considered. As shown in Remark 3.3, in general the systematic sampling selection method defined below does not satisfy condition (7) of the previous theorem, while classical sampling selection methods, like the ones listed in Section 2.3, do satisfy it. Therefore, weaker conditions must be imposed in order to obtain mean square convergence. In fact, one has the following result.

Theorem 2.3. Let (a_N)_{N≥1} be a sequence such that a_N/N → 0 as N → ∞. Assume that the sampling selection method is conditionally unbiased. Then

lim_{N→∞} a_N max_{1≤k≤n} ‖η_k^N − η_k‖_2^2 = 0

if and only if, for any f ∈ B_b(E),

(8)  lim_{N→∞} (a_N/N^2) max_{1≤k≤n} E[ ( Σ_{i=1}^N (M_k^i − N W_k^i) f(ξ_{k−1}^i) )^2 ] = 0.

Moreover, sup_N a_N max_{1≤k≤n} ‖η_k^N − η_k‖_2^2 is finite if and only if

sup_N (a_N/N^2) max_{1≤k≤n} E[ ( Σ_{i=1}^N (M_k^i − N W_k^i) f(ξ_{k−1}^i) )^2 ] < ∞.

Proof. Suppose that a_N/N → 0 and let f ∈ B_b(E) be given. First note that, using the unbiasedness condition together with (6), one has, for any k ∈ {1, …, n},

(9)  E[ (η_k^N f − η_k f)^2 ] = E[ (η_k^N f − Φ_k(η_{k−1}^N)f)^2 ] + E[ (Φ_k(η_{k−1}^N)f − η_k f)^2 ].

Since g_k ≥ c_k > 0 by hypothesis, for some positive constant c_k, 1 ≤ k ≤ n, it follows that lim_{N→∞} a_N max_{1≤k≤n} ‖η_k^N − η_k‖_2^2 = 0 if and only if, for any k = 1, …, n, lim_{N→∞} a_N E[ (η_k^N f − Φ_k(η_{k−1}^N)f)^2 ] = 0. Next, it can easily be shown that for any k = 1, …, n, E[ (η_k^N f − Φ_k(η_{k−1}^N)f)^2 | ξ_{k−1} ] can be written as

(1/N^2) E[ ( Σ_{i=1}^N (M_k^i − N W_k^i) K_k f(ξ_{k−1}^i) )^2 | ξ_{k−1} ] + (1/N) Φ_k(η_{k−1}^N)( K_k f^2 − (K_k f)^2 ).

Since K_k f ∈ B_b(E), 0 ≤ Φ_k(η_{k−1}^N)( K_k f^2 − (K_k f)^2 ) ≤ ‖f‖^2, and a_N/N → 0, it follows from the calculations above that lim_{N→∞} a_N max_{1≤k≤n} E[ (η_k^N f − Φ_k(η_{k−1}^N)f)^2 ] = 0 if and only if (8) holds true. The rest of the proof is similar, so it is omitted.

2.2 Systematic sampling

By obvious analogy with systematic sampling in statistics, the first sampling selection method described is simply called systematic sampling.
It appears that this method was first proposed by Baker (1987) under the strange name stochastic universal sampling, in a context of unbiased sampling selection for genetic algorithms. However, nobody has formally studied its convergence properties.

As opposed to the definition of Baker (1987), the sampling selection method can simply be defined in the following way: for n ≥ 1, let U_n be a uniform random variable on [0, 1) and set, for w ∈ [0, 1],

M(w, U_n) := ⌊N w + U_n⌋,

where ⌊x⌋ denotes the integer part of x. Then

M_n^1 := M(W_n^1, U_n),  M_n^k := M(W_n^1 + ⋯ + W_n^k, U_n) − M(W_n^1 + ⋯ + W_n^{k−1}, U_n),  k = 2, …, N.

Since M(1, U_n) = N, one gets Σ_{i=1}^N M_n^i = N; therefore the number of particles is always N. Properties of this sampling selection method are examined in Section 3.

2.3 Other sampling methods

One can grossly classify the various sampling selection methods into two categories, according as the number of particles is constant or random. The following list is by no means exhaustive. For the first two methods, N is constant, while N_n fluctuates in the last two methods. For other sampling selection methods, one may consult Crisan et al. (1999), Del Moral and Miclo (2000), Del Moral (2004) and references therein. Note that the last two methods are particular cases of what are known as branching selection methods in the filtering literature.

2.3.1 Simple random sampling

This selection method is based on simple random sampling without rejection. It follows that

( M_n^1, …, M_n^N ) ~ Multinomial( N, W_n^1, …, W_n^N ),

where the (W_n^i) are given by (5). This sampling selection method is computationally demanding, but it has many interesting properties that have been studied mainly by Del Moral and co-authors, e.g. see Del Moral (2004). In particular, conditions (i)-(ii) of Theorem 2.2 are met; one can also prove a central limit theorem and large deviations properties.

2.3.2 The remainder stochastic sampling

This algorithm was first introduced by Brindle (1980) in a context of unbiased sampling selection for genetic algorithms; see also Baker (1985, 1987) for comparisons between sampling selection methods in the latter context. It is also defined as residual sampling by Liu and Chen (1998).
It is much faster to implement than the simple random sampling selection method, it satisfies conditions (i)-(ii) of Theorem 2.2, and recently Douc and Moulines (2005) investigated some of its convergence properties. See also Del Moral and Miclo (2000) and the references therein. To describe the selection method, first define

Ñ = N − Σ_{i=1}^N ⌊N W_n^i⌋ = Σ_{i=1}^N {N W_n^i},

where {x} stands for the fractional part of x, i.e. {x} = x − ⌊x⌋. Each particle ξ_{n−1}^i is first retained ⌊N W_n^i⌋ times; next, the (possibly) remaining Ñ particles are allocated via simple random sampling, i.e.

( M_n^1 − ⌊N W_n^1⌋, …, M_n^N − ⌊N W_n^N⌋ ) ~ Multinomial( Ñ, W̃_n^1, …, W̃_n^N ),

with W̃_n^i = {N W_n^i} / Σ_{j=1}^N {N W_n^j}, 1 ≤ i ≤ N.

2.3.3 Binomial sampling

As stated before, for this sampling selection method and the next one, the number of particles at time n is random; it is denoted by N_n, n ≥ 0. Of course, N_0 is fixed. For n ≥ 1, and given ξ_{n−1} and N_{n−1}, the counts M_n^1, …, M_n^{N_{n−1}} are independent and M_n^i ~ Bin(N_{n−1}, W_n^i), for i = 1, …, N_{n−1}. It follows that N_n = Σ_{i=1}^{N_{n−1}} M_n^i. This sampling selection method is a little faster than the simple random sampling selection method, but a major drawback is that there is no control on the number of particles. Moreover, P(N_n = 0) > 0.
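The counts M_n^i of the systematic method of Section 2.2 and of the remainder stochastic method above can be sketched in a few lines (the weight vector below is an arbitrary assumption):

```python
import numpy as np

def systematic_counts(W, U):
    """Section 2.2: M(w, U) = floor(N*w + U), differenced over cumulative weights."""
    N = len(W)
    edges = np.floor(N * np.concatenate(([0.0], np.cumsum(W))) + U)
    return np.diff(edges).astype(int)

def residual_counts(W, rng):
    """Remainder stochastic sampling: keep floor(N W^i) copies of particle i,
    then allocate the remaining N_tilde particles multinomially on the fractions."""
    N = len(W)
    base = np.floor(N * W).astype(int)
    frac = N * W - base
    n_rest = N - base.sum()                  # N_tilde
    if n_rest > 0:
        base += rng.multinomial(n_rest, frac / frac.sum())
    return base

rng = np.random.default_rng(0)
W = np.array([0.1, 0.2, 0.3, 0.4])           # assumed weight vector
M_sys = systematic_counts(W, rng.uniform())
M_res = residual_counts(W, rng)
assert M_sys.sum() == 4 and M_res.sum() == 4  # both keep N particles
```

Note that the systematic method consumes a single uniform variate per selection step, which is the source of its speed.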

2.3.4 Bernoulli sampling

The Bernoulli sampling selection method was introduced in Crisan et al. (1998). See also Crisan and Lyons (1999) and Crisan (2003) for additional properties of this sampling selection. It is worth noting that M_n^i takes the same values as in the systematic sampling selection method, provided N_{n−1} = N. In fact, for n ≥ 1, and given ξ_{n−1} and N_{n−1}, the counts M_n^1, …, M_n^{N_{n−1}} are independent, where M_n^i is defined by

M_n^i = ⌊N_{n−1} W_n^i⌋ + ε_n^i,  ε_n^i ~ Ber({N_{n−1} W_n^i}),  1 ≤ i ≤ N_{n−1}.

Note that the following alternative representation also holds:

M_n^i = ⌊N_{n−1}(W_n^1 + ⋯ + W_n^i) + U_n^i⌋ − ⌊N_{n−1}(W_n^1 + ⋯ + W_n^{i−1}) + U_n^i⌋,

where U_n^1, …, U_n^{N_{n−1}} are independent and U_n^i ~ Unif([0, 1)), given ξ_{n−1}, N_{n−1}.

3 Some properties and results for systematic sampling selection

Throughout the rest of the paper, the selection method is the one defined in Section 2.2. Let us start with some elementary properties of systematic sampling selection.

Lemma 3.1. Suppose that U_n is uniformly distributed over [0, 1). Then, conditionally on ξ_{n−1}, one has, for any i ∈ {1, …, N},

(10)  M_n^i − ⌊N W_n^i⌋ ~ Ber({N W_n^i}).

In particular, for any i ∈ {1, …, N}, E( M_n^i | ξ_{n−1} ) = N W_n^i.

Proof. It suffices to show that whenever U ~ Unif([0, 1)) and x, y ≥ 0, then ⌊U + x + y⌋ − ⌊U + x⌋ − ⌊y⌋ is a Bernoulli random variable with parameter p = {y}. To this end, first note that V = {U + x} is also uniformly distributed on [0, 1). Next,

⌊U + x + y⌋ − ⌊U + x⌋ − ⌊y⌋ = ⌊{U + x} + {y}⌋ = ⌊V + {y}⌋ ~ Ber({y}).

Hence the result.

Remark 3.2. Using the same proof as in Lemma 3.1, one obtains, conditionally on ξ_{n−1},

M_n^i + ⋯ + M_n^j − ⌊N(W_n^i + ⋯ + W_n^j)⌋ ~ Ber( {N(W_n^i + ⋯ + W_n^j)} ),

for any 1 ≤ i ≤ j ≤ N. Note also that since the sampling selection method is unbiased, i.e. condition (i) of Theorem 2.2 is satisfied, for any n ≥ 1 one has

E( η_n^N f | ξ_{n−1} ) = Φ_n(η_{n−1}^N)(f),  f ∈ B_b(E).

To obtain L² convergence of the algorithm based on the systematic sampling selection method, one would like to apply Theorem 2.2 of Del Moral and Miclo (2000).
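Lemma 3.1 can be checked numerically by averaging M_n^i − ⌊N W_n^i⌋ over a fine grid of values of U_n; the average must recover the Bernoulli parameter {N W_n^i}. The weight vector below is an arbitrary assumption.

```python
import numpy as np

def systematic_counts(W, U):
    """Counts of the systematic selection method (Section 2.2)."""
    N = len(W)
    edges = np.floor(N * np.concatenate(([0.0], np.cumsum(W))) + U)
    return np.diff(edges).astype(int)

W = np.array([0.1, 0.2, 0.3, 0.4])              # assumed weights
N = len(W)
grid = (np.arange(2000) + 0.5) / 2000           # deterministic grid for U ~ Unif[0,1)
avg = np.mean([systematic_counts(W, U) - np.floor(N * W) for U in grid], axis=0)
frac = N * W - np.floor(N * W)                  # {N W^i} = (0.4, 0.8, 0.2, 0.6)
assert np.allclose(avg, frac, atol=1e-3)        # Bernoulli parameter recovered
```

Since M_n^i is piecewise constant in U_n, the grid average equals the exact expectation up to boundary effects of order 1/2000.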
All the sampling selection methods presented in Section 2.3 satisfy property (8). If the number of particles N_n is random, there is a similar condition to (8). But as shown next, systematic sampling behaves differently.

Remark 3.3. Inequality (7) is not verified in general for the systematic sampling selection method. Here is an illustration. Suppose that N = 2m and let, for any i ∈ {1, …, N/2}, W_n^{2i−1} = 3/(2N) and W_n^{2i} = 1/(2N). Then one can check that for any 1 ≤ i ≤ N/2,

M_n^{2i−1} = 1 and M_n^{2i} = 1 if U_n ∈ [0, 1/2),  M_n^{2i−1} = 2 and M_n^{2i} = 0 if U_n ∈ [1/2, 1).

Next, for 1 ≤ i ≤ N/2, set q^{2i−1} = 1 and q^{2i} = −1. It follows that

E[ ( Σ_{i=1}^N (M_k^i − N W_k^i) q^i )^2 | ξ_{k−1} ] = N^2/4,

showing that inequality (7) is false.
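The counterexample of Remark 3.3 can be verified numerically; the particle number below is an arbitrary assumption.

```python
import numpy as np

def systematic_counts(W, U):
    """Counts of the systematic selection method (Section 2.2)."""
    N = len(W)
    edges = np.floor(N * np.concatenate(([0.0], np.cumsum(W))) + U)
    return np.diff(edges).astype(int)

# N = 2m particles, alternating weights 3/(2N), 1/(2N), signs q = (+1, -1, ...).
N = 10
W = np.tile([3.0 / (2 * N), 1.0 / (2 * N)], N // 2)
q = np.tile([1.0, -1.0], N // 2)
for U in (0.25, 0.75):                          # one value in each half of [0, 1)
    M = systematic_counts(W, U)
    s = np.dot(M - N * W, q)
    assert abs(s**2 - N**2 / 4) < 1e-9          # (sum)^2 = N^2/4, not O(N max q^2)
```

The squared sum grows like N², while (7) would require a bound of order N, which is why a weaker condition such as (8) is needed for this method.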

However, one believes that the following holds true.

Conjecture 3.4. Suppose that η_0 and (K_n)_{n≥1} are absolutely continuous laws and that M_n^1, …, M_n^N are obtained using the systematic sampling selection method. Then, for all f ∈ B_b(E) and n ≥ 1, (8) holds with a_N ≡ 1, i.e.

(11)  lim_{N→∞} (1/N^2) E[ ( Σ_{i=1}^N (M_n^i − N W_n^i) f(ξ_{n−1}^i) )^2 ] = 0.

Note that it follows from Theorem 2.3 that the above conjecture is equivalent to ‖η_n^N − η_n‖_2 → 0, as N → ∞, for any n ≥ 0.

In what follows, one tries to motivate why Conjecture 3.4 might be true. To this end, first note that for any k ≥ 1,

M_n^k − N W_n^k = {N(W_n^1 + ⋯ + W_n^{k−1}) + U_n} − {N(W_n^1 + ⋯ + W_n^k) + U_n}.

Now, set F_n^0 = 0 and F_n^k = Σ_{j=1}^k g_n(ξ_{n−1}^j), k ≥ 1. For any α > 0 and any f ∈ B_b(E), further define

Z_n^N(f, α) = Σ_{k=1}^N f(ξ_{n−1}^k) ( {F_n^{k−1}/α + U_n} − {F_n^k/α + U_n} ) = Σ_{k=1}^N f(ξ_{n−1}^k) ( {S_n^{k−1}} − {S_n^k} ),

where S_n^k = F_n^k/α + U_n and S_n^0 = U_n. Then, setting ḡ_n = (1/N) Σ_{k=1}^N g_n(ξ_{n−1}^k) and defining Y_n^N(f) = Z_n^N(f, ḡ_n), one has

Y_n^N(f) = Σ_{i=1}^N (M_n^i − N W_n^i) f(ξ_{n−1}^i),

so one can rewrite (11) in the form lim_{N→∞} E[ (Y_n^N(f)/N)^2 ] = 0.

Unfortunately, working with Y_n^N appears to be impossibly difficult, so one could work instead with a more tractable quantity, namely Z_n^N. In the case n = 1, one has at least the following result, which is a first step towards proving Conjecture 3.4. Before stating it, recall that D([0, 1]) is the space of càdlàg functions with Skorohod's topology.

Theorem 3.5. Assume that the law of {g_1(ξ_0^1)/α} is absolutely continuous. Then, for any α > 0 and any f ∈ B_b(E), the sequence of processes B^N ∈ D([0, 1]) defined by

B_{f,α}^N(t) = N^{−1/2} Z_1^{⌊Nt⌋}(f, α),  t ∈ [0, 1],

where Z_1^m(f, α) denotes the sum defining Z_1^N(f, α) truncated at k = m, converges in D([0, 1]) to σ B_{f,α}, where B_{f,α} is a Brownian motion and

lim_{N→∞} E[ ( N^{−1/2} Z_1^N(f, α) )^2 ] = σ^2.

The proof of Theorem 3.5 is an easy consequence of Theorem A.3 applied with X_k = f(ξ_0^k), Y_k = {g_1(ξ_0^k)/α} and f(x, y, s) = x(y − s). In addition, there is an explicit expression for σ². More details can be found in Appendix A.

Remark 3.6. Theorem 3.5 does not prove Conjecture 3.4 in the case n = 1. However, if one is willing to deal with a random number of particles at step n = 1, one obtains the following interesting result: set

N_1 = ⌊U_1 + Σ_{j=1}^N g_1(ξ_0^j)/α⌋,  and define  η̂_0^N = (1/N_1) Σ_{k=1}^N M^k δ_{ξ_0^k}.

Then, as N → ∞, N_1/N tends to η_0(g_1)/α and lim sup_{N→∞} N ‖η_1^N − η_1‖_2^2 < ∞.

To prove convergence for higher orders, i.e. n > 1, one would need results on non-homogeneous Markov chains. That approach will be examined in the near future, using for example the results of Sethuraman and Varadhan (2005).

Remark 3.7. In order to keep N fixed, one could try to control the term Z_1^N(f, ḡ_1) − Z_1^N(f, α), with α = η_0(g_1). Since √N(ḡ_1 − α) converges in law to a centered Gaussian variable with variance η_0(g_1^2) − η_0^2(g_1), written σ_g Z with Z ~ N(0, 1), if one could differentiate term by term, one would obtain a limit for N^{−1/2}( Z_1^N(f, ḡ_1) − Z_1^N(f, α) ) involving η_0(K_1 f g_1), η_0(g_1^2), η_0(g_1)^2 and η_1(f), so one could guess that

N^{−1/2} Y_1^N(f) ⇒ c( f ) Z + B_{f,α}(1),

for a constant c(f) depending on η_0(K_1 f g_1)/η_0(g_1) − η_1(f). On the other hand, if the sequence of processes α ↦ Z_1^N(f, α) were tight for α in a closed interval not containing zero, then one would get Z_1^N(f, ḡ_1) − Z_1^N(f, α) → 0 in probability. There is no indication so far in favor of one of these two approaches.

4 Numerical comparisons

The numerical comparisons will be done through a simple model of filtering for tracking a moving target using noisy black-and-white images, where the exact filter can be calculated explicitly, that is, η_n is known for any n; e.g. Gentil et al. (2005).

4.1 Description of the model

One will assume that the target moves on Z² according to a Markov chain. Observations consist of black-and-white noisy images of a finite fixed region R ⊂ Z². More precisely, let (X_n)_{n≥0} be a homogeneous Markov chain with values in X = {ω ∈ {0, 1}^{Z²} : Σ_{x∈Z²} ω(x) = 1}. Of course, the position of the target at step n is x_0 if and only if X_n(x_0) = 1. Set

(12)  M(a, b) = P( X_{n+1}(a) = 1 | X_n(b) = 1 ),  a, b ∈ Z².

Note that M describes exactly the movement of the target.
The model for the observations Y_k ∈ {0, 1}^R, k = 1, …, n, is the following: given X_0, …, X_n, assume that the {Y_n(x)}_{x∈R} are independent and that for any x ∈ R,

(13)  P( Y_n(x) = 0 | X_n(x) = 0 ) = p_0,  P( Y_n(x) = 1 | X_n(x) = 1 ) = p_1,

where 0 < p_0, p_1 < 1. One wants to compute the distribution of X_k conditionally on Y_n, where Y_n is the sigma-algebra generated by the observations Y_1, …, Y_n, and Y_0 is the trivial sigma-algebra. As in

Section 2 of Gentil et al. (2005), note that for any (ω, ω′) ∈ {0, 1}^R × X, the conditional probability P(Y_k = ω | X_k = ω′) = Λ(ω, ω′) satisfies

Λ(ω, ω′) = p_0^{|R|} ( (1 − p_0)/p_0 )^{<ω>} ( p_0 p_1 / ((1 − p_0)(1 − p_1)) )^{<ωω′>} (1 − p_1)/p_0,

where <ω> = Σ_{x∈R} ω(x) and <ωω′> = Σ_{x∈R} ω(x) ω′(x). Let P be the joint law of the Markovian target with initial distribution ν and of the observations, and let Q be the joint law of the Markovian target with initial distribution ν and of independent Bernoulli observations with mean 1/2. Further let G_n be the sigma-algebra generated by Y_1, …, Y_n, X_0, …, X_n. Then it is easy to check that, with respect to G_n, P is equivalent to Q and

dP/dQ |_{G_n} = ∏_{j=1}^n 2^{|R|} Λ(Y_j, X_j).

Further define L_n = ∏_{j=1}^n Λ(Y_j, X_j). Denoting by E_P (resp. E_Q) the expectation with respect to P (resp. Q), observe that for any f ∈ B_b(X), one has

(14)  η̂_n(f) = E_P( f(X_n) | Y_n ) = E_Q( f(X_n) L_n | Y_n ) / E_Q( L_n | Y_n ).

This formula is a consequence of the properties of conditional expectations and, in the context of filtering, (14) is known as the Kallianpur-Striebel formula, e.g. Kallianpur and Striebel (1968). Denote by K the Markov kernel associated with the Markov chain (X_n)_{n≥0} defined by M, as in (12). One can check that η_n and η̂_n satisfy (4) with g_n(x) = Λ(y_n, x) and K_n = K. Note also that in that case g_n takes only two values, which can be assumed to belong to Q because of rounding errors. It follows from Remark A.4 that sup_N N E[ ‖η_1^N − η_1‖_2^2 ] < ∞.

The results proved in Section 2 of Gentil et al. (2005) provide an algorithm for computing recursively the exact filter, i.e. the law of X_n given Y_n. In the next section, one will compare the results from the exact filter with those obtained by the Monte Carlo algorithm described in Section 2 with various sampling methods.
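For the observation model (13), the likelihood of an image can also be computed as a direct product over pixels, which avoids the algebra of the closed form of Λ above. A minimal sketch (the image size and target position are illustrative assumptions):

```python
import numpy as np

# Pixel-wise observation likelihood from (13): each pixel of the noisy image
# is reported correctly with probability p0 (background) or p1 (target).
def likelihood(y, x, p0=0.9, p1=0.9):
    """P(Y = y | X = x) for binary images y (observation), x (state) on R."""
    on_target = np.where(y == 1, p1, 1.0 - p1)     # pixels where X(x) = 1
    off_target = np.where(y == 0, p0, 1.0 - p0)    # pixels where X(x) = 0
    return np.prod(np.where(x == 1, on_target, off_target))

x = np.zeros((4, 4), dtype=int); x[2, 2] = 1       # target at one pixel (assumption)
y = x.copy()                                       # noiseless observation
assert abs(likelihood(y, x) - 0.9**16) < 1e-12     # 16 correct pixels
```

In a particle filter for this model, `likelihood(y_n, x)` plays the role of the potential g_n(x) = Λ(y_n, x), up to the common normalization that cancels in the weights (5).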
To makes things simple, the target starts at (50, 50) and it moves according to a simple symmetric random walk, i.e its goes up, down, right or left to the nearest neighbor with probability /4. The estimation of the position of the target is taken to be the mean of the various measures. The simulations were performed with p 0 = 0.9 and p = 0.9, that there are 0% of errors in pixels. In order to compare the efficiency of the optimal filter (OF) and the samplings methods described in section 2, i.e. simple random sampling (SRS), remainder stochastic sampling (RSS), systematic sampling (SyS), binomial sampling (BiS), and Bernoulli sampling (BeS), the mean absolute error between the estimated position and the true one was calculated over several time intervals, namely [2, 00], [0, 00] and [30, 00]. The number of particles 0 takes values 000, 0000, 30000 and 50000. The results are reported in the Table. According to these results, one may conclude that the algorithm based on the systematic sampling selection method performs quite well, provided the number of particles is large enough. Surprisingly, the Monte Carlo based approximate filters seem to perform better than the optimal filter. However the difference may not be statistically significant. ext, based on the results of Table for the time interval [30, 00], note that when the target is precisely detected, the error seems to stabilize near zero, indicating that of ηn to η n might be uniform on n. Finally, other simulations performed with several moving targets seem to indicate that the algorithm based on systematic sampling also give impressive results. 9

Table 1: Mean absolute error for one target performing a simple symmetric random walk in images of size 100 × 100 with 10% of errors.

t ∈             [2, 100]   [10, 100]   [30, 100]
OF                 2.1        1.7         1.4

N_0 = 1000
SRS               57.4       60.3        56.2
RSS               51.8       53.6        43.8
SyS               42.7       43.7        36.9
BiS               54.0       56.5        45.5
BeS                3.8        2.1         6.9

N_0 = 10000
SRS               76.0       81.0        64.1
RSS                1.9        0.8         0.5
SyS                2.4        0.7         0.5
BiS                6.4        6.7         0.5
BeS               77.5       82.5        85.4

N_0 = 30000
SRS                8.7        3.0         0.5
RSS                2.4        0.6         0.4
SyS                3.9        1.5         0.4
BiS                4.0        2.1         0.5
BeS                8.1        6.2         0.4

N_0 = 50000
SRS                3.9        3.4         0.9
RSS               10.2        5.2         0.7
SyS                5.0        2.5         0.6
BiS                4.8        2.1         0.8
BeS                3.6        1.5         0.3

A Convergence results for a Markov chain

Suppose that (X_i, Y_i)_{i≥1} are independent observations of (X, Y) ∈ Z := R × [0, ∞) of law P, with marginal distributions P_X and P_Y respectively. Further let λ denote Lebesgue measure on [0, 1). Given Z_0 = (X_0, S_0) ∈ R × [0, 1), set Z_i = (X_i, {S_i}), where S_i = S_{i−1} + Y_i, i ≥ 1. For n ∈ Z, set e_n(s) = e^{2πins}, s ∈ [0, 1), and let ζ_n = E( e_n(Y) ). Further set N = {n ∈ Z; ζ_n = 1}. Recall that (e_n)_{n∈Z} is a complete orthonormal basis of the Hilbert space H = L²([0, 1), λ) with scalar product (f, g) = ∫_0^1 f(s) ḡ(s) ds and norm ‖f‖_2² = (f, f). It is easy to check that (Z_i)_{i≥0} is a Markov chain on Z with kernel K defined by

(15)  Kf(x, s) := ∫_Z f(x′, {s + y}) P(dx′, dy),  f ∈ B_b(Z),

and stationary distribution µ = P_X × λ. Note that for any f ∈ L²(µ), by Tonelli's theorem, Kf is well defined, it depends only on s ∈ [0, 1), and it belongs to H, since

‖Kf‖_2² ≤ ∫_0^1 ∫_Z f²(x, {s + y}) P(dx, dy) ds = ∫_Z ∫_0^1 f²(x, u) du P(dx, dy) = ∫_Z f²(z) µ(dz) = ‖f‖²_{L²(µ)}.

Finally, let L and A be the bounded linear operators from L²(µ) to H defined by

Lf(s) = Σ_{n∈N} (Kf, e_n) e_n(s),  Af(s) = ∫ f(x, s) P_X(dx),  s ∈ [0, 1).

Theorem A.1. Let f ∈ L²(µ) be given and set W_R = (1/R) Σ_{k=1}^R f(Z_k). Then:

(i) If the initial distribution of Z_0 = (X_0, S_0) is µ, then W_R converges almost surely and in mean square to W given by

(16)  W = Lf(S_0) = Σ_{n∈N} (Kf, e_n) e_n(S_0).

If, in addition,

(17)  Σ_{n∈Z\N} |(Kf, e_n)| |(Af, e_n)| / |1 − ζ_n| < ∞,

then R E[ (W_R − W)² ] converges, as R → ∞, to

(18)  ‖f‖²_{L²(µ)} − ‖Lf‖_2² + 2 Σ_{n∈Z\N} (Kf, e_n)(Af, e_n) ζ_n / (1 − ζ_n).

(ii) If the initial distribution of Z_0 = (X_0, S_0) is µ, if N = {0} and

(19)  Σ_{n∈Z\{0}} |(Kf, e_n)|² / |1 − ζ_n|² < ∞,

then the sequence of processes B_R, defined by B_R(t) = R^{−1/2} Σ_{k=1}^{⌊Rt⌋} ( f(Z_k) − µ(f) ), t ∈ [0, 1], converges in D([0, 1]) to σB, where B is a Brownian motion and σ² is given by (18).
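The chain (Z_i) of the appendix is easy to simulate; the following sketch checks that the fractional parts {S_i} keep a uniform marginal when S_0 ~ Unif[0, 1), which is the stationarity claim behind µ = P_X × λ. The exponential law for Y is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate the fractional-part coordinate of Z_i = (X_i, {S_i}),
# with S_i = S_{i-1} + Y_i (mod 1).
def simulate_fractional_chain(n, y_sampler, s0):
    s = np.empty(n + 1)
    s[0] = s0
    for i in range(1, n + 1):
        s[i] = (s[i - 1] + y_sampler()) % 1.0   # {S_{i-1} + Y_i}
    return s

s = simulate_fractional_chain(20000, lambda: rng.exponential(), rng.uniform())
assert 0.0 <= s.min() and s.max() < 1.0         # values stay in [0, 1)
assert abs(s.mean() - 0.5) < 0.03               # uniform marginal on [0, 1)
```

Averaging f along such a trajectory is exactly the quantity W_R of Theorem A.1, so this simulation also gives a cheap empirical check of part (i).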

(iii) If P_Y admits a square integrable density h, then the Markov chain is geometrically ergodic; that is, there exists ρ ∈ (0, 1) such that for any f ∈ L²(µ),

| K^n f(z_0) − µ(f) | ≤ ‖h‖_2 ρ^{n−2} ‖f‖_{L²(µ)},  n ≥ 2.

Proof. For simplicity, set ψ = Kf ∈ H. To prove (i), start the Markov chain from µ and denote the law of the chain by Q. Then the sequence (Z_n)_{n≥0} is stationary, and Birkhoff's ergodic theorem, e.g. (Durrett, 1996, Section 6.2), can be invoked to claim that W_R converges almost surely and in mean square to some random variable W. To show that W is indeed given by (16), it suffices to show that E[ (W_R − W)² ] tends to 0 as R → ∞. First, note that

E(W²) = ‖Lf‖_2² = Σ_{n∈N} (ψ, e_n)².

Next, set ϕ(s) = Af(s). If n ∈ N, then e_n(Y) = 1 P-a.s., and it follows, by Fubini's theorem, that

(ψ, e_n) = ∫_0^1 ∫_Z f(x, {s + y}) e_n(s) P(dx, dy) ds = ∫_Z ∫_0^1 f(x, u) e_n(u) du P(dx, dy) = (ϕ, e_n).

As a result, E(W²) = Σ_{n∈N} (ϕ, e_n)². Next, using the fact that for any k ∈ Z and any s, y ∈ [0, 1), one has e_k({s + y}) = e_k(s + y) = e_k(s) e_k(y), it follows that

K e_k(s) = ∫_Z e_k({s + y}) P(dx, dy) = ∫_{[0,1)} e_k({s + y}) P_Y(dy) = ζ_k e_k(s),  s ∈ [0, 1).

Hence, for any k ≥ 1 and any n ∈ Z, one obtains

(20)  K^k e_n = ζ_n^k e_n.

Now, using the Markov property of the chain together with (20), one has

E(W_R W) = Σ_{n∈N} (ψ, e_n) E[ W_R e_n(S_0) ] = (1/R) Σ_{k=1}^R Σ_{n∈N} (ψ, e_n) E[ f(Z_k) e_n(S_0) ] = (1/R) Σ_{k=1}^R Σ_{n∈N} (ψ, e_n) E[ K^k f(S_0) e_n(S_0) ] = (1/R) Σ_{k=1}^R Σ_{n∈N} Σ_{j∈Z} (ψ, e_j)(ψ, e_n) ζ_j^{k−1} (e_j, e_n) = (1/R) Σ_{k=1}^R Σ_{n∈N} (ψ, e_n)² ζ_n^{k−1} = E(W²),

since, by definition, ζ_n = 1 for any n ∈ N. Next, using stationarity, the Markov property, (20), and the identity

(1/R) Σ_{k=1}^R Σ_{j=1}^{k−1} z^j = z/(1 − z) − z(1 − z^R)/(R(1 − z)²),  z ∈ C, z ≠ 1,

it follows that

$E(W_N^2) = \frac{1}{N} E[f^2(Z_0)] + \frac{2}{N^2} \sum_{j=1}^{N-1} (N - j)\, E\left[ K^{j-1}\psi(S_0)\, f(Z_0) \right]$
$= \frac{1}{N} \|f\|_{L^2(\mu)}^2 + \frac{2}{N^2} \sum_{j=1}^{N-1} (N - j)\, (K^{j-1}\psi, \varphi)$
$= \frac{1}{N} \|f\|_{L^2(\mu)}^2 + \frac{2}{N^2} \sum_{n \in \mathbb{Z}} (\psi, e_n)\, \overline{(\varphi, e_n)} \sum_{j=1}^{N-1} (N - j)\, \zeta_n^{j-1}$
$= \frac{1}{N} \|f\|_{L^2(\mu)}^2 + \frac{N-1}{N}\, E(W^2) + \frac{2}{N^2} \sum_{n \in \mathbb{Z} \setminus \mathcal{N}} (\psi, e_n)\, \overline{(\varphi, e_n)} \left[ \frac{N}{1 - \zeta_n} - \frac{1 - \zeta_n^N}{(1 - \zeta_n)^2} \right]$.

Collecting the expressions obtained for $E(W^2)$, $E(W_N W)$ and $E(W_N^2)$, one gets

(21)  $E\left[ (W_N - W)^2 \right] = \frac{1}{N} \|f\|_{L^2(\mu)}^2 - \frac{1}{N} E(W^2) + \frac{2}{N^2} \sum_{n \in \mathbb{Z} \setminus \mathcal{N}} (\psi, e_n)\, \overline{(\varphi, e_n)} \left[ \frac{N}{1 - \zeta_n} - \frac{1 - \zeta_n^N}{(1 - \zeta_n)^2} \right]$.

Since $\sum_{n \in \mathbb{Z} \setminus \mathcal{N}} |(\psi, e_n)|\, |(\varphi, e_n)| \le \|\psi\|_2 \|\varphi\|_2$ is finite and

$\sup_{n \in \mathbb{Z} \setminus \mathcal{N}} \left| \frac{N}{1 - \zeta_n} - \frac{1 - \zeta_n^N}{(1 - \zeta_n)^2} \right| = \sup_{n \in \mathbb{Z} \setminus \mathcal{N}} \left| \sum_{j=1}^{N-1} (N - j)\, \zeta_n^{j-1} \right| \le \frac{N^2}{2}$,

it follows from (21) and the Dominated Convergence Theorem that $\lim_{N \to \infty} E[(W_N - W)^2] = 0$, and under the additional condition (17), one also obtains

$\lim_{N \to \infty} N\, E\left[ (W_N - W)^2 \right] = \|f\|_{L^2(\mu)}^2 - E(W^2) + 2 \sum_{n \in \mathbb{Z} \setminus \mathcal{N}} \frac{(\psi, e_n)\, \overline{(\varphi, e_n)}}{1 - \zeta_n}$,

completing the proof of (i).

The proof of (ii) is inspired by Durrett (1996). First, note that since $\mathcal{N} = \{0\}$, $Lf = \mu(f)$ for any $f \in L^2(\mu)$, and it follows from (i) that $\frac{1}{N} \sum_{k=1}^{N} f(Z_k)$ converges almost surely and in $L^p$ to $\mu(f)$, for any $p \le 2$. Moreover, given any $f \in L^1(\mu)$, one can find $f_n \in L^2(\mu)$ such that $\|f - f_n\|_{L^1(\mu)} < \frac{1}{n}$. It follows that for any $n \ge 1$,

$\limsup_{N \to \infty} E\left| \frac{1}{N} \sum_{k=1}^{N} f(Z_k) - \mu(f) \right| \le \frac{2}{n} + \limsup_{N \to \infty} E\left| \frac{1}{N} \sum_{k=1}^{N} f_n(Z_k) - \mu(f_n) \right| = \frac{2}{n}$.

Since the latter is true for any $n$, one may conclude that $\frac{1}{N} \sum_{k=1}^{N} f(Z_k)$ converges in $L^1$ to $\mu(f)$; by Birkhoff's ergodic theorem, it also converges almost surely to $\mu(f)$.

Next, let $D$ be the subset of $H$ defined by

$D = \left\{ h \in H;\ \sum_{n \in \mathbb{Z} \setminus \{0\}} \frac{|(h, e_n)|^2}{|1 - \zeta_n|^2} < \infty \right\}$,

and let $\Xi$ be the operator from $D$ to $H$ that satisfies

$\Xi h = \sum_{n \in \mathbb{Z} \setminus \{0\}} \frac{(h, e_n)}{1 - \zeta_n}\, e_n$.

Note that since $(I - K)\Xi h = (I - L)h$, one has $\Xi = (I - K)^{-1}(I - L)$ on $D$. Let $\tilde{D}$ be the set of all $f \in L^2(\mu)$ satisfying (19), i.e. such that $Kf \in D$. Then $\Xi$ can be extended to a mapping from $\tilde{D}$ to $L^2(\mu)$ via $\Xi f = (I - L)f + \Xi K f$. Using $KL = LK = L$, one obtains that $\Xi = (I - K)^{-1}(I - L)$ on $\tilde{D}$.

Next, if $f \in \tilde{D}$, set $g = \Xi f$. Since $Lf = \mu(f)$, it follows that

$N (W_N - \mu(f)) = \sum_{k=1}^{N} \left[ g(Z_k) - Kg(\{S_{k-1}\}) \right] + Kg(S_0) - Kg(\{S_N\})$.

Now, setting $F_k = \sigma\{Z_j;\ j \le k\}$, the terms $\xi_k = g(Z_k) - Kg(\{S_{k-1}\})$ are square integrable martingale differences with respect to $(F_j)_{j \ge 0}$, i.e. $E(\xi_k \mid F_{k-1}) = 0$, and because $g^2$ and $(Kg)^2$ both belong to $L^1(\mu)$, it follows from the $L^1$ ergodic theorem established above that

$\frac{1}{N} \sum_{k=1}^{N} E\left[ \xi_k^2 \mid F_{k-1} \right] = \frac{1}{N} \sum_{k=1}^{N} \left[ K(g^2)(\{S_{k-1}\}) - (Kg)^2(\{S_{k-1}\}) \right]$

converges almost surely to $\mu(g^2) - \mu((Kg)^2)$. Note that since $Kg = \Xi K f$, one has $(Kg, Lf) = 0$, and expression (18) can be written as

$\sigma^2 = \|(I - L)f\|_{L^2(\mu)}^2 + 2(\Xi K f, Af) = \|(I - K)g\|_{L^2(\mu)}^2 + 2(Kg, Af)$
$= \|(I - K)g\|_{L^2(\mu)}^2 + 2(Kg, A(I - L)f) = \|(I - K)g\|_{L^2(\mu)}^2 + 2(Kg, A(I - K)g)$
$= \mu\left( g^2 - 2 g K g + (Kg)^2 \right) + 2\mu(g K g) - 2\mu((Kg)^2) = \mu(g^2) - \mu((Kg)^2)$.

Finally, because of the stationarity of $(\xi_k)_{k \ge 1}$, it follows that for any $\epsilon > 0$,

$\frac{1}{N} \sum_{k=1}^{N} E\left[ \xi_k^2\, I(|\xi_k| > \epsilon \sqrt{N}) \right] = E\left[ \xi_1^2\, I(|\xi_1| > \epsilon \sqrt{N}) \right] \to 0$, as $N \to \infty$.

The conditions of Theorem 7.4 in Durrett (1996) are all met, so one may safely conclude that the process $B_N(t) = \frac{1}{\sqrt{N}} \sum_{k=1}^{\lfloor N t \rfloor} \left( f(Z_k) - \mu(f) \right)$, $t \in [0, 1]$, converges in $D[0, 1]$ to $\sigma B$, where $B$ is a Brownian motion.

To prove part (iii), note first that since the density $h$ of $Y$ is square integrable, one has $\mathcal{N} = \{0\}$, $\sup_{n \ne 0} |\zeta_n| = \rho < 1$, $\zeta_n = (e_n, h)$, and $\|h\|_2^2 = \sum_{n \in \mathbb{Z}} |\zeta_n|^2$. Therefore, for any $g \in H$,

$\sum_{n \in \mathbb{Z}} |(g, e_n)|\, |\zeta_n| \le \|g\|_2\, \|h\|_2 < \infty$.

It follows that for any $k \ge 2$,

$K^k f = K^{k-1}\psi = \sum_{n \in \mathbb{Z}} (\psi, e_n)\, \zeta_n^{k-1}\, e_n$,

the latter series converging absolutely.
Thus

$\sup_{z_0 \in \mathcal{Z}} |K^k f(z_0) - \mu(f)| = \sup_{s \in [0, 1)} |K^{k-1}\psi(s) - \lambda(\psi)| \le \sum_{n \in \mathbb{Z} \setminus \{0\}} |(\psi, e_n)|\, |\zeta_n|\, \rho^{k-2} \le \|h\|_2\, \|f\|_{L^2(\mu)}\, \rho^{k-2}$.

This completes the proof of the theorem.
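The diagonalization (20), which drives both the limiting variance and the geometric rate just established, can be checked by direct simulation. Below is a minimal sketch (the choices are assumptions made for illustration: $Y \sim \mathrm{Exp}(1)$, for which $\zeta_n = 1/(1 - 2\pi i n)$, together with arbitrary $n$, $k$, $s$), comparing a Monte Carlo estimate of $K^k e_n(s)$ with $\zeta_n^k\, e_n(s)$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo check of (20): K^k e_n(s) = E[e_n({s + Y_1 + ... + Y_k})]
# should equal zeta_n^k e_n(s).  Assumed example: Y ~ Exp(1), for which
# zeta_n = E[e^{2 pi i n Y}] = 1/(1 - 2 pi i n).
n, k, s, M = 1, 2, 0.2, 400_000
Y = rng.exponential(1.0, size=(M, k)).sum(axis=1)         # Y_1 + ... + Y_k
lhs = np.mean(np.exp(2j * np.pi * n * ((s + Y) % 1.0)))   # K^k e_n(s), Monte Carlo
zeta = 1.0 / (1.0 - 2j * np.pi * n)
rhs = zeta**k * np.exp(2j * np.pi * n * s)
print(abs(lhs - rhs))
```

The agreement is within Monte Carlo error of order $M^{-1/2}$; since $|\zeta_n| \le \rho < 1$ for all $n \ne 0$ here, the same computation illustrates why $K^k f$ converges geometrically to $\mu(f)$.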

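When $\mathcal{N} \ne \{0\}$, Theorem A.1 (i) says that the ergodic average still converges, but to a limit $Lf(S_0)$ that depends on the starting point. A small simulation sketch (the example is an assumption chosen for illustration: $Y$ uniform on $\{1/2, 1\}$, so that $\zeta_n = ((-1)^n + 1)/2$ equals $1$ exactly for even $n$, together with $f(x, s) = s$):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed example: Y in {1/2, 1} with equal probability, so zeta_n = 1 iff
# n is even and the set N is not {0}.  The chain S_k = {S_{k-1} + Y_k} then
# lives on the two-point orbit {S_0, {S_0 + 1/2}} and is NOT ergodic on
# [0, 1): the a.s. limit of the ergodic average depends on S_0.
N = 200_000
S0 = 0.1
Y = rng.choice([0.5, 1.0], size=N)
S = (S0 + np.cumsum(Y)) % 1.0          # {S_k}, k = 1, ..., N
W_N = S.mean()                          # W_N for f(x, s) = s

limit = 0.5 * (S0 + (S0 + 0.5) % 1.0)   # Lf(S_0): mean of f over the orbit
print(W_N, limit)
```

Here $Lf(S_0)$ reduces to the average of $f$ over the orbit $\{S_0, \{S_0 + 1/2\}\}$, so with $S_0 = 0.1$ the almost sure limit is $0.35$ rather than $\mu(f) = 1/2$.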
Remark A.2. Note that if $\zeta_{n_0} = 1$ for some $n_0 > 0$, then $k \mapsto \zeta_k$ is $n_0$-periodic, so $\{\zeta_k;\ k \in \mathbb{Z} \setminus \mathcal{N}\}$ is a finite set. Therefore $\sup_{k \in \mathbb{Z} \setminus \mathcal{N}} |\zeta_k| = \rho < 1$ and condition (17) is satisfied. Also, if $P_Y$ has a non-degenerate absolutely continuous part, then $\mathcal{N} = \{0\}$ and $\sup_{n \ne 0} |\zeta_n| = \rho < 1$, so condition (17) holds true.

The next result is a straightforward extension of the previous theorem. Before stating it, denote by $\nu$ the joint law of $(Z_1, S_0)$, where $S_0 \sim \mathrm{Unif}([0, 1))$.

Theorem A.3. Suppose that $f \in L^2(\nu)$ and set $W_N = \frac{1}{N} \sum_{k=1}^{N} f(Z_k, \{S_{k-1}\})$. Then:

(i) If the initial distribution of $Z_0 = (X_0, S_0)$ is $\mu$, then $W_N$ converges almost surely and in mean square to $W$ given by

(22)  $W = Lf(S_0) = \sum_{n \in \mathcal{N}} (Kf, e_n)\, e_n(S_0)$,

where now $Kf(s) = \int_{\mathcal{Z}} f(x, \{s + y\}, s)\, P(dx, dy)$. If $Af(s) = \int_{\mathcal{Z}} f(x, s, \{s - y\})\, P(dx, dy)$, $s \in [0, 1)$, and if, in addition,

(23)  $\sum_{n \in \mathbb{Z} \setminus \mathcal{N}} \frac{|(Kf, e_n)|\, |(Af, e_n)|}{|1 - \zeta_n|} < \infty$,

then $N\, E[(W_N - W)^2]$ converges, as $N \to \infty$, to

(24)  $\sigma^2 = \|f\|_{L^2(\nu)}^2 - \|Lf\|_2^2 + 2 \sum_{n \in \mathbb{Z} \setminus \mathcal{N}} \frac{(Kf, e_n)\, \overline{(Af, e_n)}}{1 - \zeta_n}$.

(ii) If $\mathcal{N} = \{0\}$, if the initial distribution of $Z_0$ is $\mu$ and

(25)  $\sum_{n \in \mathbb{Z} \setminus \{0\}} \frac{|(Kf, e_n)|^2}{|1 - \zeta_n|^2} < \infty$,

then the sequence of processes $B_N$, defined by $B_N(t) = \frac{1}{\sqrt{N}} \sum_{k=1}^{\lfloor N t \rfloor} \left( f(Z_k, \{S_{k-1}\}) - \nu(f) \right)$, $t \in [0, 1]$, converges in $D([0, 1])$ to $\sigma B$, where $B$ is a Brownian motion and $\sigma^2$ is given by (24).

(iii) If $P_Y$ admits a square integrable density $h$, then the Markov chain is geometrically ergodic; that is, there exists $\rho \in (0, 1)$ such that for any $f \in L^2(\nu)$,

$\sup_{(z, s)} |K^n f(z, s) - \nu(f)| \le \|h\|_2\, \rho^{n-2}\, \|f\|_{L^2(\nu)}$,  $n \ge 2$.
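As a numerical illustration of Theorem A.3 (i), one can take the function $f(x, y, s) = x(y - s)$ of Remark A.4 below; the distributions in the sketch are assumptions chosen so that $X_k$ is bounded and $P_Y$ is absolutely continuous (hence $\mathcal{N} = \{0\}$ and $W_N \to 0$):

```python
import numpy as np

rng = np.random.default_rng(2)

# Sketch of Theorem A.3 (i) for f(x, y, s) = x(y - s) (cf. Remark A.4).
# Assumed example: X_k ~ Unif(0, 1) (bounded), Y_k ~ Exp(1) (absolutely
# continuous), S_0 ~ Unif([0, 1)); then W_N -> 0 a.s. and in mean square.
N = 200_000
X = rng.uniform(size=N)
Y = rng.exponential(1.0, size=N)
S0 = rng.uniform()
S = (S0 + np.cumsum(Y)) % 1.0                  # {S_k}
S_prev = np.concatenate(([S0], S[:-1]))        # {S_{k-1}}
W_N = np.mean(X * (S - S_prev))                # (1/N) sum X_k ({S_k} - {S_{k-1}})
print(W_N)
```

With $P_Y$ absolutely continuous, condition (25) holds as well, so $\sqrt{N}\, W_N$ is asymptotically a centered Gaussian variable, with fluctuations of order $N^{-1/2}$ visible in the output.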

Remark A.4. For example, suppose that $X_k$ is bounded and set $f(x, y, s) = x(y - s)$. Then it is easy to check that for any $n \in \mathcal{N}$,

$(Kf, e_n) = \int_{[0,1)} \int_{\mathcal{Z}} x\, (\{y + s\} - s)\, P(dx, dy)\, \overline{e_n(s)}\, ds = \int_{[0,1)} \int_{\mathcal{Z}} x\, u\, (e_n(y) - 1)\, \overline{e_n(u)}\, P(dx, dy)\, du = 0$,

since $P(e_n(Y) = 1) = 1$. It follows from Theorem A.3 that

$W_N = \frac{1}{N} \sum_{k=1}^{N} X_k \left( \{S_k\} - \{S_{k-1}\} \right)$

converges to $0$ almost surely and in mean square. Furthermore, if $\mathrm{card}(\mathcal{N}) > 1$, then condition (23) holds and $\sup_N N\, E(W_N^2) < \infty$, while if $P_Y$ is absolutely continuous, then condition (25) holds true and $\sqrt{N}\, W_N$ converges in law to a centered Gaussian random variable with variance $\sigma^2$ given by (24).

References

Baker, J. E. (1985). Adaptive selection methods for genetic algorithms. In Proc. International Conf. on Genetic Algorithms and their Applications, pages 101-111. L. Erlbaum Associates.

Baker, J. E. (1987). Reducing bias and inefficiency in the selection algorithm. In Proc. of the Second International Conf. on Genetic Algorithms and their Applications, pages 14-21. L. Erlbaum Associates.

Brindle, A. (1980). Genetic algorithms for function optimization. PhD thesis.

Crisan, D. (2003). Exact rates of convergence for a branching particle approximation to the solution of the Zakai equation. Ann. Probab., 31(2):693-718.

Crisan, D., Del Moral, P., and Lyons, T. (1999). Discrete filtering using branching and interacting particle systems. Markov Process. Related Fields, 5(3):293-318.

Crisan, D. and Doucet, A. (2002). A survey of convergence results on particle filtering methods for practitioners. IEEE Trans. Signal Process., 50(3):736-746.

Crisan, D., Gaines, J., and Lyons, T. (1998). Convergence of a branching particle method to the solution of the Zakai equation. SIAM J. Appl. Math., 58(5):1568-1590 (electronic).

Crisan, D. and Lyons, T. (1999). A particle approximation of the solution of the Kushner-Stratonovitch equation. Probab. Theory Related Fields, 115(4):549-578.

Del Moral, P. (2004). Feynman-Kac formulae. Genealogical and interacting particle systems with applications. Probability and Its Applications. New York, NY: Springer. xviii, 555 p.

Del Moral, P. and Miclo, L. (2000). Branching and interacting particle systems approximations of Feynman-Kac formulae with applications to non-linear filtering. In Séminaire de Probabilités, XXXIV, volume 1729 of Lecture Notes in Math., pages 1-145.
Springer, Berlin.

Douc, R. and Moulines, E. (2005). Limit theorems for weighted samples with applications to sequential Monte Carlo methods. Preprint.

Durrett, R. (1996). Probability: theory and examples. Duxbury Press, Belmont, CA, second edition.

Gentil, I., Rémillard, B., and Del Moral, P. (2005). Filtering of images for detecting multiple targets trajectories. In Statistical Modeling and Analysis for Complex Data Problems. Kluwer Academic Publishers, Springer.

Kallianpur, G. and Striebel, C. (1968). Estimation of stochastic systems: Arbitrary system process with additive white noise observation errors. Ann. Math. Statist., 39:785-801.

Künsch, H. R. (2005). Recursive Monte Carlo filters: Algorithms and theoretical analysis. Ann. Statist., 33(5), in press.

Liu, J. S. and Chen, R. (1998). Sequential Monte Carlo methods for dynamic systems. J. Amer. Statist. Assoc., 93(443).

Sethuraman, S. and Varadhan, S. R. S. (2005). A martingale proof of Dobrushin's theorem for nonhomogeneous Markov chains. Electron. J. Probab., 10:1221-1235.