Nonlinear Filtering: Interacting Particle Resolution
P. Del Moral
Laboratoire de Statistiques et Probabilités, CNRS UMR C55830, Bat 1R1, Université Paul Sabatier, 118 Route de Narbonne, Toulouse Cedex, France

Markov Processes and Related Fields, Volume 2, Number 4. Received April 2, 1996; revised October 3, 1996.

Abstract. This paper covers stochastic particle methods for the numerical solution of the nonlinear filtering equations, based on the simulation of interacting particle systems. The main contribution of this paper is to prove convergence of such approximations to the optimal filter, thus yielding what seem to be the first convergence results for such approximations of the nonlinear filtering equations. This new treatment has been influenced primarily by the development of genetic algorithms (J. H. Holland [11], R. Cerf [2]) and secondarily by the papers of H. Kunita and L. Stettner [12, 13]. Such interacting particle resolutions encompass genetic algorithms. Incidentally, our models provide essential insight for the analysis of genetic algorithms with a fitness function that is non-homogeneous in time.

Keywords: nonlinear filtering, Bayesian estimation, Monte-Carlo sampling, interacting particle systems, sampling-resampling, evolutionary processes, genetic algorithms

AMS Subject Classification: 60F10, 60G35, 60J10, 62D05, 62E25, 62F12, 62G05, 62L20, 62M05, 92D15, 93E10, 93E11

1. Introduction

The basic model for the general nonlinear filtering problem consists of a nonlinear plant X with state noise W and a nonlinear observation Y with observation noise V. Let (X, Y) be the Markov process taking values in S × R and defined by the system

    X_n: a Markov chain with initial law ν and transition kernel K,
    Y_n = h(X_n) + V_n,  n ≥ 1,    (1.1)

where S = R^d (d ≥ 1), h: S → R, and the V_n are independent random variables having densities g_n with respect to Lebesgue measure. The signal process X
that we consider is assumed to be a temporally homogeneous Markov process on S with transition probability kernel K and initial probability measure ν. We assume that the observation noise V and the state plant X are independent. For simplicity the observation process Y is taken real valued; the extension to vector observation processes is straightforward. One can study filtering problems in more general settings. We choose not to do so here, and prefer to focus on the central ideas. The methods used in this paper can be extended, if desired.

The filtering problem is concerned with estimating a functional of the state process using the information contained in the observation process Y. The information is encoded in the filtration defined by the sigma-algebra generated by the observations Y_1, …, Y_n. Let f be an integrable Borel test function from S into R. The best estimate of X_n given the observations up to time n is the conditional expectation

    π_n f := E( f(X_n) | Y^n ),  Y^n := (Y_1, …, Y_n).

With the notable exception of the linear-Gaussian situation, general optimal filters have no finitely recursive solution [3]. This paper covers stochastic particle methods for the numerical solution of the nonlinear filtering equations, based on the simulation of interacting particle systems. Such algorithms are an extension of the Sampling/Resampling (S/R) principles introduced by Gordon, Salmond and Smith in [10] and independently by Del Moral, Noyer, Rigal and Salut in [4] and [5]. Several examples of practical problems that can be solved using these methods are given in [1] and [8], including problems in Radar/Sonar signal processing and GPS/INS integration. Such particle nonlinear filters differ from the others [6, 7] in the way they store and update the information that is accumulated through the resampling of the positions.
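The state-space model (1.1) is easy to simulate. The sketch below draws one path of the signal and its real-valued observations; the AR(1) kernel, the bounded sensor map tanh and the Gaussian noise are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

def simulate_plant(n_steps, nu, K_step, h, g_sample, rng):
    """Draw one path of system (1.1): a signal X with initial law nu and
    Markov transitions X_n ~ K(X_{n-1}, .), observed through Y_n = h(X_n) + V_n."""
    x = nu(rng)                          # X_0 ~ nu
    xs, ys = [x], []
    for _ in range(n_steps):
        x = K_step(x, rng)               # one transition of the kernel K
        xs.append(x)
        ys.append(h(x) + g_sample(rng))  # real-valued noisy observation
    return np.array(xs), np.array(ys)

# Illustrative (assumed) choices: an AR(1) signal observed through tanh.
rng = np.random.default_rng(1)
xs, ys = simulate_plant(
    n_steps=100,
    nu=lambda r: r.normal(0.0, 1.0),
    K_step=lambda x, r: 0.9 * x + 0.5 * r.normal(),
    h=np.tanh,
    g_sample=lambda r: 0.3 * r.normal(),
    rng=rng,
)
```

The returned arrays hold X_0, …, X_n and Y_1, …, Y_n; any signal kernel and observation map with the structure of (1.1) can be substituted.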
We start by giving some general notation, and we recall some basic facts related to the theory introduced by Kunita and Stettner [12, 13]. In Section 3 we introduce the interacting particle approximation and we design a natural stochastic basis for the convergence study. In Section 4 we describe recursive formulas for the conditional distributions and the context that we are interested in. We propose a natural framework which allows us to formulate explicit mean error bounds in terms of the likelihood functions related to the resampling/selection procedure. The hardest point of our program is contained in Section 5: the convergence study of our algorithm requires a specific development because of the difficulty of computing the mean error estimates that are essential for deriving convergence rates. The key idea is to introduce into the mean square error estimates a martingale with unit mean, built from the functions g_n. This martingale approach simplifies the analysis drastically. The evaluation of the convergence rates is discussed in Section 6. This last section contains our main result: we prove that the interacting particle filters converge to the conditional distribution as the number of particles tends to infinity. The convergence rate estimates arise quite naturally from the results and associated methodologies of Sections 5 and 6.
2. Nonlinear filtering equation

2.1. General notations

Before starting the description of the nonlinear filtering equation, let us first introduce some general notation. Let C(S) be the space of bounded continuous functions on S with norm ‖f‖ = sup_{x∈S} |f(x)|. Let P(S) be the space of all probability measures on S, endowed with the weak topology:

    lim_{n→+∞} µ_n = µ in P(S)  if and only if  lim_{n→+∞} µ_n f = µf for all f ∈ C(S).

Let µ ∈ P(S), f ∈ C(S), and let K_1 and K_2 be two Markov kernels. We will use the standard notation

    µK_1(dy) = ∫ µ(dx) K_1(x, dy),    (K_1 K_2)(x, dz) = ∫ K_1(x, dy) K_2(y, dz),
    K_1 f(x) = ∫ K_1(x, dy) f(y),     µf = ∫ µ(dx) f(x).

With m ∈ P(S²) we associate two marginal measures m̂, m̌ ∈ P(S) as follows: for all f ∈ C(S),

    m̂f = ∫ m(d(x_1, x_2)) f(x_2),    m̌f = ∫ m(d(x_1, x_2)) f(x_1).

With a Markov kernel K and a measure µ ∈ P(S) we associate a measure µ × K ∈ P(S²) by setting, for all f ∈ C(S²),

    (µ × K)f = ∫ µ(dx_1) K(x_1, dx_2) f(x_1, x_2).

Finally, we denote by C(P(S)) the space of bounded continuous functions F: P(S) → R.

2.2. Recursive filters and Bayes' formula

In this section we describe recursive expressions for the conditional distributions of X_n and of (X_n, X_{n+1}) given the observations Y^n = (Y_1, …, Y_n). Let (Ω, (F_n), P), Ω = Ω_1 × Ω_2, be the canonical space for the signal-observation pair (X, Y). Therefore P is the probability measure on Ω corresponding to the filtering model (1.1) when ν is the probability measure of X_0; the marginal of P on Ω_1 is the law of X, and V_n = Y_n − h(X_n) is a sequence of independent random variables with densities g_n.
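On a finite state space the kernel calculus above reduces to linear algebra: a measure is a probability row vector, a kernel a row-stochastic matrix, a test function a column vector. A minimal sketch, with assumed two-state numbers:

```python
import numpy as np

# mu is a probability row vector, K1 and K2 are Markov kernels
# (row-stochastic matrices), f is a test function.
mu = np.array([0.3, 0.7])
K1 = np.array([[0.9, 0.1],
               [0.2, 0.8]])
K2 = np.array([[0.5, 0.5],
               [0.4, 0.6]])
f = np.array([1.0, -1.0])

muK1 = mu @ K1   # the measure mu K1(dy)
K1K2 = K1 @ K2   # the composed kernel (K1 K2)(x, dz)
K1f = K1 @ f     # the function K1 f(x)
muf = mu @ f     # the scalar mu f
```

A useful check is the Fubini-type identity µ(K_1 f) = (µK_1)f, which here is just associativity of matrix products.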
We use E to denote expectation with respect to P on Ω. To clarify the presentation, we will also write

    m_{n+1} = π_n × K.    (2.1)

Using Bayes' theorem, we see that the conditional distribution of X_n given the observations up to time n is given by

    π_n = ρ_n(π_{n-1}, Y_n), n ≥ 1,  π_0 = ν,    (2.2)

where

    ρ_n(µ, y)f = ∫ f(x) g_n(y − h(x)) µK(dx) / ∫ g_n(y − h(x)) µK(dx)    (2.3)

for all f ∈ C(S), µ ∈ P(S), y ∈ R and n ≥ 1. Much more is true. Following Kunita and Stettner [12, 13], the above description enables us to consider the conditional distributions π_n as a (σ(Y^n), P)-Markov process with infinite dimensional state space P(S) and transition probability kernel Π_n defined by

    Π_n F(µ) = ∫ F( ρ_n(µ, y) ) ( ∫ g_n(y − h(z)) µK(dz) ) dy

for any bounded continuous function F: P(S) → R and µ ∈ P(S). In other words, with some obvious abuse of notation,

    dP(y_n, x_n, x_{n-1} | π_{n-1}) = g_n(y_n − h(x_n)) dy_n π_{n-1}(dx_{n-1}) K(x_{n-1}, dx_n),    (2.4)
    p(y_n | π_{n-1}) = ∫ g_n(y_n − h(x_n)) π_{n-1}K(dx_n).    (2.5)

Now, the construction of the recursive expression for the conditional distribution of (X_n, X_{n+1}) given the observations Y^n is a fairly immediate consequence of (2.1) and (2.2). Using the above notation one easily gets

    m_{n+1} = Φ_n(m_n, Y_n), n ≥ 1,  m_0 = ν × K,    (2.6)

where

    Φ_n(m, y)f = ∫ f(x_1, x_2) g_n(y − h(x_1)) m(d(x_0, x_1)) K(x_1, dx_2) / ∫ g_n(y − h(z_1)) m(d(z_0, z_1))

for all f ∈ C(S²), m ∈ P(S²) and y ∈ R.

3. Interacting particle systems

3.1. Description of the algorithm

The particle system under study will be a Markov chain with state space S^N, where N ≥ 1 is the size of the system. The N-tuples of elements of S, i.e. the
points of the set S^N, are called systems of particles and they will be mostly denoted by the letters x, y, z. Given the observations Y = y, we denote by ((x̂_n, x_{n+1}))_{n≥0} the S^{2N}-valued Markov process defined by the transition probabilities

    P^{[y]}{ (x̂_0, x_1) ∈ d(z_0, z_1) } = Π_{p=1}^N m_0( d(z_0^p, z_1^p) ),
    P^{[y]}{ (x̂_n, x_{n+1}) ∈ d(z_0, z_1) | (x̂_{n-1}, x_n) = (x_0, x_1) } = Π_{p=1}^N Φ_n( (1/N) Σ_{i=1}^N δ_{(x_0^i, x_1^i)}, y_n )( d(z_0^p, z_1^p) ).    (3.1)

By the very definition of m_0 and Φ_n we also have the following.

Initial particle system:
    P^{[y]}{ x̂_0 ∈ dx } = Π_{p=1}^N ν(dx^p);

Sampling/Exploration:
    P^{[y]}{ x_n ∈ dx | x̂_{n-1} = z } = Π_{p=1}^N K(z^p, dx^p);

Resampling/Selection:
    P^{[y]}{ x̂_n ∈ dx | x_n = z } = Π_{p=1}^N Σ_{i=1}^N [ g_n(y_n − h(z^i)) / Σ_{j=1}^N g_n(y_n − h(z^j)) ] δ_{z^i}(dx^p).

Finally one gets a sequence of particle systems

    x̂_{n-1} = (x̂_{n-1}^1, …, x̂_{n-1}^N),  x_n = (x_n^1, …, x_n^N),  x̂_n = (x̂_n^1, …, x̂_n^N).

It is essential to remark that the particles x̂_n^i are chosen randomly and independently from the population {x_n^1, …, x_n^N} by the law π_n^N defined by the likelihood functions and the present measurement Y_n. Namely,

    π_n^N = Σ_{i=1}^N [ g_n(y_n − h(x_n^i)) / Σ_{j=1}^N g_n(y_n − h(x_n^j)) ] δ_{x_n^i}.    (3.2)

Each particle x̂_n^i then moves to x_{n+1}^i using the transition probability kernel K. In other words, the S^N-valued Markov chain x̂_n = (x̂_n^1, …, x̂_n^N) is obtained through overlapping another chain x_n = (x_n^1, …, x_n^N), representing the successive particles obtained
by exploring the probability space with the transitions K. More precisely, the motion of the particles is decomposed into two stages:

    x̂_{n-1}  --Sampling/Exploration-->  x_n  --Resampling/Selection-->  x̂_n.

The algorithm constructed in this way will be called an interacting particle filter. The terminology "interacting" is intended to emphasize that the particles are not independent, and it differs from the particle resolutions introduced in [7] and [6].

Remark 3.1. The formulated algorithm permits generalisation to the case where the selection procedure is applied only from time to time. In practical situations, a simple way to do this consists of introducing a resampling schedule. For instance, we may choose to resample the particles when fifty percent of the weights are lower than A N^{-p}, for a convenient choice of A > 0 and p ≥ 2.

To point out the connection with genetic algorithms, and so to emphasize the role of the likelihood functions g_n, assume further that:

1. the state space S is finite;
2. the Markov transition kernels K_l(x, z) are governed by a parameter l, with K_l(x, z) → 1_{x=z} as l tends to infinity;
3. the observation noise V_n is also governed by the parameter l, with distribution

    dP_{V_n}(v) = exp(−V(v) log l) dv / ∫ exp(−V(u) log l) du;

4. the series of observations is homogeneous with respect to time: for all n ≥ 1, Y_n = y, and we set f(x) := −V(y − h(x)).

The corresponding Exploration and Selection mechanisms are then governed by the parameter l, and they take the following form:

    Pr{ x_n^l = x | x̂_{n-1}^l = z } = Π_{p=1}^N K_l(z^p, x^p),
    Pr{ x̂_n^l = x | x_n^l = z } = Π_{p=1}^N Σ_{i=1}^N [ l^{f(z^i)} / Σ_{j=1}^N l^{f(z^j)} ] 1_{z^i}(x^p).

For this very special situation, Cerf [2] gives several conditions on the rate of decrease of the perturbations 1/log(l_n) which ensure that all the particles x_n^{(l_n),i}
visit the set of global maxima of the fitness function f in finite time, when the number of particles is greater than a critical value.

Unfortunately there is a critical lack of theoretical results on the convergence of such algorithms for numerically solving the nonlinear filtering equations. The crucial question is of course whether the empirical random measure (1/N) Σ_{i=1}^N δ_{x̂_n^i} converges to the conditional distribution π_n when the size N of the particle system grows. This is answered positively in Theorem 6.2, Section 6. We will show that for every bounded Borel test function f: S → R and for n ≥ 1

    lim_{N→+∞} E | (1/N) Σ_{i=1}^N f(x̂_n^i) − π_n f | = 0.    (3.4)

We give a new and detailed analysis of this problem. These results are largely recent, although the question of local convergence occurs in [5]. Unlike genetic algorithms, it should be emphasized that we are not necessarily trying to recover the unknown state variables exactly. The conditional distribution gives the conditional minimum variance estimate, but the error does not in general converge to zero as time tends to infinity (see Kunita [12]).

Such particle nonlinear filters differ from those introduced in [7] and [6]. The scheme clearly encompasses the genetic algorithms introduced by Holland [11] and recently developed by R. Cerf [2]. An advantage of this procedure is that it simultaneously explores the probability space using the a priori Markov kernel and updates the information that is accumulated through resampling of the positions. In introducing such adaptation/selection laws, the particle system acquires certain configurations so as to represent an estimate of the conditional distribution. They provide a natural procedure for a system of particles to sense its environment through their likelihood functions and the observations.
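The two-stage motion described above can be sketched in a few lines. This is a minimal implementation of the Sampling/Exploration and Resampling/Selection steps with the weights of (3.2); the AR(1) model and Gaussian likelihood in the usage example are illustrative assumptions only, not choices made in the paper.

```python
import numpy as np

def interacting_particle_filter(ys, N, nu_sample, K_sample, h, g, rng):
    """Interacting particle filter of Section 3.1.  At each step:
    1. sampling/exploration: every particle moves independently with K;
    2. resampling/selection: N particles are redrawn from the population
       with probabilities proportional to the likelihoods g(y_n - h(x))."""
    x_hat = nu_sample(N, rng)                  # initial system, iid from nu
    estimates = []
    for y in ys:
        x = K_sample(x_hat, rng)               # exploration with K
        w = g(y - h(x))                        # likelihood fitness
        w = w / w.sum()                        # normalised weights of (3.2)
        x_hat = x[rng.choice(N, size=N, p=w)]  # multinomial selection
        estimates.append(x_hat.mean())         # estimate of pi_n(identity)
    return np.array(estimates)

# Illustrative linear-Gaussian run (assumed parameters, for demonstration).
rng = np.random.default_rng(2)
est = interacting_particle_filter(
    ys=np.array([0.1, -0.2, 0.3]),
    N=500,
    nu_sample=lambda N, r: r.normal(size=N),
    K_sample=lambda x, r: 0.9 * x + 0.5 * r.normal(size=x.size),
    h=lambda x: x,
    g=lambda v: np.exp(-0.5 * (v / 0.5) ** 2),
    rng=rng,
)
```

Only the ratios of the likelihood values matter, so g may be unnormalised, exactly as in (3.2).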
It is clear that such particle resolutions could be formulated in a natural way in other contexts, such as neural networks, model parameter identification and optimal control [9].

3.2. The associated Markov process

Next we shall proceed to model such an interacting particle procedure, and the nonlinear filtering problem, on a natural stochastic basis. In the preceding paragraph we remarked on the fact that the particles coincide with the support of random measures that estimate the conditional distribution. A convenient tool for analysing the modelling of such interacting particle systems is the splitting transition kernel

    C^N : P(S²) → { (1/N) Σ_{i=1}^N δ_{x^i} : x^i ∈ S² } ⊂ P(S²),    (3.5)

defined for every F ∈ C(P(S²)), η ∈ P(S²) and N ≥ 1 by

    C^N F(η) = ∫ F(m) C^N(η, dm) := ∫_{S^{2N}} F( (1/N) Σ_{i=1}^N δ_{x^i} ) η(dx^1) … η(dx^N).    (3.6)

The interpretation of C^N is the following: starting with a measure η ∈ P(S²), the next measure is the result of sampling N independent random variables x^i with common law η,

    η  --C^N-->  m = (1/N) Σ_{i=1}^N δ_{x^i}.

Given a series of observations Y = y, we observe that the random empirical measures

    m_{n+1}^N = (1/N) Σ_{i=1}^N δ_{(x̂_n^i, x_{n+1}^i)}

are the result of sampling N independent random variables with common law

    Φ_n(m_n^N, y_n) = π_n^N × K = ( Σ_{i=1}^N [ g_n(y_n − h(x_n^i)) / Σ_{j=1}^N g_n(y_n − h(x_n^j)) ] δ_{x_n^i} ) × K.

The above observation enables us to consider the empirical measures m_n^N as a canonical P(S²)-valued Markov process (Ω', (β_n), P^{[y]}) by setting

    P^{[y]}{ m_1^N ∈ dη } = C^N( Φ_0(m_0, y_0), dη ),
    P^{[y]}{ m_{n+1}^N ∈ dη | m_n^N = µ } = C^N( Φ_n(µ, y_n), dη ).    (3.7)

Now we design a stochastic basis for the convergence study of our particle approximations. To capture all the randomness we list all outcomes in the canonical space defined as follows.

1. Recall that (Ω, (F_n), P) is the canonical space for the signal-observation pair (X, Y).
2. We define Ω̃ = Ω' × Ω and F̃_n = β_n × F_n and, for every ω̃ := (ω_1, ω_2, ω_3) ∈ Ω̃, we define m_n^N(ω̃) = ω_1(n), X_n(ω̃) = ω_2(n), Y_n(ω̃) = ω_3(n).
3. For every A ∈ β_n and B ∈ F_n, we define P̃ as follows:

    P̃{A × B} := ∫_B P^{[Y(ω)]}(A) dP(ω).    (3.8)

As usual, we use Ẽ to denote the expectation with respect to P̃, and Ẽ^{[y]} to denote the expectation with respect to P^{[y]}. With some obvious abuse of notation we have

    dP̃(m_1, …, m_n, y_1, …, y_n) = Π_{k=1}^n C^N( Φ_{k-1}(m_{k-1}, y_{k-1}), dm_k ) dP(y_1, …, y_n),

with the convention Φ_0(m_0, y_0) = ν × K.

4. General recursive formulas

In this section we shall adopt an unconventional model for the conditional distributions. The choice is dictated by our desire to have very simple relationships between the likelihood functions and the conditional distributions. Initially this will require a quite different setup from the one used in Sections 2 and 3, but at the end of our investigations it will resemble more and more the models presented in Section 2.

The setting is the same as in Section 3.1. In particular, unless otherwise stated, we assume the observation data to be a fixed series of real numbers Y = y. This assumption enables us to consider the conditional distribution π_n as a probability parametrised by the observation parameters y_1, …, y_n, …. Therefore, when the context is unambiguous, we will often write for brevity g_n(x) instead of g_n(y_n − h(x)). Using this notation, an alternative form of (2.2) is the recursive formula

    π_n f = π_{n-1}K( f g_n ) / π_{n-1}K( g_n )

for all f ∈ C(S).

4.1. Likelihood functions analysis

The proof of the convergence (3.4) involves, for all f ∈ C(S):

1. the maximum log-likelihood functions given, for 0 ≤ p < n, by

    V_{n/p}(y) = 2 sup_{x_p ∈ S} | log ∫_S [ p(y_n, …, y_{p+1} | x_{p+1}) / p(y_n, …, y_{p+1} | y^p) ] dP(x_{p+1} | x_p) |,    (4.1)

    V_n(y) = Σ_{p=1}^n V_{p/p-1}(y),    (4.2)
where p(y_n, …, y_{p+1} | y^p) denotes the density under P of the distribution of (Y_n, …, Y_{p+1}) conditionally on Y^p = (Y_1, …, Y_p); p(y_n, …, y_{p+1} | x_{p+1}) denotes the density under P of the distribution of (Y_n, …, Y_{p+1}) conditionally on X_{p+1}; and dP(x_{p+1} | x_p) = K(x_p, dx_{p+1});

2. the conditional expectations, for 0 ≤ p ≤ n, given by

    f_p^n(x_p) = E( f(X_n) | X_p, Y_{p+1}, …, Y_n )(x_p, y_{p+1}, …, y_n).    (4.3)

The functions V_{n/p} represent log-likelihood functions on the observation process Y, and they are closely related to the resampling/selection mechanism of our algorithm. This relationship is due to the fact that the selection mechanism is formulated in terms of the fitness functions and of

    p(y_n | x_n) = g_n(y_n − h(x_n)),    (4.4)
    p(y_n, …, y_p | x_p) = p(y_p | x_p) ∫_S p(y_n, …, y_{p+1} | x_{p+1}) dP(x_{p+1} | x_p).    (4.5)

Example.

1. Assume here that the state and observation processes (X, Y) are given by the linear dynamics

    X_n = A X_{n-1} + W_n,    (4.6)
    Y_n = C X_n + V_n,    (4.7)

where X_n ∈ R, Y_n ∈ R, Y_0 = 0, A and C are real numbers, and X_0, W_n and V_n are normally distributed with means 0 and respective non-negative covariances Q_0, Q and R. The conditional densities of Y_n given X_n and Y^{n-1} = (Y_1, …, Y_{n-1}) are given by

    p(y_n | x_n) = g_n(y_n − C x_n) = (2πR)^{-1/2} exp( −(1/2) (y_n − C x_n)² R^{-1} ),
    p(y_n | y^{n-1}) = ( 2π (C² P_{n/n-1} + R) )^{-1/2} exp( −(1/2) (y_n − C A X̂_{n-1})² (C² P_{n/n-1} + R)^{-1} ),

with the well-known measurement update equations

    X̂_n := E(X_n | Y^n) = A X̂_{n-1} + K_n ( Y_n − C A X̂_{n-1} ),
    K_n := C P_{n/n-1} ( C² P_{n/n-1} + R )^{-1},
    P_{n/n-1} := E( X_n − E(X_n | Y^{n-1}) )² = A² P_{n-1/n-1} + Q,
    P_{n-1/n-1} := E( X_{n-1} − E(X_{n-1} | Y^{n-1}) )² = ( P_{n-1/n-2}^{-1} + C R^{-1} C )^{-1}.
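The scalar measurement update equations above translate directly into code. The sketch below implements one recursion step; the numerical check in the usage note assumes the simplest parameters A = C = 1, Q = 0, R = 1.

```python
def kalman_update(x_prev, P_prev, y, A, C, Q, R):
    """One step of the scalar Kalman recursions for (4.6)-(4.7):
    prediction variance, gain, state update and filtered variance."""
    P_pred = A * A * P_prev + Q               # P_{n/n-1}
    gain = C * P_pred / (C * C * P_pred + R)  # K_n
    x_new = A * x_prev + gain * (y - C * A * x_prev)
    P_new = 1.0 / (1.0 / P_pred + C * C / R)  # P_{n/n} = (P_pred^{-1} + C R^{-1} C)^{-1}
    return x_new, P_new
```

With A = C = 1, Q = 0, R = 1, starting from x_prev = 0, P_prev = 1 and observing y = 2, one gets the gain 0.5, the update x_new = 1.0 and P_new = 0.5; the information form of P_new agrees with the familiar identity P_{n/n} = (1 − K_n C) P_{n/n-1}.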
In this very special situation,

    V_{n/n-1}(Y) ≤ log[ ( (C² P_{n/n-1} + R) / R ) exp( (Y_n − C A X̂_{n-1})² (C² P_{n/n-1} + R)^{-1} ) ].

Further manipulations yield

    V_{n/p}(Y) ≤ log[ Π_{k=p}^n ( (C² P_{k/k-1} + R) / (C² S_{k-p} + R) ) exp( (Y_k − C A X̂_{k-1})² (C² P_{k/k-1} + R)^{-1} ) ],

with S_0 = 0 and, for A² ≠ 1,

    E( X_k − E(X_k | X_p) )² = S_{k-p} = ( (1 − A^{2(k-p)}) / (1 − A²) ) Q.

2. Let X be a discrete time Markov process taking values in a finite set S, and assume that the observation noise V_n is a sequence of independent random variables with

    dP_{V_n}(v) = exp( −U_n(v) ) dv / ∫ exp( −U_n(u) ) du,  U_n : R → R_+.

In this hidden Markov model,

    V_{n/n-1}(Y) ≤ 2 sup_{x∈S} U_n( Y_n − h(x) ).

The functions (4.1), (4.3) and (4.5) will be used in our analysis of the convergence (3.4). This analysis requires a specific development because of the difficulty of estimating mean errors. For instance we have

    Ẽ^{[y]}( ( (1/N) Σ_{i=1}^N f(x̂_n^i) )² ) = Ẽ^{[y]}( (m̌_{n+1}^N f)² ) = (1/N) Ẽ^{[y]}( π_n^N f² − (π_n^N f)² ) + Ẽ^{[y]}( (π_n^N f)² ).

Thus, Chebyshev's inequality and convenient estimates of

    Ẽ^{[y]}( (π_n^N f)² ) = Ẽ^{[y]}( ( Σ_{i=1}^N [ g_n(y_n − h(x_n^i)) / Σ_{j=1}^N g_n(y_n − h(x_n^j)) ] f(x_n^i) )² )    (4.8)

would give a complete answer concerning convergence in L⁰(P^{[y]}). Unfortunately, it is difficult to estimate (4.8), or even Ẽ^{[y]}(π_n^N f). Accordingly, in order to point out the connections between (4.3) and (4.5), and so to emphasize the role of the log-likelihood functions V_{n/p}, we first introduce a natural framework in which the relationship between (4.3) and (4.5) is made explicit. The recursive expressions described in the next section will be used repeatedly for the convergence (3.4) in the last part of the paper.
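On a finite state space, the exact recursion π_n f = π_{n-1}K(f g_n)/π_{n-1}K(g_n) recalled at the beginning of this section reduces to matrix-vector operations. A minimal sketch, with assumed two-state numbers:

```python
import numpy as np

def filter_step(pi_prev, K, g_n):
    """One exact filter update on a finite state space:
    pi_n(f) = pi_{n-1}K(f g_n) / pi_{n-1}K(g_n)."""
    pred = pi_prev @ K            # the prediction pi_{n-1}K
    unnorm = pred * g_n           # pointwise multiplication by the likelihood
    return unnorm / unnorm.sum()  # normalised conditional distribution pi_n

K = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])
g = np.array([0.8, 0.1])          # assumed likelihood values g_n(y_n - h(x))
pi = filter_step(pi, K, g)
```

Here the prediction is [0.55, 0.45]; after the Bayes correction with g and normalisation, the mass concentrates on the first state, which the observation favours.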
4.2. Recursive formulas

The main purpose of this section is to introduce recursive expressions for (4.3) and (4.5). The technical approach presented here is to work with the same given sequence of observations Y = y. The conditional distributions π_n and the conditional expectations f_p^n (1 ≤ p ≤ n) will be formulated as a probability and a sequence of functions parametrised by the observation parameters y. The recursive formulas described in this section will show how the sequence of observations scales the updates of both π_n and f_p^n. Our constructions will be explicit and the recursions will have a simple form.

First, let us give some details on the use of Bayes' formula in our setting. With some obvious abuse of notation, we have for all 1 ≤ p ≤ n the recursion

    p(y_n, …, y_p | x_p) = p(y_p | x_p) ∫ p(y_n, …, y_{p+1} | x_{p+1}) dP(x_{p+1} | x_p),
    p(y_n, …, y_p | x_p) / p(y_n, …, y_p | y^{p-1}) = [ p(y_p | x_p) / p(y_p | y^{p-1}) ] ∫ [ p(y_n, …, y_{p+1} | x_{p+1}) / p(y_n, …, y_{p+1} | y^p) ] dP(x_{p+1} | x_p),
    p(y_n, …, y_1) = p(y_n | y^{n-1}) p(y_{n-1} | y^{n-2}) ⋯ p(y_2 | y^1) p(y_1).

To clarify the presentation, we introduce for all 0 ≤ p ≤ n the following definitions:

1. g_p^n(x_p) = p(y_n, …, y_p | x_p);
2. g_{n/p-1}(x_p) = p(y_n, …, y_p | x_p) / p(y_n, …, y_p | y^{p-1}).

With a slight abuse of notation we will often write g_n(x_n) instead of g_n^n(x_n) = p(y_n | x_n). It is now easily checked from the above remarks that

    g_{n/p-1}(x_p) = g_p^n(x_p) / ∫ π_{p-1}(dz_{p-1}) K(z_{p-1}, dz_p) g_p^n(z_p),    (4.9)
    g_p^n(x_p) = g_p(x_p) ∫ K(x_p, dx_{p+1}) g_{p+1}^n(x_{p+1}),    (4.10)
    g_{n/p-1}(x_p) = g_{p/p-1}(x_p) ∫ K(x_p, dx_{p+1}) g_{n/p}(x_{p+1}).    (4.11)

Summarising, we have the backward recursive formulas:

Proposition 4.1. For every 0 ≤ p < n,

    g_p^n = g_p K( g_{p+1}^n ),   g_{n/p-1} = g_p^n / π_{p-1}K( g_p^n ),   g_{n/p-1} = g_{p/p-1} K( g_{n/p} ).    (4.12)
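The first backward recursion of Proposition 4.1 can be checked numerically on a finite chain: computing g_1^2 = g_1 K(g_2^2) must agree with the brute-force sum over the next state. All matrices and likelihood values below are assumed toy numbers.

```python
import numpy as np

K = np.array([[0.7, 0.3],
              [0.4, 0.6]])
# likelihood vectors g_p(x) = p(y_p | x) for two fixed (assumed) observations
g1 = np.array([0.9, 0.2])
g2 = np.array([0.3, 0.8])

# backward recursion (4.10): g_p^n = g_p * K(g_{p+1}^n), with g_n^n = g_n
g2_2 = g2
g1_2 = g1 * (K @ g2_2)            # p(y_2, y_1 | x_1)

# brute force: sum over the next state x_2
brute = g1 * np.array([sum(K[x1, x2] * g2[x2] for x2 in range(2))
                       for x1 in range(2)])
```

On a longer horizon the same one-liner is simply iterated backward from p = n down to p = 0, which is the dynamic-programming content of (4.10).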
We will adopt the conventions π_{-1}K = ν and g_0^0 = 1. Moreover, by the very definitions of π_n and π_n^N, one gets

    π_n f = ∫ f(z) g_n(y_n − h(z)) m̂_n(dz) / ∫ g_n(y_n − h(z)) m̂_n(dz),
    π_n^N f = Σ_{i=1}^N [ g_n(y_n − h(x_n^i)) / Σ_{j=1}^N g_n(y_n − h(x_n^j)) ] f(x_n^i).    (4.13)

For brevity we will write

    π_n f = m̂_n( g_n f ) / m̂_n( g_n ),   π_n^N f = m̂_n^N( g_n f ) / m̂_n^N( g_n ).

Continuing in the same vein, we derive the conditional expectations f_p^n introduced in (4.3). Using Bayes' rule we have, for all 1 ≤ p ≤ n, the following basic equation:

    dP(x_n, x_p | x_{p-1}, y_p, …, y_n) = dP(x_n | x_p, y_{p+1}, …, y_n) dP(x_p | x_{p-1}, y_p, …, y_n)
        = dP(x_n | x_p, y_{p+1}, …, y_n) [ p(y_n, …, y_p | x_p) / p(y_n, …, y_p | x_{p-1}) ] dP(x_p | x_{p-1}).

By the same line of argument, for all 1 ≤ p ≤ n, we get

    dP(x_n, x_p, x_{p-1} | y^n) = dP(x_n | x_p, y_{p+1}, …, y_n) dP(x_p, x_{p-1} | y^n)
        = dP(x_n | x_p, y_{p+1}, …, y_n) [ p(y_n, …, y_p | x_p) / p(y_n, …, y_p | y^{p-1}) ] dP(x_p | x_{p-1}) dP(x_{p-1} | y^{p-1}).

Thus we arrive at

    π_n f = ∫ π_{p-1}(dx_{p-1}) K(x_{p-1}, dx_p) f_p^n(x_p) g_p^n(x_p) / ∫ π_{p-1}(dx_{p-1}) K(x_{p-1}, dx_p) g_p^n(x_p),    (4.14)

for all f ∈ C(S) and 1 ≤ p ≤ n. Additionally, we have

    f_{p-1}^n(x_{p-1}) = ∫ K(x_{p-1}, dx_p) f_p^n(x_p) g_p^n(x_p) / ∫ K(x_{p-1}, dx_p) g_p^n(x_p),    (4.15)

for all f ∈ C(S) and 1 ≤ p ≤ n. Summarising, our conditional expectations f_p^n can be described as follows.
Proposition 4.2. For every f ∈ C(S), the conditional expectations f_p^n (1 ≤ p ≤ n) satisfy the recursive formula

    f_{p-1}^n := K( f_p^n g_{n/p-1} ) / K( g_{n/p-1} ) = K( f_p^n g_p^n ) / K( g_p^n ), for all 1 ≤ p ≤ n.    (4.16)

Moreover, for every f ∈ C(S) and 1 ≤ p ≤ n,

    π_n f = π_{p-1}K( f_p^n g_p^n ) / π_{p-1}K( g_p^n ) = π_{p-1}K( f_p^n g_{n/p-1} ) / π_{p-1}K( g_{n/p-1} ).    (4.17)

We will adopt the convention f_{-1}^n := ν( f_0^n g_0^n ) / ν( g_0^n ).

5. Mean square estimates

In this section we analyse the structure of the log-likelihood functions V_{n/p}, whilst pointing out explicit bounds. Our next objective is to estimate the convergence rate and the mean errors. We shall do this now, beginning with some lemmas that will be used repeatedly in this section.

Lemma 5.1. Let f: S → R be any integrable Borel test function, and let n ≥ 0 and N ≥ 1. We have, P̃-a.e.,

    Ẽ^{[y]}( m̌_{n+1}^N f | β_n ) = π_n^N f,   Ẽ^{[y]}( m̂_{n+1}^N f | β_n ) = π_n^N Kf,    (5.1)
    Ẽ^{[y]}( ( m̌_{n+1}^N f − π_n^N f )² | β_n ) = (1/N) ( π_n^N f² − (π_n^N f)² ),    (5.2)
    Ẽ^{[y]}( ( m̂_{n+1}^N f − π_n^N Kf )² | β_n ) = (1/N) ( π_n^N Kf² − (π_n^N Kf)² ).    (5.3)

Proof. It suffices to note that, conditionally on β_n, m_{n+1}^N is obtained by sampling N independent random variables with common law π_n^N × K, so that

    Ẽ^{[y]}( m̌_{n+1}^N f | β_n ) = ∫ m̌f C^N( π_n^N × K, dm ) = π_n^N f,
    Ẽ^{[y]}( ( m̌_{n+1}^N f )² | β_n ) = ∫ (m̌f)² C^N( π_n^N × K, dm ) = (1/N) ( π_n^N f² − (π_n^N f)² ) + (π_n^N f)²,

and similarly

    Ẽ^{[y]}( m̂_{n+1}^N f | β_n ) = ∫ m̂f C^N( π_n^N × K, dm ) = π_n^N Kf,
    Ẽ^{[y]}( ( m̂_{n+1}^N f )² | β_n ) = ∫ (m̂f)² C^N( π_n^N × K, dm ) = (1/N) ( π_n^N Kf² − (π_n^N Kf)² ) + (π_n^N Kf)². □
Next we derive a technical recursive formula in p for the expressions Ẽ^{[y]}( m̂_{p+1}^N( g_{n/p} f_{p+1}^n ) | β_p ), 0 ≤ p ≤ n−1.

Lemma 5.2. Let f: S → R be any integrable Borel test function, and let n ≥ 0 and N ≥ 1. For every 0 ≤ p ≤ n−1 we have the recursion

    Ẽ^{[y]}( m̂_{p+1}^N( g_{n/p} f_{p+1}^n ) | β_p ) = π_p^N K( g_{n/p} f_{p+1}^n ) = m̂_p^N( g_{n/p-1} f_p^n ) / m̂_p^N( g_{p/p-1} ),  P̃-a.e.    (5.4)

Proof. Using the inductive definitions of f_p^n and g_{n/p} and Lemma 5.1, we have

    Ẽ^{[y]}( m̂_{p+1}^N( g_{n/p} f_{p+1}^n ) | β_p ) = π_p^N K( g_{n/p} f_{p+1}^n ) = m̂_p^N( g_{p/p-1} K( g_{n/p} f_{p+1}^n ) ) / m̂_p^N( g_{p/p-1} ) = m̂_p^N( g_{n/p-1} f_p^n ) / m̂_p^N( g_{p/p-1} ). □

5.1. Martingale approach

For estimating mean square errors, the key idea is to introduce a (P̃, F̃_n)-martingale U^N, using the functions g_{n/n-1} and the random measures m_n^N. More precisely, we define U_n^N as follows.

Definition 5.1. We denote by U^N the stochastic process defined by

    U_0^N = 1,   U_n^N = m̂_n^N( g_{n/n-1} ) U_{n-1}^N, for all n ≥ 1.    (5.5)

In other words,

    U_n^N = Π_{k=1}^n (1/N) Σ_{i=1}^N g_{k/k-1}( x_k^i ).

Lemma 5.3. (U_n^N) is a (P̃, F̃_n)-martingale with Ẽ( U_n^N ) = 1 and, P-a.s., Ẽ^{[Y]}( U_n^N ) = 1.

Proof. The first statement follows by recalling that

    g_{n/n-1}(x) = g_n(x) / π_{n-1}K( g_n ) = g_n( Y_n − h(x) ) / p( Y_n | Y^{n-1} ).
This gives, P̃-a.s.,

    Ẽ( U_n^N | F̃_{n-1} ) = U_{n-1}^N ∫∫ m̂( g_{n/n-1} ) C^N( π_{n-1} × K, dm ) dP( y_n | Y^{n-1} ) = U_{n-1}^N ∫ ( ∫ g_n( y_n − h(z) ) π_{n-1}K(dz) ) dy_n = U_{n-1}^N,

and the first assertion follows. To prove Ẽ^{[Y]}( U_n^N ) = 1, the above discussion goes through with some minor changes. Indeed, P-a.s.,

    Ẽ^{[Y]}( U_n^N ) = Ẽ^{[Y]}( m̂_n^N( g_{n/n-1} ) U_{n-1}^N ) = Ẽ^{[Y]}( Ẽ^{[Y]}( m̂_n^N( g_{n/n-1} ) | β_{n-1} ) U_{n-1}^N ).    (5.6)

Now, using the inductive definition of g_{n/p}, we obtain

    Ẽ^{[Y]}( m̂_n^N( g_{n/n-1} ) | β_{n-1} ) = m̂_{n-1}^N( g_{n/n-2} ) / m̂_{n-1}^N( g_{n-1/n-2} ).

Then (5.6) gives

    Ẽ^{[Y]}( U_n^N ) = Ẽ^{[Y]}( m̂_{n-1}^N( g_{n/n-2} ) U_{n-2}^N ) = Ẽ^{[Y]}( Ẽ^{[Y]}( m̂_{n-1}^N( g_{n/n-2} ) | β_{n-2} ) U_{n-2}^N ).

This procedure can be repeated. Using the recursive formulas described in Section 4.2, we note that

    Ẽ^{[Y]}( U_n^N ) = Ẽ^{[Y]}( Ẽ^{[Y]}( m̂_p^N( g_{n/p-1} ) | β_{p-1} ) U_{p-1}^N )  and  Ẽ^{[Y]}( m̂_p^N( g_{n/p-1} ) | β_{p-1} ) = m̂_{p-1}^N( g_{n/p-2} ) / m̂_{p-1}^N( g_{p-1/p-2} ).

Then

    Ẽ^{[Y]}( U_n^N ) = Ẽ^{[Y]}( Ẽ^{[Y]}( m̂_{p-1}^N( g_{n/p-2} ) | β_{p-2} ) U_{p-2}^N ).

Using backward induction in p, the result follows from the fact that Ẽ^{[Y]}( m̂_1^N( g_{n/0} ) ) = νK( g_{n/0} ) = 1. □

The analysis of U^N is a powerful tool for studying the convergence rate of our interacting particle filter. That is, introducing the process U^N makes the calculation of mean errors possible; these estimates will then be reinterpreted. As a typical example we have the following lemma.

Lemma 5.4. For any integrable Borel test function f: S → R, we have

    Ẽ^{[Y]}( U_n^N ( m̌_{n+1}^N f − π_n f ) ) = 0,  P-a.e.    (5.7)
Proof. Using Lemma 5.3, this is equivalent to proving that Ẽ^{[Y]}( U_n^N m̌_{n+1}^N f ) = π_n f. Now, using Lemma 5.1 and the fact that f_n^n = f, note that

    Ẽ^{[Y]}( U_n^N m̌_{n+1}^N f ) = Ẽ^{[Y]}( Ẽ^{[Y]}( m̌_{n+1}^N f | β_n ) U_n^N )  and  Ẽ^{[Y]}( m̌_{n+1}^N f | β_n ) = π_n^N f = m̂_n^N( g_{n/n-1} f ) / m̂_n^N( g_{n/n-1} ).

Then

    Ẽ^{[Y]}( U_n^N m̌_{n+1}^N f ) = Ẽ^{[Y]}( U_{n-1}^N m̂_n^N( g_{n/n-1} f_n^n ) ).

Now, using backward induction in 1 ≤ p ≤ n−1 and the recursive formulas described in Section 4.2, we have

    Ẽ^{[Y]}( U_p^N m̂_{p+1}^N( g_{n/p} f_{p+1}^n ) ) = Ẽ^{[Y]}( U_p^N Ẽ^{[y]}( m̂_{p+1}^N( g_{n/p} f_{p+1}^n ) | β_p ) )

and

    Ẽ^{[y]}( m̂_{p+1}^N( g_{n/p} f_{p+1}^n ) | β_p ) = m̂_p^N( g_{n/p-1} f_p^n ) / m̂_p^N( g_{p/p-1} ).

Hence

    Ẽ^{[Y]}( U_n^N m̌_{n+1}^N f ) = Ẽ^{[Y]}( U_{p-1}^N m̂_p^N( g_{n/p-1} f_p^n ) ).

The result finally follows from the recursive formulas described in Section 4.2 and the fact that

    Ẽ^{[Y]}( m̂_1^N( g_{n/0} f_1^n ) ) = νK( g_{n/0} f_1^n ) = νK( g_{n/0} f_1^n ) / νK( g_{n/0} ) = π_n f. □

5.2. Mean square estimates

To prove the convergence (3.4), it clearly suffices to prove that

    lim_{N→∞} Ẽ^{[Y]} | U_n^N − 1 | = 0  and  lim_{N→∞} Ẽ^{[Y]} | ( (1/N) Σ_{i=1}^N f(x̂_n^i) − π_n f ) U_n^N | = 0.    (5.8)

The main purpose of this section is to provide a way of estimating the rate of convergence in (5.8) in terms of the log-likelihood functions V_{n/p}. Such computations are at the heart of the development. They will drastically simplify the evaluation of the convergence rates discussed in the last section. We first quote the following result.

Proposition 5.1. For every series of observations y,
1. for all 0 ≤ p < n,

    V_{n/p}(y) ≤ V_{n/n-1}(y) + V_{n-1/n-2}(y) + ⋯ + V_{p+1/p}(y) ≤ V_n(y);

2. for every 0 ≤ k ≤ n and 0 ≤ p_0 < ⋯ < p_{k-1} < p_k = n, we have

    Σ_{l=1}^k V_{p_l/p_{l-1}}(y) ≤ V_n(y).

Proof. The proof is a straightforward computation, using that g_{n/p} = K( g_{n/p+1} ) g_{p+1/p} implies

    V_{n/p}(y) = 2 ‖ log K( g_{n/p} ) ‖ ≤ V_{n/p+1}(y) + V_{p+1/p}(y). □

Next we want to develop explicit formulas expressing the effect of the population size N and of the observation likelihood functions on the mean square errors (5.8). The following propositions will be used repeatedly in the last section.

Proposition 5.2. For every N ≥ 1 and n ≥ 0 we have, P-a.e.,

1. Ẽ^{[Y]}( (U_n^N − 1)² ) ≤ Σ_{k=1}^n (1/N)^k (1 − 1/N)^{n−k} Σ_{0 ≤ p_0 < ⋯ < p_k = n} exp( Σ_{l=1}^k V_{p_l/p_{l-1}}(Y) );

2. Ẽ^{[Y]}( (U_n^N − 1)² ) ≤ ( 1 + (1/N) e^{V_n(Y)} )^n − 1.

Proof. Inequality 2 is clearly a consequence of inequality 1 and Proposition 5.1. Let us prove inequality 1. Using Lemma 5.3,

    Ẽ^{[Y]}( (U_n^N − 1)² ) = Ẽ^{[Y]}( (U_n^N)² ) − 1.

Hence, with the standard convention Σ_∅ = 0, it is sufficient to prove that

    Ẽ^{[Y]}( (U_n^N)² ) ≤ Σ_{k=0}^n (1/N)^k (1 − 1/N)^{n−k} Σ_{0 ≤ p_0 < ⋯ < p_k = n} exp( Σ_{l=1}^k V_{p_l/p_{l-1}}(Y) ).    (5.9)

The proof of (5.9) is based on backward and forward induction in n and maximisation techniques. Using Lemma 5.1 and the recursion in Lemma 5.2, we
have

    Ẽ^{[Y]}( (U_n^N)² ) = Ẽ^{[Y]}( Ẽ^{[Y]}( ( m̂_n^N( g_{n/n-1} ) )² | β_{n-1} ) (U_{n-1}^N)² )

and

    Ẽ^{[Y]}( ( m̂_n^N( g_{n/n-1} ) )² | β_{n-1} ) ≤ (1/N) π_{n-1}^N K( g_{n/n-1}² ) + (1 − 1/N) ( π_{n-1}^N K( g_{n/n-1} ) )²
        ≤ (1/N) e^{V_{n/n-1}(Y)} + (1 − 1/N) ( m̂_{n-1}^N( g_{n/n-2} ) / m̂_{n-1}^N( g_{n-1/n-2} ) )².

Thus,

    Ẽ^{[Y]}( (U_n^N)² ) ≤ (1/N) e^{V_{n/n-1}(Y)} Ẽ^{[Y]}( (U_{n-1}^N)² ) + (1 − 1/N) Ẽ^{[Y]}( Ẽ^{[Y]}( ( m̂_{n-1}^N( g_{n/n-2} ) )² | β_{n-2} ) (U_{n-2}^N)² ).

By the same line of argument, for every 1 ≤ p ≤ n−1,

    Ẽ^{[Y]}( ( m̂_{p+1}^N( g_{n/p} ) )² | β_p ) ≤ (1/N) e^{V_{n/p}(Y)} + (1 − 1/N) ( m̂_p^N( g_{n/p-1} ) / m̂_p^N( g_{p/p-1} ) )²,
    Ẽ^{[Y]}( ( m̂_1^N( g_{n/0} ) )² | β_0 ) ≤ (1/N) e^{V_{n/0}(Y)} + (1 − 1/N) ( νK( g_{n/0} ) )².

Cascading in the above expressions,

    Ẽ^{[Y]}( (U_n^N)² ) ≤ (1 − 1/N)^n + Σ_{q=0}^{n-1} (1/N) (1 − 1/N)^{n−1−q} e^{V_{n/q}(Y)} Ẽ^{[Y]}( (U_q^N)² ).

Suppose the inequalities (5.9) have been proved for every q ≤ n−1, that is,

    Ẽ^{[Y]}( (U_q^N)² ) ≤ Σ_{k=0}^q (1/N)^k (1 − 1/N)^{q−k} Σ_{0 ≤ p_0 < ⋯ < p_k = q} exp( Σ_{l=1}^k V_{p_l/p_{l-1}}(Y) );

then

    Ẽ^{[Y]}( (U_n^N)² ) ≤ (1 − 1/N)^n + Σ_{k=0}^{n-1} Σ_{0 ≤ p_0 < ⋯ < p_k < n} (1/N)^{k+1} (1 − 1/N)^{n−(1+k)} exp( V_{n/p_k}(Y) + Σ_{l=1}^k V_{p_l/p_{l-1}}(Y) ),

and the result follows. □

The same techniques then establish the following result.
Proposition 5.3. For any bounded Borel test function f: S → R, N ≥ 1 and n ≥ 0, we have, P-a.e.,

    Ẽ^{[Y]}( ( ( (1/N) Σ_{i=1}^N f(x̂_n^i) − π_n f ) U_n^N )² ) ≤ 4 ‖f‖² Σ_{k=1}^{n+1} (1/N)^k (1 − 1/N)^{n+1−k} Σ_{0 ≤ p_0 < ⋯ < p_{k-1} ≤ p_k = n} exp( Σ_{l=1}^k V_{p_l/p_{l-1}}(Y) ).    (5.10)

Therefore,

    Ẽ^{[Y]}( ( ( (1/N) Σ_{i=1}^N f(x̂_n^i) − π_n f ) U_n^N )² ) ≤ ( 4 ‖f‖² / N ) (1 + 1/N)^{n+1} e^{V_n(Y)}.    (5.11)

Proof. Let us prove the first statement. For any bounded Borel test function f, using the recursive formulas described in Section 4.2, we have

    Ẽ^{[Y]}( ( m̌_{n+1}^N f · U_n^N )² ) = Ẽ^{[Y]}( Ẽ^{[Y]}( ( m̌_{n+1}^N f )² | β_n ) (U_n^N)² )

and

    Ẽ^{[Y]}( ( m̌_{n+1}^N f )² | β_n ) ≤ (1/N) ‖f‖² + (1 − 1/N) ( m̂_n^N( g_{n/n-1} f ) / m̂_n^N( g_{n/n-1} ) )².

Then

    Ẽ^{[Y]}( ( m̌_{n+1}^N f · U_n^N )² ) ≤ (1/N) ‖f‖² Ẽ^{[Y]}( (U_n^N)² ) + (1 − 1/N) Ẽ^{[Y]}( ( m̂_n^N( g_{n/n-1} f_n^n ) U_{n-1}^N )² ).    (5.12)

Arguing as above, for every 1 ≤ p ≤ n−1 we get

    Ẽ^{[Y]}( ( m̂_{p+1}^N( g_{n/p} f_{p+1}^n ) U_p^N )² ) = Ẽ^{[Y]}( Ẽ^{[Y]}( ( m̂_{p+1}^N( g_{n/p} f_{p+1}^n ) )² | β_p ) (U_p^N)² ),
    Ẽ^{[Y]}( ( m̂_{p+1}^N( g_{n/p} f_{p+1}^n ) )² | β_p ) ≤ (1/N) ‖f‖² e^{V_{n/p}(Y)} + (1 − 1/N) ( m̂_p^N( g_{n/p-1} f_p^n ) / m̂_p^N( g_{p/p-1} ) )²,
    Ẽ^{[Y]}( ( m̂_1^N( g_{n/0} f_1^n ) )² | β_0 ) ≤ (1/N) ‖f‖² e^{V_{n/0}(Y)} + (1 − 1/N) ( π_n f )².
Using the above inequalities with expression (5.12), we conclude that

    Ẽ^{[Y]}( ( m̌_{n+1}^N f · U_n^N )² ) ≤ ( ‖f‖² / N ) Σ_{k=0}^n (1 − 1/N)^{n−k} e^{V_{n/k}(Y)} Ẽ^{[Y]}( (U_k^N)² ) + (1 − 1/N)^{n+1} ( π_n f )²,

with the convention V_{n/n} = 0. Then, using formula (5.9), we easily obtain

    Ẽ^{[Y]}( ( ( m̌_{n+1}^N f − π_n f ) U_n^N )² ) ≤ 4 ‖f‖² Σ_{l=0}^n (1/N)^{l+1} (1 − 1/N)^{n−l} Σ_{0 ≤ p_0 < ⋯ < p_l ≤ p_{l+1} = n} exp( Σ_{s=1}^{l+1} V_{p_s/p_{s-1}}(Y) )
        ≤ 4 ‖f‖² Σ_{l=1}^{n+1} (1/N)^l (1 − 1/N)^{n+1−l} Σ_{0 ≤ p_0 < ⋯ < p_{l-1} ≤ p_l = n} exp( Σ_{s=1}^l V_{p_s/p_{s-1}}(Y) ).

The second statement follows from the first inequality in Proposition 5.1 and the fact that (n choose l) ≤ (n+1 choose l). □

6. Convergence theorems

We are now ready to prove the convergence of the interacting particle approximation described in Section 3. The following theorem is our main result; it states the relevant consequences of the mean square error estimates obtained in Propositions 5.2 and 5.3.

Theorem 6.1. For every N ≥ 1, n ≥ 0, any bounded Borel test function f: S → R, T > 0 and 0 < ε < 1/2, we have, P-a.e.,

    sup_{n ∈ [0,T]} P^{[Y]}{ | (1/N) Σ_{i=1}^N f(x̂_n^i) − π_n f | > ε } ≤ ( A(T, f, Y) / (ε⁴ N) ) (1 + 1/N)^{T+1},    (6.1)

with A(T, f, Y) = 2 sup( 4‖f‖², 1 ) e^{V_T(Y)}. Moreover, P-a.e.,

    sup_{n ∈ [0,T]} Ẽ^{[Y]} | (1/N) Σ_{i=1}^N f(x̂_n^i) − π_n f | ≤ B(T, f, Y) (1/√N) (1 + 1/N)^{(T+1)/2},    (6.2)

with B(T, f, Y) = 4 ‖f‖ e^{V_T(Y)/2}.
Proof. Using Propositions 5.2 and 5.3 and the Cauchy-Schwarz inequality, we have

    Ẽ^{[Y]} | (1/N) Σ_{i=1}^N f(x̂_n^i) − π_n f |
        ≤ Ẽ^{[Y]}( | (1/N) Σ_{i=1}^N f(x̂_n^i) − π_n f | · | 1 − U_n^N | ) + Ẽ^{[Y]}( | (1/N) Σ_{i=1}^N f(x̂_n^i) − π_n f | · U_n^N )
        ≤ B(T, f, Y) (1/√N) (1 + 1/N)^{(T+1)/2},

and the inequality (6.2) follows. To prove (6.1), write

    A = { U_n^N ≤ 1 − ε },   B = { | (1/N) Σ_{i=1}^N f(x̂_n^i) − π_n f | > ε }.

Using Proposition 5.2 and the fact that, for all 0 < ε < 1, U_n^N ≤ 1 − ε implies |U_n^N − 1| ≥ ε, we have

    P^{[Y]}{ U_n^N ≤ 1 − ε } ≤ P^{[Y]}{ |U_n^N − 1| ≥ ε } ≤ (1/ε²) Ẽ^{[Y]}( (U_n^N − 1)² ) ≤ ( 1 / (ε² N) ) (1 + 1/N)^n e^{V_n(Y)}.

Therefore,

    P^{[Y]}{A} ≤ ( 1 / (ε² N) ) (1 + 1/N)^{n+1} e^{V_n(Y)}.

On the other hand, Proposition 5.3 gives the inequalities

    P^{[Y]}{ B ∩ A^c } ≤ (1/ε²) Ẽ^{[Y]}( ( (1/N) Σ_{i=1}^N f(x̂_n^i) − π_n f )² 1_{A^c} )
        ≤ ( 1 / ((1−ε)² ε²) ) Ẽ^{[Y]}( ( ( (1/N) Σ_{i=1}^N f(x̂_n^i) − π_n f ) U_n^N )² )
        ≤ ( 4 ‖f‖² / ((1−ε)² ε² N) ) (1 + 1/N)^{n+1} e^{V_n(Y)}.
Finally, the inequality

    P^{[Y]}{B} ≤ P^{[Y]}{ B ∩ A^c } + P^{[Y]}{A}    (6.3)

and the fact that 0 < ε < 1/2 implies ε < 1 − ε complete the proof. □

Corollary 6.1. For every N ≥ 2, n ≥ 0, any bounded Borel test function f: S → R, T > 0 and 0 < ε < 1/2, we have, P-a.e.,

    sup_{n ∈ [0,T]} P^{[Y]}{ | (1/N) Σ_{i=1}^N f(x̂_n^i) − π_n f | > ε } ≤ Ā(ε, T, f, Y) / N,    (6.4)

where Ā(ε, T, f, Y) = 4^T A(T, f, Y) / ε⁴.

Proof. From the inequality e^y ≥ 1 + y, it follows that

    (1 + 1/N)^{n+1} = exp( (n+1) log(1 + 1/N) ) ≤ exp( (n+1)/N ),

since log x ≤ x − 1. Thus, using n + 1 ≤ 2n, we conclude that (1 + 1/N)^{n+1} ≤ e^{2n/N} ≤ e^n ≤ 4^n for all N ≥ 2 and all n ≥ 1. This completes the proof. □

Theorem 6.2. Let Y = y be a series of observations such that V_n(y) < +∞ for every n. For every bounded Borel test function f: S → R, N ≥ 1 and n ≥ 0, we have, for all p > 0,

    lim_{N→∞} sup_{n ∈ [0,T]} Ẽ^{[y]}( | (1/N) Σ_{i=1}^N f(x̂_n^i) − π_n f |^p ) = 0.

Proof. Using the inequality, valid for all ε > 0,

    Ẽ^{[y]}( | (1/N) Σ_{i=1}^N f(x̂_n^i) − π_n f |^p ) ≤ ε^p + (2‖f‖)^p P^{[y]}{ | (1/N) Σ_{i=1}^N f(x̂_n^i) − π_n f | > ε }

and Corollary 6.1, the result follows. □

Consider the following assumption on the observation process:

    (V)  for all n ≥ 0,  Ẽ( V_n(Y) ) =: α(n) < +∞.    (6.5)

This assumption enables us to estimate the convergence rate of our approximations in the spaces L^p(P̃) with p > 0.
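The N^{-1/2} behaviour quantified above can be observed numerically on the linear-Gaussian model of the example in Section 4.1, for which the Kalman filter provides the exact conditional means. The experiment below is a sanity check under assumed parameters and seeds, not part of the paper's proofs.

```python
import numpy as np

# Assumed linear-Gaussian model: X_n = A X_{n-1} + W_n, Y_n = C X_n + V_n.
A, C, Q, R = 0.9, 1.0, 0.25, 0.25
rng = np.random.default_rng(3)

# one fixed observation record, X_0 ~ N(0, 1)
x, ys = rng.normal(), []
for _ in range(10):
    x = A * x + np.sqrt(Q) * rng.normal()
    ys.append(C * x + np.sqrt(R) * rng.normal())

# exact conditional means via the Kalman recursions
m, P, kalman = 0.0, 1.0, []
for y in ys:
    Pp = A * A * P + Q
    gain = C * Pp / (C * C * Pp + R)
    m = A * m + gain * (y - C * A * m)
    P = (1.0 - gain * C) * Pp
    kalman.append(m)

def pf_means(N, seed):
    """Particle estimates of the conditional means with N particles."""
    r = np.random.default_rng(seed)
    xh, out = r.normal(size=N), []
    for y in ys:
        xp = A * xh + np.sqrt(Q) * r.normal(size=N)   # exploration
        w = np.exp(-0.5 * (y - C * xp) ** 2 / R)      # likelihoods
        w /= w.sum()
        xh = xp[r.choice(N, size=N, p=w)]             # selection
        out.append(xh.mean())
    return np.array(out)

err_small = np.abs(pf_means(50, 7) - np.array(kalman)).max()
err_large = np.abs(pf_means(5000, 7) - np.array(kalman)).max()
```

With a hundred times more particles the maximal deviation from the exact conditional means typically shrinks by about an order of magnitude, in line with the N^{-1/2} rate.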
Theorem 6.3. Assume (V) is satisfied. For every bounded Borel test function $f:S\to\mathbb{R}$, $N\ge 1$, $n\ge 0$, $0<\epsilon<1/2$ and $M>0$, we have

$$
\sup_{n\in[0,T]}P\Bigl\{\Bigl|\frac{1}{N}\sum_{i=1}^{N}f(x_n^i)-\pi_n(f)\Bigr|>\epsilon\Bigr\}
\;\le\;\frac{1}{M}+\frac{A(T,M,f,\epsilon)}{N},\qquad (6.7)
$$

with $A(T,M,f,\epsilon)=8\,\sup\bigl(4\|f\|^{2},\,1\bigr)\,T\,e^{M\alpha_T}/\epsilon^{4}$.

Proof. Using Theorem 6.1, with $A(f)=2\,\sup\bigl(4\|f\|^{2},\,1\bigr)$, we get

$$
P\Bigl\{\Bigl|\frac{1}{N}\sum_{i=1}^{N}f(x_n^i)-\pi_n(f)\Bigr|>\epsilon\Bigr\}
\le P\bigl\{V_n(Y)\ge M\alpha_n\bigr\}
+\frac{A(f)\,e^{M\alpha_n}}{\epsilon^{3}}\Bigl(1-\Bigl(1-\frac{1}{N}\Bigr)^{n+1}\Bigr)
\le\frac{1}{M}
+\frac{A(f)\,e^{M\alpha_n}}{\epsilon^{3}}\Bigl(1-\Bigl(1-\frac{1}{N}\Bigr)^{n+1}\Bigr),
$$

the last step by Markov's inequality. The arguments used in the proof of Corollary 6.1 complete the proof.

Finally, the following corollary is derived by the same reasoning as in Theorem 6.2.

Corollary 6.2. Assume (V) is satisfied. For every bounded Borel test function $f:S\to\mathbb{R}$, $N\ge 1$ and $n\ge 0$, we have, for all $p>0$,

$$
\lim_{N\to\infty}\ \sup_{n\in[0,T]}\widetilde{E}\Bigl(\Bigl|\frac{1}{N}\sum_{i=1}^{N}f(x_n^i)-\pi_n(f)\Bigr|^{p}\Bigr)=0.
$$

Acknowledgements

I am deeply grateful to Laure Coutin and Pierre Vandekerkhove for their careful reading of the original manuscript. I also thank the anonymous referee for his comments and suggestions that greatly improved the presentation of the paper.

References

[1] H. Carvalho, P. Del Moral, A. Monin and G. Salut (1996) Optimal nonlinear filtering in GPS/INS integration. To appear in IEEE Trans. Aerospace and Electron. Syst.
[2] R. Cerf (1994) Une théorie asymptotique des algorithmes génétiques. Thèse de doctorat, Sciences et techniques du Languedoc, Université Montpellier II.

[3] M. Chaleyat-Maurel and D. Michel (1983) Des résultats de non existence de filtres de dimension finie. C.R. Acad. Sci. Paris, Sér. I.

[4] P. Del Moral, G. Rigal, J.C. Noyer and G. Salut (1993) Traitement non-linéaire du signal par réseau particulaire: application radar. Quatorzième colloque GRETSI, Juan-les-Pins, September 1993.

[5] P. Del Moral (1994) Résolution particulaire des problèmes d'estimation et d'optimisation non-linéaires. Thèse de l'Université Paul Sabatier, Rapport LAAS 94269, June 1994.

[6] P. Del Moral and G. Salut (1995) Filtrage non-linéaire: résolution particulaire à la Monte-Carlo. C.R. Acad. Sci. Paris, Sér. I.

[7] P. Del Moral (1995) Nonlinear filtering using random particles. Probab. Theory Appl.

[8] P. Del Moral, J.C. Noyer and G. Salut (1995) Résolution particulaire et traitement non-linéaire du signal: application Radar/Sonar. Traitement du Signal, September 1995.

[9] P. Del Moral and G. Salut (1996) Particle interpretation of nonlinear filtering and optimization. Russian Journal of Mathematical Physics. In press.

[10] N.J. Gordon, D.J. Salmond and A.F.M. Smith (1993) Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings F 140(2), 107-113.

[11] J.H. Holland (1975) Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor.

[12] H. Kunita (1971) Asymptotic behavior of the nonlinear filtering errors of Markov processes. Journal of Multivariate Analysis 1(4).

[13] L. Stettner (1989) On invariant measures of filtering processes. In: Stochastic Differential Systems, Lecture Notes in Control and Information Sciences 126, Springer-Verlag.