4. Linear filtering. Wiener filter

Assume $(X_n, Y_n)$ is a pair of random sequences in which the first component $X_n$ is referred to as a signal and the second as an observation. The paths of the signal are unobservable while, on the contrary, the paths of the observation $Y_n$ are observable (measurable). Filtering is a procedure of estimating the signal value $X_n$ from the past of the observation, $Y_n, Y_{n-1}, \ldots, Y_{n-m} =: Y^n_{n-m}$ (traditionally, $m = n$ or $m = \infty$). Denote by $\widehat{X}_n$ a filtering estimate, which is a function of the observation, that is, $\widehat{X}_n = \widehat{X}_n(Y^n_{n-m})$. Here and below we restrict ourselves to linear estimates (shortly, linear filters) and evaluate the quality of filtering by the mean square error $P_n = E(X_n - \widehat{X}_n)^2$ for every time parameter $n$. Obviously, such a setting requires the existence of second moments of the signal and observation, $EX_n^2 < \infty$, $EY_n^2 < \infty$ for all $n$, and gives the optimal estimate in the form of the conditional expectation in the wide sense
$$\widehat{X}_n = \widehat{E}(X_n \mid \mathcal{Y}_n), \eqno(4.1)$$
where $\mathcal{Y}_n$ is the linear space generated by $1, Y_n, Y_{n-1}, \ldots$ (henceforth $\widehat{X}_n$ is assumed to be defined by (4.1)). Thus a filtering problem for random processes consists in estimating an unobservable signal (random process) via the paths of an observable one.

4.1. The Wiener filter. Assume that the signal-observation pair $(X_n, Y_n)$ is a stationary random sequence in the wide sense. To have $\widehat{X}_n$ also a stationary sequence, it is assumed that $m = \infty$, that is, the whole path $\ldots, Y_{n-1}, Y_n$ is observable. Since $\widehat{X}_n \in \mathcal{Y}_n$, it is of the form
$$\widehat{X}_n = c_n + \sum_{k \le n} c_{n,k} Y_k, \eqno(4.2)$$
where the sum in (4.2) converges in $L^2$-norm and the parameters $c_n$, $c_{n,k}$ are chosen so that the error $X_n - \widehat{X}_n$ is orthogonal to $\mathcal{Y}_n$. Without loss of generality we may assume that both the signal and the observation are zero mean, so that, taking expectations on both sides of (4.2), we find $c_n \equiv 0$. Although a general formula for $\widehat{E}(X_n \mid \mathcal{Y}_n)$ is given in Theorem 3.1 (Lect. 3), its application to the case considered is problematic owing to the infinite dimension of $\mathcal{Y}_n$ (one could apply that formula only if we are ready to be satisfied with an approximation of $\widehat{X}_n$, namely the orthogonal projection on the linear space $\mathcal{Y}_{[n-m,n]}$ generated by $1, Y_n, Y_{n-1}, \ldots, Y_{n-m}$ for large $m$). Set $R_{yy}(k-l) = EY_kY_l$ and $R_{xy}(k-l) = EX_kY_l$.
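As a numerical illustration of this finite-window approximation, one can estimate the covariances empirically and solve the normal equations expressing the orthogonality of the error to $\mathcal{Y}_{[n-m,n]}$. The sketch below does this for a synthetic signal-observation pair; the model, window length, and sample size are arbitrary illustrative choices (the model coincides with the example treated in the sequel).

```python
import numpy as np

# Sketch: best linear estimate of X_n from the finite window (Y_n, ..., Y_{n-m}),
# obtained from the normal (orthogonality) equations with empirical covariances.
# The synthetic model and all parameters here are arbitrary illustrative choices.
rng = np.random.default_rng(0)
a, N, m = 0.8, 100_000, 5

eps, delta = rng.standard_normal(N), rng.standard_normal(N)
X = np.zeros(N)
for n in range(1, N):
    X[n] = a * X[n - 1] + eps[n]        # signal recursion
Y = X + delta                            # noisy observation

# Rows of W are the windows (Y_n, Y_{n-1}, ..., Y_{n-m}), n = m, ..., N-1.
W = np.column_stack([Y[m - j : N - j] for j in range(m + 1)])
x = X[m:]

R_yy = W.T @ W / len(x)                  # empirical Cov(Y, Y)
R_xy = W.T @ x / len(x)                  # empirical Cov(X, Y)
c = np.linalg.solve(R_yy, R_xy)          # normal equations R_yy c = R_xy

X_hat = W @ c
print("empirical mean square error:", np.mean((x - X_hat) ** 2))
```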
Theorem 4.1. The optimal coefficients satisfy $c_{n,k} \equiv c_{n-k}$ and solve an infinite system of linear algebraic equations (the Wiener-Hopf equation)
$$R_{xy}(n-l) = \sum_{k \le n} c_{n-k} R_{yy}(k-l), \quad l \le n. \eqno(4.3)$$

Proof. Since the filtering error $X_n - \widehat{X}_n$ is orthogonal to $\mathcal{Y}_n$, for any $Y_l$, $l \le n$, we have
$$E\big(X_n - \widehat{X}_n\big)Y_l = 0.$$
Taking into consideration (4.2), with $c_n \equiv 0$, from the latter equality we derive an infinite system of linear equations
$$R_{xy}(n-l) = \sum_{k \le n} c_{n,k} R_{yy}(k-l). \eqno(4.4)$$
So it remains to prove only that $c_{n,k} = c_{n-k}$. Notice that with $l = n$, (4.4) is transformed to
$$R_{xy}(0) = \sum_{k \le l} c_{l,k} R_{yy}(k-l) = \sum_{j=0}^{\infty} c_{l,l-j} R_{yy}(j).$$
Since the left side of this equality is independent of $l$, the right one has to be independent of $l$ as well, which provides $c_{l,l-j} = c_{l-(l-j)} = c_j$.

Example. Clearly, a solution of the Wiener-Hopf equation would be difficult to find directly. So we give here a simple example for which the solution of the Wiener-Hopf equation can be found. Let $(X_n, Y_n)$ be defined as follows: for every integer $n$,
$$X_n = aX_{n-1} + \varepsilon_n,$$
$$Y_n = X_n + \delta_n,$$
where $(\varepsilon_n)$, $(\delta_n)$ are orthogonal white noises, that is, any linear combinations of $(\varepsilon_n)$ and $(\delta_n)$ are uncorrelated, with $E\varepsilon_n \equiv 0$, $E\varepsilon_n^2 \equiv 1$ and $E\delta_n \equiv 0$, $E\delta_n^2 \equiv 1$. The absolute value of the parameter $a$ is smaller than 1, $|a| < 1$, which guarantees that the recursion for $X_n$ is stable and $X_n$ is a stationary process. Parallel to $\widehat{X}_n$, let us introduce $\widehat{X}_{n,n-1} = \widehat{E}(X_n \mid \mathcal{Y}_{n-1})$ and $\widehat{Y}_{n,n-1} = \widehat{E}(Y_n \mid \mathcal{Y}_{n-1})$, the orthogonal projections of $X_n$ and $Y_n$ on $\mathcal{Y}_{n-1}$, the subspace of $\mathcal{Y}_n$, and denote also
$$\mathrm{Var}_{n,n-1}(X) = E\big(X_n - \widehat{X}_{n,n-1}\big)^2, \quad \mathrm{Var}_{n,n-1}(Y) = E\big(Y_n - \widehat{Y}_{n,n-1}\big)^2,$$
$$\mathrm{Cov}_{n,n-1}(X,Y) = E\big(X_n - \widehat{X}_{n,n-1}\big)\big(Y_n - \widehat{Y}_{n,n-1}\big).$$
It is obvious that
$$\widehat{X}_{n,n-1} = \widehat{E}(X_n \mid \mathcal{Y}_{n-1}) = \widehat{E}(aX_{n-1} + \varepsilon_n \mid \mathcal{Y}_{n-1}) = a\widehat{X}_{n-1}$$
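As a numerical illustration of Theorem 4.1 (a minimal sketch; the truncation length $M$ and the parameter $a$ are arbitrary choices): for the example model the covariances are available in closed form, $R_{xx}(j) = a^{|j|}/(1-a^2)$, $R_{xy}(j) = R_{xx}(j)$, $R_{yy}(j) = R_{xx}(j) + \mathbf{1}(j=0)$, so truncating (4.3) to finitely many lags turns the Wiener-Hopf equation into a symmetric Toeplitz system:

```python
import numpy as np

# Sketch: solve a truncated Wiener-Hopf system (4.3) for the example model.
# Truncation length M and the parameter a are arbitrary choices.
a, M = 0.8, 200
j = np.arange(M)
R_xx = a**j / (1 - a**2)          # signal autocovariance R_xx(j), j >= 0
R_yy = R_xx.copy()
R_yy[0] += 1.0                    # observation noise adds 1 at lag 0

# Truncated system: R_xy(j) = sum_i c_i R_yy(i - j), j = 0, ..., M-1.
T = R_yy[np.abs(j[:, None] - j[None, :])]   # symmetric Toeplitz matrix
c = np.linalg.solve(T, R_xx)                # right side: R_xy(j) = R_xx(j)

print("c_0..c_4:", c[:5])
print("ratios c_{j+1}/c_j:", c[1:6] / c[:5])
```

The common ratio observed in the output anticipates the geometric form $c_j = (a(1-P))^j P$ obtained at the end of the example.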
and similarly
$$\widehat{Y}_{n,n-1} = \widehat{E}(X_n + \delta_n \mid \mathcal{Y}_{n-1}) = a\widehat{X}_{n-1}.$$
These relations allow us to find explicitly that
$$\mathrm{Var}_{n,n-1}(X) = E\big(X_n - a\widehat{X}_{n-1}\big)^2 = E\big(a(X_{n-1} - \widehat{X}_{n-1}) + \varepsilon_n\big)^2 = a^2 P_{n-1} + 1 \eqno(4.5)$$
and that
$$\mathrm{Var}_{n,n-1}(Y) = E\big(Y_n - a\widehat{X}_{n-1}\big)^2 = E\big(a(X_{n-1} - \widehat{X}_{n-1}) + \varepsilon_n + \delta_n\big)^2 = a^2 P_{n-1} + 2, \eqno(4.6)$$
$$\mathrm{Cov}_{n,n-1}(X,Y) = E\big(X_n - a\widehat{X}_{n-1}\big)\big(Y_n - a\widehat{X}_{n-1}\big) = E\big(a(X_{n-1} - \widehat{X}_{n-1}) + \varepsilon_n\big)\big(a(X_{n-1} - \widehat{X}_{n-1}) + \varepsilon_n + \delta_n\big) = a^2 P_{n-1} + 1. \eqno(4.7)$$
We show now that
$$\widehat{X}_n = a\widehat{X}_{n-1} + \frac{a^2 P_{n-1} + 1}{a^2 P_{n-1} + 2}\big(Y_n - a\widehat{X}_{n-1}\big). \eqno(4.8)$$
The proof of (4.8) is similar to the proof of Theorem 3.1 (Lect. 3). Set
$$\eta = X_n - \widehat{X}_{n,n-1} - C\big(Y_n - \widehat{Y}_{n,n-1}\big) \eqno(4.9)$$
and choose the constant $C$ such that $\eta \perp \mathcal{Y}_n$, that is, $E\eta\xi = 0$ for any $\xi \in \mathcal{Y}_n$. For $\xi \in \mathcal{Y}_{n-1}$ this holds automatically, since both $X_n - \widehat{X}_{n,n-1}$ and $Y_n - \widehat{Y}_{n,n-1}$ are orthogonal to $\mathcal{Y}_{n-1}$; so it suffices to take $\xi = Y_n - \widehat{Y}_{n,n-1}$, for which
$$0 = \mathrm{Cov}_{n,n-1}(X,Y) - C\,\mathrm{Var}_{n,n-1}(Y),$$
i.e. $C$ is uniquely defined: $C = \frac{a^2 P_{n-1}+1}{a^2 P_{n-1}+2}$, and (4.8) is provided by (4.9); indeed, $\widehat{X}_n = X_n - \eta = \widehat{X}_{n,n-1} + C\big(Y_n - \widehat{Y}_{n,n-1}\big)$. Notice also that the mean square error $P_n \equiv P$, since $(X_n, \widehat{X}_n)$ forms a stationary process. Hence the recursion given in (4.8) is transformed to
$$\widehat{X}_n = \Big(a - a\,\frac{a^2 P + 1}{a^2 P + 2}\Big)\widehat{X}_{n-1} + \frac{a^2 P + 1}{a^2 P + 2}\,Y_n. \eqno(4.10)$$
We have to find now the number $P$. Notice that $\eta = X_n - \widehat{X}_n$, so that $P = E\eta^2$, and from (4.9) it follows that
$$P = a^2 P + 1 - \frac{(a^2 P + 1)^2}{a^2 P + 2} = \frac{a^2 P + 1}{a^2 P + 2} < 1,$$
and then (4.10) reads
$$\widehat{X}_n = a(1 - P)\widehat{X}_{n-1} + P\,Y_n. \eqno(4.11)$$
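The following minimal sketch makes (4.8)-(4.11) concrete (the values of $a$ and the sample size are arbitrary): $P$ is the positive root of $a^2P^2 + (2-a^2)P - 1 = 0$, the quadratic form of the fixed-point equation above, and the stationary filter (4.11) is then a one-line recursion whose empirical mean square error can be compared with $P$.

```python
import numpy as np

# Sketch of the stationary filter (4.11); a and N are arbitrary choices.
rng = np.random.default_rng(1)
a, N = 0.8, 200_000

# P = (a^2 P + 1)/(a^2 P + 2)  <=>  a^2 P^2 + (2 - a^2) P - 1 = 0.
P = (-(2 - a**2) + np.sqrt((2 - a**2) ** 2 + 4 * a**2)) / (2 * a**2)

eps, delta = rng.standard_normal(N), rng.standard_normal(N)
X = np.zeros(N)
for n in range(1, N):
    X[n] = a * X[n - 1] + eps[n]
Y = X + delta

X_hat = np.zeros(N)
for n in range(1, N):
    X_hat[n] = a * (1 - P) * X_hat[n - 1] + P * Y[n]   # recursion (4.11)

print("theoretical P   :", P)
print("empirical  error:", np.mean((X[1000:] - X_hat[1000:]) ** 2))
```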
Emphasizing that $|a(1-P)| < 1$, we derive from (4.11) that
$$\widehat{X}_n = \sum_{k \le n} \big(a(1-P)\big)^{n-k} P\,Y_k,$$
so that $c_j = \big(a(1-P)\big)^j P$.

Generalization of the example. $(X_n, Y_n)$, treated as the signal and observation, is a zero mean process, stationary in the wide sense, formed by two of the entries of $Z_n$, a stationary Markov process in the wide sense:
$$Z_n = AZ_{n-1} + \xi_n, \eqno(4.12)$$
where $A$ is a matrix whose eigenvalues lie inside the unit circle (which guarantees stability of the recursion) and $(\xi_n)$ is a vector-valued white noise type sequence; the latter means that for some nonnegative definite matrix $B$
$$E\xi_k\xi_l^T = \begin{cases} B, & k = l,\\ 0, & \text{otherwise.}\end{cases}$$
For notational convenience, assume that $X_n$ is the first entry of $Z_n$ and $Y_n$ the last one. Also introduce $Z'_n$, the subvector of $Z_n$ containing all entries excluding $Y_n$; let, for definiteness, $Z'_n$ be a column vector of size $p$. Since $Z_n = \binom{Z'_n}{Y_n}$, we rewrite (4.12) in the form
$$\binom{Z'_n}{Y_n} = \begin{pmatrix} A_{11} & A_{12}\\ A_{21} & A_{22}\end{pmatrix}\binom{Z'_{n-1}}{Y_{n-1}} + \binom{\xi'_n}{\xi''_n}, \eqno(4.13)$$
where $\begin{pmatrix} A_{11} & A_{12}\\ A_{21} & A_{22}\end{pmatrix} = A$ and $\binom{\xi'_n}{\xi''_n} = \xi_n$; here $\xi'_n$ contains the first $p$ components of $\xi_n$. Notice also that
$$B = \begin{pmatrix} B_{11} & B_{12}\\ B_{21} & B_{22}\end{pmatrix}, \quad \text{where } B_{11} = E\xi'_n(\xi'_n)^T,\ B_{12} = E\xi'_n\xi''_n,\ B_{21} = B_{12}^T,\ B_{22} = E(\xi''_n)^2 > 0. \eqno(4.14)$$
Denote $\widehat{Z}'_n = \widehat{E}(Z'_n \mid \mathcal{Y}_n)$. Under (4.14), $\widehat{Z}'_n$ is the stationary process defined recurrently by
$$\widehat{Z}'_n = A_{11}\widehat{Z}'_{n-1} + A_{12}Y_{n-1} + \frac{B_{12} + A_{11}PA_{21}^T}{B_{22} + A_{21}PA_{21}^T}\big(Y_n - A_{21}\widehat{Z}'_{n-1} - A_{22}Y_{n-1}\big), \eqno(4.15)$$
where $P = E\big(Z'_n - \widehat{Z}'_n\big)\big(Z'_n - \widehat{Z}'_n\big)^T$ solves the algebraic Riccati equation
$$P = A_{11}PA_{11}^T + B_{11} - \frac{\big(B_{12} + A_{11}PA_{21}^T\big)\big(B_{12} + A_{11}PA_{21}^T\big)^T}{B_{22} + A_{21}PA_{21}^T}. \eqno(4.16)$$
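In practice, (4.16) can be solved by iterating it to a fixed point. The sketch below does so for an arbitrary illustrative choice of the blocks ($p = 2$, $A$ stable, $B$ nonnegative definite with $B_{22} > 0$) and then forms the constant gain appearing in (4.15):

```python
import numpy as np

# Sketch: fixed-point iteration of the Riccati equation (4.16) and the
# stationary gain of (4.15). The blocks below are arbitrary illustrative
# choices with p = 2 (A stable, B nonnegative definite, B22 > 0).
A11 = np.array([[0.5, 0.1],
                [0.0, 0.4]])
A12 = np.array([[0.2],
                [0.1]])
A21 = np.array([[0.3, 0.2]])
A22 = np.array([[0.1]])
B11 = np.eye(2)
B12 = np.zeros((2, 1))
B22 = np.array([[1.0]])

P = np.eye(2)
for _ in range(500):                                # iterate (4.16)
    num = B12 + A11 @ P @ A21.T                     # p x 1
    den = B22 + A21 @ P @ A21.T                     # 1 x 1, positive
    P = A11 @ P @ A11.T + B11 - (num @ num.T) / den

gain = (B12 + A11 @ P @ A21.T) / (B22 + A21 @ P @ A21.T)
print("stationary P:\n", P)
print("gain in (4.15):\n", gain)
```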
A derivation of (4.15) and (4.16) follows from Theorem 5.1 (Lect. 5). The first coordinate of $\widehat{Z}'_n$, obviously, is $\widehat{X}_n$. That estimate possesses the structure $\widehat{X}_n = \sum_{k \le n} c_{n-k}Y_k$ with coefficients $c_{n-k}$ calculated in accordance with (4.15) and (4.16). This result shows how intricate a method for solving the Wiener-Hopf equation might be. There is another way of solving the Wiener-Hopf equation, given in terms of spectral densities: for instance, one can apply the Fourier transform to both sides of (4.3). It is exposed in other courses.

Fixed length of observation. Let $m$ be a fixed number and $\mathcal{Y}_{[n-m,n]}$ be the linear space generated by $1, Y_n, Y_{n-1}, \ldots, Y_{n-m}$. Define $\widehat{X}_{[n-m,n]} = \widehat{E}\big(X_n \mid \mathcal{Y}_{[n-m,n]}\big)$, the filtering estimate under a fixed length of observation. Denote
$$Y^n_{n-m} = \begin{pmatrix} Y_n\\ Y_{n-1}\\ \vdots\\ Y_{n-m}\end{pmatrix} \quad\text{and}\quad \begin{aligned} \mathrm{Cov}_{[n-m,n]}(Y,Y) &= E\,Y^n_{n-m}\big(Y^n_{n-m}\big)^T,\\ \mathrm{Cov}_{[n-m,n]}(X,Y) &= E\,X_n\big(Y^n_{n-m}\big)^T.\end{aligned}$$
Since $(X_n, Y_n)$ is a stationary sequence, for any number $k$ we have
$$\mathrm{Cov}_{[n-m,n]}(Y,Y) = \mathrm{Cov}_{[k-m,k]}(Y,Y), \quad \mathrm{Cov}_{[n-m,n]}(X,Y) = \mathrm{Cov}_{[k-m,k]}(X,Y).$$
The latter property provides, assuming that $\mathrm{Cov}_{[n-m,n]}(Y,Y)$ is a nonsingular matrix, that the vector
$$H_m = \mathrm{Cov}_{[n-m,n]}(X,Y)\,\mathrm{Cov}^{-1}_{[n-m,n]}(Y,Y) \eqno(4.17)$$
depends only on the number $m$ and so is fixed for any $n$. Thus, by Theorem 3.1 (Lect. 3) we have (recall that $EX_n \equiv 0$ and $EY^n_{n-m} \equiv 0$)
$$\widehat{X}_{[n-m,n]} = H_m Y^n_{n-m}. \eqno(4.18)$$
The attractiveness of this type of filtering estimate is twofold. First, for fixed $m$ the vector $H_m$ is independent of $n$ and so is fixed (computed only once). Second, taking into consideration that for fixed $n$ the family $\mathcal{Y}_{[n-m,n]}$, $m \ge 1$, increases, i.e. $\mathcal{Y}_{[n-m,n]} \subseteq \mathcal{Y}_{[n-(m+1),n]} \subseteq \ldots \subseteq \mathcal{Y}_n$, the sequence of random variables $\widehat{X}_{[n-m,n]}$, $m \ge 1$, forms a martingale in the wide sense. Therefore, and since $EX_n^2 < \infty$, we have (see Section 3.4 in Lect. 3)
$$\lim_{m\to\infty} E\big(\widehat{X}_n - \widehat{X}_{[n-m,n]}\big)^2 = 0.$$
Consequently, for $m$ large enough, $\widehat{X}_{[n-m,n]}$ might be an appropriate approximation for $\widehat{X}_n$.
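For the scalar example above, both covariance matrices in (4.17) are available in closed form, so $H_m$ and the corresponding mean square error can be tabulated for a range of $m$; a minimal sketch (the value $a = 0.8$ is arbitrary), whose output decreases toward the stationary error $P$ of (4.11), illustrating the limit just stated:

```python
import numpy as np

# Sketch of the fixed-window estimate (4.17)-(4.18) for the scalar example.
# Closed-form covariances: R_xy(i) = a^i/(1-a^2),
#                          R_yy(i) = a^|i|/(1-a^2) + 1{i=0}.
a = 0.8
sx2 = 1 / (1 - a**2)                        # stationary variance of X

for m in (0, 1, 2, 5, 10, 20):
    i = np.arange(m + 1)
    R_xy = sx2 * a**i                       # Cov(X, Y) as a vector
    C_yy = sx2 * a ** np.abs(i[:, None] - i[None, :]) + np.eye(m + 1)
    H_m = np.linalg.solve(C_yy, R_xy)       # (4.17): H_m = Cov(X,Y) Cov^{-1}(Y,Y)
    mse = sx2 - R_xy @ H_m                  # E(X_n - H_m Y^n_{n-m})^2
    print(f"m = {m:2d}   mse = {mse:.4f}")
```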