Lecture 3: Linear FIR Adaptive Filtering. Gradient-based adaptation: Steepest Descent Method


Adaptive filtering: Problem statement

Consider the family of variable-parameter FIR filters, computing their output according to

$$y(n) = w_0(n)u(n) + w_1(n)u(n-1) + \dots + w_{M-1}(n)u(n-M+1) = \sum_{k=0}^{M-1} w_k(n)u(n-k) = w(n)^T u(n), \qquad n = 0, 1, 2, \dots$$

where the parameters $w(n)$ are allowed to change at every time step, $u(n)$ is the input signal, $d(n)$ is the desired signal, and $u(n)$ and $d(n)$ are jointly stationary.

Given the parameters $w(n) = [w_0(n), w_1(n), w_2(n), \dots, w_{M-1}(n)]^T$, find an adaptation mechanism

$$w(n+1) = w(n) + \delta w(n),$$

or, written componentwise,

$$w_0(n+1) = w_0(n) + \delta w_0(n)$$
$$w_1(n+1) = w_1(n) + \delta w_1(n)$$
$$\vdots$$
$$w_{M-1}(n+1) = w_{M-1}(n) + \delta w_{M-1}(n),$$

such that the adaptation process converges to the parameters of the optimal Wiener filter, $w(n+1) \to w_o$, no matter where the iterations are initialized (i.e., for any $w(0)$).
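To fix the notation, the following minimal Matlab sketch (an illustration added here, not part of the original notes; the signals u and d and all sizes are placeholders) computes the time-varying FIR output $y(n) = w(n)^T u(n)$ and the error $e(n) = d(n) - y(n)$ from a regressor built out of the last $M$ input samples. The adaptation step itself is left open, since it is specified later in the lecture.

    % Minimal sketch: time-varying FIR filter output and error (illustrative only)
    M = 4;                        % filter length (placeholder value)
    N = 200;                      % number of samples (placeholder value)
    u = randn(N,1);               % input signal (synthetic, for illustration)
    d = randn(N,1);               % desired signal (synthetic, for illustration)
    w = zeros(M,1);               % current parameter vector w(n)
    for n = M:N
        un = u(n:-1:n-M+1);       % regressor [u(n) u(n-1) ... u(n-M+1)]'
        y  = w'*un;               % filter output y(n) = w(n)^T u(n)
        e  = d(n) - y;            % error e(n) = d(n) - y(n)
        % w = w + delta_w;        % adaptation step, defined in the following slides
    end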

The key to the adaptation process is the existence of an error between the output of the filter $y(n)$ and the desired signal $d(n)$:

$$e(n) = d(n) - y(n) = d(n) - w(n)^T u(n)$$

If the parameter vector $w(n)$ were used at all time instants, the performance criterion would be (see the last Lecture)

$$J(n) = J_{w(n)} = E[e(n)e(n)] = E[(d(n) - w^T(n)u(n))(d(n) - u^T(n)w(n))]$$
$$= E[d^2(n)] - 2E[d(n)u^T(n)]w(n) + w^T(n)E[u(n)u^T(n)]w(n)$$
$$= \sigma_d^2 - 2p^T w(n) + w^T(n)Rw(n)$$
$$= \sigma_d^2 - 2\sum_{i=0}^{M-1} p(-i)w_i + \sum_{l=0}^{M-1}\sum_{i=0}^{M-1} w_l w_i R_{i,l} \qquad (1)$$
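As a sanity check on expression (1), the quadratic cost can be evaluated directly from $R$, $p$ and $\sigma_d^2$ for any candidate weight vector. The Matlab sketch below does this for illustrative values of the second-order statistics (these numbers are assumptions, not taken from the lecture); the minimum is attained at the Wiener solution $w_o = R^{-1}p$.

    % Sketch: evaluate J(w) = sigma_d^2 - 2*p'*w + w'*R*w  (illustrative values)
    R        = [1.1 0.5; 0.5 1.1];           % example autocorrelation matrix (assumed)
    p        = [0.5; -0.4];                  % example cross-correlation vector (assumed)
    sigma_d2 = 1;                            % example variance of d(n) (assumed)
    J   = @(w) sigma_d2 - 2*p'*w + w'*R*w;   % quadratic MSE surface
    w_o = R\p;                               % Wiener solution, minimizer of J
    fprintf('J at the Wiener solution: %g\n', J(w_o));
    fprintf('J at w = [0;0]          : %g\n', J([0;0]));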

Expressions of the gradient vector

The gradient of the criterion $J(n)$ with respect to the parameters $w(n)$ is

$$
\nabla_{w(n)} J(n) =
\begin{bmatrix}
\frac{\partial J(n)}{\partial w_0(n)} \\ \frac{\partial J(n)}{\partial w_1(n)} \\ \vdots \\ \frac{\partial J(n)}{\partial w_{M-1}(n)}
\end{bmatrix}
=
\begin{bmatrix}
-2p(0) + 2\sum_{i=0}^{M-1} R_{0,i}\, w_i(n) \\
-2p(-1) + 2\sum_{i=0}^{M-1} R_{1,i}\, w_i(n) \\
\vdots \\
-2p(-M+1) + 2\sum_{i=0}^{M-1} R_{M-1,i}\, w_i(n)
\end{bmatrix}
= -2p + 2Rw(n) \qquad (2)
$$

Writing the entries of $p$ and $R$ as expectations,

$$
\nabla_{w(n)} J(n) =
\begin{bmatrix}
-2E[d(n)u(n)] + 2\sum_{i=0}^{M-1} E[u(n)u(n-i)]\, w_i(n) \\
-2E[d(n)u(n-1)] + 2\sum_{i=0}^{M-1} E[u(n-1)u(n-i)]\, w_i(n) \\
\vdots \\
-2E[d(n)u(n-M+1)] + 2\sum_{i=0}^{M-1} E[u(n-M+1)u(n-i)]\, w_i(n)
\end{bmatrix}
=
\begin{bmatrix}
-2E\left[u(n)\left(d(n) - \sum_{i=0}^{M-1} u(n-i)w_i(n)\right)\right] \\
-2E\left[u(n-1)\left(d(n) - \sum_{i=0}^{M-1} u(n-i)w_i(n)\right)\right] \\
\vdots \\
-2E\left[u(n-M+1)\left(d(n) - \sum_{i=0}^{M-1} u(n-i)w_i(n)\right)\right]
\end{bmatrix}
$$

$$
= \begin{bmatrix}
-2E[u(n)e(n)] \\
-2E[u(n-1)e(n)] \\
\vdots \\
-2E[u(n-M+1)e(n)]
\end{bmatrix}
= -2E\left[\begin{bmatrix} u(n) \\ u(n-1) \\ \vdots \\ u(n-M+1) \end{bmatrix} e(n)\right]
= -2E[u(n)e(n)] \qquad (3)
$$

(Equations (2) and (3) give the solution of Exercise 7, page 229, [Haykin 2002].)

Steepest descent algorithm:

$$w(n+1) = w(n) - \tfrac{1}{2}\mu\, \nabla_{w(n)} J(n) = w(n) + \mu E[u(n)e(n)]$$
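Identity (3) can also be illustrated numerically: with sample averages replacing the expectations, the estimate of $-2E[u(n)e(n)]$ coincides with $-2p + 2Rw$ computed from the sample estimates of $p$ and $R$. The Matlab sketch below is an added illustration; the data model is synthetic and purely an assumption.

    % Sketch: compare -2p + 2Rw with -2E[u(n)e(n)] using sample averages (synthetic data)
    M = 3; N = 1e5;
    u = randn(N,1);                                    % white input (assumption)
    d = filter([0.8 -0.3 0.2], 1, u) + 0.1*randn(N,1); % hypothetical desired signal
    w = randn(M,1);                                    % arbitrary fixed parameter vector
    U = zeros(N,M);                                    % rows of U are the regressors u(n)'
    for k = 1:M, U(M:N,k) = u(M-k+1:N-k+1); end
    R = (U'*U)/N;  p = (U'*d)/N;                       % sample estimates of R and p
    e = d - U*w;                                       % e(n) = d(n) - w'u(n)
    grad_via_2 = -2*p + 2*R*w;                         % gradient form (2)
    grad_via_3 = -2*(U'*e)/N;                          % gradient form (3)
    disp([grad_via_2 grad_via_3])                      % the two columns coincide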

[Figure (page 6): SD ALGORITHM. Steepest descent search algorithm for finding the optimal Wiener FIR filter.]

Given
the autocorrelation matrix $R = E[u(n)u^T(n)]$,
the cross-correlation vector $p = E[u(n)d(n)]$:

Initialize the algorithm with an arbitrary parameter vector $w(0)$.

Iterate for $n = 0, 1, 2, 3, \dots, n_{\max}$:
$$w(n+1) = w(n) + \mu[p - Rw(n)]$$

Stop the iterations if $\|p - Rw(n)\| < \varepsilon$.

Designer degrees of freedom: $\mu$, $\varepsilon$, $n_{\max}$.
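A direct Matlab transcription of this procedure might look as follows; it is a sketch under assumed values of R, p, mu, the tolerance and n_max, with only the update and the stopping test taken from the algorithm above.

    % Sketch of the SD algorithm with the stopping test (illustrative parameters)
    R   = [1.1 0.5; 0.5 1.1];                  % assumed autocorrelation matrix
    p   = [0.5; -0.4];                         % assumed cross-correlation vector
    mu  = 0.05;  eps_stop = 1e-8;  n_max = 10000;   % designer degrees of freedom
    w   = zeros(size(p));                      % arbitrary initialization w(0)
    for n = 0:n_max
        g = p - R*w;                           % p - R w(n)
        if norm(g) < eps_stop                  % stopping criterion
            break
        end
        w = w + mu*g;                          % w(n+1) = w(n) + mu [p - R w(n)]
    end
    fprintf('Stopped after %d iterations; w = [%g %g]\n', n, w);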

Equivalent forms of the adaptation equations

We have the following equivalent forms of the adaptation equations, each showing one facet (or one interpretation) of the adaptation process.

1. Solving $p = Rw_o$ for $w_o$ using an iterative scheme:
$$w(n+1) = w(n) + \mu[p - Rw(n)]$$

2. or, componentwise,
$$w_0(n+1) = w_0(n) + \mu\Big(p(0) - \sum_{i=0}^{M-1} R_{0,i}\, w_i(n)\Big)$$
$$w_1(n+1) = w_1(n) + \mu\Big(p(-1) - \sum_{i=0}^{M-1} R_{1,i}\, w_i(n)\Big)$$
$$\vdots$$
$$w_{M-1}(n+1) = w_{M-1}(n) + \mu\Big(p(-M+1) - \sum_{i=0}^{M-1} R_{M-1,i}\, w_i(n)\Big)$$

3. Error-driven adaptation process (see the Figure at page 6):
$$w(n+1) = w(n) + \mu E[e(n)u(n)]$$

4. or, componentwise (with $u_k(n) = u(n-k)$ denoting the entries of the regressor),
$$w_0(n+1) = w_0(n) + \mu E[e(n)u_0(n)]$$
$$w_1(n+1) = w_1(n) + \mu E[e(n)u_1(n)]$$
$$\vdots$$
$$w_{M-1}(n+1) = w_{M-1}(n) + \mu E[e(n)u_{M-1}(n)]$$

Example: Running the SD algorithm to solve for the Wiener filter derived in the example of Lecture 2,

$$Rw = p, \qquad \begin{bmatrix} r_u(0) & r_u(1) \\ r_u(1) & r_u(0) \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \end{bmatrix} = \begin{bmatrix} E[d(n)u(n)] \\ E[d(n)u(n-1)] \end{bmatrix},$$

whose solution $[w_0 \; w_1]^T$ is the Wiener filter of Lecture 2 (the numerical values of the solution are not legible in this transcription).

We will use the Matlab code

    R_u = [1.1 0.5 ;             % Define autocovariance matrix
           0.5 1.1];             % (second row completed from the symmetric structure shown above)
    p   = [0.5271 ;              % Define cross-correlation vector
           NaN   ];              % (second entry not legible in the source: insert the value from Lecture 2)
    W   = [-1; -1]; mu = 0.01;   % Initial values for the adaptation algorithm
    Wt  = W;                     % Wt will record the evolution of vector W
    for k = 1:1000               % Iterate 1000 times the adaptation step
        W  = W + mu*(p - R_u*W); % Adaptation Equation! Quite simple!
        Wt = [Wt W];             % Wt records the evolution of vector W
    end
    % Is W the Wiener Filter?
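To answer the closing question, the SD iterate can be compared with the direct Wiener solution $w_o = R^{-1}p$. This check is an addition to the notes; it assumes the variables R_u, p and W from the script above are in the workspace and that the missing entry of p has been filled in from Lecture 2.

    disp([W  R_u\p])     % left column: SD iterate after 1000 steps; right column: Wiener solution
    norm(W - (R_u\p))    % parameter error, small when the step size is stable and n is large enough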

[Figure 1, left panel: "Criterion Surface and Adaptation process", axes W(1) and W(2); right panel: parameter error ||w(n) - w_o|| versus n (iteration), logarithmic vertical scale.]

Figure 1: Adaptation step µ = 0.01. Left: the adaptation process starts from the arbitrary point w(0) = [-1, -1]^T (marked with a cross) and converges to the Wiener filter parameters w_o (the x point at the center of the ellipses). Right: the convergence rate is exponential (note the logarithmic scale for the parameter error); in 1000 iterations the parameter error reaches 10^{-3}.

[Figure 2: same panels as Figure 1.]

Figure 2: Adaptation step µ = 1. Left: stable, oscillatory adaptation process starting from the arbitrary point w(0) = [-1, -1]^T (marked with a cross) and converging to the Wiener filter parameters w_o (the x point at the center of the ellipses). Right: the convergence rate is exponential; in about 70 iterations the parameter error reaches 10^{-16}.

[Figure 3: same panels as Figure 1.]

Figure 3: Adaptation step µ = 4. Left: unstable, oscillatory adaptation process starting from the arbitrary point w(0) = [-1, -1]^T (marked with a cross) and diverging from the Wiener filter parameters w_o (the x point at the center of the ellipses). Right: the divergence rate is exponential; in about 400 iterations the parameter error reaches 10^{30}.

[Figure 4: same panels as Figure 1.]

Figure 4: Adaptation step µ = 0.001. Left: very slow adaptation process, starting from the arbitrary point w(0) = [-1, -1]^T (marked with a cross); it will eventually converge to the Wiener filter parameters w_o (the x point at the center of the ellipses). Right: the convergence rate is exponential (note the logarithmic scale for the parameter error); in 1000 iterations the parameter error only reaches 0.7.

Stability of the Steepest Descent algorithm

Write the adaptation algorithm as
$$w(n+1) = w(n) + \mu[p - Rw(n)] \qquad (4)$$
But from the Wiener-Hopf equation $p = Rw_o$, and therefore
$$w(n+1) = w(n) + \mu[Rw_o - Rw(n)] = w(n) + \mu R[w_o - w(n)] \qquad (5)$$
Now subtract the Wiener optimal parameters $w_o$ from both members:
$$w(n+1) - w_o = w(n) - w_o + \mu R[w_o - w(n)] = (I - \mu R)[w(n) - w_o]$$
and, introducing the vector $c(n) = w(n) - w_o$, we have
$$c(n+1) = (I - \mu R)c(n)$$
Let $\lambda_1, \lambda_2, \dots, \lambda_M$ be the (real and positive) eigenvalues and let $q_1, q_2, \dots, q_M$ be the (generally complex) eigenvectors of the matrix $R$, thus satisfying
$$Rq_i = \lambda_i q_i \qquad (6)$$
Then the matrix $Q = [q_1 \; q_2 \; \dots \; q_M]$ can transform $R$ to the diagonal form $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_M)$:
$$R = Q\Lambda Q^H \qquad (7)$$
where the superscript $H$ means complex conjugation and transposition. Therefore
$$c(n+1) = (I - \mu R)c(n) = (I - \mu Q\Lambda Q^H)c(n)$$

Left-multiplying by $Q^H$,
$$Q^H c(n+1) = Q^H(I - \mu Q\Lambda Q^H)c(n) = (Q^H - \mu Q^H Q\Lambda Q^H)c(n) = (I - \mu\Lambda)Q^H c(n)$$
We denote by $\nu(n)$ the rotated and translated version of $w(n)$,
$$\nu(n) = Q^H c(n) = Q^H(w(n) - w_o) \qquad (8)$$
and now we have
$$\nu(n+1) = (I - \mu\Lambda)\nu(n)$$
where the initial value is $\nu(0) = Q^H(w(0) - w_o)$. We can write componentwise the recursions for $\nu(n)$,
$$\nu_k(n+1) = (1 - \mu\lambda_k)\nu_k(n), \qquad k = 1, 2, \dots, M$$
which can be solved easily to give
$$\nu_k(n) = (1 - \mu\lambda_k)^n \nu_k(0), \qquad k = 1, 2, \dots, M$$
For the stability of the algorithm,
$$-1 < 1 - \mu\lambda_k < 1, \qquad k = 1, 2, \dots, M$$
or
$$0 < \mu < \frac{2}{\lambda_k}, \qquad k = 1, 2, \dots, M$$
or
$$0 < \mu < \frac{2}{\lambda_{\max}} \qquad \text{(STABILITY CONDITION)}$$
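The stability bound and the per-mode decay factors can be inspected numerically for the example used in Figures 1-4. The Matlab sketch below is an added illustration; the matrix R and the list of step sizes are assumptions consistent with that example, not part of the original notes.

    % Sketch: stability bound 2/lambda_max and mode factors (1 - mu*lambda_k)
    R      = [1.1 0.5; 0.5 1.1];        % assumed autocorrelation matrix
    lambda = eig(R);                    % eigenvalues of R (real and positive here)
    fprintf('stability bound: 0 < mu < %g\n', 2/max(lambda));
    for mu = [0.01 1 4 0.001]           % the step sizes used in Figures 1-4
        f = 1 - mu*lambda;              % per-mode factors (1 - mu*lambda_k)
        fprintf('mu = %6.3f  stable = %d  factors = [% .2f  % .2f]\n', ...
                mu, all(abs(f) < 1), f);
    end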

where $\lambda_{\max}$ is the maximum eigenvalue of the autocovariance matrix $R$. If the stability condition is respected, this results in $\nu_k(n) \to 0$, i.e. $w_k(n) \to w_{o,k}$ when $n \to \infty$, for $k = 1, 2, \dots, M$.

Denoting $\tau_k = \dfrac{-1}{\log|1 - \mu\lambda_k|}$, we have $|1 - \mu\lambda_k| = \exp(-1/\tau_k)$ and therefore
$$|\nu_k(n)| = |1 - \mu\lambda_k|^n\,|\nu_k(0)| = \exp(-n/\tau_k)\,|\nu_k(0)|,$$
which is a decreasing exponential that needs approximately $4\tau_k$ steps to reduce $|\nu_k(0)|$ to 2% of its value.

Thus, we have obtained
* a condition of stability,
* information about the transient behavior and speed of convergence of the parameters $\nu_k(n)$.

We can transfer the results of the analysis back to the original parameters: the matrix $Q = [q_1 \; q_2 \; \dots \; q_M]$ obeys $Q^H Q = QQ^H = I$, and multiplying by $Q$ the extreme terms of the equalities
$$\nu(n) = Q^H c(n) = Q^H(w(n) - w_o)$$
gives
$$Q\nu(n) = w(n) - w_o$$
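Under the same illustrative statistics, the time constants and the corresponding "2% settling" step counts follow directly from the eigenvalues. The Matlab sketch below is an added illustration with assumed values of R and mu.

    % Sketch: per-mode time constants tau_k = -1/log|1 - mu*lambda_k|
    R      = [1.1 0.5; 0.5 1.1];            % assumed autocorrelation matrix
    lambda = eig(R);
    mu     = 0.01;                          % the step size of Figure 1 (assumed)
    tau    = -1 ./ log(abs(1 - mu*lambda)); % time constants of the modes nu_k(n)
    steps_to_2pct = ceil(4*tau);            % approx. steps to reduce |nu_k(0)| to 2%
    disp([lambda tau steps_to_2pct])        % columns: eigenvalue, tau_k, 4*tau_k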

$$w(n) = w_o + Q\nu(n) = w_o + [q_1 \; q_2 \; \dots \; q_M]\,\nu(n) = w_o + \sum_{k=1}^{M} q_k \nu_k(n)$$
Writing the equality componentwise, we have
$$w_i(n) = w_{o,i} + \sum_{k=1}^{M} q_{k,i}\,\nu_k(0)\,(1 - \mu\lambda_k)^n$$
The fastest rate of convergence is obtained for the eigenvalue $\lambda_k$ which gives the smallest term $|1 - \mu\lambda_k|$, while the slowest rate of convergence results for the eigenvalue $\lambda_k$ which gives the largest term $|1 - \mu\lambda_k|$. If we limit $\mu$ such that $0 < 1 - \mu\lambda_k < 1$ for all $k$, then the overall time constant $\tau_a$ is bounded by
$$\frac{-1}{\log(1 - \mu\lambda_{\max})} \le \tau_a \le \frac{-1}{\log(1 - \mu\lambda_{\min})}$$

Transient behavior of the mean-squared error

In Lecture 2 we obtained the canonical form of the quadratic form which expresses the mean-square error,
$$J_{w(n)} = J_{w_o} + \sum_{i=1}^{M} \lambda_i |\nu_i|^2$$
where $\nu$ was defined as
$$\nu(n) = Q^H c(n) = Q^H(w(n) - w_o) \qquad (9)$$
Taking into account that
$$\nu_k(n) = (1 - \mu\lambda_k)^n \nu_k(0), \qquad k = 1, 2, \dots, M,$$
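The closed-form trajectory $w(n) = w_o + \sum_k q_k \nu_k(0)(1-\mu\lambda_k)^n$ can be checked against the direct recursion. The Matlab sketch below is an added illustration, with assumed values for R, p, mu and w(0).

    % Sketch: closed-form SD trajectory via the eigen-decomposition vs. the recursion
    R  = [1.1 0.5; 0.5 1.1];  p = [0.5; -0.4];  % assumed second-order statistics
    mu = 0.1;  w0 = [-1; -1];  nsteps = 50;     % assumed step size and initialization
    w_o    = R\p;                               % Wiener solution
    [Q,L]  = eig(R);  lambda = diag(L);         % R = Q*diag(lambda)*Q'
    nu0    = Q'*(w0 - w_o);                     % nu(0) = Q^H (w(0) - w_o)
    w_rec  = w0;                                % trajectory by direct recursion
    for n = 1:nsteps
        w_rec = w_rec + mu*(p - R*w_rec);       % w(n+1) = w(n) + mu [p - R w(n)]
    end
    w_cf = w_o + Q*((1 - mu*lambda).^nsteps .* nu0);  % w(n) = w_o + Q diag((1-mu*lambda).^n) nu(0)
    disp([w_rec w_cf])                          % the two columns coincide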

we have finally
$$J_{w(n)} = J_{w_o} + \sum_{i=1}^{M} \lambda_i (1 - \mu\lambda_i)^{2n} |\nu_i(0)|^2$$
The convergence
$$J_{w(n)} \to J_{w_o} \qquad (10)$$
takes place under the same conditions as the parameter convergence, and the rate of convergence is bounded, similarly, by
$$\frac{-1}{2\log(1 - \mu\lambda_{\max})} \le \tau_a \le \frac{-1}{2\log(1 - \mu\lambda_{\min})} \qquad (11)$$
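The learning curve implied by this expression can be plotted directly from the eigen-quantities. The Matlab sketch below is an added illustration; R, p, mu and w(0) are assumed values, and only the excess term $J_{w(n)} - J_{w_o}$ is computed. (The element-wise power between a row vector and a column of exponents relies on Matlab implicit expansion, R2016b or later.)

    % Sketch: excess MSE  J(n) - J_min = sum_i lambda_i (1 - mu*lambda_i)^(2n) |nu_i(0)|^2
    R  = [1.1 0.5; 0.5 1.1];  p = [0.5; -0.4];  % assumed second-order statistics
    mu = 0.1;  w0 = [-1; -1];                   % assumed step size and initialization
    [Q,L]  = eig(R);  lambda = diag(L);
    nu0    = Q'*(w0 - R\p);                     % nu(0)
    n      = (0:200)';                          % iteration index
    Jexc   = ((1 - mu*lambda').^(2*n)) * (lambda .* abs(nu0).^2);   % excess MSE per iteration
    semilogy(n, Jexc); xlabel('n (iteration)'); ylabel('J(n) - J_{min}');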
