Adaptive Beamforming Algorithms
S. R. Zinka
October 29, 2014
Outline

1. Least Mean Squares
2. Sample Matrix Inversion
3. Recursive Least Squares
4. Accelerated Gradient Approach
5. Conjugate Gradient Method
Gradient Descent Method

[Figure: contour plot showing the iterates $x_0, x_1, x_2, x_3, x_4$ descending toward the minimum $x^*$.]

$$x_{n+1} = x_n - \gamma\,\nabla F(x_n) \tag{1}$$

In general, $\gamma$ is allowed to change at every iteration. However, for this algorithm we use a constant $\gamma$.
Gradient Descent Method - γ and Convergence

[Figure: effect of the step size γ on convergence behavior.]
Least Mean Squares Algorithm

The error between the reference signal and the array output is given as
$$\epsilon(t) = r(t) - w^H x(t). \tag{2}$$
Taking the expected value of the squared magnitude of (2) gives
$$\begin{aligned}
\xi = E\left\{|\epsilon(t)|^2\right\} &= E\left\{\epsilon(t)\,\epsilon^*(t)\right\} \\
&= E\left\{\left[r(t) - w^H x(t)\right]\left[r(t) - w^H x(t)\right]^*\right\} \\
&= E\left\{\left[r(t) - w^H x(t)\right]\left[r^*(t) - x^H(t)\,w\right]\right\} \\
&= E\left\{|r(t)|^2\right\} + w^H E\left\{x(t)\,x^H(t)\right\} w - w^H E\left\{x(t)\,r^*(t)\right\} - E\left\{r(t)\,x^H(t)\right\} w \\
&= E\left\{|r(t)|^2\right\} + w^H R w - w^H z - z^H w, \tag{3}
\end{aligned}$$
where $z = E\left\{x(t)\,r^*(t)\right\}$ is known as the signal correlation vector.
Least Mean Squares Algorithm... Contd

The mean square error surface $E\{|\epsilon(t)|^2\}$ is a quadratic function of $w$ and is minimized by setting its gradient with respect to $w$ equal to zero, which gives
$$\nabla_w E\left\{|\epsilon(t)|^2\right\}\Big|_{w_{\mathrm{opt}}} = 0 \;\Rightarrow\; R\,w_{\mathrm{opt}} - z = 0 \;\Rightarrow\; w_{\mathrm{MSE}} = R^{-1} z. \tag{4}$$
In general, we do not know the signal statistics and thus must resort to estimating the array correlation matrix $R$ and the signal correlation vector $z$ over a range of snapshots or for each instant in time.
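As a concrete illustration of (4) (not part of the original slides), the following NumPy sketch builds $\hat{R}$ and $\hat{z}$ from synthetic snapshots and solves for the MMSE weights; the half-wavelength ULA, source directions, and noise level are assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
L, K = 8, 1000                           # assumed: 8-element array, 1000 snapshots

def steering(theta_deg):
    """Steering vector of an L-element half-wavelength-spaced ULA (assumed geometry)."""
    return np.exp(1j * np.pi * np.arange(L) * np.sin(np.deg2rad(theta_deg)))

s = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)   # desired signal
u = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)   # interferer
n = 0.1 * (rng.standard_normal((L, K)) + 1j * rng.standard_normal((L, K)))
X = np.outer(steering(10.0), s) + np.outer(steering(-30.0), u) + n        # snapshots x(t)
r = s                                                                     # reference r(t)

R_hat = X @ X.conj().T / K              # estimate of R = E{x x^H}
z_hat = X @ r.conj() / K                # estimate of z = E{x r^*}
w_mse = np.linalg.solve(R_hat, z_hat)   # w_MSE = R^{-1} z   (eq. 4)

eps = r - w_mse.conj() @ X              # error eps(t) = r(t) - w^H x(t)
print("output MSE:", np.mean(np.abs(eps) ** 2))
```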
Least Mean Squares Algorithm... Contd

The instantaneous estimates of $R$ and $z$ are as given below:
$$\hat{R}(k) = x(k)\,x^H(k) \tag{5}$$
$$\hat{z}(k) = r^*(k)\,x(k) \tag{6}$$
Least Mean Squares Algorithm... Contd

The method of steepest descent can be approximated in terms of the weights using the LMS method as shown below:
$$\begin{aligned}
w(k+1) &= w(k) - \gamma\,\nabla_w(\xi) \\
&= w(k) - \gamma\left[R\,w(k) - z\right] \\
&\approx w(k) - \gamma\left[\hat{R}(k+1)\,w(k) - \hat{z}(k+1)\right] \\
&= w(k) - \gamma\left[x(k+1)\,x^H(k+1)\,w(k) - r^*(k+1)\,x(k+1)\right] \\
&= w(k) - \gamma\,x(k+1)\left[x^H(k+1)\,w(k) - r^*(k+1)\right] \\
&= w(k) - \gamma\,x(k+1)\left[w^H(k)\,x(k+1) - r(k+1)\right]^* \\
&= w(k) + \gamma\,\epsilon^*(k+1)\,x(k+1) \tag{7}
\end{aligned}$$
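A minimal sketch of the resulting LMS loop implementing update (7); the function shape and the constant step size are assumptions (in practice $\gamma$ should respect the bound derived in (16) below).

```python
import numpy as np

def lms_beamformer(X, r, gamma=0.01):
    """LMS weight adaptation per eq. (7): w(k+1) = w(k) + gamma * eps^*(k+1) x(k+1).
    X: L x K snapshot matrix, r: length-K reference signal, gamma: constant step size."""
    L, K = X.shape
    w = np.zeros(L, dtype=complex)
    for k in range(K):
        x = X[:, k]
        eps = r[k] - w.conj() @ x        # eps(k+1) = r(k+1) - w^H(k) x(k+1)
        w = w + gamma * np.conj(eps) * x
    return w
```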
LMS - Convergence of Weight Vector

According to the LMS algorithm,
$$w(k+1) = w(k) - \gamma\left[x(k+1)\,x^H(k+1)\,w(k) - r^*(k+1)\,x(k+1)\right].$$
Taking the expected value of both sides of the above equation,
$$\begin{aligned}
E\left[w(k+1)\right] &= E\left[w(k)\right] - \gamma\,E\left[x(k+1)\,x^H(k+1)\,w(k)\right] + \gamma\,E\left[r^*(k+1)\,x(k+1)\right] \\
w(k+1) &= w(k) - \gamma R\,w(k) + \gamma z = \left[I - \gamma R\right] w(k) + \gamma z. \tag{8}
\end{aligned}$$
Define a mean error vector $v$ as
$$v(k) = w(k) - w_{\mathrm{MSE}}, \tag{9}$$
where $w_{\mathrm{MSE}} = R^{-1} z$. Then
$$\begin{aligned}
v(k+1) &= w(k+1) - w_{\mathrm{MSE}} \\
&= \left[I - \gamma R\right] w(k) + \gamma z - w_{\mathrm{MSE}} \\
&= \left[I - \gamma R\right]\left[v(k) + w_{\mathrm{MSE}}\right] + \gamma z - w_{\mathrm{MSE}} \\
&= \left[I - \gamma R\right] v(k) + \underbrace{\left[I - \gamma R\right] w_{\mathrm{MSE}} + \gamma z - w_{\mathrm{MSE}}}_{=\,0} \\
&= \left[I - \gamma R\right] v(k). \tag{10}
\end{aligned}$$
LMS - Convergence of Weight Vector... Contd

From equation (10),
$$v(k+1) = \left[I - \gamma R\right]^{k+1} v(0). \tag{11}$$
Using the eigenvalue decomposition $R = Q \Lambda Q^H$, the above equation can be written as
$$v(k+1) = \left[I - \gamma\, Q \Lambda Q^H\right]^{k+1} v(0), \tag{12}$$
where $\Lambda$ is the diagonal matrix of eigenvalues and $Q Q^H = I$. The above equation can further be written as
$$v(k+1) = Q \left[I - \gamma \Lambda\right]^{k+1} Q^H\, v(0). \tag{13}$$
For convergence, as the iteration number increases, each diagonal element of the matrix $\left[I - \gamma \Lambda\right]^{k+1}$ should diminish, i.e.,
$$\left|1 - \gamma \lambda_i\right| < 1, \quad \forall i. \tag{14}$$
LMS - Convergence of Weight Vector... Contd

From the previous slide, the step size $\gamma$ must satisfy (for all $\lambda_i$ values)
$$\begin{aligned}
\left|1 - \gamma \lambda_i\right| < 1 \;&\Rightarrow\; -1 < 1 - \gamma \lambda_i < 1 \\
&\Rightarrow\; -2 < -\gamma \lambda_i < 0 \\
&\Rightarrow\; 0 < \gamma \lambda_i < 2 \\
&\Rightarrow\; 0 < \gamma < \frac{2}{\lambda_i} \;\Rightarrow\; 0 < \gamma < \frac{2}{\lambda_{\max}}. \tag{15}
\end{aligned}$$
Since the sum of all eigenvalues of $R$ equals its trace (the sum of its diagonal elements), the gradient step size $\gamma$ can be selected in terms of measurable quantities using
$$\gamma < \frac{2}{\operatorname{trace}(R)}. \tag{16}$$
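In code, the bound (16) translates into one line; the safety factor below is an assumption, since (16) alone does not guarantee a good misadjustment.

```python
import numpy as np

def lms_step_size(X, safety=0.1):
    """Step size satisfying gamma < 2/trace(R), per eq. (16), from an L x K
    snapshot matrix X; trace((1/K) X X^H) equals sum(|X|^2)/K."""
    trace_R = np.sum(np.abs(X) ** 2) / X.shape[1]
    return safety * 2.0 / trace_R
```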
Drawbacks

One of the drawbacks of the LMS adaptive scheme is that the algorithm must go through many iterations before satisfactory convergence is achieved. The rate of convergence of the weights is dictated by the eigenvalue spread of $R$.
Sample / Direct Matrix Inversion

The sample matrix is a time-average estimate of the array correlation matrix, computed using $K$ time samples. For the MMSE optimality criterion, the optimum weight vector is given as
$$w_{\mathrm{MSE}} = R^{-1} z, \tag{17}$$
where $R = E\left[x\,x^H\right]$ and $z = E\left[r^*\,x\right]$.

We can estimate the correlation matrix by calculating the time average
$$R \approx \hat{R} = \frac{1}{K} \sum_{i=N_1}^{N_2} x(i)\,x^H(i), \tag{18}$$
where $N_2 - N_1 = K$ is the observation interval. Similarly, the signal correlation vector can be estimated as
$$z \approx \hat{z} = \frac{1}{K} \sum_{i=N_1}^{N_2} r^*(i)\,x(i). \tag{19}$$
Since we use a $K$-length block of data, this method is called a block-adaptive approach: we adapt the weights block by block.
Calculation of the Estimates of R and z

Let's define a matrix $X_K$ as shown below:
$$X_K = \begin{bmatrix}
x_1(N_1) & x_1(N_1+1) & \cdots & x_1(N_1+K) \\
x_2(N_1) & x_2(N_1+1) & \cdots & x_2(N_1+K) \\
\vdots & \vdots & & \vdots \\
x_L(N_1) & x_L(N_1+1) & \cdots & x_L(N_1+K)
\end{bmatrix} \tag{20}$$
Then the estimate of the array correlation matrix is given by
$$\hat{R} = \frac{1}{K} X_K X_K^H. \tag{21}$$
Similarly, the estimate of the correlation vector is given by
$$\hat{z} = \frac{1}{K} X_K\, r^*, \tag{22}$$
where $r = \left[\, r(N_1) \;\; r(N_1+1) \;\; \cdots \;\; r(N_1+K) \,\right]^T$.
SMI Weights

$$w_{\mathrm{MSE}} = \hat{R}^{-1}\hat{z} = \left[X_K X_K^H\right]^{-1} X_K\, r^* \tag{23}$$
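A block-adaptive SMI sketch following (18), (19), and (23). The diagonal loading term is an assumption, added as a common remedy for the ill-conditioning mentioned on the next slide.

```python
import numpy as np

def smi_weights(X_block, r_block, loading=1e-3):
    """SMI weights from one K-sample block: w = R_hat^{-1} z_hat (eqs. 18, 19, 23).
    X_block: L x K snapshots; r_block: length-K reference samples."""
    L, K = X_block.shape
    R_hat = X_block @ X_block.conj().T / K          # eq. (18)
    z_hat = X_block @ r_block.conj() / K            # eq. (19)
    R_hat = R_hat + loading * (np.trace(R_hat).real / L) * np.eye(L)  # diagonal loading
    return np.linalg.solve(R_hat, z_hat)            # eq. (23), without explicit inverse
```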
SMI - Drawbacks

- The correlation matrix may be ill-conditioned, resulting in errors or singularities when inverted.
- Computationally intensive: for large arrays, there is the challenge of inverting large matrices.
Recursive Least Squares

Following an estimation method similar to that used in the SMI algorithm, one can estimate $R$ and $z$ as shown below:
$$R \approx \hat{R}(k) = \sum_{i=1}^{k} x(i)\,x^H(i), \tag{24}$$
$$z \approx \hat{z}(k) = \sum_{i=1}^{k} r^*(i)\,x(i), \tag{25}$$
where $k$ is the block length.
Recursive Least Squares... Weighted Estimation

Both (24) and (25) use rectangular windows, so they weight all previous time samples equally. Since the signal sources can change or slowly move with time, we might want to deemphasize the earliest data samples and emphasize the most recent ones, as shown below:
$$\hat{R}(k) = \sum_{i=1}^{k} \alpha^{k-i}\, x(i)\,x^H(i), \tag{26}$$
$$\hat{z}(k) = \sum_{i=1}^{k} \alpha^{k-i}\, r^*(i)\,x(i), \tag{27}$$
where $\alpha$ is the forgetting factor, $0 \le \alpha \le 1$.
Recursive Least Squares... Recursive Formulation

$$\begin{aligned}
\hat{R}(k) &= \sum_{i=1}^{k} \alpha^{k-i}\, x(i)\,x^H(i) \\
&= x(k)\,x^H(k) + \sum_{i=1}^{k-1} \alpha^{k-i}\, x(i)\,x^H(i) \\
&= x(k)\,x^H(k) + \alpha \sum_{i=1}^{k-1} \alpha^{k-1-i}\, x(i)\,x^H(i) \\
&= \alpha\,\hat{R}(k-1) + x(k)\,x^H(k)
\end{aligned}$$
$$\hat{z}(k) = \sum_{i=1}^{k} \alpha^{k-i}\, r^*(i)\,x(i) = \alpha\,\hat{z}(k-1) + r^*(k)\,x(k)$$
Recursive Least Squares... Recursive Formulation... Contd

Applying the matrix inversion lemma to $\hat{R}(k) = \alpha\,\hat{R}(k-1) + x(k)\,x^H(k)$,
$$\begin{aligned}
\hat{R}^{-1}(k) &= \left[\alpha\,\hat{R}(k-1) + x(k)\,x^H(k)\right]^{-1} \\
&= \alpha^{-1}\hat{R}^{-1}(k-1) - \frac{\alpha^{-1}\hat{R}^{-1}(k-1)\,x(k)\,x^H(k)\,\alpha^{-1}\hat{R}^{-1}(k-1)}{1 + x^H(k)\,\alpha^{-1}\hat{R}^{-1}(k-1)\,x(k)} \\
&= \alpha^{-1}\hat{R}^{-1}(k-1) - \frac{\alpha^{-2}\hat{R}^{-1}(k-1)\,x(k)\,x^H(k)\,\hat{R}^{-1}(k-1)}{1 + \alpha^{-1} x^H(k)\,\hat{R}^{-1}(k-1)\,x(k)} \\
&= \alpha^{-1}\hat{R}^{-1}(k-1) - \alpha^{-1}\, g(k)\, x^H(k)\,\hat{R}^{-1}(k-1),
\end{aligned}$$
where
$$g(k) = \frac{\alpha^{-1}\hat{R}^{-1}(k-1)\,x(k)}{1 + \alpha^{-1} x^H(k)\,\hat{R}^{-1}(k-1)\,x(k)}.$$
We can also prove that
$$g(k) = \left[\alpha^{-1}\hat{R}^{-1}(k-1) - \alpha^{-1} g(k)\,x^H(k)\,\hat{R}^{-1}(k-1)\right] x(k) = \hat{R}^{-1}(k)\,x(k).$$
Recursive Least Squares... Recursive Formulation... Contd

$$\begin{aligned}
w(k) &= \hat{R}^{-1}(k)\,\hat{z}(k) \\
&= \hat{R}^{-1}(k)\left[\alpha\,\hat{z}(k-1) + r^*(k)\,x(k)\right] \\
&= \alpha\,\hat{R}^{-1}(k)\,\hat{z}(k-1) + r^*(k)\,\hat{R}^{-1}(k)\,x(k) \\
&= \alpha\left[\alpha^{-1}\hat{R}^{-1}(k-1) - \alpha^{-1} g(k)\,x^H(k)\,\hat{R}^{-1}(k-1)\right]\hat{z}(k-1) + r^*(k)\,\hat{R}^{-1}(k)\,x(k) \\
&= \left[\hat{R}^{-1}(k-1) - g(k)\,x^H(k)\,\hat{R}^{-1}(k-1)\right]\hat{z}(k-1) + r^*(k)\,\hat{R}^{-1}(k)\,x(k) \\
&= w(k-1) - g(k)\,x^H(k)\,w(k-1) + r^*(k)\,g(k) \\
&= w(k-1) + g(k)\left[r(k) - w^H(k-1)\,x(k)\right]^* \\
&= w(k-1) + g(k)\,\epsilon^*(k), \tag{28}
\end{aligned}$$
where $\epsilon(k) = r(k) - w^H(k-1)\,x(k)$ is the a priori error.
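A minimal RLS sketch implementing the gain $g(k)$, the inverse-correlation update, and (28). Initializing $\hat{R}^{-1}(0)$ as $\delta^{-1} I$ is a standard assumption not stated on the slides.

```python
import numpy as np

def rls_beamformer(X, r, alpha=0.99, delta=1e-2):
    """RLS adaptation: X is L x K snapshots, r the length-K reference,
    alpha the forgetting factor, delta the (assumed) initialization constant."""
    L, K = X.shape
    P = np.eye(L, dtype=complex) / delta          # P(k) tracks R_hat^{-1}(k)
    w = np.zeros(L, dtype=complex)
    for k in range(K):
        x = X[:, k]
        Px = P @ x
        g = Px / (alpha + x.conj() @ Px)          # gain vector g(k)
        eps = r[k] - w.conj() @ x                 # a priori error eps(k)
        w = w + g * np.conj(eps)                  # eq. (28)
        P = (P - np.outer(g, x.conj() @ P)) / alpha   # matrix-inversion-lemma update
    return w
```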
Quadratic Form

$$F(x) = x^H A x - b^H x - x^H b + c$$
$$\nabla_x F(x) = A x - b$$
The Method of Steepest Descent

[Figure: contour plot showing the iterates $x_0, x_1, x_2, x_3, x_4$ descending toward the minimum $x^*$.]

$$x_{n+1} = x_n - \gamma\,\nabla F(x_n) \tag{29}$$

Unlike in the LMS method, in the AGM the value of $\gamma$ is allowed to change at every iteration.
What is the optimum γ value?
The Method of Steepest Descent

Choose $\gamma$ to minimize $F$ along the descent direction:
$$\frac{\partial}{\partial \gamma}\left\{ F\left[x_n - \gamma\,\nabla F(x_n)\right]\right\} = 0, \quad\text{i.e.,}\quad \frac{\partial}{\partial \gamma}\left\{ F\left[x_n - \gamma\left(A x_n - b\right)\right]\right\} = 0.$$
The Method of Steepest Descent

$$F\left[x_{n+1}\right] = \left[x_n - \gamma\left(A x_n - b\right)\right]^H A \left[x_n - \gamma\left(A x_n - b\right)\right] - b^H\left[x_n - \gamma\left(A x_n - b\right)\right] - \left[x_n - \gamma\left(A x_n - b\right)\right]^H b$$
So,
$$\begin{aligned}
\frac{dF}{d\gamma} &= -x_n^H A\left(A x_n - b\right) - \left(A x_n - b\right)^H A\, x_n + 2\gamma\left(A x_n - b\right)^H A\left(A x_n - b\right) + b^H\left(A x_n - b\right) + \left(A x_n - b\right)^H b \\
&= -\left[x_n^H A - b^H\right]\left(A x_n - b\right) - \left(A x_n - b\right)^H\left(A x_n - b\right) + 2\gamma\left(A x_n - b\right)^H A\left(A x_n - b\right) \\
&= -\left(A x_n - b\right)^H\left(A x_n - b\right) - \left(A x_n - b\right)^H\left(A x_n - b\right) + 2\gamma\left(A x_n - b\right)^H A\left(A x_n - b\right) \\
&= -2\left(A x_n - b\right)^H\left(A x_n - b\right) + 2\gamma\left(A x_n - b\right)^H A\left(A x_n - b\right),
\end{aligned}$$
where the step $x_n^H A - b^H = \left(A x_n - b\right)^H$ uses $A = A^H$. Since $dF/d\gamma = 0$,
$$\gamma = \frac{\left(A x_n - b\right)^H\left(A x_n - b\right)}{\left(A x_n - b\right)^H A\left(A x_n - b\right)}.$$
Consecutive Orthogonal Paths

Requiring consecutive residuals to be orthogonal yields the same step size:
$$\begin{aligned}
\left(A x_n - b\right)^H\left(A x_{n+1} - b\right) &= 0 \\
\left(A x_n - b\right)^H\left\{A\left[x_n - \gamma\left(A x_n - b\right)\right] - b\right\} &= 0 \\
\left(A x_n - b\right)^H\left\{\left(A x_n - b\right) - \gamma A\left(A x_n - b\right)\right\} &= 0 \\
\left(A x_n - b\right)^H\left(A x_n - b\right) - \gamma\left(A x_n - b\right)^H A\left(A x_n - b\right) &= 0 \\
\Rightarrow\;\gamma &= \frac{\left(A x_n - b\right)^H\left(A x_n - b\right)}{\left(A x_n - b\right)^H A\left(A x_n - b\right)}.
\end{aligned}$$
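A sketch of steepest descent with this optimal step size, for a Hermitian positive definite $A$; the random test system at the end is an assumption for checking the routine.

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=500):
    """Minimize F(x) = x^H A x - b^H x - x^H b + c (A Hermitian positive definite)
    using the optimal step size derived above."""
    x = x0.astype(complex)
    for _ in range(max_iter):
        g = A @ x - b                                   # gradient (negative residual)
        if np.linalg.norm(g) < tol:
            break
        gamma = (g.conj() @ g) / (g.conj() @ (A @ g))   # optimal step size
        x = x - gamma * g
    return x

# quick check on an assumed random Hermitian positive definite system
rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
A = M @ M.conj().T + 5 * np.eye(5)
b = rng.standard_normal(5) + 1j * rng.standard_normal(5)
print(np.linalg.norm(A @ steepest_descent(A, b, np.zeros(5)) - b))
```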
So, the Accelerated Gradient Method is...

If we approximate the array correlation matrix and the signal correlation vector as $R \approx \hat{R} = x(k+1)\,x^H(k+1)$ and $z \approx \hat{z} = r^*(k+1)\,x(k+1)$, then
$$\begin{aligned}
w(k+1) &= w(k) - \gamma\,\nabla_w(\xi) = w(k) - \gamma\left[R\,w(k) - z\right] \\
&= w(k) - \left(\frac{\left[R w(k) - z\right]^H\left[R w(k) - z\right]}{\left[R w(k) - z\right]^H R\left[R w(k) - z\right]}\right)\left[R w(k) - z\right] \\
&\approx w(k) - \left(\frac{\left[\hat{R} w(k) - \hat{z}\right]^H\left[\hat{R} w(k) - \hat{z}\right]}{\left[\hat{R} w(k) - \hat{z}\right]^H \hat{R}\left[\hat{R} w(k) - \hat{z}\right]}\right)\left[\hat{R} w(k) - \hat{z}\right] \\
&= w(k) + \epsilon^*(k+1)\left(\frac{x^H(k+1)\,x(k+1)}{x^H(k+1)\,x(k+1)\; x^H(k+1)\,x(k+1)}\right) x(k+1) \\
&= w(k) + \epsilon^*(k+1)\,\frac{x(k+1)}{x^H(k+1)\,x(k+1)}, \tag{30}
\end{aligned}$$
because $\hat{R}\,w(k) - \hat{z} = -\epsilon^*(k+1)\,x(k+1)$.
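Note that (30) is a normalized-LMS-style update: with the rank-one estimates, the optimal step collapses to $1/(x^H x)$. A one-function sketch (the data model is assumed):

```python
import numpy as np

def agm_update(w, x, r):
    """One accelerated-gradient update per eq. (30):
    w(k+1) = w(k) + eps^*(k+1) x(k+1) / (x^H(k+1) x(k+1))."""
    eps = r - w.conj() @ x                      # eps(k+1) = r(k+1) - w^H(k) x(k+1)
    return w + np.conj(eps) * x / np.real(x.conj() @ x)
```

Compared with the constant-γ LMS loop above, the step size here adapts to the instantaneous input power.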
Steepest Descent vs Conjugate Gradient

[Figure: comparison of steepest descent and conjugate gradient iteration paths on a quadratic surface.]
Before proceeding further, let's recap the Gram-Schmidt method...
Gram-Schmidt Conjugation

[Figure: construction of the conjugate directions $d_{(0)}$ and $d_{(1)}$ from the independent vectors $u_0$ and $u_1$.]

$$d_{(0)} = u_0$$
$$d_{(1)} = u_1 - \beta_{10}\, d_{(0)}$$
$$d_{(2)} = u_2 - \beta_{20}\, d_{(0)} - \beta_{21}\, d_{(1)}$$
$$d_{(i)} = u_i - \sum_{j=0}^{i-1} \beta_{ij}\, d_{(j)}, \qquad \beta_{ij} = \frac{d_{(j)}^H A\, u_i}{d_{(j)}^H A\, d_{(j)}}$$
Now, let's move on to the Conjugate Gradient method...
Problem Statement

Let's consider a convex cost function as shown below:
$$F(x) = x^H A x - b^H x - x^H b + c \tag{31}$$
Let's say that the optimal solution which minimizes the above cost function is $x^*$. Then
$$A x^* - b = 0. \tag{32}$$
So, solving the cost function for its stationary point is equivalent to solving the above system of linear equations.
Conjugate Gradient Method
Expressing the Error in terms of Search Vectors

$$\begin{aligned}
e_0 = x^* - x_0 &= \alpha_1 p_1 + \alpha_2 p_2 + \alpha_3 p_3 + \cdots + \alpha_n p_n \\
e_1 = x^* - x_1 &= \alpha_2 p_2 + \alpha_3 p_3 + \cdots + \alpha_n p_n \\
&\;\;\vdots \\
e_{n-1} = x^* - x_{n-1} &= \alpha_n p_n
\end{aligned}$$
If $\{p_k\}$ is a simple orthogonal set,
$$\alpha_k = \frac{p_k^H\, e_{k-1}}{p_k^H\, p_k}. \tag{33}$$
Also, it can be noted that $e_k$ is orthogonal to $p_k, p_{k-1}, p_{k-2}$, etc. However, we can't use this approach because we don't know the value of $x^*$; if we knew $x^*$ at the outset, there would be no optimization problem to solve.
Expressing the Residual in terms of Search Vectors

$$\begin{aligned}
r_0 = b - A x_0 &= \alpha_1 A p_1 + \alpha_2 A p_2 + \alpha_3 A p_3 + \cdots + \alpha_n A p_n \\
r_1 = b - A x_1 &= \alpha_2 A p_2 + \alpha_3 A p_3 + \cdots + \alpha_n A p_n \\
&\;\;\vdots \\
r_{n-1} = b - A x_{n-1} &= \alpha_n A p_n
\end{aligned}$$
If $\{p_k\}_A$ is an orthogonal (conjugate) set with respect to the matrix $A$,
$$\alpha_k = \frac{p_k^H\, r_{k-1}}{p_k^H A\, p_k}. \tag{34}$$
In this case, instead of $e_k$, the residual $r_k$ is orthogonal to $p_k, p_{k-1}, p_{k-2}$, etc.
How to Generate the Orthogonal Set

Until now, we assumed that we knew the conjugate (i.e., A-orthogonal) set $\{p_k\}_A$. There are several ways to generate this set, both iterative and non-iterative. We will use the iterative Gram-Schmidt process to generate it.
Gram-Schmidt Conjugation

It can easily be shown that $r_0, r_1, r_2, \ldots, r_{n-1}$ are linearly independent. So, we can use these residuals to construct an A-orthogonal search set $\{p_1, p_2, p_3, \ldots, p_n\}_A$:
$$p_{k+1} = r_k - \sum_{j=1}^{k} \beta_{kj}\, p_j \tag{35}$$
and
$$\beta_{kj} = \frac{p_j^H A\, r_k}{p_j^H A\, p_j}. \tag{36}$$
So, the Basic Algorithm is...

- Choose a starting position $x_0$.
- Calculate the residual at $x_0$ and assign it to $p_1$, i.e., $p_1 = r_0$.
- Calculate the value of $\alpha_1$ from $p_1$ and $r_0$ according to equation (34).
- Move to the next position according to $x_k = x_{k-1} + \alpha_k p_k$.
- Having reached the next position, calculate the next search direction from $r_k$ and $\{p_1, p_2, \ldots, p_k\}$ according to equations (35) and (36).
- Repeat this process until you reach $x^*$.
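A direct transcription of this basic version as a linear solver for Hermitian $A$ (the function shape is an assumption). It stores every previous direction for the Gram-Schmidt step, which is exactly the cost the refinement below removes.

```python
import numpy as np

def cg_basic(A, b, x0, n_iter):
    """Basic conjugate gradient: each new direction is Gram-Schmidt-conjugated
    against ALL previous directions (eqs. 34-36). A must be Hermitian."""
    x = x0.astype(complex)
    r = b - A @ x
    dirs = [r.copy()]                                  # p_1 = r_0
    for _ in range(n_iter):
        p = dirs[-1]
        Ap = A @ p
        alpha = (p.conj() @ r) / (p.conj() @ Ap)       # eq. (34)
        x = x + alpha * p                              # x_k = x_{k-1} + alpha_k p_k
        r = b - A @ x                                  # residual r_k
        p_next = r.copy()
        for pj in dirs:                                # eqs. (35), (36)
            beta = (pj.conj() @ (A @ r)) / (pj.conj() @ (A @ pj))
            p_next = p_next - beta * pj
        dirs.append(p_next)
    return x
```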
Let's further refine the conjugate gradient algorithm...
The Residual is Orthogonal to all Previous Residuals

Search directions are dictated by the following equation:
$$p_{k+1} = r_k - \sum_{j=1}^{k} \beta_{kj}\, p_j.$$
Taking the Hermitian transpose of both sides, we get
$$p_{k+1}^H = r_k^H - \sum_{j=1}^{k} \beta_{kj}^*\, p_j^H.$$
Multiplying both sides on the right by $r_{k+1}$ gives
$$p_{k+1}^H\, r_{k+1} = r_k^H\, r_{k+1} - \sum_{j=1}^{k} \beta_{kj}^*\, p_j^H\, r_{k+1}.$$
Since $r_{k+1}$ is orthogonal to all previous search directions, both the left-hand side and every $p_j^H r_{k+1}$ vanish. So,
$$0 = r_k^H\, r_{k+1} - 0 \;\Rightarrow\; r_k^H\, r_{k+1} = 0. \tag{37}$$
Writing α_k in terms of Residuals

From (34),
$$\alpha_k = \frac{p_k^H\, r_{k-1}}{p_k^H A\, p_k}.$$
Let's re-formulate the above equation in terms of residuals, as shown below:
$$\begin{aligned}
r_k^H\, r_{k-1} = 0 \;&\Rightarrow\; \left(r_{k-1} - \alpha_k A p_k\right)^H r_{k-1} = 0 \\
&\Rightarrow\; r_{k-1}^H\, r_{k-1} - \alpha_k^*\, p_k^H A^H\, r_{k-1} = 0 \\
&\Rightarrow\; \alpha_k^* = \frac{r_{k-1}^H\, r_{k-1}}{p_k^H A^H\, r_{k-1}} \;\Rightarrow\; \alpha_k = \frac{r_{k-1}^H\, r_{k-1}}{r_{k-1}^H A\, p_k}.
\end{aligned}$$
Since $r_{k-1} = p_k + \sum_{j=1}^{k-1} \beta_{k-1,j}\, p_j$ and the search directions are A-orthogonal,
$$\alpha_k = \frac{r_{k-1}^H\, r_{k-1}}{\left(p_k + \sum_{j=1}^{k-1} \beta_{k-1,j}\, p_j\right)^H A\, p_k} = \frac{r_{k-1}^H\, r_{k-1}}{p_k^H A\, p_k}. \tag{38}$$
Writing β_kj in terms of Residuals

We know that $r_j = r_{j-1} - \alpha_j A p_j$, so $r_j^H = r_{j-1}^H - \alpha_j^*\, p_j^H A^H$. Multiplying both sides on the right by $r_k$, we get
$$r_j^H\, r_k = r_{j-1}^H\, r_k - \alpha_j^*\, p_j^H A^H\, r_k.$$
When $A$ is Hermitian (so that $\alpha_j^* = \alpha_j$, both $r_{j-1}^H r_{j-1}$ and $p_j^H A p_j$ being real),
$$\begin{aligned}
r_j^H\, r_k &= r_{j-1}^H\, r_k - \alpha_j\, p_j^H A\, r_k \\
&= r_{j-1}^H\, r_k - \alpha_j\, \beta_{kj}\, p_j^H A\, p_j \\
&= r_{j-1}^H\, r_k - \beta_{kj}\, r_{j-1}^H\, r_{j-1} \\
\Rightarrow\; \beta_{kj} &= \frac{r_{j-1}^H\, r_k - r_j^H\, r_k}{r_{j-1}^H\, r_{j-1}}.
\end{aligned}$$
Writing β_kj in terms of Residuals... Contd

Since $j \le k$ while calculating the $\beta_{kj}$ values, the term $r_{j-1}^H\, r_k$ is always zero. So, $\beta_{kj} = 0$ when $j < k$, and
$$\beta_{kk} = -\frac{r_k^H\, r_k}{r_{k-1}^H\, r_{k-1}}. \tag{39}$$
So, finally,
$$p_{k+1} = r_k - \beta_{kk}\, p_k. \tag{40}$$
Basic Version vs Refined Version

Basic version:
$$\alpha_k = \frac{p_k^H\, r_{k-1}}{p_k^H A\, p_k}, \qquad p_{k+1} = r_k - \sum_{j=1}^{k} \beta_{kj}\, p_j, \qquad \beta_{kj} = \frac{p_j^H A\, r_k}{p_j^H A\, p_j}$$
Refined version:
$$\alpha_k = \frac{r_{k-1}^H\, r_{k-1}}{p_k^H A\, p_k}, \qquad p_{k+1} = r_k - \beta_{kk}\, p_k, \qquad \beta_{kk} = -\frac{r_k^H\, r_k}{r_{k-1}^H\, r_{k-1}}$$
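The refined column is the standard conjugate gradient method. A minimal sketch for Hermitian positive definite $A$; note that the negative $\beta_{kk}$ of (39) makes $p_{k+1} = r_k + (r_k^H r_k / r_{k-1}^H r_{k-1})\, p_k$.

```python
import numpy as np

def cg(A, b, x0, tol=1e-10, max_iter=None):
    """Refined conjugate gradient (right-hand column above) for Hermitian
    positive definite A."""
    x = x0.astype(complex)
    r = b - A @ x
    p = r.copy()
    rs_old = np.real(r.conj() @ r)
    for _ in range(max_iter or len(b)):
        Ap = A @ p
        alpha = rs_old / np.real(p.conj() @ Ap)   # eq. (38)
        x = x + alpha * p
        r = r - alpha * Ap                        # r_k = r_{k-1} - alpha_k A p_k
        rs_new = np.real(r.conj() @ r)
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p             # p_{k+1} = r_k - beta_kk p_k, eqs. (39)-(40)
        rs_old = rs_new
    return x
```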
If A is not Hermitian?

The actual problem is solving $Ax = b$. If $A$ is not Hermitian, then we can reformulate the problem as shown below:
$$A^H A\, x = A^H b \;\Rightarrow\; A_1 x = b_1, \tag{41}$$
where $A_1 = A^H A$ and $b_1 = A^H b$. Since $A_1$ is Hermitian, the conjugate gradient method can be applied to (41). Also, for this new problem,
$$r_k^{\mathrm{new}} = b_1 - A_1 x_k = A^H\left(b - A x_k\right) = A^H r_k.$$
So, the algorithm is
$$\alpha_k = \frac{\left(A^H r_{k-1}\right)^H\left(A^H r_{k-1}\right)}{p_k^H A^H A\, p_k}, \qquad p_{k+1} = r_k^{\mathrm{new}} - \beta_{kk}\, p_k = A^H r_k - \beta_{kk}\, p_k, \qquad \beta_{kk} = -\frac{\left(A^H r_k\right)^H\left(A^H r_k\right)}{\left(A^H r_{k-1}\right)^H\left(A^H r_{k-1}\right)}$$
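This is the conjugate gradient method applied to the normal equations (often called CGNR). A sketch that simply reuses the cg() routine from the previous block under reformulation (41); forming $A^H A$ explicitly is acceptable for a sketch but squares the condition number.

```python
import numpy as np

def cg_normal_equations(A, b, x0, tol=1e-10, max_iter=None):
    """Solve Ax = b for general (non-Hermitian) A by applying CG to
    A^H A x = A^H b, per eq. (41). Reuses cg() defined above."""
    A1 = A.conj().T @ A        # A_1 = A^H A (Hermitian)
    b1 = A.conj().T @ b        # b_1 = A^H b
    return cg(A1, b1, x0, tol=tol, max_iter=max_iter)
```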
So, the Adaptive Algorithm using CGM is...

$$w(k+1) = w(k) + \alpha_{k+1}\, p_{k+1}, \qquad p_{k+1} = R^H r_k - \beta_{kk}\, p_k,$$
where
$$\alpha_k = \frac{\left(R^H r_{k-1}\right)^H\left(R^H r_{k-1}\right)}{p_k^H R^H R\, p_k}, \qquad \beta_{kk} = -\frac{\left(R^H r_k\right)^H\left(R^H r_k\right)}{\left(R^H r_{k-1}\right)^H\left(R^H r_{k-1}\right)},$$
and
$$r_k = z - R\, w(k) \approx \hat{z} - \hat{R}\, w(k) = \epsilon^*(k+1)\, x(k+1).$$
More information