Least Mean Square Filtering

1 Least Mean Square Filtering U. B. Desai Slides tex-ed by Bhushan

2 Least Mean Square (LMS) Algorithm
Proposed by Widrow (1963)
Advantage: very robust
Only disadvantage: it takes longer to converge

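3 where
$$X(n) = \begin{bmatrix} x(n) \\ x(n-1) \\ \vdots \\ x(n-M+1) \end{bmatrix}_{M \times 1}, \qquad W(n) = \begin{bmatrix} w_1(n) \\ w_2(n) \\ \vdots \\ w_M(n) \end{bmatrix}_{M \times 1}$$
$$e(n) = d(n) - \hat{d}(n) = d(n) - W^T(n)X(n) = d(n) - X^T(n)W(n)$$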

4 Least Mean Square (LMS) Algorithm
Assume $x(\cdot)$ to be wide sense stationary with zero mean, and consider the mean squared error
$$\begin{aligned} J &= E[e^2(n)] = E[(d(n) - W^T(n)X(n))^2] \\ &= E[(d(n) - W^T(n)X(n))^T (d(n) - W^T(n)X(n))] \\ &= E[d(n)d(n)] - W^T(n)E[X(n)d(n)] - E[d(n)X^T(n)]W(n) + W^T(n)E[X(n)X^T(n)]W(n) \\ &= r_{dd} - W^T(n)r_{xd} - r_{xd}^T W(n) + W^T(n)RW(n) \end{aligned}$$
where $r_{dd} = E[d^2(n)]$, $r_{xd} = E[X(n)d(n)]$ and $R = E[X(n)X^T(n)]$.

5 Problem is to minimize $J$ with respect to $W(n)$
Evaluate the derivative of the scalar $J$ with respect to the vector $W(\cdot)$ of size $M \times 1$:
$$\frac{dJ}{dW(n)} = 0 \;\Rightarrow\; 0 - r_{xd} - r_{xd} + 2RW(n) = 0 \;\Rightarrow\; RW(n) = r_{xd} \;\Rightarrow\; W(n) = R^{-1}r_{xd}$$
$RW(n) = r_{xd}$ is known as the Wiener-Hopf equation. $W^* = R^{-1}r_{xd}$ is the Wiener optimal solution.
The solution is non-adaptive, since we are assuming that $R$ and $r_{xd}$ are known.
Question: how to make the above solution adaptive, i.e. data driven and not covariance and cross-covariance driven?
Depending on the data, we update the weights:
New weights $W(n)$ = old weights $W(n-1)$ + (function of new data $x(n)$ and the desired signal $d(n)$)
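A small numerical check of the Wiener-Hopf equation may help here. The sketch below is only an illustration for $M = 2$: the values of `R`, `r_xd` and `r_dd` are made-up example statistics, and the helper names `solve_2x2` and `mse` are hypothetical.

```python
def solve_2x2(R, r):
    """Solve R w = r for a 2x2 matrix R by Cramer's rule."""
    (a, b), (c, d) = R
    det = a * d - b * c
    if det == 0:
        raise ValueError("R is singular")
    return [(d * r[0] - b * r[1]) / det, (a * r[1] - c * r[0]) / det]


def mse(w, R, r_xd, r_dd):
    """J(W) = r_dd - 2 W^T r_xd + W^T R W for a 2-tap filter."""
    w_r_w = sum(w[i] * R[i][j] * w[j] for i in range(2) for j in range(2))
    w_r = sum(wi * ri for wi, ri in zip(w, r_xd))
    return r_dd - 2.0 * w_r + w_r_w


# Assumed example statistics (not from the slides)
R = [[2.0, 1.0], [1.0, 2.0]]
r_xd = [4.0, 5.0]
r_dd = 15.0

w_star = solve_2x2(R, r_xd)          # Wiener solution R^{-1} r_xd
j_min = mse(w_star, R, r_xd, r_dd)   # cost at the optimum
j_off = mse([w_star[0] + 0.1, w_star[1]], R, r_xd, r_dd)  # cost slightly off the optimum
```

Any perturbation of `w_star` should give a larger cost than `j_min`, which is what `j_off` illustrates.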

6 Adaptive...? Approximate $E[e^2(n)]$ based on samples
Typical approximation based on samples (refer HW2):
$$E[e^2(n)] \approx \frac{1}{N}\sum_{i=0}^{N-1} e^2(n-i)$$
Widrow's idea: approximate $E[e^2(n)]$ by just one sample
$$\begin{aligned} E[e^2(n)] \approx e^2(n) &= (d(n) - W^T(n)X(n))^2 \\ &= (d(n) - W^T(n)X(n))(d(n) - W^T(n)X(n))^T \\ &= d^2(n) - W^T(n)X(n)d(n) - d(n)X^T(n)W(n) + W^T(n)X(n)X^T(n)W(n) \end{aligned}$$
The above approximation seems very simplistic but, surprisingly, works extremely well.

7 Minimizing $e^2(n)$?
Try minimizing $e^2(n)$ with respect to $W(n)$ using the differentiation technique:
$$e^2(n) = d^2(n) - W^T(n)X(n)d(n) - d(n)X^T(n)W(n) + W^T(n)X(n)X^T(n)W(n)$$
$X(n)$ is the most recent data, $W(n)$ is the weight at time instant $n$, and $d(n)$ is the desired signal at time instant $n$.
Differentiate $e^2(n)$ and equate it to zero:
$$\frac{\partial e^2(n)}{\partial W(n)} = 0 \;\Rightarrow\; 0 - 2X(n)d(n) + 2X(n)X^T(n)W(n) = 0 \;\Rightarrow\; X(n)X^T(n)W(n) = X(n)d(n)$$
Can we solve the above? NO, because $X(n)X^T(n)$ is a rank one matrix and thus cannot be inverted.
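To see the rank-one problem concretely, here is a minimal sketch for $M = 2$. The data vector `X`, the desired value `d` and the two candidate weight vectors are made-up illustrations: the determinant of $X X^T$ vanishes, and two different weight vectors both satisfy $X X^T W = X d$.

```python
def outer(X):
    """Return the outer product X X^T as a list of rows."""
    return [[xi * xj for xj in X] for xi in X]


def matvec(A, v):
    """Matrix-vector product A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


X = [3.0, 4.0]   # assumed data vector X(n)
d = 10.0         # assumed desired sample d(n)

A = outer(X)                                   # X X^T
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]    # zero for a rank-one matrix
rhs = [xi * d for xi in X]                     # X d

# Two different weight vectors, both with X^T W = d
w_a = [2.0, 1.0]
w_b = [0.0, 2.5]
lhs_a = matvec(A, w_a)
lhs_b = matvec(A, w_b)
```

Both `lhs_a` and `lhs_b` equal `rhs`, so the single-sample normal equation has no unique solution.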

8 Gradient Descent (greedy algorithm)
Iterative scheme: move, i.e. iterate, along the negative of the gradient direction to get to an optimal solution.
New weights = old weights + additional term dependent on the negative gradient
$$W(n+1) = W(n) + \mu\,\underbrace{[-\nabla_{W(n)}J(n)]}_{\text{along the negative of the gradient}}$$
$J(n)$ (sometimes denoted in the notes by $J(W(n))$) is the cost function, and $\nabla_{W(n)}J(n)$ is the gradient of $J(n)$ with respect to $W(n)$.
If $J(n)$ is convex, then the algorithm (with a sufficiently small $\mu$) will converge to the global optimum.
If $J(n)$ is not convex, i.e., $J(n)$ is multimodal, then it will converge to a local optimum.
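As an illustration of the iterative scheme, the sketch below applies gradient descent to the exact quadratic cost $J = r_{dd} - 2W^T r_{xd} + W^T R W$, whose gradient is $-2r_{xd} + 2RW$. The statistics `R` and `r_xd` and the step size are assumed example values, and `steepest_descent` is a hypothetical helper name.

```python
def steepest_descent(R, r_xd, mu, num_iters):
    """Iterate W(n+1) = W(n) + mu * (-grad J), with grad J = -2 r_xd + 2 R W."""
    M = len(r_xd)
    w = [0.0] * M  # W(0) = 0
    for _ in range(num_iters):
        grad = [-2.0 * r_xd[i] + 2.0 * sum(R[i][j] * w[j] for j in range(M))
                for i in range(M)]
        w = [w[i] - mu * grad[i] for i in range(M)]
    return w


# Assumed example statistics; the Wiener solution for these values is [1, 2]
R = [[2.0, 1.0], [1.0, 2.0]]
r_xd = [4.0, 5.0]
w = steepest_descent(R, r_xd, mu=0.1, num_iters=200)
```

Because this cost is convex, the iterate approaches the Wiener solution; the scheme still needs $R$ and $r_{xd}$, which is what LMS avoids.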

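12 Widrow's LMS Algorithm
$$\nabla_{W(n)}J(n) = \frac{\partial J(n)}{\partial W(n)} \approx \frac{\partial e^2(n)}{\partial W(n)} = \frac{\partial (d(n) - W^T(n)X(n))^2}{\partial W(n)} = -2X(n)d(n) + 2X(n)X^T(n)W(n)$$
$$W(n+1) = W(n) + \mu[-(-2X(n)d(n) + 2X(n)X^T(n)W(n))]$$
Absorbing the factor 2 into $\mu$:
$$W(n+1) = W(n) + \mu X(n)\underbrace{[d(n) - X^T(n)W(n)]}_{e(n)}$$
$$W(n+1) = W(n) + \mu X(n)e(n)$$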
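The update $W(n+1) = W(n) + \mu X(n)e(n)$ can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the system `h`, the noise-free desired signal, the Gaussian input and the value of `mu` are all assumptions chosen for the example, and `lms` is a hypothetical function name.

```python
import random


def lms(x, d, num_taps, mu):
    """Run W(n+1) = W(n) + mu * X(n) * e(n) starting from W(0) = 0.

    Returns the final weights and the error sequence e(n).
    """
    w = [0.0] * num_taps
    errors = []
    for n in range(len(x)):
        # X(n) = [x(n), x(n-1), ..., x(n-M+1)], with zeros before the data starts
        X = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        e = d[n] - sum(wk * xk for wk, xk in zip(w, X))   # e(n) = d(n) - W^T(n) X(n)
        w = [wk + mu * xk * e for wk, xk in zip(w, X)]    # LMS update
        errors.append(e)
    return w, errors


rng = random.Random(0)
h = [1.0, -0.5]                                   # assumed unknown 2-tap system
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]    # zero-mean, unit-power input
d = [sum(h[k] * (x[n - k] if n - k >= 0 else 0.0) for k in range(len(h)))
     for n in range(len(x))]                      # noise-free desired signal

w, errors = lms(x, d, num_taps=2, mu=0.05)
```

In this noise-free setting the weights should settle very close to `h`; with measurement noise added to `d` they would instead fluctuate around the Wiener solution.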

13 Issues
The LMS equation is $W(n+1) = W(n) + \mu X(n)e(n)$, with $e(n) = d(n) - W^T(n)X(n)$.
How to select $W(0)$? In the absence of any knowledge, we can take $W(0) = 0$.
How to select $\mu$?
Convergence: convergence of the mean? Convergence of the mean square error?

14 Choosing µ
Proper selection of µ is critical, since it governs the convergence of the LMS algorithm. Note that the LMS algorithm is completely governed by one external parameter, namely µ.
$$W(n+1) = [I - \mu X(n)X^T(n)]W(n) + \mu X(n)d(n)$$
You can look upon the above as a state variable equation with stochastic input $d(n)$ (assuming $d(n)$ is modeled as a stochastic process). The above state variable equation has:
a time varying system matrix (counterpart of $F$ in our earlier notation)
a stochastic time varying system matrix
a stochastic time varying input matrix
Key questions:
1. Does the LMS algorithm converge, and in what sense?
2. If it converges, does $W(n) \to W^*$, the Wiener optimal solution?
3. If it converges, does $J_W(n) = E[J(n)] = E[e^2(n)] \to J_{\min}$, the Wiener optimal minimum cost (minimum error)?

15 Convergence in Mean for $W(n)$
Define the weight error $\tilde{W}(n) = W(n) - W^*$ and consider
$$\lim_{n \to \infty} E[\tilde{W}(n)] = \;?$$
Can this limit be zero?
One approach is to explore if we can express $E[\tilde{W}(\cdot)]$ as
$$E[\tilde{W}(n+1)] = [A]\,E[\tilde{W}(n)]$$
In that case, stability can be investigated by examining the eigenvalues of $A$. If the eigenvalues of $A$ are inside the unit circle, then $E[\tilde{W}(n)] \to 0$ for $n \to \infty$.

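16 A Digression
$$Z(n+1) = AZ(n)$$
Assume $A$ has distinct eigenvalues. Eigenvalue-eigenvector factorization of $A$: $A = P\Lambda P^{-1}$
$$Z(n+1) = P\Lambda P^{-1}Z(n) \;\Rightarrow\; P^{-1}Z(n+1) = \Lambda P^{-1}Z(n) \;\Rightarrow\; \bar{Z}(n+1) = \Lambda\bar{Z}(n), \quad \bar{Z}(n) = P^{-1}Z(n)$$
$$\begin{bmatrix} \bar{Z}_1(n+1) \\ \bar{Z}_2(n+1) \\ \vdots \\ \bar{Z}_m(n+1) \end{bmatrix} = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_m \end{bmatrix} \begin{bmatrix} \bar{Z}_1(n) \\ \bar{Z}_2(n) \\ \vdots \\ \bar{Z}_m(n) \end{bmatrix}$$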

17 $$\bar{Z}_i(n+1) = \lambda_i\bar{Z}_i(n) \;\Rightarrow\; \bar{Z}_i(n) = (\lambda_i)^n\bar{Z}_i(0), \quad i = 1, \ldots, m$$
Thus $\bar{Z}_i(n)$ will tend to zero iff $|\lambda_i| < 1$.
[Figure: block diagram of this system]
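The decoupling into modes can be checked numerically. The sketch below uses an assumed $2 \times 2$ symmetric matrix `A` with eigenvalues 0.75 and 0.25 and eigenvectors $[1, 1]/\sqrt{2}$ and $[1, -1]/\sqrt{2}$; for this particular choice $P$ is symmetric and orthogonal, so $P^{-1} = P$. The function names are hypothetical.

```python
import math

A = [[0.5, 0.25], [0.25, 0.5]]      # assumed example matrix
lams = [0.75, 0.25]                 # its eigenvalues
s = 1.0 / math.sqrt(2.0)
P = [[s, s], [s, -s]]               # columns are the eigenvectors; here P^{-1} = P


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def direct(z0, n):
    """Iterate Z(n+1) = A Z(n) directly."""
    z = list(z0)
    for _ in range(n):
        z = matvec(A, z)
    return z


def modal(z0, n):
    """Use the decoupled modes: Zbar_i(n) = lambda_i^n * Zbar_i(0)."""
    zbar0 = matvec(P, z0)                               # Zbar(0) = P^{-1} Z(0)
    zbar_n = [lam ** n * zb for lam, zb in zip(lams, zbar0)]
    return matvec(P, zbar_n)                            # back to Z(n) = P Zbar(n)


z_direct = direct([1.0, 0.0], 10)
z_modal = modal([1.0, 0.0], 10)
```

Both routes give the same state, and since both eigenvalues have magnitude below one, the state decays towards zero.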

18 Some Manipulation
Back to convergence in mean for $W(n)$, with $\tilde{W}(n) = W(n) - W^*$:
$$\begin{aligned} \tilde{W}(n+1) &= W(n+1) - W^* \\ &= W(n) + \mu X(n)[d(n) - X^T(n)W(n)] - W^* \\ &= \tilde{W}(n) + \mu X(n)[d(n) - X^T(n)(\tilde{W}(n) + W^*)] \\ &= \tilde{W}(n) + \mu X(n)[d(n) - X^T(n)\tilde{W}(n) - X^T(n)W^*] \\ &= [I - \mu X(n)X^T(n)]\tilde{W}(n) + \mu X(n)[d(n) - X^T(n)W^*] \end{aligned}$$
Consider
$$\begin{aligned} E[\tilde{W}(n+1)] &= E[(I - \mu X(n)X^T(n))\tilde{W}(n)] + \mu E[X(n)d(n) - X(n)X^T(n)W^*] \\ &= E[\tilde{W}(n)] - \mu E[X(n)X^T(n)\tilde{W}(n)] + \mu[r_{xd} - RW^*] \\ &= E[\tilde{W}(n)] - \mu E[X(n)X^T(n)\tilde{W}(n)] + 0 \end{aligned}$$

19 Issue...
$$E[\tilde{W}(n+1)] = E[\tilde{W}(n)] - \mu E[X(n)X^T(n)\tilde{W}(n)]$$
The above equation requires higher order statistics, namely $E[X(n)X^T(n)\tilde{W}(n)]$. Even with a known joint pdf, this is messy to evaluate.
What is the way out? Make assumptions that make the problem tractable.
Independence Assumptions
Assume $X(n) \perp X(n-1) \perp X(n-2) \perp \cdots \perp X(0)$
$X(n)$ is independent of the past desired signals, namely $X(n) \perp \{d(n-1), d(n-2), \ldots, d(0)\}$
We will introduce some more assumptions as and when required.

20 $\tilde{W}(n)$ is a nonlinear function of $X(n-1), \ldots, X(0)$. Now, since $X(n) \perp X(m)$ for $m \neq n$, we obtain
$$E[X(n)X^T(n)\tilde{W}(n)] = E[X(n)X^T(n)]\,E[\tilde{W}(n)] = R\,E[\tilde{W}(n)]$$
which implies
$$E[\tilde{W}(n+1)] = E[\tilde{W}(n)] - \mu R\,E[\tilde{W}(n)] = [I - \mu R]\,E[\tilde{W}(n)]$$
Now, if the eigenvalues of $[I - \mu R]$ satisfy $|\lambda[I - \mu R]| < 1$, then
$$\lim_{n \to \infty} E[\tilde{W}(n)] = 0$$
which in turn implies
$$\lim_{n \to \infty} E[W(n)] = W^*, \text{ the Wiener optimal solution}$$

21 Selection of µ
We showed that $\lim_{n \to \infty} E[\tilde{W}(n)] = 0$ if $|\lambda[I - \mu R]| < 1$. Can this help in the choice of µ?
Note, we want a µ such that $|\lambda[I - \mu R]| < 1$ is satisfied for a given $R$. Towards this end, consider transforming along the eigen-directions:
$$E[\tilde{W}(n+1)] = [I - \mu R]\,E[\tilde{W}(n)]$$
Let the eigenvalue-eigenvector equation for $R$ be $Ru_i = \lambda_i u_i$. Since $R$ is symmetric, we have
$$R = U\Lambda U^T, \quad \text{where } U = [u_1 \cdots u_M], \quad U^TU = I = UU^T$$
Moreover, since $R$ is also positive definite, $\lambda_i > 0$ for $i = 1, \ldots, M$.

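22 Substitute $R$ by $U\Lambda U^T$ and define $U^T\tilde{W}(n) = \tilde{W}_U(n)$, then
$$\begin{aligned} E[\tilde{W}(n+1)] &= [I - \mu U\Lambda U^T]\,E[\tilde{W}(n)] \\ &= [UIU^T - \mu U\Lambda U^T]\,E[\tilde{W}(n)] \\ &= U[I - \mu\Lambda]U^T E[\tilde{W}(n)] \end{aligned}$$
$$E[U^T\tilde{W}(n+1)] = [I - \mu\Lambda]\,E[U^T\tilde{W}(n)]$$
$$E[\tilde{W}_U(n+1)] = [I - \mu\Lambda]\,E[\tilde{W}_U(n)] = \begin{bmatrix} 1 - \mu\lambda_1 & & 0 \\ & \ddots & \\ 0 & & 1 - \mu\lambda_M \end{bmatrix} E[\tilde{W}_U(n)]$$
$$E[\tilde{W}_U^i(n+1)] = (1 - \mu\lambda_i)\,E[\tilde{W}_U^i(n)], \quad \text{for } i = 1, \ldots, M$$
We want $|1 - \mu\lambda_i| < 1$ for $i = 1, 2, \ldots, M$
$$\Rightarrow\; -1 < 1 - \mu\lambda_i < 1 \;\Rightarrow\; 0 < \mu < \frac{2}{\lambda_i} \;\Rightarrow\; 0 < \mu < \frac{2}{\lambda_{\max}}$$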
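The bound $0 < \mu < 2/\lambda_{\max}$ can be illustrated by iterating the mean recursion $E[\tilde{W}(n+1)] = [I - \mu R]E[\tilde{W}(n)]$ directly. In this sketch the matrix `R` is an assumed example with eigenvalues 1 and 3, so the bound is $2/3$; `mean_error_norm` is a hypothetical helper.

```python
def mean_error_norm(R, mu, v0, num_iters):
    """Iterate v(n+1) = (I - mu R) v(n) and return the Euclidean norm of v(num_iters)."""
    M = len(v0)
    v = list(v0)
    for _ in range(num_iters):
        Rv = [sum(R[i][j] * v[j] for j in range(M)) for i in range(M)]
        v = [v[i] - mu * Rv[i] for i in range(M)]
    return sum(c * c for c in v) ** 0.5


R = [[2.0, 1.0], [1.0, 2.0]]   # assumed example; eigenvalues 1 and 3, so 2/lambda_max = 2/3

stable = mean_error_norm(R, mu=0.5, v0=[1.0, 0.0], num_iters=50)    # mu below the bound
unstable = mean_error_norm(R, mu=0.7, v0=[1.0, 0.0], num_iters=50)  # mu above the bound
```

With `mu=0.5` both modes have magnitude 0.5 and the mean error dies out; with `mu=0.7` the mode $1 - 3\mu = -1.1$ grows without bound.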

23 Estimating $\lambda_i$ (Exercise)
It can be shown that
$$0 < \mu < \frac{2}{M\,r(0)} < \frac{2}{\lambda_{\max}}$$
since $\lambda_{\max} < \sum_{i=1}^{M}\lambda_i = \operatorname{tr}(R) = M\,r(0)$ for $M > 1$. Thus $\frac{2}{M\,r(0)}$ becomes a conservative bound for µ as compared to $\frac{2}{\lambda_{\max}}$.
Now,
$$r(0) = E[x(n)x(n)] \approx \frac{1}{N}\sum_{n=0}^{N-1}x^2(n)$$
Thus $r(0)$ represents a good measure of signal power, and we can have an estimate of the signal power at our disposal. Consequently, we can select µ as per
$$0 < \mu < \frac{2}{M(\text{signal power})}$$
This gives a practical handle on selecting µ. Also, µ should be very small (small steps).
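The practical rule $0 < \mu < 2/(M \cdot \text{signal power})$ is easy to apply from data. The following is a minimal sketch; the sample values and the name `step_size_bound` are made up for illustration.

```python
def step_size_bound(x, num_taps):
    """Return (r0, mu_max) with r0 = (1/N) * sum x^2(n) and mu_max = 2 / (M * r0)."""
    r0 = sum(v * v for v in x) / len(x)   # sample estimate of the signal power r(0)
    return r0, 2.0 / (num_taps * r0)


x = [1.0, -1.0, 2.0, -2.0]                # made-up input samples
r0, mu_max = step_size_bound(x, num_taps=4)
# Here r0 = 2.5 and mu_max = 0.2; a working step size would be chosen well below mu_max.
```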

24 Mean Square Sense (MSS) Convergence: Excess Mean Squared Error
Define
$$J_W(n) \triangleq E[e^2(n)] = E[(d(n) - W^T(n)X(n))^T(d(n) - W^T(n)X(n))]$$
Adding and subtracting $W^{*T}X(n)$:
$$J_W(n) = E[(d(n) - W^{*T}X(n) - W^T(n)X(n) + W^{*T}X(n))\,(d(n) - W^{*T}X(n) - W^T(n)X(n) + W^{*T}X(n))^T]$$
Now using $\tilde{W}(n) = W(n) - W^*$ and $e^*(n) = d(n) - W^{*T}X(n)$:
$$\begin{aligned} J_W(n) &= E[(e^*(n) - \tilde{W}^T(n)X(n))(e^*(n) - \tilde{W}^T(n)X(n))^T] \\ &= E[e^{*2}(n)] + E[\tilde{W}^T(n)X(n)X^T(n)\tilde{W}(n)] - E[\tilde{W}^T(n)X(n)e^*(n)] - E[e^*(n)X^T(n)\tilde{W}(n)] \end{aligned}$$
$$J_W(n) = J_{\min} + \underbrace{E[\tilde{W}^T(n)X(n)X^T(n)\tilde{W}(n)] - E[\tilde{W}^T(n)X(n)e^*(n)] - E[e^*(n)X^T(n)\tilde{W}(n)]}_{\text{excess mean squared error}}$$
where $J_{\min} = E[e^{*2}(n)]$.

25 Consider
$$\begin{aligned} E[e^*(n)X^T(n)\tilde{W}(n)] &= E[(d(n) - W^{*T}X(n))X^T(n)\tilde{W}(n)] \\ &= E[d(n)X^T(n)\tilde{W}(n)] - E[W^{*T}X(n)X^T(n)\tilde{W}(n)] \end{aligned}$$
Now using the independence assumptions we have $d(n)X^T(n) \perp \tilde{W}(n)$ and $X(n)X^T(n) \perp \tilde{W}(n)$, thus
$$\begin{aligned} E[e^*(n)X^T(n)\tilde{W}(n)] &= E[d(n)X^T(n)]\,E[\tilde{W}(n)] - W^{*T}E[X(n)X^T(n)]\,E[\tilde{W}(n)] \\ &= \underbrace{[r_{xd}^T - W^{*T}R]}_{=0}\,E[\tilde{W}(n)] = 0 \end{aligned}$$
Now the expression for $J_W(n)$ becomes
$$J_W(n) = J_{\min} + E[\tilde{W}^T(n)X(n)X^T(n)\tilde{W}(n)]$$
We need to obtain a tractable expression for the excess mean square error $E[\tilde{W}^T(n)X(n)X^T(n)\tilde{W}(n)]$.

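26 Simplification of Excess Mean Square Error Expression
Let $\tilde{W}(n) = [\tilde{w}_1(n) \cdots \tilde{w}_M(n)]^T$ and let $x_i(n)$ denote the $i$-th entry of $X(n)$, then
$$\begin{aligned} E[\tilde{W}^T(n)X(n)X^T(n)\tilde{W}(n)] &= E\Big[\sum_{i=1}^{M}\sum_{j=1}^{M}\tilde{w}_i(n)x_i(n)x_j(n)\tilde{w}_j(n)\Big] \\ &= \sum_{i=1}^{M}\sum_{j=1}^{M}E[\tilde{w}_i(n)x_i(n)x_j(n)\tilde{w}_j(n)] \\ &= \sum_{i=1}^{M}\sum_{j=1}^{M}E[\tilde{w}_i(n)\tilde{w}_j(n)]\,E[x_i(n)x_j(n)] \quad (\text{indep. assump.}) \\ &= E\Big[\sum_{i=1}^{M}\sum_{j=1}^{M}\tilde{w}_i(n)\{E[x_i(n)x_j(n)]\}\tilde{w}_j(n)\Big] \\ &= E[\tilde{W}^T(n)R\tilde{W}(n)] = \operatorname{trace}\{E[\tilde{W}(n)\tilde{W}^T(n)]R\} \triangleq J_{ex}(n) \end{aligned}$$
This gives
$$J_W(n) = J_{\min} + E[\tilde{W}^T(n)R\tilde{W}(n)]$$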

27 Tractable Expression for Excess Mean Square Error
Use the eigen-decomposition $R = U\Lambda U^T$:
$$\begin{aligned} J_{ex}(n) &= E[(\tilde{W}^T(n)U)\Lambda\underbrace{(U^T\tilde{W}(n))}_{\tilde{W}_u(n)}] = E[\tilde{W}_u^T(n)\Lambda\tilde{W}_u(n)] \\ &= \sum_{i=1}^{M}\lambda_i\underbrace{E[\tilde{w}_u^i(n)\tilde{w}_u^i(n)]}_{p_u^{ii}(n)} = \sum_{i=1}^{M}\lambda_i\,p_u^{ii}(n) \end{aligned}$$
Now to compute $p_u^{ii}(n)$ we consider the difference equation for $\tilde{W}(n)$ (derive this equation):
$$\tilde{W}(n+1) = [I - \mu X(n)X^T(n)]\tilde{W}(n) + \mu X(n)[d(n) - X^T(n)W^*]$$
Let $P(n) \triangleq E[\tilde{W}(n)\tilde{W}^T(n)]$ and $P_u(n) = U^TP(n)U$, then $p_u^{ii}(n)$ is the $i$-th diagonal entry of $P_u(n)$.

28 After making several approximations, like $\mu \ll 1$, and judiciously ignoring terms which depend on $\mu^2$ and higher powers of µ, we obtain
$$p_u^{ii}(n+1) = [1 - 2\mu\lambda_i]\,p_u^{ii}(n) + \mu^2J_{\min}\lambda_i$$
In order for the above system to be stable, we require that, for $1 \le i \le M$,
$$|1 - 2\mu\lambda_i| < 1 \;\Rightarrow\; -1 < 1 - 2\mu\lambda_i < 1 \;\Rightarrow\; 0 < \mu < \frac{1}{\lambda_i}$$
which is satisfied if $0 < \mu < \frac{1}{\lambda_{\max}}$.
If the above stability condition is satisfied, then
$$p_u^{ii}(\infty) = [1 - 2\mu\lambda_i]\,p_u^{ii}(\infty) + \mu^2J_{\min}\lambda_i \;\Rightarrow\; p_u^{ii}(\infty) = \frac{\mu J_{\min}}{2}$$
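The steady-state value $p_u^{ii}(\infty) = \mu J_{\min}/2$ can be confirmed by simply running the scalar recursion. In the sketch below the values of `mu`, `lam` and `j_min` are assumed for illustration, and the starting value `p0` is arbitrary since the fixed point does not depend on it.

```python
def steady_state_p(mu, lam, j_min, p0=1.0, num_iters=500):
    """Iterate p(n+1) = (1 - 2*mu*lam) * p(n) + mu^2 * j_min * lam from p(0) = p0."""
    p = p0
    for _ in range(num_iters):
        p = (1.0 - 2.0 * mu * lam) * p + mu * mu * j_min * lam
    return p


mu, lam, j_min = 0.1, 2.0, 1.0      # assumed example values with mu < 1/lam
p_inf = steady_state_p(mu, lam, j_min)
predicted = mu * j_min / 2.0        # closed-form steady state from the slide
```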

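29 This in turn implies that if $0 < \mu < \frac{1}{\lambda_{\max}}$, then
$$\begin{aligned} J_{ex}(\infty) &= \sum_{i=1}^{M}\lambda_i\,p_u^{ii}(\infty) = \sum_{i=1}^{M}\lambda_i\frac{\mu J_{\min}}{2} = \frac{\mu J_{\min}}{2}\sum_{i=1}^{M}\lambda_i \\ &= \frac{\mu J_{\min}}{2}\operatorname{tr}(R) = \frac{\mu J_{\min}}{2}M(\text{signal power}) \end{aligned}$$
$$J_W(\infty) = J_{\min} + \frac{\mu J_{\min}}{2}\sum_{i=1}^{M}\lambda_i$$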

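30 For convergence in the mean we required $0 < \mu < \frac{2}{\lambda_{\max}}$
For m.s.s. convergence we require $0 < \mu < \frac{1}{\lambda_{\max}}$
We have convergence in the mean and m.s.s. convergence if $0 < \mu < \frac{1}{\lambda_{\max}}$
Misadjustment
$$\mathcal{M} = \frac{J_{ex}(\infty)}{J_{\min}} = \frac{\mu}{2}\sum_{i=1}^{M}\lambda_i = \frac{\mu}{2}\operatorname{tr}(R) = \frac{\mu}{2}M(\text{signal power})$$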
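The misadjustment formula can be turned around to pick µ for a target misadjustment. This is only a sketch under the slide's approximations; the filter length, signal power and 10% target are assumed example values, and both function names are hypothetical.

```python
def misadjustment(mu, num_taps, signal_power):
    """Misadjustment = (mu / 2) * M * (signal power)."""
    return 0.5 * mu * num_taps * signal_power


def mu_for_misadjustment(target, num_taps, signal_power):
    """Invert the formula above: mu = 2 * target / (M * signal power)."""
    return 2.0 * target / (num_taps * signal_power)


# Assumed example: M = 8 taps, unit signal power, 10% target misadjustment
mu = mu_for_misadjustment(0.10, num_taps=8, signal_power=1.0)
check = misadjustment(mu, num_taps=8, signal_power=1.0)
conservative_bound = 1.0 / (8 * 1.0)   # 1 / tr(R), which is below 1 / lambda_max
```

Since $\lambda_{\max} \le \operatorname{tr}(R)$, any µ below `conservative_bound` also satisfies $\mu < 1/\lambda_{\max}$.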

31 Some Comments
LMS has $O(M)$ computational complexity per iteration.
Just increasing $M$ (the order of the filter) will not help. It will increase the excess mean square error ($J_{ex}(\infty) = \frac{\mu J_{\min}}{2}M(\text{signal power})$).
Rate of convergence in the mean is governed by $\lambda_{\min}$ ($(1 - \mu\lambda_{\min})$ is the slowest mode).
$J_{ex}(\infty)$ depends on $\lambda_{\max}$.
Small µ: slow convergence and small $J_{ex}(\infty)$.
Large µ: faster convergence and larger $J_{ex}(\infty)$.
$\frac{\lambda_{\max}}{\lambda_{\min}}$ is the eigenvalue spread, or spread of $\lambda$, for $R$.
The LMS algorithm is sensitive to µ and to the spread of $\lambda$.
When the eigenvalue spread is large, $\frac{\lambda_{\max}}{\lambda_{\min}} \gg 1$, we have a conflicting situation: $\lambda_{\max}$ forces µ to be small, which makes the $\lambda_{\min}$ mode very slow.
Disadvantage of LMS: very slow convergence rate; it typically requires $20M$ iterations to converge.
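The effect of eigenvalue spread on convergence speed can be made concrete by counting how many iterations the slowest mean mode $(1 - \mu\lambda_{\min})^n$ needs to decay to a given fraction. The numbers below (a fixed $\lambda_{\max} = 10$, $\mu = 0.05$, and a 1% decay target) are assumptions chosen for illustration, and `iterations_to_decay` is a hypothetical helper.

```python
import math


def iterations_to_decay(mu, lam_min, factor=0.01):
    """Smallest n with |1 - mu*lam_min|^n <= factor; needs 0 < mu*lam_min < 2."""
    rho = abs(1.0 - mu * lam_min)
    if rho >= 1.0:
        raise ValueError("mode is not stable for this mu")
    if rho == 0.0:
        return 1
    return math.ceil(math.log(factor) / math.log(rho))


mu = 0.05                                            # below 1/lambda_max for lambda_max = 10
n_spread_1 = iterations_to_decay(mu, lam_min=10.0)   # spread 1: slowest mode is 0.5
n_spread_100 = iterations_to_decay(mu, lam_min=0.1)  # spread 100: slowest mode is 0.995
```

With the same µ, the large-spread case needs on the order of a hundred times more iterations for its slowest mode to die out.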

32 Prof. U. B. Desai Electrical Engineering Department Indian Institute of Technology - Bombay Mumbai
