Ch6-Normalized Least Mean-Square Adaptive Filtering


LMS Filtering

The update equation for the LMS algorithm, derived from steepest descent (SD) as a stochastic approximation, is
$$\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) + \mu\,\mathbf{u}(n)\,e^*(n)$$
where $\mu$ is the step size, $\mathbf{u}(n)$ is the filter input vector, and $e(n)$ is the error signal. The step size was originally chosen for a deterministic gradient; because LMS works with instantaneous, random estimates, it suffers from gradient noise. The update above is therefore problematic: it exhibits gradient-noise amplification when $\|\mathbf{u}(n)\|$ is large.

Normalized LMS

Since $\mathbf{u}(n)$ is random, the instantaneous samples can give a norm $\|\mathbf{u}(n)\|$ of almost any value, possibly very large. Solution: the input samples can be forced to have constant norm by normalization,
$$\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) + \frac{\tilde{\mu}}{\|\mathbf{u}(n)\|^2}\,\mathbf{u}(n)\,e^*(n),$$
which is the update equation for the normalized LMS (NLMS) algorithm. Note the similarity between the NLMS and LMS update equations: NLMS can be considered the same as LMS except for a time-varying step size
$$\mu(n) = \frac{\tilde{\mu}}{\|\mathbf{u}(n)\|^2}.$$
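
As a minimal sketch of the two update rules side by side (the function names are illustrative and not part of the slides), the LMS and NLMS steps differ only in the normalization of the step size:

```python
import numpy as np

def lms_step(w, u, d, mu):
    """One LMS update: w <- w + mu * u * conj(e), with a fixed step size mu."""
    e = d - np.vdot(w, u)                     # e(n) = d(n) - w^H(n) u(n)
    return w + mu * u * np.conj(e), e

def nlms_step(w, u, d, mu_tilde):
    """One NLMS update: same form, but the step is divided by ||u(n)||^2."""
    e = d - np.vdot(w, u)
    return w + (mu_tilde / np.vdot(u, u).real) * u * np.conj(e), e
```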

Normalized LMS

The block diagram is very similar to that of LMS; the difference is in the weight-control mechanism block.

Normalized LMS

We have seen that the LMS algorithm optimizes the instantaneous squared error $|e(n)|^2$ instead of the MSE. Similarly, NLMS optimizes another problem: from one iteration to the next, the weight vector of an adaptive filter should be changed in a minimal manner, subject to a constraint imposed on the updated filter's output. Mathematically, minimize the squared Euclidean norm of the change
$$\|\delta\hat{\mathbf{w}}(n+1)\|^2 = \|\hat{\mathbf{w}}(n+1) - \hat{\mathbf{w}}(n)\|^2$$
subject to the constraint
$$\hat{\mathbf{w}}^H(n+1)\,\mathbf{u}(n) = d(n),$$
which can be optimized by the method of Lagrange multipliers with cost function
$$J(n) = \|\hat{\mathbf{w}}(n+1) - \hat{\mathbf{w}}(n)\|^2 + \mathrm{Re}\!\left\{\lambda^*\big[d(n) - \hat{\mathbf{w}}^H(n+1)\,\mathbf{u}(n)\big]\right\}.$$

Taking the derivative of the cost function
$$J(n) = \|\hat{\mathbf{w}}(n+1) - \hat{\mathbf{w}}(n)\|^2 + \mathrm{Re}\!\left\{\lambda^*\big[d(n) - \hat{\mathbf{w}}^H(n+1)\,\mathbf{u}(n)\big]\right\}$$
with respect to $\hat{\mathbf{w}}(n+1)$ and setting it equal to zero (proof details on later slides) gives
$$\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) + \tfrac{1}{2}\lambda\,\mathbf{u}(n).$$
Substituting this into the constraint $d(n) = \hat{\mathbf{w}}^H(n+1)\,\mathbf{u}(n)$,
$$d(n) = \hat{\mathbf{w}}^H(n)\,\mathbf{u}(n) + \tfrac{1}{2}\lambda^*\,\|\mathbf{u}(n)\|^2
\quad\Rightarrow\quad
\lambda = \frac{2\,e^*(n)}{\|\mathbf{u}(n)\|^2},
\qquad e(n) = d(n) - \hat{\mathbf{w}}^H(n)\,\mathbf{u}(n).$$
Hence the minimal change in the weight vector is
$$\delta\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n+1) - \hat{\mathbf{w}}(n) = \frac{1}{\|\mathbf{u}(n)\|^2}\,\mathbf{u}(n)\,e^*(n).$$

In order to exercise control over the change in the tap-weight vector from one iteration to the next without changing the direction of the vector, we introduce a positive real scaling factor denoted by $\tilde{\mu}$:
$$\delta\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n+1) - \hat{\mathbf{w}}(n) = \frac{\tilde{\mu}}{\|\mathbf{u}(n)\|^2}\,\mathbf{u}(n)\,e^*(n),$$
that is,
$$\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) + \frac{\tilde{\mu}}{\|\mathbf{u}(n)\|^2}\,\mathbf{u}(n)\,e^*(n).$$
The product vector $\mathbf{u}(n)e^*(n)$ is normalized with respect to the squared Euclidean norm of the tap-input vector $\mathbf{u}(n)$. The constant $\tilde{\mu}$ is dimensionless, while the dimension of the LMS step size $\mu$ is inverse power. We may view the normalized LMS filter as an LMS filter with a time-varying step-size parameter.

Proof detail (per-tap form): writing the tap weights as $\hat{w}_k(n+1)$, $k = 0, 1, \ldots, M-1$, and the Lagrange multiplier as $\lambda = \lambda_1 + j\lambda_2$, the derivative of $J(n)$ is taken with respect to the real and imaginary parts of each $\hat{w}_k(n+1)$ and set equal to zero. We then multiply both sides of the resulting equation by $u^*(n-k)$ and sum over all possible integer values of $k$ from 0 to $M-1$, which expresses the Lagrange multiplier $\lambda$ in terms of the error $e(n)$ and $\|\mathbf{u}(n)\|^2$, as summarized on the next slide.

Normalized LMS

1. Take the first derivative of $J(n)$ with respect to $\hat{\mathbf{w}}(n+1)$ and set it to zero, to find
$$\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) + \tfrac{1}{2}\lambda\,\mathbf{u}(n).$$
2. Substitute this result into the constraint to solve for the multiplier:
$$\lambda = \frac{2\,e^*(n)}{\|\mathbf{u}(n)\|^2}.$$
3. Combining these results and adding a step-size parameter $\tilde{\mu}$ to control the progress gives
$$\delta\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n+1) - \hat{\mathbf{w}}(n) = \frac{\tilde{\mu}}{\|\mathbf{u}(n)\|^2}\,\mathbf{u}(n)\,e^*(n).$$
4. Hence the update equation for NLMS becomes
$$\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) + \frac{\tilde{\mu}}{\|\mathbf{u}(n)\|^2}\,\mathbf{u}(n)\,e^*(n).$$
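
As a quick check (added here, not on the original slides), substituting the update with $\tilde{\mu} = 1$ back into the constraint shows that it is met exactly:
$$\hat{\mathbf{w}}^H(n+1)\,\mathbf{u}(n)
= \hat{\mathbf{w}}^H(n)\,\mathbf{u}(n) + \frac{e(n)}{\|\mathbf{u}(n)\|^2}\,\mathbf{u}^H(n)\,\mathbf{u}(n)
= \hat{\mathbf{w}}^H(n)\,\mathbf{u}(n) + e(n) = d(n).$$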

Normalized LMS

Observations:

We may view an NLMS filter as an LMS filter with a time-varying step-size parameter
$$\mu(n) = \frac{\tilde{\mu}}{\|\mathbf{u}(n)\|^2}.$$
The rate of convergence of NLMS is faster than that of LMS. Although $\|\mathbf{u}(n)\|^2$ can be very large, it can likewise be very small, which causes a problem since it appears in the denominator. Solution: include a small correction term to avoid stability problems, i.e., replace $\|\mathbf{u}(n)\|^2$ by $\delta + \|\mathbf{u}(n)\|^2$ for a small constant $\delta > 0$. (For a proof based on Newton's method, study Ch. 4 of Saeed's book.)
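
A minimal sketch of the regularized NLMS recursion in a system-identification setting follows; the function name, the particular values of $\tilde{\mu}$ and $\delta$, the real-valued signals, and the synthetic 4-tap example are illustrative assumptions, not material from the slides.

```python
import numpy as np

def nlms_identify(u, d, M, mu_tilde=0.5, delta=1e-6):
    """Regularized NLMS over a whole signal: identify an M-tap FIR system.

    u: input samples, d: desired (reference) samples, both 1-D real arrays.
    delta is the small correction term added to ||u(n)||^2 in the denominator.
    """
    w = np.zeros(M)
    e_hist = np.zeros(len(u))
    for n in range(M - 1, len(u)):
        u_n = u[n - M + 1:n + 1][::-1]        # tap-input vector [u(n), ..., u(n-M+1)]
        e = d[n] - w @ u_n                    # a priori error e(n)
        w = w + (mu_tilde / (delta + u_n @ u_n)) * u_n * e
        e_hist[n] = e
    return w, e_hist

# Example: identify a hypothetical 4-tap system from noisy observations
rng = np.random.default_rng(0)
w_true = np.array([0.8, -0.4, 0.2, 0.1])
u = rng.standard_normal(5000)
d = np.convolve(u, w_true)[:len(u)] + 0.01 * rng.standard_normal(len(u))
w_hat, _ = nlms_identify(u, d, M=4)
print(np.round(w_hat, 3))                     # should be close to w_true
```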

Stability of NLMS

What should the value of the step size be for convergence? Assume that the desired response is governed by the model
$$d(n) = \mathbf{w}^H\,\mathbf{u}(n) + \nu(n),$$
where $\nu(n)$ is an additive disturbance. Substituting the weight-error vector
$$\boldsymbol{\varepsilon}(n) = \mathbf{w} - \hat{\mathbf{w}}(n)$$
into the NLMS update equation
$$\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) + \frac{\tilde{\mu}}{\|\mathbf{u}(n)\|^2}\,\mathbf{u}(n)\,e^*(n)$$
gives
$$\boldsymbol{\varepsilon}(n+1) = \boldsymbol{\varepsilon}(n) - \frac{\tilde{\mu}}{\|\mathbf{u}(n)\|^2}\,\mathbf{u}(n)\,e^*(n),$$
which provides the recursion for the mean-square deviation
$$D(n) = E\!\left[\|\boldsymbol{\varepsilon}(n)\|^2\right].$$
Here
$$\xi_u(n) = \big[\mathbf{w} - \hat{\mathbf{w}}(n)\big]^H\mathbf{u}(n) = \boldsymbol{\varepsilon}^H(n)\,\mathbf{u}(n)$$
is called the undisturbed error signal, so that
$$e(n) = d(n) - y(n) = \xi_u(n) + \nu(n).$$

Stability of NLMS

Find the range of $\tilde{\mu}$ so that $D(n+1) \le D(n)$. For clarity of notation, assume real-valued signals; then the weight-error recursion gives
$$D(n+1) - D(n) = \tilde{\mu}^2\, E\!\left[\frac{e^2(n)}{\|\mathbf{u}(n)\|^2}\right] - 2\tilde{\mu}\, E\!\left[\frac{\xi_u(n)\,e(n)}{\|\mathbf{u}(n)\|^2}\right].$$
The right-hand side is a quadratic function of $\tilde{\mu}$, so $D(n+1) \le D(n)$ is satisfied when
$$0 < \tilde{\mu} < \frac{2\, E\big[\xi_u(n)\,e(n)/\|\mathbf{u}(n)\|^2\big]}{E\big[e^2(n)/\|\mathbf{u}(n)\|^2\big]}.$$
Differentiating with respect to $\tilde{\mu}$ and equating to zero gives
$$\tilde{\mu}_{\mathrm{opt}} = \frac{E\big[\xi_u(n)\,e(n)/\|\mathbf{u}(n)\|^2\big]}{E\big[e^2(n)/\|\mathbf{u}(n)\|^2\big]}.$$
This step size yields the maximum drop in the MSD.

Stability of NLMS

Assumption I: The fluctuations in the input signal energy $\|\mathbf{u}(n)\|^2$ from one iteration to the next are small enough so that
$$E\!\left[\frac{\xi_u(n)\,e(n)}{\|\mathbf{u}(n)\|^2}\right] \approx \frac{E\big[\xi_u(n)\,e(n)\big]}{E\big[\|\mathbf{u}(n)\|^2\big]}
\qquad\text{and}\qquad
E\!\left[\frac{e^2(n)}{\|\mathbf{u}(n)\|^2}\right] \approx \frac{E\big[e^2(n)\big]}{E\big[\|\mathbf{u}(n)\|^2\big]}.$$
Then
$$\tilde{\mu}_{\mathrm{opt}} \approx \frac{E\big[\xi_u(n)\,e(n)\big]}{E\big[e^2(n)\big]}.$$

Assumption II: The undisturbed error signal $\xi_u(n)$ is uncorrelated with the disturbance noise $\nu(n)$. Since $e(n) = \xi_u(n) + \nu(n)$,
$$E\big[\xi_u(n)\,e(n)\big] = E\big[\xi_u^2(n)\big],
\qquad\text{so}\qquad
\tilde{\mu}_{\mathrm{opt}} \approx \frac{E\big[\xi_u^2(n)\big]}{E\big[e^2(n)\big]}.$$
Note that $e(n)$ is observable, whereas $\xi_u(n)$ is unobservable.

Stability of NLMS

Assumption III: The spectral content of the input signal $u(n)$ is essentially flat over a frequency band larger than that occupied by each element of the weight-error vector $\boldsymbol{\varepsilon}(n)$; hence
$$E\big[\xi_u^2(n)\big] = E\Big[\big(\boldsymbol{\varepsilon}^T(n)\,\mathbf{u}(n)\big)^2\Big]
\approx E\big[\|\boldsymbol{\varepsilon}(n)\|^2\big]\,E\big[u^2(n)\big] = D(n)\,E\big[u^2(n)\big].$$
Then
$$\tilde{\mu}_{\mathrm{opt}} \approx \frac{D(n)\,E\big[u^2(n)\big]}{E\big[e^2(n)\big]}.$$
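
The step-size analysis can be illustrated numerically. The following is a small single-realization sketch (the filter length, noise level, and step-size values are arbitrary choices for illustration); because the true weight vector is known in the experiment, $D(n)$ is estimated directly from $\|\boldsymbol{\varepsilon}(n)\|^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 8, 20000
w_true = rng.standard_normal(M)
u = rng.standard_normal(N)
noise = 0.1 * rng.standard_normal(N)

def msd_curve(mu_tilde, delta=1e-6):
    """Run NLMS and return the squared weight-error norm ||eps(n)||^2 over time."""
    w = np.zeros(M)
    msd = np.zeros(N)
    for n in range(M - 1, N):
        u_n = u[n - M + 1:n + 1][::-1]
        d_n = w_true @ u_n + noise[n]
        e = d_n - w @ u_n
        w = w + (mu_tilde / (delta + u_n @ u_n)) * u_n * e
        msd[n] = np.sum((w_true - w) ** 2)
    return msd

# Larger step sizes reach a low MSD sooner but settle at a higher steady-state
# MSD; values of mu_tilde approaching 2 sit near the stability limit.
for mu in (0.1, 0.5, 1.0, 1.9):
    msd = msd_curve(mu)
    print(f"mu_tilde={mu}: MSD at n=2000 = {msd[2000]:.2e}, "
          f"steady-state MSD = {msd[-1000:].mean():.2e}")
```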

Normalized LMS

Affine Projection Adaptive Filters

Mathematically, minimize the squared Euclidean norm of the change
$$\|\delta\hat{\mathbf{w}}(n+1)\|^2 = \|\hat{\mathbf{w}}(n+1) - \hat{\mathbf{w}}(n)\|^2$$
subject to the set of $N$ constraints
$$\hat{\mathbf{w}}^H(n+1)\,\mathbf{u}(n-k) = d(n-k), \qquad k = 0, 1, \ldots, N-1, \tag{6.36}$$
where $N$ is smaller than the dimensionality $M$ of the input data space or, equivalently, the weight space. This constrained optimization criterion includes that of the normalized LMS filter as a special case, namely $N = 1$. We may view $N$, the number of constraints, as the order of the affine projection adaptive filter.

Following the method of Lagrange multipliers with multiple constraints, the cost function is
$$J(n) = \|\hat{\mathbf{w}}(n+1) - \hat{\mathbf{w}}(n)\|^2
+ \sum_{k=0}^{N-1} \mathrm{Re}\!\left\{\lambda_k^*\big[d(n-k) - \hat{\mathbf{w}}^H(n+1)\,\mathbf{u}(n-k)\big]\right\}.$$
Definitions: an $N$-by-$M$ data matrix
$$\mathbf{A}^H(n) = \big[\mathbf{u}(n),\ \mathbf{u}(n-1),\ \ldots,\ \mathbf{u}(n-N+1)\big],$$
an $N$-by-1 desired response vector
$$\mathbf{d}^H(n) = \big[d(n),\ d(n-1),\ \ldots,\ d(n-N+1)\big],$$
and an $N$-by-1 Lagrange vector
$$\boldsymbol{\lambda}^H(n) = \big[\lambda_0,\ \lambda_1,\ \ldots,\ \lambda_{N-1}\big].$$
The compact form of the cost function is
$$J(n) = \|\hat{\mathbf{w}}(n+1) - \hat{\mathbf{w}}(n)\|^2
+ \mathrm{Re}\!\left\{\big[\mathbf{d}(n) - \mathbf{A}(n)\,\hat{\mathbf{w}}(n+1)\big]^H\boldsymbol{\lambda}\right\}.$$

The derivative of the cost function with respect to $\hat{\mathbf{w}}(n+1)$ is
$$\frac{\partial J(n)}{\partial \hat{\mathbf{w}}(n+1)} = 2\big[\hat{\mathbf{w}}(n+1) - \hat{\mathbf{w}}(n)\big] - \mathbf{A}^H(n)\,\boldsymbol{\lambda}.$$
Setting it equal to zero,
$$\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) + \tfrac{1}{2}\,\mathbf{A}^H(n)\,\boldsymbol{\lambda}.$$
Rewriting equation (6.36) in matrix form,
$$\mathbf{d}(n) = \mathbf{A}(n)\,\hat{\mathbf{w}}(n+1).$$
Multiplying the update by $\mathbf{A}(n)$ then gives
$$\mathbf{A}(n)\,\hat{\mathbf{w}}(n+1) = \mathbf{A}(n)\,\hat{\mathbf{w}}(n) + \tfrac{1}{2}\,\mathbf{A}(n)\,\mathbf{A}^H(n)\,\boldsymbol{\lambda},$$
so that
$$\mathbf{d}(n) - \mathbf{A}(n)\,\hat{\mathbf{w}}(n) = \tfrac{1}{2}\,\mathbf{A}(n)\,\mathbf{A}^H(n)\,\boldsymbol{\lambda}.$$

The difference between $\mathbf{d}(n)$ and $\mathbf{A}(n)\hat{\mathbf{w}}(n)$, based on the data available at iteration $n$, is the $N$-by-1 error vector
$$\mathbf{e}(n) = \mathbf{d}(n) - \mathbf{A}(n)\,\hat{\mathbf{w}}(n).$$
Solving for $\boldsymbol{\lambda}$,
$$\boldsymbol{\lambda} = 2\big[\mathbf{A}(n)\,\mathbf{A}^H(n)\big]^{-1}\mathbf{e}(n).$$
Finally, we need to exercise control over the change in the weight vector from one iteration to the next, while keeping the same direction, so a step-size parameter $\tilde{\mu}$ is introduced:
$$\delta\hat{\mathbf{w}}(n+1) = \tilde{\mu}\,\mathbf{A}^H(n)\big[\mathbf{A}(n)\,\mathbf{A}^H(n)\big]^{-1}\mathbf{e}(n),$$
that is,
$$\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) + \tilde{\mu}\,\mathbf{A}^H(n)\big[\mathbf{A}(n)\,\mathbf{A}^H(n)\big]^{-1}\mathbf{e}(n),$$
which is the desired update equation for the affine projection adaptive filter.

Affine Projection Operator

Substituting $\mathbf{e}(n) = \mathbf{d}(n) - \mathbf{A}(n)\hat{\mathbf{w}}(n)$ into the update equation (with $\tilde{\mu} = 1$ for clarity),
$$\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) + \mathbf{A}^H(n)\big[\mathbf{A}(n)\mathbf{A}^H(n)\big]^{-1}\big[\mathbf{d}(n) - \mathbf{A}(n)\hat{\mathbf{w}}(n)\big]$$
$$= \Big[\mathbf{I} - \mathbf{A}^H(n)\big[\mathbf{A}(n)\mathbf{A}^H(n)\big]^{-1}\mathbf{A}(n)\Big]\hat{\mathbf{w}}(n) + \mathbf{A}^H(n)\big[\mathbf{A}(n)\mathbf{A}^H(n)\big]^{-1}\mathbf{d}(n).$$
Define the projection operator
$$\mathbf{P} = \mathbf{A}^H(n)\big[\mathbf{A}(n)\mathbf{A}^H(n)\big]^{-1}\mathbf{A}(n).$$
The complement projector $\mathbf{I} - \mathbf{P}$ acts on the old weight vector $\hat{\mathbf{w}}(n)$ to produce the updated weight vector $\hat{\mathbf{w}}(n+1)$. Defining the pseudo-inverse of the data matrix
$$\mathbf{A}^{+}(n) = \mathbf{A}^H(n)\big[\mathbf{A}(n)\mathbf{A}^H(n)\big]^{-1},$$
the update can be written as
$$\hat{\mathbf{w}}(n+1) = \big[\mathbf{I} - \mathbf{A}^{+}(n)\mathbf{A}(n)\big]\hat{\mathbf{w}}(n) + \mathbf{A}^{+}(n)\,\mathbf{d}(n).$$
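
A small numerical check (added here; the matrix sizes and random data are arbitrary, real-valued choices) that the projection-operator form of the update agrees with the error-vector form, that P is idempotent, and that the constraints are met exactly when $\tilde{\mu} = 1$:

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 3, 8                                    # N constraints, M taps (N < M)
A = rng.standard_normal((N, M))                # data matrix A(n)
d = rng.standard_normal(N)                     # desired response vector d(n)
w = rng.standard_normal(M)                     # old weight vector w(n)

A_pinv = A.T @ np.linalg.inv(A @ A.T)          # pseudo-inverse A^+ = A^H (A A^H)^-1
P = A_pinv @ A                                 # projection operator

w_next_err_form = w + A_pinv @ (d - A @ w)     # update via the error vector e(n)
w_next_proj_form = (np.eye(M) - P) @ w + A_pinv @ d

print(np.allclose(w_next_err_form, w_next_proj_form))   # True: the two forms agree
print(np.allclose(P @ P, P))                             # True: P is idempotent
print(np.allclose(A @ w_next_err_form, d))               # True: constraints satisfied
```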

Summary of the Affine Projection Adaptive Filter

We may view the affine projection filter as an adaptive filter intermediate between the normalized LMS filter and the recursive least-squares (RLS) filter, in terms of both computational complexity and performance.
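
A minimal sketch of the regularized affine projection recursion follows; the function name, parameter defaults, real-valued signals, and the ridge term added before inverting $\mathbf{A}(n)\mathbf{A}^H(n)$ are illustrative assumptions rather than material from the slides. With N = 1 it reduces to the regularized NLMS recursion sketched earlier.

```python
import numpy as np

def apa_identify(u, d, M, N=4, mu_tilde=0.5, delta=1e-4):
    """Affine projection adaptive filter of order N for an M-tap FIR system.

    At each iteration the rows of A are the N most recent tap-input vectors
    u(n), u(n-1), ..., u(n-N+1); delta regularizes the N-by-N inversion.
    """
    w = np.zeros(M)
    e_hist = np.zeros(len(u))
    for n in range(M + N - 2, len(u)):
        # Build the N-by-M data matrix and the N-by-1 desired vector
        A = np.array([u[n - k - M + 1:n - k + 1][::-1] for k in range(N)])
        d_vec = np.array([d[n - k] for k in range(N)])
        e_vec = d_vec - A @ w                                 # error vector e(n)
        # Regularized update: w <- w + mu * A^T (A A^T + delta I)^-1 e
        w = w + mu_tilde * A.T @ np.linalg.solve(A @ A.T + delta * np.eye(N), e_vec)
        e_hist[n] = e_vec[0]
    return w, e_hist
```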

Stability Analysis of the Affine Projection AF

Rewriting the update in terms of the weight-error vector $\boldsymbol{\varepsilon}(n) = \mathbf{w} - \hat{\mathbf{w}}(n)$,
$$\boldsymbol{\varepsilon}(n+1) = \boldsymbol{\varepsilon}(n) - \tilde{\mu}\,\mathbf{A}^H(n)\big[\mathbf{A}(n)\,\mathbf{A}^H(n)\big]^{-1}\mathbf{e}(n),$$
where $\mathbf{e}(n) = \mathbf{d}(n) - \mathbf{A}(n)\hat{\mathbf{w}}(n)$.

Observations on the Convergence Behavior of Affine Projection Adaptive Filters

1. The learning curve of an affine projection adaptive filter consists of the sum of exponential terms.
2. An affine projection adaptive filter converges at a rate faster than that of the corresponding normalized LMS filter.
3. As more delayed versions of the tap-input vector $\mathbf{u}(n)$ are used (i.e., as the filter order $N$ is increased), the rate of convergence improves, but the rate at which the improvement is attained decreases.

Practical Considerations:

Regularization, to take care of noisy data:
$$\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) + \tilde{\mu}\,\mathbf{A}^H(n)\big[\mathbf{A}(n)\,\mathbf{A}^H(n) + \delta\mathbf{I}\big]^{-1}\mathbf{e}(n).$$
Fast implementation, to improve computational efficiency.

Homework: W6; Ch6: 1, 3, 6, 7