Ch5: Least Mean-Square Adaptive Filtering


Outline: Introduction (approximating the steepest-descent algorithm); the least-mean-square algorithm; stability and performance of the LMS algorithm; robustness of the LMS algorithm; variants of the LMS algorithm; summary & references.

1. Introduction. The LMS algorithm was introduced by Widrow & Hoff in 1959 and remains an evergreen hit on the Top 10 list of adaptation algorithms. It is simple: no matrices are involved in the adaptation. It belongs to the family of stochastic gradient algorithms, in contrast to the method of steepest descent, which uses a deterministic gradient. LMS is an adaptive filtering algorithm with two basic processes: a filtering process, producing 1) the output signal and 2) the estimation error, and an adaptive process, i.e., the automatic adjustment of the filter tap weights.

Linear Adaptive Filtering Algorithms. Stochastic gradient approach: the Least-Mean-Square (LMS) algorithm and the Gradient Adaptive Lattice (GAL) algorithm. Least-squares estimation: recursive least-squares (RLS) estimation, including the standard RLS algorithm, square-root RLS algorithms, and fast RLS algorithms.

Notations

Steepest Descent. The update rule for SD is w(n+1) = w(n) + μ[p − R w(n)], where p = E[u(n) d^*(n)] is the cross-correlation vector and R = E[u(n) u^H(n)] is the correlation matrix of the tap inputs. SD is a deterministic algorithm, in the sense that p and R are assumed to be exactly known. In practice we can only estimate these quantities.

Basic Idea. The simplest estimates of the expectations are the instantaneous values R̂(n) = u(n) u^H(n) and p̂(n) = u(n) d^*(n), i.e., we remove the expectation operators and replace them with instantaneous samples. The gradient estimate then becomes ∇Ĵ(n) = −2 u(n) d^*(n) + 2 u(n) u^H(n) ŵ(n). Eventually, the new update rule is ŵ(n+1) = ŵ(n) + μ u(n)[d^*(n) − u^H(n) ŵ(n)]. No expectations, only instantaneous samples!

Basic Idea. However, the term in the brackets is the (conjugate of the) error, e(n) = d(n) − ŵ^H(n) u(n), and then ŵ(n+1) = ŵ(n) + μ u(n) e^*(n). Here μ u(n) e^*(n) is based on the gradient of the instantaneous squared error |e(n)|^2 instead of the mean-square error as in SD. Other stochastic approximations are also possible; e.g., in the Griffiths algorithm the cross-correlation vector p is estimated by some other means (e.g., a pilot signal), and the instantaneous estimate is used for R.
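As a concrete illustration of this update, here is a minimal NumPy sketch of the complex LMS recursion ŵ(n+1) = ŵ(n) + μ u(n) e^*(n); the function name and the way the signals u and d are passed in are assumptions made for the example, not part of the slides.

import numpy as np

def lms(u, d, M, mu):
    # Complex LMS: e(n) = d(n) - w^H(n) u(n),  w(n+1) = w(n) + mu * u(n) * conj(e(n))
    u = np.asarray(u, dtype=complex)
    d = np.asarray(d, dtype=complex)
    N = len(d)
    w = np.zeros(M, dtype=complex)            # tap-weight vector, w_hat(0) = 0
    e = np.zeros(N, dtype=complex)            # estimation errors
    for n in range(M, N):
        u_n = u[n:n-M:-1]                     # tap-input vector [u(n), ..., u(n-M+1)]
        y = np.vdot(w, u_n)                   # filter output w^H(n) u(n)
        e[n] = d[n] - y                       # estimation error
        w = w + mu * u_n * np.conj(e[n])      # stochastic-gradient (instantaneous) update
    return w, e

Only instantaneous samples enter the loop; no correlation matrices are formed, which is exactly the point of the algorithm.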

Basic Idea. Filter weights are updated using instantaneous values: ŵ(n+1) = ŵ(n) + μ u(n) e^*(n). (a) Block diagram of the adaptive transversal filter.

(b) Detailed structure of the transversal filter component.

(c) Detailed structure of the adaptive weight-control mechanism.

Update equation for the method of steepest descent: w(n+1) = w(n) + μ[p − R w(n)]. Update equation for least mean-square: ŵ(n+1) = ŵ(n) + μ u(n) e^*(n).

LMS Algorithm. The instantaneous estimates are unbiased, but since the expectations are omitted they have a high variance. Therefore, the recursive computation of each tap weight in the LMS algorithm suffers from gradient noise. In contrast to SD, which is a deterministic algorithm, LMS is a member of the family of stochastic gradient descent algorithms. LMS has a higher MSE J(∞) than the value J_min reached by SD (the Wiener solution) as n → ∞; i.e., J(n) → J(∞) as n → ∞. The difference is called the excess mean-square error J_ex(∞), and the ratio J_ex(∞)/J_min is called the misadjustment. Hopefully J(∞) is a finite value; then LMS is said to be stable in the mean-square sense. LMS performs a random motion around the Wiener solution (due to the rough estimates of R and p, the adaptation steps are quite random).

LMS Algorithm. The algorithm involves a feedback connection. Although LMS might seem very difficult to analyze due to the randomness, the feedback acts as a low-pass filter, or performs averaging, so that the randomness can be filtered out. The time constant of this averaging is inversely proportional to μ; in fact, if μ is chosen small enough, the adaptive process progresses slowly and the effects of gradient noise on the tap weights are largely filtered out. The computational complexity of LMS is very low, which makes it very attractive: only 2M+1 complex multiplications and 2M complex additions per iteration.

LMS Algorithm

Gradient Adaptive Lattice (GAL). In the multistage lattice predictor (next figure), the cost function is defined as J_{fb,m} = (1/2) E[|f_m(n)|^2 + |b_m(n)|^2], where the order-update recursions are f_m(n) = f_{m-1}(n) + κ_m^* b_{m-1}(n-1) and b_m(n) = b_{m-1}(n-1) + κ_m f_{m-1}(n). Substituting, we get the cost as a function of κ_m.

Multistage Lattice Predictor (figure).

Differentiating the cost function J_{fb,m} with respect to the complex-valued reflection coefficient κ_m and setting the result to zero, we get the Burg formula for the optimum reflection coefficient: κ_{m,o} = −2 E[b_{m-1}(n-1) f_{m-1}^*(n)] / E[|f_{m-1}(n)|^2 + |b_{m-1}(n-1)|^2].  (5-15)

The GAL Algorithm. The above formula (with time averages in place of expectations) is a block estimator for κ_m; we reformulate it into a recursive structure. Define E_{m-1}(n) = Σ_{i=1}^{n} [|f_{m-1}(i)|^2 + |b_{m-1}(i-1)|^2], which is the total energy of both the forward and backward prediction errors at the input of the m-th stage, measured up to and including time n.

For the numerator of Eq. (5-15) (the time-average cross-correlation), Σ_{i=1}^{n} b_{m-1}(i-1) f_{m-1}^*(i) = Σ_{i=1}^{n-1} b_{m-1}(i-1) f_{m-1}^*(i) + b_{m-1}(n-1) f_{m-1}^*(n). Substituting, we get an expression for κ̂_m(n) (5-19); it is not recursive yet. We replace κ_m with κ̂_m(n-1) (refer to Pr. 8): f_m(n) = f_{m-1}(n) + κ̂_m^*(n-1) b_{m-1}(n-1) and b_m(n) = b_{m-1}(n-1) + κ̂_m(n-1) f_{m-1}(n).

The second term of the numerator of Eq. (5-19) is written as:

The first term of the numerator of Eq. (5-19) is written similarly. Substituting, we get the recursive form. Two modifications are then made, one of which introduces an exponential weighting factor 0 < β < 1.

Gradient Adaptive Lattice (GAL). The normalized step size μ_m(n) = μ̃ / Ê_{m-1}(n) tracks variations in the environment; the prediction errors are the cue for adaptation. When the prediction errors are small, Ê_{m-1}(n) is small and μ_m(n) is large in magnitude, i.e., a fast adaptation mode. In noisy environments, Ê_{m-1}(n) is large and μ_m(n) is smaller in magnitude, i.e., a noise rejection mode. GAL is superior to the LMS: lower noise sensitivity of κ_m(n) and better tracking capability via μ_m(n). It is computationally simple and attractive for practical implementation, but the convergence of GAL is inferior to RLS-based lattice structures.
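A single GAL stage can be sketched as follows, using the order-update recursions and the energy-normalized step size described above; β, μ̃ and the small constant delta (which keeps the normalization positive at start-up) are assumed parameters, and constant factors from the gradient are absorbed into μ̃.

import numpy as np

def gal_stage(f_in, b_in, beta=0.9, mu_tilde=0.1, delta=1e-6):
    # One lattice stage: adapts kappa_m(n) with a step size normalized by E_{m-1}(n).
    N = len(f_in)
    f_out = np.zeros(N, dtype=complex)
    b_out = np.zeros(N, dtype=complex)
    kappa = 0.0 + 0.0j                        # reflection-coefficient estimate
    E = delta                                 # running energy E_{m-1}(n)
    b_prev = 0.0 + 0.0j                       # b_{m-1}(n-1)
    for n in range(N):
        # order-update recursions with the latest estimate kappa_m(n-1)
        f_out[n] = f_in[n] + np.conj(kappa) * b_prev
        b_out[n] = b_prev + kappa * f_in[n]
        # exponentially weighted energy of the forward/backward errors at the stage input
        E = beta * E + (1.0 - beta) * (abs(f_in[n])**2 + abs(b_prev)**2)
        # normalized stochastic-gradient update of kappa_m
        kappa = kappa - (mu_tilde / E) * (np.conj(f_out[n]) * b_prev + b_out[n] * np.conj(f_in[n]))
        b_prev = b_in[n]
    return f_out, b_out, kappa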

Desired Response Estimator


Application 1: Canonical Model. The LMS algorithm for complex signals with complex coefficients can be represented in terms of four separate LMS algorithms for real signals with cross-coupling between them. Write the input, desired signal, tap weights, output and error in complex notation, where the subscript I denotes the in-phase component and Q the quadrature component.

Canonical Model. The relations between these quantities are y_I(n) = ŵ_I^T(n) u_I(n) + ŵ_Q^T(n) u_Q(n) and y_Q(n) = ŵ_I^T(n) u_Q(n) − ŵ_Q^T(n) u_I(n). The complex update ŵ(n+1) = ŵ(n) + μ u(n) e^*(n) splits into ŵ_I(n+1) = ŵ_I(n) + μ[e_I(n) u_I(n) + e_Q(n) u_Q(n)] and ŵ_Q(n+1) = ŵ_Q(n) + μ[e_I(n) u_Q(n) − e_Q(n) u_I(n)].

This canonical model clearly illustrates that a complex LMS algorithm is equivalent to a set of four real LMS algorithms with cross-coupling between them. Its use may arise, for example, in the adaptive equalization of a communication channel for the transmission of binary data by means of a multiphase modulation scheme such as quadriphase-shift keying (QPSK).
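The equivalence can be checked numerically; the sketch below performs one complex LMS step and the corresponding cross-coupled real updates and verifies that they agree (all values are randomly generated for illustration).

import numpy as np

rng = np.random.default_rng(0)
M, mu = 4, 0.05
w = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # w_hat(n)
u = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # u(n)
d = complex(rng.standard_normal(), rng.standard_normal())  # d(n)

# complex LMS step: e = d - w^H u,  w <- w + mu * u * conj(e)
e = d - np.vdot(w, u)
w_complex = w + mu * u * np.conj(e)

# equivalent real (canonical) form with cross-coupling
wI, wQ, uI, uQ = w.real, w.imag, u.real, u.imag
eI, eQ = e.real, e.imag
wI_new = wI + mu * (eI * uI + eQ * uQ)
wQ_new = wQ + mu * (eI * uQ - eQ * uI)

print(np.allclose(w_complex, wI_new + 1j * wQ_new))        # prints True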

Canonical Model (figure).

Canonical Model (figure).

Analysis of the LMS Algorithm (1). Although the filter is a linear combiner, the algorithm is highly nonlinear: it violates superposition and homogeneity. Assume the initial condition ŵ(0) = 0; then ŵ(n) = μ Σ_{i=0}^{n-1} e^*(i) u(i) and y(n) = ŵ^H(n) u(n) = μ Σ_{i=0}^{n-1} e(i) u^H(i) u(n), which is nonlinear in the input. The analysis continues using the weight-error vector (note the notation differs from SD) ε(n) = w_o − ŵ(n) and its correlation matrix K(n) = E[ε(n) ε^H(n)]. Here we use the expectation; however, it is actually the ensemble average.

ε(n) = w_o − ŵ(n),  e(n) = d(n) − ŵ^H(n) u(n),  e_o(n) = d(n) − w_o^H u(n).

The analysis is based on the independence theory; the following assumptions are made: (i) the input vectors u(n), u(n-1), ..., u(1) are statistically independent vectors (clearly not true, but this can be circumvented); (ii) the input vector u(n) and desired response d(n) are statistically independent of d(n-1), d(n-2), ..., d(1); (iii) the input vector u(n) and desired response d(n) consist of mutually Gaussian-distributed random variables. The LMS algorithm in terms of the weight-error vector: ε(n+1) = [I − μ u(n) u^H(n)] ε(n) − μ u(n) e_o^*(n). Proof: next slide.

ε(n+1) = w_o − ŵ(n+1) = w_o − ŵ(n) − μ u(n) e^*(n) = ε(n) − μ u(n) e^*(n). Using e(n) = d(n) − ŵ^H(n) u(n) and ŵ(n) = w_o − ε(n), we have e(n) = e_o(n) + ε^H(n) u(n), so e^*(n) = e_o^*(n) + u^H(n) ε(n). Substituting gives ε(n+1) = [I − μ u(n) u^H(n)] ε(n) − μ u(n) e_o^*(n)  (5.56). Kushner's direct-averaging method (assuming a small step size μ) is then invoked: ε_0(n+1) = [I − μ R] ε_0(n) − μ u(n) e_o^*(n)  (5.58). The two equations have similar solutions in the limit of small step size μ.

Taking expectations of (5.56), E[ε(n+1)] = E{[I − μ u(n) u^H(n)] ε(n)} − μ E[u(n) e_o^*(n)]. By the independence assumption (ŵ(n), and hence ε(n), is statistically independent of u(n) and d(n)) and the orthogonality principle, E[u(n) e_o^*(n)] = 0, so E[ε(n+1)] = (I − μ R) E[ε(n)]. With the eigendecomposition R = Q Λ Q^H, define T(n) = Q^H E[ε(n)]; then T(n+1) = (I − μ Λ) T(n), and lim_{n→∞} T(n) = 0 iff |1 − μ λ_i| < 1 for all i, i.e., 0 < μ < 2/λ_max. Since λ_max ≤ tr(R) = tr(Q Λ Q^H) = tr(Λ) = Σ_i λ_i, a more conservative (sufficient) condition is 0 < μ < 2/tr(R).
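The two bounds are easy to evaluate for a given correlation matrix; in the sketch below R is an assumed AR(1)-type example, not data from the slides.

import numpy as np

M, a, sigma_u2 = 5, 0.9, 1.0
R = sigma_u2 * a ** np.abs(np.subtract.outer(np.arange(M), np.arange(M)))  # example Toeplitz R

lam = np.linalg.eigvalsh(R)
print("2/lambda_max =", 2.0 / lam.max())     # necessary-and-sufficient bound for convergence in the mean
print("2/tr(R)      =", 2.0 / np.trace(R))   # more conservative bound (total tap-input power)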

Correlation matrix of the tap-weight-error vector ε(n): K(n+1) = E[ε(n+1) ε^H(n+1)] = (I − μ R) K(n)(I − μ R) + μ^2 J_min R. K(n) does not go to zero. The first term, (I − μ R) K(n)(I − μ R), results from evaluating the expectation of the outer product of [I − μ u(n) u^H(n)] ε(n) with itself. The expectation of the cross-product term, μ e_o(n)(I − μ R) ε(n) u^H(n), is zero by virtue of the implied independence of ε(n) and u(n). The last term, μ^2 J_min R, is obtained by applying the Gaussian factorization theorem to the product term e_o(n) u(n) u^H(n) e_o^*(n). Since the correlation matrix R is positive definite and μ is small, the first term is also positive definite provided that K(n) is positive definite; therefore K(n+1) is positive definite. The proof by induction is completed by noting that K(0) = ε(0) ε^H(0) = (w_o − ŵ(0))(w_o − ŵ(0))^H is positive definite.

In summary, the equation for K(n+1) represents a recursive relationship for updating the weight-error correlation matrix K(n), starting from n = 0, for which we have K(0). Furthermore, after each iteration it yields a positive-definite value for the updated weight-error correlation matrix. MSE at the output: with e(n) = d(n) − ŵ^H(n) u(n) and e_o(n) = d(n) − w_o^H u(n), we have e(n) = e_o(n) + [w_o − ŵ(n)]^H u(n) = e_o(n) + ε^H(n) u(n). Hence J(n) = E[|e(n)|^2] = E[(e_o(n) + ε^H(n) u(n))(e_o^*(n) + u^H(n) ε(n))] = J_min + E[ε^H(n) u(n) u^H(n) ε(n)] = J_min + J_ex(n): the Wiener MSE plus the excess MSE, which consists of a transient component and a steady-state excess MSE J_ex(∞).

J_ex(n) = E[ε^H(n) u(n) u^H(n) ε(n)] = E[tr(ε^H(n) u(n) u^H(n) ε(n))] = E[tr(u(n) u^H(n) ε(n) ε^H(n))] = tr(E[u(n) u^H(n) ε(n) ε^H(n)]) = tr(E[u(n) u^H(n)] E[ε(n) ε^H(n)]) = tr(R K(n)) = tr(Q Λ Q^H K(n)) = tr(Q Λ X(n) Q^H), where Q^H R Q = Λ and X(n) = Q^H K(n) Q. Hence J_ex(n) = tr(Q^H Q Λ X(n)) = tr(Λ X(n)) = Σ_{i=1}^{M} λ_i x_i(n), where the x_i(n) are the diagonal elements of Q^H K(n) Q. (The trace of a scalar is the scalar itself; in general X(n) is not a diagonal matrix.)

Starting from K(n+1) = (I − μ R) K(n)(I − μ R) + μ^2 J_min R, with Q^H R Q = Λ and Q^H K(n) Q = X(n), we obtain Q^H K(n+1) Q = Q^H (I − μ R) K(n)(I − μ R) Q + μ^2 J_min Q^H R Q, i.e., the recursion X(n+1) = (I − μ Λ) X(n)(I − μ Λ) + μ^2 J_min Λ. For the diagonal elements this gives x_i(n+1) = (1 − μ λ_i)^2 x_i(n) + μ^2 J_min λ_i, which converges if |1 − μ λ_i| < 1 for every i. Hence the LMS algorithm is convergent in the mean square if and only if 0 < μ < 2/λ_max.
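The decoupled recursion can be iterated directly; the sketch below does so for a few assumed eigenvalues and compares the result with the closed-form fixed point x_i(∞) = μ J_min/(2 − μ λ_i), obtained by setting x_i(n+1) = x_i(n).

import numpy as np

mu, J_min = 0.05, 0.1
lam = np.array([0.2, 1.0, 2.5])               # example eigenvalues of R (assumed)
x = np.zeros_like(lam)                        # x_i(0): K(0) rotated into the eigenbasis

for n in range(5000):
    x = (1.0 - mu * lam)**2 * x + mu**2 * J_min * lam

print(x)                                      # converged values
print(mu * J_min / (2.0 - mu * lam))          # closed-form steady state, identical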

X(n+1) = (I − μ Λ) X(n)(I − μ Λ) + μ^2 J_min Λ. Collecting the diagonal elements, x(n+1) = B x(n) + μ^2 J_min λ, where x(n) = [x_1(n), x_2(n), ..., x_M(n)]^T, λ = [λ_1, λ_2, ..., λ_M]^T, and b_ij = (1 − μ λ_i)^2 for i = j and μ^2 λ_i λ_j for i ≠ j. The matrix B is real, positive and symmetric. It can be shown (see appendix) that the solution of the difference equation x(n+1) = B x(n) + μ^2 J_min λ is x(n) = Σ_{i=1}^{M} c_i^n g_i g_i^T [x(0) − x(∞)] + x(∞), where c_i is the i-th eigenvalue of B and g_i the associated eigenvector, i.e., G^T B G = C with C = diag[c_1, c_2, ..., c_M] and G = [g_1, g_2, ..., g_M].

J_ex(n) = λ^T x(n) = Σ_{i=1}^{M} c_i^n λ^T g_i g_i^T [x(0) − x(∞)] + λ^T x(∞) = Σ_{i=1}^{M} c_i^n λ^T g_i g_i^T [x(0) − x(∞)] + J_ex(∞), where J_ex(∞) = λ^T x(∞) = Σ_{j=1}^{M} λ_j x_j(∞). The first term on the right-hand side describes the transient behavior of the mean-squared error, whereas the second term represents the final value of the excess mean-squared error after adaptation is completed (i.e., its steady-state value).

Transient behavior of the MSE: J(n) = J_min + Σ_{i=1}^{M} γ_i c_i^n + J_ex(∞), where γ_i = λ^T g_i g_i^T [x(0) − x(∞)], i = 1, 2, ..., M, and c_i is the i-th eigenvalue of the matrix B. This equation provides the basis for a deeper understanding of the operation of the LMS algorithm in a wide-sense stationary environment, as described next in the form of four properties.

Property 1. The transient component of the mean-squared error, J(n), does not exhibit oscillations: J_tr(n) = Σ_{i=1}^{M} γ_i c_i^n, where c_i is the i-th eigenvalue of the matrix B and the γ_i are constant coefficients. Property 2. The transient component of J(n) dies out, that is, the LMS algorithm is convergent in the mean square, if and only if 0 < μ < 2/λ_max. Property 3. The final value of the excess mean-squared error is less than the minimum mean-squared error, J_ex(∞) < J_min, if Σ_{i=1}^{M} μ λ_i/(2 − μ λ_i) < 1 holds.

Property 4. The misadjustment, defined as the ratio of the steady-state value J_ex(∞) of the excess mean-squared error to the minimum mean-squared error, equals J_ex(∞)/J_min = Σ_{i=1}^{M} μ λ_i/(2 − μ λ_i), which is less than unity if the step-size parameter μ satisfies the condition Σ_{i=1}^{M} μ λ_i/(2 − μ λ_i) < 1. Note that these properties follow from x_i(n+1) = (1 − μ λ_i)^2 x_i(n) + μ^2 J_min λ_i, whose steady state is x_i(∞) = μ J_min/(2 − μ λ_i), and J_ex(∞) = Σ_{i=1}^{M} λ_i x_i(∞) = J_min Σ_{i=1}^{M} μ λ_i/(2 − μ λ_i).
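A quick numerical check of Property 4 and of the small-step-size approximation μ tr(R)/2; the eigenvalues below are assumed example values.

import numpy as np

mu = 0.02
lam = np.array([0.2, 1.0, 2.5, 4.0])          # example eigenvalues of R (assumed)

misadjustment = np.sum(mu * lam / (2.0 - mu * lam))
approx = 0.5 * mu * np.sum(lam)               # small-step-size approximation: mu * tr(R) / 2

print(misadjustment, approx)                  # nearly equal for small mu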

Analysis of the LMS Algorithm (second approach). We have ŵ(n+1) = ŵ(n) + μ u(n) e^*(n). Let e_o(n) = d(n) − w_o^H u(n). Then the update equation can be written in terms of the weight-error vector as ε(n+1) = [I − μ u(n) u^H(n)] ε(n) − μ u(n) e_o^*(n)  (5.56). We analyse convergence in an average sense: the algorithm is run many times and we study the ensemble-average behavior. Since E[I − μ u(n) u^H(n)] = I − μ R, the averaged equation ε_0(n+1) = [I − μ R] ε_0(n) − μ u(n) e_o^*(n) has a solution very close to that of Eq. (5.56) for small μ. Here we use the expectation; however, it is actually the ensemble average.

The solution of (5.56) can be expressed as a sum of partial functions, ε(n) = ε_0(n) + ε_1(n) + ε_2(n) + ..., where the ε_i(n), i ≥ 1, are high-order corrections to the zero-order solution and ε(n) → ε_0(n) as μ → 0. Define the zero-mean difference matrix P(n) = u(n) u^H(n) − R. Then Eq. (5.56) can be written as ε_0(n+1) + ε_1(n+1) + ε_2(n+1) + ... = (I − μ R)[ε_0(n) + ε_1(n) + ε_2(n) + ...] − μ P(n)[ε_0(n) + ε_1(n) + ε_2(n) + ...] − μ u(n) e_o^*(n), which, under the small-step-size assumption, decouples into ε_i(n+1) = (I − μ R) ε_i(n) + f_i(n), i = 0, 1, 2, ...,  (5.61) where i refers to the iteration order. The driving force f_i(n) is defined on the next slide.

The driving forces are f_0(n) = −μ u(n) e_o^*(n) and f_i(n) = −μ P(n) ε_{i-1}(n) for i = 1, 2, .... Thus, the time-varying system characterized by the stochastic difference equation (5.56) is transformed into a set of equations having the same basic format as (5.61), such that the solution of the i-th equation in the set (i.e., step i in the iterative procedure) follows from the (i−1)-th equation. Correspondingly, K(n) = E[ε(n) ε^H(n)] = Σ_{i,k} E[ε_i(n) ε_k^H(n)], (i, k) = 0, 1, 2, ..., which can be ordered as K(n) = K_0(n) + μ K_1(n) + μ^2 K_2(n) + ..., where K_0(n) = E[ε_0(n) ε_0^H(n)] and, for j ≥ 1, K_j(n) collects the terms E[ε_i(n) ε_k^H(n)] with i + k = 2j − 1, 2j.

Small Step-Size Analysis. Assumption I: the step size μ is small (how small?), so that the LMS filter acts like a low-pass filter with a very low cutoff frequency. Assumption II: the desired response is described by a linear multiple regression model that is matched exactly by the optimum Wiener filter, d(n) = w_o^H u(n) + e_o(n), where e_o(n) is the irreducible estimation error. Assumption III: the input and the desired response are jointly Gaussian.

Small Step-Size Analysis. The zero-order equation is ε_0(n+1) = (I − μ R) ε_0(n) + f_0(n) with f_0(n) = −μ u(n) e_o^*(n); note that we do not have this stochastic force term in Wiener filtering. Applying the similarity transformation resulting from the eigendecomposition of R, i.e., v(n) = Q^H ε_0(n) and φ(n) = Q^H f_0(n), we get v(n+1) = (I − μ Λ) v(n) + φ(n), where φ(n) is the transformed stochastic force vector. The components of φ(n) are uncorrelated!

E[φ(n)] = Q^H E[f_0(n)] = −μ Q^H E[u(n) e_o^*(n)] = 0 by the second assumption (principle of orthogonality). E[φ(n) φ^H(n)] = μ^2 Q^H E[u(n) e_o^*(n) e_o(n) u^H(n)] Q; by the third (Gaussian) assumption, E[u(n) e_o^*(n) e_o(n) u^H(n)] = E[u(n) e_o^*(n)] E[e_o(n) u^H(n)] + E[e_o^*(n) e_o(n)] E[u(n) u^H(n)] = J_min R, so E[φ(n) φ^H(n)] = μ^2 J_min Q^H R Q = μ^2 J_min Λ.

Small Step-Size Analysis. The components of v(n) are uncorrelated, and each satisfies a first-order stochastic difference equation driven by the stochastic force, v_k(n+1) = (1 − μ λ_k) v_k(n) + φ_k(n) (a Brownian-motion-like equation, as in thermodynamics). Iterating from n = 0, the solution is the sum of a natural component and a forced component, and it can be shown that E[v_k(n)] = v_k(0)(1 − μ λ_k)^n and E[|v_k(n)|^2] = μ J_min/(2 − μ λ_k) + (1 − μ λ_k)^{2n} (|v_k(0)|^2 − μ J_min/(2 − μ λ_k)).

Learning Curves. Two kinds of learning curves are used: the mean-square error (MSE) learning curve, J(n) = E[|e(n)|^2] with e(n) = d(n) − ŵ^H(n) u(n), and the mean-square deviation (MSD) learning curve, D(n) = E[||ε(n)||^2] with ε(n) = w_o − ŵ(n). Ensemble averaging: the results of many independent realizations are averaged. What is the relation between MSE and MSD? For small μ, ε(n) ≈ ε_0(n), and since the Euclidean norm of a vector is invariant to rotation by a similarity transformation, D(n) ≈ E[||ε_0(n)||^2] = Σ_k E[|v_k(n)|^2].

Learning Curves (continued).


Learning Curves. For small μ, ε(n) ≈ ε_0(n) under Assumptions II and III, and the excess MSE becomes J_ex(n) ≈ Σ_{k=1}^{M} λ_k E[|v_k(n)|^2]. LMS performs worse than SD: there is always an excess MSE.

Learning Curves. λ_min D(n) ≤ J_ex(n) ≤ λ_max D(n), or equivalently J_ex(n)/λ_max ≤ D(n) ≤ J_ex(n)/λ_min. The mean-square deviation D(n) is thus lower- and upper-bounded by scaled versions of the excess MSE; the two have a similar response, decaying as n grows.

Transient Behavior and Convergence. For small μ, E[v_k(n)] = v_k(0)(1 − μ λ_k)^n. Hence, for convergence, |1 − μ λ_k| < 1 for all k, or 0 < μ < 2/λ_max. The ensemble-average learning curve of an LMS filter does not exhibit oscillations; rather, it decays exponentially to the constant value J(∞) = J_min + J_ex(∞).

Proof: J(n) = J_min + J_ex(n) = J_min + tr[R K_0(n)] = J_min + Σ_{k=1}^{M} λ_k E[|v_k(n)|^2] = J_min + Σ_{k=1}^{M} λ_k [μ J_min/(2 − μ λ_k) + (1 − μ λ_k)^{2n} (|v_k(0)|^2 − μ J_min/(2 − μ λ_k))]. Letting n → ∞, J(∞) = J_min + J_min Σ_{k=1}^{M} μ λ_k/(2 − μ λ_k) ≈ J_min + (μ J_min/2) Σ_{k=1}^{M} λ_k = J_min + (μ J_min/2) tr(R) for small μ, so J_ex(∞) ≈ (μ/2) J_min tr(R).

Misadjustment. Assume 0 < μ < 2/λ_max, or the more conservative 0 < μ < 2/tr(R). Define the misadjustment as the ratio J_ex(∞)/J_min. For small μ, from the previous slide, J_ex(∞)/J_min = Σ_{k=1}^{M} μ λ_k/(2 − μ λ_k) ≈ (μ/2) Σ_{k=1}^{M} λ_k = (μ/2) tr(R), or equivalently (μ/2) times the total tap-input power, since tr(R) = M E[|u(n)|^2].

Average Time Constant. From SD we know that the k-th MSE time constant is τ_mse,k ≈ 1/(2 μ λ_k). Defining the average eigenvalue λ_av = (1/M) Σ_{k=1}^{M} λ_k, the average time constant is τ_mse,av ≈ 1/(2 μ λ_av); but then the misadjustment can be written as J_ex(∞)/J_min ≈ (μ/2) M λ_av = M/(4 τ_mse,av).
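A small numerical sketch of these relations (the eigenvalues are assumed example values); it confirms that M/(4 τ_mse,av) and (μ/2) tr(R) give the same misadjustment.

import numpy as np

mu, M = 0.01, 8
lam = np.linspace(0.5, 3.0, M)                # example eigenvalues of R (assumed)
lam_av = lam.mean()

tau_mse_av = 1.0 / (2.0 * mu * lam_av)        # average time constant, in iterations
misadj_from_tau = M / (4.0 * tau_mse_av)      # misadjustment via the time constant
misadj_direct = 0.5 * mu * lam.sum()          # (mu/2) tr(R)

print(tau_mse_av, misadj_from_tau, misadj_direct)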

Observations. The misadjustment is directly proportional to the filter length M for a fixed τ_mse,av; inversely proportional to the time constant τ_mse,av, so slower convergence results in lower misadjustment; and directly proportional to the step size μ, so a smaller step size results in lower misadjustment. The time constant is inversely proportional to the step size, so a smaller step size results in slower convergence. A large μ requires the inclusion of the higher-order terms ε_k(n), k ≥ 1, in the analysis; this is difficult to analyse, the small-step-size analysis is no longer valid, and the learning curve becomes more noisy.

LMS vs. SD. The main goal is to minimise the mean square error (MSE). The optimum solution is found from the Wiener-Hopf equations, which require the auto/cross-correlations and achieve the minimum value of the MSE, J_min. LMS and SD are iterative algorithms designed to find w_o. SD has direct access to the auto/cross-correlations (exact measurements): w(n+1) = w(n) + μ[p − R w(n)], n = 0, 1, 2, ...; it can approach the Wiener solution w_o and go down to J_min. LMS uses instantaneous estimates instead (noisy measurements): ŵ(n+1) = ŵ(n) + μ u(n) e^*(n), n = 0, 1, 2, ...; it fluctuates around w_o in a Brownian-motion manner and reaches at best J(∞) > J_min.
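The contrast can be seen in a small system-identification simulation; the setup below (white input, a known target vector w_o, additive noise) is an assumed example, chosen so that R and p are known exactly for SD.

import numpy as np

rng = np.random.default_rng(1)
M, N, mu = 4, 20000, 0.01
w_o = rng.standard_normal(M)                   # unknown parameter vector to identify

u = rng.standard_normal(N)                     # white, unit-variance input
d = np.convolve(u, w_o)[:N] + 0.05 * rng.standard_normal(N)

R = np.eye(M)                                  # exact statistics (white input)
p = w_o.copy()                                 # p = R w_o

w_sd = np.zeros(M)
w_lms = np.zeros(M)
for n in range(M, N):
    w_sd = w_sd + mu * (p - R @ w_sd)          # deterministic gradient (exact measurements)
    u_n = u[n:n-M:-1]
    e = d[n] - w_lms @ u_n
    w_lms = w_lms + mu * u_n * e               # instantaneous (noisy) gradient

print(np.linalg.norm(w_sd - w_o))              # essentially zero: SD reaches w_o
print(np.linalg.norm(w_lms - w_o))             # small but nonzero: LMS hovers around w_o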

LMS vs. SD: Learning curves. SD has a well-defined curve composed of decaying exponentials; for LMS, the curve is composed of noisy decaying exponentials.

Statistical Wave Theory. As the filter length increases, M → ∞, the propagation of electromagnetic disturbances along a transmission line towards infinity is analogous to the propagation of signals in an infinitely long LMS filter. For a finite-length LMS filter (transmission line), corrections have to be made at the edges to handle reflections; as the length increases, the reflection region shrinks relative to the total filter. This imposes a limit on the step size to avoid instability as M → ∞: μ < 2/(M S_max), where S_max is the maximum value of the power spectral density S(ω) of the tap inputs u(n). If this upper bound is exceeded, instability is observed.

Optimality of LMS. A single realization of LMS is not optimum in the MSE sense; the ensemble average is. The previous derivation is heuristic (replacing the auto/cross-correlations with their instantaneous estimates). In what sense, then, is LMS optimum? It can be shown that LMS minimises the maximum energy gain of the filter, under a constraint on the step size (see the limits below). Minimising the maximum of something is a minimax problem: LMS is the solution of an H∞ optimization criterion.

Optimality of LMS. Provided that the step-size parameter satisfies the limits on the next slide, then no matter how different the initial weight vector is from the unknown parameter vector w_o of the multiple regression model, and irrespective of the value of the additive disturbance, the error energy produced at the output of the LMS filter will never exceed a certain level.

Limits on the Step Size

Robustness of the LMS algorithm. A single realization of the LMS algorithm is not optimal in the least-mean-square sense. How should uncertainties in modeling and disturbance variations be handled? With robust algorithms. The rough idea of the H∞ (or minimax) criterion is to assess whether unknown disturbances are attenuated or amplified in the adaptation: the ratio between the energy of the estimation errors and the energy of the unknown disturbances, i.e., the energy gain from the disturbances to the estimation errors. The LMS (or NLMS, depending on the formulation of the criterion) algorithm is optimal in the H∞ sense.

For the LMS algorithm it can be shown that Σ_{i=1}^{n} |ξ(i)|^2 ≤ (1/μ) ||ε(0)||^2 + Σ_{i=1}^{n} |e_o(i)|^2, where ξ(i) = ε^H(i) u(i) is the undisturbed estimation error; i.e., the sum of squared errors is always upper bounded by the combined effect of the initial weight uncertainty and the noise energy, which is the robust behavior of the LMS algorithm.

Numerical Example: Directionality of Convergence. When the eigenvalue spread of R is large, the convergence of the LMS algorithm has a directional nature: with increasing eigenvalue spread of R, convergence becomes faster in some directions than in others. (Plots for eigenvalue spreads χ(R) = 12.9 and χ(R) = 2.9.)

Numerical example: channel equalization. The transmitted signal is a random Bernoulli sequence of ±1's. The transmitted signal is corrupted by a channel with raised-cosine impulse response h_n = (1/2)[1 + cos(2π(n − 2)/W)] for n = 1, 2, 3, and zero otherwise. To the output of the channel, white Gaussian noise with variance σ_v^2 = 0.001 is added. The received signal is processed by a linear, 11-tap FIR equalizer adapted with the LMS algorithm.
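A sketch of one realization of this experiment is given below. The raised-cosine channel, the noise variance and the delay of 7 samples follow the slides and reference [1]; the step size, the random seed and the run length are assumed values for illustration.

import numpy as np

rng = np.random.default_rng(2)
N, M, mu, W, delay = 20000, 11, 0.025, 3.1, 7
sigma_v = np.sqrt(0.001)

h = np.array([0.0] + [0.5 * (1 + np.cos(2 * np.pi * (n - 2) / W)) for n in (1, 2, 3)])

a = rng.choice([-1.0, 1.0], size=N)            # Bernoulli +/-1 sequence
x = np.convolve(a, h)[:N] + sigma_v * rng.standard_normal(N)

w = np.zeros(M)                                # 11-tap FIR equalizer
sq_err = np.zeros(N)
for n in range(M, N):
    u_n = x[n:n-M:-1]                          # equalizer tap-input vector
    e = a[n - delay] - w @ u_n                 # desired response = delayed transmitted symbol
    w = w + mu * u_n * e
    sq_err[n] = e**2

print(sq_err[-2000:].mean())                   # steady-state squared error, one realization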

Block diagram of the adaptive equalizer experiment. (a) Impulse response of the channel; the equalizer delay is δ = 2 + 5 = 7. (b) Impulse response of the optimum transversal equalizer.

The amplitude distortion, and hence the eigenvalue spread, was controlled by the parameter W. The time evolution of the squared error e^2(n) was averaged over 200 independent realizations (trials). The first tap input of the equalizer at time n equals u(n) = Σ_{k=1}^{3} h_k a(n − k) + v(n), where a(n) is the transmitted symbol and v(n) the channel noise.

The resulting correlation matrix of the equalizer tap inputs is quindiagonal: only the main diagonal and the two diagonals on either side of it are nonzero.

Experiment 1: Effect of Eigenvalue Spread. The time evolution of the squared error e^2(n) was averaged over 200 trials. Results are shown for different values of the eigenvalue spread, with the step size fixed.

Fig. 5.21

Fig. 5.22: Ensemble-average impulse response of the adaptive equalizer (after 1000 iterations) for each of four different eigenvalue spreads.

Experiment 2: Effect of Step Size. W = 3.1, χ(R) = 11.1238. Fig. 5.23.

Numerical example: adaptive prediction. We use a first-order autoregressive (AR) process to study the effects of ensemble averaging on the transient characteristics of the LMS algorithm for real data. Consider an AR process u(n) of order 1, described by the difference equation u(n) = a u(n-1) + ν(n), where ν(n) is a zero-mean white-noise process of variance σ_ν^2. The real LMS algorithm for the adaptation of the (one and only) tap weight of the predictor is ŵ(n+1) = ŵ(n) + μ u(n-1) f(n), where f(n) is the prediction error, defined by f(n) = u(n) − ŵ(n) u(n-1).
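A sketch of one realization of this predictor; the values of a, σ_u^2 and μ are taken from the slides that follow, and σ_ν^2 is chosen so that the process variance equals σ_u^2.

import numpy as np

rng = np.random.default_rng(3)
N, mu, a = 5000, 0.05, 0.99
sigma_u2 = 0.93627
sigma_v2 = sigma_u2 * (1.0 - a**2)             # white-noise variance implied by sigma_u2

# generate the AR(1) process u(n) = a*u(n-1) + v(n)
u = np.zeros(N)
v = np.sqrt(sigma_v2) * rng.standard_normal(N)
for n in range(1, N):
    u[n] = a * u[n-1] + v[n]

# one-tap LMS predictor: f(n) = u(n) - w(n)*u(n-1),  w(n+1) = w(n) + mu*u(n-1)*f(n)
w = 0.0
f = np.zeros(N)
for n in range(1, N):
    f[n] = u[n] - w * u[n-1]
    w = w + mu * u[n-1] * f[n]

print(w)                                       # a single realization hovers around a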

Fig. 5.13

Fig. 5.14: 100 trials, ŵ(0) = 0, μ = 0.05.

Fig. 5.15

Fig. 5.16: Ensemble averaging over 100 independent trials.

Comparison of Experimental Results with Theory. With the AR process of order one (M = 1), we note the following for the problem at hand: M = 1, and the initial condition used for the curve labeled "theory" in Fig. 5.17.

The theoretical curve, labeled "theory" in Fig. 5.18, is obtained from Eqs. (5.127) and (5.128).


Fig. 5.17: μ = 0.001.

Fig. 5.18: a = 0.99, σ_u^2 = 0.93627, μ = 0.001; ensemble average over 100 independent trials.

Variants of the LMS algorithm

The LMS algorithm and its basic variants

Summary. The LMS algorithm is the workhorse of linear adaptive filtering: it is simple to implement, and its model-independent nature gives robust performance. Its main limitation is slow convergence. The principal factors governing convergence are the step size μ (next slide) and the eigenvalues of the correlation matrix R of the input signal.

1/μ is roughly the memory of the algorithm: a small μ gives slow convergence but a small steady-state excess MSE. Regarding the eigenvalues of the correlation matrix R of the input signal: the time constant of convergence is limited by the smallest eigenvalues, the excess MSE is primarily determined by the largest eigenvalues, and a large eigenvalue spread is likely to slow down the convergence. Several variants of the LMS algorithm exist.

References. [1] S. Haykin, Adaptive Filter Theory, Chap. 9, 3rd ed., Prentice Hall, 1996. [2] T.K. Moon, W.C. Stirling, Mathematical Methods and Algorithms for Signal Processing, Chap. 14.6, Prentice Hall, 2000. [3] G.O. Glentis, K. Berberidis, S. Theodoridis, "Efficient least squares adaptive algorithms for FIR transversal filtering," IEEE Signal Processing Magazine, vol. 16, no. 4, pp. 13-41, July 1999. [4] T. Kailath, A. Sayed, B. Hassibi, Linear Estimation, Chap. 1.6, Prentice Hall, 2000. Homework W5 (Ch. 5): Pr. 5, 8, 10, 16. Computer Assignment 1: Ch. 5, Pr. 21 and 22.