Assessment of the efficiency of the LMS algorithm based on spectral information


(Invited Paper)

Aaron Flores and Bernard Widrow
ISL, Department of Electrical Engineering, Stanford University, Stanford, CA, USA
Email: {aflores,widrow}@stanford.edu

Abstract—The LMS algorithm is used to find the optimal minimum mean-squared error (MMSE) solutions of a wide variety of problems. Unfortunately, its convergence speed depends heavily on its initial conditions when the autocorrelation matrix R of its input vector has a high eigenvalue spread. In many applications, such as system identification or channel equalization, R is Toeplitz. In this paper we exploit the Toeplitz structure of R to show that when the weight vector is initialized to zero, the convergence speed of LMS is related to the similarity between the input PSD and the power spectrum of the optimum solution.

I. INTRODUCTION

The LMS algorithm is widely used to find the minimum mean-squared error (MMSE) solutions of a variety of linear estimation problems, and its efficiency has been studied extensively over the last decades [1]-[18]. The convergence speed of LMS depends on two factors: the eigenvalue distribution of the input autocorrelation matrix R, and the chosen initial condition. Modes corresponding to small eigenvalues converge much more slowly than those corresponding to large eigenvalues, and the initialization of the algorithm determines how much excitation each of these modes receives. In practice, it is not known how the chosen initial conditions excite each of the modes, which makes the speed of convergence difficult to predict.

Our analysis of the convergence speed is concerned with applications where the input vector comes from a tapped delay line, inducing a Toeplitz structure in the R matrix. Using the Toeplitz nature of R, we show that the speed of convergence of the LMS algorithm can be estimated from its input power spectral density and the spectrum of the difference between the initial weight vector and the optimum solution. The speed of convergence of LMS is assessed qualitatively by comparing it to its ideal counterpart, the LMS/Newton algorithm, which is often used as a benchmark for adaptive algorithms [17], [18].

In Section II we describe both the LMS and LMS/Newton algorithms and derive the approximations to their learning curves on which our transient analysis is based. In Section III we define the performance metric used to evaluate speed of convergence, and in Section IV we show how that metric can be estimated from spectral information. We illustrate our result with simulations in Section V and summarize our conclusions and future work in Section VI.

II. THE LMS AND LMS/NEWTON ALGORITHMS

Adaptive algorithms such as LMS and LMS/Newton are used to solve linear estimation problems like the one depicted in Fig. 1, where the input vector $x_k = [x_{1k} \ldots x_{Lk}]^T$ and the desired response $d_k$ are jointly stationary random processes, $w_k = [w_{1k}\; w_{2k} \ldots w_{Lk}]^T$ is the weight vector, $y_k = x_k^T w_k$ is the output, and $\epsilon_k = d_k - y_k$ is the error. The mean square error (MSE) is defined as $\xi_k = E[\epsilon_k^2]$, and it is a quadratic function of the weight vector. The optimal weight vector that minimizes $\xi_k$ is given by $w^* = R^{-1} p$, where $R = E[x_k x_k^T]$ is the input autocorrelation matrix (assumed to be full rank) and $p = E[d_k x_k]$ is the cross-correlation vector. The minimum MSE (MMSE) obtained using $w^*$ is denoted by $\xi_{\min}$.

Fig. 1. Adaptive linear combiner.
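As a numerical illustration of this setup, the following sketch estimates the Wiener solution $w^* = R^{-1}p$ from sample statistics of a tapped-delay-line input. It is a minimal sketch, not from the paper: the filter length, sample count, input coloring, and noise level are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
L, K = 8, 50_000                        # filter length and sample count (assumed)

# Colored stationary input and a desired response d_k produced by an
# unknown FIR system plus independent observation noise (all assumed).
x = np.convolve(rng.standard_normal(K), [1.0, 0.5], mode="same")
w_true = rng.standard_normal(L)
d = np.convolve(x, w_true, mode="full")[:K] + 0.01 * rng.standard_normal(K)

# Input vectors x_k = [x_k, x_{k-1}, ..., x_{k-L+1}]^T as columns of X.
X = np.stack([np.concatenate([np.zeros(i), x[:K - i]]) for i in range(L)])

R = (X @ X.T) / K                       # sample autocorrelation matrix (near-Toeplitz)
p = (X @ d) / K                         # sample cross-correlation vector
w_star = np.linalg.solve(R, p)          # Wiener solution w* = R^{-1} p
print(np.max(np.abs(w_star - w_true)))  # small: w* recovers the unknown system
```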
A. The LMS algorithm

Often in practice $R^{-1}p$ cannot be calculated because the statistics R and p are unknown. However, when samples of $x_k$ and $d_k$ are available, they can be used to adjust the weight vector iteratively toward an approximation of $w^*$. The simplest and most widely used algorithm for this is LMS [1]. It performs instantaneous gradient-descent adaptation of the weight vector:

$$w_{k+1} = w_k + 2\mu\,\epsilon_k x_k. \qquad (1)$$

The step-size parameter is $\mu$, and the initial weight vector $w_0$ is set arbitrarily by the user. The MSE sequence $\xi_k$ corresponding to the sequence of adapted weight vectors $w_k$ is commonly known as the learning curve.
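A direct implementation of (1) might look as follows; the matrix-of-columns input convention and the function name are ours, not the paper's.

```python
import numpy as np

def lms(X, d, mu, w0=None):
    """LMS adaptation, eq. (1): w_{k+1} = w_k + 2*mu*eps_k*x_k.

    X  : (L, K) array whose k-th column is the input vector x_k.
    d  : (K,) desired response samples.
    Returns the (K, L) weight history and the (K,) a priori errors.
    """
    L, K = X.shape
    w = np.zeros(L) if w0 is None else np.asarray(w0, dtype=float).copy()
    W, eps = np.empty((K, L)), np.empty(K)
    for k in range(K):
        eps[k] = d[k] - X[:, k] @ w          # error eps_k = d_k - y_k
        w = w + 2.0 * mu * eps[k] * X[:, k]  # instantaneous gradient step
        W[k] = w
    return W, eps
```

Squaring the errors and averaging over many independent runs estimates the learning curve $\xi_k$.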

Next we derive an approximation for $\xi_k$ that is the basis of our speed-of-convergence analysis. Since R is symmetric, we can diagonalize it as $R = Q \Lambda Q^T$, where $\Lambda$ is a diagonal matrix of the eigenvalues $\lambda_i$ of R, and Q is a real orthonormal matrix with the corresponding eigenvectors of R as columns. Let $v_k = w_k - w^*$ be the weight-vector deviation from the optimal solution and $v'_k = Q^T v_k$. Let $F_k$ be the vector obtained from the diagonal of $E[v'_k v_k'^T]$, and $\lambda = [\lambda_1\ \lambda_2 \ldots \lambda_L]^T$. Assuming successive samples of the random processes $\{x_k, d_k\}$ to be independent, it can be shown [3] that the MSE at time k can be expressed as

$$\xi_k = \xi_{\min} + \lambda^T F_k. \qquad (2)$$

If we further assume $x_k$ to be Gaussian, it was shown in [5] and [3] that $F_k$ obeys the recursion

$$F_{k+1} = \left(I - 4\mu\Lambda + 8\mu^2\Lambda^2 + 4\mu^2\lambda\lambda^T\right) F_k + 4\mu^2 \xi_{\min}\lambda, \qquad (3)$$

where $F_k$ is shown [5] to converge if $\mu < 1/(3\,\mathrm{Tr}(R))$. This condition on $\mu$ allows us to approximate (3) by

$$F_{k+1} \approx (I - 4\mu\Lambda) F_k + 4\mu^2 \xi_{\min}\lambda. \qquad (4)$$

Denoting by $\mathbf{1} \in \mathbb{R}^L$ a vector of ones and using (4),

$$F_k \approx (I - 4\mu\Lambda)^k (F_0 - \mu\xi_{\min}\mathbf{1}) + \mu\xi_{\min}\mathbf{1}. \qquad (5)$$

Substituting (5) in (2), we obtain the following approximation for the learning curve:

$$\xi_k \approx \xi_\infty + \lambda^T (I - 4\mu\Lambda)^k (F_0 - \mu\xi_{\min}\mathbf{1}), \qquad (6)$$

where $\xi_\infty = \lim_{k\to\infty} \xi_k \approx \xi_{\min}(1 + \mu\,\mathrm{Tr}(R))$.

B. The LMS/Newton algorithm

The LMS/Newton algorithm [6] is an ideal variant of the LMS algorithm that uses $R^{-1}$ to whiten its input. Although most of the time it cannot be implemented in practice due to the lack of knowledge of R, it is of theoretical importance as a benchmark for adaptive algorithms [17], [18]. The LMS/Newton algorithm is

$$w_{k+1} = w_k + 2\mu\lambda_{avg} R^{-1} \epsilon_k x_k, \qquad (7)$$

where $\lambda_{avg} = \frac{1}{L}\sum_{i=1}^{L}\lambda_i$. It is well known [6] that the LMS/Newton algorithm is equivalent to the LMS algorithm with learning rate $\mu\lambda_{avg}$ and an input previously whitened and normalized such that $\Lambda = I$. Therefore, assuming the same independence and Gaussian statistics of $x_k$ and $d_k$ as with LMS, we can replace $\mu$ by $\mu\lambda_{avg}$ and $\Lambda$ by $I$ in (6) to obtain the following approximation for the LMS/Newton learning curve:

$$\xi_k \approx \xi_\infty + (1 - 4\mu\lambda_{avg})^k\, \lambda^T (F_0 - \mu\xi_{\min}\mathbf{1}), \qquad (8)$$

where the asymptotic MSE $\xi_\infty$ is the same as for LMS.

III. TRANSIENT EFFICIENCY

Examining (6) and (8), we see that the learning curve of LMS is a sum of geometric sequences (modes) with ratios $1 - 4\mu\lambda_i$, whereas for LMS/Newton it consists of a single geometric sequence with ratio $1 - 4\mu\lambda_{avg}$. If all the eigenvalues of R are equal, the learning curves of LMS and LMS/Newton are the same. Generally, the eigenvalues of R are not equal, and the LMS learning curve consists of multiple modes, some faster than LMS/Newton and some slower, as depicted in Fig. 2.

Fig. 2. LMS and LMS/Newton modes.

To evaluate the speed of convergence of a learning curve we use the natural measure suggested in [5],

$$J = \sum_{k=0}^{\infty} (\xi_k - \xi_\infty). \qquad (9)$$

Small values of J indicate fast convergence and large values indicate slow convergence. Using (6) and (8) to obtain J for LMS and LMS/Newton respectively, we get

$$J_{LMS} \approx \frac{\mathbf{1}^T F_0 - \mu L \xi_{\min}}{4\mu} = \frac{v_0^T v_0}{4\mu} - \frac{L\xi_{\min}}{4}, \qquad (10)$$

$$J_{LMS/Newton} \approx \frac{\lambda^T F_0 - \mu\lambda_{avg} L \xi_{\min}}{4\mu\lambda_{avg}} = \frac{v_0^T R v_0}{4\mu\lambda_{avg}} - \frac{L\xi_{\min}}{4}. \qquad (11)$$

To compare the speed of convergence of LMS to that of LMS/Newton, we define what we call the transient efficiency:

$$\text{Transient Efficiency} = \frac{J_{LMS/Newton}}{J_{LMS}}. \qquad (12)$$

If the transient efficiency is bigger/smaller than one, then LMS performs better/worse than LMS/Newton in the sense of the J metric. Assuming that the initial weight vector $w_0$ is far enough from $w^*$ that $v_0^T v_0 \gg \mu L \xi_{\min}$ and $v_0^T R v_0 / \lambda_{avg} \gg \mu L \xi_{\min}$, we obtain from (10), (11) and (12)

$$\text{Transient Efficiency} \approx \frac{v_0^T R v_0}{\lambda_{avg}\, v_0^T v_0}. \qquad (13)$$
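Equations (10)-(13) are straightforward to evaluate numerically. A minimal sketch (the function names are ours) that computes both J values and their ratio from R, $v_0$, $\mu$, and $\xi_{\min}$:

```python
import numpy as np

def transient_efficiency(R, v0, mu, xi_min):
    """Ratio J_{LMS/Newton} / J_{LMS} of eqs. (10)-(12)."""
    L = R.shape[0]
    lam_avg = np.trace(R) / L
    J_lms = v0 @ v0 / (4 * mu) - L * xi_min / 4                   # eq. (10)
    J_newton = v0 @ R @ v0 / (4 * mu * lam_avg) - L * xi_min / 4  # eq. (11)
    return J_newton / J_lms                                       # eq. (12)

# When v0 is far from zero, the xi_min terms are negligible and the ratio
# reduces to the Rayleigh-quotient form of eq. (13):
def transient_efficiency_approx(R, v0):
    lam_avg = np.trace(R) / len(v0)
    return (v0 @ R @ v0) / (lam_avg * (v0 @ v0))                  # eq. (13)
```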

Since (13) is invariant to scaling of $v_0$ and R, we can assume without loss of generality $\|v_0\| = 1$ and $\lambda_{avg} = 1$, obtaining the compact expression

$$\text{Transient Efficiency} \approx v_0^T R v_0. \qquad (14)$$

This is the first contribution of this paper, and it will be used to obtain an approximation of the transient efficiency in terms of spectra in the next section.

IV. TRANSIENT EFFICIENCY IN TERMS OF SPECTRAL INFORMATION

In this section we show that when R is Toeplitz, the transient efficiency can be expressed in terms of the Fourier spectrum of $v_0$ and the input power spectral density. The Toeplitz structure of R arises in applications where the input vector comes from a tapped delay line (TDL), as shown in Fig. 3, i.e. $x_k = [x_k\ x_{k-1} \ldots x_{k-L+1}]^T$, where $x_k$ is a stationary scalar random process with autocorrelation sequence $\phi_{xx}[n] = E[x_k x_{k-n}]$. The input autocorrelation matrix is therefore given by $R_{k,l} = \phi_{xx}[k-l]$.

Fig. 3. Tapped-delay-line adaptive filter.

We define the input power spectral density vector $\Phi_{xx} \in \mathbb{R}^M$ as the uniformly spaced samples of the DTFT of the N-point truncation of the input autocorrelation function $\phi_{xx}[n]$, i.e.

$$\Phi_{xx}[m] = \sum_{n=-N}^{N} \phi_{xx}[n]\, e^{-j 2\pi \frac{mn}{M}}, \quad m = 0, 1, \ldots, M-1, \qquad (15)$$

with $N \ge L-1$ and $M \ge N + L$. In order to show how $\Phi_{xx}$ is related to the transient efficiency, the following lemma will be needed.

Lemma (Spectral factorization of R): Let $\Psi$ be the $M \times M$ diagonal matrix with $\Psi_{m,m} = \Phi_{xx}[m]$ as its diagonal entries, and let U be the $L \times M$ matrix with elements $U_{l,m} = \frac{1}{\sqrt{M}} e^{j 2\pi \frac{ml}{M}}$, $0 \le l \le L-1$, $0 \le m \le M-1$. Then R can be factored as

$$R = U \Psi U^*. \qquad (16)$$

Proof: Let $T = U \Psi U^*$. Then

$$T_{k,l} = \sum_{m=0}^{M-1} \frac{1}{M}\, e^{j2\pi \frac{km}{M}}\, \Psi_{m,m}\, e^{-j2\pi \frac{ml}{M}} = \frac{1}{M}\sum_{m=0}^{M-1} \Phi_{xx}[m]\, e^{j2\pi \frac{m(k-l)}{M}} \qquad (17, 18)$$

$$= \frac{1}{M}\sum_{m=0}^{M-1} \sum_{n=-N}^{N} \phi_{xx}[n]\, e^{-j2\pi \frac{mn}{M}}\, e^{j2\pi \frac{m(k-l)}{M}} = \sum_{n=-N}^{N} \phi_{xx}[n]\, \frac{1}{M}\sum_{m=0}^{M-1} e^{j2\pi \frac{m(k-l-n)}{M}} \qquad (19, 20)$$

$$= \sum_{n=-N}^{N} \phi_{xx}[n]\, \delta[k-l-n] = \phi_{xx}[k-l] = R_{k,l}, \qquad (21, 22)$$

where the step from (20) to (21) uses the inequalities $-(L-1) \le k-l \le L-1$, $-N \le n \le N$, and $M \ge N+L$.

Let $V_0 = U^* v_0$ be the (scaled) M-point DFT of $v_0$, i.e.

$$V_0[m] = \frac{1}{\sqrt{M}} \sum_{n=0}^{L-1} v_0[n]\, e^{-j2\pi \frac{mn}{M}}, \quad m = 0, 1, \ldots, M-1. \qquad (23)$$

Using the Lemma and (23), we obtain

$$v_0^T R v_0 = v_0^T U \Psi U^* v_0 = \sum_{m=0}^{M-1} \Phi_{xx}[m]\, |V_0[m]|^2. \qquad (24)$$

Furthermore, inserting (24) into (14), we arrive at the following approximation for the transient efficiency in terms of the input PSD and the spectrum of $v_0$:

$$\text{Transient Efficiency} \approx \sum_{m=0}^{M-1} \Phi_{xx}[m]\, |V_0[m]|^2. \qquad (25)$$

We should note that, although the independence assumption under which the approximation in (13) was derived is violated when a tapped delay line is used, the approximation has been observed to hold in simulations. Since $v_0$ and R are scaled such that $\|v_0\| = 1$ and $\lambda_{avg} = 1$, we can show that

$$\frac{1}{M}\sum_{m=0}^{M-1} \Phi_{xx}[m] = 1, \qquad \sum_{m=0}^{M-1} |V_0[m]|^2 = 1. \qquad (26)$$

These normalizations of $\Phi_{xx}[m]$ and $V_0[m]$ allow us to interpret the right-hand side of (25) as a weighted average of $\Phi_{xx}[m]$, with the weights given by $|V_0[m]|^2$. Therefore, if $\Phi_{xx}[m]$ is large in the frequency ranges where $|V_0[m]|^2$ is also large, the transient efficiency will likely be bigger than one, implying that LMS will outperform LMS/Newton under the J metric. On the other hand, if $\Phi_{xx}[m]$ tends to be small in the frequency ranges where $|V_0[m]|^2$ is large, the transient efficiency will likely be less than one, indicating that LMS will perform worse than LMS/Newton in the J-metric sense.
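The spectral form (25) can be computed with two FFTs. The sketch below is our construction, with M a free choice satisfying $M \ge N+L$: it builds $\Phi_{xx}[m]$ from the truncated autocorrelation as in (15), forms $V_0[m]$ as in (23), applies the normalizations (26), and returns the weighted average (25). By (24), the result matches the Rayleigh quotient (13) up to truncation effects.

```python
import numpy as np

def transient_efficiency_spectral(phi_xx, v0, M=1024):
    """Eq. (25): weighted average of the input PSD with weights |V0[m]|^2.

    phi_xx : autocorrelation samples phi_xx[0..L-1] (so N = L-1 here).
    v0     : initial weight-vector deviation w0 - w* (length L).
    """
    L = len(v0)
    assert M >= 2 * L - 1                  # M >= N + L with N = L - 1
    # Circularly arranged two-sided autocorrelation; its FFT gives eq. (15).
    acf = np.concatenate([phi_xx, np.zeros(M - 2 * L + 1), phi_xx[1:][::-1]])
    Phi = np.real(np.fft.fft(acf))         # real by symmetry (truncation can make
                                           # some samples slightly negative)
    Phi = Phi / phi_xx[0]                  # (1/M) sum Phi = lam_avg = 1, eq. (26)
    V0 = np.fft.fft(v0, M) / np.sqrt(M)    # eq. (23)
    W2 = np.abs(V0) ** 2 / (v0 @ v0)       # weights sum to one, eq. (26)
    return np.sum(Phi * W2)                # eq. (25)
```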

An important application of (25) is the special case where the weight vector is initialized to zero, $w_0 = 0$, hence $v_0 = -w^*$, which results in

$$|V_0[m]|^2 = |W^*[m]|^2, \qquad (27)$$

where $W^*[m]$ is the M-point discrete Fourier transform of the optimal (Wiener) solution $w^*$, as defined in (23). Therefore, when the weight vector is initialized to zero, the speed of convergence of the LMS algorithm can be estimated by the weighted average of the input PSD with the weights given by the spectrum of the Wiener solution. This observation can be very useful in system identification applications, where prior approximate knowledge of the input PSD and of the frequency response of the plant to be identified may be used to estimate the performance of LMS before implementing it. For example, if the plant has a low-pass frequency response and the input PSD has very little power above the cut-off frequency, LMS will converge very fast. On the other hand, if the plant has a high-pass nature and the input PSD is small at high frequencies, LMS will be very slow.

In channel equalization applications the spectrum of the optimum solution is the inverse of the channel frequency response; hence, if we initialize the weight vector to zero, (25) will be small, implying slow convergence of LMS, a phenomenon often observed when using LMS in equalization or adaptive inverse control problems. We illustrate these ideas with two examples in the next section; a simulation sketch follows the first example.

V. SIMULATIONS

Consider the system identification problem depicted in Fig. 4. The additive plant noise $n_k$ is independent of the input. The adaptive filter and the plant consist of the same number of taps, so the spectrum of the optimum solution coincides with the plant frequency response. The weight vector is initialized to zero, and the learning rate is set so that $4\mu\,\mathrm{Tr}(R) = 0.5$. The power spectrum of the input and the frequency response of the plant to be identified are shown in Fig. 5. The eigenvalue spread for this exercise was 579. By inspection of the input PSD and the frequency response of the plant, we expect the transient efficiency to be bigger than one, indicating that LMS should perform better than LMS/Newton; this is confirmed by the learning curves shown in Fig. 6. The estimate of the transient efficiency obtained from the simulation was greater than one and very close to the approximation given by (25).

Fig. 4. System identification example.

Fig. 5. Input power spectrum and plant frequency response (eigenvalue spread = 579).

Fig. 6. Learning curves of LMS and LMS/Newton for the system identification example.
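The flavor of this experiment can be reproduced with the sketch below; the filter length, step size, input coloring, and plant are all our assumptions, not the paper's values. With a low-pass plant driven by a low-pass input, LMS should reach steady state faster than LMS/Newton; swapping in a high-pass plant reverses the outcome.

```python
import numpy as np

rng = np.random.default_rng(1)
L, K, mu = 16, 4000, 5e-3                  # assumed; mu < 1/(3 Tr(R)) for stability

x = np.convolve(rng.standard_normal(K), np.ones(4) / 4, mode="same")  # low-pass input
plant = np.ones(L) / L                                                # low-pass plant
X = np.stack([np.concatenate([np.zeros(i), x[:K - i]]) for i in range(L)])
d = plant @ X + 1e-3 * rng.standard_normal(K)                         # plant noise n_k

R = (X @ X.T) / K
R_inv = np.linalg.inv(R)
lam_avg = np.trace(R) / L

w_lms, w_nwt = np.zeros(L), np.zeros(L)    # zero initial conditions, as in the paper
e_lms, e_nwt = np.empty(K), np.empty(K)
for k in range(K):
    xk = X[:, k]
    e_lms[k] = d[k] - xk @ w_lms
    e_nwt[k] = d[k] - xk @ w_nwt
    w_lms += 2 * mu * e_lms[k] * xk                      # LMS, eq. (1)
    w_nwt += 2 * mu * lam_avg * e_nwt[k] * (R_inv @ xk)  # LMS/Newton, eq. (7)

# Squared errors averaged over many independent runs would estimate the
# learning curves xi_k compared in Fig. 6; a single run gives a noisy view.
print(np.mean(e_lms[:500] ** 2), np.mean(e_nwt[:500] ** 2))
```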

To illustrate what happens if we use zero initial conditions when the input PSD and the Wiener-solution spectrum are very distinct, consider the simple equalization problem depicted in Fig. 7. The input to the channel is a white signal; hence the input PSD of the adaptive filter is given by the magnitude squared of the channel frequency response. The channel frequency response and the spectrum of the Wiener solution of the adaptive equalizer are shown in Fig. 8. The eigenvalue spread of the input to the adaptive filter was 75. As intuitively expected, the input PSD and the Wiener-solution spectrum have opposite shapes, resulting in a small transient efficiency and indicating that LMS should perform worse than LMS/Newton; this was corroborated by the learning curves shown in Fig. 9. The estimate of the transient efficiency obtained from the simulation was 0.3, very close to the approximation of 0.37 given by (25).

Fig. 7. Equalization example.

Fig. 8. Input PSD and Wiener-solution spectrum (frequency responses of the channel and of the optimum equalizer; eigenvalue spread = 75).

Fig. 9. Learning curves of LMS and LMS/Newton for the equalization example.

VI. CONCLUSIONS AND FUTURE WORK

We have shown that when the LMS algorithm is used to train a transversal adaptive filter with jointly stationary input and desired-response signals, its transient performance can be assessed by the inner product between the input PSD and the spectrum of the initial weight-vector deviation from the Wiener solution. This implies that when the initial weight vector is set to zero, the transient efficiency of the LMS algorithm can be predicted from approximate knowledge of the input PSD and of the frequency response of the Wiener solution. We described how this theory can be applied to system identification and equalization tasks; however, our results can be useful for predicting LMS performance in any application where the input autocorrelation matrix is Toeplitz and spectral information about the input and the optimum solution is available.

An extension of this analysis to nonstationary conditions is under way, for the case where the spectral characteristics of the input and of the optimum solution keep a general shape, say both being always low-pass. A similar analysis based on the mean square deviation (MSD) instead of the MSE is also currently being investigated.

REFERENCES

[1] B. Widrow and M. E. Hoff, Jr., "Adaptive switching circuits," in IRE WESCON Conv. Rec., pt. 4, 1960, pp. 96-104.
[2] B. Widrow, J. M. McCool, M. G. Larimore, and C. R. Johnson, Jr., "Stationary and nonstationary learning characteristics of the LMS adaptive filter," Proceedings of the IEEE, vol. 64, no. 8, pp. 1151-1162, Aug. 1976.
[3] L. L. Horowitz and K. D. Senne, "Performance advantage of complex LMS for controlling narrow-band adaptive arrays," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-29, no. 3, pp. 722-736, June 1981.
[4] B. Widrow and E. Walach, "On the statistical efficiency of the LMS algorithm with nonstationary inputs," IEEE Trans. Inf. Theory, vol. IT-30, no. 2, pp. 211-221, Mar. 1984.
[5] A. Feuer and E. Weinstein, "Convergence analysis of LMS filters with uncorrelated Gaussian data," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-33, no. 1, pp. 222-230, Feb. 1985.
[6] B. Widrow and S. D. Stearns, Adaptive Signal Processing. Upper Saddle River, NJ: Prentice Hall, 1985.
[7] F. M. Boland and J. B. Foley, "Stochastic convergence of the LMS algorithm in adaptive systems," Signal Processing, vol. 13, no. 4, pp. 339-352, Dec. 1987.
[8] B. Farhang-Boroujeny, "On statistical efficiency of the LMS algorithm in system modeling," IEEE Trans. Signal Process., vol. 41, no. 5, pp. 1947-1951, May 1993.
[9] D. R. Morgan, "Slow asymptotic convergence of LMS acoustic echo cancelers," IEEE Trans. Speech Audio Process., vol. 3, no. 2, pp. 126-136, Mar. 1995.
[10] S. C. Douglas and W. Pan, "Exact expectation analysis of the LMS adaptive filter," IEEE Trans. Signal Process., vol. 43, no. 12, pp. 2863-2871, Dec. 1995.
[11] B. Hassibi, A. H. Sayed, and T. Kailath, "H-infinity optimality of the LMS algorithm," IEEE Trans. Signal Process., vol. 44, no. 2, pp. 267-280, Feb. 1996.
[12] M. Reuter and J. R. Zeidler, "Nonlinear effects in LMS adaptive equalizers," IEEE Trans. Signal Process., vol. 47, no. 6, pp. 1570-1579, June 1999.
[13] K. J. Quirk, L. B. Milstein, and J. R. Zeidler, "A performance bound for the LMS estimator," IEEE Trans. Inf. Theory, vol. 46, no. 3, pp. 1150-1158, May 2000.
[14] N. R. Yousef and A. H. Sayed, "A unified approach to the steady-state and tracking analyses of adaptive filters," IEEE Trans. Signal Process., vol. 49, no. 2, pp. 314-324, Feb. 2001.
[15] H. J. Butterweck, "A wave theory of long adaptive filters," IEEE Trans. Circuits Syst. I, vol. 48, no. 6, pp. 739-747, June 2001.
[16] O. Dabeer and E. Masry, "Analysis of mean-square error and transient speed of the LMS adaptive algorithm," IEEE Trans. Inf. Theory, vol. 48, no. 7, pp. 1873-1894, July 2002.
[17] B. Widrow and M. Kamenetsky, "Statistical efficiency of adaptive algorithms," Neural Networks, vol. 16, no. 5-6, pp. 735-744, June-July 2003.
[18] S. Haykin and B. Widrow, Eds., Least-Mean-Square Adaptive Filters. Hoboken, NJ: Wiley-Interscience, 2003.