Sparse Least Mean Square Algorithm for Estimation of Truncated Volterra Kernels

Bijit Kumar Das 1, Mrityunjoy Chakraborty 2
Department of Electronics and Electrical Communication Engineering
Indian Institute of Technology, Kharagpur, INDIA
E.Mail : 1 bijitbijit@gmail.com, 2 mrityun@ece.iitkgp.ernet.in

Abstract — The Volterra series model, though a popular tool in modeling many practical nonlinear systems, suffers from the problem of over-parameterization, as too many coefficients need to be identified, requiring very long data records. On the other hand, it is often observed that of all the model coefficients, only a few are prominent while the others are relatively insignificant. The sparsity inherent in such systems is, however, not exploited by standard estimators, which are based on minimization of some L2 norm like the mean square error or the sum of squared errors. This paper draws inspiration from the domain of compressive sampling and proposes an adaptive algorithm for estimating sparse Volterra kernels by embedding an L1 norm penalty on the coefficients into the quadratic least mean squares (LMS) cost function. It is shown that the proposed algorithm can achieve a lower steady-state mean square error than that of a standard LMS based algorithm for identifying the Volterra model.

Index terms : Volterra series, L1 norm, sparse systems, LMS adaptation

I. INTRODUCTION

Adaptive identification of nonlinear systems has found many applications in areas like control, communications, biological signal processing, image processing, etc. For systems with sufficiently smooth nonlinearity, the Volterra series [1] offers a well-appreciated model of the system, expressing the output as a polynomial expansion of the input. The number of terms in the Volterra series, however, increases exponentially as the model order increases and, as a result, a truncated model (up to 2nd order) is often considered in practice. The coefficients of such a model are then identified by an appropriate adaptive algorithm, e.g., the LMS algorithm [9]-[10]. In various applications, however, one comes across sparse Volterra models that have several coefficients zero or negligible. Such a priori knowledge about the sparsity of the system, if embedded in the identification algorithm, can boost the performance of the algorithm. However, except for [2], sparsity has so far not been exploited in the identification of Volterra systems. In [2], new algorithms, both batch and recursive, have been developed; the recursive algorithm, being a variant of recursive least squares (RLS) [9]-[10], carries the demerit of a huge computational burden. This motivates us to develop an LMS based alternative which exploits the sparse nature of the Volterra system model.

II. PROBLEM FORMULATION AND ALGORITHM

A. LMS Algorithm for the Truncated Volterra Series Model

The development of a gradient-type LMS adaptive algorithm for truncated Volterra series nonlinear models follows the same line of development as for linear systems. The truncated p-th order Volterra series expansion is given as [1],

y(n) = h_0 + \sum_{m_1=0}^{N-1} h_1(m_1) x(n-m_1) + \sum_{m_1=0}^{N-1} \sum_{m_2=0}^{N-1} h_2(m_1,m_2) x(n-m_1) x(n-m_2) + ... + \sum_{m_1=0}^{N-1} \sum_{m_2=0}^{N-1} ... \sum_{m_p=0}^{N-1} h_p(m_1,m_2,...,m_p) x(n-m_1) x(n-m_2) ... x(n-m_p).   (1)

Assuming h_0 = 0 and p = 2, the weight vector for the adaptive filter at the n-th index is given by

H(n) = [h_1(0;n), h_1(1;n), ..., h_1(N-1;n), h_2(0,0;n), h_2(0,1;n), ..., h_2(0,N-1;n), h_2(1,1;n), ..., h_2(N-1,N-1;n)]^T.   (2)

Similarly, the input vector at the n-th index is given as

X(n) = [x(n), x(n-1), ..., x(n-N+1), x^2(n), x(n)x(n-1), ..., x(n)x(n-N+1), x^2(n-1), ..., x^2(n-N+1)]^T.   (3)

The linear and quadratic coefficients are updated separately by minimizing the instantaneous square of the error,

J(n) = e^2(n),   (4)

where

e(n) = d(n) - \hat{d}(n),   (5)

and \hat{d}(n) is the estimate of d(n).
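To make the coefficient ordering in (2)-(3) concrete, the second-order input vector X(n) can be assembled as follows. This is our own minimal NumPy sketch (the function name and interface are illustrative, not the paper's); it stacks the N linear terms followed by the N(N+1)/2 distinct quadratic products x(n-i)x(n-j), i <= j, matching the symmetric-kernel ordering of (3).

```python
import numpy as np

def volterra_input(x, n, N):
    """Second-order Volterra input vector X(n) of eq. (3).

    x : 1-D input signal, n : current time index (n >= N-1),
    N : memory length. Returns a vector of length N + N*(N+1)//2.
    """
    # Linear part: x(n-i), i = 0..N-1
    lin = np.array([x[n - i] for i in range(N)])
    # Quadratic part: x(n-i)*x(n-j) for i <= j (symmetry halves the count)
    quad = np.array([x[n - i] * x[n - j]
                     for i in range(N) for j in range(i, N)])
    return np.concatenate([lin, quad])

X = volterra_input(np.arange(10.0), 5, 3)
```

For N = 15 this gives 15 + 120 = 135 entries; including the constant term h_0 accounts for the 136 coefficients quoted later in the paper.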
This results in the following update equations:

h_1(m_1; n+1) = h_1(m_1; n) - \frac{\mu}{2} \frac{\partial e^2(n)}{\partial h_1(m_1;n)} = h_1(m_1; n) + \mu e(n) x(n-m_1),   (6)

Proceedings of the Second APSIPA Annual Summit and Conference, pages 350-354, Biopolis, Singapore, 14-17 December 2010. © 2010 APSIPA. All rights reserved.
and

h_2(m_1, m_2; n+1) = h_2(m_1, m_2; n) - \frac{\mu}{2} \frac{\partial e^2(n)}{\partial h_2(m_1,m_2;n)} = h_2(m_1, m_2; n) + \mu e(n) x(n-m_1) x(n-m_2),   (7)

where \mu is the so-called step-size, used to control the speed of convergence and to ensure stability of the filter.

Fig. 1. Second order Volterra series model with N = 3.

Using the weight vector notation H(n), we can combine the two update equations into one coefficient update equation:

e(n) = d(n) - H^T(n) X(n),   (8)

H(n+1) = H(n) + \mu X(n) e(n),   (9)

where the value of \mu is chosen such that

0 < \mu < \frac{2}{\lambda_{max}},   (10)

with \lambda_{max} denoting the maximum eigenvalue of the autocorrelation matrix of the input vector X(n). For nonlinear Volterra filters, the eigenvalue spread of this autocorrelation matrix is quite large, which leads to slow convergence. Note that the symmetry of the coefficients reduces the length of the coefficient vector by half.

B. Sparse Nature of Volterra Kernels

In many applications, the associated Volterra kernels are sparse, meaning that many of the entries of H(n) are zero. Consider, for example, the Linear-Nonlinear-Linear (LNL) model employed in various applications like modeling the effects of nonlinear amplifiers in OFDM, the satellite communication channel, or the transfer function of loudspeakers and headphones. The LNL model consists of a linear filter h_a(k), k = 0, 1, ..., L_a - 1, in cascade with a memoryless nonlinearity f(x) and a second linear filter h_b(k), k = 0, 1, ..., L_b - 1. The overall memory is thus L = L_a + L_b - 1. If the nonlinear function is analytic on an open set (a, b), it accepts a Taylor series expansion:

f(x) = \sum_{p=0}^{\infty} c_p x^p,  x \in (a, b).

It can then be shown that the p-th order Volterra kernel is given by [1]

h_p(k_1, k_2, ..., k_p) = c_p \sum_{k=0}^{L_b - 1} h_b(k) h_a(k_1 - k) ... h_a(k_p - k).   (11)

In (11), there exist p-tuples (k_1, k_2, ..., k_p) for which there is no k \in \{0, ..., L_b - 1\} such that (k_i - k) \in \{0, ..., L_a - 1\} for all i = 1, ..., p. For these p-tuples, the Volterra kernel equals zero.
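The zero pattern implied by (11) can be checked numerically. The sketch below is our own illustration (function name and filter values are arbitrary): it computes the second-order kernel of an LNL cascade directly from (11) and shows that entries whose index pair admits no valid k vanish.

```python
import numpy as np

def lnl_kernel2(ha, hb, c2):
    """Second-order Volterra kernel of an LNL cascade, per eq. (11):
    h2(k1, k2) = c2 * sum_k hb(k) ha(k1 - k) ha(k2 - k),
    summing only over k with both (k1 - k) and (k2 - k) in [0, La)."""
    La, Lb = len(ha), len(hb)
    L = La + Lb - 1                      # overall memory of the cascade
    h2 = np.zeros((L, L))
    for k1 in range(L):
        for k2 in range(L):
            h2[k1, k2] = c2 * sum(hb[k] * ha[k1 - k] * ha[k2 - k]
                                  for k in range(Lb)
                                  if 0 <= k1 - k < La and 0 <= k2 - k < La)
    return h2

# Illustrative filters: La = 3, Lb = 2, so L = 4 and h2 is 4 x 4.
h2 = lnl_kernel2(np.array([1.0, 0.0, 2.0]), np.array([1.0, 1.0]), 0.4)
```

For instance, the entry (k_1, k_2) = (0, L-1) is structurally zero: k_1 = 0 forces k = 0, but then k_2 - k = L - 1 falls outside the support of h_a.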
Further, if the second filter in the LNL model is dropped, one obtains the so-called Wiener model, for which the p-th order Volterra kernel is expressed as

h_p(k_1, ..., k_p) = c_p h_a(k_1) ... h_a(k_p).   (12)

Due to the separability of the kernel in (12), if the impulse response h_a(k) is itself sparse, the Volterra kernel becomes even sparser. Apart from these nonlinear systems with special structures, it has been observed in many applications that only a few kernel coefficients contribute to the output [3]. Furthermore, sparsity of the Volterra representation can also arise when the degree of the nonlinearity and the system memory are not known a priori; in this case, kernel estimation must be performed jointly with model order selection. Based on these considerations, exploiting the sparsity present in many Volterra representations is well motivated.

C. A Sparsity-Aware Variant of LMS for Volterra Kernel Estimation

In the proposed method, we employ L_1 norm regularization to exploit the a priori information that the Volterra model is over-parameterized and sparse. Combining the L_1 norm penalty of the coefficient vector with the instantaneous square error in (4), a new cost function J_1(n) is defined as

J_1(n) = e^2(n) + \gamma \|H(n)\|_1,   (13)

where \|.\|_1 denotes the L_1 norm of the vector considered. Using gradient descent updating, the new filter update is then obtained as

H(n+1) = H(n) - \mu \frac{\partial J_1(n)}{\partial H(n)} = H(n) + \mu X(n) e(n) - \rho \, sign(H(n)),   (14)

where \rho = \mu\gamma and sign(.) is a component-wise sign function defined as

sign(x) = x / |x| for x \neq 0, and sign(x) = 0 for x = 0.   (15)
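Equations (14)-(15) amount to a one-line modification of the standard LMS update. Below is our own minimal NumPy sketch of a single iteration (NumPy's np.sign coincides with the definition in (15), returning 0 at 0):

```python
import numpy as np

def za_lms_step(H, X, d, mu, rho):
    """One iteration of the sparsity-aware (zero-attracting) LMS of eq. (14).

    H : current coefficient vector, X : current input vector,
    d : desired output, mu : step-size, rho = mu * gamma.
    Returns the updated coefficients and the a priori error e(n)."""
    e = d - H @ X                        # eq. (8): e(n) = d(n) - H^T(n) X(n)
    H_next = H + mu * e * X - rho * np.sign(H)   # eq. (14)
    return H_next, e

H_next, e = za_lms_step(np.array([1.0, -2.0, 0.0]), np.ones(3), 0.0, 0.1, 0.01)
```

Setting rho = 0 recovers the standard LMS recursion (9); the extra term shrinks every nonzero tap toward zero by a fixed amount rho per iteration.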
Comparing with (9), (14) has the additional term -\rho \, sign(H(n)), which always attracts the tap coefficients towards zero; in other words, it exploits the sparse nature of the system model. This update equation is an extension of the ZA-LMS algorithm for linear sparse systems [4] to nonlinear, over-parameterized Volterra kernels. Following steps analogous to [4], the mean coefficient vector E[H(n)] can be shown to converge as

E[H(\infty)] = H_{opt} - \frac{\rho}{\mu} R^{-1} E[sign(H(\infty))],   (16)

if \mu satisfies (10). Similarly, the steady-state excess mean square error in this case is given by

P_{ex}(\infty) = \frac{\eta}{2-\eta} P_0 + \frac{\alpha_1}{(2-\eta)\mu} \rho \left( \rho - \frac{2\alpha_2}{\alpha_1} \right),   (17)

where

\alpha_1 = E[sign(H(\infty))^T (I - \mu R)^{-1} sign(H(\infty))],   (18)

with I denoting the identity matrix, R denoting the autocovariance matrix of the input, P_0 indicating the minimum mean square error, \eta = Tr(\mu R (I - \mu R)^{-1}), and

\alpha_2 = E[\|H(\infty)\|_1] - \|H_{opt}\|_1.   (19)

[Derivations of (16)-(19) are skipped in this paper and will be provided in the revised version of the manuscript.] For highly sparse systems, if \rho is properly selected between 0 and 2\alpha_2 / \alpha_1, a lower MSE than that obtainable under the standard LMS algorithm is observed.

III. SIMULATION STUDIES

A. Linear-Nonlinear-Linear (LNL) Model

Fig. 2. General nonlinear model (LNL).

The proposed algorithm was simulated using MATLAB. First, an LNL model was constructed as shown in Fig. 2, having a linear FIR filter with impulse response h(n) = [-0.9, 0, 0.87, 0, -0.3, 0.2, 0, 0]^T, in cascade with the memoryless nonlinearity f(x) = 0.4x^2 + 0.5x, which is followed by the same linear filter h(n). This system is exactly described by a Volterra expansion with N = 15 and p = 2, leading to a total of 136 kernel coefficients stored in the vector H. Out of these, only a few kernel coefficients are nonzero.
The system input was taken as a zero mean, unit variance white Gaussian process (i.e., N(0, 1)), while the output was corrupted by additive white Gaussian noise with zero mean and variance 0.001 (i.e., N(0, 0.001), standard deviation 0.0316), leading to a signal to noise ratio (SNR) of 30 dB. Fig. 3 shows the learning curves, obtained by plotting the observed mean square error (MSE), averaged over 3000 experiments, against the iteration index n for the following cases: (i) the standard LMS algorithm, as given by (9), with \mu = 0.0002 [the blue curve], and (ii) the proposed sparse LMS algorithm, given by (14), with \mu = 0.0002 and \rho = 0.000003 [the red curve]. While the convergence rates are almost identical for the two cases, as is to be expected since the same value of \mu is used for both, the proposed sparse LMS algorithm quite clearly has a lower steady-state mean square error than the standard LMS case.

B. Wiener Model

Fig. 4. The Wiener nonlinear model.

Next we considered a Wiener model, which is a cascade of a linear filter and a memoryless nonlinearity, as shown in Fig. 4. For our simulation, the linear filter chosen had the impulse response h(n) = [-0.9, 0, 0.87, 0, -0.3, 0.2, 0, 0, 0, 0, 0, 0, 0.514, -0.95, -0.12]^T and the memoryless nonlinearity was given by f(x) = 0.4x^2 + 0.5x. This system too is exactly described by a Volterra expansion with N = 15 and p = 2, leading to a total of 136 kernel coefficients, out of which only a few are nonzero. As before, the system input was taken as a zero mean, unit variance white Gaussian process and the output noise was taken to be zero mean, white Gaussian with variance 0.001, resulting in an SNR of 30 dB. The corresponding learning curves, obtained by averaging the MSE curves over 3000 experiments, are shown in Fig. 5, both for the standard LMS (the blue curve) with \mu = 0.0007 and for the proposed sparse LMS (the red curve) with \mu = 0.0007 and \rho = 0.000003. The sparse LMS shows a lower steady-state mean square error.
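The flavor of these experiments can be reproduced with a short script. The sketch below is our own simplified stand-in, not the paper's MATLAB code: it identifies a generic sparse coefficient vector with white Gaussian regressors instead of the exact 136-tap Volterra regressor, and all parameter values (M, mu, rho, tap positions) are illustrative.

```python
import numpy as np

M = 64                                    # number of coefficients (illustrative)
H_true = np.zeros(M)
H_true[[3, 17, 40]] = [0.9, -0.5, 0.3]    # a highly sparse "system"

mu, rho = 0.005, 5e-6                     # step-size and zero-attractor weight
n_iter, noise_std = 20000, 0.0316         # noise variance 0.001, as in the paper

def run(sparse):
    """Run (ZA-)LMS identification; return the steady-state MSE
    (average of the last 2000 squared a priori errors)."""
    rng = np.random.default_rng(1)        # same data realization for both runs
    H = np.zeros(M)
    sq_err = np.empty(n_iter)
    for n in range(n_iter):
        X = rng.standard_normal(M)
        d = H_true @ X + noise_std * rng.standard_normal()
        e = d - H @ X
        H += mu * e * X                   # standard LMS term, eq. (9)
        if sparse:
            H -= rho * np.sign(H)         # zero-attracting term, eq. (14)
        sq_err[n] = e * e
    return sq_err[-2000:].mean()

mse_lms, mse_za = run(False), run(True)
```

With rho chosen small (below the 2*alpha_2/alpha_1 bound of Section II-C), the zero-attractor suppresses the fluctuation of the many zero taps while only mildly biasing the few active ones, which is the mechanism behind the lower steady-state MSE in Figs. 3 and 5.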
Quite clearly, under the same convergence rate, the proposed algorithm exhibits a considerably lower steady-state mean square error than the standard LMS algorithm.

IV. CONCLUSIONS

An algorithm has been presented for adaptive identification of nonlinear systems described by sparse, truncated Volterra kernels. The algorithm introduces an L_1 norm penalty on the filter coefficients into the instantaneous square error and derives an LMS-like algorithm that forces the insignificant coefficients to converge to zero faster. Simulation results showing the superiority of the proposed method over LMS are provided.
Fig. 3. MSE (in dB) versus iteration index n for the general nonlinear (LNL) model, for the standard LMS and the sparsity-aware LMS.

Fig. 5. MSE (in dB) versus iteration index n for the nonlinear Wiener model, for the standard LMS and the sparsity-aware LMS.
REFERENCES
[1] V. Mathews and G. Sicuranza, Polynomial Signal Processing, John Wiley and Sons Inc., 2000.
[2] V. Kekatos, D. Angelosante, and G. B. Giannakis, "Sparsity-aware estimation of nonlinear Volterra kernels," in Proc. 3rd IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), Aruba, Dutch Antilles, 2009.
[3] S. Benedetto and E. Biglieri, "Nonlinear equalization of digital satellite channels," IEEE J. Select. Areas Commun., no. 1, pp. 57-62, Jan. 1983.
[4] Y. Chen, Y. Gu, and A. O. Hero, "Sparse LMS for system identification," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Taipei, Taiwan, Apr. 2009.
[5] T. Ogunfunmi, Adaptive Nonlinear System Identification: The Volterra and Wiener Model Approaches, Springer, 2007.
[6] R. Tibshirani, "Regression shrinkage and selection via the lasso," J. Royal Statist. Soc. B, vol. 58, pp. 267-288, 1996.
[7] E. Candès, "Compressive sampling," in Int. Congress of Mathematicians, vol. 3, pp. 1433-1452, 2006.
[8] R. Baraniuk, "Compressive sensing," IEEE Signal Processing Magazine, vol. 25, pp. 21-30, March 2007.
[9] S. Haykin, Adaptive Filter Theory, 3rd ed., Prentice Hall.
[10] B. Farhang-Boroujeny, Adaptive Filters, John Wiley and Sons.
[11] D. G. Manolakis, V. K. Ingle, and S. M. Kogon, Statistical and Adaptive Signal Processing, McGraw-Hill.
[12] D. Angelosante, J. A. Bazerque, and G. B. Giannakis, "Online adaptive estimation of sparse signals: where RLS meets the l1-norm," IEEE Transactions on Signal Processing (to appear).
[13] A. H. Sayed, Fundamentals of Adaptive Filtering, John Wiley and Sons, 2003.