An efficient conjugate gradient method in the frequency domain: Application to channel equalization


Signal Processing 88 (2008), www.elsevier.com/locate/sigpro

Aris S. Lalos, Kostas Berberidis
Department of Computer Engineering & Informatics, University of Patras, Rio-Patras, Greece
E-mail addresses: lalos@ceid.upatras.gr (A.S. Lalos), berberid@ceid.upatras.gr (K. Berberidis)

Received 16 November 2006; received in revised form 3 July 2007; accepted 4 July 2007. Available online 14 July 2007.

Abstract

In this paper, a new block adaptive filtering algorithm, based on the conjugate gradient (CG) method of optimization, is proposed. A Toeplitz approximation of the auto-correlation matrix is used for the estimation of the gradient vector, and the involved quantities are updated on a block-by-block basis. Due to this formulation, the algorithm can be efficiently implemented in the frequency domain (FD). To this end, recursive relations for the FD quantities, updated on a block-by-block basis, have been derived. Different ways of accelerating convergence, based either on the use of a preconditioner or on an appropriate decoupling of the direction vectors, are described. The applicability of the new algorithm to adaptive linear equalization (LE) and decision feedback equalization (DFE) has been studied. The proposed LE and DFE algorithms exhibit superior convergence properties as compared to existing adaptive algorithms, offering significant savings in computational complexity. © 2007 Elsevier B.V. All rights reserved.

Keywords: Frequency domain adaptive filtering; Conjugate gradient method; Block adaptive DFE

1. Introduction

Adaptive filtering has been an area of active research over the last decades due to its wide applicability in many signal processing and communication applications. The performance of an adaptive algorithm can be assessed by a number of factors, such as accuracy of the steady-state solution, convergence speed, tracking ability, computational complexity, numerical robustness, etc. [1]. In many real-time applications the issues of complexity and convergence speed play a crucial role; therefore many different techniques, such as partial updating schemes [2], IIR adaptive filtering [3], and frequency-domain (FD) adaptive filtering (e.g. [1,4]), have been developed to reduce computational complexity and accelerate convergence. The main idea in partial updating schemes is to reduce the computational complexity of an adaptive filter by adapting a group of the filter coefficients rather than the entire filter at every iteration. In non-stationary environments, however, partial updating adaptive filters might be undesirable, as they do not guarantee convergence. The use of IIR adaptive filters dramatically reduces the computational complexity, since a good performance can be achieved by estimating a small number of parameters. However, the stability of IIR schemes in certain implementation platforms is still an issue under investigation.

Frequency-domain adaptive filters (FDAF) turn out to be a good alternative in several practical applications due to their computational efficiency and their good convergence properties. Most of the existing FDAF algorithms are of the gradient type, that is, their time-domain (TD) counterparts are based on some variation of the least mean square (LMS) algorithm. On the contrary, to the best of our knowledge, little work has been done toward developing FD implementations of adaptive algorithms based on the conjugate gradient (CG) method. CG methods were developed for the iterative solution of systems of linear equations in the early 1950s [5,6]. Later, these methods were extended to solving non-linear equations or optimization problems [5,7]. The aim in CG methods is to accelerate the slow convergence rate of the steepest descent method while avoiding the involvement of the Hessian matrix associated with Newton methods. It is known that CG methods terminate in at most $M$ iterations, where $M$ is the number of variables being adjusted, when they are applied to the minimization of a positive definite quadratic function [6]. Boray and Srinath proposed a CG algorithm for adaptive filtering applications, including adaptive equalization, adaptive prediction and echo cancellation [8]. The convergence rate of their algorithm is comparable to that of the classical RLS algorithm. However, its computational burden is still high as compared with the LMS algorithm. Lim and Un suggested an algorithm for block adaptive filtering using the CG method [9]. More generally, the algorithms that are based on the CG method of optimization offer convergence comparable to RLS schemes at a computational cost that lies between the LMS and the RLS methods. In the existing literature, depending upon the CG algorithm under consideration, the cost of sequential processing of the data grows sub-exponentially or quadratically with the filter order.

In this paper, a new block adaptive CG algorithm implemented in the FD is developed. In the resulting algorithm, one CG iteration per block update is executed. To reduce the complexity of the algorithm, a Toeplitz approximation of the auto-correlation matrix is used. Thus, the involved matrix-vector products are calculated efficiently by employing the FFT algorithm. To avoid convergence degradation of the proposed scheme due to the above approximations, two different methods of accelerating convergence can be applied. The first method is based on the use of a preconditioner and the second one on an appropriate decoupling of the direction vectors. The idea of applying the preconditioned CG (PCG) method in block adaptive filtering has also been proposed in [10,11]. In the technique proposed there, the classical PCG method was applied for solving a Toeplitz system of equations that had resulted after applying a proper data windowing [1]. According to that technique, the PCG process is repeated for each incoming block of input data, and the filter parameters at each block are initialized to the optimal values determined from the preceding one. However, in the scheme proposed here, only one iteration of the PCG method is executed per incoming block of data. The second method of accelerating convergence is equivalent to the use of matrix step sizes instead of scalar ones, which has been successfully applied in other FD adaptive filtering algorithms such as FDLMS [1,4]. In the proposed technique, a method of the second type (i.e., decoupling of the successive direction vectors) has been applied, and the convergence of the resulting algorithm has become comparable to existing CG adaptive algorithms [12]. The application of the algorithm to channel equalization, for both the linear and the decision feedback equalization case, is also pointed out. The inherent causality problem appearing in the block formulation of the DFE is overcome by replacing the unknown decisions with properly derived tentative decisions [13]. The resulting algorithms exhibit convergence comparable with the RLS algorithm and a computational complexity that is proportional to the logarithm of the filter order $M$ per time step, as compared to the $O(M^2)$, $O(M)$ and $O(\log M)$ complexity of RLS, LMS and FD block LMS, respectively.

The paper is organized as follows. In Section 2, the problem is formulated, the CG method of optimization is described, and ways to implement the algorithm efficiently in the adaptive filtering context are discussed. In Section 3, the new block CG algorithm and its FD implementation are derived; furthermore, different ways of accelerating convergence are presented. In Section 4 the adaptive channel equalization case is treated. Finally, simulation results are provided in Section 5 and the work is concluded in Section 6.

2. Some preliminaries

2.1. Problem formulation

Before proceeding further, let us first define the notation used throughout the paper. $(\cdot)^*$, $(\cdot)^T$, $(\cdot)^H$ denote conjugate, transpose, and Hermitian transpose operations, respectively. We use $[\cdot]_{k,l}$, $[\cdot]_{:,l}$, $[\cdot]_{k,:}$ to denote the $(k,l)$th element, the $l$th column and the $k$th row of a matrix. The operator $E[\cdot]$ denotes expectation. Finally, in the TD, vectors and matrices are denoted by bold lower-case and bold upper-case letters, respectively. In the FD, vectors are denoted by calligraphic upper-case letters.

Let us now assume that we are given the input $\{x(n)\}$ and a desired output $\{u(n)\}$ of an unknown system. The unknown system is assumed to be a time-varying linear system, and the task is to obtain at each time $n$ an estimate $\mathbf{w}(n) = [w_1(n), \ldots, w_M(n)]^T$ of the system. This estimate is computed so that its output $y(n)$, given by
$$y(n) = \sum_{i=1}^{M} w_i(n)\,x(n-i) + \eta(n),$$
tracks in an optimal way the desired output $u(n)$ of the unknown system. The signal $\eta(n)$ is an i.i.d. noise sequence with power $\sigma_\eta^2$.

2.2. The CG algorithm

The CG method is an iterative method that may be applied to linear optimization problems, such as finding the minimum of a quadratic cost function. The objective of the method here is the minimization of a cost function defined as
$$V(\mathbf{w};\mathbf{R},\mathbf{b}) = E[|u(n)|^2] - \mathbf{b}^H\mathbf{w} - \mathbf{w}^H\mathbf{b} + \mathbf{w}^H\mathbf{R}\mathbf{w} \quad (1)$$
with respect to $\mathbf{w}$. The matrix $\mathbf{R} = E[\mathbf{x}^*(n)\mathbf{x}^T(n)]$ is the $M \times M$ correlation matrix of the system input $x(n)$, and $\mathbf{b} = E[u(n)\mathbf{x}^*(n)]$ is the cross-correlation vector between the desired output $u(n)$ of the system and the input $x(n)$. The vector $\mathbf{x}(n)$ is defined as $[x(n), \ldots, x(n-M+1)]^T$.

The CG method originates from the conjugate direction (CD) method [5]. The main idea in the CD method is to obtain a set of linearly independent direction vectors $\mathbf{p}(k)$ which are conjugate with respect to $\mathbf{R}$, so that the vector $\mathbf{w}_0$ that minimizes (1) can be expressed as a linear combination of these vectors. Thus, the solution at each iteration $k$ can be updated according to
$$\mathbf{w}(k) = \mathbf{w}(k-1) + \alpha(k)\mathbf{p}(k). \quad (2)$$
The $\mathbf{R}$-conjugate direction vectors are generated from a set of $M$ linearly independent vectors. In the CG method these $M$ vectors are the successive gradients of the cost function (1), obtained as the method progresses. The basic CG algorithm consists of Eqs. (3)-(7), summarized in Table 1, where the step sizes $\alpha(k)$ are selected as the minimizing arguments of $V(\mathbf{w}(k);\mathbf{R},\mathbf{b})$ with respect to $\alpha(k)$, the factors $\beta(k)$ are chosen to ensure $\mathbf{R}$-orthogonality of the direction vectors $\mathbf{p}(k)$, and $\mathbf{g}(k)$ is the negative gradient vector of $V(\mathbf{w}(k);\mathbf{R},\mathbf{b})$, defined as
$$\mathbf{g}(k) = \mathbf{b} - \mathbf{R}\mathbf{w}(k) = -\tfrac{1}{2}\nabla_{\mathbf{w}} V(\mathbf{w}(k);\mathbf{R},\mathbf{b}). \quad (8)$$

Previously developed CG adaptive algorithms execute several iterations of the CG algorithm per sample (or per block) update of $\mathbf{R}$, $\mathbf{b}$, in order to solve the system of equations $\mathbf{R}\mathbf{w} = \mathbf{b}$ [8,9]. In the case of sample-by-sample update of $\mathbf{R}$, $\mathbf{b}$, some modifications have been proposed in order to allow the CG algorithm to run one iteration per incoming sample with performance comparable with RLS or LMS-Newton [12]. However, the matrix-vector multiplication $\mathbf{R}\mathbf{p}(k)$ that is required at each iteration increases the total complexity to $O(M^2)$ multiplications per output sample. Therefore, the way that $\mathbf{R}$ and $\mathbf{b}$ are estimated directly influences the performance and the complexity of the algorithm. In the section that follows, some modifications that allow the algorithm to run efficiently with just one iteration per incoming block of input data are proposed.

Table 1. Conjugate gradient algorithm ($k_{\max}$ iterations)

Initial conditions: $\mathbf{w}(0)=\mathbf{0}$, $\mathbf{g}(0)=\mathbf{b}$, $\mathbf{p}(1)=\mathbf{g}(0)$, $k=1$.
While $k \le k_{\max}$:
$$\alpha(k) = \frac{\mathbf{p}^H(k)\mathbf{g}(k-1)}{\mathbf{p}^H(k)\mathbf{R}\mathbf{p}(k)}, \quad (3)$$
$$\mathbf{w}(k) = \mathbf{w}(k-1) + \alpha(k)\mathbf{p}(k), \quad (4)$$
$$\mathbf{g}(k) = \mathbf{g}(k-1) - \alpha(k)\mathbf{R}\mathbf{p}(k), \quad (5)$$
$$\beta(k) = \frac{\mathbf{g}^H(k)\mathbf{g}(k)}{\mathbf{g}^H(k-1)\mathbf{g}(k-1)}, \quad (6)$$
$$\mathbf{p}(k+1) = \mathbf{g}(k) + \beta(k)\mathbf{p}(k), \quad (7)$$
$k = k+1$.
End
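For concreteness, Eqs. (3)-(7) translate directly into the following NumPy sketch of the Table 1 recursion for a Hermitian positive definite $\mathbf{R}$; the function and variable names are ours, not the paper's:

```python
import numpy as np

def conjugate_gradient(R, b, k_max):
    """Basic CG method of Table 1 for solving R w = b, i.e. minimizing
    the quadratic cost of Eq. (1). With exact arithmetic it terminates
    in at most M = len(b) iterations."""
    w = np.zeros_like(b)             # w(0) = 0
    g = b.copy()                     # g(0) = b, the negative gradient (Eq. (8))
    p = g.copy()                     # p(1) = g(0)
    for _ in range(k_max):
        Rp = R @ p
        a = (p.conj() @ g) / (p.conj() @ Rp)            # Eq. (3)
        w = w + a * p                                   # Eq. (4)
        g_new = g - a * Rp                              # Eq. (5)
        beta = (g_new.conj() @ g_new) / (g.conj() @ g)  # Eq. (6)
        p = g_new + beta * p                            # Eq. (7)
        g = g_new
    return w
```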

3. Block FD CG algorithm

In the first part of this section the block update formulas of the weight vector, the gradient vector and the direction vector are derived. Expressions for computing the step sizes $\alpha(k)$ and the factors $\beta(k)$ are also presented. Matrix $\mathbf{R}$ is approximated by a Toeplitz matrix constructed from a time-averaged estimate of the auto-correlation sequence $r_0, r_1, \ldots, r_{M-1}$ of $\{x\}$, i.e.,
$$r_k(n) = \sum_{i=k+1}^{n} \lambda^{n-i}\, x^*(i)\,x(i-k).$$
Then, the BCG algorithm is developed in the FD. Convolutions, correlations and matrix-vector products are implemented efficiently in the FD using the fast Fourier transform (FFT), leading to a total complexity of $O(M\log M)$ multiplications per output block of data samples. An appropriate decoupling of the successive direction vectors is also applied, resulting in a considerable acceleration of the convergence speed.

3.1. Block CG algorithm derivation

As mentioned previously, block adaptive schemes based on the CG method have already been proposed in the literature. In these schemes, the CG algorithm runs several iterations per block update of the correlation quantities. In the proposed method, only one iteration of the CG method is executed per block update of the auto-correlation matrix and the cross-correlation vector. In such a case, the vector $\mathbf{g}(i)$ at each step $i$ of the algorithm is not completely orthogonal to the subspace spanned by $\{\mathbf{p}(0),\ldots,\mathbf{p}(i)\}$ [12]. Therefore, the condition $\mathbf{g}^H(i)\mathbf{p}(j)=0$ for $j<i$ does not hold true. In the rest of this section, the update equations for the coefficient, gradient and direction vectors are derived by applying the above idea. Furthermore, the scalar step size, along with the factor providing $\mathbf{R}$-orthogonality between the direction vectors, is computed in such a way that the orthogonality of the successive gradient vectors (which ensures convergence [12,14]) is preserved.

Initially, it is essential to focus on obtaining the block update recursions of the auto-correlation matrix and the cross-correlation vector, by using the covariance sectioning method of data windowing and an exponentially decaying data window [1]. Let us first write the symbol-by-symbol update recursions of the auto-correlation matrix $\mathbf{R}(n)$ and the cross-correlation vector $\mathbf{b}(n)$ as
$$\mathbf{R}(n) = \lambda\mathbf{R}(n-1) + \mathbf{x}^*(n)\mathbf{x}^T(n), \qquad \mathbf{b}(n) = \lambda\mathbf{b}(n-1) + \mathbf{x}^*(n)u(n),$$
where $\{x\}$, $\{u\}$ denote the system's input and desired output sequences, respectively. Based on the above recursions, it is straightforward to show that
$$\mathbf{R}(n) = \lambda^Q \mathbf{R}(n-Q) + \mathbf{X}^H_{QM}(n)\,\boldsymbol{\Lambda}_Q\,\mathbf{X}_{QM}(n), \quad (9)$$
$$\mathbf{b}(n) = \lambda^Q \mathbf{b}(n-Q) + \mathbf{X}^H_{QM}(n)\,\boldsymbol{\Lambda}_Q\,\mathbf{u}_{Q1}(n), \quad (10)$$
where $\boldsymbol{\Lambda}_Q$ is a diagonal matrix with elements $\lambda^{Q-1},\ldots,\lambda^2,\lambda,1$ on its main diagonal, and
$$\mathbf{X}_{QM}(n) = \begin{bmatrix} x(n-Q+1) & \cdots & x(n-M-Q+2) \\ \vdots & & \vdots \\ x(n) & \cdots & x(n-M+1) \end{bmatrix}, \quad (11)$$
$$\mathbf{u}_{Q1}(n) = [u(n-Q+1),\ldots,u(n)]^T. \quad (12)$$
This expression represents a single update of the auto-correlation matrix and the cross-correlation vector from time $n-Q$ to time $n$, based on the $Q$ most recent data vectors, and is thus called a block update. Without loss of generality we can substitute $n = kQ$ in (9), (10), where $k$ is a block time index. Also, for the sake of simplicity, from now on $kQ$ is replaced by $k$. A block recursive formula for $\mathbf{g}(k)$ can now be derived by using (2)-(5), i.e.,
$$\mathbf{g}(k) = \mathbf{b}(k) - \mathbf{R}(k)\mathbf{w}(k) = \lambda^Q \mathbf{g}(k-1) + \mathbf{X}^H_{QM}(k)\,\boldsymbol{\Lambda}_Q\,\mathbf{e}(k) - \alpha(k)\mathbf{R}(k)\mathbf{p}(k), \quad (13)$$
where $\mathbf{p}(k)$ is the direction vector related to $\mathbf{g}(k)$ via Eq. (7), and $\mathbf{e}(k)$ is the block error signal of length $Q$ defined as
$$\mathbf{e}(k) = \mathbf{u}_{Q1}(k) - \mathbf{X}_{QM}(k)\,\mathbf{w}(k-1). \quad (14)$$
All the elements of the error vector in the above equation, i.e., $e(kQ-Q+1),\ldots,e(kQ)$, depend on the same weight vector $\mathbf{w}(k-1)$.

In the above block recursion formula, the matrix-vector product may be efficiently implemented by employing the FFT algorithm. In order to do so, the matrix $\mathbf{R}(k)$ has to be approximated by a Toeplitz Hermitian matrix $\tilde{\mathbf{R}}(k)$ constructed from the first column of $\mathbf{R}(k)$. Thus, only the first column $\mathbf{r}(k)$ of $\tilde{\mathbf{R}}(k)$ is required at each iteration of the algorithm, and it corresponds to the first column of matrix $\mathbf{R}(k)$.

By inspecting (9), the following block recursive formula for vector $\mathbf{r}(k)$ is obtained:
$$\mathbf{r}(k) = \lambda^Q \mathbf{r}(k-1) + \mathbf{X}^H_{QM}(k)\,\boldsymbol{\Lambda}_Q\,\mathbf{x}_{Q1}(k) \quad (15)$$
with
$$\mathbf{x}_{Q1}(k) = [x(kQ-Q+1),\ldots,x(kQ)]^T. \quad (16)$$
At this point, it is important to mention that such an approximation does not lead to any performance degradation, at least at the steady state. When the correlation quantities converge, matrix $\mathbf{R}(k)$ takes a Toeplitz Hermitian structure. Since the first columns of matrices $\tilde{\mathbf{R}}(k)$ and $\mathbf{R}(k)$ are the same, these matrices at the steady state are obviously the same.

In the conventional CG algorithm, $\alpha(k)$ is a step size used in the update recursion of the weight vector, as shown in (2). It is usually chosen so that $V(\mathbf{w}(k-1)+\alpha(k)\mathbf{p}(k);\mathbf{R}(k),\mathbf{b}(k))$ is minimized. Because the coefficient vector $\mathbf{w}(k)$ minimizes $V$ in the direction of $\mathbf{p}(k)$, where $\mathbf{p}(k)$ is tangent to a contour of $V$ at $\mathbf{w}(k)$, the vector $\mathbf{g}(k)$ is orthogonal to $\mathbf{p}(k)$, that is,
$$\mathbf{p}^H(k)\mathbf{g}(k) = 0. \quad (17)$$
Hence, from (13), (17) the optimal step size $\alpha(k)$ can be obtained as
$$\alpha(k) = \frac{\mathbf{p}^H(k)\,\tilde{\mathbf{g}}(k-1)}{\mathbf{p}^H(k)\,\tilde{\mathbf{R}}(k)\,\mathbf{p}(k)}, \quad (18)$$
where
$$\tilde{\mathbf{g}}(k-1) = \lambda^Q\mathbf{g}(k-1) + \mathbf{X}^H_{QM}(k)\,\boldsymbol{\Lambda}_Q\,\mathbf{e}(k). \quad (19)$$
Having computed the weight coefficient vector and the negative gradient vector, the new direction vector $\mathbf{p}(k+1)$ is obtained as a linear combination of the current direction vector and the current negative gradient vector $\mathbf{g}(k)$, according to Eq. (7). The factor $\beta(k)$ that provides $\mathbf{R}$-orthogonality between the vectors $\mathbf{p}(k)$ cannot be obtained by directly applying Eq. (6), since a non-constant $\mathbf{R}$ is used at each iteration. One way to tackle the aforementioned problem is to apply Eq. (6) and periodically reset the direction vector to the negative gradient vector, $\mathbf{p}(k) = \mathbf{g}(k)$, in order to ensure the convergence of the algorithm [12]. Alternatively, another expression for $\beta(k)$ may be derived by imposing $\mathbf{p}(k+1)$ to be $\mathbf{R}$-orthogonal to $\mathbf{p}(k)$, i.e.,
$$\mathbf{p}^H(k+1)\,\tilde{\mathbf{R}}(k)\,\mathbf{p}(k) = 0. \quad (20)$$
Combining (20) and (7) gives
$$\beta(k) = -\frac{\mathbf{g}^H(k)\,\tilde{\mathbf{R}}(k)\,\mathbf{p}(k)}{\mathbf{p}^H(k)\,\tilde{\mathbf{R}}(k)\,\mathbf{p}(k)}. \quad (21)$$
This expression has also been proposed in [14]. Finally, as mentioned in [12], an improved performance can be achieved by applying the Polak-Ribière method for the computation of $\beta(k)$, given by
$$\beta(k) = \frac{(\mathbf{g}(k)-\mathbf{g}(k-1))^H\,\mathbf{g}(k)}{\mathbf{g}^H(k-1)\,\mathbf{g}(k-1)}. \quad (22)$$
The above formulae for computing $\beta(k)$ have been incorporated in the derived block CG algorithm, which is summarized for the $k$th data block in Table 2. It is important to mention that the formula used for computing the step size $\alpha(k)$ preserves the orthogonality of the gradient vectors, as shown in [12]. Furthermore, it has been proved in [14] that the proposed formulae for computing $\beta(k)$ ensure the convergence of a degenerated scheme [12] such as the one proposed. Finally, by using the rationale of [12], it can be shown that the algorithm has the same misadjustment as RLS. In steady state, since $\tilde{\mathbf{R}}(k) \approx \tilde{\mathbf{R}}(k-1) \to \mathbf{R}$ and $\tilde{\mathbf{b}}(k) \approx \tilde{\mathbf{b}}(k-1) \to \mathbf{b}$, the update relation of the gradient vector becomes $\mathbf{g}(k) \approx \mathbf{g}(k-1) - \alpha(k)\mathbf{R}\mathbf{p}(k)$. This means that the algorithm operates as the classical CG method. It is also known that matrix $\mathbf{R}$ is symmetric and positive definite when the power of the noise signal is not zero [1], and thus the algorithm at steady state will always converge, as long as the orthogonality of the gradient vectors is preserved. Furthermore, one should notice that the way matrix $\mathbf{R}(k)$ defined in (9) and vector $\mathbf{b}(k)$ defined in (10) are calculated is the same as in [12]. Thus, the cost function that is minimized by the BCG algorithm in steady state is identical to that minimized by the modified CG (MCG) algorithm proposed in [12], where it has been shown that MCG has the same misadjustment as RLS.

Table 2. The block conjugate gradient algorithm for the kth data block

Step 1: Update $\mathbf{r}(k-1)$ by Eq. (15) and construct the Toeplitz Hermitian matrix $\tilde{\mathbf{R}}(k)$.
Step 2: Calculate the step size $\alpha(k)$ via Eq. (18).
Step 3: Update the coefficient vector $\mathbf{w}(k-1)$ by using (2).
Step 4: Update the vector $\mathbf{g}(k-1)$ by applying (13).
Step 5: Calculate the factor $\beta(k)$ through Eq. (22) or (21).
Step 6: Update the direction vector $\mathbf{p}(k)$ according to Eq. (7).
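A minimal time-domain sketch of one Table 2 iteration follows; the dense products here are exactly the ones that the FD implementation of Section 3.2 replaces with FFTs. Function and variable names are ours, and we assume the gradient carried over from the previous block is nonzero:

```python
import numpy as np
from scipy.linalg import toeplitz

def bcg_block(w, g, p, r, X, x_blk, u_blk, lam, Q):
    """One iteration of the block CG algorithm (Table 2), time domain.
    X is the Q x M data matrix of Eq. (11); x_blk, u_blk are the Q most
    recent input and desired samples; lam is the forgetting factor."""
    Lam = lam ** np.arange(Q - 1, -1, -1)             # diagonal of Lambda_Q
    lam_Q = lam ** Q
    r = lam_Q * r + X.conj().T @ (Lam * x_blk)        # Step 1, Eq. (15)
    R_t = toeplitz(r)                                 # Toeplitz Hermitian R~(k)
    e = u_blk - X @ w                                 # block error, Eq. (14)
    g_t = lam_Q * g + X.conj().T @ (Lam * e)          # g~(k-1), Eq. (19)
    Rp = R_t @ p
    a = (p.conj() @ g_t) / (p.conj() @ Rp)            # Step 2, Eq. (18)
    w = w + a * p                                     # Step 3, Eq. (2)
    g_new = g_t - a * Rp                              # Step 4, Eq. (13)
    beta = ((g_new - g).conj() @ g_new) / (g.conj() @ g)  # Step 5, Eq. (22)
    p = g_new + beta * p                              # Step 6, Eq. (7)
    return w, g_new, p, r
```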

3.2. FD implementation

FD adaptive filtering is an efficient way to compute the convolutions involved in a block formulation of an adaptive algorithm [1] and a means to speed up convergence. By inspecting Eqs. (13)-(15), we can easily observe the linear correlations and convolutions that have to be computed. Specifically, the second term of (13) is a linear correlation between the error signal and the input signal vector. A linear convolution between the weights and the input signal vector appears in (14), and a linear correlation between the input signal and the input signal vector is involved in Eq. (15). In our case, we adopt the overlap-save (OS) sectioning method for computing the linear correlations and convolutions in the FD, since the overlap-add method requires more operations, whereas the cyclic convolution method does not converge to the Wiener solution (see [4]).

We first define the following FD quantities:
$$\mathcal{V}(k) = \mathrm{diag}\{\mathbf{F}_{2S}[x(kQ-2S+1),\ldots,x(kQ)]^T\}, \quad (23)$$
$$\mathcal{X}(k) = \mathbf{F}_{2S}\,\mathbf{G}^{01}_{2S,Q}\,\boldsymbol{\Lambda}_Q\,\mathbf{x}_{Q1}(k), \quad (24)$$
$$\mathcal{U}(k) = \mathbf{F}_{2S}\,\mathbf{G}^{01}_{2S,Q}\,\boldsymbol{\Lambda}_Q\,\mathbf{u}_{Q1}(k), \quad (25)$$
$$\mathcal{W}(k-1) = \mathbf{F}_{2S}\,\mathbf{G}^{10}_{2S,M}\,\mathbf{w}(k-1), \quad (26)$$
$$\mathcal{G}(k-1) = \mathbf{F}_{2S}\,\mathbf{G}^{10}_{2S,M}\,\mathbf{g}(k-1), \quad (27)$$
$$\mathcal{P}(k) = \mathbf{F}_{2S}\,\mathbf{G}^{10}_{2S,M}\,\mathbf{p}(k), \quad (28)$$
where $S = \max\{Q, M\}$ and $\mathbf{F}_{2S}$ denotes the $2S \times 2S$ DFT matrix operator. The matrices $\mathbf{G}^{01}_{2S,Q}$, $\mathbf{G}^{10}_{2S,Q}$ are used to impose constraints in the TD and are defined as
$$\mathbf{G}^{01}_{2S,Q} = \begin{bmatrix} \mathbf{0}_{(2S-Q)\times Q} \\ \mathbf{I}_Q \end{bmatrix}, \qquad \mathbf{G}^{10}_{2S,Q} = \begin{bmatrix} \mathbf{I}_Q \\ \mathbf{0}_{(2S-Q)\times Q} \end{bmatrix}, \quad (29)$$
with $\mathbf{0}_{(2S-Q)\times Q}$ being a $(2S-Q)\times Q$ matrix of zeros and $\mathbf{I}_Q$ the $Q\times Q$ identity matrix. The multiplication of matrices $\mathbf{G}^{01}_{2S,Q}$, $\mathbf{G}^{10}_{2S,Q}$ with any $Q\times 1$ vector pads with $2S-Q$ zeros the top and the bottom part of this vector, respectively. In addition, we introduce the windowing matrices $\mathbf{G}^{01}_{Q,2S}$, $\mathbf{G}^{10}_{Q,2S}$, which correspond to the transposes of the aforementioned matrices. The result of multiplying the above matrices with any $2S\times 1$ vector is a $Q\times 1$ vector that contains the last $Q$ and the first $Q$ elements of the $2S\times 1$ vector, respectively. The windowing matrices $\mathbf{G}^{01}_{2S,M}$, $\mathbf{G}^{10}_{2S,M}$, $\mathbf{G}^{01}_{M,2S}$, $\mathbf{G}^{10}_{M,2S}$ can be derived from the above definitions by substituting the block length $Q$ with the filter length $M$.

After defining the appropriate FD quantities, we proceed further with the FD implementation of the involved linear convolution and correlations as follows:
$$\mathbf{X}_{QM}(k)\,\mathbf{w}(k-1) = \mathbf{G}^{01}_{Q,2S}\,\mathbf{F}^{-1}_{2S}\,\mathcal{V}(k)\,\mathcal{W}(k-1), \quad (30)$$
$$\mathbf{X}^H_{QM}(k)\,\boldsymbol{\Lambda}_Q\,\mathbf{x}_{Q1}(k) = \mathbf{G}^{10}_{M,2S}\,\mathbf{F}^{-1}_{2S}\,\mathcal{V}^H(k)\,\mathcal{X}(k), \quad (31)$$
$$\mathbf{X}^H_{QM}(k)\,\boldsymbol{\Lambda}_Q\,\mathbf{e}(k) = \mathbf{G}^{10}_{M,2S}\,\mathbf{F}^{-1}_{2S}\,\mathcal{V}^H(k)\,\mathcal{E}(k), \quad (32)$$
where $\mathcal{E}(k) = \mathcal{U}(k) - \mathbf{F}_{2S}\,\mathbf{G}^{01}_{2S,Q}\,\boldsymbol{\Lambda}_Q\,\mathbf{G}^{01}_{Q,2S}\,\mathbf{F}^{-1}_{2S}\,\mathcal{V}(k)\,\mathcal{W}(k-1)$.

Since $\tilde{\mathbf{R}}(k)$ is a Toeplitz Hermitian matrix, we may implement the product $\tilde{\mathbf{R}}(k)\mathbf{p}(k)$ efficiently in the FD by embedding $\tilde{\mathbf{R}}(k)$ into the $2S\times 2S$ circulant matrix
$$\mathbf{C}(k) = \begin{bmatrix} \tilde{\mathbf{R}}(k) & \mathbf{A}_{M\times(2S-M)}(k) \\ \mathbf{A}^H_{M\times(2S-M)}(k) & \mathbf{B}_{2S-M}(k) \end{bmatrix}, \quad (33)$$
where
$$\mathbf{A}_{M\times(2S-M)}(k) = \begin{bmatrix} \mathbf{0}_{1\times(2S-2M+1)} & r_{M-1}(k) & \cdots & r_1(k) \\ r_{M-1}(k) & \mathbf{0}_{1\times(2S-2M+1)} & \cdots & r_2(k) \\ \vdots & & \ddots & \vdots \\ r_1(k) & \cdots & r_{M-1}(k) & \mathbf{0}_{1\times(2S-2M+1)} \end{bmatrix}$$
and $r_i(k)$, $i=0,\ldots,M-1$, corresponds to the $i$th element of vector $\mathbf{r}(k)$. Matrix $\mathbf{B}_{2S-M}(k)$ is a Toeplitz Hermitian matrix whose first column is defined as $[\mathbf{B}_{2S-M}(k)]_{:,1} = [\mathbf{r}^T(k)\;\mathbf{0}_{1\times(2S-2M)}]^T$. The matrix $\mathbf{C}(k)$ can be diagonalized by the DFT matrix operator, and its spectral decomposition is given by
$$\mathbf{C}(k) = \mathbf{F}^{-1}_{2S}\,\mathbf{D}_C(k)\,\mathbf{F}_{2S}, \quad (34)$$
where $\mathbf{D}_C(k)$ is a $2S\times 2S$ diagonal matrix containing the eigenvalues of $\mathbf{C}(k)$ [15]. The eigenvalues of $\mathbf{C}(k)$ are equal to the DFT values of its first column, i.e.,
$$\mathbf{D}_C(k) = \mathrm{diag}\{\mathbf{F}_{2S}\,[\mathbf{C}(k)]_{:,1}\}. \quad (35)$$
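The circulant embedding of Eqs. (33)-(35) is the standard FFT trick for Toeplitz matrix-vector products; a self-contained sketch (with names of our choosing) of how $\tilde{\mathbf{R}}(k)\mathbf{p}(k)$ can be computed this way:

```python
import numpy as np

def toeplitz_hermitian_matvec(r, p, S):
    """Multiply the M x M Hermitian Toeplitz matrix with first column r
    by the vector p, via embedding in a 2S x 2S circulant C(k),
    following Eqs. (33)-(35) and (39). Requires S >= M."""
    M = len(r)
    # First column of C(k): [r, zeros, conj(r[M-1]), ..., conj(r[1])]
    c = np.concatenate([r,
                        np.zeros(2 * S - 2 * M + 1, dtype=complex),
                        np.conj(r[:0:-1])])
    d_C = np.fft.fft(c)                        # eigenvalues of C(k), Eq. (35)
    p_pad = np.concatenate([p, np.zeros(2 * S - M, dtype=complex)])
    y = np.fft.ifft(d_C * np.fft.fft(p_pad))   # C(k) @ [p; 0]
    return y[:M]                               # first M entries, cf. Eq. (39)
```

Because the top-left $M\times M$ block of the circulant equals $\tilde{\mathbf{R}}(k)$ and the padded vector is zero below its first $M$ entries, keeping the first $M$ output samples recovers the exact product at $O(S\log S)$ cost.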

Thus, from Eqs. (13), (31), (33), (35), and after some simple manipulations, the eigenvalues of $\mathbf{C}(k)$ are updated according to the following relation:
$$\mathbf{D}_C(k) = \lambda^Q\mathbf{D}_C(k-1) + \mathrm{diag}\{\mathbf{F}_{2S}\,\mathbf{G}^{10}_{2S}\,\mathbf{F}^{-1}_{2S}\,\mathcal{V}^H(k)\,\mathcal{X}(k) + \mathbf{F}_{2S}\,\mathbf{K}_{2S}\,(\mathbf{F}^{-1}_{2S}\,\mathcal{V}^H(k)\,\mathcal{X}(k))^*\}, \quad (36)$$
where
$$\mathbf{G}^{10}_{2S} = \mathbf{G}^{10}_{2S,M}\,\mathbf{G}^{10}_{M,2S} = \begin{bmatrix} \mathbf{I}_M & \mathbf{0}_{M\times(2S-M)} \\ \mathbf{0}_{(2S-M)\times M} & \mathbf{0}_{2S-M} \end{bmatrix} \quad (37)$$
and $\mathbf{K}_{2S}$ is a windowing matrix defined as
$$\mathbf{K}_{2S} = \begin{bmatrix} \mathbf{0}_{(2S-M+1)\times(M-1)} & \mathbf{0}_{(2S-M+1)\times(2S-M+1)} \\ \mathbf{J}_{M-1} & \mathbf{0}_{(M-1)\times(2S-M+1)} \end{bmatrix}, \quad (38)$$
with $\mathbf{J}_{M-1}$ being an anti-diagonal matrix containing ones on its main anti-diagonal. Now, it is straightforward to compute the matrix-vector product $\tilde{\mathbf{R}}(k)\mathbf{p}(k)$ as
$$\tilde{\mathbf{R}}(k)\,\mathbf{p}(k) = \mathbf{G}^{10}_{M,2S}\,\mathbf{F}^{-1}_{2S}\,\mathbf{D}_C(k)\,\mathcal{P}(k). \quad (39)$$
Having derived the FD representations of the vectors $\mathbf{w}(k)$, $\mathbf{g}(k)$, $\mathbf{p}(k)$ (Eqs. (26)-(28)), the recursive relations for the FD quantities can be updated on a block-by-block basis as follows:
$$\mathcal{W}(k) = \mathcal{W}(k-1) + \alpha(k)\,\mathcal{P}(k), \quad (40)$$
$$\mathcal{G}(k) = \lambda^Q\mathcal{G}(k-1) - \alpha(k)\,\mathbf{L}^{10}_{2S}\,\mathbf{D}_C(k)\,\mathcal{P}(k) + \mathbf{L}^{10}_{2S}\,\mathcal{V}^H(k)\,\mathcal{E}(k), \quad (41)$$
$$\mathcal{P}(k+1) = \mathcal{G}(k) + \beta(k)\,\mathcal{P}(k), \quad (42)$$
and
$$\mathcal{E}(k) = \mathcal{U}(k) - \mathbf{L}^{01}_{2S}\,\mathcal{V}(k)\,\mathcal{W}(k-1), \quad (43)$$
where $\mathbf{L}^{10}_{2S}$, $\mathbf{L}^{01}_{2S}$ are defined as
$$\mathbf{L}^{10}_{2S} = \mathbf{F}_{2S}\,\mathbf{G}^{10}_{2S}\,\mathbf{F}^{-1}_{2S}, \quad (44)$$
$$\mathbf{L}^{01}_{2S} = \mathbf{F}_{2S}\,\mathbf{G}^{01}_{2S,Q}\,\boldsymbol{\Lambda}_Q\,\mathbf{G}^{01}_{Q,2S}\,\mathbf{F}^{-1}_{2S}. \quad (45)$$
To finish up with the update equations, we observe that the TD step size $\alpha(k)$ and the TD factor $\beta(k)$ can be computed from already available FD vectors. To this end, let us write Eq. (18) as
$$\alpha(k) = \frac{\mathbf{p}^H(k)\,\tilde{\mathbf{g}}(k-1)}{\mathbf{p}^H(k)\,\tilde{\mathbf{R}}(k)\,\mathbf{p}(k)} = \frac{(\mathbf{G}^{10}_{2S,M}\mathbf{p}(k))^H\,(\mathbf{F}^H_{2S}\mathbf{F}_{2S}/2S)\,\mathbf{G}^{10}_{2S,M}\,\tilde{\mathbf{g}}(k-1)}{(\mathbf{G}^{10}_{2S,M}\mathbf{p}(k))^H\,(\mathbf{F}^H_{2S}\mathbf{F}_{2S}/2S)\,\mathbf{G}^{10}_{2S,M}\,\tilde{\mathbf{R}}(k)\,\mathbf{p}(k)}.$$
By substituting Eqs. (19), (27), (28), (32) and (39) into the above relation, it is obvious that the step size can be computed as
$$\alpha(k) = \frac{\mathcal{P}^H(k)\,\tilde{\mathcal{G}}(k-1)}{\mathcal{P}^H(k)\,\mathbf{L}^{10}_{2S}\,\mathbf{D}_C(k)\,\mathcal{P}(k)}, \quad (46)$$
$$\tilde{\mathcal{G}}(k-1) = \lambda^Q\mathcal{G}(k-1) + \mathbf{L}^{10}_{2S}\,\mathcal{V}^H(k)\,\mathcal{E}(k). \quad (47)$$
Similarly to the step-size case, the factor $\beta(k)$ can be computed by either of the following relations:
$$\beta(k) = -\frac{\mathcal{G}^H(k)\,\mathbf{L}^{10}_{2S}\,\mathbf{D}_C(k)\,\mathcal{P}(k)}{\mathcal{P}^H(k)\,\mathbf{L}^{10}_{2S}\,\mathbf{D}_C(k)\,\mathcal{P}(k)}, \quad (48)$$
$$\beta(k) = \frac{(\mathcal{G}(k)-\mathcal{G}(k-1))^H\,\mathcal{G}(k)}{\mathcal{G}^H(k-1)\,\mathcal{G}(k-1)}. \quad (49)$$

It should be noted that we may proceed even further by making some approximations that reduce the complexity even more, at the expense of decreasing the convergence rate of the scheme. Obviously, the matrices $\mathbf{L}^{10}_{2S}$, $\mathbf{L}^{01}_{2S}$ defined in Eqs. (44), (45), respectively, are circulant matrices, since they can be diagonalized by the Fourier matrix $\mathbf{F}_{2S}$. In the case where the filter order $M$ and the block length $Q$ are chosen to be equal ($S = Q = M$), matrix $\mathbf{L}^{10}_{2S}$ may be approximated by a diagonal matrix [16]. More specifically, it is reasonable to write $\mathbf{L}^{10}_{2S} \approx 0.5\,\mathbf{I}_{2S}$. It should be noted that when applying these approximations one should take into account not only the complexity per iteration, but also the overall cost required to reach the same steady-state performance.

3.3. Convergence acceleration

3.3.1. Use of a preconditioner

One way to accelerate convergence is the use of preconditioning. The preconditioner is typically used to cluster the spectrum of a matrix, thus improving at each iteration the condition number of matrix $\tilde{\mathbf{R}}(k)$. Let us assume that $\mathbf{P}(k)$ is an easily invertible matrix such that $\mathbf{P}^{-1}(k)\tilde{\mathbf{R}}(k) \approx \mathbf{I}$ (i.e., the eigenvalues of $\mathbf{P}^{-1}(k)\tilde{\mathbf{R}}(k)$ lie around 1). By following the procedure that was described in Section 3.1, the preconditioned version of the block CG (PBCG) algorithm can be derived. The objective here is the minimization of the cost function $V(\mathbf{w}(k); \mathbf{P}^{-1}(k)\tilde{\mathbf{R}}(k), \mathbf{P}^{-1}(k)\tilde{\mathbf{b}}(k))$. The difference between the PBCG and the BCG is in Steps 2, 5 and 6. The step size $\alpha_p(k)$ in the preconditioned version of the BCG is again chosen as the minimizing argument of $V(\mathbf{w}(k); \mathbf{P}^{-1}(k)\tilde{\mathbf{R}}(k), \mathbf{P}^{-1}(k)\tilde{\mathbf{b}}(k))$ and is given by
$$\alpha_p(k) = \frac{\mathbf{p}^H(k)\,\mathbf{P}^{-1}(k)\,\tilde{\mathbf{g}}(k-1)}{\mathbf{p}^H(k)\,\mathbf{P}^{-1}(k)\,\tilde{\mathbf{R}}(k)\,\mathbf{p}(k)}, \quad (50)$$

the factor $\beta_p(k)$ ensures $\mathbf{P}^{-1}(k)\tilde{\mathbf{R}}(k)$-orthogonality between the direction vectors $\mathbf{p}(k+1)$ and $\mathbf{p}(k)$ and is computed by
$$\beta_p(k) = -\frac{\mathbf{g}^H(k)\,\mathbf{P}^{-1}(k)\,\tilde{\mathbf{R}}(k)\,\mathbf{p}(k)}{\mathbf{p}^H(k)\,\mathbf{P}^{-1}(k)\,\tilde{\mathbf{R}}(k)\,\mathbf{p}(k)}, \quad (51)$$
and, finally, the direction vector is updated according to
$$\mathbf{p}(k+1) = \mathbf{P}^{-1}(k)\,\mathbf{g}(k) + \beta_p(k)\,\mathbf{p}(k). \quad (52)$$
The best preconditioner is the matrix $\tilde{\mathbf{R}}(k)$ itself. In that case, the algorithm converges in just one iteration, but the solution is computationally expensive, because we have to solve a Toeplitz system and form matrix $\tilde{\mathbf{R}}(k)$. The idea of using a circulant preconditioner to reduce the eigenvalue spread of a Toeplitz matrix was proposed by Strang [17]. Since then, there have been many studies on constructing efficient circulant preconditioners [18,19]. One circulant preconditioner that could be used in our case, and which has also been employed in [10], is the Strang preconditioner [18]. However, the construction of such a preconditioner adds a small overhead in complexity, since we would have to compute one additional FFT to find its eigenvalues. To keep the complexity as low as possible, one could use a preconditioner whose eigenvalues can be computed from already available FD quantities, since the BCG algorithm has been completely transferred to the FD. Here, we introduce a preconditioner $\mathbf{P}_c(k)$ that is assumed to be circulant, so that it retains the Toeplitz property and adds periodicity [18]. Initially, we construct a $2S\times 2S$ circulant matrix from the data matrix $\mathcal{V}(k)$, whose spectral decomposition is given by $\mathbf{C}_x(k) = \mathbf{F}^{-1}_{2S}\,\mathcal{V}(k)\,\mathbf{F}_{2S}$. The circulant preconditioner $\mathbf{P}_c(k)$ can be obtained from $\mathbf{C}_x(k)$ as
$$\mathbf{P}_c(k) = \mathbf{C}^H_x(k)\,\mathbf{C}_x(k). \quad (53)$$
$\mathbf{P}_c(k)$ can also be diagonalized by the Fourier matrix, and the resulting $2S\times 2S$ diagonal matrix
$$\mathbf{D}_p(k) = \mathcal{V}^H(k)\,\mathcal{V}(k) \quad (54)$$
corresponds to the signal power estimate at each frequency bin [1,4]. It should be noticed that this diagonal matrix can be obtained easily from the diagonal matrix $\mathcal{V}(k)$, which is already used in the computation of the convolutions. Finally, the diagonal matrix $\mathbf{D}_p(k)$ can be updated recursively as in [11]:
$$\mathbf{D}_p(k) = \lambda_p\,\mathbf{D}_p(k-1) + (1-\lambda_p)\,\mathcal{V}^H(k)\,\mathcal{V}(k). \quad (55)$$
The changes presented in (50)-(52) may be incorporated in the FD implementation of the proposed algorithm as well. However, apart from the update and the inversion of $\mathbf{D}_p(k)$, some additional matrix-vector products are required. An alternative method for accelerating convergence is described below.

3.3.2. Use of matrix step sizes

The convergence speed of the proposed scheme can also be improved by using matrix step sizes in the update Eqs. (40)-(42), instead of the scalar ones $\alpha(k)$ and $\beta(k)$. These matrices, denoted as $\mathbf{M}_\alpha(k)$ and $\mathbf{M}_\beta(k)$, should have a diagonal structure, thus allowing a decoupled update at the different frequency bins. The diagonals of these time-varying matrices contain the step sizes $\alpha_m(k) = \alpha(k)/D_{p_m}(k)$ and the factors $\beta_m(k) = \beta(k)/D_{p_m}(k)$, $m = 0,\ldots,2S-1$, respectively, where $D_{p_m}(k)$ corresponds to an estimate of the signal power in the $m$th bin, i.e., the $m$th diagonal element of $\mathbf{D}_p(k)$, which may be updated according to (55), [20].

Starting from the update equations of the weight coefficient vector and the direction vector, it is obvious that the use of matrix step sizes is equivalent to the pre-multiplication of each direction vector $\mathcal{P}(k)$ by a diagonal matrix that contains on its diagonal the elements $1/D_{p_m}(k)$, $m = 0,\ldots,2S-1$. This diagonal matrix can be considered as the diagonal matrix containing the eigenvalues of the circulant matrix $\mathbf{P}^{-1}_c(k)$ defined in (53). Therefore, it can easily be shown that
$$[\mathbf{P}^{-1}_c(k)]_{1:M,1:M}\,\mathbf{p}(k) = \mathbf{G}^{10}_{M,2S}\,\mathbf{F}^{-1}_{2S}\,\mathbf{D}^{-1}_p(k)\,\mathcal{P}(k),$$
where $[\mathbf{P}^{-1}_c(k)]_{1:M,1:M}$ corresponds to the $M\times M$ upper-left part of $\mathbf{P}^{-1}_c(k)$. Therefore, the FD representations of the weight coefficient vector and the direction vector may be updated according to the following recursive formulae:
$$\mathcal{W}(k) = \mathcal{W}(k-1) + \alpha(k)\,\mathcal{Z}(k), \quad (56)$$
$$\mathcal{P}(k+1) = \mathcal{G}(k) + \beta(k)\,\mathcal{Z}(k), \quad (57)$$
where $\mathcal{Z}(k) = \mathbf{L}^{10}_{2S}\,\mathbf{D}^{-1}_p(k)\,\mathcal{P}(k)$.
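In implementation terms, forming $\mathcal{Z}(k)$ amounts to a per-bin power normalization followed by the FFT-window-FFT realization of $\mathbf{L}^{10}_{2S}$. A small sketch under our own naming, with $\mathbf{D}_p(k)$ stored as a length-$2S$ vector:

```python
import numpy as np

def decoupled_direction(P_fd, D_p, M):
    """Compute Z(k) = L10_{2S} D_p^{-1}(k) P(k) for Eqs. (56)-(57):
    divide each frequency bin by its power estimate, then apply the
    time-domain constraint that keeps only the first M samples."""
    z = np.fft.ifft(P_fd / D_p)    # D_p^{-1}(k) P(k), back to the TD
    z[M:] = 0.0                    # G^{10}_{2S} window (Eq. (37))
    return np.fft.fft(z)           # Z(k)

def update_power(D_p, V_diag, lam_p):
    """Recursive per-bin signal power estimate, Eq. (55);
    V_diag is the diagonal of V(k)."""
    return lam_p * D_p + (1.0 - lam_p) * np.abs(V_diag) ** 2
```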

The FD representation of the gradient vector may be updated from already available FD quantities according to
$$\mathcal{G}(k) = \lambda^Q\mathcal{G}(k-1) + \mathbf{L}^{10}_{2S}\,\mathcal{V}^H(k)\,\mathcal{E}(k) - \alpha(k)\,\mathbf{L}^{10}_{2S}\,\mathbf{D}_C(k)\,\mathcal{Z}(k). \quad (58)$$
Orthogonality between the FD representations of the gradient vectors can be preserved by ensuring
$$\mathcal{Z}^H(k)\,\mathcal{G}(k) = 0. \quad (59)$$
The above condition, which was also applied in the TD in [12], leads to the following formula for computing the step size $\alpha(k)$:
$$\alpha(k) = \frac{\mathcal{Z}^H(k)\,\tilde{\mathcal{G}}(k-1)}{\mathcal{Z}^H(k)\,\mathbf{L}^{10}_{2S}\,\mathbf{D}_C(k)\,\mathcal{Z}(k)}. \quad (60)$$
It is straightforward to show that the TD equivalent of Eq. (60) can be written as
$$\alpha(k) = \frac{\mathbf{p}^H(k)\,[\mathbf{P}^{-H}_c(k)]_{1:M,1:M}\,\tilde{\mathbf{g}}(k-1)}{\mathbf{p}^H(k)\,[\mathbf{P}^{-H}_c(k)]_{1:M,1:M}\,\tilde{\mathbf{R}}(k)\,[\mathbf{P}^{-1}_c(k)]_{1:M,1:M}\,\mathbf{p}(k)}. \quad (61)$$
The same expression is derived if $\alpha(k)$ is computed as the minimizing argument of the cost function $V(\mathbf{w}(k);\tilde{\mathbf{R}}(k),\mathbf{b}(k))$, where the directions are given by $[\mathbf{P}^{-1}_c(k)]_{1:M,1:M}\,\mathbf{p}(k)$. Finally, to preserve $\mathbf{R}$-orthogonality between the decoupled direction vectors $[\mathbf{P}^{-1}_c(k+1)]_{1:M,1:M}\,\mathbf{p}(k+1)$ and $[\mathbf{P}^{-1}_c(k)]_{1:M,1:M}\,\mathbf{p}(k)$, the following equation should hold:
$$\mathbf{p}^H(k+1)\,\mathbf{T}(k)\,\mathbf{p}(k) = 0 \;\Rightarrow\; \beta_p(k) = -\frac{\mathbf{g}^H(k)\,\mathbf{T}(k)\,\mathbf{p}(k)}{\mathbf{p}^H(k)\,\mathbf{T}(k)\,\mathbf{p}(k)}, \quad (62)$$
where $\mathbf{T}(k) = [\mathbf{P}^{-H}_c(k+1)]_{1:M,1:M}\,\tilde{\mathbf{R}}(k)\,[\mathbf{P}^{-1}_c(k)]_{1:M,1:M}$. It can easily be shown that the above formula for computing $\beta(k)$ is equivalent to
$$\beta(k) = -\frac{\mathcal{G}^H(k)\,\mathbf{D}^{-H}_p(k+1)\,\mathbf{L}^{10}_{2S}\,\mathbf{D}_C(k)\,\mathbf{L}^{10}_{2S}\,\mathbf{D}^{-1}_p(k)\,\mathcal{P}(k)}{\mathcal{P}^H(k)\,\mathbf{D}^{-H}_p(k+1)\,\mathbf{L}^{10}_{2S}\,\mathbf{D}_C(k)\,\mathbf{L}^{10}_{2S}\,\mathbf{D}^{-1}_p(k)\,\mathcal{P}(k)}. \quad (63)$$
The above equation, after making the approximation $\mathbf{D}^{-1}_p(k+1) \approx \mathbf{D}^{-1}_p(k)$, can be written as
$$\beta(k) = -\frac{\mathcal{G}^H(k)\,\mathbf{D}^{-H}_p(k)\,\mathbf{L}^{10}_{2S}\,\mathbf{D}_C(k)\,\mathcal{Z}(k)}{\mathcal{Z}^H(k)\,\mathbf{L}^{10}_{2S}\,\mathbf{D}_C(k)\,\mathcal{Z}(k)}. \quad (64)$$
Thus, the decoupling of the direction vectors appears to be strongly connected with the use of a preconditioner. Simulations have shown that both schemes are almost equivalent in performance. The step-sizing approach is the one used in the algorithm summarized in Table 3.

3.4. Complexity issues

Due to the FD implementation, the new algorithm offers significant savings in complexity. The total complexity depends on a number of different parameters. To simplify the presentation and confine ourselves to the most important parameters, we make some assumptions which, however, are not crucial for the implementation of the algorithm. If they do not hold true, then some of the FFT-based implementations simply have to be properly modified (i.e., a different zero-padding may be needed in the involved OS implementations). Thus, we assume that $Q$ and $M$ are powers of two. It is obvious then that all the involved FFTs have lengths that are powers of two, and we can easily derive the total complexity. Note that since $Q$ and $M$ are powers of two, this is true for $S$ as well. Furthermore, we assume that by examining only the total number of complex multiplications, we provide reasonably accurate comparative estimates of the overall complexity of the algorithm. In an actual implementation, several other issues would need to be considered, such as the number of additions, storage requirements, system transport delays, hardware design, etc. Finally, we assume that the input data are complex.

The computational complexity (complex multiplications) of the algorithm of Table 3 is summarized as follows. Step 1 involves three FFTs of length $2S$, with a cost of $6S\log_2(2S)$ complex multiplications (CM), and a multiplication of the diagonal matrix $\boldsymbol{\Lambda}_Q$ with the vectors $\mathbf{x}_{Q1}(k)$, $\mathbf{u}_{Q1}(k)$, with a cost of $2Q$ CM. The computation of the error term in the FD requires $4S\log_2(2S)+4S$ CM. The update relation in Step 3 needs $4S\log_2(2S)+4S$ CM. Step 4 requires $8S$ CM, while the decoupling of the direction vector involves $4S\log_2(2S)+2S$ CM. The computation of the terms $\mathbf{L}^{10}_{2S}\mathcal{V}^H(k)\mathcal{E}(k)$ and $\mathbf{L}^{10}_{2S}\mathbf{D}_C(k)\mathcal{Z}(k)$ involves $4S+8S\log_2(2S)$ CM. Thus, the step-size calculation, the update of the negative gradient vector and the $\beta(k)$ computation require $16S+8S\log_2(2S)+2$ CM. Finally, Step 7 requires $4S$ CM.

Table 3. The frequency domain conjugate gradient algorithm for the kth data block

Definitions: $Q$: block length; $M$: filter length; $S = \max(Q,M)$; $k$: block index; $\mathbf{F}_{2S}$: Fourier matrix; the matrices $\mathbf{G}^{01}_{2S,Q}$, $\mathbf{L}^{10}_{2S}$, $\mathbf{L}^{01}_{2S}$ are defined in Eqs. (29), (44) and (45), respectively; $\mathbf{0}_{M1}$ ($\mathbf{1}_{M1}$) is a vector of $M$ zeros (ones) and $\mathbf{I}_{2S}$ is the $2S\times 2S$ identity matrix.
Initialization: $\mathcal{W}(0) = \mathbf{0}_{2S\times 1}$, $\mathcal{G}(0) = \mathbf{0}_{2S\times 1}$, $\mathbf{D}_C(0) = \mathbf{0}_{2S}$, $\mathbf{P}_{2S}(0) = \mathbf{I}_{2S}$, $k = 1$.

Step 1: Form the appropriate FD quantities:
$\mathcal{V}(k) = \mathrm{diag}\{\mathbf{F}_{2S}[x(kQ-2S+1), x(kQ-2S+2), \ldots, x(kQ)]^T\}$,
$\mathcal{X}(k) = \mathbf{F}_{2S}\mathbf{G}^{01}_{2S,Q}\boldsymbol{\Lambda}_Q[x(kQ-Q+1), \ldots, x(kQ)]^T$,
$\mathcal{U}(k) = \mathbf{F}_{2S}\mathbf{G}^{01}_{2S,Q}\boldsymbol{\Lambda}_Q[u(kQ-Q+1), \ldots, u(kQ)]^T$.
Step 2: Compute the error vector in the FD: $\mathcal{E}(k) = \mathcal{U}(k) - \mathbf{L}^{01}_{2S}\mathcal{V}(k)\mathcal{W}(k-1)$.
Step 3: Update the eigenvalues of the circulant matrix:
$\mathbf{D}_C(k) = \lambda^Q\mathbf{D}_C(k-1) + \mathrm{diag}\{\mathbf{F}_{2S}(\mathbf{G}^{10}_{2S}\mathbf{F}^{-1}_{2S}\mathcal{V}^H(k)\mathcal{X}(k) + \mathbf{K}_{2S}(\mathbf{F}^{-1}_{2S}\mathcal{V}^H(k)\mathcal{X}(k))^*)\}$.
Step 4: Compute the diagonal matrix that contains the power estimates on the different frequency bins:
$\mathbf{D}_p(k) = \lambda_p\mathbf{D}_p(k-1) + (1-\lambda_p)\mathcal{V}^H(k)\mathcal{V}(k)$.
Step 5: Decouple the direction vector:
if $k = 1$: $\mathcal{Z}(1) = \mathbf{L}^{10}_{2S}\mathbf{D}^{-1}_p(k)\mathbf{L}^{10}_{2S}\mathcal{V}^H(k)\mathcal{E}(k)$; else $\mathcal{Z}(k) = \mathbf{L}^{10}_{2S}\mathbf{D}^{-1}_p(k)\mathcal{P}(k)$.
Step 6: Calculate the step size, update the gradient vector and compute the factor $\beta(k)$:
$\tilde{\mathcal{G}}(k-1) = \lambda^Q\mathcal{G}(k-1) + \mathbf{L}^{10}_{2S}\mathcal{V}^H(k)\mathcal{E}(k)$,
$\alpha(k) = \dfrac{\mathcal{Z}^H(k)\tilde{\mathcal{G}}(k-1)}{\mathcal{Z}^H(k)\mathbf{L}^{10}_{2S}\mathbf{D}_C(k)\mathcal{Z}(k)}$,
$\mathcal{G}(k) = \tilde{\mathcal{G}}(k-1) - \alpha(k)\mathbf{L}^{10}_{2S}\mathbf{D}_C(k)\mathcal{Z}(k)$,
$\beta(k) = -\dfrac{\mathcal{G}^H(k)\mathbf{D}^{-H}_p(k)\mathbf{L}^{10}_{2S}\mathbf{D}_C(k)\mathcal{Z}(k)}{\mathcal{Z}^H(k)\mathbf{L}^{10}_{2S}\mathbf{D}_C(k)\mathcal{Z}(k)}$.
Step 7: Update the direction and the coefficient vector:
$\mathcal{P}(k+1) = \mathcal{G}(k) + \beta(k)\mathcal{Z}(k)$, $\mathcal{W}(k) = \mathcal{W}(k-1) + \alpha(k)\mathcal{Z}(k)$.

Thus, the total complexity of the algorithm is equal to $26S\log_2(2S) + 36S + 2Q + 2$ CM for $Q$ output samples. At this point, it should be noted that the FD block LMS, referred to as the linear-convolution OS FDAF in [4], requires $10S\log_2(2S) + 16S$ CM if we assume complex input data. In order to emphasize the savings in complexity that the proposed algorithm offers, we also present the complexity of two adaptive algorithms operating in a sample-by-sample mode. The MCG algorithm proposed in [12] involves $3M^2 + 10M + 2$ CM per output sample, and the RLS algorithm [1] requires $3M^2 + 4M$ CM. The complexities of the sample-by-sample algorithms per $M$ output samples, and the cost of the block algorithm OS FDAF, are compared to the cost of the algorithm of Table 3 in the form of ratios, as summarized in Table 4, for several values of the filter size $M$.

Table 4. Computational complexity ratios for $M$ output samples: filter size $M$ versus FDCG/RLS, FDCG/MCG and FDCG/OSFDAF. (The numerical entries of the table did not survive extraction.)

By examining Table 4, we see that the proposed algorithm has a computational complexity that is proportional to the complexity of OS FDAF, independently of the value of $M$.
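The ratios of Table 4 follow directly from the closed-form counts above; a small script (our own bookkeeping, assuming $S = Q = M$ as in the paper's comparison) reproduces their structure:

```python
import math

def fdcg_cm(S, Q):
    """Total complex multiplications (CM) of the Table 3 algorithm
    per block of Q output samples, per the counts of Section 3.4."""
    return 26 * S * math.log2(2 * S) + 36 * S + 2 * Q + 2

def per_M_ratios(M):
    """Complexity ratios for M output samples, assuming S = Q = M."""
    S = Q = M
    fdcg = fdcg_cm(S, Q)                          # one block of Q = M samples
    rls = (3 * M**2 + 4 * M) * M                  # RLS [1], M samples
    mcg = (3 * M**2 + 10 * M + 2) * M             # MCG [12], M samples
    osfdaf = 10 * S * math.log2(2 * S) + 16 * S   # OS FDAF [4], one block
    return fdcg / rls, fdcg / mcg, fdcg / osfdaf

for M in (16, 64, 256):
    print(M, ["%.4f" % r for r in per_M_ratios(M)])
```

For example, at $M = 64$ this gives an FDCG/OSFDAF ratio of roughly 2.6, constant in $M$, while the FDCG/RLS and FDCG/MCG ratios shrink rapidly as $M$ grows.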

However, our algorithm offers superior savings as compared to the sample-by-sample adaptive algorithms. There is a dramatic reduction in the number of CM as $M$ increases. This reduction is mainly due to the block nature, the FD implementation, and the idea of executing only one iteration of the CG method per block update.

4. Application to channel equalization

In this section, the applicability of the new algorithm to adaptive channel equalization is pointed out. We are particularly interested in mobile wireless communication systems. Channels encountered in such systems may have long impulse responses (IR) and change significantly in time. Therefore, the involved equalizer should be able to track the channel variations and also converge fast, so that a relatively short training sequence is adequate. In the following subsections, two new adaptive equalization algorithms based on the algorithm of Table 3 are developed.

4.1. Linear equalization case

The main part of a linear equalizer is a transversal filter that linearly combines a number of consecutive channel output samples to provide an estimate of the current symbol. The MMSE criterion is commonly employed as a means for optimizing the transversal filter coefficients. In case of changing channel conditions, the filter coefficients are updated via an adaptive algorithm. The algorithm of Table 3 is directly applicable to the linear equalization (LE) case after taking into account the following remarks:

(1) The equalizer's coefficients correspond to the unknown parameters of the system $w_i(k)$ that have to be identified.
(2) The sample $x(kQ)$ denotes the channel output at time $kQ$, and the sample $u(kQ)$ is either a training symbol (during the training mode) or a decision (during the decision-directed mode). In the latter case, $\mathbf{u}_{Q1}(k)$ is given by $\mathbf{u}_{Q1}(k) = f\{\mathbf{G}^{01}_{Q,2S}\mathbf{F}^{-1}_{2S}\mathcal{V}(k)\mathcal{W}(k-1)\}$, where $f\{\cdot\}$ stands for the decision device function.
(3) In the TD, the update recursions of the auto-correlation matrix of the equalizer's input and the cross-correlation vector are given by
$$\mathbf{R}(k) = \lambda^Q\mathbf{R}(k-1) + \mathbf{X}^H_{QM}(n)\boldsymbol{\Lambda}_Q\mathbf{X}_{QM}(n), \qquad \mathbf{b}(k) = \lambda^Q\mathbf{b}(k-1) + \mathbf{X}^H_{QM}(n)\boldsymbol{\Lambda}_Q\mathbf{u}_{Q1}(k),$$
where $n = kQ + M_1 - 1$. The above equations denote that the filter output is delayed by $M_1 - 1$ samples with respect to its input. Note that, due to the form of the channel IR, the linear equalizer is in general non-causal and becomes realizable by introducing the above finite delay.
(4) Due to the delay of $M_1 - 1$ samples, the definitions of the FD quantities in Step 1 have to be modified accordingly, i.e.,
$$\mathcal{V}(k+M_1-1) = \mathrm{diag}\{\mathbf{F}_{2S}[x(n-2S+1),\ldots,x(n)]^T\},$$
$$\mathcal{X}(k+M_1-1) = \mathbf{F}_{2S}\mathbf{G}^{01}_{2S,Q}\boldsymbol{\Lambda}_Q[x(n-Q+1),\ldots,x(n)]^T,$$
$$\mathcal{U}(k) = \mathbf{F}_{2S}\mathbf{G}^{01}_{2S,Q}\boldsymbol{\Lambda}_Q\mathbf{u}_{Q1}(k),$$
with $n = kQ + M_1 - 1$.

Incorporating the above modifications into Step 1 of the algorithm of Table 3, we end up with an adaptive FDCG linear equalizer.

4.2. Decision feedback equalization case

Another well-established equalization technique, which is very effective in reducing the introduced ISI when it is too severe for an LE to handle, is the adaptive decision feedback equalizer (DFE) [22]. The FIR DFE consists of two filters of lengths $M$ and $N$, respectively: a feed-forward (FF) filter that combats precursor ISI and noise, and a strictly causal feedback (FB) filter that suppresses postcursor ISI. The task here is to develop a CG-type adaptive algorithm that minimizes a cost function defined as in Eq. (1), where the auto-correlation matrix of the two-input one-output FIR system is given by
$$\mathbf{R} = \begin{bmatrix} \mathbf{R}^{xx}_M & \mathbf{R}^{xu}_{MN} \\ \mathbf{R}^{ux}_{NM} & \mathbf{R}^{uu}_N \end{bmatrix}, \quad (65)$$
with
$$\mathbf{R}^{xx}_M = E\{\mathbf{x}^*_{M1}(k+M-1)\,\mathbf{x}^T_{M1}(k+M-1)\}, \quad (66)$$
$$\mathbf{R}^{xu}_{MN} = E\{\mathbf{x}^*_{M1}(k+M-1)\,\mathbf{u}^T_{N1}(k-1)\}, \quad (67)$$
$$\mathbf{R}^{ux}_{NM} = (\mathbf{R}^{xu}_{MN})^H, \quad (68)$$
$$\mathbf{R}^{uu}_N = E\{\mathbf{u}^*_{N1}(k-1)\,\mathbf{u}^T_{N1}(k-1)\}. \quad (69)$$
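Each sub-block of (65) is (approximately) Toeplitz, so the product $\mathbf{R}(k)\mathbf{p}(k)$ decomposes into four Toeplitz matrix-vector products; this decomposition is written out explicitly further below, and each partial product can in turn use the FFT-based circulant embedding of Section 3.2. A dense NumPy sketch, with all names ours:

```python
import numpy as np
from scipy.linalg import toeplitz

def dfe_R_times_p(r_xx, r_uu, r_xu, r_ux, p, M, N):
    """Apply the 2x2 block-Toeplitz approximation of R (Eq. (65)) to the
    (M+N) x 1 direction vector p. r_xx, r_uu: first columns of the
    Hermitian Toeplitz diagonal blocks; r_xu (first column) and r_ux
    define the M x N off-diagonal Toeplitz block (Eqs. (71)-(74))."""
    p_ff, p_fb = p[:M], p[M:]
    R_xx = toeplitz(r_xx)                 # Hermitian Toeplitz (r assumed conj)
    R_uu = toeplitz(r_uu)
    R_xu = toeplitz(r_xu, r_ux.conj())    # M x N: column r_xu, row r_ux^*
    top = R_xx @ p_ff + R_xu @ p_fb
    bot = R_xu.conj().T @ p_ff + R_uu @ p_fb   # R^{ux} = (R^{xu})^H
    return np.concatenate([top, bot])
```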

The $(M+N)\times 1$ weight coefficient vector is defined as
$$\mathbf{w} = \begin{bmatrix} \mathbf{w}^{ff}_{M1} \\ \mathbf{w}^{fb}_{N1} \end{bmatrix}. \quad (70)$$
Matrices $\mathbf{R}^{xx}_M$, $\mathbf{R}^{uu}_N$ can be approximated by Toeplitz Hermitian matrices, whose first columns are updated on a block-by-block basis according to the following recursive relations:
$$\mathbf{r}^{xx}_{M1}(k) = \lambda^Q\,\mathbf{r}^{xx}_{M1}(k-1) + \mathbf{X}^H_{QM}\,\boldsymbol{\Lambda}_Q\,\mathbf{x}_{Q1}, \quad (71)$$
$$\mathbf{r}^{uu}_{N1}(k) = \lambda^Q\,\mathbf{r}^{uu}_{N1}(k-1) + \mathbf{U}^H_{QN}\,\boldsymbol{\Lambda}_Q\,\mathbf{u}_{Q1}, \quad (72)$$
where $\mathbf{X}_{QM} = \mathbf{X}_{QM}(k+M-1)$ and $\mathbf{U}_{QN}$ corresponds to the matrix
$$\mathbf{U}_{QN} = \begin{bmatrix} u(kQ-Q) & \cdots & u(kQ-N-Q+1) \\ \vdots & & \vdots \\ u(kQ-1) & \cdots & u(kQ-N) \end{bmatrix}.$$
Moreover, $\mathbf{R}^{xu}_{MN}$ may be approximated by an $M\times N$ Toeplitz matrix $\tilde{\mathbf{R}}^{xu}_{MN}$, whose first column and first row are computed by the following respective formulas:
$$\mathbf{r}^{xu}_{M1}(k) = \lambda^Q\,\mathbf{r}^{xu}_{M1}(k-1) + \mathbf{X}^H_{QM}\,\boldsymbol{\Lambda}_Q\,\mathbf{u}_{Q1}, \quad (73)$$
$$\mathbf{r}^{ux}_{N1}(k) = \lambda^Q\,\mathbf{r}^{ux}_{N1}(k-1) + \mathbf{U}^H_{QN}\,\boldsymbol{\Lambda}_Q\,\mathbf{x}_{Q1}, \quad (74)$$
and $\tilde{\mathbf{R}}^{ux}_{NM} = (\tilde{\mathbf{R}}^{xu}_{MN})^H$.

To proceed with the derivation of the block adaptive DFE, let us first write the output of the equalizer when it operates in decision-directed mode:
$$\mathbf{u}_{Q1}(k) = f\{\mathbf{y}_{Q1}(k)\} \quad (75)$$
with
$$\mathbf{y}_{Q1}(k) = \mathbf{X}_{QM}\,\mathbf{w}^{ff}_{M1} + \mathbf{U}_{QN}\,\mathbf{w}^{fb}_{N1}. \quad (76)$$
$\mathbf{U}_{QN}$ may be written as the sum of two Toeplitz matrices $\mathbf{U}^{k}_{QN}$, $\mathbf{U}^{uk}_{QN}$, containing the known (previous-block) and the unknown (current-block) decisions, respectively, defined as
$$\mathbf{U}^{k}_{QN} = \begin{bmatrix} u(kQ-Q) & \cdots & u(kQ-N-Q+1) \\ 0 & u(kQ-Q) & \cdots \\ \vdots & \ddots & \ddots \\ 0 & \cdots & u(kQ-N) \end{bmatrix} \quad (77)$$
and
$$\mathbf{U}^{uk}_{QN} = \begin{bmatrix} 0 & \cdots & & 0 \\ u(kQ-Q+1) & 0 & & \vdots \\ \vdots & \ddots & \ddots & \\ u(kQ-1) & \cdots & u(kQ-Q+1) & 0\;\cdots\;0 \end{bmatrix}. \quad (78)$$
Unfortunately, there is an inherent causality problem in the block formulation of the DFE, which can be better seen if we combine (75)-(78) into a single equation, i.e.,
$$\mathbf{u}_{Q1}(k) = f\{\mathbf{y}^p_{Q1} + \mathbf{U}^{uk}_{QN}\,\mathbf{w}^{fb}_{N1}\} \quad (79)$$
with
$$\mathbf{y}^p_{Q1} = \mathbf{X}_{QM}\,\mathbf{w}^{ff}_{M1} + \mathbf{U}^{k}_{QN}\,\mathbf{w}^{fb}_{N1}. \quad (80)$$
By inspecting (79), we can easily observe that the same unknown decisions appear on both sides of the equation (note that the matrix $\mathbf{U}^{uk}_{QN}$ contains symbols which are also entries of $\mathbf{u}_{Q1}(k)$). To overcome the above causality problem, we apply an iterative procedure based on (75) and (76) that was originally proposed in [13]. More specifically, we start with properly chosen initial decisions to replace the unknown ones in $\mathbf{U}^{uk}_{QN}$, and then we obtain new decisions using (75). Instead of considering these decisions as final for the current block, we use them to calculate $\mathbf{y}_{Q1}(k)$ in (76) again. We expect these decisions to be more accurate than the ones used in $\mathbf{U}^{uk}_{QN}$ at the previous step, since they come from the output vector of the filtering part of the block formulation of the DFE. This yields, using (75) again, new decisions that can, in turn, be used in (76), and so on. As shown in [13], this iterative procedure converges to the optimum decisions in at most $Q$ steps for any choice of the initial decision vector. Recall that we consider as optimum the decisions that would be obtained by the conventional symbol-by-symbol DFE with fixed filters for each block. The decision vector $\mathbf{u}_{Q1}(n)$, consisting of these optimum decisions, will be the one satisfying (75) or, alternatively, the equilibrium point of the above nonlinear iterative procedure. Thus, independently of the initial decisions used in $\mathbf{U}^{uk}_{QN}$, by iteratively applying (75) and (76) we can finally obtain a decision vector that satisfies (75), i.e., we can have the same decisions on both sides of (79). In practice, only a small number of iterations (fewer than three) has proved necessary to provide the optimal decision vector. The above iterative scheme is the one used in Step 1 of the algorithm summarized in Table 5.
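The fixed-point iteration of Eqs. (75)-(80) is compact enough to sketch directly; the helper below is our own illustrative rendering (names ours), iterating until the tentative decisions stop changing, with the guaranteed bound of $Q$ passes [13]:

```python
import numpy as np

def block_dfe_decisions(y_p, w_fb, decide, Q):
    """Iterative tentative-decision procedure (Step 1 of Table 5).
    y_p: 'known' part of the block output, Eq. (80); w_fb: feedback
    filter taps; decide: the decision device f{.} of Eq. (75)."""
    u = np.zeros(Q, dtype=complex)                  # initial decisions u^0
    for _ in range(Q):                              # converges in <= Q passes
        # U^{uk} w^{fb}: strictly causal feedback of the current block's
        # own tentative decisions; row q uses u[q-1], u[q-2], ...
        fb = np.array([np.dot(w_fb[:q], u[:q][::-1]) for q in range(Q)])
        u_new = decide(y_p + fb)                    # Eq. (75)
        if np.array_equal(u_new, u):                # fixed point of Eq. (79)
            break
        u = u_new
    return u

# e.g. for QPSK:
# decide = lambda y: (np.sign(y.real) + 1j * np.sign(y.imag)) / np.sqrt(2)
```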

Table 5. The frequency domain conjugate gradient DFE algorithm for the kth data block

Definitions: $Q$: block length; $M$: feed-forward filter length; $N$: feedback filter length; $S = \max(Q,M,N)$; $k$: block index; $\mathbf{F}_{2S}$: Fourier matrix; the matrices $\mathbf{G}^{10}_{2S,M}$, $\mathbf{G}^{10}_{2S,N}$ are defined similarly as in Eqs. (29), (44), respectively. Below, $\mathrm{blkdiag}(\cdot,\cdot)$ denotes a $2\times 2$ block-diagonal matrix and $[\cdot\,;\cdot]$ the stacking of two column vectors.
Initialization: $\mathcal{W}^{FF}(0) = \mathcal{W}^{FB}(0) = \mathbf{0}_{2S\times 1}$, $\mathcal{P}^{FF}(0) = \mathcal{G}^{FF}(0) = \mathbf{0}_{2S\times 1}$, $\mathcal{P}^{FB}(0) = \mathcal{G}^{FB}(0) = \mathbf{0}_{2S\times 1}$, $\mathbf{D}_{C_{xx}}(0) = \mathbf{D}_{C_{xu}}(0) = \mathbf{D}_{C_{ux}}(0) = \mathbf{D}_{C_{uu}}(0) = \mathbf{0}_{2S}$, $\mathbf{P}^{FF}_{2S}(0) = \mathbf{P}^{FB}_{2S}(0) = \mathbf{I}_{2S}$, $k = 1$.

Step 1: Estimate the unknown decisions:
$\mathcal{V}_{xx}(k) = \mathrm{diag}\{\mathbf{F}_{2S}[x(kQ+M-2S),\ldots,x(kQ+M-1)]^T\}$,
$\mathcal{V}^{k}_{uu}(k) = \mathrm{diag}\{\mathbf{F}_{2S}[u(kQ-2S),\ldots,u(kQ-Q),\mathbf{0}_{1\times(Q-1)}]^T\}$,
$\mathbf{y}^p_{Q1} = \mathbf{G}^{01}_{Q,2S}\mathbf{F}^{-1}_{2S}\mathcal{V}_{xx}(k)\mathcal{W}^{FF} + \mathbf{G}^{01}_{Q,2S}\mathbf{F}^{-1}_{2S}\mathcal{V}^{k}_{uu}(k)\mathcal{W}^{FB}$,
$\mathbf{u}^0_{Q1} = [u^0(kQ-Q+1),\ldots,u^0(kQ)]^T = \mathbf{0}_{Q\times 1}$; $i = 0$.
Iterative procedure to obtain the final decisions $\mathbf{u}_{Q1}$:
(a) $\mathcal{V}^{uk}_{uu}(k) = \mathrm{diag}\{\mathbf{F}_{2S}[\mathbf{0}_{1\times(2S-Q+1)}, u^i(kQ-Q+1),\ldots,u^i(kQ-1)]^T\}$;
(b) $\mathbf{y}^i_{Q1} = \mathbf{y}^p_{Q1} + \mathbf{G}^{01}_{Q,2S}\mathbf{F}^{-1}_{2S}\mathcal{V}^{uk}_{uu}(k)\mathcal{W}^{FB}$;
(c) $\mathbf{u}^i_{Q1} = f\{\mathbf{y}^i_{Q1}\}$;
(d) if $\mathbf{u}^i_{Q1} \ne \mathbf{u}^{i-1}_{Q1}$, then $i = i+1$, go to (a);
(e) after $I$ iterations, set $\mathbf{u}_{Q1} = \mathbf{u}^I_{Q1}$ and $\mathbf{y}_{Q1} = \mathbf{y}^I_{Q1}$.
Step 2: Form the error vector: $\mathbf{e}_{Q1} = \mathbf{u}_{Q1} - \mathbf{y}_{Q1}$; $\mathcal{E}_{2S1} = \mathbf{F}_{2S}\mathbf{G}^{01}_{2S,Q}\mathbf{e}_{Q1}$.
Step 3: Compute the correlation quantities:
$\mathcal{X}(k) = \mathbf{F}_{2S}\mathbf{G}^{01}_{2S,Q}\boldsymbol{\Lambda}_Q[x(kQ+M-Q),\ldots,x(kQ+M-1)]^T$,
$\mathcal{U}(k) = \mathbf{F}_{2S}\mathbf{G}^{01}_{2S,Q}\boldsymbol{\Lambda}_Q[u(kQ-Q+1),\ldots,u(kQ)]^T$, $\mathcal{V}_{uu}(k) = \mathcal{V}^{k}_{uu}(k) + \mathcal{V}^{uk}_{uu}(k)$,
$\mathbf{X}^H_{QM}\boldsymbol{\Lambda}_Q\mathbf{u}_{Q1} = \mathbf{G}^{10}_{M,2S}\mathbf{F}^{-1}_{2S}\mathcal{V}^H_{xx}(k)\mathcal{U}(k)$, $\mathbf{U}^H_{QN}\boldsymbol{\Lambda}_Q\mathbf{u}_{Q1} = \mathbf{G}^{10}_{N,2S}\mathbf{F}^{-1}_{2S}\mathcal{V}^H_{uu}(k)\mathcal{U}(k)$,
$\mathbf{X}^H_{QM}\boldsymbol{\Lambda}_Q\mathbf{x}_{Q1} = \mathbf{G}^{10}_{M,2S}\mathbf{F}^{-1}_{2S}\mathcal{V}^H_{xx}(k)\mathcal{X}(k)$, $\mathbf{U}^H_{QN}\boldsymbol{\Lambda}_Q\mathbf{x}_{Q1} = \mathbf{G}^{10}_{N,2S}\mathbf{F}^{-1}_{2S}\mathcal{V}^H_{uu}(k)\mathcal{X}(k)$.
(1) Update the correlation quantities through Eqs. (71)-(74). (2) Compute the eigenvalues of the circulant matrices through Eqs. (81)-(84).
Step 4: Compute the diagonal matrix that contains the power estimates on the different frequency bins:
$\mathbf{D}^{FF}_p(k) = \lambda_p\mathbf{D}^{FF}_p(k-1) + (1-\lambda_p)\mathcal{V}^H_{xx}(k)\mathcal{V}_{xx}(k)$,
$\mathbf{D}^{FB}_p(k) = \lambda_p\mathbf{D}^{FB}_p(k-1) + (1-\lambda_p)\mathcal{V}^H_{uu}(k)\mathcal{V}_{uu}(k)$,
$\mathbf{D}_p(k) = \mathrm{blkdiag}(\mathbf{D}^{FF}_p(k), \mathbf{D}^{FB}_p(k))$.
Step 5: Decouple the direction vector:
if $k = 1$: $\mathcal{Z}(1) = \mathrm{blkdiag}(\mathbf{L}^{10}_{2S},\mathbf{L}^{10}_{2S})\,\mathbf{D}^{-1}_p(k)\,\mathrm{blkdiag}(\mathbf{L}^{10}_{2S},\mathbf{L}^{10}_{2S})\,\mathrm{blkdiag}(\mathcal{V}^H_{xx}(k),\mathcal{V}^H_{uu}(k))\,[\mathcal{E}_{2S1}(k);\mathcal{E}_{2S1}(k)]$;
else $\mathcal{Z}(k) = \mathrm{blkdiag}(\mathbf{L}^{10}_{2S},\mathbf{L}^{10}_{2S})\,\mathbf{D}^{-1}_p(k)\,[\mathcal{P}^{FF}(k);\mathcal{P}^{FB}(k)]$.
Step 6: Calculate the step size, update the gradient vector and compute the factor $\beta(k)$ as in Step 6 of Table 3:
(i) set $\mathcal{G}(k) = [\mathcal{G}^{FF}(k);\mathcal{G}^{FB}(k)]$, $\mathcal{E}(k) = [\mathcal{E}_{2S1}(k);\mathcal{E}_{2S1}(k)]$, $\mathbf{D}_C(k) = \begin{bmatrix} \mathbf{D}_{C_{xx}}(k) & \mathbf{D}_{C_{xu}}(k) \\ \mathbf{D}_{C_{ux}}(k) & \mathbf{D}_{C_{uu}}(k) \end{bmatrix}$;
(ii) substitute $\mathbf{L}^{10}_{2S}$ with $\mathrm{blkdiag}(\mathbf{L}^{10}_{2S},\mathbf{L}^{10}_{2S})$ and $\mathcal{V}^H(k)$ with $\mathrm{blkdiag}(\mathcal{V}^H_{xx}(k),\mathcal{V}^H_{uu}(k))$.
Step 7: Update the direction and the coefficient vector as in Step 7 of Table 3:
(i) set $\mathcal{W}(k) = [\mathcal{W}^{FF}(k);\mathcal{W}^{FB}(k)]$, $\mathcal{P}(k) = [\mathcal{P}^{FF}(k);\mathcal{P}^{FB}(k)]$.

To proceed further, we notice that the input auto-correlation matrix in (65) cannot be approximated by a Toeplitz matrix, as it was in the single-input single-output (SISO) case treated in Section 3. Therefore, the algorithm of Table 3 is not directly applicable to the DFE case, since the matrix-vector product $\mathbf{R}(k)\mathbf{p}(k)$, with $\mathbf{p}(k)$ being the $(M+N)\times 1$ direction vector, cannot be implemented in the FD by following the procedure that was described in the previous section. However, one can easily observe that each of the sub-matrices defined in Eqs. (66)-(69) may be approximated by a Toeplitz matrix, and the matrix-vector product may be implemented as
$$\mathbf{R}(k)\,\mathbf{p}(k) = \begin{bmatrix} \tilde{\mathbf{R}}^{xx}_M(k)\,\mathbf{p}^{ff}_{M1}(k) + \tilde{\mathbf{R}}^{xu}_{MN}(k)\,\mathbf{p}^{fb}_{N1}(k) \\ \tilde{\mathbf{R}}^{ux}_{NM}(k)\,\mathbf{p}^{ff}_{M1}(k) + \tilde{\mathbf{R}}^{uu}_N(k)\,\mathbf{p}^{fb}_{N1}(k) \end{bmatrix},$$
where the vectors $\mathbf{p}^{ff}_{M1}(k)$, $\mathbf{p}^{fb}_{N1}(k)$ contain the first $M$ and the last $N$ elements of $\mathbf{p}(k)$, respectively. At this point it is important to mention that the four partial matrix-vector products that appear in the computation of $\mathbf{R}(k)\mathbf{p}(k)$ may be efficiently implemented in the FD, as described in the previous section. Thus, at each incoming block of data, the eigenvalues of four circulant matrices $\mathbf{D}_{C_{xx}}$, $\mathbf{D}_{C_{xu}}$, $\mathbf{D}_{C_{ux}}$, $\mathbf{D}_{C_{uu}}$ are computed by
$$\mathbf{D}_{C_{xx}} = \mathrm{diag}\{\mathbf{F}_{2S}[\mathbf{r}^{xx\,T}_{M1}(k)\;\;\mathbf{0}_{1\times(2S-2M+1)}\;\;(\mathbf{r}^{xx}_f)^{*T}]^T\}, \quad (81)$$
$$\mathbf{D}_{C_{uu}} = \mathrm{diag}\{\mathbf{F}_{2S}[\mathbf{r}^{uu\,T}_{N1}(k)\;\;\mathbf{0}_{1\times(2S-2N+1)}\;\;(\mathbf{r}^{uu}_f)^{*T}]^T\}, \quad (82)$$
$$\mathbf{D}_{C_{xu}} = \mathrm{diag}\{\mathbf{F}_{2S}[\mathbf{r}^{xu\,T}_{M1}(k)\;\;\mathbf{0}_{1\times(2S-L+1)}\;\;(\mathbf{r}^{ux}_f)^{*T}]^T\}, \quad (83)$$
$$\mathbf{D}_{C_{ux}} = \mathrm{diag}\{\mathbf{F}_{2S}[\mathbf{r}^{ux\,T}_{N1}(k)\;\;\mathbf{0}_{1\times(2S-L+1)}\;\;(\mathbf{r}^{xu}_f)^{*T}]^T\}, \quad (84)$$
where $L = M+N$. The vectors $\mathbf{r}^{xx}_f$, $\mathbf{r}^{xu}_f$ consist of the $M-1$ last elements of $\mathbf{r}^{xx}_{M1}$, $\mathbf{r}^{xu}_{M1}$ in reverse order; the vectors $\mathbf{r}^{uu}_f$, $\mathbf{r}^{ux}_f$ are defined in a similar way. Finally, the products $\mathbf{X}^H_{QM}\boldsymbol{\Lambda}_Q\mathbf{u}_{Q1}$, $\mathbf{U}^H_{QN}\boldsymbol{\Lambda}_Q\mathbf{u}_{Q1}$, $\mathbf{X}^H_{QM}\boldsymbol{\Lambda}_Q\mathbf{x}_{Q1}$, $\mathbf{U}^H_{QN}\boldsymbol{\Lambda}_Q\mathbf{x}_{Q1}$ and $\mathbf{X}_{QM}\mathbf{w}^{ff}_{M1}$, $\mathbf{U}^{k}_{QN}\mathbf{w}^{fb}_{N1}$, $\mathbf{U}^{uk}_{QN}\mathbf{w}^{fb}_{N1}$ that appear in Eqs. (71)-(74) and (75), (76), respectively, are computed by employing the OS sectioning method, as shown in Steps 1 and 3 of Table 5.

5. Simulation results

To illustrate the performance of the LE and DFE algorithms, we provide some simulation results. The experiments were carried out for different wireless channel cases. Initially, we investigated the performance of the algorithms in two time-invariant channels, named channel A and vehicular B [21,22]. Channel A contained four multipath components with amplitudes 0.9333, 0.5012, 0.5129, 0.5370 and time delays $0T_s$, $7T_s$, $14T_s$, $21T_s$, respectively, where $T_s$ corresponds to the symbol period. Vehicular B contained six multipath components with amplitudes 0.5623, 1, 0.0525, 0.1, 0.0030, 0.0251, respectively, and the corresponding time delays with respect to the main peak were $-T_s$, $0T_s$, $17T_s$, $25T_s$, $33T_s$, $39T_s$, respectively. The multipath component phases in both channels were chosen randomly. The input sequence consisted of QPSK symbols, while at the output of the channel complex white Gaussian noise was added, resulting in an SNR equal to 15 dB. For all experiments, the step-sizing parameters involved in the different algorithms were adjusted properly, so that the algorithms achieved the same steady-state error. In addition, for all FD algorithms, the same initialization strategy was followed. Finally, in the tests we conducted, all LE and DFE structures had a proper training phase and the respective filters were equal. More specifically, in all LE algorithms, the filter was of order $M = 64$, the delay was $M_1 = 16$ and the block length was chosen to be equal to the filter order. In all DFE algorithms, the involved FF and FB filters were of orders $M = 16$ and $N = 64$, respectively, while the block length was equal to $Q = 16$.

In Figs. 1(a) and (b) we provide the ensemble-averaged learning curves from 100 independent experiments for the FDCG algorithm, the OS FDAF (FDLMS), the modified CG and the RLS algorithm, in both the LE and the DFE case, when the symbols are transmitted through channel A. It is important to mention that in the DFE case the FDLMS-DFE corresponds to the EBA-DFE that was proposed in [13]. Note that the three curves of the RLS, MCG and FDCG almost coincide. The performance of the LE and the DFE algorithms when the symbols are transmitted through vehicular B is shown in Figs. 2(a) and (b). For all algorithms, the minimum mean-squared error (MMSE) in the LE and in the DFE case was calculated by using the actual channel and noise parameters, according to equations that have been derived in [23,24]. It is clear that both the LE and DFE algorithms proposed here outperform the NLMS and FDLMS and exhibit similar performance to the RLS and the MCG.


More information

Ch4: Method of Steepest Descent

Ch4: Method of Steepest Descent Ch4: Method of Steepest Descent The method of steepest descent is recursive in the sense that starting from some initial (arbitrary) value for the tap-weight vector, it improves with the increased number

More information

A Generalization of Blind Source Separation Algorithms for Convolutive Mixtures Based on Second-Order Statistics

A Generalization of Blind Source Separation Algorithms for Convolutive Mixtures Based on Second-Order Statistics 120 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL 13, NO 1, JANUARY 2005 A Generalization of Blind Source Separation Algorithms for Convolutive Mixtures Based on Second-Order Statistics Herbert

More information

Lesson 1. Optimal signalbehandling LTH. September Statistical Digital Signal Processing and Modeling, Hayes, M:

Lesson 1. Optimal signalbehandling LTH. September Statistical Digital Signal Processing and Modeling, Hayes, M: Lesson 1 Optimal Signal Processing Optimal signalbehandling LTH September 2013 Statistical Digital Signal Processing and Modeling, Hayes, M: John Wiley & Sons, 1996. ISBN 0471594318 Nedelko Grbic Mtrl

More information

Acoustic MIMO Signal Processing

Acoustic MIMO Signal Processing Yiteng Huang Jacob Benesty Jingdong Chen Acoustic MIMO Signal Processing With 71 Figures Ö Springer Contents 1 Introduction 1 1.1 Acoustic MIMO Signal Processing 1 1.2 Organization of the Book 4 Part I

More information

26. Filtering. ECE 830, Spring 2014

26. Filtering. ECE 830, Spring 2014 26. Filtering ECE 830, Spring 2014 1 / 26 Wiener Filtering Wiener filtering is the application of LMMSE estimation to recovery of a signal in additive noise under wide sense sationarity assumptions. Problem

More information

Linear Optimum Filtering: Statement

Linear Optimum Filtering: Statement Ch2: Wiener Filters Optimal filters for stationary stochastic models are reviewed and derived in this presentation. Contents: Linear optimal filtering Principle of orthogonality Minimum mean squared error

More information

LECTURE 16 AND 17. Digital signaling on frequency selective fading channels. Notes Prepared by: Abhishek Sood

LECTURE 16 AND 17. Digital signaling on frequency selective fading channels. Notes Prepared by: Abhishek Sood ECE559:WIRELESS COMMUNICATION TECHNOLOGIES LECTURE 16 AND 17 Digital signaling on frequency selective fading channels 1 OUTLINE Notes Prepared by: Abhishek Sood In section 2 we discuss the receiver design

More information

Digital Image Processing

Digital Image Processing Digital Image Processing 2D SYSTEMS & PRELIMINARIES Hamid R. Rabiee Fall 2015 Outline 2 Two Dimensional Fourier & Z-transform Toeplitz & Circulant Matrices Orthogonal & Unitary Matrices Block Matrices

More information

Fast principal component analysis using fixed-point algorithm

Fast principal component analysis using fixed-point algorithm Pattern Recognition Letters 28 (27) 1151 1155 www.elsevier.com/locate/patrec Fast principal component analysis using fixed-point algorithm Alok Sharma *, Kuldip K. Paliwal Signal Processing Lab, Griffith

More information

MMSE Equalizer Design

MMSE Equalizer Design MMSE Equalizer Design Phil Schniter March 6, 2008 [k] a[m] P a [k] g[k] m[k] h[k] + ṽ[k] q[k] y [k] P y[m] For a trivial channel (i.e., h[k] = δ[k]), e kno that the use of square-root raisedcosine (SRRC)

More information

EE482: Digital Signal Processing Applications

EE482: Digital Signal Processing Applications Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 11 Adaptive Filtering 14/03/04 http://www.ee.unlv.edu/~b1morris/ee482/

More information

Identification of nonlinear discrete-time systems using raised-cosine radial basis function networks

Identification of nonlinear discrete-time systems using raised-cosine radial basis function networks International Journal of Systems Science volume 35, number 4, April 4, pages Identification of nonlinear discrete-time systems using raised-cosine radial basis function networks A. F. AL-AJLOUNI{, R.J.SCHILLING{*

More information

MMSE Decision Feedback Equalization of Pulse Position Modulated Signals

MMSE Decision Feedback Equalization of Pulse Position Modulated Signals SE Decision Feedback Equalization of Pulse Position odulated Signals AG Klein and CR Johnson, Jr School of Electrical and Computer Engineering Cornell University, Ithaca, NY 4853 email: agk5@cornelledu

More information

Expectation propagation for signal detection in flat-fading channels

Expectation propagation for signal detection in flat-fading channels Expectation propagation for signal detection in flat-fading channels Yuan Qi MIT Media Lab Cambridge, MA, 02139 USA yuanqi@media.mit.edu Thomas Minka CMU Statistics Department Pittsburgh, PA 15213 USA

More information

Adaptive MMSE Equalizer with Optimum Tap-length and Decision Delay

Adaptive MMSE Equalizer with Optimum Tap-length and Decision Delay Adaptive MMSE Equalizer with Optimum Tap-length and Decision Delay Yu Gong, Xia Hong and Khalid F. Abu-Salim School of Systems Engineering The University of Reading, Reading RG6 6AY, UK E-mail: {y.gong,x.hong,k.f.abusalem}@reading.ac.uk

More information

Signal Modeling Techniques in Speech Recognition. Hassan A. Kingravi

Signal Modeling Techniques in Speech Recognition. Hassan A. Kingravi Signal Modeling Techniques in Speech Recognition Hassan A. Kingravi Outline Introduction Spectral Shaping Spectral Analysis Parameter Transforms Statistical Modeling Discussion Conclusions 1: Introduction

More information

Shannon meets Wiener II: On MMSE estimation in successive decoding schemes

Shannon meets Wiener II: On MMSE estimation in successive decoding schemes Shannon meets Wiener II: On MMSE estimation in successive decoding schemes G. David Forney, Jr. MIT Cambridge, MA 0239 USA forneyd@comcast.net Abstract We continue to discuss why MMSE estimation arises

More information

Lecture: Adaptive Filtering

Lecture: Adaptive Filtering ECE 830 Spring 2013 Statistical Signal Processing instructors: K. Jamieson and R. Nowak Lecture: Adaptive Filtering Adaptive filters are commonly used for online filtering of signals. The goal is to estimate

More information

Title without the persistently exciting c. works must be obtained from the IEE

Title without the persistently exciting c.   works must be obtained from the IEE Title Exact convergence analysis of adapt without the persistently exciting c Author(s) Sakai, H; Yang, JM; Oka, T Citation IEEE TRANSACTIONS ON SIGNAL 55(5): 2077-2083 PROCESS Issue Date 2007-05 URL http://hdl.handle.net/2433/50544

More information

Revision of Lecture 4

Revision of Lecture 4 Revision of Lecture 4 We have discussed all basic components of MODEM Pulse shaping Tx/Rx filter pair Modulator/demodulator Bits map symbols Discussions assume ideal channel, and for dispersive channel

More information

Recursive Generalized Eigendecomposition for Independent Component Analysis

Recursive Generalized Eigendecomposition for Independent Component Analysis Recursive Generalized Eigendecomposition for Independent Component Analysis Umut Ozertem 1, Deniz Erdogmus 1,, ian Lan 1 CSEE Department, OGI, Oregon Health & Science University, Portland, OR, USA. {ozertemu,deniz}@csee.ogi.edu

More information

EE482: Digital Signal Processing Applications

EE482: Digital Signal Processing Applications Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 11 Adaptive Filtering 14/03/04 http://www.ee.unlv.edu/~b1morris/ee482/

More information

EE4601 Communication Systems

EE4601 Communication Systems EE4601 Communication Systems Week 13 Linear Zero Forcing Equalization 0 c 2012, Georgia Institute of Technology (lect13 1) Equalization The cascade of the transmit filter g(t), channel c(t), receiver filter

More information

BLOCK LMS ADAPTIVE FILTER WITH DETERMINISTIC REFERENCE INPUTS FOR EVENT-RELATED SIGNALS

BLOCK LMS ADAPTIVE FILTER WITH DETERMINISTIC REFERENCE INPUTS FOR EVENT-RELATED SIGNALS BLOCK LMS ADAPTIVE FILTER WIT DETERMINISTIC REFERENCE INPUTS FOR EVENT-RELATED SIGNALS S. Olmos, L. Sörnmo, P. Laguna Dept. of Electroscience, Lund University, Sweden Dept. of Electronics Eng. and Communications,

More information

Adaptive Filters. un [ ] yn [ ] w. yn n wun k. - Adaptive filter (FIR): yn n n w nun k. (1) Identification. Unknown System + (2) Inverse modeling

Adaptive Filters. un [ ] yn [ ] w. yn n wun k. - Adaptive filter (FIR): yn n n w nun k. (1) Identification. Unknown System + (2) Inverse modeling Adaptive Filters - Statistical digital signal processing: in many problems of interest, the signals exhibit some inherent variability plus additive noise we use probabilistic laws to model the statistical

More information

Lecture 19 IIR Filters

Lecture 19 IIR Filters Lecture 19 IIR Filters Fundamentals of Digital Signal Processing Spring, 2012 Wei-Ta Chu 2012/5/10 1 General IIR Difference Equation IIR system: infinite-impulse response system The most general class

More information

Error Entropy Criterion in Echo State Network Training

Error Entropy Criterion in Echo State Network Training Error Entropy Criterion in Echo State Network Training Levy Boccato 1, Daniel G. Silva 1, Denis Fantinato 1, Kenji Nose Filho 1, Rafael Ferrari 1, Romis Attux 1, Aline Neves 2, Jugurta Montalvão 3 and

More information

Implicit 3-D depth migration by wavefield extrapolation with helical boundary conditions

Implicit 3-D depth migration by wavefield extrapolation with helical boundary conditions Stanford Exploration Project, Report 97, July 8, 1998, pages 1 13 Implicit 3-D depth migration by wavefield extrapolation with helical boundary conditions James Rickett, Jon Claerbout, and Sergey Fomel

More information

BLOCK-BASED MULTICHANNEL TRANSFORM-DOMAIN ADAPTIVE FILTERING

BLOCK-BASED MULTICHANNEL TRANSFORM-DOMAIN ADAPTIVE FILTERING BLOCK-BASED MULTICHANNEL TRANSFORM-DOMAIN ADAPTIVE FILTERING Sascha Spors, Herbert Buchner, and Karim Helwani Deutsche Telekom Laboratories, Technische Universität Berlin, Ernst-Reuter-Platz 7, 10587 Berlin,

More information

5 Kalman filters. 5.1 Scalar Kalman filter. Unit delay Signal model. System model

5 Kalman filters. 5.1 Scalar Kalman filter. Unit delay Signal model. System model 5 Kalman filters 5.1 Scalar Kalman filter 5.1.1 Signal model System model {Y (n)} is an unobservable sequence which is described by the following state or system equation: Y (n) = h(n)y (n 1) + Z(n), n

More information

ADAPTIVE FILTER THEORY

ADAPTIVE FILTER THEORY ADAPTIVE FILTER THEORY Fifth Edition Simon Haykin Communications Research Laboratory McMaster University Hamilton, Ontario, Canada International Edition contributions by Telagarapu Prabhakar Department

More information

Linear Solvers. Andrew Hazel

Linear Solvers. Andrew Hazel Linear Solvers Andrew Hazel Introduction Thus far we have talked about the formulation and discretisation of physical problems...... and stopped when we got to a discrete linear system of equations. Introduction

More information

ADAPTIVE ANTENNAS. SPATIAL BF

ADAPTIVE ANTENNAS. SPATIAL BF ADAPTIVE ANTENNAS SPATIAL BF 1 1-Spatial reference BF -Spatial reference beamforming may not use of embedded training sequences. Instead, the directions of arrival (DoA) of the impinging waves are used

More information

Discrete Simulation of Power Law Noise

Discrete Simulation of Power Law Noise Discrete Simulation of Power Law Noise Neil Ashby 1,2 1 University of Colorado, Boulder, CO 80309-0390 USA 2 National Institute of Standards and Technology, Boulder, CO 80305 USA ashby@boulder.nist.gov

More information

THE RELATIONSHIPS BETWEEN CG, BFGS, AND TWO LIMITED-MEMORY ALGORITHMS

THE RELATIONSHIPS BETWEEN CG, BFGS, AND TWO LIMITED-MEMORY ALGORITHMS Furman University Electronic Journal of Undergraduate Mathematics Volume 12, 5 20, 2007 HE RELAIONSHIPS BEWEEN CG, BFGS, AND WO LIMIED-MEMORY ALGORIHMS ZHIWEI (ONY) QIN Abstract. For the solution of linear

More information

DISCRETE-TIME SIGNAL PROCESSING

DISCRETE-TIME SIGNAL PROCESSING THIRD EDITION DISCRETE-TIME SIGNAL PROCESSING ALAN V. OPPENHEIM MASSACHUSETTS INSTITUTE OF TECHNOLOGY RONALD W. SCHÄFER HEWLETT-PACKARD LABORATORIES Upper Saddle River Boston Columbus San Francisco New

More information

Comparative Performance Analysis of Three Algorithms for Principal Component Analysis

Comparative Performance Analysis of Three Algorithms for Principal Component Analysis 84 R. LANDQVIST, A. MOHAMMED, COMPARATIVE PERFORMANCE ANALYSIS OF THR ALGORITHMS Comparative Performance Analysis of Three Algorithms for Principal Component Analysis Ronnie LANDQVIST, Abbas MOHAMMED Dept.

More information

Blind Channel Equalization in Impulse Noise

Blind Channel Equalization in Impulse Noise Blind Channel Equalization in Impulse Noise Rubaiyat Yasmin and Tetsuya Shimamura Graduate School of Science and Engineering, Saitama University 255 Shimo-okubo, Sakura-ku, Saitama 338-8570, Japan yasmin@sie.ics.saitama-u.ac.jp

More information

3. ESTIMATION OF SIGNALS USING A LEAST SQUARES TECHNIQUE

3. ESTIMATION OF SIGNALS USING A LEAST SQUARES TECHNIQUE 3. ESTIMATION OF SIGNALS USING A LEAST SQUARES TECHNIQUE 3.0 INTRODUCTION The purpose of this chapter is to introduce estimators shortly. More elaborated courses on System Identification, which are given

More information

A Computationally Efficient Block Transmission Scheme Based on Approximated Cholesky Factors

A Computationally Efficient Block Transmission Scheme Based on Approximated Cholesky Factors A Computationally Efficient Block Transmission Scheme Based on Approximated Cholesky Factors C. Vincent Sinn Telecommunications Laboratory University of Sydney, Australia cvsinn@ee.usyd.edu.au Daniel Bielefeld

More information

A Novel Technique to Improve the Online Calculation Performance of Nonlinear Problems in DC Power Systems

A Novel Technique to Improve the Online Calculation Performance of Nonlinear Problems in DC Power Systems electronics Article A Novel Technique to Improve the Online Calculation Performance of Nonlinear Problems in DC Power Systems Qingshan Xu 1, Yuqi Wang 1, * ID, Minjian Cao 1 and Jiaqi Zheng 2 1 School

More information

Input output linearization with delay cancellation for nonlinear delay systems: the problem of the internal stability

Input output linearization with delay cancellation for nonlinear delay systems: the problem of the internal stability INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL Int J Robust Nonlinear Control 23; 13:99 937 (DOI: 112/rnc83) Input output linearization with delay cancellation for nonlinear delay systems: the problem

More information

Conjugate Directions for Stochastic Gradient Descent

Conjugate Directions for Stochastic Gradient Descent Conjugate Directions for Stochastic Gradient Descent Nicol N Schraudolph Thore Graepel Institute of Computational Science ETH Zürich, Switzerland {schraudo,graepel}@infethzch Abstract The method of conjugate

More information

Subspace Identification

Subspace Identification Chapter 10 Subspace Identification Given observations of m 1 input signals, and p 1 signals resulting from those when fed into a dynamical system under study, can we estimate the internal dynamics regulating

More information

Lecture 10: September 26

Lecture 10: September 26 0-725: Optimization Fall 202 Lecture 0: September 26 Lecturer: Barnabas Poczos/Ryan Tibshirani Scribes: Yipei Wang, Zhiguang Huo Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These

More information

CONTROL SYSTEMS ANALYSIS VIA BLIND SOURCE DECONVOLUTION. Kenji Sugimoto and Yoshito Kikkawa

CONTROL SYSTEMS ANALYSIS VIA BLIND SOURCE DECONVOLUTION. Kenji Sugimoto and Yoshito Kikkawa CONTROL SYSTEMS ANALYSIS VIA LIND SOURCE DECONVOLUTION Kenji Sugimoto and Yoshito Kikkawa Nara Institute of Science and Technology Graduate School of Information Science 896-5 Takayama-cho, Ikoma-city,

More information

Vandermonde-form Preserving Matrices And The Generalized Signal Richness Preservation Problem

Vandermonde-form Preserving Matrices And The Generalized Signal Richness Preservation Problem Vandermonde-form Preserving Matrices And The Generalized Signal Richness Preservation Problem Borching Su Department of Electrical Engineering California Institute of Technology Pasadena, California 91125

More information

Numerical Methods in Matrix Computations

Numerical Methods in Matrix Computations Ake Bjorck Numerical Methods in Matrix Computations Springer Contents 1 Direct Methods for Linear Systems 1 1.1 Elements of Matrix Theory 1 1.1.1 Matrix Algebra 2 1.1.2 Vector Spaces 6 1.1.3 Submatrices

More information

DETECTION theory deals primarily with techniques for

DETECTION theory deals primarily with techniques for ADVANCED SIGNAL PROCESSING SE Optimum Detection of Deterministic and Random Signals Stefan Tertinek Graz University of Technology turtle@sbox.tugraz.at Abstract This paper introduces various methods for

More information

926 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 53, NO. 3, MARCH Monica Nicoli, Member, IEEE, and Umberto Spagnolini, Senior Member, IEEE (1)

926 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 53, NO. 3, MARCH Monica Nicoli, Member, IEEE, and Umberto Spagnolini, Senior Member, IEEE (1) 926 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 53, NO. 3, MARCH 2005 Reduced-Rank Channel Estimation for Time-Slotted Mobile Communication Systems Monica Nicoli, Member, IEEE, and Umberto Spagnolini,

More information

Properties of Zero-Free Spectral Matrices Brian D. O. Anderson, Life Fellow, IEEE, and Manfred Deistler, Fellow, IEEE

Properties of Zero-Free Spectral Matrices Brian D. O. Anderson, Life Fellow, IEEE, and Manfred Deistler, Fellow, IEEE IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL 54, NO 10, OCTOBER 2009 2365 Properties of Zero-Free Spectral Matrices Brian D O Anderson, Life Fellow, IEEE, and Manfred Deistler, Fellow, IEEE Abstract In

More information

Adaptive Beamforming Algorithms

Adaptive Beamforming Algorithms S. R. Zinka srinivasa_zinka@daiict.ac.in October 29, 2014 Outline 1 Least Mean Squares 2 Sample Matrix Inversion 3 Recursive Least Squares 4 Accelerated Gradient Approach 5 Conjugate Gradient Method Outline

More information

IMPROVEMENTS IN ACTIVE NOISE CONTROL OF HELICOPTER NOISE IN A MOCK CABIN ABSTRACT

IMPROVEMENTS IN ACTIVE NOISE CONTROL OF HELICOPTER NOISE IN A MOCK CABIN ABSTRACT IMPROVEMENTS IN ACTIVE NOISE CONTROL OF HELICOPTER NOISE IN A MOCK CABIN Jared K. Thomas Brigham Young University Department of Mechanical Engineering ABSTRACT The application of active noise control (ANC)

More information

SIMON FRASER UNIVERSITY School of Engineering Science

SIMON FRASER UNIVERSITY School of Engineering Science SIMON FRASER UNIVERSITY School of Engineering Science Course Outline ENSC 810-3 Digital Signal Processing Calendar Description This course covers advanced digital signal processing techniques. The main

More information

Iterative Learning Control Analysis and Design I

Iterative Learning Control Analysis and Design I Iterative Learning Control Analysis and Design I Electronics and Computer Science University of Southampton Southampton, SO17 1BJ, UK etar@ecs.soton.ac.uk http://www.ecs.soton.ac.uk/ Contents Basics Representations

More information

Optimal and Adaptive Filtering

Optimal and Adaptive Filtering Optimal and Adaptive Filtering Murat Üney M.Uney@ed.ac.uk Institute for Digital Communications (IDCOM) 26/06/2017 Murat Üney (IDCOM) Optimal and Adaptive Filtering 26/06/2017 1 / 69 Table of Contents 1

More information

Submitted to Electronics Letters. Indexing terms: Signal Processing, Adaptive Filters. The Combined LMS/F Algorithm Shao-Jen Lim and John G. Harris Co

Submitted to Electronics Letters. Indexing terms: Signal Processing, Adaptive Filters. The Combined LMS/F Algorithm Shao-Jen Lim and John G. Harris Co Submitted to Electronics Letters. Indexing terms: Signal Processing, Adaptive Filters. The Combined LMS/F Algorithm Shao-Jen Lim and John G. Harris Computational Neuro-Engineering Laboratory University

More information

Linear Prediction Theory

Linear Prediction Theory Linear Prediction Theory Joseph A. O Sullivan ESE 524 Spring 29 March 3, 29 Overview The problem of estimating a value of a random process given other values of the random process is pervasive. Many problems

More information

An Iterative Blind Source Separation Method for Convolutive Mixtures of Images

An Iterative Blind Source Separation Method for Convolutive Mixtures of Images An Iterative Blind Source Separation Method for Convolutive Mixtures of Images Marc Castella and Jean-Christophe Pesquet Université de Marne-la-Vallée / UMR-CNRS 8049 5 bd Descartes, Champs-sur-Marne 77454

More information

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 49, NO. 10, OCTOBER

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 49, NO. 10, OCTOBER TRANSACTIONS ON INFORMATION THEORY, VOL 49, NO 10, OCTOBER 2003 1 Algebraic Properties of Space Time Block Codes in Intersymbol Interference Multiple-Access Channels Suhas N Diggavi, Member,, Naofal Al-Dhahir,

More information

4. Multilayer Perceptrons

4. Multilayer Perceptrons 4. Multilayer Perceptrons This is a supervised error-correction learning algorithm. 1 4.1 Introduction A multilayer feedforward network consists of an input layer, one or more hidden layers, and an output

More information

Analysis of Communication Systems Using Iterative Methods Based on Banach s Contraction Principle

Analysis of Communication Systems Using Iterative Methods Based on Banach s Contraction Principle Analysis of Communication Systems Using Iterative Methods Based on Banach s Contraction Principle H. Azari Soufiani, M. J. Saberian, M. A. Akhaee, R. Nasiri Mahallati, F. Marvasti Multimedia Signal, Sound

More information

Prof. Dr.-Ing. Armin Dekorsy Department of Communications Engineering. Stochastic Processes and Linear Algebra Recap Slides

Prof. Dr.-Ing. Armin Dekorsy Department of Communications Engineering. Stochastic Processes and Linear Algebra Recap Slides Prof. Dr.-Ing. Armin Dekorsy Department of Communications Engineering Stochastic Processes and Linear Algebra Recap Slides Stochastic processes and variables XX tt 0 = XX xx nn (tt) xx 2 (tt) XX tt XX

More information

Assesment of the efficiency of the LMS algorithm based on spectral information

Assesment of the efficiency of the LMS algorithm based on spectral information Assesment of the efficiency of the algorithm based on spectral information (Invited Paper) Aaron Flores and Bernard Widrow ISL, Department of Electrical Engineering, Stanford University, Stanford CA, USA

More information

Spectral Processing. Misha Kazhdan

Spectral Processing. Misha Kazhdan Spectral Processing Misha Kazhdan [Taubin, 1995] A Signal Processing Approach to Fair Surface Design [Desbrun, et al., 1999] Implicit Fairing of Arbitrary Meshes [Vallet and Levy, 2008] Spectral Geometry

More information

EEG- Signal Processing

EEG- Signal Processing Fatemeh Hadaeghi EEG- Signal Processing Lecture Notes for BSP, Chapter 5 Master Program Data Engineering 1 5 Introduction The complex patterns of neural activity, both in presence and absence of external

More information

3.4 Linear Least-Squares Filter

3.4 Linear Least-Squares Filter X(n) = [x(1), x(2),..., x(n)] T 1 3.4 Linear Least-Squares Filter Two characteristics of linear least-squares filter: 1. The filter is built around a single linear neuron. 2. The cost function is the sum

More information

Chirp Transform for FFT

Chirp Transform for FFT Chirp Transform for FFT Since the FFT is an implementation of the DFT, it provides a frequency resolution of 2π/N, where N is the length of the input sequence. If this resolution is not sufficient in a

More information

On Input Design for System Identification

On Input Design for System Identification On Input Design for System Identification Input Design Using Markov Chains CHIARA BRIGHENTI Masters Degree Project Stockholm, Sweden March 2009 XR-EE-RT 2009:002 Abstract When system identification methods

More information

Lifted approach to ILC/Repetitive Control

Lifted approach to ILC/Repetitive Control Lifted approach to ILC/Repetitive Control Okko H. Bosgra Maarten Steinbuch TUD Delft Centre for Systems and Control TU/e Control System Technology Dutch Institute of Systems and Control DISC winter semester

More information

Temporal Backpropagation for FIR Neural Networks

Temporal Backpropagation for FIR Neural Networks Temporal Backpropagation for FIR Neural Networks Eric A. Wan Stanford University Department of Electrical Engineering, Stanford, CA 94305-4055 Abstract The traditional feedforward neural network is a static

More information

A Derivation of the Steady-State MSE of RLS: Stationary and Nonstationary Cases

A Derivation of the Steady-State MSE of RLS: Stationary and Nonstationary Cases A Derivation of the Steady-State MSE of RLS: Stationary and Nonstationary Cases Phil Schniter Nov. 0, 001 Abstract In this report we combine the approach of Yousef and Sayed [1] with that of Rupp and Sayed

More information

Conjugate gradient algorithm for training neural networks

Conjugate gradient algorithm for training neural networks . Introduction Recall that in the steepest-descent neural network training algorithm, consecutive line-search directions are orthogonal, such that, where, gwt [ ( + ) ] denotes E[ w( t + ) ], the gradient

More information

"APPENDIX. Properties and Construction of the Root Loci " E-1 K ¼ 0ANDK ¼1POINTS

APPENDIX. Properties and Construction of the Root Loci  E-1 K ¼ 0ANDK ¼1POINTS Appendix-E_1 5/14/29 1 "APPENDIX E Properties and Construction of the Root Loci The following properties of the root loci are useful for constructing the root loci manually and for understanding the root

More information

SPEECH ANALYSIS AND SYNTHESIS

SPEECH ANALYSIS AND SYNTHESIS 16 Chapter 2 SPEECH ANALYSIS AND SYNTHESIS 2.1 INTRODUCTION: Speech signal analysis is used to characterize the spectral information of an input speech signal. Speech signal analysis [52-53] techniques

More information

CONVENTIONAL decision feedback equalizers (DFEs)

CONVENTIONAL decision feedback equalizers (DFEs) 2092 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 52, NO. 7, JULY 2004 Adaptive Minimum Symbol-Error-Rate Decision Feedback Equalization for Multilevel Pulse-Amplitude Modulation Sheng Chen, Senior Member,

More information

An Improved Blind Spectrum Sensing Algorithm Based on QR Decomposition and SVM

An Improved Blind Spectrum Sensing Algorithm Based on QR Decomposition and SVM An Improved Blind Spectrum Sensing Algorithm Based on QR Decomposition and SVM Yaqin Chen 1,(&), Xiaojun Jing 1,, Wenting Liu 1,, and Jia Li 3 1 School of Information and Communication Engineering, Beijing

More information