An Efficient Low-Complexity Technique for MLSE Equalizers for Linear and Nonlinear Channels

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 51, NO. 12, DECEMBER 2003, p. 3236

Yannis Kopsinis and Sergios Theodoridis, Senior Member, IEEE

Abstract: In this paper, a novel sequence equalizer, which belongs to the family of cluster-based sequence equalizers, is presented. The proposed algorithm achieves the maximum likelihood solution to the equalization problem at a fraction of the computational load of the classic maximum likelihood sequence estimation (MLSE) equalizers. The new method does not require the estimation of the channel impulse response. Instead, it utilizes the estimates of the cluster centers formed by the received observations. Furthermore, a new cluster center estimation scheme, which exploits the intrinsic dependencies among the cluster centers, is proposed. The new center estimation method converges faster than an LMS-based channel estimator, and this gain in performance is obtained at substantially lower computational load. The method is also extended in order to cope with nonlinear channels. The performance of the new equalizer is tested with several simulation examples, using both the quadrature phase shift keying (QPSK) and the 16-ary quadrature amplitude modulation (16-QAM) signaling schemes for linear and nonlinear communication channels.

Index Terms: Clustering, maximum likelihood sequence estimation (MLSE), nonlinear channel equalization.

I. INTRODUCTION

One of the major problems encountered in the receiver design of any communication system is that of intersymbol interference (ISI), which arises due to the finite bandwidth of the communication channel. The part of the receiver used to mitigate ISI is the equalizer, and the literature on this task is very rich [1].
Moreover, the strong motivation toward portable radio communications and satellite communication systems implies the need for transmission amplifiers that operate near the saturation point, leading to nonlinear distortion. An optimal sequence equalizer is based on the maximum likelihood sequence estimation (MLSE) scheme [2]-[4]. MLSE equalizers are implemented via the Viterbi algorithm, and they require the channel impulse response (CIR) to be known. In practice, the CIR is estimated utilizing a least mean square (LMS) or a Kalman-type algorithm.

In many communication systems, the data are grouped into blocks and transmitted in bursts. This type of transmission is required for time-division multiple access (TDMA) systems [5] that are employed in several digital cellular and personal communications systems (PCS). In order to estimate the unknown CIR, each block includes a sequence of symbols known to the receiver in advance. This sequence has to be as short as possible to keep the capacity of the communication system as high as possible. Thus, due to the short length of the known symbol sequence, the channel estimator has to converge fast and estimate the channel accurately. Moreover, the required computational complexity has to be as low as possible. Kalman-type algorithms, including recursive least squares (RLS) schemes, provide fast convergence but are computationally demanding, which limits their practical use.

Manuscript received May 4, 2001; revised April 8, 2003. The associate editor coordinating the review of this paper and approving it for publication was Dr. Alex C. Kot. The authors are with the Department of Informatics and Telecommunications, Athens University, Athens, Greece (e-mail: kopsinis@di.uoa.gr; stheodor@di.uoa.gr). Digital Object Identifier 10.1109/TSP.2003.818891
In contrast, LMS and its variants exhibit slower convergence but are more attractive for implementation due to their structural simplicity and their low computational requirements.

In the current paper, a novel MLSE equalizer is presented that circumvents the problem of explicit CIR parametric modeling, leading to substantial computational savings. The proposed equalizer belongs to the family of cluster-based sequence equalizers (CBSEs) [7]-[12]. The equalizer utilizes the clusters formed by the received observations at the receiver front end. Furthermore, a novel cluster center estimation method is introduced that exploits the structural symmetries underlying the generation mechanism of the clusters of the received symbols. This has a twofold advantage. On the one hand, the number of required training symbols is substantially reduced compared with previously proposed CBSE equalizers, and on the other, the computational complexity is drastically reduced. The suggested CBSE equalizer operates in the one-dimensional (1-D) space, and its overall complexity is much lower than that of an LMS-trained MLSE equalizer, while its performance is RLS-like. This is achieved because the information hidden in the cluster formation mechanism is appropriately exploited. Finally, because no parametric modeling of the CIR is required, the method lends itself to systems with nonlinear channel impairments. The performance of the new method is tested in several simulation examples, using both quadrature phase shift keying (QPSK) and 16-ary quadrature amplitude modulation (16-QAM) signaling schemes in linear and nonlinear environments.

II. DESCRIPTION OF THE COMMUNICATION SYSTEM AND CHANNEL MODEL

Fig. 1 illustrates the equivalent baseband communication system model, where $x_k$ is the $k$th transmitted symbol, which can take one among $M$ distinct values, $n_k$ is the additive white noise, and $y_k$ denotes the $k$th received observation.
The adopted signaling schemes are QPSK and 16-QAM (see Fig. 2).

Fig. 1. Communication system model.

Fig. 2. QPSK and 16-QAM signaling sets.

The transmitted symbol sequence is assumed to be independent and identically distributed (i.i.d.). Furthermore, the communication channel, comprising the effects of the pulse shaping filter and the receiving filter, can be modeled as a finite impulse response filter spanning $L$ consecutive transmitted symbols, with transfer function $H(z)$. Thus, the received signal sampled at $t = kT$, with $T$ being the transmission period of the symbols, is given by

$$y_k = \sum_{i=1}^{L} h_i^{*} x_{k-i+1} + n_k = \mathbf{h}^{H}\mathbf{x}_k + n_k = \hat{y}_k + n_k \tag{1}$$

where $n_k$ is the complex-valued additive noise, whose real and imaginary parts are both white sequences. The variance of the noise is $\sigma_n^2$, and the signal-to-noise ratio (SNR) is determined by (2), where $\sigma_x^2$ is the symbol variance. According to (1), $h_i^{*}$ are the complex conjugated channel taps, and $\hat{y}_k$ indicates the noiseless observation associated with the transmitted sequence of symbols $\mathbf{x}_k$. Thus, $\mathbf{h} = [h_1, h_2, \ldots, h_L]^{T}$ is the vector of the complex taps of the CIR, and $\mathbf{x}_k = [x_k, x_{k-1}, \ldots, x_{k-L+1}]^{T}$ is the vector of $L$ successively transmitted symbols. The superscripts $T$ and $H$ denote transposition and Hermitian transposition, respectively.

III. SUMMARY OF THE MLSE

The task of the equalizer is to estimate the transmitted symbols based on the received observations. More specifically, for a block of $N$ symbols drawn from an alphabet of size $M$, the maximum likelihood sequence solution is to choose that sequence of symbols (out of $M^{N}$) that maximizes the likelihood of the received sequence of observations, i.e., maximizes the joint conditional probability $p(y_1, \ldots, y_N \mid x_1, \ldots, x_N)$. The obtained sequence is the optimal solution, and the procedure is referred to as MLSE.

There exist two basic approaches to implementing an MLSE equalizer. Forney [2] proposed the first one. The equalizer comprises a whitened matched filter, followed by the Viterbi algorithm using the Euclidean distance metric. Two years later, Ungerboeck [3] proposed an alternative approach, which dispenses with the whitening filter and utilizes a distance metric different from the Euclidean one [4]. Recently, it has been shown [13] that the Euclidean distance metric can be rewritten in terms of a filtering operation that circumvents the whitening phase required by Forney's method. This leads to the unification of the two methods. In this paper, we have adopted Forney's receiver structure, where the receiving filter is a whitening matched filter, and the distance metric used in the Viterbi algorithm is the Euclidean distance.

The states at the $k$th stage of the associated trellis diagram are related to the $L-1$ most recent transmitted symbols, i.e.,

$$s_k = [x_{k-1}, x_{k-2}, \ldots, x_{k-L+1}] \tag{3}$$

Thus, each state corresponds to one of the $M^{L-1}$ possible vectors that can be formed from $L-1$ symbols. There are $M$ allowable transitions that emerge from a state and terminate at $M$ different states, leading to a total of $M^{L}$ transition branches connecting two successive stages. Each transition is associated with a cost, contributing to the total cost of a path along the states. The cost of the $j$th branch, connecting two specific consecutive states, is given by

$$D_j = \left| y_k - \mathbf{h}^{H}\mathbf{x}_j \right|^{2} \tag{4}$$

where $\mathbf{x}_j$ is the $L$-element vector of symbols defined by the $j$th branch from state (3) to the new state. In other words, the cost for each transition depends on the two states and the value of the received sample. The symbols associated with the states along the best surviving path are the optimal estimates of the transmitted symbol sequence.

IV. RELATION BETWEEN MLSE AND 1-D CBSE

Taking into account (1) and (4), we can infer that each one of the $M^{L}$ transition branches in the trellis diagram is associated with one of the $M^{L}$ possible noiseless observations $\hat{y}$, which is uniquely determined by the vector $\mathbf{x}_j$. Hence, the possible values that $\hat{y}$ can take are the points (centers) around which the received samples (observations) are clustered, due to the presence of the noise.
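The correspondence between trellis branches and cluster centers can be illustrated with a minimal sketch. It assumes the convention of (1), in which the conjugated taps multiply the symbols; the tap values and the helper name `cluster_centers` are hypothetical and serve only as an example.

```python
from itertools import product

# QPSK symbols at the corners of a square in the complex plane.
QPSK = (1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j)


def cluster_centers(h, alphabet=QPSK):
    """Map every symbol tuple (x_k, ..., x_{k-L+1}) to its noiseless observation.

    Each center is the inner product of the conjugated taps with the symbols.
    """
    return {
        xs: sum(hi.conjugate() * x for hi, x in zip(h, xs))
        for xs in product(alphabet, repeat=len(h))
    }


# Hypothetical two-tap channel, chosen only for illustration.
h = (0.8 + 0.2j, -0.3 + 0.5j)
centers = cluster_centers(h)
print(len(centers))  # one center per trellis branch: 4**2 = 16
```

For an alphabet of size M and L taps, the dictionary holds M**L entries, one per transition branch of the trellis.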

Fig. 3. Plot on the complex plane of the clusters formed when the two-tap channel H(z) =(0:50j)+(00:600:1j)z 1 is used. The crosses denote the cluster centers, and the gray circles are the received observations corrupted by noise.

Fig. 4. Trellis diagram for a two-tap channel and the QPSK signaling scheme.

Fig. 3 shows the received observations for a two-tap complex channel when the QPSK signaling scheme is adopted. A white Gaussian noise sequence, corresponding to an SNR of 30 dB, is also present. The notation $c_j$ denotes the $j$th cluster center, which is associated with the transmitted symbol sequence $\mathbf{x}_j$. Due to the one-to-one correspondence between cluster centers and the values of $\mathbf{x}_j$, we label each cluster by the corresponding value of $\mathbf{x}_j$. The spread of each cluster depends on the power of the noise. The number of clusters and their positions in the complex plane depend on the number as well as the values of the CIR taps.

Fig. 4 illustrates the close relation between transitions in the trellis diagram and the clusters for a channel with two taps. Indeed, each transition defines the label of a cluster. This observation frees us from the need to know the explicit channel estimate: the term $\mathbf{h}^{H}\mathbf{x}_j$ in (4) is the corresponding cluster center $c_j$, and the distance metric becomes

$$D_j = \left| y_k - c_j \right|^{2} \tag{5}$$

The resulting algorithm is equivalent to the MLSE based on (4). Thus, the problem of the parametric estimation of the CIR, as required by MLSE equalizers, can be circumvented, and all that is needed is to estimate the centers of the clusters formed on the complex plane. Any supervised clustering technique, e.g., [6], can therefore be used to detect the centers (e.g., simple averaging), based on the known training sequence of symbols. The resulting sequence equalizer will be referred to as the one-dimensional clustering-based sequence equalizer (1-D CBSE).
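A compact sketch of a Viterbi search that uses the cluster-center metric may help to make the equivalence concrete. It is not the authors' implementation: the function name `cbse_viterbi`, the state representation, and the hypothetical taps used to synthesize test data are all assumptions, and no path truncation or other practical refinement is included.

```python
from itertools import product

QPSK = (1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j)


def cbse_viterbi(y, centers, L, start_state, alphabet=QPSK):
    """Viterbi search whose branch cost is |y_k - c|**2 for the branch center c.

    centers: dict mapping (x_k, x_{k-1}, ..., x_{k-L+1}) to a cluster center.
    start_state: the L-1 symbols preceding y[0], most recent first.
    """
    cost = {start_state: 0.0}
    path = {start_state: []}
    for yk in y:
        new_cost, new_path = {}, {}
        for state, c0 in cost.items():
            for x in alphabet:
                branch = (x,) + state              # symbols that define the branch
                c = c0 + abs(yk - centers[branch]) ** 2
                nxt = branch[:L - 1]               # new state: L-1 newest symbols
                if nxt not in new_cost or c < new_cost[nxt]:
                    new_cost[nxt] = c
                    new_path[nxt] = path[state] + [x]
        cost, path = new_cost, new_path
    best = min(cost, key=cost.get)
    return path[best]


# Hypothetical taps, used only to synthesize centers and observations.
h = (0.8 + 0.2j, -0.3 + 0.5j)
centers = {xs: sum(hi.conjugate() * x for hi, x in zip(h, xs))
           for xs in product(QPSK, repeat=len(h))}
sent = [QPSK[i] for i in (0, 2, 1, 3, 3, 0)]
prev, y = QPSK[0], []
for x in sent:
    y.append(centers[(x, prev)] + 0.01)  # small fixed offset in place of noise
    prev = x
decoded = cbse_viterbi(y, centers, L=2, start_state=(QPSK[0],))
```

The search never touches the channel taps: once the centers are available, they are the only channel-dependent quantity the decoder needs.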
Previously proposed CBSE equalizers utilize the clusters formed in the receiver front end in a high-dimensional space, defined by successive observations [7]-[10]. Recently, the information of the cluster centers has been used in order to build blind or semiblind equalizers [11]. The major reason for going into a high-dimensional space is the desire to avoid cluster overlapping. The price one pays for it is a high computational burden. Moreover, in [12], it was demonstrated that in the case of supervised clustering, when a training sequence is available, the adoption of a space of dimension higher than one is not necessary. The reason is that even if a number of clusters with different labels overlap, the Viterbi algorithm has the power to unravel the confusion by exploiting the information hidden in the history of the surviving paths in the trellis diagram. Indeed, in the case of supervised clustering, the 1-D space is sufficient since the observed signal samples can be uniquely assigned to the trellis transitions/cluster centers. Cluster overlap would be a problem only if a symbol-by-symbol philosophy were adopted and the estimation were based on the closeness of a received sample to a cluster center.

The major drawback of the CBSEs, as well as of symbol-by-symbol equalizers that also require cluster center estimation, e.g., [14], [15], is the need for a relatively long training sequence, so that all the clusters are represented by enough observations for their centers to be estimated accurately. A three-tap channel and a QPSK signal set lead to $4^{3} = 64$ clusters in the 1-D space. For example, if the clustering algorithm needs about five observations per cluster in order to achieve accurate estimates of the cluster centers, then 320 training symbols are required.
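The training-length figure follows from a simple count. As a worked check, with the five-observations-per-cluster assumption taken from the text and the 16-QAM line added only for comparison:

```latex
\begin{align*}
\text{QPSK, three taps:}   \quad & 4^{3} = 64 \text{ clusters},    & 64 \times 5   &= 320 \text{ training symbols},\\
\text{16-QAM, three taps:} \quad & 16^{3} = 4096 \text{ clusters}, & 4096 \times 5 &= 20\,480 \text{ training symbols}.
\end{align*}
```

The count grows as $M^{L}$, which is why direct estimation of every cluster center quickly becomes impractical for short training sequences.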
Considering that the widely used mobile communication systems GSM and IS-54 use 26 and 28 training bits per transmitted block, respectively, it is obvious that the use of clustering-based equalizers in such systems is prohibitive. In the current paper, we propose a novel method for center detection, which shortens the training period dramatically by exploiting the intrinsic dependencies among the cluster centers and the symmetries underlying their structure.

V. NOVEL CENTER ESTIMATION TECHNIQUE

This section presents a new method for cluster center estimation, which does not require the direct estimation of all the clusters. The new center estimation (CE) method exploits the

inherent mechanism that generates the clusters, and it is able to reveal all the centers, utilizing the estimates of only $L$ properly selected centers. The analysis of the method will be carried out for the QPSK case, and the extension to the 16-QAM signaling scheme will be presented subsequently.

Let us assume a general $L$-tap channel with impulse response vector $\mathbf{h} = [h_1, h_2, \ldots, h_L]^{T}$. We define, as the tap contribution of the $i$th tap to the generation of a cluster center, the quantity

$$\tau_i(x) = h_i^{*}\, x \tag{6}$$

In other words, this is the contribution of the $i$th tap in the convolution sum in (1). We can observe that $\tau_i$ can take one out of four different values, depending on the value of the symbol $x$. We denote these values as $\tau_i(x)$, with $x$ ranging over the four QPSK symbols. Using this notation, (1) can be rewritten as

$$c(\mathbf{x}) = \sum_{i=1}^{L} \tau_i(x_i) \tag{7}$$

where $c(\mathbf{x})$ is the cluster center associated with the transmitted $L$-tuple $\mathbf{x} = [x_1, x_2, \ldots, x_L]$, and $x_i$ is the symbol multiplied by the $i$th tap. Furthermore, it is easy to realize that for each $i$, only one of the four possible values, say $\tau_i(x)$, needs to be computed, and the rest can be obtained by simple rotations in the complex plane. This is due to the fact that the QPSK symbols are positioned at the corners of a square in the complex plane [see Fig. 2(a)]. Therefore, if, for example, we have the contribution $\tau_i(x)$ available, then $\tau_i(jx) = j\,\tau_i(x)$, $\tau_i(-x) = -\tau_i(x)$, and $\tau_i(-jx) = -j\,\tau_i(x)$.

In the sequel, we will show how the above observations can be exploited in order to estimate all the cluster centers in an efficient way. We will demonstrate the method via a series of examples, and the generalization is straightforward.

Fig. 5. Cluster center constellation of a one-tap channel.

Example 1: $L = 1$. In this extreme case of a single-tap channel, the four different cluster centers are the corners of a square whose size and angle of rotation depend only on the value of the single tap. Each one of the centers corresponds to one among the four possible transmitted symbols, and the resulting square is illustrated in Fig. 5. It is easy to see that in this case, the contribution coincides with the observed center.
Once this has been obtained, the rest of the cluster centers are obtained by rotating it by multiples of 90 degrees in the complex plane. Thus, for $L = 1$, it suffices to estimate only one cluster center.

Example 2: $L = 2$. In this example, a second tap has been added to the one-tap channel of the first example. In this case, each one of the $4^{2} = 16$ centers corresponds to one of the possible two-symbol combinations, and it leads to the structure illustrated in Fig. 6. Due to the contribution of the second tap, the observed centers are positioned at the corners of four similar squares, which are drawn with solid lines. The points on which these squares are centered are specified by the values of the contribution of the first tap, i.e., the corners of the square drawn with a dashed line. Exploring this structure of the cluster centers, one can find various ways to compute the tap contributions using only two cluster centers. After the computation of the two contributions, the detection of any cluster center is straightforward via (7).

Fig. 6. Cluster center constellation of a two-tap channel.

Example 3: $L = 3$. In the same manner, if we add a third tap, say $h_3$, its contribution would specify the size and the angle of rotation of the squares whose centers are specified by the combined contributions of the first and the second taps. The final structure for this three-tap channel example is shown in Fig. 7. Following similar arguments as before, one can see that it suffices to estimate only three centers in order to compute the tap contributions.

Fig. 7. Cluster center constellation of a three-tap channel.

It turns out that in the general case of $L$ taps, it also suffices to estimate only $L$ (out of $4^{L}$) properly selected centers. This is natural since $L$ free parameters are sufficient to describe the generation mechanism of the clusters, e.g., (1).

A. Selection and Estimation of the Centers for QPSK

In this section, we present a procedure that defines the centers that have to be estimated. Once these centers are known, we can compute the tap contributions, and in the sequel, the rest of the cluster centers are computed by simple additions and rotations. To this end, we first choose a sequence of $L$ symbols, $\mathbf{x}^{b} = [x_1, x_2, \ldots, x_i, \ldots, x_L]$. We call this sequence a basic sequence, and it can be any sequence of $L$ successive QPSK symbols. The associated observed center $c_b = c(\mathbf{x}^{b})$ is called the basic center. This basic sequence generates the $L$ centers that are required for the computation of the channel contributions in accordance with Table I. In other words, the sequence of symbols $\mathbf{x}^{(i)}$ associated with the $i$th desired center is the basic sequence in which the $i$th symbol is rotated by 180 degrees in the complex plane.

TABLE I. The $L$ cluster centers required for the estimation of the tap contributions, generated from the basic sequence of symbols $\mathbf{x}^{b} = [x_1, x_2, \ldots, x_i, \ldots, x_L]$.

A training sequence produces a set of observations at the receiver. A supervised clustering algorithm must utilize these observations in order to estimate the $L$ specific clusters. For each one of these clusters, the algorithm involves only the observations that correspond to the specific cluster centers given in Table I. Thus, the training sequence has to be constructed in such a way that it represents all the $L$ centers as effectively as possible. The simplest way to estimate a center is by averaging the corresponding observations, i.e.,

$$\hat{c}\big(\mathbf{x}^{(i)}\big) = \frac{1}{n_i} \sum_{m=1}^{n_i} y_m^{(i)} \tag{8}$$

where $y_m^{(i)}$ is the $m$th observation corresponding to $\mathbf{x}^{(i)}$, and $n_i$ is the number of observations associated with $\mathbf{x}^{(i)}$.

B. Computation of the Contributions

For the computation of the tap contributions, we first have to compute the basic center following the expression

$$c_b = \frac{1}{L-2} \sum_{i=1}^{L} c\big(\mathbf{x}^{(i)}\big) \tag{9}$$

Proof: Taking into account that rotating the $i$th symbol by 180 degrees changes the sign of the $i$th tap contribution only, i.e., $c(\mathbf{x}^{(i)}) = c_b - 2\tau_i(x_i)$, and that $\sum_{i=1}^{L} \tau_i(x_i) = c_b$ by (7), we have $\sum_{i=1}^{L} c(\mathbf{x}^{(i)}) = L\,c_b - 2c_b$. Equation (9) follows as a direct consequence.

Based on $c_b$, the computation of the tap contributions is straightforward and is given by

$$\tau_i(x_i) = \frac{c_b - c\big(\mathbf{x}^{(i)}\big)}{2}, \qquad i = 1, \ldots, L \tag{10}$$

The selection of the basic sequence is not crucial to the problem. Using the basic sequence, only one among the four values for each tap contribution is computed via

(10). If, for example, for a three-tap channel we select a basic sequence $[x_1, x_2, x_3]$, then, via (10), we will compute the contributions $\tau_1(x_1)$, $\tau_2(x_2)$, and $\tau_3(x_3)$. The other three values for each contribution are then obtained via rotations. Although the specific choice of the basic sequence is not important for the computation of the tap contributions, it must be selected so that the training sequence visits all the required centers a sufficient number of times to safeguard good estimates. We will come back to this later on, during the simulation examples.

C. 16-QAM Case

In the case of the 16-QAM signal constellation, the philosophy of the method remains the same. As one can see in Fig. 2(b), the difference between the two signaling schemes is that instead of having four symbols positioned at the corners of a square, we now have 16 symbols placed at the positions defined by a square grid. The result is that rather than having four different values for each tap contribution, in the 16-QAM case, one has to deal with 16 values. More importantly, the QPSK symbols are a subset of the 16-QAM symbols. Because the contribution of the $i$th channel tap depends only on the specific tap weight and the symbol, we can infer that the four values of the $i$th tap contribution that correspond to the QPSK symbols are the same for the QPSK as well as for the 16-QAM signaling scheme. The remaining 12 values can be computed from these four values, following expression (11) (see Appendix A).

An alternative solution would be to compute first the channel impulse response, based on the contributions, and in the sequel to detect the 16 values for each tap. The in-phase and quadrature components of the $i$th CIR tap are given by (12) (see Appendix B) in terms of the real and imaginary parts of the value of the $i$th tap contribution.
Therefore, any contribution associated with a given symbol value and the $i$th tap is given by (13).

Other scenarios are also possible. For example, in the case of $M$-QAM constellations, it turns out that the best performance, with respect to noise, is achieved if the basic sequence is selected to comprise the higher energy symbols. For example, for a three-tap channel and the 16-QAM signaling scheme, better immunity to noise is obtained if a basic sequence of such symbols is adopted, and the computation of the contributions is based on it in a procedure similar to the one explained above. The extension to higher level QAM constellations is straightforward.

The basic steps of the center estimation (CE) algorithm are summarized as follows.
Step 1) Choose the basic sequence.
Step 2) Define the required centers for training (see Table I).
Step 3) Estimate the centers by averaging the associated observations (8).
Step 4) Compute the basic center (9).
Step 5) Compute the tap contributions (10) [and (11) for QAM constellations].
Step 6) Compute the remaining centers (7).

VI. PERFORMANCE RESULTS FOR THE LINEAR CHANNEL CASE

The focus of the first set of experiments is to study the convergence speed (in terms of the required number of training symbols) of the new method and to compare it with that of the LMS and RLS algorithms, e.g., [16]-[18], at various noise levels. The most common way to approach such problems is to adopt a couple of channels and perform the comparisons. In order to have a statistically more representative result, we performed the experiments using 500 different channels, and the reported results are the obtained mean values. The channels are five taps long, and they were constructed to simulate realistic conditions. To this end, we simulated a Rayleigh fading channel [19]. We recorded the time-varying channel during the transmission of 500 successive data bursts (blocks), and we selected the impulse responses of the channels as the snapshots of the fading channel corresponding to the middle of each burst.
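Steps 1) to 6) of the CE algorithm summarized at the end of Section V can be sketched for the QPSK case as follows. This is a minimal illustration rather than the authors' code: it assumes that each required center is the basic center minus twice one tap contribution (the 180-degree rotation described for Table I), so that the L required centers sum to (L - 2) times the basic center, which restricts the sketch to L >= 3. The function names and the tap values are hypothetical.

```python
from itertools import product

QPSK = (1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j)


def negate_at(seq, i):
    """Return seq with its i-th symbol rotated by 180 degrees (negated)."""
    return tuple(-x if k == i else x for k, x in enumerate(seq))


def estimate_centers(basic, observations):
    """Recover all 4**L QPSK cluster centers from L directly estimated ones.

    basic: basic sequence (x_1, ..., x_L); x_i is the symbol seen by tap i.
    observations: dict mapping each required symbol tuple to its samples.
    """
    L = len(basic)
    if L < 3:
        raise ValueError("sketch requires L >= 3; the center sum vanishes for L = 2")
    # Steps 2-3: average the observations of each required cluster.
    required = []
    for i in range(L):
        samples = observations[negate_at(basic, i)]
        required.append(sum(samples) / len(samples))
    # Step 4: the required centers add up to (L - 2) times the basic center.
    c_basic = sum(required) / (L - 2)
    # Step 5: negating symbol i removes twice the contribution of tap i.
    contrib = [(c_basic - r) / 2 for r in required]
    # Step 6: rotate each contribution to the wanted symbol and add them up.
    return {
        xs: sum(c * (x / b) for c, x, b in zip(contrib, xs, basic))
        for xs in product(QPSK, repeat=L)
    }


def true_center(h, xs):
    """Noiseless observation for symbols xs, used only to build test data."""
    return sum(hi.conjugate() * x for hi, x in zip(h, xs))


# Hypothetical three-tap channel and basic sequence.
h = (0.9 - 0.3j, 0.4 + 0.5j, -0.2 + 0.1j)
basic = (QPSK[0], QPSK[0], QPSK[0])
observations = {}
for i in range(len(h)):
    key = negate_at(basic, i)
    c = true_center(h, key)
    observations[key] = [c + 0.05, c - 0.05]  # symmetric offsets average out
centers = estimate_centers(basic, observations)
max_err = max(abs(centers[xs] - true_center(h, xs)) for xs in centers)
```

Only three clusters are averaged here, yet all 64 centers are produced; the ratio `x / b` between two QPSK symbols is always one of 1, j, -1, -j, which is the rotation exploited by the method.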
The bursts were 200 symbols long, the bit rate was 300 kb/s, and the carrier frequency was 900 MHz. In order to realize Rayleigh fading characteristics, the real and imaginary components of each tap were chosen to be white Gaussian sequences spectrally shaped according to Clarke's fading model [20], [21]. The maximum Doppler frequency corresponds to a vehicle running at a speed of 150 km/h, and the rms value of the envelope of all the produced sequences was set equal to 1.

A fixed basic symbol sequence was used. In order to represent the centers associated with this basic sequence efficiently, we selected the training sequence to be a repetition of a short symbol pattern. The number of successive repetitions is determined by the available length of the training sequence. It is not difficult to see that each repetition corresponds to observations that are cyclically distributed among the required centers. In order to compare the convergence speed of the new method with that of the LMS and the RLS, the estimate of the impulse response, using the new method, was obtained via (12). It must be emphasized that the impulse response is not required in our method; it is computed only in order to compare similar quantities. Fig. 8 summarizes the results for two SNR levels, 30 and 10 dB, respectively. The plotted quantity is the normalized mean tap error (MTE), in decibels, between the true and the estimated values of the impulse response taps, given by (14).
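To make the comparison of "similar quantities" concrete, the sketch below recovers the taps from the tap contributions and evaluates a normalized tap error in decibels. Both parts are assumptions: the tap recovery simply inverts the definition of a tap contribution (conjugated tap times symbol) rather than reproducing (12), and the error measure is the common form 10 log10(||h - h_est||^2 / ||h||^2), which may differ from the exact expression (14). Function names and values are hypothetical.

```python
import math


def taps_from_contributions(contrib, symbols):
    """Invert contribution_i = conj(h_i) * x_i for every tap."""
    return [(t / x).conjugate() for t, x in zip(contrib, symbols)]


def tap_error_db(h_true, h_est):
    """Normalized squared tap error in dB (assumed form, single channel)."""
    num = sum(abs(a - b) ** 2 for a, b in zip(h_true, h_est))
    den = sum(abs(a) ** 2 for a in h_true)
    return 10 * math.log10(num / den)


# Round trip on a hypothetical two-tap channel.
h_true = [1 + 0j, 0.5j]
symbols = [1 + 1j, 1 + 1j]
contrib = [h.conjugate() * x for h, x in zip(h_true, symbols)]
h_back = taps_from_contributions(contrib, symbols)

# A 10 percent error on a unit tap gives an error of about -20 dB.
err_db = tap_error_db([1 + 0j, 0j], [0.9 + 0j, 0j])
```

In the experiments described in the text, the reported value is a mean over 500 channels; the sketch evaluates a single channel only.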

Fig. 8. MSE between real and estimated channel for various training sequence lengths.

Fig. 9. Symbol error rate performance for QPSK signaling scheme.

We can easily see that the performance of the new method, for the channel lengths considered in this paper, is close to that of the RLS algorithm. As we will see in the next section, this performance is achieved with complexity that is much lower than that of the LMS algorithm.

The similar trend in performance between the new cluster-based estimation method and the LS estimator can easily be explained. The new CE method estimates a cluster center by averaging a number of received samples. If $\hat{c}$ is the resulting estimate, it is known that this is an unbiased estimator with associated variance $\sigma^{2}/N$ [6]. The latter quantity assumes white noise of variance $\sigma^{2}$ and that $N$ data points have been used for training (averaging) the center. Assume now that the same $N$ points have been used to train the channel model. In [16, p. 451], it is shown that the RLS estimator is approximately unbiased (for finite $N$), leading to an unbiased estimate of the corresponding center. The overall variance of the estimator due to the model mismatch is easily shown [16, p. 453] to take the same value as well. Of course, the above comment concerns only the clusters that are estimated directly from the data. A more comprehensive statistical analysis of the new method and its close relationship to the least squares solution is presented elsewhere [22].

The LMS used in the experiments was optimized. The step size parameter was selected to be 0.05 since it provided the best tradeoff between convergence speed and MSE. The training sequence for the LMS algorithm was a repeated constant amplitude zero autocorrelation (CAZAC) sequence [23]. For other training sequences, the performance of the LMS algorithm was degraded substantially.
In the sequel, we study the symbol error rate (SER) performance of the 1-D CBSE employing the new center estimation (CE) technique and of the MLSE using the LMS algorithm, for the QPSK and the 16-QAM signaling schemes, respectively. The performance of the RLS was always almost identical to that of the CE and will no longer be considered. For both cases, the same complex three-tap channel was adopted. The transmission was realized in blocks, where each block comprised 200 data symbols together with a number of training symbols placed at the front of each block. The algorithms were tested for training sequences 10 and 30 symbols long. The step size parameter adopted by the LMS algorithm was time varying and optimized to give the best performance. This varying step size led to fast convergence with good estimates for the specific three-tap channel. Moreover, the LMS algorithm was trained by a CAZAC sequence, and a separate basic center was adopted for each of the QPSK and 16-QAM cases. Furthermore, the Viterbi algorithm was initialized for each block separately and started from the correct state, which is determined by the last $L-1$ symbols of the training sequence. It is readily seen from Figs. 9 and 10 that the proposed equalizer outperforms the LMS-MLSE for all the considered signaling sets and training sequence lengths. Moreover, this difference in performance is obtained at substantially lower computational load.

VII. COMPUTATIONAL COMPLEXITY REQUIREMENTS

This section deals with the computational requirements of the new method, which is compared with the LMS-based technique for the estimation of the unknown CIR. Obviously, the computational requirements of the RLS (even of its fast versions) are much higher. The overall computational load of an MLSE equalizer consists of two parts. The first part refers to the channel estimation (LMS) or to the cluster CE. The second part refers to the computational requirements associated with the Viterbi algorithm.

TABLE II. Computational complexity for the QPSK signaling scheme, in terms of real operations, of the LMS and the new method for channel and tap contribution estimation, respectively.

Fig. 10. Symbol error rate performance for 16-QAM signaling scheme.

TABLE III. Computational complexity, in terms of real operations, of the channel-based MLSE and the 1-D CBSE equalizers for the Viterbi stage.

The computational complexity of the new method for the center estimation is given in Table II together with that of the LMS, in terms of real multiplications and additions, for the QPSK signaling scheme. $N_t$ denotes the number of training symbols. It is very important to note that the number of multiplications and divisions required by the new method is independent of the number of training symbols, and it is much lower than that required by the LMS. Divisions and multiplications are performed once per training block. In the same table, the number of computations required for a realistic example consisting of 30 training symbols is also shown.

Regarding the second set of computations, the most demanding part of the Viterbi algorithm is the computation of the Euclidean distance metrics. In the case of the LMS-MLSE, the distance metric is given by (4), which demands the convolution of the estimated channel impulse response with the transmitted symbols associated with the corresponding trellis branch. In the case of the 1-D CBSE, the distance metric is given by (5). Assuming stationarity or insignificant channel variation during the transmission of a data block, the convolutions of (4) are precomputed once at the beginning of the data block and stored in memory cells. In the same way, the cluster centers, which are utilized by the 1-D CBSE, can be computed once per data block based on (7). The rest of the computations concerning the Viterbi algorithm are the same for both equalizers.
Table III shows the number of computations required by the convolutions and the center detection for the MLSE and the CBSE, respectively, in terms of real multiplications and additions. Comparing the two implementations of the optimum sequence equalizer, the computational complexity of the standard MLSE is substantially higher since a) the 1-D CBSE requires about half the number of additions and b) the 1-D CBSE needs no multiplications.

However, the assumption of a time-invariant channel during the transmission of a data block is not always valid (e.g., [24], [25]), and relying on it could lead to critical performance degradation. In such a case, the channel taps (or the tap contributions) have to be re-estimated adaptively during the transmission, e.g., based on tentative decisions provided by the Viterbi algorithm. As a result, the convolutions or the centers related to the standard MLSE and the 1-D CBSE, respectively, need to be computed at every stage of the trellis diagram. For such a case, the difference in the computational requirements between the new cluster-based method and the channel model-based MLSE scheme becomes quite substantial. For example, the total number of real additions required by the Viterbi part of the equalizer for a data packet consisting of 200 information symbols and a channel with three taps, when the QPSK signaling scheme is adopted, is 179 200 for the MLSE equalizer and 102 400 for the CBSE. More importantly, the CBSE equalizer needs no multiplications in the Viterbi stage, in contrast to the MLSE, which requires more than 150 000. Actually, this is the case of the simulation example illustrated in Fig. 9. Thus, in such cases, the standard MLSE may become impractical due to the large number of multiplications involved.
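The gap between the two equalizers in the time-varying case can be cross-checked with a small operation count. The sketch assumes the usual convention that a complex multiplication costs four real multiplications and two real additions and that a complex addition costs two real additions; it also assumes 200 symbols per packet, as in the Fig. 9 setup. It counts only the per-branch work that differs between the two schemes, not the full Viterbi stage, and the helper name is hypothetical.

```python
def per_branch_ops(L):
    """Real operations needed to rebuild one branch value for an L-tap channel."""
    # Convolution of the taps with L symbols: L complex multiplications
    # (4 real mult + 2 real add each) followed by L - 1 complex additions.
    conv = {"mult": 4 * L, "add": 2 * L + 2 * (L - 1)}
    # Center from stored tap contributions: L - 1 complex additions only.
    center = {"mult": 0, "add": 2 * (L - 1)}
    return conv, center


M, L, N = 4, 3, 200            # QPSK, three taps, 200 symbols per packet
branches = M ** L              # 64 trellis branches per stage
conv, center = per_branch_ops(L)

extra_adds = (conv["add"] - center["add"]) * branches * N
mults = conv["mult"] * branches * N
print(extra_adds, mults)       # 76800 153600
```

Under these assumptions, the difference in additions equals 179 200 - 102 400 = 76 800, and the multiplication count of 153 600 agrees with the "more than 150 000" quoted in the text.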
Finally, it must be stated that all the suboptimal methods that have been suggested for the MLSE in order to trade off complexity against performance, e.g., [26]–[28], are readily applicable to the 1-D CBSE method.

VIII. CASE OF NONLINEAR CHANNELS

In a number of modern digital communication systems, a compromise between power efficiency and the linearity of the transmitter amplifiers is required. For better exploitation of the available power, the amplifiers have to operate near their saturation point. This leads to nonlinear distortion of the transmitted signal with a subsequent critical performance degradation when linear equalizers are employed. In that case, the MLSE requires a nonlinear model of the channel to be adopted, which is not always a straightforward task. A simplified baseband model of the communication system, which includes the nonlinear distortion, is illustrated in Fig. 11.

Fig. 11. Nonlinear communication system model.

The nonlinearities of the channel are expressed as a polynomial function of the symbols at the output of the channel, e.g., [9] and [14]. In this case, the noiseless observations are given by the polynomial expression (15), whose coefficients are appropriate parameters and whose argument is the baseband digital output of the channel, given by (16). Finally, the received observations are corrupted by the noise. The adopted polynomial nonlinearity is hard to handle because it is applied at the output of a channel with memory, and this tends to destroy the symmetries underlying the observed centers. This is a more general and more difficult case compared with the memoryless nonlinearities introduced by the traveling wave tube (TWT) amplifiers used in some satellite transmitters.

The effect of the nonlinear distortion is a shift of the cluster centers in the complex plane. Fig. 12 illustrates the effect of such a nonlinearity applied to a two-tap channel. The crosses denote the cluster centers corresponding to the linear channel, and the triangles are the centers for the case of the nonlinear channel. It is readily seen that the nonlinearity tends to destroy the symmetric structure of the observed centers. As a result, the philosophy underlying the new CE method, which was introduced in the previous sections, is no longer valid; it can be retained only as an approximation, and only when the nonlinearity is mild. The goal of this section is to suggest a hybrid scheme that embeds the new method in a scheme appropriate for nonlinear channels.
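The shift of the centers and the loss of symmetry described above can be reproduced numerically with a small sketch. It assumes a generic polynomial applied to the channel output, c1·y + c2·y² + c3·y³, with hypothetical coefficients chosen only for illustration; the exact form and coefficients of (15) are not reproduced here. The two-tap channel is the one read from the Fig. 12 caption, with signs reconstructed from the extracted text. In the linear case, every center y has a mirrored partner −y; the sketch measures how far that pairing is broken after the distortion.

```python
import itertools
import numpy as np

QPSK = (1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j)


def linear_centers(taps):
    """Noiseless linear channel outputs for every QPSK symbol vector."""
    return np.array([sum(h * s for h, s in zip(taps, symbols))
                     for symbols in itertools.product(QPSK, repeat=len(taps))])


def polynomial_distortion(y, coefficients):
    """Apply c1*y + c2*y**2 + c3*y**3 + ... elementwise (assumed form)."""
    return sum(c * y ** (power + 1) for power, c in enumerate(coefficients))


def mirror_gap(coefficients, centers):
    """|f(y) + f(-y)| per center; zero when mirrored centers stay mirrored."""
    return np.abs(polynomial_distortion(centers, coefficients)
                  + polynomial_distortion(-centers, coefficients))


# Two-tap channel as read from the Fig. 12 caption (reconstructed signs).
taps = (0.5 - 1j, -0.6 - 0.1j)
# Hypothetical coefficients; not the values used in the paper.
assumed_coefficients = (1.0, 0.1, 0.05)

centers = linear_centers(taps)
shifted = polynomial_distortion(centers, assumed_coefficients)
gap_linear = mirror_gap((1.0,), centers)
gap_nonlinear = mirror_gap(assumed_coefficients, centers)
```

With a purely linear channel the gap is zero for every center, whereas the even-order term makes it strictly positive, which is one concrete way in which the symmetry exploited by the new CE method is lost.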
The best solution for the nonlinear case would be obtained if all the cluster centers were estimated directly using a supervised clustering algorithm [12]. It has been shown that the resulting performance is better than that of the optimal symbol-by-symbol finite-memory Bayesian equalizer with decision feedback, at a substantially reduced complexity, e.g., [29] and [30]. However, in this case, one needs a training sequence that visits all clusters a sufficient number of times so that an accurate enough estimate for each one of the cluster centers is obtained. This also depends on the SNR level. For large constellations and/or long channels, this can make the necessary length of the training sequence prohibitive for a number of applications. However, it must be stated that this disadvantage of long training sequences is also shared by other techniques for nonlinear channels, such as neural networks, e.g., [14], [15], and [31]–[34].

Fig. 12. Cluster centers constellation of a two-tap linear (+) and nonlinear (triangles) channel with transfer function $H(z) = (0.5 - j) + (-0.6 - 0.1j)z^{-1}$.

The hybrid technique suggested in this section combines the philosophy of the new methodology for the center estimation with that of the direct estimation of all the centers. The idea is to exploit the available training sequence in the best possible way. Given the length of the training sequence, our goal is now twofold:
1) to combine the estimates obtained by direct estimation of the cluster centers, using a supervised clustering algorithm, with the estimates obtained by the new center estimation method discussed before;
2) given the length of the training sequence, to define the sequence of training symbols in such a way as to visit as many clusters as possible as many times as possible.

A. Hybrid Cluster Center Estimation Method for Nonlinear Channels

The advantage of the new CE technique is that it achieves the estimation of the tap contributions and, therefore, of the cluster centers, by requiring the direct estimation of only a few cluster centers. As a result, short training sequences suffice. However, in the case of nonlinear channels, this does not lead to accurate estimates. On the other hand, the direct estimation of all the centers provides accurate results, but it requires long training sequences. A tradeoff between short training sequences and accurate center estimates can be achieved by utilizing a combination of both the direct estimates and the estimates of the new CE method. In other words, the main point behind this hybrid scheme is that the available length of the training sequence is given to us, and we are constrained by it. Thus, it may not be possible to visit all the centers enough times to obtain good estimates, and perhaps some of the centers are not visited at all. Assuming that the centers do not deviate much from the place where their linear counterparts are located (i.e., mild nonlinearity), we can also use the new CE to obtain estimates. The combined result depends on the relative confidence one places in each one of the two center estimates, as expressed by (17).

TABLE IV INITIAL VALUES THAT GENERATE THE 6 SUBSEQUENCES FOR THE THREE-TAP CASE

The weighting parameter in (17) takes values between 0 and 1, depending on the severity of the nonlinearity and the ability of the clustering algorithm to produce good estimates. The latter is related to the SNR level and the number of available observations per cluster. For example, in the extreme case of noiseless transmission and intermediate nonlinearity, the parameter should be set equal to 1 because a single observation per cluster is enough for the direct center estimation method to provide the optimum estimate. In the presence of noise, the parameter should be close to 1 only if there are enough available observations to achieve accurate estimates of the centers. Otherwise, the parameter should be allowed to deviate sufficiently from 1. In such a case, direct estimates are available for more centers than the new center estimation method strictly needs. The new method can therefore be applied to several different tuples of centers, each corresponding to a different set of basic centers. For improved results, the final estimate can be obtained as an average of these estimates. If there are clusters that are not visited by the training sequence, then their center estimates are provided by the CE method only. In other words, one could say that the direct center estimation method provides immunity to the nonlinearity, and the new CE method provides some immunity to the noise.
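The combination rule described above can be sketched as a per-cluster convex combination. This is a minimal sketch under stated assumptions: the weighting form `lam * direct + (1 - lam) * ce` is inferred from the description (weight 1 trusts the direct estimate fully, and unvisited clusters fall back to the CE estimate) and is not a transcription of (17); the names `hybrid_centers`, `lam`, and `visits` are hypothetical.

```python
import numpy as np


def hybrid_centers(direct, ce, visits, lam):
    """Blend direct and CE-based center estimates, cluster by cluster.

    direct : direct (supervised clustering) estimates; entries for
             unvisited clusters are ignored and may be NaN
    ce     : estimates produced by the center estimation (CE) method
    visits : number of training observations that fell in each cluster
    lam    : assumed weight in [0, 1] given to the direct estimate
    """
    direct = np.asarray(direct, dtype=complex)
    ce = np.asarray(ce, dtype=complex)
    visited = np.asarray(visits) > 0
    # Unvisited clusters get weight 0, so only the CE estimate is used there.
    weight = np.where(visited, lam, 0.0)
    # Mask out placeholder values so NaN entries cannot leak into the result.
    safe_direct = np.where(visited, direct, 0.0)
    return weight * safe_direct + (1.0 - weight) * ce


# Hypothetical numbers: the second cluster was never visited in training.
example = hybrid_centers(
    direct=[1.0 + 1.0j, np.nan],
    ce=[0.8 + 1.2j, -1.0 - 1.0j],
    visits=[5, 0],
    lam=0.75,
)
```

In this example the first center is pulled three quarters of the way toward its direct estimate, while the second is taken from the CE method alone.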
We have found that a good choice for the weighting parameter is to compute it via (18), as a function of the SNR in decibels and the number of observations per cluster. The remaining parameters of (18) are chosen in an ad hoc manner, depending on the strength of the nonlinearity.

B. Selection of the Training Sequence

The goal in the choice of the training sequence is different in the case of nonlinear channels from that of the linear channels. In the case of nonlinear channels, it is very important to visit as many clusters as possible as many times as possible. The only restriction is the number of training symbols that is available. In order to exploit the training sequence efficiently, we have to choose the symbols in a specific order that leads to observations visiting all clusters (approximately) the same number of times. For example, for QPSK signaling and a two-tap channel, a possible choice is to transmit, during the training phase, a 16-symbol sequence S repeatedly. If this sequence is transmitted a number of times, then each cluster will be represented by the same number of observations. This can easily be checked since the sequence includes all the possible pairs of transmitted symbols once.¹

The sequence results from combining two nine-symbol subsequences. Each symbol in these subsequences obeys the recursion (19), with a different pair of initial values for each of the two subsequences. The resulting subsequences are periodic with a period of 8 symbols. If only the first subsequence is transmitted, then eight out of 16 clusters will be visited. The remaining eight clusters are visited by the second subsequence. Thus, a proper combination of the two subsequences could feed all the clusters uniformly with observations. The last cluster fed by each subsequence is the one corresponding to its final pair of symbols. Furthermore, the last symbol of the first subsequence is the same as the first symbol of the second.
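The uniform-coverage requirement on the training sequence can be checked mechanically by counting how often each cluster is visited. The sketch below does this for a substitute construction: it uses a standard de Bruijn sequence over the four QPSK symbols, which is not the authors' recursion (19) but has the same property of containing every symbol pair exactly once per cyclic period. The function names are hypothetical.

```python
from collections import Counter

QPSK = (1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j)


def de_bruijn(alphabet_size, order):
    """Standard Lyndon-word construction of a de Bruijn index sequence."""
    a = [0] * (alphabet_size * order)
    sequence = []

    def extend(t, p):
        if t > order:
            if order % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            extend(t + 1, p)
            for j in range(a[t - p] + 1, alphabet_size):
                a[t] = j
                extend(t + 1, t)

    extend(1, 1)
    return sequence


def cluster_visits(symbols, taps, repeats):
    """Count visits to each cluster (window of `taps` consecutive symbols)."""
    stream = list(symbols) * repeats
    windows = (tuple(stream[i:i + taps]) for i in range(len(stream) - taps + 1))
    return Counter(windows)


# Two-tap QPSK case: 4**2 = 16 clusters, one cyclic period of 16 symbols.
training = [QPSK[i] for i in de_bruijn(len(QPSK), 2)]
visits = cluster_visits(training, taps=2, repeats=3)
```

When the period is sent three times in a row, 15 of the 16 clusters are visited three times and the wrap-around pair is visited twice, since it only appears where one repetition joins the next.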
Therefore, when the two subsequences are combined, the last symbol of the first can be omitted. In the same way, the last symbol of the second can also be omitted since it is the same as the first symbol of the combined sequence when the latter is repeated for a second time. For the three-tap case, we have found that six such subsequences need to be combined, as indicated in Table IV, alongside the number of clusters visited by the respective subsequence. For the four-tap case, 16 subsequences have been used (see Table V). We have also constructed such subsequences for longer channels. A general methodology is currently under investigation.

¹ We can observe that, actually, the sequence S includes only 15 out of 16 possible pairs. The pair $[-1 - j,\ 1 + j]$, which is missing, will be represented by the second successive repetition of that sequence since the last symbol of the sequence S is $-1 - j$, and the first one is $1 + j$.

TABLE V INITIAL VALUES THAT GENERATE THE 16 SUBSEQUENCES FOR THE FOUR-TAP CASE

IX. PERFORMANCE RESULTS FOR THE NONLINEAR CHANNEL CASE

The results summarizing the first set of simulation examples for nonlinear channels are illustrated in Fig. 13. The plot shows the mean square deviation of the estimated cluster centers from their exact values, as a function of the training sequence length. The average is taken over 500 different three-tap channels generated in the same way as explained in Section VI. The adopted signaling scheme is QPSK, and the SNR is set equal to 15 dB. The channel nonlinearity follows the polynomial form of (15). The LMS algorithm (solid line) exhibits very poor performance, as expected, due to the presence of the nonlinear impairment. The other two curves correspond to the direct estimation of the clusters using a supervised algorithm (dash-dotted line) and to the proposed hybrid method (dashed line), respectively. The direct estimation method, in order to provide sensible results, needs at least one received observation per cluster, which means that at least 64 observations are required in the case of our example. In contrast, the proposed hybrid method operates satisfactorily even with a few observations. For the computation of the weighting parameter via (18), the two ad hoc parameters were set equal to 0.3 and 0.5, respectively.

Fig. 13. Convergence speed as a function of the training sequence length.

The next two simulation examples show the ability of the 1-D CBSE to detect data symbols transmitted through a nonlinear channel, as a function of the SNR, for two different lengths of the training sequence, i.e., 40 and 100 symbols long. A three-tap channel is adopted (see Figs. 14 and 15). The performance of the new method is compared with that of the optimum finite-memory symbol-by-symbol Bayesian decision feedback equalizer [35]. The cluster centers for the Bayesian equalizer and the new method are the same. The feedforward part was of order 3 and the feedback part of order 2 [29], [35].

Fig. 14. SER performance for a three-tap channel in the presence of mild nonlinearity.
In addition, for comparison, the performance of the LMS-MLSE is shown. Although it is obvious that linear modeling is not the proper choice for nonlinear channels, our goal is to show the robustness of the new method, which needs no explicit modeling. As expected, the performance of the LMS-based MLSE degrades seriously in the presence of nonlinearities. The performance of the equalizers was tested in the presence of a mild nonlinearity, as well as in the presence of a rather severe nonlinearity. In the presence of a mild nonlinearity (see Fig. 14), the Bayesian and the 1-D CBSE equalizers perform well for both cases of 40 and 100 training symbols. In the more severe nonlinearity case (Fig. 15), 40 training symbols are not enough, even when the noise is low. When 100 training symbols are used, the 1-D CBSE outperforms the Bayesian equalizer, by as much as 6 dB for certain SNR values. The LMS-MLSE equalizer is unable to cope with this nonlinear environment. The above reported trend was verified using different nonlinearities and channels. In all cases, the 1-D CBSE exhibited higher performance than the Bayesian equalizer. Furthermore,

this enhanced performance is achieved at substantially lower computational complexity. The reason is that although the Bayesian equalizer is a symbol-by-symbol one, it requires the computation of exponentials as well as a much higher number of multiplications and additions [12].

Fig. 15. SER performance for a three-tap channel in the presence of rather severe nonlinearity.

X. CONCLUSION

In this paper, a novel technique for the design of MLSE equalizers was proposed. It belongs to the family of cluster-based sequence equalizers, and it exploits the generation mechanism of the received data clusters. This leads to substantial computational savings, compared with the LMS, with improved RLS-like convergence performance. Both linear and nonlinear channel cases were examined.

APPENDIX A
PROOF OF (11)

APPENDIX B
PROOF OF (12)

The proof follows the notation for the real and the imaginary components of the tap contributions.

REFERENCES

[1] J. G. Proakis, Digital Communications, 3rd ed. New York: McGraw-Hill, 1995.
[2] G. D. Forney, "Maximum-likelihood sequence estimation of digital sequences in the presence of intersymbol interference," IEEE Trans. Inform. Theory, vol. IT-18, pp. 363–378, May 1972.
[3] G. Ungerboeck, "Adaptive maximum likelihood receiver for carrier modulated data transmission systems," IEEE Trans. Commun., vol. COM-22, pp. 624–635, May 1974.
[4] R. D'Avella, L. Moreno, and M. Sant'Agostino, "An adaptive MLSE receiver for TDMA digital mobile radio," IEEE J. Select. Areas Commun., vol. 7, pp. 122–129, Jan. 1989.
[5] D. D. Falconer, F. Adachi, and B. Gudmundson, "Time division multiple access methods for wireless personal communications," IEEE Commun. Mag., vol. 33, pp. 50–57, Jan. 1995.
[6] S. Theodoridis and K. Koutroumbas, Pattern Recognition. New York: Academic, 1998.
[7] S. Theodoridis, C. M. S. See, and C. F. N. Cowan, "Nonlinear channel equalization using clustering techniques," in Proc. Int. Contr. Conf., 1992.
[8] S. Theodoridis, C. F. N. Cowan, C. P. Callender, and C. M. S. See, "Schemes for equalization of communications channels with nonlinear impairments," Proc. Inst. Elect. Eng. Commun., vol. 142, pp. 165–171, June 1995.
[9] K. Georgoulakis and S. Theodoridis, "Efficient clustering techniques for channel equalization in hostile environments," Signal Process., vol. 58, pp. 153–164, 1997.
[10] K. Georgoulakis and S. Theodoridis, "Channel equalization for coded signals in hostile environments," IEEE Trans. Signal Processing, vol. 47, pp. 1783–1787, June 1999.
[11] K. Georgoulakis and S. Theodoridis, "Blind and semi-blind clustering equalization using hidden Markov models," Signal Process., vol. 80, pp. 1795–1805, Sept. 2000.
[12] Y. Kopsinis and S. Theodoridis, "Reduced-complexity clustering techniques for nonlinear channel equalization," in Proc. WCC, Beijing, China, 2000.
[13] G. E. Bottomley and S. Chennakeshu, "Unification of MLSE receivers and extension to time-varying channels," IEEE Trans. Commun., vol. 46, pp. 464–472, Apr. 1998.
[14] S. Chen, B. Mulgrew, and P. M. Grant, "A clustering technique for digital communications channel equalization using radial basis function networks," IEEE Trans. Neural Networks, vol. 4, pp. 570–590, July 1993.
[15] B. Mulgrew, "Applying radial basis functions," IEEE Signal Processing Mag., vol. 13, pp. 50–65, Mar. 1994.
[16] S. Haykin, Adaptive Filter Theory, 4th ed. Englewood Cliffs, NJ: Prentice-Hall, 1996.
[17] N. Kalouptsidis and S. Theodoridis, Adaptive System Identification and Signal Processing Algorithms. NJ: Prentice-Hall, 1996.
[18] S. N. Crozier, D. D. Falconer, and S. A. Mahmoud, "Least sum of squared errors (LSSE) channel estimation," Proc. Inst. Elect. Eng. F, pp. 371–378, Aug. 1991.
[19] B. Sklar, "Rayleigh fading channels in mobile digital communication systems part I: Characterization," IEEE Commun. Mag., vol. 35, pp. 90–100, July 1997.
[20] R. H. Clarke, "A statistical theory of mobile-radio reception," Bell Syst. Tech. J., vol. 47, pp. 957–1000, 1968.
[21] J. I. Smith, "A computer generated multipath fading simulation for mobile radio," IEEE Trans. Veh. Technol., vol. VT-24, pp. 39–40, Aug. 1975.
[22] E. Kofidis, Y. Kopsinis, and S. Theodoridis, "On the least squares performance of a novel efficient center estimation method for clustering-based MLSE equalization," IEEE Trans. Signal Processing, submitted for publication.
[23] IEEE 802.16.1 Standard for Fixed Wireless Access LMDS, Local Multipoint Distribution Systems.
[24] R. A. Ziegler and J. M. Cioffi, "Estimation of time-varying digital radio channels," IEEE Trans. Veh. Technol., vol. 41, pp. 134–151, May 1992.
[25] G. Castellini, F. Conti, E. Del Re, and L. Pierucci, "A continuously adaptive MLSE receiver for mobile communications: Algorithm and performance," IEEE Trans. Commun., vol. 45, pp. 80–89, Jan. 1997.