MULTI-USER REDUCED RANK RECEIVERS FOR TD/CDMA SYSTEMS


MONICA NICOLI

UMBERTO SPAGNOLINI, Advisor
ANDERS AHLÉN, Referee

POLITECNICO DI MILANO, 2001
Dottorato di Ricerca Ingegneria Elettronica e delle Comunicazioni - XIV Ciclo

Dissertation for the degree of Doctor of Philosophy in Electronic and Communication Engineering at Politecnico di Milano.

ABSTRACT

M. Nicoli, Multiuser Reduced Rank Receivers for TD/CDMA Systems.

In the past few years the number of users of wireless communication systems has been growing exponentially. For this reason, increasing system capacity is a critical issue, especially for next generation cellular systems. Wideband code division multiple access (CDMA) is the preferred multiple access technique for third generation systems. As CDMA capacity is interference limited, any reduction of co-channel interference from the user's own cell (MAI, multiple access interference) and from neighboring cells (inter-cell interference) improves the system performance. A promising approach to suppress interference and multipath channel distortion is the integration of array signal processing and multi-user detection.

Multi-user space-time processing requires the knowledge of the input-output transfer function of the MIMO system (the channels from each transmitter to each receiver antenna) and, to allow interference rejection, the spatial covariance matrix of the co-channel interference. In time-slotted CDMA systems (such as the TDD mode of UMTS) the estimation is based on the transmission of training sequences known at the receiver; however, the performance of the estimate is limited by the large number of channel unknowns and the short training data. Moreover, when the velocity of the mobile is large, the estimate obtained from the training sequence has to be adapted over the time slot.

In this thesis the accuracy of the estimate is improved by: i) reducing the number of parameters that describe the channel matrix; ii) extending the training set by a multi-slot approach; iii) tracking the temporal variability of the fading channel within each slot.
We propose a multi-user reduced rank estimation technique that is based on a parsimonious but effective parameterization of the channel, as a trade-off between the model distortion (due to the under-parameterization) and the variance of the channel estimate (due to the limited training sequence length). The channels of all the users are estimated jointly, with the constraint that the space-time channel matrix of each user is low-rank. The method is then extended to handle multiple slots: the slowly varying component of the low-rank channel is estimated from the observation of successive midambles (inter-slot tracking), while the fast varying component is updated over the burst interval in decision-directed mode (intra-slot tracking).

The mean square error of the reduced rank estimate is evaluated analytically for both the single-slot and the multi-slot case. The analysis shows that the accuracy depends on the number of degrees of freedom in the channel parameterization. Simulation results show the significant advantage of the proposed technique over the conventional least squares estimate and confirm the validity of the presented theory.

Keywords: channel estimation, reduced rank, multi-slot, multi-user detection, antenna array, time slotted CDMA, adaptive receiver, tracking.

Monica Nicoli, Dipartimento di Elettronica e Informazione, Politecnico di Milano, Piazza L. da Vinci 32, I Milano, Italy. monica.nicoli@elet.polimi.it.

To my parents


Contents

1 Introduction
    Motivation
    Overview of the work
    Problem formulation
    Space-time multi-user data detection
    Multi-user channel estimation
    Outline of the thesis
    Thesis contributions

2 Space-time multi-user receivers
    Introduction
    Hybrid TDMA/CDMA systems
    System model and problem formulation
    Signal model for channel estimation
    Unconstrained channel estimation
    Performance analysis
    Signal model for data detection
    Conventional detector: single user matched filter
    Optimum MUD: maximum likelihood
    Linear multi-user detectors
    Decorrelating detector
    MMSE detector
    Decision driven multi-user detectors
    Effect of noisy channel estimation on MUD
    Simulation results
    Conclusions

3 Efficient algorithms for block multi-user detection
    Introduction
    Data model
    Computation of the ideal detector
    Factorization by block band Cholesky algorithm
    Factorization by block Schur algorithm
    Computational complexity evaluation
    Approximate detectors
    Approximate Cholesky detector (ACD)
    Sliding window detector (SWD)
    Computational complexity and performance analysis
    Systolic array
    Conclusions

4 Reduced rank channel model
    Introduction
    Reduced rank property of the propagation channel
    Rank analysis for multipath propagation
    Conclusions
    A Rank-r approximation in the Frobenius norm

5 Single-user reduced rank channel estimation
    Introduction
    Reduced rank signal model
    Reduced rank channel estimation
    Single user RR estimate (SU-RR)
    Performance analysis
    Selection of the rank order
    Simulation results
    Conclusions
    A Proof of Theorem
    B Equivalent formulations for the RR estimate
    C Proof of Proposition
    D Model order selection by MDL

6 Space-time reduced rank receivers for multi-user system
    Introduction
    Reduced rank multi-user channel estimation
    Signal model for RR channel estimation
    RR channel estimation (MU-RR)
    Performance analysis
    Reduced rank multi-user detection
    Signal model for RR data estimation
    Multi-user data detection
    Computational complexity
    Simulation results
    System specifications
    Performance analysis for clustered multipath channels
    Performance analysis for COST-259 channels
    Conclusions
    A Re-estimation from least squares estimate
    B Proof of Theorem

7 Adaptive multi-slot channel estimation
    Introduction
    Physical considerations on the temporal variability of the propagation channel
    Reduced rank model for time varying channels
    Remark on the notation
    Inter-slot channel estimation
    RR estimate in the Space domain (S-MS-RR)
    Data model
    Method
    Performance analysis
    RR estimate in the Time domain (T-MS-RR)
    Data model
    Method
    Performance analysis
    RR estimate in the joint Space-Time domain (JST-MS-RR)
    Data model
    Method
    Performance analysis
    RR estimate in the Space and Time domain (ST-MS-RR)
    Method
    Comparison of performance
    Simulation results: inter-slot channel estimation
    Intra-slot channel tracking
    System model
    Unconstrained tracking
    Reduced rank tracking
    Simulation results: intra-slot tracking
    Concluding remarks

8 Concluding remarks

Bibliography

A COST-259 Directional Channel Model
    A.1 Space-time description of the wireless channel
    A.2 Model parameters
    A.3 Radio environments for outdoor macrocells

B UTRA-TDD standard
    B.1 Frame structure
    B.2 Spreading codes
    B.3 Midambles


Acknowledgments

I wish to express my sincere gratitude to my supervisor, Professor Umberto Spagnolini, for his valuable support and insight in this research. His constructive suggestions and comments have been extremely helpful. Thanks also to his little baby, Giorgio, who has made Umberto definitely better in these last few months!

I am grateful to Professor Fabio Rocca for having introduced me to this research area and for the opportunity to work with him and the digital signal processing group. I would also like to acknowledge Vittorio Rampa and Osvaldo Simeone for their contribution to this work, and Siemens ICN Italy for supporting the research.

During the last year of my Ph.D. I spent a period at Uppsala University in Sweden. I am very grateful to Professor Anders Ahlén and Professor Mikael Sternad for giving me the opportunity to visit their Signals and Systems research group. I would like to thank them for helping me in the last part of the work with useful discussions and suggestions. Thanks to all the people at Signals and Systems, and especially to the Ph.D. students of the group, for making these months such a beautiful experience. I really enjoyed Sweden!

I wish to thank my closest friends during this Ph.D., Paolo Mazzucchelli, Nicola Bienati, Andrea Grandi, José Picheral and Matteo Beretta, for making the working environment so friendly and enjoyable. Special thanks to my office mate during these three years, Paolo, for his positive attitude and his patience. He helped me so many times, especially with computer problems. My thanks to Nicola, first of all for his friendship, and also for helping me to improve this manuscript and for his valuable advice.

My greatest thanks to my parents, my brothers and sisters, for their continuous encouragement throughout these academic studies. Finally, I would like to thank Roberto, for his belief in me, his patience and most of all for his love.

Monica Nicoli
Milano, December


Notational conventions

Notations

$X^T$               Transpose of the matrix X
$X^H$               Transpose conjugate of the matrix X
$X^*$               Conjugate of the matrix X
tr(X)               Trace of the matrix X
$|X|$               Determinant of the matrix X
$X \ge 0$           X is semipositive definite (> for positive definite)
vec(X)              Column vector formed by stacking the columns of the matrix X on top of each other
$X^{1/2}$           Cholesky factor of a positive definite matrix X: $[X^{1/2}]^H X^{1/2} = X$
$X^{H/2}$           $[X^{1/2}]^H$
$X^{-H}$            $[X^{-1}]^H$
tri(X)              Strictly upper triangular part of X
$\mathrm{svd}_r(X)$ Singular value decomposition of X truncated to the r largest singular values
$\mathrm{eigv}_r(X)$ Eigenvectors associated to the r largest eigenvalues of the matrix X
$\lambda_i(X)$      ith singular value of the matrix X
X                   Subspace spanned by the columns of the matrix X (range of X)
$X^\perp$           Orthogonal complement of X
P(X)                Orthogonal projector onto X
$P^\perp(X)$        Orthogonal projector onto $X^\perp$
$I_n$               $n \times n$ identity matrix
$1_{m \times n}$    $m \times n$ matrix with entries equal to 1 ($1_n$ for square matrix)
$0_{m \times n}$    $m \times n$ matrix with entries equal to 0 ($0_n$ for square matrix)
diag(x)             Diagonal matrix having the elements of the vector x as diagonal entries
O(x)                O(x)/x is bounded when $x \to 0$
$\log(\cdot)$       Natural logarithm
$\mathrm{Re}(\cdot)$ Real part
$\mathrm{Im}(\cdot)$ Imaginary part
$\mathbb{R}$        Real set
$\mathbb{C}$        Complex set
$\lceil \cdot \rceil$   Nearest upper integer
$\lfloor \cdot \rfloor$ Nearest lower integer
$\mathrm{dec}(\cdot)$   Hard detector from continuous decision variable
$\odot$             Hadamard product
$*$                 Convolution
$\otimes$           Kronecker product
$\sim$              Distributed as
$\approx$           Approximated by
$\propto$           Proportional to
$E[\cdot]$          Expected value
$N(\mu, \sigma^2)$  Normal distribution with mean $\mu$ and variance $\sigma^2$
$CN(\mu, \sigma^2)$ Complex normal distribution with mean $\mu$ and variance $\sigma^2$
$\mathrm{Cov}(\cdot)$ Covariance

Abbreviations

AWGN    Additive white Gaussian noise
BS      Base station
BTS     Base transceiver station
CCI     Co-channel interference
C/I     Carrier to interferer ratio
CDMA    Code division multiple access
COST    European cooperation in the field of scientific and technical research
CRLB    Cramér Rao lower bound
DOA     Direction of arrival
DF      Decision feedback
EVD     Eigenvalue decomposition
FIM     Fisher information matrix
FDD     Frequency division duplex
FDMA    Frequency division multiple access
GSM     Global system for mobile communications
ISI     Inter-symbol interference
LOS     Line of sight
LMS     Least mean square
LMS-N   Least mean square with Gauss-Newton update
LS      Least squares
LSE     Least squares estimate
MAC     Multiply and accumulation
MAI     Multiple access interference
MAP     Maximum a posteriori
MDL     Minimum description length
MIMO    Multiple input multiple output
ML      Maximum likelihood
MLE     Maximum likelihood estimate
MMSE    Minimum mean square error
MS      Mobile station
MSE     Mean square error
MU      Multi-user
MUD     Multi-user detection
RLS     Recursive least squares
RR      Reduced rank
S       Space
SIR     Signal to interference ratio
SNR     Signal to noise ratio
ST      Space time
SU      Single-user
SVD     Singular value decomposition
SWD     Sliding window decorrelator
T       Time
TDD     Time division duplex
TDMA    Time division multiple access
TOA     Time of arrival
ULA     Uniform linear array
UMTS    Universal mobile telecommunication system
UTRA    UMTS terrestrial radio access
WSSUS   Wide sense stationary uncorrelated scattering
ZF      Zero forcing


Chapter 1

Introduction

1.1 Motivation

The goal of mobile communication systems is to provide improved and flexible services to a larger number of mobile users at lower costs. This objective poses a big challenge for wireless technology: increasing system capacity and quality within the limited available frequency spectrum. Two major factors limit the performance and the capacity of the system:

- Multipath propagation. The transmitted signal propagates through the wireless channel along different paths (referred to as multipath), each characterized by amplitude, delay and angle of arrival. At the receiver the signal is much weaker due to propagation loss and is subject to power fluctuations caused by the multipath (fading). The amplitude of the arrival along each path can change very rapidly over time, due to the movement of the mobile user and the Doppler effect (time selective channel). Furthermore, when the delay separation between paths is comparable with the symbol period, intersymbol interference (ISI) arises (frequency selective channel). Power fluctuations, Doppler spread and delay spread increase the probability of error at the receiver.

- Co-channel interference. In order to make efficient use of the frequency bandwidth, wireless networks are based on a cellular layout with frequency reuse. The service area is divided into small cells which re-use subsets of the total available radio channels. When other mobiles in neighboring cells are using the same radio channel, the signal of the user of interest is affected by co-channel interference and the quality of the communication is reduced. Frequency planning has the purpose of limiting the mutual co-channel interference.

Since the available bandwidth is limited, a possible way to increase the number of users per unit of surface could be the reduction of the cell size. However, this would require additional costs for the deployment of new base stations.
In this thesis we consider technologies that can improve system capacity by reducing the degradation effects of multipath propagation and interference, without requiring the reduction of the cell size. We will consider the link from the mobile stations to the base station (uplink), focusing on third generation mobile systems based on code division multiple access (CDMA). In particular, we will test the proposed solutions by simulating the TDD mode of UMTS [28], which is a time slotted CDMA system. Unlike time division multiple access (TDMA) and frequency division multiple access (FDMA), in which users are separated in time or frequency, in CDMA users share the

same time interval and the same frequency band, being separated by different spreading codes [59]. User codes are designed to be orthogonal (or quasi orthogonal); however, due to multipath propagation, orthogonality cannot be maintained at the receiver and signals interfere with each other (multiple access interference, MAI). Hence, in CDMA co-channel interference arises not only from neighboring cells (inter-cell), but also within the cell (intra-cell). This is the major limiting factor of the capacity of CDMA systems.

In order to cope with multiple access interference, advanced multi-user receivers can be adopted at the base station to process jointly the signals from all users active in the cell [85]. Furthermore, a promising approach to improve system performance is the use of multiple antennas, which make it possible to exploit the spatial dimension of the propagation [54]. Array receivers process the signals in space and time by operating simultaneously on all the antennas; this reduces the effect of co-channel interference and multipath fading. The integration of multi-user signal detection and array signal processing improves system capacity, coverage and quality [45], [88]. Indeed, since the interference level is decreased, a larger number of subscribers can be supported either by reducing the reuse distance (even down to one, i.e. all frequencies are re-used in every cell) or by allocating more CDMA codes in the same cell.

Space-time multi-user processing has two drawbacks. On one hand, the joint processing of signals from multiple antennas and multiple users increases the computational complexity. On the other hand, the knowledge of both the space and time features of the channel is required for all users active in the cell. In time slotted CDMA systems this estimation is based on the transmission, in every data packet, of training sequences that are known at the receiver.
To preserve transmission efficiency, the number of training symbols in each data packet is kept small, usually too small to estimate such a large number of channel parameters. Hence, the channel estimate becomes unreliable and increases the probability of error in data detection. In this thesis we propose multi-user space-time processing that improves the estimate accuracy by:

- reducing the number of parameters that describe the wireless channel;
- virtually extending the training set by exploiting training data from consecutive slots;
- tracking fast varying parameters within each slot in case of rapidly moving mobiles.

All these solutions are developed by assuming a reduced rank model for the channel of each user. This parsimonious but effective parameterization not only improves the channel estimation performance, but also reduces the complexity of the structure of the space-time equalization.

1.2 Overview of the work

Problem formulation

We consider the uplink of a CDMA system in which K mobile users share the same radio channel, i.e., the same time interval and frequency band. In order to separate the users, before transmission the signals are multiplied by a user-specific spreading code at a (chip) rate that is Q times larger than the data rate (Q is referred to as the spreading factor). Each mobile station uses a single antenna to transmit signals to the base station, which is equipped with an antenna array of M elements. The overall system can be modelled as a MIMO (multiple input multiple output) system that has the signals transmitted by the K mobile stations as inputs and the signals received by the M antennas at the base station as outputs. A discrete-time baseband model is derived by sampling at the chip rate the received signals at the output of the filter matched to the pulse waveform. The ith sample of the signal is represented

by the $M \times 1$ vector $y(i)$, that is the sum of the contributions from all users:

$$y(i) = \sum_{k=1}^{K} H_k x_k(i) + n(i), \qquad (1.1)$$

where the $W \times 1$ vector $x_k(i)$ contains W consecutive elements of the data sequence transmitted by user k (to represent the convolution with the channel). The channel length W (in chip intervals) is assumed to be $W > 1$ to account for intersymbol interference. The vector $n(i)$ models, under a Gaussian approximation, not only the ambient noise but also the inter-cell interference, and therefore it is spatially correlated with covariance $R_n = E[n(i) n^H(i)]$. The $M \times W$ matrix $H_k$ denotes the discrete-time channel impulse response for the link between the kth user and the M antennas. The space-time matrix $H_k$ includes the effect of pulse shaping, multipath propagation, array response and chip matched filter. The overall set $H = [H_1 \cdots H_K]$ describes the channel of the MIMO system. Notice that model (1.1) includes all the effects mentioned above: multipath propagation, ISI, inter-cell and intra-cell interference. We start the discussion by assuming the channels to be time invariant; in the last part of the thesis we will extend the analysis to time varying channels.

In time slotted CDMA systems, users transmit data in packets (bursts) that include sequences of information symbols and a known sequence of training symbols. During the transmission of information data, the kth signal $\{x_k(i)\}$ is obtained by spreading the $N_s$ information symbols $\{d_k(j)\}$ with the code $\{c_k(i)\}$ assigned to user k (with code length Q). On the other hand, during the training period the transmitted sequence $\{x_k(i)\}$ is composed of the $N_m$ known symbols for channel estimation.
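As an illustration of model (1.1), the following minimal sketch generates a received burst for K users and M antennas with NumPy. It is not taken from the thesis: the function names (`spread`, `noiseless_output`, `correlated_noise`), the random binary codes and symbols, the random channels and the diagonal choice of $R_n$ are all assumptions made only for the example. Each row of $H_k$ is convolved with the chip sequence of user k, which is equivalent to multiplying $H_k$ by the window $x_k(i)$ of W consecutive chips.

```python
import numpy as np

def spread(symbols, code):
    # Each symbol d(j) multiplies the whole Q-chip code: N_s symbols -> N_s*Q chips.
    return np.kron(symbols, code)

def noiseless_output(H_list, x_list):
    # Sum over users of H_k x_k(i); column i of the result is y(i) without noise.
    M, W = H_list[0].shape
    N = len(x_list[0])
    Y = np.zeros((M, N + W - 1), dtype=complex)
    for H_k, x_k in zip(H_list, x_list):
        for m in range(M):
            # Row m of H_k is the length-W impulse response seen by antenna m.
            Y[m] += np.convolve(H_k[m], x_k)
    return Y

def correlated_noise(R_n, n_samples, rng):
    # Complex Gaussian noise with spatial covariance R_n (via its Cholesky factor).
    M = R_n.shape[0]
    L = np.linalg.cholesky(R_n)
    w = (rng.standard_normal((M, n_samples))
         + 1j * rng.standard_normal((M, n_samples))) / np.sqrt(2)
    return L @ w

rng = np.random.default_rng(0)
K, M, W, Q, N_s = 2, 4, 3, 8, 5          # illustrative sizes, not UTRA-TDD values
codes = [rng.choice([-1.0, 1.0], size=Q) for _ in range(K)]
symbols = [rng.choice([-1.0, 1.0], size=N_s) for _ in range(K)]
H_list = [rng.standard_normal((M, W)) + 1j * rng.standard_normal((M, W))
          for _ in range(K)]
x_list = [spread(d, c) for d, c in zip(symbols, codes)]
R_n = 0.1 * np.eye(M)                    # assumed noise covariance

Y = noiseless_output(H_list, x_list) + correlated_noise(R_n, N_s * Q + W - 1, rng)
print(Y.shape)  # (4, 42): M antennas, N_s*Q + W - 1 chip-rate samples
```

The burst length $N_s Q + W - 1$ produced by the full convolution matches the observation interval used for block detection in the text.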
In this thesis we will address the following problems (see Figure 1.1):

- Multi-user data detection: joint detection of the information symbols $\{d_k(j)\}$ for all users from the data $\{y(i)\}$, by using the knowledge of the channels $\{H_k\}_{k=1}^{K}$, the covariance of noise $R_n$ and the spreading codes $\{c_k(i)\}$;
- Multi-user channel estimation: joint estimation of the channels $\{H_k\}_{k=1}^{K}$ and $R_n$ from the signals $\{y(i)\}$, by using the known training symbols $\{x_k(i)\}$.

Both operations are carried out by processing jointly the signals from all antennas and all users (space-time multi-user processing).

Space-time multi-user data detection

The conventional single user receiver is composed of a bank of K filters matched to the CDMA signatures. Since the signals are no longer orthogonal after multipath propagation, the performance of this detector is heavily limited by the interference. Furthermore, the propagation channel is frequency selective, hence the signals at the receiver are affected not only by multiple access interference from other users but also by self-interference from consecutive symbols. For this reason, the optimum decision for each data symbol requires the observation of the whole received burst $\{y(i)\}$, for $i = 1, \ldots, N_s Q + W - 1$. Receivers that handle both ISI and MAI by processing jointly all symbols in the data packet (under the assumption that the channel is time invariant over the block interval) are called block multi-user detectors. The optimum receiver is the maximum likelihood multi-user detector [83], in which the matched filters are followed by a Viterbi algorithm [16]. This detector has been proved to be

effective against multiple access interference and to provide significant performance improvement over the single user receiver. However, its computational complexity grows exponentially with the number of users. Suboptimal linear and non-linear multi-user detectors have been proposed to gain a better performance/complexity trade-off [85]. In the thesis we review several of these block based multi-user detectors and evaluate their performance by simulations for the UTRA-TDD system [28]. Furthermore, in order to reduce the computational complexity of the inversion of the large correlation matrix, some alternative suboptimal techniques are proposed.

[Figure 1.1: Structure of the space-time multi-user receiver. The signals $y_1(i), \ldots, y_M(i)$ received at the M antennas feed the multi-user channel estimation block, which produces $\hat{H}_1, \ldots, \hat{H}_K$, and the multi-user data detection block, which produces the data symbols $\hat{d}_1(j), \ldots, \hat{d}_K(j)$ of the K users.]

Multi-user channel estimation

In order to detect information symbols reliably, all the detectors mentioned above require the knowledge of the propagation channels. The spatial covariance of noise $R_n$ must be derived as well to allow interference rejection. A straightforward approach to the estimation of the multi-user multi-antenna channel is the least squares estimate of the matrices $\{H_k\}_{k=1}^{K}$. However, this estimate is affected by a large mean square error, due to the high number of parameters to be estimated (KMW) and the short training sequence length ($N_m$). We observe that in model (1.1) the propagation channel of each user is modelled by MW variables. However, in many cases, even if the propagation has a rather complex structure, these parameters are correlated and therefore exceed those really needed to describe the space-time channel. Hence we propose to exploit features of the wireless channel to reduce the complexity of the model. Two main approaches can be identified in the literature.
In the first one, structural constraints are imposed on the channel matrix based on the multipath model. The channel is parameterized in terms of delays, directions of arrival and fading amplitudes; finding these parameters is equivalent to estimating the channel. Algorithms for the estimation of the multipath parameters can be found in the literature, such as [81], [86], [56], [7]. However, these methods


require a non-negligible increase of computational complexity.

[Figure 1.2: Examples of space-time propagation channels (panels: typical urban and rural area as low rank channels; bad urban and hilly terrain as high rank channels; axes: angle [deg] and delay [µs]). Color scale denotes the power while markers represent the scatterer positions as a function of delays and angles.]

An alternative procedure to simplify channel parameterization is rank reduction. Usually in multipath propagation the paths can be grouped into a few clusters of scatterers having small angle/delay spread compared to the angular/temporal resolution of the system. When this occurs the number of degrees of freedom in the matrix $H_k$ is less than MW and the channel matrix loses rank. The rank order can be approximated by the number of clusters with significant power. This is illustrated by some examples in Figure 1.2. The power of the channel impulse response is represented as a function of angles and delays. It can be seen that the examples of typical urban and rural area channels are composed of a single cluster with relatively small angle/delay spread. These channels can be well described by low rank matrices $H_k$. Figure 1.3 shows a channel that is composed of two main clusters and can be approximated by a rank 2 matrix.

According to the principle of parsimony, the number of parameters should be restricted only to those that are really needed to describe the channel model. Hence we propose to reduce the unknowns of the model by exploiting the reduced rank property of the propagation: each channel is modelled by a low rank matrix, that is $r_k = \mathrm{rank}(H_k) < \min(M, W)$.

Reduced rank estimation

For a single user, the reduced rank estimate of the channel and of the covariance of noise is obtained by maximizing the likelihood function from the model (1.1) under the constraint of reduced rank [74].
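The effect of the low rank model on the number of unknowns can be illustrated with a short sketch. It assumes a synthetic two-cluster channel (a strong and a weaker rank-1 term plus a small diffuse part), which is a hypothetical stand-in for the propagation examples of Figures 1.2 and 1.3, and it uses the truncated SVD, which gives the best rank-r approximation in the Frobenius norm. The parameter count $r(M + W - r)$ is the standard number of free parameters of an $M \times W$ matrix of rank r; it is stated here as a general fact, not quoted from the thesis. The helper names are illustrative.

```python
import numpy as np

def rank_r_approx(H, r):
    """Best rank-r approximation of H in the Frobenius norm (truncated SVD)."""
    U, s, Vh = np.linalg.svd(H, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vh[:r, :]

def n_free_params(M, W, r):
    """Number of free (complex) parameters of an M x W matrix of rank r."""
    return r * (M + W - r)

rng = np.random.default_rng(0)
M, W = 8, 16  # illustrative array size and channel length

def crandn(*shape):
    # Unit-variance complex Gaussian samples.
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# Hypothetical channel: main cluster + weaker second cluster + small diffuse term.
H = (np.outer(crandn(M), crandn(W))
     + 0.3 * np.outer(crandn(M), crandn(W))
     + 0.01 * crandn(M, W))

for r in (1, 2):
    rel_err = np.linalg.norm(H - rank_r_approx(H, r)) / np.linalg.norm(H)
    print(f"rank {r}: {n_free_params(M, W, r)} parameters, relative error {rel_err:.3f}")
print("unconstrained:", M * W, "parameters")
```

With these sizes a rank 1 channel has 23 free parameters and a rank 2 channel has 44, against 128 for the unconstrained $M \times W$ matrix, which is the kind of reduction the parsimony argument relies on.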
Reduced rank maximum likelihood estimation has been proposed in the literature for single user wireless systems such as the GSM system [19], [41]. In this thesis we propose a new extension to multi-user systems. We show how the channels of all users can be estimated jointly with the constraint that the space-time channel of each user is low-rank. The method can effectively cope with the multiple access interference that arises from the non-orthogonality of the training sequences. Furthermore, the rank order is estimated adaptively for each user according to the minimum description length criterion (MDL, [63]); this method provides the needed trade-off between distortion (due to under-parameterization) and variance (due to the noise).

[Figure 1.3: Example of reduced rank representation of the propagation channel (panels: propagation channel with main, 2nd and 3rd clusters; rank 1 approximation; rank 2 approximation; axes: angle [deg] and delay [µs]).]

Multi-slot reduced-rank estimation

The reduced rank technique is extended to exploit training data from different slots. The basic idea is to virtually extend the training sequence length so as to obtain a more accurate channel estimate. Clearly the estimation needs to take into account the variations of the channel parameters over the slots. In multipath propagation the amplitudes of the paths change rapidly due to fast fading, but their angles and delays are usually fairly constant over several time slots, for reasonable geometries and speeds. Therefore, the slowly varying parameters can be isolated and estimated by exploiting the redundancy that arises from multiple slots (inter-slot estimation), while the fast varying parameters have to be derived on a slot by slot basis. In the literature multi-slot methods have been proposed to explicitly estimate angles and delays [82]. In this thesis we propose a different unstructured approach. More precisely, we translate the stationarity of angles and delays into the invariance of the subspace spanned by the temporal/spatial signatures and estimate this subspace from successive slots.
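The idea of subspace invariance across slots can be illustrated with a generic sketch. It is not one of the thesis estimators (those are derived in Chapter 7); it only assumes a synthetic channel $H_\ell = A \, \mathrm{diag}(b_\ell) \, T^T$ whose spatial and temporal signatures A and T are fixed over L slots while the fading amplitudes $b_\ell$ change independently from slot to slot. The spatial and temporal subspaces are estimated from the sample correlations accumulated over all slots, and each noisy per-slot estimate is projected onto them. All names and sizes are illustrative assumptions.

```python
import numpy as np

def top_eigvecs(C, r):
    # eigh returns eigenvalues in ascending order: keep the r largest.
    _, vecs = np.linalg.eigh(C)
    return vecs[:, -r:]

def multislot_project(H_hat, r):
    """Project per-slot estimates (array L x M x W) onto subspaces shared by all slots."""
    C_s = sum(H @ H.conj().T for H in H_hat)   # M x M spatial correlation
    C_t = sum(H.conj().T @ H for H in H_hat)   # W x W temporal correlation
    U = top_eigvecs(C_s, r)                    # spatial basis
    V = top_eigvecs(C_t, r)                    # temporal basis
    P_s = U @ U.conj().T
    P_t = V @ V.conj().T
    return np.array([P_s @ H @ P_t for H in H_hat])

rng = np.random.default_rng(1)
L, M, W, r = 20, 8, 16, 2  # slots, antennas, channel length, rank (illustrative)

def crandn(*shape):
    # Unit-variance complex Gaussian samples.
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

A = crandn(M, r)   # slot-invariant spatial signatures (assumed)
T = crandn(W, r)   # slot-invariant temporal signatures (assumed)
H_true = np.array([A @ np.diag(crandn(r)) @ T.T for _ in range(L)])
H_hat = H_true + 0.3 * crandn(L, M, W)   # noisy slot-by-slot estimates

H_ms = multislot_project(H_hat, r)
mse_raw = np.mean(np.abs(H_hat - H_true) ** 2)
mse_ms = np.mean(np.abs(H_ms - H_true) ** 2)
print(f"per-slot MSE: {mse_raw:.4f}, after multi-slot projection: {mse_ms:.4f}")
```

In this toy setting the projection removes most of the noise that lies outside the shared subspaces, while the fast varying amplitudes are still obtained slot by slot.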
A similar reduced rank method was proposed for the single user system, but the estimation is based on the invariance of the spatial basis only [17]. Here we derive new reduced rank multi-slot estimates that take advantage of the invariance of both the spatial and the temporal basis and

are suited for multi-user systems.

Adaptive reduced rank estimation

For mobile users moving at high speed, the estimate obtained from the training sequences cannot be used within the whole slot, as the error probability increases at the edges of the burst. Hence we propose to track the low rank channel $H_k(t)$ within each time slot by an adaptive receiver (intra-slot tracking). The reduced rank tracking algorithm exploits the estimate of the slowly varying component of the channel from the multi-slot approach and updates only the fast varying parameters. Starting from the training data, the channel is updated over the slot interval in decision directed mode, by using a sliding window multi-user detector (SWD, [33]). Standard tracking algorithms can be used [27], but if severe fading occurs (for long data slots and large velocity) more effective methods should be adopted [38]-[39].

1.3 Outline of the thesis

The thesis is divided into three major parts. The first part is composed of Chapters 2 and 3, and regards full rank space-time receivers. The second part, given by Chapter 4, analyzes the rank properties of the wireless channel and defines the reduced rank model. This model is used to develop the reduced rank channel estimation in the third part, which is divided into Chapters 5, 6 and 7: for single-slot and single-user (Chapter 5), for single-slot and multi-user (Chapter 6), for multi-slot and multi-user with adaptation for time varying channels (Chapter 7). This last part contains the major theoretical contribution of this work and the most significant simulation results.

Chapter 2. Readers who are familiar with space-time multi-user channel and data estimation can just browse this chapter, as it is an overview of the theory of multi-user detection and channel estimation for multi-antenna systems, with applications to the uplink of UMTS-TDD.
A discrete-time baseband model is first introduced to describe the MIMO system; this model is then used throughout the thesis. We describe several linear and non-linear block multi-user detectors based on the least squares estimate of the channel matrices. In particular, we will show that, since the unconstrained least squares approach attempts to derive all the spatial-temporal dimensions of the channel, the estimate error is large and degrades the performance of data detection. This motivates the investigation of more accurate constrained estimators.

Chapter 3. Block based multi-user detection involves the inversion of a large matrix whose dimensions depend on the block size and on the number of users. We investigate suboptimal techniques to reduce the computational complexity of this operation. Two algorithms are compared in terms of performance, complexity and parallelism: the approximate Cholesky detector (ACD) and the sliding window detector (SWD). In particular, we focus on the SWD approach as it can be used in adaptive receivers and is well suited to be optimized in parallel hardware architectures. Chapter 3 has mainly a practical interest and is quite independent from the rest of the thesis; hence, readers who are not interested in the implementation aspects of multi-user detection can just proceed to the next chapters.

Chapter 4. This chapter introduces the underlying idea of the reduced rank approach. The low rank property of the wireless channel is analyzed. A rank analysis of typical outdoor propagation environments shows that the space-time matrix describing the channel is in fact low rank. The reduced rank channel model is thus defined.

Chapter 5. In this chapter we investigate reduced rank channel estimation for single user systems. A new closed form for the Cramér-Rao lower bound (CRLB, [53]) is calculated. The analytical performance is also validated by numerical results. The performance bound shows that the error of the reduced rank estimate depends on the number of independent parameters that describe a rank-$r_k$ channel matrix. Since this number is much lower than $MW$, the gain of the reduced rank approach with respect to the unconstrained approach is remarkable. Furthermore, the adaptive selection of the rank order by the MDL criterion is treated.

Chapter 6. In this chapter the reduced rank estimation is extended to multi-user systems. The proposed method can effectively cope with multiple access interference (due to the non orthogonality of the training sequences) and co-channel interference from neighboring cells. Furthermore, with respect to the conventional full rank receiver, it is shown that the lower number of space-time filters for each user results not only in better estimation performance but also in a reduced complexity of the multi-user detector.

Chapter 7. In this chapter we extend the analysis to time varying channels. The key assumption of an invariant spatial/temporal manifold is made. The slowly varying component of the channel is estimated from the observation of successive training intervals (inter-slot tracking), while the fast varying component is updated over the burst interval in decision-directed mode (intra-slot tracking). Both inter-slot channel estimation and intra-slot tracking are developed by exploiting the low-rank nature of the channel matrix. Four methods are proposed to estimate the channel from successive slots; for each of them the CRLB of the channel estimate is calculated for a varying number of slots.
The analytical results show that for a large number of slots (asymptotic lower bound) the invariant features can be estimated with arbitrary accuracy, thus leading to a remarkable improvement in estimation performance. Furthermore, the asymptotic bound is reached for a reasonable number of slots. The asymptotic performance for a large number of slots is also derived in [66], by using a different approach that leads to the same result. For intra-slot tracking of the fast varying features, standard recursive algorithms are considered, such as the recursive least squares (RLS) and the least mean squares (LMS) [27].

The performance of the proposed receivers is evaluated for time slotted CDMA systems by means of Monte Carlo simulations. Realistic realizations of the channel impulse response for outdoor macro-cell propagation scenarios are generated from the COST 259 Directional Channel Model [69]. This model is described in Appendix A. The TD/CDMA system is simulated according to the UMTS-TDD standard, which is recalled in Appendix B.

1.4 Thesis contributions

The results discussed in the thesis are published in the papers listed below. Reduced rank channel estimation is not new in itself: several works can be found in the literature for the single user system. However, the extension of the reduced rank method to CDMA multi-user systems and the use of the estimate in multi-user detectors is novel. The reduced rank estimate for multi-user systems is introduced in the following paper:

M. Nicoli and U. Spagnolini, "Multiuser space-time channel estimation for CDMA under reduced-rank constraint," Proc. IEEE Globecom Telecommunications Conference (Globecom 00), vol. 1, pp , San Francisco, CA, Nov

This paper also analyzes the rank order of realistic propagation environments obtained from the COST 259 Directional Channel Model and applies the proposed technique to UMTS-TDD. The estimate with adaptive selection of the rank order of each user is proposed in the following paper:

M. Nicoli and U. Spagnolini, "Reduced-rank channel estimation and rank order selection for CDMA systems," Proc. IEEE International Conference on Communications (ICC 01), vol. 9, pp , Helsinki, Finland, June 2001,

while the modification of the multi-user data detector on the basis of the low rank channel model is treated in

M. Nicoli and U. Spagnolini, "Adaptive-rank receiver for space-time DS-CDMA systems," Proc. IEEE Vehicular Technology Conference (VTC Spring 01), vol. 1, pp , Rhodes, Greece, May

The following paper proposes the adaptive reduced rank receiver as a combination of the multi-slot estimate of the slowly varying channel subspace and the intra-slot tracking of the fast varying channel parameters:

M. Nicoli, M. Sternad, U. Spagnolini, and A. Ahlén, "Reduced-rank channel estimation and tracking in time-slotted CDMA systems," submitted to IEEE International Conference on Communications (ICC 02), New York, NY, June 2002.

Estimation and tracking are both developed for multi-user systems. Multi-slot channel estimation for a single antenna system is presented in the journal paper

M. Nicoli, O. Simeone, and U. Spagnolini, "Multi-slot estimation of frequency-selective fast-varying channels," submitted.

Suboptimal block-based and sliding window multi-user detectors are compared in

M. Beretta, A. Colamonico, M. Nicoli, V. Rampa, and U. Spagnolini, "Space-Time multiuser detectors for TDD-UTRA: design and optimization," Proc. IEEE Vehicular Technology Conference (VTC Fall 01), vol. 1, pp , Atlantic City, NJ, October 2001,

in terms of performance, computational complexity, parallelism and hardware implementation. Finally, an alternative approach to reduce the parameterization of the propagation channel and improve the accuracy of the estimate is treated in

M. Nicoli and U. Spagnolini, "Reducing the complexity of the space-time channel estimate at minimum risk," Proc. IEEE Vehicular Technology Conference (VTC Fall 01), vol. 1, pp , Atlantic City, NJ, October

This method is not presented in the thesis as it is not based on the reduced rank approach. The channel estimate is obtained from the pre-whitened least squares estimate, by setting to zero (masking) only those samples of the matrix $\mathbf{H}_k$ that have small values. Unlike the routinely employed methods, here the threshold values are selected according to the likelihood ratio test (LRT) that minimizes the Bayes risk function [91]. The statistical model of the pre-whitened channel estimates for the LRT is based on the Rayleigh mixture model; the parameters of the mixture are also iteratively estimated by the LRT. All the papers can be downloaded at:


Chapter 2
Space-time multi-user receivers

2.1 Introduction

In CDMA systems users share the same frequency band and time interval and are separated by distinct spreading codes. If the codes are orthogonal and the propagation channel is not time dispersive, users do not interfere with each other and the conventional matched filter receiver remains the optimal solution for data estimation. In general, however, the orthogonality between codes cannot be maintained because of multipath propagation and, therefore, the performance of the single user detector becomes interference limited. Advanced signal processing techniques are needed to reduce the multiple-access interference (MAI).

Optimal multi-user signal detection (MUD, [83]) was proved to suppress multiple access interference and provide significant performance improvement over the single user receiver. However, the computational complexity of this receiver is prohibitive as it is exponential in the number of users. Suboptimal solutions have been investigated to obtain a better performance/complexity trade-off. Various linear and non linear multi-user detectors have been developed [85], such as the decorrelating detector, the minimum mean square error detector, successive interference cancellation, multistage detectors and decision feedback equalizers.

In addition to multi-access interference, when the propagation channel is frequency selective, intersymbol interference (ISI) arises between consecutive data symbols. ISI cannot be neglected when the symbol period is comparable with the delay spread, as in TDD-UTRA [28], where a short code CDMA system is employed. In this case, block multi-user detection can handle both MAI and ISI by processing all the symbols in the data packet (burst) jointly [37], [32].

Another approach to suppress co-channel interference (from the own cell or from the neighboring cells) is the use of antenna arrays at base stations [54].
The integration of both methods, space-time signal processing and multi-user detection, can provide further improvement in performance [45], [88], [68]. This is referred to as space-time multi-user detection (ST-MUD).

In order to perform ST-MUD, the space-time propagation channel has to be estimated for all the users active in the cell. Moreover, to allow interference rejection, the space covariance matrix of the noise also has to be derived. In TD/CDMA systems this estimation is based on the transmission, in every data packet, of training sequences that are known at the receiver. Since the multi-user multi-antenna propagation channel can be well modelled by a set of FIR filters, a straightforward approach for its estimation is the least squares estimate of the channel taps. However, the number of training data in each data packet can be too small; this makes the channel estimate unreliable and therefore increases the probability of error in the data detection process.

In this chapter space-time multi-user receiver schemes are reviewed for the uplink of TD/CDMA systems. A discrete-time baseband model is introduced to describe the MIMO system that has the multi-user transmitted signals as input and the multi-antenna received signals as output. Then, in the first part of the chapter, we consider the unconstrained least squares estimation of the channels and of the space covariance of the noise from the training data. In the second part, linear and non linear multi-user detectors are described. Both inter-symbol interference and multiple access interference are included in the model; hence the multi-user detection is block based, under the assumption that the channel is time invariant over the block interval. Finally, the effect of noisy channel estimation on the detected data is examined. Simulation experiments are given at the end of the chapter for the UTRA-TDD standard.

2.2 Hybrid TDMA/CDMA systems

A time slotted CDMA mobile radio system uses a combination of TDMA (time division multiple access) and CDMA (code division multiple access) to separate users in the time and/or in the code domain [26]. The frame structure of the TDMA component is illustrated in Figure 2.1. Each frame has duration $T_f$ and is subdivided into $N_f$ slots. An uplink is considered, where the mobile users have a single transmitter antenna and the base station uses an antenna array receiver of $M$ elements. $K$ mobile users are active in each time slot; they use the same frequency band but different spreading codes, which allow signal separation at the receiver. The transmission is assumed to be perfectly synchronous. During the time slot each user transmits a burst of duration $T_b = T_f/N_f$ consisting of two data blocks, a user specific training sequence (midamble) and a guard period.
The $K$ known training sequences consist of $N_m$ chips each and are suitably designed to cope with multi-access interference. Each data block contains $N_s$ QPSK symbols of duration $T_s = QT_c$, where $Q = T_s/T_c$ denotes the spreading factor. For user $k$ the symbols are spread by the code $\mathbf{c}_k = [c_k(1), \ldots, c_k(Q)]^T$. The time interval between two successive slots of the same user is $T_f$.

At the base station, the signals received during the time slot are separated into two subsets, the training data and the signals depending on the information symbols. The estimation of the space-time channels is developed from the first subset of signals by the method described in the first part of the chapter (Sections ). Then space-time multi-user detection is carried out on the two blocks of signals received before and after the midamble in order to estimate the $2N_sK$ data symbols. This is discussed in the second part of the chapter (Sections ), by using the knowledge of the channel estimates and the signature sequences $\mathbf{c}_k$, for $k = 1, \ldots, K$. The derivation of both channel estimation and signal detection is based on the lowpass discrete-time model that is derived in Section 2.3 to describe the uplink of the multi-user multi-antenna TD/CDMA system.

Figure 2.1: Burst structure for the uplink of a hybrid TD/CDMA system. (Each frame of duration $T_f$ contains $N_f$ time slots; each burst of duration $T_b$ comprises data block 1 with $N_s$ symbols ($N_sQ$ chips), the training sequence of $N_m$ chips, data block 2 with $N_s$ symbols ($N_sQ$ chips), and a guard period.)

2.3 System model and problem formulation

In this section we derive the discrete-time model for the signal received by the antenna array during either the training period or the transmission of the two data blocks. The baseband signal $z_k(t)$ transmitted by the $k$th user is given by

\[ z_k(t) = \sum_{i=0}^{N_c-1} x_k(i)\, \psi(t - iT_c), \tag{2.1} \]

where $\psi(t - iT_c)$ are symmetrical orthonormal chip waveforms (real valued), $T_c$ is the chip interval, and $x_k(i)$ can represent either the $i$th symbol of the training sequence (during the training interval, $N_c = N_m$) or the $i$th chip of the sequence of data symbols spread by the code $\mathbf{c}_k$ ($N_c = QN_s$). Each element of the chip sequence has $E[|x_k(i)|^2] = \sigma_x^2$.

The signal passes through the multipath propagation channel and arrives at the base station array. To simplify the discussion, the following assumptions are made:

i) Users can be represented by source points.
ii) The received signals are narrowband with respect to the carrier frequency, $f_c = c/\lambda_c$, so that the signal envelope remains essentially unchanged during the propagation across the array.
iii) The sources are in the same plane as the antenna array and in the far field, so that the propagating wave impinges on the array as a plane wave rather than a spherical wave.
iv) The antenna elements are identical and have the same response in any given direction.
In this case, the array response vector is parameterized by the carrier frequency and the relative delays across the antenna array, for any given array geometry. Under these assumptions, for a uniform linear antenna array (ULA) with spacing between sensors of $\lambda_c/2$, the array response vector for the direction $\vartheta$ is given by

\[ \mathbf{a}(\vartheta) = [1 \;\; \exp(-j\pi \sin\vartheta) \;\; \cdots \;\; \exp(-j\pi (M-1) \sin\vartheta)]^T, \tag{2.2} \]

where $j$ denotes the imaginary unit. The first antenna is taken as the reference point, and $\vartheta$ is the angle between the arriving signal and the normal to the array. Let the propagation for user $k$ be made of $P_k$ paths; the impulse response vector of the channel from the transmitter to the antenna array output can be expressed as

\[ \mathbf{f}_k(t) = [f_{k,1}(t) \;\cdots\; f_{k,M}(t)]^T = \sum_{p=1}^{P_k} \alpha_{k,p}(t)\, \mathbf{a}(\vartheta_{k,p}(t))\, \delta(t - \tau_{k,p}(t)), \tag{2.3} \]

where $\alpha_{k,p}(t)$, $\vartheta_{k,p}(t)$, $\tau_{k,p}(t)$ denote respectively the fading amplitude, the direction of arrival and the delay of the $p$th reflection, and $\mathbf{a}(\vartheta_{k,p}(t))$ is the $M \times 1$ array response for the direction of arrival $\vartheta_{k,p}(t)$. We assume that $\alpha_{k,p}(t)$, $\vartheta_{k,p}(t)$, $\tau_{k,p}(t)$ remain unchanged over the burst interval; therefore we suppress the time dependence and denote the parameters as $\alpha_{k,p}$, $\vartheta_{k,p}$, $\tau_{k,p}$. Perfect power control is assumed, so that all users have the same power, here normalized to unity. The composite signal received at the base station antenna array is then given by

\[ \mathbf{r}(t) = [r_1(t) \;\cdots\; r_M(t)]^T = \sum_{k=1}^{K} z_k(t) * \mathbf{f}_k(t) + \boldsymbol{\nu}(t), \tag{2.4} \]

where $\boldsymbol{\nu}(t) = [\nu_1(t) \;\cdots\; \nu_M(t)]^T$ is the additive noise that models both the ambient noise and the inter-cell interference. It is assumed Gaussian, with zero mean, temporally uncorrelated and spatially correlated (due to the interference). The continuous-time model is shown in Figure 2.2.

Figure 2.2: Baseband model for the received signals in the uplink of a K-user M-antenna system.

At each antenna the signal is first passed through a filter matched to the chip waveform $\psi(t)$ and then sampled at the chip rate. Let $W$ denote the maximum channel length expressed in chip intervals. By using (2.1) and (2.3) and sampling the signal (2.4) at the instants $t = iT_c$ with $i = 0, \ldots, N_c + W - 2$, the discrete-time signal reduces to

\[ \mathbf{y}(i) = \int \mathbf{r}(\tau)\, \psi(iT_c - \tau)\, d\tau = \sum_{k=1}^{K} \sum_{w=0}^{W-1} x_k(i-w) \sum_{p=1}^{P_k} \alpha_{k,p}\, \mathbf{a}(\vartheta_{k,p})\, g(wT_c - \tau_{k,p}) + \mathbf{n}(i), \tag{2.5} \]

where $\mathbf{n}(i)$ is the noise at the output of the sampled matched filter and $g(t) = \int \psi(\tau)\, \psi(t - \tau)\, d\tau$ is the convolution of the transmitted pulse with the receiver filter pulse.
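As an illustration of (2.2) and of the inner sum over paths in (2.5), the following minimal sketch builds the $M \times W$ matrix of channel taps for one user from a list of paths. It is a sketch under stated assumptions rather than the simulator used in the thesis: the function names, the two-path parameter set, and the choice of a raised-cosine shape with roll-off 0.22 for $g(t)$ are all assumptions made here for illustration.

```python
import cmath
import math

def ula_response(theta, M):
    """Array response (2.2): M-element ULA, half-wavelength spacing, angle in radians."""
    return [cmath.exp(-1j * math.pi * m * math.sin(theta)) for m in range(M)]

def raised_cosine(t, Tc=1.0, rolloff=0.22):
    """Assumed pulse autocorrelation g(t); the roll-off value is illustrative."""
    x = t / Tc
    if abs(x) < 1e-12:
        return 1.0
    sinc = math.sin(math.pi * x) / (math.pi * x)
    denom = 1.0 - (2.0 * rolloff * x) ** 2
    if abs(denom) < 1e-12:          # removable singularity at |t| = Tc / (2 * rolloff)
        return math.pi / 4.0 * sinc
    return sinc * math.cos(math.pi * rolloff * x) / denom

def channel_taps(paths, M, W, Tc=1.0):
    """Return the M x W taps sum_p alpha_p a(theta_p) g(w Tc - tau_p), w = 0..W-1."""
    H = [[0j] * W for _ in range(M)]
    for alpha, theta, tau in paths:
        a = ula_response(theta, M)
        for w in range(W):
            gain = alpha * raised_cosine(w * Tc - tau, Tc)
            for m in range(M):
                H[m][w] += gain * a[m]
    return H

# Hypothetical two-path channel: (amplitude, direction of arrival, delay in chips).
paths = [(1.0, 0.0, 0.0), (0.5j, math.radians(30.0), 2.0)]
H = channel_taps(paths, M=4, W=5)
```

Each path contributes the outer product of one array response vector and one sampled pulse, so the rank of the resulting matrix cannot exceed the number of paths.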
Now, let $\mathbf{h}_k(i)$ be the convolution $g(t) * \mathbf{f}_k(t)$ sampled at the time instant $t = iT_c$,

\[ \mathbf{h}_k(i) = \int \mathbf{f}_k(\tau)\, g(iT_c - \tau)\, d\tau; \tag{2.6} \]

$\mathbf{h}_k(i)$ represents the discrete-time channel vector that accounts for the array response, the path fading, the symbol waveform and the receiver filter:

\[ \mathbf{h}_k(i) = [h_{k,1}(i) \;\cdots\; h_{k,M}(i)]^T = \sum_{p=1}^{P_k} \alpha_{k,p}\, \mathbf{a}(\vartheta_{k,p})\, g(iT_c - \tau_{k,p}). \tag{2.7} \]

The received signal then simplifies to

\[ \mathbf{y}(i) = [y_1(i) \;\cdots\; y_M(i)]^T = \sum_{k=1}^{K} \mathbf{h}_k(i) * x_k(i) + \mathbf{n}(i), \tag{2.8} \]

where the discrete-time noise $\mathbf{n}(i) = \int \boldsymbol{\nu}(\tau)\, \psi(iT_c - \tau)\, d\tau$ is Gaussian, still temporally uncorrelated, and spatially correlated with covariance $E[\mathbf{n}(i)\mathbf{n}^H(i)] = \mathbf{R}_n$.

Figure 2.3: Equivalent discrete time model for the uplink of a K-user M-antenna system.

2.4 Signal model for channel estimation

During the training period each mobile user transmits a sequence of $N_m$ known symbols $\{x_k(i)\}_{i=0}^{N_m-1}$ to allow the estimation of the propagation channels. Let $\mathbf{H}_k$ be the $M \times W$ space-time channel matrix

\[ \mathbf{H}_k = [\mathbf{h}_k(0) \;\cdots\; \mathbf{h}_k(W-1)] \tag{2.9} \]

for the $k$th user. The $m$th row of $\mathbf{H}_k$, here denoted by $\mathbf{h}_{k,m}$, contains the channel impulse response for the link between the mobile user and the $m$th antenna: $\mathbf{H}_k = [\mathbf{h}_{k,1} \;\cdots\; \mathbf{h}_{k,M}]^T$. Below, the discrete-time model (2.8) is rewritten into the conventional form used for channel estimation. The model is also shown in Figure 2.3. The received signal simplifies to

\[ \mathbf{y}(i) = \sum_{k=1}^{K} \sum_{w=0}^{W-1} \mathbf{h}_k(w)\, x_k(i-w) + \mathbf{n}(i) = \sum_{k=1}^{K} \mathbf{H}_k \mathbf{x}_k(i) + \mathbf{n}(i), \tag{2.10} \]

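where $\mathbf{x}_k(i)$ is the column vector containing the delayed transmitted symbols for user $k$:

\[ \mathbf{x}_k(i) = [x_k(i) \;\cdots\; x_k(i-W+1)]^T. \tag{2.11} \]

We may observe that, as the channel length is $W > 1$, inter-symbol interference occurs. The first and the last $W-1$ symbols of the received sequence cannot be used for channel estimation, as they depend on the data transmitted before and after the training period. Hence, by discarding these time samples and arranging the remaining $N = N_m - W + 1$ samples into a single matrix $\mathbf{Y}$ of dimensions $M \times N$, the multi-user data model can be written by using the standard notation

\[ \mathbf{Y} = \sum_{k=1}^{K} \mathbf{H}_k \mathbf{X}_k + \mathbf{N} = \mathbf{H}\mathbf{X} + \mathbf{N}, \tag{2.12} \]

where $\mathbf{Y}$, $\mathbf{X}_k$ and $\mathbf{N}$ are defined as

\[ \mathbf{Y} = [\mathbf{y}(W) \;\cdots\; \mathbf{y}(W+N-1)], \tag{2.13} \]
\[ \mathbf{X}_k = [\mathbf{x}_k(W) \;\cdots\; \mathbf{x}_k(W+N-1)], \tag{2.14} \]
\[ \mathbf{N} = [\mathbf{n}(W) \;\cdots\; \mathbf{n}(W+N-1)]. \tag{2.15} \]

Note that $\mathbf{X}_k$ is a $W \times N$ Toeplitz matrix. Finally, by collecting the $K$ channels and the $K$ training sequences into the multi-user matrices

\[ \mathbf{H} = [\mathbf{H}_1 \;\cdots\; \mathbf{H}_K], \tag{2.16} \]
\[ \mathbf{X} = [\mathbf{X}_1^T \;\cdots\; \mathbf{X}_K^T]^T, \tag{2.17} \]

the data model reduces to the more compact form

\[ \mathbf{Y} = \mathbf{H}\mathbf{X} + \mathbf{N}. \tag{2.18} \]

The problem that will be addressed in the next section is the estimation of the multi-user channel $\mathbf{H}$ and of the spatial covariance of the noise $\mathbf{R}_n$ from the received signals $\mathbf{Y}$ and the known training sequences $\mathbf{X}$.

2.5 Unconstrained channel estimation

The unconstrained maximum likelihood estimate of the compound channel $\mathbf{H}$ and of the covariance matrix $\mathbf{R}_n$ is obtained as [67]

\[ \{\hat{\mathbf{H}}_{FR}, \hat{\mathbf{R}}_n\} = \arg\min_{\mathbf{H}, \mathbf{R}_n} \left\{ \log|\mathbf{R}_n| + \frac{1}{N} \|\mathbf{Y} - \mathbf{H}\mathbf{X}\|^2_{\mathbf{R}_n^{-1}} \right\}, \tag{2.19} \]

where $\|\mathbf{Y} - \mathbf{H}\mathbf{X}\|^2_{\mathbf{R}_n^{-1}} = \mathrm{tr}\{(\mathbf{Y} - \mathbf{H}\mathbf{X})^H \mathbf{R}_n^{-1} (\mathbf{Y} - \mathbf{H}\mathbf{X})\}$. The minimization (2.19) leads to the least squares (LS) solution

\[ \hat{\mathbf{H}}_{FR} = [\hat{\mathbf{H}}_{FR,1} \;\cdots\; \hat{\mathbf{H}}_{FR,K}] = \mathbf{Y}\mathbf{X}^H (\mathbf{X}\mathbf{X}^H)^{-1} = \hat{\mathbf{R}}_{yx} \mathbf{R}_{xx}^{-1}, \tag{2.20a} \]
\[ \hat{\mathbf{R}}_n = \frac{1}{N} (\mathbf{Y} - \hat{\mathbf{H}}_{FR}\mathbf{X})(\mathbf{Y} - \hat{\mathbf{H}}_{FR}\mathbf{X})^H = \hat{\mathbf{R}}_{yy} - \hat{\mathbf{R}}_{yx} \mathbf{R}_{xx}^{-1} \hat{\mathbf{R}}_{xy}, \tag{2.20b} \]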

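where the following sample covariance matrices have been introduced:

\[ \hat{\mathbf{R}}_{yx} = \frac{1}{N} \mathbf{Y}\mathbf{X}^H = \frac{1}{N} \sum_{i=W}^{N+W-1} \mathbf{y}(i)\, \mathbf{x}^H(i), \tag{2.21} \]
\[ \mathbf{R}_{xx} = \frac{1}{N} \mathbf{X}\mathbf{X}^H = \frac{1}{N} \sum_{i=W}^{N+W-1} \mathbf{x}(i)\, \mathbf{x}^H(i), \tag{2.22} \]
\[ \hat{\mathbf{R}}_{yy} = \frac{1}{N} \mathbf{Y}\mathbf{Y}^H = \frac{1}{N} \sum_{i=W}^{N+W-1} \mathbf{y}(i)\, \mathbf{y}^H(i). \tag{2.23} \]

Equation (2.20b) shows that the covariance of the noise is obtained from the residuals of the channel estimate, $\hat{\mathbf{N}} = \mathbf{Y} - \hat{\mathbf{H}}_{FR}\mathbf{X}$. In the following, the least squares estimate will be referred to as the unconstrained or full rank (FR) estimate.

Performance Analysis

The error of the least squares channel estimate (2.20a) can be written as

\[ \Delta\mathbf{H}_{FR} = \hat{\mathbf{H}}_{FR} - \mathbf{H} = \mathbf{N}\mathbf{X}^H (\mathbf{X}\mathbf{X}^H)^{-1}. \tag{2.24} \]

Let us define the vectors $\Delta\mathbf{h}_{FR} = \mathrm{vec}(\Delta\mathbf{H}_{FR})$ and $\mathbf{n} = \mathrm{vec}(\mathbf{N})$, and observe that, since $\mathbf{N}$ collects $N$ temporally uncorrelated noise snapshots, the covariance of the noise vector $\mathbf{n}$ is

\[ E[\mathbf{n}\mathbf{n}^H] = \mathbf{I}_N \otimes \mathbf{R}_n. \tag{2.25} \]

By using the property $\mathrm{vec}(\mathbf{A}\mathbf{Y}\mathbf{B}) = (\mathbf{B}^T \otimes \mathbf{A})\, \mathrm{vec}(\mathbf{Y})$ [25], from (2.24) the error reduces to

\[ \Delta\mathbf{h}_{FR} = \left[ \left( (\mathbf{X}\mathbf{X}^H)^{-1}\mathbf{X} \right)^* \otimes \mathbf{I}_M \right] \mathbf{n}. \tag{2.26} \]

Then, the mean square error (MSE) of the unconstrained channel estimate can be easily derived as

\[ \begin{aligned} E[\|\Delta\mathbf{H}_{FR}\|^2] &= E[\|\Delta\mathbf{h}_{FR}\|^2] = \mathrm{tr}\left[ E(\Delta\mathbf{h}_{FR} \Delta\mathbf{h}_{FR}^H) \right] \\ &= \mathrm{tr}\left\{ \left[ \left( (\mathbf{X}\mathbf{X}^H)^{-1}\mathbf{X} \right)^* \otimes \mathbf{I}_M \right] E[\mathbf{n}\mathbf{n}^H] \left[ \left( \mathbf{X}^H (\mathbf{X}\mathbf{X}^H)^{-1} \right)^* \otimes \mathbf{I}_M \right] \right\} \\ &= \mathrm{tr}\left\{ \left( (\mathbf{X}\mathbf{X}^H)^{-1} \mathbf{X}\mathbf{X}^H (\mathbf{X}\mathbf{X}^H)^{-1} \right)^* \otimes \mathbf{R}_n \right\} = \frac{1}{N} \mathrm{tr}\left( \mathbf{R}_{xx}^{-1} \otimes \mathbf{R}_n \right) \\ &= \frac{1}{N} \mathrm{tr}(\mathbf{R}_{xx}^{-1})\, \mathrm{tr}(\mathbf{R}_n) = \frac{\sigma_n^2 M}{N} \mathrm{tr}(\mathbf{R}_{xx}^{-1}). \end{aligned} \tag{2.27} \]

If the training sequences are uncorrelated, in (2.27) we can set $\mathbf{R}_{xx} = \sigma_x^2 \mathbf{I}_{KW}$ and write the MSE as

\[ E[\|\Delta\mathbf{H}_{FR}\|^2] = \frac{\sigma_n^2}{\sigma_x^2} \frac{KMW}{N}. \tag{2.28} \]

This highlights that the accuracy of the estimate depends on the ratio between the total number of unknown coefficients ($KMW$) and the training length ($N$). In practice the training sequences are never uncorrelated; therefore multi-access interference arises and causes noise enhancement at the output of the channel estimator. By comparing (2.27) with the ideal case (2.28), it is easy to see that the MSE is increased by a factor

\[ d_{dB} = 10 \log_{10} \left[ \frac{\sigma_x^2}{KW} \mathrm{tr}(\mathbf{R}_{xx}^{-1}) \right]. \tag{2.29} \]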
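The least squares estimate (2.20a) and the degradation factor (2.29) can be sketched numerically as follows. This is a minimal illustration, not the thesis simulation: the helper names are hypothetical, the training sequences are random binary sequences chosen only for the example (they are not the UTRA-TDD midambles), columns are indexed from 0, and the small Gauss-Jordan inverse is used only to keep the sketch free of external libraries.

```python
import math
import random

def training_matrix(x, W):
    """W x N Toeplitz matrix with columns [x(i), ..., x(i-W+1)]^T, 0-based i = W-1..len(x)-1."""
    N = len(x) - W + 1
    return [[x[(W - 1 + n) - w] for n in range(N)] for w in range(W)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def herm(A):
    """Conjugate transpose."""
    return [[complex(v).conjugate() for v in col] for col in zip(*A)]

def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (adequate for small, well-conditioned matrices)."""
    n = len(A)
    aug = [[complex(v) for v in row] + [1.0 + 0j if i == j else 0j for j in range(n)]
           for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def ls_channel(Y, X):
    """Unconstrained LS estimate (2.20a): Y X^H (X X^H)^-1."""
    XH = herm(X)
    return matmul(matmul(Y, XH), inverse(matmul(X, XH)))

def degradation_db(X, sigma_x2=1.0):
    """Noise enhancement (2.29); len(X) equals K*W for the stacked training matrix."""
    N = len(X[0])
    Rxx = [[v / N for v in row] for row in matmul(X, herm(X))]
    Rinv = inverse(Rxx)
    tr_inv = sum(Rinv[i][i].real for i in range(len(Rinv)))
    return 10.0 * math.log10(sigma_x2 * tr_inv / len(X))

random.seed(0)
K, M, W, Nm = 2, 3, 4, 64                     # illustrative sizes
X = []
for _ in range(K):
    x = [random.choice((-1.0, 1.0)) for _ in range(Nm)]
    X += training_matrix(x, W)                # stack X_1, ..., X_K as in (2.17)
H = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(K * W)] for _ in range(M)]
Y = matmul(H, X)                              # noiseless received block, Y = H X
H_hat = ls_channel(Y, X)
err = max(abs(H_hat[m][c] - H[m][c]) for m in range(M) for c in range(K * W))
d_db = degradation_db(X)
```

In the noiseless case the estimate reproduces the channel up to rounding, while `d_db` quantifies how far the chosen sequences are from the uncorrelated case of (2.28).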

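Appendix B gives the typical degradation of performance for UTRA-TDD training sequences. In the following we further investigate the effect of the multi-access interference by considering reciprocally equi-correlated training sequences (see also [66]). Let $\mathbf{R}_t$ be the correlation matrix of each training sequence (including the power $\sigma_x^2$) and $\rho_u$ the correlation coefficient between sequences of different users, $0 \le \rho_u < 1$. The correlation matrix $\mathbf{R}_{xx}$ can be written as

\[ \mathbf{R}_{xx} = \begin{bmatrix} \mathbf{R}_{x_1x_1} & \mathbf{R}_{x_1x_2} & \cdots & \mathbf{R}_{x_1x_K} \\ \mathbf{R}_{x_2x_1} & \mathbf{R}_{x_2x_2} & \cdots & \mathbf{R}_{x_2x_K} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{R}_{x_Kx_1} & \mathbf{R}_{x_Kx_2} & \cdots & \mathbf{R}_{x_Kx_K} \end{bmatrix} = \begin{bmatrix} \mathbf{R}_t & \rho_u\mathbf{R}_t & \cdots & \rho_u\mathbf{R}_t \\ \rho_u\mathbf{R}_t & \mathbf{R}_t & \cdots & \rho_u\mathbf{R}_t \\ \vdots & \vdots & \ddots & \vdots \\ \rho_u\mathbf{R}_t & \rho_u\mathbf{R}_t & \cdots & \mathbf{R}_t \end{bmatrix}. \tag{2.30} \]

By defining the following $K \times K$ matrix:

\[ \mathbf{R}_u = \begin{bmatrix} 1 & \rho_u & \cdots & \rho_u \\ \rho_u & 1 & \cdots & \rho_u \\ \vdots & \vdots & \ddots & \vdots \\ \rho_u & \rho_u & \cdots & 1 \end{bmatrix} = \rho_u \mathbf{1}_K \mathbf{1}_K^T + (1 - \rho_u)\mathbf{I}_K, \tag{2.31} \]

where $\mathbf{1}_K$ is the $K \times 1$ vector of ones, the training correlation matrix reduces to $\mathbf{R}_{xx} = \mathbf{R}_u \otimes \mathbf{R}_t$. Then, by using (2.27), the MSE can be easily obtained as

\[ E[\|\Delta\mathbf{H}_{FR}\|^2] = \frac{\sigma_n^2 M}{N} \mathrm{tr}(\mathbf{R}_u^{-1})\, \mathrm{tr}(\mathbf{R}_t^{-1}). \tag{2.32} \]

The inverse matrix in (2.32) can be calculated as [20]

\[ \mathbf{R}_u^{-1} = \frac{1}{1 - \rho_u} \left[ \mathbf{I}_K - \frac{\rho_u}{1 + \rho_u(K-1)} \mathbf{1}_K \mathbf{1}_K^T \right], \tag{2.33} \]

and its trace is given by

\[ \mathrm{tr}(\mathbf{R}_u^{-1}) = K \frac{1 + \rho_u(K-2)}{(1 - \rho_u)(1 + \rho_u(K-1))}. \tag{2.34} \]

From (2.32) and (2.34), by assuming $K \gg 1$, it follows that

\[ E[\|\Delta\mathbf{H}_{FR}\|^2] = \frac{\sigma_n^2 M}{N} K \frac{1 + \rho_u(K-2)}{(1 - \rho_u)(1 + \rho_u(K-1))} \mathrm{tr}(\mathbf{R}_t^{-1}) \approx \frac{\sigma_n^2}{N} MK \frac{1}{1 - \rho_u} \mathrm{tr}(\mathbf{R}_t^{-1}). \tag{2.35} \]

The MSE (2.35) for reciprocally equi-correlated training sequences can be interpreted as a degradation in terms of the SNR by a factor $D = 1/(1 - \rho_u)$ with respect to the single user case ($\rho_u = 0$). When the training sequence of each user is uncorrelated, so that $\mathbf{R}_t = \sigma_x^2 \mathbf{I}_W$, the latter equation reduces to

\[ E[\|\Delta\mathbf{H}_{FR}\|^2] \approx \frac{\sigma_n^2}{\sigma_x^2 N} \frac{MWK}{1 - \rho_u}. \tag{2.36} \]

Again, by comparing this relation with (2.28), it is evident that the normalized mean square error is increased by $D$. This degradation of performance is the effect of the multiple access interference and can be interpreted as an increase in the number of equivalent orthogonal users. In other words, for a system of $K$ correlated users with correlation $\rho_u$, the performance of the least squares channel estimate is the same as that of a system with $K/(1 - \rho_u)$ orthogonal users.

EXAMPLE 2.1

We evaluate the performance of the LS estimate for $\rho_u \in \{0, 0.1, 0.3, 0.5, 0.7, 0.9\}$, $M = 8$, $K = 4$, $W = 20$, $N = 500$, $E[\|\mathbf{H}\|^2] = 1$. We make the further assumption that the training sequence of each user is a first order autoregressive process (AR(1)) with coefficient $\rho_t = 0.1$, that is

\[ \mathbf{R}_t = \sigma_x^2 \begin{bmatrix} 1 & \rho_t & \cdots & \rho_t^{W-1} \\ \rho_t & 1 & \cdots & \rho_t^{W-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_t^{W-1} & \rho_t^{W-2} & \cdots & 1 \end{bmatrix}. \tag{2.37} \]

The inverse matrix is given by [20]

\[ \mathbf{R}_t^{-1} = \frac{1}{\sigma_x^2} \frac{1}{1 - \rho_t^2} \begin{bmatrix} 1 & -\rho_t & 0 & \cdots & 0 \\ -\rho_t & 1 + \rho_t^2 & -\rho_t & \ddots & \vdots \\ 0 & \ddots & \ddots & \ddots & 0 \\ \vdots & \ddots & -\rho_t & 1 + \rho_t^2 & -\rho_t \\ 0 & \cdots & 0 & -\rho_t & 1 \end{bmatrix}, \tag{2.38} \]

and it follows that

\[ \mathrm{tr}(\mathbf{R}_t^{-1}) = \frac{1}{\sigma_x^2} \frac{W(1 + \rho_t^2) - 2\rho_t^2}{1 - \rho_t^2}. \tag{2.39} \]

The analytical mean square error (2.35) is compared with the value obtained by averaging $\|\Delta\mathbf{H}\|^2$ over 100 independent runs of channel and noise. The result is shown in Figure 2.4. The solid lines represent the simulated MSE, and the circles are the analytical values. According to (2.35), for $\rho_u = 0.9$ the degradation of performance with respect to the ideal case ($\rho_u = 0$) is equal to $1/(1 - \rho_u) = 10$ dB.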
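The closed forms (2.34) and (2.39) make the analytical curve of Example 2.1 easy to evaluate. The sketch below is illustrative only: the function names are hypothetical, $\sigma_x^2$ and $\sigma_n^2$ are set to one, and the degradation is reported both with the exact trace (2.34) and with the $K \gg 1$ approximation $1/(1 - \rho_u)$ of (2.35).

```python
import math

def trace_inv_Ru(K, rho_u):
    """Closed form (2.34) for the equi-correlated user matrix R_u."""
    return K * (1 + rho_u * (K - 2)) / ((1 - rho_u) * (1 + rho_u * (K - 1)))

def trace_inv_Rt(W, rho_t, sigma_x2=1.0):
    """Closed form (2.39) for an AR(1) training correlation matrix R_t."""
    return (W * (1 + rho_t ** 2) - 2 * rho_t ** 2) / (sigma_x2 * (1 - rho_t ** 2))

def mse_ls(K, M, W, N, rho_u, rho_t, sigma_n2, sigma_x2=1.0):
    """MSE (2.32): sigma_n^2 M / N * tr(R_u^-1) * tr(R_t^-1)."""
    return sigma_n2 * M / N * trace_inv_Ru(K, rho_u) * trace_inv_Rt(W, rho_t, sigma_x2)

# Parameters of Example 2.1 (unit signal and noise power assumed here).
K, M, W, N, rho_t = 4, 8, 20, 500, 0.1
reference = mse_ls(K, M, W, N, 0.0, rho_t, 1.0)
for rho_u in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9):
    exact_db = 10 * math.log10(mse_ls(K, M, W, N, rho_u, rho_t, 1.0) / reference)
    approx_db = 10 * math.log10(1.0 / (1.0 - rho_u))
    print(f"rho_u={rho_u:.1f}  exact {exact_db:5.2f} dB  K>>1 approximation {approx_db:5.2f} dB")
```

Because the factor $(1 + \rho_u(K-2))/(1 + \rho_u(K-1))$ is smaller than one for $\rho_u > 0$, the exact degradation for a small number of users lies somewhat below the large-$K$ value.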

Figure 2.4: Performance of the least squares estimate for varying $\rho_u$.

2.6 Signal model for data detection

In the following we derive the discrete-time baseband model for the signal received by the $M$ antennas during the transmission of the data packet by the $K$ users. We consider frequency selective channel fading and assume that the coherence time is much larger than the packet duration, so that channel variations can be neglected in the block-detection process. Moreover, perfect power control is assumed; near-far effects are not considered in this discussion.

The $k$th user transmits a sequence of QPSK symbols, $\{d_k(j)\}_{j=0}^{N_s-1}$. The symbol $d_k(j)$ is transmitted during the time period $[jT_s, (j+1)T_s]$ spread by the delayed signaling waveform $c_k(t - jT_s)$. The signaling waveform $c_k(t)$ is defined as

\[ c_k(t) = \sum_{q=0}^{Q-1} c_k(q)\, \psi(t - qT_c), \tag{2.40} \]

where $\mathbf{c}_k = [c_k(0) \;\cdots\; c_k(Q-1)]$ is the complex valued spreading code assigned to user $k$. We recall that $T_s$ denotes the symbol interval, $T_c$ is the chip interval, $Q = T_s/T_c$ is the spreading factor, and $\psi(t - qT_c)$ are the symmetrical orthonormal chip waveforms.

The transmitted baseband signal for user $k$ is obtained from (2.1) by rewriting $x_k(i)$ as the sequence of the symbols $d_k(j)$ spread by $c_k(t - jT_s)$ for $j = 0, \ldots, N_s - 1$:

\[ z_k(t) = \sum_{j=0}^{N_s-1} d_k(j)\, c_k(t - jT_s) = \sum_{j=0}^{N_s-1} d_k(j) \sum_{q=0}^{Q-1} c_k(q)\, \psi(t - jQT_c - qT_c). \tag{2.41} \]

We make the further assumption that each user transmits independent equiprobable symbols and that symbol sequences from different users are independent.

The signal received at the base station antenna array is given by the superposition of the signals from the $K$ users plus the additive noise:

\[ \mathbf{r}(t) = [r_1(t) \;\cdots\; r_M(t)]^T = \sum_{k=1}^{K} z_k(t) * \mathbf{f}_k(t) + \boldsymbol{\nu}(t) = \sum_{k=1}^{K} \sum_{j=0}^{N_s-1} d_k(j) \sum_{q=0}^{Q-1} c_k(q)\, \psi(t - jQT_c - qT_c) * \mathbf{f}_k(t) + \boldsymbol{\nu}(t), \tag{2.42} \]

where $\mathbf{f}_k(t) = [f_{k,1}(t) \;\cdots\; f_{k,M}(t)]^T$ is the vector channel impulse response for the $k$th user defined in (2.3), and $\boldsymbol{\nu}(t) = [\nu_1(t) \;\cdots\; \nu_M(t)]^T$ is the noise vector modelling thermal noise plus interference. $\boldsymbol{\nu}(t)$ is assumed to be Gaussian, with zero mean, temporally white, with one sided power spectral density $\sigma_n^2$.

At each receiver antenna the signal $\mathbf{r}(t)$ is passed through the chip matched filter and sampled at the chip rate at the time instants $t = iT_c$, for $i = 0, \ldots, N_sQ + W - 2$,

\[ \mathbf{y}(i) = \int \mathbf{r}(\tau)\, \psi(iT_c - \tau)\, d\tau, \tag{2.43} \]

where $W$ is the channel length expressed in chip intervals. Insertion of (2.42) in (2.43) leads to the discrete-time received signal:

\[ \mathbf{y}(i) = \sum_{k=1}^{K} \sum_{j=0}^{N_s-1} d_k(j) \left[ \sum_{q=0}^{Q-1} c_k(q)\, \mathbf{h}_k(i - jQ - q) \right] + \mathbf{n}(i), \tag{2.44} \]

where $\mathbf{h}_k(i)$ is the discrete-time channel vector defined in (2.7), which accounts for the effects of the pulse shaping, the multipath propagation and the matched filter. The noise $\mathbf{n}(i)$ is complex Gaussian, with zero mean. It is assumed to be spatially correlated with covariance $\mathbf{R}_n$. Owing to the orthonormality of the waveforms $\psi(iT_c - t)$, the noise is temporally white and has variance $\sigma_n^2 = [\mathbf{R}_n]_{m,m}$ at each antenna, for $m = 1, \ldots, M$. The discrete-time model considered so far is shown in Figure 2.5.

The model (2.44) can be simplified by observing that the term within brackets is the convolution between the channel impulse response and the spreading code, that is:

\[ \mathbf{s}_k(i) = [s_{k,1}(i) \;\cdots\; s_{k,M}(i)]^T = c_k(i) * \mathbf{h}_k(i). \tag{2.45} \]

$\mathbf{s}_k(i)$ represents the equivalent signature vector that includes both the signalling waveform and the propagation effects.
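The signature (2.45) is a plain discrete convolution, carried out antenna by antenna. A minimal sketch follows; the function names, the length-4 code and the $2 \times 3$ channel matrix are hypothetical values chosen only to make the example concrete.

```python
def convolve(c, h):
    """Full discrete convolution of two finite sequences."""
    out = [0j] * (len(c) + len(h) - 1)
    for q, cq in enumerate(c):
        for w, hw in enumerate(h):
            out[q + w] += cq * hw
    return out

def signature_vectors(code, H_k):
    """One signature per antenna: the code convolved with each row of the M x W matrix H_k."""
    return [convolve(code, row) for row in H_k]

code = [1, -1, 1, 1]                               # hypothetical spreading code, Q = 4
H_k = [[1.0, 0.5j, 0.25], [0.8, -0.4, 0.1j]]       # hypothetical channel, M = 2, W = 3
s_k = signature_vectors(code, H_k)                 # each signature has Q + W - 1 = 6 chips
```

With a single-tap channel ($W = 1$) the signature collapses to the spreading code scaled by the channel gain, which is the ISI-free case.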
Each signature $s_{k,m}(i)$ is composed of $Q + W - 1$ chips, corresponding to $L + 1$ symbol intervals, where $L$ denotes the ISI length:

\[ L = \lfloor (Q + W - 1)/Q \rfloor. \tag{2.46} \]

As $\mathbf{s}_k(i)$ has finite length, we can remove zero terms from the sum (2.44) and obtain the baseband discrete-time array response model:

\[ \mathbf{y}(i) = \sum_{k=1}^{K} \sum_{j=\lfloor i/Q \rfloor - L}^{\lfloor i/Q \rfloor + L} d_k(j)\, \mathbf{s}_k(i - jQ) + \mathbf{n}(i), \tag{2.47} \]

or, equivalently, for each antenna:

\[ \begin{aligned} y_1(i) &= \sum_{k=1}^{K} \sum_{j=\lfloor i/Q \rfloor - L}^{\lfloor i/Q \rfloor + L} d_k(j)\, s_{k,1}(i - jQ) + n_1(i) \\ &\;\;\vdots \\ y_M(i) &= \sum_{k=1}^{K} \sum_{j=\lfloor i/Q \rfloor - L}^{\lfloor i/Q \rfloor + L} d_k(j)\, s_{k,M}(i - jQ) + n_M(i). \end{aligned} \tag{2.48} \]

Here $j$ is a running symbol index, whereas $i$ denotes the chip number. The discrete model is shown in Figure 2.5. The signals $\{y_m(i)\}$ constitute a sufficient statistic for the detection of the data symbols $\{d_k(j)\}$ [85].

Figure 2.5: Equivalent discrete time model for the uplink of a CDMA system with K users and M antennas.

Let us define the $(Q + W - 1) \times 1$ vector

\[ \mathbf{s}_{k,m} = [s_{k,m}(0) \;\cdots\; s_{k,m}(Q + W - 2)]^T, \tag{2.49} \]

that collects the non-zero entries $s_{k,m}(i)$ (see Figure 2.6). The signatures of all users for the $m$th antenna can be arranged into the $(Q + W - 1) \times K$ matrix

\[ \mathbf{S}_m = [\mathbf{s}_{1,m} \;\cdots\; \mathbf{s}_{K,m}], \tag{2.50} \]

while the vector signatures for the antenna array are given by

\[ \mathbf{S} = [\mathbf{S}_1^T \;\cdots\; \mathbf{S}_M^T]^T. \tag{2.51} \]

In general, due to multipath propagation the channel is frequency selective ($L > 1$). The received signals (2.47) will then suffer from self-interference (ISI) from consecutive symbols and multi-access interference (MAI) between symbols of different users (even if orthogonal spreading codes are used). Intersymbol interference can be ignored when $T_s$ is much greater than the delay spread, typically in long code CDMA. However, the effects become severe when $T_s$ is comparable with the duration of the channel impulse response. This occurs for instance in the UMTS-TDD system (short code CDMA): for $Q = 16$ and a typical urban delay spread (5 µs) it is $L = 2$.

When the signals are affected by ISI, the optimum decision on the $j$th data symbol cannot be obtained from the observation of the single $j$th snapshot of the received signal (2.47). In order to properly handle ISI and MAI, the observation of the whole received burst of $N_sQ + W - 1$ samples is required. Hence, below we collect the $N_sQ + W - 1$ samples of the data received by the $m$th antenna into the vector

\[ \mathbf{y}_m = [y_m(0) \;\cdots\; y_m(N_sQ + W - 2)]^T, \tag{2.52} \]

that is the sum of the $K$ user sequences and the noise sequence

\[ \mathbf{n}_m = [n_m(0) \;\cdots\; n_m(N_sQ + W - 2)]^T. \tag{2.53} \]

Moreover, we arrange the $KN_s$ data symbols $\{d_k(j)\}$ of all the users into the vector $\mathbf{d}$ of dimensions $KN_s \times 1$, and the corresponding $KN_s$ signatures $\{s_{k,m}(i - jQ)\}$ into the $(N_sQ + W - 1) \times KN_s$ matrix $\boldsymbol{\mathcal{S}}_m$. The symbols are ordered first by user and then by symbol interval, so that, for any $k = 1, \ldots, K$ and $j = 0, \ldots, N_s - 1$, the entry $k + jK$ of the data vector $\mathbf{d}$ represents the $j$th symbol for the $k$th user,

\[ [\mathbf{d}]_{k+jK} = d_k(j), \tag{2.54} \]

and the corresponding signature (waveform $s_{k,m}(i)$ delayed by $jQ$) is contained in the $(k + jK)$th column of the matrix $\boldsymbol{\mathcal{S}}_m$:

\[ [\boldsymbol{\mathcal{S}}_m]_{i,k+jK} = s_{k,m}(i - jQ). \tag{2.55} \]

The received sequence $\mathbf{y}_m$ can now be written as

\[ \mathbf{y}_m = \boldsymbol{\mathcal{S}}_m \mathbf{d} + \mathbf{n}_m, \tag{2.56} \]

where $\boldsymbol{\mathcal{S}}_m$ is the $(N_sQ + W - 1) \times KN_s$ matrix containing the signature sequences for all the users and the symbol intervals. The model (2.56) represents the equivalent $KN_s$-user synchronous system of the original ISI affected $K$-user $N_s$-symbol system. For the ensemble of $M$ sensors, the final form of the sufficient statistic $\mathbf{y} = [\mathbf{y}_1^T, \ldots, \mathbf{y}_M^T]^T$ is given by

\[ \mathbf{y} = \boldsymbol{\mathcal{S}} \mathbf{d} + \mathbf{n}, \tag{2.57} \]

where $\boldsymbol{\mathcal{S}} = [\boldsymbol{\mathcal{S}}_1^T, \ldots, \boldsymbol{\mathcal{S}}_M^T]^T$ is the overall system matrix of dimension $M(N_sQ + W - 1) \times N_sK$, and $\mathbf{n} = [\mathbf{n}_1^T \;\cdots\; \mathbf{n}_M^T]^T$.
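The covariance matrix of the noise is decoupled into spatial and temporal correlation as

$$\mathrm{E}\left[\mathbf{n}\mathbf{n}^H\right]=\mathbf{R}_n\otimes\mathbf{I}_{N_sQ+W-1}, \tag{2.58}$$

where $\mathbf{R}_n$ on the right-hand side is the $M\times M$ spatial covariance matrix and $\otimes$ denotes the Kronecker product. The structure of the matrices $\mathbf{S}_m$ and $\mathbf{S}$ of (2.50)-(2.51) and of the block matrix $\mathbf{S}$ of (2.57) is given in Figure 2.6. If the channel is not frequency selective ($W=1$), then consecutive symbols are orthogonal and $\mathbf{y}(i)$ is a sufficient statistic for the estimation of $\mathbf{d}(i)=[d_1(i)\ \cdots\ d_K(i)]^T$. When this occurs, the detection for the ISI-free synchronous model can be based on the single shot model

$$\mathbf{y}(i)=\mathbf{S}\mathbf{d}(i)+\mathbf{n}(i), \tag{2.59}$$

with $\mathbf{S}$ the signature matrix of (2.51). In the following we will consider several optimal and suboptimal block detectors for the ISI-affected system (2.57) based on the sufficient statistic $\mathbf{y}$. The corresponding detectors for the single shot case (2.59) can be obtained by replacing $\mathbf{y}$ with $\mathbf{y}(i)$, the block matrix $\mathbf{S}$ of (2.57) with the matrix $\mathbf{S}$ of (2.51), and $\mathbf{d}$ with $\mathbf{d}(i)$.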

Figure 2.6: Signature matrices for the $K$-user $M$-antenna CDMA system (panels: signatures for antenna $m$; ISI-free system, $W=1$; ISI-affected system, $W>1$).

2.7 Conventional detector: single user matched filter

The conventional receiver in CDMA systems is the single user space-time matched filter. The data symbols are estimated from the output of the matched filter

$$\mathbf{y}_{MF}=\mathbf{S}^H\mathbf{R}_n^{-1}\mathbf{y}=\left(\mathbf{R}_n^{-H/2}\mathbf{S}\right)^H\mathbf{R}_n^{-H/2}\mathbf{y} \tag{2.60}$$

as $\hat{\mathbf{d}}=\mathrm{dec}(\mathbf{y}_{MF})$, where $\mathrm{dec}(\cdot)$ denotes the hard detector for a QPSK constellation. As highlighted in (2.60), the receiver can be decomposed into two operations. The purpose of the first operation is the spatial decorrelation of the background noise:

$$\tilde{\mathbf{y}}=\mathbf{R}_n^{-H/2}\mathbf{y}=\mathbf{R}_n^{-H/2}\mathbf{S}\mathbf{d}+\mathbf{R}_n^{-H/2}\mathbf{n}=\tilde{\mathbf{S}}\mathbf{d}+\tilde{\mathbf{n}}, \tag{2.61}$$

where $\tilde{\mathbf{S}}=\mathbf{R}_n^{-H/2}\mathbf{S}$ denotes the matrix of the whitened space-time signatures and $\tilde{\mathbf{n}}=\mathbf{R}_n^{-H/2}\mathbf{n}$ is the output noise with covariance $\mathrm{E}[\tilde{\mathbf{n}}\tilde{\mathbf{n}}^H]=\mathbf{I}$. The second operation in (2.60) is the filter matched to the whitened signatures:

$$\mathbf{y}_{MF}=\tilde{\mathbf{S}}^H\tilde{\mathbf{y}}=\mathbf{R}\mathbf{d}+\mathbf{n}_{MF}, \tag{2.62}$$

where $\mathbf{n}_{MF}=\mathbf{S}^H\mathbf{R}_n^{-1}\mathbf{n}$ is the Gaussian noise with covariance $\mathrm{E}(\mathbf{n}_{MF}\mathbf{n}_{MF}^H)=\mathbf{R}$, and $\mathbf{R}$ denotes the $N_sK\times N_sK$ correlation matrix

$$
\mathbf{R}=\mathbf{S}^H\mathbf{R}_n^{-1}\mathbf{S}=
\begin{bmatrix}
\mathbf{R}(0) & \mathbf{R}(1) & \cdots & \mathbf{R}(\ ) & \mathbf{0} & \mathbf{0}\\
\mathbf{R}(-1) & \mathbf{R}(0) & \mathbf{R}(1) & \cdots & \mathbf{R}(\ ) & \mathbf{0}\\
\vdots & \mathbf{R}(-1) & \ddots & \ddots & & \mathbf{R}(\ )\\
\mathbf{R}(-\ ) & & \ddots & \ddots & \mathbf{R}(1) & \vdots\\
\mathbf{0} & \mathbf{R}(-\ ) & \cdots & \mathbf{R}(-1) & \mathbf{R}(0) & \mathbf{R}(1)\\
\mathbf{0} & \mathbf{0} & \mathbf{R}(-\ ) & \cdots & \mathbf{R}(-1) & \mathbf{R}(0)
\end{bmatrix}.
\tag{2.63}
$$

Figure 2.7: Structure of the space-time multi-user detector (antenna array, whitening filter $\mathbf{R}_n^{-H/2}$, space-time matched filter with $M$ signatures per user, multi-user detector delivering the $K$-user data).

$\mathbf{R}$ is block-Toeplitz and block-banded, as displayed in (2.63). Each block $\mathbf{R}(i)$ has dimension $K\times K$ and contains the correlations between the signatures of the $K$ users delayed by $iQ$. As the whitening matched filter treats ISI and MAI as background noise, the estimate is biased: indeed, for $\sigma_n^2=0$ it is $\hat{\mathbf{d}}=\mathrm{dec}(\mathbf{R}\mathbf{d})\neq\mathbf{d}$ in general. Due to multipath propagation, asynchronism or non-orthogonal spreading codes, $\mathbf{R}$ is not diagonal and therefore the performance is limited by ISI and MAI. The solution is optimum only in the special case of synchronous orthogonal signatures.

The structure of the matched filter is shown in Figure 2.7. For each user, the pre-whitened signal is fed into the bank of $M$ filters matched to the signatures $\{\tilde{\mathbf{s}}_{k,m}\}_{m=1}^{M}$, where $\tilde{\mathbf{s}}_{k,m}=\mathbf{R}_n^{-H/2}\mathbf{s}_{k,m}$. The filters are then followed by a maximum ratio combiner and a symbol rate sampler. All the detectors described in the following are an extension of this structure: the matched filter is followed by a multi-user detector, as shown in Figure 2.7. The computational complexity is $O(M(Q+W-1))$ operations per symbol.

2.8 Optimum MUD: maximum likelihood

According to model (2.57), the optimum detection of the symbols $\mathbf{d}$ from the received signals $\mathbf{y}$ is obtained by maximizing the joint a-posteriori probability $P[\mathbf{d}\,|\,\mathbf{y}]$.
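For equiprobable and independent symbols, the optimum solution is equivalent to the maximization of the likelihood function or the log-likelihood function:

$$f(\mathbf{y}\,|\,\mathbf{d})\propto\exp\left[-\tfrac{1}{2}(\mathbf{y}-\mathbf{S}\mathbf{d})^H\mathbf{R}_n^{-1}(\mathbf{y}-\mathbf{S}\mathbf{d})\right], \tag{2.64}$$

$$L(\mathbf{d})=2\,\mathrm{Re}\left(\mathbf{d}^H\mathbf{S}^H\mathbf{R}_n^{-1}\mathbf{y}\right)-\mathbf{d}^H\mathbf{R}\mathbf{d}=2\,\mathrm{Re}\left(\mathbf{d}^H\mathbf{y}_{MF}\right)-\mathbf{d}^H\mathbf{R}\mathbf{d}, \tag{2.65}$$

where $\mathbf{y}_{MF}$ is the output of the space-time whitening matched filter. An exhaustive search of the maximum in (2.65) would require $O(4^{KN_s}/KN_s)$ operations per symbol, which is totally out of the question in practice, due to the typically large values of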

$N_s$. However, as $\mathbf{R}$ has the banded structure shown in (2.63), the log-likelihood function can be decomposed in a way that allows the recursive maximization by the dynamic programming algorithm [16]. The Viterbi algorithm reduces the complexity to $O(4^{K(\ +1)}/KN_s)$. In the case of flat fading channel and perfect synchronism between users ( = 0), the receiver simplifies to the optimum multi-user detector for synchronous channels [83] and the complexity reduces to $O(4^{K}/KN_s)$. For flat fading channel and delays between users within one symbol interval ( = 1), we get the optimum multi-user detector for asynchronous channels [84], whose computational complexity is $O(4^{2K}/KN_s)$. Differently from the whitening matched filter, the optimum receiver demodulates the data error-free in the absence of noise. However, the computational complexity is exponential in the number of users $K$. This motivates the investigation of suboptimum detectors that improve the performance of the conventional receiver but require a number of operations only polynomial in $K$.

2.9 Linear multi-user detectors

2.9.1 Decorrelating detector

The decorrelating detector [42]-[43] selects the unbiased vector $\hat{\mathbf{d}}_c$ that minimizes the quadratic form

$$\hat{\mathbf{d}}_c=\arg\min_{\mathbf{d}_c}\left\|\mathbf{y}-\mathbf{S}\mathbf{d}_c\right\|^2_{\mathbf{R}_n^{-1}}=\arg\min_{\mathbf{d}_c}\left[(\mathbf{y}-\mathbf{S}\mathbf{d}_c)^H\mathbf{R}_n^{-1}(\mathbf{y}-\mathbf{S}\mathbf{d}_c)\right]. \tag{2.66}$$

Assuming that the matrix $\mathbf{R}$ is invertible, the solution of (2.66) is the zero-forcing equalizer:

$$\hat{\mathbf{d}}_c=\left(\mathbf{S}^H\mathbf{R}_n^{-1}\mathbf{S}\right)^{-1}\mathbf{S}^H\mathbf{R}_n^{-1}\mathbf{y}=\mathbf{R}^{-1}\mathbf{y}_{MF}. \tag{2.67}$$

The output decision is then $\hat{\mathbf{d}}=\mathrm{dec}(\hat{\mathbf{d}}_c)$. By combining (2.57) and (2.67) we get:

$$\hat{\mathbf{d}}_c=\mathbf{d}+\mathbf{R}^{-1}\mathbf{S}^H\mathbf{R}_n^{-1}\mathbf{n}=\mathbf{d}+\mathbf{R}^{-1}\mathbf{n}_{MF}. \tag{2.68}$$

It can be seen from the latter equation that the decorrelating detector provides the exact solution of the linear system for $\sigma_n^2=0$, eliminating completely both MAI and ISI. Moreover, it has been shown that this receiver is near-far resistant [42].
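The difference between the biased matched filter decision $\mathrm{dec}(\mathbf{R}\mathbf{d})$ and the decorrelator (2.67) can be seen in a two-symbol, noise-free toy case. The correlation matrix below is a hypothetical near-far example (a strong second user), not a value taken from the thesis, and the helper names are assumptions made for this sketch.

```python
def dec_qpsk(z):
    """Hard QPSK decision: signs of the real and imaginary parts."""
    return complex(1 if z.real >= 0 else -1, 1 if z.imag >= 0 else -1)


def matvec2(A, x):
    """2x2 matrix times 2-vector."""
    return [A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1]]


def inv2(A):
    """Inverse of a 2x2 matrix (assumes a non-zero determinant)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]


# Hypothetical Hermitian positive definite correlation matrix:
# user 2 is four times stronger and strongly correlated with user 1.
R = [[1.0, 1.5], [1.5, 4.0]]
d = [complex(1, 1), complex(-1, -1)]

y_mf = matvec2(R, d)                               # noise-free y_MF = R d
d_mf = [dec_qpsk(z) for z in y_mf]                 # conventional detector
d_zf = [dec_qpsk(z) for z in matvec2(inv2(R), y_mf)]  # decorrelator (2.67)
```

In this example the matched filter decides the weak user's symbol incorrectly even without noise, while the decorrelator recovers both symbols.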
However, the decorrelating detector is not optimal, as it does not take into account the correlation of the noise at the output of the matched filter. As a result, the decision variables are correlated with covariance $\mathrm{Cov}(\hat{\mathbf{d}}_c-\mathbf{d})=\mathbf{R}^{-1}$, leading to an enhancement of the error variance with respect to the conventional receiver. This will be shown by simulations in Section 2.12, where for sufficiently low SNR the single-user matched filter performs better than the decorrelating detector. The computational load of the decorrelating detector arises from the computation of the inverse of $\mathbf{R}$. This operation would require a number of operations in the order of $O(N_s^2K^2)$ per symbol; however, by exploiting the block-band and block-Toeplitz structure of $\mathbf{R}$, the complexity can be reduced to $O(\ K^2)$. This aspect will be investigated in Chapter 3.

2.9.2 MMSE detector

The linear multi-user detector that minimizes the mean square error

$$\hat{\mathbf{d}}_c=\arg\min_{\mathbf{d}_c}\mathrm{E}\left\|\mathbf{d}-\mathbf{d}_c\right\|^2 \tag{2.69}$$

is the MMSE multi-user block detector defined as

$$\hat{\mathbf{d}}_c=\left(\mathbf{S}^H\mathbf{R}_n^{-1}\mathbf{S}+\mathbf{I}\right)^{-1}\mathbf{S}^H\mathbf{R}_n^{-1}\mathbf{y}=(\mathbf{R}+\mathbf{I})^{-1}\mathbf{y}_{MF}, \tag{2.70}$$

where we have used the assumption $\mathrm{E}(\mathbf{d}\mathbf{d}^H)=\mathbf{I}$. The decisions are then obtained by $\hat{\mathbf{d}}=\mathrm{dec}(\hat{\mathbf{d}}_c)$. The MMSE detector is near-far resistant and can be easily structured to be adaptive [29]. Equation (2.70) is an extension of the decorrelating detector (2.67) by a Wiener estimator, which reduces the performance degradation. We have seen that the single-user matched filter takes into account only the additive noise, neglecting the presence of the interference. Conversely, the decorrelating detector eliminates MAI and ISI completely but does not consider the effect of the noise, that is, the correlation existing in the decision variables. The MMSE linear detector is a good compromise that takes into account the relative importance of interference and background noise. Indeed, it can be noticed from (2.70) that the MMSE detector approaches the decorrelating detector for $\sigma_n^2\to 0$ and the whitened single user matched filter for $\sigma_n^2\to\infty$. It has been shown that, for any finite SNR, the MMSE multi-user detector outperforms all the other linear detectors [58]. The computational complexity is the same as that of the decorrelating detector. It should be noticed that here the condition of invertible correlation matrix is not required.

2.10 Decision driven multi-user detectors

We consider decision driven detectors that can be decomposed in two stages [85]. The first stage is a linear transformation

$$\hat{\mathbf{d}}_c=\mathbf{G}\mathbf{y}_{MF}, \tag{2.71}$$

where $\mathbf{G}$ is an $N_sK\times N_sK$ matrix. For each symbol $i=1,\ldots,N_sK$, the purpose of this first operation is to mitigate the interference from the previous symbols $\{1,2,\ldots,i-1\}$.
The second stage uses the output of the first stage to perform a non-linear successive cancellation of the interference from the future symbols $\{i+1,\ldots,N_sK\}$:

$$\hat{\mathbf{d}}=\mathrm{dec}\left(\hat{\mathbf{d}}_c-\mathbf{B}\hat{\mathbf{d}}\right)=\mathrm{dec}\left(\mathbf{G}\mathbf{y}_{MF}-\mathbf{B}\hat{\mathbf{d}}\right), \tag{2.72}$$

with the constraint that the $N_sK\times N_sK$ matrix $\mathbf{B}$ is strictly upper triangular. For the $i$th symbol in (2.72), the estimate $\hat{d}(i)$ is given by

$$\hat{d}(i)=\mathrm{dec}\left(\hat{d}_c(i)-\sum_{j=i+1}^{N_sK}B_{i,j}\hat{d}(j)\right)=\mathrm{dec}\left(\sum_{n=1}^{N_sK}G_{i,n}\,y_{MF}(n)-\sum_{j=i+1}^{N_sK}B_{i,j}\hat{d}(j)\right), \tag{2.73}$$

where $\hat{d}_c(i)$ and $y_{MF}(i)$ denote the $i$th element of the vectors $\hat{\mathbf{d}}_c$ and $\mathbf{y}_{MF}$ respectively, while $B_{i,j}$ indicates the entry $(i,j)$ of the matrix $\mathbf{B}$. It can be seen that, as $\mathbf{B}$ is strictly upper triangular, $\hat{d}(i)$ does not depend on $\{\hat{d}(j)\}_{j=1}^{i-1}$. Therefore the solution (2.73) can be implemented sequentially, according to the order $i=N_sK,N_sK-1,\ldots,1$ and assuming that for any $i$ all the past decisions $\{\hat{d}(j)\}_{j=i+1}^{N_sK}$ that are fed back are correct. In case of incorrect past decisions, this method suffers from error propagation. The transformation matrices are specified below for some special cases of decision driven detectors.
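The sequential implementation of (2.73) can be sketched as follows, assuming the simplest choice of transformations ($\mathbf{G}=\mathbf{I}$ and $\mathbf{B}$ equal to the strictly upper triangular part of $\mathbf{R}$). The function names, the toy correlation matrix and the 0-based indexing are illustrative assumptions, not part of the original derivation.

```python
def dec_qpsk(z):
    """Hard QPSK decision: signs of the real and imaginary parts."""
    return complex(1 if z.real >= 0 else -1, 1 if z.imag >= 0 else -1)


def strictly_upper(R):
    """Strictly upper triangular part of a square matrix."""
    n = len(R)
    return [[R[i][j] if j > i else 0.0 for j in range(n)] for i in range(n)]


def decision_feedback(G, B, y_mf):
    """Evaluate (2.73) from the last symbol down to the first one."""
    n = len(y_mf)
    d_hat = [0j] * n
    for i in range(n - 1, -1, -1):
        linear = sum(G[i][m] * y_mf[m] for m in range(n))
        # Only decisions already taken (j > i) are fed back.
        feedback = sum(B[i][j] * d_hat[j] for j in range(i + 1, n))
        d_hat[i] = dec_qpsk(linear - feedback)
    return d_hat


# Hypothetical 3-symbol example, noise-free: y_MF = R d.
R = [[1.0, 0.4, 0.2], [0.4, 1.0, 0.4], [0.2, 0.4, 1.0]]
d = [complex(1, 1), complex(-1, 1), complex(-1, -1)]
y_mf = [sum(R[i][j] * d[j] for j in range(3)) for i in range(3)]

G = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
B = strictly_upper(R)
d_hat = decision_feedback(G, B, y_mf)
```

Because each decision uses only symbols with a larger index, a wrong decision propagates to all the symbols detected after it, which is the error propagation mentioned above.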

The simplest form of successive cancellation does not exploit the stage $\mathbf{G}$ and uses directly the output of the matched filter as intermediate decision:

$$\mathbf{G}=\mathbf{I}, \tag{2.74}$$

$$\mathbf{B}=\mathrm{tri}(\mathbf{R}), \tag{2.75}$$

where $\mathrm{tri}(\cdot)$ selects the strictly upper triangular part of the matrix argument. By assuming that $\mathbf{R}$ is invertible, the decorrelating decision feedback detector is obtained as

$$\mathbf{G}=\mathbf{R}^{-H/2}, \tag{2.76}$$

$$\mathbf{B}=\mathbf{R}^{1/2}-\mathrm{diag}\left(\mathbf{R}^{1/2}\right). \tag{2.77}$$

In the hypothetical absence of background noise this solution guarantees error-free demodulation. The MMSE decision feedback detector is given by

$$\mathbf{G}=(\mathbf{R}+\mathbf{I})^{-H/2}, \tag{2.78}$$

$$\mathbf{B}=(\mathbf{R}+\mathbf{I})^{1/2}-\mathrm{diag}\left((\mathbf{R}+\mathbf{I})^{1/2}\right). \tag{2.79}$$

It should be noticed that for $\sigma_n^2\to 0$ this solution approaches the decorrelating detector. Several types of decision feedback multi-user equalizers are investigated in [78].

2.11 Effect of noisy channel estimation on MUD

We derive an approximation of the effect of channel estimate errors on the performance of the decorrelating detector. To simplify the discussion we assume $M=1$, uncorrelated noise and training sequences ($\mathbf{R}_n=\sigma_n^2\mathbf{I}_M$, $\mathbf{R}_{xx}=\sigma_x^2\mathbf{I}_{KW}$). Furthermore, we make the following assumptions: the error on the channel estimate $\Delta\mathbf{H}=\hat{\mathbf{H}}-\mathbf{H}$ is uncorrelated with the background noise; the data symbols are uncorrelated, i.e. $\mathrm{E}[\mathbf{d}\mathbf{d}^H]=\sigma_d^2\mathbf{I}_{N_sK}$; the border effects are neglected (infinite transmission). For a different derivation see also [70]. We start by recalling the model for the received data

$$\mathbf{y}=\mathbf{S}\mathbf{d}+\mathbf{n}, \tag{2.80}$$

and the expression of the decorrelating multi-user detector

$$\hat{\mathbf{d}}_c=\mathbf{R}^{-1}\mathbf{S}^H\mathbf{y}. \tag{2.81}$$

If the channel is perfectly known, the variance of the decision variable is given by

$$\mathrm{E}\left(\left\|\hat{\mathbf{d}}_c-\mathbf{d}\right\|^2\right)=\sigma_n^2\,\mathrm{tr}\left(\mathbf{R}^{-1}\right). \tag{2.82}$$

We will derive the increment of the variance (2.82) when the least squares channel estimate $\hat{\mathbf{H}}_{FR}=\mathbf{H}+\Delta\mathbf{H}_{FR}$ is used in the detector. Let $\hat{\mathbf{S}}=\mathbf{S}+\Delta\mathbf{S}$ denote the signature matrix built

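from the multi-user channel estimate $\hat{\mathbf{H}}_{FR}$, and $\hat{\mathbf{R}}=\hat{\mathbf{S}}^H\hat{\mathbf{S}}$. The output of the decorrelating detector can be written as

$$
\begin{aligned}
\hat{\mathbf{d}}_c&=\hat{\mathbf{R}}^{-1}\hat{\mathbf{S}}^H\mathbf{y}=\hat{\mathbf{R}}^{-1}\hat{\mathbf{S}}^H(\mathbf{S}\mathbf{d}+\mathbf{n})=\hat{\mathbf{R}}^{-1}\hat{\mathbf{S}}^H\left(\hat{\mathbf{S}}\mathbf{d}-\Delta\mathbf{S}\mathbf{d}+\mathbf{n}\right)\\
&=\mathbf{d}+\hat{\mathbf{R}}^{-1}\hat{\mathbf{S}}^H\mathbf{n}-\hat{\mathbf{R}}^{-1}\hat{\mathbf{S}}^H\Delta\mathbf{S}\mathbf{d}=\mathbf{d}+\hat{\mathbf{R}}^{-1}\hat{\mathbf{S}}^H(\mathbf{n}-\Delta\mathbf{S}\mathbf{d}).
\end{aligned}
\tag{2.83}
$$

We observe that the input signal of the detector is corrupted not only by the background noise $\mathbf{n}$, but also by the interference $\Delta\mathbf{S}\mathbf{d}$ that is generated by the imperfect knowledge of the channel state information. The error on the detected data is therefore composed of two terms:

$$\mathbf{n}_1=\hat{\mathbf{R}}^{-1}\hat{\mathbf{S}}^H\mathbf{n}, \tag{2.84}$$

$$\mathbf{n}_2=-\hat{\mathbf{R}}^{-1}\hat{\mathbf{S}}^H\Delta\mathbf{S}\mathbf{d}. \tag{2.85}$$

We make the assumption that the two errors are uncorrelated with each other and therefore:

$$\mathrm{E}\left[(\hat{\mathbf{d}}_c-\mathbf{d})(\hat{\mathbf{d}}_c-\mathbf{d})^H\right]=\mathrm{E}\left[\mathbf{n}_1\mathbf{n}_1^H\right]+\mathrm{E}\left[\mathbf{n}_2\mathbf{n}_2^H\right]. \tag{2.86}$$

We observe that $\hat{\mathbf{R}}^{-1}$ is non-linearly related to the error of the channel estimate. However, for large SNR, the error $\Delta\mathbf{S}$ is much smaller than $\mathbf{S}$, hence we can make the approximation:

$$\hat{\mathbf{R}}^{-1}\approx\left(\mathbf{S}^H\mathbf{S}+\Delta\mathbf{S}^H\mathbf{S}+\mathbf{S}^H\Delta\mathbf{S}\right)^{-1}\approx\mathbf{R}^{-1}-\mathbf{R}^{-1}\left(\Delta\mathbf{S}^H\mathbf{S}+\mathbf{S}^H\Delta\mathbf{S}\right)\mathbf{R}^{-1}\approx\mathbf{R}^{-1}. \tag{2.87}$$

Using this result and the hypothesis of uncorrelated data symbols, it follows:

$$\mathrm{E}\left[\mathbf{n}_1\mathbf{n}_1^H\right]=\sigma_n^2\mathbf{R}^{-1}, \tag{2.88}$$

$$\mathrm{E}\left[\mathbf{n}_2\mathbf{n}_2^H\right]=\sigma_d^2\,\hat{\mathbf{R}}^{-1}\hat{\mathbf{S}}^H\,\mathrm{E}\left[\Delta\mathbf{S}\Delta\mathbf{S}^H\right]\hat{\mathbf{S}}\hat{\mathbf{R}}^{-1}. \tag{2.89}$$

Moreover, the covariance (2.89) can be further simplified in the case of uncorrelated training sequences. Indeed, the interference $\Delta\mathbf{S}\mathbf{d}$ at the input of the detector is then uncorrelated, and it follows:

$$\mathrm{E}\left[\Delta\mathbf{S}\Delta\mathbf{S}^H\right]=\mathrm{E}\left(\left\|\Delta\mathbf{H}_{FR}\right\|^2\right)\mathbf{I}_{N_sQ+W-1}, \tag{2.90}$$

$$\mathrm{E}\left[\mathbf{n}_2\mathbf{n}_2^H\right]=\sigma_d^2\,\mathrm{E}\left(\left\|\Delta\mathbf{H}_{FR}\right\|^2\right)\mathbf{R}^{-1}. \tag{2.91}$$

By combining (2.88) and (2.91), the mean noise power on the decision variable reduces to:

$$\mathrm{E}\left(\left\|\hat{\mathbf{d}}_c-\mathbf{d}\right\|^2\right)=\mathrm{tr}\left(\mathrm{E}\left[\mathbf{n}_1\mathbf{n}_1^H\right]\right)+\mathrm{tr}\left(\mathrm{E}\left[\mathbf{n}_2\mathbf{n}_2^H\right]\right)=\sigma_n^2\,\mathrm{tr}\left(\mathbf{R}^{-1}\right)\left(1+\frac{\sigma_d^2}{\sigma_n^2}\mathrm{E}\left(\left\|\Delta\mathbf{H}_{FR}\right\|^2\right)\right). \tag{2.92}$$

Now, assume $\sigma_d^2=1$, and recall that the MSE of the unconstrained channel estimate is $\mathrm{E}(\|\Delta\mathbf{H}_{FR}\|^2)=\sigma_n^2\,\mathrm{tr}(\mathbf{R}_{xx}^{-1})/N$.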
With respect to the case of known channel (2.82), least squares channel estimation degrades the performance of the decorrelating detector by the factor

$$d=1+\frac{\sigma_d^2}{\sigma_n^2}\mathrm{E}\left(\left\|\Delta\mathbf{H}_{FR}\right\|^2\right)=1+\frac{1}{N}\mathrm{tr}\left(\mathbf{R}_{xx}^{-1}\right). \tag{2.93}$$

For $\mathbf{R}_{xx}^{-1}=\sigma_x^{-2}\mathbf{I}$ the degradation due to the noisy channel estimate reduces to $1+\sigma_n^2KW/(N\sigma_x^2)$. It is clear that the performance of the detector depends on the ratio between the number of parameters to be estimated and the number of training symbols. Hence, reduction of the error

probability can be obtained either by decreasing the number of channel parameters or by extending the training sequence length.

EXAMPLE 2.2 We calculate the degradation effect of the ML channel estimation on the detected data for the UTRA-TDD system. We consider the estimation of $K=8$ propagation channels from the training sequences defined by the standard specifications. The midamble used for this example is given in Appendix B and has length $N=456$. For each user, the maximum number of channel unknowns is estimated, that is $W=N/K=57$. We compare the performance with the ideal case of uncorrelated training sequences, $\mathbf{R}_{xx}=\mathbf{I}_{KW}$ ($\sigma_x^2=1$). For ideal training sequences the mean square error of the ML channel estimate is $\mathrm{E}(\|\Delta\mathbf{H}_{ideal}\|^2)=\sigma_n^2\,\mathrm{tr}(\mathbf{R}_{xx}^{-1})/N=\sigma_n^2$. The effect of the noisy channel estimate is the enhancement of the noise at the output of the ZF detector by the factor $d=1+\mathrm{E}(\|\Delta\mathbf{H}_{ideal}\|^2)/\sigma_n^2=2$, that is 3 dB. The UTRA-TDD training sequences are not perfectly uncorrelated and have $\mathrm{tr}(\mathbf{R}_{xx}^{-1})=529$. This leads to an increase of the estimate error with respect to the ideal case by the factor $\mathrm{E}(\|\Delta\mathbf{H}\|^2)/\mathrm{E}(\|\Delta\mathbf{H}_{ideal}\|^2)=529/456$, that is 0.65 dB. Moreover, the enhancement of the noise variance at the detector is $d=1+529/456$, that is 3.35 dB. The degradation is 0.35 dB larger than the one calculated for uncorrelated training sequences.

2.12 Simulation results

We evaluate and compare the performance of the different space-time detectors in terms of the average uncoded bit error rate, by using Monte Carlo simulations. The simulations are based on the system parameters defined by the UTRA-TDD standard (see Appendix B), for a uniform linear antenna array of $M=1$ to $8$ elements having $\lambda_c/2$ spacing. In the same time slot and in the same frequency band, $K=8$ users transmit synchronously a traffic burst that consists of two data blocks of $N_s=61$ symbols and a midamble of $N_m=512$ chips. The modulation scheme is QPSK with symbol duration $T_s=4.17$ µs.
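Returning briefly to Example 2.2, the decibel figures quoted there follow directly from (2.93). The short check below is only a sketch: the variable names and the helper `to_db` are assumptions introduced here, while $N=456$, $K=8$ and $\mathrm{tr}(\mathbf{R}_{xx}^{-1})=529$ are the values stated in the example.

```python
import math


def to_db(x):
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(x)


N = 456                 # midamble length used in Example 2.2
K = 8                   # number of users
W = N // K              # channel unknowns per user

trace_ideal = K * W     # tr(R_xx^{-1}) for R_xx = I_{KW}
trace_utra = 529        # value quoted in Example 2.2

d_ideal = 1 + trace_ideal / N   # degradation factor (2.93), ideal sequences
d_utra = 1 + trace_utra / N     # degradation factor (2.93), UTRA-TDD sequences

print(round(to_db(d_ideal), 2))          # about 3.01, quoted as 3 dB
print(round(to_db(trace_utra / N), 2))   # about 0.64, quoted as 0.65 dB
print(round(to_db(d_utra), 2))           # about 3.34, quoted as 3.35 dB
```

The computed values agree with the rounded figures of the example to within about 0.01 dB.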
The spreading process consists of two operations. At first, the system generates $K$ real-valued Hadamard sequences composed of $Q=16$ chips. Then, each sequence is scrambled by a cell-specific complex code of length 16. The purpose of the scrambling code is to improve the correlation properties of the original Hadamard codes.

The channel model used in these simulations is obtained by extending to the space domain the typical urban (TU) environment of the COST-207 model [12]. For each user, the propagation consists of a single cluster of $N_p=12$ uncorrelated paths, whose delay and mean power are fixed and chosen according to the TU model. Conversely, the angles are random variables with distribution $\vartheta_{k,p}\sim N(\vartheta_k,\sigma_\vartheta^2)$, mean value $\vartheta_k\sim U[-\pi/3,\pi/3]$ and standard deviation $\sigma_\vartheta=\pi/36$. The variations of the received power due to path loss and shadowing are assumed to be eliminated by perfect power control. The amplitude variations due to fast fading are simulated by using the Rayleigh distribution. In all the simulations the interference is simulated by Gaussian noise. The bit error rate is evaluated for varying SNR by averaging over the channels, the noise and the data. The SNR

per bit and per antenna is defined as

$$\gamma=\frac{Q\,\sigma_d^2\,\mathrm{E}\left(\left\|\mathbf{H}_k\right\|^2\right)}{2\,\sigma_n^2\,M}, \tag{2.94}$$

where the mean value $\mathrm{E}(\|\mathbf{H}_k\|^2)$ is the same for all users (near-far effects are not present).

Figure 2.8: Performance of space-time multi-user detectors for typical urban channels assumed known (BER versus SNR [dB] for $M=1$, $M=4$ and $M=8$; curves: MF, ZF, ZF-DF, MMSE, MMSE-DF).

Figure 2.8 compares the performance of the following space-time detectors: single-user matched filter (MF), multi-user decorrelator (ZF), multi-user minimum mean square error (MMSE), multi-user decision feedback decorrelator (ZF-DF), multi-user decision feedback minimum mean square error (MMSE-DF). Both single antenna ($M=1$) and antenna array receivers ($M=4$ and $M=8$) are considered. The channel is assumed perfectly known and the noise is spatially correlated. As expected, the MMSE-DF performs best, followed by ZF-DF, MMSE, ZF and MF. The MMSE approaches the MF for $\gamma\to 0$ and the ZF for $\gamma\to\infty$. For large SNR the performance of the MF is limited by an irreducible error floor, because neither ISI nor MAI is fully removed. The value of the asymptotic probability of error depends on the number of antennas. With respect to the single antenna receiver, the antenna array detector provides an improvement in performance that is the sum of the array gain ($M$) and the additional gain from the exploitation of the multipath diversity. Furthermore, it can be observed that, according to [79], for increasing $M$ the difference between linear and DF equalizers tends to decrease.

Figure 2.9 shows the effect of channel estimation on the detector performance. The noise is AWGN with covariance $\mathbf{R}_n=\sigma_n^2\mathbf{I}_M$. The channels are estimated from the training data by the multi-user least squares method, then the channel estimates are used to perform multi-user decorrelating detection. As expected, the errors of the channel estimate due to the short training sequence length increase the probability of error in the detection process.
In particular, for

$M=1$ the SNR degradation obtained by this simulation confirms the analytical value of 3.35 dB derived by using the approximation (2.93) in Example 2.2. The degradation is even larger for multiple antennas.

Figure 2.9: Performance of the space-time multi-user decorrelating detector for typical urban channels: known channels and least squares estimation of the channels (BER versus SNR [dB] for $M=1$, $M=4$ and $M=8$).

2.13 Conclusions

In this chapter we have studied channel estimation and data detection for multi-user multi-antenna mobile radio systems. In particular, we have presented several linear and non-linear block-based receivers, based on the least squares estimation of the channel matrices and noise statistics. However, simulation results have shown that the least squares estimate of the MIMO channel can be unreliable, as it attempts to derive all the spatial-temporal dimensions of the channel (i.e., all the $M$ temporal FIR filters) using an insufficient number of training data. The errors generated by the estimate degrade the performance of the data detectors. This motivates the investigation of more accurate channel estimators that exploit properties of the space-time propagation (constrained estimators).

The receivers proposed in this chapter are based on the assumption that the propagation channels remain unchanged over the burst interval. However, for time-varying channels, the block detectors have limitations, since the detector parameters need to be updated on a symbol by symbol basis. Hence, in the next chapter we investigate suboptimum single shot solutions that can be used in adaptive receivers. The adaptation of the detector parameters is considered in Chapter 7.

Chapter 3

Efficient algorithms for block multi-user detection

3.1 Introduction

Linear multi-user detection (MUD) for frequency selective fading channels has always been considered a prohibitive computational task in CDMA systems. In time slotted CDMA, block-based MUD involves the inversion of a large matrix that depends on the block size and on the number of users. Suboptimal techniques are computationally efficient but show some performance degradation. The reduced complexity detectors can be either block-type or one-shot. Efficient computational solutions for block-type receivers for UMTS TDD-UTRA are available [22]-[24], [35], [44]. However, the intrinsic latency caused by the block-style processing limits the possibility to update the channel estimation from decisions (adaptive receiver). This fact motivates the use of single shot space-time multi-user detection, such as the sliding window decorrelator (SWD) approach [33]. In addition, the SWD algorithm is well suited to be optimized in parallel hardware architectures and to be used in adaptive receivers [34].

This chapter is focused on the comparison between block-based and sliding window multi-user detectors for space-time processing in terms of performance, computational complexity and parallelism; further remarks about hardware implementation based on pipelined systems (systolic arrays) are presented.

3.2 Data model

We recall the discrete-time model for the uplink of a time slotted CDMA system (for details on the notation, see Section 2.6). The model is obtained by sampling at the chip rate $1/T_c$ the received signals after the chip matched filter. Within the same cell, in the same frequency band and in the same time slot, $K$ mobile stations are simultaneously active. At the base station an antenna array of $M$ elements is employed. Users transmit in bursts, each consisting of two data blocks separated by a user-specific training sequence (midamble) for channel estimation.
In the following we will define the data model and propose efficient solutions for the detection of one single data block (before or after the midamble) assuming that the influence of the midamble due to the temporal dispersion of the channel is perfectly cancelled for all the users.

The data block of each user contains $N_s$ QPSK symbols of duration $T_s$. Each symbol is spread by a user-specific signature composed of $Q$ chips. The packet is transmitted to the antenna array over a frequency selective fading channel, that is assumed to be invariant over the burst interval. Due to multipath propagation, in general the channel length (expressed in chip intervals) is $W>1$. Hence, even if the codes are designed to be orthogonal, the signatures obtained from the convolution with the channels are no longer orthogonal and both multiple access interference (MAI) and intersymbol interference (ISI) arise.

Let $d_k(i)$ denote the $i$th symbol transmitted by the $k$th user; $\mathbf{d}(i)=[d_1(i)\ \cdots\ d_K(i)]^T$ is the vector containing the $i$th symbol for all the $K$ users, and $\mathbf{d}=[\mathbf{d}(0)^T\ \cdots\ \mathbf{d}(N_s-1)^T]^T$ is the overall data vector of length $N_sK$. The signals received by the $M$ antennas during the burst transmission are arranged into the $M(N_sQ+W-1)\times 1$ complex-valued vector

$$\mathbf{y}=\mathbf{S}\mathbf{d}+\mathbf{n}, \tag{3.1}$$

where $\mathbf{S}=[\mathbf{S}_1^T,\ldots,\mathbf{S}_M^T]^T$ denotes the $M(N_sQ+W-1)\times N_sK$ system matrix, built from the signatures of all users and all symbols as described in Section 2.6. The noise $\mathbf{n}$ represents both the background noise and the intercell interference; it is assumed temporally uncorrelated and spatially correlated, $\mathbf{n}\sim N(\mathbf{0},\mathbf{R}_n)$.

As shown in Chapter 2, multi-user data detection can be split into two steps. At first, the signals $\mathbf{y}$ are filtered by the space-time matched filter to get the $N_sK\times 1$ vector $\mathbf{y}_{MF}=\mathbf{S}^H\mathbf{R}_n^{-1}\mathbf{y}$. Linear space-time multi-user detection is then performed on the matched filter output to estimate the data symbols: the decorrelating detector is $\hat{\mathbf{d}}_c=(\mathbf{S}^H\mathbf{R}_n^{-1}\mathbf{S})^{-1}\mathbf{y}_{MF}$, the MMSE detector is $\hat{\mathbf{d}}_c=(\mathbf{S}^H\mathbf{R}_n^{-1}\mathbf{S}+\mathbf{I}_{N_sK})^{-1}\mathbf{y}_{MF}$ (if the data symbols are independent and uniformly distributed).
Without any loss of generality, in this chapter we assume that the noise $\mathbf{n}$ is white, $\mathbf{R}_n=\sigma_n^2\mathbf{I}_{N_sK}$, and we adopt the MMSE multi-user detector for the estimation of the data symbols $\mathbf{d}$. Moreover, known channel state information is assumed. The algorithms discussed in this chapter can be easily generalized to systems with correlated noise and/or decorrelating multi-user detection (with or without decision feedback, see Chapter 2 for details).

3.3 Computation of the ideal detector

For the decorrelating detector the system simplifies to

$$\hat{\mathbf{d}}_c=\mathbf{R}^{-1}\mathbf{y}_{MF}, \tag{3.2}$$

where $\mathbf{R}=\mathbf{S}^H\mathbf{S}$ is the $KN_s\times KN_s$ complex-valued correlation matrix. This equation can be solved by exploiting the Cholesky decomposition of $\mathbf{R}$,

$$\mathbf{R}=\mathbf{L}^H\mathbf{L}, \tag{3.3}$$

where $\mathbf{L}$ is the Cholesky factor of $\mathbf{R}$ and is upper triangular. At first the lower triangular system $\mathbf{L}^H\mathbf{x}=\mathbf{y}_{MF}$ is solved by forward substitution for the intermediate vector $\mathbf{x}$, then the upper triangular system $\mathbf{L}\hat{\mathbf{d}}=\mathbf{x}$ is solved by backward substitution. For large block size ($N_s$), number of users ($K$) and delay spread ($v=\lceil (Q+W-1)/Q\rceil$, expressed in symbol intervals) the computational complexity of this solution can be prohibitive. In particular, most of the complexity of the detector arises from the calculation of the matched filter $\mathbf{y}_{MF}$ and from the Cholesky factorization of the large matrix $\mathbf{R}$. The cost of the matched filter (MF) increases linearly with the number $M$ of antennas; however, it can be moderately

optimized by considering that it may be performed in parallel for each antenna. The Cholesky factorization of $\mathbf{R}$ requires a number of operations in the order of $O(N_s^3K^3)$. According to the system parameters and the burst size of the TDD-UTRA standard, $\mathbf{R}$ can be very large, hence efficient algorithms must be employed for the Cholesky decomposition.

The structure of the matrices $\mathbf{R}$ and $\mathbf{L}$ is shown in Figure 3.1. The correlation matrix $\mathbf{R}$ is composed of blocks of dimensions $K\times K$, $\mathbf{R}=[\mathbf{R}_{ij}]$, where $\mathbf{R}_{ij}$ designates the block $(i,j)$, with $i,j=1,\ldots,N_s$. Moreover $\mathbf{R}$ is block Toeplitz ($\mathbf{R}_{ij}=\mathbf{R}_{j-i}$, [10]), block Hermitian ($\mathbf{R}_{ij}=\mathbf{R}_{ji}^H$), and block band ($\mathbf{R}_i=\mathbf{0}$ for $|i|\geq v$). The Cholesky factor is not block Toeplitz itself but it is still block band with the same bandwidth as $\mathbf{R}$: $\mathbf{L}=[\mathbf{L}_{ij}]$, $\mathbf{L}_{ij}=\mathbf{0}$ for $|i-j|\geq v$. By exploiting the structure of $\mathbf{R}$ and $\mathbf{L}$ it is possible to introduce efficient algorithms that reduce the computational complexity of the Cholesky factorization.

Figure 3.1: Structure of the correlation matrix $\mathbf{R}$ and its Cholesky factor $\mathbf{L}$.

3.3.1 Factorization by block band Cholesky algorithm

The standard algorithm for the Cholesky decomposition is obtained by solving row-by-row the equation $\mathbf{R}=\mathbf{L}^H\mathbf{L}$ for $\mathbf{L}$ [20]:

    for $i=1:N$
        $\mathbf{L}_{ii}=\mathrm{chol}\left(\mathbf{R}_0-\sum_{m=1}^{i-1}\mathbf{L}_{mi}^H\mathbf{L}_{mi}\right)$
        for $j=i+1:N$
            $\mathbf{L}_{ij}=\mathbf{L}_{ii}^{-H}\left(\mathbf{R}_{j-i}-\sum_{m=1}^{i-1}\mathbf{L}_{mi}^H\mathbf{L}_{mj}\right)$
        end
    end                                                        (3.4)

The complexity of this algorithm for full matrices is $O(N_s^3K^3)$. However, the block-banded structure of both $\mathbf{R}$ and $\mathbf{L}$ can be fully exploited while calculating $\mathbf{L}$, as the number of iterations in the inner loop and the number of addends in the two summations in (3.4) are both bounded by $v$. This leads to a banded version of the algorithm [20], here referred to as block band Cholesky algorithm.
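The banded recursion can be sketched in a few lines for the special case of scalar blocks ($K=1$), which is an assumption made here to keep the example free of matrix libraries. For $K>1$ the square root and the division would be replaced by the Cholesky factor of a $K\times K$ block and by a triangular solve. The function names and the toy Toeplitz matrix are illustrative.

```python
import math


def banded_cholesky_upper(R, v):
    """Upper triangular L with R = L^H L, assuming R[i][j] = 0 for |i - j| >= v.

    Scalar-block (K = 1) version of the row-by-row recursion (3.4):
    loops and sums are restricted to the band of width v.
    """
    n = len(R)
    L = [[0j] * n for _ in range(n)]
    for i in range(n):
        lo = max(0, i - v + 1)                    # rows that reach column i
        diag = R[i][i] - sum(abs(L[m][i]) ** 2 for m in range(lo, i))
        L[i][i] = complex(math.sqrt(diag.real))
        for j in range(i + 1, min(n, i + v)):     # entries inside the band
            lo_j = max(0, j - v + 1)              # rows that reach column j
            acc = sum(L[m][i].conjugate() * L[m][j] for m in range(lo_j, i))
            L[i][j] = (R[i][j] - acc) / L[i][i]   # L[i][i] is real and positive
    return L


def gram(L):
    """Return L^H L, used to verify the factorization."""
    n = len(L)
    return [[sum(L[m][i].conjugate() * L[m][j] for m in range(n))
             for j in range(n)] for i in range(n)]


# Hypothetical Hermitian Toeplitz band matrix with v = 2 (tridiagonal).
n, v = 5, 2
r = [2.0, 0.5 + 0.5j]
R = [[0j] * n for _ in range(n)]
for i in range(n):
    for j in range(i, min(n, i + v)):
        R[i][j] = complex(r[j - i])
        R[j][i] = complex(r[j - i]).conjugate()

L = banded_cholesky_upper(R, v)
```

The factor keeps the band of $\mathbf{R}$, so each row costs a number of operations that depends on $v$ but not on the matrix size.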
The computational complexity reduces to $O(v^2N_sK^3)$.

3.3.2 Factorization by block Schur algorithm

The block Schur algorithm is a generalization of the Schur algorithm for the QR decomposition of Toeplitz matrices [21] to the factorization of block Toeplitz matrices. The algorithm is briefly reviewed here; for the details refer to [23]-[24].

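The block Hermitian matrix $\mathbf{S}^H\mathbf{S}$ can always be expressed as

$$\mathbf{S}^H\mathbf{S}=\mathbf{B}^H\mathbf{B}-\mathbf{C}^H\mathbf{C}, \tag{3.5}$$

where $\mathbf{B}=[\mathbf{B}_{i-j}]$ and $\mathbf{C}=[\mathbf{C}_{i-j}]\in\mathbb{C}^{KN_s\times KN_s}$ are upper triangular block Toeplitz matrices, with $\mathbf{B}_i=\mathbf{0}$ and $\mathbf{C}_i=\mathbf{0}$ for $i\geq v$. $\mathbf{B}$ and $\mathbf{C}$ are called generators of $\mathbf{S}^H\mathbf{S}$.

Definition 3.1 A $2n\times 2n$ matrix $\boldsymbol{\Theta}$ is said to be $\mathbf{J}$-orthogonal with

$$\mathbf{J}=\begin{bmatrix}\mathbf{I}_n & \mathbf{0}\\ \mathbf{0} & -\mathbf{I}_n\end{bmatrix} \tag{3.6}$$

if it satisfies

$$\boldsymbol{\Theta}^T\mathbf{J}\boldsymbol{\Theta}=\mathbf{J}. \tag{3.7}$$

$\boldsymbol{\Theta}$ is orthogonal if it satisfies (3.7) with $\mathbf{J}=\mathbf{I}_{2n}$.

Let $\boldsymbol{\Theta}$ be the $2N_sK\times 2N_sK$ $\mathbf{J}$-orthogonal matrix such that

$$\boldsymbol{\Theta}\begin{bmatrix}\mathbf{B}\\ \mathbf{C}\end{bmatrix}=\begin{bmatrix}\mathbf{L}\\ \mathbf{0}\end{bmatrix}, \tag{3.8}$$

where $\mathbf{L}$ is an upper triangular matrix. It can be shown that $\mathbf{L}$ in (3.8) corresponds to the Cholesky factor of $\mathbf{S}^H\mathbf{S}$. The matrix $\boldsymbol{\Theta}$ is composed of elementary orthogonal and $\mathbf{J}$-orthogonal transformations, $\boldsymbol{\Theta}=\prod_i\boldsymbol{\Theta}_i$. The sequence of transformations is such that the number of rotations in (3.8) is minimized: the blocks of $\mathbf{C}$ are annihilated diagonal-wise and the blocks of $\mathbf{L}$ are computed row-wise. In other words, at the $i$th step, with $i=1,\ldots,N_s$, the algorithm annihilates the $i$th block diagonal of $\mathbf{C}$ and completes the computation of the $i$th block row of $\mathbf{L}$. Since $\mathbf{B}$ and $\mathbf{C}$ are block Toeplitz matrices, the transformations for annihilating all the blocks on the same diagonal of $\mathbf{C}$ are redundant. Therefore the decomposition (3.8) can be performed by skipping the repeated computations and working only with the $K\times vK$ block rows $[\mathbf{B}_0\ \cdots\ \mathbf{B}_{v-1}]$ and $[\mathbf{C}_0\ \cdots\ \mathbf{C}_{v-1}]$ instead of the large $N_sK\times N_sK$ matrices $\mathbf{B}$ and $\mathbf{C}$. The Schur algorithm takes into account both the Toeplitz and the banded structure of the correlation matrix, hence the complexity of the factorization is reduced to $O(vN_sK^3)$.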
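The elementary $\mathbf{J}$-orthogonal transformation behind (3.8) can be illustrated in the smallest case, $n=1$ with real scalars, where it is a hyperbolic rotation. The sketch below is not the block Schur algorithm itself. The function name and the values $b=5$, $c=3$ are assumptions chosen so that $b^2-c^2=16$ plays the role of $\mathbf{B}^H\mathbf{B}-\mathbf{C}^H\mathbf{C}$ in (3.5).

```python
import math


def hyperbolic_rotation(b, c):
    """2x2 real J-orthogonal Theta that maps [b, c]^T to [sqrt(b^2 - c^2), 0]^T.

    Assumes b > |c|, so that b^2 - c^2 is positive.
    """
    rho = c / b
    scale = 1.0 / math.sqrt(1.0 - rho * rho)
    return [[scale, -rho * scale],
            [-rho * scale, scale]]


b, c = 5.0, 3.0
theta = hyperbolic_rotation(b, c)

# Apply Theta to the generator pair: the second entry is annihilated.
l0 = theta[0][0] * b + theta[0][1] * c
l1 = theta[1][0] * b + theta[1][1] * c

# Check the J-orthogonality condition (3.7) with J = diag(1, -1).
J = [[1.0, 0.0], [0.0, -1.0]]
JT = [[sum(J[i][k] * theta[k][j] for k in range(2)) for j in range(2)]
      for i in range(2)]
check = [[sum(theta[k][i] * JT[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
```

Here `l0` squared equals $b^2-c^2$, the scalar analogue of $\mathbf{L}^H\mathbf{L}=\mathbf{B}^H\mathbf{B}-\mathbf{C}^H\mathbf{C}$.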
It is worth noticing that, as the complexity in this case is linear in $v$, the solution based on the Schur algorithm is better suited than the block band algorithm to propagation channels with large temporal spread (i.e., large $v$).

3.3.3 Computational complexity evaluation

The computational complexity of the algorithms considered for the multi-user detection is evaluated for 2 data blocks of $N_s$ symbols (e.g., 2 semi-bursts of the TDD-UTRA standard) and expressed in terms of real multiplications or, equivalently, multiply-and-accumulate (MAC) operations. One complex multiplication requires four real multiplications. Table 3.1 shows the computational complexity of the multi-user decorrelating detector based on the exact factorization of $\mathbf{R}$. The computational complexity is given for the matched filter (MF), the backward/forward substitution (B/F), and the matrix factorization using either the block band Cholesky algorithm (BB) or the block Schur algorithm (BS). For the Cholesky algorithm the computational complexity takes into account also the computation of the term

$\mathbf{R}=\mathbf{S}^H\mathbf{S}$. For the Schur algorithm the derivation of the generators is included ($\mathbf{R}$ is not calculated explicitly). The complexity is expressed as a function of the system parameters. It can be observed that the overall complexity is linearly related to the block size ($N_s$) and to the number of antennas ($M$) for both algorithms, whereas the dependence on the channel length ($v$) is linear for the Schur algorithm and quadratic for the standard Cholesky algorithm.

In addition, Table 3.2 specifies the computational load for $M=1$ and $M=8$ using parameter values from the TDD-UTRA standard: $K=8$, $v=5$, $N_s=61$, $Q=16$, $W=57$. It is evident that the MF load increases linearly with $M$; for instance, if $M=8$, the MF block accounts for most of the computational complexity of the detector. However, its calculation may be highly parallelized (e.g., using one MF processor for each antenna). The Cholesky factorization is also a very demanding task; in particular, it is the most complex block when $M$ is low.

Table 3.3 compares the computational load required only by the factorization of $\mathbf{R}$ for the block band Cholesky algorithm and the block Schur algorithm for varying values of $N_s$ and $K=8$, $v=5$, $Q=16$, $W=57$, $M=1$ and $M=8$. The Schur algorithm has the lower computational load for all the values of $N_s$, as it exploits also the block Toeplitz structure of the correlation matrix. However, even if the Schur algorithm is adopted, for large data packet size ($N_s=61$) the exact factorization still requires a huge computational load. Therefore the ideal detector (3.2) is not feasible in practical applications and an optimized detector based on an approximate Cholesky factorization, even if suboptimum, is advisable.

| Operation | Computational complexity |
|---|---|
| MF | $8MKN(Q+W-1)$ |
| B/F | $8K\left[KN(2v-1)-vK(v-1)+N\right]$ |
| Chol. fact. BB | $\tfrac{1}{3}K\left[2K^2\left(3va(v-1)+v^2(3-2v)+a-v\right)+6K(v-1)(2a-v)\right]+2MK\left[KQ(v^2-v+1)+(W-1)(K+1)+Q\right]$ |
| Chol. fact. BS | $\tfrac{1}{3}K^2\left[24Ka(v-1)+3a(4v-1)-6vK(2v+1)-6v-K\right]+2K^2M\left[2v(W-1)+Qv(3-v)\right]$ |

Table 3.1: MUD computational complexity using block-type algorithms for the exact Cholesky factorization of $\mathbf{R}$: the Block Band (BB) Cholesky algorithm and the Block Schur (BS) algorithm.

| Operation | Comp. complexity | $M=1$ | $M=8$ |
|---|---|---|---|
| MF | $O(8QvMKN)$ | | |
| B/F | $O(16vK^2N)$ | | |
| Chol. fact. (BB) | $O(2v^2K^3N+2v^2K^2MQ)$ | | |
| Chol. fact. (BS) | $O(8vK^3N+2v^2K^2MQ)$ | | |
| Total MUD (BB) | | | |
| Total MUD (BS) | | | |

Table 3.2: MUD computational complexity (in terms of real multiplications $\times 10^5$) using the Block Band (BB) Cholesky algorithm and the Block Schur (BS) algorithm. The numerical values are chosen as $K=8$, $v=5$, $N_s=61$, $Q=16$, $W=57$, $M=1$ and $M=8$.

Table 3.3: Computational complexity ($\times 10^5$) of the exact decomposition of the correlation matrix, obtained by the standard Cholesky algorithm for block band matrices (BB) and the Schur algorithm for block Toeplitz matrices (BS). The numerical values are chosen as $K = 8$, $v = 5$, $Q = 16$, $W = 57$, $M = 1$ and $M = 8$.

  a        BB algorithm: M = 1, M = 8        BS algorithm: M = 1, M = 8

Approximate detectors

In this section two low-complexity algorithms are considered for the solution of equation (3.2): the approximate Cholesky detector (ACD) and the sliding window detector (SWD). The first method calculates $\mathbf{R}^{1/2}$ from the factorization of an $aK \times aK$ submatrix of $\mathbf{R}$ by exploiting the block banded and block Toeplitz structure of the correlation matrix. The SWD is based on the segmentation of the block $\mathbf{y}_{MF}$ into smaller sub-blocks of $aK$ symbols, where $a$ is the sliding window size (expressed in symbol intervals); this reduces the computation to $N_s - a + 1$ linear systems of size $aK \times aK$. For both algorithms the window size $a$ must be large enough that the edge effects can be neglected; the optimum value depends on the delay spread of the propagation channels ($v$). For $a \to N_s$ the approximate detectors approach the ideal detector.

Approximate Cholesky detector (ACD)

The Cholesky factor $\mathbf{L}$ of a block multidiagonal Toeplitz matrix $\mathbf{R}$ is not itself block Toeplitz, even though it is still block banded. However, it was proved in [62] that, if $N_s \to \infty$ and the number $2v - 1$ of non-zero block diagonals remains finite, the block rows of the Cholesky factor $\mathbf{L}$ converge to the same block row.
An intuitive explanation of this property is the following. Let the decomposition be performed row-wise from the top left corner of $\mathbf{L}$, and let $a$ be the index of the current block row. As $a$ increases, the influence of the edge blocks (in the top left corner of $\mathbf{R}$) on the elements of $\mathbf{L}$ decreases and the block rows of $\mathbf{L}$ no longer change noticeably; hence the Cholesky factor shows the nearly block Toeplitz structure of Figure 3.1. If in (3.2) $N_s$ is large and $v \ll N_s$, the Cholesky factor $\mathbf{L}$ of the correlation matrix $\mathbf{R}$ is nearly block Toeplitz. This property can be exploited by calculating the first $a$ block rows of $\mathbf{L}$ and then copying the last of them until $N_s$ block rows are filled. This solution is particularly suitable for row-oriented factorization algorithms such as those described in the previous section (Schur and Cholesky algorithms): after the top $a$ block rows of the Cholesky factor have been calculated, the decomposition can be stopped [22], [35], [57]. The approximation can also be carried out by computing the first $a$ block columns of $\mathbf{L}$ and then copying the last column obtained until the end of $\mathbf{L}$. This is equivalent to determining the Cholesky factor of the $aK \times aK$ submatrix $\mathbf{R}_a$ shown in Figure 3.2 and then filling the rest with copies from the smaller factor [44]. The factorization of $\mathbf{R}_a$ can be obtained by either the block Schur or the block band Cholesky algorithm. This method will be referred to as the approximate Cholesky decorrelator (ACD). The length $a$ must be chosen large enough to avoid edge effects: $a \geq v$.

Figure 3.2: Description of the ACD method for the approximation of the Cholesky factor.

Sliding window detector (SWD)

Figure 3.3: Sliding window detector.

The sliding window decorrelator (SWD) [33] is an alternative low-complexity solution that approximates the ideal decorrelator (3.2) by a one-shot detector. The $i$th symbol of all users, $\mathbf{d}(i)$, is estimated on the basis of an observation window that spans $a = 2p + 1$ symbol intervals around symbol $i$ (see Figure 3.3); the influence of the other symbols outside this window


is neglected (edge effect). The window length $a$ is also referred to as the memory length of the detector. Let $\mathbf{y}_{MF}(i) \in \mathbb{C}^{aK}$ denote the $i$th observation window, containing the output of the matched filter for the symbols $j \in \{i - p, \ldots, i + p\}$ of all users. Only the symbols in the middle of the window are detected, since they suffer least from the edge effect due to the finite memory length of the detector: first the system $\mathbf{y}_{MF}(i) = \mathbf{R}_a \hat{\mathbf{d}}$ is solved for $\hat{\mathbf{d}}$, and then the $K$ elements in the middle of the vector $\hat{\mathbf{d}}$ are extracted to obtain $\hat{\mathbf{d}}_c(i)$. Hence the truncated decorrelating detector corresponds to

$$\hat{\mathbf{d}}_c(i) = \mathrm{mbr}(\mathbf{R}_a^{-1})\, \mathbf{y}_{MF}(i), \qquad (3.9)$$

where $\mathrm{mbr}(\mathbf{R}_a^{-1})$ denotes the middle block row of the inverse of $\mathbf{R}_a$. For the $(i + 1)$th symbol the processing window slides by one symbol interval and the procedure is repeated. This reduces the computation to $N_s - a + 1$ linear systems (3.1) of size $aK \times aK$, which can be solved by the Cholesky factorization of $\mathbf{R}_a$ followed by appropriate backward and forward substitutions.

An alternative to the truncation is the optimum finite window length detector [33]. In order to take into account also the MAI due to the edge symbols, this method uses the pseudoinverse of the rectangular $aK \times (a + 2v - 1)K$ submatrix of $\mathbf{R}$ (see Figure 3.3) instead of the inverse of the square submatrix $\mathbf{R}_a$; this yields the best least squares solution. However, the improvement in performance with respect to the truncated detector is not significant; moreover, the truncated detector is numerically more stable and can be updated easily in the case of time-variant channels. For these reasons the truncated detector is the preferred choice in many practical applications. The influence of the neighboring symbols outside the window is negligible if the window size $a$ of the SWD algorithm is larger than the bandwidth of the matrix $\mathbf{R}$ (i.e., $a \geq 2v - 1$).
It is worth noticing that the correlation matrix $\mathbf{R}$ is block banded, but its inverse $\mathbf{R}^{-1}$ is not; this is also shown in Figure 3.4. Indeed, as the direct filter $\mathbf{R}$ is FIR, the ideal decorrelator $\mathbf{R}^{-1}$ is an infinite memory length detector or, equivalently, an IIR detector (for large $N_s$). The SWD attempts to limit the memory length of the decorrelator to $a < N_s$ by approximating $\mathbf{R}^{-1}$ with the block banded matrix of bandwidth $a$ illustrated in Figure 3.4. While the SWD operates on the inverse filter, the ACD approximates the direct filter $\mathbf{R}^{1/2}$ and does not truncate the memory length of the decorrelator. For the same size $a$, the edge effects introduced by the SWD are therefore stronger than those due to the ACD, and the SWD is expected to need a larger window size.

Figure 3.4: Approximation of $\mathbf{R}^{-1}$ in the SWD.
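The truncated detector (3.9) can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the toy sizes, the random block taps and the helper names `spreading_matrix` and `swd` are hypothetical, the window is simply clamped at the two ends of the block, and an explicit inverse of $\mathbf{R}_a$ replaces the Cholesky factorization with forward/backward substitution to keep the code short.

```python
import numpy as np

def spreading_matrix(taps, n_sym):
    """Stack the v block taps (each Q x K) into a block-Toeplitz matrix S."""
    v, Q, K = taps.shape
    col = taps.reshape(v * Q, K)                 # one block column of S
    S = np.zeros(((n_sym + v - 1) * Q, n_sym * K), dtype=complex)
    for n in range(n_sym):
        S[n * Q:(n + v) * Q, n * K:(n + 1) * K] = col
    return S

def swd(R, y_mf, K, a):
    """Truncated sliding window decorrelator, one symbol per window (s = 1)."""
    n_sym = R.shape[0] // K
    p = a // 2                                   # window size a = 2p + 1
    Ra_inv = np.linalg.inv(R[:a * K, :a * K])    # R_a is the same for every window
    d_hat = np.empty(n_sym * K, dtype=complex)
    for i in range(n_sym):
        start = min(max(i - p, 0), n_sym - a)    # keep the window inside the block
        m = i - start                            # block row to extract (middle row away from the edges)
        y_win = y_mf[start * K:(start + a) * K]
        d_hat[i * K:(i + 1) * K] = Ra_inv[m * K:(m + 1) * K, :] @ y_win
    return d_hat

rng = np.random.default_rng(0)
K, Q, v, n_sym = 2, 8, 2, 21                     # toy sizes, not the UTRA values
taps = rng.standard_normal((v, Q, K)) + 1j * rng.standard_normal((v, Q, K))
S = spreading_matrix(taps, n_sym)
R = S.conj().T @ S                               # block banded, block Toeplitz
d = rng.choice([-1.0, 1.0], size=n_sym * K)
y_mf = R @ d                                     # noise-free matched filter output

# Relative detection error versus window size a
swd_error = {a: np.linalg.norm(swd(R, y_mf, K, a) - d) / np.linalg.norm(d)
             for a in (3, 7, 11, n_sym)}
```

With `a = n_sym` the window covers the whole block and the sketch reduces to the ideal decorrelator; for smaller windows the residual error reflects the neglected edge symbols.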

Computational complexity and performance analysis

In this section the SWD and ACD algorithms are compared in terms of computational complexity and performance. The key is the choice of the parameter $a$, which should be large enough to make negligible the edge effect, which in turn depends on the channel length (or equivalently on $v$). The computational load is given in terms of real multiplications for two data blocks of $N_s$ symbols, assuming that the sparsity of the matrices $\mathbf{R}$, $\mathbf{L}$, $\mathbf{S}$ is fully exploited. Both ACD and SWD require the computation of $2N_sK$ samples of the output of the S-T matched filter and the Cholesky factorization of the $aK \times aK$ matrix $\mathbf{R}_a$. The latter is carried out only once for the overall burst by the Schur algorithm, hence it requires $O(8K^3av)$ real multiplications. In the ACD, the large Cholesky factor is derived from $\mathbf{R}_a^{1/2}$ by copying the last column; then the overall system (3.2) is solved by forward and backward substitution. These operations require $O(16vK^2N_s)$ real multiplications for 2 data blocks of $N_s$ symbols. Likewise, in the SWD the central symbol of each window is detected for all $K$ users by performing first a forward substitution and then a partial backward substitution of dimension $aK$ (in the first window the backward substitution must be completed). The B/F processing of the $N_s - a + 1$ windows requires $O(12vK^2aN_s)$ real multiplications for $v \ll a \ll N_s$, as $2N_s$ forward substitutions and $2N_s$ partial backward substitutions are performed (the computational complexity of the partial BS can be approximated by half that of the FS). For large processing windows the overall computational load of the SWD can be too expensive when compared with the ACD. For this reason, we propose to modify the one-shot algorithm reviewed above by detecting $s$ symbols within each window ($1 \leq s \leq a$).
At each step the window slides by $s$ symbol intervals; therefore the number of windows decreases to $N_w = (N_s - (a - s))/s \approx N_s/s$ and the computational complexity reduces to $O(16vK^2 a N_s/s)$ for $v \ll a \ll N_s$. In order to reduce the edge effects, the parameter $s$ should be selected so that $a \geq s + 2(v - 1)$ (note that the ISI spans $v - 1$ symbol intervals). Table 3.4 shows the computational complexity of each step of ACD and SWD (only the significant terms are shown). For the SWD the complexity is derived by assuming $2v - 1 \leq a \ll N_s$ and neglecting the terms that are quadratic in $K$. In Tables 3.5 and 3.6 the computational complexity is calculated for $K = 8$, $v = 5$, $N_s = 61$, $Q = 16$, $W = 57$ and for $M = 1$ and $M = 8$ respectively (the numerical values are obtained by also considering the terms neglected in Table 3.4). Moreover, the values of $(a, s)$ are selected such that $a = s + 4$; these parameters are suited for an overall channel length of $v = 3$ symbol intervals or, equivalently, an ISI that spans 2 symbols, as for the TU environment defined by the COST-207 channel model (channel length of 5 µs) [12].

Table 3.4: Computational complexity of the MUD with ACD and SWD methods.

  Operation        Computational complexity                               Leading term
  MF               $8MKN_s(Q + W - 1)$                                    $O(8QvMKN_s)$
  Cholesky fact.   $2K^3[4a(v - 1) - v(2v - 1)] + 2K^2MQv(v + 1)$         $O(8vK^3a)$
  B/F SWD          $N_w[3a + s + K(2v - 1)(3a + s) - 4Kv(v - 1)]$         $O(16vK^2 a N_s/s)$
  B/F ACD          $8K(2KN_sv + vK - Kv^2 + N_s - N_sK)$                  $O(16vK^2N_s)$

In Figure 3.5 the performance of the ACD and SWD algorithms for MUD is compared in terms of BER for uncoded bits vs. $E_b/N_0$. The channel model adopted here is the COST-207 bad urban (BU) multipath propagation channel with 12 rays [12]. For the sake of simplicity, we have used $M = 1$ in the simulations, as the cost of the factorization of the matrix $\mathbf{R}$ is almost independent of $M$ (the number of antennas $M$ mainly affects the size of the matched filtering). In this example, the choice $a = 9$ in the SWD algorithm is enough to have a negligible impact on performance with respect to the ACD (characterized by $a = 5$), while requiring only about 20% more calculations (see Table 3.5). Figure 3.6 shows how the MUD algorithms approximate the exact detector as a function of the windowing parameter $a$. In Figures 3.7 and 3.8 the performance of ACD and SWD is compared for the COST-207 typical urban (TU) multipath propagation channel with 12 rays. As the TU channel has a lower delay spread (5 µs, $v = 3$) than the BU channel (10 µs, $v = 4$), it requires a smaller window size. Indeed, Figure 3.8 indicates that the approximation of the detector has no observable effect on the performance if $a = 3$ is selected for the ACD and $a = 7$ with $s = 3$ for the SWD. It can be concluded from the figures that the SWD converges more slowly than the ACD as $a$ increases. Hence, to obtain the same performance, the SWD must adopt a larger windowing parameter $a$ and requires a slightly greater computational complexity than the ACD.

Table 3.5: Computational complexity (in terms of real multiplications $\times 10^5$) of the approximate detectors SWD and ACD. The parameters adopted are: $K = 8$, $v = 5$, $N_s = 61$, $Q = 16$, $W = 57$, $M = 1$.

  a    s    Cholesky fact.    MF    B/F (SWD, ACD)    Total MUD (SWD, ACD)

Table 3.6: Computational complexity (in terms of real multiplications $\times 10^5$) of the approximate detectors SWD and ACD. The parameters adopted are the same as in Table 3.5, with $M = 8$.

  a    s    Cholesky fact.    MF    B/F (SWD, ACD)    Total MUD (SWD, ACD)
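As a rough worked example, the leading-order terms quoted in the text can be evaluated for the TDD-UTRA parameter values. The function names below are hypothetical, and the pair $(a, s) = (9, 3)$ is only an illustrative choice. Since these are asymptotic terms (valid for $v \ll a \ll N_s$), the resulting figures are order-of-magnitude estimates and are not expected to reproduce the table entries, which also include the neglected terms.

```python
def n_windows(n_s, a, s):
    """Number of windows per data block, N_w = (N_s - (a - s)) / s."""
    return (n_s - (a - s)) / s

def leading_terms(K, v, n_s, Q, M, a, s):
    """Dominant complexity terms, in real multiplications for two data blocks."""
    return {
        "MF": 8 * Q * v * M * K * n_s,
        "Cholesky": 8 * v * K**3 * a,
        "B/F ACD": 16 * v * K**2 * n_s,
        "B/F SWD": 16 * v * K**2 * a * n_s / s,
    }

# Parameter values from the text; (a, s) = (9, 3) is an illustrative choice
costs = leading_terms(K=8, v=5, n_s=61, Q=16, M=1, a=9, s=3)
for name, value in costs.items():
    print(f"{name:10s} {value / 1e5:6.2f} x 10^5 real multiplications")
```

For the one-shot case $s = 1$, `n_windows(61, 7, 1)` returns 55, which agrees with the $N_s - a + 1$ windows of the original sliding window scheme.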

Figure 3.5: Performance of MUD in terms of BER vs. SNR ($E_b/N_0$) for SWD and ACD (BU environment, burst type 1, $N_s = 61$, $W = 57$, $Q = 16$, $K = 8$, $M = 1$, AWGN noise). SWD curves are shown for $(a, s)$ = (3,1), (5,1), (7,1), (9,3), (11,5), (13,7), (15,9), (17,11), (61,61).

Figure 3.6: Performance of MUD in terms of BER vs. $a$ for ACD and SWD (SNR = 8, 10, 12 dB), using the same parameter values as in Fig. 3.5.

Figure 3.7: Performance of MUD in terms of BER vs. SNR ($E_b/N_0$) for SWD and ACD (TU environment, burst type 1, $N_s = 61$, $W = 57$, $Q = 16$, $K = 8$, $M = 1$, AWGN noise). SWD curves are shown for $(a, s)$ = (3,1), (5,1), (7,3), (9,5), (11,7), (13,9), (61,61).

Figure 3.8: Performance of MUD in terms of BER vs. $a$ for ACD and SWD (SNR = 8, 10, 12 dB), using the same parameter values as in Fig. 3.7.
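The dependence of the ACD approximation on the parameter $a$ can also be examined numerically. The sketch below is an illustration under assumptions (toy sizes, random block taps, hypothetical helper names `correlation_matrix` and `acd_factor`): it factorizes the $aK \times aK$ submatrix $\mathbf{R}_a$, fills the remaining block rows with shifted copies of the last computed block row, and measures how well $\hat{\mathbf{L}}\hat{\mathbf{L}}^H$ reproduces $\mathbf{R}$.

```python
import numpy as np

def correlation_matrix(taps, n_sym):
    """R = S^H S for a block-Toeplitz S built from v block taps (each Q x K)."""
    v, Q, K = taps.shape
    S = np.zeros(((n_sym + v - 1) * Q, n_sym * K), dtype=complex)
    for n in range(n_sym):
        S[n * Q:(n + v) * Q, n * K:(n + 1) * K] = taps.reshape(v * Q, K)
    return S.conj().T @ S

def acd_factor(R, K, a, v):
    """Approximate Cholesky factor: exact on the first a block rows, then
    shifted copies of the last computed block row (requires a >= v)."""
    n_sym = R.shape[0] // K
    La = np.linalg.cholesky(R[:a * K, :a * K])   # lower factor of R_a
    L_hat = np.zeros_like(R)
    L_hat[:a * K, :a * K] = La
    last = La[(a - 1) * K:, (a - v) * K:]        # K x vK non-zero part of the last block row
    for n in range(a, n_sym):
        L_hat[n * K:(n + 1) * K, (n - v + 1) * K:(n + 1) * K] = last
    return L_hat

rng = np.random.default_rng(0)
K, Q, v, n_sym = 2, 8, 2, 20                     # toy sizes, not the UTRA values
taps = rng.standard_normal((v, Q, K)) + 1j * rng.standard_normal((v, Q, K))
R = correlation_matrix(taps, n_sym)

# Relative error of the approximate factorization versus a
acd_error = {}
for a in (2, 4, 8, n_sym):
    L_hat = acd_factor(R, K, a, v)
    acd_error[a] = np.linalg.norm(L_hat @ L_hat.conj().T - R) / np.linalg.norm(R)
```

For `a = n_sym` no block row is copied and the factorization is exact; detection would then proceed by forward and backward substitution with $\hat{\mathbf{L}}$ and $\hat{\mathbf{L}}^H$.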

Systolic array

By exploiting the inherent parallelism of the SWD algorithm, a pipelined architecture can be implemented. The systolic architecture shown in Figure 3.9 has been designed. We use an orthogonal array with $aK \times (aK + 1)$ cells composed of three different processor types. The array is fed with the matrix $\mathbf{R}_a$ and the filtered vectors $\mathbf{y}_{MF}(i)$, properly interleaved and arranged to form the matrix $\mathbf{B}$ described in Figure 3.10 for $s = 1$. Except for the first and last window, only $Ks$ columns of the output matrix $\hat{\mathbf{D}}$ are necessary to estimate the vector $\hat{\mathbf{d}}$. Both the size and the computation time of the circuit have been evaluated in terms of the algorithm parameters ($3 \leq a \leq 17$, $N_s = 61$, $W = 57$, $Q = 16$, $K = 8$, $1 \leq s \leq 11$, $M = 8$). The number of cells $N_p = aK(aK + 1)$ grows from 600 (for $a = 3$) to its maximum at $a = 17$, while the number of clock cycles $N_c$ required to process both semi-bursts increases linearly according to $N_c = 4aK + N_w - 2$. For instance, if $a = 7$ and $s = 1$, then $N_p = 3192$ and $N_c = 332$ clock cycles. Due to the low number of cycles and the high number of cells required, many architectural trade-offs can be exploited to keep the chip size at a reasonable level. The array design is being specified using VHDL, while its FPGA implementation needs further investigation.

In spite of the computational complexity disadvantage of the SWD algorithm with respect to the other methods previously described, its hardware implementation has several attractive properties: i) the possibility of easily implementing an adaptive receiver; ii) the possibility of choosing the order of processing for the $K$ users (e.g., assigning a user priority); iii) the possibility of fully optimizing the matrix inversion algorithm. In fact, due to the small dimensions of the matrix $\mathbf{R}_a$, the array can be easily updated in real time using the channel estimation information.
In addition, the algorithm used by the hardware system to compute $\mathbf{R}_a^{-1}$ may be optimized to match the required area-speed constraints. For instance, the proposed systolic array implements the Jordan diagonalization [60] in order to limit the use of multiplications and divisions.

Figure 3.9: Systolic array implementation of S-T MUD (input matrix $\mathbf{B}$, cells $P_{i,j}$ with $i = 1, \ldots, aK$ and $j = 1, \ldots, aK + 1$, output matrix $\hat{\mathbf{D}}$).
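The array dimensions quoted above can be checked with a few lines of code, together with a software analogue of the Jordan diagonalization. This is only a sketch under assumptions: the function names are hypothetical, $N_w$ is taken here as the number of windows over both semi-bursts, $2((N_s - a)/s + 1)$, an interpretation chosen because it reproduces the quoted $N_c = 332$, and the elimination runs without pivoting, which presumes a Hermitian positive definite $\mathbf{R}_a$. It does not model the cell-level behaviour of the systolic array.

```python
import numpy as np

def systolic_cells(a, K):
    """Number of cells N_p = aK (aK + 1)."""
    return a * K * (a * K + 1)

def systolic_cycles(a, K, n_s, s=1):
    """Clock cycles N_c = 4aK + N_w - 2, with N_w assumed to count the
    windows of both semi-bursts."""
    n_w = 2 * ((n_s - a) // s + 1)
    return 4 * a * K + n_w - 2

def gauss_jordan_solve(A, Y):
    """Reduce the augmented matrix [A | Y] to [I | X] by row operations
    (no pivoting: A is assumed Hermitian positive definite)."""
    n = A.shape[0]
    T = np.hstack([A, Y]).astype(complex)
    for k in range(n):
        T[k] = T[k] / T[k, k]                    # normalize the pivot row
        for r in range(n):
            if r != k:
                T[r] = T[r] - T[r, k] * T[k]     # clear column k in every other row
    return T[:, n:]

print(systolic_cells(7, 8), systolic_cycles(7, 8, 61))
```

The right-hand side `Y` may hold several windows side by side, which mirrors the way the filtered vectors are interleaved with $\mathbf{R}_a$ at the array input.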

Figure 3.10: Input matrix arrangement for the systolic implementation of Figure 3.9 when $s = 1$. Only one semi-burst is shown.

3.6 Conclusions

The comparison of two approximate methods for space-time MUD in the TDD-UTRA system has shown that: i) the cost of the matched filter dominates in S-T processing; however, its high degree of parallelism can be exploited for an efficient implementation; ii) block-type Cholesky factorization is computationally more efficient than the SWD for slowly varying channels, whereas the SWD is the preferred choice in adaptive multi-user detection when the channel variations must be tracked within the bursts; iii) the SWD algorithm can be easily implemented with pipelined architectures that exploit its implicit parallelism.

Chapter 4

Reduced rank channel model

4.1 Introduction

The use of multiple antennas at the base station enables the exploration of the spatial dimension of the propagation. By exploiting this extra dimension, space-time processing can reduce co-channel interference and improve system capacity, coverage and quality. However, while for the single antenna receiver the channel is basically modelled by a $W$-tap FIR filter, for the multiple antenna receiver the number of FIR parameters rises to $MW$ ($M$ is the number of antennas). This increased complexity of the parametrization has two undesired effects. On the one hand, the joint space-time equalization that handles the full space-time dimensionality has a high computational complexity. On the other hand, as the number of training symbols in each burst is limited, the estimate of the overall set of channel parameters is affected by a large variance. According to the principle of parsimony, the number of parameters should be restricted to those that are really needed to describe the channel model.

In this chapter the number of unknowns is reduced by exploiting the low rank property of the space-time propagation. A rank analysis on typical outdoor propagation environments shows that the space-time matrix describing the channel does not use all its available degrees of freedom. This implies that the channel matrix is low rank. We thus adopt the reduced rank channel model and consider only a few significant space-time dimensions of the propagation. Chapters 5-6 will show that this parsimonious but effective parameterization not only improves the channel estimation performance, but also reduces the complexity of the structure of space-time equalization.
4.2 Reduced rank property of the propagation channel

In the uplink of a mobile radio system the baseband channel between the user transmitter and the antenna array receiver is represented by the $M \times 1$ continuous time vector

$$\mathbf{h}(t) = [h_1(t) \cdots h_M(t)]^T, \qquad (4.1)$$

which captures the effects of array response, symbol waveform and path fading; $h_m(t)$ is the channel impulse response for the link between the user and the $m$th antenna. Let $\mathbf{h}(t)$ be sampled at the chip rate, and collect $W$ consecutive snapshots of $\mathbf{h}(t)$, corresponding to the time instants $t_i = t_0 + iT_c$, $i = 0, \ldots, W - 1$, to get the $M \times W$ space-time matrix $\mathbf{H}$:

$$\mathbf{H} = [\mathbf{h}(t_0) \cdots \mathbf{h}(t_{W-1})]. \qquad (4.2)$$

The propagation channel is modelled by the $MW$ parameters in the space-time matrix (4.2). However, in many cases, even if the propagation has a rather complex structure, these parameters are correlated and are therefore more numerous than those really needed to describe the space-time channel. Parsimonious parametrization is classically obtained by exploiting the multipath structure of the propagation channel. The baseband channel $\mathbf{h}(t)$ is described by the combination of $P$ paths, each characterized by the direction of arrival $\vartheta_p$, the delay $\tau_p$ and the complex fading amplitude $\alpha_p$:

$$\mathbf{h}(t) = \sum_{p=1}^{P} \alpha_p\, \mathbf{a}(\vartheta_p)\, g(t - \tau_p). \qquad (4.3)$$

Here the path parameters are assumed to be time invariant. The waveform $g(t) = \psi(t) * \psi(-t)$ represents the convolution of the transmitted pulse and the matched filter at the receiver, while $\mathbf{a}(\vartheta_p)$ denotes the $M \times 1$ array response to the narrowband signal impinging from the direction $\vartheta_p$:

$$\mathbf{a}(\vartheta_p) = [a_1(\vartheta_p) \cdots a_M(\vartheta_p)]^T. \qquad (4.4)$$

The discrete time channel is obtained by sampling (4.1) at the chip rate:

$$\mathbf{H} = \sum_{p=1}^{P} \alpha_p\, \mathbf{a}(\vartheta_p)\, \mathbf{g}(\tau_p)^T, \qquad (4.5)$$

where the real valued column vector $\mathbf{g}(\tau_p)$ contains a sampled version of the waveform $g(t)$ delayed by $\tau_p$,

$$\mathbf{g}(\tau_p) = [g(t_0 - \tau_p) \;\; g(t_0 + T_c - \tau_p) \;\cdots\; g(t_0 + (W - 1)T_c - \tau_p)]^T. \qquad (4.6)$$

According to the discrete model (4.5), the channel matrix can be rewritten as

$$\mathbf{H} = [\mathbf{a}(\vartheta_1) \cdots \mathbf{a}(\vartheta_P)]\; \mathrm{diag}[\alpha_1 \cdots \alpha_P]\; [\mathbf{g}(\tau_1) \cdots \mathbf{g}(\tau_P)]^T = \mathbf{F}(\boldsymbol{\vartheta})\, \boldsymbol{\Lambda}(\boldsymbol{\alpha})\, \mathbf{G}^T(\boldsymbol{\tau}), \qquad (4.7)$$

where $\mathbf{F}(\boldsymbol{\vartheta}) = [\mathbf{a}(\vartheta_1) \cdots \mathbf{a}(\vartheta_P)]$ denotes the array response, $\mathbf{G}(\boldsymbol{\tau}) = [\mathbf{g}(\tau_1) \cdots \mathbf{g}(\tau_P)]$ is the time response, and $\boldsymbol{\Lambda}(\boldsymbol{\alpha}) = \mathrm{diag}[\alpha_1 \cdots \alpha_P]$ embodies the fading amplitudes. With respect to the unstructured description (4.1), the structured model (4.7) reduces the parameter set to the $3P$ multipath variables $\boldsymbol{\vartheta} = [\vartheta_1 \cdots \vartheta_P]$, $\boldsymbol{\alpha} = [\alpha_1 \cdots \alpha_P]$ and $\boldsymbol{\tau} = [\tau_1 \cdots \tau_P]$. The multipath unknowns could be estimated by any angle/delay estimation algorithm, such as [56].
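The equivalence between the sum of paths (4.5) and the factorized form (4.7) can be verified with a short numerical sketch. The array geometry and the pulse are assumptions made only for illustration: a uniform linear array with half-wavelength spacing for $\mathbf{a}(\vartheta)$ and a raised-cosine waveform with roll-off 0.22 for $g(t)$, with delays expressed in chip periods. The path values and the function names are hypothetical.

```python
import numpy as np

def raised_cosine(t, beta=0.22):
    """Raised-cosine pulse g(t), with t in units of the chip period T_c."""
    t = np.asarray(t, dtype=float)
    denom = 1.0 - (2.0 * beta * t) ** 2
    singular = np.isclose(denom, 0.0)            # t = +/- 1/(2 beta)
    g = np.sinc(t) * np.cos(np.pi * beta * t) / np.where(singular, 1.0, denom)
    return np.where(singular, (np.pi / 4) * np.sinc(1.0 / (2.0 * beta)), g)

def steering(theta, M):
    """Assumed array response: uniform linear array, half-wavelength spacing."""
    return np.exp(-1j * np.pi * np.arange(M) * np.sin(theta))

def channel_matrix(alpha, theta, tau, M, W, t0=0.0):
    """Space-time matrix H = F(theta) diag(alpha) G(tau)^T, as in (4.7)."""
    F = np.stack([steering(th, M) for th in theta], axis=1)        # M x P
    t = t0 + np.arange(W)                                          # chip-rate sampling
    G = np.stack([raised_cosine(t - ta) for ta in tau], axis=1)    # W x P
    return F @ np.diag(alpha) @ G.T, F, G

rng = np.random.default_rng(1)
M, W, P = 8, 57, 3
theta = np.array([0.1, 0.5, -0.4])               # directions of arrival [rad]
tau = np.array([2.3, 7.1, 15.6])                 # delays [chips]
alpha = (rng.standard_normal(P) + 1j * rng.standard_normal(P)) / np.sqrt(2)

H, F, G = channel_matrix(alpha, theta, tau, M, W)
# Direct evaluation of the sum of P rank-1 terms in (4.5)
H_sum = sum(alpha[p] * np.outer(F[:, p], G[:, p]) for p in range(P))
```

Both constructions give the same $M \times W$ matrix, whose rank cannot exceed the number of paths $P$.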
However, here we choose not to model the channel directly in terms of delays, and consider an alternative approach that guarantees increased robustness against mismodelling (due, for instance, to antenna calibration problems) and lower computational complexity for channel estimation. The unstructured approach considered here is derived from the multipath model (4.7) by exploiting the low rank property of the channel matrix $\mathbf{H}$. Usually the paths in (4.7) can be grouped into a few clusters of scatterers whose angle/delay spread is below the angular/temporal resolution of the system. When this occurs, the number of degrees of freedom in the matrix $\mathbf{H}$ is less than $MW$ and the channel matrix loses rank, that is, $r_0 = \mathrm{rank}(\mathbf{H}) < \min(M, W)$. Below we consider some simple examples to outline what kind of propagation environment has this low rank property.

Figure 4.1: Examples of rank-1 channels. Left: rank 1 determined by the angle manifold (single cluster of 3 paths with $\vartheta_1 = \vartheta_2 = \vartheta_3 = \vartheta$; $r_S = 1$ resolvable angle, $r_T = 3$ resolvable delays). Right: rank 1 determined by the delay manifold (single cluster of 3 paths with $\tau_1 = \tau_2 = \tau_3 = \tau$; $r_S = 3$, $r_T = 1$). In both cases $\mathbf{H} = \mathbf{a}\mathbf{b}^H$, with $\mathbf{a}$ of size $M \times 1$ and $\mathbf{b}$ of size $W \times 1$.

For instance, a rank-1 channel ($r_0 = 1$) is obtained from (4.5) when there is no angular spread, that is, $\vartheta_p = \vartheta$, $\forall p$:

$$\mathbf{H} = \sum_{p=1}^{P} \alpha_p\, \mathbf{a}(\vartheta)\, \mathbf{g}(\tau_p)^T = \mathbf{a}(\vartheta) \sum_{p=1}^{P} \alpha_p\, \mathbf{g}(\tau_p)^T = \mathbf{a}\mathbf{b}^H. \qquad (4.8)$$

The channel is thus described by a single spatial signature $\mathbf{a} = \mathbf{a}(\vartheta)$ and a single FIR filter $\mathbf{b} = \sum_{p=1}^{P} \alpha_p^*\, \mathbf{g}(\tau_p)$ (see Figure 4.1 on the left). The same holds if the channel has no delay spread, namely $\tau_p = \tau$, $\forall p$:

$$\mathbf{H} = \sum_{p=1}^{P} \alpha_p\, \mathbf{a}(\vartheta_p)\, \mathbf{g}(\tau)^T = \left[\sum_{p=1}^{P} \alpha_p\, \mathbf{a}(\vartheta_p)\right] \mathbf{g}(\tau)^T = \mathbf{a}\mathbf{b}^H, \qquad (4.9)$$

where $\mathbf{a} = \sum_{p=1}^{P} \alpha_p\, \mathbf{a}(\vartheta_p)$ and $\mathbf{b} = \mathbf{g}(\tau)$ is real valued (see Figure 4.1 on the right). This type of rank-1 channel seems to occur rarely in realistic propagation environments, as the channel impulse response is always characterized by both angular and temporal dispersion. However, paths can usually be grouped into clusters of scatterers having low angle/delay spread, i.e., clusters that can no longer be resolved in $\tau$ or $\vartheta$ given the resolution of the array and the pulse bandwidth. In this case the rank order $r_0$ of the channel matrix is approximately given by the number of clusters, which is usually lower than $\min(W, M)$. Each spatial (temporal) vector $\mathbf{a}_r$ ($\mathbf{b}_r$) represents the equivalent spatial (temporal) signature of the $r$th cluster and is obtained as the linear combination of the signatures of all the paths composing the cluster. The rank-$r_0$ channel matrix can thus be rewritten as the product of two full-rank matrices,

$$\mathbf{H} = \sum_{r=1}^{r_0} \mathbf{a}_r \mathbf{b}_r^H = \mathbf{A}\mathbf{B}^H, \qquad (4.10)$$

where $\mathbf{A} = [\mathbf{a}_1, \ldots, \mathbf{a}_{r_0}]$ and $\mathbf{B} = [\mathbf{b}_1, \ldots, \mathbf{b}_{r_0}]$ describe the space and time structure of the channel and have dimensions $M \times r_0$ and $W \times r_0$, respectively. Recalling the multipath model (4.7), it can be observed that $r_0 = \min(r_S, r_T)$, where $r_S = \mathrm{rank}(\mathbf{F}(\boldsymbol{\vartheta}))$ equals the number of resolvable angles and $r_T = \mathrm{rank}(\mathbf{G}(\boldsymbol{\tau}))$ corresponds to the number of resolvable delays. For instance, for a wideband system such as UMTS the temporal resolution is high, so the delay spread is usually much larger than $T_c$ and therefore $r_T \geq r_S$: the rank order is determined by the spatial manifold. By using the low rank model, the number of parameters to be estimated is reduced to $r_0(M + W) \ll MW$. It should be noticed that the number of degrees of freedom of $\mathbf{H}$ is $r_0(M + W - r_0)$. Indeed, the rank-$r_0$ matrix is uniquely identified by its SVD, which is made of $r_0 M - r_0(r_0 + 1)/2$ free parameters for the left singular vectors, $r_0 W - r_0(r_0 + 1)/2$ for the right singular vectors, and $r_0$ singular values.

4.3 Rank analysis for multipath propagation

We consider the feasibility of the reduced rank model for the multipath propagation channel (4.5) and study the rank properties of the space-time matrix $\mathbf{H}$ for varying angular/delay spread. The analytical tool used here to evaluate the effective rank order is the SVD,

$$\mathbf{H} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H = \sum_{i=1}^{r_0} \mathbf{u}_i \sigma_i \mathbf{v}_i^H, \qquad r_0 = \min(M, W), \qquad (4.11)$$

where $\boldsymbol{\Sigma} = \mathrm{diag}[\sigma_1 \cdots \sigma_{r_0}]$ denotes the $r_0 \times r_0$ diagonal matrix of the singular values, sorted in non-increasing order, $\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_{r_0}$. The corresponding left and right singular vectors are $\mathbf{U} = [\mathbf{u}_1 \cdots \mathbf{u}_{r_0}]$ and $\mathbf{V} = [\mathbf{v}_1 \cdots \mathbf{v}_{r_0}]$. According to the channel definition, $\mathbf{U}$ and $\mathbf{V}$ represent the spatial and temporal bases of the propagation. The best rank-$r$ approximation of the matrix $\mathbf{H}$, in the Frobenius norm sense (see Appendix 4.A), is obtained by truncating the SVD to the first $r \leq r_0$ singular values:

$$\mathbf{H}_r = \mathbf{U}_r \boldsymbol{\Sigma}_r \mathbf{V}_r^H = \sum_{i=1}^{r} \mathbf{u}_i \sigma_i \mathbf{v}_i^H, \qquad r \leq r_0, \qquad (4.12)$$

where $\boldsymbol{\Sigma}_r = \mathrm{diag}[\sigma_1 \cdots \sigma_r]$, $\mathbf{U}_r = [\mathbf{u}_1 \cdots \mathbf{u}_r]$ and $\mathbf{V}_r = [\mathbf{v}_1 \cdots \mathbf{v}_r]$.
The normalized square error of the approximation is thus given by

$$\varepsilon(r) = \frac{\|\mathbf{H} - \mathbf{H}_r\|^2}{\|\mathbf{H}\|^2} = \frac{\sum_{i=r+1}^{r_0} \sigma_i^2}{\sum_{i=1}^{r_0} \sigma_i^2}. \qquad (4.13)$$

For a given $\varepsilon$, we say that the channel can be approximated by rank $r$ if $r$ is the minimum integer that guarantees $\varepsilon(r) \leq \varepsilon$. We may observe that, when the singular values are widely spread, the error reduces to the first discarded singular value ($\sigma_{r+1}^2$), that is,

$$\varepsilon(r) \approx \frac{\sigma_{r+1}^2}{\sum_{i=1}^{r_0} \sigma_i^2} \leq \varepsilon. \qquad (4.14)$$

The channel used for the rank analysis is obtained from a stochastic multipath model, which is illustrated in Figure 4.2. The paths in (4.5) are grouped into $N_g$ independent clusters, each composed of $N_r$ scatterers:

$$\mathbf{H} = \sum_{i=1}^{N_g} \sum_{p=1}^{N_r} \alpha_{i,p}\, \mathbf{a}(\vartheta_{i,p})\, \mathbf{g}(\tau_{i,p})^T, \qquad (4.15)$$

Figure 4.2: Stochastic multipath channel model (base station, mobile station and two clusters of scatterers; for path $p$ of cluster $i$ the figure reports the delay and azimuth pdfs and the power profile function).

where $\alpha_{i,p}$, $\vartheta_{i,p}$ and $\tau_{i,p}$ denote the parameters of the $p$th path of the $i$th cluster. According to the wide sense stationary uncorrelated scattering (WSSUS) assumption, within each cluster $\alpha_{i,p}$, $\vartheta_{i,p}$ and $\tau_{i,p}$ are i.i.d. random variables. The probability density functions are chosen from the COST-259 Directional Channel Model (see Appendix A), while the number of clusters $N_g$ and the number of paths $N_r$ are deterministic. The following assumptions are made.

For the $i$th cluster, the mean values of the delays ($\tau_i$), the angles ($\vartheta_i$) and the powers ($P_i$) follow the distributions

$$\vartheta_i \sim U[-\pi/3, \pi/3], \qquad i = 1, \ldots, N_g \qquad (4.16)$$
$$\tau_i \sim \tau_1 + U[0, \tau_{\max}], \qquad i = 2, \ldots, N_g \qquad (4.17)$$
$$P_i\,[\mathrm{dB}] \sim N(L_i, \sigma_p^2), \qquad i = 1, \ldots, N_g \qquad (4.18)$$

where $\tau_{\max} = 6.5\,\mu$s is the maximum delay. The delay of the first cluster (line of sight cluster) is $\tau_1 = d/c$, where $d$ denotes the distance between the mobile station and the base station. The power of each group, $P_i$, has an independent shadow fading, which is lognormal with standard deviation $\sigma_p = 9$ dB. The path loss depends on the distance $d_i$ between the $i$th cluster and the base station and on the path loss exponent $n = 3.8$: $L_i \propto d_i^{-n}$. The clusters are assumed to be uniformly distributed in the cell area, with radius 2 km.

For each cluster, the delay spread $\sigma_{\tau,i}$ and the angular spread $\sigma_{\vartheta,i}$ are uniformly distributed,

$$\sigma_{\vartheta,i} \sim U[0, \sigma_{\vartheta,\max}] \qquad (4.19)$$
$$\sigma_{\tau,i} \sim U[0, \sigma_{\tau,\max}] \qquad (4.20)$$

where $\sigma_{\vartheta,\max}$ and $\sigma_{\tau,\max}$ are the maximum spread values.

Figure 4.3: Rank analysis for clustered multipath channels: cdf of the rank order $r$ for varying (a) delay spread, (b) angular (azimuth) spread, (c) number of paths within each cluster, (d) number of clusters.

According to [55], within the $i$th cluster the $p$th path has delay $\tau_{i,p} = \tau_i + \Delta\tau_{i,p}$, angle $\vartheta_{i,p} = \vartheta_i + \Delta\vartheta_{i,p}$ and amplitude $\alpha_{i,p}$, distributed as

$$\Delta\vartheta_{i,p} \sim N(0, (1.38\,\sigma_{\vartheta,i})^2), \qquad i = 1, \ldots, N_g \qquad (4.21)$$
$$\Delta\tau_{i,p} \sim \mathrm{Exponential}(1.17\,\sigma_{\tau,i}), \qquad i = 1, \ldots, N_g \qquad (4.22)$$
$$\alpha_{i,p} \sim \mathrm{Rayleigh}(P_i\, P(\Delta\vartheta_{i,p}, \Delta\tau_{i,p})), \qquad i = 1, \ldots, N_g \qquad (4.23)$$

where $P(\Delta\vartheta_{i,p}, \Delta\tau_{i,p})$ is the power delay profile, which is exponential both in $\Delta\vartheta_{i,p}/\sigma_{\vartheta,i}$ and in $\Delta\tau_{i,p}/\sigma_{\tau,i}$.

A rank analysis of the channel model described above is carried out for the UMTS system, with bandwidth $1/T_c = 3.84$ MHz and carrier frequency $f_c = 1950$ MHz. We consider $M = 8$ antennas and use the maximum channel length that can be estimated by the UTRA midamble, $W = 57$. It should be noticed that this value is chosen rather large, as UMTS is intended to work in a broad range of environments. This is the reason why in many situations the full rank parameterization turns out to be too redundant. The cumulative distribution function of the rank order is calculated from 200 independent channel realizations. For each run, the rank order is evaluated by selecting the approximation that guarantees $\varepsilon(r) \leq -20$ dB. Figure 4.3 shows the cumulative distribution function of the rank order for a multipath channel having delay spread $\sigma_{\tau,\max} = 1\,\mu$s, angular spread $\sigma_{\vartheta,\max} = 10$ deg, number of paths


$N_r = 10$, number of clusters $N_g = 1$. In each plot of Figure 4.3 one parameter is varied while the others are fixed: (a) $\sigma_{\tau,\max} = \{0.2, 1, 2\}\,\mu$s, (b) $\sigma_{\vartheta,\max} = \{5, 10, 15\}$ deg, (c) $N_r = \{5, 10, 25\}$, (d) $N_g = \{1, 2, 3\}$. The overall delay/angular spread of the channel is determined by the ensemble of the four parameters. Large values determine an increased dispersion and therefore a high rank order. It is evident from the numerical results that the dependence on the number of clusters is critical, while the number of paths is not significant. A typical urban environment can be reasonably described by $N_g = 1$, $\sigma_{\tau,\max} = 1\,\mu$s and $\sigma_{\vartheta,\max} = 5$ to $10$ deg (the number of paths is not significant); from Figure 4.3 the rank turns out to be $r \approx 2$ to $3$. A bad urban environment has a larger number of clusters, typically $N_g = 2$, which leads to $r \approx 4$.

The second example considered here refers to the COST-259 Directional Channel Model. Differently from the previous example, here the numbers of clusters and paths are random and the path parameters are assigned according to four radio environments (see Appendix A):

  Generalized Typical Urban (GTU): $\sigma_{\tau,\max} = 1\,\mu$s, $\sigma_{\vartheta,\max} = 10$ deg, $E(N_g) = 1.1$, $\tau_{\max} = 10\,\mu$s;
  Generalized Bad Urban (GBU): $\sigma_{\tau,\max} = 1\,\mu$s, $\sigma_{\vartheta,\max} = 10$ deg, $E(N_g) = 3.3$, $\tau_{\max} = 10\,\mu$s;
  Generalized Rural Area (GRA): $\sigma_{\tau,\max} = 0.2\,\mu$s, $\sigma_{\vartheta,\max} = 5$ deg, $E(N_g) = 1.1$, $\tau_{\max} = 16.6\,\mu$s;
  Generalized Hill Terrain (GHT): $\sigma_{\tau,\max} = 0.2\,\mu$s, $\sigma_{\vartheta,\max} = 5$ deg, $E(N_g) = 3.3$, $\tau_{\max} = 16.6\,\mu$s.

Figure 4.4 shows the power delay/angle profile for a few channels generated by the COST-259 model. Figure 4.5 shows the cumulative distribution function of the rank order for the four environments. A low rank model may be suitable for the GTU ($r \approx 3$) and GRA ($r \approx 2$) environments, as they are characterized by a low space (GTU) and/or time (GRA) diversity.
Figure 4.4: Power delay profile for channels generated according to COST-259 propagation environments: GTU (Generalized Typical Urban), GBU (Generalized Bad Urban), GRA (Generalized Rural Area), GHT (Generalized Hill Terrain).
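The rank selection rule used in the analysis above, namely the smallest $r$ such that the normalized error $\varepsilon(r)$ of (4.13) does not exceed $-20$ dB, can be sketched as follows. The function names are hypothetical and the test matrices are synthetic; the sketch only illustrates the criterion and does not reproduce the COST-259 simulations.

```python
import numpy as np

def effective_rank(H, eps_db=-20.0):
    """Smallest r such that eps(r) = sum_{i>r} sigma_i^2 / sum_i sigma_i^2
    is at most the threshold (given in dB)."""
    s2 = np.linalg.svd(H, compute_uv=False) ** 2
    total = s2.sum()
    tail = np.maximum(total - np.cumsum(s2), 0.0)   # tail[r-1] = energy discarded by rank r
    eps = tail / total
    return int(np.argmax(eps <= 10.0 ** (eps_db / 10.0))) + 1

def relative_error(H, r):
    """Direct evaluation of ||H - H_r||^2 / ||H||^2 with H_r the truncated SVD."""
    U, s, Vh = np.linalg.svd(H, full_matrices=False)
    H_r = (U[:, :r] * s[:r]) @ Vh[:r, :]
    return np.linalg.norm(H - H_r) ** 2 / np.linalg.norm(H) ** 2

# Synthetic check: two dominant components plus one 40 dB below the strongest
H_test = np.diag([1.0, 0.5, 0.01])
print(effective_rank(H_test))                       # second component is needed, third is not
```

As a usage note, with $M = 8$ and $W = 57$ an effective rank $r = 2$ corresponds to $r(M + W) = 130$ parameters in the factors $\mathbf{A}$ and $\mathbf{B}$, against $MW = 456$ for the full rank description.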

926 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 53, NO. 3, MARCH 2005 Reduced-Rank Channel Estimation for Time-Slotted Mobile Communication Systems Monica Nicoli, Member, IEEE, and Umberto Spagnolini,