Precoding for the Sparsely Spread MC-CDMA Downlink with Discrete-Alphabet Inputs


IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL.*, NO.*, MONTH 2016

Min Li, Member, IEEE, Chunshan Liu, Member, IEEE, and Stephen V. Hanly, Senior Member, IEEE

arXiv:1702.02634v1 [cs.IT] 8 Feb 2017

Copyright (c) 2015 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org. This work was presented in part in Proceedings of the IEEE International Conference on Communications (ICC), Sydney, Australia, June 2014. This research was supported in part by the Australian Research Council under grant DP130101760, and by the CSIRO Macquarie University Chair in Wireless Communications. This Chair has been established with funding provided by the Science and Industry Endowment Fund. The authors are with the Department of Engineering, Macquarie University, Macquarie Park, NSW 2113, Australia (e-mail: {min.li, chunshan.liu, stephen.hanly}@mq.edu.au).

Abstract: Sparse signatures have been proposed for the CDMA uplink to reduce multi-user detection complexity, but they have not yet been fully exploited for its downlink counterpart. In this work, we propose a Multi-Carrier CDMA (MC-CDMA) downlink communication, where regular sparse signatures are deployed in the frequency domain. Taking the symbol detection point of view, we formulate a problem appropriate for the downlink with discrete alphabets as inputs. The solution to the problem provides a power-efficient precoding algorithm for the base station, subject to minimum symbol error probability (SEP) requirements at the mobile stations. In the algorithm, signature sparsity is shown to be crucial for reducing precoding complexity. Numerical results confirm system-load-dependent power reduction gain from the proposed precoding over the zero-forcing precoding and the regularized zero-forcing precoding with optimized regularization parameter under the same SEP targets. For a fixed system load, it is also demonstrated that sparse MC-CDMA with a proper choice of sparsity level attains almost the same power efficiency and link throughput as that of dense MC-CDMA yet with reduced precoding complexity, thanks to the sparse signatures.

Index Terms: CDMA, discrete alphabets, MC-CDMA, power efficiency, precoding, sparse signature, symbol error probability.

I. INTRODUCTION

A. Motivations and Contributions

Multi-Carrier Code Division Multiple Access (MC-CDMA) is a multi-access scheme based on the Orthogonal Frequency Division Multiplexing (OFDM) method. Since its invention, MC-CDMA has attracted broad interest, see, e.g., [1]-[4] and the references therein. MC-CDMA naturally integrates CDMA's flexible multiuser access with interference suppression capability and the advantages of multicarrier OFDM, including robustness against frequency-selective fading. Therefore, it has the potential to be one of the candidates to support massive access and provide reliable data communication and better coverage for future-generation wireless systems. As in all CDMA systems, MC-CDMA may experience severe multi-access interference due to the loss of user orthogonality, which may occur, particularly, in frequency-selective channel environments. For such systems, optimal detection entails an exponential number of hypothesis testings about data symbols of all users and thus could be computationally demanding, time-consuming and even infeasible in a large system with conventional dense signature design. To circumvent the complexity issue, sparse signatures, whose fraction of nonzero entries is small, have been introduced and exploited for the CDMA uplink multi-user detection [5]-[10].
In particular, the belief-propagation algorithm has been proposed for such a system, an algorithm that has a natural implementation using parallel computation units, and one that is fast and provably optimal for different ensembles of sparse signatures in the large system limits [5]-[8]. Inspired by the belief-propagation algorithm, references [9] and [10] have developed reduced-complexity soft-in soft-out (SISO) and Turbo iterative multiuser detection algorithms for the sparse CDMA uplink and a sparse-signature OFDM uplink, respectively.

In this work, we formulate a different problem, appropriate for the MC-CDMA downlink counterpart, from the symbol detection point of view. The solution to the problem provides a power-efficient precoding algorithm for the base station (BS). The precoding is implemented at the BS, allowing the mobile stations (MSs) to use simple conventional single-user matched filters and standard single-user symbol detection. This hence simplifies the implementation of the receiver, as compared to the conventional MC-CDMA downlink transceiver design, where multi-user interference is normally mitigated by a frequency-domain equalization at the receiver [2]. Moreover, the proposed algorithm optimizes the signals transmitted on different subcarriers so that they can be constructively combined at each receiver, leading to a power-efficient precoding. The use of random sparse signatures was suggested for the MC-CDMA downlink in [11] to allow low-complexity iterative multiuser detection at each MS. But here we show that sparsity can be exploited to reduce the complexity of the precoding, as compared to MC-CDMA with dense signatures. In addition, using sparse signatures simplifies channel measurement, since each MS only needs to estimate channels for a small number of subcarriers it occupies.

The contributions of this paper are summarized as follows:

• We introduce the MC-CDMA downlink communication with regular sparse signatures, where each MS has access to an equal number of subcarriers and each subcarrier has (roughly) equal load. We also consider a bipartite graph representation for the system studied and use it to facilitate algorithm design and complexity analysis.

• Assume that data symbols intended for MSs are drawn from discrete alphabets. We take the symbol detection point of view and introduce the minimum Symbol Error Probability (SEP) as a Quality of Service (QoS) metric for each MS. We formulate the precoding problem as a transmit power optimization problem subject to minimum SEP requirements at MSs.

• We translate the SEP targets into constraint regions on noiseless received signal components at MSs and characterize them via a conservative approximation to make the problem tractable. Detailed formulation procedures are provided for systems with both standard 4/16-QAM constellations and Tomlinson-Harashima replica points.

• We develop a precoding algorithm that accommodates parallel computation units via the dual-decomposition theory. Aided by the graph representation of the system, the complexity of the algorithm is characterized in terms of the number of message passings between computation units and the number of additions and multiplications for precoding calculation. Signature sparsity is shown to play a vital role in reducing precoding complexity.

• We demonstrate that the proposed optimized precoding generally outperforms the conventional zero-forcing (ZF) and the optimized Regularized ZF (RZF) precodings in terms of power efficiency under the same SEP requirements. The exact gain depends on the system load and it is very significant for a fully loaded system.

• We also demonstrate that, for a fixed system load, sparse MC-CDMA, e.g., with a proper relatively small number of subcarriers allocated to each MS, attains almost the same power efficiency and link throughput as dense MC-CDMA under our proposed precoding scheme. This important observation, in conjunction with the fact that sparsity reduces precoding complexity, promotes the practicality of sparse MC-CDMA.

B. Other Related Work

Precoding is a relatively mature concept in multi-antenna communication systems, enabling multiuser multiplexing in the spatial domain, see, e.g., [12]-[20].
It has to balance between the two conflicting interests of maximizing the useful signal power at the intended user and minimizing interference leakage towards non-intended users. The same concept can be applied to other systems such as direct-sequence CDMA or MC-CDMA, where multiplexing takes place in the time- or frequency-domain, and multi-access interference, if it arises, has to be dealt with [21]-[25].

Existing precoding techniques can be divided into two categories, linear precoding and non-linear precoding, where both require channel state information while the latter requires additional symbol-based processing. Within the first category, matched filtering, ZF and RZF [12] are three commonly known precodings that maintain different levels of balance between the two conflicting goals. Other power-aware linear precodings of general form have also been proposed in the literature subject to different QoS metrics, such as signal-to-noise-plus-interference ratio [20]. Compared with linear precoding, non-linear precoding may offer higher power efficiency and transmission rate, but the gain comes at the cost of incorporating more sophisticated signal processing [13], [15]-[17], [23]-[25]. In the multi-antenna broadcasting setup, capacity-achieving non-linear Dirty-Paper Coding (DPC) entails a successive pre-cancellation of known intra-user interference at the BS. The encoding of data relies on codewords of infinite length and involves a high-dimensional sphere-search algorithm, which renders DPC unattractive in practical systems. Tomlinson-Harashima precoding (THP) is a simplified version of DPC, where the codebook is comprised of periodic extension of standard constellations (replica points) in the two-dimensional space and a transmit modulo-operation is introduced in the interference pre-cancellation process in order to reduce transmit power. Built on ZF or RZF, reference [13] generalizes the single-user-based symbol extension idea of THP and introduces a joint perturbation of user symbol vector to further reduce transmit power.
The optimized precoding proposed in this work belongs to the second category. As in existing works, power consumption is one of the primary concerns in our optimized precoding. However, the optimization criterion, minimum SEP constraint, has not been considered before, except our own works [26]-[28] in the MIMO (or distributed MIMO) setup. This criterion appears to be natural when we consider a system with discrete alphabets as inputs. In addition, in our formulation, we fix the information-bearing alphabets at the BS but allow a certain relaxation of received signals at MSs through precoding, as long as they reside in detection-favorable regions and the SEP targets are met. This is distinguished from related works [16], [18], [19], [25], where non-linear relaxation of input alphabets has been adopted. In [16], [18], the relaxation is required to maintain the minimum signaling distance, while in [19], [25], the relaxation is to ensure the corresponding symbol-energy-to-noise ratio is above a certain threshold. However, in all these works, no explicit SEP targets are imposed at MSs.

Notation: Boldface uppercase and lowercase letters denote matrices and vectors, respectively, e.g., $\mathbf{A}$ is a matrix and $\mathbf{a}$ is a vector; $\mathbf{I}_N$ is an $N \times N$ identity matrix; for integers $i \le j$, $[i:j] = \{i, i+1, \ldots, j\}$ is the discrete interval between $i$ and $j$, and $\mathbf{a}_{[i:j]} = \{a_i, \ldots, a_j\}$ is the collection of the $[i:j]$th components of vector $\mathbf{a}$; $(\cdot)^T$ denotes the matrix transpose, while $(\cdot)^\dagger$ denotes the conjugate transpose; $E[X]$ denotes the expectation operation on random variable $X$, and $\lfloor x \rfloor$ denotes the floor function of real number $x$; $\Re\{\cdot\}$ and $\Im\{\cdot\}$ denote the real and imaginary part of a complex number, respectively; finally, $\|\mathbf{A}\|_p$ is the standard $\ell_p$ norm of $\mathbf{A}$.

II. SYSTEM MODEL

A. Signaling Model

We consider a downlink communication, where a single-antenna BS is simultaneously serving $K$ single-antenna MSs via MC-CDMA.
Specifically, data symbols intended for MSs are all drawn from discrete alphabet sets, e.g., $M$-QAM constellations, common in practical deployments. The downlink communication takes place over a set of $N$ orthogonal subcarriers where we assume $N \ge K$ and thus the load $\alpha = K/N \in (0, 1]$. In the conventional MC-CDMA, the data symbol intended for an MS is transmitted over all parallel subcarriers where each is encoded with a binary phase-offset [1]. Here, however, information associated with each MS is assumed to be spread onto only a small subset of the available subcarriers, which leads to the sparse MC-CDMA as originally studied by [5]-[8] for the uplink. Let

$$\mathbf{s}_k = \frac{1}{\sqrt{L}} \left[ \tilde{s}_{k,1}, \tilde{s}_{k,2}, \ldots, \tilde{s}_{k,N} \right]^T$$

be the signature for MS $k$, where the normalization constant $L$ corresponds to the total number of subcarriers allocated to MS $k$. In the signature, components $\tilde{s}_{k,n}$ are i.i.d. drawn from a distribution $P_S$ with zero mean and unit variance if MS $k$ has access to subcarrier $n$, and $\tilde{s}_{k,n} = 0$ otherwise. The collection of all signatures corresponds to a sparse signature matrix $\mathbf{S} = [\mathbf{s}_1, \mathbf{s}_2, \ldots, \mathbf{s}_K]$, which is perfectly known at the BS.

Fig. 1. The transceiver architecture for the sparse MC-CDMA downlink.

Fig. 2. Bipartite graph representation of the sparse MC-CDMA downlink.

The transceiver architecture for the downlink transmission is depicted in Fig. 1 and is elaborated as follows. Let $\mathbf{d} = [d_1, \ldots, d_K]^T$ be the transmitted data symbol vector, where component $d_k$ denotes the symbol intended for MS $k$ and is drawn from a discrete finite-alphabet set $\mathcal{D}$. The transmission takes place by first forming appropriate frequency-domain signals and then converting them into time-domain signals by the inverse fast Fourier transform (IFFT) at the BS. Specifically, symbol vector $\mathbf{d}$ is passed through a precoder and mapped to coded vector $\mathbf{x} \in \mathbb{C}^N$. An IFFT is then applied over the coded vector in order to generate the time-domain signal vector (also in $\mathbb{C}^N$) that is subsequently transmitted over the wireless channel.

Upon observing channel output, each MS $k$ first performs an FFT and produces the frequency-domain signal $\tilde{\mathbf{y}}_k$ as

$$\tilde{\mathbf{y}}_k = \tilde{\mathbf{h}}_k \circ \mathbf{x} + \tilde{\mathbf{z}}_k, \tag{1}$$

where notation $\circ$ denotes the Hadamard product; vector $\tilde{\mathbf{h}}_k = [\tilde{h}_{k,1}, \tilde{h}_{k,2}, \ldots, \tilde{h}_{k,N}]^T$ is the collection of frequency-domain channel gains from BS to MS $k$; vector $\tilde{\mathbf{z}}_k$ is a circularly symmetric complex Gaussian noise with $E[\tilde{\mathbf{z}}_k \tilde{\mathbf{z}}_k^\dagger] = N_0 \mathbf{I}_N$. Despreading is then performed at MS $k$ based on its own signature $\mathbf{s}_k$, followed by a simple single-user detection. The corresponding output signal $y_k$ after despreading is given by

$$y_k = \sum_{n=1}^{N} \underbrace{\left( s_{k,n} \tilde{h}_{k,n} \right)}_{h_{k,n}} x_n + \underbrace{\mathbf{s}_k^T \tilde{\mathbf{z}}_k}_{z_k} = \sum_{n=1}^{N} h_{k,n} x_n + z_k, \tag{2}$$

where $s_{k,n} = \tilde{s}_{k,n} / \sqrt{L}$ is the $n$th component of $\mathbf{s}_k$, and each equivalent channel noise $z_k$ is circularly symmetric complex Gaussian with zero mean and variance $N_0$. Collecting all outputs at MSs, we obtain the equivalent system input-output relationship as

$$\underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_K \end{bmatrix}}_{\mathbf{y}} = \underbrace{\begin{bmatrix} h_{1,1} & \cdots & h_{1,N} \\ h_{2,1} & \cdots & h_{2,N} \\ \vdots & \ddots & \vdots \\ h_{K,1} & \cdots & h_{K,N} \end{bmatrix}}_{\mathbf{H}} \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}}_{\mathbf{x}} + \underbrace{\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_K \end{bmatrix}}_{\mathbf{z}}. \tag{3}$$

It is straightforward to observe that in matrix $\mathbf{H}$, $h_{k,n} = 0$ as long as $s_{k,n} = 0$, and thus row $\mathbf{h}_k$ maintains the same level of sparsity as the corresponding signature $\mathbf{s}_k$. This also means the BS only needs to know the small number of $\tilde{h}_{k,n}$'s for which $s_{k,n} \ne 0$ for the purpose of precoding.

B. Graph Representation

Given the matrix $\mathbf{H}$ from (3), we can alternatively construct a bipartite graph representation of the sparse MC-CDMA system. Assume that each symbol $x_n$ in the graph is represented by a precoded symbol node (PSN), and each output $y_k$ is represented by an output symbol node (OSN). PSNs will be drawn as circles and OSNs will be drawn as squares in the graph. PSN $x_n$ is connected with OSN $y_k$ only if $s_{k,n} \ne 0$, and $h_{k,n}$ is the weight associated with the edge. Fig. 2 depicts an instance of the graph $\mathcal{G}$ for $L = 2$, where each OSN is connected with two PSNs. We use $\mathcal{I}(x_n)$ to denote the collection of OSNs connected to $x_n$ and define the node degree of $x_n$ as the cardinality $|\mathcal{I}(x_n)|$.
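As a rough illustration of the equivalent channel in (2)-(3) and of the bipartite graph, the following minimal sketch builds a toy system with N = 4 subcarriers, K = 3 MSs and L = 2. The supports, the ±1 signs, the channel gains and all variable and function names are assumptions made up for this example; they are not taken from the paper.

```python
import math

# Hypothetical toy setup: N = 4 subcarriers, K = 3 MSs, L = 2 subcarriers per MS.
N, K, L = 4, 3, 2
support = [(0, 1), (1, 2), (2, 3)]        # assumed subcarriers occupied by each MS
signs = [(+1, -1), (+1, +1), (-1, +1)]    # assumed +/-1 draws for the nonzero entries

# Signature rows s_k with nonzero entries s~_{k,n} / sqrt(L).
S = [[0.0] * N for _ in range(K)]
for k in range(K):
    for n, sgn in zip(support[k], signs[k]):
        S[k][n] = sgn / math.sqrt(L)

# Assumed frequency-domain channel gains h~_{k,n} (arbitrary values).
H_tilde = [[complex(1 + k, n - 1) for n in range(N)] for k in range(K)]

# Equivalent channel h_{k,n} = s_{k,n} * h~_{k,n}, as in (2).
H = [[S[k][n] * H_tilde[k][n] for n in range(N)] for k in range(K)]


def noiseless_output(H, x):
    """Return the noiseless part of (3), i.e. H x, as a list of K complex values."""
    return [sum(h * xn for h, xn in zip(row, x)) for row in H]


# Bipartite graph: OSNs attached to each PSN x_n, and PSNs attached to each OSN y_k.
osn_of_psn = {n: [k for k in range(K) if S[k][n] != 0] for n in range(N)}
psn_of_osn = {k: [n for n in range(N) if S[k][n] != 0] for k in range(K)}
```

In this sketch every row of `H` has exactly L nonzero entries, so each OSN has degree L, and the BS would only need the channel gains on those edges.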
Similarly, we use $\mathcal{I}(y_k)$ to denote the collection of PSNs connected to $y_k$ and define the node degree of $y_k$ as the cardinality $|\mathcal{I}(y_k)|$. This graph representation will facilitate the description of the precoding algorithm and the corresponding complexity analysis in Section V.

C. Sparse Signature Ensemble

In the signature matrix, we assume that the non-zero elements $\{\tilde{s}_{k,n}\}$ are i.i.d. drawn from a uniform distribution on $\{+1, -1\}$. It is observed that generating non-zero elements according to other distributions, e.g., Gaussian distribution, has little impact on the averaged system performance. Hence, we stick to the binary uniform distribution, which leads to a binary phase-offset for the symbol transmitted as in [6], [7]. Depending on the number of subcarriers allocated across MSs and the load per subcarrier, we have three common signature ensembles suggested for the uplink [7]: i) irregular ensemble, where a Poisson-distributed number of subcarriers are allocated across MSs and the load per subcarrier is also Poisson-distributed; ii) semi-regular ensemble, where each MS is allocated a fixed positive integer number of subcarriers and the load per subcarrier is Poisson-distributed; and iii) regular ensemble, where the number of subcarriers allocated for each MS and the loading per subcarrier both take fixed positive integer values. In particular, [7] advocates the regular ensemble as it, among other benefits, prevents the systematic inefficiency due to leaving some subcarriers unoccupied by any of the MSs. In this work, we follow [7] and deploy the regular-type ensemble to ensure that the system enjoys full utilization of resources and provides user fairness. Specifically, when the system is fully loaded ($\alpha = 1$), a perfectly regular signature matrix is randomly generated in the sense that each MS is allocated $L$ subcarriers, and each subcarrier is accessed by exactly $L$ MSs. When the system is under-loaded ($\alpha < 1$), a nearly regular signature matrix is randomly generated such that each MS is allocated $L$ subcarriers, and each subcarrier is of almost equal load, namely, accessed by either $\lfloor \alpha L \rfloor$ or $(\lfloor \alpha L \rfloor + 1)$ MSs.

III. OPTIMIZED PRECODING WITH SEP TARGETS

For notational convenience, we define $\bar{y}_k = \mathbf{h}_k^T \mathbf{x}$ as the noiseless received component at MS $k$ ($\mathbf{h}_k^T$ is the $k$th row of matrix $\mathbf{H}$) with real part $\bar{y}_k^{(r)} = \Re\{\bar{y}_k\}$ and imaginary part $\bar{y}_k^{(i)} = \Im\{\bar{y}_k\}$; similarly, denote the real and imaginary parts of data symbol and noise as $d_k^{(r)} = \Re\{d_k\}$ and $d_k^{(i)} = \Im\{d_k\}$, and $z_k^{(r)} = \Re\{z_k\}$ and $z_k^{(i)} = \Im\{z_k\}$, respectively. In addition, define $\sigma^2 = N_0/2$ as the fixed noise variance per signal dimension. Assuming that all data symbols transmitted are selected from discrete alphabets, we take a symbol detection point of view and impose minimum Symbol Error Probabilities (SEPs) as the user QoS constraints. Specifically, let $\mathcal{A}(d_k)$ denote the decision region associated with data symbol $d_k$ intended for MS $k$ and $P_e$ denote the SEP target.
Detection error happens when the output signal $y_k$ lies outside decision region $\mathcal{A}(d_k)$. According to the SEP requirement, the probability of error events should be no greater than the target, i.e.,

$$\Pr\left( y_k = (\bar{y}_k + z_k) \notin \mathcal{A}(d_k) \right) \le P_e. \tag{4}$$

A question we then ask is: How do we design a precoder that efficiently maps a symbol vector $\mathbf{d}$ into a precoded vector $\mathbf{x}$ such that the SEP requirements at MSs can be met?

Following the conventional zero-forcing (ZF) approach, one could form $\mathbf{x}$ according to

$$\mathbf{x} = \mathbf{H}^\dagger \left( \mathbf{H} \mathbf{H}^\dagger \right)^{-1} \mathbf{d}, \tag{5}$$

which inverts the channel matrix and forces noiseless component $\bar{y}_k$ to sit exactly at the constellation point $d_k$. In order to meet a given SEP target, $P_e$, data symbol $d_k$ has to be chosen from a discrete alphabet set whose minimum distance between any two neighboring points (denoted by $d_{\min}$) is above a certain threshold. For instance, consider a system with $M$-QAM modulation whose standard constellation is represented by $\mathcal{D}_S = \{ a_R + j a_I : a_R, a_I \in \{\pm 1, \pm 3, \ldots, \pm(\sqrt{M} - 1)\} \}$ with $d_{\min} = 2$. We need to scale the constellation points (increasing the minimum distance $d_{\min}$, but also the transmit power) in order to meet the SEP target, $P_e$. Considering the 4-QAM constellation (see Fig. 3), the scaling factor, $\beta$, to use for MS $k$, must satisfy

$$\frac{1}{\sqrt{2\pi}\sigma} \int_{-\beta}^{+\infty} e^{-\frac{(z_k^{(r)})^2}{2\sigma^2}} \, dz_k^{(r)} \tag{6}$$

$$\times \frac{1}{\sqrt{2\pi}\sigma} \int_{-\beta}^{+\infty} e^{-\frac{(z_k^{(i)})^2}{2\sigma^2}} \, dz_k^{(i)} \ge 1 - P_e, \tag{7}$$

as implied by (4). Thus the minimum scaling factor under the conventional ZF approach for the 4-QAM system is given by

$$\beta^* = -\sigma Q^{-1}\left( \sqrt{1 - P_e} \right), \tag{8}$$

where $Q^{-1}(\cdot)$ denotes the inverse of the standard Q-function [29]. When $M \ge 16$ (see Fig. 4 for 16-QAM), the standard constellation $\mathcal{D}_S$ should be scaled so that

$$\frac{1}{\sqrt{2\pi}\sigma} \int_{-\beta}^{+\beta} e^{-\frac{(z_k^{(r)})^2}{2\sigma^2}} \, dz_k^{(r)} \times \frac{1}{\sqrt{2\pi}\sigma} \int_{-\beta}^{+\beta} e^{-\frac{(z_k^{(i)})^2}{2\sigma^2}} \, dz_k^{(i)} \ge 1 - P_e, \tag{9}$$

as implied by (4), considering the dominant scenario in which one of the innermost points is transmitted. Thus the minimum scaling factor under the conventional ZF approach is given by

$$\beta^* = -\sigma Q^{-1}\left( 0.5 + 0.5\sqrt{1 - P_e} \right). \tag{10}$$

In general, however, we do not have to zero-force $\bar{y}_k$, and in fact, it is sufficient to ensure $\bar{y}_k$ falls into a region that favours correct symbol detection. This relaxation introduces room to optimize the choice of $\mathbf{x}$, leading to the following transmit power minimization problem:

$$\mathcal{P}: \quad \min_{\mathbf{x} \in \mathbb{C}^{N \times 1}} P(\mathbf{x}) = \mathbf{x}^\dagger \mathbf{x} \quad \text{subject to} \quad \Pr\left( (\bar{y}_k + z_k) \notin \mathcal{A}(d_k) \right) \le P_e, \ \text{for the transmitted data set } \{ d_k \in \mathcal{D}, \ k = 1, \ldots, K \}. \tag{11}$$

IV. SPARSE MC-CDMA WITH STANDARD M-QAM CONSTELLATIONS

In this section, we show how to translate the set of SEP targets in (11) into constraints on noiseless output components at MSs. In particular, we begin with the 4-QAM signaling case and then generalize to the 16-QAM signaling case. A similar approach can be applied to systems with higher-order QAM constellations.
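As a numerical check on the ZF scaling factors in (8) and (10), the sketch below evaluates both expressions and plugs them back into the left-hand sides of (6)-(7) and (9). It relies on Python's `statistics.NormalDist` for the Gaussian CDF and its inverse; the function names and the sample values of $P_e$ and $N_0$ are assumptions chosen only for illustration.

```python
import math
from statistics import NormalDist

_std = NormalDist()  # standard normal distribution


def q_func(x):
    """Standard Q-function: Q(x) = 1 - Phi(x)."""
    return 1.0 - _std.cdf(x)


def q_inv(p):
    """Inverse Q-function, valid for 0 < p < 1."""
    return _std.inv_cdf(1.0 - p)


def beta_min_4qam(pe, n0):
    """Minimum ZF scaling factor for 4-QAM, following (8)."""
    sigma = math.sqrt(n0 / 2.0)
    return -sigma * q_inv(math.sqrt(1.0 - pe))


def beta_min_16qam(pe, n0):
    """Minimum ZF scaling factor for 16-QAM (innermost points), following (10)."""
    sigma = math.sqrt(n0 / 2.0)
    return -sigma * q_inv(0.5 + 0.5 * math.sqrt(1.0 - pe))


def sep_4qam_zf(beta, n0):
    """SEP of a ZF-precoded 4-QAM symbol at scaling beta, from (6)-(7)."""
    sigma = math.sqrt(n0 / 2.0)
    return 1.0 - (1.0 - q_func(beta / sigma)) ** 2


def sep_16qam_inner_zf(beta, n0):
    """SEP of a ZF-precoded innermost 16-QAM symbol at scaling beta, from (9)."""
    sigma = math.sqrt(n0 / 2.0)
    return 1.0 - (1.0 - 2.0 * q_func(beta / sigma)) ** 2


if __name__ == "__main__":
    pe, n0 = 1e-3, 2.0  # assumed example values
    print(beta_min_4qam(pe, n0), beta_min_16qam(pe, n0))
```

Evaluating `sep_4qam_zf(beta_min_4qam(pe, n0), n0)` should return the target `pe` up to floating-point error, which confirms that (8) meets (6)-(7) with equality; the same holds for the 16-QAM pair.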

Fig. 3. 4-QAM: $\beta^*$ is the minimum scaling factor under ZF; $\{D_1, D_2, D_3, D_4\}$ are constellation points; taking $d_k = D_4$ as an example, $\mathcal{A}(d_k)$ corresponds to the decision region, $\mathcal{B}(d_k)$ corresponds to the precise constraint region on noiseless output $\bar{y}_k$, while $\mathcal{C}(d_k)$ represents the constraint region with conservative approximation.

A. Translate SEP Targets to Constraints on Noiseless Received Signal Components

1) 4-QAM: Assume that each $d_k$ is drawn from the 4-QAM constellation set $\mathcal{D} = \{D_m : m = 1, \ldots, 4\}$ as shown in Fig. 3, where the green dashed lines partition the complex plane into four symmetric decision regions each occupying an open quarter plane. Any received signals falling outside the correct region lead to detection error. Thus, the SEP requirement (4) becomes

$$\underbrace{\frac{1}{\sqrt{2\pi}\sigma} \int_{I_-^{(r)}(d_k)}^{I_+^{(r)}(d_k)} e^{-\frac{(z_k^{(r)})^2}{2\sigma^2}} \, dz_k^{(r)}}_{O^{(r)}} \times \underbrace{\frac{1}{\sqrt{2\pi}\sigma} \int_{I_-^{(i)}(d_k)}^{I_+^{(i)}(d_k)} e^{-\frac{(z_k^{(i)})^2}{2\sigma^2}} \, dz_k^{(i)}}_{O^{(i)}} \ge 1 - P_e, \tag{12}$$

where the tuple $(I_-^{(r)}(d_k), I_+^{(r)}(d_k), I_-^{(i)}(d_k), I_+^{(i)}(d_k))$ depends on the data transmitted and equals:
(i) $(-\infty, -\bar{y}_k^{(r)}, -\infty, -\bar{y}_k^{(i)})$ for $D_1$;
(ii) $(-\infty, -\bar{y}_k^{(r)}, -\bar{y}_k^{(i)}, +\infty)$ for $D_2$;
(iii) $(-\bar{y}_k^{(r)}, +\infty, -\infty, -\bar{y}_k^{(i)})$ for $D_3$;
(iv) $(-\bar{y}_k^{(r)}, +\infty, -\bar{y}_k^{(i)}, +\infty)$ for $D_4$.

Given symbol $d_k \in \mathcal{D}$ and a target $P_e$, one can determine the precise constraint region $\mathcal{B}(d_k)$ on noiseless output $\bar{y}_k$ at MS $k$ from inequality (12). In particular, the boundary of the region is determined by equality in (12). For example, when $d_k = D_4$, three points on the boundary of $\mathcal{B}(d_k)$ are identified by considering combinations of $(O^{(r)}, O^{(i)})$:
(i) $(1, 1 - P_e)$: $\bar{y}_k^{(r)} = +\infty$, $\bar{y}_k^{(i)} = -\sigma Q^{-1}(1 - P_e)$;
(ii) $(1 - P_e, 1)$: $\bar{y}_k^{(r)} = -\sigma Q^{-1}(1 - P_e)$, $\bar{y}_k^{(i)} = +\infty$;
(iii) $(\sqrt{1 - P_e}, \sqrt{1 - P_e})$: $\bar{y}_k^{(r)} = \bar{y}_k^{(i)} = -\sigma Q^{-1}(\sqrt{1 - P_e})$.

In principle, the curved-shape boundary can be determined by traversing all possible combinations of $O^{(r)}$ and $O^{(i)}$. The constraint region $\mathcal{B}(d_k)$ then includes all the points on and within the boundary (see the red curve in Fig. 3).
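The boundary traversal described above can be sketched numerically for the case $d_k = D_4$, where the integration limits in (12) are $(-\bar{y}_k^{(r)}, +\infty)$ and $(-\bar{y}_k^{(i)}, +\infty)$, so that each factor reduces to a Gaussian CDF. This is a minimal sketch; the function names and the sample values of $P_e$ and $\sigma$ are assumptions for illustration only.

```python
import math
from statistics import NormalDist

_std = NormalDist()  # standard normal distribution


def detection_prob_d4(y_r, y_i, sigma):
    """Left-hand side of (12) for d_k = D_4: O^(r) * O^(i) = Phi(y_r/sigma) * Phi(y_i/sigma)."""
    return _std.cdf(y_r / sigma) * _std.cdf(y_i / sigma)


def boundary_point_d4(o_r, pe, sigma):
    """Boundary point of B(D_4) for a chosen O^(r) = o_r, with O^(r) * O^(i) = 1 - pe.

    Requires 1 - pe < o_r < 1. Uses -sigma * Q^{-1}(p) = sigma * Phi^{-1}(p).
    """
    o_i = (1.0 - pe) / o_r
    return sigma * _std.inv_cdf(o_r), sigma * _std.inv_cdf(o_i)


def in_region_b_d4(y_r, y_i, pe, sigma):
    """True if the noiseless output (y_r, y_i) satisfies (12) for d_k = D_4."""
    return detection_prob_d4(y_r, y_i, sigma) >= 1.0 - pe


if __name__ == "__main__":
    pe, sigma = 1e-3, 1.0  # assumed example values
    # Sweep O^(r) over the open interval (1 - pe, 1) to trace the curved boundary.
    for t in (0.1, 0.3, 0.5, 0.7, 0.9):
        o_r = (1.0 - pe) + t * pe
        print(boundary_point_d4(o_r, pe, sigma))
```

Choosing `o_r = math.sqrt(1 - pe)` reproduces point (iii), where the real and imaginary coordinates coincide.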
Note that point (iii) is exactly a scaled constellation point. Also note that this region generally implies non-linear constraints on input signal $\mathbf{x}$, which makes the optimization problem less tractable. Alternatively, we can find a polytope contained in $\mathcal{B}(d_k)$, i.e., we conservatively approximate the region using the area bounded by line segments between a finite number of points on or within the boundary. A simple approximation for $d_k = D_4$ is given by: $\bar{y}_k^{(r)} \ge -\sigma Q^{-1}(\sqrt{1 - P_e})$ and $\bar{y}_k^{(i)} \ge -\sigma Q^{-1}(\sqrt{1 - P_e})$, which lead to linear constraints on input signal $\mathbf{x}$. This region with conservative approximation is denoted as $\mathcal{C}(d_k = D_4)$, see the open shadow area in Fig. 3. By the same approach, relaxed constraint regions associated with the other constellation points can be derived and characterized as:
(i) $\mathcal{C}(d_k = D_1) = \{ (\bar{y}_k^{(r)}, \bar{y}_k^{(i)}) : \bar{y}_k^{(r)} \le -I, \ \bar{y}_k^{(i)} \le -I \}$;
(ii) $\mathcal{C}(d_k = D_2) = \{ (\bar{y}_k^{(r)}, \bar{y}_k^{(i)}) : \bar{y}_k^{(r)} \le -I, \ \bar{y}_k^{(i)} \ge I \}$;
(iii) $\mathcal{C}(d_k = D_3) = \{ (\bar{y}_k^{(r)}, \bar{y}_k^{(i)}) : \bar{y}_k^{(r)} \ge I, \ \bar{y}_k^{(i)} \le -I \}$,
with definition $I = -\sigma Q^{-1}(\sqrt{1 - P_e})$. Note that the exact areas of these regions only depend on the SEP targets.

2) 16-QAM: We now turn to the case where $d_k$ is drawn from the 16-QAM constellation set $\mathcal{D} = \{D_m : m = 1, \ldots, 16\}$ as shown in Fig. 4. The constraint region on $\bar{y}_k$ can be characterized based on procedures similar to those for the 4-QAM case. But calculation requires some care, since the decision regions $\mathcal{A}(d_k)$ for inner points and outer points are of different shapes (see the regions partitioned by the green dashed lines in Fig. 4) and the exact areas of these regions depend on the scaling factor $\beta$. For the sake of conciseness, the derivation of the constraint region $\mathcal{B}(d_k)$ is deferred to Appendix A. Taking symbols $\{D_{11}, D_{12}, D_{16}\}$ as examples, we plot the resulting regions $\mathcal{B}(d_k)$ in Fig. 4. Again, one can approximate these regions with a polytope $\mathcal{C}(d_k)$ for each, leading to linear constraints on input signal $\mathbf{x}$. In general, the exact areas of $\mathcal{C}(d_k)$ depend on the scaling factor $\beta$.
It is clear that, with the minimum $\beta_k^{\star}$ as defined by (10), constraint region $\mathcal{C}(d_k)$ corresponds to a strict equality constraint on $\bar{y}_k$, i.e., $\bar{y}_k = D_m$, for the center points $m \in \{6, 8, 14, 16\}$; for the side points, e.g., $D_{12}$, $\mathcal{C}(d_k)$ shrinks to a line: $\bar{y}_k^{(r)} \ge 2\beta_k^{\star} - I$ and $\bar{y}_k^{(i)} = \beta_k^{\star}$, which implies zero-forcing the imaginary part while keeping a relaxed constraint on the real part; and for the corner points, e.g., $D_{11}$, $\mathcal{C}(d_k)$ becomes $\bar{y}_k^{(r)} \ge 2\beta_k^{\star} - I$, $\bar{y}_k^{(i)} \ge 2\beta_k^{\star} - I$, where we recall that $I = \sigma Q^{-1}\big(\sqrt{1 - P_e}\big)$.

To meet a given SEP target $P_e$ for MS $k$, one could certainly adopt a scaling factor larger than the minimum $\beta_k^{\star}$ for the transmission. But such a choice may affect the power efficiency of the system. Fig. 5 plots two instances of constraint regions $\mathcal{B}(d_k)$ for $P_e = 10^{-3}$ when the constellation is scaled up with $\beta_k = 1.05\beta_k^{\star}$ (blue curves) and $\beta_k = 1.20\beta_k^{\star}$ (red curves). It can be seen that with a larger $\beta_k$ above $\beta_k^{\star}$, when a center constellation point is transmitted, the constraint region is relaxed from a single point to a circle-type region centered on the symbol. Potential benefits can be accrued from the resulting enlarged feasible region. However, when the corner points are transmitted, the corresponding constraint regions always shrink as the constellation is scaled up. In this case, power efficiency loss may be induced because of the reduced feasible optimization space. When one of the side points is transmitted, it is unclear how the performance reacts, as the constraint region with a larger $\beta_k$ partially overlaps with that for a smaller $\beta_k$. Nevertheless, our experiments have indicated that when the transmitted data symbols are randomly and uniformly generated, scaling up the constellation with $\beta_k > \beta_k^{\star}$ brings little further power saving. Hence we will use the minimum scaling, $\beta_k^{\star}$, for the standard 16-QAM in what follows.

Fig. 4. 16-QAM: $\beta_k$ is a scaling factor; $\{D_1, \ldots, D_{16}\}$ are constellation points; taking $d_k = D_{12}$ as an example, $\mathcal{A}(d_k)$ corresponds to the decision region, $\mathcal{B}(d_k)$ corresponds to the precise constraint region on the noiseless output $\bar{y}_k$, while $\mathcal{C}(d_k)$ represents the constraint region with conservative approximation.

Fig. 5. Constraint regions $\mathcal{B}(d_k)$ for the 16-QAM constellation under different scaling factors with $P_e = 10^{-3}$ fixed: (i) blue curves: $\beta_k = 1.05\beta_k^{\star}$; (ii) red curves: $\beta_k = 1.20\beta_k^{\star}$.

B. Problem Reformulation with Conservative Approximation

With the conservative approximation, the SEP constraints in (11) can be translated into a set of linear inequality/equality constraints on vector $x$. We now present the resulting optimization problem. For ease of exposition, we stack the real and the imaginary parts of each $x_n$ into a real vector $\tilde{x}$, i.e., $\tilde{x} = [\Re\{x_1\}, \Im\{x_1\}, \ldots, \Re\{x_N\}, \Im\{x_N\}]^T \in \mathbb{R}^{2N \times 1}$. For the 4-QAM signaling, the real and imaginary parts of coded signal $x_n$ are associated with different inequalities.
For the 16-QAM signaling, with $\beta_k = \beta_k^{\star}$, the real and the imaginary parts of $x_n$ are associated with either one equality or one inequality depending on the data symbols; when $\beta_k > \beta_k^{\star}$, at most two inequalities are introduced for either the real or the imaginary part. Therefore, the optimization problem of (11) can be generally reformulated as follows:

$$
\mathcal{P}_1:\quad \min_{\tilde{x} \in \mathbb{R}^{2N \times 1}} \; P(\tilde{x}) = \tilde{x}^T \tilde{x}
\quad \text{s.t.} \quad A\tilde{x} - c \le 0, \quad B\tilde{x} - e = 0, \tag{13}
$$

where for the 4-QAM system, we have $A = \{a_{i,j}\} \in \mathbb{R}^{2K \times 2N}$, $c = \{c_i\} \in \mathbb{R}^{2K \times 1}$, $B = 0$ and $e = 0$, while for the 16-QAM system, we have $A = \{a_{i,j}\} \in \mathbb{R}^{4K \times 2N}$, $c = \{c_i\} \in \mathbb{R}^{4K \times 1}$, $B = \{b_{i,j}\} \in \mathbb{R}^{2K \times 2N}$ and $e = \{e_i\} \in \mathbb{R}^{2K \times 1}$. The precise definitions of the matrices and the vectors involved depend on constraint regions $\mathcal{C}(d_k)$, $\forall k$, as defined in Section IV-A1 and Section IV-A2 for the 4-QAM and 16-QAM signaling, respectively. Whether or not the inequality/equality constraints are active will depend on the transmitted data associated with the constellation. It is possible that some of the constraints may not be active, in which case the corresponding row entries of $A$/$B$ and $c$/$e$ are padded with zeros. In addition, matrices $A$/$B$ enjoy the same sparsity as the system matrix $H$. In particular, when indexing matrices $A$/$B$, the index pair $(i, j)$ corresponds to a particular MS-subcarrier pair, and thus $a_{i,j}$/$b_{i,j} = 0$ whenever the particular subcarrier is not allocated to the MS.

As an illustrative example, we consider a 16-QAM system with $N = 3$ subcarriers and $K = 2$ MSs. Each MS is allocated $L = 2$ subcarriers: the first MS is allocated subcarriers 1 and 2, while the second MS is allocated subcarriers 2 and 3. The effective channel matrix $H$ is assumed to be

$$
H = \begin{bmatrix} 1 + j & -1 + j & 0 \\ 0 & 1 + j & 1 - j \end{bmatrix}. \tag{14}
$$

The data symbols intended for MS 1 and MS 2 are $d_1 = D_{16}$ and $d_2 = D_{11}$, respectively. Both MSs require the same SEP target $P_e$ and thus employ the same scaling factor $\beta = \beta_1 = \beta_2$. The noiseless received components at the MSs are calculated as

$$
\begin{bmatrix} \bar{y}_1 \\ \bar{y}_2 \end{bmatrix} = Hx =
\begin{bmatrix} (1 + j)x_1 + (-1 + j)x_2 \\ (1 + j)x_2 + (1 - j)x_3 \end{bmatrix}. \tag{15}
$$

Therefore, according to the constraint regions constructed (see Section IV-A2 and also Appendix A), we have:

$$
\beta - \delta_0 \le \bar{y}_1^{(r)} \le \beta + \delta_0, \qquad \beta - \delta_0 \le \bar{y}_1^{(i)} \le \beta + \delta_0, \tag{16}
$$

$$
\bar{y}_2^{(r)} \ge 2\beta - \sigma Q^{-1}\big(\sqrt{1 - P_e}\big), \qquad \bar{y}_2^{(i)} \ge 2\beta - \sigma Q^{-1}\big(\sqrt{1 - P_e}\big), \tag{17}
$$

which, by simple algebra, can be further translated into a set of general linear constraints as in (13) with $B = 0$, $e = 0$,

$$
A = \begin{bmatrix}
 1 & -1 & -1 & -1 & 0 & 0 \\
-1 &  1 &  1 &  1 & 0 & 0 \\
 1 &  1 &  1 & -1 & 0 & 0 \\
-1 & -1 & -1 &  1 & 0 & 0 \\
 0 &  0 & -1 &  1 & -1 & -1 \\
 0 &  0 &  0 &  0 &  0 &  0 \\
 0 &  0 & -1 & -1 &  1 & -1 \\
 0 &  0 &  0 &  0 &  0 &  0
\end{bmatrix}
\quad \text{and} \quad
c = \begin{bmatrix}
\beta + \delta_0 \\
-(\beta - \delta_0) \\
\beta + \delta_0 \\
-(\beta - \delta_0) \\
-\big(2\beta - \sigma Q^{-1}(\sqrt{1 - P_e})\big) \\
0 \\
-\big(2\beta - \sigma Q^{-1}(\sqrt{1 - P_e})\big) \\
0
\end{bmatrix}. \tag{18}
$$

As we can observe, matrix $A$ inherits sparsity from the effective channel matrix $H$; transmission of $d_1 = D_{16}$ and $d_2 = D_{11}$ invokes four and two inequality constraints, respectively, and there is no strict equality constraint in this example.

V. PRECODING ALGORITHM DESIGN WITH PARALLEL COMPUTATION UNITS

Problem $\mathcal{P}_1$ is strictly convex and can be solved via a number of standard algorithms, such as the interior-point algorithm [30]. Most of these algorithms are designed for centralized implementation and could be efficient enough for a small-scale problem. However, as the problem dimension increases, the computational complexity may become prohibitive. More importantly, the sparsity inherent to the problem may not be well exploited in standard solvers. Given these observations, we are interested in developing a precoding algorithm that leverages the sparsity to reduce complexity and is suitable for solving large-dimension problems using parallel computation units. These units may correspond to parallel processor cores (threads) at the BS computer [31] or parallel processors in the cloud to which the BS is connected [32].

A. Algorithm Design

We now detail the precoding algorithm with a focus on the general problem, as if all inequalities and equalities in (13) were activated. The key technique is the dual decomposition approach; see, e.g., [33]. Recalling the graph representation $\mathcal{G}$ introduced earlier (see Fig. 2), we can map all PSNs and OSNs to parallel computation units.
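Before the derivation, the iteration detailed in (21), (22) and (24) of this subsection can be sketched on the illustrative example of (14) to (18). This is a minimal sketch under stated assumptions, not the authors' implementation: the values $\beta = 4$, $\delta_0 = 0.5$, $P_e = 10^{-3}$ and $\sigma = 1$ are illustrative choices rather than values derived from (10) or Appendix A, the function names are hypothetical, and the PSN and OSN updates run in a plain sequential loop instead of on parallel units. Since $B = 0$ here, only the $\lambda$ update is needed.

```python
from statistics import NormalDist


def q_inv(p):
    """Inverse Gaussian Q-function: t such that Q(t) = p."""
    return NormalDist().inv_cdf(1.0 - p)


def build_example(beta, delta0, pe, sigma=1.0):
    """A and c of (18) for H in (14), d_1 = D_16 (inner), d_2 = D_11 (corner)."""
    thr = 2.0 * beta - sigma * q_inv((1.0 - pe) ** 0.5)  # threshold in (17)
    y1_re = [1, -1, -1, -1, 0, 0]  # Re{y_1} as a row acting on the stacked x
    y1_im = [1, 1, 1, -1, 0, 0]    # Im{y_1}
    y2_re = [0, 0, 1, -1, 1, 1]    # Re{y_2}
    y2_im = [0, 0, 1, 1, -1, 1]    # Im{y_2}
    neg = lambda row: [-v for v in row]
    zero = [0] * 6                 # padded (inactive) constraint row
    A = [y1_re, neg(y1_re), y1_im, neg(y1_im),
         neg(y2_re), zero, neg(y2_im), zero]
    c = [beta + delta0, -(beta - delta0), beta + delta0, -(beta - delta0),
         -thr, 0.0, -thr, 0.0]
    return A, c


def primal_update(A, lam):
    """(21) with B = 0: x = -(1/2) A^T lambda."""
    return [-0.5 * sum(A[i][l] * lam[i] for i in range(len(A)))
            for l in range(len(A[0]))]


def solve_dual(A, c, iters=5000):
    """Accelerated projected dual ascent following (22) and (24)."""
    m, n = len(A), len(A[0])
    gram = [[sum(A[i][l] * A[j][l] for l in range(n)) for j in range(m)]
            for i in range(m)]
    # A A^T is symmetric, so its 1-norm and infinity-norm coincide and kappa
    # reduces to the largest absolute row sum.
    kappa = max(sum(abs(v) for v in row) for row in gram)
    lam_prev = [1.0] * m
    lam = [1.0] * m
    x_prev = primal_update(A, lam_prev)
    for t in range(1, iters + 1):
        x = primal_update(A, lam)                      # PSN step
        mom = (t - 1) / (t + 2)
        x_hat = [x[l] + mom * (x[l] - x_prev[l]) for l in range(n)]
        lam_next = []
        for i in range(m):                             # OSN step
            slack = sum(A[i][l] * x_hat[l] for l in range(n)) - c[i]
            cand = lam[i] + mom * (lam[i] - lam_prev[i]) + slack / (2.0 * kappa)
            lam_next.append(max(0.0, cand))            # projection [.]^+
        lam_prev, lam, x_prev = lam, lam_next, x
    return primal_update(A, lam)


if __name__ == "__main__":
    A, c = build_example(beta=4.0, delta0=0.5, pe=1e-3)
    x = solve_dual(A, c)
    print("stacked precoded vector:", [round(v, 3) for v in x])
    print("transmit power x^T x   :", round(sum(v * v for v in x), 3))
```

In this toy instance the optimum sits at a vertex of the constraint set: the real part of $\bar{y}_1$ is pushed to its lower limit and the imaginary part to its upper limit, which a zero-forcing design with $\bar{y}_1 = D_{16}$ cannot exploit.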
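We start by forming the Lagrangian function:

$$
L(\tilde{x}, \lambda, \nu) = \tilde{x}^T \tilde{x} + \lambda^T (A\tilde{x} - c) + \nu^T (B\tilde{x} - e), \tag{19}
$$

where $\lambda \in \mathbb{R}_{\ge 0}^{4K \times 1}$ and $\nu \in \mathbb{R}^{2K \times 1}$ are Lagrangian multipliers (dual variables). Each pair of primal variables $\tilde{x}_{[2n-1:2n]}$ is associated with PSN computation unit $n$ ($n = 1, \ldots, N$), while each tuple of dual variables $\{\lambda_{[4k-3:4k]}, \nu_{[2k-1:2k]}\}$ is associated with OSN computation unit $k$ ($k = 1, \ldots, K$). The dual problem is then defined as

$$
\max_{\lambda, \nu} \; g(\lambda, \nu), \quad \text{subject to } \lambda \ge 0, \tag{20}
$$

with $g(\lambda, \nu) = \min_{\tilde{x}} L(\tilde{x}, \lambda, \nu)$ being the dual objective function. One can then solve the original problem by finding the optimal dual variables in an iterative manner. Specifically, at the $t$th iteration, for fixed dual variables $\lambda^{(t)}$ and $\nu^{(t)}$, to attain the minimization of the Lagrangian, one sets the first-order derivative of the Lagrangian to zero, which leads to

$$
\tilde{x}^{(t)} = -\frac{1}{2}\left(A^T \lambda^{(t)} + B^T \nu^{(t)}\right), \tag{21}
$$

or more explicitly,

$$
\tilde{x}_\ell^{(t)} = -\frac{1}{2}\Bigg(\sum_{i \in \mathcal{I}(\tilde{x}_\ell)} a_{i,\ell}\, \lambda_i^{(t)} + \sum_{i \in \mathcal{I}'(\tilde{x}_\ell)} b_{i,\ell}\, \nu_i^{(t)}\Bigg), \quad \ell = 1, \ldots, 2N, \tag{22}
$$

where $\mathcal{I}(\tilde{x}_\ell)$ and $\mathcal{I}'(\tilde{x}_\ell)$ denote the collections of indices of dual variables $\lambda$ and $\nu$ that interact with $\tilde{x}_\ell$, respectively, according to the graph $\mathcal{G}$. Note that PSN unit $n$ is in charge of computing the pair $\tilde{x}^{(t)}_{[2n-1:2n]}$, $n = 1, \ldots, N$. The corresponding dual function is given by:

$$
g(\lambda^{(t)}, \nu^{(t)}) = \tilde{x}^{(t)T} \tilde{x}^{(t)} + \lambda^{(t)T}\big(A\tilde{x}^{(t)} - c\big) + \nu^{(t)T}\big(B\tilde{x}^{(t)} - e\big). \tag{23}
$$

The dual variables are then updated by the OSN units in a parallel manner according to

$$
\begin{aligned}
\lambda_i^{(t+1)} &= \Bigg[\lambda_i^{(t)} + \frac{t-1}{t+2}\Big(\lambda_i^{(t)} - \lambda_i^{(t-1)}\Big) + \frac{1}{2\kappa}\Bigg(\sum_{\ell \in \mathcal{I}(\lambda_i)} a_{i,\ell}\, \hat{x}_\ell^{(t)} - c_i\Bigg)\Bigg]^{+}, \quad i = 1, \ldots, 4K, \\
\nu_j^{(t+1)} &= \nu_j^{(t)} + \frac{t-1}{t+2}\Big(\nu_j^{(t)} - \nu_j^{(t-1)}\Big) + \frac{1}{2\kappa}\Bigg(\sum_{\ell \in \mathcal{I}(\nu_j)} b_{j,\ell}\, \hat{x}_\ell^{(t)} - e_j\Bigg), \quad j = 1, \ldots, 2K,
\end{aligned} \tag{24}
$$

where $\mathcal{I}(\lambda_i)$ and $\mathcal{I}(\nu_j)$ denote the collections of indices of primal variables that interact with $\lambda_i$ and $\nu_j$, respectively; notation $[\cdot]^{+}$ denotes the projection onto the nonnegative orthant; $\kappa = \big(\|\bar{A}\bar{A}^T\|_1 \, \|\bar{A}\bar{A}^T\|_\infty\big)^{1/2}$ with $\bar{A} = [A^T, B^T]^T$; and

$$
\hat{x}_\ell^{(t)} = \tilde{x}_\ell^{(t)} + \frac{t-1}{t+2}\Big(\tilde{x}_\ell^{(t)} - \tilde{x}_\ell^{(t-1)}\Big), \quad \ell = 1, \ldots, 2N.
$$

This dual-variable updating rule offers faster convergence than the conventional gradient updating [33], as shown in [34]. The algorithm described is summarized in Table I.

B. Complexity Analysis

To quantify the complexity of the algorithm, we distinguish the communication overhead and the computational complexity. In the algorithm, to update its primal variables $\tilde{x}_{[2n-1:2n]}$ via (22), PSN unit $n$ only needs to gather dual variables from its neighboring OSNs; therefore, the number of messages passed to PSN unit $n$ depends on the number of active dual variables and is at most $4|\mathcal{I}(x_n)|$ for the 16-QAM, and $2|\mathcal{I}(x_n)|$ for the 4-QAM. To update dual variables $\{\lambda_{[4k-3:4k]}, \nu_{[2k-1:2k]}\}$, OSN unit $k$ only needs to collect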

primal variables from its neighboring PSNs; therefore, the number of messages passed to OSN unit $k$ is at most $2|\mathcal{I}(y_k)|$ for both the 4/16-QAM. Thus, with $T$ iterations, the total number of message-passings across computation units is $O(4KLT)$ and $O(6KLT)$ for the 4-QAM and 16-QAM system, respectively, where we have used the fact that $\sum_{n=1}^{N} |\mathcal{I}(x_n)| = \sum_{k=1}^{K} |\mathcal{I}(y_k)| = KL$ with $L$ being the number of non-zeros in each signature.

TABLE I
PRECODING ALGORITHM WITH PARALLEL COMPUTATION UNITS
1) Initialize: dual variables $\lambda^{(0)} > 0$, $\nu^{(0)}$;
2) Repeat for $T$ iterations until the convergence criterion is met:
   2.1) For $n = 1, \ldots, N$: PSN unit $n$ computes its primal variables $\tilde{x}_{[2n-1:2n]}$ using (22), and broadcasts the updated values to its neighboring OSN units;
   2.2) For $k = 1, \ldots, K$: OSN unit $k$ computes its dual variables $\{\lambda_{[4k-3:4k]}, \nu_{[2k-1:2k]}\}$ using (24) and broadcasts the updated values to its neighboring PSN units.

Table II summarizes the computational complexity of the proposed algorithm. To update each of its primal variables via (22), PSN unit $n$ needs at most $(4|\mathcal{I}(x_n)| - 1)$ additions and $4|\mathcal{I}(x_n)|$ multiplications for the 16-QAM, and $(2|\mathcal{I}(x_n)| - 1)$ additions and $2|\mathcal{I}(x_n)|$ multiplications for the 4-QAM. On the other hand, to update each of its dual variables via (24), OSN unit $k$ needs at most $(6|\mathcal{I}(y_k)| + 5)$ additions and $(4|\mathcal{I}(y_k)| + 2)$ multiplications for both the 4/16-QAM. Overall, the algorithm involves $O(16KLT)$ additions and multiplications for the 4-QAM system, and $O(20KLT)$ additions and multiplications for the 16-QAM system, where $T$ is the number of iterations. It is clear that the sparser the signatures are, the less communication overhead and computational complexity are required to generate the precoded symbols in the proposed algorithm.

For comparison, we note that the conventional ZF precoding of (5) has a computational complexity of $O(\frac{8}{3}K^3 + 4NK^2)$ to compute the precoding matrix $W = H^{\dagger}(HH^{\dagger})^{-1}$ and an additional complexity of $O(4KN)$ to generate each precoded symbol vector via $x = Wd$. Consider a transmission frame that consists of $T_s$ 4-QAM symbols intended for each MS and assume channel $H$ remains unchanged during the frame. The ratio of the complexity of the proposed scheme to that of the ZF approach is thus quantified by $\rho = O(16KLTT_s)/O(\frac{8}{3}K^3 + 4NK^2 + 4KNT_s)$. It is clear that the smaller $L$, the smaller $\rho$ will be. In particular, $\rho \approx 4LT/K$ for a fully loaded system with $N = K$ and sufficiently large $T_s$. As an example with $K = 32$, $L = 4$ and $T = 100$ iterations, the ratio is $\rho \approx 50$, which indicates the proposed scheme has approximately 50 times the complexity of ZF precoding. Despite the increase in complexity, the proposed scheme is able to provide enormous transmit power reduction over ZF precoding and is more robust against imperfect channel state estimation, as will be shown later in Section VII.

VI. SPARSE MC-CDMA WITH REPLICA CONSTELLATIONS

In the previous sections, we have mainly focused on the system with standard QAM constellations, for which convex constraints on the precoded vector are constructed according to the SEP targets. In this section, we assume the system adopts replica constellations, where each $\mathcal{D}_k$ is the periodic extension of a regular QAM constellation along the real and imaginary axes. We propose an optimized Tomlinson-Harashima Precoding (THP) under the SEP constraints by applying an approach similar to the one used for the system with standard QAM constellations.

A. THP-Basics

We first briefly review some basic concepts related to THP; see, e.g., [35]. In general, the replica constellation point $d_k \in \mathcal{D}_k$ can be represented as:

$$
d_k = D_\ell + 2\beta_k \sqrt{M}\,(a_R + j a_I), \tag{25}
$$

where $D_\ell$ corresponds to a regular point in the scaled $M$-QAM constellation under scaling $\beta_k$ ($\ell = 1, \ldots, M$), and $\{a_R, a_I\}$ corresponds to an arbitrary integer pair; see Fig. 6 for a visual illustration when $M = 4$. It is noted that the decision regions associated with all replica points are identical closed squares with side length $2\beta_k$.
The THP is normally done in a successive manner in which interference created by previous users' transmissions is pre-cancelled to facilitate the transmission for the current user at each stage. The encoding is accommodated by the replica constellation and the modulo operation at the transmitter. Specifically, let the channel matrix be represented as $H^{\dagger} = FR$ as a result of QR factorization, where $F$ is a unitary matrix and $R$ is an upper triangular matrix. Then $B = HF = R^{\dagger}$ is a lower triangular matrix. The successive precoding operates as

$$
\check{x}_k = \frac{1}{B_{k,k}}\left[d_k - \sum_{\ell=1}^{k-1} B_{k,\ell}\, \check{x}_\ell\right]_{p_k}, \tag{26}
$$

where $d_k \in \mathcal{D}_k$ is the replica point carrying information for MS $k$ and $[u]_{p_k}$ is the modulo operation applied to the complex number $u$ with respect to basis $p_k$, defined as:

$$
[u]_{p_k} = u - p_k \left\lfloor \frac{\Re\{u\} + p_k/2}{p_k} \right\rfloor - j\, p_k \left\lfloor \frac{\Im\{u\} + p_k/2}{p_k} \right\rfloor, \tag{27}
$$

with $p_k = 2\sqrt{M}\beta_k$, which matches the replica spacing in (25). The transmit signal is then formed by multiplying $F$ with $\check{x}$, i.e., $x = F\check{x}$. In this way, at the receiver side, no MS experiences inter-user interference because of the pre-cancellation operations done at the BS. It is remarked that since THP is performed in a successive manner, different user orderings may lead to different performance. To find the optimal ordering, one needs to do an exhaustive search over all possible orderings, which is generally infeasible as $K$ grows large. In this work we simply adopt the suboptimal V-BLAST (VB) ordering [36].
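The successive pre-cancellation in (26) and the modulo operation in (27) can be sketched as follows. This is a minimal illustration under stated assumptions: the lower triangular matrix `B`, the data symbols and the common modulo basis `p = 4` (i.e., $M = 4$ and $\beta_k = 1$ with $p_k = 2\sqrt{M}\beta_k$) are made-up values, user ordering is ignored, and the function names `modulo` and `thp_encode` are hypothetical.

```python
import math


def modulo(u, p):
    """Complex modulo of (27): folds Re{u} and Im{u} into [-p/2, p/2)."""
    re = u.real - p * math.floor((u.real + p / 2) / p)
    im = u.imag - p * math.floor((u.imag + p / 2) / p)
    return complex(re, im)


def thp_encode(B, d, p):
    """Successive precoding of (26) for a lower triangular matrix B."""
    x = []
    for k in range(len(d)):
        # Interference that previously encoded users create at user k.
        interference = sum(B[k][l] * x[l] for l in range(k))
        x.append(modulo(d[k] - interference, p) / B[k][k])
    return x


if __name__ == "__main__":
    B = [[1.0, 0.0],
         [4.5, 1.0]]            # illustrative values, not from a real channel
    d = [1 + 1j, 1 + 1j]        # 4-QAM symbols with beta = 1
    p = 4.0                     # 2 * sqrt(M) * beta with M = 4
    x = thp_encode(B, d, p)
    y = [sum(B[k][l] * x[l] for l in range(len(x))) for k in range(len(d))]
    print("encoded symbols  :", x)
    print("noiseless outputs:", y)
    print("after modulo     :", [modulo(v, p) for v in y])
```

For the second user the noiseless output lands on a replica of the intended symbol, shifted by one period in each dimension, and the receiver-side modulo maps it back to the regular point. The encoded symbol stays inside the modulo square even though the interference is large.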

TABLE II
COMPUTATIONAL COMPLEXITY OF THE PROPOSED ALGORITHM

Scheme   Operation      Additions / iteration   Multiplications / iteration   Complexity (T iterations)
4-QAM    compute (22)   4KL - 2N                4KL                           O(16KLT)
4-QAM    compute (24)   12KL + 10K              8KL + 4K
16-QAM   compute (22)   8KL - 2N                8KL                           O(20KLT)
16-QAM   compute (24)   12KL + 10K              8KL + 4K

Fig. 6. Replica constellation built on 4-QAM: $\beta_k$ is a scaling factor; $\{D_1, D_2, D_3, D_4\}$ correspond to regular signal points before periodic extension; taking $d_k = D_4$ as an example, $\mathcal{A}(d_k)$ corresponds to the decision region, $\mathcal{B}(d_k)$ corresponds to the precise constraint region on the noiseless output $\bar{y}_k$, while $\mathcal{C}(d_k)$ represents the constraint region with conservative approximation.

B. Optimized THP

Under the SEP constraints, we can formulate a precoding optimization problem similar to $\mathcal{P}_1$. The idea is that instead of choosing a minimum scaling factor $\beta_k^{\star}$ for the constellation and performing a zero-forcing THP, one can relax the constraints on the noiseless output components and introduce room for optimizing the input signals as the constellation is scaled up. In particular, with the replica constellation scaled up, the resulting constraint regions become boxes centered at each replica point, constructed and approximated in the same way as for the inner points of the 16-QAM constellation; see Fig. 6. The optimization problem is then formulated as:

$$
\mathcal{P}_2:\quad \min_{\tilde{x} \in \mathbb{R}^{2N \times 1}} \; P(\tilde{x}) = \tilde{x}^T \tilde{x}
\quad \text{s.t.} \quad A\tilde{x} - c \le 0, \tag{28}
$$

where matrices $A = \{a_{i,j}\} \in \mathbb{R}^{4K \times 2N}$ and $c = \{c_i\} \in \mathbb{R}^{4K \times 1}$ are formed according to the constraints

$$
\mathcal{C}(d_k) = \left\{ \big(\bar{y}_k^{(r)}, \bar{y}_k^{(i)}\big) :
\begin{array}{l}
d_k^{(r)} - \delta_0 \le \bar{y}_k^{(r)} \le d_k^{(r)} + \delta_0 \\
d_k^{(i)} - \delta_0 \le \bar{y}_k^{(i)} \le d_k^{(i)} + \delta_0
\end{array}
\right\}, \quad k = 1, \ldots, K, \tag{29}
$$

where parameter $\delta_0$ determines the size of the constraint box and is chosen to satisfy:

$$
Q\!\left(\frac{\delta_0 - \beta_k}{\sigma}\right) - Q\!\left(\frac{\delta_0 + \beta_k}{\sigma}\right) = \sqrt{1 - P_e}, \tag{30}
$$

and the set of information-carrying replica points $\{d_k, k = 1, \ldots, K\}$ is determined from the ZF-THP encoding procedure. Therefore, the precoding algorithm proposed in Section V can be applied here to calculate the optimized THP precoded vector.

VII. SIMULATION RESULTS

We now present numerical results to demonstrate the effectiveness of the precoding schemes proposed for the sparse MC-CDMA system.

A. Simulation Setup

In the simulation, the noise variance $N_0$ is set to unity. The total number of subcarriers is fixed at $N = 32$ and the number of MSs $K \le N$ is allowed to vary. Different MSs experience different frequency-selective fading channels. Specifically, the channel frequency response between the BS and MS $k$ is generated according to

$$
h_{k,n} = \sum_{q=0}^{Q-1} g_{k,q}\, e^{-j\frac{2\pi q n}{N}}, \quad n = 1, \ldots, N, \tag{31}
$$

where $g_k = [g_{k,0}, \ldots, g_{k,Q-1}]^T$ represents the discrete-time channel response consisting of $Q$ taps; components $\{g_{k,q}\}$ are modeled as independent zero-mean Gaussian random variables, whose individual variance equals $\{\lambda e^{-q/4}\}$ with normalization factor $\lambda$ chosen such that $E\big[\|g_k\|^2\big] = 1$. In the simulation, $Q = 8$ is adopted for the channel generation. It is assumed that there is no inter-symbol interference and no inter-carrier interference in the system.

Unless stated otherwise, for any fixed system configuration, we simulate 1000 transmission slots, under each of which random data and random channels are independently generated for each MS. In addition, we produce 10 random regular signature matrix realizations as defined in Section II-C, and thus every 100 transmission slots share the same signature matrix. The system transmit power consumption presented shortly is averaged over all transmission slots. For the precoding algorithm proposed, the calculation terminates at iteration $t$ if the normalized improvement of the dual objective satisfies $|g^{(t)} - g^{\star}| / |g^{\star}| \le \delta = 10^{-4}$, where $g^{\star} = \max_{i \in \{1, \ldots, t-1\}} g^{(i)}$.

B. System with Standard Constellation

Fix the SEP target $P_e = 10^{-3}$ for all MSs and vary $K \in [24 : 32]$. In the sparse MC-CDMA, different levels of sparsity, e.g., $L = 4$ and $L = 8$, are considered. The case with $L = 32$, referred to as the dense MC-CDMA, is also considered for the

purpose of comparison. In addition, we have also compared with two conventional precoding schemes, ZF and the optimized RZF, in the form of:

$$
x_{\mathrm{ZF}} = H^{\dagger}\big(HH^{\dagger}\big)^{-1} d, \tag{32}
$$

$$
x_{\mathrm{RZF}} = \alpha_1 H^{\dagger}\big(HH^{\dagger} + \alpha_2 I_K\big)^{-1} d, \tag{33}
$$

where $d = [d_1, \ldots, d_K]^T$, with $d_k$ denoting the transmitted data symbol for MS $k$, drawn from a version of the standard constellation scaled by $\beta_k$; $I_K$ is a $K \times K$ identity matrix; and $\alpha_1$, $\alpha_2$ are two non-negative parameters to be optimized subject to the SEP constraints in (13). Note that the optimized RZF encompasses the conventional regularized ZF precoder [12] with $\alpha_1 = 1$ and also the minimum-mean-square-error (MMSE) precoder [20] with $\alpha_1 = 1$ and $\alpha_2 = K\sigma^2/P$.

Fig. 7. Transmit power consumption at BS under different loads and different levels of sparsity ($N = 32$, $P_e = 10^{-3}$): (a) standard 4-QAM; (b) standard 16-QAM. Horizontal axis: $K$; curves: ZF, RZF and proposed precoding, each for $L = 4$, $L = 8$ and $L = 32$.

Fig. 7-(a) and Fig. 7-(b) plot the transmit power consumption at the BS versus $K$ under different setups and precoding schemes for systems with 4-QAM and 16-QAM, respectively. Two important observations are made as follows. First, the power reduction from the proposed precoding over ZF and RZF is clearly evident for all load and sparsity combinations considered. In particular, for any fixed $L$, the reduction increases as $K$ grows.
For instance, when $K = N = 32$ and $L = 8$, for the 4-QAM, we have 18.8 dB and 13.6 dB reduction compared with ZF and RZF, respectively, while for the 16-QAM, we have 16 dB reduction compared with both schemes, noting that the optimized RZF solutions are degraded and coincide with the ZF solutions in this case. As $K \to N$, the effective channel matrix $H$ is increasingly likely to be poorly conditioned. Hence, the inefficiency of conventional schemes (in particular ZF) becomes pronounced. However, the proposed precoding is not sensitive to the condition number of $H$ and always attains the best performance.

Second, for the sparse MC-CDMA system, there is a trend that a denser signature (a larger value of $L$) leads to a smaller power consumption. For instance, the system with $L = 4$ consumes slightly more power than the system with $L = 8$ to achieve the same SEP target under both the 4-QAM and 16-QAM systems. However, to attain power efficiency comparable to the dense MC-CDMA, the signatures can still be relatively sparse ($L = 8$ in our examples), yielding a considerable reduction in precoding complexity. This observation also indicates that the sparse MC-CDMA system with a proper choice of $L$ would attain almost the same throughput as the dense MC-CDMA system under the same transmit power budget. Fig. 8 plots the power consumption versus different SEP targets with $K = N = 32$ and $L = 8$, which further confirms the superiority of the proposed scheme over the baselines ZF and RZF under different SEP targets for both 4-QAM and 16-QAM systems.

Fig. 8. Power consumption versus different SEP targets ($K = N = 32$, $L = 8$). Horizontal axis: SEP target; curves: ZF, RZF and proposed precoding ($L = 8$) for 4-QAM and 16-QAM.

C. System with Replica Constellation

We now consider the system with replica constellations, where the system parameters are $N = 32$ and $L = 8$. All MSs request the same minimum SEP target $P_e = 10^{-3}$. Under this SEP requirement, a uniform scaling factor across all MSs is chosen such that the power consumption is minimized for the proposed optimized THP.
To perform the precoding optimization, we use an algorithm similar to that for systems with standard constellations. Therefore, signature sparsity is leveraged to reduce precoding complexity, as before. Fig. 9-(a) and Fig. 9-(b) plot the transmit power consumption versus $K$ under both the ZF-THP and optimized THP schemes for the system with 4-QAM and 16-QAM replica points, respectively. The performance of the proposed scheme under standard constellations is also included here for the purpose of comparison. It is seen that the optimized THP is

able to provide significant power reduction over the proposed scheme under standard constellations. This further reduction, albeit appealing, does not come for free and has to be paid for with more sophisticated encoding and decoding operations in THP schemes, as described in Section VI. It is also observed that the optimized THP generally outperforms ZF-THP in power efficiency and the exact gain depends on the system load. In particular, the former provides roughly 1.5 dB power reduction over the latter for the 4-QAM replica and roughly 0.85 dB reduction for the 16-QAM replica in a full-load system. The ZF-THP is already very power-efficient, yet the proposed THP is seen here to provide further reduction in transmit power.

Fig. 9. Power consumption for ZF-/optimized THP with replica points under different loads: (a) extended from standard 4-QAM; (b) extended from standard 16-QAM ($N = 32$, $L = 8$, $P_e = 10^{-3}$). Horizontal axis: $K$; curves: proposed (standard QAM), ZF-THP and proposed optimized THP.

D. Bit Error Rate (BER) Results and Impact of Imperfect Channel Estimation

So far, we have demonstrated the power efficiency of the proposed precoding under different uncoded SEP targets. We now evaluate the impact of the proposed scheme on another practically important performance metric, the uncoded bit error rate (BER). The BER is calculated and averaged over $10^6$ realizations of transmissions. Fig. 10 (a) and (b) depict the average BER (at a typical MS) as a function of power consumption for a system with standard/replica 4-QAM and 16-QAM, respectively. Consistent with the previous observations, the proposed optimized precoding significantly outperforms ZF precoding in terms of power efficiency to attain the same BER target under standard QAM constellations. The optimized THP is more power-efficient than the ZF-THP, and both of them generally outperform the optimized precoding under standard constellations, but at the cost of increased complexity as explained before.

Fig. 10. Uncoded BER versus power consumption ($K = N = 32$, $L = 8$): (a) standard and replica 4-QAM; (b) standard and replica 16-QAM. Vertical axis: BER; curves: ZF (standard QAM), proposed optimized precoding (standard QAM), ZF-THP and proposed optimized THP.

Next, we evaluate the impact of channel estimation error (i.e., channel uncertainty) on the performance of the schemes considered. Let $\hat{H}$ denote the estimated sparse channel matrix. Each nonzero entry $\hat{h}_{k,n}$ of $\hat{H}$ is a noisy version of the perfect $h_{k,n}$ of $H$. To model the uncertainty, we assume $\hat{h}_{k,n}$ is generated according to $\hat{h}_{k,n} = h_{k,n} + z_{k,n}$, where $z_{k,n} \sim \mathcal{CN}(0, \sigma_e^2)$ represents the complex Gaussian estimation error with variance $\sigma_e^2$ and $\{z_{k,n}, \forall k, n\}$ are independently and identically distributed. The average normalized channel uncertainty is then defined as $\tau = E\big[10 \log_{10}\big(\|\hat{H} - H\|_2^2 / \|H\|_2^2\big)\big]$ in dB. For all schemes evaluated, the SEP target $P_e$ is set to $10^{-2}$ so that the corresponding BER is on the order of $10^{-3}$ if perfect channel state information is available. Fig. 11 (a) and (b) depict the real BER (at a typical MS) versus different levels of channel uncertainty for a sparse MC-CDMA system with standard/replica 4-QAM and 16-QAM signaling, respectively. It can be seen that as the channel uncertainty increases, the real BER of both the ZF approach and the proposed precoding scheme degrades. However, the proposed scheme always outperforms its ZF counterpart and exhibits much better robustness against imperfect channel estimation, particularly for a system with standard constellations.
The proposed scheme is thus not only more power-efficient but also more robust against channel uncertainty.
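For readers who wish to reproduce the setup, the channel model of (31) and the channel-uncertainty model used above can be sketched as follows. This is a minimal sketch under stated assumptions, not the simulation code behind the figures: the taps are drawn as circularly symmetric complex Gaussians, the matrix norm is taken as the Frobenius norm, the channel matrix is kept dense (the signature sparsity pattern is not applied), and all function names and the seed are hypothetical.

```python
import cmath
import math
import random


def tap_variances(num_taps):
    """Exponential profile lambda * exp(-q/4), normalized to sum to one."""
    raw = [math.exp(-q / 4.0) for q in range(num_taps)]
    total = sum(raw)
    return [v / total for v in raw]


def complex_gaussian(variance, rng):
    """Draw one circularly symmetric CN(0, variance) sample."""
    s = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def frequency_response(num_subcarriers, num_taps, rng):
    """Channel of one MS on subcarriers n = 1..N, following (31)."""
    g = [complex_gaussian(v, rng) for v in tap_variances(num_taps)]
    return [sum(g[q] * cmath.exp(-2j * math.pi * q * n / num_subcarriers)
                for q in range(num_taps))
            for n in range(1, num_subcarriers + 1)]


def noisy_estimate(H, err_var, rng):
    """Channel estimate: every entry perturbed by CN(0, err_var) noise."""
    return [[h + complex_gaussian(err_var, rng) for h in row] for row in H]


def uncertainty_db(H, H_hat):
    """10*log10(||H_hat - H||^2 / ||H||^2) for a single realization."""
    err = sum(abs(a - b) ** 2 for r, rh in zip(H, H_hat) for a, b in zip(r, rh))
    ref = sum(abs(a) ** 2 for r in H for a in r)
    return 10.0 * math.log10(err / ref)


if __name__ == "__main__":
    rng = random.Random(0)
    K, N, Q = 32, 32, 8
    H = [frequency_response(N, Q, rng) for _ in range(K)]
    H_hat = noisy_estimate(H, err_var=0.01, rng=rng)
    print(f"normalized uncertainty: {uncertainty_db(H, H_hat):.2f} dB")
```

Averaging `uncertainty_db` over many independent realizations gives an estimate of $\tau$. With unit average channel gain per entry, an error variance of 0.01 corresponds to an uncertainty of roughly $-20$ dB.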