Stair Matrix and its Applications to Massive MIMO Uplink Data Detection

Fan Jiang, Student Member, IEEE, Cheng Li, Senior Member, IEEE, Zijun Gong, Senior Member, IEEE, and Ruoyu Su, Student Member, IEEE

Abstract: In this paper, we propose a new approach that uses the stair matrix for uplink data detection in massive MIMO systems. We first demonstrate the applicability of the proposed method by showing that the probability that the convergence conditions are met approaches one as long as a sufficiently large number of antennas is equipped at the base station. We then propose an iterative method to perform data detection and show that a much improved performance can be achieved, with the computational complexity remaining at the same level as that of existing iterative methods that adopt the diagonal matrix. Performance is evaluated in terms of the probability that the convergence conditions are met, the normalized mean-square error of the Neumann series expansion in approaching the matrix inverse, the residual estimation error in approaching linear ZF/MMSE detection, and the system bit error rate. Numerical simulations show a significant performance enhancement from using the stair matrix over the diagonal matrix in all performance aspects.

Index Terms: Massive MIMO; Stair Matrix; Iterative Method; Convergence Condition.

I. INTRODUCTION

The development and successful application of multiple-input multiple-output (MIMO) systems in modern wireless communications have opened bright prospects for massive MIMO techniques in future 5G mobile communication systems [1]-[3]. Massive MIMO, together with the millimeter wave frequency band [4], is a promising candidate to meet the high-rate, low-latency requirements of 5G systems. Owing to its large potential multiplexing and diversity gains over small-scale MIMO and single-antenna systems, massive MIMO can boost the spectral and energy efficiency of the system [1], [5]-[7].
The authors are with the Department of Electrical and Computer Engineering, Faculty of Engineering and Applied Science, Memorial University of Newfoundland, St. John's, NL, A1B 3X5, Canada. E-mail: {fjiang, licheng, zg7454}@mun.ca

Along with the benefits of massive MIMO, however, comes a higher computational complexity in signal processing (data detection, precoding, etc.), which prohibits the practical use of optimal detection methods such as maximum likelihood (ML) and maximum a posteriori (MAP) detection. To achieve a good tradeoff between system performance and computational complexity, linear detection (and precoding) methods, such as zero-forcing (ZF) and minimum mean-square error (MMSE), have been considered for implementation [8]-[15]. It has also been demonstrated that these linear detection methods achieve near-optimal performance in massive MIMO systems, especially when the number of antennas at the base station (denoted by $N_B$) is much greater than the number of user equipment (denoted by $N_U$) in service. However, ZF/MMSE based data detection schemes involve a matrix inversion, which is computationally costly (almost $O(N^3)$, where $N$ is the matrix size). Therefore, reducing the computational complexity of ZF/MMSE based data detection while maintaining near-optimal system performance has recently attracted attention [8]-[15]. Generally, these schemes fall into two categories: the first approximates the matrix inverse, and the second solves linear equations with iterative methods.

The first category approximates the matrix inverse [8]-[10]. For example, in [8], the authors introduce the Neumann series expansion to avoid the matrix inversion in linear MMSE detection.
It has been shown that when the number of antennas at the base station is much greater than the number of user equipment, as few as 3 orders of the Neumann series expansion are required (for example, $r = N_B/N_U \geq 16$). In [9], the probability that the convergence condition is met when the diagonal matrix is used in the Neumann series expansion is comprehensively discussed. However, the Neumann series expansion requires matrix multiplications, and its computational complexity is comparable to that of a matrix inversion algorithm when the expansion order is more than two. To speed up convergence, a diagonal banded Newton iteration based matrix inversion approach is studied in [10]. The result after $P$ Newton iterations can be seen as the Neumann series expansion of order $2^P - 1$ [10]. Matrix multiplications are inevitably involved in this approach as well, and the number of iterations is limited to 2 for computational complexity reasons. In summary, the methods that approximate the matrix inverse suffer from high computational complexity due to the matrix multiplications, and from slow convergence when the ratio $r$ is not sufficiently large.

The second category solves linear equations with iterative methods [11]-[15]. The basic idea is to transform the matrix inversion problem into a system of linear equations. An initial estimate is provided first, an iterative structure is then followed until convergence, and the final output is taken as the solution of the linear equations. For example, in [11], the Jacobi method is adopted, and by following the Jacobi iterative structure the estimate eventually approaches the MMSE estimate. The Richardson iteration in massive MIMO uplink data detection has been studied in [12], where the authors demonstrate that the iterative structure can converge even with zero initialization. However, as pointed out in [13], [14], the convergence rate of both the Jacobi method and the Richardson iteration is slow, hence quite a few iterations are required for convergence. The application of the Gauss-Seidel method to massive MIMO uplink data detection is studied in [13], where the convergence performance is greatly improved. By providing an initial estimate that is close to the MMSE estimate, the joint steepest descent and Jacobi method based data detection proposed in [14] greatly reduces the number of iterations. In [15], the authors formulate the MMSE estimation as a minimization problem and use the conjugate gradient method to compute the next estimate. However, the conjugate gradient based data detection scheme involves many division operations, which are also computationally costly.

Compared to the first category, solving linear equations with iterative methods has lower complexity because matrix multiplications are replaced with matrix-vector products. However, as summarized in [14], the overall computational complexity of the iterative methods, including the computations in both the initialization and the iterations, is still high. It is worth pointing out that the convergence rate of the existing iterative methods can be sped up by using preconditioning [16]. A potential direction to further reduce the computational complexity is to find an iterative method that requires less computation in initialization and fewer iterations for convergence [17].

In both categories of data detection schemes, most proposals in the existing literature use the diagonal matrix in the development. In [8], [9], the applicability of the diagonal matrix to massive MIMO uplink data detection has been demonstrated.
However, as we will show later, the diagonal matrix has some limitations. First, in massive MIMO system configurations with a small ratio $r = N_B/N_U$, the convergence rate with the diagonal matrix is slow. Put differently, more iterations (or higher orders in the Neumann series expansion) are required to provide near-optimal system performance. Besides, the convergence conditions, which are critical for both categories of data detection schemes mentioned above, are met with a low probability when $r$ is small. That is to say, in some cases the iteration based on the diagonal matrix may not converge.

The motivation of this paper is to achieve a better tradeoff between computational complexity and system performance in massive MIMO uplink data detection. We propose to use the stair matrix in the development. As far as we know, the application of the stair matrix in massive MIMO systems has not been studied. The contributions of this paper are summarized as follows:

• We show that when $N_B$ grows to infinity, the probability that the convergence conditions are met approaches 1. As the number of antennas at the base station in massive MIMO systems can be in the hundreds, this conclusion demonstrates the applicability of the stair matrix in massive MIMO systems;
• We demonstrate that the proposed iterative method using the stair matrix has the same level of computational complexity as the existing iterative methods where the diagonal matrix is applied;
• We show that by using the stair matrix, the probability that the convergence conditions are met can be greatly improved in a comparatively low $r$ region, and the cumulative distribution function of the maximum eigenvalue of the convergence matrix indicates that the convergence rate can be sped up by using the stair matrix;
• We demonstrate that by using the stair matrix, the mean-square error of the truncated Neumann series expansion in approaching the matrix inverse can be greatly reduced;
• We show that the residual estimation error of the proposed iterative method using the stair matrix is much less than that of the Jacobi method where the diagonal matrix is applied;
• We compare the system BER performance with the proposed iterative method, and show that the performance improvement over the use of the diagonal matrix is significant.

The rest of this paper is organized as follows. Section II provides the system model, including the massive MIMO structure and the preliminaries of linear ZF/MMSE detection. Section III introduces the stair matrix and its applicability in massive MIMO. The implementation of the stair matrix in massive MIMO data detection with an iterative method is presented in Section IV. In Section V, we conduct numerical simulations and present the results and discussion. Finally, conclusions are drawn in Section VI.

Notations: Throughout the paper, lowercase and uppercase bold symbols denote column vectors and matrices, respectively. $(\cdot)^T$, $(\cdot)^H$, and $(\cdot)^{-1}$ are reserved for the matrix transpose, conjugate transpose, and inverse, respectively.
$\mathbb{C}$ and $\mathbb{N}$ are reserved for the sets of complex and natural numbers, respectively. $\|\mathbf{A}\|_F$ and $\|\mathbf{a}\|_2$ are the Frobenius norm of a matrix $\mathbf{A}$ and the $\ell_2$-norm of a vector $\mathbf{a}$. $E\{\cdot\}$ and $\mathrm{cov}\{\cdot,\cdot\}$ denote the expectation and covariance operations. $\exp(\cdot)$ and $\ln(\cdot)$ denote the exponential and natural logarithmic functions, respectively. $\mathbf{I}_L$ is reserved for the size-$L$ identity matrix and $\mathbf{e}_l$ represents the $l$-th column of $\mathbf{I}_L$; $\mathrm{diag}\{\mathbf{a}\}$ converts a column vector $\mathbf{a}$ to a diagonal matrix and $\mathrm{diag}\{\mathbf{A}\}$ collects the diagonal elements of a matrix $\mathbf{A}$ into a column vector. $\rho(\mathbf{A})$ is the spectral radius of the matrix $\mathbf{A}$.

II. SYSTEM MODEL

We consider the massive MIMO uplink with $N_B$ antennas at the base station simultaneously serving $N_U$ single-antenna user equipment. The bitstream from each user is first encoded, then interleaved, and fed into a digital modulator. The modulated symbols are transmitted over the massive MIMO channel, and the received signal vector at the base station can be expressed as

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{z}, \tag{1}$$

where $\mathbf{y} = [y_1, y_2, \ldots, y_{N_B}]^T$ is a complex-valued $N_B \times 1$ vector, with $y_m$ denoting the received signal at the $m$-th receiving antenna. $\mathbf{x} = [x_1, x_2, \ldots, x_{N_U}]^T$, with the transmitted symbol of user $u$ denoted by $x_u$. $\mathbf{H} = [\mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_{N_U}]$ denotes the channel matrix with $\mathbf{h}_u \in \mathbb{C}^{N_B \times 1}$, where each entry is independent and identically distributed (i.i.d.), modeled as a flat Rayleigh fading channel [1], [5], [13]. $\mathbf{z} = [z_1, z_2, \ldots, z_{N_B}]^T$ is the noise vector, satisfying $E\{\mathbf{z}\mathbf{z}^H\} = \sigma_z^2 \mathbf{I}_{N_B}$, with each entry modeled as a zero-mean complex Gaussian circularly symmetric (ZMCGCS) random variable. It is worth noting that in frequency selective fading channels, by applying OFDM/SC-FDMA techniques, the signal model in (1) applies to each subcarrier.

A. Linear MMSE Data Detection

The multi-user data detector at the base station computes the a posteriori log likelihood ratio (LLR) of the bits associated with the modulated symbols. Given knowledge of the channel matrix (obtained through a channel estimator, where time domain and/or frequency domain training pilots are used for channel estimation [18], [19]), the well-known linear MMSE data detection is given by

$$\hat{\mathbf{x}} = (\mathbf{H}^H\mathbf{H} + \sigma_z^2 \mathbf{I}_{N_U})^{-1}\mathbf{H}^H\mathbf{y} = \mathbf{W}^{-1}\mathbf{y}_{MF}, \tag{2}$$

where $\mathbf{y}_{MF} = \mathbf{H}^H\mathbf{y}$ can be seen as the matched-filter output, and the MMSE equalization matrix $\mathbf{W}$ can be expressed as

$$\mathbf{W} = \mathbf{G} + \sigma_z^2 \mathbf{I}_{N_U}, \tag{3}$$

where $\mathbf{G} = \mathbf{H}^H\mathbf{H}$ is the Gram matrix. It is worth noting that in the high signal-to-noise ratio (SNR) region, Equation (2) reduces to

$$\hat{\mathbf{x}} = \mathbf{G}^{-1}\mathbf{y}_{MF}, \tag{4}$$

which is the linear ZF data detection scheme, where the noise component is not considered in the equalization process. To obtain the a posteriori LLR of the bits associated with the modulated symbols, we write the estimate in Equation (2) as

$$\hat{x}_u = \mathbf{e}_u^H\hat{\mathbf{x}} = \rho_u x_u + \xi_u, \tag{5}$$

where the equivalent channel gain $\rho_u$ and the a posteriori noise-plus-interference (NPI) $\xi_u$ are given by

$$\rho_u = \mathbf{e}_u^H\mathbf{W}^{-1}\mathbf{G}\mathbf{e}_u, \tag{6}$$

$$\xi_u = \mathbf{e}_u^H\mathbf{W}^{-1}\mathbf{G}(\mathbf{x} - x_u\mathbf{e}_u) + \mathbf{e}_u^H\mathbf{W}^{-1}\mathbf{H}^H\mathbf{z}. \tag{7}$$

The covariance of the NPI, $v_u = \mathrm{cov}(\xi_u, \xi_u)$, is given by

$$v_u = \mathbf{e}_u^H\mathbf{W}^{-1}\mathbf{G}\mathbf{G}\mathbf{W}^{-1}\mathbf{e}_u + \sigma_z^2\mathbf{e}_u^H\mathbf{W}^{-1}\mathbf{G}\mathbf{W}^{-1}\mathbf{e}_u - \rho_u^2 = \rho_u - \rho_u^2. \tag{8}$$
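The simplification in Equation (8) follows from $\mathbf{G} + \sigma_z^2\mathbf{I}_{N_U} = \mathbf{W}$, and it can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names (`solve`, `random_channel`, `mmse_gain_and_npi`) are hypothetical and not from the paper, and unit-energy symbols are assumed.

```python
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * q for a, q in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def random_channel(n_b, n_u, rng):
    """i.i.d. complex Gaussian entries with unit variance (flat Rayleigh fading)."""
    s = 0.5 ** 0.5
    return [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(n_u)]
            for _ in range(n_b)]

def mmse_gain_and_npi(H, sigma2, u):
    """Return (rho_u, v_u): rho_u from Eq. (6), v_u from the long form of Eq. (8)."""
    n_b, n_u = len(H), len(H[0])
    G = [[sum(H[m][i].conjugate() * H[m][j] for m in range(n_b))
          for j in range(n_u)] for i in range(n_u)]
    W = [[G[i][j] + (sigma2 if i == j else 0.0) for j in range(n_u)]
         for i in range(n_u)]
    e_u = [1.0 if i == u else 0.0 for i in range(n_u)]
    w = solve(W, e_u)                        # w = W^{-1} e_u, so e_u^H W^{-1} = w^H
    Gw = [sum(g * x for g, x in zip(row, w)) for row in G]
    rho = Gw[u].real                         # e_u^H W^{-1} G e_u is real-valued
    v = (sum(abs(t) ** 2 for t in Gw)        # e_u^H W^{-1} G G W^{-1} e_u
         + sigma2 * sum(x.conjugate() * t for x, t in zip(w, Gw)).real
         - rho ** 2)
    return rho, v
```

Calling `mmse_gain_and_npi(random_channel(16, 4, random.Random(0)), 0.1, 2)` returns a pair whose second element agrees with `rho - rho**2` up to rounding error.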
Given Equations (5), (6), and (8), we derive the max-log approximated LLR of the bits associated with $x_u$, given by

$$L(b_{u,k}) = \gamma_u\left(\min_{s \in \chi_k^0}\left|\frac{\hat{x}_u}{\rho_u} - s\right|^2 - \min_{s \in \chi_k^1}\left|\frac{\hat{x}_u}{\rho_u} - s\right|^2\right), \tag{9}$$

where $b_{u,k}$ is the $k$-th mapping bit associated with $x_u$; $\gamma_u = \rho_u^2/v_u$ is the a posteriori signal-to-noise-plus-interference ratio (SINR); $\chi_k^b \triangleq \{s \mid s \in \chi, q_k = b\}$ denotes the subset of $\chi$ in which the $k$-th mapping bit $q_k$ associated with the constellation symbol $s$ is $b$; and $\chi$ is the set of constellation symbols. After data detection for all users, the LLRs are fed into the soft-input channel decoder for decoding.

B. Neumann Series Expansion

In the previous subsection, we noted that matrix inversion is involved in linear MMSE/ZF data detection. The matrix inverse is computationally costly, especially when the matrix size is large. One promising practical solution is to employ the Neumann series expansion [8]. The complete Neumann series expansion of the matrix inverse $\mathbf{W}^{-1}$ is given by

$$\mathbf{W}^{-1} = \sum_{l=0}^{\infty}\left(\mathbf{X}^{-1}(\mathbf{X} - \mathbf{W})\right)^l\mathbf{X}^{-1}, \tag{10}$$

with the following condition satisfied:

$$\lim_{l \to \infty}(\mathbf{I} - \mathbf{X}^{-1}\mathbf{W})^l = \mathbf{0}. \tag{11}$$

When the high-order terms are ignored, the truncated Neumann series expansion can be expressed as

$$\mathbf{W}_L^{-1} = \sum_{l=0}^{L-1}\left(\mathbf{X}^{-1}(\mathbf{X} - \mathbf{W})\right)^l\mathbf{X}^{-1}. \tag{12}$$

Generally, when the matrix $\mathbf{X}$ is selected close to $\mathbf{W}$, the $L$-order expansion $\mathbf{W}_L^{-1}$ in Equation (12) is close to $\mathbf{W}^{-1}$. Fortunately, in massive MIMO systems, the Gram matrix $\mathbf{G}$ is diagonally dominant; hence the diagonal matrix $\mathbf{D} = \mathrm{diag}\{\mathbf{W}\}$ can be selected as $\mathbf{X}$, and the approximation of $\mathbf{W}^{-1}$ is given by

$$\mathbf{W}_L^{-1} = \sum_{l=0}^{L-1}\left(\mathbf{D}^{-1}(\mathbf{D} - \mathbf{W})\right)^l\mathbf{D}^{-1}. \tag{13}$$

In [8], the authors provide an upper bound on the residual estimation error when $\mathbf{W}_L^{-1}$ is used to approximate $\mathbf{W}^{-1}$, i.e.,

$$\left\|(\mathbf{W}^{-1} - \mathbf{W}_L^{-1})\mathbf{y}_{MF}\right\|_2 \leq \left\|\mathbf{I} - \mathbf{D}^{-1}\mathbf{W}\right\|_F^L\,\|\hat{\mathbf{x}}\|_2, \tag{14}$$

where $\|\mathbf{A}\|_F$ and $\|\mathbf{a}\|_2$ are the Frobenius norm of a matrix $\mathbf{A}$ and the $\ell_2$-norm of a vector $\mathbf{a}$. From Equation (14), we can see that the upper bound on the residual estimation error decreases as the expansion order and $N_B$ increase.
In other words, if the number of antennas at the base station is sufficiently large, the residual estimation error is small even with a low-order expansion. In particular, when $N_B$ is sufficiently large and the expansion order $L \leq 2$, the computation required for the Neumann series expansion is much lower than that of the matrix inversion. These two factors support the use of the diagonal matrix in the Neumann series expansion for massive MIMO systems.

C. Jacobi Method

In the Neumann series expansion, if the expansion order is greater than 2, matrix multiplications are involved; hence, the computational complexity is comparable with that of the matrix inversion. On the other hand, as Equation (14) shows, if $N_B$ is not sufficiently large, the residual estimation error at such low expansion orders is still considerable. These two factors limit the application of the diagonal matrix in the Neumann series expansion.
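The effect of the expansion order in Equation (13) can be illustrated with a small numerical sketch. The code below applies the truncated series to the matched-filter output term by term and reports the residual $\|\mathbf{y}_{MF} - \mathbf{W}\mathbf{W}_L^{-1}\mathbf{y}_{MF}\|_2$ for $L = 1, \ldots, 6$. It is an illustration under assumed settings ($N_B = 128$, $N_U = 4$, $\sigma_z^2 = 0.1$, QPSK symbols), not the simulation setup of the paper, and all function names are hypothetical.

```python
import random

def gram(H):
    """Gram matrix G = H^H H for H given as a list of rows."""
    n_b, n_u = len(H), len(H[0])
    return [[sum(H[m][i].conjugate() * H[m][j] for m in range(n_b))
             for j in range(n_u)] for i in range(n_u)]

def matvec(A, x):
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def norm2(x):
    return sum(abs(t) ** 2 for t in x) ** 0.5

def neumann_residuals(W, y_mf, max_order):
    """Residual norms ||y_mf - W x_L|| for L = 1..max_order, with X = diag(W)."""
    d_inv = [1.0 / W[i][i] for i in range(len(W))]
    term = [d * y for d, y in zip(d_inv, y_mf)]      # l = 0 term: D^{-1} y_mf
    x = list(term)
    residuals = []
    for _ in range(max_order):
        residuals.append(norm2([y - t for y, t in zip(y_mf, matvec(W, x))]))
        w_term = matvec(W, term)
        # next term: (I - D^{-1} W) applied to the previous term
        term = [t - d * wt for t, d, wt in zip(term, d_inv, w_term)]
        x = [a + b for a, b in zip(x, term)]
    return residuals

def demo(n_b=128, n_u=4, sigma2=0.1, max_order=6, seed=0):
    rng = random.Random(seed)
    s = 0.5 ** 0.5
    H = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(n_u)]
         for _ in range(n_b)]
    x_true = [complex(rng.choice((-s, s)), rng.choice((-s, s))) for _ in range(n_u)]
    n_std = (sigma2 / 2) ** 0.5
    y = [sum(h * t for h, t in zip(row, x_true))
         + complex(rng.gauss(0, n_std), rng.gauss(0, n_std)) for row in H]
    G = gram(H)
    W = [[G[i][j] + (sigma2 if i == j else 0.0) for j in range(n_u)]
         for i in range(n_u)]
    y_mf = [sum(H[m][i].conjugate() * y[m] for m in range(n_b)) for i in range(n_u)]
    return neumann_residuals(W, y_mf, max_order)
```

Working on vectors rather than forming $\mathbf{W}_L^{-1}$ explicitly keeps the sketch short; forming the matrix itself would require the matrix multiplications discussed above.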

To avoid the matrix multiplications while maintaining a reasonable expansion order, we can use iterative methods. To be specific, we first rewrite the MMSE estimation in Equation (2) as

$$\mathbf{W}\hat{\mathbf{x}} = \mathbf{y}_{MF}. \tag{15}$$

By transforming the matrix inversion problem into the form of Equation (15), we can adopt iterative methods to solve the linear equations. Generally, iterative methods follow this process: (1) provide an initial estimate; (2) follow an iterative structure to obtain the next estimate; (3) when the estimate converges, output the final estimate. In the Jacobi method, the initial estimate can be chosen as

$$\mathbf{x}^{(0)} = \mathbf{D}^{-1}\mathbf{y}_{MF}, \tag{16}$$

which is the common selection in most of the existing literature. The iterative structure is given by

$$\mathbf{x}^{(i+1)} = \mathbf{D}^{-1}\left((\mathbf{D} - \mathbf{W})\mathbf{x}^{(i)} + \mathbf{y}_{MF}\right) = \mathbf{x}^{(i)} - \mathbf{D}^{-1}\mathbf{W}\mathbf{x}^{(i)} + \mathbf{D}^{-1}\mathbf{y}_{MF}, \tag{17}$$

where $\mathbf{x}^{(i)}$ denotes the $i$-th estimate. Following the iterative structure in Equation (17) with the initial estimate given by Equation (16), we can derive the $i$-th estimate as

$$\mathbf{x}^{(i)} = \sum_{l=0}^{i}\left(\mathbf{D}^{-1}(\mathbf{D} - \mathbf{W})\right)^l\mathbf{D}^{-1}\mathbf{y}_{MF}. \tag{18}$$

That is to say, with the initial estimate given by (16), $i$ iterations of the Jacobi iterative structure give the same estimate as the $(i+1)$-th order Neumann series expansion. Therefore, the convergence conditions, the residual estimation error, and the estimation results are the same as those in the previous subsection. However, as Equations (16) and (17) show, only matrix-vector products are involved; therefore, the Jacobi method has lower complexity than the Neumann series expansion with the same number of iterations (or orders in the Neumann series).

III. STAIR MATRIX AND ITS APPLICABILITY TO MASSIVE MIMO SYSTEMS

In this section, we first introduce the stair matrix and its properties. We then demonstrate the applicability of the stair matrix to massive MIMO systems.

A. Stair Matrix and its Properties

If the entries $A_{(m,n)} = \mathbf{e}_m^H\mathbf{A}\mathbf{e}_n$, $m, n = 1, 2, \ldots, N$, of an $N \times N$ matrix $\mathbf{A}$ satisfy $A_{(m,n)} = 0$ for $n \notin \{m-1, m, m+1\}$, we call $\mathbf{A}$ a tridiagonal matrix, denoted by $\mathbf{A} = \mathrm{tridiag}(A_{(m,m-1)}, A_{(m,m)}, A_{(m,m+1)})$. A special tridiagonal matrix is defined as a stair matrix if one of the following conditions is satisfied [20], [21]:

(I) $A_{(m,m-1)} = 0$, $A_{(m,m+1)} = 0$, where $m = 2k-1$, $k = 1, 2, \ldots, \lfloor (N+1)/2 \rfloor$;
(II) $A_{(m,m-1)} = 0$, $A_{(m,m+1)} = 0$, where $m = 2k$, $k = 1, 2, \ldots, \lfloor N/2 \rfloor$.

Accordingly, a stair matrix is of type I if condition (I) is satisfied and of type II if condition (II) is satisfied. For example, a $5 \times 5$ stair matrix has one of the following forms, where $\times$ marks a possibly nonzero entry:

$$\mathbf{A} = \begin{bmatrix} \times & 0 & 0 & 0 & 0 \\ \times & \times & \times & 0 & 0 \\ 0 & 0 & \times & 0 & 0 \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & 0 & \times \end{bmatrix} \quad \text{or} \quad \mathbf{A} = \begin{bmatrix} \times & \times & 0 & 0 & 0 \\ 0 & \times & 0 & 0 & 0 \\ 0 & \times & \times & \times & 0 \\ 0 & 0 & 0 & \times & 0 \\ 0 & 0 & 0 & \times & \times \end{bmatrix}.$$

The former is of type I and the latter is of type II. Next, we provide the following properties of the stair matrix in Corollaries 1 and 2.

Algorithm 1: Compute the Inverse of a Stair Matrix
Input: The stair matrix $\mathbf{A} = \mathrm{stair}(A_{(m,m-1)}, A_{(m,m)}, A_{(m,m+1)})$
Output: $\mathbf{A}^{-1} = \mathbf{B} = \mathrm{stair}(B_{(m,m-1)}, B_{(m,m)}, B_{(m,m+1)})$
1. for $m = 1 : 1 : N$
2.   $B_{(m,m)} = 1/A_{(m,m)}$;
3. end
4. for $m = 2 : 2 : 2\lfloor N/2 \rfloor$
5.   $B_{(m,m-1)} = -A_{(m,m-1)}B_{(m,m)}B_{(m-1,m-1)}$;
6.   $B_{(m,m+1)} = -A_{(m,m+1)}B_{(m,m)}B_{(m+1,m+1)}$;
7. end
Return $\mathbf{B}$.

Corollary 1: Let $\mathbf{A}$ be a stair matrix. Then $\mathbf{A}^H$ is also a stair matrix. In addition, if $\mathbf{A}$ is of type I, then $\mathbf{A}^H$ is of type II, and vice versa.

Proof: Corollary 1 follows directly from the definition.

Corollary 1 shows that the properties of stair matrices of type I and type II are almost the same; therefore, we only consider the stair matrix of type I hereafter unless otherwise specified.

Corollary 2: Let $\mathbf{A}$ be a stair matrix. $\mathbf{A}$ is nonsingular if and only if $A_{(m,m)}$, $m = 1, 2, \ldots, N$, is nonsingular. Furthermore, the inverse $\mathbf{A}^{-1}$ is also a stair matrix of the same type, given by $\mathbf{A}^{-1} = \mathbf{D}^{-1}(2\mathbf{D} - \mathbf{A})\mathbf{D}^{-1}$, where $\mathbf{D} = \mathrm{diag}(\mathbf{A})$.

Proof: Since $\det(\mathbf{A}) = \prod_{m=1}^{N}A_{(m,m)}$, we can see that $\mathbf{A}$ is nonsingular if and only if $A_{(m,m)}$, $m = 1, 2, \ldots, N$, is nonsingular.
Carrying out the matrix multiplications, we obtain $\mathbf{D}^{-1}(2\mathbf{D} - \mathbf{A})\mathbf{D}^{-1}\mathbf{A} = \mathbf{I}_N$. Moreover, it is easy to see that $\mathbf{A}^{-1}$ is also a stair matrix of the same type as $\mathbf{A}$.

Without loss of generality, we denote a stair matrix of type I as $\mathbf{A} = \mathrm{stair}(A_{(m,m-1)}, A_{(m,m)}, A_{(m,m+1)})$. From Corollary 2, we have Algorithm 1 to obtain $\mathbf{A}^{-1}$. It is clear from Algorithm 1 that the complexity of computing the inverse of a stair matrix is $O(N)$, which is of the same order as the computation of $\mathbf{D}^{-1}$.

B. Using Stair Matrix in Neumann Series Expansion

We define the stair matrix $\mathbf{S} = \mathrm{stair}(G_{(u,u-1)}, G_{(u,u)}, G_{(u,u+1)})$, derived from the Gram matrix $\mathbf{G}$ as

$$S_{(u,v)} = \begin{cases} G_{(u,v)}, & u \in U_1,\ v = u; \\ G_{(u,v)}, & u \in U_2,\ v \in \{u-1, u, u+1\}; \\ 0, & \text{otherwise}, \end{cases}$$

5 B (,v) = G (,v) + G (, 1) G ( 1,v) G (,) G (,) G ( 1, 1) G (, 1) G ( 1,) G (,) G ( 1, 1) G (,v), G (,) U 1, v ; 0, U 1, v = ; + G (,+1) G (+1,v), G (,) G (+1,+1) U, v ; + G (,+1) G (+1,) G (,) G (+1,+1), U, v =. () where U {n n N, n N U }; U 1 and U are sbsets of U, defined as U 1 {n n U, n = k 1, k N}, and U {n n U, n = k, k N}, respectively. Applying the stair matrix in Nemann series expansion in Eqation (10), we have 1 G 1 = (I S 1 G) k S 1, (19) k=0 where X is replaced with the stair matrix S and the Gram matrix is considered. The convergence condition for Eqation (19) is lim (I k S 1 G) k = 0, (0) or eqivalently ρ (I S 1 G) = λ 0 < 1, (1) where ρ (A) is the spectral radis of the matrix A, and λ 0 λ 1 λ NU denote the N U eigenvales. The convergence condition is critical for the application of the stair matrix in massive MIMO systems. In order to investigate the spectral radim of I S 1 G, we sppose N U is odd, and define B = I S 1 G, with each entry given by Eqation (), where Algorithm 1 is sed to compte the matrix inverse of the stair matrix. We have the following Lemmas: Lemma 1: B (,v) is given by Eqation (). For U 1, v and N B > 4, we have E { B (,v) } where A 1 and B 1 are respectively given by A1 B 1, (3) A 1 = N B (N B + 1), (4) B 1 = (N B 1) (N B ) (N B 3) (N B 4). (5) Proof: See Appendix B. Lemma : B (,v) is given by Eqation (). For U, v { 1, + 1}, and N B > 4, we have E { B (,v) A }, (6) B 1 where B 1 is given by Eqation (5), and A is given by A = 96N B + 4N B (N B 1) (N B ) (N B 3) + 144N B (N B 1) + 48N B (N B 1) (N B ), (7) 1 For illstration consideration, we investigate the stair matrix in linear ZF detection. However, similar analysis for the stair matrix in linear MMSE detection is straightforward by following the similar process, and the applicability can be demonstrated as well. When N U is even, the difference in the expression of B is only present in the last row. However, the general reslt is also expected. Proof: See Appendix C. 
Lemma 3: B (,v) is given by Eqation (). For U, v { 1,, + 1}, and N B > 4, we have E { B (,v) } 1A A 3 + 6A 1 A 3 + 4A 4 + 48 A 1 A A 3 3 B 3 1 (8) where A 1, A, and B 1 are given by Eqations (4), (7), and (5), respectively. A 3 and A 4 are respectively given by A 3 = 4N B + N B (N B 1) (N B ) (N B 3) + 36N B (N B 1) + 1N B (N B 1) (N B ), A 4 = N B (N B 1) (N B ) 3 (N B 3) 3 + 6N B (N B 1) (N B ) 3 (N B 3) + 46N B (N B 1) (N B ) (N B 3) + 4N B (N B 1) (N B ) 3 (N B 3) + 0N B (N B 1) (N B ) 3 (N B 3) + 48N B (N B 1) (N B ) (N B 3) + 808N B (N B 1) (N B ) (N B 3) + 18N B (N 1) (N B ) (N B 3) + 83N B (N B 1) (N B ) (N B 3) + 40N B (N B 1) (N B ) 3 + 600N B (N B 1) (N B ) 3 + 4N B (N B 1) 3 (N B ) + 576N B (N B 1) (N B ) + 3480N B (N B 1) (N B ) + 64N B (N B 1) 3 (N B ) + 59N B (N B 1) (N B ) + 8064N B (N B 1) (N B ) + 56N B (N B 1) 3 + 435N B (N B 1) + 9888N B (N B 1) + 304N B. (9) (30) Proof: See Appendix D. Lemma 4: B (,v) is given by Eqation (). For U, v =, and N B > 4, we have E { B (,) } 16A 3A 5 B1 3, (31) where A 3 and B 1 are given by Eqations (9) and (5), respectively. A 5 is given by A 5 = 576N B + 4N B (N B 1) (N B ) (N B 3) +864N B (N B 1) + 88N B (N B 1) (N B ). (3)

6 Proof: See Appendix E. With the reslts in Lemma 1-4, we have E { B F } = N U N U 1 N U =1 v=1 A1 + N U 4N U + 3 E { B (,v) } + (N U 1) B 1 A + (N U 1) B 1 16A 3A 5 B 3 1 1A A 3 + 6A 1 A 3 + 4A 4 + 48 A 1 A A 3 3 B 3 1 (33) Apparently, at the right hand side of the ineqality (33), as the power in nmerator is mch less than that in denominator, we can derive lim E N B { B F } = 0. (34) Applying the Markov s ineqality, we have Pr { B F < 1} = 1 Pr { B F 1} 1 E { B F }. (35) As B F = N U 1 i=0 λ i, together with the ineqality (35), we can see that with sfficiently large nmber of antennas at base station (i.e., N B ), the probability that convergence condition in (1) is satisfied, will approach 1. Following the similar analysis, we can also demonstrated that with sfficient large N B, sing stair matrix, the probability that the convergence condition is met will also approach 1 in the approximation of the linear MMSE estimation. Hence we demonstrate the applicability of the stair matrix in massive MIMO systems. C. Residal Estimation Error We now investigate the residal estimation error by sing the trncated Nemann series expansion. According to Eqation (1), we have G 1 L 1 L = Replacing G 1 with G 1 L (S 1 (S G)) l S 1. (36) in Eqation (4), we have ˆx (L) = G 1 L y MF. (37) Therefore, the residal estimation error J = ˆx (L) ˆx, is bonded as J = (G 1 G 1 L ) ymf = (S 1 (S G)) l S 1 y MF l=l = (S 1 (S G)) L G 1 y MF B L F ˆx, (38) where the ineqality holds since Ax A F x. As N B, Pr { B F < 1} 1. That is to say, the residal estimation error will approach 0 as indicated by ineqality (38). Ineqality (38) also indicates that increasing the trncation order in Nemann series expansion, the pper bond of the residal estimation error can be redced. This evidence, together with high probability with the convergence condition to be met, spports the applications of the stair matrix to massive MIMO systems. IV. 
IMPLEMENTATION OF THE STAIR MATRIX IN ITERATIVE METHOD

Due to the matrix multiplications involved, the truncation order in the Neumann series expansion is limited to three; otherwise, the computational complexity is comparable with that of a matrix inversion algorithm. Besides, we note that in existing work, the LLR is computed using the NPI after the first truncation order in the Neumann series expansion (or the first iteration in an iterative method). This implementation, however, causes a significant performance loss when $N_B$ is not sufficiently large (or $r = N_B/N_U$ is not large, for example, $r < 8$). In this section, we address these issues by applying the stair matrix in an iterative method.

A. Stair Matrix in Iterative Method

Compared to linear ZF detection, linear MMSE detection achieves a better balance between interference and noise. Therefore, we introduce an iterative method to approach linear MMSE detection. To start with, we define the stair matrix $\mathbf{S} = \mathrm{stair}(W_{(u,u-1)}, W_{(u,u)}, W_{(u,u+1)})$. It is worth noting that compared to the stair matrix discussed in the previous section, the diagonal elements of the new stair matrix are increased by $\sigma_z^2$ according to Equation (3), which brings negligible computational cost. Following the structure of Equation (17), we have

$$\mathbf{x}^{(i+1)} = \mathbf{S}^{-1}\left((\mathbf{S} - \mathbf{W})\mathbf{x}^{(i)} + \mathbf{y}_{MF}\right) = \mathbf{x}^{(i)} - \mathbf{S}^{-1}\mathbf{W}\mathbf{x}^{(i)} + \mathbf{S}^{-1}\mathbf{y}_{MF}, \tag{39}$$

where $\mathbf{x}^{(i)}$ is the $i$-th estimate. Accordingly, if the initial estimate $\mathbf{x}^{(0)}$ is selected as

$$\mathbf{x}^{(0)} = \mathbf{S}^{-1}\mathbf{y}_{MF}, \tag{40}$$

then following the iterative process in Equation (39), we can derive

$$\mathbf{x}^{(i)} = \sum_{l=0}^{i}\left(\mathbf{S}^{-1}(\mathbf{S} - \mathbf{W})\right)^l\mathbf{S}^{-1}\mathbf{y}_{MF}, \tag{41}$$

which indicates that the iterative method in Equation (39) can be seen as a truncated Neumann series expansion method. However, in Equation (39), only matrix-vector products are involved, hence the overall computational complexity is of the order $O(KN_U^2)$, where $K$ denotes the number of iterations.

B. Computation of the LLR

After estimating the transmitted vector $\mathbf{x}$, we need to compute the LLRs of the associated bits for the soft-input channel decoder. After $K$ iterations, the equivalent channel gain $\rho_u^{(K)}$ and the covariance of the NPI $v_u^{(K)}$ are respectively given by

$$\rho_u^{(K)} = \mathbf{e}_u^H\mathbf{W}_K^{-1}\mathbf{G}\mathbf{e}_u, \tag{42}$$

$$v_u^{(K)} = \mathbf{e}_u^H\mathbf{W}_K^{-1}\mathbf{G}\mathbf{G}\mathbf{W}_K^{-1}\mathbf{e}_u + \sigma_z^2\mathbf{e}_u^H\mathbf{W}_K^{-1}\mathbf{G}\mathbf{W}_K^{-1}\mathbf{e}_u - \left(\rho_u^{(K)}\right)^2. \tag{43}$$

Equations (42) and (43) require matrix multiplications if $K \geq 2$. Therefore, in [8], [13]-[15], $\mathbf{D}^{-1}$, which is the first truncation order, is used as a simplification. This approximation, however, as we will show in the next section, causes a significant performance loss. As Equation (8) shows, the exact a posteriori covariance of the NPI in linear MMSE estimation can be derived once the equivalent channel gain is obtained. However, in [8], the authors claim that this relationship does not hold for the truncated Neumann series expansion. The main reason for that claim is that $\mathbf{W}_K^{-1}$ is far from $\mathbf{W}^{-1}$ for small $K$. In the previous subsection, we introduced the iterative method for detection, in which the number of iterations can be sufficiently large since the computational complexity of one iteration is of the order $O(N_U^2)$. With sufficiently many iterations, $\mathbf{W}_K^{-1}$ can be quite close to $\mathbf{W}^{-1}$ (we will show this in the next section); hence, we can use Equation (8) to derive the covariance of the NPI.

The next question is how to obtain the equivalent channel gain at low computational complexity. We rewrite the equivalent channel gain in Equation (8) as $\rho_u = \mathbf{e}_u^H\mathbf{W}^{-1}\mathbf{G}\mathbf{e}_u = 1 - \sigma_z^2\mathbf{e}_u^H\mathbf{W}^{-1}\mathbf{e}_u$. In addition, we replace $\mathbf{W}^{-1}$ with $\mathbf{W}_K^{-1}$, leading to

$$\rho_u^{(K)} = 1 - \sigma_z^2\mathbf{e}_u^H\mathbf{W}_K^{-1}\mathbf{e}_u. \tag{44}$$

That is to say, we need the diagonal elements of $\mathbf{W}_K^{-1}$ to compute $\rho_u^{(K)}$. If $N_B$ and $r$ are sufficiently large, the Gram matrix $\mathbf{G}$ and $\mathbf{W}$ become diagonally dominant; therefore, $\mathbf{D}^{-1}$ is a good approximation of $\mathbf{W}^{-1}$, and $\rho_u^{(K)}$ is approximated as

$$\rho_u^{(K)} \approx 1 - \sigma_z^2 D^{-1}_{(u,u)}, \tag{45}$$

and $v_u^{(K)}$ is approximated as

$$v_u^{(K)} \approx \rho_u^{(K)}\left(1 - \rho_u^{(K)}\right). \tag{46}$$

As a consequence, the a posteriori SINR is approximated as

$$\gamma_u^{(K)} \approx \frac{\left(\rho_u^{(K)}\right)^2}{v_u^{(K)}} = \frac{\rho_u^{(K)}}{1 - \rho_u^{(K)}}. \tag{47}$$

$\rho_u^{(K)}$ and $\gamma_u^{(K)}$ are used in Equation (9) to compute $L(b_{u,k})$.
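The stair-matrix inverse of Algorithm 1, the iteration in Equations (39) and (40), and the approximations in Equations (45) and (47) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: all function names are hypothetical, indices are 0-based (so the even rows of the 1-based text are the odd indices here), and dense matrix-vector products are used for clarity where a real implementation would exploit the sparsity of $\mathbf{S}^{-1}$.

```python
import random

def matvec(A, x):
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def norm2(x):
    return sum(abs(t) ** 2 for t in x) ** 0.5

def stair_of(W):
    """Type-I stair part of W: diagonal plus the two neighbours on every other row."""
    n = len(W)
    S = [[0j] * n for _ in range(n)]
    for i in range(n):
        S[i][i] = W[i][i]
    for i in range(1, n, 2):             # 0-based odd index = even row in the text
        S[i][i - 1] = W[i][i - 1]
        if i + 1 < n:
            S[i][i + 1] = W[i][i + 1]
    return S

def stair_inverse(S):
    """Algorithm 1: inverse of a type-I stair matrix in O(N) operations."""
    n = len(S)
    B = [[0j] * n for _ in range(n)]
    for i in range(n):
        B[i][i] = 1.0 / S[i][i]
    for i in range(1, n, 2):
        B[i][i - 1] = -S[i][i - 1] * B[i][i] * B[i - 1][i - 1]
        if i + 1 < n:
            B[i][i + 1] = -S[i][i + 1] * B[i][i] * B[i + 1][i + 1]
    return B

def stair_detect(W, y_mf, sigma2, iterations):
    """Return the estimate after the given number of iterations, plus rho and gamma."""
    S_inv = stair_inverse(stair_of(W))
    x = matvec(S_inv, y_mf)                              # Eq. (40)
    for _ in range(iterations):
        r = [y - t for y, t in zip(y_mf, matvec(W, x))]
        x = [a + b for a, b in zip(x, matvec(S_inv, r))]  # equivalent to Eq. (39)
    rho = [1.0 - sigma2 * S_inv[u][u].real for u in range(len(W))]   # Eq. (45)
    gamma = [p / (1.0 - p) for p in rho]                 # Eq. (47)
    return x, rho, gamma

def random_system(n_b, n_u, sigma2, seed):
    """Hypothetical test system: Rayleigh channel, QPSK symbols, returns (W, y_mf)."""
    rng = random.Random(seed)
    s = 0.5 ** 0.5
    H = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(n_u)]
         for _ in range(n_b)]
    x = [complex(rng.choice((-s, s)), rng.choice((-s, s))) for _ in range(n_u)]
    n_std = (sigma2 / 2) ** 0.5
    y = [sum(h * t for h, t in zip(row, x))
         + complex(rng.gauss(0, n_std), rng.gauss(0, n_std)) for row in H]
    W = [[sum(H[m][i].conjugate() * H[m][j] for m in range(n_b))
          + (sigma2 if i == j else 0.0) for j in range(n_u)] for i in range(n_u)]
    y_mf = [sum(H[m][i].conjugate() * y[m] for m in range(n_b)) for i in range(n_u)]
    return W, y_mf
```

The update `x + S_inv (y_mf - W x)` is algebraically the same as $\mathbf{S}^{-1}((\mathbf{S} - \mathbf{W})\mathbf{x} + \mathbf{y}_{MF})$; it is written in residual form only to avoid building $\mathbf{S} - \mathbf{W}$.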
It is worth pointing out that although we use the diagonal matrix to estimate the equivalent channel gain, the computation of $\gamma_u^{(K)}$ in Equation (47) aims to approach the SINR of linear MMSE detection when deriving the LLRs of the associated bits. This is quite different from the existing work [8], [13]-[15], where the SINR after the first iteration (or the first truncation order in the Neumann series expansion method) is adopted. In fact, as the number of iterations increases, the covariance of the NPI decreases, and our proposed approximation method is more efficient and accurate. In numerical simulations, we also validate that our approximation in (45) and (47) outperforms the approximation in existing work. To summarize, we present Algorithm 2 for the proposed iterative method using the stair matrix.

Algorithm 2: Proposed Iterative Method Using Stair Matrix
Input: $\mathbf{y}$, $\mathbf{H}$, $\sigma_z^2$, and iteration number $K$;
Output: LLRs of the associated bits $L(b_{u,k})$.
Initialization:
1. $\mathbf{G} = \mathbf{H}^H\mathbf{H}$, $\mathbf{W} = \mathbf{G} + \sigma_z^2\mathbf{I}_{N_U}$, $\mathbf{y}_{MF} = \mathbf{H}^H\mathbf{y}$;
2. $\mathbf{S} = \mathrm{stair}(W_{(u,u-1)}, W_{(u,u)}, W_{(u,u+1)})$;
3. Compute $\mathbf{S}^{-1}$ through Algorithm 1, and $\mathbf{D}^{-1} = \mathrm{diag}(\mathbf{S}^{-1})$;
4. Initial estimation: $\mathbf{x}^{(0)} = \mathbf{S}^{-1}\mathbf{y}_{MF}$;
Iteration:
5. for $i = 1 : 1 : K$
6.   $\mathbf{x}^{(i)} = \mathbf{S}^{-1}\left((\mathbf{S} - \mathbf{W})\mathbf{x}^{(i-1)} + \mathbf{y}_{MF}\right)$;
7. end
LLR Computation:
8. $\rho_u^{(K)} = 1 - \sigma_z^2 D^{-1}_{(u,u)}$, $\gamma_u^{(K)} = \rho_u^{(K)} / (1 - \rho_u^{(K)})$;
9. $L(b_{u,k}) = \gamma_u^{(K)}\left(\min_{s \in \chi_k^0}\left|\hat{x}_u^{(K)}/\rho_u^{(K)} - s\right|^2 - \min_{s \in \chi_k^1}\left|\hat{x}_u^{(K)}/\rho_u^{(K)} - s\right|^2\right)$.
Return $L(b_{u,k})$.

Fig. 1. Cumulative distribution function of the maximum eigenvalue, $N_U = 25$.

C. Computational Complexity Analysis

We count the number of real multiplications/divisions to evaluate the computational complexity. In the initialization steps, the computation of $\mathbf{W}$ and $\mathbf{y}_{MF}$ requires $2N_BN_U^2$ and $4N_BN_U$ real multiplications, respectively. According to Algorithm 1, the computation of $\mathbf{S}^{-1}$ requires $3(N_U - 1)$ real multiplications and $N_U$ real divisions. The initial estimation in Step 4 requires $\frac{N_U+1}{2} \cdot 2 + \frac{N_U-1}{2}(8 + 2) = 6N_U - 4$ real multiplications.
Therefore, the total computational complexity of the initialization steps is $2N_B N_U^2 + 4N_B N_U + 10N_U - 7$.

The iteration steps in Algorithm 2 involve matrix-vector products. The computation of $(\mathbf{S} - \mathbf{W}) \mathbf{x}^{(i)}$ requires $\frac{N_U+1}{2} \cdot 4(N_U - 1) + \frac{N_U-1}{2} \cdot 4(N_U - 3) = 4(N_U - 1)^2$ real number multiplications. The resultant vector is then multiplied by a stair matrix, which requires an additional $6N_U - 4$ real number multiplications. Therefore, the total computational complexity of the iteration steps is $K(4N_U^2 - 2N_U)$. That is to say, the computational complexity of the iterative process is of $O(N_U^2)$, which is the same as that of the existing iterative methods where the diagonal matrix is applied.

Last, to obtain $L(b_{u,k})$, we need to compute $\rho_u^{(K)}$, and the proposed approximation only requires the diagonal elements of $\mathbf{D}^{-1}$, which are obtained in Step 3. Compared to the existing work in [8], [13]-[15], our proposed scheme saves computational complexity in this stage.

To summarize, the overall computational complexity is at the same level as that of the existing work. However, as we will see in the next section, the stair matrix outperforms the diagonal matrix in all respects.

V. NUMERICAL SIMULATIONS AND PERFORMANCE EVALUATION

A. Convergence Conditions

We first investigate the convergence condition using the stair matrix. Using the Monte-Carlo method, we generate 2e7 random channel matrices $\mathbf{H}$. For each $\mathbf{H}$, we extract the diagonal matrix $\mathbf{D}$ and the stair matrix $\mathbf{S}$, and compute the maximum eigenvalues of the matrices $\mathbf{I} - \mathbf{D}^{-1}\mathbf{G}$ and $\mathbf{I} - \mathbf{S}^{-1}\mathbf{G}$, respectively. From the numerical simulations, we obtain the cumulative distribution function (CDF) of the maximum eigenvalues, shown in Figure 1. In Figure 1, we evaluate the scenario in which 25 users are in service and the number of antennas at the base station increases from 100 to 200. The following observations can be made:

• As the number of antennas at the base station increases, the probability that the convergence conditions are met, i.e., $\Pr\{\rho(\mathbf{I} - \mathbf{S}^{-1}\mathbf{G}) < 1\}$ and $\Pr\{\rho(\mathbf{I} - \mathbf{D}^{-1}\mathbf{G}) < 1\}$, increases. Specifically, for the diagonal matrix, this probability increases from 0.29 when $N_B = 100$ to 1 when $N_B = 200$. Likewise, for the stair matrix, $\Pr\{\rho(\mathbf{I} - \mathbf{S}^{-1}\mathbf{G}) < 1\}$ increases from 0.74 when $N_B = 100$ to 1 when $N_B = 200$;
• In the low-ratio region, with $r = N_B/N_U$ below about 5, the stair matrix increases the probability that the convergence conditions are met. For example, when $N_B = 100$, $\Pr\{\rho(\mathbf{I} - \mathbf{D}^{-1}\mathbf{G}) < 1\}$ is only 0.29, while $\Pr\{\rho(\mathbf{I} - \mathbf{S}^{-1}\mathbf{G}) < 1\}$ becomes 0.76. This indicates that, in some low $r$ regions, the diagonal matrix is not applicable while the stair matrix can still be used;
• In any system configuration, $\Pr\{\rho(\mathbf{I} - \mathbf{S}^{-1}\mathbf{G}) < a\} \geq \Pr\{\rho(\mathbf{I} - \mathbf{D}^{-1}\mathbf{G}) < a\}$, $\forall a \in (0, 1)$. As the maximum eigenvalue determines the convergence rate, we conclude that the convergence with the stair matrix is more likely to be faster than with the diagonal matrix.

The above observations validate the applicability of the stair matrix and the diagonal matrix in massive MIMO systems.
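The Monte-Carlo experiment described above can be sketched on a much smaller scale as follows. This is an illustrative sketch, not the simulation code behind Figure 1: the i.i.d. $\mathcal{CN}(0,1)$ channel, the form of the stair matrix (diagonal plus both neighbours on every second row), the function names, and the small default number of trials are all assumptions.

```python
import numpy as np

def stair(G):
    """Assumed stair form: full diagonal, plus neighbours on rows 1, 3, ... (0-based)."""
    n = G.shape[0]
    S = np.diag(np.diag(G))
    for i in range(1, n, 2):
        S[i, i - 1] = G[i, i - 1]
        if i + 1 < n:
            S[i, i + 1] = G[i, i + 1]
    return S

def spectral_radius(M):
    """Largest eigenvalue magnitude of M."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))

def convergence_radii(n_b, n_u, trials, seed=0):
    """Spectral radii of I - D^{-1}G and I - S^{-1}G over random channels."""
    rng = np.random.default_rng(seed)
    eye = np.eye(n_u)
    rad_d = np.empty(trials)
    rad_s = np.empty(trials)
    for t in range(trials):
        H = (rng.standard_normal((n_b, n_u))
             + 1j * rng.standard_normal((n_b, n_u))) / np.sqrt(2)
        G = H.conj().T @ H
        rad_d[t] = spectral_radius(eye - G / np.diag(G)[:, None])        # D = diag(G)
        rad_s[t] = spectral_radius(eye - np.linalg.solve(stair(G), G))   # S = stair(G)
    return rad_d, rad_s

if __name__ == "__main__":
    for n_b in (100, 150, 200):
        rad_d, rad_s = convergence_radii(n_b, 25, trials=200)
        print(n_b, "Pr{rho<1}: diagonal", np.mean(rad_d < 1),
              "stair", np.mean(rad_s < 1))
```

Sorting `rad_d` and `rad_s` and plotting them against their rank gives empirical CDFs comparable in shape to those in Figure 1, though with far fewer samples.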
Besides, the results reveal that, compared with the diagonal matrix, the stair matrix increases the probability that the convergence conditions are met in the low $r$ region. Furthermore, with the stair matrix, the convergence is more likely to be faster than with the diagonal matrix.

B. Matrix Inverse

We now investigate the performance of the stair matrix in the Neumann series expansion to approach the matrix inverse.$^3$ We define
$$\Delta(\mathbf{S}) = \frac{1}{N_U} \left( \mathbf{I} - \sum_{l=0}^{L-1} \left(\mathbf{I} - \mathbf{S}^{-1}\mathbf{G}\right)^l \mathbf{S}^{-1}\mathbf{G} \right),$$
where $\mathbf{S} = \mathrm{stair}(G_{(u,u-1)}, G_{(u,u)}, G_{(u,u+1)})$, and
$$\Delta(\mathbf{D}) = \frac{1}{N_U} \left( \mathbf{I} - \sum_{l=0}^{L-1} \left(\mathbf{I} - \mathbf{D}^{-1}\mathbf{G}\right)^l \mathbf{D}^{-1}\mathbf{G} \right),$$
where $\mathbf{D} = \mathrm{diag}(\mathbf{G})$. In addition, we use $\|\Delta(\mathbf{S})\|_F$ and $\|\Delta(\mathbf{D})\|_F$ to indicate the normalized mean-square error of the approximation using the stair matrix and the diagonal matrix, respectively.

$^3$ In implementation, we propose the iterative method as shown in Section IV. However, the results of the iterative method can be seen as the Neumann series expansion.

Fig. 2. Normalized mean-square error for the matrix inverse approximation.

With different truncation orders, we present the results in Figure 2. The following observations can be made:

• As the truncation order increases, the normalized mean-square error decreases. This indicates that the more terms are used in the Neumann series expansion, the closer the resulting approximation is to the matrix inverse;
• With the stair matrix, the normalized mean-square error is always less than that with the diagonal matrix in the same system configuration. This indicates that, for the same truncation order, the stair matrix always achieves a better approximation than the diagonal matrix;
• With the stair matrix, fewer iterations are required to achieve the same level of normalized mean-square error as with the diagonal matrix. As the truncation order is equivalent to the iteration number in the iterative method, fewer iterations imply less computational complexity in implementation.

To summarize, we conclude that the stair matrix outperforms the diagonal matrix in terms of the matrix inverse approximation performance. As we showed in Section IV.A, the truncation order is equivalent to the number of iterations in the iterative method; therefore, the results in Figure 2 help to interpret the convergence performance of the iterative method.

C. Residual Estimation Error

In the iterative method, the estimate approaches the estimation vector of the linear ZF/MMSE method. In Section IV.C, an upper bound of the residual estimation error for the use of the stair matrix in approaching linear ZF detection is presented. In order to differentiate the residual estimation error for the use of the stair matrix and the diagonal matrix in linear ZF and MMSE detection, we define the following terms:
$$J(\mathbf{D}_1) = \left\| \left( \mathbf{D}_1^{-1} (\mathbf{D}_1 - \mathbf{G}) \right)^L \mathbf{G}^{-1} \mathbf{y}^{\mathrm{MF}} \right\|,$$
$$J(\mathbf{D}_2) = \left\| \left( \mathbf{D}_2^{-1} (\mathbf{D}_2 - \mathbf{W}) \right)^L \mathbf{W}^{-1} \mathbf{y}^{\mathrm{MF}} \right\|,$$
$$J(\mathbf{S}_1) = \left\| \left( \mathbf{S}_1^{-1} (\mathbf{S}_1 - \mathbf{G}) \right)^L \mathbf{G}^{-1} \mathbf{y}^{\mathrm{MF}} \right\|,$$
$$J(\mathbf{S}_2) = \left\| \left( \mathbf{S}_2^{-1} (\mathbf{S}_2 - \mathbf{W}) \right)^L \mathbf{W}^{-1} \mathbf{y}^{\mathrm{MF}} \right\|,$$
where $\mathbf{W} = \mathbf{G} + \sigma_z^2 \mathbf{I}_{N_U}$, $\mathbf{D}_1 = \mathrm{diag}\{\mathbf{G}\}$, $\mathbf{D}_2 = \mathrm{diag}\{\mathbf{W}\}$, $\mathbf{S}_1 = \mathrm{stair}(G_{m,m-1}, G_{m,m}, G_{m,m+1})$, and $\mathbf{S}_2 = \mathrm{stair}(W_{m,m-1}, W_{m,m}, W_{m,m+1})$. According to Equation (38), $J(\mathbf{D}_1)$ and $J(\mathbf{D}_2)$ denote the residual estimation error for the use of the diagonal matrix in approaching linear ZF and MMSE detection, respectively. $J(\mathbf{S}_1)$ and $J(\mathbf{S}_2)$ denote the residual estimation error for the use of the stair matrix in approaching linear ZF and MMSE detection, respectively.

Fig. 3. Residual estimation error: (a) $N_B = 150$, $N_U = 25$, average SNR = 5 dB; (b) $N_B = 200$, $N_U = 25$, average SNR = 3.5 dB.

Fig. 4. BER performance: (a) $N_B = 150$, $N_U = 25$; (b) $N_B = 250$, $N_U = 25$.

For a given system configuration and average receive SNR, we present the residual estimation error performance in Figure 3. The following observations are made:

• From Figure 3(a) and 3(b), $J(\mathbf{S}_1)$ is always less than $J(\mathbf{D}_1)$, and $J(\mathbf{S}_2)$ is always less than $J(\mathbf{D}_2)$ after the same number of iterations. These results show that, after the same number of iterations, the iterative method with the stair matrix approaches both the linear ZF and MMSE estimates more closely than with the diagonal matrix;
• In Figure 3(a), we note that, with the diagonal matrix, the residual estimation error decreases slowly and remains at a comparatively high level even after a large number of iterations. With the stair matrix, however, the error decreases faster and reaches a comparatively lower level. These results are consistent with the previous numerical results, where we demonstrated that the diagonal matrix may not be applicable at low ratio $r$.
• From Figure 3(a) and Figure 3(b), we can see that, as the number of receive antennas at the base station increases, the performance gain of the stair matrix over the diagonal matrix becomes smaller. This is reasonable: as $N_B$ becomes large, both $\mathbf{G}$ and $\mathbf{W}$ become diagonally dominant. Nevertheless, the stair matrix still achieves a comparatively lower residual estimation error in the iterative method.

To summarize, we conclude that the stair matrix outperforms the diagonal matrix in terms of the residual estimation error. The performance gain is more significant at low ratio $r$, but still obvious at high ratio $r$.

D. BER Performance

We now evaluate the system BER performance. In the system, the base station simultaneously serves $N_U = 25$ users. For each user, an LDPC code with code length 64800 and code rate 1/2 is used as the channel coding scheme.$^4$ We consider 64QAM modulation and a block-independent channel for the evaluation.

$^4$ LDPC code has been an agreed standard for long code in 5G.

To begin with, we investigate the proposed LLR computation given by (47), where the equivalent channel gain $\rho$ and the covariance of the NPI $v$ are approximated by (45) and (46). For comparison, we provide linear MMSE detection as a benchmark, where the LLR computation is given by Equation (9) with $\rho$ and $v$ given by Equations (6) and (8), respectively. The LLR computation in existing work such as [8], [13], [14] uses the covariance of the NPI after the first iteration. It is worth pointing out that the iterative methods in [13], [14] require fewer iterations to approach linear MMSE detection; however, their LLR computation does not use the exact NPI of MMSE detection, but the NPI after the first iteration. In Figure 4, we can see that the BER performance of the Jacobi method with the LLR computation in [8], [13], [14] is far from the BER performance of MMSE detection with the exact LLR computation. This is consistent with our previous analysis, where we pointed out that the covariance of the NPI decreases with iterations. However, we note that the proposed LLR computation greatly improves the BER performance of the Jacobi method by approximating the covariance of the NPI of MMSE detection. Hereafter, we only use the proposed LLR computation for the BER performance comparison.

We now present the results in the low $r = N_B/N_U$ region, shown in Figure 5. The following observations are made.

• From Figure 5(a), we note that the BER performance improvement of the proposed stair matrix over the diagonal matrix is obvious. However, the system performance is still far from that of MMSE detection even with a sufficiently large number of iterations.
Specifically, with the diagonal matrix, the performance levels off after 9 iterations; with the stair matrix, the performance is greatly improved, but it still levels off. This is attributed to the slow convergence rate and to the fact that the convergence conditions are not satisfied with probability one;
• From Figure 5(b) and Figure 5(c), we can see that the BER performance eventually converges to the performance of MMSE detection. Specifically, in the system configuration $N_B = 150$, $N_U = 25$, at SNR = 5 dB, the BER performance of the proposed iterative method after 13 iterations is almost the same as that of MMSE detection. In the system configuration $N_B = 175$, $N_U = 25$, at SNR = 4 dB, the BER performance of the Jacobi method after 9 iterations approaches that of MMSE detection;
• From Figure 5(a) to Figure 5(c), we can see that the convergence rate of the proposed iterative method is faster than that of the Jacobi method. These results are consistent with the previous analysis. With a faster convergence rate, fewer iterations are required for the proposed iterative method, hence reducing the overall system computational complexity.

Fig. 5. BER performance: (a) $N_B = 125$, $N_U = 25$; (b) $N_B = 150$, $N_U = 25$; (c) $N_B = 175$, $N_U = 25$.

Fig. 6. BER performance: $N_B = 200$, $N_U = 25$.

Next, we evaluate the BER performance in the system configuration with high $r = N_B/N_U$, and the results are shown in Figure 6. It is clear that both the diagonal matrix and the stair matrix require few iterations to converge. However, as indicated by the cumulative distribution function of the maximum eigenvalue, $\Pr\{\rho(\mathbf{I} - \mathbf{S}^{-1}\mathbf{G}) < a\} \geq \Pr\{\rho(\mathbf{I} - \mathbf{D}^{-1}\mathbf{G}) < a\}$, $\forall a \in (0, 1)$, so the convergence rate of the proposed iterative method using the stair matrix is expected to be faster than that of the Jacobi method using the diagonal matrix. The results validate this conclusion.

VI. CONCLUSIONS

In this paper, we propose the application of the stair matrix in massive MIMO systems. To begin with, we demonstrate that, with a sufficiently large number of antennas at the base station, the probability that the convergence conditions are met with the stair matrix approaches 1. We then propose an iterative method to reduce the computational complexity and show that the overall computational complexity is at the same level as that of the existing iterative methods where the diagonal matrix is applied. Furthermore, we evaluate the performance of the stair matrix in terms of the probability that the convergence conditions are met, the normalized mean-square error of the Neumann series expansion in approaching the matrix inverse, the residual estimation error of the iterative method in approaching the linear ZF/MMSE estimation, and the system BER performance. Numerical simulations show that the stair matrix outperforms the diagonal matrix in all of these performance metrics.

APPENDIX

A. Preliminaries

We first present the preliminary lemmas.

Lemma 5: Let $a_k \sim \mathcal{CN}(0, 1)$. We then have
$$E\{|a_k|^2\} = 1, \tag{48}$$
$$E\{|a_k|^4\} = 2, \tag{49}$$
$$E\{|a_k|^6\} = 6, \tag{50}$$
$$E\{|a_k|^8\} = 24. \tag{51}$$

Lemma 6: Let $\mathbf{a} = [a_1, a_2, \cdots, a_{N_B}]^T$ with each entry $a_k \sim \mathcal{CN}(0, 1)$, independent and identically distributed (i.i.d.). We then have
$$E\{\mathbf{a}^H \mathbf{a}\} = N_B, \tag{52}$$
$$E\{|\mathbf{a}^H \mathbf{a}|^4\} = A_3, \tag{53}$$
$$E\{|\mathbf{a}^H \mathbf{a}|^{-4}\} = \frac{1}{B_1}, \tag{54}$$
where $A_3$ and $B_1$ are given by Equations (29) and (25).
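The moments in Lemma 5 and Equation (52) are easy to sanity-check numerically. The sketch below is an illustration only: it uses the fact that $|a_k|^2$ is exponentially distributed with unit mean when $a_k \sim \mathcal{CN}(0,1)$, so that $E\{|a_k|^{2n}\} = n!$. The function names and sample sizes are arbitrary choices.

```python
import math
import numpy as np

def empirical_moments(n_samples=1_000_000, seed=0):
    """Monte-Carlo estimates of E|a|^2, E|a|^4, E|a|^6, E|a|^8 for a ~ CN(0, 1)."""
    rng = np.random.default_rng(seed)
    a = (rng.standard_normal(n_samples)
         + 1j * rng.standard_normal(n_samples)) / np.sqrt(2)
    power = np.abs(a) ** 2                      # exponentially distributed, mean 1
    return {2 * n: float(np.mean(power ** n)) for n in (1, 2, 3, 4)}

def empirical_norm_mean(n_b=100, trials=20_000, seed=1):
    """Monte-Carlo estimate of E{a^H a} for a length-n_b vector, Equation (52)."""
    rng = np.random.default_rng(seed)
    a = (rng.standard_normal((trials, n_b))
         + 1j * rng.standard_normal((trials, n_b))) / np.sqrt(2)
    return float(np.mean(np.sum(np.abs(a) ** 2, axis=1)))

if __name__ == "__main__":
    moments = empirical_moments()
    for n in (1, 2, 3, 4):
        print(f"E|a|^{2 * n}: estimate {moments[2 * n]:.3f}, Lemma 5 value {math.factorial(n)}")
    print("E{a^H a} estimate for N_B = 100:", empirical_norm_mean(100))
```

The higher moments converge slowly because of the heavy tail of $|a_k|^8$, so the estimate for (51) is noticeably noisier than the one for (48).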
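Lemma 7: Let $\mathbf{a} = [a_1, a_2, \cdots, a_{N_B}]^T$ and $\mathbf{b} = [b_1, b_2, \cdots, b_{N_B}]^T$, with each entry $a_k \sim \mathcal{CN}(0, 1)$, $b_k \sim \mathcal{CN}(0, 1)$, and i.i.d. We then have
$$E\{|\mathbf{a}^H \mathbf{b}|^4\} = A_1, \tag{55}$$
$$E\{|\mathbf{a}^H \mathbf{b}|^8\} = A_5, \tag{56}$$
where $A_1$ and $A_5$ are given by Equations (24) and (32).

Lemma 8: Let $A = \mathbf{a}^H \mathbf{b} \mathbf{b}^H \mathbf{c}$, where $\mathbf{a} = [a_1, a_2, \cdots, a_{N_B}]^T$, $\mathbf{b} = [b_1, b_2, \cdots, b_{N_B}]^T$, and $\mathbf{c} = [c_1, c_2, \cdots, c_{N_B}]^T$, with each entry $a_k \sim \mathcal{CN}(0, 1)$, $b_k \sim \mathcal{CN}(0, 1)$, and $c_k \sim \mathcal{CN}(0, 1)$, and i.i.d. We then have
$$E\{|A|^4\} = A_2, \tag{57}$$
where $A_2$ is given by Equation (27).

Lemma 9: Let $A = \mathbf{a}^H \mathbf{a} \, \mathbf{b}^H \mathbf{b} \, \mathbf{c}^H \mathbf{b} \, \mathbf{b}^H \mathbf{d} \, \mathbf{a}^H \mathbf{c} \, \mathbf{d}^H \mathbf{a}$, where $\mathbf{a} = [a_1, a_2, \cdots, a_{N_B}]^T$, $\mathbf{b} = [b_1, b_2, \cdots, b_{N_B}]^T$, $\mathbf{c} = [c_1, c_2, \cdots, c_{N_B}]^T$, and $\mathbf{d} = [d_1, d_2, \cdots, d_{N_B}]^T$, with each entry $a_k \sim \mathcal{CN}(0, 1)$, $b_k \sim \mathcal{CN}(0, 1)$, $c_k \sim \mathcal{CN}(0, 1)$, and $d_k \sim \mathcal{CN}(0, 1)$, and i.i.d. We then have
$$E\{|A|^2\} = A_4, \tag{58}$$
where $A_4$ is given by Equation (30).

B. Proof of Lemma 1

For $u \in \mathcal{U}_1$, $v \neq u$, from Equation (22), we have
$$E\{|B_{(u,v)}|^2\} = E\left\{ \left| \frac{W_{(u,v)}}{W_{(u,u)}} \right|^2 \right\} \leq \sqrt{ E\{|W_{(u,v)}|^4\} \, E\{|W_{(u,u)}|^{-4}\} }, \tag{59}$$
where the Cauchy-Schwarz inequality is applied [8]. From Lemma 7 and Lemma 6, we have
$$E\{|W_{(u,v)}|^4\} = A_1, \tag{60}$$
$$E\{|W_{(u,u)}|^{-4}\} = \frac{1}{B_1}. \tag{61}$$
Hence we complete the proof of Lemma 1.

C. Proof of Lemma 2

For $u \in \mathcal{U}_2$, $v = u - 1$, from Equation (22), we have
$$B_{(u,u-1)} = \frac{G_{(u,u+1)} G_{(u+1,u-1)}}{G_{(u,u)} G_{(u+1,u+1)}}.$$
Applying the Cauchy-Schwarz inequality, we have
$$E\{|B_{(u,u-1)}|^2\} \leq \sqrt{ E\{|G_{(u,u+1)} G_{(u+1,u-1)}|^4\} \, E\{|(G_{(u,u)} G_{(u+1,u+1)})^{-1}|^4\} }. \tag{62}$$
According to Lemma 8 and Lemma 6, we have
$$E\{|G_{(u,u+1)} G_{(u+1,u-1)}|^4\} = A_2, \tag{63}$$
$$E\{|(G_{(u,u)} G_{(u+1,u+1)})^{-1}|^4\} = E\{|(G_{(u,u)})^{-1}|^4\} \, E\{|(G_{(u+1,u+1)})^{-1}|^4\} = \frac{1}{B_1^2}. \tag{64}$$
For $u \in \mathcal{U}_2$, $v = u + 1$, following a similar process, we obtain the same result. Therefore, we complete the proof of Lemma 2.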
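The Cauchy-Schwarz step in (59) can be checked numerically. The sketch below is an illustration under simplifying assumptions, not part of the proof: it sets $\sigma_z^2 = 0$, so that $\mathbf{W} = \mathbf{G}$, models two channel columns as independent $\mathcal{CN}(0,1)$ vectors, and estimates both sides of the inequality by sample averages. The function name and default sizes are hypothetical.

```python
import numpy as np

def lemma1_bound_check(n_b=50, trials=20_000, seed=0):
    """Estimate both sides of (59) with W = G (noise term omitted by assumption)."""
    rng = np.random.default_rng(seed)
    shape = (trials, n_b)
    a = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    b = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    g_uu = np.sum(np.abs(a) ** 2, axis=1)       # G_(u,u) = a^H a, real and positive
    g_uv = np.sum(a.conj() * b, axis=1)         # G_(u,v) = a^H b
    lhs = np.mean(np.abs(g_uv) ** 2 / g_uu ** 2)                        # E|G_uv / G_uu|^2
    rhs = np.sqrt(np.mean(np.abs(g_uv) ** 4) * np.mean(g_uu ** -4.0))   # Cauchy-Schwarz bound
    return float(lhs), float(rhs)

if __name__ == "__main__":
    for n_b in (50, 100, 200):
        lhs, rhs = lemma1_bound_check(n_b)
        print(f"N_B = {n_b}: E|B|^2 estimate {lhs:.5f}, bound estimate {rhs:.5f}")
```

Both quantities shrink as $N_B$ grows, which matches the intuition that the off-diagonal entries of the iteration matrix become negligible for large antenna arrays.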

D. Proof of Lemma 3

For $u \in \mathcal{U}_2$, $v \notin \{u-1, u, u+1\}$, from Equation (22), we have $E\{|B_{(u,v)}|^2\}$ given by Equation (65), where the Cauchy-Schwarz inequality is applied:
$$\begin{aligned} E\{|B_{(u,v)}|^2\} &= E\left\{ \left| \frac{G_{(u+1,u+1)} G_{(u,u-1)} G_{(u-1,v)} + G_{(u-1,u-1)} G_{(u,u+1)} G_{(u+1,v)} - G_{(u-1,u-1)} G_{(u+1,u+1)} G_{(u,v)}}{G_{(u-1,u-1)} G_{(u,u)} G_{(u+1,u+1)}} \right|^2 \right\} \\ &\leq \sqrt{ E\left\{ \left| G_{(u+1,u+1)} G_{(u,u-1)} G_{(u-1,v)} + G_{(u-1,u-1)} G_{(u,u+1)} G_{(u+1,v)} - G_{(u-1,u-1)} G_{(u+1,u+1)} G_{(u,v)} \right|^4 \right\} \, E\left\{ \left| G_{(u-1,u-1)} G_{(u,u)} G_{(u+1,u+1)} \right|^{-4} \right\} }. \end{aligned} \tag{65}$$
Next, the first expectation on the right-hand side of inequality (65) is given by
$$\begin{aligned} E\left\{ \left| G_{(u+1,u+1)} G_{(u,u-1)} G_{(u-1,v)} + G_{(u-1,u-1)} G_{(u,u+1)} G_{(u+1,v)} - G_{(u-1,u-1)} G_{(u+1,u+1)} G_{(u,v)} \right|^4 \right\} &= E\left\{ (A + B + C + D + E + F)^2 \right\} \\ &\leq 6 E\left\{ A^2 + B^2 + C^2 + D^2 + E^2 + F^2 \right\}, \end{aligned} \tag{66}$$
where
$$\begin{aligned} A &= |G_{(u+1,u+1)} G_{(u,u-1)} G_{(u-1,v)}|^2, \\ B &= |G_{(u-1,u-1)} G_{(u,u+1)} G_{(u+1,v)}|^2, \\ C &= |G_{(u-1,u-1)} G_{(u+1,u+1)} G_{(u,v)}|^2, \\ D &= 2\,\mathrm{Re}\left( G_{(u,u-1)} G_{(u-1,v)} G_{(u,u+1)}^* G_{(u+1,v)}^* \right) G_{(u+1,u+1)} G_{(u-1,u-1)}, \\ E &= -2\,\mathrm{Re}\left( G_{(u,u-1)} G_{(u-1,v)} G_{(u,v)}^* \right) G_{(u+1,u+1)}^2 G_{(u-1,u-1)}, \\ F &= -2\,\mathrm{Re}\left( G_{(u,u+1)} G_{(u+1,v)} G_{(u,v)}^* \right) G_{(u-1,u-1)}^2 G_{(u+1,u+1)}. \end{aligned}$$
The inequality in (66) holds by noting that $(A + B + C + D + E + F)^2 \leq 6 (A^2 + B^2 + C^2 + D^2 + E^2 + F^2)$, where $A$, $B$, $C$, $D$, $E$, $F$ are all real numbers. Next, we derive the expectations individually. With the results in Lemma 6 and Lemma 8, we have
$$E\{A^2\} = E\{B^2\} = A_2 A_3. \tag{67}$$
$E\{C^2\}$ is given by
$$E\{C^2\} = A_1 A_3^2, \tag{68}$$
where the results in Lemma 6 and Lemma 7 are applied. By using $(\mathrm{Re}(a))^2 \leq |a|^2$, we derive
$$E\{D^2\} \leq 4 E\left\{ \left| G_{(u+1,u+1)} G_{(u-1,u-1)} G_{(u,u-1)} G_{(u-1,v)} G_{(u,u+1)}^* G_{(u+1,v)}^* \right|^2 \right\} = 4 A_4, \tag{69}$$
where $A_4$ is obtained through Lemma 9. Applying the Cauchy-Schwarz inequality, we have
$$\begin{aligned} E\{E^2\} &\leq 4 E\left\{ \left| G_{(u-1,u-1)} G_{(u,u-1)} G_{(u-1,v)} G_{(u,v)}^* \right|^2 \right\} E\left\{ |G_{(u+1,u+1)}|^4 \right\} \\ &\leq 4 E\left\{ |G_{(u+1,u+1)}|^4 \right\} \sqrt{ E\left\{ |G_{(u,u-1)} G_{(u-1,v)}|^4 \right\} E\left\{ |G_{(u-1,u-1)} G_{(u,v)}|^4 \right\} }. \end{aligned} \tag{70}$$
With the results in Lemma 6, Lemma 7, and Lemma 8, we obtain the same bound for $E\{E^2\}$ and $E\{F^2\}$, given by
$$E\{E^2\},\ E\{F^2\} \leq 4 A_3 \sqrt{A_1 A_2 A_3}. \tag{71}$$
Therefore, we derive
$$E\{|B_{(u,v)}|^2\} \leq \sqrt{ \frac{12 A_2 A_3 + 6 A_1 A_3^2 + 24 A_4 + 48 \sqrt{A_1 A_2 A_3^3}}{B_1^3} }. \tag{72}$$
Hence, we complete the proof of Lemma 3.

E. Proof of Lemma 4

For $u \in \mathcal{U}_2$, $v = u$, from Equation (22), we have $E\{|B_{(u,u)}|^2\}$ given by (73), where the Cauchy-Schwarz inequality is applied:
$$\begin{aligned} E\{|B_{(u,u)}|^2\} &= E\left\{ \left| \frac{G_{(u+1,u+1)} |G_{(u,u-1)}|^2 + G_{(u-1,u-1)} |G_{(u,u+1)}|^2}{G_{(u,u)} G_{(u-1,u-1)} G_{(u+1,u+1)}} \right|^2 \right\} \\ &\leq \sqrt{ E\left\{ \left| G_{(u+1,u+1)} |G_{(u,u-1)}|^2 + G_{(u-1,u-1)} |G_{(u,u+1)}|^2 \right|^4 \right\} \, E\left\{ \left| G_{(u,u)} G_{(u-1,u-1)} G_{(u+1,u+1)} \right|^{-4} \right\} }. \end{aligned} \tag{73}$$
By using $|a + b|^2 \leq 2 (|a|^2 + |b|^2)$, we have
$$\begin{aligned} \left| G_{(u+1,u+1)} |G_{(u,u-1)}|^2 + G_{(u-1,u-1)} |G_{(u,u+1)}|^2 \right|^2 &\leq 2 \left( |G_{(u+1,u+1)}|^2 |G_{(u,u-1)}|^4 + |G_{(u-1,u-1)}|^2 |G_{(u,u+1)}|^4 \right), \\ \left| G_{(u+1,u+1)} |G_{(u,u-1)}|^2 + G_{(u-1,u-1)} |G_{(u,u+1)}|^2 \right|^4 &\leq 8 \left( |G_{(u+1,u+1)}|^4 |G_{(u,u-1)}|^8 + |G_{(u-1,u-1)}|^4 |G_{(u,u+1)}|^8 \right). \end{aligned} \tag{74}$$
Therefore, we derive
$$E\left\{ \left| G_{(u+1,u+1)} |G_{(u,u-1)}|^2 + G_{(u-1,u-1)} |G_{(u,u+1)}|^2 \right|^4 \right\} \leq 8 E\left( |G_{(u+1,u+1)}|^4 \right) E\left( |G_{(u,u-1)}|^8 \right) + 8 E\left( |G_{(u-1,u-1)}|^4 \right) E\left( |G_{(u,u+1)}|^8 \right). \tag{75}$$
With the results in Lemma 6 and Lemma 7, we have
$$E\{|B_{(u,u)}|^2\} \leq \sqrt{ \frac{16 A_3 A_5}{B_1^3} }. \tag{76}$$
Hence we complete the proof of Lemma 4.

REFERENCES

[1] T. Marzetta, "Noncooperative Cellular Wireless with Unlimited Numbers of Base Station Antennas," IEEE Transactions on Wireless Communications, vol. 9, no. 11, pp. 3590-3600, Nov. 2010.
[2] F. Rusek, D. Persson, B. Lau, E. Larsson, T. Marzetta, O. Edfors, and F. Tufvesson, "Scaling Up MIMO: Opportunities and Challenges with Very Large Arrays," IEEE Signal Processing Magazine, vol. 30, no. 1, pp. 40-60, Jan. 2013.
[3] E. Larsson, O. Edfors, F. Tufvesson, and T. Marzetta, "Massive MIMO for Next Generation Wireless Systems," IEEE Communications Magazine, vol. 52, no. 2, pp. 186-195, Feb. 2014.
[4] T. Rappaport, S. Sun, R. Mayzus, H. Zhao, Y. Azar, K. Wang, G. Wong, J. Schulz, M. Samimi, and F. Gutierrez, "Millimeter Wave Mobile Communications for 5G Cellular: It Will Work!" IEEE Access, vol. 1, pp. 335-349, May 2013.
[5] H. Ngo, E. Larsson, and T. Marzetta, "Energy and Spectral Efficiency of Very Large Multiuser MIMO Systems," IEEE Transactions on Communications, vol. 61, no. 4, pp. 1436-1449, Apr. 2013.
[6] J. Hoydis, S. Brink, and M. Debbah, "Massive MIMO in the UL/DL of Cellular Networks: How Many Antennas Do We Need?" IEEE Journal on Selected Areas in Communications, vol. 31, no. 2, pp. 160-171, Feb. 2013.
[7] E. Bjornson, L. Sanguinetti, J. Hoydis, and M. Debbah, "Optimal Design of Energy-Efficient Multi-User MIMO Systems: Is Massive MIMO the Answer?" IEEE Transactions on Wireless Communications, vol. 14, no. 6, pp. 3059-3075, Jun. 2015.
[8] M. Wu, B. Yin, G. Wang, C. Dick, J. Cavallaro, and C.
Studer, "Large-Scale MIMO Detection for 3GPP LTE: Algorithms and FPGA Implementations," IEEE Journal of Selected Topics in Signal Processing, vol. 8, no. 5, pp. 916-929, Mar. 2014.
[9] D. Zhu, B. Li, and P. Liang, "On the Matrix Inversion Approximation Based on Neumann Series in Massive MIMO Systems," in Proceedings of the IEEE International Conference on Communications, London, UK, Jun. 2015, pp. 1763-1769.
[10] C. Tang, C. Liu, L. Yuan, and Z. Xing, "High Precision Low Complexity Matrix Inversion Based on Newton Iteration for Data Detection in the Massive MIMO," IEEE Communications Letters, vol. 20, no. 3, pp. 490-493, Mar. 2016.
[11] F. Wang, C. Zhang, X. Liang, Z. Wu, S. Xu, and X. You, "Efficient Iterative Soft Detection Based on Polynomial Approximation for Massive MIMO," in Proceedings of the IEEE International Conference on Wireless Communications and Signal Processing, Nanjing, China, Oct. 2015, pp. 1-5.
[12] X. Gao, L. Dai, Y. Ma, and Z. Wang, "Low-complexity near-optimal signal detection for uplink large scale MIMO systems," Electronics Letters, vol. 50, no. 18, pp. 1326-1328, Aug. 2014.
[13] L. Dai, X. Gao, X. Su, S. Han, C. I, and Z. Wang, "Low-Complexity Soft-Output Signal Detection Based on Gauss-Seidel Method for Uplink Multiuser Large-Scale MIMO Systems," IEEE Transactions on Vehicular Technology, vol. 64, no. 10, pp. 4839-4845, Oct. 2015.
[14] X. Qin, Z. Yan, and G. He, "A Near-Optimal Detection Scheme Based on Joint Steepest Descent and Jacobi Method for Uplink Massive MIMO Systems," IEEE Communications Letters, vol. 20, no. 2, pp. 276-279, Feb. 2016.
[15] B. Yin, M. Wu, J. Cavallaro, and C. Studer, "Conjugate Gradient-based Soft-output Detection and Precoding in Massive MIMO Systems," in Proceedings of the IEEE Global Communications Conference, Austin, TX, USA, Dec. 2014, pp. 3696-3701.
[16] J. Burgerscentrum, "Iterative Solutions Methods," Applied Numerical Mathematics, vol. 51, no. 4, pp. 437-450, 2011.
[17] F. Jiang, C. Li, and Z. Gong, "A Low Complexity Soft-output Data Detection Scheme Based on Jacobi Method for Massive MIMO Uplink Transmission," in Proceedings of the IEEE International Conference on Communications, Paris, France, May 2017, to appear.
[18] L. Dai, Z. Wang, and Z. Yang, "Spectrally Efficient Time-Frequency Training OFDM for Mobile Large-Scale MIMO Systems," IEEE Journal on Selected Areas in Communications, vol. 31, no. 2, pp. 251-263, Feb. 2013.
[19] Z. Gao, L. Dai, W. Dai, B. Shim, and Z. Wang, "Structured Compressive Sensing-Based Spatio-Temporal Joint Channel Estimation for FDD Massive MIMO," IEEE Transactions on Communications, vol. 64, no. 2, pp. 601-617, Feb. 2016.
[20] H. Lu, "Stair Matrices and Their Generalizations with Applications to Iterative Methods I: A Generalization of the Successive Overrelaxation Method," SIAM Journal on Numerical Analysis, vol. 37, no. 1, pp. 1-17, 2000.
[21] H. Li, T. Huang, Y. Zhang, X. Liu, and T. Gu, "Chebyshev-type Methods and Preconditioning Techniques," Applied Mathematics and Computation, Elsevier, vol. 218, no. 2, pp. 260-270, 2011.