Optimal Resource Allocation in Multicast Device-to-Device Communications Underlaying LTE Networks

Hadi Meshgi (1), Dongmei Zhao (1) and Rong Zheng (2)
(1) Department of Electrical and Computer Engineering, McMaster University
(2) Department of Computing and Software, McMaster University

Abstract

In this paper, we present a framework for resource allocation for multicast device-to-device (D2D) communications underlaying the uplink of an LTE network. The objective is to maximize the sum throughput of active cellular users (CUs) and feasible D2D multicast groups in a cell, while meeting a certain signal-to-interference-plus-noise ratio (SINR) constraint for both the CUs and the D2D groups. We formulate the general problem of power and channel allocation as a mixed integer nonlinear programming (MINLP) problem where one D2D group can reuse the channels of multiple CUs and the channel of each CU can be reused by multiple D2D groups. Distinct from existing approaches in the literature, our formulation and solution methods provide an effective and flexible means to utilize radio resources in cellular networks and share them with multicast D2D groups without causing harmful interference to each other. The MINLP problem is transformed so that it can be solved optimally by a variant of the generalized Benders decomposition (GBD) method with provable convergence. A greedy algorithm and a low-complexity heuristic solution are then devised. The performance of all schemes is evaluated through extensive simulations. Numerical results demonstrate that the proposed greedy algorithm achieves close-to-optimal performance, and that the heuristic algorithm provides good performance, though inferior to that of the greedy algorithm, with much lower complexity.

I. INTRODUCTION

Device-to-Device (D2D) communication is a technology component for Long Term Evolution-Advanced (LTE-A) of the Third Generation Partnership Project (3GPP) [1].
In D2D communication, cellular users (CUs) in close proximity can exchange information over a direct link rather than transmitting and receiving signals through a cellular base station (BS). D2D users communicate directly while remaining controlled by the BS. Compared to routing through a BS, CUs in close proximity can save energy and resources when communicating directly with each other. Moreover, D2D users may experience high data rates and low transmission delay due to the short-range direct communication [2]. Reducing the network load by offloading cellular traffic from a BS and other network components to the direct path between users is another benefit of D2D communication. In addition, D2D communications can enhance user experience at cell edges [3] or relay traffic for users experiencing poor channel conditions [4], [5], [6]. Other benefits and use cases of D2D communication are discussed in [7].

The majority of the literature on D2D communications uses the cellular spectrum for both D2D and cellular communications, also known as in-band D2D [8]. Generally, in-band D2D falls into two categories, underlay and overlay [9]. Underlay in-band D2D can improve the spectrum efficiency of cellular networks by reusing cellular resources. Its main drawback lies in the mutual interference between D2D and cellular transmissions. Thus, efficient interference management and resource allocation are necessary [10], [11]. Overlay in-band D2D avoids the interference issue by dedicating part of the cellular resources to D2D communications. In this case, designing a resource allocation scheme is crucial to maximize the utilization of the dedicated cellular resources [12]. Other works consider out-of-band D2D communications so that the cellular network performance is not affected by D2D communications [13]. Out-of-band D2D communication faces challenges in coordinating the communication over two different bands, because D2D communication usually happens on a second radio interface (e.g., WiFi Direct and Bluetooth) [14].

Most existing work in D2D resource allocation targets the unicast scenario, where a single or multiple D2D pairs reuse the resources of CUs. In [8], the authors consider throughput maximization where, by allowing D2D communication to underlay the cellular network, the overall throughput in the network can be increased compared to the case where all D2D traffic is relayed by the cellular network. Some other works such as [14], [15] consider D2D communication reliability while guaranteeing a certain level of SINR or outage probability. Improving the sum rate is also the objective of the work in [16] and [17], where game-theoretical methods are used for the D2D users to compete for cellular network resources. The works in [18], [19], [20], [21] consider both throughput and reliability. The work in [18] considers one CU and one D2D pair, and throughput is maximized subject to spectral efficiency and energy constraints. Multiple D2D users and CUs are considered in [19] and [20] for maximizing the total throughput. This is done by solving a mixed integer nonlinear programming (MINLP) resource allocation problem in [19] and by designing a maximum weight bipartite matching scheme in [20]. In [21], a simple pairing algorithm is proposed for the problem of sharing CU resources with D2D links.

Multicast D2D transmissions, where the same packets from a user equipment (UE) are sent to multiple receivers, are important for scenarios such as multimedia streaming and device discovery. In particular, multicast D2D communications are required features in public safety services such as police and ambulance [1]. Compared to communicating with each receiver separately in unicast D2D, multicast D2D transmission reduces overhead and saves resources. However, unlike the more commonly studied unicast D2D (see e.g. [18], [20]), multicast D2D has its own challenges. Within a multicast group, the data rates attainable at different receivers differ because of the diverse link conditions between each receiver and the transmitter. A common approach is to transmit at the lowest rate, determined by the user with the worst channel condition in the group, to ensure that the multicast service can be provided to all users. As a result, the transmission rate tends to decrease with the number of receivers in the multicast group. As discussed in [22], there is a large body of work on multicast scheduling and resource allocation for OFDMA-based systems. These works can be broadly classified into two types: single-rate and multi-rate transmissions.
In single-rate broadcast, the BS transmits to all users in each multicast group at the same rate irrespective of their non-uniform achievable capacities, whereas in multi-rate broadcast, the BS transmits to each user in each multicast group at a different rate based on what each user can handle. All of the works mentioned in [22] target cellular networks where the multicast transmitter is the BS. In D2D multicast, however, UEs are the multicast transmitters, and the quality of service (QoS) requirements of both the D2D links and the cellular links should be satisfied.

The problem of resource management for multicast D2D communication was addressed in our previous work [23], where the power and channel allocation problem is formulated for a special case in which each D2D group can reuse the channel of one CU and the channel of each CU can be reused by at most one D2D group. In [23], since each cellular channel can be allocated to at most one D2D group, the radio resources in the cellular network may not be efficiently utilized, and the D2D groups cannot fully exploit the available channel resources to achieve higher transmission rates. A baseline multicast D2D model is proposed in [24] for overlay in-band D2D communications, and important multicast metrics such as coverage probability, mean number of covered receivers and throughput are analyzed. In [25], a single D2D multicast group that can reuse at most one cellular channel in underlay mode is studied, and a resource allocation scheme based on cognitive radio is proposed to reduce interference and improve system performance.

In this paper, we consider a general scenario of multicast D2D communications underlaying a cellular network, where each D2D group can reuse the uplink channels of multiple CUs and the channel of each CU can be reused by multiple D2D groups. The main contributions of this work are summarized as follows:

- The problem of joint power control and channel allocation is formulated as an MINLP that maximizes the aggregated rate of all CUs and D2D groups. Meanwhile, a minimum SINR constraint is imposed to guarantee the QoS requirements of both CUs and D2D groups. In the MINLP, the transmission powers are continuous variables, and the integer variables are binary channel allocation indicators.
- The MINLP is decomposed into a primal problem and a master problem, where the former corresponds to the original problem with fixed binary variables, and the latter is derived through nonlinear duality theory using the Lagrange multipliers obtained from the former. A variant of the generalized Benders decomposition (GBD) is applied to solve the MINLP optimally in an iterative manner.
- Inspired by the decomposed problems of the MINLP, a greedy algorithm is proposed, which has much lower complexity than the GBD-based method but achieves very close-to-optimal performance.
- A low-complexity heuristic solution is then devised, which trades off computational complexity against performance. This heuristic algorithm extends the heuristic algorithm presented in [23] to the general scenario.
- An exact solution to the MINLP is proposed for a special case where each D2D group can reuse the channel of at most one CU and each CU can share its channel with at most one D2D group.

The remainder of the paper is organized as follows. In Section II, the system model is described and the problem of power and channel allocation for underlay multicast D2D communication is formulated. Section III describes the generalized Benders decomposition method to solve the general problem. The matching-based optimal resource allocation for one special case is presented in Section IV, and the greedy and the heuristic algorithms are presented in Section V. Numerical results are presented in Section VI, and Section VII concludes the paper.

II. SYSTEM MODEL AND PROBLEM FORMULATION

We study resource allocation for D2D group communications underlaying uplink (UL) transmissions in LTE networks. UL resource sharing is considered since reusing downlink resources is more difficult and less effective than reusing uplink resources in the worst case of a cellular network where all channels are occupied by cellular users, as demonstrated in [26]. Consider $K$ multicast D2D groups coexisting with $M$ CUs as shown in Fig. 1, and $M$ channels, each occupied by one CU. Our work emphasizes the benefit of managing the mutual interference between the D2D and CU transmissions by coordinating the transmission powers and channel reuse allocations when the multicast D2D groups underlay the cellular network. Therefore, we only consider the cellular channels that are already used by the CUs. Allocating channels that are not used by the CUs to the D2D groups does not require considering the mutual interference with the CUs and is not included in the formulation below. We use $m \in \mathcal{M} = \{1, 2, \ldots, M\}$ to index both the $m$th CU and the channel it occupies, and $k \in \mathcal{K} = \{1, 2, \ldots, K\}$ to index the $k$th D2D group.

We consider a single-cell scenario and assume that advanced intercell interference mitigation is applied on top of our scheme [18], [27], and we consider reusing the channels of CUs that experience sufficiently good SINRs. For the CUs that experience strong intercell interference, their channels will either not be reused for D2D transmissions or be reused for D2D transmissions but contribute little to the sum rate. Therefore, these channels are not considered for the objective of maximizing the sum rate.

Within a D2D group, there is only one user that multicasts messages to the remaining users. Each D2D user belongs to only one D2D group. Multicast groups can be formed during the device discovery process. As low mobility is considered for the D2D users, the overhead required for maintaining the groups is low. We use $\mathcal{D}_k$ to represent the set of receivers in the $k$th D2D multicast group, and $|\mathcal{D}_k|$ is the total number of receivers in the group. As a special case, when $|\mathcal{D}_k| = 1$, the scenario becomes D2D unicast.

We consider applications that require best-effort rates and delay-tolerant services. However, a minimum rate may be required in order to make the data useful at the receiver side. When the interference condition is poor and such a minimum rate cannot be achieved for a given D2D group, the multicast temporarily stops, i.e., no channel is allocated to the group. Maximizing the aggregated rate of all the CUs and D2D groups at all times opportunistically makes the best use of the current channel conditions. Similar objectives of maximizing the sum rate of D2D and CU transmissions have been considered in [9], [19], [20], [17] for different system models, as summarized before. For applications that require high reliability at all times, the main objective is to guarantee a minimum rate under all channel conditions, while the channel resource may not be fully utilized.

[Fig. 1: System model]

Define a set of binary variables $y_{k,m}$ with $y_{k,m} = 1$ if the $k$th D2D group reuses channel $m$, and $y_{k,m} = 0$ otherwise. In the general case, each D2D group splits its multicast traffic among at most $C_1$ channels, and each channel can be reused by at most $C_2$ D2D groups, where $C_1 \le M$ and $C_2 \le K$. That is,

$$\sum_{m=1}^{M} y_{k,m} \le C_1, \quad \forall k \in \mathcal{K}, \tag{1}$$

$$\sum_{k=1}^{K} y_{k,m} \le C_2, \quad \forall m \in \mathcal{M}. \tag{2}$$

We further define $\beta^D_{k,m,d}$ as an indication of the channel quality for receiver $d$ in the $k$th D2D group at channel $m$, given by the ratio of the desired link gain to the total power of the experienced interference as follows,

$$\beta^D_{k,m,d} = \frac{G^D_{k,m,d}}{P_{noise} + P^C_m G^{C2D}_{m,d} + \sum_{k' \ne k} P^D_{k',m} G^D_{k',m,d}}, \quad \forall k \in \mathcal{K},\ m \in \mathcal{M},\ d \in \mathcal{D}_k, \tag{3}$$

where $P_{noise}$ is the aggregate power of background noise, $G^D_{k,m,d}$ is the link gain to receiver $d$ from the transmitter in group $k$ over channel $m$, $G^{C2D}_{m,d}$ is the link gain from CU $m$ to receiver $d$ in group $k$, $P^C_m$ is the transmission power of CU $m$, $P^D_{k,m}$ is the transmission power of the $k$th D2D group transmitter at channel $m$, and $G^D_{k',m,d}$ is the link gain from the transmitter of group $k'$ to receiver $d$ of group $k$ reusing channel $m$.

For the $k$th D2D group, its transmission condition in channel $m$ is determined by the receiver with the worst condition. Define

$$\beta^D_{k,m} = \min_{d \in \mathcal{D}_k} \beta^D_{k,m,d}. \tag{4}$$

Then, the normalized transmission rate (bit/s/Hz) of the $k$th D2D group is given by

$$r_k = \sum_{m=1}^{M} y_{k,m} \log_2\left(1 + P^D_{k,m} \beta^D_{k,m}\right). \tag{5}$$

The aggregate transmission rate of the $k$th group is given by

$$R^D_k \triangleq |\mathcal{D}_k|\, r_k \tag{6}$$

$$\phantom{R^D_k} = \sum_{m=1}^{M} y_{k,m} |\mathcal{D}_k| \log_2\left(1 + P^D_{k,m} \beta^D_{k,m}\right). \tag{7}$$

For CU $m$, its channel quality is given by

$$\beta^C_m = \frac{G^C_m}{P_{noise} + \sum_{k=1}^{K} y_{k,m} P^D_{k,m} G^{D2C}_{k,m}}, \tag{8}$$

where $G^C_m$ is the link gain of CU $m$ to the cellular BS, and $G^{D2C}_{k,m}$ is the link gain from the $k$th D2D transmitter to the cellular BS at channel $m$. Therefore, the normalized transmission rate for CU $m$ is

$$R^C_m \triangleq \log_2\left(1 + P^C_m \beta^C_m\right). \tag{9}$$

There is a minimum SINR required for each D2D group and CU transmission, which is set by the higher layer based on specific applications [28]. For the $k$th D2D group,

$$P^D_{k,m} \beta^D_{k,m} \ge y_{k,m} \gamma^D_{th}, \tag{10}$$

and for CU $m$,

$$P^C_m \beta^C_m \ge \gamma^C_{th}. \tag{11}$$

Note that we only consider the CUs whose SINRs are above the SINR threshold before adding D2D groups, and (11) checks whether the SINR threshold is still satisfied after adding D2D groups. Given these SINR threshold constraints, we can approximate the capacity in the high-SINR regime by removing the term 1 from the logarithm functions in both (7) and (9). The maximum power constraints for CUs and D2D groups, respectively, are given by

$$P^C_m \le P^C_{\max}, \quad \forall m \in \mathcal{M}, \tag{12}$$

and

$$\sum_{m=1}^{M} P^D_{k,m} \le P^D_{\max}, \quad \forall k \in \mathcal{K}. \tag{13}$$

The objective is to maximize the aggregate data transmission rate of all the D2D groups and CUs. Combining (1)-(13), we formulate the joint power control and channel allocation problem as follows,

$$\textbf{P1.}\quad \max \left( \sum_{k=1}^{K} R^D_k + \sum_{m=1}^{M} R^C_m \right) \tag{14}$$

$$\text{s.t.}\quad \beta^D_{k,m} \le \beta^D_{k,m,d}, \quad \forall k \in \mathcal{K},\ m \in \mathcal{M},\ d \in \mathcal{D}_k, \tag{15}$$

$$y_{k,m} \in \{0, 1\}, \quad \forall k \in \mathcal{K},\ m \in \mathcal{M}, \tag{16}$$

$$\text{Constraints (1)-(3), (7)-(13).}$$

Table I lists the parameters and variables used in the problem formulation. Clearly, P1 is an MINLP problem. In general, MINLP problems are NP-hard, and thus no efficient polynomial-time solutions are known.
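
As a numerical illustration of (3)-(9), the following minimal Python sketch evaluates the channel qualities and rates for a single channel. The function names and argument layout are assumptions made for this example only and do not come from the paper; gains and powers are linear-scale values.

```python
import math

def d2d_channel_quality(g_des, g_c2d, g_int, p_cu, p_other, p_noise):
    """Eq. (3): desired gain over noise plus CU and inter-group interference.

    g_int and p_other are parallel lists over the other D2D groups that
    reuse the same channel (hypothetical argument layout).
    """
    interference = p_cu * g_c2d + sum(p * g for p, g in zip(p_other, g_int))
    return g_des / (p_noise + interference)

def group_rate(betas_per_receiver, p_d2d, n_receivers):
    """Eqs. (4)-(7) for one channel: the worst receiver sets the multicast rate."""
    beta_min = min(betas_per_receiver)                      # eq. (4)
    return n_receivers * math.log2(1.0 + p_d2d * beta_min)  # eqs. (5)-(7)

def cu_rate(g_cu, p_cu, d2d_powers, d2c_gains, p_noise):
    """Eqs. (8)-(9): CU rate given the D2D transmitters reusing its channel."""
    interference = sum(p * g for p, g in zip(d2d_powers, d2c_gains))
    beta_c = g_cu / (p_noise + interference)                # eq. (8)
    return math.log2(1.0 + p_cu * beta_c)                   # eq. (9)

if __name__ == "__main__":
    # Two receivers in one group, one CU, no other D2D group on the channel.
    betas = [d2d_channel_quality(g, 1.0, [], 1.0, [], 1.0) for g in (12.0, 16.0)]
    print(group_rate(betas, 0.5, len(betas)), cu_rate(6.0, 1.0, [0.5], [2.0], 1.0))
```

Because the group rate depends only on the smallest $\beta^D_{k,m,d}$, adding a receiver with a poor link lowers the per-receiver rate even though the multiplier $|\mathcal{D}_k|$ grows.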
In the general case, when $C_1$ and $C_2$ take arbitrary values, we use GBD [29] to solve the problem optimally in the next section. Based on the values of $C_1$ and $C_2$, several special cases exist. For example, when $C_1 = 1$ and $C_2 = 1$, each D2D group can reuse the channel of at most one CU and each CU can share its channel with at most one D2D group. Another special case of interest is when $C_2 = 1$. In this case, to increase the spectrum utilization, we allow each D2D group to reuse the resources of multiple CUs, but each CU cannot share its resource with more than one D2D group. Here, there is no interference among D2D groups, and this setting is useful when the number of D2D groups is much smaller than the number of CUs. All the special cases can be solved via GBD. However, it turns out that a polynomial-time algorithm can be devised when $C_1 = 1$ and $C_2 = 1$, as will be discussed in Section IV.

III. GENERALIZED BENDERS DECOMPOSITION

The MINLP problem in P1 has the special property that when the binary variables ($y_{k,m}$'s) are fixed, the problem becomes a geometric programming problem with continuous variables ($P^D_{k,m}$'s and $P^C_m$'s), which can be transformed to a convex problem. A well-known solution to this type of problem is GBD [29]. However, non-trivial transformations are needed to ensure the separability of the problem with respect to the binary variables. This allows efficient solutions using GBD with guaranteed convergence. We next discuss the details of the proposed solution to P1.

A. Problem transformation

Let $\mathcal{X} = [P^D_{k,m}, P^C_m, R^D_k, R^C_m, \beta^D_{k,m}, \beta^C_m, \forall k \in \mathcal{K}, m \in \mathcal{M}]$ represent the set of all continuous variables and $\mathcal{Y} = [y_{k,m}, \forall k \in \mathcal{K}, m \in \mathcal{M}]$ represent the binary variables. We modify the constraints in problem P1 to separate the binary variables $y \in \mathcal{Y}$ from the continuous variables $x \in \mathcal{X}$ and make the problem linear in terms of the $y_{k,m}$'s when the continuous variables are fixed.

TABLE I: Table of notations

Notation               Description
---------------------  ------------------------------------------------------------
$\mathcal{M}$          Set of cellular users (CUs)
$\mathcal{K}$          Set of D2D groups
$\mathcal{D}_k$        Set of receivers in the $k$th D2D group
$\mathcal{A}$          Set of admissible or successful D2D groups
$y_{k,m}$              Binary variable, = 1 if the $k$th D2D group reuses CU $m$'s channel, and = 0 otherwise
$C_1$                  Max. number of channels to be reused by a D2D group
$C_2$                  Max. number of D2D groups sharing a CU channel
$G^D_{k,m,d}$          Link gain to receiver $d$ from the transmitter in group $k$ at channel $m$
$G^{C2D}_{m,d}$        Link gain from CU $m$ to receiver $d$ in group $k$
$G^D_{k',m,d}$         Link gain from the transmitter of group $k'$ to receiver $d$ of group $k$
$G^C_m$                Link gain of CU $m$ to the cellular BS
$G^{D2C}_{k,m}$        Link gain from the $k$th D2D transmitter to the cellular BS at channel $m$
$P^D_{k,m}$            Transmission power of the $k$th D2D group transmitter at channel $m$
$P^C_m$                Transmission power of CU $m$
$\beta^D_{k,m,d}$      Channel quality of receiver $d$ in the $k$th D2D group at channel $m$
$\beta^C_m$            Channel quality of CU $m$
$R^D_k$                Normalized transmission rate of the $k$th D2D group
$R^C_m$                Normalized transmission rate for CU $m$
$R^{sum}_{k,m}$        The summation of D2D and cellular throughput
$\gamma^D_{th}$        SINR threshold for all D2D groups
$\gamma^C_{th}$        SINR threshold for all CUs
$f_i(|\mathcal{D}_K|)$ The complexity of solving problem P$i$

Problem P1 can be transformed to

$$\textbf{P2.}\quad \max_{x \in \mathcal{X},\, y \in \mathcal{Y}} f(x, y) = \max \left( \sum_{k=1}^{K} R^D_k + \sum_{m=1}^{M} R^C_m \right) \tag{17}$$

$$\text{s.t.}\quad \beta^D_{k,m} \le \frac{G^D_{k,m,d}}{P_{noise} + P^C_m G^{C2D}_{m,d} + \sum_{k' \ne k} P^D_{k',m} G^D_{k',m,d}}, \quad \forall k \in \mathcal{K},\ m \in \mathcal{M},\ d \in \mathcal{D}_k, \tag{18}$$

$$R^D_k \le \sum_{m=1}^{M} \left[ |\mathcal{D}_k| \log_2\left(P^D_{k,m} \beta^D_{k,m}\right) + C(1 - y_{k,m}) \right], \quad \forall k \in \mathcal{K}, \tag{19}$$

$$|\mathcal{D}_k| \log_2\left(P^D_{k,m} \beta^D_{k,m}\right) + C(1 - y_{k,m}) \le C y_{k,m}, \quad \forall k \in \mathcal{K},\ m \in \mathcal{M}, \tag{20}$$

$$\beta^C_m \le \frac{G^C_m}{P_{noise} + \sum_{k=1}^{K} P^D_{k,m} G^{D2C}_{k,m}}, \quad \forall m \in \mathcal{M}, \tag{21}$$

$$P^D_{k,m} \le y_{k,m} P^D_{\max} + \epsilon \le C P^D_{k,m}, \quad \forall k \in \mathcal{K},\ m \in \mathcal{M}, \tag{22}$$

$$y_{k,m} \in \{0, 1\}, \quad \forall k \in \mathcal{K},\ m \in \mathcal{M}, \tag{23}$$

$$\text{Constraints (1)-(2) and (9)-(13),}$$

where $C$ is a very large number and $\epsilon > 0$ is a very small positive number. Constraint (18) combines constraints (3) and (15) in problem P1. Constraints (19) and (20) together are equivalent to constraint (7) in problem P1. In (19), when $y_{k,m} = 1$, the second term in the summand, namely $C(1 - y_{k,m})$, is zero, and the sum of the two terms inside the summation is the same as the term inside the summation on the right-hand side of (7) for the same $k$ and $m$.
When $y_{k,m} = 0$, the second term in the summand of (19) is a large number, and the constraint is automatically satisfied, while constraint (20) guarantees that the corresponding rate for the $k$th D2D group at channel $m$ is zero when the channel is not allocated to the group. The introduction of constraint (22) makes $P^D_{k,m}$ very small whenever $y_{k,m}$ is zero. This eliminates the binary variables $y_{k,m}$ in (8) and results in constraint (21). Meanwhile, when $y_{k,m}$ is zero, the middle part of (22) is a very small number, and having $P^D_{k,m}$ on both the left-hand and right-hand sides of the inequality ensures that $P^D_{k,m}$ is a very small number but larger than zero. This condition is needed for the logarithm functions in (19) and (20) to be well defined. We have thus obtained in P2 a geometric MINLP problem with separable continuous and binary variables.

B. Solution using GBD

The basic idea of GBD is to decompose the original MINLP problem into a primal problem and a master problem, and to solve them iteratively. The primal problem corresponds to the original problem with fixed binary variables. Solving this problem provides information about the lower bound and the Lagrange multipliers corresponding to the constraints. The master problem is derived through nonlinear duality theory using the Lagrange multipliers obtained from the primal problem. The solution to the master problem gives information about the upper bound as well as the binary variables to be used in the primal problem in the next iteration. When the upper bound meets the lower bound, the iterative process converges.

Primal problem: The primal problem results from fixing the $y_{k,m}$ variables to a particular 0-1 combination denoted by $y^{(i)}$, where $i$ stands for the iteration counter. After replacing the variable $y$ with its current value in problem P2, the formulation of the primal problem at iteration $i$ is given by

$$\textbf{P3.}\quad \max_{x \in \mathcal{X}} f(x, y^{(i)}) = \max \left( \sum_{k=1}^{K} R^D_k + \sum_{m=1}^{M} R^C_m \right) \tag{24}$$

$$\text{s.t.}\quad \beta^D_{k,m} \le \frac{G^D_{k,m,d}}{P_{noise} + P^C_m G^{C2D}_{m,d} + \sum_{k' \ne k} P^D_{k',m} G^D_{k',m,d}}, \quad \forall k \in \mathcal{K},\ m \in \mathcal{M},\ d \in \mathcal{D}_k, \tag{25}$$

$$R^D_k \le \sum_{m=1}^{M} \left[ |\mathcal{D}_k| \log_2\left(P^D_{k,m} \beta^D_{k,m}\right) + C(1 - y^{(i)}_{k,m}) \right], \quad \forall k \in \mathcal{K}, \tag{26}$$

$$|\mathcal{D}_k| \log_2\left(P^D_{k,m} \beta^D_{k,m}\right) + C(1 - y^{(i)}_{k,m}) \le C y^{(i)}_{k,m}, \quad \forall k \in \mathcal{K},\ m \in \mathcal{M}, \tag{27}$$

$$\beta^C_m \le \frac{G^C_m}{P_{noise} + \sum_{k=1}^{K} P^D_{k,m} G^{D2C}_{k,m}}, \quad \forall m \in \mathcal{M}, \tag{28}$$

$$P^D_{k,m} \le y^{(i)}_{k,m} P^D_{\max} + \epsilon \le C P^D_{k,m}, \quad \forall k \in \mathcal{K},\ m \in \mathcal{M}, \tag{29}$$

$$P^D_{k,m} \beta^D_{k,m} \ge y^{(i)}_{k,m} \gamma^D_{th}, \quad \forall k \in \mathcal{K},\ m \in \mathcal{M}, \tag{30}$$

$$\text{Constraints (9), (11), (12), (13).}$$

Constraints (1)-(2) are no longer needed, constraints (25)-(29) are copied from (18)-(22), and (30) is the same as (10). Since the optimal solution to this problem (if it exists) is also a feasible solution to problem P1, the optimal value $f(x, y^{(i)})$ provides a lower bound to the original problem. In general, not all choices of binary variables lead to a feasible primal problem. Therefore, for a given choice of the $y_{k,m}$'s, there are two cases for the primal problem P3: feasible and infeasible. In the following, we consider each of these cases.

Feasible Primal: If the primal problem at iteration $i$ is feasible, its solution provides information on the transmission powers of the D2D and cellular transmitters, $f(x, y^{(i)})$, and the optimal multiplier vectors $\lambda^{(i)}_q$, $q = 1, 2, \ldots, Q$, for the $Q$ inequality constraints in problem P3. Subsequently, using this information we can formulate the Lagrange function for all inequality constraints $G_q(x, y^{(i)}) \le 0$ for $q = 1, 2, \ldots, Q$ as

$$L(x, y^{(i)}, \lambda^{(i)}) = f(x, y^{(i)}) + \sum_{q=1}^{Q} \lambda^{(i)}_q G_q(x, y^{(i)}), \tag{31}$$

where $\lambda^{(i)} = [\lambda^{(i)}_q, q = 1, 2, \ldots, Q]$.

Infeasible Primal: If the primal problem is infeasible, to identify a feasible point we can formulate an $l_1$-minimization problem as

$$\textbf{P3.1.}\quad \min \sum_{q=1}^{Q} \alpha_q \tag{32}$$

$$\text{s.t.}\quad G_q(x, y^{(i)}) \le \alpha_q, \quad q = 1, 2, \ldots, Q, \tag{33}$$

$$\alpha_q \ge 0, \quad q = 1, 2, \ldots, Q. \tag{34}$$

Note that if $\sum_{q=1}^{Q} \alpha_q = 0$, then P3 is feasible. Otherwise, the solution to this feasibility problem (FP) provides information on the Lagrange multipliers, denoted as $\bar{\lambda}^{(i)}_q$; the Lagrange function resulting from the FP at iteration $i$ can be defined as

$$\bar{L}(x, y^{(i)}, \bar{\lambda}^{(i)}) = \sum_{q=1}^{Q} \bar{\lambda}^{(i)}_q \left( G_q(x, y^{(i)}) - \alpha_q \right). \tag{35}$$

It is worth mentioning that two different types of Lagrange functions are calculated depending on whether the primal problem is feasible or infeasible. Also, the lower bound is obtained only from the feasible primal problem.

Master Problem: The master problem is derived from nonlinear duality theory [29].
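The original problem P2 can be written as

$$\max_{y \in \mathcal{Y}} \sup_{x \in \mathcal{X}} f(x, y) \tag{36}$$

$$\text{s.t.}\quad G_q(x, y) \le 0, \quad q = 1, 2, \ldots, Q.$$

We also define the set $\mathcal{V}$ as

$$\mathcal{V} = \{ y : G_q(x, y) \le 0 \text{ for some } x \in \mathcal{X} \}. \tag{37}$$

Using the Lagrange function in (31) and duality theory, we obtain

$$\max_{y^{(i)}} f(x, y^{(i)}) = \max_{y^{(i)}} \left( \min_{\lambda^{(i)}} \sup_{x} L(x, y^{(i)}, \lambda^{(i)}) \right) \tag{38}$$

$$= \max \eta \tag{39}$$

$$\text{s.t.}\quad \eta \le \sup_{x} L(x, y^{(i)}, \lambda^{(i)}), \quad \forall \lambda \ge 0, \tag{40}$$

$$y^{(i)} \in \mathcal{Y} \cap \mathcal{V}. \tag{41}$$

It is shown in [29] that a point $y \in \mathcal{Y}$ also belongs to the set $\mathcal{V}$ if and only if it satisfies the following system:

$$\inf_{x} \bar{L}(x, y^{(i)}, \bar{\lambda}^{(i)}) \le 0, \quad \forall \bar{\lambda}^{(i)} \in \Lambda, \tag{42}$$

where $\Lambda = \left\{ \bar{\lambda}_q \ge 0,\ \sum_{q=1}^{Q} \bar{\lambda}_q = 1 \right\}$. Substituting (42) for $y \in \mathcal{Y} \cap \mathcal{V}$ into (38), we can make the constraints over the set $\mathcal{V}$ explicit and obtain the following master problem:

$$\textbf{P4.}\quad \max_{y^{(i)} \in \mathcal{Y}} \eta \tag{43}$$

$$\text{s.t.}\quad \eta \le \sup_{x} L(x, y^{(i)}, \lambda^{(i)}), \quad \forall \lambda^{(i)} \ge 0, \tag{44}$$

$$\inf_{x} \bar{L}(x, y^{(i)}, \bar{\lambda}^{(i)}) \le 0, \quad \forall \bar{\lambda}^{(i)} \in \Lambda, \tag{45}$$

$$\text{Constraints (1), (2).}$$

The master problem P4 is similar to the original problem P2, but it has two inner optimization problems that need to be considered for all $\lambda$ and $\bar{\lambda}$ obtained from the primal problem in every iteration. Therefore, it has a very large number of constraints. Because of the separability of the binary variables $y \in \mathcal{Y}$ and the continuous variables $x \in \mathcal{X}$, and the linearity with regard to the binary variables, we can adopt Variant 2 of GBD (V2-GBD) in [29]. It is proven in [29] that under the conditions for V2-GBD, the Lagrange function evaluated at the solution of the corresponding primal is a valid under-estimator of the inner optimization problem in P4. Therefore, the relaxed master problem can be formulated as

$$\textbf{P5.}\quad \max_{y^{(i)} \in \mathcal{Y}} \eta \tag{46}$$

$$\text{s.t.}\quad \eta \le L(x, y^{(i)}, \lambda^{(i)}), \quad \forall \lambda^{(i)} \ge 0, \tag{47}$$

$$\bar{L}(x, y^{(i)}, \bar{\lambda}^{(i)}) \le 0, \quad \forall \bar{\lambda}^{(i)} \in \Lambda, \tag{48}$$

$$\text{Constraints (1), (2).}$$

The relaxed problem provides an upper bound to the master problem and can be used to generate the primal problem in the next iteration. The same procedure is then repeated until convergence. Over the iterations, the sequence of upper bounds is non-increasing and the sequence of lower bounds is non-decreasing.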
The two sequences are proven to converge, and the algorithm stops at the optimal solution within a finite number of iterations [30]. Algorithm 1 summarizes the GBD procedure.
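
Separately from the formal listing, the bound bookkeeping of the primal/master iteration can be sketched in Python as below. This is only a control-flow sketch under stated assumptions: `solve_primal` and `solve_relaxed_master` are hypothetical callbacks standing in for solvers of P3 (or P3.1 when P3 is infeasible) and P5, and the "cut" objects they exchange are opaque placeholders. Unlike Algorithm 1, the lower bound starts at minus infinity rather than 0.

```python
import math

def gbd_loop(y0, solve_primal, solve_relaxed_master, tol=1e-9, max_iter=100):
    """Bound bookkeeping of a GBD-style iteration for a maximization problem.

    solve_primal(y) -> (feasible, value, cut)      # hypothetical callback
    solve_relaxed_master(cuts) -> (eta, y_next)    # hypothetical callback
    """
    y, cuts = y0, []
    lbd, ubd, best_y = -math.inf, math.inf, None
    for _ in range(max_iter):
        feasible, value, cut = solve_primal(y)
        cuts.append(cut)              # optimality cut if feasible, else feasibility cut
        if feasible and value > lbd:  # only feasible primals update the lower bound
            lbd, best_y = value, y
        if ubd - lbd <= tol:
            break
        eta, y = solve_relaxed_master(cuts)
        ubd = min(ubd, eta)           # keep the upper bounds non-increasing
        if ubd - lbd <= tol:
            break
    return best_y, lbd, ubd
```

In a real implementation each cut would carry the Lagrange function (31) or (35) evaluated at the primal solution; here the loop only shows how the two bounds approach each other.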

Algorithm 1 GBD Algorithm
 1: First iteration, $i = 1$
 2: Select an initial value for $y^{(i)}$ that makes the primal problem feasible.
 3: Solve the primal problem in P3 and obtain the Lagrange function
 4: $UBD^{(i)} = \infty$, $LBD^{(i)} = 0$
 5: while $UBD^{(i)} - LBD^{(i)} > 0$ do
 6:    $i = i + 1$
 7:    Solve the relaxed master problem P5 to obtain $\eta$ and $y$
 8:    Set $UBD^{(i)} = \eta$
 9:    Solve the primal problem P3 with fixed $y^{(i)} = y$
10:    if the primal problem is feasible then
11:       Obtain the optimal solution $x$ and the Lagrange function $L(x, y^{(i)}, \lambda^{(i)})$
12:       Set $LBD^{(i)} = \max(LBD^{(i-1)}, f^{(i)}(x, y^{(i)}))$
13:    else
14:       Solve the feasibility-check problem P3.1 to obtain the optimal solution $x$ and the Lagrange function $\bar{L}(x, y^{(i)}, \bar{\lambda}^{(i)})$
15:    end if
16: end while

IV. MATCHING-BASED OPTIMAL RESOURCE ALLOCATION FOR SINGLE D2D GROUP PER CU

In this section, we consider the MINLP problem in P1 for the special case $C_1 = 1$ and $C_2 = 1$. This case can be cast as a bipartite matching problem and thus can be solved in polynomial time. To formulate the bipartite problem, we divide P1 into two subproblems. In the first step, for each D2D group $k$ and each CU $m$, we find their transmission powers so that the sum throughput of the D2D group and the CU is maximized. If this problem is feasible, D2D group $k$ is allowed to reuse the channel of CU $m$ and is marked as a candidate partner for the second step; otherwise D2D group $k$ is excluded from the list of feasible partners of CU $m$. The second step is then to find the best CU partner for each D2D group among all feasible candidates so that the total throughput of all D2D groups and CUs is maximized.

1) Feasibility check and power allocation: In order to determine whether D2D group $k$ can reuse channel $m$ and to find the transmission powers of the feasible D2D group and CU, we have problem P6 as follows:

$$\textbf{P6.}\quad \max_{P^D_{k,m},\, P^C_m} \left( R^D_{k,m} + R^C_m \right) \tag{49}$$

$$\text{s.t.}\quad R^D_{k,m} = |\mathcal{D}_k| \log_2\left(P^D_{k,m} \beta^D_{k,m}\right), \tag{50}$$

$$R^C_m = \log_2\left(P^C_m \beta^C_m\right), \tag{51}$$

$$P^D_{k,m} \beta^D_{k,m} \ge \gamma^D_{th}, \tag{52}$$

$$P^C_m \beta^C_m \ge \gamma^C_{th}, \tag{53}$$

$$\beta^C_m = \frac{G^C_m}{P_{noise} + P^D_{k,m} G^{D2C}_{k,m}}, \tag{54}$$

$$\beta^D_{k,m} \le \frac{G^D_{k,m,d}}{P_{noise} + P^C_m G^{C2D}_{m,d}}, \quad \forall d \in \mathcal{D}_k, \tag{55}$$

$$P^C_m \le P^C_{\max}, \tag{56}$$

$$\sum_{m=1}^{M} P^D_{k,m} \le P^D_{\max}. \tag{57}$$

P6 is a reduced version of P1, obtained by limiting it to only one D2D group and one CU with the objective of maximizing their sum throughput. Clearly, P6 is a geometric programming problem and can be transformed to a convex optimization problem using geometric programming techniques [31]. We solve problem P6 for all $(k, m)$ pairs. Define a candidate channel set $\mathcal{C}_k$ for D2D group $k$. If the problem is feasible, D2D group $k$ is admissible to channel $m$ (i.e., eligible to use channel $m$), and $m$ is added to $\mathcal{C}_k$. For $m \in \mathcal{C}_k$, denote the optimal throughput for the $k$th D2D transmitter and the $m$th CU as $R^D_{k,m}$ and $R^C_m$, respectively, and the optimal sum throughput as $R^{sum}_{k,m} = R^D_{k,m} + R^C_m$. For $m \notin \mathcal{C}_k$, we set $R^D_{k,m} = 0$ and $R^C_m = \log_2\left( \frac{P^C_{\max} G^C_m}{P_{noise}} \right)$, and thus $R^{sum}_{k,m} = R^C_m$.

2) Maximizing total throughput: Given the maximum achievable throughput for each D2D group when reusing each cellular channel, to find the optimal channel allocation that maximizes the total throughput we have

$$\textbf{P7.}\quad \max_{y} \sum_{k=1}^{K} \sum_{m=1}^{M} y_{k,m} R^{sum}_{k,m} \tag{58}$$

$$\text{s.t.}\quad \sum_{k=1}^{K} y_{k,m} \le 1, \quad \forall m \in \mathcal{M}, \tag{59}$$

$$\sum_{m=1}^{M} y_{k,m} \le 1, \quad \forall k \in \mathcal{K}, \tag{60}$$

$$y_{k,m} \in \{0, 1\}, \quad \forall k \in \mathcal{K},\ m \in \mathcal{M}. \tag{61}$$

P7 is in effect the maximum weight bipartite matching problem, where the D2D groups and the cellular channels are the two groups of vertices in the bipartite graph, and the edge connecting D2D group $k$ and channel $m$ has weight $R^{sum}_{k,m}$. The Hungarian algorithm [32] can be used to solve the bipartite matching problem in polynomial time. To determine the computational complexity, consider $M \ge K$, and let the complexity of solving P6 be a function of the size of each D2D group, denoted as $f_6(|\mathcal{D}_K|)$. The time complexity of the matching-based optimal resource allocation is then $O(M K f_6(|\mathcal{D}_K|)) + O(M^3)$, where the first and second terms correspond to the computation time of the first and second steps, respectively.

V. GREEDY AND HEURISTIC CHANNEL ALLOCATION ALGORITHMS

The MINLP problem in P1 is NP-hard, and the computational complexity grows exponentially with the problem size in the worst case. In other words, GBD may converge in an exponential number of iterations. In this section, we first propose a greedy algorithm and then a heuristic solution to the general MINLP problem in P1.

A. A greedy algorithm

Algorithm 2 shows the greedy resource allocation algorithm. The key idea of the greedy algorithm is that, in each iteration, it selects the CU and D2D group pair that maximizes the resulting sum throughput of all selected pairs. The algorithm terminates when no more pairs can be included.

Algorithm 2 Greedy algorithm
 1: $\mathcal{M}$: Set of cellular users
 2: $\mathcal{K}$: Set of all D2D groups
 3: $e_{k,m} = 1$, $\forall k \in \mathcal{K}, m \in \mathcal{M}$
 4: $Y = [y_{k,m} \mid y_{k,m} = 0, \forall k \in \mathcal{K}, m \in \mathcal{M}]$
 5: $S = \emptyset$
 6: while $\sum_{k=1}^{K} \sum_{m=1}^{M} e_{k,m} \ge 1$ do
 7:    $E = [e_{k,m} \mid e_{k,m} = 1, \forall k \in \mathcal{K}, m \in \mathcal{M}]$
 8:    $T^{sum}_{k,m} = \sum_{m=1}^{M} \log_2\left( \frac{P^C_{\max} G^C_m}{P_{noise}} \right)$, $\forall k \in \mathcal{K}, m \in \mathcal{M}$
 9:    for each $e_{k,m} \in E$ do
10:       $y_{k,m} = 1$
11:       if $(k, m)$ is admissible then
12:          Solve P3 to find $P^D_{k',m'}$ and $P^C_{m'}$, $\forall (k', m') \in [S \cup (k, m)]$
13:          if P3 is feasible then
14:             $T^{sum}_{k,m} = \sum_{(k',m') \in [S \cup (k,m)]} Z_{k',m'} + \sum_{m=1}^{M} \log_2\left(P^C_m \beta^C_m\right)$, where $Z_{k',m'} = y_{k',m'} |\mathcal{D}_{k'}| \log_2\left(P^D_{k',m'} \beta^D_{k',m'}\right)$
15:          else
16:             $e_{k,m} = 0$
17:          end if
18:       else
19:          $e_{k,m} = 0$
20:       end if
21:       $y_{k,m} = 0$
22:    end for
23:    $(k^*, m^*) = \arg\max_{(k,m)} T^{sum}_{k,m}$
24:    $y_{k^*,m^*} = 1$
25:    $e_{k^*,m^*} = 0$
26:    $S = S \cup (k^*, m^*)$
27: end while

In this algorithm, we first initialize all edges $e_{k,m}$ of a $K \times M$ bipartite graph to one in line 3. The $K \times M$ assignment matrix $Y$ is initialized to zero. $S$ is the set of selected CU and D2D pairs that maximize the sum throughput, and it is initially empty. Matrix $E$ includes all edges $e_{k,m}$ with the value of one. The inner loop (lines 9-22) finds the sum throughput, $T^{sum}_{k,m}$, of all pairs in set $S$ after an admissible pair $(k, m)$ is added to $S$. In line 11, to find whether $(k, m)$ is admissible, the algorithm checks constraints (1) and (2) for the given $(k, m)$ pair. If either of these constraints is violated for the current $(k, m)$, the procedure sets $e_{k,m}$ and $y_{k,m}$ to zero and moves to the next pair. Otherwise, the algorithm solves problem P3 and finds $T^{sum}_{k,m}$. In the outer loop, the pair $(k^*, m^*)$ that maximizes $T^{sum}_{k,m}$ over all $(k, m) \notin S$ (line 23) is found and removed from $E$. The outer loop is iterated until $e_{k,m} = 0$ for all $k \in \mathcal{K}$ and $m \in \mathcal{M}$.
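
The selection logic of the greedy procedure can be sketched in Python as follows. This is a simplified illustration, not the paper's implementation: `evaluate` is a hypothetical callback that stands in for solving P3 on a candidate set of pairs and returns the resulting sum throughput, or `None` when P3 is infeasible. Groups and channels are indexed from 0.

```python
def greedy_allocation(K, M, C1, C2, evaluate):
    """Greedy pair selection in the spirit of Algorithm 2.

    evaluate(pairs) -> sum throughput of the pair set, or None if infeasible
    (hypothetical stand-in for solving P3 with those pairs active).
    """
    candidates = {(k, m) for k in range(K) for m in range(M)}  # edges with e = 1
    selected = []                                              # the set S
    while candidates:
        best_pair, best_value = None, None
        for pair in sorted(candidates):
            k, m = pair
            per_group = sum(1 for (kk, _) in selected if kk == k)
            per_channel = sum(1 for (_, mm) in selected if mm == m)
            if per_group >= C1 or per_channel >= C2:   # constraints (1) and (2)
                candidates.discard(pair)
                continue
            value = evaluate(selected + [pair])
            if value is None:                          # P3 infeasible with this pair
                candidates.discard(pair)
                continue
            if best_value is None or value > best_value:
                best_pair, best_value = pair, value
        if best_pair is None:
            break
        selected.append(best_pair)
        candidates.discard(best_pair)
    return selected
```

Each pass of the outer loop calls `evaluate` once per remaining candidate, which is where the repeated solutions of P3 in the greedy algorithm come from.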
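Since a total of $\min\{M C_2, K C_1\}$ pairs can be found in the procedure, and only one such pair can be added in each iteration of the outer loop, the computational complexity of the greedy algorithm is $O(\min\{M C_2, K C_1\}\, K M f_3(|\mathcal{D}_K|))$, where $f_3(|\mathcal{D}_K|)$ is the complexity of solving P3 as a function of the size of each D2D group. The high complexity of the greedy algorithm mainly arises from the need to solve the optimization problem P3 up to $K M$ times to find the best pair in each iteration.

B. A heuristic algorithm

Since the complexity of the greedy algorithm is high, we propose a heuristic algorithm with lower complexity in Algorithm 3.

Algorithm 3 Heuristic algorithm
 1: $\mathcal{M}$: List of cellular users in decreasing order of $G^C_m$
 2: $\mathcal{K}$: List of all D2D groups
 3: $G^{C2D}_{m,k} = \min_{d \in \mathcal{D}_k} G^{C2D}_{m,d}$, $\forall k \in \mathcal{K}, m \in \mathcal{M}$
 4: $G^D_{k',k} = \min_{d \in \mathcal{D}_k} G^D_{k',k,d}$, $\forall k \in \mathcal{K}, m \in \mathcal{M}$
 5: $y_{k,m} = 0$, $\forall k \in \mathcal{K}, m \in \mathcal{M}$
 6: $P^C_m = P^C_{\max}$, $\forall m \in \mathcal{M}$
 7: $P^D_{k,m} = 0$, $\forall k \in \mathcal{K}, m \in \mathcal{M}$
 8: $m = 1$
 9: for each $m \in \mathcal{M}$ do
10:    $\mathcal{K}' = \{ k \in \mathcal{K} \mid \sum_{m=1}^{M} y_{k,m} < C_2 \}$
11:    while $\sum_{k=1}^{K} y_{k,m} < C_1$ and $\mathcal{K}' \ne \emptyset$ do
12:       $k^* = \arg\min_{k \in \mathcal{K}'} \left( \sum_{k'=1}^{K} P^D_{k',m} G^D_{k',k} + P^C_m G^{C2D}_{m,k} \right)$
13:       $y_{k^*,m} = 1$
14:       Solve P3 to find $P^D_{k^*,m}$ and $P^C_m$
15:       if P3 is feasible then
16:          $k^*$ transmits on channel $m$
17:          $y_{k^*,m} = 1$
18:       else
19:          $y_{k^*,m} = 0$
20:       end if
21:       $\mathcal{K}' = \mathcal{K}' \setminus \{k^*\}$
22:    end while
23: end for

In the following, we explain some intuition behind the algorithm.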

To increase cellular and D2D throughputs, it is desirable to have higher SINR. From (3) and (7), it can be deduced that smaller values of $G^{C2D}_{m,d}$ and $G_{k',k,d}$ reduce the interference from CU $m$ to D2D group $k$ and from D2D group $k'$ to group $k$, respectively, resulting in higher $\beta$ and D2D throughput. Furthermore, higher values of $G_m$ lead to higher cellular throughput. Therefore, Algorithm 3 tries to pair up a CU that has a high link gain to the BS with a D2D group that receives low interference from that CU. Starting from $m = 1$, the outer loop in Algorithm 3 iterates through all CUs. For each $m$, the inner loop finds at most $C_1$ best D2D groups to share the channel. Line 12 shows the criterion for choosing the D2D group that receives the minimum interference from CU $m$ and from all other D2D groups using the same channel. In line 14, based on the current value of $y_{k,m}$, problem P3 is solved to find the optimal transmission power for each CU and D2D group. If P3 is feasible, D2D group $k^*$ reuses channel $m$ and $y_{k^*,m} = 1$; otherwise $y_{k^*,m} = 0$ in line 19. In both cases, $k^*$ is removed from the D2D group list for the next iteration. The inner loop stops after finding $C_1$ D2D groups for CU $m$ or after at most $K$ iterations. It is worth mentioning that each D2D group cannot reuse the channels of more than $C_2$ CUs. This is accomplished in line 10 by introducing $\mathcal{K}'$, which keeps track of all D2D groups with fewer than $C_2$ assigned channels. In this algorithm, problem P3 is solved $M C_1$ times in the worst case, and thus the complexity of the heuristic algorithm is $O(M^2) + O(M K f_3(|D_K|))$. This is much less than the complexity of the greedy algorithm. Note that this complexity analysis does not give the actual amount of time needed to run the algorithm, which is more important for determining whether the algorithm is acceptable in a real system. The exact running time also depends on the computing speed of the CPU.
In addition, the delay requirements of the application and the mobility of the users (which determines the channel dynamics) can also affect the feasibility of the algorithm. We summarize the worst-case computational complexity of all three solutions in Table II.

TABLE II: Worst-case complexity comparison
Algorithm | Worst-case complexity
GBD | Exponential
Greedy | $O(\min\{M C_2, K C_1\} \, K M f_3(|D_K|))$
Heuristic | $O(M^2) + O(M K f_3(|D_K|))$

C. Coordination and Overhead
Channel measurement is an indispensable component of resource allocation in D2D communications. Our proposed resource allocation algorithms are performed at the BS. The BS should first collect all required channel state information (CSI) in order to find the transmission power for all CU and D2D transmitters and to allocate channels to each D2D group. It should then pass the transmission power values to the individual transmitters and the channel allocation information to the D2D groups. Collecting the CSI between a CU and the BS, between two D2D users, and between a CU and a D2D user can be performed during the device discovery process using the discovery signal. To reduce the overhead of reporting the involved CSI to the BS, CSI feedback compression, signal flooding, and distance-based mechanisms can be utilized [17]. The overhead can be further reduced for short, low-mobility user-to-user or user-to-BS links, since the channel should have fewer taps and vary slowly.

TABLE III: Default Simulation Parameters
Parameter | Value
Cell radius ($R$) | 1 km
Number of receivers in each D2D group | 3
$P_{noise}$ | -114 dBm
Path loss exponent ($\alpha$) | 3
$P^c_{max}$ | 20 dBm
$P^d_{max}$ | 20 dBm
$\gamma_{th} = \gamma^c_{th} = \gamma^d_{th}$ | 10 dB
D2D cluster size ($r$) | 50 m

VI. PERFORMANCE EVALUATION
We consider a single-cell network as illustrated in Fig. 2, where cellular users are uniformly distributed in the cell. Distance-based path loss and slow Rayleigh fading are adopted as the channel model.
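A channel model of this kind can be sketched in a few lines of Python. Under Rayleigh fading the instantaneous power gain is exponentially distributed around its mean, so the sketch draws the gain from an exponential distribution whose mean comes from a distance-based path loss. The unit constant in `average_gain` and the function names are assumptions made for illustration; the source does not specify the path-loss constant.

```python
import math
import random


def average_gain(distance_m: float, alpha: float = 3.0) -> float:
    """Average link gain from distance-based path loss.
    Assumes a unit constant factor, i.e. gain = distance ** (-alpha)."""
    return distance_m ** (-alpha)


def sample_gain(distance_m: float, rng: random.Random, alpha: float = 3.0) -> float:
    """Draw one instantaneous power gain under Rayleigh fading:
    exponentially distributed with mean average_gain(distance_m, alpha)."""
    mean = average_gain(distance_m, alpha)
    # random.expovariate takes the rate, i.e. the reciprocal of the mean.
    return rng.expovariate(1.0 / mean)


rng = random.Random(0)                       # fixed seed for repeatability
samples = [sample_gain(10.0, rng) for _ in range(20000)]
empirical_mean = sum(samples) / len(samples)
```

With enough samples, `empirical_mean` should be close to `average_gain(10.0)`, which is a quick sanity check that the fading draw has the intended mean.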
The probability density function of the instantaneous link gain at any time is given by $f_G(x) = \frac{1}{\bar{G}} e^{-x/\bar{G}}$, for $x \ge 0$, where $\bar{G}$ is the average link gain between the transmitter and the receiver and can be calculated from the distance-based path loss model. The proposed algorithms have been implemented in Matlab together with CVX, a package for specifying and solving convex programs [33]. Default parameters used in the simulations are given in Table III. We run two sets of experiments to evaluate the performance of the proposed algorithms, namely, regularly placed D2D clusters and randomly placed D2D clusters. A larger M is used for the regular placement of D2D users, since the results are collected based on one placement of the D2D users. For the randomly placed D2D users, each result is collected by averaging over a large number of different placements of D2D users. Therefore, each result for the randomly placed D2D users takes much longer to compute than that for the regular placement. To keep the total simulation time reasonable, we have to use a smaller M in the random placement.

a) Regularly placed D2D clusters: In Fig. 2, D2D groups are manually placed in six different locations and

D2D transmitters and receivers are placed in fixed locations within each group with radius r. This scenario allows us to better understand the channel selection for D2D users and how it is impacted by geographical spacing. In the figure, D2D transmitters are labeled with their coordinates. The GBD algorithm finds the CU partner (or, equivalently, the CU channel) for each D2D group among 40 CUs when $C_1 = 2$ and $C_2 = 2$. The straight lines in Fig. 2 connect D2D groups with their respective CU partners. As shown in the figure, the chosen CU partners tend to be close to the base station to ensure the rate of the CUs. Meanwhile, the CU partners are away from the respective D2D users to reduce mutual interference between the CUs and the D2D users. As can be seen in Fig. 2, all D2D groups found CU partners in this configuration. Note that even for CUs at the cell edges, the SINR constraints are satisfied, as guaranteed by P1.

Fig. 2: Regularly placed D2D clusters in a cell, $C_1 = 2$, $C_2 = 2$, M = 40. (Axes: X and Y; D2D transmitters at (-900,0), (-500,0), (-100,0), (100,0), (500,0), and (900,0); legend: cell border, cellular user, D2D cluster border, D2D user, base station, selected partner connection.)

Fig. 3: Throughput comparison for different D2D cluster sizes, $C_1 = 2$, $C_2 = 2$, M = 40, and K = 6. (Axes: throughput (bps/Hz) versus D2D cluster size, r (m); curves: $R^c_{max}$, $R^c$, $R^d$.)

Fig. 3 compares the maximum cellular throughput (without D2D users), $R^c_{max}$, the throughput of cellular users (with D2D users), $R^c$, and the D2D throughput, $R^d$, defined as follows:

$$R^c_{max} = \sum_{m=1}^{M} \log_2 \left( \frac{P^c_{max} G_m}{P_{noise}} \right), \tag{62}$$

$$R^c = \sum_{m=1}^{M} R^c_m, \tag{63}$$

$$R^d = \sum_{k \in A} R^d_k, \tag{64}$$

where $A$ is the set of D2D groups that are allowed to reuse at least one cellular channel. As can be observed in Fig. 3, the overall network throughput, $R_{sum} = R^c + R^d$, is greater than the maximum throughput before including D2D users, $R^c_{max}$.
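The throughput metrics in (62)-(64) are simple sums, and a short Python sketch may make the bookkeeping concrete. The function names and the input values are made up for illustration; only the formulas follow the definitions above, with the dictionary of D2D rates assumed to contain only the admitted groups (the set A).

```python
import math
from typing import Dict, List


def max_cellular_throughput(p_max: float, gains_to_bs: List[float],
                            p_noise: float) -> float:
    """Eq. (62): cellular throughput without D2D users, every CU at p_max."""
    return sum(math.log2(p_max * g / p_noise) for g in gains_to_bs)


def summarize(cu_rates: List[float], d2d_rates: Dict[int, float],
              r_c_max: float) -> Dict[str, float]:
    """Eqs. (63)-(64) plus the sum rate and its relative gain over r_c_max.
    d2d_rates maps each admitted group (set A) to its rate."""
    r_c = sum(cu_rates)
    r_d = sum(d2d_rates.values())
    r_sum = r_c + r_d
    return {"R_c": r_c, "R_d": r_d, "R_sum": r_sum,
            "relative_gain": r_sum / r_c_max - 1.0}


# Made-up numbers chosen so the logarithms are exact: log2(4) + log2(8) = 5.
r_c_max = max_cellular_throughput(p_max=1.0, gains_to_bs=[4.0, 8.0], p_noise=1.0)
report = summarize(cu_rates=[1.5, 2.5], d2d_rates={0: 3.0, 2: 1.0}, r_c_max=r_c_max)
```

In this made-up example the cellular throughput drops from 5 to 4 once D2D groups are admitted, yet the sum rate rises to 8, which is the same qualitative trade-off the figure discussion describes.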
With the introduction of D2D users, the overall throughput increases by 25% to 125%. This comes at the cost of reduced cellular throughput, as $R^c_{max} > R^c$, since adding D2D users causes interference to cellular users and decreases their throughput. However, the reduction is relatively small compared to the D2D throughput. Moreover, although a larger D2D cluster size leads to lower D2D channel gain and lower D2D throughput, it does not affect the cellular throughput very much.

Fig. 4 shows the D2D and sum rates versus $C_1$ for different values of $C_2$. Both rates increase with $C_1$, since the number of available channels for each D2D group increases and hence the D2D rate increases. However, both the D2D and sum rates flatten out after a certain value of $C_1$. For instance, when $C_2 = 1$, each CU can serve at most one D2D group, and increasing $C_1$ does not increase the D2D rate since there are not enough channels to allow all the D2D groups to reuse $C_1$ channels. This figure also shows that the cellular throughput, which is the difference between the sum rate and the D2D rate, decreases as $C_1$ increases. This is because the interference from D2D groups on CUs increases with $C_1$. On the other hand, increasing $C_2$ increases the D2D and sum rates for higher values of $C_1$, since each CU can serve more D2D groups and hence more channels are available to D2D groups. For lower values of $C_1$, however, there are already enough CUs in the cell to be reused by D2D groups, so increasing $C_2$ does not change the D2D and sum rates significantly.

Fig. 5 shows the convergence of the GBD in Algorithm 1. As mentioned in this algorithm, in the first iteration $UBD^{(1)} = \infty$ and $LBD^{(1)} = 0$. The second iteration starts with an initial value of $Y$. The figure shows that the LBD, obtained by solving the primal problem, and the UBD, obtained by solving the master problem, converge in iteration 5.
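The bound bookkeeping behind such a convergence plot can be sketched as follows. This is not Algorithm 1 itself: the sketch only shows how a lower bound from a primal solve and an upper bound from a master solve are tracked until they meet, for a maximization problem. Both `solve_primal` and `solve_master` are hypothetical callbacks, and the handling of infeasible primal problems is omitted. The toy sequences at the bottom are made up.

```python
import math
from typing import Callable, Tuple, TypeVar

Y = TypeVar("Y")


def bound_loop(
    solve_primal: Callable[[Y], float],          # objective for a fixed assignment
    solve_master: Callable[[], Tuple[Y, float]], # next assignment and upper bound
    y_init: Y,
    tol: float = 1e-6,
    max_iter: int = 100,
):
    """Iterate until the upper and lower bounds are within tol of each other."""
    ubd, lbd = math.inf, 0.0       # first-iteration values quoted in the text
    y, best_y = y_init, y_init
    for iteration in range(2, max_iter + 1):
        value = solve_primal(y)    # a feasible point gives a lower bound
        if value > lbd:
            lbd, best_y = value, y
        y, ubd = solve_master()    # the relaxed master gives an upper bound
        if ubd - lbd <= tol:
            return best_y, lbd, ubd, iteration
    return best_y, lbd, ubd, max_iter


# Toy run: made-up primal values per assignment and a scripted master problem.
primal_values = {0: 100.0, 1: 250.0, 2: 300.0, 3: 300.0}
master_script = iter([(1, 500.0), (2, 350.0), (3, 300.0)])
outcome = bound_loop(primal_values.__getitem__, lambda: next(master_script), y_init=0)
```

In the toy run the gap shrinks from 400 to 100 to 0, so the loop stops in iteration 4 with both bounds at 300.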

Fig. 4: Throughput comparison for different values of $C_1$ and $C_2$, M = 20, and K = 6. (Axes: rate (bps/Hz) versus $C_1$; curves: sum rate and D2D rate for $C_2 = 4$ and $C_2 = 1$.)

Fig. 5: Convergence of the GBD algorithm, $C_1 = 2$, $C_2 = 2$, M = 20, and K = 6. (Axes: sum rate (bps/Hz) versus number of iterations; curves: LBD and UBD.)

Fig. 6: Average sum throughput versus D2D cluster radius for different cell radii (R), M = 10, K = 4. (a) $C_1 = 4$, $C_2 = 3$; (b) $C_1 = 1$, $C_2 = 1$. (Axes: sum rate (bps/Hz) versus D2D cluster size, r (m).)

b) Randomly placed D2D users: In the second set of experiments, we follow the clustered distribution model in [34], where clusters of radius r are randomly located in a cell and the D2D users in each group are randomly distributed in the corresponding cluster. Three metrics are used to evaluate the performance: the sum throughput, $R_{sum}$, the D2D throughput, $R^d$, and the success rate. The success rate is defined as the ratio of the number of D2D groups that found their CU partners ($|A|$) to the total number of D2D groups. The results in this section have been generated for two sets of $C_1$ and $C_2$ values: in part (a) of all the figures, $C_1 = 4$ and $C_2 = 3$; and in part (b), $C_1 = 1$ and $C_2 = 1$. In the case of $C_1 = 1$ and $C_2 = 1$, GBD and the matching-based algorithm return the same results since both are optimal. In [23], we adapted the heuristic scheme in [19] for multicast D2D and compared it against our proposed scheme when $C_1 = 1$ and $C_2 = 1$. Numerical results in [23] show that our proposed heuristic outperforms the resource allocation algorithm in [19], and thus evaluation of the heuristic in [19] is omitted here. Figs.
6-8 compare the performance of GBD, the greedy algorithm, and the heuristic algorithm for different D2D cluster sizes (r) and different cell radii (R). From these figures, we observe that the sum throughput, the D2D throughput, and the success rate all decrease with the D2D cluster size. Since the channel gain of a D2D link decreases when the cluster radius increases, more transmission power is required for

the D2D groups to satisfy the SINR threshold constraint. This in turn causes more interference to the reused CU partner. Furthermore, it is seen from these figures that the sum throughput, the D2D throughput, and the success rate of all three algorithms increase with the cell radius. This is because increasing the cell radius increases the distance between the CUs and the D2D receivers and also the average distance of individual nodes to the BS. Hence, the interference from CUs to D2D receivers and the interference from D2D transmitters at the BS are decreased. Recall that the D2D rate is the maximum throughput achieved by the admitted D2D groups. It is worth mentioning that increasing the cell size reduces the cellular throughput due to the decreased link gain between the CUs and the base station. However, with the current simulation parameters, $R^d$ is the dominating part of the sum rate and, therefore, $R_{sum}$ increases with the cell size in both parts (a) and (b).

Fig. 7: Average D2D throughput versus D2D cluster radius for different cell radii (R), M = 10, K = 4. (a) $C_1 = 4$, $C_2 = 3$; (b) $C_1 = 1$, $C_2 = 1$. (Axes: D2D rate (bps/Hz) versus D2D cluster size, r (m).)

Fig. 8: Average success rate versus D2D cluster radius for different cell radii (R), M = 10, K = 4. (a) $C_1 = 4$, $C_2 = 3$; (b) $C_1 = 1$, $C_2 = 1$. (Axes: success rate versus D2D cluster size, r (m).)

It can also be seen from Fig.
6 that the optimal solutions, the GBD algorithm for part (a) and the matching-based algorithm for part (b), have the highest sum rates. In comparison, the greedy algorithm achieves a close-to-optimal sum rate, while the heuristic algorithm has a lower sum rate than the other two algorithms but the lowest complexity among them. Note that in Fig. 7, the D2D rate of the greedy algorithm exceeds that of the optimal solution for some cluster sizes. This does not contradict the optimality of GBD, since the objective of P1 is to maximize the sum rate, not the D2D rate. In Figs. 9-11, the performance of all proposed algorithms for different SINR thresholds ($\gamma_{th} = \gamma^c_{th} = \gamma^d_{th}$)

with different numbers of CUs (M) is shown. It is seen that increasing the SINR threshold decreases the sum rates, D2D rates, and success rates, since it limits the chances for D2D groups to find CU partners. It can also be observed that the total throughput improves slightly with an increasing number of CUs, since there are more potential candidates for D2D groups to reuse.

Fig. 9: Average sum throughput versus $\gamma_{th}$ for different numbers of cellular users (M), R = 1000, K = 4. (a) $C_1 = 4$, $C_2 = 3$; (b) $C_1 = 1$, $C_2 = 1$. (Axes: sum rate (bps/Hz) versus $\gamma_{th}$ (dB).)

Fig. 10: Average D2D throughput versus $\gamma_{th}$ for different numbers of cellular users (M), R = 1000, K = 4. (a) $C_1 = 4$, $C_2 = 3$; (b) $C_1 = 1$, $C_2 = 1$. (Axes: D2D rate (bps/Hz) versus $\gamma_{th}$ (dB).)

The complexity of the GBD prevents us from obtaining results for larger M, K, $C_1$, and $C_2$ values in a reasonable amount of time. In Figs. 12 and 13, we compare the greedy and the heuristic algorithms when the D2D cluster size varies, where M = 25, K = 8, $C_1 = 4$, and $C_2 = 3$. Fig. 12 shows the sum rate and the D2D rates of the two algorithms, and Fig. 13 compares the success rates. It can be seen from Fig. 12 that although the heuristic underperforms the greedy algorithm in both sum rate and D2D rate, both algorithms achieve a high D2D rate.
Specifically, when the D2D cluster size is relatively small, say 10 m, the D2D rate accounts for about 85% of the sum rate for both the greedy and the heuristic algorithms. This percentage becomes lower when the cluster size becomes larger due to poorer D2D channel conditions. When the cluster size is 70 m, the D2D rate still accounts for about 80% of the sum rate for both algorithms. Note that such high D2D rates are obtained with the minimum SINRs guaranteed for the existing CUs. This demonstrates that our proposed algorithms can indeed support multicast D2D with high rates without causing

harmful interference to the CUs. Fig. 13 shows that both the greedy and the heuristic algorithms can achieve a relatively high success rate in admitting the multicast D2D groups. This is consistent with the results for smaller M and K presented earlier.

Fig. 11: Average success rate versus $\gamma_{th}$ for different numbers of cellular users (M), R = 1000, K = 4. (a) $C_1 = 4$, $C_2 = 3$; (b) $C_1 = 1$, $C_2 = 1$. (Axes: success rate versus $\gamma_{th}$ (dB).)

Fig. 12: Average throughput versus D2D cluster radius, $C_1 = 4$, $C_2 = 3$, M = 25, K = 8, R = 1000. (Axes: rate (bps/Hz) versus D2D cluster size, r (m); curves: sum rate and D2D rate for the greedy and the proposed heuristic algorithms.)

Fig. 13: Average success rate versus D2D cluster radius, $C_1 = 4$, $C_2 = 3$, M = 25, K = 8, R = 1000. (Axes: success rate versus D2D cluster size, r (m); curves: greedy and proposed heuristic.)

VII. CONCLUSIONS
In this paper, we considered joint power and channel allocation for multicast D2D communications sharing uplink channels with the unicast CUs in a cellular network. To maximize the overall throughput while guaranteeing the QoS requirements of both CUs and D2D groups, we formulated an optimization problem and found the optimal solution using GBD. Then, we solved a special case, in which each D2D group can reuse the channels of at most one CU and each CU can share its channel with at most one D2D group, using the maximum weight bipartite matching algorithm. Finally, a greedy algorithm and a low-complexity heuristic algorithm were also proposed. We performed extensive simulations with different parameters such as SINR threshold, cell size, D2D cluster size, and number of CUs.
Results showed that the greedy algorithm has close-to-optimal performance in sum rate, D2D rate, and success rate. In comparison, our proposed heuristic algorithm achieves a lower sum rate and D2D rate than the greedy algorithm, and a success rate similar to that of the greedy algorithm, with lower computational complexity. Meanwhile, we have observed that both the greedy and the heuristic algorithms achieve