In-network Compression for Multiterminal Cascade MIMO Systems


In-network Compression for Multiterminal Cascade MIMO Systems

Iñaki Estella Aguerri, Abdellatif Zaidi

arXiv:1612.01282v1 [cs.IT] 5 Dec 2016

Abstract: We study the problem of receive beamforming in uplink cascade multiple-input multiple-output (MIMO) systems as an instance of that of cascade multiterminal source coding for lossy function computation. Using this connection, we develop two coding schemes for the second and show that their application leads to efficient beamforming schemes for the first. In the first coding scheme, each terminal in the cascade sends a description of the source that it observes; the decoder reconstructs all sources, lossily, and then computes an estimate of the desired function. This scheme improves upon standard routing in that every terminal only compresses the innovation of its source w.r.t. the descriptions that are sent by the previous terminals in the cascade. In the second scheme, the desired function is computed gradually in the cascade network, and each terminal sends a finer description of it. In the context of uplink cascade MIMO systems, the application of these two schemes leads to efficient centralized receive-beamforming and distributed receive-beamforming, respectively.

I. INTRODUCTION

Consider the cascade communication system for function computation shown in Figure 1. Terminal l, l = 1, ..., L, observes, or measures, a discrete memoryless source S_l^n and communicates with Terminal (l+1) over an error-free finite-capacity link of rate R_l. Terminal (L+1) does not observe any source, and plays the role of a decoder which wishes to reconstruct

I. Estella Aguerri is with the Mathematical and Algorithmic Sciences Lab, Huawei France Research Center, 92100 Boulogne-Billancourt, France. A. Zaidi was with Université Paris-Est, France, and is currently on leave at the Mathematical and Algorithmic Sciences Laboratory, Huawei France Research Center, 92100 Boulogne-Billancourt, France. Email: {inaki.estella@huawei.com, abdellatif.zaidi@u-pem.fr}.

Fig. 1: Multi-terminal cascade source coding for lossy function computation.

a function Z^n lossily, to within some average fidelity level D, where Z_i = φ(S_{1,i}, ..., S_{L,i}) for some function φ(·). The memoryless sources (S_1^n, ..., S_L^n) are arbitrarily correlated among them, with joint measure p_{S_1,...,S_L}(s_1, ..., s_L). For this communication system, optimal tradeoffs among compression rate tuples (R_1, ..., R_L) and allowed distortion level D, captured by the rate-distortion region of the model, are not known in general, even if the sources are independent. For some special cases, inner and outer bounds on the rate-distortion region, that do not agree in general, are known, e.g., for the case L = 2 [1]. A related work for the case L = 2 has also appeared in [2]. For the general case with L ≥ 2, although a single-letter characterization of the rate-distortion region seems to be out of reach, one can distinguish essentially two different approaches or modes. In the first mode, each terminal operates essentially as a routing node. That is, each terminal in the cascade sends an appropriate compressed version, or description, of the source that it observes; the decoder reconstructs all sources, lossily, and then computes an estimate of the desired function. In this approach, the computation is performed centrally, at only the decoder, i.e., Terminal (L+1). In the second mode, Terminal l, l = 1, ..., L, processes the information that it gets from the previous terminal, and then describes it, jointly with its own observation or source, to the next terminal. That is, in a sense, the computation is performed distributively in the network. (See, e.g., [3]-[5], where variants of this approach are sometimes referred to as in-network processing.)

Consider now the seemingly unrelated uplink multiple-input multiple-output (MIMO) system model shown in Figure 2. In this model, M users communicate concurrently with a common base station (BS), as in standard uplink wireless systems. The base station is equipped with a large number of antennas, e.g., a Massive MIMO BS; and the baseband processing is distributed across a number, say L, of modules or radio remote units (RRUs). The modules are each connected to a

Fig. 2: Chained MIMO architecture for uplink Massive MIMO systems.

small number of antennas; and are concatenated in a line network, through a common fronthaul link that connects them to a central processor (CP) unit. This architecture, sometimes referred to as chained MIMO [6] and also proposed in [6] as an alternative to the standard one in which each RRU has its dedicated fronthaul link to the CP [7]-[16], offers a number of advantages and an additional degree of flexibility if more antennas/modules are to be added to the system. The reader may refer to [17]-[19], where examples of testbed implementations of this novel architecture can be found. For this architecture, depending on the amount of available channel state information (CSI), receive-beamforming operations may be better performed centrally at the CP or distributively across RRUs. Roughly, if CSI is available only at the CP, not at the RRUs, it seems reasonable that beamforming operations be performed only centrally, at the CP. In this case, RRU l, l = 1, ..., L, sends a compressed version Ŝ_l of its output signal S_l to the CP, which first collects the vector (Ŝ_1, ..., Ŝ_L), and then performs receive-beamforming on it. In contrast, if local CSI is available or can be acquired at the RRUs, then, due to the linearity of the receive beamforming (which is a simple matrix multiplication), parts of the receive-beamforming operations can be performed distributively at the RRUs (see Section III).

The above shows some connections between the model of Figure 2 and that, more general, of Figure 1. In this paper, we study them using a common framework. Specifically, we develop two coding schemes for the multiterminal cascade source coding problem of Figure 1; and then show that their application to the uplink cascade MIMO system of Figure 2 leads to efficient receive-beamforming which, depending on the amount of available CSI at the RRUs, is better

performed centrally at the CP or distributively across RRUs. In the first coding scheme, each terminal in the cascade sends a description of the source that it observes; the decoder reconstructs all sources, lossily, and then computes an estimate of the desired function. This scheme improves upon standard routing in that every terminal only compresses the innovation of its source w.r.t. the descriptions that are sent by the previous terminals in the cascade. In the second scheme, the desired function is computed gradually in the cascade network, and each terminal sends a finer description of it. We also review the basic routing strategies and propose an improvement based on distributed source coding. Furthermore, we also derive a lower bound on the minimum distortion at which the function can be reconstructed at the decoder. Numerical results show that the proposed methods outperform standard compression strategies and perform close to the lower bound in some regimes.

A. Notation

Throughout, we use the following notation. Upper case letters denote random variables, e.g., X; lower case letters denote realizations of random variables, e.g., x; and calligraphic letters denote sets, e.g., 𝒳. The cardinality of a set 𝒳 is denoted by |𝒳|. The length-n sequence (X_1, ..., X_n) is denoted as X^n; and, for 1 ≤ k ≤ j ≤ n, the sub-sequence (X_k, X_{k+1}, ..., X_j) is denoted as X_k^j. Boldface upper case letters denote vectors or matrices, e.g., X, where context should make the distinction clear. For an integer L ≥ 1, we denote the set of integers smaller than or equal to L as 𝓛 ≜ {1, ..., L}; and, for 1 ≤ l ≤ L, we use the shorthand notations 𝓛_l ≜ {1, 2, ..., l}, 𝓛_l^c ≜ {l+1, ..., L} and 𝓛/l ≜ {1, ..., l−1, l+1, ..., L}. We denote the covariance of a vector X by Σ_x ≜ E[XX^H]; Σ_{x,y} is the cross-correlation Σ_{x,y} ≜ E[XY^H]; and the conditional correlation matrix of X given Y is Σ_{x|y} ≜ Σ_x − Σ_{x,y} Σ_y^{−1} Σ_{x,y}^H.
The length-N vector with all entries equal to zero except the l-th element, which is equal to unity, is denoted as δ_l, i.e., δ_l ≜ [0_{1×(l−1)}, 1, 0_{1×(N−l)}]; and the N×N matrix whose entries are all zeros, except the first l diagonal elements, which are equal to unity, is denoted by Ī_l, i.e., [Ī_l]_{i,i} = 1 for i ≤ l and 0 otherwise.
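As a quick illustration of the conditional correlation matrix defined in the notation above, the following NumPy sketch (toy dimensions and a randomly generated joint covariance, not values from the paper) computes Σ_{x|y} = Σ_x − Σ_{x,y} Σ_y^{−1} Σ_{x,y}^H and checks that it is a valid (Hermitian, positive semi-definite) residual covariance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random joint covariance for the stacked vector [X; Y]
# (illustrative dimensions only).
nx, ny = 3, 2
A = (rng.standard_normal((nx + ny, nx + ny))
     + 1j * rng.standard_normal((nx + ny, nx + ny)))
sigma = A @ A.conj().T  # Hermitian positive definite by construction

sigma_x = sigma[:nx, :nx]
sigma_y = sigma[nx:, nx:]
sigma_xy = sigma[:nx, nx:]

# Conditional covariance of X given Y, as defined in the notation section.
sigma_x_given_y = sigma_x - sigma_xy @ np.linalg.inv(sigma_y) @ sigma_xy.conj().T

# Sanity checks: Hermitian and positive semi-definite (it is an MMSE
# residual covariance, so both must hold).
assert np.allclose(sigma_x_given_y, sigma_x_given_y.conj().T)
assert np.all(np.linalg.eigvalsh(sigma_x_given_y) >= -1e-9)
```

This is the same quantity that reappears in Theorem 4 as Σ_{s_l|u_{𝓛_{l−1}}}.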

II. CASCADE SOURCE CODING SYSTEM MODEL

Let {S_{1,i}, S_{2,i}, ..., S_{L,i}}_{i=1}^n = (S_1^n, ..., S_L^n) be a sequence of n independent and identically distributed (i.i.d.) samples of the L-dimensional source (S_1, S_2, ..., S_L), jointly distributed as p(s_1, ..., s_L) over S_1 × ... × S_L. For convenience, we denote S_{L+1}^n ≜ ∅. A cascade of (L+1) terminals are concatenated as shown in Figure 1, such that Terminal l, l = 1, ..., L, is connected to Terminal (l+1) over an error-free link of capacity R_l bits per channel use. Terminal (L+1) is interested in reconstructing a sequence Z^n lossily, to within some fidelity level, where Z_i = φ(S_{1,i}, ..., S_{L,i}), i = 1, ..., n, for some function φ : S_1 × ... × S_L → 𝒵. To this end, Terminal l, l = 1, ..., L, which observes the sequence S_l^n and receives message m_{l−1} ∈ 𝓜_{l−1} ≜ {1, ..., M_{l−1}^{(n)}} from Terminal (l−1), generates a message m_l ∈ 𝓜_l as m_l = f_l^{(n)}(s_l^n, m_{l−1}) for some encoding function f_l^{(n)} : S_l^n × 𝓜_{l−1} → 𝓜_l, and forwards it over the error-free link of capacity R_l to Terminal (l+1). At Terminal (L+1), the message m_L is mapped to an estimate Ẑ^n = g^{(n)}(m_L) of Z^n, using some mapping g^{(n)} : 𝓜_L → 𝒵̂^n. Let 𝒵̂ be the reconstruction alphabet and d : 𝒵 × 𝒵̂ → [0, ∞) be a single-letter distortion measure. The distortion between Z^n and the reconstruction Ẑ^n is defined as d(z^n; ẑ^n) = (1/n) Σ_{i=1}^n d(z_i; ẑ_i).

Definition 1. A tuple (R_1, ..., R_L) is said to achieve distortion D for the cascade multi-terminal source coding problem if there exist L encoding functions f_l^{(n)} : S_l^n × 𝓜_{l−1} → 𝓜_l, l = 1, ..., L, and a function g^{(n)} : 𝓜_L → 𝒵̂^n such that R_l ≥ (1/n) log M_l^{(n)}, l = 1, ..., L, and D ≥ (1/n) E[Σ_{i=1}^n d(Z_i; Ẑ_i)]. The rate-distortion (RD) region 𝓡(D) of the cascade multi-terminal source coding problem is defined as the closure of all rate tuples (R_1, ..., R_L) that achieve distortion D.

III. SCHEMES FOR CASCADE SOURCE CODING

In this section, we develop three coding schemes for the cascade source coding model of Figure 1 and analyze the RD regions that they achieve.
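The quantities defined in the system model of Section II can be made concrete with a minimal Python sketch (toy Gaussian sources and a toy choice of φ, both hypothetical and used for illustration only): the samplewise function value Z_i = φ(S_{1,i}, ..., S_{L,i}) and the per-letter average distortion d(z^n; ẑ^n) = (1/n) Σ_i d(z_i; ẑ_i):

```python
import numpy as np

rng = np.random.default_rng(1)
L, n = 3, 8  # L terminals, blocklength n (toy values)

# Row l holds the source sequence S_l^n observed at Terminal l.
S = rng.standard_normal((L, n))

# Toy choice of phi: the sum of the L sources (any function
# phi : S_1 x ... x S_L -> Z fits the model).
def phi(columns):
    return columns.sum(axis=0)

Z = phi(S)                      # Z_i = phi(S_{1,i}, ..., S_{L,i})

# A deliberately crude stand-in for the decoder output g^{(n)}(m_L),
# together with the squared-error per-letter distortion.
Z_hat = np.round(Z)
d = np.mean((Z - Z_hat) ** 2)   # d(z^n; z_hat^n) = (1/n) sum_i d(z_i; z_hat_i)

assert Z.shape == (n,)
assert 0.0 <= d <= 0.25         # rounding error per sample is at most 0.5^2
```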

A. Successive Wyner-Ziv Routing (WZR)

A simple strategy, inspired by standard routing (SR) in graphical networks and referred to as multiple-and-forward in [4], has terminal l, l = 1, ..., L, forward a compressed version of its source to the next terminal, in addition to the bit stream received from the previous terminal in the cascade (without processing). The decoder decompresses all sources and then outputs an estimate of the desired function. In SR, observations are compressed independently, and the correlation with the observation of the next terminal in the cascade is not exploited.

We propose a novel scheme, to which we refer as Successive Wyner-Ziv Routing (WZR), in which terminal l, l = 1, ..., L, compresses its observation S_l^n into a description U_l^n taking into account the descriptions (U_1^n, ..., U_{l−1}^n) sent by the previous terminals in the cascade as decoder side information available at the CP, i.e., it exploits the correlation between S_l^n and (U_1^n, ..., U_{l−1}^n) through Wyner-Ziv binning [20]. In doing so, it uses B_l bits per source sample. Along with the produced compression index of rate B_l, each terminal also forwards the bit stream received from the previous terminal to the next one without processing. The decoder successively decompresses all sources and outputs an estimate of the function of interest.

Theorem 1. The RD region 𝓡_WZR(D) that is achievable with the WZR scheme is given by the set of rate tuples (R_1, ..., R_L) satisfying

R_l ≥ Σ_{i=1}^l I(S_i; U_i | U_1, ..., U_{i−1}), for l = 1, ..., L, (1)

for some joint pmf p(s_1, ..., s_L) Π_{l=1}^L p(u_l | s_l) and function g s.t. E[d(Z, g(U_1, ..., U_L))] ≤ D.

Remark 1. The auxiliary random variables (U_1, ..., U_L) that are involved in (1) satisfy the following Markov chains:

U_l − S_l − (S_{𝓛/l}, U_{𝓛/l}), for l = 1, ..., L, (2)

where S_{𝓛/l} = (S_1, ..., S_{l−1}, S_{l+1}, ..., S_L) and U_{𝓛/l} = (U_1, ..., U_{l−1}, U_{l+1}, ..., U_L).

Outline Proof: The proof of Theorem 1 follows by successively applying standard Wyner-Ziv source coding [21]. Hereafter, we only outline the main steps, for the sake of brevity. Fix ε > 0

and a joint pmf p(s_1, ..., s_L, u_1, ..., u_L) that factorizes as

p(s_1, ..., s_L, u_1, ..., u_L) = p(s_1, ..., s_L) Π_{l=1}^L p(u_l | s_l) (3)

and a function g(·) such that E[d(Z, g(U_1, ..., U_L))] ≤ D/(1 + ε). Also, fix non-negative R_1, ..., R_L such that R_l ≥ R_1 + ... + R_{l−1} for l = 1, ..., L.

Codebook Generation: Let non-negative R̂_1, ..., R̂_L, and set B_l = R_l − (R_1 + ... + R_{l−1}) for l = 1, ..., L. Generate L codebooks {C_l}, l = 1, ..., L, with codebook C_l consisting of a collection of 2^{n(B_l + R̂_l)} independent codewords {u_l^n(i_l)} indexed with i_l = 1, ..., 2^{n(B_l + R̂_l)}, where codeword u_l^n(i_l) has its elements generated i.i.d. according to p(u_l). Randomly and independently assign these codewords to 2^{nB_l} bins {B_{j_l}} indexed with j_l = 1, ..., 2^{nB_l}, each containing 2^{nR̂_l} codewords.

Encoding at Terminal l: Terminal l finds an index i_l such that u_l^n(i_l) ∈ C_l is strongly ε-jointly typical¹ with s_l^n, i.e., (u_l^n(i_l), s_l^n) ∈ T_{[U_l S_l]}^{(n)}. Using standard arguments, it is easy to see that this can be accomplished with vanishing probability of error as long as n is large and

B_l + R̂_l ≥ I(S_l; U_l). (4)

Let j_l be such that B_{j_l} ∋ u_l^n(i_l). Terminal l then forwards the bin index j_l and the received message m_{l−1} = (j_1, ..., j_{l−1}) to Terminal (l+1) as m_l = (m_{l−1}, j_l).

Reconstruction at the end Terminal (L+1): Terminal (L+1) collects all received bin indices as m_L = (j_1, ..., j_L), and reconstructs the codewords u_1^n(i_1), ..., u_L^n(i_L) successively in this order, as follows. Assuming that codewords (u_1^n(i_1), ..., u_{l−1}^n(i_{l−1})) have been reconstructed correctly, it finds the appropriate codeword u_l^n(i_l) by looking in the bin B_{j_l} for the unique u_l^n(i_l) that is ε-jointly typical with (u_1^n(i_1), ..., u_{l−1}^n(i_{l−1})). Using standard arguments, it is easy to see that the error in this step has vanishing probability as long as n is large and

R̂_l < I(U_l; U_1, ..., U_{l−1}). (5)

¹ For formal definitions of strong ε-joint typicality, the reader may refer to [22].
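The index bookkeeping behind the binning in the proof above can be mimicked with small integers. The sketch below is purely illustrative (toy parameters, and a deterministic assignment in place of the random binning of the proof): it splits each of the 2^{n(B_l + R̂_l)} codeword indices across 2^{nB_l} bins of 2^{nR̂_l} codewords each, which is exactly the rate split between what is sent (the bin index) and what the decoder resolves via side information:

```python
# Toy illustration of the bin/codeword bookkeeping in the WZR proof.
# Deterministic assignment replaces random binning, for illustration only;
# all parameters are hypothetical.
n = 4            # blocklength
B = 0.5          # bin rate B_l (bits/sample): 2^{nB} = 4 bins
R_hat = 0.75     # within-bin rate R_hat_l:    2^{nR_hat} = 8 codewords/bin

num_bins = 2 ** int(n * B)           # 4 bins
per_bin = 2 ** int(n * R_hat)        # 8 codewords per bin
num_codewords = num_bins * per_bin   # 2^{n(B + R_hat)} = 32 codewords

def to_bin(i):
    """Map codeword index i_l to (bin index j_l, offset inside the bin)."""
    return i // per_bin, i % per_bin

def from_bin(j, k):
    """Recover the codeword index from the bin index and the offset that
    joint-typicality decoding would single out."""
    return j * per_bin + k

# Every codeword index survives the round trip: the encoder transmits only
# the bin index j (nB bits); the decoder resolves the offset k from its
# side information.
for i in range(num_codewords):
    j, k = to_bin(i)
    assert 0 <= j < num_bins and 0 <= k < per_bin
    assert from_bin(j, k) == i
```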

Fig. 3: Distributed source coding model for the routing scheme of Theorem 1.

Terminal (L+1) reconstructs an estimate of Z^n sample-wise as Ẑ_i = g(u_{1,i}(i_1), u_{2,i}(i_2), ..., u_{L,i}(i_L)), i = 1, ..., n. Note that in doing so the average distortion constraint is satisfied. Finally, substituting B_l = R_l − (R_1 + ... + R_{l−1}) and combining (4) and (5), we get (1); and this completes the proof of Theorem 1.

Remark 2. As can be conveyed from the proof of Theorem 1, because every terminal l, l = 1, ..., L, uses a part (R_1 + ... + R_{l−1}) of its per-sample rate R_l to simply route the bit streams received from the previous terminals in the cascade, and the remaining B_l = R_l − (R_1 + ... + R_{l−1}) bits per sample to convey a description of its observed source S_l^n, the resulting scheme can be seen as one for the model shown in Figure 3, in which the terminals are connected through parallel links to the CP. Using this connection, the performance of the above WZR scheme can be improved by compressing the observations à la Berger-Tung [23]. Furthermore, note that in the described WZR scheme Terminal l does not exploit the available bit streams from the previous terminals in the cascade as encoder side information.

B. Improved Routing (IR)

In this section, we propose a scheme, to which we refer as Improved Routing (IR), which improves upon SR and WZR by compressing at each terminal its observed signal considering the compressed observations from the previous terminals as side information available both at the encoder and the decoder [22]. Thus, each terminal only compresses the innovative part of its observation with respect to the compressed signals from previous terminals (see Section IV-A).

Theorem 2. The RD region 𝓡_IR(D) that is achievable with the IR scheme is given by the union of rate tuples (R_1, ..., R_L) satisfying

R_l ≥ Σ_{i=1}^l I(S_i; U_i | U_1, ..., U_{i−1}), for l = 1, ..., L, (6)

for some joint pmf p(s_1, ..., s_L) Π_{l=1}^L p(u_l | s_l, u_1, ..., u_{l−1}) and function g, s.t. Ẑ = g(U_1, ..., U_L) and E[d(Z, Ẑ)] ≤ D.

Remark 3. The auxiliary random variables (U_1, ..., U_L) that are involved in (6) satisfy the following Markov chains:

U_l − (S_l, U_{𝓛_{l−1}}) − (S_{𝓛/l}, U_{𝓛_l^c}), for l = 1, ..., L, (7)

where U_{𝓛_{l−1}} = (U_1, ..., U_{l−1}) and U_{𝓛_l^c} = (U_{l+1}, ..., U_L).

Outline Proof: Fix ε > 0, and a joint pmf p(s_1, ..., s_L, u_1, ..., u_L) that factorizes as

p(s_1, ..., s_L, u_1, ..., u_L) = p(s_1, ..., s_L) Π_{l=1}^L p(u_l | s_l, u_1, ..., u_{l−1}), (8)

and a reconstruction function g(·) such that E[d(Z; g(U_1, ..., U_L))] ≤ D/(1 + ε). Also, fix non-negative R_1, ..., R_L such that R_l ≥ R_1 + ... + R_{l−1}, for l = 1, ..., L.

Codebook generation: Let B_l = R_l − (R_1 + ... + R_{l−1}) for l = 1, ..., L. Generate a codebook C_1 consisting of a collection of 2^{nB_1} codewords {u_1^n(i_1)}, indexed with i_1 = 1, ..., 2^{nB_1}, where codeword u_1^n(i_1) has its elements generated i.i.d. according to p(u_1). For each index i_1, generate a codebook C_2(i_1) consisting of a collection of 2^{nB_2} codewords {u_2^n(i_1, i_2)} indexed with i_2 = 1, ..., 2^{nB_2}, where codeword u_2^n(i_1, i_2) is generated independently and i.i.d. according to p(u_2 | u_1). Similarly, for each index tuple (i_1, ..., i_{l−1}), generate a codebook C_l(i_1, ..., i_{l−1}) of 2^{nB_l} codewords {u_l^n(i_1, ..., i_l)} indexed with i_l = 1, ..., 2^{nB_l}, where codeword u_l^n(i_1, ..., i_l) has its elements generated i.i.d. according to p(u_l | u_1, ..., u_{l−1}).

Encoding at Terminal 1: Terminal 1 finds an index i_1 such that u_1^n(i_1) ∈ C_1 is strongly ε-jointly typical with s_1^n, i.e., (u_1^n(i_1), s_1^n) ∈ T_{[U_1 S_1]}^{(n)}. Using standard arguments, this step can be seen to

have vanishing probability of error as long as n is large enough and

B_1 ≥ I(S_1; U_1). (9)

Then, it forwards m_1 = i_1 to Terminal 2.

Encoding at Terminal l: Upon reception of m_{l−1} with the indices m_{l−1} = (i_1, ..., i_{l−1}), Terminal l finds an index i_l such that (u_1^n(i_1), ..., u_l^n(i_1, ..., i_l)) are strongly ε-jointly typical with s_l^n, i.e., (u_1^n(i_1), ..., u_l^n(i_1, ..., i_l), s_l^n) ∈ T_{[U_1,...,U_l,S_l]}^{(n)}. Using standard arguments, this step can be seen to have vanishing probability of error as long as n is large enough and

B_l ≥ I(U_l; S_l | U_1, ..., U_{l−1}). (10)

Then, it forwards i_l and m_{l−1} to Terminal (l+1) as m_l = (i_l, m_{l−1}).

Reconstruction at end Terminal (L+1): Terminal (L+1) collects all received indices as m_L = (i_1, ..., i_L), and reconstructs the codewords (u_1^n(i_1), ..., u_L^n(i_1, ..., i_L)). Then, it reconstructs an estimate of Z^n sample-wise as Ẑ_i = g(u_{1,i}(i_1), u_{2,i}(i_1, i_2), ..., u_{L,i}(i_1, ..., i_L)), i = 1, ..., n. Note that in doing so, the average distortion constraint is satisfied. Finally, substituting B_l = R_l − (R_1 + ... + R_{l−1}) in (9) and (10), we get (6). This completes the proof of Theorem 2.

Remark 4. Note that the rate constraints in Theorem 1 and Theorem 2 are equal. However, 𝓡_WZR(D) ⊆ 𝓡_IR(D), since the set of feasible pmfs in IR is larger than that in WZR.

Remark 5. In the coding scheme of Theorem 2, the compression rate on the communication hop between terminal l and terminal (l+1), l = 1, ..., L, can be improved further (i.e., reduced) by taking into account the sequence S_{l+1}^n as decoder side information, through Wyner-Ziv binning. The resulting strategy, however, is not of routing type, unless every Wyner-Ziv code is restricted to account for the worst side information ahead in the cascade, i.e., binning at Terminal l accounts for the worst quality side information among the sequences {S_j^n, j = l+1, ..., L}. Also, in this case, since the end terminal (L+1), or CP, does not observe any side information, i.e., S_{L+1}^n = ∅, this strategy makes most sense if the Wyner-Ziv codes are chosen such that the last relay terminal

in the cascade, i.e., Terminal L, recovers an estimate of the desired function and then sends it using a standard rate-distortion code to the CP, in a manner that allows the latter to reconstruct the desired function to within the desired fidelity level.

Remark 6. In line with the previous remark, for the model of Figure 1 yet another natural coding strategy is one in which one decomposes the problem into L successive Wyner-Ziv type problems for function computation, one for each hop. Specifically, in this strategy one sees the communication between Terminal l and Terminal (l+1), l = 1, ..., L, as a Wyner-Ziv source coding problem with two-sided state information: state information S_l^n at the encoder and state information S_{l+1}^n at the decoder. This strategy, which is not of routing type, is developed in the next section.

C. In-Network Processing (IP)

In the routing schemes of Section III-A and Section III-B, the function of interest is computed at the destination from the compressed observations, i.e., the terminals have to share the fronthaul to send a compressed version of their observations to Terminal (L+1). We present a scheme, to which we refer as In-Network Processing (IP), in which, instead, each terminal computes a part of the function to be reconstructed at the decoder, so that the function of interest is computed along the cascade. To that end, each terminal decompresses the signal received from the previous terminal and jointly compresses it with its observation to generate an estimate of the part of the function of interest, which is forwarded to the next terminal (see Section IV-C). Correlation between the computed part of the function and the source at the next terminal, S_{l+1}^n, is exploited through Wyner-Ziv coding. Note that by decompressing and recompressing the observations at each terminal, additional distortion is introduced [1].

Theorem 3. The RD region 𝓡_IP(D) that is achievable with IP is given by the union of rate tuples (R_1, ..., R_L) satisfying

R_l ≥ I(S_l, U_{l−1}; U_l | S_{l+1}), l = 1, ..., L, (11)

for some joint pmf p(s_1, ..., s_L) Π_{l=1}^L p(u_l | s_l, u_{l−1}) and a function g, such that Ẑ = g(U_L) and E[d(Z, Ẑ)] ≤ D.

Remark 7. The auxiliary random variables (U_1, ..., U_L) that are involved in (11) satisfy the following Markov chains:

U_l − (S_l, U_{l−1}) − (S_{𝓛/l}, U_{𝓛/{l−1,l}}), for l = 1, ..., L, (12)

where S_{𝓛/l} = (S_1, ..., S_{l−1}, S_{l+1}, ..., S_L) and U_{𝓛/{l−1,l}} = (U_1, ..., U_{l−2}, U_{l+1}, ..., U_L).

Outline Proof: Fix ε > 0 and a joint pmf p(s_1, ..., s_L, u_1, ..., u_L) that factorizes as

p(s_1, ..., s_L, u_1, ..., u_L) = p(s_1, ..., s_L) Π_{l=1}^L p(u_l | s_l, u_{l−1}), (13)

and a function g(·) such that E[d(Z; g(U_L))] ≤ D/(1 + ε). Also fix non-negative R_1, ..., R_L.

Codebook generation: Let non-negative R̂_1, ..., R̂_L. Generate L codebooks {C_l}, l = 1, ..., L, with codebook C_l consisting of a collection of 2^{n(R_l + R̂_l)} independent codewords {u_l^n(i_l)}, indexed with i_l = 1, ..., 2^{n(R_l + R̂_l)}, where codeword u_l^n(i_l) has its elements generated randomly and independently i.i.d. according to p(u_l). Randomly and independently assign these codewords to 2^{nR_l} bins {B_{j_l}} indexed with j_l = 1, ..., 2^{nR_l}, each containing 2^{nR̂_l} codewords.

Encoding at Terminal 1: Terminal 1 finds an index i_1 such that u_1^n(i_1) ∈ C_1 is strongly ε-jointly typical with s_1^n, i.e., (u_1^n(i_1), s_1^n) ∈ T_{[U_1 S_1]}^{(n)}. Using standard arguments, it is easy to see that this can be accomplished with vanishing probability of error as long as n is large and

R_1 + R̂_1 ≥ I(S_1; U_1). (14)

Let j_1 be such that B_{j_1} ∋ u_1^n(i_1). Terminal 1 then forwards the index j_1 to Terminal 2.

Decompression and encoding at Terminal l: Upon reception of the bin index m_{l−1} = j_{l−1} from Terminal (l−1), Terminal l finds u_{l−1}^n(i_{l−1}) by looking in the bin B_{j_{l−1}} for the unique u_{l−1}^n(i_{l−1}) that is ε-jointly typical with s_l^n. Using standard arguments, it can be seen that this can be accomplished with vanishing probability of error as long as n is large enough and

R̂_{l−1} < I(U_{l−1}; S_l). (15)

Then, Terminal l finds an index i_l such that u_l^n(i_l) ∈ C_l is strongly ε-jointly typical with (s_l^n, u_{l−1}^n(i_{l−1})), i.e., (s_l^n, u_{l−1}^n(i_{l−1}), u_l^n(i_l)) ∈ T_{[S_l U_{l−1} U_l]}^{(n)}. Using standard arguments, it can be seen that this can be accomplished with vanishing probability of error as long as n is large and

R_l + R̂_l ≥ I(S_l, U_{l−1}; U_l). (16)

Let j_l be such that B_{j_l} ∋ u_l^n(i_l). Terminal l forwards the bin index to Terminal (l+1) as m_l = j_l.

Reconstruction at end Terminal (L+1): Terminal (L+1) collects the bin index m_L = j_L and reconstructs the codeword u_L^n(i_L) by looking in the bin B_{j_L}. Since Terminal (L+1) does not have available any side information sequence, from (15), successful recovery of the unique u_L^n(i_L) in the bin B_{j_L} requires R̂_L = 0. That is, each bin contains a single codeword and j_L = i_L. Then, Terminal (L+1) reconstructs an estimate of Z^n sample-wise as Ẑ_i = g(u_{L,i}(i_L)), i = 1, ..., n. In doing so, the average distortion constraint is satisfied. Finally, combining (14), (15) and (16), we get (11). This completes the proof of Theorem 3.

Remark 8. It is shown in [1] that for L = 2, in general, neither of the IR and IP schemes outperforms the other; and a scheme combining the two strategies is proposed there.

IV. CENTRALIZED AND DISTRIBUTED BEAMFORMING IN CHAINED MIMO SYSTEMS

In this section, we apply the cascade source coding model to study the achievable distortion in a Gaussian uplink MIMO system with a chained MIMO architecture (C-MIMO), in which M single-antenna users transmit over a Gaussian channel to L RRUs, as shown in Figure 2. The signal received at RRU l, l = 1, ..., L, equipped with K_l antennas, S_l ∈ ℂ^{K_l×1}, is given by

S_l = H_l X + N_l, (17)

where X = [X_1, ..., X_M]^T is the signal transmitted by the M users, and X_m ∈ ℂ, m = 1, ..., M, is the signal transmitted by user m. We assume that each user satisfies an average power constraint E[|X_m|²] ≤ snr, m = 1, ..., M, where snr > 0; H_l ∈ ℂ^{K_l×M} is the channel between the M users and RRU l; and N_l ∈ ℂ^{K_l}, N_l ∼ CN(0, I), is the additive ambient noise.
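Under the model (17), the covariance of the stacked observation vector used in the sequel is Σ_s = snr·H_𝓛 H_𝓛^H + I with H_𝓛 = [H_1^T, ..., H_L^T]^T. The following NumPy sketch (arbitrary toy dimensions, randomly drawn channels) builds this covariance and checks the properties the formula guarantees, namely that Σ_s is Hermitian and no smaller than the identity:

```python
import numpy as np

rng = np.random.default_rng(2)
M, L, K = 2, 3, 2     # users, RRUs, antennas per RRU (toy values)
snr = 10.0

# Stack the per-RRU channels H_l into H_cal = [H_1^T, ..., H_L^T]^T.
H = [rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))
     for _ in range(L)]
H_cal = np.vstack(H)                       # (L*K) x M

# Covariance of S = H_cal X + N with X ~ CN(0, snr I) and N ~ CN(0, I).
sigma_s = snr * H_cal @ H_cal.conj().T + np.eye(L * K)

assert np.allclose(sigma_s, sigma_s.conj().T)             # Hermitian
assert np.all(np.linalg.eigvalsh(sigma_s) >= 1.0 - 1e-9)  # >= I, hence PD
```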

The transmitted signal of the M users is assumed to be distributed as X ∼ CN(0, snr I), and we denote the observations at the L RRUs by S = [S_1^T, ..., S_L^T]^T. Thus, we have S ∼ CN(0, Σ_s), where Σ_s = snr H_L H_L^H + I and H_L ≜ [H_1^T, ..., H_L^T]^T.

In traditional receive-beamforming, a beamforming filter W ∈ C^{M×N_K} is applied at the decoder to the received signal S to estimate the channel input X with the linear function

Z ≜ WS. (18)

In C-MIMO, the decoder (the CP) is interested in computing the receive-beamforming signal (18) with minimum distortion, although S is not directly available at the CP but remotely observed at the terminals. Depending on the available CSI, the receive-beamforming computation may be better performed centrally at the CP or distributively across the RRUs:

Centralized Beamforming: If CSI is available only at the CP, and not at the RRUs, it seems reasonable that the beamforming operations are performed only centrally at the CP. In this case, RRU ℓ, ℓ = 1, ..., L, sends a compressed version Ŝ_ℓ of its output signal S_ℓ to the CP, which first collects the vector (Ŝ_1, ..., Ŝ_L), and then performs receive-beamforming on it.

Distributed Beamforming: If local CSI is available at the RRUs, or can be acquired, the receive-beamforming operations can be performed distributively along the cascade. Due to linearity, the joint beamforming operation (18) can be expressed as a function of the received sources as

Z = WS = W_1 S_1 + ... + W_L S_L, (19)

where W_ℓ ∈ C^{M×K_ℓ} corresponds to the block of K_ℓ columns of W such that [W_1, ..., W_L] = W. In this case, the receive-beamforming signal can be computed gradually in the cascade network, by letting the RRUs compute a part of the desired function; e.g., as proposed in Section IV-C, RRU ℓ, ℓ = 1, ..., L, computes an estimate of W_1 S_1 + ... + W_ℓ S_ℓ. The distortion between Z and the reconstruction Ẑ of the beamforming signal at the CP is measured with the sum-distortion

d(Z, Ẑ) ≜ Tr{(Z − Ẑ)(Z − Ẑ)^H}. (20)
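The block decomposition in (19), which underlies the distributed scheme of Section IV-C, is easy to verify numerically. A minimal sketch (dimensions and seed are illustrative assumptions) that accumulates the partial sums W_1 S_1 + ... + W_ℓ S_ℓ along the cascade and checks they recover Z = WS:

```python
import numpy as np

rng = np.random.default_rng(1)
M, L, K = 15, 4, 7                      # assumed toy dimensions
W = rng.standard_normal((M, L * K)) + 1j * rng.standard_normal((M, L * K))
S = rng.standard_normal((L * K, 1)) + 1j * rng.standard_normal((L * K, 1))

# Split W into per-RRU blocks of K columns and S into per-RRU sub-vectors
W_blocks = np.hsplit(W, L)
S_blocks = np.vsplit(S, L)

# Gradual computation along the cascade: Z_l = W_1 S_1 + ... + W_l S_l
Z_partial = np.zeros((M, 1), dtype=complex)
for Wl, Sl in zip(W_blocks, S_blocks):
    Z_partial = Z_partial + Wl @ Sl     # each RRU adds its own contribution
```

Each RRU only needs its local block W_ℓ (local CSI), which is exactly what makes the in-network computation of (19) possible.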

For a given fronthaul tuple (R_1, ..., R_L) in the RD region R(D), the minimum achievable average distortion D is characterized by the distortion-rate function² given by

D(R_1, ..., R_L) ≜ min{D ≥ 0 : (R_1, ..., R_L) ∈ R(D)}. (21)

Next, we study the distortion-rate function of the Gaussian C-MIMO model under centralized and distributed beamforming with the schemes proposed for the cascade source coding problem.

A. Centralized Beamforming with Improved Routing

In this section, we consider the distortion-rate function of the IR scheme of Section III-B applied to centralized beamforming. Each RRU forwards a compressed version of its observation to the CP, which estimates the receive-beamforming signal Z from the decompressed observations. While the optimal test channels are in general unknown, the next theorem gives the distortion-rate function of IR for centralized beamforming in the C-MIMO setup under jointly distributed Gaussian test channels.

Theorem 4. The distortion-rate function for the IR scheme under jointly Gaussian test channels is given by

D^IR(R_1, ..., R_L) = min_{K_1,...,K_L} Tr{Σ_z − Σ_{z,u^L} Σ_{u^L}^{-1} Σ_{z,u^L}^H} (22)

s.t. R_ℓ ≥ B_1 + ... + B_ℓ, ℓ = 1, ..., L, (23)

B_ℓ ≜ log |Σ_{s_ℓ|u^{ℓ-1}} + K_ℓ| / |K_ℓ|, (24)

where Σ_{s_ℓ|u^{ℓ-1}} = Σ_{s_ℓ} − Σ_{s_ℓ,u^{ℓ-1}} Σ_{u^{ℓ-1}}^{-1} Σ_{s_ℓ,u^{ℓ-1}}^H and Σ_{s_ℓ} = δ_ℓ Σ_s δ_ℓ^T, Σ_{s_ℓ,u^{ℓ-1}} = δ_ℓ Σ_s Ī_ℓ^T, Σ_{u^{ℓ-1}} = Ī_ℓ Σ_{u^L} Ī_ℓ^T, Σ_z = W Σ_s W^H, Σ_{z,u^L} = W Σ_s, Σ_{u^L} = Σ_s + diag[K^L].

Proof: We evaluate Theorem 2 by considering jointly Gaussian sources and test channels (S_1, ..., S_L, U_1, ..., U_L) satisfying p(s_1, ..., s_L) ∏_{ℓ=1}^L p(u_ℓ | s_ℓ, u_1, ..., u_{ℓ-1}) and the minimum

² This formulation is equivalent to the rate-distortion framework considered in Section III; here we consider the distortion-rate formulation for convenience.
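For fixed quantization-noise covariances K_1, ..., K_L, the objective in (22) is a closed-form MMSE expression and can be evaluated directly. A minimal sketch (toy dimensions, scaled-identity K_ℓ, and a pseudo-inverse beamformer are illustrative assumptions; no optimization over K_ℓ is performed):

```python
import numpy as np

rng = np.random.default_rng(2)
M, L, K, snr = 4, 3, 2, 10.0            # small assumed dimensions
N = L * K

H_L = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
Sigma_s = snr * H_L @ H_L.conj().T + np.eye(N)   # source covariance
W = np.linalg.pinv(H_L)                          # some beamformer W in C^{M x N}

# Feasible (not optimized) quantization-noise covariances: diag[K^L] = 0.5 I
diagK = 0.5 * np.eye(N)

# Theorem 4 quantities
Sigma_u = Sigma_s + diagK                        # Sigma_{u^L}
Sigma_z = W @ Sigma_s @ W.conj().T               # Sigma_z
Sigma_zu = W @ Sigma_s                           # Sigma_{z,u^L}

# Sum-distortion Tr{Sigma_z - Sigma_{z,u^L} Sigma_{u^L}^{-1} Sigma_{z,u^L}^H}
D = np.real(np.trace(Sigma_z - Sigma_zu @ np.linalg.solve(Sigma_u, Sigma_zu.conj().T)))
```

Since the subtracted term is an MMSE error-covariance correction, D always lies between 0 and Tr{Σ_z}; optimizing over the K_ℓ subject to (23)-(24) would give D^IR.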

mean square error (MMSE) estimator Ẑ = E[Z | U^L] as the reconstruction function g, where we define U^{ℓ-1} ≜ [U_1, ..., U_{ℓ-1}]. Note that MMSE reconstruction is optimal under (20), while considering jointly Gaussian test channels might be suboptimal in general.

First we derive a lower bound on the achievable distortion. We have

B_ℓ ≥ I(S_ℓ; U_ℓ | U^{ℓ-1}) (25)
= I(J_ℓ; U_ℓ | U^{ℓ-1}) (26)
= I(J_ℓ, U^{ℓ-1}; U_ℓ) (27)
≥ I(J_ℓ; U_ℓ), (28)

where in (26) we define the MMSE error J_ℓ ≜ S_ℓ − E[S_ℓ | U^{ℓ-1}], which is Gaussian distributed, J_ℓ ∼ CN(0, Σ_{s_ℓ|u^{ℓ-1}}); (27) follows due to the orthogonality principle [22], and due to the fact that for Gaussian random variables orthogonality implies independence of J_ℓ and U^{ℓ-1}. For the fixed test channels, let us choose A_ℓ with 0 ⪯ A_ℓ ⪯ Σ_{s_ℓ|u^{ℓ-1}}^{-1} such that for ℓ = 1, ..., L

cov(J_ℓ | U_ℓ) = Σ_{s_ℓ|u^{ℓ-1}} − Σ_{s_ℓ|u^{ℓ-1}} A_ℓ Σ_{s_ℓ|u^{ℓ-1}}. (29)

Note that such an A_ℓ always exists since 0 ⪯ cov(J_ℓ | U_ℓ) ⪯ Σ_{s_ℓ|u^{ℓ-1}}. Then, from (28), we have

B_ℓ ≥ h(J_ℓ) − h(J_ℓ | U_ℓ) (30)
≥ log |Σ_{s_ℓ|u^{ℓ-1}}| − log |cov(J_ℓ | U_ℓ)| (31)
= log |Σ_{s_ℓ|u^{ℓ-1}} + K_ℓ| − log |K_ℓ|, (32)

where (32) follows by considering the positive-semidefinite matrix K_ℓ ⪰ 0 such that

A_ℓ = Σ_{s_ℓ|u^{ℓ-1}}^{-1} (I − K_ℓ^{1/2} (Σ_{s_ℓ|u^{ℓ-1}} + K_ℓ)^{-1} K_ℓ^{1/2}). (33)

The distortion is lower bounded as

D ≥ Tr{E[(Z − E[Z | U^L])(Z − E[Z | U^L])^H]} = Tr{Σ_z − Σ_{z,u^L} Σ_{u^L}^{-1} Σ_{z,u^L}^H}, (34)

where (34) follows due to the linearity of the MMSE estimator for jointly Gaussian variables.
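The rate expression log|Σ_{s_ℓ|u^{ℓ-1}} + K_ℓ| − log|K_ℓ| appearing in (32) is just the Gaussian mutual information of the additive test channel U = S + Q. A minimal sketch (scalar toy values are assumed for the check) that evaluates it and confirms it matches the scalar formula log(1 + p/q):

```python
import numpy as np

def gaussian_rate(Sigma_cond, K):
    """B = log|Sigma_cond + K| - log|K| (in nats): the rate of the test
    channel U = S + Q with Q ~ CN(0, K), cf. (32) and (39)."""
    return np.linalg.slogdet(Sigma_cond + K)[1] - np.linalg.slogdet(K)[1]

# Scalar sanity check: S ~ CN(0, p), U = S + Q with Q ~ CN(0, q)
p, q = 4.0, 1.0
B = gaussian_rate(np.array([[p]]), np.array([[q]]))
```

In the scalar case I(S; U) = log((p + q)/q) = log(1 + p/q), so B should equal log 5 here.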

Fig. 4: Improved Routing scheme for C-MIMO.

Fig. 5: Wyner-Ziv Routing scheme for C-MIMO.

The lower bound given by (32) and (34) is achievable by letting U_ℓ = S_ℓ + Q_ℓ, with Q_ℓ ∼ CN(0, K_ℓ) and independent of all other variables, as follows:

B_ℓ = I(S_ℓ; U_ℓ | U^{ℓ-1}) (35)
= h(U_ℓ | U^{ℓ-1}) − h(U_ℓ | U^{ℓ-1}, S_ℓ) (36)
= h(S_ℓ − E[S_ℓ | U^{ℓ-1}] + Q_ℓ | U^{ℓ-1}) − h(Q_ℓ) (37)
= h(S_ℓ − E[S_ℓ | U^{ℓ-1}] + Q_ℓ) − h(Q_ℓ) (38)
= log |Σ_{s_ℓ|u^{ℓ-1}} + K_ℓ| − log |K_ℓ|, (39)

where (37) follows since U_ℓ = S_ℓ + Q_ℓ, and (38) is due to the orthogonality principle. Optimizing over the positive-semidefinite covariance matrices K_1, ..., K_L ⪰ 0 gives the desired minimum distortion D in Theorem 4.

The IR scheme in Section III-B requires joint compression at each RRU. However, for the Gaussian C-MIMO, it is shown next that the sum-distortion D^IR in Theorem 4 can be achieved by applying at each RRU separate decompression, innovation computation and compression, as follows (see Figure 6).

At RRU ℓ: Upon receiving the bits m_{ℓ-1}, decompress Ū_1, ..., Ū_{ℓ-1}. Compute the innovation J_ℓ ≜ S_ℓ − E[S_ℓ | Ū_1, ..., Ū_{ℓ-1}]. Compress J_ℓ at B_ℓ bits per sample, independently of Ū_1, ..., Ū_{ℓ-1}, using the test channels Ū_ℓ = J_ℓ + Q_ℓ, with the Q_ℓ ∼ CN(0, K_ℓ) independent of each other.

Proposition 1. For the Gaussian C-MIMO model, separate decompression, innovation computation and compression achieves the minimum distortion D^IR characterized by the distortion-rate function in Theorem 4.

Proof: We show that any distortion D achievable for a pmf p(s_1, ..., s_L) ∏_{ℓ=1}^L p(u_ℓ | s_ℓ, u_1, ..., u_{ℓ-1}) and the corresponding (B_1, ..., B_L) in Theorem 4 is also achievable with separate decompression, innovation computation and compression as detailed above. Note that J_ℓ corresponds to the MMSE error of estimating S_ℓ from (Ū_1, ..., Ū_{ℓ-1}), and is an i.i.d. Gaussian sequence distributed as J_ℓ ∼ CN(0, Σ_{j,ℓ}). From standard arguments, compressing J_ℓ at B_ℓ bits requires

B_ℓ ≥ I(J_ℓ; Ū_ℓ) (40)
= h(Ū_ℓ) − h(Ū_ℓ | J_ℓ) (41)
= log |Σ_{s_ℓ|u^{ℓ-1}} + K_ℓ| − log |K_ℓ|, (42)

where (42) follows since Σ_{j,ℓ} = Σ_{s_ℓ|u^{ℓ-1}}, which holds because RRU ℓ can compute U_{ℓ'} = Ū_{ℓ'} + E[S_{ℓ'} | U_1, ..., U_{ℓ'-1}] for ℓ' = 1, ..., ℓ, which is distributed as the test channel U_{ℓ'} = S_{ℓ'} + Q_{ℓ'}, and thus E[S_ℓ | U^{ℓ-1}] = E[S_ℓ | Ū^{ℓ-1}]. The distortion between Z and its estimate from Ū^L satisfies

D ≥ Tr{E[(Z − E[Z | Ū^L])(Z − E[Z | Ū^L])^H]} (43)
= Tr{E[(Z − E[Z | U^L])(Z − E[Z | U^L])^H]} (44)
= Tr{Σ_z − Σ_{z,u^L} Σ_{u^L}^{-1} Σ_{z,u^L}^H}. (45)

Thus, any achievable distortion D for given p(s_1, ..., s_L) ∏_{ℓ=1}^L p(u_ℓ | s_ℓ, u_1, ..., u_{ℓ-1}) and fixed (B_1, ..., B_L) in Theorem 4 is achievable by separate decompression, innovation computation and compression.

Determining the optimal covariance matrices (K_1, ..., K_L) achieving D^IR(R_1, ..., R_L) in Theorem 4 requires a joint optimization, which is generally not simple. Next, we propose a method to successively obtain a feasible solution (K_1, ..., K_L) and the corresponding minimum distortion D^IR_S(R_1, ..., R_L) for given (R_1, ..., R_L):

1) For a given fronthaul tuple (R_1, ..., R_L), fix non-negative B_1, ..., B_L satisfying R_ℓ ≥ B_1 + ... + B_ℓ for ℓ = 1, ..., L.
2) For such (B_1, ..., B_L), sequentially find K_ℓ from RRU 1 to RRU L as the matrix minimizing the distortion between the innovation J_ℓ and its reconstruction, as follows. At RRU ℓ, for given K_1, ..., K_{ℓ-1} and B_ℓ, K_ℓ is found as the covariance matrix minimizing

D_ℓ(B_ℓ) ≜ min_{K_ℓ} Tr{E[(J_ℓ − E[J_ℓ | Ū_ℓ])(J_ℓ − E[J_ℓ | Ū_ℓ])^H]} (46)
s.t. B_ℓ ≥ log |Σ_{s_ℓ|u^{ℓ-1}} + K_ℓ| / |K_ℓ|.

3) Compute the achievable distortion D^S_IF(B_1, ..., B_L) by evaluating Tr{Σ_z − Σ_{z,u^L} Σ_{u^L}^{-1} Σ_{z,u^L}^H} as in Theorem 2 with the chosen covariance matrices (K_1, ..., K_L).
4) Compute D^IR_S(R_1, ..., R_L) as the minimum of D^S_IF(B_1, ..., B_L) over all (B_1, ..., B_L) satisfying the fronthaul constraints R_ℓ ≥ B_1 + ... + B_ℓ for ℓ = 1, ..., L.

Note that (46) corresponds to the distortion-rate problem of compressing a Gaussian vector source J_ℓ ∼ CN(0, Σ_{j,ℓ}) at B_ℓ bits. The solution of the distortion-rate problem in (46) is standard and is given next for completeness.

Proposition 2. Given K_1, ..., K_{ℓ-1}, let Σ_{j,ℓ} = Σ_{s_ℓ|u^{ℓ-1}} = V_ℓ Λ_J V_ℓ^H, where V_ℓ^H V_ℓ = I and Λ_J ≜ diag[λ_1^J, ..., λ_{K_ℓ}^J]. The optimal distortion (46) is D_ℓ = Σ_{k=1}^{K_ℓ} min{λ, λ_k^J}, where λ > 0 is the solution to

B_ℓ = Σ_{k=1}^{K_ℓ} log⁺(λ_k^J / λ), (47)

and it is achieved with K_ℓ = V_ℓ Λ V_ℓ^H, where Λ = diag[λ_1^Q, ..., λ_{K_ℓ}^Q] and λ_k^Q = min{λ, λ_k^J}/(λ_k^J − min{λ, λ_k^J}).

Outline Proof: The minimization of the RD problem in (46) is standard, e.g., [24], and is known to be achieved by decorrelating the vector source J_ℓ into K_ℓ uncorrelated components as J̃_ℓ = V_ℓ^H J_ℓ ∼ CN(0, Λ_J). Then, the available B_ℓ bits are distributed over the parallel source

components J̃_ℓ by solving the reverse water-filling problem

D_ℓ = min_{d_1,...,d_{K_ℓ} ≥ 0} Σ_{k=1}^{K_ℓ} d_k   s.t.   B_ℓ = Σ_{k=1}^{K_ℓ} log⁺(λ_k^J / d_k).

The solution to this problem is given by d_k = min{λ_k^J, λ}, where λ > 0 satisfies (47). The optimality of K_ℓ follows since D_ℓ is achieved with K_ℓ as stated in Proposition 2 [24], [25].

B. Centralized Beamforming with Successive Wyner-Ziv

In this section, we consider the distortion-rate function of the WZR scheme in Section III-A for centralized beamforming. Similarly to IR, each RRU forwards a compressed version of its observation to the CP, which estimates the receive-beamforming signal Z from the decompressed observations. The next theorem shows that WZR achieves the same distortion-rate function as the IR scheme under jointly Gaussian test channels.

Theorem 5. The distortion-rate function of the WZR scheme D^WZR(R_1, ..., R_L) with jointly Gaussian test channels is the same as the distortion-rate function of the IR scheme with Gaussian test channels in Theorem 4, i.e., D^WZR(R_1, ..., R_L) = D^IR(R_1, ..., R_L).

Outline Proof: Since R^WZR(D) ⊆ R^IR(D), we only need to show that any distortion D achievable with IR in Theorem 4 is also achievable with WZR. For fixed (B_1, ..., B_L) and p(s_1, ..., s_L) ∏_{ℓ=1}^L p(u_ℓ | s_ℓ, u_1, ..., u_{ℓ-1}) with IR in Theorem 4, the minimum distortion is achieved by considering a test channel U_ℓ = S_ℓ + Q_ℓ. Since this test channel is also in the class of test channels ∏_{ℓ=1}^L p(u_ℓ | s_ℓ) of WZR, it follows that any achievable distortion D for fixed (B_1, ..., B_L) and p(s_1, ..., s_L) ∏_{ℓ=1}^L p(u_ℓ | s_ℓ, u_1, ..., u_{ℓ-1}) in Theorem 4 is achievable with WZR.

C. In-Network Processing for Distributed Beamforming

In this section, we study the distortion-rate function of the IP scheme in Section III-C for distributed beamforming. At each RRU, the received signal from the previous terminal is jointly compressed with the observation and forwarded to the next RRU. While the optimal joint compression per RRU along the cascade remains an open problem, even for independent

observations [22], we propose to gradually compute the desired function Z by reconstructing parts of Z at each RRU. In particular, the compression at RRU ℓ − 1 is designed such that RRU ℓ reconstructs, from S_ℓ and the received bits, an estimate of the partial function

Z_ℓ ≜ W_1 S_1 + ... + W_ℓ S_ℓ. (48)

The design of the compression is done successively. Assuming U^{ℓ-1} ≜ (U_1, ..., U_{ℓ-1}) are fixed, at RRU ℓ, U_ℓ is obtained as the solution to the following distortion-rate problem:

D_ℓ(R_ℓ) = min_{p(u_ℓ|s_ℓ,u_{ℓ-1})} Tr{E[(Z_ℓ − Ẑ_ℓ)(Z_ℓ − Ẑ_ℓ)^H]} (49)
s.t. R_ℓ ≥ I(S_ℓ, U_{ℓ-1}; U_ℓ | S_{ℓ+1}). (50)

Problem (49)-(50) corresponds to the distortion-rate function of the remote source coding problem of lossy reconstruction of the function Z_ℓ as Ẑ_ℓ from the remote observation (S_ℓ, U_{ℓ-1}) when side information S_{ℓ+1} is available at the decoder. Proposition 3 characterizes the optimal test channel U_ℓ given U^{ℓ-1} and shows that it is Gaussian distributed as

U_ℓ = P_ℓ [U_{ℓ-1}; W_ℓ S_ℓ]^H + Q_ℓ, (51)

where P_ℓ = [P_ℓ^U, P_ℓ^S] and Q_ℓ ∼ CN(0, K_ℓ). Let Π_{U,ℓ'} ≜ P_ℓ^U ··· P_{ℓ'}^U and Q̄ ≜ [Q_1^T, ..., Q_L^T]^T, and

P̄_ℓ^S ≜ [Π_{U,2} P_1^S W_1, ..., Π_{U,ℓ} P_{ℓ-1}^S W_{ℓ-1}, P_ℓ^S W_ℓ, 0, ..., 0],
P̄_ℓ^Q ≜ [Π_{U,2} P_1^S, Π_{U,3} P_2^S, ..., P_ℓ^U P_{ℓ-1}^S, 0, 0, ..., 0]. (52)

Then, we can write U_ℓ in (51) as U_ℓ = R_ℓ + Q_ℓ, where

R_ℓ ≜ [P_ℓ^U, P_ℓ^S][U_{ℓ-1}; W_ℓ S_ℓ]^H = P̄_ℓ^S S + P̄_ℓ^Q Q̄. (53)

The next proposition characterizes the optimal test channel U_ℓ, with P_ℓ and K_ℓ, for given test channels U^{ℓ-1} with their corresponding P_1, ..., P_{ℓ-1} and K_1, ..., K_{ℓ-1}.

Proposition 3. Let F_ℓ ≜ [U_{ℓ-1}; S_ℓ] and T_{z,f} Σ_{f_ℓ|s_{ℓ+1}} T_{z,f}^H = V_ℓ Λ_D V_ℓ^H, where Λ_D = diag[λ_1^D, ..., λ_{K_ℓ}^D]

and T_{z,f} = Σ_{z,f} Σ_f^{-1} and Σ_{z,f} = W̄_ℓ Σ_s P̄_ℓ^{S,H}, where W̄_ℓ ≜ [W_1, ..., W_ℓ, 0, ..., 0]. The minimum distortion (49) is D_ℓ(R_ℓ) = Σ_{k=1}^{K_ℓ} min{λ, λ_k^D} + Tr{Σ_{z|f,s_{ℓ+1}}}, where λ > 0 satisfies

R_ℓ = Σ_{k=1}^{K_ℓ} log⁺(λ_k^D / λ), (54)

where Σ_{z|f,s_{ℓ+1}} = Σ_z − Σ_{z,(f,s_{ℓ+1})} Σ_{(f,s_{ℓ+1})}^{-1} Σ_{z,(f,s_{ℓ+1})}^H, Σ_{z,(f,s_{ℓ+1})} = W̄_ℓ Σ_s [P̄_ℓ^S, δ_{ℓ+1}]^H, and Σ_{(f,s_{ℓ+1})} = [Σ_f, P̄_ℓ^S Σ_s δ_{ℓ+1}^H; δ_{ℓ+1} Σ_s P̄_ℓ^{S,T}, Σ_{s_{ℓ+1}}]. In addition, the minimum distortion D_ℓ(R_ℓ) is achieved with K_ℓ = V_ℓ Λ_Q V_ℓ^H and P_ℓ = T_{z,f}, where Λ_Q = diag[λ_1^Q, ..., λ_{K_ℓ}^Q] is a diagonal matrix with k-th diagonal element λ_k^Q = min{λ, λ_k^D}/(λ_k^D − min{λ, λ_k^D}).

Outline Proof: The proof is similar to that of the remote Wyner-Ziv source coding problem for source reconstruction in [25]; here we consider lossy function reconstruction. For simplicity, we drop the RRU index ℓ in this proof and define F ≜ [U_{ℓ-1}, W_ℓ S_ℓ], Y ≜ S_{ℓ+1}, Z̃ ≜ Z_ℓ and U ≜ U_ℓ. First, we obtain a lower bound on the achievable distortion. Let us define the MMSE filters T_{f,y} = Σ_{f,y} Σ_y^{-1} and

[T_{z,f}, T_{z,y}] = [Σ_{z,f}, Σ_{z,y}] [Σ_f, Σ_{f,y}; Σ_{f,y}^H, Σ_y]^{-1}. (55)

From MMSE estimation [22], we have

F = T_{f,y} Y + N_1, (56)
Z̃ = T_{z,f} F + T_{z,y} Y + N_2 = (T_{z,f} T_{f,y} + T_{z,y}) Y + T_{z,f} N_1 + N_2, (57)

where N_1 and N_2 are the MMSE errors: zero-mean jointly Gaussian random vectors, independent of each other, with N_1 independent of Y, N_2 independent of (F, Y), and covariance matrices Σ_{N_1} ≜ Σ_{f|y} and Σ_{N_2} ≜ Σ_{z̃|f,y}. We have

R ≥ I(U_{ℓ-1}, S; U | Y) (58)
≥ I(V^H T_{z,f} F; U | Y) (59)
= h(V^H T_{z,f} F | Y) − h(V^H T_{z,f} F | Y, U) (60)

23 = h(v H T z,f N 1 ) h(v H T z,f N 1 Y, U) (61) = = K k=1 h(n k) h(n k Y, U, N,k 1 1,1 ) (62) K h(n k) h(n k Y, U) (63) k=1 K I(N k; Y, U) (64) k=1 K I(N k; ˆN k ) (65) k=1 where (59) foows due to the data processing inequaity; (61) is due to (56) and the orthogonaity principe of the MMSE estimator; (62) is due to the definition of N ; (63) foows since conditioning reduces entropy; and (65) is due to the data processing inequaity and since ˆN k is a function of Y, U. On the other hand, we have D E[( Z Ẑ)( Z Ẑ) H ] (66) E[( Z E[ Z U, Y])( Z E[ Z U, Y]) H ] (67) = Tr{E[(T z,f N 1 + N 2 ˆN 1 )(T z,f N 1 + N 2 ˆN 1 ) H ]} (68) = Tr{E[(T z,f N 1 ˆN 1 )(T z,f N 1 ˆN 1 ) H ]} + Tr{Σ z f,y } (69) where (68) foows from (57) and where we have defined ˆN 1 E[ Z U] (T z,f T f,y + T z,y )Y; (69) foows from the independence of N 2 from N 1, Y and F. Consider the eigenvaue decomposition of T z,f Σ N1 T H z,f = T z,f Σ f y T H z,f = VΛ D V H, (70) and define N = V H T z,f N 1. Note that N has independent components of variance Λ D.

Therefore, from (69) we have

D ≥ Tr{E[(N̄ − N̂)(N̄ − N̂)^H]} + Tr{Σ_{z̃|f,y}} (71)
= Σ_{k=1}^K E[(N̄_k − N̂_k)^2] + Tr{Σ_{z̃|f,y}}, (72)

where (71) follows due to the orthonormality of V, and we define N̂ ≜ V^H N̂_1. It follows from (65) and (72) that D is lower bounded by the sum-distortion D̃ of compressing N̄_k ∼ CN(0, λ_k^D), k = 1, ..., K, given as a standard reverse water-filling problem with the modified distortion D̃ ≜ D − Tr{Σ_{z̃|f,y}}, so that N̄_k is reconstructed with distortion d_k ≜ E[(N̄_k − N̂_k)^2]:

D̃ = min_{d_1,...,d_K > 0} Σ_{k=1}^K d_k   s.t.   R(D) = Σ_{k=1}^K log(λ_k^D / d_k). (73)

Note that if D < Tr{Σ_{z̃|f,y}}, then R(D) = ∞, and if D > Tr{Σ_{z̃|y}}, then R(D) = 0. The minimum is found with d_k = min{λ, λ_k^D}, for λ > 0 satisfying (54) [24].

The achievability of the derived lower bound follows by considering the set of tuples in R^IP(D) in Theorem 3 for U_ℓ satisfying the additional Markov chain U_ℓ − R_ℓ − (U_{ℓ-1}, S_ℓ) − S_{ℓ+1}, which is included in R^IP(D), with U_ℓ = R_ℓ + Q_ℓ, R_ℓ = P_ℓ [U_{ℓ-1}, W_ℓ S_ℓ]^H and Q_ℓ ∼ CN(0, K_ℓ), where K_ℓ = V Λ_Q V^H and P_ℓ = T_{z,f}.

The distortion-rate function of the proposed IP scheme in the Gaussian C-MIMO model is given next.

Theorem 6. Given U^L with K_1, ..., K_L and P_1, ..., P_L successively obtained as in Proposition 3, the distortion-rate function of the proposed IP scheme is given by

D^IP(R_1, ..., R_L) = Tr{Σ_z − Σ_{z,u^L} Σ_{u^L}^{-1} Σ_{z,u^L}^H}, (74)

where Σ_{z,u^L} = W Σ_s P̄_L^{S,H}; Σ_{f_L} = P̄_L^S Σ_s P̄_L^{S,H} + P̄_L^Q diag[K^L] P̄_L^{Q,H}; Σ_{u^L} = Σ_{f_L} + K_L; and Σ_{f,s_{ℓ+1}} = P̄_ℓ^S Σ_s δ_{ℓ+1}^H.

Proof: Achievability follows from Theorem 3 with U^L obtained as in Proposition 3.

The IP scheme in Section III-C requires joint compression at each RRU. However, for the Gaussian C-MIMO, it is shown next that the distortion-rate function D^IP(R_1, ..., R_L) in Theorem

6 can be achieved by applying at each RRU separate decompression and partial function estimation followed by compression, as shown in Figure 6.

Fig. 6: In-network Processing scheme for C-MIMO.

At RRU ℓ: Upon receiving m_{ℓ-1}, decompress U_{ℓ-1}. Apply local beamforming as S̄_ℓ = W_ℓ S_ℓ. Linearly combine U_{ℓ-1} and S̄_ℓ to compute an estimate R_ℓ = P_ℓ [U_{ℓ-1}, S̄_ℓ]^H of the partial function up to Terminal ℓ:

Z_ℓ ≜ W_1 S_1 + ... + W_ℓ S_ℓ. (75)

Forward a compressed version of R_ℓ to Terminal (ℓ + 1) using Wyner-Ziv compression, considering S_{ℓ+1} as side information, with the test channel U_ℓ = R_ℓ + Q_ℓ, Q_ℓ ∼ CN(0, K_ℓ). Terminal (L + 1) reconstructs Z using an MMSE estimator as Ẑ = E[Z | U^L].

Proposition 4. For the C-MIMO model, separate decompression, partial function estimation and Wyner-Ziv compression achieves the distortion-rate function D^IP(R_1, ..., R_L) in Theorem 6.

Proof: The proof follows by showing that at any RRU ℓ, the minimum distortion D_ℓ(R_ℓ) and the test channel U_ℓ in Proposition 3 can also be obtained with separate decompression, partial function estimation and compression. RRU ℓ decompresses U_{ℓ-1} and computes S̄_ℓ = W_ℓ S_ℓ and R_ℓ = P_ℓ [U_{ℓ-1}, S̄_ℓ]^H. From standard arguments, it follows that compressing à la Wyner-Ziv with S_{ℓ+1} as decoder side information requires

R_ℓ ≥ I(R_ℓ; U_ℓ | S_{ℓ+1}) (76)

= I(V_ℓ^H R_ℓ; U_ℓ | S_{ℓ+1}) (77)
= I(V_ℓ^H T_{z,f} F_ℓ; U_ℓ | S_{ℓ+1}), (78)

where (77) follows since V_ℓ is orthonormal. Following from (60), and by noting that the distortion achievable by estimating Z_ℓ from Ū_ℓ and S_{ℓ+1} corresponds to D_ℓ in Proposition 3, it follows that any achievable distortion D is also achievable with separate decompression, partial function estimation and compression.

Remark 9. Equation (53) highlights that, due to the successive decompression and recompression performed at each RRU, the quantization noises Q_ℓ propagate throughout the cascade. The linear combination of the locally beamformed signal S̄_ℓ and the decompressed signal U_{ℓ-1} can be seen as a noisy observation of the sources S, through an additive channel with channel coefficients P̄_ℓ^S and correlated noise P̄_ℓ^Q Q̄. This noisy signal is used as an estimate of the partial beamformed signal (48) to be reconstructed at the next RRU.

V. A LOWER BOUND

In this section, we obtain an outer bound on the RD region R(D). We use the following notation from [26]. Define the minimum average distortion for X given Q as E(X|Q) ≜ min_{f: Q → X̂} E[d(X, f(Q))], and the Wyner-Ziv RD function for X when side information Y is available at the decoder as

R^WZ_{X|Y}(D) ≜ min_{p(u|x): E(X|U,Y) ≤ D} I(X; U | Y). (79)

An outer bound can be obtained using the Wyner-Ziv rate-distortion function in (79).

Theorem 7. The RD region R(D) is contained in the region R^o(D), given by the union of tuples (R_1, ..., R_L) satisfying

R_ℓ ≥ R^WZ_{Z|S_{L_ℓ^c}}(D), ℓ = 1, ..., L. (80)

Outline Proof: The outer bound is obtained from the RD region of L network cuts, such that for the ℓ-th cut, (S_{ℓ+1}^n, ..., S_L^n) acts as side information at the decoder. See Appendix I.
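The per-cut bounds obtained from Theorem 7, like the rate allocations in (47) and (54), reduce to reverse water-filling over the eigenvalues of the relevant covariance. A minimal sketch (the helper name, toy eigenvalues and bisection search are assumptions for illustration): find the water level λ such that Σ_k log⁺(λ_k/λ) equals the rate budget, and set d_k = min{λ, λ_k}:

```python
import numpy as np

def reverse_waterfill(eigs, R):
    """Spend R nats over parallel Gaussian components with variances `eigs`:
    bisect on the water level lam so that sum_k log+(eig_k / lam) = R,
    then the per-component distortions are d_k = min(lam, eig_k)."""
    rate = lambda lam: sum(max(np.log(e / lam), 0.0) for e in eigs)
    lo, hi = 1e-12, max(eigs)           # rate(lam) decreases from +inf to 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if rate(mid) > R else (lo, mid)
    lam = 0.5 * (lo + hi)
    d = [min(lam, e) for e in eigs]
    return lam, d, sum(d)

lam, d, D = reverse_waterfill([4.0, 2.0, 0.5], R=1.0)
```

Components whose variance lies below the water level are not coded at all (d_k = λ_k), which is why the sum in (47) uses log⁺.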

Fig. 7: Average sum-distortion (10 log₁₀ D) for M = 15, L = 4, K = 7 vs. average bits per user B = 0, ..., 15, for balanced FH capacities R_ℓ = KB. Curves: Lower Bound, Standard Routing, Improved Routing, In-Network Processing.

Fig. 8: Average sum-distortion (10 log₁₀ D) for M = 15, L = 4, K = 7 vs. average bits per user B = 0, ..., 15, for increasing FH capacities R_ℓ = ℓKB. Curves: Lower Bound, Standard Routing, Improved Routing, In-Network Processing.

In the Gaussian C-MIMO model, Theorem 7 can be used to explicitly write a lower bound on the achievable distortion for a given fronthaul tuple (R_1, ..., R_L), as given next.

Proposition 5. Given the fronthaul tuple (R_1, ..., R_L), the achievable distortion in a C-MIMO system is lower bounded by D^LB(R_1, ..., R_L) = max_{ℓ=1,...,L} D_ℓ, where

D_ℓ = min_{d_{ℓ,1},...,d_{ℓ,K} > 0} Σ_{k=1}^K d_{ℓ,k}   s.t.   R_ℓ = Σ_{k=1}^K log⁺(λ_{ℓ,k}^D / d_{ℓ,k}), (81)

with d_{ℓ,k} = min{λ, λ_{ℓ,k}^D} for λ > 0, where λ_{ℓ,k}^D, k = 1, ..., K, are the eigenvalues of Σ_{z|s_{L_ℓ^c}} = W Σ_s Ī_ℓ = V_ℓ Λ_D V_ℓ^H, with Λ_D = diag[λ_{ℓ,1}^D, ..., λ_{ℓ,K}^D].

Outline Proof: The proof follows by computing explicitly the Wyner-Ziv RD function [25] for Gaussian vector sources for each network cut.

VI. NUMERICAL RESULTS

In this section, we provide numerical examples to illustrate the average sum-distortion obtained using IR and IP as detailed in Section IV. We consider several C-MIMO examples, with M users and L RRUs, each equipped with K antennas, under different fronthaul capacities. The CP wants

to reconstruct the receive-beamforming signal using the zero-forcing weights given by

W = (H_L^H H_L)^{-1} H_L^H. (82)

The channel coefficients are distributed as h_{ℓ,k} ∼ CN(0, 1). We also consider the SR scheme of [4]. The schemes are compared among themselves, and to the lower bound in Theorem 7. Note that WZR and IR achieve the same distortion-rate function, as shown in Theorem 5, so WZR is omitted.

Figure 7 depicts the sum-distortion in a C-MIMO network with M = 15 users and L = 4 RRUs, each equipped with K = 7 antennas, for equal fronthaul capacity per link R_1 = ... = R_L = KB, as a function of the average number of bits per user B. As can be seen from the figure, the IP scheme based on distributed beamforming outperforms the centralized beamforming schemes and performs close to the lower bound. For centralized beamforming, the IR scheme performs significantly better than SR, as it reduces the required fronthaul by compressing only the innovation at each RRU.

Figure 8 shows the sum-distortion in a C-MIMO network with M = 15 users and L = 4 RRUs, each equipped with K = 7 antennas, with increasing fronthaul capacity per link R_ℓ = ℓKB, ℓ = 1, ..., L, as a function of the average number of bits per user B. In this case, the IP scheme using distributed beamforming also achieves the lowest sum-distortion among the proposed schemes.

REFERENCES

[1] P. Cuff, H. I. Su, and A. El Gamal, "Cascade multiterminal source coding," in Proc. IEEE Int. Symposium on Information Theory (ISIT), Jun. 2009, pp. 1199-1203.
[2] H. H. Permuter and T. Weissman, "Cascade and triangular source coding with side information at the first two nodes," IEEE Trans. Inf. Theory, vol. 58, no. 6, pp. 3339-3349, Jun. 2012.
[3] M. Sefidgaran and A. Tchamkerten, "Distributed function computation over a rooted directed tree," IEEE Trans. Inf. Theory, vol. PP, no. 99, pp. 1-1, Feb. 2016.
[4] S. H. Park, O. Simeone, O. Sahin, and S. Shamai, "Multihop backhaul compression for the uplink of cloud radio access networks," IEEE Trans. on Vehic.
Tech., vol. PP, no. 99, pp. 1-1, May 2015.
[5] Y. Yang, P. Grover, and S. Kar, "Coding for lossy function computation: Analyzing sequential function computation with distortion accumulation," in IEEE Int. Symposium on Information Theory (ISIT), Jul. 2016, pp. 140-144.

[6] A. Puglielli, N. Narevsky, P. Lu, T. Courtade, G. Wright, B. Nikolic, and E. Alon, "A scalable massive MIMO array architecture based on common modules," in 2015 IEEE Int. Conference on Communication Workshop (ICCW), Jun. 2015, pp. 1310-1315.
[7] O. Somekh, B. Zaidel, and S. Shamai, "Sum rate characterization of joint multiple cell-site processing," IEEE Trans. Inf. Theory, vol. 53, no. 12, pp. 4473-4497, Dec. 2007.
[8] A. Del Coso and S. Simoens, "Distributed compression for MIMO coordinated networks with a backhaul constraint," IEEE Trans. Wireless Comm., vol. 8, no. 9, pp. 4698-4709, Sep. 2009.
[9] A. Sanderovich, O. Somekh, H. Poor, and S. Shamai, "Uplink macro diversity of limited backhaul cellular network," IEEE Trans. Inf. Theory, vol. 55, no. 8, pp. 3457-3478, Aug. 2009.
[10] S.-H. Park, O. Simeone, O. Sahin, and S. Shamai, "Robust and efficient distributed compression for cloud radio access networks," IEEE Trans. Vehicular Technology, vol. 62, no. 2, pp. 692-703, Feb. 2013.
[11] ——, "Joint decompression and decoding for cloud radio access networks," IEEE Signal Processing Letters, vol. 20, no. 5, pp. 503-506, May 2013.
[12] Y. Zhou and W. Yu, "Optimized backhaul compression for uplink cloud radio access network," IEEE Journal on Sel. Areas in Comm., vol. 32, no. 6, pp. 1295-1307, Jun. 2014.
[13] B. Nazer, A. Sanderovich, M. Gastpar, and S. Shamai, "Structured superposition for backhaul constrained cellular uplink," in Proc. IEEE Int. Symposium on Information Theory (ISIT), Seoul, Korea, Jun. 2009.
[14] S.-N. Hong and G. Caire, "Compute-and-forward strategies for cooperative distributed antenna systems," IEEE Trans. Inf. Theory, vol. 59, no. 9, pp. 5227-5243, Sep. 2013.
[15] I. Estella-Aguerri and A. Zaidi, "Lossy compression for compute-and-forward in limited backhaul uplink multicell processing," IEEE Trans. Communications, vol. PP, no. 99, pp. 1-1, Sep. 2016.
[16] I. Estella and A. Zaidi, "Partial compute-compress-and-forward for limited backhaul uplink multicell processing," in Proc. 53rd Annual Allerton Conf.
on Comm., Control, and Computing, Monticello, IL, Sep. 2015.
[17] C. Shepard, H. Yu, N. Anand, E. Li, T. Marzetta, R. Yang, and L. Zhong, "Argos: Practical many-antenna base stations," in Proceedings of the 18th Annual Int. Conference on Mobile Computing and Networking, ser. MobiCom '12, 2012, pp. 53-64.
[18] J. Vieira, S. Malkowsky, K. Nieman, Z. Miers, N. Kundargi, L. Liu, I. Wong, V. Öwall, O. Edfors, and F. Tufvesson, "A flexible 100-antenna testbed for massive MIMO," in 2014 IEEE Globecom Workshops (GC Wkshps), Dec. 2014, pp. 287-293.
[19] H. V. Balan, M. Segura, S. Deora, A. Michaloliakos, R. Rogalin, K. Psounis, and G. Caire, "USC SDR, an easy-to-program, high data rate, real time software radio platform," in Proceedings of the Second Workshop on Software Radio Implementation Forum, ser. SRIF '13, 2013, pp. 25-30.
[20] T. Berger, Multi-terminal Source Coding. Chapter in The Information Theory Approach to Communications (G. Longo, ed.), Springer-Verlag, 1978.
[21] A. Wyner, "The rate-distortion function for source coding with side information at the decoder," Information and Control, vol. 38, no. 1, pp. 60-80, Jan. 1978.
[22] A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge University Press, 2011.