Network Coding on Directed Acyclic Graphs

John MacLaren Walsh, Ph.D.

Multiterminal Information Theory, Spring Quarter

1 Reference

These notes are directly derived from a chapter of R. W. Yeung's Information Theory and Network Coding, Springer, 2008.

2 Motivating Example: Butterfly Network

Figure 1: The butterfly example (sources $s_1, s_2$; bottleneck link $b$; sinks $t_1, t_2$).

We presented the butterfly example as a case where we could increase the capacity region of a network by incorporating coding between flows at intermediate network nodes. Summing the messages on link $b$ allows both $s_1$ and $s_2$ to be multicast to both sinks $t_1, t_2$.

3 Definition of a Network Code & Coding Capacity Region

In these lectures we model a network as a directed acyclic graph $G = (V, E)$. There is a finite set of vertices or nodes $V$, and a collection of edges $e \in E$ which are ordered pairs of vertices $e = (v_1, v_2)$, $v_1, v_2 \in V$. We call vertex $v_1$ the tail of edge $e = (v_1, v_2)$ and the vertex $v_2$ the head of edge $e$. A sequence of edges $e_1, e_2, \ldots, e_k$ such that the head of edge $e_n$ is the tail of the next edge $e_{n+1}$ is called a directed path in the graph $G$. A directed path with the property that the tail of $e_1$ is the head of edge $e_k$ is called a cycle, and the graph is called acyclic if it has no cycles. There is also a set of source nodes $S \subseteq V$ and sink nodes $T \subseteq V$ with $S \cap T = \emptyset$. Each source node $s$ is endowed with a source variable $X_s$ uniformly distributed over the set $\mathcal{X}_s = \{1, 2, \ldots, 2^{N\tau_s}\}$. The variables $X_s$, $s \in S$ are mutually independent, and represent messages to be sent over the network. Additionally, the edges of the graph are associated with capacity limitations $R_e$, indicating the number of bits per source time instant which can be sent over these edges. Each sink $t \in T$ has a subset of source variables, those with indices in $\beta(t) \subseteq S$, that it wishes to determine. In order to make this possible, each node in the network will encode all of the messages it hears on its incoming edges (i.e.
all those edges that have it as their head) into a message to be sent on its outgoing edges (i.e. all those edges that have it as their tail). These nodes do this using the functions

$$k_e : \prod_{d \in \mathrm{In}(i)} \{0, 1, \ldots, \eta_d\} \to \{0, 1, \ldots, \eta_e\}, \qquad i \in V \setminus (S \cup T), \; e \in \mathrm{Out}(i) \qquad (1)$$

where $\mathrm{In}(i) = \{e \in E \mid e = (j, i) \text{ for some } j \in V\}$ is the set of edges having node $i$ as their head, and $\mathrm{Out}(i) = \{e \in E \mid e = (i, j) \text{ for some } j \in V\}$ is the set of edges having node $i$ as their tail. The source nodes encode their sources into messages
through the functions

$$k_e : \mathcal{X}_s \to \{0, 1, \ldots, \eta_e\}, \qquad s \in S, \; e \in \mathrm{Out}(s) \qquad (2)$$

The sink nodes reproduce the source messages, using the encoded messages available locally to them, through the functions

$$g_t : \prod_{d \in \mathrm{In}(t)} \{0, 1, \ldots, \eta_d\} \to \prod_{s \in \beta(t)} \mathcal{X}_s, \qquad t \in T \qquad (3)$$

The aggregate of these functions (1), (2), (3) is collectively known as an $(N, (\eta_e : e \in E), (\tau_s : s \in S))$ network code. For this to work, all of the messages must have arrived on the incoming edges before the ones on the outgoing edges are calculated. This is enabled through the assumption of an acyclic network. For any finite directed acyclic graph, it is possible to order the nodes in a sequence such that if $e \in E$, $e = (i, j)$, then node $i$ appears before node $j$ in the sequence. By selecting such an order to perform the encoding among the nodes, every node will have all of the messages from its incoming edges before it calculates the messages on its outgoing edges. Let $g_t(X_S)$ represent the composition of all functions from the sources to the sink $t$. We say that a collection of source rates $\omega_s$, $s \in S$ is achievable if for arbitrarily small $\epsilon > 0$ there exists a network code such that

$$\frac{1}{N} \log \eta_e \le R_e + \epsilon \qquad e \in E \qquad (4)$$
$$\tau_s \ge \omega_s - \epsilon \qquad s \in S \qquad (5)$$
$$P\left[ g_t(X_S) \ne X_{\beta(t)} \right] \le \epsilon \qquad t \in T \qquad (6)$$

The set of all achievable rate vectors $\omega$, denoted by $\mathcal{R}$, is the network coding capacity region.

4 Network Coding Capacity Region

If a collection of rates $\omega$ were achievable with zero probability of error and a block length of $N = 1$ for a particular network code, then, after identifying the random variables $Y_s$, $s \in S$ with the sources $X_s$, $s \in S$, and the random variables $U_e$ with the coded message on edge $e$, for this code we would have the inequalities

$$H(Y_s) \ge \omega_s \qquad s \in S \qquad (7)$$
$$H(Y_S) = \sum_{s \in S} H(Y_s) \qquad (8)$$
$$H(U_{\mathrm{Out}(s)} \,|\, Y_s) = 0 \qquad s \in S \qquad (9)$$
$$H(U_{\mathrm{Out}(i)} \,|\, U_{\mathrm{In}(i)}) = 0 \qquad i \in V \setminus (S \cup T) \qquad (10)$$
$$H(U_e) \le R_e \qquad e \in E \qquad (11)$$
$$H(Y_{\beta(t)} \,|\, U_{\mathrm{In}(t)}) = 0 \qquad t \in T \qquad (12)$$

Indeed,

(7) reflects the fact that the sources must be uniform over a set with cardinality $2^{N\tau_s}$ with $\tau_s \ge \omega_s$;

(8) reflects the requirement that the sources are independent of one another;
(9) reflects that the message encoded by a source node is a function of the source available to it, i.e. (2);

(10) reflects that the messages on the outgoing edges from a node are a function of the messages on its incoming edges, i.e. (1);

(11) reflects the limitations on edge capacity (4);

(12) indicates the zero probability of error reconstruction.

Of course, our notion of an achievable rate in the network coding capacity region $\mathcal{R}$ was the usual Shannon lossless notion, which allows a non-zero, but arbitrarily small, probability of error as indicated by (6), together with an arbitrarily large block length $N$ and closure in rate space as represented by (4) and (5). Surprisingly, this lossless network coding capacity region can be written directly in terms of the inequalities (7), (8), (9), (10), (11), (12) with an expression we shall define presently. The first bit of notation is to stack subset entropies into a vector. That is, given a collection of $M = |S| + |E|$ random variables, there are $2^M - 1$ non-empty subsets of the random variables, and to each such subset we have an associated entropy.
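As a concrete companion to this bookkeeping, the following sketch (the function name, the dict-based joint pmf representation, and the example variables are ours, not from the notes) computes the entropy of every non-empty subset of a small collection of discrete random variables, indexing each subset by a bitmask exactly as described below:

```python
import itertools
from math import log2

def entropy_vector(joint, M):
    """Return {mask: H(Z_A)} for every non-empty A of {0, ..., M-1}.

    joint: dict mapping M-tuples of outcomes to probabilities.
    A subset A is encoded as a bitmask: bit k of the index says
    whether variable k belongs to A.
    """
    h = {}
    for mask in range(1, 1 << M):
        keep = [k for k in range(M) if mask & (1 << k)]
        # marginalize the joint pmf onto the variables in A
        marg = {}
        for outcome, p in joint.items():
            key = tuple(outcome[k] for k in keep)
            marg[key] = marg.get(key, 0.0) + p
        h[mask] = -sum(p * log2(p) for p in marg.values() if p > 0)
    return h

# Example: two independent fair bits and their XOR
# (the coded message of the butterfly example).
joint = {}
for a, b in itertools.product([0, 1], repeat=2):
    joint[(a, b, a ^ b)] = 0.25
h = entropy_vector(joint, 3)
# Each single variable has 1 bit of entropy; any pair (and the
# triple) determines everything, giving 2 bits.
```

Any two of the three variables determine the third, which is why all joint entropies of two or more variables equal 2 bits here.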
We stack the entropies of these subsets into a vector $h$ of dimension $2^M - 1$, and will index this vector via the subset, so that for instance $h_A$ will represent the joint entropy of the random variables in $A$. (The ordering for the indexing can be done, for instance, by using the integer associated with the length $M$ binary string whose $k$th bit indicates whether or not $k$ is in $A$.) We then consider each of the inequalities (8), (9), (10), (11), (12) as linear constraints on this vector, defining the linear constraint sets

$$\mathcal{L}_1 = \left\{ h \in \mathbb{R}^{2^M - 1} \,\middle|\, h_{Y_S} = \sum_{s \in S} h_{Y_s} \right\} \qquad (13)$$
$$\mathcal{L}_2 = \left\{ h \in \mathbb{R}^{2^M - 1} \,\middle|\, h_{U_{\mathrm{Out}(s)} Y_s} - h_{Y_s} = 0 \;\; \forall s \in S \right\} \qquad (14)$$
$$\mathcal{L}_3 = \left\{ h \in \mathbb{R}^{2^M - 1} \,\middle|\, h_{U_{\mathrm{Out}(i)} U_{\mathrm{In}(i)}} - h_{U_{\mathrm{In}(i)}} = 0 \;\; \forall i \in V \setminus (S \cup T) \right\} \qquad (15)$$
$$\mathcal{L}_4 = \left\{ h \in \mathbb{R}^{2^M - 1} \,\middle|\, h_{U_e} \le R_e \;\; \forall e \in E \right\} \qquad (16)$$
$$\mathcal{L}_5 = \left\{ h \in \mathbb{R}^{2^M - 1} \,\middle|\, h_{Y_{\beta(t)} U_{\mathrm{In}(t)}} - h_{U_{\mathrm{In}(t)}} = 0 \;\; \forall t \in T \right\} \qquad (17)$$

Additionally, introduce the following notation:

1. $\mathrm{Proj}_{Y_S}(\mathcal{B}) := \{ (h_{Y_s} : s \in S) \mid h \in \mathcal{B} \}$
2. $\Lambda(\mathcal{B}) := \{ h \in \mathbb{R}_+^{|S|} \mid h \le h' \text{ for some } h' \in \mathcal{B} \}$
3. $\mathrm{conv}(\mathcal{B})$, the convex hull of the set $\mathcal{B}$
4. $\overline{\mathcal{B}}$, the closure of the set $\mathcal{B}$
5. $\Gamma^*_M = \{ h \in \mathbb{R}_+^{2^M - 1} \mid \exists\, (Z_1, Z_2, \ldots, Z_M) \text{ finite discrete random variables with } h_A = H(Z_A) \;\forall A \subseteq \{1, \ldots, M\} \}$

Yeung and his co-workers have shown that the network coding capacity region $\mathcal{R}$ is equal to

$$\mathcal{R} = \Lambda\left( \mathrm{Proj}_{Y_S}\left( \overline{\mathrm{conv}\left( \Gamma^*_M \cap \mathcal{L}_{123} \right)} \cap \mathcal{L}_4 \cap \mathcal{L}_5 \right) \right) \qquad (18)$$

where $\mathcal{L}_{123} := \mathcal{L}_1 \cap \mathcal{L}_2 \cap \mathcal{L}_3$.

4.1 Converse Sketch

For the converse, we must show that every achievable rate vector lies in the region on the right hand side of (18). Consider an achievable rate vector $\omega \in \mathcal{R}$ and a monotone decreasing sequence $\epsilon_k \downarrow 0$ as $k \to \infty$. Then for every $k$, for every $N$ sufficiently large, there exists a network code such that

$$\frac{1}{N} \log \eta_e \le R_e + \epsilon_k \qquad e \in E \qquad (19)$$
$$\tau_s \ge \omega_s - \epsilon_k \qquad s \in S \qquad (20)$$
$$P\left[ g_t(X_S) \ne X_{\beta(t)} \right] \le \epsilon_k \qquad t \in T \qquad (21)$$

Let $U_e$ be the message sent by the network code on edge $e$ for all $e \in E$, and identify $Y_s = X_s$ as the source variables. Because the source variables are independent and because encodings on outgoing edges are a function of the messages on incoming edges to a node, the inequalities (7), (8), (9), (10) hold among these finite discrete random variables $Y_s$, $s \in S$, $U_e$, $e \in E$.
Now, Fano's inequality states that

$$H(Y_{\beta(t)} \,|\, U_{\mathrm{In}(t)}) \le 1 + \epsilon_k \log |\mathcal{Y}_{\beta(t)}| = 1 + \epsilon_k H(Y_{\beta(t)}) \qquad (22)$$

since $Y_{\beta(t)}$ is uniform. We can upper bound the entropy $H(Y_{\beta(t)})$ using

$$H(Y_{\beta(t)}) = I(Y_{\beta(t)}; U_{\mathrm{In}(t)}) + H(Y_{\beta(t)} \,|\, U_{\mathrm{In}(t)}) \le H(U_{\mathrm{In}(t)}) + H(Y_{\beta(t)} \,|\, U_{\mathrm{In}(t)}) \qquad (23)$$
$$\le H(U_{\mathrm{In}(t)}) + 1 + \epsilon_k H(Y_{\beta(t)}) \le \sum_{e \in \mathrm{In}(t)} \log(\eta_e + 1) + 1 + \epsilon_k H(Y_{\beta(t)}) \le \sum_{e \in \mathrm{In}(t)} N(R_e + \epsilon_k) + 1 + \epsilon_k H(Y_{\beta(t)}) \qquad (24)$$

Solving for an inequality for $H(Y_{\beta(t)})$ we get

$$H(Y_{\beta(t)}) \le \frac{N}{1 - \epsilon_k} \left( \sum_{e \in \mathrm{In}(t)} (R_e + \epsilon_k) + \frac{1}{N} \right) \qquad (25)$$
which when substituted back into Fano's inequality (22) gives

$$H(Y_{\beta(t)} \,|\, U_{\mathrm{In}(t)}) \le N \underbrace{\left( \frac{1}{N} + \frac{\epsilon_k}{1 - \epsilon_k} \left( \sum_{e \in \mathrm{In}(t)} (R_e + \epsilon_k) + \frac{1}{N} \right) \right)}_{\phi_t(N, \epsilon_k)} \qquad (26)$$

Here, it is clear the function $\phi_t(N, \epsilon_k)$ is bounded, is monotone decreasing in both $k$ and $N$, and approaches $0$ as $k, N \to \infty$. Moving next to the edge capacity constraints and the entropies of the sources, we observe that

$$H(U_e) \le \log(\eta_e + 1) \le N(R_e + \epsilon_k), \qquad H(Y_s) \ge N(\omega_s - \epsilon_k) \qquad (27)$$

If we define the half spaces, reminiscent of $\mathcal{L}_4, \mathcal{L}_5$ in the more general case,

$$\mathcal{L}^N_{4, \epsilon_k} = \left\{ h \in \mathbb{R}^{2^M - 1} \,\middle|\, h_{U_e} \le N(R_e + \epsilon_k) \;\; \forall e \in E \right\} \qquad (28)$$
$$\mathcal{L}^N_{5, \epsilon_k} = \left\{ h \in \mathbb{R}^{2^M - 1} \,\middle|\, h_{Y_{\beta(t)} U_{\mathrm{In}(t)}} - h_{U_{\mathrm{In}(t)}} \le N \phi_t(N, \epsilon_k) \;\; \forall t \in T \right\} \qquad (29)$$

we observe that the subset entropies of this network code, collected in the vector $h^{(k)}$, satisfy

$$h^{(k)} \in \Gamma^*_M \cap \mathcal{L}_{123} \cap \mathcal{L}^N_{4, \epsilon_k} \cap \mathcal{L}^N_{5, \epsilon_k}, \qquad h^{(k)}_{Y_s} \ge N(\omega_s - \epsilon_k) \;\; \forall s \in S \qquad (30)$$

Now, since $0 \in \Gamma^*_M \cap \mathcal{L}_{123} \cap \mathcal{L}^N_{4, \epsilon_k} \cap \mathcal{L}^N_{5, \epsilon_k}$, and $\frac{1}{N} h$ can be viewed as a convex combination of $h$ and zero, we observe that

$$\frac{1}{N} h^{(k)} \in \mathrm{conv}\left( \Gamma^*_M \cap \mathcal{L}_{123} \right) \cap \mathcal{L}_{4, \epsilon_k} \cap \mathcal{L}_{5, \epsilon_k}, \qquad \frac{1}{N} h^{(k)}_{Y_s} \ge \omega_s - \epsilon_k \qquad (31)$$

where

$$\mathcal{L}_{4, \epsilon_k} = \left\{ h \in \mathbb{R}^{2^M - 1} \,\middle|\, h_{U_e} \le R_e + \epsilon_k \;\; \forall e \in E \right\} \qquad (32)$$
$$\mathcal{L}_{5, \epsilon_k} = \left\{ h \in \mathbb{R}^{2^M - 1} \,\middle|\, h_{Y_{\beta(t)} U_{\mathrm{In}(t)}} - h_{U_{\mathrm{In}(t)}} \le \phi_t(N, \epsilon_k) \;\; \forall t \in T \right\} \qquad (33)$$

Defining the constraint set in (31) to be $\mathcal{B}^{(N, k)}$, we observe that the $\mathcal{B}^{(N, k)}$s are monotone decreasing in that

$$\mathcal{B}^{(N+1, k)} \subseteq \mathcal{B}^{(N, k)}, \qquad \mathcal{B}^{(N, k+1)} \subseteq \mathcal{B}^{(N, k)} \qquad (34)$$

hence

$$\lim_{N, k \to \infty} \mathcal{B}^{(N, k)} = \bigcap_{N = 1}^{\infty} \bigcap_{k = 1}^{\infty} \mathcal{B}^{(N, k)} \qquad (35)$$

and the latter set, since it involves the intersection of the inequalities in $\mathcal{L}_{4, \epsilon_k}$ and $\mathcal{L}_{5, \epsilon_k}$, becomes

$$\lim_{N, k \to \infty} \frac{1}{N} h^{(k)} \in \overline{\mathrm{conv}\left( \Gamma^*_M \cap \mathcal{L}_{123} \right)} \cap \mathcal{L}_4 \cap \mathcal{L}_5, \qquad \lim_{N, k \to \infty} \frac{1}{N} h^{(k)}_{Y_s} \ge \omega_s \;\; \forall s \in S \qquad (36)$$

Rearranging this fact, we have that if $\omega$ is achievable, then $\omega \in \Lambda\left( \mathrm{Proj}_{Y_S}\left( \overline{\mathrm{conv}\left( \Gamma^*_M \cap \mathcal{L}_{123} \right)} \cap \mathcal{L}_4 \cap \mathcal{L}_5 \right) \right)$, which is what we needed to prove.

4.2 Obtaining Inner and Outer Bounds

We discussed that the capacity region presented in (18) is implicit, in that we don't generally know all of the inequalities necessary to describe $\Gamma^*_M$ (in fact, its closure $\overline{\Gamma^*_M}$ is not even polyhedral for $M \ge 4$). It is possible to obtain inner and outer bounds on the capacity region by substituting in inner and outer bounds for $\Gamma^*_M$. We discussed a polyhedral outer bound for $\Gamma^*_M$ known as the Shannon outer bound $\Gamma_M$.
One way to write the Shannon outer bound is as the set of vectors obeying the properties that entropy is a non-decreasing, submodular set function:

$$\Gamma_M := \left\{ h \in \mathbb{R}^{2^M - 1} \,\middle|\, \begin{array}{ll} h_A \le h_{A'} & \forall A \subseteq A' \subseteq \{1, \ldots, M\} \quad (37) \\ h_A + h_B \ge h_{A \cup B} + h_{A \cap B} & \forall A, B \subseteq \{1, \ldots, M\} \quad (38) \end{array} \right\}$$

Inner bound TBD.
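To make the two families of inequalities concrete, here is a minimal membership test (the function name and the dict representation of $h$ are ours, not from the notes) that checks a candidate subset-entropy vector against the monotonicity and submodularity conditions defining $\Gamma_M$:

```python
def in_shannon_outer_bound(h, M, tol=1e-9):
    """Check whether h lies in the Shannon outer bound Gamma_M.

    h: dict mapping non-empty subset bitmasks (1 .. 2**M - 1) to values;
    the empty set's entropy is taken to be 0.
    """
    def H(mask):
        return 0.0 if mask == 0 else h[mask]

    for A in range(1, 1 << M):
        for B in range(1, 1 << M):
            # (37) monotonicity: A subset of B implies H(A) <= H(B)
            if A & B == A and H(A) > H(B) + tol:
                return False
            # (38) submodularity: H(A) + H(B) >= H(A|B) + H(A&B)
            if H(A) + H(B) + tol < H(A | B) + H(A & B):
                return False
    return True

# Two independent fair bits: a genuine entropy vector, so it passes.
print(in_shannon_outer_bound({1: 1.0, 2: 1.0, 3: 2.0}, 2))   # True
# H(Z1, Z2) = 3 exceeds H(Z1) + H(Z2) = 2: violates submodularity.
print(in_shannon_outer_bound({1: 1.0, 2: 1.0, 3: 3.0}, 2))   # False
```

Note that passing this test is necessary but not sufficient for $h \in \Gamma^*_M$; for $M \ge 4$ there are vectors in $\Gamma_M$ that are not entropic.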
4.3 Achievability Sketch

We begin by proving an alternate form of the capacity region (18). Let $D(\cdot)$ be a set operator which scales all the points in the set by numbers between zero and one:

$$D(\mathcal{A}) = \{ \alpha h \mid h \in \mathcal{A}, \; \alpha \in [0, 1] \} \qquad (39)$$

We will prove that the convex hull can be replaced by scalings in the capacity region expression, so that

$$\Lambda\left( \mathrm{Proj}_{Y_S}\left( \overline{\mathrm{conv}\left( \Gamma^*_M \cap \mathcal{L}_{123} \right)} \cap \mathcal{L}_4 \cap \mathcal{L}_5 \right) \right) = \Lambda\left( \mathrm{Proj}_{Y_S}\left( \overline{D\left( \Gamma^*_M \cap \mathcal{L}_{123} \right)} \cap \mathcal{L}_4 \cap \mathcal{L}_5 \right) \right) \qquad (40)$$

To do this, we will show that

$$\overline{D\left( \Gamma^*_M \cap \mathcal{L}_{123} \right)} = \overline{\mathrm{conv}\left( \Gamma^*_M \cap \mathcal{L}_{123} \right)} \qquad (41)$$

Consider a point $h \in \overline{D(\Gamma^*_M \cap \mathcal{L}_{123})}$. It is the limit of some sequence $h_k \in D(\Gamma^*_M \cap \mathcal{L}_{123})$, where $h_k = \alpha_k \hat{h}_k$ for some $\hat{h}_k \in \Gamma^*_M \cap \mathcal{L}_{123}$ and $\alpha_k \in [0, 1]$. Noting that $0 \in \Gamma^*_M \cap \mathcal{L}_{123}$, we can view $\alpha_k \hat{h}_k$ as the convex combination $\alpha_k \hat{h}_k + (1 - \alpha_k) \cdot 0 \in \mathrm{conv}(\Gamma^*_M \cap \mathcal{L}_{123})$. This shows that $\overline{D(\Gamma^*_M \cap \mathcal{L}_{123})} \subseteq \overline{\mathrm{conv}(\Gamma^*_M \cap \mathcal{L}_{123})}$. To prove the other containment, we show that $\overline{D(\Gamma^*_M \cap \mathcal{L}_{123})}$ is a convex set containing $\Gamma^*_M \cap \mathcal{L}_{123}$. (Since the convex hull $\mathrm{conv}(\Gamma^*_M \cap \mathcal{L}_{123})$ is defined as the smallest convex set containing $\Gamma^*_M \cap \mathcal{L}_{123}$, the convexity of $\overline{D(\Gamma^*_M \cap \mathcal{L}_{123})}$ will guarantee that $\mathrm{conv}(\Gamma^*_M \cap \mathcal{L}_{123}) \subseteq \overline{D(\Gamma^*_M \cap \mathcal{L}_{123})}$.) Consider two points $h^{(1)}, h^{(2)} \in \overline{D(\Gamma^*_M \cap \mathcal{L}_{123})}$, and select any $\lambda \in [0, 1]$. These points are limits of the sequences $h^{(1)}_k = \alpha^{(1)}_k \hat{h}^{(1)}_k$ and $h^{(2)}_k = \alpha^{(2)}_k \hat{h}^{(2)}_k$ with $\alpha^{(1)}_k, \alpha^{(2)}_k \in (0, 1]$ and $\hat{h}^{(1)}_k, \hat{h}^{(2)}_k \in \Gamma^*_M \cap \mathcal{L}_{123}$. Select a sequence of pairs of positive integers $n^{(1)}_k, n^{(2)}_k \in \mathbb{N}$ with $n^{(1)}_k, n^{(2)}_k \to \infty$ as $k \to \infty$ and with

$$\frac{n^{(1)}_k \alpha^{(2)}_k}{n^{(2)}_k \alpha^{(1)}_k} \to \frac{\lambda}{1 - \lambda} \qquad (42)$$

Letting the collections of random variables $(Z^{(1)}_1, \ldots, Z^{(1)}_M)$ and $(Z^{(2)}_1, \ldots, Z^{(2)}_M)$ be random variables attaining any $\hat{h}^{(1)}, \hat{h}^{(2)} \in \Gamma^*_M \cap \mathcal{L}_{123}$ respectively, we observe that the collection of random variables $Z_1, \ldots, Z_M$ defined via

$$Z_m = ( \underbrace{Z^{(1)}_{m, 1}, \ldots, Z^{(1)}_{m, n^{(1)}}}_{n^{(1)} \text{ i.i.d. copies of } Z^{(1)}_m}, \; \underbrace{Z^{(2)}_{m, 1}, \ldots, Z^{(2)}_{m, n^{(2)}}}_{n^{(2)} \text{ i.i.d. copies of } Z^{(2)}_m} ), \qquad m \in \{1, \ldots, M\} \qquad (43)$$

is associated with the entropies $n^{(1)} \hat{h}^{(1)} + n^{(2)} \hat{h}^{(2)}$, hence $n^{(1)}_k \hat{h}^{(1)}_k + n^{(2)}_k \hat{h}^{(2)}_k \in \Gamma^*_M \cap \mathcal{L}_{123}$. Additionally, for $k$ sufficiently large we have

$$\frac{\alpha^{(1)}_k \alpha^{(2)}_k}{n^{(1)}_k \alpha^{(2)}_k + n^{(2)}_k \alpha^{(1)}_k} \in [0, 1] \qquad (44)$$

which then implies that

$$\frac{\alpha^{(1)}_k \alpha^{(2)}_k}{n^{(1)}_k \alpha^{(2)}_k + n^{(2)}_k \alpha^{(1)}_k} \left( n^{(1)}_k \hat{h}^{(1)}_k + n^{(2)}_k \hat{h}^{(2)}_k \right) \in D\left( \Gamma^*_M \cap \mathcal{L}_{123} \right) \qquad (45)$$

for all $k$ sufficiently large.
However, rearranging the terms inside the limit we have

$$\lim_{k \to \infty} \frac{\alpha^{(1)}_k \alpha^{(2)}_k}{n^{(1)}_k \alpha^{(2)}_k + n^{(2)}_k \alpha^{(1)}_k} \left( n^{(1)}_k \hat{h}^{(1)}_k + n^{(2)}_k \hat{h}^{(2)}_k \right) \qquad (46)$$
$$= \lim_{k \to \infty} \left( \frac{n^{(1)}_k \alpha^{(2)}_k}{n^{(1)}_k \alpha^{(2)}_k + n^{(2)}_k \alpha^{(1)}_k} \, \alpha^{(1)}_k \hat{h}^{(1)}_k + \frac{n^{(2)}_k \alpha^{(1)}_k}{n^{(1)}_k \alpha^{(2)}_k + n^{(2)}_k \alpha^{(1)}_k} \, \alpha^{(2)}_k \hat{h}^{(2)}_k \right) = \lambda h^{(1)} + (1 - \lambda) h^{(2)} \qquad (47)$$

so that $\lambda h^{(1)} + (1 - \lambda) h^{(2)} \in \overline{D(\Gamma^*_M \cap \mathcal{L}_{123})}$, proving that it is convex. This establishes (41), and hence (40).

We must now prove that any vector $\omega$ in the alternate rate region representation

$$\omega \in \Lambda\left( \mathrm{Proj}_{Y_S}\left( \overline{D\left( \Gamma^*_M \cap \mathcal{L}_{123} \right)} \cap \mathcal{L}_4 \cap \mathcal{L}_5 \right) \right) \qquad (48)$$

is achievable. Since we can discard any excess rate we wish not to use, this amounts to showing that any rate vector $\omega \in \mathrm{Proj}_{Y_S}\left( \overline{D(\Gamma^*_M \cap \mathcal{L}_{123})} \cap \mathcal{L}_4 \cap \mathcal{L}_5 \right)$ is achievable. Let $h \in \overline{D(\Gamma^*_M \cap \mathcal{L}_{123})} \cap \mathcal{L}_4 \cap \mathcal{L}_5$ be a vector such that $\omega = \mathrm{Proj}_{Y_S}(h)$. Since $h \in \overline{D(\Gamma^*_M \cap \mathcal{L}_{123})} \cap \mathcal{L}_4 \cap \mathcal{L}_5$, it is the limit of some sequence $h_k \in D(\Gamma^*_M \cap \mathcal{L}_{123})$ which is of the form $\alpha_k \hat{h}_k$ with
$\hat{h}_k \in \Gamma^*_M \cap \mathcal{L}_{123}$. Let $(Y^k_s : s \in S)$, $(U^k_e : e \in E)$ be the random variables associated with $\hat{h}_k$; since the entropies of these variables are in $\mathcal{L}_{123}$, they obey the equalities

$$H(Y^k_S) = \sum_{s \in S} H(Y^k_s) \qquad (49)$$
$$H(U^k_{\mathrm{Out}(s)} \,|\, Y^k_s) = 0 \qquad s \in S \qquad (50)$$
$$H(U^k_{\mathrm{Out}(i)} \,|\, U^k_{\mathrm{In}(i)}) = 0 \qquad i \in V \setminus (S \cup T) \qquad (51)$$

The equalities (50) and (51) show that we may think of the random variables $U^k_e = f_{s, e, k}(Y^k_s)$ as a deterministic function of the source random variable $Y^k_s$ for each $e \in \mathrm{Out}(s)$ and $s \in S$, and the random variables $U^k_e = f_{e, k}(U^k_{\mathrm{In}(i)})$ as a deterministic function of the random variables $U^k_{\mathrm{In}(i)}$ for each $e \in \mathrm{Out}(i)$, for each $i \in V \setminus (S \cup T)$. Additionally, since the limit of the scaled entropies $h_k$ is in $\mathcal{L}_4 \cap \mathcal{L}_5$ and has $\lim_k \mathrm{Proj}_{Y_S}(h_k) = \omega$, we have

$$\alpha_k H(Y^k_s) = \omega^k_s \qquad s \in S \qquad (52)$$
$$\alpha_k H(U^k_e) \le R_e + \mu_k \qquad e \in E \qquad (53)$$
$$\alpha_k H(Y^k_{\beta(t)} \,|\, U^k_{\mathrm{In}(t)}) \le \gamma_k \qquad t \in T \qquad (54)$$

where $\gamma_k \to 0$ and $\mu_k \to 0$ as $k \to \infty$, while $\omega^k_s \to \omega_s$. Let $\hat{N}_k = \alpha_k N_k$. For each source $s$, generate a $2^{N_k \tau^k_s} \times \hat{N}_k$ dimensional matrix by sampling its elements i.i.d. according to the distribution $p_{Y^k_s}$, and let the $j$th row be denoted by $Y^{\hat{N}_k}_s(j)$, $j \in \{1, \ldots, 2^{N_k \tau^k_s}\}$. For each edge, enumerate all of the length $\hat{N}_k$ typical sequences in $T^{\hat{N}_k}_\epsilon(U^k_e)$ as $U^{\hat{N}_k}_{e, k}(1), \ldots, U^{\hat{N}_k}_{e, k}(\eta^k_e)$. Due to the bound on the cardinality of the typical set, for such an enumeration

$$\eta^k_e \le 2^{\hat{N}_k (H(U^k_e) + \epsilon c)} \le 2^{\alpha_k N_k (H(U^k_e) + \epsilon c_{e, k})} \qquad (55)$$

so that

$$\frac{1}{N_k} \log \eta^k_e \le \alpha_k \left( H(U^k_e) + \epsilon c_{e, k} \right) \le R_e + \mu_k + \epsilon c_{e, k} \qquad (56)$$

The encoder at source node $s$ selects at random one of the $2^{N_k \tau^k_s}$ rows in its matrix, say $Y^{\hat{N}_k}_s(j_s)$, then calculates the deterministic function $f_{s, e, k}$ operating elementwise on each of the $\hat{N}_k$ positions in the vector. Provided we select

$$\epsilon_{e, k} \ge |\mathcal{Y}^k_s| \, \epsilon_{s, k} \qquad e \in \mathrm{Out}(s) \qquad (57)$$

if $Y^{\hat{N}_k}_s(j) \in T^{\hat{N}_k}_{\epsilon_{s, k}}(Y^k_s)$, then the result of these deterministic functions will be in $T^{\hat{N}_k}_{\epsilon_{e, k}}(U^k_e)$, and together the outgoing messages will all be jointly typical, i.e. in $T^{\hat{N}_k}_{\epsilon_{\mathrm{Out}(s), k}}(U^k_{\mathrm{Out}(s)}, Y^k_s)$. The messages sent are the typical sequence indices from $\{1, \ldots, \eta^k_e\}$ associated with the outputs of the deterministic functions, or $0$ if the input was not typical.
Via the Markov lemma, if we take $N_k$ sufficiently large, then we observe that if $Y^{\hat{N}_k}_s(j_s) \in T^{\hat{N}_k}_{\epsilon_{s, k}}(Y^k_s)$ for each $s \in S$, then all of the messages outgoing from these sources are together jointly typical. Proceeding via the order defined by the directed acyclic graph for the operation of the encoders (such that all incoming messages are available before an outgoing message is calculated), we observe that provided all incoming messages are jointly typical, the outgoing messages (calculated with the deterministic functions $f_{e, k}$ operating on the typical sequences associated with the incoming indices) will be jointly typical themselves, and jointly typical with everything computed so far. Thus, provided that each of the selected messages $Y^{\hat{N}_k}_s(j_s)$ is typical, they will be jointly typical with the sequences associated with each of the incoming messages at a sink. The sink operates by looking in the rows of the codebooks for the sources $\beta(t)$ for a collection of codewords that are jointly typical together with the incoming messages $U^{\hat{N}_k}_{\mathrm{In}(t)}$. By the logic above, there will be at least one such collection (corresponding to the correct decoding) provided that the $Y^{\hat{N}_k}_s(j_s)$ are typical. If there is more than one such collection, then an error is declared. This error event is contained in the union, over all subsets $A$ of $\beta(t)$, of the events $E_A$ for which $\left( U^{\hat{N}_k}_{\mathrm{In}(t)}, Y^{\hat{N}_k}_A(j'_A), Y^{\hat{N}_k}_{A^c}(j_{A^c}) \right)$ is jointly typical for some $j'_s \ne j_s$ for each $s \in A$. These events, associated with the independent codewords $Y^{\hat{N}_k}_s(j'_s)$, $s \in A$, winding up in the jointly typical set with $Y^{\hat{N}_k}_s(j_s)$, $s \in A^c$, and $U^{\hat{N}_k}_{\mathrm{In}(t)}$, have probabilities bounded by

$$P[E_A] \le 2^{N_k \sum_{s \in A} \tau^k_s - \hat{N}_k \left( I(Y_A; Y_{A^c}, U_{\mathrm{In}(t)}) - \epsilon c \right)} \qquad (58)$$

These will go to zero exponentially as $N_k \to \infty$ provided that we select $\tau^k_s$ slightly less than $\alpha_k H(Y^k_s) = \omega^k_s$. This, together with (49), (50), (51), (56), shows that the $\omega^k_s$ are achievable according to the definition (4), (5), (6) for sufficiently large $k$ and $N_k$.
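Returning to the motivating butterfly example, the definitions of Section 3 can be exercised with a tiny $N = 1$ code. The sketch below (the node labels and edge layout are our reading of Figure 1, not taken verbatim from the notes) XORs the two source bits on the bottleneck link $b$, evaluates the edge functions in a topological order of the DAG, and checks that both sinks recover both sources:

```python
def butterfly(b1, b2):
    """One use of the butterfly network code; each edge carries one bit.

    Assumed topology: sources s1, s2 each feed a direct side link to
    'their' sink (s1 -> t1, s2 -> t2) and a coding node; the coding
    node's XOR crosses the bottleneck link b to a relay serving both
    sinks. Messages are computed in a topological order of the DAG,
    so every incoming message exists before an outgoing one is formed.
    """
    m_s1_u = b1            # s1 -> coding node
    m_s2_u = b2            # s2 -> coding node
    m_s1_t1 = b1           # s1 -> sink t1 (direct side link)
    m_s2_t2 = b2           # s2 -> sink t2 (direct side link)
    m_b = m_s1_u ^ m_s2_u  # coded message on the bottleneck link b
    m_v_t1 = m_b           # relay copies the coded bit to t1
    m_v_t2 = m_b           # ... and to t2
    # Sink decoders g_t: XOR the coded bit with the direct observation.
    t1 = (m_s1_t1, m_s1_t1 ^ m_v_t1)  # recovers (b1, b2)
    t2 = (m_s2_t2 ^ m_v_t2, m_s2_t2)  # recovers (b1, b2)
    return t1, t2

# Zero-error multicast of both sources to both sinks, for all inputs:
for b1 in (0, 1):
    for b2 in (0, 1):
        assert butterfly(b1, b2) == ((b1, b2), (b1, b2))
```

Without the XOR, the single bit on link $b$ could carry only one of the two sources per use, so routing alone cannot multicast both sources at these rates; this is exactly the gain from coding between flows described in Section 2.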