Source Coding and Function Computation: Optimal Rate in Zero-Error and Vanishing Zero-Error Regime


1 Source Coding and Function Computation: Optimal Rate in Zero-Error and Vanishing Zero-Error Regime Solmaz Torabi Dept. of Electrical and Computer Engineering Drexel University Advisor: Dr. John M. Walsh 1/73 1

2 References
H. Witsenhausen, "The zero-error side information problem and chromatic numbers," IEEE Transactions on Information Theory, vol. 22, no. 5.
N. Alon and A. Orlitsky, "Source coding and graph entropies," IEEE Transactions on Information Theory, vol. 42, no. 5.
A. Orlitsky and J. Roche, "Coding for computing," IEEE Transactions on Information Theory, vol. 47, no. 3.
V. Doshi, D. Shah, M. Médard, and M. Effros, "Functional compression through graph coloring," IEEE Transactions on Information Theory, vol. 56, no. 8.
G. Simonyi, "Graph entropy: a survey," Combinatorial Optimization, vol. 20.
J. Cardinal, S. Fiorini, and G. Joret, "Minimum entropy coloring," Journal of Combinatorial Optimization, vol. 16, no. 4.
2/73 2

3 Outline Introduction Motivation Some graph theory quantities Zero error source coding (Scalar vs. Block coding) Fixed length coding Variable length coding Asymptotically zero probability of error Optimal rate for function compression Achievable coding scheme Function compression in zero-error regime Compression of specific functions 3/73 3

4 Outline Introduction Motivation Some graph theory quantities Zero-error source coding (Scalar vs. Block coding) Fixed length coding Variable length coding Asymptotically zero probability of error Optimal rate for function compression Achievable coding scheme Compression of specific functions Function compression in zero-error regime 4/73 4

5 Motivation In big data problems, how do we design computational methodologies suited to statistical calculation that scale easily as the data being processed becomes immense? Fundamental problems arise in massive datasets where statistical primitives need to be computed for certain tasks: users submit machine learning tasks, and analytics servers translate those tasks into statistical-primitive calculations. 5/73 5

6 Motivation A central receiver wishes to compute the average temperature of the building. The bandwidth-limited sensors do not communicate with each other. We consider the recovery not of the sources themselves, but of a function of the sources. 6/73 6

7 Outline Introduction Motivation Some graph theory quantities Zero error source coding (Scalar vs. Block coding) Fixed length coding Variable length coding Asymptotically zero probability of error Optimal rate for function compression Achievable coding scheme Function compression in zero-error regime Compression of specific functions 7/73 7

8 Some graph theory quantities Quantities to be discussed for undirected graphs (finite, containing neither loops nor multiple edges): Independent sets of a graph Colorings of a graph Products of graphs 8/73 8

9 Independent set An independent set (or stable set) is a set of vertices in a graph, no two of which are adjacent. An independent set that is not a subset of another independent set is called maximal. [Figure: example graph on vertices a-f] 9/73 9

10 Independent set An independent set (or stable set) is a set of vertices in a graph, no two of which are adjacent. An independent set that is not a subset of another independent set is called maximal. [Figure: a highlighted set that is an independent set but NOT a maximal independent set] 10/73 10

11 Independent set An independent set (or stable set) is a set of vertices in a graph, no two of which are adjacent. An independent set that is not a subset of another independent set is called maximal. [Figure: another independent set that is NOT a maximal independent set] 11/73 11

12 Independent set An independent set (or stable set) is a set of vertices in a graph, no two of which are adjacent. An independent set that is not a subset of another independent set is called maximal. [Figure: a maximal independent set] 12/73 12

13 Independent set An independent set (or stable set) is a set of vertices in a graph, no two of which are adjacent. An independent set that is not a subset of another independent set is called maximal. The maximal independent sets of the example graph are {f, d, e}, {b, d, e}, {c, f, e}, {a, d, f}. 13/73 13
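
A small Python sketch of the two notions above. The edge set is a hypothetical stand-in, since the slide's graph on vertices a-f is only given as a figure whose edges are not recoverable from this transcription.

```python
# Brute-force enumeration of maximal independent sets of a small graph.
# The edge set below is a hypothetical stand-in for the slide's 6-vertex example.
from itertools import combinations

vertices = ['a', 'b', 'c', 'd', 'e', 'f']
edges = {frozenset(e) for e in [('a', 'b'), ('b', 'c'), ('c', 'd'), ('a', 'e'), ('b', 'f')]}

def is_independent(S):
    """No two vertices of S are adjacent."""
    return all(frozenset(p) not in edges for p in combinations(S, 2))

independent = [set(S) for r in range(1, len(vertices) + 1)
               for S in combinations(vertices, r) if is_independent(S)]
# An independent set is maximal if it is not a proper subset of another independent set.
maximal = [S for S in independent if not any(S < T for T in independent)]
print(maximal)
```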

14 Probabilistic Graph - Chromatic Number χ(G) is the smallest number of colors needed to color the vertices of G so that no two adjacent vertices share the same color. The chromatic number of a graph is at least its clique number (the maximum clique size). [Figure: a probabilistic graph on vertices a-f with vertex probabilities and a proper coloring] χ(G) = 3 14/73 14

15 Entropy of the coloring Random variable X ~ P(x), and function c : X → N. The entropy of the coloring is defined as H(c(X)) = -∑_{i ∈ c(X)} p_i log p_i, where p_i = P(c^{-1}(i)) = ∑_{x ∈ V : c(x) = i} P(x). c^{-1}(i) is a color class (a set of vertices assigned the same color i). A coloring of G is any function c over V such that c^{-1}(·) induces a partition of V into independent sets of G. H_χ(G, X) = min{H(c(X)) : c is a coloring of G}. 15/73 15
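
As an illustration of these definitions, here is a brute-force Python sketch that computes H_χ(G, X) exactly for a tiny graph; the 5-cycle and the vertex probabilities are hypothetical illustration values, not the slide's example.

```python
# Brute-force computation of the chromatic entropy H_chi(G, X) for a tiny graph.
from itertools import product
from math import log2

vertices = ['a', 'b', 'c', 'd', 'e']
edges = {frozenset(e) for e in [('a','b'), ('b','c'), ('c','d'), ('d','e'), ('e','a')]}
p = {'a': 0.4, 'b': 0.3, 'c': 0.1, 'd': 0.1, 'e': 0.1}

def coloring_entropy(c):
    """Entropy of the color-class probabilities p_i = P(c^{-1}(i))."""
    mass = {}
    for v in vertices:
        mass[c[v]] = mass.get(c[v], 0.0) + p[v]
    return -sum(q * log2(q) for q in mass.values() if q > 0)

best = None
for assignment in product(range(len(vertices)), repeat=len(vertices)):
    c = dict(zip(vertices, assignment))
    if all(c[u] != c[v] for u, v in map(tuple, edges)):   # keep only proper colorings
        h = coloring_entropy(c)
        if best is None or h < best[0]:
            best = (h, c)
print("H_chi(G, X) = %.3f bits, achieved by" % best[0], best[1])
```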

16 Chromatic Entropy Note that χ_H(G, P) ≥ χ(G): a minimum-entropy coloring may use more colors than the chromatic number. [Figure: the same probabilistic graph on vertices a-f under three different proper colorings] H(0.333, 0.333, 0.333) = 1.58, H(0.616, 0.333, 0.05) = 1.17, H(0.85, 0.05, 0.05, 0.05) = 0.85. 16/73 16

17 Chromatic Entropy Computing the chromatic entropy (a minimum-entropy coloring) is known to be NP-hard.[1] Heuristic: find independent sets with high probability mass and establish these as color classes, unbalancing the distribution as much as possible. Add a node to the color class with the highest probability whenever possible (by concavity of f(x) = -x log x). If color classes C_1 and C_2 can be merged properly, C' = {C_1 ∪ C_2, C_3, ..., C_k} has strictly lower entropy than the original coloring C. [1] J. Cardinal, S. Fiorini, and G. Joret, Minimum entropy coloring, in Algorithms and Computation. 17/73 17
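
A sketch of the greedy idea described above (a hypothetical helper, not the algorithm of Cardinal et al.): visit vertices in order of decreasing probability and place each one in the heaviest color class that remains independent, reusing the 5-cycle example from the previous sketch.

```python
# Greedy heuristic for a low-entropy coloring: grow the heaviest color classes first.
from math import log2

vertices = ['a', 'b', 'c', 'd', 'e']
edges = {frozenset(e) for e in [('a','b'), ('b','c'), ('c','d'), ('d','e'), ('e','a')]}
p = {'a': 0.4, 'b': 0.3, 'c': 0.1, 'd': 0.1, 'e': 0.1}

def greedy_min_entropy_coloring():
    classes = []                                          # entries: [vertex set, total mass]
    for v in sorted(vertices, key=lambda u: -p[u]):
        for cls in sorted(classes, key=lambda c: -c[1]):  # try the heaviest class first
            if all(frozenset((v, u)) not in edges for u in cls[0]):
                cls[0].add(v)
                cls[1] += p[v]
                break
        else:
            classes.append([{v}, p[v]])                   # no class fits: open a new color
    return classes

classes = greedy_min_entropy_coloring()
print(classes, "entropy = %.3f" % -sum(m * log2(m) for _, m in classes))
```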

18 AND power graph vs OR power graph AND-product of G_1(X_1, E_1) and G_2(X_2, E_2): G_1 ∧ G_2 has vertex set X_1 × X_2, and (x_1, x_2) and (x_1', x_2') are connected if, for each i = 1, 2, either x_i = x_i' or x_i is connected to x_i' in graph G_i. OR-product of G_1(X_1, E_1) and G_2(X_2, E_2): G_1 ∨ G_2 has vertex set X_1 × X_2, and (x_1, x_2) and (x_1', x_2') are connected if either x_1 is adjacent to x_1' in G_1 or x_2 is adjacent to x_2' in G_2. The complement of the AND product is the OR product of the complements: (G_1 ∧ G_2)^c = G_1^c ∨ G_2^c, and hence (G^∧n)^c = (G^c)^∨n. 18/73 18
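
A short Python sketch of the two products, with edges stored as 2-element frozensets; the 5-cycle test case at the end is a hypothetical example.

```python
# Minimal constructions of the AND product and the OR product defined above.
from itertools import product

def and_product(V1, E1, V2, E2):
    """(x1,x2) ~ (y1,y2) iff, for each i, x_i = y_i or {x_i, y_i} is an edge of G_i."""
    V = list(product(V1, V2))
    E = set()
    for u, v in product(V, V):
        if u == v:
            continue
        ok1 = u[0] == v[0] or frozenset((u[0], v[0])) in E1
        ok2 = u[1] == v[1] or frozenset((u[1], v[1])) in E2
        if ok1 and ok2:
            E.add(frozenset((u, v)))
    return V, E

def or_product(V1, E1, V2, E2):
    """(x1,x2) ~ (y1,y2) iff {x1, y1} is an edge of G_1 or {x2, y2} is an edge of G_2."""
    V = list(product(V1, V2))
    E = {frozenset((u, v)) for u, v in product(V, V)
         if u != v and (frozenset((u[0], v[0])) in E1 or frozenset((u[1], v[1])) in E2)}
    return V, E

C5_V = list(range(5))
C5_E = {frozenset((i, (i + 1) % 5)) for i in range(5)}
_, E_and = and_product(C5_V, C5_E, C5_V, C5_E)
_, E_or = or_product(C5_V, C5_E, C5_V, C5_E)
print(len(E_and), len(E_or))   # the AND power has fewer edges than the OR power
```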

19 AND product graph vs OR product graph [Figure: an example graph G on vertices {1, 2, 3} together with its 2-AND power graph and its 2-OR power graph on the vertex pairs (i, j)] 19/73 19

20 Outline Introduction Zero error source coding (Scalar vs. Block coding) Fixed length coding Variable length coding Function compression Optimal rate Achievable coding scheme Function compression in zero-error regime Compression of specific functions 20/73 20

21 Optimal Rate: Source coding, Function compression [Table skeleton, filled in over the following sections. Rows: scalar code (zero error); block code (zero error); block code (ε error). Columns: source with side information, fixed length; source with side information, variable length; function.] 21/73 21

22 Zero error source coding In classical information theory, increasing the number of repetitions decreases the probability of error. In some applications, however, no error can be tolerated, and sometimes only a few source instances are available; a scheme whose error probability decreases only as the number of instances increases is then not useful. 22/73 22

23 Zero error - Simple point-to-point communication, Scalar code At every source instance, a random variable X is generated according to p(x), independently of all other instances. For a single instance, the smallest number of bits needed in the worst case is ⌈log |X|⌉, which lies between log |X| and log |X| + 1. [Block diagram: x → Encoder → Decoder → x̂, with x ∈ X and x ~ p(x)] 23/73 23

24 Zero error - Simple point-to-point, Blocking For n independent instances, the number of bits needed in the worst case is ⌈n log |X|⌉, which lies between n log |X| and n log |X| + 1. The asymptotic per-instance rate in the worst case is therefore between log |X| and log |X| + 1/n bits. [Block diagram: X^n → Encoder → Decoder → X̂^n, with X^n = [x(1), x(2), ..., x(n)], x ∈ X, x ~ p(x), and P(X^n ≠ X̂^n) = 0] 24/73 24
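
A small numeric check of this per-instance bound for a hypothetical alphabet of size 5 (arbitrary values of n):

```python
# ceil(n*log2|X|)/n approaches log2|X| ≈ 2.32 bits as the block length n grows.
from math import ceil, log2

X_size = 5
for n in (1, 2, 10, 100):
    print(n, ceil(n * log2(X_size)) / n)
```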

25 Zero-error source coding with side information The receiver knows some possibly related random variable Y, jointly distributed with X. The probability of error must be exactly zero. How many bits are needed in the worst case? Is there any advantage from block coding of n independent instances? [Block diagram: X^n → Encoder → Decoder → X̂^n, with side information Y^n at the decoder and P(X̂^n ≠ X^n) = 0] 25/73 25

26 Zero-error source coding, Fixed length code The receiver wants to know X without error (no error is tolerated!). X = {1, 2, ..., 5} and Y = {1, 2, ..., 5}. [Figure: the joint pmf p(x, y) shown as a table, and the block diagram X^n → Encoder → Decoder → X̂^n with Y^n at the decoder, P(X̂^n ≠ X^n) = 0] 26/73 26

27 Confusability graph x and x' are confusable if there exists y ∈ Y such that p(x, y) p(x', y) > 0. Given y = 1, x can be either 1 or 2 (so 1 and 2 are confusable). This confusability relation is captured by the characteristic graph G: (x, x') ∈ E if and only if x and x' are confusable. [Figure: joint pmf table over (x, y) and the resulting characteristic graph] 27/73 27
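
A minimal Python sketch of this definition; the joint support below is a hypothetical choice (p(x, y) > 0 iff y ∈ {x, x+1 mod 5}) that happens to produce the pentagon used on the following slides.

```python
# Build the confusability graph G from the support of a joint distribution p(x, y).
from itertools import combinations

X = range(5)
support = {(x, y) for x in X for y in (x, (x + 1) % 5)}    # pairs with p(x, y) > 0

def confusable(x1, x2):
    """x1 and x2 are confusable if some y has p(x1, y) > 0 and p(x2, y) > 0."""
    return any((x1, y) in support and (x2, y) in support for y in range(5))

E = {frozenset((x1, x2)) for x1, x2 in combinations(X, 2) if confusable(x1, x2)}
print(sorted(tuple(sorted(e)) for e in E))                 # the 5-cycle 0-1-2-3-4-0
```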

28 Encoding 1 instance at a time, Scalar code The smallest number of possible messages is χ(G). In the worst case, the smallest number of bits needed is ⌈log χ(G)⌉. [Figure: the pentagon characteristic graph with a proper 3-coloring, χ(G) = 3] 28/73 28

29 n-instance encoding Assign different codewords to confusable vectors. x^n = (x_1, ..., x_n) and x'^n = (x'_1, ..., x'_n) are confusable if there exists y^n ∈ Y^n such that p(x^n, y^n) p(x'^n, y^n) > 0. Confusability of the vectors is captured by the AND power graph G^∧n = (V^n, E^∧n): (x^n, x'^n) ∈ E^∧n if, for all i ∈ {1, ..., n}, (x_i, x'_i) ∈ E or x_i = x'_i. The number of bits needed in the worst case is ⌈log χ(G^∧n)⌉. In general χ(G^∧n) ≤ (χ(G))^n, which is the benefit of block encoding. 29/73 29

30 2-AND power graph of the pentagon The five individual pentagons represent the confusability relation for x_2, each for a fixed x_1; the meta-pentagon represents the confusability relation for x_1 alone. log_2 χ(G^∧2) = log_2 5 < log_2 χ(G)^2 = log_2 9, so a benefit from blocking is obtained. [Figure: the 2-AND power of the pentagon on vertices (i, j), i, j ∈ {1, ..., 5}] 30/73 30
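
A short numerical companion to this slide. The explicit map (x_1, x_2) ↦ (x_1 + 2·x_2) mod 5 is one proper 5-coloring of the 2-AND power of the pentagon (this particular coloring is my own illustration, not taken from the slides); together with the independence-number bound χ ≥ 25/5 = 5, it confirms χ(G^∧2) = 5 < χ(G)^2 = 9.

```python
# Verify a proper 5-coloring of the 2-AND (strong) power of the 5-cycle.
from itertools import product

V = list(product(range(5), repeat=2))
same_or_adj = lambda a, b: a == b or (a - b) % 5 in (1, 4)     # "same or adjacent" on C5
edges = [(u, v) for u, v in product(V, V)
         if u != v and same_or_adj(u[0], v[0]) and same_or_adj(u[1], v[1])]

color = lambda x: (x[0] + 2 * x[1]) % 5
assert all(color(u) != color(v) for u, v in edges)
print("proper coloring with", len(set(map(color, V))), "colors")
```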

31 Outline Introduction Zero error source coding (Scalar vs. Block coding) Fixed length coding Variable length coding Asymptotically zero probability of error Optimal rate for function compression Achievable coding scheme Compression of specific functions 31/73 31

32 Variable length coding, Simple point-to-point Scalar code: the smallest expected number of bits needed is between H(X) and H(X) + 1. Block encoding: the number of bits needed on average is between nH(X) and nH(X) + 1. The asymptotic per-instance average is between H(X) and H(X) + 1/n bits. [Block diagram: x → Encoder → Decoder → x̂, with x ∈ X and x ~ p(x)] 32/73 32

33 Variable length coding, Side information Variable-length codes assign shorter codewords to the most frequent symbols and vice versa. Alon and Orlitsky considered two families of codes.[2] For the optimal variable-length code the rate is the complementary graph entropy, for which no single-letter characterization exists. For a subclass of variable-length codes the optimal rate is the graph entropy. [Block diagram: X^n → Encoder → Decoder → X̂^n with Y^n at the decoder, P(X̂^n ≠ X^n) = 0] [2] N. Alon and A. Orlitsky, Source coding and graph entropies, IEEE Transactions on Information Theory. 33/73 33

34 Coding method for unrestricted inputs (UI) Encoding function φ : X → {0, 1}* such that, for all x, x' ∈ V: φ(x) is not a proper prefix of φ(x'), and (x, x') ∈ E ⟹ φ(x) ≠ φ(x'). The decoder learns x only if p(x, y) > 0; otherwise, the decoder may be wrong about the value of x. [Figure: example with φ(x_1) = φ(x_3); since p(x_1, y_1) > 0 and p(x_3, y_1) = 0, the decoder outputs x_1. Encoder → Decoder with a lookup table over x_1, ..., x_4 and y_1, ..., y_4] 34/73 34

35 Coding method for unrestricted inputs (UI) Encoding function φ : X → {0, 1}* such that, for all x, x' ∈ V: φ(x) is not a proper prefix of φ(x'), and (x, x') ∈ E ⟹ φ(x) ≠ φ(x'). A UI code is a coloring of G followed by a prefix-free coding of the colors. [Figure: the same example, with φ(x_1) = φ(x_3); since p(x_1, y_1) > 0 and p(x_3, y_1) = 0, the decoder outputs x_1] 35/73 35

36 Expected number of bits Note: the optimal code isn't necessarily induced by the optimal coloring. The expected number of bits transmitted is l(ψ) = ∑_{x ∈ V} P(x) |ψ(x)|. The minimum rate over UI codes is L(G) = min{l(ψ) : ψ is a UI code}; we call a code attaining this minimum an optimal code. 36/73 36

37 UI code - scalar H_χ(G, X) ≤ L ≤ H_χ(G, X) + 1. Upper bound: take a coloring of G achieving H_χ(G, X); there is a prefix-free encoding of the colors whose expected length is at most H_χ(G, X) + 1 (data compression result: H(X) ≤ l_p.f.(X) ≤ H(X) + 1). Lower bound: every UI protocol can be viewed as a coloring of G followed by a prefix-free encoding of its colors; the entropy of a coloring is at least H_χ(G, X), so the expected length is at least H_χ(G, X). 37/73 37
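
A small sketch of the upper-bound argument: Huffman-code a set of color classes and compare the expected length with the coloring entropy. The class probabilities are hypothetical.

```python
# Huffman-code the color classes and check H(c(X)) <= E[length] < H(c(X)) + 1.
import heapq
from math import log2

class_probs = {'red': 0.6, 'green': 0.3, 'blue': 0.1}

def huffman_lengths(probs):
    """Codeword length of each symbol in a binary Huffman code."""
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, l1 = heapq.heappop(heap)
        p2, i2, l2 = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, i2, {s: l + 1 for s, l in {**l1, **l2}.items()}))
    return heap[0][2]

lengths = huffman_lengths(class_probs)
expected = sum(class_probs[s] * lengths[s] for s in class_probs)
H = -sum(p * log2(p) for p in class_probs.values())
print("H(c(X)) = %.3f bits, expected length = %.3f bits" % (H, expected))
```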

38 n independent instances [Figure: the encoder sends x_1 x_2 x_3 x_4; with side information y_3 y_2 y_1 y_4, the decoder recovers x_2 and x_4 but may be wrong about x_1 and x_3, since p(x_1, y_3) = 0 and p(x_3, y_1) = 0] The decoder needs to learn x_i for exactly those i where p(x_i, y_i) > 0 and can be wrong about x_i for the other i. This may require more bits. x^n and x'^n need to be distinguished if there exist i and y_i such that p(x_i, y_i) p(x'_i, y_i) > 0. (x^n, x'^n) ∈ E^∨n if (x_i, x'_i) ∈ E for some i ∈ {1, ..., n}; this is the definition of the OR power graph of G. 38/73 38

39 Optimal rate for UI (variable length code) Recall that for one instance the optimal rate satisfies H_χ(G, X) ≤ L ≤ H_χ(G, X) + 1. For n instances, H_χ(G^∨n, X^(n)) ≤ L_n ≤ H_χ(G^∨n, X^(n)) + 1. Asymptotically per instance:[3] lim_{n→∞} (1/n) H_χ(G^∨n, X^(n)) = H_G(X). [3] N. Alon and A. Orlitsky, Source coding and graph entropies, IEEE Transactions on Information Theory. 39/73 39

40 Graph entropy Originally defined by Körner (1973): H_G(X) = min_{X ∈ W ∈ Γ(G)} I(W; X), where Γ(G) is the set of independent sets of the confusability graph G and W is an independent set of G containing X. The minimization can be restricted to W ranging over the maximal independent sets. 40/73 40

41

                            | Source, side info (fixed length) | Source, side info (variable length) | Function
  scalar code, zero error   | log χ(G)                         | H_χ(G, X)                           | ?
  block code, zero error    | log χ(G^∧n)                      | H_G(X)                              | ?
  block code, ε error       | ?                                | ?                                   | ?

H_G(X|Y) ≤ H(X|Y) ≤ H_G(X) ≤ H_χ(G, X) ≤ H(X) ≤ log χ(G)

41/73 41

42 Outline Introduction Zero error source coding (Scalar vs. Block coding) Fixed Length coding Variable Length coding Asymptotically zero error Optimal rate for function compression Achievable coding scheme Function compression in zero-error regime Compression of specific functions 42/73 42

43 Function computing Compute a function f(X, Y) at the decoder, with the encoding as efficient as possible. f(X, Y) = Z must be determined for a (single) block of multiple instances, and vanishing (block) probability of error is allowed. H(f(X, Y) | Y) ≤ Rate R ≤ H(X | Y). Encoder φ : X^n → {0, 1}^k, decoder ψ : {0, 1}^k × Y^n → Z^n, with P(ψ(φ(x), y) ≠ f(x, y)) < ε. [Block diagram: X → encoder → decoder → f̂(X, Y), with Y at the decoder and (X, Y) ~ p(x, y)] 43/73 43

44 Example A satellite has 10 parameters to report to the base, and the base needs to learn only one parameter (only the base knows which one): f(X, Y) = x_Y with Y ~ Unif(1:10) and X = (x_1, ..., x_10). The lower bound gives H(x_Y | Y) = 1 bit and the upper bound gives H(X | Y) = 10 bits; here R = 10. 44/73 44

45 Slepian-Wolf In the case where f(X, Y) = (X, Y), the well-established rate result is the Slepian-Wolf bound: R_1 > H(X | Y), R_2 > H(Y | X), R_1 + R_2 > H(X, Y). [Figure: X and Y encoded separately at rates R_1 and R_2, jointly decoded to (X̂, Ŷ)] When Y is available at the decoder, the optimal rate for the transmission of X is the conditional entropy of X given Y, H(X | Y). 45/73 45

46 Slepian-Wolf To achieve the rate H(X | Y), the encoder uses a more efficient encoding by taking advantage of X's correlation with Y, knowing that the decoder can use its knowledge of Y to disambiguate the transmitted data. This analysis so far has ignored the fact that we are computing f(X, Y) at the decoder, rather than reporting the two values directly! 46/73 46

47 Can we do better than the Slepian-Wolf bound? Example: X and Y are integers and f = X + Y (mod 4). Regardless of how large X and Y might be, it suffices to encode the last two bits of each. The SW bound only takes advantage of the correlation between the two random variables; it doesn't take advantage of the properties of the function. To analyze a given f, we construct the f-confusability graph. 47/73 47
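
A quick sanity check of this example (the integer values are arbitrary):

```python
# Only the last two bits of each integer matter for computing (x + y) mod 4.
x, y = 1_000_003, 7_777_778
x2, y2 = x & 0b11, y & 0b11            # keep only the last two bits of each source
assert (x + y) % 4 == (x2 + y2) % 4    # the decoder can compute f from 2 + 2 bits
print((x + y) % 4)
```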

48 f-Confusability graph The f-confusability graph of X given Y, f, and p(x, y) is a graph G = (V, E) with V = X, and (x, x') ∈ E if there exists y ∈ Y such that p(x, y) > 0, p(x', y) > 0, and f(x, y) ≠ f(x', y). Negation: if two nodes are not confusable ((x, x') ∉ E), then for every value of y, either p(x, y) = 0, or p(x', y) = 0, or f(x, y) = f(x', y). 48/73 48
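
A minimal Python sketch of this definition; the function f = (x + y) mod 3, the alphabets, and the full support are hypothetical illustration choices.

```python
# Build the f-confusability graph G_f from f and the support of p(x, y).
from itertools import combinations

Xs, Ys = range(6), range(6)
support = {(x, y) for x in Xs for y in Ys}      # p(x, y) > 0 everywhere
f = lambda x, y: (x + y) % 3

def f_confusable(x1, x2):
    """Some y with p(x1, y) > 0 and p(x2, y) > 0 gives different function values."""
    return any((x1, y) in support and (x2, y) in support and f(x1, y) != f(x2, y)
               for y in Ys)

E_f = {frozenset((x1, x2)) for x1, x2 in combinations(Xs, 2) if f_confusable(x1, x2)}
print(sorted(tuple(sorted(e)) for e in E_f))    # edges exactly between x1 != x2 (mod 3)
```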

49 Encoding scheme, 1 instance (x, x') ∉ E can safely be given the same encoding. Color the f-confusability graph G_f with minimum entropy and transmit the name of the color class that contains each value. SW: for asymptotically zero error, it suffices to send H(X | Y) bits, so use SW encoding on the distribution over the color classes. This rate is H_χ(G_f, X | Y) = min_c ∑_{y ∈ Y} p(y) H(c(X) | Y = y). [Block diagram: X → graph coloring → Slepian-Wolf code (encoder) → decoder with lookup table and side information Y → f(X, Y)] 49/73 49

50 Encoding scheme, n instances How do we define the notion of f-confusability for vectors? x^n and x'^n are f-confusable if there exists y^n ∈ Y^n such that p(x^n, y^n) p(x'^n, y^n) > 0 and f(x^n, y^n) ≠ f(x'^n, y^n). (x^n, x'^n) ∈ E^∨n if (x_i, x'_i) ∈ E for some i ∈ {1, 2, ..., n}; this is the definition of the OR power graph. 50/73 50

51 Optimal rate - achievable scheme For 1 instance the rate is H_χ(G_f, X | Y); asymptotically per instance, lim_{n→∞} (1/n) H_χ(G_f^∨n, X^(n) | Y^(n)) = H_{G_f}(X | Y). We can compute the function with vanishing probability of error by first coloring a sufficiently large power graph of the characteristic graph, and then encoding the colors using any code that achieves the entropy bound on the colored source. 51/73 51

52 Optimal rate The optimal rate for the zero-distortion functional compression problem with side information equals R = H_G(X | Y),[4] where H_G(X | Y) = min_{W - X - Y, X ∈ W ∈ Γ(G)} I(W; X | Y), and W - X - Y indicates that W, X, Y form a Markov chain. W should not contain any information about Y that is not available through X. The minimization can be restricted to W ranging over the maximal independent sets. [4] A. Orlitsky and J. Roche, Coding for computing, IEEE Transactions on Information Theory. 52/73 52

53 Summary

                            | Source, side info (fixed length) | Source, side info (variable length) | Function
  scalar code, zero error   | log χ(G)                         | H_χ(G, X)                           | ?
  block code, zero error    | log χ(G^∧n)                      | H_G(X)                              | ?
  block code, ε error       | H(X|Y)                           | H(X|Y)                              | H_{G_f}(X|Y)

H_{G_f}(X|Y) ≤ H(X|Y) ≤ H_G(X) ≤ H_χ(G, X) ≤ H(X) ≤ log χ(G)

53/73 53

54 Outline Introduction Zero-error source coding (Scalar vs. Block coding) Fixed Length coding Variable Length coding Asymptotically zero-error Optimal rate for function compression Achievable coding scheme Function Computation in zero-error regime Compression of specific functions 54/73 54

55 Function Compression, Scalar code, Zero-error Compute a function f(X, Y) at the decoder. f(X, Y) must be determined for a (single) instance, and no error is allowed (zero-error regime). What is the fundamental limit of computing a function with exactly zero probability of error? This is motivated by Alon and Orlitsky's work on source coding with side information. [Block diagram: X → encoder → decoder → f̂(X, Y), with Y at the decoder and (X, Y) ~ p(x, y)] 55/73 55

56 Achievable scheme We want a protocol that guarantees zero probability of error and computes f(x, y) whenever p(x, y) > 0. (x, x') ∉ E_f can safely be given the same encoding: for every y, either p(x, y) = 0, or p(x', y) = 0, or f(x, y) = f(x', y). Color the f-confusability graph G_f with minimum entropy and transmit the name of the color class that contains each value. Using a prefix-free encoding on the distribution over the color classes yields a valid encoding scheme for the problem. 56/73 56

57 Achievable coding scheme [Block diagram: X → graph coloring → Huffman code (encoder) → decoder with lookup table and side information Y → f(X, Y)] 57/73 57

58 Optimal scalar code for function computing Theorem: the optimal rate for computing a function f(X, Y) with zero probability of error, using a scalar variable-length code, satisfies H_χ(G_f, X) ≤ R ≤ H_χ(G_f, X) + 1. Proof. Upper bound: take a coloring of G_f achieving H_χ(G_f, X); there is a prefix-free encoding of the colors whose expected length is at most H_χ(G_f, X) + 1. Lower bound: any such protocol can be viewed as a coloring of G_f and a prefix-free encoding of its colors; the entropy of a coloring is at least H_χ(G_f, X), so the expected length is at least H_χ(G_f, X). 58/73 58

59 Block encoding To protect against loss of synchronization when the side information at the decoder is occasionally wrong: for a sequence x_1, x_2, ..., x_n, the function needs to be computed correctly only for those coordinates where p(x_i, y_i) > 0; for the other positions the decoder may not be able to compute f(x_i, y_i) correctly. (x^n, x'^n) ∈ E^(n) if (x_i, x'_i) ∈ E for some i ∈ {1, ..., n}. The code must satisfy: (1) (x^n, x'^n) ∈ E^(n) ⟹ φ(x^n) ≠ φ(x'^n); (2) for all x^n, x'^n ∈ V^(n), φ(x^n) is not a proper prefix of φ(x'^n). 59/73 59

60 Optimal rate for function compression in the zero-error regime One instance: H_χ(G_f, X) ≤ R ≤ H_χ(G_f, X) + 1. n instances: H_χ(G_f^∨n, X^(n)) ≤ R_n ≤ H_χ(G_f^∨n, X^(n)) + 1. Asymptotically per instance: lim_{n→∞} (1/n) H_χ(G_f^∨n, X^(n)) = H_{G_f}(X). We can compute the function with zero error by first coloring a sufficiently large power graph of the characteristic graph and then encoding the colors using Huffman codes. 60/73 60

61 Function compression in the zero-error regime Proposition: in the exactly zero-error regime, using a block code, the optimal rate for computing f(X, Y) is at most the rate for recovering the source X. Proof. G_f = (V, E_f) and G = (V, E) with E_f ⊆ E. Let (X, W) be the random variables achieving H_G(X); W is an independent set of G containing X, and since G_f ⊆ G it is also an independent set of G_f. Hence H_{G_f}(X) = min_{X ∈ W' ∈ Γ(G_f)} I(X; W') ≤ I(X; W) = min_{X ∈ W ∈ Γ(G)} I(X; W) = H_G(X). 61/73 61

62 Function compression in the zero-error regime Proposition: in the exactly zero-error regime, using a scalar code, the optimal rate for computing f(X, Y) is at most the rate for recovering the source X. Since G_f ⊆ G, any coloring of G is a coloring of G_f; in particular the minimum-entropy coloring c*_G of G is also a coloring of G_f. Hence H_χ(G_f, X) ≤ H(c*_G(X)) = H_χ(G, X). 62/73 62

63 Summary Theorem: block encoding can give savings over scalar encoding in the zero-error function compression regime: H_{G_f}(X) ≤ H_χ(G_f, X). Proof. If c is a coloring of G_f, then c^{-1}(c(X)), the color class of X, induces a partition of V into independent sets of G_f and always contains X. H_χ(G_f, X) = min{H(c(X)) : the c^{-1}(i) are the color classes} (a)= min_{X ∈ W ∈ Γ(G_f)} {H(W) : the disjoint union of the W's forms the color classes} (b)= min_{X ∈ W ∈ Γ(G_f)} {I(W; X) : the disjoint union of the W's forms the color classes} (c)≥ min_{X ∈ W ∈ Γ(G_f)} I(W; X) = H_{G_f}(X). 63/73 63

64 Summary

                            | Source, side info (fixed length) | Source, side info (variable length) | Function
  scalar code, zero error   | log χ(G)                         | H_χ(G, X)                           | H_χ(G_f, X)
  block code, zero error    | log χ(G^∧n)                      | H_G(X)                              | H_{G_f}(X)
  block code, ε error       | H(X|Y)                           | H(X|Y)                              | H_{G_f}(X|Y)

H_G(X|Y) ≤ H(X|Y) ≤ H_G(X) ≤ H_χ(G, X) ≤ H(X) ≤ log χ(G)

64/73 64

65 Outline Introduction Zero-error source coding (Scalar vs. Block coding) Fixed Length coding Variable Length coding Asymptotically zero-error Optimal rate for function compression Achievable coding scheme Function compression in zero-error regime Compression of specific functions 65/73 65

66 Compression of specific functions In environmental monitoring, a relevant statistic of temperature sensor readings x_1, x_2, ..., x_n may be the mean temperature, the median, or the mode. In alarm networks, the quantity of interest might be the maximum, max_{1 ≤ i ≤ n} x_i, of n temperature readings. The frequency histogram function counts the number of occurrences of each argument among the measurements; it yields many useful statistics such as the mean, variance, maximum, minimum, and median. We study compression of specific functions by graph coloring, which provides a benefit over compressing each source separately. 66/73 66

67 mod-r sum We have K users with access to independent sequences. Each node (user) periodically makes measurements, which belong to a fixed finite set X = {1, 2, ..., |X|}. f : X^K → {0, ..., r-1}, with f(x_1(j), x_2(j), ..., x_K(j)) = x_1(j) ⊕ x_2(j) ⊕ ... ⊕ x_K(j) (mod-r sum) for j ∈ {1, ..., N}. [Figure: K encoders, one per user sequence X_i = x_i(1), x_i(2), ..., x_i(N), and a single decoder computing f] 67/73 67

68 The characteristic graph of the i-th user Theorem: (p, q) ∈ E_i (for p, q ∈ X) if and only if p ≢ q (mod r). Proof. If (p, q) ∉ E_i, then for all x_\i ∈ X^{K-1}, f(p, x_\i) = f(q, x_\i), i.e., p + ∑_{j ≠ i} x_j ≡ q + ∑_{j ≠ i} x_j (mod r), hence p ≡ q (mod r). Conversely, if p ≡ q (mod r), then for all x_\i ∈ X^{K-1} we have p + ∑_{j ≠ i} x_j ≡ q + ∑_{j ≠ i} x_j (mod r), so f(p, x_\i) = f(q, x_\i) for all x_\i; hence (p, q) ∉ E_i. Together these give the claimed equivalence. 68/73 68
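
A brute-force numerical check of this theorem for one hypothetical instance (K = 3 users, X = {1, ..., 8}, r = 4):

```python
# User 1's values p, q are confusable exactly when p and q differ mod r.
from itertools import product

X, r = range(1, 9), 4
f = lambda xs: sum(xs) % r              # mod-r sum of all users' values

def confusable(p, q):
    """Some choice of the other users' values makes f(p, .) and f(q, .) differ."""
    return any(f((p,) + rest) != f((q,) + rest) for rest in product(X, repeat=2))

assert all(confusable(p, q) == (p % r != q % r) for p in X for q in X if p != q)
print("E_1 consists exactly of the pairs with p != q (mod r)")
```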

69 The characteristic graph of the i-th user Note that the maximal independent sets form a partition: W_0 = {x ∈ X : x ≡ 0 (mod r)}, W_1 = {x ∈ X : x ≡ 1 (mod r)}, ..., W_{r-1} = {x ∈ X : x ≡ r-1 (mod r)}. H_χ(G, X) can be achieved by assigning different colors to different maximal independent sets, so here the minimum-entropy coloring is achieved by an optimal coloring with χ(G) colors. 69/73 69

70 Optimal rate for the mod-r sum H_G(X | Y) = min I(W; X | Y) = min{H(W | Y) - H(W | X, Y)} = min{H(W | Y) - H(W | X)} = H(W | Y) = H(W) = -(P_0 log P_0 + ... + P_{r-1} log P_{r-1}), where P_i = ∑_{x ∈ W_i} p(x). If the distribution over the vertices is uniform and |X| = Cr with C ∈ Z+, the total benefit is H(X | Y) - H_G(X | Y) = log |X| - log r = log C ≥ 0. 70/73 70
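
A tiny numeric illustration of this benefit for hypothetical values C = 16 and r = 4 (assuming, as on slide 67, that the sources are independent and uniform):

```python
# Benefit log|X| - log r = log C for a uniform source with |X| = C*r.
from math import log2

C, r = 16, 4
H_X_given_Y = log2(C * r)   # = H(X) = log|X| for a uniform source independent of Y
H_G_X_given_Y = log2(r)     # = H(W): the r residue classes are equiprobable
print("benefit = %.1f bits = log2(C) = %.1f" % (H_X_given_Y - H_G_X_given_Y, log2(C)))
```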

71 Summary The source coding problem with side information in the zero-error regime was discussed. The optimal rate for function compression in the zero-error regime was derived. Function computation in the asymptotically zero-error regime was discussed. The optimal rate for computing the mod-r sum was derived. 71/73 71

72 Thank you Questions? 72/73 72
