Uncertain Compression & Graph Coloring
Madhu Sudan, Harvard
Based on joint works with:
(1) Adam Kalai (MSR), Sanjeev Khanna (U.Penn), Brendan Juba (WUStL)
(2) Elad Haramaty (Harvard)
(3) Badih Ghazi (MIT), Elad Haramaty (Harvard), Pritish Kamath (MIT)
August 4, 2017 IITB: Uncertain Compression & Coloring 1 of 21
Classical Compression: The Shannon Setting
- Alice gets $m \in [N]$ chosen from distribution $P$.
- She sends the compression $y = E_P(m) \in \{0,1\}^*$ to Bob.
- Bob computes $\hat{m} = D_P(y)$ (Alice and Bob both know $P$).
- Require $\hat{m} = m$ (whp).
- Goal: $\min_{E_P, D_P} \mathbb{E}_{m \leftarrow P}\left[|E_P(m)|\right]$.
- Further (technical) requirement: prefix-free, i.e., for all $m \neq m'$, $E_P(m)$ is not a prefix of $E_P(m')$. This ensures we can encode a sequence of messages.
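The prefix-free requirement can be checked mechanically. It is also exactly what lets a receiver split a concatenated stream by greedy left-to-right parsing. Below is a minimal sketch, assuming codewords are bit strings stored in a dict. The names `is_prefix_free`, `encode_stream` and `decode_stream` are hypothetical helpers for illustration.

```python
def is_prefix_free(code):
    """True iff no codeword is a prefix of the codeword of a different message."""
    items = list(code.items())
    for m1, w1 in items:
        for m2, w2 in items:
            if m1 != m2 and w2.startswith(w1):
                return False
    return True


def encode_stream(code, messages):
    """Concatenate the codewords of a sequence of messages."""
    return "".join(code[m] for m in messages)


def decode_stream(code, bits):
    """Greedy parsing: unambiguous precisely because the code is prefix-free."""
    inverse = {w: m for m, w in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("leftover bits do not form a codeword")
    return out


CODE = {"a": "0", "b": "10", "c": "11"}
stream = encode_stream(CODE, ["b", "a", "c", "a"])
print(stream, decode_stream(CODE, stream))  # 100110 ['b', 'a', 'c', 'a']
```

With a non-prefix-free code such as `{"a": "0", "b": "01"}`, the greedy parser would emit `a` as soon as it reads the `0` that begins `01`.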
Shannon + Huffman
- [Kraft]: A prefix-free code that encodes message $i$ with length $\ell_i$ exists iff $\sum_i 2^{-\ell_i} \le 1$.
- [Shannon]: Assign length $\lceil \log_2 \frac{1}{P(i)} \rceil$ to message $i$. Expected length $\le \mathbb{E}_i\left[\log_2 \frac{1}{P(i)}\right] + 1 = H(P) + 1$.
- Motivates $H(P) \triangleq \mathbb{E}_{i \leftarrow P}\left[\log_2 \frac{1}{P(i)}\right]$.
- [Huffman]: Constructive, explicit, optimal.
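The three facts above can be compared numerically on a small distribution. The sketch below uses the standard heap-based Huffman construction and checks the chain $H(P) \le$ Huffman $\le$ Shannon $< H(P) + 1$, together with Kraft's inequality for Shannon's lengths. The helper names and the example distribution are illustrative choices, not taken from the talk.

```python
import heapq
import math
from itertools import count


def entropy(P):
    """H(P) = E_{i~P}[log2 1/P(i)] in bits."""
    return sum(p * math.log2(1 / p) for p in P.values() if p > 0)


def kraft_sum(lengths):
    """sum_i 2^{-l_i}; a prefix-free code with these lengths exists iff this is <= 1."""
    return sum(2.0 ** -l for l in lengths)


def shannon_lengths(P):
    """Shannon's assignment: message i gets length ceil(log2 1/P(i))."""
    return {m: math.ceil(math.log2(1 / p)) for m, p in P.items()}


def huffman_code(P):
    """Huffman: repeatedly merge the two least likely subtrees."""
    if len(P) == 1:
        return {m: "0" for m in P}
    tie = count()  # tie-breaker so heapq never has to compare the dicts
    heap = [(p, next(tie), {m: ""}) for m, p in P.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, left = heapq.heappop(heap)
        p1, _, right = heapq.heappop(heap)
        merged = {m: "0" + w for m, w in left.items()}
        merged.update({m: "1" + w for m, w in right.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]


def expected_length(P, lengths):
    return sum(P[m] * lengths[m] for m in P)


P = {"x": 0.4, "y": 0.35, "z": 0.25}
huff = {m: len(w) for m, w in huffman_code(P).items()}
shan = shannon_lengths(P)
print(entropy(P), expected_length(P, huff), expected_length(P, shan), kraft_sum(shan.values()))
```

For dyadic distributions (all $P(i)$ powers of 2), the Shannon and Huffman lengths both coincide with $\log_2 \frac{1}{P(i)}$, so the expected length equals $H(P)$ exactly.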
Compression
- Fundamental problem: gzip, Lempel-Ziv.
- Leads to entropy: a fundamental measure.
- Fundamental role in learning (Learning ⇔ Compression).
- A goal in language too!
  - Language evolves: words are introduced, others fade. Why?
  - Intuitive explanation: distributions on messages evolve!
Compression as Proxy for Language
- Understanding languages: complex. Not all forces are well understood, and others are hard to analyze.
- Compression: a clean mathematical problem that faces similar issues as language, and is (sometimes) easier to analyze.
- But to model issues associated with natural language, we need to incorporate uncertainty!
  - People don't have the same priors on messages.
  - They need to estimate/bound each other's priors.
- Can compression work with uncertain priors?
Outline
- Part 1: Motivation
- Part 2: Formalism
- Part 3: Randomized Solution
- Part 4: Issues with Randomized Solution
- Part 5: Deterministic Issues
Uncertain Compression
Design encoding/decoding schemes $(E, D)$ so that:
- Sender has distribution $P$ on $[N]$.
- Receiver has distribution $Q$ on $[N]$.
- Sender gets $m \in [N]$ and sends $E(P, m)$ to the receiver.
- Receiver receives $y = E(P, m)$ and decodes to $\hat{m} = D(Q, y)$.
- Want: $\hat{m} = m$ (provided $P, Q$ are close), while minimizing $\mathbb{E}_{m \leftarrow P}\left[|E(P, m)|\right]$.
Proximity of Distributions
- Many alternatives. Our goal: find anything non-trivial that allows compression.
- Eventual choice: $\Delta(P, Q) = \max_{m \in [N]} \max\left\{ \log \frac{P(m)}{Q(m)}, \log \frac{Q(m)}{P(m)} \right\}$, a symmetrized, worst-case KL divergence.
- KL divergence: $D(P \| Q) = \mathbb{E}_{m \leftarrow P}\left[\log \frac{P(m)}{Q(m)}\right]$ (so trivially $D(P \| Q) \le \Delta(P, Q)$).
- Question: Can the message be compressed to within $f(H(P), \Delta)$? Or must it be $f(H(P), \Delta, N)$?
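Both closeness measures are easy to compute. This sketch (logs base 2, distributions assumed to have full support on a common set of keys) evaluates them on a small example. It then spot-checks $D(P \| Q) \le \Delta(P, Q)$ on random pairs. The function names are hypothetical.

```python
import math
import random


def delta(P, Q):
    """Delta(P,Q) = max_m max(log2 P(m)/Q(m), log2 Q(m)/P(m)); assumes full support."""
    return max(max(math.log2(P[m] / Q[m]), math.log2(Q[m] / P[m])) for m in P)


def kl(P, Q):
    """D(P||Q) = E_{m~P}[log2 P(m)/Q(m)]."""
    return sum(p * math.log2(p / Q[m]) for m, p in P.items() if p > 0)


def random_distribution(N, rng):
    """Random full-support distribution on range(N), used only for spot checks."""
    w = [rng.random() + 1e-3 for _ in range(N)]
    s = sum(w)
    return {m: x / s for m, x in enumerate(w)}


P = {"a": 0.5, "b": 0.25, "c": 0.25}
Q = {"a": 0.25, "b": 0.5, "c": 0.25}
print(delta(P, Q), kl(P, Q))  # 1.0 0.25
```

The inequality holds because $D(P\|Q)$ averages the log-ratio under $P$, while $\Delta$ takes its worst case in absolute value.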
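Solution 1: Assuming Randomness
- Assume sender and receiver share $r \sim \mathrm{Unif}(\{0,1\}^t)$. In particular, $r$ is independent of $P$, $Q$, $m$.
- Compression scheme: Let $r = (r_1, \ldots, r_N)$ with $r_i \in \{0,1\}^{t/N}$.
  - Sender sends a prefix $z$ of $r_m$, long enough so that every $m' \neq m$ such that $z$ is a prefix of $r_{m'}$ satisfies $P(m') < P(m)/4^{\Delta}$.
  - Receiver decodes to the $m'$ maximizing $Q(m')$ among all $m'$ such that $z$ is a prefix of $r_{m'}$. (Any other such $m'$ has $Q(m') \le 2^{\Delta} P(m') < 2^{-\Delta} P(m) \le Q(m)$.)
  - $\mathbb{E}_r\left[|z|\right] \le \log \frac{1}{P(m)} + 2\Delta + O(1)$.
- Thm: Expected compression length $\le H(P) + 2\Delta + O(1)$.
- Deterministic?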
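The shared-randomness scheme can be simulated directly. In this sketch a seeded PRNG stands in for the shared string $r$. The sender is assumed to know an upper bound on $\Delta(P, Q)$. The receiver outputs the $Q$-most-likely message whose string starts with $z$. The perturbation used to generate a "close" $Q$ (a factor in $[1/2, 2]$ per message, then renormalized), as well as the names `shared_strings`, `close_pair` and `run_demo`, are hypothetical choices for illustration.

```python
import math
import random


def shared_strings(N, bits, seed):
    """Hypothetical stand-in for the shared randomness r = (r_1, ..., r_N)."""
    rng = random.Random(seed)
    return [format(rng.getrandbits(bits), f"0{bits}b") for _ in range(N)]


def encode(P, m, delta_bound, r):
    """Shortest prefix z of r_m that rules out every m' != m with P(m') >= P(m) / 4^Delta."""
    threshold = P[m] / 4 ** delta_bound
    for length in range(len(r[m]) + 1):
        z = r[m][:length]
        if all(P[k] < threshold for k in range(len(r)) if k != m and r[k].startswith(z)):
            return z
    raise ValueError("shared strings too short for this instance")


def decode(Q, z, r):
    """Receiver: among messages whose string starts with z, output the Q-most-likely one."""
    return max((k for k in range(len(r)) if r[k].startswith(z)), key=lambda k: Q[k])


def close_pair(N, seed):
    """Random P, and Q = P perturbed by a factor in [1/2, 2] per message, renormalized."""
    rng = random.Random(seed)
    w = [rng.random() + 1e-3 for _ in range(N)]
    s = sum(w)
    P = [x / s for x in w]
    v = [p * rng.uniform(0.5, 2.0) for p in P]
    t = sum(v)
    return P, [x / t for x in v]


def delta(P, Q):
    return max(abs(math.log2(p / q)) for p, q in zip(P, Q))


def run_demo(N=200, bits=64, seed=0):
    P, Q = close_pair(N, seed)
    D = delta(P, Q) + 1e-9  # the bound must be >= Delta; tiny slack absorbs float rounding
    r = shared_strings(N, bits, seed + 1)
    codes = [encode(P, m, D, r) for m in range(N)]
    all_correct = all(decode(Q, codes[m], r) == m for m in range(N))
    avg_len = sum(p * len(z) for p, z in zip(P, codes))
    target = sum(p * math.log2(1 / p) for p in P) + 2 * D
    return all_correct, avg_len, target


if __name__ == "__main__":
    ok, avg_len, target = run_demo()
    print(ok, f"average length {avg_len:.2f} bits vs H(P) + 2*Delta = {target:.2f}")
```

Correct decoding is guaranteed by construction whenever the bound passed to `encode` is at least $\Delta(P, Q)$. The average length is random and is only expected to land within an additive constant of $H(P) + 2\Delta$.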
Combinatorial Reinterpretation
- Can fix $P(m)$ (adds $\log \frac{1}{P(m)}$ to the compression length).
- Define: $A_0 = \{m\}$, $A_1 = \{m' : P(m') \ge 4^{-\Delta} P(m)\}$, and in general $A_i = \{m' : P(m') \ge 4^{-i\Delta} P(m)\}$.
- Similarly: $B_1 = \{m' : Q(m') \ge 2^{-\Delta} P(m)\}$, and in general $B_i = \{m' : Q(m') \ge 2^{-(2i-1)\Delta} P(m)\}$.
- Nesting: $A_0 \subseteq B_1 \subseteq A_1 \subseteq B_2 \subseteq A_2 \subseteq \cdots$
- Sizes: $|A_i| \le K \cdot C^{2i}$ and $|B_i| \le K \cdot C^{2i-1}$, for $K = \frac{1}{P(m)}$ and $C = 2^{\Delta}$.
- Question: Given $K, C$, can $m$ be distinguished from $m' \in B_1$ with $O_{K,C}(1)$ bits?
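The nesting and size claims can be checked numerically. This sketch builds $A_i$ and $B_i$ from randomly generated, hypothetical distributions $P, Q$ (with $\Delta$ computed from them, logs base 2). It then verifies $A_{i-1} \subseteq B_i \subseteq A_i$, $|A_i| \le K C^{2i}$ and $|B_i| \le K C^{2i-1}$ with $K = 1/P(m)$ and $C = 2^{\Delta}$. `B[0]` is a placeholder, since $B_0$ is not defined.

```python
import math
import random


def close_pair(N, seed):
    """Hypothetical test data: random P, and Q = P perturbed by factors in [1/2, 2]."""
    rng = random.Random(seed)
    w = [rng.random() + 1e-3 for _ in range(N)]
    s = sum(w)
    P = [x / s for x in w]
    v = [p * rng.uniform(0.5, 2.0) for p in P]
    t = sum(v)
    return P, [x / t for x in v]


def delta(P, Q):
    return max(abs(math.log2(p / q)) for p, q in zip(P, Q))


def nested_sets(P, Q, m, D, levels):
    """A_0 = {m}; A_i = {m': P(m') >= 4^{-iD} P(m)}; B_i = {m': Q(m') >= 2^{-(2i-1)D} P(m)}."""
    N = len(P)
    A, B = [{m}], [None]  # B_0 undefined; placeholder keeps indices aligned with i
    for i in range(1, levels + 1):
        A.append({k for k in range(N) if P[k] >= 4 ** (-i * D) * P[m]})
        B.append({k for k in range(N) if Q[k] >= 2 ** (-(2 * i - 1) * D) * P[m]})
    return A, B


def check(P, Q, m, levels):
    """True iff nesting and both size bounds hold at every level 1..levels."""
    D = delta(P, Q) + 1e-9  # slack for floating-point rounding
    A, B = nested_sets(P, Q, m, D, levels)
    K, C = 1 / P[m], 2 ** D
    return all(
        A[i - 1] <= B[i] <= A[i]
        and len(A[i]) <= K * C ** (2 * i) + 1e-9
        and len(B[i]) <= K * C ** (2 * i - 1) + 1e-9
        for i in range(1, levels + 1)
    )


P, Q = close_pair(50, seed=0)
print(all(check(P, Q, m, levels=4) for m in range(50)))
```

The size bounds follow because each element of $A_i$ carries $P$-mass at least $4^{-i\Delta} P(m)$ and the total mass is 1. The same argument with $Q$ gives the bound for $B_i$.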
Compression ⇔ Coloring
- Weak uncertainty graph $W_{N,K,C}$:
  - Vertices: $(A_0, A_1, \ldots, A_\ell)$, nested, with $|A_0| = 1$, $|A_i| \le K \cdot C^{2i}$, $A_\ell = [N]$.
  - Edges: $(A_0, A_1, \ldots, A_\ell) \sim (A'_0, A'_1, \ldots, A'_\ell)$ if $A_0 \ne A'_0$ and, for all $0 \le i < \ell$, $A_i \subseteq A'_{i+1}$ and $A'_i \subseteq A_{i+1}$.
- Claim: Compression length $= f(H(P), \Delta)$ iff $\forall K, C, N$: $\chi(W_{N,K,C}) = O_{K,C}(1)$.
- $\chi(W_{N,K,C})$: open! Did we reduce to a harder problem?
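The vertex and edge rules translate directly into set comparisons. This sketch encodes vertices as tuples of `frozenset`s and implements the two predicates. It then checks a simple consequence: vertices that share $A_1, \ldots, A_\ell$ and differ only in $A_0 \subseteq A_1$ are pairwise adjacent. So $\chi(W_{N,K,C})$ is at least the largest allowed $|A_1|$ (about $K \cdot C^2$, when $N$ is large enough). The parameters $N = 16$, $K = 1$, $C = 2$, $\ell = 2$ are hypothetical.

```python
from itertools import combinations


def is_vertex(U, N, K, C):
    """(A_0, ..., A_l): nested, |A_0| = 1, |A_i| <= K*C^(2i), A_l = [N]."""
    nested = all(U[i] <= U[i + 1] for i in range(len(U) - 1))
    sizes = all(len(A) <= K * C ** (2 * i) for i, A in enumerate(U))
    return nested and sizes and len(U[0]) == 1 and U[-1] == frozenset(range(N))


def adjacent(U, V):
    """Edge iff A_0 != A'_0 and A_i <= A'_{i+1}, A'_i <= A_{i+1} for all i < l."""
    if U[0] == V[0]:
        return False
    return all(U[i] <= V[i + 1] and V[i] <= U[i + 1] for i in range(len(U) - 1))


N, K, C = 16, 1, 2
S1 = frozenset(range(4))   # |A_1| <= K*C^2 = 4
top = frozenset(range(N))  # A_2 = [N], |A_2| <= K*C^4 = 16
clique = [(frozenset({x}), S1, top) for x in S1]
print(all(adjacent(U, V) for U, V in combinations(clique, 2)))  # True
```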
Bounding Chromatic Number
- Upper bounds: easy, just give the coloring? Not always!
- E.g., the shift graph $S_{n,k}$:
  - Vertices: $\binom{[n]}{k}$, i.e., sequences $i_1 < i_2 < \cdots < i_k$ of $k$ distinct elements of $[n]$.
  - Edges: $(i_1, i_2, \ldots, i_k) \sim (i_2, i_3, \ldots, i_k, i_{k+1})$.
- Thm [Cole-Vishkin, Linial]: If $k \ge \log^* n$, then $\chi(S_{n,k}) = 3$. More generally, $\chi(S_{n,k}) = \max\{3, \log^{(k + \Theta(1))} n\}$, where $\log^{(j)}$ denotes the $j$-fold iterated logarithm.
- Lower bounds are much more challenging!
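For $k = 2$ the shift graph is small enough to color exactly by brute force. This sketch builds $S_{n,k}$ and computes $\chi(S_{n,2})$ with a backtracking search that uses a DSATUR-style vertex order. It compares the result with $\lceil \log_2 n \rceil$, which is a classical fact about this graph that we take as an assumption here. It also checks an explicit coloring that achieves it: color $(i, j)$ by the highest bit in which $i-1$ and $j-1$ differ. The helper names are hypothetical.

```python
from itertools import combinations


def shift_graph(n, k):
    """Vertices: increasing k-tuples from [n]; edges (i_1..i_k) ~ (i_2..i_k, i_{k+1})."""
    verts = list(combinations(range(1, n + 1), k))
    adj = {v: set() for v in verts}
    for v in verts:
        for j in range(v[-1] + 1, n + 1):
            w = v[1:] + (j,)
            adj[v].add(w)
            adj[w].add(v)
    return adj


def colorable(adj, c):
    """Exact backtracking test for a proper c-coloring (most-saturated vertex first)."""
    color = {}

    def saturation(v):
        return len({color[w] for w in adj[v] if w in color})

    def place(remaining):
        if remaining == 0:
            return True
        v = max((u for u in adj if u not in color), key=lambda u: (saturation(u), len(adj[u])))
        used = {color[w] for w in adj[v] if w in color}
        for col in range(c):
            if col not in used:
                color[v] = col
                if place(remaining - 1):
                    return True
                del color[v]
        return False

    return place(len(adj))


def chromatic_number(adj):
    c = 1
    while not colorable(adj, c):
        c += 1
    return c


def bit_coloring(n):
    """Color (i, j) by the highest bit where i-1 and j-1 differ: ceil(log2 n) colors."""
    return {(i, j): ((i - 1) ^ (j - 1)).bit_length() - 1
            for i, j in combinations(range(1, n + 1), 2)}


if __name__ == "__main__":
    for n in range(2, 9):
        print(n, chromatic_number(shift_graph(n, 2)), (n - 1).bit_length())
```

The bit coloring is proper for a simple reason. If $(i, j)$ and $(j, k)$ had the same top differing bit $t$, then $i < j$ forces bit $t$ of $j-1$ to be 1, while $j < k$ forces it to be 0.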
Some Results
- [Haramaty+S.]: $\chi(W_{N,K,C}) \le \exp(K \cdot C^{\ell}) \cdot \log^{(\ell)} N$.
- [Golowich]: $\chi(W_{N,K,C}) \le \exp(\exp(K \cdot C^{\ell})) \cdot \log^{(2\ell)} N$.
- [Trivial]: $\chi(W_{N,K,C}) \ge K \cdot C^2$.
- For further understanding, define $\bar{W}^{\ell}_{N,K,C}$: like $W_{N,K,C}$, but without the size restriction on $A_\ell$ (so wlog $A_\ell = [N]$).
  - The upper bounds hold even for $\bar{W}^{\ell}_{N,K,C}$.
  - [Haramaty+S.]: $\chi(\bar{W}^{\ell}_{N,K,C}) = \log^{(\Omega(\ell))} N$.
  - So the sets must grow slowly for a long time (large $\ell$) to get $N$-independence.
Upper Bounds 1 [Cole-Vishkin/Linial]
Coloring the shift graph: Given a coloring $\chi: S_{n,k-1} \to \{0,1\}^c$, construct a coloring $\chi': S_{n,k} \to [c] \times \{0,1\}$ as follows.
- To color $(i_1, \ldots, i_k)$:
  - Let $(a_1, \ldots, a_c) = \chi(i_1, \ldots, i_{k-1})$ and $(b_1, \ldots, b_c) = \chi(i_2, \ldots, i_k)$.
  - Let $j$ be the least index such that $a_j \ne b_j$ (it exists, since $(i_1, \ldots, i_{k-1}) \sim (i_2, \ldots, i_k)$ in $S_{n,k-1}$).
  - $\chi'(i_1, \ldots, i_k) = (j, a_j)$.
- Valid? $\chi'(i_2, \ldots, i_{k+1})$ is either $(j, b_j)$ or $(j', x)$ for some $j' \ne j$! Either way it differs from $(j, a_j)$.
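The color-reduction step can be run repeatedly on a concrete shift graph. The sketch starts from the binary-expansion coloring of $S_{n,1}$, which is the complete graph on $[n]$. It applies the $(j, a_j)$ rule to get a coloring of $S_{n,2}$, re-encodes each pair $(j, a_j)$ as the integer $2j + a_j$ in binary, and repeats. At each level it checks properness and the $2c$ color bound. The value $n = 32$ and the helper names are illustrative.

```python
from itertools import combinations


def to_bits(x, width):
    """Little-endian bit tuple of length `width`."""
    return tuple((x >> t) & 1 for t in range(width))


def initial_coloring(n):
    """S_{n,1} is complete on [n]: color i by the binary expansion of i-1."""
    width = (n - 1).bit_length()
    return {(i,): to_bits(i - 1, width) for i in range(1, n + 1)}, width


def cv_step(chi, width, n, k):
    """From chi: S_{n,k-1} -> {0,1}^width, build chi': S_{n,k} -> [width] x {0,1}."""
    new = {}
    for v in combinations(range(1, n + 1), k):
        a, b = chi[v[:-1]], chi[v[1:]]  # colors of (i_1..i_{k-1}) and (i_2..i_k)
        j = next(t for t in range(width) if a[t] != b[t])  # exists since chi is proper
        new[v] = (j, a[j])
    return new


def reencode(pairs, width):
    """Write each color (j, bit) as 2j + bit in binary, ready for the next step."""
    new_width = (2 * width - 1).bit_length()
    return {v: to_bits(2 * j + bit, new_width) for v, (j, bit) in pairs.items()}, new_width


def is_proper(chi, n, k):
    return all(chi[v] != chi[v[1:] + (j,)]
               for v in combinations(range(1, n + 1), k)
               for j in range(v[-1] + 1, n + 1))


def run(n, steps):
    """Record (k, colors used, bound 2*width, proper?) for S_{n,2}, S_{n,3}, ..."""
    chi, width = initial_coloring(n)
    history = []
    for k in range(2, steps + 2):
        pairs = cv_step(chi, width, n, k)
        history.append((k, len(set(pairs.values())), 2 * width, is_proper(pairs, n, k)))
        chi, width = reencode(pairs, width)
    return history


print(run(32, 3))
```

Each step shrinks the palette from $2^c$ to at most $2c$ colors. This is why $\log^{(k+O(1))} n$ levels suffice to reach a constant.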
Generalizing: Homomorphisms
- $G$ is homomorphic to $H$ ($G \to H$) if $\exists \phi: V(G) \to V(H)$ such that $u \sim_G v \Rightarrow \phi(u) \sim_H \phi(v)$.
- Why homomorphisms?
  - $G$ is $k$-colorable $\Leftrightarrow G \to K_k$.
  - $G \to H$ and $H \to L \Rightarrow G \to L$.
  - Hence $G \to H$ implies $\chi(G) \le \chi(H)$.
- Homomorphisms and shift/uncertainty graphs:
  - $S_{n,k} \to S_{n,k-1} \to S_{n,k-2} \to \cdots$
  - $W_{N,K,C} = W^{\ell_N}_{N,K,C} \to \bar{W}^{\ell_N - 1}_{N,K,C} \to \cdots \to \bar{W}^{\ell}_{N,K,C}$
- Suffices to upper bound $\chi(\bar{W}^{\ell}_{N,K,C})$.
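The homomorphism facts above can be checked mechanically on small shift graphs. In this sketch the prefix map $(i_1, \ldots, i_k) \mapsto (i_1, \ldots, i_{k-1})$ is our illustrative choice of homomorphism $S_{n,k} \to S_{n,k-1}$. A proper coloring is treated as a homomorphism into the complete graph $K_c$. Composing the two gives a coloring of $S_{n,3}$ with the same palette. The helper names are hypothetical.

```python
from itertools import combinations


def shift_graph_edges(n, k):
    """Edges of S_{n,k}: (i_1..i_k) ~ (i_2..i_k, i_{k+1})."""
    return [(v, v[1:] + (j,))
            for v in combinations(range(1, n + 1), k)
            for j in range(v[-1] + 1, n + 1)]


def is_homomorphism(phi, edges_G, edges_H):
    """phi maps every edge of G onto an edge of H (graphs are loopless)."""
    H = {frozenset(e) for e in edges_H}
    return all(frozenset((phi[u], phi[v])) in H for u, v in edges_G)


def complete_graph_edges(c):
    return list(combinations(range(c), 2))


def demo(n=10):
    E3, E2 = shift_graph_edges(n, 3), shift_graph_edges(n, 2)
    phi = {v: v[:-1] for v in combinations(range(1, n + 1), 3)}  # S_{n,3} -> S_{n,2}
    c = (n - 1).bit_length()
    chi = {(i, j): ((i - 1) ^ (j - 1)).bit_length() - 1  # S_{n,2} -> K_c
           for i, j in combinations(range(1, n + 1), 2)}
    composed = {v: chi[phi[v]] for v in phi}  # S_{n,3} -> S_{n,2} -> K_c
    return (is_homomorphism(phi, E3, E2),
            is_homomorphism(chi, E2, complete_graph_edges(c)),
            is_homomorphism(composed, E3, complete_graph_edges(c)))


print(demo())  # (True, True, True)
```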
Degree of Homomorphisms
- Say $\phi: G \to H$. Define $d_\phi(u) = \left|\{\phi(v) : v \in \Gamma_G(u)\}\right|$, where $\Gamma_G(u)$ is the neighborhood of $u$ in $G$, and $d_\phi = \max_u d_\phi(u)$.
- Lemma [HS]: $\chi(G) \le O(d_\phi^2 \log \chi(H))$.
- Lemma [Golowich]: $\chi(G) \le O(\exp(d_\phi) \log\log \chi(H))$.
- For $\phi: \bar{W}^{\ell}_{N,K,C} \to \bar{W}^{\ell-1}_{N,K,C}$: $d_\phi = \exp(K \cdot C^{\ell})$.
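The degree $d_\phi$ counts distinct images of a neighborhood, not the neighbors themselves. This sketch computes it for two illustrative maps. The first is $C_6 \to C_3$ via $i \mapsto i \bmod 3$, whose degree is 2. The second is the prefix map $S_{n,k} \to S_{n,k-1}$, whose degree works out to $n - k$ and therefore grows with $n$. These example maps are our choices, not ones from the talk.

```python
from itertools import combinations


def shift_graph(n, k):
    """Adjacency sets of S_{n,k}."""
    verts = list(combinations(range(1, n + 1), k))
    adj = {v: set() for v in verts}
    for v in verts:
        for j in range(v[-1] + 1, n + 1):
            w = v[1:] + (j,)
            adj[v].add(w)
            adj[w].add(v)
    return adj


def hom_degree(phi, adj):
    """d_phi = max over u of |{phi(v) : v a neighbor of u}|."""
    return max(len({phi[v] for v in nbrs}) for nbrs in adj.values())


def prefix_degree(n, k):
    adj = shift_graph(n, k)
    return hom_degree({v: v[:-1] for v in adj}, adj)


cycle6 = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
print(hom_degree({i: i % 3 for i in range(6)}, cycle6))  # 2
print([prefix_degree(n, 2) for n in range(3, 9)])
```

For the prefix map, the left neighbors $(i_0, i_1, \ldots, i_{k-1})$ have $i_1 - 1$ distinct images, and all right neighbors share a single image. This is why the degree is unbounded as $n$ grows.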
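Proof (of $\chi(G) \le O(d_\phi^2 \log \chi(H))$)
- Denote $\chi(H) = c$ and $d_\phi = d$. Let $M = O(d \log c)$ and $t = 2d$.
- Claim: $\exists h_1, \ldots, h_M$, with $h_j: [c] \to [t]$, such that for all $i \in [c]$ and $S \subseteq [c] \setminus \{i\}$ with $|S| \le d$, there is a $j$ with $h_j(i) \notin h_j(S)$.
  - Proof: Pick the $h_j$'s at random. For fixed $i, S, j$, the failure probability is at most $|S|/t \le 1/2$, so $M = O(d \log c)$ suffices for a union bound over all pairs $(i, S)$.
- Claim: There is a $tM$-coloring of $G$.
  - Proof: Given $\chi: H \to [c]$, let $\chi': G \to [M] \times [t]$ be defined by:
    - Let $S_u = \{\chi(\phi(v)) : v \in \Gamma_G(u)\}$ and $i = \chi(\phi(u))$. (Note $i \notin S_u$ and $|S_u| \le d$.)
    - Let $j$ be such that $h_j(i) \notin h_j(S_u)$.
    - $\chi'(u) = (j, h_j(i))$.
  - If $u \sim v$ receive the same $j$, then $\chi(\phi(v)) \in S_u$, so $h_j(\chi(\phi(u))) \ne h_j(\chi(\phi(v)))$.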
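The probabilistic construction in the proof can be run directly. This sketch draws random $h_1, \ldots, h_M : [c] \to [t]$ with $t = 2d$. It colors each $u$ by $(j, h_j(i))$ and resamples the family if some vertex finds no good $j$. As a hypothetical instance we take $G = H$ = the 1000-cycle, $\phi$ = identity (so $d = 2$) and the trivial 1000-coloring. In that special case the step is a Linial-style color reduction. The constant in $M = 2(d+1)\lceil \log_2 c \rceil$ is an ad hoc choice.

```python
import math
import random


def hs_recolor(adj, phi, chi_H, c, d, M, seed=0):
    """Color G with [M] x [t], t = 2d, from a c-coloring chi_H of H and phi: G -> H."""
    t = 2 * d
    rng = random.Random(seed)
    while True:
        h = [[rng.randrange(t) for _ in range(c)] for _ in range(M)]  # h_j : [c] -> [t]
        coloring = {}
        for u, nbrs in adj.items():
            i = chi_H[phi[u]]
            S = {chi_H[phi[v]] for v in nbrs}  # i not in S: phi and chi_H preserve edges
            j = next((jj for jj in range(M) if h[jj][i] not in {h[jj][s] for s in S}), None)
            if j is None:
                break  # this family fails at u; resample
            coloring[u] = (j, h[j][i])
        else:  # loop finished without break: every vertex found a good j
            return coloring


def demo(n=1000, seed=0):
    # Hypothetical instance: G = H = n-cycle, phi = identity, chi_H(v) = v.
    adj = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
    phi = {v: v for v in range(n)}
    chi_H = {v: v for v in range(n)}
    d = 2  # d_phi of the identity on a cycle
    M = 2 * (d + 1) * math.ceil(math.log2(n))
    coloring = hs_recolor(adj, phi, chi_H, n, d, M, seed)
    return adj, coloring, 2 * d * M


adj, coloring, bound = demo()
print(len(set(coloring.values())), "colors used; bound t*M =", bound)
```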
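Lower Bounds: $\chi(\bar{W}^{\ell}_{N,K,C}) \ge \log^{(2\ell)} N$
- Claim: $S_{N,2\ell}$ is a subgraph of $\bar{W}^{\ell}_{N,K,C}$. Proof: by inspection.
- Linial's proof: $\chi(S_{N,\ell}) \ge \log \chi(S_{N,\ell-1})$, since $\chi(S_{N,\ell-1}) \le 2^{\chi(S_{N,\ell})}$ (lower bounds by upper bounds!).
  - Given $\chi: S_{N,\ell} \to [c]$, let $\chi': S_{N,\ell-1} \to 2^{[c]}$ be: $\chi'(i_1, \ldots, i_{\ell-1}) = \{\chi(i_1, \ldots, i_{\ell-1}, i_\ell) : i_\ell > i_{\ell-1}\}$.
  - Claim: $\chi'(i_1, \ldots, i_{\ell-1}) \ne \chi'(i_2, \ldots, i_\ell)$.
  - Proof: $\chi(i_1, \ldots, i_\ell) \in \chi'(i_1, \ldots, i_{\ell-1})$. But $\chi(i_1, \ldots, i_\ell) \ne \chi(i_2, \ldots, i_{\ell+1})$ for every $i_{\ell+1} > i_\ell$, so $\chi(i_1, \ldots, i_\ell) \notin \chi'(i_2, \ldots, i_\ell)$.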
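Linial's lifting argument can be executed on a small instance. This sketch starts from a proper coloring of $S_{n,3}$, obtained by coloring each triple by the "highest differing bit" color of its first pair. This choice is ours for illustration and is proper because the prefix map is a homomorphism. The sketch applies $\chi \mapsto \chi'$ twice and confirms each result is proper. In particular the final coloring of $S_{n,1}$, the complete graph, is injective, which is exactly the source of the bound $n \le 2^{2^{c}}$. The helper names are hypothetical.

```python
from itertools import combinations


def pair_color(i, j):
    """Proper coloring of S_{n,2}: highest bit where i-1 and j-1 differ."""
    return ((i - 1) ^ (j - 1)).bit_length() - 1


def base_coloring(n, k):
    """A proper coloring of S_{n,k} (k >= 2): color a tuple by its first pair."""
    return {v: pair_color(v[0], v[1]) for v in combinations(range(1, n + 1), k)}


def linial_lift(chi, n, k):
    """chi on S_{n,k} -> chi' on S_{n,k-1}: the set of colors of all one-step extensions."""
    return {u: frozenset(chi[u + (j,)] for j in range(u[-1] + 1, n + 1))
            for u in combinations(range(1, n + 1), k - 1)}


def is_proper(chi, n, k):
    return all(chi[v] != chi[v[1:] + (j,)]
               for v in combinations(range(1, n + 1), k)
               for j in range(v[-1] + 1, n + 1))


n = 12
chi3 = base_coloring(n, 3)
chi2 = linial_lift(chi3, n, 3)  # colors: subsets of chi3's palette
chi1 = linial_lift(chi2, n, 2)  # colors: sets of such subsets
print(len(set(chi3.values())), len(set(chi2.values())), len(set(chi1.values())))
```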
Conclusion
- Compression ⇔ (Uncertain) Graph Coloring.
- Unfortunately the latter is hard! (Not only to solve optimally, but also to understand analytically.)
- Intriguing special case: $W_N$ with $|A_i| \le 3i$ (linear, not exponential, growth). Is $\chi(W_N) = O(1)$?
- Fundamental underlying question: Is entropy the correct measure of natural compressibility?
Thank You