Lecture 14: Graph Entropy


15-859: Information Theory and Applications in TCS, Spring 2013
Lecture 14: Graph Entropy (March 19, 2013)
Lecturer: Mahdi Cheraghchi
Scribe: Euiwoong Lee

1 Recap

- Bregman's bound on the permanent
- Shearer's Lemma
- The number of triangles in a graph with $\ell$ edges

2 Motivation and Definition of Graph Entropy

So far in this course we have studied two aspects of coding theory: source coding and channel coding. Graph entropy can be thought of as a combinatorial extension of source coding. Suppose that we are given a source which emits one symbol $x \in V$. The source coding theorem says that if the symbols are i.i.d. and the number of symbols is large, it is possible to achieve rate $H(X)$, and this is the best one can hope for. This result is based on the requirement that whenever we have two sequences of symbols $(x_1, \dots, x_t)$ and $(y_1, \dots, y_t)$ which differ in at least one symbol, the encoder must assign them different codewords; otherwise at least one of them cannot be recovered. What happens if we relax this strict requirement and allow some confusion (i.e., it is okay to use the same codeword for certain pairs of strings)? As the requirement is relaxed, we might hope for a better rate. Graph entropy studies this question by representing such requirements by graphs.

2.1 1-symbol Case

We still have a source that emits a symbol in $V$, and a graph $G = (V, E)$ such that $\{a, b\} \in E$ if $a$ and $b$ must be distinguished. This graph represents the requirement that for any encoder $\mathrm{Enc} : V \to \{0,1\}^R$,

$$\{a, b\} \in E \implies \mathrm{Enc}(a) \neq \mathrm{Enc}(b).$$

How small can $R$ be in this setting? This setting is exactly the well-studied graph (vertex) coloring problem, where the goal is to color each vertex so that no edge has both endpoints of the same color (each color corresponds to a codeword). Let $\chi(G)$ be the minimum number of colors needed for $G$. Then the best rate is $R = \lceil \log \chi(G) \rceil$. If $G = K_n$, which means every pair of symbols must be distinguished, $\chi(G) = n$ and $R_{\mathrm{OPT}} = \lceil \log n \rceil$.
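To make the one-shot setting concrete, here is a minimal brute-force sketch (not from the notes; the helper and the 5-cycle instance are our own) that computes $\chi(G)$ for a tiny graph and the resulting optimal codeword length $\lceil \log_2 \chi(G) \rceil$:

```python
from itertools import product
from math import ceil, log2

def chromatic_number(vertices, edges):
    """Smallest k such that the vertices can be properly k-colored
    (no edge monochromatic); each color class shares one codeword."""
    for k in range(1, len(vertices) + 1):
        for assignment in product(range(k), repeat=len(vertices)):
            color = dict(zip(vertices, assignment))
            if all(color[a] != color[b] for a, b in edges):
                return k

# Toy instance: the 5-cycle C5 needs chi = 3 colors, so R = ceil(log2 3) = 2
# bits suffice, versus ceil(log2 5) = 3 bits if all pairs were edges (K5).
V = [0, 1, 2, 3, 4]
E = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
chi = chromatic_number(V, E)
print(chi, ceil(log2(chi)))  # prints: 3 2
```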

2.2 Multi-symbol Case

We now assume that the source emits $t$ i.i.d. symbols, each according to a distribution $p$ on $V$.

Definition 2.1. $(x_1, \dots, x_t)$ is distinguishable from $(y_1, \dots, y_t)$ if $\exists i \in [t]$ such that $\{x_i, y_i\} \in E$.

Let $G^t = (V^t, E^t)$ where

$$V^t = \{(v_1, \dots, v_t) : v_i \in V\}, \qquad \{(v_1, \dots, v_t), (w_1, \dots, w_t)\} \in E^t \iff \exists i \text{ such that } \{v_i, w_i\} \in E.$$

We can see that $(v_1, \dots, v_t)$ and $(w_1, \dots, w_t)$ are distinguishable exactly when $\{(v_1, \dots, v_t), (w_1, \dots, w_t)\} \in E^t$. Let $p^t(v_1, \dots, v_t) = \prod_{i \in [t]} p(v_i)$ be the probability of $(v_1, \dots, v_t)$. As in the original source coding theorem, we might decide to ignore a small fraction of vertices according to this distribution and color the rest of the graph with a small number of colors. Asymptotically, we take $t \to \infty$ and allow an error parameter $\epsilon$. If $\epsilon = 0$ (i.e., an error-free code), the best achievable rate is

$$\lim_{t \to \infty} \frac{\log \chi(G^t)}{t}.$$

If $\epsilon > 0$, we define the entropy of $G$ as the best achievable rate allowing error $\epsilon$, namely

$$H(G, p) = \lim_{t \to \infty} \; \min_{\substack{U \subseteq V^t \\ p^t(U) \geq 1 - \epsilon}} \frac{\log \chi(G^t(U))}{t}$$

where $G^t(U)$ is the subgraph of $G^t$ induced by $U$. Körner, who introduced this definition, proved that:

1. The limit exists.

2. The limit is independent of $\epsilon \in (0, 1)$.

3. $H(G, p) = \min_{(X, Y)} I(X; Y)$, where $X \in V$ is a random vertex whose marginal distribution is $p$, and $Y \subseteq V$ is a random independent set of vertices such that $X \in Y$ always. ($Y$ is an independent set if for all $v, v' \in Y$, $\{v, v'\} \notin E$.)

Note that 3 implies 1 and 2. One rough intuition is that any coloring of $G$ partitions $V$ into independent sets, and as we use fewer colors, the size of each independent set gets larger. A coloring naturally defines a joint distribution $(X, Y)$: pick $X \in V$ according to $p$, and let $Y$ be the set of vertices with the same color as $X$. Since $I(X; Y) = H(X) - H(X \mid Y)$ gets smaller as the size of $Y$ increases, this roughly explains how coloring is related to $I(X; Y)$.

3 Examples of Graph Entropy

From now on, $p$ is the uniform distribution on $V$; in this case we write $H(G)$ for $H(G, \text{uniform})$. To prove an upper bound on $H(G)$, it is enough to exhibit a joint distribution $(X, Y)$ such that $I(X; Y)$ is small.
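For intuition on item 3, the following short sketch (our own illustration, assuming the uniform distribution) evaluates the upper bound $I(X; Y)$ induced by a proper coloring, where $Y$ is taken to be the color class of $X$:

```python
from math import log2

def coloring_bound(color_classes):
    """I(X;Y) = H(X) - H(X|Y) for X uniform over all vertices and Y the
    color class containing X: log2(n) - sum_j (|C_j|/n) * log2(|C_j|)."""
    n = sum(len(c) for c in color_classes)
    return log2(n) - sum(len(c) / n * log2(len(c)) for c in color_classes)

# Proper 3-coloring of the 5-cycle: classes {0,2}, {1,3}, {4}. The bound
# log2(5) - 4/5 ≈ 1.52 already beats the naive log2(chi) = log2(3) ≈ 1.58,
# so H(C5) <= 1.52; the true minimum over all (X, Y) can be smaller still.
print(coloring_bound([[0, 2], [1, 3], [4]]))
```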

3.1 Empty Graph

In a graph with no edges, $Y$ can always be all of $V$, regardless of $X$. Then

$$H(G) \leq I(X; Y) \leq H(Y) = 0.$$

Since $H(G) \geq 0$ by definition, $H(G) = 0$.

3.2 Complete Graph

In the complete graph $K_n$, given $X$, we must have $Y = \{X\}$, since it is the only independent set that contains $X$. This unique distribution gives

$$H(G) = I(X; Y) = H(X) - H(X \mid Y) = H(X) = \log n.$$

3.3 Bipartite and r-partite Graphs

Suppose we have a complete bipartite graph $K_{m,n}$ with parts $A$ and $B$ such that $|A| = m$, $|B| = n$. Given $X$, we take $Y = A$ if $X \in A$, and $Y = B$ if $X \in B$. Using this joint distribution,

$$H(G) \leq I(X; Y) = H(X) - H(X \mid Y) = \log(m+n) - \frac{m}{m+n} \log m - \frac{n}{m+n} \log n = h\!\left(\frac{m}{m+n}\right)$$

where $h$ is the binary entropy function. On the other hand, since every independent set of $K_{m,n}$ lies entirely within $A$ or within $B$, for any joint distribution $(X, Y)$ we must have $Y \subseteq A$ if $X \in A$, and $Y \subseteq B$ if $X \in B$. Therefore,

$$H(X \mid Y) \leq \Pr[X \in A] \log |A| + \Pr[X \in B] \log |B| = \frac{m}{m+n} \log m + \frac{n}{m+n} \log n.$$

This shows that $H(G) \geq h\!\left(\frac{m}{m+n}\right)$, and therefore $H(G) = h\!\left(\frac{m}{m+n}\right)$.

More generally, for the complete $r$-partite graph with $V = [n] \times [r]$ and $E = \{\{(i, j), (k, l)\} : j \neq l\}$, the same argument shows that $H(G) = \log r$. The complete bipartite graph with $m = n$ is the special case $H(G) = h(1/2) = \log 2 = 1$.
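Returning to the bipartite case, as a numeric sanity check (our own, with $m = 3$, $n = 5$), the explicit mutual information and the closed form $h(m/(m+n))$ agree:

```python
from math import log2

def h(q):
    """Binary entropy in bits."""
    return -q * log2(q) - (1 - q) * log2(1 - q)

m, n = 3, 5
i_xy = log2(m + n) - (m / (m + n)) * log2(m) - (n / (m + n)) * log2(n)
print(i_xy, h(m / (m + n)))  # both ≈ 0.9544
```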

4 Properties of Graph Entropy

4.1 Subadditivity

Lemma 4.1. Let $G_1 = (V, E_1)$, $G_2 = (V, E_2)$ and $G = (V, E_1 \cup E_2)$. Then $H(G) \leq H(G_1) + H(G_2)$.

Proof. Take a joint distribution $(X, Y_1, Y_2)$ such that

$$H(G_1) = I(X; Y_1), \qquad H(G_2) = I(X; Y_2),$$

and such that $Y_1$ and $Y_2$ are independent conditioned on $X$ (couple the two optimal pairs through $X$). The set $Y_1 \cap Y_2$ is independent in $G$ (an edge of $E_1 \cup E_2$ inside it would violate the independence of $Y_1$ or of $Y_2$), and it contains $X$. Therefore $(X, Y_1 \cap Y_2)$ is a valid distribution for $G$, and

$$
\begin{aligned}
H(G) &\leq I(X; Y_1 \cap Y_2) \\
&\leq I(X; Y_1, Y_2) && \text{(data processing inequality)} \\
&= H(Y_1, Y_2) - H(Y_1, Y_2 \mid X) \\
&= H(Y_1, Y_2) - H(Y_1 \mid X) - H(Y_2 \mid X) && \text{($Y_1 \perp Y_2$ conditioned on $X$)} \\
&\leq H(Y_1) + H(Y_2) - H(Y_1 \mid X) - H(Y_2 \mid X) \\
&= H(G_1) + H(G_2).
\end{aligned}
$$
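The only graph-theoretic fact used above is that $Y_1 \cap Y_2$ is independent in $G$; the brute-force check below (a toy pair of edge sets of our own choosing, not from the notes) confirms it exhaustively on a small example:

```python
from itertools import combinations

def is_independent(S, edges):
    """True if no edge has both endpoints inside S."""
    return all(not (a in S and b in S) for a, b in edges)

def independent_sets(vertices, edges):
    """All independent sets of the graph, including the empty set."""
    for r in range(len(vertices) + 1):
        for S in combinations(vertices, r):
            if is_independent(set(S), edges):
                yield set(S)

V = range(5)
E1 = [(0, 1), (2, 3)]   # edge set of G1
E2 = [(1, 2), (3, 4)]   # edge set of G2

# For every independent Y1 of G1 and Y2 of G2, Y1 ∩ Y2 is
# independent in G = (V, E1 ∪ E2).
assert all(
    is_independent(Y1 & Y2, E1 + E2)
    for Y1 in independent_sets(V, E1)
    for Y2 in independent_sets(V, E2)
)
print("Y1 ∩ Y2 is independent in (V, E1 ∪ E2) in every case")
```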

4.2 Monotonicity

Lemma 4.2. Let $G = (V, E)$ and $F = (V, E')$ with $E \subseteq E'$. Then $H(G) \leq H(F)$.

Proof. Since $G$ has fewer edges (less strict requirements) than $F$, any $(X, Y)$ achieving $H(F)$ is feasible for $H(G)$.

4.3 Disjoint Union

Lemma 4.3. Let $G_1, \dots, G_k$ be the connected components of $G$ and $\rho_i := |V(G_i)| / |V(G)|$. Then

$$H(G) = \sum_{i \in [k]} \rho_i H(G_i).$$

Proof. First we show that $H(G) \geq \sum_i \rho_i H(G_i)$. Take a joint distribution $(X, Y)$ such that $H(G) = I(X; Y)$, and let $Y_i = Y \cap V(G_i)$; note that $Y$ determines the tuple $(Y_1, \dots, Y_k)$ and vice versa. Define $l : V(G) \to [k]$ such that $l(x) = i$ iff $x \in V(G_i)$. Then

$$
\begin{aligned}
H(G) &= I(X; Y_1, \dots, Y_k) \\
&= I(X, l(X); Y_1, \dots, Y_k) && \text{($X$ determines $(X, l(X))$)} \\
&= I(l(X); Y_1, \dots, Y_k) + I(X; Y_1, \dots, Y_k \mid l(X)) && \text{(chain rule)} \\
&\geq \sum_{i \in [k]} \Pr[l(X) = i] \, I(X; Y_1, \dots, Y_k \mid l(X) = i) && \text{(drop the first term; expand the second)} \\
&= \sum_{i \in [k]} \rho_i \bigl( I(X; Y_i \mid l(X) = i) + I(X; Y_1, \dots, Y_{i-1}, Y_{i+1}, \dots, Y_k \mid l(X) = i, Y_i) \bigr) && \text{(chain rule; $\Pr[l(X) = i] = \rho_i$ as $p$ is uniform)} \\
&\geq \sum_{i \in [k]} \rho_i \, I(X; Y_i \mid l(X) = i) && \text{(ignore the second term)} \\
&\geq \sum_{i \in [k]} \rho_i H(G_i) && \text{(definition of $H(G_i)$)}
\end{aligned}
$$

which completes the proof that $H(G) \geq \sum_i \rho_i H(G_i)$.

For the other direction, let $p_i$ be a joint distribution $(X, Y_i)$ that achieves $H(G_i) = I(X; Y_i)$. We define a joint distribution $(X, Y)$ as follows:

1. Pick $Y_1, \dots, Y_k$ independently according to $p_1, \dots, p_k$.

2. Pick $i \in [k]$ with probability $\rho_i$.

3. Sample $X$ according to $p_i(X \mid Y_i)$, and set $Y = Y_1 \cup \dots \cup Y_k$ (an independent set in $G$ containing $X$).

We want to show that $I(X; Y) = \sum_i \rho_i H(G_i)$. We reuse the same chain of (in)equalities; we only need to check that the three inequalities above hold with equality.

1. We chose $i = l(X)$ independently from $Y_1, \dots, Y_k$, so $I(l(X); Y_1, \dots, Y_k) = 0$ and the first inequality holds with equality.

2. Our choice of $X$ only depends on $i$ and $Y_i$, so $I(X; Y_1, \dots, Y_{i-1}, Y_{i+1}, \dots, Y_k \mid l(X) = i, Y_i) = 0$ and the second inequality holds with equality.

3. By the choice of $p_i$, $I(X; Y_i) = H(G_i)$ for each $i$.

Therefore $H(G) \leq I(X; Y) = \sum_i \rho_i H(G_i)$. Together with the lower bound above, we conclude that $H(G) = \sum_i \rho_i H(G_i)$.
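For a concrete instance of Lemma 4.3 (our own toy numbers), take $G$ to be the disjoint union of a $K_2$ and a $K_3$; by Section 3.2, $H(K_n) = \log n$, so:

```python
from math import log2

# Components: one K_2 and one K_3, so rho = (2/5, 3/5) and H(K_k) = log2(k).
sizes = [2, 3]
total = sum(sizes)
H = sum(k / total * log2(k) for k in sizes)
print(H)  # (2/5)*1 + (3/5)*log2(3) ≈ 1.351
```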