The Logic of Partitions with an application to Information Theory


1 The Logic of Partitions with an application to Information Theory
David Ellerman, University of California at Riverside and WWW.Ellerman.org

2 Outline of Talk
The logic of partitions is dual to the ordinary logic of subsets, where a partition on a set U is a mutually exclusive and jointly exhaustive set of subsets of U, a.k.a. an equivalence relation on U.
Sketch of the logic of partitions: connectives, lattices with implication, tautologies, and representation.
Table of analogies between the two logics.
Logic of subsets + normalized size of subsets (finite universe) = finite probability theory, which generalizes to probability theory.
Logic of partitions + normalized size of partition (finite universe) = finite logical information theory, which generalizes to information theory.
Three precisely related notions of the information content or entropy of a partition π:
Logical entropy: h(π) = Σ_i p_i(1 − p_i),
Block-count entropy: H_m(π) = Π_i (1/p_i)^p_i, and
Shannon's entropy: H(π) = Σ_i p_i log₂(1/p_i).

3 Partitions dual to Subsets
Category Theory (CT) duality: monomorphisms (e.g., injective maps between sets) are dual to epimorphisms (e.g., surjective maps between sets).
CT duality gives the subset-partition duality: a monomorphism determines a subset of its codomain (its image); an epimorphism determines a partition of its domain (the inverse-image partition).
In categorical logic, subsets generalize to subobjects or "parts". "The dual notion (obtained by reversing the arrows) of 'part' is the notion of partition." (Lawvere)

4 Lattices of Subsets and of Partitions
Given a universe set U, there is the lattice of subsets P(U) with inclusion as the partial ordering and the usual union and intersection, enriched with implication: A ⇒ B = Aᶜ ∪ B.
Given a universe set U, there is the lattice of partitions Π(U), enriched by implication, where refinement is the partial ordering. Given partitions π = {B} and σ = {C}, σ is refined by π, written σ ≼ π, if for every block B ∈ π there is a block C ∈ σ such that B ⊆ C.
Join of π = {B} and σ = {C} is the partition π ∨ σ whose blocks are the non-empty intersections B ∩ C.
Meet π ∧ σ: define an undirected graph on U with a link between u and u' if they are in the same block of π or of σ. The connected components of the graph are the blocks of the meet.
Implication σ ⇒ π is the partition that is like π except that any block B ∈ π contained in some block C ∈ σ is discretized (replaced by its singletons).
Top = 1 = {{u} : u ∈ U}, the discrete partition; Bottom = 0 = {U}, the indiscrete partition or "blob".
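A minimal Python sketch of these block-level operations may help fix the definitions; the set U, the example partitions, and the function names below are illustrative assumptions, not part of the talk.

```python
# Sketch of the partition operations on a finite set U, with partitions
# represented as frozensets of frozensets (blocks).

def join(pi, sigma):
    """Join: blocks are the non-empty intersections B ∩ C."""
    return frozenset(B & C for B in pi for C in sigma if B & C)

def meet(pi, sigma, U):
    """Meet: connected components of the graph linking u, u' that share
    a block of pi or of sigma (union-find over U)."""
    parent = {u: u for u in U}
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    for block in list(pi) + list(sigma):
        block = list(block)
        for u in block[1:]:
            parent[find(block[0])] = find(u)
    comps = {}
    for u in U:
        comps.setdefault(find(u), set()).add(u)
    return frozenset(frozenset(c) for c in comps.values())

def implies(sigma, pi):
    """sigma => pi: like pi, except any block B of pi contained in some
    block C of sigma is replaced by singletons (discretized)."""
    blocks = set()
    for B in pi:
        if any(B <= C for C in sigma):
            blocks.update(frozenset({u}) for u in B)
        else:
            blocks.add(B)
    return frozenset(blocks)

# Example on U = {0, 1, 2, 3}
U = {0, 1, 2, 3}
pi = frozenset({frozenset({0, 1}), frozenset({2, 3})})
sigma = frozenset({frozenset({0, 2}), frozenset({1, 3})})
print(join(pi, sigma))        # discrete partition: four singletons
print(meet(pi, sigma, U))     # indiscrete partition: {U}
print(implies(sigma, pi))     # pi unchanged: no block of pi lies in a block of sigma
```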

5 Tautologies in subset and partition logics
Taking ∨, ∧, and ⇒ as the primitive connectives with a constant ⊥ (with ¬A = A ⇒ ⊥), a classical or "subset" tautology is any formula which evaluates to U in the enriched lattice of subsets P(U) regardless of which subsets are assigned to the atomic variables (with ∅, the null set, always assigned to ⊥).
With the same definition of formulas (i.e., ∨, ∧, ⇒, and ⊥), a partition tautology is any formula which evaluates to 1 (the discrete partition) in the enriched lattice of partitions Π(U) regardless of which partitions are assigned to the atomic variables (with 0, the blob, always assigned to ⊥).
Classically, it suffices to take U = {*}, so P(U) = 2 = {∅, {*}} as in the usual truth tables with values 0 and 1. For |U| = 2 (any two-element set), Π(U) = {0, 1} (indiscrete and discrete partitions) and the partition operations behave classically. Therefore:
Theorem: Every partition tautology is a classical tautology.

6 B_π = {σ ⇒ π : σ ∈ Π(U)} is a Boolean algebra
All the elements with a fixed consequent π, i.e., the partitions σ ⇒ π, form a Boolean algebra B_π under the partition operations with π as bottom and 1 as top. Think of σ ⇒ π as the π-negation of σ.
B_π ≅ the powerset Boolean algebra on the set of non-singleton blocks of π.
π-double-negation transform: every formula φ in the language of ∨, ∧, ⇒, and ⊥ maps into a formula φ^π in B_π by applying the π-double-negation mapping σ ↦ (σ ⇒ π) ⇒ π to each atomic variable (and by mapping ⊥ ↦ π). There is no change in the connectives, so φ^π is a substitution instance of φ.
Theorem: If φ is a classical tautology, then φ^π is a partition tautology.
Example: the Law of Excluded Middle φ = σ ∨ ¬σ = σ ∨ (σ ⇒ ⊥) transforms to the partition tautology ((σ ⇒ π) ⇒ π) ∨ (σ ⇒ π) [since (((σ ⇒ π) ⇒ π) ⇒ π) = σ ⇒ π], which is not intuitionistically valid.

7 Representation in a Closure Space: U × U
Build a representation of the partition lattice Π(U) using 'open' subsets of U × U.
Associate with a partition π the subset of distinctions made by π: dit(π) = {(u,u') : u and u' in distinct blocks of π}.
Closed subsets of U² are the reflexive-symmetric-transitive (rst) closed subsets, i.e., the equivalence relations on U. Open subsets are their complements, which are precisely the dit-sets dit(π) of partitions.
For S ⊆ U × U, the closure cl(S) is the rst closure of S. The interior is Int(S) = (cl(Sᶜ))ᶜ where Sᶜ = U² − S is the complement.
The closure operation is not topological: cl(S) ∪ cl(T) is not necessarily closed.

8 Dit-set Representation of Π(U)
Translating "distinctions talk" into "elements talk".
Representation: π ↦ dit(π), i.e., u and u' are distinguished by π means (u,u') is an element of dit(π).
Refinement ordering of partitions = inclusion ordering of dit-sets: σ ≼ π iff dit(σ) ⊆ dit(π).
Join of partitions is represented by the union of dit-sets: dit(π ∨ σ) = dit(π) ∪ dit(σ).
Meet of partitions is represented by the interior of the intersection of dit-sets: dit(π ∧ σ) = Int(dit(π) ∩ dit(σ)).
Implication of partitions is represented by: dit(σ ⇒ π) = Int(dit(σ)ᶜ ∪ dit(π)).
Top 1 = {{u} : u ∈ U} is represented by dit(1) = U² − Δ (Δ the diagonal).
Bottom 0 = {U} (the "blob") is represented by dit(0) = ∅.
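The dit-set representation can also be checked mechanically. The following sketch is an illustration (the helper names and the small example are assumptions): it computes dit(π), the rst-closure interior, and verifies dit(π ∨ σ) = dit(π) ∪ dit(σ) and dit(σ ⇒ π) = Int(dit(σ)ᶜ ∪ dit(π)) on a four-element universe.

```python
from itertools import product

def dit(part, U):
    """Distinctions of a partition: ordered pairs lying in distinct blocks."""
    block_of = {u: B for B in part for u in B}
    return {(u, v) for u, v in product(U, U) if block_of[u] != block_of[v]}

def rst_closure(S, U):
    """Reflexive-symmetric-transitive closure of S (an equivalence relation)."""
    R = set(S) | {(u, u) for u in U} | {(v, u) for u, v in S}
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(R), list(R)):
            if b == c and (a, d) not in R:
                R.add((a, d))
                changed = True
    return R

def interior(S, U):
    full = set(product(U, U))
    return full - rst_closure(full - S, U)

def join(pi, sigma):
    return frozenset(B & C for B in pi for C in sigma if B & C)

def implies(sigma, pi):
    return frozenset(
        blk for B in pi
        for blk in ([frozenset({u}) for u in B] if any(B <= C for C in sigma) else [B])
    )

U = frozenset({0, 1, 2, 3})
pi = frozenset({frozenset({0, 1}), frozenset({2, 3})})
sigma = frozenset({frozenset({0, 1, 2}), frozenset({3})})

full = set(product(U, U))
assert dit(join(pi, sigma), U) == dit(pi, U) | dit(sigma, U)
assert dit(implies(sigma, pi), U) == interior((full - dit(sigma, U)) | dit(pi, U), U)
print("dit-set identities verified on this example")
```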

9 Why previous attempts to dualize logic were stymied
In categorical logic, the algebras of subobjects are Heyting algebras, which nicely dualize to co-Heyting algebras (which have "difference" instead of "implication"), e.g., the algebras of closed subsets of topological spaces. But co-Heyting algebras are distributive, while lattices of partitions are not distributive, quite aside from any definition of "difference" or "implication."
Additional keys:
A proper definition of implication for partitions (or, equivalently, of difference for equivalence relations), particularly the "set-of-blocks" definition;
The key to partition semantics is seeing that "a partition making a distinction" is dual to "a subset containing an element";
Seeing that partitions as binary relations [i.e., dit(π)] are complementary to equivalence relations [U × U − dit(π)], like open and closed subsets in a topological space, so the lattice of partitions is the opposite of the lattice of equivalence relations, and implication rather than difference is used to enrich the lattice.

10 Truth tables in Subset Logic
Ordinary truth tables can be presented as a subset semantics for propositional formulas. Think of A and B as subsets (or as propositional variables interpreted as subsets) and u as a generic element. Then write u ∉ A as 0_A and u ∈ A as 1_A, etc., to get the ordinary truth table (which leaves off the subscripts).
Truth table for A ∪ B:
u ∈ A?    u ∈ B?    u ∈ A ∪ B?
0_A       0_B       0_{A∪B}
0_A       1_B       1_{A∪B}
1_A       0_B       1_{A∪B}
1_A       1_B       1_{A∪B}

11 Partition Logic "Dit Table" for the Join
Partition "truth tables" can be presented as a partition semantics for propositional formulas. Think of π and σ as partitions and (u,u') as a generic distinction. Then write (u,u') ∉ dit(π) as 0_B where u, u' ∈ B, and (u,u') ∈ dit(π) as 1_{B,B'} where u ∈ B and u' ∈ B', etc.
Dit table for π ∨ σ (with u ∈ B, u' ∈ B' in π and u ∈ C, u' ∈ C' in σ):
(u,u') ∈ dit(π)?   (u,u') ∈ dit(σ)?   (u,u') ∈ dit(π ∨ σ)?
0_B                0_C                0_{B∩C}
0_B                1_{C,C'}           1_{B∩C, B∩C'}
1_{B,B'}           0_C                1_{B∩C, B'∩C}
1_{B,B'}           1_{C,C'}           1_{B∩C, B'∩C'}

12 Dit Table for the Implication
Recall that σ ⇒ π is like π except that any block B ∈ π contained in some block C ∈ σ is discretized.
Dit table for σ ⇒ π (with u ∈ B, u' ∈ B' in π and u ∈ C, u' ∈ C' in σ):
(u,u') ∈ dit(σ)?   (u,u') ∈ dit(π)?   Condition     (u,u') ∈ dit(σ ⇒ π)?
0_C                0_B                B ⊆ C         1 (B is discretized)
0_C                0_B                B ⊄ C         0_B
0_C                1_{B,B'}                         1_{B,B'}
1_{C,C'}           0_B                              0_B
1_{C,C'}           1_{B,B'}                         1_{B,B'}

13 Dit table for the π-transform of Excluded Middle
Dit table for ((σ ⇒ π) ⇒ π) ∨ (σ ⇒ π): evaluating the subformulas case by case under the block conditions (B ⊆ C or B ⊄ C), the disjunction distinguishes every pair (u,u') in every row, so the formula always evaluates to the discrete partition 1, i.e., it is a partition tautology.

14 Peirce's Law is not a partition tautology
Peirce's Law: ((π ⇒ σ) ⇒ π) ⇒ π.
Dit table for Peirce's Law ((π ⇒ σ) ⇒ π) ⇒ π: running through the four combinations of the block conditions B ⊆ C, C ⊆ B and their negations, there are rows in which the distinction (u,u') is not made, so the formula need not evaluate to 1.
Counter-example: σ = {{u,u',u''}} = 0; π = {{u,u'},{u''}}; π ⇒ σ = σ = 0; (π ⇒ σ) ⇒ π = 0 ⇒ π = 1; and ((π ⇒ σ) ⇒ π) ⇒ π = 1 ⇒ π = π ≠ 1.
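The counter-example can be replayed with the block-level implication rule; the following small sketch is illustrative (the element names and the implies helper are assumptions).

```python
# sigma => pi is pi with every block of pi that lies inside a block of sigma
# replaced by singletons (discretized).

def implies(sigma, pi):
    out = set()
    for B in pi:
        if any(B <= C for C in sigma):
            out.update(frozenset({u}) for u in B)
        else:
            out.add(B)
    return frozenset(out)

U = {"u", "u2", "u3"}
discrete = frozenset(frozenset({x}) for x in U)              # top 1
sigma = frozenset({frozenset(U)})                            # blob 0
pi = frozenset({frozenset({"u", "u2"}), frozenset({"u3"})})

step1 = implies(pi, sigma)     # pi => sigma = sigma
step2 = implies(step1, pi)     # (pi => sigma) => pi = 1
peirce = implies(step2, pi)    # ((pi => sigma) => pi) => pi = pi, not 1
print(step2 == discrete, peirce == pi, peirce == discrete)   # True True False
```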

15 Table of Analogies
(Subsets | Partitions)
Atoms: Elements u ∈ U | Distinctions (u,u') ∈ (U × U) − Δ
All atoms: Universe set U | Discrete partition 1 (all dits)
No atoms: Empty set ∅ | Indiscrete partition 0 (no dits)
Model of a proposition: Subset S ⊆ U | Partition π on U
Model of an individual: Element u | Distinction (u,u')
Proposition holds: Element u ∈ S | Partition π distinguishes (u,u')
Lattice of propositions: Subset lattice P(U) | Partition lattice Π(U)
Counting (U finite): # elements in S | # distinctions of π
Normalized count: Prob(S) = |S|/|U| | h(π) = |dit(π)|/|U × U|
Prob. interpretation: Prob(S) = probability a random element is in S | h(π) = probability a random pair is distinguished by π

16 From Subset Logic to Finite Probability Theory
U is now a finite "sample space". Subsets A, B ⊆ U are "events". Elements u ∈ U are "outcomes".
With the Laplacian assumption of equiprobable outcomes, Prob(A) = |A|/|U|.
|A| + |B| = |A ∪ B| + |A ∩ B|, so normalizing yields: Prob(A) + Prob(B) = Prob(A ∪ B) + Prob(A ∩ B).

17 From Partition Logic to Logical Information Theory
The finite space of pairs U × U is analogous to U. A partition π on U is analogous to an event A ⊆ U. Pairs (u,u') ∈ U × U are analogous to outcomes.
Under the Laplacian assumption of equiprobable pairs (i.e., two random draws with replacement), h(π) = |dit(π)|/|U × U| = logical entropy of π = probability that a random pair is distinguished by π.
Where the mutual logical information is mut(π,σ) = |dit(π) ∩ dit(σ)|/|U|², we have |dit(π)| + |dit(σ)| = |dit(π ∨ σ)| + |dit(π) ∩ dit(σ)|, so normalizing: h(π) + h(σ) = h(π ∨ σ) + mut(π,σ).
Since dit(π ∧ σ) = Int(dit(π) ∩ dit(σ)) ⊆ dit(π) ∩ dit(σ), we get the submodular inequality: h(π) + h(σ) ≥ h(π ∨ σ) + h(π ∧ σ).
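A numeric check of the inclusion-exclusion identity h(π) + h(σ) = h(π ∨ σ) + mut(π,σ) using dit-set counts; the universe and partitions below are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def dit(part, U):
    block_of = {u: B for B in part for u in B}
    return {(u, v) for u, v in product(U, U) if block_of[u] != block_of[v]}

def join(pi, sigma):
    return frozenset(B & C for B in pi for C in sigma if B & C)

U = frozenset(range(6))
pi = frozenset({frozenset({0, 1, 2}), frozenset({3, 4, 5})})
sigma = frozenset({frozenset({0, 3}), frozenset({1, 4}), frozenset({2, 5})})

n2 = len(U) ** 2
h = lambda part: Fraction(len(dit(part, U)), n2)          # logical entropy
mut = Fraction(len(dit(pi, U) & dit(sigma, U)), n2)       # mutual logical info

print(h(pi) + h(sigma) == h(join(pi, sigma)) + mut)       # True
```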

18 From Partitions to Finite Probability Distributions
Let p_B = |B|/|U| = probability that a random draw is from block B, so {p_B}_{B∈π} is a finite probability distribution.
Then h(π) = |dit(π)|/|U × U| = Σ{|B| |B'| : B ≠ B'}/|U|² = Σ{p_B p_B' : B ≠ B'} = Σ_B p_B(1 − p_B) = 1 − Σ_B p_B².
Hence if p = {p_1, ..., p_n} is any finite probability distribution, then the logical entropy of the probability distribution is: h(p) = 1 − Σ_i p_i².
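The equality of the dit-count and block-probability forms is easy to verify on an example; this sketch (with an assumed partition of a 10-element set) checks h(π) = |dit(π)|/|U|² = 1 − Σ_B p_B².

```python
from fractions import Fraction
from itertools import product

U = list(range(10))
pi = [{0, 1, 2, 3}, {4, 5, 6}, {7, 8}, {9}]       # a partition of U

block_of = {u: i for i, B in enumerate(pi) for u in B}
dits = sum(1 for u, v in product(U, U) if block_of[u] != block_of[v])

h_from_dits = Fraction(dits, len(U) ** 2)
h_from_probs = 1 - sum(Fraction(len(B), len(U)) ** 2 for B in pi)
print(h_from_dits == h_from_probs, h_from_dits)   # True 7/10
```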

19 History of the Formula h(p) = 1 − Σ_i p_i²
In 1912, Gini defined 1 − Σ_i p_i² as the index of mutability.
In 1922, the cryptographer William Friedman defined Σ_i p_i² as the index of coincidence.
Alan Turing worked at Bletchley Park in WWII on cryptography and defined Σ_i p_i² as the repeat rate.
Turing's assistant, Edward Simpson, published Σ_i p_i² in 1949 as the "index of species concentration", so 1 − Σ_i p_i² is now often called the Simpson index of diversity in the biodiversity literature.
In 1945, Albert Hirschman suggested Sqrt(Σ_i p_i²) as an "index of trade concentration", and a few years later Orris Herfindahl independently defined Σ_i p_i² as an "index of industrial concentration", so in economics Σ_i p_i² is the HH index of concentration.
Let U = {u_1, ..., u_n} with probabilities p = {p_1, ..., p_n} and d_ij = "distance" between u_i and u_j where d_ii = 0. Then C. R. Rao defined in 1982 the quadratic entropy: Q = Σ_{i,j} d_ij p_i p_j. "Logical distance" is d_ij = 1 if i ≠ j and d_ii = 0, and the Rao quadratic entropy with logical distances is the logical entropy h(p) = Σ_{i≠j} p_i p_j = 1 − Σ_i p_i².

20 Shannon's Entropy and Block Entropies
Given a partition π = {B}, we compare:
Shannon's entropy: H(π) = Σ_B p_B log₂(1/p_B),
Logical entropy: h(π) = Σ_B p_B(1 − p_B).
Each formula is an average of block entropies: H(B) = log₂(1/p_B) for Shannon entropy, h(B) = 1 − p_B for logical entropy.
Eliminating p_B gives the block entropy relationship: h(B) = 1 − (1/2^H(B)).
Search interpretation of the Shannon block entropy: the game of "Twenty Questions", i.e., the minimum number of yes-or-no questions necessary to find any designated hidden element (e.g., the "sent message") in a set of like elements.
Example: 2³ = 8 elements, so 3 binary partitions are required to find the hidden element. Code the 8 elements with 3-digit binary numbers and ask 3 questions: what are the 1st, 2nd, and 3rd digits? The probability of each element is p_B = 1/8 and H(B) = log₂(1/p_B) = 3, so H(π) = Σ_B p_B log₂(1/p_B) = 8 · (1/8) · 3 = 3.
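The block-entropy relationship and the 8-element example can be checked directly; the helper names below are assumptions.

```python
import math

def shannon_block(p_B):        # H(B) = log2(1/p_B)
    return math.log2(1 / p_B)

def logical_block(p_B):        # h(B) = 1 - p_B
    return 1 - p_B

p_B = 1 / 8
H_B = shannon_block(p_B)                                  # 3.0
h_B = logical_block(p_B)                                  # 0.875
print(math.isclose(h_B, 1 - 1 / 2 ** H_B))                # True

# Shannon entropy of the partition into 8 equal blocks: 8 * (1/8) * 3 = 3
print(sum(p_B * shannon_block(p_B) for _ in range(8)))    # 3.0
```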

21 Distinctions Interpretation of Shannon's Entropy
The search interpretation for the "hidden element" or "sent message" comes from Shannon's application to communications. There is an alternative "distinctions interpretation": H(B) is the number of binary partitions needed to make all the distinctions in a set of distinct elements.
Equivalence of the search and distinctions interpretations: if a set of binary partitions did not jointly make the distinction (u,u'), then they could not single out the hidden element if it was u or u'. And if the binary partitions make all distinctions, then asking which block of each binary partition contains the hidden element will find it.

22 Picture for the |U| = 8 Example
|U| = 2³ = 8, so an 8 × 8 square is the picture of U × U.
The 1st binary partition has 2 equal blocks, so the equivalence classes are the shaded blocks and the distinctions are the 2 × 16 = 32 unshaded squares.
The 2nd binary partition (joined with the 1st) adds 4 × 4 = 16 new distinctions.
The 3rd binary partition adds 8 × 1 = 8 distinctions.
Total # distinctions (as ordered pairs) = 32 + 16 + 8 = 56.
If π = 1, then H(π) = 3 and h(π) = 56/64 = 1 − (1/2³) = 7/8.
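The counts 32 + 16 + 8 = 56 can be reproduced by joining the three bit partitions of an 8-element set; the encoding by bit positions below is an assumption consistent with the example.

```python
from itertools import product

U = list(range(8))

def dits_of_join(bits):
    """Ordered pairs separated by at least one of the given bit positions."""
    return {(u, v) for u, v in product(U, U)
            if any((u >> b) & 1 != (v >> b) & 1 for b in bits)}

prev = set()
for k in (1, 2, 3):
    cur = dits_of_join(range(k))
    print(f"partition {k}: adds {len(cur - prev)} distinctions, total {len(cur)}")
    prev = cur
# adds 32, then 16, then 8 distinctions; total 56
print("h =", len(prev) / 64)    # 0.875 = 7/8
```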

23 Block-count Entropy
In the base case of |U| = 2³, we have two measures of "information content": Shannon's binary-partition count of 3 and the normalized dit-count of 7/8. But "# elements distinguished" = 8 is also a natural measure.
In general, when an event of probability p_B occurs, its "surprise value" is 1/p_B. For independent events, surprise values multiply, so this suggests defining a new entropy as the multiplicative average of the surprise values:
Block-count Entropy: H_m(π) = Π_B (1/p_B)^p_B.

24 Block-count and Shannon Entropies
For the |U| = 8 example, H_m(1) = (1/(1/8))^(1/8) · (1/(1/8))^(1/8) · ... (8 factors) = [8^(1/8)]⁸ = 8.
H_m(π) = the average number of equal blocks distinguished by π.
In general: H_m(π) = 2^H(π), i.e., the block-count entropy is just the anti-log of the Shannon entropy.
Shannon entropy can also be defined for any other base, such as 3: H₃(π) = Σ_B p_B log₃(1/p_B) (the number of ternary partitions, etc.), or to base e: H_e(π) = Σ_B p_B ln(1/p_B), but in every case the anti-log is the block-count entropy: H_m(π) = 2^H(π) = 3^H₃(π) = e^H_e(π), so the block-count entropy is a base-free notion of Shannon entropy.
The block-count block entropy is H_m(B) = 1/p_B = 2^H(B), so: h(B) = 1 − (1/H_m(B)) = 1 − (1/2^H(B)).
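A quick numeric check that the block-count entropy is the anti-log of the Shannon entropy in any base; the block probabilities below are illustrative.

```python
import math

p = [0.5, 0.25, 0.125, 0.125]     # assumed block probabilities of a partition

H2 = sum(pb * math.log2(1 / pb) for pb in p)      # base-2 Shannon entropy
He = sum(pb * math.log(1 / pb) for pb in p)       # base-e Shannon entropy
Hm = math.prod((1 / pb) ** pb for pb in p)        # block-count entropy

print(Hm, 2 ** H2, math.e ** He)   # all three agree (about 3.3636)
```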

25 Parallel Concepts for Shannon and Logical Entropies
(Shannon Entropy | Logical Entropy)
Block entropy: H(B) = log(1/p_B) | h(B) = 1 − p_B
Entropy: H(π) = Σ_B p_B H(B) | h(π) = Σ_B p_B h(B)
Mutual information: I(π;σ) = H(π) + H(σ) − H(π ∨ σ) | mut(π;σ) = h(π) + h(σ) − h(π ∨ σ)
Independence: I(π;σ) = 0 | mut(π;σ) = h(π)h(σ)
Cross entropy: H(p‖q) = Σ_i p_i log(1/q_i) | h(p‖q) = Σ_i p_i(1 − q_i)
Divergence: D(p‖q) = H(p‖q) − H(p) | d(p‖q) = 2h(p‖q) − h(p) − h(q)
Information inequality: D(p‖q) ≥ 0 with = iff p_i = q_i for all i | d(p‖q) ≥ 0 with = iff p_i = q_i for all i
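Combining the divergence rows of this table with the next appendix table, the logical divergence reduces to a sum of squared differences, d(p‖q) = Σ_i (p_i − q_i)² ≥ 0; the following sketch (with assumed example distributions) verifies this numerically.

```python
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

h = lambda r: sum(ri * (1 - ri) for ri in r)                 # logical entropy
h_cross = sum(pi * (1 - qi) for pi, qi in zip(p, q))         # logical cross entropy

d = 2 * h_cross - h(p) - h(q)
print(d, sum((pi - qi) ** 2 for pi, qi in zip(p, q)))        # both about 0.02
```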

26 3 Ways to Measure All the Distinctions of a Partition
The Shannon base-2 entropy of π is the average number of equal binary partitions needed to make all the distinctions of π.
The block-count entropy of π is the (multiplicative) average number of blocks in an equal-blocked partition needed to make all the distinctions of π.
The logical entropy of π is the normalized count of all the distinctions of π.
The unifying concept of the distinctions of a partition comes from the underlying logic of partitions.

27 The End
"Counting distinctions: on the conceptual foundations of Shannon's information theory." Forthcoming in: Synthese. Now available from Online First at the Synthese site, reached from: http://www.springerlink.com/journals/

28 Appendix: Elements and Distinctions Again
When is a binary relation F ⊆ U × V a function F: U → V?
A relation F preserves elements if for any element u ∈ U, there is an element v ∈ V such that (u,v) ∈ F.
A relation F reflects distinctions if for any pairs (u,v) ∈ F and (u',v') ∈ F, if v ≠ v' then u ≠ u'.
F is a function iff F preserves elements and reflects distinctions.
A function F is injective iff it also preserves distinctions, i.e., if u ≠ u' then F(u) ≠ F(u').
A function F is surjective iff it also reflects elements, i.e., for any element v ∈ V, there is an element u ∈ U such that F(u) = v.
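These four properties are easy to test mechanically; the following sketch (with an assumed example relation F) is illustrative.

```python
def preserves_elements(F, U):          # every u relates to some v
    return all(any(a == u for a, _ in F) for u in U)

def reflects_distinctions(F):          # v != v' implies u != u' (single-valued)
    return all(u != u2 for (u, v) in F for (u2, v2) in F if v != v2)

def preserves_distinctions(F):         # u != u' implies v != v' (injective)
    return all(v != v2 for (u, v) in F for (u2, v2) in F if u != u2)

def reflects_elements(F, V):           # every v is hit by some u (surjective)
    return all(any(w == v for _, w in F) for v in V)

U, V = {1, 2, 3}, {"a", "b"}
F = {(1, "a"), (2, "a"), (3, "b")}
print(preserves_elements(F, U) and reflects_distinctions(F))   # True: F is a function
print(preserves_distinctions(F))                               # False: not injective
print(reflects_elements(F, V))                                 # True: surjective
```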

29 Two ways to create a set U
[Figure: the Null Set building up to the set U; the Blob building up to the discrete partition 1.]
"Subset" creation myth: In the beginning was the void (null set), and then elements were created (fully propertied and distinguished from one another) until a snapshot was taken and the result was a set of elements U.
"Partition" creation myth: In the beginning was the blob, and then distinctions were made until a snapshot was taken and the result was a set of blocks taken to be the discrete partition on a set U (i.e., the singletons of the elements of U).

30 Forcing models in partition logic
A forcing model is given by a forcing relation ⊩ ("forcing" = "is distinguished by") between the off-diagonal pairs of U² of a set U and the partition-logic formulas such that:
for any atomic variable π, (u,u') ⊩ π iff (u',u) ⊩ π;
for any formulas σ and π, (u,u') ⊩ σ ∨ π iff (u,u') ⊩ σ or (u,u') ⊩ π;
for any formulas σ and π, (u,u') ⊩ σ ⇒ π iff not (u,u') ⊩ σ or (u,u') ⊩ π, and for any (2-link) path connecting u and u', there is a link where σ is not forced or π is forced;
for any formulas σ and π, (u,u') ⊩ σ ∧ π iff (u,u') ⊩ σ and (u,u') ⊩ π, and for any path connecting u and u', there is a link where both σ and π are forced.
Theorem: A formula is a partition tautology iff it is forced at each pair (u,u') in every forcing model.

31 Paths and cuts in graph theory
Let G be a simple undirected graph on the node-set U. A path between u and u' is a set of links (u,u₁), (u₁,u₂), ..., (u_{n−1},u_n), (u_n,u') connecting u and u'. A cut between u and u' is a set of links that contains at least one link from every path from u to u'.
Paths and cuts are opposites or "duals": color the links arbitrarily black or white; then one and only one option holds for any (u,u') with u ≠ u': the white links form a cut between u and u', or the black links contain a path between u and u'.

32 Forcing models and graph theory
Each formula σ ? π, where ? is ∨, ⇒, or ∧, has a corresponding Boolean condition:
∨: σ holds or π holds;
⇒: σ does not hold or π holds; and
∧: σ holds and π holds.
On the complete graph K(U) [every (u,u') is a link], think of cuts as "generalized" distinctions (and thus paths as generalized indistinctions).
Theorem: (u,u') ⊩ σ ? π in any forcing model iff the links where the Boolean condition for ? is satisfied form a cut between u and u'.

33 Parallel Concepts for Shannon and Logical Entropies
(Shannon Entropy | Logical Entropy)
Entropy: H(p_x) = Σ_x p_x log(1/p_x) | h(p_x) = Σ_x p_x(1 − p_x)
Uniform (p_x = 1/n): log(n) | 1 − 1/n
Cross entropy: H(p_x‖q_x) = Σ_x p_x log(1/q_x) | h(p_x‖q_x) = Σ_x p_x(1 − q_x)
Divergence: D(p_x‖q_x) = Σ_x p_x log(p_x/q_x) | d(p_x‖q_x) = Σ_x (p_x − q_x)²
Information inequality: D(p‖q) ≥ 0 with = iff p_i = q_i for all i | d(p‖q) ≥ 0 with = iff p_i = q_i for all i
Joint entropy: H(X,Y) = Σ_{x,y} p_xy log(1/p_xy) | h(X,Y) = Σ_{x,y} p_xy(1 − p_xy)
Mutual info.: H(X:Y) = H(X) + H(Y) − H(X,Y) | m(X,Y) = h(X) + h(Y) − h(X,Y)
Independence: H(X,Y) = H(X) + H(Y) | m(X,Y) = h(X)h(Y)
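The last two rows can be checked numerically: for an independent joint distribution, the logical mutual information factors as a product. The marginal distributions below are illustrative assumptions.

```python
from itertools import product

px = [0.5, 0.3, 0.2]
qy = [0.6, 0.4]
pxy = [a * b for a, b in product(px, qy)]       # independent joint distribution

h = lambda dist: sum(p * (1 - p) for p in dist)   # logical entropy

m = h(px) + h(qy) - h(pxy)                        # logical mutual information
print(round(m, 12), round(h(px) * h(qy), 12))     # both 0.2976
```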