1 The Logic of Partitions with an application to Information Theory David Ellerman University of California at Riverside and WWW.Ellerman.org
2 Outline of Talk
- Logic of partitions is dual to the ordinary logic of subsets, where a partition on a set U is a mutually exclusive and jointly exhaustive set of subsets of U, a.k.a. an equivalence relation on U.
- Sketch of the logic of partitions: connectives, lattices with implication, tautologies, and representation.
- Table of analogies between the two logics.
- Logic of subsets + normalized size of subsets (finite universe) = finite probability theory, which generalizes to probability theory.
- Logic of partitions + normalized size of partitions (finite universe) = finite logical information theory, which generalizes to information theory.
- Three precisely related notions of the information content or entropy of a partition π:
  - Logical entropy: h(π) = Σ_i p_i(1 − p_i),
  - Block-count entropy: H_m(π) = Π_i (1/p_i)^{p_i}, and
  - Shannon's entropy: H(π) = Σ_i p_i log₂(1/p_i).
3 Partitions dual to Subsets
- Category Theory (CT) duality: monomorphisms (e.g., injective maps between sets) are dual to epimorphisms (e.g., surjective maps between sets).
- CT duality gives the subset-partition duality: a monomorphism determines a subset of its codomain (its image); an epimorphism determines a partition of its domain (the inverse images of the elements of the codomain).
- In categorical logic, subsets generalize to subobjects or "parts".
- "The dual notion (obtained by reversing the arrows) of 'part' is the notion of partition." (Lawvere)
4 Lattices of Subsets and of Partitions
- Given a universe set U, there is the lattice of subsets P(U) with inclusion as the partial ordering and the usual union and intersection, enriched with implication: A ⇒ B = Aᶜ ∪ B.
- Given a universe set U, there is the lattice of partitions Π(U), enriched by implication, where refinement is the partial ordering. Given partitions π = {B} and σ = {C}, σ is refined by π, written σ ≼ π, if for every block B ∈ π there is a block C ∈ σ such that B ⊆ C.
- Join of π = {B} and σ = {C} is the partition π ∨ σ whose blocks are the non-empty intersections B ∩ C.
- Meet π ∧ σ: define an undirected graph on U with a link between u and u′ if they are in the same block of π or of σ. Then the connected components of the graph are the blocks of the meet.
- Implication σ ⇒ π is the partition that is like π except that any block B ∈ π contained in some block C ∈ σ is discretized (replaced by its singletons).
- Top = 1 = {{u} : u ∈ U}, the discrete partition; Bottom = 0 = {U}, the indiscrete partition or "blob".
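The three partition operations can be sketched in code. A minimal sketch in Python (the helper names `join`, `meet`, and `implication` are mine, not from the talk), representing a partition as a set of frozenset blocks:

```python
def join(p, q):
    """Join of partitions: blocks are the non-empty intersections B ∩ C."""
    return {B & C for B in p for C in q if B & C}

def meet(p, q, U):
    """Meet of partitions: connected components of the graph that links
    u and u' whenever they share a block of p or of q (via union-find)."""
    parent = {u: u for u in U}
    def find(u):
        while parent[u] != u:
            u = parent[u]
        return u
    for part in (p, q):
        for block in part:
            first, *rest = sorted(block)
            for u in rest:
                parent[find(u)] = find(first)
    comps = {}
    for u in U:
        comps.setdefault(find(u), set()).add(u)
    return {frozenset(c) for c in comps.values()}

def implication(sigma, pi):
    """sigma => pi: like pi, except blocks of pi contained in some block
    of sigma are replaced by their singletons ("discretized")."""
    out = set()
    for B in pi:
        out |= {frozenset({u}) for u in B} if any(B <= C for C in sigma) else {B}
    return out

U = {0, 1, 2, 3}
p = {frozenset({0, 1}), frozenset({2, 3})}
q = {frozenset({0, 2}), frozenset({1, 3})}
one = {frozenset({u}) for u in U}   # discrete partition 1
zero = {frozenset(U)}               # indiscrete partition 0, the "blob"

assert join(p, q) == one            # all intersections are singletons
assert meet(p, q, U) == zero        # the links connect everything
assert implication(zero, p) == one  # 0 => pi = 1
assert implication(q, p) == p       # no block of p sits inside a block of q
```

On this example the join is the discrete partition and the meet is the blob, so p and q sit strictly between the bottom and the top of Π(U).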
5 Tautologies in subset and partition logics
- Taking ∨, ∧, and ⇒ as the primitive connectives with a constant 0 (and ¬A = A ⇒ 0), a classical or "subset" tautology is any formula which evaluates to U in the enriched lattice of subsets P(U) regardless of which subsets are assigned to the atomic variables (with ∅, the null set, always assigned to 0).
- With the same definition of formulas (i.e., ∨, ∧, ⇒, and 0), a partition tautology is any formula which evaluates to 1 (the discrete partition) in the enriched lattice of partitions Π(U) regardless of which partitions are assigned to the atomic variables (with 0, the blob, always assigned to the constant 0).
- Classically, it suffices to take U = {*}, so P(U) = 2 = {∅, {*}}, as in the usual truth tables with values 0 and 1.
- For |U| = 2 (any two-element set), Π(U) = {0, 1} (the indiscrete and discrete partitions) and the partition operations behave classically. Therefore:
- Theorem: Every partition tautology is a classical tautology.
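Partition tautologies can be checked by brute force over a small universe. A sketch (helper names are mine; the operations follow the definitions on the previous slide), enumerating all partitions of a 3-element set:

```python
from itertools import product

U = (0, 1, 2)

def partitions(elems):
    """Enumerate all set partitions of a tuple of elements."""
    if not elems:
        yield frozenset()
        return
    first, rest = elems[0], elems[1:]
    for part in partitions(rest):
        # put `first` into each existing block, or into a new singleton
        for block in part:
            yield (part - {block}) | {block | {first}}
        yield part | {frozenset({first})}

def join(p, q):
    return frozenset(B & C for B in p for C in q if B & C)

def imp(sigma, pi):
    """sigma => pi: discretize blocks of pi contained in a block of sigma."""
    out = []
    for B in pi:
        if any(B <= C for C in sigma):
            out.extend(frozenset({u}) for u in B)
        else:
            out.append(B)
    return frozenset(out)

ONE = frozenset(frozenset({u}) for u in U)
ZERO = frozenset({frozenset(U)})

def neg(p):  # partition negation: ¬pi = pi => 0
    return imp(p, ZERO)

def is_tautology(formula, n_vars):
    parts = list(partitions(U))
    return all(formula(*args) == ONE for args in product(parts, repeat=n_vars))

assert is_tautology(lambda p: imp(p, p), 1)            # pi => pi
assert not is_tautology(lambda p: join(p, neg(p)), 1)  # excluded middle fails
# ...but its sigma-double-negation transform (slide 6) is a tautology:
assert is_tautology(lambda p, s: join(imp(imp(p, s), s), imp(p, s)), 2)
print(f"{len(list(partitions(U)))} partitions of a 3-set (Bell number B_3 = 5)")
```

Excluded middle fails here on, e.g., π = {{0, 1}, {2}}, where ¬π = 0 and π ∨ 0 = π ≠ 1.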
6 B_σ = {π ⇒ σ : π ∈ Π(U)} is a Boolean algebra
- All the elements with a fixed consequent σ, B_σ = {π ⇒ σ : π ∈ Π(U)}, form a Boolean algebra under the partition operations with σ as bottom and 1 as top. Think of π ⇒ σ as the σ-negation ¬_σ π of π.
- B_σ ≅ the powerset Boolean algebra of the set of non-singleton blocks of σ.
- σ-double-negation transform: every formula φ in the language of ∨, ∧, ⇒, and 0 maps into a formula φ^σ in B_σ by applying the σ-double-negation mapping π ↦ (π ⇒ σ) ⇒ σ to each atomic variable (and by mapping 0 ↦ σ). With no change in the connectives, φ^σ is a substitution instance of φ.
- Theorem: If φ is a classical tautology, then φ^σ is a partition tautology.
- Example: the Law of Excluded Middle π ∨ ¬π transforms to the partition tautology ((π ⇒ σ) ⇒ σ) ∨ (π ⇒ σ) [since (((π ⇒ σ) ⇒ σ) ⇒ σ) = π ⇒ σ], which is not intuitionistically valid.
7 Representation in a closure space: U × U
- Build a representation of the partition lattice Π(U) using "open" subsets of U × U.
- Associate with a partition π the subset of distinctions made by π: dit(π) = {(u, u′) : u and u′ in distinct blocks of π}.
- The closed subsets of U² are the reflexive-symmetric-transitive (rst) closed subsets, i.e., the equivalence relations on U. The open subsets are their complements, which are precisely the ditsets dit(π) of partitions.
- For S ⊆ U × U, the closure cl(S) is the rst closure of S. The interior is Int(S) = (cl(Sᶜ))ᶜ, where Sᶜ = U² − S is the complement.
- The closure operation is not topological: cl(S) ∪ cl(T) is not necessarily closed.
8 Dit-set Representation of Π(U)
- Translating "distinctions talk" into "elements talk". Representation: π ↦ dit(π), i.e., "u and u′ are distinguished by π" means (u, u′) is an element of dit(π).
- The refinement ordering of partitions is the inclusion ordering of dit-sets: σ ≼ π iff dit(σ) ⊆ dit(π).
- Join of partitions is represented by the union of dit-sets: dit(π ∨ σ) = dit(π) ∪ dit(σ).
- Meet of partitions is represented by the interior of the intersection of dit-sets: dit(π ∧ σ) = Int(dit(π) ∩ dit(σ)).
- Implication of partitions is represented by: dit(σ ⇒ π) = Int(dit(σ)ᶜ ∪ dit(π)).
- Top 1 = {{u} : u ∈ U} is represented by dit(1) = U² − Δ (all off-diagonal pairs). Bottom 0 = {U} (the "blob") is represented by dit(0) = ∅.
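The dit-set equations can be verified computationally. A sketch (function names are mine) that computes dit-sets, the rst closure via union-find, and the interior, then checks the join and implication representations on an example:

```python
def dit(p, U):
    """Dit-set of partition p: ordered pairs of elements in distinct blocks."""
    block_of = {u: B for B in p for u in B}
    return {(u, v) for u in U for v in U if block_of[u] != block_of[v]}

def rst_closure(S, U):
    """Reflexive-symmetric-transitive closure: the equivalence
    relation generated by the pairs in S (via union-find)."""
    parent = {u: u for u in U}
    def find(u):
        while parent[u] != u:
            u = parent[u]
        return u
    for (u, v) in S:
        parent[find(u)] = find(v)
    return {(u, v) for u in U for v in U if find(u) == find(v)}

def interior(S, U):
    """Int(S) = complement of the rst closure of the complement of S."""
    full = {(u, v) for u in U for v in U}
    return full - rst_closure(full - S, U)

def join(p, q):
    return {B & C for B in p for C in q if B & C}

def imp(sigma, pi):
    """sigma => pi: discretize blocks of pi contained in a block of sigma."""
    out = set()
    for B in pi:
        out |= {frozenset({u}) for u in B} if any(B <= C for C in sigma) else {B}
    return out

U = {0, 1, 2, 3}
p = {frozenset({0, 1}), frozenset({2, 3})}
q = {frozenset({0, 2}), frozenset({1, 3})}
full = {(u, v) for u in U for v in U}

# dit(p ∨ q) = dit(p) ∪ dit(q)
assert dit(join(p, q), U) == dit(p, U) | dit(q, U)
# dit(q => p) = Int(dit(q)^c ∪ dit(p))
assert dit(imp(q, p), U) == interior((full - dit(q, U)) | dit(p, U), U)
print("dit-set equations check out on the example")
```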
9 Why previous attempts to dualize logic were stymied
- In categorical logic, the algebras of subobjects are Heyting algebras, which nicely dualize to co-Heyting algebras (which have "difference" instead of "implication"), e.g., the algebras of closed subsets of topological spaces. But co-Heyting algebras are distributive, while lattices of partitions are not distributive, quite aside from any definition of "difference" or "implication".
- Additional keys:
  - the proper definition of implication for partitions (or, equivalently, of difference for equivalence relations), particularly the "set-of-blocks" definition;
  - seeing that "a partition making a distinction" is dual to "a subset containing an element", which is the key to partition semantics;
  - seeing that partitions as binary relations [i.e., dit(π)] are complementary to equivalence relations [U × U − dit(π)], like open and closed subsets in a topological space, so the lattice of partitions is the opposite of the lattice of equivalence relations, and implication rather than difference is used to enrich the lattice.
10 Truth tables in Subset Logic
Ordinary truth tables can be presented as a subset semantics for propositional formulas. Think of A and B as subsets (or propositional variables interpreted as subsets) and u as a generic element. Then write u ∉ A as 0_A and u ∈ A as 1_A, etc., to get the ordinary truth table (which leaves off the subscripts). For example, the table for union:

u ∈ A? | u ∈ B? | u ∈ A ∪ B?
0_A | 0_B | 0_{A∪B}
0_A | 1_B | 1_{A∪B}
1_A | 0_B | 1_{A∪B}
1_A | 1_B | 1_{A∪B}
11 Partition Logic "Dit Table" for Join
Partition "truth tables" can be presented as a partition semantics for propositional formulas. Think of π and σ as partitions and (u, u′) as a generic pair. Then write (u, u′) ∉ dit(π) as 0_B where u, u′ ∈ B, and (u, u′) ∈ dit(π) as 1_{B,B′} where u ∈ B and u′ ∈ B′, etc.

(u, u′) ∈ dit(π)? | (u, u′) ∈ dit(σ)? | (u, u′) ∈ dit(π ∨ σ)?
0_B | 0_C | 0_{B∩C}
0_B | 1_{C,C′} | 1_{B∩C, B∩C′}
1_{B,B′} | 0_C | 1_{B∩C, B′∩C}
1_{B,B′} | 1_{C,C′} | 1_{B∩C, B′∩C′}
12 Dit Table for Implication
Recall that σ ⇒ π is like π except that any block B ∈ π contained in some block C ∈ σ is discretized.

Conditions | (u, u′) ∈ dit(σ)? | (u, u′) ∈ dit(π)? | (u, u′) ∈ dit(σ ⇒ π)?
u, u′ ∈ B and B ⊆ C | 0_C | 0_B | 1 (B is discretized)
u, u′ ∈ B and B ⊈ any C | 0 or 1 | 0_B | 0_B
u ∈ B, u′ ∈ B′, B ≠ B′ | 0 or 1 | 1_{B,B′} | 1_{B,B′}
13 Dit table for the σ-transform of Excluded Middle
Dit table for ((π ⇒ σ) ⇒ σ) ∨ (π ⇒ σ): build the table in columns 1–5 for π, σ, π ⇒ σ, (π ⇒ σ) ⇒ σ, and the join of columns 3 and 4. In every row the final column is a distinction: for a non-distinction 0_C of σ, either the block C is discretized in π ⇒ σ (a 1 in column 3), or C survives as a block of π ⇒ σ and hence is discretized in (π ⇒ σ) ⇒ σ (a 1 in column 4). So the formula always evaluates to 1, i.e., it is a partition tautology.
14 Peirce's Law is not a partition tautology
Dit table for Peirce's Law ((π ⇒ σ) ⇒ π) ⇒ π: the rows split on the conditions B ⊆ C and C ⊆ B, and the final column need not always be a distinction.
Counter-example: σ = {{u, u′, u″}} and π = {{u, u′}, {u″}}. Then π ⇒ σ = σ = 0; (π ⇒ σ) ⇒ π = 0 ⇒ π = 1; and ((π ⇒ σ) ⇒ π) ⇒ π = 1 ⇒ π = π ≠ 1.
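The counter-example can be machine-checked step by step. A sketch (the `imp` helper is mine, implementing the implication operation from slide 4):

```python
def imp(sigma, pi):
    """sigma => pi: blocks of pi inside a block of sigma get discretized."""
    out = set()
    for B in pi:
        out |= {frozenset({x}) for x in B} if any(B <= C for C in sigma) else {B}
    return out

U = {'u', 'v', 'w'}                        # stand-ins for u, u', u''
sigma = {frozenset(U)}                     # the blob 0
pi = {frozenset({'u', 'v'}), frozenset({'w'})}
one = {frozenset({x}) for x in U}          # discrete partition 1

step1 = imp(pi, sigma)   # pi => sigma            (= sigma = 0)
step2 = imp(step1, pi)   # (pi => sigma) => pi    (= 0 => pi = 1)
peirce = imp(step2, pi)  # ((pi => sigma) => pi) => pi  (= 1 => pi = pi)

assert step1 == sigma
assert step2 == one
assert peirce == pi and peirce != one      # Peirce's Law fails
```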
15 Table of Analogies

| | Subsets | Partitions
Atoms | Elements u ∈ U | Distinctions (u, u′) ∈ (U × U) − Δ
All atoms | Universe set U | Discrete partition 1 (all dits)
No atoms | Empty set ∅ | Indiscrete partition 0 (no dits)
Model of proposition | Subset S ⊆ U | Partition π on U
Model of individual | Elements u | Distinctions (u, u′)
Proposition holds | Element u ∈ S | Partition π distinguishes (u, u′)
Lattice of propositions | Subset lattice P(U) | Partition lattice Π(U)
Counting (U finite) | # elements in S | # distinctions of π
Normalized count | Prob(S) = |S|/|U| | h(π) = |dit(π)|/|U × U|
Prob. interpretation | Prob(S) = probability a random element is in S | h(π) = probability a random pair is distinguished by π
16 From Subset Logic to Finite Probability Theory
- U is now a finite "sample space". Subsets A, B ⊆ U are "events". Elements u ∈ U are "outcomes".
- With the Laplacian assumption of equiprobable outcomes, Prob(A) = |A|/|U|.
- |A| + |B| = |A ∪ B| + |A ∩ B|, so normalizing yields: Prob(A) + Prob(B) = Prob(A ∪ B) + Prob(A ∩ B).
17 From Partition Logic to Logical Info. Theory
- The finite space of pairs U × U is analogous to U. A partition π on U is analogous to an event A ⊆ U. Pairs (u, u′) ∈ U × U are analogous to outcomes.
- Under the Laplacian assumption of equiprobable pairs (i.e., two random draws with replacement), h(π) = |dit(π)|/|U × U| = the logical entropy of π = the probability that a random pair is distinguished by π.
- Where the mutual logical information is mut(π, σ) = |dit(π) ∩ dit(σ)|/|U|², we have |dit(π)| + |dit(σ)| = |dit(π ∨ σ)| + |dit(π) ∩ dit(σ)|, so normalizing: h(π) + h(σ) = h(π ∨ σ) + mut(π, σ).
- Since dit(π ∧ σ) = Int(dit(π) ∩ dit(σ)) ⊆ dit(π) ∩ dit(σ), we get the submodular inequality: h(π) + h(σ) ≥ h(π ∨ σ) + h(π ∧ σ).
18 From Partitions to Finite Prob. Distributions
- Let p_B = |B|/|U| = the probability that a random draw is from block B, so {p_B}_{B∈π} is a finite probability distribution.
- Then h(π) = |dit(π)|/|U × U| = Σ{|B||B′| : B ≠ B′}/|U|² = Σ{p_B p_B′ : B ≠ B′} = Σ_B p_B(1 − p_B) = 1 − Σ_B p_B².
- Hence if p = {p_1, …, p_n} is any finite probability distribution, the logical entropy of the distribution is: h(p) = 1 − Σ_i p_i².
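The chain of equalities can be confirmed exactly with rational arithmetic. A sketch (function names are mine), comparing the dit-count definition of h(π) with 1 − Σ_B p_B²:

```python
from fractions import Fraction

def dit_count(p, U):
    """Number of ordered pairs of elements in distinct blocks of p."""
    block_of = {u: B for B in p for u in B}
    return sum(1 for u in U for v in U if block_of[u] != block_of[v])

def logical_entropy_dist(probs):
    """h(p) = 1 - sum_i p_i^2 for a probability distribution."""
    return 1 - sum(Fraction(x) ** 2 for x in probs)

U = range(6)
p = {frozenset({0, 1, 2}), frozenset({3, 4}), frozenset({5})}  # blocks 3, 2, 1

h_from_dits = Fraction(dit_count(p, U), len(U) ** 2)
h_from_probs = logical_entropy_dist(Fraction(len(B), len(U)) for B in p)
assert h_from_dits == h_from_probs  # both give 1 - (9+4+1)/36 = 11/18
```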
19 History of the Formula h(p) = 1 − Σ_i p_i²
- In 1912, Gini defined 1 − Σ_i p_i² as the index of mutability.
- In 1922, the cryptographer William Friedman defined Σ_i p_i² as the index of coincidence.
- Alan Turing, working on cryptography at Bletchley Park in WWII, defined Σ_i p_i² as the repeat rate.
- Turing's assistant, Edward Simpson, published Σ_i p_i² in 1949 as the "index of species concentration", so 1 − Σ_i p_i² is now often called the Simpson index of diversity in the biodiversity literature.
- In 1945, Albert Hirschman suggested Sqrt(Σ_i p_i²) as an "index of trade concentration", and a few years later Orris Herfindahl independently defined Σ_i p_i² as an "index of industrial concentration", so in economics Σ_i p_i² is the Herfindahl-Hirschman (HH) index of concentration.
- Let U = {u_1, …, u_n} with probabilities p = {p_1, …, p_n} and d_ij = the "distance" between u_i and u_j, where d_ii = 0. In 1982, C. R. Rao defined the quadratic entropy: Q = Σ_{ij} d_ij p_i p_j. The "logical distance" is d_ij = 1 if i ≠ j and d_ii = 0, and the Rao quadratic entropy with logical distances is the logical entropy: h(p) = Σ_{i≠j} p_i p_j = 1 − Σ_i p_i².
20 Shannon's Entropy and Block Entropies
- Given a partition π = {B}, we compare:
  Shannon's entropy: H(π) = Σ_B p_B log₂(1/p_B),
  Logical entropy: h(π) = Σ_B p_B(1 − p_B).
- Each formula is an average of block entropies: H(B) = log₂(1/p_B) for Shannon entropy and h(B) = 1 − p_B for logical entropy. Eliminating p_B gives the block entropy relationship: h(B) = 1 − (1/2^{H(B)}).
- Search interpretation of the Shannon block entropy: the game of "Twenty Questions": the minimum number of yes-or-no questions necessary to find any designated hidden element (e.g., the "sent message") in a set of like elements.
- Example: 2³ = 8 elements, so 3 binary partitions are required to find the hidden element. Code the 8 elements with 3-digit binary numbers and ask 3 questions: what is the 1st, 2nd, and 3rd digit? The probability of each element is p_B = 1/8, so H(B) = log₂(1/p_B) = 3 and H(π) = Σ_B p_B log₂(1/p_B) = 8 · (1/8) · 3 = 3.
21 Distinctions Interpretation of Shannon's Entropy
- The search interpretation for the "hidden element" or "sent message" comes from Shannon's application to communications.
- There is an alternative "distinctions interpretation": H(B) as the number of binary partitions needed to make all the distinctions in a set of distinct elements.
- Equivalence of the search and distinctions interpretations: If a set of binary partitions did not jointly make the distinction (u, u′), then they could not single out the hidden element if it were u or u′. And if the binary partitions make all distinctions, then asking which block of each binary partition contains the hidden element will find it.
22 Picture for the |U| = 8 Example
- |U| = 2³ = 8, so an 8 × 8 square pictures U × U.
- The 1st binary partition has 2 equal blocks, so the equivalence classes are the shaded blocks and the distinctions are the 2 · 16 = 32 unshaded squares.
- The 2nd binary partition (joined with the 1st) adds 4 · 4 = 16 new distinctions.
- The 3rd binary partition adds 8 · 1 = 8 distinctions.
- Total # of distinctions (as ordered pairs) = 32 + 16 + 8 = 56.
- With π = 1: H(π) = 3 and h(π) = 56/64 = 1 − (1/2³) = 7/8.
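The 32 + 16 + 8 = 56 count can be reproduced by successively joining the three bit-partitions of {0, …, 7}. A sketch with my own helper names:

```python
from fractions import Fraction

U = range(8)

def dit_count(p):
    """Number of ordered pairs of elements in distinct blocks of p."""
    block_of = {u: B for B in p for u in B}
    return sum(1 for u in U for v in U if block_of[u] != block_of[v])

def bit_partition(k):
    """Binary partition of U by the k-th binary digit."""
    return {frozenset(u for u in U if (u >> k) & 1 == b) for b in (0, 1)}

def join(p, q):
    return {B & C for B in p for C in q if B & C}

counts = []
joined = {frozenset(U)}          # start from the blob
for k in range(3):
    joined = join(joined, bit_partition(k))
    counts.append(dit_count(joined))

print(counts)                        # [32, 48, 56]: new dits 32, 16, 8
print(Fraction(counts[-1], 8 ** 2))  # 7/8 = 1 - 1/2^3
```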
23 Block-count Entropy
- In the base case of |U| = 2³, we have two measures of "information content": Shannon's binary-partition count of 3 and the normalized dit-count of 7/8. But the "# of elements distinguished" = 8 is also a natural measure.
- In general, when an event of probability p_B occurs, its "surprise value" is 1/p_B. For independent events, surprise values multiply, so this suggests defining a new entropy as the multiplicative average of the surprise values:
- Block-count entropy: H_m(π) = Π_B (1/p_B)^{p_B}.
24 Block-count and Shannon Entropies
- For the |U| = 8 example, H_m(1) = (1/(1/8))^{1/8} · (1/(1/8))^{1/8} · … (8 factors) = [8^{1/8}]⁸ = 8.
- H_m(π) = the (multiplicative) average # of equal blocks distinguished by π.
- In general: H_m(π) = 2^{H(π)}, i.e., the block-count entropy is just the anti-log of the Shannon entropy.
- Shannon entropy can also be defined to any other base, such as base 3: H₃(π) = Σ_B p_B log₃(1/p_B) (counting ternary partitions, etc.), or to base e: H_e(π) = Σ_B p_B ln(1/p_B). But in every case the anti-log is the block-count entropy: H_m(π) = 2^{H(π)} = 3^{H₃(π)} = e^{H_e(π)}, so the block-count entropy is a base-free notion of Shannon entropy.
- The block-count block entropy is H_m(B) = 1/p_B = 2^{H(B)}, so: h(B) = 1 − (1/H_m(B)) = 1 − (1/2^{H(B)}).
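The base-free claim H_m(π) = 2^{H(π)} = 3^{H₃(π)} = e^{H_e(π)} can be checked numerically. A sketch (function names are mine):

```python
import math

def shannon(probs, base=2.0):
    """Shannon entropy to a given base."""
    return sum(p * math.log(1 / p, base) for p in probs if p)

def block_count_entropy(probs):
    """H_m = product over blocks of (1/p_B)^{p_B}."""
    return math.prod((1 / p) ** p for p in probs if p)

probs = [0.5, 0.25, 0.125, 0.125]
hm = block_count_entropy(probs)
assert math.isclose(hm, 2 ** shannon(probs, 2))
assert math.isclose(hm, 3 ** shannon(probs, 3))
assert math.isclose(hm, math.e ** shannon(probs, math.e))
# For the uniform partition on 8 equal blocks, H_m = 8 exactly:
assert math.isclose(block_count_entropy([1 / 8] * 8), 8.0)
```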
25 Parallel Concepts for Shannon and Logical Entropies

| | Shannon Entropy | Logical Entropy
Block entropy | H(B) = log(1/p_B) | h(B) = 1 − p_B
Entropy | H(π) = Σ_B p_B H(B) | h(π) = Σ_B p_B h(B)
Mutual information | I(π; σ) = H(π) + H(σ) − H(π ∨ σ) | mut(π; σ) = h(π) + h(σ) − h(π ∨ σ)
Independence | I(π; σ) = 0 | mut(π; σ) = h(π)h(σ)
Cross entropy | H(p‖q) = Σ p_i log(1/q_i) | h(p‖q) = Σ p_i(1 − q_i)
Divergence | D(p‖q) = H(p‖q) − H(p) | d(p‖q) = 2h(p‖q) − h(p) − h(q)
Information inequality | D(p‖q) ≥ 0 with = iff p_i = q_i for all i | d(p‖q) ≥ 0 with = iff p_i = q_i for all i
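The two divergences can be compared directly; note that d(p‖q) = 2h(p‖q) − h(p) − h(q) works out algebraically to Σ_i (p_i − q_i)², which makes the logical information inequality transparent. A sketch (function names are mine):

```python
import math

def h(p):
    """Logical entropy h(p) = 1 - sum p_i^2."""
    return 1 - sum(x * x for x in p)

def h_cross(p, q):
    """Logical cross entropy h(p||q) = sum p_i (1 - q_i)."""
    return sum(x * (1 - y) for x, y in zip(p, q))

def d(p, q):
    """Logical divergence d(p||q) = 2 h(p||q) - h(p) - h(q)."""
    return 2 * h_cross(p, q) - h(p) - h(q)

def kl(p, q):
    """Shannon divergence D(p||q), base 2."""
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x)

p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]

# d(p||q) equals the squared Euclidean distance between p and q
assert math.isclose(d(p, q), sum((x - y) ** 2 for x, y in zip(p, q)))
assert d(p, q) > 0 and kl(p, q) > 0       # both information inequalities
assert abs(d(p, p)) < 1e-12 and abs(kl(p, p)) < 1e-12  # equality iff p = q
```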
26 Three Ways to Measure All the Distinctions of a Partition
- The Shannon (base 2) entropy of π is the average number of equal binary partitions needed to make all the distinctions of π.
- The block-count entropy of π is the (multiplicative) average number of blocks in an equal-blocked partition needed to make all the distinctions of π.
- The logical entropy of π is the normalized count of all the distinctions of π.
- The unifying concept of the distinctions of a partition comes from the underlying logic of partitions.
27 The End
"Counting distinctions: on the conceptual foundations of Shannon's information theory". Forthcoming in Synthese; now available from Online First at the Synthese site, reached from: http://www.springerlink.com/journals/
28 Appendix: Elements and Distinctions Again
- When is a binary relation F ⊆ U × V a function F: U → V?
- A relation F preserves elements if for any element u ∈ U, there is an element v ∈ V such that (u, v) ∈ F.
- A relation F reflects distinctions if for any pairs (u, v) ∈ F and (u′, v′) ∈ F, if v ≠ v′ then u ≠ u′.
- F is a function iff F preserves elements and reflects distinctions.
- A function F is injective iff it also preserves distinctions, i.e., if u ≠ u′ then F(u) ≠ F(u′).
- A function F is surjective iff it also reflects elements, i.e., for any element v ∈ V, there is an element u ∈ U such that F(u) = v.
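These four element/distinction conditions are directly checkable for a finite relation. A sketch (the predicate names are mine):

```python
def preserves_elements(F, U, V):
    """Every u in U is related to some v in V (totality)."""
    return all(any((u, v) in F for v in V) for u in U)

def reflects_distinctions(F):
    """Contrapositive form: the same input forces the same output
    (single-valuedness)."""
    return all(v == y for (u, v) in F for (x, y) in F if u == x)

def preserves_distinctions(F):
    """Injectivity: the same output forces the same input."""
    return all(u == x for (u, v) in F for (x, y) in F if v == y)

def reflects_elements(F, U, V):
    """Surjectivity: every v in V is hit by some u in U."""
    return all(any((u, v) in F for u in U) for v in V)

U, V = {0, 1, 2}, {'a', 'b'}
F = {(0, 'a'), (1, 'a'), (2, 'b')}

assert preserves_elements(F, U, V) and reflects_distinctions(F)  # a function
assert reflects_elements(F, U, V)        # surjective
assert not preserves_distinctions(F)     # not injective: 0, 1 both map to 'a'
```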
29 Two Ways to Create a Set U
- "Subset" creation myth: In the beginning was the void (the null set ∅), and then elements were created (fully propertied and distinguished from one another) until a snapshot was taken, and the result was a set of elements U.
- "Partition" creation myth: In the beginning was the blob, and then distinctions were made until a snapshot was taken, and the result was a set of blocks taken to be the discrete partition on a set U (i.e., the singletons of the elements of U).
30 Forcing Models in Partition Logic
A forcing model is given by a forcing ("forcing" = "is distinguished by") relation ⊩ between the off-diagonal pairs of U² of a set U and the partition-logic formulas such that:
- for any atomic variable π: (u, u′) ⊩ π iff (u′, u) ⊩ π;
- for any formulas σ and π: (u, u′) ⊩ σ ∨ π iff (u, u′) ⊩ σ or (u, u′) ⊩ π;
- for any formulas σ and π: (u, u′) ⊩ σ ⇒ π iff [not (u, u′) ⊩ σ or (u, u′) ⊩ π], and for any (2-link) path connecting u and u′, there is a link where σ is not forced or π is forced;
- for any formulas σ and π: (u, u′) ⊩ σ ∧ π iff (u, u′) ⊩ σ and (u, u′) ⊩ π, and for any path connecting u and u′, there is a link where both σ and π are forced.
Theorem: A formula is a partition tautology iff it is forced at each pair (u, u′) in every forcing model.
31 Paths and Cuts in Graph Theory
- Let G be a simple undirected graph on node-set U. A path between u and u′ is a set of links (u, u₁), (u₁, u₂), …, (u_{n−1}, u_n), (u_n, u′) connecting u and u′. A cut between u and u′ is a set of links that contains at least one link from every path from u to u′.
- Paths and cuts are opposites or "duals": color the links arbitrarily black or white; then one and only one option holds for any (u, u′) with u ≠ u′: the white links form a cut between u and u′, or the black links contain a path between u and u′.
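The path/cut duality can be verified exhaustively on a small complete graph. A sketch (names are mine) that enumerates every black/white coloring of the links of K₄ and checks that, for each pair of nodes, exactly one of the two options holds:

```python
from itertools import combinations, permutations, product

U = [0, 1, 2, 3]
links = [frozenset(l) for l in combinations(U, 2)]

def simple_paths(u, v):
    """All simple u-v paths in the complete graph on U, as lists of links."""
    others = [w for w in U if w not in (u, v)]
    for r in range(len(others) + 1):
        for mid in permutations(others, r):
            nodes = (u,) + mid + (v,)
            yield [frozenset(e) for e in zip(nodes, nodes[1:])]

duality_holds = True
for coloring in product(('black', 'white'), repeat=len(links)):
    white = {l for l, c in zip(links, coloring) if c == 'white'}
    for u, v in combinations(U, 2):
        paths = list(simple_paths(u, v))
        # white is a cut iff every path uses a white link
        white_is_cut = all(any(l in white for l in path) for path in paths)
        # black contains a path iff some path avoids all white links
        black_has_path = any(all(l not in white for l in path) for path in paths)
        duality_holds &= (white_is_cut != black_has_path)  # exactly one holds

assert duality_holds
```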
32 Forcing Models and Graph Theory
- Each formula σ ? π, where ? is ∨, ∧, or ⇒, has a corresponding Boolean condition at a link: for ∨: σ holds or π holds; for ⇒: σ does not hold or π holds; for ∧: σ holds and π holds.
- On the complete graph K(U) [where every pair (u, u′) is a link], think of cuts as "generalized" distinctions (and thus paths as generalized indistinctions).
- Theorem: (u, u′) ⊩ σ ? π in any forcing model iff the links where the Boolean condition for ? is satisfied form a cut between u and u′.
33 Parallel Concepts for Shannon and Logical Entropies

| | Shannon Entropy | Logical Entropy
Entropy | H(p) = Σ_x p_x log(1/p_x) | h(p) = Σ_x p_x(1 − p_x)
Uniform p_x = 1/n | log(n) | 1 − 1/n
Cross entropy | H(p‖q) = Σ_x p_x log(1/q_x) | h(p‖q) = Σ_x p_x(1 − q_x)
Divergence | D(p‖q) = Σ_x p_x log(p_x/q_x) | d(p‖q) = Σ_x (p_x − q_x)²
Information inequality | D(p‖q) ≥ 0 with = iff p_x = q_x for all x | d(p‖q) ≥ 0 with = iff p_x = q_x for all x
Joint entropy | H(X, Y) = Σ_{xy} p_{xy} log(1/p_{xy}) | h(X, Y) = Σ_{xy} p_{xy}(1 − p_{xy})
Mutual information | I(X; Y) = H(X) + H(Y) − H(X, Y) | m(X, Y) = h(X) + h(Y) − h(X, Y)
Independence | H(X, Y) = H(X) + H(Y) | m(X, Y) = h(X)h(Y)