Maximization of Multi-Information

Week of Doctoral Students 2007
Jozef Juríček
Charles University in Prague, Faculty of Mathematics and Physics, Department of Probability and Mathematical Statistics
Academy of Sciences of the Czech Republic, Institute of Information Theory and Automation
Supervisor: Ing. František Matúš, CSc.
Prague, 5th June 2007

Contents
Introduction
  Informally
  Formally
Recent Results
  Dimensionality
  Set of Maximizers
  Exponential Families and Maximizers
Examples and Special Cases
  The Case of N Equal Units
  The Case of 2 Units
  The Case of N Units

Introduction

Informally: Interests of Study
the interdependence of stochastic units (random variables), measured with the tools of information theory;
the probability measures (pm) that represent maximal interdependence;
a geometric interpretation of the problem and of its solution;
a description of the structure of the maximizers.

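Formally: Notations
$\Omega$ is a finite set;
$\mathcal{P}(\Omega) := \{p = (p(\omega))_{\omega\in\Omega} \in \mathbb{R}^{\Omega} : p(\omega) \ge 0 \ \forall \omega \in \Omega, \ \sum_{\omega\in\Omega} p(\omega) = 1\}$;
$\operatorname{supp}(p) := \{\omega \in \Omega : p(\omega) > 0\}$;
the Kullback-Leibler divergence $D : \mathcal{P}(\Omega) \times \mathcal{P}(\Omega) \to [0, \infty]$ is defined as
$$(p, q) \mapsto D(p \,\|\, q) := \begin{cases} \sum_{\omega \in \operatorname{supp}(p)} p(\omega) \ln \frac{p(\omega)}{q(\omega)}, & \text{if } \operatorname{supp}(p) \subseteq \operatorname{supp}(q),\\ \infty, & \text{otherwise;} \end{cases}$$
for $E \subseteq \mathcal{P}(\Omega)$ let us define $D_E : \mathcal{P}(\Omega) \to \mathbb{R}_+$ as $p \mapsto D_E(p) := \inf_{q \in E} D(p \,\|\, q)$.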
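
A minimal Python sketch of the divergence just defined. It assumes that a pm is stored as a dictionary from outcomes to probabilities, a representation chosen here only for illustration. The sum runs over supp(p), and the function returns infinity when supp(p) is not contained in supp(q). Evaluating $D_E(p)$ would also require minimizing over $E$, which this sketch does not attempt.

```python
import math


def kl_divergence(p, q):
    """D(p || q) for pm given as dicts {outcome: probability}.

    Sums only over supp(p) and returns math.inf when supp(p) is not a
    subset of supp(q), as in the definition above.
    """
    total = 0.0
    for outcome, p_w in p.items():
        if p_w == 0:
            continue  # outcomes outside supp(p) contribute nothing
        q_w = q.get(outcome, 0.0)
        if q_w == 0:
            return math.inf  # supp(p) is not contained in supp(q)
        total += p_w * math.log(p_w / q_w)
    return total


p = {"a": 0.5, "b": 0.5}
q = {"a": 0.25, "b": 0.75}
print(kl_divergence(p, q))           # 0.5 * ln(4/3), about 0.1438
print(kl_divergence(q, {"a": 1.0}))  # inf
```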

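Formally: Notations (continued)
$V := [N] := \{1, \dots, N\}$ denotes the set of $N \ge 2$ stochastic units (random variables $X_1, \dots, X_N$);
for $i \in [N]$: $\Omega_i$ is the set of configurations of unit $i$, and $n_i := |\Omega_i|$;
for $A \subseteq [N]$: $\Omega_A := \prod_{i \in A} \Omega_i$;
for $p = p_V \in \mathcal{P}(\Omega_V)$: $p_A$ denotes the marginal pm on $\Omega_A$, and $p_i := p_{\{i\}}$;
$\mathcal{P}^{+}(\Omega_V) := \{p \in \mathcal{P}(\Omega_V) : \operatorname{supp}(p) = \Omega_V\}$ is the set of strictly positive pm;
$\overline{\mathcal{F}} := \overline{\mathcal{F}}(\Omega_V) := \{p \in \mathcal{P}(\Omega_V) : p(\omega_1, \dots, \omega_N) = p_1(\omega_1) \cdots p_N(\omega_N) \ \forall (\omega_1, \dots, \omega_N) \in \Omega_V\}$ is the set of factorizable pm;
$\mathcal{F} := \overline{\mathcal{F}} \cap \mathcal{P}^{+}(\Omega_V)$ is the set of strictly positive factorizable pm.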
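
One way to make marginals and factorizability concrete. This sketch assumes that a joint pm on $\Omega_V$ is stored as an N-dimensional NumPy array whose axis $i-1$ indexes $\Omega_i$. The helper names `marginal`, `product_of_marginals` and `is_factorizable` are hypothetical and are not taken from the source.

```python
import numpy as np


def marginal(p, A):
    """Marginal pm p_A; axis i of p indexes Omega_{i+1}, A holds 0-based axes."""
    others = tuple(i for i in range(p.ndim) if i not in A)
    return p.sum(axis=others)


def product_of_marginals(p):
    """The array (w_1, ..., w_N) -> p_1(w_1) * ... * p_N(w_N)."""
    q = marginal(p, {0})
    for i in range(1, p.ndim):
        q = np.multiply.outer(q, marginal(p, {i}))
    return q


def is_factorizable(p, atol=1e-12):
    """True if p lies (numerically) in the set of factorizable pm."""
    return np.allclose(p, product_of_marginals(p), atol=atol)


# n = (2, 2, 3): a product pm and a pm concentrated on two configurations.
indep = np.multiply.outer(np.multiply.outer([0.5, 0.5], [0.2, 0.8]), [0.1, 0.3, 0.6])
corr = np.zeros((2, 2, 3))
corr[0, 0, 0] = corr[1, 1, 2] = 0.5
print(is_factorizable(indep), is_factorizable(corr))  # True False
```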

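Formally: Exponential Families
define the function $\operatorname{Exp} : \mathbb{R}^{\Omega} \to \mathcal{P}(\Omega)$ as
$$\mathbb{R}^{\Omega} \ni X = (X(\omega))_{\omega\in\Omega} \mapsto \operatorname{Exp}(X) = (\operatorname{Exp}(X)(\omega))_{\omega\in\Omega} := \left(\frac{e^{X(\omega)}}{\sum_{\omega'\in\Omega} e^{X(\omega')}}\right)_{\omega\in\Omega};$$
an exponential family in $\mathcal{P}(\Omega)$ is an image $\operatorname{Exp}(T)$, where $T$ is a linear (or affine) subspace of $\mathbb{R}^{\Omega}$;
$\mathcal{F}$ is an exponential family in $\mathcal{P}(\Omega_V)$;
a hierarchy of exponential families based on interactions: for $k \in \{0, 1, \dots, N\}$,
$I^{(k)} := \operatorname{span}\{f \in \mathbb{R}^{\Omega_V} : f \text{ is constant in at least } N - k \text{ variables}\}$,
$\tilde{I}^{(k)} := I^{(k)} \cap (I^{(k-1)})^{\perp}$ for $k \ge 1$, where the orthogonal complement is taken in $\mathbb{R}^{\Omega_V}$ w.r.t. the scalar product $\langle f, g \rangle := \sum_{\omega\in\Omega_V} f(\omega) g(\omega)$;
$\operatorname{Exp}(I^{(0)}) \subseteq \operatorname{Exp}(I^{(1)}) \subseteq \dots \subseteq \operatorname{Exp}(I^{(N)})$;
$\operatorname{Exp}(I^{(0)})$ is the center of the $(\prod_{i\in[N]} n_i - 1)$-dimensional simplex (the uniform distribution on $\Omega_V$);
$\operatorname{Exp}(I^{(1)}) = \mathcal{F}$, with dimension $\sum_{i\in[N]} (n_i - 1)$.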
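
The map Exp is the familiar softmax. The sketch below uses randomly drawn single-variable terms, which are an arbitrary choice for illustration. It checks numerically that Exp of an element of $I^{(1)}$, that is, a sum of functions of single variables, is factorizable, which is consistent with $\operatorname{Exp}(I^{(1)}) = \mathcal{F}$. It also checks that a constant $X$ gives the uniform distribution. Subtracting max X does not change Exp(X), because a constant added to X cancels in the normalization.

```python
import numpy as np


def Exp(X):
    """Exp(X)(w) = e^{X(w)} / sum_{w'} e^{X(w')} for X in R^Omega (any array shape)."""
    Z = np.exp(X - X.max())  # adding a constant to X does not change Exp(X)
    return Z / Z.sum()


# An element of I^(1) on Omega_1 x Omega_2 x Omega_3 with n = (2, 3, 2):
# a sum f_1(w_1) + f_2(w_2) + f_3(w_3) of functions of single variables.
rng = np.random.default_rng(0)
f1, f2, f3 = rng.normal(size=2), rng.normal(size=3), rng.normal(size=2)
X = f1[:, None, None] + f2[None, :, None] + f3[None, None, :]
p = Exp(X)

# Exp(X) equals the product of its one-dimensional marginals, i.e. it lies in F.
p1, p2, p3 = p.sum(axis=(1, 2)), p.sum(axis=(0, 2)), p.sum(axis=(0, 1))
print(np.allclose(p, p1[:, None, None] * p2[None, :, None] * p3[None, None, :]))  # True

# Exp(I^(0)): a constant X gives the uniform distribution on Omega_V.
print(np.allclose(Exp(np.zeros((2, 3, 2))), 1 / 12))  # True
```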

Formally: Multi-Information as a Measure of Interdependence
define $I(p) := I_p(X_1, \dots, X_N) := D_{\mathcal{F}}(p)$;
the question is how $I(p)$ behaves as a function of the pm $p$ on $\mathcal{P}(\Omega_V)$. We are interested in its global and local maximizers: how many of them are there, and does the set of maximizers have any special structure?
$I(p) = I_p(X_1, \dots, X_N) = \sum_{i=1}^{N} H_{p_i}(X_i) - H_p(X_1, \dots, X_N)$, where
the marginal entropy is $H_{p_i}(X_i) = -\sum_{\omega_i\in\Omega_i} p_i(\omega_i) \ln p_i(\omega_i)$ and
the global entropy is $H_p(X_1, \dots, X_N) = -\sum_{\omega\in\Omega_V} p(\omega) \ln p(\omega)$;
with the units ordered so that $n_1 \le \dots \le n_N$, we have $I(p) \le \sum_{i=1}^{N-1} \ln(n_i)$;
$\mathcal{M}(\Omega_V) := \mathcal{M}(n_1, \dots, n_N) := \{p \in \mathcal{P}(\Omega_V) : I(p) = \sum_{i=1}^{N-1} \ln(n_i)\}$;
for which $\Omega_V$ is $\mathcal{M}(\Omega_V) \neq \emptyset$? (maximal global maximizers)
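
As a numerical illustration that does not come from the source, the sketch computes $I(p)$ in two ways. The first uses the entropy formula. The second, independent one computes $D(p \,\|\, p_1 \otimes \dots \otimes p_N)$. The two agree because the infimum defining $D_{\mathcal{F}}(p)$ is attained, on the closure of $\mathcal{F}$, at the product of the marginals of $p$. The random test distribution on a (2, 3, 4) space is an arbitrary choice.

```python
import numpy as np


def entropy(p):
    """Shannon entropy with natural log; 0 ln 0 is taken as 0."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def one_dim_marginals(p):
    return [p.sum(axis=tuple(j for j in range(p.ndim) if j != i)) for i in range(p.ndim)]


def multi_information(p):
    """I(p) = sum_i H(p_i) - H(p) for a joint pm stored as an N-dimensional array."""
    return sum(entropy(m) for m in one_dim_marginals(p)) - entropy(p)


def kl_to_product_of_marginals(p):
    """D(p || p_1 x ... x p_N); where p > 0, every marginal factor is > 0 too."""
    q = np.ones_like(p)
    for i, m in enumerate(one_dim_marginals(p)):
        shape = [1] * p.ndim
        shape[i] = -1
        q = q * m.reshape(shape)
    mask = p > 0
    return float((p[mask] * np.log(p[mask] / q[mask])).sum())


rng = np.random.default_rng(1)
p = rng.random((2, 3, 4))
p /= p.sum()
print(np.isclose(multi_information(p), kl_to_product_of_marginals(p)))  # True
print(multi_information(p) <= np.log(2) + np.log(3))                    # True (n_1 = 2, n_2 = 3)
p_max = np.array([[0.5, 0.0], [0.0, 0.5]])  # an element of M(2, 2)
print(np.isclose(multi_information(p_max), np.log(2)))                  # True
```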

Recent Results

Dimensionality: Dimensions of Maximizers
Theorem (Dimension of Maximizers of the Distance from an Arbitrary Exponential Family [AyKn05], [MaAy04]). Let $\mathcal{E}$ be an exponential family in $\mathcal{P}(\Omega)$ with $\dim \mathcal{E} = d$. Then there exists an exponential family $\mathcal{E}^*$ with $\mathcal{E} \subseteq \mathcal{E}^*$ and $\dim \mathcal{E}^* \le 3d + 2$ such that $\operatorname{cl}(\mathcal{E}^*)$ contains all local maximizers of $D_{\mathcal{E}}$.
From now on let us assume (w.l.o.g.) $2 \le n_1 \le \dots \le n_N$.
Corollary (Dimension of Multi-Information Maximizers [AyKn05]). There exists an exponential family $\mathcal{F}^*$ in $\mathcal{P}(\Omega_V)$ with
$$\dim \mathcal{F}^* \le 3 \sum_{i\in[N]} (n_i - 1) + 2 \le 3N(n_N - 1) + 2$$
such that $\operatorname{cl}(\mathcal{F}^*)$ contains all local maximizers of $I(p)$. In the binary case ($n_i = 2$ for all $i \in [N]$), $\dim \mathcal{F}^* \le 3N + 2$.
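
The bounds in the corollary are plain arithmetic in $d = \dim \mathcal{F} = \sum_{i} (n_i - 1)$. Below is a small hypothetical helper. It assumes the sizes are passed in ascending order, following the w.l.o.g. assumption above.

```python
def dim_F(ns):
    """dim F = sum_i (n_i - 1), the dimension of the strictly positive product pm."""
    return sum(n - 1 for n in ns)


def corollary_bounds(ns):
    """(3 d + 2, 3 N (n_N - 1) + 2) with d = dim F, under 2 <= n_1 <= ... <= n_N."""
    if list(ns) != sorted(ns) or ns[0] < 2:
        raise ValueError("expects 2 <= n_1 <= ... <= n_N")
    return 3 * dim_F(ns) + 2, 3 * len(ns) * (ns[-1] - 1) + 2


print(corollary_bounds([2, 3, 4]))  # (20, 29)
print(corollary_bounds([2] * 5))    # (17, 17), the binary bound 3N + 2 with N = 5
```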

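Set of Maximizers: General Theorems About $\mathcal{M}(\Omega_V)$
Theorems (The Essential Theorems About $\mathcal{M}(\Omega_V)$ [AyKn05])
(1) Let $p$ be a pm on $\Omega_V$. Then $p \in \mathcal{M}(\Omega_V)$ if and only if there exist a pm $p^{(N)} \in \mathcal{P}(\Omega_N)$ and surjective functions $\pi_i : \Omega_N \to \Omega_i$, $i \in [N-1]$, with
$$p^{(N)}[\pi_i(\omega_N) = \omega_i] = \frac{1}{n_i} \quad (\omega_i \in \Omega_i)$$
(uniform distributions on the one-dimensional margins $\Omega_1, \dots, \Omega_{N-1}$), such that for all $(\omega_1, \dots, \omega_N) \in \Omega_V$:
$$p(\omega_1, \dots, \omega_N) = \begin{cases} p^{(N)}(\omega_N), & \text{if } \omega_i = \pi_i(\omega_N) \text{ for all } i \in [N-1],\\ 0, & \text{otherwise.} \end{cases}$$
(2) $\mathcal{M}(\Omega_V) \neq \emptyset$ if and only if
$$n_N \ge n_{\min} := \sum_{\emptyset \neq A \subseteq [N-1]} (-1)^{|A|-1} \operatorname{GCD}((n_i)_{i\in A})$$
(GCD is the greatest common divisor; LCM will denote the least common multiple).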
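
Criterion (2) can be evaluated directly. The sketch below computes $n_{\min}$ by the inclusion-exclusion sum. As a cross-check, it also counts the distinct fractions $j/n_i \in (0, 1]$ for $i \in [N-1]$. The two numbers coincide because the grids $\{j/n_i : 1 \le j \le n_i\}$, $i \in A$, have exactly $\operatorname{GCD}((n_i)_{i\in A})$ points in common. The function names are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from itertools import combinations
from math import gcd


def n_min(head):
    """Sum over nonempty A of (-1)^(|A|-1) GCD((n_i)_{i in A}); head = (n_1, ..., n_{N-1})."""
    total = 0
    for k in range(1, len(head) + 1):
        for A in combinations(head, k):
            total += (-1) ** (k - 1) * reduce(gcd, A)
    return total


def grid_count(head):
    """Number of distinct fractions j / n_i in (0, 1] over i in [N-1]."""
    return len({Fraction(j, n) for n in head for j in range(1, n + 1)})


def maximizers_exist(ns):
    """Criterion (2): M(Omega_V) is nonempty iff n_N >= n_min; ns sorted ascending."""
    return ns[-1] >= n_min(ns[:-1])


print(n_min([2, 3]), n_min([2, 3, 4]), grid_count([2, 3, 4]))    # 4 6 6
print(maximizers_exist([2, 3, 3]), maximizers_exist([2, 3, 4]))  # False True
```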

Set of Maximizers: Remarks on $\mathcal{M}(\Omega_V)$
Remarks (on the Essential Theorems About $\mathcal{M}(\Omega_V)$ [AyKn05])
(1) $\mathcal{M}(\Omega_V) \neq \emptyset$ if
(1a) $N = 2$, or
(1b) $n_1 = \dots = n_N =: n$.
(2) $n_{N-1} \le n_{\min} \le 1 + \sum_{i\in[N-1]} (n_i - 1)$, and
(2a) $n_{\min} = n_{N-1} \iff n_{N-1} = \operatorname{LCM}((n_i)_{i\in[N-1]})$;
(2b) $n_{\min} = 1 + \sum_{i\in[N-1]} (n_i - 1) \iff \operatorname{LCM}((n_i)_{i\in[N-1]}) = \prod_{i\in[N-1]} n_i$, i.e. $n_1, \dots, n_{N-1}$ are pairwise coprime;
(2c) $n_{\min} \le \operatorname{LCM}((n_i)_{i\in[N]})$.
(3) Maximizers $p \in \mathcal{M}(\Omega_V)$ simultaneously maximize the mutual information of the pairs $(i, N)$ of units, $i \in [N-1]$.
(3a) In the case $\operatorname{LCM}((n_i)_{i\in[N-1]}) = n_N$, maximizers $p \in \mathcal{M}(\Omega_V)$ simultaneously maximize the mutual information of all pairs $(i, j)$ of units, $i, j \in [N]$.
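
Theorem (1) is constructive. The sketch below builds a maximizer for $n = (2, 3, 4)$ from a hypothetical choice of $p^{(3)}$ and of onto maps $\pi_1, \pi_2$ whose push-forwards of $p^{(3)}$ are uniform. Here $n_3 = 4 = n_{\min}$. The sketch then checks numerically that $I(p) = \ln 2 + \ln 3$. In line with Remark (3), it also checks that the mutual information of each pair $(i, 3)$ equals $\ln n_i$.

```python
import numpy as np


def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def multi_information(p):
    margs = [p.sum(axis=tuple(j for j in range(p.ndim) if j != i)) for i in range(p.ndim)]
    return sum(entropy(m) for m in margs) - entropy(p)


def mutual_information(p, i, j):
    """Mutual information of units i < j (0-based axes) under the joint pm p."""
    pij = p.sum(axis=tuple(k for k in range(p.ndim) if k not in (i, j)))
    return entropy(pij.sum(axis=1)) + entropy(pij.sum(axis=0)) - entropy(pij)


# Hypothetical data for n = (2, 3, 4): p^(3) on Omega_3 and onto maps pi_1, pi_2
# whose push-forwards of p^(3) are uniform on Omega_1 and Omega_2.
p3 = np.array([1 / 3, 1 / 6, 1 / 6, 1 / 3])
pi1 = [0, 0, 1, 1]  # masses 1/2, 1/2
pi2 = [0, 1, 1, 2]  # masses 1/3, 1/3, 1/3

# Theorem (1): p(w1, w2, w3) = p^(3)(w3) if w1 = pi_1(w3) and w2 = pi_2(w3), else 0.
p = np.zeros((2, 3, 4))
for w3, mass in enumerate(p3):
    p[pi1[w3], pi2[w3], w3] = mass

print(np.isclose(multi_information(p), np.log(2) + np.log(3)))  # True
print(np.isclose(mutual_information(p, 0, 2), np.log(2)),
      np.isclose(mutual_information(p, 1, 2), np.log(3)))        # True True
```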

Exponential Families and Maximizers: Sufficiency of Low-Order Interactions for $\mathcal{M}(\Omega_V)$
Theorem (The Lowest Order of Sufficient Interaction for $\mathcal{M}(\Omega_V)$ is 2 [AyKn05]). There exists an exponential family $\mathcal{F}^* \subseteq \operatorname{Exp}(\tilde{I}^{(2)})$ with $\dim \mathcal{F}^* = (n_N - 1) \sum_{i\in[N-1]} (n_i - 1)$ such that $\mathcal{M}(\Omega_V) \subseteq \operatorname{cl}(\mathcal{F}^*)$.
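
The sketch below illustrates where a dimension of the form $(n_N - 1) \sum_{i\in[N-1]} (n_i - 1)$ can come from. It takes the interactions between each unit $i < N$ and unit $N$, removes their $I^{(1)}$ part by orthogonal projection, and computes the rank of what remains; the resulting subspace lies in $\tilde{I}^{(2)}$. Restricting to the pairs $(i, N)$ is suggested by Theorem (1), where every $\omega_i$ is a function of $\omega_N$. This restriction is an assumption of the sketch, not a statement of the proof in [AyKn05].

```python
import itertools
import numpy as np


def orthonormal_basis(M, tol=1e-10):
    """Orthonormal basis (as columns) of the column space of M, via the SVD."""
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, s > tol * s.max()]


def pairwise_with_last_dim(ns):
    """Rank of the interactions of each unit i < N with unit N after removing their I^(1) part."""
    configs = list(itertools.product(*(range(n) for n in ns)))
    N = len(ns)
    # Columns spanning I^(1): indicators of {w_i = a} (summing over a gives the constants).
    I1 = np.array([[float(w[i] == a) for i in range(N) for a in range(ns[i])]
                   for w in configs])
    # Columns spanning all functions of (w_i, w_N), i < N: indicators of {w_i = a, w_N = b}.
    pairs = np.array([[float(w[i] == a and w[-1] == b)
                       for i in range(N - 1) for a in range(ns[i]) for b in range(ns[-1])]
                      for w in configs])
    Q = orthonormal_basis(I1)
    residual = pairs - Q @ (Q.T @ pairs)  # orthogonal projection onto (I^(1))^perp
    return int(np.linalg.matrix_rank(residual, tol=1e-8))


for ns in ([2, 2], [2, 3, 4], [3, 3, 3]):
    expected = (ns[-1] - 1) * sum(n - 1 for n in ns[:-1])
    print(ns, pairwise_with_last_dim(ns), expected)
```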

Examples and Special Cases

The Case of N Equal Units: $(n, \dots, n)$
For $|\Omega_i| = n$, $i \in [N]$:
$$\mathcal{M}(\Omega_V) = \Big\{ \frac{1}{n} \sum_{\omega_N \in \Omega_N} \delta_{(\pi_1(\omega_N), \dots, \pi_{N-1}(\omega_N), \omega_N)} : \pi_i : \Omega_N \to \Omega_i \text{ bijective}, \ i \in [N-1] \Big\};$$
$|\mathcal{M}(\Omega_V)| = (n!)^{N-1}$;
$I(p) = (N-1) \ln(n)$ for every $p \in \mathcal{M}(\Omega_V)$;
there exists an exponential family of dimension at most $3N(n-1) + 2$ that contains $\mathcal{M}(\Omega_V)$ in its closure;
there exists an exponential family of dimension at most $n^2 + 3n - 2$ that contains $\mathcal{M}(\Omega_V)$ in its closure;
there exists an exponential family $\mathcal{F}^* \subseteq \operatorname{Exp}(\tilde{I}^{(2)})$ of dimension at most $(N-1)(n-1)^2$ such that $\mathcal{M}(\Omega_V) \subseteq \operatorname{cl}(\mathcal{F}^*)$.
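
For small n and N, the set $\mathcal{M}(n, \dots, n)$ can be enumerated from the description above. The sketch builds one pm for each tuple of bijections $(\pi_1, \dots, \pi_{N-1})$. It then checks that there are $(n!)^{N-1}$ of them, that they are distinct, and that $I(p) = (N-1) \ln n$ for each.

```python
import itertools
import math
import numpy as np


def equal_units_maximizers(n, N):
    """All pm (1/n) sum_{w_N} delta_{(pi_1(w_N), ..., pi_{N-1}(w_N), w_N)}, pi_i bijections."""
    perms = list(itertools.permutations(range(n)))
    maximizers = []
    for pis in itertools.product(perms, repeat=N - 1):
        p = np.zeros((n,) * N)
        for w_N in range(n):
            p[tuple(pi[w_N] for pi in pis) + (w_N,)] = 1.0 / n
        maximizers.append(p)
    return maximizers


def multi_information(p):
    def H(q):
        q = q[q > 0]
        return float(-(q * np.log(q)).sum())
    margs = [p.sum(axis=tuple(j for j in range(p.ndim) if j != i)) for i in range(p.ndim)]
    return sum(H(m) for m in margs) - H(p.ravel())


n, N = 3, 3
M = equal_units_maximizers(n, N)
print(len(M) == math.factorial(n) ** (N - 1))                                   # True: 36 maximizers
print(len({p.tobytes() for p in M}) == len(M))                                 # True: all distinct
print(all(np.isclose(multi_information(p), (N - 1) * math.log(n)) for p in M))  # True
```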

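The Case of N Equal Units: (2, 2)
In the case of two binary units, we have
$$\mathcal{M}(\Omega_V) = \Big\{ \tfrac{1}{2}(\delta_{(0,0)} + \delta_{(1,1)}), \ \tfrac{1}{2}(\delta_{(1,0)} + \delta_{(0,1)}) \Big\} \subseteq \mathcal{P}(\{0,1\}^2),$$
$\dim \operatorname{Exp}(\tilde{I}^{(2)}) = 1$, and
$$\mathcal{F}^* = \operatorname{Exp}(\tilde{I}^{(2)}) = \Big\{ \lambda \tfrac{1}{2}(\delta_{(0,0)} + \delta_{(1,1)}) + (1 - \lambda) \tfrac{1}{2}(\delta_{(1,0)} + \delta_{(0,1)}) : 0 < \lambda < 1 \Big\},$$
so $\mathcal{M}(\Omega_V) \subseteq \operatorname{cl}(\mathcal{F}^*)$ (let $\lambda \to 1$ and $\lambda \to 0$).
[Figure: the situation for two binary units; most of the right picture is taken from [AyKn05].]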
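
A quick numerical check of the (2, 2) case. Take $f(\omega_1, \omega_2) = (-1)^{\omega_1 + \omega_2}$. Its row and column sums vanish, so it is orthogonal to every function of a single variable and spans $\tilde{I}^{(2)}$. The sketch checks that $\operatorname{Exp}(c f)$ coincides with the $\lambda$-parametrization for $\lambda = e^c / (e^c + e^{-c})$. It also shows that $I(\operatorname{Exp}(c f))$ approaches $\ln 2$ as $c \to \pm\infty$, so the curve runs toward the two maximizers.

```python
import numpy as np


def Exp(X):
    Z = np.exp(X - X.max())
    return Z / Z.sum()


def mutual_information(p):
    """I(p) for two units: H(p_1) + H(p_2) - H(p)."""
    def H(q):
        q = q[q > 0]
        return float(-(q * np.log(q)).sum())
    return H(p.sum(axis=1)) + H(p.sum(axis=0)) - H(p.ravel())


# f(w1, w2) = (-1)^(w1 + w2) spans tilde I^(2): its row and column sums vanish,
# so it is orthogonal to every function of a single variable.
f = np.array([[1.0, -1.0], [-1.0, 1.0]])

for c in (-10.0, -1.0, 0.0, 0.5, 10.0):
    p = Exp(c * f)
    lam = np.exp(c) / (np.exp(c) + np.exp(-c))
    q = np.array([[lam / 2, (1 - lam) / 2], [(1 - lam) / 2, lam / 2]])
    print(c, np.allclose(p, q), round(mutual_information(p), 6))
```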

The Case of 2 Units: $(n_1, n_2)$
$\mathcal{M}(n_1, n_2) = \{p \in \mathcal{P}(\Omega_1 \times \Omega_2) : I(p) = \ln(n_1)\}$.
Let $\Omega_1' := \Omega_1 \cup \{0\}$ (with $0 \notin \Omega_1$) and $S := \{\pi : \Omega_2 \to \Omega_1' : \pi(\Omega_2) \supseteq \Omega_1\}$. The relation
$$\sigma \preceq \pi \iff \sigma^{-1}(\omega_1) \subseteq \pi^{-1}(\omega_1) \ \text{for all } \omega_1 \in \Omega_1$$
is a partial order (reflexive, antisymmetric, transitive). It makes $S$ a poset and induces the cover graph of $S$.
[Figure: the cover graph of $S$ (the so-called Hasse diagram) and the structure of $\mathcal{M}(2, 3)$.]
The cover graph of $S$ and the set $\mathcal{M}(n_1, n_2)$ are connected if and only if $n_1 < n_2$.
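
The connectivity statement can be checked by brute force for small sizes. The sketch assumes $\Omega_1 = \{1, \dots, n_1\}$, so that 0 can serve as the extra element of $\Omega_1'$. It enumerates $S$ and tests whether the comparability graph of $\preceq$ is connected. This is equivalent to connectivity of the cover graph, since two elements are comparable exactly when a chain of cover relations joins them.

```python
import itertools


def poset_S(n1, n2):
    """Maps pi: Omega_2 -> {0, 1, ..., n1} hitting every label in Omega_1 = {1, ..., n1}."""
    labels = set(range(1, n1 + 1))
    return [pi for pi in itertools.product(range(n1 + 1), repeat=n2) if labels <= set(pi)]


def precedes(sigma, pi):
    """sigma <= pi iff sigma^{-1}(w1) is a subset of pi^{-1}(w1) for every w1 in Omega_1."""
    return all(s == 0 or s == t for s, t in zip(sigma, pi))


def cover_graph_connected(n1, n2):
    """Connectivity of the comparability graph, which matches that of the cover graph."""
    S = poset_S(n1, n2)
    seen, stack = {S[0]}, [S[0]]
    while stack:
        a = stack.pop()
        for b in S:
            if b not in seen and (precedes(a, b) or precedes(b, a)):
                seen.add(b)
                stack.append(b)
    return len(seen) == len(S)


for n1, n2 in [(2, 2), (2, 3), (2, 4), (3, 3), (3, 4)]:
    print((n1, n2), len(poset_S(n1, n2)), cover_graph_connected(n1, n2))
```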

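The Case of 2 Units: $(n_1, n_2)$ (continued)
For a given $\pi \in S$ we consider
$$\mathcal{M}_\pi(\Omega_1, \Omega_2) := \Big\{ p \in \mathcal{P}(\Omega_1 \times \Omega_2) : \sum_{\omega_2 \in \pi^{-1}(\omega_1)} p(\omega_1, \omega_2) = \frac{1}{n_1} \ \forall \omega_1 \in \Omega_1, \text{ and } p(\omega_1, \omega_2) > 0 \iff \pi(\omega_2) = \omega_1 \Big\}.$$
In the matrix $(p(\omega_1, \omega_2))$ there must be at least one positive entry in each row ($\pi$ is onto $\Omega_1$) and at most one positive entry in each column ($\pi$ is a function).
The set of global maximizers of the mutual information is a disjoint union of the relatively open faces $\mathcal{M}_\pi(\Omega_1, \Omega_2)$, with $\dim \mathcal{M}_\pi(\Omega_1, \Omega_2) = |\pi^{-1}(\Omega_1)| - |\Omega_1|$:
$$\mathcal{M}(\Omega_1, \Omega_2) = \bigsqcup_{\pi \in S} \mathcal{M}_\pi(\Omega_1, \Omega_2).$$
For $m = n_1, \dots, n_2$ there are
$$F^m_{n_1} := n_1! \binom{n_2}{m} S_{m,n_1}$$
faces $\mathcal{M}_\pi(\Omega_1, \Omega_2)$ of dimension $m - n_1$, where $S_{m,n_1} := \frac{1}{n_1!} \sum_{i=0}^{n_1} (-1)^i \binom{n_1}{i} (n_1 - i)^m$ denotes the Stirling number of the second kind.
There are $F_{n_1,n_2} := \sum_{m=n_1}^{n_2} F^m_{n_1}$ vertices of the cover graph (naturally different affine spaces/exponential families whose closures contain maximizers).
There exists an exponential family $\mathcal{F}$ with $\dim \mathcal{F} = 2$ such that $\operatorname{cl}(\mathcal{F}) \supseteq \mathcal{M}(2, 3)$.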
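
The face counts can be compared with a direct enumeration of $S$, grouped by the dimension $|\pi^{-1}(\Omega_1)| - |\Omega_1|$ of $\mathcal{M}_\pi$. As before, the labels $1, \dots, n_1$ stand for $\Omega_1$ and 0 marks the extra element. The function names are hypothetical.

```python
import itertools
from collections import Counter
from math import comb, factorial


def stirling2(m, k):
    """S_{m,k} = (1/k!) sum_{i=0}^{k} (-1)^i C(k, i) (k - i)^m (exact integer arithmetic)."""
    return sum((-1) ** i * comb(k, i) * (k - i) ** m for i in range(k + 1)) // factorial(k)


def faces_by_dimension(n1, n2):
    """{m - n1: F^m_{n1}} with F^m_{n1} = n1! C(n2, m) S_{m,n1}, for m = n1, ..., n2."""
    return {m - n1: factorial(n1) * comb(n2, m) * stirling2(m, n1) for m in range(n1, n2 + 1)}


def faces_by_dimension_brute(n1, n2):
    """Enumerate S (labels 1..n1 for Omega_1, 0 = unused) and group by |pi^{-1}(Omega_1)| - n1."""
    counts = Counter()
    for pi in itertools.product(range(n1 + 1), repeat=n2):
        if set(range(1, n1 + 1)) <= set(pi):
            counts[sum(1 for x in pi if x != 0) - n1] += 1
    return dict(counts)


print(faces_by_dimension(2, 3))                # {0: 6, 1: 6}
print(sum(faces_by_dimension(2, 4).values()))  # F_{2,4} = 50
```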

The Case of 2 Units: $(2, n_2)$
Let us consider the number $F_{2,n_2}$ of vertices of the cover graph of $S$. Identifying $\Omega_1$ with $\{1, 2\}$, define the (code) function $C : S \to \{0, 1, 2\}^{n_2}$, $C(\pi) := (\pi(\omega_2))_{\omega_2 \in \Omega_2}$; $C(\pi)$ will be called the code of $\pi$.
$F_{2,n_2} = |S|$ = [all codes] − [codes that miss 1 or miss 2] = (by De Morgan) = [all] − [1 but not 2] − [2 but not 1] − [neither 1 nor 2]; formally,
$$F_{2,n} = 3^n - 2(2^n - 1) - 1 = 3^n - 2 \cdot 2^n + 1, \quad n \in \mathbb{N}.$$
Hence we can represent the problem by a homogeneous linear difference equation with characteristic polynomial
$$P_{2,n}(\lambda) = (\lambda - 3)(\lambda - 2)(\lambda - 1) = \lambda^3 - 6\lambda^2 + 11\lambda - 6.$$
The corresponding difference equation is $a_{n+3} = 6a_{n+2} - 11a_{n+1} + 6a_n$, with initial values $a_2 = F_{2,2} = 2$, $a_3 = F_{2,3} = 12$, $a_4 = F_{2,4} = 50$.
There exists an exponential family $\mathcal{F}$ with $\dim \mathcal{F} = n_2 - 1$ such that $\operatorname{cl}(\mathcal{F}) \supseteq \mathcal{M}(2, n_2)$.
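
The closed form and the difference equation can be checked against each other. The sketch expands the characteristic polynomial from its roots. It then runs the recurrence from $a_2, a_3, a_4$ and compares the result with $3^n - 2(2^n - 1) - 1$. The helper names are hypothetical.

```python
def F2(n):
    """Closed form F_{2,n} = 3^n - 2(2^n - 1) - 1."""
    return 3 ** n - 2 * (2 ** n - 1) - 1


def poly_from_roots(roots):
    """Integer coefficients, highest degree first, of prod_r (lambda - r)."""
    coeffs = [1]
    for r in roots:
        # multiply by (lambda - r): shift for lambda, subtract r times the old list
        coeffs = [a - r * b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs


def run_recurrence(coeffs, initial, count):
    """Extend `initial` using a_{n+d} = -(c_1 a_{n+d-1} + ... + c_d a_n) until `count` terms."""
    d = len(coeffs) - 1
    a = list(initial)
    while len(a) < count:
        a.append(-sum(c * x for c, x in zip(coeffs[1:], reversed(a[-d:]))))
    return a


P = poly_from_roots([3, 2, 1])
print(P)                                  # [1, -6, 11, -6]
print(run_recurrence(P, [2, 12, 50], 5))  # [2, 12, 50, 180, 602]
```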

The Case of N Units: $(2, 2, n_3)$
Analogously, we define $F_{2,2,n_3}$ as the number of naturally different exponential families whose closures contain $\mathcal{M}(2, 2, n_3)$. We notice that the code of an onto mapping contains both symbols of at least one of the pairs $(1, 4)$, $(2, 3)$, and we obtain
$$F_{2,2,n} = 5^n - 4 \cdot 3^n + 4 \cdot 2^n - 1,$$
$$P_{2,2,n}(\lambda) = (\lambda - 5) P_{2,n}(\lambda) = (\lambda - 5)(\lambda - 3)(\lambda - 2)(\lambda - 1),$$
with $a_2 = 4$, $a_3 = F_{2,2,3}$, $a_4 = F_{2,2,4}$, $a_5 = F_{2,2,5}$;
$\dim \mathcal{F} = 2(n_3 - 1)$.
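
A brute-force check of the counting rule stated above. It counts codes in $\{0, \dots, 4\}^n$ that contain both 1 and 4, or both 2 and 3. The alphabet is an assumption of this sketch: 0 marks an unused state of $\Omega_3$, and 1 to 4 stand for the four configurations of the first two units. The closed form follows by inclusion-exclusion over the two pairs, and the last check confirms that it satisfies the recurrence of $P_{2,2,n}$.

```python
import itertools


def F22_brute(n):
    """Count codes in {0, 1, 2, 3, 4}^n containing both 1 and 4, or both 2 and 3."""
    count = 0
    for code in itertools.product(range(5), repeat=n):
        used = set(code)
        if {1, 4} <= used or {2, 3} <= used:
            count += 1
    return count


def F22_closed(n):
    """Inclusion-exclusion: 2(5^n - 2*4^n + 3^n) - (5^n - 4*4^n + 6*3^n - 4*2^n + 1)."""
    return 5 ** n - 4 * 3 ** n + 4 * 2 ** n - 1


print([F22_closed(n) for n in range(2, 6)])                   # [4, 48, 364, 2280]
print(all(F22_brute(n) == F22_closed(n) for n in range(7)))  # True
# (l-5)(l-3)(l-2)(l-1) = l^4 - 11 l^3 + 41 l^2 - 61 l + 30 gives the recurrence below.
print(all(F22_closed(n + 4) == 11 * F22_closed(n + 3) - 41 * F22_closed(n + 2)
          + 61 * F22_closed(n + 1) - 30 * F22_closed(n) for n in range(10)))  # True
```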

The Case of N Units: $(2, 2, 2, n_4)$
$$F_{2,2,2,n} = 9^n - 2 \cdot 5^n - 10 \cdot 4^n + 18 \cdot 3^n - 6 \cdot 2^n - 1,$$
$$P_{2,2,2,n}(\lambda) = (\lambda - 9)(\lambda - 4) P_{2,2,n}(\lambda) = (\lambda - 9)(\lambda - 5)(\lambda - 4) P_{2,n}(\lambda);$$
$\dim \mathcal{F} = 3(n_4 - 1)$.
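
In this slide's formula for $F_{2,2,2,n}$, only the bases and the magnitudes 2, 10, 6 and 1 of the coefficients are reliably legible; the signs and the coefficient of $3^n$ have to be recovered. The sketch shows one way to recover them. It assumes that the closed form vanishes at $n = 0$ and $n = 1$, as the $(2, n)$ and $(2, 2, n)$ formulas do. It also assumes the value $|\mathcal{M}(2, 2, 2, 2)| = (2!)^3 = 8$ at $n = 2$, from the case of N equal units. It then tries every sign pattern and solves for the missing coefficient. The result is a reconstruction under these assumptions, not a value read from the source.

```python
from itertools import product

BASES = (9, 5, 4, 3, 2, 1)


def closed_form(coeffs, n):
    """sum_k coeffs[k] * BASES[k]^n."""
    return sum(c * b ** n for c, b in zip(coeffs, BASES))


# Known magnitudes: 1 (for 9^n), 2 (5^n), 10 (4^n), unknown x (3^n), 6 (2^n), 1 (constant).
solutions = []
for s5, s4, s2, s1 in product((1, -1), repeat=4):
    # F(0) = 0 determines the unknown coefficient x of 3^n.
    x = -(1 + 2 * s5 + 10 * s4 + 6 * s2 + s1)
    coeffs = (1, 2 * s5, 10 * s4, x, 6 * s2, s1)
    if closed_form(coeffs, 1) == 0 and closed_form(coeffs, 2) == 8:
        solutions.append(coeffs)

print(solutions)  # [(1, -2, -10, 18, -6, -1)]
```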

The Case of N Units: Questions for Further Study
Is the ad hoc search for difference-equation representations useful or necessary?
Is there any (reasonable) mapping between the number of surjective mappings and the space of natural polynomials?
Is there any connection, through the characteristic polynomial representation, with the maximizers on spaces of continuous distributions (passing from difference to differential equations)?

Thank you for your attention!

Bibliography
[AyKn05] Ay, N., Knauf, A. (2005): Maximizing Multi-Information. Kybernetika.
[MaAy04] Matúš, F., Ay, N. (2004): On maximization of the information divergence from an exponential family. WUPES 03, University of Economics, Prague, pp.