Parameter Learning: Binary Variables
1 Parameter Learning: Binary Variables. SS 2008 Bayesian Networks, Multimedia Computing, Universität Augsburg
2 Reference
Richard E. Neapolitan: Learning Bayesian Networks. Prentice Hall Series in Artificial Intelligence. Chapter 6, "Parameter Learning: Binary Variables". Figures and text are taken from that book.
Prof. Dr. Rainer Lienhart, Multimedia Computing, Institut für Informatik, Universität Augsburg; Rainer.Lienhart@informatik.uni-augsburg.de
3 Augmented Bayesian Network (1)
Goal: extend the theory from learning one parameter to learning many parameters in a BN.
Definition 6.8: An augmented Bayesian network $(G, F, \rho)$ is a Bayesian network determined by the following:
1. A DAG $G = (V, E)$ where $V = \{X_1, \dots, X_n\}$ and each $X_i$ is a random variable.
2. For every $i$, an auxiliary parent variable $F_i$ of $X_i$ and a density function $\rho_i$ of $F_i$. Each $F_i$ is a root and has no edge to any variable except $X_i$. The set of all $F_i$ is denoted by $F$; that is, $F = F_1 \cup F_2 \cup \dots \cup F_n$.
3. For every $i$, for all values $pa_i$ of the parents $PA_i$ in $V$ of $X_i$, and all values $f_i$ of $F_i$, a probability distribution of $X_i$ conditional on $pa_i$ and $f_i$.
An augmented BN is simply a BN; the notation is the only difference. => It encodes our beliefs concerning the unknown conditional relative frequencies (parameters) needed for the DAG.
4 Augmented Bayesian Network (2)
$F_i$ is a set of random variables representing our belief concerning the relative frequencies of the values of $X_i$ given values of the parents of $X_i$. Since all $F_i$ are root nodes, they are mutually independent. Therefore we have global parameter independence:
$\rho(f_1, f_2, \dots, f_n) = \rho(f_1)\,\rho(f_2)\cdots\rho(f_n)$
Theorem 6.6: Let an augmented Bayesian network $(G, F, \rho)$ be given. Then the marginal distribution $P$ of $\{X_1, \dots, X_n\}$ constitutes a Bayesian network with $G$. We say $(G, F, \rho)$ embeds $(G, P)$.
5 Augmented Bayesian Network (3)
Definition 6.9: A binomial augmented Bayesian network $(G, F, \rho)$ is an augmented Bayesian network with the following properties:
1. For every $i$, $X_i$ has space $\{1, 2\}$.
2. For every $i$, there is an ordering $[pa_{i1}, pa_{i2}, \dots, pa_{iq_i}]$ of all instantiations of the parents $PA_i$ in $V$ of $X_i$, where $q_i$ is the number of different instantiations of these parents. Furthermore, for every $i$, $F_i = \{F_{i1}, F_{i2}, \dots, F_{iq_i}\}$, where each $F_{ij}$ is a root, has no edge to any variable except $X_i$, and has density function $\rho_{ij}(f_{ij})$ with $0 \le f_{ij} \le 1$.
3. For every $i$ and $j$, and all values $f_i = \{f_{i1}, \dots, f_{ij}, \dots, f_{iq_i}\}$ of $F_i$,
$P(X_i = 1 \mid pa_{ij}, f_{i1}, \dots, f_{ij}, \dots, f_{iq_i}) = f_{ij}$
6 Example: Figure (network figure from the book, not transcribed)
7 Augmented Bayesian Network (4)
$F_{ij}$ is a random variable whose probability distribution represents our belief concerning the relative frequency with which $X_i$ is equal to 1 given that the parents of $X_i$ are in their $j$-th instantiation. The $F_{ij}$ are all roots in the BN, so they are mutually independent; this gives local parameter independence of the members of each $F_i$:
$\rho(f_{i1}, f_{i2}, \dots, f_{iq_i}) = \rho(f_{i1})\,\rho(f_{i2})\cdots\rho(f_{iq_i})$ for $1 \le i \le n$
Together, global and local parameter independence yield:
$\rho(f_{11}, f_{12}, \dots, f_{nq_n}) = \rho(f_{11})\,\rho(f_{12})\cdots\rho(f_{nq_n})$
8 Augmented Bayesian Network (5)
Theorem 6.7: Let a binomial augmented Bayesian network $(G, F, \rho)$ be given. Then for each $i$ and each $j$, the $ij$-th conditional distribution in the embedded Bayesian network $(G, P)$ is given by
$P(X_i = 1 \mid pa_{ij}) = E(F_{ij})$
Corollary: Let a binomial augmented Bayesian network be given. If each $F_{ij}$ has a beta distribution with parameters $a_{ij}$, $b_{ij}$, $N_{ij} = a_{ij} + b_{ij}$, then for each $i$ and each $j$ the $ij$-th conditional distribution in the embedded network $(G, P)$ is given by
$P(X_i = 1 \mid pa_{ij}) = \dfrac{a_{ij}}{N_{ij}}$
Note: Inference is always done in the embedded BN using only the variables in $V$.
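Since each embedded conditional probability is just the mean of the corresponding beta density, computing the embedded network from the augmented one is a one-liner per parameter. A minimal Python sketch for an assumed two-node network $X_1 \to X_2$ with uniform beta(1, 1) densities (the network, names, and numbers are illustrative, not taken from the slides):

```python
from dataclasses import dataclass

@dataclass
class BetaParam:
    """Density beta(f_ij; a_ij, b_ij) of one auxiliary variable F_ij."""
    a: float
    b: float

    def mean(self) -> float:
        # Theorem 6.7 with beta densities: P(X_i = 1 | pa_ij) = a_ij / N_ij
        return self.a / (self.a + self.b)

# Keys are (node, values of its parents); X1 is a root, X2 has parent X1.
params = {
    ("X1", ()): BetaParam(1, 1),
    ("X2", (1,)): BetaParam(1, 1),
    ("X2", (2,)): BetaParam(1, 1),
}

for (node, pa), fij in params.items():
    print(f"P({node}=1 | parents={pa}) = {fij.mean():.3f}")
```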
9 Bayesian Network Sample
Definition 6.10: Suppose we have a sample of size $M$ as follows:
1. We have the random vectors $X^{(1)} = (X_1^{(1)}, \dots, X_n^{(1)})^T, \dots, X^{(M)} = (X_1^{(M)}, \dots, X_n^{(M)})^T$, with $D = \{X^{(1)}, \dots, X^{(M)}\}$, such that, for every $i$, each $X_i^{(h)}$ has the same space.
2. There is an augmented Bayesian network $(G, F, \rho)$, where $G = (V, E)$, such that for $1 \le h \le M$, $\{X_1^{(h)}, \dots, X_n^{(h)}\}$ constitutes an instance of $V$ in $G$ resulting in a distinct augmented Bayesian network.
Then the sample $D$ is called a Bayesian network sample of size $M$ with parameter $(G, F)$.
Definition 6.11: Suppose we have a Bayesian network sample of size $M$ such that
1. for every $i$, each $X_i^{(h)}$ has the space $\{1, 2\}$;
2. its augmented Bayesian network $(G, F, \rho)$ is binomial.
Then the sample $D$ is called a binomial Bayesian network sample of size $M$ with parameter $(G, F)$.
10 $P(d \mid f_1, \dots, f_n)$
Lemma 6.8: Suppose
1. $D$ is a Bayesian network sample of size $M$ with parameter $(G, F)$;
2. we have a set of values (data) of the $X^{(h)}$ as follows: $x^{(1)} = (x_1^{(1)}, \dots, x_n^{(1)})^T, \dots, x^{(M)} = (x_1^{(M)}, \dots, x_n^{(M)})^T$, with $d = \{x^{(1)}, \dots, x^{(M)}\}$.
Then
$P(d \mid f_1, \dots, f_n) = \prod_{i=1}^{n} \prod_{h=1}^{M} P(x_i^{(h)} \mid pa_i^{(h)}, f_i)$
where $pa_i^{(h)}$ contains the values of the parents of $X_i$ in the $h$-th case.
Lemma 6.10: Suppose, in addition, that $D$ is a binomial Bayesian network sample of size $M$ with parameter $(G, F)$. Then
$P(d \mid f_{11}, \dots, f_{nq_n}) = \prod_{i=1}^{n} \prod_{h=1}^{M} P(x_i^{(h)} \mid pa_i^{(h)}, f_i) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} f_{ij}^{\,s_{ij}} (1 - f_{ij})^{t_{ij}}$
where $M_{ij}$ is the number of cases in which $X_i$'s parents are in their $j$-th instantiation, and of these $M_{ij}$ cases, $s_{ij}$ is the number in which $x_i^{(h)}$ equals 1 and $t_{ij}$ is the number in which it equals 2.
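Lemma 6.10 reduces the likelihood of a binomial BN sample to the counts $s_{ij}$ and $t_{ij}$. A sketch of that computation, again for an assumed two-node network $X_1 \to X_2$ with made-up data (values encoded as 1 and 2, as in the definitions):

```python
# Hypothetical complete data over X1 -> X2; each pair is (x1, x2) with values in {1, 2}.
data = [(1, 1), (1, 2), (2, 1), (1, 1)]

def root_counts(data):
    """s_11 and t_11 for the root X1."""
    s = sum(1 for (x1, _) in data if x1 == 1)
    return s, len(data) - s

def child_counts(data, j):
    """s_2j and t_2j for X2 when its parent X1 is in instantiation j."""
    cases = [x2 for (x1, x2) in data if x1 == j]
    s = sum(1 for x2 in cases if x2 == 1)
    return s, len(cases) - s

# P(d | f) = product over all parameters of f_ij^s_ij * (1 - f_ij)^t_ij   (Lemma 6.10)
f = {"f11": 0.5, "f21": 0.5, "f22": 0.5}
s11, t11 = root_counts(data)
s21, t21 = child_counts(data, 1)
s22, t22 = child_counts(data, 2)
likelihood = (f["f11"] ** s11 * (1 - f["f11"]) ** t11
              * f["f21"] ** s21 * (1 - f["f21"]) ** t21
              * f["f22"] ** s22 * (1 - f["f22"]) ** t22)
print(likelihood)   # 0.5 ** 8 = 1/256 for this data
```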
11 Posterior Global Parameter Independence
Theorem 6.8: Suppose we have the conditions in Lemma 6.8. Then
$P(d) = \prod_{i=1}^{n} \int \prod_{h=1}^{M} P(x_i^{(h)} \mid pa_i^{(h)}, f_i)\, \rho_i(f_i)\, df_i$
Theorem 6.9 (Posterior Global Parameter Independence): Suppose we have the conditions in Lemma 6.8. Then the $F_i$ are mutually independent conditional on $D$:
$\rho(f_1, \dots, f_n \mid d) = \prod_{i=1}^{n} \rho(f_i \mid d)$
For a binomial Bayesian network sample (conditions of Lemma 6.10) this specializes to
$P(d) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} E\!\left(F_{ij}^{\,s_{ij}} (1 - F_{ij})^{t_{ij}}\right)$
and, if each $F_{ij}$ has a beta distribution with parameters $a_{ij}$, $b_{ij}$, $N_{ij} = a_{ij} + b_{ij}$,
$P(d) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(N_{ij})}{\Gamma(N_{ij} + M_{ij})} \cdot \frac{\Gamma(a_{ij} + s_{ij})\, \Gamma(b_{ij} + t_{ij})}{\Gamma(a_{ij})\, \Gamma(b_{ij})}$
Likewise, under the conditions of Lemma 6.10 the $F_{ij}$ are mutually independent conditional on $D$:
$\rho(f_{11}, f_{12}, \dots, f_{nq_n} \mid d) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \rho(f_{ij} \mid d)$
with
$\rho(f_{ij} \mid d) = \dfrac{f_{ij}^{\,s_{ij}} (1 - f_{ij})^{t_{ij}}\, \rho(f_{ij})}{E\!\left(F_{ij}^{\,s_{ij}} (1 - F_{ij})^{t_{ij}}\right)}$
If $\rho(f_{ij}) = \mathrm{beta}(f_{ij}; a_{ij}, b_{ij})$, then $\rho(f_{ij} \mid d) = \mathrm{beta}(f_{ij}; a_{ij} + s_{ij},\, b_{ij} + t_{ij})$.
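With beta densities both the posterior and the marginal likelihood $P(d)$ are closed-form. A sketch with illustrative counts; `math.lgamma` evaluates the Gamma-function ratios stably:

```python
from math import lgamma, exp

def beta_update(a, b, s, t):
    """Posterior of F_ij after s occurrences of X_i = 1 and t of X_i = 2."""
    return a + s, b + t

def log_marginal(a, b, s, t):
    """log of Gamma(N)/Gamma(N+M) * Gamma(a+s)Gamma(b+t)/(Gamma(a)Gamma(b))."""
    N, M = a + b, s + t
    return (lgamma(N) - lgamma(N + M)
            + lgamma(a + s) - lgamma(a)
            + lgamma(b + t) - lgamma(b))

# One parameter with a uniform beta(1, 1) prior and counts s = 3, t = 1:
a1, b1 = beta_update(1, 1, 3, 1)
print(a1, b1, a1 / (a1 + b1))         # beta(4, 2); updated P(X=1 | d) = 2/3
print(exp(log_marginal(1, 1, 3, 1)))  # this parameter's factor of P(d): 1/20
```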
12 Theorem 6.10
Definition 6.12: The augmented Bayesian network $(G, F, \rho \mid d)$, with $\rho \mid d$ denoting $\rho(f_1, \dots, f_n \mid d)$, is called the updated augmented Bayesian network relative to the Bayesian network sample and the data $d$. The network it embeds is called the updated embedded Bayesian network relative to the Bayesian network sample and the data $d$.
Theorem 6.10: Suppose the conditions in Lemma 6.8 hold, and we create a Bayesian network sample of size $M+1$ by including another random vector
$X^{(M+1)} = (X_1^{(M+1)}, \dots, X_n^{(M+1)})^T$
Then if $D$ is the Bayesian network sample of size $M$, the updated distribution $P(x_1^{(M+1)}, \dots, x_n^{(M+1)} \mid d)$ is the probability distribution in the updated embedded Bayesian network.
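One consequence worth making concrete: updating case by case, each time predicting the next case from the current updated network, ends in the same place as one batch update. A sketch for a single root parameter with made-up observations:

```python
# Sequential vs. batch updating for one root parameter (illustrative values).
cases = [1, 1, 2, 1]        # observed values of X1, encoded in {1, 2}
a, b = 1.0, 1.0             # beta(1, 1) prior
for x in cases:
    p_next = a / (a + b)    # P(next case = 1 | data so far), in the updated network
    print(f"P(next = 1) = {p_next:.3f}, then observe {x}")
    if x == 1:
        a += 1
    else:
        b += 1
print(a, b)                 # beta(4, 2): same as one batch update with s = 3, t = 1
```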
13 Example 6.x (1) (worked-example figure, not transcribed)
14 Example 6.x (2) (worked-example figure, not transcribed)
15 Example 6.x (3) (worked-example figure, not transcribed)
16 Example 6.x (4) (worked-example figure, not transcribed)
17 Sample Size Problem (1)
Example (figure): a small network is updated with data and the resulting value of $P(X_1 = 1 \mid \cdot)$ looks weird; the numeric details are not recoverable from the transcription.
18 Sample Size Problem (2)
Example (figure): the same data processed in a Markov-equivalent DAG yields a different value of $P(X_1 = 1 \mid \cdot)$. The two graphs are equivalent: how can the results be different?
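The discrepancy can be reproduced numerically. A sketch assuming beta(1, 1) densities on every parameter of both DAGs (so the sizes $N_{ij}$ are not consistent with any single prior sample) and made-up data:

```python
# Markov-equivalent DAGs G1: X1 -> X2 and G2: X2 -> X1, beta(1, 1) everywhere.
data = [(1, 1), (1, 2), (2, 1), (1, 1)]   # hypothetical (x1, x2) pairs, values in {1, 2}
M = len(data)

def post_mean(a, b, s, t):
    """Updated P(X = 1 | d) for one parameter: (a + s) / (a + b + s + t)."""
    return (a + s) / (a + b + s + t)

# G1: P(X1=1 | d) comes from the root parameter of X1 alone.
s11 = sum(1 for (x1, _) in data if x1 == 1)
p_g1 = post_mean(1, 1, s11, M - s11)

# G2: P(X1=1 | d) = sum over x2 of P(X2=x2 | d) * P(X1=1 | x2, d).
s2 = sum(1 for (_, x2) in data if x2 == 1)
p2 = post_mean(1, 1, s2, M - s2)
p_g2 = 0.0
for x2, p_x2 in ((1, p2), (2, 1 - p2)):
    cases = [x1 for (x1, c2) in data if c2 == x2]
    s = sum(1 for x1 in cases if x1 == 1)
    p_g2 += p_x2 * post_mean(1, 1, s, len(cases) - s)

print(p_g1, p_g2)   # 0.667 vs 0.622: same independencies, different updated beliefs
```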
19 Equivalent Sample Size (1)
Example (figure): the computation of $P(X_1 = 1 \mid \cdot)$ redone with beta priors of equivalent sample size; numeric details not recoverable.
20 Equivalent Sample Size (2)
Example (figure): the same computation in the Markov-equivalent DAG, again with equivalent-sample-size priors; numeric details not recoverable.
21 Equivalent Sample Size (3)
The idea is to specify values for $a_{ij}$ and $b_{ij}$ that could actually occur in a sample exhibiting the conditional independencies entailed by the DAG.
Definition 6.13: Suppose we have a binomial augmented Bayesian network in which the density functions are $\mathrm{beta}(f_{ij}; a_{ij}, b_{ij})$ for all $i$ and $j$. If there is a number $N$ such that, for all $i$ and $j$,
$N_{ij} = a_{ij} + b_{ij} = P(pa_{ij}) \times N$
then the network is said to have equivalent sample size $N$. In the case of a root, $PA_i$ is empty, $q_i = 1$, and $P(pa_{i1}) = 1$.
Theorem 6.13: Suppose we specify $G$, $F$, and $N$ and assign, for all $i$ and $j$,
$a_{ij} = b_{ij} = \dfrac{N}{2 q_i}$
Then the resulting augmented Bayesian network has equivalent sample size $N$, and the probability distribution in the resulting embedded BN is uniform.
Theorem 6.14: Suppose we specify $G$, $F$, $N$, and a BN $(G, P)$, and assign, for all $i$ and $j$,
$a_{ij} = P(X_i = 1 \mid pa_{ij}) \times P(pa_{ij}) \times N$
$b_{ij} = P(X_i = 2 \mid pa_{ij}) \times P(pa_{ij}) \times N$
Then the resulting augmented BN has equivalent sample size $N$, and it embeds the originally specified BN.
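The two assignments are easy to mechanize. A sketch (function names and numbers are mine, not the book's):

```python
def uniform_ess_priors(q, N):
    """Uniform case: a_ij = b_ij = N / (2 * q_i) for a node with q_i parent
    instantiations gives equivalent sample size N and a uniform embedded BN."""
    return [(N / (2 * q), N / (2 * q)) for _ in range(q)]

def ess_priors_from_bn(p_x1_given_pa, p_pa, N):
    """General case: a_ij = P(X_i=1 | pa_ij) P(pa_ij) N,
    b_ij = P(X_i=2 | pa_ij) P(pa_ij) N."""
    return [(p1 * ppa * N, (1 - p1) * ppa * N)
            for p1, ppa in zip(p_x1_given_pa, p_pa)]

print(uniform_ess_priors(q=2, N=2))                # [(0.5, 0.5), (0.5, 0.5)]
print(ess_priors_from_bn([0.8, 0.3], [0.4, 0.6], N=10))
# [(3.2, 0.8), (1.8, 4.2)]: N_ij = a_ij + b_ij = P(pa_ij) * N, as required
```

With $N = 2$ the uniform case reproduces the prior-indifference recipe on the "Expressing Prior Indifference" slide below: roots get beta(1, 1) and a node with $q_i$ parent instantiations gets beta($1/q_i$, $1/q_i$).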
22 Equivalent Sample Size (4)
Definition 6.14: Binomial augmented BNs $(G_1, F^{(G_1)}, \rho^{(G_1)})$ and $(G_2, F^{(G_2)}, \rho^{(G_2)})$ are called equivalent if they satisfy the following:
1. $G_1$ and $G_2$ are Markov equivalent.
2. The probability distributions in their embedded BNs are the same.
3. The specified density functions in both are beta.
4. They have the same equivalent sample size.
Theorem 6.15: Suppose we have two equivalent binomial augmented Bayesian networks $(G_1, F^{(G_1)}, \rho^{(G_1)})$ and $(G_2, F^{(G_2)}, \rho^{(G_2)})$. Let $D$ be a set of random vectors as specified in Definition 6.10. Then, given any set $d$ of values of the vectors in $D$, the updated embedded Bayesian network relative to $D$ and the data $d$ obtained by considering $D$ a binomial BN sample with parameter $(G_1, F^{(G_1)})$ contains the same probability distribution as the one obtained by considering $D$ a binomial BN sample with parameter $(G_2, F^{(G_2)})$.
=> As long as we use an equivalent sample size, our updated probability distribution does not depend on which equivalent DAG we use to represent a set of conditional independencies.
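Re-running the slide-18 comparison with priors of equivalent sample size $N = 2$ shows the theorem in action (same made-up data as before):

```python
# Same two DAGs and data as before, now with equivalent sample size N = 2:
# roots get beta(1, 1); a child with q = 2 parent instantiations gets beta(1/2, 1/2).
data = [(1, 1), (1, 2), (2, 1), (1, 1)]   # hypothetical pairs, values in {1, 2}
M = len(data)

def post_mean(a, b, s, t):
    return (a + s) / (a + b + s + t)

s11 = sum(1 for (x1, _) in data if x1 == 1)
p_g1 = post_mean(1, 1, s11, M - s11)       # G1: X1 is a root

s2 = sum(1 for (_, x2) in data if x2 == 1)
p2 = post_mean(1, 1, s2, M - s2)           # G2: X2 is a root
p_g2 = 0.0
for x2, p_x2 in ((1, p2), (2, 1 - p2)):
    cases = [x1 for (x1, c2) in data if c2 == x2]
    s = sum(1 for x1 in cases if x1 == 1)
    p_g2 += p_x2 * post_mean(0.5, 0.5, s, len(cases) - s)   # beta(1/2, 1/2)

print(p_g1, p_g2)   # both 0.666...: the updated distributions now agree
```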
23 Expressing Prior Indifference
Use $N = 2$: root nodes get $\mathrm{beta}(f; 1, 1)$; other nodes, with $q_i$ different combinations of parent values, get $\mathrm{beta}(f; 1/q_i, 1/q_i)$. In this way, the total sample size at each node is always 2.
24 Learning with Randomly Missing Data. SS 2008 Bayesian Networks, Multimedia Computing, Universität Augsburg
25 Learning with Data
Network $X_1 \to X_2$ with priors $\mathrm{beta}(f_{11}; 2, 2)$, $\mathrm{beta}(f_{21}; 1, 1)$, $\mathrm{beta}(f_{22}; 1, 1)$ (equivalent sample size $N = 4$), so initially $P(X_1 = 1) = 1/2$, $P(X_2 = 1 \mid X_1 = 1) = 1/2$, $P(X_2 = 1 \mid X_1 = 2) = 1/2$.
Data: five complete cases with counts $s_{11} = 4$, $t_{11} = 1$; $s_{21} = 3$, $t_{21} = 1$; $s_{22} = 0$, $t_{22} = 1$.
Updated densities: $\mathrm{beta}(f_{11}; 6, 3)$, $\mathrm{beta}(f_{21}; 4, 2)$, $\mathrm{beta}(f_{22}; 1, 2)$, giving $P(X_1 = 1) = 2/3$, $P(X_2 = 1 \mid X_1 = 1) = 2/3$, $P(X_2 = 1 \mid X_1 = 2) = 1/3$.
26 Learning with Missing Data
Assumption: data items are missing randomly.
Same network and priors as before, but now $X_2$ is missing in two of the five cases (one with $x_1 = 1$, one with $x_1 = 2$). Each incomplete case is split into two fractional occurrences of weight $1/2$ (the current value of $P(X_2 \mid x_1)$), giving expected counts $s'_{21} = 2.5$, $t'_{21} = 1.5$, $s'_{22} = 0.5$, $t'_{22} = 0.5$.
Updated densities: $\mathrm{beta}(f_{11}; 6, 3)$, $\mathrm{beta}(f_{21}; 7/2, 5/2)$, $\mathrm{beta}(f_{22}; 3/2, 3/2)$, giving $P(X_1 = 1) = 2/3$, $P(X_2 = 1 \mid X_1 = 1) = 7/12$, $P(X_2 = 1 \mid X_1 = 2) = 1/2$.
27 What We Computed
We set $f' = \{f'_{11}, f'_{21}, f'_{22}\} = E(f) = \{1/2, 1/2, 1/2\}$ (the prior means) and computed
$s'_{21} = E(s_{21} \mid d, f') = \sum_{h=1}^{M} P(X_1^{(h)} = 1, X_2^{(h)} = 1 \mid x^{(h)}, f') = 1 + 1 + 0 + \tfrac{1}{2} + 0 = \tfrac{5}{2}$
$t'_{21} = E(t_{21} \mid d, f') = 0 + 0 + 1 + \tfrac{1}{2} + 0 = \tfrac{3}{2}$
=> These estimates are based only on the 'data' in our prior sample, because $f'$ holds the prior means; they are not based on the data $d$.
28 Recall
The updated network from slide 26: $\mathrm{beta}(f_{11}; 6, 3)$, $\mathrm{beta}(f_{21}; 7/2, 5/2)$, $\mathrm{beta}(f_{22}; 3/2, 3/2)$, with $P(X_1 = 1) = 2/3$, $P(X_2 = 1 \mid X_1 = 1) = 7/12$, $P(X_2 = 1 \mid X_1 = 2) = 1/2$.
29 What We Should Compute
Incorporate the data $d$ in our estimates: set $f' = \{f'_{11}, f'_{21}, f'_{22}\} = \{2/3, 7/12, 1/2\}$, the means of the updated densities. Then
$s'_{21} = E(s_{21} \mid d, f') = \sum_{h=1}^{M} P(X_1^{(h)} = 1, X_2^{(h)} = 1 \mid x^{(h)}, f') = 1 + 1 + 0 + \tfrac{7}{12} + 0 = \tfrac{31}{12}$
$t'_{21} = E(t_{21} \mid d, f') = 0 + 0 + 1 + \tfrac{5}{12} + 0 = \tfrac{17}{12}$
=> Under certain conditions, the limit approached by $f'$ when this procedure is repeated is a value $\tilde f$ of $f$ that locally maximizes $\rho(f \mid d)$.
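A compact sketch of this repeated procedure on the slides' example (the ordering of the five cases and the encoding of a missing $X_2$ as `None` are my assumptions; each iteration computes expected counts given the current estimates, then takes the means of the updated beta densities):

```python
# Iterative estimation for X1 -> X2 with priors beta(f11; 2, 2),
# beta(f21; 1, 1), beta(f22; 1, 1), and X2 missing in two of five cases.
data = [(1, 1), (1, 1), (1, 2), (1, None), (2, None)]
prior = {"f11": (2.0, 2.0), "f21": (1.0, 1.0), "f22": (1.0, 1.0)}
f = {k: a / (a + b) for k, (a, b) in prior.items()}   # start from the prior means

def em_step(f):
    """Expected counts given current f, then means of the updated betas."""
    s = {k: 0.0 for k in f}
    t = {k: 0.0 for k in f}
    for x1, x2 in data:
        (s if x1 == 1 else t)["f11"] += 1
        key = "f21" if x1 == 1 else "f22"
        if x2 is not None:
            (s if x2 == 1 else t)[key] += 1
        else:                  # missing X2: fractional occurrence P(X2=1 | x1, f)
            s[key] += f[key]
            t[key] += 1 - f[key]
    return {k: (prior[k][0] + s[k]) / (prior[k][0] + prior[k][1] + s[k] + t[k])
            for k in f}

for step in range(30):
    f = em_step(f)
    if step < 2:
        print(f)   # first iterate: f21 = 7/12; second: f21 = 43/72
print(f)           # approaches a local maximizer of rho(f | d)
```

The first two iterates reproduce the slides' values $f'_{21} = 7/12$ and $43/72$; the fixed point of this sketch is $f_{21} = 3/5$. If $X_1$ were the missing value instead, the fractional weights would have to come from $P(X_1 \mid x_2, f)$ via Bayes' rule; the scheme is otherwise unchanged.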
30 Recall
Recomputing with $f' = \{2/3,\, 7/12,\, 1/2\}$: the incomplete case with $x_1 = 1$ now splits with weights $7/12$ and $5/12$ (the case with $x_1 = 2$ still splits $1/2$ and $1/2$), giving $s'_{21} = 2 + 7/12 = 31/12$, $t'_{21} = 1 + 5/12 = 17/12$, $s'_{22} = 0.5$, $t'_{22} = 0.5$.
Updated densities: $\mathrm{beta}(f_{21}; 43/12, 29/12)$ and $\mathrm{beta}(f_{22}; 3/2, 3/2)$, giving $P(X_1 = 1) = 2/3$, $P(X_2 = 1 \mid X_1 = 1) = 43/72$, $P(X_2 = 1 \mid X_1 = 2) = 1/2$.
31 Repeat Update
Repeating with $f' = \{2/3,\, 43/72,\, 1/2\}$: the incomplete case with $x_1 = 1$ splits with weights $43/72$ and $29/72$, giving $s'_{21} = 2 + 43/72 = 187/72$ and $t'_{21} = 1 + 29/72 = 101/72$, while $s'_{22} = 0.5$, $t'_{22} = 0.5$.
Updated density: $\mathrm{beta}(f_{21}; 1 + 187/72,\, 1 + 101/72)$; the slide leaves the new value $P(X_2 = 1 \mid X_1 = 1) = {?}$ to be computed, and the procedure is repeated until convergence. $P(X_1 = 1) = 2/3$ and $P(X_2 = 1 \mid X_1 = 2) = 1/2$ are unchanged.
32 Algorithm 6.1 (1) (algorithm listing from the book, not transcribed)
33 Algorithm 6.1 (2) (algorithm listing from the book, not transcribed)