Non-Informative Dirichlet Score for learning Bayesian networks


Maomi Ueno, University of Electro-Communications, Japan
Masaki Uto, University of Electro-Communications, Japan

Abstract

Learning Bayesian networks is known to be highly sensitive to the chosen equivalent sample size (ESS) in the Bayesian Dirichlet equivalence uniform (BDeu). This sensitivity often engenders unstable or undesired results because the prior of BDeu does not represent ignorance of prior knowledge, but rather a user's prior belief in the uniformity of the conditional distribution. This paper presents a proposal for a non-informative Dirichlet score obtained by marginalizing over the possible hypothetical structures as the user's prior belief. Some numerical experiments demonstrate that the proposed score improves learning accuracy. The results also suggest that the proposed score might be effective especially for small samples.

1 Introduction

Marginal likelihood (ML) (using a Dirichlet prior) that ensures likelihood equivalence, the most popular learning score for Bayesian networks, finds the maximum a posteriori (MAP) structure (Buntine, 1991; Heckerman et al., 1995). This score is known as Bayesian Dirichlet equivalence (BDe) (Heckerman et al., 1995). Given no prior knowledge, the Bayesian Dirichlet equivalence uniform (BDeu), as proposed earlier by Buntine (1991), is often used. Actually, BDe(u) requires an equivalent sample size (ESS), which reflects the degree of a user's prior belief. Moreover, recent studies have demonstrated that learning Bayesian networks is highly sensitive to the chosen ESS (Steck and Jaakkola, 2002; Silander, Kontkanen and Myllymäki, 2007).

To clarify the BDe(u) mechanism, Ueno (2010, 2011) analyzed log-BDe(u) asymptotically, obtaining the result that it decomposes into the log-likelihood and a complexity penalty term that reflects the difference between the structure learned from data and the hypothetical structure from the user's knowledge. As the two structures become equivalent, the penalty term is minimized for a fixed ESS. Conversely, the term increases to the degree that the two structures differ. Furthermore, the result suggests that a tradeoff exists between the role of ESS in the log-likelihood (which helps to block extra arcs) and its role in the penalty term (which helps to add extra arcs). That tradeoff might cause the BDeu score to be highly sensitive to ESS, and might make it more difficult to determine an appropriate ESS. Moreover, Ueno (2011) showed that the prior of BDeu does not represent ignorance of prior knowledge, but rather a user's prior belief in the uniformity of the conditional distribution. This fact is particularly surprising because it had been believed that BDeu has a non-informative prior. The results further imply that the optimal ESS becomes large/small when the empirical conditional distribution becomes uniform/skewed. This is the main factor underpinning the sensitivity of BDeu to ESS.

To solve this problem, Silander, Kontkanen and Myllymäki (2007) proposed a learning method that marginalizes the ESS of BDeu.

They averaged BDeu values with increasing ESS from 1 to 100 by 1.0. Moreover, to decrease the computation costs of the marginalization, Cano et al. (2011) proposed averaging over a wide range of differently chosen ESSs: ESS > 1.0, ESS >> 1.0, ESS < 1.0, ESS << 1.0. They reported that their method performed robustly and efficiently. However, such approximated marginalization does not always guarantee robust results when the optimal ESS is extremely small or large. In addition, the exact marginalization of ESS is difficult because ESS is a continuous variable on the domain (0, ∞).

Our proposal in this paper is a fully non-informative Dirichlet score. We assume all possible hypothetical structures as the prior belief of BDe, because the problem of BDeu is that it assumes only a uniform distribution as the prior belief. This paper therefore presents a non-informative Dirichlet score obtained by averaging over the hypothetical structures as the user's prior belief. The optimal ESS of BDeu becomes large/small when the empirical conditional distribution becomes uniform/skewed because its hypothetical structure assumes a uniform conditional probability distribution and the ESS adjusts the magnitude of the user's belief in that hypothetical structure. In contrast, the ESS of a fully non-informative prior is expected to work effectively as actual pseudo-samples that augment the data, especially when the sample size is small, regardless of the uniformity of the empirical distribution. This is a unique feature of the proposed method, because the previous non-informative methods exclude the ESS from the score (averaged BDeu (Silander, Kontkanen and Myllymäki, 2007; Cano et al., 2011); Normalized Maximum Likelihood (NML) (Silander, Roos, and Myllymäki, 2010)). Some numerical experiments demonstrate that the proposed score improves learning accuracy. The results also suggest that the proposed score might be effective especially for small samples.

2 Learning Bayesian networks

Let {x_1, x_2, \ldots, x_N} be a set of N discrete variables; each x_i can take a value in the set of states {1, \ldots, r_i}. Here, x_i = k means that x_i is in state k. According to the Bayesian network structure g \in G, the joint probability distribution is given as

p(x_1, x_2, \ldots, x_N \mid g) = \prod_{i=1}^{N} p(x_i \mid \Pi_i, g),    (1)

where G signifies the possible set of Bayesian network structures and \Pi_i denotes the parent variable set of x_i.

Next, we introduce the problem of learning a Bayesian network. Let \theta_{ijk} be the conditional probability parameter of x_i = k when the j-th instance of the parents of x_i is observed (we write \Pi_i = j). Buntine (1991) assumed a Dirichlet prior and used the expected a posteriori (EAP) estimator as the parameter estimator \hat{\Theta} = (\hat{\theta}_{ijk}), (i = 1, \ldots, N, j = 1, \ldots, q_i, k = 1, \ldots, r_i):

\hat{\theta}_{ijk} = \frac{\alpha_{ijk} + n_{ijk}}{\alpha_{ij} + n_{ij}},    (k = 1, \ldots, r_i - 1),    (2)

where n_{ijk} represents the number of samples with x_i = k when \Pi_i = j, n_{ij} = \sum_{k=1}^{r_i} n_{ijk}, and \alpha_{ijk} denotes the hyperparameters of the Dirichlet prior distributions (\alpha_{ijk} is a pseudo-sample count corresponding to n_{ijk}), with \alpha_{ij} = \sum_{k=1}^{r_i} \alpha_{ijk} and \hat{\theta}_{ij r_i} = 1 - \sum_{k=1}^{r_i - 1} \hat{\theta}_{ijk}. The marginal likelihood is obtained as

p(X \mid \alpha, g) = \prod_{i=1}^{N} \prod_{j=1}^{q_i} \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + n_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(\alpha_{ijk} + n_{ijk})}{\Gamma(\alpha_{ijk})}.    (3)

Here, q_i signifies the number of instances of \Pi_i, where q_i = \prod_{x_l \in \Pi_i} r_l, and X is a dataset. The problem of learning a Bayesian network is to find the MAP structure that maximizes the score (3). In particular, Heckerman et al. (1995) presented a sufficient condition for satisfying the likelihood equivalence assumption, which says that data should not help to discriminate between network structures that represent the same assertions of conditional independence, in the form of the following constraint on the hyperparameters of (3):

\sum_{j=1}^{q_i} \sum_{k=1}^{r_i} \alpha_{ijk} = \alpha \ (\text{const.}),    (i = 1, \ldots, N).    (4)
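To make the score concrete, here is a minimal sketch of how one factor of (3) could be evaluated in log space with BDeu hyperparameters, which satisfy (4). The function names and the q_i x r_i count layout are our own illustration, not code from the paper; gammaln from SciPy replaces the Gamma ratios to avoid overflow.

    import numpy as np
    from scipy.special import gammaln

    def log_ml_family(counts, alpha_ijk):
        # Log of the i-th factor of (3) for one family (x_i, Pi_i), where
        # counts[j, k] = n_ijk and alpha_ijk[j, k] is the hyperparameter.
        a_ij = alpha_ijk.sum(axis=1)   # alpha_ij = sum_k alpha_ijk
        n_ij = counts.sum(axis=1)      # n_ij = sum_k n_ijk
        term_j = gammaln(a_ij) - gammaln(a_ij + n_ij)
        term_jk = gammaln(alpha_ijk + counts) - gammaln(alpha_ijk)
        return term_j.sum() + term_jk.sum()

    def bdeu_hyperparameters(q_i, r_i, alpha):
        # BDeu: alpha_ijk = alpha / (r_i * q_i); summing over j and k
        # gives alpha, so constraint (4) holds.
        return np.full((q_i, r_i), alpha / (r_i * q_i))

    # Example: binary x_i with one binary parent (q_i = r_i = 2), ESS alpha = 1.0.
    counts = np.array([[8, 2], [1, 9]])
    print(log_ml_family(counts, bdeu_hyperparameters(2, 2, 1.0)))

The full log score of a structure is the sum of such family terms over i = 1, ..., N.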

Furthermore, they proposed a marginal likelihood that reflects a user's prior knowledge, as shown below:

p(X \mid \alpha, g, g^h) = \prod_{i=1}^{N} \prod_{j=1}^{q_i} \frac{\Gamma(\alpha^{g^h}_{ij})}{\Gamma(\alpha^{g^h}_{ij} + n_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(\alpha^{g^h}_{ijk} + n_{ijk})}{\Gamma(\alpha^{g^h}_{ijk})},    (5)

\alpha^{g^h}_{ijk} = \alpha \, p(x_i = k, \Pi^g_i = j \mid g^h).    (6)

Here, \alpha is the user-determined equivalent sample size (ESS), \Pi^g_i denotes the parent variable set of x_i according to g, and g^h is the hypothetical Bayesian network structure that reflects the user's prior knowledge. This metric was designated the Bayesian Dirichlet equivalence (BDe) score metric. As Buntine (1991) described, \alpha_{ijk} = \alpha / (r_i q_i) is regarded as a special case of the BDe metric. Heckerman et al. (1995) designated this special case as BDeu. For cases in which we have no prior knowledge, BDeu is often used in practice. Heckerman et al. (1995) reported, as a result of their comparative analyses of BDeu and BDe, that BDeu is better than BDe unless the user's beliefs are close to the true model.

BDeu requires an equivalent sample size (ESS), which is the value of a free parameter specified by the user. Recent reports have described that the ESS in BDeu plays an important role in learning Bayesian networks (Steck and Jaakkola, 2002; Silander, Kontkanen and Myllymäki, 2007). In particular, Ueno (2011) reported that the prior of BDeu does not represent ignorance of prior knowledge but rather a user's prior belief in the uniformity of the conditional distribution. This result is particularly surprising because it had been believed that BDeu has a non-informative prior. In addition, he explained the mechanism by which the optimal ESS becomes large/small when the empirical conditional distribution becomes uniform/skewed: the ESS determines the magnitude of the user's prior belief in a hypothetical structure. This mechanism causes the BDeu score to be highly sensitive to ESS.

3 Non-informative Dirichlet Score

This section presents an alternative non-informative prior Dirichlet score, because BDeu has no non-informative prior. To clarify the mechanism of BDe, Ueno (2011) analyzed log-BDe asymptotically and derived the following theorem.

Theorem 1 (Ueno, 2011). When \alpha + n is sufficiently large, log-BDe converges to

\log p(X \mid \alpha, g, g^h) = \log p(\hat{\Theta}^g \mid X, \alpha, g, g^h) - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{q_i} \sum_{k=1}^{r_i} \frac{r_i - 1}{r_i} \log(\alpha^{g^h}_{ijk} + n_{ijk}) + \text{const},    (7)

where

\log p(\hat{\Theta}^g \mid X, \alpha, g, g^h) = \sum_{i=1}^{N} \sum_{j=1}^{q_i} \sum_{k=1}^{r_i} (\alpha^{g^h}_{ijk} + n_{ijk}) \log \frac{\alpha^{g^h}_{ijk} + n_{ijk}}{\alpha^{g^h}_{ij} + n_{ij}},

const collects terms independent of the number of parameters, and \hat{\Theta}^g = \{\hat{\theta}^g_{ijk}\}, (i = 1, \ldots, N, j = 1, \ldots, q_i, k = 1, \ldots, r_i), with

\hat{\theta}^g_{ijk} = \frac{\alpha^{g^h}_{ijk} + n_{ijk}}{\alpha^{g^h}_{ij} + n_{ij}}.    (8)

From (7), log-BDe can be decomposed into two factors: 1. a log-posterior term \log p(\hat{\Theta}^g \mid X, \alpha, g, g^h), and 2. a penalty term \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{q_i} \sum_{k=1}^{r_i} \frac{r_i - 1}{r_i} \log(\alpha^{g^h}_{ijk} + n_{ijk}). Note that \sum_{i=1}^{N} \sum_{j=1}^{q_i} \sum_{k=1}^{r_i} (r_i - 1)/r_i = \sum_{i=1}^{N} q_i (r_i - 1) is the number of parameters. This well-known model selection formula is generally interpreted as 1. reflecting the fit to the data and 2. signifying the penalty that blocks extra arcs from being added.

Ueno (2011) described that the term \sum_{i=1}^{N} \sum_{j=1}^{q_i} \sum_{k=1}^{r_i} \log(\alpha^{g^h}_{ijk} + n_{ijk}) in (7) reflects the difference between the structure learned from data and the hypothetical structure g^h from the user's knowledge in BDe. To the degree that the two structures are equivalent, the penalty term is minimized for a fixed ESS. Conversely, to the degree that the two structures differ, the term is larger. Moreover, from (7), \alpha determines the magnitude of the user's prior belief in a hypothetical structure g^h. On the other hand, BDeu, whose prior distribution assumes a uniform distribution of conditional probabilities, had been believed to employ a non-informative prior.
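To connect the theorem back to (6): the hyperparameters \alpha^{g^h}_{ijk} that it analyzes can be assembled from any routine returning the joint probability p(x_i = k, \Pi^g_i = j \mid g^h). The sketch below assumes such a routine is given; the name joint_prob and the table layout are our own, not the paper's.

    import numpy as np

    def bde_hyperparameters(joint_prob, q_i, r_i, alpha):
        # Equation (6): alpha_ijk^{g^h} = alpha * p(x_i = k, Pi_i^g = j | g^h).
        a = np.empty((q_i, r_i))
        for j in range(q_i):
            for k in range(r_i):
                a[j, k] = alpha * joint_prob(k, j)
        return a

    # With a uniform joint, p = 1/(r_i * q_i), this reduces to the BDeu case;
    # feeding the table to log_ml_family from the earlier sketch evaluates the
    # corresponding factor of the BDe score (5).
    uniform = lambda k, j: 1.0 / (2 * 2)
    print(bde_hyperparameters(uniform, 2, 2, 1.0))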

However, from (7), the optimal ESS of BDeu becomes large/small when the empirical conditional distribution becomes uniform/skewed, because its hypothetical structure g^h assumes a uniform conditional probability distribution and the ESS adjusts the magnitude of the user's belief in that hypothetical structure. Namely, the prior of BDeu does not represent ignorance of prior knowledge but rather a user's prior belief in the uniformity of the conditional distribution.

The main purpose of this paper is to develop a Dirichlet score that has a fully non-informative prior. For this purpose, we assume all possible hypothetical structures as the prior belief of BDe, because a problem of BDeu is that it assumes only a uniform distribution as the prior belief. That is, we marginalize the ML over the hypothetical structures g^h \in G as follows:

Definition 1. NIP-BDe (Non-informative prior Bayesian Dirichlet equivalence) is defined as

p(X \mid g, \alpha) = \sum_{g^h \in G} p(g^h) \, p(X \mid \alpha, g, g^h)    (9)
                   = \sum_{g^h \in G} p(g^h) \prod_{i=1}^{N} \prod_{j=1}^{q_i} \frac{\Gamma(\alpha^{g^h}_{ij})}{\Gamma(\alpha^{g^h}_{ij} + n_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(\alpha^{g^h}_{ijk} + n_{ijk})}{\Gamma(\alpha^{g^h}_{ijk})},

where \sum_{g^h \in G} is the summation over the possible hypothetical structures and p(g^h) is a uniform distribution.

To estimate the marginal ML, it is necessary to calculate p(x_i = k, \Pi^g_i = j \mid g^h) automatically for all the structures. For this purpose, we use an empirical estimate p(x_i = k, \Pi^g_i = j \mid g^h, \hat{\Theta}^{g^h}) computed from data: the joint probability of x_i and its parent variables in g, obtained from the conditional probability parameters \hat{\Theta}^{g^h} estimated given g^h. Our method has the following two steps:

1. Estimate the conditional probability parameter set \hat{\Theta}^{g^h} = \{\hat{\theta}^{g^h}_{ijk}\}, (i = 1, \ldots, N, j = 1, \ldots, q^{g^h}_i, k = 1, \ldots, r_i), given g^h from data.
2. Estimate the joint probability p(x_i = k, \Pi^g_i = j \mid g^h, \hat{\Theta}^{g^h}).

First, we estimate the conditional probability parameter set given g^h as

\hat{\theta}^{g^h}_{ijk} = \frac{1/(r_i q^{g^h}_i) + n^{g^h}_{ijk}}{1/q^{g^h}_i + n^{g^h}_{ij}},    (10)

where n^{g^h}_{ijk} represents the number of samples with x_i = k when \Pi^{g^h}_i = j (\Pi^{g^h}_i is the parent variable set of x_i given g^h), n^{g^h}_{ij} = \sum_{k=1}^{r_i} n^{g^h}_{ijk}, and q^{g^h}_i denotes the number of instances of \Pi^{g^h}_i. Next, we calculate the estimated joint probability p(x_i = k, \Pi^g_i = j \mid g^h, \hat{\Theta}^{g^h}) as shown below:

p(x_i = k, \Pi^g_i = j \mid g^h, \hat{\Theta}^{g^h}) = \sum_{x_l \notin \{x_i\} \cup \Pi^g_i} p(x_1, \ldots, x_i, \ldots, x_N \mid g^h, \hat{\Theta}^{g^h}).    (11)

For the computation of (11), we can employ various marginalization algorithms (e.g., variable elimination, factor elimination, random sampling; see, for example, Darwiche, 2009). In practice, however, NIP-BDe is difficult to calculate because the product of multiple probabilities suffers serious computational problems. To avoid this, we propose an alternative method, Expected log-BDe, as described below.

Definition 2. Expected log-BDe is defined as

E_{g^h \in G}[\log p(X \mid \alpha, g, g^h)] = \sum_{g^h \in G} p(g^h) \log p(X \mid \alpha, g, g^h)    (12)
  = \sum_{g^h \in G} p(g^h) \sum_{i=1}^{N} \sum_{j=1}^{q_i} \left[ \log \frac{\Gamma(\alpha^{g^h}_{ij})}{\Gamma(\alpha^{g^h}_{ij} + n_{ij})} + \sum_{k=1}^{r_i} \log \frac{\Gamma(\alpha^{g^h}_{ijk} + n_{ijk})}{\Gamma(\alpha^{g^h}_{ijk})} \right].

Compared to NIP-BDe, the Expected log-BDe is practical for computation because it can be calculated as a sum of log-BDe values.
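As a rough sketch of the local contribution to (12), reusing log_ml_family and bde_hyperparameters from the earlier sketches: each hypothetical structure g^h contributes one hyperparameter table (built via (6) from the joint estimates (10)-(11)), and the uniform p(g^h) reduces the expectation to a plain mean. The helper names and the toy joint below are ours, not the paper's.

    import numpy as np

    def expected_log_bde_family(counts, hyper_tables):
        # One alpha_ijk^{g^h} table per hypothetical structure g^h.
        scores = [log_ml_family(counts, a) for a in hyper_tables]
        return float(np.mean(scores))   # uniform p(g^h): plain average

    # Example with two stand-in hypothetical structures for one binary family.
    counts = np.array([[8, 2], [1, 9]])
    skewed = lambda k, j: [[0.4, 0.1], [0.1, 0.4]][j][k]   # toy joint, sums to 1
    tables = [bdeu_hyperparameters(2, 2, 1.0),
              bde_hyperparameters(skewed, 2, 2, 1.0)]
    print(expected_log_bde_family(counts, tables))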

Although a similar Bayesian model averaging criterion has already been proposed (Chickering and Heckerman, 2000; Tian, He, and Ram, 2010), its purpose is not to predict the true structure but to seek the optimal structure which maximizes the inference prediction for a new data point. Therefore, their purpose is not the same as that of this study.

4 Learning algorithm using dynamic programming

Learning Bayesian networks is known to be an NP-complete problem (Chickering, 1996). Recently, however, exact solution methods can produce results in reasonable computation time if the variables are not prohibitively numerous (e.g., Silander and Myllymäki, 2006; Malone et al., 2011). We employ the learning algorithm of Silander and Myllymäki (2006) using dynamic programming. Our algorithm comprises four steps:

1. Compute the local Expected log-BDe scores for all possible N 2^{N-1} (x_i, \Pi^g_i) pairs.
2. For each variable x_i \in x, find the best parent set in the parent candidate sets \{\Pi^g_i\} for all \Pi^g_i \subseteq x \setminus \{x_i\}, (g \in G).
3. Find the best sink for all N variables and the best ordering.
4. Find the optimal Bayesian network.

Only Step 1 differs from the procedure described by Silander and Myllymäki (2006). First, to obtain the local Expected log-BDe score for a variable and its parent variable set pair, we should average log-BDe over all possible hypothetical structures. To compute the local Expected log-BDe score for each pair (x_i, \Pi^g_i) in Step 1, conditional frequency tables cft(x_i, \Pi^{g^h}_i) for all possible hypothetical parent sets \{\Pi^{g^h}_i\}, (g^h \in G), are needed.

Algorithm 1 gives pseudo-code for computing the local Expected log-BDe scores, LS[x_i][\Pi^g_i], for all possible (x_i, \Pi^g_i) pairs. The pseudo-code assumes some helper functions: getCft(x_i, \Pi^g_i) produces the conditional frequency table cft(x_i, \Pi^g_i); getTl(\Pi^g_i, cft[x_i][\Pi^{g^h}_i]) translates cft[x_i][\Pi^{g^h}_i] into the joint probability estimates p(x_i = k, \Pi^g_i = j \mid g^h, \hat{\Theta}^{g^h}); and score(\Pi^g_i, Tl[x_i][\Pi^g_i][\Pi^{g^h}_i]) calculates the local log-BDe score of g given a hypothetical structure g^h.

First, getCft(x_i, \Pi^g_i) produces the conditional frequency tables cft(x_i, \Pi^g_i) for all possible \Pi^g_i. They are stored in memory or on disk as soon as they are produced. This procedure is the same as that of Silander and Myllymäki (2006). Next, getTl(\Pi^g_i, cft[x_i][\Pi^{g^h}_i]) calculates the joint probability estimates p(x_i = k, \Pi^g_i = j \mid g^h, \hat{\Theta}^{g^h}) using the stored conditional frequency tables, by marginalizing the product of the factors in \hat{\Theta}^{g^h} that include \{x_i\} \cup \Pi^g_i. This procedure runs in O(|\{x_i\} \cup \Pi^g_i| exp(w)) given an elimination order of width w. The local Expected log-BDe, LS[x_i][\Pi^g_i], can then be calculated as the average of the local log-BDe values over all possible local hypothetical parent sets \Pi^{g^h}_i.

Algorithm 1 getLocalScore(x)
  for all x_i \in x do
    for all \Pi^g_i \subseteq x \setminus \{x_i\} do
      if cft[x_i][\Pi^g_i] = null then
        cft[x_i][\Pi^g_i] \gets getCft(x_i, \Pi^g_i)
      end if
    end for
    for all \Pi^g_i \subseteq x \setminus \{x_i\} do
      for all \Pi^{g^h}_i \subseteq x \setminus \{x_i\} do
        Tl[x_i][\Pi^g_i][\Pi^{g^h}_i] \gets getTl(\Pi^g_i, cft[x_i][\Pi^{g^h}_i])
        LS[x_i][\Pi^g_i] \gets LS[x_i][\Pi^g_i] + score(\Pi^g_i, Tl[x_i][\Pi^g_i][\Pi^{g^h}_i])
      end for
      LS[x_i][\Pi^g_i] \gets LS[x_i][\Pi^g_i] / |\{\Pi^{g^h}_i \subseteq x \setminus \{x_i\}\}|
    end for
  end for
  if |x| > 1 then getLocalScore(x \setminus \{x_i\}) end if

The time complexity of Algorithm 1 is O(N 2^{2(N-1)} exp(w)). After computing the local scores, we find the optimal Bayesian network in Steps 2, 3, and 4 of our algorithm. These steps are the same as those proposed by Silander and Myllymäki (2006). Although the traditional scores (BDeu, AIC, BIC, and so on) run in O(N 2^{N-1}), the proposed method requires greater computation costs. However, the required memory for the proposed computation is equal to that of the traditional scores.
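For orientation, the following compact sketch shows Steps 2-4 as dynamic programming over subsets of variables, in the spirit of Silander and Myllymäki (2006). It assumes the Step-1 scores are available through a local_score(v, parents) routine; the placeholder below and all names are our own illustration, not the paper's implementation.

    from itertools import combinations

    N = 4                      # toy problem size
    VARS = list(range(N))

    def local_score(v, parents):
        # Placeholder for the Step-1 local Expected log-BDe scores.
        return -float(len(parents)) - 0.1 * v

    def subsets(iterable):
        s = list(iterable)
        for r in range(len(s) + 1):
            yield from combinations(s, r)

    # Step 2: best parent set for v inside every candidate set C.
    best_ps, best_ps_score = {}, {}
    for v in VARS:
        others = [u for u in VARS if u != v]
        for C in subsets(others):
            cand = [(local_score(v, C), C)]
            for c in C:
                sub = tuple(u for u in C if u != c)
                cand.append((best_ps_score[(v, sub)], best_ps[(v, sub)]))
            best_ps_score[(v, C)], best_ps[(v, C)] = max(cand)

    # Step 3: best sink of every variable subset W.
    best_net_score, best_sink = {(): 0.0}, {}
    for W in subsets(VARS):
        if not W:
            continue
        cand = []
        for s in W:
            rest = tuple(u for u in W if u != s)
            cand.append((best_net_score[rest] + best_ps_score[(s, rest)], s))
        best_net_score[W], best_sink[W] = max(cand)

    # Step 4: backtrack the sinks to recover the ordering and the network.
    W, network = tuple(VARS), {}
    while W:
        s = best_sink[W]
        W = tuple(u for u in W if u != s)
        network[s] = best_ps[(s, W)]
    print(network)   # maps each variable to its chosen parent set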

Figure 1: g1: Strongly skewed distribution (conditional probability tables omitted; conditional probabilities take the values 0.05 and 0.95, root probability 0.5).

Figure 2: g2: Skewed distribution (conditional probabilities 0.2 and 0.8).

Figure 3: g3: Uniform distribution (conditional probabilities 0.3 and 0.7).

Figure 4: g4: Strongly uniform distribution (conditional probabilities 0.35 and 0.65).

5 Simulation experiments

We conducted simulation experiments to compare Expected log-BDe with BDeu according to the procedure described by Ueno (2011). We used small network structures with binary variables, shown in Figs. 1, 2, 3, and 4, in which the distributions change from skewed to uniform. Figure 1 presents a structure in which the conditional probabilities differ greatly according to the parent variable states (g1: Strongly skewed distribution). By gradually reducing the difference of the conditional probabilities from Fig. 1, we generated Fig. 2 (g2: Skewed distribution), Fig. 3 (g3: Uniform distribution), and Fig. 4 (g4: Strongly uniform distribution).

Table 1: Learning performance of BDeu (results for g1-g4 with \alpha = 0.1, 1.0, and 10; entries omitted).

Procedures used for the simulation experiments are described below.

1. We generated 100, 500, and 1,000 samples from the networks in the four figures.
2. Using BDeu and Expected log-BDe with \alpha changed over 0.1, 1.0, and 10, Bayesian network structures were estimated based on the 100, 500, and 1,000 samples, respectively. We searched for the exactly true structure.
3. The times the estimated structure was the true structure (when the Structural Hamming Distance (SHD; Tsamardinos, Brown, and Aliferis, 2006) is zero) were counted by repeating procedures 1 and 2 for 100 iterations.

We employed the variable elimination algorithm (Darwiche, 2009) for the marginalization in (11) because our experiments use only a five-node network; its computational cost is not burdensome.

Table 2: Learning performance of Expected log-BDe (results for g1-g4 with \alpha = 0.1, 1.0, and 10; entries omitted).

Table 1 presents the results for BDeu; Table 2 presents the results for Expected log-BDe. The first column shows the number of correct-structure estimates in the 100 trials. Column SHD shows the Structural Hamming Distance (SHD). Column ME shows the total number of missing arcs, column EE the total number of extra arcs, column WO the total number of arcs with wrong orientation, column MO the total number of arcs with missing orientation, and column EO the total number of arcs with extra orientation in the case of likelihood equivalence (a simplified sketch of these counts appears at the end of this section).

The results presented in Table 1 reveal that BDeu is highly sensitive to \alpha. In Table 1, the optimum value of \alpha becomes small as the conditional distribution becomes skewed. In contrast, the optimum value of \alpha becomes large as the conditional distribution becomes uniform. As shown in Table 2, the performance of Expected log-BDe is better than that of BDeu in almost all cases. The results also show that the BDeu performances are highly sensitive to \alpha, but those of Expected log-BDe are robust to \alpha. Additionally, it is noteworthy that the performance of BDeu is extremely worse than that of Expected log-BDe for g4 with \alpha = 0.1 and for g1 with \alpha = 10. The reason is that the optimal \alpha becomes large/small for uniform/skewed conditional probabilities because BDeu assumes a uniform conditional prior. Although the optimal \alpha for BDeu is highly sensitive to the uniformity of the conditional probability distribution, the optimal \alpha of the proposed method is robust to the uniformity.
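As an aside on the error measures above, the following simplified sketch illustrates the bookkeeping behind the ME, EE, and WO columns. The SHD proper is computed between the equivalence classes (CPDAGs) of the two structures (Tsamardinos, Brown, and Aliferis, 2006); this version compares plain directed edge sets and is meant only as an illustration under that simplification.

    def arc_differences(true_edges, learned_edges):
        true_skel = {frozenset(e) for e in true_edges}
        learn_skel = {frozenset(e) for e in learned_edges}
        me = len(true_skel - learn_skel)    # ME: missing arcs
        ee = len(learn_skel - true_skel)    # EE: extra arcs
        wo = sum(1 for (a, b) in learned_edges
                 if (b, a) in true_edges)   # WO: wrong orientation
        return me, ee, wo

    # A true arc x1 -> x2 learned as x2 -> x1 counts as one wrong orientation.
    print(arc_differences({(1, 2), (1, 3)}, {(2, 1), (1, 3)}))   # (0, 0, 1)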

Especially for small samples, the large \alpha setting for the proposed method works effectively: the learning performances with \alpha = 10 for n = 100 are better than those with \alpha = 0.1, which means that the ESS of Expected log-BDe might be affected only by the sample size. The results also suggest that the optimal ESS increases as the sample size becomes small. Therefore, the ESS of the proposed method works effectively as actual pseudo-samples that augment the data, especially when the sample size is small.

6 Conclusions

This paper presented a proposal for a non-informative Dirichlet score. The results suggest that the proposed method is effective especially for a small sample size. The future tasks are the following:

1. Analysis of the proposed method using various simulation data.
2. Improvement of the learning algorithm.
3. Determination of the optimal \alpha for a given data size.

Furthermore, NML (Silander, Roos, and Myllymäki, 2010) is known as an alternative information-theoretic approach for circumventing the problem with ESS. It is also an important future task to compare the performances of these non-informative learning scores.

References

W.L. Buntine. 1991. Theory refinement on Bayesian networks. In B. D'Ambrosio, P. Smets and P. Bonissone (eds.), Proc. of the Seventh Conference on Uncertainty in Artificial Intelligence, pages 52-60.

E. Castillo, A.S. Hadi and C. Solares. 1997. Learning and updating of uncertainty in Dirichlet models. Machine Learning, 26, pages 43-63.

D. Chickering. 1996. Learning Bayesian networks is NP-complete. In Learning from Data: Artificial Intelligence and Statistics V, pages 121-130.

D.M. Chickering and D. Heckerman. 2000. A comparison of scientific and engineering criteria for Bayesian model selection. Statistics and Computing, 10, pages 55-62.

A. Cano, M. Gomez-Olmedo, A.R. Masegosa, and S. Moral. 2011. Locally averaged Bayesian Dirichlet metrics. In Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU 2011).

D. Heckerman, D. Geiger and D. Chickering. 1995. Learning Bayesian networks: the combination of knowledge and statistical data. Machine Learning, 20, pages 197-243.

B.M. Malone, C. Yuan, E.A. Hansen, and S. Bridges. 2011. Improving the scalability of optimal Bayesian network learning with external-memory frontier breadth-first branch and bound search. In A. Pfeffer and F.G. Cozman (eds.), Proc. of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence.

T. Silander and P. Myllymäki. 2006. A simple approach for finding the globally optimal Bayesian network structure. In R. Dechter and T. Richardson (eds.), Proc. of the 22nd Conference on Uncertainty in Artificial Intelligence, pages 445-452.

T. Silander, P. Kontkanen and P. Myllymäki. 2007. On sensitivity of the MAP Bayesian network structure to the equivalent sample size parameter. In K.B. Laskey, S.M. Mahoney and J. Goldsmith (eds.), Proc. of the 23rd Conference on Uncertainty in Artificial Intelligence.

T. Silander, T. Roos and P. Myllymäki. 2010. Learning locally minimax optimal Bayesian networks. International Journal of Approximate Reasoning, 51(5), pages 544-557.

H. Steck and T.S. Jaakkola. 2002. On the Dirichlet prior and Bayesian regularization. In S. Becker, S. Thrun, and K. Obermayer (eds.), Advances in Neural Information Processing Systems (NIPS).

H. Steck. 2008. Learning the Bayesian network structure: Dirichlet prior versus data. In D.A. McAllester and P. Myllymäki (eds.), Proc. of the 24th Conference on Uncertainty in Artificial Intelligence.

J. Tian, R. He and L. Ram. 2010. Bayesian model averaging using the k-best Bayesian network structures. In P. Grunwald and P. Spirtes (eds.), Proc. of the 26th Conference on Uncertainty in Artificial Intelligence.

I. Tsamardinos, L.E. Brown, and C.F. Aliferis. 2006. The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65(1), pages 31-78.
M. Ueno. 2010. Learning networks determined by the ratio of prior and data. In P. Grunwald and P. Spirtes (eds.), Proc. of the 26th Conference on Uncertainty in Artificial Intelligence.

M. Ueno. 2011. Robust learning Bayesian networks for prior belief. In A. Pfeffer and F.G. Cozman (eds.), Proc. of the 27th Conference on Uncertainty in Artificial Intelligence.

A. Darwiche. 2009. Modeling and reasoning with Bayesian networks. Cambridge University Press.
