Learning the structure of Bayesian belief networks


Lecture 17: Learning the structure of Bayesian belief networks
Milos Hauskrecht, milos@cs.pitt.edu, 5329 Sennott Square

Learning of BBNs
Learning:
- Learning of the parameters of the conditional probabilities
- Learning of the network structure
Variables:
- Observable: values present in every data sample
- Hidden: their values are never observed in the data
- Missing values: values sometimes present, sometimes not
Next, all variables are observable:
1. Learning of the parameters of a BBN
2. Learning of the model (the BBN structure)

Learning of BBN parameters. Example.
Example: the Pneumonia network. Pneumonia is the parent of Fever, Paleness, Cough, and High WBC; the entries of all conditional probability tables, e.g. P(Fever | Pneumonia) and P(High WBC | Pneumonia), are unknown ("?") and must be learned.

Learning of BBN parameters. Example.
Data D (different patient cases):
[Table of samples with columns Pal, Fev, Cou, HWB, Pneu, one row per patient.]

Estimates of the parameters of a BBN
Much like multiple coin-toss or die-roll problems: a smaller learning problem corresponds to the learning of exactly one conditional distribution. Example: P(Fever | Pneumonia = T).
Problem: how do we pick the data to learn from?

Learning of BBN parameters. Example.
Learn: P(Fever | Pneumonia = T)
Step 1: Select the data points with Pneumonia = T.
[Table of patient cases (columns Pal, Fev, Cou, HWB, Pneu) with the Pneumonia = T rows highlighted.]

Learning of BBN parameters. Example.
Learn: P(Fever | Pneumonia = T)
Step 1: Select the data points with Pneumonia = T; ignore the rest.

Learning of BBN parameters. Example.
Learn: P(Fever | Pneumonia = T)
Step 2: Select the values of the random variable that defines the distribution, Fever.

Learning of BBN parameters. Example.
Learn: P(Fever | Pneumonia = T)
Step 2: Ignore the rest.

Learning of BBN parameters. Example.
Learn: P(Fever | Pneumonia = T)
Step 3a: Learn the ML estimate. With 3 fevers among the 5 selected cases (the counts implied by the Bayesian update on the next slide):
P(Fever = T | Pneumonia = T) = 0.6, P(Fever = F | Pneumonia = T) = 0.4
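The three steps above amount to filtering rows and taking a relative frequency. A minimal sketch in Python, assuming a toy pandas table whose counts (3 fevers among 5 pneumonia cases) match the 0.6/0.4 estimate on the slide; the records below are illustrative, not the lecture's data.

```python
# Steps 1-3a: ML estimate of P(Fever | Pneumonia = T) from a toy dataset.
import pandas as pd

data = pd.DataFrame({
    "Fever":     [1, 1, 0, 1, 0, 0, 1],
    "Pneumonia": [1, 1, 1, 1, 1, 0, 0],
})

# Step 1: select the data points with Pneumonia = T, ignore the rest
subset = data[data["Pneumonia"] == 1]

# Step 2: keep only the values of the random variable Fever
fever = subset["Fever"]

# Step 3a: the ML estimate is the relative frequency
theta_ml = fever.mean()
print(f"P(Fever=T | Pneumonia=T) = {theta_ml:.2f}")   # 3/5 = 0.60
```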

Learning of BBN parameters. Bayesian learning.
Learn: P(Fever | Pneumonia = T)
Step 3b: Learn the Bayesian estimate.
Assume the prior \theta_{Fever|Pneumonia=T} ~ Beta(3, 4).
Posterior: \theta_{Fever|Pneumonia=T} ~ Beta(6, 6).

Model selection
A BBN has two components:
- the structure of the network (models the conditional independences)
- a set of parameters (the conditional child-parent distributions)
We already know how to learn the parameters for a fixed structure. But how do we learn the structure of the BBN?
[Figure: two candidate structures over Burglary, Earthquake, Alarm, JohnCalls, and MaryCalls.]
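The Bayesian update itself is just addition of counts. A minimal sketch, assuming the same illustrative counts as above (3 fevers, 2 non-fevers), which reproduces the slide's Beta(3, 4) to Beta(6, 6) update:

```python
# Step 3b: Beta prior/posterior update for theta_{Fever|Pneumonia=T}.
from scipy.stats import beta

alpha_prior, beta_prior = 3, 4     # prior Beta(3, 4) from the slide
n_fever, n_no_fever = 3, 2         # counts consistent with the ML example

alpha_post = alpha_prior + n_fever       # 6
beta_post = beta_prior + n_no_fever      # 6

# Posterior mean: the Bayesian point estimate of the parameter
print(alpha_post / (alpha_post + beta_post))   # 0.5
print(beta.mean(alpha_post, beta_post))        # same value via scipy
```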

Learning the structure
Criteria we can choose to score a structure S:
- Marginal likelihood: maximize P(D | S, \xi), where \xi represents the prior knowledge.
- Maximum posterior probability: maximize
  P(S | D, \xi) = P(D | S, \xi) P(S | \xi) / P(D | \xi)
How do we compute the marginal likelihood P(D | S, \xi)?

Learning of BBNs
Notation:
- i = 1, .., n ranges over all variables
- j = 1, .., q_i ranges over all possible parent combinations
- k = 1, .., r_i ranges over all possible variable values
- \Theta: the parameters of the BBN; \theta_{ij} is the vector of parameters \theta_{ijk} of the conditional probability distribution P(X_i = k | pa(X_i) = j), such that \sum_{k=1}^{r_i} \theta_{ijk} = 1
- N_{ijk}: the number of instances in the dataset where the parents of variable X_i take on the values j and X_i has value k
- \alpha_{ijk}: prior counts (the parameters of the Beta and Dirichlet priors)

Marginal likelihood
Integrate over all possible parameter settings:
P(D | S, \xi) = \int_{\Theta} P(D | S, \Theta, \xi) p(\Theta | S, \xi) d\Theta
Using the assumption of parameter and sample independence:
P(D | S, \xi) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})}
where \alpha_{ij} = \sum_k \alpha_{ijk} and N_{ij} = \sum_k N_{ijk}. We can use the log-likelihood score instead:
\log P(D | S, \xi) = \sum_{i=1}^{n} \sum_{j=1}^{q_i} \left[ \log \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})} + \sum_{k=1}^{r_i} \log \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})} \right]
The score is decomposable along the variables!

From the iid assumption:
P(D | \Theta) = \prod_{h=1}^{N} \prod_{i=1}^{n} P(x_i^h | pa(x_i)^h, \Theta)
Let r_i be the number of values that attribute x_i can take, q_i the number of possible parent combinations, and N_{ijk} the number of cases in D where x_i has value k and its parents take the values j. Then:
P(D | \Theta) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \prod_{k=1}^{r_i} \theta_{ijk}^{N_{ijk}}

From parameter independence:
p(\Theta | \xi) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} p(\theta_{ij} | \xi)
Priors for p(\theta_{ij} | \xi): \theta_{ij} = (\theta_{ij1}, ..., \theta_{ij r_i}) is a vector of parameters; we use a Dirichlet distribution with parameters \alpha_{ij1}, ..., \alpha_{ij r_i} to represent it:
p(\theta_{ij} | \xi) = Dirichlet(\alpha_{ij1}, ..., \alpha_{ij r_i})

Marginal likelihood
Combine things together:
P(D | S, \xi) = \int P(D | S, \Theta) p(\Theta | S, \xi) d\Theta
             = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \int \left[ \prod_{k=1}^{r_i} \theta_{ijk}^{N_{ijk}} \right] Dirichlet(\theta_{ij}; \alpha_{ij1}, ..., \alpha_{ij r_i}) \, d\theta_{ij}
Each factor is a ratio of Dirichlet normalization constants, since \int \prod_k \theta_{ijk}^{\alpha_{ijk} + N_{ijk} - 1} d\theta_{ij} = \prod_k \Gamma(\alpha_{ijk} + N_{ijk}) / \Gamma(\alpha_{ij} + N_{ij}), which gives
P(D | S, \xi) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})}
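The closed-form score is easy to evaluate numerically if the Gamma functions are kept in log space. A minimal sketch, my own rather than the lecture's code, of the per-variable term of this marginal likelihood (the Bayesian-Dirichlet score); the counts reuse the Fever/Pneumonia example, with the Pneumonia = F row being an illustrative assumption:

```python
# Log marginal likelihood (BD score) for ONE variable X_i.
# N[j, k] are the data counts N_ijk; A[j, k] are the prior counts alpha_ijk.
import numpy as np
from scipy.special import gammaln

def log_bd_score(N, A):
    """log of prod_j Gamma(a_ij)/Gamma(a_ij+N_ij) * prod_k Gamma(a_ijk+N_ijk)/Gamma(a_ijk)."""
    a_ij = A.sum(axis=1)          # alpha_ij = sum_k alpha_ijk
    n_ij = N.sum(axis=1)          # N_ij = sum_k N_ijk
    score = (gammaln(a_ij) - gammaln(a_ij + n_ij)).sum()
    score += (gammaln(A + N) - gammaln(A)).sum()
    return score

# Fever given Pneumonia: q_i = 2 parent configurations, r_i = 2 values
N = np.array([[3., 2.],     # Pneumonia = T: 3 fevers, 2 non-fevers
              [1., 1.]])    # Pneumonia = F: illustrative counts
A = np.ones_like(N)         # uniform Dirichlet(1, 1) priors
print(log_bd_score(N, A))
```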

An alternative way to compute the marginal likelihood
Integrate over all possible parameter settings:
P(D | \xi) = \int P(D | \Theta, \xi) p(\Theta | \xi) d\Theta
Posterior of the parameters, given the data and the structure:
p(\Theta | D, \xi) = \frac{P(D | \Theta, \xi) p(\Theta | \xi)}{P(D | \xi)}
Trick: rearrange the posterior; the resulting ratio holds for any value of \Theta:
P(D | \xi) = \frac{P(D | \Theta, \xi) p(\Theta | \xi)}{p(\Theta | D, \xi)}
Gives the same solution:
P(D | S, \xi) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})}

Learning the structure
- The likelihood of the data for the full BBN (structure and parameters), P(D | \Theta, S, \xi), measures the goodness of fit of the BBN to the data.
- The marginal likelihood (for the structure only), P(D | S, \xi), does not measure goodness of fit alone. It is different for structures of different complexity: it incorporates a preference towards simpler structures and implements Occam's razor!
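The trick can be checked numerically in the single-parameter case: for a Bernoulli likelihood with a Beta prior, the ratio P(D | theta) p(theta) / p(theta | D) is constant in theta and equals the marginal likelihood. A small sketch of this check (my own illustration, reusing the Beta(3, 4) prior from the Fever example):

```python
# Verify P(D) = P(D | theta) p(theta) / p(theta | D) for any theta.
from scipy.stats import beta

a, b = 3, 4          # prior Beta(3, 4), as in the Fever example
heads, tails = 3, 2  # data counts

for theta in (0.3, 0.5, 0.7):
    lik = theta**heads * (1 - theta)**tails
    # posterior is Beta(a + heads, b + tails) by conjugacy
    marginal = lik * beta.pdf(theta, a, b) / beta.pdf(theta, a + heads, b + tails)
    print(f"theta={theta}: P(D) = {marginal:.6f}")   # identical for every theta
```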

Occam's razor
Why is there a preference towards simpler structures? Rewrite the marginal likelihood as
P(D | S, \xi) = \frac{\int P(D | S, \Theta, \xi) p(\Theta | S, \xi) d\Theta}{\int p(\Theta | S, \xi) d\Theta}
using the fact that \int p(\Theta | S, \xi) d\Theta = 1.
Interpretation: in more complex structures there are more ways the parameters can be set badly. The numerator counts the good parameter assignments; the denominator counts all assignments.

Approximations of probabilistic scores
Approximations of the marginal likelihood and posterior scores: information-based measures such as
- the Akaike criterion
- the Bayesian information criterion (BIC)
- the minimum description length (MDL)
reflect the tradeoff between the fit to the data and the preference towards simpler structures.
Example: Akaike criterion. Maximize:
score(S) = \log P(D | \Theta_{ML}, S, \xi) - compl(S)
Bayesian information criterion (BIC). Maximize:
score(S) = \log P(D | \Theta_{ML}, S, \xi) - \frac{1}{2} compl(S) \log N
A sketch of both scores follows below.
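A minimal sketch of the two approximate scores, assuming compl(S) is measured by the number of free parameters (a common choice; the slide leaves compl(S) abstract) and that the ML log-likelihood comes from a separate fit of the structure:

```python
# AIC- and BIC-style structure scores from the slide.
import math

def aic_score(log_lik_ml, n_params):
    # Akaike criterion: log P(D | Theta_ML, S) - compl(S)
    return log_lik_ml - n_params

def bic_score(log_lik_ml, n_params, n_samples):
    # BIC: log P(D | Theta_ML, S) - (1/2) * compl(S) * log N
    return log_lik_ml - 0.5 * n_params * math.log(n_samples)

print(aic_score(-120.0, 9), bic_score(-120.0, 9, 500))  # toy numbers
```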

Optimizing the structure
Finding the best structure is a combinatorial optimization problem. A good feature: the score is decomposable along the variables:
\log P(D | S, \xi) = \sum_{i=1}^{n} \sum_{j=1}^{q_i} \left[ \log \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})} + \sum_{k=1}^{r_i} \log \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})} \right]
Algorithm idea: search the space of structures using local changes (additions and deletions of a link).
Advantage: we do not have to compute the whole score from scratch; we recompute only the partial score for the affected variable.

Optimizing the structure. Algorithms.
- Greedy search: start from the structure with no links; repeatedly add the link that yields the best score improvement (a minimal sketch follows below).
- Metropolis algorithm (with simulated annealing): local additions and deletions; avoids being trapped in local optima.
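A greedy-search sketch (my own illustration, not the lecture's algorithm as given): it scores each variable's family with the decomposable term implemented as log_bd_score in the marginal-likelihood sketch above, and repeatedly adds the single best link; after each addition only the affected variable is rescored. Acyclicity checks, link deletions, and the Metropolis variant are omitted for brevity:

```python
# Greedy structure search over binary variables, using the decomposable
# BD score. Assumes log_bd_score (defined earlier) is in scope.
import itertools
import numpy as np

def family_counts(data, child, parents, arity=2):
    """Counts N[j, k]: parent configuration j, child value k."""
    q = arity ** len(parents)
    N = np.zeros((q, arity))
    for row in data:
        j = sum(row[p] * (arity ** idx) for idx, p in enumerate(parents))
        N[j, row[child]] += 1
    return N

def family_score(data, child, parents):
    N = family_counts(data, child, parents)
    return log_bd_score(N, np.ones_like(N))   # uniform Dirichlet priors

def greedy_search(data, n_vars):
    parents = {i: [] for i in range(n_vars)}   # start with no links
    while True:
        best_gain, best_edge = 0.0, None
        for u, v in itertools.permutations(range(n_vars), 2):
            if u in parents[v]:
                continue
            gain = (family_score(data, v, parents[v] + [u])
                    - family_score(data, v, parents[v]))
            if gain > best_gain:
                best_gain, best_edge = gain, (u, v)
        if best_edge is None:          # no addition improves the score
            return parents
        u, v = best_edge
        parents[v].append(u)           # only variable v's score changed

# Example: two binary variables where X1 is a noisy copy of X0
rng = np.random.default_rng(0)
x0 = rng.integers(0, 2, 200)
x1 = (x0 ^ (rng.random(200) < 0.1)).astype(int)
print(greedy_search(np.stack([x0, x1], axis=1), 2))
```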