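School of Computer Science

Probabilistic Graphical Models 10-708

Statistical learning with basic graphical models

Lecture 7, Oct 8, 2007

Eric Xing

[Figure: example network with nodes Receptor A, Receptor B, Kinase C, Kinase D, Kinase E, TF F, Gene G, Gene H]

Reading: J-Chap. 5, 6; KF-Chap. 8

Learning Graphical Models

The goal: given a set of independent samples (assignments of random variables), find the best (the most likely?) Bayesian network, i.e., both the DAG and the CPDs.

[Figure: fully observed records over (B, E, A, C, R), such as (T, F, F, T, F), (T, F, T, T, F), ..., (F, T, T, T, F), feed both structural learning (the DAG over E, B, A, R, C) and parameter learning (the CPD $P(A \mid E, B)$ below)]

E    B     P(a | E,B)   P(¬a | E,B)
e    b     0.9          0.1
e    ¬b    0.2          0.8
¬e   b     0.9          0.1
¬e   ¬b    0.01         0.99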
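Parameter learning for a CPD such as $P(A \mid E, B)$ from fully observed records comes down to counting. The sketch below estimates each table row as a relative frequency, which is the maximum-likelihood estimate derived later in the lecture. The records and the helper name `estimate_cpd` are hypothetical illustrations, not data from the figure.

```python
from collections import Counter

# Hypothetical fully observed records over (B, E, A, C, R). These are made-up
# illustration values, not the records shown in the lecture figure.
RECORDS = [
    {"B": True, "E": False, "A": False, "C": True, "R": False},
    {"B": True, "E": False, "A": True, "C": True, "R": False},
    {"B": False, "E": True, "A": True, "C": True, "R": False},
    {"B": True, "E": False, "A": True, "C": False, "R": False},
    {"B": False, "E": False, "A": False, "C": False, "R": False},
]


def estimate_cpd(records, child, parents):
    """Relative-frequency estimate of P(child = True | parents) for every
    parent configuration that occurs in the data (hypothetical helper)."""
    joint = Counter()  # (parent configuration, child value) -> count
    marg = Counter()   # parent configuration -> count
    for r in records:
        key = tuple(r[p] for p in parents)
        joint[(key, r[child])] += 1
        marg[key] += 1
    return {key: joint[(key, True)] / n for key, n in marg.items()}


cpd = estimate_cpd(RECORDS, "A", ["E", "B"])
for (e, b), p in sorted(cpd.items()):
    print(f"E={e!s:5} B={b!s:5} P(A=T | E, B) = {p:.2f}")
```

The configuration E = T, B = T never occurs in these records, so it gets no estimate at all. This is the data-sparsity issue that smoothing addresses.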
Learning Graphical Models

Scenarios:
- completely observed GMs
  - directed
  - undirected
- partially or unobserved GMs
  - directed
  - undirected (an open research topic)

Estimation principles:
- Maximal likelihood estimation (MLE)
- Bayesian estimation
- Maximal conditional likelihood
- Maximal "Margin"

We use learning as a name for the process of estimating the parameters, and in some cases the topology of the network, from data.

Score-based approach

[Figure: given the data, enumerate possible structures over E, B, R, A; learn the parameters of each; then score each structure/parameter pair, e.g. by maximum likelihood, Bayesian score, conditional likelihood, or margin]
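A toy version of the score-based loop helps here. It fits ML parameters for each candidate structure and scores the structure by its maximized log-likelihood. This sketch assumes two binary variables, two hypothetical candidate structures (X and Y independent, or X → Y) and made-up data.

```python
import math
from collections import Counter

# Hypothetical paired observations of two binary variables (X, Y).
DATA = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0), (1, 1), (0, 0)]


def ll_independent(data):
    """Maximized log-likelihood of structure G0: X and Y independent."""
    n = len(data)
    cx = Counter(x for x, _ in data)
    cy = Counter(y for _, y in data)
    return (sum(c * math.log(c / n) for c in cx.values())
            + sum(c * math.log(c / n) for c in cy.values()))


def ll_x_to_y(data):
    """Maximized log-likelihood of structure G1: X -> Y."""
    n = len(data)
    cx = Counter(x for x, _ in data)
    cxy = Counter(data)
    ll_x = sum(c * math.log(c / n) for c in cx.values())
    ll_y_given_x = sum(c * math.log(c / cx[x]) for (x, _), c in cxy.items())
    return ll_x + ll_y_given_x


for name, score in [("G0: X, Y", ll_independent(DATA)), ("G1: X -> Y", ll_x_to_y(DATA))]:
    print(f"{name:11} max log-likelihood = {score:.4f}")
```

G1 contains G0 as a special case, so its ML score is never lower. That is why a pure likelihood score tends to favor denser structures.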
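ML Parameter Est. for completely observed GMs of given structure

[Figure: a model over Z and X]

The data: $\{(z^{(1)}, x^{(1)}), (z^{(2)}, x^{(2)}), (z^{(3)}, x^{(3)}), \ldots, (z^{(N)}, x^{(N)})\}$

Parameter Learning

Assume G is known and fixed:
- from expert design, or
- from an intermediate outcome of iterative structure learning.

Goal: estimate $\theta$ from a dataset of $N$ independent, identically distributed (iid) training cases $D = \{x_1, \ldots, x_N\}$.

In general, each training case $x_n = (x_{n,1}, \ldots, x_{n,M})$ is a vector of $M$ values, one per node. The model can be completely observable, i.e., every element in $x_n$ is known (no missing values, no hidden variables), or partially observable, i.e., $\exists i$ s.t. $x_{n,i}$ is not observed.

In this lecture we consider learning parameters for a single node.

Frequentist vs. Bayesian estimate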
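Bayesian Parameter Estimation

Bayesians treat the unknown parameters as a random variable, whose distribution can be inferred using Bayes rule:

$$p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)} = \frac{p(D \mid \theta)\, p(\theta)}{\int p(D \mid \theta)\, p(\theta)\, d\theta}$$

This crucial equation can be written in words:

$$\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{marginal likelihood}}$$

For iid data, the likelihood is

$$p(D \mid \theta) = \prod_{n=1}^{N} p(x_n \mid \theta)$$

The prior $p(\theta)$ encodes our prior knowledge about the domain. Therefore Bayesian estimation has been criticized for being "subjective". (Empirical Bayes fits the prior from "training" data.)

Frequentist Parameter Estimation

Two people with different priors $p(\theta)$ will end up with different estimates $p(\theta \mid D)$. Frequentists dislike this "subjectivity". Frequentists think of the parameter as a fixed, unknown constant, not a random variable. Hence they have to come up with different "objective" estimators (ways of computing $\hat\theta$ from data) instead of using Bayes rule. These estimators have different properties, such as being "unbiased", "minimum variance", etc. A very popular estimator is the maximum likelihood estimator, which is simple and has good statistical properties.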
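Bayes rule can be made concrete by evaluating likelihood × prior on a grid and dividing by a numerical estimate of the marginal likelihood. This is a minimal sketch that assumes a Bernoulli likelihood for 0/1 data and a prior supplied as a Python function. The trapezoid rule stands in for the integral in the denominator.

```python
def grid_posterior(data, prior, grid_size=1001):
    """Approximate p(theta | D) for a Bernoulli likelihood on a grid over [0, 1].

    data:  list of 0/1 outcomes, assumed iid
    prior: function theta -> (possibly unnormalized) prior density
    Returns (grid, posterior density values, marginal likelihood estimate).
    """
    step = 1.0 / (grid_size - 1)
    grid = [i * step for i in range(grid_size)]
    heads = sum(data)
    tails = len(data) - heads
    # likelihood x prior at each grid point (product over iid observations)
    unnorm = [t ** heads * (1 - t) ** tails * prior(t) for t in grid]
    # trapezoid rule for p(D) = integral of likelihood x prior over theta
    marginal = step * (sum(unnorm) - 0.5 * (unnorm[0] + unnorm[-1]))
    return grid, [u / marginal for u in unnorm], marginal


grid, post, marginal = grid_posterior([1, 1, 0], prior=lambda t: 1.0)
print(marginal)  # close to the exact value 1/12 under a uniform prior
```

Under a uniform prior the posterior after two heads and one tail is Beta(3, 2), with mean 0.6, so the grid result can be checked against a closed form.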
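Discussion

$\theta$ or $p(\theta)$? This is the problem! Bayesians know it.

Maximum Likelihood Estimation

The log-likelihood is monotonically related to the likelihood:

$$\ell(\theta; D) = \log p(D \mid \theta) = \log \prod_{n=1}^{N} p(x_n \mid \theta) = \sum_{n=1}^{N} \log p(x_n \mid \theta)$$

The idea underlying maximum likelihood estimation (MLE): pick the setting of parameters most likely to have generated the data we saw:

$$\theta_{ML} = \arg\max_{\theta} \ell(\theta; D)$$

Problems of MLE:
- Overfitting: means that "some of the relationships that appear statistically significant are actually just noise. It occurs when the complexity of the statistical model is too great for the amount of data that you have." Often the MLE overfits the training data, so it is common to maximize a regularized log-likelihood instead:
$$\ell'(\theta; D) = \ell(\theta; D) - c\, f(\theta)$$
Here $f(\theta)$ penalizes complex parameter settings and $c > 0$ sets the strength of the penalty.
- Insufficient training data can lead to spurious estimators, e.g., certain possible values are never observed because the data are sparse, so it is common to smooth the estimated parameter.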
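The contrast between $\theta_{ML}$ and a regularized estimate shows up clearly with sparse data. This sketch maximizes both objectives by brute-force grid search for a coin that showed no heads. The quadratic penalty $f(\theta) = (\theta - 1/2)^2$ and the strength $c = 5$ are hypothetical choices for illustration only.

```python
import math


def bernoulli_loglik(theta, heads, tails):
    """l(theta; D) = n_h log(theta) + n_t log(1 - theta); -inf where a needed log is log 0."""
    if (heads > 0 and theta <= 0.0) or (tails > 0 and theta >= 1.0):
        return -math.inf
    ll = 0.0
    if heads:
        ll += heads * math.log(theta)
    if tails:
        ll += tails * math.log(1.0 - theta)
    return ll


def argmax_on_grid(objective, grid_size=10001):
    """Brute-force maximizer over an evenly spaced grid on [0, 1]."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=objective)


def penalty(theta):
    """Hypothetical f(theta): a quadratic pull toward theta = 1/2."""
    return (theta - 0.5) ** 2


heads, tails = 0, 3  # hypothetical data: three tosses, no heads
c = 5.0              # hypothetical penalty strength
theta_ml = argmax_on_grid(lambda t: bernoulli_loglik(t, heads, tails))
theta_reg = argmax_on_grid(lambda t: bernoulli_loglik(t, heads, tails) - c * penalty(t))
print(theta_ml, theta_reg)
```

The ML estimate is exactly 0. The penalized objective moves the estimate to about 0.148, where $3/(1-\theta) = 10(1/2 - \theta)$.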
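Example: Bernoulli model

Data: We observed $N$ iid coin tosses: $D = \{1, 0, 1, \ldots, 0\}$

Representation: binary r.v.: $x_n \in \{0, 1\}$

Model:

$$P(x) = \begin{cases} 1 - \theta & \text{for } x = 0 \\ \theta & \text{for } x = 1 \end{cases} \;\Rightarrow\; P(x) = \theta^{x}(1 - \theta)^{1 - x}$$

How to write the likelihood of a single observation $x_i$?

$$P(x_i) = \theta^{x_i}(1 - \theta)^{1 - x_i}$$

The likelihood of dataset $D = \{x_1, \ldots, x_N\}$:

$$P(x_1, \ldots, x_N \mid \theta) = \prod_{i=1}^{N} \theta^{x_i}(1 - \theta)^{1 - x_i} = \theta^{\sum_i x_i}(1 - \theta)^{\sum_i (1 - x_i)} = \theta^{\#\text{head}}(1 - \theta)^{\#\text{tails}}$$

MLE

Objective function:

$$\ell(\theta; D) = \log P(D \mid \theta) = \log \theta^{n_h}(1 - \theta)^{n_t} = n_h \log\theta + (N - n_h)\log(1 - \theta)$$

We need to maximize this w.r.t. $\theta$. Take derivatives wrt $\theta$:

$$\frac{\partial \ell}{\partial \theta} = \frac{n_h}{\theta} - \frac{N - n_h}{1 - \theta} = 0 \;\Rightarrow\; \theta_{MLE} = \frac{n_h}{N}, \quad \text{or} \quad \theta_{MLE} = \frac{1}{N}\sum_i x_i$$

Frequency as sample mean.

Sufficient statistics: the counts $n_h = \sum_i x_i$ (with $n_t = N - n_h$) are sufficient statistics of data $D$.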
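The derivation can be checked numerically. The code computes $\theta_{MLE} = n_h / N$, confirms that the derivative $n_h/\theta - n_t/(1-\theta)$ vanishes there, and confirms that nearby values of $\theta$ have lower log-likelihood. The toss sequence is hypothetical.

```python
import math


def bernoulli_mle(tosses):
    """theta_MLE = n_h / N, the fraction of heads (x = 1) in the sample."""
    return sum(tosses) / len(tosses)


def loglik(theta, tosses):
    """l(theta; D) = n_h log(theta) + n_t log(1 - theta)."""
    n_h = sum(tosses)
    n_t = len(tosses) - n_h
    return n_h * math.log(theta) + n_t * math.log(1.0 - theta)


D = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]  # hypothetical tosses: 6 heads, 4 tails
theta_hat = bernoulli_mle(D)
n_h = sum(D)                        # the sufficient statistic
n_t = len(D) - n_h
grad = n_h / theta_hat - n_t / (1.0 - theta_hat)  # dl/dtheta at the MLE
print(theta_hat, grad)
```

The estimate depends on $D$ only through $n_h$ and $N$, so any reordering of the tosses gives the same answer.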
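Being a pragmatic frequentist

Maximum a posteriori (MAP) estimation:

$$\theta_{MAP} = \arg\max_{\theta} \log p(\theta \mid D) = \arg\max_{\theta} \big[\ell(\theta; D) + \log p(\theta)\big]$$

Smoothing with pseudo-counts

Recall that for the Binomial distribution we have

$$\theta^{head}_{ML} = \frac{n^{head}}{n^{head} + n^{tail}}$$

What if we tossed too few times, so that we saw zero heads? Then $\theta^{head}_{ML} = 0$, and we will predict that the probability of seeing a head next is zero!!!

The rescue:

$$\theta^{head}_{MAP} = \frac{n^{head} + n'_{head}}{n^{head} + n^{tail} + n'_{head} + n'_{tail}}$$

where $n'_{head}$ and $n'_{tail}$ are known as the pseudo- (imaginary) counts.

But are we still objective?

Bayesian estimation for Bernoulli

Beta distribution:

$$P(\theta; \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\,\theta^{\alpha - 1}(1 - \theta)^{\beta - 1} = \frac{1}{B(\alpha, \beta)}\,\theta^{\alpha - 1}(1 - \theta)^{\beta - 1}$$

When $x$ is a positive integer, $\Gamma(x + 1) = x\Gamma(x) = x!$

Posterior distribution of $\theta$:

$$P(\theta \mid x_1, \ldots, x_N) = \frac{p(x_1, \ldots, x_N \mid \theta)\, p(\theta)}{p(x_1, \ldots, x_N)} \propto \theta^{n_h}(1 - \theta)^{n_t} \times \theta^{\alpha - 1}(1 - \theta)^{\beta - 1} = \theta^{n_h + \alpha - 1}(1 - \theta)^{n_t + \beta - 1}$$

Notice the isomorphism of the posterior to the prior. Such a prior is called a conjugate prior. $\alpha$ and $\beta$ are hyperparameters (parameters of the prior) and correspond to the number of "virtual" heads/tails (pseudo-counts).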
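The pseudo-count rescue is a one-line formula. This sketch adds imaginary counts to the observed ones. The function name and the default of one pseudo-count per outcome (Laplace smoothing, also known as Laplace's rule of succession) are illustrative choices, not prescribed by the slide.

```python
def smoothed_head_prob(n_head, n_tail, pseudo_head=1.0, pseudo_tail=1.0):
    """(n_head + n'_head) / (n_head + n_tail + n'_head + n'_tail).

    pseudo_head / pseudo_tail are imaginary counts added to the observed ones;
    setting both to 0 recovers the ML estimate n_head / (n_head + n_tail).
    """
    return (n_head + pseudo_head) / (n_head + n_tail + pseudo_head + pseudo_tail)


# Hypothetical run: 3 tosses, all tails.
print(smoothed_head_prob(0, 3, 0, 0))  # ML estimate: 0.0
print(smoothed_head_prob(0, 3))        # one pseudo-count per side: 0.2
```

Under a Beta($\alpha, \beta$) prior, the MAP estimate corresponds to pseudo-counts $n'_{head} = \alpha - 1$ and $n'_{tail} = \beta - 1$.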
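Bayesian estimation for Bernoulli, con'd

Posterior distribution of $\theta$:

$$P(\theta \mid x_1, \ldots, x_N) \propto \theta^{n_h + \alpha - 1}(1 - \theta)^{n_t + \beta - 1}$$

Maximum a posteriori (MAP) estimation (for $n_h + \alpha > 1$ and $n_t + \beta > 1$):

$$\theta_{MAP} = \arg\max_{\theta} \log P(\theta \mid x_1, \ldots, x_N) = \frac{n_h + \alpha - 1}{N + \alpha + \beta - 2}$$

Posterior mean estimation:

$$\theta_{Bayes} = \int \theta\, p(\theta \mid D)\, d\theta = C \int \theta \times \theta^{n_h + \alpha - 1}(1 - \theta)^{n_t + \beta - 1}\, d\theta = \frac{n_h + \alpha}{N + \alpha + \beta}$$

Beta parameters can be understood as pseudo-counts.

Prior strength: $A = \alpha + \beta$. $A$ can be interpreted as the size of an imaginary data set from which we obtain the pseudo-counts.

Effect of Prior Strength

Suppose we have a uniform prior ($\alpha/A = \beta/A = 1/2$), and we observe $n = (n_h = 2, n_t = 8)$.
- Weak prior $A = 2$. Posterior prediction:
$$p(x = h \mid n_h = 2, n_t = 8, A = 2) = \frac{1 + 2}{2 + 10} = 0.25$$
- Strong prior $A = 20$. Posterior prediction:
$$p(x = h \mid n_h = 2, n_t = 8, A = 20) = \frac{10 + 2}{20 + 10} = 0.40$$

However, with enough data the prior is washed away. For example, with $n = (n_h = 200, n_t = 800)$, the estimates under the weak and strong prior are $\frac{1 + 200}{2 + 1000}$ and $\frac{10 + 200}{20 + 1000}$, respectively. Both are close to 0.2.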
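The prior-strength example can be reproduced directly from the posterior-mean formula. This sketch uses a symmetric Beta prior with $\alpha = \beta = A/2$, matching the slide's prior mean of 1/2. It also includes the closed-form MAP estimate.

```python
def posterior_mean(n_h, n_t, alpha, beta):
    """Posterior mean (n_h + alpha) / (N + alpha + beta) under a Beta(alpha, beta) prior."""
    return (n_h + alpha) / (n_h + n_t + alpha + beta)


def map_estimate(n_h, n_t, alpha, beta):
    """Posterior mode (n_h + alpha - 1) / (N + alpha + beta - 2);
    assumes n_h + alpha > 1 and n_t + beta > 1."""
    return (n_h + alpha - 1) / (n_h + n_t + alpha + beta - 2)


for A in (2, 20):  # prior strength A = alpha + beta, prior mean 1/2
    a = b = A / 2
    small = posterior_mean(2, 8, a, b)
    large = posterior_mean(200, 800, a, b)
    print(f"A={A:2d}: small data {small:.4f}, large data {large:.4f}")
```

The output is 0.2500 and 0.4000 for the small sample, and about 0.2006 and 0.2059 for the large one. With $\alpha = \beta = 1$ the MAP estimate coincides with the MLE $n_h/N$.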
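How estimators should be used?

$\theta_{MAP}$ is not Bayesian (even though it uses a prior), since it is a point estimate.

Consider predicting the future. A sensible way is to combine predictions based on all possible values of $\theta$, weighted by their posterior probability. This is what a Bayesian will do:

$$p(x_{new} \mid D) = \int p(x_{new}, \theta \mid D)\, d\theta = \int p(x_{new} \mid \theta, D)\, p(\theta \mid D)\, d\theta = \int p(x_{new} \mid \theta)\, p(\theta \mid D)\, d\theta$$

The last step holds because $x_{new}$ is conditionally independent of $D$ given $\theta$.

A frequentist will typically use a "plug-in" estimator such as ML/MAP:

$$p(x_{new} \mid D) = p(x_{new} \mid \hat\theta_{ML}) \quad \text{or} \quad p(x_{new} \mid D) = p(x_{new} \mid \hat\theta_{MAP})$$

The Bayesian estimate collapses to MAP for a concentrated posterior.

Frequentist vs. Bayesian

This is a "theological" war.

Advantages of the Bayesian approach:
- Mathematically elegant.
- Works well when the amount of data is much less than the number of parameters (e.g., one-shot learning).
- Easy to do incremental (sequential) learning.
- Can be used for model selection (max likelihood will always pick the most complex model).

Advantages of the frequentist approach:
- Mathematically/computationally simpler.
- "objective", unbiased, invariant to reparameterization.

As $N \to \infty$, the two approaches become the same:

$$p(\theta \mid D) \to \delta(\theta - \theta_{ML})$$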
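The Bayesian predictive integral $\int p(x_{new} \mid \theta)\, p(\theta \mid D)\, d\theta$ can be evaluated numerically and compared with the ML plug-in prediction. This is a sketch for the Beta-Bernoulli case, assuming a uniform Beta(1, 1) prior. It uses a midpoint-rule grid in log space instead of the known closed form, to mirror the integral.

```python
import math


def predictive_head_bayes(n_h, n_t, alpha, beta, grid_size=20001):
    """p(x_new = 1 | D) = integral of theta * p(theta | D) d theta, by midpoint rule.

    The posterior is proportional to theta^(n_h+alpha-1) (1-theta)^(n_t+beta-1);
    it is evaluated in log space and normalized numerically.
    """
    a, b = n_h + alpha, n_t + beta
    h = 1.0 / grid_size
    mids = [(i + 0.5) * h for i in range(grid_size)]
    logs = [(a - 1) * math.log(t) + (b - 1) * math.log(1 - t) for t in mids]
    top = max(logs)
    dens = [math.exp(v - top) for v in logs]  # rescaled, unnormalized posterior
    return sum(t * d for t, d in zip(mids, dens)) / sum(dens)


for n_h, n_t in [(2, 8), (200, 800)]:
    bayes = predictive_head_bayes(n_h, n_t, alpha=1.0, beta=1.0)
    plug_in_ml = n_h / (n_h + n_t)  # p(x_new = 1 | theta_ML) = theta_ML
    print(f"n_h={n_h}, n_t={n_t}: Bayesian {bayes:.4f}, plug-in ML {plug_in_ml:.4f}")
```

With 10 tosses the two predictions differ noticeably (0.25 vs. 0.20). With 1000 tosses the posterior is concentrated and they nearly agree.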
Simplest GMs: the building blocks

- Density estimation: parametric and nonparametric methods
- Regression: linear, conditional mixture, nonparametric
- Classification: generative and discriminative approach

Plates

A plate is a "macro" that allows subgraphs to be replicated.

For iid (exchangeable) data, the likelihood is

$$p(D \mid \theta) = \prod_{n} p(x_n \mid \theta)$$

We can represent this as a Bayes net with $N$ nodes.

The rules of plates are simple. Repeat every structure in a box a number of times given by the integer in the corner of the box (e.g., $N$), updating the plate index variable (e.g., $n$) as you go. Duplicate every arrow going into the plate and every arrow leaving the plate by connecting the arrows to each copy of the structure.
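The plate rules are mechanical enough to write as a small unrolling routine. The helper `unroll_plate` and its edge-list representation are hypothetical. The code copies every node inside the plate once per index and duplicates the arrows that enter, leave, or stay inside the plate.

```python
def unroll_plate(inside_nodes, inside_edges, incoming, outgoing, n_copies):
    """Expand a plate into explicit nodes and edges (hypothetical helper).

    inside_nodes: names of nodes drawn inside the plate, e.g. ["x"]
    inside_edges: (u, v) pairs with both ends inside the plate
    incoming:     (parent, child) pairs, parent outside, child inside
    outgoing:     (parent, child) pairs, parent inside, child outside
    Each inside node "x" becomes the copies "x_1", ..., "x_N".
    """
    def copy(node, n):
        return f"{node}_{n}"

    edges = []
    for n in range(1, n_copies + 1):
        edges += [(copy(u, n), copy(v, n)) for u, v in inside_edges]
        edges += [(p, copy(c, n)) for p, c in incoming]
        edges += [(copy(p, n), c) for p, c in outgoing]
    nodes = [copy(v, n) for n in range(1, n_copies + 1) for v in inside_nodes]
    return nodes, edges


# The iid model: theta outside the plate, one observed x_n per copy.
nodes, edges = unroll_plate(["x"], [], [("theta", "x")], [], 3)
print(nodes)  # ['x_1', 'x_2', 'x_3']
print(edges)  # [('theta', 'x_1'), ('theta', 'x_2'), ('theta', 'x_3')]
```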
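Discrete Distributions

Bernoulli distribution: Ber($p$)

$$P(x) = \begin{cases} 1 - p & \text{for } x = 0 \\ p & \text{for } x = 1 \end{cases} \;\Rightarrow\; P(x) = p^{x}(1 - p)^{1 - x}$$

Multinomial distribution: Mult($1, \theta$)

Multinomial (indicator) variable:

$$X = [X_1, X_2, X_3, X_4, X_5, X_6]^T, \quad X_j \in \{0, 1\}, \quad \sum_{j \in [1, \ldots, 6]} X_j = 1$$

$$X_j = 1 \text{ w.p. } \theta_j, \quad \sum_{j \in [1, \ldots, 6]} \theta_j = 1$$

$$p(x(j)) = P(X_j = 1) = \theta_j, \quad \text{where } j \text{ is the index of the die face}$$

$$p(x) = \theta_1^{x_1}\theta_2^{x_2}\theta_3^{x_3}\theta_4^{x_4}\theta_5^{x_5}\theta_6^{x_6} = \prod_k \theta_k^{x_k}$$

Multinomial distribution: Mult($N, \theta$)

Count variable:

$$n = [n_1, \ldots, n_K]^T, \quad \text{where } \sum_j n_j = N$$

$$p(n) = \frac{N!}{n_1!\, n_2! \cdots n_K!}\,\theta_1^{n_1}\theta_2^{n_2}\cdots\theta_K^{n_K}$$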
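The Mult($N, \theta$) formula can be evaluated stably with log-gamma functions, using $\Gamma(n + 1) = n!$. The sketch below also shows two sanity checks on the formula. A one-hot count vector (the Mult($1, \theta$) case) returns $\theta_j$. The probabilities of all count vectors with a fixed $N$ sum to 1. The fair-die parameters are an illustrative choice.

```python
import math
from itertools import product


def multinomial_pmf(counts, theta):
    """p(n) = N! / (n_1! ... n_K!) * prod_k theta_k^{n_k} for a count vector n."""
    N = sum(counts)
    log_coef = math.lgamma(N + 1) - sum(math.lgamma(c + 1) for c in counts)
    # skip zero counts so that theta_k = 0 with n_k = 0 contributes 0^0 = 1
    log_p = sum(c * math.log(t) for c, t in zip(counts, theta) if c > 0)
    return math.exp(log_coef + log_p)


theta = [1 / 6] * 6  # a fair die (illustrative parameter values)
# Mult(1, theta): a one-hot indicator vector picks out theta_j.
print(multinomial_pmf([0, 0, 1, 0, 0, 0], theta))
# Mult(N, theta) sums to 1 over all count vectors with total N (here N = 3).
total = sum(multinomial_pmf(list(n), theta)
            for n in product(range(4), repeat=6) if sum(n) == 3)
print(total)
```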
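Example: multinomial model

Data: We observed $N$ iid die rolls ($K$-sided): $D = \{5, \ldots, 3\}$

Representation: unit basis vectors:

$$x_n = [x_{n,1}, x_{n,2}, \ldots, x_{n,K}]^T, \quad \text{where } x_{n,k} \in \{0, 1\} \text{ and } \sum_{k=1}^{K} x_{n,k} = 1$$

Model: How to write the likelihood of a single observation $x_n$?

$$P(x_n) = P(\{x_{n,k} = 1, \text{ where } k \text{ is the index of the side of the die rolled at the } n\text{-th roll}\}) = \theta_k = \theta_1^{x_{n,1}} \cdots \theta_K^{x_{n,K}} = \prod_{k=1}^{K} \theta_k^{x_{n,k}}$$

The likelihood of dataset $D = \{x_1, \ldots, x_N\}$:

$$P(x_1, \ldots, x_N \mid \theta) = \prod_{n=1}^{N}\prod_{k=1}^{K} \theta_k^{x_{n,k}} = \prod_{k=1}^{K} \theta_k^{\sum_n x_{n,k}} = \prod_{k=1}^{K} \theta_k^{n_k}$$

MLE: constrained optimization with Lagrange multipliers

Objective function:

$$\ell(\theta; D) = \log P(D \mid \theta) = \log \prod_k \theta_k^{n_k} = \sum_k n_k \log\theta_k$$

We need to maximize this subject to the constraint $\sum_{k=1}^{K} \theta_k = 1$.

Constrained cost function with a Lagrange multiplier:

$$\tilde\ell = \sum_k n_k \log\theta_k + \lambda\Big(1 - \sum_k \theta_k\Big)$$

Take derivatives wrt $\theta_k$:

$$\frac{\partial\tilde\ell}{\partial\theta_k} = \frac{n_k}{\theta_k} - \lambda = 0 \;\Rightarrow\; n_k = \lambda\theta_k \;\Rightarrow\; N = \sum_k n_k = \lambda\sum_k \theta_k = \lambda \;\Rightarrow\; \theta_{k,MLE} = \frac{n_k}{N}, \quad \text{or} \quad \theta_{k,MLE} = \frac{1}{N}\sum_n x_{n,k}$$

Frequency as sample mean.

Sufficient statistics: the counts $n = (n_1, \ldots, n_K)$, with $n_k = \sum_n x_{n,k}$, are sufficient statistics of data $D$.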
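The Lagrange-multiplier result can be checked in two ways. At $\theta_k = n_k/N$, every stationarity condition $n_k/\theta_k - \lambda = 0$ holds with $\lambda = N$. Random points on the simplex also never reach a higher log-likelihood. The die rolls below are hypothetical, and the random spot-check is an illustration, not a proof.

```python
import math
import random
from collections import Counter


def multinomial_mle(rolls, K):
    """Return (theta_hat, counts) with counts n_k and theta_k = n_k / N for faces 1..K."""
    tally = Counter(rolls)
    counts = [tally[k] for k in range(1, K + 1)]
    N = len(rolls)
    return [n / N for n in counts], counts


def loglik(theta, counts):
    """sum_k n_k log(theta_k); faces with n_k = 0 contribute nothing."""
    return sum(n * math.log(t) for n, t in zip(counts, theta) if n > 0)


rolls = [5, 1, 6, 3, 3, 2, 5, 5, 4, 3]  # hypothetical rolls of a 6-sided die
theta_hat, counts = multinomial_mle(rolls, K=6)
N = len(rolls)
# Stationarity of the Lagrangian: n_k / theta_k - lambda = 0 with lambda = N.
residuals = [n / t - N for n, t in zip(counts, theta_hat) if n > 0]
print(theta_hat, residuals)

# Spot-check: random points on the simplex never beat the MLE.
rng = random.Random(0)
for _ in range(1000):
    w = [rng.random() for _ in range(6)]
    s = sum(w)
    other = [x / s for x in w]
    assert loglik(other, counts) <= loglik(theta_hat, counts) + 1e-12
```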
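Bayesian estimation:

Dirichlet distribution:

$$P(\theta) = \frac{\Gamma\big(\sum_k \alpha_k\big)}{\prod_k \Gamma(\alpha_k)} \prod_k \theta_k^{\alpha_k - 1} = C(\alpha) \prod_k \theta_k^{\alpha_k - 1}$$

Posterior distribution of $\theta$:

$$p(\theta \mid x_1, \ldots, x_N) = \frac{p(x_1, \ldots, x_N \mid \theta)\, p(\theta)}{p(x_1, \ldots, x_N)} \propto \prod_k \theta_k^{n_k} \prod_k \theta_k^{\alpha_k - 1} = \prod_k \theta_k^{\alpha_k + n_k - 1}$$

Notice the isomorphism of the posterior to the prior. Such a prior is called a conjugate prior.

Posterior mean estimation:

$$\theta_k = \int \theta_k\, p(\theta \mid D)\, d\theta = C \int \theta_k \prod_{k'} \theta_{k'}^{\alpha_{k'} + n_{k'} - 1}\, d\theta = \frac{n_k + \alpha_k}{N + |\alpha|}, \quad \text{where } |\alpha| = \sum_k \alpha_k$$

Dirichlet parameters can be understood as pseudo-counts.

More on Dirichlet prior:

Where does the normalizing constant $C(\alpha)$ come from?

$$\frac{1}{C(\alpha)} = \int \cdots \int \theta_1^{\alpha_1 - 1} \cdots \theta_K^{\alpha_K - 1}\, d\theta_1 \cdots d\theta_K = \frac{\prod_k \Gamma(\alpha_k)}{\Gamma\big(\sum_k \alpha_k\big)}$$

It follows from integration by parts. $\Gamma(\alpha)$ is the gamma function:

$$\Gamma(\alpha) = \int_0^\infty t^{\alpha - 1} e^{-t}\, dt$$

For integers, $\Gamma(n + 1) = n!$.

Marginal likelihood (of a particular sequence with counts $\vec n$):

$$p(x_1, \ldots, x_N \mid \vec\alpha) = \frac{C(\vec\alpha)}{C(\vec n + \vec\alpha)}$$

Posterior in closed form:

$$P(\theta \mid \{x_1, \ldots, x_N\}, \vec\alpha) = C(\vec n + \vec\alpha) \prod_k \theta_k^{n_k + \alpha_k - 1} = \mathrm{Dir}(\vec n + \vec\alpha)$$

Posterior predictive rate:

$$p(x_{N+1} = i \mid \{x_1, \ldots, x_N\}, \vec\alpha) = \int C(\vec n + \vec\alpha) \prod_k \theta_k^{n_k + \alpha_k - 1}\, \theta_i\, d\vec\theta = \frac{n_i + \alpha_i}{|\vec n| + |\vec\alpha|}$$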
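The marginal likelihood $C(\vec\alpha)/C(\vec n + \vec\alpha)$ and the posterior predictive rate are two views of the same quantity. By the chain rule, multiplying the predictive rates of each outcome in turn, with the counts updated after every draw, must reproduce the closed form. This sketch assumes outcomes coded 0..K-1 and uses hypothetical hyperparameters and data.

```python
import math


def log_C(alpha):
    """log C(alpha) = log Gamma(sum_k alpha_k) - sum_k log Gamma(alpha_k)."""
    return math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)


def log_marginal(sequence, alpha):
    """log p(x_1..x_N | alpha) = log C(alpha) - log C(n + alpha), closed form."""
    counts = [0] * len(alpha)
    for x in sequence:
        counts[x] += 1
    return log_C(alpha) - log_C([n + a for n, a in zip(counts, alpha)])


def log_marginal_sequential(sequence, alpha):
    """Same quantity via the chain rule: a product of predictive rates
    (n_i + alpha_i) / (|n| + |alpha|), updating the counts after each draw."""
    counts = [0] * len(alpha)
    total = 0.0
    for x in sequence:
        total += math.log((counts[x] + alpha[x]) / (sum(counts) + sum(alpha)))
        counts[x] += 1
    return total


alpha = [1.0, 2.0, 0.5]   # hypothetical Dirichlet hyperparameters
seq = [0, 2, 2, 1, 0, 2]  # hypothetical outcomes, coded 0..K-1
print(log_marginal(seq, alpha), log_marginal_sequential(seq, alpha))
```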
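Sequential Bayesian updating

Start with a Dirichlet prior:

$$P(\vec\theta \mid \vec\alpha) = \mathrm{Dir}(\vec\theta : \vec\alpha)$$

Observe $N'$ samples with sufficient statistics $\vec n'$. The posterior becomes:

$$P(\vec\theta \mid \vec\alpha, \vec n') = \mathrm{Dir}(\vec\theta : \vec\alpha + \vec n')$$

Observe another $N''$ samples with sufficient statistics $\vec n''$. The posterior becomes:

$$P(\vec\theta \mid \vec\alpha, \vec n', \vec n'') = \mathrm{Dir}(\vec\theta : \vec\alpha + \vec n' + \vec n'')$$

So sequentially absorbing data in any order is equivalent to a batch update.

Effect of Prior Strength

Let $N = \sum_k n_k$ be the number of observed samples.

Let $A = \sum_k \alpha_k$ be the number of "pseudo observations", i.e., the strength of the prior.

Let $\alpha'_k = \alpha_k / A$ denote the prior means.

The posterior mean is a convex combination of the prior mean and the MLE:

$$\tilde\theta_k = \frac{n_k + \alpha_k}{N + A} = \frac{A}{N + A}\,\alpha'_k + \frac{N}{N + A}\,\frac{n_k}{N} = \lambda\,\alpha'_k + (1 - \lambda)\,\theta_{k,MLE}, \quad \text{where } \lambda = \frac{A}{N + A}$$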
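Both claims on these slides reduce to adding counts. This sketch updates a hypothetical Dirichlet prior with two hypothetical batches in both orders and all at once, and checks that all three posteriors agree. It then verifies the convex-combination identity for the posterior mean.

```python
from collections import Counter


def update(alpha, batch):
    """Dir(alpha) -> Dir(alpha + n) after observing outcomes in `batch` (coded 0..K-1)."""
    counts = Counter(batch)
    return [a + counts[k] for k, a in enumerate(alpha)]


alpha = [2.0, 1.0, 1.0]                  # hypothetical prior, A = 4
first, second = [0, 0, 1], [2, 0, 2, 2]  # hypothetical batches
one_then_two = update(update(alpha, first), second)
two_then_one = update(update(alpha, second), first)
batch = update(alpha, first + second)
print(one_then_two, two_then_one, batch)

# Posterior mean as a convex combination of prior mean and MLE.
counts = Counter(first + second)
N, A = len(first + second), sum(alpha)
lam = A / (N + A)
for k in range(len(alpha)):
    post_mean = batch[k] / sum(batch)
    mixed = lam * alpha[k] / A + (1 - lam) * counts[k] / N
    assert abs(post_mean - mixed) < 1e-12
```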
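Hierarchical Bayesian Models

$\theta$ are the parameters for the likelihood $p(x \mid \theta)$.

$\alpha$ are the parameters for the prior $p(\theta \mid \alpha)$.

We can have hyper-hyper-parameters, etc.

We stop when the choice of hyper-parameters makes no difference to the marginal likelihood; typically we make the hyperparameters constants.

Where do we get the prior?
- Intelligent guesses
- Empirical Bayes (Type-II maximum likelihood), which computes point estimates of $\vec\alpha$:
$$\hat{\vec\alpha}_{MLE} = \arg\max_{\vec\alpha}\, p(\vec n \mid \vec\alpha)$$

Limitation of Dirichlet prior:

[Figure]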
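Empirical Bayes picks the hyperparameters that maximize the marginal likelihood. This is a minimal sketch for the two-outcome case (a Beta prior) under the assumption that several coins share one prior. Each coin's evidence is $B(n_h + a, n_t + b)/B(a, b)$, and the maximization is a brute-force grid search. The counts, the grid, and the function names are hypothetical.

```python
import math


def log_beta(a, b):
    """log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_evidence(groups, a, b):
    """Sum over groups of log p(sequence | a, b) under a shared Beta(a, b) prior:
    log B(n_h + a, n_t + b) - log B(a, b) for each (n_h, n_t)."""
    return sum(log_beta(h + a, t + b) - log_beta(a, b) for h, t in groups)


def empirical_bayes(groups, grid):
    """Type-II ML: choose (a, b) from the grid that maximizes the marginal likelihood."""
    return max(((a, b) for a in grid for b in grid),
               key=lambda ab: log_evidence(groups, *ab))


# Hypothetical (heads, tails) counts from several coins assumed to share a prior.
groups = [(1, 9), (8, 2), (2, 8), (6, 4), (0, 10), (5, 5)]
grid = [0.5 * i for i in range(1, 61)]  # candidate values 0.5 .. 30
a_hat, b_hat = empirical_bayes(groups, grid)
print(a_hat, b_hat, a_hat / (a_hat + b_hat))  # fitted prior mean a / (a + b)
```

If the coins vary less than binomial sampling alone would predict, the evidence can keep growing with $a + b$, and the maximizer then sits at the edge of the grid.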
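The Logistic Normal Prior

$$\vec\theta \sim \mathcal{LN}_K(\mu, \Sigma)$$

$$\vec\gamma \sim \mathcal{N}_{K-1}(\mu, \Sigma), \quad \gamma_K = 0$$

$$\theta_i = \exp\{\gamma_i - C(\vec\gamma)\}, \quad C(\vec\gamma) = \log\Big(1 + \sum_{i=1}^{K-1} e^{\gamma_i}\Big)$$

where $C(\vec\gamma)$ is the log partition function (normalization constant).

Pro: co-variance structure

Con: non-conjugate (we will discuss how to solve this later)

Logistic Normal Densities

[Figure: logistic normal densities]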
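The logistic normal construction can be written out in a few lines. Draw $\vec\gamma$ from a Gaussian, append $\gamma_K = 0$, and subtract the log partition function $C(\vec\gamma)$. This is a sketch that assumes the caller supplies a lower-triangular Cholesky factor of $\Sigma$; the covariance chosen below is hypothetical. It uses only the standard library.

```python
import math
import random


def logistic_normal_map(gamma):
    """theta_i = exp(gamma_i - C(gamma)), with gamma_K fixed to 0 and
    C(gamma) = log(1 + sum_{i<K} exp(gamma_i)). `gamma` holds the K-1 free entries."""
    full = list(gamma) + [0.0]
    m = max(full)  # subtract the max for numerical stability
    log_partition = m + math.log(sum(math.exp(g - m) for g in full))
    return [math.exp(g - log_partition) for g in full]


def sample_logistic_normal(mu, chol, rng):
    """Draw theta ~ LN_K(mu, Sigma), given a lower-triangular factor `chol`
    with Sigma = chol chol^T (computing the factor is left to the caller)."""
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    gamma = [m + sum(chol[i][j] * z[j] for j in range(i + 1)) for i, m in enumerate(mu)]
    return logistic_normal_map(gamma)


rng = random.Random(0)
mu = [0.0, 0.0]                      # hypothetical: K = 3
chol = [[1.0, 0.0], [0.9, 0.43589]]  # Sigma ~ [[1, 0.9], [0.9, 1]]
draws = [sample_logistic_normal(mu, chol, rng) for _ in range(2000)]
```

Because $\gamma_1$ and $\gamma_2$ are strongly correlated here, $\theta_1$ and $\theta_2$ tend to rise and fall together. A Dirichlet cannot express this, since its components always have negative covariance $-\alpha_i\alpha_j / (A^2(A + 1))$.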
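Example: univariate Gaussian

Data: We observed $N$ iid real samples: $D = \{x_1, \ldots, x_N\}$, $x_n \in \mathbb{R}$

Model:

$$P(x) = (2\pi\sigma^2)^{-1/2} \exp\{-(x - \mu)^2 / 2\sigma^2\}$$

Log likelihood:

$$\ell(\theta; D) = \log P(D \mid \theta) = -\frac{N}{2}\log(2\pi\sigma^2) - \frac{1}{2}\sum_{n=1}^{N}\frac{(x_n - \mu)^2}{\sigma^2}$$

MLE: take derivatives and set to zero:

$$\frac{\partial\ell}{\partial\mu} = \frac{1}{\sigma^2}\sum_n (x_n - \mu), \qquad \frac{\partial\ell}{\partial\sigma^2} = -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_n (x_n - \mu)^2$$

$$\Rightarrow\; \mu_{MLE} = \frac{1}{N}\sum_n x_n, \qquad \sigma^2_{MLE} = \frac{1}{N}\sum_n (x_n - \mu_{ML})^2$$

MLE for a multivariate Gaussian

It can be shown that the MLE for $\mu$ and $\Sigma$ is

$$\mu_{MLE} = \frac{1}{N}\sum_n x_n, \qquad \Sigma_{MLE} = \frac{1}{N}\sum_n (x_n - \mu_{ML})(x_n - \mu_{ML})^T = \frac{1}{N} S$$

where the scatter matrix is

$$S = \sum_n (x_n - \mu_{ML})(x_n - \mu_{ML})^T = \Big(\sum_n x_n x_n^T\Big) - N\mu_{ML}\mu_{ML}^T$$

The sufficient statistics are $\sum_n x_n$ and $\sum_n x_n x_n^T$.

Note that $X^T X = \sum_n x_n x_n^T$, where $X$ is the $N \times D$ matrix whose rows are $x_n^T$, may not be full rank (e.g., if $N < D$, the dimension of $x$). In that case $\Sigma_{ML}$ is not invertible.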
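Both estimators can be computed from the sufficient statistics. The univariate variance uses the $1/N$ factor, which is what the standard library's population variance `statistics.pvariance` computes. The multivariate version uses the identity $S = \sum_n x_n x_n^T - N\mu\mu^T$ with plain lists. The data are hypothetical.

```python
import statistics


def gaussian_mle(xs):
    """Univariate MLE: the sample mean and the 1/N (not 1/(N-1)) variance."""
    N = len(xs)
    mu = sum(xs) / N
    var = sum((x - mu) ** 2 for x in xs) / N
    return mu, var


def mvn_mle(rows):
    """Multivariate MLE from the sufficient statistics sum_n x_n and sum_n x_n x_n^T,
    using S = sum_n x_n x_n^T - N mu mu^T and Sigma = S / N."""
    N, D = len(rows), len(rows[0])
    s1 = [sum(r[i] for r in rows) for i in range(D)]
    s2 = [[sum(r[i] * r[j] for r in rows) for j in range(D)] for i in range(D)]
    mu = [v / N for v in s1]
    sigma = [[(s2[i][j] - N * mu[i] * mu[j]) / N for j in range(D)] for i in range(D)]
    return mu, sigma


xs = [0.3, 1.5, -2.0, 3.2, 0.9]  # hypothetical samples
mu, var = gaussian_mle(xs)
assert abs(var - statistics.pvariance(xs)) < 1e-12  # population variance is the MLE
rows = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [0.5, -1.0]]  # hypothetical 2-D data
print(mvn_mle(rows))
```

With a single 2-D sample ($N < D$), `mvn_mle` returns the all-zero matrix, a concrete case of a non-invertible $\Sigma_{ML}$.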
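Bayesian parameter estimation for a Gaussian

There are various reasons to pursue a Bayesian approach:
- We would like to update our estimates sequentially over time.
- We may have prior knowledge about the expected magnitude of the parameters.
- The MLE for $\Sigma$ may not be full rank if we don't have enough data.

We will restrict our attention to conjugate priors.

We will consider various cases, in order of increasing complexity:
- Known $\sigma$, unknown $\mu$
- Known $\mu$, unknown $\sigma$
- Unknown $\mu$ and $\sigma$

Bayesian estimation: unknown $\mu$, known $\sigma$

Normal prior:

$$P(\mu) = (2\pi\tau^2)^{-1/2} \exp\{-(\mu - \mu_0)^2 / 2\tau^2\}$$

Joint probability:

$$P(x, \mu) = (2\pi\sigma^2)^{-N/2} \exp\Big\{-\frac{1}{2\sigma^2}\sum_{n=1}^{N}(x_n - \mu)^2\Big\} \times (2\pi\tau^2)^{-1/2} \exp\{-(\mu - \mu_0)^2 / 2\tau^2\}$$

Posterior:

$$P(\mu \mid x) = (2\pi\tilde\sigma^2)^{-1/2} \exp\{-(\mu - \tilde\mu)^2 / 2\tilde\sigma^2\}$$

where

$$\tilde\mu = \frac{N/\sigma^2}{N/\sigma^2 + 1/\tau^2}\,\bar x + \frac{1/\tau^2}{N/\sigma^2 + 1/\tau^2}\,\mu_0, \qquad \tilde\sigma^2 = \Big(\frac{N}{\sigma^2} + \frac{1}{\tau^2}\Big)^{-1}$$

and $\bar x$ is the sample mean.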
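The posterior for $\mu$ with known $\sigma^2$ is a precision-weighted average, and conjugacy means it can also be built one point at a time. This sketch computes $(\tilde\mu, \tilde\sigma^2)$ in batch and then sequentially, feeding each posterior back in as the next prior. The observations, $\sigma^2 = 0.1$, and the N(0, 1) prior are hypothetical values.

```python
def gaussian_mean_posterior(xs, sigma2, mu0, tau2):
    """Posterior N(mu_tilde, sigma2_tilde) over mu, given known noise variance sigma2,
    a N(mu0, tau2) prior and at least one observation. Precisions add; the
    posterior mean is the precision-weighted average of x_bar and mu0."""
    N = len(xs)
    xbar = sum(xs) / N
    prec = N / sigma2 + 1 / tau2
    mu_tilde = (N / sigma2 * xbar + mu0 / tau2) / prec
    return mu_tilde, 1 / prec


xs = [0.9, 0.7, 1.1, 0.6]  # hypothetical observations
batch = gaussian_mean_posterior(xs, sigma2=0.1, mu0=0.0, tau2=1.0)

# Sequential version: yesterday's posterior is today's prior, one point at a time.
mu, v = 0.0, 1.0
for x in xs:
    mu, v = gaussian_mean_posterior([x], sigma2=0.1, mu0=mu, tau2=v)
print(batch, (mu, v))
```

The two results agree up to rounding. Letting $\tau^2$ grow very large makes $\tilde\mu$ approach the sample mean.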
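Bayesian estimation: unknown $\mu$, known $\sigma$ (cont'd)

$$\tilde\mu_N = \frac{N/\sigma^2}{N/\sigma^2 + 1/\tau^2}\,\bar x + \frac{1/\tau^2}{N/\sigma^2 + 1/\tau^2}\,\mu_0, \qquad \frac{1}{\tilde\sigma_N^2} = \frac{N}{\sigma^2} + \frac{1}{\tau^2}$$

The posterior mean is a convex combination of the prior mean and the MLE. The weights are proportional to the precisions $N/\sigma^2$ (data) and $1/\tau^2$ (prior), i.e., inversely proportional to the respective noise levels.

The precision of the posterior, $1/\tilde\sigma_N^2$, is the precision of the prior, $1/\tau^2$, plus one contribution of data precision, $1/\sigma^2$, for each observed data point.

[Figure: sequentially updating the mean; $\mu^* = 0.8$ (unknown), $(\sigma^2)^* = 0.1$ (known)]

Effect of a single data point:

$$\tilde\mu_1 = \mu_0 + (x - \mu_0)\frac{\tau^2}{\sigma^2 + \tau^2} = x - (x - \mu_0)\frac{\sigma^2}{\sigma^2 + \tau^2}$$

Uninformative (vague/flat) prior: as $\tau^2 \to \infty$, $\tilde\mu_N \to \mu_{ML}$.

Other scenarios

Known $\mu$, unknown $\lambda = 1/\sigma^2$:
- The conjugate prior for $\lambda$ is a Gamma with shape $a$ and rate (inverse scale) $b$.
- The conjugate prior for $\sigma^2$ is Inverse-Gamma.

Unknown $\mu$ and unknown $\sigma$:
- The conjugate prior is Normal-Inverse-Gamma.
- Semi-conjugate prior.

Multivariate case:
- The conjugate prior is Normal-Inverse-Wishart.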
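The known-$\mu$ case pairs a Gamma(shape $a$, rate $b$) prior with the precision $\lambda$. Multiplying it by the Gaussian likelihood gives another Gamma, with shape $a + N/2$ and rate $b + \tfrac{1}{2}\sum_n (x_n - \mu)^2$. This is a sketch of that standard update on simulated data. The prior values $a = b = 1$ and the simulation settings (borrowed from the figure's $\mu^* = 0.8$, $(\sigma^2)^* = 0.1$) are illustrative assumptions.

```python
import random


def gamma_precision_posterior(xs, mu, a, b):
    """Known mean mu, prior lambda ~ Gamma(shape=a, rate=b) on the precision
    lambda = 1/sigma^2. Posterior: Gamma(a + N/2, b + sum_n (x_n - mu)^2 / 2)."""
    ss = sum((x - mu) ** 2 for x in xs)
    return a + len(xs) / 2, b + ss / 2


rng = random.Random(1)
true_sigma2 = 0.1  # simulation setting, borrowed from the figure
xs = [rng.gauss(0.8, true_sigma2 ** 0.5) for _ in range(5000)]
a_N, b_N = gamma_precision_posterior(xs, mu=0.8, a=1.0, b=1.0)
# The mean of Gamma(shape, rate) is shape / rate.
print("posterior mean of lambda:", a_N / b_N, " true lambda:", 1 / true_sigma2)
```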
Summary

- Learning scenarios:
  - Data
  - Objective function
  - Frequentist and Bayesian
- Learning single-node GM (density estimation):
  - Typical discrete distribution
  - Typical continuous distribution
  - Conjugate priors