PGM Learning Tasks and Metrics
- Brett Wood
- 5 years ago
1 Probabilistic Graphical Models: Learning Overview. PGM Learning Tasks and Metrics
2 Learning. A domain expert provides elicitation of a network; the true distribution P* (maybe corresponding to a PGM M*) generates a dataset of instances D = {d[1], ..., d[M]} sampled from P*; learning produces a network from the data.
3 Known Structure, Complete Data. [diagram: an initial network over X1, X2, Y and fully observed input data (x1, x2, y per instance) are fed to an inducer, which outputs the CPD P(Y | X1, X2)]
4 Unknown Structure, Complete Data. [diagram: the same pipeline, but the inducer must also recover the graph structure from fully observed data]
5 Known Structure, Incomplete Data. [diagram: the same pipeline with some entries of the input data missing, shown as '?']
6 Unknown Structure, Incomplete Data. [diagram: both the structure and some data entries are unknown]
7 Latent Variables, Incomplete Data. [diagram: the network additionally contains a hidden variable H that is never observed in the data]
8 PGM Learning Tasks I. Goal: answer general probabilistic queries about new instances. Simple metric: training set likelihood P(D : M) = Π_m P(d[m] : M). But we really care about new data, so evaluate on test set likelihood P(D' : M).
9 PGM Learning Tasks II. Goal: a specific prediction task on new instances: predict target variables y from observed variables x. E.g., image segmentation, speech recognition. We often care about a specialized objective, e.g., pixel-level segmentation accuracy. It is often convenient to select the model to optimize likelihood Π_m P(d[m] : M) or conditional likelihood Π_m P(y[m] | x[m] : M); the model is then evaluated on the true objective over test data.
10 PGM Learning Tasks III. Goal: knowledge discovery of M*: distinguish direct vs. indirect dependencies; possibly the directionality of edges; the presence and location of hidden variables. Often trained using likelihood, which is a poor surrogate for structural accuracy; evaluate by comparing to prior knowledge.
11 Avoiding Overfitting. Selecting M to optimize training set likelihood overfits to statistical noise. Parameter overfitting: parameters fit random noise in the training data; use regularization / parameter priors. Structure overfitting: training likelihood always increases for more complex structures; bound or penalize model complexity.
12 Selecting Hyperparameters. Regularization against overfitting involves hyperparameters: parameter priors and the complexity penalty. The choice of hyperparameters makes a big difference to performance, so they must be selected on a validation set.
13 Why PGM Learning? Predictions of structured objects (sequences, graphs, trees), exploiting correlations between several predicted variables. Can incorporate prior knowledge into the model. Learning a single model for multiple tasks. A framework for knowledge discovery.
14 Probabilistic Graphical Models: Learning, Parameter Estimation. Maximum Likelihood Estimation
15 Biased Coin Example. P is a Bernoulli distribution: P(X = 1) = θ, P(X = 0) = 1 - θ. x[1], ..., x[M] are sampled IID from P: the tosses are independent of each other, and the tosses are sampled from the same distribution (identically distributed).
16 IID as a PGM. [plate diagram: the parameter θ is a parent of each data variable X[1], ..., X[M], with each P(x[m] | θ) given by the Bernoulli parameter]
17 Maximum Likelihood Estimation. Goal: find θ in [0,1] that predicts D well. Prediction quality is the likelihood of D given θ: L(θ : D) = P(D : θ) = Π_m P(x[m] : θ). For example, for D = <H, T, T, H, H>: L(θ : D) = θ (1-θ) (1-θ) θ θ.
18 Maximum Likelihood Estimator. Observations: M_H heads and M_T tails. Find θ maximizing the likelihood L(θ : D) = θ^M_H (1-θ)^M_T. Equivalent to maximizing the log-likelihood l(θ : D) = M_H log θ + M_T log(1-θ). Differentiating the log-likelihood and solving for θ: θ̂ = M_H / (M_H + M_T).
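The closed-form estimator above is easy to check numerically. A minimal Python sketch (the helper names and the use of the slide's five-toss sequence are illustrative):

```python
import math

def coin_mle(tosses):
    """MLE for a Bernoulli parameter: theta_hat = M_H / (M_H + M_T)."""
    m_h = sum(1 for t in tosses if t == "H")
    m_t = len(tosses) - m_h
    return m_h / (m_h + m_t)

def log_likelihood(theta, m_h, m_t):
    """l(theta : D) = M_H log(theta) + M_T log(1 - theta)."""
    return m_h * math.log(theta) + m_t * math.log(1 - theta)

# The example sequence D = <H, T, T, H, H>
data = ["H", "T", "T", "H", "H"]
theta_hat = coin_mle(data)  # 3/5 = 0.6

# The MLE attains a higher log-likelihood than nearby values of theta
assert log_likelihood(theta_hat, 3, 2) > log_likelihood(0.5, 3, 2)
assert log_likelihood(theta_hat, 3, 2) > log_likelihood(0.7, 3, 2)
```

The asserts mirror the derivation: the stationary point of l(θ : D) is the unique maximizer on (0, 1).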
19 Sufficient Statistics. For computing θ in the coin toss example, we only needed M_H and M_T, since L(θ : D) = θ^M_H (1-θ)^M_T. M_H and M_T are sufficient statistics.
20 Sufficient Statistics. A function s(d) is a sufficient statistic (a function from instances to a vector in R^k) if, for any two datasets D and D' and any θ: Σ_{m in D} s(d[m]) = Σ_{m in D'} s(d[m]) implies L(θ : D) = L(θ : D'). [diagram: many different datasets mapping to the same statistics]
21 Sufficient Statistic for Multinomial. For a dataset D over a variable X with k values, the sufficient statistics are the counts <M_1, ..., M_k>, where M_i is the number of times that x[m] = x_i in D. The sufficient statistic s(x_i) is a tuple of dimension k: s(x_i) = (0, ..., 0, 1, 0, ..., 0), with the 1 in position i. L(θ : D) = Π_i θ_i^M_i.
22 Sufficient Statistic for Gaussian. Gaussian distribution: X ~ N(μ, σ²) if p(x) = (1 / (√(2π) σ)) exp(-(x - μ)² / (2σ²)). Rewrite the exponent as -x²/(2σ²) + xμ/σ² - μ²/(2σ²). Sufficient statistics for the Gaussian: s(x) = <1, x, x²>.
23 Maximum Likelihood Estimation (MLE). Principle: choose θ to maximize L(D : θ). Multinomial MLE: θ̂_i = M_i / Σ_j M_j. Gaussian MLE: μ̂ = (1/M) Σ_m x[m], σ̂ = √((1/M) Σ_m (x[m] - μ̂)²).
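Both closed forms take only a few lines; a sketch (the function names and toy inputs are mine, not from the slides):

```python
import math

def multinomial_mle(counts):
    """theta_hat_i = M_i / sum_j M_j, from the count vector <M_1,...,M_k>."""
    total = sum(counts)
    return [m / total for m in counts]

def gaussian_mle(xs):
    """mu_hat = sample mean; sigma_hat^2 = (1/M) sum (x - mu_hat)^2.
    Note this is the maximum-likelihood (not unbiased) variance."""
    m = len(xs)
    mu = sum(xs) / m
    var = sum((x - mu) ** 2 for x in xs) / m
    return mu, math.sqrt(var)

print(multinomial_mle([3, 1, 1]))     # [0.6, 0.2, 0.2]
print(gaussian_mle([1.0, 2.0, 3.0]))  # (2.0, 0.816...)
```

The Gaussian estimate divides by M rather than M - 1, matching the MLE derivation rather than the unbiased sample variance.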
24 Summary. Maximum likelihood estimation is a simple principle for parameter selection given D. The likelihood function is uniquely determined by sufficient statistics that summarize D. The MLE has a closed-form solution for many parametric distributions.
25 Probabilistic Graphical Models: Learning, Parameter Estimation. Maximum Likelihood for BNs
26 MLE for Bayesian Networks. [diagram: network X → Y with table CPDs θ_X = (0.7, 0.3) and θ_{Y|X}; data instances <x[m], y[m]>]
27 MLE for Bayesian Networks. Parameters: θ_X, θ_{Y|x0}, θ_{Y|x1}. L(Θ : D) = Π_m P(x[m], y[m] : Θ) = Π_m P(x[m] : θ_X) P(y[m] | x[m] : θ_{Y|X}) = [Π_m P(x[m] : θ_X)] [Π_m P(y[m] | x[m] : θ_{Y|X})].
28 MLE for Bayesian Networks. Likelihood for a Bayesian network: L(Θ : D) = Π_m P(d[m] : Θ) = Π_m Π_i P(x_i[m] | u_i[m] : Θ) = Π_i [Π_m P(x_i[m] | u_i[m] : θ_{X_i|U_i})]. If the θ_{X_i|U_i} are disjoint, then the MLE can be computed by maximizing each local likelihood separately.
29 MLE for Table CPDs. For a table CPD of X given parents U, the local likelihood is Π_m θ_{x[m]|u[m]} = Π_u Π_x θ_{x|u}^{M[x,u]}, where M[x,u] is the number of instances with X = x and U = u. The MLE is θ̂_{x|u} = M[x,u] / M[u], with M[u] = Σ_{x'} M[x', u].
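The count-and-normalize rule for table CPDs can be sketched directly from the sufficient statistics M[x,u] and M[u] (the data layout, a list of (parent-assignment, value) pairs, is my choice for illustration):

```python
from collections import Counter

def table_cpd_mle(instances):
    """theta_hat[x|u] = M[x,u] / M[u], from complete data.
    instances: list of (u, x) pairs, where u is a parent assignment
    (a tuple) and x is the child's value."""
    m_xu = Counter(instances)              # M[x,u]
    m_u = Counter(u for u, _ in instances)  # M[u]
    return {(u, x): m / m_u[u] for (u, x), m in m_xu.items()}

# Hypothetical complete data for P(Y | X) with binary X and Y
data = [(("x0",), "y0"), (("x0",), "y0"), (("x0",), "y1"),
        (("x1",), "y1")]
cpd = table_cpd_mle(data)
# theta_hat[y0|x0] = 2/3, theta_hat[y1|x0] = 1/3, theta_hat[y1|x1] = 1
```

Each parent assignment u is estimated independently, which is exactly the per-bucket decomposition the slide describes.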
30 Shared Parameters. [diagram: a Markov chain S1 → S2 → S3 in which all transition CPDs share the same parameters θ_{S'|S}]
31 Shared Parameters. [diagram: an HMM with states S1, S2, S3 and observations O1, O2, O3; all transitions share θ_{S'|S} and all emissions share θ_{O|S}]
32 Summary. For a BN with disjoint sets of parameters in the CPDs, the likelihood decomposes as a product of local likelihood functions, one per variable. For table CPDs, the local likelihood further decomposes as a product of likelihoods for multinomials, one for each parent combination. For networks with shared CPDs, sufficient statistics accumulate over all uses of the CPD.
33 Fragmentation & Overfitting. θ̂_{x|u} = M[x,u] / M[u]. The number of buckets (parent configurations u) increases exponentially with |U|. For large |U|, most buckets will have very few instances, giving very poor parameter estimates. With limited data, we often get better generalization with simpler structures.
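The blow-up is easy to make concrete; a small sketch (binary parents and M = 1000 instances are illustrative numbers, not from the slides):

```python
def buckets(num_parents, cardinality=2):
    """Number of parent configurations ('buckets') in a table CPD."""
    return cardinality ** num_parents

# With M = 1000 instances, the average number of instances per bucket
# shrinks exponentially in the number of parents:
M = 1000
for k in (2, 5, 10, 20):
    print(k, buckets(k), M / buckets(k))
# At 10 binary parents there are already 1024 buckets, i.e. fewer than
# one instance per bucket on average, hence very poor estimates of
# theta_hat[x|u] even though the total dataset is sizable.
```

This is the quantitative reason the slide recommends simpler structures under limited data.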
34 Probabilistic Graphical Models: Learning, Parameter Estimation. Bayesian Estimation
35 Limitations of MLE. Two teams play 10 times, and the first wins 7 of the matches: probability of the first team winning = 0.7? A coin is tossed 10 times, and comes out heads on 7 of the tosses: probability of heads = 0.7? A coin is tossed 10000 times, and comes out heads on 7000 of the tosses: probability of heads = 0.7? The MLE is the same in all three cases, yet our confidence in the estimate should be very different.
36 Parameter Estimation as a PGM. [diagram: θ as a parent of X[1], ..., X[M]]. Given a fixed θ, the tosses are independent. If θ is unknown, the tosses are not marginally independent: each toss tells us something about θ.
37 Bayesian Inference. Joint probabilistic model over X[1], ..., X[M] and θ: P(x[1], ..., x[M], θ) = P(x[1], ..., x[M] | θ) P(θ) = P(θ) Π_m P(x[m] | θ) = P(θ) θ^M_H (1-θ)^M_T. The posterior is P(θ | x[1], ..., x[M]) = P(x[1], ..., x[M] | θ) P(θ) / P(x[1], ..., x[M]).
38 Dirichlet Distribution. θ is a multinomial distribution over k values. Dirichlet distribution: θ ~ Dirichlet(α_1, ..., α_k) where P(θ) = (1/Z) Π_i θ_i^(α_i - 1), with normalizer Z = Π_i Γ(α_i) / Γ(Σ_i α_i) and Γ(x) = ∫_0^∞ t^(x-1) e^(-t) dt. Intuitively, the hyperparameters correspond to the number of samples we have seen.
39 Dirichlet Distributions. [plots of several Dirichlet densities over two values: Dirichlet(1,1), Dirichlet(2,2), Dirichlet(0.5,0.5), Dirichlet(5,...)]
40 Dirichlet Priors & Posteriors. P(θ | D) = P(D | θ) P(θ) / P(D). If P(θ) is Dirichlet and the likelihood is multinomial, P(D | θ) = Π_i θ_i^M_i, then the posterior is also Dirichlet: the prior is Dir(α_1, ..., α_k), the data counts are M_1, ..., M_k, and the posterior is Dir(α_1 + M_1, ..., α_k + M_k). The Dirichlet is a conjugate prior for the multinomial.
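Because of conjugacy, the posterior update is just elementwise addition of hyperparameters and counts; a minimal sketch (the function name is mine):

```python
def dirichlet_posterior(alpha, counts):
    """Conjugate update: a Dir(alpha_1,...,alpha_k) prior combined with
    multinomial counts M_1,...,M_k gives a
    Dir(alpha_1 + M_1, ..., alpha_k + M_k) posterior."""
    assert len(alpha) == len(counts)
    return [a + m for a, m in zip(alpha, counts)]

# Uniform Dirichlet(1,1) prior on a coin, then 4 heads and 1 tail:
print(dirichlet_posterior([1.0, 1.0], [4, 1]))  # [5.0, 2.0]
```

No integration is needed: the sufficient statistics of the data are all that enters the update.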
41 Summary. Bayesian learning treats parameters as random variables; learning is then a special case of inference. The Dirichlet distribution is conjugate to the multinomial: the posterior has the same form as the prior, and can be updated in closed form using sufficient statistics from the data.
42 Probabilistic Graphical Models: Learning, Parameter Estimation. Bayesian Prediction
43 Bayesian Prediction. P(x[M+1]) = ∫ P(x[M+1] | θ) P(θ) dθ. For θ ~ Dirichlet(α_1, ..., α_k): P(X = x_j) = ∫ θ_j P(θ) dθ = α_j / α, where α = Σ_i α_i. The Dirichlet hyperparameters correspond to the number of samples we have seen.
44 Bayesian Prediction. θ ~ Dirichlet(α_1, ..., α_k). P(x[M+1] | x[1], ..., x[M]) = ∫ P(x[M+1] | θ) P(θ | x[1], ..., x[M]) dθ. Since θ | x[1], ..., x[M] ~ Dirichlet(α_1 + M_1, ..., α_k + M_k), we get P(X[M+1] = x_j | x[1], ..., x[M]) = (α_j + M_j) / (α + M). The equivalent sample size is α = α_1 + ... + α_k; larger α means more confidence in our prior.
45 Example: Binomial Data. Prior: uniform for θ in [0,1], i.e., P(θ) = 1 (a Dirichlet(1,1)). Data: (M_H, M_T) = (4, 1). The MLE for P(X[6] = H) is 4/5 = 0.8; the Bayesian prediction is (1 + 4) / (2 + 5) = 5/7.
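The slide's two numbers follow directly from the predictive rule (α_j + M_j) / (α + M); a quick check (function names are mine):

```python
def mle_predict(m_h, m_t):
    """MLE prediction: P(X = H) = M_H / (M_H + M_T)."""
    return m_h / (m_h + m_t)

def bayes_predict(m_h, m_t, alpha_h=1.0, alpha_t=1.0):
    """Bayesian predictive P(X[M+1] = H) under a Dirichlet(alpha_h, alpha_t)
    prior: (alpha_h + M_H) / (alpha_h + alpha_t + M_H + M_T)."""
    return (alpha_h + m_h) / (alpha_h + alpha_t + m_h + m_t)

# The slide's numbers: uniform Dirichlet(1,1) prior, data (M_H, M_T) = (4, 1)
print(mle_predict(4, 1))    # 0.8
print(bayes_predict(4, 1))  # 5/7 = 0.714...
```

With more data the two predictions converge, which is the "asymptotically the same as MLE" point made in the following summary slide.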
46 Effect of Priors. [plots: prediction of P(X = H) after seeing data with 1/4 heads, as a function of sample size; one panel varies the prior strength α = α_H + α_T at a fixed ratio α_H / α_T, the other varies the ratio α_H / α_T at a fixed strength]
47 Effect of Priors (cont.). In real data, Bayesian estimates are less sensitive to noise in the data. [plot: estimates of P(X = 1) vs. the number of tosses N for the MLE and for several Dirichlet priors, e.g. Dirichlet(0.5,0.5), Dirichlet(1,1), Dirichlet(5,5), with the individual toss results shown below the axis]
48 Summary. Bayesian prediction combines sufficient statistics from imaginary Dirichlet samples and real data samples. It is asymptotically the same as MLE, but the Dirichlet hyperparameters determine both the prior beliefs and their strength.
49 Probabilistic Graphical Models: Learning, Parameter Estimation. Bayesian Estimation for BNs
50 Bayesian Estimation in BNs. [plate diagram: θ_X is a parent of each X[m], θ_{Y|X} is a parent of each Y[m], and X[m] is a parent of Y[m]]. Instances are independent given the parameters: (X[m'], Y[m']) are d-separated from (X[m], Y[m]) given θ. Parameters for individual variables are independent a priori: P(θ) = Π_i P(θ_{X_i | Pa(X_i)}).
51 Bayesian Estimation in BNs. Posteriors of θ are independent given complete data: complete data d-separates the parameters for different CPDs, so P(θ_X, θ_{Y|X} | D) = P(θ_X | D) P(θ_{Y|X} | D). As in MLE, we can solve each estimation problem separately.
52 Bayesian Estimation in BNs. Posteriors of θ are independent given complete data; this also holds for parameters within families. Note the context-specific independence between θ_{Y|x0} and θ_{Y|x1} when given both the X's and the Y's.
53 Bayesian Estimation in BNs. Posteriors of θ can be computed independently. For a multinomial θ_{X|u}: if the prior is Dirichlet(α_{x1|u}, ..., α_{xk|u}), the posterior is Dirichlet(α_{x1|u} + M[x1,u], ..., α_{xk|u} + M[xk,u]).
54 Assessing Priors for BNs. We need a hyperparameter α_{x|u} for each node X, value x, and parent assignment u. Use a prior network with parameters Θ0 and an equivalent sample size parameter α: α_{x|u} = α · P(x, u | Θ0).
55 Case Study: ICU-Alarm Network. [diagram of the ALARM monitoring network: 37 variables, 504 parameters; nodes include PULMEMBOLUS, INTUBATION, KINKEDTUBE, VENTMACH, DISCONNECT, FIO2, ANAPHYLAXIS, HYPOVOLEMIA, LVFAILURE, CATECHOL, HR, BP, CO, and others]. Experiment: sample instances from the network, then relearn the parameters.
56 Case Study: ICU-Alarm Network. [plot: relative entropy to the true distribution vs. the number of training samples, comparing the MLE with Bayesian estimates for several equivalent sample sizes, e.g. α = 2 and α = 5]
57 Summary. In Bayesian networks, if the parameters are independent a priori, they are also independent in the posterior. For multinomial BNs, estimation uses the sufficient statistics M[x,u] and M[u]: MLE: θ̂_{x|u} = M[x,u] / M[u]; Bayesian (Dirichlet): P(x | u, D) = (α_{x|u} + M[x,u]) / (α_u + M[u]). Bayesian methods require a choice of prior, which can be elicited as a prior network plus an equivalent sample size.
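The Bayesian CPD estimate is the MLE count rule with the Dirichlet hyperparameters folded in. A sketch that, for simplicity, splits the equivalent sample size α uniformly over the child's values in each parent bucket (an assumption of this example; the slides instead derive α_{x|u} from a prior network):

```python
from collections import Counter

def bayes_table_cpd(instances, x_values, alpha):
    """Posterior-mean CPD estimate with a uniform Dirichlet prior:
    P(x | u, D) = (alpha/k + M[x,u]) / (alpha + M[u]),
    where k = |x_values|. instances: list of (u, x) pairs."""
    k = len(x_values)
    m_xu = Counter(instances)               # M[x,u]
    m_u = Counter(u for u, _ in instances)  # M[u]
    return {(u, x): (alpha / k + m_xu[(u, x)]) / (alpha + m_u[u])
            for u in m_u for x in x_values}

# Same hypothetical data as the MLE example for P(Y | X)
data = [(("x0",), "y0"), (("x0",), "y0"), (("x0",), "y1"), (("x1",), "y1")]
cpd_bayes = bayes_table_cpd(data, ["y0", "y1"], alpha=2.0)
# P(y0 | x0, D) = (1 + 2) / (2 + 3) = 0.6, whereas the MLE gives 2/3;
# the sparse x1 bucket is smoothed: P(y0 | x1, D) = 1/3 instead of 0.
```

Unlike the MLE, every entry is strictly positive, which is why the Bayesian estimates in the ICU-Alarm experiment degrade more gracefully at small sample sizes.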
More informationSEMI-EMPIRICAL LIKELIHOOD RATIO CONFIDENCE INTERVALS FOR THE DIFFERENCE OF TWO SAMPLE MEANS
Ann. Inst. Statst. Math. Vol. 46, No. 1, 117 126 (1994) SEMI-EMPIRICAL LIKELIHOOD RATIO CONFIDENCE INTERVALS FOR THE DIFFERENCE OF TWO SAMPLE MEANS JING QIN Departent of Statstcs and Actuaral Scence, Unversty
More informationDesigning Fuzzy Time Series Model Using Generalized Wang s Method and Its application to Forecasting Interest Rate of Bank Indonesia Certificate
The Frst Internatonal Senar on Scence and Technology, Islac Unversty of Indonesa, 4-5 January 009. Desgnng Fuzzy Te Seres odel Usng Generalzed Wang s ethod and Its applcaton to Forecastng Interest Rate
More informationRelevance Vector Machines Explained
October 19, 2010 Relevance Vector Machnes Explaned Trstan Fletcher www.cs.ucl.ac.uk/staff/t.fletcher/ Introducton Ths document has been wrtten n an attempt to make Tppng s [1] Relevance Vector Machnes
More informationWhich Separator? Spring 1
Whch Separator? 6.034 - Sprng 1 Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng 3 Margn of a pont " # y (w $ + b) proportonal
More informationPredictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore
Sesson Outlne Introducton to classfcaton problems and dscrete choce models. Introducton to Logstcs Regresson. Logstc functon and Logt functon. Maxmum Lkelhood Estmator (MLE) for estmaton of LR parameters.
More informationClassification learning II
Lecture 8 Classfcaton learnng II Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square Logstc regresson model Defnes a lnear decson boundar Dscrmnant functons: g g g g here g z / e z f, g g - s a logstc functon
More informationChapter 8 Indicator Variables
Chapter 8 Indcator Varables In general, e explanatory varables n any regresson analyss are assumed to be quanttatve n nature. For example, e varables lke temperature, dstance, age etc. are quanttatve n
More informationMATH 829: Introduction to Data Mining and Analysis The EM algorithm (part 2)
1/16 MATH 829: Introducton to Data Mnng and Analyss The EM algorthm (part 2) Domnque Gullot Departments of Mathematcal Scences Unversty of Delaware Aprl 20, 2016 Recall 2/16 We are gven ndependent observatons
More informationPHYS 342L NOTES ON ANALYZING DATA. Spring Semester 2002
PHYS 34L OTES O AALYZIG DATA Sprng Seester 00 Departent of Phscs Purdue Unverst A ajor aspect of eperental phscs (and scence n general) s easureent of soe quanttes and analss of eperentall obtaned data.
More informationAN ANALYSIS OF A FRACTAL KINETICS CURVE OF SAVAGEAU
AN ANALYI OF A FRACTAL KINETIC CURE OF AAGEAU by John Maloney and Jack Hedel Departent of Matheatcs Unversty of Nebraska at Oaha Oaha, Nebraska 688 Eal addresses: aloney@unoaha.edu, jhedel@unoaha.edu Runnng
More informationTHE ROYAL STATISTICAL SOCIETY 2006 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE
THE ROYAL STATISTICAL SOCIETY 6 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE PAPER I STATISTICAL THEORY The Socety provdes these solutons to assst canddates preparng for the eamnatons n future years and for
More informationBayesian Learning: An Introduction
Bayesian Learning: An Introduction João Gama LIAAD-INESC Porto, University of Porto, Portugal September 2008 1 Motivation: Information Processing 2 Introduction 3 Bayesian Network Classifiers 4 k-dependence
More informationMarkov Chain Monte-Carlo (MCMC)
Markov Chan Monte-Carlo (MCMC) What for s t and what does t look lke? A. Favorov, 2003-2017 favorov@sens.org favorov@gal.co Monte Carlo ethod: a fgure square The value s unknown. Let s saple a rando value
More informationANSWERS CHAPTER 9. TIO 9.2: If the values are the same, the difference is 0, therefore the null hypothesis cannot be rejected.
ANSWERS CHAPTER 9 THINK IT OVER thnk t over TIO 9.: χ 2 k = ( f e ) = 0 e Breakng the equaton down: the test statstc for the ch-squared dstrbuton s equal to the sum over all categores of the expected frequency
More informationLearning with Partially Observed Data
Readngs K&F 8.6 9. 9.2 Learnng wth artall Observed ata Lecture 2 Ma 4 2 CSE 55 Statstcal Methods Sprng 2 Instructor Su-In Lee nverst of Washngton Seattle Model Selecton So far we focused on sngle model
More informationStatistical analysis using matlab. HY 439 Presented by: George Fortetsanakis
Statstcal analyss usng matlab HY 439 Presented by: George Fortetsanaks Roadmap Probablty dstrbutons Statstcal estmaton Fttng data to probablty dstrbutons Contnuous dstrbutons Contnuous random varable X
More information