Learning with Partially Observed Data


Derrick Underwood
5 years ago

Readings: K&F
Learning with Partially Observed Data
Lecture 2, May 4, 2
CSE 55 Statistical Methods, Spring 2
Instructor: Su-In Lee, University of Washington, Seattle

Model Selection
So far we focused on a single model:
- Given $D = \{x[1], \ldots, x[M]\}$, find the best-scoring model $\tilde{G} = \arg\max_G P(G \mid D)$.
- Use it to predict the next example $x[M+1]$.
Implicit assumption:
- Predictions should follow the Bayesian estimation rule: $P(x[M+1] \mid D) = \sum_G P(x[M+1] \mid G, D)\, P(G \mid D)$.
- The best-scoring model dominates the weighted sum: $P(x[M+1] \mid D) \approx P(x[M+1] \mid \tilde{G}, D)$.
- Valid with many data instances (very large $M$).
Pros:
- We get a single structure.
- It allows efficient use in our prediction tasks.
Cons:
- We commit to the independencies of a particular structure.
- Other structures with similar scores might also be probable given $D$.

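Model Selection
Density estimation:
- Picking one structure may suffice if the predictive distribution $P(x[M+1] \mid G, D)$ is similar for different high-scoring structures.
Structure discovery:
- Several networks may have similar scores. One or several of them might be close to the true structure, but we cannot distinguish between them given the data.
- Drawing a conclusion about the structure from one of the networks can be wrong.
- Thus, instead of picking one of the high-scoring structures, we should focus on estimating the confidence of the structural properties we are interested in.
- Define features $f(G)$ (e.g., an edge, a sub-structure, a d-separation property).
- Compute $P(f \mid D) = \sum_G f(G)\, P(G \mid D)$.
- This requires summing over exponentially many structures.
- We can reduce the computation by assuming a certain ordering.

Model Averaging Given an Order
Assumptions:
- A known total order $\prec$ of the variables.
- A maximum in-degree $d$ for the variables.
Marginal likelihood:
$$P(D \mid \prec) = \sum_{G \in \mathcal{G}_{d,\prec}} \prod_i \exp\{\mathrm{FamScore}_B(X_i \mid \mathrm{Pa}^G_{X_i} : D)\} = \prod_i \sum_{U \in \mathcal{U}_{i,\prec}} \exp\{\mathrm{FamScore}_B(X_i \mid U : D)\}$$
where $\mathcal{U}_{i,\prec} = \{U : U \prec X_i,\ |U| \le d\}$.
- The first equality uses the decomposability assumption on the prior $P(G \mid \prec)$.
- The second holds because, given the ordering, parent choices are independent; it is the same rearrangement as $f_1 g_1 + f_1 g_2 + \cdots + f_L g_m = (f_1 + \cdots + f_L)(g_1 + \cdots + g_m)$.
- Cost per family: $O(n^d)$. Total cost: $O(n^{d+1})$.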
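The product-of-sums form above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the lecture's implementation: `fam_score` is a hypothetical callable standing in for FamScore_B(X_i | U : D), the variable names and the uniform toy score are made up, and a practical version would work in log space (log-sum-exp) to avoid overflow.

```python
from itertools import combinations
from math import exp, log

def parent_sets(i, order, d):
    """Yield every candidate parent set for order[i]:
    subsets of its predecessors in the order, of size at most d."""
    preds = order[:i]
    for k in range(min(d, len(preds)) + 1):
        for subset in combinations(preds, k):
            yield frozenset(subset)

def log_marginal_given_order(order, d, fam_score):
    """log of prod_i sum_U exp(fam_score(X_i, U)), i.e. log P(D | order)
    under the decomposability assumptions stated above."""
    total = 0.0
    for i, x in enumerate(order):
        total += log(sum(exp(fam_score(x, u)) for u in parent_sets(i, order, d)))
    return total

def edge_posterior(src, dst, order, d, fam_score):
    """P(src is a parent of dst | order, D): a ratio of two sums over
    the candidate parent sets of dst only."""
    i = order.index(dst)
    num = den = 0.0
    for u in parent_sets(i, order, d):
        w = exp(fam_score(dst, u))
        den += w
        if src in u:
            num += w
    return num / den

# Hypothetical toy score: every family scores 0, so all parent sets are equally likely.
def uniform_score(x, parents):
    return 0.0

order = ["A", "B", "C"]
print(edge_posterior("A", "C", order, d=2, fam_score=uniform_score))
print(log_marginal_given_order(order, d=2, fam_score=uniform_score))
```

Note that `edge_posterior` never touches the families of other variables, which is the practical payoff of fixing the order.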

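Model Averaging Given an Order
Posterior probability of a general feature $f$:
$$P(f \mid \prec, D) = \frac{P(f, D \mid \prec)}{P(D \mid \prec)}$$
Posterior probability of a particular choice of parents $U$ for $X_i$:
$$P(\mathrm{Pa}^G_{X_i} = U \mid \prec, D) = \frac{\exp\{\mathrm{FamScore}_B(X_i \mid U : D)\}}{\sum_{U' \in \mathcal{U}_{i,\prec}} \exp\{\mathrm{FamScore}_B(X_i \mid U' : D)\}}$$
All terms belonging to the other families cancel out.
Posterior probability of the existence of a particular edge $X_j \to X_i$:
$$P(X_j \in \mathrm{Pa}^G_{X_i} \mid \prec, D) = \frac{\sum_{U \in \mathcal{U}_{i,\prec} :\, X_j \in U} \exp\{\mathrm{FamScore}_B(X_i \mid U : D)\}}{\sum_{U \in \mathcal{U}_{i,\prec}} \exp\{\mathrm{FamScore}_B(X_i \mid U : D)\}}$$

Model Averaging
- We cannot assume that the order is known.
- Solution: sample from the posterior distribution $P(G \mid D)$.
- If we manage to sample graphs $G_1, \ldots, G_K$ from $P(G \mid D)$, estimate the feature probability by
$$P(f \mid D) \approx \frac{1}{K} \sum_{k=1}^{K} f(G_k)$$
- Sampling can be done by MCMC (Markov chain Monte Carlo). Next week.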
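The sample-based estimate of a feature probability is just an average of indicator values. The sketch below assumes graphs are represented as sets of directed edges and uses four hand-written graphs as stand-ins for posterior samples; they are hypothetical and were not produced by an MCMC sampler.

```python
def feature_probability(graphs, feature):
    """Monte Carlo estimate of P(f | D): the fraction of sampled graphs
    in which the indicator feature holds."""
    return sum(1 for g in graphs if feature(g)) / len(graphs)

def has_edge(src, dst):
    """Build an indicator feature f(G) for the directed edge src -> dst."""
    return lambda g: (src, dst) in g

# Hypothetical "samples": each graph is a frozenset of directed edges.
samples = [
    frozenset({("A", "B"), ("B", "C")}),
    frozenset({("A", "B"), ("A", "C")}),
    frozenset({("B", "A"), ("B", "C")}),
    frozenset({("A", "B")}),
]

estimate = feature_probability(samples, has_edge("A", "B"))
print(estimate)
```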

Notes on Learning Local Structures
Beyond table CPDs:
- Define the score with local structures. Example: in tree CPDs, the score decomposes by leaves (not by $X_i$ and a particular value of $\mathrm{Pa}_i$).
- The prior may need to be extended. Example: in tree CPDs, a penalty for the tree structure per CPD (depth of the tree).
- Extend the search operators to local structure. Example: in tree CPDs, we need to search for the tree structure. This can be done by a local encapsulated search or by defining new global operations.

Structure Search: Summary
- Discrete optimization problem.
- In general, NP-hard.
  - Need to resort to heuristic search.
  - In practice, search is relatively fast, thanks to decomposability and sufficient statistics.
- In some cases, we can reduce the search problem to an easy optimization problem. Examples: learning trees, a fixed ordering.

Let's turn to the main topic for today:
LEARNING WITH PARTIALLY OBSERVED DATA

Training Data
[Figure: a Bayesian network $\langle G, \Theta \rangle$ (the ALARM patient-monitoring network, with nodes such as HYPOVOLEMIA, LVFAILURE and CATECHOL) defines $P(X \mid \Theta)$, from which a table of training instances is drawn.]
- Until now, we assumed that the training data is fully observed.
- Each instance assigns values to all the variables in our domain.

Incomplete Data
In reality, this assumption might not be true.
[Figure: the same network $\langle G, \Theta \rangle$ and training-instance table, now with "?" entries for missing values and an entirely unobserved column for a hidden variable (annotated "Lung cancer?").]
- Missing values
- Hidden variables
Challenges:
- Foundational: is the learning task well defined?
- Computational: how can we learn with missing data?

Treating Missing Data
How should we treat missing data? It depends on the data missing mechanism.
- Case I: A coin is tossed on a table; occasionally it drops and measurements are not taken (random missing values). The sample sequence mixes observed tosses with "?" entries. Treat the missing data by ignoring it.
- Case II: A coin is tossed, but only heads are reported (deliberate missing values). The sample sequence contains only heads and "?" entries. Treat the missing data by filling it in with Tails.
We need to consider the data missing mechanism.

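Modeling the Data Missing Mechanism
Let's try to model the data missing mechanism:
- $X = \{X_1, \ldots, X_n\}$ are random variables.
- $O_X = \{O_{X_1}, \ldots, O_{X_n}\}$ are observability variables; they are always observed.
- $Y = \{Y_1, \ldots, Y_n\}$ are new random variables with $\mathrm{Val}(Y_i) = \mathrm{Val}(X_i) \cup \{?\}$.
- $Y_i$ is a deterministic function of $X_i$ and $O_{X_i}$:
$$Y_i = \begin{cases} X_i & \text{if } O_{X_i} = o^1 \\ ? & \text{if } O_{X_i} = o^0 \end{cases}$$

Modeling the Missing Data Mechanism
[Figure: graphical models for Case I (random missing values) and Case II (deliberate missing values), over the parameters $\theta$, $\psi$ and the variables $X$, $O_X$, $Y$.]
Case I:
$$P(Y = H) = \theta\psi, \qquad P(Y = T) = (1-\theta)\psi, \qquad P(Y = ?) = 1 - \psi$$
$$L(\theta, \psi : D) = \theta^{M_H} (1-\theta)^{M_T}\, \psi^{M_H + M_T} (1-\psi)^{M_?}$$
MLE:
$$\hat{\theta} = \frac{M_H}{M_H + M_T}, \qquad \hat{\psi} = \frac{M_H + M_T}{M_H + M_T + M_?}$$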
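For Case I, the two estimates depend only on the three counts. The following sketch computes them from a toss sequence; the sequence itself is a made-up example, not data from the lecture.

```python
def mle_random_missing(seq):
    """Case I (values missing at random): return (theta_hat, psi_hat),
    where theta is P(heads) and psi is the probability a toss is observed."""
    m_h = seq.count("H")
    m_t = seq.count("T")
    m_q = seq.count("?")
    theta_hat = m_h / (m_h + m_t)              # missing entries are simply ignored
    psi_hat = (m_h + m_t) / (m_h + m_t + m_q)  # fraction of tosses that were observed
    return theta_hat, psi_hat

# Hypothetical sequence: 2 heads, 2 tails, 3 missing entries.
theta_hat, psi_hat = mle_random_missing("HT??T?H")
print(theta_hat, psi_hat)
```

Because the likelihood factors into a term in $\theta$ and a term in $\psi$, the estimate of $\theta$ is the same as if the "?" entries had never been recorded.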

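Modeling the Missing Data Mechanism
Case I (random missing values), MLE:
$$\hat{\theta} = \frac{M_H}{M_H + M_T}, \qquad \hat{\psi} = \frac{M_H + M_T}{M_H + M_T + M_?}$$
Case II (deliberate missing values), where the observation probability depends on the outcome:
$$P(Y = H) = \theta\, \psi_{O \mid H}, \qquad P(Y = T) = (1-\theta)\, \psi_{O \mid T}$$
$$P(Y = ?) = \theta (1 - \psi_{O \mid H}) + (1-\theta)(1 - \psi_{O \mid T})$$
$$L(\theta, \psi_{O \mid H}, \psi_{O \mid T} : D) = \theta^{M_H} (1-\theta)^{M_T}\, \psi_{O \mid H}^{M_H}\, \psi_{O \mid T}^{M_T}\, \big(\theta (1 - \psi_{O \mid H}) + (1-\theta)(1 - \psi_{O \mid T})\big)^{M_?}$$

Decoupling of the Observation Mechanism
When can we ignore the missing data mechanism and focus only on the likelihood?
- Missing Completely at Random (MCAR): for every $X_i$, $\mathrm{Ind}(X_i ; O_{X_i})$.
  - A very strong assumption.
  - Sufficient but not necessary for the decomposition of the likelihood.
- Missing at Random (MAR) is sufficient: the probability that the value of $X_i$ is missing is independent of its actual value, given the other observed values.
- In both cases, the likelihood decomposes.
- When there are missing values in $X_i$, try to model the problem such that MAR holds.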
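To see why Case II cannot be handled by ignoring the missing entries, the Case II likelihood can be evaluated directly. The sketch below fixes the observation parameters to the "only heads are reported" scenario ($\psi_{O \mid H} = 1$, $\psi_{O \mid T} = 0$) and finds the best $\theta$ on a grid; the counts and the grid resolution are arbitrary choices made for illustration.

```python
def likelihood_case2(theta, psi_h, psi_t, m_h, m_t, m_q):
    """Case II likelihood: psi_h and psi_t are the probabilities of
    observing the toss given that it came up heads or tails."""
    p_missing = theta * (1 - psi_h) + (1 - theta) * (1 - psi_t)
    return (theta ** m_h) * ((1 - theta) ** m_t) \
        * (psi_h ** m_h) * (psi_t ** m_t) * (p_missing ** m_q)

# Hypothetical counts: 3 reported heads, no reported tails, 4 missing entries.
m_h, m_t, m_q = 3, 0, 4

# Only heads are ever reported: psi_h = 1, psi_t = 0.
grid = [i / 1000 for i in range(1, 1000)]
best_theta = max(grid, key=lambda th: likelihood_case2(th, 1.0, 0.0, m_h, m_t, m_q))

ignore_missing = m_h / (m_h + m_t)   # what Case I reasoning would give
fill_with_tails = m_h / (m_h + m_q)  # treat every "?" as a tail
print(best_theta, ignore_missing, fill_with_tails)
```

With these settings the likelihood reduces to $\theta^{M_H}(1-\theta)^{M_?}$, so the grid maximum lands next to the "fill in tails" estimate rather than the "ignore" estimate.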

Incomplete Data
In reality, this assumption might not be true.
[Figure: the network $\langle G, \Theta \rangle$ and training-instance table with "?" entries, as before.]
- Missing values
- Hidden variables
Challenges:
- Foundational: is the learning task well defined?
- Computational: how can we learn with missing data?

Hidden (Latent) Variables
- Attempt to learn a model with hidden variables.
- In this case, MCAR always holds (the variable is always missing).
- Why should we care about unobserved variables?
[Figure: two networks over the same observed variables, with and without a hidden variable, annotated with their parameter counts (59 parameters for one of them).]

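Hidden (Latent) Variables
- Hidden variables also appear in clustering.
- Naïve Bayes model:
  - The class variable is hidden.
  - The observed attributes are independent given the class.
[Figure: naïve Bayes network with a hidden Cluster node as the parent of the observed attributes $X_1, X_2, \ldots, X_n$ (which may themselves have missing values).]

How do missing data affect the likelihood function?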

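Likelihood for Complete Data
[Figure: an input data table with three fully observed instances over two variables, next to the corresponding likelihood function $L(\Theta : D)$.]
- The likelihood decomposes by variables.
- The likelihood decomposes within CPDs.
- The likelihood function is log-concave: it has a unique global maximum with a simple analytic closed form.

Likelihood for Incomplete Data
[Figure: the same input data table with some entries replaced by "?", next to the corresponding likelihood function $L(\Theta : D)$.]
- The likelihood does not decompose by variables.
- The likelihood does not decompose within CPDs.
- Computing the likelihood of each instance requires inference!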
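The contrast between the two cases can be made concrete with a tiny example. The sketch below assumes a two-node network X → Y over binary variables, with made-up CPD values, and uses `None` to mark a missing entry. For a complete instance the likelihood is a single product of CPD entries; for an incomplete one it is a sum over completions, which is exactly an inference query.

```python
def instance_likelihood(x, y, p_x, p_y_given_x):
    """P(observed part of one instance) in the network X -> Y.
    A value of None means the entry is missing and must be summed out."""
    xs = [x] if x is not None else list(p_x)
    total = 0.0
    for a in xs:
        ys = [y] if y is not None else list(p_y_given_x[a])
        total += sum(p_x[a] * p_y_given_x[a][b] for b in ys)
    return total

# Hypothetical parameters Theta for binary X and Y.
p_x = {0: 0.6, 1: 0.4}
p_y_given_x = {0: {0: 0.9, 1: 0.1},
               1: {0: 0.2, 1: 0.8}}

complete = instance_likelihood(1, 1, p_x, p_y_given_x)        # a single product
x_missing = instance_likelihood(None, 1, p_x, p_y_given_x)    # sums over X: P(Y = 1)
print(complete, x_missing)
```

The sum inside `x_missing` couples the parameters of both CPDs, which is why the incomplete-data likelihood no longer decomposes.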

Likelihood with Missing Data
- The likelihood function with incomplete data is multimodal.
- The likelihood function is not log-concave; it has local maxima, and they cannot be obtained by a simple analytic closed form.

MLE from Incomplete Data
[Figure: plot of $L(\Theta \mid D)$ against $\Theta$.]
Gradient Ascent:
- Take steps proportional to the positive of the gradient.
- Follow the gradient of the likelihood with respect to the parameters.
- Add line search and conjugate gradient methods to get fast convergence.

MLE from Incomplete Data
Nonlinear optimization problem.
[Figure: plot of $L(\Theta \mid D)$ against $\Theta$.]
Expectation Maximization (EM):
- Use the "current point" to construct an alternative function (which is "nice").
- Guarantee: the maximum of the new function has a better score than the current point.

MLE from Incomplete Data
Nonlinear optimization problem.
[Figure: plot of $L(\Theta \mid D)$ against $\Theta$.]
Gradient Ascent and EM:
- Find local maxima.
- Require multiple restarts to find an approximation to the global maximum.
- Require computations in each iteration.
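The need for multiple restarts can be illustrated with plain gradient ascent on a one-dimensional function that has two local maxima. The objective below is a hypothetical stand-in for a multimodal $L(\Theta \mid D)$, not an actual likelihood, and the step size, iteration count and seed are arbitrary choices.

```python
import random

def f(theta):
    """Toy multimodal objective with a lower peak near -0.96 and a higher peak near 1.04."""
    return -(theta ** 2 - 1) ** 2 + 0.3 * theta

def grad_f(theta):
    """Derivative of f with respect to theta."""
    return -4 * theta * (theta ** 2 - 1) + 0.3

def ascend(theta, step=0.05, iters=200):
    """Fixed-step gradient ascent: repeatedly move in the direction of the gradient."""
    for _ in range(iters):
        theta += step * grad_f(theta)
    return theta

def best_of_restarts(n_restarts, seed=0):
    """Run gradient ascent from several random starting points and keep the best result."""
    rng = random.Random(seed)
    starts = [rng.uniform(-2.0, 2.0) for _ in range(n_restarts)]
    return max((ascend(s) for s in starts), key=f)

left = ascend(-1.5)    # converges to the lower, local maximum
right = ascend(1.5)    # converges to the higher maximum
best = best_of_restarts(10)
print(left, right, best)
```

A single run started on the left never leaves the lower peak; taking the best of several runs is a simple, if not foolproof, way around that.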

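Gradient Ascent
Theorem:
$$\frac{\partial \log P(D \mid \Theta)}{\partial \theta_{x_i \mid pa_i}} = \frac{1}{\theta_{x_i \mid pa_i}} \sum_{m} P(x_i, pa_i \mid o[m], \Theta)$$
Proof:
$$\frac{\partial \log P(D \mid \Theta)}{\partial \theta_{x_i \mid pa_i}} = \sum_{m} \frac{\partial \log P(o[m] \mid \Theta)}{\partial \theta_{x_i \mid pa_i}} = \sum_{m} \frac{1}{P(o[m] \mid \Theta)} \frac{\partial P(o[m] \mid \Theta)}{\partial \theta_{x_i \mid pa_i}}$$
Here $o[m]$ is the observed data in the $m$-th training instance, and the sum runs over all training instances.
How do we compute $\partial P(o[m] \mid \Theta) / \partial \theta_{x_i \mid pa_i}$?

Gradient Ascent
$$\frac{\partial P(o[m] \mid \Theta)}{\partial \theta_{x_i \mid pa_i}} = \frac{\partial}{\partial \theta_{x_i \mid pa_i}} \sum_{x_i', pa_i'} P(o[m] \mid x_i', pa_i', \Theta)\, P(x_i' \mid pa_i', \Theta)\, P(pa_i' \mid \Theta)$$
Only the term with $(x_i', pa_i') = (x_i, pa_i)$ depends on $\theta_{x_i \mid pa_i}$, through the factor $P(x_i \mid pa_i, \Theta) = \theta_{x_i \mid pa_i}$, so
$$\frac{\partial P(o[m] \mid \Theta)}{\partial \theta_{x_i \mid pa_i}} = P(o[m] \mid x_i, pa_i, \Theta)\, P(pa_i \mid \Theta) = \frac{P(x_i, pa_i, o[m] \mid \Theta)}{\theta_{x_i \mid pa_i}}$$
Substituting into the proof above:
$$\frac{\partial \log P(D \mid \Theta)}{\partial \theta_{x_i \mid pa_i}} = \sum_{m} \frac{1}{P(o[m] \mid \Theta)} \cdot \frac{P(x_i, pa_i, o[m] \mid \Theta)}{\theta_{x_i \mid pa_i}}$$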

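Gradient Ascent
$$\frac{\partial \log P(D \mid \Theta)}{\partial \theta_{x_i \mid pa_i}} = \sum_{m} \frac{1}{P(o[m] \mid \Theta)} \cdot \frac{P(x_i, pa_i, o[m] \mid \Theta)}{\theta_{x_i \mid pa_i}} = \frac{1}{\theta_{x_i \mid pa_i}} \sum_{m} P(x_i, pa_i \mid o[m], \Theta)$$
- Requires computing $P(x_i, pa_i \mid o[m], \Theta)$ for all $i$ and $m$.
- This can be done with the clique-tree algorithm, since $X_i$ and $\mathrm{Pa}_i$ are in the same clique.

Gradient Ascent: Summary
Pros:
- Flexible; can be extended to non-table CPDs.
Cons:
- Need to project the gradient onto the space of legal parameters.
- For reasonable convergence, need to combine with advanced methods (conjugate gradient, line search).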
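The gradient formula can be checked numerically on a small case. The sketch below assumes a two-node network X → Y over binary variables with made-up parameters and data (`None` marks a missing entry), computes the gradient with respect to $\theta_{y=1 \mid x=1}$ from the posterior family marginals, and compares it with a central finite difference. As in the theorem, the parameter is treated as a free variable, so the perturbed table is deliberately not renormalized.

```python
import copy
from math import log

def completions(inst):
    """All full assignments (x, y) consistent with a partially observed instance."""
    x, y = inst
    xs = (0, 1) if x is None else (x,)
    ys = (0, 1) if y is None else (y,)
    return [(a, b) for a in xs for b in ys]

def log_lik(data, px, py):
    """log P(D | Theta): each instance is marginalized over its missing entries."""
    return sum(log(sum(px[a] * py[a][b] for a, b in completions(inst)))
               for inst in data)

def analytic_grad(data, px, py, x, y):
    """(1 / theta_{y|x}) * sum_m P(x, y | o[m], Theta), as in the theorem."""
    total = 0.0
    for inst in data:
        comps = completions(inst)
        p_obs = sum(px[a] * py[a][b] for a, b in comps)
        p_fam_and_obs = px[x] * py[x][y] if (x, y) in comps else 0.0
        total += p_fam_and_obs / p_obs
    return total / py[x][y]

def numeric_grad(data, px, py, x, y, eps=1e-6):
    """Central finite difference in the single entry py[x][y]."""
    hi = copy.deepcopy(py)
    lo = copy.deepcopy(py)
    hi[x][y] += eps
    lo[x][y] -= eps
    return (log_lik(data, px, hi) - log_lik(data, px, lo)) / (2 * eps)

# Hypothetical parameters and partially observed data.
px = {0: 0.6, 1: 0.4}
py = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
data = [(1, 1), (None, 1), (0, None), (None, 0), (1, None)]

g_analytic = analytic_grad(data, px, py, 1, 1)
g_numeric = numeric_grad(data, px, py, 1, 1)
print(g_analytic, g_numeric)
```

In a larger network the posterior marginals would come from an inference routine such as clique-tree calibration; brute-force enumeration is only feasible here because the network has two nodes.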

Expectation Maximization (EM)
A tailored algorithm for optimizing likelihood functions.
Intuition:
- Parameter estimation is easy given complete data.
- Computing the probability of the missing data is "easy" (= inference) given the parameters.
Strategy:
- Pick a starting point for the parameters.
- "Complete" the data using the current parameters.
- Estimate the parameters relative to the data completion.
- Iterate.
The procedure is guaranteed to improve at each iteration.

Expectation Maximization (EM)
Initialize the parameters to $\Theta^0$, then iterate the E-step and the M-step. In the $t$-th iteration:
Expectation (E-step):
- Let $o[m]$ be the observed data in the $m$-th training instance.
- For each $m$ and each family $X_i, \mathrm{Pa}_i$, compute $P(X_i, \mathrm{Pa}_i \mid o[m], \Theta^t)$.
- Compute the expected sufficient statistics (ESS) for each pair of values $x, u$ of $X_i, \mathrm{Pa}_i$ respectively:
$$\bar{M}_{\Theta^t}[x, u] = \sum_{m} P(x, u \mid o[m], \Theta^t)$$
Maximization (M-step):
- Treat the expected sufficient statistics as observed and set the parameters to the MLE with respect to the ESS:
$$\theta^{t+1}_{x \mid u} = \frac{\bar{M}_{\Theta^t}[x, u]}{\bar{M}_{\Theta^t}[u]}$$
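The E-step and M-step can be sketched end to end for a two-node network X → Y over binary variables. Everything specific here is an assumption made for illustration: the data set, the starting parameters, the use of `None` for missing entries, and brute-force enumeration in place of a real inference algorithm. The missing entries are assumed to be missing at random.

```python
from math import log

VALUES = (0, 1)

def completions(inst):
    """All full assignments (x, y) consistent with a partially observed instance."""
    x, y = inst
    xs = VALUES if x is None else (x,)
    ys = VALUES if y is None else (y,)
    return [(a, b) for a in xs for b in ys]

def log_lik(data, px, py):
    """log P(D | Theta), marginalizing each instance over its missing entries."""
    return sum(log(sum(px[a] * py[a][b] for a, b in completions(inst)))
               for inst in data)

def em_step(data, px, py):
    # E-step: expected sufficient statistics M[x, y] = sum_m P(x, y | o[m], Theta^t).
    counts = {(a, b): 0.0 for a in VALUES for b in VALUES}
    for inst in data:
        comps = completions(inst)
        z = sum(px[a] * py[a][b] for a, b in comps)   # P(o[m] | Theta^t)
        for a, b in comps:
            counts[(a, b)] += px[a] * py[a][b] / z
    # M-step: MLE computed from the expected counts as if they were observed.
    row = {a: counts[(a, 0)] + counts[(a, 1)] for a in VALUES}   # M[x]
    new_px = {a: row[a] / len(data) for a in VALUES}
    new_py = {a: {b: counts[(a, b)] / row[a] for b in VALUES} for a in VALUES}
    return new_px, new_py

# Hypothetical partially observed data and starting point Theta^0.
data = [(1, 1), (None, 1), (0, None), (None, 0), (1, None), (0, 0), (None, 1)]
px = {0: 0.5, 1: 0.5}
py = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.4, 1: 0.6}}

history = [log_lik(data, px, py)]
for _ in range(20):
    px, py = em_step(data, px, py)
    history.append(log_lik(data, px, py))
print(history[0], history[-1])
```

Recording the log-likelihood after every iteration is a cheap sanity check: if it ever decreases, the E-step or M-step has a bug.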

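Expectation Maximization (EM)
[Figure: the EM loop. Starting from an initial network and the training data (with "?" entries), the E-step (inference) computes expected counts, the M-step (reparameterization) produces an updated network, and the two steps iterate.]

Expectation Maximization (EM)
Formal guarantees:
- $L(\Theta^{t+1} : D) \ge L(\Theta^t : D)$: each iteration improves the likelihood.
- If $\Theta^{t+1} = \Theta^t$, then $\Theta^t$ is a stationary point of $L(\Theta : D)$. Usually, this means a local maximum.
Main cost:
- Computing the expected counts in the E-step.
- Requires inference for each instance in the training set.
- Exactly the same as in gradient ascent!
Reading material on EM: please read Andrew Ng's lecture note.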
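One standard way to see why each iteration cannot decrease the likelihood is through the expected complete-data log-likelihood. The notation below is introduced here for illustration and is not taken from the slides: $h[m]$ denotes the unobserved part of instance $m$, $\ell(\Theta) = \sum_m \log P(o[m] \mid \Theta)$ is the log-likelihood, and $Q$ is the "nice" alternative function built at the current point.

```latex
\begin{align*}
Q(\Theta \mid \Theta^t)
  &= \sum_{m} \mathbb{E}_{h[m] \sim P(\cdot \mid o[m], \Theta^t)}
     \big[ \log P(o[m], h[m] \mid \Theta) \big] \\
\ell(\Theta) - \ell(\Theta^t)
  &\ge Q(\Theta \mid \Theta^t) - Q(\Theta^t \mid \Theta^t)
  \qquad \text{for every } \Theta \\
\Theta^{t+1} = \arg\max_{\Theta} Q(\Theta \mid \Theta^t)
  &\;\Longrightarrow\;
  \ell(\Theta^{t+1}) - \ell(\Theta^t)
  \ge Q(\Theta^{t+1} \mid \Theta^t) - Q(\Theta^t \mid \Theta^t) \ge 0
\end{align*}
```

The middle inequality follows from Jensen's inequality. For table CPDs, maximizing $Q$ is exactly the M-step: the MLE computed from the expected sufficient statistics.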

EM: Practical Considerations
Initial parameters:
- EM is highly sensitive to the starting parameters.
- Choose them randomly.
- Choose them by guessing from another source.
Stopping criteria:
- Small change in the data likelihood.
- Small change in the parameters.
Avoiding bad local maxima:
- Multiple restarts.
- Early pruning of unpromising starting points.

Acknowledgement
These lecture notes were generated based on the slides from Prof. Eran Segal.
