Machine Learning. Tutorial on Basic Probability. Lecture 2, September 15, 2006


Machine Learning 10-701/15-781, Fall 2006
Tutorial on Basic Probability
Eric Xing
Lecture 2, September 15, 2006
Reading: Chap. 1 & 2, CB, and Chap. 5, 6, TM

What is this?
Classical AI and ML research ignored this phenomenon.
The problem (an example): you want to catch a morning flight from Pitt to SF. Can I make it if I leave at 7am and take the 28X at CMU?
- Partial observability (road state, other drivers' plans, etc.)
- Noisy sensors (radio traffic reports)
- Uncertainty in action outcomes (flat tire, etc.)
- Immense complexity of modeling and predicting traffic
Reasoning under uncertainty!

Basic Probability Concepts
A sample space $S$ is the set of all possible outcomes of a conceptual or physical, repeatable experiment. ($S$ can be finite or infinite.)
- E.g., $S$ may be the set of all possible outcomes of a dice roll: $S = \{1, 2, 3, 4, 5, 6\}$
- E.g., $S$ may be the set of all possible nucleotides of a DNA site: $S = \{A, T, C, G\}$
- E.g., $S$ may be the set of all possible time-space positions of an aircraft on a radar screen: $S = \{0, R_{\max}\} \times \{0, 360^\circ\} \times \{0, +\infty\}$
An event $A$ is any subset of $S$: seeing "1" or "6" in a roll; observing a "G" at a site; flight UA7 in a given space-time interval.
An event space $E$ is the set of possible worlds in which the outcomes can happen: all dice rolls, reading a genome, monitoring the radar signal.

Visualizing Probability Space
A probability space is a sample space in which every outcome $s \in S$ has an assignment $P(s)$ such that
- $0 \le P(s) \le 1$
- $\sum_{s \in S} P(s) = 1$
$P(s)$ is called the probability (or probability mass) of $s$.
[Figure: the event space of all possible worlds, with total area 1. Worlds in which A is true lie inside an oval, worlds in which A is false lie outside it. $P(A)$ is the area of the oval.]

Kolmogorov Axioms
- All probabilities are between 0 and 1: $0 \le P(A) \le 1$
- $P(\text{true}) = 1$: regardless of the event, my outcome is true
- $P(\text{false}) = 0$: no event makes my outcome true
- The probability of a disjunction is given by $P(A \lor B) = P(A) + P(B) - P(A \land B)$
[Figure: Venn diagram of A and B showing $A \land B$ and $A \lor B$]

Why use probability?
There have been attempts to develop different methodologies for uncertainty:
- Fuzzy logic
- Qualitative reasoning (qualitative physics)
"Probability theory is nothing but common sense reduced to calculation" (Pierre Laplace, 1812).
In 1931, de Finetti proved that it is irrational to have beliefs that violate these axioms, in the following sense: if you bet in accordance with your beliefs, but your beliefs violate the axioms, then you can be guaranteed to lose money to an opponent whose beliefs more accurately reflect the true state of the world. (Here, betting and money are proxies for decision making and utilities.)
What if you refuse to bet? This is like refusing to allow time to pass: every action (including inaction) is a bet.

Random Variable
A random variable is a function that associates a unique numerical value (a token) with every outcome of an experiment. (The value of the r.v. will vary from trial to trial as the experiment is repeated.)
[Figure: mapping from an outcome $\omega \in S$ to the value $X(\omega)$]
Discrete r.v.:
- The outcome of a dice roll
- The outcome of reading a nucleotide (nt) at site $i$: $X_i$
Binary event and indicator variable:
- Seeing an "A" at a site gives $X = 1$, otherwise $X = 0$.
- This describes the true or false outcome of a random event.
- Can we describe richer outcomes in the same way (i.e., $X = 1, 2, 3, 4$ for being A, C, G, T)? Think about what would happen if we take the expectation of $X$.
Unit-base random vector:
- $X_i = [X_{iA}, X_{iT}, X_{iG}, X_{iC}]'$; $X_i = [0, 0, 1, 0]'$ means seeing a "G" at site $i$
Continuous r.v.:
- The outcome of recording the true location of an aircraft: $X_{true}$
- The outcome of observing the measured location of an aircraft: $X_{obs}$

Discrete Prob. Distribution
In the discrete case, a probability distribution $P$ on $S$ (and hence on the domain of $X$) is an assignment of a non-negative real number $P(s)$ to each $s \in S$ (or each valid value of $x$) such that $\sum_{s \in S} P(s) = 1$, with $0 \le P(s) \le 1$.
- Intuitively, $P(s)$ corresponds to the frequency (or the likelihood) of getting $s$ in the experiments, if repeated many times.
- We call $\theta_s = P(s)$ the parameters of a discrete probability distribution.
A probability distribution on a sample space is sometimes called a probability model, in particular if several different distributions are under consideration.
- Write models as $M_1$, $M_2$ and probabilities as $P(X \mid M_1)$, $P(X \mid M_2)$.
- E.g., $M_1$ may be the appropriate prob. dist. if $X$ is from a "fair dice", and $M_2$ is for the "loaded dice".
- $M$ is usually a two-tuple of {dist. family, dist. parameters}.

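Discrete Distributions
Bernoulli distribution: Ber($p$)
$$P(x) = \begin{cases} 1 - p & \text{for } x = 0 \\ p & \text{for } x = 1 \end{cases} \quad\Rightarrow\quad P(x) = p^{x} (1 - p)^{1 - x}$$
Multinomial distribution: Mult($1, \theta$)
Multinomial (indicator) variable: $X = [X_1, \ldots, X_6]'$, where $X_j \in \{0, 1\}$ and $\sum_{j \in [1, \ldots, 6]} X_j = 1$; $X_j = 1$ w.p. $\theta_j$, with $\sum_{j \in [1, \ldots, 6]} \theta_j = 1$.
$$p(x(j)) = P(\{X_j = 1\}, \text{ where } j \text{ indexes the dice face}) = \theta_j = \prod_k \theta_k^{x_k}$$
For the nucleotide example the same product reads $\theta_A^{x_A}\, \theta_C^{x_C}\, \theta_G^{x_G}\, \theta_T^{x_T}$.

Multinomial distribution: Mult($n, \theta$)
Count variable: $X = [x_1, \ldots, x_K]'$, where $\sum_j x_j = n$.
$$p(x) = \frac{n!}{x_1!\, x_2! \cdots x_K!}\; \theta_1^{x_1} \theta_2^{x_2} \cdots \theta_K^{x_K}$$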
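The multinomial mass function can be checked numerically. The sketch below is illustrative only: the helper name `multinomial_pmf` and the fair-die parameters are assumptions introduced here, not part of the lecture.

```python
from math import factorial, prod

def multinomial_pmf(counts, theta):
    """Evaluate n!/(x_1!...x_K!) * prod_k theta_k**x_k for a count vector."""
    n = sum(counts)
    coeff = factorial(n)
    for c in counts:
        coeff //= factorial(c)  # each partial quotient is an exact integer
    return coeff * prod(t ** c for t, c in zip(theta, counts))

fair_die = [1 / 6] * 6  # assumed parameters theta_j for a fair die

# One roll showing face 3 is the indicator vector [0, 0, 1, 0, 0, 0].
p_single = multinomial_pmf([0, 0, 1, 0, 0, 0], fair_die)

# Two rolls that both show face 1.
p_pair = multinomial_pmf([2, 0, 0, 0, 0, 0], fair_die)
```

With a single trial the coefficient is 1 and the expression reduces to the indicator form, so `p_single` equals the parameter of the observed face.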

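Continuous Prob. Distribution
A continuous random variable $X$ can assume any value in an interval on the real line or in a region in a high-dimensional space.
- $X$ usually corresponds to a real-valued measurement of some property, e.g., length, position, ...
- It is not possible to talk about the probability of the random variable assuming a particular value (that probability is 0).
- Instead, we talk about the probability of the random variable assuming a value within a given interval, or half interval: $P(X \in [x_1, x_2])$, $P(X < x) = P(X \in [-\infty, x])$
- Arbitrary Boolean combinations of basic propositions are allowed.

Continuous Prob. Distribution
The probability of the random variable assuming a value within some given interval from $x_1$ to $x_2$ is defined to be the area under the graph of the probability density function between $x_1$ and $x_2$.
- Probability mass: $P(X \in [x_1, x_2]) = \int_{x_1}^{x_2} p(x)\, dx$; note that $\int_{-\infty}^{+\infty} p(x)\, dx = 1$.
- Cumulative distribution function (CDF): $P(x) = P(X < x) = \int_{-\infty}^{x} p(x')\, dx'$
- Probability density function (PDF): $p(x) = \frac{d}{dx} P(x)$, with $\int_{-\infty}^{+\infty} p(x)\, dx = 1$ and $p(x) \ge 0$ for all $x$
[Figure: car flow on Liberty Bridge (cooked up!)]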

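What is the intuitive meaning of $p(x)$?
If $p(x_1) = a$ and $p(x_2) = b$, then when a value $X$ is sampled from the distribution with density $p(x)$, you are $a/b$ times as likely to find that $X$ is "very close to" $x_1$ than that $X$ is "very close to" $x_2$.
That is:
$$\lim_{h \to 0} \frac{P(x_1 - h < X < x_1 + h)}{P(x_2 - h < X < x_2 + h)} = \lim_{h \to 0} \frac{\int_{x_1 - h}^{x_1 + h} p(x)\, dx}{\int_{x_2 - h}^{x_2 + h} p(x)\, dx} = \frac{p(x_1) \times 2h}{p(x_2) \times 2h} = \frac{a}{b}$$

Continuous Distributions
Uniform probability density function:
$$p(x) = \begin{cases} 1/(b - a) & \text{for } a \le x \le b \\ 0 & \text{elsewhere} \end{cases}$$
Normal (Gaussian) probability density function:
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2 / 2\sigma^2}$$
- The distribution is symmetric, and is often illustrated as a bell-shaped curve.
- Two parameters, $\mu$ (mean) and $\sigma$ (standard deviation), determine the location and shape of the distribution.
- The highest point on the normal curve is at the mean, which is also the median and mode.
- The mean can be any numerical value: negative, zero, or positive.
Exponential probability distribution:
- density: $p(x) = \frac{1}{\mu}\, e^{-x/\mu}$
- CDF: $P(x \le x_0) = 1 - e^{-x_0/\mu}$
[Figure: exponential density $f(x)$ against time between successive arrivals (mins), with the area to the left of a cut-off shaded]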

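Statistical Characterizations
Expectation: the centre of mass, mean value, first moment:
$$E(X) = \begin{cases} \sum_{i \in S} x_i\, p(x_i) & \text{discrete} \\ \int_{-\infty}^{\infty} x\, p(x)\, dx & \text{continuous} \end{cases}$$
Variance: the spreadness:
$$Var(X) = \begin{cases} \sum_{i \in S} [x_i - E(X)]^2\, p(x_i) & \text{discrete} \\ \int_{-\infty}^{\infty} [x - E(X)]^2\, p(x)\, dx & \text{continuous} \end{cases}$$
The sample mean and sample variance are the corresponding quantities computed from observed data.

Gaussian (Normal) density in 1D
If $X \sim N(\mu, \sigma^2)$, the probability density function (pdf) of $X$ is defined as
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2 / 2\sigma^2}$$
We will often use the precision $\lambda = 1/\sigma^2$ instead of the variance $\sigma^2$.
Here is how we plot the pdf in matlab:
xs=-3:0.01:3; plot(xs,normpdf(xs,mu,sigma))
Note that a density evaluated at a point can be bigger than 1!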
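The remark that a density can exceed 1 at a point is easy to confirm. The following is a minimal Python sketch of the Gaussian pdf formula; the function name `normal_pdf` and the chosen values of sigma are assumptions made for illustration.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density 1/(sqrt(2*pi)*sigma) * exp(-(x-mu)^2 / (2*sigma^2))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Density at the mean for a standard normal: below 1.
peak_wide = normal_pdf(0.0, mu=0.0, sigma=1.0)

# Density at the mean for a narrow normal (sigma = 0.1): above 1.
peak_narrow = normal_pdf(0.0, mu=0.0, sigma=0.1)
```

Shrinking sigma by a factor of ten raises the peak by the same factor, while the area under the curve stays 1.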

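Gaussian CDF
If $Z \sim N(0, 1)$, the cumulative density function is defined as
$$\Phi(x) = \int_{-\infty}^{x} p(z)\, dz = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-z^2/2}\, dz$$
This has no closed form expression, but is built in to most software packages (e.g., normcdf in the matlab stats toolbox).

Use of the cdf
If $X \sim N(\mu, \sigma^2)$, then $Z = (X - \mu)/\sigma \sim N(0, 1)$.
How much mass is contained inside the $[\mu - 1.96\sigma,\ \mu + 1.96\sigma]$ interval?
$$P(a < X < b) = P\!\left(\frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma}\right) = \Phi\!\left(\frac{b - \mu}{\sigma}\right) - \Phi\!\left(\frac{a - \mu}{\sigma}\right)$$
Since $P(Z \le -1.96) = \text{normcdf}(-1.96) = 0.025$, we have
$$P(-1.96\sigma < X - \mu < 1.96\sigma) = 1 - 2 \times 0.025 = 0.95$$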
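The interval computation can be reproduced without a statistics toolbox by expressing the standard normal CDF through the error function. This is a sketch under assumptions: the helper names and the example values of mu and sigma are made up for illustration.

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def interval_mass(a, b, mu, sigma):
    """P(a < X < b) for X ~ N(mu, sigma^2), by standardizing both end points."""
    return std_normal_cdf((b - mu) / sigma) - std_normal_cdf((a - mu) / sigma)

mu, sigma = 5.0, 2.0  # hypothetical parameters
mass = interval_mass(mu - 1.96 * sigma, mu + 1.96 * sigma, mu, sigma)
lower_tail = std_normal_cdf(-1.96)
```

The result does not depend on the particular mu and sigma, because standardizing maps the interval back to (-1.96, 1.96).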

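Central limit theorem
If $(X_1, X_2, \ldots, X_n)$ are i.i.d. (we will come back to this point shortly) continuous random variables, then define
$$\bar{X} = f(X_1, X_2, \ldots, X_n) = \frac{1}{n} \sum_{i=1}^{n} X_i$$
As $n \to \infty$, $p(\bar{X})$ tends to a Gaussian with mean $E[X_i]$ and variance $Var[X_i]/n$.
This is somewhat of a justification for why assuming Gaussian noise is common.

Elementary manipulations of probabilities
Set probability of a multi-valued r.v.:
- $P(\{x = \text{Odd}\}) = P(1) + P(3) + P(5) = 1/6 + 1/6 + 1/6 = 1/2$
- $P(X = x_1 \lor X = x_2 \lor \ldots \lor X = x_i) = \sum_{j=1}^{i} P(X = x_j)$
Multi-variant distribution:
- Joint probability: $P(X = \text{true} \land Y = \text{true})$
- $P\big(Y \land (X = x_1 \lor X = x_2 \lor \ldots \lor X = x_i)\big) = \sum_{j=1}^{i} P(Y \land X = x_j)$
- Marginal probability: $P(Y) = \sum_{j \in S} P(Y \land X = x_j)$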
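A small simulation can make the central limit theorem concrete. The sketch below averages uniform draws; the sample size, number of trials, and seed are arbitrary choices made for this illustration, not values from the lecture.

```python
import random
import statistics

random.seed(0)          # fixed seed so the run is repeatable
n, trials = 50, 5000    # hypothetical sample size and number of repetitions

# Each entry is the mean of n i.i.d. Uniform(0, 1) draws.
means = [statistics.fmean([random.random() for _ in range(n)]) for _ in range(trials)]

empirical_mean = statistics.fmean(means)
empirical_var = statistics.pvariance(means)

# Uniform(0, 1) has mean 1/2 and variance 1/12, so the sample mean
# should have variance (1/12) / n.
theoretical_var = (1 / 12) / n
```

A histogram of `means` would look bell-shaped even though the individual draws are uniform, and its spread shrinks as n grows.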

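Conditional Probability
$P(X \mid Y)$ = fraction of worlds in which $Y$ is true that also have $X$ true.
- H = "having a headache"
- F = "coming down with Flu"
- $P(H) = 1/10$, $P(F) = 1/40$, $P(H \mid F) = 1/2$
$P(H \mid F)$ = fraction of flu-inflicted worlds in which you have a headache = $P(H \land F)/P(F)$
Definition: $P(X \mid Y) = \dfrac{P(X \land Y)}{P(Y)}$
Corollary (the chain rule): $P(X \land Y) = P(X \mid Y)\, P(Y)$

Probabilistic Inference
One day you wake up with a headache. You come up with the following reasoning: "since 50% of flus are associated with headaches, I must have a 50-50 chance of coming down with flu". Is this reasoning correct?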

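Probabilistic Inference
With H = "having a headache" and F = "coming down with Flu" as before, the problem is: $P(F \mid H) = ?$

The Bayes Rule
What we have just done leads to the following general expression:
$$P(Y \mid X) = \frac{P(X \mid Y)\, P(Y)}{P(X)}$$
This is Bayes Rule.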
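The headache question can be worked through with exact fractions. The sketch assumes the values of the standard headache/flu example, P(H) = 1/10, P(F) = 1/40 and P(H|F) = 1/2; the variable names are made up for this illustration.

```python
from fractions import Fraction

# Assumed inputs (the standard headache/flu example).
p_h = Fraction(1, 10)          # P(H): probability of a headache
p_f = Fraction(1, 40)          # P(F): probability of flu
p_h_given_f = Fraction(1, 2)   # P(H | F)

# Bayes rule: P(F | H) = P(H | F) P(F) / P(H)
p_f_given_h = p_h_given_f * p_f / p_h
```

Under these assumptions the posterior is 1/8, well below the "50-50" guess, because flu is rare compared with headaches in general.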

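More General Forms of Bayes Rule
$$P(Y \mid X) = \frac{P(X \mid Y)\, p(Y)}{P(X \mid Y)\, p(Y) + P(X \mid \lnot Y)\, p(\lnot Y)}$$
$$P(Y = y_i \mid X) = \frac{P(X \mid Y = y_i)\, p(Y = y_i)}{\sum_{j \in S} P(X \mid Y = y_j)\, p(Y = y_j)}$$
$$P(Y \mid X \land Z) = \frac{P(X \mid Y \land Z)\, p(Y \land Z)}{P(X \land Z)} = \frac{P(X \mid Y \land Z)\, p(Y \land Z)}{P(X \mid Y \land Z)\, p(Y \land Z) + P(X \mid \lnot Y \land Z)\, p(\lnot Y \land Z)}$$
E.g., $P(\text{Flu} \mid \text{Headache} \land \text{DrankBeer})$.
[Figure: Venn diagram over F, H and B]

Prior Distribution
Suppose that our propositions about the possible worlds have a "causal flow".
[Figure: graph with F and B pointing to H]
Prior or unconditional probabilities of propositions, e.g., $P(\text{Flu} = \text{true})$ and $P(\text{DrinkBeer} = \text{true})$, correspond to belief prior to the arrival of any (new) evidence.
A probability distribution gives values for all possible assignments: $P(\text{DrinkBeer}) = [0.01, 0.09, 0.1, 0.8]$ (normalized, i.e., sums to 1).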

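Joint Probability
A joint probability distribution for a set of RVs gives the probability of every atomic event (sample point).
- $P(\text{Flu}, \text{DrinkBeer})$ = a $2 \times 2$ matrix of values
- $P(\text{Flu}, \text{DrinkBeer}, \text{Headache})$ = ?
Every question about a domain can be answered by the joint distribution, as we will see later.

Posterior conditional probability
Conditional or posterior (see later) probabilities:
- e.g., $P(\text{Flu} \mid \text{Headache}) = 0.178$, given that Headache is all I know; NOT "if Headache then 17.8% chance of Flu"
Representation of conditional distributions:
- $P(\text{Flu} \mid \text{Headache})$ = 2-element vector of 2-element vectors
If we know more, e.g., DrinkBeer is also given, then we have
- $P(\text{Flu} \mid \text{Headache}, \text{DrinkBeer}) = 0.070$. This effect is known as explaining away!
- $P(\text{Flu} \mid \text{Headache}, \text{Flu}) = 1$
- Note: the less or more certain belief remains valid after more evidence arrives, but is not always useful.
New evidence may be irrelevant, allowing simplification, e.g.,
- $P(\text{Flu} \mid \text{Headache}, \text{StealerWin}) = P(\text{Flu} \mid \text{Headache})$
- This kind of inference, sanctioned by domain knowledge, is crucial.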

Inference by enumeration
Start with a Joint Distribution. Building a Joint Distribution of M = 3 variables:
- Make a truth table listing all combinations of values of your variables (if there are M Boolean variables then the table will have $2^M$ rows).
- For each combination of values, say how probable it is.
- Normalized, i.e., sums to 1.
[Table: truth table over Flu, DrinkBeer and Headache with one probability per row]

Inference with the Joint
Once you have the JD you can ask for the probability of any atomic event consistent with your query:
$$P(E) = \sum_{i \in E} P(\text{row}_i)$$

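Inference with the Joint: Compute Marginals
$$P(\text{Flu} \land \text{Headache}) = \sum_{i:\ F = 1,\ H = 1} P(\text{row}_i)$$
$$P(\text{Headache}) = \sum_{i:\ H = 1} P(\text{row}_i)$$

Inference with the Joint: Compute Conditionals
$$P(E_1 \mid E_2) = \frac{P(E_1 \land E_2)}{P(E_2)} = \frac{\sum_{i \in E_1 \cap E_2} P(\text{row}_i)}{\sum_{i \in E_2} P(\text{row}_i)}$$
$$P(\text{Flu} \mid \text{Headache}) = \frac{P(\text{Flu} \land \text{Headache})}{P(\text{Headache})}$$
General idea: compute the distribution on the query variable by fixing evidence variables and summing over hidden variables.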
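Inference by enumeration can be sketched in a few lines. The joint table below is an assumption made for illustration: its eight probabilities were chosen to sum to 1 and to be consistent with the posterior values 0.178 and 0.070 quoted earlier. The helper names `prob` and `conditional` are also made up here.

```python
from itertools import product

VARS = ("F", "B", "H")  # Flu, DrinkBeer, Headache

# Assumed joint probabilities, one per row of the truth table, in the
# order (F, B, H) = (0,0,0), (0,0,1), (0,1,0), ..., (1,1,1).
probs = [0.4, 0.1, 0.17, 0.2, 0.05, 0.05, 0.015, 0.015]
joint = dict(zip(product([0, 1], repeat=3), probs))

def prob(**fixed):
    """Sum the rows of the joint that agree with the fixed assignments."""
    idx = {name: VARS.index(name) for name in fixed}
    return sum(p for row, p in joint.items()
               if all(row[idx[k]] == v for k, v in fixed.items()))

def conditional(query, evidence):
    """P(query | evidence) = P(query and evidence) / P(evidence)."""
    return prob(**query, **evidence) / prob(**evidence)

p_headache = prob(H=1)                                   # marginal
p_flu_given_headache = conditional({"F": 1}, {"H": 1})   # hidden variable B summed out
p_flu_given_headache_beer = conditional({"F": 1}, {"H": 1, "B": 1})
```

Every query touches the full table, which is why the time and space cost grows exponentially with the number of variables.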

Summary: Inference by enumeration
Let $X$ be all the variables. Typically, we want the posterior joint distribution of the query variables $Y$ given specific values $e$ for the evidence variables $E$.
Let the hidden variables be $H = X - Y - E$.
Then the required summation of joint entries is done by summing out the hidden variables:
$$P(Y \mid E = e) = \alpha\, P(Y, E = e) = \alpha \sum_{h} P(Y, E = e, H = h)$$
The terms in the summation are joint entries because $Y$, $E$, and $H$ together exhaust the set of random variables.
Obvious problems:
- Worst-case time complexity $O(d^n)$, where $d$ is the largest arity
- Space complexity $O(d^n)$ to store the joint distribution
- How to find the numbers for $O(d^n)$ entries???

Conditional independence
Write out the full joint distribution using the chain rule:
P(Headache; Flu; Virus; DrinkBeer)
= P(Headache | Flu; Virus; DrinkBeer) P(Flu; Virus; DrinkBeer)
= P(Headache | Flu; Virus; DrinkBeer) P(Flu | Virus; DrinkBeer) P(Virus | DrinkBeer) P(DrinkBeer)
Assume independence and conditional independence:
= P(Headache | Flu; DrinkBeer) P(Flu | Virus) P(Virus) P(DrinkBeer)
I.e., ? independent parameters
In most cases, the use of conditional independence reduces the size of the representation of the joint distribution from exponential in $n$ to linear in $n$.
Conditional independence is our most basic and robust form of knowledge about uncertain environments.

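Rules of Independence, by examples
- P(Virus | DrinkBeer) = P(Virus) iff Virus is independent of DrinkBeer
- P(Flu | Virus; DrinkBeer) = P(Flu | Virus) iff Flu is independent of DrinkBeer, given Virus
- P(Headache | Flu; Virus; DrinkBeer) = P(Headache | Flu; DrinkBeer) iff Headache is independent of Virus, given Flu and DrinkBeer

Marginal and Conditional Independence
Recall that for events $E$ (i.e., $X = x$) and $H$ (say, $Y = y$), the conditional probability of $E$ given $H$, written as $P(E \mid H)$, is
$$P(E \text{ and } H) / P(H)$$
(the probability that both $E$ and $H$ are true, given that $H$ is true).
$E$ and $H$ are (statistically) independent if
$$P(E) = P(E \mid H)$$
(i.e., the probability that $E$ is true doesn't depend on whether $H$ is true); or equivalently $P(E \text{ and } H) = P(E)\, P(H)$.
$E$ and $F$ are conditionally independent given $H$ if
$$P(E \mid H, F) = P(E \mid H)$$
or equivalently $P(E, F \mid H) = P(E \mid H)\, P(F \mid H)$.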

Why knowledge of Independence is useful
- Lower complexity (time, space, search, ...)
- Motivates efficient inference for all kinds of queries. Stay tuned!!
- Structured knowledge about the domain: easy to learn (both from experts and from data), easy to grow

Where do probability distributions come from?
- Idea One: Human, Domain Experts
- Idea Two: Simpler probability facts and some algebra
- Idea Three: Learn them from data!
A good chunk of this course is essentially about various ways of learning various forms of them!

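Density Estimation
A Density Estimator learns a mapping from a set of attributes to a probability.
Often known as parameter estimation if the distribution form is specified (Binomial, Gaussian, ...).
Important issues:
- Nature of the data (iid, correlated, ...)
- Objective function (MLE, MAP, ...)
- Algorithm (simple algebra, gradient methods, EM, ...)
- Evaluation scheme (likelihood on test data, predictability, consistency, ...)

Parameter Learning from iid data
Goal: estimate distribution parameters $\theta$ from a dataset of $N$ independent, identically distributed (iid), fully observed, training cases $D = \{x_1, \ldots, x_N\}$.
Maximum likelihood estimation (MLE):
1. One of the most common estimators.
2. With the iid and full-observability assumption, write $L(\theta)$ as the likelihood of the data:
$$L(\theta) = P(x_1, x_2, \ldots, x_N; \theta) = P(x_1; \theta)\, P(x_2; \theta) \cdots P(x_N; \theta) = \prod_{i=1}^{N} P(x_i; \theta)$$
3. Pick the setting of parameters most likely to have generated the data we saw:
$$\theta^{*} = \arg\max_{\theta} L(\theta) = \arg\max_{\theta} \log L(\theta)$$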

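Example 1: Bernoulli model
Data: We observed $N$ iid coin tosses: $D = \{x_1, \ldots, x_N\}$, a sequence of 0s and 1s.
Representation: binary r.v. $x_n \in \{0, 1\}$
Model:
$$P(x) = \begin{cases} 1 - \theta & \text{for } x = 0 \\ \theta & \text{for } x = 1 \end{cases} \quad\Rightarrow\quad P(x) = \theta^{x} (1 - \theta)^{1 - x}$$
How to write the likelihood of a single observation $x_i$?
$$P(x_i) = \theta^{x_i} (1 - \theta)^{1 - x_i}$$
The likelihood of dataset $D = \{x_1, \ldots, x_N\}$:
$$P(x_1, \ldots, x_N \mid \theta) = \prod_{i=1}^{N} P(x_i \mid \theta) = \prod_{i=1}^{N} \theta^{x_i} (1 - \theta)^{1 - x_i} = \theta^{\sum_i x_i} (1 - \theta)^{\sum_i (1 - x_i)} = \theta^{\#\text{head}} (1 - \theta)^{\#\text{tails}}$$

MLE
Objective function:
$$\ell(\theta; D) = \log P(D \mid \theta) = \log \theta^{n_h} (1 - \theta)^{n_t} = n_h \log \theta + (N - n_h) \log (1 - \theta)$$
We need to maximize this w.r.t. $\theta$. Take derivatives w.r.t. $\theta$:
$$\frac{\partial \ell}{\partial \theta} = \frac{n_h}{\theta} - \frac{N - n_h}{1 - \theta} = 0 \quad\Rightarrow\quad \hat{\theta}_{MLE} = \frac{n_h}{N}, \quad\text{or}\quad \hat{\theta}_{MLE} = \frac{1}{N} \sum_i x_i$$
(frequency as sample mean).
Sufficient statistics: the counts $n_h$, where $n_h = \sum_i x_i$, are sufficient statistics of data $D$.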
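The closed-form Bernoulli MLE can be compared against a brute-force search over the log likelihood. The toss sequence below is hypothetical, and the function name is an assumption made for this sketch.

```python
import math

def bernoulli_log_likelihood(theta, data):
    """n_h * log(theta) + n_t * log(1 - theta) for 0/1 data."""
    n_h = sum(data)
    n_t = len(data) - n_h
    return n_h * math.log(theta) + n_t * math.log(1 - theta)

data = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]  # hypothetical tosses, 1 = head

# Closed form: the fraction of heads.
theta_mle = sum(data) / len(data)

# Brute force: best value on a grid of candidate parameters in (0, 1).
grid = [i / 100 for i in range(1, 100)]
theta_grid = max(grid, key=lambda t: bernoulli_log_likelihood(t, data))
```

Only the counts of heads and tails enter the log likelihood, which is what it means for them to be sufficient statistics.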

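MLE for discrete (joint) distributions
More generally, it is easy to show that
$$P(\text{event}_i) = \frac{\#\text{ records in which event}_i \text{ is true}}{\text{total number of records}}$$
This is an important (but sometimes not so effective) learning algorithm!

Example 2: univariate normal
Data: We observed $N$ iid real samples: $D = \{x_1, \ldots, x_N\}$
Model:
$$P(x) = (2\pi\sigma^2)^{-1/2} \exp\{-(x - \mu)^2 / 2\sigma^2\}$$
Log likelihood:
$$\ell(\theta; D) = \log P(D \mid \theta) = -\frac{N}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{n=1}^{N} (x_n - \mu)^2$$
MLE: take derivatives and set to zero:
$$\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2} \sum_n (x_n - \mu), \qquad \frac{\partial \ell}{\partial \sigma^2} = -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_n (x_n - \mu)^2$$
$$\mu_{MLE} = \frac{1}{N} \sum_n x_n, \qquad \sigma^2_{MLE} = \frac{1}{N} \sum_n (x_n - \mu_{ML})^2$$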
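The two Gaussian estimators translate directly into code. This is a minimal sketch; the function name `gaussian_mle` and the sample values are made up for illustration.

```python
def gaussian_mle(samples):
    """Return (mu_MLE, sigma2_MLE): the sample mean and the 1/N variance."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n  # divides by N, not N - 1
    return mu, var

# Hypothetical real-valued observations.
mu_hat, var_hat = gaussian_mle([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

Note that the maximum likelihood variance divides by N, so it differs from the unbiased estimator that divides by N - 1.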

Overfitting
Recall that for the Bernoulli distribution, we have
$$\hat{\theta}^{head}_{ML} = \frac{n^{head}}{n^{head} + n^{tail}}$$
What if we tossed too few times so that we saw zero heads? We have $\hat{\theta}^{head}_{ML} = 0$, and we will predict that the probability of seeing a head next is zero!!!
The rescue:
$$\hat{\theta}^{head}_{ML} = \frac{n^{head} + n'}{n^{head} + n^{tail} + n'}$$
where $n'$ is known as the pseudo- (imaginary) count.
But can we make this more formal?

The Bayesian Theory
The Bayesian Theory (e.g., for data $D$ and model $M$):
$$P(M \mid D) = P(D \mid M)\, P(M) / P(D)$$
- The posterior equals the likelihood times the prior, up to a constant.
This allows us to capture uncertainty about the model in a principled way.

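Hierarchical Bayesian Models
- $\theta$ are the parameters for the likelihood $p(x \mid \theta)$.
- $\alpha$ are the parameters for the prior $p(\theta \mid \alpha)$.
- We can have hyper-hyper-parameters, etc.
- We stop when the choice of hyper-parameters makes no difference to the marginal likelihood; typically we make the hyper-parameters constants.
Where do we get the prior?
- Intelligent guesses
- Empirical Bayes (Type-II maximum likelihood): computing point estimates of $\alpha$:
$$\hat{\vec{\alpha}}_{MLE} = \arg\max_{\vec{\alpha}}\; p(\vec{n} \mid \vec{\alpha})$$

Bayesian estimation for Bernoulli
Beta distribution:
$$P(\theta; \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\, \Gamma(\beta)}\, \theta^{\alpha - 1} (1 - \theta)^{\beta - 1} = B(\alpha, \beta)\, \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}$$
Posterior distribution of $\theta$:
$$P(\theta \mid x_1, \ldots, x_N) = \frac{p(x_1, \ldots, x_N \mid \theta)\, p(\theta)}{p(x_1, \ldots, x_N)} \propto \theta^{n_h} (1 - \theta)^{n_t} \times \theta^{\alpha - 1} (1 - \theta)^{\beta - 1} = \theta^{n_h + \alpha - 1} (1 - \theta)^{n_t + \beta - 1}$$
Notice the isomorphism of the posterior to the prior; such a prior is called a conjugate prior.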

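Bayesian estimation for Bernoulli, con'd
Posterior distribution of $\theta$:
$$P(\theta \mid x_1, \ldots, x_N) = \frac{p(x_1, \ldots, x_N \mid \theta)\, p(\theta)}{p(x_1, \ldots, x_N)} \propto \theta^{n_h + \alpha - 1} (1 - \theta)^{n_t + \beta - 1}$$
Maximum a posteriori (MAP) estimation:
$$\theta_{MAP} = \arg\max_{\theta} \log P(\theta \mid x_1, \ldots, x_N)$$
Posterior mean estimation:
$$\theta_{Bayes} = \int \theta\, p(\theta \mid D)\, d\theta = C \int \theta \times \theta^{n_h + \alpha - 1} (1 - \theta)^{n_t + \beta - 1}\, d\theta = \frac{n_h + \alpha}{N + \alpha + \beta}$$
The Beta parameters can be understood as pseudo-counts.
Prior strength: $A = \alpha + \beta$. $A$ can be interpreted as the size of an imaginary data set from which we obtain the pseudo-counts.

Effect of Prior Strength
Suppose we have a uniform prior ($\alpha' = \beta' = 1/2$), and we observe $\vec{n} = (n_h = 2,\ n_t = 8)$.
- Weak prior $A = 2$. Posterior prediction:
$$p(x = h \mid n_h = 2, n_t = 8, \vec{\alpha} = \vec{\alpha}' \times 2) = \frac{1 + 2}{2 + 10} = 0.25$$
- Strong prior $A = 20$. Posterior prediction:
$$p(x = h \mid n_h = 2, n_t = 8, \vec{\alpha} = \vec{\alpha}' \times 20) = \frac{10 + 2}{20 + 10} = 0.40$$
However, if we have enough data, it washes away the prior. E.g., $\vec{n} = (n_h = 200,\ n_t = 800)$. Then the estimates under the weak and strong prior are $\frac{1 + 200}{2 + 1000}$ and $\frac{10 + 200}{20 + 1000}$, respectively, both of which are close to 0.2.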
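The prior-strength comparison follows directly from the posterior mean formula. The sketch below assumes a symmetric prior mean of 1/2 and the counts 2 heads / 8 tails and 200 heads / 800 tails; the function name `posterior_mean` is made up for illustration.

```python
def posterior_mean(n_h, n_t, prior_mean, strength):
    """Posterior mean (n_h + alpha) / (N + A) with alpha = prior_mean * A."""
    alpha = prior_mean * strength  # pseudo-count of heads
    return (n_h + alpha) / (n_h + n_t + strength)

# Small data set: the prior strength A matters a lot.
weak_small = posterior_mean(2, 8, 0.5, 2)       # (2 + 1) / (10 + 2)
strong_small = posterior_mean(2, 8, 0.5, 20)    # (2 + 10) / (10 + 20)

# Large data set: both estimates approach the empirical frequency 0.2.
weak_large = posterior_mean(200, 800, 0.5, 2)
strong_large = posterior_mean(200, 800, 0.5, 20)
```

With zero observed heads the same formula still returns a positive probability, which is the formal version of the pseudo-count rescue for overfitting.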

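Bayesian estimation for normal distribution
Normal prior:
$$P(\mu) = (2\pi\tau^2)^{-1/2} \exp\{-(\mu - \mu_0)^2 / 2\tau^2\}$$
Joint probability:
$$P(x, \mu) = (2\pi\sigma^2)^{-N/2} \exp\left\{-\frac{1}{2\sigma^2} \sum_{n=1}^{N} (x_n - \mu)^2\right\} \times (2\pi\tau^2)^{-1/2} \exp\{-(\mu - \mu_0)^2 / 2\tau^2\}$$
Posterior:
$$P(\mu \mid x) = (2\pi\tilde{\sigma}^2)^{-1/2} \exp\{-(\mu - \tilde{\mu})^2 / 2\tilde{\sigma}^2\}$$
where
$$\tilde{\mu} = \frac{N/\sigma^2}{N/\sigma^2 + 1/\tau^2}\, \bar{x} + \frac{1/\tau^2}{N/\sigma^2 + 1/\tau^2}\, \mu_0, \qquad \tilde{\sigma}^2 = \left(\frac{N}{\sigma^2} + \frac{1}{\tau^2}\right)^{-1}$$
and $\bar{x}$ is the sample mean.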
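The posterior for a normal mean with known variance is a precision-weighted average of the sample mean and the prior mean. The following sketch assumes a known noise variance; the function name, argument names, and example numbers are made up for illustration.

```python
def normal_posterior(samples, sigma2, mu0, tau2):
    """Posterior mean and variance of mu under a N(mu0, tau2) prior,
    for data with known noise variance sigma2."""
    n = len(samples)
    xbar = sum(samples) / n
    precision = n / sigma2 + 1 / tau2            # posterior precision
    mu_post = (n / sigma2 * xbar + mu0 / tau2) / precision
    return mu_post, 1 / precision

# Hypothetical data and hyper-parameters.
mu_post, var_post = normal_posterior([1.0, 2.0, 3.0], sigma2=1.0, mu0=0.0, tau2=1.0)
```

As the number of samples grows, the data term dominates the precision and the posterior mean moves toward the sample mean.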
