Machine learning: Density estimation

CS 2710 Foundations of AI
Lecture 3
Machine learning: Density estimation
Milos Hauskrecht
milos@cs.pitt.edu
5329 Sennott Square

Density estimation
Data: $D = \{D_1, D_2, \ldots, D_n\}$, where $D_i = \mathbf{x}_i$ is a vector of attribute values.
Objective: estimate the model of the underlying probability distribution over variables $X$, $p(X)$, using examples in $D$.
true distribution $p(X)$ -> $n$ samples $D = \{D_1, D_2, \ldots, D_n\}$ -> estimate $\hat{p}(X)$

Density estimation
true distribution $p(X)$ -> $n$ samples $D = \{D_1, D_2, \ldots, D_n\}$ -> estimate $\hat{p}(X)$
Standard (iid) assumptions. Samples:
- are independent of each other
- come from the same (identical) distribution (fixed $p(X)$)
That is, the instances are drawn independently from the same fixed distribution.

Learning via parameter estimation
In this lecture we consider parametric density estimation.
Basic settings:
- A set of random variables $X = \{X_1, X_2, \ldots, X_d\}$
- A model of the distribution over variables in $X$ with parameters $\Theta$
- Data $D = \{D_1, D_2, \ldots, D_n\}$
Objective: find the parameters $\hat{\Theta}$ that fit the data the best.
What is the best set of parameters? There are various criteria one can apply here.
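The setting above (a fixed true distribution, independent samples from it, and an estimate built from those samples) can be simulated in a few lines. This is only a sketch: the true parameter 0.6, the sample size, the seed and the function names `sample_coin` and `estimate_theta` are assumptions chosen for illustration, not part of the lecture.

```python
import random


def sample_coin(theta_true, n, seed=0):
    """Draw n iid samples (1 = head, 0 = tail) from a fixed coin distribution."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [1 if rng.random() < theta_true else 0 for _ in range(n)]


def estimate_theta(samples):
    """Estimate the head probability by the observed frequency of heads."""
    return sum(samples) / len(samples)


# Assumed "true" distribution: probability of a head is 0.6.
samples = sample_coin(theta_true=0.6, n=10_000)
theta_hat = estimate_theta(samples)
print(f"estimate from {len(samples)} samples: {theta_hat:.3f}")
```

With more samples the estimate tends to move closer to the assumed true value, which is the sense in which the samples stand in for the unknown distribution.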

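Parameter estimation. Basic criteria.
Maximum likelihood (ML): maximize $p(D \mid \Theta, \xi)$
- $\xi$ represents prior (background) knowledge
Maximum a posteriori probability (MAP): maximize $p(\Theta \mid D, \xi)$
- Selects the mode of the posterior
$$p(\Theta \mid D, \xi) = \frac{p(D \mid \Theta, \xi)\, p(\Theta \mid \xi)}{p(D \mid \xi)}$$

Parameter estimation. Coin example.
Coin example: we have a coin that can be biased.
Outcomes: two possible values, head or tail.
Data: $D$, a sequence of outcomes $x_i$ such that
- head: $x_i = 1$
- tail: $x_i = 0$
Model: probability of a head $\theta$, probability of a tail $(1 - \theta)$
Objective: we would like to estimate the probability of a head, $\hat{\theta}$, from data.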

Parameter estimation. Example.
Assume an unknown and possibly biased coin.
The probability of a head is $\theta$.
Data: $D$, a sequence of 25 coin flips (H = head, T = tail)
- Heads: 15
- Tails: 10
What would be your estimate of the probability of a head, $\tilde{\theta} = ?$

Solution: use the frequencies of the outcomes to do the estimate
$$\tilde{\theta} = \frac{15}{25} = 0.6$$
This is the maximum likelihood estimate of the parameter $\theta$.
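The frequency-based estimate for this example is a one-line computation. The sketch below assumes the counts of the example (15 heads and 10 tails in 25 flips); the function name `frequency_estimate` is a hypothetical helper, not something from the lecture.

```python
def frequency_estimate(n_heads: int, n_tails: int) -> float:
    """Estimate the probability of a head as the fraction of heads observed."""
    total = n_heads + n_tails
    if total == 0:
        raise ValueError("need at least one observed outcome")
    return n_heads / total


# Counts assumed from the coin example: 15 heads, 10 tails.
theta_tilde = frequency_estimate(15, 10)
print(theta_tilde)
```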

Probability of an outcome
Data: $D$, a sequence of outcomes $x_i$ such that head: $x_i = 1$, tail: $x_i = 0$
Model: probability of a head $\theta$ (here 0.6), probability of a tail $(1 - \theta)$ (here 0.4)
Assume: we know the probability $\theta$.
Probability of an outcome of a coin flip $x_i$ (Bernoulli distribution):
$$P(x_i \mid \theta) = \theta^{x_i} (1 - \theta)^{(1 - x_i)}$$
- Combines the probability of a head and a tail
- So that $x_i$ is going to pick its correct probability
- Gives $\theta$ (or 0.6) for $x_i = 1$
- Gives $(1 - \theta)$ (or 0.4) for $x_i = 0$

Probability of a sequence of outcomes.
Data: $D$, a sequence of outcomes $x_i$ such that head: $x_i = 1$, tail: $x_i = 0$
Model: probability of a head 0.6, probability of a tail 0.4
Assume: a sequence of independent coin flips $D$ = H H T H T H (encoded as $D = 110101$)
What is the probability of observing the data sequence $D$: $P(D \mid \theta) = ?$

Probability of a sequence of outcomes.
Data: $D$, a sequence of outcomes $x_i$ such that head: $x_i = 1$, tail: $x_i = 0$
Model: probability of a head 0.6, probability of a tail 0.4
Assume: a sequence of coin flips $D$ = H H T H T H (encoded as $D = 110101$)
What is the probability of observing the data sequence $D$?
$$P(D \mid \theta) = 0.6 \cdot 0.6 \cdot 0.4 \cdot 0.6 \cdot 0.4 \cdot 0.6 = 0.6^4 \cdot 0.4^2$$

The same sequence with a general model: probability of a head $\theta$, probability of a tail $(1 - \theta)$
$$P(D \mid \theta) = \theta\,\theta\,(1 - \theta)\,\theta\,(1 - \theta)\,\theta$$
This is the likelihood of the data.

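Probability of a sequence of outcomes.
Data: $D$, a sequence of outcomes $x_i$ such that head: $x_i = 1$, tail: $x_i = 0$
Model: probability of a head $\theta$, probability of a tail $(1 - \theta)$
Assume: a sequence of coin flips $D$ = H H T H T H (encoded as $D = 110101$)
What is the probability of observing the data sequence $D$?
$$P(D \mid \theta) = \theta\,\theta\,(1 - \theta)\,\theta\,(1 - \theta)\,\theta$$
It can be rewritten using the Bernoulli distribution:
$$P(D \mid \theta) = \prod_{i=1}^{6} \theta^{x_i} (1 - \theta)^{(1 - x_i)}$$

The goodness of fit to the data
Learning: we do not know the value of the parameter $\theta$.
Our learning goal: find the parameter $\theta$ that fits the data $D$ the best.
One solution to the "best": maximize the likelihood
$$P(D \mid \theta) = \prod_{i=1}^{n} \theta^{x_i} (1 - \theta)^{(1 - x_i)}$$
Intuition: the more likely the data are given the model, the better the fit.
Note: instead of an error function that measures how badly the data fit the model, we have a measure that tells us how well the data fit:
$$\mathrm{Error}(D, \theta) = -P(D \mid \theta)$$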
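The likelihood of a sequence is just the product of per-flip Bernoulli probabilities, which is easy to check numerically. The following is a minimal sketch for the six-flip sequence H H T H T H (encoded 110101); the function names `bernoulli` and `likelihood` are assumptions introduced for the illustration.

```python
def bernoulli(x, theta):
    """P(x | theta) for one flip: theta if x == 1 (head), 1 - theta if x == 0 (tail)."""
    return theta ** x * (1 - theta) ** (1 - x)


def likelihood(data, theta):
    """P(D | theta) for independent flips: product of the per-flip probabilities."""
    p = 1.0
    for x in data:
        p *= bernoulli(x, theta)
    return p


D = [1, 1, 0, 1, 0, 1]  # H H T H T H
print(likelihood(D, 0.6))    # same value as 0.6**4 * 0.4**2
print(likelihood(D, 4 / 6))  # the observed head frequency gives a higher likelihood
```

Comparing the two printed values illustrates the "goodness of fit" idea: a parameter closer to the observed frequency of heads makes the observed data more likely.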

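Maximum likelihood (ML) estimate.
Likelihood of data:
$$P(D \mid \theta, \xi) = \prod_{i=1}^{n} \theta^{x_i} (1 - \theta)^{(1 - x_i)}$$
Maximum likelihood estimate:
$$\theta_{ML} = \arg\max_{\theta} P(D \mid \theta, \xi)$$
Optimize the log-likelihood (the same as maximizing the likelihood):
$$l(D, \theta) = \log P(D \mid \theta, \xi) = \log \prod_{i=1}^{n} \theta^{x_i} (1 - \theta)^{(1 - x_i)} = \sum_{i=1}^{n} \left[ x_i \log \theta + (1 - x_i) \log (1 - \theta) \right]$$
$$l(D, \theta) = \log \theta \sum_{i=1}^{n} x_i + \log (1 - \theta) \sum_{i=1}^{n} (1 - x_i)$$
- $N_1 = \sum_{i=1}^{n} x_i$: number of heads seen
- $N_2 = \sum_{i=1}^{n} (1 - x_i)$: number of tails seen

Maximum likelihood (ML) estimate.
Optimize the log-likelihood:
$$l(D, \theta) = N_1 \log \theta + N_2 \log (1 - \theta)$$
Set the derivative to zero:
$$\frac{\partial l(D, \theta)}{\partial \theta} = \frac{N_1}{\theta} - \frac{N_2}{1 - \theta} = 0$$
Solving:
$$\theta = \frac{N_1}{N_1 + N_2}$$
ML solution:
$$\theta_{ML} = \frac{N_1}{N} = \frac{N_1}{N_1 + N_2}$$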
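The closed-form ML solution can be sanity-checked against a brute-force search over the log-likelihood. This is a sketch under stated assumptions: the counts 15 and 10 come from the coin example, while the grid resolution and the names `log_likelihood` and `ml_estimate` are choices made for this illustration.

```python
import math


def log_likelihood(theta, n1, n2):
    """l(D, theta) = N1 * log(theta) + N2 * log(1 - theta)."""
    return n1 * math.log(theta) + n2 * math.log(1 - theta)


def ml_estimate(n1, n2):
    """Closed-form maximizer obtained by setting the derivative to zero."""
    return n1 / (n1 + n2)


n1, n2 = 15, 10  # heads and tails seen in the coin example

# Brute-force check: evaluate the log-likelihood on a grid strictly inside (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
theta_grid = max(grid, key=lambda t: log_likelihood(t, n1, n2))

print(ml_estimate(n1, n2), theta_grid)
```

The grid search is only there as a check; the closed form is what one would use in practice.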

Maximum likelihood estimate. Example
Assume an unknown and possibly biased coin.
The probability of a head is $\theta$.
Data: $D$, a sequence of 25 coin flips (H = head, T = tail)
- Heads: 15
- Tails: 10
What is the ML estimate of the probability of a head and a tail?
Head: $\theta_{ML} = \frac{N_1}{N} = \frac{N_1}{N_1 + N_2} = \frac{15}{25} = 0.6$
Tail: $(1 - \theta_{ML}) = \frac{N_2}{N} = \frac{N_2}{N_1 + N_2} = \frac{10}{25} = 0.4$

Learning of BBN parameters. Example.
Example: Pneumonia
Network: Pneumonia is the parent of Paleness, Fever, Cough and High WBC.
Parameters to learn:
- P(Pneumonia): T ?, F ?
- P(HWBC | Pneum): for each of Pneum = T and Pneum = F, T ?, F ?
- P(Palen | Pneum) = ?
- P(Fever | Pneum) = ?
- P(Cough | Pneum) = ?
CS 1571 Intro to AI

Learning of BBN parameters. Example.
Data $D$ (different patient cases): a table with one row per patient case and one T/F column for each of Pal, Fev, Cou, HWB and Pneu.

Estimates of parameters of BBN
- Much like multiple coin tosses
- A "smaller" learning problem corresponds to the learning of exactly one conditional distribution
- Example: P(Fever | Pneumonia = T)
Problem: how to pick the data to learn?

Learning of BBN parameters. Example.
Data $D$ (different patient cases): the table of T/F values with columns Pal, Fev, Cou, HWB, Pneu.
How to estimate P(Fever | Pneumonia = T) = ?

Learning of BBN parameters. Example.
Learn: P(Fever | Pneumonia = T)
Step 1: Select the data points with Pneumonia = T in the table of patient cases (columns Pal, Fev, Cou, HWB, Pneu).

Learning of BBN parameters. Example.
Learn: P(Fever | Pneumonia = T)
Step 1: Ignore the rest. Only the cases with Pneumonia = T remain.

Learning of BBN parameters. Example.
Learn: P(Fever | Pneumonia = T)
Step 2: Select the values of the random variable defining the distribution of Fever, that is, the Fev column of the remaining cases.

Learning of BBN parameters. Example.
Learn: P(Fever | Pneumonia = T)
Step 2: Ignore the rest. Only the Fev column of the cases with Pneumonia = T remains.
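Steps 1 and 2 amount to filtering the cases by the parent value and keeping one column; counting the values that remain then gives the ML estimate of the conditional distribution, exactly as for coin tosses. The sketch below illustrates this. The patient records are made up for the illustration (the table values in the slides did not survive extraction), and `conditional_ml` is a hypothetical helper name.

```python
# Made-up patient cases, for illustration only (not the table from the slides).
cases = [
    {"Fev": True,  "Pneu": True},
    {"Fev": False, "Pneu": False},
    {"Fev": True,  "Pneu": True},
    {"Fev": False, "Pneu": True},
    {"Fev": True,  "Pneu": False},
    {"Fev": True,  "Pneu": True},
    {"Fev": False, "Pneu": True},
    {"Fev": False, "Pneu": False},
]


def conditional_ml(cases, target, parent, parent_value):
    """ML estimate of P(target = T | parent = parent_value) from counts."""
    # Step 1: keep only the cases with the required parent value.
    # Step 2: keep only the target column of those cases.
    selected = [c[target] for c in cases if c[parent] == parent_value]
    if not selected:
        raise ValueError("no cases with the requested parent value")
    # Step 3: the fraction of T values, as in the coin example.
    return sum(selected) / len(selected)


p_fever = conditional_ml(cases, target="Fev", parent="Pneu", parent_value=True)
print(p_fever)  # 3 of the 5 selected made-up cases have Fev = T
```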

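Learning of BBN parameters. Example.
Learn: P(Fever | Pneumonia = T)
Step 3: Learn the ML estimate from the Fev column of the selected cases, using the counts of T and F values as in the coin example: $\theta_{ML} = \frac{N_1}{N_1 + N_2}$.

Maximum a posteriori estimate
Maximum a posteriori estimate: selects the mode of the posterior distribution
$$\theta_{MAP} = \arg\max_{\theta} p(\theta \mid D, \xi)$$
Via Bayes rule (likelihood of data times prior):
$$p(\theta \mid D, \xi) = \frac{P(D \mid \theta, \xi)\, p(\theta \mid \xi)}{P(D \mid \xi)}$$
$$P(D \mid \theta, \xi) = \prod_{i=1}^{n} \theta^{x_i} (1 - \theta)^{(1 - x_i)} = \theta^{N_1} (1 - \theta)^{N_2}$$
- $p(\theta \mid \xi)$ is the prior probability on $\theta$
- $P(D \mid \xi)$ is the normalizing factor
How to choose the prior probability?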

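Prior distribution
Choice of prior: Beta distribution
$$p(\theta \mid \xi) = \mathrm{Beta}(\theta \mid \alpha_1, \alpha_2) = \frac{\Gamma(\alpha_1 + \alpha_2)}{\Gamma(\alpha_1)\,\Gamma(\alpha_2)}\, \theta^{\alpha_1 - 1} (1 - \theta)^{\alpha_2 - 1}$$
$\Gamma(x)$ is the Gamma function, with $\Gamma(x) = (x - 1)\,\Gamma(x - 1)$. For integer values of $x$: $\Gamma(n) = (n - 1)!$
Why use the Beta distribution? The Beta distribution "fits" Bernoulli trials (conjugate choices):
$$P(D \mid \theta, \xi) = \theta^{N_1} (1 - \theta)^{N_2}$$
The posterior distribution is again a Beta distribution:
$$p(\theta \mid D, \xi) = \mathrm{Beta}(\theta \mid \alpha_1 + N_1, \alpha_2 + N_2)$$

CS 2750 Machine Learning
Beta distribution
$$p(\theta) = \mathrm{Beta}(\theta \mid a, b) = \frac{\Gamma(a + b)}{\Gamma(a)\,\Gamma(b)}\, \theta^{a - 1} (1 - \theta)^{b - 1}$$
[Figure: Beta densities $p(\theta)$ for several settings of $a$ and $b$]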
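The Beta density and the conjugate update are simple to write down with the standard-library Gamma function. This is a minimal sketch: `beta_pdf` and `posterior_params` are names assumed for the illustration, and the parameter values used in the checks (a Beta(5, 5) prior, 15 heads, 10 tails) are example settings.

```python
import math


def beta_pdf(theta, a, b):
    """Beta(theta | a, b) = Gamma(a+b) / (Gamma(a) Gamma(b)) * theta^(a-1) * (1-theta)^(b-1)."""
    coef = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return coef * theta ** (a - 1) * (1 - theta) ** (b - 1)


def posterior_params(a, b, n1, n2):
    """Conjugate update: a Beta(a, b) prior with N1 heads and N2 tails gives Beta(a+N1, b+N2)."""
    return a + n1, b + n2


# Gamma(n) = (n - 1)! for integer n.
print(math.gamma(5), math.factorial(4))

# Numerical check that the density integrates to about 1 (midpoint rule).
steps = 10_000
area = sum(beta_pdf((i + 0.5) / steps, 5, 5) for i in range(steps)) / steps
print(area)

print(posterior_params(5, 5, 15, 10))
```

The update function makes the "prior counts" reading concrete: the prior parameters are simply added to the observed counts.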

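Posterior distribution
[Figure: likelihood $\times$ prior = posterior $p(\theta \mid D, \xi)$, up to normalization]

Maximum a posteriori probability
Maximum a posteriori estimate: selects the mode of the posterior distribution
$$p(\theta \mid D, \xi) = \mathrm{Beta}(\theta \mid \alpha_1 + N_1, \alpha_2 + N_2) = \frac{\Gamma(\alpha_1 + \alpha_2 + N_1 + N_2)}{\Gamma(\alpha_1 + N_1)\,\Gamma(\alpha_2 + N_2)}\, \theta^{N_1 + \alpha_1 - 1} (1 - \theta)^{N_2 + \alpha_2 - 1}$$
Notice that the parameters of the prior act like counts of heads and tails (sometimes they are also referred to as prior counts).
MAP solution:
$$\theta_{MAP} = \frac{\alpha_1 + N_1 - 1}{\alpha_1 + \alpha_2 + N_1 + N_2 - 2}$$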
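The MAP solution is the mode of the Beta posterior, which can be checked against a direct search over the unnormalized log-posterior. The sketch below assumes 15 heads, 10 tails and a Beta(5, 5) prior as example inputs; the function names and the grid resolution are choices made for this illustration.

```python
import math


def map_estimate(n1, n2, a1, a2):
    """Mode of Beta(a1 + N1, a2 + N2); the formula needs both posterior parameters > 1."""
    if a1 + n1 <= 1 or a2 + n2 <= 1:
        raise ValueError("mode formula requires both posterior parameters > 1")
    return (a1 + n1 - 1) / (a1 + a2 + n1 + n2 - 2)


def log_posterior_unnormalized(theta, n1, n2, a1, a2):
    """log of theta^(N1+a1-1) * (1-theta)^(N2+a2-1); the normalizing constant is dropped."""
    return (n1 + a1 - 1) * math.log(theta) + (n2 + a2 - 1) * math.log(1 - theta)


n1, n2, a1, a2 = 15, 10, 5, 5  # example counts and prior parameters

# Brute-force check on a grid strictly inside (0, 1).
grid = [i / 10_000 for i in range(1, 10_000)]
theta_grid = max(grid, key=lambda t: log_posterior_unnormalized(t, n1, n2, a1, a2))

print(map_estimate(n1, n2, a1, a2), theta_grid)
```

Dropping the normalizing constant is safe here because it does not depend on $\theta$ and so does not change where the maximum lies.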

MAP estimate example
Assume an unknown and possibly biased coin.
The probability of a head is $\theta$.
Data: $D$, a sequence of 25 coin flips (H = head, T = tail)
- Heads: 15
- Tails: 10
Assume $p(\theta \mid \xi) = \mathrm{Beta}(\theta \mid 5, 5)$.
What is the MAP estimate?
$$\theta_{MAP} = \frac{N_1 + \alpha_1 - 1}{N_1 + N_2 + \alpha_1 + \alpha_2 - 2} = \frac{15 + 5 - 1}{25 + 10 - 2} = \frac{19}{33}$$

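MAP estimate example
- Note that the prior and the data fit (data likelihood) are combined
- The MAP can be biased with large prior counts
- It is hard to overturn it with a smaller sample size
Data: $D$, a sequence of 25 coin flips (Heads: 15, Tails: 10)
Assume $p(\theta \mid \xi) = \mathrm{Beta}(\theta \mid 5, 5)$: $\theta_{MAP} = \frac{19}{33}$
Assume $p(\theta \mid \xi) = \mathrm{Beta}(\theta \mid 5, 20)$: $\theta_{MAP} = \frac{19}{48}$

Learning of BBN parameters
Learn: P(Fever | Pneumonia = T)
Assume the prior $\theta_{Fever \mid Pneumonia = T} \sim \mathrm{Beta}(3, 4)$
Data: the Fev column of the cases with Pneumonia = T (3 values T, 2 values F)
Posterior: $\theta_{Fever \mid Pneumonia = T} \sim \mathrm{Beta}(3 + 3, 4 + 2) = \mathrm{Beta}(6, 6)$
MAP estimates:
$$P(Fever = T \mid Pneumonia = T) = \frac{6 - 1}{6 + 6 - 2} = 0.5, \qquad P(Fever = F \mid Pneumonia = T) = 0.5$$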
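The effect of prior counts on the MAP estimate can be seen by computing it exactly for a weak and a strong prior. In this sketch the counts (15 heads, 10 tails) and the Beta(5, 5), Beta(5, 20) and Beta(3, 4) priors follow the examples; the tenfold larger sample (150 heads, 100 tails) is a hypothetical addition used only to show the prior being outweighed, and `map_estimate` is a name assumed for the illustration.

```python
from fractions import Fraction


def map_estimate(n1, n2, a1, a2):
    """Exact mode of the Beta(a1 + N1, a2 + N2) posterior as a fraction."""
    return Fraction(n1 + a1 - 1, n1 + n2 + a1 + a2 - 2)


ml = Fraction(15, 25)                          # ML estimate, 3/5
weak_prior = map_estimate(15, 10, 5, 5)        # Beta(5, 5) prior
strong_prior = map_estimate(15, 10, 5, 20)     # Beta(5, 20) prior

# Hypothetical larger sample with the same proportion of heads.
strong_prior_more_data = map_estimate(150, 100, 5, 20)

# Fever example: Beta(3, 4) prior with 3 T and 2 F observed.
fever = map_estimate(3, 2, 3, 4)

for name, value in [
    ("ML", ml),
    ("MAP, Beta(5,5)", weak_prior),
    ("MAP, Beta(5,20)", strong_prior),
    ("MAP, Beta(5,20), 10x data", strong_prior_more_data),
    ("MAP, fever example", fever),
]:
    print(f"{name}: {value} = {float(value):.3f}")
```

With the strong prior the small sample leaves the estimate far from the observed frequency, while the larger hypothetical sample pulls it back toward the ML value.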
