CS 2750 Machine Learning
Lecture 5: Density estimation

Milos Hauskrecht
milos@pitt.edu
5329 Sennott Square

Density estimation

Density estimation is an unsupervised learning problem.
Goal: learn a model that represents the relations among attributes in the data.
Data: $D = \{D_1, D_2, \dots, D_n\}$, where $D_i = \mathbf{x}_i$ is a vector of attribute values.
Attributes:
  - modeled by random variables $\mathbf{X} = \{X_1, X_2, \dots, X_d\}$
  - continuous or discrete valued variables
Density estimation: learn an underlying probability distribution model $p(\mathbf{X}) = p(X_1, X_2, \dots, X_d)$ from $D$.

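Density estimation

Data: $D = \{D_1, D_2, \dots, D_n\}$, where $D_i = \mathbf{x}_i$ is a vector of attribute values.
Objective: estimate the model of the underlying probability distribution over variables $\mathbf{X}$, $p(\mathbf{X})$, using the examples in $D$.

  true distribution $p(\mathbf{X})$  ->  $n$ samples $D = \{D_1, \dots, D_n\}$  ->  estimate $\hat{p}(\mathbf{X})$

Standard (iid) assumptions: the samples
  - are independent of each other
  - come from the same (identical) distribution (fixed $p(\mathbf{X})$)
In other words, the instances are drawn independently from the same fixed distribution.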

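Density estimation

Types of density estimation:
Parametric
  - the distribution is modeled using a set of parameters $\Theta$: $\hat{p}(\mathbf{X}) = p(\mathbf{X} \mid \Theta)$
  - Example: mean and covariances of a multivariate normal
  - Estimation: find the parameters $\Theta$ describing the data $D$
Non-parametric
  - the model of the distribution utilizes all examples in $D$
  - as if all examples were parameters of the distribution
  - Example: nearest-neighbor

Learning via parameter estimation

In this lecture we consider parametric density estimation.
Basic settings:
  - A set of random variables $\mathbf{X} = \{X_1, X_2, \dots, X_d\}$
  - A model of the distribution over the variables in $\mathbf{X}$ with parameters $\Theta$: $\hat{p}(\mathbf{X} \mid \Theta)$
    Example: Gaussian distribution with mean and variance parameters
  - Data $D = \{D_1, D_2, \dots, D_n\}$
Objective: find the parameters $\Theta$ such that $p(\mathbf{X} \mid \Theta)$ fits the data $D$ the best.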

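ML parameter estimation

Model: $\hat{p}(\mathbf{X}) = p(\mathbf{X} \mid \Theta)$
Data: $D = \{D_1, D_2, \dots, D_n\}$
Maximum likelihood (ML): find the $\Theta$ that maximizes the likelihood $p(D \mid \Theta)$:
$$\Theta_{ML} = \arg\max_{\Theta} p(D \mid \Theta)$$
For independent examples the likelihood factorizes:
$$p(D \mid \Theta) = p(D_1, D_2, \dots, D_n \mid \Theta) = \prod_{i=1}^{n} p(D_i \mid \Theta)$$
Equivalently, maximize the log-likelihood:
$$\log p(D \mid \Theta) = \sum_{i=1}^{n} \log p(D_i \mid \Theta), \qquad \Theta_{ML} = \arg\max_{\Theta} \sum_{i=1}^{n} \log p(D_i \mid \Theta)$$

Bayesian parameter estimation

The ML estimate picks just one value of the parameter.
Problem: if two different parameter values are close in terms of the likelihood, using only one of them may introduce a strong bias, for example when we use it for predictions.
Bayesian parameter estimation:
  - remedies the limitation of one choice
  - uses the posterior distribution for the parameters $\Theta$
  - the posterior "covers" all possible parameter values (and their weights)
$$p(\Theta \mid D) = \frac{p(D \mid \Theta)\, p(\Theta)}{p(D)}$$
Here $p(\Theta \mid D)$ is the parameter posterior, $p(D \mid \Theta)$ the data likelihood, and $p(\Theta)$ the parameter prior.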

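Bayesian parameter estimation

What does it do? The prior and the posterior "cover" all possible parameter values and their weights.
Assume: we have a model $p(x \mid \theta)$ with a parameter $\theta$.
  - Bayesian parameter estimation: prior on the parameter $\theta$ + data $D$ = posterior on the parameter $\theta$
  - ML estimate: data $D$ = just one value of $\theta$
[Figure: a prior density over $\theta$ combined with data $D$ gives a posterior density over $\theta$; the ML estimate is a single point.]

Bayesian parameter estimation

How to use the posterior for modeling $p(\mathbf{X})$? Average the model over all parameter values, weighted by the posterior:
$$\hat{p}(\mathbf{X}) = p(\mathbf{X} \mid D) = \int_{\Theta} p(\mathbf{X} \mid \Theta)\, p(\Theta \mid D)\, d\Theta$$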

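Parameter estimation

Other criteria:
Maximum a posteriori probability (MAP)
  - maximize $p(\Theta \mid D)$ (the mode of the posterior)
  - Yields: one set of parameters $\Theta_{MAP}$
  - Approximation: $\hat{p}(\mathbf{X}) = p(\mathbf{X} \mid \Theta_{MAP})$
Expected value of the parameter
  - $\hat{\Theta} = E(\Theta)$ (the mean of the posterior)
  - The expectation is taken with regard to the posterior $p(\Theta \mid D)$
  - Yields: one set of parameters $\hat{\Theta}$
  - Approximation: $\hat{p}(\mathbf{X}) = p(\mathbf{X} \mid \hat{\Theta})$

Parameter estimation. Coin example.

Coin example: we have a coin that can be biased.
Outcomes: two possible values, head or tail.
Data: $D$, a sequence of outcomes $x_i$ such that
  - head: $x_i = 1$
  - tail: $x_i = 0$
Model: probability of a head $\theta$, probability of a tail $(1 - \theta)$.
Objective: we would like to estimate the probability of a head, $\hat{\theta}$, from data.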

Parameter estimation. Example.

Assume an unknown and possibly biased coin. The probability of a head is $\theta$.
Data: a sequence of 25 coin flips. Heads: 15, Tails: 10.
What would be your estimate of the probability of a head, $\tilde{\theta}$?
Solution: use frequencies of occurrences to do the estimate:
$$\tilde{\theta} = \frac{15}{25} = 0.6$$
This is the maximum likelihood estimate of the parameter $\theta$.

Probability of an outcome

Data: $D$, a sequence of outcomes $x_i$ such that head: $x_i = 1$, tail: $x_i = 0$.
Model: probability of a head $\theta$, probability of a tail $(1 - \theta)$.
Assume: we know the probability $\theta$.
Probability of an outcome of a coin flip $x_i$ (Bernoulli distribution):
$$P(x_i \mid \theta) = \theta^{x_i} (1 - \theta)^{1 - x_i}$$
  - Combines the probability of a head and a tail, so that $x_i$ picks its correct probability
  - Gives $\theta$ for $x_i = 1$
  - Gives $(1 - \theta)$ for $x_i = 0$

Probability of a sequence of outcomes

Assume: a sequence of independent coin flips $D$ = H H T H T H, encoded as $D = 110101$.
What is the probability of observing the data sequence $D$, $P(D \mid \theta)$?

Probability of a sequence of outcomes (continued)

For the sequence $D$ = H H T H T H (encoded as $D = 110101$), the probabilities of the independent flips multiply:
$$P(D \mid \theta) = \theta\,\theta\,(1 - \theta)\,\theta\,(1 - \theta)\,\theta$$
This quantity is the likelihood of the data.

Probability of a sequence of outcomes (continued)

The likelihood of a sequence of six flips can be rewritten using the Bernoulli distribution:
$$P(D \mid \theta) = \prod_{i=1}^{6} \theta^{x_i} (1 - \theta)^{1 - x_i}$$

The goodness of fit to the data

Learning: we do not know the value of the parameter $\theta$.
Our learning goal: find the parameter $\theta$ that fits the data $D$ the best.
One solution to the "best": maximize the likelihood
$$P(D \mid \theta) = \prod_{i=1}^{n} \theta^{x_i} (1 - \theta)^{1 - x_i}$$
Intuition: the more likely the data are given the model, the better the fit.
Note: instead of an error function that measures how badly the data fit the model, we have a measure that tells us how well the data fit:
$$\mathrm{Error}(D, \theta) = -P(D \mid \theta)$$
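The likelihood of a coin-flip sequence can be evaluated directly from the Bernoulli product. The sketch below is illustrative only: the function name `bernoulli_likelihood` and the example sequence (four heads, two tails, encoded as 1 and 0) are assumptions made for this example.

```python
def bernoulli_likelihood(flips, theta):
    """P(D | theta) for iid coin flips encoded as 1 (head) and 0 (tail)."""
    likelihood = 1.0
    for x in flips:
        # theta**x * (1 - theta)**(1 - x) picks theta for a head, 1 - theta for a tail
        likelihood *= theta ** x * (1.0 - theta) ** (1 - x)
    return likelihood


flips = [1, 1, 0, 1, 0, 1]  # illustrative sequence: four heads, two tails
for theta in (0.3, 0.5, 2 / 3, 0.9):
    print(f"theta={theta:.3f}  likelihood={bernoulli_likelihood(flips, theta):.5f}")
```

Among the values tried, the likelihood is largest at theta = 2/3, the fraction of heads in the sequence, which matches the intuition that a better-fitting parameter makes the data more likely.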

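Maximum likelihood (ML) estimate

Likelihood of data:
$$P(D \mid \theta) = \prod_{i=1}^{n} \theta^{x_i} (1 - \theta)^{1 - x_i}$$
Maximum likelihood estimate:
$$\theta_{ML} = \arg\max_{\theta} P(D \mid \theta)$$
Optimize the log-likelihood (the same as maximizing the likelihood):
$$l(D, \theta) = \log P(D \mid \theta) = \log \prod_{i=1}^{n} \theta^{x_i} (1 - \theta)^{1 - x_i} = \sum_{i=1}^{n} \left[ x_i \log \theta + (1 - x_i) \log (1 - \theta) \right]$$
$$= \log \theta \sum_{i=1}^{n} x_i + \log (1 - \theta) \sum_{i=1}^{n} (1 - x_i) = N_1 \log \theta + N_2 \log (1 - \theta)$$
where $N_1$ is the number of heads seen and $N_2$ is the number of tails seen.

Set the derivative to zero:
$$\frac{\partial l(D, \theta)}{\partial \theta} = \frac{N_1}{\theta} - \frac{N_2}{1 - \theta} = 0$$
Solving gives $\theta = \frac{N_1}{N_1 + N_2}$.
ML solution:
$$\theta_{ML} = \frac{N_1}{N} = \frac{N_1}{N_1 + N_2}$$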

Maximum likelihood estimate. Example

Assume an unknown and possibly biased coin. The probability of a head is $\theta$.
Data: a sequence of 25 coin flips. Heads: 15, Tails: 10.
What is the ML estimate of the probability of a head and of a tail?
  - Head: $\theta_{ML} = \frac{N_1}{N} = \frac{15}{25} = 0.6$
  - Tail: $1 - \theta_{ML} = \frac{N_2}{N} = \frac{10}{25} = 0.4$
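As a quick check of the closed-form ML estimate for the coin with 15 heads and 10 tails, the sketch below compares $N_1 / (N_1 + N_2)$ with a brute-force search over a grid of parameter values. The function names and the grid resolution are assumptions chosen for illustration.

```python
import math


def log_likelihood(theta, n_heads, n_tails):
    """Bernoulli log-likelihood l(D, theta) = N1 log(theta) + N2 log(1 - theta)."""
    return n_heads * math.log(theta) + n_tails * math.log(1.0 - theta)


def bernoulli_mle(n_heads, n_tails):
    """Closed-form ML estimate N1 / (N1 + N2)."""
    return n_heads / (n_heads + n_tails)


theta_ml = bernoulli_mle(15, 10)

# Brute-force check: the grid point with the highest log-likelihood.
grid = [i / 1000 for i in range(1, 1000)]
theta_grid = max(grid, key=lambda t: log_likelihood(t, 15, 10))

print(theta_ml, theta_grid)
```

Both approaches should agree up to the grid resolution, since the log-likelihood is concave in theta and has a single maximum.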

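Bayesian parameter estimation

Uses the distributions (prior and posterior) over all possible values of the parameter $\theta$ of the sampling distribution (Bernoulli).
Posterior via Bayes theorem:
$$p(\theta \mid D) = \frac{P(D \mid \theta)\, p(\theta)}{P(D)}$$
  - $P(D \mid \theta)$: likelihood of data
  - $p(\theta)$: prior probability on $\theta$
  - $P(D)$: normalizing factor
We know that the likelihood is:
$$P(D \mid \theta) = \theta^{N_1} (1 - \theta)^{N_2}$$
How to choose the prior probability?

Prior distribution

Choice of prior: Beta distribution
$$p(\theta) = \mathrm{Beta}(\theta \mid \alpha_1, \alpha_2) = \frac{\Gamma(\alpha_1 + \alpha_2)}{\Gamma(\alpha_1)\,\Gamma(\alpha_2)}\, \theta^{\alpha_1 - 1} (1 - \theta)^{\alpha_2 - 1}$$
$\Gamma(x)$ is the Gamma function. For integer values of $x$: $\Gamma(n) = (n - 1)!$
Why use the Beta distribution? The Beta distribution "fits" the Bernoulli sample (conjugate choices):
$$P(D \mid \theta) = \theta^{N_1} (1 - \theta)^{N_2}$$
The posterior distribution is again a Beta distribution:
$$p(\theta \mid D) = \frac{P(D \mid \theta)\, \mathrm{Beta}(\theta \mid \alpha_1, \alpha_2)}{P(D)} = \mathrm{Beta}(\theta \mid \alpha_1 + N_1, \alpha_2 + N_2)$$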

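Beta distribution

$$\mathrm{Beta}(\theta \mid a, b) = \frac{\Gamma(a + b)}{\Gamma(a)\,\Gamma(b)}\, \theta^{a - 1} (1 - \theta)^{b - 1}$$
[Figure: Beta densities for several settings of $a$ and $b$.]

Posterior distribution

[Figure: prior $\mathrm{Beta}(\theta \mid \alpha_1, \alpha_2)$ times the likelihood $P(D \mid \theta)$ gives the posterior $\mathrm{Beta}(\theta \mid \alpha_1 + N_1, \alpha_2 + N_2)$.]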

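Posterior distribution

$$p(\theta \mid D) = \frac{P(D \mid \theta)\, \mathrm{Beta}(\theta \mid \alpha_1, \alpha_2)}{P(D)} = \mathrm{Beta}(\theta \mid \alpha_1 + N_1, \alpha_2 + N_2)$$
The Beta distribution is a conjugate prior to the Bernoulli sample.
Notice that the parameters of the prior act like counts of heads and tails (sometimes they are also referred to as prior counts).

Maximum a posteriori probability (MAP)

The maximum a posteriori estimate selects the mode of the posterior distribution:
$$\theta_{MAP} = \arg\max_{\theta} p(\theta \mid D)$$
The posterior is obtained via Bayes rule and is represented as a Beta distribution:
$$p(\theta \mid D) = \frac{P(D \mid \theta)\, p(\theta)}{P(D)}$$
  - Likelihood of data: $P(D \mid \theta) = \theta^{N_1} (1 - \theta)^{N_2}$
  - Prior: $p(\theta) = \mathrm{Beta}(\theta \mid \alpha_1, \alpha_2)$
  - Normalizing factor: $P(D)$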
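The conjugate update can be sketched in a few lines: the posterior parameters are the prior parameters plus the observed counts. The prior Beta(5, 5), the counts (15 heads, 10 tails), and the function names below are illustrative assumptions; the density is evaluated with `math.lgamma` to avoid overflow in the Gamma function.

```python
import math


def beta_pdf(theta, a, b):
    """Density of Beta(theta | a, b) for 0 < theta < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1.0 - theta))


def beta_posterior(alpha1, alpha2, n_heads, n_tails):
    """Conjugate update: the prior parameters act like extra head/tail counts."""
    return alpha1 + n_heads, alpha2 + n_tails


a_post, b_post = beta_posterior(5, 5, 15, 10)
print("posterior parameters:", a_post, b_post)
for theta in (0.4, 0.5, 0.6, 0.7):
    print(f"theta={theta:.1f}  prior={beta_pdf(theta, 5, 5):.4f}  "
          f"posterior={beta_pdf(theta, a_post, b_post):.4f}")
```

Because the posterior is proportional to likelihood times prior, ratios of posterior densities at two values of theta equal the corresponding ratios of likelihood times prior; the normalizing factor never has to be computed explicitly.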

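Maximum a posteriori probability

The maximum a posteriori estimate selects the mode of the posterior distribution.
Assume a conjugate prior to the Bernoulli sample, so that the posterior is
$$p(\theta \mid D) = \mathrm{Beta}(\theta \mid \alpha_1 + N_1, \alpha_2 + N_2) = \frac{\Gamma(\alpha_1 + \alpha_2 + N_1 + N_2)}{\Gamma(\alpha_1 + N_1)\,\Gamma(\alpha_2 + N_2)}\, \theta^{N_1 + \alpha_1 - 1} (1 - \theta)^{N_2 + \alpha_2 - 1}$$
The mode of the posterior satisfies:
$$\frac{\partial \log p(\theta \mid D)}{\partial \theta} = 0$$
MAP solution:
$$\theta_{MAP} = \frac{\alpha_1 + N_1 - 1}{\alpha_1 + \alpha_2 + N_1 + N_2 - 2}$$

MAP estimate example

Assume an unknown and possibly biased coin. The probability of a head is $\theta$.
Data: a sequence of 25 coin flips. Heads: 15, Tails: 10.
Assume $p(\theta) = \mathrm{Beta}(\theta \mid 5, 5)$.
What is the MAP estimate?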

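MAP estimate example

Assume an unknown and possibly biased coin. The probability of a head is $\theta$.
Data: a sequence of 25 coin flips. Heads: 15, Tails: 10.
Assume $p(\theta) = \mathrm{Beta}(\theta \mid 5, 5)$.
What is the MAP estimate?
$$\theta_{MAP} = \frac{N_1 + \alpha_1 - 1}{N + \alpha_1 + \alpha_2 - 2} = \frac{19}{33}$$

MAP estimate example

Note that the prior and the data fit (the data likelihood) are combined.
The MAP estimate can be biased by large prior counts; it is hard to overturn them with a smaller sample size.
Data: the same 25 coin flips. Heads: 15, Tails: 10.
  - Assume $p(\theta) = \mathrm{Beta}(\theta \mid 5, 5)$: $\theta_{MAP} = \frac{19}{33}$
  - Assume $p(\theta) = \mathrm{Beta}(\theta \mid 5, 20)$: $\theta_{MAP} = \frac{19}{48}$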
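The two MAP values in this example can be reproduced with exact rational arithmetic. This is a minimal sketch; the function name `beta_map` is an assumption, and the formula is the mode of a Beta posterior, which requires both posterior parameters to exceed 1.

```python
from fractions import Fraction


def beta_map(alpha1, alpha2, n_heads, n_tails):
    """Mode of Beta(alpha1 + N1, alpha2 + N2); valid when both parameters exceed 1."""
    return Fraction(alpha1 + n_heads - 1, alpha1 + alpha2 + n_heads + n_tails - 2)


n_heads, n_tails = 15, 10
print("ML estimate:     ", Fraction(n_heads, n_heads + n_tails))
print("MAP, Beta(5, 5): ", beta_map(5, 5, n_heads, n_tails))
print("MAP, Beta(5, 20):", beta_map(5, 20, n_heads, n_tails))
```

With the symmetric prior the MAP estimate stays close to the ML value 3/5, while the prior with 20 pseudo-counts for tails pulls it below 1/2 even though heads dominate the data.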

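Bayesian framework

Predictive probability of an outcome $x = 1$ in the next trial:
$$P(x = 1 \mid D) = \int_0^1 P(x = 1 \mid \theta)\, p(\theta \mid D)\, d\theta = \int_0^1 \theta\, p(\theta \mid D)\, d\theta = E(\theta)$$
This is equivalent to the expected value of the parameter, where the expectation is taken with respect to the posterior distribution (the posterior density)
$$p(\theta \mid D) = \mathrm{Beta}(\theta \mid \alpha_1 + N_1, \alpha_2 + N_2)$$

Expected value of the parameter

How to calculate the expected value of a Beta distribution?
$$E(\theta) = \int_0^1 \theta\, \mathrm{Beta}(\theta \mid a, b)\, d\theta = \frac{\Gamma(a + b)}{\Gamma(a)\,\Gamma(b)} \int_0^1 \theta^{a} (1 - \theta)^{b - 1}\, d\theta$$
$$= \frac{\Gamma(a + b)}{\Gamma(a)\,\Gamma(b)} \cdot \frac{\Gamma(a + 1)\,\Gamma(b)}{\Gamma(a + b + 1)} \int_0^1 \mathrm{Beta}(\theta \mid a + 1, b)\, d\theta = \frac{a}{a + b}$$
Note: $\Gamma(a + 1) = a\,\Gamma(a)$; for integer values of $a$, $\Gamma(a) = (a - 1)!$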
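The closed-form mean of a Beta distribution, $a / (a + b)$, can be checked numerically by integrating $\theta \cdot \mathrm{Beta}(\theta \mid a, b)$ over $[0, 1]$. The sketch below uses a simple midpoint rule; the function names, the step count, and the example parameters (a = 20, b = 15, i.e. a Beta(5, 5) prior updated with 15 heads and 10 tails) are assumptions for illustration.

```python
import math


def beta_pdf(theta, a, b):
    """Density of Beta(theta | a, b) for 0 < theta < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1.0 - theta))


def beta_mean_numeric(a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of theta * Beta(theta | a, b)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h  # midpoints stay strictly inside (0, 1)
        total += theta * beta_pdf(theta, a, b)
    return total * h


def beta_mean_closed_form(a, b):
    return a / (a + b)


a, b = 20, 15
print(beta_mean_numeric(a, b), beta_mean_closed_form(a, b))
```

The same number is the Bayesian predictive probability of a head in the next trial under that posterior.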

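Expected value of the parameter

Substituting the results for the posterior
$$p(\theta \mid D) = \mathrm{Beta}(\theta \mid \alpha_1 + N_1, \alpha_2 + N_2)$$
we get
$$E(\theta) = \frac{\alpha_1 + N_1}{\alpha_1 + \alpha_2 + N_1 + N_2}$$
Note that the mean of the posterior is yet another "reasonable" parameter choice: $\hat{\theta} = E(\theta)$.

Binomial distribution

Example problem: a biased coin.
Outcomes: two possible values, head or tail.
Data: a set of order-independent outcomes for $N$ trials:
  - $N_1$: number of heads seen
  - $N_2$: number of tails seen
  (both can be calculated from the trial data)
Model: probability of a head $\theta$, probability of a tail $(1 - \theta)$.
Probability of an outcome (Binomial distribution):
$$P(N_1 \mid N, \theta) = \binom{N}{N_1} \theta^{N_1} (1 - \theta)^{N - N_1}$$
Objective: we would like to estimate the probability of a head, $\hat{\theta}$.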

Binomial distribution

Example problem: $N$ coin flips, where each coin flip can have two results: head or tail.
Outcome: $N_1$ heads and $N_2$ tails seen in $N$ trials.
Model: probability of a head $\theta$, probability of a tail $(1 - \theta)$.
Probability of an outcome:
$$P(N_1 \mid N, \theta) = \binom{N}{N_1} \theta^{N_1} (1 - \theta)^{N - N_1}$$
The Binomial distribution models an order-independent sequence of Bernoulli trials.
[Figure: plot of a Binomial distribution.]

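Maximum likelihood (ML) estimate

Likelihood of data:
$$P(D \mid \theta) = \frac{N!}{N_1!\, N_2!}\, \theta^{N_1} (1 - \theta)^{N_2}$$
Log-likelihood:
$$l(D, \theta) = \log \frac{N!}{N_1!\, N_2!} + N_1 \log \theta + N_2 \log (1 - \theta)$$
The first term is constant from the point of view of optimization.
ML solution:
$$\theta_{ML} = \frac{N_1}{N} = \frac{N_1}{N_1 + N_2}$$
This is the same as for the Bernoulli model and $D$ given as an iid sequence of examples.

Posterior density

Posterior density, via Bayes rule:
$$p(\theta \mid D) = \frac{P(D \mid \theta)\, p(\theta)}{P(D)}$$
  - Prior choice: $p(\theta) = \mathrm{Beta}(\theta \mid \alpha_1, \alpha_2)$
  - Likelihood: $P(D \mid \theta) = \frac{N!}{N_1!\, N_2!}\, \theta^{N_1} (1 - \theta)^{N_2}$
  - Posterior: $p(\theta \mid D) = \mathrm{Beta}(\theta \mid \alpha_1 + N_1, \alpha_2 + N_2)$
MAP estimate:
$$\theta_{MAP} = \arg\max_{\theta} p(\theta \mid D) = \frac{\alpha_1 + N_1 - 1}{\alpha_1 + \alpha_2 + N_1 + N_2 - 2}$$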
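To illustrate that the binomial coefficient does not affect the maximizer, the sketch below evaluates the full binomial log-likelihood (constant term included) on a grid and recovers the same estimate $N_1 / N$. The function names and the counts (15 heads, 10 tails) are illustrative assumptions; `math.comb` requires Python 3.8 or later.

```python
import math


def binomial_pmf(n_heads, n_trials, theta):
    """P(N1 | N, theta) for the Binomial distribution."""
    return math.comb(n_trials, n_heads) * theta ** n_heads * (1.0 - theta) ** (n_trials - n_heads)


def binomial_log_likelihood(theta, n_heads, n_tails):
    """Log-likelihood including the constant log(N! / (N1! N2!))."""
    n = n_heads + n_tails
    constant = math.log(math.comb(n, n_heads))
    return constant + n_heads * math.log(theta) + n_tails * math.log(1.0 - theta)


grid = [i / 1000 for i in range(1, 1000)]
theta_ml = max(grid, key=lambda t: binomial_log_likelihood(t, 15, 10))
print(theta_ml)
```

The constant shifts the whole log-likelihood curve up or down without moving its peak, which is why the Bernoulli and Binomial models give the same ML estimate.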

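Multinomial distribution

Example: multiple rolls of a dice with 6 results.
Outcome: counts of occurrences of $k$ possible outcomes in $N$ trials.
Data: $N_1, N_2, \dots, N_k$, where $N_i$ is the number of times outcome $i$ has been seen.
Model parameters: $\boldsymbol{\theta} = (\theta_1, \theta_2, \dots, \theta_k)$ s.t. $\sum_{i=1}^{k} \theta_i = 1$, where $\theta_i$ is the probability of outcome $i$.
Probability distribution:
$$P(N_1, N_2, \dots, N_k \mid \boldsymbol{\theta}) = \frac{N!}{N_1!\, N_2! \cdots N_k!}\, \theta_1^{N_1} \theta_2^{N_2} \cdots \theta_k^{N_k}$$
ML estimate:
$$\theta_{i, ML} = \frac{N_i}{N}$$

Multinomial distribution. Posterior and MAP estimate

Choice of the prior: Dirichlet distribution
$$\mathrm{Dir}(\boldsymbol{\theta} \mid \alpha_1, \dots, \alpha_k) = \frac{\Gamma\!\left(\sum_{i=1}^{k} \alpha_i\right)}{\prod_{i=1}^{k} \Gamma(\alpha_i)}\, \theta_1^{\alpha_1 - 1} \theta_2^{\alpha_2 - 1} \cdots \theta_k^{\alpha_k - 1}$$
The Dirichlet is the conjugate choice for multinomial sampling.
Posterior density:
$$p(\boldsymbol{\theta} \mid D) = \mathrm{Dir}(\boldsymbol{\theta} \mid \alpha_1 + N_1, \dots, \alpha_k + N_k)$$
MAP estimate:
$$\theta_{i, MAP} = \frac{\alpha_i + N_i - 1}{\sum_{j=1}^{k} (\alpha_j + N_j) - k}$$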
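The multinomial ML estimate and the Dirichlet MAP estimate are both simple functions of the counts. The sketch below uses made-up dice counts and a made-up symmetric prior; the function names and all numbers are illustrative assumptions, not data from the lecture.

```python
def multinomial_mle(counts):
    """ML estimate theta_i = N_i / N."""
    total = sum(counts)
    return [c / total for c in counts]


def dirichlet_map(counts, alphas):
    """Mode of Dir(alpha_1 + N_1, ..., alpha_k + N_k); needs every alpha_i + N_i > 1."""
    k = len(counts)
    denom = sum(counts) + sum(alphas) - k
    return [(c + a - 1) / denom for c, a in zip(counts, alphas)]


counts = [3, 5, 2, 4, 6, 0]   # hypothetical counts from 20 rolls of a dice
alphas = [2, 2, 2, 2, 2, 2]   # hypothetical symmetric Dirichlet prior

print("ML: ", multinomial_mle(counts))
print("MAP:", dirichlet_map(counts, alphas))
```

With these numbers the ML estimate assigns probability zero to the face that was never observed, while the MAP estimate keeps it positive because the prior acts like one extra count per face.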

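Dirichlet distribution

$$\mathrm{Dir}(\boldsymbol{\theta} \mid \alpha_1, \dots, \alpha_k) = \frac{\Gamma\!\left(\sum_{i=1}^{k} \alpha_i\right)}{\prod_{i=1}^{k} \Gamma(\alpha_i)} \prod_{i=1}^{k} \theta_i^{\alpha_i - 1}$$
Assume: $k = 3$
[Figure: Dirichlet densities for $k = 3$.]

Other distributions

The same ideas can be applied to other distributions.
Typically we choose distributions that behave well, so that the computations lead to nice solutions: the exponential family of distributions.
Conjugate choices for some of the distributions from the exponential family:
  - Binomial: Beta
  - Multinomial: Dirichlet
  - Exponential: Gamma
  - Poisson: Gamma
  - Gaussian: Gaussian (mean) and Wishart (covariance)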

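Gaussian (normal) distribution

Gaussian: $x \sim N(\mu, \sigma)$
Parameters: $\mu$ (mean), $\sigma$ (standard deviation)
Density function:
$$p(x \mid \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right]$$
Example:
[Figure: plot of a Gaussian density for $x$ from -4 to 4, with peak value about 0.4.]

Parameter estimates

Log-likelihood:
$$l(D, \mu, \sigma) = \log \prod_{i=1}^{n} p(x_i \mid \mu, \sigma)$$
ML estimates of the mean and variance:
$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2$$
The ML variance estimate is biased:
$$E(\hat{\sigma}^2) = E\!\left( \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2 \right) = \frac{n - 1}{n}\, \sigma^2 \neq \sigma^2$$
Unbiased estimate:
$$\hat{\sigma}^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \hat{\mu})^2$$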
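The difference between the ML (biased) and the unbiased variance estimate is only the divisor, $n$ versus $n - 1$. The sketch below computes both on a small made-up sample; the function name and the data are illustrative assumptions.

```python
def gaussian_estimates(xs):
    """Return (mean, ML variance with divisor n, unbiased variance with divisor n - 1)."""
    n = len(xs)
    mu = sum(xs) / n
    sum_sq = sum((x - mu) ** 2 for x in xs)
    return mu, sum_sq / n, sum_sq / (n - 1)


xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample
mu, var_ml, var_unbiased = gaussian_estimates(xs)
print(mu, var_ml, var_unbiased)
```

The two variance estimates differ by the factor $(n - 1)/n$, so the gap shrinks as the sample grows; Python's `statistics.pvariance` and `statistics.variance` compute the same two quantities.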

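Multivariate normal distribution

Multivariate normal: $\mathbf{x} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$
Parameters: $\boldsymbol{\mu}$ (mean), $\boldsymbol{\Sigma}$ (covariance matrix)
Density function:
$$p(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\!\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right]$$
Example:
[Figure: plot of a multivariate normal density.]

Partitioned Gaussian distributions

Multivariate Gaussian: $\mathbf{x} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, partitioned as
$$\mathbf{x} = \begin{pmatrix} \mathbf{x}_a \\ \mathbf{x}_b \end{pmatrix}, \qquad \boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_a \\ \boldsymbol{\mu}_b \end{pmatrix}, \qquad \boldsymbol{\Sigma} = \begin{pmatrix} \boldsymbol{\Sigma}_{aa} & \boldsymbol{\Sigma}_{ab} \\ \boldsymbol{\Sigma}_{ba} & \boldsymbol{\Sigma}_{bb} \end{pmatrix}$$
Precision matrix:
$$\boldsymbol{\Lambda} = \boldsymbol{\Sigma}^{-1} = \begin{pmatrix} \boldsymbol{\Lambda}_{aa} & \boldsymbol{\Lambda}_{ab} \\ \boldsymbol{\Lambda}_{ba} & \boldsymbol{\Lambda}_{bb} \end{pmatrix}$$
What are the distributions for marginals and conditionals?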

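Partitioned conditionals and marginals

For $\mathbf{x} = (\mathbf{x}_a, \mathbf{x}_b) \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ with precision matrix $\boldsymbol{\Lambda} = \boldsymbol{\Sigma}^{-1}$:
Conditional density:
$$p(\mathbf{x}_a \mid \mathbf{x}_b) = N(\mathbf{x}_a \mid \boldsymbol{\mu}_{a|b}, \boldsymbol{\Lambda}_{aa}^{-1}), \qquad \boldsymbol{\mu}_{a|b} = \boldsymbol{\mu}_a - \boldsymbol{\Lambda}_{aa}^{-1} \boldsymbol{\Lambda}_{ab} (\mathbf{x}_b - \boldsymbol{\mu}_b)$$
Marginal density:
$$p(\mathbf{x}_a) = N(\mathbf{x}_a \mid \boldsymbol{\mu}_a, \boldsymbol{\Sigma}_{aa})$$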
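For a two-dimensional Gaussian, where $x_a$ and $x_b$ are scalars, the conditional $p(x_a \mid x_b)$ can be computed by hand. The sketch below assumes the standard textbook results for partitioned Gaussians (conditional mean $\mu_a + \Sigma_{ab} \Sigma_{bb}^{-1} (x_b - \mu_b)$ and conditional variance $\Sigma_{aa} - \Sigma_{ab} \Sigma_{bb}^{-1} \Sigma_{ba}$, or equivalently the precision-matrix form) and checks that the two forms agree. Function names and numbers are illustrative assumptions.

```python
def conditional_from_covariance(mu_a, mu_b, s_aa, s_ab, s_bb, x_b):
    """Mean and variance of p(x_a | x_b) from the covariance blocks (scalar case)."""
    mean = mu_a + (s_ab / s_bb) * (x_b - mu_b)
    var = s_aa - s_ab * s_ab / s_bb
    return mean, var


def conditional_from_precision(mu_a, mu_b, s_aa, s_ab, s_bb, x_b):
    """Same quantities computed from the precision matrix Lambda = Sigma^{-1}."""
    det = s_aa * s_bb - s_ab * s_ab
    lam_aa = s_bb / det      # top-left entry of the inverse of a 2x2 matrix
    lam_ab = -s_ab / det     # off-diagonal entry of the inverse
    mean = mu_a - (lam_ab / lam_aa) * (x_b - mu_b)
    var = 1.0 / lam_aa
    return mean, var


# Hypothetical parameters: means (1, 2), variances (1, 2), covariance 0.8, observed x_b = 3.
print(conditional_from_covariance(1.0, 2.0, 1.0, 0.8, 2.0, 3.0))
print(conditional_from_precision(1.0, 2.0, 1.0, 0.8, 2.0, 3.0))
```

Observing $x_b$ shifts the mean of $x_a$ in the direction of the covariance and reduces its variance below the marginal variance $\Sigma_{aa}$.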

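Parameter estimates

Log-likelihood:
$$l(D, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \log \prod_{i=1}^{n} p(\mathbf{x}_i \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})$$
ML estimates of the mean and covariances:
$$\hat{\boldsymbol{\mu}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i, \qquad \hat{\boldsymbol{\Sigma}} = \frac{1}{n} \sum_{i=1}^{n} (\mathbf{x}_i - \hat{\boldsymbol{\mu}})(\mathbf{x}_i - \hat{\boldsymbol{\mu}})^T$$
The covariance estimate is biased:
$$E(\hat{\boldsymbol{\Sigma}}) = E\!\left( \frac{1}{n} \sum_{i=1}^{n} (\mathbf{x}_i - \hat{\boldsymbol{\mu}})(\mathbf{x}_i - \hat{\boldsymbol{\mu}})^T \right) = \frac{n - 1}{n}\, \boldsymbol{\Sigma} \neq \boldsymbol{\Sigma}$$
Unbiased estimate:
$$\hat{\boldsymbol{\Sigma}} = \frac{1}{n - 1} \sum_{i=1}^{n} (\mathbf{x}_i - \hat{\boldsymbol{\mu}})(\mathbf{x}_i - \hat{\boldsymbol{\mu}})^T$$

Posterior of a multivariate normal

Assume a prior on the mean $\boldsymbol{\mu}$ that is normally distributed: $p(\boldsymbol{\mu}) = N(\boldsymbol{\mu}_p, \boldsymbol{\Sigma}_p)$.
Then the posterior of $\boldsymbol{\mu}$ is normally distributed:
$$p(\boldsymbol{\mu} \mid D) \propto \left[ \prod_{i=1}^{n} \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\!\left( -\frac{1}{2} (\mathbf{x}_i - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}_i - \boldsymbol{\mu}) \right) \right] \cdot \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}_p|^{1/2}} \exp\!\left( -\frac{1}{2} (\boldsymbol{\mu} - \boldsymbol{\mu}_p)^T \boldsymbol{\Sigma}_p^{-1} (\boldsymbol{\mu} - \boldsymbol{\mu}_p) \right)$$
$$p(\boldsymbol{\mu} \mid D) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}_n|^{1/2}} \exp\!\left( -\frac{1}{2} (\boldsymbol{\mu} - \boldsymbol{\mu}_n)^T \boldsymbol{\Sigma}_n^{-1} (\boldsymbol{\mu} - \boldsymbol{\mu}_n) \right)$$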
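In one dimension with a known variance, the posterior of the mean under a normal prior has a simple closed form: precisions add, and the posterior mean is a precision-weighted average of the prior mean and the data. The sketch below assumes this standard univariate result; the function name, the data, and the prior parameters are illustrative assumptions.

```python
def posterior_of_mean(xs, sigma2, mu_p, sigma2_p):
    """Posterior N(mu_n, sigma2_n) of a Gaussian mean with known variance sigma2
    and prior N(mu_p, sigma2_p)."""
    n = len(xs)
    precision_n = n / sigma2 + 1.0 / sigma2_p              # precisions add
    mu_n = (sum(xs) / sigma2 + mu_p / sigma2_p) / precision_n
    return mu_n, 1.0 / precision_n


xs = [4.8, 5.1, 5.3, 4.9, 5.4]  # hypothetical observations, sample mean 5.1
print(posterior_of_mean(xs, sigma2=1.0, mu_p=0.0, sigma2_p=1.0))    # strong prior at 0
print(posterior_of_mean(xs, sigma2=1.0, mu_p=0.0, sigma2_p=1e6))    # nearly flat prior
```

With a tight prior the posterior mean is pulled toward the prior mean; with a nearly flat prior it approaches the sample mean, that is, the ML estimate.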

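Posterior of a multivariate normal

Then the posterior of $\boldsymbol{\mu}$ is normally distributed:
$$p(\boldsymbol{\mu} \mid D) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}_n|^{1/2}} \exp\!\left( -\frac{1}{2} (\boldsymbol{\mu} - \boldsymbol{\mu}_n)^T \boldsymbol{\Sigma}_n^{-1} (\boldsymbol{\mu} - \boldsymbol{\mu}_n) \right)$$
with
$$\boldsymbol{\Sigma}_n^{-1} = n \boldsymbol{\Sigma}^{-1} + \boldsymbol{\Sigma}_p^{-1}, \qquad \boldsymbol{\mu}_n = \boldsymbol{\Sigma}_n \left( \boldsymbol{\Sigma}^{-1} \sum_{i=1}^{n} \mathbf{x}_i + \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\mu}_p \right)$$

Other distributions

Gamma distribution:
$$p(x \mid a, b) = \frac{1}{\Gamma(a)\, b^{a}}\, x^{a - 1} e^{-x / b} \quad \text{for } x \in [0, \infty)$$
Exponential distribution (a special case of Gamma for $a = 1$):
$$p(x \mid b) = \frac{1}{b}\, e^{-x / b}$$
Poisson distribution:
$$p(x \mid \lambda) = \frac{\lambda^{x} e^{-\lambda}}{x!} \quad \text{for } x \in \{0, 1, 2, \dots\}$$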

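Other distributions

Gamma distribution:
$$p(x \mid a, b) = \frac{1}{\Gamma(a)\, b^{a}}\, x^{a - 1} e^{-x / b} \quad \text{for } x \in [0, \infty)$$
[Figure: plot of Gamma densities.]

Sequential Bayesian parameter estimation

Sequential Bayesian approach:
  - Under the iid assumption, the estimates of the posterior can be computed incrementally for a sequence of data points
  - If we use a conjugate prior, we get back a posterior of the same form
Assume we split the data $D$ into the last element $x_n$ and the rest $D_{n-1}$. Then:
$$p(\Theta \mid D) = \frac{p(x_n \mid \Theta)\, p(\Theta \mid D_{n-1})}{\int_{\Theta} p(x_n \mid \Theta)\, p(\Theta \mid D_{n-1})\, d\Theta}$$
Here $p(\Theta \mid D_{n-1})$ plays the role of a "new prior".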
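With a conjugate Beta prior on a coin, the sequential update is easy to see: each flip turns the current posterior into the prior for the next flip, and the end result matches the batch update. This is a minimal sketch; the function names, the prior Beta(5, 5), and the example flips are illustrative assumptions.

```python
def update_one(a, b, x):
    """Posterior Beta parameters after one flip x (1 = head, 0 = tail)."""
    return a + x, b + (1 - x)


def sequential_posterior(flips, alpha1, alpha2):
    """Process flips one at a time; each posterior becomes the next prior."""
    a, b = alpha1, alpha2
    for x in flips:
        a, b = update_one(a, b, x)
    return a, b


def batch_posterior(flips, alpha1, alpha2):
    """Update with all counts at once."""
    n_heads = sum(flips)
    return alpha1 + n_heads, alpha2 + len(flips) - n_heads


flips = [1, 1, 0, 1, 0, 1]  # hypothetical sequence
print(sequential_posterior(flips, 5, 5), batch_posterior(flips, 5, 5))
```

Because the posterior depends on the data only through the counts, the order in which the flips arrive does not matter.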

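Exponential family

Exponential family: all probability mass / density functions that can be written in the exponential normal form
$$f(\mathbf{x} \mid \boldsymbol{\eta}) = \frac{1}{Z(\boldsymbol{\eta})}\, h(\mathbf{x}) \exp\!\left[ \boldsymbol{\eta}^T t(\mathbf{x}) \right]$$
  - $\boldsymbol{\eta}$: a vector of natural (or canonical) parameters
  - $t(\mathbf{x})$: a function referred to as a sufficient statistic
  - $h(\mathbf{x})$: a function of $\mathbf{x}$ (it is less important)
  - $Z(\boldsymbol{\eta})$: a normalization constant (a partition function)
$$Z(\boldsymbol{\eta}) = \int h(\mathbf{x}) \exp\!\left[ \boldsymbol{\eta}^T t(\mathbf{x}) \right] d\mathbf{x}$$
Other common form:
$$f(\mathbf{x} \mid \boldsymbol{\eta}) = h(\mathbf{x}) \exp\!\left[ \boldsymbol{\eta}^T t(\mathbf{x}) - A(\boldsymbol{\eta}) \right], \qquad \log Z(\boldsymbol{\eta}) = A(\boldsymbol{\eta})$$

Exponential family: examples

Bernoulli distribution:
$$p(x \mid \pi) = \pi^{x} (1 - \pi)^{1 - x} = \exp\!\left[ x \log \pi + (1 - x) \log (1 - \pi) \right] = \exp\!\left[ \log\!\left( \frac{\pi}{1 - \pi} \right) x + \log (1 - \pi) \right]$$
$$= \exp\!\left[ \log (1 - \pi) \right] \exp\!\left[ \log\!\left( \frac{\pi}{1 - \pi} \right) x \right]$$
Exponential family form:
$$f(x \mid \eta) = \frac{1}{Z(\eta)}\, h(x) \exp\!\left[ \eta\, t(x) \right]$$
Parameters: $\eta = ?$, $t(x) = ?$, $Z(\eta) = ?$, $h(x) = ?$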

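Exponential family: examples

Bernoulli distribution:
$$p(x \mid \pi) = \exp\!\left[ \log (1 - \pi) \right] \exp\!\left[ \log\!\left( \frac{\pi}{1 - \pi} \right) x \right]$$
Exponential family form: $f(x \mid \eta) = \frac{1}{Z(\eta)}\, h(x) \exp\!\left[ \eta\, t(x) \right]$
Parameters:
  - $\eta = \log \frac{\pi}{1 - \pi}$ (note: $\pi = \frac{1}{1 + e^{-\eta}}$)
  - $t(x) = x$
  - $Z(\eta) = \frac{1}{1 - \pi} = 1 + e^{\eta}$
  - $h(x) = 1$

Exponential family: examples

Univariate Gaussian distribution:
$$p(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\, \sigma} \exp\!\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right] = \frac{1}{\sqrt{2\pi}} \exp\!\left[ \frac{\mu}{\sigma^2}\, x - \frac{1}{2\sigma^2}\, x^2 - \frac{\mu^2}{2\sigma^2} - \log \sigma \right]$$
Exponential family form: $f(x \mid \boldsymbol{\eta}) = \frac{1}{Z(\boldsymbol{\eta})}\, h(x) \exp\!\left[ \boldsymbol{\eta}^T t(x) \right]$
Parameters: $\boldsymbol{\eta} = ?$, $t(x) = ?$, $h(x) = ?$, $Z(\boldsymbol{\eta}) = ?$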
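The exponential-family parameterization of the Bernoulli distribution can be verified numerically: with $\eta = \log(\pi / (1 - \pi))$, $t(x) = x$, $h(x) = 1$ and $Z(\eta) = 1 + e^{\eta}$, the exponential form should reproduce $\pi^x (1 - \pi)^{1 - x}$. The function names below are illustrative assumptions.

```python
import math


def bernoulli_pmf(x, pi):
    """Standard form pi^x (1 - pi)^(1 - x)."""
    return pi ** x * (1.0 - pi) ** (1 - x)


def bernoulli_expfam(x, pi):
    """Exponential-family form (1 / Z(eta)) * h(x) * exp(eta * t(x))."""
    eta = math.log(pi / (1.0 - pi))   # natural parameter (log-odds)
    z = 1.0 + math.exp(eta)           # partition function, equals 1 / (1 - pi)
    return math.exp(eta * x) / z      # h(x) = 1, t(x) = x


for pi in (0.1, 0.5, 0.8):
    for x in (0, 1):
        print(pi, x, bernoulli_pmf(x, pi), bernoulli_expfam(x, pi))
```

The natural parameter is the log-odds of a head, and inverting it gives the logistic function $\pi = 1 / (1 + e^{-\eta})$.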

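Exponential family: examples

Univariate Gaussian distribution:
$$p(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}} \exp\!\left[ \frac{\mu}{\sigma^2}\, x - \frac{1}{2\sigma^2}\, x^2 - \frac{\mu^2}{2\sigma^2} - \log \sigma \right]$$
Exponential family form: $f(x \mid \boldsymbol{\eta}) = \frac{1}{Z(\boldsymbol{\eta})}\, h(x) \exp\!\left[ \boldsymbol{\eta}^T t(x) \right]$
Parameters:
  - $\boldsymbol{\eta} = \begin{pmatrix} \mu / \sigma^2 \\ -1 / (2\sigma^2) \end{pmatrix}$
  - $t(x) = \begin{pmatrix} x \\ x^2 \end{pmatrix}$
  - $h(x) = 1 / \sqrt{2\pi}$
  - $Z(\boldsymbol{\eta}) = \exp\!\left[ \frac{\mu^2}{2\sigma^2} + \log \sigma \right] = \exp\!\left[ -\frac{\eta_1^2}{4\eta_2} - \frac{1}{2} \log (-2\eta_2) \right]$

Exponential family

For iid samples, the likelihood of data is
$$P(D \mid \boldsymbol{\eta}) = \prod_{i=1}^{n} h(\mathbf{x}_i) \exp\!\left[ \boldsymbol{\eta}^T t(\mathbf{x}_i) - A(\boldsymbol{\eta}) \right] = \left[ \prod_{i=1}^{n} h(\mathbf{x}_i) \right] \exp\!\left[ \sum_{i=1}^{n} \boldsymbol{\eta}^T t(\mathbf{x}_i) - n A(\boldsymbol{\eta}) \right]$$
$$= \left[ \prod_{i=1}^{n} h(\mathbf{x}_i) \right] \exp\!\left[ \boldsymbol{\eta}^T \left( \sum_{i=1}^{n} t(\mathbf{x}_i) \right) - n A(\boldsymbol{\eta}) \right]$$
Important: the dimensionality of the sufficient statistic remains the same as the number of samples grows.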
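The Gaussian case can be checked the same way as the Bernoulli one: plugging $\eta_1 = \mu / \sigma^2$, $\eta_2 = -1 / (2\sigma^2)$, $t(x) = (x, x^2)$, $h(x) = 1 / \sqrt{2\pi}$ and $A(\boldsymbol{\eta}) = \log Z(\boldsymbol{\eta})$ into the exponential form should give back the usual density. Function names and test points are illustrative assumptions.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Usual univariate Gaussian density."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)


def gaussian_expfam(x, mu, sigma):
    """Exponential-family form h(x) * exp(eta^T t(x) - A(eta))."""
    eta1 = mu / sigma ** 2
    eta2 = -1.0 / (2.0 * sigma ** 2)
    # A(eta) = log Z(eta) = -eta1^2 / (4 eta2) - 0.5 * log(-2 eta2)
    log_z = -eta1 ** 2 / (4.0 * eta2) - 0.5 * math.log(-2.0 * eta2)
    h = 1.0 / math.sqrt(2.0 * math.pi)
    return h * math.exp(eta1 * x + eta2 * x ** 2 - log_z)


for x in (-1.0, 0.0, 2.5):
    print(x, gaussian_pdf(x, 1.0, 2.0), gaussian_expfam(x, 1.0, 2.0))
```

Since $t(x) = (x, x^2)$, the sufficient statistics of an iid sample are just the sum of the values and the sum of their squares, regardless of the sample size.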

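Exponential family

The log-likelihood of data is
$$l(D, \boldsymbol{\eta}) = \log \left[ \prod_{i=1}^{n} h(\mathbf{x}_i) \right] + \boldsymbol{\eta}^T \left( \sum_{i=1}^{n} t(\mathbf{x}_i) \right) - n A(\boldsymbol{\eta})$$
Optimizing the log-likelihood:
$$\nabla_{\boldsymbol{\eta}}\, l(D, \boldsymbol{\eta}) = \sum_{i=1}^{n} t(\mathbf{x}_i) - n \nabla_{\boldsymbol{\eta}} A(\boldsymbol{\eta}) = 0$$
For the ML estimate it must hold that
$$\nabla_{\boldsymbol{\eta}} A(\boldsymbol{\eta}) = \frac{1}{n} \sum_{i=1}^{n} t(\mathbf{x}_i)$$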

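Exponential family

Rewriting the gradient:
$$\nabla_{\boldsymbol{\eta}} A(\boldsymbol{\eta}) = \nabla_{\boldsymbol{\eta}} \log Z(\boldsymbol{\eta}) = \nabla_{\boldsymbol{\eta}} \log \int h(\mathbf{x}) \exp\!\left\{ \boldsymbol{\eta}^T t(\mathbf{x}) \right\} d\mathbf{x}$$
$$= \frac{\int t(\mathbf{x})\, h(\mathbf{x}) \exp\!\left\{ \boldsymbol{\eta}^T t(\mathbf{x}) \right\} d\mathbf{x}}{\int h(\mathbf{x}) \exp\!\left\{ \boldsymbol{\eta}^T t(\mathbf{x}) \right\} d\mathbf{x}} = \int t(\mathbf{x})\, h(\mathbf{x}) \exp\!\left\{ \boldsymbol{\eta}^T t(\mathbf{x}) - A(\boldsymbol{\eta}) \right\} d\mathbf{x} = E(t(\mathbf{x}))$$
Result:
$$E(t(\mathbf{x})) = \frac{1}{n} \sum_{i=1}^{n} t(\mathbf{x}_i)$$
For the ML estimate, the parameters $\boldsymbol{\eta}$ should be adjusted such that the expectation of the statistic $t(\mathbf{x})$ is equal to the observed sample statistics.

Moments of the distribution

For the exponential family:
  - The $k$-th moment of the statistic corresponds to the $k$-th derivative of $A(\boldsymbol{\eta})$
  - If $x$ is a component of $t(\mathbf{x})$, then we get the moments of the distribution by differentiating with respect to its corresponding natural parameter
Example: Bernoulli
$$p(x \mid \pi) = \exp\!\left[ \log\!\left( \frac{\pi}{1 - \pi} \right) x + \log (1 - \pi) \right], \qquad \eta = \log \frac{\pi}{1 - \pi}$$
$$A(\eta) = -\log (1 - \pi) = \log (1 + e^{\eta})$$
Derivatives:
$$\frac{\partial A(\eta)}{\partial \eta} = \frac{e^{\eta}}{1 + e^{\eta}} = \frac{1}{1 + e^{-\eta}} = \pi, \qquad \frac{\partial^2 A(\eta)}{\partial \eta^2} = \pi (1 - \pi)$$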
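The Bernoulli example can be checked numerically by differentiating $A(\eta) = \log(1 + e^{\eta})$ with finite differences: the first derivative should equal $\pi$ and the second $\pi (1 - \pi)$. The helper names, the step size, and the value $\pi = 0.3$ are illustrative assumptions.

```python
import math


def log_partition(eta):
    """A(eta) = log(1 + e^eta) for the Bernoulli distribution."""
    return math.log(1.0 + math.exp(eta))


def derivative(f, x, h=1e-4):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def second_derivative(f, x, h=1e-4):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


pi = 0.3
eta = math.log(pi / (1.0 - pi))
mean = derivative(log_partition, eta)              # should be close to pi
variance = second_derivative(log_partition, eta)   # should be close to pi * (1 - pi)
print(mean, variance)
```

The first derivative recovers the mean of $x$ and the second its variance, which is what makes the log-partition function convenient for ML estimation by matching expected and observed statistics.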

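End

Multivariate normal distribution

Multivariate normal: $\mathbf{x} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$
Parameters: $\boldsymbol{\mu}$ (mean), $\boldsymbol{\Sigma}$ (covariance matrix)
Density function:
$$p(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\!\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right]$$
Example:
[Figure: plot of a multivariate normal density.]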

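Parameter estimates

Log-likelihood:
$$l(D, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \log \prod_{i=1}^{n} p(\mathbf{x}_i \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})$$
ML estimates of the mean and covariances:
$$\hat{\boldsymbol{\mu}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i, \qquad \hat{\boldsymbol{\Sigma}} = \frac{1}{n} \sum_{i=1}^{n} (\mathbf{x}_i - \hat{\boldsymbol{\mu}})(\mathbf{x}_i - \hat{\boldsymbol{\mu}})^T$$
The covariance estimate is biased: $E(\hat{\boldsymbol{\Sigma}}) = \frac{n - 1}{n}\, \boldsymbol{\Sigma} \neq \boldsymbol{\Sigma}$
Unbiased estimate:
$$\hat{\boldsymbol{\Sigma}} = \frac{1}{n - 1} \sum_{i=1}^{n} (\mathbf{x}_i - \hat{\boldsymbol{\mu}})(\mathbf{x}_i - \hat{\boldsymbol{\mu}})^T$$

Learning via parameter estimation

In this lecture we consider parametric density estimation.
Basic settings:
  - A set of random variables $\mathbf{X} = \{X_1, X_2, \dots, X_d\}$
  - A model of the distribution over the variables in $\mathbf{X}$ with parameters $\Theta$
  - Data $D = \{D_1, D_2, \dots, D_n\}$
Objective: find the parameters $\hat{\Theta}$ that fit the data the best.
What is the best set of parameters? There are various criteria one can apply here.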

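Parameter estimation

Maximum likelihood (ML)
  - maximize $p(D \mid \Theta, \xi)$
  - $\xi$ represents prior (background) knowledge
Maximum a posteriori probability (MAP)
  - maximize $p(\Theta \mid D, \xi)$
  - selects the mode of the posterior
Bayesian framework
  - uses the posterior density $p(\Theta \mid D, \xi)$ (no optimization)

Posterior of a multivariate normal

Assume that we use only a prior on the mean $\boldsymbol{\mu}$.
A prior: $p(\boldsymbol{\mu}) = N(\boldsymbol{\mu}_p, \boldsymbol{\Sigma}_p)$
Then the posterior is normally distributed:
$$p(\boldsymbol{\mu} \mid D) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}_n|^{1/2}} \exp\!\left( -\frac{1}{2} (\boldsymbol{\mu} - \boldsymbol{\mu}_n)^T \boldsymbol{\Sigma}_n^{-1} (\boldsymbol{\mu} - \boldsymbol{\mu}_n) \right)$$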

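Parameter estimates

Log-likelihood:
$$l(D, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \log \prod_{i=1}^{n} p(\mathbf{x}_i \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})$$
ML estimates of the mean and covariances:
$$\hat{\boldsymbol{\mu}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i, \qquad \hat{\boldsymbol{\Sigma}} = \frac{1}{n} \sum_{i=1}^{n} (\mathbf{x}_i - \hat{\boldsymbol{\mu}})(\mathbf{x}_i - \hat{\boldsymbol{\mu}})^T$$
Unbiased estimate:
$$\hat{\boldsymbol{\Sigma}} = \frac{1}{n - 1} \sum_{i=1}^{n} (\mathbf{x}_i - \hat{\boldsymbol{\mu}})(\mathbf{x}_i - \hat{\boldsymbol{\mu}})^T$$

Unsupervised learning

Data: $D = \{D_1, D_2, \dots, D_n\}$, where $D_i = \mathbf{x}_i$ is a vector of attribute values (e.g. the description of a patient).
There is no specific target attribute we want to predict (no output $y$).
Objective: learn (describe) relations between attributes and examples.
Types of problems:
  - Clustering: group together "similar" examples
  - Density estimation: model probabilistically the population of examples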

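Beta distribution

[Figure: Beta densities on $[0, 1]$ for the parameter settings $(0.5, 0.5)$, $(2.5, 2.5)$ and $(2.5, 5)$.]

Exponential family

Exponential family of distributions:
$$f(\mathbf{x} \mid \boldsymbol{\theta}, \boldsymbol{\phi}) = \exp\!\left[ \frac{\boldsymbol{\theta}^T \mathbf{x} - b(\boldsymbol{\theta})}{a(\boldsymbol{\phi})} + c(\mathbf{x}, \boldsymbol{\phi}) \right]$$
Parameters:
  - $\boldsymbol{\theta}$: location parameters
  - $\boldsymbol{\phi}$: scaling parameters
Example:
$$p(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\!\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right]$$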

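Example: Bernoulli distribution

Coin example: we have a coin that can be biased.
Outcomes: two possible values, head or tail.
Data: $D$, a sequence of outcomes $x_i$ such that head: $x_i = 1$, tail: $x_i = 0$.
Model: probability of a head $\theta$, probability of a tail $(1 - \theta)$.
Objective: we would like to estimate the probability of a head, $\hat{\theta}$.
Probability of an outcome $x_i$ (Bernoulli distribution):
$$P(x_i \mid \theta) = \theta^{x_i} (1 - \theta)^{1 - x_i}$$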