Three classification models Discriminant Model: learn the decision boundary directly and apply it to determine the class of each data point

Size: px

Start display at page:

Download "Three classification models Discriminant Model: learn the decision boundary directly and apply it to determine the class of each data point"

Rosalind Murphy
6 years ago
Views:

1 Review of Last Wee Three classificatio models Discrimiat Model: lear the decisio boudary directly ad aly it to determie the class of each data oit Discrimiative Model: lear PY directly Geerative Model: Lear PY through PY ad PY. The oit robability PY ad margial robability P ca also be leared. SML vs. o statistical ML Obective fuctio for SML: MLE or MAP Obective fuctio for ML: MSE maimiig the margi or others. Usuervised Learig Clusterig grou data together EM learig give icomletedata data based o MLE

2 Revisit Bayesia: Prior vs. Smoothig Techique

3 Estimatig the Lielihood Assumig we radomly observe coi tosses to fid head occurs H times what is the frequecy of head for this coi? Assumig the robability is the accordig to MLE we wat to otimie log H H After erformig derivatives o we ca lear that the MLE solutio of is H/ However the H/ model suffers a maor drawbac that usee evets will receive ero robability If we toss a coi 6 times ad fid ero heads H/ model tells us the robability of head is 0 However usee obects should receive a tiy robability rather tha ero give the fact that we ow they do eist. Smoothig: a techique to assig o ero robability to usee obects Add oe esmoothig: assumig eeyt everythig goccurs at least oce. Uder this assumtio the frequecy of W becomes H+/+2 because both head ad tail occurs oce. This is a commoly alied techiques for gram LagaugeModel g Learig

4 Add oe Smoothig vs. Bayesia Prior Last wee after class I received a from a studet i this class Shao Chua Wag sayig that right after the class he roved that t add oe smoothig ca be iterreted as a MAP solutio for coi toss. Recall that the MLE solutio aims at otimiig the lielihood robability H H ad that maosterior robability malielihood robability* Prior robability Assumig the rior robability is set to be roortioal to to rohibit assigig a very large or very small value to the the osterior robability becomes H+ H+ otimiig the osterior w.r.t will obtai H+/+2. It ca also be roved that add lambda smoothig ca be regarded as a MAP give slightly differet rior. Shao Chua s fidig i fact tells us that this two basic smoothig techiques is simly a secial case for Bayesia learig.

5 Usuervised Learig II

6 Clusterig Prof. Shou de Li CSIE/GIM TU 2009/2/9 6

7 What is clusterig? Clusterig is the artitioig ii i of a data set ito subsets clusters so that the data i each subset ideally share some commo trait. Differece betwee clusterig ad classificatio Clusterig: divide iut ito artitios without label. It s usuervised. Classificatio: classify iuts ito Y labeled classes suervised We will itroduce two famous clusterig algorithms K meas clusterig EM clusterig for Gaussia Miture Model 2009/2/9 7

8 Stes of K meas clusterig. Radomly select oits as cluster ceter. 2. Assig the rest of the oits to the cluster of its closest ceter 3. Re calculatig the mea oit of each cluster. 4. Costructig a ew artitio by associatig each oit with the cluster whose cetroid is the closest. 5. Go bac to /2/9 8

9 K Meas Eamle K2 It is A Greedy based Traiig!! Pic seeds Reassig clusters Comute cetroids Reasssig clusters Comute cetroids Reassig clusters Coverged! 2009/2/9 9

10 Meas Clusterig sometimes failed Failure Cases: Viterbi or greedy Traiig got stuc to subotimal easily If a ode is equally close to several clusters it ca cause roblems sice we ca oly assig it to oe class. Ca we do better? Yes usig EM clusterig 2009/2/9 0

11 EM clusterig for Gaussia Mitures Problem Suose you measure a sigle cotiuous variable ibl i a large samle of observatios. Suose the samle cosists of severalclusters of Gaussia observatios with differet meas ad variaces. Our ob is to determie the value of the 3 arameters: The mea ad variace for cluster The mea ad variace for cluster 2 The mea ad variace for cluster The samlig robability for cluster 2009/2/9

12 Multivariate Gaussia Distributio Sigle value Gaussia Distributio: 2 σ e 2 /2 2σ 2σ 2 Multivariate Gaussia Distributio: e D/2 / T 2009/2/9 2

13 Gaussia Miture Distributio GMD A liear combiatio of several Gaussia Distributios It ca model almost ay cotiuous desity give sufficiet umber of Gaussias. K Ν K rior lielihood 2009/2/9 3

14 MLE for GMD Suose we have a dataset D of observatios { 2 } ad we wish to model this data usig a miture of Gaussias. The thelog lielihood fuctio is give by l l l K Ν K Ν 2009/2/9 4

15 MLE solutio for MLE solutio for K l l K Ν l K Ν Ν Ν P c Solve the above equatio: ad Ca be iterreted as the effective umber of oits assiged to cluster The weighted mea of all the oits i the dataset i which the weight for data oit is give by the osterior robability that belogs to a cluster c 2009/2/9 5

16 MLE solutio for If we fid the ero artial derivatives of we will lear the MLE solutio for is T 2009/2/9 6

17 MLE solutio for MLE solutio for Sice there is a costrait that the sum of is we aly Lagragemultilier to maimie aly Lagrage multilier to maimie l + K λ The derivatives of the above w.r.t. equal 0 gives K K + Ν Ν + Ν Ν λ λ λ 0 0 K + + Ν λ 0 0 Ν 2009/2/9 7

18 How to Produce the MLE solutios? How to Produce the MLE solutios? MLE solutio for ad caot be easily obtaied i.e. o close form solutio sice cotais ad o close form solutio sice cotais ad Sice ca be geerated by ad ; ad ad ca be geerated by. We ca treat as a hidde variable ad aly EM algorithm to iteratively lear the arameters. T Ν K Ν 2009/2/9 8

19 EM for Gaussia Mitures Clusterig Goal: Maimie i the Lielihood fuctio w.r.t. the arameters ad Stes Iitialie the arameters ad ad evaluate the iitial value of the log lielihood l. E ste: geerate the osterior robabilities usig curret arameters Ν Ν M ste. Re estimate the arameters usig the curret osterior robabilities see revious age. Checthe log lielihood give curret arameters to see if they coverge. If ot retur to E ste. 2009/2/9 9 K

20 Grahic Eamle of EM Clusterig 2009/2/9 20

21 EM Framewor ad EM Theory

22 Goal The goal of EM algorithm is to fid maimum lielihood solutios for models that cotai latet or missig variables. We rereset the iut data as ad the latet variables as. The arameters of the model is rereseted as. {} is called the comlete data ad is called the icomlete data. Assumig the osterior distributio P ca be geerated tdgive ad is ow. The log lielihood fuctio to maimie is l. Sice is uow we ca rereset l as l Problem: the summatio over the latet variables aears iside the logarithm. Therefore eve P belogs to Gaussia P will still be hard to comute. 22

23 The sirit of EM. Create a iitial iti model dl 0. The iitialiatio ca be arbitrarily radomly or with a small set of traiig eamles. 2. Use the eistig model to obtai aother model ew such that l ew > l 3. Reeat the above ste util reachig a local maimum. 4. Challege: How ca we guarateed to fid a better model after each iteratio give the hidde variable eists? ew As : arg ma l 23

24 EM Theorem If we ca fid a ew that guaratee ew l > l the the same ew will also satisfy the coditio l ew > l If EM theorem is true the we ca try to fid ew arg ma l The such ew will lead to better P How ca we rove EM theorem? If we ca rove the equatio below the we are doe ew l l ew l l 24

25 l Proof of EM Theorem /2 ew l ew l l Sice P P* P l Pl P l P l P ew l P {l P ew l P ew } {l P l P } Aly o both eds: [l ew l ] l ew l ew {l l } {l ew l If we ca rove <0 the we are doe. } 25

26 Proof of EM Theorem 2/2 ew {l } < The roof used Jese s Iequality P t y i P t yi log t P t yi More geerally if ad q are robability distributios log q

27 Otimiatio Process of EM ew arg ma l Iitial ste: radomly choose 0 E ste: usig a eistig arameter to estimate P M ste: fid ew that maimie Q l Chec the covergece of Q or if ot satisfied the set ew ad go bac to E ste. 27

28 EM: Why P i the E ste? EM: Why P i the E ste? Let s go bac to lplp lp l l l l l l l l l q q q q q l l l q q q q Ideedet of q Larger tha 0 equal hs Whe settig ql ca Larger tha 0 equal hs whe q cause The we fid ew that maimie l l q q The we fid that maimie this ew will also mae >0 l l 28

29 Global otimal l l 0l

30 Geeralied EM GEM l ew l ew l l Sice the above is always true. I the M ste we do t really eed to fid a ew that otimies l If we ca guaratee that ew always does a better ob tha i l the we are guaratee to reach a local otimal 30

31 Ideal vs. Available Data Aligmet MT: Problem for Machie Traslatio PE E oisy Chael PFE F Ideal: e e 2 e 3.. solvable by SL f f 2 f 3. Available: e e 2 e 3.. eed EM f f 2 f /2/9 3

32 E: Eglish Frech Aligmet Data: the house la maiso house maiso Aligmets are missig!! Theory: Eglish words are traslated first the ermuted. Parameters: Plathe maisothe lahouse maisohouse 2009/2/9 32

33 Model to lear: Plathe? E: EMTraiig o MT Pmaisothe? Possible assigmets: Plahouse? Pmaisohouse? a iitialie uiformly: Clathe0*/8+*/8/8 Plathe/2 E ste a /8 M ste Cmaisothe */8+0*/8/8 Pmaisothe/2 b /8 MLE Clahouse*/8+0*/8/8 Plahouse/2 Cmaisohouse*/8+2*/83/8 Pmaisohouse/2 lathe3/4 Pa7/256 maisothe/4 Pb47/256 lahouse/8 ormalie ormalie maisohouse7/8 lathe/2 Clathe9/32 E ste M ste Pa3/32 maisothe/2 Cmaisothe3/32 Pb9/32 lahouse/4 Clahouse3/ /2/9 maisohouse3/4 33 Cmaisohouse2/32 b

34 Data ad Model Comlete data Icomlete model arameters uow icomlete data comlete model Suervised learig argma Pmdata m Data geeratio argmapdm d comlete model comlete data icomlete data icomlete model EM comlete data & model argmapicomlete datam m 2009/2/9 34

35 Recommed Readig Patter Recogitio ad Machie Learig Bisho Chater 2 Chater 9 "Bayesia Iferece with Tears Kevi Kight Se 2009

Expectation-Maximization Algorithm.

Expectation-Maximization Algorithm. Expectatio-Maximizatio Algorithm. Petr Pošík Czech Techical Uiversity i Prague Faculty of Electrical Egieerig Dept. of Cyberetics MLE 2 Likelihood.........................................................................................................