Machine Learning. Logistic Regression: generative versus discriminative classifier. Le Song, 10-701/15-781, Spring 2008


1 Machine Learning 10-701/15-781, Spring 2008
Logistic Regression: generative versus discriminative classifier
Le Song
Lecture 5, September 4 0
Based on slides from Eric Xing, CMU
Reading: Chap. CB

2 Generative vs. Discriminative Classifiers
Goal: we wish to learn f: X → Y, e.g., $P(Y \mid X)$.
Generative classifiers (e.g., Naïve Bayes):
  Assume some functional form for $P(X \mid Y)$ and $P(Y)$. This is a generative model of the data!
  Estimate the parameters of $P(X \mid Y)$ and $P(Y)$ directly from training data.
  Use Bayes rule to calculate $P(Y \mid X = x_i)$.
Discriminative classifiers:
  Directly assume some functional form for $P(Y \mid X)$. This is a discriminative model of the data!
  Estimate the parameters of $P(Y \mid X)$ directly from training data.
(Figures: graphical models over the nodes $Y_i$ and $X_i$, one for each approach.)

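3 Consider a Gaussian Generative Classifier
Learning f: X → Y, where X is a vector of real-valued features $\langle X_1, \ldots, X_n \rangle$ and Y is boolean.
What does that imply about the form of $P(Y \mid X)$?
The joint probability of a datum and its label is (written for a scalar feature $x_i$, with $y_i^k = 1$ meaning that datum $i$ belongs to class $k$):
$$p(x_i, y_i^k = 1 \mid \mu, \sigma) = p(y_i^k = 1)\, p(x_i \mid y_i^k = 1, \mu, \sigma) = \pi_k \frac{1}{(2\pi\sigma_k^2)^{1/2}} \exp\left\{ -\frac{1}{2\sigma_k^2}(x_i - \mu_k)^2 \right\}$$
Given a datum $x_i$, we predict its label using the conditional probability of the label given the datum:
$$p(y_i^k = 1 \mid x_i, \mu, \sigma) = \frac{\pi_k \frac{1}{(2\pi\sigma_k^2)^{1/2}} \exp\left\{ -\frac{1}{2\sigma_k^2}(x_i - \mu_k)^2 \right\}}{\sum_{k'} \pi_{k'} \frac{1}{(2\pi\sigma_{k'}^2)^{1/2}} \exp\left\{ -\frac{1}{2\sigma_{k'}^2}(x_i - \mu_{k'})^2 \right\}}$$
(Figure: graphical model over $Y_i$ and $X_i$.)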
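The posterior on this slide can be evaluated directly from Bayes rule. The following is a minimal sketch for a single real-valued feature; the function names `gaussian_pdf` and `class_posterior` and the example numbers are illustrative assumptions, not part of the lecture.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a univariate Gaussian N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def class_posterior(x, priors, mus, sigmas):
    """Return [p(y = k | x) for each class k] using Bayes rule."""
    # Joint p(x, y = k) = pi_k * N(x | mu_k, sigma_k^2)
    joint = [pi * gaussian_pdf(x, mu, s) for pi, mu, s in zip(priors, mus, sigmas)]
    total = sum(joint)  # p(x), the normalizer
    return [j / total for j in joint]

# Hypothetical two-class example: equal priors, unit variances, means 0 and 1.
posterior = class_posterior(0.5, priors=[0.5, 0.5], mus=[0.0, 1.0], sigmas=[1.0, 1.0])
```

With equal priors and equal variances, a point halfway between the two means is equally likely to belong to either class, which is a quick sanity check on the formula.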

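4 Naïve Bayes Classifier
When X is a multivariate-Gaussian vector, the joint probability of a datum and its label is:
$$p(x_i, y_i^k = 1 \mid \mu, \Sigma) = p(y_i^k = 1)\, p(x_i \mid y_i^k = 1, \mu, \Sigma) = \pi_k \frac{1}{(2\pi)^{m/2} |\Sigma_k|^{1/2}} \exp\left\{ -\frac{1}{2}(x_i - \mu_k)^T \Sigma_k^{-1} (x_i - \mu_k) \right\}$$
The naïve Bayes simplification (features are conditionally independent given the class):
$$p(x_i, y_i^k = 1 \mid \mu, \sigma) = p(y_i^k = 1) \prod_{j=1}^{m} p(x_{i,j} \mid y_i^k = 1, \mu_{k,j}, \sigma_{k,j}) = \pi_k \prod_{j=1}^{m} \frac{1}{(2\pi\sigma_{k,j}^2)^{1/2}} \exp\left\{ -\frac{1}{2\sigma_{k,j}^2}(x_{i,j} - \mu_{k,j})^2 \right\}$$
More generally:
$$p(x_i, y_i \mid \eta, \pi) = p(y_i \mid \pi) \prod_{j=1}^{m} p(x_{i,j} \mid y_i, \eta)$$
where $p(\cdot \mid \cdot)$ is an arbitrary conditional (discrete or continuous) 1-D density.
(Figures: graphical model $Y_i$, $X_i$ for the full Gaussian; graphical model $Y_i$ with children $X_{i,1}, \ldots, X_{i,m}$ for naïve Bayes.)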

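5 The predictive distribution
Understanding the predictive distribution:
$$p(y_i^k = 1 \mid x_i, \mu, \Sigma, \pi) = \frac{\pi_k\, N(x_i \mid \mu_k, \Sigma_k)}{\sum_{k'} \pi_{k'}\, N(x_i \mid \mu_{k'}, \Sigma_{k'})} \quad (*)$$
Under the naïve Bayes assumption:
$$p(y_i^k = 1 \mid x_i) = \frac{\pi_k \exp\left\{ -\sum_j \left( \frac{(x_{i,j} - \mu_{k,j})^2}{2\sigma_{k,j}^2} + \log \sigma_{k,j} + C \right) \right\}}{\sum_{k'} \pi_{k'} \exp\left\{ -\sum_j \left( \frac{(x_{i,j} - \mu_{k',j})^2}{2\sigma_{k',j}^2} + \log \sigma_{k',j} + C \right) \right\}} \quad (**)$$
with $C = \frac{1}{2}\log 2\pi$.
For two classes (i.e., K = 2), and when the two classes have the same variance ($\sigma_{1,j} = \sigma_{2,j} = \sigma_j$), (**) turns out to be the logistic function:
$$p(y_i^1 = 1 \mid x_i) = \frac{1}{1 + \exp\left\{ -\sum_j x_{i,j}\frac{\mu_{1,j} - \mu_{2,j}}{\sigma_j^2} + \sum_j \frac{\mu_{1,j}^2 - \mu_{2,j}^2}{2\sigma_j^2} + \log\frac{1 - \pi_1}{\pi_1} \right\}} = \frac{1}{1 + e^{-\theta^T x_i}}$$
where $x_i$ is augmented with a constant 1, $\theta_j = (\mu_{1,j} - \mu_{2,j})/\sigma_j^2$, and $\theta_0 = \sum_j \frac{\mu_{2,j}^2 - \mu_{1,j}^2}{2\sigma_j^2} + \log\frac{\pi_1}{1 - \pi_1}$.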
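The claim that two-class Gaussian naïve Bayes with a shared per-feature variance reduces to a logistic function can be checked numerically. The sketch below computes the posterior once from Bayes rule and once from the logistic form, using weights $\theta_j = (\mu_{1,j} - \mu_{2,j})/\sigma_j^2$ and the matching intercept. All function names and the example parameter values are assumptions made for illustration.

```python
import math

def gnb_posterior(x, pi1, mu1, mu2, sigma):
    """p(y=1 | x) from Bayes rule, Gaussian per feature, sigma shared by both classes."""
    def log_joint(prior, mu):
        return math.log(prior) + sum(
            -((xj - mj) ** 2) / (2 * sj ** 2) - math.log(sj) - 0.5 * math.log(2 * math.pi)
            for xj, mj, sj in zip(x, mu, sigma)
        )
    diff = log_joint(1 - pi1, mu2) - log_joint(pi1, mu1)
    return 1.0 / (1.0 + math.exp(diff))

def logistic_weights(pi1, mu1, mu2, sigma):
    """Intercept theta0 and weights theta implied by the naive Bayes parameters."""
    theta = [(m1 - m2) / s ** 2 for m1, m2, s in zip(mu1, mu2, sigma)]
    theta0 = math.log(pi1 / (1 - pi1)) + sum(
        (m2 ** 2 - m1 ** 2) / (2 * s ** 2) for m1, m2, s in zip(mu1, mu2, sigma)
    )
    return theta0, theta

def logistic_posterior(x, theta0, theta):
    """p(y=1 | x) = 1 / (1 + exp(-(theta0 + theta . x)))."""
    z = theta0 + sum(t * xj for t, xj in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters for a two-feature problem.
x = [0.3, -1.2]
pi1, mu1, mu2, sigma = 0.4, [1.0, 0.0], [-1.0, 0.5], [1.5, 0.8]
p_bayes = gnb_posterior(x, pi1, mu1, mu2, sigma)
theta0, theta = logistic_weights(pi1, mu1, mu2, sigma)
p_logistic = logistic_posterior(x, theta0, theta)
```

The two numbers agree up to floating-point error because the quadratic terms in $x$ cancel when the variances are shared, leaving an exponent that is linear in $x$.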

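6 The decision boundary
The predictive distribution:
$$p(y_i^1 = 1 \mid x_i) = \frac{1}{1 + \exp\left\{ -\sum_{j=1}^{M} \theta_j x_{i,j} - \theta_0 \right\}} = \frac{1}{1 + e^{-\theta^T x_i}}$$
The Bayes decision rule:
$$\ln \frac{p(y_i^1 = 1 \mid x_i)}{p(y_i^2 = 1 \mid x_i)} = \ln \frac{\dfrac{1}{1 + e^{-\theta^T x_i}}}{\dfrac{e^{-\theta^T x_i}}{1 + e^{-\theta^T x_i}}} = \theta^T x_i$$
so the decision boundary $\theta^T x_i = 0$ is linear in $x_i$.
For multiple classes (i.e., K > 2), (*) corresponds to a softmax function:
$$p(y_i^k = 1 \mid x_i) = \frac{e^{\theta_k^T x_i}}{\sum_{l} e^{\theta_l^T x_i}}$$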
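A short sketch of the softmax posterior and the resulting decision rule may help here. It assumes each class $k$ has its own weight vector and that the input carries a leading constant 1 for the intercept; `softmax_posterior`, `predict`, and the numbers are hypothetical.

```python
import math

def softmax_posterior(x, thetas):
    """Return [p(y = k | x)] with p proportional to exp(theta_k . x)."""
    scores = [sum(t * xj for t, xj in zip(theta, x)) for theta in thetas]
    top = max(scores)  # subtracting the max leaves the result unchanged but avoids overflow
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(x, thetas):
    """Bayes decision rule: pick the class with the largest posterior."""
    probs = softmax_posterior(x, thetas)
    return max(range(len(probs)), key=probs.__getitem__)

# Hypothetical three-class example; the first input entry is the constant 1.
x = [1.0, 2.0, -1.0]
thetas = [[0.0, 1.0, 0.0], [0.5, -1.0, 0.0], [0.0, 0.0, 2.0]]
probs = softmax_posterior(x, thetas)
label = predict(x, thetas)
```

With only two weight vectors, the same function returns the logistic posterior with $\theta = \theta_1 - \theta_2$, which ties the multi-class case back to the two-class rule.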

7 Discussion: Generative and discriminative classifiers
Generative: modeling the joint distribution of all data.
Discriminative: modeling only points at the boundary.
How? Regression!

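8 Linear regression
The data: $\{(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_N, y_N)\}$
Both nodes are observed:
  X is an input vector.
  Y is a response vector (we first consider y as a generic continuous response vector, then we consider the special case of classification where y is a discrete indicator).
A regression scheme can be used to model $p(y \mid x)$ directly, rather than $p(x, y)$.
(Figure: graphical model over $X_i$ and $Y_i$ inside a plate of size N.)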

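9 Logistic regression (sigmoid classifier)
The conditional distribution: a Bernoulli
$$p(y \mid x) = \mu(x)^{y} (1 - \mu(x))^{1 - y}$$
where $\mu$ is a logistic function
$$\mu(x) = \frac{1}{1 + e^{-\theta^T x}}$$
We can use the brute-force gradient method, as in linear regression.
But we can also apply generic laws by observing that $p(y \mid x)$ is an exponential family function, more specifically a generalized linear model (see next lecture!).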

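10 Training Logistic Regression: MCLE
Estimate parameters $\theta = \langle \theta_0, \theta_1, \ldots, \theta_n \rangle$ to maximize the conditional likelihood of the training data.
Training data: $D = \{(x_1, y_1), \ldots, (x_N, y_N)\}$
Data likelihood = $\prod_{i=1}^{N} P(x_i, y_i \mid \theta)$
Data conditional likelihood = $\prod_{i=1}^{N} P(y_i \mid x_i, \theta)$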

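11 Expressing Conditional Log Likelihood
Recall the logistic function:
$$\mu_i = \frac{1}{1 + e^{-\theta^T x_i}}$$
and the conditional likelihood:
$$l(\theta) = \log \prod_i P(y_i \mid x_i, \theta) = \sum_i \left[ y_i \log \mu_i + (1 - y_i) \log(1 - \mu_i) \right] = \sum_i \left[ y_i\, \theta^T x_i - \log\left(1 + e^{\theta^T x_i}\right) \right]$$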
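The conditional log likelihood is easy to evaluate once it is written as $\sum_i [y_i\,\theta^T x_i - \log(1 + e^{\theta^T x_i})]$. A minimal sketch follows, assuming labels in {0, 1} and inputs with a leading constant 1; the helper names and the toy data are illustrative only.

```python
import math

def log1p_exp(z):
    """Numerically stable log(1 + e^z)."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def cond_log_likelihood(theta, xs, ys):
    """l(theta) = sum_i [ y_i * theta.x_i - log(1 + exp(theta.x_i)) ]."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = sum(t * xj for t, xj in zip(theta, x))
        total += y * z - log1p_exp(z)
    return total

# Hypothetical toy data: first column is the constant 1 for the intercept.
xs = [[1.0, -2.0], [1.0, 0.0], [1.0, 1.0], [1.0, 3.0]]
ys = [0, 1, 0, 1]
ll_zero = cond_log_likelihood([0.0, 0.0], xs, ys)
```

At $\theta = 0$ every example has probability 1/2, so the value is $-N \log 2$; this is a convenient baseline against which any fitted $\theta$ should improve.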

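12 Maximizing Conditional Log Likelihood
The objective:
$$l(\theta) = \sum_i \left[ y_i\, \theta^T x_i - \log\left(1 + e^{\theta^T x_i}\right) \right]$$
(Figure: plot of $l(\theta)$ against $\theta$.)
Good news: $l(\theta)$ is a concave function of $\theta$.
Bad news: there is no closed-form solution that maximizes $l(\theta)$.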

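13 Gradient Ascent
Property of the sigmoid function:
$$\mu(t) = \frac{1}{1 + e^{-t}}, \qquad \frac{d\mu}{dt} = \mu(1 - \mu)$$
The gradient (with $\mu_i = \mu(\theta^T x_i)$ and $x_{i,j}$ the $j$-th feature of example $i$):
$$\frac{\partial l(\theta)}{\partial \theta_j} = \sum_i (y_i - \mu_i)\, x_{i,j}$$
The gradient ascent algorithm: iterate until change < ε
  repeat, for all $j$:
$$\theta_j \leftarrow \theta_j + \eta \sum_i (y_i - \mu_i)\, x_{i,j}$$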
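The gradient ascent loop on this slide can be sketched in a few lines. The version below assumes a fixed step size `eta`, a stopping threshold `eps` on the largest coordinate change, and a small non-separable toy data set; all names and values are illustrative choices rather than the lecture's.

```python
import math

def sigmoid(t):
    """Logistic function, written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def gradient(theta, xs, ys):
    """Gradient of the conditional log likelihood: sum_i (y_i - mu_i) * x_i."""
    grad = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        mu = sigmoid(sum(t * xj for t, xj in zip(theta, x)))
        for j, xj in enumerate(x):
            grad[j] += (y - mu) * xj
    return grad

def gradient_ascent(xs, ys, eta=0.1, eps=1e-6, max_iter=20000):
    """Repeat theta <- theta + eta * gradient until the update is smaller than eps."""
    theta = [0.0] * len(xs[0])
    for _ in range(max_iter):
        step = [eta * g for g in gradient(theta, xs, ys)]
        theta = [t + s for t, s in zip(theta, step)]
        if max(abs(s) for s in step) < eps:
            break
    return theta

# Hypothetical 1-D data with an intercept column; the labels are not linearly separable.
xs = [[1.0, v] for v in (-2.0, -1.0, 0.0, 1.0, 2.0, 3.0)]
ys = [0, 0, 1, 0, 1, 1]
theta_hat = gradient_ascent(xs, ys)
```

Because the objective is concave, a sufficiently small fixed step size converges to the global maximum on data like this. On linearly separable data the unregularized maximum does not exist and the weights would keep growing.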

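14 How about MAP?
It is very common to use regularized maximum likelihood.
One common approach is to define a prior on $\theta$: a Normal distribution with zero mean and identity covariance, $p(\theta) = N(0, I)$.
This helps avoid very large weights and overfitting.
MAP estimate:
$$\theta^{*} = \arg\max_{\theta} \log\left[ p(\theta) \prod_i P(y_i \mid x_i, \theta) \right]$$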
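With a zero-mean Gaussian prior, the log prior contributes a quadratic penalty, so the MAP gradient is the likelihood gradient minus a term proportional to $\theta$. The sketch below assumes a prior $N(0, \lambda^{-1} I)$, where `lam=1.0` corresponds to the identity covariance mentioned on the slide, and it penalizes the intercept too. Function names and data are hypothetical.

```python
import math

def log1p_exp(z):
    """Numerically stable log(1 + e^z)."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t)) if t >= 0 else math.exp(t) / (1.0 + math.exp(t))

def map_objective(theta, xs, ys, lam=1.0):
    """Conditional log likelihood plus log prior (up to an additive constant)."""
    ll = 0.0
    for x, y in zip(xs, ys):
        z = sum(t * xj for t, xj in zip(theta, x))
        ll += y * z - log1p_exp(z)
    return ll - 0.5 * lam * sum(t * t for t in theta)

def map_gradient(theta, xs, ys, lam=1.0):
    """Likelihood gradient sum_i (y_i - mu_i) x_i, minus lam * theta from the prior."""
    grad = [-lam * t for t in theta]
    for x, y in zip(xs, ys):
        mu = sigmoid(sum(t * xj for t, xj in zip(theta, x)))
        for j, xj in enumerate(x):
            grad[j] += (y - mu) * xj
    return grad

# Hypothetical separable data: the MLE would diverge, the MAP estimate stays finite.
xs = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
ys = [0, 0, 1, 1]
theta_map = [0.0, 0.0]
for _ in range(2000):  # fixed-step gradient ascent on the MAP objective
    g = map_gradient(theta_map, xs, ys)
    theta_map = [t + 0.05 * gj for t, gj in zip(theta_map, g)]
```

The penalty makes the objective strictly concave, which is why plain gradient ascent settles on a finite weight vector even when the classes can be separated perfectly.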

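15 MLE vs MAP
Maximum conditional likelihood estimate:
$$\theta^{*} = \arg\max_{\theta} \sum_i \log P(y_i \mid x_i, \theta)$$
Maximum a posteriori estimate:
$$\theta^{*} = \arg\max_{\theta} \left[ \log p(\theta) + \sum_i \log P(y_i \mid x_i, \theta) \right]$$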

16 Logistic regression: practical issues
IRLS takes $O(Nd^3)$ per iteration, where N = number of training cases and d = dimension of input x.
Quasi-Newton methods, which approximate the Hessian, work faster.
Conjugate gradient takes $O(Nd)$ per iteration, and usually works best in practice.
Stochastic gradient descent can also be used if N is large, cf. the perceptron rule:
$$\theta \leftarrow \theta + \eta\, (y_i - \mu_i)\, x_i$$
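For large N, the per-example update mentioned at the end of this slide can be sketched as follows. This is a minimal illustration with a constant step size, a fixed number of passes, and a seeded shuffle; `sgd`, its defaults, and the toy data are assumptions, and a practical implementation would usually decay the step size.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t)) if t >= 0 else math.exp(t) / (1.0 + math.exp(t))

def sgd(xs, ys, eta=0.05, epochs=200, seed=0):
    """Stochastic updates: theta <- theta + eta * (y_i - mu_i) * x_i, one example at a time."""
    rng = random.Random(seed)
    theta = [0.0] * len(xs[0])
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in a fresh random order each pass
        for i in order:
            mu = sigmoid(sum(t * xj for t, xj in zip(theta, xs[i])))
            theta = [t + eta * (ys[i] - mu) * xj for t, xj in zip(theta, xs[i])]
    return theta

# Hypothetical 1-D data with an intercept column.
xs = [[1.0, v] for v in (-2.0, -1.0, 0.0, 1.0, 2.0, 3.0)]
ys = [0, 0, 1, 0, 1, 1]
theta_sgd = sgd(xs, ys)
```

Each update costs O(d) instead of O(Nd), at the price of a noisy path that hovers around the maximizer rather than converging exactly when the step size is held constant.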

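17 Naïve Bayes vs Logistic Regression
Consider Y boolean, X continuous, $X = \langle X_1, \ldots, X_n \rangle$.
Number of parameters to estimate:
  NB: $4n + 1$ when each class has its own mean and variance per feature (plus the class prior)
  LR: $n + 1$
Estimation method:
  NB parameter estimates are uncoupled (super easy).
  LR parameter estimates are coupled (less easy).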
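"Uncoupled" means that every Gaussian naïve Bayes parameter has its own closed-form estimate, computed from one feature within one class, with no iterative optimization. A minimal sketch, assuming boolean labels in {0, 1}, maximum-likelihood (divide by count) variances, and class-specific variances; `fit_gnb` and the data are hypothetical.

```python
def fit_gnb(xs, ys):
    """Closed-form ML estimates: class prior, plus per-class, per-feature mean and variance."""
    n = len(xs[0])
    params = {}
    for label in (0, 1):
        rows = [x for x, y in zip(xs, ys) if y == label]
        # Each feature is handled independently of the others.
        means = [sum(r[j] for r in rows) / len(rows) for j in range(n)]
        variances = [sum((r[j] - means[j]) ** 2 for r in rows) / len(rows) for j in range(n)]
        params[label] = (means, variances)
    prior1 = sum(ys) / len(ys)  # estimate of P(Y = 1)
    return prior1, params

# Hypothetical data set with n = 2 features.
xs = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 9.0]]
ys = [0, 0, 1, 1]
prior1, params = fit_gnb(xs, ys)
n_params_nb = 4 * len(xs[0]) + 1  # 2 classes x n features x (mean, variance) + 1 prior
n_params_lr = len(xs[0]) + 1      # n weights + 1 intercept
```

Logistic regression has fewer parameters, but every weight appears in every example's predicted probability, so the weights have to be fitted jointly by an iterative method.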

18 Naïve Bayes vs Logistic Regression
Asymptotic comparison (# training examples → infinity):
  When model assumptions are correct, NB and LR produce identical classifiers.
  When model assumptions are incorrect, LR is less biased (it does not assume conditional independence) and is therefore expected to outperform NB.

19 Naïve Bayes vs Logistic Regression
Non-asymptotic analysis (see [Ng & Jordan, 2002]):
  Convergence rate of parameter estimates: how many training examples are needed to assure good estimates?
  NB: order $\log n$ (where n = # of attributes in X)
  LR: order $n$
NB converges more quickly to its (perhaps less helpful) asymptotic estimates.

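20 Rate of convergence: logistic regression
Let $h_{Dis,m}$ be logistic regression trained on m examples in n dimensions. Then with high probability:
$$\varepsilon(h_{Dis,m}) \le \varepsilon(h_{Dis,\infty}) + O\left( \sqrt{\frac{n}{m} \log \frac{m}{n}} \right)$$
Implication: if we want $\varepsilon(h_{Dis,m}) \le \varepsilon(h_{Dis,\infty}) + \varepsilon_0$ for some small constant $\varepsilon_0$, it suffices to pick order n examples.
  LR converges to its asymptotic classifier in order n examples.
  The result follows from Vapnik's structural risk bound, plus the fact that the "VC Dimension" of n-dimensional linear separators is n + 1.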

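21 Rate of convergence: naïve Bayes parameters
Let any $\varepsilon_1, \delta > 0$ and any $l \ge 0$ be fixed. Assume that for some fixed $\rho > 0$, we have that
$$\rho \le p(y = T) \le 1 - \rho$$
Let
$$m = O\left( \frac{1}{\varepsilon_1^2} \log \frac{n}{\delta} \right)$$
Then with probability at least $1 - \delta$, after m examples:
1. For discrete inputs, $|\hat{p}(x_i \mid y = b) - p(x_i \mid y = b)| \le \varepsilon_1$ and $|\hat{p}(y = b) - p(y = b)| \le \varepsilon_1$, for all i and b.
2. For continuous inputs, $|\hat{\mu}_{i \mid y = b} - \mu_{i \mid y = b}| \le \varepsilon_1$ and $|\hat{\sigma}^2_{i \mid y = b} - \sigma^2_{i \mid y = b}| \le \varepsilon_1$, for all i and b.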

22 Some experiments from UCI data sets

23 Summary
Logistic regression
  Functional form follows from Naïve Bayes assumptions:
    for Gaussian Naïve Bayes, assuming class-independent variance $\sigma_{i,k} = \sigma_i$;
    for discrete-valued Naïve Bayes too.
  But the training procedure picks parameters without the conditional independence assumption.
  MLE training: pick W to maximize $P(Y \mid X; W)$.
  MAP training: pick W to maximize $P(W \mid X, Y)$; regularization helps reduce overfitting.
Gradient ascent/descent
  General approach when closed-form solutions are unavailable.
Generative vs. Discriminative classifiers
  Bias vs. variance tradeoff.
