Discriminative classifier: Logistic Regression. CS534-Machine Learning

Sarah Barrett
5 years ago

1 Discriminative Classifier: Logistic Regression. CS534 - Machine Learning

2 Probabilistic Classifier
Given an instance $\mathbf{x}$, what does a probabilistic classifier do differently compared to, say, the perceptron? It does not directly predict $y$. Instead, it first computes the probability that the instance belongs to each class, i.e., the posterior probability $P(y \mid \mathbf{x})$ of $y$ given $\mathbf{x}$. Given $P(y \mid \mathbf{x})$, it then makes a prediction using decision theory.

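3 Decision Theory
Goal 1: minimizing the probability of mistake.
$$y^* = \arg\max_{y} P(y \mid \mathbf{x})$$
i.e., predict the class that has the maximum posterior probability.
Goal 2: minimizing the expected loss. We are given a cost matrix $C$ specifying the cost of the different types of mistakes, where $C(y', y)$ is the cost of predicting $y'$ when the true label is $y$. For a spam filter the labels are spam and non-spam, and correct predictions cost 0. We then predict
$$y^* = \arg\min_{y'} \sum_{y} P(y \mid \mathbf{x})\, C(y', y),$$
where the sum is the expected cost if we predict $y'$.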

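4 Example
Suppose our probabilistic spam filter gives the following posterior for an incoming email $\mathbf{x}$: $P(\text{spam} \mid \mathbf{x}) = 0.6$. Let $C(y', y)$ denote the cost of predicting $y'$ when the true label is $y$, with zero cost for correct predictions.
What is the expected cost if we predict spam? If it is spam: no cost (probability 0.6). If it is not: cost $C(\text{spam}, \text{nonspam})$ (probability 0.4). The expected cost is therefore $0.4\, C(\text{spam}, \text{nonspam})$.
What if we predict non-spam? By the same reasoning the expected cost is $0.6\, C(\text{nonspam}, \text{spam})$.
With the cost matrix of this example the second quantity is the smaller one, that is, $0.6\, C(\text{nonspam}, \text{spam}) < 0.4\, C(\text{spam}, \text{nonspam})$, so
$$\arg\min_{y'} \sum_{y} P(y \mid \mathbf{x})\, C(y', y) = \text{nonspam}$$
even though spam is the more probable class.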
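The expected-cost rule in this example can be sketched in a few lines of Python. The posterior 0.6 comes from the slide; the off-diagonal costs (10 for flagging a legitimate email, 1 for missing a spam) are assumed values chosen only for illustration, and the function names are hypothetical.

```python
def expected_cost(posterior, cost, prediction):
    """Expected cost of `prediction`: sum over true labels y of P(y|x) * cost[prediction][y]."""
    return sum(p * cost[prediction][y] for y, p in posterior.items())


def min_cost_prediction(posterior, cost):
    """Return the label whose expected cost is smallest."""
    return min(cost, key=lambda pred: expected_cost(posterior, cost, pred))


# Posterior from the example: P(spam | x) = 0.6, hence P(nonspam | x) = 0.4.
posterior = {"spam": 0.6, "nonspam": 0.4}

# cost[predicted][true]. The zero diagonal follows the slide;
# the off-diagonal values 10 and 1 are ASSUMED for illustration.
cost = {
    "spam": {"spam": 0.0, "nonspam": 10.0},
    "nonspam": {"spam": 1.0, "nonspam": 0.0},
}

for label in cost:
    print(label, expected_cost(posterior, cost, label))
print("prediction:", min_cost_prediction(posterior, cost))
```

Under these assumed costs the minimum-cost prediction is non-spam, although spam has the larger posterior; with equal off-diagonal costs the rule reduces to picking the most probable class.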

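5 Two Main Approaches
To learn a probabilistic classifier, there are two types of approaches.
Generative: learn $P(y)$ and $P(\mathbf{x} \mid y)$, then compute $P(y \mid \mathbf{x})$ using Bayes rule, $P(y \mid \mathbf{x}) = P(\mathbf{x} \mid y)\, P(y) / P(\mathbf{x})$.
Discriminative: learn $P(y \mid \mathbf{x})$ directly. Logistic regression is one such technique.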

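6 Logistic Regression
Given a training set $D$, logistic regression directly learns the conditional distribution $P(y \mid \mathbf{x})$. We will assume only two classes, $y \in \{0, 1\}$, and a parametric form for $P(y = 1 \mid \mathbf{x}; \mathbf{w})$, where $\mathbf{w}$ is the parameter vector:
$$P(y = 1 \mid \mathbf{x}; \mathbf{w}) = \frac{1}{1 + e^{-\mathbf{w}^T \mathbf{x}}}, \qquad P(y = 0 \mid \mathbf{x}; \mathbf{w}) = 1 - P(y = 1 \mid \mathbf{x}; \mathbf{w})$$
It is easy to show that this is equivalent to
$$\log \frac{P(y = 1 \mid \mathbf{x}; \mathbf{w})}{P(y = 0 \mid \mathbf{x}; \mathbf{w})} = \mathbf{w}^T \mathbf{x},$$
i.e., the log odds of class 1 is a linear function of $\mathbf{x}$.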
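The equivalence between the sigmoid form and the linear log odds can be checked in two lines. This short derivation assumes the parametric form $P(y = 1 \mid \mathbf{x}; \mathbf{w}) = 1 / (1 + e^{-\mathbf{w}^T \mathbf{x}})$ and is offered as a sketch of the missing step, not as the slide's own derivation.

```latex
\begin{align*}
\frac{P(y = 1 \mid \mathbf{x}; \mathbf{w})}{P(y = 0 \mid \mathbf{x}; \mathbf{w})}
  &= \frac{1 / \bigl(1 + e^{-\mathbf{w}^T \mathbf{x}}\bigr)}
          {e^{-\mathbf{w}^T \mathbf{x}} / \bigl(1 + e^{-\mathbf{w}^T \mathbf{x}}\bigr)}
   = e^{\mathbf{w}^T \mathbf{x}}, \\
\log \frac{P(y = 1 \mid \mathbf{x}; \mathbf{w})}{P(y = 0 \mid \mathbf{x}; \mathbf{w})}
  &= \mathbf{w}^T \mathbf{x}.
\end{align*}
```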

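7 The Logistic Sigmoid Function
$$g(\mathbf{w}, \mathbf{x}) = \frac{1}{1 + \exp(-\mathbf{w}^T \mathbf{x})}$$
The output of a linear function has range $(-\infty, \infty)$. The logistic sigmoid function transforms the value of $\mathbf{w}^T \mathbf{x}$ into the range between 0 and 1.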
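A minimal Python sketch of the sigmoid follows. The two-branch form is an implementation choice (not from the slides) that keeps `math.exp` from overflowing for large negative inputs; `predict_proba` is a hypothetical helper name, and `w` and `x` are plain lists of floats.

```python
import math


def sigmoid(t):
    """Logistic sigmoid 1 / (1 + exp(-t)), arranged so exp never overflows."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)  # t < 0, so exp(t) lies in [0, 1)
    return e / (1.0 + e)


def predict_proba(w, x):
    """P(y = 1 | x; w) for a weight list w and a feature list x."""
    return sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, x)))
```

Any real score is mapped into the interval from 0 to 1, and a score of exactly 0 maps to probability 0.5.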

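8 Logistic Regression Yields a Linear Classifier
Given $P(y \mid \mathbf{x})$, suppose we use the decision rule for minimizing classification error: predict $y = 1$ if $P(y = 1 \mid \mathbf{x}) > P(y = 0 \mid \mathbf{x})$, i.e., predict $y = 1$ if $\mathbf{w}^T \mathbf{x} > 0$. This yields a linear classifier.
More generally, predict $y = 1$ if $P(y = 1 \mid \mathbf{x}) > t$, where $t$ is a threshold. Depending on the loss function, $t$ can take different values. For such a more general decision rule, the 0 in $\mathbf{w}^T \mathbf{x} > 0$ is replaced with a different threshold, and the classifier remains linear.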
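To make the threshold statement concrete: because the sigmoid is monotone, the rule $P(y = 1 \mid \mathbf{x}) > t$ is the same as $\mathbf{w}^T \mathbf{x} > \log\frac{t}{1 - t}$. The sketch below assumes the sigmoid model for $P(y = 1 \mid \mathbf{x})$ and a probability threshold `t` strictly between 0 and 1; the function names are hypothetical.

```python
import math


def score_threshold(t):
    """Threshold on w^T x equivalent to the rule P(y=1|x) > t, for 0 < t < 1."""
    return math.log(t / (1.0 - t))


def predict(w, x, t=0.5):
    """Predict 1 if the linear score exceeds the threshold implied by t, else 0."""
    score = sum(w_j * x_j for w_j, x_j in zip(w, x))
    return 1 if score > score_threshold(t) else 0
```

With `t = 0.5` the score threshold is 0, which recovers the error-minimizing rule; a larger `t` shifts the decision boundary without changing its orientation.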

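9 Learning Setup
Given a set of training instances: $(\mathbf{x}^1, y^1), \ldots, (\mathbf{x}^N, y^N)$. We assume that $\mathbf{x}$ and $y$ are probabilistically related, parameterized by $\mathbf{w}$. Goal: learn $\mathbf{w}$ from the training data using Maximum Likelihood Estimation.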

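10 Likelihood Function
We assume each training example $(\mathbf{x}^i, y^i)$ is drawn IID from the same but unknown distribution $P(\mathbf{x}, y; \mathbf{w})$:
$$\mathbf{w}_{MLE} = \arg\max_{\mathbf{w}} \log P(D; \mathbf{w}) = \arg\max_{\mathbf{w}} \sum_{i=1}^{N} \log P(\mathbf{x}^i, y^i; \mathbf{w})$$
The joint distribution $P(\mathbf{x}, y; \mathbf{w})$ can be factored as $P(y \mid \mathbf{x}; \mathbf{w})\, P(\mathbf{x})$. Further, $P(\mathbf{x})$ can be dropped because it does not depend on $\mathbf{w}$:
$$\arg\max_{\mathbf{w}} \sum_{i=1}^{N} \left[ \log P(y^i \mid \mathbf{x}^i; \mathbf{w}) + \log P(\mathbf{x}^i) \right] = \arg\max_{\mathbf{w}} \sum_{i=1}^{N} \log P(y^i \mid \mathbf{x}^i; \mathbf{w})$$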

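11 Computing the Likelihood
Let $\hat{y}^i = P(y^i = 1 \mid \mathbf{x}^i; \mathbf{w}) = \frac{1}{1 + e^{-\mathbf{w}^T \mathbf{x}^i}}$, so that $P(y^i = 0 \mid \mathbf{x}^i; \mathbf{w}) = 1 - \hat{y}^i$. This can be compactly written as
$$P(y^i \mid \mathbf{x}^i; \mathbf{w}) = (\hat{y}^i)^{y^i} (1 - \hat{y}^i)^{1 - y^i}$$
We will take our learning objective function to be:
$$l(\mathbf{w}) = \sum_{i=1}^{N} \log P(y^i \mid \mathbf{x}^i; \mathbf{w}) = \sum_{i=1}^{N} \left[ y^i \log \hat{y}^i + (1 - y^i) \log (1 - \hat{y}^i) \right], \qquad \mathbf{w}_{MLE} = \arg\max_{\mathbf{w}} l(\mathbf{w})$$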
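The objective $l(\mathbf{w}) = \sum_i [y^i \log \hat{y}^i + (1 - y^i) \log(1 - \hat{y}^i)]$ can be evaluated as below. This is a sketch under the sigmoid model; it uses the identities $\log \hat{y} = \log g(z)$ and $\log(1 - \hat{y}) = \log g(-z)$ with $z = \mathbf{w}^T \mathbf{x}$, and the helper `log_sigmoid` is an assumed name introduced here for numerical stability.

```python
import math


def log_sigmoid(z):
    """log(1 / (1 + exp(-z))), arranged so exp never overflows."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_likelihood(w, X, y):
    """Conditional log-likelihood of labels y (0 or 1) given rows of X under weights w."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        z = sum(w_j * x_j for w_j, x_j in zip(w, x_i))
        # y * log(y_hat) + (1 - y) * log(1 - y_hat), with log(1 - g(z)) = log g(-z)
        total += y_i * log_sigmoid(z) + (1 - y_i) * log_sigmoid(-z)
    return total
```

With all weights at zero every example has probability 0.5, so the value is $N \log 0.5$; weights that fit the labels better give a larger (less negative) value.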

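12 Fitting Logistic Regression by Gradient Ascent
$$l(\mathbf{w}) = \sum_{i=1}^{N} \left[ y^i \log \hat{y}^i + (1 - y^i) \log (1 - \hat{y}^i) \right]$$
Recall that $\hat{y}^i = g(\mathbf{w}^T \mathbf{x}^i)$ with $g(t) = \frac{1}{1 + \exp(-t)}$.
Small tip: for $g(t) = \frac{1}{1 + \exp(-t)}$ we have $g'(t) = \frac{\exp(-t)}{(1 + \exp(-t))^2} = g(t)\,(1 - g(t))$.
Applying the chain rule to each term, the factor $\hat{y}^i (1 - \hat{y}^i)$ cancels and the gradient is
$$\nabla_{\mathbf{w}} l(\mathbf{w}) = \sum_{i=1}^{N} (y^i - \hat{y}^i)\, \mathbf{x}^i$$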
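One way to gain confidence in the gradient $\sum_i (y^i - \hat{y}^i)\, \mathbf{x}^i$ is to compare it against central finite differences of the log-likelihood. The sketch below assumes the sigmoid model; the function names and the step size `eps` are illustrative choices, not part of the slides.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def dot(w, x):
    return sum(w_j * x_j for w_j, x_j in zip(w, x))


def log_likelihood(w, X, y):
    """Sum of y*log(y_hat) + (1-y)*log(1-y_hat); fine for moderate scores."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        p = sigmoid(dot(w, x_i))
        total += y_i * math.log(p) + (1 - y_i) * math.log(1.0 - p)
    return total


def gradient(w, X, y):
    """Analytic gradient: sum over examples of (y - y_hat) * x."""
    grad = [0.0] * len(w)
    for x_i, y_i in zip(X, y):
        error = y_i - sigmoid(dot(w, x_i))
        for j, x_ij in enumerate(x_i):
            grad[j] += error * x_ij
    return grad


def numerical_gradient(w, X, y, eps=1e-6):
    """Central finite-difference approximation of the same gradient."""
    grad = []
    for j in range(len(w)):
        w_plus, w_minus = list(w), list(w)
        w_plus[j] += eps
        w_minus[j] -= eps
        diff = log_likelihood(w_plus, X, y) - log_likelihood(w_minus, X, y)
        grad.append(diff / (2 * eps))
    return grad
```

If the two gradients agree to several decimal places at a few arbitrary weight vectors, the derivation and the implementation are very likely consistent.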

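13 Batch Gradient Ascent for LR
Given: training examples $(\mathbf{x}^i, y^i)$, $i = 1, \ldots, N$
Let $\mathbf{w} \leftarrow (0, 0, 0, \ldots, 0)$
Repeat until convergence
    $\mathbf{d} \leftarrow (0, 0, 0, \ldots, 0)$
    For $i = 1$ to $N$ do
        $\hat{y}^i \leftarrow \frac{1}{1 + e^{-\mathbf{w}^T \mathbf{x}^i}}$
        $\text{error} \leftarrow y^i - \hat{y}^i$
        $\mathbf{d} \leftarrow \mathbf{d} + \text{error} \cdot \mathbf{x}^i$
    $\mathbf{w} \leftarrow \mathbf{w} + \eta\, \mathbf{d}$
Here $\eta$ is the learning rate. An online gradient ascent algorithm can be easily constructed by updating $\mathbf{w}$ after each example instead of after each pass.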
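The batch procedure translates almost line for line into Python. This is a minimal sketch: it runs a fixed number of passes instead of testing convergence, and the learning rate `eta` and iteration count `n_iters` are assumed defaults, not values from the slides. Each row of `X` is expected to include a constant 1 if a bias term is wanted.

```python
import math


def sigmoid(t):
    """Overflow-safe logistic sigmoid."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def batch_gradient_ascent(X, y, eta=0.1, n_iters=1000):
    """Maximize the logistic log-likelihood by full-batch gradient ascent."""
    w = [0.0] * len(X[0])
    for _ in range(n_iters):            # stands in for "repeat until convergence"
        d = [0.0] * len(w)              # accumulated gradient for this pass
        for x_i, y_i in zip(X, y):
            y_hat = sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, x_i)))
            error = y_i - y_hat
            for j, x_ij in enumerate(x_i):
                d[j] += error * x_ij
        w = [w_j + eta * d_j for w_j, d_j in zip(w, d)]
    return w
```

An online variant would move the weight update inside the inner loop, so that `w` changes after every example rather than once per pass.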

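14 Unlike linear regression, there is no closed-form solution for this optimization problem because of the logistic function. However, other optimization techniques can also be used, for example Newton's method:
$$\mathbf{w} \leftarrow \mathbf{w} - H^{-1} \nabla_{\mathbf{w}} l(\mathbf{w})$$
where $H$ is the Hessian matrix such that $H_{jk} = \frac{\partial^2 l(\mathbf{w})}{\partial w_j\, \partial w_k}$. For logistic regression, we have:
$$H = -X^T R X$$
where $X$ is our data matrix, each row corresponding to an instance, and $R$ is a diagonal matrix with elements $R_{ii} = \hat{y}^i (1 - \hat{y}^i)$.
Newton's method enjoys faster convergence, but each step involves computing the Hessian and taking its inverse, which is an expensive computation. It is also called the iterative reweighted least squares (IRLS) method.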
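A small pure-Python sketch of the Newton update follows. It assumes the log-likelihood Hessian $H = -X^T R X$ with $R_{ii} = \hat{y}^i (1 - \hat{y}^i)$, so the step $-H^{-1} \nabla l$ becomes the solution of $(X^T R X)\, \mathbf{s} = \nabla l$. The helper `solve` (Gaussian elimination) and the fixed iteration count are illustrative choices, and the sketch assumes $X^T R X$ is invertible, which fails for linearly separable data.

```python
import math


def sigmoid(t):
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def solve(A, b):
    """Solve A s = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    s = [0.0] * n
    for r in reversed(range(n)):
        rest = sum(M[r][c] * s[c] for c in range(r + 1, n))
        s[r] = (M[r][n] - rest) / M[r][r]
    return s


def newton_logistic(X, y, n_iters=10):
    """Newton's method (IRLS) for logistic regression, starting from w = 0."""
    d = len(X[0])
    w = [0.0] * d
    for _ in range(n_iters):
        y_hat = [sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, x_i))) for x_i in X]
        grad = [sum((y_i - p) * x_i[j] for x_i, y_i, p in zip(X, y, y_hat))
                for j in range(d)]
        # A = X^T R X, i.e. the negated Hessian of the log-likelihood
        A = [[sum(p * (1.0 - p) * x_i[j] * x_i[k] for x_i, p in zip(X, y_hat))
              for k in range(d)] for j in range(d)]
        step = solve(A, grad)           # equals -H^{-1} grad
        w = [w_j + s_j for w_j, s_j in zip(w, step)]
    return w
```

On small, non-separable problems a handful of Newton steps typically drives the gradient to numerical zero, whereas plain gradient ascent needs many more passes.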

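15 Instability of Maximum Likelihood Estimation
For linearly separable data, the maximum likelihood is achieved by finding a linear decision boundary $\mathbf{w}^T \mathbf{x} = 0$ that separates the two classes perfectly and then making the magnitude of $\mathbf{w}$ go to infinity. This instability can be avoided by adding a regularization term to the likelihood objective:
$$\arg\max_{\mathbf{w}} \sum_{i=1}^{N} \log P(y^i \mid \mathbf{x}^i; \mathbf{w}) - \frac{\lambda}{2} \|\mathbf{w}\|^2$$
Gradient ascent update rule:
$$\mathbf{w} \leftarrow \mathbf{w} + \eta \left( \sum_{i=1}^{N} (y^i - \hat{y}^i)\, \mathbf{x}^i - \lambda \mathbf{w} \right)$$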
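The effect is easy to observe numerically. The sketch below runs gradient ascent on a tiny separable data set with and without the penalty $\frac{\lambda}{2}\|\mathbf{w}\|^2$; the data, `eta`, `lam`, and the iteration counts are all assumed values for illustration, and for simplicity the bias weight is penalized along with the others.

```python
import math


def sigmoid(t):
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit(X, y, lam=0.0, eta=0.1, n_iters=1000):
    """Gradient ascent on the log-likelihood minus (lam / 2) * ||w||^2."""
    w = [0.0] * len(X[0])
    for _ in range(n_iters):
        grad = [-lam * w_j for w_j in w]          # gradient of the penalty term
        for x_i, y_i in zip(X, y):
            error = y_i - sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, x_i)))
            for j, x_ij in enumerate(x_i):
                grad[j] += error * x_ij
        w = [w_j + eta * g_j for w_j, g_j in zip(w, grad)]
    return w


def norm(w):
    return math.sqrt(sum(w_j * w_j for w_j in w))


# Linearly separable toy data (first column is the constant feature).
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]

for n in (500, 5000):
    print(n, "unregularized:", norm(fit(X, y, lam=0.0, n_iters=n)),
          "regularized:", norm(fit(X, y, lam=1.0, n_iters=n)))
```

Without the penalty the weight norm keeps growing the longer the optimizer runs; with it, the norm settles at a finite value that no longer changes with more iterations.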

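16 Connection Between Logistic Regression & Perceptron Algorithm
Both methods learn a linear function of the input features. LR uses the logistic function, $h_{\mathbf{w}}(\mathbf{x}) = \frac{1}{1 + e^{-\mathbf{w}^T \mathbf{x}}}$; the perceptron uses a step function, $h_{\mathbf{w}}(\mathbf{x}) = 1$ if $\mathbf{w}^T \mathbf{x} > 0$ and $0$ otherwise. Both algorithms take a similar update rule:
$$\mathbf{w} \leftarrow \mathbf{w} + \eta\, (y^i - h_{\mathbf{w}}(\mathbf{x}^i))\, \mathbf{x}^i$$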
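The shared form of the two updates can be made explicit by passing the hypothesis function as an argument. In this sketch labels are 0 or 1, `eta` is an assumed learning rate, and the function names are hypothetical.

```python
import math


def dot(w, x):
    return sum(w_j * x_j for w_j, x_j in zip(w, x))


def h_logistic(w, x):
    """Logistic regression hypothesis: a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-dot(w, x)))


def h_step(w, x):
    """Perceptron hypothesis: a hard 0/1 decision."""
    return 1.0 if dot(w, x) > 0 else 0.0


def online_update(w, x, y, h, eta=1.0):
    """One online step w <- w + eta * (y - h(w, x)) * x, shared by both algorithms."""
    error = y - h(w, x)
    return [w_j + eta * error * x_j for w_j, x_j in zip(w, x)]
```

The difference shows up on correctly classified examples: the perceptron's error is exactly 0 there and the weights stay put, while logistic regression still nudges the weights by the remaining probability gap.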

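17 Multi-Class Logistic Regression
For multiclass classification, we define the posterior probability using a so-called soft-max function:
$$P(y = k \mid \mathbf{x}) = \hat{y}_k = \frac{\exp(z_k)}{\sum_{j=1}^{K} \exp(z_j)}$$
where $z_k$ is given by $z_k = \mathbf{w}_k^T \mathbf{x}$. Going through the same MLE derivations, we arrive at the following gradient:
$$\nabla_{\mathbf{w}_k} l = \sum_{i=1}^{N} (y_k^i - \hat{y}_k^i)\, \mathbf{x}^i$$
where $y_k^i = 1$ if $y^i = k$, and $0$ otherwise.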
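A compact sketch of the softmax posterior and the per-class gradient $\sum_i (y_k^i - \hat{y}_k^i)\, \mathbf{x}^i$ is given below. Classes are indexed $0, \ldots, K - 1$ here (an indexing assumption), `W` is a list of $K$ weight lists, and subtracting the maximum score before exponentiating is a standard stability trick that leaves the probabilities unchanged.

```python
import math


def softmax(z):
    """Softmax of a list of scores, shifted by max(z) to avoid overflow."""
    m = max(z)
    exps = [math.exp(z_k - m) for z_k in z]
    total = sum(exps)
    return [e / total for e in exps]


def multiclass_gradient(W, X, y):
    """Gradient of the log-likelihood with respect to each class weight vector.

    W: list of K weight lists; X: list of feature lists; y: class indices in 0..K-1.
    """
    K, d = len(W), len(W[0])
    grad = [[0.0] * d for _ in range(K)]
    for x_i, y_i in zip(X, y):
        p = softmax([sum(w_j * x_j for w_j, x_j in zip(w_k, x_i)) for w_k in W])
        for k in range(K):
            error = (1.0 if y_i == k else 0.0) - p[k]   # y_k - y_hat_k
            for j in range(d):
                grad[k][j] += error * x_i[j]
    return grad
```

Because the indicator values and the softmax outputs each sum to 1 over the classes, the per-class gradients sum to the zero vector, which makes a convenient sanity check.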

18 Summary of Logistic Regression
A discriminative classifier. Learns a conditional probability distribution $P(y \mid \mathbf{x})$ defined by a logistic function. Produces a linear decision boundary. Trained by maximum likelihood estimation; the gradient ascent update bears strong similarity to the perceptron update. Unstable in the linearly separable case, so a regularization term should be used to avoid this issue. Easily extended to multi-class problems using the softmax function.
