Discriminative classifier: Logistic Regression. CS534-Machine Learning


Logistic Regression

Given a training set $D = \{(x_i, y_i)\}_{i=1}^N$, logistic regression learns the conditional distribution $p(y|x)$. We'll assume only two classes ($y \in \{0, 1\}$) and a parametric form

$$p(y=1|x; w) = \frac{1}{1 + e^{-(w_0 + w^\top x)}}, \qquad p(y=0|x; w) = 1 - p(y=1|x; w)$$

where $w$ is the parameter vector. It is easy to show that this is equivalent to

$$\log \frac{p(y=1|x; w)}{p(y=0|x; w)} = w_0 + w^\top x,$$

i.e., the log-odds of class 1 is a linear function of $x$.

The Logistic (Sigmoid) Function

$$g(z) = \frac{1}{1 + e^{-z}}$$

A linear function has range $(-\infty, +\infty)$; the logistic function transforms this range to $(0, 1)$ so that the output can be interpreted as a probability.
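For concreteness, here is a minimal numerical check of the sigmoid's squashing behavior (illustrative code, not from the slides; the input clipping is an added stability detail):

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid g(z) = 1 / (1 + exp(-z)).

    Clipping z is a numerical-stability detail (not on the slide):
    it keeps np.exp from overflowing for very large |z|.
    """
    z = np.clip(z, -500, 500)
    return 1.0 / (1.0 + np.exp(-z))

# A linear score ranges over (-inf, inf); the sigmoid maps it into (0, 1).
print(sigmoid(np.array([-100.0, -1.0, 0.0, 1.0, 100.0])))
# -> [~0.0, 0.269, 0.5, 0.731, ~1.0]
```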

Logistic Regression Yields a Linear Classifier

Given $p(y=1|x; w)$, the decision rule for minimizing classification error is to predict $\hat{y} = 1$ if $p(y=1|x; w) \ge 0.5$, or equivalently if $w_0 + w^\top x \ge 0$. More generally, predict $\hat{y} = 1$ if $p(y=1|x; w) \ge \theta$, where $\theta$ is a threshold; depending on the loss function, $\theta$ can take different values. This yields a linear classifier: the decision boundary $w_0 + w^\top x = 0$ is a hyperplane. For the more general decision rule, the $0$ will be replaced with a different threshold.
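A minimal sketch of this decision rule in Python (the function name and toy numbers are illustrative, not from the slides):

```python
import numpy as np

def predict(w0, w, x, threshold=0.5):
    """Predict 1 iff p(y=1|x; w) >= threshold.

    For threshold = 0.5 this is equivalent to checking the sign of the
    linear score w0 + w.x, so the decision boundary is a hyperplane.
    """
    p = 1.0 / (1.0 + np.exp(-(w0 + w @ x)))
    return int(p >= threshold)

# Score = -1 + 2*1 = 1 > 0, so p ~ 0.73 >= 0.5 and we predict class 1.
print(predict(w0=-1.0, w=np.array([2.0]), x=np.array([1.0])))  # -> 1
```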

Maximum Likelihood Estimation

We assume each training example is drawn independently from the same but unknown distribution (the i.i.d. assumption), hence we can write

$$p(D|w) = \prod_{i=1}^N p(x_i, y_i|w).$$

The joint distribution can be factored as $p(x_i, y_i|w) = p(y_i|x_i, w)\, p(x_i)$. Further, because $p(x_i)$ does not depend on $w$:

$$\hat{w} = \arg\max_w p(D|w) = \arg\max_w \prod_{i=1}^N p(y_i|x_i, w)\, p(x_i) = \arg\max_w \prod_{i=1}^N p(y_i|x_i, w).$$

Computing the Likelihood

Recall $p(y=1|x; w) = g(w^\top x)$ and $p(y=0|x; w) = 1 - g(w^\top x)$. This can be compactly written as

$$p(y|x; w) = g(w^\top x)^{y}\,\big(1 - g(w^\top x)\big)^{1-y}.$$

We'll take our learning objective function to be the log-likelihood:

$$L(w) = \sum_{i=1}^N \Big[ y_i \log g(w^\top x_i) + (1 - y_i) \log \big(1 - g(w^\top x_i)\big) \Big], \qquad \hat{w} = \arg\max_w L(w).$$
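A short sketch of computing this log-likelihood on a dataset (illustrative code, not from the slides; the small epsilon clip is an added guard so the logs never see exactly 0 or 1):

```python
import numpy as np

def log_likelihood(w, X, y, eps=1e-12):
    """L(w) = sum_i [ y_i log g(w.x_i) + (1 - y_i) log(1 - g(w.x_i)) ].

    X is N x d (a constant-one column can serve as the intercept),
    y has entries in {0, 1}.
    """
    p = 1.0 / (1.0 + np.exp(-X @ w))        # g(w.x_i) for every example
    p = np.clip(p, eps, 1 - eps)            # numerical guard, not on the slide
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

X = np.array([[1.0, -2.0], [1.0, 1.5]])     # first column = intercept feature
y = np.array([0, 1])
print(log_likelihood(np.zeros(2), X, y))    # 2 * log(0.5) ~ -1.386
```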

Fitting Logistic Regression by Gradient Ascent

$$L(w) = \sum_{i=1}^N \Big[ y_i \log g(w^\top x_i) + (1 - y_i) \log \big(1 - g(w^\top x_i)\big) \Big]$$

Recall that $g(z) = \frac{1}{1+e^{-z}}$; differentiating,

$$g'(z) = \frac{e^{-z}}{(1+e^{-z})^2} = g(z)\,\big(1 - g(z)\big).$$

Applying the chain rule term by term, the $g$ and $1-g$ factors cancel and we get

$$\frac{\partial L}{\partial w} = \sum_{i=1}^N \big(y_i - g(w^\top x_i)\big)\, x_i,$$

so the gradient ascent update is $w \leftarrow w + \lambda \sum_{i=1}^N \big(y_i - g(w^\top x_i)\big)\, x_i$.
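The identity $g'(z) = g(z)(1 - g(z))$ is the key step in the derivation; a quick numerical sanity check (illustrative, not from the slides):

```python
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

# Compare a central finite difference of g against g(z) * (1 - g(z)).
z = np.linspace(-4, 4, 9)
eps = 1e-6
numeric = (g(z + eps) - g(z - eps)) / (2 * eps)
analytic = g(z) * (1 - g(z))
print(np.max(np.abs(numeric - analytic)))  # -> ~1e-10; the identity holds
```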

Batch Gradient Ascent for LR

Given: training examples $(x_i, y_i)$, $i = 1, \dots, N$
Let $w = (0, 0, \dots, 0)$
Repeat until convergence:
    $d = (0, 0, \dots, 0)$
    For $i = 1$ to $N$ do
        $\text{error}_i = y_i - g(w^\top x_i)$
        $d = d + \text{error}_i \cdot x_i$
    $w = w + \lambda d$

An online gradient ascent algorithm can be easily constructed by updating $w$ after each example instead of after a full pass.
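A runnable version of this pseudocode, under two simplifications of mine (not the slide's): the intercept is handled via a constant-one feature column, and a fixed iteration count stands in for the convergence test:

```python
import numpy as np

def fit_logistic_batch(X, y, lam=0.1, n_iters=1000):
    """Batch gradient ascent for logistic regression, following the
    slide's pseudocode. X is N x d, y is in {0, 1}, lam is the step size.
    """
    N, d = X.shape
    w = np.zeros(d)                                # w = (0, 0, ..., 0)
    for _ in range(n_iters):                       # "repeat until convergence"
        error = y - 1.0 / (1.0 + np.exp(-X @ w))   # error_i = y_i - g(w.x_i)
        w = w + lam * X.T @ error                  # w <- w + lam * sum_i error_i x_i
    return w

# Toy usage: 1-D data plus an intercept column of ones.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0, 0, 1, 1])
print(fit_logistic_batch(X, y))  # boundary near x = 0; the slope weight is positive
```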

Newton's Method

Other optimization techniques can also be used, for example Newton's method:

$$w \leftarrow w - H^{-1} \nabla_w L(w)$$

where $H$ is the Hessian matrix, such that $H_{kl} = \frac{\partial^2 L}{\partial w_k \partial w_l}$. For logistic regression we have

$$H = -X^\top R X$$

where $X$ is our data matrix, with each row corresponding to a single instance, and $R$ is a diagonal matrix with elements $R_{ii} = g(w^\top x_i)\big(1 - g(w^\top x_i)\big)$.
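One possible rendering of a single Newton update (a sketch built directly from the formulas above; using solve() rather than an explicit matrix inverse is my numerical choice):

```python
import numpy as np

def newton_step(w, X, y):
    """One Newton update w <- w - H^{-1} grad L, with H = -X^T R X.

    R is diagonal with R_ii = g(w.x_i) * (1 - g(w.x_i)).
    """
    p = 1.0 / (1.0 + np.exp(-X @ w))      # g(w.x_i) for each row of X
    grad = X.T @ (y - p)                  # gradient of the log-likelihood
    R = np.diag(p * (1 - p))              # diagonal weight matrix
    H = -X.T @ R @ X                      # Hessian of L (negative definite)
    return w - np.linalg.solve(H, grad)   # w - H^{-1} grad, without inverting H
```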

Instability of MLE Estimation

For linearly separable data, the maximum likelihood is achieved by finding a linear decision boundary $w_0 + w^\top x = 0$ that separates the two classes perfectly and letting the magnitude of $w$ go to infinity. This instability can be avoided by adding a regularization term to the likelihood objective:

$$\hat{w} = \arg\max_w \; L(w) - \frac{\alpha}{2} \|w\|^2$$

Gradient ascent update rule:

$$w \leftarrow w + \lambda \left[ \sum_{i=1}^N \big(y_i - g(w^\top x_i)\big)\, x_i - \alpha w \right]$$
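A sketch of the regularized update step (illustrative; the parameter names lam for the step size and alpha for the regularization strength are mine):

```python
import numpy as np

def regularized_update(w, X, y, lam=0.1, alpha=1.0):
    """One gradient-ascent step on the penalized objective
    L(w) - (alpha/2) ||w||^2, which keeps ||w|| finite even when
    the training data are linearly separable.
    """
    error = y - 1.0 / (1.0 + np.exp(-X @ w))     # y_i - g(w.x_i)
    return w + lam * (X.T @ error - alpha * w)   # the -alpha*w term pulls w toward 0
```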

Connection Between Logistic Regression & Perceptron Algorithm

Both methods learn a linear function of the input features. LR uses the logistic function:

$$h(x) = g(w^\top x) = \frac{1}{1 + e^{-w^\top x}}$$

Perceptron uses a step function:

$$h(x) = \begin{cases} 1 & \text{if } w^\top x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

Both algorithms take a similar update rule:

$$w \leftarrow w + \lambda\,\big(y - h(x)\big)\, x$$
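To make the comparison concrete, a sketch showing that the two algorithms share the same update and differ only in the hypothesis $h$ (illustrative code, not from the slides):

```python
import numpy as np

def lr_h(w, x):
    """Logistic regression hypothesis: soft output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(w @ x)))

def perceptron_h(w, x):
    """Perceptron hypothesis: hard step output in {0, 1}."""
    return 1.0 if w @ x >= 0 else 0.0

def update(w, x, y, h, lam=0.1):
    """Shared update rule w <- w + lam * (y - h(x)) * x; only h differs."""
    return w + lam * (y - h(w, x)) * x

w, x, y = np.zeros(2), np.array([1.0, 2.0]), 1.0
print(update(w, x, y, lr_h))          # soft correction: lam * (1 - 0.5) * x
print(update(w, x, y, perceptron_h))  # no change: the step already outputs 1
```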

Connection with LDA

It is interesting to note that for linear discriminant analysis (Gaussian class distributions with different means and a shared covariance matrix) we also have:

$$p(y=1|x) = \frac{1}{1 + e^{-(w_0 + w^\top x)}}$$

where $w$ is defined by the parameters of the model, including $\mu_0$, $\mu_1$, and $\Sigma$. However, many other possible distributions will also satisfy this assumption. Based on this observation, what can we say about the modeling assumptions made by these two methods? LDA makes the stronger modeling assumption.
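One way to make this concrete (a standard derivation sketch, not spelled out on the slide): with shared covariance $\Sigma$ and class priors $\pi_0, \pi_1$, expanding the Gaussian log-density ratio shows the LDA log-odds are linear in $x$, which identifies $w$ and $w_0$:

```latex
\log\frac{p(y=1\mid x)}{p(y=0\mid x)}
  = \underbrace{\big(\Sigma^{-1}(\mu_1-\mu_0)\big)}_{w}{}^{\!\top} x
    + \underbrace{\log\frac{\pi_1}{\pi_0}
      - \tfrac{1}{2}\mu_1^{\top}\Sigma^{-1}\mu_1
      + \tfrac{1}{2}\mu_0^{\top}\Sigma^{-1}\mu_0}_{w_0}
```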

Multi-Class Logistic Regression

For multiclass classification we define the posterior probability using a so-called soft-max function, where $p(y=k|x)$ is given by

$$p(y=k|x) = \frac{\exp(w_k^\top x)}{\sum_{j} \exp(w_j^\top x)}$$

Going through the same MLE derivation, we arrive at the following gradient:

$$\frac{\partial L}{\partial w_k} = \sum_{i=1}^N \big( \delta(y_i = k) - p(y=k|x_i) \big)\, x_i$$

where $\delta(y_i = k) = 1$ if $y_i = k$ and $0$ otherwise.
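A sketch of this gradient in vectorized form (illustrative code, not from the slides; W stacks one weight vector per class as columns, and subtracting the row max in the soft-max is an added stability detail):

```python
import numpy as np

def softmax(Z):
    """Row-wise soft-max; max-subtraction is a numerical-stability detail."""
    Z = Z - Z.max(axis=1, keepdims=True)
    expZ = np.exp(Z)
    return expZ / expZ.sum(axis=1, keepdims=True)

def multiclass_gradient(W, X, y, K):
    """dL/dW_k = sum_i (delta(y_i = k) - p(y=k|x_i)) x_i, for all k at once.

    X is N x d, y has labels in {0, ..., K-1}, W is d x K; returns d x K.
    """
    P = softmax(X @ W)      # N x K matrix of class posteriors p(y=k|x_i)
    Y = np.eye(K)[y]        # one-hot rows implement delta(y_i = k)
    return X.T @ (Y - P)    # d x K gradient, one column per class

X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([0, 1])
print(multiclass_gradient(np.zeros((2, 3)), X, y, K=3))
```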

Summary of Logistic Regression

- Discriminative classifier: learns the conditional probability distribution $p(y|x)$, defined by a logistic function
- Produces a linear decision boundary
- Uses a weaker modeling assumption compared to LDA
- Maximum likelihood estimation: gradient ascent bears a strong similarity with the perceptron update
- Unstable for the linearly separable case; should be used with a regularization term to avoid this issue
- Easily extended to multi-class problems using the soft-max function