CS 675 Intro to Machine Learning
Lecture: Generative classification models
Milos Hauskrecht, milos@cs.pitt.edu, 539 Sennott Square

Data: D = {d_1, d_2, ..., d_n}, where d_i = <x_i, y_i>
Classification: y represents a discrete class value.
Goal: learn f : X -> Y
Binary classification: the special case when Y = {0, 1}.
First step: we need to devise a model of the function f.
Discriminant functions
A common way to represent a classifier is by using discriminant functions.
Works for both binary and multi-way classification.
Idea: for every class i = 0, 1, ..., k define a function g_i(x) mapping X -> R.
When a decision on input x should be made, choose the class with the highest value of g_i(x):
  y* = arg max_i g_i(x)

Logistic regression model
Discriminant functions:
  g_1(x) = g(w^T x)
  g_0(x) = 1 - g(w^T x)
where g(z) = 1 / (1 + e^{-z}) is the logistic (sigmoid) function.
Values of the discriminant functions vary in the interval [0, 1].
Probabilistic interpretation: f(x, w) = p(y = 1 | x, w) = g(w^T x)
Input vector: x = (1, x_1, ..., x_d)^T
Logistic regression
We learn a probabilistic function f : X -> [0, 1], where f describes the probability of class 1 given x:
  f(x, w) = p(y = 1 | x, w) = g(w^T x)
Note that: p(y = 0 | x, w) = 1 - p(y = 1 | x, w)
Making decisions with the logistic regression model:
  If p(y = 1 | x) >= 1/2 then choose y = 1, else choose y = 0.
When does logistic regression fail?
  When a quadratic decision boundary is needed.
[Figure: data with a quadratic decision boundary that a linear logistic regression model cannot capture]
CS 75 Machine Learning
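As a minimal sketch of the logistic regression model and its decision rule (the weights and input below are made-up numbers, not from the slides):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_class(x, w):
    """Decision rule: choose class 1 when p(y=1|x,w) = g(w^T x) >= 1/2,
    which is equivalent to w^T x >= 0."""
    p = sigmoid(np.dot(w, x))
    return 1 if p >= 0.5 else 0

w = np.array([0.5, 1.0, -1.0])  # hypothetical weights; first entry is the bias w_0
x = np.array([1.0, 2.0, 0.5])   # input with the constant 1 prepended
print(predict_class(x, w))      # w^T x = 2.0 > 0, so class 1
```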
When does logistic regression fail?
Another example of a non-linear decision boundary.
[Figure: data requiring a non-linear decision boundary]

Non-linear extension of logistic regression
Use feature basis functions to model nonlinearities, the same trick as used for linear regression.
Linear regression:
  f(x) = w_0 + sum_{j=1..m} w_j phi_j(x)
  phi_j(x) - an arbitrary function of x
Logistic regression:
  f(x) = g( w_0 + sum_{j=1..m} w_j phi_j(x) )
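The basis-function trick can be sketched as follows; the quadratic feature map and the weight values are illustrative choices, not from the slides. The model remains linear in w, but the decision boundary in the original input space becomes non-linear:

```python
import numpy as np

def quadratic_features(x):
    """Map a 2-d input to quadratic basis functions phi_j(x):
    (1, x1, x2, x1^2, x2^2, x1*x2).  Any fixed nonlinear map works."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1**2, x2**2, x1 * x2])

def f(x, w):
    """Logistic regression on the expanded features: g(w^T phi(x))."""
    z = np.dot(w, quadratic_features(x))
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights that carve out the circular boundary x1^2 + x2^2 = 1
w = np.array([1.0, 0.0, 0.0, -1.0, -1.0, 0.0])
print(f(np.array([0.0, 0.0]), w))  # inside the circle: above 0.5
print(f(np.array([2.0, 2.0]), w))  # outside the circle: below 0.5
```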
Generative approach to classification
Logistic regression:
  Represents and learns a model of p(y | x).
  An example of a discriminative approach.
Generative approach:
  1. Represents and learns the joint distribution p(x, y).
  2. Uses it to define probabilistic discriminant functions, e.g.
     g_1(x) = p(y = 1 | x), g_0(x) = p(y = 0 | x)
How? Typically the joint is factored as p(x, y) = p(x | y) p(y).

Generative approach to classification
Typical joint model: p(x, y) = p(x | y) p(y)
  p(x | y) = class-conditional distributions (densities); binary classification: two class-conditional distributions, p(x | y = 0) and p(x | y = 1)
  p(y) = priors on classes, the probability of class y; for binary classification: Bernoulli distribution
Quadratic discriminant analysis (QDA)
Model:
  Class-conditional distributions are multivariate normal distributions:
    x ~ N(mu_0, Sigma_0) for y = 0
    x ~ N(mu_1, Sigma_1) for y = 1
  Multivariate normal density:
    p(x | mu, Sigma) = (2 pi)^{-d/2} |Sigma|^{-1/2} exp( -1/2 (x - mu)^T Sigma^{-1} (x - mu) )
  Priors on classes: y ~ Bernoulli(theta), y in {0, 1}

Learning of parameters of the QDA model
Density estimation in statistics: we see examples, but we do not know the parameters of the Gaussians (the class-conditional densities).
ML estimates of the parameters of a multivariate normal for a set of n examples x_1, ..., x_n:
Optimize the log-likelihood l(D) = sum_{i=1..n} log p(x_i | mu, Sigma), which gives
  mu^ = (1/n) sum_{i=1..n} x_i
  Sigma^ = (1/n) sum_{i=1..n} (x_i - mu^)(x_i - mu^)^T
How about class priors?
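The ML estimates above can be sketched directly; `fit_qda` and the toy data are hypothetical names and numbers for illustration. The class prior is estimated, as the next slide notes, by ordinary density estimation for a Bernoulli variable (the fraction of class-1 examples):

```python
import numpy as np

def fit_gaussian_ml(X):
    """ML estimates for a multivariate normal: the sample mean and the
    (biased, 1/n) sample covariance, as on the slide."""
    n = X.shape[0]
    mu = X.mean(axis=0)
    Xc = X - mu
    Sigma = (Xc.T @ Xc) / n
    return mu, Sigma

def fit_qda(X, y):
    """Fit QDA: one Gaussian per class plus a Bernoulli class prior."""
    params = {c: fit_gaussian_ml(X[y == c]) for c in (0, 1)}
    prior1 = np.mean(y == 1)  # ML estimate of p(y=1)
    return params, prior1

# Tiny made-up data set: two examples per class
params, prior1 = fit_qda(np.array([[0., 0.], [2., 0.], [0., 2.], [2., 2.]]),
                         np.array([0, 0, 1, 1]))
print(params[0][0], prior1)  # class-0 mean and p(y=1)
```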
Learning quadratic discriminant analysis (QDA)
Learning the class-conditional distributions:
  Learn the parameters of the two multivariate normal distributions
    x ~ N(mu_0, Sigma_0) for y = 0, x ~ N(mu_1, Sigma_1) for y = 1
  Use the density estimation methods.
Learning the priors on classes:
  y ~ Bernoulli(theta), y in {0, 1}
  Learn the parameter theta of the Bernoulli distribution; again, use the density estimation methods.
[Figure: QDA discriminant functions g_1(x) and g_0(x) over the input space]
[Figure: Gaussian class-conditional densities]
QDA: making a class decision
Basically, we need to design the discriminant functions.
Posterior of a class: choose the class with the better posterior probability:
  if p(y = 1 | x) > p(y = 0 | x) then y = 1, else y = 0
Since p(y | x) = p(x | y) p(y) / p(x) and p(x) is the same for both classes, it is sufficient to compare:
  p(x | mu_1, Sigma_1) p(y = 1)  vs.  p(x | mu_0, Sigma_0) p(y = 0)
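The comparison above can be sketched as follows; the function names and the example means, covariances, and prior are made up for illustration:

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Multivariate normal density
    p(x|mu,Sigma) = (2 pi)^{-d/2} |Sigma|^{-1/2} exp(-0.5 (x-mu)^T Sigma^{-1} (x-mu))."""
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)
    norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(Sigma) ** (-0.5)
    return norm * np.exp(-0.5 * quad)

def qda_predict(x, mu0, S0, mu1, S1, prior1):
    """Choose the class with the larger p(x|y) p(y); the shared
    normalizer p(x) cancels in the posterior comparison."""
    s1 = gaussian_pdf(x, mu1, S1) * prior1
    s0 = gaussian_pdf(x, mu0, S0) * (1.0 - prior1)
    return 1 if s1 > s0 else 0

mu0, mu1 = np.array([0.0, 0.0]), np.array([3.0, 3.0])
S = np.eye(2)  # identity covariances and equal priors: made-up numbers
print(qda_predict(np.array([0.2, -0.1]), mu0, S, mu1, S, 0.5))  # -> 0
print(qda_predict(np.array([3.1, 2.8]), mu0, S, mu1, S, 0.5))   # -> 1
```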
QDA: quadratic decision boundary
[Figure: contours of the class-conditional densities]
[Figure: the resulting quadratic decision boundary]
Linear discriminant analysis (LDA)
Assume the covariances are the same:
  x ~ N(mu_0, Sigma) for y = 0
  x ~ N(mu_1, Sigma) for y = 1
LDA: linear decision boundary
[Figure: contours of the class-conditional densities with a shared covariance]
LDA: linear decision boundary
[Figure: the resulting linear decision boundary]

Generative classification models
Idea:
  1. Represent and learn the distribution p(x, y).
  2. Use it to define probabilistic discriminant functions, e.g.
     g_1(x) = p(y = 1 | x), g_0(x) = p(y = 0 | x)
Typical model: p(x, y) = p(x | y) p(y)
  p(x | y) = class-conditional distributions (densities); binary classification: two class-conditional distributions, p(x | y = 0) and p(x | y = 1)
  p(y) = priors on classes, the probability of class y; binary classification: Bernoulli distribution
Naïve Bayes classifier
A generative classifier model with an additional simplifying assumption:
all input attributes are conditionally independent of each other given the class.
One of the basic ML classification models; it very often performs very well in practice.
So we have:
  p(x | y) = prod_{i=1..d} p(x_i | y)

Learning the parameters of the model
Much simpler density estimation problems.
We need to learn p(x | y = 0), p(x | y = 1), and p(y). Because of the conditional independence assumption, we only need to learn, for every input variable i: p(x_i | y = 0) and p(x_i | y = 1).
Much easier if the number of input attributes is large.
Also, the model gives us the flexibility to represent input attributes of different forms!
E.g. one attribute can be modeled using a Bernoulli distribution, another as a Gaussian density, or as a Poisson distribution.
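A minimal sketch for the all-binary-attribute case; the function names and toy data are hypothetical, and the Laplace pseudo-count is an added implementation convenience (to avoid zero probabilities), not something the slide specifies:

```python
import numpy as np

def fit_bernoulli_nb(X, y):
    """Naive Bayes with binary attributes: for each class c and each
    attribute i, estimate p(x_i = 1 | y = c), plus the class prior p(y=1)."""
    theta = {}
    for c in (0, 1):
        Xc = X[y == c]
        theta[c] = (Xc.sum(axis=0) + 1.0) / (len(Xc) + 2.0)  # Laplace smoothing
    prior1 = np.mean(y == 1)
    return theta, prior1

def nb_predict(x, theta, prior1):
    """Score each class by log p(y=c) + sum_i log p(x_i|y=c), pick the larger."""
    scores = {}
    for c, prior in ((0, 1.0 - prior1), (1, prior1)):
        p = theta[c]
        scores[c] = np.log(prior) + np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
    return 1 if scores[1] > scores[0] else 0

# Made-up data: 3 binary attributes, 2 examples per class
theta, prior1 = fit_bernoulli_nb(
    np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]]), np.array([0, 0, 1, 1]))
print(nb_predict(np.array([1, 1, 0]), theta, prior1))  # -> 0
```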
Making a class decision with the Naïve Bayes classifier
Discriminant functions: the posteriors of the classes; choose the class with the better posterior probability:
  if p(y = 1) prod_{i=1..d} p(x_i | y = 1) > p(y = 0) prod_{i=1..d} p(x_i | y = 0) then y = 1, else y = 0

Next: two interesting questions
Two models with linear decision boundaries:
  Logistic regression
  LDA model (Gaussians with the same covariance matrices):
    x ~ N(mu_0, Sigma) for y = 0, x ~ N(mu_1, Sigma) for y = 1
  Question: is there any relation between the two models?
Two models with the same gradient:
  Linear model for regression
  Logistic regression model for classification
  Both have the same gradient update:
    w <- w + alpha * sum_{i=1..n} (y_i - f(x_i)) x_i
  Question: why is the gradient the same?
Logistic regression and generative models
Two models with linear decision boundaries:
  Logistic regression
  Generative model with Gaussians with the same covariance matrices:
    x ~ N(mu_0, Sigma) for y = 0, x ~ N(mu_1, Sigma) for y = 1
Question: is there any relation between the two models?
Answer: yes, the two models are related! When we have Gaussians with the same covariance matrix, the probability of y given x has the form of a logistic regression model:
  p(y = 1 | x, mu_0, mu_1, Sigma) = g(w^T x)

Logistic regression and generative models
Members of the exponential family can often be more naturally described as:
  p(x | theta, phi) = h(x, phi) exp( (theta^T x - A(theta)) / a(phi) )
  theta - a location parameter, phi - a scale parameter
Claim: logistic regression is a correct model when the class-conditional densities are from the same distribution in the exponential family and have the same scale factor phi.
A very powerful result! We can represent the posteriors of many distributions with the same small logistic regression model.
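The Gaussian case can be checked numerically. The closed-form weights below come from the standard derivation (expanding the log-odds of the two Gaussians with shared covariance), not from the slide text itself, and all the numbers are made up for the check:

```python
import numpy as np

# With shared covariance, p(y=1|x) = g(w^T x + w_0) with
#   w   = Sigma^{-1} (mu_1 - mu_0)
#   w_0 = -0.5 mu_1^T Sigma^{-1} mu_1 + 0.5 mu_0^T Sigma^{-1} mu_0 + log(p1/(1-p1))
mu0 = np.array([0.0, 0.0])
mu1 = np.array([2.0, 1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
p1 = 0.5  # class prior p(y=1)

Si = np.linalg.inv(Sigma)
w = Si @ (mu1 - mu0)
w0 = -0.5 * mu1 @ Si @ mu1 + 0.5 * mu0 @ Si @ mu0 + np.log(p1 / (1 - p1))

def gauss(x, mu):
    d = x - mu
    return np.exp(-0.5 * d @ Si @ d)  # shared normalizer cancels in the posterior

x = np.array([0.7, -0.4])
posterior = p1 * gauss(x, mu1) / (p1 * gauss(x, mu1) + (1 - p1) * gauss(x, mu0))
logistic = 1.0 / (1.0 + np.exp(-(w @ x + w0)))
print(abs(posterior - logistic))  # agrees up to floating-point error
```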
The gradient puzzle
Linear regression: f(x) = w^T x
Logistic regression: f(x) = g(w^T x)
Gradient update (batch): w <- w + alpha * sum_{i=1..n} (y_i - f(x_i)) x_i
Online: w <- w + alpha * (y - f(x)) x
The same update in both cases.

The gradient puzzle
The same simple gradient update rule is derived for both the linear and the logistic regression models. Where does the magic come from?
Under the log-likelihood measure, the function models and the models for the output selection fit together:
  Linear model + Gaussian noise: y = w^T x + eps, with Gaussian noise eps ~ N(0, sigma^2)
  Logistic + Bernoulli: p(y = 1 | x) = g(w^T x), with y the outcome of a Bernoulli trial
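The shared update rule can be sketched in one function; `online_update` and the step values are illustrative, only the plugged-in prediction f(x) differs between the two models:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def online_update(w, x, y, alpha, model):
    """One online gradient step w <- w + alpha * (y - f(x)) * x.
    The same rule falls out of the log-likelihood for both models:
    linear regression (f(x) = w^T x, Gaussian noise) and
    logistic regression (f(x) = g(w^T x), Bernoulli output)."""
    f = np.dot(w, x) if model == "linear" else sigmoid(np.dot(w, x))
    return w + alpha * (y - f) * x

# Made-up example: one step from w = 0 on the same (x, y) pair
print(online_update(np.zeros(2), np.array([1.0, 1.0]), 1.0, 0.1, "linear"))
print(online_update(np.zeros(2), np.array([1.0, 1.0]), 1.0, 0.1, "logistic"))
```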
Generalized linear models (GLIMs)
Assumptions:
  The conditional mean (expectation) is mu = f(w^T x), where f(.) is a response function.
  The output y is characterized by an exponential family distribution with conditional mean mu.
Examples:
  Linear model + Gaussian noise: y = w^T x + eps, eps ~ N(0, sigma^2)
  Logistic + Bernoulli: mu = g(w^T x) = 1 / (1 + e^{-w^T x}), y a Bernoulli trial

Generalized linear models (GLIMs)
A canonical response function f(.) is encoded in the sampling distribution
  p(x | theta, phi) = h(x, phi) exp( (theta^T x - A(theta)) / a(phi) )
and leads to a simple gradient form.
Example: Bernoulli distribution
  p(y) = mu^y (1 - mu)^{1-y} = exp( y log(mu / (1 - mu)) + log(1 - mu) )
The logistic function matches the Bernoulli: mu = 1 / (1 + e^{-eta}) with natural parameter eta = log(mu / (1 - mu)).
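The Bernoulli identity on the slide can be verified numerically; the chosen mean value is arbitrary:

```python
import numpy as np

mu = 0.3                     # an arbitrary Bernoulli mean, just for the check
eta = np.log(mu / (1 - mu))  # natural parameter
for y in (0, 1):
    direct = mu**y * (1 - mu)**(1 - y)
    expfam = np.exp(y * eta + np.log(1 - mu))
    print(y, direct, expfam)  # the two forms agree for both outcomes
print(1.0 / (1.0 + np.exp(-eta)))  # the logistic function recovers mu
```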
Evaluation of classifiers. ROC.

Evaluation
For any data set we use to test the classification model, we can build a confusion matrix:
counts of examples with class label i that are classified with a label j (rows: predicted label, columns: target label).
[Table: example confusion matrix with per-class counts]
Evaluation
From the counts in the confusion matrix:
  Accuracy = (# of correctly classified examples) / (# of all examples)
  Error = (# of misclassified examples) / (# of all examples) = 1 - Accuracy
Evaluation for binary classification
Entries in the confusion matrix for binary classification have names:

              target 1   target 0
  predict 1      TP         FP
  predict 0      FN         TN

  TP: true positive (hit)
  FP: false positive (false alarm)
  TN: true negative (correct rejection)
  FN: false negative (a miss)

Additional statistics:
  Sensitivity (recall): SENS = TP / (TP + FN)
  Specificity: SPEC = TN / (TN + FP)
  Positive predictive value (precision): PPV = TP / (TP + FP)
  Negative predictive value: NPV = TN / (TN + FN)
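The four statistics can be computed directly from the confusion-matrix entries; the function name and the counts below are hypothetical:

```python
def binary_stats(tp, fp, fn, tn):
    """Row and column statistics of the binary confusion matrix."""
    return {
        "SENS": tp / (tp + fn),  # sensitivity / recall
        "SPEC": tn / (tn + fp),  # specificity
        "PPV":  tp / (tp + fp),  # positive predictive value / precision
        "NPV":  tn / (tn + fn),  # negative predictive value
    }

stats = binary_stats(tp=4, fp=1, fn=2, tn=8)  # made-up counts
print(stats)
```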
Binary classification: additional statistics
[Table: example confusion matrix with its row and column statistics]
Row and column quantities:
  Sensitivity (SENS), Specificity (SPEC), Positive predictive value (PPV), Negative predictive value (NPV)

Classifiers
Project data points to a one-dimensional space, defined for example by:
  w^T x, or p(y = 1 | x, w)
Decision boundary: w^T x = 0, equivalently p(y = 1 | x, w) = 0.5;
points with w^T x > 0 go to class 1, points with w^T x < 0 to class 0.
[Figure: data projected onto the discriminant direction and the decision boundary]
Binary decisions: receiver operating curves
[Figure: two overlapping class-conditional distributions of the score x, with a decision threshold x*]
Probabilities at threshold x*:
  SENS = p(x > x* | y = 1)
  1 - SPEC = p(x > x* | y = 0)

Receiver operating characteristic (ROC)
The ROC curve plots SENS = p(x > x* | y = 1) against 1 - SPEC = p(x > x* | y = 0) for different thresholds x*.
[Figure: the ROC curve traced out as the threshold x* varies]
ROC curve
[Figure: three pairs of class-conditional score distributions (Case 1, Case 2, Case 3) with increasing overlap, and the corresponding ROC curves]
The receiver operating characteristic (ROC) shows the discriminability between the two classes under different decision biases.
The decision bias can be changed using a different loss function.
Quality of a classification model: the area under the ROC curve.
  Best value: 1; worst (no discriminability): 0.5
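The threshold sweep and the area under the curve can be sketched as follows; the function names and the tiny score/label arrays are made up for illustration:

```python
import numpy as np

def roc_points(scores, labels):
    """Sweep the decision threshold x* over all observed scores and
    record the (1 - SPEC, SENS) pairs that form the ROC curve."""
    thresholds = np.sort(np.unique(scores))[::-1]
    pts = [(0.0, 0.0)]
    P = np.sum(labels == 1)
    N = np.sum(labels == 0)
    for t in thresholds:
        pred = scores >= t
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        pts.append((fp / N, tp / P))
    return pts

def auc(pts):
    """Trapezoidal area under the ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts[:-1], pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

scores = np.array([0.9, 0.8, 0.4, 0.3])
labels = np.array([1, 1, 0, 0])
print(auc(roc_points(scores, labels)))  # perfectly separated classes -> 1.0
```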