Generative classification models

CS 2750 Machine Learning
Lecture: Generative classification models
Milos Hauskrecht, milos@cs.pitt.edu, 5329 Sennott Square

Data: $D = \{d_1, d_2, \ldots, d_n\}$, where $d_i = \langle \mathbf{x}_i, y_i \rangle$ and $y$ represents a discrete class value.
Goal: learn a classification function $f: X \to Y$.
Binary classification: a special case when $Y = \{0, 1\}$.
First step: we need to devise a model of the function $f$.

Discriminant functions

A common way to represent a classifier is by using discriminant functions. This works for both the binary and the multi-way classification.
Idea: for every class $i = 0, 1, \ldots, k$ define a function $g_i(\mathbf{x})$ mapping $X \to \mathbb{R}$. When the decision on input $\mathbf{x}$ should be made, choose the class with the highest value of $g_i(\mathbf{x})$:
$y^* = \arg\max_i g_i(\mathbf{x})$

Logistic regression model

Discriminant functions: $g_1(\mathbf{x}) = g(\mathbf{w}^T \mathbf{x})$ and $g_0(\mathbf{x}) = 1 - g(\mathbf{w}^T \mathbf{x})$.
Values of the discriminant functions vary in the interval $[0, 1]$.
Probabilistic interpretation: $f(\mathbf{x}, \mathbf{w}) = p(y = 1 \mid \mathbf{x}, \mathbf{w}) = g(\mathbf{w}^T \mathbf{x})$, where $\mathbf{x}$ is the input vector, $\mathbf{w}$ the weights, and $g(z) = 1/(1 + e^{-z})$ is the logistic (sigmoid) function applied to the linear score $z = \mathbf{w}^T \mathbf{x}$.
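The discriminant-function view of logistic regression can be made concrete in a few lines. The sketch below is illustrative only; the weights are assumed to be already known, and the input is assumed to carry a leading 1 for the bias term $w_0$.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def discriminant_functions(x, w):
    """Return (g1, g0) for an input vector x and weight vector w.

    g1(x) = g(w^T x) plays the role of p(y=1 | x, w);
    g0(x) = 1 - g(w^T x) plays the role of p(y=0 | x, w).
    Assumes x already contains a leading 1 for the bias term w_0.
    """
    g1 = sigmoid(w @ x)
    return g1, 1.0 - g1

# Example: pick the class with the higher discriminant value.
x = np.array([1.0, 0.5, -1.2])   # [bias, x1, x2]
w = np.array([0.1, 2.0, -0.7])   # hypothetical weights
g1, g0 = discriminant_functions(x, w)
y_hat = 1 if g1 >= g0 else 0
```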

When does the logistic regression fail?
[Figure: a dataset with a nonlinear decision boundary that a linear logistic regression model cannot capture.]

When does the logistic regression fail?
[Figure: another example of a non-linear decision boundary.]

Non-linear extension of logistic regression

Use feature (basis) functions to model nonlinearities; this is the same trick as used for the linear regression.
Linear regression: $f(\mathbf{x}) = w_0 + \sum_{j=1}^{m} w_j \phi_j(\mathbf{x})$, where $\phi_j(\mathbf{x})$ is an arbitrary function of $\mathbf{x}$.
Logistic regression: $p(y = 1 \mid \mathbf{x}) = g\!\left(w_0 + \sum_{j=1}^{m} w_j \phi_j(\mathbf{x})\right)$

Regularized logistic regression

If the model is too complex and can cause overfitting, its prediction accuracy can be improved by removing some inputs from the model, i.e. setting their coefficients to zero.
Recall the linear model: $f(\mathbf{x}) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + \ldots + w_d x_d$, where $\mathbf{x}$ is the input vector.
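As an illustration of the basis-function trick, here is a minimal sketch assuming hypothetical quadratic features $\phi_j$ for a 2-D input; the model stays linear in the weights while the decision boundary becomes non-linear in the original inputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def quadratic_features(x):
    """Hypothetical basis functions phi_j(x) for a 2-D input x = (x1, x2):
    [1, x1, x2, x1^2, x2^2, x1*x2]."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1**2, x2**2, x1 * x2])

def p_y1_given_x(x, w):
    """Non-linear logistic regression: p(y=1|x) = g(w^T phi(x))."""
    return sigmoid(w @ quadratic_features(x))

# The decision boundary w^T phi(x) = 0 is now a quadratic curve in the
# original (x1, x2) space, even though the model is linear in w.
```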

Regularized logistic regression

If the model is too complex and can cause overfitting, its prediction accuracy can be improved by removing some inputs from the model, i.e. setting their coefficients to zero.
We can apply the same idea to the logistic regression:
$p(y = 1 \mid \mathbf{x}) = g(w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + \ldots + w_d x_d)$,
where $\mathbf{x}$ is the input vector and $w_0, w_1, \ldots, w_d$ are the parameters (weights).

Ridge (L2) penalty

Linear regression with the ridge penalty (fit to data + model complexity penalty):
$J_n(\mathbf{w}) = \frac{1}{n}\sum_{i=1}^{n} \left( y_i - \mathbf{w}^T \mathbf{x}_i \right)^2 + \lambda \|\mathbf{w}\|_2^2$
Logistic regression (fit to data + model complexity penalty):
$J_n(\mathbf{w}) = -\log P(D \mid \mathbf{w}) + \lambda \|\mathbf{w}\|_2^2$,
where the fit to data is measured using the negative log likelihood
$-\log P(D \mid \mathbf{w}) = -\sum_{i=1}^{n} \left[ y_i \log g(\mathbf{w}^T \mathbf{x}_i) + (1 - y_i)\log\left(1 - g(\mathbf{w}^T \mathbf{x}_i)\right) \right]$.
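A small sketch of the ridge-penalized logistic regression objective above; the epsilon guard against log(0) and the choice to penalize all weights (including the bias) are implementation assumptions, not part of the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ridge_logistic_objective(w, X, y, lam):
    """J(w) = negative log likelihood + lambda * ||w||_2^2.

    X: (n, d) input matrix (first column assumed to be all ones),
    y: (n,) vector of 0/1 labels, lam: regularization strength lambda.
    """
    p = sigmoid(X @ w)                 # p(y=1 | x_i, w) for every example
    eps = 1e-12                        # numerical guard against log(0)
    nll = -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    return nll + lam * np.sum(w ** 2)
```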

Lasso (L1) penalty

Linear regression with the lasso penalty (fit to data + model complexity penalty):
$J_n(\mathbf{w}) = \frac{1}{n}\sum_{i=1}^{n} \left( y_i - \mathbf{w}^T \mathbf{x}_i \right)^2 + \lambda \|\mathbf{w}\|_1$
Logistic regression:
$J_n(\mathbf{w}) = -\log P(D \mid \mathbf{w}) + \lambda \|\mathbf{w}\|_1$,
with the fit to data again measured using the negative log likelihood
$-\sum_{i=1}^{n} \left[ y_i \log g(\mathbf{w}^T \mathbf{x}_i) + (1 - y_i)\log\left(1 - g(\mathbf{w}^T \mathbf{x}_i)\right) \right]$.

Generative approach to classification

Logistic regression: represents and learns a model of $p(y \mid \mathbf{x})$; an example of a discriminative classification approach. The model is unable to sample (generate) data instances $(\mathbf{x}, y)$.
Generative approach: represents and learns the joint distribution $p(\mathbf{x}, y)$. The model is able to sample (generate) data instances $(\mathbf{x}, y)$.
The joint model defines probabilistic discriminant functions. How?
$g_1(\mathbf{x}) = p(y = 1 \mid \mathbf{x}) = \dfrac{p(\mathbf{x}, y = 1)}{p(\mathbf{x})} = \dfrac{p(\mathbf{x} \mid y = 1)\,p(y = 1)}{p(\mathbf{x} \mid y = 1)\,p(y = 1) + p(\mathbf{x} \mid y = 0)\,p(y = 0)}$
$g_0(\mathbf{x}) = p(y = 0 \mid \mathbf{x}) = 1 - p(y = 1 \mid \mathbf{x})$
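A minimal sketch of how the joint model yields the discriminant function $g_1(\mathbf{x})$ via Bayes' rule; the class-conditional density callables and the prior argument are placeholders for whatever generative model is plugged in.

```python
def posterior_g1(x, p_x_given_y1, p_x_given_y0, prior_y1):
    """g1(x) = p(y=1|x) computed from a generative model via Bayes' rule.

    p_x_given_y1, p_x_given_y0: callables returning the class-conditional
    densities p(x|y=1) and p(x|y=0); prior_y1: the class prior p(y=1).
    """
    num = p_x_given_y1(x) * prior_y1
    den = num + p_x_given_y0(x) * (1.0 - prior_y1)
    return num / den   # g0(x) is simply 1 - g1(x)
```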

Generative approach to classification

Typical joint model: $p(\mathbf{x}, y) = p(\mathbf{x} \mid y)\,p(y)$
$p(\mathbf{x} \mid y)$ = class-conditional distributions (densities); for binary classification there are two class-conditional distributions, $p(\mathbf{x} \mid y = 0)$ and $p(\mathbf{x} \mid y = 1)$.
$p(y)$ = priors on classes, the probability of class $y$; for binary classification: Bernoulli distribution.

Quadratic discriminant analysis (QDA)

Model: class-conditional distributions are multivariate normal distributions,
$\mathbf{x} \sim N(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0)$ for $y = 0$ and $\mathbf{x} \sim N(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_1)$ for $y = 1$,
with the multivariate normal density
$p(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \dfrac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\!\left( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)$.
Priors on classes (class 0, 1): Bernoulli distribution,
$p(y \mid \theta) = \theta^{y} (1 - \theta)^{1 - y}$, $y \in \{0, 1\}$.

Learning of parameters of the QDA model

Density estimation in statistics: we see examples, but we do not know the parameters of the Gaussians (the class-conditional densities).
ML estimate of the parameters of a multivariate normal $N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ for a set of $n$ examples $\mathbf{x}_1, \ldots, \mathbf{x}_n$: optimize the log-likelihood
$l(D, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \log \prod_{i=1}^{n} p(\mathbf{x}_i \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})$,
which gives
$\hat{\boldsymbol{\mu}} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{x}_i$ and $\hat{\boldsymbol{\Sigma}} = \frac{1}{n}\sum_{i=1}^{n} (\mathbf{x}_i - \hat{\boldsymbol{\mu}})(\mathbf{x}_i - \hat{\boldsymbol{\mu}})^T$.
How about class priors?

Learning quadratic discriminant analysis (QDA)

Learning class-conditional distributions: learn the parameters of the multivariate normal distributions $\mathbf{x} \sim N(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0)$ for $y = 0$ and $\mathbf{x} \sim N(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_1)$ for $y = 1$, using the density estimation methods.
Learning priors on classes (class 0, 1): $y \sim \text{Bernoulli}(\theta)$, $p(y \mid \theta) = \theta^{y}(1 - \theta)^{1 - y}$, $y \in \{0, 1\}$; learn the parameter $\theta$ of the Bernoulli distribution, again using the density estimation methods.
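A rough sketch of the ML estimation step for QDA, assuming a binary 0/1 label vector; the per-class sample means and covariances follow the formulas above, and the class prior is the ML estimate of the Bernoulli parameter. No regularization of the covariance estimates is applied.

```python
import numpy as np

def fit_qda(X, y):
    """ML estimates of the QDA parameters from inputs X (n, d) and labels y (n,).

    Returns per-class means and covariances plus the class prior p(y=1).
    """
    params = {}
    for c in (0, 1):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)                 # mu_hat = (1/n_c) sum x_i
        diff = Xc - mu
        sigma = diff.T @ diff / Xc.shape[0]  # Sigma_hat = (1/n_c) sum (x_i - mu)(x_i - mu)^T
        params[c] = (mu, sigma)
    theta = np.mean(y == 1)                  # ML estimate of the Bernoulli prior p(y=1)
    return params, theta
```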

QDA
[Figure: discriminant functions $g_1(\mathbf{x})$ and $g_0(\mathbf{x})$ for a QDA model.]
[Figure: Gaussian class-conditional densities.]

QDA: making a class decision

Basically we need to design discriminant functions based on the posterior of a class, and choose the class with the better posterior probability:
if $p(y = 1 \mid \mathbf{x}) > p(y = 0 \mid \mathbf{x})$ then $y = 1$, else $y = 0$.
Notice it is sufficient to compare:
$p(\mathbf{x} \mid \boldsymbol{\mu}_1, \boldsymbol{\Sigma}_1)\,p(y = 1)$ versus $p(\mathbf{x} \mid \boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0)\,p(y = 0)$.

QDA: quadratic decision boundary
[Figure: contours of the class-conditional densities.]
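A sketch of the class decision, reusing the hypothetical fit_qda helper from the previous block; the comparison is done in log space, which is equivalent to comparing $p(\mathbf{x} \mid \boldsymbol{\mu}_c, \boldsymbol{\Sigma}_c)\,p(y = c)$ directly but numerically safer.

```python
import numpy as np
from scipy.stats import multivariate_normal

def qda_predict(x, params, theta):
    """Class decision for QDA: compare p(x|mu_c, Sigma_c) p(y=c) in log space.

    params: {0: (mu0, Sigma0), 1: (mu1, Sigma1)} as returned by fit_qda;
    theta: class prior p(y=1).
    """
    mu0, sigma0 = params[0]
    mu1, sigma1 = params[1]
    score0 = multivariate_normal.logpdf(x, mean=mu0, cov=sigma0) + np.log(1.0 - theta)
    score1 = multivariate_normal.logpdf(x, mean=mu1, cov=sigma1) + np.log(theta)
    return 1 if score1 > score0 else 0
```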

QDA: quadratic decision boundary
[Figure: the resulting decision boundary is a quadratic curve.]

Linear discriminant analysis (LDA)

Assumes the covariances are the same:
$\mathbf{x} \sim N(\boldsymbol{\mu}_0, \boldsymbol{\Sigma})$ for $y = 0$ and $\mathbf{x} \sim N(\boldsymbol{\mu}_1, \boldsymbol{\Sigma})$ for $y = 1$.

LDA: linear decision boundary
[Figure: contours of the class-conditional densities.]
[Figure: the resulting decision boundary is a straight line.]

Generative classification models

Idea:
1. Represent and learn the distribution $p(\mathbf{x}, y)$.
2. The model is able to sample (generate) data instances $(\mathbf{x}, y)$.
3. The model is used to get probabilistic discriminant functions $g_0(\mathbf{x}) = p(y = 0 \mid \mathbf{x})$ and $g_1(\mathbf{x}) = p(y = 1 \mid \mathbf{x})$.

Typical model: $p(\mathbf{x}, y) = p(\mathbf{x} \mid y)\,p(y)$
$p(\mathbf{x} \mid y)$ = class-conditional distributions (densities); binary classification: two class-conditional distributions, $p(\mathbf{x} \mid y = 0)$ and $p(\mathbf{x} \mid y = 1)$.
$p(y)$ = priors on classes, the probability of class $y$; binary classification: Bernoulli distribution.

Naïve Bayes classifier

A generative classifier model with an additional simplifying assumption: all input attributes are conditionally independent of each other given the class. One of the basic ML classification models (often performs very well in practice).
So we have:
$p(\mathbf{x}, y) = p(\mathbf{x} \mid y)\,p(y) = p(y) \prod_{i=1}^{d} p(x_i \mid y)$

Learning parameters of the model

Much simpler density estimation problems. We need to learn $p(\mathbf{x} \mid y = 1)$, $p(\mathbf{x} \mid y = 0)$ and $p(y)$. Because of the assumption of conditional independence we only need to learn, for every input variable $i$: $p(x_i \mid y = 1)$ and $p(x_i \mid y = 0)$.
This is much easier if the number of input attributes is large.
Also, the model gives us the flexibility to represent input attributes of different forms! E.g. one attribute can be modeled using the Bernoulli, another using a Gaussian density, or a Poisson distribution.

Making a class decision for the Naïve Bayes

Discriminant functions based on the posterior of a class; choose the class with the better posterior probability:
if $p(y = 1) \prod_{i=1}^{d} p(x_i \mid y = 1) > p(y = 0) \prod_{i=1}^{d} p(x_i \mid y = 0)$ then $y = 1$, else $y = 0$.
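A sketch of a Naïve Bayes classifier under the assumption that all attributes are binary and modeled with Bernoulli distributions (the slides allow other per-attribute distributions); the Laplace smoothing constant alpha is an added assumption to avoid zero probabilities.

```python
import numpy as np

def fit_bernoulli_nb(X, y, alpha=1.0):
    """Naive Bayes with binary (0/1) attributes.

    Learns p(x_i = 1 | y = c) for every attribute i and the class prior p(y=1).
    alpha is a Laplace smoothing count (an assumption, not part of the slides).
    """
    theta = y.mean()                                  # p(y = 1)
    p_xi = {c: (X[y == c].sum(axis=0) + alpha) /
               (np.sum(y == c) + 2 * alpha) for c in (0, 1)}
    return p_xi, theta

def nb_predict(x, p_xi, theta):
    """Choose the class with the larger log p(y=c) + sum_i log p(x_i | y=c)."""
    scores = {}
    for c, prior in ((0, 1.0 - theta), (1, theta)):
        p = p_xi[c]
        scores[c] = np.log(prior) + np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
    return 1 if scores[1] > scores[0] else 0
```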

Next: two interesting questions

(1) Two models with linear decision boundaries: logistic regression, and the LDA model (2 Gaussians with the same covariance matrices, $\mathbf{x} \sim N(\boldsymbol{\mu}_0, \boldsymbol{\Sigma})$ for $y = 0$ and $\mathbf{x} \sim N(\boldsymbol{\mu}_1, \boldsymbol{\Sigma})$ for $y = 1$).
Question: is there a relation between the two models?

(2) Two models with the same gradient: the linear model for regression and the logistic regression model for classification have the same gradient update, $\mathbf{w} \leftarrow \mathbf{w} + \alpha\,(y - f(\mathbf{x}))\,\mathbf{x}$.
Question: why is the gradient the same?

Logistic regression and generative models

Two models with linear decision boundaries: logistic regression, and the generative model with 2 Gaussians with the same covariance matrices ($\mathbf{x} \sim N(\boldsymbol{\mu}_0, \boldsymbol{\Sigma})$ for $y = 0$, $\mathbf{x} \sim N(\boldsymbol{\mu}_1, \boldsymbol{\Sigma})$ for $y = 1$).
Question: is there a relation between the two models?
Answer: yes, the two models are related! When we have 2 Gaussians with the same covariance matrix, the probability of $y$ given $\mathbf{x}$ has the form of a logistic regression model:
$p(y = 1 \mid \mathbf{x}, \boldsymbol{\mu}_0, \boldsymbol{\mu}_1, \boldsymbol{\Sigma}) = g(\mathbf{w}^T \mathbf{x})$
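The claim can be verified with a short derivation (not spelled out on the slides, but following directly from Bayes' rule and the shared-covariance Gaussian densities):

$$
\begin{aligned}
p(y = 1 \mid \mathbf{x})
  &= \frac{p(\mathbf{x} \mid y = 1)\,p(y = 1)}{p(\mathbf{x} \mid y = 1)\,p(y = 1) + p(\mathbf{x} \mid y = 0)\,p(y = 0)}
   = \frac{1}{1 + \exp(-a(\mathbf{x}))}, \\[4pt]
a(\mathbf{x}) &= \log\frac{p(\mathbf{x} \mid y = 1)\,p(y = 1)}{p(\mathbf{x} \mid y = 0)\,p(y = 0)}
   = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0)^T \boldsymbol{\Sigma}^{-1} \mathbf{x}
     - \tfrac{1}{2}\boldsymbol{\mu}_1^T \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_1
     + \tfrac{1}{2}\boldsymbol{\mu}_0^T \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_0
     + \log\frac{\theta}{1 - \theta}.
\end{aligned}
$$

Because both classes share the same $\boldsymbol{\Sigma}$, the quadratic terms $\mathbf{x}^T \boldsymbol{\Sigma}^{-1} \mathbf{x}$ cancel, so $a(\mathbf{x})$ is linear in $\mathbf{x}$ and $p(y = 1 \mid \mathbf{x}) = g(w_0 + \mathbf{w}^T \mathbf{x})$.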

Logistic regression and generative models

Members of the exponential family can often be more naturally described as
$f(x \mid \theta, \phi) = h(x, \phi) \exp\!\left( \dfrac{\theta x - A(\theta)}{a(\phi)} \right)$,
where $\theta$ is a location parameter and $\phi$ is a scale parameter.
Claim: a logistic regression is a correct model when the class-conditional densities are from the same distribution in the exponential family and have the same scale factor $\phi$.
This is a very powerful result: we can represent the posteriors of many distributions with the same small logistic regression model.

The gradient puzzle

Linear regression: $f(\mathbf{x}) = \mathbf{w}^T \mathbf{x}$
Logistic regression: $f(\mathbf{x}) = p(y = 1 \mid \mathbf{x}, \mathbf{w}) = g(\mathbf{w}^T \mathbf{x})$
Gradient update (the same for both models): $\mathbf{w} \leftarrow \mathbf{w} + \alpha \sum_{i=1}^{n} (y_i - f(\mathbf{x}_i))\,\mathbf{x}_i$
Online version: $\mathbf{w} \leftarrow \mathbf{w} + \alpha\,(y - f(\mathbf{x}))\,\mathbf{x}$

The gradient puzzle

The same simple gradient update rule is derived for both the linear and the logistic regression models. Where does the magic come from?
Under the log-likelihood measure, the function models and the models for the output selection fit together:
Linear model + Gaussian noise: $y = \mathbf{w}^T \mathbf{x} + \varepsilon$, with $\varepsilon \sim N(0, \sigma^2)$.
Logistic + Bernoulli: $y \sim \text{Bernoulli}(\theta)$, with $\theta = p(y = 1 \mid \mathbf{x}) = g(\mathbf{w}^T \mathbf{x})$.

Generalized linear models (GLIMs)

Assumptions: the conditional mean (expectation) is $\mu = f(\mathbf{w}^T \mathbf{x})$, where $f(\cdot)$ is a response function, and the output $y$ is characterized by an exponential family distribution with conditional mean $\mu$.
Examples: linear model + Gaussian noise ($y = \mathbf{w}^T \mathbf{x} + \varepsilon$, $\varepsilon \sim N(0, \sigma^2)$); logistic + Bernoulli ($y \sim \text{Bernoulli}(\theta)$, $\theta = g(\mathbf{w}^T \mathbf{x})$).
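To see that the two models really share one update rule, here is a minimal sketch of the online step; only the prediction function differs between the linear and the logistic case.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def online_update(w, x, y, alpha, predict):
    """One online gradient step w <- w + alpha * (y - f(x)) * x.

    The same rule applies to both models; only the prediction f differs:
    predict = lambda w, x: w @ x              # linear regression + Gaussian noise
    predict = lambda w, x: sigmoid(w @ x)     # logistic regression + Bernoulli output
    """
    return w + alpha * (y - predict(w, x)) * x
```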

Generalized linear models (GLIMs)

A canonical response function $f(\cdot)$ is encoded in the sampling distribution
$p(y \mid \theta, \phi) = h(y, \phi) \exp\!\left( \dfrac{\theta y - A(\theta)}{a(\phi)} \right)$
and leads to a simple gradient form.
Example: Bernoulli distribution
$p(y) = \mu^{y} (1 - \mu)^{1 - y} = \exp\!\left( y \log\dfrac{\mu}{1 - \mu} + \log(1 - \mu) \right)$,
so $\theta = \log\dfrac{\mu}{1 - \mu}$ and $\mu = \dfrac{1}{1 + e^{-\theta}}$: the logistic function matches the Bernoulli.