UVA CS 6316 Fall 2015 Graduate: Machine Learning. Lecture 15: Logistic Regression / Generative vs. Discriminative


10/21/15, Dr. Yanjun Qi, University of Virginia, Department of Computer Science

Where are we? Five major sections of this course:
- Regression (supervised)
- Classification (supervised)
- Unsupervised models
- Learning theory
- Graphical models

Where are we? Three major sections for classification

We can divide the large variety of classification approaches into roughly three major types:
1. Discriminative: directly estimate a decision rule/boundary; e.g., logistic regression, support vector machine, decision tree
2. Generative: build a generative statistical model; e.g., naive Bayes classifier, Bayesian networks
3. Instance-based classifiers: use observations directly (no models); e.g., K nearest neighbors

A dataset for classification

Output is a discrete class label C_1, C_2, ..., C_L.

Generative:     \arg\max_C P(C|X) = \arg\max_C P(X,C) = \arg\max_C P(X|C)\,P(C)
Discriminative: P(C|X), with C = c_1, ..., c_L

- Data: points/instances/examples/samples/records [rows]
- Features: attributes/dimensions/independent variables/covariates/predictors/regressors [columns, except the last]
- Target: outcome/response/label/dependent variable, the special column to be predicted [last column]

Establishing a probabilistic model for classification (cont.)

(1) Generative model: \arg\max_C P(C|X) = \arg\max_C P(X,C) = \arg\max_C P(X|C)\,P(C). One generative probabilistic model P(x|c_i) is built per class (Class 1, Class 2, ..., Class L), each over the input x = (x_1, x_2, ..., x_p). (Adapted from Prof. Ke Chen's NB slides)

(2) Discriminative model: P(C|X), with C = c_1, ..., c_L and X = (X_1, ..., X_n). A single discriminative probabilistic classifier outputs P(c_1|x), P(c_2|x), ..., P(c_L|x) directly from the input x = (x_1, x_2, ..., x_n). (Adapted from Prof. Ke Chen's NB slides)
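To make the generative decision rule concrete, here is a minimal sketch (not from the slides; the class-conditional Gaussians, parameter values, and function names are illustrative assumptions) that classifies by \arg\max_c P(x|c)\,P(c) in log space:

    import numpy as np
    from scipy.stats import multivariate_normal

    # Hypothetical per-class parameters: one Gaussian model P(x|c) per class, plus priors P(c).
    class_means = {0: np.array([0.0, 0.0]), 1: np.array([2.0, 2.0])}
    class_covs  = {0: np.eye(2), 1: np.eye(2)}
    priors      = {0: 0.7, 1: 0.3}

    def generative_predict(x):
        # argmax_c [ log P(x|c) + log P(c) ], i.e. the log of P(x|c) P(c)
        scores = {c: multivariate_normal.logpdf(x, class_means[c], class_covs[c])
                     + np.log(priors[c]) for c in priors}
        return max(scores, key=scores.get)

    print(generative_predict(np.array([1.8, 1.5])))  # predicts class 1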

Today:
- Logistic regression
- Generative vs. Discriminative

From multivariate linear regression to logistic regression

Multivariate linear regression: y = \alpha + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_p x_p

Names for the two sides of the equation:
- Left: dependent variable / predicted / response variable / outcome variable
- Right: independent variables / predictor variables / explanatory variables / covariables

Logistic regression for binary classification:

\ln\frac{P(y|x)}{1 - P(y|x)} = \alpha + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_p x_p

Here y \in {0, 1}.

(1) Linear decision boundary: \ln\frac{P(y|x)}{1 - P(y|x)} = \alpha + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_p x_p
(2) p(y|x)

The logistic function (1)

It is a common "S"-shaped function, e.g. for the probability of disease P(Y=1|X):

P(y|x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}

[Figure: sigmoid curve of P(Y=1|X) vs. x, rising from 0.0 to 1.0]
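A minimal numeric sketch of this S-shaped curve (illustrative only; the alpha and beta values below are made up):

    import numpy as np

    def logistic(x, alpha=-3.0, beta=1.5):
        # P(y=1|x) = e^(alpha + beta*x) / (1 + e^(alpha + beta*x)),
        # written in the equivalent, numerically stabler 1 / (1 + e^(-z)) form
        z = alpha + beta * x
        return 1.0 / (1.0 + np.exp(-z))

    xs = np.linspace(-4.0, 8.0, 7)
    print(np.round(logistic(xs), 3))  # values rise from near 0 to near 1 in an "S" shape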

RECAP: Probabilistic interpretation of linear regression

Let us assume that the target variable and the inputs are related by the equation

y_i = \theta^T x_i + \varepsilon_i

where \varepsilon_i is an error term capturing unmodeled effects or random noise. Now assume that \varepsilon_i follows a Gaussian N(0, \sigma). Then we have:

p(y_i | x_i; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y_i - \theta^T x_i)^2}{2\sigma^2} \right)

By the IID assumption → likelihood → MLE estimator:

L(\theta) = \prod_{i=1}^n p(y_i | x_i; \theta) = \left( \frac{1}{\sqrt{2\pi}\,\sigma} \right)^n \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \theta^T x_i)^2 \right)

l(\theta) = n \log\frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^2} \cdot \frac{1}{2} \sum_{i=1}^n (y_i - \theta^T x_i)^2

Logistic regression: when?

Logistic regression models are appropriate for a target variable coded as 0/1. We only observe "0" and "1" for the target variable, but we think of the target variable conceptually as the probability that "1" will occur. This means we use a Bernoulli distribution to model the target variable, with its Bernoulli parameter p = p(y=1|x) predefined. The main interest → predicting the probability that an event occurs (i.e., the probability p(y=1|x)).
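As a quick numeric check of the linear-regression recap above (a sketch with synthetic data; none of the names or values come from the slides): since maximizing l(\theta) is the same as minimizing the residual sum of squares, the MLE is exactly the least-squares solution.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 200, 3
    X = rng.normal(size=(n, p))
    theta_true = np.array([1.0, -2.0, 0.5])
    y = X @ theta_true + rng.normal(scale=0.3, size=n)  # y_i = theta^T x_i + eps_i

    # Minimizing sum_i (y_i - theta^T x_i)^2 gives the Gaussian-noise MLE of theta
    theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    print(np.round(theta_hat, 2))  # close to theta_true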

Discriminative: logistic regression models for a binary target variable coded 0/1

e.g., the probability of disease P(C=1|X).

[Figure: sigmoid curve of P(C=1|X) vs. x, rising from 0.0 to 1.0]

Logistic function:

P(c=1|x) = \frac{e^{\alpha + \beta^T x}}{1 + e^{\alpha + \beta^T x}}

Logit function (the decision boundary is where it equals zero!):

\ln\frac{P(c=1|x)}{P(c=0|x)} = \ln\frac{P(c=1|x)}{1 - P(c=1|x)} = \alpha + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_p x_p

The logistic function (2)

P(y|x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}

\ln\frac{P(y|x)}{1 - P(y|x)} = \alpha + \beta x    (the logit of P(y|x))

From probability to logit, i.e. log odds (and back again)

z = \log\frac{p}{1-p}    (logit / log-odds function)

\frac{p}{1-p} = e^z

p = \frac{e^z}{1+e^z} = \frac{1}{1+e^{-z}}    (logistic function)

The logistic function (3)

Advantages of the logit:
- Simple transformation of P(y|x)
- Linear relationship with x
- Can be continuous (the logit ranges from -infinity to +infinity)
- Directly related to the notion of the log odds of the target event:

\ln\frac{P}{1-P} = \alpha + \beta x,    \frac{P}{1-P} = e^{\alpha + \beta x}
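A minimal sketch of this probability-to-logit round trip (the function names are my own):

    import numpy as np

    def logit(p):
        # z = log(p / (1 - p)): maps probabilities in (0, 1) onto (-inf, +inf)
        return np.log(p / (1.0 - p))

    def inv_logit(z):
        # p = e^z / (1 + e^z) = 1 / (1 + e^(-z)): maps logits back onto (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    p = np.array([0.1, 0.5, 0.9])
    print(np.round(logit(p), 3))             # [-2.197  0.     2.197]
    print(np.round(inv_logit(logit(p)), 3))  # recovers [0.1 0.5 0.9]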

Logistic regression: binary outcome target variable Y

Logistic regression assumptions

- Linearity in the logit: the regression equation should have a linear relationship with the logit form of the target variable.
- There is no assumption about the feature variables/predictors being linearly related to each other.

Binary logistic regression

In summary, logistic regression tells us two things at once:
- Transformed, the log odds (logit) \ln[p/(1-p)] are linear in x.
- On the probability scale, P(Y=1|x) follows the S-shaped logistic curve in x, with odds p/(1-p).

[Figure: left panel, \ln[p/(1-p)] vs. x is a straight line; right panel, the logistic distribution P(Y=1|x) vs. x is the S-shaped curve]

This means we use a Bernoulli distribution to model the target variable, with its Bernoulli parameter p = p(y=1|x) predefined.

Binary → Multinomial logistic regression model

Directly models the posterior probabilities as the output of regression:

\Pr(G = k | X = x) = \frac{\exp(\beta_{k0} + \beta_k^T x)}{1 + \sum_{l=1}^{K-1} \exp(\beta_{l0} + \beta_l^T x)},    k = 1, ..., K-1

\Pr(G = K | X = x) = \frac{1}{1 + \sum_{l=1}^{K-1} \exp(\beta_{l0} + \beta_l^T x)}

- x is a p-dimensional input vector
- \beta_k is a p-dimensional vector for each k
- The total number of parameters is (K-1)(p+1)
- Note that the class boundaries are linear
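A short sketch of these K-class posteriors (illustrative; the coefficient values are random placeholders, and class K plays the role of the reference class with all-zero coefficients):

    import numpy as np

    rng = np.random.default_rng(1)
    K, p = 3, 4
    B = rng.normal(size=(K - 1, p + 1))  # the (K-1)(p+1) parameters: each row is (beta_k0, beta_k)

    def multinomial_posteriors(x):
        # Scores for classes 1..K-1; the reference class K contributes exp(0) = 1.
        scores = B[:, 0] + B[:, 1:] @ x
        denom = 1.0 + np.exp(scores).sum()
        return np.append(np.exp(scores), 1.0) / denom  # Pr(G=1|x), ..., Pr(G=K|x)

    x = rng.normal(size=p)
    posteriors = multinomial_posteriors(x)
    print(np.round(posteriors, 3), posteriors.sum())  # K probabilities summing to 1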

Today:
- Logistic regression
  - Parameter estimation
- Generative vs. Discriminative

Parameter estimation for LR → MLE from the data

- RECAP: linear regression → least squares
- Logistic regression → maximum likelihood estimation

RECAP: Probabilistic interpretation of linear regression (cont.)

Hence the log-likelihood is:

l(\theta) = n \log\frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^2} \cdot \frac{1}{2} \sum_{i=1}^n (y_i - \theta^T x_i)^2

MLE: do you recognize the last term? Yes, it is

J(\theta) = \frac{1}{2} \sum_{i=1}^n (x_i^T \theta - y_i)^2

Thus, under the independence assumption, minimizing the residual square error (RSS) is equivalent to the MLE of \theta!

MLE for logistic regression training

Let's fit the logistic regression model for K = 2, i.e., the number of classes is 2. Training set: (x_i, y_i), i = 1, ..., N. For the Bernoulli distribution, the conditional likelihood of one example is p(y|x)^y (1 - p(y|x))^{1-y}. Log-likelihood:

l(\beta) = \sum_{i=1}^N \log \Pr(Y = y_i | X = x_i)
         = \sum_{i=1}^N \left[ y_i \log \Pr(Y=1|X=x_i) + (1 - y_i) \log \Pr(Y=0|X=x_i) \right]
         = \sum_{i=1}^N \left[ y_i \log\frac{\exp(\beta^T x_i)}{1 + \exp(\beta^T x_i)} + (1 - y_i) \log\frac{1}{1 + \exp(\beta^T x_i)} \right]
         = \sum_{i=1}^N \left[ y_i \beta^T x_i - \log(1 + \exp(\beta^T x_i)) \right]

Here the x_i are (p+1)-dimensional input vectors with leading entry 1, and \beta is a (p+1)-dimensional vector. We want to maximize the log-likelihood in order to estimate \beta. How?
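The final simplified form transcribes directly into code (a sketch; it assumes X already carries the leading column of 1s, and the tiny dataset is made up):

    import numpy as np

    def log_likelihood(beta, X, y):
        # l(beta) = sum_i [ y_i * beta^T x_i - log(1 + exp(beta^T x_i)) ]
        z = X @ beta
        return np.sum(y * z - np.log1p(np.exp(z)))

    X = np.array([[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]])  # leading 1s, one feature
    y = np.array([1.0, 0.0, 1.0])
    print(log_likelihood(np.array([0.0, 1.0]), X, y))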

l(\beta) = \sum_{i=1}^N \log \Pr(Y = y_i | X = x_i)

Newton-Raphson for LR (optional)

Setting the gradient to zero,

\frac{\partial l(\beta)}{\partial \beta} = \sum_{i=1}^N \left( y_i - \frac{\exp(\beta^T x_i)}{1 + \exp(\beta^T x_i)} \right) x_i = 0

gives (p+1) non-linear equations to solve for the (p+1) unknowns. Solve by the Newton-Raphson method:

\beta^{new} \leftarrow \beta^{old} - \left[ \frac{\partial^2 l(\beta)}{\partial\beta\,\partial\beta^T} \right]^{-1} \frac{\partial l(\beta)}{\partial\beta}

where

\frac{\partial^2 l(\beta)}{\partial\beta\,\partial\beta^T} = -\sum_{i=1}^N x_i x_i^T \, \frac{\exp(\beta^T x_i)}{1 + \exp(\beta^T x_i)} \cdot \frac{1}{1 + \exp(\beta^T x_i)} = -\sum_{i=1}^N x_i x_i^T \, p(x_i;\beta)\,(1 - p(x_i;\beta))

Each Newton step minimizes a quadratic approximation to the function we are really interested in.

Newton-Raphson for LR

In matrix notation, let

- X: the N × (p+1) matrix whose rows are the x_i^T
- y: the N × 1 vector of the y_i
- p: the N × 1 vector of p(x_i; \beta^{old}) = \exp(\beta^{old\,T} x_i) / (1 + \exp(\beta^{old\,T} x_i))
- W: the N × N diagonal matrix with entries p(x_i; \beta^{old})\,(1 - p(x_i; \beta^{old}))

Then the gradient and Hessian are

\frac{\partial l(\beta)}{\partial\beta} = \sum_{i=1}^N \left( y_i - \frac{\exp(\beta^T x_i)}{1 + \exp(\beta^T x_i)} \right) x_i = X^T (y - p)

\frac{\partial^2 l(\beta)}{\partial\beta\,\partial\beta^T} = -X^T W X

So the NR rule becomes:

\beta^{new} \leftarrow \beta^{old} + (X^T W X)^{-1} X^T (y - p)
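A compact sketch of this update rule iterated to convergence (synthetic data; an illustration under the slide's notation, not code from the course):

    import numpy as np

    def fit_logistic_newton(X, y, n_iter=25):
        # X: N x (p+1) with a leading column of 1s; y: N-vector of 0/1 labels
        beta = np.zeros(X.shape[1])
        for _ in range(n_iter):
            p = 1.0 / (1.0 + np.exp(-X @ beta))              # p(x_i; beta^old)
            H = X.T @ ((p * (1.0 - p))[:, None] * X)         # X^T W X
            beta = beta + np.linalg.solve(H, X.T @ (y - p))  # NR update
        return beta

    rng = np.random.default_rng(2)
    X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
    beta_true = np.array([-1.0, 2.0, 0.5])
    y = (rng.random(500) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)
    print(np.round(fit_logistic_newton(X, y), 2))  # roughly recovers beta_true

The same update, re-expressed as a weighted least-squares problem, is the IRLS form derived on the next slide.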

YanjunQ/UVACS4501e01e6501e07 Logstc Regresson ask classfcaton Representaton Score Functon Log-odds = lnear functon of X s EPE, wth condtonal Log-lkelhood Search/Optmzaton Iteratve (Newton method Models, Parameters Logstc weghts eα+βx P(c =1 x = 1+ e α+βx 10/21/15 29 oday: Dr.YanjunQ/UVACS6316/f15 # LogsHcregresson # GeneraHvevs.DscrmnaHve 10/21/15 30

Discriminative vs. Generative

Generative approach:
- Model the joint distribution P(X, C), using p(X | C = c_k) and the class prior p(C = c_k)

Discriminative approach:
- Model the conditional distribution p(C | X) directly

Discriminative vs. Generative: example

[Figure: modeling a Height feature. The generative route fits a Gaussian per class to Height; the discriminative route fits a logistic regression for Pr(class | Height) directly.]

LDA vs. Logistic Regression

Discriminative vs. Generative: definitions

- h_gen and h_dis: generative and discriminative classifiers
- h_gen,inf and h_dis,inf: the same classifiers but trained on the entire population (asymptotic classifiers)
- As n → infinity, h_gen → h_gen,inf and h_dis → h_dis,inf

Ng, Jordan. "On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes." Advances in Neural Information Processing Systems 14 (2002): 841.

Discriminative vs. Generative

Proposition 1: [statement from Ng & Jordan, not captured in the transcription]
Proposition 2: [statement from Ng & Jordan, not captured in the transcription]

Notation:
- p: number of dimensions
- n: number of observations
- ε: generalization error

Logistic Regression vs. NBC

Discriminative classifier (logistic regression):
- smaller asymptotic error
- slow convergence: needs on the order of O(p) training examples

Generative classifier (naive Bayes):
- larger asymptotic error
- can handle missing data (EM)
- fast convergence: needs on the order of O(lg(p)) training examples

[Figure: generalization error vs. size of training set, for logistic regression and for naive Bayes, from Ng & Jordan (2002)]

Ng, Jordan. "On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes." Advances in Neural Information Processing Systems 14 (2002): 841.

Xue, Jing-Hao, and D. Michael Titterington. "Comment on 'On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes'." Neural Processing Letters 28.3 (2008): 169-187.
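One might reproduce this kind of learning-curve comparison with a sketch like the following (assumes scikit-learn; the dataset and training-set sizes are arbitrary choices, not the paper's setup):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_tr, y_tr, X_te, y_te = X[:1000], y[:1000], X[1000:], y[1000:]

    print("m  err(LR)  err(NB)")
    for m in [20, 50, 100, 500, 1000]:  # growing training-set size
        lr = LogisticRegression(max_iter=1000).fit(X_tr[:m], y_tr[:m])
        nb = GaussianNB().fit(X_tr[:m], y_tr[:m])
        print(m, round(1 - lr.score(X_te, y_te), 3), round(1 - nb.score(X_te, y_te), 3))

In the spirit of the figure, naive Bayes often flattens out sooner while logistic regression tends to end lower, though the exact curves depend on the data.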

Discriminative vs. Generative

- Empirically, generative classifiers approach their asymptotic error faster than discriminative ones
  - good for small training sets
  - handle missing data well (EM)
- Empirically, discriminative classifiers have a lower asymptotic error than generative ones
  - good for larger training sets

References

- Prof. Tan, Steinbach, Kumar's "Introduction to Data Mining" slides
- Prof. Andrew Moore's slides
- Prof. Eric Xing's slides
- Hastie, Trevor, et al. The Elements of Statistical Learning. Vol. 2. No. 1. New York: Springer, 2009.