Lecture: Evaluation of classifiers. MLPs.
Milos Hauskrecht, milos@cs.pitt.edu, 5329 Sennott Square

Evaluation

For any data set we use to test the model we can build a confusion matrix: counts of examples with true class label ω_j that are classified with label α_i.

                    target
                  ω_1    ω_2
predict   α_1      40      7
          α_2      30     54

The diagonal entries count agreements between the prediction and the target (40 + 54 = 94); the off-diagonal entries count errors (7 + 30 = 37).

Error = 37/131          Accuracy = 1 - Error = 94/131
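The error and accuracy computation above can be sketched in a few lines of Python; the confusion-matrix counts are reconstructed so that the diagonal (agreements) sums to 94 and the off-diagonal (errors) to 37:

```python
# Error and accuracy from a confusion matrix.
# Counts reconstructed from the slide: 94 agreements, 37 errors, 131 total.
confusion = [[40, 7],    # predicted α1: true ω1, true ω2
             [30, 54]]   # predicted α2: true ω1, true ω2

total = sum(sum(row) for row in confusion)                       # 131
agreement = sum(confusion[i][i] for i in range(len(confusion)))  # 94
error = (total - agreement) / total                              # 37/131
accuracy = 1 - error                                             # 94/131
```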
Evaluation for binary classification

Entries in the confusion matrix for binary classification have names:

                    target
                  ω_1    ω_0
predict   α_1     TP     FP
          α_0     FN     TN

TP: true positive (a hit)
FP: false positive (a false alarm)
TN: true negative (a correct rejection)
FN: false negative (a miss)

Additional statistics:

Sensitivity (recall):                  SENS = TP / (TP + FN)
Specificity:                           SPEC = TN / (TN + FP)
Positive predictive value (precision): PPV = TP / (TP + FP)
Negative predictive value:             NPV = TN / (TN + FN)
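A minimal sketch of these four statistics; the TP/FP/FN/TN counts below are illustrative, not prescribed by the definitions above:

```python
# Sensitivity, specificity, PPV and NPV from binary confusion-matrix counts.
# The TP/FP/FN/TN values are illustrative.
TP, FP, FN, TN = 40, 10, 20, 80

SENS = TP / (TP + FN)   # sensitivity (recall):      40/60
SPEC = TN / (TN + FP)   # specificity:               80/90
PPV  = TP / (TP + FP)   # positive predictive value: 40/50
NPV  = TN / (TN + FN)   # negative predictive value: 80/100
```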
Binary classification: additional statistics

Confusion matrix:

                    target
                  ω_1    ω_0
predict   α_1      40     10
          α_0      20     80

Row and column quantities:
SENS = 40/60    SPEC = 80/90    PPV = 40/50    NPV = 80/100

Binary decisions: ROC

Decide with a threshold x*: x > x* assigns x to ω_1, x < x* assigns x to ω_0. The probabilities SENS = p(x > x* | x in ω_1) and SPEC = p(x < x* | x in ω_0) depend on the choice of x*.
[Figure: class-conditional densities p(x | ω_0) and p(x | ω_1) with the decision threshold x*]
Receiver operating characteristic (ROC)

The ROC curve plots
    SENS     = p(x > x* | x in ω_1)    against
    1 - SPEC = p(x > x* | x in ω_0)
for different values of the threshold x*.
[Figure: class-conditional densities with a sliding threshold x*, and the resulting ROC curve]
[Figure: three cases of class overlap (Case 1, Case 2, Case 3) and their ROC curves]
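A minimal sketch of how an ROC curve is traced by sweeping the threshold x* over a set of scores; the function name and the toy scores are my own:

```python
# Sketch: ROC points from scores, sweeping the threshold x*.
# For each threshold: SENS = p(x > x* | ω_1), 1 - SPEC = p(x > x* | ω_0).
def roc_points(scores_pos, scores_neg):
    thresholds = sorted(set(scores_pos + scores_neg))
    points = []
    for t in thresholds:
        sens = sum(s > t for s in scores_pos) / len(scores_pos)
        fpr = sum(s > t for s in scores_neg) / len(scores_neg)
        points.append((fpr, sens))
    return points

# toy scores: class ω_1 tends to score higher than class ω_0
pos = [0.9, 0.8, 0.7, 0.4]
neg = [0.6, 0.3, 0.2, 0.1]
curve = roc_points(pos, neg)
```

As the threshold rises, both coordinates shrink: the curve runs from high sensitivity with many false alarms down to (0, 0).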
Receiver operating characteristic

The ROC curve shows the discriminability between the two classes under different decision biases. The decision bias can be changed by using a different loss function.

Zero-one loss function (misclassification error)

Based on the zero-one loss function:
- any misclassified example counts as 1
- a correctly classified example counts as 0

For a three-class problem the confusion matrix generalizes to a 3 x 3 table; the diagonal entries (here 40, 54 and 76) count agreements and all off-diagonal entries count errors.
General loss function

The error function can be based on a more general loss function in which different misclassifications carry different weights (losses):
    α_i - our choice (action)
    ω_j - the true label
    λ(α_i, ω_j) - the loss for choosing α_i when the true class is ω_j

Example: a 3 x 3 loss matrix λ(α_i, ω_j) with zeros on the diagonal (correct decisions incur no loss) and different positive weights (such as 1, 3 and 5) for the different misclassifications.

Bayesian decision theory

With a general loss function λ(α_i, ω_j), the expected loss for the classification choice α_i is
    R(α_i | x) = Σ_j λ(α_i, ω_j) P(y = ω_j | x)
also called the conditional risk. A decision rule α(x) chooses a label (action) according to the input. The optimal decision rule is
    α*(x) = argmin_{α_i} Σ_j λ(α_i, ω_j) P(y = ω_j | x)
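The optimal decision rule can be sketched directly; the loss matrix and posterior values below are illustrative:

```python
# Sketch of the optimal decision rule: pick the action α_i minimizing the
# conditional risk R(α_i | x) = Σ_j loss[i][j] * P(ω_j | x).
# The loss matrix is illustrative (zero loss on correct decisions).
loss = [[0, 1, 3],
        [1, 0, 5],
        [3, 5, 0]]

def optimal_action(posteriors, loss):
    risks = [sum(l * p for l, p in zip(row, posteriors)) for row in loss]
    return min(range(len(risks)), key=lambda i: risks[i]), risks

post = [0.2, 0.5, 0.3]                  # P(ω_j | x)
best, risks = optimal_action(post, loss)  # risks ≈ [1.4, 1.7, 3.1], best = 0
```

Note that the minimum-risk action α_1 differs from the maximum-posterior class ω_2 here: the large weight λ(α_2, ω_3) = 5 makes α_2 risky even though ω_2 is most probable.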
Bayesian decision theory

The optimal decision rule:
    α*(x) = argmin_{α_i} Σ_j λ(α_i, ω_j) P(y = ω_j | x)

How can classifiers be modified to handle a different loss?
- Discriminative models: directly optimize the parameters according to the new loss function.
- Generative models: learn the probabilities as before; decisions about classes are then made to minimize the expected loss, as above.

Calculating the loss for data

From the confusion matrix with counts N(α_i, ω_j) of examples with class label ω_j classified with label α_i (the diagonal entries, here 40, 54 and 76, count agreements), the empirical loss is
    Loss = (1/N) Σ_i Σ_j λ(α_i, ω_j) N(α_i, ω_j)
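Computing the empirical loss from confusion-matrix counts, as in the formula above; the counts and loss weights are illustrative (only the diagonal 40, 54, 76 comes from the slide):

```python
# Empirical loss from a confusion matrix:
#   Loss = (1/N) * Σ_ij λ(α_i, ω_j) * N(α_i, ω_j)
# Off-diagonal counts and the loss weights are illustrative.
counts = [[40, 7, 2],
          [3, 54, 4],
          [1, 8, 76]]
loss_w = [[0, 1, 3],
          [1, 0, 5],
          [3, 5, 0]]

N = sum(sum(row) for row in counts)          # total number of examples
emp_loss = sum(loss_w[i][j] * counts[i][j]
               for i in range(3) for j in range(3)) / N
```

With the zero-one loss (all off-diagonal weights equal to 1) this reduces to the misclassification error.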
Multilayer neural networks

Linear units:
- Linear regression:   f(x) = w_0 + Σ_{i=1..d} w_i x_i
- Logistic regression: f(x) = p(y = 1 | x, w) = g(w_0 + Σ_{i=1..d} w_i x_i)

The on-line gradient update has the same form for both models:
    w_0 ← w_0 + α (y - f(x))
    w_i ← w_i + α (y - f(x)) x_i
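The shared on-line update rule can be sketched as one function covering both units; the names are my own:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def online_update(w, x, y, alpha, logistic=False):
    """One on-line gradient step; x includes the bias input x[0] = 1."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    f = sigmoid(z) if logistic else z          # linear or logistic unit
    # same update form for both models: w_i <- w_i + alpha*(y - f(x))*x_i
    return [wi + alpha * (y - f) * xi for wi, xi in zip(w, x)]

# linear unit: f(x) = 0 initially, so each weight moves by alpha*(y - 0)*x_i
w = online_update([0.0, 0.0], [1.0, 2.0], 1.0, 0.1)   # -> [0.1, 0.2]
```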
Limitations of basic linear units

Linear regression f(x) = w_0 + Σ_i w_i x_i and logistic regression f(x) = p(y = 1 | x, w) = g(w_0 + Σ_i w_i x_i) both compute a function linear in the inputs, so the decision boundary is a linear hyperplane.

Regression with the quadratic model: the linear hyperplane is a limitation; a non-linear surface can fit the data better.
Classification with the linear model

The logistic regression model defines a linear decision boundary.
Example: two classes (blue and red points).
[Figure: scatter of the two classes with the linear decision boundary]
With a linear decision boundary the logistic regression model is not optimal, but not that bad.
[Figure: a second data set for which the linear boundary is a reasonable fit]
When does logistic regression fail?

An example in which the logistic regression model fails:
[Figure: two classes arranged so that no linear separator exists]

Limitations of linear units

Logistic regression does not work for parity functions: no linear decision boundary exists.
[Figure: XOR-style data in the plane]
Solution: a model of a non-linear decision boundary.
Extensions of simple linear units

Use feature basis functions to model nonlinearities:
- Linear regression:   f(x) = w_0 + Σ_{j=1..m} w_j φ_j(x)
- Logistic regression: f(x) = g(w_0 + Σ_{j=1..m} w_j φ_j(x))
where φ_j(x) is an arbitrary function of x.

Learning with extended linear units

Important property: this is the same problem as learning the weights of a linear unit; the input has changed, but the model is still linear in the weights.
Problem: too many weights to learn.
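A sketch of a linear unit extended with a quadratic feature basis, one possible choice of φ_j:

```python
import math

# Quadratic feature basis for 2-D input: one possible choice of φ_j(x).
def quad_basis(x):
    x1, x2 = x
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]   # φ_0 .. φ_5

def f_linear(w, x):
    # linear regression on the expanded features: still linear in w
    return sum(wj * pj for wj, pj in zip(w, quad_basis(x)))

def f_logistic(w, x):
    # logistic regression on the same expanded features
    return 1.0 / (1.0 + math.exp(-f_linear(w, x)))

w = [0.0, 0.0, 0.0, 1.0, 1.0, 0.0]   # represents f(x) = x1^2 + x2^2
```

The decision boundary of f_logistic is now a circle in the original input space, even though learning the weights w is still a linear problem.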
Multilayer neural networks

An alternative way to introduce nonlinearities into regression/classification models.
Idea: cascade several simple neural models with logistic units, much like neuron connections.

Multilayer neural network

Also called a multilayer perceptron (MLP); it cascades multiple logistic regression units.
Example: a two-layer classifier with non-linear decision boundaries:
    input layer (x_1, ..., x_d) → hidden layer (z_1, z_2) → output layer f(x) = p(y = 1 | x, w)
Multilayer neural network

Models non-linearities through logistic regression units and can be applied to both regression and binary classification problems:
- regression:      f(x) is the output of a linear unit in the last layer
- classification:  f(x) = p(y = 1 | x, w) (one option)

Non-linearities are modeled using multiple hidden logistic regression units organized in layers. The output layer determines whether the network solves a regression or a binary classification problem.
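A forward pass through a one-hidden-layer MLP can be sketched as follows; the weight layout (bias stored first in each row) is an assumption of this sketch:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x, W_hidden, w_out, classification=True):
    """W_hidden: one row [bias, w_1, ..., w_d] per hidden unit;
       w_out: [bias, v_1, ..., v_k] for the single output unit."""
    z = [sigmoid(row[0] + sum(wi * xi for wi, xi in zip(row[1:], x)))
         for row in W_hidden]                      # hidden-layer outputs
    out = w_out[0] + sum(vi * zi for vi, zi in zip(w_out[1:], z))
    return sigmoid(out) if classification else out  # p(y=1|x) or regression
```

With all-zero weights every hidden unit outputs 0.5 and the classification output is sigmoid(0) = 0.5.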
Learning with an MLP

How do we learn the parameters of the neural network? Gradient descent:
    w ← w - α ∇_w J(D, w)
Weight updates are based on the error J(D, w). We need to compute the gradients for the weights of all units; they can be computed in one backward sweep through the network. The process is called back-propagation.

Back-propagation

Notation for layers (l-1), l, (l+1):
    x_i(l)     - output of unit i on layer l
    z_i(l)     - input to the sigmoid function of unit i on layer l
    w_{i,j}(l) - weight between unit j on layer (l-1) and unit i on layer l
    z_i(l) = Σ_j w_{i,j}(l) x_j(l-1)        x_i(l) = g(z_i(l))
Back-propagation

For a data point D_u = <x, y>, define
    δ_i(l) = ∂J(D_u, w) / ∂z_i(l)
Then the gradient with respect to a weight factorizes as
    ∂J(D_u, w) / ∂w_{i,j}(l) = δ_i(l) x_j(l-1)
and δ_j(l) is computed from the deltas of the next layer:
    δ_j(l) = x_j(l) (1 - x_j(l)) Σ_i w_{i,j}(l+1) δ_i(l+1)
For the last (output) unit the delta is the same as for regular linear units:
    δ(output) = -(y - f(x))
This holds both for classification with the log-likelihood measure of fit and for linear regression with the least-squares error.

Learning with an MLP

Gradient descent weight update:
    w_{i,j}(l) ← w_{i,j}(l) - α δ_i(l) x_j(l-1)
where x_j(l-1) is the j-th output of layer (l-1), δ_i(l) is the derivative computed via back-propagation, and α is a learning rate.
Learning with an MLP

On-line gradient descent uses the same weight update, but with the gradient of the per-example error J_online(D_u, w):
    w_{i,j}(l) ← w_{i,j}(l) - α δ_i(l) x_j(l-1)
where x_j(l-1) is the j-th output of layer (l-1), δ_i(l) = ∂J_online(D_u, w) / ∂z_i(l) is computed via back-propagation, and α is a learning rate.

On-line gradient descent algorithm for an MLP:

Online-gradient-descent(D, number of iterations)
    initialize all weights w_{i,j}(l)
    for i = 1 to number of iterations do
        select a data point D_u = <x, y> from D
        set the learning rate α
        compute the outputs x_j(l) of each unit
        compute the derivatives δ_i(l) via back-propagation
        update all weights (in parallel): w_{i,j}(l) ← w_{i,j}(l) - α δ_i(l) x_j(l-1)
    end for
    return weights w
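The algorithm above can be sketched for a one-hidden-layer network with a sigmoid output and the log-likelihood (cross-entropy) measure of fit, so the output delta is f(x) - y; all function names are my own:

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_mlp(data, n_hidden=2, alpha=0.5, epochs=5000, seed=0):
    """On-line gradient descent with back-propagation for a d-n_hidden-1 MLP."""
    rng = random.Random(seed)
    d = len(data[0][0])
    W = [[rng.uniform(-1, 1) for _ in range(d + 1)] for _ in range(n_hidden)]
    v = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, y in data:                       # select a data point <x, y>
            # forward pass: compute the outputs of each unit
            z = [sigmoid(W[j][0] + sum(W[j][k + 1] * x[k] for k in range(d)))
                 for j in range(n_hidden)]
            f = sigmoid(v[0] + sum(v[j + 1] * z[j] for j in range(n_hidden)))
            # back-propagation: output delta, then hidden deltas
            delta_out = f - y
            deltas = [z[j] * (1 - z[j]) * v[j + 1] * delta_out
                      for j in range(n_hidden)]
            # update all weights: w <- w - alpha * delta * input
            v[0] -= alpha * delta_out
            for j in range(n_hidden):
                v[j + 1] -= alpha * delta_out * z[j]
                W[j][0] -= alpha * deltas[j]
                for k in range(d):
                    W[j][k + 1] -= alpha * deltas[j] * x[k]
    return W, v

def predict(W, v, x):
    z = [sigmoid(row[0] + sum(w * xi for w, xi in zip(row[1:], x))) for row in W]
    return sigmoid(v[0] + sum(vj * zj for vj, zj in zip(v[1:], z)))

xor = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
W, v = train_mlp(xor)
```

With two hidden units the network can represent XOR, though whether a particular run reaches that solution depends on the initialization and learning rate.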
XOR example

A linear decision boundary does not exist.
[Figure: XOR data in the plane]

XOR example: linear unit
[Figure: the surface learned by a single linear unit; the XOR pattern cannot be captured]
XOR example: neural network with 2 hidden units
[Figure: the decision surface learned by an MLP with 2 hidden units; the XOR pattern is captured]
MLP in practice

Optical character recognition of digits for the automatic sorting of mail: a five-layer network with multiple output functions, one output per digit class, maps pixel inputs to class predictions.
[Table: number of neurons and weights in each layer of the network]