Classification Part 4
Dr. Sanjay Ranka, Professor
Computer and Information Science and Engineering
University of Florida, Gainesville
Data Mining, Fall 2003

Model Evaluation
- Metrics for Performance Evaluation: how to evaluate the performance of a model
- Methods for Performance Evaluation: how to obtain reliable estimates
- Methods for Model Comparison: how to compare the relative performance among competing models
Metrics for Performance Evaluation
- Focus on the predictive capability of a model, rather than on how fast it classifies or builds models, scalability, etc.
- Confusion matrix:

                   PREDICTED Yes   PREDICTED No
    ACTUAL Yes     a (TP)          b (FN)
    ACTUAL No      c (FP)          d (TN)

  a: TP (true positive), b: FN (false negative), c: FP (false positive), d: TN (true negative)

- Most widely used metric:
  Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + TN + FP + FN)
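To make the definitions concrete, here is a minimal Python sketch (not from the original slides; the "Yes"/"No" label strings and toy data are illustrative) that tallies the four confusion-matrix counts and computes accuracy:

```python
def confusion_matrix(actual, predicted):
    """Return the counts (a, b, c, d) = (TP, FN, FP, TN)."""
    a = b = c = d = 0
    for y, y_hat in zip(actual, predicted):
        if y == "Yes" and y_hat == "Yes":
            a += 1          # true positive
        elif y == "Yes" and y_hat == "No":
            b += 1          # false negative
        elif y == "No" and y_hat == "Yes":
            c += 1          # false positive
        else:
            d += 1          # true negative
    return a, b, c, d

def accuracy(a, b, c, d):
    # Accuracy = (TP + TN) / (TP + TN + FP + FN)
    return (a + d) / (a + b + c + d)

actual    = ["Yes", "Yes", "No", "No",  "Yes", "No"]
predicted = ["Yes", "No",  "No", "Yes", "Yes", "No"]
print(accuracy(*confusion_matrix(actual, predicted)))   # 4/6 = 0.666...
```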
Cost Matrix
- C(i|j): cost of misclassifying a class j example as class i

                   PREDICTED Yes   PREDICTED No
    ACTUAL Yes     C(Yes|Yes)      C(No|Yes)
    ACTUAL No      C(Yes|No)       C(No|No)

- Accuracy is a useful measure if C(Yes|No) = C(No|Yes) and C(Yes|Yes) = C(No|No), and P(Yes) = P(No) (the class distributions are equal)

Cost vs. Accuracy
- Cost matrix: C(Yes|Yes) = -1, C(No|Yes) = 100, C(Yes|No) = 1, C(No|No) = 0
- Model M1: TP = 150, FN = 40, FP = 60, TN = 250 -> Accuracy = 80%, Cost = 3910
- Model M2: TP = 250, FN = 45, FP = 5, TN = 200 -> Accuracy = 90%, Cost = 4255
- M2 has the higher accuracy, yet the higher total cost under this cost matrix
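A short sketch of how the accuracy and total cost above follow from the confusion-matrix counts and the cost matrix; the dictionary layout is an assumption, but the counts and costs are the ones on the slide:

```python
# Cost matrix C(i|j), keyed as (predicted i, actual j)
COST = {("Yes", "Yes"): -1, ("No", "Yes"): 100,
        ("Yes", "No"):   1, ("No", "No"):    0}

def total_cost(tp, fn, fp, tn, cost):
    return (tp * cost[("Yes", "Yes")] + fn * cost[("No", "Yes")] +
            fp * cost[("Yes", "No")]  + tn * cost[("No", "No")])

def accuracy(tp, fn, fp, tn):
    return (tp + tn) / (tp + fn + fp + tn)

# Model M1: TP=150, FN=40, FP=60, TN=250
print(accuracy(150, 40, 60, 250), total_cost(150, 40, 60, 250, COST))  # 0.8, 3910
# Model M2: TP=250, FN=45, FP=5, TN=200
print(accuracy(250, 45, 5, 200), total_cost(250, 45, 5, 200, COST))    # 0.9, 4255
```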
Cost-Sensitive Measures
- Precision (p) = a / (a + c)
- Recall (r) = a / (a + b)
- F-measure (F) = 2rp / (r + p) = 2a / (2a + b + c)
- Precision is biased towards C(Yes|Yes) & C(Yes|No)
- Recall is biased towards C(Yes|Yes) & C(No|Yes)
- F-measure is biased towards all except C(No|No)
- Weighted Accuracy = (w1*a + w4*d) / (w1*a + w2*b + w3*c + w4*d)
- (a worked sketch of these measures follows the next slide)

Methods for Performance Evaluation
- How to obtain a reliable estimate of performance?
- Performance of a model may depend on factors other than the learning algorithm:
  - Class distribution
  - Cost of misclassification
  - Size of the training and test sets
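The cost-sensitive measures from the slide above translate directly into functions of the counts a = TP, b = FN, c = FP, d = TN. A minimal sketch; the weights w1..w4 in the example call are arbitrary illustrations:

```python
def precision(a, b, c, d):
    return a / (a + c)

def recall(a, b, c, d):
    return a / (a + b)

def f_measure(a, b, c, d):
    p, r = precision(a, b, c, d), recall(a, b, c, d)
    return 2 * r * p / (r + p)          # equivalently 2a / (2a + b + c)

def weighted_accuracy(a, b, c, d, w1, w2, w3, w4):
    return (w1 * a + w4 * d) / (w1 * a + w2 * b + w3 * c + w4 * d)

a, b, c, d = 150, 40, 60, 250           # Model M1 counts from the earlier slide
print(precision(a, b, c, d), recall(a, b, c, d), f_measure(a, b, c, d))
print(weighted_accuracy(a, b, c, d, w1=1, w2=1, w3=1, w4=1))  # equal weights reduce to plain accuracy (0.8)
```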
Learning Curve
- A learning curve shows how accuracy changes with varying sample size
- Requires a sampling schedule for creating the learning curve:
  - Arithmetic sampling
  - Geometric sampling
- Effect of a small sample size:
  - Bias in the estimate
  - Variance of the estimate

Methods for Estimation
- Holdout: reserve 2/3 for training and 1/3 for testing
- Random subsampling: repeated holdout
- Cross validation: partition the data into k disjoint subsets
  - k-fold: train on k-1 partitions, test on the remaining one
  - Leave-one-out: k = n
- Stratified sampling: oversampling vs. undersampling
- Bootstrap: sampling with replacement
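A minimal sketch of k-fold cross validation using only the Python standard library; train_and_evaluate is a hypothetical stand-in for whatever classifier and scoring procedure are being estimated:

```python
import random

def k_fold_indices(n, k, seed=0):
    """Partition the indices 0..n-1 into k disjoint, shuffled folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(data, k, train_and_evaluate):
    """Average score over k splits (leave-one-out when k == len(data))."""
    folds = k_fold_indices(len(data), k)
    scores = []
    for i, test_idx in enumerate(folds):
        test = [data[j] for j in test_idx]
        # Train on the other k-1 partitions, test on the held-out one
        train = [data[j] for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(train_and_evaluate(train, test))
    return sum(scores) / k

# Toy usage: a dummy evaluator that just reports the test-set fraction of "Yes"
data = [("x%d" % i, "Yes" if i % 3 else "No") for i in range(30)]
dummy = lambda train, test: sum(1 for _, y in test if y == "Yes") / len(test)
print(cross_validate(data, k=5, train_and_evaluate=dummy))
```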
Receiver Operating Characteristic (ROC)
- Developed in the 1950s for signal detection theory, to analyze noisy signals
- Characterizes the trade-off between positive hits and false alarms
- The ROC curve plots TP (on the y-axis) against FP (on the x-axis)
- The performance of each classifier is represented as a point on the ROC curve; changing the algorithm's threshold, the sample distribution, or the cost matrix changes the location of the point

ROC Curve
- A 1-dimensional data set containing 2 classes (positive and negative)
- Any point located at x > t is classified as positive
- At threshold t: TP = 0.5, FN = 0.5, FP = 0.12, TN = 0.88
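A small sketch of how a single threshold t on a 1-dimensional score yields one (FP, TP) point of the ROC curve, mirroring the slide's "x > t is positive" rule; the scores and labels below are made-up illustrative data:

```python
def roc_point(scores, labels, t):
    """Return (FP rate, TP rate) when every x > t is declared positive."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    tp_rate = sum(s > t for s in pos) / len(pos)
    fp_rate = sum(s > t for s in neg) / len(neg)
    return fp_rate, tp_rate

scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0,   0,   1,    0,   1,   0,   1,   1  ]
# Sweeping t over the observed scores traces out the ROC curve point by point
for t in sorted(set(scores), reverse=True):
    print(t, roc_point(scores, labels, t))
```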
ROC Curve (continued)
- (TP, FP) = (0, 0): declare everything to be the negative class
- (TP, FP) = (1, 1): declare everything to be the positive class
- (TP, FP) = (1, 0): ideal
- Diagonal line: random guessing
- Below the diagonal line: the prediction is the opposite of the true class

Using ROC for Model Comparison
- No model consistently outperforms the other:
  - M1 is better for small FPR
  - M2 is better for large FPR
- Area under the ROC curve:
  - Ideal: area = 1
  - Random guess: area = 0.5
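A sketch of computing the area under an ROC curve with the trapezoidal rule, given its (FP rate, TP rate) points; the two checks reproduce the random-guess (area = 0.5) and ideal (area = 1) cases from the slide:

```python
def auc(points):
    """points: (fp_rate, tp_rate) pairs covering the curve from (0,0) to (1,1)."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0   # trapezoid between adjacent points
    return area

print(auc([(0, 0), (1, 1)]))          # 0.5 -> the random-guess diagonal
print(auc([(0, 0), (0, 1), (1, 1)]))  # 1.0 -> curve through the ideal point (FP=0, TP=1)
```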