Session Outline. Introduction to classification problems and discrete choice models. Introduction to Logistic Regression. Logistic function and Logit function. Maximum Likelihood Estimator (MLE) for estimation of LR parameters. Examples: Challenger Shuttle, German Bank Credit Rating.
Classification Problems. Classification is an important category of problems in which the decision maker would like to classify customers into two or more groups. Examples of classification problems: customer churn; credit rating (low, medium and high risk); employee attrition; fraud (classification of a transaction as fraud/non-fraud); the outcome of any binomial or multinomial experiment.
[Figure: two customer cartoons labelled "Always Cheerful" and "Always Sad".] Logistic Regression attempts to classify customers into different categories.
Discrete Choice Models. Problems involving discrete choices available to the decision maker. Discrete choice models (in business) examine which alternative is chosen by a customer, and why. Most discrete choice models estimate the probability that a customer chooses a particular alternative from several possible alternatives.
Logistic Regression: a classification problem, a discrete choice model, the probability of an event.
Logistic Regression - Introduction. The name logistic regression comes from the logistic function: P(Y = 1) = e^Z / (1 + e^Z). Mathematically, logistic regression attempts to estimate the conditional probability of an event.
Logistic Regression. Logistic regression models estimate how the probability of an event may be affected by one or more explanatory variables.
Binomial Logistic Regression. Binomial (or binary) logistic regression is a model in which the dependent variable is dichotomous. The independent variables may be of any type.
Logistic Function (Sigmoidal function): π(z) = e^z / (1 + e^z), where z = β0 + β1x1 + β2x2 + ... + βnxn.
Logistic Regression with one Explanatory Variable: P(Y = 1 | X = x) = π(x) = exp(β0 + β1x) / (1 + exp(β0 + β1x)). β1 = 0 implies that P(Y | x) is the same for each value of x; β1 > 0 implies that P(Y | x) increases as the value of x increases; β1 < 0 implies that P(Y | x) decreases as the value of x increases.
Logit Function. The logit function is the logarithmic transformation of the logistic function. It is defined as the natural logarithm of the odds: Logit(π) = ln(π / (1 − π)) = β0 + β1x, where π / (1 − π) is the odds.
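The logistic and logit functions above are inverses of each other, which a few lines of Python make easy to check (a minimal sketch; the function names are ours, not from the slides):

```python
import math

def logistic(z):
    """Logistic (sigmoid) function: maps any real z to a probability in (0, 1)."""
    return math.exp(z) / (1 + math.exp(z))

def logit(p):
    """Logit function: natural log of the odds p / (1 - p)."""
    return math.log(p / (1 - p))

# Applying logit after logistic recovers the original value of z.
z = 0.8
p = logistic(z)
recovered = logit(p)
```
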
Logistic regression is more robust: error terms need not be normal; there is no requirement of equal variance of the error term (homoscedasticity); and there is no requirement of a linear relationship between the dependent and independent variables.
Estimation of parameters in Logistic Regression. Estimation of parameters in logistic regression is carried out using the Maximum Likelihood Estimation (MLE) technique. No closed-form solution exists for the estimation of the regression parameters of logistic regression.
Maximum Likelihood Estimator (MLE). MLE is a statistical method for estimating the parameters of a model. For a given data set, the MLE chooses the values of the model parameters that make the observed data more likely than any other parameter values would.
Likelihood Function. The likelihood function L(θ) represents the joint probability, or likelihood, of observing the data that have been collected. MLE chooses the estimator of the set of unknown parameters that maximizes the likelihood function L(θ).
Maximum Likelihood Estimator. Assume that x1, x2, ..., xn are sample observations of a distribution f(x, θ), where θ is an unknown parameter. The likelihood function is L(θ) = f(x1, x2, ..., xn, θ), which is the joint probability density function of the sample. The value of θ, θ*, which maximizes L(θ) is called the maximum likelihood estimator of θ.
Example: Exponential Distribution. Let x1, x2, ..., xn be sample observations that follow an exponential distribution with parameter θ, that is: f(x, θ) = θ e^(−θx). The likelihood function is given by (assuming independence): L(x, θ) = f(x1, θ) f(x2, θ) ... f(xn, θ) = θ e^(−θx1) · θ e^(−θx2) ··· θ e^(−θxn) = θ^n e^(−θ Σ xi).
Log-likelihood function. The log-likelihood function is given by: ln(L(x, θ)) = n ln θ − θ Σ xi. The optimal θ* is found from: d(ln L(x, θ))/dθ = n/θ − Σ xi = 0, which gives θ* = n / Σ xi.
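The closed-form MLE derived above, θ* = n / Σ xi (the reciprocal of the sample mean), can be sketched directly (the sample values are hypothetical, chosen so the sum is 10):

```python
def exponential_mle(sample):
    """MLE of the exponential rate parameter.
    Solving d/dtheta ln L = n/theta - sum(x) = 0 gives theta* = n / sum(x)."""
    return len(sample) / sum(sample)

# Hypothetical sample of n = 5 observations; their sum is 10.0,
# so the MLE is 5 / 10.0 = 0.5 (i.e. 1 / sample mean).
data = [1.2, 3.4, 0.7, 2.9, 1.8]
theta_star = exponential_mle(data)
```
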
Likelihood function for Binary Logistic Regression. The probability density function for binary logistic regression is: f(yi) = π^yi (1 − π)^(1−yi). The likelihood is: L(β) = f(y1, y2, ..., yn) = Π(i=1 to n) πi^yi (1 − πi)^(1−yi), so ln(L(β)) = Σ(i=1 to n) [ yi ln π(xi) + (1 − yi) ln(1 − π(xi)) ].
Likelihood function for Binary Logistic Regression (continued). Substituting for π(xi): ln L(β0, β1) = Σ(i=1 to n) yi (β0 + β1xi) − Σ(i=1 to n) ln(1 + exp(β0 + β1xi)).
Estimation of LR parameters. Setting the partial derivatives of the log-likelihood to zero: ∂ln L(β0, β1)/∂β0 = Σ(i=1 to n) [ yi − exp(β0 + β1xi) / (1 + exp(β0 + β1xi)) ] = 0; ∂ln L(β0, β1)/∂β1 = Σ(i=1 to n) xi [ yi − exp(β0 + β1xi) / (1 + exp(β0 + β1xi)) ] = 0. This system of equations is solved iteratively to estimate β0 and β1.
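One common iterative procedure for solving these two likelihood equations is Newton-Raphson, which is essentially what statistical packages use. A minimal sketch (the data and function name are hypothetical; real implementations add convergence checks and safeguards against complete separation):

```python
import math

def fit_logistic(xs, ys, iters=50):
    """Solve the two likelihood equations for beta0, beta1 by Newton-Raphson.
    At each step: beta <- beta + H^-1 g, where g is the gradient of the
    log-likelihood and H is the 2x2 information matrix."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))   # pi(x)
            g0 += y - p                               # d lnL / d beta0
            g1 += x * (y - p)                         # d lnL / d beta1
            w = p * (1 - p)                           # weight pi(1 - pi)
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det             # 2x2 inverse, by hand
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical toy data: the outcome tends toward 1 as x grows.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)
```

At convergence both score equations are (numerically) zero, which is exactly the condition stated on the slide.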
Limitations of MLE. The maximum likelihood estimator may not be unique or may not exist. A closed-form solution may not exist in many cases; one may have to use an iterative procedure to estimate the parameter values.
Challenger Data

Flight    Temp  Damage  |  Flight     Temp  Damage
STS-1      66   No      |  STS-41G     78   No
STS-2      70   Yes     |  STS-51-A    67   No
STS-3      69   No      |  STS-51-C    53   Yes
STS-4      80   No      |  STS-51-D    67   No
STS-5      68   No      |  STS-51-B    75   No
STS-6      67   No      |  STS-51-G    70   No
STS-7      72   No      |  STS-51-F    81   No
STS-8      73   No      |  STS-51-I    76   No
STS-9      70   No      |  STS-51-J    79   No
STS-41B    57   Yes     |  STS-61-A    75   Yes
STS-41C    63   Yes     |  STS-61-B    76   No
STS-41D    70   Yes     |  STS-61-C    58   Yes
Challenger launch temperature vs damage data
Logistic Regression of Challenger data. Let Y = 0 denote no damage and Y = 1 denote damage to the O-ring, with P(Y = 1) = π and P(Y = 0) = 1 − π. We predict P(Y = 1 | x), where x = launch temperature.
Logistic Regression using SPSS. Dependent variable: in binary logistic regression, the dependent variable can take only two values. In multinomial logistic regression, the dependent variable can take two or more values (but must not be continuous). Covariates: all independent (predictor) variables are entered as covariates.
Variables in the Equation (Step 1a)

                     B        S.E.    Wald    df   Sig.    Exp(B)
LaunchTemperature   -0.236    0.107   4.832   1    0.028   0.790
Constant            15.297    7.329   4.357   1    0.037   4398676

a. Variable(s) entered on step 1: LaunchTemperature.

Fitted model: ln(π / (1 − π)) = 15.297 − 0.236 X, so P(Y = 1) = e^(15.297 − 0.236X) / (1 + e^(15.297 − 0.236X)).
Challenger: Probability of failure estimate. P(Y = 1) = e^(15.297 − 0.236X) / (1 + e^(15.297 − 0.236X)). [Figure: estimated failure probability plotted against launch temperature (0 to 120); the probability is near 1 at low temperatures and falls toward 0 at high temperatures.]
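Plugging the fitted SPSS coefficients into the logistic function shows how sharply the estimated damage probability rises as launch temperature drops (a minimal sketch; the function name is ours):

```python
import math

# Fitted coefficients from the SPSS output: ln(odds) = 15.297 - 0.236 * X
B0, B1 = 15.297, -0.236

def p_damage(temp):
    """Estimated probability of O-ring damage at a given launch temperature."""
    z = B0 + B1 * temp
    return math.exp(z) / (1 + math.exp(z))

# Evaluating the fitted curve at a few temperatures shows the steep rise
# in estimated risk as the temperature falls from 80 toward 50.
estimates = {t: p_damage(t) for t in (80, 70, 60, 50)}
```
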
Classification table from SPSS
Accuracy Paradox. Assume an example of insurance fraud. Past data have revealed that out of 1000 claims, 950 are true claims and 50 are fraudulent. The classification table from a logistic regression model is given below:

Observed   Predicted 0   Predicted 1   % accuracy
0          900           50            94.73%
1          5             45            90.00%

The overall accuracy is 94.5%, yet simply never predicting fraud would give 95% accuracy!
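The paradox is easy to verify from the counts in the table above (variable names are ours):

```python
# Confusion-matrix counts from the insurance-fraud example.
tn, fp = 900, 50   # observed 0 (true claims): correctly / wrongly classified
fn, tp = 5, 45     # observed 1 (fraudulent claims): wrongly / correctly classified
total = tn + fp + fn + tp

overall_accuracy = (tn + tp) / total       # model accuracy: 945/1000 = 0.945
always_predict_0 = (tn + fp) / total       # "never fraud" rule: 950/1000 = 0.95
```

The naive rule beats the model on accuracy while catching zero fraud, which is why accuracy alone is a poor criterion for imbalanced classes.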
ODDS and ODDS RATIO. Odds: the ratio of two probability values, odds = π / (1 − π). Odds ratio: the ratio of two odds.
ODDS RATIO. Assume that X is an independent variable (covariate). The odds ratio, OR, is defined as the ratio of the odds for X = 1 to the odds for X = 0: OR = [π(1) / (1 − π(1))] / [π(0) / (1 − π(0))].
Interpretation of Beta Coefficient in LR.
ln(π(x) / (1 − π(x))) = β0 + β1x   (1)
For x = 0: ln(π(0) / (1 − π(0))) = β0   (2)
For x = 1: ln(π(1) / (1 − π(1))) = β0 + β1   (3)
Subtracting (2) from (3): β1 = ln{ [π(1) / (1 − π(1))] / [π(0) / (1 − π(0))] }.
Interpretation of LR coefficients. β1 = ln{ [π(x+1) / (1 − π(x+1))] / [π(x) / (1 − π(x))] } is the change in the log-odds ratio; e^β1 = [π(x+1) / (1 − π(x+1))] / [π(x) / (1 − π(x))] is the change in the odds ratio.
Odds Ratio for Binary Logistic Regression. OR = [π(1) / (1 − π(1))] / [π(0) / (1 − π(0))] = e^β1. If OR = 2, the event is twice as likely to occur when X = 1 as when X = 0. The odds ratio approximates the relative risk.
Interpretation of LR coefficients. β1 is the change in the log-odds ratio for a unit change in the explanatory variable; equivalently, a unit change in the explanatory variable changes the odds ratio by a factor of exp(β1).
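Applying this interpretation to the Challenger fit: the SPSS output reported B = −0.236 for launch temperature, and its Exp(B) column is just e^β1 (a small check in Python):

```python
import math

# Coefficient of launch temperature from the Challenger model.
beta1 = -0.236
odds_factor = math.exp(beta1)   # the Exp(B) value SPSS reports, about 0.79

# Each 1-degree rise in launch temperature multiplies the odds of
# O-ring damage by ~0.79 (i.e. reduces the odds by about 21%),
# while the log-odds change by exactly beta1.
```
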
Session Outline. Measuring the fitness of the Logistic Regression model. Testing individual regression parameters (Wald's test). Omnibus test for overall model fitness. Hosmer-Lemeshow goodness-of-fit test. R^2 in Logistic Regression. Confidence intervals for parameters and probabilities.
Wald Test. The Wald test is used to check the significance of individual explanatory variables (similar to the t-statistic in linear regression). The Wald test statistic is: W = (β̂ / SE(β̂))^2. W follows a chi-square distribution.
Wald test hypotheses. Null hypothesis H0: β = 0. Alternative hypothesis H1: β ≠ 0.
Wald Test - Challenger Data. The explanatory variable is significant, since the p-value is less than 0.05.
Wald Test - Challenger Data. For significant variables, the CI for Exp(β) will not contain 1.
Model Chi-Square. Omnibus test: H0: β1 = β2 = ... = βk = 0; H1: not all βs are zero.
Hosmer-Lemeshow Goodness-of-Fit Test. A test for the overall fitness of a binary logistic regression model (similar to the chi-square goodness-of-fit test). The observations are grouped into 10 groups based on their predicted probabilities.
Hosmer-Lemeshow Test Statistic. The Hosmer-Lemeshow test statistic is given by: C = Σ(k=1 to g) (Ok − nk π̄k)^2 / (nk π̄k (1 − π̄k)), where g = number of groups, nk = number of observations in the kth group, Ok = sum of the y values in the kth group, and π̄k = the average predicted probability π in the kth group.
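The formula above translates directly into code. A minimal sketch (the function name and input layout are ours; in practice the groups come from deciles of predicted probability, as the previous slide describes):

```python
def hosmer_lemeshow(groups):
    """Hosmer-Lemeshow statistic C from grouped data.
    Each group supplies (n_k, o_k, pi_k): group size, observed number of
    events, and average predicted probability for that group."""
    c = 0.0
    for n_k, o_k, pi_k in groups:
        expected = n_k * pi_k                      # n_k * pi_bar_k
        c += (o_k - expected) ** 2 / (expected * (1 - pi_k))
    return c

# Hypothetical groups: observed counts exactly match expectations, so C = 0.
perfect_fit = hosmer_lemeshow([(10, 2, 0.2), (10, 5, 0.5)])
```

The statistic is compared against a chi-square distribution (df = g − 2 with the usual 10 groups), as in the Challenger output on the next slide.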
H-L test for Challenger Data. Hosmer and Lemeshow Test: Step 1, Chi-square = 9.396, df = 8, Sig. = 0.310. Since p (= 0.310) is more than 0.05, we accept the null hypothesis that there is no difference between the predicted and observed frequencies (i.e., we accept the model).
Classification Table

Observed        Predicted 1 (Positive)   Predicted 0 (Negative)   Total
1 (Positive)    4  [True Positive, TP]   3  [False Negative, FN]    7
0 (Negative)    0  [False Positive, FP]  17 [True Negative, TN]    17

Sensitivity = TP / (TP + FN) = 4/7 = 57.1%
Specificity = TN / (TN + FP) = 17/17 = 100%
Sensitivity & Specificity. Sensitivity = (number of true positives) / (number of true positives + number of false negatives): the probability that the predicted value of y is 1 given that the observed value is 1. Specificity = (number of true negatives) / (number of true negatives + number of false positives): the probability that the predicted value of y is 0 given that the observed value is 0.
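Both definitions, applied to the Challenger classification table above (function names are ours):

```python
def sensitivity(tp, fn):
    """P(predicted 1 | observed 1) = TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """P(predicted 0 | observed 0) = TN / (TN + FP)."""
    return tn / (tn + fp)

# Challenger counts: TP = 4, FN = 3, TN = 17, FP = 0.
sens = sensitivity(4, 3)    # 4/7, about 57.1%
spec = specificity(17, 0)   # 17/17 = 100%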
Receiver Operating Characteristic (ROC) Curve. The ROC curve plots the true positive rate (correct positive classification) against the false positive rate (1 − specificity) and compares the model with random classification. The higher the area under the ROC curve, the better the prediction ability.
Challenger Example: Sensitivity vs 1 − Specificity (True positive rate vs False positive rate)

Cut-off  Sensitivity  Specificity  1 − Specificity
0.05     1            0.235        0.765
0.1      0.857        0.412        0.588
0.2      0.857        0.529        0.471
0.3      0.571        0.706        0.294
0.4      0.571        0.941        0.059
0.5      0.571        1            0
0.6      0.571        1            0
0.7      0.429        1            0
0.8      0.429        1            0
0.9      0.143        1            0
0.95     0            1            0
ROC Curve - Challenger Example. [Figure: sensitivity plotted against 1 − specificity for the cut-off values in the table above.]
ROC Curve
Area Under the ROC Curve. The area under the ROC curve is interpreted as the probability that the model will rank a randomly chosen positive higher than a randomly chosen negative. If n1 is the number of positives (1s) and n2 is the number of negatives (0s), the area under the ROC curve is the proportion of all n1 × n2 (positive, negative) pairs in which the positive case has the higher predicted probability.
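The pairwise-comparison interpretation above gives a direct (if O(n1·n2)) way to compute the AUC, sketched below with ties counted as half, a common convention (the function name is ours):

```python
def auc_by_pairs(pos_scores, neg_scores):
    """AUC as the proportion of (positive, negative) pairs in which the
    positive case gets the higher predicted probability; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Perfect separation gives AUC = 1; identical scores give AUC = 0.5 (no
# discrimination, the diagonal of the ROC plot).
perfect = auc_by_pairs([0.9, 0.8], [0.1, 0.2])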
ROC Curve. General rule for acceptance of the model, based on the area under the ROC curve:
= 0.5: no discrimination
0.7 ≤ ROC area < 0.8: acceptable discrimination
0.8 ≤ ROC area < 0.9: excellent discrimination
ROC area ≥ 0.9: outstanding discrimination
Gini Coefficient. The Gini coefficient measures the individual impact of an explanatory variable. Gini coefficient = 2 × AUC − 1, where AUC is the area under the ROC curve.
Optimal Cut-off probabilities: using classification plots; Youden's index; cost-based optimization.
Classification Plots. [SPSS classification plot, step number 1: a histogram of predicted probabilities from 0 to 1, with each case marked by its observed group (0 or 1). Observed 0s cluster at low predicted probabilities and observed 1s at high predicted probabilities, with some misclassified cases in between.]
Youden's Index. Youden's index is a measure of diagnostic accuracy, calculated by deducting 1 from the sum of the test's sensitivity and specificity: Youden's index J(p) = Sensitivity(p) + Specificity(p) − 1.
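Youden's index gives a simple rule for choosing the cut-off: take the p that maximizes J(p). A sketch using an abridged version of the Challenger sensitivity/specificity grid from the earlier slide (the function name and dictionary layout are ours):

```python
def youden_optimal_cutoff(table):
    """Return the cut-off p maximizing J(p) = sensitivity + specificity - 1.
    `table` maps each candidate cut-off to its (sensitivity, specificity)."""
    return max(table, key=lambda p: table[p][0] + table[p][1] - 1)

# Abridged Challenger grid: cut-off -> (sensitivity, specificity).
grid = {0.05: (1.0, 0.235), 0.2: (0.857, 0.529),
        0.4: (0.571, 0.941), 0.5: (0.571, 1.0)}
best = youden_optimal_cutoff(grid)
```

Here J is 0.235, 0.386, 0.512 and 0.571 for the four cut-offs respectively, so the rule picks 0.5.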
Cost-based Model for Optimal Cut-off.

Observed   Predicted 0   Predicted 1
0          P00           P01
1          P10           P11

R00 = cost of classifying 0 as 0; C01 = cost of classifying 0 as 1; C10 = cost of classifying 1 as 0; R11 = cost of classifying 1 as 1.

Optimal cut-off: Min over p of [ P00 R00 + P01 C01 + P10 C10 + P11 R11 ].