Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore


Session Outline: Introduction to classification problems and discrete choice models. Introduction to Logistic Regression. Logistic function and Logit function. Maximum Likelihood Estimator (MLE) for estimation of LR parameters. Examples: Challenger Shuttle, German Bank Credit Rating.

Classification Problems. Classification is an important category of problems in which the decision maker would like to classify the customers into two or more groups. Examples of classification problems: Customer churn. Credit rating (low, high and medium risk). Employee attrition. Fraud (classification of a transaction as fraud/non-fraud). Outcome of any binomial and multinomial experiment.

[Slide illustration: "Always Cheerful" vs "Always Sad" groups.] Logistic Regression attempts to classify customers into different categories.

Discrete Choice Models. Problems involving discrete choices available to the decision maker. Discrete choice models (in business) examine which alternative is chosen by a customer, and why. Most discrete choice models estimate the probability that a customer chooses a particular alternative from several possible alternatives.

Logistic Regression: Classification Problem. Discrete Choice Model. Probability of an event.

Logistic Regression - Introduction. The name logistic regression comes from the logistic function:

$$P(Y = 1) = \frac{e^Z}{1 + e^Z}$$

Mathematically, logistic regression attempts to estimate the conditional probability of an event.

Logistic Regression. Logistic regression models estimate how the probability of an event may be affected by one or more explanatory variables.

Binomial Logistic Regression. Binomial (or binary) logistic regression is a model in which the dependent variable is dichotomous. The independent variables may be of any type.

Logistic Function (Sigmoidal function):

$$\pi(z) = \frac{e^z}{1 + e^z}, \qquad z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$

Logistic Regression with one Explanatory Variable:

$$P(Y = 1 \mid X = x) = \pi(x) = \frac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)}$$

β1 = 0 implies that P(Y | x) is the same for each value of x. β1 > 0 implies that P(Y | x) increases as the value of x increases. β1 < 0 implies that P(Y | x) decreases as the value of x increases.
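The effect of the sign of β1 is easy to see numerically. Here is a minimal Python sketch; the coefficient values below are arbitrary illustrations, not estimates from any data set:

```python
import numpy as np

def logistic(x, b0, b1):
    """P(Y = 1 | X = x) for simple logistic regression."""
    z = b0 + b1 * x
    return np.exp(z) / (1.0 + np.exp(z))

x = np.linspace(-5, 5, 11)
print(logistic(x, b0=0.0, b1=1.0))   # b1 > 0: probability increases with x
print(logistic(x, b0=0.0, b1=-1.0))  # b1 < 0: probability decreases with x
print(logistic(x, b0=0.0, b1=0.0))   # b1 = 0: constant 0.5 for every x
```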

Logit Function. The Logit function is the logarithmic transformation of the logistic function. It is defined as the natural logarithm of odds:

$$\text{Logit}(\pi) = \ln\left(\frac{\pi}{1 - \pi}\right) = \beta_0 + \beta_1 x, \qquad \text{odds} = \frac{\pi}{1 - \pi}$$

Logistic regression is more robust than linear regression: Error terms need not be normal. There is no requirement of equal variance for the error term (homoscedasticity). There is no requirement of a linear relationship between the dependent and independent variables.

Estimation of parameters in Logistic Regression. Estimation of parameters in logistic regression is carried out using the Maximum Likelihood Estimation (MLE) technique. No closed-form solution exists for estimation of the regression parameters of logistic regression.

Maximum Likelihood Estimator (MLE). MLE is a statistical method for estimating the parameters of a model. For a given data set, the MLE chooses the values of the model parameters that make the observed data more likely than any other parameter values would.

Likelihood Function. The likelihood function L(θ) represents the joint probability or likelihood of observing the data that have been collected. MLE chooses the estimator of the set of unknown parameters which maximizes the likelihood function L(θ).

Maximum Likelihood Estimator. Assume that x1, x2, ..., xn are sample observations from a distribution f(x, θ), where θ is an unknown parameter. The likelihood function is L(θ) = f(x1, x2, ..., xn; θ), which is the joint probability density function of the sample. The value θ* which maximizes L(θ) is called the maximum likelihood estimator of θ.

Example: Exponential Distribution. Let x1, x2, ..., xn be sample observations that follow an exponential distribution with parameter θ, that is, f(x, θ) = θe^{-θx}. The likelihood function is given by (assuming independence):

$$L(x, \theta) = f(x_1, \theta)\, f(x_2, \theta) \cdots f(x_n, \theta) = \theta^n e^{-\theta \sum_{i=1}^{n} x_i}$$

Log-likelihood function. The log-likelihood function is given by:

$$\ln L(x, \theta) = n \ln\theta - \theta \sum_{i=1}^{n} x_i$$

The optimal θ* is obtained by setting the derivative to zero:

$$\frac{d \ln L(x, \theta)}{d\theta} = \frac{n}{\theta} - \sum_{i=1}^{n} x_i = 0 \quad\Rightarrow\quad \theta^* = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar{x}}$$
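A quick numeric check of this closed-form result; a sketch with simulated data, where the true rate value 2.0 is an arbitrary illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 2.0                       # illustrative rate parameter
x = rng.exponential(scale=1.0 / theta_true, size=10_000)

theta_hat = 1.0 / x.mean()             # closed-form MLE derived above: n / sum(x)
print(theta_hat)                       # close to 2.0 for a large sample
```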

Likelihood function for Binary Logistic Regression. The probability density function for binary logistic regression is given by:

$$f(y_i) = \pi_i^{y_i} (1 - \pi_i)^{1 - y_i}$$

$$L(\beta) = f(y_1, y_2, \dots, y_n) = \prod_{i=1}^{n} \pi_i^{y_i} (1 - \pi_i)^{1 - y_i}$$

$$\ln L(\beta) = \sum_{i=1}^{n} \left[ y_i \ln \pi(x_i) + (1 - y_i) \ln\big(1 - \pi(x_i)\big) \right]$$

Likelihood function for Binary Logistic Regression (continued). Substituting π(x_i) = exp(β0 + β1 x_i) / (1 + exp(β0 + β1 x_i)) gives:

$$\ln L(\beta_0, \beta_1) = \sum_{i=1}^{n} y_i (\beta_0 + \beta_1 x_i) - \sum_{i=1}^{n} \ln\big(1 + \exp(\beta_0 + \beta_1 x_i)\big)$$

Estimation of LR parameters. Setting the partial derivatives of the log-likelihood to zero gives the score equations:

$$\frac{\partial \ln L(y, \beta_0, \beta_1)}{\partial \beta_0} = \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} \frac{\exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)} = 0$$

$$\frac{\partial \ln L(y, \beta_0, \beta_1)}{\partial \beta_1} = \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} \frac{x_i \exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)} = 0$$

The above system of equations is solved iteratively to estimate β0 and β1, as sketched below.
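A minimal sketch of one standard iterative scheme, Newton-Raphson, for these two score equations; x and y are placeholder arrays of predictor values and 0/1 outcomes:

```python
import numpy as np

def fit_logit_newton(x, y, n_iter=25):
    """Solve the two score equations above by Newton-Raphson."""
    X = np.column_stack([np.ones_like(x, dtype=float), x])  # rows [1, x_i]
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # pi(x_i) at the current beta
        grad = X.T @ (y - p)                    # the score equations
        W = p * (1.0 - p)
        hess = -(X * W[:, None]).T @ X          # Hessian of the log-likelihood
        beta -= np.linalg.solve(hess, grad)     # Newton step
    return beta
```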

Limitations of MLE. The maximum likelihood estimator may not be unique, or may not exist. A closed-form solution may not exist in many cases; one may have to use an iterative procedure to estimate the parameter values.

Challenger Data

Flight     Temp  Damage     Flight      Temp  Damage
STS-1      66    No         STS-41G     78    No
STS-2      70    Yes        STS-51-A    67    No
STS-3      69    No         STS-51-C    53    Yes
STS-4      80    No         STS-51-D    67    No
STS-5      68    No         STS-51-B    75    No
STS-6      67    No         STS-51-G    70    No
STS-7      72    No         STS-51-F    81    No
STS-8      73    No         STS-51-I    76    No
STS-9      70    No         STS-51-J    79    No
STS-41B    57    Yes        STS-61-A    75    Yes
STS-41C    63    Yes        STS-61-B    76    No
STS-41D    70    Yes        STS-61-C    58    Yes

Challenger launch temperature vs damage data

Logistic Regression of Challenger data. Let Y = 0 denote no damage and Y = 1 denote damage to the O-ring, with P(Y = 1) = π and P(Y = 0) = 1 - π. We predict P(Y = 1 | x), where x = launch temperature.

Logistic Regression using SPSS. Dependent variable: in binary logistic regression, the dependent variable can take only two values. In multinomial logistic regression, the dependent variable can take two or more values (but not continuous). Covariate: all independent (predictor) variables are entered as covariates.

Variables in the Equation (Step 1; variable entered on step 1: LaunchTemperature):

Variable            B        S.E.    Wald    df   Sig.    Exp(B)
LaunchTemperature   -0.236   0.107   4.832   1    0.028   0.790
Constant            15.297   7.329   4.357   1    0.037   4398676

The fitted model is:

$$\ln\left(\frac{\pi}{1 - \pi}\right) = 15.297 - 0.236X, \qquad P(Y = 1) = \frac{e^{15.297 - 0.236X}}{1 + e^{15.297 - 0.236X}}$$
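The same fit can be reproduced outside SPSS. A sketch using Python's statsmodels, with the temperature and damage columns typed in from the Challenger table above (damage: Yes = 1, No = 0):

```python
import numpy as np
import statsmodels.api as sm

temp = np.array([66, 70, 69, 80, 68, 67, 72, 73, 70, 57, 63, 70,
                 78, 67, 53, 67, 75, 70, 81, 76, 79, 75, 76, 58], dtype=float)
damage = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
                   0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1])

model = sm.Logit(damage, sm.add_constant(temp)).fit()
print(model.params)    # expected to be approximately [15.297, -0.236]
```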

Challenger: Probability of failure estimate:

$$P(Y = 1) = \frac{e^{15.297 - 0.236X}}{1 + e^{15.297 - 0.236X}}$$

[Figure: predicted probability of O-ring damage versus launch temperature (0 to 120); since the temperature coefficient is negative, the probability falls from near 1 at low temperatures to near 0 at high temperatures.]

Classification table from SPSS

Accuracy Paradox. Consider an example of insurance fraud. Past data reveals that out of 1000 past claims, 950 are true claims and 50 are fraudulent claims. The classification table using a logistic regression model is given below:

Observed   Predicted 0   Predicted 1   % accuracy
0          900           50            94.73%
1          5             45            90.00%

The overall accuracy is 94.5%. Yet simply predicting "no fraud" for every claim would give 95% accuracy!

ODDS and ODDS RATIO. Odds: the ratio of two probability values, odds = π / (1 - π). Odds ratio: the ratio of two odds.

ODDS RATIO. Assume that X is an independent variable (covariate). The odds ratio, OR, is defined as the ratio of the odds for X = 1 to the odds for X = 0:

$$OR = \frac{\pi(1) / (1 - \pi(1))}{\pi(0) / (1 - \pi(0))}$$

Interpretation of Beta Coefficient in LR:

$$\ln\left(\frac{\pi(x)}{1 - \pi(x)}\right) = \beta_0 + \beta_1 x_1 \quad (1)$$

For x = 0: $\ln\left(\frac{\pi(0)}{1 - \pi(0)}\right) = \beta_0$ (2). For x = 1: $\ln\left(\frac{\pi(1)}{1 - \pi(1)}\right) = \beta_0 + \beta_1$ (3). Subtracting (2) from (3):

$$\beta_1 = \ln\left[\frac{\pi(1) / (1 - \pi(1))}{\pi(0) / (1 - \pi(0))}\right]$$

Interpretation of LR coefficients:

$$\beta_1 = \ln\left[\frac{\pi(x+1) / (1 - \pi(x+1))}{\pi(x) / (1 - \pi(x))}\right] \quad \text{(change in log odds ratio)}$$

$$e^{\beta_1} = \frac{\pi(x+1) / (1 - \pi(x+1))}{\pi(x) / (1 - \pi(x))} \quad \text{(change in odds ratio)}$$

Odds Ratio for Binary Logistic Regression:

$$OR = \frac{\pi(1) / (1 - \pi(1))}{\pi(0) / (1 - \pi(0))} = e^{\beta_1}$$

If OR = 2, then the event is twice as likely to occur when X = 1 compared to X = 0. The odds ratio approximates the relative risk.

Interpretation of LR coefficients. β1 is the change in the log-odds ratio for a unit change in the explanatory variable; equivalently, a unit change in the explanatory variable changes the odds by a factor of exp(β1).
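For the Challenger model, for example, the estimated coefficient of launch temperature is β1 = -0.236, so

$$e^{\hat\beta_1} = e^{-0.236} \approx 0.790,$$

meaning each one-degree increase in launch temperature multiplies the odds of O-ring damage by about 0.79 (roughly a 21% drop in the odds). This is exactly the Exp(B) value reported in the SPSS output above.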

Session Outline: Measuring the fitness of a Logistic Regression model. Testing individual regression parameters (Wald's test). Omnibus test for overall model fitness. Hosmer-Lemeshow goodness-of-fit test. R2 in Logistic Regression. Confidence intervals for parameters and probabilities.

Wald Test. The Wald test is used to check the significance of individual explanatory variables (similar to the t-statistic in linear regression). The Wald test statistic is given by:

$$W = \left(\frac{\hat\beta}{SE(\hat\beta)}\right)^2$$

W is a chi-square statistic.
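Applied to the Challenger output above:

$$W = \left(\frac{-0.236}{0.107}\right)^2 \approx 4.87,$$

which agrees with the reported Wald value of 4.832 up to rounding of the displayed coefficient and standard error.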

Wald test hypotheses. Null hypothesis H0: β = 0. Alternative hypothesis H1: β ≠ 0.

Wald Test - Challenger Data. The explanatory variable is significant, since the p-value is less than 0.05.

Wald Test - Challenger Data. For significant variables, the confidence interval for Exp(β) will not contain 1.

Model Chi-Square (Omnibus test): H0: β1 = β2 = ... = βk = 0. H1: Not all βs are zero.

Hosmer-Lemeshow Goodness-of-Fit Test. A test for the overall fitness of a binary logistic regression model (similar to the chi-square goodness-of-fit test). The observations are grouped into 10 groups based on their predicted probabilities.

Hosmer-Lemeshow Test Statistic. The Hosmer-Lemeshow test statistic is given by:

$$\hat{C} = \sum_{k=1}^{g} \frac{(O_k - n_k \bar\pi_k)^2}{n_k \bar\pi_k (1 - \bar\pi_k)}$$

where g = number of groups, n_k = number of observations in the k-th group, O_k = sum of the y values in the k-th group, and π̄_k = average predicted probability in the k-th group.
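A sketch of the statistic as a Python function; the degrees of freedom g - 2 follow the usual convention, and y and p_hat are placeholder arrays of observed 0/1 outcomes and predicted probabilities:

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y, p_hat, g=10):
    """Hosmer-Lemeshow statistic, grouping cases by predicted probability."""
    order = np.argsort(p_hat)
    C = 0.0
    for idx in np.array_split(order, g):        # g groups of (nearly) equal size
        n_k = len(idx)
        pi_k = p_hat[idx].mean()                # average predicted probability
        O_k = y[idx].sum()                      # observed number of 1s in group k
        C += (O_k - n_k * pi_k) ** 2 / (n_k * pi_k * (1 - pi_k))
    return C, chi2.sf(C, df=g - 2)              # statistic and its p-value
```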

H-L test for Challenger Data. Hosmer and Lemeshow Test (Step 1): Chi-square = 9.396, df = 8, Sig. = 0.310. Since p (= 0.310) is more than 0.05, we do not reject the null hypothesis that there is no difference between the predicted and observed frequencies (we accept the model).

Classification Table (Prediction/Classification):

Observed        Predicted 1 (Positive)     Predicted 0 (Negative)     Total
1 (Positive)    4  [True Positive, TP]     3  [False Negative, FN]    7
0 (Negative)    0  [False Positive, FP]    17 [True Negative, TN]     17

Sensitivity = TP / (TP + FN) = 4/7 = 57.1%
Specificity = TN / (TN + FP) = 17/17 = 100%

Sensitivity & Specificity. Sensitivity = (number of true positives) / (number of true positives + number of false negatives). Sensitivity is the probability that the predicted value of y is 1 given that the observed value is 1. Specificity = (number of true negatives) / (number of true negatives + number of false positives). Specificity is the probability that the predicted value of y is 0 given that the observed value is 0.
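These two definitions as a small Python helper, checked against the counts from the Challenger classification table above:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP/(TP+FN); Specificity = TN/(TN+FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Challenger classification table: TP=4, FN=3, TN=17, FP=0
sens, spec = sensitivity_specificity(tp=4, fn=3, tn=17, fp=0)
print(sens, spec)   # 0.571..., 1.0  (57.1% and 100%)
```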

Receiver Operating Characteristic (ROC) Curve. The ROC curve plots the true positive rate (correct positive classification) against the false positive rate (1 - specificity) and compares it with random classification. The higher the area under the ROC curve, the better the prediction ability.

Challenger Example: Sensitivity vs 1-Specificity (True positive vs False positive)

Cut-off Value   Sensitivity   Specificity   1-Specificity
0.05            1             0.235         0.765
0.1             0.857         0.412         0.588
0.2             0.857         0.529         0.471
0.3             0.571         0.706         0.294
0.4             0.571         0.941         0.059
0.5             0.571         1             0
0.6             0.571         1             0
0.7             0.429         1             0
0.8             0.429         1             0
0.9             0.143         1             0
0.95            0             1             0

[Figure: ROC curve for the Challenger example - sensitivity plotted against 1-specificity.]

ROC Curve

Area Under the ROC Curve. The area under the ROC curve is interpreted as the probability that the model will rank a randomly chosen positive higher than a randomly chosen negative. If n1 is the number of positives (1s) and n2 is the number of negatives (0s), the area under the ROC curve is the proportion of all n1 × n2 (positive, negative) pairs in which the positive case is assigned the higher predicted probability.
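This rank interpretation can be computed directly. A sketch that scores every (positive, negative) pair, counting ties as half, as is conventional; y and p_hat are placeholder arrays:

```python
import numpy as np

def auc_by_ranking(y, p_hat):
    """AUC = proportion of (positive, negative) pairs ranked correctly by p_hat."""
    pos = p_hat[y == 1]
    neg = p_hat[y == 0]
    wins = (pos[:, None] > neg[None, :]).sum()   # positive scored above negative
    ties = (pos[:, None] == neg[None, :]).sum()  # equal scores count as half
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```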

ROC Curve. General rule for acceptance of the model, based on the area under the ROC curve: ROC area = 0.5: no discrimination. 0.7 ≤ ROC area < 0.8: acceptable discrimination. 0.8 ≤ ROC area < 0.9: excellent discrimination. ROC area ≥ 0.9: outstanding discrimination.

Gini Coefficient. The Gini coefficient measures the individual impact of an explanatory variable: Gini coefficient = 2 × AUC - 1, where AUC = area under the ROC curve.

Optimal cut-off probabilities can be chosen using: Classification plots. Youden's Index. Cost-based optimization.

[Classification plot from SPSS (Step 1): a histogram of predicted probabilities from 0 to 1, with each case plotted as its observed group (0 or 1) and frequency on the vertical axis; observed 0s cluster at low predicted probabilities and observed 1s at higher ones.]

Youden's Index. Youden's index is a measure of diagnostic accuracy. It is calculated by deducting 1 from the sum of the test's sensitivity and specificity:

$$J(p) = \text{Sensitivity}(p) + \text{Specificity}(p) - 1$$

The cut-off p that maximizes J(p) can be found by a simple grid search, as sketched below.
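A sketch of that grid search; y and p_hat are placeholder arrays of observed 0/1 outcomes and predicted probabilities, and the candidate grid is an arbitrary choice:

```python
import numpy as np

def youden_optimal_cutoff(y, p_hat, cutoffs=np.linspace(0.05, 0.95, 19)):
    """Return the cut-off p that maximizes J(p) = sensitivity + specificity - 1."""
    best_p, best_j = None, -np.inf
    for p in cutoffs:
        pred = (p_hat >= p).astype(int)
        sens = ((pred == 1) & (y == 1)).sum() / max((y == 1).sum(), 1)
        spec = ((pred == 0) & (y == 0)).sum() / max((y == 0).sum(), 1)
        j = sens + spec - 1
        if j > best_j:
            best_p, best_j = p, j
    return best_p, best_j
```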

Cost-based Model for Optimal Cut-off

              Predicted 0    Predicted 1
Observed 0    P00            P01
Observed 1    P10            P11

R00 = cost of classifying 0 as 0
C01 = cost of classifying 0 as 1
C10 = cost of classifying 1 as 0
R11 = cost of classifying 1 as 1

Optimal cut-off: choose p to minimize P00·R00 + P01·C01 + P10·C10 + P11·R11. A sketch of this search follows.
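A sketch of the cost-based search; the cost values R00, C01, C10, R11 are illustrative placeholders (correct classifications are given zero cost here), as are the data arrays y and p_hat:

```python
import numpy as np

def cost_optimal_cutoff(y, p_hat, r00=0.0, c01=1.0, c10=5.0, r11=0.0,
                        cutoffs=np.linspace(0.05, 0.95, 19)):
    """Pick the cut-off minimizing P00*R00 + P01*C01 + P10*C10 + P11*R11."""
    n = len(y)
    best_p, best_cost = None, np.inf
    for p in cutoffs:
        pred = (p_hat >= p).astype(int)
        p00 = ((y == 0) & (pred == 0)).sum() / n   # 0 classified as 0
        p01 = ((y == 0) & (pred == 1)).sum() / n   # 0 classified as 1
        p10 = ((y == 1) & (pred == 0)).sum() / n   # 1 classified as 0
        p11 = ((y == 1) & (pred == 1)).sum() / n   # 1 classified as 1
        cost = p00 * r00 + p01 * c01 + p10 * c10 + p11 * r11
        if cost < best_cost:
            best_p, best_cost = p, cost
    return best_p, best_cost
```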