On the Inference of the Logistic Regression Model


1. Model

The response is binary, $y \in \{0,1\}$, with 1 representing true and 0 representing false. We model $P(y=1 \mid x) = p(x;\beta)$, and the linear form of $\operatorname{logit}(p(x;\beta))$ is entertained, i.e.

$$\ln \frac{p(x;\beta)}{1-p(x;\beta)} = \beta^T x ,$$

where $x = (x_0, x_1, \dots, x_m)^T$ with $x_0 = 1$, so that

$$p(x;\beta) = \frac{e^{\beta^T x}}{1 + e^{\beta^T x}} .$$

2. Inference

2.1. Maximum Likelihood Method (MLE)

Data: $D = (X, y)$, where $y = (y_1, \dots, y_n)^T$ is the vector of binary responses and $X$ is the $n \times (m+1)$ feature matrix whose first column is all 1's.

Log-likelihood function:

$$l(\beta) = \ln \prod_{i=1}^{n} p_i^{\,y_i} (1-p_i)^{1-y_i} = \sum_{i=1}^{n} \big[ y_i \ln p_i + (1-y_i) \ln (1-p_i) \big] , \qquad (1)$$

where $p_i = p(x_i;\beta)$ and $x_i^T$ is the $i$-th row of $X$.

First Derivative (Gradient):

$$\frac{\partial l}{\partial \beta} = \sum_{i=1}^{n} \Big( \frac{y_i}{p_i} - \frac{1-y_i}{1-p_i} \Big) \frac{\partial p_i}{\partial \beta} = \sum_{i=1}^{n} \frac{y_i - p_i}{p_i (1-p_i)} \frac{\partial p_i}{\partial \beta} .$$

Since

$$\frac{\partial p_i}{\partial \beta} = \frac{\partial}{\partial \beta} \frac{e^{\beta^T x_i}}{1 + e^{\beta^T x_i}} = \frac{e^{\beta^T x_i}}{\big(1 + e^{\beta^T x_i}\big)^2}\, x_i = p_i (1-p_i)\, x_i ,$$

we obtain

$$\frac{\partial l}{\partial \beta} = \sum_{i=1}^{n} (y_i - p_i)\, x_i = X^T (y - p) , \qquad (2)$$

where $p = (p_1, \dots, p_n)^T$.
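A minimal sketch, not part of the original note, that checks Eqns. (1) and (2) numerically on simulated data (the variables x1, x2 and the coefficients below are made up for illustration): the log-likelihood of Eqn. (1) should match logLik() of the glm() fit, and the gradient $X^T(y-p)$ should be approximately zero at the fitted MLE.

# Check Eqn (1) and Eqn (2) on simulated data
set.seed(1)
n <- 200
x1 <- rnorm(n); x2 <- rnorm(n)
X <- cbind(1, x1, x2)                                      # first column all 1's
y <- rbinom(n, 1, 1 / (1 + exp(-(-0.5 + 1.0 * x1 - 2.0 * x2))))
fit <- glm(y ~ x1 + x2, family = binomial)
b_hat <- coef(fit)
p_hat <- as.numeric(1 / (1 + exp(-X %*% b_hat)))
loglik <- sum(y * log(p_hat) + (1 - y) * log(1 - p_hat))   # Eqn (1)
print(c(loglik, as.numeric(logLik(fit))))                  # the two should agree
grad <- t(X) %*% (y - p_hat)                               # Eqn (2)
print(max(abs(grad)))                                      # approximately 0 at the MLE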

Second Derivative (Hessian):

$$\frac{\partial^2 l}{\partial \beta\, \partial \beta^T} = \frac{\partial}{\partial \beta^T} \sum_{i=1}^{n} (y_i - p_i)\, x_i = -\sum_{i=1}^{n} x_i \frac{\partial p_i}{\partial \beta^T} = -\sum_{i=1}^{n} p_i (1-p_i)\, x_i x_i^T = -X^T \Lambda X , \qquad (3)$$

where $\Lambda = \operatorname{diag}\big(p_1(1-p_1), \dots, p_n(1-p_n)\big)$. Every diagonal entry $p_i(1-p_i)$ is nonnegative, so $X^T \Lambda X$ is positive semidefinite, the Hessian is negative semidefinite, and $l(\beta)$ is therefore concave.
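A minimal follow-up sketch (again on simulated data, not the note's admissions data): form the Hessian $-X^T \Lambda X$ of Eqn. (3) at the fitted coefficients, confirm that its eigenvalues are negative (concavity), and note that $(X^T \Lambda X)^{-1}$ agrees with vcov() of the glm fit up to small numerical error.

# Check Eqn (3): Hessian, concavity, and the usual covariance estimate
set.seed(1)
n <- 200
x1 <- rnorm(n); x2 <- rnorm(n)
X <- cbind(1, x1, x2)
y <- rbinom(n, 1, 1 / (1 + exp(-(-0.5 + 1.0 * x1 - 2.0 * x2))))
fit <- glm(y ~ x1 + x2, family = binomial)
p_hat <- as.numeric(fitted(fit))
Lambda <- diag(p_hat * (1 - p_hat))
H <- -t(X) %*% Lambda %*% X                    # Eqn (3)
print(eigen(H)$values)                         # all negative => l(beta) is concave
print(max(abs(solve(-H) - vcov(fit))))         # near 0: matches the glm covariance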

Newton's Method: setting the first derivative of the quadratic approximation of $l$ at $\beta^{(k)}$ to zero,

$$\frac{\partial l}{\partial \beta}\bigg|_{\beta^{(k)}} + \frac{\partial^2 l}{\partial \beta\, \partial \beta^T}\bigg|_{\beta^{(k)}} \big(\beta^{(k+1)} - \beta^{(k)}\big) = 0 ,$$

which gives the update

$$\beta^{(k+1)} = \beta^{(k)} - \bigg[ \frac{\partial^2 l}{\partial \beta\, \partial \beta^T}\bigg|_{\beta^{(k)}} \bigg]^{-1} \frac{\partial l}{\partial \beta}\bigg|_{\beta^{(k)}} = \beta^{(k)} + \big(X^T \Lambda^{(k)} X\big)^{-1} X^T \big(y - p^{(k)}\big) . \qquad (4)$$

Note that both $\Lambda^{(k)}$ and $p^{(k)}$ are calculated using $\beta^{(k)}$.

Discussion 1: Let's push Eqn. (4) further.

$$\beta^{(k+1)} = \big(X^T \Lambda^{(k)} X\big)^{-1} \big[ X^T \Lambda^{(k)} X \beta^{(k)} + X^T \big(y - p^{(k)}\big) \big] = \big(X^T \Lambda^{(k)} X\big)^{-1} X^T \Lambda^{(k)} \big[ X \beta^{(k)} + \big(\Lambda^{(k)}\big)^{-1} \big(y - p^{(k)}\big) \big] = \big(X^T \Lambda^{(k)} X\big)^{-1} X^T \Lambda^{(k)} z^{(k)} , \qquad (5)$$

where $z^{(k)} = X \beta^{(k)} + (\Lambda^{(k)})^{-1} (y - p^{(k)})$. Eqn. (5) says that

$$\beta^{(k+1)} = \arg\min_{\beta} \big(z^{(k)} - X\beta\big)^T \Lambda^{(k)} \big(z^{(k)} - X\beta\big) ,$$

that is, iteratively, $\beta^{(k+1)}$ is the solution to a weighted least squares minimization problem.

Discussion 2: Let's check what effect there will be when $X$ has linearly dependent columns (linearly correlated variables). The main concern is the Hessian matrix $-X^T \Lambda X$. Note that

$$X^T \Lambda X = X^T \Lambda^{1/2} \Lambda^{1/2} X = \big(\Lambda^{1/2} X\big)^T \big(\Lambda^{1/2} X\big) ,$$

where $\Lambda^{1/2} = \operatorname{diag}\big(\sqrt{p_1(1-p_1)}, \dots, \sqrt{p_n(1-p_n)}\big)$. Considering the matrix's rank, and using the fact that $\Lambda^{1/2}$ has full rank, we have

$$\operatorname{rank}\big(X^T \Lambda X\big) = \operatorname{rank}\big((\Lambda^{1/2} X)^T (\Lambda^{1/2} X)\big) = \operatorname{rank}\big(\Lambda^{1/2} X\big) = \operatorname{rank}(X) ,$$

which implies that if $X$ is rank deficient due to collinearity of the variables, then $X^T \Lambda X$ is also rank deficient and hence not invertible. So collinearity of the variables not only affects the inference in linear regression, it also affects logistic regression.
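The weighted least squares view of Eqn. (5) can be checked directly: one Newton step computed from Eqn. (4) equals the weighted least squares solution with weights $\Lambda^{(k)}$ and working response $z^{(k)}$. A minimal sketch on simulated data (not part of the original note; the simulated x1, x2 are illustrative):

# One Newton/IRLS step computed two ways, starting from beta = 0
set.seed(1)
n <- 200
x1 <- rnorm(n); x2 <- rnorm(n)
X <- cbind(1, x1, x2)
y <- rbinom(n, 1, 1 / (1 + exp(-(-0.5 + 1.0 * x1 - 2.0 * x2))))
b0 <- rep(0, ncol(X))                          # current iterate beta^(k)
p  <- as.numeric(1 / (1 + exp(-X %*% b0)))
w  <- p * (1 - p)                              # diagonal of Lambda^(k)
# Eqn (4): Newton update
b_newton <- b0 + solve(t(X) %*% (w * X), t(X) %*% (y - p))
# Eqn (5): weighted least squares on the working response z^(k)
z <- as.numeric(X %*% b0 + (y - p) / w)
b_wls <- lm.wfit(X, z, w)$coefficients
print(cbind(b_newton, b_wls))                  # identical up to rounding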

R Example:

> ## Read and process the data:
> inputfile_path = 'G:/AnalyticsCompetitions/others/data/';
> inputfile_name01 = 'binary.csv';
> inputfile01 = paste(inputfile_path, inputfile_name01, sep='');
> traindata = read.csv(inputfile01, colClasses = "character");
> featurecolumns = names(traindata)[2:4];
> traindata[, featurecolumns] <- apply(traindata[, featurecolumns], 2, as.numeric);
> traindata$admit = as.numeric(traindata$admit);
> print(summary(traindata));
     admit             gre             gpa             rank      
 Min.   :0.0000   Min.   :220.0   Min.   :2.260   Min.   :1.000  
 1st Qu.:0.0000   1st Qu.:520.0   1st Qu.:3.130   1st Qu.:2.000  
 Median :0.0000   Median :580.0   Median :3.395   Median :2.000  
 Mean   :0.3175   Mean   :587.7   Mean   :3.390   Mean   :2.485  
 3rd Qu.:1.0000   3rd Qu.:660.0   3rd Qu.:3.670   3rd Qu.:3.000  
 Max.   :1.0000   Max.   :800.0   Max.   :4.000   Max.   :4.000  
> print(traindata[1:20,]);
   admit gre  gpa rank
1      0 380 3.61    3
2      1 660 3.67    3
3      1 800 4.00    1
4      1 640 3.19    4
5      0 520 2.93    4
6      1 760 3.00    2
7      1 560 2.98    1
8      0 400 3.08    2
9      1 540 3.39    3
10     0 700 3.92    2
11     0 800 4.00    4
12     0 440 3.22    1
13     1 760 4.00    1
14     0 700 3.08    2
15     1 700 4.00    1
16     0 480 3.44    3
17     0 780 3.87    4
18     0 360 2.56    3
19     0 800 3.75    2
20     1 540 3.81    1
> # Build dummy variables for the rank levels 2,3,4
> traindata[,'rank2'] <- 0;
> traindata[,'rank3'] <- 0;
> traindata[,'rank4'] <- 0;
> traindata$rank2[traindata$rank == 2] <- 1;
> traindata$rank3[traindata$rank == 3] <- 1;
> traindata$rank4[traindata$rank == 4] <- 1;
> featurecolumns = names(traindata)[c(2:3,5:7)];
> print(traindata[1:20,]);
   admit gre  gpa rank rank2 rank3 rank4

1      0 380 3.61    3     0     1     0
2      1 660 3.67    3     0     1     0
3      1 800 4.00    1     0     0     0
4      1 640 3.19    4     0     0     1
5      0 520 2.93    4     0     0     1
6      1 760 3.00    2     1     0     0
7      1 560 2.98    1     0     0     0
8      0 400 3.08    2     1     0     0
9      1 540 3.39    3     0     1     0
10     0 700 3.92    2     1     0     0
11     0 800 4.00    4     0     0     1
12     0 440 3.22    1     0     0     0
13     1 760 4.00    1     0     0     0
14     0 700 3.08    2     1     0     0
15     1 700 4.00    1     0     0     0
16     0 480 3.44    3     0     1     0
17     0 780 3.87    4     0     0     1
18     0 360 2.56    3     0     1     0
19     0 800 3.75    2     1     0     0
20     1 540 3.81    1     0     0     0
> ## Build simple logistic regression model using just GRE
> eqn = paste('admit', '~', 'gre', sep='');
> mylogistic <- glm(eqn, data = traindata, family='binomial');
> print(summary(mylogistic));

Call:
glm(formula = eqn, family = "binomial", data = traindata)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.1623  -0.9052  -0.7547   1.3486   1.9879  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -2.901344   0.606038  -4.787 1.69e-06 ***
gre          0.003582   0.000986   3.633  0.00028 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 499.98  on 399  degrees of freedom
Residual deviance: 486.06  on 398  degrees of freedom
AIC: 490.06

Number of Fisher Scoring iterations: 4

> 
> ## Build simple logistic regression model using just GPA
> eqn = paste('admit', '~', 'gpa', sep='');
> mylogistic <- glm(eqn, data = traindata, family='binomial');
> print(summary(mylogistic));

Call:
glm(formula = eqn, family = "binomial", data = traindata)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.1131  -0.8874  -0.7566   1.3305   1.9824  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -4.3576     1.0353  -4.209 2.57e-05 ***
gpa           1.0511     0.2989   3.517 0.000437 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 499.98  on 399  degrees of freedom
Residual deviance: 486.97  on 398  degrees of freedom
AIC: 490.97

Number of Fisher Scoring iterations: 4

> 
> ## Build logistic regression model using all the feature variables
> print("Use dummy variables for the rank levels 2,3,4");
[1] "Use dummy variables for the rank levels 2,3,4"
> eqn = paste('admit', '~', paste(featurecolumns,collapse='+'), sep='');
> mylogistic <- glm(eqn, data = traindata, family='binomial');
> print(summary(mylogistic));

Call:
glm(formula = eqn, family = "binomial", data = traindata)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.6268  -0.8662  -0.6388   1.1490   2.0790  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -3.989979   1.139951  -3.500 0.000465 ***
gre          0.002264   0.001094   2.070 0.038465 *  
gpa          0.804038   0.331819   2.423 0.015388 *  
rank2       -0.675443   0.316490  -2.134 0.032829 *  
rank3       -1.340204   0.345306  -3.881 0.000104 ***
rank4       -1.551464   0.417832  -3.713 0.000205 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 499.98  on 399  degrees of freedom
Residual deviance: 458.52  on 394  degrees of freedom
AIC: 470.52

Number of Fisher Scoring iterations: 4

> 
> ## Use the factor function to specify the variable rank as a categorical variable
> print("Use factor(rank)");
[1] "Use factor(rank)"
> traindata$rank = factor(traindata$rank);
> eqn = paste('admit', '~', paste(names(traindata)[2:4],collapse='+'), sep='');
> mylogistic <- glm(eqn, data = traindata, family='binomial');
> print(summary(mylogistic));

Call:
glm(formula = eqn, family = "binomial", data = traindata)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.6268  -0.8662  -0.6388   1.1490   2.0790  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -3.989979   1.139951  -3.500 0.000465 ***
gre          0.002264   0.001094   2.070 0.038465 *  
gpa          0.804038   0.331819   2.423 0.015388 *  
rank2       -0.675443   0.316490  -2.134 0.032829 *  
rank3       -1.340204   0.345306  -3.881 0.000104 ***
rank4       -1.551464   0.417832  -3.713 0.000205 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 499.98  on 399  degrees of freedom
Residual deviance: 458.52  on 394  degrees of freedom
AIC: 470.52

Number of Fisher Scoring iterations: 4

> # Confidence intervals
> confint(mylogistic);
Waiting for profiling to be done...
                    2.5 %        97.5 %
(Intercept) -6.2716202334  -1.792547080
gre          0.0001375921   0.004435874
gpa          0.1602959439   1.464142727
rank2       -1.3008888002  -0.056745722
rank3       -2.0276713127  -0.670372346
rank4       -2.4000265384  -0.753542605
> # Look at the correlation matrix of all the feature variables
> print(cor(traindata[,featurecolumns]));
              gre         gpa       rank2       rank3       rank4
gre    1.00000000  0.38426588  0.05620179 -0.07320022 -0.06823548
gpa    0.38426588  1.00000000 -0.05786732  0.07448976 -0.08442825
rank2  0.05620179 -0.05786732  1.00000000 -0.51283704 -0.34930442
rank3 -0.07320022  0.07448976 -0.51283704  1.00000000 -0.29539686
rank4 -0.06823548 -0.08442825 -0.34930442 -0.29539686  1.00000000

The following function implements the Newton iteration of Eqn. (4), $\beta^{(k+1)} = \beta^{(k)} + (X^T \Lambda^{(k)} X)^{-1} X^T (y - p^{(k)})$:

##
# Input:  x - feature matrix
#         n - number of rows (data points)
#         m - number of columns of x (features, including the intercept column)
#         y - target column
# Output: b - regression parameters (m of them, including the intercept)
newtonlogistic <- function(x, n, m, y) {
  b = matrix(rep(0, m), m, 1);
  b_old = b;
  print('Newton Iterations:');
  print(as.numeric(b));
  for (k in 1:10) {
    f = as.numeric(x %*% b_old);   # n-by-1 matrix to an R vector
    f = exp(f);
    p = f / (1 + f);
    lam = diag(p * (1 - p));       # Diagonal matrix Lambda
    p = matrix(p, n, 1);           # Column vector
    b = solve(t(x) %*% lam %*% x) %*% t(x) %*% (y - p) + b_old;   # Eqn (4)
    print(as.numeric(b));
    if (sum(abs(b - b_old)) < 1.0e-4) break;
    b_old = b;
  }
  return(b);
}
##
> # Perform Newton's iterations for inferring the model parameters
> n = dim(traindata)[1];
> x = as.matrix(traindata[,featurecolumns]);
> x = cbind(matrix(rep(1,n)), x);       # Feature matrix with first column as 1
> y = matrix(traindata$admit, n, 1);    # Target (binary) vector
> m = length(featurecolumns) + 1;       # Number of columns in x
> b = newtonlogistic(x,n,m,y);
[1] "Newton Iterations:"
[1] 0 0 0 0 0 0
[1] -3.035640842  0.001718288  0.622140099 -0.649461398 -1.162281919 -1.292105462
[1] -3.913973614  0.002218669  0.789983846 -0.674212541 -1.327645335 -1.527241669
[1] -3.989459770  0.002264096  0.803944664 -0.675435206 -1.340127398 -1.551235371

[1] -3.989979046  0.002264426  0.804037545 -0.675442928 -1.340203913 -1.551463658
[1] -3.989979073  0.002264426  0.804037549 -0.675442928 -1.340203916 -1.551463677
> print('Parameters inferred by Newton method:');
[1] "Parameters inferred by Newton method:"
> print(b);
              [,1]
      -3.989979073
gre    0.002264426
gpa    0.804037549
rank2 -0.675442928
rank3 -1.340203916
rank4 -1.551463677

Note: the interpretation of the inferred model includes:
1) For every one unit change in gre, the log odds of admission (versus non-admission) increases by 0.002.
2) For a one unit increase in gpa, the log odds of being admitted to graduate school increases by 0.804.
3) The indicator variables for rank have a slightly different interpretation. For example, having attended an undergraduate institution with a rank of 2, versus an institution with a rank of 1, changes the log odds of admission by -0.675.

For 3), consider
r = log(P(admit = 1) / (1 - P(admit = 1))) = b0 + b1*gre + b2*gpa + b3*rank2 + b4*rank3 + b5*rank4
b0      = E(r | gre = 0, gpa = 0, rank2 = 0, rank3 = 0, rank4 = 0)
b0 + b3 = E(r | gre = 0, gpa = 0, rank2 = 1, rank3 = 0, rank4 = 0)
b3 = E(r | gre = 0, gpa = 0, rank2 = 1, rank3 = 0, rank4 = 0) - E(r | gre = 0, gpa = 0, rank2 = 0, rank3 = 0, rank4 = 0)
(rank2 = 0, rank3 = 0, rank4 = 0) corresponds to the rank value 1
(rank2 = 1, rank3 = 0, rank4 = 0) corresponds to the rank value 2
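The same coefficients can be read on the odds scale by exponentiating: for example, exp(0.804) is about 2.23, so a one unit increase in gpa multiplies the odds of admission by roughly 2.2. A short follow-up (not in the original session), using the fitted mylogistic object from above:

# Odds ratios: exponentiate the coefficients and their confidence limits
print(exp(coef(mylogistic)))      # odds ratios; e.g. exp(0.804) ~ 2.23 for gpa
print(exp(confint(mylogistic)))   # odds-ratio confidence intervals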

> # Add a column that is highly linearly correlated with gre
> gre_mean = mean(traindata$gre);
> gre_sd = sd(traindata$gre);
> traindata[,'gre1'] <- traindata$gre + rnorm(n, gre_mean, 0.2*gre_sd);
> featurecolumns = c(featurecolumns, 'gre1');
> print(cor(traindata[,featurecolumns]));
              gre         gpa       rank2       rank3       rank4        gre1
gre    1.00000000  0.38426588  0.05620179 -0.07320022 -0.06823548  0.97973518
gpa    0.38426588  1.00000000 -0.05786732  0.07448976 -0.08442825  0.37316586
rank2  0.05620179 -0.05786732  1.00000000 -0.51283704 -0.34930442  0.05598413
rank3 -0.07320022  0.07448976 -0.51283704  1.00000000 -0.29539686 -0.06637129
rank4 -0.06823548 -0.08442825 -0.34930442 -0.29539686  1.00000000 -0.06395016
gre1   0.97973518  0.37316586  0.05598413 -0.06637129 -0.06395016  1.00000000
> 
> ## Build logistic regression model using all the feature variables (with the newly added gre1)
> eqn = paste('admit', '~', paste(featurecolumns,collapse='+'), sep='');
> mylogistic <- glm(eqn, data = traindata, family='binomial');
> print(summary(mylogistic));

Call:
glm(formula = eqn, family = "binomial", data = traindata)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.6568  -0.8713  -0.6454   1.0826   2.1229  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.096485   3.041532  -0.032 0.974693    
gre          0.008958   0.004992   1.795 0.072717 .     -> significance affected
gpa          0.804279   0.333718   2.410 0.015950 *  
rank2       -0.657404   0.316921  -2.074 0.038047 *  
rank3       -1.322875   0.345867  -3.825 0.000131 ***
rank4       -1.530626   0.418321  -3.659 0.000253 ***
gre1        -0.006682   0.004854  -1.377 0.168627       -> significance affected
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 499.98  on 399  degrees of freedom
Residual deviance: 456.60  on 393  degrees of freedom
AIC: 470.6

Number of Fisher Scoring iterations: 4

> 
> # Newton's iterations
> n = dim(traindata)[1];
> x = as.matrix(traindata[,featurecolumns]);
> x = cbind(matrix(rep(1,n)), x);
> y = matrix(traindata$admit, n, 1);
> m = length(featurecolumns) + 1;
> b = newtonlogistic(x,n,m,y);
[1] "Newton Iterations:"
[1] 0 0 0 0 0 0 0
[1] -0.051711928  0.006798878  0.616198019 -0.633483074 -1.141445649 -1.272865437 -0.005078346
[1] -0.091357472  0.008778534  0.788932144 -0.656205150 -1.309058459 -1.505806919 -0.006550945
[1] -0.096436935  0.008956801  0.804167308 -0.657395835 -1.322779777 -1.530377433 -0.006681483
[1] -0.096485412  0.008958147  0.804279270 -0.657404060 -1.322875265 -1.530625937 -0.006682452
[1] -0.096485415  0.008958147  0.804279277 -0.657404060 -1.322875270 -1.530625960 -0.006682452
> print('Parameters inferred by Newton method:');
[1] "Parameters inferred by Newton method:"
> print(b);
              [,1]
      -0.096485415
gre    0.008958147
gpa    0.804279277
rank2 -0.657404060
rank3 -1.322875270
rank4 -1.530625960
gre1  -0.006682452
> 
> # Look at vcov(mylogistic)
> print('vcov:');
[1] "vcov:"
> print(cov2cor(vcov(mylogistic)));
            (Intercept)         gre         gpa       rank2       rank3       rank4        gre1
(Intercept)  1.00000000  0.88418841 -0.29679332 -0.05795453 -0.02663107 -0.05307074 -0.92645019
gre          0.88418841  1.00000000 -0.06919181  0.03064084  0.03092771  0.02516451 -0.97560169
gpa         -0.29679332 -0.06919181  1.00000000  0.04322198 -0.08233028  0.03017520 -0.00640638
rank2       -0.05795453  0.03064084  0.04322198  1.00000000  0.63513021  0.52967305 -0.03232097
rank3       -0.02663107  0.03092771 -0.08233028  0.63513021  1.00000000  0.48191425 -0.02079969
rank4       -0.05307074  0.02516451  0.03017520  0.52967305  0.48191425  1.00000000 -0.02018764
gre1        -0.92645019 -0.97560169 -0.00640638 -0.03232097 -0.02079969 -0.02018764  1.00000000
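A small coda to Discussion 2 (not in the original session): gre1 is only nearly collinear with gre, so the Newton step above still computes, though with inflated standard errors and coefficient correlations close to ±1. With an exactly collinear column (the hypothetical gre2 below), the design matrix is rank deficient, hence X^T Lambda X is singular (recall rank(X^T Lambda X) = rank(X)) and the Newton step cannot be computed at all.

# Exact collinearity: rank deficiency makes the cross-product non-invertible
x_sing <- cbind(x, gre2 = 2 * x[, 'gre'])                         # gre2 is exactly 2 * gre
cat('columns:', ncol(x_sing), ' rank:', qr(x_sing)$rank, '\n')    # rank < number of columns
print(class(try(solve(t(x_sing) %*% x_sing), silent = TRUE)))     # "try-error": singular matrix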

2.2. Maximum Accuracy Method (Minimum Error Method)

Given the ground truths $y_1, \dots, y_n$ and the predicted probabilities $p_1, \dots, p_n$ (with $p_i = p(x_i;\beta)$), the expected accuracy, or hit rate, is

$$\mathrm{Acc}(\beta) = \frac{h}{n} = \frac{1}{n} \sum_{i=1}^{n} \big[ y_i\, p_i + (1 - y_i)(1 - p_i) \big] , \qquad (6)$$

where the summation $h$ is the (expected) total number of hits. Let's look at the first derivative:

$$\frac{\partial \mathrm{Acc}}{\partial \beta} = \frac{1}{n} \sum_{i=1}^{n} (2 y_i - 1) \frac{\partial p_i}{\partial \beta} = \frac{1}{n} \sum_{i=1}^{n} (2 y_i - 1)\, p_i (1 - p_i)\, x_i , \qquad (7)$$

which does not look as nice as Eqn. (2). Equivalently, we can consider the error rate,

$$\mathrm{Err}(\beta) = 1 - \frac{1}{n} \sum_{i=1}^{n} \big[ y_i\, p_i + (1 - y_i)(1 - p_i) \big] = \frac{1}{n} \sum_{i=1}^{n} \big[ y_i (1 - p_i) + (1 - y_i)\, p_i \big] , \qquad (8)$$

where the summation is the (expected) total number of misses. The first derivative is

$$\frac{\partial \mathrm{Err}}{\partial \beta} = \frac{1}{n} \sum_{i=1}^{n} (1 - 2 y_i)\, p_i (1 - p_i)\, x_i . \qquad (9)$$

2.3. Minimum Cost Method

Although Eqn. (7) and Eqn. (9) do not look as nice as Eqn. (2), we can easily generalize them to fit more scenarios. One scenario is that the proportion of points with $y = 1$ is very small but the cost of misclassifying any point with $y = 1$ is very high. Look at the following cost matrix, in which $c_1$ is the cost of a false negative (predicting 0 when the ground truth is 1) and $c_0$ is the cost of a false positive:

                          Ground truth
                        y = 0      y = 1
Prediction   0            0         c_1
             1           c_0         0

The minimum error method can then be generalized into the minimum cost method:

$$C(\beta) = \frac{1}{n} \sum_{i=1}^{n} \big[ c_1\, y_i (1 - p_i) + c_0 (1 - y_i)\, p_i \big] . \qquad (10)$$

The first derivative is

$$\frac{\partial C}{\partial \beta} = \frac{1}{n} \sum_{i=1}^{n} \big( c_0 (1 - y_i) - c_1 y_i \big)\, p_i (1 - p_i)\, x_i . \qquad (11)$$

If $n$ is very large, a usual approach is to resort to stochastic gradient descent using Eqn. (11), that is,

$$\beta^{(k+1)} = \beta^{(k)} - \eta\, \big( c_0 (1 - y_i) - c_1 y_i \big)\, p_i (1 - p_i)\, x_i ,$$

updating with one (randomly chosen) data point $i$ at a time, where $\eta$ is the learning rate.

An interesting question is: is $C(\beta)$ convex? This is a tricky question. Let's look at the second derivative:

$$\frac{\partial^2 C}{\partial \beta\, \partial \beta^T} = \frac{1}{n} \sum_{i=1}^{n} \big( c_0 (1 - y_i) - c_1 y_i \big) \frac{\partial}{\partial \beta^T} \big[ p_i (1 - p_i)\, x_i \big] = \frac{1}{n} \sum_{i=1}^{n} \big( c_0 (1 - y_i) - c_1 y_i \big) (1 - 2 p_i)\, p_i (1 - p_i)\, x_i x_i^T . \qquad (12)$$

Splitting the sum by the value of $y_i$,

$$\frac{\partial^2 C}{\partial \beta\, \partial \beta^T} = \frac{1}{n} \sum_{i:\, y_i = 0} c_0\, (1 - 2 p_i)\, p_i (1 - p_i)\, x_i x_i^T \;+\; \frac{1}{n} \sum_{i:\, y_i = 1} c_1\, (2 p_i - 1)\, p_i (1 - p_i)\, x_i x_i^T .$$

Therefore, we can see that when $p_i < 0.5$ for every point with $y_i = 0$ and $p_i > 0.5$ for every point with $y_i = 1$, the Hessian matrix is positive semidefinite. That is, having $p_i > 0.5$ on the positive cases, i.e. a true positive rate above 0.5, is an important condition.
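For illustration, a minimal sketch (not from the original note) of the stochastic gradient descent update based on Eqn. (11), minimizing the cost of Eqn. (10) on simulated data with rare positives; the costs c0 and c1, the learning rate eta, and the iteration count are arbitrary illustrative choices.

# SGD on the expected misclassification cost, Eqns (10)-(11)
set.seed(2)
n <- 2000
x1 <- rnorm(n); x2 <- rnorm(n)
X <- cbind(1, x1, x2)
y <- rbinom(n, 1, 1 / (1 + exp(-(-3 + 1.5 * x1 + x2))))    # rare positives
c0 <- 1       # cost of a false positive (a y = 0 point misclassified)
c1 <- 20      # cost of a false negative (a y = 1 point misclassified)
eta <- 0.05   # learning rate
b <- rep(0, ncol(X))
for (k in 1:20000) {
  i <- sample(n, 1)                                        # pick one data point
  p_i <- 1 / (1 + exp(-sum(X[i, ] * b)))
  g_i <- (c0 * (1 - y[i]) - c1 * y[i]) * p_i * (1 - p_i) * X[i, ]   # one term of Eqn (11)
  b <- b - eta * g_i                                       # SGD step
}
p <- as.numeric(1 / (1 + exp(-X %*% b)))
print(b)                                                   # fitted parameters
print(mean(c1 * y * (1 - p) + c0 * (1 - y) * p))           # cost of Eqn (10)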