Kernel Logistic Regression and the Import Vector Machine
Ji Zhu and Trevor Hastie
Journal of Computational and Graphical Statistics, 2005
Presented by Mingtao Ding, Duke University, December 8, 2011
Summary

The authors propose a new approach to classification, called the import vector machine (IVM).
- It provides estimates of the class probabilities, which are often more useful than the classifications alone.
- It generalizes naturally to M-class classification through kernel logistic regression (KLR).
Problem and Objective

Supervised learning problem: a set of training data \{(x_i, y_i)\}_{i=1}^{n}, where x_i \in \mathbb{R}^p is an input vector and y_i (dependent on x_i) is a univariate output: continuous for the regression problem, binary for the classification problem.

Objective: learn a predictive function f(x) from the training data by solving

  \min_{f \in \mathcal{F}} \left\{ \frac{1}{n} \sum_{i=1}^{n} L(y_i, f(x_i)) + \frac{\lambda}{2} \Phi\left(\|f\|_{\mathcal{F}}\right) \right\}.

In this presentation, \mathcal{F} is assumed to be a reproducing kernel Hilbert space (RKHS) \mathcal{H}_K.
SVM and KLR

The standard SVM can be fitted via Loss + Regularization:

  \min_{f \in \mathcal{H}_K} \left\{ \frac{1}{n} \sum_{i=1}^{n} [1 - y_i f(x_i)]_+ + \frac{\lambda}{2} \|f\|_{\mathcal{H}_K}^2 \right\}.

Under very general conditions, the solution has the form

  f(x) = \sum_{i=1}^{n} a_i K(x, x_i).

Example conditions: an arbitrary loss L\left((x_1, y_1, f(x_1)), \ldots, (x_n, y_n, f(x_n))\right) and a strictly monotonically increasing function \Phi.

Points with y_i f(x_i) > 1 have no influence on the loss function; as a consequence, a sizeable fraction of the n values a_i are often zero. The points corresponding to nonzero a_i are called support points, as illustrated in the sketch below.
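A minimal sketch of this sparsity (using scikit-learn's SVC on synthetic data, which is not part of the paper): it fits a kernel SVM and counts how many training points end up as support points, i.e. points with nonzero a_i.

```python
# Illustrative only: fit a kernel SVM and count the support points.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
# noisy linear rule so the classes overlap a little
y = np.where(X[:, 0] + X[:, 1] + 0.5 * rng.normal(size=200) > 0, 1, -1)

clf = SVC(kernel="rbf", C=1.0).fit(X, y)
print(f"{len(clf.support_)} of {len(X)} training points are support points")
```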
SVM and KLR

The hinge loss (1 - yf)_+ is plotted in Figure 1, along with the negative log-likelihood (NLL) of the binomial distribution of y over \{-1, 1\}:

  \mathrm{NLL} = \ln\left(1 + e^{-yf}\right) =
  \begin{cases}
    -\ln p, & \text{if } y = 1, \\
    -\ln(1 - p), & \text{if } y = -1,
  \end{cases}

where p \equiv P(Y = 1 \mid X = x) = \frac{1}{1 + e^{-f}}.
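A small numeric sketch of the two losses (the grid and names are illustrative): it tabulates the hinge loss (1 - yf)_+ against the binomial NLL \ln(1 + e^{-yf}) over a range of margins yf, showing their similar shapes.

```python
# Compare the SVM hinge loss and the binomial NLL over a margin grid.
import numpy as np

yf = np.linspace(-3, 3, 13)           # margin values y * f(x)
hinge = np.maximum(0.0, 1.0 - yf)     # SVM hinge loss (1 - yf)_+
nll = np.log1p(np.exp(-yf))           # binomial negative log-likelihood

for m, h, l in zip(yf, hinge, nll):
    print(f"yf = {m:+.2f}   hinge = {h:.3f}   NLL = {l:.3f}")
```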
SVM and KLR

The SVM only estimates \mathrm{sign}[p(x) - 1/2] (by computing the distance from x to the separating hyperplane), without providing the class probability p(x) itself. The NLL of y has a shape similar to that of the SVM hinge loss.

If we instead let y \in \{0, 1\}, then

  \mathrm{NLL} = -\left(y \ln p + (1 - y)\ln(1 - p)\right) = -\left(yf - \ln\left(1 + e^{f}\right)\right).

This is the loss function of classical KLR.
SVM and KLR

If we replace (1 - yf)_+ with \ln\left(1 + e^{-yf}\right), the SVM becomes a KLR problem with the objective function

  \min_{f \in \mathcal{H}_K} \left\{ \frac{1}{n} \sum_{i=1}^{n} \ln\left(1 + e^{-y_i f(x_i)}\right) + \frac{\lambda}{2}\|f\|_{\mathcal{H}_K}^2 \right\}.

Advantages:
- Offers a natural estimate of the class probability p(x).
- Generalizes naturally to the M-class case through kernel multi-logit regression.

Disadvantage:
- In the KLR solution f(x) = \sum_{i=1}^{n} a_i K(x, x_i), all of the a_i are nonzero, so no sparsity is obtained. (A sketch of the objective follows.)
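As a concrete rendering of this criterion, here is a short sketch (function and parameter names are mine, not the paper's) that evaluates the regularized KLR objective for a coefficient vector a under an assumed RBF kernel.

```python
# Evaluate (1/n) sum_i ln(1 + exp(-y_i f(x_i))) + (lambda/2) a^T K a
# for f(x) = sum_j a_j K(x, x_j).
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    # K[i, j] = exp(-gamma * ||X_i - Z_j||^2)
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def klr_objective(K, a, y, lam):
    # y in {-1, +1}; f = K a are the fitted values at the training points
    f = K @ a
    return np.mean(np.log1p(np.exp(-y * f))) + 0.5 * lam * a @ K @ a
```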
SVM and KLR

[Figure 1: the hinge loss (1 - yf)_+ and the binomial NLL \ln(1 + e^{-yf}) plotted against the margin yf.]
KLR as a Margin Maximizer

Suppose the basis functions of the transformed feature space h(x) are rich enough that the hyperplane f(x) = h(x)^T \beta + \beta_0 = 0 can separate the training data.

Theorem 1. Denote by \hat{\beta}(\lambda) the solution to the KLR problem

  \min_{f \in \mathcal{H}_K} \left\{ \frac{1}{n} \sum_{i=1}^{n} \ln\left(1 + e^{-y_i f(x_i)}\right) + \frac{\lambda}{2}\|f\|_{\mathcal{H}_K}^2 \right\}.

Then \lim_{\lambda \to 0} \hat{\beta}(\lambda)/\|\hat{\beta}(\lambda)\| = \beta^*/\|\beta^*\|, where \beta^* is the margin-maximizing SVM solution.
Import Vector Machine

The objective function of KLR can be written as

  H = \frac{1}{n}\, \mathbf{1}^{T} \ln\left(1 + e^{-y \circ (K_1 a)}\right) + \frac{\lambda}{2}\, a^{T} K_2 a,

where \circ denotes the elementwise product. To find a, we set the derivative of H with respect to a equal to 0 and use Newton's method to iteratively solve the score equation. Recoding y to \{0, 1\}, with p = 1/(1 + e^{-K_1 a}) and W = \mathrm{diag}\left(p_i(1 - p_i)\right), the Newton update can be written as the iteratively reweighted least squares (IRLS) step

  a^{\mathrm{new}} = \left(K_1^{T} W K_1 + n\lambda K_2\right)^{-1} K_1^{T} W z, \qquad z = K_1 a^{\mathrm{old}} + W^{-1}(y - p).
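The following is a minimal sketch of this IRLS step (y recoded to \{0, 1\}; all names are illustrative). K1 is the n x q matrix K(x_i, x_j) over the points kept in the model and K2 the matching q x q block, so q = n recovers full KLR and q < n covers the IVM sub-models introduced next.

```python
# IRLS/Newton iteration for the regularized KLR objective above.
import numpy as np

def klr_newton(K1, K2, y01, lam, n_iter=20):
    n, q = K1.shape
    a = np.zeros(q)
    for _ in range(n_iter):
        f = K1 @ a
        p = 1.0 / (1.0 + np.exp(-f))           # fitted P(Y = 1 | x_i)
        W = np.maximum(p * (1.0 - p), 1e-10)   # IRLS weights
        z = f + (y01 - p) / W                  # working response
        A = K1.T @ (W[:, None] * K1) + n * lam * K2
        A += 1e-8 * np.eye(q)                  # jitter for numerical stability
        a = np.linalg.solve(A, K1.T @ (W * z))
    return a
```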
Import Vector Machine

The computational cost of full KLR is O(n^3). To reduce this cost, the IVM algorithm finds a sub-model that approximates the full model given by KLR. The sub-model has the form

  f(x) = \sum_{x_i \in S} a_i K(x, x_i),

where S is a subset of the training data; the points in S are called import points.
Import Vector Machine

Algorithm 1 (basic IVM): a greedy forward selection of import points. Starting from an empty import set S, Step (2) tries adding each remaining training point to S, refits the sub-model, and retains the point that most decreases the objective H; this repeats until H converges. (A sketch follows.)
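A sketch of this greedy selection as I read Algorithm 1 (reusing klr_newton from the earlier sketch; helper names, defaults, and the \Delta k handling are mine).

```python
# Greedy forward selection of import points.
import numpy as np

def H_obj(K1, K2, a, y01, lam):
    # regularized binomial NLL with y in {0, 1}
    f = K1 @ a
    return np.mean(np.log1p(np.exp(f)) - y01 * f) + 0.5 * lam * a @ K2 @ a

def ivm_select(K, y01, lam, eps=1e-3, max_import=50, dk=1):
    n = K.shape[0]
    S, R, H_hist = [], list(range(n)), []
    while R and len(S) < max_import:
        best = None
        for l in R:                    # try each remaining candidate point
            cols = S + [l]
            a = klr_newton(K[:, cols], K[np.ix_(cols, cols)], y01, lam)
            H = H_obj(K[:, cols], K[np.ix_(cols, cols)], a, y01, lam)
            if best is None or H < best[0]:
                best = (H, l)
        H_k, l_star = best
        S.append(l_star); R.remove(l_star); H_hist.append(H_k)
        # stopping rule from the next slide: |H_k - H_{k-dk}| / |H_k| < eps
        if len(H_hist) > dk and abs(H_hist[-1] - H_hist[-1 - dk]) / abs(H_hist[-1]) < eps:
            break
    return S
```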
Import Vector Machine

- The algorithm can be accelerated by revising Step (2) so that the sub-model is not refitted from scratch for every candidate point.
- Stopping rule for adding points to S: run the algorithm until the relative change |H_k - H_{k - \Delta k}| / |H_k| < \epsilon, for a pre-chosen small integer \Delta k.
- Choosing the regularization parameter \lambda: split the data into a training set and a tuning set, and use the misclassification error on the tuning set as the criterion for choosing \lambda (Algorithm 3).
Import Vector Machine

Algorithm 3 (choosing \lambda): start with a large regularization parameter \lambda, then decrease it along a grid, running the IVM at each value; select the \lambda with the smallest misclassification error on the tuning set. (See the sketch below.)
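One plausible rendering of this tuning loop (my reading of Algorithm 3, reusing the ivm_select and klr_newton sketches above; not the paper's verbatim steps). K_tr is the train x train kernel, K_tune the tune x train kernel, and labels are in \{0, 1\}.

```python
# Choose lambda by tuning-set misclassification error.
import numpy as np

def choose_lambda(K_tr, K_tune, y_tr, y_tune, lambdas):
    best_lam, best_err = None, np.inf
    for lam in sorted(lambdas, reverse=True):   # start with a large lambda
        S = ivm_select(K_tr, y_tr, lam)
        a = klr_newton(K_tr[:, S], K_tr[np.ix_(S, S)], y_tr, lam)
        p = 1.0 / (1.0 + np.exp(-(K_tune[:, S] @ a)))
        err = np.mean((p > 0.5).astype(int) != y_tune)  # tuning-set error
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam
```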
Simulation Results

[Figure slides; plots not recovered in extraction.]
Real Data Results

[Figure/table slides; results not recovered in extraction.]
Generalization to M-Class Case

As in kernel multi-logit regression, we define the class probabilities as

  p_c(x) = \frac{e^{f_c(x)}}{\sum_{c'=1}^{C} e^{f_{c'}(x)}}, \qquad c = 1, \ldots, C,

with the identifiability constraint \sum_{c=1}^{C} f_c(x) = 0, as sketched below.
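A small sketch of these probabilities (illustrative names): the sum-to-zero constraint is imposed by centering each row of f values before the softmax.

```python
# M-class (softmax) probabilities under the sum-to-zero constraint.
import numpy as np

def class_probs(F):
    # F: n x C matrix with F[i, c] = f_c(x_i)
    F = F - F.mean(axis=1, keepdims=True)          # enforce sum_c f_c = 0
    E = np.exp(F - F.max(axis=1, keepdims=True))   # numerically stable exp
    return E / E.sum(axis=1, keepdims=True)        # p_c(x_i)
```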
Generalization to M-Class Case

The M-class KLR fits a model minimizing the regularized NLL of the multinomial distribution. The approximate solution can be obtained by an M-class IVM procedure analogous to the two-class case.
Conclusion

- The IVM not only performs as well as the SVM in two-class classification, but also generalizes naturally to the M-class case.
- Computational cost: KLR is O(n^3); SVM is O(n^2 n_s); IVM is O(n^2 n_I^2) for two classes and O(C n^2 n_I^2) for C classes (n_s = number of support points, n_I = number of import points).
- Through its KLR loss, the IVM inherits the limiting optimal-margin property of Theorem 1.