Kernel Logistic Regression and the Import Vector Machine


Kernel Logistic Regression and the Import Vector Machine
Ji Zhu and Trevor Hastie, Journal of Computational and Graphical Statistics, 2005
Presented by Mingtao Ding, Duke University, December 8, 2011

Summary

The authors propose a new approach to classification, called the import vector machine (IVM). It provides estimates of the class probabilities, which are often more useful than the hard classifications alone, and it generalizes naturally to M-class classification through kernel logistic regression (KLR).

Problem and Objective

Supervised learning problem: a set of training data $\{(x_i, y_i)\}_{i=1}^n$, where $x_i \in \mathbb{R}^p$ is an input vector and $y_i$ (dependent on $x_i$) is a univariate output: continuous for the regression problem, binary for the classification problem.

Objective: learn a predictive function $f(x)$ from the training data by solving

$$\min_{f \in \mathcal{F}} \left\{ \frac{1}{n} \sum_{i=1}^{n} L\big(y_i, f(x_i)\big) + \frac{\lambda}{2} \, \Phi\big(\|f\|_{\mathcal{F}}\big) \right\}.$$

In this presentation, $\mathcal{F}$ is assumed to be a reproducing kernel Hilbert space (RKHS) $\mathcal{H}_K$.

SVM and KLR

The standard SVM can be fitted via Loss + Regularization:

$$\min_{f \in \mathcal{H}_K} \left\{ \frac{1}{n} \sum_{i=1}^{n} \big[1 - y_i f(x_i)\big]_+ + \frac{\lambda}{2} \|f\|_{\mathcal{H}_K}^2 \right\}.$$

Under very general conditions, the solution has the form

$$f(x) = \sum_{i=1}^{n} a_i K(x, x_i).$$

Example conditions: an arbitrary loss $L\big((x_1, y_1, f(x_1)), \ldots, (x_n, y_n, f(x_n))\big)$ and a strictly monotonically increasing function $\Phi$.

Points with $y_i f(x_i) > 1$ have no influence on the loss function. As a consequence, it often happens that a sizeable fraction of the $n$ values $a_i$ is zero. The points corresponding to nonzero $a_i$ are called support points.
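
As a hedged illustration of this sparsity (my addition, not part of the slides), one can fit a kernel SVM with scikit-learn and count how few of the $n$ dual coefficients $a_i$ end up nonzero; the toy data, kernel, and parameter values below are arbitrary placeholders:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                       # toy inputs
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)   # nonlinear labels

clf = SVC(kernel="rbf", C=1.0).fit(X, y)            # hinge loss + RBF kernel
# Only the support points carry nonzero a_i in f(x) = sum_i a_i K(x, x_i).
print(f"{len(clf.support_)} of {len(X)} training points are support points")
```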

SVM and KLR

The hinge loss $(1 - yf)_+$ is plotted in Fig. 1 [not reproduced here], along with the negative log-likelihood (NLL) of the binomial distribution of $y$ over $\{-1, 1\}$:

$$\mathrm{NLL} = \ln\big(1 + e^{-yf}\big) = \begin{cases} -\ln p, & \text{if } y = 1, \\ -\ln(1 - p), & \text{if } y = -1, \end{cases}$$

where $p \equiv P(Y = 1 \mid X = x) = \dfrac{1}{1 + e^{-f}}$.
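
A quick numerical check (my addition, standing in for the missing Fig. 1) of how similarly the two losses behave as functions of the margin $yf$:

```python
import numpy as np

yf = np.linspace(-2, 2, 5)                 # margins y * f(x)
hinge = np.maximum(0.0, 1.0 - yf)          # SVM hinge loss (1 - yf)_+
nll = np.log1p(np.exp(-yf))                # binomial NLL, ln(1 + e^{-yf})
for m, h, l in zip(yf, hinge, nll):
    print(f"yf = {m:+.1f}   hinge = {h:.3f}   NLL = {l:.3f}")
```

Both losses decay with increasing margin; the NLL is a smooth surrogate for the hinge.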

SVM and KLR

The SVM only estimates $\mathrm{sign}\big[p(x) - 1/2\big]$ (by computing the distance between $x$ and the separating hyperplane), without estimating the class probability $p(x)$ itself. The NLL of $y$ has a similar shape to the SVM loss. If we let $y \in \{0, 1\}$, then

$$\mathrm{NLL} = -\big(y \ln p + (1 - y) \ln(1 - p)\big) = -\big(yf - \ln(1 + e^{f})\big).$$

This is the loss function of classical KLR.

SVM and KLR

If we replace $(1 - yf)_+$ with $\ln(1 + e^{-yf})$, the SVM becomes a KLR problem with the objective function

$$\min_{f \in \mathcal{H}_K} \left\{ \frac{1}{n} \sum_{i=1}^{n} \ln\big(1 + e^{-y_i f(x_i)}\big) + \frac{\lambda}{2} \|f\|_{\mathcal{H}_K}^2 \right\}.$$

Advantages: KLR offers a natural estimate of the class probability $p(x)$, and it can be generalized naturally to the M-class case through kernel multi-logit regression.

Disadvantage: in the KLR solution $f(x) = \sum_{i=1}^{n} a_i K(x, x_i)$, all of the $a_i$ are typically nonzero.

KLR as a Margin Maximizer

Suppose the basis functions $h(x)$ of the transformed feature space are rich enough that the hyperplane $f(x) = h(x)^T \beta + \beta_0 = 0$ can separate the training data.

Theorem 1. Denote by $\hat{\beta}(\lambda)$ the solution to the KLR problem

$$\min_{\beta, \beta_0} \left\{ \frac{1}{n} \sum_{i=1}^{n} \ln\big(1 + e^{-y_i f(x_i)}\big) + \frac{\lambda}{2} \|\beta\|^2 \right\};$$

then

$$\lim_{\lambda \to 0} \frac{\hat{\beta}(\lambda)}{\|\hat{\beta}(\lambda)\|} = \frac{\beta^*}{\|\beta^*\|},$$

where $\beta^*$ is the margin-maximizing SVM solution.

Import Vector Machine

The objective function of KLR can be written in matrix form as

$$H(a) = \frac{1}{n} \mathbf{1}^T \ln\big(1 + e^{-y \circ (K_1 a)}\big) + \frac{\lambda}{2} a^T K_2 a.$$

To find $a$, we set the derivative of $H$ with respect to $a$ equal to $0$ and solve the score equation iteratively by Newton's method. The Newton update can be written in the standard iteratively reweighted least squares (IRLS) form (with $y$ recoded to $\{0, 1\}$):

$$a^{\text{new}} = \big(K_1^T W K_1 + n \lambda K_2\big)^{-1} K_1^T W z,$$

where $p_i = 1/(1 + e^{-f(x_i)})$, $W = \mathrm{diag}\big(p_i(1 - p_i)\big)$, and $z = K_1 a + W^{-1}(y - p)$ is the adjusted response. For the full model, $K_1 = K_2 = K$.
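
A minimal sketch of this Newton/IRLS iteration for the full model ($K_1 = K_2 = K$), assuming labels in $\{0, 1\}$ and a strictly positive definite RBF kernel; `rbf_kernel`, the weight floor, and the iteration count are my own choices, not from the paper:

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """Gram matrix K(x, x') = exp(-gamma * ||x - x'||^2)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def klr_fit(X, y, lam=1e-3, gamma=1.0, n_iter=20):
    """Full kernel logistic regression by Newton/IRLS; y in {0, 1}."""
    n = len(y)
    K = rbf_kernel(X, X)
    a = np.zeros(n)
    for _ in range(n_iter):
        f = K @ a
        p = 1.0 / (1.0 + np.exp(-f))              # fitted probabilities
        W = np.maximum(p * (1.0 - p), 1e-12)      # IRLS weights
        z = f + (y - p) / W                       # adjusted response
        # (W K + n*lam*I) a_new = W z is the update above with K factored out
        a = np.linalg.solve(W[:, None] * K + n * lam * np.eye(n), W * z)
    return a, K
```

Running this on any dataset illustrates the disadvantage noted earlier: essentially every entry of `a` is nonzero, so predicting at a new $x$ involves all $n$ training points.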

Import Vector Machine

The computational cost of KLR is $O(n^3)$. To reduce this cost, the IVM algorithm finds a sub-model that approximates the full model given by KLR. The sub-model has the form

$$f(x) = \sum_{x_i \in S} a_i K(x, x_i),$$

where $S$ is a subset of the training data; the points in $S$ are called import points.
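
A sketch of fitting this sub-model for a fixed $S$, reusing `rbf_kernel` from the sketch above; the small number of Newton steps and the stabilizing jitter are my assumptions:

```python
import numpy as np

def sub_model_nll(K, y, S, lam, n_newton=8):
    """Fit f = K[:, S] @ a by Newton/IRLS; return (regularized NLL, a)."""
    n = len(y)
    Ks, Kss = K[:, S], K[np.ix_(S, S)]           # regressor / penalty blocks
    a = np.zeros(len(S))
    for _ in range(n_newton):
        f = Ks @ a
        p = 1.0 / (1.0 + np.exp(-f))
        W = np.maximum(p * (1.0 - p), 1e-12)
        z = f + (y - p) / W
        A = Ks.T @ (W[:, None] * Ks) + n * lam * Kss
        a = np.linalg.solve(A + 1e-10 * np.eye(len(S)), Ks.T @ (W * z))
    f = Ks @ a
    # regularized NLL: -(1/n) sum [y_i f_i - ln(1 + e^{f_i})] + (lam/2) a'Kss a
    H = -np.mean(y * f - np.logaddexp(0.0, f)) + 0.5 * lam * a @ Kss @ a
    return H, a
```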

Import Vector Machine

Algorithm 1 (the basic greedy IVM procedure) [shown as a figure across two slides].
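
Since Algorithm 1 itself did not survive the transcription, here is a hedged sketch of the greedy forward selection the paper describes, built on `sub_model_nll` above: at each step, try each remaining point as a new import point, keep the one that most decreases the regularized NLL, and stop when $H$ converges (here with $\Delta k = 1$, anticipating the stopping rule on the next slide):

```python
def ivm_fit(X, y, lam=1e-3, gamma=1.0, eps=1e-3):
    """Greedy import-point selection (a sketch of Algorithm 1)."""
    n = len(y)
    K = rbf_kernel(X, X)
    S, H_path = [], []
    candidates = set(range(n))
    while candidates:
        # Step (2): evaluate each candidate import point, keep the best
        H_new, best = min((sub_model_nll(K, y, S + [l], lam)[0], l)
                          for l in candidates)
        S.append(best)
        candidates.discard(best)
        H_path.append(H_new)
        if len(H_path) > 1 and abs(H_path[-2] - H_path[-1]) / abs(H_path[-1]) < eps:
            break
    return S, sub_model_nll(K, y, S, lam)[1]
```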

Import Vector Machine

The algorithm can be accelerated by revising Step (2) [the revised step is shown on the slide].

Stopping rule for adding points to $S$: run the algorithm until the regularized NLL converges, i.e., until

$$\frac{|H_k - H_{k - \Delta k}|}{|H_k|} < \epsilon.$$

Choosing the regularization parameter $\lambda$: split the data into a training set and a tuning set, and use the misclassification error on the tuning set as the criterion for choosing $\lambda$ (Algorithm 3).

Import Vector Machine

Algorithm 3:
1. Start with a large regularization parameter $\lambda$.
[remaining steps shown as a figure]
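
A sketch of the tuning-set idea, using the hypothetical `ivm_fit` and `rbf_kernel` helpers above; beyond its first step, Algorithm 3's details were lost in transcription, so the $\lambda$ grid and loop shape here are my guesses:

```python
import numpy as np

def choose_lambda(X_tr, y_tr, X_tu, y_tu, gamma=1.0):
    """Pick lambda by misclassification error on a held-out tuning set."""
    errs = {}
    for lam in 10.0 ** np.arange(1, -6, -1):        # start large, decrease
        S, a = ivm_fit(X_tr, y_tr, lam=lam, gamma=gamma)
        f = rbf_kernel(X_tu, X_tr[S], gamma) @ a    # sub-model on tuning set
        errs[lam] = np.mean((f > 0).astype(int) != y_tu)
    return min(errs, key=errs.get)
```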

Simulation Results

[Results shown as figures across two slides.]

Real Data Results

[Results shown across three slides.]

Generalization to M-Class Case

Similar to kernel multi-logit regression, we define the class probabilities via the symmetric multiple logistic transform

$$p_c(x) = \frac{e^{f_c(x)}}{\sum_{c'=1}^{C} e^{f_{c'}(x)}}, \qquad \text{subject to } \sum_{c=1}^{C} f_c(x) = 0.$$
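
A small numerical sketch of this probability model (my addition); centering each row of scores enforces the zero-sum constraint without changing the probabilities:

```python
import numpy as np

def class_probs(F):
    """Rows of F hold (f_1(x), ..., f_C(x)); returns the class probabilities.
    Centering each row enforces sum_c f_c(x) = 0 and leaves p_c(x) unchanged."""
    F = F - F.mean(axis=1, keepdims=True)
    E = np.exp(F - F.max(axis=1, keepdims=True))   # guard against overflow
    return E / E.sum(axis=1, keepdims=True)
```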

Generalization to M-Class Case

The M-class KLR fits a model that minimizes the regularized NLL of the multinomial distribution. The approximate solution can be obtained by an M-class IVM procedure, which is similar to the two-class case.

Conclusion

The IVM not only performs as well as the SVM in two-class classification, but also generalizes naturally to the M-class case.

Computational cost: KLR $O(n^3)$; SVM $O(n^2 n_s)$; IVM $O(n^2 n_I^2)$ for two classes and $O(C n^2 n_I^2)$ for $C$ classes, where $n_s$ is the number of support points and $n_I$ the number of import points.

Like KLR, the IVM has the limiting optimal-margin property (Theorem 1).