Machine Learning Techniques


Machine Learning Techniques (機器學習技巧), Lecture 5: SVM and Logistic Regression. Hsuan-Tien Lin (林軒田), htlin@csie.ntu.edu.tw, Department of Computer Science & Information Engineering, National Taiwan University (國立台灣大學資訊工程系).

Agenda, Lecture 5: SVM and Logistic Regression
1. Soft-Margin SVM as Regularization
2. SVM versus Logistic Regression
3. SVM for Soft Classification
4. Kernel Logistic Regression

Soft-Margin SVM as Regularization: Wrap-Up

Hard-Margin Primal:
$\min_{b,\mathbf{w}} \frac{1}{2}\mathbf{w}^T\mathbf{w}$ s.t. $y_n(\mathbf{w}^T\mathbf{z}_n + b) \ge 1$

Soft-Margin Primal:
$\min_{b,\mathbf{w},\boldsymbol{\xi}} \frac{1}{2}\mathbf{w}^T\mathbf{w} + C\sum_{n=1}^N \xi_n$ s.t. $y_n(\mathbf{w}^T\mathbf{z}_n + b) \ge 1 - \xi_n$, $\xi_n \ge 0$

Hard-Margin Dual:
$\min_{\boldsymbol{\alpha}} \frac{1}{2}\boldsymbol{\alpha}^T Q \boldsymbol{\alpha} - \mathbf{1}^T\boldsymbol{\alpha}$ s.t. $\mathbf{y}^T\boldsymbol{\alpha} = 0$, $0 \le \alpha_n$

Soft-Margin Dual:
$\min_{\boldsymbol{\alpha}} \frac{1}{2}\boldsymbol{\alpha}^T Q \boldsymbol{\alpha} - \mathbf{1}^T\boldsymbol{\alpha}$ s.t. $\mathbf{y}^T\boldsymbol{\alpha} = 0$, $0 \le \alpha_n \le C$

Soft-margin is preferred in practice; linear: LIBLINEAR; non-linear: LIBSVM.
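A minimal usage sketch, assuming scikit-learn is available (its LinearSVC is built on LIBLINEAR and its SVC on LIBSVM) and using synthetic data for illustration:

```python
# Sketch: linear vs. non-linear soft-margin SVM via scikit-learn's
# LIBLINEAR/LIBSVM wrappers; the data here is synthetic.
import numpy as np
from sklearn.svm import LinearSVC, SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] ** 2 > 0.5, 1, -1)   # not linearly separable

linear_svm = LinearSVC(C=1.0).fit(X, y)             # linear soft-margin (LIBLINEAR)
kernel_svm = SVC(C=1.0, kernel="rbf").fit(X, y)     # non-linear soft-margin (LIBSVM)
print(linear_svm.score(X, y), kernel_svm.score(X, y))
```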

Soft-Margin SVM as Regularization: Slack Variables

$\xi_n$ records the margin violation of example $n$; the violation is penalized in the objective:
$\min_{b,\mathbf{w},\boldsymbol{\xi}} \frac{1}{2}\mathbf{w}^T\mathbf{w} + C\sum_{n=1}^N \xi_n$ s.t. $y_n(\mathbf{w}^T\mathbf{z}_n + b) \ge 1 - \xi_n$ and $\xi_n \ge 0$ for all $n$.

On any $(b, \mathbf{w})$, $\xi_n$ = margin violation = $\max(1 - y_n(\mathbf{w}^T\mathbf{z}_n + b),\, 0)$:
- $(\mathbf{x}_n, y_n)$ violating the margin: $\xi_n = 1 - y_n(\mathbf{w}^T\mathbf{z}_n + b)$
- $(\mathbf{x}_n, y_n)$ not violating the margin: $\xi_n = 0$

Unconstrained form of soft-margin SVM:
$\min_{b,\mathbf{w}} \frac{1}{2}\mathbf{w}^T\mathbf{w} + C\sum_{n=1}^N \max(1 - y_n(\mathbf{w}^T\mathbf{z}_n + b),\, 0)$

Soft-Margin SVM as Regularization: Unconstrained Form

$\min_{b,\mathbf{w}} \frac{1}{2}\mathbf{w}^T\mathbf{w} + C\sum_{n=1}^N \max(1 - y_n(\mathbf{w}^T\mathbf{z}_n + b),\, 0)$

Familiar? :-) It is just L2 regularization,
$\min \frac{\lambda}{N}\mathbf{w}^T\mathbf{w} + \text{err}$,
with a shorter $\mathbf{w}$, another parameter, and a special err.

Why not solve this directly? :-) It is not a QP, so no (?) kernel trick; and $\max(\cdot, 0)$ is not differentiable, hence harder to solve.
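Because the unconstrained form is an error term plus an L2 regularizer, it can still be attacked with subgradients even though $\max(\cdot, 0)$ is not differentiable. A minimal sketch of the objective and one subgradient; the arrays Z (transformed inputs) and y (±1 labels) are assumptions for illustration:

```python
# Sketch: unconstrained soft-margin SVM objective and a valid subgradient.
# Assumptions: Z is an (N, d) array of transformed inputs, y an (N,) array
# of +/-1 labels; w, b, C are the usual SVM variables and trade-off.
import numpy as np

def soft_margin_objective(w, b, Z, y, C):
    margins = y * (Z @ w + b)               # y_n (w^T z_n + b)
    xi = np.maximum(1.0 - margins, 0.0)     # slack values = margin violations
    return 0.5 * w @ w + C * xi.sum()

def soft_margin_subgradient(w, b, Z, y, C):
    margins = y * (Z @ w + b)
    violating = margins < 1.0               # examples with xi_n > 0
    gw = w - C * (y[violating, None] * Z[violating]).sum(axis=0)
    gb = -C * y[violating].sum()
    return gw, gb                           # subgradient w.r.t. (w, b)
```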

Soft-Margin SVM as Regularization: SVM as Regularization

                                minimize                                              constraint
regularization by constraint:   $E_{\text{in}}$                                       $\mathbf{w}^T\mathbf{w} \le C$
hard-margin SVM:                $\mathbf{w}^T\mathbf{w}$                              $E_{\text{in}} = 0$ [and more]
L2 regularization:              $\frac{\lambda}{N}\mathbf{w}^T\mathbf{w} + E_{\text{in}}$
soft-margin SVM:                $\frac{1}{2}\mathbf{w}^T\mathbf{w} + C\hat{E}_{\text{in}}$

large margin ⟺ fewer hyperplanes ⟺ L2 regularization of a short $\mathbf{w}$
soft margin ⟺ special $\widehat{\text{err}}$
larger $C$ (or larger constraint bound $C$) ⟺ smaller $\lambda$ ⟺ less regularization

Viewing SVM as regularization allows extending/connecting it to other learning models.

Soft-Margin SVM as Regularization: Fun Time

SVM versus Logistic Regression: Error Function of SVM

$\min_{b,\mathbf{w}} \frac{1}{2}\mathbf{w}^T\mathbf{w} + C\sum_{n=1}^N \max(1 - y_n(\mathbf{w}^T\mathbf{z}_n + b),\, 0)$

Linear score $s = \mathbf{w}^T\mathbf{z}_n + b$:
- $\text{err}_{0/1}(s, y) = [\![ys \le 0]\!]$
- $\widehat{\text{err}}_{\text{SVM}}(s, y) = \max(1 - ys,\, 0)$: an upper bound of $\text{err}_{0/1}$, often called the hinge error measure

[figure: $\text{err}_{0/1}$ and $\widehat{\text{err}}_{\text{SVM}}$ plotted against $ys$]

$\widehat{\text{err}}_{\text{SVM}}$: an algorithmic error measure given by a convex upper bound of $\text{err}_{0/1}$.

SVM versus Logistic Regression: Connection between SVM and Logistic Regression

Linear score $s = \mathbf{w}^T\mathbf{z}_n + b$:
- $\text{err}_{0/1}(s, y) = [\![ys \le 0]\!]$
- $\widehat{\text{err}}_{\text{SVM}}(s, y) = \max(1 - ys,\, 0)$: upper bound of $\text{err}_{0/1}$
- $\text{err}_{\text{SCE}}(s, y) = \log_2(1 + \exp(-ys))$: another upper bound of $\text{err}_{0/1}$, used in logistic regression

[figure: $\text{err}_{0/1}$, $\widehat{\text{err}}_{\text{SVM}}$, and the scaled cross-entropy plotted against $ys$]

                                              $ys \to -\infty$    $ys \to +\infty$
$\widehat{\text{err}}_{\text{SVM}}(s, y)$     $\approx -ys$       $= 0$
$(\ln 2)\cdot\text{err}_{\text{SCE}}(s, y)$   $\approx -ys$       $\approx 0$

SVM ≈ L2-regularized logistic regression.
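The two upper bounds can be checked numerically; a small sketch evaluating all three error measures over a grid of $ys$ values:

```python
# Sketch: verify that hinge and scaled cross-entropy upper-bound err_0/1.
import numpy as np

ys = np.linspace(-3.0, 3.0, 13)
err_01 = (ys <= 0).astype(float)          # [[ys <= 0]]
err_svm = np.maximum(1.0 - ys, 0.0)       # hinge error measure
err_sce = np.log2(1.0 + np.exp(-ys))      # scaled cross-entropy

assert np.all(err_svm >= err_01) and np.all(err_sce >= err_01)
for row in zip(ys, err_01, err_svm, err_sce):
    print("ys=%5.2f  0/1=%.0f  hinge=%5.2f  sce=%5.2f" % row)
```

For very negative $ys$ both bounds grow like $-ys$, matching the table above; for very positive $ys$ both approach 0.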

SVM versus Logistic Regression: Linear Models for Binary Classification

PLA: minimizes $\text{err}_{0/1}$ specially.
- pros: efficient if linearly separable
- cons: works only if linearly separable, otherwise needs pocket

Soft-margin SVM: minimizes regularized $\widehat{\text{err}}_{\text{SVM}}$ by QP.
- pros: easy optimization & theoretical guarantee
- cons: loose bound of $\text{err}_{0/1}$ for very negative $ys$

Regularized logistic regression for classification: minimizes regularized $\text{err}_{\text{SCE}}$ by GD/SGD/...
- pros: easy optimization & regularization guard
- cons: loose bound of $\text{err}_{0/1}$ for very negative $ys$

Regularized LogReg ⟹ approximate SVM; SVM ⟹ approximate LogReg (?)

SVM versus Logistic Regression: Fun Time

SVM for Soft Classification: Naïve Ideas

Naïve Idea 1:
1. run SVM and get $(b_{\text{SVM}}, \mathbf{w}_{\text{SVM}})$
2. return $g(\mathbf{x}) = \theta(\mathbf{w}_{\text{SVM}}^T \mathbf{x} + b_{\text{SVM}})$, with $\theta$ the logistic function
Direct use of similarity; works reasonably well, but has no LogReg flavor.

Naïve Idea 2:
1. run SVM and get $(b_{\text{SVM}}, \mathbf{w}_{\text{SVM}})$
2. run LogReg with $(b_{\text{SVM}}, \mathbf{w}_{\text{SVM}})$ as $\mathbf{w}_0$
3. return the LogReg solution as $g(\mathbf{x})$
Not really easier than the original LogReg, and the SVM flavor (kernel?) is lost.

Want: flavors from both sides.

SVM for Soft Classification: A Possible Model: Two-Level Learning

$g(\mathbf{x}) = \theta\!\left(A \cdot (\mathbf{w}_{\text{SVM}}^T \Phi(\mathbf{x}) + b_{\text{SVM}}) + B\right)$

- SVM flavor: fixes the hyperplane direction by $\mathbf{w}_{\text{SVM}}$; the kernel still applies
- LogReg flavor: fine-tunes the hyperplane to match maximum likelihood by scaling ($A$) and shifting ($B$)
- often $A > 0$ if $\mathbf{w}_{\text{SVM}}$ is reasonably good; often $B \approx 0$ if $b_{\text{SVM}}$ is reasonably good

New LogReg problem:
$\min_{A,B} \frac{1}{N}\sum_{n=1}^N \log\!\left(1 + \exp\!\left(-y_n \left(A \cdot \underbrace{(\mathbf{w}_{\text{SVM}}^T \Phi(\mathbf{x}_n) + b_{\text{SVM}})}_{\Phi_{\text{SVM}}(\mathbf{x}_n)} + B\right)\right)\right)$

Two-level learning: LogReg on SVM-transformed data.

SVM for Soft Classification: Probabilistic SVM

Platt's model of probabilistic SVM for soft classification:
1. Run SVM on $\mathcal{D}$ to get $(b_{\text{SVM}}, \mathbf{w}_{\text{SVM}})$ [or the equivalent $\boldsymbol{\alpha}$], and transform $\mathcal{D}$ to $z_n = \mathbf{w}_{\text{SVM}}^T \Phi(\mathbf{x}_n) + b_{\text{SVM}}$. (The actual model performs this step in a more sophisticated way.)
2. Run LogReg on $\{(z_n, y_n)\}_{n=1}^N$ to get $(A, B)$. (The actual model adds some special regularization here.)
3. Return $g(\mathbf{x}) = \theta\!\left(A \cdot (\mathbf{w}_{\text{SVM}}^T \Phi(\mathbf{x}) + b_{\text{SVM}}) + B\right)$.

The soft classifier does not have the same boundary as the SVM classifier, because of $B$. How to solve the LogReg: GD/SGD/or better, since there are only two variables.

Kernel SVM ⟹ approximate LogReg in the $\mathcal{Z}$-space. Exact LogReg in the $\mathcal{Z}$-space?
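A simplified sketch of the three steps, assuming scikit-learn and synthetic data; it uses a plain 1-D LogReg for step 2, without the special regularization the actual model adds:

```python
# Sketch of Platt-style probabilistic SVM: kernel SVM scores, then a 1-D
# logistic regression learning the scale A and shift B. Data is synthetic.
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] > 0.3, 1, -1)

svm = SVC(kernel="rbf", C=1.0).fit(X, y)            # step 1: kernel SVM
z = svm.decision_function(X).reshape(-1, 1)         # z_n = w^T Phi(x_n) + b
calib = LogisticRegression().fit(z, y)              # step 2: learn (A, B)

def g(X_new):                                       # step 3: soft classifier
    z_new = svm.decision_function(X_new).reshape(-1, 1)
    return calib.predict_proba(z_new)[:, 1]         # estimates P(y = +1 | x)

print(g(X[:5]))
```

For comparison, scikit-learn's SVC(probability=True) performs a related calibration internally.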

SVM for Soft Classification: Fun Time

Kernel Logistic Regression: Key behind the Kernel Trick

One key behind the kernel trick: the optimal $\mathbf{w}_* = \sum_{n=1}^N \beta_n \mathbf{z}_n$, because then
$\mathbf{w}_*^T \mathbf{z} = \sum_{n=1}^N \beta_n \mathbf{z}_n^T \mathbf{z} = \sum_{n=1}^N \beta_n K(\mathbf{x}_n, \mathbf{x})$.

- SVM: $\mathbf{w}_{\text{SVM}} = \sum (\alpha_n y_n)\, \mathbf{z}_n$, with $\alpha_n$ from the dual solution
- PLA: $\mathbf{w}_{\text{PLA}} = \sum (\alpha_n y_n)\, \mathbf{z}_n$, with $\alpha_n$ counting mistake corrections
- LogReg by SGD: $\mathbf{w}_{\text{LOGREG}} = \sum (\alpha_n y_n)\, \mathbf{z}_n$, with $\alpha_n$ accumulating the SGD moves

When can the optimal $\mathbf{w}$ be represented by the $\mathbf{z}_n$? (See the sketch of the SGD case below.)
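The SGD case can be verified directly: starting from $\mathbf{w} = \mathbf{0}$, every SGD step on the logistic error adds a multiple of some $\mathbf{z}_n$ to $\mathbf{w}$, so $\mathbf{w}$ always stays in the span of the examples. A minimal sketch with assumed synthetic data:

```python
# Sketch: SGD on the logistic error from w = 0, tracking per-example
# coefficients beta_n; at any time w equals sum_n beta_n z_n.
import numpy as np

rng = np.random.default_rng(2)
Z = rng.normal(size=(50, 3))
y = np.where(Z @ np.array([1.0, -2.0, 0.5]) > 0, 1, -1)

w, beta, eta = np.zeros(3), np.zeros(50), 0.1
for t in range(1000):
    n = rng.integers(50)
    coef = -y[n] / (1.0 + np.exp(y[n] * (w @ Z[n])))  # d err / d score
    w -= eta * coef * Z[n]                            # SGD move along z_n
    beta[n] -= eta * coef                             # record its coefficient

assert np.allclose(w, beta @ Z)                       # w = sum_n beta_n z_n
print(beta[:5])
```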

Kernel Logistic Regression: Representer Theorem

Claim: for any L2-regularized linear model
$\min_{\mathbf{w}} \frac{\lambda}{N}\mathbf{w}^T\mathbf{w} + \frac{1}{N}\sum_{n=1}^N \text{err}(y_n, \mathbf{w}^T\mathbf{z}_n)$,
the optimal $\mathbf{w}_* = \sum_{n=1}^N \beta_n \mathbf{z}_n$.

Proof: let the optimal $\mathbf{w}_* = \mathbf{w}_\| + \mathbf{w}_\perp$, where $\mathbf{w}_\| \in \text{span}(\mathbf{z}_n)$ and $\mathbf{w}_\perp \perp \text{span}(\mathbf{z}_n)$. We want $\mathbf{w}_\perp = \mathbf{0}$. What if not? Consider $\mathbf{w}_\|$: it has the same err as $\mathbf{w}_*$, since $\mathbf{w}_\perp^T \mathbf{z}_n = 0$ gives
$\text{err}(y_n, \mathbf{w}_\|^T\mathbf{z}_n) = \text{err}(y_n, (\mathbf{w}_\| + \mathbf{w}_\perp)^T\mathbf{z}_n)$,
but a smaller regularizer than $\mathbf{w}_*$:
$\mathbf{w}_*^T\mathbf{w}_* = \mathbf{w}_\|^T\mathbf{w}_\| + 2\mathbf{w}_\|^T\mathbf{w}_\perp + \mathbf{w}_\perp^T\mathbf{w}_\perp > \mathbf{w}_\|^T\mathbf{w}_\|$
(the cross term vanishes by orthogonality, and $\mathbf{w}_\perp^T\mathbf{w}_\perp > 0$). So $\mathbf{w}_\|$ would be more optimal than $\mathbf{w}_*$ (contradiction!).

Any L2-regularized linear model can be kernelized!

Kernel Logistic Regression: Kernel Logistic Regression

Solving L2-regularized logistic regression
$\min_{\mathbf{w}} \frac{\lambda}{N}\mathbf{w}^T\mathbf{w} + \frac{1}{N}\sum_{n=1}^N \log\!\left(1 + \exp(-y_n \mathbf{w}^T\mathbf{z}_n)\right)$
yields an optimal solution $\mathbf{w}_* = \sum_{n=1}^N \beta_n \mathbf{z}_n$. Without loss of generality, we can solve for the optimal $\boldsymbol{\beta}$ instead of $\mathbf{w}$:
$\min_{\boldsymbol{\beta}} \frac{\lambda}{N}\sum_{n=1}^N\sum_{m=1}^N \beta_n \beta_m K(\mathbf{x}_n, \mathbf{x}_m) + \frac{1}{N}\sum_{n=1}^N \log\!\left(1 + \exp\!\left(-y_n \sum_{m=1}^N \beta_m K(\mathbf{x}_m, \mathbf{x}_n)\right)\right)$
How? GD/SGD/... for unconstrained optimization, as sketched below.

Kernel logistic regression: use the representer theorem for the kernel trick on L2-regularized logistic regression.
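A minimal sketch of solving the $\boldsymbol{\beta}$ problem by plain gradient descent, assuming an RBF kernel and synthetic data (step size and iteration count are arbitrary choices, not tuned values):

```python
# Sketch: kernel logistic regression by gradient descent on beta.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0, 1, -1)

K = rbf_kernel(X, X)
N, lam, eta = len(y), 0.1, 0.05
beta = np.zeros(N)
for t in range(2000):
    s = K @ beta                                # s_n = sum_m beta_m K(x_m, x_n)
    p = 1.0 / (1.0 + np.exp(y * s))             # logistic factor per example
    grad = (2 * lam / N) * (K @ beta) - (K @ (y * p)) / N
    beta -= eta * grad                          # GD step on the beta objective

print("training accuracy:", (np.sign(K @ beta) == y).mean())
```

The gradient has two parts matching the two terms of the objective: $(2\lambda/N) K\boldsymbol{\beta}$ from the kernel regularizer, and $-(1/N) K(\mathbf{y} \odot \mathbf{p})$ from the logistic error.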

Kernel Logistic Regression: Another View

$\min_{\boldsymbol{\beta}} \frac{\lambda}{N}\sum_{n=1}^N\sum_{m=1}^N \beta_n \beta_m K(\mathbf{x}_n, \mathbf{x}_m) + \frac{1}{N}\sum_{n=1}^N \log\!\left(1 + \exp\!\left(-y_n \sum_{m=1}^N \beta_m K(\mathbf{x}_m, \mathbf{x}_n)\right)\right)$

- $\sum_{m=1}^N \beta_m K(\mathbf{x}_m, \mathbf{x}_n)$: an inner product between the variables $\boldsymbol{\beta}$ and the transformed data $(K(\mathbf{x}_1, \mathbf{x}_n), K(\mathbf{x}_2, \mathbf{x}_n), \ldots, K(\mathbf{x}_N, \mathbf{x}_n))$
- $\frac{\lambda}{N}\sum_{n=1}^N\sum_{m=1}^N \beta_n \beta_m K(\mathbf{x}_n, \mathbf{x}_m)$: a special regularizer $\boldsymbol{\beta}^T K \boldsymbol{\beta}$

KLR = a linear model of $\boldsymbol{\beta}$ with a kernel transform & a kernel regularizer; equivalently, a linear model of $\mathbf{w}$ with a kernel-embedded transform & an L2 regularizer. Similar for SVM. Different routes to the same destination allow extensions from different views.

Kernel Logistic Regression: Fun Time

Kernel Logistic Regression: Summary

Lecture 5: SVM and Logistic Regression
1. Soft-Margin SVM as Regularization: L2 regularization with the hinge error measure
2. SVM versus Logistic Regression: SVM ≈ L2-regularized logistic regression
3. SVM for Soft Classification: the common approach is two-level learning
4. Kernel Logistic Regression: the representer theorem applied to L2-regularized LogReg