UVA CS 4501: Machine Learning
Lecture 16 Extra: Support Vector Machine Optimization with Dual
Dr. Yanjun Qi
University of Virginia, Department of Computer Science

Today (Extra): Optimization of SVM
- SVM as QP
- A simple example of constrained optimization
- SVM optimization with the dual form
- KKT condition
- SMO algorithm for fast SVM dual optimization
3/29/18 Dr. Yanjun Qi / UVA

Optimization with Quadratic Programming (QP)

Quadratic programming solves optimization problems of the following form:

    min_u  (1/2) u^T R u + d^T u + c        (R is the quadratic term)

subject to n inequality constraints:

    a_{1,1} u_1 + a_{1,2} u_2 + ... <= b_1
    ...
    a_{n,1} u_1 + a_{n,2} u_2 + ... <= b_n

and k equality constraints:

    a_{n+1,1} u_1 + a_{n+1,2} u_2 + ... = b_{n+1}
    ...
    a_{n+k,1} u_1 + a_{n+k,2} u_2 + ... = b_{n+k}

When a problem can be specified as a QP, we can use solvers that are much better than gradient descent or simulated annealing.
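As a quick illustration of handing such a problem to a solver rather than running gradient descent, here is a minimal sketch (assuming NumPy and SciPy are available; the matrix R, vector d, and the single constraint are made-up toy values, and a general-purpose solver stands in for a dedicated QP package):

```python
import numpy as np
from scipy.optimize import minimize

# Toy QP data: minimize (1/2) u^T R u + d^T u + c
R = np.array([[2.0, 0.0], [0.0, 2.0]])   # positive-definite quadratic term
d = np.array([-2.0, -4.0])
c = 1.0

def objective(u):
    return 0.5 * u @ R @ u + d @ u + c

# One inequality constraint: u_1 + u_2 <= 1, written as 1 - u_1 - u_2 >= 0
constraints = [{"type": "ineq", "fun": lambda u: 1.0 - u[0] - u[1]}]

res = minimize(objective, x0=np.zeros(2), constraints=constraints, method="SLSQP")
# The unconstrained minimizer is (1, 2); the constraint projects it to (0, 1)
print(res.x)
```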

SVM as a QP problem

In the QP template above, take R = I (identity matrix), d = 0 (zero vector), and c = 0; the margin is M = 2 / sqrt(w^T w).

    min_{w,b}  (1/2) w^T w

subject to the inequality constraints:

    for all x_i in class +1:  w^T x_i + b >= +1
    for all x_i in class -1:  w^T x_i + b <= -1

That is a total of n inequality constraints if we have n input samples, and no equality constraints.
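To make the mapping concrete, this sketch solves the primal QP directly on a two-point toy set (assuming SciPy; the data is made up, and again a general-purpose solver stands in for a QP package):

```python
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable set: class +1 at x=2, class -1 at x=0 (1-D inputs)
X = np.array([[2.0], [0.0]])
y = np.array([1.0, -1.0])

# Variables packed as v = [w, b]; objective is (1/2) w^T w (b is unpenalized)
def objective(v):
    w = v[:-1]
    return 0.5 * w @ w

# n inequality constraints: y_i (w^T x_i + b) - 1 >= 0
constraints = [
    {"type": "ineq", "fun": lambda v, i=i: y[i] * (v[:-1] @ X[i] + v[-1]) - 1.0}
    for i in range(len(X))
]

res = minimize(objective, x0=np.array([1.0, 0.0]),
               constraints=constraints, method="SLSQP")
w, b = res.x[:-1], res.x[-1]
# Optimal separator: w = 1, b = -1 (decision boundary at x = 1, margin M = 2)
```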

Optimization Review: Ingredients
- Objective function
- Variables
- Constraints
Find values of the variables that minimize or maximize the objective function while satisfying the constraints.


Optimization Review: Lagrangian Duality — The Primal Problem

Primal:

    min_w  f_0(w)
    s.t.   f_i(w) <= 0,  i = 1, ..., k

The generalized Lagrangian:

    L(w, α) = f_0(w) + Σ_{i=1}^{k} α_i f_i(w)

The α_i's (α_i >= 0) are called the Lagrange multipliers. The method of Lagrange multipliers converts the constrained problem into a higher-dimensional unconstrained one.

Lemma:

    max_{α, α_i >= 0} L(w, α) = f_0(w)   if w satisfies the primal constraints,
                              = ∞        otherwise.

A re-written primal:

    min_w max_{α, α_i >= 0} L(w, α)

(Adapted from Eric Xing @ CMU, 2006-2008.)

Optimization Review: Lagrangian Duality, cont.

Recall the primal problem:

    min_w max_{α, α_i >= 0} L(w, α)

The dual problem:

    max_{α, α_i >= 0} min_w L(w, α)

Theorem (weak duality):

    d* = max_{α, α_i >= 0} min_w L(w, α)  <=  min_w max_{α, α_i >= 0} L(w, α) = p*

Theorem (strong duality): d* = p* iff there exists a saddle point of L(w, α).

A simple example of constrained optimization:

    min_u  u^2    s.t.  u >= b

Optimization Review: Constrained Optimization

    min_u  u^2    s.t.  u >= b

Case 1 (b <= 0): the global minimum u* = 0 lies inside the allowed region, so the constraint is inactive.
Case 2 (b > 0): the global minimum u* = 0 is not allowed; the constrained minimum sits on the boundary, u* = b.
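The two cases can be checked numerically; in both, the answer is u* = max(0, b). A sketch (assuming SciPy; the solver choice and test values are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def constrained_min(b):
    """Solve min_u u^2  s.t.  u >= b numerically."""
    res = minimize(lambda u: u[0] ** 2,
                   x0=np.array([b + 1.0]),                      # feasible start
                   constraints=[{"type": "ineq", "fun": lambda u: u[0] - b}],
                   method="SLSQP")
    return res.x[0]

# Case 1 (b <= 0): constraint inactive, global minimum u* = 0 is allowed
# Case 2 (b > 0): constraint active, minimum pushed to the boundary u* = b
for b in [-2.0, 3.0]:
    print(b, constrained_min(b), max(0.0, b))
```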



Dual vs. primal view of the example (figure):

    Primal:  min_u  u^2    s.t.  u >= b
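The dual pictured on this slide can be derived in a few lines; a sketch of the standard Lagrangian computation:

```latex
% Primal: \min_u u^2 \text{ s.t. } u \ge b, i.e. the constraint b - u \le 0
L(u, \alpha) = u^2 + \alpha (b - u), \qquad \alpha \ge 0
% Minimize over u: \; dL/du = 2u - \alpha = 0 \;\Rightarrow\; u = \alpha/2
g(\alpha) = \min_u L(u, \alpha)
          = \frac{\alpha^2}{4} + \alpha b - \frac{\alpha^2}{2}
          = \alpha b - \frac{\alpha^2}{4}
% Dual: \max_{\alpha \ge 0} g(\alpha) \;\Rightarrow\; \alpha^* = \max(0, 2b)
% b \le 0: \alpha^* = 0,\; d^* = 0 = p^*; \quad b > 0: \alpha^* = 2b,\; d^* = b^2 = p^*
```

In both cases d* = p*, as strong duality promises for this convex problem.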




The SVM primal, rewritten as a min-max problem over the Lagrangian:

    min_{w,b} max_{α, α_i >= 0}  (w^T w)/2 − Σ_i α_i [ (w^T x_i + b) y_i − 1 ]


    L_primal = (1/2) ||w||^2 − Σ_{i=1}^{N} α_i [ y_i (w · x_i + b) − 1 ]
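Minimizing L_primal over w and b is the key step the slides rely on; written out, it is the standard derivation of the SVM dual:

```latex
\frac{\partial L}{\partial w} = w - \sum_{i=1}^{N} \alpha_i y_i x_i = 0
  \;\Rightarrow\; w = \sum_{i=1}^{N} \alpha_i y_i x_i
\frac{\partial L}{\partial b} = -\sum_{i=1}^{N} \alpha_i y_i = 0
% Substituting w back into L_{primal} gives the dual problem:
\max_{\alpha} \; \sum_{i=1}^{N} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N}
    \alpha_i \alpha_j y_i y_j \, x_i^{\top} x_j
\quad \text{s.t.} \quad \alpha_i \ge 0, \qquad \sum_{i=1}^{N} \alpha_i y_i = 0
```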

Optimization Review: Dual Problem

Solve the dual problem when the dual form is easier than the primal form. A primal minimization becomes a dual maximization (and a primal maximization becomes a dual minimization). Equating the dual optimum with the primal optimum is only valid under strong duality, which holds when the original optimization problem is convex (for minimization; concave for maximization).


KKT Condition for Strong Duality
Primal problem ⇔ dual problem (strong duality).
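For reference, the KKT conditions for the problem min_w f_0(w) s.t. f_i(w) <= 0 (standard material; the slide's figure is not reproduced here) are:

```latex
% Stationarity
\nabla_w L(w^*, \alpha^*) = \nabla f_0(w^*) + \sum_i \alpha_i^* \nabla f_i(w^*) = 0
% Primal feasibility
f_i(w^*) \le 0 \quad \forall i
% Dual feasibility
\alpha_i^* \ge 0 \quad \forall i
% Complementary slackness
\alpha_i^* \, f_i(w^*) = 0 \quad \forall i
```

For the SVM, complementary slackness is what forces α_i to be nonzero only for examples on the margin, i.e. the support vectors.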

Optimization Review: Lagrangian (an even more general standard form)

Optimization Review: Lagrange dual function; inf(·) denotes the infimum (greatest lower bound).
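In symbols (standard Boyd & Vandenberghe notation), the general standard form and its Lagrange dual function from the two slides above are:

```latex
% General standard form with m inequality and r equality constraints
\min_x \; f_0(x) \quad \text{s.t.} \quad f_i(x) \le 0,\; i=1,\dots,m;
  \qquad h_j(x) = 0,\; j=1,\dots,r
% Lagrangian L : \mathbb{R}^d \times \mathbb{R}^m \times \mathbb{R}^r \to \mathbb{R}
L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x)
  + \sum_{j=1}^{r} \nu_j h_j(x)
% Lagrange dual function: pointwise infimum (greatest lower bound) over x
g(\lambda, \nu) = \inf_x L(x, \lambda, \nu)
% g is concave, and for any \lambda \ge 0:\; g(\lambda, \nu) \le p^*
```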



Optimization Review: Key for the SVM Dual

Dual formulation for the linearly non-separable case (soft SVM)
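The soft-margin dual shown on this slide differs from the separable case only in the box constraint on α (standard result, stated here for reference since the slide's formula did not survive transcription):

```latex
\max_{\alpha} \; \sum_{i=1}^{N} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N}
    \alpha_i \alpha_j y_i y_j \, x_i^{\top} x_j
\quad \text{s.t.} \quad 0 \le \alpha_i \le C,
  \qquad \sum_{i=1}^{N} \alpha_i y_i = 0
```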



Fast SVM Implementations
- SMO: Sequential Minimal Optimization
- SVM-Light
- LibSVM
- BSVM

SMO: Sequential Minimal Optimization

Key idea: divide the large QP problem of SVM training into a series of smallest-possible QP subproblems, each of which can be solved analytically. This avoids running a time-consuming numerical QP solver in the inner loop (a kind of SQP method). Space complexity is O(n). Since each QP subproblem is solved in closed form, the most time-consuming part of SMO is evaluating the decision function; it is therefore very fast for linear SVMs and sparse data.

SMO: At each step, SMO chooses two Lagrange multipliers to jointly optimize, finds the optimal values for these two multipliers, and updates the SVM to reflect the new optimal values.

Three components:
- An analytic method to solve for the two Lagrange multipliers
- A heuristic for choosing which two multipliers to optimize next
- A method for computing b at each step, so that the KKT conditions are fulfilled for both of the two examples (corresponding to the two multipliers)

Choosing Which Multipliers to Optimize
- First multiplier: iterate over the entire training set and find an example that violates the KKT conditions.
- Second multiplier: maximize the size of the step taken during the joint optimization, i.e. choose the example maximizing |E_1 − E_2|, where E_i is the error on the i-th example.
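The pieces above can be assembled into a toy trainer. This sketch follows the "simplified SMO" variant: the second multiplier is chosen at random rather than by the |E_1 − E_2| heuristic described on this slide, and the function name and toy data are illustrative, not from the lecture:

```python
import numpy as np

def simplified_smo(X, y, C=1.0, tol=1e-3, max_passes=10, seed=0):
    """Toy 'simplified SMO' for a linear kernel (random second multiplier)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    K = X @ X.T                                # linear kernel matrix
    alpha, b = np.zeros(n), 0.0
    f = lambda i: (alpha * y) @ K[:, i] + b    # decision value on example i

    passes, epochs = 0, 0
    while passes < max_passes and epochs < 500:
        epochs += 1
        changed = 0
        for i in range(n):
            E_i = f(i) - y[i]
            # first multiplier: an example violating the KKT conditions
            if (y[i] * E_i < -tol and alpha[i] < C) or (y[i] * E_i > tol and alpha[i] > 0):
                j = int(rng.integers(n - 1))
                j = j if j < i else j + 1      # random j != i
                E_j = f(j) - y[j]
                ai, aj = alpha[i], alpha[j]
                # feasible interval [L, H] for alpha_j (box + equality constraint)
                if y[i] != y[j]:
                    L, H = max(0.0, aj - ai), min(C, C + aj - ai)
                else:
                    L, H = max(0.0, ai + aj - C), min(C, ai + aj)
                eta = 2 * K[i, j] - K[i, i] - K[j, j]
                if L >= H or eta >= 0:
                    continue
                # analytic solution of the 2-variable QP, clipped to [L, H]
                alpha[j] = np.clip(aj - y[j] * (E_i - E_j) / eta, L, H)
                if abs(alpha[j] - aj) < 1e-5:
                    continue
                alpha[i] = ai + y[i] * y[j] * (aj - alpha[j])
                # recompute b so the KKT conditions hold for examples i and j
                b1 = b - E_i - y[i] * (alpha[i] - ai) * K[i, i] \
                     - y[j] * (alpha[j] - aj) * K[i, j]
                b2 = b - E_j - y[i] * (alpha[i] - ai) * K[i, j] \
                     - y[j] * (alpha[j] - aj) * K[j, j]
                if 0 < alpha[i] < C:
                    b = b1
                elif 0 < alpha[j] < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    w = (alpha * y) @ X                        # recover w for the linear kernel
    return w, b

# toy linearly separable data on the real line
X = np.array([[0.0], [1.0], [3.0], [4.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
w, b = simplified_smo(X, y, C=10.0)
```

Full SMO replaces the random choice of j with the maximal-|E_i − E_j| heuristic and caches the errors E_i, which is what makes it fast in practice.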

References
- Big thanks to Prof. Ziv Bar-Joseph and Prof. Eric Xing @ CMU for allowing me to reuse some of their slides
- The Elements of Statistical Learning, by Hastie, Tibshirani and Friedman
- Prof. Andrew Moore @ CMU's slides
- Tutorial slides from Dr. Tie-Yan Liu, MSR Asia
- "A Practical Guide to Support Vector Classification," Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin, 2003-2010
- Tutorial slides from Stanford "Convex Optimization I," Boyd & Vandenberghe