Maximal Margin Classifier


CS281B/Stat241B: Advanced Topics in Learning & Decision Making
Lecturer: Michael Jordan    Scribes: Jana van Greunen
Corrected version - 2/1/2004

1 References/Recommended Reading

1.1 Websites

www.kernel-machines.org

1.2 Books

Learning with Kernels - MIT Press
Shawe-Taylor and Cristianini (2nd edition)

2 Separating Binary Data with a Hyper-plane

2.1 Observations from Lecture 1

From the discussion of the four classifiers in lecture 1, it can be seen that all methods use the inner products between the data vectors, x_i^T x_j, instead of the data vectors themselves. This fact is obvious for the Perceptron classifier: in lecture 1 we saw that θ at each step is a weighted sum of the x_i's, and the predictions are determined by taking the inner product of θ with x, which expands into inner products between the x vectors. Using inner products instead of the data vectors themselves yields great computational savings and allows kernelization (see lecture 3 for more details). The sketch below makes this concrete.
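The following is a minimal sketch of that observation, with the perceptron written in its "dual" form; the dataset and the names (X, y, G, alpha) are made up for illustration and are not from the lecture.

```python
import numpy as np

# Made-up toy data: two well-separated Gaussian blobs, labels in {-1, +1}.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+2.0, 0.5, size=(20, 2)),
               rng.normal(-2.0, 0.5, size=(20, 2))])
y = np.hstack([np.ones(20), -np.ones(20)])

# Everything below touches the data only through the Gram matrix
# G[i, j] = x_i^T x_j, never through the raw vectors themselves.
G = X @ X.T

# Perceptron with theta kept implicitly as sum_i alpha_i y_i x_i, so the
# prediction sign(theta^T x_i) = sign(sum_j alpha_j y_j G[j, i]) uses only G.
alpha = np.zeros(len(y))
for _ in range(100):                       # a few passes over the data
    for i in range(len(y)):
        if np.sign(np.sum(alpha * y * G[:, i])) != y[i]:
            alpha[i] += 1.0                # mistake-driven update
```

Replacing G[i, j] with a kernel evaluation k(x_i, x_j) would kernelize this procedure, which is the point of the observation (and the subject of lecture 3).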

2.2 Maximal Margin (cont.)

In this lecture we consider problems for which the input data do not overlap; in other words, there exists a hyper-plane which separates the data into the two sets corresponding to y = 1 and y = -1.

Note: In this problem it will be very useful to consider the Lagrangian dual problem. The Lagrangian dual is obtained from an unconstrained quadratic minimization, which is simpler to solve, and the Lagrangian approach also gives insight into the structure of the solution (i.e. the solution is also a weighted linear combination of the data elements).

First, consider a vector w that is perpendicular to the separating hyper-plane. Define θ = (w, b)^T, where b is a scalar value.

The primal problem can be formulated in the following way:

minimize (1/2) w^T w   subject to   y_i (w^T x_i + b) ≥ 1 for all i

The optimization problem is the minimization of the norm of w. It is a quadratic program because the constraints are linear while the objective function is quadratic. The fact that minimizing the norm of w solves the problem may be surprising at first, but it can be shown with the following algebra.

Pick z_1 and z_{-1} to be the vectors lying on the boundaries defining the margins (see Figure 1 for a depiction of z_1 and z_{-1}). (Note: for the maximal margin problem, all data points that do not lie on the boundaries through z_1 or z_{-1} can be discarded, because they do not contribute to the margins.) It is easy to see that the separating hyper-plane needs to lie at mid-distance between z_1 and z_{-1} (otherwise you could move the plane towards the midpoint to increase the minimal margin). So the minimal margin is simply half the perpendicular distance between z_1 and z_{-1}, that is, half the vector between z_1 and z_{-1} projected on the unit normal to the plane:

margin = (1/2) (w^T / ||w||) (z_1 - z_{-1})     (1)

To find the value of this expression, we do the following manipulations. Now:

w^T z_1 + b = 1

And:

w^T z_{-1} + b = -1

Subtracting,

(w^T z_1 + b) - (w^T z_{-1} + b) = w^T (z_1 - z_{-1}) = 2

and so, comparing with (1), we get the expression for the margin:

margin = 1 / ||w||

Thus, minimizing ||w|| is equivalent to maximizing the minimal margin 1/||w||.

Figure 1: Plot of the data showing the perpendicular vector w and the vectors z_1 and z_{-1}.

2.3 Using a Lagrangian

A general convex optimization problem has the following form:

minimize f(x)   s.t.   g(x) ≤ 0

Now define a Lagrangian:

L(x, λ) = f(x) + λ^T g(x)

Claim: the original problem can be written as follows (the justification appears after the short sketch below):

min_x max_λ L(x, λ)   s.t.   λ ≥ 0
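Before justifying the claim, here is a minimal sketch of the primal problem from the start of this section, assuming the cvxpy modeling library and the same kind of made-up separable data as above; any quadratic-programming solver would do.

```python
import numpy as np
import cvxpy as cp

# Made-up linearly separable data with labels in {-1, +1}.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+2.0, 0.5, size=(20, 2)),
               rng.normal(-2.0, 0.5, size=(20, 2))])
y = np.hstack([np.ones(20), -np.ones(20)])

w = cp.Variable(2)
b = cp.Variable()

# Primal: minimize (1/2) ||w||^2  subject to  y_i (w^T x_i + b) >= 1.
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)), constraints).solve()

print("minimal margin 1/||w|| =", 1.0 / np.linalg.norm(w.value))
```

At the optimum only the margin-defining points (the z_1 and z_{-1} of Figure 1) have active constraints; the remaining points could be discarded without changing the solution, as noted above.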

The claim can be seen by evaluating the inner term:

max_{λ ≥ 0} L(x, λ) = f(x)   if g(x) ≤ 0
                      ∞      if g(x) > 0

When we are in the non-feasible region the inner maximum is ∞, which will never be picked by the outer minimization. When we are in the feasible region, the objective function f(x) is minimized.

2.4 Leap of Intuition (Dual Maximal Margin)

Instead of writing the problem in the following way (primal):

min_x max_λ L(x, λ)   s.t.   λ ≥ 0

we swap the two operators and solve max_λ min_x L(x, λ). In general, the two problems do not have the same solution. However, if certain constraint qualifications hold, the solutions are the same. One example of a constraint qualification is Slater's condition, which requires the problem to be strictly feasible. Now, why does swapping make sense? A small numeric sanity check follows; the geometric picture is developed after it.
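Here is a tiny numeric sanity check of the swap on a made-up one-dimensional problem, minimize f(x) = x^2 subject to g(x) = 1 - x ≤ 0, for which Slater's condition clearly holds; the grid-search evaluation is purely for illustration.

```python
import numpy as np

# Made-up convex problem: minimize f(x) = x^2 subject to g(x) = 1 - x <= 0.
f = lambda x: x ** 2
g = lambda x: 1.0 - x

def inner_max(x):
    # max over lam >= 0 of L(x, lam) = f(x) + lam * g(x):
    # equals f(x) when g(x) <= 0, and +infinity otherwise.
    return f(x) if g(x) <= 0 else np.inf

def theta(lam, xs):
    # Dual function theta(lam) = min_x L(x, lam), approximated on a grid.
    return np.min(f(xs) + lam * g(xs))

xs = np.linspace(-3.0, 3.0, 2001)
lams = np.linspace(0.0, 5.0, 2001)

primal = min(inner_max(x) for x in xs)         # min_x max_lam L(x, lam)
dual = max(theta(lam, xs) for lam in lams)     # max_lam min_x L(x, lam)
print(primal, dual)   # both are ~1.0: no duality gap, as Slater's condition predicts
```

The dual value can never exceed the primal value (weak duality); the constraint qualification is what closes the gap here.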

Geometrically, consider a plot of the image of the domain of x under the mapping (g(x), f(x)) (see Figure 2). The optimal primal solution lies on the ordinate, on the lower boundary of the image of this mapping. In the dual problem, the Lagrangian f(x) + λ g(x) is being minimized over x. On the graph this is the y-intercept of the line with slope -λ passing through the point (g(x), f(x)). The minimization finds the smallest such y-intercept, ranging over all x; this defines the dual function. The subsequent maximization of the dual function takes the maximum of such y-intercepts. This yields the same point as the primal solution.

Figure 2: Dual and primal solutions for convex data.

In general, the problem can be solved in the following way: start with a λ, solve to get a lower bound, adjust λ, and solve again. Alternatively, choose an x as a starting point for the primal problem and a λ as a starting point for the dual problem, and then close the gap between the two solutions. These are called primal-dual algorithms. When the set is not convex there exists a duality gap; this is demonstrated in Figure 3.

Figure 3: Duality gap for non-convex data.

2.5 Solving for w and b

We can rewrite the Lagrangian as follows:

L(w, b, α) = (1/2) w^T w + Σ_{i=1}^{n} α_i (1 - y_i (w^T x_i + b))

We first take the derivative with respect to w and set it to zero:

∂L/∂w = 0

This gives

w = Σ_i α_i y_i x_i

Substituting w back into the Lagrangian we get:

Σ_i α_i - Σ_i α_i y_i b - (1/2) Σ_{i,j} α_i α_j y_i y_j x_i^T x_j

Thus, for α such that Σ_i α_i y_i = 0 we get:

θ(α) = Σ_i α_i - (1/2) Σ_{i,j} α_i α_j y_i y_j x_i^T x_j

On the other hand, for α such that Σ_i α_i y_i ≠ 0, we get θ(α) = -∞ (by taking b to infinity). When maximizing the dual function, it is clear that points α with Σ_i α_i y_i ≠ 0 cannot be maxima. Thus we have uncovered an implicit constraint in the problem, namely that Σ_i α_i y_i = 0. The dual problem therefore reduces to maximizing

θ(α) = Σ_i α_i - (1/2) Σ_{i,j} α_i α_j y_i y_j x_i^T x_j

subject to α_i ≥ 0 and Σ_i α_i y_i = 0.
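To make the reduction concrete, here is a minimal sketch that solves this dual with the cvxpy modeling library (an assumption; any QP solver works) on the same kind of made-up data as above, and then recovers w and b.

```python
import numpy as np
import cvxpy as cp

# Made-up linearly separable data with labels in {-1, +1}.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+2.0, 0.5, size=(20, 2)),
               rng.normal(-2.0, 0.5, size=(20, 2))])
y = np.hstack([np.ones(20), -np.ones(20)])

Xy = X * y[:, None]                  # row i is y_i x_i
alpha = cp.Variable(len(y))

# theta(alpha) = sum_i alpha_i - (1/2) sum_{i,j} alpha_i alpha_j y_i y_j x_i^T x_j,
# written via ||sum_i alpha_i y_i x_i||^2 so the solver sees a convex quadratic.
objective = cp.Maximize(cp.sum(alpha) - 0.5 * cp.sum_squares(Xy.T @ alpha))
cp.Problem(objective, [alpha >= 0, y @ alpha == 0]).solve()

a = alpha.value
w = Xy.T @ a                         # w = sum_i alpha_i y_i x_i
sv = int(np.argmax(a))               # a point with alpha_i > 0 lies on the margin
b = y[sv] - X[sv] @ w                # from y_sv (w^T x_sv + b) = 1
print("w =", w, " b =", b)
```

Only a few α_i come out nonzero; those are the points on the margin boundaries (the z_1 and z_{-1} of Figure 1), and the recovered (w, b) should agree with the primal solution, as the duality discussion above predicts. Note also that the dual objective depends on the data only through the inner products x_i^T x_j, tying back to the observation in Section 2.1.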