Efficient Maximum Margin Clustering
1 Dept. of Automation, Tsinghua University. MLA, Nov. 8, 2008, Nanjing, China
2 Outline
3 Outline
4 Support Vector Machine
Given X = {x_1, …, x_n} and y = (y_1, …, y_n) ∈ {−1, +1}^n, SVM finds a hyperplane f(x) = w^T φ(x) + b by solving

  min_{w,b,ξ_i}  (1/2) w^T w + (C/n) Σ_{i=1}^n ξ_i                          (1)
  s.t.  y_i (w^T φ(x_i) + b) ≥ 1 − ξ_i,  ξ_i ≥ 0,  i = 1, …, n
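As a concrete toy illustration of the soft-margin objective in (1), the sketch below minimizes it by subgradient descent on synthetic two-cluster data. This is illustrative only, not the solver used in the talk; the data, learning rate, and function name are all made up.

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Subgradient descent on (1/2) w^T w + (C/n) * sum of hinge losses."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                  # samples with nonzero hinge loss
        grad_w = w - (C / n) * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -(C / n) * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# two well-separated Gaussian clusters (synthetic)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)
w, b = train_linear_svm(X, y, C=10.0)
print((np.sign(X @ w + b) == y).mean())  # separable toy data: accuracy 1.0
```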
5 Maximum Margin Clustering [Xu et al. 2004]
MMC aims to find not only the optimal hyperplane (w*, b*) but also the optimal labeling vector y*:

  min_{y ∈ {−1,+1}^n}  min_{w,b,ξ_i}  (1/2) w^T w + (C/n) Σ_{i=1}^n ξ_i     (2)
  s.t.  y_i (w^T φ(x_i) + b) ≥ 1 − ξ_i,  ξ_i ≥ 0,  i = 1, …, n
6 Representative Works
Semi-definite programming [Xu et al. (NIPS 2004)]: several relaxations are made; the resulting SDP has n² variables; time complexity O(n^7).
7 Representative Works
Semi-definite programming [Valizadegan and Jin (NIPS 2006)]: reduces the number of variables from n² to n; time complexity O(n^4); handles only the 2-class scenario.
8 Representative Works
Alternating optimization [Zhang et al. (ICML 2007)]: involves a sequence of QPs; the number of iterations is not theoretically guaranteed; handles only the 2-class scenario.
9 Outline
10 Problem Reformulation
Theorem. Maximum margin clustering is equivalent to

  min_{w,b,ξ_i}  (1/2) w^T w + (C/n) Σ_{i=1}^n ξ_i                          (3)
  s.t.  |w^T φ(x_i) + b| ≥ 1 − ξ_i,  ξ_i ≥ 0,  i = 1, …, n

where the labeling vector is recovered as y_i = sign(w^T φ(x_i) + b).
11 Problem Reformulation
Theorem. Any solution (w*, b*) to problem (4) is also a solution to problem (3) (and vice versa), with ξ* = (1/n) Σ_{i=1}^n ξ_i*.

  min_{w,b,ξ≥0}  (1/2) w^T w + C ξ                                          (4)
  s.t.  ∀ c ∈ {0, 1}^n :  (1/n) Σ_{i=1}^n c_i |w^T φ(x_i) + b| ≥ (1/n) Σ_{i=1}^n c_i − ξ
12 Problem Reformulation
The number of variables is reduced by 2n − 1, while the number of constraints increases from n to 2^n. However, we can always find a polynomially sized subset of constraints such that the solution of the relaxed problem fulfills all constraints of problem (4) up to a precision of ɛ.
13 Cutting Plane Algorithm [J. E. Kelley 1960]
Starts with an empty constraint subset Ω. Computes the optimal solution to problem (4) subject to the constraints in Ω. Finds the most violated constraint in problem (4) and adds it to Ω. Stops when no constraint in (4) is violated by more than ɛ:

  (1/n) Σ_{i=1}^n c_i |w^T φ(x_i) + b| ≥ (1/n) Σ_{i=1}^n c_i − (ξ + ɛ)      (5)
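The loop above is the generic Kelley cutting-plane scheme. As a self-contained sketch it is applied below to a 1-D toy problem, minimizing x² on an interval via a piecewise-linear lower model, not to the CPMMC QP itself; the grid-based model minimization is a simplification of our own.

```python
import numpy as np

def kelley(f, fprime, lo, hi, eps=1e-4, max_iter=100):
    """Kelley's cutting-plane method: maintain a piecewise-linear lower
    model of f from tangent cuts, minimize the model, add a new cut at
    the minimizer, and stop when the model is accurate to eps there."""
    cuts = []                          # list of (x_i, f(x_i), f'(x_i))
    x = (lo + hi) / 2                  # initial point
    grid = np.linspace(lo, hi, 2001)   # crude model minimization by grid
    for _ in range(max_iter):
        cuts.append((x, f(x), fprime(x)))
        # model value m(x) = max_i [ f(x_i) + f'(x_i) (x - x_i) ]
        model = np.max([fx + g * (grid - xi) for xi, fx, g in cuts], axis=0)
        j = int(np.argmin(model))
        x = float(grid[j])
        if f(x) - model[j] < eps:      # model tight at its minimizer: done
            break
    return x

x_star = kelley(lambda x: x * x, lambda x: 2 * x, -2.0, 3.0)
print(round(x_star, 2))  # close to the true minimizer 0.0
```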
14 The Most Violated Constraint
Theorem. The most violated constraint can be computed as

  c_i = 1 if |w^T φ(x_i) + b| < 1, and c_i = 0 otherwise                    (6)

The feasibility of a constraint is measured by the corresponding value of ξ:

  (1/n) Σ_{i=1}^n c_i |w^T φ(x_i) + b| ≥ (1/n) Σ_{i=1}^n c_i − ξ            (7)
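A minimal sketch of this selection rule, assuming a linear kernel and made-up data:

```python
import numpy as np

def most_violated_constraint(w, b, X):
    """Rule (6): c_i = 1 exactly for the samples whose unsigned
    functional margin |w^T x_i + b| falls inside the margin band."""
    return (np.abs(X @ w + b) < 1).astype(int)

w, b = np.array([1.0, 0.0]), 0.0
X = np.array([[2.0, 0.0], [0.5, 1.0], [-0.3, 2.0], [-1.5, 0.0]])
c = most_violated_constraint(w, b, X)
print(c)  # [0 1 1 0]: only the two points with |w^T x + b| < 1
```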
15 Enforcing the Class Balance Constraint
Enforce a class balance constraint to avoid trivially optimal solutions:

  min_{w,b,ξ≥0}  (1/2) w^T w + C ξ                                          (8)
  s.t.  ∀ c ∈ Ω :  (1/n) Σ_{i=1}^n c_i |w^T φ(x_i) + b| ≥ (1/n) Σ_{i=1}^n c_i − ξ
        −l ≤ Σ_{i=1}^n (w^T φ(x_i) + b) ≤ l
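A small sketch of checking the balance constraint for a candidate (w, b), with synthetic data and an arbitrarily chosen bound l (illustrative only):

```python
import numpy as np

def balance_ok(w, b, X, l):
    """Class balance constraint: -l <= sum_i (w^T x_i + b) <= l,
    which rules out assigning (almost) all points to one cluster."""
    s = (X @ w + b).sum()
    return -l <= s <= l

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 0.1, (10, 2)), rng.normal(1, 0.1, (10, 2))])
w, b = np.array([1.0, 1.0]), 0.0
# balanced split: responses roughly cancel; shifting all points breaks it
print(balance_ok(w, b, X, l=5.0), balance_ok(w, b, X + 1.0, l=5.0))
```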
16 The Constrained Concave-Convex Procedure [A. J. Smola et al. 2005]
Solves non-convex optimization problems whose objective and constraints can each be expressed as a difference of convex functions:

  min_z  f_0(z) − g_0(z)                                                    (9)
  s.t.  f_i(z) − g_i(z) ≤ c_i,  i = 1, …, n

where the f_i and g_i are real-valued convex functions on a vector space Z and c_i ∈ R for all i = 1, …, n.
17 The Constrained Concave-Convex Procedure
Given an initial point z_0, the CCCP computes z_{t+1} from z_t by replacing each g_i(z) with its first-order Taylor expansion T_1{g_i, z_t}(z) at z_t:

  min_z  f_0(z) − T_1{g_0, z_t}(z)                                          (10)
  s.t.  f_i(z) − T_1{g_i, z_t}(z) ≤ c_i,  i = 1, …, n
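To make the CCCP iteration concrete, the toy sketch below applies it to a 1-D difference of convex functions, h(z) = z^4 − 2z^2 with f_0(z) = z^4 and g_0(z) = 2z^2; this example is ours, not from the talk. Linearizing g_0 at z_t gives the convex surrogate z^4 − 4 z_t z, whose minimizer has the closed form z = z_t^(1/3).

```python
import numpy as np

# Toy DC program: minimize h(z) = z^4 - 2 z^2, written as
# f0(z) = z^4 (convex) minus g0(z) = 2 z^2 (convex).
f0 = lambda z: z ** 4
g0 = lambda z: 2 * z ** 2
h = lambda z: f0(z) - g0(z)

def cccp(z0, iters=50):
    """Each CCCP step replaces g0 by its tangent at z_t and minimizes the
    convex surrogate z^4 - 4 z_t z exactly: setting the derivative
    4 z^3 - 4 z_t to zero gives z = z_t^(1/3)."""
    z = z0
    for _ in range(iters):
        z_new = np.cbrt(z)
        assert h(z_new) <= h(z) + 1e-9   # CCCP never increases the objective
        z = z_new
    return z

z_star = cccp(0.1)
print(round(z_star, 4), round(h(z_star), 4))  # converges to the local minimum z = 1, h = -1
```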
18 Optimization via the CCCP
Substituting the first-order Taylor expansion into problem (8) yields the following quadratic programming (QP) problem:

  min_{w,b,ξ}  (1/2) w^T w + C ξ                                            (11)
  s.t.  ξ ≥ 0
        −l ≤ Σ_{i=1}^n (w^T φ(x_i) + b) ≤ l
        ∀ c ∈ Ω :  (1/n) Σ_{i=1}^n c_i sign(w_t^T φ(x_i) + b_t) (w^T φ(x_i) + b) ≥ (1/n) Σ_{i=1}^n c_i − ξ
19 Justification of CPMMC Theorem For any dataset X = (x 1,..., x n ) and any ɛ > 0, the CPMMC algorithm for maximum margin clustering returns a point (w, b, ξ) for which (w, b, ξ + ɛ) is feasible in problem (4).
20 Time Complexity Analysis
Theorem. Each iteration of CPMMC takes time O(sn) for a constant working set size |Ω|.
Theorem. For any ɛ > 0, C > 0, and any dataset X = {x_1, …, x_n}, the CPMMC algorithm terminates after adding at most CR/ɛ² constraints, where R is a constant independent of n and s.
21 Time Complexity Analysis Theorem For any dataset X = {x 1,..., x n } with n samples and sparsity of s, and any fixed value of C > 0 and ɛ > 0, the CPMMC algorithm takes time O(sn).
22 Clustering Error Comparison
Table: clustering error (mean ± std) of KM, NC, MMC, GMMC, IterSVR, and CPMMC on Digits (four pairs), Ionosphere, Letter, Satellite, and Text (two subsets); the numeric entries were lost in transcription. For the UCI digits and MNIST datasets, a thorough comparison considers all 45 pairs of digits 0-9. For NC/MMC/GMMC/IterSVR, results on the digits and Ionosphere data are copied from (Zhang et al., 2007).
23 Speed of CPMMC
Table: CPU time of KM, NC, GMMC, IterSVR, and CPMMC on Digits (four pairs), Ionosphere, Letter, Satellite, and Text (two subsets); the numeric entries were lost in transcription.
24 Dataset Size n vs. Speed
Figure: CPU time (seconds) versus number of samples on Letter, Satellite, Text 1, Text 2, MNIST 1vs2, and MNIST 1vs7, with an O(n) reference line.
25 ɛ vs. Accuracy & Speed
Figure (a): clustering accuracy versus ɛ on UCI Digits 3vs8, 1vs7, 2vs7, 8vs9, Letter, Text 1, and Text 2. Figure (b): CPU time (seconds) versus ɛ on the same datasets, with an O(x^0.5) reference curve.
26 C vs. Accuracy & Speed
Figure (a): clustering accuracy versus C on UCI 3vs8, 1vs7, 2vs7, 8vs9, Letter, Satellite, and Ionosphere. Figure (b): CPU time (seconds) versus C on the same datasets, with an O(x) reference curve.
27 Outline
28 Multi-Class Support Vector Machine [Crammer & Singer 2001]
Given a point set X = {x_1, …, x_n} and labels y = (y_1, …, y_n) ∈ {1, …, k}^n, the multi-class SVM defines a weight vector w_p for each class p ∈ {1, …, k} and classifies a sample x as y* = argmax_{y ∈ {1,…,k}} w_y^T x, with the weight vectors obtained as

  min_{w_1,…,w_k,ξ}  (β/2) Σ_{p=1}^k ‖w_p‖² + (1/n) Σ_{i=1}^n ξ_i           (12)
  s.t.  w_{y_i}^T x_i + δ_{y_i,r} − w_r^T x_i ≥ 1 − ξ_i,  i = 1, …, n,  r = 1, …, k
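A minimal sketch of the decision rule y* = argmax_p w_p^T x, with made-up weight vectors (illustrative only):

```python
import numpy as np

def predict(W, X):
    """Crammer-Singer decision rule: y = argmax_p w_p^T x.
    W stacks the k weight vectors as rows; classes are numbered 1..k."""
    return np.argmax(X @ W.T, axis=1) + 1

# k = 3 weight vectors in 2-D (illustrative values)
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
X = np.array([[2.0, 0.1], [0.1, 2.0], [-2.0, -2.0]])
print(predict(W, X))  # [1 2 3]
```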
29 Similar to the binary clustering scenario, multi-class MMC also minimizes over the labeling vector y:

  min_y  min_{w_1,…,w_k,ξ}  (β/2) Σ_{p=1}^k ‖w_p‖² + (1/n) Σ_{i=1}^n ξ_i    (13)
  s.t.  w_{y_i}^T x_i + δ_{y_i,r} − w_r^T x_i ≥ 1 − ξ_i,  i = 1, …, n,  r = 1, …, k
30 Problem Reformulation I
Theorem.

  min_{w_1,…,w_k,ξ}  (β/2) Σ_{p=1}^k ‖w_p‖² + (1/n) Σ_{i=1}^n ξ_i           (14)
  s.t.  Σ_{p=1}^k w_p^T x_i Π_{q=1,q≠p}^k I(w_p^T x_i > w_q^T x_i) + Π_{q=1,q≠r}^k I(w_r^T x_i > w_q^T x_i) − w_r^T x_i ≥ 1 − ξ_i,
        i = 1, …, n,  r = 1, …, k

where I(·) is the indicator function and the label for sample x_i is determined as y_i = Σ_{p=1}^k p Π_{q=1,q≠p}^k I(w_p^T x_i > w_q^T x_i).
31 Problem Reformulation II
Theorem. Problem (14) can be equivalently formulated as problem (15), with ξ* = (1/n) Σ_{i=1}^n ξ_i*:

  min_{w_1,…,w_k,ξ}  (β/2) Σ_{p=1}^k ‖w_p‖² + ξ                             (15)
  s.t.  ∀ c_i ∈ {e_0, e_1, …, e_k},  i = 1, …, n :
        (1/n) Σ_{i=1}^n [ c_i^T e Σ_{p=1}^k w_p^T x_i z_ip + Σ_{p=1}^k c_ip (z_ip − w_p^T x_i) ] ≥ (1/n) Σ_{i=1}^n c_i^T e − ξ

where z_ip = Π_{q=1,q≠p}^k I(w_p^T x_i > w_q^T x_i) and each constraint c is represented as a k × n matrix c = (c_1, …, c_n).
32 Problem Reformulation
The number of variables is reduced by 2n − 1, while the number of constraints increases from nk to (k + 1)^n. The goal is to find a small subset of constraints such that the solution of the relaxed problem fulfills all constraints of problem (15) up to a precision of ɛ.
33 Cutting Plane Algorithm [J. E. Kelley 1960, T. Joachims 2006]
Starts with an empty constraint subset Ω. Computes the optimal solution to problem (15) subject to the constraints in Ω. Finds the most violated constraint in problem (15) and adds it to Ω. Stops when no constraint in (15) is violated by more than ɛ:

  ∀ c_i ∈ {e_0, e_1, …, e_k},  i = 1, …, n :                                (16)
  (1/n) Σ_{i=1}^n [ c_i^T e Σ_{p=1}^k w_p^T x_i z_ip + Σ_{p=1}^k c_ip (z_ip − w_p^T x_i) ] ≥ (1/n) Σ_{i=1}^n c_i^T e − ξ − ɛ
34 The Most Violated Constraint
Theorem. Define p* = argmax_p (w_p^T x_i) and r* = argmax_{r ≠ p*} (w_r^T x_i) for i = 1, …, n. The most violated constraint can be calculated as

  c_i = e_{r*} if (w_{p*}^T x_i − w_{r*}^T x_i) < 1, and c_i = 0 otherwise,  i = 1, …, n     (17)
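A sketch of rule (17), assuming a linear kernel and made-up weights; `most_violated` is a hypothetical helper name:

```python
import numpy as np

def most_violated(W, X):
    """Rule (17): with p* the best class and r* the runner-up for x_i,
    c_i is the unit vector e_{r*} when the margin w_{p*}^T x - w_{r*}^T x
    falls below 1, and the zero vector otherwise."""
    n, k = X.shape[0], W.shape[0]
    scores = X @ W.T                        # n x k class scores
    order = np.argsort(-scores, axis=1)
    p_star, r_star = order[:, 0], order[:, 1]
    margin = scores[np.arange(n), p_star] - scores[np.arange(n), r_star]
    C = np.zeros((n, k))
    viol = margin < 1
    C[np.arange(n)[viol], r_star[viol]] = 1
    return C

W = np.array([[1.0, 0.0], [0.0, 1.0]])
X = np.array([[2.0, 0.0], [0.6, 0.4], [0.0, 3.0]])
C = most_violated(W, X)
print(C)  # only the middle sample (margin 0.2 < 1) gets a unit vector
```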
35 Enforcing the Class Balance Constraint
To avoid trivially optimal solutions:

  min_{w_1,…,w_k,ξ≥0}  (β/2) Σ_{p=1}^k ‖w_p‖² + ξ                           (18)
  s.t.  ∀ [c_1, …, c_n] ∈ Ω :
        (1/n) Σ_{i=1}^n [ c_i^T e Σ_{p=1}^k w_p^T x_i z_ip + Σ_{p=1}^k c_ip (z_ip − w_p^T x_i) ] ≥ (1/n) Σ_{i=1}^n c_i^T e − ξ
        −l ≤ Σ_{i=1}^n w_p^T x_i − Σ_{i=1}^n w_q^T x_i ≤ l,  p, q = 1, …, k
36 Optimization via the CCCP
Calculate the subgradients

  ∂/∂w_r [ (1/n) Σ_{i=1}^n ( c_i^T e Σ_{p=1}^k w_p^T x_i z_ip + Σ_{p=1}^k c_ip z_ip ) ] |_{w=w^(t)} = (1/n) Σ_{i=1}^n c_i^T e z_ir^(t) x_i,  r = 1, …, k     (19)

Substituting the first-order Taylor expansion into problem (18) yields a quadratic programming (QP) problem.
37 Justification of CPM3C Theorem For any dataset X = (x 1,..., x n ) and any ɛ > 0, the CPM3C algorithm returns a point (w 1,..., w k, ξ) for which (w 1,..., w k, ξ + ɛ) is feasible.
38 Clustering Accuracy Comparison
Table: clustering accuracy of KM, NC, MMC, and CPM3C on Dig (two subsets), Cora (DS, HA, ML, OS, PL), WebKB (CL, TX, WT, WC), 20-newsgroups, and RCV1; the numeric entries were lost in transcription.
39 Rand Index Comparison
Table: Rand index of KM, NC, MMC, and CPM3C on the same datasets (Dig, Cora, WebKB, 20-newsgroups, RCV1); the numeric entries were lost in transcription.
40 Speed Comparison
Table: CPU time of KM and CPM3C on the same datasets (Dig, Cora, WebKB, 20-newsgroups, RCV1); the numeric entries were lost in transcription.
41 Dataset Size n vs. Speed
Figure (left): CPU time (seconds) versus number of samples on Cora-DS, Cora-HA, Cora-ML, Cora-OS, Cora-PL, and 20News, with an O(n) reference line. Figure (right): the same on WK-CL, WK-HA, WK-WT, WK-WC, and RCV1.
42 ɛ vs. Accuracy
Figure (left): clustering accuracy versus ɛ on Cora-DS, Cora-HA, Cora-ML, Cora-OS, Cora-PL, and 20News. Figure (right): clustering accuracy versus ɛ on WK-CL, WK-TX, WK-WT, and WK-WC.
43 ɛ vs. Speed
Figure (left): CPU time (seconds) versus ɛ on Cora-DS, Cora-HA, Cora-ML, Cora-OS, Cora-PL, and 20News, with an O(x^0.5) reference curve. Figure (right): the same on WK-CL, WK-TX, WK-WT, WK-WC, and RCV1.
44 Outline
45 Semi-Supervised Support Vector Machine
Given X = {x_1, …, x_l, x_{l+1}, …, x_n}, where the first l points are labeled with y_i ∈ {−1, +1} and the remaining u = n − l points are unlabeled:

  min_{y_{l+1},…,y_n}  min_{w,b,ξ}  (1/2) w^T w + (C_l/n) Σ_{i=1}^l ξ_i + (C_u/n) Σ_{j=l+1}^n ξ_j     (20)
  s.t.  y_i [w^T φ(x_i) + b] ≥ 1 − ξ_i,  i = 1, …, l
        y_j [w^T φ(x_j) + b] ≥ 1 − ξ_j,  j = l + 1, …, n
        ξ_i ≥ 0,  i = 1, …, l;  ξ_j ≥ 0,  j = l + 1, …, n
46 CutS3VM: Fast S3VM Algorithm
Theorem. Problem (20) is equivalent to

  min_{w,b,ξ}  (1/2) w^T w + (C_l/n) Σ_{i=1}^l ξ_i + (C_u/n) Σ_{j=l+1}^n ξ_j     (21)
  s.t.  y_i [w^T φ(x_i) + b] ≥ 1 − ξ_i,  i = 1, …, l
        |w^T φ(x_j) + b| ≥ 1 − ξ_j,  j = l + 1, …, n
        ξ_i ≥ 0,  i = 1, …, l;  ξ_j ≥ 0,  j = l + 1, …, n

where the labels y_j, j = l + 1, …, n are recovered as y_j = sign(w^T φ(x_j) + b).
47 CutS3VM: Fast S3VM Algorithm
Theorem. Problem (21) can be equivalently formulated as

  min_{w,ξ≥0}  (1/2) w^T w + ξ                                              (22)
  s.t.  ∀ c ∈ {0, 1}^n :
        (1/n) [ C_l Σ_{i=1}^l c_i y_i (w^T φ(x_i) + b) + C_u Σ_{j=l+1}^n c_j |w^T φ(x_j) + b| ]
        ≥ (1/n) [ C_l Σ_{i=1}^l c_i + C_u Σ_{j=l+1}^n c_j ] − ξ

and any solution w* to problem (22) is also a solution to problem (21) (and vice versa), with ξ* = (C_l/n) Σ_{i=1}^l ξ_i* + (C_u/n) Σ_{j=l+1}^n ξ_j*.
48 Maximum Margin Embedding
Traditional embedding methods find the optimal subspace by minimizing some form of average loss or cost. MME instead directly finds the most discriminative subspace, in which the clusters are most well separated, and it is insensitive to the actual probability distribution of patterns lying far from the separating hyperplanes.
49 Maximum Margin Embedding

  min_{y,w,b,ξ_i}  w^T w + (C/n) Σ_{i=1}^n ξ_i                              (23)
  s.t.  y_i (w^T φ(x_i) + b) ≥ 1 − ξ_i,  i = 1, …, n
        A^T w = 0
        y ∈ {−1, +1}^n

where A = [w_1, …, w_{r−1}] constrains w to be orthogonal to all previously computed projection vectors.
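The constraint A^T w = 0 can be maintained by projecting a candidate direction onto the orthogonal complement of the previously found ones. A small sketch (our illustration, not the talk's algorithm):

```python
import numpy as np

def project_orthogonal(w, A):
    """Enforce A^T w = 0 by projecting w onto the orthogonal complement
    of the span of the previously found directions (columns of A)."""
    if A.shape[1] == 0:
        return w
    Q, _ = np.linalg.qr(A)          # orthonormal basis for span(A)
    return w - Q @ (Q.T @ w)

w1 = np.array([1.0, 0.0, 0.0])      # first projection direction (given)
A = w1[:, None]
w = np.array([3.0, 4.0, 5.0])       # candidate second direction
w2 = project_orthogonal(w, A)
print(w2, float(A.T @ w2))          # component along w1 removed; A^T w2 = 0
```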
50 Outline
51 Improvements
No loss in clustering accuracy. Major improvement in speed. Handles large real-world datasets efficiently.
52 Future Work
Automatically tune the parameters. Scale to even larger datasets.
53 References
Bin Zhao, Fei Wang, Changshui Zhang. Maximum Margin Embedding. ICDM 2008.
Bin Zhao, Fei Wang, Changshui Zhang. CutS3VM: A Fast Semi-Supervised SVM Algorithm. KDD 2008.
Bin Zhao, Fei Wang, Changshui Zhang. Efficient Multiclass Maximum Margin Clustering. ICML 2008.
Bin Zhao, Fei Wang, Changshui Zhang. Efficient Maximum Margin Clustering via Cutting Plane Algorithm. SDM 2008.
54 Thanks for Listening
ECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Oct 27, 2015 Outline One versus all/one versus one Ranking loss for multiclass/multilabel classification Scaling to millions of labels Multiclass
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Oct 18, 2016 Outline One versus all/one versus one Ranking loss for multiclass/multilabel classification Scaling to millions of labels Multiclass
More informationSupport Vector Machines for Classification: A Statistical Portrait
Support Vector Machines for Classification: A Statistical Portrait Yoonkyung Lee Department of Statistics The Ohio State University May 27, 2011 The Spring Conference of Korean Statistical Society KAIST,
More informationSupport Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012
Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Linear classifier Which classifier? x 2 x 1 2 Linear classifier Margin concept x 2
More informationCSC 411 Lecture 17: Support Vector Machine
CSC 411 Lecture 17: Support Vector Machine Ethan Fetaya, James Lucas and Emad Andrews University of Toronto CSC411 Lec17 1 / 1 Today Max-margin classification SVM Hard SVM Duality Soft SVM CSC411 Lec17
More informationJeff Howbert Introduction to Machine Learning Winter
Classification / Regression Support Vector Machines Jeff Howbert Introduction to Machine Learning Winter 2012 1 Topics SVM classifiers for linearly separable classes SVM classifiers for non-linearly separable
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2014 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationMachine Learning. Support Vector Machines. Manfred Huber
Machine Learning Support Vector Machines Manfred Huber 2015 1 Support Vector Machines Both logistic regression and linear discriminant analysis learn a linear discriminant function to separate the data
More informationMachine Learning. Support Vector Machines. Fabio Vandin November 20, 2017
Machine Learning Support Vector Machines Fabio Vandin November 20, 2017 1 Classification and Margin Consider a classification problem with two classes: instance set X = R d label set Y = { 1, 1}. Training
More informationSupport Vector Machines for Classification and Regression
CIS 520: Machine Learning Oct 04, 207 Support Vector Machines for Classification and Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may
More informationOptimal Margin Distribution Clustering
Optimal Margin Distribution Clustering eng Zhang and Zhi-Hua Zhou National Key Laboratory for Novel Software echnology, Nanjing University, Nanjing 20023, China Collaborative Innovation Center of Novel
More informationLecture Notes on Support Vector Machine
Lecture Notes on Support Vector Machine Feng Li fli@sdu.edu.cn Shandong University, China 1 Hyperplane and Margin In a n-dimensional space, a hyper plane is defined by ω T x + b = 0 (1) where ω R n is
More informationLarge Scale Semi-supervised Linear SVMs. University of Chicago
Large Scale Semi-supervised Linear SVMs Vikas Sindhwani and Sathiya Keerthi University of Chicago SIGIR 2006 Semi-supervised Learning (SSL) Motivation Setting Categorize x-billion documents into commercial/non-commercial.
More informationLinear vs Non-linear classifier. CS789: Machine Learning and Neural Network. Introduction
Linear vs Non-linear classifier CS789: Machine Learning and Neural Network Support Vector Machine Jakramate Bootkrajang Department of Computer Science Chiang Mai University Linear classifier is in the
More informationPerceptron Revisited: Linear Separators. Support Vector Machines
Support Vector Machines Perceptron Revisited: Linear Separators Binary classification can be viewed as the task of separating classes in feature space: w T x + b > 0 w T x + b = 0 w T x + b < 0 Department
More informationA Randomized Algorithm for Large Scale Support Vector Learning
A Randomized Algorithm for Large Scale Support Vector Learning Krishnan S. Department of Computer Science and Automation, Indian Institute of Science, Bangalore-12 rishi@csa.iisc.ernet.in Chiranjib Bhattacharyya
More informationSupport vector machines Lecture 4
Support vector machines Lecture 4 David Sontag New York University Slides adapted from Luke Zettlemoyer, Vibhav Gogate, and Carlos Guestrin Q: What does the Perceptron mistake bound tell us? Theorem: The
More informationMathematical Programming for Multiple Kernel Learning
Mathematical Programming for Multiple Kernel Learning Alex Zien Fraunhofer FIRST.IDA, Berlin, Germany Friedrich Miescher Laboratory, Tübingen, Germany 07. July 2009 Mathematical Programming Stream, EURO
More informationSupport Vector Machine via Nonlinear Rescaling Method
Manuscript Click here to download Manuscript: svm-nrm_3.tex Support Vector Machine via Nonlinear Rescaling Method Roman Polyak Department of SEOR and Department of Mathematical Sciences George Mason University
More informationNon-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines
Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2018 CS 551, Fall
More informationMinimum Conditional Entropy Clustering: A Discriminative Framework for Clustering
JMLR: Workshop and Conference Proceedings 13: 47-62 2nd Asian Conference on Machine Learning (ACML2010), Tokyo, Japan, Nov. 8 10, 2010. Minimum Conditional Entropy Clustering: A Discriminative Framework
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2015 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationSupport Vector Machines for Classification and Regression. 1 Linearly Separable Data: Hard Margin SVMs
E0 270 Machine Learning Lecture 5 (Jan 22, 203) Support Vector Machines for Classification and Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in
More informationOutline. Basic concepts: SVM and kernels SVM primal/dual problems. Chih-Jen Lin (National Taiwan Univ.) 1 / 22
Outline Basic concepts: SVM and kernels SVM primal/dual problems Chih-Jen Lin (National Taiwan Univ.) 1 / 22 Outline Basic concepts: SVM and kernels Basic concepts: SVM and kernels SVM primal/dual problems
More informationA short introduction to supervised learning, with applications to cancer pathway analysis Dr. Christina Leslie
A short introduction to supervised learning, with applications to cancer pathway analysis Dr. Christina Leslie Computational Biology Program Memorial Sloan-Kettering Cancer Center http://cbio.mskcc.org/leslielab
More informationLecture 18: Kernels Risk and Loss Support Vector Regression. Aykut Erdem December 2016 Hacettepe University
Lecture 18: Kernels Risk and Loss Support Vector Regression Aykut Erdem December 2016 Hacettepe University Administrative We will have a make-up lecture on next Saturday December 24, 2016 Presentations
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2016 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationCS798: Selected topics in Machine Learning
CS798: Selected topics in Machine Learning Support Vector Machine Jakramate Bootkrajang Department of Computer Science Chiang Mai University Jakramate Bootkrajang CS798: Selected topics in Machine Learning
More informationCutting Plane Training of Structural SVM
Cutting Plane Training of Structural SVM Seth Neel University of Pennsylvania sethneel@wharton.upenn.edu September 28, 2017 Seth Neel (Penn) Short title September 28, 2017 1 / 33 Overview Structural SVMs
More informationLearning with kernels and SVM
Learning with kernels and SVM Šámalova chata, 23. května, 2006 Petra Kudová Outline Introduction Binary classification Learning with Kernels Support Vector Machines Demo Conclusion Learning from data find
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationSupport vector machines
Support vector machines Guillaume Obozinski Ecole des Ponts - ParisTech SOCN course 2014 SVM, kernel methods and multiclass 1/23 Outline 1 Constrained optimization, Lagrangian duality and KKT 2 Support
More informationHOMEWORK 4: SVMS AND KERNELS
HOMEWORK 4: SVMS AND KERNELS CMU 060: MACHINE LEARNING (FALL 206) OUT: Sep. 26, 206 DUE: 5:30 pm, Oct. 05, 206 TAs: Simon Shaolei Du, Tianshu Ren, Hsiao-Yu Fish Tung Instructions Homework Submission: Submit
More informationIntroduction to Support Vector Machines
Introduction to Support Vector Machines Hsuan-Tien Lin Learning Systems Group, California Institute of Technology Talk in NTU EE/CS Speech Lab, November 16, 2005 H.-T. Lin (Learning Systems Group) Introduction
More informationLecture 9: PGM Learning
13 Oct 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401 Table of Contents I Learning parameters in MRFs 1 Learning parameters in MRFs Inference and Learning Given parameters (of potentials) and
More informationWarm up: risk prediction with logistic regression
Warm up: risk prediction with logistic regression Boss gives you a bunch of data on loans defaulting or not: {(x i,y i )} n i= x i 2 R d, y i 2 {, } You model the data as: P (Y = y x, w) = + exp( yw T
More informationCS-E4830 Kernel Methods in Machine Learning
CS-E4830 Kernel Methods in Machine Learning Lecture 5: Multi-class and preference learning Juho Rousu 11. October, 2017 Juho Rousu 11. October, 2017 1 / 37 Agenda from now on: This week s theme: going
More informationData Mining. Linear & nonlinear classifiers. Hamid Beigy. Sharif University of Technology. Fall 1396
Data Mining Linear & nonlinear classifiers Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1396 1 / 31 Table of contents 1 Introduction
More informationCSE 151 Machine Learning. Instructor: Kamalika Chaudhuri
CSE 151 Machine Learning Instructor: Kamalika Chaudhuri Linear Classification Given labeled data: (xi, feature vector yi) label i=1,..,n where y is 1 or 1, find a hyperplane to separate from Linear Classification
More informationDiscriminative Models
No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models
More informationAn Improved Conjugate Gradient Scheme to the Solution of Least Squares SVM
An Improved Conjugate Gradient Scheme to the Solution of Least Squares SVM Wei Chu Chong Jin Ong chuwei@gatsby.ucl.ac.uk mpeongcj@nus.edu.sg S. Sathiya Keerthi mpessk@nus.edu.sg Control Division, Department
More informationProbabilistic Graphical Models
School of Computer Science Probabilistic Graphical Models Max-margin learning of GM Eric Xing Lecture 28, Apr 28, 2014 b r a c e Reading: 1 Classical Predictive Models Input and output space: Predictive
More informationCS6375: Machine Learning Gautam Kunapuli. Support Vector Machines
Gautam Kunapuli Example: Text Categorization Example: Develop a model to classify news stories into various categories based on their content. sports politics Use the bag-of-words representation for this
More informationICS-E4030 Kernel Methods in Machine Learning
ICS-E4030 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 28. September, 2016 Juho Rousu 28. September, 2016 1 / 38 Convex optimization Convex optimisation This
More informationDiscriminative Models
No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models
More informationMachine Learning And Applications: Supervised Learning-SVM
Machine Learning And Applications: Supervised Learning-SVM Raphaël Bournhonesque École Normale Supérieure de Lyon, Lyon, France raphael.bournhonesque@ens-lyon.fr 1 Supervised vs unsupervised learning Machine
More informationPU Learning for Matrix Completion
Cho-Jui Hsieh Dept of Computer Science UT Austin ICML 2015 Joint work with N. Natarajan and I. S. Dhillon Matrix Completion Example: movie recommendation Given a set Ω and the values M Ω, how to predict
More informationIntroduction to Support Vector Machines
Introduction to Support Vector Machines Andreas Maletti Technische Universität Dresden Fakultät Informatik June 15, 2006 1 The Problem 2 The Basics 3 The Proposed Solution Learning by Machines Learning
More informationLecture 10: A brief introduction to Support Vector Machine
Lecture 10: A brief introduction to Support Vector Machine Advanced Applied Multivariate Analysis STAT 2221, Fall 2013 Sungkyu Jung Department of Statistics, University of Pittsburgh Xingye Qiao Department
More informationSupport Vector Machines.
Support Vector Machines www.cs.wisc.edu/~dpage 1 Goals for the lecture you should understand the following concepts the margin slack variables the linear support vector machine nonlinear SVMs the kernel
More informationReducing Multiclass to Binary: A Unifying Approach for Margin Classifiers
Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers Erin Allwein, Robert Schapire and Yoram Singer Journal of Machine Learning Research, 1:113-141, 000 CSE 54: Seminar on Learning
More informationTrading Convexity for Scalability
Trading Convexity for Scalability Léon Bottou leon@bottou.org Ronan Collobert, Fabian Sinz, Jason Weston ronan@collobert.com, fabee@tuebingen.mpg.de, jasonw@nec-labs.com NEC Laboratories of America A Word
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Support Vector Machines Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique
More informationLecture 10: Support Vector Machine and Large Margin Classifier
Lecture 10: Support Vector Machine and Large Margin Classifier Applied Multivariate Analysis Math 570, Fall 2014 Xingye Qiao Department of Mathematical Sciences Binghamton University E-mail: qiao@math.binghamton.edu
More informationKernel Methods and Support Vector Machines
Kernel Methods and Support Vector Machines Oliver Schulte - CMPT 726 Bishop PRML Ch. 6 Support Vector Machines Defining Characteristics Like logistic regression, good for continuous input features, discrete
More informationExploiting Primal and Dual Sparsity for Extreme Classification 1
Exploiting Primal and Dual Sparsity for Extreme Classification 1 Ian E.H. Yen Joint work with Xiangru Huang, Kai Zhong, Pradeep Ravikumar and Inderjit Dhillon Machine Learning Department Carnegie Mellon
More informationLearning SVM Classifiers with Indefinite Kernels
Learning SVM Classifiers with Indefinite Kernels Suicheng Gu and Yuhong Guo Dept. of Computer and Information Sciences Temple University Support Vector Machines (SVMs) (Kernel) SVMs are widely used in
More informationIntroduction to Support Vector Machines
Introduction to Support Vector Machines Shivani Agarwal Support Vector Machines (SVMs) Algorithm for learning linear classifiers Motivated by idea of maximizing margin Efficient extension to non-linear
More informationOnline Passive- Aggressive Algorithms
Online Passive- Aggressive Algorithms Jean-Baptiste Behuet 28/11/2007 Tutor: Eneldo Loza Mencía Seminar aus Maschinellem Lernen WS07/08 Overview Online algorithms Online Binary Classification Problem Perceptron
More informationDiscriminative Unsupervised Learning of Structured Predictors
Linli Xu l5xu@cs.uwaterloo.ca Dana Wilkinson d3wilkin@cs.uwaterloo.ca School of Computer Science, University of Waterloo, Waterloo ON, Canada Alberta Ingenuity Centre for Machine Learning, University of
More informationSemi-Supervised Classification with Universum
Semi-Supervised Classification with Universum Dan Zhang 1, Jingdong Wang 2, Fei Wang 3, Changshui Zhang 4 1,3,4 State Key Laboratory on Intelligent Technology and Systems, Tsinghua National Laboratory
More informationCIS 520: Machine Learning Oct 09, Kernel Methods
CIS 520: Machine Learning Oct 09, 207 Kernel Methods Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture They may or may not cover all the material discussed
More informationSupport Vector Machines, Kernel SVM
Support Vector Machines, Kernel SVM Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 27, 2017 1 / 40 Outline 1 Administration 2 Review of last lecture 3 SVM
More informationSecond Order Cone Programming, Missing or Uncertain Data, and Sparse SVMs
Second Order Cone Programming, Missing or Uncertain Data, and Sparse SVMs Ammon Washburn University of Arizona September 25, 2015 1 / 28 Introduction We will begin with basic Support Vector Machines (SVMs)
More informationWarm up. Regrade requests submitted directly in Gradescope, do not instructors.
Warm up Regrade requests submitted directly in Gradescope, do not email instructors. 1 float in NumPy = 8 bytes 10 6 2 20 bytes = 1 MB 10 9 2 30 bytes = 1 GB For each block compute the memory required
More informationA convex relaxation for weakly supervised classifiers
A convex relaxation for weakly supervised classifiers Armand Joulin and Francis Bach SIERRA group INRIA -Ecole Normale Supérieure ICML 2012 Weakly supervised classification We adress the problem of weakly