Second Order Cone Programming, Missing or Uncertain Data, and Sparse SVMs


1 Second Order Cone Programming, Missing or Uncertain Data, and Sparse SVMs
Ammon Washburn, University of Arizona, September 25

2 Introduction
- We will begin with basic Support Vector Machines (SVMs), or maximum margin algorithms
- We will introduce missing or uncertain data into the training data
- We will reformulate the resulting chance constrained programs (CCPs) into SOCPs that we can solve using different kinds of information
- Introduce sparse SVMs and why they are used
- Slight digression on ν-SVMs
- Talk about future research areas

3 SVMs and MM programs
The basic (linear) maximum margin (MM) program is defined by the following optimization problem:
$$\min_{w,b,\xi_i}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^m \xi_i \quad \text{s.t.}\quad y_i(w^\top x_i - b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1,\dots,m$$
This program finds a hyperplane between the groups of data points and uses it to categorize new data. $\xi_i$ is the slack penalty paid so that a data point may be treated as if it were on the right side, while $C$ is a heuristic trade-off constant. $\|w\|$ and the margin between the groups are inversely related.
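As a concrete illustration, here is a minimal sketch of this soft-margin program in Python with cvxpy; the helper name `soft_margin_svm` and the solver choice are mine, not from the talk.

```python
# Minimal sketch of the soft-margin SVM above using cvxpy (assumption:
# any QP/SOCP-capable solver; the talk does not prescribe one).
import numpy as np
import cvxpy as cp

def soft_margin_svm(X, y, C=1.0):
    """X: (m, p) data matrix, y: length-m numpy vector of +/-1 labels."""
    m, p = X.shape
    w = cp.Variable(p)
    b = cp.Variable()
    xi = cp.Variable(m, nonneg=True)          # slack variables xi_i >= 0
    margins = cp.multiply(y, X @ w - b)       # y_i (w'x_i - b)
    objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi))
    cp.Problem(objective, [margins >= 1 - xi]).solve()
    return w.value, b.value
```

After solving, the points with $\xi_i > 0$ are exactly the margin violators that reappear later in the ν-SVM discussion.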

4 Missing or Uncertain data
When dealing with missing or uncertain data we reformulate the problem as a Chance Constrained Program (CCP):
$$\min_{w,b,\xi_i}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^m \xi_i \quad \text{s.t.}\quad \Pr\big(y_i(w^\top X_i - b) \ge 1 - \xi_i\big) \ge 1 - \epsilon,\ \ \xi_i \ge 0,\ \ i = 1,\dots,m$$
- Generally intractable even if the underlying probability distributions of the $X_i$ are known
- Want to find stronger and easier convex conditions
- These should also hold for any probability distribution (in a given class)
- Robust means the worst-case distribution

5 Using just Support in Robust Formulation
- Suppose we know the support of each variable, i.e. $x_i \in \{x : D_i x \le d_i\}$
- If $\epsilon = 0$ then in the robust formulation we pick the worst point(s) and proceed as in the original formulation
- This ensures no misclassification of the training data
The formulation is [4]:
$$\min_{w,b}\ \frac{1}{2}\|w\|^2 \quad \text{s.t.}\quad \min_{\{x :\, D_i x \le d_i\}} y_i(w^\top x - b) \ge 1,\ \ i = 1,\dots,n$$

6 [Figure]

7 Transductive SVMs: Introduction
The training data set is $D = \{(x_i, c_i) \mid x_i \in \mathbb{R}^p,\ c_i \in \{-1, 1\}\}_{i=1}^n$, and the test data set to classify is $D' = \{x_j \mid x_j \in \mathbb{R}^p\}_{j=1}^m$.

8 Transductive SVMs: Introduction
The optimization model for the transductive SVM can be formulated as
$$\min_{w,b,c_j}\ \frac{1}{2} w^\top w \quad \text{s.t.}\quad c_i(w^\top x_i - b) \ge 1,\ i = 1,\dots,n; \qquad c_j(w^\top x_j - b) \ge 1,\ \ c_j \in \{-1, 1\},\ j = 1,\dots,m$$
where the decision variable $c_j$ is used to classify the point $x_j$ in the test data set.

9 Using Second Moments to reformulate it to a SOCP
A second order cone program (SOCP) is a program of the following form:
$$\min_x\ f^\top x \quad \text{s.t.}\quad \|A_i x + b_i\|_2 \le c_i^\top x + d_i,\ \ i = 1,\dots,m, \qquad Fx = g$$
- If $A_i = 0$ for all $i$ then it reduces to a linear program
- If $c_i = 0$ for all $i$ then it reduces to a quadratically constrained program (square each cone constraint)
- It can be formulated as a semi-definite program and solved using those methods
- More recently, interior-point methods have appeared that exploit the SOCP structure directly
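A toy instance of this standard form in cvxpy; all the problem data below ($f$, $A_i$, $b_i$, $c_i$, $d_i$, $F$, $g$) are placeholders of my own choosing, only meant to show how the cone constraints are written.

```python
# Hedged toy SOCP in the standard form above; the data are made up.
import numpy as np
import cvxpy as cp

n = 4
x = cp.Variable(n)
f = np.array([1.0, 2.0, 3.0, 4.0])
A = [np.eye(n), 2 * np.eye(n)]
b = [np.zeros(n), np.zeros(n)]
c = [np.zeros(n), np.ones(n)]
d = [5.0, 1.0]
F = np.ones((1, n))                # one linear equality: sum(x) = 1
g = np.array([1.0])

cone_constraints = [cp.norm(A[i] @ x + b[i], 2) <= c[i] @ x + d[i] for i in range(2)]
prob = cp.Problem(cp.Minimize(f @ x), cone_constraints + [F @ x == g])
prob.solve()
print(prob.status, x.value)
```

Setting every `A[i]` to zero would make both cone constraints linear (an LP), while setting every `c[i]` to zero squares them into quadratic constraints, matching the bullet points above.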

10 Multivariate Chebyshev Inequality
The multivariate Chebyshev inequality lets us bound the probability of misclassification using the mean and covariance of the data:
$$\sup_{y \sim (\bar{y}, \Sigma)} \Pr(y \in S) = (1 + d^2)^{-1}, \quad \text{where } d^2 = \inf_{y \in S} (y - \bar{y})^\top \Sigma^{-1} (y - \bar{y})$$
- $S$ is the convex set we care about
- In the SVM setting it is one side of the hyperplane
- This holds for all distributions having the same mean and covariance

11 Robust Formulation
$$\inf_{X_i \sim (\bar{x}_i, \Sigma_i)} \Pr\big(y_i(w^\top X_i - b) \ge 1 - \xi_i\big) \ge 1 - \epsilon$$
- We take the worst-case distribution having our mean and covariance
- This is the robust formulation
In order to use the Chebyshev inequality we reformulate it as follows:
$$\sup_{X_i \sim (\bar{x}_i, \Sigma_i)} \Pr\big(y_i(w^\top X_i - b) \le 1 - \xi_i\big) \le \epsilon$$

12 Plugging in what we know, we get the following inequality:
$$\epsilon \ge (1 + d^2)^{-1}, \qquad d^2 = \inf_{\{x :\, y_i(w^\top x - b) \le 1 - \xi_i\}} (x - \bar{x}_i)^\top \Sigma^{-1} (x - \bar{x}_i)$$
- If the mean $\bar{x}_i$ happens to lie on the hyperplane (or on the wrong side of it), then in the worst-case scenario you have a 100 percent chance of misclassifying the data
- Just move the hyperplane, with penalty $\xi_i$
- Otherwise $d$ is the (Mahalanobis) distance from the hyperplane to the mean:
$$d^2 = \frac{\big(y_i(w^\top \bar{x}_i - b) - 1 + \xi_i\big)^2}{w^\top \Sigma w}$$
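Spelling out the step that takes this inequality to the theorem on the next slide (my restatement of the argument, using the slide's own symbols):

```latex
% Requiring the worst-case misclassification probability to be at most
% \epsilon forces the Mahalanobis distance d to be at least
% \sqrt{(1-\epsilon)/\epsilon}:
\[
  \epsilon \;\ge\; (1+d^{2})^{-1}
  \;\Longleftrightarrow\;
  d^{2} \;\ge\; \frac{1-\epsilon}{\epsilon}
  \;\Longleftrightarrow\;
  \frac{y_i(w^\top \bar{x}_i - b) - 1 + \xi_i}{\sqrt{w^\top \Sigma w}}
  \;\ge\; \sqrt{\frac{1-\epsilon}{\epsilon}},
\]
\[
  \text{i.e.}\qquad
  y_i(w^\top \bar{x}_i - b) \;\ge\; 1 - \xi_i
  + \sqrt{\tfrac{1-\epsilon}{\epsilon}}\,\bigl\|\Sigma^{1/2} w\bigr\|_2 .
\]
```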

13 Theorem of CCP
Now we have the following theorem [5].
Theorem. The chance constraint of the CCP is satisfied for all probability distributions having the given means and covariances if the constraint of the following second order cone program holds:
$$\min_{w,b,\xi}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^n \xi_i \quad \text{s.t.}\quad y_i(w^\top \bar{x}_i - b) \ge 1 - \xi_i + \sqrt{\tfrac{1-\epsilon}{\epsilon}}\,\big\|\Sigma_i^{1/2} w\big\|,\ \ \xi_i \ge 0,\ \ 1 \le i \le n$$
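A hedged cvxpy sketch of this SOCP; the function name, the per-point covariance argument, and the use of a Cholesky factor as $\Sigma_i^{1/2}$ are my choices for illustration, not prescriptions from the talk.

```python
# Sketch of the chance-constrained SVM as a SOCP (theorem above).
import numpy as np
import cvxpy as cp

def robust_svm_socp(Xbar, y, Sigmas, eps=0.1, C=1.0):
    """Xbar: (n, p) matrix of means, Sigmas: list of (p, p) covariances."""
    n, p = Xbar.shape
    kappa = np.sqrt((1 - eps) / eps)
    w, b = cp.Variable(p), cp.Variable()
    xi = cp.Variable(n, nonneg=True)
    cons = []
    for i in range(n):
        L = np.linalg.cholesky(Sigmas[i])          # L L' = Sigma_i, so ||L'w|| = ||Sigma_i^(1/2) w||
        cons.append(y[i] * (Xbar[i] @ w - b)
                    >= 1 - xi[i] + kappa * cp.norm(L.T @ w, 2))
    obj = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi))
    cp.Problem(obj, cons).solve()
    return w.value, b.value
```

Setting every $\Sigma_i = 0$ (or letting $\epsilon \to 1$ so that $\kappa \to 0$) recovers the ordinary soft-margin SVM.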

14 Reformulation to a SOCP
- SOCPs need to have a linear objective function
- Replacing $\frac{1}{2}\|w\|^2$ in the objective with the constraint $\|w\| \le W$ gives the same answer if you tune $C$ and $W$ appropriately
- Packages with methods to solve SOCPs include AMPL, CPLEX, ECOS, Gurobi, JOptimizer, MOSEK, OpenOpt, SDPT3, and Xpress
- One can also use semi-definite programming methods to solve it

15 Incorporating more (or less) information
Some problems with the last formulation:
- It assumed we knew the means and covariances exactly (more likely we only have estimates of them)
- It didn't allow us to include the support of the variables in the model (we want to include all the information we know)
- Sometimes we only know the support of (i.e. ranges for) the means and covariances

16 Incorporating all our information [1]
Theorem. Assume the support ($l_{ij} \le X_{ij} \le u_{ij}$), bounds on the first moments ($\mu_{ij}^- \le \mu_{ij} \le \mu_{ij}^+$), and bounds on the second moments ($0 \le E[X_{ij}^2] \le \sigma_{ij}^2$) of the independent random variables $X_{ij}$, $j = 1,\dots,n$, are known. Then our CCP constraint is satisfied if the following convex constraint is satisfied:
$$1 - \xi_i + y_i b + \sum_j \max\big[-y_i \mu_{ij}^- w_j,\ -y_i \mu_{ij}^+ w_j\big] + \kappa\,\big\|\Sigma_{(1),i}\, w\big\| \le 0$$
Note $\kappa = \sqrt{2\log(1/\epsilon)}$ and $\Sigma_{(1),i} = \mathrm{diag}\big(\big[\sqrt{\nu(\mu_{i1}^-, \mu_{i1}^+, \sigma_{i1})}, \dots, \sqrt{\nu(\mu_{in}^-, \mu_{in}^+, \sigma_{in})}\big]\big)$, where $\nu(\mu_{ij}^-, \mu_{ij}^+, \sigma_{ij})$ will be defined later.

17 Key Ideas from the Proof
Consider $a_{i0} = 1 - \xi_i + y_i b$ and $a_i = -y_i w$. Then we can rewrite our CCP constraint as
$$\Pr(a_i^\top X_i + a_{i0} \ge 0) \le \epsilon \qquad (3)$$
Now use that
$$\Pr(a_i^\top X_i + a_{i0} \ge 0) = \Pr\big(e^{\alpha a_i^\top X_i} e^{\alpha a_{i0}} \ge 1\big), \quad \alpha \ge 0,$$
the Markov inequality $\Pr(X \ge a) \le \frac{E[X]}{a}$ for non-negative random variables, and that the $X_{ij}$, $j = 1,\dots,n$, are independent. We get the following inequality:
$$\Pr(a_i^\top X_i + a_{i0} \ge 0) \le e^{\alpha a_{i0}} \prod_j E\big[e^{\alpha a_{ij} X_{ij}}\big] \qquad (4)$$

18 Key Ideas from the Proof
- We have now turned our random variables into non-negative random variables
- We use several bounds (from other papers), the AM-GM inequality, and a Taylor series approximation to get the right convex conditions
- No intuition, just slugging away at the calculations
We can find similar convex conditions for different kinds of information:
- Support information and bounds for the first and second moments (last theorem)
- Support information and exact values for the first and second moments
- The same two as above, but assuming you don't know the second moments

19 Sparse SVMs
The basic sparse linear SVM is exactly the same as before, but we now use the $\ell_1$ norm in $\mathbb{R}^n$ [2, 3]:
$$\min_{w,b,\xi_i}\ \|w\|_1 + C\sum_{i=1}^m \xi_i \quad \text{s.t.}\quad y_i(w^\top x_i - b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1,\dots,m$$
- The sparsest "norm" is $\ell_0$, which counts the number of non-zero entries
- This norm isn't continuous, so $\ell_1$ is the next best
- $(1, 0)$ and $(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}})$ both have norm 1 in $\ell_2$, but $(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}})$ has norm $\sqrt{2}$ in $\ell_1$
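The $\ell_1$ version only changes the regularizer; a minimal cvxpy sketch (cvxpy will hand this to an LP solver after rewriting the norms), with the helper name my own:

```python
# Sketch of the sparse (l1-regularized) linear SVM above.
import cvxpy as cp

def sparse_svm(X, y, C=1.0):
    """X: (m, p) numpy data matrix, y: length-m numpy vector of +/-1 labels."""
    m, p = X.shape
    w, b = cp.Variable(p), cp.Variable()
    xi = cp.Variable(m, nonneg=True)
    cons = [cp.multiply(y, X @ w - b) >= 1 - xi]
    cp.Problem(cp.Minimize(cp.norm(w, 1) + C * cp.sum(xi)), cons).solve()
    return w.value, b.value
```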

20 Full LP sparse ν-SVM
$$\min_{\rho,w,b,\xi_i}\ \|w\|_1 - \nu\rho + \sum_{i=1}^m \xi_i \quad \text{s.t.}\quad y_i(w^\top x_i - b) \ge \rho - \xi_i,\ \ i = 1,\dots,m, \qquad \xi_i \ge 0,\ \rho \ge 0,\ \ i = 1,\dots,m$$
ν has three properties which make it better than using C:
1. It is an upper bound on the fraction of margin errors (points $x_i$ with $\xi_i > 0$), i.e. on ME/m
2. It is a lower bound on the fraction of support vectors (points on the boundary), i.e. on SV/m
3. If the data are drawn i.i.d. from a distribution, then asymptotically, with probability one, ν equals the fraction of margin errors and of SVs
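A sketch of this LP in cvxpy; note that some presentations scale the slack sum by $1/m$, which affects how ν should be read, so take the unscaled sum here as an assumption matching the slide as written.

```python
# Sketch of the LP sparse nu-SVM above (slack sum unscaled, as on the slide).
import cvxpy as cp

def sparse_nu_svm(X, y, nu=0.2):
    """X: (m, p) numpy data matrix, y: length-m numpy vector of +/-1 labels."""
    m, p = X.shape
    w, b = cp.Variable(p), cp.Variable()
    rho = cp.Variable(nonneg=True)
    xi = cp.Variable(m, nonneg=True)
    cons = [cp.multiply(y, X @ w - b) >= rho - xi]
    obj = cp.Minimize(cp.norm(w, 1) - nu * rho + cp.sum(xi))
    cp.Problem(obj, cons).solve()
    return w.value, b.value, rho.value
```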

21 Key ideas behind ν-SVM
- This will give the same answer as the C-SVM with $C = 1/\rho$
- ν is a more intuitive parameter and keeps the same meaning even if you change dimensions or add data points
- There is an extra decision variable, but since C is so heuristic anyway, the effort is about the same

22 Benefits of Sparse SVM
- In big-data problems there are thousands of dimensions, but really only a few have actual predictive power
- Let your algorithm pick the dimensions that matter
- If you have thousands of dimensions but just a few data points, then a sparse SVM is essential to avoid over-fitting (genetics)
- Also, using $\ell_1$ means the problem can be reformulated as a linear program (LP)

23 How to Enforce Sparseness
Though $\ell_1$ is sparser than $\ell_2$, we would like to be even sparser. We can do this by reducing the dimensions in the following ways (see the sketch after this list):
- Decide on an arbitrary cut-off that removes features (dimensions) with small weights or with too large a standard deviation (pre-processing)
- Introduce arbitrary probe features (dimensions) which have no say in the categories (draw them from a normal with mean zero) and use the average of their weights as the cut-off
- Use several random subsets of the features (dimensions) and then bag the models together (bootstrap aggregation); this leads to less variance and less over-fitting (for unstable models)
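A sketch of the probe-feature idea from the second bullet; `fit_sparse_svm` stands for any sparse SVM trainer returning a weight vector (for example the `sparse_svm` sketch above), and the number of probes is an arbitrary choice of mine.

```python
# Hedged sketch: append Gaussian probe features with no predictive power,
# fit a sparse SVM, and keep the real features whose |weight| beats the
# average |weight| of the probes.
import numpy as np

def probe_feature_selection(X, y, fit_sparse_svm, n_probes=50, seed=0):
    rng = np.random.default_rng(seed)
    m, p = X.shape
    probes = rng.standard_normal((m, n_probes))       # pure-noise features
    w, _ = fit_sparse_svm(np.hstack([X, probes]), y)
    cutoff = np.abs(w[p:]).mean()                     # average probe weight
    return np.flatnonzero(np.abs(w[:p]) > cutoff)     # indices of kept features
```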

24 Adding uncertainty into a Sparse Model
The convex uncertain SVM models from before didn't depend on the norm used, so we just swap the norm in:
$$\min_{w,b,\xi}\ \|w\|_1 + C\sum_{i=1}^n \xi_i \quad \text{s.t.}\quad y_i(w^\top \bar{x}_i - b) \ge 1 - \xi_i + \sqrt{\tfrac{1-\epsilon}{\epsilon}}\,\big\|\Sigma_i^{1/2} w\big\|,\ \ \xi_i \ge 0,\ \ 1 \le i \le n$$
Would it be possible to add ν into this formulation to get rid of C?

25 Possible new model
Putting together the ideas from before, we could form the full sparse robust ν-SVM as follows:
$$\min_{\rho,\xi_i,w,b}\ \|w\|_1 - \nu\rho + \sum_{i=1}^m \xi_i \quad \text{s.t.}\quad y_i(w^\top \bar{x}_i - b) \ge \rho - \xi_i + \sqrt{\tfrac{1-\epsilon}{\epsilon}}\,\big\|\Sigma_i^{1/2} w\big\|_2,\ \ i = 1,\dots,m, \qquad \xi_i \ge 0,\ \rho \ge 0,\ \ i = 1,\dots,m$$
- It is not clear what ν represents under uncertainty
- Will this even give us what we want?
- If it doesn't, how could we change it to recover ideas similar to the ones before?

26 Other possible regularization functions
We have talked about $\ell_2$ and $\ell_1$ as regularizing terms. What about other regularizations?
- If we look at $\ell_n$ as $n$ increases, we get less sparse solutions
- If we look at $\ell_n$ for $0 < n < 1$, these are no longer normed spaces ($\ell_n$ is not a norm); however, it does increase sparsity
- The idea behind the LASSO (least absolute shrinkage and selection operator) is just to use the $\ell_1$ norm for linear regression; nothing new is added
- SCAD, or Smoothly Clipped Absolute Deviation, regularization is the following function:
$$p_\lambda(w_j) = \begin{cases} \lambda |w_j| & |w_j| \le \lambda \\[4pt] \dfrac{|w_j|^2 - 2a\lambda |w_j| + \lambda^2}{2(1-a)} & \lambda < |w_j| \le a\lambda \\[4pt] \dfrac{(a+1)\lambda^2}{2} & |w_j| > a\lambda \end{cases}$$
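For reference, here is the SCAD penalty coded directly from the piecewise definition above (elementwise form; the default a = 3.7 is a common convention from the literature, not a value from the talk).

```python
# Elementwise SCAD penalty p_lambda(w) as defined above.
import numpy as np

def scad_penalty(w, lam, a=3.7):
    w = np.abs(np.asarray(w, dtype=float))
    small = lam * w                                           # |w| <= lam
    mid = (w**2 - 2 * a * lam * w + lam**2) / (2 * (1 - a))   # lam < |w| <= a*lam
    large = (a + 1) * lam**2 / 2                              # |w| > a*lam
    return np.where(w <= lam, small, np.where(w <= a * lam, mid, large))
```

The three pieces agree at $|w_j| = \lambda$ and $|w_j| = a\lambda$, so the penalty is continuous, unlike the $\ell_0$ count.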

27 Future Work
- Add ν somehow into the robust sparse SVM with uncertainty
- Analyze the changes with different regularizations
- Add multiple classes to an SVM
- Take these ideas to SVR (support vector regression)

28 References I
[1] Aharon Ben-Tal, Sahely Bhadra, Chiranjib Bhattacharyya, and J. Saketha Nath. Chance constrained uncertain classification via robust optimization. Mathematical Programming, 127(1).
[2] Chiranjib Bhattacharyya, L. R. Grate, Michael I. Jordan, L. El Ghaoui, and I. Saira Mian. Robust sparse hyperplane classifiers: application to uncertain molecular profiling data. Journal of Computational Biology, 11(6).

29 References II
[3] Jinbo Bi, Kristin Bennett, Mark Embrechts, Curt Breneman, and Minghu Song. Dimensionality reduction via sparse support vector machines. The Journal of Machine Learning Research, 3.
[4] Neng Fan, Elham Sadeghi, and Panos M. Pardalos. Robust support vector machines with polyhedral uncertainty of the input data.
[5] Pannagadatta K. Shivaswamy, Chiranjib Bhattacharyya, and Alexander J. Smola. Second order cone programming approaches for handling missing and uncertain data. The Journal of Machine Learning Research, 7.
