Short Course Robust Optimization and Machine Learning. Lecture 6: Robust Optimization in Machine Learning

Size: px
Start display at page:

Download "Short Course Robust Optimization and Machine Learning. Lecture 6: Robust Optimization in Machine Learning"

Transcription

1 Short Course Robust Optimization and Machine Machine Lecture 6: Robust Optimization in Machine Laurent El Ghaoui EECS and IEOR Departments UC Berkeley Spring seminar TRANSP-OR, Zinal, Jan , 2012

2 Outline Machine

3 Outline Machine

4 Supervised learning problems Many supervised learning problems (e.g., classification, regression) can be written as min w L(X T w) where L is convex, and X contains the data. Machine

5 Penalty approach Often, optimal value and solutions of optimization problems are sensitive to data. A common approach to deal with sensitivity is via penalization, e.g.: min x L(X T w) + Wx 2 2 (W = weighting matrix). How do we choose the penalty? Can we choose it in a way that reflects knowledge about problem structure, or how uncertainty affects data? Does it lead to better solutions from machine learning viewpoint? Machine

6 Support Vector Machine Support Vector Machine (SVM) classification problem: min w,b m (1 y i (zi T w + b)) + i=1 Z := [z 1,..., z m] R n m contains the data points. y { 1, 1} m contain the labels. x := (w, b) contains the classifier parameters, allowing to classify a new point z via the rule y = sgn(z T w + b). Machine

7 Robustness to data uncertainty Assume the data matrix is only partially known, and address the robust optimization problem: min w,b max U U m (1 y i ((z i + u i ) T w + b)) +, i=1 where U = [u 1,..., u m] and U R n m is a set that describes additive uncertainty in the data matrix. Machine

8 Measurement-wise, spherical uncertainty Assume Machine where ρ > 0 is given. U = {U = [u 1,..., u m] R n m : u i 2 ρ}, Robust SVM reduces to min w,b m (1 y i (zi T w + b) + ρ w 2 ) +. i=1

9 Link with classical SVM Classical SVM contains l 2 -norm regularization term: min w,b m (1 y i (zi T w + b)) + + λ w 2 2. i=1 where λ > 0 is a penalty parameter. With spherical uncertainty, robust SVM is similar to classical SVM. Machine When data is separable, the two models are equivalent...

10 Separable data Machine Maximally robust classifier for separable data, with spherical uncertainties around each data point. In this case, the robust counterpart reduces to the classical maximum-margin classifier problem.

11 Interval uncertainty Assume U = {U R n m where ρ > 0 is given. : (i, j), U ij ρ}, Machine Robust SVM reduces to min w,b m (1 y i (zi T w + b) + ρ w 1 ) +. i=1 The l 1 -norm term encourages sparsity, and may not regularize the solution.

12 Separable data Machine Maximally robust classifier for separable data, with box uncertainties around each data point. This uncertainty model encourages sparsity of the solution.

13 Other uncertainty models We may generalize the approach to other uncertainty models, retaining tractability: Measurement-wise uncertainty models: perturbations affect each data point independent of each other. Other models couple the way uncertainties affect each measurement; for example we may control the number of errors across all the measurements. Norm-bound models allow for uncertainty of data matrix that is bounded in matrix norm. A whole theory is presented in [1]. Machine

14 Consider standard l 1 -penalized SVM: φ λ (X) := min w,b Constrained counterpart: ψ c(x) := min w,b 1 m m (1 y i (w T x i + b)) + + λ w 1 i=1 m (1 y i (xi T w + b)) + : w 1 c i=1 Machine Basic goal: solve these problems in the large-scale case. Approach: use to sparsify the data matrix in a controlled way.

15 Thresholding data We threshold the data using an absolute level t: { 0 if xi,j t (x i (t)) j := 1 otherwise This will make the data sparser, resulting in memory and time savings. Machine

16 Handling thresholding errors Handle thresholding errors via robust counterpart: (w(t), b(t)) := arg min w,b Above problem is tractable. max Z X t m (1 y i (w T z i + b)) + + λ w 1. i=1 Machine The solution w(t) at threshold level t satisfies 0 1 m m (1 y i (x T i w(t) + b(t))) + + λ w(t) 1 φ λ (X) 2t λ. i=1

17 Results 20 news groups data set Machine Dataset size: 20, , 000. Thresholding of data matrix of TF-IDF scores.

18 Results UCI NYTimes Dataset 1 stock 11 bond 2 nasdaq 12 forecast 3 portfolio 13 thomson financial 4 brokerage 14 index 5 exchanges 15 royal bank 6 shareholder 16 fund 7 fund 17 marketing 8 investor 18 companies 9 alan greenspan 19 bank 10 fed 20 merrill Top 20 keywords for topic stock. Dataset size: 100, , 660, 30,000,000 non-zeros. Thresholded dataset (by TF-IDF scores) with level ,000 non-zeroes (2.8 %). Total run time: 4317s. Machine

19 Robust SVM with Data: boolean Z {0, 1} n m (eg, co-occurence matrix) Nominal problem: SVM min w,b m (1 y i (zi T w + b)) +, i=1 Uncertainty model: assume each data value can be flipped, total budget of flips is constrained: { } U = U = [u 1,..., u m] R l m : u i { 1, 0, 1} l, u 1 k. Machine

20 Robust counterpart where min w,b m (1 y i (zi T w + b) + φ(w)) +, i=1 φ(w) := min s k w s + s 1 Penalty is a combination of l 1, l norms. Problem is tractable (doubles number of variables over nominal). Still needs regularization. Machine

21 Results UCI Internet advertisement data set Machine Dataset size: k = 0 corresponds to nominal SVM problem. Best performance at k = 3.

22 Refined model We can impose u i {0, 1 2x i }. This leads to a new penalty: with min w,b m (1 y i (xi T w + b) + φ i (w)) +, i=1 n φ i (w) := min kµ + (y i w j (2x ij 1) µ) + µ 0 Problem can still be solved via LP. j=1 Machine

23 Results UCI Heart data set Machine Dataset size: k = 0 corresponds to nominal SVM problem. Best performance at k = 1.

24 Outline Machine

25 Nominal problem where min L(Z T θ), θ Θ Z := [z 1,..., z m] R n m is the data matrix L : R m R is a convex loss function Θ imposes structure (eg, sign) constraints on parameter vector θ Machine

26 Loss function: assumptions We assume that L(r) = π(abs(p(r))), where abs( ) acts componentwise, π : R m + R is a convex, monotone function on the non-negative orthant, and { r ( symmetric case ) P(r) = ( asymmetric case ) r + with r + the vector with components max(r i, 0), i = 1,..., m. Machine

27 Loss function: examples l p-norm regression hinge loss Huber, Berhu loss Machine

28 Robust counterpart min max θ Θ Z Z L(ZT θ). where Z R n m is a set of the form Z = {Z + : ρd, }, with ρ 0 a measure of the size of the uncertainty, and D R l m is given. Machine

29 Generic analysis For a given vector θ, we have max Z Z L(ZT θ) = max u where L is the conjugate of L, and is the support function of D. u T Z T θ L (u) + ρφ D(uv T ), φ D(X) := max X, D Machine

30 Assumptions on uncertainty set D Separability condition: there exist two semi-norms φ, ψ such that φ D(uv T ) := max D ut v = φ(u)ψ(v). Does not completely characterize (the support function of) the set D Given φ, ψ, we can construct a set D out that obeys condition The robust counterpart only depends on φ, ψ. WLOG, we can replace D by its convex hull. Machine

31 Largest singular value model: D = { : ρ}, with φ, ψ Euclidean norms. Any norm-bound model involving an induced norm (φ, ψ are then the norms dual to the norms involved). Measurement-wise uncertainty models, where each column of the perturbation matrix is bounded in norm, independently of the others, correspond to the case with ψ(v) = v 1. Machine

32 Other examples Bounded-error model: there are (at most K ) errors affecting data Machine D = = [λ 1 δ 1,..., λ mδ m] R l m : δ i 1, i = 1,..., m,. m λ i K, λ {0, 1} m for which φ( ) =, ψ(v) = sum of the K largest magnitudes of the components of v. i=1

33 (follow d) The set { D = = [λ 1 δ 1,..., λ mδ m] R l m } : δ i { 1, 0, 1} l, δ 1 k models measurement-wise uncertainty affecting (we can impose δ i {x i 1, x i } to be more realistic) In this case, we have ψ( ) = 1 and φ(u) = u 1,k := min w k u w + w 1. Machine

34 Main result For a given vector θ, we have where min θ max Z Z L(ZT θ) = min θ,κ Lwc(Z T θ, κ) : κ φ(u T θ) L(r, κ) := max v v T r L (v) + κψ(v) is the worst-case loss function of the robust problem. Machine

35 Worst-case loss function The tractability of the robust counterpart is directly linked to our ability to compute optimal solutions v for L(r, κ) = max v v T r L (v) + κψ(v) Dual representation (assume ψ( ) = is a norm): L(r, κ) = max ξ L(r + κξ) : ξ 1 When ψ is the Euclidean norm, robust regularization of L (Lewis, 2001). Machine

36 When ψ( ) = p, p = 1,, problem reduces to simple, tractable convex problem (assuming nominal problem is). For p = 2, problem can be reduced to such a simple form, for the hinge, l q-norm and Huber loss functions. Machine

37 Lasso In particular, the least-squares problem with lasso penalty min θ X T θ y 2 + ρ θ 1 is the robust counterpart to a least-squares problem with uncertainty on X, with additive perturbation bounded in the norm n 1,2 := max 2 ij. 1 i l j=1 Machine

38 Globalized robust counterpart The robust counterpart is based on the worst-case value of the loss function assuming a bound on the data uncertainty (Z Z): min max θ Θ Z Z L(ZT θ). The approach does not control the degradation of the loss outside the set Z. Machine

39 Globalized robust counterpart: formulation In globalized robust counterpart, we fix a rate of degradation of the loss, which controls the amount of degradation of the loss as the data matrix Z goes away from the set Z. We seek to minimize τ, such that : L((Z + ) T θ) τ + α, where α > 0 controls the rate of degradation, and is a matrix norm. Machine

40 Globalized robust counterpart For the SVM case, the globalized robust counterpart can be expressed as: min w,b m (1 y i (zi T w + b)) + : m θ 2 α, i=1 which is a classical form of SVM. For l p-norm regression with m data points, the globalized robust counterpart takes the form min X T θ y p : κ(m, p) θ 2 α θ Machine where κ(m, 1) = m, κ(m, 2) = κ(m, ) = 1.

41 can address problems with chance constraints min θ max EpL(Z p P (δ)t θ) where δ follows distribution p, and P is a class of distributions Results are more limited, focused on upper bounds. Convex relaxations are available, but more expensive. Approach uses Bernstein approximations (Nemirovski & Ben-tal, 2006). Machine

42 Robust regression with chance constraints: an example φ p := min θ max x (ˆx,X) Regression variable is θ R n E x A(x)θ b(x) p x R q is an uncertainty vector that enters affinely in the problem matrices: [A(x), b(x)] = [A 0, b 0 ] + i x i[a i, b i ]. The distribution of uncertainty vector x is unknown, except for its mean ˆx and covariance X. Objective is worst-case (over distributions) expected value of l p-norm residual (p = 1, 2). Machine

43 Main result (Assume ˆx = 0, X = I WLOG) For p = 2, the problem reduces to least-squares: Machine φ 2 2 = min θ q A i θ b i 2 2 i=0 For p = 1, we have (2/π)ψ 1 φ 1 ψ 1, with ψ 1 = min θ q A i θ b i 2 i=0

44 Example: robust median As a special case, consider the median problem: min θ q θ x i i=1 Now assume that vector x is random, with mean ˆx and covariance X, and consider the robust version: φ 1 := min θ max E x (ˆx,X) x q θ x i i=1 Machine

45 Approximate solution We have (2/π)ψ 1 φ 1 ψ 1, with ψ 1 := n i=1 (θ ˆx i ) 2 + X ii Amounts to find the minimum distance sum (a very simple SOCP). Machine

46 Geometry of robust median problem std Nominal vs. robust median nominal points std dev. median robust median Machine x

47 Outline Machine

48 Machine A. Bental, L. El Ghaoui, and A. Nemirovski. Robust Optimization. Princeton Series in Applied Mathematics. Princeton University Press, October C. Caramanis, S. Mannor, and H. Xu. Robust optimization in machine learning. In S. Sra, S. Nowozin, and S. Wright, editors, Optimization for Machine, chapter 14. MIT Press, 2011.

Robustness and Regularization: An Optimization Perspective

Robustness and Regularization: An Optimization Perspective Robustness and Regularization: An Optimization Perspective Laurent El Ghaoui (EECS/IEOR, UC Berkeley) with help from Brian Gawalt, Onureena Banerjee Neyman Seminar, Statistics Department, UC Berkeley September

More information

Short Course Robust Optimization and Machine Learning. 3. Optimization in Supervised Learning

Short Course Robust Optimization and Machine Learning. 3. Optimization in Supervised Learning Short Course Robust Optimization and 3. Optimization in Supervised EECS and IEOR Departments UC Berkeley Spring seminar TRANSP-OR, Zinal, Jan. 16-19, 2012 Outline Overview of Supervised models and variants

More information

Convex Optimization in Classification Problems

Convex Optimization in Classification Problems New Trends in Optimization and Computational Algorithms December 9 13, 2001 Convex Optimization in Classification Problems Laurent El Ghaoui Department of EECS, UC Berkeley elghaoui@eecs.berkeley.edu 1

More information

Sparse and Robust Optimization and Applications

Sparse and Robust Optimization and Applications Sparse and and Statistical Learning Workshop Les Houches, 2013 Robust Laurent El Ghaoui with Mert Pilanci, Anh Pham EECS Dept., UC Berkeley January 7, 2013 1 / 36 Outline Sparse Sparse Sparse Probability

More information

Short Course Robust Optimization and Machine Learning. Lecture 4: Optimization in Unsupervised Learning

Short Course Robust Optimization and Machine Learning. Lecture 4: Optimization in Unsupervised Learning Short Course Robust Optimization and Machine Machine Lecture 4: Optimization in Unsupervised Laurent El Ghaoui EECS and IEOR Departments UC Berkeley Spring seminar TRANSP-OR, Zinal, Jan. 16-19, 2012 s

More information

Semidefinite and Second Order Cone Programming Seminar Fall 2012 Project: Robust Optimization and its Application of Robust Portfolio Optimization

Semidefinite and Second Order Cone Programming Seminar Fall 2012 Project: Robust Optimization and its Application of Robust Portfolio Optimization Semidefinite and Second Order Cone Programming Seminar Fall 2012 Project: Robust Optimization and its Application of Robust Portfolio Optimization Instructor: Farid Alizadeh Author: Ai Kagawa 12/12/2012

More information

Large-scale Robust Optimization and Applications

Large-scale Robust Optimization and Applications and Applications : Applications Laurent El Ghaoui EECS and IEOR Departments UC Berkeley SESO 2015 Tutorial s June 22, 2015 Outline of Machine Learning s Robust for s Outline of Machine Learning s Robust

More information

A direct formulation for sparse PCA using semidefinite programming

A direct formulation for sparse PCA using semidefinite programming A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley A. d Aspremont, INFORMS, Denver,

More information

Sparse PCA with applications in finance

Sparse PCA with applications in finance Sparse PCA with applications in finance A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon 1 Introduction

More information

Second Order Cone Programming, Missing or Uncertain Data, and Sparse SVMs

Second Order Cone Programming, Missing or Uncertain Data, and Sparse SVMs Second Order Cone Programming, Missing or Uncertain Data, and Sparse SVMs Ammon Washburn University of Arizona September 25, 2015 1 / 28 Introduction We will begin with basic Support Vector Machines (SVMs)

More information

Robust Optimization and Data Approximation in Machine Learning

Robust Optimization and Data Approximation in Machine Learning Robust Optimization and Data Approximation in Machine Learning Gia Vinh Anh Pham Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2015-216

More information

Robust Optimization and Data Approximation in Machine Learning. Gia Vinh Anh Pham

Robust Optimization and Data Approximation in Machine Learning. Gia Vinh Anh Pham Robust Optimization and Data Approximation in Machine Learning by Gia Vinh Anh Pham A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer

More information

Machine Learning And Applications: Supervised Learning-SVM

Machine Learning And Applications: Supervised Learning-SVM Machine Learning And Applications: Supervised Learning-SVM Raphaël Bournhonesque École Normale Supérieure de Lyon, Lyon, France raphael.bournhonesque@ens-lyon.fr 1 Supervised vs unsupervised learning Machine

More information

A direct formulation for sparse PCA using semidefinite programming

A direct formulation for sparse PCA using semidefinite programming A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon

More information

EE 227A: Convex Optimization and Applications April 24, 2008

EE 227A: Convex Optimization and Applications April 24, 2008 EE 227A: Convex Optimization and Applications April 24, 2008 Lecture 24: Robust Optimization: Chance Constraints Lecturer: Laurent El Ghaoui Reading assignment: Chapter 2 of the book on Robust Optimization

More information

CSC 411 Lecture 17: Support Vector Machine

CSC 411 Lecture 17: Support Vector Machine CSC 411 Lecture 17: Support Vector Machine Ethan Fetaya, James Lucas and Emad Andrews University of Toronto CSC411 Lec17 1 / 1 Today Max-margin classification SVM Hard SVM Duality Soft SVM CSC411 Lec17

More information

Convex Optimization M2

Convex Optimization M2 Convex Optimization M2 Lecture 8 A. d Aspremont. Convex Optimization M2. 1/57 Applications A. d Aspremont. Convex Optimization M2. 2/57 Outline Geometrical problems Approximation problems Combinatorial

More information

Interval Data Classification under Partial Information: A Chance-constraint Approach

Interval Data Classification under Partial Information: A Chance-constraint Approach Interval Data Classification under Partial Information: A Chance-constraint Approach Sahely Bhadra J. Saketha Nath Aharon Ben-Tal Chiranjib Bhattacharyya PAKDD 2009 J. Saketha Nath (PAKDD 09) Conference

More information

A Second order Cone Programming Formulation for Classifying Missing Data

A Second order Cone Programming Formulation for Classifying Missing Data A Second order Cone Programming Formulation for Classifying Missing Data Chiranjib Bhattacharyya Department of Computer Science and Automation Indian Institute of Science Bangalore, 560 012, India chiru@csa.iisc.ernet.in

More information

Robust linear optimization under general norms

Robust linear optimization under general norms Operations Research Letters 3 (004) 50 56 Operations Research Letters www.elsevier.com/locate/dsw Robust linear optimization under general norms Dimitris Bertsimas a; ;, Dessislava Pachamanova b, Melvyn

More information

Handout 8: Dealing with Data Uncertainty

Handout 8: Dealing with Data Uncertainty MFE 5100: Optimization 2015 16 First Term Handout 8: Dealing with Data Uncertainty Instructor: Anthony Man Cho So December 1, 2015 1 Introduction Conic linear programming CLP, and in particular, semidefinite

More information

A Least Squares Formulation for Canonical Correlation Analysis

A Least Squares Formulation for Canonical Correlation Analysis A Least Squares Formulation for Canonical Correlation Analysis Liang Sun, Shuiwang Ji, and Jieping Ye Department of Computer Science and Engineering Arizona State University Motivation Canonical Correlation

More information

Lecture 7: Kernels for Classification and Regression

Lecture 7: Kernels for Classification and Regression Lecture 7: Kernels for Classification and Regression CS 194-10, Fall 2011 Laurent El Ghaoui EECS Department UC Berkeley September 15, 2011 Outline Outline A linear regression problem Linear auto-regressive

More information

Robust Inverse Covariance Estimation under Noisy Measurements

Robust Inverse Covariance Estimation under Noisy Measurements .. Robust Inverse Covariance Estimation under Noisy Measurements Jun-Kun Wang, Shou-De Lin Intel-NTU, National Taiwan University ICML 2014 1 / 30 . Table of contents Introduction.1 Introduction.2 Related

More information

Announcements - Homework

Announcements - Homework Announcements - Homework Homework 1 is graded, please collect at end of lecture Homework 2 due today Homework 3 out soon (watch email) Ques 1 midterm review HW1 score distribution 40 HW1 total score 35

More information

CS295: Convex Optimization. Xiaohui Xie Department of Computer Science University of California, Irvine

CS295: Convex Optimization. Xiaohui Xie Department of Computer Science University of California, Irvine CS295: Convex Optimization Xiaohui Xie Department of Computer Science University of California, Irvine Course information Prerequisites: multivariate calculus and linear algebra Textbook: Convex Optimization

More information

Support vector machines

Support vector machines Support vector machines Guillaume Obozinski Ecole des Ponts - ParisTech SOCN course 2014 SVM, kernel methods and multiclass 1/23 Outline 1 Constrained optimization, Lagrangian duality and KKT 2 Support

More information

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation. CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.

More information

Handout 6: Some Applications of Conic Linear Programming

Handout 6: Some Applications of Conic Linear Programming ENGG 550: Foundations of Optimization 08 9 First Term Handout 6: Some Applications of Conic Linear Programming Instructor: Anthony Man Cho So November, 08 Introduction Conic linear programming CLP, and

More information

LIGHT ROBUSTNESS. Matteo Fischetti, Michele Monaci. DEI, University of Padova. 1st ARRIVAL Annual Workshop and Review Meeting, Utrecht, April 19, 2007

LIGHT ROBUSTNESS. Matteo Fischetti, Michele Monaci. DEI, University of Padova. 1st ARRIVAL Annual Workshop and Review Meeting, Utrecht, April 19, 2007 LIGHT ROBUSTNESS Matteo Fischetti, Michele Monaci DEI, University of Padova 1st ARRIVAL Annual Workshop and Review Meeting, Utrecht, April 19, 2007 Work supported by the Future and Emerging Technologies

More information

Selected Examples of CONIC DUALITY AT WORK Robust Linear Optimization Synthesis of Linear Controllers Matrix Cube Theorem A.

Selected Examples of CONIC DUALITY AT WORK Robust Linear Optimization Synthesis of Linear Controllers Matrix Cube Theorem A. . Selected Examples of CONIC DUALITY AT WORK Robust Linear Optimization Synthesis of Linear Controllers Matrix Cube Theorem A. Nemirovski Arkadi.Nemirovski@isye.gatech.edu Linear Optimization Problem,

More information

Support Vector Machines for Classification and Regression. 1 Linearly Separable Data: Hard Margin SVMs

Support Vector Machines for Classification and Regression. 1 Linearly Separable Data: Hard Margin SVMs E0 270 Machine Learning Lecture 5 (Jan 22, 203) Support Vector Machines for Classification and Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in

More information

Big Data Analytics: Optimization and Randomization

Big Data Analytics: Optimization and Randomization Big Data Analytics: Optimization and Randomization Tianbao Yang Tutorial@ACML 2015 Hong Kong Department of Computer Science, The University of Iowa, IA, USA Nov. 20, 2015 Yang Tutorial for ACML 15 Nov.

More information

Robust Optimization for Risk Control in Enterprise-wide Optimization

Robust Optimization for Risk Control in Enterprise-wide Optimization Robust Optimization for Risk Control in Enterprise-wide Optimization Juan Pablo Vielma Department of Industrial Engineering University of Pittsburgh EWO Seminar, 011 Pittsburgh, PA Uncertainty in Optimization

More information

EE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015

EE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015 EE613 Machine Learning for Engineers Kernel methods Support Vector Machines jean-marc odobez 2015 overview Kernel methods introductions and main elements defining kernels Kernelization of k-nn, K-Means,

More information

Robust Kernel-Based Regression

Robust Kernel-Based Regression Robust Kernel-Based Regression Budi Santosa Department of Industrial Engineering Sepuluh Nopember Institute of Technology Kampus ITS Surabaya Surabaya 60111,Indonesia Theodore B. Trafalis School of Industrial

More information

Interval solutions for interval algebraic equations

Interval solutions for interval algebraic equations Mathematics and Computers in Simulation 66 (2004) 207 217 Interval solutions for interval algebraic equations B.T. Polyak, S.A. Nazin Institute of Control Sciences, Russian Academy of Sciences, 65 Profsoyuznaya

More information

Support Vector Machines: Maximum Margin Classifiers

Support Vector Machines: Maximum Margin Classifiers Support Vector Machines: Maximum Margin Classifiers Machine Learning and Pattern Recognition: September 16, 2008 Piotr Mirowski Based on slides by Sumit Chopra and Fu-Jie Huang 1 Outline What is behind

More information

Robust Novelty Detection with Single Class MPM

Robust Novelty Detection with Single Class MPM Robust Novelty Detection with Single Class MPM Gert R.G. Lanckriet EECS, U.C. Berkeley gert@eecs.berkeley.edu Laurent El Ghaoui EECS, U.C. Berkeley elghaoui@eecs.berkeley.edu Michael I. Jordan Computer

More information

CS6375: Machine Learning Gautam Kunapuli. Support Vector Machines

CS6375: Machine Learning Gautam Kunapuli. Support Vector Machines Gautam Kunapuli Example: Text Categorization Example: Develop a model to classify news stories into various categories based on their content. sports politics Use the bag-of-words representation for this

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and

More information

Estimation and Optimization: Gaps and Bridges. MURI Meeting June 20, Laurent El Ghaoui. UC Berkeley EECS

Estimation and Optimization: Gaps and Bridges. MURI Meeting June 20, Laurent El Ghaoui. UC Berkeley EECS MURI Meeting June 20, 2001 Estimation and Optimization: Gaps and Bridges Laurent El Ghaoui EECS UC Berkeley 1 goals currently, estimation (of model parameters) and optimization (of decision variables)

More information

Kernel Methods and Support Vector Machines

Kernel Methods and Support Vector Machines Kernel Methods and Support Vector Machines Oliver Schulte - CMPT 726 Bishop PRML Ch. 6 Support Vector Machines Defining Characteristics Like logistic regression, good for continuous input features, discrete

More information

Warm up: risk prediction with logistic regression

Warm up: risk prediction with logistic regression Warm up: risk prediction with logistic regression Boss gives you a bunch of data on loans defaulting or not: {(x i,y i )} n i= x i 2 R d, y i 2 {, } You model the data as: P (Y = y x, w) = + exp( yw T

More information

Machine Learning. Support Vector Machines. Fabio Vandin November 20, 2017

Machine Learning. Support Vector Machines. Fabio Vandin November 20, 2017 Machine Learning Support Vector Machines Fabio Vandin November 20, 2017 1 Classification and Margin Consider a classification problem with two classes: instance set X = R d label set Y = { 1, 1}. Training

More information

Canonical Problem Forms. Ryan Tibshirani Convex Optimization

Canonical Problem Forms. Ryan Tibshirani Convex Optimization Canonical Problem Forms Ryan Tibshirani Convex Optimization 10-725 Last time: optimization basics Optimization terology (e.g., criterion, constraints, feasible points, solutions) Properties and first-order

More information

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 18

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 18 EE/ACM 150 - Applications of Convex Optimization in Signal Processing and Communications Lecture 18 Andre Tkacenko Signal Processing Research Group Jet Propulsion Laboratory May 31, 2012 Andre Tkacenko

More information

Lecture 20: November 1st

Lecture 20: November 1st 10-725: Optimization Fall 2012 Lecture 20: November 1st Lecturer: Geoff Gordon Scribes: Xiaolong Shen, Alex Beutel Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have not

More information

Statistical Analysis of Online News

Statistical Analysis of Online News Statistical Analysis of Online News Laurent El Ghaoui EECS, UC Berkeley Information Systems Colloquium, Stanford University November 29, 2007 Collaborators Joint work with and UCB students: Alexandre d

More information

Support Vector Machines for Classification: A Statistical Portrait

Support Vector Machines for Classification: A Statistical Portrait Support Vector Machines for Classification: A Statistical Portrait Yoonkyung Lee Department of Statistics The Ohio State University May 27, 2011 The Spring Conference of Korean Statistical Society KAIST,

More information

Brief Introduction to Machine Learning

Brief Introduction to Machine Learning Brief Introduction to Machine Learning Yuh-Jye Lee Lab of Data Science and Machine Intelligence Dept. of Applied Math. at NCTU August 29, 2016 1 / 49 1 Introduction 2 Binary Classification 3 Support Vector

More information

Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines

Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2018 CS 551, Fall

More information

Kernel Machines. Pradeep Ravikumar Co-instructor: Manuela Veloso. Machine Learning

Kernel Machines. Pradeep Ravikumar Co-instructor: Manuela Veloso. Machine Learning Kernel Machines Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 SVM linearly separable case n training points (x 1,, x n ) d features x j is a d-dimensional vector Primal problem:

More information

1 Robust optimization

1 Robust optimization ORF 523 Lecture 16 Princeton University Instructor: A.A. Ahmadi Scribe: G. Hall Any typos should be emailed to a a a@princeton.edu. In this lecture, we give a brief introduction to robust optimization

More information

IE 521 Convex Optimization

IE 521 Convex Optimization Lecture 14: and Applications 11th March 2019 Outline LP SOCP SDP LP SOCP SDP 1 / 21 Conic LP SOCP SDP Primal Conic Program: min c T x s.t. Ax K b (CP) : b T y s.t. A T y = c (CD) y K 0 Theorem. (Strong

More information

Support Vector Machines, Kernel SVM

Support Vector Machines, Kernel SVM Support Vector Machines, Kernel SVM Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 27, 2017 1 / 40 Outline 1 Administration 2 Review of last lecture 3 SVM

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find a plane that separates the classes in feature space. If we cannot, we get creative in two

More information

Sparse Covariance Selection using Semidefinite Programming

Sparse Covariance Selection using Semidefinite Programming Sparse Covariance Selection using Semidefinite Programming A. d Aspremont ORFE, Princeton University Joint work with O. Banerjee, L. El Ghaoui & G. Natsoulis, U.C. Berkeley & Iconix Pharmaceuticals Support

More information

The Alternating Direction Method of Multipliers

The Alternating Direction Method of Multipliers The Alternating Direction Method of Multipliers With Adaptive Step Size Selection Peter Sutor, Jr. Project Advisor: Professor Tom Goldstein December 2, 2015 1 / 25 Background The Dual Problem Consider

More information

Mathematical Optimization Models and Applications

Mathematical Optimization Models and Applications Mathematical Optimization Models and Applications Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/ yyye Chapters 1, 2.1-2,

More information

Lecture 18: Kernels Risk and Loss Support Vector Regression. Aykut Erdem December 2016 Hacettepe University

Lecture 18: Kernels Risk and Loss Support Vector Regression. Aykut Erdem December 2016 Hacettepe University Lecture 18: Kernels Risk and Loss Support Vector Regression Aykut Erdem December 2016 Hacettepe University Administrative We will have a make-up lecture on next Saturday December 24, 2016 Presentations

More information

CS , Fall 2011 Assignment 2 Solutions

CS , Fall 2011 Assignment 2 Solutions CS 94-0, Fall 20 Assignment 2 Solutions (8 pts) In this question we briefly review the expressiveness of kernels (a) Construct a support vector machine that computes the XOR function Use values of + and

More information

Stat542 (F11) Statistical Learning. First consider the scenario where the two classes of points are separable.

Stat542 (F11) Statistical Learning. First consider the scenario where the two classes of points are separable. Linear SVM (separable case) First consider the scenario where the two classes of points are separable. It s desirable to have the width (called margin) between the two dashed lines to be large, i.e., have

More information

SVMs: Non-Separable Data, Convex Surrogate Loss, Multi-Class Classification, Kernels

SVMs: Non-Separable Data, Convex Surrogate Loss, Multi-Class Classification, Kernels SVMs: Non-Separable Data, Convex Surrogate Loss, Multi-Class Classification, Kernels Karl Stratos June 21, 2018 1 / 33 Tangent: Some Loose Ends in Logistic Regression Polynomial feature expansion in logistic

More information

Linear & nonlinear classifiers

Linear & nonlinear classifiers Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table

More information

15. Conic optimization

15. Conic optimization L. Vandenberghe EE236C (Spring 216) 15. Conic optimization conic linear program examples modeling duality 15-1 Generalized (conic) inequalities Conic inequality: a constraint x K where K is a convex cone

More information

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization John Duchi, Elad Hanzan, Yoram Singer

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization John Duchi, Elad Hanzan, Yoram Singer Adaptive Subgradient Methods for Online Learning and Stochastic Optimization John Duchi, Elad Hanzan, Yoram Singer Vicente L. Malave February 23, 2011 Outline Notation minimize a number of functions φ

More information

Support Vector Machines and Kernel Methods

Support Vector Machines and Kernel Methods 2018 CS420 Machine Learning, Lecture 3 Hangout from Prof. Andrew Ng. http://cs229.stanford.edu/notes/cs229-notes3.pdf Support Vector Machines and Kernel Methods Weinan Zhang Shanghai Jiao Tong University

More information

Homework 4. Convex Optimization /36-725

Homework 4. Convex Optimization /36-725 Homework 4 Convex Optimization 10-725/36-725 Due Friday November 4 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)

More information

Learning Classification with Auxiliary Probabilistic Information Quang Nguyen Hamed Valizadegan Milos Hauskrecht

Learning Classification with Auxiliary Probabilistic Information Quang Nguyen Hamed Valizadegan Milos Hauskrecht Learning Classification with Auxiliary Probabilistic Information Quang Nguyen Hamed Valizadegan Milos Hauskrecht Computer Science Department University of Pittsburgh Outline Introduction Learning with

More information

Max Margin-Classifier

Max Margin-Classifier Max Margin-Classifier Oliver Schulte - CMPT 726 Bishop PRML Ch. 7 Outline Maximum Margin Criterion Math Maximizing the Margin Non-Separable Data Kernels and Non-linear Mappings Where does the maximization

More information

NEW ROBUST UNSUPERVISED SUPPORT VECTOR MACHINES

NEW ROBUST UNSUPERVISED SUPPORT VECTOR MACHINES J Syst Sci Complex () 24: 466 476 NEW ROBUST UNSUPERVISED SUPPORT VECTOR MACHINES Kun ZHAO Mingyu ZHANG Naiyang DENG DOI:.7/s424--82-8 Received: March 8 / Revised: 5 February 9 c The Editorial Office of

More information

RETRIEVAL MODELS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

RETRIEVAL MODELS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS RETRIEVAL MODELS Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Retrieval models Boolean model Vector space model Probabilistic

More information

Lecture 1: Introduction

Lecture 1: Introduction EE 227A: Convex Optimization and Applications January 17 Lecture 1: Introduction Lecturer: Anh Pham Reading assignment: Chapter 1 of BV 1. Course outline and organization Course web page: http://www.eecs.berkeley.edu/~elghaoui/teaching/ee227a/

More information

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization /

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization / Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear

More information

Statistical Machine Learning from Data

Statistical Machine Learning from Data Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Support Vector Machines Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique

More information

Robust portfolio selection under norm uncertainty

Robust portfolio selection under norm uncertainty Wang and Cheng Journal of Inequalities and Applications (2016) 2016:164 DOI 10.1186/s13660-016-1102-4 R E S E A R C H Open Access Robust portfolio selection under norm uncertainty Lei Wang 1 and Xi Cheng

More information

Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012

Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012 Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Linear classifier Which classifier? x 2 x 1 2 Linear classifier Margin concept x 2

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

A short introduction to supervised learning, with applications to cancer pathway analysis Dr. Christina Leslie

A short introduction to supervised learning, with applications to cancer pathway analysis Dr. Christina Leslie A short introduction to supervised learning, with applications to cancer pathway analysis Dr. Christina Leslie Computational Biology Program Memorial Sloan-Kettering Cancer Center http://cbio.mskcc.org/leslielab

More information

A Convex Upper Bound on the Log-Partition Function for Binary Graphical Models

A Convex Upper Bound on the Log-Partition Function for Binary Graphical Models A Convex Upper Bound on the Log-Partition Function for Binary Graphical Models Laurent El Ghaoui and Assane Gueye Department of Electrical Engineering and Computer Science University of California Berkeley

More information

Lasso: Algorithms and Extensions

Lasso: Algorithms and Extensions ELE 538B: Sparsity, Structure and Inference Lasso: Algorithms and Extensions Yuxin Chen Princeton University, Spring 2017 Outline Proximal operators Proximal gradient methods for lasso and its extensions

More information

A Robust Portfolio Technique for Mitigating the Fragility of CVaR Minimization

A Robust Portfolio Technique for Mitigating the Fragility of CVaR Minimization A Robust Portfolio Technique for Mitigating the Fragility of CVaR Minimization Jun-ya Gotoh Department of Industrial and Systems Engineering Chuo University 2-3-27 Kasuga, Bunkyo-ku, Tokyo 2-855, Japan

More information

Machine Learning for NLP

Machine Learning for NLP Machine Learning for NLP Uppsala University Department of Linguistics and Philology Slides borrowed from Ryan McDonald, Google Research Machine Learning for NLP 1(50) Introduction Linear Classifiers Classifiers

More information

Robust conic quadratic programming with ellipsoidal uncertainties

Robust conic quadratic programming with ellipsoidal uncertainties Robust conic quadratic programming with ellipsoidal uncertainties Roland Hildebrand (LJK Grenoble 1 / CNRS, Grenoble) KTH, Stockholm; November 13, 2008 1 Uncertain conic programs min x c, x : Ax + b K

More information

Statistical Methods for Data Mining

Statistical Methods for Data Mining Statistical Methods for Data Mining Kuangnan Fang Xiamen University Email: xmufkn@xmu.edu.cn Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find

More information

Lecture 6: Conic Optimization September 8

Lecture 6: Conic Optimization September 8 IE 598: Big Data Optimization Fall 2016 Lecture 6: Conic Optimization September 8 Lecturer: Niao He Scriber: Juan Xu Overview In this lecture, we finish up our previous discussion on optimality conditions

More information

Sparse Principal Component Analysis via Alternating Maximization and Efficient Parallel Implementations

Sparse Principal Component Analysis via Alternating Maximization and Efficient Parallel Implementations Sparse Principal Component Analysis via Alternating Maximization and Efficient Parallel Implementations Martin Takáč The University of Edinburgh Joint work with Peter Richtárik (Edinburgh University) Selin

More information

Jeff Howbert Introduction to Machine Learning Winter

Jeff Howbert Introduction to Machine Learning Winter Classification / Regression Support Vector Machines Jeff Howbert Introduction to Machine Learning Winter 2012 1 Topics SVM classifiers for linearly separable classes SVM classifiers for non-linearly separable

More information

MLCC 2017 Regularization Networks I: Linear Models

MLCC 2017 Regularization Networks I: Linear Models MLCC 2017 Regularization Networks I: Linear Models Lorenzo Rosasco UNIGE-MIT-IIT June 27, 2017 About this class We introduce a class of learning algorithms based on Tikhonov regularization We study computational

More information

Support Vector Machines (SVM) in bioinformatics. Day 1: Introduction to SVM

Support Vector Machines (SVM) in bioinformatics. Day 1: Introduction to SVM 1 Support Vector Machines (SVM) in bioinformatics Day 1: Introduction to SVM Jean-Philippe Vert Bioinformatics Center, Kyoto University, Japan Jean-Philippe.Vert@mines.org Human Genome Center, University

More information

Support Vector Machine (SVM) and Kernel Methods

Support Vector Machine (SVM) and Kernel Methods Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2014 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

More information

Machine Learning. Support Vector Machines. Manfred Huber

Machine Learning. Support Vector Machines. Manfred Huber Machine Learning Support Vector Machines Manfred Huber 2015 1 Support Vector Machines Both logistic regression and linear discriminant analysis learn a linear discriminant function to separate the data

More information

Accelerated Proximal Gradient Methods for Convex Optimization

Accelerated Proximal Gradient Methods for Convex Optimization Accelerated Proximal Gradient Methods for Convex Optimization Paul Tseng Mathematics, University of Washington Seattle MOPTA, University of Guelph August 18, 2008 ACCELERATED PROXIMAL GRADIENT METHODS

More information

The Perceptron Algorithm, Margins

The Perceptron Algorithm, Margins The Perceptron Algorithm, Margins MariaFlorina Balcan 08/29/2018 The Perceptron Algorithm Simple learning algorithm for supervised classification analyzed via geometric margins in the 50 s [Rosenblatt

More information

COS 424: Interacting with Data

COS 424: Interacting with Data COS 424: Interacting with Data Lecturer: Rob Schapire Lecture #14 Scribe: Zia Khan April 3, 2007 Recall from previous lecture that in regression we are trying to predict a real value given our data. Specically,

More information

Convex Optimization and Support Vector Machine

Convex Optimization and Support Vector Machine Convex Optimization and Support Vector Machine Problem 0. Consider a two-class classification problem. The training data is L n = {(x 1, t 1 ),..., (x n, t n )}, where each t i { 1, 1} and x i R p. We

More information

Homework 3. Convex Optimization /36-725

Homework 3. Convex Optimization /36-725 Homework 3 Convex Optimization 10-725/36-725 Due Friday October 14 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)

More information

Support Vector Machine (SVM) and Kernel Methods

Support Vector Machine (SVM) and Kernel Methods Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2016 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

More information

Robust Principal Component Analysis

Robust Principal Component Analysis ELE 538B: Mathematics of High-Dimensional Data Robust Principal Component Analysis Yuxin Chen Princeton University, Fall 2018 Disentangling sparse and low-rank matrices Suppose we are given a matrix M

More information