Short Course Robust Optimization and Machine Learning. Lecture 6: Robust Optimization in Machine Learning
|
|
- Anissa Young
- 6 years ago
- Views:
Transcription
1 Short Course Robust Optimization and Machine Machine Lecture 6: Robust Optimization in Machine Laurent El Ghaoui EECS and IEOR Departments UC Berkeley Spring seminar TRANSP-OR, Zinal, Jan , 2012
2 Outline Machine
3 Outline Machine
4 Supervised learning problems Many supervised learning problems (e.g., classification, regression) can be written as min w L(X T w) where L is convex, and X contains the data. Machine
5 Penalty approach Often, optimal value and solutions of optimization problems are sensitive to data. A common approach to deal with sensitivity is via penalization, e.g.: min x L(X T w) + Wx 2 2 (W = weighting matrix). How do we choose the penalty? Can we choose it in a way that reflects knowledge about problem structure, or how uncertainty affects data? Does it lead to better solutions from machine learning viewpoint? Machine
6 Support Vector Machine Support Vector Machine (SVM) classification problem: min w,b m (1 y i (zi T w + b)) + i=1 Z := [z 1,..., z m] R n m contains the data points. y { 1, 1} m contain the labels. x := (w, b) contains the classifier parameters, allowing to classify a new point z via the rule y = sgn(z T w + b). Machine
7 Robustness to data uncertainty Assume the data matrix is only partially known, and address the robust optimization problem: min w,b max U U m (1 y i ((z i + u i ) T w + b)) +, i=1 where U = [u 1,..., u m] and U R n m is a set that describes additive uncertainty in the data matrix. Machine
8 Measurement-wise, spherical uncertainty Assume Machine where ρ > 0 is given. U = {U = [u 1,..., u m] R n m : u i 2 ρ}, Robust SVM reduces to min w,b m (1 y i (zi T w + b) + ρ w 2 ) +. i=1
9 Link with classical SVM Classical SVM contains l 2 -norm regularization term: min w,b m (1 y i (zi T w + b)) + + λ w 2 2. i=1 where λ > 0 is a penalty parameter. With spherical uncertainty, robust SVM is similar to classical SVM. Machine When data is separable, the two models are equivalent...
10 Separable data Machine Maximally robust classifier for separable data, with spherical uncertainties around each data point. In this case, the robust counterpart reduces to the classical maximum-margin classifier problem.
11 Interval uncertainty Assume U = {U R n m where ρ > 0 is given. : (i, j), U ij ρ}, Machine Robust SVM reduces to min w,b m (1 y i (zi T w + b) + ρ w 1 ) +. i=1 The l 1 -norm term encourages sparsity, and may not regularize the solution.
12 Separable data Machine Maximally robust classifier for separable data, with box uncertainties around each data point. This uncertainty model encourages sparsity of the solution.
13 Other uncertainty models We may generalize the approach to other uncertainty models, retaining tractability: Measurement-wise uncertainty models: perturbations affect each data point independent of each other. Other models couple the way uncertainties affect each measurement; for example we may control the number of errors across all the measurements. Norm-bound models allow for uncertainty of data matrix that is bounded in matrix norm. A whole theory is presented in [1]. Machine
14 Consider standard l 1 -penalized SVM: φ λ (X) := min w,b Constrained counterpart: ψ c(x) := min w,b 1 m m (1 y i (w T x i + b)) + + λ w 1 i=1 m (1 y i (xi T w + b)) + : w 1 c i=1 Machine Basic goal: solve these problems in the large-scale case. Approach: use to sparsify the data matrix in a controlled way.
15 Thresholding data We threshold the data using an absolute level t: { 0 if xi,j t (x i (t)) j := 1 otherwise This will make the data sparser, resulting in memory and time savings. Machine
16 Handling thresholding errors Handle thresholding errors via robust counterpart: (w(t), b(t)) := arg min w,b Above problem is tractable. max Z X t m (1 y i (w T z i + b)) + + λ w 1. i=1 Machine The solution w(t) at threshold level t satisfies 0 1 m m (1 y i (x T i w(t) + b(t))) + + λ w(t) 1 φ λ (X) 2t λ. i=1
17 Results 20 news groups data set Machine Dataset size: 20, , 000. Thresholding of data matrix of TF-IDF scores.
18 Results UCI NYTimes Dataset 1 stock 11 bond 2 nasdaq 12 forecast 3 portfolio 13 thomson financial 4 brokerage 14 index 5 exchanges 15 royal bank 6 shareholder 16 fund 7 fund 17 marketing 8 investor 18 companies 9 alan greenspan 19 bank 10 fed 20 merrill Top 20 keywords for topic stock. Dataset size: 100, , 660, 30,000,000 non-zeros. Thresholded dataset (by TF-IDF scores) with level ,000 non-zeroes (2.8 %). Total run time: 4317s. Machine
19 Robust SVM with Data: boolean Z {0, 1} n m (eg, co-occurence matrix) Nominal problem: SVM min w,b m (1 y i (zi T w + b)) +, i=1 Uncertainty model: assume each data value can be flipped, total budget of flips is constrained: { } U = U = [u 1,..., u m] R l m : u i { 1, 0, 1} l, u 1 k. Machine
20 Robust counterpart where min w,b m (1 y i (zi T w + b) + φ(w)) +, i=1 φ(w) := min s k w s + s 1 Penalty is a combination of l 1, l norms. Problem is tractable (doubles number of variables over nominal). Still needs regularization. Machine
21 Results UCI Internet advertisement data set Machine Dataset size: k = 0 corresponds to nominal SVM problem. Best performance at k = 3.
22 Refined model We can impose u i {0, 1 2x i }. This leads to a new penalty: with min w,b m (1 y i (xi T w + b) + φ i (w)) +, i=1 n φ i (w) := min kµ + (y i w j (2x ij 1) µ) + µ 0 Problem can still be solved via LP. j=1 Machine
23 Results UCI Heart data set Machine Dataset size: k = 0 corresponds to nominal SVM problem. Best performance at k = 1.
24 Outline Machine
25 Nominal problem where min L(Z T θ), θ Θ Z := [z 1,..., z m] R n m is the data matrix L : R m R is a convex loss function Θ imposes structure (eg, sign) constraints on parameter vector θ Machine
26 Loss function: assumptions We assume that L(r) = π(abs(p(r))), where abs( ) acts componentwise, π : R m + R is a convex, monotone function on the non-negative orthant, and { r ( symmetric case ) P(r) = ( asymmetric case ) r + with r + the vector with components max(r i, 0), i = 1,..., m. Machine
27 Loss function: examples l p-norm regression hinge loss Huber, Berhu loss Machine
28 Robust counterpart min max θ Θ Z Z L(ZT θ). where Z R n m is a set of the form Z = {Z + : ρd, }, with ρ 0 a measure of the size of the uncertainty, and D R l m is given. Machine
29 Generic analysis For a given vector θ, we have max Z Z L(ZT θ) = max u where L is the conjugate of L, and is the support function of D. u T Z T θ L (u) + ρφ D(uv T ), φ D(X) := max X, D Machine
30 Assumptions on uncertainty set D Separability condition: there exist two semi-norms φ, ψ such that φ D(uv T ) := max D ut v = φ(u)ψ(v). Does not completely characterize (the support function of) the set D Given φ, ψ, we can construct a set D out that obeys condition The robust counterpart only depends on φ, ψ. WLOG, we can replace D by its convex hull. Machine
31 Largest singular value model: D = { : ρ}, with φ, ψ Euclidean norms. Any norm-bound model involving an induced norm (φ, ψ are then the norms dual to the norms involved). Measurement-wise uncertainty models, where each column of the perturbation matrix is bounded in norm, independently of the others, correspond to the case with ψ(v) = v 1. Machine
32 Other examples Bounded-error model: there are (at most K ) errors affecting data Machine D = = [λ 1 δ 1,..., λ mδ m] R l m : δ i 1, i = 1,..., m,. m λ i K, λ {0, 1} m for which φ( ) =, ψ(v) = sum of the K largest magnitudes of the components of v. i=1
33 (follow d) The set { D = = [λ 1 δ 1,..., λ mδ m] R l m } : δ i { 1, 0, 1} l, δ 1 k models measurement-wise uncertainty affecting (we can impose δ i {x i 1, x i } to be more realistic) In this case, we have ψ( ) = 1 and φ(u) = u 1,k := min w k u w + w 1. Machine
34 Main result For a given vector θ, we have where min θ max Z Z L(ZT θ) = min θ,κ Lwc(Z T θ, κ) : κ φ(u T θ) L(r, κ) := max v v T r L (v) + κψ(v) is the worst-case loss function of the robust problem. Machine
35 Worst-case loss function The tractability of the robust counterpart is directly linked to our ability to compute optimal solutions v for L(r, κ) = max v v T r L (v) + κψ(v) Dual representation (assume ψ( ) = is a norm): L(r, κ) = max ξ L(r + κξ) : ξ 1 When ψ is the Euclidean norm, robust regularization of L (Lewis, 2001). Machine
36 When ψ( ) = p, p = 1,, problem reduces to simple, tractable convex problem (assuming nominal problem is). For p = 2, problem can be reduced to such a simple form, for the hinge, l q-norm and Huber loss functions. Machine
37 Lasso In particular, the least-squares problem with lasso penalty min θ X T θ y 2 + ρ θ 1 is the robust counterpart to a least-squares problem with uncertainty on X, with additive perturbation bounded in the norm n 1,2 := max 2 ij. 1 i l j=1 Machine
38 Globalized robust counterpart The robust counterpart is based on the worst-case value of the loss function assuming a bound on the data uncertainty (Z Z): min max θ Θ Z Z L(ZT θ). The approach does not control the degradation of the loss outside the set Z. Machine
39 Globalized robust counterpart: formulation In globalized robust counterpart, we fix a rate of degradation of the loss, which controls the amount of degradation of the loss as the data matrix Z goes away from the set Z. We seek to minimize τ, such that : L((Z + ) T θ) τ + α, where α > 0 controls the rate of degradation, and is a matrix norm. Machine
40 Globalized robust counterpart For the SVM case, the globalized robust counterpart can be expressed as: min w,b m (1 y i (zi T w + b)) + : m θ 2 α, i=1 which is a classical form of SVM. For l p-norm regression with m data points, the globalized robust counterpart takes the form min X T θ y p : κ(m, p) θ 2 α θ Machine where κ(m, 1) = m, κ(m, 2) = κ(m, ) = 1.
41 can address problems with chance constraints min θ max EpL(Z p P (δ)t θ) where δ follows distribution p, and P is a class of distributions Results are more limited, focused on upper bounds. Convex relaxations are available, but more expensive. Approach uses Bernstein approximations (Nemirovski & Ben-tal, 2006). Machine
42 Robust regression with chance constraints: an example φ p := min θ max x (ˆx,X) Regression variable is θ R n E x A(x)θ b(x) p x R q is an uncertainty vector that enters affinely in the problem matrices: [A(x), b(x)] = [A 0, b 0 ] + i x i[a i, b i ]. The distribution of uncertainty vector x is unknown, except for its mean ˆx and covariance X. Objective is worst-case (over distributions) expected value of l p-norm residual (p = 1, 2). Machine
43 Main result (Assume ˆx = 0, X = I WLOG) For p = 2, the problem reduces to least-squares: Machine φ 2 2 = min θ q A i θ b i 2 2 i=0 For p = 1, we have (2/π)ψ 1 φ 1 ψ 1, with ψ 1 = min θ q A i θ b i 2 i=0
44 Example: robust median As a special case, consider the median problem: min θ q θ x i i=1 Now assume that vector x is random, with mean ˆx and covariance X, and consider the robust version: φ 1 := min θ max E x (ˆx,X) x q θ x i i=1 Machine
45 Approximate solution We have (2/π)ψ 1 φ 1 ψ 1, with ψ 1 := n i=1 (θ ˆx i ) 2 + X ii Amounts to find the minimum distance sum (a very simple SOCP). Machine
46 Geometry of robust median problem std Nominal vs. robust median nominal points std dev. median robust median Machine x
47 Outline Machine
48 Machine A. Bental, L. El Ghaoui, and A. Nemirovski. Robust Optimization. Princeton Series in Applied Mathematics. Princeton University Press, October C. Caramanis, S. Mannor, and H. Xu. Robust optimization in machine learning. In S. Sra, S. Nowozin, and S. Wright, editors, Optimization for Machine, chapter 14. MIT Press, 2011.
Robustness and Regularization: An Optimization Perspective
Robustness and Regularization: An Optimization Perspective Laurent El Ghaoui (EECS/IEOR, UC Berkeley) with help from Brian Gawalt, Onureena Banerjee Neyman Seminar, Statistics Department, UC Berkeley September
More informationShort Course Robust Optimization and Machine Learning. 3. Optimization in Supervised Learning
Short Course Robust Optimization and 3. Optimization in Supervised EECS and IEOR Departments UC Berkeley Spring seminar TRANSP-OR, Zinal, Jan. 16-19, 2012 Outline Overview of Supervised models and variants
More informationConvex Optimization in Classification Problems
New Trends in Optimization and Computational Algorithms December 9 13, 2001 Convex Optimization in Classification Problems Laurent El Ghaoui Department of EECS, UC Berkeley elghaoui@eecs.berkeley.edu 1
More informationSparse and Robust Optimization and Applications
Sparse and and Statistical Learning Workshop Les Houches, 2013 Robust Laurent El Ghaoui with Mert Pilanci, Anh Pham EECS Dept., UC Berkeley January 7, 2013 1 / 36 Outline Sparse Sparse Sparse Probability
More informationShort Course Robust Optimization and Machine Learning. Lecture 4: Optimization in Unsupervised Learning
Short Course Robust Optimization and Machine Machine Lecture 4: Optimization in Unsupervised Laurent El Ghaoui EECS and IEOR Departments UC Berkeley Spring seminar TRANSP-OR, Zinal, Jan. 16-19, 2012 s
More informationSemidefinite and Second Order Cone Programming Seminar Fall 2012 Project: Robust Optimization and its Application of Robust Portfolio Optimization
Semidefinite and Second Order Cone Programming Seminar Fall 2012 Project: Robust Optimization and its Application of Robust Portfolio Optimization Instructor: Farid Alizadeh Author: Ai Kagawa 12/12/2012
More informationLarge-scale Robust Optimization and Applications
and Applications : Applications Laurent El Ghaoui EECS and IEOR Departments UC Berkeley SESO 2015 Tutorial s June 22, 2015 Outline of Machine Learning s Robust for s Outline of Machine Learning s Robust
More informationA direct formulation for sparse PCA using semidefinite programming
A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley A. d Aspremont, INFORMS, Denver,
More informationSparse PCA with applications in finance
Sparse PCA with applications in finance A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon 1 Introduction
More informationSecond Order Cone Programming, Missing or Uncertain Data, and Sparse SVMs
Second Order Cone Programming, Missing or Uncertain Data, and Sparse SVMs Ammon Washburn University of Arizona September 25, 2015 1 / 28 Introduction We will begin with basic Support Vector Machines (SVMs)
More informationRobust Optimization and Data Approximation in Machine Learning
Robust Optimization and Data Approximation in Machine Learning Gia Vinh Anh Pham Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2015-216
More informationRobust Optimization and Data Approximation in Machine Learning. Gia Vinh Anh Pham
Robust Optimization and Data Approximation in Machine Learning by Gia Vinh Anh Pham A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer
More informationMachine Learning And Applications: Supervised Learning-SVM
Machine Learning And Applications: Supervised Learning-SVM Raphaël Bournhonesque École Normale Supérieure de Lyon, Lyon, France raphael.bournhonesque@ens-lyon.fr 1 Supervised vs unsupervised learning Machine
More informationA direct formulation for sparse PCA using semidefinite programming
A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon
More informationEE 227A: Convex Optimization and Applications April 24, 2008
EE 227A: Convex Optimization and Applications April 24, 2008 Lecture 24: Robust Optimization: Chance Constraints Lecturer: Laurent El Ghaoui Reading assignment: Chapter 2 of the book on Robust Optimization
More informationCSC 411 Lecture 17: Support Vector Machine
CSC 411 Lecture 17: Support Vector Machine Ethan Fetaya, James Lucas and Emad Andrews University of Toronto CSC411 Lec17 1 / 1 Today Max-margin classification SVM Hard SVM Duality Soft SVM CSC411 Lec17
More informationConvex Optimization M2
Convex Optimization M2 Lecture 8 A. d Aspremont. Convex Optimization M2. 1/57 Applications A. d Aspremont. Convex Optimization M2. 2/57 Outline Geometrical problems Approximation problems Combinatorial
More informationInterval Data Classification under Partial Information: A Chance-constraint Approach
Interval Data Classification under Partial Information: A Chance-constraint Approach Sahely Bhadra J. Saketha Nath Aharon Ben-Tal Chiranjib Bhattacharyya PAKDD 2009 J. Saketha Nath (PAKDD 09) Conference
More informationA Second order Cone Programming Formulation for Classifying Missing Data
A Second order Cone Programming Formulation for Classifying Missing Data Chiranjib Bhattacharyya Department of Computer Science and Automation Indian Institute of Science Bangalore, 560 012, India chiru@csa.iisc.ernet.in
More informationRobust linear optimization under general norms
Operations Research Letters 3 (004) 50 56 Operations Research Letters www.elsevier.com/locate/dsw Robust linear optimization under general norms Dimitris Bertsimas a; ;, Dessislava Pachamanova b, Melvyn
More informationHandout 8: Dealing with Data Uncertainty
MFE 5100: Optimization 2015 16 First Term Handout 8: Dealing with Data Uncertainty Instructor: Anthony Man Cho So December 1, 2015 1 Introduction Conic linear programming CLP, and in particular, semidefinite
More informationA Least Squares Formulation for Canonical Correlation Analysis
A Least Squares Formulation for Canonical Correlation Analysis Liang Sun, Shuiwang Ji, and Jieping Ye Department of Computer Science and Engineering Arizona State University Motivation Canonical Correlation
More informationLecture 7: Kernels for Classification and Regression
Lecture 7: Kernels for Classification and Regression CS 194-10, Fall 2011 Laurent El Ghaoui EECS Department UC Berkeley September 15, 2011 Outline Outline A linear regression problem Linear auto-regressive
More informationRobust Inverse Covariance Estimation under Noisy Measurements
.. Robust Inverse Covariance Estimation under Noisy Measurements Jun-Kun Wang, Shou-De Lin Intel-NTU, National Taiwan University ICML 2014 1 / 30 . Table of contents Introduction.1 Introduction.2 Related
More informationAnnouncements - Homework
Announcements - Homework Homework 1 is graded, please collect at end of lecture Homework 2 due today Homework 3 out soon (watch email) Ques 1 midterm review HW1 score distribution 40 HW1 total score 35
More informationCS295: Convex Optimization. Xiaohui Xie Department of Computer Science University of California, Irvine
CS295: Convex Optimization Xiaohui Xie Department of Computer Science University of California, Irvine Course information Prerequisites: multivariate calculus and linear algebra Textbook: Convex Optimization
More informationSupport vector machines
Support vector machines Guillaume Obozinski Ecole des Ponts - ParisTech SOCN course 2014 SVM, kernel methods and multiclass 1/23 Outline 1 Constrained optimization, Lagrangian duality and KKT 2 Support
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationHandout 6: Some Applications of Conic Linear Programming
ENGG 550: Foundations of Optimization 08 9 First Term Handout 6: Some Applications of Conic Linear Programming Instructor: Anthony Man Cho So November, 08 Introduction Conic linear programming CLP, and
More informationLIGHT ROBUSTNESS. Matteo Fischetti, Michele Monaci. DEI, University of Padova. 1st ARRIVAL Annual Workshop and Review Meeting, Utrecht, April 19, 2007
LIGHT ROBUSTNESS Matteo Fischetti, Michele Monaci DEI, University of Padova 1st ARRIVAL Annual Workshop and Review Meeting, Utrecht, April 19, 2007 Work supported by the Future and Emerging Technologies
More informationSelected Examples of CONIC DUALITY AT WORK Robust Linear Optimization Synthesis of Linear Controllers Matrix Cube Theorem A.
. Selected Examples of CONIC DUALITY AT WORK Robust Linear Optimization Synthesis of Linear Controllers Matrix Cube Theorem A. Nemirovski Arkadi.Nemirovski@isye.gatech.edu Linear Optimization Problem,
More informationSupport Vector Machines for Classification and Regression. 1 Linearly Separable Data: Hard Margin SVMs
E0 270 Machine Learning Lecture 5 (Jan 22, 203) Support Vector Machines for Classification and Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in
More informationBig Data Analytics: Optimization and Randomization
Big Data Analytics: Optimization and Randomization Tianbao Yang Tutorial@ACML 2015 Hong Kong Department of Computer Science, The University of Iowa, IA, USA Nov. 20, 2015 Yang Tutorial for ACML 15 Nov.
More informationRobust Optimization for Risk Control in Enterprise-wide Optimization
Robust Optimization for Risk Control in Enterprise-wide Optimization Juan Pablo Vielma Department of Industrial Engineering University of Pittsburgh EWO Seminar, 011 Pittsburgh, PA Uncertainty in Optimization
More informationEE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015
EE613 Machine Learning for Engineers Kernel methods Support Vector Machines jean-marc odobez 2015 overview Kernel methods introductions and main elements defining kernels Kernelization of k-nn, K-Means,
More informationRobust Kernel-Based Regression
Robust Kernel-Based Regression Budi Santosa Department of Industrial Engineering Sepuluh Nopember Institute of Technology Kampus ITS Surabaya Surabaya 60111,Indonesia Theodore B. Trafalis School of Industrial
More informationInterval solutions for interval algebraic equations
Mathematics and Computers in Simulation 66 (2004) 207 217 Interval solutions for interval algebraic equations B.T. Polyak, S.A. Nazin Institute of Control Sciences, Russian Academy of Sciences, 65 Profsoyuznaya
More informationSupport Vector Machines: Maximum Margin Classifiers
Support Vector Machines: Maximum Margin Classifiers Machine Learning and Pattern Recognition: September 16, 2008 Piotr Mirowski Based on slides by Sumit Chopra and Fu-Jie Huang 1 Outline What is behind
More informationRobust Novelty Detection with Single Class MPM
Robust Novelty Detection with Single Class MPM Gert R.G. Lanckriet EECS, U.C. Berkeley gert@eecs.berkeley.edu Laurent El Ghaoui EECS, U.C. Berkeley elghaoui@eecs.berkeley.edu Michael I. Jordan Computer
More informationCS6375: Machine Learning Gautam Kunapuli. Support Vector Machines
Gautam Kunapuli Example: Text Categorization Example: Develop a model to classify news stories into various categories based on their content. sports politics Use the bag-of-words representation for this
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More informationEstimation and Optimization: Gaps and Bridges. MURI Meeting June 20, Laurent El Ghaoui. UC Berkeley EECS
MURI Meeting June 20, 2001 Estimation and Optimization: Gaps and Bridges Laurent El Ghaoui EECS UC Berkeley 1 goals currently, estimation (of model parameters) and optimization (of decision variables)
More informationKernel Methods and Support Vector Machines
Kernel Methods and Support Vector Machines Oliver Schulte - CMPT 726 Bishop PRML Ch. 6 Support Vector Machines Defining Characteristics Like logistic regression, good for continuous input features, discrete
More informationWarm up: risk prediction with logistic regression
Warm up: risk prediction with logistic regression Boss gives you a bunch of data on loans defaulting or not: {(x i,y i )} n i= x i 2 R d, y i 2 {, } You model the data as: P (Y = y x, w) = + exp( yw T
More informationMachine Learning. Support Vector Machines. Fabio Vandin November 20, 2017
Machine Learning Support Vector Machines Fabio Vandin November 20, 2017 1 Classification and Margin Consider a classification problem with two classes: instance set X = R d label set Y = { 1, 1}. Training
More informationCanonical Problem Forms. Ryan Tibshirani Convex Optimization
Canonical Problem Forms Ryan Tibshirani Convex Optimization 10-725 Last time: optimization basics Optimization terology (e.g., criterion, constraints, feasible points, solutions) Properties and first-order
More informationEE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 18
EE/ACM 150 - Applications of Convex Optimization in Signal Processing and Communications Lecture 18 Andre Tkacenko Signal Processing Research Group Jet Propulsion Laboratory May 31, 2012 Andre Tkacenko
More informationLecture 20: November 1st
10-725: Optimization Fall 2012 Lecture 20: November 1st Lecturer: Geoff Gordon Scribes: Xiaolong Shen, Alex Beutel Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have not
More informationStatistical Analysis of Online News
Statistical Analysis of Online News Laurent El Ghaoui EECS, UC Berkeley Information Systems Colloquium, Stanford University November 29, 2007 Collaborators Joint work with and UCB students: Alexandre d
More informationSupport Vector Machines for Classification: A Statistical Portrait
Support Vector Machines for Classification: A Statistical Portrait Yoonkyung Lee Department of Statistics The Ohio State University May 27, 2011 The Spring Conference of Korean Statistical Society KAIST,
More informationBrief Introduction to Machine Learning
Brief Introduction to Machine Learning Yuh-Jye Lee Lab of Data Science and Machine Intelligence Dept. of Applied Math. at NCTU August 29, 2016 1 / 49 1 Introduction 2 Binary Classification 3 Support Vector
More informationNon-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines
Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2018 CS 551, Fall
More informationKernel Machines. Pradeep Ravikumar Co-instructor: Manuela Veloso. Machine Learning
Kernel Machines Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 SVM linearly separable case n training points (x 1,, x n ) d features x j is a d-dimensional vector Primal problem:
More information1 Robust optimization
ORF 523 Lecture 16 Princeton University Instructor: A.A. Ahmadi Scribe: G. Hall Any typos should be emailed to a a a@princeton.edu. In this lecture, we give a brief introduction to robust optimization
More informationIE 521 Convex Optimization
Lecture 14: and Applications 11th March 2019 Outline LP SOCP SDP LP SOCP SDP 1 / 21 Conic LP SOCP SDP Primal Conic Program: min c T x s.t. Ax K b (CP) : b T y s.t. A T y = c (CD) y K 0 Theorem. (Strong
More informationSupport Vector Machines, Kernel SVM
Support Vector Machines, Kernel SVM Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 27, 2017 1 / 40 Outline 1 Administration 2 Review of last lecture 3 SVM
More informationSupport Vector Machines
Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find a plane that separates the classes in feature space. If we cannot, we get creative in two
More informationSparse Covariance Selection using Semidefinite Programming
Sparse Covariance Selection using Semidefinite Programming A. d Aspremont ORFE, Princeton University Joint work with O. Banerjee, L. El Ghaoui & G. Natsoulis, U.C. Berkeley & Iconix Pharmaceuticals Support
More informationThe Alternating Direction Method of Multipliers
The Alternating Direction Method of Multipliers With Adaptive Step Size Selection Peter Sutor, Jr. Project Advisor: Professor Tom Goldstein December 2, 2015 1 / 25 Background The Dual Problem Consider
More informationMathematical Optimization Models and Applications
Mathematical Optimization Models and Applications Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/ yyye Chapters 1, 2.1-2,
More informationLecture 18: Kernels Risk and Loss Support Vector Regression. Aykut Erdem December 2016 Hacettepe University
Lecture 18: Kernels Risk and Loss Support Vector Regression Aykut Erdem December 2016 Hacettepe University Administrative We will have a make-up lecture on next Saturday December 24, 2016 Presentations
More informationCS , Fall 2011 Assignment 2 Solutions
CS 94-0, Fall 20 Assignment 2 Solutions (8 pts) In this question we briefly review the expressiveness of kernels (a) Construct a support vector machine that computes the XOR function Use values of + and
More informationStat542 (F11) Statistical Learning. First consider the scenario where the two classes of points are separable.
Linear SVM (separable case) First consider the scenario where the two classes of points are separable. It s desirable to have the width (called margin) between the two dashed lines to be large, i.e., have
More informationSVMs: Non-Separable Data, Convex Surrogate Loss, Multi-Class Classification, Kernels
SVMs: Non-Separable Data, Convex Surrogate Loss, Multi-Class Classification, Kernels Karl Stratos June 21, 2018 1 / 33 Tangent: Some Loose Ends in Logistic Regression Polynomial feature expansion in logistic
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More information15. Conic optimization
L. Vandenberghe EE236C (Spring 216) 15. Conic optimization conic linear program examples modeling duality 15-1 Generalized (conic) inequalities Conic inequality: a constraint x K where K is a convex cone
More informationAdaptive Subgradient Methods for Online Learning and Stochastic Optimization John Duchi, Elad Hanzan, Yoram Singer
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization John Duchi, Elad Hanzan, Yoram Singer Vicente L. Malave February 23, 2011 Outline Notation minimize a number of functions φ
More informationSupport Vector Machines and Kernel Methods
2018 CS420 Machine Learning, Lecture 3 Hangout from Prof. Andrew Ng. http://cs229.stanford.edu/notes/cs229-notes3.pdf Support Vector Machines and Kernel Methods Weinan Zhang Shanghai Jiao Tong University
More informationHomework 4. Convex Optimization /36-725
Homework 4 Convex Optimization 10-725/36-725 Due Friday November 4 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)
More informationLearning Classification with Auxiliary Probabilistic Information Quang Nguyen Hamed Valizadegan Milos Hauskrecht
Learning Classification with Auxiliary Probabilistic Information Quang Nguyen Hamed Valizadegan Milos Hauskrecht Computer Science Department University of Pittsburgh Outline Introduction Learning with
More informationMax Margin-Classifier
Max Margin-Classifier Oliver Schulte - CMPT 726 Bishop PRML Ch. 7 Outline Maximum Margin Criterion Math Maximizing the Margin Non-Separable Data Kernels and Non-linear Mappings Where does the maximization
More informationNEW ROBUST UNSUPERVISED SUPPORT VECTOR MACHINES
J Syst Sci Complex () 24: 466 476 NEW ROBUST UNSUPERVISED SUPPORT VECTOR MACHINES Kun ZHAO Mingyu ZHANG Naiyang DENG DOI:.7/s424--82-8 Received: March 8 / Revised: 5 February 9 c The Editorial Office of
More informationRETRIEVAL MODELS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS
RETRIEVAL MODELS Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Retrieval models Boolean model Vector space model Probabilistic
More informationLecture 1: Introduction
EE 227A: Convex Optimization and Applications January 17 Lecture 1: Introduction Lecturer: Anh Pham Reading assignment: Chapter 1 of BV 1. Course outline and organization Course web page: http://www.eecs.berkeley.edu/~elghaoui/teaching/ee227a/
More informationUses of duality. Geoff Gordon & Ryan Tibshirani Optimization /
Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Support Vector Machines Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique
More informationRobust portfolio selection under norm uncertainty
Wang and Cheng Journal of Inequalities and Applications (2016) 2016:164 DOI 10.1186/s13660-016-1102-4 R E S E A R C H Open Access Robust portfolio selection under norm uncertainty Lei Wang 1 and Xi Cheng
More informationSupport Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012
Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Linear classifier Which classifier? x 2 x 1 2 Linear classifier Margin concept x 2
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring
More informationA short introduction to supervised learning, with applications to cancer pathway analysis Dr. Christina Leslie
A short introduction to supervised learning, with applications to cancer pathway analysis Dr. Christina Leslie Computational Biology Program Memorial Sloan-Kettering Cancer Center http://cbio.mskcc.org/leslielab
More informationA Convex Upper Bound on the Log-Partition Function for Binary Graphical Models
A Convex Upper Bound on the Log-Partition Function for Binary Graphical Models Laurent El Ghaoui and Assane Gueye Department of Electrical Engineering and Computer Science University of California Berkeley
More informationLasso: Algorithms and Extensions
ELE 538B: Sparsity, Structure and Inference Lasso: Algorithms and Extensions Yuxin Chen Princeton University, Spring 2017 Outline Proximal operators Proximal gradient methods for lasso and its extensions
More informationA Robust Portfolio Technique for Mitigating the Fragility of CVaR Minimization
A Robust Portfolio Technique for Mitigating the Fragility of CVaR Minimization Jun-ya Gotoh Department of Industrial and Systems Engineering Chuo University 2-3-27 Kasuga, Bunkyo-ku, Tokyo 2-855, Japan
More informationMachine Learning for NLP
Machine Learning for NLP Uppsala University Department of Linguistics and Philology Slides borrowed from Ryan McDonald, Google Research Machine Learning for NLP 1(50) Introduction Linear Classifiers Classifiers
More informationRobust conic quadratic programming with ellipsoidal uncertainties
Robust conic quadratic programming with ellipsoidal uncertainties Roland Hildebrand (LJK Grenoble 1 / CNRS, Grenoble) KTH, Stockholm; November 13, 2008 1 Uncertain conic programs min x c, x : Ax + b K
More informationStatistical Methods for Data Mining
Statistical Methods for Data Mining Kuangnan Fang Xiamen University Email: xmufkn@xmu.edu.cn Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find
More informationLecture 6: Conic Optimization September 8
IE 598: Big Data Optimization Fall 2016 Lecture 6: Conic Optimization September 8 Lecturer: Niao He Scriber: Juan Xu Overview In this lecture, we finish up our previous discussion on optimality conditions
More informationSparse Principal Component Analysis via Alternating Maximization and Efficient Parallel Implementations
Sparse Principal Component Analysis via Alternating Maximization and Efficient Parallel Implementations Martin Takáč The University of Edinburgh Joint work with Peter Richtárik (Edinburgh University) Selin
More informationJeff Howbert Introduction to Machine Learning Winter
Classification / Regression Support Vector Machines Jeff Howbert Introduction to Machine Learning Winter 2012 1 Topics SVM classifiers for linearly separable classes SVM classifiers for non-linearly separable
More informationMLCC 2017 Regularization Networks I: Linear Models
MLCC 2017 Regularization Networks I: Linear Models Lorenzo Rosasco UNIGE-MIT-IIT June 27, 2017 About this class We introduce a class of learning algorithms based on Tikhonov regularization We study computational
More informationSupport Vector Machines (SVM) in bioinformatics. Day 1: Introduction to SVM
1 Support Vector Machines (SVM) in bioinformatics Day 1: Introduction to SVM Jean-Philippe Vert Bioinformatics Center, Kyoto University, Japan Jean-Philippe.Vert@mines.org Human Genome Center, University
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2014 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationMachine Learning. Support Vector Machines. Manfred Huber
Machine Learning Support Vector Machines Manfred Huber 2015 1 Support Vector Machines Both logistic regression and linear discriminant analysis learn a linear discriminant function to separate the data
More informationAccelerated Proximal Gradient Methods for Convex Optimization
Accelerated Proximal Gradient Methods for Convex Optimization Paul Tseng Mathematics, University of Washington Seattle MOPTA, University of Guelph August 18, 2008 ACCELERATED PROXIMAL GRADIENT METHODS
More informationThe Perceptron Algorithm, Margins
The Perceptron Algorithm, Margins MariaFlorina Balcan 08/29/2018 The Perceptron Algorithm Simple learning algorithm for supervised classification analyzed via geometric margins in the 50 s [Rosenblatt
More informationCOS 424: Interacting with Data
COS 424: Interacting with Data Lecturer: Rob Schapire Lecture #14 Scribe: Zia Khan April 3, 2007 Recall from previous lecture that in regression we are trying to predict a real value given our data. Specically,
More informationConvex Optimization and Support Vector Machine
Convex Optimization and Support Vector Machine Problem 0. Consider a two-class classification problem. The training data is L n = {(x 1, t 1 ),..., (x n, t n )}, where each t i { 1, 1} and x i R p. We
More informationHomework 3. Convex Optimization /36-725
Homework 3 Convex Optimization 10-725/36-725 Due Friday October 14 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2016 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationRobust Principal Component Analysis
ELE 538B: Mathematics of High-Dimensional Data Robust Principal Component Analysis Yuxin Chen Princeton University, Fall 2018 Disentangling sparse and low-rank matrices Suppose we are given a matrix M
More information