CSCI5654 (Linear Programming, Fall 2013) Lectures Lectures 10,11 Slide# 1
|
|
- Nelson Bailey
- 5 years ago
- Views:
Transcription
1 CSCI5654 (Linear Programming, Fall 2013) Lectures Lectures 10,11 Slide# 1
2 Today s Lecture 1. Introduction to norms: L 1,L 2,L. 2. Casting absolute value and max operators. 3. Norm minimization problems. 4. Applications: fitting, classification, denoising etc.. Lectures 10,11 Slide# 2
3 Norm Basic idea: Notion of length/distance between vector. Definition (Norm): Let l : X R 0 be a mapping from a vector space to non-negative reals. The function l is a norm iff 1. l( x) = 0 iff 0, 2. l(a x) = a l( x) 3. l( x + y) l( x)+l( y) (Triangle Inequality). Length of x: l( x). Distance between x, y is l( y x)(= l( x y)). Lectures 10,11 Slide# 3
4 L 2 (Euclidean) norm Distance between (x 1,y 1 ) to (x 2,y 2 ) is (x 1 x 2 ) 2 +(y 1 y 2 ) 2. In general, x 2 = x t. x. Distance between x, y is given by x y 2. Lectures 10,11 Slide# 4
5 L p norm For p 1, we define L p norm as x p = ( x 1 p + x 2 p x n p ) 1 p. The L p norm distance between two vectors is: x y p L 1 norm: x 1 = x 1 + x x n. Class exercise: verify that L 1 norm distance is indeed a norm. Q: Why is p 1 necessary? Lectures 10,11 Slide# 5
6 L norm Obtained as the limit lim p x p. Definition: For vector x, x is defined as x = max( x 1, x 2,..., x n ). Ex-1: Can we verify that L is really a norm? Ex-2 (challenge): Prove that the limit definition coincides with the explicit form of the L distance. Lectures 10,11 Slide# 6
7 Norm: Exercise Take vectors x = (0,1, 1), y = (2,1,1). Write down L 1,L 2,L norm distances between x, y. Lectures 10,11 Slide# 7
8 Norm Spheres ǫ-sphere: { x d( x,0) ǫ} for norm d. The 1 spheres corresponding to the norms L 1,L,L 2. 1 L 2 L 1 L Note: Norm spheres are convex sets. Lectures 10,11 Slide# 8
9 Norm Minimization problems General form: minimize x p Ax b Note-1: We always minimize the norm functions (in this course). Note-2: Maximization of norm is generally a hard (non-convex) problem. Lectures 10,11 Slide# 9
10 Unconstrained Norm Minimization Lets first study the following problem: min. Ax b p. 1. For p = 2, this problem is called (unconstrained) least squares. We will solve this using calculus. 2. For p = 1,, this problem is called L 1 (L ) least squares. We will reduce this problem to LP. 3. Applications: solving (noisy) linear systems, regularization, denoising, max. likelihood estimation, regression and so on. Lectures 10,11 Slide# 10
11 Unconstrained Least Squares Decision Variables: x = (x 1,...,x n ) R n. Objective function: min. A x b 2. Constraints: No explicit constraint. Lectures 10,11 Slide# 11
12 Least Squares: Example Example: min. (2x +3y 1,y) 2. Solution: We will equivalently minimize (2x +3y 1) 2 +y 2. Using fundamental theorem of calculus, we obtain the conditions: In other words, we obtain: (2x+3y 1) 2 +y 2 x = 0 (2x+3y 1) 2 +y 2 y = 0 4(2x +3y 1) = 0 6(2x +3y 1)+2y = 0 The optima lies at is y = 0,x = 1 2. Verify minima by computing checking second order derivative (Hessian matrix). Lectures 10,11 Slide# 12
13 Unconstrained Least Squares Problem: minimize A x b 2. Solution: We will use calculus minimum finding using partial derivatives. Recall: f(x 1,...,x n ) = ( x1 f, x2 f,..., xn f). Criterion for optimal point for f( x) is that f = (0,...,0). ( A x b 2 2) = ((A x b) t (A x b)) = ( x t A t A x 2 x t A t b + bt b) = 2A t A x 2A t b = 0 Minima will occur at x = (A t A) 1 A t b (assume A is full rank ). Similar solution for L p norm for even number p. Lectures 10,11 Slide# 13
14 L 1 norm minimization Decision Variables: x = (x 1,...,x n ) R n. Objective function: min. A x b 1. Constraints: No explicit constraint. We will reduce this to a linear program. Lectures 10,11 Slide# 14
15 Example Example: min. (2x +3y 1,y) 1 = 2x +3y 1 + y. Trick: Let t 1 0,t 2 0 be s.t. Consider the following problem: 2x +3y 1 t 1 y t 2 min. t 1 +t 2 2x +3y 1 t 1 y t 2 t 1,t 2 0 Goal: convince class that this problem is equivalent to original. Note: y t 2 can be rewritten as y t 2 y t 2. Lectures 10,11 Slide# 15
16 Example: LP formulation min. t 1 +t 2 2x +3y 1 t 1 2x 3y +1 t 1 y t 2 y t 2 t 1,t 2 0 Solution: x = 1 2,y = 0. Q: Can we repeat the same trick if y t 2? N: No. A disjunction of inequalities will be needed. Lectures 10,11 Slide# 16
17 L 1 norm minimization Problem: min. Ax b 1. Solution: Let A be m n matrix. Add variables t 1,...,t m correspoding to rows of m. min. m i=1 t i A i x b t i t 1,...,t m 0 min. m i=1 t i A x b t A x + b t t 1,...,t m 0 We write it in the general form: max. 1 t t A x t b A x t b t 1,...,t m 0 Lectures 10,11 Slide# 17
18 L norm minimization Decision Variables: x = (x 1,...,x n ) R n. Objective function: min. A x b. Constraints: No explicit constraint. We will reduce this to a linear program. Lectures 10,11 Slide# 18
19 Example Example: min. (2x +3y 1,y) = min max.( 2x +3y 1, y ). Trick: Let t 0 be s.t. Consider the following problem: min. 2x +3y 1 t y t t 2x +3y 1 t y t t 0 Goal: convince class that this problem is equivalent to original. Note: y t can be rewritten as t y t. Lectures 10,11 Slide# 19
20 Example:LP formulation min. t 2x +3y 1 t 2x 3y +1 t y t y t t 0 Solution: x = 1 2,y = 0. Lectures 10,11 Slide# 20
21 L norm minimization Problem: min. Ax b. Solution: Let A be m n matrix. Add variables t 1,...,t m correspoding to rows of m. min. t A i x b t t 0 min. t A x b t. 1 A x + b t. 1 t 0 We write it in the general form: max. t A x t. 1 b A x t. 1 b t 0 Lectures 10,11 Slide# 21
22 Application-1: Estimation of Solutions Lectures 10,11 Slide# 22
23 Application-1: Solution Estimation Problem: We are trying to solve the linear equation A x = b 1. The entries in A, b could have random errors (noisy measurements). 2. There are many more equations than variables. x A x A Noise b Lectures 10,11 Slide# 23
24 Application: Estimation Example Problem: We are trying to measure a quantity x using noisy measurements: 2x = 5.1 3x = 7.6 4x = x = x = 15.1 Q-1: Is there a value of x that explains the measurement? Q-2: What do we do in this case? Lectures 10,11 Slide# 24
25 Application: Estimation Example Problem: We are trying to measure a quantity x using noisy measurements: 2x = 5.1 3x = 7.6 4x = x = x = 15.1 Find x that nearly fits all the measurements., i,e, min. (2x 5.1,3x 7.6,4x 9.91,5x 12.45,6x 15.1) p. We can choose p = 1,2, and see what answer we get. Lectures 10,11 Slide# 25
26 Application:Estimation L 2 norm: Apply least squares for A = b = Solving for x = argmin Ax b 2, yields x Residual error: (0.09, 0.08,.11, 0.08,.06). Lectures 10,11 Slide# 26
27 Application:Estimation L 1 norm: Reduce to linear program using A = , b = ,. Solution: x = Residual Error: (.12,.13, 0.05, 0.0,.16). L norm: Reduce to linear program. Solution: x = Residual Error: (.1,.1,.1,.05,.1) Lectures 10,11 Slide# 27
28 Experiment with Larger Matrices (matlab) L 1 norm: large number of entries with small residuals, spread is larger. L 2 norm: residuals look like a gaussian distribution. L norm: residuals look more uniform, smaller spread. Matlab code is available, run your own experiments! Lectures 10,11 Slide# 28
29 Regularized Approximation Ordinary least squares can have poor performance (ill conditioning problem). Tikhonov Regularized Approximation: A x b 2 +λ x 2. λ 0 is a tuning parameter. Note-1: This is the same as the following norm minimization: [ A λ 1 2I ] x [ b Note-2: In general, we can also have regularized approximation using L 1,L and other norms. 0 ] 2 2. Lectures 10,11 Slide# 29
30 Penalty Function Approximation Problem: Solve minimize.φ(a x b). where φ is a penalty function. If φ = L 1,L 2,L, this is exactly the same as norm minimization. Note-1: In general, φ need not be a norm. Note-2: φ is sometimes called a loss function. Lectures 10,11 Slide# 30
31 Penalty Functions: Example Deadzone Linear Penalty: Let x = (x 1,...,x n ) t. The deadzone linear penalty is sum of penalties on each individual component x 1,...,x n : wherein φ dz ( x) : φ dz (x 1 )+φ dz (x 2 )+ +φ dz (x n ), φ dz (x) = { 0 u a b( u a) u > a L2 φ dz L 1 -a a Lectures 10,11 Slide# 31
32 Deadzone Linear Penalty (cont) Deadzone vs. L 2 : L 2 norm imposes large penalty when residual is high. Therefore, outliers in data can affect performance drastically. Deadzone penalty function is generally less sensitive to outliers. Q: How do we solve the deadzone penalty approximation problem? A: Apply tricks for L 1,L (upcoming assignment). Lectures 10,11 Slide# 32
33 Other penalty functions Huber Penalty: The Huber penalty is sum of functions on individual components x 1,...,x n : φ H (x 1 )+φ H (x 2 )+ +φ H (x n ), wherein φ H (x) = { u 2 u a a(2 u a) u > a φ H L 2 -a a Lectures 10,11 Slide# 33
34 Penalty Function Approximation: Summary General observations: L 2 : Natural, easy to solve (using pseudo inverse). However, sensitive to outliers. L 1 : Sparse: ensures that most residuals are very small. Insensitive to outliers. However, spread of error is larger. L : Non-sparse but minimizes spread. Very sensitive to outliers. Other loss functions: provide various tradeoffs. Details: Cf. Boyd s book. Lectures 10,11 Slide# 34
35 Application-2: Signal Denoising Lectures 10,11 Slide# 35
36 Application-2: Signal Denoising Input: Given samples of a noisy signal: y : ( y 1, y 2,..., y N ) t. Original signal is unknown (very little data available). Problem: Find new signal y by minimizing φ s is a smoothing function. mininize. y y p p +φ s ( y ). 4 Original Signal 4 Signal With Noise Lectures 10,11 Slide# 36
37 Smoothing Criteria Quadratic Smoothing: L 2 norm on successive differences. φ q ( y) = N (y i y i 1 ) 2. i=2 Total Variation Smoothing: L 1 norm on successive differences. φ tv ( y) = N y i y i 1. i=2 Maximum Variation Smoothing: L norm on successive differences. φ max ( y) = N max i=2 ( y i+1 y i ). Lectures 10,11 Slide# 37
38 Quadratic Smoothing Problem: minimize y y n 2 2 +φ q( y). This can be viewed as a least squares problem: minimize [ I ] y [ yn 0 ] 2 2. Where is the matrix: = Lectures 10,11 Slide# 38
39 Total/Maximum Variance Smoothing Total variance smoothing: L 1 least squares problem. [ ] [ ] minimize I y yn 0 Max. variance smoothing: L 1 least squares problem. [ ] [ ] minimize I y yn Where is the matrix: = Lectures 10,11 Slide# 39
40 Experiment 5 Original Signal 5 Signal With Noise L1 reconstructed L2 reconstructed L reconstructed L1 residuals L2 residuals L residuals Lectures 10,11 Slide# 40
41 Other Smoothing Functions Consider three successive data points instead of two: φ T : m 1 i=2 2y i (y i 1 +y i+1 ). Modify matrix as follows: = is called a Topelitz Matrix. Lectures 10,11 Slide# 41
42 Application-3: Regression Lectures 10,11 Slide# 42
43 Linear Regression Inputs: Data points x i : (x i1,...,x in ) and outcome y i. Goal: Find a function that best fits the data. f( x) = c 1 x 1 +c 2 x c n x n +c 0 12 input vs. output plot for given data Lectures 10,11 Slide# 43
44 Linear Regression Inputs: Data points x i : (x i1,...,x in ) and outcome y i. Goal: Find a function that best fits the data. f( x) = c 1 x 1 +c 2 x c n x n +c 0 12 input vs. output plot for given data Lectures 10,11 Slide# 44
45 Linear Regression Let X be the data matrix and y be corresponding outcome matrix. Note: The i th row X i represents data point i y i is the corresponding outcome for X i If c t x +c 0 is the best line fit, the error for i th data point is: ǫ i : (X i c)+c 0 y i. We can write the overall error as: [X 1] c y. p Lectures 10,11 Slide# 45
46 Regression Inputs: Data X, outcome y. Decision Vars: c 0,c 1,...,c n the coefficients of the straight line. Solution:minimize [X 1] c y. p Note: Instead of the norm, we could use any penalty function. Ordinary Regression: min [X 1] c y p. p Tikhanov Regression: min [X 1] c y 2 2 +λ c 2 2. Lectures 10,11 Slide# 46
47 Experiment Experiment: Create noisy data set with outliers. 80 regular data points with low noise. 5 outlier datapoints with large noise. Goal: Compare effect of L 1, L 2, L norm regressions. Lectures 10,11 Slide# 47
48 Regression with Outliers 15 Regression with outliers L 2 10 L 1 L Lectures 10,11 Slide# 48
49 Regression with Outliers (Error Histogram) 100 L1 norm residual L2 norm residual L norm residual Lectures 10,11 Slide# 49
50 Linear Regression With Non-Linear Models Input: Data X, outcome y, functions g 1,...,g k. Goal: Fit a model of the form y i = c 0 +c 1 g 1 ( x)+c 2 g 2 ( x)+...+c k g k ( x). Solution: Transform the data matrix as follows: g 1 (X 1 )... g k (X 1 ) X g 1 (X 2 )... g k (X 2 ) =... g 1 (X n )... g k (X n ). Apply linear regression on (X, y). Lectures 10,11 Slide# 50
51 Experiment Experiment: Create noisy non-linear data set with outliers. y = 0.2sin(x)+1.5 cos(x)+2 cos(1.5 x)+.2 x 3+Noise. 80 regular data points with low noise. 5 outlier datapoints with large noise Data outliers Regression with non linear model Goal: Compare effect of L 1, L 2, L norm regressions. Lectures 10,11 Slide# 51
52 Experiment: linear regression non-linear model Regression with outliers L 1 L 2 L Data outliers Lectures 10,11 Slide# 52
53 Experiment: residuals 200 L1 norm residual L2 norm residual L norm residual Lectures 10,11 Slide# 53
54 Classification Problem: Given two classes: x 1 +,..., x+ N and x 1,..., x M Goal: Find separating hyperplane between the classes. of data points x + x classifier Lectures 10,11 Slide# 54
55 Application-4: Classification Lectures 10,11 Slide# 55
56 Linear Classification Problem: Given two classes: x 1 +,..., x+ N and x 1,..., x M Goal: Find separating hyperplane between the classes. Q-1: Will such an hyperplane always exist? A: Not always! Only if data is linearly separable. of data points. 1. Assume that such a hyperplane exists. 2. Deal with cases where a hyperplane does not exist. Lectures 10,11 Slide# 56
57 Linear Classification (cont) Solution: Use linear programming. Decision Variables: w 0,w 1,w 2,...,w n, representing the classfier: f w ( x) : w 0 +w 1 x 1 + +w n x n. Linear Programming: X + : max.??? X + w 0 X w 0 x x x x + N 1 X : x 1 1 x x M 1 Lectures 10,11 Slide# 57
58 Classification Using Linear Programs Objective: Lets try i w i = 1 t w. Note: We write a, b for dot product a t b. Simplified Data representation: Represent two classes +1, 1 for data. Data:( x 1,y 1 ),...,( x N,y N ) (X, y), where y i = +1 for class I and y i = 1 for class II data. Linear Program: min. 1, w +w 0 y i ( x i, w +w 0 ) 0 Verify that this formulation works. Lectures 10,11 Slide# 58
59 Experiment: Classification Experiment: Find a classifier for the following points. 2.5 Data for classification Lectures 10,11 Slide# 59
60 Experiment: Classification Experiment: Outcome. 2.5 Classifiers Note: Many different classifiers are possible. Note-2: What is the deal with points that lie on the classifier line? Lectures 10,11 Slide# 60
61 Dual, Complementary Slackness Linear Program: min. 1, w +w 0 y i ( x i, w +w 0 ) 0 Complementary Slackness: Let α 1,...,α N be the dual variables. α i y i ( x i, w +w 0 ) = 0, α i 0. Claim: There are at least n+1 points x j1,..., x jn+1 s.t. x ji, w +w 0 = 0. Note: n+1 pts. fully determine the hyper-plane w;w 0. Lectures 10,11 Slide# 61
62 Classifying Points Problem: Given a new point x, let us find which class it belongs to. Solution-1: Just find out the sign of w, x +w 0, we know w;w 0 from our LP solution. Solution-2: More complicated. Express x as a linear combination of support vectors: We can write w, x as x = λ 1 x j1 +λ 2 x j2 + +λ n+1 x n+1 ( ( w, x +w 0 = i λ i w 0 + w, x ji + }{{} 0 1 i λ i )w 0 = 1 i λ i )w 0. This is impractical for linear programming formulations. Lectures 10,11 Slide# 62
63 Non-Separable Data Q: What if data is not linear separable? The linear program will be primal infeasible In practice, linear separation may hold without noise but not with noise. Solution: Relax the conditions on half-space w. ξ 1 ξ 2 Lectures 10,11 Slide# 63
64 Non-Separable Data: Linear Programming Classifier: (old) w 0 + w, x. (new) w, x +1. Note: Classifier i w ix i +w 0 is equivalent to i LP Formulation: minimize ξ + 1 x + i, w +1 ξ + i x j, w +1 ξ j ξ + i,ξ j 0 w i w 0 x i +1 if w 0 0. The variable ξ encodes tolerance. We minimize ξ as part of objective. Exercise: Verify that the support vector interpretation of dual works. Lectures 10,11 Slide# 64
65 Non Linear Model Classification Linear Classifier: w, x +1. Non-linear functions: φ : R n R. w,(φ 1 ( x),φ 2 ( x),...,φ m ( x)). This is exactly same idea as linear regression with non-linear functions. Lectures 10,11 Slide# 65
Support Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2014 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationSupport Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012
Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Linear classifier Which classifier? x 2 x 1 2 Linear classifier Margin concept x 2
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2015 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationConvex Optimization M2
Convex Optimization M2 Lecture 8 A. d Aspremont. Convex Optimization M2. 1/57 Applications A. d Aspremont. Convex Optimization M2. 2/57 Outline Geometrical problems Approximation problems Combinatorial
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2016 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationPerceptron Revisited: Linear Separators. Support Vector Machines
Support Vector Machines Perceptron Revisited: Linear Separators Binary classification can be viewed as the task of separating classes in feature space: w T x + b > 0 w T x + b = 0 w T x + b < 0 Department
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationSupport Vector Machines.
Support Vector Machines www.cs.wisc.edu/~dpage 1 Goals for the lecture you should understand the following concepts the margin slack variables the linear support vector machine nonlinear SVMs the kernel
More informationCSCI5654 (Linear Programming, Fall 2013) Lecture-8. Lecture 8 Slide# 1
CSCI5654 (Linear Programming, Fall 2013) Lecture-8 Lecture 8 Slide# 1 Today s Lecture 1. Recap of dual variables and strong duality. 2. Complementary Slackness Theorem. 3. Interpretation of dual variables.
More information6. Approximation and fitting
6. Approximation and fitting Convex Optimization Boyd & Vandenberghe norm approximation least-norm problems regularized approximation robust approximation 6 Norm approximation minimize Ax b (A R m n with
More informationMax Margin-Classifier
Max Margin-Classifier Oliver Schulte - CMPT 726 Bishop PRML Ch. 7 Outline Maximum Margin Criterion Math Maximizing the Margin Non-Separable Data Kernels and Non-linear Mappings Where does the maximization
More informationJeff Howbert Introduction to Machine Learning Winter
Classification / Regression Support Vector Machines Jeff Howbert Introduction to Machine Learning Winter 2012 1 Topics SVM classifiers for linearly separable classes SVM classifiers for non-linearly separable
More informationExample Problem. Linear Program (standard form) CSCI5654 (Linear Programming, Fall 2013) Lecture-7. Duality
CSCI5654 (Linear Programming, Fall 013) Lecture-7 Duality Lecture 7 Slide# 1 Lecture 7 Slide# Linear Program (standard form) Example Problem maximize c 1 x 1 + + c n x n s.t. a j1 x 1 + + a jn x n b j
More informationSUPPORT VECTOR MACHINE
SUPPORT VECTOR MACHINE Mainly based on https://nlp.stanford.edu/ir-book/pdf/15svm.pdf 1 Overview SVM is a huge topic Integration of MMDS, IIR, and Andrew Moore s slides here Our foci: Geometric intuition
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table
More informationLecture 9: Large Margin Classifiers. Linear Support Vector Machines
Lecture 9: Large Margin Classifiers. Linear Support Vector Machines Perceptrons Definition Perceptron learning rule Convergence Margin & max margin classifiers (Linear) support vector machines Formulation
More informationConvex Optimization: Applications
Convex Optimization: Applications Lecturer: Pradeep Ravikumar Co-instructor: Aarti Singh Convex Optimization 1-75/36-75 Based on material from Boyd, Vandenberghe Norm Approximation minimize Ax b (A R m
More informationAbout this class. Maximizing the Margin. Maximum margin classifiers. Picture of large and small margin hyperplanes
About this class Maximum margin classifiers SVMs: geometric derivation of the primal problem Statement of the dual problem The kernel trick SVMs as the solution to a regularization problem Maximizing the
More informationLecture Notes on Support Vector Machine
Lecture Notes on Support Vector Machine Feng Li fli@sdu.edu.cn Shandong University, China 1 Hyperplane and Margin In a n-dimensional space, a hyper plane is defined by ω T x + b = 0 (1) where ω R n is
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING 5: Vector Data: Support Vector Machine Instructor: Yizhou Sun yzsun@cs.ucla.edu October 18, 2017 Homework 1 Announcements Due end of the day of this Thursday (11:59pm)
More informationSupport Vector Machines
Support Vector Machines Support vector machines (SVMs) are one of the central concepts in all of machine learning. They are simply a combination of two ideas: linear classification via maximum (or optimal
More informationMachine Learning. Support Vector Machines. Manfred Huber
Machine Learning Support Vector Machines Manfred Huber 2015 1 Support Vector Machines Both logistic regression and linear discriminant analysis learn a linear discriminant function to separate the data
More informationStatistical Pattern Recognition
Statistical Pattern Recognition Support Vector Machine (SVM) Hamid R. Rabiee Hadi Asheri, Jafar Muhammadi, Nima Pourdamghani Spring 2013 http://ce.sharif.edu/courses/91-92/2/ce725-1/ Agenda Introduction
More informationCOMP 652: Machine Learning. Lecture 12. COMP Lecture 12 1 / 37
COMP 652: Machine Learning Lecture 12 COMP 652 Lecture 12 1 / 37 Today Perceptrons Definition Perceptron learning rule Convergence (Linear) support vector machines Margin & max margin classifier Formulation
More informationHomework 4. Convex Optimization /36-725
Homework 4 Convex Optimization 10-725/36-725 Due Friday November 4 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)
More informationGradient Descent. Dr. Xiaowei Huang
Gradient Descent Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Three machine learning algorithms: decision tree learning k-nn linear regression only optimization objectives are discussed,
More informationConvex Optimization and Support Vector Machine
Convex Optimization and Support Vector Machine Problem 0. Consider a two-class classification problem. The training data is L n = {(x 1, t 1 ),..., (x n, t n )}, where each t i { 1, 1} and x i R p. We
More informationSupport Vector Machines for Classification and Regression. 1 Linearly Separable Data: Hard Margin SVMs
E0 270 Machine Learning Lecture 5 (Jan 22, 203) Support Vector Machines for Classification and Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in
More informationCSC 411 Lecture 17: Support Vector Machine
CSC 411 Lecture 17: Support Vector Machine Ethan Fetaya, James Lucas and Emad Andrews University of Toronto CSC411 Lec17 1 / 1 Today Max-margin classification SVM Hard SVM Duality Soft SVM CSC411 Lec17
More informationConvex Optimization and SVM
Convex Optimization and SVM Problem 0. Cf lecture notes pages 12 to 18. Problem 1. (i) A slab is an intersection of two half spaces, hence convex. (ii) A wedge is an intersection of two half spaces, hence
More informationCS6375: Machine Learning Gautam Kunapuli. Support Vector Machines
Gautam Kunapuli Example: Text Categorization Example: Develop a model to classify news stories into various categories based on their content. sports politics Use the bag-of-words representation for this
More informationSupport Vector Machines
Wien, June, 2010 Paul Hofmarcher, Stefan Theussl, WU Wien Hofmarcher/Theussl SVM 1/21 Linear Separable Separating Hyperplanes Non-Linear Separable Soft-Margin Hyperplanes Hofmarcher/Theussl SVM 2/21 (SVM)
More informationICS-E4030 Kernel Methods in Machine Learning
ICS-E4030 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 28. September, 2016 Juho Rousu 28. September, 2016 1 / 38 Convex optimization Convex optimisation This
More informationConstrained Optimization and Lagrangian Duality
CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may
More informationMACHINE LEARNING. Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA
1 MACHINE LEARNING Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA 2 Practicals Next Week Next Week, Practical Session on Computer Takes Place in Room GR
More informationLecture 11 and 12: Penalty methods and augmented Lagrangian methods for nonlinear programming
Lecture 11 and 12: Penalty methods and augmented Lagrangian methods for nonlinear programming Coralia Cartis, Mathematical Institute, University of Oxford C6.2/B2: Continuous Optimization Lecture 11 and
More informationConvex Optimization. Newton s method. ENSAE: Optimisation 1/44
Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)
More information1 Cricket chirps: an example
Notes for 2016-09-26 1 Cricket chirps: an example Did you know that you can estimate the temperature by listening to the rate of chirps? The data set in Table 1 1. represents measurements of the number
More informationSupport'Vector'Machines. Machine(Learning(Spring(2018 March(5(2018 Kasthuri Kannan
Support'Vector'Machines Machine(Learning(Spring(2018 March(5(2018 Kasthuri Kannan kasthuri.kannan@nyumc.org Overview Support Vector Machines for Classification Linear Discrimination Nonlinear Discrimination
More informationEE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 18
EE/ACM 150 - Applications of Convex Optimization in Signal Processing and Communications Lecture 18 Andre Tkacenko Signal Processing Research Group Jet Propulsion Laboratory May 31, 2012 Andre Tkacenko
More information10-725/36-725: Convex Optimization Prerequisite Topics
10-725/36-725: Convex Optimization Prerequisite Topics February 3, 2015 This is meant to be a brief, informal refresher of some topics that will form building blocks in this course. The content of the
More informationUses of duality. Geoff Gordon & Ryan Tibshirani Optimization /
Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear
More informationSupport Vector Machines: Maximum Margin Classifiers
Support Vector Machines: Maximum Margin Classifiers Machine Learning and Pattern Recognition: September 16, 2008 Piotr Mirowski Based on slides by Sumit Chopra and Fu-Jie Huang 1 Outline What is behind
More informationSupport Vector Machines
EE 17/7AT: Optimization Models in Engineering Section 11/1 - April 014 Support Vector Machines Lecturer: Arturo Fernandez Scribe: Arturo Fernandez 1 Support Vector Machines Revisited 1.1 Strictly) Separable
More informationStat542 (F11) Statistical Learning. First consider the scenario where the two classes of points are separable.
Linear SVM (separable case) First consider the scenario where the two classes of points are separable. It s desirable to have the width (called margin) between the two dashed lines to be large, i.e., have
More informationCSCI : Optimization and Control of Networks. Review on Convex Optimization
CSCI7000-016: Optimization and Control of Networks Review on Convex Optimization 1 Convex set S R n is convex if x,y S, λ,µ 0, λ+µ = 1 λx+µy S geometrically: x,y S line segment through x,y S examples (one
More informationBindel, Fall 2016 Matrix Computations (CS 6210) Notes for
1 A cautionary tale Notes for 2016-10-05 You have been dropped on a desert island with a laptop with a magic battery of infinite life, a MATLAB license, and a complete lack of knowledge of basic geometry.
More informationSupport Vector Machines for Classification and Regression
CIS 520: Machine Learning Oct 04, 207 Support Vector Machines for Classification and Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may
More informationDan Roth 461C, 3401 Walnut
CIS 519/419 Applied Machine Learning www.seas.upenn.edu/~cis519 Dan Roth danroth@seas.upenn.edu http://www.cis.upenn.edu/~danroth/ 461C, 3401 Walnut Slides were created by Dan Roth (for CIS519/419 at Penn
More informationLinear, threshold units. Linear Discriminant Functions and Support Vector Machines. Biometrics CSE 190 Lecture 11. X i : inputs W i : weights
Linear Discriminant Functions and Support Vector Machines Linear, threshold units CSE19, Winter 11 Biometrics CSE 19 Lecture 11 1 X i : inputs W i : weights θ : threshold 3 4 5 1 6 7 Courtesy of University
More informationThe Lagrangian L : R d R m R r R is an (easier to optimize) lower bound on the original problem:
HT05: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford Convex Optimization and slides based on Arthur Gretton s Advanced Topics in Machine Learning course
More informationSupport Vector Machines. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Support Vector Machines CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 A Linearly Separable Problem Consider the binary classification
More informationInterior Point Algorithms for Constrained Convex Optimization
Interior Point Algorithms for Constrained Convex Optimization Chee Wei Tan CS 8292 : Advanced Topics in Convex Optimization and its Applications Fall 2010 Outline Inequality constrained minimization problems
More informationEE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015
EE613 Machine Learning for Engineers Kernel methods Support Vector Machines jean-marc odobez 2015 overview Kernel methods introductions and main elements defining kernels Kernelization of k-nn, K-Means,
More informationLinear Classifiers: Expressiveness
Linear Classifiers: Expressiveness Machine Learning Spring 2018 The slides are mainly from Vivek Srikumar 1 Lecture outline Linear classifiers: Introduction What functions do linear classifiers express?
More informationIntroduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin
1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)
More informationSupport Vector Machines
Two SVM tutorials linked in class website (please, read both): High-level presentation with applications (Hearst 1998) Detailed tutorial (Burges 1998) Support Vector Machines Machine Learning 10701/15781
More informationA Brief Review on Convex Optimization
A Brief Review on Convex Optimization 1 Convex set S R n is convex if x,y S, λ,µ 0, λ+µ = 1 λx+µy S geometrically: x,y S line segment through x,y S examples (one convex, two nonconvex sets): A Brief Review
More informationsubject to (x 2)(x 4) u,
Exercises Basic definitions 5.1 A simple example. Consider the optimization problem with variable x R. minimize x 2 + 1 subject to (x 2)(x 4) 0, (a) Analysis of primal problem. Give the feasible set, the
More informationLP Duality: outline. Duality theory for Linear Programming. alternatives. optimization I Idea: polyhedra
LP Duality: outline I Motivation and definition of a dual LP I Weak duality I Separating hyperplane theorem and theorems of the alternatives I Strong duality and complementary slackness I Using duality
More informationMachine Learning Support Vector Machines. Prof. Matteo Matteucci
Machine Learning Support Vector Machines Prof. Matteo Matteucci Discriminative vs. Generative Approaches 2 o Generative approach: we derived the classifier from some generative hypothesis about the way
More information2.098/6.255/ Optimization Methods Practice True/False Questions
2.098/6.255/15.093 Optimization Methods Practice True/False Questions December 11, 2009 Part I For each one of the statements below, state whether it is true or false. Include a 1-3 line supporting sentence
More informationCS-E4830 Kernel Methods in Machine Learning
CS-E4830 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 27. September, 2017 Juho Rousu 27. September, 2017 1 / 45 Convex optimization Convex optimisation This
More informationNonlinear Optimization for Optimal Control
Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]
More informationSupport Vector Machines and Bayes Regression
Statistical Techniques in Robotics (16-831, F11) Lecture #14 (Monday ctober 31th) Support Vector Machines and Bayes Regression Lecturer: Drew Bagnell Scribe: Carl Doersch 1 1 Linear SVMs We begin by considering
More informationCSC Linear Programming and Combinatorial Optimization Lecture 10: Semidefinite Programming
CSC2411 - Linear Programming and Combinatorial Optimization Lecture 10: Semidefinite Programming Notes taken by Mike Jamieson March 28, 2005 Summary: In this lecture, we introduce semidefinite programming
More informationLecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods.
Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Linear models for classification Logistic regression Gradient descent and second-order methods
More informationShort Course Robust Optimization and Machine Learning. 3. Optimization in Supervised Learning
Short Course Robust Optimization and 3. Optimization in Supervised EECS and IEOR Departments UC Berkeley Spring seminar TRANSP-OR, Zinal, Jan. 16-19, 2012 Outline Overview of Supervised models and variants
More informationInterior-Point Methods for Linear Optimization
Interior-Point Methods for Linear Optimization Robert M. Freund and Jorge Vera March, 204 c 204 Robert M. Freund and Jorge Vera. All rights reserved. Linear Optimization with a Logarithmic Barrier Function
More informationEE364b Convex Optimization II May 30 June 2, Final exam
EE364b Convex Optimization II May 30 June 2, 2014 Prof. S. Boyd Final exam By now, you know how it works, so we won t repeat it here. (If not, see the instructions for the EE364a final exam.) Since you
More informationAnnouncements - Homework
Announcements - Homework Homework 1 is graded, please collect at end of lecture Homework 2 due today Homework 3 out soon (watch email) Ques 1 midterm review HW1 score distribution 40 HW1 total score 35
More informationLinear vs Non-linear classifier. CS789: Machine Learning and Neural Network. Introduction
Linear vs Non-linear classifier CS789: Machine Learning and Neural Network Support Vector Machine Jakramate Bootkrajang Department of Computer Science Chiang Mai University Linear classifier is in the
More informationSupport vector machines Lecture 4
Support vector machines Lecture 4 David Sontag New York University Slides adapted from Luke Zettlemoyer, Vibhav Gogate, and Carlos Guestrin Q: What does the Perceptron mistake bound tell us? Theorem: The
More informationCOMS 4771 Introduction to Machine Learning. James McInerney Adapted from slides by Nakul Verma
COMS 4771 Introduction to Machine Learning James McInerney Adapted from slides by Nakul Verma Announcements HW1: Please submit as a group Watch out for zero variance features (Q5) HW2 will be released
More informationProblem Set # 1 Solution, 18.06
Problem Set # 1 Solution, 1.06 For grading: Each problem worths 10 points, and there is points of extra credit in problem. The total maximum is 100. 1. (10pts) In Lecture 1, Prof. Strang drew the cone
More informationToday. Calculus. Linear Regression. Lagrange Multipliers
Today Calculus Lagrange Multipliers Linear Regression 1 Optimization with constraints What if I want to constrain the parameters of the model. The mean is less than 10 Find the best likelihood, subject
More informationSupport Vector Regression (SVR) Descriptions of SVR in this discussion follow that in Refs. (2, 6, 7, 8, 9). The literature
Support Vector Regression (SVR) Descriptions of SVR in this discussion follow that in Refs. (2, 6, 7, 8, 9). The literature suggests the design variables should be normalized to a range of [-1,1] or [0,1].
More informationMachine Learning Practice Page 2 of 2 10/28/13
Machine Learning 10-701 Practice Page 2 of 2 10/28/13 1. True or False Please give an explanation for your answer, this is worth 1 pt/question. (a) (2 points) No classifier can do better than a naive Bayes
More informationLecture 2: Convex Sets and Functions
Lecture 2: Convex Sets and Functions Hyang-Won Lee Dept. of Internet & Multimedia Eng. Konkuk University Lecture 2 Network Optimization, Fall 2015 1 / 22 Optimization Problems Optimization problems are
More informationLagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST)
Lagrange Duality Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline of Lecture Lagrangian Dual function Dual
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Support Vector Machines Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique
More informationIndirect Rule Learning: Support Vector Machines. Donglin Zeng, Department of Biostatistics, University of North Carolina
Indirect Rule Learning: Support Vector Machines Indirect learning: loss optimization It doesn t estimate the prediction rule f (x) directly, since most loss functions do not have explicit optimizers. Indirection
More informationLinear Regression and Its Applications
Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start
More informationOPTIMISATION 2007/8 EXAM PREPARATION GUIDELINES
General: OPTIMISATION 2007/8 EXAM PREPARATION GUIDELINES This points out some important directions for your revision. The exam is fully based on what was taught in class: lecture notes, handouts and homework.
More informationLecture Notes 1: Vector spaces
Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector
More informationLinear regression COMS 4771
Linear regression COMS 4771 1. Old Faithful and prediction functions Prediction problem: Old Faithful geyser (Yellowstone) Task: Predict time of next eruption. 1 / 40 Statistical model for time between
More informationKernels and the Kernel Trick. Machine Learning Fall 2017
Kernels and the Kernel Trick Machine Learning Fall 2017 1 Support vector machines Training by maximizing margin The SVM objective Solving the SVM optimization problem Support vectors, duals and kernels
More informationCOMS 4771 Regression. Nakul Verma
COMS 4771 Regression Nakul Verma Last time Support Vector Machines Maximum Margin formulation Constrained Optimization Lagrange Duality Theory Convex Optimization SVM dual and Interpretation How get the
More information5. Duality. Lagrangian
5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized
More informationMachine Learning. Support Vector Machines. Fabio Vandin November 20, 2017
Machine Learning Support Vector Machines Fabio Vandin November 20, 2017 1 Classification and Margin Consider a classification problem with two classes: instance set X = R d label set Y = { 1, 1}. Training
More informationSupport Vector Machine. Industrial AI Lab.
Support Vector Machine Industrial AI Lab. Classification (Linear) Autonomously figure out which category (or class) an unknown item should be categorized into Number of categories / classes Binary: 2 different
More informationConvex Optimization M2
Convex Optimization M2 Lecture 3 A. d Aspremont. Convex Optimization M2. 1/49 Duality A. d Aspremont. Convex Optimization M2. 2/49 DMs DM par email: dm.daspremont@gmail.com A. d Aspremont. Convex Optimization
More informationGI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis. Massimiliano Pontil
GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis Massimiliano Pontil 1 Today s plan SVD and principal component analysis (PCA) Connection
More informationData Mining. Linear & nonlinear classifiers. Hamid Beigy. Sharif University of Technology. Fall 1396
Data Mining Linear & nonlinear classifiers Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1396 1 / 31 Table of contents 1 Introduction
More information1 Review Session. 1.1 Lecture 2
1 Review Session Note: The following lists give an overview of the material that was covered in the lectures and sections. Your TF will go through these lists. If anything is unclear or you have questions
More informationLinear Regression (continued)
Linear Regression (continued) Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 6, 2017 1 / 39 Outline 1 Administration 2 Review of last lecture 3 Linear regression
More informationHomework 2 Solutions Kernel SVM and Perceptron
Homework 2 Solutions Kernel SVM and Perceptron CMU 1-71: Machine Learning (Fall 21) https://piazza.com/cmu/fall21/17115781/home OUT: Sept 25, 21 DUE: Oct 8, 11:59 PM Problem 1: SVM decision boundaries
More information(Kernels +) Support Vector Machines
(Kernels +) Support Vector Machines Machine Learning Torsten Möller Reading Chapter 5 of Machine Learning An Algorithmic Perspective by Marsland Chapter 6+7 of Pattern Recognition and Machine Learning
More informationStatistical Methods for SVM
Statistical Methods for SVM Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find a plane that separates the classes in feature space. If we cannot,
More information