CSCI5654 (Linear Programming, Fall 2013) Lectures 10, 11

1 CSCI5654 (Linear Programming, Fall 2013) Lectures 10, 11 Lectures 10,11 Slide# 1

2 Today's Lecture 1. Introduction to norms: L_1, L_2, L_∞. 2. Casting absolute value and max operators. 3. Norm minimization problems. 4. Applications: fitting, classification, denoising, etc. Lectures 10,11 Slide# 2

3 Norm Basic idea: a notion of length/distance between vectors. Definition (Norm): Let l : X → R_≥0 be a mapping from a vector space to the non-negative reals. The function l is a norm iff 1. l(x) = 0 iff x = 0, 2. l(a x) = |a| l(x), 3. l(x + y) ≤ l(x) + l(y) (Triangle Inequality). Length of x: l(x). Distance between x, y is l(y − x) (= l(x − y)). Lectures 10,11 Slide# 3

4 L_2 (Euclidean) norm Distance between (x_1, y_1) and (x_2, y_2) is √((x_1 − x_2)^2 + (y_1 − y_2)^2). In general, ‖x‖_2 = √(xᵀx). Distance between x, y is given by ‖x − y‖_2. Lectures 10,11 Slide# 4

5 L_p norm For p ≥ 1, we define the L_p norm as ‖x‖_p = (|x_1|^p + |x_2|^p + ... + |x_n|^p)^(1/p). The L_p norm distance between two vectors is ‖x − y‖_p. L_1 norm: ‖x‖_1 = |x_1| + |x_2| + ... + |x_n|. Class exercise: verify that the L_1 norm distance is indeed a norm. Q: Why is p ≥ 1 necessary? Lectures 10,11 Slide# 5

6 L_∞ norm Obtained as the limit lim_{p→∞} ‖x‖_p. Definition: For a vector x, ‖x‖_∞ is defined as ‖x‖_∞ = max(|x_1|, |x_2|, ..., |x_n|). Ex-1: Can we verify that L_∞ is really a norm? Ex-2 (challenge): Prove that the limit definition coincides with the explicit form of the L_∞ distance. Lectures 10,11 Slide# 6

7 Norm: Exercise Take vectors x = (0, 1, −1), y = (2, 1, 1). Write down the L_1, L_2, L_∞ norm distances between x, y. Lectures 10,11 Slide# 7
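
A quick check of the exercise in Python/numpy (an illustrative sketch, not part of the course materials; the minus sign in the third coordinate of x is as restored above):

```python
import numpy as np

# Illustrative check: L1, L2, L-infinity distances between
# x = (0, 1, -1) and y = (2, 1, 1).
x = np.array([0.0, 1.0, -1.0])
y = np.array([2.0, 1.0, 1.0])

d = y - x
print("L1   distance:", np.linalg.norm(d, 1))       # |2| + |0| + |2| = 4
print("L2   distance:", np.linalg.norm(d, 2))       # sqrt(4 + 0 + 4) ~ 2.83
print("Linf distance:", np.linalg.norm(d, np.inf))  # max(2, 0, 2) = 2
```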

8 Norm Spheres ǫ-sphere: { x | d(x, 0) ≤ ǫ } for norm d. [Figure: the unit spheres corresponding to the norms L_1, L_2, L_∞.] Note: Norm spheres are convex sets. Lectures 10,11 Slide# 8

9 Norm Minimization Problems General form: minimize ‖x‖_p subject to A x ≤ b. Note-1: We always minimize the norm functions (in this course). Note-2: Maximization of a norm is generally a hard (non-convex) problem. Lectures 10,11 Slide# 9

10 Unconstrained Norm Minimization Let's first study the following problem: min ‖Ax − b‖_p. 1. For p = 2, this problem is called (unconstrained) least squares. We will solve this using calculus. 2. For p = 1, ∞, this problem is called L_1 (L_∞) least squares. We will reduce this problem to LP. 3. Applications: solving (noisy) linear systems, regularization, denoising, max. likelihood estimation, regression and so on. Lectures 10,11 Slide# 10

11 Unconstrained Least Squares Decision Variables: x = (x_1, ..., x_n) ∈ R^n. Objective function: min ‖A x − b‖_2. Constraints: No explicit constraint. Lectures 10,11 Slide# 11

12 Least Squares: Example Example: min ‖(2x + 3y − 1, y)‖_2. Solution: We will equivalently minimize (2x + 3y − 1)^2 + y^2. Setting the partial derivatives to zero (first-order optimality condition), we obtain: ∂/∂x [(2x + 3y − 1)^2 + y^2] = 0 and ∂/∂y [(2x + 3y − 1)^2 + y^2] = 0. In other words: 4(2x + 3y − 1) = 0 and 6(2x + 3y − 1) + 2y = 0. The optimum lies at y = 0, x = 1/2. Verify that it is a minimum by checking the second-order derivative (Hessian matrix). Lectures 10,11 Slide# 12

13 Unconstrained Least Squares Problem: minimize ‖A x − b‖_2. Solution: We will use calculus: minimum finding using partial derivatives. Recall: ∇f(x_1, ..., x_n) = (∂f/∂x_1, ∂f/∂x_2, ..., ∂f/∂x_n). The criterion for an optimal point of f(x) is ∇f = (0, ..., 0). ∇(‖A x − b‖_2^2) = ∇((A x − b)ᵀ(A x − b)) = ∇(xᵀAᵀA x − 2xᵀAᵀb + bᵀb) = 2AᵀA x − 2Aᵀb = 0. The minimum occurs at x = (AᵀA)^{-1} Aᵀ b (assuming A has full column rank). A similar solution exists for the L_p norm for even p. Lectures 10,11 Slide# 13
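
A minimal numpy sketch of the closed-form solution, applied to the earlier example min ‖(2x + 3y − 1, y)‖_2; the explicit normal-equation solve assumes A has full column rank, and lstsq is the numerically safer route:

```python
import numpy as np

# Sketch: the normal-equation solution x* = (A^T A)^{-1} A^T b for
# min ||Ax - b||_2, applied to the example min ||(2x + 3y - 1, y)||_2.
A = np.array([[2.0, 3.0],
              [0.0, 1.0]])
b = np.array([1.0, 0.0])

x_normal = np.linalg.solve(A.T @ A, A.T @ b)     # assumes full column rank
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)  # numerically preferred

print(x_normal)  # approximately [0.5, 0.0]
print(x_lstsq)
```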

14 L_1 norm minimization Decision Variables: x = (x_1, ..., x_n) ∈ R^n. Objective function: min ‖A x − b‖_1. Constraints: No explicit constraint. We will reduce this to a linear program. Lectures 10,11 Slide# 14

15 Example Example: min ‖(2x + 3y − 1, y)‖_1 = |2x + 3y − 1| + |y|. Trick: Let t_1 ≥ 0, t_2 ≥ 0 be s.t. |2x + 3y − 1| ≤ t_1 and |y| ≤ t_2. Consider the following problem: min t_1 + t_2 s.t. |2x + 3y − 1| ≤ t_1, |y| ≤ t_2, t_1, t_2 ≥ 0. Goal: convince class that this problem is equivalent to the original. Note: |y| ≤ t_2 can be rewritten as −t_2 ≤ y ≤ t_2. Lectures 10,11 Slide# 15

16 Example: LP formulation min t_1 + t_2 s.t. 2x + 3y − 1 ≤ t_1, −2x − 3y + 1 ≤ t_1, y ≤ t_2, −y ≤ t_2, t_1, t_2 ≥ 0. Solution: x = 1/2, y = 0. Q: Can we repeat the same trick for |y| ≥ t_2? A: No. A disjunction of inequalities would be needed. Lectures 10,11 Slide# 16

17 L_1 norm minimization Problem: min ‖Ax − b‖_1. Solution: Let A be an m × n matrix. Add variables t_1, ..., t_m corresponding to the rows of A. min Σ_{i=1}^m t_i s.t. |A_i x − b_i| ≤ t_i, t_1, ..., t_m ≥ 0; equivalently, min Σ_{i=1}^m t_i s.t. A x − b ≤ t, −A x + b ≤ t, t_1, ..., t_m ≥ 0. We write it in the general form: min 1ᵀ t s.t. A x − t ≤ b, −A x − t ≤ −b, t_1, ..., t_m ≥ 0. Lectures 10,11 Slide# 17
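
The reduction can be coded directly with an LP solver. The sketch below uses scipy's linprog with stacked variables z = (x, t); the helper name l1_min is ours, not the course's:

```python
import numpy as np
from scipy.optimize import linprog

def l1_min(A, b):
    """Minimize ||Ax - b||_1 via the LP on this slide (illustrative sketch).
    Variables are z = (x, t), with t the per-row residual bounds."""
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(m)])   # minimize 1^T t
    # A x - t <= b   and   -A x - t <= -b
    A_ub = np.block([[ A, -np.eye(m)],
                     [-A, -np.eye(m)]])
    b_ub = np.concatenate([b, -b])
    bounds = [(None, None)] * n + [(0, None)] * m   # x free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:n], res.fun

# The slide's example: min ||(2x + 3y - 1, y)||_1
A = np.array([[2.0, 3.0], [0.0, 1.0]])
b = np.array([1.0, 0.0])
x, val = l1_min(A, b)
print(x, val)   # x near (0.5, 0), objective 0
```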

18 L_∞ norm minimization Decision Variables: x = (x_1, ..., x_n) ∈ R^n. Objective function: min ‖A x − b‖_∞. Constraints: No explicit constraint. We will reduce this to a linear program. Lectures 10,11 Slide# 18

19 Example Example: min ‖(2x + 3y − 1, y)‖_∞ = min max(|2x + 3y − 1|, |y|). Trick: Let t ≥ 0 be s.t. |2x + 3y − 1| ≤ t and |y| ≤ t. Consider the following problem: min t s.t. |2x + 3y − 1| ≤ t, |y| ≤ t, t ≥ 0. Goal: convince class that this problem is equivalent to the original. Note: |y| ≤ t can be rewritten as −t ≤ y ≤ t. Lectures 10,11 Slide# 19

20 Example: LP formulation min t s.t. 2x + 3y − 1 ≤ t, −2x − 3y + 1 ≤ t, y ≤ t, −y ≤ t, t ≥ 0. Solution: x = 1/2, y = 0. Lectures 10,11 Slide# 20

21 L_∞ norm minimization Problem: min ‖Ax − b‖_∞. Solution: Let A be an m × n matrix. Add a single variable t bounding all rows of A. min t s.t. |A_i x − b_i| ≤ t for each row i, t ≥ 0; equivalently, min t s.t. A x − b ≤ t · 1, −A x + b ≤ t · 1, t ≥ 0. We write it in the general form: min t s.t. A x − t · 1 ≤ b, −A x − t · 1 ≤ −b, t ≥ 0. Lectures 10,11 Slide# 21
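
The same pattern with one bound variable t, again as an illustrative scipy sketch (the helper name linf_min is ours):

```python
import numpy as np
from scipy.optimize import linprog

def linf_min(A, b):
    """Minimize ||Ax - b||_inf via the LP on this slide (illustrative sketch).
    Variables are z = (x, t) with a single bound t on all residuals."""
    m, n = A.shape
    c = np.concatenate([np.zeros(n), [1.0]])        # minimize t
    ones = np.ones((m, 1))
    # A x - t 1 <= b   and   -A x - t 1 <= -b
    A_ub = np.block([[ A, -ones],
                     [-A, -ones]])
    b_ub = np.concatenate([b, -b])
    bounds = [(None, None)] * n + [(0, None)]       # x free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:n], res.fun

A = np.array([[2.0, 3.0], [0.0, 1.0]])
b = np.array([1.0, 0.0])
x, val = linf_min(A, b)
print(x, val)   # x near (0.5, 0), objective 0
```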

22 Application-1: Estimation of Solutions Lectures 10,11 Slide# 22

23 Application-1: Solution Estimation Problem: We are trying to solve the linear equation A x = b. 1. The entries in A, b could have random errors (noisy measurements). 2. There are many more equations than variables. [Diagram: x passes through A and additive noise to produce b.] Lectures 10,11 Slide# 23

24 Application: Estimation Example Problem: We are trying to measure a quantity x using noisy measurements: 2x = 5.1, 3x = 7.6, 4x = 9.91, 5x = 12.45, 6x = 15.1. Q-1: Is there a value of x that explains the measurements? Q-2: What do we do in this case? Lectures 10,11 Slide# 24

25 Application: Estimation Example Problem: We are trying to measure a quantity x using noisy measurements: 2x = 5.1, 3x = 7.6, 4x = 9.91, 5x = 12.45, 6x = 15.1. Find x that nearly fits all the measurements, i.e., min ‖(2x − 5.1, 3x − 7.6, 4x − 9.91, 5x − 12.45, 6x − 15.1)‖_p. We can choose p = 1, 2, ∞ and see what answer we get. Lectures 10,11 Slide# 25

26 Application: Estimation L_2 norm: Apply least squares for A = (2, 3, 4, 5, 6)ᵀ, b = (5.1, 7.6, 9.91, 12.45, 15.1)ᵀ. Solving for x = argmin ‖Ax − b‖_2 yields x ≈ 2.505. Residual error: (0.09, 0.08, 0.11, 0.08, 0.06). Lectures 10,11 Slide# 26

27 Application: Estimation L_1 norm: Reduce to a linear program using A = (2, 3, 4, 5, 6)ᵀ, b = (5.1, 7.6, 9.91, 12.45, 15.1)ᵀ. Solution: x = 2.49. Residual Error: (0.12, 0.13, 0.05, 0.0, 0.16). L_∞ norm: Reduce to a linear program. Solution: x ≈ 2.50. Residual Error: (0.1, 0.1, 0.1, 0.05, 0.1). Lectures 10,11 Slide# 27

28 Experiment with Larger Matrices (Matlab) L_1 norm: large number of entries with small residuals, spread is larger. L_2 norm: residuals look like a Gaussian distribution. L_∞ norm: residuals look more uniform, smaller spread. Matlab code is available, run your own experiments! Lectures 10,11 Slide# 28
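
A small Python stand-in for the Matlab experiment (made-up random data; it reuses the l1_min and linf_min sketches from the earlier slides, so those must be defined in the same session):

```python
import numpy as np

# Compare residual behavior of L1, L2, L-infinity estimation on a
# random over-determined system with additive noise (illustrative only).
rng = np.random.default_rng(0)
m, n = 200, 20
A = rng.normal(size=(m, n))
x_true = rng.normal(size=n)
b = A @ x_true + 0.1 * rng.normal(size=m)

x2, *_ = np.linalg.lstsq(A, b, rcond=None)   # L2
x1, _ = l1_min(A, b)                         # L1 (sketch defined earlier)
xinf, _ = linf_min(A, b)                     # L-infinity (sketch defined earlier)

for name, x in [("L1", x1), ("L2", x2), ("Linf", xinf)]:
    r = A @ x - b
    print(f"{name:4s}: max |residual| = {np.abs(r).max():.3f}, "
          f"near-zero residuals = {(np.abs(r) < 1e-3).sum()}")
```

The L_1 run typically shows many residuals driven exactly to zero with a larger maximum, while the L_∞ run shows a smaller maximum with no near-zero residuals, matching the observations above.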

29 Regularized Approximation Ordinary least squares can have poor performance (ill-conditioning problem). Tikhonov Regularized Approximation: minimize ‖A x − b‖_2^2 + λ ‖x‖_2^2, where λ ≥ 0 is a tuning parameter. Note-1: This is the same as the norm minimization: minimize ‖ [A; λ^{1/2} I] x − [b; 0] ‖_2^2. Note-2: In general, we can also have regularized approximation using L_1, L_∞ and other norms. Lectures 10,11 Slide# 29
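
The stacked form translates directly into one least-squares solve. A sketch with made-up ill-conditioned data; the function name tikhonov is ours:

```python
import numpy as np

def tikhonov(A, b, lam):
    """Tikhonov-regularized least squares via the stacked form on this slide:
    minimize ||A x - b||_2^2 + lam * ||x||_2^2 (illustrative sketch)."""
    n = A.shape[1]
    A_aug = np.vstack([A, np.sqrt(lam) * np.eye(n)])
    b_aug = np.concatenate([b, np.zeros(n)])
    x, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    return x

# Tiny illustration on a nearly singular matrix (made-up data).
A = np.array([[1.0, 1.0], [1.0, 1.0001]])
b = np.array([2.0, 2.0001])
print(tikhonov(A, b, 0.0))    # essentially unregularized solution
print(tikhonov(A, b, 1e-2))   # regularization pulls x toward 0
```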

30 Penalty Function Approximation Problem: Solve minimize φ(A x − b), where φ is a penalty function. If φ = L_1, L_2, L_∞, this is exactly the same as norm minimization. Note-1: In general, φ need not be a norm. Note-2: φ is sometimes called a loss function. Lectures 10,11 Slide# 30

31 Penalty Functions: Example Deadzone Linear Penalty: Let x = (x_1, ..., x_n)ᵀ. The deadzone linear penalty is the sum of penalties on each individual component x_1, ..., x_n: φ_dz(x) : φ_dz(x_1) + φ_dz(x_2) + ... + φ_dz(x_n), wherein φ_dz(u) = 0 if |u| ≤ a, and b(|u| − a) if |u| > a. [Figure: the deadzone penalty φ_dz compared with the L_1 and L_2 penalties; flat on the interval [−a, a].] Lectures 10,11 Slide# 31

32 Deadzone Linear Penalty (cont) Deadzone vs. L_2: the L_2 norm imposes a large penalty when the residual is high. Therefore, outliers in the data can affect performance drastically. The deadzone penalty function is generally less sensitive to outliers. Q: How do we solve the deadzone penalty approximation problem? A: Apply the tricks for L_1, L_∞ (upcoming assignment). Lectures 10,11 Slide# 32

33 Other penalty functions Huber Penalty: The Huber penalty is the sum of functions on the individual components x_1, ..., x_n: φ_H(x_1) + φ_H(x_2) + ... + φ_H(x_n), wherein φ_H(u) = u^2 if |u| ≤ a, and a(2|u| − a) if |u| > a. [Figure: the Huber penalty φ_H compared with the L_2 penalty; quadratic on [−a, a], linear outside.] Lectures 10,11 Slide# 33
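
Both penalties are easy to evaluate elementwise on a residual vector. A short sketch; the residual vector and the parameter values are made up for illustration:

```python
import numpy as np

# Deadzone-linear and Huber penalties from these slides, summed over the
# components of a residual vector r (a, b are the slides' parameters).
def deadzone(r, a, b):
    u = np.abs(r)
    return np.where(u <= a, 0.0, b * (u - a)).sum()

def huber(r, a):
    u = np.abs(r)
    return np.where(u <= a, u**2, a * (2 * u - a)).sum()

r = np.array([-0.05, 0.2, 1.5, -2.0])
print(deadzone(r, a=0.1, b=1.0))   # only the large residuals are penalized
print(huber(r, a=0.5))             # quadratic near 0, linear in the tails
```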

34 Penalty Function Approximation: Summary General observations: L_2: Natural, easy to solve (using the pseudo-inverse). However, sensitive to outliers. L_1: Sparse: ensures that most residuals are very small. Insensitive to outliers. However, the spread of the error is larger. L_∞: Non-sparse but minimizes spread. Very sensitive to outliers. Other loss functions provide various tradeoffs. Details: Cf. Boyd's book. Lectures 10,11 Slide# 34

35 Application-2: Signal Denoising Lectures 10,11 Slide# 35

36 Application-2: Signal Denoising Input: Given samples of a noisy signal ỹ : (ỹ_1, ỹ_2, ..., ỹ_N)ᵀ. The original signal is unknown (very little data available). Problem: Find a new signal y by minimizing ‖y − ỹ‖_p^p + φ_s(y), where φ_s is a smoothing function. [Figure: original signal and signal with noise.] Lectures 10,11 Slide# 36

37 Smoothing Criteria Quadratic Smoothing: L_2 norm on successive differences. φ_q(y) = Σ_{i=2}^N (y_i − y_{i−1})^2. Total Variation Smoothing: L_1 norm on successive differences. φ_tv(y) = Σ_{i=2}^N |y_i − y_{i−1}|. Maximum Variation Smoothing: L_∞ norm on successive differences. φ_max(y) = max_{i=2,...,N} |y_i − y_{i−1}|. Lectures 10,11 Slide# 37

38 Quadratic Smoothing Problem: minimize ‖y − ỹ‖_2^2 + φ_q(y). This can be viewed as a least squares problem: minimize ‖ [I; ∆] y − [ỹ; 0] ‖_2^2, where ∆ is the (N−1) × N first-difference matrix with rows (−1, 1, 0, ..., 0), (0, −1, 1, 0, ..., 0), ..., (0, ..., 0, −1, 1). Lectures 10,11 Slide# 38
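
A sketch of quadratic smoothing as a single stacked least-squares solve; the difference matrix is built with np.diff, and the smoothing weight lam is an assumed tuning knob (the slide folds the weight into φ_q):

```python
import numpy as np

# Quadratic smoothing: minimize ||y - y_noisy||_2^2 + lam * ||D y||_2^2,
# solved as one stacked least-squares problem (illustrative sketch).
rng = np.random.default_rng(0)
N = 200
t = np.linspace(0, 4 * np.pi, N)
y_noisy = np.sin(t) + 0.3 * rng.normal(size=N)   # made-up noisy signal

D = np.diff(np.eye(N), axis=0)                   # (N-1) x N, rows (-1, 1)
lam = 10.0                                       # assumed smoothing weight
A = np.vstack([np.eye(N), np.sqrt(lam) * D])
b = np.concatenate([y_noisy, np.zeros(N - 1)])
y_smooth, *_ = np.linalg.lstsq(A, b, rcond=None)

# The reconstruction has much smaller successive differences than the input.
print(np.abs(np.diff(y_smooth)).max(), np.abs(np.diff(y_noisy)).max())
```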

39 Total/Maximum Variation Smoothing Total variation smoothing: an L_1 least squares problem: minimize ‖ [I; ∆] y − [ỹ; 0] ‖_1. Maximum variation smoothing: an L_∞ least squares problem: minimize ‖ [I; ∆] y − [ỹ; 0] ‖_∞. Here ∆ is the same first-difference matrix as before. Lectures 10,11 Slide# 39

40 Experiment [Figure: original signal, signal with noise, and the L_1, L_2, L_∞ reconstructions with their residuals.] Lectures 10,11 Slide# 40

41 Other Smoothing Functions Consider three successive data points instead of two: φ_T : Σ_{i=2}^{m−1} |2y_i − (y_{i−1} + y_{i+1})|. Modify the ∆ matrix accordingly, with rows of the form (..., −1, 2, −1, ...). ∆ is called a Toeplitz matrix. Lectures 10,11 Slide# 41

42 Application-3: Regression Lectures 10,11 Slide# 42

43 Linear Regression Inputs: Data points x_i : (x_{i1}, ..., x_{in}) and outcome y_i. Goal: Find a function that best fits the data: f(x) = c_1 x_1 + c_2 x_2 + ... + c_n x_n + c_0. [Figure: input vs. output plot for the given data.] Lectures 10,11 Slide# 43

44 Linear Regression Inputs: Data points x_i : (x_{i1}, ..., x_{in}) and outcome y_i. Goal: Find a function that best fits the data: f(x) = c_1 x_1 + c_2 x_2 + ... + c_n x_n + c_0. [Figure: input vs. output plot for the given data.] Lectures 10,11 Slide# 44

45 Linear Regression Let X be the data matrix and y be the corresponding outcome vector. Note: The i-th row X_i represents data point i; y_i is the corresponding outcome for X_i. If cᵀx + c_0 is the best line fit, the error for the i-th data point is ǫ_i : (X_i c) + c_0 − y_i. We can write the overall error as ‖ [X 1] c − y ‖_p, where c now also includes c_0 as its last entry. Lectures 10,11 Slide# 45

46 Regression Inputs: Data X, outcome y. Decision Vars: c_0, c_1, ..., c_n, the coefficients of the straight line. Solution: minimize ‖ [X 1] c − y ‖_p. Note: Instead of the norm, we could use any penalty function. Ordinary Regression: min ‖ [X 1] c − y ‖_p. Tikhonov Regression: min ‖ [X 1] c − y ‖_2^2 + λ ‖c‖_2^2. Lectures 10,11 Slide# 46
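
A sketch comparing ordinary (L_2) regression with L_1 regression on made-up data with a few injected outliers, using the L_1-to-LP reduction from the earlier slides:

```python
import numpy as np
from scipy.optimize import linprog

# Line fit y ~ c1 * x + c0 under L2 and L1 norms (illustrative data).
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 85)
y = 1.2 * x + 0.5 + 0.2 * rng.normal(size=x.size)
y[:5] += 8.0                                  # inject 5 outliers

X1 = np.column_stack([x, np.ones_like(x)])    # the matrix [X 1] of the slide

# L2 fit.
c_l2, *_ = np.linalg.lstsq(X1, y, rcond=None)

# L1 fit: minimize 1^T t  s.t.  -t <= X1 c - y <= t,  variables z = (c, t).
m, n = X1.shape
obj = np.concatenate([np.zeros(n), np.ones(m)])
A_ub = np.block([[ X1, -np.eye(m)],
                 [-X1, -np.eye(m)]])
b_ub = np.concatenate([y, -y])
bounds = [(None, None)] * n + [(0, None)] * m
res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
c_l1 = res.x[:n]

print("L2 coefficients:", c_l2)   # pulled toward the outliers
print("L1 coefficients:", c_l1)   # close to (1.2, 0.5)
```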

47 Experiment Experiment: Create a noisy data set with outliers. 80 regular data points with low noise. 5 outlier data points with large noise. Goal: Compare the effect of L_1, L_2, L_∞ norm regressions. Lectures 10,11 Slide# 47

48 Regression with Outliers 15 Regression with outliers L 2 10 L 1 L Lectures 10,11 Slide# 48

49 Regression with Outliers (Error Histogram) [Figure: histograms of the L_1, L_2, L_∞ norm residuals.] Lectures 10,11 Slide# 49

50 Linear Regression With Non-Linear Models Input: Data X, outcome y, functions g_1, ..., g_k. Goal: Fit a model of the form y_i = c_0 + c_1 g_1(x_i) + c_2 g_2(x_i) + ... + c_k g_k(x_i). Solution: Transform the data matrix into X̂, whose i-th row is (g_1(X_i), ..., g_k(X_i)), for i = 1, ..., n. Apply linear regression on (X̂, y). Lectures 10,11 Slide# 50
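
A sketch of the transformation: build the matrix X̂ from a few illustrative basis functions (not the course's exact ones) and run ordinary least squares:

```python
import numpy as np

# Linear regression with a non-linear model: form [g_1(x) ... g_k(x) 1]
# and solve by least squares (made-up data and basis functions).
rng = np.random.default_rng(2)
x = np.linspace(-5, 5, 100)
y = 0.2 * np.sin(x) + 1.5 * np.cos(x) + 0.2 * x + 0.1 * rng.normal(size=x.size)

basis = [np.sin, np.cos, lambda u: u]                       # g_1, g_2, g_3
X_hat = np.column_stack([g(x) for g in basis] + [np.ones_like(x)])
c, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
print(c)   # approximately (0.2, 1.5, 0.2, 0.0)
```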

51 Experiment Experiment: Create a noisy non-linear data set with outliers: y = 0.2 sin(x) + 1.5 cos(x) + 2 cos(1.5x) + 0.2x − 3 + Noise. 80 regular data points with low noise. 5 outlier data points with large noise. [Figure: data, outliers, and the regression with the non-linear model.] Goal: Compare the effect of L_1, L_2, L_∞ norm regressions. Lectures 10,11 Slide# 51

52 Experiment: linear regression with a non-linear model [Figure: regression with outliers, comparing the L_1, L_2, L_∞ fits against the data and outliers.] Lectures 10,11 Slide# 52

53 Experiment: residuals [Figure: histograms of the L_1, L_2, L_∞ norm residuals.] Lectures 10,11 Slide# 53

54 Classification Problem: Given two classes of data points: x_1^+, ..., x_N^+ and x_1^−, ..., x_M^−. Goal: Find a separating hyperplane between the classes. [Figure: the two classes x^+, x^− and a separating classifier line.] Lectures 10,11 Slide# 54

55 Application-4: Classification Lectures 10,11 Slide# 55

56 Linear Classification Problem: Given two classes of data points: x_1^+, ..., x_N^+ and x_1^−, ..., x_M^−. Goal: Find a separating hyperplane between the classes. Q-1: Will such a hyperplane always exist? A: Not always! Only if the data is linearly separable. Plan: 1. Assume that such a hyperplane exists. 2. Deal with cases where a hyperplane does not exist. Lectures 10,11 Slide# 56

57 Linear Classification (cont) Solution: Use linear programming. Decision Variables: w_0, w_1, w_2, ..., w_n, representing the classifier f_w(x) : w_0 + w_1 x_1 + ... + w_n x_n. Linear Programming: max. ??? s.t. X^+ w ≥ −w_0 · 1, X^− w ≤ −w_0 · 1, where X^+ stacks the points x_1^+, ..., x_N^+ as its rows and X^− stacks x_1^−, ..., x_M^− as its rows. Lectures 10,11 Slide# 57

58 Classification Using Linear Programs Objective: Let's try Σ_i w_i = 1ᵀw. Note: We write ⟨a, b⟩ for the dot product aᵀb. Simplified data representation: Represent the two classes as +1, −1 in the data. Data: (x_1, y_1), ..., (x_N, y_N), written (X, y), where y_i = +1 for class I and y_i = −1 for class II data. Linear Program: min ⟨1, w⟩ + w_0 s.t. y_i(⟨x_i, w⟩ + w_0) ≥ 0. Verify that this formulation works. Lectures 10,11 Slide# 58

59 Experiment: Classification Experiment: Find a classifier for the following points. [Figure: data for classification.] Lectures 10,11 Slide# 59

60 Experiment: Classification Experiment: Outcome. [Figure: several classifiers separating the data.] Note: Many different classifiers are possible. Note-2: What is the deal with points that lie on the classifier line? Lectures 10,11 Slide# 60

61 Dual, Complementary Slackness Linear Program: min ⟨1, w⟩ + w_0 s.t. y_i(⟨x_i, w⟩ + w_0) ≥ 0. Complementary Slackness: Let α_1, ..., α_N be the dual variables. α_i · y_i(⟨x_i, w⟩ + w_0) = 0, α_i ≥ 0. Claim: There are at least n+1 points x_{j_1}, ..., x_{j_{n+1}} s.t. ⟨x_{j_i}, w⟩ + w_0 = 0. Note: n+1 points fully determine the hyperplane w; w_0. Lectures 10,11 Slide# 61

62 Classifying Points Problem: Given a new point x, let us find which class it belongs to. Solution-1: Just find out the sign of ⟨w, x⟩ + w_0; we know w; w_0 from our LP solution. Solution-2: More complicated. Express x as a linear combination of support vectors: x = λ_1 x_{j_1} + λ_2 x_{j_2} + ... + λ_{n+1} x_{j_{n+1}}. We can then write ⟨w, x⟩ + w_0 = Σ_i λ_i (⟨w, x_{j_i}⟩ + w_0) + (1 − Σ_i λ_i) w_0 = (1 − Σ_i λ_i) w_0, since each ⟨w, x_{j_i}⟩ + w_0 = 0. This is impractical for linear programming formulations. Lectures 10,11 Slide# 62

63 Non-Separable Data Q: What if the data is not linearly separable? The linear program will be primal infeasible. In practice, linear separation may hold without noise but not with noise. Solution: Relax the conditions on the half-space w. [Figure: non-separable data with slack variables ξ_1, ξ_2.] Lectures 10,11 Slide# 63

64 Non-Separable Data: Linear Programming Classifier: (old) w_0 + ⟨w, x⟩; (new) ⟨w, x⟩ + 1. Note: The classifier Σ_i w_i x_i + w_0 is equivalent to Σ_i (w_i / w_0) x_i + 1 if w_0 ≠ 0. LP Formulation: minimize Σ_i ξ_i^+ + Σ_j ξ_j^− s.t. ⟨x_i^+, w⟩ + 1 ≥ −ξ_i^+, ⟨x_j^−, w⟩ + 1 ≤ ξ_j^−, ξ_i^+, ξ_j^− ≥ 0. The variable ξ encodes tolerance. We minimize ξ as part of the objective. Exercise: Verify that the support vector interpretation of the dual works. Lectures 10,11 Slide# 64
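
A sketch of the slack-variable LP on made-up two-dimensional data, using scipy's linprog; the variable ordering z = (w, ξ⁺, ξ⁻) is our choice:

```python
import numpy as np
from scipy.optimize import linprog

# Slack-variable LP:
#   minimize sum(xi+) + sum(xi-)
#   s.t. <x_i+, w> + 1 >= -xi_i+,  <x_j-, w> + 1 <= xi_j-,  xi >= 0.
rng = np.random.default_rng(3)
Xp = rng.normal(loc=[ 2,  2], size=(30, 2))   # class +1 (made-up data)
Xm = rng.normal(loc=[-2, -2], size=(30, 2))   # class -1 (made-up data)

Np, Nm, n = Xp.shape[0], Xm.shape[0], Xp.shape[1]
# Variables z = (w, xi+, xi-).
c = np.concatenate([np.zeros(n), np.ones(Np + Nm)])
A_ub = np.block([
    [-Xp, -np.eye(Np),          np.zeros((Np, Nm))],   # -<x+,w> - xi+ <= 1
    [ Xm,  np.zeros((Nm, Np)), -np.eye(Nm)        ],   #  <x-,w> - xi- <= -1
])
b_ub = np.concatenate([np.ones(Np), -np.ones(Nm)])
bounds = [(None, None)] * n + [(0, None)] * (Np + Nm)
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
w = res.x[:n]
slack = res.x[n:]
print("w =", w, " total slack =", slack.sum())   # slack ~ 0 if separable
```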

65 Non-Linear Model Classification Linear Classifier: ⟨w, x⟩ + 1. Non-linear functions φ_i : R^n → R. Classifier: ⟨w, (φ_1(x), φ_2(x), ..., φ_m(x))⟩ + 1. This is exactly the same idea as linear regression with non-linear functions. Lectures 10,11 Slide# 65

More information