ICS-E4030 Kernel Methods in Machine Learning

1 ICS-E4030 Kernel Methods in Machine Learning
Lecture 3: Convex optimization and duality
Juho Rousu, 28 September 2016

2 Convex optimisation: This lecture
- Crash course on convex optimisation
- Lagrangian duality
- Derivation of the SVM dual
Additional reading on convex optimisation: Boyd & Vandenberghe, Convex Optimization, Cambridge University Press, 2004

3 Convex optimisation: Convex optimization problem
Standard form of a convex optimization problem:
$$\begin{aligned}\text{minimize } & f_0(x)\\ \text{subject to } & f_i(x) \le 0, \quad i = 1,\dots,m\\ & a_i^T x = b_i, \quad i = 1,\dots,p\end{aligned}$$
The objective $f_0$ is a convex function. The constraint functions $f_1,\dots,f_m$ defining the inequality constraints are convex functions. The equality constraints are affine (linear). A value of $x$ that satisfies the constraints is called feasible; the set of all feasible points is called the feasible set. The feasible set defined by the above constraints is convex.

4 Convex optimisation: Why is convexity useful?
- Convex objective vs. non-convex objective (figure)
- Convex feasible set vs. non-convex feasible set (figure)

5 Convex optimisation: Why convexity?
Convex objective:
- We can always improve a sub-optimal objective value by stepping towards the negative gradient
- All local optima are global optima
Convex constraints, i.e. a convex feasible set:
- Any point between two feasible points is feasible
- Updates remain inside the feasible set as long as the update direction is towards a feasible point
- This gives fast algorithms based on the principle of feasible descent
The graph of a convex function $f$ lies below any line segment connecting $(x, f(x))$ and $(y, f(y))$; a convex set $C$ contains all line segments between any pair of its distinct points.

6 Convex optimisation: Linear optimization problem (LP)
A convex optimization problem that has an affine objective and affine constraints is called a linear program (LP). A general linear program has the form
$$\begin{aligned}\text{minimize } & c^T x + d\\ \text{subject to } & Gx - h \le 0\\ & Ax - b = 0\end{aligned}$$
Geometric interpretation: the feasible set is a polyhedron, given by the intersection of a set of half-spaces ($Gx \le h$) and a set of hyperplanes ($Ax = b$). The objective decreases the steepest when moving along the direction $-\nabla(c^T x + d) = -c$.
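As a small illustration, here is a minimal sketch (assuming NumPy and SciPy are available) that solves a toy LP in the standard form above with scipy.optimize.linprog; the data c, G, h, A, b are made-up example values:

```python
import numpy as np
from scipy.optimize import linprog

# Toy LP: minimize c^T x subject to G x <= h and A x = b (made-up data).
c = np.array([1.0, 2.0])
G = np.array([[-1.0, 0.0],
              [0.0, -1.0],
              [1.0, 1.0]])
h = np.array([0.0, 0.0, 4.0])            # x1 >= 0, x2 >= 0, x1 + x2 <= 4
A = np.array([[1.0, -1.0]])
b = np.array([1.0])                       # x1 - x2 = 1

res = linprog(c, A_ub=G, b_ub=h, A_eq=A, b_eq=b, bounds=[(None, None)] * 2)
print(res.x, res.fun)                     # optimal point (1, 0) and optimal value 1
```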

7 Convex optimisation: L1-regularized SVM as an LP
$$\min_w \sum_{i=1}^N \ell_{\text{Hinge}}(y_i, \langle w, \phi(x_i)\rangle) + \lambda \|w\|_1$$
Express the hinge loss by an upper bound (slack) $\xi_i$ and the inequality constraint $1 - w^T\phi(x_i)\, y_i \le \xi_i$. Express the L1 norm of the weight vector, $\|w\|_1 = \sum_{j=1}^D |w_j|$, as a sum of magnitudes $m_j$ with constraints $w_j \le m_j$, $-w_j \le m_j$. We obtain the LP
$$\begin{aligned}\min_{w,m,\xi}\ & \sum_{i=1}^N \xi_i + \lambda \sum_{j=1}^D m_j\\ \text{subject to } & w^T\phi(x_i)\, y_i \ge 1 - \xi_i,\ \ \xi_i \ge 0, \quad i = 1,\dots,N\\ & w_j \le m_j,\ -w_j \le m_j,\ m_j \ge 0, \quad j = 1,\dots,D\end{aligned}$$
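A minimal sketch of setting up this LP with scipy.optimize.linprog; the variable ordering [w, m, ξ], the identity feature map, and the synthetic data are assumptions made for illustration:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
N, D, lam = 40, 5, 0.1
Phi = rng.normal(size=(N, D))                       # rows are phi(x_i) (identity feature map)
y = np.sign(Phi[:, 0] + 0.1 * rng.normal(size=N))   # synthetic labels in {-1, +1}

# Variable vector z = [w (D), m (D), xi (N)]
c = np.concatenate([np.zeros(D), lam * np.ones(D), np.ones(N)])

# Hinge constraints: 1 - y_i w^T phi(x_i) <= xi_i  <=>  -y_i phi(x_i)^T w - xi_i <= -1
A_hinge = np.hstack([-y[:, None] * Phi, np.zeros((N, D)), -np.eye(N)])
b_hinge = -np.ones(N)
# Magnitude constraints: w_j - m_j <= 0 and -w_j - m_j <= 0
A_mag = np.vstack([np.hstack([np.eye(D), -np.eye(D), np.zeros((D, N))]),
                   np.hstack([-np.eye(D), -np.eye(D), np.zeros((D, N))])])
b_mag = np.zeros(2 * D)

bounds = [(None, None)] * D + [(0, None)] * D + [(0, None)] * N   # w free, m >= 0, xi >= 0
res = linprog(c, A_ub=np.vstack([A_hinge, A_mag]),
              b_ub=np.concatenate([b_hinge, b_mag]), bounds=bounds)
w = res.x[:D]
print("objective:", res.fun, "nonzero weights:", np.sum(np.abs(w) > 1e-6))
```

The L1 penalty typically drives some of the $m_j$ (and hence $w_j$) to zero, which is the point of using this regularizer.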

8 Convex optimisation: Quadratic optimization problem (QP)
A quadratic program (QP) has a quadratic objective function and affine constraint functions. A QP can be expressed in the form
$$\begin{aligned}\text{minimize } & \tfrac{1}{2}x^T P x + q^T x + r\\ \text{subject to } & Gx - h \le 0\\ & Ax - b = 0\end{aligned}$$
where $P \in \mathbb{R}^{n\times n}$ is a positive semi-definite matrix, $G \in \mathbb{R}^{m\times n}$ and $A \in \mathbb{R}^{p\times n}$. The feasible set of a QP is a polyhedron. The objective decreases the steepest when moving along $-\nabla f_0(x) = -(Px + q)$. When $P = 0$, the QP reduces to an LP.

9 Convex optimisation: Ridge regression as a QP
$$\min_w \sum_{i=1}^N (\langle w, \phi(x_i)\rangle - y_i)^2 + \lambda \|w\|_2^2$$
Express the norm of the weight vector as $\|w\|_2^2 = w^T I_D w$. Write the squared error in vector form: $\langle y - Xw, y - Xw\rangle = y^T y - 2 w^T X^T y + w^T X^T X w$. We get $P = X^T X + \lambda I_D$ for the quadratic part and $q = -2 X^T y$ for the linear part:
$$\min_w\ w^T (X^T X + \lambda I_D)\, w - 2\, w^T X^T y + y^T y$$
This is an example of an unconstrained QP.
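For concreteness, a small NumPy sketch (with synthetic data as an assumption) that minimizes this unconstrained QP via its normal equations $(X^T X + \lambda I_D)\, w = X^T y$ and checks that the gradient vanishes at the solution:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, lam = 100, 10, 0.5
X = rng.normal(size=(N, D))                  # rows are phi(x_i)
y = X @ rng.normal(size=D) + 0.1 * rng.normal(size=N)

# Unconstrained QP: minimize w^T (X^T X + lam I) w - 2 w^T X^T y + y^T y
P = X.T @ X + lam * np.eye(D)
w = np.linalg.solve(P, X.T @ y)              # sets the gradient 2 P w - 2 X^T y to zero

grad = 2 * P @ w - 2 * X.T @ y
print("gradient norm at solution:", np.linalg.norm(grad))   # ~0, so w is optimal
```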

10 Convex optimisation: Soft-margin SVM as a QP
$$\min_w \sum_{i=1}^N \ell_{\text{Hinge}}(\langle w, \phi(x_i)\rangle, y_i) + \frac{\lambda}{2}\|w\|^2$$
Express the hinge loss by an upper bound (slack) $\xi_i$ and the inequality constraint $1 - w^T\phi(x_i)\, y_i \le \xi_i$. Express the norm of the weight vector as $\|w\|_2^2 = w^T I_D w$. We get $P = I_D$ and $q = \mathbf{1}_N$, the vector of ones. We obtain the QP
$$\begin{aligned}\min_{w,\xi}\ & \mathbf{1}_N^T \xi + \frac{\lambda}{2} w^T I_D w\\ \text{subject to } & w^T\phi(x_i)\, y_i + \xi_i \ge 1,\ \ \xi_i \ge 0, \quad i = 1,\dots,N\end{aligned}$$
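A minimal sketch of this primal QP, assuming the cvxpy modelling library is available; Phi, y and lam below are placeholder synthetic data:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
N, D, lam = 50, 4, 1.0
Phi = rng.normal(size=(N, D))                          # rows are phi(x_i)
y = np.sign(Phi[:, 0] + 0.1 * rng.normal(size=N))      # labels in {-1, +1}

w = cp.Variable(D)
xi = cp.Variable(N)
objective = cp.Minimize(cp.sum(xi) + (lam / 2) * cp.sum_squares(w))
constraints = [cp.multiply(y, Phi @ w) >= 1 - xi,      # hinge constraints
               xi >= 0]
problem = cp.Problem(objective, constraints)
problem.solve()
print("optimal objective:", problem.value)
```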

11 Duality
The principle of viewing an optimization problem from two interchangeable views, the primal and the dual view. Intuitively:
- Minimization of the primal objective ↔ Maximization of the dual objective
- Primal constraints ↔ Dual variables
- Dual constraints ↔ Primal variables
Motivation in this course: learning with features (primal) ↔ learning with kernels (dual)

12 Duality: Lagrangian
Consider a convex optimization problem with variable $x \in \mathbb{R}^n$:
$$\begin{aligned}\text{minimize } & f_0(x)\\ \text{subject to } & f_i(x) \le 0, \quad i = 1,\dots,m\\ & h_i(x) = 0, \quad i = 1,\dots,p\end{aligned}$$
We will call this the primal optimisation problem. We take the constraints into account by augmenting the objective with a weighted sum of the constraints:
$$L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x)$$
The resulting function is the Lagrangian associated with the optimization problem. Its domain is $\mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^p$.

13 Example
Minimizing $f_0(x) = x^2$, s.t. $f_1(x) = 1 - x \le 0$.
Lagrangian: $L(x, \lambda) = x^2 + \lambda(1 - x)$

14 Lagrangian dual function
The Lagrangian dual function $g : \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R}$ is the minimum value of the Lagrangian over $x$:
$$g(\lambda, \nu) = \inf_{x \in D} L(x, \lambda, \nu) = \inf_{x \in D}\Big\{ f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x)\Big\}$$
Fixing the coefficients $(\lambda, \nu)$ corresponds to a certain level of penalty; the infimum returns the optimal $x$ for that level of penalty, and $g(\lambda, \nu)$ is the corresponding value of the Lagrangian. $g(\lambda, \nu)$ is a concave function (i.e. $-g$ is convex).

15 Example
Minimizing $x^2$, s.t. $x \ge 1$. Lagrangian dual function: $g(\lambda) = \inf_x \big(x^2 + \lambda(1 - x)\big)$.
Set the derivative to zero: $\frac{\partial}{\partial x}\big(x^2 + \lambda(1 - x)\big) = 2x - \lambda = 0 \implies x = \lambda/2$.
Plug back into the Lagrangian: $g(\lambda) = \frac{\lambda^2}{4} + \lambda(1 - \lambda/2) = -\frac{\lambda^2}{4} + \lambda$.
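A quick numeric check of this example (a NumPy sketch): approximate $g(\lambda)$ by minimizing the Lagrangian over a grid of $x$ values and compare with the closed form $-\lambda^2/4 + \lambda$:

```python
import numpy as np

x = np.linspace(-5, 5, 100001)
for lam in [0.0, 1.0, 2.0, 3.0]:
    g_numeric = np.min(x**2 + lam * (1 - x))    # inf_x L(x, lam), approximated on a grid
    g_closed = -lam**2 / 4 + lam
    print(lam, g_numeric, g_closed)
# Every g(lam) is a lower bound on p* = 1 (the minimum of x^2 subject to x >= 1),
# and the bound is tight at lam = 2.
```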

16 Lower bounds on optimal value
The Lagrangian dual function gives us lower bounds on the optimal value $p^\star$ of the original problem: for any $\lambda \ge 0$ and for any $\nu$, we have $g(\lambda, \nu) \le p^\star$.

17 Lower bounds on optimal value
To see this, let $\tilde{x}$ be a feasible point of the original problem, thus $f_i(\tilde{x}) \le 0$ and $h_i(\tilde{x}) = 0$. Then
$$L(\tilde{x}, \lambda, \nu) = f_0(\tilde{x}) + \sum_{i=1}^m \lambda_i f_i(\tilde{x}) + \sum_{i=1}^p \nu_i h_i(\tilde{x}) \le f_0(\tilde{x}),$$
as $\lambda_i f_i(\tilde{x}) \le 0$ and $\nu_i h_i(\tilde{x}) = 0$.

18 Lower bounds on optimal value
Now,
$$g(\lambda, \nu) = \inf_{x \in D} L(x, \lambda, \nu) \le L(\tilde{x}, \lambda, \nu) \le f_0(\tilde{x}),$$
as the infimum is computed over a set containing $\tilde{x}$. Since this holds for every feasible $\tilde{x}$, it holds in particular at the primal optimum, giving $g(\lambda, \nu) \le p^\star$.

19 The Lagrange dual problem
For each pair $(\lambda, \nu)$ with $\lambda \ge 0$, the Lagrange dual function gives a lower bound on the optimal value $p^\star$. What is the tightest lower bound that can be achieved? We need to find the maximum. This gives us an optimization problem
$$\begin{aligned}\text{maximize } & g(\lambda, \nu)\\ \text{subject to } & \lambda \ge 0\end{aligned}$$
with variables $(\lambda, \nu)$. It is called the Lagrange dual problem of the original optimization problem. It is a convex optimisation problem, since it is equivalent to minimising $-g(\lambda, \nu)$, which is a convex function.

20 Dual problem for SVMs: Derivation of the dual soft-margin SVM
Recall the soft-margin SVM:
$$\begin{aligned}\text{Minimize } & \frac{1}{2}\|w\|^2 + C\sum_{i=1}^N \xi_i\\ \text{subject to } & 1 - y_i(\langle w, \phi(x_i)\rangle + b) \le \xi_i \quad \text{for all } i = 1,\dots,N,\\ & \xi_i \ge 0\end{aligned}$$
(Figure: training points plotted by GC content before 'AG' vs. GC content after 'AG', illustrating the slack $\xi$.)
$\xi_i$ is called the slack variable; when it is positive, the margin $\gamma(x_i) < 1$. The sum of slacks is to be minimized, so the objective still favours hyperplanes that separate the classes well. The coefficient $C > 0$ controls the balance between maximizing the margin and the amount of slack needed.

21 Dual problem for SVMs: Recipe to find the dual problem
- Write down the Lagrangian
- Find its minimum by setting the derivatives w.r.t. the primal variables to zero
- Plug the solution $(w^\star, \xi^\star)$ back into the Lagrangian to obtain the dual function
- Write down the optimisation problem to maximise the dual function w.r.t. the dual variables

22 Dual problem for SVMs: Deriving the dual SVM
Primal soft-margin SVM problem:
$$\begin{aligned}\text{Minimize } & \frac{1}{2}\|w\|^2 + C\sum_{i=1}^N \xi_i\\ \text{subject to } & 1 - y_i\langle w, \phi(x_i)\rangle \le \xi_i \quad \text{for all } i = 1,\dots,N,\\ & \xi_i \ge 0\end{aligned}$$
Above we have dropped the intercept of the hyperplane; instead we assume a constant feature $\phi_0(x) = \text{const}$. First write the problem in standard form:
$$\begin{aligned}\text{Minimize } & \frac{1}{2}\|w\|^2 + C\sum_{i=1}^N \xi_i\\ \text{subject to } & 1 - y_i\langle w, \phi(x_i)\rangle - \xi_i \le 0 \quad \text{for all } i = 1,\dots,N,\\ & -\xi_i \le 0 \quad \text{for all } i = 1,\dots,N\end{aligned}$$

23 Dual problem for SVMs: Lagrangian of the soft-margin SVM
$$\begin{aligned}\text{Minimize } & \frac{1}{2}\|w\|^2 + C\sum_{i=1}^N \xi_i\\ \text{subject to } & 1 - y_i\langle w, \phi(x_i)\rangle - \xi_i \le 0, \quad i = 1,\dots,N,\\ & -\xi_i \le 0, \quad i = 1,\dots,N\end{aligned}$$
We have the objective and two sets of inequality constraints; use $\lambda = (\alpha, \beta)$, where $\alpha = (\alpha_i)_{i=1}^N$ corresponds to the first set and $\beta = (\beta_i)_{i=1}^N$ to the second set of constraints. The Lagrangian becomes
$$L(w, \xi, \alpha, \beta) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^N \xi_i + \sum_{i=1}^N \alpha_i\big(1 - y_i\langle w, \phi(x_i)\rangle - \xi_i\big) + \sum_{i=1}^N \beta_i(-\xi_i)$$

24 Dual problem for SVMs: Minimum of the Lagrangian
$$L(w, \xi, \alpha, \beta) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^N \xi_i + \sum_{i=1}^N \alpha_i\big(1 - y_i\langle w, \phi(x_i)\rangle - \xi_i\big) + \sum_{i=1}^N \beta_i(-\xi_i)$$
Find the minimum of the Lagrangian by setting the derivatives w.r.t. the primal variables to zero:
$$\nabla_w L(w, \xi, \alpha, \beta) = w - \sum_{i=1}^N \alpha_i y_i \phi(x_i) = 0 \implies w = \sum_{i=1}^N \alpha_i y_i \phi(x_i)$$

25 Dual problem for SVMs: Minimum of the Lagrangian
$$L(w, \xi, \alpha, \beta) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^N \xi_i + \sum_{i=1}^N \alpha_i\big(1 - y_i\langle w, \phi(x_i)\rangle - \xi_i\big) + \sum_{i=1}^N \beta_i(-\xi_i)$$
Find the minimum of the Lagrangian by setting the derivatives w.r.t. the primal variables to zero:
$$\frac{\partial}{\partial \xi_i} L(w, \xi, \alpha, \beta) = C - \alpha_i - \beta_i = 0$$
Together with $\beta_i \ge 0$ this implies $\alpha_i \le C$ for all $i = 1,\dots,N$.

26 Dual problem for SVMs: Lagrangian dual function
$$L(w, \xi, \alpha, \beta) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^N \xi_i + \sum_{i=1}^N \alpha_i\big(1 - y_i\langle w, \phi(x_i)\rangle - \xi_i\big) + \sum_{i=1}^N \beta_i(-\xi_i)$$
Plug the solution $w = \sum_{i=1}^N \alpha_i y_i \phi(x_i)$ back into the Lagrangian. With $C - \alpha_i - \beta_i = 0$, the terms containing $\xi_i$ and $\beta_i$ cancel out:
$$L(w, \xi, \alpha, \beta) = \frac{1}{2}\Big\|\sum_{i=1}^N \alpha_i y_i \phi(x_i)\Big\|^2 + \sum_{i=1}^N \alpha_i\Big(1 - y_i\Big\langle \sum_{j=1}^N \alpha_j y_j \phi(x_j), \phi(x_i)\Big\rangle\Big)$$

27 Dual problem for SVMs: Lagrangian dual function
$$L(w, \xi, \alpha, \beta) = \frac{1}{2}\Big\|\sum_{i=1}^N \alpha_i y_i \phi(x_i)\Big\|^2 + \sum_{i=1}^N \alpha_i\Big(1 - y_i\Big\langle \sum_{j=1}^N \alpha_j y_j \phi(x_j), \phi(x_i)\Big\rangle\Big)$$
Expanding the squares and simplifying gives the dual function, which only depends on the dual variables $\alpha$:
$$g(\alpha) = \sum_{i=1}^N \alpha_i - \frac{1}{2}\sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j y_i y_j \langle \phi(x_j), \phi(x_i)\rangle$$

28 Dual problem for SVMs: Dual SVM optimisation problem
Write down the optimisation problem to maximise the dual function w.r.t. the dual variables. We have non-negativity $\alpha_i \ge 0$, as well as the upper bound derived above, $\alpha_i \le C$; the optimal dual solution is given by
$$\begin{aligned}\text{maximize } & g(\alpha) = \sum_{i=1}^N \alpha_i - \frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i \alpha_j y_i y_j \langle\phi(x_j), \phi(x_i)\rangle\\ \text{subject to } & 0 \le \alpha_i \le C, \quad i = 1,\dots,N\end{aligned}$$
Plug in a kernel $K(x_i, x_j) = \langle\phi(x_j), \phi(x_i)\rangle$ to obtain the dual SVM problem:
$$\begin{aligned}\text{maximize } & g(\alpha) = \sum_{i=1}^N \alpha_i - \frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i \alpha_j y_i y_j K(x_i, x_j)\\ \text{subject to } & 0 \le \alpha_i \le C, \quad i = 1,\dots,N\end{aligned}$$
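Since this dual has only box constraints (the intercept was dropped, so there is no equality constraint on $\alpha$), even a simple projected gradient ascent can solve it. The sketch below is illustrative, not the lecture's algorithm; the RBF kernel, step size and synthetic data are assumptions:

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(X**2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

def svm_dual_projected_gradient(K, y, C=1.0, lr=1e-3, iters=5000):
    """Maximize g(alpha) = sum(alpha) - 0.5 * alpha^T Q alpha subject to 0 <= alpha <= C."""
    Q = (y[:, None] * y[None, :]) * K
    alpha = np.zeros(len(y))
    for _ in range(iters):
        grad = 1.0 - Q @ alpha                        # gradient of the dual objective g
        alpha = np.clip(alpha + lr * grad, 0.0, C)    # ascent step, then projection onto the box
    return alpha

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
K = rbf_kernel(X)
alpha = svm_dual_projected_gradient(K, y, C=1.0)
Q = (y[:, None] * y[None, :]) * K
print("dual objective:", alpha.sum() - 0.5 * alpha @ Q @ alpha)
```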

29 Properties of convex optimisation problems
We will look at further concepts to understand the properties of convex optimisation problems:
- Weak and strong duality
- Duality gap
- Complementary slackness
- KKT conditions

30 Properties of convex optimisation problems: Weak and strong duality
Let $p^\star$ and $d^\star$ denote the primal and dual optimal values of an optimization problem.
- Weak duality $d^\star \le p^\star$ always holds, even when the primal optimization problem is non-convex
- Strong duality $d^\star = p^\star$ holds for special classes of convex optimization problems; LPs and QPs have strong duality

31 Properties of convex optimisation problems: Duality gap
A pair $x$, $(\lambda, \nu)$, where $x$ is primal feasible and $(\lambda, \nu)$ is dual feasible, is called a primal-dual feasible pair. For a primal-dual feasible pair, the quantity
$$f_0(x) - g(\lambda, \nu)$$
is called the duality gap. A primal-dual feasible pair localizes the primal and dual optimal values into an interval,
$$g(\lambda, \nu) \le d^\star \le p^\star \le f_0(x),$$
the width of which is given by the duality gap. If the duality gap is zero, we know that $x$ is primal optimal and $(\lambda, \nu)$ is dual optimal. We can use the duality gap as a stopping criterion for optimisation.

32 Properties of convex optimisation problems: Stopping criterion for optimization
Suppose the algorithm generates a sequence of primal feasible $x^{(k)}$ and dual feasible $(\lambda^{(k)}, \nu^{(k)})$ solutions for $k = 1, 2, \dots$. Then the duality gap can be used as the stopping criterion: e.g. stop when $f_0(x^{(k)}) - g(\lambda^{(k)}, \nu^{(k)}) \le \epsilon$, for some $\epsilon > 0$.
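As a sketch of this idea for the soft-margin SVM: any dual feasible $\alpha$ (with $0 \le \alpha_i \le C$) yields a primal feasible pair via $w = \sum_i \alpha_i y_i \phi(x_i)$ and $\xi_i = \max(0, 1 - y_i\langle w, \phi(x_i)\rangle)$, so the gap between the two objectives bounds the suboptimality. The helper below reuses K, y and alpha from the earlier projected gradient sketch; these names are assumptions of that sketch, not part of the lecture:

```python
import numpy as np

def svm_duality_gap(K, y, alpha, C):
    """Duality gap for the soft-margin SVM given a dual feasible alpha (0 <= alpha <= C)."""
    Q = (y[:, None] * y[None, :]) * K
    f = K @ (alpha * y)                        # f_i = <w, phi(x_i)> for w = sum_j alpha_j y_j phi(x_j)
    xi = np.maximum(0.0, 1.0 - y * f)          # smallest slacks making (w, xi) primal feasible
    primal = 0.5 * alpha @ Q @ alpha + C * np.sum(xi)
    dual = np.sum(alpha) - 0.5 * alpha @ Q @ alpha
    return primal - dual                       # >= 0; stop when below a tolerance epsilon

# Example (reusing K, y, alpha from the dual solver sketch above):
# print(svm_duality_gap(K, y, alpha, C=1.0))
```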

33 Properties of convex optimisation problems: Complementary slackness
Let $x^\star$ be a primal optimal (and thus also feasible) and $(\lambda^\star, \nu^\star)$ be a dual optimal (and thus also feasible) solution, and let strong duality hold, i.e. $d^\star = p^\star$. Then, at the optimum,
$$\begin{aligned} d^\star = g(\lambda^\star, \nu^\star) &= \inf_x \Big\{ f_0(x) + \sum_{i=1}^m \lambda_i^\star f_i(x) + \sum_{i=1}^p \nu_i^\star h_i(x)\Big\}\\ &\le f_0(x^\star) + \sum_{i=1}^m \lambda_i^\star f_i(x^\star) + \sum_{i=1}^p \nu_i^\star h_i(x^\star)\\ &\le f_0(x^\star) = p^\star \end{aligned}$$
The first inequality follows from the definition of the infimum; the second from $x^\star$ being a feasible solution. Due to $d^\star = p^\star$, the inequalities must hold as equalities, so the penalty terms must equate to zero.

34 Properties of convex optimisation problems: Complementary slackness
We have
$$\sum_{i=1}^m \lambda_i^\star f_i(x^\star) + \sum_{i=1}^p \nu_i^\star h_i(x^\star) = 0$$
Since $h_i(x^\star) = 0$, $i = 1,\dots,p$, we conclude that $\sum_{i=1}^m \lambda_i^\star f_i(x^\star) = 0$, and since each term is non-positive, $\lambda_i^\star f_i(x^\star) = 0$, $i = 1,\dots,m$. This condition is called complementary slackness.

35 Properties of convex optimisation problems: Complementary slackness
Intuition: at the optimum there cannot be slack both in the dual variable $\lambda_i^\star \ge 0$ and in the constraint $f_i(x^\star) \le 0$ at the same time:
$$\lambda_i^\star > 0 \implies f_i(x^\star) = 0 \qquad \text{and} \qquad f_i(x^\star) < 0 \implies \lambda_i^\star = 0$$
At the optimum, positive Lagrange multipliers are associated with active constraints.

36 Properties of convex optimisation problems: Karush-Kuhn-Tucker (KKT) conditions
Summarizing, at the optimum the following conditions must hold true:
- Inequality constraints satisfied: $f_i(x^\star) \le 0$, $i = 1,\dots,m$
- Equality constraints satisfied: $h_i(x^\star) = 0$, $i = 1,\dots,p$
- Non-negativity of the dual variables of the inequality constraints: $\lambda_i^\star \ge 0$, $i = 1,\dots,m$
- Complementary slackness: $\lambda_i^\star f_i(x^\star) = 0$, $i = 1,\dots,m$
- Derivative of the Lagrangian vanishes: $\nabla f_0(x^\star) + \sum_{i=1}^m \lambda_i^\star \nabla f_i(x^\star) + \sum_{i=1}^p \nu_i^\star \nabla h_i(x^\star) = 0$
These conditions are called the Karush-Kuhn-Tucker conditions. For convex problems, satisfying the KKT conditions is sufficient as well as necessary for optimality.
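For instance, the KKT conditions can be checked by hand on the earlier toy problem (minimize $x^2$ subject to $1 - x \le 0$); the candidate point $x^\star = 1$, $\lambda^\star = 2$ comes from the dual function example above:

```latex
% KKT check for: minimize x^2 subject to f_1(x) = 1 - x <= 0
% Candidate: x* = 1, lambda* = 2
\begin{align*}
  f_1(x^\star) &= 1 - 1 = 0 \le 0                    && \text{(primal feasibility)}\\
  \lambda^\star &= 2 \ge 0                            && \text{(dual feasibility)}\\
  \lambda^\star f_1(x^\star) &= 2 \cdot 0 = 0         && \text{(complementary slackness)}\\
  \nabla_x L &= 2x^\star - \lambda^\star = 2 - 2 = 0  && \text{(stationarity)}
\end{align*}
% All conditions hold, so x* = 1 is optimal with p* = 1 = g(2) = d*.
```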

37 Properties of convex optimisation problems: Complementary slackness for the soft-margin SVM
Recall the Lagrangian of the soft-margin SVM:
$$L(w, \xi, \alpha, \beta) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^N \xi_i + \sum_{i=1}^N \alpha_i\big(1 - y_i\langle w, \phi(x_i)\rangle - \xi_i\big) + \sum_{i=1}^N \beta_i(-\xi_i)$$
The complementary slackness conditions give, for all $i = 1,\dots,N$:
$$\alpha_i^\star\big(1 - y_i\langle w, \phi(x_i)\rangle - \xi_i\big) = 0, \qquad \beta_i^\star \xi_i = 0$$
It can be shown that, together with the constraints of the primal and dual problems, the above implies:
A: $\alpha_i = C \implies y_i\langle w, \phi(x_i)\rangle \le 1$
B: $0 < \alpha_i < C \implies y_i\langle w, \phi(x_i)\rangle = 1$
C: $\alpha_i = 0 \implies y_i\langle w, \phi(x_i)\rangle \ge 1$

38 Properties of convex optimisation problems: Complementary slackness for the soft-margin SVM
The data points are classified by their difficulty:
A: Too small a margin, a difficult data point: $\alpha_i = C \implies y_i\langle w, \phi(x_i)\rangle \le 1$
B: At the boundary: $0 < \alpha_i < C \implies y_i\langle w, \phi(x_i)\rangle = 1$
C: More than enough margin, an easy data point: $\alpha_i = 0 \implies y_i\langle w, \phi(x_i)\rangle \ge 1$
(Figure: data points labelled A, B, C relative to the hyperplanes $\langle w, x\rangle + b = -1$, $\langle w, x\rangle + b = 0$ and $\langle w, x\rangle + b = +1$, with slacks $\xi > 0$ and $\xi > 1$ marked.)
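A small sketch that sorts training points into these three categories from the dual variables; it reuses the alpha returned by the projected gradient sketch above, and the numeric tolerance is an illustrative assumption:

```python
import numpy as np

def categorize_support_vectors(alpha, C, tol=1e-6):
    """Split training points into the A/B/C categories by their dual variables."""
    at_upper = alpha >= C - tol          # A: alpha_i = C, margin too small or violated
    at_zero = alpha <= tol               # C: alpha_i = 0, easy point, not a support vector
    on_margin = ~at_upper & ~at_zero     # B: 0 < alpha_i < C, exactly on the margin
    return at_upper, on_margin, at_zero

# Example (with alpha from the projected gradient sketch above):
# cat_A, cat_B, cat_C = categorize_support_vectors(alpha, C=1.0)
# print(cat_A.sum(), cat_B.sum(), cat_C.sum())
```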
