Convex optimization COMS 4771


1. Recap: learning via optimization

Soft-margin SVMs

The soft-margin SVM optimization problem is defined by the training data:

    \min_{w \in \mathbb{R}^d} \; \frac{\lambda}{2} \|w\|_2^2 + \frac{1}{n} \sum_{i=1}^n \left[ 1 - y_i x_i^T w \right]_+ .

Compare with empirical risk minimization (ERM) for the zero-one loss:

    \min_{w \in \mathbb{R}^d} \; \frac{1}{n} \sum_{i=1}^n \mathbf{1}\{ y_i x_i^T w \le 0 \}.

In both cases, the i-th term in the summation is a function of y_i x_i^T w.

Zero-one loss vs. hinge loss

[Figure: the zero-one loss \mathbf{1}\{ y x^T w \le 0 \} and the hinge loss [1 - y x^T w]_+ plotted as functions of the margin y x^T w.]

The hinge loss is an upper bound on the zero-one loss:

    \mathbf{1}\{ y x^T w \le 0 \} \le \left[ 1 - y x^T w \right]_+ .

So the soft-margin SVM minimizes an upper bound on the training error rate, plus a term that encourages large margins.
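A minimal numeric sketch of this bound (using NumPy, which is not part of the lecture; the variable names are illustrative): the hinge loss dominates the zero-one loss at every margin value.

    import numpy as np

    z = np.linspace(-2.0, 2.0, 9)        # candidate margins z = y * x^T w
    zero_one = (z <= 0).astype(float)    # 1{y x^T w <= 0}
    hinge = np.maximum(0.0, 1.0 - z)     # [1 - y x^T w]_+

    # The hinge loss upper-bounds the zero-one loss pointwise.
    assert np.all(zero_one <= hinge)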

Learning via optimization

    Zero-one loss ERM:          \min_{w \in \mathbb{R}^d} \frac{1}{n} \sum_{i=1}^n \mathbf{1}\{ y_i x_i^T w \le 0 \}
    Squared loss ERM:           \min_{w \in \mathbb{R}^d} \frac{1}{n} \sum_{i=1}^n (x_i^T w - y_i)^2
    Logistic regression MLE:    \min_{w \in \mathbb{R}^d} \frac{1}{n} \sum_{i=1}^n \ln(1 + \exp(-y_i x_i^T w))
    Soft-margin SVM:            \min_{w \in \mathbb{R}^d} \frac{\lambda}{2} \|w\|_2^2 + \frac{1}{n} \sum_{i=1}^n [1 - y_i x_i^T w]_+

Generic learning objective:

    \min_{w \in \mathbb{R}^d} \underbrace{r(w)}_{\text{regularization}} + \underbrace{\frac{1}{n} \sum_{i=1}^n \varphi_i(w)}_{\text{data fitting}}

Regularization: encodes the learning bias (e.g., prefer large margins).
Data fitting: measures how poorly the model fits the training data.
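To make the regularization-plus-data-fitting decomposition concrete, here is a small sketch (NumPy assumed; not from the slides) that evaluates the soft-margin SVM objective in exactly this form:

    import numpy as np

    def svm_objective(w, X, y, lam):
        """(lam/2) ||w||_2^2 + (1/n) sum_i [1 - y_i x_i^T w]_+ ."""
        margins = y * (X @ w)                                    # y_i x_i^T w
        regularization = 0.5 * lam * np.dot(w, w)                # r(w)
        data_fitting = np.mean(np.maximum(0.0, 1.0 - margins))   # (1/n) sum_i phi_i(w)
        return regularization + data_fitting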

Regularization

Other kinds of regularization encode other inductive biases, e.g.:

    r(w) = \lambda \|w\|_1 : encourages w to be small and sparse
    r(w) = \lambda \sum_{i=1}^{d-1} |w_{i+1} - w_i| : encourages piecewise constant weights
    r(w) = \lambda \|w - w_{\mathrm{old}}\|_2^2 : encourages closeness to an old solution w_{\mathrm{old}}

Regularization can also be combined with other data-fitting objectives, e.g.:

    Ridge regression:             \min_{w \in \mathbb{R}^d} \frac{\lambda}{2} \|w\|_2^2 + \frac{1}{2n} \sum_{i=1}^n (x_i^T w - y_i)^2
    Lasso:                        \min_{w \in \mathbb{R}^d} \lambda \|w\|_1 + \frac{1}{2n} \sum_{i=1}^n (x_i^T w - y_i)^2
    Sparse SVM:                   \min_{w \in \mathbb{R}^d} \lambda \|w\|_1 + \frac{1}{n} \sum_{i=1}^n [1 - y_i x_i^T w]_+
    Sparse logistic regression:   \min_{w \in \mathbb{R}^d} \lambda \|w\|_1 + \frac{1}{n} \sum_{i=1}^n \ln(1 + \exp(-y_i x_i^T w))
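The three regularizers above translate directly into code; a small sketch (NumPy assumed, function names illustrative):

    import numpy as np

    def r_sparse(w, lam):
        # lam * ||w||_1: encourages small, sparse w
        return lam * np.sum(np.abs(w))

    def r_piecewise(w, lam):
        # lam * sum_{i=1}^{d-1} |w_{i+1} - w_i|: encourages piecewise constant w
        return lam * np.sum(np.abs(np.diff(w)))

    def r_proximal(w, w_old, lam):
        # lam * ||w - w_old||_2^2: encourages closeness to w_old
        return lam * np.sum((w - w_old) ** 2)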

2. Introduction to convexity

Convex sets

A set A is convex if, for every pair of points x, x' in A, the line segment between x and x' is also contained in A.

[Figure: four example sets, labeled convex, not convex, convex, convex.]

Examples:

    All of \mathbb{R}^d.
    The empty set.
    Half-spaces: \{ x \in \mathbb{R}^d : b^T x \le c \}.
    Intersections of convex sets.
    Convex hulls: \mathrm{conv}(S) := \{ \sum_{i=1}^k \alpha_i x_i : k \in \mathbb{N}, \; x_i \in S, \; \alpha_i \ge 0, \; \sum_{i=1}^k \alpha_i = 1 \} (illustrated in the sketch below).
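As a quick illustration of the convex hull definition (a sketch assuming NumPy; not from the slides), one can sample an element of conv(S) by drawing nonnegative weights that sum to one:

    import numpy as np

    rng = np.random.default_rng(0)
    S = rng.normal(size=(5, 2))              # five points x_i in R^2
    alpha = rng.dirichlet(np.ones(len(S)))   # alpha_i >= 0, sum_i alpha_i = 1
    point = alpha @ S                        # a convex combination: an element of conv(S)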

Convex functions

A function f : \mathbb{R}^d \to \mathbb{R} is convex if for any x, x' \in \mathbb{R}^d and \alpha \in [0, 1],

    f((1 - \alpha) x + \alpha x') \le (1 - \alpha) f(x) + \alpha f(x').

[Figure: two one-dimensional functions with the chord between (x, f(x)) and (x', f(x')) drawn; the graph stays below the chord for the convex f but not for the non-convex f.]

Examples:

    f(x) = c^x for any c > 0 (on \mathbb{R}).
    f(x) = |x|^c for any c \ge 1 (on \mathbb{R}).
    f(x) = b^T x for any b \in \mathbb{R}^d.
    f(x) = \|x\| for any norm \|\cdot\|.
    f(x) = x^T A x for symmetric positive semidefinite A.
    a f + g for convex functions f and g, and a \ge 0.
    \max\{f, g\} for convex functions f and g.
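Any candidate on this list can be spot-checked against the definition numerically (a sketch assuming NumPy; finding a violation disproves convexity, while finding none is merely evidence for it):

    import numpy as np

    rng = np.random.default_rng(0)

    def violates_convexity(f, d, trials=1000):
        # Search for a violation of f((1-a)x + a x') <= (1-a) f(x) + a f(x').
        for _ in range(trials):
            x, xp = rng.normal(size=d), rng.normal(size=d)
            a = rng.uniform()
            if f((1 - a) * x + a * xp) > (1 - a) * f(x) + a * f(xp) + 1e-9:
                return True
        return False

    print(violates_convexity(np.linalg.norm, d=3))    # False: the norm passes the check
    print(violates_convexity(lambda x: -x @ x, d=3))  # True: -||x||_2^2 is not convex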

Verifying convexity

Is f(x) = \|x\| convex? Pick any \alpha \in [0, 1] and any x, x' \in \mathbb{R}^d. Then

    f((1 - \alpha) x + \alpha x') = \|(1 - \alpha) x + \alpha x'\|
                                  \le \|(1 - \alpha) x\| + \|\alpha x'\|    (triangle inequality)
                                  = (1 - \alpha) \|x\| + \alpha \|x'\|      (homogeneity)
                                  = (1 - \alpha) f(x) + \alpha f(x').

Yes, f is convex.

Convexity of differentiable functions

Differentiable functions: if f : \mathbb{R}^d \to \mathbb{R} is differentiable, then f is convex if and only if

    f(x) \ge f(x_0) + \nabla f(x_0)^T (x - x_0)   for all x, x_0 \in \mathbb{R}^d.

[Figure: a convex f lies above its affine approximation a(x) = f(x_0) + f'(x_0)(x - x_0) at every x_0.]

Twice-differentiable functions: if f : \mathbb{R}^d \to \mathbb{R} is twice-differentiable, then f is convex if and only if \nabla^2 f(x) \succeq 0 for all x \in \mathbb{R}^d (i.e., the Hessian, the matrix of second derivatives, is positive semidefinite for all x).
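For a concrete quadratic, the second-order condition reduces to an eigenvalue check; a sketch assuming NumPy (the matrix A is illustrative):

    import numpy as np

    # f(x) = x^T A x with symmetric A has Hessian 2A, so f is convex
    # iff all eigenvalues of A are nonnegative.
    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])
    eigenvalues = np.linalg.eigvalsh(A)  # eigvalsh handles symmetric matrices
    print(eigenvalues, np.all(eigenvalues >= 0))  # [1. 3.] True, so f is convex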

Verifying convexity of differentiable functions

Is f(x) = x^4 convex? Use the second-order condition for convexity:

    f'(x) = 4x^3,   f''(x) = 12x^2 \ge 0.

Yes, f is convex.

Is f(x) = e^{\langle a, x \rangle} convex? Use the first-order condition for convexity. By the chain rule,

    \nabla f(x) = e^{\langle a, x \rangle} \nabla \langle a, x \rangle = e^{\langle a, x \rangle} a.

The difference between f and its affine approximation at x_0 is

    f(x) - ( f(x_0) + \langle \nabla f(x_0), x - x_0 \rangle )
        = e^{\langle a, x \rangle} - ( e^{\langle a, x_0 \rangle} + e^{\langle a, x_0 \rangle} \langle a, x - x_0 \rangle )
        = e^{\langle a, x_0 \rangle} ( e^{\langle a, x - x_0 \rangle} - (1 + \langle a, x - x_0 \rangle) )
        \ge 0   (because 1 + z \le e^z for all z \in \mathbb{R}).

Yes, f is convex.
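A numeric spot check of this first-order argument (a sketch assuming NumPy; the vector a and the rounding slack are illustrative):

    import numpy as np

    rng = np.random.default_rng(1)
    a = rng.normal(size=4)
    f = lambda x: np.exp(a @ x)           # f(x) = e^{<a, x>}
    grad = lambda x: np.exp(a @ x) * a    # gradient from the chain rule

    # Check f(x) >= f(x0) + <grad f(x0), x - x0> at random pairs (x, x0).
    for _ in range(1000):
        x, x0 = rng.normal(size=4), rng.normal(size=4)
        lhs = f(x)
        rhs = f(x0) + grad(x0) @ (x - x0)
        assert lhs >= rhs - 1e-8 * (1.0 + abs(lhs))  # small slack for rounding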

3. Convex optimization problems

Optimization problems

A typical optimization problem (in standard form) is written as

    \min_{x \in \mathbb{R}^d} f_0(x)   s.t.   f_i(x) \le 0 for all i = 1, \dots, n.

    f_0 : \mathbb{R}^d \to \mathbb{R} is the objective function;
    f_1, \dots, f_n : \mathbb{R}^d \to \mathbb{R} are the constraint functions;
    the inequalities f_i(x) \le 0 are the constraints;
    A := \{ x \in \mathbb{R}^d : f_i(x) \le 0 \text{ for all } i = 1, \dots, n \} is the feasible region.

Goal: find x \in A so that f_0(x) is as small as possible.

The (optimal) value of the optimization problem is the smallest value f_0(x) achieved by a feasible point x \in A. A point x \in A achieving the optimal value is a (global) minimizer.

Convex optimization problems

Standard form of a convex optimization problem:

    \min_{x \in \mathbb{R}^d} f_0(x)   s.t.   f_i(x) \le 0 for all i = 1, \dots, n,

where f_0, f_1, \dots, f_n : \mathbb{R}^d \to \mathbb{R} are convex functions.

Fact: the feasible set A := \{ x \in \mathbb{R}^d : f_i(x) \le 0 \text{ for all } i = 1, \dots, n \} is a convex set.
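Modeling tools accept problems written in exactly this standard form. A sketch using the cvxpy library (an assumption, not part of the lecture; the data are illustrative): minimize \|x\|_2^2 subject to the single linear constraint 1 - b^T x \le 0.

    import cvxpy as cp
    import numpy as np

    d = 3
    b = np.ones(d)

    x = cp.Variable(d)
    objective = cp.Minimize(cp.sum_squares(x))  # f_0(x) = ||x||_2^2, convex
    constraints = [1.0 - b @ x <= 0]            # f_1(x) = 1 - b^T x, convex (linear)
    problem = cp.Problem(objective, constraints)
    problem.solve()
    print(problem.value, x.value)  # optimal value 1/3 at x = (1/3, 1/3, 1/3)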

Example: Soft-margin SVM

    \min_{w \in \mathbb{R}^d, \; \xi_1, \dots, \xi_n \in \mathbb{R}} \frac{\lambda}{2} \|w\|_2^2 + \frac{1}{n} \sum_{i=1}^n \xi_i
    s.t.   y_i x_i^T w \ge 1 - \xi_i   for all i = 1, \dots, n,
           \xi_i \ge 0                 for all i = 1, \dots, n.

Bringing it to standard form:

    \min_{w \in \mathbb{R}^d, \; \xi_1, \dots, \xi_n \in \mathbb{R}} \frac{\lambda}{2} \|w\|_2^2 + \frac{1}{n} \sum_{i=1}^n \xi_i   (sum of two convex functions, also convex)
    s.t.   1 - \xi_i - y_i x_i^T w \le 0   for all i = 1, \dots, n   (linear constraints)
           -\xi_i \le 0                    for all i = 1, \dots, n   (linear constraints)

Which of zero-one loss ERM, squared loss ERM, logistic regression MLE, ridge regression, lasso, and sparse SVM are convex?
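The standard form above can be handed to a solver essentially verbatim; a sketch using cvxpy and synthetic data (both assumptions, not part of the lecture):

    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, lam = 40, 2, 0.1
    X = rng.normal(size=(n, d))
    y = np.where(X[:, 0] + 0.3 * rng.normal(size=n) > 0, 1.0, -1.0)

    w = cp.Variable(d)
    xi = cp.Variable(n)
    objective = cp.Minimize(0.5 * lam * cp.sum_squares(w) + cp.sum(xi) / n)
    constraints = [1 - xi - cp.multiply(y, X @ w) <= 0,  # 1 - xi_i - y_i x_i^T w <= 0
                   -xi <= 0]                             # -xi_i <= 0
    cp.Problem(objective, constraints).solve()
    print(w.value)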

Local minimizers

Consider an optimization problem (not necessarily convex):

    \min_{x \in \mathbb{R}^d} f_0(x)   s.t.   x \in A.

We say x \in A is a local minimizer if there is an open ball

    U := \{ x' \in \mathbb{R}^d : \|x' - x\|_2 < r \}

of positive radius r > 0 such that x is a global minimizer for

    \min_{x' \in \mathbb{R}^d} f_0(x')   s.t.   x' \in A \cap U.

In other words, nothing looks better than x in the immediate vicinity of x.

[Figure: a function with a locally optimal point that is not globally optimal.]

Local minimizers of convex problems

If the optimization problem is convex, and x \in A is a local minimizer, then it is also a global minimizer. (Sketch: if some feasible y had f_0(y) < f_0(x), then by convexity of f_0 and of A, every point on the segment from x to y, including points arbitrarily close to x, would be feasible and have value strictly below f_0(x), contradicting local optimality.)

[Figure: for a convex problem, the local minimizer is the global minimizer.]

Key takeaways

1. Formulation of learning methods as optimization problems.
2. Convex sets, convex functions, and ways to check whether a function is convex.
3. The standard form of convex optimization problems, and the concepts of local and global minimizers.
