Convex optimization COMS 4771
1. Recap: learning via optimization
Soft-margin SVMs

The soft-margin SVM optimization problem is defined by training data:
$$\min_{w \in \mathbb{R}^d} \; \frac{\lambda}{2}\|w\|_2^2 + \frac{1}{n}\sum_{i=1}^n \big[1 - y_i x_i^\top w\big]_+ .$$

Compare with empirical risk minimization (ERM) for the zero-one loss:
$$\min_{w \in \mathbb{R}^d} \; \frac{1}{n}\sum_{i=1}^n \mathbf{1}\{y_i x_i^\top w \le 0\} .$$

In both cases, the $i$-th term in the summation is a function of $y_i x_i^\top w$.
Zero-one loss vs. hinge loss

[Figure: the zero-one loss $\mathbf{1}\{y x^\top w \le 0\}$ and the hinge loss $[1 - y x^\top w]_+$ plotted as functions of $y x^\top w$.]

Zero-one loss: counts 1 if $f_w(x) \ne y$. Hinge loss: an upper bound on the zero-one loss,
$$\mathbf{1}\{y x^\top w \le 0\} \le \big[1 - y x^\top w\big]_+ .$$

The soft-margin SVM minimizes an upper bound on the training error rate, plus a term that encourages large margins.
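A quick numeric sketch of the upper-bound claim, with a few illustrative margin values $z = y x^\top w$:

```python
# Spot-check that the hinge loss [1 - z]_+ upper-bounds the
# zero-one loss 1{z <= 0} at every margin value z = y * x^T w.
def zero_one(z):
    return 1.0 if z <= 0 else 0.0

def hinge(z):
    return max(0.0, 1.0 - z)

margins = [-2.0, -0.5, 0.0, 0.3, 1.0, 2.5]
for z in margins:
    assert hinge(z) >= zero_one(z)
print([(z, zero_one(z), hinge(z)) for z in margins])
```

Note that the bound is loose for very negative margins (hinge grows linearly while zero-one stays at 1) and tight only near the decision boundary.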
Learning via optimization

- Zero-one loss ERM: $\min_{w \in \mathbb{R}^d} \frac{1}{n}\sum_{i=1}^n \mathbf{1}\{y_i x_i^\top w \le 0\}$
- Squared loss ERM: $\min_{w \in \mathbb{R}^d} \frac{1}{n}\sum_{i=1}^n (x_i^\top w - y_i)^2$
- Logistic regression MLE: $\min_{w \in \mathbb{R}^d} \frac{1}{n}\sum_{i=1}^n \ln\big(1 + \exp(-y_i x_i^\top w)\big)$
- Soft-margin SVM: $\min_{w \in \mathbb{R}^d} \frac{\lambda}{2}\|w\|_2^2 + \frac{1}{n}\sum_{i=1}^n [1 - y_i x_i^\top w]_+$

Generic learning objective:
$$\min_{w \in \mathbb{R}^d} \; \underbrace{r(w)}_{\text{regularization}} + \underbrace{\frac{1}{n}\sum_{i=1}^n \varphi_i(w)}_{\text{data fitting}} .$$

Regularization: encodes a learning bias (e.g., prefer large margins).
Data fitting: measures how poor the fit to the training data is.
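The regularization/data-fitting decomposition can be made concrete by evaluating the soft-margin SVM objective on a tiny dataset. The points, labels, weight vector, and $\lambda$ below are all made-up illustrative values:

```python
# Sketch: the soft-margin SVM objective split into its two pieces,
# r(w) = (lambda/2)||w||^2 plus the average hinge loss.
def svm_objective(w, xs, ys, lam):
    reg = 0.5 * lam * sum(wj * wj for wj in w)           # regularization term
    fits = [max(0.0, 1.0 - y * sum(wj * xj for wj, xj in zip(w, x)))
            for x, y in zip(xs, ys)]                     # per-example hinge losses
    return reg + sum(fits) / len(fits)                   # data-fitting term is their average

xs = [[1.0, 2.0], [-1.0, 1.0], [0.5, -1.5]]  # hypothetical inputs
ys = [1, -1, -1]                             # hypothetical labels
w = [0.5, 0.5]
print(svm_objective(w, xs, ys, lam=0.1))
```

Swapping the hinge losses for squared losses or logistic losses (and `reg` for another $r(w)$) recovers the other objectives in the table above.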
Regularization

Other kinds of regularization encode other inductive biases, e.g.:

- $r(w) = \lambda \|w\|_1$: encourages $w$ to be small and sparse.
- $r(w) = \lambda \sum_{i=1}^{d-1} |w_{i+1} - w_i|$: encourages piecewise-constant weights.
- $r(w) = \lambda \|w - w_{\text{old}}\|_2^2$: encourages closeness to an old solution $w_{\text{old}}$.

Regularization can be combined with other data-fitting objectives, e.g.:

- Ridge regression: $\min_{w \in \mathbb{R}^d} \frac{\lambda}{2}\|w\|_2^2 + \frac{1}{2n}\sum_{i=1}^n (x_i^\top w - y_i)^2$
- Lasso: $\min_{w \in \mathbb{R}^d} \lambda \|w\|_1 + \frac{1}{2n}\sum_{i=1}^n (x_i^\top w - y_i)^2$
- Sparse SVM: $\min_{w \in \mathbb{R}^d} \lambda \|w\|_1 + \frac{1}{n}\sum_{i=1}^n [1 - y_i x_i^\top w]_+$
- Sparse logistic regression: $\min_{w \in \mathbb{R}^d} \lambda \|w\|_1 + \frac{1}{n}\sum_{i=1}^n \ln\big(1 + \exp(-y_i x_i^\top w)\big)$
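The three regularizers above are simple to compute directly; here is a minimal sketch, evaluated on an illustrative weight vector:

```python
# The three regularizers listed above, written out explicitly.
def l1(w, lam):
    # lambda * ||w||_1: penalizes large entries; promotes sparsity
    return lam * sum(abs(wj) for wj in w)

def total_variation(w, lam):
    # lambda * sum |w_{i+1} - w_i|: penalizes jumps between neighbors
    return lam * sum(abs(w[i + 1] - w[i]) for i in range(len(w) - 1))

def proximity(w, w_old, lam):
    # lambda * ||w - w_old||_2^2: penalizes distance from an old solution
    return lam * sum((a - b) ** 2 for a, b in zip(w, w_old))

w = [1.0, 1.0, -2.0, 0.0]
print(l1(w, 1.0), total_variation(w, 1.0), proximity(w, [1.0, 1.0, 1.0, 1.0], 1.0))
```

Note the vector `[1.0, 1.0, -2.0, 0.0]` has a small total-variation penalty on its flat first segment but pays for the jump to $-2$, matching the "piecewise constant" intuition.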
2. Introduction to convexity
Convex sets

A set $A$ is convex if, for every pair of points $x, x'$ in $A$, the line segment between $x$ and $x'$ is also contained in $A$.

[Figure: four example sets, each labeled convex or not convex.]

Examples:
- All of $\mathbb{R}^d$.
- The empty set.
- Half-spaces: $\{x \in \mathbb{R}^d : b^\top x \le c\}$.
- Intersections of convex sets.
- Convex hulls: $\mathrm{conv}(S) := \big\{\sum_{i=1}^k \alpha_i x_i : k \in \mathbb{N},\; x_i \in S,\; \alpha_i \ge 0,\; \sum_{i=1}^k \alpha_i = 1\big\}$.
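The half-space example admits a quick numeric sanity check: sample pairs of feasible points and verify that every convex combination stays feasible. The vector $b$ and threshold $c$ below are arbitrary illustrative choices:

```python
# Numeric sketch: the half-space {x : b^T x <= c} is convex.
import random

b, c = [1.0, -2.0], 3.0

def in_halfspace(x):
    return sum(bi * xi for bi, xi in zip(b, x)) <= c + 1e-12

random.seed(0)
for _ in range(100):
    # rejection-sample two points inside the half-space
    pts = []
    while len(pts) < 2:
        p = [random.uniform(-5, 5), random.uniform(-5, 5)]
        if in_halfspace(p):
            pts.append(p)
    a = random.random()
    mid = [(1 - a) * u + a * v for u, v in zip(*pts)]  # convex combination
    assert in_halfspace(mid)
print("all sampled convex combinations stayed in the half-space")
```

Of course the real argument is one line: $b^\top((1-\alpha)x + \alpha x') = (1-\alpha) b^\top x + \alpha\, b^\top x' \le (1-\alpha)c + \alpha c = c$.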
Convex functions

A function $f : \mathbb{R}^d \to \mathbb{R}$ is convex if for any $x, x' \in \mathbb{R}^d$ and $\alpha \in [0, 1]$,
$$f\big((1-\alpha)x + \alpha x'\big) \le (1-\alpha) f(x) + \alpha f(x') .$$

[Figure: a non-convex function and a convex function, each with a chord drawn between $x$ and $x'$.]

Examples:
- $f(x) = c|x|$ for any $c > 0$ (on $\mathbb{R}$).
- $f(x) = |x|^c$ for any $c \ge 1$ (on $\mathbb{R}$).
- $f(x) = b^\top x$ for any $b \in \mathbb{R}^d$.
- $f(x) = \|x\|$ for any norm $\|\cdot\|$.
- $f(x) = x^\top A x$ for symmetric positive semidefinite $A$.
- $af + g$ for convex functions $f$ and $g$, and $a \ge 0$.
- $\max\{f, g\}$ for convex functions $f$ and $g$.
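The defining inequality can be spot-checked numerically on random points; here is a one-dimensional sketch applied to a few of the examples above (sample ranges and trial count are arbitrary):

```python
# Spot-check the definition f((1-a)x + a x') <= (1-a) f(x) + a f(x').
import random

def check_convex_1d(f, trials=200):
    random.seed(1)
    for _ in range(trials):
        x, xp = random.uniform(-3, 3), random.uniform(-3, 3)
        a = random.random()
        lhs = f((1 - a) * x + a * xp)
        rhs = (1 - a) * f(x) + a * f(xp)
        if lhs > rhs + 1e-9:      # chord dips below the function: not convex
            return False
    return True

assert check_convex_1d(abs)                       # f(x) = |x|
assert check_convex_1d(lambda x: x ** 4)          # f(x) = |x|^c with c = 4
assert check_convex_1d(lambda x: max(x, x * x))   # max of two convex functions
print("definition held at all sampled points")
```

Passing such a check is evidence, not proof; a proof needs an argument like the one for norms on the next slide.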
Verifying convexity

Is $f(x) = \|x\|$ convex? Pick any $\alpha \in [0, 1]$ and any $x, x' \in \mathbb{R}^d$. Then
$$f\big((1-\alpha)x + \alpha x'\big) = \|(1-\alpha)x + \alpha x'\|$$
$$\le \|(1-\alpha)x\| + \|\alpha x'\| \quad \text{(triangle inequality)}$$
$$= (1-\alpha)\|x\| + \alpha\|x'\| \quad \text{(homogeneity)}$$
$$= (1-\alpha)f(x) + \alpha f(x') .$$

Yes, $f$ is convex.
Convexity of differentiable functions

Differentiable functions. If $f : \mathbb{R}^d \to \mathbb{R}$ is differentiable, then $f$ is convex if and only if
$$f(x) \ge f(x_0) + \nabla f(x_0)^\top (x - x_0) \quad \text{for all } x, x_0 \in \mathbb{R}^d ,$$
i.e., $f$ lies above each of its affine approximations $a(x) = f(x_0) + \nabla f(x_0)^\top (x - x_0)$.

[Figure: a convex function $f(x)$ lying above its tangent line $a(x)$ at $x_0$.]

Twice-differentiable functions. If $f : \mathbb{R}^d \to \mathbb{R}$ is twice-differentiable, then $f$ is convex if and only if $\nabla^2 f(x) \succeq 0$ for all $x \in \mathbb{R}^d$ (i.e., the Hessian, the matrix of second derivatives, is positive semidefinite for all $x$).
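The first-order condition is easy to visualize numerically: for a convex function, no tangent line ever rises above the graph. A minimal sketch for $f(x) = x^4$ over a few illustrative points:

```python
# Check the first-order condition for f(x) = x^4:
# f(x) >= f(x0) + f'(x0) (x - x0) for all sampled x, x0.
def f(x):
    return x ** 4

def fprime(x):
    return 4 * x ** 3

for x0 in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    for x in [-3.0, -1.0, 0.0, 0.5, 2.0]:
        tangent = f(x0) + fprime(x0) * (x - x0)
        assert f(x) >= tangent - 1e-9
print("f(x) = x^4 lies above every sampled tangent line")
```

The second-order condition says the same thing locally: $f''(x) = 12x^2 \ge 0$ means the tangent lines never cut through the graph.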
Verifying convexity of differentiable functions

Is $f(x) = x^4$ convex? Use the second-order condition for convexity:
$$\frac{\partial f(x)}{\partial x} = 4x^3, \qquad \frac{\partial^2 f(x)}{\partial x^2} = 12x^2 \ge 0 .$$
Yes, $f$ is convex.

Is $f(x) = e^{\langle a, x \rangle}$ convex? Use the first-order condition for convexity:
$$\nabla f(x) = e^{\langle a, x \rangle} \, \nabla \langle a, x \rangle = e^{\langle a, x \rangle} a \quad \text{(chain rule)} .$$
Difference between $f$ and its affine approximation:
$$f(x) - \big(f(x_0) + \langle \nabla f(x_0), x - x_0 \rangle\big) = e^{\langle a, x \rangle} - \big(e^{\langle a, x_0 \rangle} + e^{\langle a, x_0 \rangle}\langle a, x - x_0 \rangle\big)$$
$$= e^{\langle a, x_0 \rangle}\big(e^{\langle a, x - x_0 \rangle} - (1 + \langle a, x - x_0 \rangle)\big)$$
$$\ge 0 \quad \text{(because } 1 + z \le e^z \text{ for all } z \in \mathbb{R}\text{)} .$$
Yes, $f$ is convex.
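The whole argument rests on the scalar inequality $1 + z \le e^z$; here is a quick numeric sketch of it, together with the resulting tangent bound for the one-dimensional case $f(x) = e^{ax}$ (the values of $a$ and $x_0$ are arbitrary):

```python
# Spot-check 1 + z <= e^z, and the affine lower bound it implies
# for f(x) = e^{a x} with gradient f'(x) = a e^{a x}.
import math

for z in [-5.0, -1.0, -1e-3, 0.0, 1e-3, 1.0, 5.0]:
    assert 1.0 + z <= math.exp(z) + 1e-12

a, x0 = 2.0, 0.3

def f(x):
    return math.exp(a * x)

for x in [-2.0, 0.0, 0.3, 1.0]:
    affine = f(x0) + f(x0) * a * (x - x0)   # f(x0) + f'(x0)(x - x0)
    assert f(x) >= affine - 1e-9
print("1 + z <= e^z and the affine lower bound both hold")
```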
3. Convex optimization problems
Optimization problems

A typical optimization problem (in standard form) is written as
$$\min_{x \in \mathbb{R}^d} \; f_0(x) \quad \text{s.t. } f_i(x) \le 0 \text{ for all } i = 1, \dots, n .$$

- $f_0 : \mathbb{R}^d \to \mathbb{R}$ is the objective function;
- $f_1, \dots, f_n : \mathbb{R}^d \to \mathbb{R}$ are the constraint functions;
- the inequalities $f_i(x) \le 0$ are the constraints;
- $A := \{x \in \mathbb{R}^d : f_i(x) \le 0 \text{ for all } i = 1, 2, \dots, n\}$ is the feasible region.

Goal: find $x \in A$ so that $f_0(x)$ is as small as possible.

The (optimal) value of the optimization problem is the smallest value $f_0(x)$ achieved by a feasible point $x \in A$. A point $x \in A$ achieving the optimal value is a (global) minimizer.
Convex optimization problems

Standard form of a convex optimization problem:
$$\min_{x \in \mathbb{R}^d} \; f_0(x) \quad \text{s.t. } f_i(x) \le 0 \text{ for all } i = 1, \dots, n ,$$
where $f_0, f_1, \dots, f_n : \mathbb{R}^d \to \mathbb{R}$ are convex functions.

Fact: the feasible set $A := \{x \in \mathbb{R}^d : f_i(x) \le 0 \text{ for all } i = 1, 2, \dots, n\}$ is a convex set.
Example: Soft-margin SVM

$$\min_{w \in \mathbb{R}^d,\; \xi_1, \dots, \xi_n \in \mathbb{R}} \; \frac{\lambda}{2}\|w\|_2^2 + \frac{1}{n}\sum_{i=1}^n \xi_i \quad \text{s.t. } y_i x_i^\top w \ge 1 - \xi_i \text{ and } \xi_i \ge 0 \text{ for all } i = 1, \dots, n .$$

Bringing it to standard form:
$$\min_{w \in \mathbb{R}^d,\; \xi_1, \dots, \xi_n \in \mathbb{R}} \; \frac{\lambda}{2}\|w\|_2^2 + \frac{1}{n}\sum_{i=1}^n \xi_i \quad \text{(sum of two convex functions, also convex)}$$
$$\text{s.t. } 1 - \xi_i - y_i x_i^\top w \le 0 \text{ for all } i = 1, \dots, n \quad \text{(linear constraints)}$$
$$\text{and } -\xi_i \le 0 \text{ for all } i = 1, \dots, n \quad \text{(linear constraints)} .$$

Which of zero-one loss ERM, squared loss ERM, logistic regression MLE, ridge regression, Lasso, and sparse SVM are convex?
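Because the problem is convex, even a very simple method works: eliminating the slacks $\xi_i$ gives the unconstrained hinge-loss objective, which can be minimized by subgradient descent. A minimal sketch on a made-up, linearly separable toy dataset (step sizes and iteration count are illustrative choices, not tuned):

```python
# Subgradient descent on the unconstrained soft-margin SVM objective
# (lambda/2)||w||^2 + (1/n) sum [1 - y_i x_i^T w]_+ on toy data.
xs = [[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]]  # toy inputs
ys = [1, 1, -1, -1]                                        # toy labels
lam = 0.01

def objective(w):
    reg = 0.5 * lam * sum(wj * wj for wj in w)
    hinge = [max(0.0, 1.0 - y * sum(wj * xj for wj, xj in zip(w, x)))
             for x, y in zip(xs, ys)]
    return reg + sum(hinge) / len(hinge)

w = [0.0, 0.0]
for t in range(1, 2001):
    # subgradient: lam * w, minus y*x/n for each margin-violating example
    g = [lam * wj for wj in w]
    for x, y in zip(xs, ys):
        if y * sum(wj * xj for wj, xj in zip(w, x)) < 1.0:
            for j in range(len(w)):
                g[j] -= y * x[j] / len(xs)
    w = [wj - (0.5 / t) * gj for wj, gj in zip(w, g)]  # decreasing step size

print(w, objective(w))
# every training point should end up on the correct side
assert all(y * sum(wj * xj for wj, xj in zip(w, x)) > 0 for x, y in zip(xs, ys))
```

(Answer to the question above: all of them are convex except zero-one loss ERM, whose indicator loss is neither convex nor even continuous.)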
Local minimizers

Consider an optimization problem (not necessarily convex):
$$\min_{x \in \mathbb{R}^d} \; f_0(x) \quad \text{s.t. } x \in A .$$

We say $x \in A$ is a local minimizer if there is an open ball $U := \{x' \in \mathbb{R}^d : \|x' - x\|_2 < r\}$ of positive radius $r > 0$ such that $x$ is a global minimizer for
$$\min_{x' \in \mathbb{R}^d} \; f_0(x') \quad \text{s.t. } x' \in A \cap U .$$

Nothing looks better than $x$ in the immediate vicinity of $x$.

[Figure: a non-convex function with a locally optimal point marked.]
Local minimizers of convex problems

If the optimization problem is convex and $x \in A$ is a local minimizer, then it is also a global minimizer.

[Figure: for a convex function, the local minimizer coincides with the global one.]
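This is why simple local methods suffice for convex problems. A minimal sketch: gradient descent on the convex function $f(w) = (w - 3)^2 + 1$ reaches the same global minimizer $w^* = 3$ from every starting point (step size and iteration count are illustrative):

```python
# Gradient descent on f(w) = (w - 3)^2 + 1, whose gradient is 2(w - 3),
# started from several different points: all runs find the same minimizer.
def grad(w):
    return 2.0 * (w - 3.0)

results = []
for w0 in [-10.0, 0.0, 50.0]:
    w = w0
    for _ in range(500):
        w -= 0.1 * grad(w)
    results.append(w)

print(results)
assert all(abs(w - 3.0) < 1e-6 for w in results)
```

On a non-convex objective, the same procedure started from different points can land in different local minimizers with different values.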
Key takeaways

1. Formulation of learning methods as optimization problems.
2. Convex sets, convex functions, and ways to check whether a function is convex.
3. Standard form of convex optimization problems; the concepts of local and global minimizers.
More informationIntroduction to Machine Learning Midterm Exam Solutions
10-701 Introduction to Machine Learning Midterm Exam Solutions Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes,
More informationLasso, Ridge, and Elastic Net
Lasso, Ridge, and Elastic Net David Rosenberg New York University February 7, 2017 David Rosenberg (New York University) DS-GA 1003 February 7, 2017 1 / 29 Linearly Dependent Features Linearly Dependent
More informationCS6375: Machine Learning Gautam Kunapuli. Support Vector Machines
Gautam Kunapuli Example: Text Categorization Example: Develop a model to classify news stories into various categories based on their content. sports politics Use the bag-of-words representation for this
More informationConstrained Optimization and Lagrangian Duality
CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may
More informationReferences. Lecture 7: Support Vector Machines. Optimum Margin Perceptron. Perceptron Learning Rule
References Lecture 7: Support Vector Machines Isabelle Guyon guyoni@inf.ethz.ch An training algorithm for optimal margin classifiers Boser-Guyon-Vapnik, COLT, 992 http://www.clopinet.com/isabelle/p apers/colt92.ps.z
More informationLinear Regression. Aarti Singh. Machine Learning / Sept 27, 2010
Linear Regression Aarti Singh Machine Learning 10-701/15-781 Sept 27, 2010 Discrete to Continuous Labels Classification Sports Science News Anemic cell Healthy cell Regression X = Document Y = Topic X
More informationDiscriminative Learning and Big Data
AIMS-CDT Michaelmas 2016 Discriminative Learning and Big Data Lecture 2: Other loss functions and ANN Andrew Zisserman Visual Geometry Group University of Oxford http://www.robots.ox.ac.uk/~vgg Lecture
More informationMachine Learning and Computational Statistics, Spring 2017 Homework 2: Lasso Regression
Machine Learning and Computational Statistics, Spring 2017 Homework 2: Lasso Regression Due: Monday, February 13, 2017, at 10pm (Submit via Gradescope) Instructions: Your answers to the questions below,
More informationLecture 8. Strong Duality Results. September 22, 2008
Strong Duality Results September 22, 2008 Outline Lecture 8 Slater Condition and its Variations Convex Objective with Linear Inequality Constraints Quadratic Objective over Quadratic Constraints Representation
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2014 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationComplexity Bounds of Radial Basis Functions and Multi-Objective Learning
Complexity Bounds of Radial Basis Functions and Multi-Objective Learning Illya Kokshenev and Antônio P. Braga Universidade Federal de Minas Gerais - Depto. Engenharia Eletrônica Av. Antônio Carlos, 6.67
More informationCOS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 10
COS53: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 0 MELISSA CARROLL, LINJIE LUO. BIAS-VARIANCE TRADE-OFF (CONTINUED FROM LAST LECTURE) If V = (X n, Y n )} are observed data, the linear regression problem
More informationLinear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers)
Support vector machines In a nutshell Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers) Solution only depends on a small subset of training
More informationChapter 9. Support Vector Machine. Yongdai Kim Seoul National University
Chapter 9. Support Vector Machine Yongdai Kim Seoul National University 1. Introduction Support Vector Machine (SVM) is a classification method developed by Vapnik (1996). It is thought that SVM improved
More informationAccelerate Subgradient Methods
Accelerate Subgradient Methods Tianbao Yang Department of Computer Science The University of Iowa Contributors: students Yi Xu, Yan Yan and colleague Qihang Lin Yang (CS@Uiowa) Accelerate Subgradient Methods
More informationMachine Learning for NLP
Machine Learning for NLP Uppsala University Department of Linguistics and Philology Slides borrowed from Ryan McDonald, Google Research Machine Learning for NLP 1(50) Introduction Linear Classifiers Classifiers
More informationIntroduction to Machine Learning Midterm, Tues April 8
Introduction to Machine Learning 10-701 Midterm, Tues April 8 [1 point] Name: Andrew ID: Instructions: You are allowed a (two-sided) sheet of notes. Exam ends at 2:45pm Take a deep breath and don t spend
More informationGeneralization, Overfitting, and Model Selection
Generalization, Overfitting, and Model Selection Sample Complexity Results for Supervised Classification Maria-Florina (Nina) Balcan 10/03/2016 Two Core Aspects of Machine Learning Algorithm Design. How
More informationShort Course Robust Optimization and Machine Learning. 3. Optimization in Supervised Learning
Short Course Robust Optimization and 3. Optimization in Supervised EECS and IEOR Departments UC Berkeley Spring seminar TRANSP-OR, Zinal, Jan. 16-19, 2012 Outline Overview of Supervised models and variants
More informationMath 273a: Optimization Subgradients of convex functions
Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 20 Subgradients Assumptions
More informationDay 4: Classification, support vector machines
Day 4: Classification, support vector machines Introduction to Machine Learning Summer School June 18, 2018 - June 29, 2018, Chicago Instructor: Suriya Gunasekar, TTI Chicago 21 June 2018 Topics so far
More informationSCMA292 Mathematical Modeling : Machine Learning. Krikamol Muandet. Department of Mathematics Faculty of Science, Mahidol University.
SCMA292 Mathematical Modeling : Machine Learning Krikamol Muandet Department of Mathematics Faculty of Science, Mahidol University February 9, 2016 Outline Quick Recap of Least Square Ridge Regression
More informationLinear, Binary SVM Classifiers
Linear, Binary SVM Classifiers COMPSCI 37D Machine Learning COMPSCI 37D Machine Learning Linear, Binary SVM Classifiers / 6 Outline What Linear, Binary SVM Classifiers Do 2 Margin I 3 Loss and Regularized
More informationOslo Class 2 Tikhonov regularization and kernels
RegML2017@SIMULA Oslo Class 2 Tikhonov regularization and kernels Lorenzo Rosasco UNIGE-MIT-IIT May 3, 2017 Learning problem Problem For H {f f : X Y }, solve min E(f), f H dρ(x, y)l(f(x), y) given S n
More informationCOMS 4771 Introduction to Machine Learning. James McInerney Adapted from slides by Nakul Verma
COMS 4771 Introduction to Machine Learning James McInerney Adapted from slides by Nakul Verma Announcements HW1: Please submit as a group Watch out for zero variance features (Q5) HW2 will be released
More informationEE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015
EE613 Machine Learning for Engineers Kernel methods Support Vector Machines jean-marc odobez 2015 overview Kernel methods introductions and main elements defining kernels Kernelization of k-nn, K-Means,
More informationSubgradient Method. Ryan Tibshirani Convex Optimization
Subgradient Method Ryan Tibshirani Convex Optimization 10-725 Consider the problem Last last time: gradient descent min x f(x) for f convex and differentiable, dom(f) = R n. Gradient descent: choose initial
More informationKernel Machines. Pradeep Ravikumar Co-instructor: Manuela Veloso. Machine Learning
Kernel Machines Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 SVM linearly separable case n training points (x 1,, x n ) d features x j is a d-dimensional vector Primal problem:
More information