Convex Optimization & Machine Learning. Introduction to Optimization


mcuturi@i.kyoto-u.ac.jp

Why do we need optimization in machine learning?

We want to find the best possible decision with respect to a problem.
In a supervised setting, for instance, we want to learn a map $\mathcal{X} \to \mathcal{Y}$.
We consider a set of candidates $\mathcal{F}$ for such a decision: each $f \in \mathcal{F}$ is a map $\mathcal{X} \to \mathcal{Y}$.
To quantify how well a candidate function in $\mathcal{F}$ fits the database, define a data-dependent criterion $C_{\text{data}}$.
Typically, given a function $f$, $C_{\text{data}}(f)$ is large if $f$ is not accurate on the data.
Given both $\mathcal{F}$ and $C_{\text{data}}$, a method to find an optimal candidate is to solve $\min_{f \in \mathcal{F}} C_{\text{data}}(f)$.
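The template $\min_{f \in \mathcal{F}} C_{\text{data}}(f)$ can be sketched on a toy case. The data points, the three candidate functions, and the choice of mean absolute error as the criterion are all hypothetical, picked only to make the template concrete.

```python
# Hypothetical toy data: (x, y) pairs made up for illustration only.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]

# A small, finite candidate set F of maps from X to Y (an assumption).
candidates = {
    "identity": lambda x: x,
    "double": lambda x: 2.0 * x,
    "square": lambda x: x * x,
}


def c_data(f, data):
    """Mean absolute error of f on the data: large when f is inaccurate."""
    return sum(abs(f(x) - y) for x, y in data) / len(data)


def best_candidate(candidates, data):
    """Return the name of the candidate minimizing c_data over the set F."""
    return min(candidates, key=lambda name: c_data(candidates[name], data))


best = best_candidate(candidates, data)
```

With a finite candidate set the minimization is plain enumeration; the rest of the course concerns what to do when $\mathcal{F}$ is too large for that.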

Quiz: Regression

[Figure: scatter plot of Rent (x 10,000 JPY) vs. Surface for apartments, with linear (y = 0.13*x + 2.4), quadratic, cubic, 4th-degree and 5th-degree polynomial fits.]

Does least-squares regression fall into this approach? 1. Yes 2. No

Quiz: k-nearest neighbors

Does k-nearest neighbors fall into this approach? 1. Yes 2. No

Quiz: SVM

Does the SVM fall into this approach? 1. Yes 2. No

What is optimization?

A general formulation of an optimization problem is to define unknown variables $(x_1, x_2, \dots, x_n) \in \mathcal{X}_1 \times \mathcal{X}_2 \times \dots \times \mathcal{X}_n$ and solve

$$\text{minimize (or maximize)} \quad f(x_1, x_2, \dots, x_n) \qquad \text{subject to} \quad g_i(x_1, x_2, \dots, x_n) \ \{\le, =, \ge\} \ b_i, \quad i = 1, 2, \dots, m,$$

where
the $b_i \in \mathbb{R}$;
the functions $f$ (objective) and $g_1, g_2, \dots, g_m$ (constraints) are functions $\mathcal{X}_1 \times \mathcal{X}_2 \times \dots \times \mathcal{X}_n \to \mathbb{R}$;
the sets $\mathcal{X}_i$ need not be the same, as $\mathcal{X}_i$ might be $\mathbb{R}$ (scalar numbers), $\mathbb{Z}$ (integers), $\mathbf{S}^n_+$ (positive semidefinite matrices), strings of letters, etc.
When the $\mathcal{X}_i$ are different, the adjective "mixed" usually comes in.

Optimization & Mathematical Programming

Optimization is a field of applied mathematics in its own right. It is also called Mathematical Programming.
Mathematical Programming is not about programming code for mathematics!
George Dantzig, who proposed the first optimization algorithm, explains:
"The military refer to their various plans or proposed schedules of training, logistical supply and deployment of combat units as a program. When I first analyzed the Air Force planning problem and saw that it could be formulated as a system of linear inequalities, I called my paper 'Programming in a Linear Structure'. Note that the term program was used for linear programs long before it was used as the set of instructions used by a computer. In the early days, these instructions were called codes."

Mathematical Programming

In the summer of 1948, Koopmans and I visited the Rand Corporation. One day we took a stroll along the Santa Monica beach. Koopmans said: "Why not shorten 'Programming in a Linear Structure' to 'Linear Programming'?" I replied: "That's it! From now on that will be its name." Later that day I gave a talk at Rand, entitled "Linear Programming"; years later Tucker shortened it to "Linear Program". The term "Mathematical Programming" is due to Robert Dorfman of Harvard, who felt as early as 1949 that the term "Linear Programming" was too restrictive.

Mathematical Programming

Today, mathematical programming = optimization.
It is a relatively new discipline:
"What seems to characterize the pre-1947 era was lack of any interest in trying to optimize. T. Motzkin in his scholarly thesis written in 1936 cites only 42 papers on linear inequality systems, none of which mentioned an objective function."

Before we move on to some reminders

Keep in mind that optimization is hard, and very hard in general.
In 60 years, we have gone from nothing to quite a few successes.
But always keep in mind that most problems are intractable.
For some particular problems there is hope: CONVEX problems.
Evaluation = programming assignments.

Reminders

Source: Stephen Boyd's slides

Reminders: Convex set

line segment between $x_1$ and $x_2$: all points $\{x = \lambda x_1 + (1-\lambda) x_2,\ 0 \le \lambda \le 1\}$
convex set: contains the line segment between any two points in the set
$C$ is convex $\iff$ $\forall x_1, x_2 \in C,\ 0 \le \lambda \le 1:\ \lambda x_1 + (1-\lambda) x_2 \in C$
[Figure: examples (one convex, two nonconvex sets)]
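The definition can be probed numerically by sampling pairs of points and a value of $\lambda$, then testing whether the combination stays in the set. This is only a sketch: the two example sets (a unit disk and a ring) and the helper names are assumptions, and failing to find a violation is evidence of convexity, not a proof.

```python
import random


def in_disk(p):
    """Unit disk in the plane (a convex set), with a small numerical slack."""
    return p[0] ** 2 + p[1] ** 2 <= 1.0 + 1e-12


def in_annulus(p):
    """Ring 0.5 <= |p| <= 1 (a nonconvex set)."""
    return 0.25 <= p[0] ** 2 + p[1] ** 2 <= 1.0


def find_violation(member, trials=10000, seed=0):
    """Search for x1, x2 in the set and lam in [0, 1] such that
    lam*x1 + (1-lam)*x2 leaves the set. Returns a witness or None."""
    rng = random.Random(seed)

    def sample():
        # Rejection sampling from the square [-1, 1]^2.
        while True:
            p = (rng.uniform(-1, 1), rng.uniform(-1, 1))
            if member(p):
                return p

    for _ in range(trials):
        x1, x2, lam = sample(), sample(), rng.random()
        x = tuple(lam * a + (1 - lam) * b for a, b in zip(x1, x2))
        if not member(x):
            return x1, x2, lam
    return None
```

For the ring, a witness is easy to exhibit by hand: (0.75, 0) and (-0.75, 0) both belong to it, but their midpoint (0, 0) does not.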

Convex combination and convex hull

convex combination of $x_1, \dots, x_k$: any point $x$ of the form
$$x = \lambda_1 x_1 + \lambda_2 x_2 + \dots + \lambda_k x_k$$
with $\lambda_1 + \dots + \lambda_k = 1$, $\lambda_i \ge 0$
convex hull of $S$: set of all convex combinations of points in $S$

Convex cone

conic (nonnegative) combination of $x_1$ and $x_2$: any point of the form
$$x = \lambda_1 x_1 + \lambda_2 x_2$$
with $\lambda_1 \ge 0$, $\lambda_2 \ge 0$
convex cone: set that contains all conic combinations of points in the set

Hyperplanes and halfspaces

hyperplane: set of the form $\{x \mid a^T x = b\}$ ($a \ne 0$)
halfspace: set of the form $\{x \mid a^T x \le b\}$ ($a \ne 0$)
$a$ is the normal vector
hyperplanes are affine and convex; halfspaces are convex

Norm balls and norm cones

norm: a function $\|\cdot\|$ that satisfies
$\|x\| \ge 0$; $\|x\| = 0$ if and only if $x = 0$
$\|tx\| = |t| \, \|x\|$ for $t \in \mathbb{R}$
$\|x + y\| \le \|x\| + \|y\|$
notation: $\|\cdot\|$ is a general (unspecified) norm; $\|\cdot\|_{\text{symb}}$ is a particular norm
norm ball with center $x_c$ and radius $r$: $\{x \mid \|x - x_c\| \le r\}$
norm cone: $\{(x, t) \mid \|x\| \le t\}$
the Euclidean norm cone is called the second-order cone
norm balls and norm cones are convex

Usual norms for vectors in $\mathbb{R}^d$

$\ell_2$ norm: $\|x\|_2 = \sqrt{x^T x} = \sqrt{\sum_{i=1}^{d} x_i^2}$
$\ell_1$ norm: $\|x\|_1 = \sum_{i=1}^{d} |x_i|$
$\ell_p$ norm: $\|x\|_p = \left( \sum_{i=1}^{d} |x_i|^p \right)^{1/p}$
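The three formulas share one implementation, since the $\ell_1$ and $\ell_2$ norms are the cases $p = 1$ and $p = 2$ of the $\ell_p$ formula. A minimal sketch follows; the function name and the sample vectors are illustrative assumptions.

```python
def lp_norm(x, p):
    """l_p norm of a vector x given as a list of floats: (sum_i |x_i|^p)^(1/p)."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)


x = [3.0, -4.0]
y = [1.0, 2.0]

l1 = lp_norm(x, 1)  # |3| + |-4|
l2 = lp_norm(x, 2)  # sqrt(3^2 + 4^2)

# Triangle inequality ||x + y|| <= ||x|| + ||y||, checked for p = 1, 2, 3.
x_plus_y = [a + b for a, b in zip(x, y)]
triangle_holds = all(
    lp_norm(x_plus_y, p) <= lp_norm(x, p) + lp_norm(y, p) + 1e-12
    for p in (1, 2, 3)
)
```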

Quiz: $\ell_p$ norms

The unit ball of the $\ell_p$ norm is $\{x \mid \|x\|_p \le 1\}$.
The unit ball of the $\ell_p$ norm is convex. 1. True 2. False

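Polyhedra

solution set of finitely many linear inequalities and equalities
$$Ax \preceq b, \qquad Cx = d$$
($A \in \mathbb{R}^{m \times n}$, $C \in \mathbb{R}^{p \times n}$, $\preceq$ is componentwise inequality)
[Figure: polyhedron $P$ bounded by halfspaces with normal vectors $a_1, \dots, a_5$]
a polyhedron is the intersection of a finite number of halfspaces and hyperplanes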

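Positive semidefinite cone

notation:
$\mathbf{S}^n$ is the set of symmetric $n \times n$ matrices
$\mathbf{S}^n_+ = \{X \in \mathbf{S}^n \mid X \succeq 0\}$: positive semidefinite $n \times n$ matrices
$X \in \mathbf{S}^n_+ \iff z^T X z \ge 0$ for all $z$
$\mathbf{S}^n_+$ is a convex cone
$\mathbf{S}^n_{++} = \{X \in \mathbf{S}^n \mid X \succ 0\}$: positive definite $n \times n$ matrices
example: $\begin{bmatrix} x & y \\ y & z \end{bmatrix} \in \mathbf{S}^2_+$
[Figure: boundary of $\mathbf{S}^2_+$ plotted in $(x, y, z)$ coordinates]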
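For the 2 x 2 example, membership in $\mathbf{S}^2_+$ reduces to three scalar conditions: a symmetric matrix with entries $x, y, y, z$ is positive semidefinite exactly when $x \ge 0$, $z \ge 0$ and $xz - y^2 \ge 0$. The sketch below uses that test to check the convex-cone property on two sample matrices; the triple encoding `(x, y, z)` and the helper names are assumptions made for illustration.

```python
def is_psd_2x2(x, y, z, tol=1e-12):
    """True if [[x, y], [y, z]] is positive semidefinite:
    both diagonal entries and the determinant are nonnegative."""
    return x >= -tol and z >= -tol and x * z - y * y >= -tol


def conic_combination(m1, m2, t1, t2):
    """t1*m1 + t2*m2 for symmetric 2x2 matrices stored as (x, y, z) triples.
    The weights t1, t2 are expected to be nonnegative."""
    return tuple(t1 * a + t2 * b for a, b in zip(m1, m2))


A = (2.0, 1.0, 1.0)    # [[2, 1], [1, 1]], determinant 1
B = (1.0, -1.0, 2.0)   # [[1, -1], [-1, 2]], determinant 1
C = conic_combination(A, B, 0.5, 3.0)  # should again be PSD
```

The test does not extend as written to larger matrices, where an eigenvalue or Cholesky-based check is the usual route.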

Euclidean balls and ellipsoids

(Euclidean) ball with center $x_c$ and radius $r$:
$$B(x_c, r) = \{x \mid \|x - x_c\|_2 \le r\} = \{x_c + r u \mid \|u\|_2 \le 1\}$$
ellipsoid: set of the form
$$\{x \mid (x - x_c)^T P^{-1} (x - x_c) \le 1\}$$
with $P \in \mathbf{S}^n_{++}$ (i.e., $P$ symmetric positive definite)
other representation: $\{x_c + A u \mid \|u\|_2 \le 1\}$ with $A$ square and nonsingular
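The two ellipsoid representations describe the same set when $P = A A^T$, because then $(Au)^T P^{-1} (Au) = u^T u$. The sketch below checks this identity in two dimensions; the particular matrix $A$, the center, and the helper names are assumptions chosen for the example.

```python
def mat_vec(m, v):
    """Product of a 2x2 matrix (tuple of rows) with a 2-vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])


def quad_form_inv(p, d):
    """d^T P^{-1} d for a symmetric 2x2 matrix P, using the explicit 2x2 inverse."""
    det = p[0][0] * p[1][1] - p[0][1] * p[1][0]
    inv = ((p[1][1] / det, -p[0][1] / det), (-p[1][0] / det, p[0][0] / det))
    w = mat_vec(inv, d)
    return d[0] * w[0] + d[1] * w[1]


# Hypothetical data: A is square and nonsingular, P = A A^T, x_c is the center.
A = ((2.0, 1.0), (0.0, 1.0))
P = ((5.0, 1.0), (1.0, 1.0))  # equals A A^T
x_c = (1.0, -1.0)

u = (0.6, 0.8)  # a point on the unit circle, ||u||_2 = 1
Au = mat_vec(A, u)
x = (x_c[0] + Au[0], x_c[1] + Au[1])
value = quad_form_inv(P, (x[0] - x_c[0], x[1] - x_c[1]))  # equals ||u||^2
```

Points $u$ on the unit circle map to the boundary of the ellipsoid, and points strictly inside the unit ball map to its interior.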

Optimization problems

$$\text{minimize} \quad f_0(x) \qquad \text{subject to} \quad f_i(x) \le b_i, \quad i = 1, \dots, m$$
$x = (x_1, \dots, x_n)$: optimization variables
$f_0 : \mathbb{R}^n \to \mathbb{R}$: objective function
$f_i : \mathbb{R}^n \to \mathbb{R}$, $i = 1, \dots, m$: constraint functions
an optimal solution $x^\star$ has the smallest value of $f_0$ among all vectors that satisfy the constraints

Quiz

$$\text{minimize} \quad f_0(x) \qquad \text{subject to} \quad f_i(x) \le b_i, \quad i = 1, \dots, m$$
If an optimization problem has an optimal solution $x^\star$, this solution is unique. 1. True 2. False

Examples

portfolio optimization
  variables: amounts invested in different assets
  constraints: budget, max./min. investment per asset, minimum return
  objective: overall risk or return variance
device sizing in electronic circuits
  variables: device widths and lengths
  constraints: manufacturing limits, timing requirements, maximum area
  objective: power consumption
data fitting
  variables: model parameters
  constraints: prior information, parameter limits
  objective: measure of misfit or prediction error

Solving optimization problems

general optimization problem
  very difficult to solve
  methods involve some compromise, e.g., very long computation time, or not always finding the solution
exceptions: certain problem classes can be solved efficiently and reliably
  least-squares problems
  linear programming problems
  convex optimization problems

Least-squares

$$\text{minimize} \quad \|Ax - b\|_2^2$$
solving least-squares problems
  analytical solution: $x^\star = (A^T A)^{-1} A^T b$
  reliable and efficient algorithms and software
  computation time proportional to $n^2 k$ ($A \in \mathbb{R}^{k \times n}$); less if structured
  a mature technology
using least-squares
  least-squares problems are easy to recognize
  a few standard techniques increase flexibility (e.g., including weights, adding regularization terms)
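The analytical solution $x^\star = (A^T A)^{-1} A^T b$ can be written out by hand when $A$ has two columns, since $A^T A$ is then a 2 x 2 matrix with an explicit inverse. The sketch below does exactly that; the function name and the sample data (points lying on the line $y = 2t + 1$) are assumptions for illustration. Numerical libraries generally avoid forming $A^T A$ explicitly and rely on factorizations such as QR instead.

```python
def least_squares_2col(a_rows, b):
    """Minimize ||Ax - b||_2^2 for a matrix A with two columns,
    using the normal equations (A^T A) x = A^T b."""
    # Entries of the symmetric 2x2 matrix A^T A.
    s11 = sum(r[0] * r[0] for r in a_rows)
    s12 = sum(r[0] * r[1] for r in a_rows)
    s22 = sum(r[1] * r[1] for r in a_rows)
    # Entries of the vector A^T b.
    t1 = sum(r[0] * bi for r, bi in zip(a_rows, b))
    t2 = sum(r[1] * bi for r, bi in zip(a_rows, b))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("A^T A is singular: the columns of A are linearly dependent")
    # Explicit inverse of the 2x2 matrix applied to A^T b.
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)


# Hypothetical data: fit y = slope * t + intercept to four points on y = 2t + 1.
ts = [0.0, 1.0, 2.0, 3.0]
A_rows = [(t, 1.0) for t in ts]
b = [1.0, 3.0, 5.0, 7.0]
slope, intercept = least_squares_2col(A_rows, b)
```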

Linear programming

$$\text{minimize} \quad c^T x \qquad \text{subject to} \quad a_i^T x \le b_i, \quad i = 1, \dots, m$$
solving linear programs
  no analytical formula for solution
  reliable and efficient algorithms and software
  computation time proportional to $n^2 m$ if $m \ge n$; less with structure
  a mature technology
using linear programming
  not as easy to recognize as least-squares problems
  a few standard tricks used to convert problems into linear programs (e.g., problems involving $\ell_1$- or $\ell_\infty$-norms, piecewise-linear functions)
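A linear program in two variables is small enough to solve by brute force, which makes the problem form concrete. The sketch below relies on the fact that a linear program with a nonempty, bounded feasible set attains its optimum at a vertex, so it enumerates intersections of pairs of constraint lines and keeps the best feasible one. This is not how practical solvers work (they use simplex or interior-point methods); the function name and the example problem are assumptions.

```python
from itertools import combinations


def solve_lp_2d(c, constraints, tol=1e-9):
    """Minimize c^T x subject to a_i^T x <= b_i in two variables.

    constraints is a list of ((a1, a2), b) pairs. Assumes the feasible set
    is nonempty and bounded. Returns (x, value), or None if no vertex is feasible.
    """
    best = None
    for (a, b1), (d, b2) in combinations(constraints, 2):
        det = a[0] * d[1] - a[1] * d[0]
        if abs(det) < tol:
            continue  # parallel constraint lines: no unique intersection
        # Intersection of a^T x = b1 and d^T x = b2 by Cramer's rule.
        x = ((b1 * d[1] - a[1] * b2) / det, (a[0] * b2 - d[0] * b1) / det)
        feasible = all(ai[0] * x[0] + ai[1] * x[1] <= bi + tol
                       for ai, bi in constraints)
        if feasible:
            value = c[0] * x[0] + c[1] * x[1]
            if best is None or value < best[1]:
                best = (x, value)
    return best


# Hypothetical problem: minimize -x1 - 2*x2
# subject to x1 + x2 <= 4, x1 <= 3, x2 <= 3, x1 >= 0, x2 >= 0.
constraints = [((1.0, 1.0), 4.0), ((1.0, 0.0), 3.0), ((0.0, 1.0), 3.0),
               ((-1.0, 0.0), 0.0), ((0.0, -1.0), 0.0)]
x_opt, v_opt = solve_lp_2d((-1.0, -2.0), constraints)
```

The number of candidate vertices grows combinatorially with the number of constraints and variables, which is why enumeration is only viable for toy problems.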

Convex optimization problem

$$\text{minimize} \quad f_0(x) \qquad \text{subject to} \quad f_i(x) \le b_i, \quad i = 1, \dots, m$$
objective and constraint functions are convex:
$$f_i(\alpha x + \beta y) \le \alpha f_i(x) + \beta f_i(y) \quad \text{if } \alpha + \beta = 1,\ \alpha \ge 0,\ \beta \ge 0$$
includes least-squares problems and linear programs as special cases
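The convexity inequality can be tested on scalar functions by random sampling. The sketch below searches an interval for a triple $(x, y, \alpha)$ violating $f(\alpha x + \beta y) \le \alpha f(x) + \beta f(y)$ with $\beta = 1 - \alpha$. The function name, the interval, and the test functions are assumptions; a search that finds no violation suggests convexity on that interval but does not prove it.

```python
import random


def violates_convexity(f, lo, hi, trials=10000, seed=0, tol=1e-9):
    """Return True if some sampled x, y in [lo, hi] and alpha in [0, 1] give
    f(alpha*x + beta*y) > alpha*f(x) + beta*f(y), where beta = 1 - alpha."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(lo, hi), rng.uniform(lo, hi)
        alpha = rng.random()
        beta = 1.0 - alpha
        if f(alpha * x + beta * y) > alpha * f(x) + beta * f(y) + tol:
            return True
    return False


square_ok = not violates_convexity(lambda t: t * t, -5.0, 5.0)     # convex
abs_ok = not violates_convexity(abs, -5.0, 5.0)                    # convex
concave_caught = violates_convexity(lambda t: -t * t, -5.0, 5.0)   # not convex
```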

Convex optimization problem

solving convex optimization problems
  no analytical solution
  reliable and efficient algorithms
  computation time (roughly) proportional to $\max\{n^3, n^2 m, F\}$, where $F$ is the cost of evaluating the $f_i$'s and their first and second derivatives
  almost a technology
using convex optimization
  often difficult to recognize
  many tricks for transforming problems into convex form
  surprisingly many problems can be solved via convex optimization