Optimization
Yuh-Jye Lee
Data Science and Machine Intelligence Lab, National Chiao Tung University
March 21, 2017
1 / 29

You Have Learned (Unconstrained) Optimization in Your High School
Let $f(x) = ax^2 + bx + c$, $a \neq 0$, and $x^* = -\frac{b}{2a}$.
Case 1: $f''(x^*) = 2a > 0 \Rightarrow x^* \in \arg\min_{x \in \mathbb{R}} f(x)$
Case 2: $f''(x^*) = 2a < 0 \Rightarrow x^* \in \arg\max_{x \in \mathbb{R}} f(x)$
For the minimization problem (Case 1), $f'(x^*) = 0$ is called the first-order optimality condition and $f''(x^*) > 0$ is the second-order optimality condition. 2 / 29

Optimization Examples in Machine Learning
1. Maximum likelihood estimation
2. Maximum a posteriori estimation
3. Least squares estimates
4. Gradient descent method
5. Backpropagation
3 / 29

Gradient and Hessian
Let $f : \mathbb{R}^n \rightarrow \mathbb{R}$ be a differentiable function. The gradient of $f$ at a point $x \in \mathbb{R}^n$ is defined as
$\nabla f(x) = \left[ \frac{\partial f(x)}{\partial x_1}, \frac{\partial f(x)}{\partial x_2}, \ldots, \frac{\partial f(x)}{\partial x_n} \right] \in \mathbb{R}^n$
If $f : \mathbb{R}^n \rightarrow \mathbb{R}$ is a twice differentiable function, the Hessian matrix of $f$ at a point $x \in \mathbb{R}^n$ is defined as
$\nabla^2 f(x) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix} \in \mathbb{R}^{n \times n}$
4 / 29

Example of Gradient and Hessian
$f(x) = x_1^2 + x_2^2 - 2x_1 + 4x_2 = \frac{1}{2} \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} -2 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$
$\nabla f(x) = \begin{bmatrix} 2x_1 - 2 & 2x_2 + 4 \end{bmatrix}$, $\quad \nabla^2 f(x) = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$
By letting $\nabla f(x) = 0$, we have $x^* = \begin{bmatrix} 1 \\ -2 \end{bmatrix} = \arg\min_{x \in \mathbb{R}^2} f(x)$
5 / 29

Quadratic Functions (Standard Form)
Let $f : \mathbb{R}^n \rightarrow \mathbb{R}$ with $f(x) = \frac{1}{2} x^\top H x + p^\top x$, where $H \in \mathbb{R}^{n \times n}$ is a symmetric matrix and $p \in \mathbb{R}^n$. Then
$\nabla f(x) = Hx + p$
$\nabla^2 f(x) = H$ (Hessian)
Note: If $H$ is positive definite, then $x^* = -H^{-1} p$ is the unique solution of $\min f(x)$.
6 / 29
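A minimal MATLAB sketch (not part of the original slides) that recovers the minimizer of the earlier example $f(x) = x_1^2 + x_2^2 - 2x_1 + 4x_2$ by solving $Hx = -p$:

% Quadratic example from the previous slide: H = 2*I, p = [-2; 4].
H = [2 0; 0 2];
p = [-2; 4];
x_star = -(H\p)              % solves H*x = -p, returns [1; -2]
grad_at_min = H*x_star + p   % gradient is zero at the minimizer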

Least-squares Problem
$\min_{x \in \mathbb{R}^n} \|Ax - b\|_2^2$, $\quad A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$
$f(x) = (Ax - b)^\top (Ax - b) = x^\top A^\top A x - 2 b^\top A x + b^\top b$
$\nabla f(x) = 2 A^\top A x - 2 A^\top b$
$\nabla^2 f(x) = 2 A^\top A$
$x^* = (A^\top A)^{-1} A^\top b \in \arg\min_{x \in \mathbb{R}^n} \|Ax - b\|_2^2$ if $A^\top A$ is a nonsingular (positive definite) matrix.
Note: $x^*$ is an analytical solution.
7 / 29
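A small MATLAB sketch of this closed-form solution (the data below are made-up example values, not from the slides):

% Least squares via the normal equations.
A = [1 1; 1 2; 1 3];           % example design matrix
b = [1; 2; 2];                 % example observations
x_normal = (A'*A)\(A'*b);      % closed-form solution, assumes A'*A is nonsingular
x_backslash = A\b;             % MATLAB's backslash gives the same least-squares solution
res = norm(A*x_normal - b)^2   % squared residual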

How to Solve an Unconstrained MP
Get an initial point and iteratively decrease the objective function value. Stop once the stopping criterion is satisfied.
Steepest descent might not be a good choice.
Newton's method is highly recommended: it is a locally and quadratically convergent algorithm, but it needs a good step size to guarantee global convergence.
8 / 29

The First Order Taylor Expansion
Let $f : \mathbb{R}^n \rightarrow \mathbb{R}$ be a differentiable function. Then
$f(x + d) = f(x) + \nabla f(x)^\top d + \alpha(x, d) \|d\|$, where $\lim_{d \to 0} \alpha(x, d) = 0$
If $\nabla f(x)^\top d < 0$ and $\|d\|$ is small enough, then $f(x + d) < f(x)$. We call such $d$ a descent direction.
9 / 29

Steepest Descent with Exact Line Search
Start with any $x^0 \in \mathbb{R}^n$. Having $x^i$, stop if $\nabla f(x^i) = 0$. Else compute $x^{i+1}$ as follows:
1. Steepest descent direction: $d^i = -\nabla f(x^i)$
2. Exact line search: choose a stepsize $\lambda$ such that $\frac{d f(x^i + \lambda d^i)}{d\lambda} = f'(x^i + \lambda d^i) = 0$
3. Updating: $x^{i+1} = x^i + \lambda d^i$
10 / 29

MATLAB Code for Steepest Descent with Exact Line Search (Quadratic Function Only)
function [x, f_value, iter] = grdlines(Q, p, x0, esp)
%
% min 0.5*x'*Q*x + p'*x
% Solving unconstrained minimization via
% steepest descent with exact line search
%
11 / 29

flag = 1; iter = 0;
while flag > esp
    grad = Q*x0 + p;                       % gradient of the quadratic objective
    temp1 = grad'*grad;
    if temp1 < 1e-12                       % gradient numerically zero: stop
        flag = esp;
    else
        stepsize = temp1/(grad'*Q*grad);   % exact line search stepsize
        x1 = x0 - stepsize*grad;
        flag = norm(x1 - x0);
        x0 = x1;
    end;
    iter = iter + 1;
end;
x = x0;
f_value = 0.5*x'*Q*x + p'*x;
12 / 29

The Key Idea of Newton's Method
Let $f : \mathbb{R}^n \rightarrow \mathbb{R}$ be a twice differentiable function:
$f(x + d) = f(x) + \nabla f(x)^\top d + \frac{1}{2} d^\top \nabla^2 f(x) d + \beta(x, d) \|d\|^2$, where $\lim_{d \to 0} \beta(x, d) = 0$
At the $i$-th iteration, use a quadratic function to approximate $f(x)$:
$f(x) \approx f(x^i) + \nabla f(x^i)^\top (x - x^i) + \frac{1}{2} (x - x^i)^\top \nabla^2 f(x^i) (x - x^i)$
$x^{i+1} = \arg\min$ of this quadratic approximation
13 / 29

Newton's Method
Start with $x^0 \in \mathbb{R}^n$. Having $x^i$, stop if $\nabla f(x^i) = 0$. Else compute $x^{i+1}$ as follows:
1. Newton direction: $\nabla^2 f(x^i) d^i = -\nabla f(x^i)$. (Have to solve a system of linear equations here!)
2. Updating: $x^{i+1} = x^i + d^i$
It converges only when $x^0$ is close enough to $x^*$.
14 / 29
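A minimal MATLAB sketch of this pure Newton iteration; gradf and hessf are assumed to be function handles supplied by the caller (they are not defined in the slides):

% Pure Newton's method: solve hessf(x)*d = -gradf(x) and take a unit step.
function [x, iter] = newton_sketch(gradf, hessf, x0, tol, maxiter)
x = x0;
for iter = 1:maxiter
    g = gradf(x);
    if norm(g) < tol
        break;                % (near-)zero gradient: stop
    end
    d = -(hessf(x)\g);        % Newton direction
    x = x + d;                % unit stepsize, so convergence is only local
end
end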

Example: $f(x) = -\frac{1}{6} x^6 + \frac{1}{4} x^4 + 2x^2$, approximated at $x^i$ by the quadratic model $g(x) = f(x^i) + f'(x^i)(x - x^i) + \frac{1}{2} f''(x^i)(x - x^i)^2$. Newton's method can fail to converge to the optimal solution.
15 / 29

Constrained Optimization Problem
Problem setting: Given functions $f$, $g_i$, $i = 1, \ldots, k$ and $h_j$, $j = 1, \ldots, m$, defined on a domain $\Omega \subseteq \mathbb{R}^n$,
$\min_{x \in \Omega} f(x)$ s.t. $g_i(x) \leq 0, \ \forall i$; $\ h_j(x) = 0, \ \forall j$
where $f(x)$ is called the objective function and $g(x) \leq 0$, $h(x) = 0$ are called constraints.
16 / 29

Example
$\min f(x) = 2x_1^2 + x_2^2 + 3x_3^2$ s.t. $2x_1 - 3x_2 + 4x_3 = 49$
Solution: $L(x, \beta) = f(x) + \beta(2x_1 - 3x_2 + 4x_3 - 49)$, $\beta \in \mathbb{R}$
$\frac{\partial L}{\partial x_1} = 0 \Rightarrow 4x_1 + 2\beta = 0$
$\frac{\partial L}{\partial x_2} = 0 \Rightarrow 2x_2 - 3\beta = 0$
$\frac{\partial L}{\partial x_3} = 0 \Rightarrow 6x_3 + 4\beta = 0$
$2x_1 - 3x_2 + 4x_3 - 49 = 0$
$\Rightarrow \beta = -6$, $x_1 = 3$, $x_2 = -9$, $x_3 = 4$
17 / 29
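Since the stationarity conditions above are linear in $(x_1, x_2, x_3, \beta)$, they can be verified with a few lines of MATLAB (a quick check, not part of the slides):

% Solve the 4x4 linear system formed by the Lagrangian stationarity
% conditions and the equality constraint.
M = [4  0  0  2;     % 4*x1 + 2*beta = 0
     0  2  0 -3;     % 2*x2 - 3*beta = 0
     0  0  6  4;     % 6*x3 + 4*beta = 0
     2 -3  4  0];    % 2*x1 - 3*x2 + 4*x3 = 49
rhs = [0; 0; 0; 49];
sol = M\rhs          % returns [3; -9; 4; -6] = (x1, x2, x3, beta)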

$\min_{x \in \mathbb{R}^2} x_1^2 + x_2^2$ s.t. $x_1 + x_2 \geq 4$, $x_1 - x_2 \leq 2$, $x_1, x_2 \geq 0$
$\nabla f(x) = [2x_1, 2x_2]$; the minimizer is $x^* = (2, 2)$, where only the constraint $x_1 + x_2 \geq 4$ is active.
18 / 29

Definitions and Notation
Feasible region: $\mathcal{F} = \{x \in \Omega \mid g(x) \leq 0, h(x) = 0\}$, where $g(x) = [g_1(x), \ldots, g_k(x)]^\top$ and $h(x) = [h_1(x), \ldots, h_m(x)]^\top$.
A solution of the optimization problem is a point $x^* \in \mathcal{F}$ such that there exists no $x \in \mathcal{F}$ for which $f(x) < f(x^*)$; such an $x^*$ is called a global minimum.
19 / 29

Definitions and Notation
A point $\bar{x} \in \mathcal{F}$ is called a local minimum of the optimization problem if $\exists \varepsilon > 0$ such that $f(\bar{x}) \leq f(x)$, $\forall x \in \mathcal{F}$ with $\|x - \bar{x}\| < \varepsilon$.
At the solution $x^*$, an inequality constraint $g_i(x)$ is said to be active if $g_i(x^*) = 0$; otherwise it is called an inactive constraint.
$g_i(x) \leq 0 \Leftrightarrow g_i(x) + \xi_i = 0$, $\xi_i \geq 0$, where $\xi_i$ is called the slack variable.
20 / 29

Definitions and Notation
Removing an inactive constraint from an optimization problem will NOT affect the optimal solution. This is a very useful feature in SVM.
If $\mathcal{F} = \mathbb{R}^n$, the problem is called an unconstrained minimization problem. The least-squares problem and the SSVM formulation are in this category.
It is difficult to find the global minimum without a convexity assumption.
21 / 29

The Most Important Concepts in Optimization (Minimization)
A point is an optimal solution of an unconstrained minimization problem if there exists no descent direction $\Rightarrow \nabla f(x^*) = 0$.
A point is an optimal solution of a constrained minimization problem if there exists no feasible descent direction $\Rightarrow$ KKT conditions.
There might exist a descent direction, but moving along this direction would leave the feasible region.
22 / 29

Minimum Principle
Let $f : \mathbb{R}^n \rightarrow \mathbb{R}$ be a convex and differentiable function and $\mathcal{F} \subseteq \mathbb{R}^n$ be the feasible region.
$x^* \in \arg\min_{x \in \mathcal{F}} f(x) \Leftrightarrow \nabla f(x^*)^\top (x - x^*) \geq 0 \ \ \forall x \in \mathcal{F}$
Example: $\min (x - 1)^2$ s.t. $a \leq x \leq b$. For instance, if $b < 1$ the minimizer is $x^* = b$, and indeed $f'(b)(x - b) = 2(b - 1)(x - b) \geq 0$ for every feasible $x \leq b$.
23 / 29

$\min_{x \in \mathbb{R}^2} x_1^2 + x_2^2$ s.t. $x_1 + x_2 \geq 4$, $x_1 - x_2 \leq 2$, $x_1, x_2 \geq 0$
$\nabla f(x) = [2x_1, 2x_2]$; at the minimizer $x^* = (2, 2)$ the minimum principle holds: $\nabla f(x^*)^\top (x - x^*) \geq 0$ for every feasible $x$.
24 / 29

Linear Programming Problem
An optimization problem in which the objective function and all constraints are linear functions is called a linear programming problem (LP):
$\min p^\top x$ s.t. $Ax \leq b$, $Cx = d$, $L \leq x \leq U$
25 / 29

Linear Programming Solver in MATLAB
X=LINPROG(f,A,b) attempts to solve the linear programming problem:
min f'*x subject to: A*x <= b
X=LINPROG(f,A,b,Aeq,beq) solves the problem above while additionally satisfying the equality constraints Aeq*x = beq.
X=LINPROG(f,A,b,Aeq,beq,LB,UB) defines a set of lower and upper bounds on the design variables, X, so that the solution is in the range LB <= X <= UB. Use empty matrices for LB and UB if no bounds exist. Set LB(i) = -Inf if X(i) is unbounded below; set UB(i) = Inf if X(i) is unbounded above.
26 / 29

Linear Programming Solver in MATLAB X=LINPROG(f,A,b,Aeq,beq,LB,UB,X0) sets the starting point to X0. This option is only available with the active-set algorithm. The default interior point algorithm will ignore any non-empty starting point. You can type help linprog in MATLAB to get more information! 27 / 29
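For concreteness, a small usage sketch of linprog (the numbers are made-up example data, not from the slides):

% Solve: min -x1 - 2*x2  s.t.  x1 + x2 <= 3,  0 <= x1, x2 <= 2
f  = [-1; -2];
A  = [1 1];  b = 3;
LB = [0; 0]; UB = [2; 2];
x = linprog(f, A, b, [], [], LB, UB)   % expected solution: x = [1; 2]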

$L_1$-Approximation: $\min_{x \in \mathbb{R}^n} \|Ax - b\|_1$, where $\|z\|_1 = \sum_{i=1}^m |z_i|$
$\min_{x,s} \mathbf{1}^\top s$ s.t. $-s \leq Ax - b \leq s$, or equivalently $\min_{x,s} \sum_{i=1}^m s_i$ s.t. $-s_i \leq A_i x - b_i \leq s_i$
In matrix form:
$\min_{x,s} \begin{bmatrix} \mathbf{0}^\top & \mathbf{1}^\top \end{bmatrix} \begin{bmatrix} x \\ s \end{bmatrix}$ s.t. $\begin{bmatrix} A & -I \\ -A & -I \end{bmatrix} \begin{bmatrix} x \\ s \end{bmatrix} \leq \begin{bmatrix} b \\ -b \end{bmatrix}$, a $2m \times (n+m)$ constraint matrix.
28 / 29
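The reformulation above maps directly onto linprog. A minimal sketch, assuming A and b are already in the workspace and the Optimization Toolbox is available:

% L1 approximation as an LP in the stacked variables [x; s].
[m, n] = size(A);
f     = [zeros(n,1); ones(m,1)];    % objective: sum of the slacks s
Aineq = [ A, -eye(m);               %  A*x - s <= b
         -A, -eye(m)];              % -A*x - s <= -b
bineq = [b; -b];
xs   = linprog(f, Aineq, bineq);
x_l1 = xs(1:n);                     % minimizer of ||A*x - b||_1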

Chebyshev Approximation: $\min_{x \in \mathbb{R}^n} \|Ax - b\|_\infty$, where $\|z\|_\infty = \max_{1 \leq i \leq m} |z_i|$
$\min_{x,\gamma} \gamma$ s.t. $-\mathbf{1}\gamma \leq Ax - b \leq \mathbf{1}\gamma$
In matrix form:
$\min_{x,\gamma} \begin{bmatrix} \mathbf{0}^\top & 1 \end{bmatrix} \begin{bmatrix} x \\ \gamma \end{bmatrix}$ s.t. $\begin{bmatrix} A & -\mathbf{1} \\ -A & -\mathbf{1} \end{bmatrix} \begin{bmatrix} x \\ \gamma \end{bmatrix} \leq \begin{bmatrix} b \\ -b \end{bmatrix}$, a $2m \times (n+1)$ constraint matrix.
29 / 29
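The same recipe works for the infinity norm, with the scalar $\gamma$ replacing the slack vector (again a sketch assuming A, b, and linprog are available):

% Chebyshev approximation as an LP in the stacked variables [x; gamma].
[m, n] = size(A);
f     = [zeros(n,1); 1];            % objective: the scalar gamma
Aineq = [ A, -ones(m,1);            %  A*x - gamma <= b
         -A, -ones(m,1)];           % -A*x - gamma <= -b
bineq = [b; -b];
xg    = linprog(f, Aineq, bineq);
x_inf = xg(1:n);                    % minimizer of ||A*x - b||_inf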