Lecture 3. Optimization Problems and Iterative Algorithms


Lecture 3: Optimization Problems and Iterative Algorithms. January 13, 2016. This material was jointly developed with Angelia Nedić at UIUC for IE 598ns.

Outline
- Special functions: linear, quadratic, convex
- Criteria for convexity of a function
- Operations preserving convexity
- Unconstrained optimization: first-order necessary optimality conditions
- Constrained optimization: first-order necessary optimality conditions, KKT conditions
- Iterative algorithms
Stochastic Optimization 1

Convex Function

f is convex when dom(f) is a convex set and
  f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y) for all x, y ∈ dom(f) and α ∈ [0, 1].
f is strictly convex if the inequality is strict for all x, y ∈ dom(f) with x ≠ y and α ∈ (0, 1).
Note that dom(f) is defined as dom(f) = {x : f(x) < +∞}.
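The defining inequality can be spot-checked numerically. The sketch below (not part of the slides; grids and tolerance are arbitrary choices) tests it for f(x) = x², which is convex, and for f(x) = −x², which is not:

```python
# Numerical spot-check of the convexity inequality
#   f(a*x + (1-a)*y) <= a*f(x) + (1-a)*f(y)
# on a grid of points and mixing weights. Illustrative only.

def is_midpoint_convex(f, xs, alphas):
    """Check the defining inequality on all pairs from xs and all alphas."""
    for x in xs:
        for y in xs:
            for a in alphas:
                lhs = f(a * x + (1 - a) * y)
                rhs = a * f(x) + (1 - a) * f(y)
                if lhs > rhs + 1e-12:   # small tolerance for rounding
                    return False
    return True

xs = [i / 10 for i in range(-20, 21)]       # grid over [-2, 2]
alphas = [i / 10 for i in range(11)]        # mixing weights in [0, 1]
print(is_midpoint_convex(lambda x: x * x, xs, alphas))    # x^2 passes
print(is_midpoint_convex(lambda x: -x * x, xs, alphas))   # -x^2 fails
```

A passing grid check does not prove convexity; it only fails to refute it, which is why the analytic criteria on the next slides matter.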

[Figure: the graph of a convex f between points x and y lies below the chord joining (x, f(x)) and (y, f(y)).]
f is concave when −f is convex; f is strictly concave when −f is strictly convex.

Examples of Convex/Concave Functions

Examples on R
Convex:
- Affine: ax + b over R, for any a, b ∈ R
- Exponential: e^{ax} over R, for any a ∈ R
- Powers: x^p over (0, +∞), for p ≥ 1 or p ≤ 0
- Powers of absolute value: |x|^p over R, for p ≥ 1
- Negative entropy: x ln x over (0, +∞)
Concave:
- Affine: ax + b over R, for any a, b ∈ R
- Powers: x^p over (0, +∞), for 0 ≤ p ≤ 1
- Logarithm: ln x over (0, +∞)

Examples on Rⁿ
- Affine functions are both convex and concave
- Norms ‖x‖, ‖x‖₁, ‖x‖∞ are convex

Second-Order Conditions for Convexity

Let f be twice differentiable and let dom(f) be the domain of f. [In general, when differentiability is considered, dom(f) is required to be open.]
The Hessian ∇²f(x) is a symmetric n × n matrix whose entries are the second-order partial derivatives of f at x:
  [∇²f(x)]_{ij} = ∂²f(x)/(∂x_i ∂x_j) for i, j = 1, ..., n

Second-order condition: f is convex if and only if dom(f) is a convex set and ∇²f(x) ⪰ 0 for all x ∈ dom(f).
[Recall that M ∈ R^{n×n} is positive semidefinite, M ⪰ 0, if xᵀMx ≥ 0 for all x ∈ Rⁿ.]
f is strictly convex if dom(f) is a convex set and ∇²f(x) ≻ 0 for all x ∈ dom(f).
[Recall that M ∈ R^{n×n} is positive definite, M ≻ 0, if xᵀMx > 0 for all x ≠ 0.]

Examples

Quadratic function: f(x) = (1/2)xᵀQx + qᵀx + r with a symmetric n × n matrix Q:
  ∇f(x) = Qx + q,  ∇²f(x) = Q
Convex for Q ⪰ 0.

Least-squares objective: f(x) = ‖Ax − b‖² with an m × n matrix A:
  ∇f(x) = 2Aᵀ(Ax − b),  ∇²f(x) = 2AᵀA
Convex for any A.

Quadratic-over-linear: f(x, y) = x²/y, convex for y > 0:
  ∇²f(x, y) = (2/y³) [y, −x][y, −x]ᵀ ⪰ 0
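The least-squares claim is easy to verify numerically: xᵀ(2AᵀA)x = 2‖Ax‖² ≥ 0 for any A. A small sketch using plain lists (the 3 × 2 matrix A below is an arbitrary example, not from the slides):

```python
# Check that the least-squares Hessian H = 2 A^T A is positive semidefinite
# by evaluating the quadratic form x^T H x on a grid of directions.

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def quad_form(M, x):
    """Compute x^T M x."""
    return sum(x[i] * v for i, v in enumerate(matvec(M, x)))

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]          # arbitrary 3x2 matrix
# Hessian H = 2 A^T A (a 2x2 matrix)
H = [[2 * sum(A[k][i] * A[k][j] for k in range(3)) for j in range(2)]
     for i in range(2)]

ok = all(quad_form(H, [x1 / 4, x2 / 4]) >= -1e-12
         for x1 in range(-8, 9) for x2 in range(-8, 9))
print(ok)   # True: x^T H x >= 0 on every sampled direction
```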

First-Order Condition for Convexity

Let f be differentiable and let dom(f) be its domain. Then the gradient
  ∇f(x) = (∂f(x)/∂x₁, ∂f(x)/∂x₂, ..., ∂f(x)/∂xₙ)ᵀ
exists at each x ∈ dom(f).

First-order condition: f is convex if and only if dom(f) is convex and
  f(x) + ∇f(x)ᵀ(z − x) ≤ f(z) for all x, z ∈ dom(f)
Note: the first-order approximation is a global underestimate of f.

This is a very important property, used in convex optimization for algorithm design and performance analysis.
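The global-underestimate property can be illustrated numerically. The sketch below (my own example, assuming f(x) = eˣ with f′(x) = eˣ) checks f(x) + f′(x)(z − x) ≤ f(z) over a grid:

```python
import math

# Check the first-order condition as a global underestimate: for convex f,
# the tangent line at any x stays below the graph everywhere.

def tangent_underestimates(f, fprime, pts):
    for x in pts:
        for z in pts:
            if f(x) + fprime(x) * (z - x) > f(z) + 1e-9:
                return False
    return True

pts = [i / 5 for i in range(-10, 11)]            # grid over [-2, 2]
print(tangent_underestimates(math.exp, math.exp, pts))   # True: exp is convex
```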

Operations Preserving Convexity

Let f and g be convex functions over Rⁿ.
- Positive scaling: λf is convex for λ > 0, where (λf)(x) = λf(x) for all x
- Sum: f + g is convex, where (f + g)(x) = f(x) + g(x) for all x
- Composition with an affine function: for g affine [i.e., g(x) = Ax + b], the composition f ∘ g is convex, where (f ∘ g)(x) = f(Ax + b) for all x
- Pointwise maximum: for convex functions f₁, ..., f_m, the pointwise-max function h(x) = max{f₁(x), ..., f_m(x)} is convex
  - Polyhedral function: f(x) = max_{i=1,...,m} (a_iᵀx + b_i) is convex
- Pointwise supremum: let Y ⊆ R^m and f : Rⁿ × R^m → R, with f(x, y) convex in x for each y ∈ Y. Then the supremum function over the set Y, h(x) = sup_{y∈Y} f(x, y), is convex
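The pointwise-maximum rule can be sanity-checked on a concrete pair. In this sketch (my own choice of functions, not from the slides) h(x) = max{x², |x − 1|} is built from two convex pieces and the convexity inequality is tested on a grid:

```python
# Pointwise maximum preserves convexity: h = max(f1, f2) of two convex
# functions again satisfies f(a*x + (1-a)*y) <= a*f(x) + (1-a)*f(y).

def convexity_holds(f, xs, alphas, tol=1e-12):
    return all(f(a * x + (1 - a) * y) <= a * f(x) + (1 - a) * f(y) + tol
               for x in xs for y in xs for a in alphas)

f1 = lambda x: x * x          # convex
f2 = lambda x: abs(x - 1)     # convex (a norm-like function, not smooth)
h = lambda x: max(f1(x), f2(x))

xs = [i / 10 for i in range(-20, 21)]
alphas = [i / 10 for i in range(11)]
print(convexity_holds(h, xs, alphas))   # True
```

Note that h is not differentiable where the two pieces cross, yet it is still convex; this is exactly why pointwise maxima matter for polyhedral functions.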

Optimization Terminology

Let C ⊆ Rⁿ and f : C → R. Consider the optimization problem
  minimize f(x) subject to x ∈ C
Example: C = {x ∈ Rⁿ : g(x) ≤ 0, x ∈ X}
Terminology:
- The set C is referred to as the feasible set
- The problem is feasible when C is nonempty
- The problem is unconstrained when C = Rⁿ, and constrained otherwise
- A vector x* is an optimal solution, or a global minimum, when x* is feasible and the value f(x*) is not exceeded at any x ∈ C, i.e.,
  x* ∈ C and f(x*) ≤ f(x) for all x ∈ C

Local Minimum

minimize f(x) subject to x ∈ C
A vector x̂ is a local minimum for the problem if x̂ ∈ C and there is a ball B(x̂, r) such that
  f(x̂) ≤ f(x) for all x ∈ C with ‖x − x̂‖ ≤ r
- Every global minimum is also a local minimum
- When the set C is convex and the function f is convex, a local minimum is also global

First-Order Necessary Optimality Condition: Unconstrained Problem

Let f be a differentiable function with dom(f) = Rⁿ and let C = Rⁿ. If x̂ is a local minimum of f over Rⁿ, then
  ∇f(x̂) = 0
The gradient relation can equivalently be written as
  (y − x̂)ᵀ∇f(x̂) ≥ 0 for all y ∈ Rⁿ
This is a variational inequality VI(K, F) with the set K and the mapping F given by
  K = Rⁿ,  F(x) = ∇f(x)
Solving a minimization problem can thus be reduced to solving a corresponding variational inequality.

First-Order Necessary Optimality Condition: Constrained Problem

Let f be a differentiable function with dom(f) = Rⁿ and let C ⊆ Rⁿ be a closed convex set. If x̂ is a local minimum of f over C, then
  (y − x̂)ᵀ∇f(x̂) ≥ 0 for all y ∈ C   (1)
Again, this is a variational inequality VI(K, F) with K = C and F(x) = ∇f(x).
Recall that when f is convex, a local minimum is also global.
When f is convex, relation (1) is also sufficient for x̂ to be a global minimum: if x̂ satisfies relation (1), then x̂ is a (global) minimum.
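Condition (1) is easy to see on a one-dimensional instance. In this sketch (my own example: minimize f(x) = x² over C = [1, 2], where the minimizer is x̂ = 1), the inequality is checked over a grid of feasible points:

```python
# Variational-inequality optimality check for minimize x^2 over C = [1, 2]:
# (y - xhat) * f'(xhat) >= 0 must hold for every y in C.

def satisfies_vi(xhat, grad, C_pts, tol=1e-12):
    return all((y - xhat) * grad(xhat) >= -tol for y in C_pts)

grad = lambda x: 2 * x                          # gradient of f(x) = x^2
C_pts = [1 + i / 100 for i in range(101)]       # grid over C = [1, 2]

print(satisfies_vi(1.0, grad, C_pts))   # True: xhat = 1 is the minimizer
print(satisfies_vi(1.5, grad, C_pts))   # False: interior non-stationary point
```

Note that ∇f(x̂) = 2 ≠ 0 at the solution: on the boundary of C the gradient need not vanish, only point "outward" relative to the feasible set.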

Equality- and Inequality-Constrained Problem

Consider the following problem:
  minimize f(x)
  subject to h₁(x) = 0, ..., h_p(x) = 0
             g₁(x) ≤ 0, ..., g_m(x) ≤ 0
where f, h_i, and g_j are continuously differentiable over Rⁿ.

Def. For a feasible vector x, the active set of (inequality) constraints is the set
  A(x) = {j : g_j(x) = 0}
If j ∉ A(x), we say that the j-th constraint is inactive at x.

Def. We say that a vector x is regular if the gradients ∇h₁(x), ..., ∇h_p(x), and ∇g_j(x) for j ∈ A(x), are linearly independent.
NOTE: x is regular when there are no equality constraints and all the inequality constraints are inactive [p = 0 and A(x) = ∅].

Lagrangian Function

With the problem
  minimize f(x)
  subject to h₁(x) = 0, ..., h_p(x) = 0
             g₁(x) ≤ 0, ..., g_m(x) ≤ 0    (2)
we associate the Lagrangian function L(x, λ, µ) defined by
  L(x, λ, µ) = f(x) + Σ_{i=1}^p λ_i h_i(x) + Σ_{j=1}^m µ_j g_j(x)
where λ_i ∈ R for all i, and µ_j ∈ R₊ for all j.

First-Order Karush-Kuhn-Tucker (KKT) Necessary Conditions

Th. Let x̂ be a local minimum of the equality/inequality-constrained problem (2), and assume that x̂ is regular. Then there exist unique multipliers λ̂ and µ̂ such that
  ∇ₓL(x̂, λ̂, µ̂) = 0   [L is the Lagrangian function]
  µ̂_j ≥ 0 for all j
  µ̂_j = 0 for all j ∉ A(x̂)
The last condition is referred to as the complementarity condition. We can write these conditions on µ̂ compactly as
  0 ≤ µ̂ ⊥ −g(x̂) ≥ 0
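As a worked instance (my own example, not from the slides): minimize (x − 2)² subject to g(x) = x − 1 ≤ 0. The solution is x̂ = 1 with multiplier µ̂ = 2, and all three KKT conditions can be checked directly:

```python
# KKT check for: minimize (x - 2)^2 subject to g(x) = x - 1 <= 0.
# Candidate: xhat = 1 (the constraint is active), muhat = 2.
#   stationarity:     2(xhat - 2) + muhat = 0
#   dual feasibility: muhat >= 0
#   complementarity:  muhat * g(xhat) = 0

xhat, muhat = 1.0, 2.0
g = lambda x: x - 1
grad_L = lambda x, mu: 2 * (x - 2) + mu    # d/dx of the Lagrangian

stationarity = abs(grad_L(xhat, muhat)) < 1e-12
dual_feas = muhat >= 0
complementarity = abs(muhat * g(xhat)) < 1e-12

print(stationarity and dual_feas and complementarity)   # True
```

The unconstrained minimizer x = 2 is infeasible, so the constraint is active at the solution and the multiplier is strictly positive, exactly as complementarity predicts.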

In fact, the complementarity-based formulation can be used to write the first-order optimality conditions more compactly. Consider the following constrained optimization problem:
  minimize f(x)
  subject to c₁(x) ≥ 0, ..., c_m(x) ≥ 0,  x ≥ 0
Then, if x̂ is regular, there exist multipliers λ̂ such that
  0 ≤ x̂ ⊥ ∇ₓf(x̂) − ∇ₓc(x̂)ᵀλ̂ ≥ 0   (3)
  0 ≤ λ̂ ⊥ c(x̂) ≥ 0   (4)
More succinctly, this is a nonlinear complementarity problem, denoted CP(R₊^{n+m}, F): a problem that requires a z satisfying
  0 ≤ z ⊥ F(z) ≥ 0, where z = (x, λ) and F(z) = (∇ₓf(x) − ∇ₓc(x)ᵀλ, c(x)).

Second-Order KKT Necessary Conditions

Th. Let x̂ be a local minimum of the equality/inequality-constrained problem (2). Assume that x̂ is regular and that f, h_i, g_j are twice continuously differentiable. Then there exist unique multipliers λ̂ and µ̂ such that
  ∇ₓL(x̂, λ̂, µ̂) = 0
  µ̂_j ≥ 0 for all j
  µ̂_j = 0 for all j ∉ A(x̂)
and, for any vector y such that ∇h_i(x̂)ᵀy = 0 for all i and ∇g_j(x̂)ᵀy = 0 for all j ∈ A(x̂), the following relation holds:
  yᵀ∇²ₓₓL(x̂, λ̂, µ̂)y ≥ 0

Solution Procedures: Iterative Algorithms

For solving problems, we will consider iterative algorithms:
- Given an initial iterate x₀
- Generate a new iterate x_{k+1} = G_k(x_k), where G_k is a mapping that depends on the optimization problem
Objectives:
- Provide conditions on the mappings G_k that yield a sequence {x_k} converging to a solution of the problem of interest
- Study how fast the sequence {x_k} converges:
  - Global convergence rate (when far from optimal points)
  - Local convergence rate (when near an optimal point)

Gradient Descent Method

Consider a continuously differentiable function f; we want to minimize f(x) over x ∈ Rⁿ.
Gradient descent method:
  x_{k+1} = x_k − α_k ∇f(x_k)
The scalar α_k > 0 is a stepsize, chosen as a constant α_k = α, by a line search, or by another stepsize rule, so that f(x_{k+1}) < f(x_k).
Convergence rate: looking at the tail of the error sequence e(x_k) = dist(x_k, X*), where dist(x, A) = inf{‖x − a‖ : a ∈ A}, local convergence is at best linear:
  lim sup_{k→∞} e(x_{k+1})/e(x_k) ≤ q for some q ∈ (0, 1)
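A minimal constant-stepsize implementation (my own sketch; the objective f(x, y) = 0.1x² + y², stepsize, and iteration count are illustrative choices):

```python
# Gradient descent with constant stepsize on f(x, y) = 0.1*x^2 + y^2,
# whose minimizer is (0, 0). The error shrinks geometrically (linear rate).

def grad_descent(grad, x0, alpha, iters):
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x

grad = lambda x: [0.2 * x[0], 2 * x[1]]      # gradient of 0.1 x^2 + y^2
x = grad_descent(grad, [1.0, 1.0], alpha=0.5, iters=200)
err = (x[0] ** 2 + x[1] ** 2) ** 0.5
print(err < 1e-8)    # True: the iterates approach the minimizer (0, 0)
```

With α = 0.5 the slow coordinate contracts by the factor 0.9 per step, a concrete instance of the linear rate: e(x_{k+1})/e(x_k) ≈ 0.9.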

Global convergence is also at best linear.

Newton's Method

Consider a twice continuously differentiable function f with Hessian ∇²f(x) ≻ 0 for all x. We want to solve the following problem:
  minimize {f(x) : x ∈ Rⁿ}
Newton's method:
  x_{k+1} = x_k − α_k [∇²f(x_k)]⁻¹ ∇f(x_k)
Local convergence rate (near x*): ‖∇f(x)‖ converges to zero quadratically:
  ‖∇f(x_k)‖ ≤ C q^{2^k} for all large enough k, where C > 0 and q ∈ (0, 1)
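A one-dimensional sketch (my own example, assuming unit stepsize α_k = 1) on f(x) = eˣ + e⁻ˣ, whose unique minimizer is x* = 0 and whose Hessian f″(x) = eˣ + e⁻ˣ is positive everywhere:

```python
import math

# Newton's method with unit stepsize on f(x) = exp(x) + exp(-x).
# Near x* = 0 the error collapses extremely fast (locally quadratic or better).

def newton(x, iters):
    for _ in range(iters):
        g = math.exp(x) - math.exp(-x)       # f'(x)
        h = math.exp(x) + math.exp(-x)       # f''(x) > 0 everywhere
        x = x - g / h
    return x

x = newton(1.0, 4)
print(abs(x) < 1e-12)    # True: a handful of steps suffice from x0 = 1
```

Compare with the 200 gradient-descent iterations above: the price of each Newton step is forming and inverting the Hessian, which is trivial in 1-D but the dominant cost in high dimensions.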

Penalty Methods

For solving inequality-constrained problems:
  minimize f(x) subject to g_j(x) ≤ 0, j = 1, ..., m
Penalty approach: remove the constraints but penalize their violation:
  P_c: minimize F(x, c) = f(x) + c P(g₁(x), ..., g_m(x)) over x ∈ Rⁿ
where c > 0 is a penalty parameter and P is some penalty function.
Penalty methods operate in two stages, for c and x respectively. Choose an initial value c₀; then:
  (1) Having c_k, solve the problem P_{c_k} to obtain its optimal x*(c_k)
  (2) Using x*(c_k), update c_k to obtain c_{k+1} and go to step (1)
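A two-stage sketch on a 1-D instance of my own choosing: minimize x subject to 1 − x ≤ 0 (i.e., x ≥ 1), with the quadratic penalty F(x, c) = x + c·max(0, 1 − x)². For this instance the inner minimizer is available in closed form, x(c) = 1 − 1/(2c), so step (1) is solved exactly:

```python
# Quadratic-penalty method for: minimize x subject to 1 - x <= 0.
# Penalized objective: F(x, c) = x + c * max(0, 1 - x)^2; its minimizer
# for this 1-D instance is x(c) = 1 - 1/(2c), used here in closed form.

def penalty_method(c0, growth, stages):
    c, xs = c0, []
    for _ in range(stages):
        xs.append(1 - 1 / (2 * c))   # step (1): exact minimizer of F(., c)
        c *= growth                  # step (2): increase the penalty parameter
    return xs

xs = penalty_method(c0=1.0, growth=10.0, stages=6)
print(abs(xs[-1] - 1.0) < 1e-4)   # True: x(c) -> 1, the constrained solution
```

Note that every iterate satisfies x(c) < 1, i.e., is infeasible: quadratic penalty methods approach the solution from outside the feasible set, in contrast to the barrier methods below.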

Q-Rates of Convergence

Let {x_k} be a sequence in Rⁿ that converges to x*. Convergence is said to be:
1. Q-linear if there is r ∈ (0, 1) such that ‖x_{k+1} − x*‖/‖x_k − x*‖ ≤ r for all k > K. Example: {1 + 0.5^k} converges Q-linearly to 1.
2. Q-quadratic if there is M > 0 such that ‖x_{k+1} − x*‖/‖x_k − x*‖² ≤ M for all k > K. Example: {1 + 0.5^{2^k}} converges Q-quadratically to 1.
3. Q-superlinear if lim_{k→∞} ‖x_{k+1} − x*‖/‖x_k − x*‖ = 0. Example: {1 + k^{−k}} converges Q-superlinearly to 1.
4. Q-quadratic ⇒ Q-superlinear ⇒ Q-linear.
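The three example sequences can be checked against their definitions directly (a small sketch; grid lengths are arbitrary):

```python
# Error sequences e_k = |x_k - 1| for the three examples above, and the
# ratios appearing in the Q-linear / Q-quadratic / Q-superlinear definitions.

lin  = [0.5 ** k for k in range(1, 12)]            # errors of 1 + 0.5^k
quad = [0.5 ** (2 ** k) for k in range(1, 6)]      # errors of 1 + 0.5^(2^k)
supl = [k ** (-k) for k in range(1, 10)]           # errors of 1 + k^(-k)

lin_r  = [lin[k + 1] / lin[k] for k in range(10)]
quad_r = [quad[k + 1] / quad[k] ** 2 for k in range(4)]
supl_r = [supl[k + 1] / supl[k] for k in range(8)]

print(all(abs(r - 0.5) < 1e-12 for r in lin_r))        # Q-linear: ratio 0.5
print(all(r <= 1.0 for r in quad_r))                   # Q-quadratic: M = 1
print(all(b < a for a, b in zip(supl_r, supl_r[1:])))  # Q-superlinear: -> 0
```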

Example 1

f(x, y) = x² + y²
1. Steepest descent from (1, 1)ᵀ
2. Newton's method from (1, 1)ᵀ
3. Newton's method from (−1, −1)ᵀ

Uday V. Shanbhag, Lecture 3
[Figure 1: Well-conditioned function: steepest descent, Newton, Newton. Three contour plots over [−2, 2] × [−2, 2] showing the iterate trajectories of each run.]

Example 2

f(x, y) = 0.1x² + y²
1. Steepest descent from (1, 1)ᵀ
2. Newton's method from (1, 1)ᵀ
3. Newton's method from (−1, −1)ᵀ

[Figure 2: Ill-conditioned function: steepest descent, Newton, Newton. Three contour plots over [−2, 2] × [−2, 2] showing the iterate trajectories of each run.]

Interior-Point Methods

Solve the inequality-constrained (and, more generally, constrained) problem:
  minimize f(x) subject to g_j(x) ≤ 0, j = 1, ..., m
The IPM solves a sequence of problems parametrized by t > 0:
  minimize f(x) − (1/t) Σ_{j=1}^m ln(−g_j(x)) over x ∈ Rⁿ
This can be viewed as a penalty method with
- penalty parameter c = 1/t
- penalty function P(u₁, ..., u_m) = −Σ_{j=1}^m ln(−u_j)
This function is known as the logarithmic barrier, or log-barrier, function.
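On the same 1-D instance used for the penalty sketch above (my own example: minimize x subject to 1 − x ≤ 0), the centering problem minimize x − (1/t) ln(x − 1) has the closed-form solution x(t) = 1 + 1/t, so each barrier subproblem is solved exactly here:

```python
# Log-barrier method for: minimize x subject to 1 - x <= 0 (i.e., x >= 1).
# Centering problem: minimize x - (1/t) * ln(x - 1); its minimizer for this
# 1-D instance is x(t) = 1 + 1/t, used here in closed form.

def barrier_method(t0, growth, stages):
    t, path = t0, []
    for _ in range(stages):
        path.append(1 + 1 / t)   # exact minimizer of the centering problem
        t *= growth              # tighten the barrier
    return path

path = barrier_method(t0=1.0, growth=10.0, stages=6)
print(all(x > 1 for x in path))        # iterates stay strictly feasible
print(abs(path[-1] - 1.0) < 1e-4)      # x(t) -> 1, the constrained solution
```

In contrast to the exterior quadratic penalty, every barrier iterate is strictly feasible (x(t) > 1), which is the defining feature of interior-point methods.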

References for this lecture:
- (B) Bertsekas, D.P., Nonlinear Programming, Chapters 1 and 3 (descent and Newton's methods, KKT conditions)
- (FP) Facchinei and Pang, Finite-Dimensional Variational Inequalities and Complementarity Problems, Vol. I (part on complementarity problems), Chapter 1 (normal cone, dual cone, tangent cone)
- (BNO) Bertsekas, Nedić, Ozdaglar, Convex Analysis and Optimization, Chapter 1 (convex functions)