Introduction and Math Preliminaries


Introduction and Math Preliminaries
Yinyu Ye
Department of Management Science and Engineering, Stanford University, Stanford, CA 94305, U.S.A.
http://www.stanford.edu/~yyye
Appendices A, B, and C; Chapter 1

What will you learn in CME307/MS&E311?

- Present a core element of the ICME disciplines: mathematical optimization theories and algorithms.
- Provide mathematical proofs and in-depth theoretical analyses of the optimization/game models and algorithms discussed in MS&E211.
- Introduce additional conic and nonlinear/nonconvex optimization/game models and problems compared to MS&E310.
- Describe new and recent effective optimization/game models, methods, and algorithms in Data Science, Machine Learning, and AI.

Emphasis is on nonlinear and nonconvex theories and practices.

Mathematical Optimization

The field of optimization is concerned with the study of maximization and minimization of mathematical functions. Very often the arguments of (i.e., variables or unknowns in) these functions are subject to side conditions or constraints. By virtue of its great utility in such diverse areas as applied science, engineering, economics, finance, medicine, and statistics, optimization holds an important place in the practical world and the scientific world. Indeed, as far back as the eighteenth century, the famous Swiss mathematician and physicist Leonhard Euler (1707-1783) proclaimed that "... nothing at all takes place in the Universe in which some rule of maximum or minimum does not appear." (See Leonhardo Eulero, Methodus Inveniendi Lineas Curvas Maximi Minimive Proprietate Gaudentes, Lausanne & Geneva, 1744, p. 245.)

Mathematical Optimization/Programming (MP)

The class of mathematical optimization/programming problems considered in this course can all be expressed in the form

\[
\text{(P)} \qquad \begin{array}{ll} \text{minimize} & f(x) \\ \text{subject to} & x \in X, \end{array}
\]

where X is usually specified by constraints:

\[
c_i(x) = 0, \ i \in E; \qquad c_i(x) \geq 0, \ i \in I.
\]
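To make the form (P) concrete, here is a minimal sketch (mine, not from the slides; the objective and constraint data are invented) of one instance with a single equality constraint (E) and a single inequality constraint (I), solved with scipy:

```python
# A minimal sketch of an instance of (P): one equality constraint (i in E)
# and one inequality constraint (i in I). Problem data are invented.
import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 1.0) ** 2 + (x[1] + 0.5) ** 2       # objective f(x)
constraints = [
    {"type": "eq",   "fun": lambda x: x[0] + x[1] - 1.0},  # c_1(x) = 0
    {"type": "ineq", "fun": lambda x: x[1]},               # c_2(x) >= 0
]
res = minimize(f, x0=np.zeros(2), constraints=constraints)
print(res.x, res.fun)                                      # x* = (1, 0), f* = 0.25
```

When constraints are present and no method is specified, scipy's minimize dispatches to its SLSQP solver.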

Global and Local Optimizers

A global minimizer for (P) is a vector x^* such that

\[
x^* \in X \quad \text{and} \quad f(x^*) \leq f(x) \ \ \forall x \in X.
\]

Sometimes one has to settle for a local minimizer, that is, a vector \bar{x} such that

\[
\bar{x} \in X \quad \text{and} \quad f(\bar{x}) \leq f(x) \ \ \forall x \in X \cap N(\bar{x}),
\]

where N(\bar{x}) is a neighborhood of \bar{x}. Typically, N(\bar{x}) = B_\delta(\bar{x}), an open ball centered at \bar{x} having suitably small radius \delta > 0.

The value of the objective function f at a global minimizer or a local minimizer is also of interest. We call it the global minimum value or a local minimum value, respectively.
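To make the distinction concrete, here is a small illustration (mine, not from the notes): a one-dimensional function with both a local and a global minimizer, where the point a local method returns depends on the starting point.

```python
# Illustrative only: f(x) = x^4 - 3x^2 + x has a local minimizer near +1.11
# and a global minimizer near -1.27; which one is found depends on x0.
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0] ** 4 - 3 * x[0] ** 2 + x[0]
for x0 in (2.0, -2.0):
    res = minimize(f, x0=np.array([x0]))
    print(f"start {x0:+.1f} -> minimizer {res.x[0]:+.4f}, value {res.fun:+.4f}")
# The run from -2 reaches the global minimizer; the run from +2 stops at the
# local one with a strictly larger objective value.
```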

Important Terms

- decision variable/activity, data/parameter
- objective/goal/target
- constraint/limitation/requirement
- satisfied/violated
- feasible/allowable solutions
- optimal (feasible) solutions
- optimal value

Size and Complexity of Problems

- number of decision variables
- number of constraints
- bit size/number required to store the problem input data
- problem difficulty or complexity number
- algorithm complexity or convergence speed

Real n-space; Euclidean Space

- \mathbb{R}, \mathbb{R}_+, int \mathbb{R}_+; and \mathbb{R}^n, \mathbb{R}^n_+, int \mathbb{R}^n_+.
- x \geq y means x_j \geq y_j for j = 1, 2, \ldots, n.
- 0: the all-zero vector; and e: the all-one vector.
- Column vector x = (x_1; x_2; \ldots; x_n) and row vector x = (x_1, x_2, \ldots, x_n).
- Inner product: x \bullet y := x^T y = \sum_{j=1}^n x_j y_j.

Vector norms:

\[
\|x\|_2 = \sqrt{x^T x}, \qquad \|x\|_\infty = \max\{|x_1|, |x_2|, \ldots, |x_n|\},
\]

and in general, for p \geq 1,

\[
\|x\|_p = \left( \sum_{j=1}^n |x_j|^p \right)^{1/p}.
\]

A set of vectors a_1, \ldots, a_m is said to be linearly dependent if there are multipliers \lambda_1, \ldots, \lambda_m, not all zero, such that the linear combination

\[
\sum_{i=1}^m \lambda_i a_i = 0.
\]

A linearly independent set of vectors that span \mathbb{R}^n is a basis.

For a sequence x^k \in \mathbb{R}^n, k = 0, 1, \ldots, we say it is a contraction sequence if there is an x^* \in \mathbb{R}^n and a scalar constant 0 < \gamma < 1 such that

\[
\|x^{k+1} - x^*\| \leq \gamma \|x^k - x^*\|, \quad \forall k \geq 0.
\]
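The numerical companion below (an illustration of mine, assuming numpy) evaluates these norms, tests linear dependence via matrix rank, and runs one contraction sequence:

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])
print(np.linalg.norm(x, 2), np.linalg.norm(x, np.inf), np.linalg.norm(x, 1.5))

# a_1, a_2, a_3 are linearly dependent iff the matrix with these columns
# has rank < 3; here a_3 = a_1 + a_2, so they cannot form a basis of R^3.
A = np.column_stack([[1.0, 0, 0], [0, 1.0, 0], [1.0, 1.0, 0]])
print(np.linalg.matrix_rank(A))   # 2

# x^{k+1} = x^k / 2 is a contraction sequence with x* = 0 and gamma = 1/2.
xk = np.array([1.0, 1.0])
for _ in range(3):
    print(np.linalg.norm(xk))
    xk = xk / 2
```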

Matrices

- A \in \mathbb{R}^{m \times n}; a_{i.} is the ith row vector, a_{.j} is the jth column vector, and a_{ij} is the (i, j)th entry.
- 0: the all-zero matrix; and I: the identity matrix.
- The null space \mathcal{N}(A) and the row space \mathcal{R}(A^T); they are orthogonal to each other.
- \det(A); \operatorname{tr}(A): the sum of the diagonal entries of A.
- Inner product: A \bullet B = \operatorname{tr}(A^T B) = \sum_{i,j} a_{ij} b_{ij}.
- The operator norm of matrix A:
\[
\|A\|_2 := \max_{0 \neq x \in \mathbb{R}^n} \frac{\|Ax\|_2}{\|x\|_2}.
\]
- The Frobenius norm of matrix A:
\[
\|A\|_f^2 := A \bullet A = \sum_{i,j} a_{ij}^2.
\]
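A quick numerical check (illustrative, assuming numpy) that the matrix inner product can be computed either way, and that the operator and Frobenius norms generally differ:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.eye(2)
print(np.trace(A.T @ B), np.sum(A * B))                  # A . B two ways: both 5.0
print(np.linalg.norm(A, 2))                              # operator norm = largest singular value
print(np.linalg.norm(A, "fro"), np.sqrt(np.sum(A**2)))   # Frobenius norm two ways
```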

Sometimes we use X = \operatorname{diag}(x).

Eigenvalues and eigenvectors: Av = \lambda v.

Perron-Frobenius theorem: a real square matrix with positive entries has a unique largest real eigenvalue, and the corresponding eigenvector can be chosen to have strictly positive components.

Stochastic matrices: A \geq 0 with e^T A = e^T (column-stochastic) or Ae = e (row-stochastic). Such a matrix has a unique largest real eigenvalue 1 and a corresponding non-negative right or left eigenvector.
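An illustration (mine, assuming numpy): the largest eigenvalue of a column-stochastic matrix is 1, and its eigenvector can be normalized into a non-negative stationary distribution.

```python
import numpy as np

A = np.array([[0.5, 0.3], [0.5, 0.7]])   # column-stochastic: columns sum to 1
vals, vecs = np.linalg.eig(A)
k = np.argmax(vals.real)
print(vals.real[k])                       # largest eigenvalue: 1.0
v = np.abs(vecs[:, k].real)
print(v / v.sum())                        # non-negative eigenvector: [0.375, 0.625]
```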

Symmetric Matrices \mathcal{S}^n

The Frobenius norm: \|X\|_f = \sqrt{\operatorname{tr}(X^T X)} = \sqrt{X \bullet X}.

Positive definite (PD): Q \succ 0 iff x^T Q x > 0 for all x \neq 0. The sum of PD matrices is PD.

Positive semidefinite (PSD): Q \succeq 0 iff x^T Q x \geq 0 for all x. The sum of PSD matrices is PSD.

PSD matrices form \mathcal{S}^n_+; int \mathcal{S}^n_+ is the set of all positive definite matrices.
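In practice one tests PSD-ness through eigenvalues of the symmetric matrix; a small illustrative sketch (mine, assuming numpy):

```python
import numpy as np

def is_psd(Q, tol=1e-10):
    # Q symmetric; PSD iff its smallest eigenvalue is non-negative.
    return np.min(np.linalg.eigvalsh(Q)) >= -tol

Q1 = np.array([[2.0, -1.0], [-1.0, 2.0]])   # PD (eigenvalues 1 and 3)
Q2 = np.array([[1.0, 2.0], [2.0, 1.0]])     # indefinite (eigenvalues -1 and 3)
print(is_psd(Q1), is_psd(Q2))               # True False
print(is_psd(Q1 + Q1))                      # sums of PSD matrices stay PSD: True
```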

Affine Set

S \subseteq \mathbb{R}^n is affine if

\[
x, y \in S \text{ and } \alpha \in \mathbb{R} \ \Longrightarrow\ \alpha x + (1 - \alpha) y \in S.
\]

When x and y are two distinct points in \mathbb{R}^n and \alpha runs over \mathbb{R}, \{z : z = \alpha x + (1 - \alpha) y\} is the set of affine combinations of x and y (the line through them). When 0 \leq \alpha \leq 1, \alpha x + (1 - \alpha) y is called a convex combination of x and y. For \alpha \geq 0 and \beta \geq 0,

\[
\{z : z = \alpha x + \beta y\}
\]

is the set of conic combinations of x and y.

Convex Set

\Omega is said to be a convex set if for every x_1, x_2 \in \Omega and every real number \alpha \in [0, 1], the point \alpha x_1 + (1 - \alpha) x_2 \in \Omega.

Ball and ellipsoid: for given y \in \mathbb{R}^n and positive definite matrix Q,

\[
E(y, Q) = \{x : (x - y)^T Q (x - y) \leq 1\}.
\]

The intersection of convex sets is convex.

The convex hull of a set \Omega is the intersection of all convex sets containing \Omega.

An extreme point of a convex set is a point that cannot be expressed as a convex combination of two other distinct points of the set.

A convex set is polyhedral if it can be represented by finitely many linear constraints; such a set has finitely many extreme points. Both \{x : Ax = b, x \geq 0\} and \{x : Ax \leq b\} are convex polyhedral sets.

Cone and Convex Cone

- A set C is a cone if x \in C implies \alpha x \in C for all \alpha > 0.
- The intersection of cones is a cone.
- A convex cone is a cone that is also a convex set.
- A pointed cone is a cone that does not contain a line.
- Dual cone: C^* := \{y : x \bullet y \geq 0 \text{ for all } x \in C\}. The dual cone is always a closed convex cone.

Cone Examples

Example 1: The n-dimensional non-negative orthant, \mathbb{R}^n_+ = \{x \in \mathbb{R}^n : x \geq 0\}, is a convex cone. Its dual is itself.

Example 2: The set of all PSD matrices in \mathcal{S}^n, written \mathcal{S}^n_+, is a convex cone, called the PSD matrix cone. Its dual is itself.

Example 3: The set \{(t; x) \in \mathbb{R}^{n+1} : t \geq \|x\|_p\} for p \geq 1 is a convex cone in \mathbb{R}^{n+1}, called the p-order cone. Its dual is the q-order cone with \frac{1}{p} + \frac{1}{q} = 1. The dual of the second-order cone (p = 2) is itself.
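A sanity check of mine (not from the notes, assuming numpy): self-duality of the second-order cone implies any two points of the cone have non-negative inner product, which random sampling confirms.

```python
import numpy as np

rng = np.random.default_rng(0)

def soc_point(n):
    # Sample (t; x) with t >= ||x||_2, i.e., a point of the second-order cone.
    x = rng.standard_normal(n)
    t = np.linalg.norm(x) + rng.random()
    return np.concatenate([[t], x])

pts = [soc_point(4) for _ in range(200)]
# t*s + x^T y >= t*s - ||x|| ||y|| >= 0 for cone points, up to rounding.
print(min(float(u @ v) for u in pts for v in pts) >= -1e-9)   # True
```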

Polyhedral Convex Cones

A cone C is (convex) polyhedral if C can be represented by

\[
C = \{x : Ax \leq 0\} \qquad \text{or} \qquad C = \{Ax : x \geq 0\}
\]

for some matrix A.

Figure 1: Polyhedral and nonpolyhedral cones. The non-negative orthant is a polyhedral cone, and neither the PSD matrix cone nor the second-order cone is polyhedral.

Real Functions

Continuous functions. Weierstrass theorem: a continuous function f defined on a compact (bounded and closed) set \Omega \subset \mathbb{R}^n has a minimizer in \Omega.

A function f(x) is called homogeneous of degree k if f(\alpha x) = \alpha^k f(x) for all \alpha \geq 0.

Example: Let c \in \mathbb{R}^n be given and x \in int \mathbb{R}^n_+. Then c^T x is homogeneous of degree 1, and

\[
P(x) = n \log(c^T x) - \sum_{j=1}^n \log x_j
\]

is homogeneous of degree 0.

Example: Let C \in \mathcal{S}^n be given and X \in int \mathcal{S}^n_+. Then x^T C x is homogeneous of degree 2; C \bullet X and \det(X) are homogeneous of degree 1 and n, respectively; and

\[
P(X) = n \log(C \bullet X) - \log \det(X)
\]

is homogeneous of degree 0.
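A numerical spot-check (illustrative, assuming numpy) of the degree-0 homogeneity of P(x) = n log(c^T x) - \sum_j log x_j: the n log(alpha) contributed by the first term cancels against the sum of logs.

```python
import numpy as np

n = 4
rng = np.random.default_rng(1)
c, x = rng.random(n) + 0.1, rng.random(n) + 0.1

def P(x):
    return n * np.log(c @ x) - np.sum(np.log(x))

alpha = 3.7
print(P(x), P(alpha * x))   # equal up to rounding: P(alpha x) = P(x)
```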

The gradient vector:

\[
\nabla f(x) = \{\partial f / \partial x_i\}, \quad i = 1, \ldots, n.
\]

The Hessian matrix:

\[
\nabla^2 f(x) = \left\{ \frac{\partial^2 f}{\partial x_i \partial x_j} \right\}, \quad i = 1, \ldots, n; \ j = 1, \ldots, n.
\]

Vector function: f = (f_1; f_2; \ldots; f_m).

The Jacobian matrix of f:

\[
\nabla f(x) = \begin{pmatrix} \nabla f_1(x) \\ \vdots \\ \nabla f_m(x) \end{pmatrix}.
\]

The least upper bound or supremum of f over \Omega: \sup\{f(x) : x \in \Omega\}; and the greatest lower bound or infimum of f over \Omega: \inf\{f(x) : x \in \Omega\}.
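In computations these derivatives can be validated by central finite differences; a small sketch of mine (the function f is invented for the example):

```python
import numpy as np

def f(x):
    return x[0] ** 2 * x[1] + np.exp(x[0])

def grad_fd(f, x, h=1e-6):
    # Central-difference approximation of the gradient vector.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x = np.array([0.5, 2.0])
print(grad_fd(f, x))
print(np.array([2 * x[0] * x[1] + np.exp(x[0]), x[0] ** 2]))  # exact gradient
```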

Convex Functions

f is a convex function iff for 0 \leq \alpha \leq 1,

\[
f(\alpha x + (1 - \alpha) y) \leq \alpha f(x) + (1 - \alpha) f(y).
\]

The sum of convex functions is a convex function; the max of convex functions is a convex function.

The (lower) level set of a convex f is convex: L(z) = \{x : f(x) \leq z\}.

The convex set \{(z; x) : f(x) \leq z\} is called the epigraph of f.

t f(x/t) is a convex function of (t; x) for t > 0; it is homogeneous of degree 1.

Convex Function Examples

\|x\|_p for p \geq 1:

\[
\|\alpha x + (1 - \alpha) y\|_p \leq \|\alpha x\|_p + \|(1 - \alpha) y\|_p = \alpha \|x\|_p + (1 - \alpha) \|y\|_p,
\]

from the triangle inequality.

e^{x_1} + e^{x_2} + e^{x_3}.

\log(e^{x_1} + e^{x_2} + e^{x_3}): we will prove it later.
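Pending the proof, one can spot-check convexity of the log-sum-exp function numerically along random segments; an illustration of mine, assuming numpy:

```python
import numpy as np

def f(x):
    # log-sum-exp of x in R^3
    return np.log(np.sum(np.exp(x)))

rng = np.random.default_rng(2)
ok = True
for _ in range(1000):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    a = rng.random()
    # convexity: f(a x + (1-a) y) <= a f(x) + (1-a) f(y)
    ok &= f(a * x + (1 - a) * y) <= a * f(x) + (1 - a) * f(y) + 1e-12
print(ok)   # True
```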

Example: Proof of convex function

Consider the minimal-objective function of b for fixed A:

\[
z(b) := \begin{array}{ll} \text{minimize} & f(x) \\ \text{subject to} & Ax = b, \ x \geq 0, \end{array}
\]

where f(x) is a convex function. Show that z(b) is a convex function in b.
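A proof sketch (one standard argument; the slides leave this as an exercise): let b_1, b_2 be given and let x_1, x_2 be optimal (or near-optimal) for z(b_1) and z(b_2). For \alpha \in [0, 1], the point x_\alpha = \alpha x_1 + (1 - \alpha) x_2 satisfies x_\alpha \geq 0 and A x_\alpha = \alpha b_1 + (1 - \alpha) b_2, so it is feasible for the problem defining z(\alpha b_1 + (1 - \alpha) b_2). Hence

\[
z(\alpha b_1 + (1 - \alpha) b_2) \leq f(x_\alpha) \leq \alpha f(x_1) + (1 - \alpha) f(x_2) = \alpha z(b_1) + (1 - \alpha) z(b_2),
\]

where the second inequality is the convexity of f.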

Theorems on functions

Taylor's theorem or the mean-value theorem:

Theorem 1. Let f \in C^1 be in a region containing the line segment [x, y]. Then there is an \alpha, 0 \leq \alpha \leq 1, such that

\[
f(y) = f(x) + \nabla f(\alpha x + (1 - \alpha) y)(y - x).
\]

Furthermore, if f \in C^2, then there is an \alpha, 0 \leq \alpha \leq 1, such that

\[
f(y) = f(x) + \nabla f(x)(y - x) + (1/2)(y - x)^T \nabla^2 f(\alpha x + (1 - \alpha) y)(y - x).
\]

Theorem 2. Let f \in C^1. Then f is convex over a convex set \Omega if and only if

\[
f(y) \geq f(x) + \nabla f(x)(y - x) \quad \text{for all } x, y \in \Omega.
\]

Theorem 3. Let f \in C^2. Then f is convex over a convex set \Omega if and only if the Hessian matrix of f is positive semidefinite throughout \Omega.

Theorem 4. Suppose we have a set of m equations in n variables

\[
h_i(x) = 0, \quad i = 1, \ldots, m,
\]

where h_i \in C^p for some p \geq 1. Then a set of m variables can be expressed as implicit functions of the other n - m variables in the neighborhood of a feasible point when the Jacobian matrix with respect to those m variables is nonsingular.

Lipschitz Functions

The first-order \beta-Lipschitz function: there is a positive number \beta such that for any two points x and y,

\[
\|\nabla f(x) - \nabla f(y)\| \leq \beta \|x - y\|. \tag{1}
\]

This condition implies

\[
|f(x) - f(y) - \nabla f(y)^T (x - y)| \leq \frac{\beta}{2} \|x - y\|^2.
\]

The second-order \beta-Lipschitz function: there is a positive number \beta such that for any two points x and y,

\[
\|\nabla f(x) - \nabla f(y) - \nabla^2 f(y)(x - y)\| \leq \beta \|x - y\|^2. \tag{2}
\]

This condition implies

\[
\left| f(x) - f(y) - \nabla f(y)^T (x - y) - \frac{1}{2}(x - y)^T \nabla^2 f(y)(x - y) \right| \leq \frac{\beta}{3} \|x - y\|^3.
\]

Known Inequalities

Cauchy-Schwarz (more generally, Hölder): given x, y \in \mathbb{R}^n,

\[
|x^T y| \leq \|x\|_p \|y\|_q, \quad \text{where } \frac{1}{p} + \frac{1}{q} = 1 \text{ and } p \geq 1.
\]

Triangle: given x, y \in \mathbb{R}^n, \|x + y\|_p \leq \|x\|_p + \|y\|_p for p \geq 1.

Arithmetic-geometric mean: given x \in \mathbb{R}^n_+,

\[
\frac{\sum_j x_j}{n} \geq \left( \prod_j x_j \right)^{1/n}.
\]

System of linear equations

Given A \in \mathbb{R}^{m \times n} and b \in \mathbb{R}^m, the problem is to determine n unknowns from m linear equations: Ax = b.

Theorem 5. Let A \in \mathbb{R}^{m \times n} and b \in \mathbb{R}^m. The system \{x : Ax = b\} has a solution if and only if A^T y = 0 and b^T y \neq 0 has no solution.

A vector y with A^T y = 0 and b^T y \neq 0 is called an infeasibility certificate for the system.

Alternative system pairs: \{x : Ax = b\} and \{y : A^T y = 0, b^T y \neq 0\}.
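An illustrative check of mine (assuming numpy/scipy) of the alternative: for an inconsistent system, projecting b onto the null space of A^T produces an infeasibility certificate.

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 0.0], [1.0, 0.0]])   # rank 1: both rows are equal
b = np.array([1.0, 2.0])                 # inconsistent right-hand side

N = null_space(A.T)                      # orthonormal basis of {y : A^T y = 0}
y = N @ (N.T @ b)                        # project b onto that null space
print(np.allclose(A.T @ y, 0), b @ y)    # True, 0.5 != 0 => certificate found
```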

Gaussian Elimination and LU Decomposition

One step of Gaussian elimination reduces the system to an equivalent one with a smaller block:

\[
\begin{pmatrix} a_{11} & A_{1.} \\ 0 & A' \end{pmatrix} \begin{pmatrix} x_1 \\ x' \end{pmatrix} = \begin{pmatrix} b_1 \\ b' \end{pmatrix},
\]

and the overall process yields a decomposition

\[
A = L \begin{pmatrix} U & C \\ 0 & 0 \end{pmatrix},
\]

where L is lower triangular and U is a nonsingular upper triangular matrix. The method runs in O(n^3) time for n equations with n unknowns.
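In practice one computes a pivoted variant of this factorization once and reuses it; a small sketch of mine using scipy (the matrix and right-hand side are invented):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[ 2.0,  1.0, 1.0],
              [ 4.0, -6.0, 0.0],
              [-2.0,  7.0, 2.0]])
b = np.array([5.0, -2.0, 9.0])

lu, piv = lu_factor(A)        # O(n^3) factorization (with partial pivoting)
x = lu_solve((lu, piv), b)    # cheap triangular solves per right-hand side
print(np.allclose(A @ x, b))  # True
```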

Linear least-squares problem

Given A \in \mathbb{R}^{m \times n} and c \in \mathbb{R}^n,

\[
\text{(LS)} \quad \begin{array}{ll} \text{minimize} & \|A^T y - c\|^2 \\ \text{subject to} & y \in \mathbb{R}^m, \end{array}
\qquad \text{or} \qquad
\begin{array}{ll} \text{minimize} & \|s - c\|^2 \\ \text{subject to} & s \in \mathcal{R}(A^T).
\end{array}
\]

The normal equation: A A^T y = A c.

Cholesky decomposition: A A^T = L \Lambda L^T, and then solve L \Lambda L^T y = A c.

Projection matrices: A^T (A A^T)^{-1} A and I - A^T (A A^T)^{-1} A.
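An illustrative sketch of mine (assuming numpy/scipy with A of full row rank): solve (LS) via the normal equations with a Cholesky factorization, compare against numpy's least-squares routine, and verify the projection matrix is idempotent.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 6))          # m = 3, n = 6, full row rank (a.s.)
c = rng.standard_normal(6)

cf = cho_factor(A @ A.T)                 # Cholesky factorization of A A^T
y = cho_solve(cf, A @ c)                 # solve the normal equation A A^T y = A c
y_ref, *_ = np.linalg.lstsq(A.T, c, rcond=None)
print(np.allclose(y, y_ref))             # True

P = A.T @ np.linalg.inv(A @ A.T) @ A     # projection onto R(A^T)
print(np.allclose(P @ P, P))             # projections are idempotent: True
```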

Solving ball-constrained linear problem

\[
\text{(BP)} \quad \begin{array}{ll} \text{minimize} & c^T x \\ \text{subject to} & Ax = 0, \ \|x\|^2 \leq 1. \end{array}
\]

x^* minimizes (BP) if and only if there exists a y^* such that they satisfy A A^T y^* = A c, and if c - A^T y^* \neq 0 then

\[
x^* = -(c - A^T y^*) / \|c - A^T y^*\|;
\]

otherwise any feasible x is a minimal solution.

Solving ball-constrained linear problem

\[
\text{(BD)} \quad \begin{array}{ll} \text{minimize} & b^T y \\ \text{subject to} & \|A^T y\|^2 \leq 1. \end{array}
\]

The solution y^* for (BD) is given as follows: solve A A^T \bar{y} = b, and if \bar{y} \neq 0 then set

\[
y^* = -\bar{y} / \|A^T \bar{y}\|;
\]

otherwise any feasible y is a solution.
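A numerical check of mine (assuming numpy/scipy; problem data invented) of both closed-form solutions: the (BP) formula is compared against random feasible points, and the (BD) solution is verified to sit on the boundary of the ball.

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(4)
A = rng.standard_normal((2, 5))
c = rng.standard_normal(5)
b = rng.standard_normal(2)

# (BP): x* = -(c - A^T y*)/||c - A^T y*|| with A A^T y* = A c.
y = np.linalg.solve(A @ A.T, A @ c)
x_star = -(c - A.T @ y) / np.linalg.norm(c - A.T @ y)
N = null_space(A)
for _ in range(1000):                    # random feasible points: Ax = 0, ||x|| <= 1
    z = N @ rng.standard_normal(N.shape[1])
    x = z / max(np.linalg.norm(z), 1.0)
    assert c @ x_star <= c @ x + 1e-9    # x* is never beaten

# (BD): y* = -ybar/||A^T ybar|| with A A^T ybar = b.
ybar = np.linalg.solve(A @ A.T, b)
y_star = -ybar / np.linalg.norm(A.T @ ybar)
print(np.linalg.norm(A.T @ y_star))      # 1.0: on the boundary of the ball
print(b @ y_star)                        # minimal value: -||A^T ybar||
```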