Nonlinear Programming Algorithms Handout


Michael C. Ferris
Computer Sciences Department
University of Wisconsin
Madison, Wisconsin 53706

September 9

1 Eigenvalues

The eigenvalues of a matrix A ∈ C^{n×n} are the roots of the characteristic polynomial φ(λ) := det(λI - A). The set of all eigenvalues is called the spectrum and is denoted by λ(A). Eigenvectors are nonzero vectors x satisfying Ax = λx for some λ ∈ λ(A).

If A ∈ R^{n×n} is symmetric, then

    A = QΛQ^T = Σ_{i=1}^n λ_i q_i q_i^T,

where Q ∈ R^{n×n} is orthogonal (its columns q_i form a basis of eigenvectors) and Λ ∈ R^{n×n} is diagonal (each diagonal entry is an eigenvalue). Note that in this case the eigenvalues and eigenvectors may be taken to be real. The above factorization is called a spectral decomposition. For real symmetric matrices, the eigenvalues are assumed to be ordered so that λ_1 ≥ λ_2 ≥ ... ≥ λ_n.

2 Singular Values

Orthogonal matrices satisfy U^T U = UU^T = I. If A ∈ R^{m×n}, then there exist orthogonal matrices U = [u_1, ..., u_m] ∈ R^{m×m} and V = [v_1, ..., v_n] ∈ R^{n×n} such that

    U^T A V = diag(σ_1, ..., σ_p) ∈ R^{m×n},

where p = min{m, n} and σ_1 ≥ σ_2 ≥ ... ≥ σ_p ≥ 0. This is often written as A = USV^T and is termed the singular value decomposition. Singular values are non-negative; σ_i(A) denotes the ith largest singular value of A.
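As a quick numerical illustration (added in this transcription, not part of the original handout), the following minimal numpy sketch builds a random symmetric matrix, forms its spectral decomposition A = QΛQ^T, and checks that the singular values of a rectangular matrix agree with the square roots of the eigenvalues of M M^T; the matrices and seed are arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)

    # Spectral decomposition of a random symmetric matrix.
    B = rng.standard_normal((4, 4))
    A = (B + B.T) / 2                      # symmetrize
    lam, Q = np.linalg.eigh(A)             # ascending eigenvalues, orthogonal Q
    Lam = np.diag(lam)
    assert np.allclose(A, Q @ Lam @ Q.T)   # A = Q Λ Q^T

    # Singular value decomposition of a rectangular matrix.
    M = rng.standard_normal((5, 3))
    U, s, VT = np.linalg.svd(M)            # M = U diag(s) V^T, s sorted descending
    assert np.allclose(M, U[:, :3] * s @ VT)

    # Singular values are the non-negative square roots of the largest eigenvalues of M M^T.
    eig_MMT = np.sort(np.linalg.eigvalsh(M @ M.T))[::-1][:3]
    assert np.allclose(s, np.sqrt(np.maximum(eig_MMT, 0.0)))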

then rank(a) = r, ker(a) = span{v r+1,..., v n }, im(a) = span{u 1,..., u r } and r A = σ i u i vi T. How are eigenvalues and singular values related? σ(a) = + λ(aa T ) Definition 1 A matrix A is positive definite if x T Ax > for all x. A matrix A is positive semidefinite if x T Ax for all x. When A is a symmetric positive definite matrix, then singular values and eigenvalues coincide (take U = V = Q and S = Λ), and thus σ 1 (A) = largest eigenvalue of A If A is square, then σ n (A) = smallest eigenvalue of A σ n (A) min i λ i max λ i σ 1 (A) i Lemma For a symmetric matrix A R n n, the following are equivalent 1. A is positive definite. The (n) leading principal subdeterminants of A are strictly positive 3. All principal subdeterminants of A are strictly positive 4. The eigenvalues of A are strictly positive The matrix [ 1 3 1 shows symmetry to be necessary for this result. ] 3 Matrix Norms Suppose A R m n. A α,β := sup x Ax β x α In particular, when α = β = p, this is called the p-norm of A. The Frobenius norm is defined by A F = m nj=1 A ij. Various properties of matrix norms detail in GVL (pp 54). Interesting facts: A = σ 1

3 Matrix Norms

Suppose A ∈ R^{m×n}. Define

    ||A||_{α,β} := sup_{x ≠ 0} ||Ax||_β / ||x||_α.

In particular, when α = β = p, this is called the p-norm of A. The Frobenius norm is defined by

    ||A||_F = √(Σ_{i=1}^m Σ_{j=1}^n A_{ij}²).

Various properties of matrix norms are detailed in GVL (pp. 54 ff.). Interesting facts:

    ||A||_2 = σ_1
    ||A||_F = √(Σ_{i=1}^p σ_i²),  p = min(m, n)
    max_{i,j} |A_{ij}| ≤ ||A||_2 ≤ ||A||_F ≤ √n ||A||_2
    ||A||_1 = max_{1≤j≤n} Σ_{i=1}^m |A_{ij}|
    ||A||_∞ = max_{1≤i≤m} Σ_{j=1}^n |A_{ij}|
    ||A||_2 ≤ √(mn) max_{i,j} |A_{ij}|
    ||A||_2² ≤ ||A||_1 ||A||_∞
    min_{x ≠ 0} ||Ax||_2 / ||x||_2 = σ_n   (A square)

Lemma 3 Let A be a real symmetric matrix. For each x ∈ R^n,

    λ_n ||x||² ≤ x^T A x ≤ λ_1 ||x||².

If A is also positive definite, then ||A||_2 = λ_1.

Proof Since A = QΛQ^T (Q invertible), we let x = Qy for some y. Then

    x^T A x = y^T Q^T A Q y = y^T Λ y = Σ_{i=1}^n λ_i y_i².

Also

    ||y||² = y^T y = x^T Q Q^T x = ||x||².

Thus,

    λ_n ||x||² = λ_n Σ_{i=1}^n y_i² ≤ Σ_{i=1}^n λ_i y_i² (= x^T A x) ≤ λ_1 Σ_{i=1}^n y_i² = λ_1 ||x||².

Finally, Ax = QΛQ^T Qy = QΛy, so

    ||Ax||² = y^T Λ Q^T Q Λ y = Σ_{i=1}^n λ_i² y_i² ≤ λ_1² ||y||² = λ_1² ||x||²,

the inequality following from the positive definite assumption. Hence ||Ax|| / ||x|| ≤ λ_1, so ||A||_2 = sup_{x ≠ 0} ||Ax|| / ||x|| ≤ λ_1. But the supremum is attained at an eigenvector of A corresponding to λ_1.
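The following numpy sketch (an illustration added in this transcription, not part of the handout) spot-checks some of the listed norm relations and the quadratic-form bounds of Lemma 3 on random matrices; all names and data are arbitrary.

    import numpy as np

    rng = np.random.default_rng(1)
    m, n = 5, 4
    A = rng.standard_normal((m, n))
    s = np.linalg.svd(A, compute_uv=False)      # singular values, descending

    # Norm facts from Section 3.
    assert np.isclose(np.linalg.norm(A, 2), s[0])                       # ||A||_2 = σ_1
    assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2)))  # ||A||_F
    assert np.linalg.norm(A, 2) <= np.linalg.norm(A, 'fro') <= np.sqrt(n) * np.linalg.norm(A, 2)
    assert np.linalg.norm(A, 2)**2 <= np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf)

    # Lemma 3: λ_n ||x||^2 <= x^T S x <= λ_1 ||x||^2 for symmetric S.
    S = A.T @ A                                  # symmetric
    lam = np.linalg.eigvalsh(S)                  # ascending: lam[0] = λ_n, lam[-1] = λ_1
    for _ in range(100):
        x = rng.standard_normal(n)
        q = x @ S @ x
        assert lam[0] * x @ x - 1e-9 <= q <= lam[-1] * x @ x + 1e-9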

Definition 4 Suppose A ∈ R^{n×n} is nonsingular. The condition number of A, κ(A), is defined to be ||A|| ||A^{-1}||.

The condition number of a matrix depends on the underlying norm. If A is singular, then κ(A) = ∞. Note that

    κ_2(A) = ||A||_2 ||A^{-1}||_2 = σ_1(A) / σ_n(A).

Corollary 5 If A is real symmetric positive definite, then ||A^{-1}|| = λ_n^{-1}, κ(A) = λ_1 / λ_n, and

    λ_1^{-1} ||x||² ≤ x^T A^{-1} x ≤ λ_n^{-1} ||x||².

Proof The eigenvalues of A^{-1} are simply λ_i^{-1}.

Definition 6 Suppose A ∈ C^{n×n}. The spectral radius of A, ρ(A), is defined as the maximum of |λ_1|, ..., |λ_n|, where λ_1, ..., λ_n are the eigenvalues of A.

Note that if A ∈ C^{n×n} and λ is any eigenvalue of A with eigenvector u, then ||Au|| = |λ| ||u||, so that |λ| ≤ ||A||. Hence ρ(A) ≤ ||A||.

Lemma 7 Let A ∈ C^{n×n}. Then lim_{k→∞} A^k = 0 if and only if ρ(A) < 1.

Lemma 8 (Neumann) Let E ∈ R^{n×n} and ρ(E) < 1. Then I - E is nonsingular and

    (I - E)^{-1} = lim_{k→∞} Σ_{i=0}^{k} E^i.

Proof Since ρ(E) < 1, the maximum eigenvalue of E must be less than 1 in modulus and hence I - E is invertible. Furthermore

    (I - E)(I + E + ... + E^{k-1}) = I - E^k,

so that

    I + E + ... + E^{k-1} = (I - E)^{-1} - (I - E)^{-1} E^k.

The result now follows in the limit as k → ∞ from the above lemma.
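A small numpy illustration of Lemma 8 (added here, not in the original handout): for a matrix E rescaled so that ρ(E) < 1, the partial sums of the Neumann series converge to (I - E)^{-1}.

    import numpy as np

    rng = np.random.default_rng(2)
    E = rng.standard_normal((4, 4))
    E *= 0.9 / max(abs(np.linalg.eigvals(E)))   # rescale so the spectral radius is 0.9 < 1

    I = np.eye(4)
    inv = np.linalg.inv(I - E)

    partial, power = np.zeros((4, 4)), I.copy()
    for k in range(200):
        partial += power                        # partial = I + E + ... + E^k
        power = power @ E
    print(np.linalg.norm(partial - inv))        # small: the Neumann series has converged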

4 Results from Matrix Theory

Lemma 9 (Debreu) Suppose H ∈ R^{n×n} and A ∈ R^{m×n}. The following are equivalent:
(a) Az = 0 and z ≠ 0 implies ⟨z, Hz⟩ > 0
(b) There exists γ̄ such that H + γ̄A^T A is positive definite

Remark If (b) holds and γ ≥ γ̄, then for any z ≠ 0,

    ⟨z, (H + γA^T A)z⟩ = ⟨z, (H + γ̄A^T A)z⟩ + (γ - γ̄)||Az||² ≥ ⟨z, (H + γ̄A^T A)z⟩ > 0,

so that H + γA^T A is also positive definite.

Proof (b) ⇒ (a) If Az = 0 and z ≠ 0, then ⟨z, Hz⟩ = ⟨z, (H + γ̄A^T A)z⟩ > 0.

(a) ⇒ (b) If (b) were false, then there would exist z_k of norm 1 with

    ⟨z_k, (H + kA^T A)z_k⟩ ≤ 0.

Without loss of generality, we can assume that z_k → z. Now for each k,

    ⟨z_k, (k^{-1}H + A^T A)z_k⟩ ≤ 0,

and taking the limit as k → ∞ gives ||Az||² ≤ 0, so Az = 0. But

    ⟨z_k, Hz_k⟩ + k||Az_k||² ≤ 0   implies   ⟨z_k, Hz_k⟩ ≤ 0, hence ⟨z, Hz⟩ ≤ 0,

and this contradicts (a), since ||z|| = 1.

Lemma 10 (Schur) Suppose that

    M = [ A  B ]
        [ C  D ]                                                          (1)

with D nonsingular. Then M is nonsingular if and only if S = A - BD^{-1}C is nonsingular, and in that case

    M^{-1} = [  S^{-1}            -S^{-1}BD^{-1}                 ]
             [ -D^{-1}CS^{-1}      D^{-1} + D^{-1}CS^{-1}BD^{-1} ]        (2)

Remark S is the Schur complement of D in M. We can also show that det M = det S det D, since

    det M = det [ A  B ]
                [ C  D ]
          = det( [ A  B ] [ I         0 ] [ I        0 ] )
                ( [ C  D ] [ -D^{-1}C  I ] [ D^{-1}C  I ] )
          = det [ A - BD^{-1}C  B ]  det [ I        0 ]
                [ 0             D ]      [ D^{-1}C  I ]
          = det(A - BD^{-1}C) det D.

Proof If S is nonsingular, then just multiply the expressions given in (1) and (2) to obtain I. Thus assume M is nonsingular. Write its inverse as

    M^{-1} = [ E  F ]
             [ G  H ]

Since MM^{-1} = [ I 0; 0 I ], we have

    I = AE + BG       (3)
    0 = AF + BH       (4)
    0 = CE + DG       (5)
    I = CF + DH       (6)

From (5) we have G = -D^{-1}CE, and substituting this in (3) yields (A - BD^{-1}C)E = I. This implies that S = A - BD^{-1}C is nonsingular and that E = S^{-1}. Using (5) again we have that G = -D^{-1}CS^{-1}. From (6) we have H = D^{-1}(I - CF), and putting this in (4) yields

    AF + BD^{-1} - BD^{-1}CF = 0,

that is,

    SF + BD^{-1} = 0,

which implies F = -S^{-1}BD^{-1}.
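The block formulas in Lemma 10 and the determinant identity are easy to sanity-check numerically; the following numpy sketch (an added illustration, not part of the handout) assembles a random block matrix and compares formula (2) and det M = det S det D against direct computation.

    import numpy as np

    rng = np.random.default_rng(3)
    n_a, n_d = 3, 2
    A = rng.standard_normal((n_a, n_a))
    B = rng.standard_normal((n_a, n_d))
    C = rng.standard_normal((n_d, n_a))
    D = rng.standard_normal((n_d, n_d)) + 5 * np.eye(n_d)   # make D comfortably nonsingular

    M = np.block([[A, B], [C, D]])
    Dinv = np.linalg.inv(D)
    S = A - B @ Dinv @ C                                    # Schur complement of D in M
    Sinv = np.linalg.inv(S)

    # Block inverse from formula (2).
    M_inv_blocks = np.block([
        [Sinv,              -Sinv @ B @ Dinv],
        [-Dinv @ C @ Sinv,   Dinv + Dinv @ C @ Sinv @ B @ Dinv],
    ])
    assert np.allclose(M_inv_blocks, np.linalg.inv(M))

    # det M = det S det D
    assert np.isclose(np.linalg.det(M), np.linalg.det(S) * np.linalg.det(D))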

5 Matrix Norms

Suppose A ∈ R^{m×n}. Define

    ||A||_{α,β} := sup_{x ≠ 0} ||Ax||_β / ||x||_α.

In particular, when α = β = p, this is called the p-norm of A. The Frobenius norm is defined by

    ||A||_F = √(Σ_{i=1}^m Σ_{j=1}^n A_{ij}²).

Various properties of matrix norms are detailed in GVL (pp. 54 ff.). Interesting facts: The two-norm of A is the square root of the largest eigenvalue of A^T A. Moreover,

    max_{i,j} |A_{ij}| ≤ ||A||_2 ≤ ||A||_F ≤ √n ||A||_2
    ||A||_1 = max_{1≤j≤n} Σ_{i=1}^m |A_{ij}|
    ||A||_∞ = max_{1≤i≤m} Σ_{j=1}^n |A_{ij}|
    ||A||_2 ≤ √(mn) max_{i,j} |A_{ij}|
    ||A||_2² ≤ ||A||_1 ||A||_∞

The condition number of a matrix depends on the underlying norm, but is defined for a square matrix by

    κ(A) := ||A|| ||A^{-1}||.

If A is singular, then κ(A) = ∞. Note that

    κ_2(A) = ||A||_2 ||A^{-1}||_2 = σ_1(A) / σ_n(A),

where σ_i(A) is the ith largest singular value of A.
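As a brief illustration (added in this transcription), numpy's cond reproduces κ in several norms, and κ_2 matches the ratio of extreme singular values; the matrix is arbitrary.

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((4, 4))

    s = np.linalg.svd(A, compute_uv=False)
    kappa2 = np.linalg.cond(A, 2)
    assert np.isclose(kappa2, s[0] / s[-1])            # κ_2(A) = σ_1(A) / σ_n(A)

    # The condition number depends on the norm.
    for p in (1, 2, np.inf, 'fro'):
        direct = np.linalg.norm(A, p) * np.linalg.norm(np.linalg.inv(A), p)
        print(p, np.linalg.cond(A, p), direct)         # cond and the direct product agree for each norm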

If A is singular, then κ(a) =. Note that κ (A) = A inva = σ 1(A) σ n A where σ i (A) is the ith largest singular value of A. Singular values are non-negative. How are they related to eigenvalues? p A F = σi, p = min(m, n) A = σ 1 min x Ax x = σ n 6 Positive (Semi)-Definite Matrices Definition 11 A matrix A is positive definite if x T Ax > for all x. A matrix A is positive semidefinite if x T Ax for all x. Theorem 1 See Bertsekas 7 Strong Convexity Let f: Ω R, h: Ω R where Ω is an open convex set. Definition 13 f is strongly convex (ρ) on Ω if ρ > such that x, y Ω, λ [, 1] f((1 λ)x + λy) (1 λ)f(x) + λf(y) ρ λ(1 λ) x y Definition 14 h is strongly monotone (ρ) on Ω if ρ > such that x, y Ω h(x) h(y), x y ρ x y Theorem 15 (Strong convexity) If f is continuously differentiable on Ω then the following are equivalent: (a) f is strongly convex (ρ) on Ω (b) For all x, y Ω, f(y) f(x) + f(x), y x + (ρ/) x y (c) f is strongly monotone (ρ) on Ω If f is twice continuously differentiable on Ω, then (d) For all x,y,z Ω, x y, f(z)(x y) ρ x y 7

Theorem 15 (Strong convexity) If f is continuously differentiable on Ω, then the following are equivalent:
(a) f is strongly convex (ρ) on Ω
(b) For all x, y ∈ Ω, f(y) ≥ f(x) + ⟨∇f(x), y - x⟩ + (ρ/2)||x - y||²
(c) ∇f is strongly monotone (ρ) on Ω
If f is twice continuously differentiable on Ω, then
(d) For all x, y, z ∈ Ω, ⟨x - y, ∇²f(z)(x - y)⟩ ≥ ρ||x - y||²
is equivalent to the above.

Proof We show (a) ⇒ (b) ⇒ (c) ⇒ (b) ⇒ (a).

(a) ⇒ (b) The hypothesis gives

    (f(x + λ(y - x)) - f(x)) / λ ≤ f(y) - f(x) - (ρ/2)(1 - λ)||x - y||²,

so taking the limit as λ ↓ 0,

    ⟨∇f(x), y - x⟩ ≤ f(y) - f(x) - (ρ/2)||x - y||².

(b) ⇒ (c) Applying (b) twice gives

    f(y) ≥ f(x) + ⟨∇f(x), y - x⟩ + (ρ/2)||x - y||²
    f(x) ≥ f(y) + ⟨∇f(y), x - y⟩ + (ρ/2)||x - y||²

Adding these inequalities gives

    f(y) + f(x) ≥ f(x) + f(y) + ⟨∇f(x) - ∇f(y), y - x⟩ + ρ||x - y||²,

from where the result follows.

(c) ⇒ (b) The hypothesis gives

    f(y) - f(x) - ⟨∇f(x), y - x⟩ = ∫_0^1 ⟨∇f(x + t(y - x)) - ∇f(x), y - x⟩ dt ≥ ∫_0^1 ρ t ||x - y||² dt = (ρ/2)||x - y||²,

which implies the result.

(b) ⇒ (a) Letting y = u and x = (1 - λ)u + λv in (b) gives

    f(u) ≥ f((1 - λ)u + λv) + ⟨∇f((1 - λ)u + λv), λ(u - v)⟩ + (ρ/2)||λ(u - v)||²    (7)

Also letting y = v and x = (1 - λ)u + λv in (b) implies

    f(v) ≥ f((1 - λ)u + λv) + ⟨∇f((1 - λ)u + λv), (1 - λ)(v - u)⟩ + (ρ/2)||(1 - λ)(v - u)||²    (8)

Adding (1 - λ) times (7) to λ times (8) gives the required result.

To complete the proof, we assume that f is twice continuously differentiable on Ω.

(d) ⇒ (c) This follows from the hypothesis, since

    ⟨∇f(x) - ∇f(y), x - y⟩ = ⟨∫_0^1 ∇²f(y + t(x - y))(x - y) dt, x - y⟩ ≥ ρ||x - y||².

(c) ⇒ (d) Let x, y, z ∈ Ω. Then z + λ(x - y) ∈ Ω for sufficiently small λ > 0, so

    ⟨x - y, ∇²f(z + λ(x - y))(x - y)⟩ = ⟨x - y, (∇f(z + λ(x - y)) - ∇f(z)) / λ⟩ + o(1) ≥ ρ||x - y||² + o(1).

The result follows in the limit as λ ↓ 0.

8 Lipschitz Continuity

Definition 16 ∇f is Lipschitz continuous (ρ) on Ω if there exists ρ > 0 such that for all x, y ∈ Ω,

    ||∇f(y) - ∇f(x)|| ≤ ρ||y - x||.

Lemma 17 (Lipschitz continuity) Consider the following statements:
(a) ∇f is Lipschitz continuous on Ω with constant ρ
(b) For all x, y ∈ Ω, f(y) ≤ f(x) + ⟨∇f(x), y - x⟩ + (ρ/2)||x - y||²
(c) For all x, y ∈ Ω, ⟨∇f(x) - ∇f(y), x - y⟩ ≤ ρ||x - y||²
Then (a) implies (b) implies (c). If f is twice continuously differentiable, then (c) implies
(d) For all x, y, z ∈ Ω, ⟨y - x, ∇²f(z)(y - x)⟩ ≤ ρ||y - x||²
If ⟨y - x, ∇²f(z)(y - x)⟩ ≥ γ||y - x||² for some γ (not necessarily positive), then (d) implies (a), possibly with a different constant ρ.

Proof (a) ⇒ (b)

    f(y) - f(x) - ⟨∇f(x), y - x⟩ = ∫_0^1 ⟨∇f(x + t(y - x)) - ∇f(x), y - x⟩ dt
                                 ≤ ||y - x|| ∫_0^1 ||∇f(x + t(y - x)) - ∇f(x)|| dt
                                 ≤ ||y - x||² ρ ∫_0^1 t dt = (ρ/2)||y - x||².

(b) ⇒ (c) Invoking (b) twice gives

    f(y) ≤ f(x) + ⟨∇f(x), y - x⟩ + (ρ/2)||x - y||²
    f(x) ≤ f(y) + ⟨∇f(y), x - y⟩ + (ρ/2)||x - y||²

Adding these inequalities gives

    f(y) + f(x) ≤ f(x) + f(y) + ⟨∇f(x) - ∇f(y), y - x⟩ + ρ||x - y||²,

from where (c) follows.

(c) ⇒ (d) It follows from (c) that

    ⟨∇f(x + λ(y - x)) - ∇f(x), λ(y - x)⟩ ≤ ρλ²||y - x||².

If we divide both sides by λ², (d) then follows in the limit as λ ↓ 0.

(d) ⇒ (a) Let δ := 2 max{-γ, 0} + ρ. We first show that ||∇²f(z)|| ≤ δ. Note that

    ||∇²f(z)|| = sup_{||y||=1} ||∇²f(z)y|| = sup_{||x||=1, ||y||=1} ⟨x, ∇²f(z)y⟩.

However, for ||x|| = ||y|| = 1,

    2⟨x, ∇²f(z)y⟩ = ⟨x - y, ∇²f(z)(y - x)⟩ + ⟨x, ∇²f(z)x⟩ + ⟨y, ∇²f(z)y⟩
                  ≤ -γ||x - y||² + ρ(||x||² + ||y||²)
                  ≤ max{-γ, 0}(||x||² + 2||x|| ||y|| + ||y||²) + ρ(||x||² + ||y||²)
                  = 4 max{-γ, 0} + 2ρ.

Hence, ||∇²f(z)|| ≤ 2 max{-γ, 0} + ρ = δ, as required. The Lipschitz continuity now follows easily, since

    ∇f(y) - ∇f(x) = ∫_0^1 ∇²f(x + t(y - x))(y - x) dt

implies ||∇f(y) - ∇f(x)|| ≤ δ||y - x||.
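As a closing illustration (added in this transcription, not part of the handout), the inequalities of Definition 16 and Lemma 17(b) can be checked numerically for a quadratic whose gradient has a known Lipschitz constant: for f(x) = (1/2) x^T A x, ∇f(x) = Ax is Lipschitz with ρ = ||A||_2; the data below are arbitrary.

    import numpy as np

    rng = np.random.default_rng(6)
    B = rng.standard_normal((3, 3))
    A = (B + B.T) / 2                      # symmetric, not necessarily definite
    rho = np.linalg.norm(A, 2)             # Lipschitz constant of grad f(x) = A x

    f = lambda x: 0.5 * x @ A @ x
    grad = lambda x: A @ x

    for _ in range(100):
        x, y = rng.standard_normal(3), rng.standard_normal(3)
        # Definition 16: ||grad f(y) - grad f(x)|| <= rho ||y - x||
        assert np.linalg.norm(grad(y) - grad(x)) <= rho * np.linalg.norm(y - x) + 1e-9
        # Lemma 17(b): f(y) <= f(x) + <grad f(x), y - x> + (rho/2)||y - x||^2
        assert f(y) <= f(x) + grad(x) @ (y - x) + 0.5 * rho * np.sum((y - x)**2) + 1e-9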