TMA4180 Solutions to recommended exercises in Chapter 3 of N&W

Exercise 3.1

The steepest descent and Newton's methods with the backtracking algorithm are implemented in rosenbrock_newton.m. With the initial point x_0 = (1.2, 1.2)^T, Newton's method converged in 8 iterations, while steepest descent needs 3997 (!) iterations. For the initial point x_0 = (-1.2, 1)^T, Newton's method again converged quickly, while steepest descent needs 4076 (!) iterations. Clearly, the steepest descent method has a very slow convergence rate for this problem. Notice also that the convergence of Newton's method depends on the initial point.

Convergence of steepest descent

To show that the steepest descent method applied to the Rosenbrock function converges for all x_0, we use the theorem from the note "Convergence of descent methods with backtracking (Armijo) linesearch". Thus, we need to show that the three assumptions of that theorem hold, i.e., that

1. f(x) is continuously differentiable;
2. the set S := {x ∈ R^2 : f(x) ≤ f(x_0)} is bounded;
3. the matrices B_k are uniformly positive definite and bounded.

Condition 1 was shown in an earlier exercise, while condition 3 is trivial since B_k = I (the identity) for steepest descent. To show boundedness of S, we observe that f(x) is the sum of two non-negative terms, so that

f(x) = 100 (x_2 - x_1^2)^2 + (1 - x_1)^2 ≤ f(x_0) =: C  ⟹  (1 - x_1)^2 ≤ C and 100 (x_2 - x_1^2)^2 ≤ C.

The first condition is equivalent to |1 - x_1| ≤ √C, thus x_1 is bounded. The second condition is equivalent to |x_2 - x_1^2| ≤ 0.1 √C, thus x_2 is also bounded.

Exercise 3.2

We answer this exercise with a counter-example. Pick the objective function f(x) = x^2 - x, with f'(x) = 2x - 1 and minimizer x* = 0.5. The Wolfe conditions for this one-dimensional function are

f(x_k + α_k) ≤ f(x_k) + c_1 α_k f'(x_k),
f'(x_k + α_k) ≥ c_2 f'(x_k),

where in one dimension we let p_k = 1 and consider step lengths α_k > 0.
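Returning to Exercise 3.1: the file rosenbrock_newton.m itself is not reproduced in this document, but the Newton part of the experiment can be sketched as below. This is my own Python sketch, not the original MATLAB code; the function names, tolerance, and Armijo parameters (c = 1e-4, rho = 0.5) are illustrative choices.

```python
# Newton's method with Armijo backtracking for the Rosenbrock function
#   f(x1, x2) = 100*(x2 - x1^2)^2 + (1 - x1)^2,  minimizer (1, 1).
# Illustrative sketch only; not the original rosenbrock_newton.m.

def f(x1, x2):
    return 100.0 * (x2 - x1 ** 2) ** 2 + (1.0 - x1) ** 2

def grad(x1, x2):
    g1 = -400.0 * x1 * (x2 - x1 ** 2) - 2.0 * (1.0 - x1)
    g2 = 200.0 * (x2 - x1 ** 2)
    return g1, g2

def newton_direction(x1, x2):
    # Hessian entries: [[1200 x1^2 - 400 x2 + 2, -400 x1], [-400 x1, 200]]
    h11 = 1200.0 * x1 ** 2 - 400.0 * x2 + 2.0
    h12 = -400.0 * x1
    h22 = 200.0
    g1, g2 = grad(x1, x2)
    det = h11 * h22 - h12 * h12
    # Solve H p = -g by Cramer's rule (2x2 system)
    p1 = (-g1 * h22 + g2 * h12) / det
    p2 = (-g2 * h11 + g1 * h12) / det
    return p1, p2

def minimize(x1, x2, tol=1e-8, max_iter=100, c=1e-4, rho=0.5):
    for k in range(max_iter):
        g1, g2 = grad(x1, x2)
        if g1 * g1 + g2 * g2 < tol ** 2:
            return x1, x2, k
        p1, p2 = newton_direction(x1, x2)
        # Armijo backtracking: shrink alpha until sufficient decrease holds
        alpha = 1.0
        while alpha > 1e-12 and f(x1 + alpha * p1, x2 + alpha * p2) > f(x1, x2) + c * alpha * (g1 * p1 + g2 * p2):
            alpha *= rho
        x1, x2 = x1 + alpha * p1, x2 + alpha * p2
    return x1, x2, max_iter
```

A call such as minimize(1.2, 1.2) converges to the minimizer (1, 1) in a handful of iterations, consistent with the fast Newton convergence reported above.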

If we choose x_k = 0 (so that f'(x_k) = -1), the first condition reads

α_k^2 - α_k ≤ -c_1 α_k,  i.e.  α_k ≤ 1 - c_1,

while the other condition reads

2α_k - 1 ≥ -c_2,  i.e.  α_k ≥ (1 - c_2)/2.

If we now pick c_1 = 3/4 and c_2 = 1/4, we see that the two conditions reduce to α_k ≤ 1/4 and α_k ≥ 3/8, which is a contradiction. Hence, we need 0 < c_1 < c_2 < 1 to be sure that there exists an α_k satisfying the Wolfe conditions.

Exercise 3.3

Consider the strongly convex quadratic function f(x) = ½ x^T Q x - b^T x. We search for a minimizer along the ray x_k + α p_k, that is, an α such that

(d/dα) f(x_k + α p_k) = 0.

We can write

f(x_k + α p_k) = ½ (x_k + α p_k)^T Q (x_k + α p_k) - b^T (x_k + α p_k)
             = ½ x_k^T Q x_k + α x_k^T Q p_k + ½ α^2 p_k^T Q p_k - b^T x_k - α b^T p_k.

Differentiation gives

(d/dα) f(x_k + α p_k) = x_k^T Q p_k + α p_k^T Q p_k - b^T p_k = α p_k^T Q p_k + ∇f_k^T p_k,

where we have used that ∇f_k = ∇f(x_k) = Q x_k - b. Thus (d/dα) f(x_k + α p_k) = 0 only if

α_k = - ∇f_k^T p_k / (p_k^T Q p_k),

which is what we wanted to show. (Recall this step-length formula from an earlier exercise.)
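The counter-example of Exercise 3.2 is easy to verify numerically. The following sketch (my own, not part of the original solution) scans a grid of step lengths for f(x) = x^2 - x at x = 0 with p = 1:

```python
# Numerical check of the Exercise 3.2 counter-example: for
# f(x) = x^2 - x at x = 0 with direction p = 1, the Wolfe conditions are
#   sufficient decrease: f(a) <= f(0) + c1 * a * f'(0)
#   curvature:           f'(a) >= c2 * f'(0)
# With c1 = 3/4 > c2 = 1/4 no step length works; with the usual
# ordering c1 = 1/4 < c2 = 3/4 admissible step lengths exist.

def wolfe_ok(a, c1, c2):
    f = lambda x: x * x - x          # f(x) = x^2 - x
    df = lambda x: 2.0 * x - 1.0     # f'(x) = 2x - 1
    sufficient = f(a) <= f(0.0) + c1 * a * df(0.0)
    curvature = df(a) >= c2 * df(0.0)
    return sufficient and curvature

steps = [k / 1000.0 for k in range(1, 1001)]   # alpha in (0, 1]
swapped = any(wolfe_ok(a, c1=0.75, c2=0.25) for a in steps)
usual = any(wolfe_ok(a, c1=0.25, c2=0.75) for a in steps)
print(swapped, usual)   # False True
```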

Exercise 3.4

We consider the strongly convex quadratic function f(x) = ½ x^T Q x - b^T x, whose gradient is given as ∇f(x) = Q x - b. The one-dimensional minimizer along p_k is given as

α_k = - ∇f_k^T p_k / (p_k^T Q p_k) = - (x_k^T Q - b^T) p_k / (p_k^T Q p_k).   (1)

The Goldstein conditions are given as

f(x_k) + (1 - c) α_k ∇f_k^T p_k ≤ f(x_k + α_k p_k) ≤ f(x_k) + c α_k ∇f_k^T p_k,   c ∈ (0, ½).   (2)

We start by looking at f(x_k + α_k p_k):

f(x_k + α_k p_k) = ½ x_k^T Q x_k + α_k x_k^T Q p_k + ½ α_k^2 p_k^T Q p_k - b^T x_k - α_k b^T p_k
               = f(x_k) + α_k x_k^T Q p_k + ½ α_k^2 p_k^T Q p_k - α_k b^T p_k.

Further, we see that

(1 - c) α_k ∇f_k^T p_k = (1 - c) α_k (x_k^T Q - b^T) p_k = α_k (x_k^T Q p_k - c x_k^T Q p_k - b^T p_k + c b^T p_k),

and that

c α_k ∇f_k^T p_k = α_k c (x_k^T Q - b^T) p_k.

Hence, the Goldstein conditions (2) can be written as

α_k (x_k^T Q p_k - c x_k^T Q p_k - b^T p_k + c b^T p_k) ≤ α_k x_k^T Q p_k + ½ α_k^2 p_k^T Q p_k - α_k b^T p_k ≤ α_k c (x_k^T Q - b^T) p_k.

We start by looking at the first inequality. Dividing by α_k > 0 and cancelling terms, it becomes

- c x_k^T Q p_k + c b^T p_k ≤ ½ α_k p_k^T Q p_k,
c (b^T - x_k^T Q) p_k ≤ ½ (b^T - x_k^T Q) p_k,

where we have used (1) on the right-hand side. We see that this inequality is satisfied for all c ∈ (0, ½) since (b^T - x_k^T Q) p_k = -∇f_k^T p_k is non-negative.
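As a numerical aside (my own sketch, not part of the original solution), the exact step length (1) and both Goldstein inequalities in (2) can be checked on a small quadratic; the 2x2 matrix and vectors below are arbitrary choices:

```python
# Check the exact step length (1) and the Goldstein bracket (2) on a
# small strongly convex quadratic f(x) = 0.5 x^T Q x - b^T x.
# Hand-rolled 2x2 linear algebra; the numbers are arbitrary choices.

Q = [[3.0, 1.0], [1.0, 2.0]]          # symmetric positive definite
b = [1.0, -1.0]

def matvec(A, v):
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def f(x):
    return 0.5 * dot(x, matvec(Q, x)) - dot(b, x)

x = [2.0, -3.0]
Qx = matvec(Q, x)
g = [Qx[0] - b[0], Qx[1] - b[1]]      # gradient: Qx - b
p = [-g[0], -g[1]]                    # a descent direction
alpha = -dot(g, p) / dot(p, matvec(Q, p))   # exact 1-D minimizer (1)
xa = [x[0] + alpha * p[0], x[1] + alpha * p[1]]

# With the exact step, f(x + alpha p) = f(x) + 0.5 alpha g^T p, so the
# Goldstein conditions (2) hold for every c in (0, 1/2).
ok = all(
    f(x) + (1.0 - c) * alpha * dot(g, p) <= f(xa) <= f(x) + c * alpha * dot(g, p)
    for c in [0.05, 0.25, 0.45]
)
print(ok)   # True
```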

Similarly, for the second inequality we get

x_k^T Q p_k + ½ α_k p_k^T Q p_k - b^T p_k ≤ c (x_k^T Q - b^T) p_k,
½ α_k p_k^T Q p_k ≤ (c - 1)(x_k^T Q - b^T) p_k,
½ (b^T - x_k^T Q) p_k ≤ (1 - c)(b^T - x_k^T Q) p_k.

This holds true if (1 - c) ≥ ½, since (b^T - x_k^T Q) p_k is non-negative, or equivalently if c ≤ ½.

Exercise 3.5

For a matrix norm induced from a vector norm, it is always true that ‖Ax‖ ≤ ‖A‖ ‖x‖. Hence,

‖x‖ = ‖B⁻¹ B x‖ ≤ ‖B⁻¹‖ ‖Bx‖  ⟹  ‖Bx‖ ≥ ‖x‖ / ‖B⁻¹‖.

A property of symmetric positive definite matrices B is that there exist matrices B^{1/2} and B^{-1/2} such that B^{1/2} B^{1/2} = B and B^{-1/2} B^{-1/2} = B⁻¹. Thus, with p_k = -B_k⁻¹ ∇f_k we have

cos θ_k = - ∇f_k^T p_k / (‖∇f_k‖ ‖p_k‖)
        = p_k^T B_k p_k / (‖B_k p_k‖ ‖p_k‖)
        = ‖B_k^{1/2} p_k‖² / (‖B_k p_k‖ ‖p_k‖)
        ≥ ‖B_k^{1/2} p_k‖² / (‖B_k^{1/2}‖ ‖B_k^{1/2} p_k‖ · ‖B_k^{-1/2}‖ ‖B_k^{1/2} p_k‖)
        = 1 / (‖B_k^{1/2}‖ ‖B_k^{-1/2}‖)
        ≥ 1/M,

where the last step uses that the condition numbers of the B_k are uniformly bounded by M, so that ‖B_k^{1/2}‖ ‖B_k^{-1/2}‖ = √(κ(B_k)) ≤ √M ≤ M (note that M ≥ 1).

Exercise 3.6

From Equation (3.28) in N&W we have that

‖x_{k+1} - x*‖_Q² ≤ { 1 - (∇f_k^T ∇f_k)² / ((∇f_k^T Q ∇f_k)(∇f_k^T Q⁻¹ ∇f_k)) } ‖x_k - x*‖_Q².   (3)

We know that x_0 - x* is parallel to an eigenvector of Q. Let e be this (normalized) eigenvector with corresponding eigenvalue λ > 0, so that Q e = λ e and x_0 - x* = β e for some constant β. Further, recall that 1/λ is an eigenvalue of Q⁻¹ with the same eigenvector e. Now, ∇f_0 = Q(x_0 - x*) = Q β e = β λ e.
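As a quick numerical spot-check of the Exercise 3.5 bound (my own sketch; the rotated 2x2 matrix is an arbitrary SPD example, with M taken as κ(B) itself):

```python
# Spot-check: for p = -B^{-1} g with B SPD, the angle theta between p and
# -g satisfies cos(theta) = -g^T p / (|g| |p|) >= 1/kappa(B), where
# kappa(B) = lambda_max / lambda_min. B is an arbitrary rotated 2x2 example.
import math

lmin, lmax = 0.5, 8.0                  # chosen eigenvalues, kappa = 16
t = math.pi / 6.0                      # rotate the eigenbasis by 30 degrees
c, s = math.cos(t), math.sin(t)
B = [[c * c * lmin + s * s * lmax, c * s * (lmin - lmax)],
     [c * s * (lmin - lmax), s * s * lmin + c * c * lmax]]

def solve2(A, rhs):
    # Solve the 2x2 system A y = rhs by Cramer's rule
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(rhs[0] * A[1][1] - A[0][1] * rhs[1]) / det,
            (A[0][0] * rhs[1] - rhs[0] * A[1][0]) / det]

kappa = lmax / lmin
ok = True
for g in [[1.0, 0.0], [0.3, -2.0], [-5.0, 4.0], [1.0, 1.0]]:
    p = [-y for y in solve2(B, g)]     # Newton-like direction p = -B^{-1} g
    cos_theta = -(g[0] * p[0] + g[1] * p[1]) / (math.hypot(g[0], g[1]) * math.hypot(p[0], p[1]))
    ok = ok and cos_theta >= 1.0 / kappa
print(ok)   # True
```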

From ∇f_0 = β λ e we can deduce that

(∇f_0^T ∇f_0)² / ((∇f_0^T Q ∇f_0)(∇f_0^T Q⁻¹ ∇f_0)) = (β² λ² e^T e)² / ((β² λ² e^T Q e)(β² λ² e^T Q⁻¹ e)) = (e^T e)² / ((e^T λ e)(e^T λ⁻¹ e)) = 1.

Insertion into (3) gives ‖x_1 - x*‖_Q² = 0. Hence we have convergence in one step.

Exercise 3.7

First we use the definition of the Q-norm to see that

‖x - x*‖_Q² = (x - x*)^T Q (x - x*) = x^T Q x - 2 x^T Q x* + (x*)^T Q x*.

By further using that x_{k+1} = x_k - α_k ∇f_k, we see that

‖x_k - x*‖_Q² - ‖x_{k+1} - x*‖_Q²
= x_k^T Q x_k - 2 x_k^T Q x* - x_{k+1}^T Q x_{k+1} + 2 x_{k+1}^T Q x*
= x_k^T Q x_k - 2 x_k^T Q x* - (x_k^T Q x_k - 2 α_k ∇f_k^T Q x_k + α_k² ∇f_k^T Q ∇f_k) + 2 (x_k^T Q x* - α_k ∇f_k^T Q x*)
= 2 α_k ∇f_k^T Q (x_k - x*) - α_k² ∇f_k^T Q ∇f_k.

Now, if we insert the one-dimensional minimizer

α_k = ∇f_k^T ∇f_k / (∇f_k^T Q ∇f_k)

and ∇f_k = Q (x_k - x*), we get

‖x_k - x*‖_Q² - ‖x_{k+1} - x*‖_Q² = 2 (∇f_k^T ∇f_k)² / (∇f_k^T Q ∇f_k) - (∇f_k^T ∇f_k)² / (∇f_k^T Q ∇f_k) = (∇f_k^T ∇f_k)² / (∇f_k^T Q ∇f_k).   (4)

Further, since x_k - x* = Q⁻¹ ∇f_k, we see that

‖x_k - x*‖_Q² = ‖Q⁻¹ ∇f_k‖_Q² = ∇f_k^T Q⁻¹ ∇f_k.

Inserting this into (4) and reorganizing gives the desired result (3.28) in N&W.

Exercise 3.8

Since Q ∈ R^{n×n} is SPD, we can diagonalize it:

Q = R D R^T,   Q⁻¹ = R D⁻¹ R^T,

where R is an orthonormal matrix and D = diag{λ_1, λ_2, ..., λ_n}. Each column of R is an eigenvector of Q, and the λ_i > 0 are the corresponding eigenvalues, ordered such that λ_1 ≤ λ_2 ≤ ... ≤ λ_n.
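The one-step convergence of Exercise 3.6 can likewise be verified numerically. The sketch below (my own; the matrix Q and vector b are arbitrary, chosen so that the arithmetic is exact in floating point) uses the exact step length from Exercise 3.3:

```python
# One-step convergence check for Exercise 3.6: steepest descent with the
# exact step length on f(x) = 0.5 x^T Q x - b^T x reaches the minimizer
# in one iteration when x0 - x* is parallel to an eigenvector of Q.
Q = [[4.0, 0.0], [0.0, 8.0]]          # diagonal, so eigenvectors are e1, e2
b = [2.0, -8.0]
xstar = [0.5, -1.0]                    # Q xstar = b

def matvec(A, v):
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

# Start along the eigenvector e2 = (0, 1): x0 = xstar + 3 e2
x0 = [xstar[0], xstar[1] + 3.0]
g = [matvec(Q, x0)[0] - b[0], matvec(Q, x0)[1] - b[1]]   # grad = Q x0 - b
alpha = dot(g, g) / dot(g, matvec(Q, g))                 # exact step length
x1 = [x0[0] - alpha * g[0], x0[1] - alpha * g[1]]
print(x1)   # [0.5, -1.0]
```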

Since R R^T = I, we can write

β := (x^T x)² / ((x^T Q x)(x^T Q⁻¹ x)) = (x^T R R^T x)² / ((x^T R D R^T x)(x^T R D⁻¹ R^T x)) = (d^T d)² / ((d^T D d)(d^T D⁻¹ d)),

where d = R^T x. Let ξ_i = d_i² / (d^T d). Then ξ_i ≥ 0 and Σ_i ξ_i = 1. We now see that

d^T D d / (d^T d) = Σ_i ξ_i λ_i,

and similarly that

d^T D⁻¹ d / (d^T d) = Σ_i ξ_i / λ_i.

Hence,

β = 1 / ((Σ_i ξ_i λ_i)(Σ_i ξ_i / λ_i)).

Further, let λ̄ = Σ_i ξ_i λ_i, and observe that λ_1 ≤ λ̄ ≤ λ_n. By the convexity of the function φ(λ) = 1/λ, we know that

1/λ_i ≤ (λ_1 + λ_n - λ_i) / (λ_1 λ_n)   for λ_i ∈ [λ_1, λ_n],

since the right-hand side is the chord of φ between λ_1 and λ_n. Hence,

Σ_i ξ_i / λ_i ≤ Σ_i ξ_i (λ_1 + λ_n - λ_i) / (λ_1 λ_n) = (λ_1 + λ_n - λ̄) / (λ_1 λ_n).

Finally, we deduce that

β ≥ λ_1 λ_n / (λ̄ (λ_1 + λ_n - λ̄)) ≥ λ_1 λ_n / max_{λ ∈ [λ_1, λ_n]} {λ (λ_1 + λ_n - λ)} = 4 λ_1 λ_n / (λ_1 + λ_n)²,

which is what we wanted to show. We have used that λ (λ_1 + λ_n - λ) attains its maximum at λ = (λ_1 + λ_n)/2 (verify this).
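The Kantorovich-type inequality just proved can also be checked numerically. The sketch below (my own; the eigenvalues and test vectors are arbitrary) works with a diagonal Q, which is no restriction since the derivation above reduces the general case to the diagonal one via Q = R D R^T:

```python
# Numerical check of the inequality from Exercise 3.8,
#   (x^T x)^2 / ((x^T Q x)(x^T Q^{-1} x)) >= 4 l1 ln / (l1 + ln)^2,
# for a diagonal SPD Q (the general case reduces to this via Q = R D R^T).
eig = [1.0, 2.0, 5.0, 10.0]            # eigenvalues l1 <= ... <= ln
l1, ln = eig[0], eig[-1]
bound = 4.0 * l1 * ln / (l1 + ln) ** 2

def beta(x):
    xx = sum(xi * xi for xi in x)
    xQx = sum(l * xi * xi for l, xi in zip(eig, x))
    xQinvx = sum(xi * xi / l for l, xi in zip(eig, x))
    return xx * xx / (xQx * xQinvx)

ok = all(beta(x) >= bound for x in
         [[1.0, 1.0, 1.0, 1.0], [0.3, -2.0, 0.7, 4.0],
          [1.0, 0.0, 0.0, 0.0], [2.0, 1.0, 0.0, 0.0]])
# The bound is attained for equal weight on the extreme eigenvectors:
tight = abs(beta([1.0, 0.0, 0.0, 1.0]) - bound) < 1e-12
print(ok, tight)   # True True
```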