
Quadratic forms

We consider the quadratic function f : R^2 -> R defined by

    f(x) = (1/2) x^T A x - b^T x,   with x = (x_1, x_2)^T,        (1)

where A ∈ R^{2x2} is symmetric and b ∈ R^2. We will see that, depending on the eigenvalues of A, the quadratic function f behaves very differently. Note that A is the second derivative of f, i.e., the Hessian matrix. To study basic properties of quadratic forms, we first consider the case with a positive definite matrix,

    A = [2 1; 1 2],   b = 0.        (2)

The eigenvectors of A corresponding to the eigenvalues λ_1 = 1, λ_2 = 3 are

    u_1 = (1/sqrt(2)) (1, -1)^T,   u_2 = (1/sqrt(2)) (1, 1)^T.

Defining the orthogonal matrix U := [u_1, u_2], we obtain the eigenvalue decomposition of A, i.e.,

    U^T A U = Λ = [1 0; 0 3].

Note that U^T = U^{-1}. The contour lines of f as well as the eigenvector directions are shown in Figure 1.

[Figure 1: Quadratic form with the positive definite matrix A defined in (2). Left: contour lines of f(x); the red lines indicate the eigenvector directions of A. Right: graph of the function. Note that the function is bounded from below and convex.]

Defining the new variables x~ := U^T x, the quadratic form corresponding to (2) can, in the new variables, be written as

    f(x~) = (1/2) x~^T Λ x~.

Thus, in the variables that correspond to the eigenvector directions, the quadratic form is based on the diagonal matrix Λ, and the eigenvector matrix U corresponds to the basis transformation. So to study basic properties of quadratic forms, we can restrict ourselves to diagonal matrices A. Note also that the curvature d^2 f(x~)/d x~_i^2 is simply λ_i, so that the i-th eigenvalue represents the curvature in the direction of the i-th eigenvector. In two dimensions, the contour lines of f(x~) are ellipses with principal radii proportional to λ_i^{-1/2}.
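As a quick numerical check, the eigendecomposition and the change of variables can be verified in a few lines of MATLAB; this is only a sketch, and the test point below is an arbitrary choice.

% Sketch (illustration only): verify the eigendecomposition and the
% change of variables for the positive definite example (2).
A = [2 1; 1 2];  b = [0; 0];
[U, Lambda] = eig(A);            % columns of U are eigenvectors, Lambda = diag(1, 3)
disp(U' * A * U);                % reproduces Lambda up to round-off
x  = [0.3; -0.8];                % arbitrary test point
xt = U' * x;                     % transformed variables x~ = U^T x
f  = 0.5 * x'  * A      * x  - b' * x;
ft = 0.5 * xt' * Lambda * xt;    % same value in eigenvector coordinates (b = 0 here)
fprintf('f(x) = %.12f, f(x~) = %.12f\n', f, ft);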

We next consider the quadratic form corresponding to the indefinite matrices

    A_1 = [2 0; 0 -2],   A_2 = [-2 0; 0 2],        (3)

and use b = 0. Visualizations of the corresponding quadratic forms are shown in Figure 2. Note that the function corresponding to A_1 coincides with the one from A_2 after interchanging the coordinate axes. The origin is a maximum in one coordinate direction and a minimum in the other direction, which is a consequence of the indefiniteness of the matrices A_1 and A_2. The corresponding quadratic functions are bounded neither from above nor from below, and thus have neither a minimum nor a maximum.

[Figure 2: Quadratic form with the indefinite matrices A_1 (upper row) and A_2 (lower row) defined in (3). Left: contour lines. Right: graph of the function. Note that the function is unbounded from above and from below.]

Finally, we study quadratic forms with a semi-definite Hessian matrix. We consider the cases

    A = [2 0; 0 0],   b_1 = (0, 1)^T,   b_2 = (1, 0)^T.        (4)

For the semi-definite case, the choice of b influences whether there exists a minimum or not. Visualizations of the quadratic forms can be seen in Figure 3. In the direction in which the Hessian matrix is singular, the function is determined by the linear term b. The function based on A and b_1 is unbounded from below and thus does not have a minimum. On the other hand, the function based on A and b_2 is independent of x_2 and bounded from below; all points with x_1 = 1/2 (and arbitrary x_2) are minima of f. If b_i lies in the range space of A (i.e., it can be written as a linear combination of the columns of A), then there exists a minimum, and in fact infinitely many when A is singular, as it is here. On the other hand, when b_i does not lie in the range space of A, then there does not exist a minimum.

[Figure 3: Quadratic form with the semi-definite matrix A, along with b_1 (upper row) and b_2 (lower row) defined in (4). Note that the case of b_1 corresponds to an inconsistent right-hand side (i.e., b_1 does not belong to the range space of A), and that of b_2 corresponds to a consistent right-hand side (b_2 does belong to the range space of A). Left: contour lines. Right: graph of the function. Note that, depending on b, the function does not have a minimum (upper row) or has infinitely many minima (lower row).]

In summary, when the symmetric matrix A is positive definite, the quadratic form has a unique minimum at the stationary point (i.e., the point at which the gradient vanishes). When A is indefinite, the quadratic form has a stationary point, but it is not a minimum. Finally, when A is singular, it has either no stationary points (when b does not lie in the range space of A) or infinitely many (when b lies in the range space).
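This case distinction can also be carried out numerically; the following is only a minimal MATLAB sketch, and the matrix, the vector b and the tolerance are placeholder values.

% Sketch (illustration only): classify a symmetric A and a right-hand
% side b along the lines of the summary above. Values are placeholders.
A = [2 0; 0 0];  b = [1; 0];
lambda = eig(A);  tol = 1e-12;
if all(lambda > tol)
    disp('A positive definite: unique minimizer x* = A\b');
elseif all(lambda > -tol)                      % positive semi-definite and singular
    if rank([A, b], tol) == rank(A, tol)
        disp('b in range(A): infinitely many minimizers');
    else
        disp('b not in range(A): f is unbounded from below, no minimizer');
    end
else
    disp('A indefinite: stationary points, if any, are not minima');
end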

Convergence of steepest descent for increasingly ill-conditioned matrices

We consider the quadratic function

    f(x_1, x_2) = (1/2) (c_1 x_1^2 + c_2 x_2^2) = (1/2) x^T A x        (5)

for various c_1 and c_2, where A = diag(c_1, c_2) and x = (x_1, x_2)^T. The function is convex and has a global minimum at x_1 = x_2 = 0. Since A is diagonal, c_1 and c_2 are also the eigenvalues of A. We use the steepest descent method with exact line search to minimize (5); for this quadratic, the exact step length along the negative gradient g = (c_1 x_1, c_2 x_2)^T is alpha = (g^T g)/(g^T A g). A listing of the simple algorithm is given next.

% c1, c2 are assumed to be set in the workspace, e.g., c1 = 5; c2 = 1;
% starting point
x1 = c2 / sqrt(c1^2 + c2^2);
x2 = c1 / sqrt(c1^2 + c2^2);
for iter = 1:100000              % iteration limit (a sufficiently large value)
    err = sqrt(x1^2 + x2^2);
    fprintf('Iteration %3d: x1: %+4.8f, x2: %+4.8f, error %4.8f\n', iter, x1, x2, err);
    if (err < 1e-12)
        fprintf('Converged with error %2.2f.\n', err);
        break;
    end
    % exact line search
    alpha = (c1^2*x1^2 + c2^2*x2^2) / (c1^3*x1^2 + c2^3*x2^2);
    g1 = c1*x1;
    g2 = c2*x2;
    x1 = x1 - alpha*g1;
    x2 = x2 - alpha*g2;
end
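A minimal sketch checking that the hard-coded step length in the listing matches the generic formula alpha = (g^T g)/(g^T A g), with an arbitrarily chosen test point:

% Sketch (illustration only): compare the hard-coded step length with the
% generic exact line-search formula for f(x) = 0.5*x'*A*x.
c1 = 5;  c2 = 1;  A = diag([c1, c2]);
x = [0.3; -0.7];                                  % arbitrary test point
g = A * x;                                        % gradient of f at x
alpha_generic = (g' * g) / (g' * A * g);
alpha_coded   = (c1^2*x(1)^2 + c2^2*x(2)^2) / (c1^3*x(1)^2 + c2^3*x(2)^2);
fprintf('alpha (generic) = %.12f, alpha (hard-coded) = %.12f\n', alpha_generic, alpha_coded);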

Running the above script with c1 = c2 = 1, the method terminates after a single iteration. This one-step convergence is a property of steepest descent when the eigenvalues c_1, c_2 coincide and the contour lines are thus circles; see Figure 4.

[Figure 4: Contour lines and iterates for c_1 = c_2 = 1 (left plot) and c_1 = 5, c_2 = 1 (right plot).]

Iteration 1: x1: +.77678, x2: +.77678, error .
Iteration 2: x1: +., x2: +., error .
Converged with error ..

For c_1 = 5, c_2 = 1, the iteration terminates after 36 iterations; the first iterations are as follows:

Iteration  1: x1: +.9664, x2: +.985868, error .
Iteration  2: x1: .37449, x2: +.6537245, error .666666666667
Iteration  3: x1: +.876273, x2: +.4358363, error .444444444444
Iteration  4: x1: .58848, x2: +.2954242, error .296296296296
Iteration  5: x1: +.3873899, x2: +.9369495, error .975386498
Iteration  6: x1: .2582599, x2: +.292997, error .3687242798
Iteration  7: x1: +.72733, x2: +.868664, error .877949599
Iteration  8: x1: .47822, x2: +.5739, error .58527663466
Iteration  9: x1: +.76525, x2: +.382673, error .3984423
Iteration 10: x1: .543, x2: +.25575, error .262294874

Taking quotients of the errors of two consecutive iterations, we observe that

    err_{k+1} / err_k = 2/3 = (5 - 1)/(5 + 1) = (κ - 1)/(κ + 1),

where κ denotes the condition number of the matrix A, i.e.,

    κ = cond(A) = λ_max(A)/λ_min(A) = c_1/c_2.

The contour lines of f for c_1 = c_2 = 1 and for c_1 = 5, c_2 = 1 are shown in Figure 4. Now, we study the function (5) with c_2 = 1 and with different values for c_1, namely c_1 = 10, 50, 100, 1000. The number of iterations required for these cases is 139, 692, 1383 and 13817, respectively. As can be seen, the number increases significantly with c_1, and thus with increasing κ. The output of the first iterations for c_1 = 10 is:

Iteration  1: x1: +.995372, x2: +.995379, error .
Iteration  2: x1: .8423, x2: +.84234, error .8888882
Iteration  3: x1: +.666993, x2: +.6669928, error .6694248763
Iteration  4: x1: .544993, x2: +.5449932, error .54778489857
Iteration  5: x1: +.44592, x2: +.44597, error .448252865
Iteration  6: x1: .3648282, x2: +.36482823, error .36664783253
Iteration  7: x1: +.2984958, x2: +.29849582, error .299984589862
Iteration  8: x1: .2442239, x2: +.24422386, error .245449376
Iteration  9: x1: +.99895, x2: +.998952, error .286343
Iteration 10: x1: .634887, x2: +.634887, error .64346694

Taking quotients of consecutive errors, we again observe the theoretically expected convergence rate of (κ - 1)/(κ + 1) = 9/11 ≈ 0.8182. The large number of iterations for the other cases of c_1 can be explained by the increasing ill-conditioning of the quadratic form. The theoretical convergence rates for c_1 = 50, 100, 1000 are 0.9608, 0.9802 and 0.998, respectively. These are also exactly the rates we observe for all these cases. Contour lines and iterates for c_1 = 10 and c_1 = 50 (with c_2 = 1) are shown in Figure 5.

[Figure 5: Contour lines and iterates for c_1 = 10 (left plot) and c_1 = 50 (right plot).]
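These iteration counts follow directly from the linear contraction factor (κ - 1)/(κ + 1). The short sketch below is only an illustration; it assumes a starting error of 1 (the starting point in the script is normalized) and the stopping tolerance 1e-12 used above.

% Sketch (illustration only): predicted iteration counts of steepest descent
% for the worst-case starting point, err_{k+1}/err_k = (kappa-1)/(kappa+1),
% starting error 1, stopping when err < 1e-12.
tol = 1e-12;  c2 = 1;
for c1 = [10 50 100 1000]
    kappa = c1 / c2;
    rate  = (kappa - 1) / (kappa + 1);
    iters = ceil(log(tol) / log(rate)) + 1;   % first iteration with err < tol
    fprintf('c1 = %5d: rate %.4f, about %6d iterations\n', c1, rate, iters);
end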

Convergence examples for Newton's method

Here, we study convergence properties of Newton's method for various functions. We start by studying the nonlinear function f : R -> R defined by

    f(x) = (1/2) x^2 - (1/3) x^3.        (6)

The graph of this function, as well as that of its derivative, is shown in Figure 6.

[Figure 6: Graph of the nonlinear function defined in (6) (left plot) and of its derivative (right plot).]

We want to find the (local) minimum of f, i.e., the point x = 0. As expected, the derivative of f vanishes at the local minimum. However, the derivative also vanishes at the local maximum x = 1. A Newton method for finding the local minimum of f exploits the fact that the derivative vanishes at the extremal points (minimum and maximum), i.e., it applies Newton's method to f'(x) = x - x^2 = 0. At a given point x_k, the new point x_{k+1} is computed as (where we use a step length of 1)

    x_{k+1} = x_k - f'(x_k)/f''(x_k) = x_k^2 / (2 x_k - 1).        (7)

This expression is plotted as a function in Figure 7.

[Figure 7: The Newton step (7), x^2/(2x - 1), plotted as a function.]

First, we consider a Newton iteration starting from x = 20. We obtain the following iterations:

Iteration  1: x: +2.
Iteration  2: x: +.256425642573
Iteration  3: x: +5.3972756239
Iteration  4: x: +2.9765664592565
Iteration  5: x: +.7868294993432
Iteration  6: x: +.2425829232779
Iteration  7: x: +.38987964455772
Iteration  8: x: +.4464549898
Iteration  9: x: +.9826397768
Iteration 10: x: +.3939
Iteration 11: x: +.

Thus, the Newton method converges to the local maximum x = 1. Observe that initially the convergence appears to be linear, but beginning from the 6th iteration we can observe the quadratic convergence of Newton's method, i.e., the number of correct digits doubles at every Newton iteration. Next we use the initialization x = -20, which results in the following iterations:

Iteration  1: x: 2.
Iteration  2: x: 9.7569756975695
Iteration  3: x: 4.64236652692562
Iteration  4: x: 2.944362725536236
Iteration  5: x: .8453985955295
Iteration  6: x: .265683788656865
Iteration  7: x: .467339997474
Iteration  8: x: .9436273736
Iteration  9: x: .3763593883
Iteration 10: x: .465
Iteration 11: x: .

Now, the iterates converge to the local minimum x = 0. Again we initially observe linear convergence, and as we get close to the minimum, the convergence becomes quadratic. To show how sensitive Newton's method is to the initial guess, we now initialize the method with values that are very close to each other. Initializing with x = 0.501 results in convergence to x = 1:

Iteration  1: x: +.5
Iteration  2: x: +25.54999999998745
Iteration  3: x: +63.2499959999559
Iteration  4: x: +3.752624958943
Iteration  5: x: +6.3324334468
Iteration  6: x: +8.3235335262457
Iteration  7: x: +4.427554887957378
Iteration  8: x: +2.495638666534
Iteration  9: x: +.5643962594859
Iteration 10: x: +.489544835822
Iteration 11: x: +.692549424664
Iteration 12: x: +.276933257698
Iteration 13: x: +.766495756
Iteration 14: x: +.58

Initialization with x = 0.499 leads to convergence to the local minimum x = 0:

Iteration  1: x: +.499
Iteration  2: x: 24.54999999998887
Iteration  3: x: 62.249995999963
Iteration  4: x: 3.75262495894
Iteration  5: x: 5.332433452
Iteration  6: x: 7.32353352624543
Iteration  7: x: 3.427554887957387
Iteration  8: x: .4956386665345
Iteration  9: x: .5643962594862
Iteration 10: x: .489544835824
Iteration 11: x: .692549424666
Iteration 12: x: .276933257697
Iteration 13: x: .766495756
Iteration 14: x: .59

Finally, if the method is initialized with x = 0.5, it fails: the second derivative (i.e., the Hessian) of f vanishes at this point, so the Newton step (7) is not defined.

Next, we study the function

    f(x) = x^4 / 4,

which is shown together with its derivative in Figure 8. This function has a singular Hessian at its minimum x = 0.

[Figure 8: Function f(x) = x^4/4 (left) and its derivative (right).]

Using the Newton method to find the global minimum with the starting guess x = 1 results in the following iterations (only the first 15 iterations are shown):

Iteration  1: x: +.
Iteration  2: x: +.6666666666666666
Iteration  3: x: +.4444444444444444
Iteration  4: x: +.2962962962962963
Iteration  5: x: +.975386497539
Iteration  6: x: +.36872427983539
Iteration  7: x: +.877949598926
Iteration  8: x: +.58527663465935
Iteration  9: x: +.39844236234
Iteration 10: x: +.2622948737489
Iteration 11: x: +.734529958326
Iteration 12: x: +.5699438884
Iteration 13: x: +.7773466292589
Iteration 14: x: +.5382386726
Iteration 15: x: +.342548739787

Note that the iterates converge to the solution x = 0, but they converge only at a linear rate due to the singularity of the Hessian at the solution. For the initial guess x = -1, the iterates are the negative of the ones shown above.
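For reference, the one-dimensional Newton iterations above can be reproduced with a few lines of MATLAB; this is only a sketch, written here for f(x) = x^2/2 - x^3/3, with the handles for f(x) = x^4/4 given in a comment.

% Sketch (illustration only): plain 1-D Newton iteration
% x_{k+1} = x_k - f'(x_k)/f''(x_k) with step length 1.
fp  = @(x) x - x.^2;          % f'(x)  for f(x) = x^2/2 - x^3/3
fpp = @(x) 1 - 2*x;           % f''(x)
% for f(x) = x^4/4 use instead:  fp = @(x) x.^3;  fpp = @(x) 3*x.^2;
x = 20;                       % starting guess of the first experiment
for k = 1:12
    fprintf('Iteration %2d: x: %+.12f\n', k, x);
    x = x - fp(x) / fpp(x);   % Newton step
end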

Finally, we consider the negative hyperbolic secant function f(x) = -sech(x). The graph of the function and of its derivative is shown in Figure 9. This function changes its curvature, and thus Newton's method diverges if the initial guess is too far from the minimum.

[Figure 9: Function f(x) = -sech(x) (left) and its derivative sech(x)*tanh(x) (right).]

The Newton iteration to find the minimum x = 0 of this function computes, at a current iterate x_k, the new iterate as

    x_{k+1} = x_k + sinh(2 x_k) / (cosh(2 x_k) - 3).        (8)

We first study the Newton iterates for the starting value x = 0.1 and observe quadratic convergence (actually, the convergence is even faster than quadratic, which is due to the symmetry of the hyperbolic secant function):

Iteration 1: x: +.
Iteration 2: x: .688278843722
Iteration 3: x: +.82476
Iteration 4: x: .

Starting the iteration from x = 0.2 or x = 0.5, the method also converges to the local (and global) minimum. The iterates for the starting guess x = 0.5 are:

Iteration 1: x: +.5
Iteration 2: x: .366343424646
Iteration 3: x: +.5463437235
Iteration 4: x: .27279652389
Iteration 5: x: +.338348
Iteration 6: x: +.

However, if the method is initialized with x = 0.6 (or any value larger than that), the method diverges. The first iterates for a starting guess of x = 0.6 are:

Iteration  1: x: +.6
Iteration  2: x: .6695493584473
Iteration  3: x: +.75394947246
Iteration  4: x: +3.4429554757227767
Iteration  5: x: +4.449237479672
Iteration  6: x: +5.449944895447
Iteration  7: x: +6.45548922755296
Iteration  8: x: +7.45698794426
Iteration  9: x: +8.457973784
Iteration 10: x: +9.45728796382

This divergence is explained by the fact that the function is concave for |x| > arcsinh(1) ≈ 0.88: the first Newton steps carry the iterates into this concave region, and from there each further Newton step moves them away from the minimum.
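The comparison of the two starting values can be reproduced with a small sketch of the Newton map (8); the number of steps shown below is arbitrary.

% Sketch (illustration only): the Newton map (8) for f(x) = -sech(x),
% run from the two starting values compared above.
newton_map = @(x) x + sinh(2*x) / (cosh(2*x) - 3);
for x0 = [0.5, 0.6]
    x = x0;
    fprintf('start x0 = %.1f:', x0);
    for k = 1:8
        x = newton_map(x);
        fprintf(' %+.4f', x);
    end
    fprintf('\n');     % x0 = 0.5 contracts towards 0; x0 = 0.6 wanders off
end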