Quadratic forms

We consider the quadratic function f: R^2 -> R defined by

    f(x) = \frac{1}{2} x^T A x - b^T x, \quad x = (x_1, x_2)^T,    (1)

where A \in R^{2 \times 2} is symmetric and b \in R^2. We will see that, depending on the eigenvalues of A, the quadratic function f behaves very differently. Note that A is the second derivative of f, i.e., the Hessian matrix. To study basic properties of quadratic forms, we first consider the case with the positive definite matrix

    A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \quad b = 0.    (2)

The eigenvectors of A corresponding to the eigenvalues \lambda_1 = 1, \lambda_2 = 3 are

    u_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \quad u_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}.

Defining the orthogonal matrix U := [u_1, u_2], we obtain the eigenvalue decomposition of A, i.e.,

    U^T A U = \Lambda = \begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix}.

Note that U^{-1} = U^T. The contour lines of f as well as the eigenvector directions are shown in Figure 1.

Figure 1: Quadratic form with the positive definite matrix A defined in (2). Left: Contour lines of f(x); the red lines indicate the eigenvector directions of A. Right: Graph of the function. Note that the function is bounded from below and convex.

Defining the new variables \tilde{x} := U^T x, the
quadratic form corresponding to (2) can, in the new variables, be written as

    f(\tilde{x}) = \frac{1}{2} \tilde{x}^T \Lambda \tilde{x}.

Thus, in the variables that correspond to the eigenvector directions, the quadratic form is based on the diagonal matrix \Lambda, and the eigenvector matrix U corresponds to the basis transformation. So, to study basic properties of quadratic forms, we can restrict ourselves to diagonal matrices A. Note also that the curvature d^2 f(\tilde{x}) / d\tilde{x}_i^2 is simply \lambda_i, so that the i-th eigenvalue represents the curvature in the direction of the i-th eigenvector. In two dimensions, the contour lines of f(\tilde{x}) are ellipses with principal radii proportional to \lambda_i^{-1/2}.

Figure 2: Quadratic form with the indefinite matrices A_1 (upper row) and A_2 (lower row) defined in (3). Left: Contour lines. Right: Graph of the function. Note that the function is unbounded from above and from below.
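The eigenvalue decomposition discussed above can be verified numerically. Below is a minimal Python sketch (the document's own listing is MATLAB); the matrix entries are taken to be A = [[2, 1], [1, 2]], an assumption consistent with the stated eigenvalues 1 and 3:

```python
import math

# Positive definite example matrix (entries assumed, consistent with
# eigenvalues lambda_1 = 1 and lambda_2 = 3).
A = [[2.0, 1.0], [1.0, 2.0]]

# Normalized eigenvectors u1 = (1, -1)/sqrt(2), u2 = (1, 1)/sqrt(2),
# stored as the columns of the orthogonal matrix U.
s = 1.0 / math.sqrt(2.0)
U = [[s, s], [-s, s]]

def matmul(X, Y):
    # Product of two 2x2 matrices.
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]

# U^T A U should recover the diagonal eigenvalue matrix Lambda = diag(1, 3).
Lam = matmul(transpose(U), matmul(A, U))
print([[round(v, 12) for v in row] for row in Lam])  # [[1.0, 0.0], [0.0, 3.0]]
```

This also confirms that U^{-1} = U^T: the transpose alone suffices to diagonalize A.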
We next consider the quadratic forms corresponding to the indefinite matrices

    A_1 = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}, \quad A_2 = \begin{pmatrix} -2 & 0 \\ 0 & 2 \end{pmatrix},    (3)

and use b = 0. Visualizations of the corresponding quadratic forms are shown in Figure 2. Note that the function corresponding to A_1 coincides with the one from A_2 after interchanging the coordinate axes. The origin is a maximum in one coordinate direction and a minimum in the other direction, which is a consequence of the indefiniteness of the matrices A_1 and A_2. The corresponding quadratic functions are bounded neither from above nor from below, and thus have neither a minimum nor a maximum.

Finally, we study quadratic forms with semi-definite Hessian matrices. We consider the cases

    A = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}, \quad b_1 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad b_2 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}.    (4)

In the semi-definite case, the choice of b influences whether there exists a minimum or not. Visualizations of the quadratic forms can be seen in Figure 3. In the direction in which the Hessian matrix is singular, the function is determined by the linear term b. The function based on A and b_1 is unbounded from below and, thus, does not have a minimum. On the other hand, the function based on A and b_2 is independent of x_2 and bounded from below; thus, all points with x_1 = 1/2 are minima of f. If b_i lies in the range space of A (i.e., it can be written as a linear combination of the columns of A), then there exists a minimum, and in fact infinitely many when A is singular, as it is here. On the other hand, when b_i does not lie in the range space of A, then there does not exist a minimum.

In summary, when the symmetric matrix A is positive definite, the quadratic form has a unique minimum at the stationary point (i.e., the point at which the gradient vanishes). When A is indefinite, the quadratic form has a stationary point, but it is not a minimum. Finally, when A is singular, it has either no stationary points (when b does not lie in the range space of A), or infinitely many (when b lies in the range space).
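The trichotomy summarized above can be checked numerically. The following Python sketch classifies a symmetric 2x2 matrix by the signs of its eigenvalues; the example entries are assumptions consistent with the eigenvalues and behavior described in the text:

```python
import math

def eig2_sym(a, b, c):
    """Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]]."""
    d = math.sqrt((a - c) ** 2 + 4.0 * b * b)
    return ((a + c - d) / 2.0, (a + c + d) / 2.0)

def classify(a, b, c, tol=1e-12):
    lo, hi = eig2_sym(a, b, c)
    if lo > tol:
        return "positive definite"
    if lo < -tol and hi > tol:
        return "indefinite"
    return "semi-definite"

# The three regimes discussed in the text:
print(classify(2, 1, 2))    # eigenvalues 1 and 3 -> positive definite
print(classify(2, 0, -2))   # eigenvalues -2, 2   -> indefinite (saddle)
print(classify(2, 0, 0))    # eigenvalues 0 and 2 -> semi-definite

# For the singular case A = diag(2, 0), a stationary point Ax = b exists
# iff b lies in the range of A, i.e. iff the second component of b is zero:
for b in ((0.0, 1.0), (1.0, 0.0)):
    print("b =", b, "-> minimum exists:", b[1] == 0.0)
```

The last loop mirrors the range-space criterion: with b = (0, 1) no minimum exists, while b = (1, 0) yields infinitely many minima.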
Convergence of steepest descent for increasingly ill-conditioned matrices

We consider the quadratic function

    f(x_1, x_2) = \frac{1}{2} (c_1 x_1^2 + c_2 x_2^2) = \frac{1}{2} x^T A x    (5)
Figure 3: Quadratic form with the semi-definite matrix A, along with b_1 (upper row) and b_2 (lower row) defined in (4). Note that the case of b_1 corresponds to an inconsistent right-hand side (i.e., b_1 does not belong to the range space of A), and that of b_2 corresponds to a consistent right-hand side (b_2 does belong to the range space of A). Left: Contour lines. Right: Graph of the function. Note that, depending on b, the function either does not have a minimum (upper row) or has infinitely many minima (lower row).

for various c_1 and c_2, where A = diag(c_1, c_2) and x = (x_1, x_2)^T. The function is convex and has a global minimum at x_1 = x_2 = 0. Since A is diagonal, c_1 and c_2 are also the eigenvalues of A. We use the steepest descent method with exact line search to minimize (5). A listing of the simple algorithm is given next.

% starting point
x1 = c2 / sqrt(c1^2 + c2^2);
x2 = c1 / sqrt(c1^2 + c2^2);
for iter = 1:100000
    err = sqrt(x1^2 + x2^2);
    fprintf('Iteration %3d: x1: %+4.8f, x2: %+4.8f, error %4.12f\n', iter, x1, x2, err);
    if (err < 1e-12)
        fprintf('Converged with error %2.2f.\n', err);
        break;
    end
    % exact line search
    alpha = (c1^2*x1^2 + c2^2*x2^2) / (c1^3*x1^2 + c2^3*x2^2);
    g1 = c1*x1;
    g2 = c2*x2;
    x1 = x1 - alpha*g1;
    x2 = x2 - alpha*g2;
end

Running the above script with c1 = c2 = 1, the method terminates after a single iteration. This one-step convergence is a property of steepest descent when the eigenvalues c1, c2 coincide and thus the contour lines are circles; see Figure 4.

Figure 4: Contour lines and iterates for c1 = c2 = 1 (left plot) and c1 = 5, c2 = 1 (right plot).

Iteration   1: x1: +0.70710678, x2: +0.70710678, error 1.000000000000
Iteration   2: x1: +0.00000000, x2: +0.00000000, error 0.000000000000
Converged with error 0.00.

For c1 = 5, c2 = 1, the iteration terminates after 36 iterations; the first iterations are as follows:

Iteration   1: x1: +0.19611614, x2: +0.98058068, error 1.000000000000
Iteration   2: x1: -0.13074409, x2: +0.65372045, error 0.666666666667
Iteration   3: x1: +0.08716273, x2: +0.43581363, error 0.444444444444
Iteration   4: x1: -0.05810848, x2: +0.29054242, error 0.296296296296
Iteration   5: x1: +0.03873899, x2: +0.19369495, error 0.197530864198
Iteration   6: x1: -0.02582599, x2: +0.12912997, error 0.131687242798
Iteration   7: x1: +0.01721733, x2: +0.08608664, error 0.087791495199
Iteration   8: x1: -0.01147822, x2: +0.05739110, error 0.058527663466
Iteration   9: x1: +0.00765215, x2: +0.03826073, error 0.039018442311
Iteration  10: x1: -0.00510143, x2: +0.02550715, error 0.026012294874

Taking quotients of the errors of two consecutive iterations, we observe that

    \frac{err_{k+1}}{err_k} = \frac{2}{3} = \frac{5 - 1}{5 + 1} = \frac{\kappa - 1}{\kappa + 1},

where \kappa denotes the condition number of the matrix A, i.e.,

    \kappa = cond(diag(c_1, c_2)) = \frac{\lambda_{max}(A)}{\lambda_{min}(A)} = \frac{c_1}{c_2}.

The contour lines of f for c_1 = c_2 = 1 and c_1 = 5, c_2 = 1 are shown in Figure 4. Now, we study the function (5) with c_2 = 1 and with different values for c_1, namely c_1 = 10, 50, 100, 1000. The numbers of iterations required for these cases are 139, 629, 1383 and 13807, respectively. As can be seen, the number increases significantly with c_1, and thus with increasing \kappa. The output of the first 10 iterations for c_1 = 10 is

Iteration   1: x1: +0.09950372, x2: +0.99503719, error 1.000000000000
Iteration   2: x1: -0.08141213, x2: +0.81412134, error 0.818181818182
Iteration   3: x1: +0.06660993, x2: +0.66609928, error 0.669421487603
Iteration   4: x1: -0.05449903, x2: +0.54499032, error 0.547708489857
Iteration   5: x1: +0.04459012, x2: +0.44590117, error 0.448125128065
Iteration   6: x1: -0.03648282, x2: +0.36482823, error 0.366647832053
Iteration   7: x1: +0.02984958, x2: +0.29849582, error 0.299984589862
Iteration   8: x1: -0.02442239, x2: +0.24422386, error 0.245441937160
Iteration   9: x1: +0.01998195, x2: +0.19981952, error 0.200816130403
Iteration  10: x1: -0.01634887, x2: +0.16348870, error 0.164304106694

Taking quotients of consecutive errors, we again observe the theoretically expected convergence rate of (\kappa - 1)/(\kappa + 1) = 9/11 \approx 0.8182. The large number of iterations for the other cases of c_1 is explained by the increasing ill-conditioning of the quadratic form. The theoretical convergence rates for c_1 = 50, 100, 1000 are 0.9608, 0.9802 and 0.998, respectively.
These are also exactly the rates we observe for all these cases. Contour lines and iterates for c_1 = 10 and c_1 = 50 are shown in Figure 5.
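The observed rates can be reproduced with a short script. Below is a Python sketch of the same algorithm (steepest descent with exact line search for f = (c1 x1^2 + c2 x2^2)/2, mirroring the MATLAB listing above); the function name and the tolerance are illustrative choices:

```python
import math

def steepest_descent(c1, c2, tol=1e-12, maxiter=100000):
    """Steepest descent with exact line search on f = (c1*x1^2 + c2*x2^2)/2.

    Returns the sequence of error norms ||x_k|| (the minimizer is 0)."""
    x1 = c2 / math.sqrt(c1 ** 2 + c2 ** 2)
    x2 = c1 / math.sqrt(c1 ** 2 + c2 ** 2)
    errs = []
    for _ in range(maxiter):
        err = math.sqrt(x1 ** 2 + x2 ** 2)
        errs.append(err)
        if err < tol:
            break
        # exact line search: alpha = (g^T g)/(g^T A g) with g = (c1*x1, c2*x2)
        alpha = (c1 ** 2 * x1 ** 2 + c2 ** 2 * x2 ** 2) / \
                (c1 ** 3 * x1 ** 2 + c2 ** 3 * x2 ** 2)
        x1 -= alpha * c1 * x1
        x2 -= alpha * c2 * x2
    return errs

# kappa = 5: the error is reduced by (kappa-1)/(kappa+1) = 2/3 per iteration.
errs = steepest_descent(5.0, 1.0)
print(len(errs), errs[1] / errs[0])
```

For c1 = c2 the very first line search lands exactly at the minimum, reproducing the one-step convergence seen above.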
Figure 5: Contour lines and iterates for c_1 = 10 (left plot) and c_1 = 50 (right plot).

Convergence examples for Newton's method

Here, we study convergence properties of Newton's method for various functions. We start by studying the nonlinear function f: R -> R defined by

    f(x) = \frac{1}{2} x^2 - \frac{1}{3} x^3.    (6)

The graph of this function, as well as of its derivative, is shown in Figure 6.

Figure 6: Graph of the nonlinear function defined in (6) (left plot) and of its derivative (right plot).

We want to find the (local) minimum of f, i.e., the point x = 0. As expected, at the local minimum, the derivative of f vanishes. However, the derivative
also vanishes at the local maximum x = 1. A Newton method to find the local minimum of f uses the stationarity of the derivative at the extremal points (minimum and maximum). At a given point x_k, the new point x_{k+1} is computed as (where we use a step length of 1)

    x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)} = x_k - \frac{x_k - x_k^2}{1 - 2 x_k} = \frac{x_k^2}{2 x_k - 1}.    (7)

This expression is plotted in Figure 7.

Figure 7: The Newton step (7) plotted as a function, f(x) = x^2/(2x - 1).

First, we consider a Newton iteration starting from x_0 = 20. We obtain the iterations

Iteration  1: x: +20.0000000000000
Iteration  2: x: +10.2564102564103
Iteration  3: x: +5.3910172756239
Iteration  4: x: +2.9710656645926
Iteration  5: x: +1.7861082949934
Iteration  6: x: +1.2402582923278
Iteration  7: x: +1.0389879644558
Iteration  8: x: +1.0014104645499
Iteration  9: x: +1.0000019826398
Iteration 10: x: +1.0000000000039
Iteration 11: x: +1.0000000000000

Thus, the Newton method converges to the local maximum x = 1. Observe that initially the convergence rate appears to be linear, but beginning from the 6th iteration we can observe the quadratic convergence of Newton's method, i.e., the number of correct digits doubles at every Newton iteration. Next we use the initialization x_0 = -20, which results in the following iterations:
Iteration  1: x: -20.0000000000000
Iteration  2: x: -9.7560975609756
Iteration  3: x: -4.6402366526926
Iteration  4: x: -2.0944362725536
Iteration  5: x: -0.8453985955295
Iteration  6: x: -0.2656083788657
Iteration  7: x: -0.0460733999747
Iteration  8: x: -0.0019436273736
Iteration  9: x: -0.0000037635939
Iteration 10: x: -0.0000000000142
Iteration 11: x: -0.0000000000000

Now, the iterates converge to the local minimum x = 0. Again we initially observe linear convergence, and as we get close to the minimum, the convergence becomes quadratic. To show how sensitive Newton's method is to the initial guess, we now initialize the method with values that are very close to each other. Initializing with x_0 = 0.501 results in convergence to x = 1:

Iteration  1: x: +0.5010000000000
Iteration  2: x: +125.5004999999875
Iteration  3: x: +63.0012499960000
Iteration  4: x: +31.7526249589430
Iteration  5: x: +16.1303243344680
Iteration  6: x: +8.3231533526246
Iteration  7: x: +4.4275548879574
Iteration  8: x: +2.4956038666534
Iteration  9: x: +1.5604396259486
Iteration 10: x: +1.1480954483582
Iteration 11: x: +1.0169254942466
Iteration 12: x: +1.0002769332577
Iteration 13: x: +1.0000000766496
Iteration 14: x: +1.0000000000000

Initialization with x_0 = 0.499 leads to convergence to the local minimum x = 0:

Iteration  1: x: +0.4990000000000
Iteration  2: x: -124.5004999999875
Iteration  3: x: -62.0012499960000
Iteration  4: x: -30.7526249589430
Iteration  5: x: -15.1303243344680
Iteration  6: x: -7.3231533526246
Iteration  7: x: -3.4275548879574
Iteration  8: x: -1.4956038666534
Iteration  9: x: -0.5604396259486
Iteration 10: x: -0.1480954483582
Iteration 11: x: -0.0169254942466
Iteration 12: x: -0.0002769332577
Iteration 13: x: -0.0000000766496
Iteration 14: x: -0.0000000000000

Finally, if the method is initialized with x_0 = 0.5, it fails since the second derivative (i.e., the Hessian) of f is singular at this point.

Next, we study the function

    f(x) = \frac{x^4}{4},

which is shown together with its derivative in Figure 8. This function has a singular Hessian at its minimum x = 0.

Figure 8: Function f(x) = x^4/4 (left) and its derivative (right).

Using the Newton method to find the global minimum with the starting guess x_0 = 1 results in the following iterations (only the first 15 iterations are shown):

Iteration  1: x: +1.0000000000000000
Iteration  2: x: +0.6666666666666666
Iteration  3: x: +0.4444444444444444
Iteration  4: x: +0.2962962962962963
Iteration  5: x: +0.1975308641975309
Iteration  6: x: +0.1316872427983539
Iteration  7: x: +0.0877914951989026
Iteration  8: x: +0.0585276634659351
Iteration  9: x: +0.0390184423106234
Iteration 10: x: +0.0260122948737489
Iteration 11: x: +0.0173415299158326
Iteration 12: x: +0.0115610199438884
Iteration 13: x: +0.0077073466292589
Iteration 14: x: +0.0051382310861726
Iteration 15: x: +0.0034254873907817

Note that the iterates converge to the solution x = 0, but they converge at just a linear rate (with factor 2/3) due to the singularity of the Hessian at the solution. For the initial guess x_0 = -1, the iterates are the negatives of the ones shown above.
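The basin sensitivity described above can be reproduced directly from the Newton update (7). A minimal Python sketch (function names are illustrative):

```python
def newton_map(x):
    # Newton update for f(x) = x**2/2 - x**3/3, i.e. x_{k+1} = x_k^2/(2 x_k - 1)
    return x * x / (2.0 * x - 1.0)

def iterate(x0, n=60):
    x = x0
    for _ in range(n):
        x = newton_map(x)
    return x

# Starting points on either side of x = 0.5 end at different stationary points:
print(iterate(20.0))    # converges to the local maximum x = 1
print(iterate(-20.0))   # converges to the local minimum  x = 0
print(iterate(0.501))   # -> 1
print(iterate(0.499))   # -> 0

# For f(x) = x**4/4 the Newton step contracts by exactly the factor 2/3:
x = 1.0
x_next = x - x ** 3 / (3.0 * x ** 2)
print(x_next)
```

Note that x = 0.5 itself must be avoided: there the denominator 2x - 1 (the Hessian) vanishes and the update is undefined.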
Finally, we consider the negative hyperbolic secant function f(x) = -sech(x). The graph of the function and its derivative are shown in Figure 9. This function changes its curvature, and thus Newton's method diverges if the initial guess is too far from the minimum x = 0.

Figure 9: Function f(x) = -sech(x) (left) and its derivative sech(x) tanh(x) (right).

The Newton iteration to find the minimum of this function computes, at a current iterate x_k, the new iterate as

    x_{k+1} = x_k + \frac{\sinh(2 x_k)}{\cosh(2 x_k) - 3}.    (8)

We first study the Newton iterates for the starting value x_0 = 0.1 and observe quadratic convergence (actually, the convergence is even faster than quadratic, which is due to properties of the hyperbolic secant function):

Iteration 1: x: +0.1000000000000
Iteration 2: x: -0.0016882788437
Iteration 3: x: +0.0000000080248
Iteration 4: x: +0.0000000000000

Starting the iteration from x_0 = 0.2 or x_0 = 0.5, the method also converges to the local (and global) minimum. The iterates for the starting guess x_0 = 0.5 are:

Iteration 1: x: +0.5000000000000
Iteration 2: x: -0.3066343424646
Iteration 3: x: +0.0546343723500
Iteration 4: x: -0.0002727965239
Iteration 5: x: +0.0000000000338
Iteration 6: x: +0.0000000000000
However, if the method is initialized with x_0 = 0.6 (or any value larger than that), the method diverges. The first iterates for a starting guess of x_0 = 0.6 are:

Iteration  1: x: +0.6000000000000
Iteration  2: x: -0.6691549358447
Iteration  3: x: +1.1750394947246
Iteration  4: x: +3.4429554757228
Iteration  5: x: +4.4491237479672
Iteration  6: x: +5.4499448954470
Iteration  7: x: +6.4500554892275
Iteration  8: x: +7.4500698794426
Iteration  9: x: +8.4500719737840
Iteration 10: x: +9.4500728796382

This divergence is explained by the curvature of f: the Newton step from x_0 = 0.6 overshoots into the region |x| > arcsinh(1) \approx 0.8814, where f is concave (f'' < 0), so the Newton direction points away from the minimum and the iterates grow without bound.
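The basin-of-attraction behavior for f(x) = -sech(x) can likewise be checked with the update (8). A short Python sketch (the iteration count and divergence threshold are illustrative):

```python
import math

def newton_step(x):
    # Newton update (8) for f(x) = -sech(x)
    return x + math.sinh(2.0 * x) / (math.cosh(2.0 * x) - 3.0)

def run(x0, n=50):
    x = x0
    for _ in range(n):
        x = newton_step(x)
        if abs(x) > 1e6:   # treat very large iterates as divergence
            break
    return x

print(run(0.1))   # converges to the minimum x = 0
print(run(0.5))   # still converges
print(run(0.6))   # diverges: iterates grow by roughly 1 per step
```

Running the starting guesses 0.1 and 0.5 reproduces the convergent logs above, while 0.6 yields iterates that march off toward +infinity, matching the divergent log.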