
MATH CHAPTER 3: UNCONSTRAINED OPTIMIZATION

W. Erwin Diewert

1. First and Second Order Conditions for a Local Min or Max.

Consider the problem of maximizing or minimizing a function of N variables, f(x_1, ..., x_N) = f(x). From Chapter 1, we found that the first order necessary conditions for x^0 to be a local minimizer or maximizer for f were:

(1) D_v f(x^0) = 0 for all directions v ≠ 0_N

where the first order directional derivative of f in the direction v evaluated at the point x is defined as

(2) D_v f(x) ≡ lim_{t→0} [f(x + tv) − f(x)]/t.

In the case where the first order partial derivatives of f exist and are continuous around x^0, we found that conditions (1) were equivalent to the following N first order necessary conditions for x^0 to be a local minimizer or maximizer for f:

(3) f_1(x^0) = 0; f_2(x^0) = 0; ...; f_N(x^0) = 0

where the ith first order partial derivative of f is defined as

(4) f_i(x) ≡ lim_{t→0} [f(x + te^i) − f(x)]/t; i = 1, ..., N

where e^i is the ith unit vector. It is convenient to introduce a symbol to denote the vector of first order partial derivatives of f evaluated at the point x:

(5) ∇f(x) ≡ [f_1(x), f_2(x), ..., f_N(x)]^T.

Note that we have defined ∇f(x) (called the gradient vector of f evaluated at x) to be a column vector. Using the notation (5), the system of first order conditions (3) can be written more efficiently as:

(6) ∇f(x^0) = 0_N.

Also using (5), it can be seen that our old First Order Directional Derivative Theorem can be written as:

(7) D_v f(x^0) = Σ_{i=1}^N v_i f_i(x^0) = v^T ∇f(x^0).

Recall that we required the first order partial derivative functions f_i(x) to exist and be continuous around x^0 in order to derive the formula (7). It is also convenient to introduce a notation for the N by N matrix of second order partial derivatives of f evaluated at the point x:

(8) ∇²f(x) ≡ [ f_11(x), ..., f_1N(x)
               :               :
               f_N1(x), ..., f_NN(x) ]

where the ijth element in ∇²f(x) is defined as

(9) f_ij(x) ≡ lim_{t→0} [f_i(x + te^j) − f_i(x)]/t,

where f_i(x) is the ith first order partial derivative of f evaluated at the point x. The N by N matrix ∇²f(x) is called the Hessian matrix of f evaluated at x. Recall that the directional derivative of the function D_v f(x) evaluated at x in the direction u ≠ 0_N is defined as:

(10) D_vu f(x) ≡ lim_{t→0} [D_v f(x + tu) − D_v f(x)]/t.

Using (8), it can be seen that our old Second Order Directional Derivative Theorem can be written as follows:

(11) D_vu f(x) = v^T ∇²f(x) u.

Recall that in order to prove (11), we required the existence and continuity of the second order partial derivative functions f_ij(x) around the point x. Armed with formula (11), we can state second order sufficient conditions for x^0 to be a strict local minimizer of f: in addition to the first order conditions (6), we require the following second order conditions:

(12) D_vv f(x^0) = v^T ∇²f(x^0) v > 0 for all v ≠ 0_N.

If it is convenient, we can replace v ≠ 0_N in (12) by v^T v = 1 in order to obtain an equivalent set of conditions. Since conditions (6) are equivalent to conditions (1), it can be seen that conditions (6) and (12) are analogues of our single variable calculus conditions for a strict local minimum, except that these univariate conditions now have to hold for all possible directions v.
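To make conditions (6) and (12) concrete, here is a small numerical sketch in Python; the function f(x_1, x_2) = x_1² + x_1 x_2 + x_2² and the use of NumPy are illustrative assumptions, not part of the text. The gradient vanishes at x^0 = 0_2 and v^T ∇²f(x^0) v is positive for a large sample of random directions v:

```python
import numpy as np

# Numerical sketch of conditions (6) and (12) for the illustrative function
# f(x1, x2) = x1**2 + x1*x2 + x2**2, which has x0 = (0, 0) as a critical point.
def gradient(x):
    # components f_1 and f_2 of the gradient vector (5)
    return np.array([2.0 * x[0] + x[1], x[0] + 2.0 * x[1]])

hessian = np.array([[2.0, 1.0],
                    [1.0, 2.0]])      # the Hessian (8) is constant for this f

x0 = np.zeros(2)
print(gradient(x0))                   # condition (6): the gradient vanishes at x0

rng = np.random.default_rng(0)
v = rng.standard_normal((1000, 2))    # a large sample of directions v != 0_N
quad = np.einsum('ki,ij,kj->k', v, hessian, v)
print(quad.min() > 0)                 # condition (12): v'Hv > 0 for every sampled v
```

Of course, sampling directions only suggests that (12) holds; sections 5 and 6 below give exact finite procedures for verifying it.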

The counterpart to the univariate second order necessary conditions for x^0 to be a local minimizer for f is conditions (6) plus the following second order conditions:

(13) D_vv f(x^0) = v^T ∇²f(x^0) v ≥ 0 for all v ≠ 0_N.

Obviously, there are analogous sufficient conditions for x^0 to be a strict local maximizer for f: in addition to (6), we require

(14) D_vv f(x^0) = v^T ∇²f(x^0) v < 0 for all v ≠ 0_N.

Finally, the analogous necessary conditions for x^0 to be a local maximizer for f are (6) and the following second order conditions:

(15) D_vv f(x^0) = v^T ∇²f(x^0) v ≤ 0 for all v ≠ 0_N.

Notice that if N ≥ 2, then the second order conditions (12)-(15) involve checking an infinite number of inequalities. In Chapter 1, we have shown how this task can be accomplished in the case where N = 2. In section 4 below, we will show how to do this checking of inequalities for a general N. However, before we do this, it is useful to develop a few properties of quadratic functions.

2. Taylor's Theorem and Quadratic Approximations

Taylor's Theorem: Let f(x) be a function of one variable defined over the interval x^0 ≤ x ≤ x^1 where x^0 < x^1. Suppose the (n−1)st derivative of f, f^(n−1)(x), exists and is continuous over this interval and suppose that the nth derivative of f, f^(n)(x), exists for x such that x^0 < x < x^1. Define the "remainder" R by the following equation:

(16) f(x^1) = f(x^0) + Σ_{k=1}^{n−1} [(x^1 − x^0)^k / k!] f^(k)(x^0) + R.

Then there exists a point x* such that x^0 < x* < x^1 and

(17) R = (x^1 − x^0)^n f^(n)(x*) / n!.

Proof: Define the number M by the following equation:

(18) f(x^1) = f(x^0) + Σ_{k=1}^{n−1} [(x^1 − x^0)^k / k!] f^(k)(x^0) + M (x^1 − x^0)^n / n!.

Define the function F(x) by:

(19) F(x) ≡ −f(x^1) + f(x) + Σ_{k=1}^{n−1} [(x^1 − x)^k / k!] f^(k)(x) + M (x^1 − x)^n / n!.

It can be seen that F(x^1) = 0 and, by using (18), it can be seen that F(x^0) = 0 as well. Thus the function F(x) is continuous for x such that x^0 ≤ x ≤ x^1 and F is such that F(x^0) = F(x^1). Thus the function F must attain a local min or a local max for at least one x* such that x^0 < x* < x^1. The first order necessary conditions for a min or a max of F(x) must hold at x = x*, so we have, differentiating the F defined by (19):

0 = F′(x*)
  = f′(x*) + Σ_{k=1}^{n−1} { [(x^1 − x*)^k / k!] f^(k+1)(x*) − [(x^1 − x*)^{k−1} / (k−1)!] f^(k)(x*) } − [(x^1 − x*)^{n−1} / (n−1)!] M
  = f′(x*) + { (x^1 − x*) f^(2)(x*) + [(x^1 − x*)² / 2!] f^(3)(x*) + ... + [(x^1 − x*)^{n−1} / (n−1)!] f^(n)(x*) }
    − { f′(x*) + (x^1 − x*) f^(2)(x*) + [(x^1 − x*)² / 2!] f^(3)(x*) + ... + [(x^1 − x*)^{n−2} / (n−2)!] f^(n−1)(x*) }
    − [(x^1 − x*)^{n−1} / (n−1)!] M
(20) = (x^1 − x*)^{n−1} [f^(n)(x*) − M] / (n−1)!   cancelling terms.

Since x^0 < x* < x^1 and hence x^1 − x* > 0, we see that (20) implies

(21) f^(n)(x*) = M.

Now substitute (21) into (18) and we obtain (16) where R is defined by (17). Q.E.D.

Note that Taylor's Theorem reduces to the Mean Value Theorem if we set n = 1. When n = 2, Taylor's Theorem becomes, letting x^1 be replaced by x:

(22) f(x) = f(x^0) + (x − x^0) f′(x^0) + R.

If we drop the remainder term R on the right hand side of (22), what is left is called the linear approximation to f around the point x^0; i.e., define

(23) l(x) ≡ f(x^0) + (x − x^0) f′(x^0).

Then for x "reasonably" close to x^0, l(x) will approximate f(x) "reasonably" well:

(24) f(x) ≅ f(x^0) + (x − x^0) f′(x^0).

Note that l(x^0) = f(x^0) and l′(x^0) = f′(x^0); i.e., the linear approximation to f around the point x^0 has the same level and first derivative as f when evaluated at x = x^0.
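A quick numerical sketch of the linear approximation (23), taking the hypothetical example f(x) = e^x with x^0 = 0 (this choice of f is an assumption made only for illustration): the approximation error |f(x^0 + h) − l(x^0 + h)| shrinks roughly like h², consistent with the remainder term in (22).

```python
import numpy as np

# Sketch of the linear approximation (23) for the illustrative choice
# f(x) = exp(x) with x0 = 0, so that l(x) = 1 + x.
f = np.exp
x0 = 0.0

def l(x):
    return f(x0) + (x - x0) * np.exp(x0)   # f(x0) + (x - x0) f'(x0)

for h in [0.1, 0.01, 0.001]:
    print(h, abs(f(x0 + h) - l(x0 + h)))   # the error shrinks roughly like h**2
```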

Figure 1: The Linear Approximation to f at x^0. [Graph of f(x) together with the tangent line l(x) through the point (x^0, f(x^0)).]

When n = 3, Taylor's Theorem becomes, letting x^1 be replaced by x:

(25) f(x) = f(x^0) + (x − x^0) f′(x^0) + (1/2)(x − x^0)² f″(x^0) + R.

If we drop the remainder term R on the right hand side of (25), what is left is called the quadratic approximation to f around the point x^0:

(26) q(x) ≡ f(x^0) + (x − x^0) f′(x^0) + (1/2)(x − x^0)² f″(x^0).

Note that the quadratic approximation to f(x) around the point x^0 will have the same level and first and second derivatives as f evaluated at x = x^0; i.e., we have

(27) q(x^0) = f(x^0); q′(x^0) = f′(x^0); q″(x^0) = f″(x^0).

The quadratic approximation to f around the point x^0 will generally approximate f around x^0 more closely than the corresponding linear approximation.

Figure 2: The Quadratic Approximation to f at x^0. [Graph of f(x) together with the approximating parabola q(x) through the point (x^0, f(x^0)).]
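Extending the sketch above to the quadratic approximation (26), again with the hypothetical choice f(x) = e^x and x^0 = 0: q(x) tracks f(x) one order better than l(x) as x approaches x^0.

```python
import numpy as np

# Comparing l(x) from (23) and q(x) from (26), again for the illustrative
# choice f(x) = exp(x) with x0 = 0.
f = np.exp

def l(x):
    return 1.0 + x                     # linear approximation around x0 = 0

def q(x):
    return 1.0 + x + 0.5 * x**2        # adds the (1/2)(x - x0)**2 f''(x0) term

for h in [0.5, 0.1, 0.01]:
    print(h, abs(f(h) - l(h)), abs(f(h) - q(h)))   # q's error is an order smaller
```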

The concepts of linear and quadratic approximations to general nonlinear functions can be extended to functions of N variables using the univariate analysis developed above. Let f(x) = f(x_1, x_2, ..., x_N) be a function of N variables with continuous first and second order partial derivatives. Now use f in order to define the following function of a single variable t:

(28) g(t) ≡ f(x^0 + t(x − x^0)); 0 ≤ t ≤ 1.

Thus we have:

(29) g(0) = f(x^0) and g(1) = f(x).

Now apply the linear approximation idea to g around the point t = 0. Thus we have:

g(0) + g′(0)(t − 0)
   = f(x^0) + Σ_{i=1}^N f_i(x^0)(x_i − x_i^0)(t − 0)   differentiating (28) with respect to t and evaluating the derivatives at t = 0
(30) = f(x^0) + t ∇f(x^0)^T (x − x^0)   rearranging terms.

Letting t = 1 and using (29), (30) becomes:

(31) f(x) ≅ f(x^0) + ∇f(x^0)^T (x − x^0)

and the right hand side of (31) can be regarded as the linear approximation to f(x) around x = x^0. In order to calculate the quadratic approximation to g(t) around t = 0, we need to calculate the first and second derivatives of g(t). Differentiating (28) with respect to t, we obtain:

(32) g′(t) = Σ_{i=1}^N f_i(x^0 + t(x − x^0))(x_i − x_i^0) = ∇f(x^0 + t(x − x^0))^T (x − x^0);

(33) g″(t) = Σ_{i=1}^N Σ_{j=1}^N f_ij(x^0 + t(x − x^0))(x_i − x_i^0)(x_j − x_j^0) = (x − x^0)^T ∇²f(x^0 + t(x − x^0)) (x − x^0).

Thus the quadratic approximation to g around the point t = 0 is:

(34) g(0) + g′(0)(t − 0) + (1/2) g″(0)(t − 0)²

(35) = f(x^0) + t ∇f(x^0)^T (x − x^0) + (1/2) t² (x − x^0)^T ∇²f(x^0)(x − x^0)

where (35) follows from (34) using (29), (32) and (33). Now evaluate (35) at t = 1 and, using (29), we have:

(36) f(x) ≅ f(x^0) + ∇f(x^0)^T (x − x^0) + (1/2)(x − x^0)^T ∇²f(x^0)(x − x^0).

Note that the right hand side of (36) is a quadratic function of x; it is called the quadratic approximation to f(x) around the point x = x^0. Note that Young's Theorem (f_ij(x^0) = f_ji(x^0) for all i ≠ j) implies that the N by N matrix ∇²f(x^0) in (36) will be symmetric. Linear and quadratic approximations to general nonlinear functions of N variables are used widely in economics, science, engineering, statistics and business.

3. Rules for Differentiating Linear and Quadratic Functions.

Suppose f(x) is a linear function of N variables; i.e.,

(37) f(x) ≡ a + Σ_{i=1}^N b_i x_i = a + b^T x

where b^T ≡ [b_1, b_2, ..., b_N]. Partially differentiating the f(x) defined by (37) with respect to the components of x yields:

(38) f_i(x) = ∂f(x)/∂x_i = b_i; i = 1, 2, ..., N.

Obviously, equations (38) can be rewritten as:

(39) ∇f(x) = b if f(x) ≡ a + b^T x; Rule 1.

If we further differentiate the f_i(x) defined by (38) with respect to the components of x, we obtain:

(40) f_ij(x) ≡ ∂²f(x)/∂x_i∂x_j = 0; 1 ≤ i, j ≤ N.

The equations (40) can be written more compactly as:

(41) ∇²f(x) = 0_{N×N} if f(x) ≡ a + b^T x; Rule 2.

Now suppose f(x) is the following (homogeneous) quadratic function of N variables; i.e.,

(42) f(x) ≡ Σ_{i=1}^N Σ_{j=1}^N a_ij x_i x_j = x^T Ax where A = A^T.

Note that we are assuming that the matrix of coefficients A ≡ [a_ij] in (42) is symmetric; i.e., we have a_ij = a_ji for all i ≠ j. We want to calculate the first and second order partial derivatives of the f defined by (42). Let us first consider the case N = 2. In this case, taking into account the fact that a_12 = a_21, we have:

(43) f(x_1, x_2) = a_11 x_1² + 2a_12 x_1 x_2 + a_22 x_2².

The first order partial derivatives of (43) are:

(44) f_1(x_1, x_2) = 2a_11 x_1 + 2a_12 x_2; f_2(x_1, x_2) = 2a_12 x_1 + 2a_22 x_2.

Equations (44) can be written as:

(45) ∇f(x) = 2Ax if f(x) ≡ x^T Ax, A = A^T; Rule 3.

If we partially differentiate the f_i(x_1, x_2) in (44) with respect to x_1 and x_2, we obtain the following second order derivatives:

(46) f_11(x_1, x_2) = 2a_11; f_12(x_1, x_2) = 2a_12; f_21(x_1, x_2) = 2a_12; f_22(x_1, x_2) = 2a_22.

Using matrices, equations (46) can be rewritten as:

(47) ∇²f(x) = 2A if f(x) ≡ x^T Ax, A = A^T; Rule 4.

It can be verified that Rules 3 and 4 hold for a general N and not only for the cases N = 1 and N = 2. Rules 1 to 4 are extremely useful and should be memorized.

Problems:

1. Verify Rules 3 and 4 for the case N = 3.

2. Consider the following system of equations:

(i) y = Xx + e

where y and e are M dimensional vectors, X is an M by N matrix and x is an N dimensional vector. Define the function f(x) as

f(x) ≡ e^T e = Σ_{i=1}^M e_i²
     = (y − Xx)^T (y − Xx)   using (i) to solve for e
     = (y^T − x^T X^T)(y − Xx)
     = y^T y − y^T Xx − x^T X^T y + x^T X^T Xx
(ii) = y^T y − 2y^T Xx + x^T X^T Xx   using y^T Xx = [x^T X^T y]^T.

Assume that (X^T X)^{−1} exists.

(a) Show that x̂ ≡ (X^T X)^{−1} X^T y satisfies the system of first order conditions for minimizing f(x):

(iii) ∇f(x̂) = 0_N.

[In statistics, x̂ is known as the least squares estimator for the vector of parameters x.]

(b) Show that ∇²f(x) does not depend on x.

(c) Show that

(iv) v^T ∇²f(x̂) v > 0 for every v ≠ 0_N

and so x̂ is in fact a local minimizer for f(x). [This part of the problem is difficult.] This problem shows that the least squares estimator x̂ actually does minimize the sum of squared errors e^T e with respect to the vector of coefficients x.

4. Quadratic Forms and Definite Matrices

Let A be an N by N symmetric matrix and consider the following definitions:

(48) A is positive definite iff x^T Ax > 0 for all x ≠ 0_N;
(49) A is negative definite iff x^T Ax < 0 for all x ≠ 0_N;
(50) A is positive semidefinite iff x^T Ax ≥ 0 for all x ≠ 0_N;
(51) A is negative semidefinite iff x^T Ax ≤ 0 for all x ≠ 0_N;
(52) A is indefinite iff it is none of (48)-(51).

Recall the second order conditions (12)-(15) that were discussed in section 1 above. If we let the A matrix in this section equal ∇²f(x^0) in section 1, it can be

seen that (48) corresponds to conditions (12) for a strict local minimum, (49) corresponds to conditions (14) for a strict local maximum, (50) corresponds to the second order necessary conditions (13) for a local minimum and (51) corresponds to the second order necessary conditions (15) for a local maximum.

In the following section, we show how the Gaussian triangularization procedure can be adapted to determine whether a symmetric matrix A has any of the definiteness properties (48)-(52).

Problem:

3. Let D be an N by N diagonal matrix with main diagonal elements d_ii for i = 1, 2, ..., N. Determine what restrictions the d_ii must satisfy in order for D to be: (i) positive definite; (ii) negative definite; (iii) positive semidefinite; (iv) negative semidefinite; and (v) indefinite (assume N ≥ 2 for this case).

5. The Method of Lagrange and Gauss for Diagonalizing a Symmetric Matrix

Recall the Gaussian triangularization procedure that was discussed in section 3 of Chapter 2 on Elementary Matrix Algebra. If A is a symmetric N by N matrix, then this algorithm can readily be modified to transform A into a diagonal matrix. Consider Stage 1 of our old algorithm, where we added multiples of one row of A to other rows of A to create zeros below the first component of the first column of A. We again apply Stage 1 of our old algorithm, but before we proceed to Stage 2, we now add multiples of the final Stage 1 first column to the remaining columns of the transformed A matrix to create zeros in the remainder of row 1. In other words, we repeat the sequence of elementary row operations that we used to accomplish Stage 1 of the algorithm, but now we apply the same sequence to the columns as well.

More explicitly, consider the 3 cases for Stage 1 of our old algorithm. In case (i), we had a_11 ≠ 0, and at the end of Stage 1, the transformed A matrix had the following form (the E_n represent elementary row operation matrices that add multiples of the first row of A to the remaining rows of A):

(53) [ a_11,    a_12, ..., a_1N
       0_{N−1},     A^(2)       ] = E_N E_{N−1} ... E_2 A.

Now add −a_12/a_11 times the first column of (53) to the second column of (53); add −a_13/a_11 times the first column of (53) to the 3rd column of (53); ...; add −a_1N/a_11 times the first column of (53) to the Nth column of (53). It can be verified that these elementary column operations can be performed by multiplying (53) on the right by E_2^T E_3^T ... E_N^T; i.e., the transposes of the sequence of row operation

matrices E_2, E_3, ..., E_N sweep out a_12 = a_21, a_13 = a_31, ..., a_1N = a_N1. Thus we have

(54) E_N E_{N−1} ... E_2 A E_2^T E_3^T ... E_N^T = [ a_11,    0_{N−1}^T
                                                     0_{N−1},  A*       ]

at the end of our new Stage 1 algorithm for case (i) where a_11 ≠ 0. If we take transposes of both sides of (54), we deduce that the matrix on the left hand side of (54) is symmetric. Hence A* on the right hand side of (54) must also be symmetric. Hence, we can now apply the next stage of our modified algorithm to the N−1 by N−1 symmetric matrix A*.

Now suppose that at Stage 1 of our old algorithm, case (iii) occurred; i.e., a_i1 = 0 for i = 1, 2, ..., N. But since A is now assumed to be symmetric, we have a_1j = 0 as well for j = 1, 2, ..., N. Thus in case (iii), A has the following form:

(55) A = [ 0,        0_{N−1}^T
           0_{N−1},  A*        ]

which is the required form for the next stage of our modified algorithm.

Finally, suppose that at Stage 1 of our old algorithm, case (ii) occurred; i.e., a_11 = 0 but a_i1 ≠ 0 for some i > 1. Recall that in our old algorithm, we added row i of A to the first row of A and then applied the case (i) operations to the transformed matrix. In the present algorithm, we not only add row i of A to row 1, we then immediately add column i of the transformed matrix to column 1. The resulting matrix will be symmetric with the element 2a_i1 + a_ii in the northwest corner of the transformed matrix. We now need to consider 2 cases:

Case (a): 2a_i1 + a_ii ≠ 0.

In this case, we can now apply our new case (i) algorithm on the previous page to this transformed matrix. If we denote E_1 as the elementary row matrix that adds row i of A to row 1, then we have the following decomposition at the end of Stage 1 of our new algorithm:

(56) E_N E_{N−1} ... E_2 E_1 A E_1^T E_2^T ... E_N^T = [ 2a_i1 + a_ii,  0_{N−1}^T
                                                         0_{N−1},       A*       ];

i.e., we have again reduced A into block diagonal form where A* is a symmetric N−1 by N−1 matrix.

Case (b): 2a_i1 + a_ii = 0.

In this case, if we look at row 1 and row i and column 1 and column i of the original A matrix, this 2 by 2 submatrix of A has the following form (using a_11 = 0 and a_ii = −2a_i1):

[ 0,     a_i1
  a_i1, −2a_i1 ].

After adding row i of the original matrix to row 1 and then adding column i of the original matrix to column 1, the above 2 by 2 submatrix is transformed into:

[ 0,     −a_i1
  −a_i1, −2a_i1 ]

so we have not succeeded in getting a nonzero element in the northwest corner of the transformed matrix. However, to solve this problem, all we have to do is add row i of the transformed matrix to row 1 and then add column i of the transformed matrix to column 1. Then the new transformed A matrix will be symmetric and will have −4a_i1 ≠ 0 in the northwest corner. Hence in this case (b), we can again obtain a counterpart to (56) where −4a_i1 will replace 2a_i1 + a_ii in the northwest corner of the matrix on the right hand side of (56). Hence in both cases (a) and (b), we have again reduced A into block diagonal form where A* is an N−1 by N−1 symmetric matrix.

Hence, for all cases, at the end of Stage 1 of our new algorithm, we have reduced A into the following block diagonal form:

(57) [ d_11,     0_{N−1}^T
       0_{N−1},  A*        ].

At Stage 2 of the algorithm, we apply the same type of elementary row and column operations to the symmetric matrix A*, and at the end of Stage 2, we have reduced A* into the following form:

(58) A* = [ d_22,     0_{N−2}^T
            0_{N−2},  A**       ]

where A** is an N−2 by N−2 symmetric matrix. Now further reduce A** into block diagonal form; etc. Finally, at the end of Stage N, we have transformed A into diagonal form by means of a sequence of elementary row and column operations where we add multiples of one row to another row and then repeat the same operation on the corresponding columns. If we let the N by N matrix E denote the product of all of the elementary row matrices, then we have

(59) EAE^T = D where D = [d_ij] and d_ij = 0 if i ≠ j.

Example: Let

A ≡ [ 1, 2, 3
      2, 1, 0
      3, 0, 1 ].

Stage 1: We are in case (i): a_11 = 1 ≠ 0. Hence take −2 times row 1 and add it to row 2; take −3 times row 1 and add it to row 3. We obtain the following matrix:

[ 1,  2,  3
  0, −3, −6
  0, −6, −8 ].

Now take −2 times column 1 and add it to column 2; take −3 times column 1 and add it to column 3; we get:

[ 1,  0,  0
  0, −3, −6
  0, −6, −8 ].

Stage 2: Now take −2 times row 2 and add it to row 3; we get:

[ 1,  0,  0
  0, −3, −6
  0,  0,  4 ].

Finally, take −2 times column 2 and add it to column 3; we get:

(60) D = [ 1,  0, 0
           0, −3, 0
           0,  0, 4 ],  a diagonal matrix.

The two elementary row matrices that we used at Stage 1 of the algorithm were:

(61) E_1 ≡ [ 1, 0, 0          E_2 ≡ [ 1, 0, 0
            −2, 1, 0                  0, 1, 0
             0, 0, 1 ];              −3, 0, 1 ].

The final elementary row matrix that we used at Stage 2 of the algorithm was:

(62) E_3 ≡ [ 1,  0, 0
             0,  1, 0
             0, −2, 1 ].
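The example above can also be reproduced mechanically. The following Python sketch implements the Lagrange-Gauss procedure for the simple situation where every pivot is nonzero (case (i) at each stage); the handling of cases (ii) and (iii) is omitted for brevity, and NumPy is an assumed dependency.

```python
import numpy as np

# A minimal sketch of the Lagrange-Gauss diagonalization, covering only the
# case where every pivot is nonzero (case (i) at each stage); cases (ii) and
# (iii) of the text are not handled here.
def lagrange_gauss(A):
    A = np.array(A, dtype=float)
    n = A.shape[0]
    E = np.eye(n)
    for k in range(n - 1):
        assert A[k, k] != 0.0, "zero pivot: cases (ii)/(iii) not implemented"
        for i in range(k + 1, n):
            m = -A[i, k] / A[k, k]
            A[i, :] += m * A[k, :]     # elementary row operation
            A[:, i] += m * A[:, k]     # the same operation applied to the columns
            E[i, :] += m * E[k, :]     # accumulate the product of the E matrices
    return E, A                        # A has been reduced to the diagonal D

A = [[1.0, 2.0, 3.0],
     [2.0, 1.0, 0.0],
     [3.0, 0.0, 1.0]]
E, D = lagrange_gauss(A)
print(np.round(D, 10))                        # diag(1, -3, 4), matching (60)
print(np.allclose(E @ np.array(A) @ E.T, D))  # verifies E A E^T = D, as in (59)
```

Running this on the example matrix reproduces D = diag(1, −3, 4) and an E satisfying EAE^T = D, as required by (59).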

Problem:

4. Define E ≡ E_3 E_2 E_1 where E_3 is defined by (62) and E_1 and E_2 are defined in (61). Show that EAE^T = D where D is defined by (60).

The matrix E and the diagonal matrix D which occur in the Lagrange-Gauss diagonalization procedure (see (59) above) can be used to determine whether the symmetric A satisfies any of the definiteness properties (48)-(52). Consider the E matrix which occurs in (59). Since E is a product of elementary row matrices, each of which has determinant equal to 1, it can be seen that

(63) |E| = |E^T| = 1.

Since |E^T| = 1 ≠ 0, (E^T)^{−1} exists. Now for each x ≠ 0_N, consider the y defined by

(64) y ≡ (E^T)^{−1} x.

Suppose y = 0_N. Then premultiplying both sides of (64) by E^T leads to x = 0_N, which contradicts x ≠ 0_N. Hence if x ≠ 0_N, then the y defined by (64) also satisfies y ≠ 0_N. Let x ≠ 0_N and define y by (64). Premultiplying both sides of (64) by E^T leads to

(65) x = E^T y where y ≠ 0_N.

Hence for x ≠ 0_N, we have

x^T Ax = (E^T y)^T A (E^T y)   using (65)
       = y^T EAE^T y
       = y^T Dy                using (59)
(66)   = Σ_{i=1}^N d_ii y_i².

Thus necessary and sufficient conditions for A to be positive definite are:

(67) d_ii > 0 for i = 1, 2, ..., N.

Using (66) and (49), it can be seen that necessary and sufficient conditions for A to be negative definite are:

(68) d_ii < 0; i = 1, ..., N.

Similarly, necessary and sufficient conditions for A to be positive semidefinite are:

(69) d_ii ≥ 0; i = 1, ..., N.

Finally, necessary and sufficient conditions for A to be negative semidefinite are:

(70) d_ii ≤ 0; i = 1, ..., N.

Problem:

5. Let A ≡ [ … ; … 2 ]. Which of the definiteness properties (48)-(52) does A satisfy?

Historical Note: The above reduction of a quadratic form x^T Ax to a sum of squares y^T Dy was accomplished by J.-L. Lagrange (1759), "Recherches sur la méthode de maximis et minimis", Miscellanea Taurinensia 1, for the cases N = 2 and N = 3. Carl Friedrich Gauss described the general algorithm in 1810; see his Theory of the Combination of Observations Least Subject to Errors, G.W. Stewart, translator, SIAM Classics in Applied Mathematics, 1995. This publication indicates that Gauss arrived at the principle of least squares estimation in 1794 or 1795, but the French mathematician A.M. Legendre independently derived the principle (and named it) in 1805 and actually published the method before Gauss.

6. Checking Second Order Conditions Using Determinants

Let A = [a_ij] be an N by N symmetric matrix and suppose that we want to check whether A is a positive definite matrix. If A is positive definite, then it must be the case that a_11 > 0. Why is this? By the definition of A being positive definite, (48) above, we must have x^T Ax > 0 for all x ≠ 0_N. Let x = e^1, the first unit vector. Then if A is positive definite, we must have

(71) e^1T A e^1 = a_11 > 0.

We can rewrite (71) using determinantal notation. Since the determinant of a one by one matrix is simply equal to the single element, (71) is equivalent to:

(72) |a_11| > 0.

Now if the N by N matrix A is positive definite, it can be seen that we must have

(73) 0 < [x_1, x_2, 0_{N−2}^T] A [x_1, x_2, 0_{N−2}^T]^T = [x_1, x_2] [ a_11, a_12 ; a_12, a_22 ] [x_1, x_2]^T

for all x_1, x_2 such that [x_1, x_2] ≠ [0, 0]. This means that the top left corner 2 by 2 submatrix of A must also be positive definite if A is positive definite. Hence, by

the previous section, there exists a 2 by 2 elementary row matrix E^(2) which, along with E^(2)T, reduces the 2 by 2 submatrix of A into diagonal form. Using (71), it can be seen that the E^(2) which will do the job is

(74) E^(2) ≡ [ 1,           0
              −a_12/a_11,   1 ]

and we have

(75) E^(2) [ a_11, a_12 ; a_12, a_22 ] E^(2)T = [ d_11, 0 ; 0, d_22 ]

where the d_ii turn out to be:

(76) d_11 ≡ a_11;
(77) d_22 ≡ a_22 − a_12²/a_11.

From the previous section, we know that necessary and sufficient conditions for the 2 by 2 submatrix of A to be positive definite are:

(78) d_11 > 0; d_22 > 0.

Since |E^(2)| = 1, taking determinants on both sides of (75) yields:

(79) det[ a_11, a_12 ; a_12, a_22 ] = det[ d_11, 0 ; 0, d_22 ] = d_11 d_22 > 0

where the inequality follows from (78). Using (76) and (79), it can be seen that the determinantal conditions:

(80) a_11 > 0;

(81) det[ a_11, a_12 ; a_12, a_22 ] > 0

are necessary and sufficient for conditions (78), which in turn are necessary and sufficient for the positive definiteness of the top left corner 2 by 2 submatrix of A. If the N by N matrix A is positive definite, it can be seen that we must have

(82) 0 < [x_1, x_2, x_3, 0_{N−3}^T] A [x_1, x_2, x_3, 0_{N−3}^T]^T = [x_1, x_2, x_3] [ a_11, a_12, a_13 ; a_12, a_22, a_23 ; a_13, a_23, a_33 ] [x_1, x_2, x_3]^T

for all [x_1, x_2, x_3] ≠ [0, 0, 0]. This means that the top left corner 3 by 3 submatrix of A must also be positive definite. Hence there exists a 3 by 3 elementary row matrix E^(3) with |E^(3)| = 1 such that

(83) E^(3) [ a_11, a_12, a_13       [ d_11^(3), 0,        0
             a_12, a_22, a_23    =    0,        d_22^(3), 0
             a_13, a_23, a_33 ] E^(3)T = 0,     0,        d_33^(3) ]

where the d_ii^(3) satisfy:

(84) d_11^(3) > 0; d_22^(3) > 0; d_33^(3) > 0.

Since |E^(3)| = 1, taking determinants on both sides of (83) yields

(85) det[ a_11, a_12, a_13 ; a_12, a_22, a_23 ; a_13, a_23, a_33 ] = det[ d_11^(3), 0, 0 ; 0, d_22^(3), 0 ; 0, 0, d_33^(3) ] = d_11^(3) d_22^(3) d_33^(3) > 0

where the inequality in (85) follows from (84).

When A is positive definite, we need to show that the d_11^(3) and d_22^(3) which occur in (83)-(85) are the same as the d_11 and d_22 which occurred in (76)-(79). But this is obviously true using the Gaussian diagonalization algorithm: when we diagonalize the 3 by 3 submatrix of A, we must first diagonalize the 2 by 2 submatrix of A, and hence the d_11^(3) and d_22^(3) in (83) will equal the d_11 and d_22 which occurred in (75). Hence, we can rewrite (85) as follows:

(86) det[ a_11, a_12, a_13 ; a_12, a_22, a_23 ; a_13, a_23, a_33 ] = d_11 d_22 d_33 > 0;

i.e., we have dropped the superscripts on the d_ii. Now it can be seen that the determinantal inequalities (80), (81) and

(87) det[ a_11, a_12, a_13 ; a_12, a_22, a_23 ; a_13, a_23, a_33 ] > 0

along with the equalities in (76), (79) and (86) are necessary and sufficient for the inequalities

(88) d_11 > 0, d_22 > 0, d_33 > 0

which in turn are necessary and sufficient for the top left 3 by 3 submatrix of A to be positive definite. Obviously, the above process can be continued until we obtain the following N determinantal conditions, which are necessary and sufficient for the N by N symmetric matrix A to be positive definite:

(89) a_11 > 0; det[ a_11, a_12 ; a_12, a_22 ] > 0; det[ a_11, a_12, a_13 ; a_12, a_22, a_23 ; a_13, a_23, a_33 ] > 0; ...; |A| > 0.

How can we adapt the above analysis to obtain conditions for A to be negative definite? Obviously, the Gaussian diagonalization procedure can again be used: the only difference in the analysis will be that the diagonal elements d_ii must all be negative in the case where A is negative definite. This means that the determinantal conditions in (89) that involve an odd number of rows and columns of A must have their signs changed, since these determinants will equal the product of an odd number of the d_ii. Hence the following N determinantal conditions are necessary and sufficient for the N by N symmetric matrix A to be negative definite:

(90) a_11 < 0; det[ a_11, a_12 ; a_12, a_22 ] > 0; det[ a_11, a_12, a_13 ; a_12, a_22, a_23 ; a_13, a_23, a_33 ] < 0; ...; (−1)^N |A| > 0.

Turning now to determinantal conditions for positive semidefiniteness or negative semidefiniteness, one might think that the conditions are a straightforward modification of conditions (89) and (90) respectively, where the strict inequalities (>) are replaced by weak inequalities (≥). Unfortunately, this thought is incorrect, as the following example shows.

Example: A ≡ [ 0, 0, 0
               0, 1, 0
               0, 0, −1 ].

In this case, we see that a_11 = 0 ≥ 0;

det[ a_11, a_12 ; a_12, a_22 ] = det[ 0, 0 ; 0, 1 ] = 0, and |A| = 0. Hence the weak inequality forms of conditions (89) and (90) are both satisfied, so we might want to conclude that this A is both positive and negative semidefinite. However, this is not so: A is indefinite, since e^2T A e^2 = a_22 = 1 > 0 and e^3T A e^3 = a_33 = −1 < 0.

The problem with this example is that all of the elements in the first row and column of A are zero and hence d_11 is zero. Now look back at the inequalities (79) and (86): it can be seen that if d_11 = 0, then these inequalities are no longer valid. However, if instead of always picking submatrices of A that included the first row and column of A, we picked submatrices of A that excluded the first row and column, then we would discover that the submatrix of A which consists of rows 2 and 3 and columns 2 and 3 is indefinite; i.e., we have

(91) a_22 = 1 > 0 and det[ a_22, a_23 ; a_23, a_33 ] = −1 < 0.

In order to determine whether A is positive semidefinite, we replace the strict inequalities in (89) by weak inequalities, but the resulting weak inequalities must be checked for all possible choices of the rows (and corresponding columns) of A; i.e., necessary and sufficient conditions for A to be positive semidefinite are:

(92) |a_ii| = a_ii ≥ 0 for i = 1, 2, ..., N;
     det[ a_{i1,i1}, a_{i1,i2} ; a_{i1,i2}, a_{i2,i2} ] ≥ 0 for 1 ≤ i1 < i2 ≤ N;
     det[ a_{i1,i1}, a_{i1,i2}, a_{i1,i3} ; a_{i1,i2}, a_{i2,i2}, a_{i2,i3} ; a_{i1,i3}, a_{i2,i3}, a_{i3,i3} ] ≥ 0 for 1 ≤ i1 < i2 < i3 ≤ N;
     :
     |A| ≥ 0.

In the 2 by 2 case, conditions (92) boil down to the following 3 conditions:

(93) a_11 ≥ 0; a_22 ≥ 0; |A| = a_11 a_22 − a_12² ≥ 0.

In the 3 by 3 case, conditions (92) reduce to the following 7 determinantal conditions:

(94) a_11 ≥ 0; a_22 ≥ 0; a_33 ≥ 0; det[ a_11, a_12 ; a_12, a_22 ] ≥ 0; det[ a_11, a_13 ; a_13, a_33 ] ≥ 0; det[ a_22, a_23 ; a_23, a_33 ] ≥ 0; |A| ≥ 0.

If A is a symmetric N by N matrix, then necessary and sufficient determinantal conditions for A to be negative semidefinite are:

(95) (−1)¹ a_ii ≥ 0 for i = 1, 2, ..., N;
     (−1)² det[ a_{i1,i1}, a_{i1,i2} ; a_{i1,i2}, a_{i2,i2} ] ≥ 0 for 1 ≤ i1 < i2 ≤ N;
     (−1)³ det[ a_{i1,i1}, a_{i1,i2}, a_{i1,i3} ; a_{i1,i2}, a_{i2,i2}, a_{i2,i3} ; a_{i1,i3}, a_{i2,i3}, a_{i3,i3} ] ≥ 0 for 1 ≤ i1 < i2 < i3 ≤ N;
     :
     (−1)^N |A| ≥ 0.

Problems:

6. Let A ≡ [ … ; … 0 ]. Use the Gaussian diagonalization procedure to determine the definiteness properties of A.

7. Does the A defined in problem 6 above satisfy the determinantal conditions (93) for positive semidefiniteness?

8. Solve max_{x1,x2} {f(x_1, x_2): x_1 > 0, x_2 > 0} (if possible) where f is defined as follows:

(a) f(x_1, x_2) ≡ −x_1² + x_1 x_2 − x_2² + x_1 + x_2;
(b) f(x_1, x_2) ≡ ln x_1 + ln x_2 + x_1 x_2 − 2x_1 − 2x_2.

Check second order conditions when appropriate.

9. Consider the following 2 input, 1 output profit maximization problem:

(i) max_{y,x1,x2} {py − w_1 x_1 − w_2 x_2 : y = f(x_1, x_2)}

where f is the producer's production function, w_i > 0 is the price of input i and p > 0 is the price of output. The unconstrained maximization problem that is equivalent to (i) is:

(ii) max_{x1,x2} {p f(x_1, x_2) − w_1 x_1 − w_2 x_2}.

Assume that f is twice continuously differentiable, that x_1* = d_1(p, w_1, w_2) > 0 and x_2* = d_2(p, w_1, w_2) > 0 solve (ii), and that the first and second order sufficient conditions for a strict local maximum are satisfied at this point x_1*, x_2*. Note that the producer's supply function y* = s(p, w_1, w_2) can be determined as a function of the two input demand functions d_1 and d_2:

(iii) s(p, w_1, w_2) ≡ f[d_1(p, w_1, w_2), d_2(p, w_1, w_2)].

(a) Try to determine the signs of the following derivatives: ∂s(p, w_1, w_2)/∂p; ∂d_1(p, w_1, w_2)/∂w_1; ∂d_2(p, w_1, w_2)/∂w_2.

(b) Prove that: ∂d_1(p, w_1, w_2)/∂w_2 = ∂d_2(p, w_1, w_2)/∂w_1.

(c) Prove that: ∂s(p, w_1, w_2)/∂w_1 = −∂d_1(p, w_1, w_2)/∂p.

Note: (b) and (c) are Hotelling symmetry conditions. Hint: Look at the 2 first order conditions for (ii). Differentiate these 2 equations with respect to p; you will obtain a system of 2 equations involving the unknown derivatives ∂d_1(p, w_1, w_2)/∂p and ∂d_2(p, w_1, w_2)/∂p. Now differentiate the 2 first order conditions with respect to w_1; you will obtain a system of 2 equations involving the derivatives ∂d_1(p, w_1, w_2)/∂w_1 and ∂d_2(p, w_1, w_2)/∂w_1.

10. Let F ≡ [ f_11, f_12 ; f_21, f_22 ] be a symmetric matrix that satisfies the conditions:

(i) f_11 < 0;
(ii) f_11 f_22 − f_12² > 0.

Show that the following inequality holds:

(iii) −f_11 + 2f_12 − f_22 > 0.

Hint: −f_11 + 2f_12 − f_22 = −[1, −1] [ f_11, f_12 ; f_12, f_22 ] [1, −1]^T.

11. Consider a simple two sector model of the production sector of an economy. Sector 1 (the "service" sector) produces aggregate consumption C using an intermediate input M ("manufactured" goods) and inputs of labour L_1 according to the production function f:

(i) C = f(M, L_1).

Sector 2 (the "manufacturing" sector) produces the intermediate output M using inputs of labour L_2 according to the production function

(ii) M = L_2.

(Each sector can use other primary inputs such as capital, land or natural resource inputs, but since we hold these other inputs fixed in the short run, we suppress mention of them in the above notation.) There is an aggregate labour constraint in the economy:

(iii) L_1 + L_2 = L̄ > 0

where L̄ is fixed. The manufacturer gets the revenue p > 0 for each unit of manufacturing output produced, but the government puts a positive tax t > 0 on the sale of each unit of manufactures, so that the service sector producer faces the price p(1 + t) for each unit of M used. The service sector producer is assumed to be a competitive profit maximizer; i.e., M* = M(t) and L_1* = L_1(t) is the solution to:

(iv) max_{M,L1} {f(M, L_1) − p(1 + t)M − wL_1}

where w > 0 is the wage rate and the price of the consumption good is 1. We assume that the following first and second order conditions for the unconstrained maximization problem (iv) are satisfied:

(v) f_1(M*, L_1*) − p(1 + t) = 0;
(vi) f_2(M*, L_1*) − w = 0;
(vii) f_11* ≡ f_11(M*, L_1*) < 0;
(viii) f_22* ≡ f_22(M*, L_1*) < 0;
(ix) f_11* f_22* − (f_12*)² > 0 where f_12* ≡ f_12(M*, L_1*).

One more equation is required; namely, we assume that the price of the manufactured good is equal to the wage rate; i.e., we have:

(x) p = w.

Equation (x) is consistent with profit maximizing behavior in the manufacturing sector, assuming that the production function (ii) is valid.

Now substitute equations (ii), (iii) and (x) into the first order conditions (v) and (vi) and we obtain the following two equations, which characterize equilibrium in this simplified economy:

(xi) f_1[L̄ − L_1(t), L_1(t)] − w(t)(1 + t) = 0;
(xii) f_2[L̄ − L_1(t), L_1(t)] − w(t) = 0;

where the 2 unknowns in (xi) and (xii) are L_1(t) (employment in the service sector) and w(t) (the wage rate faced by both sectors), which are regarded as functions of the manufacturer's sales tax t.

(a) Differentiate (xi) and (xii) with respect to t and solve the resulting two equations for the derivatives L_1′(t) and w′(t).

(b) Show that L_1′(0) > 0. Hint: Use part (a) and problem 10 above.

Consumption regarded as a function of the level of sales taxation is defined as follows:

(xiii) C(t) ≡ f[L̄ − L_1(t), L_1(t)].

(c) Show that C′(t) = −t w(t) L_1′(t). Hint: Use (xi)-(xiii).

(d) Compute C′(0) and C″(0). Hint: Use part (c).

Now we can use the derivatives in part (d) above to calculate a second order Taylor series approximation to C(t); i.e., we have

(xiv) C(t) ≅ C(0) + C′(0)t + (1/2) C″(0)t².

(e) Treat (xiv) as an exact equality and show that C(t) < C(0) for t ≠ 0. Hint: Use parts (b) and (d).

Comment: This problem shows that, in general, the aggregate net output of the entire production sector falls if transactions between sectors are taxed. There are many applications of this result. Note that (xiv) shows that the loss of output is proportional to the square of the tax rate, t².

(f) Suppose that the government now subsidizes the output of the manufacturing sector; i.e., t is now negative instead of being positive. Can we still conclude that C(t) < C(0)?

This problem shows you that you now have the mathematical tools that will enable you to construct simple models that cast some light on real life, practical economic problems.
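A small numerical sketch of the comment above; the Cobb-Douglas form f(M, L_1) = M^0.3 L_1^0.3 and L̄ = 1 are illustrative assumptions, not taken from the problem. With equal exponents, the equilibrium conditions (xi)-(xii) imply f_1/f_2 = L_1/M = 1 + t with M = L̄ − L_1, which solves to L_1(t) = L̄(1 + t)/(2 + t); computing C(t) directly then shows C(t) < C(0) for t ≠ 0, with a loss of roughly quadratic order in t.

```python
import numpy as np

# Illustrative Cobb-Douglas technology for the service sector; the functional
# form and parameter values are assumptions made only for this sketch.
a = b = 0.3
Lbar = 1.0

def L1_of_t(t):
    # With equal exponents, (xi) and (xii) imply f1/f2 = L1/M = 1 + t with
    # M = Lbar - L1, which solves to L1(t) = Lbar*(1 + t)/(2 + t).
    return Lbar * (1.0 + t) / (2.0 + t)

def C(t):
    L1 = L1_of_t(t)
    return (Lbar - L1) ** a * L1 ** b    # C(t) = f(M(t), L1(t)), as in (xiii)

for t in [-0.2, -0.1, 0.0, 0.1, 0.2]:
    print(t, C(0.0) - C(t))              # positive for t != 0, roughly ~ t**2
```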

7. Linearly Homogeneous Functions and Euler's Theorem

Let f(x_1, ..., x_N) ≡ f(x) be a function of N variables defined over the positive orthant, Ω ≡ {x: x >> 0_N}. Note that x >> 0_N means that each component of x is positive, while x ≥ 0_N means that each component of x is nonnegative. Finally, x > 0_N means x ≥ 0_N but x ≠ 0_N (i.e., the components of x are nonnegative and at least one component is positive).

(96) Definition: f is (positively) linearly homogeneous iff f(λx) = λf(x) for all λ > 0 and x >> 0_N.

(97) Definition: f is (positively) homogeneous of degree α iff f(λx) = λ^α f(x) for all λ > 0 and x >> 0_N.

We often assume that production functions and utility functions are linearly homogeneous. If the producer's production function f is linearly homogeneous, then we say that the technology is subject to constant returns to scale; i.e., if we double all inputs, output also doubles. If the production function f is homogeneous of degree α < 1, then we say that the technology is subject to diminishing returns to scale, while if α > 1, then we have increasing returns to scale. Functions that are homogeneous of degree 1, 0 or −1 occur frequently in index number theory.

Recall the profit maximization problem (i) in Problem 9 above. The optimized objective function, π(p, w_1, w_2), in that problem is called the firm's profit function and it turns out to be linearly homogeneous in (p, w_1, w_2). For another example of a linearly homogeneous function, consider the problem which defines the producer's cost function. Let x ≥ 0_N be a vector of inputs, let y ≥ 0 be the output produced by the inputs x and let y = f(x) be the producer's production function. Let p >> 0_N be a vector of input prices that the producer faces, and define the producer's cost function as

(98) C(y, p) ≡ min_{x ≥ 0_N} {p^T x: f(x) ≥ y}.

It can readily be seen that, for fixed y, C(y, p) is linearly homogeneous in the components of p; i.e., let λ > 0, p >> 0_N and we have

(99) C(y, λp) ≡ min_{x ≥ 0_N} {λp^T x: f(x) ≥ y}
             = λ min_{x ≥ 0_N} {p^T x: f(x) ≥ y}   using λ > 0
             = λC(y, p).
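A minimal numerical check of the homogeneity property (99), under the assumed production function f(x_1, x_2) = (x_1 x_2)^{1/2} (a hypothetical example chosen only for this sketch): a crude grid search approximates C(y, p), and scaling both prices by λ scales the minimized cost by λ.

```python
import numpy as np

# Crude grid-search check of (99) for the assumed production function
# f(x1, x2) = (x1*x2)**0.5, so that f(x) >= y binds along the curve x2 = y**2/x1.
def cost(y, p1, p2, n=4000):
    x1 = np.linspace(0.01, 50.0, n)
    x2 = y**2 / x1                      # input bundles that just produce output y
    return np.min(p1 * x1 + p2 * x2)

y, (p1, p2), lam = 2.0, (3.0, 5.0), 7.0
print(cost(y, p1, p2))                  # approximately 2*y*sqrt(p1*p2)
print(cost(y, lam * p1, lam * p2))      # approximately lam times the line above
```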

Now recall the definition of a linearly homogeneous function f given by (96). We have the following two very useful theorems that apply to differentiable linearly homogeneous functions.

Euler's First Theorem: If f is linearly homogeneous and once continuously differentiable, then its first order partial derivative functions, f_i(x) for i = 1, 2, ..., N, are homogeneous of degree zero and

(100) f(x) = Σ_{i=1}^N x_i f_i(x) = x^T ∇f(x).

Proof: Partially differentiate both sides of the equation in (96) with respect to x_i; we get, for i = 1, 2, ..., N:

(101) f_i(λx) λ = λ f_i(x) for all x >> 0_N and λ > 0, or

(102) f_i(λx) = f_i(x) = λ⁰ f_i(x) for all x >> 0_N and λ > 0.

Using definition (97) for α = 0, we see that equation (102) implies that f_i is homogeneous of degree 0. To establish (100), partially differentiate both sides of the equation in (96) with respect to λ and get:

(103) Σ_{i=1}^N f_i(λx_1, λx_2, ..., λx_N) ∂(λx_i)/∂λ = f(x) or Σ_{i=1}^N f_i(λx_1, λx_2, ..., λx_N) x_i = f(x).

Now set λ = 1 in (103) to obtain (100). Q.E.D.

Euler's Second Theorem: If f is linearly homogeneous and twice continuously differentiable, then the second order partial derivatives of f satisfy the following N linear restrictions: for i = 1, ..., N:

(104) Σ_{j=1}^N f_ij(x) x_j = 0 for x ≡ (x_1, ..., x_N)^T >> 0_N.

The restrictions (104) can be rewritten as follows:

(105) ∇²f(x) x = 0_N for every x >> 0_N.

Proof: For each i, partially differentiate both sides of equation (102) with respect to λ and get, for i = 1, 2, ..., N:

(106) Σ_{j=1}^N f_ij(λx_1, ..., λx_N) ∂(λx_j)/∂λ = 0 or Σ_{j=1}^N f_ij(λx) x_j = 0.

Now set λ = 1 in (106) and the resulting equations are equations (104). Q.E.D.

Problems:

12. [Shephard's Lemma]. Suppose that the producer's cost function C(y, p) is defined by (98) above. Suppose that when p = p* >> 0_N and y = y* > 0, x* > 0_N solves the cost minimization problem, so that

(i) p*^T x* = C(y*, p*) ≡ min_x {p*^T x: f(x) ≥ y*}.

(a) Suppose further that C is differentiable with respect to the input prices at (y*, p*). Then show that

(ii) x* = ∇_p C(y*, p*).

Hint: Because x* solves the cost minimization problem defined by C(y*, p*) by hypothesis, x* must be feasible for this problem, so we must have f(x*) ≥ y*. Thus x* is a feasible solution for the following cost minimization problem, where the general input price vector p >> 0_N has replaced the specific input price vector p* >> 0_N:

(iii) C(y*, p) ≡ min_x {p^T x: f(x) ≥ y*} ≤ p^T x*

where the inequality follows from the fact that x* is a feasible (but usually not optimal) solution for the cost minimization problem in (iii). Now define for each p >> 0_N:

(iv) g(p) ≡ p^T x* − C(y*, p).

Use (i) and (iii) to show that g(p) is minimized (over all p such that p >> 0_N) at p = p*. Now recall the first order necessary conditions for a minimum.

(b) Under the hypotheses of part (a), suppose x** > 0_N is another solution to the cost minimization problem defined in (i). Then show x* = x**; i.e., the solution to (i) is unique under the assumption that C(y*, p*) is differentiable with respect to the components of p.

13. Suppose C(y, p) defined by (98) is twice continuously differentiable with respect to the components of the input price vector p and let the vector x(y, p) solve (98); i.e., x(y, p) ≡ [x_1(y, p), ..., x_N(y, p)]^T is the producer's system of cost minimizing input demand functions. Define the N by N matrix of first order partial derivatives of the x_i(y, p) with respect to the components of p as:

(i) A ≡ [∂x_i(y, p_1, ..., p_N)/∂p_j] (= ∇_p x(y, p)).

Show that:

(ii) A = A^T and
(iii) Ap = 0_N.

Hint: By the previous problem, x(y, p) = ∇_p C(y, p). Recall also (99) and Euler's Second Theorem.

Comment: The restrictions (ii) and (iii) above were first derived by J.R. Hicks (1939), Value and Capital, Appendix to Chapters II and III, part 8, and P.A. Samuelson (1947), Foundations of Economic Analysis, page 69. The restrictions (ii) on the input demand derivatives ∂x_i/∂p_j are known as the Hicks-Samuelson symmetry conditions.

So far, we have developed two methods for checking the second order conditions that arise in unconstrained optimization theory: (i) the Lagrange-Gauss diagonalization procedure explained in section 5 above and (ii) the determinantal conditions method explained in section 6 above. In the final sections of this chapter, we are going to derive a third method: the eigenvalue method. Before we can explain this method, we require some preliminary material on complex numbers.

8. Complex Numbers and the Fundamental Theorem of Algebra

(107) Definition: i is an algebraic symbol which has the property i² = −1. Hence i can be regarded as the square root of −1; i.e., i ≡ √(−1).

(108) Definition: A complex number z is a number which has the form z = x + iy where x and y are ordinary real numbers. The number x is called the real part of z and the number y is called the imaginary part of z.

We can add and multiply complex numbers. To add two complex numbers, we merely add their real parts and imaginary parts to form the sum; i.e., if z_1 ≡ x_1 + iy_1 and z_2 ≡ x_2 + iy_2, then

(109) z_1 + z_2 = [x_1 + iy_1] + [x_2 + iy_2] ≡ (x_1 + x_2) + (y_1 + y_2)i.

To multiply together two complex numbers z_1 and z_2, we multiply them together using ordinary algebra, replacing i² by −1; i.e.,

(110) z_1 z_2 = [x_1 + iy_1][x_2 + iy_2]
             = x_1 x_2 + iy_1 x_2 + ix_1 y_2 + i² y_1 y_2

             = x_1 x_2 + i² y_1 y_2 + (x_1 y_2 + x_2 y_1)i
             ≡ (x_1 x_2 − y_1 y_2) + (x_1 y_2 + x_2 y_1)i.

Two complex numbers are equal iff their real parts and imaginary parts are identical; i.e., if z_1 = x_1 + iy_1 and z_2 = x_2 + iy_2, then z_1 = z_2 iff x_1 = x_2 and y_1 = y_2. The final definition we require in this section is the definition of a complex conjugate.

(111) Definition: If z = x + iy, then the complex conjugate of z, denoted by z̄, is defined as the complex number x − iy; i.e., z̄ ≡ x − iy.

An interesting property of a complex number and its complex conjugate is given in Problem 15 below.

Problems:

14. Let a ≡ 3 + i; b ≡ 1 + 5i and c ≡ 5 − 2i. Calculate ab − c. Note that we have written a·b as ab.

15. Show that z z̄ ≥ 0 for any complex number z = x + iy.

16. Let z_1 = x_1 + iy_1 and z_2 = x_2 + iy_2 be two complex numbers; calculate z_3 ≡ z_1 z_2. Show that z̄_3 = z̄_1 z̄_2; i.e., the complex conjugate of a product of two complex numbers is equal to the product of the complex conjugates.

Now let f(x) be a polynomial of degree N; i.e.,

(112) f(x) ≡ a_0 + a_1 x + a_2 x² + ... + a_N x^N where a_N ≠ 0,

where the fixed numbers a_0, a_1, a_2, ..., a_N are ordinary real numbers. If we try to solve the equation f(x) = 0 for real roots x, then it can happen that no real roots to this polynomial equation exist; e.g., consider

(113) 1 + x² = 0

so that x² = −1 and no real roots to (113) exist. However, note that if we allow solutions x to (113) to be complex numbers, then (113) has the roots x_1 = i and x_2 = −i. In general, if we allow solutions to the equation f(x) = 0 (where f is defined by (112)) to be complex numbers, then there are always N roots to the equation (some of which could be repeated or multiple roots).

(114) Fundamental Theorem of Algebra: Every polynomial equation of the form a_0 + a_1 x + a_2 x² + ... + a_N x^N = 0 (with a_N ≠ 0) has N roots or solutions, x_1, x_2, ..., x_N, where in general the x_i are complex numbers.

This is one of the few theorems which we will not prove in this course. For a

proof, see J.V. Uspensky, Theory of Equations.

9. The Eigenvalues and Eigenvectors of a Symmetric Matrix

Let A be a general N by N matrix; i.e., it is not restricted to be symmetric at this point.

(115) Definition: λ is an eigenvalue of A with the corresponding eigenvector z ≡ [z_1, z_2, ..., z_N]^T ≠ 0_N iff λ and z satisfy the following equation:

(116) Az = λz; z ≠ 0_N.

Note that the eigenvector z which appears in (116) is not allowed to be a vector of zeros. In the following theorem, we restrict A to be a symmetric matrix. In the case of a general N by N nonsymmetric A matrix, the eigenvalue λ which appears in (116) is allowed to be a complex number and the eigenvector z which appears in (116) is allowed to be a vector of complex numbers; i.e., z is allowed to have the form z = x + iy where x and y are N dimensional vectors of real numbers.

(117) Theorem: Every N by N symmetric matrix A has N eigenvalues λ_1, λ_2, ..., λ_N, where these eigenvalues are real numbers.

Proof: The equation (116) is equivalent to:

(118) [A − λI_N]z = 0_N; z ≠ 0_N.

Now if [A − λI_N]^{−1} were to exist, then we could premultiply both sides of (118) by this inverse matrix and obtain:

(119) [A − λI_N]^{−1}[A − λI_N]z = [A − λI_N]^{−1} 0_N = 0_N, or z = 0_N.

But z = 0_N is not admissible as an eigenvector by definition (115). From our earlier material on determinants, we know that [A − λI_N]^{−1} exists iff |A − λI_N| ≠ 0. Hence, in order to hope to find a λ and z ≠ 0_N which satisfy (116), we must have:

(120) |A − λI_N| = 0.

If N = 2, the determinantal equation (120) becomes:

(121) 0 = det( [ a_11, a_12 ; a_12, a_22 ] − [ λ, 0 ; 0, λ ] )

       = det[ a_11 − λ, a_12 ; a_12, a_22 − λ ]
       = (a_11 − λ)(a_22 − λ) − a_12²,

which is a quadratic equation in λ. In the general N by N case, if we expand out the determinantal equation (120), we obtain an equation of degree N in λ of the form b_0 + b_1 λ + b_2 λ² + ... + b_N λ^N = 0, and by the Fundamental Theorem of Algebra, this polynomial equation has N roots, λ_1, λ_2, ..., λ_N say. Once we have found these eigenvalues λ_i, we can obtain corresponding eigenvectors z^i ≠ 0_N by solving

(122) [A − λ_i I_N] z^i = 0_N; i = 1, 2, ..., N

for a nonzero vector z^i. (We will show exactly how this can be done later.) However, both the eigenvalues λ_i and the eigenvectors z^i can have complex numbers as components in general. We now show that the eigenvalues and eigenvectors have real numbers as components when A = A^T.

Suppose that λ_1 is an eigenvalue of A (where λ_1 = a_1 + b_1 i say) and z^1 = x^1 + iy^1 is the corresponding eigenvector. Since z^1 ≠ 0_N, at least one component of the x^1 and y^1 vectors must be nonzero. Thus, letting z̄^1 ≡ x^1 − iy^1 be the vector of complex conjugates of the components of z^1, we have

z̄^1T z^1 = [x^1T − iy^1T][x^1 + iy^1]
         = x^1T x^1 − i² y^1T y^1 + ix^1T y^1 − iy^1T x^1
         = x^1T x^1 + y^1T y^1 + i[x^1T y^1 − y^1T x^1]
         = x^1T x^1 + y^1T y^1                since x^1T y^1 = y^1T x^1
         = Σ_{i=1}^N (x_i^1)² + Σ_{i=1}^N (y_i^1)²
(123)    > 0

where the inequality follows since at least one of the x_i^1 or y_i^1 is not equal to zero and hence its square is positive. By the definition of λ_1 and z^1 being an eigenvalue and eigenvector of A, we have:

(124) Az^1 = λ_1 z^1.

Since A is a real matrix, the matrix of complex conjugates of A, Ā, is A. Now take complex conjugates on both sides of (124). Using Ā = A and Problem 16 above, we obtain:

(125) Az̄^1 = λ̄_1 z̄^1.

Premultiply both sides of (124) by z̄^1T and we obtain the following equality:

(126) z̄^1T A z^1 = λ_1 z̄^1T z^1.

Now take transposes of both sides of (126) and we obtain:

(127) λ_1 z^1T z̄^1 = z^1T A^T z̄^1 = z^1T A z̄^1

where the second equality in (127) follows from the symmetry of A; i.e., A = A^T. Now premultiply both sides of (125) by z^1T and obtain:

(128) λ̄_1 z^1T z̄^1 = z^1T A z̄^1.

Since the right hand sides of (127) and (128) are equal, so are the left hand sides, so we obtain the following equality:

(129) λ_1 z^1T z̄^1 = λ̄_1 z^1T z̄^1.

Using (123), we see that z^1T z̄^1 (= z̄^1T z^1) is a positive number, so we can divide both sides of (129) by z^1T z̄^1 to obtain:

(130) λ_1 = a_1 + b_1 i = λ̄_1 = a_1 − b_1 i,

which in turn implies that the imaginary part of λ_1 must be zero; i.e., we find that b_1 = 0 and hence the eigenvalue λ_1 must be an ordinary real number.

To find a real eigenvector z^1 = x^1 + i0_N = x^1 ≠ 0_N that corresponds to the eigenvalue λ_1, define the N by N matrix B_1 as

(131) B_1 ≡ A − λ_1 I_N.

We know that |B_1| = 0 and we need to find a vector x^1 ≠ 0_N such that B_1 x^1 = 0_N. Apply the Gaussian triangularization algorithm to B_1. This leads to an elementary row matrix E_1 with |E_1| = 1 and

(132) E_1 B_1 = U_1

where U_1 is an upper triangular N by N matrix. Since |B_1| = 0, taking determinants on both sides of (132) leads to |U_1| = 0 and hence at least one of the N diagonal elements u_ii^1 of U_1 must be zero. Let u_{i1,i1}^1 be the first such zero diagonal element. We choose the components of the x^1 vector as follows: let x_{i1}^1 ...
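As a quick numerical cross-check of Theorem (117), the following sketch (NumPy assumed) computes the eigenvalues of the symmetric matrix A from the section 5 example: all N eigenvalues are real, and their sign pattern (two positive, one negative) matches that of the diagonal matrix D = diag(1, −3, 4) obtained there.

```python
import numpy as np

# Numerical cross-check of Theorem (117) on the symmetric matrix A used in
# the section 5 example.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 1.0, 0.0],
              [3.0, 0.0, 1.0]])
print(np.linalg.eigvalsh(A))   # eigvalsh exploits symmetry; all roots are real
# The sign pattern (one negative, two positive eigenvalues) agrees with the
# diagonal D = diag(1, -3, 4) produced by the Lagrange-Gauss procedure.
```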

Thus necessary and sufficient conditions for A to be positive definite are:

Thus necessary and sufficient conditions for A to be positive definite are: 14 Problem: 4. Define E = E 3 E 2 E 1 where E 3 is defined by (62) and E 1 and E 2 are defined in (61). Show that EAE T = D where D is defined by (60). The matrix E and the diagonal matrix D which occurs

More information

This property turns out to be a general property of eigenvectors of a symmetric A that correspond to distinct eigenvalues as we shall see later.

This property turns out to be a general property of eigenvectors of a symmetric A that correspond to distinct eigenvalues as we shall see later. 34 To obtain an eigenvector x 2 0 2 for l 2 = 0, define: B 2 A - l 2 I 2 = È 1, 1, 1 Î 1-0 È 1, 0, 0 Î 1 = È 1, 1, 1 Î 1. To transform B 2 into an upper triangular matrix, subtract the first row of B 2

More information

Lemma 8: Suppose the N by N matrix A has the following block upper triangular form:

Lemma 8: Suppose the N by N matrix A has the following block upper triangular form: 17 4 Determinants and the Inverse of a Square Matrix In this section, we are going to use our knowledge of determinants and their properties to derive an explicit formula for the inverse of a square matrix

More information

CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS. W. Erwin Diewert January 31, 2008.

CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS. W. Erwin Diewert January 31, 2008. 1 ECONOMICS 594: LECTURE NOTES CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS W. Erwin Diewert January 31, 2008. 1. Introduction Many economic problems have the following structure: (i) a linear function

More information

By W.E. Diewert. December 13, 2008

By W.E. Diewert. December 13, 2008 1 APPLIED ECONOMICS By W.E. Diewert. December 13, 2008 Chapter 9: Flexible Functional Forms 1. Introduction In this chapter, we will take an in depth look at the problems involved in choosing functional

More information

Digital Workbook for GRA 6035 Mathematics

Digital Workbook for GRA 6035 Mathematics Eivind Eriksen Digital Workbook for GRA 6035 Mathematics November 10, 2014 BI Norwegian Business School Contents Part I Lectures in GRA6035 Mathematics 1 Linear Systems and Gaussian Elimination........................

More information

OR MSc Maths Revision Course

OR MSc Maths Revision Course OR MSc Maths Revision Course Tom Byrne School of Mathematics University of Edinburgh t.m.byrne@sms.ed.ac.uk 15 September 2017 General Information Today JCMB Lecture Theatre A, 09:30-12:30 Mathematics revision

More information

3 (Maths) Linear Algebra

3 (Maths) Linear Algebra 3 (Maths) Linear Algebra References: Simon and Blume, chapters 6 to 11, 16 and 23; Pemberton and Rau, chapters 11 to 13 and 25; Sundaram, sections 1.3 and 1.5. The methods and concepts of linear algebra

More information

6.1 Matrices. Definition: A Matrix A is a rectangular array of the form. A 11 A 12 A 1n A 21. A 2n. A m1 A m2 A mn A 22.

6.1 Matrices. Definition: A Matrix A is a rectangular array of the form. A 11 A 12 A 1n A 21. A 2n. A m1 A m2 A mn A 22. 61 Matrices Definition: A Matrix A is a rectangular array of the form A 11 A 12 A 1n A 21 A 22 A 2n A m1 A m2 A mn The size of A is m n, where m is the number of rows and n is the number of columns The

More information

Mathematical Economics (ECON 471) Lecture 3 Calculus of Several Variables & Implicit Functions

Mathematical Economics (ECON 471) Lecture 3 Calculus of Several Variables & Implicit Functions Mathematical Economics (ECON 471) Lecture 3 Calculus of Several Variables & Implicit Functions Teng Wah Leo 1 Calculus of Several Variables 11 Functions Mapping between Euclidean Spaces Where as in univariate

More information

Index. Cambridge University Press An Introduction to Mathematics for Economics Akihito Asano. Index.

Index. Cambridge University Press An Introduction to Mathematics for Economics Akihito Asano. Index. , see Q.E.D. ln, see natural logarithmic function e, see Euler s e i, see imaginary number log 10, see common logarithm ceteris paribus, 4 quod erat demonstrandum, see Q.E.D. reductio ad absurdum, see

More information

Tutorial Code and TA (circle one): T1 Charles Tsang T2 Stephen Tang

Tutorial Code and TA (circle one): T1 Charles Tsang T2 Stephen Tang Department of Computer & Mathematical Sciences University of Toronto at Scarborough MATA33H3Y: Calculus for Management II Final Examination August, 213 Examiner: A. Chow Surname (print): Given Name(s)

More information

September Math Course: First Order Derivative

September Math Course: First Order Derivative September Math Course: First Order Derivative Arina Nikandrova Functions Function y = f (x), where x is either be a scalar or a vector of several variables (x,..., x n ), can be thought of as a rule which

More information

Mathematical Economics. Lecture Notes (in extracts)

Mathematical Economics. Lecture Notes (in extracts) Prof. Dr. Frank Werner Faculty of Mathematics Institute of Mathematical Optimization (IMO) http://math.uni-magdeburg.de/ werner/math-ec-new.html Mathematical Economics Lecture Notes (in extracts) Winter

More information

Chapter 2: Unconstrained Extrema

Chapter 2: Unconstrained Extrema Chapter 2: Unconstrained Extrema Math 368 c Copyright 2012, 2013 R Clark Robinson May 22, 2013 Chapter 2: Unconstrained Extrema 1 Types of Sets Definition For p R n and r > 0, the open ball about p of

More information

Math (P)refresher Lecture 8: Unconstrained Optimization

Math (P)refresher Lecture 8: Unconstrained Optimization Math (P)refresher Lecture 8: Unconstrained Optimization September 2006 Today s Topics : Quadratic Forms Definiteness of Quadratic Forms Maxima and Minima in R n First Order Conditions Second Order Conditions

More information

Functions of Several Variables

Functions of Several Variables Functions of Several Variables The Unconstrained Minimization Problem where In n dimensions the unconstrained problem is stated as f() x variables. minimize f()x x, is a scalar objective function of vector

More information

REVIEW OF DIFFERENTIAL CALCULUS

REVIEW OF DIFFERENTIAL CALCULUS REVIEW OF DIFFERENTIAL CALCULUS DONU ARAPURA 1. Limits and continuity To simplify the statements, we will often stick to two variables, but everything holds with any number of variables. Let f(x, y) be

More information

Production Possibility Frontier

Production Possibility Frontier Division of the Humanities and Social Sciences Production Possibility Frontier KC Border v 20151111::1410 This is a very simple model of the production possibilities of an economy, which was formulated

More information

N. L. P. NONLINEAR PROGRAMMING (NLP) deals with optimization models with at least one nonlinear function. NLP. Optimization. Models of following form:

N. L. P. NONLINEAR PROGRAMMING (NLP) deals with optimization models with at least one nonlinear function. NLP. Optimization. Models of following form: 0.1 N. L. P. Katta G. Murty, IOE 611 Lecture slides Introductory Lecture NONLINEAR PROGRAMMING (NLP) deals with optimization models with at least one nonlinear function. NLP does not include everything

More information

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2.

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2. APPENDIX A Background Mathematics A. Linear Algebra A.. Vector algebra Let x denote the n-dimensional column vector with components 0 x x 2 B C @. A x n Definition 6 (scalar product). The scalar product

More information

Lecture 2 INF-MAT : , LU, symmetric LU, Positve (semi)definite, Cholesky, Semi-Cholesky

Lecture 2 INF-MAT : , LU, symmetric LU, Positve (semi)definite, Cholesky, Semi-Cholesky Lecture 2 INF-MAT 4350 2009: 7.1-7.6, LU, symmetric LU, Positve (semi)definite, Cholesky, Semi-Cholesky Tom Lyche and Michael Floater Centre of Mathematics for Applications, Department of Informatics,

More information

Chapter 2: Matrix Algebra

Chapter 2: Matrix Algebra Chapter 2: Matrix Algebra (Last Updated: October 12, 2016) These notes are derived primarily from Linear Algebra and its applications by David Lay (4ed). Write A = 1. Matrix operations [a 1 a n. Then entry

More information

Here each term has degree 2 (the sum of exponents is 2 for all summands). A quadratic form of three variables looks as

Here each term has degree 2 (the sum of exponents is 2 for all summands). A quadratic form of three variables looks as Reading [SB], Ch. 16.1-16.3, p. 375-393 1 Quadratic Forms A quadratic function f : R R has the form f(x) = a x. Generalization of this notion to two variables is the quadratic form Q(x 1, x ) = a 11 x

More information

ECON2285: Mathematical Economics

ECON2285: Mathematical Economics ECON2285: Mathematical Economics Yulei Luo FBE, HKU September 2, 2018 Luo, Y. (FBE, HKU) ME September 2, 2018 1 / 35 Course Outline Economics: The study of the choices people (consumers, firm managers,

More information

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems Robert M. Freund February 2016 c 2016 Massachusetts Institute of Technology. All rights reserved. 1 1 Introduction

More information

Matrices 2. Slide for MA1203 Business Mathematics II Week 4

Matrices 2. Slide for MA1203 Business Mathematics II Week 4 Matrices 2 Slide for MA1203 Business Mathematics II Week 4 2.7 Leontief Input Output Model Input Output Analysis One important applications of matrix theory to the field of economics is the study of the

More information

Linear Algebra. Matrices Operations. Consider, for example, a system of equations such as x + 2y z + 4w = 0, 3x 4y + 2z 6w = 0, x 3y 2z + w = 0.

Linear Algebra. Matrices Operations. Consider, for example, a system of equations such as x + 2y z + 4w = 0, 3x 4y + 2z 6w = 0, x 3y 2z + w = 0. Matrices Operations Linear Algebra Consider, for example, a system of equations such as x + 2y z + 4w = 0, 3x 4y + 2z 6w = 0, x 3y 2z + w = 0 The rectangular array 1 2 1 4 3 4 2 6 1 3 2 1 in which the

More information

WI1403-LR Linear Algebra. Delft University of Technology

WI1403-LR Linear Algebra. Delft University of Technology WI1403-LR Linear Algebra Delft University of Technology Year 2013 2014 Michele Facchinelli Version 10 Last modified on February 1, 2017 Preface This summary was written for the course WI1403-LR Linear

More information

Linear Algebra: Characteristic Value Problem

Linear Algebra: Characteristic Value Problem Linear Algebra: Characteristic Value Problem . The Characteristic Value Problem Let < be the set of real numbers and { be the set of complex numbers. Given an n n real matrix A; does there exist a number

More information

Chapter 3 Transformations

Chapter 3 Transformations Chapter 3 Transformations An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Linear Transformations A function is called a linear transformation if 1. for every and 2. for every If we fix the bases

More information

Basics of Calculus and Algebra

Basics of Calculus and Algebra Monika Department of Economics ISCTE-IUL September 2012 Basics of linear algebra Real valued Functions Differential Calculus Integral Calculus Optimization Introduction I A matrix is a rectangular array

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

Midterm for Introduction to Numerical Analysis I, AMSC/CMSC 466, on 10/29/2015

Midterm for Introduction to Numerical Analysis I, AMSC/CMSC 466, on 10/29/2015 Midterm for Introduction to Numerical Analysis I, AMSC/CMSC 466, on 10/29/2015 The test lasts 1 hour and 15 minutes. No documents are allowed. The use of a calculator, cell phone or other equivalent electronic

More information

Symmetric Matrices and Eigendecomposition

Symmetric Matrices and Eigendecomposition Symmetric Matrices and Eigendecomposition Robert M. Freund January, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 2 1 Symmetric Matrices and Convexity of Quadratic Functions

More information

Math Matrix Algebra

Math Matrix Algebra Math 44 - Matrix Algebra Review notes - (Alberto Bressan, Spring 7) sec: Orthogonal diagonalization of symmetric matrices When we seek to diagonalize a general n n matrix A, two difficulties may arise:

More information

Assignment 1: From the Definition of Convexity to Helley Theorem

Assignment 1: From the Definition of Convexity to Helley Theorem Assignment 1: From the Definition of Convexity to Helley Theorem Exercise 1 Mark in the following list the sets which are convex: 1. {x R 2 : x 1 + i 2 x 2 1, i = 1,..., 10} 2. {x R 2 : x 2 1 + 2ix 1x

More information

Elementary Linear Algebra

Elementary Linear Algebra Matrices J MUSCAT Elementary Linear Algebra Matrices Definition Dr J Muscat 2002 A matrix is a rectangular array of numbers, arranged in rows and columns a a 2 a 3 a n a 2 a 22 a 23 a 2n A = a m a mn We

More information

x +3y 2t = 1 2x +y +z +t = 2 3x y +z t = 7 2x +6y +z +t = a

x +3y 2t = 1 2x +y +z +t = 2 3x y +z t = 7 2x +6y +z +t = a UCM Final Exam, 05/8/014 Solutions 1 Given the parameter a R, consider the following linear system x +y t = 1 x +y +z +t = x y +z t = 7 x +6y +z +t = a (a (6 points Discuss the system depending on the

More information

Duality. for The New Palgrave Dictionary of Economics, 2nd ed. Lawrence E. Blume

Duality. for The New Palgrave Dictionary of Economics, 2nd ed. Lawrence E. Blume Duality for The New Palgrave Dictionary of Economics, 2nd ed. Lawrence E. Blume Headwords: CONVEXITY, DUALITY, LAGRANGE MULTIPLIERS, PARETO EFFICIENCY, QUASI-CONCAVITY 1 Introduction The word duality is

More information

Linear Algebra. Solving Linear Systems. Copyright 2005, W.R. Winfrey

Linear Algebra. Solving Linear Systems. Copyright 2005, W.R. Winfrey Copyright 2005, W.R. Winfrey Topics Preliminaries Echelon Form of a Matrix Elementary Matrices; Finding A -1 Equivalent Matrices LU-Factorization Topics Preliminaries Echelon Form of a Matrix Elementary

More information

CSC Linear Programming and Combinatorial Optimization Lecture 10: Semidefinite Programming

CSC Linear Programming and Combinatorial Optimization Lecture 10: Semidefinite Programming CSC2411 - Linear Programming and Combinatorial Optimization Lecture 10: Semidefinite Programming Notes taken by Mike Jamieson March 28, 2005 Summary: In this lecture, we introduce semidefinite programming

More information

Conjugate Gradient (CG) Method

Conjugate Gradient (CG) Method Conjugate Gradient (CG) Method by K. Ozawa 1 Introduction In the series of this lecture, I will introduce the conjugate gradient method, which solves efficiently large scale sparse linear simultaneous

More information

GATE Engineering Mathematics SAMPLE STUDY MATERIAL. Postal Correspondence Course GATE. Engineering. Mathematics GATE ENGINEERING MATHEMATICS

GATE Engineering Mathematics SAMPLE STUDY MATERIAL. Postal Correspondence Course GATE. Engineering. Mathematics GATE ENGINEERING MATHEMATICS SAMPLE STUDY MATERIAL Postal Correspondence Course GATE Engineering Mathematics GATE ENGINEERING MATHEMATICS ENGINEERING MATHEMATICS GATE Syllabus CIVIL ENGINEERING CE CHEMICAL ENGINEERING CH MECHANICAL

More information

MA22S3 Summary Sheet: Ordinary Differential Equations

MA22S3 Summary Sheet: Ordinary Differential Equations MA22S3 Summary Sheet: Ordinary Differential Equations December 14, 2017 Kreyszig s textbook is a suitable guide for this part of the module. Contents 1 Terminology 1 2 First order separable 2 2.1 Separable

More information

Linear Algebra. Linear Equations and Matrices. Copyright 2005, W.R. Winfrey

Linear Algebra. Linear Equations and Matrices. Copyright 2005, W.R. Winfrey Copyright 2005, W.R. Winfrey Topics Preliminaries Systems of Linear Equations Matrices Algebraic Properties of Matrix Operations Special Types of Matrices and Partitioned Matrices Matrix Transformations

More information

3 Matrix Algebra. 3.1 Operations on matrices

3 Matrix Algebra. 3.1 Operations on matrices 3 Matrix Algebra A matrix is a rectangular array of numbers; it is of size m n if it has m rows and n columns. A 1 n matrix is a row vector; an m 1 matrix is a column vector. For example: 1 5 3 5 3 5 8

More information

Optimality, Duality, Complementarity for Constrained Optimization

Optimality, Duality, Complementarity for Constrained Optimization Optimality, Duality, Complementarity for Constrained Optimization Stephen Wright University of Wisconsin-Madison May 2014 Wright (UW-Madison) Optimality, Duality, Complementarity May 2014 1 / 41 Linear

More information

The Derivative. Appendix B. B.1 The Derivative of f. Mappings from IR to IR

The Derivative. Appendix B. B.1 The Derivative of f. Mappings from IR to IR Appendix B The Derivative B.1 The Derivative of f In this chapter, we give a short summary of the derivative. Specifically, we want to compare/contrast how the derivative appears for functions whose domain

More information

Homework 2 Foundations of Computational Math 2 Spring 2019

Homework 2 Foundations of Computational Math 2 Spring 2019 Homework 2 Foundations of Computational Math 2 Spring 2019 Problem 2.1 (2.1.a) Suppose (v 1,λ 1 )and(v 2,λ 2 ) are eigenpairs for a matrix A C n n. Show that if λ 1 λ 2 then v 1 and v 2 are linearly independent.

More information

The general programming problem is the nonlinear programming problem where a given function is maximized subject to a set of inequality constraints.

The general programming problem is the nonlinear programming problem where a given function is maximized subject to a set of inequality constraints. 1 Optimization Mathematical programming refers to the basic mathematical problem of finding a maximum to a function, f, subject to some constraints. 1 In other words, the objective is to find a point,

More information

EC487 Advanced Microeconomics, Part I: Lecture 2

EC487 Advanced Microeconomics, Part I: Lecture 2 EC487 Advanced Microeconomics, Part I: Lecture 2 Leonardo Felli 32L.LG.04 6 October, 2017 Properties of the Profit Function Recall the following property of the profit function π(p, w) = max x p f (x)

More information

MATH 5720: Unconstrained Optimization Hung Phan, UMass Lowell September 13, 2018

MATH 5720: Unconstrained Optimization Hung Phan, UMass Lowell September 13, 2018 MATH 57: Unconstrained Optimization Hung Phan, UMass Lowell September 13, 18 1 Global and Local Optima Let a function f : S R be defined on a set S R n Definition 1 (minimizers and maximizers) (i) x S

More information

Matrices. Chapter What is a Matrix? We review the basic matrix operations. An array of numbers a a 1n A = a m1...

Matrices. Chapter What is a Matrix? We review the basic matrix operations. An array of numbers a a 1n A = a m1... Chapter Matrices We review the basic matrix operations What is a Matrix? An array of numbers a a n A = a m a mn with m rows and n columns is a m n matrix Element a ij in located in position (i, j The elements

More information

There are six more problems on the next two pages

There are six more problems on the next two pages Math 435 bg & bu: Topics in linear algebra Summer 25 Final exam Wed., 8/3/5. Justify all your work to receive full credit. Name:. Let A 3 2 5 Find a permutation matrix P, a lower triangular matrix L with

More information

Math Linear Algebra II. 1. Inner Products and Norms

Math Linear Algebra II. 1. Inner Products and Norms Math 342 - Linear Algebra II Notes 1. Inner Products and Norms One knows from a basic introduction to vectors in R n Math 254 at OSU) that the length of a vector x = x 1 x 2... x n ) T R n, denoted x,

More information

Introduction to Matrices

Introduction to Matrices POLS 704 Introduction to Matrices Introduction to Matrices. The Cast of Characters A matrix is a rectangular array (i.e., a table) of numbers. For example, 2 3 X 4 5 6 (4 3) 7 8 9 0 0 0 Thismatrix,with4rowsand3columns,isoforder

More information

MATH 4211/6211 Optimization Constrained Optimization

MATH 4211/6211 Optimization Constrained Optimization MATH 4211/6211 Optimization Constrained Optimization Xiaojing Ye Department of Mathematics & Statistics Georgia State University Xiaojing Ye, Math & Stat, Georgia State University 0 Constrained optimization

More information

Algebraic. techniques1

Algebraic. techniques1 techniques Algebraic An electrician, a bank worker, a plumber and so on all have tools of their trade. Without these tools, and a good working knowledge of how to use them, it would be impossible for them

More information

Linear Algebra: Lecture Notes. Dr Rachel Quinlan School of Mathematics, Statistics and Applied Mathematics NUI Galway

Linear Algebra: Lecture Notes. Dr Rachel Quinlan School of Mathematics, Statistics and Applied Mathematics NUI Galway Linear Algebra: Lecture Notes Dr Rachel Quinlan School of Mathematics, Statistics and Applied Mathematics NUI Galway November 6, 23 Contents Systems of Linear Equations 2 Introduction 2 2 Elementary Row

More information

A = 3 1. We conclude that the algebraic multiplicity of the eigenvalues are both one, that is,

A = 3 1. We conclude that the algebraic multiplicity of the eigenvalues are both one, that is, 65 Diagonalizable Matrices It is useful to introduce few more concepts, that are common in the literature Definition 65 The characteristic polynomial of an n n matrix A is the function p(λ) det(a λi) Example

More information

Linear Algebra. The analysis of many models in the social sciences reduces to the study of systems of equations.

Linear Algebra. The analysis of many models in the social sciences reduces to the study of systems of equations. POLI 7 - Mathematical and Statistical Foundations Prof S Saiegh Fall Lecture Notes - Class 4 October 4, Linear Algebra The analysis of many models in the social sciences reduces to the study of systems

More information

5 More on Linear Algebra

5 More on Linear Algebra 14.102, Math for Economists Fall 2004 Lecture Notes, 9/23/2004 These notes are primarily based on those written by George Marios Angeletos for the Harvard Math Camp in 1999 and 2000, and updated by Stavros

More information

Eigenvalues and Eigenvectors: An Introduction

Eigenvalues and Eigenvectors: An Introduction Eigenvalues and Eigenvectors: An Introduction The eigenvalue problem is a problem of considerable theoretical interest and wide-ranging application. For example, this problem is crucial in solving systems

More information

HW3 - Due 02/06. Each answer must be mathematically justified. Don t forget your name. 1 2, A = 2 2

HW3 - Due 02/06. Each answer must be mathematically justified. Don t forget your name. 1 2, A = 2 2 HW3 - Due 02/06 Each answer must be mathematically justified Don t forget your name Problem 1 Find a 2 2 matrix B such that B 3 = A, where A = 2 2 If A was diagonal, it would be easy: we would just take

More information

1 Functions and Graphs

1 Functions and Graphs 1 Functions and Graphs 1.1 Functions Cartesian Coordinate System A Cartesian or rectangular coordinate system is formed by the intersection of a horizontal real number line, usually called the x axis,

More information

Chapter 3. Linear and Nonlinear Systems

Chapter 3. Linear and Nonlinear Systems 59 An expert is someone who knows some of the worst mistakes that can be made in his subject, and how to avoid them Werner Heisenberg (1901-1976) Chapter 3 Linear and Nonlinear Systems In this chapter

More information

Linear and non-linear programming

Linear and non-linear programming Linear and non-linear programming Benjamin Recht March 11, 2005 The Gameplan Constrained Optimization Convexity Duality Applications/Taxonomy 1 Constrained Optimization minimize f(x) subject to g j (x)

More information

Review of Basic Concepts in Linear Algebra

Review of Basic Concepts in Linear Algebra Review of Basic Concepts in Linear Algebra Grady B Wright Department of Mathematics Boise State University September 7, 2017 Math 565 Linear Algebra Review September 7, 2017 1 / 40 Numerical Linear Algebra

More information

Convex Functions and Optimization

Convex Functions and Optimization Chapter 5 Convex Functions and Optimization 5.1 Convex Functions Our next topic is that of convex functions. Again, we will concentrate on the context of a map f : R n R although the situation can be generalized

More information

UNCONSTRAINED OPTIMIZATION PAUL SCHRIMPF OCTOBER 24, 2013

UNCONSTRAINED OPTIMIZATION PAUL SCHRIMPF OCTOBER 24, 2013 PAUL SCHRIMPF OCTOBER 24, 213 UNIVERSITY OF BRITISH COLUMBIA ECONOMICS 26 Today s lecture is about unconstrained optimization. If you re following along in the syllabus, you ll notice that we ve skipped

More information

8. Diagonalization.

8. Diagonalization. 8. Diagonalization 8.1. Matrix Representations of Linear Transformations Matrix of A Linear Operator with Respect to A Basis We know that every linear transformation T: R n R m has an associated standard

More information

Economics 205 Exercises

Economics 205 Exercises Economics 05 Eercises Prof. Watson, Fall 006 (Includes eaminations through Fall 003) Part 1: Basic Analysis 1. Using ε and δ, write in formal terms the meaning of lim a f() = c, where f : R R.. Write the

More information

Chapter 9: Systems of Equations and Inequalities

Chapter 9: Systems of Equations and Inequalities Chapter 9: Systems of Equations and Inequalities 9. Systems of Equations Solve the system of equations below. By this we mean, find pair(s) of numbers (x, y) (if possible) that satisfy both equations.

More information

Chapter Two Elements of Linear Algebra

Chapter Two Elements of Linear Algebra Chapter Two Elements of Linear Algebra Previously, in chapter one, we have considered single first order differential equations involving a single unknown function. In the next chapter we will begin to

More information

( )! ±" and g( x)! ±" ], or ( )! 0 ] as x! c, x! c, x! c, or x! ±". If f!(x) g!(x) "!,

( )! ± and g( x)! ± ], or ( )! 0 ] as x! c, x! c, x! c, or x! ±. If f!(x) g!(x) !, IV. MORE CALCULUS There are some miscellaneous calculus topics to cover today. Though limits have come up a couple of times, I assumed prior knowledge, or at least that the idea makes sense. Limits are

More information

Lecture 2 - Unconstrained Optimization Definition[Global Minimum and Maximum]Let f : S R be defined on a set S R n. Then

Lecture 2 - Unconstrained Optimization Definition[Global Minimum and Maximum]Let f : S R be defined on a set S R n. Then Lecture 2 - Unconstrained Optimization Definition[Global Minimum and Maximum]Let f : S R be defined on a set S R n. Then 1. x S is a global minimum point of f over S if f (x) f (x ) for any x S. 2. x S

More information

Written Examination

Written Examination Division of Scientific Computing Department of Information Technology Uppsala University Optimization Written Examination 202-2-20 Time: 4:00-9:00 Allowed Tools: Pocket Calculator, one A4 paper with notes

More information

Next topics: Solving systems of linear equations

Next topics: Solving systems of linear equations Next topics: Solving systems of linear equations 1 Gaussian elimination (today) 2 Gaussian elimination with partial pivoting (Week 9) 3 The method of LU-decomposition (Week 10) 4 Iterative techniques:

More information

Semidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 5

Semidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 5 Semidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 5 Instructor: Farid Alizadeh Scribe: Anton Riabov 10/08/2001 1 Overview We continue studying the maximum eigenvalue SDP, and generalize

More information

Practice Final Exam Solutions for Calculus II, Math 1502, December 5, 2013

Practice Final Exam Solutions for Calculus II, Math 1502, December 5, 2013 Practice Final Exam Solutions for Calculus II, Math 5, December 5, 3 Name: Section: Name of TA: This test is to be taken without calculators and notes of any sorts. The allowed time is hours and 5 minutes.

More information

Study Guide for Math 095

Study Guide for Math 095 Study Guide for Math 095 David G. Radcliffe November 7, 1994 1 The Real Number System Writing a fraction in lowest terms. 1. Find the largest number that will divide into both the numerator and the denominator.

More information

Introduction - Motivation. Many phenomena (physical, chemical, biological, etc.) are model by differential equations. f f(x + h) f(x) (x) = lim

Introduction - Motivation. Many phenomena (physical, chemical, biological, etc.) are model by differential equations. f f(x + h) f(x) (x) = lim Introduction - Motivation Many phenomena (physical, chemical, biological, etc.) are model by differential equations. Recall the definition of the derivative of f(x) f f(x + h) f(x) (x) = lim. h 0 h Its

More information

Ω R n is called the constraint set or feasible set. x 1

Ω R n is called the constraint set or feasible set. x 1 1 Chapter 5 Linear Programming (LP) General constrained optimization problem: minimize subject to f(x) x Ω Ω R n is called the constraint set or feasible set. any point x Ω is called a feasible point We

More information

NOTES ON CALCULUS OF VARIATIONS. September 13, 2012

NOTES ON CALCULUS OF VARIATIONS. September 13, 2012 NOTES ON CALCULUS OF VARIATIONS JON JOHNSEN September 13, 212 1. The basic problem In Calculus of Variations one is given a fixed C 2 -function F (t, x, u), where F is defined for t [, t 1 ] and x, u R,

More information

Copositive Plus Matrices

Copositive Plus Matrices Copositive Plus Matrices Willemieke van Vliet Master Thesis in Applied Mathematics October 2011 Copositive Plus Matrices Summary In this report we discuss the set of copositive plus matrices and their

More information

PROFIT FUNCTIONS. 1. REPRESENTATION OF TECHNOLOGY 1.1. Technology Sets. The technology set for a given production process is defined as

PROFIT FUNCTIONS. 1. REPRESENTATION OF TECHNOLOGY 1.1. Technology Sets. The technology set for a given production process is defined as PROFIT FUNCTIONS 1. REPRESENTATION OF TECHNOLOGY 1.1. Technology Sets. The technology set for a given production process is defined as T {x, y : x ɛ R n, y ɛ R m : x can produce y} where x is a vector

More information

The Singular Value Decomposition

The Singular Value Decomposition The Singular Value Decomposition Philippe B. Laval KSU Fall 2015 Philippe B. Laval (KSU) SVD Fall 2015 1 / 13 Review of Key Concepts We review some key definitions and results about matrices that will

More information

0.1 Rational Canonical Forms

0.1 Rational Canonical Forms We have already seen that it is useful and simpler to study linear systems using matrices. But matrices are themselves cumbersome, as they are stuffed with many entries, and it turns out that it s best

More information

Math Camp Lecture 4: Linear Algebra. Xiao Yu Wang. Aug 2010 MIT. Xiao Yu Wang (MIT) Math Camp /10 1 / 88

Math Camp Lecture 4: Linear Algebra. Xiao Yu Wang. Aug 2010 MIT. Xiao Yu Wang (MIT) Math Camp /10 1 / 88 Math Camp 2010 Lecture 4: Linear Algebra Xiao Yu Wang MIT Aug 2010 Xiao Yu Wang (MIT) Math Camp 2010 08/10 1 / 88 Linear Algebra Game Plan Vector Spaces Linear Transformations and Matrices Determinant

More information

Matrices and Matrix Algebra.

Matrices and Matrix Algebra. Matrices and Matrix Algebra 3.1. Operations on Matrices Matrix Notation and Terminology Matrix: a rectangular array of numbers, called entries. A matrix with m rows and n columns m n A n n matrix : a square

More information

Review of Optimization Methods

Review of Optimization Methods Review of Optimization Methods Prof. Manuela Pedio 20550 Quantitative Methods for Finance August 2018 Outline of the Course Lectures 1 and 2 (3 hours, in class): Linear and non-linear functions on Limits,

More information

Fundamentals of Engineering Analysis (650163)

Fundamentals of Engineering Analysis (650163) Philadelphia University Faculty of Engineering Communications and Electronics Engineering Fundamentals of Engineering Analysis (6563) Part Dr. Omar R Daoud Matrices: Introduction DEFINITION A matrix is

More information

All of my class notes can be found at

All of my class notes can be found at My name is Leon Hostetler I am currently a student at Florida State University majoring in physics as well as applied and computational mathematics Feel free to download, print, and use these class notes

More information

a 11 a 12 a 11 a 12 a 13 a 21 a 22 a 23 . a 31 a 32 a 33 a 12 a 21 a 23 a 31 a = = = = 12

a 11 a 12 a 11 a 12 a 13 a 21 a 22 a 23 . a 31 a 32 a 33 a 12 a 21 a 23 a 31 a = = = = 12 24 8 Matrices Determinant of 2 2 matrix Given a 2 2 matrix [ ] a a A = 2 a 2 a 22 the real number a a 22 a 2 a 2 is determinant and denoted by det(a) = a a 2 a 2 a 22 Example 8 Find determinant of 2 2

More information

Linear Algebra March 16, 2019

Linear Algebra March 16, 2019 Linear Algebra March 16, 2019 2 Contents 0.1 Notation................................ 4 1 Systems of linear equations, and matrices 5 1.1 Systems of linear equations..................... 5 1.2 Augmented

More information

Contents. Set Theory. Functions and its Applications CHAPTER 1 CHAPTER 2. Preface... (v)

Contents. Set Theory. Functions and its Applications CHAPTER 1 CHAPTER 2. Preface... (v) (vii) Preface... (v) CHAPTER 1 Set Theory Definition of Set... 1 Roster, Tabular or Enumeration Form... 1 Set builder Form... 2 Union of Set... 5 Intersection of Sets... 9 Distributive Laws of Unions and

More information

ECON0702: Mathematical Methods in Economics

ECON0702: Mathematical Methods in Economics ECON0702: Mathematical Methods in Economics Yulei Luo SEF of HKU January 12, 2009 Luo, Y. (SEF of HKU) MME January 12, 2009 1 / 35 Course Outline Economics: The study of the choices people (consumers,

More information

Linear Algebra: Matrix Eigenvalue Problems

Linear Algebra: Matrix Eigenvalue Problems CHAPTER8 Linear Algebra: Matrix Eigenvalue Problems Chapter 8 p1 A matrix eigenvalue problem considers the vector equation (1) Ax = λx. 8.0 Linear Algebra: Matrix Eigenvalue Problems Here A is a given

More information

LINEAR SYSTEMS (11) Intensive Computation

LINEAR SYSTEMS (11) Intensive Computation LINEAR SYSTEMS () Intensive Computation 27-8 prof. Annalisa Massini Viviana Arrigoni EXACT METHODS:. GAUSSIAN ELIMINATION. 2. CHOLESKY DECOMPOSITION. ITERATIVE METHODS:. JACOBI. 2. GAUSS-SEIDEL 2 CHOLESKY

More information