Quadratic Forms

Recall the Simon & Blume excerpt from an earlier lecture which said that the main task of calculus is to approximate nonlinear functions with linear functions. It's actually more accurate to say that we approximate nonlinear functions with affine functions: given a nonlinear function $f : \mathbb{R}^n \to \mathbb{R}$, our approximating function will be of the form $a + g(\Delta x)$, where $g$ is a linear function and $\Delta x = x - \bar{x}$. For example, in the case $n = 1$, if we wish to approximate $f$ near a point $\bar{x}$ in the domain of $f$, our approximating function is $f(\bar{x}) + b\,\Delta x$, where the coefficient $b$ is $f'(\bar{x})$, the derivative of $f$ at $\bar{x}$: $f(\bar{x})$ plays the role of $a$ in the expression $a + g(\Delta x)$ above, and the linear function $b\,\Delta x$, i.e., $f'(\bar{x})\,\Delta x$, plays the role of $g(\Delta x)$. In words, the affine approximation of $f$ near $\bar{x}$ is the affine function with (i) the same value as $f$ at $\bar{x}$, and (ii) the same slope (the same derivative) as $f$ at $\bar{x}$.

We're going to find that it's important to approximate nonlinear functions not only with linear (actually, affine) functions, but also with quadratic functions. For example, for a real function $f : \mathbb{R} \to \mathbb{R}$, our quadratic approximating function will be

$$f(\bar{x}) + f'(\bar{x})\,\Delta x + \tfrac{1}{2} f''(\bar{x})(\Delta x)^2 .$$

The quadratic approximation is therefore the quadratic function that has (i) the same value as $f$ at $\bar{x}$, (ii) the same slope (the same derivative) as $f$ at $\bar{x}$, and (iii) the same curvature (the same second derivative) as $f$ at $\bar{x}$.

When we generalize from functions with the one-dimensional domain $\mathbb{R}$ to multivariate functions, with domain $\mathbb{R}^n$, things get a little bit more complicated. The derivative of a function $f : \mathbb{R}^n \to \mathbb{R}$ at a point $\bar{x} \in \mathbb{R}^n$ is no longer just a number, but a vector in $\mathbb{R}^n$: specifically, the gradient of $f$ at $\bar{x}$, which we write as $\nabla f(\bar{x})$. And the quadratic term in the quadratic approximation to $f$ is a quadratic form, which is defined by an $n \times n$ matrix $H(\bar{x})$, the second derivative of $f$ at $\bar{x}$. In these notes we're going to study quadratic forms.

Quadratic Forms

You already know that a quadratic function (from $\mathbb{R}$ into $\mathbb{R}$) is a 2nd-degree polynomial, i.e., a real function $f(x) = ax^2 + bx + c$ in which $a \neq 0$.
If each of the coefficients $a$, $b$, and $c$ is non-zero, then the function has a second-degree (quadratic) term, a first-degree (linear) term, and a zero-degree (constant) term. However, a quadratic form is a real-valued function on $\mathbb{R}^n$ that has only second-degree (quadratic) terms. So a quadratic form on $\mathbb{R}$ (i.e., on $\mathbb{R}^n$, where $n = 1$) is a function of the form $f(x) = ax^2$ for some non-zero coefficient $a \in \mathbb{R}$.

What about the case $n = 2$? A quadratic form on $\mathbb{R}^2$ is a function of the form

$$f(x_1, x_2) = a_{11} x_1 x_1 + a_{12} x_1 x_2 + a_{21} x_2 x_1 + a_{22} x_2 x_2,$$

or equivalently,

$$f(x_1, x_2) = a_{11} x_1^2 + (a_{12} + a_{21})\, x_1 x_2 + a_{22} x_2^2 .$$

Note that in the second expression for $f$ we combined the coefficients $a_{12}$ and $a_{21}$ into their sum, so we can also write the same function as

$$f(x_1, x_2) = a_{11} x_1^2 + a'_{12}\, x_1 x_2 + a_{22} x_2^2, \quad \text{where } a'_{12} = a_{12} + a_{21},$$

which is a common way to write quadratic forms (but without the prime). But for now we're going to use the first expression, writing the generic quadratic form on $\mathbb{R}^2$ as

$$f(x_1, x_2) = a_{11} x_1 x_1 + a_{12} x_1 x_2 + a_{21} x_2 x_1 + a_{22} x_2 x_2 .$$
Note that the generic quadratic form on $\mathbb{R}^2$ can also be written as

$$f(x_1, x_2) = \begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} .$$

Moreover, without loss of generality we can assume that $a_{12} = a_{21}$: for if $a_{12}$ and $a_{21}$ are not equal, we can write them instead as $\tilde{a}_{12}$ and $\tilde{a}_{21}$ and then define $a_{12} = a_{21} = \tfrac{1}{2}(\tilde{a}_{12} + \tilde{a}_{21})$. Therefore the quadratic forms on $\mathbb{R}^2$ are precisely the functions $f : \mathbb{R}^2 \to \mathbb{R}$ of the form

$$f(x) = f(x_1, x_2) = \begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = xAx,$$

where $A$ is a symmetric matrix.

Note: As in the expression $xAx$ above, I'm not going to indicate transposes of vectors in these Quadratic Forms notes. The expression $xAx$ will always mean that the vector $x \in \mathbb{R}^n$ is written as a row vector if it's on the left of the matrix and as a column vector if it's on the right, so that $xAx$ is always well-defined and its value is always a real number.

Before moving to the general case of $\mathbb{R}^n$, let's consider the case of $\mathbb{R}^3$. In this case the generic quadratic form is

$$f(x_1, x_2, x_3) = a_{11} x_1 x_1 + a_{22} x_2 x_2 + a_{33} x_3 x_3 + a_{12} x_1 x_2 + a_{21} x_2 x_1 + a_{13} x_1 x_3 + a_{31} x_3 x_1 + a_{23} x_2 x_3 + a_{32} x_3 x_2,$$

and we can assume, as before, that $a_{12} = a_{21}$, $a_{13} = a_{31}$, and $a_{23} = a_{32}$. Therefore we can write the quadratic form as

$$f(x) = f(x_1, x_2, x_3) = \begin{pmatrix} x_1 & x_2 & x_3 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = xAx,$$

where $A$ is a symmetric $3 \times 3$ matrix.

Now it should be clear how we want to define the general quadratic form, on $\mathbb{R}^n$:

Definition: A quadratic form on $\mathbb{R}^n$ is a function $f : \mathbb{R}^n \to \mathbb{R}$ of the form $f(x) = xAx$, where $A$ is a symmetric $n \times n$ matrix.

One important property of quadratic forms is immediately obvious:

Remark: The value of a quadratic form at the vector $0 \in \mathbb{R}^n$ is zero.

Because every quadratic form corresponds to a unique symmetric matrix, we can characterize various classes of quadratic forms completely in terms of properties of symmetric matrices. For example, how can we identify which quadratic forms always have nonnegative values for every vector $x \in \mathbb{R}^n$? How can we identify which ones are strictly concave functions?
We answer questions like these by identifying the properties of symmetric matrices that yield quadratic forms with the desired properties.
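As a small illustration of the definition, the following Python sketch evaluates a quadratic form directly from its double sum and applies the averaging step that makes the coefficient matrix symmetric. The function names `quadratic_form` and `symmetrize` and the numerical values are assumptions chosen for this example; they do not come from the notes.

```python
def quadratic_form(A, x):
    """Return xAx, i.e. the sum over i and j of A[i][j] * x[i] * x[j]."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def symmetrize(A):
    """Replace each pair (a_ij, a_ji) by its average, as described in the text."""
    n = len(A)
    return [[(A[i][j] + A[j][i]) / 2 for j in range(n)] for i in range(n)]


# A non-symmetric coefficient matrix (illustrative values only).
A_tilde = [[1, 4],
           [0, 3]]
A = symmetrize(A_tilde)
x = [2, -1]

# The original and the symmetrized matrix define the same quadratic form.
print(quadratic_form(A_tilde, x), quadratic_form(A, x))

# The value at the zero vector is zero, as in the Remark.
print(quadratic_form(A, [0, 0]))
```

Symmetrizing changes the matrix but not the function, which is why restricting attention to symmetric matrices loses no generality.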
Definiteness of Quadratic Forms and Matrices

We ended the preceding section by asking how we can identify which quadratic forms on $\mathbb{R}^n$ always have nonnegative values, or which ones are strictly concave functions, etc. The pattern for general $n$ is foreshadowed by the simple case $n = 1$, where the quadratic forms are the functions $f(x) = ax^2$. If $a > 0$ this quadratic form is positive for all nonzero values of $x$, and if $a < 0$ the quadratic form is negative for all nonzero values of $x$. Moreover, in the $a > 0$ case the function $f(\cdot)$ is strictly convex, and when $a < 0$ the function is strictly concave.

Before trying to analyze the general case of quadratic forms on $\mathbb{R}^n$ for any $n$, let's spend some time studying the case $n = 2$. Here the quadratic form is

$$f(x) = f(x_1, x_2) = \begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = xAx,$$

where $A$ is a symmetric matrix. Let's rewrite the matrix as $\begin{pmatrix} a & b \\ b & c \end{pmatrix}$, so we won't have to deal with the subscripts. So we have

$$f(x) = xAx = \begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} a & b \\ b & c \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = a x_1^2 + 2b\, x_1 x_2 + c x_2^2 . \tag{1}$$

What we want to know about this quadratic form is whether its value is positive (or at least nonnegative) for all vectors $x \neq 0$; or whether it's negative (or at least non-positive) for all $x \neq 0$; or whether neither of these is true, i.e., it's positive for some vectors and negative for others. If the first ("positive") statement is true, we say the quadratic form is positive definite; if all we can say is that it's non-negative for all nonzero vectors, we say the quadratic form is positive semi-definite. If the quadratic form is negative for all $x \neq 0$, we say it's negative definite; and if we can only say it's non-positive for all nonzero vectors, we say it's negative semi-definite. If the sign can go either way, we say the quadratic form is indefinite. We use the same terms (positive definite, etc.) to describe the matrix $A$.

Now let's see if we can figure out some conditions on the matrix $A$ that will tell us which definiteness property it has and therefore which property the quadratic form $xAx$ has.
Certainly if $a = c = 0$ (and $b \neq 0$) then the quadratic form in (1), which reduces to $2b\, x_1 x_2$, is indefinite: $x_1 x_2 > 0$ for some vectors $x$, and $x_1 x_2 < 0$ for other vectors $x$. In fact, if just one of the coefficients $a$ or $c$ is zero (again with $b \neq 0$), the quadratic form is indefinite: for example, if $a = 0$ then $xAx = 2b\, x_1 x_2 + c x_2^2 = (2b x_1 + c x_2)\, x_2$, so for a given $x_2 \neq 0$ the sign of $xAx$ depends on the sign of $2b x_1 + c x_2$, which will clearly be positive for some vectors $x$ and negative for others.

So let's assume that both $a \neq 0$ and $c \neq 0$. Now we can use the trick of completing the square to change this expression into a sum of two squares, as follows:
$$\begin{aligned}
xAx &= a x_1^2 + 2b\, x_1 x_2 + c x_2^2 \\
&= a\left( x_1^2 + 2\tfrac{b}{a}\, x_1 x_2 \right) + c x_2^2 + \tfrac{b^2}{a} x_2^2 - \tfrac{b^2}{a} x_2^2 \\
&= a\left( x_1^2 + 2\tfrac{b}{a}\, x_1 x_2 + \left(\tfrac{b}{a}\right)^2 x_2^2 \right) + \left( c - \tfrac{b^2}{a} \right) x_2^2 \\
&= a\left( x_1 + \tfrac{b}{a} x_2 \right)^2 + \tfrac{1}{a}\left( ac - b^2 \right) x_2^2 \\
&= a\left( x_1 + \tfrac{b}{a} x_2 \right)^2 + \tfrac{|A|}{a}\, x_2^2 .
\end{aligned}$$

Now it's clear that if $a > 0$ and $|A| > 0$, then $xAx$ is positive definite: these two inequalities ensure that the only way both terms in the sum can be zero is if $x_2 = 0$ (to make the second term zero), in which case $x_1$ has to be zero as well in order to make the first term zero. Of course, $a > 0$ and $|A| > 0$ also guarantee that $xAx$ can't be negative for any vector $x$, so $xAx$ is indeed positive definite. A parallel argument shows that $xAx$ is negative definite if $a < 0$ and $|A| > 0$: in this case the coefficient on the second term is negative if and only if $a$ and $|A|$ have opposite signs.

Notice that although we assumed that $a$ and $c$ are both nonzero, we didn't actually use the fact that $c \neq 0$. However, one of the conditions for definiteness (positive or negative) is that $|A| > 0$, and this requires that $ac > 0$, i.e., that $a$ and $c$ have the same sign. Also note that we could have carried out the above argument with the roles of $a$ and $c$ reversed. Therefore, we have the following theorem, where we revert to the notation $a_{ij}$ for the entries in the matrix $A$:

Theorem: A $2 \times 2$ symmetric matrix $A$ is
positive definite if and only if $|A| > 0$ and either $a_{11} > 0$ or $a_{22} > 0$, which is equivalent to $|A| > 0$ and both $a_{11} > 0$ and $a_{22} > 0$;
negative definite if and only if $|A| > 0$ and either $a_{11} < 0$ or $a_{22} < 0$, which is equivalent to $|A| > 0$ and both $a_{11} < 0$ and $a_{22} < 0$.

What are necessary and sufficient conditions for the matrix $A$ and the associated quadratic form $xAx$ to be positive or negative semidefinite? If $A$ is positive semidefinite, i.e., $xAx \geq 0$ for all $x \neq 0$, we clearly must have $a \geq 0$ and $c \geq 0$. If either $a > 0$ or $c > 0$, then we must have $|A| \geq 0$; and if $a = c = 0$, then we must have $b = 0$ as well, in which case $|A| = 0$. Therefore the conditions $a \geq 0$, $c \geq 0$, and $|A| \geq 0$ together must all hold if $A$ is positive semidefinite.

Are these conditions also sufficient? Suppose that $|A| \geq 0$. If $a = c = 0$, then $|A| \geq 0$ implies that $b = 0$, so that $A$ is the zero matrix, and $xAx = 0$ for all $x \in \mathbb{R}^2$.
And it's clear in the above expression for $xAx$ that if $a > 0$ (or, symmetrically, $c > 0$) and also $|A| \geq 0$, then $xAx \geq 0$ for all $x \neq 0$. Therefore the conditions $a \geq 0$, $c \geq 0$, and $|A| \geq 0$ are sufficient as well as necessary for $A$ to be positive semidefinite. A parallel argument provides the conditions for $A$ to be negative semidefinite, and we have the following theorem:

Theorem: A $2 \times 2$ symmetric matrix $A$ is
positive semidefinite if and only if $a_{11} \geq 0$, $a_{22} \geq 0$, and $|A| \geq 0$;
negative semidefinite if and only if $a_{11} \leq 0$, $a_{22} \leq 0$, and $|A| \geq 0$.
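The two theorems for the 2 × 2 case translate almost line by line into a small decision procedure. The sketch below is one possible way to write it in Python; the function name `classify_2x2` is an assumption made for this illustration, and the matrix is passed in the shorthand a, b, c used earlier for the entries of a symmetric 2 × 2 matrix.

```python
def classify_2x2(a, b, c):
    """Classify the symmetric matrix [[a, b], [b, c]] by the 2x2 theorems."""
    det = a * c - b * b  # the determinant |A|
    # Definite cases: |A| > 0 together with the sign of a (c then has the same sign).
    if det > 0 and a > 0:
        return "positive definite"
    if det > 0 and a < 0:
        return "negative definite"
    # Semidefinite cases: both diagonal entries and |A| must be checked.
    if det >= 0 and a >= 0 and c >= 0:
        return "positive semidefinite"
    if det >= 0 and a <= 0 and c <= 0:
        return "negative semidefinite"
    return "indefinite"


print(classify_2x2(2, 1, 2))    # |A| = 3, a > 0
print(classify_2x2(1, 1, 1))    # |A| = 0, a = c = 1
print(classify_2x2(1, 2, 1))    # |A| = -3
```

Note that the zero matrix satisfies both semidefiniteness conditions; this sketch reports it as positive semidefinite simply because that test comes first.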
This pattern generalizes to $\mathbb{R}^n$ for arbitrary $n$ as follows: first think of the components $a_{ij}$ of the $2 \times 2$ matrix $A$ as $1 \times 1$ submatrices of $A$; because they're $1 \times 1$ we'll call them order-1 submatrices, and we'll say that $A$ itself, which is $2 \times 2$, is an order-2 submatrix. For an $n \times n$ matrix $A$ we'll say that a submatrix of order $k$ consists of $k$ of the rows and $k$ of the columns of $A$, or we could equivalently say that an order-$k$ submatrix is formed by deleting $n - k$ rows and columns.

Now note that in the $2 \times 2$ example, the conditions we developed involved only submatrices on the diagonal, $a_{11}$ and $a_{22}$, as well as $A$. We could say each of these submatrices was formed by deleting the same row and column: for $a_{11}$ we deleted the second row and the second column; for $a_{22}$ we deleted the first row and column; for $A$ itself we deleted no rows and columns. Finally, note that the conditions in the $2 \times 2$ case were conditions on the signs of the determinants of these submatrices. The following definition generalizes these ideas to $n \times n$ matrices.

Definition: Let $A$ be an $n \times n$ matrix. For each $k = 1, 2, \ldots, n$ an order-$k$ principal submatrix of $A$ is a $k \times k$ matrix formed by deleting the same $n - k$ rows and columns. The order-$k$ submatrices formed by deleting the last $n - k$ rows and columns are called the leading principal submatrices of $A$. The determinant of a submatrix of $A$ is called a minor of $A$.

Therefore the leading principal submatrices of a $2 \times 2$ matrix $A$ are the matrices $(a_{11})$ and $A$ itself; the leading principal minors are $a_{11}$ and $|A|$.

The leading principal minors of a $3 \times 3$ matrix $A$ are

$$a_{11} \quad \text{and} \quad \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} \quad \text{and} \quad |A| .$$

The leading principal minors of a $4 \times 4$ matrix $A$ are

$$a_{11} \quad \text{and} \quad \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} \quad \text{and} \quad \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} \quad \text{and} \quad |A| .$$

The general versions of our $2 \times 2$ theorems are as follows:

Theorem: An $n \times n$ symmetric matrix is
positive definite if and only if all of its leading principal minors are positive, or equivalently, if and only if all of its principal minors are positive;
negative definite if and only if all of its order-$k$ leading principal minors have sign $(-1)^k$, or equivalently, if and only if all of its order-$k$ principal minors have sign $(-1)^k$.
Theorem: An $n \times n$ symmetric matrix is
positive semidefinite if and only if all of its principal minors are non-negative;
negative semidefinite if and only if all of its nonzero order-$k$ principal minors have sign $(-1)^k$.

Corollary: An $n \times n$ symmetric matrix is indefinite if and only if it has both a negative principal minor and an order-$k$ principal minor with sign $(-1)^{k+1}$. (Note that these could both be the same principal minor, as in the following example.)
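The general theorems can be turned into a brute-force check by enumerating every principal minor. The following Python sketch does this using only the standard library; the names `det`, `principal_minors`, and `classify` are assumptions introduced for this illustration. It computes determinants by cofactor expansion and visits all $2^n - 1$ principal submatrices, so it is meant for small matrices only, not as an efficient method.

```python
from itertools import combinations


def det(M):
    """Determinant by cofactor expansion along the first row (small n only)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        # Delete row 0 and column j to get the cofactor's submatrix.
        sub = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(sub)
    return total


def principal_minors(A):
    """Yield (k, value) for every order-k principal minor of A."""
    n = len(A)
    for k in range(1, n + 1):
        # Keeping the same k rows and columns = deleting the same n - k.
        for keep in combinations(range(n), k):
            yield k, det([[A[i][j] for j in keep] for i in keep])


def classify(A):
    """Classify a symmetric matrix (list of lists) by its principal minors."""
    minors = list(principal_minors(A))
    if all(m > 0 for _, m in minors):
        return "positive definite"
    if all(m * (-1) ** k > 0 for k, m in minors):
        return "negative definite"
    if all(m >= 0 for _, m in minors):
        return "positive semidefinite"
    if all(m * (-1) ** k >= 0 for k, m in minors):
        return "negative semidefinite"
    return "indefinite"


print(classify([[2, -1, 0], [-1, 2, -1], [0, -1, 2]]))
print(classify([[1, 2], [2, 1]]))
```

The test `m * (-1) ** k >= 0` accepts zero minors and otherwise requires sign $(-1)^k$, which matches the "nonzero order-$k$ principal minors" wording of the semidefinite theorem.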
Example: In order that the matrix $A$ be positive definite it's necessary and sufficient that the leading principal minors all be positive. It therefore seems natural to think that the parallel result should hold for positive semidefiniteness: that $A$ is positive semidefinite if and only if all of its leading principal minors are nonnegative. Here's a counterexample, which shows that merely having nonnegative leading principal minors is not sufficient to ensure that $A$ is positive semidefinite: we need to consider all the principal minors.

Let

$$A = \begin{pmatrix} 1 & 0 & 2 \\ 0 & 0 & 0 \\ 2 & 0 & 2 \end{pmatrix} .$$

All three leading principal minors are either positive or zero:

$$|A_1| = a_{11} = 1, \qquad |A_2| = \begin{vmatrix} 1 & 0 \\ 0 & 0 \end{vmatrix} = 0, \qquad \text{and} \qquad |A_3| = |A| = 0 .$$

However, the order-2 non-leading principal minor

$$\begin{vmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{vmatrix} = \begin{vmatrix} 1 & 2 \\ 2 & 2 \end{vmatrix} = -2,$$

which is inconsistent with both positive and negative semidefiniteness: it's negative, which is inconsistent with positive semidefiniteness; and its order is $k = 2$ and its sign is $-1 \neq (-1)^k$, which is inconsistent with negative semidefiniteness.

Note that $xAx = x_1^2 + 2x_3^2 + 4x_1 x_3$. When $x = (1, 0, 1)$, then $xAx = 7$; when $x = (1, 0, -1)$, then $xAx = -1$; this verifies directly, without having to consider principal minors, that $A$ is neither negative semidefinite nor positive semidefinite, i.e., that it's indefinite.
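The numbers in this example are easy to reproduce. The Python sketch below recomputes the three leading principal minors, the non-leading order-2 minor, and the two values of the quadratic form; the helper names `det` and `q` are assumptions introduced for this illustration.

```python
A = [[1, 0, 2],
     [0, 0, 0],
     [2, 0, 2]]


def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def q(x):
    """Value of the quadratic form xAx for the matrix A above."""
    return sum(A[i][j] * x[i] * x[j] for i in range(3) for j in range(3))


# Leading principal minors: keep the first k rows and columns.
leading = [det([row[:k] for row in A[:k]]) for k in (1, 2, 3)]
print(leading)              # [1, 0, 0]

# Non-leading order-2 principal minor: delete row 2 and column 2.
minor_13 = det([[A[0][0], A[0][2]],
                [A[2][0], A[2][2]]])
print(minor_13)             # -2

print(q([1, 0, 1]), q([1, 0, -1]))   # 7 -1
```

The form takes both a positive and a negative value, so $A$ is indefinite even though none of its leading principal minors is negative.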