JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS: Vol. 100, No. 1, pp. 145-160, JANUARY 1999

KKT Conditions for Rank-Deficient Nonlinear Least-Square Problems with Rank-Deficient Nonlinear Constraints^1

M. GULLIKSSON^2

Communicated by C. G. Broyden

Abstract. In nonlinear least-square problems with nonlinear constraints, the function (1/2)||f_2(x)||_2^2, where f_2 is a nonlinear vector function, is to be minimized subject to the nonlinear constraints f_1(x) = 0. This problem is ill-posed if the first-order KKT conditions do not define a locally unique solution. We show that the problem is ill-posed if either the Jacobian of f_1 or the Jacobian of f = [f_1; f_2] is rank-deficient (i.e., not of full rank) in a neighborhood of a solution satisfying the first-order KKT conditions. Either of these ill-posed cases makes it impossible to use a standard Gauss-Newton method. Therefore, we formulate a constrained least-norm problem that can be used when either of these ill-posed cases occurs. By using the constant-rank theorem, we derive the necessary and sufficient conditions for a local minimum of this minimum-norm problem. The results given here are crucial for deriving methods solving the rank-deficient problem.

Key Words. Nonlinear least squares, optimization, regularization, KKT conditions, rank-deficient nonlinear constraints, rank-deficient nonlinear least-square problems.

^1 The author thanks the reviewers for suggestions improving the manuscript considerably.
^2 Assistant Professor, Department of Computing Science, Umeå University, Umeå, Sweden.

1. Introduction

We will consider an important special case of ill-posed nonlinear least-square problems with nonlinear equality constraints. Let us first consider the unconstrained least-square problem

    min_x F(x) = (1/2)||f(x)||_2^2,    (1)

where f: R^n -> R^m is at least twice continuously differentiable and ||.||_2 is the 2-norm. The first-order KKT condition for (1) is

    J(x)^T f(x) = 0,    (2)

where J = df/dx is the Jacobian of f. A solution x̂ of (2) will be called a critical point. For clarity, we sometimes denote functions or derivatives evaluated at x̂ with a hat, e.g., Ĵ = J(x̂).

An important case of an ill-posed nonlinear least-square problem is when J is of rank r < n in a neighborhood of the critical point x̂. This will be the main assumption in this paper, but it will not always be stated explicitly. Examples of rank-deficient problems are underdetermined problems (Ref. 1), nonlinear regression problems (Ref. 2), nonlinear total least-square problems (Ref. 3), and artificial neural networks (Ref. 4). Note that all these problems may have nonlinear rank-deficient constraints. Another equally important reason for looking at rank-deficient problems is the connection with regularization (Ref. 5). For Tikhonov regularization, it is often the case that the problem solved in the limit is a minimum-norm problem of the kind that we analyze here (Ref. 6).

The following theorem characterizes a problem that has a rank-deficient Jacobian in a neighborhood of a critical point. The theorem is in fact a corollary of Theorem 1.2 below and can be found in Ref. 7.

Theorem 1.1. Let x̂ be a critical point, and let the rank of J be equal to r < n in a neighborhood of x̂. Then, the Hessian of F at x̂ is a matrix of rank r < n with its nullspace containing the nullspace of J(x̂).

We may conclude that having J rank-deficient makes (1) an ill-posed problem in the sense that (2) does not have a unique solution [though a local minimum of (1) may exist].
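The following small NumPy sketch is our illustration of this setting, not part of the original paper; the residual function, the point x_hat, and the tolerances are assumptions made only for the example. It builds a problem whose Jacobian has rank r = 1 < n = 2 in a whole neighborhood, verifies the first-order condition (2) at a critical point, and checks numerically that the nullspace of J is contained in the nullspace of the Hessian of F, as Theorem 1.1 states.

import numpy as np

# Illustrative rank-deficient problem: f depends on x only through s = x1 + x2,
# so J = df/dx has rank 1 < n = 2 everywhere in a neighborhood.
def f(x):
    s = x[0] + x[1]
    return np.array([s - 1.0, s + 1.0, 0.1 * s**2])

def num_jac(fun, x, eps=1e-6):
    # Central-difference Jacobian, one column per coordinate direction.
    cols = [(fun(x + eps * e) - fun(x - eps * e)) / (2.0 * eps)
            for e in np.eye(len(x))]
    return np.array(cols).T

def grad_F(x):
    return num_jac(f, x).T @ f(x)          # gradient of F(x) = (1/2)||f(x)||^2

x_hat = np.array([0.3, -0.3])              # any point with x1 + x2 = 0 is critical
J = num_jac(f, x_hat)

print("numerical rank of J:", np.linalg.matrix_rank(J, tol=1e-8))   # 1 < n = 2
print("||J^T f|| at x_hat:", np.linalg.norm(J.T @ f(x_hat)))        # ~ 0, condition (2)

# Theorem 1.1: the nullspace of J(x_hat) lies inside the nullspace of Hess F(x_hat).
H = np.array([(grad_F(x_hat + 1e-5 * e) - grad_F(x_hat - 1e-5 * e)) / 2e-5
              for e in np.eye(2)])
null_vec = np.array([1.0, -1.0]) / np.sqrt(2.0)                      # spans N(J)
print("||Hess(F) @ null_vec||:", np.linalg.norm(H @ null_vec))       # ~ 0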

Consider now the nonlinear least-square problem with nonlinear constraints. We formulate this problem as

    min_x (1/2)||f_2(x)||_2^2,  subject to f_1(x) = 0,    (3)

where f_1: R^n -> R^{m_1} and f_2: R^n -> R^{m_2}, with m = m_1 + m_2 for the sake of simplicity. For notational convenience, we define

    f(x) = [f_1(x); f_2(x)],    J = df/dx = [J_1; J_2],

with J_1 = df_1/dx and J_2 = df_2/dx. The first-order KKT conditions for this problem read

    J_2(x)^T f_2(x) + J_1(x)^T λ_1 = 0,    f_1(x) = 0.    (4)

We will call a solution of (4) a critical point. As for the unconstrained problem, we assume that J is rank-deficient in a neighborhood of the critical point of interest. We will also analyze the case where J_1 is of rank s < m_1 in a neighborhood of a critical point.

It is easy to state the KKT conditions when both J and J_1 have full rank in a neighborhood of the solution. If either J or J_1 is not of full rank, we say that the problem is ill-posed. We will motivate this statement further before going into the different problem reformulations.

It is natural to consider the constrained problem (3) ill-posed if (4) does not have a locally unique solution. This will be the case if the matrix

    K = [ ∇²_xx L   J_1^T ]
        [ J_1       0     ]    (5)

is singular, where L(x, λ_1) = (1/2)||f_2(x)||_2^2 + λ_1^T f_1(x) is the Lagrangian and

    ∇²_xx L = J_2^T J_2 + λ_1 ⊙ f_1'' + f_2 ⊙ f_2''.

Here, we have introduced the operator ⊙ defined as

    y ⊙ g'' = Σ_{i=1}^m y_i ∇² g_i(x),

for y ∈ R^m and g: R^n -> R^m a twice continuously differentiable function. We have the following lemma.

Lemma 1.1. Define P_{N(J_1)} as the projection onto the nullspace of J_1, the Jacobian of f_1. The matrix K in (5) is singular if and only if J_1^T or P_{N(J_1)} ∇²_xx L P_{N(J_1)} is rank-deficient.

Proof. The if-part is proved by considering a nontrivial solution [y; v] of

    K [y; v] = 0,

giving

    ∇²_xx L y + J_1^T v = 0,    J_1 y = 0,

and thus

    P_{N(J_1)} ∇²_xx L P_{N(J_1)} y = 0,

which by projection on N(J_1) proves the if-part. For the converse statement, we assume that J_1^T has full rank (J_1^T rank-deficient is trivial) and that P_{N(J_1)} ∇²_xx L P_{N(J_1)} does not have full rank. Then, we have to show that the matrix K has a nontrivial nullspace; i.e., that there exists [y; v] ≠ 0 such that K [y; v] = 0. It is always possible to choose y ≠ 0 in N(J_1) with P_{N(J_1)} ∇²_xx L y = 0, so that ∇²_xx L y lies in R(J_1^T) and a suitable v exists; hence the lemma is proved. □
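As a concrete numerical illustration of Lemma 1.1 (our own toy data, not taken from the paper; the matrices H and J1 below merely stand in for ∇²_xx L and the constraint Jacobian at a critical point), the sketch assembles K from (5) and compares its singularity with the rank of the projection of H onto N(J_1).

import numpy as np

def kkt_matrix(H, J1):
    m1 = J1.shape[0]
    return np.block([[H, J1.T], [J1, np.zeros((m1, m1))]])

def proj_nullspace(J1, tol=1e-12):
    # Orthogonal projector onto N(J1), built from the SVD of J1.
    _, s, Vt = np.linalg.svd(J1)
    rank = int(np.sum(s > tol))
    N = Vt[rank:].T                        # columns span N(J1)
    return N @ N.T

J1 = np.array([[1.0, 0.0, 0.0]])           # full row rank, N(J1) = span{e2, e3}
H  = np.diag([1.0, 2.0, 0.0])              # singular in a direction inside N(J1)

P = proj_nullspace(J1)
K = kkt_matrix(H, J1)
print("rank of J1^T:", np.linalg.matrix_rank(J1.T))                     # 1 (full)
print("rank of P H P:", np.linalg.matrix_rank(P @ H @ P))               # 1 < dim N(J1) = 2
print("K singular?", np.linalg.matrix_rank(K) < K.shape[0])             # True

H_ok = np.diag([1.0, 2.0, 3.0])            # now P H_ok P has full rank on N(J1)
print("K nonsingular with H_ok?",
      np.linalg.matrix_rank(kkt_matrix(H_ok, J1)) == K.shape[0])        # True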

We will assume that m_1 < r. This assumption may be regarded as a constraint qualification (see Lemma 2.1) when J is rank-deficient, and it does not appear to be a severe restriction in practice. The assumption is used implicitly in the following theorem, which we will prove in Section 2.2.

Theorem 1.2. Assume that J is rank-deficient in a neighborhood of a critical point. Then, ∇²_xx L in (5) is singular with N(Ĵ) ⊆ N(∇²_xx L) and R(∇²_xx L) = R(Ĵ^T). Moreover, P_{N(J_1)} ∇²_xx L P_{N(J_1)} (and thus K) is singular with a nullspace in N(Ĵ_1) ∩ N(Ĵ).

Theorem 1.2 makes it clear that J or J_1 rank-deficient in a neighborhood of a critical point gives an ill-posed problem.

Now, we turn to the question of reformulating our problems. In the unconstrained case, it is natural to find the minimum-norm solution when J is rank-deficient, since it is of interest that the solution be of reasonable size with a residual as small as possible. Therefore, we may use the minimum-norm problem

    min_x (1/2)||x - x_c||_2^2,  subject to x ∈ arg min (1/2)||f(x)||_2^2.

In Ref. 7, necessary and sufficient conditions for a local minimum of this minimum-norm problem are derived. The center x_c is chosen from a priori information and should ideally be an approximation of the solution.
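In computations, such a minimum-norm formulation is often approached through pseudoinverse-based steps. The following sketch is our illustration of a single linearized step and is not an algorithm from the paper; the data and the use of the Moore-Penrose pseudoinverse are assumptions made for the example. The pseudoinverse resolves the rank-deficient least-squares part, and the nullspace of J absorbs the pull toward the center x_c.

import numpy as np

def min_norm_gn_step(f_val, J, x, xc, rcond=1e-10):
    # One linearized step for: min ||x - xc|| over the minimizers of (1/2)||f(x)||^2.
    J_pinv = np.linalg.pinv(J, rcond=rcond)
    p_res = -J_pinv @ f_val                 # minimum-norm minimizer of ||f + J p||
    P_null = np.eye(len(x)) - J_pinv @ J    # orthogonal projector onto N(J)
    p_center = P_null @ (xc - x)            # move toward xc inside N(J)
    return x + p_res + p_center

# Toy usage with a rank-1 Jacobian (n = 2, m = 2):
x  = np.array([2.0, 0.0])
xc = np.array([0.0, 0.0])
J  = np.array([[1.0, 1.0], [1.0, 1.0]])
f_val = np.array([x[0] + x[1] - 1.0, x[0] + x[1] - 1.0])
print(min_norm_gn_step(f_val, J, x, xc))    # (0.5, 0.5): on x1 + x2 = 1, closest to xc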

One possible extension of the unconstrained problem to the rank-deficient constrained problem is to consider

    min_x (1/2)||x - x_c||_2^2,  subject to x ∈ arg min { (1/2)||f_2(x)||_2^2 : f_1(x) = 0 }.    (6)

Problem (6) is to be understood as minimizing ||x - x_c||_2, where x is in the solution set of problem (3). If, in addition, the constraints are ill-posed in the sense that J_1 is rank-deficient in a neighborhood of a critical point, we formulate the problem as

    min_x (1/2)||x - x_c||_2^2,
    subject to x ∈ arg min { (1/2)||f_2(x)||_2^2 : x ∈ arg min (1/2)||f_1(x)||_2^2 }.    (7)

Again, these three minimization problems are to be thought of as finding the minimum distance to x_c, subject to x being in the solution set of the two inner minimization problems. For f_1 and f_2 both linear, a formula for the least-norm solution of (7) with x_c = 0 is given in Ref. 8. To the best of our knowledge, the nonlinear problem has not been treated elsewhere.

2. Constrained Full-Rank Case

In this section, we consider problem (6). We will assume that only J is rank-deficient and that J_1 is of full rank in a neighborhood of a critical point x̂. Consequently, problem (6) is more or less a straightforward generalization of the unconstrained case, inheriting the same type of rank deficiency.

The constant-rank theorem (Ref. 9) implies that there exist functions h: R^r -> R^m and z: R^n -> R^r such that f(x) = h(z(x)), with rank(dh/dz) = rank(dz/dx) = r in a neighborhood of x̂. We partition h = [h_1; h_2] conformally with f. Using this representation in (6) and assuming that J_1 has full rank gives

    min_x (1/2)||x - x_c||_2^2,  subject to z(x) ∈ arg min { (1/2)||h_2(z)||_2^2 : h_1(z) = 0 }.    (8)

This problem decouples into

    min_z (1/2)||h_2(z)||_2^2,  subject to h_1(z) = 0,    (9)

with a solution ẑ, and

    min_x (1/2)||x - x_c||_2^2,  subject to z(x) = ẑ.    (10)

If (9) is going to be a meaningful problem, it is necessary to add a constraint qualification. It seems natural to make the standard assumption that dh_1/dz has full row rank. In the following lemma, the implications of this assumption are stated.

Lemma 2.1. Assume that J has rank r < n and that f_1(x) = h_1(z(x)) is attained from the constant-rank theorem. If dh_1/dz has full row rank, then m_1 < r and N(J_1) ∩ R(J^T) ≠ {0}.

Proof. Since the constant-rank theorem tells us that dh_1/dz has full rank, it is necessary that m_1 < r. From the chain rule, it is also seen that dh_1/dz = J_1 V_1 for a matrix V_1 with R(V_1) = R(J^T). We see that dh_1/dz has a nontrivial nullspace if and only if N(J_1) ∩ R(J^T) ≠ {0}. □

2.1. Necessary Conditions for Local Minimum. We have the following theorem.

Theorem 2.1. Let f = [f_1; f_2]: R^n -> R^m be a function whose Jacobian J is of rank r ≤ n, and assume that f_1: R^n -> R^{m_1} has a Jacobian J_1 of rank m_1 in a neighborhood of x̂. Then, a necessary condition for (6) to have a local minimum at x̂ is that there exist vectors λ_1 and γ such that

    f_1(x̂) = 0,
    Ĵ_2^T f_2(x̂) + Ĵ_1^T λ_1 = 0,
    x̂ - x_c = Ĵ^T γ.

Proof. We use the two problems (9) and (10). A necessary condition for (9) to have a local minimum is that h_1 = 0 and that the gradient of the Lagrangian is zero, i.e.,

    (dh_2/dz)^T h_2 + (dh_1/dz)^T λ_1 = 0.    (11)

From the chain rule, we get

    J_1 = (dh_1/dz)(dz/dx),    J_2 = (dh_2/dz)(dz/dx),    (12)

and a necessary condition is then

    Ĵ_2^T f_2(x̂) + Ĵ_1^T λ_1 = 0,

which is the second condition given in the theorem, with the functions evaluated at x̂. A necessary first-order condition for (10) to have a local minimum is that

    x̂ - x_c ∈ R((dz/dx)^T).

From (12), we have that R(Ĵ^T) = R((dz/dx)^T), proving the third condition in the theorem. We may add that it is also possible to prove this statement directly: according to Theorem 1.2, Ĵ_2^T Ĵ_2 + λ_1 ⊙ f_1''(x̂) + f_2(x̂) ⊙ f_2''(x̂) has a range space containing R(Ĵ^T), and the statement is proved again. □

There is more to say about the structure of the constrained problem than the theorem reveals. Consider the problem

    min_x (1/2)||h_2(z(x))||_2^2,  subject to h_1(z(x)) = 0.    (13)

We can define the Lagrange function as

    ψ(x, λ_1) = (1/2)||h_2(z(x))||_2^2 + λ_1^T h_1(z(x)),    (14)

and the gradient is

    ∇_x ψ = (dz/dx)^T [ (dh_2/dz)^T h_2 + (dh_1/dz)^T λ_1 ],

giving the second condition as in the proof of Theorem 2.1. We will use this formulation in the next section.
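The conditions of Theorem 2.1 are straightforward to test numerically at a candidate point. The sketch below is our own test harness, not part of the paper; the arrays J1, J2, f1, f2 are assumed to hold the corresponding quantities evaluated at x_hat, and the tolerance is an assumption.

import numpy as np

def check_kkt(J1, J2, f1, f2, x_hat, xc, tol=1e-8):
    # (i) feasibility: f1(x_hat) = 0
    feasible = np.linalg.norm(f1) <= tol
    # (ii) existence of lambda_1 with J2^T f2 + J1^T lambda_1 = 0 (least-squares fit)
    lam, *_ = np.linalg.lstsq(J1.T, -J2.T @ f2, rcond=None)
    stationary = np.linalg.norm(J2.T @ f2 + J1.T @ lam) <= tol
    # (iii) x_hat - xc in R(J^T): compare with its projection onto the row space of J
    J = np.vstack([J1, J2])
    P_rowspace = np.linalg.pinv(J) @ J       # orthogonal projector onto R(J^T)
    d = x_hat - xc
    in_range = np.linalg.norm(d - P_rowspace @ d) <= tol
    return feasible, stationary, in_range, lam

# Toy usage with a rank-deficient J (first two columns identical):
J1 = np.array([[1.0, 1.0, 0.0]]);                 f1 = np.array([0.0])
J2 = np.array([[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]); f2 = np.array([0.0, 0.0])
x_hat = np.array([1.0, 1.0, 0.0]); xc = np.zeros(3)
print(check_kkt(J1, J2, f1, f2, x_hat, xc))       # (True, True, True, ...)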

2.2. Sufficient Conditions. In order to derive the sufficient conditions, we state the following lemma.

Lemma 2.2. If f(x) = h(z(x)) ∈ R^m with J of rank r < n and y ∈ R^m, then

    y ⊙ f'' = (dz/dx)^T (y ⊙ h'') (dz/dx) + ((dh/dz)^T y) ⊙ z''.

Proof. From the chain rule, we have that

    ∇² f_i = (dz/dx)^T ∇² h_i (dz/dx) + Σ_{k=1}^r (e_i^T (dh/dz) e_k) ∇² z_k,

where e_i is the ith column in the identity matrix. The statement is easily attained from this by looking at y ⊙ f''. □

The following two corollaries are a direct consequence of Lemma 2.2.

Corollary 2.1. If J^T y = 0, then

    y ⊙ f'' = (dz/dx)^T (y ⊙ h'') (dz/dx).

Corollary 2.2. If J^T y = 0 and f, y are partitioned as f = [f_1; f_2], y = [y_1; y_2], then

    y_1 ⊙ f_1'' + y_2 ⊙ f_2'' = (dz/dx)^T (y_1 ⊙ h_1'' + y_2 ⊙ h_2'') (dz/dx).

Consider (13) again. From the lemmas above, it is seen that the Hessian of the Lagrange function is

    ∇²_xx ψ = Ĵ_2^T Ĵ_2 + λ_1 ⊙ f_1''(x̂) + f_2(x̂) ⊙ f_2''(x̂).

From the second necessary condition in Theorem 2.1, we have that

    Ĵ_1^T λ_1 + Ĵ_2^T f_2(x̂) = 0,  i.e.,  Ĵ^T [λ_1; f_2(x̂)] = 0,

or, by the properties of the operator ⊙ and Corollary 2.2,

    ∇²_xx ψ = (dz/dx)^T [ (dh_2/dz)^T (dh_2/dz) + λ_1 ⊙ h_1''(ẑ) + h_2(ẑ) ⊙ h_2''(ẑ) ] (dz/dx).

Thus, ∇²_xx ψ is a matrix with rank r and with a nullspace containing the nullspace of dz/dx, just as in the unconstrained case (Ref. 7). We have also proved Theorem 1.2.

For attaining the sufficient condition for a local minimum, we must restrict the Hessian of the Lagrange function to N(Ĵ)^⊥ = R(Ĵ^T), since any part in N(Ĵ) will give no information [locally, z(x̂ + p) is constant and (dz/dx)p is zero for p ∈ N(Ĵ)]. We then restrict our attention to

    V_1^T ∇²_xx ψ V_1,

where V_1 is a matrix with R(V_1) = R(Ĵ^T), and, remaining in R(Ĵ^T), project this matrix onto the nullspace of Ĵ_1 V_1. If we define a matrix Z ∈ R^{r×(r-m_1)} that spans the nullspace of Ĵ_1 V_1, we get the projected Hessian of the Lagrange function to be

    W_1 = Z^T V_1^T ∇²_xx ψ V_1 Z.

Another way of formulating the analysis above is to consider the Taylor expansion of ψ(x̂ + p, λ_1). This expansion has a second-order term p^T ∇²_xx ψ p. Since we know that p must be in R(Ĵ^T), we have

    p = V_1 q.

We also know that p is in the nullspace of Ĵ_1, that is,

    Ĵ_1 V_1 q = 0,

which gives

    q = Z w,

with Z a matrix that spans the nullspace of Ĵ_1 V_1. Thus, the second-order term can be written as above. We summarize this in a theorem characterizing the local minimum of problem (13).

Theorem 2.2. Assume that V_1 is a matrix with R(V_1) = R(Ĵ^T) = N(Ĵ)^⊥ and that Z is a matrix with R(Z) = N(Ĵ_1 V_1). If x̂ is a local minimum to problem (13), then the first two conditions in Theorem 2.1 are satisfied and the matrix

    W_1 = Z^T V_1^T (Ĵ_2^T Ĵ_2 + λ_1 ⊙ f_1''(x̂) + f_2(x̂) ⊙ f_2''(x̂)) V_1 Z

is positive semidefinite. Conversely, if the first two conditions in Theorem 2.1 are satisfied and W_1 is positive definite, then x̂ is a local minimum to problem (13).
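The projected Hessian W_1 is easy to form numerically once orthonormal bases for R(Ĵ^T) and N(Ĵ_1 V_1) are available from the SVD. The sketch below is our own construction, assuming Hpsi holds the Hessian of the Lagrange function evaluated at x_hat; it is one way to test the sufficient condition of Theorem 2.2, not the paper's prescription.

import numpy as np

def range_basis_JT(J, tol=1e-12):
    # Orthonormal basis of R(J^T) from the right singular vectors of J.
    _, s, Vt = np.linalg.svd(J)
    r = int(np.sum(s > tol))
    return Vt[:r].T                          # n x r

def nullspace_basis(A, tol=1e-12):
    _, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return Vt[rank:].T                       # orthonormal basis of N(A)

def projected_hessian(Hpsi, J, J1):
    # Hpsi is assumed to be J2^T J2 + lambda_1 (.) f1'' + f2 (.) f2'' at x_hat.
    V1 = range_basis_JT(J)
    Z = nullspace_basis(J1 @ V1)
    return Z.T @ V1.T @ Hpsi @ V1 @ Z

def sufficient_condition_holds(Hpsi, J, J1, tol=1e-10):
    # Positive definiteness of W1, to be combined with the first two conditions
    # of Theorem 2.1 for the converse part of Theorem 2.2.
    W1 = projected_hessian(Hpsi, J, J1)
    return W1.size == 0 or np.all(np.linalg.eigvalsh(W1) > tol)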

We can now extend the previous theorem to the general minimum-norm problem (8).

Theorem 2.3. Assume that V_1 and V_2 are matrices with R(V_1) = R(Ĵ^T) = N(Ĵ)^⊥ and R(V_2) = N(Ĵ), and that Z is a matrix with R(Z) = N(Ĵ_1 V_1). If x̂ is a local minimum to problem (8), then the conditions in Theorem 2.1 are satisfied and the matrices

    W_1 = Z^T V_1^T (Ĵ_2^T Ĵ_2 + λ_1 ⊙ f_1''(x̂) + f_2(x̂) ⊙ f_2''(x̂)) V_1 Z,
    W_2 = V_2^T (I_n - γ ⊙ f''(x̂)) V_2,

with λ_1 and γ the vectors of Theorem 2.1, are positive semidefinite. Conversely, if the conditions in Theorem 2.1 are satisfied and W_1, W_2 are positive definite, then x̂ is a local minimum to problem (8).

Proof. The original constrained problem decouples into the two problems (9) and (10). The results on W_2 are derived from problem (10). It is easily seen that the Hessian of the Lagrangian of (10) is

    I_n - γ_z ⊙ z'',

where γ_z is the multiplier of the constraint z(x) = ẑ and satisfies x̂ - x_c = S^T γ_z with S = dz/dx. This matrix is to be projected onto N(S) = N(Ĵ), i.e., onto R(V_2), giving V_2^T (I_n - γ_z ⊙ z'') V_2. Using Lemma 2.2 together with the relation Ĵ^T γ = S^T γ_z = x̂ - x_c, the term γ_z ⊙ z'' may be replaced by γ ⊙ f''(x̂) on N(Ĵ), which gives the condition on W_2.

Analyzing (9), we determine the Hessian of the Lagrangian in (14). Since S = dz/dx has full rank, the necessary condition (11) holds at ẑ with the multiplier λ_1 of Theorem 2.1; with y = [λ_1; f_2(x̂)] and Lemma 2.2, the Hessian can be written in terms of the derivatives of f, with V_1 defined in the theorem. The Hessian of the Lagrange function is to be projected onto the nullspace of dh_1/dz. From

    Ĵ_1 = (dh_1/dz)(dz/dx),

we get that, in the coordinates of R(V_1), the nullspace of dh_1/dz corresponds to the nullspace of the matrix Ĵ_1 V_1. The projected Hessian of the Lagrange function is then

    Z^T V_1^T (Ĵ_2^T Ĵ_2 + λ_1 ⊙ f_1''(x̂) + f_2(x̂) ⊙ f_2''(x̂)) V_1 Z,

with R(Z) = N(Ĵ_1 V_1), which is the matrix W_1. □

3. Rank-Deficient Constraints

Rank-deficient constraints really do not complicate the problem very much. We assume that J_1 has rank s < m_1, and the natural problem formulation is then (7). Again, we can use the powerful constant-rank theorem to formulate the following lemma.

Lemma 3.1. Assume that J_1 has rank s < m_1 and that f_1(x) = h_1(z(x)), where rank(dz/dx) = r. Then, dh_1/dz has rank s, and there exist functions c: R^s -> R^{m_1} and d: R^r -> R^s, whose Jacobians are of full rank, such that h_1(z) = c(d(z)).

Proof. From the chain rule, we get

    J_1 = (dh_1/dz)(dz/dx).

Since dz/dx has rank r and df_1/dx has rank s, we get that dh_1/dz has rank s. From the constant-rank theorem, we then get h_1(z) = c(d(z)). □

Using the lemma, we can formulate the constrained problem as

    min_x (1/2)||x - x_c||_2^2,
    subject to x ∈ arg min { (1/2)||h_2(z(x))||_2^2 : x ∈ arg min (1/2)||c(d(z(x)))||_2^2 }.    (16)

Problem (16) can be solved at three levels. First, we have d̂ as the solution of

    min_d (1/2)||c(d)||_2^2,    (17)

and the inner minimization problem becomes

    min_x (1/2)||x - x_c||_2^2,  subject to x ∈ arg min { (1/2)||h_2(z(x))||_2^2 : d(z(x)) = d̂ }.    (18)

This problem decouples into

    min_z (1/2)||h_2(z)||_2^2,  subject to d(z) = d̂,    (19)

with a solution ẑ, and the final problem is again

    min_x (1/2)||x - x_c||_2^2,  subject to z(x) = ẑ.    (20)
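This three-level structure can be mimicked on a single linearization by sequential pseudoinverse solves, which may help in seeing how the levels interact. The sketch below is our illustration only (the paper derives optimality conditions, not this procedure), and the Jacobians and residuals are assumed to be given at the current point.

import numpy as np

def three_level_step(f1, J1, f2, J2, x, xc, rcond=1e-12):
    # One linearized step following the three levels of problem (7)/(16):
    # level 1 minimizes the constraint residual, level 2 minimizes ||f2 + J2 p||
    # over the remaining freedom, level 3 moves toward xc with what is still free.
    n = len(x)
    I = np.eye(n)
    # Level 1: minimum-norm minimizer of ||f1 + J1 p||.
    J1p = np.linalg.pinv(J1, rcond=rcond)
    p = -J1p @ f1
    N1 = I - J1p @ J1                        # projector onto N(J1)
    # Level 2: minimize ||f2 + J2 (p + N1 w)|| over w.
    A2 = J2 @ N1
    A2p = np.linalg.pinv(A2, rcond=rcond)
    p = p + N1 @ (-A2p @ (f2 + J2 @ p))
    # Level 3: among directions free at both levels, get as close to xc as possible.
    N2 = N1 @ (I - A2p @ A2)                 # spans {N1 v : J2 N1 v = 0}
    p = p + N2 @ np.linalg.pinv(N2, rcond=rcond) @ (xc - x - p)
    return x + p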

3.1. Necessary Conditions. One possible formulation of the necessary conditions for a local minimum is to use (7).

Theorem 3.1. A necessary condition for problem (7) to have a minimum at x̂ is that there exist vectors λ_1 and γ such that

    Ĵ_1^T f_1(x̂) = 0,
    Ĵ_2^T f_2(x̂) + Ĵ_1^T λ_1 = 0,
    x̂ - x_c = Ĵ^T γ.

Proof. The first condition comes from problem (17). This problem has, as a necessary condition for a local minimum, that

    (dc/dd)^T c(d̂) = 0.

From the chain rule, we get

    Ĵ_1 = (dc/dd)(dd/dz)(dz/dx),

and since dd/dz and dz/dx both have full rank, we get Ĵ_1^T f_1(x̂) = 0 at x̂. To prove the second condition, we use problem (19). A necessary condition for (19) to have a local minimum is that d(ẑ) = d̂ and that the gradient of the Lagrangian is zero, i.e.,

    (dh_2/dz)^T h_2 + (dd/dz)^T μ = 0,

for some multiplier μ. By defining λ_1 through (dc/dd)^T λ_1 = μ and using the chain rule again, we get

    Ĵ_2^T f_2(x̂) + Ĵ_1^T λ_1 = 0,

which is the wanted second condition at x̂. The proof of the last condition is given by the necessary conditions for problem (20) and is found in the proof of Theorem 2.1. □

3.2. Sufficient Conditions. By using the chain rule on f_1(x) = c(d(z(x))), we get the following lemma.

Lemma 3.2. Assume that f_1(x) = c(d(z(x))) and that J_1^T y = 0. Then,

    y ⊙ f_1'' = [(dd/dz)(dz/dx)]^T (y ⊙ c'') [(dd/dz)(dz/dx)].

We have a similar lemma for f_1(x) = h_1(z(x)).

Lemma 3.3. If J_1^T y = 0, then

    y ⊙ f_1'' = (dz/dx)^T (y ⊙ h_1'') (dz/dx).

Finally, we have a lemma for h_1(z) = c(d(z)).

Lemma 3.4. Assume that h_1(z) = c(d(z)) and that Z is a matrix such that R(Z) = N(Ĵ_1 V_1). Then, with S = dz/dx, the columns of S V_1 Z lie in the nullspace of dd/dz, and consequently

    Z^T V_1^T S^T (λ_1 ⊙ h_1''(ẑ)) S V_1 Z = Z^T V_1^T S^T ( ((dc/dd)^T λ_1) ⊙ d''(ẑ) ) S V_1 Z.

Proof. From the chain rule, we get

    λ_1 ⊙ h_1'' = (dd/dz)^T (λ_1 ⊙ c'') (dd/dz) + ((dc/dd)^T λ_1) ⊙ d''.

Since Ĵ_1 V_1 Z = (dc/dd)(dd/dz) S V_1 Z = 0 and dc/dd has full column rank, we get (dd/dz) S V_1 Z = 0. Looking at λ_1 ⊙ h_1'' restricted to the columns of S V_1 Z, the first term vanishes and the lemma follows. □

We are now able to prove the following theorem.

Theorem 3.2. Assume that U_1 is a matrix with R(U_1) = R(Ĵ_1^T), that V_1 and V_2 are matrices with R(V_1) = R(Ĵ^T) = N(Ĵ)^⊥ and R(V_2) = N(Ĵ), and that Z is a matrix with R(Z) = N(Ĵ_1 V_1). If x̂ is a local minimum to problem (7), then the conditions in Theorem 3.1 are satisfied and the matrices

    W_1 = U_1^T (Ĵ_1^T Ĵ_1 + f_1(x̂) ⊙ f_1''(x̂)) U_1,
    W_2 = Z^T V_1^T (Ĵ_2^T Ĵ_2 + λ_1 ⊙ f_1''(x̂) + f_2(x̂) ⊙ f_2''(x̂)) V_1 Z,
    W_3 = V_2^T (I_n - γ ⊙ f''(x̂)) V_2,

with λ_1 and γ the vectors of Theorem 3.1, are positive semidefinite. Conversely, if the conditions in Theorem 3.1 are satisfied and W_1, W_2, W_3 are positive definite, then x̂ is a local minimum to problem (7).

Proof. To attain the first condition, we consider problem (17). A necessary condition for a local minimum is that its Hessian,

    (dc/dd)^T (dc/dd) + c(d̂) ⊙ c''(d̂),

is positive semidefinite. From the chain rule,

    Ĵ_1^T Ĵ_1 = [(dd/dz)(dz/dx)]^T (dc/dd)^T (dc/dd) [(dd/dz)(dz/dx)],

and, defining U_1 as above, the first condition is attained by using Lemma 3.2 with y = f_1(x̂). For the second condition, we introduce the Lagrange function of problem (19),

    (1/2)||h_2(z)||_2^2 + μ^T (d(z) - d̂),

and its Hessian with respect to z. The first two terms of this Hessian have been analyzed before and give the corresponding part of W_2; further, with Z defined in the theorem, the remaining second-order term of W_2 follows from Lemma 3.3 and Lemma 3.4. The last condition, on W_3, is proved exactly as in the proof of Theorem 2.3. □

Comparing the result of this theorem to the unconstrained case, one may be surprised that there is no condition containing N(Ĵ_1) corresponding to the matrix V_2 in W_3. However, the curvature in N(Ĵ_1) is taken care of in W_2, since this matrix contains the second-order information in c restricted to N(Ĵ_1).

4. Conclusions

We have presented a formulation of the nonlinear least-square problem with nonlinear constraints that can be used to find a minimum-norm solution in the ill-posed case where either J or J_1 is rank-deficient in a neighborhood of a critical point. Necessary and sufficient conditions for a local minimum have been derived. These conditions are easy to verify and can be used as a basis for constructing methods that can solve the ill-posed problem.

References

1. WALKER, H. F., Newton-Like Methods for Underdetermined Systems, Lectures in Applied Mathematics, Vol. 26, pp. 679-699, 1990.
2. BATES, D., and WATTS, D., Nonlinear Regression Analysis and Its Applications, John Wiley, New York, New York, 1988.
3. VAN HUFFEL, S., and VANDEWALLE, J., The Total Least Squares Problem: Computational Aspects and Analysis, SIAM, Philadelphia, Pennsylvania, 1991.
4. ERIKSSON, J., GULLIKSSON, M., LINDSTRÖM, P., and WEDIN, P. Å., Regularization Tools for Training Feed-Forward Neural Networks, Part 2: Large-Scale Problems, Technical Report UMINF 96.06, Department of Computing Science, Umeå University, Umeå, Sweden, 1996.
5. HANSEN, P. C., Rank-Deficient and Discrete Ill-Posed Problems, Technical Report, Department of Mathematical Modelling, Section for Numerical Analysis, Technical University of Denmark, Lyngby, Denmark, 1997.
6. ERIKSSON, J., and GULLIKSSON, M., Local Results for the Gauss-Newton Method on Constrained Exactly Rank-Deficient Nonlinear Least Squares, Technical Report UMINF 97.12, Department of Computing Science, Umeå University, Umeå, Sweden, 1997.
7. ERIKSSON, J., Optimization and Regularization of Nonlinear Least-Square Problems, Technical Report UMINF 96.09 (PhD Thesis), Department of Computing Science, Umeå University, Umeå, Sweden, 1996.
8. HANSON, R. J., and LAWSON, C. L., Solving Least Squares Problems, Prentice-Hall, Englewood Cliffs, New Jersey, 1974.
9. CONLON, L., Differentiable Manifolds: A First Course, Birkhäuser Advanced Texts, Boston, Massachusetts, 1993.