THE PROCRUSTES PROBLEM FOR ORTHOGONAL STIEFEL MATRICES

A. W. BOJANCZYK AND A. LUTOBORSKI

February 8, 1998

Abstract. In this paper we consider the Procrustes problem on the manifold of orthogonal Stiefel matrices. Given matrices $A \in R^{m \times k}$, $B \in R^{m \times p}$, $m \ge p \ge k$, we seek the minimum of $\|A - BQ\|$ over all matrices $Q \in R^{p \times k}$, $Q^T Q = I_{k \times k}$. We introduce a class of relaxation methods for generating minimizing sequences and offer a geometric interpretation of these methods. Results of numerical experiments illustrating the convergence of the methods are given.

1. Introduction

We begin by defining the set $OSt(p,k)$ of orthogonal Stiefel matrices:

    $OSt(p,k) = \{ Q \in R^{p \times k} : Q^T Q = I_{k \times k} \}$    (1.1)

which is a compact submanifold of dimension $M = pk - \frac{1}{2}k(k+1)$ of the manifold $O(p)$ of all $p \times p$ orthogonal matrices, which has dimension $\frac{1}{2}p(p-1)$. Let $A \in R^{m \times k}$ and $B \in R^{m \times p}$, where $m \ge p \ge k$, be given. Let $\|A\| = (\mathrm{trace}\, A^T A)^{1/2}$ denote the standard Frobenius norm in $R^{m \times k}$. The Procrustes problem for orthogonal Stiefel matrices is to minimize

    $P[A,B](Q) = \|A - BQ\|^2$    (1.2)

for all $Q \in OSt(p,k)$.

Problem (1.2) can be simplified by performing the singular value decomposition of the matrix $B \in R^{m \times p}$. Let $B = USV^T$, where $U \in O(m)$, $V \in O(p)$ and $S = (\mathrm{diag}(\sigma_1, \ldots, \sigma_p),\; O_{p \times (m-p)})^T$; then

    $P[A,B](Q) = \|U(U^T A - S V^T Q)\|^2 = \|\tilde A - S \tilde Q\|^2$    (1.3)

where $\tilde A = U^T A$ and $\tilde Q = V^T Q$. Since the last $m-p$ rows of $S$ are zero, we simplify (1.3) by introducing new notation. We denote $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_p)$ and assume from now on that the problem does not reduce to a lower-dimensional one, in other words that $\sigma_1 \ge \cdots \ge \sigma_p > 0$. We define $A \in R^{p \times k}$ to be the matrix composed of the first $p$ rows of $\tilde A$. Consequently the Procrustes minimization on the set of orthogonal Stiefel matrices is: for given $A \in R^{p \times k}$ and diagonal $\Sigma \in R^{p \times p}$ minimize

    $P[A,\Sigma](Q) = \|A - \Sigma Q\|^2$    (1.4)

for all $Q \in OSt(p,k)$.
The original formulations of the Procrustes problem can be found in [2], [3]. We may write (1.4) explicitly as

    $P[A,\Sigma](Q) = \mathrm{trace}(Q^T \Sigma^2 Q) - 2\,\mathrm{trace}(Q^T \Sigma A) + \|A\|^2.$    (1.5)

The Procrustes problem has been solved analytically in the orthogonal case, when $p = k$ and $OSt(p,k) = O(p)$; see [11]. In this case $Q \in O(p)$ and we have

    $P[A,\Sigma](Q) = \|A\|^2 + \|\Sigma\|^2 - 2\,\mathrm{trace}(Q^T \Sigma A).$    (1.6)

Provided that the singular value decomposition of $\Sigma A$ is $\Sigma A = P \Omega R^T$, the minimizer in (1.6) is then

    $Q = P R^T.$    (1.7)

The functional $P[A,\Sigma]$ in (1.5) is a sum of two functionals in $Q$: the quadratic functional $\mathrm{trace}(Q^T \Sigma^2 Q)$ and the linear functional $-2\,\mathrm{trace}(Q^T \Sigma A)$. It is well known how to minimize each of the functionals separately. The minimum value of the quadratic functional is equal to the sum of squares of the $k$ smallest diagonal entries of $\Sigma$. This result is due to Ky Fan [12]. The linear functional is minimized when $\mathrm{trace}(Q^T \Sigma A)$ is maximized. The maximum of this trace is given by the sum of the singular values of the matrix $\Sigma^T A$. This upper bound on the trace functional was established by J. von Neumann in [16]; see also [11]. Separate minimization of the quadratic and the linear parts are thus well understood; the analytical solution of the Procrustes problem for Stiefel matrices is, to the best of our knowledge, an open problem.

It will be useful to interpret the minimization (1.4) geometrically. To do that we define an eccentric Stiefel manifold $OSt[\Sigma](p,k)$ in $R^{p \times k}$:

    $OSt[\Sigma](p,k) = \{ X \in R^{p \times k} : X^T \Sigma^{-2} X = I_{k \times k} \}.$    (1.8)

The eccentric Stiefel manifold $OSt[\Sigma](p,k)$ is the image of the orthogonal Stiefel manifold $OSt(p,k)$ under the linear mapping $Q \mapsto \Sigma Q$. The image of the sphere $\{A \in R^{p \times k} : \|A\| = \sqrt{k}\}$ of radius $\sqrt{k}$ in $R^{p \times k}$ under this mapping is an ellipsoid in $R^{p \times k}$, of which $OSt[\Sigma](p,k)$ is a subset. The eccentric Stiefel manifold is a compact set contained in a larger ball in $R^{p \times k}$ centered at $0$ and of radius $\sqrt{k}\,\sigma_1$.
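The closed-form solution (1.6)-(1.7) of the balanced case $p = k$ is easy to exercise numerically. The following NumPy sketch (ours, not from the paper; the function name is an assumption) computes $Q = PR^T$ from the SVD $\Sigma A = P\Omega R^T$:

```python
import numpy as np

def balanced_procrustes(A, Sigma):
    """Minimizer of ||A - Sigma Q||_F over orthogonal Q (balanced case p = k).

    By (1.6), P[A, Sigma](Q) = ||A||^2 + ||Sigma||^2 - 2 trace(Q^T Sigma A),
    so the minimizer maximizes trace(Q^T Sigma A): with the SVD
    Sigma A = P Omega R^T, it is Q = P R^T, as in (1.7).
    """
    P, _, Rt = np.linalg.svd(Sigma @ A)
    return P @ Rt

rng = np.random.default_rng(0)
p = 4
A = rng.standard_normal((p, p))
Sigma = np.diag([2.0, 1.0, 0.5, 0.1])
Qmin = balanced_procrustes(A, Sigma)
```

Sampling random orthogonal competitors confirms that none of them achieves a smaller residual than `Qmin`.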
Clearly

    $OSt[\Sigma](p,k) = \{ X \in R^{p \times k} : X = \Sigma Q,\; Q \in OSt(p,k) \}.$    (1.9)

We note that $OSt[\Sigma](p,1)$ is a standard ellipsoid in $R^p$:

    $OSt[\Sigma](p,1) = \{ x \in R^p : (x_1/\sigma_1)^2 + \cdots + (x_p/\sigma_p)^2 = 1 \}$    (1.10)

and $OSt[I](p,k) = OSt(p,k)$. Therefore if

    $\min_{Q \in OSt(p,k)} P[A,\Sigma](Q) = \|A - \Sigma Q^*\|^2$    (1.11)

then the point $\Sigma Q^*$ is the projection of $A$ onto the eccentric Stiefel manifold $OSt[\Sigma](p,k)$. Due to the compactness of the manifold a projection $\Sigma Q^*$ exists. The big difficulty which we face in the task of computing the minimizer $Q^*$ is the fact that the manifold $OSt[\Sigma](p,k)$ is not a convex set, and a projection onto a non-convex set is in general non-unique.
2. Notations

An elementary plane rotation by an angle $\theta$ is represented by

    $G(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$

An elementary plane reflection about the line with slope $\tan(\theta/2)$ is

    $R(\theta) = \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}.$

For $Q \in R^{p \times k}$ and $1 \le m < n \le p$ we introduce the following submatrices of $Q$:

    $Q^{[m,n]} \in R^{2 \times k}$, consisting of the $m$-th and $n$-th rows of $Q$;    (2.1)
    $Q^{(m,n)} \in R^{(p-2) \times k}$, consisting of the rows complementary to $Q^{[m,n]}$;
    $Q^{[m,n]}_{[m,n]} \in R^{2 \times 2}$, consisting of the entries on the intersections of the $m$-th and $n$-th rows and columns of $Q$.    (2.2)

A plane rotation by an angle $\theta$ in the $(k,l)$-plane in $R^p$, $1 \le k < l \le p$, is represented by a matrix $G_{k,l}(\theta) \in R^{p \times p}$ such that

    $G_{k,l}(\theta)^{[k,l]}_{[k,l]} = G(\theta), \qquad G_{k,l}(\theta)^{(k,l)}_{(k,l)} = I_{(p-2) \times (p-2)}.$    (2.3)

A plane reflection $R_{k,l}(\theta)$ in the $(k,l)$-plane is defined similarly by means of $R(\theta)$. $J_{k,l}(p)$ is the set of all plane rotations and reflections in the $(k,l)$-plane; $J(p)$ is the set of plane rotations and reflections in all planes. Clearly $J_{k,l}(p) \subset J(p) \subset O(p)$.

3. Relaxation methods for the Procrustes problem

The Stiefel manifold $OSt(p,k)$ is the admissible set for the minimizer of the functional $P$ in (1.4). This manifold, however, is not a vector space, which poses severe restrictions on how successive approximations can be obtained from previous ones. Additive corrections are not admissible, but the Stiefel manifold is closed with respect to left multiplication by an orthogonal matrix $R \in O(p)$. Thus $RQ$, where $Q \in OSt(p,k)$, is an admissible approximation. Consequently, we restrict our considerations to a class of minimization methods which construct the approximation $\hat Q$ to the minimizer $Q^*$ by the rule

    $\hat Q = RQ,$    (3.1)

where $Q$ and $\hat Q$ denote respectively the current and the next approximation to the minimizer. In what follows we consider only relaxation minimization methods which seek the minimizer of the functional $P$ according to (3.1) with

    $R = R_N \cdots R_1$    (3.2)

where $N \ge M$ and $M$ is the dimension of the manifold $OSt(p,k)$.
Each $R_i \in O(p)$, $i = 1, 2, \ldots, N$, depends on a single parameter whose value results from a scalar minimization problem. We will refer to the left multiplication by $R$ in (3.1) as a sweep. Our relaxation method consists of repeated applications of sweeps which produce a minimizing sequence for the problem (1.4).
We will choose the matrices $R_i$ to be orthogonally similar to a plane rotation or reflection. Different choices of similarities will lead to different relaxation methods. We set

    $Q_0 = Q, \quad R_0 = I, \quad Q_i = R_{i-1} \cdots R_0 Q, \quad i = 1, 2, \ldots, N+1,$    (3.3)

and define

    $R_i = R_i(\theta) = P_i J_i(\theta) P_i^T$    (3.4)

where $J_i \in J(p)$, and $P_i \in O(p)$ may depend on the current approximation $Q_i$ to the minimizer. It is the choice of $P_i$ that fully determines the relaxation method (3.1)-(3.4). The selection of the parameter $\theta$ in (3.4) will result from the scalar minimization

    $\|A - \Sigma (R_i(\theta) Q_i)\| = \min_{\tilde\theta} \|A - \Sigma (R_i(\tilde\theta) Q_i)\|.$    (3.5)

The matrix $R_i$ can be viewed as a plane rotation or reflection in the plane spanned by a pair of columns of the matrix $P_i$. The indices $(r,s)$ of this pair of columns are selected according to an ordering $\mathcal{N}$ of a set of pairs $D$. The ordering $\mathcal{N} : D \to \{1, 2, \ldots, N\}$ is a bijection, where $D \supseteq \{(r,s) : 1 \le r \le k,\; r+1 \le s \le p\}$. This inclusion guarantees that $D$ contains at least $M$ distinct pairs, necessary to construct an arbitrary $Q \in OSt(p,k)$ as a product of matrices $R_i$. It is clear that relaxation methods satisfying (3.5) will always produce a nonincreasing sequence of the values $P[A,\Sigma](Q_i)$.

If $P_i = I_{p \times p}$ in (3.4), then $R_i = J_i$ and the sweep (3.1) has the following particularly simple form

    $\hat Q = J_N \cdots J_1 Q.$    (3.6)

The relaxation method defined by (3.6) will be referred to as the left-sided relaxation method, or LSRM. If $P_i = (Q_i, Q_i^\perp)$ in (3.4), where $Q_i^\perp$ is the orthogonal complement of $Q_i$, then

    $R_i = (Q_i, Q_i^\perp)\, J_i\, (Q_i, Q_i^\perp)^T$

and hence

    $Q_{i+1} = R_i Q_i = (Q_i, Q_i^\perp)\, J_i \begin{pmatrix} I_{k \times k} \\ 0_{(p-k) \times k} \end{pmatrix}.$

Thus by induction the sweep (3.1) has the form

    $\hat Q = (Q, Q^\perp)\, J_1 \cdots J_N \begin{pmatrix} I_{k \times k} \\ 0_{(p-k) \times k} \end{pmatrix}.$    (3.7)

The relaxation method defined by (3.7) will be referred to as the right-sided relaxation method, or RSRM. Our objective is to propose a geometric interpretation of the LSRM and describe its numerical implementation based on the geometric aspects of the method.
We will compare our left-sided relaxation method with an existing method for the Procrustes problem for orthogonal Stiefel matrices due to H. Park [14]. The general description of the RSRM allows us to formulate Park's method as a relaxation method and compare it with the LSRM. The method in [14] is based on the concepts introduced earlier by Ten Berge and Knol [3], where the problem is called the unbalanced Procrustes problem and its solution is based on iterative
solution of a sequence of orthogonal Procrustes problems called balanced problems. For the study of other minimization methods on submanifolds of spaces of matrices see [13], [15] and [4].

4. Planar Procrustes problem

We will now present the left-sided relaxation method. Without loss of generality let us assume that the planes $(r,s)$ in which the transformations operate are chosen in the row-cyclic order, in the way analogous to that used in the cyclic Jacobi method for the SVD computation [7]. In this case $\mathcal{N} : D \to \{1, \ldots, \frac{1}{2}p(p-1)\}$, $D = \{(r,s) : 1 \le r \le p-1,\; r+1 \le s \le p\}$, is given by

    $\mathcal{N}(r,s) = s - r + (r-1)\left(p - \frac{r}{2}\right)$

and $\hat Q$ in (3.6) has the following form

    $\hat Q = \prod_{r=1}^{p-1} \prod_{s=1}^{r} J_{p-r,\,p-s+1}\; Q,$    (4.1)

where $J_{r,s} \in J_{r,s}(p)$. Let $Q_i = Q_{\mathcal{N}(r,s)}$ be the current approximation in the sweep. The next approximation to the minimizer is $Q_{i+1} = J_i(\theta) Q_i$. The selection of the parameter $\theta$ results from the scalar minimization

    $\|A - \Sigma (J_i(\theta) Q_i)\| = \min_{\tilde\theta} \|A - \Sigma (J_i(\tilde\theta) Q_i)\|.$    (4.2)

Our main goal now is to show how to find $\theta$ in (4.2). Consider the functional $J \mapsto \|A - \Sigma (JQ)\|^2$ in (4.2), where for simplicity of notation we omit all indices. Without loss of generality we assume that $\mathcal{N}^{-1}(i) = (r,s) = (1,2)$ and hence $J = G_{1,2}(\theta) = \mathrm{diag}(G(\theta),\, I_{(p-2) \times (p-2)})$, where $G(\theta)$ is a plane rotation (the case of a reflection can be treated in a completely analogous way). The minimization in (4.2) is precisely the minimization of

    $f(\theta) = \left\| \begin{pmatrix} a_{11} & \cdots & a_{1k} \\ a_{21} & \cdots & a_{2k} \end{pmatrix} - \begin{pmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{pmatrix} G(\theta) \begin{pmatrix} q_{11} & \cdots & q_{1k} \\ q_{21} & \cdots & q_{2k} \end{pmatrix} \right\|^2 = \left\| A^{[1,2]} - \Sigma^{[1,2]}_{[1,2]} G(\theta)\, Q^{[1,2]} \right\|^2.$    (4.3)

Let $U_Q \Gamma V_Q^T$ be the SVD of $Q^{[1,2]}$, where $\Gamma = (\mathrm{diag}(\gamma_1, \gamma_2),\; 0_{2 \times (k-2)})$ and $\gamma_1 \ge \gamma_2 > 0$. Thus

    $f(\theta) = \left\| A^{[1,2]} V_Q - \begin{pmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{pmatrix} G(\theta)\, U_Q\, \Gamma \right\|^2.$    (4.4)

Denote $B = A^{[1,2]} V_Q$. Note that the last $k-2$ columns of the matrix $B$ in (4.4) are always approximated by zero columns, and thus the minimization of $f(\theta)$ is equivalent to the minimization restricted to the first two columns.
Introducing a new variable $\psi = \theta + \omega$, where $\omega$ is the angle of the plane rotation (or reflection) $U_Q$, and setting $c = \cos\psi$, $s = \sin\psi$, we obtain the following minimization problem.
    $F(\psi) = \left\| B^{[1,2]} - \begin{pmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{pmatrix} \begin{pmatrix} c & -s \\ s & c \end{pmatrix} \begin{pmatrix} \gamma_1 & 0 \\ 0 & \gamma_2 \end{pmatrix} \right\|^2 + \|B^{(1,2)}\|^2$    (4.5)

where $B^{[1,2]}$ denotes the $2 \times 2$ matrix of the first two columns of $B$ and $B^{(1,2)}$ the remaining columns. We may write (4.5) explicitly as

    $F(\psi) = z_1^2 c^2 + z_2^2 s^2 - 2 y_1 c - 2 y_2 s + \|B\|^2$    (4.6)

where $q = (c, s)^T$ and

    $z_1^2 = \gamma_1^2 \sigma_1^2 + \gamma_2^2 \sigma_2^2, \qquad z_2^2 = \gamma_2^2 \sigma_1^2 + \gamma_1^2 \sigma_2^2,$    (4.7)

    $y_1 = b_{11} \gamma_1 \sigma_1 + b_{22} \gamma_2 \sigma_2, \qquad y_2 = -b_{12} \gamma_2 \sigma_1 + b_{21} \gamma_1 \sigma_2.$    (4.8)

We now denote

    $Z = \mathrm{diag}(z_1, z_2), \qquad Y = (y_1, y_2)^T, \qquad C = Z^{-1} Y,$

where $z_1 \ge z_2 > 0$ and $C = (c_1, c_2)^T$. By completing the squares we may represent $F(\psi)$ in the following form:

    $F(\psi) = \left( \frac{y_1}{z_1} - z_1 \cos\psi \right)^2 + \left( \frac{y_2}{z_2} - z_2 \sin\psi \right)^2 - \|C\|^2 + \|A^{[1,2]}\|^2 = \|C - Z q(\psi)\|^2 - \|C\|^2 + \|A^{[1,2]}\|^2$    (4.9)

for $q(\psi) = (\cos\psi, \sin\psi)^T$. Thus the minimization of the functional (4.9) is equivalent to the following problem: for given $C \in R^2$ and diagonal $Z \in R^{2 \times 2}$ minimize

    $P[C,Z](q) = z_1^2 \cos^2\psi + z_2^2 \sin^2\psi - 2 c_1 z_1 \cos\psi - 2 c_2 z_2 \sin\psi + \|C\|^2 = \|C - Zq\|^2$    (4.10)

for all $q \in OSt(2,1)$, $q = (\cos\psi, \sin\psi)^T$.

A minimization problem of the type (4.10) will be called a planar Procrustes problem. Such a problem has to be solved at each step of our relaxation method, and it is geometrically equivalent to projecting $C$ onto an ellipse. In the next section we consider two different iterative methods for finding the projection. Both of these geometrically based methods provide excellent initial approximations to the solution and means for error control.

5. Projection on an ellipse

The geometrical formulation of the planar Procrustes problem (4.10) is very simple. Given a point $C$ and the ellipse $E = OSt[Z](2,1)$,

    $E = OSt[Z](2,1) = \{ (x_1, x_2)^T : (x_1/z_1)^2 + (x_2/z_2)^2 = 1 \},$    (5.1)

in $R^2$, we want to find a point $S \in E$, $S = Zq = (z_1 \cos\psi, z_2 \sin\psi)^T$, which is a projection of $C$ onto $E$.
This can be achieved in a variety of ways. We describe the classical construction, due to Apollonius, of the projection of a point onto an ellipse, and a method of iterated reflections based on the reflection property of the ellipse.

5.1. The hyperbola of Apollonius. Recall the construction of a normal to an ellipse from a point [17], due to Apollonius. With the given ellipse $E$ in (5.1) and with the point $C$ we associate the hyperbola $H$ given by

    $x_2 = \frac{m_2 x_1}{x_1 - m_1}$    (5.2)

where $(m_1, m_2)$ is the center of $H$, with coordinates

    $m_1 = \frac{c_1 z_1^2}{z_1^2 - z_2^2}, \qquad m_2 = -\frac{c_2 z_2^2}{z_1^2 - z_2^2}.$    (5.3)

Figure 1. Hyperbola of Apollonius (showing the points $C$, $S$, $K$, $F$).

To find the coordinates of the projection point $S$ we have to intersect the hyperbola of Apollonius with the ellipse, that is, to solve the system of two quadratic equations (5.1) and (5.2). Using (5.2) to eliminate $x_2$ from (5.1) we obtain a fourth-order polynomial equation in $x_1$:

    $(x_1 - m_1)^2 (z_1^2 - x_1^2) - \frac{z_1^2 m_2^2}{z_2^2}\, x_1^2 = 0.$    (5.4)

For any specific numerical values of the coefficients this equation can easily be solved symbolically. A simpler, purely numerical alternative is to solve the system (5.1), (5.2) using Newton's method. Another alternative is to reduce the system to a scalar trigonometric equation. Assume that $C = (c_1, c_2)^T$ is in the first quadrant and that $C \notin E$. Let $S$ be the projection of $C$ onto $E$. Then setting $(x_1, x_2) = (z_1 \cos\psi, z_2 \sin\psi)$ in (5.2) and next substituting $t = \tan\psi$ leads to the equation $g(t) = 0$ in $t$, where

    $g(t) = c_1 z_1 t - (z_1^2 - z_2^2)\, \frac{t}{\sqrt{1 + t^2}} - c_2 z_2.$    (5.5)
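The scalar equation (5.5) can be prototyped directly. The NumPy sketch below (ours, not from the paper; the function name and tolerances are assumptions) runs Newton's method on $g(t) = 0$, assuming $c_1, c_2 > 0$, $z_1 > z_2 > 0$ and $C \notin E$:

```python
import numpy as np

def project_onto_ellipse(c1, c2, z1, z2, maxit=60, tol=1e-14):
    """Projection of C = (c1, c2), c1, c2 > 0, onto (x1/z1)^2 + (x2/z2)^2 = 1.

    Newton's method on g(t) = c1*z1*t - (z1^2 - z2^2)*t/sqrt(1+t^2) - c2*z2,
    started from t0 = (z1^2 - z2^2 + c2*z2)/(c1*z1), where g(t0) >= 0;
    t = tan(psi) parametrizes the contact point S = (z1 cos psi, z2 sin psi).
    """
    d = z1**2 - z2**2
    g = lambda t: c1*z1*t - d*t/np.sqrt(1.0 + t*t) - c2*z2
    dg = lambda t: c1*z1 - d*(1.0 + t*t)**-1.5
    t = (d + c2*z2) / (c1*z1)
    for _ in range(maxit):
        step = g(t) / dg(t)
        t -= step
        if abs(step) <= tol * max(1.0, abs(t)):
            break
    psi = np.arctan(t)
    return np.array([z1*np.cos(psi), z2*np.sin(psi)])

S = project_onto_ellipse(3.0, 1.0, 2.0, 1.0)
```

Because $g$ is convex for $t > 0$ and $g(t_0) \ge 0$, the iterates decrease monotonically to the unique positive root, so no safeguarding is needed in this quadrant.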
It is easy to see that the function $g(t)$ is convex for $t > 0$ and has one positive root. It can also be seen that for

    $t_0 = \frac{z_1^2 - z_2^2 + c_2 z_2}{c_1 z_1}$

we have $g(t_0) > 0$. Thus Newton's method starting from the initial approximation $t_0$ will generate a decreasing, convergent sequence of approximations to the root of $g(t) = 0$.

5.2. Iterated reflections. Assume, as before, that $C$ is in the first quadrant and that $C \notin E$; see Fig. 2. Every other case reduces to this one through reflections. Let $C = (c_1, c_2)^T$ and

    $B = (z_1, 0)^T, \qquad F = \left( \sqrt{z_1^2 - z_2^2},\; 0 \right)^T,$    (5.6)
    $K = -F.$    (5.7)

Figure 2. Iterated reflections (showing the points $C$, $L$, $S$, $R$, $K$, $B$, $F$).

The reflection property of $E$ says that (for $C$ both inside and outside $E$) $S$ is characterized by: $\angle KSC = \angle FSC$. Let $L, R \in E$, as in Figure 2, be given by

    $L = (z_1 \cos\psi_L, z_2 \sin\psi_L)^T, \qquad R = (z_1 \cos\psi_R, z_2 \sin\psi_R)^T$

where for $C$ outside $E$ the point $L$ is the intersection with $E$ of the ray from the origin through $C$,

    $\tan\psi_L = \frac{z_1 c_2}{z_2 c_1},$    (5.8)

and the point $R$ is the intersection with $E$ of the line through $C$ and the focus $F$, whose slope is

    $m = \frac{c_2}{c_1 - \sqrt{z_1^2 - z_2^2}},$    (5.9)

so that $\cos\psi_R$ is obtained by solving the resulting quadratic equation. (In the case $C$ inside $E$, the coordinates of the points $L$ and $R$ can be computed similarly.) Clearly $\psi_R < \psi < \psi_L$. Analogously to the bisection method, we compute $M = (z_1 \cos\psi_M, z_2 \sin\psi_M)$ (an intermediate point between $L$ and $R$) from $L$ and $R$ by setting $\psi_M = \frac{1}{2}(\psi_L + \psi_R)$. If

    $\frac{\langle F - M,\; C - M \rangle}{\|F - M\|} < \frac{\langle K - M,\; C - M \rangle}{\|K - M\|}$    (5.10)
(where $\langle\cdot,\cdot\rangle$ denotes the inner product), then $\psi_R \le \psi \le \psi_M$, and we set $L_1 = M$ and $R_1 = R$, so that $\psi_{R_1} \le \psi \le \psi_{L_1}$. If (5.10) does not hold we set $L_1 = L$ and $R_1 = M$. We thus construct a sequence $\{L_n\} \subset E$ such that $\lim_{n \to \infty} L_n = S$.

5.3. Some remarks on the general and planar Procrustes problems. The planar Procrustes problem has several features which the general problem (1.4) of projection onto the eccentric Stiefel manifold does not possess. $E$ has the reflection property and the Apollonius normal. The reflection property of the ellipse does not extend to the eccentric Stiefel manifold, and in particular not even to ellipsoids in $R^p$. The construction of the Apollonius normal to the ellipse, based on the orthogonality of the ellipse and an associated hyperbola, which results in the scalar equation (5.4), is also particular to the planar problem. As a result, in the case $p > k > 1$ our relaxation step, which amounts to solving a planar Procrustes problem, cannot be directly generalized to the higher-dimensional problem.

A point not belonging to $E$ has either a unique projection onto $E$ or a finite number of projections.

Figure 3. Points in the first quadrant and their projections on the ellipse.

Hence if $C \notin \mathrm{conv}(E)$ then there exists a unique projection $S$ of $C$ onto $E$, characterized by $\langle C - S, F - S \rangle \le 0$ for all $F \in \mathrm{conv}(E)$. A point $C$ has a non-unique projection $S$ onto $E$ if and only if $C$ is on the major axis between the points $B$ and $-B$. The projections of all other points are unique. In general, solutions to the Procrustes problem may form a submanifold of the Stiefel manifold. Various locations of $C$ and its projection(s) $S$ are shown in Fig. 3, which also shows the upper part of the evolute of the ellipse (dashed). From the points in the plane outside the evolute 2 normals to $E$ can be drawn. From the points on the evolute 3 normals can be drawn, and 4 normals can be drawn from the
points inside the evolute. The intersections of the normals from $C$ with the ellipse correspond to the zeros of (5.4), and we are able to geometrically rule out the zeros not corresponding to the projection.

The distance function $d(C) = \min_q \|C - Zq\|$, whose evaluation requires solving the planar Procrustes problem (4.10), is shown in Fig. 4. The function $d$ is not differentiable at points $C$ in the segment $[-B, B]$. Its singularities are of interest in geometrical optics, where $d$ is called the optical distance, and in the theory of Hamilton-Jacobi equations [1].

Finally we observe that $E$ cuts the plane into two components. However, if

    $C = \begin{pmatrix} \sqrt{2} & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}$    (5.11)

then $C$ is on the sphere of radius $\sqrt{2}$ in $R^{3 \times 2}$ containing the Stiefel manifold $OSt[I](3,2)$, but the point $kC$ for any $k \in R$ cannot be connected to $0$ with a segment intersecting $OSt(3,2)$.

Figure 4. Graph of $d(C) = \min_{\|q\|=1} \|C - Zq\|$.

Both observations concerning the analytical problem are reflected in the computations and have computational implications.

The final remarks are about the relation of the Procrustes problem to the constrained linear least squares problem. The quadratically constrained linear least squares problem

    $\min_{\|x\| = \alpha} \|b - Ax\|$    (5.12)

arises in many applications [8], [10], [9], [5]. By changing variables this problem can be transformed into a special Procrustes problem. The Procrustes problem is

    $\min_{\|q\| = 1} \|a - \Sigma q\|$    (5.13)
where $A = U \Sigma V^T$, $a = U^T b / \alpha$ and $q = V^T x / \alpha$. This Procrustes problem is equivalent to projecting the point $a$ onto the ellipsoid $OSt[\Sigma](p,1)$. Let $\|q\| = 1$. It is clear that the vector $n = \Sigma^{-1} q$ has the direction of the normal vector to the ellipsoid at the point $\Sigma q$. Thus if $\Sigma q$ is the projection of $a$ onto the ellipsoid then the vector $\Sigma q - a$ is parallel to the vector $n$. Thus there exists a scalar $\lambda$ so that

    $\sigma_i a_i - \sigma_i^2 q_i = q_i \lambda$    (5.14)

where $\Sigma = \mathrm{diag}(\sigma_i)$. As $\|q\| = 1$, one can obtain an equation for $\lambda$:

    $1 = \sum_{i=1}^{p} \left( \frac{a_i \sigma_i}{\sigma_i^2 + \lambda} \right)^2.$    (5.15)

The parameter $\lambda$ can be computed by solving this equation. Then the components of $q$ are given by

    $q_i = \frac{a_i \sigma_i}{\sigma_i^2 + \lambda}.$

Equation (5.15) is the so-called secular equation, which characterizes the critical points of the Lagrangian

    $h(\lambda, q) = \|\Sigma q - a\|^2 + \lambda (\|q\|^2 - 1);$    (5.16)

see [11]. Thus the multiplier $\lambda$ in (5.14) is the Lagrange multiplier in (5.16).

6. Geometric Interpretation of Left and Right Relaxation Methods

Since the notion of the standard ellipsoid $OSt[\Sigma](p,1)$ in $R^p$ is very intuitive, we will now interpret the minimization problem (1.11), treating matrices in $R^{p \times k}$ as $k$-tuples of vectors in $R^p$. Let $A = (a_1, a_2, \ldots, a_k)$ be a given $k$-tuple of vectors in $R^p$. Let $Q = (q_1, q_2, \ldots, q_k) \in OSt(p,k)$ be the current approximation to the minimizer. Clearly the points $\Sigma q_i$ all belong to the ellipsoid $OSt[\Sigma](p,1)$. Thus the minimization of $P[A,\Sigma](Q)$ can be interpreted as finding points $\Sigma q_i^*$ on the ellipsoid, where the $q_i^*$ are orthonormal vectors, that best match, as measured by $P[A,\Sigma](Q)$, the given vectors $a_i$ in $R^p$.

The relaxation method described in Section 3 can be interpreted as follows. Pick an orthonormal basis in $R^p$. In the next sweep rotate the current set of vectors $q_i$, as a frame, in the planes spanned by all pairs of the vectors from the current basis. In the left-sided relaxation method the basis is the canonical basis and is the same for all sweeps.
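Returning to the secular equation (5.15) of the previous section, a minimal numerical sketch (ours; the bisection bracket and the assumption that all $a_i \ne 0$ are ours, not the paper's) solves it for the root $\lambda > -\min_i \sigma_i^2$, which yields the global minimizer:

```python
import numpy as np

def secular_projection(a, sigma, iters=200):
    """Solve 1 = sum_i (a_i sigma_i / (sigma_i^2 + lam))^2 for the root
    lam > -min(sigma)^2 by bisection, then q_i = a_i sigma_i / (sigma_i^2 + lam).

    Sigma q is then the projection of a onto the ellipsoid OSt[Sigma](p, 1).
    Assumes all a_i != 0, so phi decreases from +inf to 0 on the bracket.
    """
    a = np.asarray(a, float)
    sigma = np.asarray(sigma, float)
    phi = lambda lam: np.sum((a * sigma / (sigma**2 + lam))**2)
    smin2 = np.min(sigma)**2
    lo = -smin2 * (1.0 - 1e-12)                   # phi(lo) is huge
    hi = np.linalg.norm(a * sigma) - smin2 + 1.0  # phi(hi) < 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if phi(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    q = a * sigma / (sigma**2 + lam)
    return q, lam

a = np.array([3.0, 1.0, 0.2])
sigma = np.array([2.0, 1.0, 0.5])
q, lam = secular_projection(a, sigma)
```

On the chosen bracket $\varphi(\lambda) = \sum_i (a_i\sigma_i/(\sigma_i^2+\lambda))^2$ is strictly decreasing, so bisection is safe even when the ellipsoid is very flat.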
All relaxation steps are exactly the same, and all amount to solving a planar Procrustes problem. In the right-sided relaxation method the basis consists of two subsets and changes from sweep to sweep. The first subset of the basis consists of the columns of the current approximation $Q$, and the second subset consists of the columns of the orthogonal complement $Q^\perp$ of $Q$.

Working only with the columns of $Q$ is equivalent to the so-called balanced Procrustes problem studied by Park [14], which can be solved by means of an SVD computation. The relaxation step in [14] for the balanced problem consists of computing the SVD of the matrix $(q_r, q_s)^T \Sigma (a_r, a_s)$. In our relaxation setting, the relaxation step in the right-sided relaxation method requires solving
the scalar minimization problem (3.5),

    $\min_{c^2 + s^2 = 1} \left\| (a_r, a_s) - \Sigma (q_r, q_s) \begin{pmatrix} c & -s \\ s & c \end{pmatrix} \right\|$    (6.1)

which leads to a linear equation in the tangent of $\theta$ and is equivalent to the SVD computation in [14]. Each of these steps is a rotation of the vectors $q_r$ and $q_s$ in the plane spanned by $q_r$ and $q_s$, so that the rotated vectors, mapped onto the ellipsoid by $\Sigma$, best approximate the two given vectors $a_r$ and $a_s$. However, as the columns of $Q$ do not span the whole space $R^p$, it might happen that $\mathrm{span}\{q_1, \ldots, q_k\} \ne \mathrm{span}\{q_1^*, \ldots, q_k^*\}$, and hence it might not be possible to generate a sequence of approximations that converges to $Q^*$. In order to overcome this problem the matrix $Q$ is extended by its orthogonal complement $Q^\perp = (q_{k+1}, \ldots, q_p)$, so that $\mathrm{span}(q_1^*, \ldots, q_k^*) \subset \mathrm{span}(q_1, \ldots, q_p)$. The scalar minimization subproblems in [14] involving vectors from both subsets are referred to in [14] as unbalanced subproblems. These scalar minimizations have the following form:

    $\min_{c^2 + s^2 = 1} \left\| a_r - \Sigma (q_r, q_s) \begin{pmatrix} c \\ s \end{pmatrix} \right\|.$    (6.2)

That is, the unbalanced subproblem is to find a vector on the ellipsoid, in the plane spanned by $q_r$ and $q_s$, closest to the given vector $a_r$. As the intersection of this plane and the ellipsoid is an ellipse, the unbalanced subproblem can be expressed as a planar Procrustes problem (4.10), and any of the algorithms discussed in Section 5 can be used to solve it. Other choices of bases may be possible, but the choices leading to the left- and the right-sided relaxation methods seem to be the most natural.

7. Numerical experiments

In this section we present numerical experiments illustrating the behavior of the left and the right relaxation methods discussed in Section 3. We start by summarizing the left and the right relaxation methods, given below in pseudocode. Given $A \in R^{p \times k}$, $A = (a_1, \ldots, a_k)$, and $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_p)$, both algorithms construct sequences of Stiefel matrices approximating the minimizer of (1.4).
Algorithm LSRM:
1. Initialization: set Maxstep, $Q = I_{p \times k}$, $n = 0$, $r_{-1} = 0$, $r_0 = \|A - \Sigma Q\|$.
2. Iterate sweeps: while $|r_n - r_{n-1}| > \mathrm{threshold}$ and $n < \mathrm{Maxstep}$
       for $i = 1$ to $k$
           for $j = i+1$ to $p$
               solve the planar Procrustes problem
                   $\min_\theta \|A^{[i,j]} - \Sigma^{[i,j]}_{[i,j]} J(\theta)\, Q^{[i,j]}\|$
               $Q^{[i,j]} \leftarrow J(\theta)\, Q^{[i,j]}$
       $n \leftarrow n + 1$
       $r_n = \|A - \Sigma Q\|$
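Algorithm LSRM can be prototyped directly. In the NumPy sketch below (ours, not the paper's implementation; it replaces the exact planar Procrustes solver of Section 5 by a simple grid search over the rotation angle, and it omits reflections), each plane $(i,j)$ is relaxed in turn and the residual $\|A - \Sigma Q\|$ is recorded after every sweep:

```python
import numpy as np

def lsrm_sweeps(A, sigma, n_sweeps=8, n_grid=720):
    """One-parameter relaxation in each (i, j)-plane, i = 1..k, j = i+1..p.

    The rotation angle is chosen by exhaustive grid search, a crude stand-in
    for the exact planar Procrustes solver; since the grid contains theta = 0,
    the residual ||A - Sigma Q|| is nonincreasing from step to step.
    """
    p, k = A.shape
    Q = np.eye(p, k)
    thetas = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    resid = lambda M: np.linalg.norm(A - sigma[:, None] * M)
    hist = [resid(Q)]
    for _ in range(n_sweeps):
        for i in range(k):
            for j in range(i + 1, p):
                A2 = A[[i, j], :]
                D2 = sigma[[i, j], None]   # scales the two affected rows
                Q2 = Q[[i, j], :]
                best_t, best_v = 0.0, np.inf
                for t in thetas:
                    c, s = np.cos(t), np.sin(t)
                    v = np.linalg.norm(A2 - D2 * (np.array([[c, -s], [s, c]]) @ Q2))
                    if v < best_v:
                        best_t, best_v = t, v
                c, s = np.cos(best_t), np.sin(best_t)
                Q[[i, j], :] = np.array([[c, -s], [s, c]]) @ Q2
        hist.append(resid(Q))
    return Q, hist

rng = np.random.default_rng(2)
p, k = 4, 2
sigma = np.array([1.0, 0.9, 0.8, 0.7])
Qtilde, _ = np.linalg.qr(rng.standard_normal((p, k)))
A = sigma[:, None] * Qtilde          # minimum of (1.4) is zero by construction
Q, hist = lsrm_sweeps(A, sigma)
```

Left multiplication by a plane rotation preserves $Q^T Q = I$ exactly, so no re-orthogonalization is needed between sweeps.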
Algorithm RSRM:
1. Initialization: set Maxstep, $Q = I_{p \times p}$, $n = 0$, $r_{-1} = 0$, $r_0 = \|A - \Sigma Q I_{p \times k}\|$.
2. Iterate sweeps: while $|r_n - r_{n-1}| > \mathrm{threshold}$ and $n < \mathrm{Maxstep}$
       for $i = 1$ to $k$
           for $j = i+1$ to $k$
               solve $\min_\theta \|(A^T)^{[i,j]} - J(\theta)(Q^T \Sigma)^{[i,j]}\|$
               $(Q^T)^{[i,j]} \leftarrow J(\theta)(Q^T)^{[i,j]}$
           for $j = k+1$ to $p$
               solve the planar Procrustes problem
                   $\min_{c^2+s^2=1} \|a_i^T - (c\;\; s)\,(Q^T \Sigma)^{[i,j]}\|$
               $e_i^T Q^T \leftarrow (c\;\; s)\,(Q^T)^{[i,j]}$
       $n \leftarrow n + 1$
       $r_n = \|A - \Sigma Q I_{p \times k}\|$

We measure the cost of the two methods by the number of sweeps performed by each of the two methods. A sweep in the LSRM method consists of $p(p+1)/2$ planar Procrustes problems. Each planar Procrustes problem requires the computation of the SVD of a $2 \times k$ matrix. This can be achieved by first computing a QR decomposition followed by a $2 \times 2$ SVD problem. After the SVD is calculated, a projection on an ellipse has to be determined. The cost of a sweep is approximately $O(kp^2)$ floating point operations.

A sweep in the RSRM method consists of $k(k+1)/2$ computations of $2 \times 2$ SVD problems. In addition, there are $k(p-k)$ planar Procrustes problems, each requiring the computation of the SVD of a $2 \times p$ matrix followed by the computation of a projection on an ellipse. Thus the cost of a sweep is again approximately $O(kp^2)$ floating point operations.

Surely, the precise cost of a sweep will depend on the number of iterations needed for obtaining satisfactory projections on the resulting ellipses. For each projection, this will depend on the location of the point being projected as well as on the shape of the ellipse. Computation of the projection will be most costly when the ellipse is flat. As can be seen, sweeps in the two methods may have different costs. However, the number of sweeps performed by each of the methods will give some basis for comparing the convergence behavior of the two methods.

We begin by illustrating the behavior of the LSRM for finding $Q^*$ in the Procrustes problem with $p = 4$, $k = 2$, $\Sigma = \mathrm{diag}(10^0, 10^{-1}, 10^{-2}, 10^{-3})$, and $A = \Sigma \tilde Q$, where

    $\tilde Q = \begin{pmatrix}
    -3.1665166866158e{-}01 & 5.34030951680499e{-}02 \\
    -1.508494807711354e{-}01 & -9.0694616989718e{-}01 \\
    -7.975468385641e{-}01 & 3.96039609671e{-}01 \\
    -5.868854343875571e{-}01 & -2.0034885985765e{-}01
    \end{pmatrix}.$

The initial approximation is $Q_0 = I_{4 \times 2}$. Some intermediate values of $Q$ are listed in Table 1.
sweep 1:  Q = [ -3.16646771495053e-01  5.33733877457e-02 ; -1.511766767561811e-01  -9.06530348969716e-01 ; -7.98951355096e-01  3.98411857874910e-01 ; -5.8675130978633e-01  -2.01876639897017e-01 ],  ||A - Sigma Q|| = 6.4043e-05

sweep 5:  Q = [ -3.166517048090e-01  5.3403193360516e-02 ; -1.508477146734e-01  -9.0699744667507e-01 ; -7.975133055357e-01  3.96168386846e-01 ; -5.86889056665494e-01  -2.0040784981074e-01 ],  ||A - Sigma Q|| = 1.0450e-06

sweep 10: Q = [ -3.1665166993697e-01  5.34030989691857e-02 ; -1.508494100481855e-01  -9.06948179307888e-01 ; -7.9754580980885e-01  3.9600876881795e-01 ; -5.86885579514951e-01  -2.0034453790306e-01 ],  ||A - Sigma Q|| = 4.180e-08

sweep 15: Q = [ -3.16651668678573e-01  5.3403095300504e-02 ; -1.508494779430533e-01  -9.0694609057939e-01 ; -7.9754678161558e-01  3.9603837640639e-01 ; -5.868854401803947e-01  -2.00348687030837e-01 ],  ||A - Sigma Q|| = 8.7804e-10

sweep 30: Q = [ -3.1665166866163e-01  5.34030951680608e-02 ; -1.508494807709545e-01  -9.0694616994970e-01 ; -7.975468383019e-01  3.9603960959194e-01 ; -5.86885434387950e-01  -2.0034885984618e-01 ],  ||A - Sigma Q|| = 5.605e-14

Table 1. Matrices $Q$ in a minimizing sequence generated by the LSRM.

We will now present comparative numerical results for the LSRM and RSRM methods. Recall that the functional $P$ is a sum of a linear and a quadratic term. We will consider classes of examples in which the functional can be approximated by its linear or by its quadratic term.

The first class of examples consists of cases in which the linear term dominates the quadratic term, in other words $\|A\| \gg \|\Sigma\|$. We deal here with a perturbed linear functional. The minimum of the functional $P$ can then be approximated by means of the sum of the singular values of $\Sigma^T A$.

The second class of examples consists of cases in which the quadratic term dominates the linear term, that is $\|A\| \ll \|\Sigma\|$. We deal here with a perturbed quadratic functional. The minimum value of the functional $P$ can then be approximated by the sum of squares of the $k$ smallest singular values of $\Sigma$.

The third class of examples consists of cases in which the functional is genuinely quadratic, that is $A \approx \Sigma Q$ for some $Q \in OSt(p,k)$. The minimum of the functional is then close to zero.
In each class of examples we pick two different matrices $\Sigma$: one corresponding to the ellipsoid being almost a sphere, that is $\Sigma \approx I$; the other corresponding to the ellipsoid being very flat in one or more planes, that is $\sigma_1/\sigma_p$ large. The algorithms were written in MATLAB 4.2 and run on an HP9000 workstation with the machine relative precision $\epsilon = 2.2204e{-}16$. We set Maxstep = 30 and threshold = $5\epsilon$. As the initial approximation we took $Q = I_{p \times k}$ for the LSRM, and $(Q, Q^\perp) = I_{p \times p}$ for the RSRM. The planar Procrustes solver used was based on the hyperbola of Apollonius (the iterated reflections solver was giving numerically equivalent results). Some representative results are shown in Tables 2-6.

                RSRM                                LSRM
 p  k   # sweeps  ||A-SQ||   ||Q-Q~||    # sweeps  ||A-SQ||   ||Q-Q~||
 6  2      -      6.38e-15   5.30e-15       19     9.e-16     2.37e-15
 6  3      30     4.41e-12   1.33e-11       5      1.48e-15   2.53e-15
 6  4      30     5.1e-11    1.61e-10       6      1.99e-15   6.30e-15
 6  5      4      5.86e-15   1.07e-14       30     6.00e-11   1.35e-10
 9  2      30     6.85e-09   5.79e-08       30     9.78e-06   6.11e-05
 9  3      30     8.0e-06    6.89e-05       30     7.01e-07   3.18e-06
 9  4      30     3.66e-03   3.87e-02       30     2.07e-05   1.8e-04
 9  5      30     5.60e-03   5.93e-02       30     5.06e-07   1.98e-06
 9  6      30     4.57e-04   4.68e-03       30     5.3e-12    1.16e-11
 9  7      30     1.6e-03    1.08e-02       30     1.77e-13   3.05e-13
 9  8      30     2.46e-03   2.1e-02        30     2.98e-12   5.34e-12

Table 2. $A = \Sigma \tilde Q$ and $\sigma_1/\sigma_p \approx 2$.

                RSRM                                LSRM
 p  k   # sweeps  ||A-SQ||   ||Q-Q~||    # sweeps  ||A-SQ||   ||Q-Q~||
 6  2      30     9.00e-03   1.53e+00       -      6.08e-16   4.69e-15
 6  3      30     7.49e-03   1.55e+00       18     4.47e-16   1.6e-14
 6  4      30     7.66e-03   1.73e+00       16     5.56e-16   2.86e-15
 6  5      30     3.10e-03   8.6e-01        30     8.99e-16   7.98e-15
 9  2      30     6.8e-03    1.70e+00       30     9.08e-11   2.6e-08
 9  3      30     5.04e-03   1.41e+00       30     9.71e-07   2.84e-04
 9  4      30     4.67e-03   1.35e+00       30     1.19e-04   3.39e-02
 9  5      30     7.04e-03   2.1e+00        30     1.18e-03   3.01e-01
 9  6      30     6.45e-03   1.96e+00       30     3.8e-06    1.48e-05
 9  7      30     4.93e-03   1.49e+00       30     3.53e-06   1.0e-05
 9  8      30     4.87e-03   1.54e+00       30     8.44e-10   1.70e-09

Table 3. $A = \Sigma \tilde Q$ and $\sigma_1/\sigma_p \approx 10^2$.

Table 2 illustrates the behavior of the two methods when the ellipsoid is almost a sphere and when there exists $\tilde Q$ such that $\Sigma \tilde Q = A$.
That is, the quadratic and the linear terms are of comparable size. The experiments suggest that the LSRM requires fewer sweeps to obtain a satisfactory approximation to the minimizer.
Table 3 illustrates the behavior of the two methods when half of the ellipsoid's axes have length approximately 1.0 and the other half approximately 0.01. In addition, there exists $\tilde Q$ such that $\Sigma \tilde Q = A$. In this case the convergence of the RSRM is particularly slow. We observed that, at least initially, the RSRM fails to locate the minimizer in $OSt(4,2)$, being unable to establish the proper signs of the entries of the matrix $Q$. The LSRM, on the other hand, approximates the minimizer correctly.

                RSRM                                          LSRM
 p  k   # sweeps  esterror   ||A-SQ||   sweepcorr   # sweeps  esterror   ||A-SQ||   sweepcorr
 6  2      30     -1.70e-02  2.1e-02    1.89e-08       6      -1.70e-02  2.1e-02    6.10e-16
 6  3      30     -2.64e-02  3.38e-02   3.93e-08       13     -2.64e-02  3.38e-02   3.81e-16
 6  4      30     -5.03e-02  6.10e-02   1.10e-07       10     -5.03e-02  6.10e-02   3.53e-16
 6  5      30     2.55e-02   9.04e-01   3.3e-09        30     2.55e-02   9.04e-01   1.11e-15
 9  2      30     -4.41e-02  4.53e-02   8.44e-08       9      -4.41e-02  4.53e-02   6.93e-18
 9  3      30     -3.61e-02  3.86e-02   4.6e-08        14     -3.61e-02  3.86e-02   5.41e-16
 9  4      30     -4.05e-02  4.60e-02   4.98e-06       11     -4.05e-02  4.59e-02   5.75e-16
 9  5      30     -5.44e-02  6.30e-02   1.06e-06       15     -5.41e-02  6.7e-02    7.35e-16
 9  6      30     3.49e-02   7.10e-01   1.00e-06       9      3.49e-02   7.10e-01   9.99e-16
 9  7      30     4.89e-02   1.04e+00   2.67e-07       15     4.89e-02   1.04e+00   2.2e-16
 9  8      30     5.53e-02   1.34e+00   1.88e-08       14     5.53e-02   1.34e+00   6.66e-16

Table 4. $\|A\| \approx 10^{-2}\|\Sigma\|$ and $\sigma_1/\sigma_p \approx 2$.

                RSRM                                          LSRM
 p  k   # sweeps  esterror   ||A-SQ||   sweepcorr   # sweeps  esterror   ||A-SQ||   sweepcorr
 6  2      6      -3.3e-02   1.36e+01   7.10e-15       30     -3.3e-02   1.36e+01   5.7e-13
 6  3      5      -1.51e-02  3.01e+01   3.55e-15       20     -1.51e-02  3.01e+01   7.10e-15
 6  4      4      -1.81e-02  2.51e+01   7.10e-15       30     -1.81e-02  2.51e+01   3.55e-15
 6  5      4      -1.43e-02  3.16e+01   3.55e-15       8      -1.43e-02  3.16e+01   0.00e+00
 9  2      4      -1.6e-02   1.60e+01   0.00e+00       6      -1.6e-02   1.60e+01   0.00e+00
 9  3      6      -2.37e-02  2.75e+01   1.06e-15       20     -2.37e-02  2.75e+01   1.77e-15
 9  4      7      -3.40e-02  3.66e+01   2.13e-15       8      -3.40e-02  3.66e+01   0.00e+00
 9  5      5      -3.6e-02   3.8e+01    1.4e-15        14     -3.6e-02   3.8e+01    1.4e-15
 9  6      5      -2.87e-02  4.33e+01   0.00e+00       11     -2.87e-02  4.33e+01   0.00e+00
 9  7      5      -2.69e-02  4.6e+01    7.10e-15       14     -2.69e-02  4.6e+01    7.10e-15
 9  8      5      -2.70e-02  4.61e+01   2.13e-15       -      -2.70e-02  4.61e+01   7.10e-15

Table 5. $\|A\| \approx 10^{2}\|\Sigma\|$ and $\sigma_1/\sigma_p \approx 2$.

Table 4 illustrates the behavior of the two methods when the ellipsoid is almost a sphere but now $A$ is chosen so that $\|A\| \approx 10^{-2}\|\Sigma\|$. That is, the quadratic term dominates the linear term. In this case the minimum of the functional can be estimated by the minimum value of the quadratic term. In Table 4, esterror denotes the difference between the minimum value of the quadratic term and the computed value of the functional, and sweepcorr $= \|A - \Sigma Q\| - \|A - \Sigma \hat Q\|$, where $Q$ and $\hat Q$ are the last and the penultimate approximations to the minimizer. The experiments
suggest that the LSRM requires fewer sweeps to obtain a satisfactory approximation to the minimizer.

Table 5 illustrates the behavior of the two methods when the ellipsoid is almost a sphere but now $A$ is chosen so that $\|A\| \approx 10^{2}\|\Sigma\|$. That is, the linear term dominates the quadratic term. In this case the minimum of the functional can be estimated by the minimum value of the linear term. In Table 5, esterror denotes the difference between the minimum value of the linear term and the computed value of the functional. The experiments suggest that the RSRM requires fewer sweeps to obtain a satisfactory approximation to the minimizer.

References

[1] V.I. Arnold, Geometrical Methods in the Theory of Ordinary Differential Equations, Springer-Verlag, New York, 1988.
[2] J.M. Ten Berge and K. Nevels, A general solution to Mosier's oblique Procrustes problem, Psychometrika 42 (1977), 593-600.
[3] J.M. Ten Berge and D.L. Knol, Orthogonal rotations to maximal agreement for two or more matrices of different column orders, Psychometrika 49 (1984), 49-55.
[4] A. Edelman, T. Arias and S.T. Smith, Conjugate gradient on Stiefel and Grassman manifolds, 1995.
[5] L. Elden, Algorithms for the regularization of ill-conditioned least squares problems, BIT 17 (1977), 134-145.
[6] G.E. Forsythe and G.H. Golub, On the stationary values of a second-degree polynomial on the unit sphere, J. SIAM 13 (1965), 1050-1068.
[7] G.E. Forsythe and P. Henrici, The cyclic Jacobi method for computing the principal values of a complex matrix, Trans. AMS 94 (1960), 1-23.
[8] W. Gander, Least squares with a quadratic constraint, Numer. Math. 36 (1981), 291-307.
[9] G.H. Golub, Some modified matrix eigenvalue problems, SIAM Review 15 (1973), 318-334.
[10] G.H. Golub and U. von Matt, Quadratically constrained least squares and quadratic problems, Numer. Math. 59 (1991), 561-580.
[11] G.H. Golub and C.F. Van Loan, Matrix Computations, Johns Hopkins, Baltimore, 1990.
[12] Ky Fan, On a theorem of Weyl concerning eigenvalues of linear transformations, Proc. N.A.S. 35 (1949), 652-655.
[13] A. Lutoborski, On the convergence of the Euler-Jacobi method, Numer. Funct. Anal. and Optimiz. 13 (1992), 185-202.
[14] H. Park, A parallel algorithm for the unbalanced orthogonal Procrustes problem, Parallel Computing 17 (1991), 913-923.
[15] S.T. Smith, Optimization techniques on Riemannian manifolds, Fields Institute Comm. 3 (1994), 113-136.
[16] J. von Neumann, Some matrix inequalities and metrization of the matrix space, Tomsk Univ. Rev. 1 (1937), 286-300.
[17] H. Weber and J. Wellstein, Enzyklopaedie der Elementar-Mathematik, B.G. Teubner, Leipzig, 1915.

Electrical Engineering Department, Cornell University, Ithaca, N.Y. 14853
E-mail address: adamb@ee.cornell.edu

Department of Mathematics, Syracuse University, Syracuse, N.Y. 13244-1150
E-mail address: lutobor@mazur.syr.edu