
THE PROCRUSTES PROBLEM FOR ORTHOGONAL STIEFEL MATRICES

A. W. BOJANCZYK AND A. LUTOBORSKI

February 8, 1998

Abstract. In this paper we consider the Procrustes problem on the manifold of orthogonal Stiefel matrices. Given matrices A ∈ R^{m×k}, B ∈ R^{m×p}, m ≥ p ≥ k, we seek the minimum of ‖A − BQ‖ for all matrices Q ∈ R^{p×k}, Q^T Q = I_{k×k}. We introduce a class of relaxation methods for generating minimizing sequences and offer a geometric interpretation of these methods. Results of numerical experiments illustrating the convergence of the methods are given.

1. Introduction

We begin by defining the set OSt(p, k) of orthogonal Stiefel matrices:

    OSt(p, k) = { Q ∈ R^{p×k} : Q^T Q = I_{k×k} },   (1.1)

which is a compact submanifold of dimension M = pk − (1/2)k(k+1) of the manifold O(p) of all p×p orthogonal matrices, which has dimension (1/2)p(p−1). Let A ∈ R^{m×k} and B ∈ R^{m×p}, where m ≥ p ≥ k, be given. Let ‖A‖ = (trace A^T A)^{1/2} denote the standard Frobenius norm on R^{m×k}. The Procrustes problem for orthogonal Stiefel matrices is to minimize

    P[A; B](Q) = ‖A − BQ‖²   (1.2)

for all Q ∈ OSt(p, k). Problem (1.2) can be simplified by performing the singular value decomposition of the matrix B ∈ R^{m×p}. Let B = USV^T, where U ∈ O(m), V ∈ O(p) and S = (diag(σ_1, …, σ_p), O_{p×(m−p)})^T; then

    P[A; B](Q) = ‖U(U^T A − S V^T Q)‖² = ‖Ã − S Q̃‖²,   (1.3)

where Ã = U^T A and Q̃ = V^T Q. Since the last m − p rows of S are zero, we will simplify (1.3) by introducing new notation. We denote Σ = diag(σ_1, …, σ_p) and assume from now on that the problem does not reduce to a lower-dimensional one, or in other words that σ_1 ≥ ⋯ ≥ σ_p > 0. We define A ∈ R^{p×k} to be the matrix composed of the first p rows of Ã. Consequently the Procrustes minimization on the set of orthogonal Stiefel matrices is: for given A ∈ R^{p×k} and diagonal Σ ∈ R^{p×p} minimize

    P[A; Σ](Q) = ‖A − ΣQ‖²   (1.4)

for all Q ∈ OSt(p, k).
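The SVD-based reduction from (1.2) to (1.4) can be sketched numerically as follows; the helper name reduce_procrustes and the use of NumPy are illustrative assumptions, not part of the paper.

```python
import numpy as np

def reduce_procrustes(A, B):
    """Sketch of the reduction (1.2) -> (1.4): returns A1 (first p rows of
    U^T A), Sigma = diag(sigma_1, ..., sigma_p) and V^T, so that minimizing
    ||A - B Q|| over Stiefel Q is equivalent to minimizing ||A1 - Sigma Qt||
    over Qt = V^T Q.  (reduce_procrustes is a hypothetical helper name.)"""
    m, p = B.shape
    U, s, Vt = np.linalg.svd(B, full_matrices=True)  # B = U S V^T
    A_tilde = U.T @ A                                # Frobenius norm is invariant under U^T
    A1 = A_tilde[:p, :]                              # the last m - p rows of S are zero
    Sigma = np.diag(s)
    return A1, Sigma, Vt
```

For any Stiefel Q the identity ‖A − BQ‖² = ‖A1 − Σ(V^T Q)‖² + ‖Ã‖² − ‖A1‖² can be checked numerically, which is exactly the content of (1.3).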

The original formulations of the Procrustes problem can be found in [2], [3]. We may write (1.4) explicitly as

    P[A; Σ](Q) = trace(Q^T Σ² Q) − 2 trace(Q^T Σ A) + ‖A‖².   (1.5)

The Procrustes problem has been solved analytically in the orthogonal case, when p = k and OSt(p, k) = O(p); see [11]. In this case Q ∈ O(p) and we have

    P[A; Σ](Q) = ‖A‖² + ‖Σ‖² − 2 trace(Q^T Σ A).   (1.6)

Provided that the singular value decomposition of ΣA is ΣA = PΩR^T, the minimizer in (1.6) is then

    Q* = PR^T.   (1.7)

The functional P[A; Σ] in (1.5) is a sum of two functionals in Q: the quadratic functional trace(Q^T Σ² Q) and the linear functional −2 trace(Q^T Σ A). It is well known how to minimize each of the functionals separately. The minimum value of the quadratic functional is equal to the sum of squares of the k smallest diagonal entries of Σ. This result is due to Ky Fan [1]. The linear functional is minimized when trace(Q^T Σ A) is maximized. The maximum of this trace is given by the sum of the singular values of the matrix ΣA. This upper bound on the trace functional was established by J. von Neumann in [16]; see also [11]. Separate minimization of the quadratic and the linear parts are well understood methods. The analytical solution of the Procrustes problem for orthogonal Stiefel matrices is, to the best of our knowledge, an open problem.

It will be useful to interpret the minimization (1.4) geometrically. To do that we define an eccentric Stiefel manifold OSt[Σ](p, k) in R^{p×k}:

    OSt[Σ](p, k) = { X ∈ R^{p×k} : X^T Σ^{−2} X = I_{k×k} }.   (1.8)

The eccentric Stiefel manifold OSt[Σ](p, k) is the image of the orthogonal Stiefel manifold OSt(p, k) under the linear mapping Q ↦ ΣQ. The image of the sphere {A ∈ R^{p×k} : ‖A‖ = √k} of radius √k in R^{p×k} under this mapping is an ellipsoid in R^{p×k} of which OSt[Σ](p, k) is a subset. The eccentric Stiefel manifold is a compact set contained in a larger ball in R^{p×k} centered at 0 and of radius √k σ_1.
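In the square case p = k the closed-form solution (1.7) is a few lines of NumPy. The following is a sketch of that known SVD recipe; the function name is chosen here only for illustration.

```python
import numpy as np

def orthogonal_procrustes_minimizer(A, Sigma):
    """p = k case: the minimizer of ||A - Sigma Q|| over Q in O(p) is
    Q* = P R^T, where Sigma A = P Omega R^T is an SVD, cf. (1.6)-(1.7)."""
    P, _, Rt = np.linalg.svd(Sigma @ A)
    return P @ Rt
```

At the minimizer, P[A; Σ](Q*) = ‖A‖² + ‖Σ‖² − 2 · (sum of the singular values of ΣA), in line with the von Neumann trace bound quoted above.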
Clearly

    OSt[Σ](p, k) = { X ∈ R^{p×k} : X = ΣQ, Q ∈ OSt(p, k) }.   (1.9)

We note that OSt[Σ](p, 1) is a standard ellipsoid in R^p:

    OSt[Σ](p, 1) = { x ∈ R^p : (x_1/σ_1)² + ⋯ + (x_p/σ_p)² = 1 },   (1.10)

and OSt[I](p, k) = OSt(p, k). Therefore if

    min_{Q ∈ OSt(p,k)} P[A; Σ](Q) = ‖A − ΣQ*‖²,   (1.11)

then the point ΣQ* is the projection of A onto the eccentric Stiefel manifold OSt[Σ](p, k). Due to the compactness of the manifold a projection ΣQ* exists. The big difficulty which we face in the task of computing the minimizer Q* is the fact that the manifold OSt[Σ](p, k) is not a convex set, and a projection onto a non-convex set is in general non-unique.

2. Notation

An elementary plane rotation by an angle θ is represented by

    G(θ) = [ cos θ  −sin θ ]
           [ sin θ   cos θ ].

An elementary plane reflection about the line with slope tan(θ/2) is

    R(θ) = [ cos θ   sin θ ]
           [ sin θ  −cos θ ].

For Q ∈ R^{p×k} and 1 ≤ m < n ≤ p we introduce the following submatrices of Q:

    Q^{[m,n]} ∈ R^{2×k}          consists of the m-th and n-th rows of Q,
    Q^{(m,n)} ∈ R^{(p−2)×k}      consists of the rows complementary to Q^{[m,n]},
    Q^{[m,n]}_{[m,n]} ∈ R^{2×2}  consists of the entries on the intersections of the m-th and n-th rows and columns of Q.   (2.1)-(2.2)

A plane rotation by an angle θ in the (k, l)-plane in R^p is represented by a matrix G_{k,l}(θ) ∈ R^{p×p} such that

    (G_{k,l}(θ))^{[k,l]}_{[k,l]} = G(θ),   (G_{k,l}(θ))^{(k,l)}_{(k,l)} = I_{(p−2)×(p−2)}.   (2.3)

A plane reflection R_{k,l}(θ) in the (k, l)-plane is defined similarly by means of R(θ). J_{k,l}(p) is the set of all plane rotations and reflections in the (k, l)-plane, and J(p) is the set of plane rotations and reflections in all planes. Clearly J_{k,l}(p) ⊂ J(p) ⊂ O(p).

3. Relaxation methods for the Procrustes problem

The Stiefel manifold OSt(p, k) is the admissible set for the minimizer of the functional P in (1.4). This manifold, however, is not a vector space, which poses severe restrictions on how the successive approximations can be obtained from the previous ones. Additive corrections are not admissible, but the Stiefel manifold is closed with respect to left multiplication by an orthogonal matrix R ∈ O(p). Thus RQ, where Q ∈ OSt(p, k), is an admissible approximation. Consequently, we restrict our considerations to a class of minimization methods which construct the approximations Q̂ to the minimizer Q* by the rule

    Q̂ = RQ,   (3.1)

where Q and Q̂ denote respectively the current and the next approximation to the minimizer. In what follows we will consider only relaxation minimization methods which seek the minimizer of the functional P according to (3.1) with

    R = R_N ⋯ R_1,   (3.2)

where N ≥ M and M is the dimension of the manifold OSt(p, k).
Each R_i ∈ O(p), i = 1, 2, …, N, depends on a single parameter whose value results from a scalar minimization problem. We will refer to the left multiplication by R in (3.1) as a sweep. Our relaxation method consists of repeated applications of sweeps which produce a minimizing sequence for the problem (1.4).

We will choose the matrices R_i to be orthogonally similar to a plane rotation or reflection. Different choices of similarities will lead to different relaxation methods. We set

    Q_0 = Q,  R_0 = I,  Q_i = R_{i−1} ⋯ R_0 Q,  i = 1, 2, …, N+1,   (3.3)

and define

    R_i = R_i(θ) = P_i J_i(θ) P_i^T,   (3.4)

where J_i ∈ J(p), and P_i ∈ O(p) may depend on the current approximation Q_i to the minimizer. It is the choice of P_i that fully determines the relaxation method (3.1)-(3.4). The selection of the parameter θ in (3.4) will result from the scalar minimization

    ‖A − Σ(R_i(θ)Q_i)‖ = min_θ̃ ‖A − Σ(R_i(θ̃)Q_i)‖.   (3.5)

The matrix R_i can be viewed as a plane rotation or reflection in a plane spanned by a pair of columns of the matrix P_i. The indices (r, s) of this pair of columns are selected according to an ordering N of a set of pairs D. The ordering N : D → {1, 2, …, N} is a bijection, where D ⊇ {(r, s) : 1 ≤ r ≤ k, r+1 ≤ s ≤ p}. This inclusion guarantees that D contains at least the M distinct pairs necessary to construct an arbitrary Q ∈ OSt(p, k) as a product of matrices R_i. It is clear that relaxation methods satisfying (3.5) will always produce a nonincreasing sequence of the values P[A; Σ](Q_i). If P_i = I_{p×p} in (3.4), then R_i = J_i and the sweep (3.1) has the following particularly simple form:

    Q̂ = J_N ⋯ J_1 Q.   (3.6)

The relaxation method defined by (3.6) will be referred to as the left-sided relaxation method, or LSRM. If P_i = (Q_i, Q_i^⊥) in (3.4), where Q_i^⊥ is the orthogonal complement of Q_i, then

    R_i = (Q_i, Q_i^⊥) J_i (Q_i, Q_i^⊥)^T

and hence

    Q_{i+1} = R_i Q_i = (Q_i, Q_i^⊥) J_i [ I_{k×k} ; 0_{(p−k)×k} ].

Thus by induction the sweep (3.1) has the form

    Q̂ = (Q, Q^⊥) J_1 ⋯ J_N [ I_{k×k} ; 0_{(p−k)×k} ].   (3.7)

The relaxation method defined by (3.7) will be referred to as the right-sided relaxation method, or RSRM. Our objective is to propose a geometric interpretation of the LSRM and describe its numerical implementation based on the geometric aspects of the method.
We will compare our left-sided relaxation method with an existing method for the Procrustes problem for orthogonal Stiefel matrices due to H. Park [14]. The general description of the RSRM allows us to formulate Park's method as a relaxation method and compare it with the LSRM. The method in [14] is based on concepts introduced earlier by Ten Berge and Knol [3], where the problem is called the unbalanced Procrustes problem and its solution is based on the iterative

solution of a sequence of orthogonal Procrustes problems called balanced problems. For the study of other minimization methods on submanifolds of spaces of matrices see [13], [15] and [4].

4. Planar Procrustes problem

We will now present the left-sided relaxation method. Without loss of generality let us assume that the planes (r, s) in which the transformations operate are chosen in the row-cyclic order, in a way analogous to that used in the cyclic Jacobi method for the SVD computation [7]. In this case N : D → {1, …, (1/2)p(p−1)}, with D = {(r, s) : 1 ≤ r ≤ p−1, r+1 ≤ s ≤ p}, is given by

    N(r, s) = s − r + (r − 1)(p − r/2),

and Q̂ in (3.6) has the following form:

    Q̂ = ∏_{r=1}^{p−1} ∏_{s=1}^{r} J_{p−r, p−s+1} Q,   (4.1)

where J_{r,s} ∈ J_{r,s}(p). Let Q_i = Q_{N(r,s)} be the current approximation in the sweep. The next approximation to the minimizer is Q_{i+1} = J_i(θ*)Q_i. The selection of the parameter θ* results from the scalar minimization

    ‖A − Σ(J_i(θ*)Q_i)‖ = min_θ̃ ‖A − Σ(J_i(θ̃)Q_i)‖.   (4.2)

Our main goal now is to show how to find θ* in (4.2). Consider the functional J ↦ ‖A − Σ(JQ)‖ in (4.2), where for simplicity of notation we have omitted all indices. Without loss of generality we assume that N^{−1}(i) = (r, s) = (1, 2) and hence J = G_{1,2}(θ) = diag(G(θ), I_{(p−2)×(p−2)}), where G(θ) is a plane rotation (the case of a reflection is similar and can be treated in a completely analogous way). The minimization in (4.2) is precisely the minimization of

    f(θ) = ‖ [ a_11 … a_1k ]  −  [ σ_1  0  ] G(θ) [ q_11 … q_1k ] ‖
           ‖ [ a_21 … a_2k ]     [  0  σ_2 ]      [ q_21 … q_2k ] ‖
         = ‖ A^{[1,2]} − Σ^{[1,2]}_{[1,2]} G(θ) Q^{[1,2]} ‖.   (4.3)

Let U_Q Γ V_Q^T be the SVD decomposition of Q^{[1,2]}, such that Γ = (diag(γ_1, γ_2), 0_{2×(k−2)}) and γ_1 ≥ γ_2 > 0. Thus

    f(θ) = ‖ A^{[1,2]} V_Q − Σ^{[1,2]}_{[1,2]} G(θ) U_Q (diag(γ_1, γ_2), 0_{2×(k−2)}) ‖.   (4.4)

Denote B = A^{[1,2]} V_Q. Note that the last k − 2 columns of the matrix B in (4.4) are always approximated by zero columns, and thus the minimization of f(θ) is equivalent to the minimization restricted to the first two columns.
Introducing a new variable φ = θ + ψ, where ψ is the angle of the plane rotation (or reflection) U_Q, and setting c = cos φ, s = sin φ, we obtain the following minimization problem.

    F(φ) = ‖ B_{[1,2]} − [ σ_1  0  ] [ c  −s ] [ γ_1  0  ] ‖² + ‖B_{(1,2)}‖²,   (4.5)
           ‖            [  0  σ_2 ] [ s   c ] [  0  γ_2 ] ‖

where B_{[1,2]} consists of the first two columns of B and B_{(1,2)} of the remaining columns. We may write (4.5) explicitly as

    F(φ) = z_1² c² + z_2² s² − 2y_1 c − 2y_2 s + ‖B‖²,   (4.6)

where q = (c, s)^T and

    z_1 = (σ_1²γ_1² + σ_2²γ_2²)^{1/2},   z_2 = (σ_1²γ_2² + σ_2²γ_1²)^{1/2},   (4.7)
    y_1 = b_11 σ_1 γ_1 + b_22 σ_2 γ_2,   y_2 = −b_12 σ_1 γ_2 + b_21 σ_2 γ_1.   (4.8)

We now denote

    Z = diag(z_1, z_2),  Y = (y_1, y_2)^T,  C = Z^{−1} Y,

where z_1 ≥ z_2 > 0 and C = (c_1, c_2)^T. By completing the squares we may represent F(φ) in the following form:

    F(φ) = (c_1 − z_1 cos φ)² + (c_2 − z_2 sin φ)² − ‖C‖² + ‖A^{[1,2]}‖²
         = ‖C − Zq(φ)‖² − ‖C‖² + ‖A^{[1,2]}‖²   (4.9)

for q(φ) = (cos φ, sin φ)^T. Thus the minimization of the functional (4.9) is equivalent to the following problem: for given C ∈ R^{2×1} and diagonal Z ∈ R^{2×2} minimize

    P[C; Z](q) = z_1² cos²φ + z_2² sin²φ − 2c_1 z_1 cos φ − 2c_2 z_2 sin φ + ‖C‖² = ‖C − Zq‖²   (4.10)

for all q ∈ OSt(2, 1), q = (cos φ, sin φ)^T. A minimization problem of the type (4.10) will be called a planar Procrustes problem. Such a problem has to be solved at each step of our relaxation method and is geometrically equivalent to projecting C onto an ellipse. In the next section we consider two different iterative methods for finding the projection. Both of these geometrically based methods provide excellent initial approximations to the solution and means for error control.

5. Projection on an ellipse

The geometric formulation of the planar Procrustes problem (4.10) is very simple. Given a point C and the ellipse E = OSt[Z](2, 1),

    E = OSt[Z](2, 1) = { (x_1, x_2)^T : (x_1/z_1)² + (x_2/z_2)² = 1 },   (5.1)

in R², we want to find a point S ∈ E, S = Zq* = (z_1 cos φ*, z_2 sin φ*)^T, which is a projection of C onto E.

This can be achieved in a variety of ways. We describe the classical projection of a point onto an ellipse due to Apollonius and a method of iterated reflections based on the reflection property of the ellipse.

5.1. The hyperbola of Apollonius. Recall the construction of a normal to an ellipse from a point [17], due to Apollonius. With the given ellipse E in (5.1) and with the point C we associate the hyperbola H given by

    x_2 = m_2 x_1 / (x_1 − m_1),   (5.2)

where (m_1, m_2) is the center of H, with coordinates

    m_1 = c_1 z_1 / (z_1² − z_2²),   m_2 = −c_2 z_2 / (z_1² − z_2²).   (5.3)

[Figure 1. Hyperbola of Apollonius]

To find the coordinates of the projection point S we have to intersect the hyperbola of Apollonius with the ellipse, that is, to solve the system of two quadratic equations (5.1) and (5.2). Using (5.2) to eliminate x_2 from (5.1) we obtain a fourth-order polynomial equation in x_1:

    (x_1 − m_1)²(z_1² − x_1²) − (z_1²/z_2²) m_2² x_1² = 0.   (5.4)

For any specific numerical values of the coefficients this equation can be easily solved symbolically. A simpler, purely numerical alternative is to solve the system (5.1), (5.2) using Newton's method. Another alternative is to reduce the system to a scalar trigonometric equation. Assume that C = (c_1, c_2)^T is in the first quadrant and that C ∉ E. Let S be the projection of C onto E. Then setting (x_1, x_2) = (z_1 cos φ, z_2 sin φ) in (5.2) and next substituting t = tan φ leads to the equation g(t) = 0 in t, where

    g(t) = c_1 z_1 t − (z_1² − z_2²) t (1 + t²)^{−1/2} − c_2 z_2.   (5.5)
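A minimal numerical sketch of the Newton iteration for g(t) = 0, using the initial guess t_0 discussed below; it assumes the first-quadrant case c_1, c_2 > 0 with z_1 > z_2 > 0, and the function name is illustrative.

```python
import numpy as np

def project_to_ellipse(c1, c2, z1, z2, maxit=50):
    """Project C = (c1, c2) (first quadrant, outside or inside E) onto the
    ellipse (x1/z1)^2 + (x2/z2)^2 = 1 by Newton's method on the scalar
    equation g(t) = 0, t = tan(phi), from (5.5).  Sketch only."""
    d = z1**2 - z2**2
    g  = lambda t: c1*z1*t - d*t/np.sqrt(1.0 + t*t) - c2*z2
    gp = lambda t: c1*z1 - d/(1.0 + t*t)**1.5        # g'(t)
    t = (d + c2*z2) / (c1*z1)                         # t0 with g(t0) > 0
    for _ in range(maxit):
        gt = g(t)
        if abs(gt) < 1e-15:
            break
        t -= gt / gp(t)                               # decreasing iterates
    phi = np.arctan(t)
    return z1*np.cos(phi), z2*np.sin(phi)
```

Since g is convex for t ≥ 0 and g(t_0) > 0, the Newton iterates decrease monotonically toward the unique positive root, so no safeguarding is needed in this regime.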

It is easy to see that the function g(t) is convex for t ≥ 0 and has one positive root. It can also be seen that for

    t_0 = (z_1² − z_2² + c_2 z_2) / (c_1 z_1)

we have g(t_0) > 0. Thus Newton's method starting from the initial approximation t_0 will generate a decreasing, convergent sequence of approximations to the root of g(t) = 0.

5.2. Iterated reflections. Assume, as before, that C is in the first quadrant and that C ∉ E; see Fig. 2. Every other case reduces to this one through reflections. Let

    C = (c_1, c_2)^T,  B = ( (z_1² − z_2²)/z_1, 0 )^T,   (5.6)
    F = ( (z_1² − z_2²)^{1/2}, 0 )^T,  K = −F.   (5.7)

[Figure 2. Iterated reflections]

The reflection property of E says that (for C both inside and outside E) S is characterized by: ∠KSC = ∠FSC. Let L, R ∈ E, as in Figure 2, be given by

    L = (z_1 cos φ_L, z_2 sin φ_L)^T,  R = (z_1 cos φ_R, z_2 sin φ_R)^T,

where for C outside E:

    tan φ_L = z_1 c_2 / (z_2 c_1),   (5.8)

    cos φ_R = [ m² z_1² (z_1² − z_2²)^{1/2} + (1 + m²)^{1/2} z_1 z_2² ] / (m² z_1³ + z_1 z_2²),
    m = c_2 / (c_1 − (z_1² − z_2²)^{1/2}).   (5.9)

(In the case of C inside E, the coordinates of the points L and R can be computed similarly.) Clearly φ_R < φ* < φ_L. Analogously to the bisection method, we compute M = (z_1 cos φ_M, z_2 sin φ_M) (an intermediate point between L and R) from L and R by setting φ_M = (φ_L + φ_R)/2. If

    ⟨F − M, C − M⟩ / ‖F − M‖ < ⟨K − M, C − M⟩ / ‖K − M‖   (5.10)

(where ⟨·,·⟩ denotes the inner product), then φ_R ≤ φ* ≤ φ_M, and we set L_1 = M and R_1 = R, so that φ_{R_1} ≤ φ* ≤ φ_{L_1}. If (5.10) does not hold, we set L_1 = L and R_1 = M. We thus construct a sequence {L_n} ⊂ E such that lim_{n→∞} L_n = S.

5.3. Some remarks on the general and planar Procrustes problems. The planar Procrustes problem has several features which the general problem (1.4) of projection onto the eccentric Stiefel manifold does not possess: E has the reflection property and the Apollonius normal. The reflection property of the ellipse does not extend to the eccentric Stiefel manifold, and in particular not even to ellipsoids in R^p. The construction of the Apollonius normal to the ellipse, based on the orthogonality of the ellipse and an associated hyperbola, which results in the scalar equation (5.4), is also particular to the planar problem. As a result, in the case p > k > 1 our relaxation step, which amounts to solving a planar Procrustes problem, cannot be directly generalized to a higher-dimensional problem. A point not belonging to E has either a unique projection onto E or a finite number of projections.

[Figure 3. Points in the first quadrant and their projections on the ellipse.]

Hence if C ∉ conv(E) then there exists a unique projection S of C onto E, characterized by ⟨C − S, F − S⟩ ≤ 0 for all F ∈ conv(E). A point C has a non-unique projection S onto E if and only if C is on the major axis between the points B and −B. The projections of all other points are unique. In general, solutions to the Procrustes problem may form a submanifold of the Stiefel manifold. Various locations of C and its projection(s) S are shown in Fig. 3. Fig. 3 also shows the upper part of the evolute of the ellipse (dashed). From the points in the plane outside the evolute 2 normals to E can be drawn, from the points on the evolute 3 normals can be drawn, and 4 normals can be drawn from the

points inside the evolute. The intersections of the normal from C with the ellipse correspond to the zeros of (5.4), and we are able to geometrically rule out the zeros not corresponding to the projection. The distance function d(C) = min_q ‖C − Zq‖, whose evaluation requires solving the planar Procrustes problem (4.10), is shown in Fig. 4. The function d is not differentiable at points C in the segment [−B, B]. Its singularities are of interest in geometrical optics, where d is called the optical distance, and in the theory of Hamilton-Jacobi equations [1]. Finally we observe that E cuts the plane into two components. However, if

    C = [ √2  0 ]
        [  0  0 ]
        [  0  0 ],   (5.11)

then C is on a sphere of radius √2 in R^{3×2} containing the Stiefel manifold OSt[I](3, 2), but the point kC for any k ∈ R cannot be connected to 0 with a segment intersecting OSt(3, 2).

[Figure 4. Graph of d(C) = min_{‖q‖=1} ‖C − Zq‖]

Both observations concerning the analytical problem are reflected in the computations and have computational implications. The final remarks are about the relation of the Procrustes problem to the constrained linear least squares problem. The quadratically constrained linear least squares problem

    min_{‖x‖=δ} ‖b − Ax‖   (5.12)

arises in many applications [8], [10], [9], [5]. By changing variables this problem can be transformed into a special Procrustes problem. The Procrustes problem is

    min_{‖q‖=1} ‖a − Σq‖,   (5.13)
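The special Procrustes problem (5.13) can be solved through the secular equation (5.14)-(5.15) discussed next; the following sketch finds the Lagrange multiplier by simple bisection (the function name and the bisection scheme are illustrative assumptions, and the relevant root λ > −min_i σ_i² is assumed, which yields the closest point for generic a).

```python
import numpy as np

def secular_projection(a, sigma):
    """Sketch: project a onto the ellipsoid OSt[Sigma](p, 1) = {Sigma q : ||q|| = 1}
    by solving the secular equation  1 = sum_i (a_i sigma_i / (sigma_i^2 + lam))^2
    for lam, then q_i = a_i sigma_i / (sigma_i^2 + lam).  Assumes a_i != 0 at the
    smallest sigma_i, so f blows up at the left end of the bracket."""
    a, sigma = np.asarray(a, float), np.asarray(sigma, float)
    f = lambda lam: np.sum((a*sigma/(sigma**2 + lam))**2) - 1.0
    lo = -np.min(sigma)**2 + 1e-12      # f -> +inf as lam -> lo
    hi = np.linalg.norm(a*sigma)        # large enough that f(hi) <= 0 ...
    while f(hi) > 0:                    # ... otherwise keep doubling
        hi *= 2.0
    for _ in range(200):                # bisection on the monotone branch
        mid = 0.5*(lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5*(lo + hi)
    return a*sigma/(sigma**2 + lam)
```

For p = 2 this reproduces the planar ellipse projection of Section 5, since Σq traces out exactly the ellipse with semiaxes σ_1, σ_2.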

where A = UΣV^T, a = U^T b/δ and q = V^T x/δ. This Procrustes problem is equivalent to projecting the point a onto the ellipsoid OSt[Σ](p, 1). Let ‖q‖ = 1. It is clear that the vector n = Σ^{−1} q has the direction of the normal vector to the ellipsoid at the point Σq. Thus if Σq* is the projection of a onto the ellipsoid, then the vector Σq* − a is parallel to the vector n. Thus there exists a scalar λ so that

    a_i − σ_i q_i* = λ q_i* / σ_i,   (5.14)

where Σ = diag(σ_i). As ‖q*‖ = 1, one can obtain an equation for λ:

    1 = Σ_{i=1}^{p} ( a_i σ_i / (σ_i² + λ) )².   (5.15)

The parameter λ can be computed by solving this equation. Then the components of q* are given by

    q_i* = a_i σ_i / (σ_i² + λ).

The equation (5.15) is the so-called secular equation, which characterizes the critical points of the Lagrangian

    h(λ, q) = ‖Σq − a‖² + λ(‖q‖² − 1),   (5.16)

see [11]. Thus the multiplier λ in (5.14) is the Lagrange multiplier in (5.16).

6. Geometric Interpretation of Left and Right Relaxation Methods

Since the notion of the standard ellipsoid OSt[Σ](p, 1) in R^p is very intuitive, we will now interpret the minimization problem (1.11) treating matrices in R^{p×k} as k-tuples of vectors in R^p. Let A = (a_1, a_2, …, a_k) be a given k-tuple of vectors in R^p. Let Q = (q_1, q_2, …, q_k) ∈ OSt(p, k) be the current approximation to the minimizer. Clearly the points Σq_i all belong to the ellipsoid OSt[Σ](p, 1). Thus the minimization of P[A; Σ](Q) can be interpreted as finding points Σq_i* on the ellipsoid, where the q_i* are orthonormal vectors, that best match, as measured by P[A; Σ](Q), the given vectors a_i in R^p. The relaxation method described in Section 3 can be interpreted as follows. Pick an orthonormal basis in R^p. In the next sweep rotate the current set of vectors Σq_i as a frame, in planes spanned by all pairs of the vectors from the current basis. In the left-sided relaxation method the basis is the canonical basis and is the same for all sweeps.
All relaxation steps are exactly the same, and all amount to solving a planar Procrustes problem. In the right-sided relaxation method the basis consists of two subsets and changes from sweep to sweep. The first subset of the basis consists of the columns of the current approximation Q and the second subset consists of the columns of the orthogonal complement Q^⊥ of Q. Working only with the columns of Q is equivalent to the so-called balanced Procrustes problem studied by Park [14], which can be solved by means of an SVD computation. The relaxation step in [14] for the balanced problem consists of computing the SVD of the matrix (q_r, q_s)^T Σ (a_r, a_s). In our relaxation setting, the relaxation step in the right-sided relaxation method requires solving

the scalar minimization problem (3.5),

    min_{c²+s²=1} ‖ (a_r, a_s) − Σ(q_r, q_s) [ c  −s ] ‖,   (6.1)
                  ‖                          [ s   c ] ‖

which leads to a linear equation in the tangent of θ and is equivalent to the SVD computation in [14]. Each of these steps is a rotation of the vectors q_r and q_s in the plane spanned by q_r and q_s so that the rotated vectors on the ellipsoid best approximate the two given vectors a_r and a_s. However, as the columns of Q do not span the whole space R^p, it might happen that span{q_1, …, q_k} ≠ span{q_1*, …, q_k*}, and hence it might not be possible to generate a sequence of approximations that will converge to Q*. In order to overcome this problem the matrix Q is extended by its orthogonal complement Q^⊥ = (q_{k+1}, …, q_p), so that span(q_1*, …, q_k*) ⊂ span(q_1, …, q_p). The scalar minimization subproblems in [14] involving vectors from both subsets are referred to in [14] as unbalanced subproblems. These scalar minimizations have the following form:

    min_{c²+s²=1} ‖ a_r − Σ(q_r, q_s) (c, s)^T ‖.   (6.2)

That is, the unbalanced subproblem is to find a vector on the ellipsoid in the plane spanned by q_r and q_s closest to the given vector a_r. As the intersection of this plane and the ellipsoid is an ellipse, the unbalanced subproblem can be expressed as a planar Procrustes problem (4.10), and any of the algorithms discussed in Section 5 can be used to solve this unbalanced problem. Other choices of bases may be possible, but the choices leading to the left- and right-sided relaxation methods seem to be the most natural.

7. Numerical experiments

In this section we present numerical experiments illustrating the behavior of the left and right relaxation methods discussed in Section 3. We start by summarizing the left and right relaxation methods, given below in pseudocode. Given A ∈ R^{p×k}, A = (a_1, …, a_k), and Σ = diag(σ_1, …, σ_p), both algorithms construct sequences of Stiefel matrices approximating the minimizer of (1.4).
Algorithm LSRM:
  1. Initialization: set Maxstep, Q = I_{p×k}, n = 0, r_{−1} = 0, r_0 = ‖A − ΣQ‖.
  2. Iterate sweeps:
     while |r_n − r_{n−1}| > threshold and n < Maxstep
       for i = 1 to k
         for j = i+1 to p
           solve the planar Procrustes problem
             min_θ ‖ A^{[i,j]} − Σ^{[i,j]}_{[i,j]} J(θ) Q^{[i,j]} ‖
           Q^{[i,j]} ← J(θ*) Q^{[i,j]}
       n ← n + 1
       r_n = ‖A − ΣQ‖
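To make the sweep structure concrete, here is a simplified NumPy-based sketch of one LSRM sweep. It replaces the exact planar-Procrustes solve by a coarse sampling of the rotation angle (an assumption made only to keep the sketch short) and sweeps over all planes (i, j), as in the row-cyclic ordering of Section 4.

```python
import numpy as np

def lsrm_sweep(A, Sigma, Q, n_theta=720):
    """One simplified LSRM sweep: for every plane (i, j), i < j, pick the
    rotation angle theta minimizing ||A - Sigma G_ij(theta) Q||_F over a
    coarse grid of angles, then apply the best rotation on the left of Q.
    (The paper solves each planar Procrustes subproblem exactly instead.)"""
    p, k = Q.shape
    thetas = np.linspace(0.0, 2*np.pi, n_theta, endpoint=False)  # includes 0
    for i in range(p - 1):
        for j in range(i + 1, p):
            best_r, best_th = None, 0.0
            for th in thetas:
                G = np.eye(p)
                c, s = np.cos(th), np.sin(th)
                G[i, i], G[i, j], G[j, i], G[j, j] = c, -s, s, c
                r = np.linalg.norm(A - Sigma @ (G @ Q))
                if best_r is None or r < best_r:
                    best_r, best_th = r, th
            G = np.eye(p)
            c, s = np.cos(best_th), np.sin(best_th)
            G[i, i], G[i, j], G[j, i], G[j, j] = c, -s, s, c
            Q = G @ Q
    return Q
```

Because θ = 0 is among the sampled angles, no plane step can increase the residual, so repeated sweeps produce a nonincreasing sequence of values of the functional, mirroring (3.5).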

Algorithm RSRM: PROCRUSTES PROBLEM FOR STIEFEL MATRICES 13 1. Initialization: set Maxstep, Q = I pp, n = 0, r?1 = 0, r 0 = ka? QI pk k. Iterate sweeps: while (r n? r n?1 ) > threshold and n < Maxstep for i = 1 to k for j = i + 1 to k solve min k(a T ) [i;j]? J()(Q T ) [i;j] k (Q T ) [i;j] J()(Q T ) [i;j] for j = k + 1 to p solve planar Procrustes problem min c +s =1 ka T j? (c s)(qt ) [i;j] k e T i (QT ) (c s)(q T ) [i;j] n n + 1 r n = ka? QI pk k We measure the cost of the two methods by the number of sweeps performed by each of the two methods. A sweep in the LSRM method consists of p(p + 1)= planar Procrustes problems. Each planar Procrustes problem requires computation of the SVD of a k matrix. This can be achieved by rst computing the QR decomposition followed by a SVD problem. After the SVD is calculated, a projection on an ellipse has to be determined. The cost of a sweep is approximately O(kp ) oating point operations. A sweep in the RSRM method consists of k(k + 1)= computations of p SVD problems. In adition, there are k(p? k) planar Procrustes problems, each requiring computation of the SVD of a p matrix followed by computation of a projection on an ellipse. Thus the cost of a sweep is again approximately O(kp ) oating point operations. Surely, the precise cost of a sweep will depend on the number of iterations needed for obtaining satisfactory projections on the resulting ellipses. For each projection, this will depend on the location of the point being projected as well as the shape of the ellipse. Computation of the projection will be most costly when the ellipse is at. As can be seen, sweeps in the two methods may have dierent costs. However, the number of sweeps performend by each of the methods will give some bases for comparing the convergence behavior of the two methods. We begin by illustrating the behavior of the LSRM method for nding Q in the Procrustes problem with p = 4; k =, = diag(10 0 ; 10?1 ; 10? 
, 10^{-3}), and A = ΣQ̃, where

    Q̃ = [ -3.1665166866158e-01    5.34030951680499e-02
          -1.508494807711354e-01  -9.0694616989718e-01
          -7.975468385641e-01      3.96039609671e-01
          -5.868854343875571e-01  -2.0034885985765e-01 ].

The initial approximation is Q_0 = I_{4×2}. Some intermediate values of Q are listed in Table 1.

sweep #  Q                                              ||A - ΣQ||
1        -3.16646771495053e-01   5.33733877457e-02      6.4043e-05
         -1.511766767561811e-01 -9.06530348969716e-01
         -7.98951355096e-01      3.98411857874910e-01
         -5.8675130978633e-01   -2.01876639897017e-01
5        -3.166517048090e-01     5.3403193360516e-02    1.0450e-06
         -1.508477146734e-01    -9.0699744667507e-01
         -7.975133055357e-01     3.96168386846e-01
         -5.86889056665494e-01  -2.0040784981074e-01
10       -3.1665166993697e-01    5.34030989691857e-02   4.180e-08
         -1.508494100481855e-01 -9.06948179307888e-01
         -7.9754580980885e-01    3.9600876881795e-01
         -5.86885579514951e-01  -2.0034453790306e-01
15       -3.16651668678573e-01   5.3403095300504e-02    8.7804e-10
         -1.508494779430533e-01 -9.0694609057939e-01
         -7.9754678161558e-01    3.9603837640639e-01
         -5.868854401803947e-01 -2.00348687030837e-01
30       -3.1665166866163e-01    5.34030951680608e-02   5.605e-14
         -1.508494807709545e-01 -9.0694616994970e-01
         -7.975468383019e-01     3.9603960959194e-01
         -5.86885434387950e-01  -2.0034885984618e-01

Table 1. Matrices Q in a minimizing sequence generated by LSRM.

We now present comparative numerical results for the LSRM and RSRM methods. Recall that the functional P is a sum of a linear and a quadratic term. We consider classes of examples in which the functional can be approximated by its linear or by its quadratic term. In the first class of examples the linear term dominates the quadratic term, that is, ||A|| >> ||Σ||. We deal here with a perturbed linear functional, and the minimum of the functional P can be approximated by the sum of the singular values of Σ^T A. The second class of examples consists of cases in which the quadratic term dominates the linear term, that is, ||A|| << ||Σ||. We deal here with a perturbed quadratic functional, and the minimum value of the functional P can be approximated by the sum of the k smallest singular values of Σ. The third class of examples consists of cases in which the functional is genuinely quadratic, that is, A ≈ ΣQ̃ for some Q̃ ∈ OSt(p, k). The minimum of the functional is then close to zero.
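The linear-term estimate can be checked numerically: over OSt(p, k) the linear term tr(Q^T Σ^T A) is maximized at the orthogonal polar factor of Σ^T A, and the maximum equals the sum of the singular values of Σ^T A (von Neumann's trace inequality, cf. [16]). The sketch below is my own illustration of this bound, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
p, k = 6, 3
Sigma = np.diag(rng.uniform(0.5, 1.5, size=p))
A = 100.0 * rng.standard_normal((p, k))   # ||A|| >> ||Sigma||: linear term dominates

M = Sigma.T @ A                            # p x k
U, s, Vt = np.linalg.svd(M, full_matrices=False)
Q_star = U @ Vt                            # orthogonal polar factor, in OSt(p, k)

# the linear term attains the sum-of-singular-values bound at Q_star ...
assert np.isclose(np.trace(Q_star.T @ M), s.sum())
# ... and no random Stiefel point exceeds it
for _ in range(100):
    Q, _ = np.linalg.qr(rng.standard_normal((p, k)))
    assert np.trace(Q.T @ M) <= s.sum() + 1e-9
```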

In each class of examples we pick two different matrices Σ: one corresponding to the ellipsoid being almost a sphere, that is, Σ ≈ I; the other corresponding to the ellipsoid being very flat in one or more planes, that is, σ_1/σ_p large. The algorithms were written in MATLAB 4.2 and run on an HP9000 workstation with machine relative precision ε = 2.2204e-16. We set Maxstep = 30 and threshold = 5ε. As the initial approximation we took Q = I_{p×k} for the LSRM, and (Q, Q_⊥) = I_{p×p} for the RSRM. The planar Procrustes solver used was based on the hyperbola of Apollonius (the iterated reflections solver gave numerically equivalent results). Some representative results are shown in Tables 2-6.

             RSRM                               LSRM
p  k   # sweeps  ||A-ΣQ||  ||Q-Q̃||     # sweeps  ||A-ΣQ||  ||Q-Q̃||
6  2       2     6.38e-15  5.30e-15       19     9.2e-16   2.37e-15
   3      30     4.41e-12  1.33e-11        5     1.48e-15  2.53e-15
   4      30     5.1e-11   1.61e-10        6     1.99e-15  6.30e-15
   5       4     5.86e-15  1.07e-14       30     6.00e-11  1.35e-10
9  2      30     6.85e-09  5.79e-08       30     9.78e-06  6.11e-05
   3      30     8.0e-06   6.89e-05       30     7.01e-07  3.18e-06
   4      30     3.66e-03  3.87e-02       30     2.07e-05  1.8e-04
   5      30     5.60e-03  5.93e-02       30     5.06e-07  1.98e-06
   6      30     4.57e-04  4.68e-03       30     5.3e-12   1.16e-11
   7      30     1.6e-03   1.08e-02       30     1.77e-13  3.05e-13
   8      30     2.46e-03  2.1e-02        30     2.98e-12  5.34e-12

Table 2. A = ΣQ̃ and σ_1/σ_p ≈ 2.

             RSRM                               LSRM
p  k   # sweeps  ||A-ΣQ||  ||Q-Q̃||     # sweeps  ||A-ΣQ||  ||Q-Q̃||
6  2      30     9.00e-03  1.53e+00        2     6.08e-16  4.69e-15
   3      30     7.49e-03  1.55e+00       18     4.47e-16  1.6e-14
   4      30     7.66e-03  1.73e+00       16     5.56e-16  2.86e-15
   5      30     3.10e-03  8.6e-01        30     8.99e-16  7.98e-15
9  2      30     6.8e-03   1.70e+00       30     9.08e-11  2.6e-08
   3      30     5.04e-03  1.41e+00       30     9.71e-07  2.84e-04
   4      30     4.67e-03  1.35e+00       30     1.19e-04  3.39e-02
   5      30     7.04e-03  2.1e+00        30     1.18e-03  3.01e-01
   6      30     6.45e-03  1.96e+00       30     3.8e-06   1.48e-05
   7      30     4.93e-03  1.49e+00       30     3.53e-06  1.0e-05
   8      30     4.87e-03  1.54e+00       30     8.44e-10  1.70e-09

Table 3. A = ΣQ̃ and σ_1/σ_p ≈ 10².

Table 2 illustrates the behavior of the two methods when the ellipsoid is almost a sphere and when there exists Q̃ such that ΣQ̃ = A.
That is, the bilinear and the linear terms are of comparable size. The experiments suggest that the LSRM requires fewer sweeps to obtain a satisfactory approximation to the minimizer.
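The planar Procrustes solver mentioned above reduces each subproblem to projecting a point onto an ellipse. As an illustration (my own sketch, not the Apollonius-hyperbola iteration used in the experiments), the nearest point on the ellipse (a cos t, b sin t) to a point (u, v) can be located by a coarse parameter search followed by golden-section refinement:

```python
import numpy as np

def project_on_ellipse(u, v, a, b, n_coarse=1000, refine=60):
    """Nearest point on the ellipse (a cos t, b sin t) to (u, v):
    coarse grid search, then golden-section refinement of the squared distance."""
    def d2(t):
        return (a * np.cos(t) - u) ** 2 + (b * np.sin(t) - v) ** 2
    ts = np.linspace(0, 2 * np.pi, n_coarse, endpoint=False)
    t = ts[np.argmin(d2(ts))]
    lo, hi = t - np.pi / n_coarse, t + np.pi / n_coarse
    phi = (np.sqrt(5) - 1) / 2          # golden-section ratio
    for _ in range(refine):
        m1 = hi - phi * (hi - lo)
        m2 = lo + phi * (hi - lo)
        if d2(m1) < d2(m2):
            hi = m2
        else:
            lo = m1
    t = (lo + hi) / 2
    return a * np.cos(t), b * np.sin(t)
```

For a flat ellipse (b much smaller than a) the squared-distance function becomes nearly singular near the ends of the major axis, which is consistent with the remark above that projections are most costly when the ellipse is flat.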

Table 3 illustrates the behavior of the two methods when half of the ellipsoid's axes have length approximately 1.0 and the other half approximately 0.01. In addition, there exists Q̃ such that ΣQ̃ = A. In this case the convergence of the RSRM is particularly slow. We observed that, at least initially, the RSRM fails to locate the minimizer in OSt(4, 2), being unable to establish the proper signs of the entries of the matrix Q. The LSRM, on the other hand, approximates the minimizer correctly.

             RSRM                                           LSRM
p  k   # sweeps  esterror   ||A-ΣQ||  sweepcorr     # sweeps  esterror   ||A-ΣQ||  sweepcorr
6  2      30     -1.70e-02  2.1e-02   1.89e-08          6     -1.70e-02  2.1e-02   6.10e-16
   3      30     -2.64e-02  3.38e-02  3.93e-08         13     -2.64e-02  3.38e-02  3.81e-16
   4      30     -5.03e-02  6.10e-02  1.10e-07         10     -5.03e-02  6.10e-02  3.53e-16
   5      30      2.55e-02  9.04e-01  3.3e-09          30      2.55e-02  9.04e-01  1.11e-15
9  2      30     -4.41e-02  4.53e-02  8.44e-08          9     -4.41e-02  4.53e-02  6.93e-18
   3      30     -3.61e-02  3.86e-02  4.6e-08          14     -3.61e-02  3.86e-02  5.41e-16
   4      30     -4.05e-02  4.60e-02  4.98e-06         11     -4.05e-02  4.59e-02  5.75e-16
   5      30     -5.44e-02  6.30e-02  1.06e-06         15     -5.41e-02  6.7e-02   7.35e-16
   6      30      3.49e-02  7.10e-01  1.00e-06          9      3.49e-02  7.10e-01  9.99e-16
   7      30      4.89e-02  1.04e+00  2.67e-07         15      4.89e-02  1.04e+00  2.2e-16
   8      30      5.53e-02  1.34e+00  1.88e-08         14      5.53e-02  1.34e+00  6.66e-16

Table 4. ||A|| ≈ 10^{-2}||Σ|| and σ_1/σ_p ≈ 2.

             RSRM                                           LSRM
p  k   # sweeps  esterror   ||A-ΣQ||  sweepcorr     # sweeps  esterror   ||A-ΣQ||  sweepcorr
6  2       6     -3.3e-02   1.36e+01  7.10e-15         30     -3.3e-02   1.36e+01  5.7e-13
   3       5     -1.51e-02  3.01e+01  3.55e-15         20     -1.51e-02  3.01e+01  7.10e-15
   4       4     -1.81e-02  2.51e+01  7.10e-15         30     -1.81e-02  2.51e+01  3.55e-15
   5       4     -1.43e-02  3.16e+01  3.55e-15          8     -1.43e-02  3.16e+01  0.00e+00
9  2       4     -1.6e-02   1.60e+01  0.00e+00          6     -1.6e-02   1.60e+01  0.00e+00
   3       6     -2.37e-02  2.75e+01  1.06e-15         20     -2.37e-02  2.75e+01  1.77e-15
   4       7     -3.40e-02  3.66e+01  2.13e-15          8     -3.40e-02  3.66e+01  0.00e+00
   5       5     -3.6e-02   3.8e+01   1.4e-15          14     -3.6e-02   3.8e+01   1.4e-15
   6       5     -2.87e-02  4.33e+01  0.00e+00         11     -2.87e-02  4.33e+01  0.00e+00
   7       5     -2.69e-02  4.6e+01   7.10e-15         14     -2.69e-02  4.6e+01   7.10e-15
   8       5     -2.70e-02  4.61e+01  2.13e-15         22     -2.70e-02  4.61e+01  7.10e-15

Table 5.
||A|| ≈ 10²||Σ|| and σ_1/σ_p ≈ 2.

Table 4 illustrates the behavior of the two methods when the ellipsoid is almost a sphere but A is chosen so that ||A|| ≈ 10^{-2}||Σ||; that is, the quadratic term dominates the linear term. In this case the minimum of the functional can be estimated by the minimum value of the quadratic term. In Table 4, esterror denotes the difference between the minimum value of the quadratic term and the computed value of the functional, and sweepcorr = ||A - ΣQ|| - ||A - ΣQ̂||, where Q and Q̂ are the last and penultimate approximations to the minimizer. The experiments

PROCRUSTES PROBLEM FOR STIEFEL MATRICES 17 suggest that the LSRM requires less sweeps to obtain a satisfactory approximation to the minimizer. Table 5 illustrates the behavior of the two methods when the ellipsoid is almost a sphere but now A is chosen so kak 10 kk. That is the linear term dominates the quadratic terms. In this case the minimum of the functional can be estimated by the minimum value of the linear term. In Table 5 esterror denotes the dierence between the minimum value of the linear term and the computed value of the functional. The experiments suggest that the RSRM requires less sweeps to obtain a satisfactory approximation to the minimizer. References [1] V.I. Arnold, Geometrical Methods in the Theory of Ordinary Dierential Equations, Springer-Verlag, New York, 1988. [] J.M. Ten Berge and K. Nevels, A general solution to Mosier's oblique Procrustes problem, Psychometrika 4 (1977), 593{600. [3] J.M. Ten Berge and D.L. Knol, Orthogonal rotations to maximal agreement for two or more matrices of dierent column orders, Psychometrika 49 (1984), 49{55. [4] A. Edelman, T. Arias and S.T.Smith, Conjugate gradient on Stiefel and Grassman manifolds,1995. [5] L. Elden, Algorithms for the regularization of ill-conditioned least squares problems, BIT 17 (1977), 134-145. [6] G.E. Forsythe and G.H. Golub, On the stationary values of a second-degree polynomial on the unit sphere, SIAM 4 (1965), 1050{1068. [7] G.E. Forsythe and P. Henrici, The cyclic Jacobi method for computing the principal values of a complex matrix, Trans. AMS 94 (1960), 1{3. [8] W. Gander, Least Squares with a Quadratic Constraint, Numer. Math. 36 (1981), 91-307. [9] G.H. Golub, Some modied matrix eigenvalue problems, SIAM Review 15 (1973), 318{334. [10] G.H. Golub and U. von Matt, Quadratically constrained least squares and quadratic problems, Numer. Math. 59 (1991), 561{580. [11] G.H. Golub and C.F. Van Loan, Matrix Computations Johns Hopkins, Baltimore,1990. 
[12] Ky Fan, On a theorem of Weyl concerning eigenvalues of linear transformations, Proc. Nat. Acad. Sci. U.S.A. 35 (1949), 652-655.
[13] A. Lutoborski, On the convergence of the Euler-Jacobi method, Numer. Funct. Anal. and Optimiz. 13 (1992), 185-202.
[14] H. Park, A parallel algorithm for the unbalanced orthogonal Procrustes problem, Parallel Computing 17 (1991), 913-923.
[15] S.T. Smith, Optimization techniques on Riemannian manifolds, Fields Institute Comm. 3 (1994), 113-136.
[16] J. von Neumann, Some matrix inequalities and metrization of the matrix space, Tomsk Univ. Rev. 1 (1937), 286-300.
[17] H. Weber and J. Wellstein, Enzyklopädie der Elementar-Mathematik, B.G. Teubner, Leipzig, 1915.

Electrical Engineering Department, Cornell University, Ithaca, N.Y. 14853
E-mail address: adamb@ee.cornell.edu

Department of Mathematics, Syracuse University, Syracuse, N.Y. 13244-1150
E-mail address: lutobor@mazur.syr.edu