
Comparing Two Matrices by Means of Isometric Projections

T. Cason, P.-A. Absil, P. Van Dooren
Université catholique de Louvain, Department of Mathematical Engineering
Bâtiment Euler, avenue Georges Lemaître 4, 1348 Louvain-la-Neuve, Belgium
http://www.inma.ucl.ac.be/{cason, absil, vandooren}

February 14, 2008

This paper presents research results of the Belgian Network DYSCO (Dynamical Systems, Control, and Optimization), funded by the Interuniversity Attraction Poles Programme, initiated by the Belgian State, Science Policy Office. The scientific responsibility rests with its authors.

Abstract. In this paper, we go over a number of optimization problems defined on a manifold in order to compare two matrices, possibly of different order. We consider several variants and show how these problems relate to various specific problems from the literature.

1 Introduction

When comparing two matrices $A$ and $B$ it is often natural to allow for a class of transformations acting on these matrices. For instance, when comparing adjacency matrices $A$ and $B$ of two graphs with an equal number of nodes, one can allow symmetric permutations $P^T A P$ on one matrix in order to compare it to $B$, since this is merely a relabelling of the nodes of $A$. The so-called comparison then consists in finding the best match between $A$ and $B$ under this class of transformations. A more general class of transformations would be that of unitary similarity transformations $Q^* A Q$, where $Q$ is a unitary matrix. This leaves the eigenvalues of $A$ unchanged but rotates its eigenvectors, which will of course play a role in the comparison between $A$ and $B$. If $A$ and $B$ are of different order, say $m$ and $n$, one may want to consider their restriction to a lower dimensional subspace: $U^* A U$ and $V^* B V$, with $U$ and $V$ belonging to $\mathrm{St}(k, m)$ and $\mathrm{St}(k, n)$ respectively, and where
$$\mathrm{St}(k, m) = \{U \in \mathbb{C}^{m \times k} : U^* U = I_k\}$$
denotes the compact Stiefel manifold. This yields two square matrices of equal dimension $k \le \min(m, n)$, which can again be compared.
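As a concrete illustration (ours, not from the paper), one can draw random points on the Stiefel manifolds, e.g. via thin QR factorizations, and form the two $k \times k$ restrictions; the helper name `random_stiefel` is hypothetical.

```python
import numpy as np
rng = np.random.default_rng(0)

def random_stiefel(m, k):
    """A random point of St(k, m): a complex m x k matrix with orthonormal columns."""
    Z = rng.standard_normal((m, k)) + 1j * rng.standard_normal((m, k))
    return np.linalg.qr(Z)[0]                 # thin QR: the Q factor lies in St(k, m)

m, n, k = 6, 5, 3
A = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

U, V = random_stiefel(m, k), random_stiefel(n, k)
print(np.allclose(U.conj().T @ U, np.eye(k)))          # True: U^* U = I_k

# The two restrictions ("isometric projections") are square matrices of order k.
A_res = U.conj().T @ A @ U
B_res = V.conj().T @ B @ V
print(A_res.shape, B_res.shape)                         # (3, 3) (3, 3)
```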

But one still needs to define a measure of comparison between these restrictions of $A$ and $B$, which clearly depends on $U$ and $V$. Fraikin et al. [1] propose in this context to maximize the inner product between the isometric projections $U^* A U$ and $V^* B V$, namely
$$\arg\max_{U^* U = I_k,\ V^* V = I_k} \langle U^* A U, V^* B V \rangle := \Re\,\mathrm{tr}\big((U^* A U)^* (V^* B V)\big),$$
where $\Re$ denotes the real part of a complex number. They show this is also equivalent to
$$\arg\max_{\substack{X = V U^* \\ U^* U = I_k,\ V^* V = I_k}} \langle X A, B X \rangle = \Re\,\mathrm{tr}(A^* X^* B X),$$
and eventually show how this problem is linked to the notion of graph similarity introduced by Blondel et al. in [2]. The graph similarity matrix $S$ introduced in that paper also proposes a measure of comparing two matrices $A$ and $B$ via the fixed point of a particular iteration. But it is shown in [3] that this is equivalent to the optimization problem
$$\arg\max_{\|S\|_F = 1} \langle S A, B S \rangle = \Re\,\mathrm{tr}\big((S A)^* B S\big)$$
or also
$$\arg\max_{\|S\|_F = 1} \langle S, B S A^* \rangle = \Re\,\mathrm{tr}\big((S A)^* B S\big).$$
Notice that $S$ also belongs to a Stiefel manifold, since $\mathrm{vec}(S) \in \mathrm{St}(1, mn)$.

In this paper, we use a distance measure rather than an inner product to compare two matrices. As distance measure between two matrices $M$ and $N$, we will use
$$\mathrm{dist}(M, N) = \|M - N\|_F^2 = \mathrm{tr}\big((M - N)^* (M - N)\big).$$
We will analyze distance minimization problems that are essentially the counterparts of the similarity measures defined above. These are
$$\arg\min_{U^* U = I_k,\ V^* V = I_k} \mathrm{dist}(U^* A U, V^* B V), \qquad
\arg\min_{\substack{X = V U^* \\ U^* U = I_k,\ V^* V = I_k}} \mathrm{dist}(X A, B X), \qquad
\arg\min_{\substack{X = V U^* \\ U^* U = I_k,\ V^* V = I_k}} \mathrm{dist}(X, B X A^*),$$
for the problems involving two isometries $U$ and $V$. Notice that these three distance problems are not equivalent although the corresponding inner product problems are equivalent. Similarly, we will analyze the two problems
$$\arg\min_{\|S\|_F = 1} \mathrm{dist}(S A, B S) = \mathrm{tr}\big((S A - B S)^* (S A - B S)\big)$$
and
$$\arg\min_{\|S\|_F = 1} \mathrm{dist}(S, B S A^*) = \mathrm{tr}\big((S - B S A^*)^* (S - B S A^*)\big)$$
for the problems involving a single matrix $S$. Again, these are not equivalent in their distance formulation although the corresponding inner product problems are equivalent.
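For concreteness, the five cost functions can be written down directly from these definitions. A minimal sketch (ours; the helper names are hypothetical):

```python
import numpy as np

def dist(M, N):
    """dist(M, N) = ||M - N||_F^2 = tr((M - N)^* (M - N))."""
    D = M - N
    return np.real(np.trace(D.conj().T @ D))

def cost_problem1(A, B, U, V):          # dist(U^* A U, V^* B V)
    return dist(U.conj().T @ A @ U, V.conj().T @ B @ V)

def cost_problem2(A, B, U, V):          # dist(XA, BX) with X = V U^*
    X = V @ U.conj().T
    return dist(X @ A, B @ X)

def cost_problem3(A, B, U, V):          # dist(X, B X A^*) with X = V U^*
    X = V @ U.conj().T
    return dist(X, B @ X @ A.conj().T)

def cost_problem4(A, B, S):             # dist(SA, BS), with ||S||_F = 1
    return dist(S @ A, B @ S)

def cost_problem5(A, B, S):             # dist(S, B S A^*), with ||S||_F = 1
    return dist(S, B @ S @ A.conj().T)
```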

We will develop optimality conditions for those different problems, indicate their relations with existing problems from the literature, and give an analytic solution for particular matrices $A$ and $B$.

2 The Problems and their Geometry

All those problems are defined on feasible sets that have a manifold structure. Roughly speaking, this means that the feasible set is locally smoothly identified with $\mathbb{R}^d$, where $d$ is the dimension of the manifold. Optimization on a manifold generalizes optimization in $\mathbb{R}^d$ while retaining the concept of smoothness. We refer the reader to [4, 5] for details.

A well known and largely used class of manifolds is the class of embedded submanifolds. The submersion theorem gives a useful sufficient condition to prove that a subset of a manifold $\mathcal{M}$ is an embedded submanifold of $\mathcal{M}$. If there exists a smooth mapping $F : \mathcal{M} \to \mathcal{N}$ between two manifolds of dimensions $d_m$ and $d_n\ (< d_m)$ and a point $y \in \mathcal{N}$ such that the rank of $F$ is equal to $d_n$ at each point of the level set $F^{-1}(y)$, then $F^{-1}(y)$ is an embedded submanifold of $\mathcal{M}$, of dimension $d_m - d_n$.

[Figure: the level set $F^{-1}(y)$ inside $\mathcal{M}$ is mapped by $F$ onto the point $y \in \mathcal{N}$.]

Example. The unitary group $\mathrm{U}(n) = \{Q \in \mathbb{C}^{n \times n} : Q^* Q = I_n\}$ is an embedded submanifold of $\mathbb{C}^{n \times n}$. Indeed, consider the function
$$F : \mathbb{C}^{n \times n} \to S_{\mathrm{Her}}(n) : Q \mapsto Q^* Q - I_n,$$
where $S_{\mathrm{Her}}(n)$ denotes the set of Hermitian matrices of order $n$. Clearly, $\mathrm{U}(n) = F^{-1}(0_n)$. It remains to show that for all $\hat H \in S_{\mathrm{Her}}(n)$ there exists an $H \in \mathbb{C}^{n \times n}$ such that $\mathrm{D}F(Q)[H] = Q^* H + H^* Q = \hat H$. It is easy to see that $\mathrm{D}F(Q)[Q \hat H / 2] = \hat H$, and according to the submersion theorem, it follows that $\mathrm{U}(n)$ is an embedded submanifold of $\mathbb{C}^{n \times n}$. The dimensions of $\mathbb{C}^{n \times n}$ and $S_{\mathrm{Her}}(n)$ are $2n^2$ and $n^2$ respectively. Hence $\mathrm{U}(n)$ is of dimension $n^2$.

In our problems, the embedding spaces are the matrix Euclidean spaces $\mathbb{C}^{m \times k} \times \mathbb{C}^{n \times k}$ and $\mathbb{C}^{n \times m}$, which have a trivial manifold structure since $\mathbb{C}^{m \times k} \times \mathbb{C}^{n \times k} \simeq \mathbb{R}^{2(m+n)k}$ and $\mathbb{C}^{n \times m} \simeq \mathbb{R}^{2mn}$.
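The computation $\mathrm{D}F(Q)[Q \hat H / 2] = \hat H$ in the example above is easy to verify numerically; a minimal sketch (ours):

```python
import numpy as np
rng = np.random.default_rng(0)

n = 4
# A point Q on U(n), obtained from the QR factorization of a random matrix.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))

# A random Hermitian "target" H_hat in S_Her(n).
Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H_hat = (Z + Z.conj().T) / 2

def DF(Q, H):
    """Derivative of F(Q) = Q^* Q - I_n in the direction H."""
    return Q.conj().T @ H + H.conj().T @ Q

# The direction H = Q H_hat / 2 is mapped onto H_hat, so DF(Q) is surjective here.
print(np.allclose(DF(Q, Q @ H_hat / 2), H_hat))   # True
```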

For each problem, we further analyze whether or not the feasible set is an embedded submanifold of its embedding space.

When working with a function on a manifold $\mathcal{M}$, one may be interested in having a local linear approximation of that function. Let $M$ be an element of $\mathcal{M}$ and let $\mathfrak{F}_M(\mathcal{M})$ denote the set of smooth real-valued functions defined on a neighborhood of $M$.

Definition 1. A tangent vector $\xi_M$ to a manifold $\mathcal{M}$ at a point $M$ is a mapping from $\mathfrak{F}_M(\mathcal{M})$ to $\mathbb{R}$ such that there exists a curve $\gamma$ on $\mathcal{M}$ with $\gamma(0) = M$, satisfying
$$\xi_M f = \left.\frac{\mathrm{d} f(\gamma(t))}{\mathrm{d} t}\right|_{t=0}, \qquad \forall f \in \mathfrak{F}_M(\mathcal{M}).$$
Such a curve $\gamma$ is said to realize the tangent vector $\xi_M$.

So, the only thing we need to know about a curve $\gamma$ in order to compute the first-order variation of a real-valued function $f$ at $\gamma(0)$ along $\gamma$ is the tangent vector $\xi_M$ realized by $\gamma$. The tangent space to $\mathcal{M}$ at $M$, denoted by $T_M\mathcal{M}$, is the set of all tangent vectors to $\mathcal{M}$ at $M$, and it admits a structure of vector space over $\mathbb{R}$.

When considering an embedded submanifold of a Euclidean space $\mathcal{E}$, any tangent vector $\xi_M$ of the manifold is identified with a vector $E$ of the Euclidean space. Indeed, let $\hat f$ be any differentiable extension of $f$ to $\mathcal{E}$; we have
$$\xi_M f := \left.\frac{\mathrm{d} f(\gamma(t))}{\mathrm{d} t}\right|_{t=0} = \mathrm{D}\hat f(M)[E], \quad (1)$$
where $E = \dot\gamma(0)$ and $\mathrm{D}$ is the directional derivative operator
$$\mathrm{D}\hat f(M)[E] = \lim_{t \to 0} \frac{\hat f(M + t E) - \hat f(M)}{t}.$$
The tangent space reduces to a linear subspace of the original space $\mathcal{E}$.

Example. Let $\gamma(t)$ be a curve on the unitary group $\mathrm{U}(n)$ passing through $Q$ at $t = 0$, i.e. $\gamma(t)^* \gamma(t) = I_n$ and $\gamma(0) = Q$. Differentiating with respect to $t$ yields $\dot\gamma(0)^* Q + Q^* \dot\gamma(0) = 0_n$. One can see from equation (1) that the tangent space to $\mathrm{U}(n)$ at $Q$ is contained in
$$\{E \in \mathbb{C}^{n \times n} : E^* Q + Q^* E = 0_n\} = \{Q \Omega \in \mathbb{C}^{n \times n} : \Omega + \Omega^* = 0_n\}. \quad (2)$$
Moreover, this set is a vector space over $\mathbb{R}$ of dimension $n^2$, and hence is the tangent space itself.

Let $g_M$ be an inner product defined on the tangent space $T_M\mathcal{M}$. The gradient of $f$ at $M$, denoted $\mathrm{grad}\, f(M)$, is defined as the unique element of the tangent space $T_M\mathcal{M}$ that satisfies
$$\xi_M f = g_M\big(\mathrm{grad}\, f(M), \xi_M\big), \qquad \forall \xi_M \in T_M\mathcal{M}.$$
The gradient, together with the inner product, fully characterizes the local first order approximation of a smooth function defined on the manifold.
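A short numerical illustration (ours) of Definition 1 and of the tangent space (2) of $\mathrm{U}(n)$: a skew-Hermitian $\Omega$ gives a tangent direction $E = Q\Omega$, realized for instance by the curve $\gamma(t) = Q \exp(t\Omega)$ (our choice of realizing curve, assuming SciPy is available for the matrix exponential), and the first-order variation of a smooth extension $\hat f$ along $\gamma$ matches the directional derivative in (1).

```python
import numpy as np
from scipy.linalg import expm
rng = np.random.default_rng(1)

n = 4
Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))

# A skew-Hermitian Omega gives a tangent vector E = Q Omega at Q, as in (2).
Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Omega = (Z - Z.conj().T) / 2
E = Q @ Omega
print(np.allclose(E.conj().T @ Q + Q.conj().T @ E, 0))   # True: E is tangent at Q

# gamma(t) = Q expm(t Omega) stays on U(n) and realizes E = gamma'(0); the
# first-order variation of f_hat along gamma agrees with D f_hat(Q)[E] in (1).
C = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
f_hat = lambda M: np.real(np.trace(M.conj().T @ C))      # a smooth extension on C^{n x n}
t = 1e-6
fd = (f_hat(Q @ expm(t * Omega)) - f_hat(Q)) / t          # finite difference along gamma
Df = np.real(np.trace(E.conj().T @ C))                    # D f_hat(Q)[E]
print(abs(fd - Df) < 1e-4)                                # True (up to discretization error)
```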

In the case of an embedded submanifold of a Euclidean space $\mathcal{E}$, since $T_M\mathcal{M}$ is a linear subspace of $T_M\mathcal{E}$, an inner product $\hat g_M$ on $T_M\mathcal{E}$ generates by restriction an inner product $g_M$ on $T_M\mathcal{M}$. The orthogonal complement of $T_M\mathcal{M}$ with respect to $\hat g_M$ is called the normal space to $\mathcal{M}$ at $M$ and is denoted by $(T_M\mathcal{M})^\perp$. The gradient of a smooth function $\hat f$ defined on the embedding manifold may be decomposed into its orthogonal projections on the tangent and normal space, respectively $P_M\,\mathrm{grad}\,\hat f(M)$ and $P_M^\perp\,\mathrm{grad}\,\hat f(M)$, and it follows that the gradient of $f$ (the restriction of $\hat f$ to $\mathcal{M}$) is the projection onto the tangent space of the gradient of $\hat f$:
$$\mathrm{grad}\, f(M) = P_M\,\mathrm{grad}\,\hat f(M).$$

Example. Let $A$ and $B$ be two Hermitian matrices. We define $\hat f : \mathbb{C}^{n \times n} \to \mathbb{R} : Q \mapsto \Re\,\mathrm{tr}(Q^* A Q B)$, and $f$ its restriction to the unitary group $\mathrm{U}(n)$. We have $\mathrm{D}\hat f(Q)[E] = 2\,\Re\,\mathrm{tr}(E^* A Q B)$. We endow the tangent space $T_Q\mathbb{C}^{n \times n}$ with an inner product
$$\hat g : T_Q\mathbb{C}^{n \times n} \times T_Q\mathbb{C}^{n \times n} \to \mathbb{R} : (E, F) \mapsto \Re\,\mathrm{tr}(E^* F),$$
and the gradient of $\hat f$ at $Q$ is then given by $\mathrm{grad}\,\hat f(Q) = 2 A Q B$. One can further define an orthogonal projection onto $T_Q\mathrm{U}(n)$,
$$P_Q E := E - Q\,\mathrm{Her}(Q^* E),$$
and the gradient of $f$ at $Q$ is given by $\mathrm{grad}\, f(Q) = P_Q\,\mathrm{grad}\,\hat f(Q)$.

Those relations are useful when one wishes to analyze optimization problems, and will hence be further developed for the problems we are interested in. Below we look at the various problems introduced earlier and focus on the first problem to make these ideas more explicit.

Problem 1. Given $A \in \mathbb{C}^{m \times m}$ and $B \in \mathbb{C}^{n \times n}$, let
$$\hat f : \mathbb{C}^{m \times k} \times \mathbb{C}^{n \times k} \to \mathbb{R} : (U, V) \mapsto \hat f(U, V) = \mathrm{dist}(U^* A U, V^* B V);$$
find the minimizer of
$$f : \mathrm{St}(k, m) \times \mathrm{St}(k, n) \to \mathbb{R} : (U, V) \mapsto f(U, V) = \hat f(U, V),$$
where $\mathrm{St}(k, m) = \{U \in \mathbb{C}^{m \times k} : U^* U = I_k\}$ denotes the compact Stiefel manifold.

Let $A = (A_1, A_2)$ and $B = (B_1, B_2)$ be pairs of matrices. We define the following useful operations:

an entrywise product, $A \odot B = (A_1 B_1, A_2 B_2)$,

a contraction product, $A \bullet B = A_1 B_1 + A_2 B_2$, and

a conjugate-transpose operation, $A^* = (A_1^*, A_2^*)$.
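These pair operations are straightforward to mirror in code, which may help when reading the formulas that follow. A minimal sketch (ours; the symbols $\odot$ and $\bullet$ are our rendering of the paper's notation, and the function names are hypothetical):

```python
import numpy as np

# Pairs of matrices are represented as 2-tuples of arrays.
def eprod(A, B):
    """Entrywise product of pairs: (A1, A2) (.) (B1, B2) = (A1 B1, A2 B2)."""
    return (A[0] @ B[0], A[1] @ B[1])

def cprod(A, B):
    """Contraction product of pairs: (A1, A2) (*) (B1, B2) = A1 B1 + A2 B2."""
    return A[0] @ B[0] + A[1] @ B[1]

def ctrans(A):
    """Conjugate-transpose of a pair: (A1, A2)^* = (A1^*, A2^*)."""
    return (A[0].conj().T, A[1].conj().T)

def as_pair(B):
    """Extend a single matrix B to the pair (B, B), as in the text."""
    return B if isinstance(B, tuple) else (B, B)
```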

The definitions of the binary operations $\odot$ and $\bullet$ are (for readability) extended to single matrices when one has to deal with pairs of identical matrices. Let, for instance, $A = (A_1, A_2)$ be a pair of matrices and $B$ be a single matrix; we define
$$A \odot B = (A_1, A_2) \odot B = (A_1, A_2) \odot (B, B) = (A_1 B, A_2 B),$$
$$A \bullet B = (A_1, A_2) \bullet B = (A_1, A_2) \bullet (B, B) = A_1 B + A_2 B.$$

The feasible set of Problem 1 is given by the Cartesian product of two compact Stiefel manifolds, namely $\mathcal{M} = \mathrm{St}(k, m) \times \mathrm{St}(k, n)$, and is hence a manifold itself (cf. [5]). Moreover, we can prove that $\mathcal{M}$ is an embedded submanifold of $\mathcal{E} = \mathbb{C}^{m \times k} \times \mathbb{C}^{n \times k}$. Indeed, consider the function
$$F : \mathcal{E} \to S_{\mathrm{Her}}(k) \times S_{\mathrm{Her}}(k) : M \mapsto M^* \odot M - (I_k, I_k),$$
where $S_{\mathrm{Her}}(k)$ denotes the set of Hermitian matrices of order $k$. Clearly, $\mathcal{M} = F^{-1}(0_k, 0_k)$. It remains to show that $F$ has full rank at each point $M \in \mathcal{M}$, i.e. for all $\hat Z \in S_{\mathrm{Her}}(k) \times S_{\mathrm{Her}}(k)$, there exists $Z \in \mathcal{E}$ such that $\mathrm{D}F(M)[Z] = \hat Z$. It is easy to see that $\mathrm{D}F(M)[M \odot \hat Z / 2] = \hat Z$, and according to the submersion theorem, it follows that $\mathcal{M}$ is an embedded submanifold of $\mathcal{E}$.

The tangent space to $\mathcal{E}$ at a point $M = (U, V) \in \mathcal{E}$ is the embedding space itself (i.e. $T_M\mathcal{E} \simeq \mathcal{E}$), whereas the tangent space to $\mathcal{M}$ at a point $M = (U, V) \in \mathcal{M}$ is given by
$$T_M\mathcal{M} := \{\dot\gamma(0) : \gamma \text{ differentiable curve on } \mathcal{M} \text{ with } \gamma(0) = M\}
= \{\xi = (\xi_U, \xi_V) : \mathrm{Her}(M^* \odot \xi) = (0_k, 0_k)\}
= \{M \odot (\Omega_U, \Omega_V) + M_\perp \odot (K_U, K_V) : \Omega_U, \Omega_V \in S_{s\mathrm{Her}}(k)\},$$
where $M_\perp = (U_\perp, V_\perp)$ with $U_\perp$ and $V_\perp$ any orthogonal complement of respectively $U$ and $V$, where $\mathrm{Her}(\cdot)$ stands for $\mathrm{Her} : X \mapsto (X + X^*)/2$, and where $S_{s\mathrm{Her}}(k)$ denotes the set of skew-Hermitian matrices of order $k$.

We endow the tangent space $T_M\mathcal{E}$ with an inner product
$$\hat g_M(\cdot, \cdot) : T_M\mathcal{E} \times T_M\mathcal{E} \to \mathbb{R} : (\xi, \zeta) \mapsto \hat g_M(\xi, \zeta) = \Re\,\mathrm{tr}(\xi^* \bullet \zeta),$$
and define its restriction to the tangent space $T_M\mathcal{M}\ (\subset T_M\mathcal{E})$:
$$g_M(\cdot, \cdot) : T_M\mathcal{M} \times T_M\mathcal{M} \to \mathbb{R} : (\xi, \zeta) \mapsto g_M(\xi, \zeta) = \hat g_M(\xi, \zeta).$$
One may now define the normal space to $\mathcal{M}$ at a point $M \in \mathcal{M}$:
$$T_M^\perp\mathcal{M} := \{\xi : \hat g_M(\xi, \zeta) = 0,\ \forall \zeta \in T_M\mathcal{M}\} = \{M \odot (H_U, H_V) : H_U, H_V \in S_{\mathrm{Her}}(k)\},$$
where $S_{\mathrm{Her}}(k)$ denotes the set of Hermitian matrices of order $k$.
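As a quick numerical check of these tangent and normal space characterizations (our sketch, not from the paper), one can build a tangent vector and a normal vector at a random point of $\mathcal{M}$ and verify that they are $\hat g_M$-orthogonal:

```python
import numpy as np
rng = np.random.default_rng(2)

def herm(X):
    """Her(X) = (X + X^*)/2."""
    return (X + X.conj().T) / 2

def random_stiefel(m, k):
    Z = rng.standard_normal((m, k)) + 1j * rng.standard_normal((m, k))
    return np.linalg.qr(Z)[0]

m, n, k = 6, 5, 3
U, V = random_stiefel(m, k), random_stiefel(n, k)

# A tangent vector: Her(U^* xi_U) = Her(V^* xi_V) = 0.
Z1 = rng.standard_normal((m, k)) + 1j * rng.standard_normal((m, k))
Z2 = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))
xi = (Z1 - U @ herm(U.conj().T @ Z1), Z2 - V @ herm(V.conj().T @ Z2))
print(np.allclose(herm(U.conj().T @ xi[0]), 0),
      np.allclose(herm(V.conj().T @ xi[1]), 0))          # True True

# A normal vector: M (.) (H_U, H_V) with H_U, H_V Hermitian.
H_U, H_V = herm(Z1[:k, :]), herm(Z2[:k, :])
zeta = (U @ H_U, V @ H_V)

# g_M(xi, zeta) = Re tr(xi_U^* zeta_U) + Re tr(xi_V^* zeta_V) vanishes.
g = np.real(np.trace(xi[0].conj().T @ zeta[0]) + np.trace(xi[1].conj().T @ zeta[1]))
print(abs(g) < 1e-12)                                    # True
```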

Problem 2. Given $A \in \mathbb{C}^{m \times m}$ and $B \in \mathbb{C}^{n \times n}$, let
$$\hat f : \mathbb{C}^{n \times m} \to \mathbb{R} : X \mapsto \hat f(X) = \mathrm{dist}(X A, B X);$$
find the minimizer of
$$f : \mathcal{M} \to \mathbb{R} : X \mapsto f(X) = \hat f(X),$$
where $\mathcal{M} = \{V U^* \in \mathbb{C}^{n \times m} : (U, V) \in \mathrm{St}(k, m) \times \mathrm{St}(k, n)\}$.

$\mathcal{M}$ is a smooth and connected manifold. Indeed, let $\Sigma := \begin{bmatrix} I_k & 0 \\ 0 & 0 \end{bmatrix}$ be an element of $\mathcal{M}$. Every $X \in \mathcal{M}$ is congruent to $\Sigma$ by the congruence action
$$\big((\tilde U, \tilde V), X\big) \mapsto \tilde V X \tilde U^*, \qquad (\tilde U, \tilde V) \in \mathrm{U}(m) \times \mathrm{U}(n),$$
where $\mathrm{U}(n) = \{U \in \mathbb{C}^{n \times n} : U^* U = I_n\}$ denotes the unitary group of degree $n$. The set $\mathcal{M}$ is thus an orbit of this smooth Lie group action of $\mathrm{U}(m) \times \mathrm{U}(n)$ on $\mathbb{C}^{n \times m}$ and therefore a smooth manifold [6, App. C]. $\mathcal{M}$ is the image of the connected set $\mathrm{U}(m) \times \mathrm{U}(n)$ under the continuous (and in fact smooth) map
$$\pi : \mathrm{U}(m) \times \mathrm{U}(n) \to \mathbb{C}^{n \times m}, \qquad \pi(\tilde U, \tilde V) = \tilde V \Sigma \tilde U^*,$$
and hence is also connected.

The tangent space to $\mathcal{M}$ at a point $X = V U^* \in \mathcal{M}$ is
$$T_X\mathcal{M} := \{\dot\gamma(0) : \gamma \text{ curve on } \mathcal{M} \text{ with } \gamma(0) = X\}
= \{\xi_V U^* + V \xi_U^* : \mathrm{Her}(V^* \xi_V) = \mathrm{Her}(U^* \xi_U) = 0_k\}
= \{V \Omega U^* + V_\perp K_V U^* + V K_U^* U_\perp^* : \Omega \in S_{s\mathrm{Her}}(k)\},$$
with $K_U$ and $K_V$ arbitrary. We endow the tangent space $T_X\mathbb{C}^{n \times m} \simeq \mathbb{C}^{n \times m}$ with an inner product
$$\hat g_X(\cdot, \cdot) : T_X\mathbb{C}^{n \times m} \times T_X\mathbb{C}^{n \times m} \to \mathbb{R} : (\xi, \zeta) \mapsto \hat g_X(\xi, \zeta) = \Re\,\mathrm{tr}(\xi^* \zeta),$$
and define its restriction to the tangent space $T_X\mathcal{M}\ (\subset T_X\mathcal{E})$:
$$g_X(\cdot, \cdot) : T_X\mathcal{M} \times T_X\mathcal{M} \to \mathbb{R} : (\xi, \zeta) \mapsto g_X(\xi, \zeta) = \hat g_X(\xi, \zeta).$$
One may now define the normal space to $\mathcal{M}$ at a point $X \in \mathcal{M}$:
$$T_X^\perp\mathcal{M} := \{\xi : \hat g_X(\xi, \zeta) = 0,\ \forall \zeta \in T_X\mathcal{M}\} = \{V H U^* + V_\perp K U_\perp^* : H \in S_{\mathrm{Her}}(k)\}.$$

Problem 3. Given $A \in \mathbb{C}^{m \times m}$ and $B \in \mathbb{C}^{n \times n}$, let
$$\hat f : \mathbb{C}^{n \times m} \to \mathbb{R} : X \mapsto \hat f(X) = \mathrm{dist}(X, B X A^*);$$
find the minimizer of
$$f : \mathcal{M} \to \mathbb{R} : X \mapsto f(X) = \hat f(X),$$
where $\mathcal{M} = \{V U^* \in \mathbb{C}^{n \times m} : (U, V) \in \mathrm{St}(k, m) \times \mathrm{St}(k, n)\}$.
Since they have the same feasible set, the topological developments obtained for Problem 2 also hold for Problem 3.
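A small numerical illustration (ours, not from the paper) of the congruence to $\Sigma$ stated above: since $X = V U^*$ is unitarily equivalent to $\Sigma$, its singular values are $k$ ones followed by zeros, and the SVD exhibits a suitable pair $(\tilde U, \tilde V)$.

```python
import numpy as np
rng = np.random.default_rng(3)

def random_stiefel(m, k):
    Z = rng.standard_normal((m, k)) + 1j * rng.standard_normal((m, k))
    return np.linalg.qr(Z)[0]

m, n, k = 6, 5, 3
U, V = random_stiefel(m, k), random_stiefel(n, k)
X = V @ U.conj().T                          # a point of the feasible set M, X in C^{n x m}

# X is unitarily equivalent to Sigma = [[I_k, 0], [0, 0]]: its singular values
# are k ones followed by zeros.
s = np.linalg.svd(X, compute_uv=False)
print(np.allclose(s, np.concatenate([np.ones(k), np.zeros(min(m, n) - k)])))   # True
```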

Problem 4. Given $A \in \mathbb{C}^{m \times m}$ and $B \in \mathbb{C}^{n \times n}$, let
$$\hat f : \mathbb{C}^{n \times m} \to \mathbb{R} : S \mapsto \hat f(S) = \mathrm{dist}(S A, B S);$$
find the minimizer of
$$f : \mathcal{M} \to \mathbb{R} : S \mapsto f(S) = \hat f(S),$$
where $\mathcal{M} = \{S \in \mathbb{C}^{n \times m} : \|S\|_F = 1\}$.

The tangent space to $\mathcal{M}$ at a point $S \in \mathcal{M}$ is
$$T_S\mathcal{M} = \{\xi : \Re\,\mathrm{tr}(\xi^* S) = 0\}.$$
We endow the tangent space $T_S\mathbb{C}^{n \times m} \simeq \mathbb{C}^{n \times m}$ with an inner product
$$\hat g_S(\cdot, \cdot) : T_S\mathbb{C}^{n \times m} \times T_S\mathbb{C}^{n \times m} \to \mathbb{R} : (\xi, \zeta) \mapsto \hat g_S(\xi, \zeta) = \Re\,\mathrm{tr}(\xi^* \zeta),$$
and define its restriction to the tangent space $T_S\mathcal{M}\ (\subset T_S\mathcal{E})$:
$$g_S(\cdot, \cdot) : T_S\mathcal{M} \times T_S\mathcal{M} \to \mathbb{R} : (\xi, \zeta) \mapsto g_S(\xi, \zeta) = \hat g_S(\xi, \zeta).$$
One may now define the normal space to $\mathcal{M}$ at a point $S \in \mathcal{M}$:
$$T_S^\perp\mathcal{M} := \{\xi : \hat g_S(\xi, \zeta) = 0,\ \forall \zeta \in T_S\mathcal{M}\} = \{\alpha S : \alpha \in \mathbb{R}\}.$$

Problem 5. Given $A \in \mathbb{C}^{m \times m}$ and $B \in \mathbb{C}^{n \times n}$, let
$$\hat f : \mathbb{C}^{n \times m} \to \mathbb{R} : S \mapsto \hat f(S) = \mathrm{dist}(S, B S A^*);$$
find the minimizer of
$$f : \mathcal{M} \to \mathbb{R} : S \mapsto f(S) = \hat f(S),$$
where $\mathcal{M} = \{S \in \mathbb{C}^{n \times m} : \|S\|_F = 1\}$.
Since they have the same feasible set, the topological developments obtained for Problem 4 also hold for Problem 5.

3 Optimality conditions

Our problems are optimization problems of smooth functions defined on a compact domain $\mathcal{M}$, and therefore there always exists an optimal solution $M \in \mathcal{M}$ where the first order optimality condition is satisfied,
$$\mathrm{grad}\, f(M) = 0. \quad (3)$$
We study the stationary points of Problem 1 in detail, and we show how the other problems can be tackled.

Problem 1. We first analyze this optimality condition for Problem 1. For any $(W, Z) \in T_M\mathcal{E}$, we have
$$\mathrm{D}\hat f(U, V)[(W, Z)] = 2\,\Re\,\mathrm{tr}\Big((U^* A U - V^* B V)^* \big(W^* A U + U^* A W - Z^* B V - V^* B Z\big)\Big)
= \hat g_{(U,V)}\Big(2\big(A U \Delta_{AB}^* + A^* U \Delta_{AB},\ B V \Delta_{BA}^* + B^* V \Delta_{BA}\big),\ (W, Z)\Big), \quad (4)$$
with $\Delta_{AB} := U^* A U - V^* B V =: -\Delta_{BA}$, and hence the gradient of $\hat f$ at a point $(U, V) \in \mathcal{E}$ is
$$\mathrm{grad}\,\hat f(U, V) = 2\big(A U \Delta_{AB}^* + A^* U \Delta_{AB},\ B V \Delta_{BA}^* + B^* V \Delta_{BA}\big). \quad (5)$$
Since the normal space $T_M^\perp\mathcal{M}$ is the orthogonal complement of the tangent space $T_M\mathcal{M}$, one can, for any $M \in \mathcal{M}$, decompose any $E \in \mathcal{E}$ into its orthogonal projections on $T_M\mathcal{M}$ and $T_M^\perp\mathcal{M}$:
$$P_M E := E - P_M^\perp E \qquad\text{and}\qquad P_M^\perp E := M \odot \mathrm{Her}(M^* \odot E). \quad (6)$$
For any $(W, Z) \in T_M\mathcal{M}$, (4) yields
$$\mathrm{D}\hat f(U, V)[(W, Z)] = g_{(U,V)}\big(P_M\,\mathrm{grad}\,\hat f(U, V),\ (W, Z)\big),$$
and the gradient of $f$ at a point $(U, V) \in \mathcal{M}$ is
$$\mathrm{grad}\, f(U, V) = P_M\,\mathrm{grad}\,\hat f(U, V). \quad (7)$$
For our problem, the first order optimality condition (3) yields, by means of (5), (6) and (7),
$$\begin{pmatrix} A U \Delta_{AB}^* + A^* U \Delta_{AB} \\ B V \Delta_{BA}^* + B^* V \Delta_{BA} \end{pmatrix}
= \begin{pmatrix} U\,\mathrm{Her}\big(U^* A U \Delta_{AB}^* + U^* A^* U \Delta_{AB}\big) \\ V\,\mathrm{Her}\big(V^* B V \Delta_{BA}^* + V^* B^* V \Delta_{BA}\big) \end{pmatrix}. \quad (8)$$

Observe that $f$ is constant on the equivalence classes
$$[U, V] = \{(U, V) \odot Q : Q \in \mathrm{U}(k)\},$$
and that any point of $[U, V]$ is a stationary point of $f$ whenever $(U, V)$ is.

We consider the special case where $U^* A U$ and $V^* B V$ are simultaneously diagonalizable by a unitary matrix at all stationary points $(U, V)$ (it follows from (8) that this happens when $A$ and $B$ are both Hermitian), i.e. the eigendecompositions of $U^* A U$ and $V^* B V$ are respectively $W D_A W^*$ and $W D_B W^*$, with $W \in \mathrm{U}(k)$, $D_A = \mathrm{diag}(\theta_1^A, \ldots, \theta_k^A)$ and $D_B = \mathrm{diag}(\theta_1^B, \ldots, \theta_k^B)$. The cost function at stationary points simply reduces to
$$\sum_{i=1}^k \big(\theta_i^A - \theta_i^B\big)^2,$$
and the minimization problem roughly consists in finding the isometric projections $U^* A U$, $V^* B V$ such that their eigenvalues are as equal as possible. More precisely, the first order optimality condition becomes
$$\Big( (A \bar U - \bar U D_A)(D_A - D_B),\ (B \bar V - \bar V D_B)(D_B - D_A) \Big) = (0, 0), \quad (9)$$
that is,
$$(A \bar U_i - \bar U_i \theta_i^A)(\theta_i^A - \theta_i^B) = 0 \quad\text{and}\quad (B \bar V_i - \bar V_i \theta_i^B)(\theta_i^B - \theta_i^A) = 0, \qquad i = 1, \ldots, k, \quad (10)$$
where $\bar U_i$ and $\bar V_i$ denote the $i$-th columns of $\bar U = U W$ and $\bar V = V W$ respectively. This implies that for all $i = 1, \ldots, k$ either $\theta_i^A = \theta_i^B$, or $(\theta_i^A, \bar U_i)$ and $(\theta_i^B, \bar V_i)$ are eigenpairs of respectively $A$ and $B$. If $A$ and $B$ are Hermitian matrices and $\alpha_1 \le \alpha_2 \le \cdots \le \alpha_m$ and $\beta_1 \le \beta_2 \le \cdots \le \beta_n$ are their respective eigenvalues, the Cauchy interlacing theorem yields, after reordering of the $\theta_i^A$ and $\theta_i^B$ in an increasing fashion,
$$\theta_i^A \in [\alpha_i, \alpha_{i-k+m}] \quad\text{and}\quad \theta_i^B \in [\beta_i, \beta_{i-k+n}], \qquad i = 1, \ldots, k.$$
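Equations (5)-(7) suggest a simple first-order method for Problem 1. The following is a minimal numerical sketch (ours, not from the paper); in particular the QR-based qf retraction and the crude adaptive step size are our choices, and the accepted iterates are non-increasing in cost by construction.

```python
import numpy as np
rng = np.random.default_rng(4)

def herm(X):
    return (X + X.conj().T) / 2

def random_stiefel(m, k):
    Z = rng.standard_normal((m, k)) + 1j * rng.standard_normal((m, k))
    return np.linalg.qr(Z)[0]

def cost(A, B, U, V):
    """f(U, V) = || U^* A U - V^* B V ||_F^2."""
    D = U.conj().T @ A @ U - V.conj().T @ B @ V
    return np.real(np.trace(D.conj().T @ D))

def riemannian_grad(A, B, U, V):
    """Euclidean gradient (5) projected onto the tangent space, as in (6)-(7)."""
    D = U.conj().T @ A @ U - V.conj().T @ B @ V          # Delta_AB = -Delta_BA
    GU = 2 * (A @ U @ D.conj().T + A.conj().T @ U @ D)
    GV = 2 * (B @ V @ (-D).conj().T + B.conj().T @ V @ (-D))
    return (GU - U @ herm(U.conj().T @ GU), GV - V @ herm(V.conj().T @ GV))

def retract(U, xi):
    """qf retraction (our choice): Q factor of U + xi, with R's diagonal made positive."""
    Q, R = np.linalg.qr(U + xi)
    d = np.diagonal(R)
    return Q * (d / np.abs(d))

m, n, k = 8, 6, 3
A = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
U, V = random_stiefel(m, k), random_stiefel(n, k)

step, c = 1e-3, cost(A, B, U, V)
for _ in range(2000):
    GU, GV = riemannian_grad(A, B, U, V)
    Un, Vn = retract(U, -step * GU), retract(V, -step * GV)
    cn = cost(A, B, Un, Vn)
    if cn < c:                       # accept the step and grow it slightly
        U, V, c, step = Un, Vn, cn, 1.2 * step
    else:                            # reject the step and shrink it
        step *= 0.5
print("cost after descent:", c)
```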

Definition 2. For $S_1$ and $S_2$, two non-empty subsets of a metric space, we define
$$e_d(S_1, S_2) = \inf_{s_1 \in S_1,\ s_2 \in S_2} d(s_1, s_2).$$
When considering two non-empty intervals $[\alpha_1, \alpha_2]$ and $[\beta_1, \beta_2]$ of $\mathbb{R}$, one can easily see that
$$e_d([\alpha_1, \alpha_2], [\beta_1, \beta_2]) = \max(0,\ \alpha_1 - \beta_2,\ \beta_1 - \alpha_2).$$

Theorem 3.1. Let $\alpha_1 \le \alpha_2 \le \cdots \le \alpha_m$ and $\beta_1 \le \beta_2 \le \cdots \le \beta_n$ be the eigenvalues of the Hermitian matrices $A$ and $B$ respectively. The solution of Problem 1 is bounded below by
$$\sum_{i=1}^k \Big(e_d\big([\alpha_i, \alpha_{i-k+m}], [\beta_i, \beta_{i-k+n}]\big)\Big)^2,$$
with $d$ the Euclidean distance.

Proof. Recall that when $A$ and $B$ are Hermitian matrices, the cost function at stationary points of Problem 1 reduces to
$$\sum_{i=1}^k \big(\theta_i^A - \theta_i^B\big)^2,$$
where $\theta_1^A, \ldots, \theta_k^A$ and $\theta_1^B, \ldots, \theta_k^B$ are the eigenvalues of $U^* A U$ and $V^* B V$, respectively. It follows from the Cauchy interlacing theorem that the minimum of this function is bounded below by
$$\min_{\theta_i^A, \theta_i^B}\ \min_\pi\ \sum_{i=1}^k \big(\theta_i^A - \theta_{\pi(i)}^B\big)^2 \quad (11)$$
such that
$$\theta_1^A \le \theta_2^A \le \cdots \le \theta_k^A, \qquad \theta_1^B \le \theta_2^B \le \cdots \le \theta_k^B, \quad (12)$$
$$\theta_i^A \in [\alpha_i, \alpha_{i-k+m}], \qquad \theta_i^B \in [\beta_i, \beta_{i-k+n}], \quad (13)$$
and $\pi(\cdot)$ is a permutation of $\{1, \ldots, k\}$.

Let $\theta_1^A, \ldots, \theta_k^A$ and $\theta_1^B, \ldots, \theta_k^B$ satisfy (12). Then the identity permutation $\pi(i) = i$ is optimal for problem (11). Indeed, if $\pi$ is not the identity, then there exist $i$ and $j$ such that $i < j$ and $\pi(i) > \pi(j)$, and we have
$$\big(\theta_i^A - \theta_{\pi(i)}^B\big)^2 + \big(\theta_j^A - \theta_{\pi(j)}^B\big)^2 - \Big[\big(\theta_j^A - \theta_{\pi(i)}^B\big)^2 + \big(\theta_i^A - \theta_{\pi(j)}^B\big)^2\Big] = 2\big(\theta_j^A - \theta_i^A\big)\big(\theta_{\pi(i)}^B - \theta_{\pi(j)}^B\big) \ge 0.$$
Since the identity permutation is optimal, our minimization problem simply reduces to
$$\min_{(12),(13)} \sum_{i=1}^k \big(\theta_i^A - \theta_i^B\big)^2.$$
We now show that (12) can be relaxed. Indeed, assume there is an optimal solution that does not satisfy the ordering condition, i.e. there exist $i$ and $j$, $i < j$, such that $\theta_j^A \le \theta_i^A$. One can see that the following inequalities hold:
$$\alpha_i \le \alpha_j \le \theta_j^A \le \theta_i^A \le \alpha_{i-k+m} \le \alpha_{j-k+m}.$$
Since $\theta_i^A$ belongs to $[\alpha_j, \alpha_{j-k+m}]$ and $\theta_j^A$ belongs to $[\alpha_i, \alpha_{i-k+m}]$, one can switch $i$ and $j$ and build an ordered solution that does not change the cost function and hence remains optimal. It follows that
$$\min_{(12),(13)} \sum_{i=1}^k \big(\theta_i^A - \theta_i^B\big)^2 \quad\text{is equal to}\quad \min_{(13)} \sum_{i=1}^k \big(\theta_i^A - \theta_i^B\big)^2.$$
This result is precisely what we were looking for.
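The lower bound of Theorem 3.1 is easy to evaluate numerically. A minimal sketch (ours; the helper names `interval_gap` and `problem1_lower_bound` are hypothetical):

```python
import numpy as np

def interval_gap(a, b):
    """e_d([a1, a2], [b1, b2]) = max(0, a1 - b2, b1 - a2)."""
    return max(0.0, a[0] - b[1], b[0] - a[1])

def problem1_lower_bound(A, B, k):
    """Lower bound of Theorem 3.1 for Hermitian A (m x m) and B (n x n)."""
    alpha = np.sort(np.linalg.eigvalsh(A))          # alpha_1 <= ... <= alpha_m
    beta = np.sort(np.linalg.eigvalsh(B))           # beta_1  <= ... <= beta_n
    m, n = len(alpha), len(beta)
    total = 0.0
    for i in range(1, k + 1):                       # 1-based indices as in the text
        Ia = (alpha[i - 1], alpha[i - 1 + m - k])   # [alpha_i, alpha_{i-k+m}]
        Ib = (beta[i - 1], beta[i - 1 + n - k])     # [beta_i,  beta_{i-k+n}]
        total += interval_gap(Ia, Ib) ** 2
    return total

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 6)); A = (A + A.T) / 2
B = rng.standard_normal((7, 7)); B = (B + B.T) / 2 + 8 * np.eye(7)   # shifted so the spectra separate
print(problem1_lower_bound(A, B, k=3))
```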

We conjecture that the Cauchy interlacing conditions are tight in the following sense:

Conjecture 1. Let $A \in S_{\mathrm{Her}}(m)$ with eigenvalues $\alpha_1 \le \alpha_2 \le \cdots \le \alpha_m$. Let $\theta_1 \le \theta_2 \le \cdots \le \theta_k$ satisfy the interlacing conditions $\theta_i \in [\alpha_i, \alpha_{i-k+m}]$, $i = 1, \ldots, k$. Then there exists $U \in \mathrm{St}(k, m)$ such that $\{\theta_1, \theta_2, \ldots, \theta_k\}$ is the spectrum of $U^* A U$.

If this conjecture holds, then Theorem 3.1 holds without the words "bounded below by". Figure 1 gives an example of optimal matching.

Figure 1 (schematic omitted): Let the $\alpha_i$ and $\beta_i$ be the eigenvalues of the Hermitian matrices $A$ and $B$ (here $m = 6$ and $n = 7$), and $k = 3$. Problem 1 is then equivalent to $\min_{\theta_i^A, \theta_i^B} \sum_{i=1}^{3} (\theta_i^A - \theta_i^B)^2$ such that $\theta_i^A \in [\alpha_i, \alpha_{i+3}]$, $\theta_i^B \in [\beta_i, \beta_{i+4}]$. The first two terms of this sum have strictly positive contributions whereas the third one can be reduced to zero within a continuous set of values for $\theta_3^A$ and $\theta_3^B$ in $[\alpha_3, \beta_7]$.

Problem 2. For all $Y \in T_X\mathbb{C}^{n \times m} \simeq \mathbb{C}^{n \times m}$, we have
$$\mathrm{D}\hat f(X)[Y] = 2\,\Re\,\mathrm{tr}\big(Y^* (X A A^* - B^* X A - B X A^* + B^* B X)\big), \quad (14)$$
and hence the gradient of $\hat f$ at a point $X \in \mathbb{C}^{n \times m}$ is
$$\mathrm{grad}\,\hat f(X) = 2\,(X A A^* - B^* X A - B X A^* + B^* B X). \quad (15)$$
Since the normal space $T_X^\perp\mathcal{M}$ is the orthogonal complement of the tangent space $T_X\mathcal{M}$, one can, for any $X = V U^* \in \mathcal{M}$, decompose any $E \in \mathbb{C}^{n \times m}$ into its orthogonal projections on $T_X\mathcal{M}$ and $T_X^\perp\mathcal{M}$:
$$P_X E = E - V\,\mathrm{Her}(V^* E U)\,U^* - (I_n - V V^*)\,E\,(I_m - U U^*), \qquad
P_X^\perp E = V\,\mathrm{Her}(V^* E U)\,U^* + (I_n - V V^*)\,E\,(I_m - U U^*). \quad (16)$$
For any $Y \in T_X\mathcal{M}$, (14) hence yields
$$\mathrm{D}\hat f(X)[Y] = \mathrm{D} f(X)[Y] = g_X\big(P_X\,\mathrm{grad}\,\hat f(X),\ Y\big),$$
and the gradient of $f$ at a point $X = V U^* \in \mathcal{M}$ is
$$\mathrm{grad}\, f(X) = P_X\,\mathrm{grad}\,\hat f(X). \quad (17)$$

Problem 3. This problem is very similar to Problem 2. We have
$$\mathrm{D}\hat f(X)[Y] = 2\,\Re\,\mathrm{tr}\big(Y^* (X - B^* X A - B X A^* + B^* B X A^* A)\big) \quad (18)$$
for all $Y \in T_X\mathbb{C}^{n \times m} \simeq \mathbb{C}^{n \times m}$, and hence the gradient of $\hat f$ at a point $X \in \mathbb{C}^{n \times m}$ is
$$\mathrm{grad}\,\hat f(X) = 2\,(X - B^* X A - B X A^* + B^* B X A^* A).$$
The feasible set is the same as in Problem 2. Hence the orthogonal decomposition (16) holds, and the gradient of $f$ at a point $X = V U^* \in \mathcal{M}$ is $\mathrm{grad}\, f(X) = P_X\,\mathrm{grad}\,\hat f(X)$.

Problem 4. For all $T \in T_S\mathbb{C}^{n \times m} \simeq \mathbb{C}^{n \times m}$, we have
$$\mathrm{D}\hat f(S)[T] = 2\,\Re\,\mathrm{tr}\big(T^* (S A A^* - B^* S A - B S A^* + B^* B S)\big), \quad (19)$$
and hence the gradient of $\hat f$ at a point $S \in \mathbb{C}^{n \times m}$ is
$$\mathrm{grad}\,\hat f(S) = 2\,(S A A^* - B^* S A - B S A^* + B^* B S). \quad (20)$$
Since the normal space $T_S^\perp\mathcal{M}$ is the orthogonal complement of the tangent space $T_S\mathcal{M}$, one can, for any $S \in \mathcal{M}$, decompose any $E \in \mathbb{C}^{n \times m}$ into its orthogonal projections on $T_S\mathcal{M}$ and $T_S^\perp\mathcal{M}$:
$$P_S E = E - S\,\Re\,\mathrm{tr}(S^* E) \qquad\text{and}\qquad P_S^\perp E = S\,\Re\,\mathrm{tr}(S^* E). \quad (21)$$
For any $T \in T_S\mathcal{M}$, (19) then yields
$$\mathrm{D}\hat f(S)[T] = \mathrm{D} f(S)[T] = g_S\big(P_S\,\mathrm{grad}\,\hat f(S),\ T\big),$$
and the gradient of $f$ at a point $S \in \mathcal{M}$ is $\mathrm{grad}\, f(S) = P_S\,\mathrm{grad}\,\hat f(S)$.

For our problem, (3) yields, by means of (20) and (21),
$$\lambda S = (S A - B S) A^* - B^* (S A - B S), \qquad\text{where } \lambda = \mathrm{tr}\big((S A - B S)^* (S A - B S)\big) = \hat f(S).$$
Its equivalent vectorized form is
$$\lambda\,\mathrm{vec}(S) = \big(A^T \otimes I - I \otimes B\big)^* \big(A^T \otimes I - I \otimes B\big)\,\mathrm{vec}(S).$$
Hence, the stationary points of Problem 4 are given by the eigenvectors of $\big(A^T \otimes I - I \otimes B\big)^* \big(A^T \otimes I - I \otimes B\big)$. The cost function $f$ simply reduces to the corresponding eigenvalue, and the minimal cost is then the smallest eigenvalue.
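Since the stationary points of Problem 4 are eigenvectors of $(A^T \otimes I - I \otimes B)^*(A^T \otimes I - I \otimes B)$, the global minimizer can be computed directly. The following sketch (ours) uses the equivalent smallest-singular-vector formulation of the same eigenproblem.

```python
import numpy as np
rng = np.random.default_rng(6)

m, n = 5, 4
A = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# vec(SA - BS) = (A^T kron I_n - I_m kron B) vec(S), with column-stacking vec.
L = np.kron(A.T, np.eye(n)) - np.kron(np.eye(m), B)

# Minimizer of ||SA - BS||_F^2 over ||S||_F = 1: eigenvector of L^* L for the
# smallest eigenvalue, i.e. the right singular vector of L for the smallest sigma.
_, sig, Vh = np.linalg.svd(L)
v = Vh[-1, :].conj()                       # vec(S_opt), of unit 2-norm
S = v.reshape((n, m), order='F')           # undo the column-stacking vec

# The optimal cost equals the smallest eigenvalue of L^* L, i.e. sigma_min^2.
opt_cost = np.linalg.norm(S @ A - B @ S, 'fro') ** 2
print(np.isclose(opt_cost, sig[-1] ** 2))           # True
print(np.isclose(np.linalg.norm(S, 'fro'), 1.0))    # True
```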

Problem 5. This problem is very similar to Problem 4. A similar approach yields
$$\lambda S = (S - B S A^*) - B^* (S - B S A^*) A, \qquad\text{where } \lambda = \mathrm{tr}\big((S - B S A^*)^* (S - B S A^*)\big).$$
Its equivalent vectorized form is
$$\lambda\,\mathrm{vec}(S) = \big(I \otimes I - \bar A \otimes B\big)^* \big(I \otimes I - \bar A \otimes B\big)\,\mathrm{vec}(S),$$
where $\bar A$ denotes the complex conjugate of $A$. Hence, the stationary points of Problem 5 are given by the eigenvectors of $\big(I \otimes I - \bar A \otimes B\big)^* \big(I \otimes I - \bar A \otimes B\big)$, and the cost function $f$ again simply reduces to the corresponding eigenvalue; the minimal cost is then the smallest eigenvalue.

4 Relation to the Crawford Number

The field of values of a square matrix $A$ is defined as the set of complex numbers
$$F(A) := \{x^* A x : x^* x = 1\},$$
and is known to be a closed convex set [7]. The Crawford number is defined as the distance from that compact set to the origin,
$$\mathrm{Cr}(A) := \min\{|\lambda| : \lambda \in F(A)\},$$
and can be computed e.g. with techniques described in [7]. One could define the generalized Crawford number of two matrices $A$ and $B$ as the distance between $F(A)$ and $F(B)$, i.e.
$$\mathrm{Cr}(A, B) := \min\{|\lambda - \mu| : \lambda \in F(A),\ \mu \in F(B)\}.$$
Clearly, $\mathrm{Cr}(A, 0) = \mathrm{Cr}(A)$, which thus generalizes the concept. Moreover, this is a special case of our problem since
$$\mathrm{Cr}(A, B) = \min_{u^* u = v^* v = 1} |u^* A u - v^* B v|.$$
One can say that Problem 1 is a $k$-dimensional extension of this problem.
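A rough numerical estimate of $\mathrm{Cr}(A, B)$ can be obtained by sampling boundary points of the two fields of values via the standard support-function computation, i.e. the top eigenvector of $\mathrm{Her}(e^{-i\theta} M)$ over a grid of angles $\theta$. This sampling approach is our own illustration, not the technique of [7]; it is only exact in the limit of a fine grid, and the boundary-to-boundary minimum coincides with $\mathrm{Cr}(A, B)$ only when the two convex sets are disjoint (otherwise the true value is 0).

```python
import numpy as np
rng = np.random.default_rng(7)

def fov_boundary(M, num=360):
    """Boundary points of the field of values F(M): x^* M x, where x is the top
    eigenvector of Her(exp(-i theta) M), for theta on a uniform grid."""
    pts = []
    for theta in np.linspace(0, 2 * np.pi, num, endpoint=False):
        H = (np.exp(-1j * theta) * M + np.exp(1j * theta) * M.conj().T) / 2
        _, X = np.linalg.eigh(H)
        x = X[:, -1]                              # eigenvector of the largest eigenvalue
        pts.append(x.conj() @ M @ x)
    return np.array(pts)

A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)) + 15 * np.eye(4)  # shifted apart

FA, FB = fov_boundary(A), fov_boundary(B)
# For this shifted example the two convex sets are disjoint, so the minimum of
# |lambda - mu| over F(A) x F(B) is attained on their boundaries.
est = np.min(np.abs(FA[:, None] - FB[None, :]))
print("estimated Cr(A, B):", est)
```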

References

[1] C. Fraikin, Y. Nesterov, and P. Van Dooren. Optimizing the coupling between two isometric projections of matrices. SIAM Journal on Matrix Analysis and Applications, 2007.

[2] V. D. Blondel, A. Gajardo, M. Heymans, P. Senellart, and P. Van Dooren. A measure of similarity between graph vertices: applications to synonym extraction and Web searching. SIAM Review, 46(4):647-666, 2004.

[3] C. Fraikin and P. Van Dooren. Graph matching with type constraints on nodes and edges. In Andreas Frommer, Michael W. Mahoney, and Daniel B. Szyld, editors, Web Information Retrieval and Linear Algebra Algorithms, number 07071 in Dagstuhl Seminar Proceedings, Dagstuhl, Germany, 2007. Internationales Begegnungs- und Forschungszentrum für Informatik (IBFI).

[4] J. M. Lee. Introduction to Smooth Manifolds. Springer, New York, 2006.

[5] P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton, NJ, January 2008.

[6] U. Helmke and J. B. Moore. Optimization and Dynamical Systems. Communications and Control Engineering Series. Springer-Verlag London Ltd., London, 1994. With a foreword by R. Brockett.

[7] R. Horn and C. R. Johnson. Topics in Matrix Analysis. Cambridge University Press, New York, 1991.