Disentangling Orthogonal Matrices

Teng Zhang
Department of Mathematics, University of Central Florida, Orlando, Florida 32816, USA (teng.zhang@ucf.edu)

Amit Singer
Department of Mathematics and the Program in Applied and Computational Mathematics (PACM), Princeton University, Princeton, New Jersey 08544, USA (amits@math.princeton.edu)

Abstract. Motivated by a certain molecular reconstruction methodology in cryo-electron microscopy, we consider the problem of solving a linear system with two unknown orthogonal matrices, which is a generalization of the well-known orthogonal Procrustes problem. We propose an algorithm based on a semi-definite programming (SDP) relaxation and give a theoretical guarantee for its performance. Both theoretically and empirically, the proposed algorithm performs better than the naïve approach of solving the linear system directly without the orthogonality constraints. We also consider the generalization to linear systems with more than two unknown orthogonal matrices.

Keywords: SDP relaxation, orthogonal Procrustes problem, cryo-EM
MSC[2010]: 15A24, 90C22

1. Introduction

In this paper, we consider the following problem: given known matrices X_1, X_2 ∈ R^{N×D} and unknown orthogonal matrices V_1, V_2 ∈ O(D), recover V_1 and V_2 from X_3 ∈ R^{N×D} defined by

    X_3 = X_1 V_1 + X_2 V_2.                                              (1)

A naïve approach would be to solve (1) while dropping the orthogonality constraints on V_1 and V_2. This linear system has ND linear constraints and 2D^2 unknown variables; therefore, this approach can recover V_1 and V_2 when N ≥ 2D. The question is: can we develop an algorithm that takes the orthogonality constraints into consideration, so that it is able to recover V_1 and V_2 when N < 2D, and more stably when the observation X_3 is contaminated by noise?
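To make the naïve approach concrete, the following minimal sketch (our own illustration, not code from the paper; the function name naive_ls and the array layout are assumptions) solves the unconstrained linear system with numpy when N ≥ 2D. Section 3 notes that such unconstrained estimates are afterwards rounded to the nearest orthogonal matrices.

    # Naive least squares: stack [X1 X2] and solve for [V1; V2], ignoring orthogonality.
    # Only determined when N >= 2D, since [X1 X2] must have full column rank 2D.
    import numpy as np

    def naive_ls(X1, X2, X3):
        N, D = X1.shape
        A = np.hstack([X1, X2])                       # N x 2D
        W = np.linalg.lstsq(A, X3, rcond=None)[0]     # 2D x D least-squares solution
        return W[:D], W[D:]                           # estimates of V1 and V2 (not yet orthogonal)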

The associated least squares problem

    min_{V_1, V_2 ∈ O(D)} ‖X_1 V_1 + X_2 V_2 − X_3‖_F^2                   (2)

can be considered as a generalization of the well-known orthogonal Procrustes problem [1]:

    min_{V ∈ O(D)} ‖X_1 V − X_2‖_F^2,                                      (3)

with the main difference being that the minimization in (2) is over two orthogonal matrices instead of just one in (3). Although the orthogonal Procrustes problem has a closed-form solution using the singular value decomposition, problem (2) does not enjoy this property. Still, (2) can be reformulated so that it belongs to a wider class of problems called the little Grothendieck problem [2], which in turn belongs to QO-OC (Quadratic Optimization under Orthogonality Constraints) considered by Nemirovski [3]. QO-OCs have been well studied and include many important problems as special cases, such as Max-Cut [4] and the generalized orthogonal Procrustes problem [5, 6, 7]

    min_{V_1, ..., V_n ∈ O(D)} Σ_{1 ≤ i < j ≤ n} ‖X_i V_i − X_j V_j‖_F^2,

which has applications to areas such as psychometrics, image and shape analysis, and biometric identification. The non-commutative little Grothendieck problem is defined by

    min_{V_1, ..., V_n ∈ O(D)} Σ_{i,j=1}^n tr(C_ij V_i V_j^T).             (4)

Problem (2) can be considered as a special case of (4) with n = 3. The argument is as follows. For convenience, we homogenize (1) by introducing a slack orthogonal variable V_3 ∈ O(D) and consider the augmented linear system

    X_1 V_1 + X_2 V_2 + X_3 V_3 = 0.                                       (5)

Clearly, if (V_1, V_2, V_3) is a solution to (5), then the pair (V_1 V_3^T, V_2 V_3^T) is a solution to the original linear system (1). The least squares formulation corresponding to (5) is

    min_{V_1, V_2, V_3 ∈ O(D)} ‖X_1 V_1 + X_2 V_2 + X_3 V_3‖_F^2.          (6)

Let C ∈ R^{3D×3D} be a Hermitian matrix with the (i, j)-th D×D block given by C_ij = X_i^T X_j. The least squares problem (6) is equivalent to

    min_{V_1, V_2, V_3 ∈ O(D)} Σ_{i,j=1}^3 tr(C_ij V_j V_i^T),             (7)

which is the little Grothendieck problem (4) with n = 3.

1.1. Motivation

Our problem arises naturally in single particle reconstruction (SPR) from cryo-electron microscopy (cryo-EM), where the goal is to determine the 3D structure of a macromolecule complex from 2D projection images of identical, but randomly oriented, copies of the macromolecule. Zvi Kam [8] showed that the spherical harmonic expansion coefficients of the 3D molecule, when arranged as matrices, can be estimated from 2D projection images up to an orthogonal matrix (for each degree of spherical harmonics). Based on this observation, Bhamre et al. [9] recently proposed Orthogonal Replacement (OR), an orthogonal matrix retrieval procedure in which cryo-EM projection images are available for two unknown structures φ^(1) and φ^(2) whose difference φ^(2) − φ^(1) is known. It follows from Kam's theory that we are given the spherical harmonic expansion coefficients of φ^(1) and φ^(2) up to an orthogonal matrix, together with their difference. The problem of recovering the spherical harmonic expansion coefficients of φ^(1) and φ^(2) is then reduced to the mathematical problem (1). If (1) can be solved for smaller N, then we can reconstruct φ^(1) and φ^(2) at higher resolution. The cryo-EM application serves as the main motivation of this paper; we refer the reader to [9] for further details regarding the specific application to cryo-EM.

2. Algorithm and Main result

The little Grothendieck problem and QO-OCs are generally intractable; for example, it is well known that the Max-Cut problem is NP-hard. Many approximation algorithms have been proposed and analyzed [4, 3, 10, 11, 12, 2], and the common principle of these algorithms is to apply a semi-definite programming (SDP) relaxation followed by a rounding procedure. The SDP can be solved in polynomial time (to any finite precision). Based on the same principle, we relax problem (7) to an SDP as follows. Let H ∈ R^{3D×3D} be a Hermitian matrix with the (i, j)-th D×D block given by H_ij = V_i V_j^T. Then (7) is equivalent to

    min tr(CH),  subject to  H ⪰ 0,  H_ii = I,  rank(H) = D,

where H ⪰ 0 denotes that H is a positive semidefinite matrix. The only non-convex constraint is the rank constraint. Dropping it leads to the following SDP:

    min tr(CH),  subject to  H ⪰ 0  and  H_ii = I.                         (8)

If the solution satisfies rank(H) = D, then V_1, V_2 and V_3 are extracted by applying a Cholesky decomposition to H. Notice that the solution to (5) is not unique: if (V_1, V_2, V_3) satisfies (5), then for any V ∈ O(D), the triplet (V_1 V, V_2 V, V_3 V) satisfies (5) as well. Although the solution to (5) is not unique, the solution to the original problem (2) is uniquely given by (V_1 V_3^T, V_2 V_3^T).
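To illustrate relaxation (8), here is a small sketch of the SDP in Python with cvxpy (the paper's experiments use CVX in Matlab; this translation, the function name solve_sdp, and the input convention are our assumptions). It takes the blocks of the homogenized data matrix and returns the optimal H.

    # SDP relaxation (8): minimize tr(C H) over H >= 0 with identity D x D diagonal blocks.
    import numpy as np
    import cvxpy as cp

    def solve_sdp(Xs):                                # Xs = [X_1, ..., X_K], each N x D
        D, K = Xs[0].shape[1], len(Xs)
        X = np.hstack(Xs)                             # N x KD
        C = X.T @ X                                   # C_ij = X_i^T X_j
        H = cp.Variable((K * D, K * D), PSD=True)
        constraints = [H[i*D:(i+1)*D, i*D:(i+1)*D] == np.eye(D) for i in range(K)]
        cp.Problem(cp.Minimize(cp.trace(C @ H)), constraints).solve()
        return H.value

Under the sign convention of (5), the observed block would be passed with a sign flip (e.g., as −X_3), so that the true orthogonal matrices make the objective vanish.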

When the solution matrix H has rank greater than D, we employ the rounding procedure described in [2]: the Cholesky decomposition of H gives H = U^T U, where U ∈ R^{3D×3D} and U = (U_1, U_2, U_3) with U_i ∈ R^{3D×D}. We then let the approximate solutions to (2) be the nearest orthogonal matrices to U_1^T U_3 and U_2^T U_3, obtained by the procedure in [13]: for a matrix Z with singular value decomposition Z = U_Z Σ_Z V_Z^T, the nearest orthogonal matrix is U_Z V_Z^T = Z (Z^T Z)^{−1/2}. Specifically, our solutions are given by

    V_1 = U_1^T U_3 (U_3^T U_1 U_1^T U_3)^{−1/2}  and  V_2 = U_2^T U_3 (U_3^T U_2 U_2^T U_3)^{−1/2}.    (9)
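A small sketch of this rounding step (again our own illustration; we factor H by an eigendecomposition rather than the Cholesky decomposition mentioned above, since the numerical solution may be rank deficient, and the helper names are hypothetical):

    # Rounding (9): factor H = U^T U and map U_1^T U_3, U_2^T U_3 to their nearest orthogonal matrices.
    import numpy as np

    def nearest_orthogonal(Z):
        Uz, _, Vzt = np.linalg.svd(Z)
        return Uz @ Vzt                               # equals Z (Z^T Z)^{-1/2} for nonsingular Z

    def round_solution(H, D):
        w, Q = np.linalg.eigh(H)                      # H = Q diag(w) Q^T, with w >= 0 up to round-off
        U = (Q * np.sqrt(np.clip(w, 0, None))).T      # then H = U^T U
        U1, U2, U3 = U[:, :D], U[:, D:2*D], U[:, 2*D:3*D]
        return nearest_orthogonal(U1.T @ U3), nearest_orthogonal(U2.T @ U3)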

2.1. Notations

Without loss of generality, we assume that V_1 = V_2 = I in the theoretical analysis. We define the subspace L and the set A as follows:

    L = {x ∈ C^{3D} : x = (v, v, v) for some v ∈ C^D},
    A = {x ∈ C^{3D} : x = (c_1 v, c_2 v, c_3 v) for some c_1, c_2, c_3 ∈ C and v ∈ C^D}.

For any matrix A ∈ C^{m×n}, Sp(A) represents the subspace spanned by the row vectors of A. L + L' represents the subspace {x + y : x ∈ L, y ∈ L'}, P_L is the projector onto the subspace L, and P_L(L') denotes the orthogonal projection of the subspace L' onto L, i.e., {P_L x : x ∈ L'}. For any matrix A with 3D columns, we use the same notation with subscripts to denote its decomposition into three parts: for A ∈ R^{m×3D}, A = (A_1, A_2, A_3), where A_i ∈ R^{m×D}.

2.2. Main results

The main contribution of this paper is a theoretical guarantee that the SDP approach returns a solution of rank D and recovers V_1 and V_2 exactly. We start with a theorem that gives a lower bound on the objective function in (8).

Theorem 2.1. For any U = (U_1, U_2, U_3) ∈ C^{k×3D} that satisfies U_i ∈ C^{k×D} and U_i^T U_i = I,

    ‖X U^T‖_F ≥ c(X) ( ‖U P_{L^⊥}‖_F / (200D) )^{2^{2D}}.

Here c(X) is a constant defined as follows: c(X) = min_{v ∈ A ∩ L^⊥, ‖v‖ = 1} ‖X v‖.

Based on Theorem 2.1 and ‖X U^T‖_F^2 = tr(CH), this paper proves that when N ≥ D + 1, the SDP method recovers the orthogonal matrices in generic cases, i.e., the property holds for (X_1, X_2) in a dense open subset of R^{N×D} × R^{N×D}. The statement is as follows, and part (b) shows that the SDP method is stable to noise.

Theorem 2.2. (a) For generic X_1, X_2 ∈ R^{N×D} with N ≥ D + 1, the SDP method recovers V_1 and V_2 exactly.
(b) For generic X_1, X_2 ∈ R^{N×D} with N ≥ D + 1, suppose that the input matrices satisfy ‖X̂_i − X_i‖_F ≤ ε for 1 ≤ i ≤ 3. Then the SDP method recovers {V_i}_{i=1}^2 approximately, in the sense that the error between the recovered orthogonal matrix V̂_i and the true orthogonal matrix V_i, ‖V̂_i − V_i‖_F, is bounded above by O(ε^{2^{−2D}}).

This result shows that the SDP method successfully recovers the orthogonal matrices as long as N ≥ D + 1, compared with the stringent requirement N ≥ 2D for the naïve least squares approach. The condition N ≥ D + 1 is nearly optimal: in (1), there are ND constraints and D(D − 1) variables, hence it is impossible to recover V_1 and V_2 when N < D − 1. While existing works on the sensitivity of SDP problems give bounds similar to the one in Theorem 2.2(b) [14], this bound is a worst-case guarantee; in practice ‖V̂_i − V_i‖_F is usually much smaller, as shown in Table 2 of Section 3. We also remark that Theorem 2.2 can be generalized to the complex case: the proof applies to unitary matrices as well. In the complex case, there are 2ND constraints and 2D^2 degrees of freedom, so it is impossible to recover V_1 and V_2 when N < D. Moreover, we suspect that recovery is impossible even for N = D, which would suggest that the sufficient condition N ≥ D + 1 in Theorem 2.2(a) is also necessary: in fact, it is easy to verify the impossibility of recovering V_1 and V_2 when N = D = 1.

2.3. Generalization

A natural generalization of (1) is the following problem: given known matrices X_1, X_2, ..., X_{K−1} ∈ R^{N×D} and unknown orthogonal matrices V_1, V_2, ..., V_{K−1} ∈ O(D), recover {V_i}_{i=1}^{K−1} from

    X_K = Σ_{i=1}^{K−1} X_i V_i.                                           (10)

For this generalized problem, the SDP method is formulated as follows: let H ∈ R^{KD×KD} be a Hermitian matrix with the (i, j)-th D×D block given by H_ij = V_i V_j^T. Then the SDP method solves

    min tr(CH),  subject to  H ⪰ 0  and  H_ii = I for all 1 ≤ i ≤ K,       (11)

where H_ii represents the (i, i)-th D×D block, and the orthogonal matrices are then extracted by the procedure (9). For this generalized problem and its associated SDP approach, we have the following theoretical guarantee.

Theorem 2.3. For generic {X_i}_{i=1}^{K−1} ⊂ R^{N×D}, where N ≥ (K − 2)D + 1, and X_K = (Σ_{i=1}^{K−1} X_i V_i) V_K^T, the SDP method recovers {V_i}_{i=1}^{K−1} exactly.

Theorem 2.2(a) can be considered as a special case of Theorem 2.3 with K = 3. However, for K > 3, the condition N ≥ (K − 2)D + 1 is not close to optimal.

Indeed, since (10) has ND constraints and D(D − 1)(K − 1)/2 variables, the information-theoretic limit is N = (D − 1)(K − 1)/2. Simulations in Section 3 also show that the SDP approach empirically recovers the orthogonal matrices even when N is smaller than (K − 2)D + 1. Nevertheless, the theoretical guarantee in Theorem 2.3 is still more powerful than that of the least squares approach of solving min_{{V_i}_{i=1}^{K−1} ⊂ R^{D×D}} ‖Σ_{i=1}^{K−1} X_i V_i − X_K‖_F^2, which requires N ≥ (K − 1)D to recover {V_i}_{i=1}^{K−1}.

3. Numerical Experiments

In this section, we compare several methods for solving (1) and (10) on artificial data sets. The data sets are generated as follows: {X_i}_{i=1}^{K−1} are random matrices with i.i.d. standard Gaussian entries N(0, 1), {V_i}_{i=1}^{K−1} are random orthogonal matrices, and X_K is generated by (10) (see the sketch after the following list). We compare the following five methods:

1. The SDP relaxation approach (SDP) described in Section 2.

2. The naïve least squares approach (LS):

       min_{{V_i}_{i=1}^{K−1}} ‖X_K − Σ_{i=1}^{K−1} X_i V_i‖_F^2.

3. Since the convex hull of the set of orthogonal matrices is the set of matrices with operator norm not greater than one, we can strengthen the LS approach by constraining its domain (C-LS):

       min_{{V_i}_{i=1}^{K−1}} ‖X_K − Σ_{i=1}^{K−1} X_i V_i‖_F^2,  subject to  ‖V_i‖ ≤ 1 for 1 ≤ i ≤ K − 1.

4. An approach suggested to us by Afonso Bandeira. Let us start with the case K = 3. If V_3 = V_1 V_2^T, then from (1), X_3 V_2^T = X_1 V_3 + X_2 and X_3 V_1^T = X_1 + X_2 V_3^T. We then solve the expanded least squares problem based on these three equations (LS+):

       min_{V_1, V_2, V_3} ‖X_3 − X_1 V_1 − X_2 V_2‖_F^2 + ‖X_3 V_2^T − X_1 V_3 − X_2‖_F^2 + ‖X_3 V_1^T − X_1 − X_2 V_3^T‖_F^2.

   In general, for K ≥ 3, this method can be written as

       min_{H ∈ R^{KD×KD}} tr(CH^2),  subject to  H = H^T and H_ii = I for all 1 ≤ i ≤ K,

   where H_ij represents the (i, j)-th D×D block of H.

5. The LS+ approach with constraints on the operator norm of H_ij (C-LS+):

       min_{H ∈ R^{KD×KD}} tr(CH^2),  subject to  H = H^T, H_ii = I and ‖H_ij‖ ≤ 1 for all 1 ≤ i, j ≤ K.
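The data generation described above can be sketched as follows (our own illustration; sampling random orthogonal matrices through a sign-corrected QR factorization is a standard device and is our assumption, as the paper does not specify how the V_i are drawn):

    # Synthetic instance of (10): Gaussian X_1, ..., X_{K-1}, random V_i in O(D),
    # and the observation X_K = sum_i X_i V_i.
    import numpy as np

    def random_orthogonal(D, rng):
        Q, R = np.linalg.qr(rng.standard_normal((D, D)))
        return Q * np.sign(np.diag(R))                # sign correction for a uniform draw from O(D)

    def generate_instance(N, D, K, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        Xs = [rng.standard_normal((N, D)) for _ in range(K - 1)]
        Vs = [random_orthogonal(D, rng) for _ in range(K - 1)]
        XK = sum(X @ V for X, V in zip(Xs, Vs))
        return Xs, Vs, XK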

To compare the SDP/LS+/C-LS+ approaches, we summarize their objective functions and their constraints in Table 1. There are two main differences. First, the objective functions differ. However, since tr(CH) = 0 if and only if tr(CH^2) = 0 (considering C ⪰ 0 and H ⪰ 0), this difference does not affect the property of exact recovery. Second, the constraint set of the SDP approach is more restrictive than that of the C-LS+ approach (H_ii = H_jj = I and H ⪰ 0 imply ‖H_ij‖ ≤ 1), which in turn is more restrictive than that of the C-LS approach. This observation partially explains the fact that SDP performs better than C-LS+, and C-LS+ performs better than C-LS. However, these interpretations do not explain the empirical finding in Figures 1 and 2 that C-LS+ and SDP behave very similarly in the absence of noise. We leave the explanation of this observation as an open question.

Table 1: Comparison between the SDP, LS+ and C-LS+ approaches.

            objective function    common constraint    other constraints
    SDP     tr(CH)                H_ii = I             H ⪰ 0
    LS+     tr(CH^2)              H_ii = I             H = H^T
    C-LS+   tr(CH^2)              H_ii = I             H = H^T, ‖H_ij‖ ≤ 1

Among these optimization approaches, the LS method has an explicit solution obtained by decomposing it into D sub-problems, each a regression problem that estimates (K − 1)D regression parameters. All other methods are convex and can be solved by CVX [15], where the default solver SeDuMi is used [16]. While the LS+ approach can also be written as a least squares problem with an explicit solution, this problem is not decomposable (unlike the LS method). When the solution matrices of LS/C-LS are not orthogonal, they are rounded to the nearest orthogonal matrices using the approach in [13]. The rounding procedure of LS+/C-LS+ is the same as that of the SDP method in (9).

In the first simulation, we aim to find the size of N such that the orthogonal matrices are exactly recovered by the suggested algorithms for K = 3. We let D = 10 or 20, choose various values of N, and record the mean recovery error of V_1 (in Frobenius norm) over 50 repeated simulations in Figure 1. The performance of LS verifies our theoretical analysis: it recovers the orthogonal matrices for N ≥ 2D. LS fails when N < 2D because the null space of [X_1, X_2] is nontrivial and there are infinitely many solutions. Besides, LS+ succeeds when N ≥ 3D/2. SDP and C-LS+ are the best approaches and they succeed when N ≥ D + 1, which verifies Theorem 2.2.

In the second simulation, we test the stability of the suggested algorithms when K = 3 and the measurement matrix X_3 is contaminated elementwise by Gaussian noise N(0, σ^2). We use the setting N = 12, 16, 22, D = 10 and σ = 0.01 or 0.1, and record the mean recovery error over 50 runs in Table 2, which shows that the SDP relaxation approach is more stable to noise than the competing approaches. This motivates our interest in studying the SDP approach.

In the third simulation, we compare these methods for K = 5 and D = 5, 10. The results are shown in Figure 2.

Figure 1: The dependence of the mean recovery error with respect to N, when D = 10 (left panel) and D = 20 (right panel). The y-axis represents the mean recovery error of V_1 in Frobenius norm.

Table 2: The mean recovery error in the noisy setting for K = 3 and D = 10.

    N    σ    SDP    C-LS+    LS+    C-LS    LS

Figure 2: The dependence of the mean recovery error with respect to N, when K = 5, D = 5 (left panel) and K = 5, D = 10 (right panel). The y-axis represents the mean recovery error of V_1 in Frobenius norm.

This simulation verifies Theorem 2.3 by showing that the SDP approach successfully recovers the orthogonal matrices for N ≥ (K − 2)D + 1. Indeed, the empirical performance of the SDP approach is even better: it recovers the {V_i} at N = 12 and 25 respectively, which are smaller than (K − 2)D + 1. Compared with LS/LS+/C-LS, the SDP and C-LS+ approaches recover the orthogonal matrices with smaller N.

Finally, we record the running time of all approaches in Table 3. Although the running time is not the main focus of this paper, and CVX is not optimized for these formulations, the table gives a sense of the relative running times. Table 3 clearly shows that the LS approach is much faster than the other approaches, and that the SDP approach is consistently faster than C-LS+. We suspect this is because SDP has fewer constraints, even though the constraint set of SDP is more restrictive than that of C-LS+.

Table 3: The average running time (in seconds) when σ = 0.1, K = 3, D = 10, N = 15 (first row) and σ = 0.1, K = 5, D = 10, N = 35 (second row).

         SDP    C-LS+    LS+    C-LS    LS
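Putting the sketches above together, one trial of a first-simulation-style experiment could look as follows (our own illustration; it relies on the hypothetical helpers generate_instance, solve_sdp and round_solution introduced earlier, and feeds the observation with a sign flip so that the homogenized system (5) sums to zero):

    # One trial for K = 3: generate data, solve the SDP relaxation, round, and measure the errors.
    import numpy as np

    def one_trial(N=12, D=10, seed=0):
        rng = np.random.default_rng(seed)
        Xs, Vs, X3 = generate_instance(N, D, K=3, rng=rng)
        H = solve_sdp([Xs[0], Xs[1], -X3])            # minus sign: X1 V1 + X2 V2 - X3 = 0
        V1_hat, V2_hat = round_solution(H, D)
        return (np.linalg.norm(V1_hat - Vs[0], 'fro'),
                np.linalg.norm(V2_hat - Vs[1], 'fro'))

Averaging the returned errors over repeated trials and a grid of N gives mean recovery errors of the kind reported in Figures 1 and 2.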

4. Proof of the main result

4.1. Auxiliary lemmas

This section presents two auxiliary lemmas and their proofs.

Lemma 4.1. Given linear maps f_1, f_2 : C^n → V, where Im(f_1) ∩ Im(f_2) = {0} and Im(f_1) + Im(f_2) = V, for any given subspace L ⊆ V with dim(L) ≥ n, there exist a nonzero vector x ∈ L, scalars c_1, c_2 ∈ C and a vector v ∈ C^n such that x = c_1 f_1(v) + c_2 f_2(v).

Proof. WLOG assume that dim(L) = n. Let {e_1, ..., e_n} be the coordinate system in C^n and {f_1(e_1), ..., f_1(e_n), f_2(e_1), ..., f_2(e_n)} be the coordinate system in V, and assume that L = Im(C), where C = [C_1; C_2] ∈ C^{2n×n} is the vertical concatenation of C_1, C_2 ∈ C^{n×n}. Then it is sufficient to show that there exist y ∈ C^n and λ ∈ C such that C_1 y = λ C_2 y, or C_2 y = 0. When C_2 is invertible, let y be any eigenvector of C_2^{−1} C_1; then C_1 y = λ C_2 y for some λ ∈ C. When C_2 is not invertible, C_2 is singular and we can find y such that C_2 y = 0. Therefore, Lemma 4.1 is proved.

Lemma 4.2.

    ‖AB‖_F ≤ ‖A‖ ‖B‖_F.                                                    (12)

Proof. Write B = (b_1, ..., b_n); then AB = (A b_1, ..., A b_n). Adding up the inequalities ‖A b_i‖^2 ≤ ‖A‖^2 ‖b_i‖^2 for 1 ≤ i ≤ n proves (12).

4.2. Proof of Theorem 2.1

The main idea of the proof is to find Ũ^(1) whose row space intersects A and is approximately contained in the row space of U. Then ‖X U^T‖ can be bounded below in terms of c(X) and the approximation error.

By the SVD of U P_{L^⊥}, there exists a unitary matrix U_0 ∈ C^{k×k} such that the rows of U_0 U P_{L^⊥} are orthogonal with norms σ_1 ≥ σ_2 ≥ ... ≥ σ_k. We split U_0 U into two parts,

    U_0 U = [U^(1); U^(2)],

where U^(1) ∈ C^{d×3D} and U^(2) ∈ C^{(k−d)×3D}. Then for any unit vector x in C^d,

    ‖x U^(1) P_{L^⊥}‖ ≥ σ_d,  and                                          (13)
    ‖U^(2) P_{L^⊥}‖ ≤ σ_{d+1}.                                             (14)

Let ker(U_1^(1)) = {x ∈ C^D : U_1^(1) x = 0} and

    Ũ^(1) = (U_1^(1) P_{ker(U_1^(1))^⊥}, U_2^(1) P_{ker(U_1^(1))^⊥}, U_3^(1) P_{ker(U_1^(1))^⊥}).

The argument for Theorem 2.1 is based on the following three properties; their proofs are deferred to Section 4.2.1.

1. Ũ^(1) − U^(1) is small:

       ‖Ũ^(1) − U^(1)‖ ≤ 4 √σ_{d+1}.                                       (15)

2. The row space of Ũ^(1) contains a vector from A if σ_d − 4 √σ_{d+1} > 0.

3. Assume σ_i = 0 for k + 1 ≤ i ≤ 2D + 1. Then

       max_{1 ≤ i ≤ 2D} (σ_i − 8 √σ_{i+1}) ≥ ( ‖U P_{L^⊥}‖_F / (100 √(2D)) )^{2^{2D}}.      (16)

By properties 2 and 3, there exists d such that A ∩ Sp(Ũ^(1)) contains a nonzero vector, which we write as v Ũ^(1), where v ∈ C^d and ‖v‖ = 1. Then, by (13) and (15),

    ‖v Ũ^(1) P_{L^⊥}‖ ≥ ‖v U^(1) P_{L^⊥}‖ − ‖Ũ^(1) − U^(1)‖ ≥ σ_d − 4 √σ_{d+1},

and consequently

    ‖v U^(1) P_{L^⊥}‖ ≥ ‖v Ũ^(1) P_{L^⊥}‖ − ‖U^(1) − Ũ^(1)‖ ≥ σ_d − 8 √σ_{d+1}.

Then, since the unit vector in the direction of P_{L^⊥} Ũ^{(1)T} v^T lies in A ∩ L^⊥ and X P_L = 0,

    ‖X U^{(1)T}‖_F = ‖X P_{L^⊥} U^{(1)T}‖_F ≥ (σ_d − 8 √σ_{d+1}) ‖X P_{L^⊥} Ũ^{(1)T} v^T‖ / ‖P_{L^⊥} Ũ^{(1)T} v^T‖ ≥ (σ_d − 8 √σ_{d+1}) c(X).     (17)

Combining this with (16) and ‖X U^T‖_F ≥ ‖X U^{(1)T}‖_F, Theorem 2.1 is proved.

4.2.1. Proof of the three properties

The argument for the first property is as follows. For any x ∈ ker(U_1^(1)) with ‖x‖ = 1, U_1^(1) x = 0 and ‖U_1^(2) x‖ = ‖U_1 x‖ = 1. Applying (14), we have

    ‖U_1^(2) − U_2^(2)‖ = ‖U^(2) [I, −I, 0]^T‖ = ‖U^(2) P_{L^⊥} [I, −I, 0]^T‖ ≤ ‖U^(2) P_{L^⊥}‖ ‖[I, −I, 0]‖ ≤ √2 σ_{d+1},

and ‖U_2^(2) x‖ ≥ ‖U_1^(2) x‖ − ‖(U_1^(2) − U_2^(2)) x‖ ≥ 1 − √2 σ_{d+1}. As a result, ‖U_2^(1) x‖ ≤ 2 √σ_{d+1}. Since this holds for any unit vector x, ‖U_2^(1) P_{ker(U_1^(1))}‖ ≤ 2 √σ_{d+1}. Similarly, ‖U_3^(1) P_{ker(U_1^(1))}‖ ≤ 2 √σ_{d+1}. Property 1 is concluded as follows:

    ‖Ũ^(1) − U^(1)‖ = ‖(0, U_2^(1) P_{ker(U_1^(1))}, U_3^(1) P_{ker(U_1^(1))})‖ ≤ 4 √σ_{d+1}.     (18)

To prove the second property, combine (15) with (13): when σ_d − 4 √σ_{d+1} > 0, this implies that

    dim(Sp(Ũ^(1))) = dim(Sp(U^(1))) ≥ dim(Sp(U_1^(1))) = dim(ker(U_1^(1))^⊥).

Let f_1, f_2 : ker(U_1^(1))^⊥ → C^{3D} be defined by f_1(x) = [x, x, 0] and f_2(x) = [x, 0, x]; then Lemma 4.1 implies that Sp(Ũ^(1)) contains a vector in A.

Finally, the argument for the last property (16) is as follows. Let a = max_{1 ≤ i ≤ 2D} (σ_i − 8 √σ_{i+1}); then a < 3 since a ≤ max_{1 ≤ i ≤ 2D} σ_i ≤ ‖U‖ ≤ Σ_{i=1}^3 ‖U_i‖ = 3. Applying the definition of a with σ_{2D+1} = 0, we obtain σ_{2D} ≤ a. Then σ_{2D−1} ≤ a + 8 √σ_{2D} ≤ a + 8 √a < 100 a^{1/2}. Now use an induction argument: assume that for d ≤ 2D we have σ_d ≤ 100 a^{2^{d−2D}}; then σ_{d−1} ≤ a + 8 √σ_d ≤ a + 80 a^{2^{(d−1)−2D}} < 100 a^{2^{(d−1)−2D}}. Therefore σ_1 ≤ 100 a^{2^{−2D}} and a ≥ (σ_1 / 100)^{2^{2D}}. Since σ_1 ≥ √(Σ_{i=1}^{2D} σ_i^2 / (2D)) ≥ ‖U P_{L^⊥}‖_F / √(2D), (16) is proved.

4.3. Proof of Theorem 2.2

Part (a) follows from the result in part (b) with ε = 0: then we have ‖U P_{L^⊥}‖_F = 0 and U_1 = U_2 = U_3. Therefore H = (I, I, I)^T (I, I, I) has rank D, which means that the SDP approach recovers V_1 and V_2.

In the proof of part (b), we represent the noisy setting in (8) by X̂, Ĉ and Ĥ, the clean setting by X, C and H, and write the Cholesky decompositions of H and Ĥ as H = U^T U and Ĥ = Û^T Û. Since Û_i^T Û_i = I, ‖Û‖ ≤ Σ_{i=1}^3 ‖Û_i‖ = 3, and (12) implies

    ‖X Û^T‖_F − ‖X̂ Û^T‖_F ≤ ‖(X − X̂) Û^T‖_F ≤ 3 ‖X − X̂‖_F ≤ 9ε,           (19)

and, following the same argument,

    ‖X̂ U^T‖_F ≤ 9ε + ‖X U^T‖_F.                                            (20)

Since ‖X̂ Û^T‖_F ≤ ‖X̂ U^T‖_F and ‖X U^T‖_F = 0, (20) implies

    ‖X̂ Û^T‖_F ≤ 9ε.                                                        (21)

Combining (19) and (21) with Theorem 2.1, we have

    9ε ≥ ‖X̂ Û^T‖_F ≥ ‖X Û^T‖_F − 9ε ≥ c(X) ( ‖Û P_{L^⊥}‖_F / (200D) )^{2^{2D}} − 9ε,        (22)

and hence

    ‖Û P_{L^⊥}‖_F ≤ 200D ( 18ε / c(X) )^{2^{−2D}}.                          (23)

Then

    ‖Û_1 − Û_2‖_F = ‖Û P_{L^⊥} (I, −I, 0)^T‖_F ≤ ‖Û P_{L^⊥}‖_F ‖(I, −I, 0)^T‖ ≤ 400D ( 18ε / c(X) )^{2^{−2D}},

and

    ‖Û_1^T Û_2 − I‖_F = ‖Û_1^T (Û_2 − Û_1)‖_F ≤ ‖Û_1‖ ‖Û_2 − Û_1‖_F ≤ 400D ( 18ε / c(X) )^{2^{−2D}},

and the same bounds hold for the other pairs of blocks. Since the post-processing step for V_2 in (9) is a continuous and differentiable function of Û_2^T Û_3 at Û_2^T Û_3 = I, the recovered orthogonal matrix V̂_2 has an error bounded above by a constant of order O(ε^{2^{−2D}}), provided c(X) ≠ 0.

To prove part (b), it therefore suffices to show that when N ≥ D + 1, c(X) > 0 for generic X. Let X P_{L^⊥} = [C_1, C_2]; then this problem is equivalent to: given matrices C_1, C_2 ∈ R^{N×D} with entries i.i.d. sampled from N(0, 1), there do not exist y ∈ C^D and λ ∈ C such that y ≠ 0 and C_1 y = λ C_2 y, or C_2 y = 0. This is a rectangular generalized eigenvalue problem, and it is a well-known fact that, for generic C_1, C_2 with more rows than columns, such a problem has no eigenvalue [17].

4.4. Proof of Theorem 2.3

Proof. The proof can be divided into two steps. First, we prove that if the SDP fails to recover {V_i}_{i=1}^{K−1}, then X satisfies condition (27). Second, we show that this condition does not hold for generic X. In the proof, we define linear operators f_1, ..., f_K : R^D → R^{KD} by

    f_i(x) = (0, ..., 0, x, 0, ..., 0),   with x in the i-th position.

We also define L slightly differently from the previous sections (but following the same principle) by L = {z ∈ R^{KD} : z = (x, x, ..., x) for some x ∈ R^D}. Since X_1 + ... + X_K = 0,

    tr( C (I, ..., I)^T (I, ..., I) ) = tr( (I, ..., I) C (I, ..., I)^T ) = ‖X_1 + ... + X_K‖_F^2 = 0.

Considering that tr(CH) ≥ 0 for any H ⪰ 0, if the solution to the SDP approach is not uniquely given by (I, ..., I)^T (I, ..., I), then there exists H ≠ (I, ..., I)^T (I, ..., I) such that tr(CH) = 0. Let H = U^T U for a matrix U ∈ R^{k×KD}; then we have

    Sp(X) ⊥ Sp(U),                                                          (24)

for a U = (U_1, U_2, ..., U_K) such that U_i ∈ R^{k×D}, U_i^T U_i = I for 1 ≤ i ≤ K, and

    {U_i}_{i=1}^K are not all the same.                                     (25)

Let d = k − dim(Sp(U) ∩ L); then we can choose a unitary matrix U_0 ∈ C^{k×k} such that Ũ = U_0 U has the following property: the first k − d rows of Ũ lie in L and the span of the remaining d rows intersects L only at the origin. Since U is determined from H = U^T U only up to left multiplication by an orthogonal matrix in O(k), WLOG we may assume that U_0 = I and U itself has this property. Define U_{i1} and U_{i2} by

    U_i = [U_{i1}; U_{i2}],  where U_{i1} ∈ C^{(k−d)×D} and U_{i2} ∈ C^{d×D},

and let Ũ = (U_{11}, U_{21}, ..., U_{K1}) ∈ R^{(k−d)×KD} and Û = (U_{12}, U_{22}, ..., U_{K2}) ∈ R^{d×KD}. Since the first (k − d) rows of U lie in L, U_{11} = U_{21} = ... = U_{K1}. Combining this with U_{i2}^T U_{i2} = U_i^T U_i − U_{i1}^T U_{i1} = I − U_{i1}^T U_{i1}, we have Sp(U_{12}) = Sp(U_{22}) = ... = Sp(U_{K2}). Let L_0 = Sp(U_{12}) ⊆ R^D; then Sp(U) = Sp(Û) + Sp(Ũ), where Sp(Ũ) ⊆ L, Sp(Û) ∩ L = {0}, Sp(Û) ⊆ f_1(L_0) + f_2(L_0) + ... + f_K(L_0), and dim(Sp(Û)) ≥ dim(L_0). Let L̂ = P_{L^⊥}(Sp(Û)); then

    L̂ ⊆ (f_1(L_0) + f_2(L_0) + ... + f_K(L_0)) ∩ L^⊥  and  dim(L̂) = dim(Sp(Û)) ≥ dim(L_0).     (26)

Applying (24), if the SDP method fails to recover {V_i}_{i=1}^{K−1}, then there exist Sp(Ũ) ⊆ L, L̂ ⊆ P_{L^⊥}(Sp(X)^⊥) and L_0 ⊆ R^D such that (26) is satisfied. For a generic X, P_{L^⊥}(Sp(X)^⊥) is a (KD − D − N)-dimensional subspace of L^⊥, and we denote the manifold of all such (KD − D − N)-dimensional subspaces by M. By the previous analysis, every P_{L^⊥}(Sp(X)^⊥) with property (26) lies in the set M_1 ∪ M_2 ∪ ... ∪ M_D, where M_d is the set of all L' ∈ M such that:

    there exist a d-dimensional subspace L_0 ⊆ R^D and a d-dimensional subspace L̂ such that L̂ ⊆ L' and L̂ ⊆ f_1(L_0) + f_2(L_0) + ... + f_K(L_0).     (27)

Indeed, every element of M_d can be obtained by the following procedure:

1. Pick L_0, a d-dimensional subspace of R^D.

2. Pick L̂, a d-dimensional subspace of [f_1(L_0) + f_2(L_0) + ... + f_K(L_0)] ∩ L^⊥, a (K − 1)d-dimensional space.

3. Choose a (KD − D − N − d)-dimensional subspace L_1 of the (KD − D − d)-dimensional space (L̂ + L)^⊥. Then L' = L_1 + L̂ is an element of M_d.

This constructive procedure gives a mapping from G(D, d) × G(Kd − d, d) × G(KD − D − d, KD − D − N − d) to M_d ⊆ M, where G(D, d) represents the manifold of all d-dimensional subspaces of R^D (and similarly for the other factors). The dimension of G(D, d) × G(Kd − d, d) × G(KD − D − d, KD − D − N − d) is (D − d)d + (K − 2)d^2 + N(KD − D − N − d). Since N ≥ (K − 2)D + 1, the dimension of M, which is N(KD − D − N), is larger than the dimension of G(D, d) × G(Kd − d, d) × G(KD − D − d, KD − D − N − d).

Using this difference in dimensionality, we can prove that (M_1 ∪ M_2 ∪ ... ∪ M_D)^c is an open dense set in M by constructing an ε-net for M_d as follows. First, find an ε-net for the L_0 in step 1, denoted by {L_0^(i)}_{i=1}^{N_1} with N_1 = O(ε^{−(D−d)d}). Second, for each L_0^(i), find an ε-net for the corresponding L̂ in step 2, denoted by {L̂^{(i)(j)}}_{j=1}^{N_2} with N_2 = O(ε^{−(K−2)d^2}). Finally, for each L̂^{(i)(j)}, find an ε-net for the corresponding L' in step 3, denoted by {L'^{(i)(j)(k)}}_{k=1}^{N_3} with N_3 = O(ε^{−N(KD−D−N−d)}). By analyzing the continuity of the constructive procedure, {L'^{(i)(j)(k)}}_{i,j,k=1}^{N_1,N_2,N_3} is a cε-net of M_d for some c > 0. Denote the Haar measure on M by μ; then the measure of an ε-ball is O(ε^{N(KD−D−N)}). By counting the measure covered by the ε-net and letting ε → 0, we obtain μ(M_d) = 0. In addition, by the continuity of the constructive procedure, M_d is closed and M_d^c is open. Therefore, M_1 ∪ M_2 ∪ ... ∪ M_D is a closed set with measure zero, which means that its complement in M is an open and dense set. Therefore, the SDP approach recovers {V_i}_{i=1}^{K−1} for generic P_{L^⊥}(Sp(X)^⊥) ∈ M, and hence for generic X.

Acknowledgment

The authors were partially supported by Award Number R01GM from the NIGMS, awards FA and FA from AFOSR, award LTR DTD from the Simons Foundation, and the Moore Foundation Data-Driven Discovery Investigator Award.

References

[1] J. Gower, G. Dijksterhuis, Procrustes Problems, Oxford Statistical Science Series.
[2] A. S. Bandeira, C. Kennedy, A. Singer, Approximating the little Grothendieck problem over the orthogonal and unitary groups, Mathematical Programming 160 (2016) 433-475.
[3] A. Nemirovski, Sums of random symmetric matrices and quadratic optimization under orthogonality constraints, Mathematical Programming 109 (2-3) (2007).
[4] M. X. Goemans, D. P. Williamson, Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming, J. ACM 42 (6) (1995).
[5] J. Gower, Generalized Procrustes analysis, Psychometrika 40 (1) (1975).
[6] J. Ten Berge, Orthogonal Procrustes rotation for two or more matrices, Psychometrika 42 (2) (1977).
[7] A. Shapiro, J. D. Botha, Dual algorithm for orthogonal Procrustes rotations, SIAM Journal on Matrix Analysis and Applications 9 (3) (1988).
[8] Z. Kam, The reconstruction of structure from electron micrographs of randomly oriented particles, Journal of Theoretical Biology 82 (1) (1980).
[9] T. Bhamre, T. Zhang, A. Singer, Orthogonal matrix retrieval in cryo-electron microscopy, 12th IEEE International Symposium on Biomedical Imaging (ISBI 2015).
[10] A. M.-C. So, J. Zhang, Y. Ye, On approximating complex quadratic optimization problems via semidefinite programming relaxations, Math. Program. 110 (1) (2007).

[11] A. M.-C. So, Moment inequalities for sums of random matrices and their applications in optimization, Mathematical Programming 130 (1) (2011).
[12] A. Naor, O. Regev, T. Vidick, Efficient rounding for the noncommutative Grothendieck inequality, in: Proceedings of the Forty-fifth Annual ACM Symposium on Theory of Computing, STOC '13, 2013.
[13] J. B. Keller, Closest unitary, orthogonal and Hermitian operators to a given operator, Mathematics Magazine 48 (4) (1975).
[14] Y.-L. Cheung, H. Wolkowicz, Sensitivity analysis of semidefinite programs without strong duality, Technical report, University of Waterloo, Waterloo, Ontario, 2014.
[15] M. Grant, S. Boyd, CVX: Matlab software for disciplined convex programming, version 2.1, March 2014.
[16] R. H. Tütüncü, K. C. Toh, M. J. Todd, Solving semidefinite-quadratic-linear programs using SDPT3, Mathematical Programming 95 (2) (2003).
[17] J. W. Demmel, A. Edelman, The dimension of matrices (matrix pencils) with given Jordan (Kronecker) canonical forms, Linear Algebra and its Applications 230 (1995).


More information

A Unified Theorem on SDP Rank Reduction. yyye

A Unified Theorem on SDP Rank Reduction.   yyye SDP Rank Reduction Yinyu Ye, EURO XXII 1 A Unified Theorem on SDP Rank Reduction Yinyu Ye Department of Management Science and Engineering and Institute of Computational and Mathematical Engineering Stanford

More information

Notes on Eigenvalues, Singular Values and QR

Notes on Eigenvalues, Singular Values and QR Notes on Eigenvalues, Singular Values and QR Michael Overton, Numerical Computing, Spring 2017 March 30, 2017 1 Eigenvalues Everyone who has studied linear algebra knows the definition: given a square

More information

approximation algorithms I

approximation algorithms I SUM-OF-SQUARES method and approximation algorithms I David Steurer Cornell Cargese Workshop, 201 meta-task encoded as low-degree polynomial in R x example: f(x) = i,j n w ij x i x j 2 given: functions

More information

1. What is the determinant of the following matrix? a 1 a 2 4a 3 2a 2 b 1 b 2 4b 3 2b c 1. = 4, then det

1. What is the determinant of the following matrix? a 1 a 2 4a 3 2a 2 b 1 b 2 4b 3 2b c 1. = 4, then det What is the determinant of the following matrix? 3 4 3 4 3 4 4 3 A 0 B 8 C 55 D 0 E 60 If det a a a 3 b b b 3 c c c 3 = 4, then det a a 4a 3 a b b 4b 3 b c c c 3 c = A 8 B 6 C 4 D E 3 Let A be an n n matrix

More information

Singular Value Decompsition

Singular Value Decompsition Singular Value Decompsition Massoud Malek One of the most useful results from linear algebra, is a matrix decomposition known as the singular value decomposition It has many useful applications in almost

More information

Beyond Scalar Affinities for Network Analysis or Vector Diffusion Maps and the Connection Laplacian

Beyond Scalar Affinities for Network Analysis or Vector Diffusion Maps and the Connection Laplacian Beyond Scalar Affinities for Network Analysis or Vector Diffusion Maps and the Connection Laplacian Amit Singer Princeton University Department of Mathematics and Program in Applied and Computational Mathematics

More information

Glossary of Linear Algebra Terms. Prepared by Vince Zaccone For Campus Learning Assistance Services at UCSB

Glossary of Linear Algebra Terms. Prepared by Vince Zaccone For Campus Learning Assistance Services at UCSB Glossary of Linear Algebra Terms Basis (for a subspace) A linearly independent set of vectors that spans the space Basic Variable A variable in a linear system that corresponds to a pivot column in the

More information

MATH36001 Generalized Inverses and the SVD 2015

MATH36001 Generalized Inverses and the SVD 2015 MATH36001 Generalized Inverses and the SVD 201 1 Generalized Inverses of Matrices A matrix has an inverse only if it is square and nonsingular. However there are theoretical and practical applications

More information

The Singular Value Decomposition (SVD) and Principal Component Analysis (PCA)

The Singular Value Decomposition (SVD) and Principal Component Analysis (PCA) Chapter 5 The Singular Value Decomposition (SVD) and Principal Component Analysis (PCA) 5.1 Basics of SVD 5.1.1 Review of Key Concepts We review some key definitions and results about matrices that will

More information

Constrained optimization

Constrained optimization Constrained optimization DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html Carlos Fernandez-Granda Compressed sensing Convex constrained

More information

QALGO workshop, Riga. 1 / 26. Quantum algorithms for linear algebra.

QALGO workshop, Riga. 1 / 26. Quantum algorithms for linear algebra. QALGO workshop, Riga. 1 / 26 Quantum algorithms for linear algebra., Center for Quantum Technologies and Nanyang Technological University, Singapore. September 22, 2015 QALGO workshop, Riga. 2 / 26 Overview

More information