On the convergence of higher-order orthogonality iteration and its extension

Size: px

Start display at page:

Download "On the convergence of higher-order orthogonality iteration and its extension"

Miles Newton
5 years ago
Views:

1 On the convergence of higher-order orthogonality iteration and its extension Yangyang Xu IMA, University of Minnesota SIAM Conference LA15, Atlanta October 27, 2015

2 Best low-multilinear-rank approximation of tensors min X C 1 A 1... N A N 2 F, C,A subject to A n O In r n {A n R In rn : A n A n = I}, n. Any tensor has a multilinear SVD [Lathauwer-Moor-Vandewalle 00] Truncated multilinear SVD is not the best [L-M-V 00] 2/17

3 Face recognition by low-rank tensor approximation Matrix Decomposition D UC Tensor Decomposition D C 1 U 1 5 U 5 Tensor decomposition can improve recognition accuracy over 50% [Vasilescu-Terzopoulos 02] 3/17

4 Higher-order Orthogonality Iteration (HOOI) Given A, the optimal C is C = X 1 A 1... N A N. With this C plugged in, the approximation problem becomes which is equivalent to min A X X 1 A 1 A 1... N A N A N 2 F, subject to A n O In r n, n, max A X 1 A 1... N A N 2 F = A n unfold n (X i n A i ) 2 F, subject to A n O In r n, n. 4/17

5 Higher-order Orthogonality Iteration (HOOI) Algorithm 1: Higher-order orthogonality iteration (HOOI) Input: X and (r 1,..., r N ) Initialization: choose (A 0 1,..., A 0 N ) with A0 n O In r n, n and set k = 0 while not convergent do for n = 1,..., N do Set A k+1 n to an orthonormal basis of the dominant r n -dimensional left singular subspace of increase k k + 1 G k n = unfold n (X i<n (A k+1 i ) i>n (A k i ) ) Widely used and fit to large-scale problems Better scalability than Newton-type methods such as Newton-Grassmann [Elden-Savas 09] and Riemannian trust region [Ishteva et. al 11] Convergence is still an open question 4/17

6 this talk addresses the open question subspace convergence of HOOI Why care about convergence Without convergence, HOOI can give severely different multilinear subspace by running to different numbers of iterations Challenges Non-convexity Non-uniqueness of dominant subspace Non-uniqueness of orthonormal basis How to establish convergence Greedy selection of orthonormal basis Kurdyka- Lojasiewicz property 5/17

7 Greedy higher-order orthogonality iteration Algorithm 2: Greedy HOOI Input: X and (r 1,..., r N ) Initialization: choose (A 0 1,..., A 0 N ) with A0 n O In r n, n and set k = 0 while not convergent do for n = 1,..., N do Set A k+1 n argmin An A n A k n F over all orthonormal bases of the dominant r n -dimensional left singular subspace of increase k k + 1 G k n = unfold n (X i<n (A k+1 i ) i>n (A k i ) ) Iterate mapping is continuous Will be used to establish convergence of original HOOI 6/17

8 Block-nondegeneracy A is a block-nondegenerate point if σ rn (G n ) > σ rn+1(g n ), n where G n = unfold n (X i n A i ) Dominant multilinear subspace is unique Equivalent to negative definiteness of block Hessian on O In rn local maximizer Necessary condition for convergence A degenerate first-order optimal point is not stationary for a 7/17

9 Convergence analysis of greedy HOOI Let {A k } k 1 be the sequence from greedy HOOI. Assume there is a block-nondegenerate limit point Ā. Ā is a block-wise maximizer and a critical point There is α > 0 such that if A k sufficiently close to Ā, then α A k+1 A k 2 F F (A k+1 ) F (A k ). where F (A) = X 1 A 1... N A N 2 F N n=1 ι O In rn If A k is sufficiently close to Ā, then dist(0, F (A k )) C A k A k 1 F. Further with Kurdyka- Lojasiewicz property A k Ā as k Local convergence to globally optimal solution 8/17

10 Convergence analysis of greedy HOOI Kurdyka- Lojasiewicz Property: the following quantity is bounded near Ā for a certain θ [0, 1) F (A) F (Ā) θ dist(0, F (A)) Use (1 θ)(a b) a θ (a 1 θ b 1 θ ), a, b 0 and previous inequalities to have A k+1 A k 2 F F (A k+1 ) F (A k ) (F (Ā) F (Ak )) θ( (F (Ā) F (Ak )) 1 θ (F (Ā) F (Ak+1 )) 1 θ) dist(0, F (A)) ( (F (Ā) F (Ak )) 1 θ (F (Ā) F (Ak+1 )) 1 θ) A k A k 1 F ( (F ( Ā) F (A k )) 1 θ (F (Ā) F (Ak+1 )) 1 θ). Together with Young s inequality gives k Ak+1 A k F < 8/17

11 Convergence analysis of original HOOI Starting from A k 0 sufficiently close to a block-nondegenerate limit point Ā of {A k } k 1, we have A k n(a k n) = Ãk n(ãk n), n, k k 0 where {A k } k k0 and {Ãk } k k0 are sequences from the original and greedy HOOIs respectively. Ā is a block-wise maximizer and a critical point A k n(a k n) ĀnĀ n as k Local convergence to globally optimal multilinear subspace 9/17

12 Best low-rank tensor approximation with missing values min X C 1 A 1... N A N 2 F, X,C,A subject to A n O In r n, n; P Ω (X ) = P Ω (M) Reduces to the best low-rank tensor approximation if Ω contains all elements Can estimate the original tensor M simultaneously 10/17

13 Best low-rank tensor approximation with missing values Low-rank tensor approximation with missing values [Geng-Miles-Zhou-Wang 11] 10/17

14 Incomplete higher-order orthogonality iteration Algorithm 3: incomplete HOOI (ihooi) Input: P Ω (M) and (r 1,..., r N ) Initialization: choose X 0 and (A 0 1,..., A 0 N ) and set k = 0 while not convergent do for n = 1,..., N do Set A k+1 n to an orthonormal basis of the dominant r n -dimensional left singular subspace of G k n = unfold n (X i<n (A k+1 i ) i>n (A k i ) ) X k+1 P Ω (M) + P Ω c( X k N n=1 A k+1 increase k k + 1 n (A k+1 n ) ) X -update is one step of the alternating minimization for min X C C,X N i=1 A k+1 i (A k+1 i ) 2 F, subject to P Ω (X ) = P Ω (M) 11/17

15 Experimental results Low-multilinear-rank tensor approximation Relation between iterate sequences by original and greedy HOOIs Low-rank tensor completion Phase transition plots (simulate how many samples guarantee exact reconstruction) 12/17

16 Comparing HOOIs on subspace learning Subspace relative changes Greedy HOOI Original HOOI Number of iterations Gaussian random tensor full size: core size: A k n(a k n) = Ãk n(ãk n), n, k n = 1 n = 2 n = 3 15 r n th singular value (r n +1) th singular value Number of iterations 13/17

17 Comparing HOOIs on subspace learning Subspace relative changes Greedy HOOI Original HOOI Number of iterations Yale Face Database full size: core size: A k n(a k n) = Ãk n(ãk n), n, k n = 1 n = 2 n = r n th singular value (r n +1) th singular value Number of iterations 13/17

18 Phase transition plots ihooi (prop d) WTucker geomcg sample ratio sample ratio sample ratio rank r rank r rank r randomly generated tensor with rank (r, r, r) True rank provided (adaptive rank estimate can be applied) WTucker [Filipovic-Jukic 13]: alternating minimization to min A,C P Ω (C N i=1 A i) P Ω (M) 2 F geomcg [Kressner-Steinlechner-Vandereycken 13]: Riemannian optimization method Proposed method achieves near-optimal reconstruction 14/17

19 Convergence to globally optimal solution Observation: P Ω(C N n=1 An) PΩ(M) F P Ω(M) F always small Low sampling ratio possibly more than one feasible low-rank solutions A 0 sufficiently close to bases of M C k N n=1 A k n M 50 success times among noise level δ tensor size: (50,50,50); rank: (20,20,20); sample ratio: 10% A 0 truncated HOSVD of M + δ M F N N F 15/17

20 Conclusions Give subspace sequence convergence of the HOOI method first result on convergence of HOOI via the convergence of a greedy HOOI and Kurdyka- Lojasiewicz inequality block-nondegeneracy is sufficient and necessary Propose an incomplete HOOI method for applications with missing values Subspace learning and tensor completion simultaneously Empirically, near-optimal recovery on low-rank tensors 16/17

21 References Y. Xu. On the convergence of higher-order orthogonality iteration. arxiv, Y. Xu. On higher-order singular value decomposition from incomplete data. arxiv, /17

A new truncation strategy for the higher-order singular value decomposition

A new truncation strategy for the higher-order singular value decomposition Nick Vannieuwenhoven K.U.Leuven, Belgium Workshop on Matrix Equations and Tensor Techniques RWTH Aachen, Germany November 21,