
ON THE TENSOR SVD AND THE OPTIMAL LOW RANK ORTHOGONAL APPROXIMATION OF TENSORS

JIE CHEN AND YOUSEF SAAD

Abstract. It is known that a higher order tensor does not necessarily have an optimal low rank approximation, and that a tensor might not be orthogonally decomposable (i.e., admit a tensor SVD). We provide several sufficient conditions which lead to the failure of the tensor SVD, and characterize the existence of the tensor SVD with respect to the Higher Order SVD (HOSVD). In the face of these difficulties in generalizing standard results known in the matrix case to tensors, we consider the low rank orthogonal approximation of tensors. The existence of an optimal approximation is theoretically guaranteed under certain conditions, and this optimal approximation yields a tensor decomposition where the diagonal of the core is maximized. We present an algorithm to compute this approximation and analyze its convergence behavior. Numerical experiments indicate a linear convergence rate for this algorithm.

Key words. multilinear algebra, singular value decomposition, tensor decomposition, low rank approximation

AMS subject classifications. 15A69, 15A18

1. Introduction. There has been renewed interest in studying the properties and decompositions of tensors (also known as N-way arrays or multidimensional arrays) in numerical linear algebra in recent years [30, 13, 12, 43, 17, 9, 28, 29, 15, 11]. Tensor approximation techniques have been fruitfully applied in various areas which include, among others, chemometrics [38, 4], signal processing [10, 8], vision and graphics [41, 42], and network analysis [31, 1].

From the point of view of practical applications, the matrix SVD and the optimal rank-r approximation of matrices (a.k.a. the Eckart-Young theorem [18]) are of particular interest, and it would be nice if these properties could be directly generalized to higher order tensors. However, for any order N >= 3, de Silva and Lim [17] showed that the problem of optimal low rank approximation of higher order tensors is ill-posed for many ranks r, and that this ill-posedness is not rare for order-3 tensors. Furthermore, Kolda presented numerous examples to illustrate the difficulties of orthogonal tensor decompositions [28, 29]. These studies revealed many aspects of the dissimilarities between tensors and matrices, in spite of the fact that higher order tensors are multidimensional generalizations of matrices.

The most commonly used generalization of the matrix SVD to higher order tensors to date is the Higher Order Singular Value Decomposition (HOSVD) [12]. The HOSVD decomposes an order-N tensor into a core tensor that has the same size as the original tensor together with N orthogonal(1) side-matrices. Although this decomposition preserves many nice aspects of the matrix SVD (e.g., the core has the all-orthogonality property and the ordering property), a notable difference is that the core is in general not diagonal. Hence, in contrast with the matrix SVD, the HOSVD cannot be written as a sum of a few orthogonal outer-product terms(2).

This work was supported by NSF under grants DMS and DMS and by the Minnesota Supercomputing Institute. Department of Computer Science and Engineering, University of Minnesota at Twin Cities, MN. Email: {jchen, saad}@cs.umn.edu.

(1) Throughout this paper, a matrix A in R^{m x n}, m >= n, is said to be orthogonal if A^T A = I. This generalizes the definition for square matrices.
(2) For discussions of orthogonality, see Section 2.4.

There exist three well-known approximations to higher order tensors: (1) the rank-1 approximation [13, 43, 27]; (2) the rank-(r_1, r_2, ..., r_N) approximation with a full core and N orthogonal side-matrices (in the Tucker/HOOI fashion) [40, 13]; and (3) the approximation using r outer-product terms (in the CANDECOMP/PARAFAC fashion) [6, 19]. Note that the approximated tensor in case (3) might have rank less than r. Among these approximations, the rank-1 approximation [17] and the rank-(r_1, r_2, ..., r_N) approximation are theoretically guaranteed to have a global optimum. In practical applications, the three approximations are generally computed using an alternating least squares (ALS) method [33, 3, 25] (the so-called "workhorse" algorithm [30]), although many other methods have also been proposed [34, 37, 43, 28, 15, 11]. The convergence behavior of the ALS method is theoretically unknown except under a few strong conditions [32]. Besides, it has long been observed that the ALS method for the PARAFAC model may converge extremely slowly, if at all [36, 26]. An illustration of this phenomenon is given in the Appendix.

Kolda [28] investigated several orthogonal decompositions of tensors related to different definitions of orthogonality, including the orthogonal rank decomposition, the complete orthogonal rank decomposition, and the strong orthogonal rank decomposition. These decompositions might not be unique, or even exist. Among these definitions, only the complete orthogonality gives a situation which parallels that of the matrix SVD. This approach demands that the side-matrices all be orthogonal, in which case we use the term tensor singular value decomposition (tensor SVD, see Definition 4.1) in this paper. Zhang and Golub [43] proved that for all tensors of order N >= 3, the tensor SVD is unique (up to signs) if it exists, and that the incremental rank-1 approximation approach will compute this decomposition.

The following contributions are made in this paper:
1. Sufficient conditions indicating which tensors fail to have a tensor SVD are given. These conditions are related to the rank, the order, and the dimensions of the tensor, and hence can be viewed as generalizations of results given in the literature with specific examples. Furthermore, the existence of the tensor SVD can be characterized by the diagonality of the core in the HOSVD of the tensor.
2. A form of low rank approximations, one that requires a diagonal core and orthogonal side-matrices, is discussed. Theoretically the global optimum of this approximation can be attained for any (appropriate) rank. We present an iterative algorithm to compute this approximation and analyze its convergence behavior.
3. The proposed approximation at the maximally possible rank leads to a decomposition of the tensor, where the diagonal of the core is maximized. This maximal diagonality for symmetric order-3 [14] and order-4 [7] tensors and for general order-3 tensors [15, 24, 35] has been previously investigated, and Jacobi algorithms were used in the cited papers, but our discussion is in a more general context and the proposed algorithm is not of a Jacobi type.

2. Tensor algebra. In this section, we briefly review some concepts and notions that are used throughout the paper. A tensor is a multidimensional array of data whose elements are referred to by using multiple indices. The number of indices required is called the order of the tensor. We use A = (a_{i_1, i_2, ..., i_N}) in R^{d_1 x d_2 x ... x d_N} to denote a tensor A of order N. For n = 1, 2, ..., N, d_n is the n-th dimension of A. As a special case, a vector is an order-1 tensor and a matrix is an order-2 tensor.

2.1. Unfoldings and mode-n products. It is hard to visualize tensors of order N > 3. They can be flexibly represented when unfolded into matrices. The unfolding of a tensor along mode n is a matrix of dimension d_n x (d_{n+1} ... d_N d_1 ... d_{n-1}). We denote the mode-n unfolding of tensor A by A_(n). Each column of A_(n) is a column of A along the n-th mode.

An important operation for a tensor is the tensor-matrix multiplication, also known as the mode-n product. Given a tensor A in R^{d_1 x ... x d_N} and a matrix M in R^{c_n x d_n}, the mode-n product

    B = A \times_n M \in R^{d_1 x ... x d_{n-1} x c_n x d_{n+1} x ... x d_N}

is a tensor where

    b_{i_1,...,i_{n-1},j_n,i_{n+1},...,i_N} := \sum_{i_n=1}^{d_n} a_{i_1,...,i_{n-1},i_n,i_{n+1},...,i_N} \, m_{j_n,i_n}

for j_n = 1, 2, ..., c_n. In matrix representation, this is

    B_(n) = M A_(n).    (2.1)

2.2. Inner products and tensor norms. The inner product of two tensors A and B of the same size is defined by

    \langle A, B \rangle_F := \sum_{i_1=1}^{d_1} \cdots \sum_{i_N=1}^{d_N} a_{i_1,...,i_N} b_{i_1,...,i_N},

and the norm induced from this inner product is ||A||_F := sqrt(<A, A>_F). We say that A is a unit tensor if ||A||_F = 1. When N = 2, ||A||_F is the Frobenius norm of the matrix A. The norm of a tensor is equal to the Frobenius norm of the unfolding of the tensor along any mode:

    ||A||_F = ||A_(n)||_F,  for n = 1, ..., N.

2.3. Tensor products and outer products of vectors. The tensor product of an order-N tensor A in R^{d_1 x ... x d_N} and an order-N' tensor B in R^{c_1 x ... x c_{N'}} is an order-(N + N') tensor

    C = A \otimes B \in R^{d_1 x ... x d_N x c_1 x ... x c_{N'}},

where

    c_{i_1,...,i_N,j_1,...,j_{N'}} := a_{i_1,...,i_N} b_{j_1,...,j_{N'}}.

Note that the operator for tensor products unfortunately coincides with the one used to denote the Kronecker product of two matrices. In particular, the tensor product of two matrices (order-2 tensors) is an order-4 tensor, while the Kronecker product of two matrices is again a matrix. The reader shall not be confused by this notation since in this paper Kronecker products are not involved.
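As a concrete illustration, the following MATLAB sketch implements the mode-n unfolding and the mode-n product under the cyclic column ordering described above, and checks identity (2.1) together with the norm identity of Section 2.2. The helper names unfold and modeprod are ours, and script-local functions assume MATLAB R2016b or later; this is a minimal sketch, not the authors' implementation.

% Mode-n unfolding and mode-n product, cyclic column ordering
A = randn(3, 4, 5);               % an order-3 tensor
M = randn(7, 4);                  % to be applied along mode 2

A2 = unfold(A, 2);                % 4 x (5*3) matrix of mode-2 fibers
B  = modeprod(A, M, 2);           % B = A x_2 M, of size 3 x 7 x 5

% Sanity checks: (2.1) and the norm identity of Section 2.2
norm(unfold(B, 2) - M * A2, 'fro')           % ~ 0
abs(norm(A(:)) - norm(unfold(A, 1), 'fro'))  % ~ 0

function An = unfold(A, n)
    N  = ndims(A);
    An = reshape(permute(A, [n, n+1:N, 1:n-1]), size(A, n), []);
end

function B = modeprod(A, M, n)
    N  = ndims(A);
    sz = size(A); sz(n) = size(M, 1);
    % multiply on the unfolding, then fold back and undo the permutation
    B = reshape(M * unfold(A, n), sz([n, n+1:N, 1:n-1]));
    B = ipermute(B, [n, n+1:N, 1:n-1]);
end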

The outer product of N (column) vectors, which generalizes the standard outer product of two vectors (a rank-1 matrix), is a special case of tensor products. The outer product of N vectors x^(n) in R^{d_n} is an order-N tensor of dimension d_1 x d_2 x ... x d_N:

    X = x^{(1)} \otimes x^{(2)} \otimes \cdots \otimes x^{(N)}.

The (i_1, i_2, ..., i_N)-entry of X is \prod_{n=1}^N (x^{(n)})_{i_n}, where (x^(n))_{i_n} denotes the i_n-th entry of the vector x^(n). The tensor X is also called a rank-1 tensor. (The rank of a tensor is defined in Section 3.)

It can be verified that the mode-n product of a rank-1 tensor X with a matrix M can be computed as follows:

    X \times_n M = x^{(1)} \otimes \cdots \otimes x^{(n-1)} \otimes (M x^{(n)}) \otimes x^{(n+1)} \otimes \cdots \otimes x^{(N)},

and that the inner product of X with a general tensor A is

    \langle A, X \rangle_F = \langle A, x^{(1)} \otimes \cdots \otimes x^{(N)} \rangle_F = A \times_1 x^{(1)T} \times_2 x^{(2)T} \cdots \times_N x^{(N)T}.

If U = u^{(1)} \otimes \cdots \otimes u^{(N)} and V = v^{(1)} \otimes \cdots \otimes v^{(N)} are two rank-1 tensors, then

    \langle U, V \rangle_F = \prod_{n=1}^N \langle u^{(n)}, v^{(n)} \rangle,

where <.,.> denotes the standard Euclidean inner product of two vectors. A consequence of the above relation is that ||U||_F is the product of the 2-norms of the vectors u^(n).

2.4. Orthogonality of tensors. Two tensors A and B of the same size are F-orthogonal (Frobenius orthogonal) if their inner product is zero, i.e., <A, B>_F = 0. For rank-1 tensors U = u^(1) \otimes ... \otimes u^(N) and V = v^(1) \otimes ... \otimes v^(N), the above definition implies that they are F-orthogonal if \prod_{n=1}^N <u^(n), v^(n)> = 0. This leads to other options for defining orthogonality for two rank-1 tensors. The paper [28] discussed two cases:
1. Complete orthogonality: <u^(n), v^(n)> = 0 for all n = 1, ..., N.
2. Strong orthogonality: For all n, either <u^(n), v^(n)> = 0 or u^(n) and v^(n) are collinear, but there is at least one l such that <u^(l), v^(l)> = 0.
In this paper we will simply use the term orthogonal for two outer products that are completely orthogonal (case 1).
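The rank-1 identities above are easy to check numerically. In the following MATLAB sketch (the helper name outer3 is ours), an order-3 outer product is materialized via Kronecker products, exploiting MATLAB's column-major layout, and the inner product and norm relations for rank-1 tensors are verified; a minimal sketch, not from the paper.

% outer3(x,y,z) builds the tensor with entries x(i1)*y(i2)*z(i3)
outer3 = @(x, y, z) reshape(kron(z, kron(y, x)), numel(x), numel(y), numel(z));

u = {randn(3,1), randn(4,1), randn(5,1)};
v = {randn(3,1), randn(4,1), randn(5,1)};
U = outer3(u{1}, u{2}, u{3});
V = outer3(v{1}, v{2}, v{3});

lhs = U(:)' * V(:);                                % <U, V>_F
rhs = (u{1}'*v{1}) * (u{2}'*v{2}) * (u{3}'*v{3});  % prod of <u^(n), v^(n)>
abs(lhs - rhs)                                     % ~ 0
norm(U(:)) - norm(u{1})*norm(u{2})*norm(u{3})      % ~ 0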

[Fig. 2.1. A decomposition of an order-3 tensor A as B \times_1 S^{(1)} \times_2 S^{(2)} \times_3 S^{(3)}.]

2.5. Tensor decompositions. A decomposition of a tensor A in R^{d_1 x ... x d_N} is of the form

    A = B \times_1 S^{(1)} \times_2 S^{(2)} \cdots \times_N S^{(N)},

where B in R^{c_1 x ... x c_N} is called the core tensor, and S^(n) in R^{d_n x c_n} for n = 1, ..., N are called side-matrices. An illustration is given in Figure 2.1. Let s_i^(n) be the i-th column of S^(n). The decomposition of A can equivalently be written as a linear combination of rank-1 tensors:

    A = \sum_{i_1=1}^{c_1} \cdots \sum_{i_N=1}^{c_N} b_{i_1,i_2,...,i_N} \, s_{i_1}^{(1)} \otimes s_{i_2}^{(2)} \otimes \cdots \otimes s_{i_N}^{(N)}.    (2.2)

In particular, if B is diagonal, i.e., b_{i_1,...,i_N} = 0 except when i_1 = i_2 = ... = i_N, then

    A = \sum_{i=1}^{r} b_{i...i} \, s_i^{(1)} \otimes s_i^{(2)} \otimes \cdots \otimes s_i^{(N)},    (2.3)

where r = min{c_1, ..., c_N}.

In the literature, the term decomposition is often used when approximation is meant instead. The Tucker3 decomposition is an approximation in the form of the right-hand side of (2.2), for given dimensions c_1, c_2, ..., c_N. Usually, it is required that c_n is less than the rank of A_(n) for all n, otherwise the problem is trivial. The HOOI approach computes this approximation with the additional property that all the S^(n)'s are orthogonal matrices. The CANDECOMP/PARAFAC decomposition is an approximation in the form of the right-hand side of (2.3), for a pre-specified r. Usually, r is smaller than the smallest dimension of all modes of A, although requiring a larger r is also possible in the ALS and other algorithms. As will be discussed in the next section, the smallest r that satisfies equality (2.3) is the rank of the tensor A.

3. Tensor ranks. The rank of a tensor causes difficulties when attempting to generalize matrix properties to tensors. There are several possible generalizations of the notion of rank. The n-rank of a tensor A in R^{d_1 x ... x d_N}, for n = 1, ..., N, denoted by rank_n(A), is the rank of the unfolding A_(n):

    rank_n(A) := rank(A_(n)).
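The n-ranks can thus be computed directly from the unfoldings. A small MATLAB sketch follows (helper names ours, as in the earlier sketches); for a sum of two generic rank-1 terms, each n-rank typically comes out as 2.

outer3 = @(x, y, z) reshape(kron(z, kron(y, x)), numel(x), numel(y), numel(z));
unfold = @(A, n) reshape(permute(A, [n, n+1:ndims(A), 1:n-1]), size(A, n), []);

A = outer3(randn(3,1), randn(4,1), randn(5,1)) + ...
    outer3(randn(3,1), randn(4,1), randn(5,1));

nranks = arrayfun(@(n) rank(unfold(A, n)), 1:3)   % typically [2 2 2]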

The (outer-product) rank of A, denoted by rank(A), is defined as

    rank(A) := \min\{ r : \exists\, x_i^{(n)} \in R^{d_n},\ i = 1,...,r,\ n = 1,...,N, \text{ s.t. } A = \sum_{i=1}^r x_i^{(1)} \otimes x_i^{(2)} \otimes \cdots \otimes x_i^{(N)} \}.

Hence, a tensor is the outer product of N vectors if and only if it has rank one, and the rank of a general tensor A is the minimum number of rank-1 tensors that sum to A. There are a few notable differences between the notion of rank for matrices and that for tensors:

1. For N = 2, i.e., when A is a matrix, rank_1(A) is the row rank, rank_2(A) is the column rank, and rank(A) is the outer-product rank, and they are all equal. However, for higher order tensors (N > 2), in general, the n-ranks are different for different modes n, and they are different from rank(A) [12]. Furthermore, the rank of a matrix A cannot be larger than the smallest dimension of both modes of A, but for tensors this is no longer true, i.e., the rank can be larger than the smallest dimension of the tensor [12].
2. The matrix SVD yields one possible way of writing a matrix as a sum of outer-product terms, and the number of nonzero singular values is equal to the rank of the matrix. However, a tensor SVD does not always exist (see Section 4), but if it indeed does, it is unique up to signs [34, 43] and the number of singular values is equal to the rank of the tensor (see Definition 4.1 and Proposition 4.2).
3. It is well-known that the optimal rank-r approximation of a matrix is simply its truncated SVD. However some tensors may fail to have an optimal rank-r approximation [17]. If such an approximation exists, it is unclear whether it can be written in the form of a diagonal core multiplied by orthogonal side-matrices.

Next are some lemmas and a theorem related to tensor ranks, which were also given in [17]. They are useful in deriving the results in Section 4. The first lemma indicates that the rank of a tensor cannot be smaller than any of its n-ranks:

Lemma 3.1. Let A in R^{d_1 x ... x d_N} be an order-N tensor. Then rank_n(A) <= min{rank(A), d_n}, for n = 1, 2, ..., N.

The next lemma illustrates a way to construct higher order tensors while preserving the rank.

Lemma 3.2. Let A be a tensor and x be a non-zero vector. Then rank(A) = rank(A \otimes x).

The following lemma indicates that given any dimensions d_1, d_2, ..., d_N, we can construct a tensor of arbitrary rank R <= min{d_1, d_2, ..., d_N}.

Lemma 3.3. For n = 1, ..., N, let x_1^{(n)}, ..., x_R^{(n)} in R^{d_n} be linearly independent. Then the tensor

    A = \sum_{i=1}^R x_i^{(1)} \otimes x_i^{(2)} \otimes \cdots \otimes x_i^{(N)}

has rank R.

The next theorem is due to JáJá and Takche [23]. They showed that if A and B are order-3 tensors and at least one of them is a stack of two matrices, then the rank of their direct sum is equal to the sum of their ranks.

Theorem 3.4 (JáJá and Takche). Let A in R^{d_1 x d_2 x d_3} and B in R^{c_1 x c_2 x c_3}. If 2 in {d_1, d_2, d_3, c_1, c_2, c_3}, then rank(A \oplus B) = rank(A) + rank(B).

3.1. The ill-posedness of the optimal low rank approximation problem. de Silva and Lim [17] proved that for any order N >= 3 and dimensions d_1, ..., d_N >= 2, there exists a rank-(r+1) tensor that has no optimal rank-r approximation, for any r = 2, ..., min{d_1, ..., d_N}. This result was further generalized to an arbitrary rank gap, i.e., there exists a rank-(r+s) tensor that has no optimal rank-r approximation, for some r's and s's. Essentially, this ill-posedness of the optimal approximation problem is illustrated by the fact that the tensor

    E := e_2 \otimes e_1 \otimes e_1 + e_1 \otimes e_2 \otimes e_1 + e_1 \otimes e_1 \otimes e_2 \in R^{2 x 2 x 2},

where e_i is the i-th column of the identity matrix, has rank 3 but can be approximated arbitrarily closely by rank-at-most-2 tensors. Hence E does not have an optimal rank-2 approximation. Then according to Theorem 3.4 and Lemma 3.2, the ill-posedness of the problem can be generalized to arbitrary rank and order, by constructing higher rank and higher order tensors using direct sums and tensor products. We restate one of the results of [17] in the following theorem. For details of the proof, see the original paper.

Theorem 3.5. For N >= 3 and d_1, d_2, ..., d_N >= 2, there exists a tensor A in R^{d_1 x ... x d_N} of rank r + s that has no optimal rank-r approximation, for any r and s >= 1 satisfying 2s <= r <= min{d_1, d_2, ..., d_N}.
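The degeneracy behind the tensor E above can be observed numerically. In the following MATLAB sketch (outer3 as in the earlier sketches; the sequence X_n is the standard construction for this example, supplied here by us rather than quoted from the paper), rank-2 tensors X_n approach the rank-3 tensor E at a rate O(1/n):

outer3 = @(x, y, z) reshape(kron(z, kron(y, x)), numel(x), numel(y), numel(z));
e1 = [1; 0]; e2 = [0; 1];
E = outer3(e2,e1,e1) + outer3(e1,e2,e1) + outer3(e1,e1,e2);
for n = [1e1 1e2 1e3 1e4]
    w  = e1 + e2/n;
    Xn = n*outer3(w,w,w) - n*outer3(e1,e1,e1);   % a rank-2 tensor
    fprintf('n = %g, ||E - X_n||_F = %.2e\n', n, norm(E(:) - Xn(:)));
end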

4. The tensor SVD and its (non-)existence. The definition used for the singular value decomposition of a tensor generalizes the matrix SVD from the angle of an expansion into outer product matrices, which becomes an expansion into a sum of rank-1 tensors.

Definition 4.1. If a tensor A in R^{d_1 x ... x d_N} can be written in the form

    A = \sum_{i=1}^R \sigma_i \, u_i^{(1)} \otimes u_i^{(2)} \otimes \cdots \otimes u_i^{(N)},    (4.1)

where sigma_1 >= sigma_2 >= ... >= sigma_R > 0 and <u_j^(n), u_k^(n)> = delta_{jk} (Kronecker delta) for n = 1, 2, ..., N, then (4.1) is said to be the tensor singular value decomposition (tensor SVD) of A. The sigma_i's are singular values and the u_i^(n)'s for i = 1, ..., R are the mode-n singular vectors.

We also call (4.1) the SVD of tensor A for short where there is no ambiguity about tensors and matrices. In fact, when N = 2, i.e., A is a matrix, the tensor SVD of A boils down to the matrix SVD. Expression (4.1) can equivalently be written in the form

    A = D \times_1 U^{(1)} \times_2 U^{(2)} \cdots \times_N U^{(N)},    (4.2)

where D in R^{R x R x ... x R} is the diagonal core tensor with D_{ii...i} = sigma_i, and

    U^{(n)} = [ u_1^{(n)}, u_2^{(n)}, ..., u_R^{(n)} ] \in R^{d_n x R}    (4.3)

are orthogonal matrices for n = 1, 2, ..., N.

The following proposition indicates that the tensor SVD is rank revealing.

Proposition 4.2. If a tensor A has the SVD as in (4.1), then rank(A) = R.
Proof. This follows from Lemma 3.3.

Trivially, if a tensor is constructed as in (4.1), its SVD exists. However, in general, a tensor of order N >= 3 may fail to have such a decomposition. In this section, we identify some of these situations. To begin with, note that the orthogonality of each U^(n) implies that R <= d_n for each n, i.e., R <= min{d_1, d_2, ..., d_N}. This leads to the following simple result.

Proposition 4.3. A tensor A in R^{d_1 x ... x d_N} with rank(A) > min{d_1, d_2, ..., d_N} does not admit a tensor SVD.
Proof. The existence of a tensor SVD such as in (4.1) would trivially lead to a contradiction, since the tensor in (4.1) has rank R with R <= min{d_1, d_2, ..., d_N}.

Note that Theorem 3.5 guarantees that the condition of Proposition 4.3 is not vacuously satisfied, for any order N >= 3 and dimensions d_1, d_2, ..., d_N >= 2.

Corollary 4.4. Given a tensor A satisfying the condition in Proposition 4.3, any tensor of the form A \otimes x^{(N+1)} \otimes \cdots \otimes x^{(N+l)}, where l >= 1 and x^(N+1), ..., x^(N+l) are nonzero vectors, does not admit a tensor SVD.
Proof. This follows from Proposition 4.3 and Lemma 3.2.

Corollary 4.5. A tensor A in R^{d_1 x ... x d_N} does not admit a tensor SVD if there exists at least one mode n such that rank_n(A) > min{d_1, d_2, ..., d_N}.
Proof. This follows from Proposition 4.3 and Lemma 3.1.

Proposition 4.6. There exists a tensor A in R^{d_1 x ... x d_N} which does not admit a tensor SVD whenever d := max_n{d_n} > min_n{d_n} and d^2 <= \prod_{n=1}^N d_n.

Proof. Without loss of generality, assume that d = d_1 >= d_2 >= ... >= d_N and let d' = d_2 ... d_N. Since d <= d', for an arbitrary set of orthonormal vectors {a_i in R^{d'} : i = 1, ..., d}, we can construct a tensor A whose unfolding A_(1) = [a_1, a_2, ..., a_d]^T. Then rank_1(A) = d. By Corollary 4.5, A does not admit a tensor SVD.

Note that when N = 2, i.e., for the matrix case, it is impossible for d_1 and d_2 to satisfy the condition in the proposition.

In closing this section, we provide a necessary and sufficient condition to characterize the existence of the tensor SVD.(3) This is related to the HOSVD proposed by [12]. The essential relation underlying the theorem is that the mode-n singular vectors of A, when its SVD exists, are also the left singular vectors of the unfolding A_(n).

(3) As pointed out by a referee, the provided relation may have long been recognized in other fields of research, such as signal processing, at least in the case of distinct singular values.

Theorem 4.7. A tensor A admits an SVD if and only if there exists an HOSVD of A such that the core is diagonal.

Proof. The sufficient condition is obvious. Consider the necessary condition. If A can be written in the form (4.1), define the tensor

    W_i^{(n)} := u_i^{(n+1)} \otimes \cdots \otimes u_i^{(N)} \otimes u_i^{(1)} \otimes \cdots \otimes u_i^{(n-1)},

and let w_i^(n) be the vectorization of W_i^(n). Then the unfolding of A along mode n is

    A_{(n)} = \sum_{i=1}^R \sigma_i \, u_i^{(n)} w_i^{(n)T}.

Since <u_j^(n), u_k^(n)> = delta_{jk} for all n, we have <w_j^(n), w_k^(n)> = delta_{jk}. Hence the above form is the SVD of the matrix A_(n). In other words, the vectors u_1^(n), ..., u_R^(n) are the left singular vectors of A_(n). From the construction of the HOSVD, Equation (4.2) is a valid HOSVD for A.(4)

The proof of the above theorem indicates that if the SVD of a tensor A exists, its singular values coincide with the nonzero mode-n singular values in its HOSVD. However the HOSVD of a tensor may not be unique, since the SVDs of the unfoldings A_(n) are not guaranteed to be unique. Hence even if a tensor is constructed as in (4.1), its HOSVD will not necessarily recover this form. This is the reason why in the above theorem we use the phrase "... if there exists ...". It is interesting to note again that the non-uniqueness of the matrix SVD is caused by duplicate singular values; however the tensor SVD is unique (if it exists) even when some of the singular values are the same [43, Theorem 3.2].

(4) In order to strictly conform to the definition of the HOSVD defined in [12], in (4.2) the size of D should be enlarged from R x R x ... x R to d_1 x d_2 x ... x d_N by padding zeros, and the U^(n)'s should be padded with orthogonal columns to make square shapes.
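The characterization of Theorem 4.7 can be illustrated numerically: build a tensor with an exact tensor SVD, take the leading left singular vectors of each unfolding (the HOSVD side-matrices), and observe that the resulting core is diagonal up to signs. The MATLAB sketch below is ours (helpers as before); distinct singular values are assumed so that the unfolding SVDs are essentially unique.

outer3 = @(x, y, z) reshape(kron(z, kron(y, x)), numel(x), numel(y), numel(z));
unfold = @(A, n) reshape(permute(A, [n, n+1:ndims(A), 1:n-1]), size(A, n), []);

d = 4; R = 2;
[U1,~] = qr(randn(d,R), 0); [U2,~] = qr(randn(d,R), 0); [U3,~] = qr(randn(d,R), 0);
sigma = [3; 1];
A = sigma(1)*outer3(U1(:,1),U2(:,1),U3(:,1)) + sigma(2)*outer3(U1(:,2),U2(:,2),U3(:,2));

% leading left singular vectors of each unfolding
[W1,~,~] = svd(unfold(A,1), 'econ'); W1 = W1(:,1:R);
[W2,~,~] = svd(unfold(A,2), 'econ'); W2 = W2(:,1:R);
[W3,~,~] = svd(unfold(A,3), 'econ'); W3 = W3(:,1:R);

% core S = A x_1 W1' x_2 W2' x_3 W3', computed entrywise for brevity
S = zeros(R, R, R);
for i = 1:R, for j = 1:R, for k = 1:R
    S(i,j,k) = W1(:,i)' * unfold(A,1) * kron(W3(:,k), W2(:,j));
end, end, end
S   % off-diagonal entries ~ 0; |S(1,1,1)|, |S(2,2,2)| recover the sigma's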

5. The optimal low rank orthogonal approximation. The problem addressed by tensor analysis is to approximate some tensor A by a linear combination of tensors T_1, T_2, ..., T_r that have special structures, e.g., rank-1 tensors, orthogonal tensors, or simple tensors.(5) For this it is desirable to minimize

    || A - \sum_{i=1}^r \sigma_i T_i ||_F

for a given r. Without loss of generality, we assume that ||T_i||_F = 1 for all i. As discussed in Section 3.1, if the T_i's are rank-1 tensors, the infimum of the above expression might not necessarily be attained. The following proposition reveals some properties when the infimum is indeed achieved.

Proposition 5.1. Given a tensor A and a positive integer r, consider the set of linear combinations of tensors of the form

    T := \sum_{i=1}^r \sigma_i T_i    (5.1)

where the T_i's are arbitrary unit tensors. If inf ||A - T||_F is reached on this set, then for the optimal T and T_i's,

    <A - T, T_i>_F = 0  for i = 1, 2, ..., r.

Furthermore, if the T_i's are required to be mutually F-orthogonal, then the optimal sigma_i's are related to the optimal T_i's by

    sigma_i = <A, T_i>_F  for i = 1, 2, ..., r.    (5.2)

In this situation,

    ||T||_F^2 = \sum_{i=1}^r \sigma_i^2  and  ||A - T||_F^2 = ||A||_F^2 - ||T||_F^2.    (5.3)

Proof. If the infimum is attained by a certain set of sigma_i's and T_i's, and if there is a j such that <A - sum_i sigma_i T_i, T_j>_F = eps != 0, then

    || A - \sum_i \sigma_i T_i - eps T_j ||_F^2
      = || A - \sum_i \sigma_i T_i ||_F^2 - 2 eps < A - \sum_i \sigma_i T_i, T_j >_F + eps^2 ||T_j||_F^2
      = || A - \sum_i \sigma_i T_i ||_F^2 - eps^2
      < || A - \sum_i \sigma_i T_i ||_F^2,

which contradicts the assumption.

If the unit tensors T_i's are mutually F-orthogonal, then

    0 = < A - \sum_i \sigma_i T_i, T_j >_F = <A, T_j>_F - sigma_j <T_j, T_j>_F = <A, T_j>_F - sigma_j.

Also,

    ||T||_F^2 = <T, T>_F = < \sum_i \sigma_i T_i, \sum_j \sigma_j T_j >_F = \sum_{i=1}^r \sigma_i^2,

and

    || A - \sum_i \sigma_i T_i ||_F^2 = ||A||_F^2 - 2 \sum_i \sigma_i <A, T_i>_F + \sum_i \sigma_i^2 = ||A||_F^2 - \sum_i \sigma_i^2 = ||A||_F^2 - ||T||_F^2.

The last part of the proof indicates that the equalities in (5.3) follow from the orthogonality of the T_i's and the relations (5.2). They do not require optimality.

In this section, we will see that if the T_i's are mutually orthogonal rank-1 tensors, then the infimum in the proposition can be attained. Formally, we will prove that the problem

    min E := || A - \sum_{i=1}^r \sigma_i \, u_i^{(1)} \otimes u_i^{(2)} \otimes \cdots \otimes u_i^{(N)} ||_F
    s.t. <u_j^(n), u_k^(n)> = delta_{jk},  for n = 1, 2, ..., N,    (5.4)

has a solution for any A in R^{d_1 x ... x d_N} and any r <= min{d_1, d_2, ..., d_N}. The solution for the case r = min{d_1, d_2, ..., d_N} leads to a decomposition of A where the diagonal of the core is maximized.

(5) A tensor is simple if it is the tensor product of two tensors.

5.1. Existence of the global optimum. Let

    T_i := u_i^{(1)} \otimes u_i^{(2)} \otimes \cdots \otimes u_i^{(N)},  for i = 1, ..., r,    (5.5)

and the sigma_i's be defined as in (5.2); then according to Proposition 5.1 (see the comments following the proof),

    E^2 = || A - \sum_{i=1}^r \sigma_i T_i ||_F^2 = ||A||_F^2 - \sum_{i=1}^r \sigma_i^2.

Hence minimizing E is equivalent to maximizing sum_i sigma_i^2, i.e., the optimization problem (5.4) is equivalent to the following:

    max E' := \sum_{i=1}^r ( A \times_1 u_i^{(1)T} \times_2 u_i^{(2)T} \cdots \times_N u_i^{(N)T} )^2
    s.t. <u_j^(n), u_k^(n)> = delta_{jk},  for n = 1, 2, ..., N.    (5.6)

Let

    U^{(n)} = [ u_1^{(n)}, u_2^{(n)}, ..., u_r^{(n)} ] \in \Omega^{(n)}    (5.7)

where

    \Omega^{(n)} := { W \in R^{d_n x r} : W^T W = I }    (5.8)

for n = 1, 2, ..., N. The problem (5.6) can be interpreted as that of maximizing E' within the feasible region

    \Omega := \Omega^{(1)} \times \Omega^{(2)} \times \cdots \times \Omega^{(N)}.    (5.9)

Since for each n the set Omega^(n) is compact (see, e.g., [22, p. 69]), by the Tychonoff theorem, the feasible region Omega is compact. Under the continuous mapping E', the image E'(Omega) is also compact. Hence it has a maximum. This proves the following theorem:

Theorem 5.2. There exists a solution to the problem (5.6) (or equivalently (5.4) with sigma_i defined in (5.2)) for any r <= min{d_1, d_2, ..., d_N}.
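The identity E^2 = ||A||_F^2 - sum_i sigma_i^2 underlying the equivalence of (5.4) and (5.6) holds for any mutually orthonormal sets of vectors, optimal or not. A quick MATLAB check follows (notation and helpers ours; order 3 for concreteness):

outer3 = @(x, y, z) reshape(kron(z, kron(y, x)), numel(x), numel(y), numel(z));
unfold = @(A, n) reshape(permute(A, [n, n+1:ndims(A), 1:n-1]), size(A, n), []);

A = randn(3, 4, 5); r = 2;
[U1,~] = qr(randn(3,r), 0); [U2,~] = qr(randn(4,r), 0); [U3,~] = qr(randn(5,r), 0);

T = zeros(size(A)); ssq = 0;
for i = 1:r
    % sigma_i = A x_1 u_i^(1)T x_2 u_i^(2)T x_3 u_i^(3)T, cf. (5.2)
    s   = U1(:,i)' * unfold(A,1) * kron(U3(:,i), U2(:,i));
    T   = T + s * outer3(U1(:,i), U2(:,i), U3(:,i));
    ssq = ssq + s^2;
end
abs( norm(A(:) - T(:))^2 - (norm(A(:))^2 - ssq) )   % ~ 0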

5.2. Relation to tensor decompositions. Let U^(n), n = 1, ..., N, be the solution to the problem (5.4) with r = min{d_1, ..., d_N} and sigma_i be defined in (5.2). Also for n = 1, ..., N, let U_perp^(n) be a d_n x (d_n - r) matrix such that the square matrix

    \tilde{U}^{(n)} := [ U^{(n)}, U_perp^{(n)} ] \in R^{d_n x d_n}    (5.10)

is orthogonal. Further, define the tensor

    S := A \times_1 \tilde{U}^{(1)T} \times_2 \tilde{U}^{(2)T} \cdots \times_N \tilde{U}^{(N)T} \in R^{d_1 x d_2 x ... x d_N}.    (5.11)

Then the equality

    A = S \times_1 \tilde{U}^{(1)} \times_2 \tilde{U}^{(2)} \cdots \times_N \tilde{U}^{(N)}    (5.12)

holds. This decomposition of A has the following two properties:

(i) The side-matrices \tilde{U}^(n)'s are orthogonal.
(ii) The (squared) norm of the diagonal of the core S,

    \sum_{i=1}^{\min\{d_1,...,d_N\}} s_{i...i}^2 = \sum_{i=1}^r ( A \times_1 u_i^{(1)T} \times_2 u_i^{(2)T} \cdots \times_N u_i^{(N)T} )^2 = \sum_{i=1}^r \sigma_i^2,

is maximized among all the choices of the orthogonal side-matrices. This is known as maximal diagonality in [12].

5.3. First order condition. The Lagrangian of (5.6) is

    L = \sum_{i=1}^r \sigma_i^2 - \sum_{n=1}^N \sum_{j,k=1}^r \mu_{j,k}^n ( <u_j^{(n)}, u_k^{(n)}> - \delta_{jk} ),    (5.13)

where

    \sigma_i = A \times_1 u_i^{(1)T} \times_2 u_i^{(2)T} \cdots \times_N u_i^{(N)T}    (5.14)

and the mu_{j,k}^n's are Lagrange multipliers. Define the vector

    v_i^{(n)} := A \times_1 u_i^{(1)T} \cdots \times_{n-1} u_i^{(n-1)T} \times_{n+1} u_i^{(n+1)T} \cdots \times_N u_i^{(N)T} \in R^{1 x ... x 1 x d_n x 1 x ... x 1}.    (5.15)

(Here we abuse the notation "=". More precisely, v_i^(n) should be the mode-n unfolding of the tensor on the right-hand side of (5.15).) It is not hard to see that <u_i^(n), v_i^(n)> = sigma_i for all n and i, and v_i^(n) is the partial derivative of sigma_i with respect to u_i^(n).

The partial derivative of the Lagrangian with respect to u_i^(n) is

    \partial L / \partial u_i^{(n)} = 2 \sigma_i v_i^{(n)} - \sum_{j=1}^r \mu_{j,i}^n u_j^{(n)} - \sum_{k=1}^r \mu_{i,k}^n u_k^{(n)},

for any n and i. By setting the partial derivatives to 0 and putting all equations related to the same n in matrix form, we obtain the following equations:

    [ v_1^{(n)}, ..., v_r^{(n)} ] \, diag(\sigma_1, ..., \sigma_r) = [ u_1^{(n)}, ..., u_r^{(n)} ] \, M^{(n)},    (5.16)

for all n = 1, 2, ..., N, where M^(n) is the symmetric matrix with (j,k)-entry (mu_{j,k}^n + mu_{k,j}^n)/2. Let

    V^{(n)} := [ v_1^{(n)}, v_2^{(n)}, ..., v_r^{(n)} ],    (5.17)
    \Sigma := diag(\sigma_1, ..., \sigma_r).    (5.18)

Then (5.16) is compactly represented as

    V^{(n)} \Sigma = U^{(n)} M^{(n)},  n = 1, 2, ..., N.    (5.19)

In summary, the necessary condition for an extremum of the Lagrangian is the equation (5.19), where V^(n) is defined in (5.17), Sigma is defined in (5.18), U^(n) is defined in (5.7), and M^(n) is symmetric, for all n = 1, 2, ..., N.

5.4. Algorithm: LROAT. We seek orthogonal matrices U^(n)'s and symmetric matrices M^(n)'s which satisfy the system (5.19). (The Sigma and V^(n)'s are computed from the U^(n)'s.) Note that the pair U^(n), M^(n) happens to be the polar decomposition of the matrix V^(n) Sigma. Hence the system can be solved in an iterative fashion: We begin with an initial guess of the set of orthogonal matrices {U^(1), U^(2), ..., U^(N)}, which can be obtained, say, by the HOSVD of A. For each n, we compute V^(n) and Sigma, and update U^(n) as an orthogonal polar factor of V^(n) Sigma. This procedure is iterated until convergence is observed. Algorithm 1 (LROAT) summarizes this idea.

Algorithm 1 Low Rank Orthogonal Approximation of Tensors (LROAT)
Input: Tensor A, rank r, orthogonal matrices U^(1), ..., U^(N) as initial guess
Output: sigma_1, ..., sigma_r, U^(1), ..., U^(N)
1: repeat
2:   for n = 1, ..., N do
3:     Compute V^(n) = [v_1^(n), ..., v_r^(n)] according to (5.15)
4:     Compute Sigma = diag(sigma_1, ..., sigma_r) according to (5.14)
5:     [Q^(n), H^(n)] <- polar-decomp(V^(n) Sigma)
6:     Update U^(n) <- Q^(n)
7:   end for
8: until convergence

Note that when r = 1, the matrix V^(n) = [v_1^(n)] and U^(n) = [u_1^(n)], which means that for each iteration u_1^(n) is updated as the normalized v_1^(n). This indicates that the LROAT algorithm for r = 1 boils down to the ALS method [43] (or the so-called higher-order power method [13, 27]) for computing the optimal rank-1 approximation. Hence, it is not unexpected to see in the numerical experiments that in general LROAT converges linearly. We also comment that LROAT is not an alternating least squares method (except for the case r = 1) by the nature of the update of U^(n).
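A minimal MATLAB sketch of Algorithm 1 for an order-3 tensor is given below, assuming the cyclic unfolding of Section 2.1. The names (unfold, polarfactor) and the fixed iteration count are ours; the orthogonal polar factor is obtained from a thin SVD, and script-local functions assume MATLAB R2016b or later. This is an illustration of the iteration, not the authors' code.

A = randn(5, 5, 5); r = 3;
[U1,~] = qr(randn(5,r), 0); U2 = U1; U3 = U1;    % some initial guess
unfold = @(A, n) reshape(permute(A, [n, n+1:ndims(A), 1:n-1]), size(A, n), []);

for iter = 1:200                    % convergence test omitted for brevity
    % mode 1: columns v_i^(1) and sigma_i, cf. (5.14)-(5.15)
    V1 = zeros(5, r);
    for i = 1:r, V1(:,i) = unfold(A,1) * kron(U3(:,i), U2(:,i)); end
    Sig = diag(sum(U1 .* V1, 1));   % sigma_i = <u_i^(1), v_i^(1)>
    U1 = polarfactor(V1 * Sig);     % lines 5-6 of Algorithm 1
    % mode 2
    V2 = zeros(5, r);
    for i = 1:r, V2(:,i) = unfold(A,2) * kron(U1(:,i), U3(:,i)); end
    Sig = diag(sum(U2 .* V2, 1));
    U2 = polarfactor(V2 * Sig);
    % mode 3
    V3 = zeros(5, r);
    for i = 1:r, V3(:,i) = unfold(A,3) * kron(U2(:,i), U1(:,i)); end
    Sig = diag(sum(U3 .* V3, 1));
    U3 = polarfactor(V3 * Sig);
end
sigma = sum(U3 .* V3, 1)            % weights of the r orthogonal terms

function Q = polarfactor(X)
    % orthogonal polar factor of X (full column rank assumed)
    [W, ~, Z] = svd(X, 'econ');
    Q = W * Z';
end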

5.5. Convergence analysis. LROAT employs an alternating procedure (iterating through U^(1), U^(2), ..., U^(N)), where in each step all but one (U^(n)) of the parameters are fixed. In general, algorithms of this type, including alternating least squares, are not guaranteed to converge. Specifically, the objective function may converge but not the parameters. (See, for example, [32] for some discussions.) For LROAT, we are also unable yet to prove global convergence, though empirically it appears to hold. However, in this section, we will prove that: (1) The iterations monotonically increase the objective value E' (Theorem 5.4); (2) Under a mild condition, every converging subsequence of the generated parameter sequence converges to a stationary point of the objective function (Theorem 5.7); and (3) In a neighborhood of a local maximum, the parameter sequence converges to this stationary point (Theorem 5.9).

Before analyzing the convergence behavior of LROAT, we index all the iterates. The outer loop is indexed by p and the overall iterations are indexed by idx, which is equal to n + (p-1)N. In other words, Algorithm 1 is rewritten as follows. In particular, the numbered lines correspond to the lines in Algorithm 1.

for p = 1, 2, ... do
  for n = 1, ..., N do
    idx = n + (p-1)N
    For all i, compute sigma_i^(idx) according to U^(1)_(p+1), ..., U^(n-1)_(p+1), U^(n)_(p), U^(n+1)_(p), ..., U^(N)_(p)
    Objective E'^(idx) = \sum_{i=1}^r (sigma_i^(idx))^2
3:  Compute V^(n)_(p) from U^(1)_(p+1), ..., U^(n-1)_(p+1), U^(n+1)_(p), ..., U^(N)_(p)
4:  Assign Sigma^(idx) = diag(sigma_1^(idx), ..., sigma_r^(idx))
5:  Polar decomposition V^(n)_(p) Sigma^(idx) = Q^(n)_(p) H^(n)_(p)
6:  Update U^(n)_(p+1) = Q^(n)_(p)
  end for
end for

The following lemma, which is well-known when the matrix A is square, reveals the trace maximizing property that is important for the convergence analysis of LROAT.

Lemma 5.3. Let a matrix A in R^{m x n}, m >= n, have the polar decomposition A = QH, where Q in R^{m x n} is the orthogonal polar factor and H in R^{n x n} is the symmetric positive semi-definite factor. Then

    max { tr(P^T A) : P in R^{m x n}, P^T P = I }

is attained when P = Q.

Proof. Any P can be written as ZQ, where Z in R^{m x m} is orthogonal. Then tr(P^T A) = tr(Q^T Z^T Q H) = tr(Z^T Q H Q^T). Since Q H Q^T is symmetric positive semi-definite, max tr(Z^T Q H Q^T) is attained when Z = I.

Since U^(n)_(p+1) is the orthogonal polar factor of V^(n)_(p) Sigma^(idx), by Lemma 5.3,

    \sum_{i=1}^r (\sigma_i^{(idx)})^2 = tr( U^{(n)T}_{(p)} V^{(n)}_{(p)} \Sigma^{(idx)} ) <= tr( U^{(n)T}_{(p+1)} V^{(n)}_{(p)} \Sigma^{(idx)} ) = \sum_{i=1}^r \sigma_i^{(idx+1)} \sigma_i^{(idx)}.

Then by the Cauchy-Schwarz inequality,

    \sum_{i=1}^r \sigma_i^{(idx+1)} \sigma_i^{(idx)} <= ( \sum_i (\sigma_i^{(idx+1)})^2 )^{1/2} ( \sum_i (\sigma_i^{(idx)})^2 )^{1/2},

so that

    \sum_{i=1}^r (\sigma_i^{(idx)})^2 <= \sum_{i=1}^r (\sigma_i^{(idx+1)})^2,    (5.20)

and

    \sum_{i=1}^r (\sigma_i^{(idx)})^2 = \sum_{i=1}^r (\sigma_i^{(idx+1)})^2  iff  \sigma_i^{(idx)} = \sigma_i^{(idx+1)} for all i.    (5.21)

Inequality (5.20) means that each update of U^(n) increases the value of the objective function E', i.e., E'^(idx) <= E'^(idx+1). Since E' is bounded from above (existence of the maximum, see Theorem 5.2), the sequence {E'^(idx)} converges. Note that the convergence does not depend on the initial guess input to the algorithm.
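The trace-maximizing property of Lemma 5.3 is easy to probe empirically. In the following MATLAB sketch (ours), the orthogonal polar factor Q of a random matrix is compared against random competitors with orthonormal columns; tr(Q'*A) comes out at least as large:

A = randn(6, 3);
[W, ~, Z] = svd(A, 'econ');
Q  = W * Z';                        % orthogonal polar factor of A
tQ = trace(Q' * A);                 % equals the sum of singular values

best = -inf;
for t = 1:1000                      % random orthonormal competitors
    [P, ~] = qr(randn(6, 3), 0);
    best = max(best, trace(P' * A));
end
[tQ, best]                          % tQ >= best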

Formally, we have established the following theorem:

Theorem 5.4. Given any initial guess, the iterations of Algorithm 1 monotonically increase the objective function E' defined in (5.6) to a limit.

The convergence of the objective function does not necessarily imply that the function parameters will converge. However, in our case, since the parameters U^(n)'s are bounded, they admit converging subsequences. Next we will show that every such subsequence converges to a stationary point of E'. For this, the following lemma uses a helper function f.

Lemma 5.5. Let T : Theta -> Theta be a continuous mapping and a sequence {theta_n in Theta} be generated from the fixed point iteration theta_{n+1} = T(theta_n). If there exists a continuous function f : Theta -> R satisfying the following two conditions:
(i) The sequence {f(theta_n)} converges;
(ii) For theta in Theta, if f(T(theta)) = f(theta) then T(theta) = theta;
then every converging subsequence of {theta_n} converges to a fixed point of T.

Proof. Let {theta_{s_n}} be a converging subsequence of {theta_n}, where theta_{s_n} -> theta*. Also let f* be the limit of f(theta_n). Then f(theta_{s_n}) -> f(theta*), therefore f(theta*) = f*. Meanwhile, from the continuity of T and f, we have T(theta_{s_n}) -> T(theta*) and f(theta_{s_n + 1}) = f(T(theta_{s_n})) -> f(T(theta*)), which implies that f(T(theta*)) = f*. Condition (ii) of the lemma now implies that theta* = T(theta*).

Our objective function E' is just one such helper function f, and the orthogonal polar factor function plays the role of the mapping T in the above lemma. The following lemma establishes the fact that the orthogonal polar factor function is continuous.

Lemma 5.6. The orthogonal polar factor function g : A -> Q, defined on the set of matrices with full column rank, is continuous. Here Q is the orthogonal polar factor of A in R^{m x n}, m >= n.

Proof. First, the function g is well defined, since the orthogonal polar factor of a full rank matrix exists and is unique [21]. If Q and Q~ are the orthogonal polar factors of A and A~, respectively, Sun and Chen [39] have shown that

    || Q - Q~ ||_F <= 2 ||A^+||_2 || A - A~ ||_F,

where ^+ denotes the pseudo-inverse. Hence if A_1, A_2, ... converges to A*, then g(A_1), g(A_2), ... converges to g(A*).

Now we are ready to prove the following result.

Theorem 5.7. Every converging subsequence of { U^(1)_(p), ..., U^(N)_(p) } generated by Algorithm 1 converges to a stationary point of the objective function E' defined in (5.6), provided the matrices V^(n) in line 3 of the algorithm do not become rank-deficient throughout the iterations.

Proof. For convenience, let U denote the side-by-side concatenation of the U^(n)'s, i.e., at iteration number p we write U_(p) := [U^(1)_(p), ..., U^(N)_(p)]. For each iteration, V^(n)_(p) Sigma^(idx) is computed from U_(p) and polar factorized, and U^(n)_(p) is updated. Let T be the composite of all these iterations running n from 1 to N. That is, U_(p+1) = T(U_(p)). It is not hard to see that T is continuous by Lemma 5.6. The objective function E' taking parameter U_(p) has been previously shown to be such that the sequence {E'(U_(p))} is monotonically converging. Hence by Lemma 5.5, in order to prove this theorem, it will suffice to show that E'(T(U)) = E'(U) implies T(U) = U. Then every converging subsequence of {U_(p)} converges to a fixed point, which satisfies the first order condition (5.19), i.e., it is also a stationary point of E'.

If E'(T(U)) = E'(U), formula (5.21) indicates that the sigma values have not changed after the iteration. In particular, for any n, the update of U^(n) has not changed tr( U^{(n)T} V^{(n)} Sigma ). Since the orthogonal polar factor of V^(n) Sigma is unique when V^(n) is not rank-deficient, this means that U^(n) has not changed. This in turn means that U is a fixed point of the mapping T.

The condition in the theorem is not a strong requirement in general. Of course, the columns v_i^(n) of the matrix V^(n), as computed from (5.15), will be linearly dependent if the n-rank of A is less than r. For practical applications, the tensor usually has full n-ranks for all n, so this does not hamper the applicability of the theorem.

Though the global convergence of {U_(p)} is not determined, when localized, it is possible that this parameter sequence converges. The following lemma and theorems consider this situation.

Lemma 5.8. If a sequence {theta_n} is bounded, and all of its converging subsequences converge to theta*, then theta_n -> theta*.

Proof. (By contradiction.) If {theta_n} does not converge to theta*, then there is an eps > 0 such that there exists a subsequence S = {theta_{s_n}}, where ||theta_{s_n} - theta*|| >= eps for all n. Since S is bounded, it has a converging subsequence S'. Then S', as a subsequence of {theta_n}, converges to a limit other than theta*.

Theorem 5.9. Let U* = [U^(1)*, ..., U^(N)*] be a local maximum of the objective function E' defined in (5.6). If the sequence {U_(p) := [U^(1)_(p), ..., U^(N)_(p)]} generated by Algorithm 1 lies in a neighborhood of U*, where U* is the only stationary point in that neighborhood, and if the full rank requirement in Theorem 5.7 is satisfied, then the sequence {U_(p)} converges to U*.

Proof. This immediately follows from Theorem 5.7 and Lemma 5.8.

Note that since the starting elements of a sequence have no effect on its convergence behavior, the above theorem holds whenever a tailing subsequence, starting from a sufficiently large p, lies within the neighborhood. A weaker, but simpler, result is the following corollary.

Corollary 5.10. Let U* = [U^(1)*, ..., U^(N)*] be a local maximum of the objective function E' defined in (5.6). If this local maximum is unique and if the full rank requirement in Theorem 5.7 is satisfied, then the sequence {U_(p) := [U^(1)_(p), ..., U^(N)_(p)]} generated by Algorithm 1 converges to U*.

5.6. LROAT for symmetric tensors. An order-N tensor A in R^{d x d x ... x d}, whose dimensions along all modes are the same, is symmetric if for all permutations pi,

    a_{i_1, i_2, ..., i_N} = a_{i_{pi(1)}, i_{pi(2)}, ..., i_{pi(N)}}.

For symmetric tensors, usually the approximation problem (5.4) has the additional constraint that the side-matrices U^(n)'s are the same for all n, i.e., the problem becomes

    min E = || A - \sum_{i=1}^r \sigma_i \, u_i \otimes u_i \otimes \cdots \otimes u_i ||_F
    s.t. <u_j, u_k> = delta_{jk}.    (5.22)

Applying similar arguments to those in Section 5.1, it is easily seen that (5.22) is

equivalent to the following problem:

    max E' = \sum_{i=1}^r ( A \times_1 u_i^T \times_2 u_i^T \cdots \times_N u_i^T )^2
    s.t. <u_j, u_k> = delta_{jk}.    (5.23)

The supremum of E' can be attained. Further, the maximal-diagonality decomposition of A (cf. Equation (5.12)) has the additional property that the core S is symmetric. Also, the first order condition (5.19) is simplified to V Sigma = U M.

Hence, there are two approaches to compute the approximation for a symmetric tensor A. The first approach is to directly apply LROAT to A. The theorems in the above section guarantee convergence under mild assumptions, but the side-matrices might no longer be the same, though in the next section an experiment indicates that they indeed converge to the same matrix. The second approach is to use only a single initial guess U and omit the for-loop on n (line 2 of Algorithm 1). We call this the symmetric variant of LROAT. In this case Theorem 5.4 no longer holds, i.e., the iterations might not monotonically increase the objective value E' defined in (5.23), since the for-loop on n is omitted. An experiment in the next section shows an oscillating phenomenon, similar to the one indicated in Figure 4.1 of [27], for the objective value E'.

6. Numerical experiments. This section will show a few experiments to illustrate the convergence behavior and the approximation quality of LROAT. For comparison, we use the ALS methods for Tucker and PARAFAC, whose implementations are based on the codes from the MATLAB Tensor Toolbox developed by Bader and Kolda [2]. We use the major left singular vectors of the unfoldings as the initial guess input for all the algorithms compared. When it comes to the quality of the final approximation, experience shows that compared with random orthonormal vectors, singular vectors as initial guesses do not offer any advantage. It has been argued that running the algorithms several times using different sets of random initial guesses enhances the probability of hitting the global optimum. We use singular vectors here only for repeatability.

6.1. Convergence of LROAT. In the first experiment, we test LROAT (and the symmetric variant of LROAT mentioned in Section 5.6) on a few tensors listed in Table 6.1. The results are shown in Figures 6.1 and 6.2. Each row of the figures is one test on a tensor. The left plot shows the objective value (the same as the norm of the approximated tensor T) for each iteration p, while the right plot shows the convergence history of the U^(n)'s. Since the optima are unknown, we plot the values

    || U^{(n)}_{(p)} - U^{(n)}_{(p-1)} ||_F / || U^{(n)}_{(p)} ||_F,  p = 1, 2, ...,

to indicate the convergence of the sequence. Since these values are plotted on a logarithmic scale, if the curves are bounded from above by a straight decreasing line, then the convergence of the sequence is at least linear.

Figures 6.1 and 6.2 show in total five tests. The first test (Figure 6.1(a)) uses a randomly generated tensor A_1. The second test (Figure 6.1(b)) uses a low-rank-plus-Gaussian-noise tensor

    A_2 = B_1 + rho B_2,

where the low rank tensor B_1 is in the form (2.3) with r = 5, the Gaussian noise tensor B_2 has normally distributed elements, and rho = 0.1 ||B_1||_F / ||B_2||_F.

Table 6.1
The tensors used for the first experiment. The value r is the rank input to LROAT; it is not the rank of the tensor.

Tensor | Dimensions | r | Notes
A_1    |            |   | a random tensor
A_2    |            |   | a rank-5 tensor + Gaussian noise
A_3    |            |   | the (i, j, k)-entry = 1/(i^2 + j^2 + k^2)
A_4    |            |   | see [27, Example 1]

In these two tests the two tensors are applied to the LROAT algorithm. The third test (Figure 6.1(c)) uses a symmetric tensor A_3 with entries (A_3)_{ijk} = 1/(i^2 + j^2 + k^2). In this test A_3 is applied to the symmetric variant of LROAT. All three tests show a linear convergence rate.

The fourth (Figure 6.2(a)) and the fifth (Figure 6.2(b)) tests use a symmetric tensor A_4 introduced in [27, Example 1]:

(A_4)_1111 = , (A_4)_1112 = , (A_4)_1113 = ,
(A_4)_1122 = , (A_4)_1123 = , (A_4)_1133 = ,
(A_4)_1222 = , (A_4)_1223 = , (A_4)_1233 = ,
(A_4)_1333 = , (A_4)_2222 = , (A_4)_2223 = ,
(A_4)_2233 = , (A_4)_2333 = , (A_4)_3333 = .

In [27], the symmetric higher-order power method for computing the optimal rank-1 approximation of A_4 is shown to be non-converging. We experiment with this tensor with r = 2 on LROAT and on the symmetric variant of LROAT. Figure 6.2(a) shows that when applied to LROAT, the approximation to A_4 indeed converges linearly, and what is more, all the side-matrices converge to the same result. The approximation computed by LROAT is

    A_4 ~ sigma_1 u^(1) \otimes u^(1) \otimes u^(1) \otimes u^(1) + sigma_2 u^(2) \otimes u^(2) \otimes u^(2) \otimes u^(2)

with

    sigma_1 = , u^(1) = [ ]^T,
    sigma_2 = , u^(2) = [ ]^T.

On the other hand, Figure 6.2(b) shows that the symmetric variant of LROAT fails to converge.

6.2. Low rank orthogonal approximation compared with Tucker and PARAFAC. In the second experiment, we compare the approximation quality of three different models: the low rank orthogonal approximation (without confusion, in this section we call this model LROAT for short, which happens to be the name of the algorithm), Tucker, and PARAFAC. See Figure 6.3. We experiment with two tensors: a low-rank-plus-Gaussian-noise tensor which is generated the same way

as A_2, and a real-life tensor. The latter is obtained from a problem in acoustics [20], and the data can be downloaded from [16]. The residual norms

    res(p) := || A - T_{(p)} ||_F / || A ||_F

over all the iterations p are plotted. Figure 6.3 indicates three facts: (1) The three models approximate the data tensor well to some extent (less than 35% of the information is lost due to approximation); (2) PARAFAC is usually slow to converge; (3) The residual norm for LROAT is larger than those of Tucker and PARAFAC. The last fact is not unexpected since LROAT can be considered a special case of Tucker and of PARAFAC: The Tucker model has a full core while the core for LROAT is diagonal, and unlike LROAT the side-matrices in the PARAFAC model are not restricted to be orthogonal.

6.3. An application. In the blind source separation (BSS) problem [5], the cumulant tensor of order 4 is a rank-R tensor:

    \sum_{i=1}^R \sigma_i \, u_i \otimes u_i \otimes u_i \otimes u_i,    (6.1)

where R is the number of sources and u_i is the i-th column of the mixing matrix. In the prewhitening approach for the BSS problem, the u_i's become the columns of the composite of the whitening matrix and the mixing matrix, that is, the u_i's are length-R vectors and are orthonormal. Hence, this prewhitening approach reduces to computing the tensor SVD of the cumulant tensor. Since in practice this tensor is estimated from a finite data set, it is not exact. Thus, the low rank orthogonal approximation becomes a suitable tool to recover the u_i's.
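A hedged MATLAB sketch of a synthetic setup of this kind follows (all names, sizes, and weights below are ours, chosen only for illustration): a symmetric order-4 tensor of the form (6.1) is built from R orthonormal vectors, and a symmetrized Gaussian noise term is added.

R = 3;
[U, ~] = qr(randn(R), 0);            % orthonormal u_1, ..., u_R
sigma = [3; 2; 1];                   % illustrative weights

outer4 = @(u) reshape(kron(kron(u, u), kron(u, u)), R, R, R, R);
B3 = zeros(R, R, R, R);
for i = 1:R, B3 = B3 + sigma(i) * outer4(U(:,i)); end

N = randn(R, R, R, R);               % symmetrize a Gaussian tensor
B4 = zeros(R, R, R, R);
P = perms(1:4);
for k = 1:size(P, 1), B4 = B4 + permute(N, P(k,:)); end
B4 = B4 / size(P, 1);

rho = 0.05 * norm(B3(:)) / norm(B4(:));
A5 = B3 + rho * B4;                  % the noisy cumulant-like tensor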

In an experiment, we let R = 3 and generate a data tensor

    A_5 = B_3 + rho B_4,

where B_3 is as in (6.1), B_4 is a symmetric tensor with normally distributed elements, and rho = 0.05 ||B_3||_F / ||B_4||_F. The sigma_i's and the u_i's are

    sigma_1 = , sigma_2 = , sigma_3 = ,  U = [u_1, u_2, u_3] = .

We use four methods to compute the rank-R (or rank-(R, R, R)) approximations to A_5: LROAT, incremental rank-1 approximation, PARAFAC, and Tucker. All four methods return the same side-matrices for all modes. They are:

    U_LROAT = ,  U_inc = ,  U_PARAFAC = ,  U_Tucker = .

Observations are as follows:
1. The matrices U_inc and U_PARAFAC are not orthogonal.
2. Compared with Tucker, LROAT gives better approximations to the vectors u_i:

    || U - U_LROAT || = ,  || U - U_Tucker || = .

3. In terms of approximation quality, the residual norms (in percentage of the norm of A_5) are res_LROAT = 3.07%, res_inc = 1.36%, res_PARAFAC = 1.36%, res_Tucker = 0%.

7. Concluding remarks. In the present paper we studied the tensor SVD, and characterized its existence in relation to the HOSVD. Similar to the concept of rank, the SVD of higher order tensors exhibits quite different behavior and characteristics from those of matrices. Thus, the SVD of a matrix is guaranteed to exist, though it may have different representations due to orthogonal transformations of singular vectors corresponding to the same singular value. On the other hand, there are many ways in which a tensor can fail to have an SVD (see the results in Section 4), but when it exists, this decomposition is unique up to signs.

We have also discussed a new form of optimal low rank approximation of tensors, where orthogonality is required. This approximation is inspired by the constraints of the Tucker model and the PARAFAC model. In some applications, the proposed approximation model may be favored, since it results in N sets of orthonormal vectors or, equivalently, r mutually orthogonal unit rank-1 tensors with different weights. Among the advantages of this approximation over the Tucker model is the fact that it requires far fewer entries to represent the core, and that it is easier to interpret. Also, compared with the PARAFAC model, the orthogonality of the vectors may be useful in some cases. Further, the LROAT algorithm for computing the proposed approximation does not seem to exhibit the well-known slow convergence from which the ALS algorithm for PARAFAC suffers.

A major restriction of the proposed model is that the number of terms r cannot exceed the smallest dimension of all modes of the tensor. A consequence is that the approximation may still be very different from the original tensor even when the maximum r is employed. However, we note that when performing data analysis, the interpretation of the vectors and the core tensor might be more important than how much is lost when the data is approximated. A nice aspect of the proposed approximation is that the optimum of the objective function can theoretically be attained, in contrast to the PARAFAC model, which is ill-posed in a strict mathematical sense. We presented an algorithm to compute this approximation, but the computed result is only optimal in a local neighborhood. It will be interesting to study for which tensors or which initial guesses the LROAT algorithm converges to the global optimum, or to devise a new algorithm to solve this optimization problem. It is an open problem how fast LROAT converges, although empirically convergence is observed to be linear. We also discussed the symmetric variant of LROAT and pointed out the possibility of its non-convergence. Hence the convergence properties of this variant, and the observed phenomenon that the original LROAT algorithm can yield the same side-matrices for symmetric tensors, remain to be investigated.

Appendix. Does the ALS algorithm for PARAFAC converge? It has been pointed out that the ALS algorithm for computing the PARAFAC model may

converge very slowly due to degenerate solutions or multicollinearities, and many alternatives have been proposed to address this problem [36, 37, 26]. During the iterations, the objective value monotonically decreases by the nature of the alternating least squares procedure, and since the sequence is bounded, it converges. However, a proof of the convergence of the parallel factors is lacking. In general it is assumed that these factors converge, but may take a very large number of iterations. In this section, we discuss an experiment showing that the general concept of convergence is unclear in this context. Though only one example is given, we note that the exhibited behavior is not rare for randomly generated tensors. (On the other hand, it may be argued that tensors in real applications are far from being filled with random entries.)

We generate an order-3 tensor A and run the ALS algorithm with r = 2, i.e., to compute the approximation

    A ~ lambda_1 u_1^(1) \otimes u_1^(2) \otimes u_1^(3) + lambda_2 u_2^(1) \otimes u_2^(2) \otimes u_2^(3).

The MATLAB code which generates the tensor A is as follows:

A(:,:,1) = [ ; ; ];
A(:,:,2) = [ ; ; ];
A(:,:,3) = [ ; ; ];

We use u_i^(n) = e_i, where n = 1, 2, 3 and e_i is the i-th column of the identity matrix, as the initial guess. Denote U^(n) = [u_1^(n), u_2^(n)] for n = 1, 2, 3. Two plots are shown after running 10^5 iterations (see Figure A.1). Figure A.1(a) shows the convergence history for each U^(n). The curves represent ||U^(n)_(p) - U^(n)_(p-1)||, where p is the index of the iterations. A necessary condition for convergence to occur is that all three curves decrease to zero. However we see from the figure that this may not be the case. To test the conjecture that each of the curves tends to a nonzero value, we use the following expression

    log_10 y = a (10^{-4} x)^{1/alpha} + b

to fit the tailing part of the curves (starting from the th iteration). Table A.1 gives the fitting results for different alpha's. When the number of iterations tends to infinity, the value 10^b will show the limit of the differences between two consecutive U^(n)'s.

It is still difficult to conclude for this example that the iterations do not converge, since rounding has not been taken into account. However, it makes no practical difference for this case whether the sequence actually converges or whether it is exceedingly slow to converge. The result, if convergence holds, will be an inordinate number of iterations to reach a desirable level of convergence, and the cost will be too high in practice. This can be made evident by examining Figure A.1(b), which plots the parallel factor u_2^(1) over all iterations: The 3rd entry of u_2^(1) decreases from at the th iteration to at the th iteration.

REFERENCES


2.3 Nilpotent endomorphisms s a block dagonal matrx, wth A Mat dm U (C) In fact, we can assume that B = B 1 B k, wth B an ordered bass of U, and that A = [f U ] B, where f U : U U s the restrcton of f to U 40 23 Nlpotent endomorphsms

More information

Module 9. Lecture 6. Duality in Assignment Problems

Module 9. Lecture 6. Duality in Assignment Problems Module 9 1 Lecture 6 Dualty n Assgnment Problems In ths lecture we attempt to answer few other mportant questons posed n earler lecture for (AP) and see how some of them can be explaned through the concept

More information

Lecture 20: Lift and Project, SDP Duality. Today we will study the Lift and Project method. Then we will prove the SDP duality theorem.

Lecture 20: Lift and Project, SDP Duality. Today we will study the Lift and Project method. Then we will prove the SDP duality theorem. prnceton u. sp 02 cos 598B: algorthms and complexty Lecture 20: Lft and Project, SDP Dualty Lecturer: Sanjeev Arora Scrbe:Yury Makarychev Today we wll study the Lft and Project method. Then we wll prove

More information

Notes on Frequency Estimation in Data Streams

Notes on Frequency Estimation in Data Streams Notes on Frequency Estmaton n Data Streams In (one of) the data streamng model(s), the data s a sequence of arrvals a 1, a 2,..., a m of the form a j = (, v) where s the dentty of the tem and belongs to

More information

A New Refinement of Jacobi Method for Solution of Linear System Equations AX=b

A New Refinement of Jacobi Method for Solution of Linear System Equations AX=b Int J Contemp Math Scences, Vol 3, 28, no 17, 819-827 A New Refnement of Jacob Method for Soluton of Lnear System Equatons AX=b F Naem Dafchah Department of Mathematcs, Faculty of Scences Unversty of Gulan,

More information

Problem Set 9 Solutions

Problem Set 9 Solutions Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem

More information

CHAPTER III Neural Networks as Associative Memory

CHAPTER III Neural Networks as Associative Memory CHAPTER III Neural Networs as Assocatve Memory Introducton One of the prmary functons of the bran s assocatve memory. We assocate the faces wth names, letters wth sounds, or we can recognze the people

More information

College of Computer & Information Science Fall 2009 Northeastern University 20 October 2009

College of Computer & Information Science Fall 2009 Northeastern University 20 October 2009 College of Computer & Informaton Scence Fall 2009 Northeastern Unversty 20 October 2009 CS7880: Algorthmc Power Tools Scrbe: Jan Wen and Laura Poplawsk Lecture Outlne: Prmal-dual schema Network Desgn:

More information

Affine transformations and convexity

Affine transformations and convexity Affne transformatons and convexty The purpose of ths document s to prove some basc propertes of affne transformatons nvolvng convex sets. Here are a few onlne references for background nformaton: http://math.ucr.edu/

More information

Numerical Heat and Mass Transfer

Numerical Heat and Mass Transfer Master degree n Mechancal Engneerng Numercal Heat and Mass Transfer 06-Fnte-Dfference Method (One-dmensonal, steady state heat conducton) Fausto Arpno f.arpno@uncas.t Introducton Why we use models and

More information

8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS

8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS SECTION 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS 493 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS All the vector spaces you have studed thus far n the text are real vector spaces because the scalars

More information

Lecture 3. Ax x i a i. i i

Lecture 3. Ax x i a i. i i 18.409 The Behavor of Algorthms n Practce 2/14/2 Lecturer: Dan Spelman Lecture 3 Scrbe: Arvnd Sankar 1 Largest sngular value In order to bound the condton number, we need an upper bound on the largest

More information

Generalized Linear Methods

Generalized Linear Methods Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set

More information

Supplement: Proofs and Technical Details for The Solution Path of the Generalized Lasso

Supplement: Proofs and Technical Details for The Solution Path of the Generalized Lasso Supplement: Proofs and Techncal Detals for The Soluton Path of the Generalzed Lasso Ryan J. Tbshran Jonathan Taylor In ths document we gve supplementary detals to the paper The Soluton Path of the Generalzed

More information

Deriving the X-Z Identity from Auxiliary Space Method

Deriving the X-Z Identity from Auxiliary Space Method Dervng the X-Z Identty from Auxlary Space Method Long Chen Department of Mathematcs, Unversty of Calforna at Irvne, Irvne, CA 92697 chenlong@math.uc.edu 1 Iteratve Methods In ths paper we dscuss teratve

More information

MATH 241B FUNCTIONAL ANALYSIS - NOTES EXAMPLES OF C ALGEBRAS

MATH 241B FUNCTIONAL ANALYSIS - NOTES EXAMPLES OF C ALGEBRAS MATH 241B FUNCTIONAL ANALYSIS - NOTES EXAMPLES OF C ALGEBRAS These are nformal notes whch cover some of the materal whch s not n the course book. The man purpose s to gve a number of nontrval examples

More information

Lecture 21: Numerical methods for pricing American type derivatives

Lecture 21: Numerical methods for pricing American type derivatives Lecture 21: Numercal methods for prcng Amercan type dervatves Xaoguang Wang STAT 598W Aprl 10th, 2014 (STAT 598W) Lecture 21 1 / 26 Outlne 1 Fnte Dfference Method Explct Method Penalty Method (STAT 598W)

More information

10-801: Advanced Optimization and Randomized Methods Lecture 2: Convex functions (Jan 15, 2014)

10-801: Advanced Optimization and Randomized Methods Lecture 2: Convex functions (Jan 15, 2014) 0-80: Advanced Optmzaton and Randomzed Methods Lecture : Convex functons (Jan 5, 04) Lecturer: Suvrt Sra Addr: Carnege Mellon Unversty, Sprng 04 Scrbes: Avnava Dubey, Ahmed Hefny Dsclamer: These notes

More information

Yong Joon Ryang. 1. Introduction Consider the multicommodity transportation problem with convex quadratic cost function. 1 2 (x x0 ) T Q(x x 0 )

Yong Joon Ryang. 1. Introduction Consider the multicommodity transportation problem with convex quadratic cost function. 1 2 (x x0 ) T Q(x x 0 ) Kangweon-Kyungk Math. Jour. 4 1996), No. 1, pp. 7 16 AN ITERATIVE ROW-ACTION METHOD FOR MULTICOMMODITY TRANSPORTATION PROBLEMS Yong Joon Ryang Abstract. The optmzaton problems wth quadratc constrants often

More information

NUMERICAL DIFFERENTIATION

NUMERICAL DIFFERENTIATION NUMERICAL DIFFERENTIATION 1 Introducton Dfferentaton s a method to compute the rate at whch a dependent output y changes wth respect to the change n the ndependent nput x. Ths rate of change s called the

More information

Convexity preserving interpolation by splines of arbitrary degree

Convexity preserving interpolation by splines of arbitrary degree Computer Scence Journal of Moldova, vol.18, no.1(52), 2010 Convexty preservng nterpolaton by splnes of arbtrary degree Igor Verlan Abstract In the present paper an algorthm of C 2 nterpolaton of dscrete

More information

a b a In case b 0, a being divisible by b is the same as to say that

a b a In case b 0, a being divisible by b is the same as to say that Secton 6.2 Dvsblty among the ntegers An nteger a ε s dvsble by b ε f there s an nteger c ε such that a = bc. Note that s dvsble by any nteger b, snce = b. On the other hand, a s dvsble by only f a = :

More information

Kernel Methods and SVMs Extension

Kernel Methods and SVMs Extension Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general

More information

EEE 241: Linear Systems

EEE 241: Linear Systems EEE : Lnear Systems Summary #: Backpropagaton BACKPROPAGATION The perceptron rule as well as the Wdrow Hoff learnng were desgned to tran sngle layer networks. They suffer from the same dsadvantage: they

More information

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016 U.C. Berkeley CS94: Spectral Methods and Expanders Handout 8 Luca Trevsan February 7, 06 Lecture 8: Spectral Algorthms Wrap-up In whch we talk about even more generalzatons of Cheeger s nequaltes, and

More information

Feature Selection: Part 1

Feature Selection: Part 1 CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?

More information

Exercise Solutions to Real Analysis

Exercise Solutions to Real Analysis xercse Solutons to Real Analyss Note: References refer to H. L. Royden, Real Analyss xersze 1. Gven any set A any ɛ > 0, there s an open set O such that A O m O m A + ɛ. Soluton 1. If m A =, then there

More information

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton

More information

The Geometry of Logit and Probit

The Geometry of Logit and Probit The Geometry of Logt and Probt Ths short note s meant as a supplement to Chapters and 3 of Spatal Models of Parlamentary Votng and the notaton and reference to fgures n the text below s to those two chapters.

More information

Math 217 Fall 2013 Homework 2 Solutions

Math 217 Fall 2013 Homework 2 Solutions Math 17 Fall 013 Homework Solutons Due Thursday Sept. 6, 013 5pm Ths homework conssts of 6 problems of 5 ponts each. The total s 30. You need to fully justfy your answer prove that your functon ndeed has

More information

Appendix for Causal Interaction in Factorial Experiments: Application to Conjoint Analysis

Appendix for Causal Interaction in Factorial Experiments: Application to Conjoint Analysis A Appendx for Causal Interacton n Factoral Experments: Applcaton to Conjont Analyss Mathematcal Appendx: Proofs of Theorems A. Lemmas Below, we descrbe all the lemmas, whch are used to prove the man theorems

More information

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011 Stanford Unversty CS359G: Graph Parttonng and Expanders Handout 4 Luca Trevsan January 3, 0 Lecture 4 In whch we prove the dffcult drecton of Cheeger s nequalty. As n the past lectures, consder an undrected

More information

Lecture 12: Discrete Laplacian

Lecture 12: Discrete Laplacian Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly

More information

Time-Varying Systems and Computations Lecture 6

Time-Varying Systems and Computations Lecture 6 Tme-Varyng Systems and Computatons Lecture 6 Klaus Depold 14. Januar 2014 The Kalman Flter The Kalman estmaton flter attempts to estmate the actual state of an unknown dscrete dynamcal system, gven nosy

More information

Randić Energy and Randić Estrada Index of a Graph

Randić Energy and Randić Estrada Index of a Graph EUROPEAN JOURNAL OF PURE AND APPLIED MATHEMATICS Vol. 5, No., 202, 88-96 ISSN 307-5543 www.ejpam.com SPECIAL ISSUE FOR THE INTERNATIONAL CONFERENCE ON APPLIED ANALYSIS AND ALGEBRA 29 JUNE -02JULY 20, ISTANBUL

More information

Solutions to exam in SF1811 Optimization, Jan 14, 2015

Solutions to exam in SF1811 Optimization, Jan 14, 2015 Solutons to exam n SF8 Optmzaton, Jan 4, 25 3 3 O------O -4 \ / \ / The network: \/ where all lnks go from left to rght. /\ / \ / \ 6 O------O -5 2 4.(a) Let x = ( x 3, x 4, x 23, x 24 ) T, where the varable

More information

Maximizing the number of nonnegative subsets

Maximizing the number of nonnegative subsets Maxmzng the number of nonnegatve subsets Noga Alon Hao Huang December 1, 213 Abstract Gven a set of n real numbers, f the sum of elements of every subset of sze larger than k s negatve, what s the maxmum

More information

Perron Vectors of an Irreducible Nonnegative Interval Matrix

Perron Vectors of an Irreducible Nonnegative Interval Matrix Perron Vectors of an Irreducble Nonnegatve Interval Matrx Jr Rohn August 4 2005 Abstract As s well known an rreducble nonnegatve matrx possesses a unquely determned Perron vector. As the man result of

More information

APPROXIMATE PRICES OF BASKET AND ASIAN OPTIONS DUPONT OLIVIER. Premia 14

APPROXIMATE PRICES OF BASKET AND ASIAN OPTIONS DUPONT OLIVIER. Premia 14 APPROXIMAE PRICES OF BASKE AND ASIAN OPIONS DUPON OLIVIER Prema 14 Contents Introducton 1 1. Framewor 1 1.1. Baset optons 1.. Asan optons. Computng the prce 3. Lower bound 3.1. Closed formula for the prce

More information

Chapter Newton s Method

Chapter Newton s Method Chapter 9. Newton s Method After readng ths chapter, you should be able to:. Understand how Newton s method s dfferent from the Golden Secton Search method. Understand how Newton s method works 3. Solve

More information

Formulas for the Determinant

Formulas for the Determinant page 224 224 CHAPTER 3 Determnants e t te t e 2t 38 A = e t 2te t e 2t e t te t 2e 2t 39 If 123 A = 345, 456 compute the matrx product A adj(a) What can you conclude about det(a)? For Problems 40 43, use

More information

Estimating the Fundamental Matrix by Transforming Image Points in Projective Space 1

Estimating the Fundamental Matrix by Transforming Image Points in Projective Space 1 Estmatng the Fundamental Matrx by Transformng Image Ponts n Projectve Space 1 Zhengyou Zhang and Charles Loop Mcrosoft Research, One Mcrosoft Way, Redmond, WA 98052, USA E-mal: fzhang,cloopg@mcrosoft.com

More information

General viscosity iterative method for a sequence of quasi-nonexpansive mappings

General viscosity iterative method for a sequence of quasi-nonexpansive mappings Avalable onlne at www.tjnsa.com J. Nonlnear Sc. Appl. 9 (2016), 5672 5682 Research Artcle General vscosty teratve method for a sequence of quas-nonexpansve mappngs Cuje Zhang, Ynan Wang College of Scence,

More information

A new construction of 3-separable matrices via an improved decoding of Macula s construction

A new construction of 3-separable matrices via an improved decoding of Macula s construction Dscrete Optmzaton 5 008 700 704 Contents lsts avalable at ScenceDrect Dscrete Optmzaton journal homepage: wwwelsevercom/locate/dsopt A new constructon of 3-separable matrces va an mproved decodng of Macula

More information

4DVAR, according to the name, is a four-dimensional variational method.

4DVAR, according to the name, is a four-dimensional variational method. 4D-Varatonal Data Assmlaton (4D-Var) 4DVAR, accordng to the name, s a four-dmensonal varatonal method. 4D-Var s actually a drect generalzaton of 3D-Var to handle observatons that are dstrbuted n tme. The

More information

REAL ANALYSIS I HOMEWORK 1

REAL ANALYSIS I HOMEWORK 1 REAL ANALYSIS I HOMEWORK CİHAN BAHRAN The questons are from Tao s text. Exercse 0.0.. If (x α ) α A s a collecton of numbers x α [0, + ] such that x α

More information

COS 521: Advanced Algorithms Game Theory and Linear Programming

COS 521: Advanced Algorithms Game Theory and Linear Programming COS 521: Advanced Algorthms Game Theory and Lnear Programmng Moses Charkar February 27, 2013 In these notes, we ntroduce some basc concepts n game theory and lnear programmng (LP). We show a connecton

More information

SL n (F ) Equals its Own Derived Group

SL n (F ) Equals its Own Derived Group Internatonal Journal of Algebra, Vol. 2, 2008, no. 12, 585-594 SL n (F ) Equals ts Own Derved Group Jorge Macel BMCC-The Cty Unversty of New York, CUNY 199 Chambers street, New York, NY 10007, USA macel@cms.nyu.edu

More information

Appendix B. The Finite Difference Scheme

Appendix B. The Finite Difference Scheme 140 APPENDIXES Appendx B. The Fnte Dfference Scheme In ths appendx we present numercal technques whch are used to approxmate solutons of system 3.1 3.3. A comprehensve treatment of theoretcal and mplementaton

More information

1 GSW Iterative Techniques for y = Ax

1 GSW Iterative Techniques for y = Ax 1 for y = A I m gong to cheat here. here are a lot of teratve technques that can be used to solve the general case of a set of smultaneous equatons (wrtten n the matr form as y = A), but ths chapter sn

More information

The Second Anti-Mathima on Game Theory

The Second Anti-Mathima on Game Theory The Second Ant-Mathma on Game Theory Ath. Kehagas December 1 2006 1 Introducton In ths note we wll examne the noton of game equlbrum for three types of games 1. 2-player 2-acton zero-sum games 2. 2-player

More information

Some Comments on Accelerating Convergence of Iterative Sequences Using Direct Inversion of the Iterative Subspace (DIIS)

Some Comments on Accelerating Convergence of Iterative Sequences Using Direct Inversion of the Iterative Subspace (DIIS) Some Comments on Acceleratng Convergence of Iteratve Sequences Usng Drect Inverson of the Iteratve Subspace (DIIS) C. Davd Sherrll School of Chemstry and Bochemstry Georga Insttute of Technology May 1998

More information

ON A DETERMINATION OF THE INITIAL FUNCTIONS FROM THE OBSERVED VALUES OF THE BOUNDARY FUNCTIONS FOR THE SECOND-ORDER HYPERBOLIC EQUATION

ON A DETERMINATION OF THE INITIAL FUNCTIONS FROM THE OBSERVED VALUES OF THE BOUNDARY FUNCTIONS FOR THE SECOND-ORDER HYPERBOLIC EQUATION Advanced Mathematcal Models & Applcatons Vol.3, No.3, 2018, pp.215-222 ON A DETERMINATION OF THE INITIAL FUNCTIONS FROM THE OBSERVED VALUES OF THE BOUNDARY FUNCTIONS FOR THE SECOND-ORDER HYPERBOLIC EUATION

More information

6.854J / J Advanced Algorithms Fall 2008

6.854J / J Advanced Algorithms Fall 2008 MIT OpenCourseWare http://ocw.mt.edu 6.854J / 18.415J Advanced Algorthms Fall 2008 For nformaton about ctng these materals or our Terms of Use, vst: http://ocw.mt.edu/terms. 18.415/6.854 Advanced Algorthms

More information

Approximate Smallest Enclosing Balls

Approximate Smallest Enclosing Balls Chapter 5 Approxmate Smallest Enclosng Balls 5. Boundng Volumes A boundng volume for a set S R d s a superset of S wth a smple shape, for example a box, a ball, or an ellpsod. Fgure 5.: Boundng boxes Q(P

More information

BOUNDEDNESS OF THE RIESZ TRANSFORM WITH MATRIX A 2 WEIGHTS

BOUNDEDNESS OF THE RIESZ TRANSFORM WITH MATRIX A 2 WEIGHTS BOUNDEDNESS OF THE IESZ TANSFOM WITH MATIX A WEIGHTS Introducton Let L = L ( n, be the functon space wth norm (ˆ f L = f(x C dx d < For a d d matrx valued functon W : wth W (x postve sem-defnte for all

More information

Welfare Properties of General Equilibrium. What can be said about optimality properties of resource allocation implied by general equilibrium?

Welfare Properties of General Equilibrium. What can be said about optimality properties of resource allocation implied by general equilibrium? APPLIED WELFARE ECONOMICS AND POLICY ANALYSIS Welfare Propertes of General Equlbrum What can be sad about optmalty propertes of resource allocaton mpled by general equlbrum? Any crteron used to compare

More information

On the symmetric character of the thermal conductivity tensor

On the symmetric character of the thermal conductivity tensor On the symmetrc character of the thermal conductvty tensor Al R. Hadjesfandar Department of Mechancal and Aerospace Engneerng Unversty at Buffalo, State Unversty of New York Buffalo, NY 146 USA ah@buffalo.edu

More information

Lecture 12: Classification

Lecture 12: Classification Lecture : Classfcaton g Dscrmnant functons g The optmal Bayes classfer g Quadratc classfers g Eucldean and Mahalanobs metrcs g K Nearest Neghbor Classfers Intellgent Sensor Systems Rcardo Guterrez-Osuna

More information

Inexact Newton Methods for Inverse Eigenvalue Problems

Inexact Newton Methods for Inverse Eigenvalue Problems Inexact Newton Methods for Inverse Egenvalue Problems Zheng-jan Ba Abstract In ths paper, we survey some of the latest development n usng nexact Newton-lke methods for solvng nverse egenvalue problems.

More information

Graph Reconstruction by Permutations

Graph Reconstruction by Permutations Graph Reconstructon by Permutatons Perre Ille and Wllam Kocay* Insttut de Mathémathques de Lumny CNRS UMR 6206 163 avenue de Lumny, Case 907 13288 Marselle Cedex 9, France e-mal: lle@ml.unv-mrs.fr Computer

More information

On the correction of the h-index for career length

On the correction of the h-index for career length 1 On the correcton of the h-ndex for career length by L. Egghe Unverstet Hasselt (UHasselt), Campus Depenbeek, Agoralaan, B-3590 Depenbeek, Belgum 1 and Unverstet Antwerpen (UA), IBW, Stadscampus, Venusstraat

More information

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction ECONOMICS 5* -- NOTE (Summary) ECON 5* -- NOTE The Multple Classcal Lnear Regresson Model (CLRM): Specfcaton and Assumptons. Introducton CLRM stands for the Classcal Lnear Regresson Model. The CLRM s also

More information

FINITELY-GENERATED MODULES OVER A PRINCIPAL IDEAL DOMAIN

FINITELY-GENERATED MODULES OVER A PRINCIPAL IDEAL DOMAIN FINITELY-GENERTED MODULES OVER PRINCIPL IDEL DOMIN EMMNUEL KOWLSKI Throughout ths note, s a prncpal deal doman. We recall the classfcaton theorem: Theorem 1. Let M be a fntely-generated -module. (1) There

More information

Some basic inequalities. Definition. Let V be a vector space over the complex numbers. An inner product is given by a function, V V C

Some basic inequalities. Definition. Let V be a vector space over the complex numbers. An inner product is given by a function, V V C Some basc nequaltes Defnton. Let V be a vector space over the complex numbers. An nner product s gven by a functon, V V C (x, y) x, y satsfyng the followng propertes (for all x V, y V and c C) (1) x +

More information

MEM 255 Introduction to Control Systems Review: Basics of Linear Algebra

MEM 255 Introduction to Control Systems Review: Basics of Linear Algebra MEM 255 Introducton to Control Systems Revew: Bascs of Lnear Algebra Harry G. Kwatny Department of Mechancal Engneerng & Mechancs Drexel Unversty Outlne Vectors Matrces MATLAB Advanced Topcs Vectors A

More information

A Local Variational Problem of Second Order for a Class of Optimal Control Problems with Nonsmooth Objective Function

A Local Variational Problem of Second Order for a Class of Optimal Control Problems with Nonsmooth Objective Function A Local Varatonal Problem of Second Order for a Class of Optmal Control Problems wth Nonsmooth Objectve Functon Alexander P. Afanasev Insttute for Informaton Transmsson Problems, Russan Academy of Scences,

More information

form, and they present results of tests comparng the new algorthms wth other methods. Recently, Olschowka & Neumaer [7] ntroduced another dea for choo

form, and they present results of tests comparng the new algorthms wth other methods. Recently, Olschowka & Neumaer [7] ntroduced another dea for choo Scalng and structural condton numbers Arnold Neumaer Insttut fur Mathematk, Unverstat Wen Strudlhofgasse 4, A-1090 Wen, Austra emal: neum@cma.unve.ac.at revsed, August 1996 Abstract. We ntroduce structural

More information

7. Products and matrix elements

7. Products and matrix elements 7. Products and matrx elements 1 7. Products and matrx elements Based on the propertes of group representatons, a number of useful results can be derved. Consder a vector space V wth an nner product ψ

More information

Economics 101. Lecture 4 - Equilibrium and Efficiency

Economics 101. Lecture 4 - Equilibrium and Efficiency Economcs 0 Lecture 4 - Equlbrum and Effcency Intro As dscussed n the prevous lecture, we wll now move from an envronment where we looed at consumers mang decsons n solaton to analyzng economes full of

More information

On the Multicriteria Integer Network Flow Problem

On the Multicriteria Integer Network Flow Problem BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 5, No 2 Sofa 2005 On the Multcrtera Integer Network Flow Problem Vassl Vasslev, Marana Nkolova, Maryana Vassleva Insttute of

More information