arxiv: v1 [math.ag] 10 Oct 2011


Are diverging CP components always nearly proportional?

Alwin Stegeman and Lieven De Lathauwer

November, 8

Abstract

Fitting a Candecomp/Parafac (CP) decomposition (also known as Canonical Polyadic decomposition) to a multi-way array or higher-order tensor is equivalent to finding a best low-rank approximation to that array or tensor, where the rank is defined as the outer-product rank. However, such a best low-rank approximation may not exist, because the set of multi-way arrays with rank at most $R$ is not closed for $R \ge 2$. Nonexistence of a best low-rank approximation results in (groups of) diverging rank-1 components when an attempt is made to compute the approximation. In this note, we show that in a group of two or three diverging components, the components converge to proportionality almost everywhere. A partial proof of this result for larger groups of diverging components is also given. We also give examples of groups of three, four, and six non-proportional diverging components. These examples are shown to be exceptional cases.

Keywords: tensor decomposition, low-rank approximation, Candecomp, Parafac, diverging components.

AMS subject classifications: 5A8, 5A, 5A69, 49M7, 6H5.

A. Stegeman is with the Heijmans Institute for Psychological Research, University of Groningen, Grote Kruisstraat /, 97 TS Groningen, The Netherlands, phone: ++3 5 363 693, fax: ++3 5 363 634, email: a.w.stegeman@rug.nl, URL: http://www.gmw.rug.nl/~stegeman. Research is supported by the Dutch Organisation for Scientific Research (NWO), VIDI grant 45-8-.

L. De Lathauwer is with the Group Science, Engineering and Technology of the Katholieke Universiteit Leuven Campus Kortrijk, E. Sabbelaan 53, 85 Kortrijk, Belgium, tel.: +3-()56-3.6.6, fax: +3-()56-4.69.99, e-mail: Lieven.DeLathauwer@kuleuven-kortrijk.be. He is also with the Department of Electrical Engineering (ESAT), Research Division SCD, of the Katholieke Universiteit Leuven, Kasteelpark Arenberg, B-3 Leuven, Belgium, tel.: +3-()6-3.86.5, fax: +3-()6-3.9.7, e-mail: Lieven.DeLathauwer@esat.kuleuven.be, URL: http://homes.esat.kuleuven.be/~delathau/home.html. Research supported by: (1) Research Council K.U.Leuven: GOA-MaNet, CoE EF/5/6 Optimization in Engineering (OPTEC), CIF and STRT/8/3, (2) F.W.O.: (a) project G.47.N, (b) Research Communities ICCoS, ANMMM and MLDM, (3) the Belgian Federal Science Policy Office: IUAP P6/4 (DYSCO, Dynamical systems, control and optimization, 7 ), (4) EU: ERNSI.

1 Introduction

This note is an addendum to Stegeman [7], where the following subject is studied. Let $\circ$ denote the outer product, and define the rank (over the real field) of $\mathcal{Y} \in \mathbb{R}^{I \times J \times K}$ as

\[ \mathrm{rank}(\mathcal{Y}) = \min\Big\{ R \;:\; \mathcal{Y} = \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r \Big\}. \tag{1.1} \]

Let

\[ S_R(I,J,K) = \{ \mathcal{Y} \in \mathbb{R}^{I \times J \times K} : \mathrm{rank}(\mathcal{Y}) \le R \} \tag{1.2} \]

be the set of $I \times J \times K$ arrays with rank at most $R$, and let $\overline{S}_R(I,J,K)$ denote the closure of $S_R(I,J,K)$, i.e. the union of the set itself and its boundary points in $\mathbb{R}^{I \times J \times K}$. Let $\mathcal{Z} \in \mathbb{R}^{I \times J \times K}$ and let $\|\cdot\|$ denote the Frobenius norm on $\mathbb{R}^{I \times J \times K}$. Consider the following low-rank approximation problem:

\[ \min\{ \|\mathcal{Z} - \mathcal{Y}\| \;:\; \mathcal{Y} \in S_R(I,J,K) \}. \tag{1.3} \]

The variables in this problem are actually the rank-1 terms in the rank-$R$ decomposition of $\mathcal{Y}$:

\[ \mathcal{Y} = \sum_{r=1}^{R} \omega_r \, (\mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r), \tag{1.4} \]

where the vectors $\mathbf{a}_r, \mathbf{b}_r, \mathbf{c}_r$ have norm 1, $r = 1,\ldots,R$. The decomposition (1.4) is known as Candecomp/Parafac (CP) or Canonical Polyadic decomposition. Let $A = [\mathbf{a}_1 \ldots \mathbf{a}_R]$, $B = [\mathbf{b}_1 \ldots \mathbf{b}_R]$ and $C = [\mathbf{c}_1 \ldots \mathbf{c}_R]$.

Assuming $\mathrm{rank}(\mathcal{Z}) > R$, an optimal solution of (1.3) will be a boundary point of the set $S_R(I,J,K)$. However, the set $S_R(I,J,K)$ is not closed for $R \ge 2$, and problem (1.3) may therefore have no optimal solution; see De Silva and Lim [2]. This results in (groups of) diverging rank-1 terms (also known as diverging CP components) in (1.4) when an attempt is made to compute a best rank-$R$ approximation; see Krijnen, Dijkstra and Stegeman [4]. In such a case, the solution array $\mathcal{Y}$ converges to a boundary point $\mathcal{X}$ of $S_R(I,J,K)$ with $\mathrm{rank}(\mathcal{X}) > R$. In practice this has the following consequences: while running a CP algorithm to solve (1.3), the decrease of $\|\mathcal{Z} - \mathcal{Y}\|$ becomes very slow, and some (groups of) columns of $A$, $B$, and $C$ become nearly linearly dependent, while the corresponding weights $\omega_r$ become large in magnitude. However, the sum of the corresponding rank-1 terms remains small and contributes to a better CP fit.
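The behaviour described above can be reproduced on a very small case. The sketch below is a standard textbook-style illustration and is not taken from this note: it assumes the rank-3 array $\mathbf{a}\circ\mathbf{a}\circ\mathbf{b} + \mathbf{a}\circ\mathbf{b}\circ\mathbf{a} + \mathbf{b}\circ\mathbf{a}\circ\mathbf{a}$ with unit vectors $\mathbf{a}, \mathbf{b} \in \mathbb{R}^2$, and the rank-2 sequence $n(\mathbf{a} + \mathbf{b}/n)^{\circ 3} - n\,\mathbf{a}^{\circ 3}$. All function names are hypothetical. The two weights grow like $n$, the two unit-norm vectors become nearly proportional, and the approximation error shrinks like $1/n$.

```python
import math

def outer3(u, v, w):
    """Rank-1 array u o v o w as a nested list."""
    return [[[ui * vj * wk for wk in w] for vj in v] for ui in u]

def add(X, Y, alpha=1.0):
    """Entrywise X + alpha * Y for two arrays of equal size."""
    return [[[x + alpha * y for x, y in zip(rx, ry)]
             for rx, ry in zip(sx, sy)] for sx, sy in zip(X, Y)]

def frobenius(X):
    """Frobenius norm of a three-way array."""
    return math.sqrt(sum(x * x for s in X for r in s for x in r))

# Illustrative choice of vectors (an assumption of this sketch).
a, b = [1.0, 0.0], [0.0, 1.0]

# Target array a o a o b + a o b o a + b o a o a, which has rank 3.
X = add(add(outer3(a, a, b), outer3(a, b, a)), outer3(b, a, a))

def rank2_approximation(n):
    """Return Y_n = n (a + b/n)^{o3} - n a^{o3}, its two weights for
    unit-norm vectors, and the cosine of the angle between the vectors."""
    u = [ai + bi / n for ai, bi in zip(a, b)]
    Y = add(outer3(u, u, u), outer3(a, a, a), alpha=-1.0)
    Y = [[[n * y for y in r] for r in s] for s in Y]
    norm_u = math.sqrt(sum(ui * ui for ui in u))
    weights = (n * norm_u ** 3, -float(n))
    cosine = sum(ui * ai for ui, ai in zip(u, a)) / norm_u
    return Y, weights, cosine

for n in (10, 100, 1000):
    Y, weights, cosine = rank2_approximation(n)
    error = frobenius(add(X, Y, alpha=-1.0))
    print(n, error, weights, cosine)
```

In this sketch the error equals $\sqrt{3/n^2 + 1/n^4}$, so no rank-2 array attains the infimum, while both weights diverge.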
More formally, a group of diverging CP components corresponds to an index set $D \subseteq \{1,\ldots,R\}$ such that

\[ |\omega_r^{(n)}| \to \infty, \quad \text{for all } r \in D, \tag{1.5} \]

while

\[ \Big\| \sum_{r \in D} \omega_r^{(n)} \, (\mathbf{a}_r^{(n)} \circ \mathbf{b}_r^{(n)} \circ \mathbf{c}_r^{(n)}) \Big\| \quad \text{is bounded}, \tag{1.6} \]

where the superscript $(n)$ denotes the $n$-th CP update of the iterative CP algorithm. More than one group of diverging components may exist. In that case (1.5)-(1.6) hold for the corresponding disjoint index sets.

In practice, a group of diverging components as described above is almost always such that the corresponding columns of $A$, $B$, and $C$ become nearly identical up to sign. That is, the diverging components are nearly proportional. In this note, we focus on the question whether this is always true or not. In Section 2, we show that in a group of two or three diverging components, the corresponding columns of $A$, $B$, and $C$ converge to rank 1 almost everywhere. In Section 4, we give a partial proof of this result for larger groups of diverging components. In Sections 3, 5, and 6, we give examples of groups of three, four, and six non-proportional diverging components, respectively. These examples are shown to be exceptional cases.

In this note, all arrays, matrices, vectors, and scalars are real-valued.

2 Groups of two or three diverging components

We need the following notation. A matrix form of the CP decomposition (1.4) is

\[ Y_k = A \, C_k \, B^T, \quad k = 1,\ldots,K, \tag{2.1} \]

where $Y_k$ is the $k$-th $I \times J$ frontal slice of $\mathcal{Y}$, and $C_k$ is the diagonal matrix with row $k$ of $C$ as its diagonal. In (2.1), the weights $\omega_r$ are absorbed into $A$, $B$, and $C$.

We use $\mathcal{Y} = (S,T,U) \cdot \mathcal{G}$ to denote the multilinear matrix multiplication of an array $\mathcal{G} \in \mathbb{R}^{R \times P \times Q}$ with matrices $S$ ($I \times R$), $T$ ($J \times P$), and $U$ ($K \times Q$). The result of the multiplication is an $I \times J \times K$ array $\mathcal{Y}$ with entries

\[ y_{ijk} = \sum_{r=1}^{R} \sum_{p=1}^{P} \sum_{q=1}^{Q} s_{ir} \, t_{jp} \, u_{kq} \, g_{rpq}, \tag{2.2} \]

where $s_{ir}$, $t_{jp}$, and $u_{kq}$ are entries of $S$, $T$, and $U$, respectively. We refer to the multiplication $(I_I, I_J, U) \cdot \mathcal{G}$ with $U$ nonsingular as a slicemix of $\mathcal{G}$. We say that $\mathcal{G}$ has a nonsingular slicemix if $I = J$ and the array $(I_I, I_I, U) \cdot \mathcal{G}$ has a nonsingular frontal $I \times I$ slice for some nonsingular $U$.

For later use, we state the following lemma.

Lemma. For $R \le \min(I,J,K)$ and $\mathcal{Y} \in \overline{S}_R(I,J,K)$, it holds that $\mathcal{Y} = (S,T,U) \cdot \mathcal{G}$ for some $S$, $T$, $U$ column-wise orthonormal, and some $\mathcal{G} \in \overline{S}_R(R,R,R)$ with all frontal slices upper triangular. Moreover, $\mathcal{Y} \in S_R(I,J,K)$ if and only if $\mathcal{G} \in S_R(R,R,R)$.

Proof. See Stegeman [7, Lemma. (b)].

Our main result is the following.

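Theorem. Let $\mathcal{Y}^{(n)} = (A^{(n)}, B^{(n)}, C^{(n)}, \Omega^{(n)})$ be an $I \times J \times K$ CP decomposition, i.e.

\[ \mathcal{Y}^{(n)} = \sum_{r=1}^{R} \omega_r^{(n)} \, (\mathbf{a}_r^{(n)} \circ \mathbf{b}_r^{(n)} \circ \mathbf{c}_r^{(n)}), \tag{2.3} \]

with $A^{(n)}$, $B^{(n)}$, and $C^{(n)}$ having the length-1 vectors $\mathbf{a}_r^{(n)}$, $\mathbf{b}_r^{(n)}$, and $\mathbf{c}_r^{(n)}$ as columns, respectively, and $\Omega^{(n)} = \mathrm{diag}(\omega_1^{(n)},\ldots,\omega_R^{(n)})$. Let $\mathcal{Y}^{(n)} \to \mathcal{X}$ with $\mathrm{rank}(\mathcal{X}) > R$, and such that $|\omega_r^{(n)}| \to \infty$ for each $r$.

(i) If $R = 2$, then $A^{(n)}$, $B^{(n)}$, and $C^{(n)}$ converge to rank-1 matrices.

(ii) If $R = 3$, then $A^{(n)}$, $B^{(n)}$, and $C^{(n)}$ converge to rank-1 matrices for almost all $\mathcal{X}$.

Proof. Krijnen et al. [4] show that $A^{(n)}$, $B^{(n)}$, and $C^{(n)}$ converge to rank-deficient matrices. For $R = 2$, this completes the proof.

Next, let $R = 3$ and suppose $\min(I,J,K) \ge 3$. We have $\mathcal{X} \in \overline{S}_3(I,J,K)$ and $\mathcal{X}$ is a boundary point of $S_3(I,J,K)$. By the Lemma, there exist $L$, $M$, $N$ with orthonormal columns such that $\mathcal{X} = (L,M,N) \cdot \mathcal{G}$, where the $3 \times 3 \times 3$ array $\mathcal{G}$ has all frontal slices upper triangular. We have $\mathrm{rank}(\mathcal{G}) = \mathrm{rank}(\mathcal{X}) > 3$ and the rank-3 CP sequence $(L^T, M^T, N^T) \cdot \mathcal{Y}^{(n)} \to \mathcal{G}$. Slightly abusing notation, we write $\mathcal{Y}^{(n)} = (A^{(n)}, B^{(n)}, C^{(n)}) \to \mathcal{G}$, where the weights $\omega_r^{(n)}$ have been absorbed in $A^{(n)}, B^{(n)}, C^{(n)}$, which are now $3 \times 3$ matrices.

We assume that $\mathcal{G}$ has a nonsingular slicemix. This is true for almost all $\mathcal{X}$, i.e., the subset of boundary points $\mathcal{X}$ with rank larger than 3 for which $\mathcal{G}$ does not have a nonsingular slicemix has lower dimensionality than the set of boundary points with rank larger than 3. In fact, if $\mathcal{G}$ does not have a nonsingular slicemix, then its upper triangular slices have a zero on their diagonals in the same position. We apply a slicemix to $\mathcal{G}$ such that its first slice is nonsingular. Next, we premultiply the slices of $\mathcal{G}$ by the inverse of its first slice. Then $\mathcal{G}$ is of the form

\[ \mathcal{G} = [\, G_1 \mid G_2 \mid G_3 \,] = \left[ \begin{array}{ccc|ccc|ccc} 1 & 0 & 0 & a & d & f & \alpha & \delta & \nu \\ 0 & 1 & 0 & 0 & b & e & 0 & \beta & \epsilon \\ 0 & 0 & 1 & 0 & 0 & c & 0 & 0 & \gamma \end{array} \right]. \tag{2.4} \]

Since a matrix cannot be approximated arbitrarily well by a matrix of lower rank, it follows that the approximating rank-3 sequence $\mathcal{Y}^{(n)}$ has a nonsingular slicemix for $n$ large enough. Moreover, by the Lemma we may assume without loss of generality that $\mathcal{Y}^{(n)}$ has the form (2.4). We denote the entries of $\mathcal{Y}^{(n)}$ with subscript $n$, i.e. $a_n,\ldots,f_n$ and $\alpha_n,\ldots,\nu_n$. Hence,

\[ \mathcal{Y}^{(n)} = [\, Y_1^{(n)} \mid Y_2^{(n)} \mid Y_3^{(n)} \,] = \left[ \begin{array}{ccc|ccc|ccc} 1 & 0 & 0 & a_n & d_n & f_n & \alpha_n & \delta_n & \nu_n \\ 0 & 1 & 0 & 0 & b_n & e_n & 0 & \beta_n & \epsilon_n \\ 0 & 0 & 1 & 0 & 0 & c_n & 0 & 0 & \gamma_n \end{array} \right]. \tag{2.5} \]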
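The two normalization steps used in this part of the proof, a slicemix followed by premultiplication of all slices by the inverse of the first slice, can be sketched with exact rational arithmetic. The array entries and the mixing matrix `U` below are hypothetical illustration values, and the helper names are not from the note; `multilinear` implements the entrywise definition of the multilinear matrix multiplication given earlier in this section.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def multilinear(S, T, U, G):
    """Entries y_ijk = sum over r,p,q of s_ir * t_jp * u_kq * g_rpq."""
    R, P, Q = len(G), len(G[0]), len(G[0][0])
    return [[[sum(S[i][r] * T[j][p] * U[k][q] * G[r][p][q]
                  for r in range(R) for p in range(P) for q in range(Q))
              for k in range(len(U))]
             for j in range(len(T))]
            for i in range(len(S))]

def frontal_slice(G, k):
    return [[G[i][j][k] for j in range(len(G[0]))] for i in range(len(G))]

def from_slices(slices):
    """Stack equally sized matrices as the frontal slices of an array."""
    rows, cols = len(slices[0]), len(slices[0][0])
    return [[[s[i][j] for s in slices] for j in range(cols)] for i in range(rows)]

def inverse(M):
    """Gauss-Jordan inverse with exact rational arithmetic."""
    n = len(M)
    aug = [list(map(Fraction, row)) + identity(n)[i] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def is_upper_triangular(M):
    return all(M[i][j] == 0 for i in range(len(M)) for j in range(i))

# Hypothetical 3 x 3 x 3 array with upper triangular slices; the first is singular.
G1 = [[1, 2, 0], [0, 0, 1], [0, 0, 3]]
G2 = [[2, 1, 1], [0, 1, 4], [0, 0, 1]]
G3 = [[0, 3, 1], [0, 2, 2], [0, 0, 5]]
G = from_slices([G1, G2, G3])

I3 = identity(3)
U = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]        # nonsingular mixing matrix
H = multilinear(I3, I3, U, G)                # slicemix: new first slice is G1 + G2
H1_inv = inverse(frontal_slice(H, 0))
N = multilinear(H1_inv, I3, I3, H)           # premultiply every slice by H1^{-1}

print(frontal_slice(N, 0) == I3)
print(all(is_upper_triangular(frontal_slice(N, k)) for k in range(3)))
```

Both printed checks hold because the inverse of a nonsingular upper triangular matrix is upper triangular, so the normalized array keeps upper triangular slices while its first slice becomes the identity.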

Next, we consider the rank-3 decomposition $(A^{(n)}, B^{(n)}, C^{(n)})$ of $\mathcal{Y}^{(n)}$, which can be written as $Y_k^{(n)} = A^{(n)} C_k^{(n)} (B^{(n)})^T$, where the diagonal matrix $C_k^{(n)}$ has row $k$ of $C^{(n)}$ as its diagonal, $k = 1,2,3$. Since $Y_1^{(n)} = I_3$, the matrices $A^{(n)}$ and $B^{(n)}$ are nonsingular. Without loss of generality, we set $C_1^{(n)} = I_3$. Then $(A^{(n)})^{-1} = (B^{(n)})^T$ and $Y_k^{(n)} = A^{(n)} C_k^{(n)} (A^{(n)})^{-1}$ for $k = 2,3$. Hence, slices $Y_2^{(n)}$ and $Y_3^{(n)}$ have the same eigenvectors. Moreover, their three eigenvectors are linearly independent, and their eigenvalues are on the diagonals of $C_2^{(n)}$ and $C_3^{(n)}$, respectively. Since $Y_2^{(n)}$ and $Y_3^{(n)}$ have eigenvalues $a_n, b_n, c_n$ and $\alpha_n, \beta_n, \gamma_n$, respectively, we obtain

\[ C^{(n)} = \begin{pmatrix} 1 & 1 & 1 \\ a_n & b_n & c_n \\ \alpha_n & \beta_n & \gamma_n \end{pmatrix}. \tag{2.6} \]

From Krijnen et al. [4] we know that $A^{(n)}$, $B^{(n)}$, and $C^{(n)}$ converge to matrices with ranks less than 3. The eigendecomposition $Y_k^{(n)} = A^{(n)} C_k^{(n)} (A^{(n)})^{-1}$ converges to frontal slice $G_k$ of $\mathcal{G}$, $k = 2,3$. Hence, the eigenvectors in $A^{(n)}$ converge to those of $G_k$, $k = 2,3$.

Suppose $A^{(n)}$ has a rank-1 limit. Then $G_k$ has only one eigenvector and three identical eigenvalues, $k = 2,3$. Hence, $a = b = c$ and $\alpha = \beta = \gamma$.

Suppose $A^{(n)}$ has a rank-2 limit $[\mathbf{a}_1 \ \mathbf{a}_2 \ \mathbf{a}_3]$, with $\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3$ eigenvectors associated with eigenvalues $a, b, c$ of $G_2$, and eigenvalues $\alpha, \beta, \gamma$ of $G_3$, respectively. Without loss of generality, let $\mathbf{a}_1$ and $\mathbf{a}_2$ be linearly independent. If $\mathbf{a}_3$ is proportional to either $\mathbf{a}_1$ or $\mathbf{a}_2$, then $B^{(n)} = (A^{(n)})^{-T}$ has large numbers in only two columns. This violates the assumption of $|\omega_r^{(n)}| \to \infty$ for $r = 1,2,3$. Hence, $\mathbf{a}_3$ is in the linear span of $\{\mathbf{a}_1, \mathbf{a}_2\}$ and not proportional to $\mathbf{a}_1$ or to $\mathbf{a}_2$. For an eigenvalue $\lambda$ of $G_k$, we define the eigenspace

\[ E_k(\lambda) = \{ \mathbf{x} \in \mathbb{R}^3 : G_k \mathbf{x} = \lambda \mathbf{x} \}, \quad k = 2,3. \tag{2.7} \]

It holds that $\lambda_1 \ne \lambda_2$ implies $E_k(\lambda_1) \cap E_k(\lambda_2) = \{\mathbf{0}\}$. If $a, b, c$ are distinct, then $\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3$ would be linearly independent, which is not the case. Without loss of generality, let $a = b$. Then $\mathbf{a}_3 \in E_2(a) \cap E_2(c)$, which is impossible if $a \ne c$. Hence, it follows that $a = b = c$. The proof of $\alpha = \beta = \gamma$ is analogous. This implies that $C^{(n)}$ in (2.6) converges to a rank-1 matrix.
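The link between coalescing eigenvalues and coalescing eigenvectors can be made concrete with a small exact computation. The sketch below assumes one particular upper triangular slice with eigenvalues $a + 1/n$, $a - 1/n$, $a + 2/n$ and nonzero off-diagonal entries $d = 1$, $e = 2$, $f = 3$; these values and all function names are illustrative assumptions, not taken from the note. Eigenvectors of an upper triangular matrix with distinct diagonal entries are obtained by back-substitution.

```python
from fractions import Fraction

def eigenvectors_upper_triangular(T):
    """Matrix whose j-th column is an eigenvector of the upper triangular
    matrix T for the eigenvalue T[j][j]; diagonal entries must be distinct."""
    n = len(T)
    cols = []
    for j in range(n):
        x = [Fraction(0)] * n
        x[j] = Fraction(1)
        # Row i of (T - T[j][j] I) x = 0, solved from row j-1 upwards.
        for i in range(j - 1, -1, -1):
            s = sum(T[i][m] * x[m] for m in range(i + 1, j + 1))
            x[i] = -s / (T[i][i] - T[j][j])
        cols.append(x)
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def slice_with_close_eigenvalues(n, a=1, d=1, e=2, f=3):
    """Hypothetical slice with eigenvalues a + 1/n, a - 1/n, a + 2/n."""
    n = Fraction(n)
    return [[a + 1 / n, d, f],
            [0, a - 1 / n, e],
            [0, 0, a + 2 / n]]

def normalized_eigenvector_matrix(n):
    """Eigenvector matrix with every column rescaled to first entry 1."""
    V = eigenvectors_upper_triangular(slice_with_close_eigenvalues(n))
    return [[V[i][j] / V[0][j] for j in range(3)] for i in range(3)]

for n in (10, 100, 1000):
    A = normalized_eigenvector_matrix(n)
    # Largest deviation of the columns from the common direction (1, 0, 0).
    print(n, max(abs(A[i][j]) for i in (1, 2) for j in range(3)))
```

For this slice the printed deviation decreases like $1/n$, so all three normalized eigenvectors approach $(1, 0, 0)^T$ and the eigenvector matrix approaches a rank-1 matrix as the eigenvalues merge.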
As $\mathcal{Y}^{(n)} \to \mathcal{G}$, we first assume that the eigenvalues $a_n, b_n, c_n$ are distinct and the eigenvalues $\alpha_n, \beta_n, \gamma_n$ are distinct. It can be verified that the eigenvectors $A^{(n)}$ of $Y_2^{(n)}$ associated with eigenvalues $a_n, b_n, c_n$ are, respectively,

\[ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \qquad \begin{pmatrix} 1 \\ \dfrac{b_n - a_n}{d_n} \\ 0 \end{pmatrix}, \qquad \begin{pmatrix} 1 \\ \dfrac{e_n (c_n - a_n)}{d_n e_n + f_n (c_n - b_n)} \\ \dfrac{(c_n - a_n)(c_n - b_n)}{d_n e_n + f_n (c_n - b_n)} \end{pmatrix}. \tag{2.8} \]

As explained above, the eigenvectors of $Y_3^{(n)}$ (in terms of $\alpha_n,\ldots,\nu_n$) must be identical to those of $Y_2^{(n)}$. We assume $d \ne 0$, $e \ne 0$, $f \ne 0$, $\delta \ne 0$, $\epsilon \ne 0$, $\nu \ne 0$, which holds for almost all $\mathcal{X}$. Combined with $a = b = c$ and $\alpha = \beta = \gamma$, it follows from (2.8) that $A^{(n)}$ converges to a rank-1 matrix.

For $A^{(n)}$ as in (2.8), the columns of $B^{(n)} = (A^{(n)})^{-T}$ are

\[ \begin{pmatrix} 1 \\ \dfrac{d_n}{a_n - b_n} \\ \dfrac{d_n e_n + f_n (a_n - b_n)}{(a_n - b_n)(a_n - c_n)} \end{pmatrix}, \qquad \begin{pmatrix} 0 \\ \dfrac{d_n}{b_n - a_n} \\ \dfrac{d_n e_n}{(a_n - b_n)(c_n - b_n)} \end{pmatrix}, \qquad \begin{pmatrix} 0 \\ 0 \\ \dfrac{d_n e_n + f_n (c_n - b_n)}{(c_n - a_n)(c_n - b_n)} \end{pmatrix}. \tag{2.9} \]

Hence, each column of $B^{(n)}$ contains large numbers for large $n$. After normalizing the third entries of each column of $B^{(n)}$ to 1, we obtain

\[ \begin{pmatrix} \dfrac{(a_n - b_n)(a_n - c_n)}{d_n e_n + f_n (a_n - b_n)} \\ \dfrac{d_n (a_n - c_n)}{d_n e_n + f_n (a_n - b_n)} \\ 1 \end{pmatrix}, \qquad \begin{pmatrix} 0 \\ \dfrac{b_n - c_n}{e_n} \\ 1 \end{pmatrix}, \qquad \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}. \tag{2.10} \]

It follows that also $B^{(n)}$ converges to a rank-1 matrix.

Above, we assumed distinct eigenvalues $a_n, b_n, c_n$ and $\alpha_n, \beta_n, \gamma_n$. Next, we show that cases with identical eigenvalues can be left out of consideration. We only consider cases where some of $a_n, b_n, c_n$ are identical. Cases where some of $\alpha_n, \beta_n, \gamma_n$ are identical can be treated analogously.

If $a_n = b_n \ne c_n$ for $n$ large enough, then we must have $d_n = 0$ to obtain three linearly independent eigenvectors of $Y_2^{(n)}$. This is due to the upper triangular form of $Y_2^{(n)}$ in (2.5). This implies that $d = 0$ in the limit, which does not hold for almost all $\mathcal{X}$.

The case $a_n \ne b_n = c_n$ can be dealt with analogously. Here, we must have $e_n = 0$ to obtain three linearly independent eigenvectors of $Y_2^{(n)}$ in (2.5). This implies that $e = 0$ in the limit, which does not hold for almost all $\mathcal{X}$.

Next, suppose $a_n = c_n \ne b_n$ for $n$ large enough. To obtain three linearly independent eigenvectors of $Y_2^{(n)}$ in (2.5), we must have $d_n e_n + f_n (c_n - b_n) = 0$. Since $c_n - b_n \to c - b = 0$, this implies that $de = 0$ in the limit, which does not hold for almost all $\mathcal{X}$.

Finally, we consider the case $a_n = b_n = c_n$ for $n$ large enough. To obtain three linearly independent eigenvectors of $Y_2^{(n)}$ in (2.5), we must have $d_n = e_n = f_n = 0$. This implies that $d = e = f = 0$ in the limit, which does not hold for almost all $\mathcal{X}$. This completes the proof for $\min(I,J,K) \ge 3$.

Next, let $\min(I,J,K) = 2$. Without loss of generality, we assume $I \ge J \ge K$.
If $I = J = K = 2$, then $\mathcal{X}$ is a $2 \times 2 \times 2$ array, which has maximal rank 3 [3]. This contradicts $\mathrm{rank}(\mathcal{X}) > 3$. If $I > J = K = 2$, then by [2, Theorem 5.] there exists a column-wise orthonormal $L$ ($I \times 3$) such that $\mathcal{X} = (L, I_2, I_2) \cdot \mathcal{G}$, with $\mathcal{G}$ a $3 \times 2 \times 2$ array and $\mathrm{rank}(\mathcal{X}) = \mathrm{rank}(\mathcal{G})$. Since the maximal rank of $3 \times 2 \times 2$ arrays is 3 [3], we again obtain a contradiction.

Finally, let $I \ge J > K = 2$. By [2, Theorem 5.] there exist column-wise orthonormal $L$ ($I \times 3$) and $M$ ($J \times 3$) such that $\mathcal{X} = (L, M, I_2) \cdot \mathcal{G}$, with $\mathcal{G}$ a $3 \times 3 \times 2$ array and $\mathrm{rank}(\mathcal{X}) = \mathrm{rank}(\mathcal{G}) > 3$. The remainder of the proof is analogous to the beginning of the proof for $\min(I,J,K) \ge 3$. We let $\mathcal{Y}^{(n)} = (A^{(n)}, B^{(n)}, C^{(n)}) \to \mathcal{G}$, where $A^{(n)}$ and $B^{(n)}$ are $3 \times 3$, and $C^{(n)}$ is $2 \times 3$. As shown in [6], array $\mathcal{G}$ can be transformed to have upper triangular slices. Assuming $\mathcal{G}$ has a nonsingular slicemix, we transform it to

\[ \mathcal{G} = [\, G_1 \mid G_2 \,] = \left[ \begin{array}{ccc|ccc} 1 & 0 & 0 & a & d & f \\ 0 & 1 & 0 & 0 & b & e \\ 0 & 0 & 1 & 0 & 0 & c \end{array} \right]. \tag{2.11} \]

We assume $\mathcal{Y}^{(n)}$ to be of the same form, with entries $a_n, b_n, c_n, d_n, e_n, f_n$. As above, we have

\[ B^{(n)} = (A^{(n)})^{-T}, \qquad C^{(n)} = \begin{pmatrix} 1 & 1 & 1 \\ a_n & b_n & c_n \end{pmatrix}, \tag{2.12} \]

and the eigendecomposition $Y_2^{(n)} = A^{(n)} C_2^{(n)} (A^{(n)})^{-1}$ converging to frontal slice $G_2$. Krijnen et al. [4] show that $C^{(n)}$ converges to a rank-deficient matrix. Hence, $C^{(n)}$ converges to a rank-1 matrix, and we have $a = b = c$ in the limit. We assume distinct eigenvalues $a_n, b_n, c_n$. As above, having some identical eigenvalues for $n$ large enough yields $d = 0$ or $e = 0$ or $f = 0$, which does not hold for almost all $\mathcal{X}$.

The eigenvectors $A^{(n)}$ of $Y_2^{(n)}$ are given by (2.8). We assume $d \ne 0$, $e \ne 0$, $f \ne 0$. Since $a = b = c$, it can be seen that $A^{(n)}$ converges to a rank-1 matrix. The same is true for $B^{(n)} = (A^{(n)})^{-T}$, which has columns equal to (2.10) after normalizing the third entry of each column to 1. This completes the proof.

3 Example of non-proportional diverging components for R = 3

The example is of size $3 \times 3 \times 2$. Let

\[ \mathcal{X} = \left[ \begin{array}{ccc|ccc} 1 & 0 & 0 & a & 0 & f \\ 0 & 1 & 0 & 0 & a & e \\ 0 & 0 & 1 & 0 & 0 & a \end{array} \right], \tag{3.1} \]

with $e \ne 0$ and $f \ne 0$. Here, $\mathcal{X}$ plays the role of $\mathcal{G}$ in (2.11). Since $d = 0$ in $\mathcal{X}$ above, this is an exception to almost all boundary arrays $\mathcal{X}$ with $\mathrm{rank}(\mathcal{X}) > 3$. We have $\mathcal{Y}^{(n)} = (A^{(n)}, B^{(n)}, C^{(n)}) \to \mathcal{X}$, with

\[ A^{(n)} = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & \dfrac{e_n (c_n - a_n)}{f_n (c_n - b_n)} \\ 0 & 0 & \dfrac{c_n - a_n}{f_n} \end{pmatrix}, \qquad B^{(n)} = (A^{(n)})^{-T} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ \dfrac{f_n}{a_n - c_n} & \dfrac{e_n}{b_n - c_n} & \dfrac{f_n}{c_n - a_n} \end{pmatrix}, \tag{3.2} \]

and $C^{(n)}$ as in (2.12). Let $a_n = a + 1/n$, $b_n = a - 1/n$, $c_n = a + 2/n$. Then

\[ A^{(n)} \to \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & \dfrac{e}{3f} \\ 0 & 0 & 0 \end{pmatrix}, \qquad B^{(n)} \to \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{pmatrix}, \tag{3.3} \]

where the columns of $B^{(n)}$ are normalized such that their third entries are equal to 1. As we see, $A^{(n)}$ converges to a rank-2 matrix. Note that $|\omega_r^{(n)}| \to \infty$ for $r = 1,2,3$ in this example, since all three columns of $B^{(n)}$ in (3.2) will have large numbers as entries.

4 Extension to groups of four or more diverging components

Can statement (ii) of the Theorem also be proven for $R \ge 4$ if the requirement $R \le \min(I,J,K)$ is added? A proof of this could be analogous to the proof for $R = 3$ under this requirement. That is, we have $\mathcal{Y}^{(n)} = (A^{(n)}, B^{(n)}, C^{(n)}) \to \mathcal{G}$, where $\mathcal{G}$ is $R \times R \times R$ and has its first slice equal to $I_R$ and its other slices upper triangular. Matrices $A^{(n)}$, $B^{(n)}$, and $C^{(n)}$ have size $R \times R$, with $B^{(n)} = (A^{(n)})^{-T}$, $C_1^{(n)} = I_R$, and $Y_k^{(n)} = A^{(n)} C_k^{(n)} (A^{(n)})^{-1}$ converging to frontal slice $G_k$ of $\mathcal{G}$, $k = 2,\ldots,R$.

Suppose we have shown that $G_k$, $k = 2,\ldots,R$, all have $R$ identical eigenvalues. Then the limit of $C^{(n)}$ analogous to (2.6) is a rank-1 matrix. For any fixed $R \ge 4$, we can use symbolic computation software to obtain expressions for $A^{(n)}$ and $B^{(n)}$ analogous to (2.8)-(2.10). This would imply that also $A^{(n)}$ and $B^{(n)}$, when normalized to have length-1 columns, converge to rank-1 matrices. The cases where $Y_k^{(n)}$ has some identical eigenvalues for $n$ large enough, $2 \le k \le R$, restrict some entries of $\mathcal{G}$ to zero. This is an exception to almost all boundary arrays $\mathcal{X}$ with $\mathrm{rank}(\mathcal{X}) > R$.

The difficulty in obtaining the proof sketched above seems to be in showing that $G_k$, $k = 2,\ldots,R$, all have $R$ identical eigenvalues. For this to be proven, the possibilities for the rank and dependence structure of the limit of $A^{(n)}$ must be analyzed. To have a group of $R$ diverging components with $R \ge 4$, it must not only be checked that all columns of $B^{(n)} = (A^{(n)})^{-T}$ contain large numbers, but also that the $R$ components do not consist of several different groups of diverging components. For example, if $R = 4$, then having two groups of two diverging components corresponds to two times two identical eigenvalues in the limit.
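In this case, the limit of $A^{(n)}$ has two groups of two proportional columns.

To demonstrate the above, consider the case $R = 4$. It suffices to show that $G_k$ has four identical eigenvalues. Let $G_k$ have eigenvalues $\lambda_1, \lambda_2, \lambda_3, \lambda_4$ and associated eigenvectors $A = [\mathbf{a}_1 \ \mathbf{a}_2 \ \mathbf{a}_3 \ \mathbf{a}_4]$, with $A^{(n)} \to A$. Let $E(\lambda) = \{ \mathbf{x} \in \mathbb{R}^4 : G_k \mathbf{x} = \lambda \mathbf{x} \}$ denote the eigenspace corresponding to eigenvalue $\lambda$. We have $\mathrm{rank}(A) < 4$. If $\mathrm{rank}(A) = 1$, then $G_k$ has only one eigenvector and four identical eigenvalues: $\lambda_1 = \lambda_2 = \lambda_3 = \lambda_4$.

Next, let $\mathrm{rank}(A) = 2$. Without loss of generality, we assume $\mathbf{a}_3, \mathbf{a}_4 \in \mathrm{span}\{\mathbf{a}_1, \mathbf{a}_2\}$, with $\mathbf{a}_1$ and $\mathbf{a}_2$ linearly independent. Suppose $\lambda_1 = \lambda_2$. Then $\mathbf{a}_4 \in E(\lambda_4) \cap E(\lambda_1)$, which implies $\lambda_1 = \lambda_4$. Analogously, $\mathbf{a}_3 \in E(\lambda_3) \cap E(\lambda_1)$ implies $\lambda_1 = \lambda_3$. Hence, we obtain $\lambda_1 = \lambda_2 = \lambda_3 = \lambda_4$. Next, suppose $\lambda_1 \ne \lambda_2$. Because $\mathrm{rank}(A) = 2$, we have at most two distinct eigenvalues. If $\lambda_3 = \lambda_1 \ne \lambda_2 = \lambda_4$, then $\mathbf{a}_1$ and $\mathbf{a}_3$ are proportional and $\mathbf{a}_2$ and $\mathbf{a}_4$ are proportional. Hence, this is a case of two groups of two diverging components, and not one group of four diverging components. If $\lambda_1 \ne \lambda_2 = \lambda_3 = \lambda_4$, then $\mathbf{a}_2, \mathbf{a}_3, \mathbf{a}_4$ are proportional, and we have a group of three diverging components only (i.e., large numbers in three columns of $B^{(n)} = (A^{(n)})^{-T}$ only). Other possibilities for $\lambda_1 \ne \lambda_2$ and $\mathrm{rank}(A) = 2$ are analogous. It follows that if $\mathrm{rank}(A) = 2$, then $\lambda_1 = \lambda_2 = \lambda_3 = \lambda_4$.

Next, let $\mathrm{rank}(A) = 3$. Without loss of generality, we assume $\mathbf{a}_4 \in \mathrm{span}\{\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3\}$, with $\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3$ linearly independent. Suppose $\lambda_1 = \lambda_2 = \lambda_3$. Then $\mathbf{a}_4 \in E(\lambda_4) \cap E(\lambda_1)$, which implies $\lambda_1 = \lambda_4$, and yields the desired result. Next, suppose $\lambda_1 = \lambda_2 \ne \lambda_3$. If $\lambda_4 = \lambda_1$, then we have a group of three diverging components only. If $\lambda_4 = \lambda_3$, then $\mathbf{a}_3$ and $\mathbf{a}_4$ are proportional, and we have a group of two diverging components only. If $\lambda_4 \ne \lambda_1$ and $\lambda_4 \ne \lambda_3$, then $\mathrm{rank}(A) = 4$, which is not possible. Next, suppose that $\lambda_1, \lambda_2, \lambda_3$ are distinct. Then $\lambda_4$ must be equal to one of them. Let $\lambda_4 = \lambda_1$. Then $\mathbf{a}_1$ and $\mathbf{a}_4$ are proportional, and we have a group of two diverging components only. Other possibilities for the equality of some eigenvalues can be treated analogously. It follows that if $\mathrm{rank}(A) = 3$, then $\lambda_1 = \lambda_2 = \lambda_3 = \lambda_4$.

As a final remark, we state that under the requirement $R \le \min(I,J)$, the case $K = 2$ can be proven for any $R \ge 4$ analogous to the proof of $I \ge J > K = 2$ and $R = 3$.

5 Example of non-proportional diverging components for R = 4

This example is in the spirit of [2, Section 4]. Let $\min(I,J) \ge 6$, $K \ge 5$, and $R = 4$. Let $A = [\mathbf{a}_1 \ \mathbf{a}_2]$ ($I \times 2$) and $B = [\mathbf{b}_1 \ \mathbf{b}_2]$ ($J \times 2$) be random matrices. For random $Q$ ($2 \times 2$), let $\tilde{A} = [\tilde{\mathbf{a}}_1 \ \tilde{\mathbf{a}}_2] = AQ$ and $\tilde{B} = [\tilde{\mathbf{b}}_1 \ \tilde{\mathbf{b}}_2] = BQ^{-T}$. Then $AB^T = \tilde{A}\tilde{B}^T$, which implies

\[ \mathbf{a}_1 \circ \mathbf{b}_1 \circ \mathbf{c} + \mathbf{a}_2 \circ \mathbf{b}_2 \circ \mathbf{c} - \tilde{\mathbf{a}}_1 \circ \tilde{\mathbf{b}}_1 \circ \mathbf{c} - \tilde{\mathbf{a}}_2 \circ \tilde{\mathbf{b}}_2 \circ \mathbf{c} = \mathcal{O}, \tag{5.1} \]

for any vector $\mathbf{c}$, where $\mathcal{O}$ denotes an all-zero array. Let $X = [\mathbf{x}_1 \ldots \mathbf{x}_4]$ ($I \times 4$), $Y = [\mathbf{y}_1 \ldots \mathbf{y}_4]$ ($J \times 4$), and $Z = [\mathbf{z}_1 \ldots \mathbf{z}_4]$ ($K \times 4$) be random matrices. We define

\[ A^{(n)} = \Big[ \mathbf{a}_1 + \tfrac{1}{n}\mathbf{x}_1 \quad \mathbf{a}_2 + \tfrac{1}{n}\mathbf{x}_2 \quad -\tilde{\mathbf{a}}_1 - \tfrac{1}{n}\mathbf{x}_3 \quad -\tilde{\mathbf{a}}_2 - \tfrac{1}{n}\mathbf{x}_4 \Big], \tag{5.2} \]

\[ B^{(n)} = \Big[ \mathbf{b}_1 + \tfrac{1}{n}\mathbf{y}_1 \quad \mathbf{b}_2 + \tfrac{1}{n}\mathbf{y}_2 \quad \tilde{\mathbf{b}}_1 + \tfrac{1}{n}\mathbf{y}_3 \quad \tilde{\mathbf{b}}_2 + \tfrac{1}{n}\mathbf{y}_4 \Big], \tag{5.3} \]

\[ C^{(n)} = \Big[ \mathbf{c} + \tfrac{1}{n}\mathbf{z}_1 \quad \mathbf{c} + \tfrac{1}{n}\mathbf{z}_2 \quad \mathbf{c} + \tfrac{1}{n}\mathbf{z}_3 \quad \mathbf{c} + \tfrac{1}{n}\mathbf{z}_4 \Big]. \tag{5.4} \]

If we let $\mathcal{Y}^{(n)} = n\,(A^{(n)}, B^{(n)}, C^{(n)}) \to \mathcal{X}$, then by using (5.1) we obtain

\[ \begin{aligned} \mathcal{X} ={}& \mathbf{a}_1 \circ \mathbf{b}_1 \circ \mathbf{z}_1 + \mathbf{a}_1 \circ \mathbf{y}_1 \circ \mathbf{c} + \mathbf{x}_1 \circ \mathbf{b}_1 \circ \mathbf{c} + \mathbf{a}_2 \circ \mathbf{b}_2 \circ \mathbf{z}_2 + \mathbf{a}_2 \circ \mathbf{y}_2 \circ \mathbf{c} + \mathbf{x}_2 \circ \mathbf{b}_2 \circ \mathbf{c} \\ &- \tilde{\mathbf{a}}_1 \circ \tilde{\mathbf{b}}_1 \circ \mathbf{z}_3 - \tilde{\mathbf{a}}_1 \circ \mathbf{y}_3 \circ \mathbf{c} - \mathbf{x}_3 \circ \tilde{\mathbf{b}}_1 \circ \mathbf{c} - \tilde{\mathbf{a}}_2 \circ \tilde{\mathbf{b}}_2 \circ \mathbf{z}_4 - \tilde{\mathbf{a}}_2 \circ \mathbf{y}_4 \circ \mathbf{c} - \mathbf{x}_4 \circ \tilde{\mathbf{b}}_2 \circ \mathbf{c}. \end{aligned} \tag{5.5} \]

The mode-1 rank of $\mathcal{X}$ equals $\mathrm{rank}[A \ \tilde{A} \ X] = 6$. The mode-2 rank of $\mathcal{X}$ equals $\mathrm{rank}[B \ \tilde{B} \ Y] = 6$. The mode-3 rank of $\mathcal{X}$ equals $\mathrm{rank}[\mathbf{c} \ Z] = 5$. Since $\mathrm{rank}(\mathcal{X})$ is at least equal to its mode-$i$ rank, $i = 1,2,3$, it follows that $\mathrm{rank}(\mathcal{X}) \ge 6$. As we see, both $A^{(n)}$ and $B^{(n)}$ converge to rank-2 matrices, while $C^{(n)}$ converges to a rank-1 matrix.

We create $\mathcal{X}$ according to (5.5) and compute the $4 \times 4 \times 4$ array $\mathcal{G}$ with upper triangular slices in the transformation $\mathcal{X} = (L,M,N) \cdot \mathcal{G}$, using the Jacobi-type SGSD algorithm of [1] modified as described in [5]. Recall that existence of this transformation follows from the Lemma of Section 2. Next, we transform the slices of $\mathcal{G}$ such that its first slice becomes $I_4$. We observe the following form for slices 2, 3, 4 of the transformed $\mathcal{G}$:

\[ \begin{pmatrix} a & 0 & e & g \\ 0 & a & c & f \\ 0 & 0 & a & 0 \\ 0 & 0 & 0 & a \end{pmatrix}. \tag{5.6} \]

Hence, the slices have four identical eigenvalues and zeros in positions (1,2) and (3,4). The latter property shows that the limit point is an exception to almost all boundary arrays $\mathcal{X}$ with $\mathrm{rank}(\mathcal{X}) > 4$.

6 Example of non-proportional diverging components for R = 6

This example is similar to the one in Section 5. Let $\min(I,J,K) \ge 8$ and $R = 6$. Let

A = [ ], B = [ ], C = [ ], (6.1)

Ã = [ ], B̃ = [ ], C̃ = [ ]. (6.2)

Then we have

\[ \sum_{r=1}^{3} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r = [\ \ ] = \sum_{r=1}^{3} \tilde{\mathbf{a}}_r \circ \tilde{\mathbf{b}}_r \circ \tilde{\mathbf{c}}_r, \tag{6.3} \]

where the latter is a rank- array. Let $S$ ($I \times 2$), $T$ ($J \times 2$), and $U$ ($K \times 2$) be random matrices. Let $X$ ($I \times 6$), $Y$ ($J \times 6$), and $Z$ ($K \times 6$) be random matrices. We define

\[ A^{(n)} = [\, SA \ \ {-S\tilde{A}} \,] + \tfrac{1}{n}\, X \, \mathrm{diag}(1,1,1,-1,-1,-1), \tag{6.4} \]

\[ B^{(n)} = [\, TB \ \ T\tilde{B} \,] + \tfrac{1}{n}\, Y, \qquad C^{(n)} = [\, UC \ \ U\tilde{C} \,] + \tfrac{1}{n}\, Z. \tag{6.5} \]

Since the CP decompositions $(SA, TB, UC)$ and $(S\tilde{A}, T\tilde{B}, U\tilde{C})$ yield the same $I \times J \times K$ array, it follows that $\mathcal{Y}^{(n)} = n\,(A^{(n)}, B^{(n)}, C^{(n)}) \to \mathcal{X}$, with

\[ \begin{aligned} \mathcal{X} ={}& \sum_{r=1}^{3} \big( S\mathbf{a}_r \circ T\mathbf{b}_r \circ \mathbf{z}_r + S\mathbf{a}_r \circ \mathbf{y}_r \circ U\mathbf{c}_r + \mathbf{x}_r \circ T\mathbf{b}_r \circ U\mathbf{c}_r \big) \\ &- \sum_{s=1}^{3} \big( S\tilde{\mathbf{a}}_s \circ T\tilde{\mathbf{b}}_s \circ \mathbf{z}_{s+3} + S\tilde{\mathbf{a}}_s \circ \mathbf{y}_{s+3} \circ U\tilde{\mathbf{c}}_s + \mathbf{x}_{s+3} \circ T\tilde{\mathbf{b}}_s \circ U\tilde{\mathbf{c}}_s \big). \end{aligned} \tag{6.6} \]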
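The limit above, like the one in Section 5, rests on cancellation of the terms of order $n$, which leaves a bounded array plus a remainder of order $1/n$. A minimal sketch of this mechanism is given below for the smaller $R = 4$ construction of Section 5, using exact rational arithmetic. The sizes $I = J = 6$, $K = 5$, the seeded random integer entries, the particular $Q$ with $\det Q = 1$, and all helper names are assumptions made for illustration only.

```python
from fractions import Fraction
import random

rng = random.Random(0)          # fixed seed; entries are illustrative only
I, J, K = 6, 6, 5

def rand_matrix(rows, cols):
    return [[Fraction(rng.randint(-3, 3)) for _ in range(cols)] for _ in range(rows)]

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def col(M, r):
    return [row[r] for row in M]

def combine(terms):
    """Sum of weight * (u o v o w) over a list of (weight, u, v, w)."""
    total = [[[Fraction(0)] * K for _ in range(J)] for _ in range(I)]
    for weight, u, v, w in terms:
        for i in range(I):
            for j in range(J):
                for k in range(K):
                    total[i][j][k] += weight * u[i] * v[j] * w[k]
    return total

def max_abs_diff(P, R):
    return max(abs(P[i][j][k] - R[i][j][k])
               for i in range(I) for j in range(J) for k in range(K))

A, B = rand_matrix(I, 2), rand_matrix(J, 2)
Q = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(1)]]           # det(Q) = 1
Q_inv_T = [[Fraction(1), Fraction(-1)], [Fraction(-1), Fraction(2)]]   # Q^{-T}
At, Bt = matmul(A, Q), matmul(B, Q_inv_T)      # A~ = A Q and B~ = B Q^{-T}
c = [Fraction(rng.randint(-3, 3)) for _ in range(K)]
X, Y, Z = rand_matrix(I, 4), rand_matrix(J, 4), rand_matrix(K, 4)

base_a = [col(A, 0), col(A, 1), col(At, 0), col(At, 1)]
base_b = [col(B, 0), col(B, 1), col(Bt, 0), col(Bt, 1)]
sign = [1, 1, -1, -1]

# Order-n terms: they cancel exactly because A B^T = A~ B~^T.
leading = combine([(sign[r], base_a[r], base_b[r], c) for r in range(4)])

# Candidate limit: the terms of order one in the expansion of Y_n.
limit = combine([t for r in range(4) for t in (
    (sign[r], base_a[r], base_b[r], col(Z, r)),
    (sign[r], base_a[r], col(Y, r), c),
    (sign[r], col(X, r), base_b[r], c))])

def Y_n(n):
    """n times the sum of the four perturbed rank-1 terms."""
    terms = []
    for r in range(4):
        u = [p + q / n for p, q in zip(base_a[r], col(X, r))]
        v = [p + q / n for p, q in zip(base_b[r], col(Y, r))]
        w = [p + q / n for p, q in zip(c, col(Z, r))]
        terms.append((sign[r] * n, u, v, w))
    return combine(terms)

for n in (10, 1000, 100000):
    print(n, float(max_abs_diff(Y_n(n), limit)))
```

In this sketch the printed distance to the candidate limit decays like $1/n$ even though every individual weight equals $\pm n$, which is the behaviour of a group of diverging components.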

The mode-1 rank of $\mathcal{X}$ equals $\mathrm{rank}[SA \ S\tilde{A} \ X] = 8$. The mode-2 rank of $\mathcal{X}$ equals $\mathrm{rank}[TB \ T\tilde{B} \ Y] = 8$. The mode-3 rank of $\mathcal{X}$ equals $\mathrm{rank}[UC \ U\tilde{C} \ Z] = 8$. Since $\mathrm{rank}(\mathcal{X})$ is at least equal to its mode-$i$ rank, $i = 1,2,3$, it follows that $\mathrm{rank}(\mathcal{X}) \ge 8$. As we see, matrices $A^{(n)}$, $B^{(n)}$, and $C^{(n)}$ all converge to rank-2 matrices.

We create $\mathcal{X}$ according to (6.6) and compute the $6 \times 6 \times 6$ array $\mathcal{G}$ with upper triangular slices in the transformation $\mathcal{X} = (L,M,N) \cdot \mathcal{G}$, using the Jacobi-type SGSD algorithm of [1] modified as described in [5]. The resulting slices of $\mathcal{G}$ have entries (4,4) and (5,5) almost zero. Hence, $\mathcal{G}$ does not seem to have a nonsingular slicemix. This implies that the limit point is an exception to almost all boundary arrays $\mathcal{X}$ with $\mathrm{rank}(\mathcal{X}) > 6$.

References

[1] De Lathauwer, L., De Moor, B., & Vandewalle, J. (2004). Computation of the canonical decomposition by means of a simultaneous generalized Schur decomposition. SIAM Journal on Matrix Analysis and Applications, 26, 295–327.

[2] De Silva, V., & Lim, L.-H. (2008). Tensor rank and the ill-posedness of the best low-rank approximation problem. SIAM Journal on Matrix Analysis and Applications, 30, 1084–1127.

[3] Ja'Ja', J. (1979). Optimal evaluation of pairs of bilinear forms. SIAM Journal on Computing, 8, 443–462.

[4] Krijnen, W.P., Dijkstra, T.K., & Stegeman, A. (2008). On the non-existence of optimal solutions and the occurrence of degeneracy in the Candecomp/Parafac model. Psychometrika, 73, 431–439.

[5] Stegeman, A., & De Lathauwer, L. (2009). A method to avoid diverging components in the Candecomp/Parafac model for generic $I \times J \times 2$ arrays. SIAM Journal on Matrix Analysis and Applications, 30, 1614–1638.

[6] Stegeman, A. (2010). The Generalized Schur Decomposition and the rank-$R$ set of real $I \times J \times 2$ arrays. Technical Report, available online as arXiv:1011.3432.

[7] Stegeman, A. (). Candecomp/Parafac - from diverging components to a decomposition in block terms. Technical Report, submitted.