Uniqueness of Nonnegative Tensor Approximations

Size: px

Start display at page:

Download "Uniqueness of Nonnegative Tensor Approximations"

Damian Holland
5 years ago
Views:

Uniqueness of Nonnegative Tensor Approximations Yang Qi, Pierre Comon, Lek-Heng Lim To cite this version: Yang Qi, Pierre Comon, Lek-Heng Lim. Uniqueness of Nonnegative Tensor Approximations.

tp=&arnumber=7422102&url=http HAL I: hal-01015519 https://hal.archives-ouvertes.

1 Uniqueness of Nonnegative Tensor Approximations Yang Qi, Pierre Comon, Lek-Heng Lim To cite this version: Yang Qi, Pierre Comon, Lek-Heng Lim. Uniqueness of Nonnegative Tensor Approximations. IEEE Transactions on Information Theory, Institute of Electrical an Electronics Engineers, 2016, 62 (4), pp < HAL I: hal Submitte on 12 Apr 2016 HAL is a multi-isciplinary open access archive for the eposit an issemination of scientific research ocuments, whether they are publishe or not. The ocuments may come from teaching an research institutions in France or abroa, or from public or private research centers. L archive ouverte pluriisciplinaire HAL, est estinée au épôt et à la iffusion e ocuments scientifiques e niveau recherche, publiés ou non, émanant es établissements enseignement et e recherche français ou étrangers, es laboratoires publics ou privés. Distribute uner a Creative Commons Attribution - NonCommercial - NoDerivatives 4.0 International License

2 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 62, NO. 4, APRIL Uniqueness of Nonnegative Tensor Approximations Yang Qi, Pierre Comon Fellow, IEEE, an Lek-Heng Lim Abstract We show that for a nonnegative tensor, a best nonnegative rank-r approximation is almost always unique, its best rank-one approximation may always be chosen to be a best nonnegative rank-one approximation, an that the set of nonnegative tensors with non-unique best rank-one approximations form an algebraic hypersurface. We show that the last part hols true more generally for real tensors an thereby etermine a polynomial equation so that a real or nonnegative tensor which oes not satisfy this equation is guarantee to have a unique best rank-one approximation. We also establish an analogue for real or nonnegative symmetric tensors. In aition, we prove a singular vector variant of the Perron Frobenius Theorem for positive tensors an apply it to show that a best nonnegative rank-r approximation of a positive tensor can never be obtaine by eflation. As an asie, we verify that the Eucliean istance (ED) iscriminants of the Segre variety an the Veronese variety are hypersurfaces an give efining equations of these ED iscriminants. I. INTRODUCTION Nonnegative tensor ecomposition, i.e., a ecomposition of a tensor with nonnegative entries (with respect to a fixe choice of bases) into a sum of tensor proucts of nonnegative vectors, arises in a wie range of applications. These inclue hyperspectral imaging, spectroscopy, statistics, phylogenetics, ata mining, pattern recognition, among other areas; see [45], [52], [54], [62] an the references therein. One important reason for its prevalence is that such a ecomposition shows how a joint istribution of iscrete ranom variables ecomposes when they are inepenent conitional on a iscrete latent ranom variable [45], [64] a ubiquitous moel that unerlies many applications. This is in fact one of the simplest Bayesian network [28], [33], [37], a local expression of the joint istribution of a set of ranom variables x i as p(x 1,..., x ) = p(x i θ) µ θ (1) i=1 where θ is some unknown latent ranom variable. The relation expresse in (1) is often calle the naive Bayes hypothesis. In the case when the ranom variables x 1,..., x an the latent variable θ take only a finite number of values, the ecomposition becomes one of the form t i1,...,i = r λ ru i1,p u i,p. (2) Manuscript submitte on October 29, 2014; revise November 16, 2015; accepte February 9, Yang Qi an Pierre Comon are fune by the European Research Council uner the European Community s Seventh Framework Program FP7/ Grant Agreement no Lek-Heng Lim is fune by AFOSR FA , DARPA D15AP00109, NSF IIS , DMS , an DMS Yang Qi an Pierre Comon are with CNRS, Gipsa-Lab, University of Grenoble Alpes, F Grenoble, France ( yang.qi@gipsa-lab.fr, p.comon@ieee.org). Lek-Heng Lim is with the Computational an Applie Mathematics Initiative, Department of Statistics, University of Chicago, 5734 South University Avenue, Chicago, IL 60637, USA ( lekheng@uchicago.eu). One can show [45] that any ecomposition of a nonnegative tensor of the form in (2) may, upon normalization by a suitable constant, be regare as (1), i.e., a marginal ecomposition of a joint probability mass function into conitional probabilities uner the naive Bayes hypothesis. In the event when the latent variable θ is not iscrete or finite, one may argue that (2) becomes an approximation with in place of =. In this article, we investigate several questions regaring nonnegative tensor ecompositions an approximations, focusing in particular on uniqueness issues. In Section II, we efine nonnegative tensors in a way that parallels the usual abstract efinition of tensors in algebra. We will view them as elements in a tensor prouct of cones, i.e., tensors in C 1 C where C 1,..., C are cones an the tensor prouct is that of R + - semimoules (we write R + := [0, ) for the nonnegative reals). The special case C 1 = R n1 +,..., C = R n + then reuces to nonnegative tensors. It has been establishe in [45] that every nonnegative tensor has a best nonnegative rank-r approximation. In Section IV we will show that this best approximation is almost always unique. Furthermore, the set of nonnegative tensors of nonnegative rank > r that o not have a unique best rank-r approximation form a semialgebraic set containe in a hypersurface. For the special case when r = 1, we first show in Section V that for a nonnegative tensor, the best nonnegative rank-one an best rank-one approximations coincie. In Section VII, by exploring normalize singular pairs, we fin an explicit polynomial expression escribing the hypersurface of real (or nonnegative) tensors that amit non-unique best rank-one approximations, which allows one to check whether a given tensor has a unique best rank-one approximation. This polynomial expression also gives a efining equation of the Eucliean istance iscriminant of the Segre variety [22]. In Section VI, we fin results analogous to those in Section VII for real (or nonnegative) symmetric tensors. We prove an analogue of the Perron Frobenius theorem for singular values/vectors of positive tensors in Section V an, among other things, euce that one cannot obtain a best nonnegative rank-r approximation of a positive tensor by eflation, i.e., by fining r successive best nonnegative rank-one approximations. These results woul likely she light on the large number of computational methos for nonnegative matrix factorizations an nonnegative tensor ecompositions [2], [6], [11], [12], [13], [27], [32], [34], [35], [36], [40], [59], [63]. II. NONNEGATIVE TENSORS A tensor of orer (-tensor for short) may be represente as a -imensional hypermatrix, i.e., a -imensional array of (usually) real or complex values. This is a higher-orer generalization of the fact that a 2-tensor, i.e., a linear operator,

3 2 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 62, NO. 4, APRIL 2016 a bilinear form, or a ya, can always be represente as a matrix. Such a coorinate representation sometimes hies intrinsic properties in particular, this array of coorinates is meaningful only if the bases of unerlying vector spaces have been specifie in the first place. With this in min, we prefer to efine tensors properly rather than simply regaring them as -imensional arrays of numbers. The following is the stanar efinition of tensors. We will see later how we may obtain an analogous efinition for nonnegative tensors. Definition 1. Let V i be a vector space of finite imension n i over a fiel K, i = 1,...,, an let V 1 V be the set of -tuples of vectors. Then the tensor prouct V = V 1 V is the free linear space spanne by V 1 V quotient by the equivalence relation (v 1,..., αv i + βv i,..., v ) α(v 1,..., v i,..., v ) + β(v 1,..., v i,..., v ) (3) for every v i, v i V i, α i, β i K, i = 1,...,. A tensor is an element of V 1 V. In particular, (3) gives (α 1 v 1, α 2 v 2,..., α v ) = ( i=1 α i ) (v 1, v 2,..., v ) (4) More etails on the efinition of tensor spaces may be foun in [14], [30], [39], [43]. A ecomposable tensor is one of the form v 1 v, v i V i, i = 1,...,. It represents the equivalence class of tuples up to scaling as in (4), i.e., v 1 v = { (α 1 v 1,..., α v ) : } α i = 1 i=1 By (4), it is clear that a ecomposable tensor cannot in general be uniquely represente by a -tuple of vectors, what is often calle a scaling ineterminacy in the engineering literature. When we use the term unique in this article, it is implicit that the uniqueness is only up to scaling of this nature. From the way a tensor is efine in Definition 1, it is immeiate that a nonzero tensor can always be expresse as a finite sum of nonzero ecomposable tensors. When the number of summans is minimal, this ecomposition is calle a rank ecomposition (the term canonical polyaic or CP is often also use) an the number of summans in such a ecomposition is calle the rank of the tensor. In other wors, we have the following: For every T V 1 V, there exist v i,p V i, i = 1,...,, p = 1,..., rank(t ), such that T = rank(t ) v 1,p v,p. (5) We present the above material, which is largely stanar knowlege, to motivate an analogous construction for real nonnegative tensors. We will first efine nonnegative tensors in a coorinate-epenent manner (i.e., epening on a choice of bases on V 1,..., V ), an then in a coorinate-inepenent manner.. Definition 2. For each i = 1,...,, let V i be a real vector space with im V i = n i. For any fixe choice of basis {v i,1,..., v i,ni } for V i, we enote by V + i the subset of vectors with nonnegative coefficients in V i, i.e., { V + ni } i = α pv i,p V i : α 1,..., α ni R +. We will call an element in V := V 1 V of the form u 1 u where u i V + i for i = 1,...,, a nonnegatively ecomposable tensor. The set of nonnegative tensors V + is then the subset of V efine by V + = { r u 1,p u,p V : u i,p V + i, } i = 1,...,, p = 1,..., r, r N. By its efinition, every element of V + has a representation as a finite sum of nonnegatively ecomposable tensors. A ecomposition of minimal length then yiels the notions of nonnegative tensor rank an nonnegative tensor rank ecomposition. Definition 3. For every T V +, there exist v i,p V + i, i = 1,...,, p = 1,..., r, such that T = rank +(T ) v 1,p v,p (6) where rank + (T ) := {r : T = r } v 1,p v,p. (7) We will call (7) nonnegative tensor rank or nonnegative rank for short an (6) a nonnegative rank ecomposition of the nonnegative tensor T. An obvious property is that rank + (T ) rank(t ) for any T V +. We now examine an alternative coorinate-free approach to efining nonnegative tensors an nonnegative rank. This approach is also more general, yieling a notion of conic rank for a tensor prouct of any convex cones. We first recall the efinition of a tensor prouct of semimoules. See [4] for etails on the existence an a construction of such a tensor prouct. Definition 4. Let R be a commutative semiring an M, N be R-semimoules (cf. Appenix for the efinitions of semirings an semimoules). A tensor prouct M R N of M an N is an R-semimoule satisfying the universal property: There is an R-bilinear map ϕ : M N M R N such that given any other R-semimoule S together with an R-bilinear map h : M N S, there is a unique R-linear map h : M R N S satisfying h = h ϕ. Recall that a convex cone C is a subset of a vector space over an orere fiel that is close uner linear combinations with nonnegative coefficients, i.e., αx + βy belongs to C for all x, y C an any nonnegative scalars α, β. Since any convex cone C i V i is a semimoule over the semiring R +, we have the unique tensor prouct of these convex cones C 1 C as an R + -semimoule up to

4 YANG, COMON, AND LIM: UNIQUENESS OF NONNEGATIVE TENSOR APPROXIMATIONS 3 isomorphism. More precisely, the tensor prouct of cones C 1 C is the quotient monoi F (C 1,..., C )/, where F (C 1,..., C ) is the free monoi generate by all n- tuples (v 1,..., v ) C 1 C, an is the equivalence relation on F (C 1,..., C ) efine by (v 1,..., αv i + βv i,..., v ) α(v 1,..., v i,..., v ) + β(v 1,..., v i,..., v ) for every v i, v i C i, α, β R +, an i = 1,...,. The commutative monoi C 1 C is an R + -semimoule. We write v 1 v for the equivalence class representing (v 1,..., v ) in F (C 1,..., C )/. A multiconic map from C 1 C to a convex cone C is a map ϕ : C 1 C C with the property that ϕ(u 1,..., αv i + βw i,..., u ) = αϕ(u 1,..., v i,..., u ) + βϕ(u 1,..., w i,..., u ) for all α, β R +, i = 1,...,. The multiconic map ν : C 1 C m C 1 C efine by ν(v 1,..., v ) = v 1 v F (C 1,..., C )/ an extene nonnegative linearly to all of C 1 C satisfies the universal factorization property often use to efine tensor prouct spaces: If ϕ is a multiconic map from C 1 C into a convex cone C, then there exists a unique R + -linear map ψ from C 1 C into C, that makes the following iagram commutative: ν C 1 C C 1 C ϕ i.e., ψν = ϕ. Strictly speaking we shoul have written C 1 R+ R+ C to inicate that the tensor prouct is one of R + -semimoules but this is obvious from context. Note that Definition 4 is consistent with our earlier efinition of nonnegative tensors since V + = V 1 + V + as tensor prouct of cones over R +. In [60], the tensor prouct of C 1,..., C is efine to be the convex cone in V 1 V forme by v 1 v V 1 V, where v i C i, an showe that this tensor prouct satisfies the above universal factorization property. By the uniqueness of the R + -semimoule satisfying the universal property, our construction an the one in [60] are equivalent. C If C 1 = R n1 +,..., C = R n +, we may ientify R n1 + R n + = R n1 n + through the interpretation of the tensor prouct of vectors as a hypermatrix via the Segre outer prouct [v 1 (1),..., v 1 (n 1 )] T [v (1),..., v (n )] T ψ = [v 1 (i 1 ) v (i )] n1,...,n i 1,...,i =1. Here we write v(j) for the jth coorinate of v R n. We note that one may easily exten the notion of nonnegative rank an nonnegative rank ecomposition to tensor prouct of other cones. Definition 5. A tensor T C 1 C is sai to be ecomposable if T is of the form u 1 u, where u i C i. For T C 1 C, the conic rank of T, enote by rank + (T ), is the minimal value of r such that T = r u 1,p u,p, where u i,p C i, i.e., T is containe in the convex cone generate by u 1,1 u,1,..., u 1,r u,r. Such a ecomposition will be calle a conic rank ecomposition. In the remainer of this paper, we focus our attention on the case V + = V 1 + V +, the convex cone of nonnegative -tensors although we will point out whenever a result hols more generally for arbitrary cones. For any given positive integer r, we let D + r = {X V + 1 V + : rank +(X) r} enote the set of tensors of nonnegative rank not more than r. III. UNIQUENESS OF RANK DECOMPOSITIONS From the stanpoints of both ientifiability an wellposeness, an important issue is whether a rank ecomposition of the form (5) is unique. It is clear that such ecompositions can never be unique when = 2, i.e., for matrices. But when > 2, rank ecompositions are often unique, which is probably the strongest reason for their utility in applications. There are well-known sufficient conitions ensuring uniqueness of rank ecomposition [38], [53], [20], [21] an many recent works on the uniqueness of generic tensors of certain ranks [56], [9], [5], [10]. We highlight three notable results. Theorem 6 (Kruskal). The rank ecomposition of a -tensor T is unique if rank(t ) 1 + i=1 (κ i 1) 2 where κ i enote the Kruskal rank of the factors u i,1,..., u i,rank(t ), which is generically equal to the imension n i when n i rank(t ). Theorem 7 (Bocci Chiantini Ottaviani). The rank ecomposition of a generic -tensor T of rank-r is unique when i=1 r n i (n 1 + n 2 + n 3 2) i=3 n i 1 + i=1 (n. i 1) Theorem 8 (Chiantini Ottaviani Vannieuwenhoven). The rank ecomposition of a generic -tensor T of rank-r is unique when i=1 r < n i 1 + i=1 (n i 1) if i=1 n i 15000, with some exceptional cases. The authors of [10] also strengthene the above result by a prior compression of tensor T. The consequence is that the imensions n i in Theorem 8 may be replace by the multilinear rank of T, which allows significant tightening of the upper boun for low multilinear rank tensors. The maximum

5 4 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 62, NO. 4, APRIL 2016 R smax where a generic tensor with rank R smax has a unique rank ecomposition has been calle the maximum stable rank in [57]. Theorem 8 implies that if i=1 n i 15000, then asie from the exceptional cases, the maximum stable rank is i=1 n i/[1 + i=1 (n i 1)] 1, which is one less than the (expecte) generic rank [56], [42], [1], [16]. Nevertheless these results o not apply irectly to nonnegative ecompositions over R + (as oppose to ecompositions over C) nor to rank-r approximations (as oppose to rank-r ecompositions). The purpose of this paper is to provie some of the first results in these irections. In particular, it will be necessary to istinguish between an exact nonnegative rank-r ecomposition an a best nonnegative rank-r approximation. Note that when a best nonnegative rank-r approximation to a nonnegative tensor T is unique, it means that min T X (8) rank +(X) r has a unique minimizer X. The nonnegative rank-r ecomposition of X may not however be unique. A nonnegative rank ecomposition X = r u 1,p u,p V 1 + V + is sai to be unique if for any other nonnegative rank ecomposition X = r v 1,p v,p, there is a permutation σ of {1,..., } such that u 1,p u,p = v 1,σ(p) v,σ(p) for all p = 1,..., r. IV. EXISTENCE AND GENERIC UNIQUENESS OF RANK-r APPROXIMATIONS Let V 1,..., V be real vector spaces. Given a nonnegative tensor T V +, we consier the best nonnegative rank-r approximations of T, where r is less than the nonnegative rank of T. We let δ(t ) = inf X D + r T X = inf rank+(x) r T X, where is the Hilbert Schmit norm, i.e., the l 2 -norm given by the inner prouct. Henceforth any unlabelle norm on V 1 V will always enote the Hilbert Schmit norm. When = 2, the Hilbert Schmit norm reuces to the Frobenius norm of matrices an when = 1, it reuces to the Eucliean norm of vectors. Also, throughout this article, the notation X, Y will always enote tensor contraction in all possible inices for X, Y tensors of any orer [43]. When X an Y are of the same orer an real, X, Y reuces to a real inner prouct an our notation is consistent with the inner prouct notation; in particular X, X = X 2. When X is a -tensor an Y is a ( 1)-tensor, X, Y is a vector this is the only other case that will arise in our iscussions below. Note however that over C,, is only a symmetric bilinear form an not a complex inner prouct (which is a sesquilinear form). Proposition 9. Let C i V + i be a close semialgebraic cone for i = 1,...,. Then D r + = {X C 1 C : rank + (X) r} is a close semialgebraic set. Proof. It follows from [45] that the set is close an from the Tarski Seienberg Theorem [19] that it is semialgebraic. Since D r + is a close set, for any T / D r +, there is some T D r + such that T T = δ(t ). The following result is an analogue of [25, Theorem 27] for nonnegative tensors base on [25, Corollary 18]. Proposition 10. Almost every T V + with nonnegative rank > r has a unique best nonnegative rank-r approximation. Proof. For any T, T V 1 V, δ(t ) δ(t ) T T, i.e., δ is Lipschitz an thus ifferentiable almost everywhere in V = V 1 V by Raemacher Theorem. Consier a general T V +. Then in particular T lies in the interior of V + an there is an open neighborhoo of T containe in V +. So δ is ifferentiable almost everywhere in V + as well. Suppose δ is ifferentiable at T V +. For any U V, let δt 2 (U) be the ifferential of δ2 at T along the irection U. Since T T = δ(t ) we obtain δ 2 (T + tu) = δ 2 (T ) + t δ 2 T (U) + O(t 2 ) T + tu T 2 = δ 2 (T ) + 2t U, T T + t 2 U 2. Therefore, for any t, we have t δ 2 T (U) 2t U, T T, which implies that δ 2 T (U) = 2 U, T T. If T is another best nonnegative rank-r approximation of T, then 2 U, T T = δ 2 T (U) = 2 U, T T, from which it follows that T T, U = 0 for any U, i.e., T = T. We note that Proposition 10 hols more generally for arbitrary close cones C 1,..., C in place of V 1 +,..., V +. Our next proposition hols true for arbitrary close semialgebraic cones C 1,..., C in place of V 1 +,..., V +. Proposition 11. The nonnegative tensors satisfying (i) nonnegative rank > r, an (ii) o not have a unique best rank-r approximation, form a semialgebraic set that is containe in some hypersurface. Proof. Observe that D + r ϕ r : (V + 1 V + )r V +, (u 1,1,..., u,1,..., u 1,r,..., u,r ) r is the image of the polynomial map j=1 u 1,j u,j. Hence D r + is semialgebraic by the Tarski Seienberg Theorem [19] an the require result follows from [26, Theorem 3.4]. Now we examine a useful necessary conition for r T p to be a best rank-r approximation of T V 1 V. For a vector u V i, we enote by u(j) the jth coorinate of u, i.e., u = (u(1),..., u(n i )), an we will borrow a stanar notation from algebraic topology where a hat over a quantity means that quantity is omitte. So for example, û 1 u 2 u 3 = u 2 u 3, u 1 û 2 u 3 = u 1 u 3, u 1 u 2 û 3 = u 1 u 2, u 1 û i u = u 1 u i 1 u i+1 u.

6 YANG, COMON, AND LIM: UNIQUENESS OF NONNEGATIVE TENSOR APPROXIMATIONS 5 Let us recall the following well-known fact, which has been use to evelop algorithms for nonnegative matrix factorization an nonnegative tensor ecomposition. Lemma 12. Let V 1,..., V be real vector spaces an let T V 1 V. Let rank(t ) > r an λ r j=1 T j be a best rank-r approximation, where T j = u 1,j u,j an r j=1 T j = 1. Then for all i = 1,...,, an p = 1,..., r, T, u 1,p û i,p u,p r = λ T j, u 1,p û i,p u,p, (9) j=1 where λ = T, r j=1 T j. Proof. Let L enote the line in V 1 V spanne by r j=1 v 1,j v,j, an L enote the orthogonal complement of L. Denote the orthogonal projection of T onto L by Proj L (T ). Then T 2 = Proj L (T ) 2 + Proj L (T ) 2, an thus min T α r v 1,p v,p α 0 = T Proj L (T ) 2 = Proj L (T ) 2 = T 2 Proj L (T ) 2. So computing min min T α r v 1,j v,j v 1,1,...,v,r j=1 α 0 is equivalent to computing max Proj L (T ) = v 1,1,...,v,r max v 1,1,...,v,r T, 2 r v 1,j v,j. j=1 Since r j=1 T j = 1, we must have r T j, u 1,p û i,p u,p 0 j=1 for some p. The Jacobian matrix of the hypersurface efine by r j=1 v 1,j v,j = 1 has constant rank 1 aroun (u 1,1,..., u,1,..., u 1,r,..., u,r ), i.e., this real hypersurface is smooth at the point (u 1,1,..., u,1,..., u 1,r,..., u,r ). Hence we may consier the Lagrangian L = T, r v 1,p v,p ( r ) λ v 1,p v,p 1. (10) Setting L/ v i,p = 0 at (u 1,1,..., u,1,..., u 1,r,..., u,r ) gives T, u 1,p û i,p u,p r = λ T j, u 1,p û i,p u,p j=1 with λ = T, r j=1 T j for all i = 1,...,, p = 1,..., r. (11) Lemma 12 has a nice geometric interpretation as follows. Let σ r (PV 1 PV ) be the cone of the rth secant variety of the Segre variety PV 1 PV. Suppose λ r j=1 T j is a smooth point. Then T λ r j=1 T j is perpenicular to the tangent space of σ r (PV 1 PV ) at λ r j=1 T j. We presente Lemma 12 in a concrete affine (as oppose to projective) manner so that there will be no ambiguity when iscussing λ an u i,j. We will see later in Definition 17 that when r = 1, these are normalize singular values an normalize singular vector tuples of T. For a nonnegative tensor T with rank + (T ) > r, we have an inequality in place of the equality in (9). First we efine the support of a vector v V to be supp(v) := {i {1,..., im V } : v i 0}. Lemma 13. Let T V + with rank + (T ) > r an X = r u 1,p u,p be a solution of the optimization problem (8). Then T, u 1,p v i,p u,p X, u 1,p v i,p u,p (12) where v i,p V + i, i = 1,...,, an p = 1,..., r. For each pair (i, p), consier the subspace Then Ṽ i,p := {v V i : supp(v) supp(u i,p )}. T, u 1,p v i,p u,p for v i,p Ṽi,p. = X, u 1,p v i,p u,p (13) Proof. Fix a pair (i, p) an consier a curve X(t) = u 1,p (u i,p + tv i,p ) u,p + j p u 1,j u,j, where v i,p V + i. Since for t 0, T X(t) achieves a local minimum at t = 0, i.e., nonecreasing in [0, ε) for some small ε > 0, the right erivative lim t 0+ In other wors, we have T X(t) 0. t T, u 1,p v i,p u,p X, u 1,p v i,p u,p. In particular, if v i,p Ṽi,p, X(t) is nonnegative for t ( ε, ε), then the local minimality of T X(t) at 0 implies that T X(t) t = 0, t=0 which gives us T, u 1,p v i,p u,p as require. = X, u 1,p v i,p u,p, Recall that a choice of bases is always implicit when we iscuss V + (cf. Definition 2) an we may refer to coorinates (or entries) of a nonnegative tensor T without ambiguity.

7 6 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 62, NO. 4, APRIL 2016 Lemma 14. Let T V + with rank + (T ) > r an X be a solution of the optimization problem (8). Then there exist i 1,..., i such that the coorinate (T X) i1,...,i > 0. Proof. Let X = r u 1,p u,p. Suppose (T X) i1,...,i 0 for all i 1,..., i. Then there is some p {1,..., r } such that u 1,p (i 1 ) u,p (i ) > 0. So T X, u 1,p u,p which contraicts (13). (T X) i1,...,i u 1,p (i 1 ) u,p (i ) < 0, Proposition 15. Let T V + with rank + (T ) > r an X be a solution to the optimization problem (8). Then rank + (X) = r. Proof. Suppose that rank + (X) r 1. By Lemma 14 there is some coorinate (T X) i1,...,i > 0. Let X be the rankone tensor whose only nonzero coorinate X i 1,...,i = (T X) i1,...,i. Then T X X < T X an rank + (X + X ) r, which contraicts X being a solution of (8). Proposition 15 shows that a solution X of (8) inee has nonnegative rank exactly r; so it is in fact appropriate to call X a best nonnegative rank-r approximation of T. V. RANK-ONE APPROXIMATIONS FOR NONNEGATIVE TENSORS AND THE PERRON FROBENIUS THEOREM We have establishe in Section IV that a best nonnegative rank-r approximation of a nonnegative tensor is generically unique. In this section we focus on the case r = 1 an fin sufficient conitions that guarantee the uniqueness of best nonnegative rank-one approximations. We begin with the following simple but useful observation: For a nonnegative tensor, a best rank-one approximation can always be chosen to be a best nonnegative rank-one approximation. Theorem 16. Given T V +, let u 1 u V 1 V be a best rank-one approximation of T. Then u 1,..., u can be chosen to be nonnegative, i.e., u 1 V 1 +,..., u V +. Proof. Let T = (T i1,...,i ) an u i = (u i (1),..., u i (n i )). Then T u 1 u 2 = n 1,...,n ( Ti1,...,i i 1,...,i =1 u 1 (i 1 ) u (i ) ) 2 n 1,...,n ( Ti1,...,i i 1,...,i =1 u 1 (i 1 ) u (i ) ) 2. Since u 1 u is a best rank-one approximation, we can choose u j (i j ) = u j (i j ), i.e., u 1 V 1 +,..., u V +. By Theorem 16, there is no nee to istinguish between a best rank-one an a best nonnegative rank-one approximation of a nonnegative tensor. This allows us to treat best rank-one approximations of a real tensor in a unifie way, i.e., we will look for sufficient conitions to ensure a unique best rank-one approximation of a real tensor. Motivate in part by the notion of singular pairs of a tensor [44] an by the case r = 1 in Lemma 12, we propose the following efinition. Definition 17. Let V 1,..., V be vector spaces over K of imensions n 1,..., n. For T V 1 V, we call (λ, u 1,..., u ) K V 1 V a normalize singular pair of T if { T, u 1 û i u = λu i, (14) u i, u i = 1, for all i = 1,...,. We call λ a normalize singular value an (u 1,..., u ) is calle a normalize singular vector tuple corresponing to λ. If K = R, λ 0, an u i V + i, we call (λ, u 1,..., u ) a nonnegative normalize singular pair of T. The reaer is remine that the contraction prouct, is only an inner prouct over R but not C. In particular, u, u u 2 over C. In Definition 17 we require that u i, u i = 1 instea of u i = 1 because u i, u i = 1 is an algebraic conition, i.e., it is efine by a polynomial equation. However imposing the conition u i, u i = 1 woul exclue isotropic complex singular vector tuples with u i, u i = 0 note that over C this can happen for u i 0. As such, the following projective variant introuce in [25] is useful when we woul like to inclue such isotropic cases. Definition 18. Let W 1,..., W be complex vector space. For T W 1 W, ([u 1 ],..., [u ]) PW 1 PW is calle a projective singular vector tuple if T, u 1 û i u = λ i u i (15) for some λ i C, i = 1,...,. The number of projective singular vector tuples of a generic tensor has been calculate in [25]. In the sense of [22], this number is the Eucliean istance egree of the Segre variety. Note that as Definition 18 is over projective spaces, the λ i s are not well-efine complex numbers, an neither is i=1 λ i, but this prouct correspons in an appropriate sense to a singular value as we will see next. Definitions 17 an 18 are relate over C as follows. Suppose ([u 1 ],..., [u ]) PW 1 PW is a projective singular vector tuple. We first choose a representative (u 1,..., u ) of ([u 1 ],..., [u ]) that satisfies (15) an has u i = 1. Note that we may assume i=1 λ i to be a nonnegative real number: If (v 1,..., v ) is such that v j = e iθj u j, then T, v 1 v j v = µ j v j an we may choose appropriate θ 1,..., θ so that µ i = e i( 2)(θ1+ +θ ) λ i R +. i=1 i=1 For a nonnegative i=1 λ i, λ := ( i=1 λ i ) 1/ is almost a normalize singular value of T with corresponing normalize singular vector tuple (u 1,..., u ) almost because the conition u i, u i = 1 in Definition 17 has to be replace by u i = 1. It has been shown in [25] that a generic T oes not have a zero singular value nor a projective singular vector tuple ([u 1 ],..., [u ]) such that u i, u i = 0 for some i. Thus, for a generic T, both efinitions above are equivalent. We may use the two efinitions interchangeably epening on the situation.

8 YANG, COMON, AND LIM: UNIQUENESS OF NONNEGATIVE TENSOR APPROXIMATIONS 7 In this article, we will mainly consier the normalize singular pairs of a tensor as efine in Definition 17. The next three results give an analogue of the tensorial Perron Frobenius Theorem [8], [24], [44], [61] for nonnegative normalize singular pairs (as oppose to nonnegative eigenpairs [44]). The proof of Lemma 19 in particular will require the l 1 -norm. Again recall that a choice of bases is always implicit when we iscuss V + (cf. Definition 2) an the l 1 -norm is with respect to this choice of bases. Lemma 19 (Existence). A nonnegative tensor T V + has at least one nonnegative normalize singular pair. Proof. Consier the compact convex set { D = (u 1,..., u ) V 1 + V + : } u i 1 = 1. i=1 If i=1 T, u 1 û i u 1 = 0 for some (u 1,..., u ), then T, u 1 û i u = 0 for all i, which implies that λ = 0. On the other han, if i=1 T, u 1 û i u 1 > 0, we efine the map ψ : D D by ψ(u 1,..., u ) ( T, u 2 u = i=1 T, u,... 1 û i u 1..., T, u 1 u 1 i=1 T, u 1 û i u 1 ). Note that each term T, u 1 û i u in the enominator is the contraction of a -tensor with a ( 1)- tensor an therefore the result is a vector. We then normalize by the sum of the l 1 -norms of these vectors so that ψ 1 = 1. By Brouwer s Fixe Point Theorem, there is some u 1 u such that T, u 1 û i u = λu i where λ = i=1 T, u 1 û i u 1. Since T, u 1 u = λ u i 2 for i = 1,...,, u 1 = = u. Let u i = u i/ u i an λ = T, u 1 u. Then (λ, u 1,..., u ) is a nonnegative normalize singular pair. One of our reviewers has pointe out to us that Lemma 19 may also be obtaine from Lemma 12 an Theorem 16. Definition 20. We say that a tensor T V + is positive if all its coorinates (with respect to the implicit choice of bases when we specify V +, cf. Definition 2) are positive. Lemma 21 (Positivity). If T is positive, then T has a positive normalize singular pair (λ, u 1,..., u ) with λ > 0. Proof. By Lemma 19, T has a nonnegative normalize singular pair (λ, u 1,..., u ). Suppose that a choice of bases has been fixe for V 1,..., V. We let v i (j) enote the jth coorinate of a vector v i V i, j = 1,..., n i. Let α = min{u i (j) : i = 1,...,, j supp(u i )}. For any i an j, λu i (j) = T, u 1 û i u (j) α 1 k j supp(u j) T k 1...k i 1jk i+1...k > 0, implying that λ an all coorinates of u i are positive. We recall the efinition of spectral norm for a tensor, which is known [31] to be NP-har to compute or even approximate. Definition 22. For T V 1 V over R, let T σ := max{ T, u 1 u : u 1 = = u = 1} be the spectral norm of T. We may euce the following from [25, Theorem 20] an Lemma 12. Corollary 23 (Generic Uniqueness). A general real tensor T has a unique normalize singular pair (λ, u 1,..., u ) with λ = T σ. The relation between best rank-r an best rank-one approximations of a matrix over R or C is well-known: A best rank-r approximation can be obtaine from r successive best rank-one approximations a consequence of the Eckart Young Theorem. It has been shown in [55] that this eflation proceure oes not work for real or complex -tensors of orer > 2. In fact, more recently, it has been shown in [58] that the property almost never hols when > 2. We will see here that the eflatability property oes not hol for nonnegative tensor rank either. Proposition 24. A best nonnegative rank-r approximation of a positive tensor with nonnegative rank > r cannot be obtaine by a sequence of best nonnegative rank-one approximations. Proof. It suffices to show that a best nonnegative rank-2 approximation cannot be obtaine by two best nonnegative rank-one approximations. Let T V + be a positive tensor with rank + (T ) > 2. Suppose u 1 u is a best rank-one approximation of T, an u 1 u + v 1 v is a best nonnegative rank-2 approximation of T. By the proof of Lemma 21, u k > 0 for all k = 1,...,, then by Lemma 13, we have T u 1 u, u 1 u = 0, T u 1 u v 1 v, u 1 u = 0. We subtract the secon equation from the first to get v 1 v, u 1 u = 0, which contraicts the non-negativity of each v k positivity of each u k. an the Following [58], we say that a tensor T V + with nonnegative rank s amits a Schmit Eckart Young ecomposition if it can be written as a linear combination of nonnegatively ecomposable tensors T = s u 1,p u,p, an such that r u 1,p u,p is a best nonnegative rank-r approximation of T for all r = 1,..., s. Proposition 24 shows that a general nonnegative tensor oes not amit a Schmit Eckart Young ecomposition. We point out that methos in [18], [48], [49] (for real/complex) [12], [63], [35] (nonnegative) rely on eflation.

9 8 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 62, NO. 4, APRIL 2016 VI. UNIQUENESS OF BEST RANK-ONE APPROXIMATIONS FOR REAL SYMMETRIC TENSORS Not every tensor has a unique best rank-one approximation [55, Proposition 1]. For example, the symmetric 3-tensor x x x + y y y, where x an y are orthonormal, has two best rank-one approximations: x x x an y y y. It is known that a best rank-one approximation of a symmetric tensor can be chosen to be symmetric over R an C [3], [23]. In this section we stuy various properties of the set of symmetric tensors that o not have unique best symmetric rank-one approximations. Before we get to these we will have to first introuce analogues/generalizations of eigenpairs an characteristic polynomials for higher-orer symmetric tensors. In the following, for a real or complex vector space V, S (V ) enotes the symmetric -tensors over V. For any u V, we write u = u u S (V ) for the -fol tensor prouct of u with itself. Let V be the ual space of V. For any group G acting on V, G also acts naturally on S (V ) an S (V ) such that S, T = g S, g T for all g G, T S (V ), an S S (V ). If we fix an inner prouct (, ) on V, then V becomes self ual an we may ientify V = V. In which case, may be regare the inner prouct on S (V ) efine by u, v := (u, v) an extene linearly to any S, T S (V ) (since any element of S (V ) may be expresse as a linear combination of u s [15]). The inner prouct, is clearly invariant uner the group that preserves the inner prouct (, ). In particular, if V = R n, then, is invariant uner the orthogonal group 1 O(n). The following efinition of symmetric tensor eigenpairs is base on [7], [44], [50]. Definition 25. For T S (V ) over C, (λ, u) C V is calle a normalize eigenpair of T if { T, u ( 1) = λu, u, u = 1. λ is the normalize eigenvalue an v the corresponing normalize eigenvector of T. Two normalize eigenpairs (λ, u) an (µ, v) of T are equivalent if (λ, u) = (µ, v) or if ( 1) 2 λ = µ an u = v. A normalize eigenvalue λ is sai to be simple if it has only one corresponing normalize eigenvector up to equivalence. The number of eigenpairs of a tensor over C has been etermine in [7], [47]; one may view this as the ED egree of the Veronese variety [22]. Definition 25 also applies to a real vector space V. In this case, normalize eigenpairs of T S (V ) are invariant O(n). It is easy to see that for a symmetric tensor T S (V ), the spectral norm T σ is the largest eigenvalue of T in absolute value. Let S n 1 enote the unit sphere in R n. The subset 1 Henceforth we assume that our vector spaces are equippe with inner proucts an we write O(n) for the group that preserves the inner prouct. {u S n 1 : T, u = T σ } is non-empty an close in S n 1 an invariant uner O(n). To introuce the characteristic polynomial of a symmetric tensor, we first recall the efinition an some basic properties of the multipolynomial resultant [29], [17]. For any given n + 1 homogeneous polynomials F 0,..., F n C[x 0,..., x n ] with positive total egrees 0,..., n, let F i = α = i c i,α x α0 0 xαn n, where α = (α 0,..., α n ) an α = α α n. We will associate each pair (i, α) with a variable u i,α. Now given a polynomial P in the variables u i,α where i = 0,..., n, an α { 0,..., n }, we enote by P (F 0,..., F n ) the result obtaine by substituting each u i,α in P with c i,α. The following is a classical result in invariant theory [29], [17]. Theorem 26. There is a unique polynomial Res with integer coefficients in the variables u i,α where i = 0,..., n, an α { 0,..., n }, that has the following properties: (i) F 0 = = F n = 0 have a nonzero solution over C if an only if Res (F 0,..., F n ) = 0. (ii) Res (x 0 0,..., xn n ) = 1. (iii) Res is irreucible over C. Definition 27. Res (F 0,..., F n ) C is calle the resultant of the polynomials F 0,..., F n. Often we will also say that it is the resultant of the system of polynomial equations F 0 = 0,..., F n = 0. The following efinition was first propose in [51] an calle an E-characteristic polynomial. Definition 28. The characteristic polynomial of a symmetric tensor T is the resultant ψ T (λ) of the following systems of polynomial equations in n + 1 variables u an x (note that u has n entries). (i) For T S 2 1 (V ), T, u ( 1) λx 2 u = 0 an x 2 u, u = 0. (ii) For T S 2 (V ), T, u (2 1) λ u, u 1 u = 0. Note that we regar λ as a parameter an not one of the variables. One may show that the resultant ψ T (λ) is a (univariate) polynomial in λ. In the following, for u, v, w V, we write u v w := 1 6 (u v w + u w v + v u w + v w u + w v u + w u v) for the symmetric tensor prouct [15]. Note that u v w = v u w = = w v u, i.e., symmetric tensor prouct is inepenent of orer an in particular u v w S 3 (V ). It is easy to exten this to arbitrary orer u 1 u = 1 u τ(1) u τ() S (V ).! τ S For u V, we may write u = u u for the - fol symmetric tensor prouct of u with itself but we clearly always have u = u.

10 YANG, COMON, AND LIM: UNIQUENESS OF NONNEGATIVE TENSOR APPROXIMATIONS 9 Proposition 29. Let V be a real vector space of imension n. Let ρ = T σ an efine H ρ := {T S (V ) : ρ is not simple}. Then H ρ is an algebraic hypersurface in S (V ). Proof. For notation convenience, we prove the result for = 3; extening to > 3 is straightforwar. Let T S 3 (V ). Suppose T H ρ, i.e., there exist u v V with u = v = 1 such that T, u 2 = ρu, T, v 2 = ρv. Let u 1 := u an exten it to {u 1,..., u n }, an orthonormal basis of V. By an action of the orthogonal group O(n) on V, we may assume that v = u 1 cos θ + u 2 sin θ for some θ (0, π). Let T ijk := T, u i u j u k. Then T 111 = ρ, T i11 = 0, (16) T 111 cos 2 θ + T 122 sin 2 θ = T 111 cos θ, (17) 2T 122 sin θ cos θ + T 222 sin 2 θ = T 111 sin θ, 2T j12 cos θ + T j22 sin θ = 0, for i 1 an j > 2. By eliminating θ, we may obtain equations in T ijk s. For example, (17) implies cos θ = 1 or (T 111 T 122 ) cos θ = T 122, an (18) implies sin θ = 0 or 2T 122 cos θ + T 222 sin θ = T 111. Since sin 2 θ + cos 2 θ = 1 an θ 0 or π, we have { [T 111 (T 111 T 122 ) 2T ]2 + T T = T (T 111 T 122 ) 2, (T 111 T T T )T j22 = 2T j12 T 222 (T 111 T 122 ). (18) Let J := {(T, [u 1,..., u n ]) S 3 (V ) O(n) : T ijk satisfies (18)}. Consier the projections S 3 (V ) π 1 J π 2 O(n) (19) where π 1 (T, [u 1,..., u n ]) = T an π 2 (T, [u 1,..., u n ]) = [u 1,..., u n ]. By [51], ρ is a root of the E-characteristic polynomial ψ T (λ) of T. So ρ an any of its corresponing normalize eigenvectors must epen algebraically on T, implying that J is a variety in S 3 (V ) O(n). Hence T has more than one inequivalent normalize eigenvectors corresponing to ρ if an only if T is in the image of π 1, i.e., H ρ = π 1 (J). Now efine T S 3 (V ) by T 111 = 1, T 122 = 2 3 3, T 222 = , an set all other terms T ijk = 0. Then T has two normalize eigenvectors corresponing to its normalize eigenvalue ρ = T σ = 1. Hence T π 1 (J). Since T has a finite number of eigenvectors, a generic T π 1 (J) must also have a finite number of eigenvectors by semicontinuity. Hence im π1 1 (T ) = im O(n 2) for a generic T π 1(J). So im H ρ = im π 1 (J) = im J im O(n 2) = im J (n 2)(n 3)/2. Since π 2 is a ominant morphism, an the imension of a generic fiber π2 1 ([u 1,..., u n ]) is im S 3 (V ) 2(n 1), we euce that im J = im S 3 (V ) 2(n 1) + im O(n). Therefore im H ρ = im S 3 (V ) 1, i.e., H ρ is a hypersurface. Let V be a real vector space. We specify a choice of basis on V an efine the set of nonnegative symmetric tensors to be S (V + ) := S (V ) (V ) +. Recall also Definition 20. Corollary 30. Let T S 3 (V + ) be positive. Let u V be such that T, u 3 = ρ = T σ an σ 2 := min{ T, u v v : u, v = 0, v = 1}. If σ 2 ρ/2, then T has a unique best nonnegative symmetric rank-one approximation. Proof. By Lemma 12, suppose there exist v u such that v = 1, u, v = 0, an T, (u cos θ + v sin θ) 3 = ρ for some 0 < θ π. Then by Lemma 21, we must in fact have 0 < θ < π/2. By (17), T, u v v = cos θ 1+cos θ ρ. Since 0 < cos θ 1+cos θ < 1 2 when 0 < θ < π/2, we get 0 < T, u v v < ρ/2, which contraicts σ 2 ρ/2. Let V be a real vector space of imension n an W = V R C be its complexification. A generic T S (W ) has istinct eigenvalues [7], so the resultant of the polynomial ψ T an its erivative ψ T, enote by D eig(t ), is a nonzero polynomial on S (W ) calle the eigen iscriminant. The equation D eig (T ) = 0 efines the complex hypersurface H isc consisting of tensors T S (W ) that o not have simple normalize eigenpairs. For T S (V ), the hypersurface H ρ in Proposition 29 is a union of some components of the real points of H isc. In fact, if we replace ρ = T σ by any real normalize eigenvalue µ of T in the proof of Proposition 29, we may show that the subset of symmetric tensors whose normalize eigenvalues are not all simple is a finite union of real algebraic hypersurfaces, an these hypersurfaces are the real points of H isc. We summarize this iscussion as follows. Theorem 31. D eig (T ) = 0 is a efining equation of the hypersurface H isc := {T S (W ) : T has a non-simple eigenvalue}. For T S (V ), if D eig (T ) 0, then by efinition, either (i) there is a unique eigenvector v λ corresponing to each eigenvalue λ of T when is o, or (ii) there are two eigenvectors ±v λ corresponing to each eigenvalue λ of T when is even. Hence we have the following. Corollary 32. Let T S (V ). If D eig (T ) 0, then T has a unique best symmetric rank-one approximation. We euce the following analogue for nonnegative tensors from Banach s Theorem that the best rank-one approximation of a symmetric tensor can be chosen to be symmetric [3], [23], Theorem 16, an Corollary 32. Corollary 33. Let T S (V + ). If D eig (T ) 0, then T has a unique best symmetric nonnegative rank-one approximation.

11 10 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 62, NO. 4, APRIL 2016 Let X C n be a complex variety. For x X an u / X, let u (x) = n i=1 (u(i) x(i))2. The Eucliean istance egree (ED egree) of X is the number of nonsingular critical points of u for a generic u, an the ED iscriminant is the set of u such that at least two critical points of u coincie [22]. Hence Theorem 31 shows that the ED iscriminant of the cone over the Veronese variety (in both the real an complex case) is a hypersurface, an D eig (T ) = 0 gives its efining equation. Example 34. Let T = [T ijk ] S 3 (R 2 ). Then ψ T (λ) is the resultant of the polynomials F 0 = T 111 x 2 + 2T 112 xy + T 122 y 2 λxz, F 1 = T 112 x 2 + 2T 122 xy + T 222 y 2 λyz, F 2 = x 2 + y 2 z 2. Consier the Jacobian eterminant J of F 0, F 1, F 2. Then J = et F 0/ x F 0 / y F 0 F 1 / x F 1 / y F 1 / z F 2 / x F 2 / y F 2 / z = (8T T 111T 122 2λ 2 )x 2 z + 4T 112 λy 3 + (8T T 222 T 112 2λ 2 )y 2 z + 4T 122 λx 3 + (4T 111 λ 8T 122 λ)xy 2 + (4T 122 λ + 4T 111 λ)xz 2 + (4T 112 λ + 4T 222 λ)yz 2 + (4T 222 λ 8T 112 λ)x 2 y 2λ 2 z 3 + (8T 112 T 122 8T 111 T 222 )xyz, J x = 12T 122λx 2 + (4T 111 λ 8T 122 λ)y 2 + (4T 111 λ + 4T 122 λ)z 2 + (8T 222 λ 16T 112 λ)xy + (16T λ2 16T 111 T 122 )xz + (8T 112 T 122 8T 111 T 222 )yz, J y = (4T 222λ 8T 112 λ)x T 112 λy 2 + (4T 112 λ + 4T 222 λ)z 2 + (8T 111 λ 16T 122 λ)xy + (8T 112 T 122 8T 111 T 222 )xz + (16T λ 2 16T 112 T 222 )yz, J z = (8T T 111T 122 2λ 2 )x 2 + (8T 122 λ + 8T 111 λ)xz + (8T T 222T 112 2λ 2 )y 2 6λ 2 z 2 + (8T 112 T 122 8T 111 T 222 )xy + (8T 112 λ + 8T 222 λ)yz. By Salmon s formula [17], ψ T (λ) = et(g), where G is efine by (20). Thus ψ T (λ) = p 2 λ 6 + p 4 λ 4 + p 6 λ 2 + p 8 for some homogeneous polynomials p m of egree m in T ijk, m = 2, 4, 6, 8. See also [7], [41]. Therefore D eig (T ) is the eterminant of some matrix in T ijk. For a generic T S 3 (R 2 ), ψ T (λ) = c(λ 2 γ 1 )(λ 2 γ 2 )(λ 2 γ 3 ) for some c C an istinct γ i C, an so D eig (T ) 0. For T H isc, ψ T (λ) has multiple roots. For a specific example, let S S 3 (R 2 ) be efine by S 111 = S 222 = 1 an set other S ijk = 0. Then D eig (S) = 0, implying that S has at least one nonsimple eigenpairs. In fact, ψ S (λ) = (λ+1) 2 (λ 1) 2 (2λ 2 1) an so S has two eigenvectors (1, 0), (0, 1) corresponing to the eigenvalue 1, an two eigenvectors ( 1, 0), (0, 1) corresponing to the eigenvalue 1. Note that S is, up to a change of coorinates, the same example mentione at the beginning of this section, i.e., S = x 3 +y 3 has two best rank-one approximations x 3 an y 3. VII. UNIQUENESS OF BEST RANK-ONE APPROXIMATIONS FOR REAL TENSORS In this section, V an W, with or without subscripts, woul generally enote real an complex vector spaces respectively. Let W 1,..., W be complex vector spaces. For T W 1 W, u i W i, an α i C, we enote by ϕ T (λ) the resultant of the following homogeneous polynomial equations { α i T, u 1 û i u = λ( j i α j)u i, u i, u i = α 2 i, (21) for i = 1,...,. Again by stanar theory of resultants [17], [29], ϕ T (λ) vanishes if an only if (21) has a nontrivial solution, an we obtain the following analogue of Definition 28. Definition 35. ϕ T (λ) is calle the singular characteristic polynomial of T W 1 W. Clearly the roots of ϕ T (λ) are the normalize singular values of T. We also have an analogue of Definition 25. Definition 36. Let T W 1 W. Two normalize singular pairs (λ, u 1,..., u ) an (µ, v 1,..., v ) of T are calle equivalent if (λ, u 1,..., u ) = (µ, v 1,..., v ), or ( 1) 2 λ = µ an u i = v i for i = 1,...,. A normalize singular value λ of T is sai to be simple if it has only one corresponing normalize singular pair up to equivalence. For real vector spaces V 1,..., V, an T V 1 V, normalize singular pairs are invariant uner the prouct of orthogonal groups O(n 1 ) O(n ). It follows from [26] that the subset X V 1 V consisting of tensors without unique best rank-one approximations is containe in a hypersurface. We will show that this can be strengthene to an algebraic hypersurface. Proposition 37. The following subset is an algebraic hypersurface in V 1 V, X := {T V 1 V : T has non-unique best rank-one approximations}. Proof. By Lemma 12, X comprises tensors T for which T σ is not a simple normalize singular value. Let = 3 for notational simplicity. Let T X. Then there exist some G = T 111 T T 112 λ 0 T 112 T T λ T 122 4T 111λ 8T 122λ 4T 111λ+4T 122λ 8T 222λ 16T 112λ 16T λ2 16T 111T 122 8T 112T 122 8T 111T 222 4T 222λ 8T 112λ 12T 112λ 4T 112λ+4T 222λ 8T 111λ 16T 122λ 8T 112T 122 8T 111T T λ2 16T 112T 222 8T T111T122 2λ2 8T T222T112 2λ2 6λ 2 8T 112T 122 8T 111T 222 8T 122λ+8T 111λ 8T 112λ+8T 222λ (20)

12 YANG, COMON, AND LIM: UNIQUENESS OF NONNEGATIVE TENSOR APPROXIMATIONS 11 v 1, v 2, v 3 with v i = 1 an {u 1,1, u 2,1, u 3,1 } = {v 1, v 2, v 3 } with u i,1 = 1 such that T, u 1,1 u 2,1 u 3,1 = T σ = T, v 1 v 2 v 3. For each i = 1, 2, 3, exten u i,1 to an orthonormal basis {u i,1,..., u i,ni } of V i. By an action of O(n 1 ) O(n 2 ) O(n 3 ) on V 1 V 2 V 3, we may assume that v i = cos θ i u i,1 + sin θ i u i,2. Let T ijk = T, u 1,i u 2,j u 3,k. Then we have T 111 = T σ, T i11 = T 1i1 = T 11i = 0, T 111 cos θ 2 cos θ 3 + T 122 sin θ 2 sin θ 3 = T 111 cos θ 1, T 212 cos θ 2 sin θ 3 + T 221 sin θ 2 cos θ 3 + T 222 sin θ 2 sin θ 3 = T 111 sin θ 1, T j12 cos θ 2 sin θ 3 + T j21 sin θ 2 cos θ 3 + T j22 sin θ 2 sin θ 3 = 0, T 111 cos θ 1 cos θ 3 + T 212 sin θ 1 sin θ 3 = T 111 cos θ 2, T 122 cos θ 1 sin θ 3 + T 221 sin θ 1 cos θ 3 + T 222 sin θ 1 sin θ 3 = T 111 sin θ 2, T 1j2 cos θ 1 sin θ 3 + T 2j1 sin θ 1 cos θ 3 + T 2j2 sin θ 1 sin θ 3 = 0, T 111 cos θ 1 cos θ 2 + T 221 sin θ 1 sin θ 2 = T 111 cos θ 3, T 122 cos θ 1 sin θ 2 + T 212 sin θ 1 cos θ 2 + T 222 sin θ 1 sin θ 2 = T 111 sin θ 3, T 12j cos θ 1 sin θ 2 + T 21j sin θ 1 cos θ 2 + T 22j sin θ 1 sin θ 2 = 0, (22) for i 1 an j > 2. By eliminating the parameter θ, we obtain a system of polynomial equations that the T ijk s satisfy. Let J be the incience variety in V 1 V 2 V 3 O(n 1 ) O(n 2 ) O(n 3 ), i.e., for each (T, g 1, g 2, g 3 ) J where g i = [u i,1,..., u i,ni ] O(n i ), there is some (θ 1, θ 2, θ 3 ) such that the T ijk s satisfy (22). Define the projections π 1 J V 1 V 2 V 3 O(n 1 ) O(n 2 ) O(n 3 ) (23) by π 1 (T, g 1, g 2, g 3 ) = T an π 2 (T, g 1, g 2, g 3 ) = (g 1, g 2, g 3 ). Since T σ is a root of ϕ T (λ), T σ an its normalize singular vector tuples epen algebraically on T, implying that J is an algebraic variety. T σ is simple if an only if T is in the image of π 1, i.e., X = π 1 (J). Define T V 1 V 2 V 3 by T 111 = T 222 = 1 an set all other terms T ijk = 0. Then T has two normalize singular vector tuples corresponing to its normalize singular value T σ. So T π 1 (J). Since T has a finite number of singular pairs, a generic T π 1 (J) must also have a finite number of singular pairs by semicontinuity. So im π 1 1 (T ) = im O(n 1 2)+im O(n 2 2)+im O(n 3 2) for a generic T π 1 (J), an im X = im π 1 (J) = im J im O(n 1 2) im O(n 2 2) im O(n 3 2). Since π 2 is a ominant morphism, an the imension of a generic fiber π 1 2 (g 1, g 2, g 3 ) is im V 1 V 2 V 3 2(n 1 + n 2 + n 3 ) + 8, it follows that im J = im V 1 V 2 V 3 2(n 1 + n 2 + n 3 ) im O(n 1 ) + im O(n 2 ) + im O(n 3 ). Therefore the coimension of X is 1. We will show that normalize singular vector tuples of a generic tensor are istinct, a result that we will nee later but is also of inepenent interest. Proposition 38. Let W 1,..., W be vector spaces over C. A generic T W 1 W has istinct equivalence classes of normalize singular pairs. π 2 Our proof of Proposition 38 will rely on the next three intermeiate results. The first require result is a Bertini-type theorem introuce in [25]. Theorem 39 (Frielan Ottaviani). Let E be a vector bunle on a smooth variety B. Let S H 0 (B, E) generate E. If rank(e) > im B, then the zero locus of a generic ζ S is empty. Lemma 40. Let T W 1 W be generic an let (u 1,..., u ) be a normalize singular vector tuple of T. If v is not a scalar multiple of u, then (u 1,..., u 1, v ) is not a normalize singular vector tuple of T. Proof. Suppose λu = T, u 1 u 1 = µv for some v not a scalar multiple of u. Then λ or µ must be 0, contraicting the fact that 0 cannot be a singular value of a generic T [25, Theorem 1]. Lemma 41. Let u i, v i, w i W i with u i, u i = v i, v i = 1, i = 1, 2, 3. For x W i, we write [x] i for the corresponing element in the quotient space W i / span(u i ). Suppose u i = v i for at most one i. Then (i) the system of linear equations T, u 2 u 3 = T, v 1 v 2 v 3 u 1 + w 1, T, u 1 u 3 = w 2, (24) T, u 1 u 2 = w 3, has a solution T W 1 W 2 W 3 if an only if u 2, w 2 = u 3, w 3 ; (ii) the system of linear equations T, u 2 u 3 = T, v 1 v 2 v 3 u 1 + w 1, [ T, u 1 u 3 ] 2 = [w 2 ] 2, (25) [ T, u 1 u 2 ] 3 = [w 3 ] 3, always has a solution T W 1 W 2 W 3. Proof. Note that the variables in these linear equations are T ijk s, the coorinates of T. (i) Let A be the coefficient matrix in (24) an b be the righthan sie. The system has a solution if an only if A an the augmente matrix [A b] have the same rank, i.e., if there is some x i W i, i = 1, 2, 3, such that x 1 u 2 u 3 +u 1 x 2 u 3 +u 1 u 2 x 3 x 1, u 1 v 1 v 2 v 3 = 0, then x 1, w 1 + x 2, w 2 + x 3, w 3 = 0. Since x 1 u 2 u 3 +u 1 x 2 u 3 +u 1 u 2 x 3 x 1, u 1 v 1 v 2 v 3 = 0 if an only if x 1 = 0, x 2 = αu 2, x 3 = αu 3 or x 1 = 0, x 2 = αu 2, x 3 = αu 3 for some α, the system (24) has a solution if an only if u 2, w 2 = u 3, w 3. (ii) The system (25) has a solution if an only if u 2, w 2 + t 2 u 2 = u 3, w 3 + t 3 u 3 for some t 2, t 3 C. Choose any t 2, t 3 such that t 3 t 2 = u 2, w 2 u 3, w 3. Proof of Proposition 38. Let = 3 for notational convenience. For i = 1, 2, 3, let C i = {u i W i : u i, u i = 1}, F i be the trivial vector bunle on C i with fiber isomorphic to W i, L i be the tautological line bunle on C i, an Q i be the quotient bunle F i /L i on C i. Consier the exact sequence of vector bunles 0 L i F i Q i 0

Math Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors

Math Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors Math 18.02 Notes on ifferentials, the Chain Rule, graients, irectional erivative, an normal vectors Tangent plane an linear approximation We efine the partial erivatives of f( xy, ) as follows: f f( x+