
ON TENSOR TUCKER DECOMPOSITION: THE CASE FOR AN ADJUSTABLE CORE SIZE

BILIAN CHEN, ZHENING LI, AND SHUZHONG ZHANG

Abstract. This paper is concerned with the problem of finding a Tucker decomposition for tensors. Traditionally, solution methods for Tucker decomposition presume that the size of the core tensor is specified in advance, which may not be a realistic assumption in some applications. In this paper we propose a new computational model in which the configuration and the size of the core become part of the decisions to be optimized. Our approach is based on the so-called maximum block improvement (MBI) method for non-convex block optimization. We consider a number of real applications in this paper, showing the promising numerical performance of the proposed algorithms.

Key words. multiway array, Tucker decomposition, low-rank approximation, maximum block improvement

AMS subject classifications. 15A69, 90C26, 90C59, 49M27, 65F99

Department of Automation, Xiamen University, Xiamen, China (blchen@xmu.edu.cn). Department of Mathematics, Shanghai University, Baoshan District, Shanghai, China (zheningli@gmail.com); the research of this author was supported in part by the Natural Science Foundation of Shanghai (Grant 12ZR) and the Ph.D. Programs Foundation of the Chinese Ministry of Education. Department of Industrial and Systems Engineering, University of Minnesota, Minneapolis, MN 55455, USA (zhangs@umn.edu); the research of this author was supported in part by the National Science Foundation of the USA (Grant CMMI).

1. Introduction. The models and solution methods for high order tensor decompositions were initially motivated and developed in psychometrics (see, e.g. [41, 42, 43]) and chemometrics (see, e.g. [4]), where the need for analyzing multiway data is imperative. Since then, tensor decomposition has become an active field of research in mathematics, due partly to the intricate algebraic structures of tensors. In addition, high order (tensor) statistics and independent component analysis (ICA) have been developed by engineers and statisticians, primarily because of their wide practical applications. There are two major tensor decompositions that can be considered as higher order extensions of the matrix singular value decomposition (SVD): one is known as the CANDECOMP/PARAFAC (CP) decomposition, which attempts to present the tensor as a sum of rank-one tensors and has various applications in chemometrics [2, 4], neuroscience [1], and data mining [6]; the other is known as the Tucker decomposition, which is a higher order generalization of principal component analysis (PCA) and has found wide applications in signal processing [18], data mining [37], computer vision [45, 46], and chemical analysis [22, 23]. There are extensive studies in the literature on finding the CP decomposition and the Tucker decomposition of tensors; see, e.g. [16, 15, 27, 30, 31, 35, 17, 47, 50, 40]. On the computational side, the existing popular algorithms are based on the alternating least squares (ALS) method proposed originally by Carroll and Chang [12] and Harshman [20]. The ALS method is not guaranteed to converge to a global optimum or even a stationary point, but only to a solution where the objective function ceases to decrease. However, if the ALS method does converge under a certain non-degeneracy assumption, then it actually has a local linear convergence rate [44]. Recently, Chen et al. [14] put forward a block coordinate descent type search method known as maximum block improvement (MBI) for solving a class of nonlinear optimization problems with separable structure (including tensor decompositions as special cases), and proved that the

sequence produced by the MBI method converges to a stationary point in general. The method also performs very well numerically. The MBI method has been applied successfully to the problem of finding the best rank-one approximation of tensors [14], and to problems arising from bioinformatics [49, 48] and radar systems [5]. Local linear convergence of the MBI method for certain types of problems was established by Li et al. [33]. A recent treatise on this topic can be found in the Ph.D. thesis of Chen [13]. In this paper, we study the Tucker decomposition problem with the characteristic that the size of its core is unspecified. We shall demonstrate how the MBI method can be applied to solve such problems.

Tucker decomposition can be viewed as a generalization of CP decomposition, which is a Tucker model with an equal number of components in each mode. It was first introduced in 1963 by Tucker [41], and later redefined in Levin [32] and Tucker [42, 43]. The goal of Tucker decomposition is to decompose a tensor into a core tensor multiplied by a matrix along each mode. It is related to the best rank-$(r_1, r_2, \ldots, r_d)$ approximation of $d$-th order tensors (cf. [16]). Typically, a rank-$(r_1, r_2, \ldots, r_d)$ Tucker decomposition (sometimes called the best multilinear rank-$(r_1, r_2, \ldots, r_d)$ approximation) means that the size of the core tensor is $r_1 \times r_2 \times \cdots \times r_d$. In terms of decomposition methods, Tucker [43] proposed three methods to find the Tucker decomposition of three-way tensors in 1966; the first of these is referred to as the Tucker1 method, and is now better known as the higher order singular value decomposition (HOSVD) (see [15]). In [15], De Lathauwer et al. not only showed that the HOSVD of a tensor is a convincing generalization of the SVD for the matrix case, but also proposed a truncated HOSVD that gives a suboptimal rank-$(r_1, r_2, \ldots, r_d)$ approximation of tensors. Analogous to the ALS method developed for computing the CP decomposition, researchers have derived similar methods for computing the Tucker decomposition. Kroonenberg and De Leeuw [30] proposed TUCKALS3 for computing the Tucker decomposition of three-way arrays. Later Kapteyn et al. [25] extended TUCKALS3 to decompose general $d$-way arrays for $d > 3$. Moreover, De Lathauwer et al. [16] derived a more efficient technique called the higher order orthogonal iteration (HOOI) method for computing the rank-$(r_1, r_2, \ldots, r_d)$ Tucker decomposition. Possible improvements to the HOOI algorithm are considered by Andersson and Bro [3]. However, as mentioned earlier, the ALS method has no convergence guarantee in general. Alternatively, there are methods for Tucker decomposition with a convergence guarantee. Eldén and Savas [19], and Ishteva et al. [24], simultaneously proposed Newton-based methods for computing the best multilinear rank-$(r_1, r_2, r_3)$ approximation of tensors; the former is known as the Newton-Grassmann method, and the latter is called the differential-geometric Newton method. Both methods have a quadratic local convergence property (note that computing the Hessian is usually a very demanding task). For more discussions on the Tucker decomposition and its variants, one is referred to the survey paper by Kolda and Bader [29].

In Tucker decomposition, the rank $(r_1, r_2, \ldots, r_d)$ (namely the size of the core) is considered a set of predetermined parameters. This requirement, however, is restrictive in some applications.
For example, the problem of how to choose a good number of clusters for co-clustering of gene expression data comes up in bioinformatics; in fact, the number of suitable co-clusters is not known in advance (see Zhang et al. [48]). The question of how to choose the rank of a Tucker model (cf. [39, 26, 21]) is challenging, since the decomposition problem is already NP-hard even when the Tucker rank $(r_1, r_2, \ldots, r_d)$ is known. A similar issue of selecting an appropriate rank has also been considered for the CP decomposition; e.g. a consistency diagnostic named CORCONDIA was designed in [11].

Timmerman and Kiers [39] proposed the DIFFIT procedure, based on optimal fit, to choose the numbers of components in the Tucker decomposition of three-way data. The procedure, however, is rather time-consuming. Kiers and Der Kinderen [26] revised the procedure and computed DIFFIT on an approximate fit to save computational time. In this paper, we propose a new scheme to choose good ranks in Tucker decomposition, provided that the summation of the ranks $\sum_{i=1}^d r_i$ is fixed.

The remainder of the paper is organized as follows. In Section 2, we introduce the notation and the frequently used tensor operations throughout the paper. We then apply the MBI method to solve the rank-$(r_1, r_2, \ldots, r_d)$ Tucker decomposition in Section 3. In Section 4, a new model for Tucker decomposition with unspecified core size is proposed and solved based on the MBI method and the penalty method; a heuristic approach for the new model is also developed in that section. Finally, numerical results on testing the new model and algorithms are presented in Section 5.

2. Notations. Throughout this paper, we uniformly use non-bold lowercase letters, boldface lowercase letters, capital letters, and calligraphic letters to denote scalars, vectors, matrices, and tensors, respectively; e.g. scalar $i$, vector $\mathbf{y}$, matrix $A$, and tensor $\mathcal{F}$. We use subscripts to denote the components of a vector, a matrix, or a tensor; e.g. $y_i$ is the $i$-th entry of the vector $\mathbf{y}$, $A_{ij}$ is the $(i,j)$-th entry of the matrix $A$, and $F_{ijk}$ is the $(i,j,k)$-th entry of the tensor $\mathcal{F}$. Let us first introduce some important tensor operations that appear frequently in this paper, largely in line with [29, 15]. For an overview of tensor operations and properties, we refer to the survey paper [29].

A tensor is a multidimensional array, and the order of a tensor is its dimension, also known as the ways or the modes of the tensor. In particular, a vector is a tensor of order one, and a matrix is a tensor of order two. Consider $\mathcal{A} = (A_{i_1 i_2 \ldots i_d}) \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d}$, a standard tensor of order $d$, where $d \geq 3$. A usual way to handle a tensor is to reorder its elements into a matrix; this process is called matricization, also known as unfolding or flattening. $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d}$ has $d$ modes, namely mode-1, mode-2, ..., mode-$d$. Denote the mode-$k$ matricization of tensor $\mathcal{A}$ by $A_{(k)}$; then the $(i_1, i_2, \ldots, i_d)$-th entry of tensor $\mathcal{A}$ is mapped to the $(i_k, j)$-th entry of the matrix $A_{(k)} \in \mathbb{R}^{n_k \times \prod_{l \neq k} n_l}$, where
$$ j = 1 + \sum_{l=1,\, l \neq k}^{d} (i_l - 1) \prod_{t=1,\, t \neq k}^{l-1} n_t. $$
The $k$-rank of tensor $\mathcal{A}$, denoted by $\mathrm{rank}_k(\mathcal{A})$, is the column rank of the mode-$k$ unfolding $A_{(k)}$, i.e., $\mathrm{rank}_k(\mathcal{A}) = \mathrm{rank}(A_{(k)})$. A $d$-th order tensor with $\mathrm{rank}_k(\mathcal{A}) = r_k$ for $k = 1, 2, \ldots, d$ is briefly called a rank-$(r_1, r_2, \ldots, r_d)$ tensor. Analogous to the Frobenius norm of a matrix, the Frobenius norm of tensor $\mathcal{A}$ is the usual 2-norm, defined by
$$ \|\mathcal{A}\| := \sqrt{\sum_{i_1=1}^{n_1} \sum_{i_2=1}^{n_2} \cdots \sum_{i_d=1}^{n_d} A_{i_1 i_2 \ldots i_d}^2}. $$
In this paper we uniformly denote the 2-norm for vectors, and the Frobenius norm for matrices and tensors, by the notation $\|\cdot\|$.
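To make the matricization and the $k$-rank concrete, here is a minimal sketch in Python/NumPy (the paper itself works in MATLAB with the Tensor Toolbox, so this is only an illustration of the definitions above); the reshape order is chosen so that the column index matches the formula for $j$.

```python
import numpy as np

def unfold(A, k):
    """Mode-k unfolding A_(k): entry (i_1,...,i_d) goes to row i_k, and the remaining
    indices are flattened so that the earliest mode varies fastest, matching the
    formula for j in the text."""
    return np.reshape(np.moveaxis(A, k, 0), (A.shape[k], -1), order="F")

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 5, 3))

for k in range(A.ndim):
    Ak = unfold(A, k)
    # rank_k(A) = rank(A_(k))
    print(k, Ak.shape, np.linalg.matrix_rank(Ak))

# Frobenius norm of a tensor: the 2-norm of all its entries
print(np.sqrt(np.sum(A ** 2)), np.linalg.norm(A))
```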

The inner product of two tensors $\mathcal{A}, \mathcal{B}$ of the same size is given by
$$ \langle \mathcal{A}, \mathcal{B} \rangle = \sum_{i_1=1}^{n_1} \sum_{i_2=1}^{n_2} \cdots \sum_{i_d=1}^{n_d} A_{i_1 i_2 \ldots i_d} B_{i_1 i_2 \ldots i_d}. $$
Hence, it is clear that $\langle \mathcal{A}, \mathcal{A} \rangle = \|\mathcal{A}\|^2$.

One important tensor operation is the multiplication of a tensor by a matrix. The $k$-mode product of tensor $\mathcal{A}$ by a matrix $U \in \mathbb{R}^{m \times n_k}$, denoted by $\mathcal{A} \times_k U$, is a tensor in $\mathbb{R}^{n_1 \times \cdots \times n_{k-1} \times m \times n_{k+1} \times \cdots \times n_d}$, whose $(i_1, \ldots, i_{k-1}, l, i_{k+1}, \ldots, i_d)$-th entry is defined by
$$ (\mathcal{A} \times_k U)_{i_1 \ldots i_{k-1}\, l\, i_{k+1} \ldots i_d} = \sum_{i_k=1}^{n_k} A_{i_1 i_2 \ldots i_d} U_{l i_k}. $$
The equation can also be written in terms of tensor unfoldings, i.e.,
$$ \mathcal{Y} = \mathcal{A} \times_k U \iff Y_{(k)} = U A_{(k)}. $$
This multiplication in fact changes the dimension of tensor $\mathcal{A}$ in mode $k$. In particular, if $U$ is a vector in $\mathbb{R}^{n_k}$, the order of the tensor $\mathcal{A} \times_k U$ is reduced to $d-1$, and its size is $n_1 \times \cdots \times n_{k-1} \times n_{k+1} \times \cdots \times n_d$. The $k$-mode products can also be expressed via the matrix Kronecker product as follows:
$$ \mathcal{Y} = \mathcal{A} \times_1 U^{(1)} \times_2 U^{(2)} \cdots \times_d U^{(d)} \iff Y_{(k)} = U^{(k)} A_{(k)} \left( U^{(d)} \otimes \cdots \otimes U^{(k+1)} \otimes U^{(k-1)} \otimes \cdots \otimes U^{(1)} \right)^{\mathrm T}, $$
for any $k = 1, 2, \ldots, d$, with $U^{(k)} \in \mathbb{R}^{m_k \times n_k}$ for $k = 1, 2, \ldots, d$. The proof of this property can be found in [28]. Throughout this paper we uniformly use a subscript in parentheses to denote the matricization of a tensor (e.g. $A_{(1)}$ is the mode-1 matricization of tensor $\mathcal{A}$), and a superscript in parentheses to denote the matrix in a mode product of a tensor (e.g. $U^{(1)}$, of appropriate size, appears in the mode-1 product of a tensor).

3. Convergence of the traditional Tucker decomposition. Traditionally, Tucker decomposition attempts to find the best approximation of a large tensor by a small core tensor with pre-specified dimensions, multiplied by a matrix on each mode. The problem can be formulated as follows: given a real tensor $\mathcal{F} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d}$, find a core tensor $\mathcal{C} \in \mathbb{R}^{r_1 \times r_2 \times \cdots \times r_d}$, with pre-specified integers $r_i$ satisfying $1 \leq r_i \leq n_i$ for $i = 1, 2, \ldots, d$, that solves
$$ (T_{\min}) \quad \min \ \|\mathcal{F} - \mathcal{C} \times_1 A^{(1)} \times_2 A^{(2)} \cdots \times_d A^{(d)}\| \quad \text{s.t.} \ \mathcal{C} \in \mathbb{R}^{r_1 \times r_2 \times \cdots \times r_d}, \ A^{(i)} \in \mathbb{R}^{n_i \times r_i}, \ i = 1, 2, \ldots, d. $$
Here the matrices $A^{(i)}$ are the factor matrices. Without loss of generality, these matrices are assumed to be columnwise orthogonal. The problem can be considered as a generalization of the best rank-one approximation problem, namely the case $r_1 = r_2 = \cdots = r_d = 1$. For any fixed matrices $A^{(i)}$, if we optimize the objective function of $(T_{\min})$ over $\mathcal{C}$, it is not hard to verify that $(T_{\min})$ is equivalent to the following maximization model (see the discussion in [29, 16, 3]):
$$ (T_{\max}) \quad \max \ \|\mathcal{F} \times_1 (A^{(1)})^{\mathrm T} \times_2 (A^{(2)})^{\mathrm T} \cdots \times_d (A^{(d)})^{\mathrm T}\| \quad \text{s.t.} \ A^{(i)} \in \mathbb{R}^{n_i \times r_i}, \ (A^{(i)})^{\mathrm T} A^{(i)} = I, \ i = 1, 2, \ldots, d. $$
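The $k$-mode product and the Kronecker-product identity above can be checked numerically. The following sketch (again Python/NumPy rather than the paper's MATLAB setting) implements `unfold`, its inverse `fold`, and `mode_product`, and verifies both $Y_{(k)} = U A_{(k)}$ and the Kronecker identity on a random example; the helper names are ours, not the paper's.

```python
import numpy as np

def unfold(A, k):
    """Mode-k unfolding, repeated here so the snippet runs on its own."""
    return np.reshape(np.moveaxis(A, k, 0), (A.shape[k], -1), order="F")

def fold(M, k, shape):
    """Inverse of unfold: rebuild a tensor of the given shape from its mode-k unfolding."""
    full = [shape[k]] + [s for i, s in enumerate(shape) if i != k]
    return np.moveaxis(M.reshape(full, order="F"), 0, k)

def mode_product(A, U, k):
    """k-mode product A x_k U, defined through (A x_k U)_(k) = U A_(k)."""
    new_shape = list(A.shape)
    new_shape[k] = U.shape[0]
    return fold(U @ unfold(A, k), k, tuple(new_shape))

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 5, 3))
U = rng.standard_normal((6, 5))          # acts on mode 2 (size 5 -> 6), zero-based index 1

Y = mode_product(A, U, 1)
print(Y.shape)                           # (4, 6, 3)
print(np.allclose(unfold(Y, 1), U @ unfold(A, 1)))   # the defining identity

# Kronecker-product identity for a full multilinear product
U1 = rng.standard_normal((2, 4))
U2 = rng.standard_normal((6, 5))
U3 = rng.standard_normal((7, 3))
Y = mode_product(mode_product(mode_product(A, U1, 0), U2, 1), U3, 2)
print(np.allclose(unfold(Y, 0), U1 @ unfold(A, 0) @ np.kron(U3, U2).T))
```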

The workhorse for solving $(T_{\max})$ or $(T_{\min})$ has traditionally been the ALS method. However, the convergence of the ALS method is not guaranteed in general. Chen et al. [14] proposed the MBI method, a greedy-type search algorithm for optimization models with a general separable structure
$$ (G) \quad \max \ f(x^1, x^2, \ldots, x^d) \quad \text{s.t.} \ x^i \in S^i \subseteq \mathbb{R}^{n_i}, \ i = 1, 2, \ldots, d, $$
where $f: \mathbb{R}^{n_1 + n_2 + \cdots + n_d} \to \mathbb{R}$ is a general continuous function. Assuming that, for any fixed $d-1$ blocks of variables, the optimization over the remaining block can be carried out, the MBI method is guaranteed to converge to a stationary point under the mild condition that $S^i$ is compact for $i = 1, 2, \ldots, d$. We notice that $(T_{\max})$ is a particular instance of $(G)$. Moreover, by fixing any $d-1$ blocks $A^{(i)}$ in $(T_{\max})$, the optimization subproblem in the MBI method can be easily solved by the singular value decomposition (SVD). To be specific, the subproblem of $(T_{\max})$ obtained by applying the MBI method is
$$ (T_{\max}^i) \quad \max \ \|\mathcal{F} \times_1 (A^{(1)})^{\mathrm T} \times_2 (A^{(2)})^{\mathrm T} \cdots \times_d (A^{(d)})^{\mathrm T}\| \quad \text{s.t.} \ A^{(i)} \in \mathbb{R}^{n_i \times r_i}, \ (A^{(i)})^{\mathrm T} A^{(i)} = I, $$
where the matrices $A^{(1)}, A^{(2)}, \ldots, A^{(i-1)}, A^{(i+1)}, \ldots, A^{(d)}$ are given. The objective function of $(T_{\max}^i)$ can be written in matrix form as
$$ \left\| (A^{(i)})^{\mathrm T} F_{(i)} \left( A^{(d)} \otimes \cdots \otimes A^{(i+1)} \otimes A^{(i-1)} \otimes \cdots \otimes A^{(1)} \right) \right\|. $$
Therefore, the optimal solution of $(T_{\max}^i)$ consists of the $r_i$ leading left singular vectors of the matrix $F_{(i)} \left( A^{(d)} \otimes \cdots \otimes A^{(i+1)} \otimes A^{(i-1)} \otimes \cdots \otimes A^{(1)} \right)$. Let us now present the algorithm for solving $(T_{\max})$ by the MBI approach.

Algorithm 1. The MBI method for Tucker decomposition
Input: tensor $\mathcal{F} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d}$ and integers $r_1, r_2, \ldots, r_d$.
Output: core tensor $\mathcal{C} \in \mathbb{R}^{r_1 \times r_2 \times \cdots \times r_d}$ and matrices $A^{(i)} \in \mathbb{R}^{n_i \times r_i}$ for $i = 1, 2, \ldots, d$.
Step 0. Choose an initial feasible solution $(A^{(1)}_0, A^{(2)}_0, \ldots, A^{(d)}_0)$ and compute the initial objective value $v_0 := \|\mathcal{F} \times_1 (A^{(1)}_0)^{\mathrm T} \times_2 (A^{(2)}_0)^{\mathrm T} \cdots \times_d (A^{(d)}_0)^{\mathrm T}\|$. Set $k := 0$.
Step 1. For each $i = 1, 2, \ldots, d$, compute $B^{(i)}_{k+1}$, consisting of the $r_i$ leading left singular vectors of $F_{(i)} \left( A^{(d)}_k \otimes \cdots \otimes A^{(i+1)}_k \otimes A^{(i-1)}_k \otimes \cdots \otimes A^{(1)}_k \right)$, and let
$$ w^i_{k+1} := \|\mathcal{F} \times_1 (A^{(1)}_k)^{\mathrm T} \cdots \times_{i-1} (A^{(i-1)}_k)^{\mathrm T} \times_i (B^{(i)}_{k+1})^{\mathrm T} \times_{i+1} (A^{(i+1)}_k)^{\mathrm T} \cdots \times_d (A^{(d)}_k)^{\mathrm T}\|. $$
Step 2. Let $v_{k+1} := \max_{1 \leq i \leq d} w^i_{k+1}$, choose one $i^* = \arg\max_{1 \leq i \leq d} w^i_{k+1}$, and set
$$ A^{(i)}_{k+1} := \begin{cases} A^{(i)}_k, & i \in \{1, 2, \ldots, d\} \setminus \{i^*\}, \\ B^{(i)}_{k+1}, & i = i^*. \end{cases} $$
Step 3. If $v_{k+1} - v_k \leq \epsilon$, stop, and output the core tensor $\mathcal{C} := \mathcal{F} \times_1 (A^{(1)}_{k+1})^{\mathrm T} \times_2 (A^{(2)}_{k+1})^{\mathrm T} \cdots \times_d (A^{(d)}_{k+1})^{\mathrm T}$ and the matrices $A^{(i)} = A^{(i)}_{k+1}$ for $i = 1, 2, \ldots, d$; otherwise, set $k := k+1$ and go to Step 1.
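Below is a compact sketch of Algorithm 1, assuming the `unfold` and `mode_product` helpers from the previous snippet; the single-block subproblem is solved by an SVD exactly as described above, but this is an illustrative re-implementation in Python/NumPy, not the authors' MATLAB code.

```python
import numpy as np
# assumes unfold(A, k) and mode_product(A, U, k) from the earlier sketches

def partial_compress(F, A, skip=None):
    """Multiply F by (A^(j))^T on every mode j except `skip` (skip=None: all modes)."""
    G = F
    for j, M in enumerate(A):
        if j != skip:
            G = mode_product(G, M.T, j)
    return G

def mbi_tucker(F, ranks, max_iter=500, tol=1e-8, seed=0):
    """Algorithm 1 (sketch): MBI for the rank-(r_1,...,r_d) Tucker problem (T_max)."""
    rng = np.random.default_rng(seed)
    d = F.ndim
    # Step 0: random orthonormal starting factors
    A = [np.linalg.qr(rng.standard_normal((F.shape[i], ranks[i])))[0] for i in range(d)]
    obj = lambda mats: np.linalg.norm(partial_compress(F, mats, skip=None))
    v = obj(A)
    for _ in range(max_iter):
        trial_vals, trial_blocks = [], []
        for i in range(d):
            # Step 1: (F x_{j!=i} (A^(j))^T)_(i) equals F_(i)(A^(d) x ... x A^(1)),
            # so its leading left singular vectors solve the block-i subproblem
            W = unfold(partial_compress(F, A, skip=i), i)
            U, _, _ = np.linalg.svd(W, full_matrices=False)
            B = U[:, :ranks[i]]
            trial_blocks.append(B)
            trial_vals.append(obj(A[:i] + [B] + A[i + 1:]))
        # Step 2: accept only the single block giving the maximum improvement
        i_star = int(np.argmax(trial_vals))
        v_new = trial_vals[i_star]
        A[i_star] = trial_blocks[i_star]
        # Step 3: stop when the improvement falls below the tolerance
        if v_new - v <= tol:
            break
        v = v_new
    core = partial_compress(F, A, skip=None)
    return core, A
```

Note that only the block giving the largest improvement is accepted in each round; this is what distinguishes MBI from the cyclic updates used in ALS/HOOI.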

According to the convergence result for the MBI method established in [14], Algorithm 1 is guaranteed to converge to a stationary point of the Tucker decomposition problem, which is not the case for many other algorithms, e.g. the HOSVD method and the HOOI method.

4. Tucker decomposition with unspecified size of the core. In this section, we propose a new model for Tucker decomposition that does not pre-specify the size of the core tensor. The dimension of each mode of the core is no longer a constant; rather, it is a variable that also needs to be optimized, which is a key ingredient of our model. Several algorithms are proposed to solve the new model as well.

4.1. The formulation. Given a tensor $\mathcal{F} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d}$, the goal is to find a small-sized (low rank) core tensor $\mathcal{C}$ and $d$ slim matrices $A^{(i)}$ that express $\mathcal{F}$ as closely as possible. We are interested in determining the rank of each mode of $\mathcal{C}$ as well as the best approximation of $\mathcal{F}$. Let the $i$-rank of $\mathcal{C}$ be $r_i$ for $i = 1, 2, \ldots, d$; clearly, $1 \leq r_i \leq n_i$. Unlike the general Tucker decomposition, the $r_i$ are now decision variables which need to be determined. Denote by $c$ a given constant for the summation of all $i$-rank variables, i.e., $\sum_{i=1}^d r_i = c$, which in general prevents the $r_i$ from being too large. To determine the allocation of the total budget $c$ among the $r_i$, the new model is
$$ (NT_{\min}) \quad \begin{array}{ll} \min & \|\mathcal{F} - \mathcal{C} \times_1 A^{(1)} \times_2 A^{(2)} \cdots \times_d A^{(d)}\| \\ \text{s.t.} & \mathcal{C} \in \mathbb{R}^{r_1 \times r_2 \times \cdots \times r_d}, \ A^{(i)} \in \mathbb{R}^{n_i \times r_i}, \ i = 1, 2, \ldots, d, \\ & r_i \in \mathbb{Z}, \ 1 \leq r_i \leq n_i, \ i = 1, 2, \ldots, d, \ \sum_{i=1}^d r_i = c. \end{array} $$
It is difficult to solve $(NT_{\min})$ directly, since the first two constraints of $(NT_{\min})$ couple the block variables $A^{(i)}$ and the $i$-rank variables $r_i$. A straightforward remedy is to separate these variables: we introduce $d$ more block variables $Y^{(i)} \in \mathbb{R}^{m_i \times m_i}$, where $m_i := \min\{n_i, c\}$ for $i = 1, 2, \ldots, d$, with $Y^{(i)} = \mathrm{diag}(y^{(i)})$, $y^{(i)} \in \{0,1\}^{m_i}$, and $\sum_{j=1}^{m_i} y^{(i)}_j = r_i$ for $i = 1, 2, \ldots, d$. By the equivalence between the Tucker decomposition models $(T_{\min})$ and $(T_{\max})$, we may reformulate $(NT_{\min})$ as follows:
$$ (NT_{\max}) \quad \begin{array}{ll} \max & \|\mathcal{F} \times_1 (A^{(1)} Y^{(1)})^{\mathrm T} \times_2 (A^{(2)} Y^{(2)})^{\mathrm T} \cdots \times_d (A^{(d)} Y^{(d)})^{\mathrm T}\| \\ \text{s.t.} & A^{(i)} \in \mathbb{R}^{n_i \times m_i}, \ (A^{(i)})^{\mathrm T} A^{(i)} = I, \ i = 1, 2, \ldots, d, \\ & y^{(i)} \in \{0,1\}^{m_i}, \ \sum_{j=1}^{m_i} y^{(i)}_j \geq 1, \ i = 1, 2, \ldots, d, \ \sum_{i=1}^d \sum_{j=1}^{m_i} y^{(i)}_j = c. \end{array} $$
Throughout this paper, $Y^{(i)}$ always stands for $\mathrm{diag}(y^{(i)})$, rather than carrying $Y^{(i)} = \mathrm{diag}(y^{(i)})$ explicitly in the constraints, in order to keep the models concise. Note that $r_i$ in $(NT_{\min})$ has been replaced by the number of nonzero entries of $y^{(i)}$ in $(NT_{\max})$. Let
$$ \mathcal{X} = \mathcal{F} \times_1 (A^{(1)} Y^{(1)})^{\mathrm T} \times_2 (A^{(2)} Y^{(2)})^{\mathrm T} \cdots \times_d (A^{(d)} Y^{(d)})^{\mathrm T} \in \mathbb{R}^{m_1 \times m_2 \times \cdots \times m_d}. $$
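The role of the diagonal 0/1 matrices $Y^{(i)}$ can be seen in a small experiment (Python/NumPy, reusing `mode_product` from the earlier sketch; sizes are illustrative): when the ones in $y^{(i)}$ come first, the trailing slices of $\mathcal{X}$ vanish and the core can be trimmed, which is exactly the observation formalized in (4.1) below.

```python
import numpy as np
# assumes mode_product(A, U, k) from the earlier sketch

rng = np.random.default_rng(1)
F = rng.standard_normal((8, 9, 7))
# orthonormal factors with m_i = 5 columns each (illustrative choice)
A = [np.linalg.qr(rng.standard_normal((F.shape[k], 5)))[0] for k in range(3)]
y = [np.array([1, 1, 1, 0, 0]),        # r_1 = 3
     np.array([1, 1, 0, 0, 0]),        # r_2 = 2
     np.array([1, 1, 1, 1, 0])]        # r_3 = 4

X = F
for k in range(3):
    X = mode_product(X, (A[k] @ np.diag(y[k])).T, k)

# slices beyond the leading r_k ones are exactly zero, so the core can be
# trimmed to size 3 x 2 x 4 without changing the approximation
print(np.allclose(X[3:, :, :], 0), np.allclose(X[:, 2:, :], 0), np.allclose(X[:, :, 4:], 0))
print(X[:3, :2, :4].shape)
```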

If the feasible block variables $Y^{(i)}$ satisfy
$$ Y^{(i)} = \mathrm{diag}(\underbrace{1, 1, \ldots, 1}_{r_i}, \underbrace{0, 0, \ldots, 0}_{m_i - r_i}), \quad i = 1, 2, \ldots, d, \tag{4.1} $$
then the size of the core tensor $\mathcal{X}$ can be reduced to $r_1 \times r_2 \times \cdots \times r_d$ by deleting the tails of zero entries in all modes, and the rank of $\mathcal{X}$ in each mode equals $r_1, r_2, \ldots, r_d$, respectively. This observation establishes the equivalence between $(NT_{\min})$ and $(NT_{\max})$. As we shall see in the next subsection, we may assume without loss of generality that $Y^{(i)}$ is of the form (4.1). This allows us to easily construct a core tensor of rank-$(r_1, r_2, \ldots, r_d)$, which is exactly the size of the core tensor $\mathcal{C}$ to be optimized in $(NT_{\min})$. Besides, we would like to remark that the third constraint $\sum_{j=1}^{m_i} y^{(i)}_j \geq 1$ for $i = 1, 2, \ldots, d$ in $(NT_{\max})$ is actually redundant: if $\sum_{j=1}^{m_i} y^{(i)}_j = 0$ for some $i$, then the objective is clearly zero, which can never be optimal. We nevertheless keep it in $(NT_{\max})$ for convenience in implementing the algorithms discussed in the following subsections.

4.2. The penalty method. Let us focus on the model for Tucker decomposition with unspecified core size in its maximization form $(NT_{\max})$. Recall that the separable structure of $(G)$ is required to implement the MBI method. Therefore, we move the nonseparable constraint $\sum_{i=1}^d \sum_{j=1}^{m_i} y^{(i)}_j = c$ into the objective function, i.e., we form the penalty function
$$ p\left(\lambda, A^{(1)}, \ldots, A^{(d)}, Y^{(1)}, \ldots, Y^{(d)}\right) := \left\| \mathcal{F} \times_1 (A^{(1)} Y^{(1)})^{\mathrm T} \times_2 (A^{(2)} Y^{(2)})^{\mathrm T} \cdots \times_d (A^{(d)} Y^{(d)})^{\mathrm T} \right\|^2 - \lambda \left( \sum_{i=1}^d \sum_{j=1}^{m_i} y^{(i)}_j - c \right)^2, $$
where $\lambda > 0$ is a penalty parameter. The penalty model for $(NT_{\max})$ is then
$$ (PT) \quad \begin{array}{ll} \max & p\left(\lambda, A^{(1)}, \ldots, A^{(d)}, Y^{(1)}, \ldots, Y^{(d)}\right) \\ \text{s.t.} & A^{(i)} \in \mathbb{R}^{n_i \times m_i}, \ (A^{(i)})^{\mathrm T} A^{(i)} = I, \ i = 1, 2, \ldots, d, \\ & y^{(i)} \in \{0,1\}^{m_i}, \ \sum_{j=1}^{m_i} y^{(i)}_j \geq 1, \ i = 1, 2, \ldots, d. \end{array} $$
We are now ready to apply the MBI method to solve $(PT)$, since the block constraints are separable. Before presenting the formal algorithm, let us first discuss the subproblems arising in the MBI method, which have to be solved globally in order to guarantee convergence. Without loss of generality, we wish to optimize $(A^{(1)}, Y^{(1)})$ while all the other block variables $(A^{(i)}, Y^{(i)})$ for $i = 2, 3, \ldots, d$ are fixed, i.e.,
$$ (PT_1) \quad \begin{array}{ll} \max & \left\| (A^{(1)} Y^{(1)})^{\mathrm T} W^{(1)} \right\|^2 - \lambda \left( \sum_{j=1}^{m_1} y^{(1)}_j + c_1 \right)^2 \\ \text{s.t.} & A^{(1)} \in \mathbb{R}^{n_1 \times m_1}, \ (A^{(1)})^{\mathrm T} A^{(1)} = I, \ y^{(1)} \in \{0,1\}^{m_1}, \ \sum_{j=1}^{m_1} y^{(1)}_j \geq 1, \end{array} $$
where $W^{(1)} := F_{(1)} \left( A^{(d)} Y^{(d)} \otimes \cdots \otimes A^{(2)} Y^{(2)} \right)$ and $c_1 := \sum_{i=2}^d \sum_{j=1}^{m_i} y^{(i)}_j - c$. This subproblem can indeed be solved easily.

First, since the optimization of $A^{(1)}$ is not affected by the penalty term, the optimal $A^{(1)}$ consists of the $m_1$ leading left singular vectors of the matrix $W^{(1)}$, obtained by applying the SVD. Next, we search for the optimal $Y^{(1)}$ given this optimal $A^{(1)}$. Denote by $v_j$ the $j$-th row vector of the matrix $(A^{(1)})^{\mathrm T} W^{(1)}$ for $j = 1, 2, \ldots, m_1$; then
$$ \left\| (A^{(1)} Y^{(1)})^{\mathrm T} W^{(1)} \right\|^2 = \sum_{j=1}^{m_1} y^{(1)}_j \|v_j\|^2. $$
Therefore, the optimization over $Y^{(1)}$ is the following problem:
$$ \begin{array}{ll} \max & -\lambda \left( \sum_{j=1}^{m_1} y^{(1)}_j \right)^2 + \sum_{j=1}^{m_1} \left( \|v_j\|^2 - 2\lambda c_1 \right) y^{(1)}_j - \lambda c_1^2 \\ \text{s.t.} & y^{(1)} \in \{0,1\}^{m_1}, \ \sum_{j=1}^{m_1} y^{(1)}_j \geq 1. \end{array} $$
Although this model appears to be combinatorial, it is solvable in polynomial time. The reason is that the possible values of $\sum_{j=1}^{m_1} y^{(1)}_j$ are $\{1, 2, \ldots, m_1\}$, and for any fixed value of $\sum_{j=1}^{m_1} y^{(1)}_j$ the objective function becomes linear, so the optimal allocation of $y^{(1)}$ can be made greedily. Thus we only need to try $m_1$ different values of $\sum_{j=1}^{m_1} y^{(1)}_j$ and pick the best solution.

From the above discussion, we know that the optimal $Y^{(1)}$ automatically satisfies the form (4.1). This is because $A^{(1)}$ consists of the $m_1$ leading left singular vectors of the matrix $W^{(1)}$, so the norms $\|v_j\|$ are in non-increasing order. We can therefore update $A^{(1)}$ and $Y^{(1)}$ simultaneously when solving the subproblem of the MBI method for a given penalty parameter $\lambda$. To summarize, the whole procedure for solving $(NT_{\max})$ using the MBI method and the penalty function method (cf. [8, 38]) is as follows.

Algorithm 2. The MBI method for the penalty model
Input: tensor $\mathcal{F} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d}$ and an integer $c \geq d$.
Output: core tensor $\mathcal{C} \in \mathbb{R}^{m_1 \times m_2 \times \cdots \times m_d}$ and matrices $A^{(i)} \in \mathbb{R}^{n_i \times m_i}$ for $i = 1, 2, \ldots, d$.
Step 0. Choose penalty parameters $\lambda_0 > 0$, $\sigma > 1$ and an initial feasible solution $(A^{(1)}_0, \ldots, A^{(d)}_0, Y^{(1)}_0, \ldots, Y^{(d)}_0)$, and compute the initial objective value $v_0 := p(\lambda_0, A^{(1)}_0, \ldots, A^{(d)}_0, Y^{(1)}_0, \ldots, Y^{(d)}_0)$. Set $k := 0$, $l := 0$ and $\lambda := \lambda_0$.
Step 1. For each $i = 1, 2, \ldots, d$, solve $(PT_i)$, defined analogously to $(PT_1)$ with block 1 replaced by block $i$, and obtain its optimal solution $(B^{(i)}_{k+1}, Z^{(i)}_{k+1})$ with optimal value $w^i_{k+1}$.
Step 2. Let $v_{k+1} := \max_{1 \leq i \leq d} w^i_{k+1}$, choose one $i^* = \arg\max_{1 \leq i \leq d} w^i_{k+1}$, and set
$$ \left( A^{(i)}_{k+1}, Y^{(i)}_{k+1} \right) := \begin{cases} \left( A^{(i)}_k, Y^{(i)}_k \right), & i \in \{1, 2, \ldots, d\} \setminus \{i^*\}, \\ \left( B^{(i)}_{k+1}, Z^{(i)}_{k+1} \right), & i = i^*. \end{cases} $$
Step 3. If $v_{k+1} - v_k \leq \epsilon$, go to Step 4; otherwise, set $k := k+1$ and go to Step 1.
Step 4. If $\lambda_l \left( \sum_{i=1}^d \sum_{j=1}^{m_i} (y^{(i)}_{k+1})_j - c \right)^2 \leq \epsilon_0$, stop, and output the core tensor $\mathcal{C} := \mathcal{F} \times_1 (A^{(1)}_{k+1} Y^{(1)}_{k+1})^{\mathrm T} \times_2 (A^{(2)}_{k+1} Y^{(2)}_{k+1})^{\mathrm T} \cdots \times_d (A^{(d)}_{k+1} Y^{(d)}_{k+1})^{\mathrm T}$ and the matrices $A^{(i)} = A^{(i)}_{k+1} Y^{(i)}_{k+1}$ for $i = 1, 2, \ldots, d$; otherwise, set $l := l + 1$, $\lambda := \lambda_0 \sigma^l$ and $k := 0$, and go to Step 1.
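For concreteness, here is a minimal sketch of the single-block subproblem $(PT_i)$ used inside Algorithm 2 (Python/NumPy; `W`, `lam`, `c1` correspond to $W^{(1)}$, $\lambda$, $c_1$ above, and the function name is ours, not the paper's).

```python
import numpy as np

def solve_block_subproblem(W, lam, c1, m):
    """Globally solve (PT_1): A consists of the m leading left singular vectors of W,
    then a 0/1 vector y maximizes sum_j y_j ||v_j||^2 - lam*(sum(y) + c1)^2, sum(y) >= 1."""
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    A = U[:, :m]                              # optimal A, unaffected by the penalty term
    v = np.sum((A.T @ W) ** 2, axis=1)        # v[j] = ||j-th row of A^T W||^2, non-increasing
    best_val, best_r = -np.inf, 1
    for r in range(1, m + 1):                 # try every possible number of ones in y
        val = v[:r].sum() - lam * (r + c1) ** 2
        if val > best_val:
            best_val, best_r = val, r
    y = np.zeros(m)
    y[:best_r] = 1.0                          # ones first, so the form (4.1) holds automatically
    return A, y, best_val

# tiny illustrative call with hypothetical inputs
rng = np.random.default_rng(0)
W = rng.standard_normal((30, 40))
A, y, val = solve_block_subproblem(W, lam=1.0, c1=-5, m=10)
print(int(y.sum()), val)
```

In Algorithm 2 this routine is called once per block, with $W^{(i)}$ assembled from the other fixed blocks, and only the block with the largest returned value is updated, following the MBI rule.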

We remark that when Algorithm 2 stops, we can shrink the size of the core tensor to $r_1 \times r_2 \times \cdots \times r_d$ by deleting the tails of zero entries in all modes, where $r_i$ is the number of nonzero entries of $y^{(i)}$. As an MBI method, Algorithm 2 is guaranteed to converge to a stationary point of $(PT)$, as claimed in [14].

4.3. A heuristic method. To further save computational effort, in this subsection we present a heuristic approach for the Tucker decomposition model with unspecified core size, i.e., $(NT_{\max})$. We know that if each $i$-rank ($i = 1, 2, \ldots, d$) of the core tensor $\mathcal{C}$ is given, then the problem becomes the traditional Tucker decomposition discussed in Section 3, which can be solved by the MBI method (Algorithm 1). From the discussion in Section 4, we notice that the key issue in the model is how to allocate the total budget $c$ among the $i$-ranks of the core tensor. Therefore, the optimal value of $(T_{\max})$ can be considered as a function of $(r_1, r_2, \ldots, r_d)$, denoted by $g(r_1, r_2, \ldots, r_d)$. Our heuristic approach tries to find a distribution of the constant $c$ by adapting the idea of the MBI method. Specifically, we may start with low $i$-ranks, e.g. $r^0_1 = r^0_2 = \cdots = r^0_d = 1$, and compute a Tucker decomposition using Algorithm 1; then we increase one of the $i$-ranks by one, choosing the mode that yields the best Tucker decomposition among the $d$ possible increments, i.e.,
$$ (r^{k+1}_1, r^{k+1}_2, \ldots, r^{k+1}_d) = \arg\max_{1 \leq i \leq d} \ g(r^k_1, r^k_2, \ldots, r^k_{i-1}, r^k_i + 1, r^k_{i+1}, \ldots, r^k_d). $$
This procedure is continued until $\sum_{i=1}^d r^k_i = c$ for some $k$ ($= c - d$). If the function $g(r_1, r_2, \ldots, r_d)$ for Tucker decomposition could be globally solved for any given $(r_1, r_2, \ldots, r_d)$ (though this is NP-hard in general), then the above approach would be a dynamic program and the optimality of $(NT_{\max})$ would be guaranteed when the method stops. Although in general Algorithm 1 can only find a stationary solution for $g(r_1, r_2, \ldots, r_d)$, this heuristic approach is in fact very effective; see the numerical tests in Section 5. Apart from the rank increasing approach, a rank decreasing strategy is another alternative, i.e., starting from large $r_i$ and decreasing them until $\sum_{i=1}^d r_i = c$. We conclude this subsection by presenting the heuristic algorithm for the rank decreasing method below, which is very easy to implement; a code sketch follows after the algorithm.

Algorithm 3. The rank decreasing method
Input: tensor $\mathcal{F} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d}$ and an integer $c \geq d$.
Output: core tensor $\mathcal{C} \in \mathbb{R}^{r_1 \times r_2 \times \cdots \times r_d}$ and matrices $A^{(i)} \in \mathbb{R}^{n_i \times r_i}$ for $i = 1, 2, \ldots, d$.
Step 0. Choose initial ranks $(r_1, r_2, \ldots, r_d)$ with $\sum_{i=1}^d r_i > c$.
Step 1. Apply Algorithm 1 to solve $(T_{\max})$ with input $(r_1, r_2, \ldots, r_d)$, and output the matrices $A^{(i)}$ for $i = 1, 2, \ldots, d$.
Step 2. For each $i = 1, 2, \ldots, d$, let $B^{(i)} \in \mathbb{R}^{n_i \times (r_i - 1)}$ be the matrix obtained by deleting the $r_i$-th column of $A^{(i)}$, and compute
$$ i^* := \arg\max_{1 \leq i \leq d} \ \left\| \mathcal{F} \times_1 (A^{(1)})^{\mathrm T} \cdots \times_{i-1} (A^{(i-1)})^{\mathrm T} \times_i (B^{(i)})^{\mathrm T} \times_{i+1} (A^{(i+1)})^{\mathrm T} \cdots \times_d (A^{(d)})^{\mathrm T} \right\|. $$
Update $r_{i^*} := r_{i^*} - 1$ and $A^{(i^*)} := B^{(i^*)}$. Repeat this step until $\sum_{i=1}^d r_i = c$.
Step 3. Apply Algorithm 1 to solve $(T_{\max})$ with input $(r_1, r_2, \ldots, r_d)$, and output the core tensor $\mathcal{C}$ and matrices $A^{(i)}$ for $i = 1, 2, \ldots, d$.
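As announced above, here is a sketch of the rank decreasing method in Python/NumPy, reusing `partial_compress` and `mbi_tucker` from the earlier sketches; for simplicity the final refit starts from random factors rather than warm-starting from Step 2, as the paper's Algorithm 3 does.

```python
import numpy as np
# assumes partial_compress(F, A, skip) and mbi_tucker(F, ranks) from the earlier sketches

def rank_decreasing_tucker(F, c, seed=0):
    """Algorithm 3 (sketch): start from r_i = min(n_i, c), drop one factor column at a
    time until sum(r_i) == c, then refit with Algorithm 1."""
    d = F.ndim
    ranks = [min(n, c) for n in F.shape]
    _, A = mbi_tucker(F, ranks, seed=seed)                 # Step 1
    value = lambda mats: np.linalg.norm(partial_compress(F, mats, skip=None))
    while sum(ranks) > c:                                  # Step 2
        best_val, best_i = -np.inf, None
        for i in range(d):
            if ranks[i] == 1:
                continue                                   # keep every mode rank at least 1
            trial = A[:i] + [A[i][:, :-1]] + A[i + 1:]     # delete the last column of A^(i)
            val = value(trial)
            if val > best_val:
                best_val, best_i = val, i
        ranks[best_i] -= 1
        A[best_i] = A[best_i][:, :-1]
    core, A = mbi_tucker(F, ranks, seed=seed)              # Step 3: refit at the chosen ranks
    return core, A, ranks
```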

5. Numerical experiments. In this section, we apply the algorithms proposed in the previous sections to solve Tucker decomposition without specification of the core size. Four different types of data are tested to demonstrate the effectiveness and efficiency of our algorithms. All the numerical computations are conducted on an Intel Xeon 3.40GHz CPU with 8GB of RAM, using MATLAB (R2011a) as the platform. We use the MATLAB Tensor Toolbox Version 2.5 [7] whenever tensor operations are called, and we also apply its embedded algorithm (the ALS method) to solve Tucker decomposition with a given rank-$(r_1, r_2, \ldots, r_d)$. The termination precision for the ALS method is set to a fixed small tolerance.

In implementing Algorithm 2, the penalty increment parameter is set as $\sigma := 2$. The starting matrices $A^{(i)}_0$ are randomly generated and then orthonormalized. The starting diagonal matrices are all set as $Y^{(i)}_0 := \mathrm{diag}(1, 0, 0, \ldots, 0)$ for $i = 1, 2, \ldots, d$, and the termination precision for Algorithm 2 is set to the same tolerance. For Algorithm 3, the initial ranks are set as $r_i := \min(n_i, c)$ for $i = 1, 2, \ldots, d$. When applying Algorithm 1 in Step 1, the starting matrices $A^{(i)}_0$ for solving $(T_{\max})$ are randomly generated; when applying Algorithm 1 in Step 3, the starting matrices for solving $(T_{\max})$ are obtained from Step 2.

For a given data tensor $\mathcal{F}$ in the following tests and the rank-$(r_1, r_2, \ldots, r_d)$ approximation tensor $\hat{\mathcal{F}}$ computed by the algorithms, the relative square error of the approximation is defined as $\|\mathcal{F} - \hat{\mathcal{F}}\| / \|\mathcal{F}\|$. To measure how close the approximation is to the original tensor $\mathcal{F}$, the fit is defined as
$$ \mathrm{fit} := 1 - \frac{\|\mathcal{F} - \hat{\mathcal{F}}\|}{\|\mathcal{F}\|}. $$
The larger the fit, the better the performance.

5.1. Noisy tensor decomposition. First we present some preliminary test results on two synthetic data sets. We use the Tensor Toolbox to randomly generate a noise-free third-order tensor $\mathcal{Y}$ with rank-$(4, 4, 2)$, where the entries of the core tensor follow standard normal distributions and the entries of the orthogonal factor matrices follow uniform distributions. A noise tensor $\mathcal{N}$ of the same size is randomly generated, whose entries follow standard normal distributions. The noisy tensor $\mathcal{Z}$ to be tested is then constructed as
$$ \mathcal{Z} = \mathcal{Y} + \eta \, \frac{\|\mathcal{Y}\|}{\|\mathcal{N}\|} \, \mathcal{N}, $$
where $\eta$ is the noise parameter, or more formally the perturbation ratio. Our task is to recover the true $i$-ranks of the core tensor from the corrupted tensor $\mathcal{Z}$. In this set of tests, the initial penalty parameter for Algorithm 2 is set as $\lambda_0 = \|\mathcal{Z}\|$, and the summation of the ranks is set as $c = 10$. The results of this experiment are summarized in Figure 5.1, reporting the fit and the computational time of Algorithms 2 and 3 for different perturbation ratios. For comparison, we also call the ALS method to solve the traditional Tucker decomposition with given $(r_1, r_2, r_3)$, which are randomly generated satisfying $r_1 + r_2 + r_3 = 10$.
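A small end-to-end illustration of the fit measure and the noisy-tensor construction, in Python/NumPy and reusing `mode_product` and `mbi_tucker` from the earlier sketches; the tensor sizes below are illustrative choices, not the ones used in the paper.

```python
import numpy as np
# assumes mode_product(A, U, k) and mbi_tucker(F, ranks) from the earlier sketches

def fit(F, F_hat):
    """fit = 1 - ||F - F_hat|| / ||F|| (larger is better)."""
    return 1.0 - np.linalg.norm(F - F_hat) / np.linalg.norm(F)

rng = np.random.default_rng(0)
# synthetic rank-(4, 4, 2) tensor built from a random core and orthonormal factors
core = rng.standard_normal((4, 4, 2))
factors = [np.linalg.qr(rng.standard_normal((n, r)))[0]
           for n, r in zip((20, 20, 20), (4, 4, 2))]
Y = core
for k, U in enumerate(factors):
    Y = mode_product(Y, U, k)

N = rng.standard_normal(Y.shape)
eta = 0.1                                            # perturbation ratio
Z = Y + eta * np.linalg.norm(Y) / np.linalg.norm(N) * N

C, A = mbi_tucker(Z, (4, 4, 2))
Z_hat = C
for k, U in enumerate(A):
    Z_hat = mode_product(Z_hat, U, k)
print("fit to the noise-free tensor:", fit(Y, Z_hat))
```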

Also, 10 random rank samples are generated for the ALS method, and its maximum, average and minimum fit are presented on the left-hand side of Figure 5.1, corresponding to the three pentagon dots from top to bottom in each line segment, respectively. The computational time reported for the ALS method is the total over these 10 rank samples. We observe that Algorithms 2 and 3 easily and quickly find the exact $i$-ranks $(4, 4, 2)$ of the core tensor for the different perturbation ratios, and can still approximate the original tensor $\mathcal{Y}$ accurately even under large noise. Figure 5.1 shows that Algorithms 2 and 3 outperform the ALS method significantly, both in fit and in computational time.

Fig. 5.1. Approximating a noisy tensor with original rank-(4, 4, 2).

Figure 5.2 shows another synthetic experiment whose data are generated in the same manner as those in Figure 5.1. In this data set the $i$-ranks of $\mathcal{Y}$ are $(5, 5, 4)$, and the summation of the ranks for Algorithms 2 and 3 is set as $c = 14$. Figure 5.2 again shows that our algorithms perform very well against the noise and are far superior to the ALS method. Furthermore, both Algorithms 2 and 3 find the exact $i$-ranks $(5, 5, 4)$ of the core tensor for the various perturbation ratios.

Fig. 5.2. Approximating a noisy tensor with original rank-(5, 5, 4).

5.2. Amino acid fluorescence data. This data set was originally generated and measured by Claus Andersson and was later published and tested by Bro [9, 10].

It consists of five laboratory-made samples, each containing different amounts of tyrosine, tryptophan and phenylalanine dissolved in phosphate buffered water. The array $\mathcal{A}$ to be decomposed has three modes, corresponding to the samples, the emission wavelengths (nm) and the excitation wavelengths (nm), respectively. Ideally the array should be describable with three PARAFAC components, whose fit equals 97.44% as computed by the Tensor Toolbox. This data set can also be approximated well in the sense of Tucker decomposition. In implementing Algorithm 2, the initial penalty parameter is set as $\lambda_0 = 0.01 \|\mathcal{A}\|$.

The numerical results for the three methods (ALS, Algorithms 2 and 3) are presented in Figure 5.3. Each pentagon dot in Figure 5.3 denotes the average fit obtained by running the ALS method 10 times with randomly generated rank-$(r_1, r_2, r_3)$ satisfying $r_1 + r_2 + r_3 = c$, and the CPU time is the total over the 10 random samples. The performance of Algorithms 2 and 3 is much better than that of the ALS method, especially for Algorithm 2, which attains a convincing fit even for low $c$. Furthermore, Algorithm 2 decomposes this data set as well as PARAFAC does without knowing the exact $i$-ranks; e.g. the fit reaches 97.55% when $c = 9$, for which the method finds rank-$(3, 3, 3)$ when it stops.

Fig. 5.3. Approximating the amino acid fluorescence data for different $c = r_1 + r_2 + r_3$.

To further justify the importance of Tucker decomposition with unspecified core size, as well as of Algorithm 2, we investigate the ALS method with all possible pre-specified $i$-ranks on this data set. For a given summation of $i$-ranks $c = 9$, there are in total 25 possible combinations of $(r_1, r_2, r_3)$, and their corresponding fits are listed in Figure 5.4. It shows that the Tucker decomposition of rank-$(3, 3, 3)$ outperforms all other combinations, and this is exactly the set of $i$-ranks found by Algorithm 2.

5.3. Gene expression data. We further test a real three-way tensor $\mathcal{F} \in \mathbb{R}^{2395 \times 6 \times 9}$ coming from 3D Arabidopsis gene expression data¹. Essentially this is a set of gene expression data with 2395 genes, measured at 6 time points under 9 different conditions. We again apply the three methods to this data set. The initial penalty parameter for Algorithm 2 is set as $\lambda_0 = 0.01 \|\mathcal{F}\|$. The results of our experiments are summarized in Table 5.1.

¹We thank Professor Xiuzhen Huang for providing this set of data.

Fig. 5.4. The ALS method on the amino acid fluorescence data for $c = 9$.

Table 5.1. Approximating the gene expression data: ranks selected by Algorithms 2 and 3 for each $c$.

c     Algorithm 2 (r1, r2, r3)     Algorithm 3 (r1, r2, r3)
3     (1, 1, 1)                    (1, 1, 1)
5     (2, 1, 2)                    (2, 1, 2)
10    (4, 2, 4)                    (1, 3, 6)
15    (7, 3, 5)                    (5, 6, 4)
20    (10, 4, 6)                   (5, 6, 9)

For the ALS method, fit-2 and fit-3 denote the fit obtained with the ranks found by Algorithms 2 and 3, respectively, while the remaining fit value is the average fit over 10 runs of the ALS method with randomly generated ranks satisfying $r_1 + r_2 + r_3 = c$. The following observations are clearly visible:
- The excellent performance of Algorithms 2 and 3 is validated.
- Computing the $i$-rank information is important: the ranks output by Algorithms 2 and 3 are more suitable for the ALS method, in terms of fit, than randomly generated ranks.
- In combination with the ALS method, Algorithm 2 works better than Algorithm 3 in terms of fit, while its CPU time is higher than that of Algorithm 3.

5.4. Tensor compression of image data. Our final tests apply the models and algorithms in this paper to compress tensors built from image data. Two sets of faces are presented, each in one subsection.

5.4.1. The ORL database of faces. The first set of images is from the ORL database of faces [36] of AT&T Laboratories Cambridge. In this database there are 10 different images for each of 40 distinct subjects, and all images have the same size in pixels. Here we draw one single subject and construct a third-order tensor $\mathcal{T}$ from its 10 images. When applying Algorithm 2, the initial penalty parameter is set as $\lambda_0 = \|\mathcal{T}\|$. Figure 5.5 displays the images compressed and recovered by the three methods when $c = 50$ and $c = 70$, respectively.

Table 5.2. Ranks selected in the tensor compression of the ORL database of faces.

c = 50:   Algorithm 3 (25, 22, 3);   ALS best (16, 25, 9);   ALS worst (43, 6, 1);   Algorithm 2 (21, 19, 10)
c = 70:   Algorithm 3 (35, 31, 4);   ALS best (34, 29, 7);   ALS worst (11, 58, 1);   Algorithm 2 (30, 30, 10)

The detailed numerical values for the two cases are listed in Table 5.2. Some standard abbreviations from image science are adopted in Table 5.2, namely CP for compression, CR for compression ratio, and RMSE for root mean squared error. For the ALS method, the CPU time is the total over 10 randomly generated sets of $i$-ranks satisfying $r_1 + r_2 + r_3 = c$, and its best and worst compressions in terms of fit are reported for comparison.

Fig. 5.5. Recovered images for the ORL database of faces (rows from top to bottom: 1. Original, 2. Algorithm 3, 3. ALS (best), 4. ALS (worst), 5. Algorithm 2).

The observations from Figure 5.5 indicate that only the images compressed by the best ALS run and by Algorithm 2 keep all the facial expressions. Moreover, Algorithm 2 indeed outperforms the best ALS run in terms of fit and RMSE, as shown in Table 5.2. It is worth mentioning that most of the computational effort of Algorithm 2 goes into finding a suitable combination of ranks $(r_1, r_2, r_3)$, whereas for the ALS method these ranks are pre-specified. This effort in selecting better ranks is worthwhile: the ALS method with a larger $c$ may perform worse than Algorithm 2 with a smaller $c$; e.g. the 4th row on the right of Figure 5.5 is not clearer than the 2nd row on the left of Figure 5.5, which is also confirmed by Table 5.2.
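Tables 5.2 and 5.3 report CR, CP and RMSE; the paper does not spell out these formulas, so the following Python/NumPy sketch uses standard definitions as an assumption (compressed storage counted as the number of core entries plus factor-matrix entries).

```python
import numpy as np

def compression_stats(shape, ranks, F, F_hat):
    """Assumed standard definitions (not stated explicitly in the paper):
    CR   = original storage / compressed storage,
    CP   = compressed storage as a percentage of the original,
    RMSE = root mean squared entrywise reconstruction error."""
    original = np.prod(shape)
    compressed = np.prod(ranks) + sum(n * r for n, r in zip(shape, ranks))
    cr = original / compressed
    cp = 100.0 * compressed / original
    rmse = np.sqrt(np.mean((F - F_hat) ** 2))
    return cr, cp, rmse
```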

Figure 5.6 presents the performance of Algorithm 2 for various $c$. Clearly, the larger $c$ is, the larger the fit and the lower the RMSE. This figure also suggests a guideline for choosing a suitable $c$, depending on the quality of the compressed images required by the user.

Fig. 5.6. Performance of Algorithm 2 for the ORL database of faces.

5.4.2. The Japanese female facial expression (JAFFE) database. Our last tests are on the facial database [34] from the Psychology Department of Kyushu University, which contains 213 images of 7 different emotional facial expressions (sadness, happiness, surprise, anger, disgust, fear and neutral) posed by 10 Japanese female models, all images having the same size in pixels. Here we draw the 7 different facial expressions of one female model and construct a third-order tensor $\mathcal{T}$ from them. A set of tests similar to those in Section 5.4.1 is conducted. The results are presented in Figures 5.7 and 5.8, Table 5.3, and Figure 5.9, and they again confirm the observations made for the ORL database of faces. In particular, comparing the two best sets of images (the best ALS run and Algorithm 2) in Figures 5.7 and 5.8, Algorithm 2 is clearly better in the details, e.g. the eyebrows.

Table 5.3. Ranks selected in the tensor compression of the JAFFE database.

c = 80:    Algorithm 3 (40, 39, 1);   ALS best (39, 38, 3);   ALS worst (75, 2, 3);    Algorithm 2 (39, 34, 7)
c = 110:   Algorithm 3 (53, 55, 2);   ALS best (54, 52, 4);   ALS worst (1, 105, 4);   Algorithm 2 (53, 50, 7)

Fig. 5.7. Recovered images for the JAFFE database when $c = 80$ (rows from top to bottom: 1. Original, 2. Algorithm 3, 3. ALS (best), 4. ALS (worst), 5. Algorithm 2).

6. Conclusion. In this paper we study the problem of finding a proper Tucker decomposition of a general tensor without a pre-specified size of the core tensor, which addresses an important practical issue in real applications. The size of the core tensor is assumed to be given in almost all existing methods for tensor Tucker decomposition. The approach that we propose is based on the so-called maximum block improvement (MBI) method. Our numerical experiments on a variety of instances taken from real applications suggest that the proposed methods perform robustly and effectively.

REFERENCES

[1] A. H. Andersen and W. S. Rayens, Structure-seeking multilinear methods for the analysis of fMRI data, NeuroImage, 22 (2004).
[2] C. M. Andersen and R. Bro, Practical aspects of PARAFAC modeling of fluorescence excitation-emission data, Journal of Chemometrics, 17 (2003).
[3] C. A. Andersson and R. Bro, Improving the speed of multi-way algorithms: Part I. Tucker3, Chemometrics and Intelligent Laboratory Systems, 42 (1998).
[4] C. J. Appellof and E. R. Davidson, Strategies for analyzing data from video fluorometric monitoring of liquid chromatographic effluents, Analytical Chemistry, 53 (1981).
[5] A. Aubry, A. De Maio, B. Jiang, and S. Zhang, A cognitive approach for ambiguity function shaping, Proceedings of the 7th IEEE Sensor Array and Multichannel Signal Processing Workshop, 2012.
[6] B. W. Bader, M. W. Berry, and M. Browne, Discussion tracking in Enron using PARAFAC, in M. W. Berry and M. Castellanos (eds.), Survey of Text Mining: Clustering, Classification, and Retrieval (2nd edition), Springer, New York, 2007.
[7] B. W. Bader and T. G. Kolda, MATLAB Tensor Toolbox, Version 2.5.

Fig. 5.8. Recovered images for the JAFFE database when $c = 110$ (rows from top to bottom: 1. Original, 2. Algorithm 3, 3. ALS (best), 4. ALS (worst), 5. Algorithm 2).

Fig. 5.9. Performance of Algorithm 2 for the JAFFE database.

[8] D. P. Bertsekas, Nonlinear Programming (2nd edition), Athena Scientific, Belmont, MA.
[9] R. Bro, PARAFAC: Tutorial and applications, Chemometrics and Intelligent Laboratory Systems, 38 (1997).
[10] R. Bro, Multi-way Analysis in the Food Industry: Models, Algorithms, and Applications, Ph.D. Thesis, University of Amsterdam, The Netherlands, and Royal Veterinary and Agricultural University, Denmark.
[11] R. Bro and H. A. L. Kiers, A new efficient method for determining the number of components in PARAFAC models, Journal of Chemometrics, 17 (2003).
[12] J. D. Carroll and J. J. Chang, Analysis of individual differences in multidimensional scaling

via an N-way generalization of "Eckart-Young" decomposition, Psychometrika, 35 (1970).
[13] B. Chen, Optimization with Block Variables: Theory and Applications, Ph.D. Thesis, The Chinese University of Hong Kong, Hong Kong.
[14] B. Chen, S. He, Z. Li, and S. Zhang, Maximum block improvement and polynomial optimization, SIAM Journal on Optimization, 22 (2012).
[15] L. De Lathauwer, B. De Moor, and J. Vandewalle, A multilinear singular value decomposition, SIAM Journal on Matrix Analysis and Applications, 21 (2000).
[16] L. De Lathauwer, B. De Moor, and J. Vandewalle, On the best rank-1 and rank-$(R_1, R_2, \ldots, R_N)$ approximation of higher-order tensors, SIAM Journal on Matrix Analysis and Applications, 21 (2000).
[17] L. De Lathauwer and D. Nion, Decompositions of a higher-order tensor in block terms. Part III: Alternating least squares algorithms, SIAM Journal on Matrix Analysis and Applications, 30 (2008).
[18] L. De Lathauwer and J. Vandewalle, Dimensionality reduction in higher-order signal processing and rank-$(R_1, R_2, \ldots, R_N)$ reduction in multilinear algebra, Linear Algebra and its Applications, 391 (2004).
[19] L. Eldén and B. Savas, A Newton-Grassmann method for computing the best multilinear rank-$(r_1, r_2, r_3)$ approximation of a tensor, SIAM Journal on Matrix Analysis and Applications, 31 (2009).
[20] R. A. Harshman, Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multi-modal factor analysis, UCLA Working Papers in Phonetics, 16 (1970).
[21] Z. He, A. Cichocki, and S. Xie, Efficient method for Tucker3 model selection, Electronics Letters, 45 (2009).
[22] R. Henrion, Simultaneous simplification of loading and core matrices in N-way PCA: Application to chemometric data arrays, Fresenius Journal of Analytical Chemistry, 361 (1998).
[23] R. Henrion, On global, local and stationary solutions in three-way data analysis, Journal of Chemometrics, 14 (2000).
[24] M. Ishteva, L. De Lathauwer, P.-A. Absil, and S. Van Huffel, Differential-geometric Newton method for the best rank-$(R_1, R_2, R_3)$ approximation of tensors, Numerical Algorithms, 51 (2009).
[25] A. Kapteyn, H. Neudecker, and T. Wansbeek, An approach to n-mode components analysis, Psychometrika, 51 (1986).
[26] H. A. L. Kiers and A. Der Kinderen, A fast method for choosing the numbers of components in Tucker3 analysis, British Journal of Mathematical and Statistical Psychology, 56 (2003).
[27] E. Kofidis and P. A. Regalia, On the best rank-1 approximation of higher-order supersymmetric tensors, SIAM Journal on Matrix Analysis and Applications, 23 (2002).
[28] T. G. Kolda, Multilinear operators for higher-order decompositions, Technical Report, Sandia National Laboratories, Albuquerque, NM.
[29] T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Review, 51 (2009).
[30] P. M. Kroonenberg and J. De Leeuw, Principal component analysis of three-mode data by means of alternating least squares algorithms, Psychometrika, 45 (1980).
[31] D. Leibovici and R. Sabatier, A singular value decomposition of a k-way array for a principal component analysis of multiway data, PTA-k, Linear Algebra and its Applications, 269 (1998).
[32] J. Levin, Three-mode Factor Analysis, Ph.D. Thesis, University of Illinois, Urbana, IL.
[33] Z. Li, A. Uschmajew, and S. Zhang, Linear convergence analysis of the maximum block improvement method for spherically constrained optimization, Technical Report.
[34] M. J. Lyons, S. Akamatsu, M. Kamachi, and J. Gyoba, Coding facial expressions with Gabor wavelets, Proceedings of the 3rd IEEE International Conference on Automatic Face and Gesture Recognition, 1998.
[35] L. Qi, The best rank-one approximation ratio of a tensor space, SIAM Journal on Matrix Analysis and Applications, 32 (2011).
[36] F. Samaria and A. Harter, Parameterisation of a stochastic model for human face identification, Proceedings of the 2nd IEEE Workshop on Applications of Computer Vision, 1994.
[37] B. Savas and L. Eldén, Handwritten digit classification using higher order singular value decomposition, Pattern Recognition, 40 (2007).

[38] W. Sun and Y. Yuan, Optimization Theory and Methods: Nonlinear Programming, Springer Optimization and Its Applications, Volume 1, Springer, New York.
[39] M. E. Timmerman and H. A. L. Kiers, Three-mode principal components analysis: Choosing the numbers of components and sensitivity to local optima, British Journal of Mathematical and Statistical Psychology, 53 (2000).
[40] R. Tomioka, T. Suzuki, K. Hayashi, and H. Kashima, Statistical performance of convex tensor decomposition, Proceedings of the 25th Annual Conference on Neural Information Processing Systems, 2011.
[41] L. R. Tucker, Implications of factor analysis of three-way matrices for measurement of change, in C. W. Harris (ed.), Problems in Measuring Change, University of Wisconsin Press, 1963.
[42] L. R. Tucker, The extension of factor analysis to three-dimensional matrices, in H. Gulliksen and N. Frederiksen (eds.), Contributions to Mathematical Psychology, Holt, Rinehart & Winston, New York, NY.
[43] L. R. Tucker, Some mathematical notes on three-mode factor analysis, Psychometrika, 31 (1966).
[44] A. Uschmajew, Local convergence of the alternating least squares algorithm for canonical tensor approximation, SIAM Journal on Matrix Analysis and Applications, 33 (2012).
[45] M. A. O. Vasilescu and D. Terzopoulos, Multilinear analysis of image ensembles: TensorFaces, Proceedings of the 7th European Conference on Computer Vision, Lecture Notes in Computer Science, 2350 (2002), Springer, New York.
[46] H. Wang, Q. Wu, L. Shi, Y. Yu, and N. Ahuja, Out-of-core tensor approximation of multi-dimensional matrices of visual data, ACM Transactions on Graphics (Special Issue for SIGGRAPH 2005), 24 (2005).
[47] Y. Wang and L. Qi, On the successive supersymmetric rank-1 decomposition of higher-order supersymmetric tensors, Numerical Linear Algebra with Applications, 14 (2007).
[48] S. Zhang, K. Wang, C. Ashby, B. Chen, and X. Huang, A unified adaptive co-identification framework for high-D expression data, in T. Shibuya et al. (eds.), Proceedings of the 7th IAPR International Conference on Pattern Recognition in Bioinformatics, Lecture Notes in Computer Science, 7632 (2012), Springer, New York.
[49] S. Zhang, K. Wang, B. Chen, and X. Huang, A new framework for co-clustering of gene expression data, in M. Loog et al. (eds.), Proceedings of the 6th IAPR International Conference on Pattern Recognition in Bioinformatics, Lecture Notes in Computer Science, 7036 (2011), Springer, New York.
[50] T. Zhang and G. H. Golub, Rank-one approximation to high order tensors, SIAM Journal on Matrix Analysis and Applications, 23 (2001).


More information

15 Singular Value Decomposition

15 Singular Value Decomposition 15 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing

More information

18 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 19, NO. 1, JANUARY 2008

18 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 19, NO. 1, JANUARY 2008 18 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 19, NO. 1, JANUARY 2008 MPCA: Multilinear Principal Component Analysis of Tensor Objects Haiping Lu, Student Member, IEEE, Konstantinos N. (Kostas) Plataniotis,

More information

Faloutsos, Tong ICDE, 2009

Faloutsos, Tong ICDE, 2009 Large Graph Mining: Patterns, Tools and Case Studies Christos Faloutsos Hanghang Tong CMU Copyright: Faloutsos, Tong (29) 2-1 Outline Part 1: Patterns Part 2: Matrix and Tensor Tools Part 3: Proximity

More information

AN ALTERNATING MINIMIZATION ALGORITHM FOR NON-NEGATIVE MATRIX APPROXIMATION

AN ALTERNATING MINIMIZATION ALGORITHM FOR NON-NEGATIVE MATRIX APPROXIMATION AN ALTERNATING MINIMIZATION ALGORITHM FOR NON-NEGATIVE MATRIX APPROXIMATION JOEL A. TROPP Abstract. Matrix approximation problems with non-negativity constraints arise during the analysis of high-dimensional

More information

Non-Negative Tensor Factorisation for Sound Source Separation

Non-Negative Tensor Factorisation for Sound Source Separation ISSC 2005, Dublin, Sept. -2 Non-Negative Tensor Factorisation for Sound Source Separation Derry FitzGerald, Matt Cranitch φ and Eugene Coyle* φ Dept. of Electronic Engineering, Cor Institute of Technology

More information

PAVEMENT CRACK CLASSIFICATION BASED ON TENSOR FACTORIZATION. Offei Amanor Adarkwa

PAVEMENT CRACK CLASSIFICATION BASED ON TENSOR FACTORIZATION. Offei Amanor Adarkwa PAVEMENT CRACK CLASSIFICATION BASED ON TENSOR FACTORIZATION by Offei Amanor Adarkwa A thesis submitted to the Faculty of the University of Delaware in partial fulfillment of the requirements for the degree

More information

Parallel Tensor Compression for Large-Scale Scientific Data

Parallel Tensor Compression for Large-Scale Scientific Data Parallel Tensor Compression for Large-Scale Scientific Data Woody Austin, Grey Ballard, Tamara G. Kolda April 14, 2016 SIAM Conference on Parallel Processing for Scientific Computing MS 44/52: Parallel

More information

ARestricted Boltzmann machine (RBM) [1] is a probabilistic

ARestricted Boltzmann machine (RBM) [1] is a probabilistic 1 Matrix Product Operator Restricted Boltzmann Machines Cong Chen, Kim Batselier, Ching-Yun Ko, and Ngai Wong chencong@eee.hku.hk, k.batselier@tudelft.nl, cyko@eee.hku.hk, nwong@eee.hku.hk arxiv:1811.04608v1

More information

A FLEXIBLE MODELING FRAMEWORK FOR COUPLED MATRIX AND TENSOR FACTORIZATIONS

A FLEXIBLE MODELING FRAMEWORK FOR COUPLED MATRIX AND TENSOR FACTORIZATIONS A FLEXIBLE MODELING FRAMEWORK FOR COUPLED MATRIX AND TENSOR FACTORIZATIONS Evrim Acar, Mathias Nilsson, Michael Saunders University of Copenhagen, Faculty of Science, Frederiksberg C, Denmark University

More information

Latent Semantic Models. Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze

Latent Semantic Models. Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze Latent Semantic Models Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze 1 Vector Space Model: Pros Automatic selection of index terms Partial matching of queries

More information

Fast Nonnegative Tensor Factorization with an Active-Set-Like Method

Fast Nonnegative Tensor Factorization with an Active-Set-Like Method Fast Nonnegative Tensor Factorization with an Active-Set-Like Method Jingu Kim and Haesun Park Abstract We introduce an efficient algorithm for computing a low-rank nonnegative CANDECOMP/PARAFAC(NNCP)

More information

CS168: The Modern Algorithmic Toolbox Lecture #8: How PCA Works

CS168: The Modern Algorithmic Toolbox Lecture #8: How PCA Works CS68: The Modern Algorithmic Toolbox Lecture #8: How PCA Works Tim Roughgarden & Gregory Valiant April 20, 206 Introduction Last lecture introduced the idea of principal components analysis (PCA). The

More information

Linear Algebra Background

Linear Algebra Background CS76A Text Retrieval and Mining Lecture 5 Recap: Clustering Hierarchical clustering Agglomerative clustering techniques Evaluation Term vs. document space clustering Multi-lingual docs Feature selection

More information

SPARSE signal representations have gained popularity in recent

SPARSE signal representations have gained popularity in recent 6958 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 10, OCTOBER 2011 Blind Compressed Sensing Sivan Gleichman and Yonina C. Eldar, Senior Member, IEEE Abstract The fundamental principle underlying

More information

Towards A New Multiway Array Decomposition Algorithm: Elementwise Multiway Array High Dimensional Model Representation (EMAHDMR)

Towards A New Multiway Array Decomposition Algorithm: Elementwise Multiway Array High Dimensional Model Representation (EMAHDMR) Towards A New Multiway Array Decomposition Algorithm: Elementwise Multiway Array High Dimensional Model Representation (EMAHDMR) MUZAFFER AYVAZ İTU Informatics Institute, Computational Sci. and Eng. Program,

More information

LOW MULTILINEAR RANK TENSOR APPROXIMATION VIA SEMIDEFINITE PROGRAMMING

LOW MULTILINEAR RANK TENSOR APPROXIMATION VIA SEMIDEFINITE PROGRAMMING 17th European Signal Processing Conference (EUSIPCO 2009) Glasgow, Scotland, August 24-28, 2009 LOW MULTILINEAR RANK TENSOR APPROXIMATION VIA SEMIDEFINITE PROGRAMMING Carmeliza Navasca and Lieven De Lathauwer

More information

THE PERTURBATION BOUND FOR THE SPECTRAL RADIUS OF A NON-NEGATIVE TENSOR

THE PERTURBATION BOUND FOR THE SPECTRAL RADIUS OF A NON-NEGATIVE TENSOR THE PERTURBATION BOUND FOR THE SPECTRAL RADIUS OF A NON-NEGATIVE TENSOR WEN LI AND MICHAEL K. NG Abstract. In this paper, we study the perturbation bound for the spectral radius of an m th - order n-dimensional

More information

JOS M.F. TEN BERGE SIMPLICITY AND TYPICAL RANK RESULTS FOR THREE-WAY ARRAYS

JOS M.F. TEN BERGE SIMPLICITY AND TYPICAL RANK RESULTS FOR THREE-WAY ARRAYS PSYCHOMETRIKA VOL. 76, NO. 1, 3 12 JANUARY 2011 DOI: 10.1007/S11336-010-9193-1 SIMPLICITY AND TYPICAL RANK RESULTS FOR THREE-WAY ARRAYS JOS M.F. TEN BERGE UNIVERSITY OF GRONINGEN Matrices can be diagonalized

More information

THERE is an increasing need to handle large multidimensional

THERE is an increasing need to handle large multidimensional 1 Matrix Product State for Higher-Order Tensor Compression and Classification Johann A. Bengua, Ho N. Phien, Hoang D. Tuan and Minh N. Do Abstract This paper introduces matrix product state (MPS) decomposition

More information

Sparse Higher-Order Principal Components Analysis

Sparse Higher-Order Principal Components Analysis Genevera I. Allen Baylor College of Medicine & Rice University Abstract Traditional tensor decompositions such as the CANDECOMP / PARAFAC CP) and Tucer decompositions yield higher-order principal components

More information

Image Registration Lecture 2: Vectors and Matrices

Image Registration Lecture 2: Vectors and Matrices Image Registration Lecture 2: Vectors and Matrices Prof. Charlene Tsai Lecture Overview Vectors Matrices Basics Orthogonal matrices Singular Value Decomposition (SVD) 2 1 Preliminary Comments Some of this

More information

Uncorrelated Multilinear Principal Component Analysis through Successive Variance Maximization

Uncorrelated Multilinear Principal Component Analysis through Successive Variance Maximization Uncorrelated Multilinear Principal Component Analysis through Successive Variance Maximization Haiping Lu 1 K. N. Plataniotis 1 A. N. Venetsanopoulos 1,2 1 Department of Electrical & Computer Engineering,

More information

Theoretical Performance Analysis of Tucker Higher Order SVD in Extracting Structure from Multiple Signal-plus-Noise Matrices

Theoretical Performance Analysis of Tucker Higher Order SVD in Extracting Structure from Multiple Signal-plus-Noise Matrices Theoretical Performance Analysis of Tucker Higher Order SVD in Extracting Structure from Multiple Signal-plus-Noise Matrices Himanshu Nayar Dept. of EECS University of Michigan Ann Arbor Michigan 484 email:

More information

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS HONOUR SCHOOL OF MATHEMATICS, OXFORD UNIVERSITY HILARY TERM 2005, DR RAPHAEL HAUSER 1. The Quasi-Newton Idea. In this lecture we will discuss

More information

c 2008 Society for Industrial and Applied Mathematics

c 2008 Society for Industrial and Applied Mathematics SIAM J. MATRIX ANAL. APPL. Vol. 30, No. 3, pp. 1067 1083 c 2008 Society for Industrial and Applied Mathematics DECOMPOSITIONS OF A HIGHER-ORDER TENSOR IN BLOCK TERMS PART III: ALTERNATING LEAST SQUARES

More information

Factor Analysis (FA) Non-negative Matrix Factorization (NMF) CSE Artificial Intelligence Grad Project Dr. Debasis Mitra

Factor Analysis (FA) Non-negative Matrix Factorization (NMF) CSE Artificial Intelligence Grad Project Dr. Debasis Mitra Factor Analysis (FA) Non-negative Matrix Factorization (NMF) CSE 5290 - Artificial Intelligence Grad Project Dr. Debasis Mitra Group 6 Taher Patanwala Zubin Kadva Factor Analysis (FA) 1. Introduction Factor

More information

Generalized Higher-Order Tensor Decomposition via Parallel ADMM

Generalized Higher-Order Tensor Decomposition via Parallel ADMM Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence Generalized Higher-Order Tensor Decomposition via Parallel ADMM Fanhua Shang 1, Yuanyuan Liu 2, James Cheng 1 1 Department of

More information

Absolute Value Programming

Absolute Value Programming O. L. Mangasarian Absolute Value Programming Abstract. We investigate equations, inequalities and mathematical programs involving absolute values of variables such as the equation Ax + B x = b, where A

More information

Matrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A =

Matrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = 30 MATHEMATICS REVIEW G A.1.1 Matrices and Vectors Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = a 11 a 12... a 1N a 21 a 22... a 2N...... a M1 a M2... a MN A matrix can

More information

EE 381V: Large Scale Optimization Fall Lecture 24 April 11

EE 381V: Large Scale Optimization Fall Lecture 24 April 11 EE 381V: Large Scale Optimization Fall 2012 Lecture 24 April 11 Lecturer: Caramanis & Sanghavi Scribe: Tao Huang 24.1 Review In past classes, we studied the problem of sparsity. Sparsity problem is that

More information

A Bregman alternating direction method of multipliers for sparse probabilistic Boolean network problem

A Bregman alternating direction method of multipliers for sparse probabilistic Boolean network problem A Bregman alternating direction method of multipliers for sparse probabilistic Boolean network problem Kangkang Deng, Zheng Peng Abstract: The main task of genetic regulatory networks is to construct a

More information

Note on Algorithm Differences Between Nonnegative Matrix Factorization And Probabilistic Latent Semantic Indexing

Note on Algorithm Differences Between Nonnegative Matrix Factorization And Probabilistic Latent Semantic Indexing Note on Algorithm Differences Between Nonnegative Matrix Factorization And Probabilistic Latent Semantic Indexing 1 Zhong-Yuan Zhang, 2 Chris Ding, 3 Jie Tang *1, Corresponding Author School of Statistics,

More information

Lecture 2. Tensor Unfoldings. Charles F. Van Loan

Lecture 2. Tensor Unfoldings. Charles F. Van Loan From Matrix to Tensor: The Transition to Numerical Multilinear Algebra Lecture 2. Tensor Unfoldings Charles F. Van Loan Cornell University The Gene Golub SIAM Summer School 2010 Selva di Fasano, Brindisi,

More information

Wafer Pattern Recognition Using Tucker Decomposition

Wafer Pattern Recognition Using Tucker Decomposition Wafer Pattern Recognition Using Tucker Decomposition Ahmed Wahba, Li-C. Wang, Zheng Zhang UC Santa Barbara Nik Sumikawa NXP Semiconductors Abstract In production test data analytics, it is often that an

More information

Tensor Decompositions and Applications

Tensor Decompositions and Applications Tamara G. Kolda and Brett W. Bader Part I September 22, 2015 What is tensor? A N-th order tensor is an element of the tensor product of N vector spaces, each of which has its own coordinate system. a =

More information

sparse and low-rank tensor recovery Cubic-Sketching

sparse and low-rank tensor recovery Cubic-Sketching Sparse and Low-Ran Tensor Recovery via Cubic-Setching Guang Cheng Department of Statistics Purdue University www.science.purdue.edu/bigdata CCAM@Purdue Math Oct. 27, 2017 Joint wor with Botao Hao and Anru

More information

Rank Determination for Low-Rank Data Completion

Rank Determination for Low-Rank Data Completion Journal of Machine Learning Research 18 017) 1-9 Submitted 7/17; Revised 8/17; Published 9/17 Rank Determination for Low-Rank Data Completion Morteza Ashraphijuo Columbia University New York, NY 1007,

More information

the tensor rank is equal tor. The vectorsf (r)

the tensor rank is equal tor. The vectorsf (r) EXTENSION OF THE SEMI-ALGEBRAIC FRAMEWORK FOR APPROXIMATE CP DECOMPOSITIONS VIA NON-SYMMETRIC SIMULTANEOUS MATRIX DIAGONALIZATION Kristina Naskovska, Martin Haardt Ilmenau University of Technology Communications

More information

A practical method for computing the largest M-eigenvalue of a fourth-order partially symmetric tensor

A practical method for computing the largest M-eigenvalue of a fourth-order partially symmetric tensor NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS Numer. Linear Algebra Appl. 2007; 00:1 6 [Version: 2002/09/18 v1.02] A practical method for computing the largest M-eigenvalue of a fourth-order partially symmetric

More information

Tensor Low-Rank Completion and Invariance of the Tucker Core

Tensor Low-Rank Completion and Invariance of the Tucker Core Tensor Low-Rank Completion and Invariance of the Tucker Core Shuzhong Zhang Department of Industrial & Systems Engineering University of Minnesota zhangs@umn.edu Joint work with Bo JIANG, Shiqian MA, and

More information

4y Springer NONLINEAR INTEGER PROGRAMMING

4y Springer NONLINEAR INTEGER PROGRAMMING NONLINEAR INTEGER PROGRAMMING DUAN LI Department of Systems Engineering and Engineering Management The Chinese University of Hong Kong Shatin, N. T. Hong Kong XIAOLING SUN Department of Mathematics Shanghai

More information

Tensor Decompositions Workshop Discussion Notes American Institute of Mathematics (AIM) Palo Alto, CA

Tensor Decompositions Workshop Discussion Notes American Institute of Mathematics (AIM) Palo Alto, CA Tensor Decompositions Workshop Discussion Notes American Institute of Mathematics (AIM) Palo Alto, CA Carla D. Moravitz Martin July 9-23, 24 This work was supported by AIM. Center for Applied Mathematics,

More information

Are There Sixth Order Three Dimensional PNS Hankel Tensors?

Are There Sixth Order Three Dimensional PNS Hankel Tensors? Are There Sixth Order Three Dimensional PNS Hankel Tensors? Guoyin Li Liqun Qi Qun Wang November 17, 014 Abstract Are there positive semi-definite PSD) but not sums of squares SOS) Hankel tensors? If the

More information

Principal Component Analysis

Principal Component Analysis Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used

More information

Approximation algorithms for nonnegative polynomial optimization problems over unit spheres

Approximation algorithms for nonnegative polynomial optimization problems over unit spheres Front. Math. China 2017, 12(6): 1409 1426 https://doi.org/10.1007/s11464-017-0644-1 Approximation algorithms for nonnegative polynomial optimization problems over unit spheres Xinzhen ZHANG 1, Guanglu

More information

An Even Order Symmetric B Tensor is Positive Definite

An Even Order Symmetric B Tensor is Positive Definite An Even Order Symmetric B Tensor is Positive Definite Liqun Qi, Yisheng Song arxiv:1404.0452v4 [math.sp] 14 May 2014 October 17, 2018 Abstract It is easily checkable if a given tensor is a B tensor, or

More information

Using Matrix Decompositions in Formal Concept Analysis

Using Matrix Decompositions in Formal Concept Analysis Using Matrix Decompositions in Formal Concept Analysis Vaclav Snasel 1, Petr Gajdos 1, Hussam M. Dahwa Abdulla 1, Martin Polovincak 1 1 Dept. of Computer Science, Faculty of Electrical Engineering and

More information

Nonnegative Tensor Factorization using a proximal algorithm: application to 3D fluorescence spectroscopy

Nonnegative Tensor Factorization using a proximal algorithm: application to 3D fluorescence spectroscopy Nonnegative Tensor Factorization using a proximal algorithm: application to 3D fluorescence spectroscopy Caroline Chaux Joint work with X. Vu, N. Thirion-Moreau and S. Maire (LSIS, Toulon) Aix-Marseille

More information

NON-NEGATIVE SPARSE CODING

NON-NEGATIVE SPARSE CODING NON-NEGATIVE SPARSE CODING Patrik O. Hoyer Neural Networks Research Centre Helsinki University of Technology P.O. Box 9800, FIN-02015 HUT, Finland patrik.hoyer@hut.fi To appear in: Neural Networks for

More information

Non-Negative Tensor Factorization with Applications to Statistics and Computer Vision

Non-Negative Tensor Factorization with Applications to Statistics and Computer Vision Non-Negative Tensor Factorization with Applications to Statistics and Computer Vision Amnon Shashua shashua@cs.huji.ac.il School of Engineering and Computer Science, The Hebrew University of Jerusalem,

More information

Orthogonal tensor decomposition

Orthogonal tensor decomposition Orthogonal tensor decomposition Daniel Hsu Columbia University Largely based on 2012 arxiv report Tensor decompositions for learning latent variable models, with Anandkumar, Ge, Kakade, and Telgarsky.

More information

A Customized ADMM for Rank-Constrained Optimization Problems with Approximate Formulations

A Customized ADMM for Rank-Constrained Optimization Problems with Approximate Formulations A Customized ADMM for Rank-Constrained Optimization Problems with Approximate Formulations Chuangchuang Sun and Ran Dai Abstract This paper proposes a customized Alternating Direction Method of Multipliers

More information

DELFT UNIVERSITY OF TECHNOLOGY

DELFT UNIVERSITY OF TECHNOLOGY DELFT UNIVERSITY OF TECHNOLOGY REPORT -09 Computational and Sensitivity Aspects of Eigenvalue-Based Methods for the Large-Scale Trust-Region Subproblem Marielba Rojas, Bjørn H. Fotland, and Trond Steihaug

More information

A Characterization of Sampling Patterns for Union of Low-Rank Subspaces Retrieval Problem

A Characterization of Sampling Patterns for Union of Low-Rank Subspaces Retrieval Problem A Characterization of Sampling Patterns for Union of Low-Rank Subspaces Retrieval Problem Morteza Ashraphijuo Columbia University ashraphijuo@ee.columbia.edu Xiaodong Wang Columbia University wangx@ee.columbia.edu

More information

Structured matrix factorizations. Example: Eigenfaces

Structured matrix factorizations. Example: Eigenfaces Structured matrix factorizations Example: Eigenfaces An extremely large variety of interesting and important problems in machine learning can be formulated as: Given a matrix, find a matrix and a matrix

More information

Fast Hankel Tensor-Vector Products and Application to Exponential Data Fitting

Fast Hankel Tensor-Vector Products and Application to Exponential Data Fitting Fast Hankel Tensor-Vector Products and Application to Exponential Data Fitting Weiyang Ding Liqun Qi Yimin Wei arxiv:1401.6238v1 [math.na] 24 Jan 2014 January 27, 2014 Abstract This paper is contributed

More information

MATRIX COMPLETION AND TENSOR RANK

MATRIX COMPLETION AND TENSOR RANK MATRIX COMPLETION AND TENSOR RANK HARM DERKSEN Abstract. In this paper, we show that the low rank matrix completion problem can be reduced to the problem of finding the rank of a certain tensor. arxiv:1302.2639v2

More information

New Ranks for Even-Order Tensors and Their Applications in Low-Rank Tensor Optimization

New Ranks for Even-Order Tensors and Their Applications in Low-Rank Tensor Optimization New Ranks for Even-Order Tensors and Their Applications in Low-Rank Tensor Optimization Bo JIANG Shiqian MA Shuzhong ZHANG January 12, 2015 Abstract In this paper, we propose three new tensor decompositions

More information

arxiv: v1 [math.co] 10 Aug 2016

arxiv: v1 [math.co] 10 Aug 2016 POLYTOPES OF STOCHASTIC TENSORS HAIXIA CHANG 1, VEHBI E. PAKSOY 2 AND FUZHEN ZHANG 2 arxiv:1608.03203v1 [math.co] 10 Aug 2016 Abstract. Considering n n n stochastic tensors (a ijk ) (i.e., nonnegative

More information

Multi-Linear Mappings, SVD, HOSVD, and the Numerical Solution of Ill-Conditioned Tensor Least Squares Problems

Multi-Linear Mappings, SVD, HOSVD, and the Numerical Solution of Ill-Conditioned Tensor Least Squares Problems Multi-Linear Mappings, SVD, HOSVD, and the Numerical Solution of Ill-Conditioned Tensor Least Squares Problems Lars Eldén Department of Mathematics, Linköping University 1 April 2005 ERCIM April 2005 Multi-Linear

More information

Linear Algebra for Machine Learning. Sargur N. Srihari

Linear Algebra for Machine Learning. Sargur N. Srihari Linear Algebra for Machine Learning Sargur N. srihari@cedar.buffalo.edu 1 Overview Linear Algebra is based on continuous math rather than discrete math Computer scientists have little experience with it

More information

Multiscale Tensor Decomposition

Multiscale Tensor Decomposition Multiscale Tensor Decomposition Alp Ozdemir 1, Mark A. Iwen 1,2 and Selin Aviyente 1 1 Department of Electrical and Computer Engineering, Michigan State University 2 Deparment of the Mathematics, Michigan

More information

Algorithm 862: MATLAB Tensor Classes for Fast Algorithm Prototyping

Algorithm 862: MATLAB Tensor Classes for Fast Algorithm Prototyping Algorithm 862: MATLAB Tensor Classes for Fast Algorithm Prototyping BRETT W. BADER and TAMARA G. KOLDA Sandia National Laboratories Tensors (also known as multidimensional arrays or N-way arrays) are used

More information

Outline Introduction: Problem Description Diculties Algebraic Structure: Algebraic Varieties Rank Decient Toeplitz Matrices Constructing Lower Rank St

Outline Introduction: Problem Description Diculties Algebraic Structure: Algebraic Varieties Rank Decient Toeplitz Matrices Constructing Lower Rank St Structured Lower Rank Approximation by Moody T. Chu (NCSU) joint with Robert E. Funderlic (NCSU) and Robert J. Plemmons (Wake Forest) March 5, 1998 Outline Introduction: Problem Description Diculties Algebraic

More information

Recovering Tensor Data from Incomplete Measurement via Compressive Sampling

Recovering Tensor Data from Incomplete Measurement via Compressive Sampling Recovering Tensor Data from Incomplete Measurement via Compressive Sampling Jason R. Holloway hollowjr@clarkson.edu Carmeliza Navasca cnavasca@clarkson.edu Department of Electrical Engineering Clarkson

More information

Performance Evaluation of the Matlab PCT for Parallel Implementations of Nonnegative Tensor Factorization

Performance Evaluation of the Matlab PCT for Parallel Implementations of Nonnegative Tensor Factorization Performance Evaluation of the Matlab PCT for Parallel Implementations of Nonnegative Tensor Factorization Tabitha Samuel, Master s Candidate Dr. Michael W. Berry, Major Professor Abstract: Increasingly

More information

Notes on Latent Semantic Analysis

Notes on Latent Semantic Analysis Notes on Latent Semantic Analysis Costas Boulis 1 Introduction One of the most fundamental problems of information retrieval (IR) is to find all documents (and nothing but those) that are semantically

More information