KU Leuven, August 29, 2013
In collaboration with: N. Vanbaelen, K. Meerbergen, and R. Vandebril
Overview
1. Introduction: Inspiring example; Notation
2. Tensor decompositions: The CP decomposition; The orthogonal Tucker decomposition
3. Multiple-vector tensor-vector product: Traditional approach; Successive contractions; Blocking
4. Conclusions
Inspiring example
[Figure: image courtesy of NASA]
Inspiring example
[Figure: image courtesy of Hossain et al. (2013)]
Inspiring example
Numerical simulation yields, at every grid point, the pressure, velocity, and temperature. The flow is usually simulated for different values of the angle of attack, the Reynolds number (viscosity of the fluid), and the Mach number (speed). Hence, the result is a multiway array with 5 indices.
Tensors
A tensor $\mathcal{A}$ is an element of $\mathbb{R}^{n_1} \otimes \mathbb{R}^{n_2} \otimes \cdots \otimes \mathbb{R}^{n_d}$:
$$\mathcal{A} = \sum_{i_1=1}^{n_1} \sum_{i_2=1}^{n_2} \cdots \sum_{i_d=1}^{n_d} a_{i_1,i_2,\ldots,i_d}\, e_{i_1} \otimes e_{i_2} \otimes \cdots \otimes e_{i_d}.$$
Mode-1 vectors live in $\mathbb{R}^{n_1}$, mode-2 vectors in $\mathbb{R}^{n_2}$, and mode-3 vectors in $\mathbb{R}^{n_3}$.
Unfolding
For $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, the mode-2 unfolding is the matrix $A_{(2)} \in \mathbb{R}^{n_2 \times n_1 n_3}$.
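The unfolding is straightforward to realize in NumPy; the following is a minimal sketch. The ordering of the columns is a convention: here the remaining modes keep their row-major order, which need not match the convention used on the slides.

```python
import numpy as np

def unfold(A, k):
    """Mode-k unfolding: move axis k to the front, then flatten the rest.

    The remaining modes keep their original (row-major) order; other
    column-ordering conventions exist in the literature.
    """
    return np.moveaxis(A, k, 0).reshape(A.shape[k], -1)

A = np.arange(24).reshape(2, 3, 4)  # a small 2 x 3 x 4 tensor
A2 = unfold(A, 1)                   # mode-2 unfolding (0-based axis 1)
print(A2.shape)                     # (3, 8)
```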
Reduced parameter representation
Low-rank representation of matrices by the SVD: $A$ is written as a short sum of rank-one terms (outer products of singular vectors).
Reduced parameter representation
For tensors, there is a low-rank representation by the rank decomposition, which writes $\mathcal{A}$ as a sum of rank-one tensors, and a low-rank representation by the orthogonal Tucker decomposition, $\mathcal{A} = (U_1, U_2, U_3) \cdot \mathcal{S}$, which compresses $\mathcal{A}$ into a small core tensor $\mathcal{S}$.
CANDECOMP/PARAFAC decomposition
Hitchcock (1927) proposed the rank decomposition:
$$\mathcal{A} = \sum_{i=1}^{r} a_i^{(1)} \otimes a_i^{(2)} \otimes \cdots \otimes a_i^{(d)},$$
i.e., $\mathcal{A}$ is written as a sum of $r$ rank-one tensors. It has various applications in chemometrics, signal processing (blind source separation), and psychometrics.
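A rank decomposition can be evaluated directly from its factor matrices; the NumPy sketch below (the helper name `cp_reconstruct` is mine, not from the talk) sums the $r$ rank-one terms.

```python
import numpy as np

def cp_reconstruct(factors):
    """Rebuild a tensor from CP factors: A = sum_i a_i^(1) o ... o a_i^(d).

    `factors` is a list of (n_j, r) matrices; column i of each factor
    contributes one rank-one term.
    """
    r = factors[0].shape[1]
    shape = tuple(f.shape[0] for f in factors)
    A = np.zeros(shape)
    for i in range(r):
        term = factors[0][:, i]
        for f in factors[1:]:
            term = np.multiply.outer(term, f[:, i])  # outer product
        A += term
    return A

rng = np.random.default_rng(0)
U = rng.standard_normal((4, 2))
V = rng.standard_normal((5, 2))
W = rng.standard_normal((3, 2))
A = cp_reconstruct([U, V, W])
print(A.shape)  # (4, 5, 3)
```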
Algorithms for CP decompositions
Let $*$ denote the elementwise (Hadamard) product and $\odot$ the Khatri-Rao product:
$$A \odot B = \begin{bmatrix} a_1 \otimes b_1 & a_2 \otimes b_2 & \cdots & a_n \otimes b_n \end{bmatrix},$$
and let $A_i = \begin{bmatrix} a_1^{(i)} & a_2^{(i)} & \cdots & a_r^{(i)} \end{bmatrix}$.
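The Khatri-Rao product is just a columnwise Kronecker product, which a short NumPy broadcast expresses directly; `khatri_rao` below is an illustrative helper, not code from the talk.

```python
import numpy as np

def khatri_rao(A, B):
    """Columnwise Kronecker (Khatri-Rao) product of A (m x n) and B (p x n)."""
    m, n = A.shape
    p, n2 = B.shape
    assert n == n2, "A and B need the same number of columns"
    # Column j of the result is kron(A[:, j], B[:, j]).
    return (A[:, None, :] * B[None, :, :]).reshape(m * p, n)

A = np.arange(6).reshape(2, 3)
B = np.arange(12).reshape(4, 3)
C = khatri_rao(A, B)
print(C.shape)  # (8, 3)
```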
Algorithms for CP decompositions
Gradient-based optimization algorithms minimize the objective function
$$f(A_1, \ldots, A_d) = \tfrac{1}{2} \left\| A_{(1)} - A_1 (A_2 \odot \cdots \odot A_d)^T \right\|_F^2,$$
whose gradient is $\nabla f = \begin{bmatrix} \frac{\partial f}{\partial A_1} & \cdots & \frac{\partial f}{\partial A_d} \end{bmatrix}$ with
$$\frac{\partial f}{\partial A_i} = -A_{(i)} (A_1 \odot \cdots \odot A_{i-1} \odot A_{i+1} \odot \cdots \odot A_d) + A_i J$$
and
$$J = A_1^T A_1 * \cdots * A_{i-1}^T A_{i-1} * A_{i+1}^T A_{i+1} * \cdots * A_d^T A_d.$$
See Hayashi and Hayashi (1982), Paatero (1997), Tomasi and Bro (2005), Acar et al. (2011), and Sorber et al. (2013), among others.
Orthogonal Tucker decomposition
The multilinear multiplication of $\mathcal{A}$ with $\{U_i\}_{i=1}^{d}$ is
$$(U_1, \ldots, U_d) \cdot \mathcal{A} := \sum_{i_1=1}^{n_1} \cdots \sum_{i_d=1}^{n_d} a_{i_1,\ldots,i_d}\, (U_1 e_{i_1}) \otimes \cdots \otimes (U_d e_{i_d}).$$
Tucker (1966) proposed the decomposition
$$\mathcal{A} = (U_1, \ldots, U_d) \cdot \mathcal{S},$$
where the core $\mathcal{S}$ is $r_1 \times \cdots \times r_d$ and $\mathcal{A}$ is $n_1 \times \cdots \times n_d$ with $r_i \le n_i$.
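Multilinear multiplication can be realized as $d$ successive mode products; the following NumPy sketch (the helper names `mode_multiply` and `multilinear_multiply` are mine) builds a Tucker-form tensor from a small core.

```python
import numpy as np

def mode_multiply(A, U, k):
    """Mode-k product: multiply the matrix U into mode k of tensor A."""
    Ak = np.moveaxis(A, k, 0)                 # bring mode k to the front
    out = np.tensordot(U, Ak, axes=(1, 0))    # contract U with mode k
    return np.moveaxis(out, 0, k)             # restore the axis order

def multilinear_multiply(A, Us):
    """(U_1, ..., U_d) . A: apply U_k to mode k for every k."""
    for k, U in enumerate(Us):
        A = mode_multiply(A, U, k)
    return A

rng = np.random.default_rng(1)
S = rng.standard_normal((2, 3, 2))            # small core tensor
Us = [rng.standard_normal((5, 2)),
      rng.standard_normal((6, 3)),
      rng.standard_normal((4, 2))]
A = multilinear_multiply(S, Us)               # Tucker-form tensor
print(A.shape)  # (5, 6, 4)
```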
Algorithms for OT decompositions
Core idea: use your favorite method to compute a (reduced) basis $U_i$ for every factor $\mathbb{R}^{n_i}$, then project to obtain $\mathcal{S}$. Several approaches exist: quasi-optimal direct methods [Tucker (1966), De Lathauwer et al. (2000a), Vannieuwenhoven et al. (2012)], iterative refinement (HOOI) [Kroonenberg and de Leeuw (1980), De Lathauwer et al. (2000b)], optimization-based methods [Eldén and Savas (2009), Ishteva et al. (2009, 2011), Savas and Lim (2010)], iterative Krylov methods [Goreinov et al. (2012), Savas and Eldén (2013)], and randomized methods.
The tensor Krylov method of Eldén and Savas
To generate a dominant subspace for mode $i$, what about Golub and Kahan's bidiagonalization algorithm applied to the unfolding $A_{(i)}$?

$\beta_1^{(i)} \leftarrow 0$; $v_0^{(i)} \leftarrow 0$
for $k = 1, 2, \ldots$ do
  $\alpha_k^{(i)} v_k^{(i)} \leftarrow A_{(i)}^T u_k^{(i)} - \beta_k^{(i)} v_{k-1}^{(i)}$
  $\beta_{k+1}^{(i)} u_{k+1}^{(i)} \leftarrow A_{(i)} v_k^{(i)} - \alpha_k^{(i)} u_k^{(i)}$
end for

While feasible, this may be impractical because $v_k^{(i)}$ is of length $n_1 \cdots n_{i-1} n_{i+1} \cdots n_d$. (To the best of my knowledge, this has not been investigated. Note that only the last vector is required, so the memory requirements are quite modest.)
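For a concrete picture of the recurrence, here is a minimal matrix-level sketch of Golub-Kahan bidiagonalization. In the tensor setting, `A` would stand for the huge unfolding $A_{(i)}$, which is exactly why forming $v_k$ explicitly is costly; breakdowns ($\alpha$ or $\beta$ equal to zero) are not handled.

```python
import numpy as np

def golub_kahan(A, u1, steps):
    """A few steps of Golub-Kahan bidiagonalization of a matrix A.

    Returns matrices U and V with orthonormal columns (in exact
    arithmetic) built by the coupled two-term recurrences.
    """
    m, n = A.shape
    U = np.zeros((m, steps + 1))
    V = np.zeros((n, steps))
    U[:, 0] = u1 / np.linalg.norm(u1)
    beta = 0.0
    v_prev = np.zeros(n)
    for k in range(steps):
        v = A.T @ U[:, k] - beta * v_prev      # alpha_k v_k <- A^T u_k - beta_k v_{k-1}
        alpha = np.linalg.norm(v)
        V[:, k] = v / alpha
        u = A @ V[:, k] - alpha * U[:, k]      # beta_{k+1} u_{k+1} <- A v_k - alpha_k u_k
        beta = np.linalg.norm(u)
        U[:, k + 1] = u / beta
        v_prev = V[:, k]
    return U, V

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 6))
U, V = golub_kahan(A, rng.standard_normal(8), steps=3)
```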
The tensor Krylov method of Eldén and Savas
The key idea of Eldén and Savas (2013) was to interlace the computation of the bases $U_i$ and to require
$$v_k^{(i)} = a_{i,k}^{(1)} \otimes a_{i,k}^{(2)} \otimes \cdots \otimes a_{i,k}^{(i-1)} \otimes a_{i,k}^{(i+1)} \otimes \cdots \otimes a_{i,k}^{(d)},$$
which is constructed from $\{v_j^{(i)}\}_{j=1,\,i=1}^{k,\,d}$ (several options exist). Hence, the key step in the algorithm becomes
$$A_{(i)}\, (a_{i,k}^{(1)} \otimes a_{i,k}^{(2)} \otimes \cdots \otimes a_{i,k}^{(i-1)} \otimes a_{i,k}^{(i+1)} \otimes \cdots \otimes a_{i,k}^{(d)}).$$
Multiple-vector tensor-vector product
It is known that
$$v = A_{(k)}\, (v_1 \otimes v_2 \otimes \cdots \otimes v_{k-1} \otimes v_{k+1} \otimes \cdots \otimes v_d)$$
is equivalent to the multilinear multiplication
$$v = (v_1^T, v_2^T, \ldots, v_{k-1}^T, I, v_{k+1}^T, \ldots, v_d^T) \cdot \mathcal{A} =: \Big( \prod_{j=1,\, j \ne k}^{d} v_j^T \Big) \cdot \mathcal{A}.$$
We call this a mode-$k$ tensor-vector product ($k$-TVP).
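The stated equivalence is easy to check numerically; the sketch below compares the unfolding-plus-Kronecker route with direct contractions. The column ordering of the unfolding, and hence the order of the Kronecker factors, are conventions assumed here.

```python
import numpy as np

def unfold(A, k):
    """Mode-k unfolding (mode k in front, remaining modes in row-major order)."""
    return np.moveaxis(A, k, 0).reshape(A.shape[k], -1)

def tvp(A, vecs, k):
    """Mode-k tensor-vector product: contract every mode except k."""
    out = A
    # Contract the modes in decreasing order so earlier axis numbers stay valid.
    for j in reversed(range(A.ndim)):
        if j != k:
            out = np.tensordot(out, vecs[j], axes=(j, 0))
    return out

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 5, 6))
vs = [rng.standard_normal(n) for n in A.shape]
k = 1
v_contract = tvp(A, vs, k)
v_unfold = unfold(A, k) @ np.kron(vs[0], vs[2])  # A_(k) (v_1 kron v_3)
print(np.allclose(v_contract, v_unfold))         # True
```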
M1: The traditional approach
The traditional way of computing a $k$-TVP is
$$v = A_{(k)}\, (v_1 \otimes v_2 \otimes \cdots \otimes v_{k-1} \otimes v_{k+1} \otimes \cdots \otimes v_d),$$
which requires $\prod_{j \ne k} n_j$ values of temporary memory. In general, the multiple-vector case is
$$V = A_{(k)}\, (V_1 \odot V_2 \odot \cdots \odot V_{k-1} \odot V_{k+1} \odot \cdots \odot V_d), \qquad V_i \in \mathbb{R}^{n_i \times r},$$
which requires $r \prod_{j \ne k} n_j$ values of temporary memory.
M1: The traditional approach
The major problem for small $r$ is the cost of unfolding.
[Bar chart: normalized execution times over all modes, relative to the ideal, for tensors of sizes 200x150x15x70, 200x50x32x100, 100x70x60x76, 90x62x62x90, 90x80x60x75, 85x75x60x83, 80x75x70x76, and 75x75x75x75.]
(r = 2 in these experiments)
M3: Successive contractions
[Figures: the product is computed one mode at a time; first $A_{(3)}^T z$ is formed, then $A_{(2)}^T y$, until a single vector remains.]
M3: Successive contractions
Formally, to compute $v_1 \leftarrow (I, v_2, \ldots, v_d)^T \cdot \mathcal{A}$, we let $\mathcal{A}^0 := \mathcal{A}$ and compute
$$(\mathcal{A}^1_{(d)})^T \leftarrow (\mathcal{A}^0_{(d)})^T v_d; \quad (\mathcal{A}^2_{(d-1)})^T \leftarrow (\mathcal{A}^1_{(d-1)})^T v_{d-1}; \quad \ldots; \quad v_1^T \leftarrow (\mathcal{A}^{d-2}_{(2)})^T v_2.$$
This is called a Right-To-Left (RTL) contraction. One can show that explicit unfoldings are unnecessary!
M3: Successive contractions
One can prove an analogous result for $v_d \leftarrow (v_1, v_2, \ldots, v_{d-1}, I)^T \cdot \mathcal{A}$, yielding a Left-To-Right (LTR) contraction. Consequently, no unfoldings are necessary if
$$v_k \leftarrow \Big( \prod_{j=1,\, j \ne k}^{d} v_j^T \Big) \cdot \mathcal{A} = \Big( \prod_{j=k+1}^{d} v_j^T \Big) \cdot \Big( \Big( \prod_{j=1}^{k-1} v_j^T \Big) \cdot \mathcal{A} \Big)$$
is computed by an LTR followed by an RTL contraction.
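A contraction-based $k$-TVP can be sketched in NumPy as follows; since the per-mode contractions commute, the code happens to do the RTL sweep before the LTR sweep, which yields the same result. No unfolding of `A` is ever formed.

```python
import numpy as np

def tvp_contractions(A, vecs, k):
    """Mode-k TVP via successive contractions; no explicit unfolding."""
    out = A
    # RTL sweep: contract modes d, d-1, ..., k+1 (always the last axis).
    for j in range(A.ndim - 1, k, -1):
        out = np.tensordot(out, vecs[j], axes=(out.ndim - 1, 0))
    # LTR sweep: contract modes 1, ..., k-1 (always the first axis).
    for j in range(k):
        out = np.tensordot(vecs[j], out, axes=(0, 0))
    return out

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 4, 5, 6))
vs = [rng.standard_normal(n) for n in A.shape]
v = tvp_contractions(A, vs, 2)
print(v.shape)  # (5,)
```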
M3: Successive contractions
The multiple-vector case is handled by computing $\mathcal{B}^1_{(1)} \leftarrow V_1^T A_{(1)}$ and then processing its rows using tensor-vector products.
M3: Successive contractions
[Bar chart: normalized execution times over all modes for M3 vs. M1, on tensors of sizes 200x50x32x100 and 75x75x75x75.]
(r = 2 in these experiments)
M3: Successive contractions
[Bar charts: normalized execution times over all modes, and additional memory consumption in MB (up to roughly 2,500), for M3 vs. M1 on tensors of sizes 200x50x32x100 and 75x75x75x75.]
(r = 300 in these experiments)
M3B: Successive contractions with blocking
To reduce memory consumption, subdivide the tensor and compute $k$-TVPs with the subtensors, accumulating the partial results. Nearly as easy as it sounds, though some care must be taken.
M3B: Successive contractions with blocking
Some heuristics:
- If possible, permute the modes once, prior to all computations, so that $n_1 n_d \ge n_2 n_{d-1} \ge \cdots$
- Slicing on modes 1 and 2 appears to be the most effective blocking strategy (!)
See our upcoming report for the motivation.
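The blocking idea can be sketched for a third-order tensor by slicing mode 1 and accumulating partial products; this is an illustration under simplifying assumptions (a single sliced mode, an einsum-based inner kernel), not the M3B implementation from the talk.

```python
import numpy as np

def tvp3(A, v1, v2):
    """Mode-3 TVP of a third-order tensor: contract modes 1 and 2."""
    return np.einsum('ijk,i,j->k', A, v1, v2)

def tvp3_blocked(A, v1, v2, block):
    """Same product, but slicing mode 1 into blocks and accumulating
    the partial results; the working set per step is one subtensor."""
    n1 = A.shape[0]
    out = np.zeros(A.shape[2])
    for start in range(0, n1, block):
        stop = min(start + block, n1)
        out += tvp3(A[start:stop], v1[start:stop], v2)
    return out

rng = np.random.default_rng(5)
A = rng.standard_normal((10, 7, 5))
v1 = rng.standard_normal(10)
v2 = rng.standard_normal(7)
print(np.allclose(tvp3(A, v1, v2), tvp3_blocked(A, v1, v2, block=4)))  # True
```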
317 x 317 x 317 tensor with r = 100 vectors
[Bar charts: execution time in seconds (up to roughly 250) and additional memory consumption in MB (up to roughly 350) for Ref M3B, Ref M3, and Ref M1.]
350 x 325 x 281 tensor with r = 300 vectors
[Bar charts: execution time in seconds (up to roughly 600) and additional memory consumption in MB (up to roughly 500) for Ref M3B, Ref M3, and Ref M1.]
Conclusions
Unfoldings may be very useful for theoretical developments, but practical algorithms will benefit from combining blocking with successive contractions.
Thank you for your attention!
References
E. Acar, D.M. Dunlavy, and T.G. Kolda, A scalable optimization approach for fitting canonical tensor decompositions, J. Chemometrics 25(2), 2011.
J. Carroll and J.-J. Chang, Analysis of individual differences in multidimensional scaling via an n-way generalization of Eckart-Young decomposition, Psychometrika 35(3), 1970.
L. De Lathauwer, B. De Moor, and J. Vandewalle, A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl. 21(4), 2000a.
L. De Lathauwer, B. De Moor, and J. Vandewalle, On the best rank-1 and rank-(R_1, R_2, ..., R_N) approximation of higher-order tensors, SIAM J. Matrix Anal. Appl. 21(4), 2000b.
L. Eldén and B. Savas, A Newton-Grassmann method for computing the best multilinear rank-(r_1, r_2, r_3) approximation of a tensor, SIAM J. Matrix Anal. Appl. 31(2), 2009.
S.A. Goreinov, I.V. Oseledets, and D.V. Savostyanov, Wedderburn rank reduction and Krylov subspace methods for tensor approximation. Part 1: Tucker case, SIAM J. Sci. Comput. 34(1), 2012.
R.A. Harshman, Foundations of the PARAFAC procedure: Models and conditions for an explanatory multi-modal factor analysis, UCLA Working Papers in Phonetics 16, 1970.
C. Hayashi and F. Hayashi, A new algorithm to solve PARAFAC-model, Behaviormetrika 11, 1982.
F.L. Hitchcock, The expression of a tensor or a polyadic as a sum of products, J. Math. Phys. 6(1), 1927.
M.A. Hossain, Z. Huque, R.R. Kammalapati, and S. Khan, Numerical investigation of compressible flow over NREL Phase VI airfoil, IJERT 2(2), 2013.
M. Ishteva, L. De Lathauwer, P.-A. Absil, and S. Van Huffel, Differential-geometric Newton method for the best rank-(R_1, R_2, R_3) approximation of tensors, Numer. Algorithms 51, 2009.
M. Ishteva, P.-A. Absil, S. Van Huffel, and L. De Lathauwer, Best low multilinear rank approximation of higher-order tensors, based on the Riemannian trust-region scheme, SIAM J. Matrix Anal. Appl. 32, 2011.
P. Kroonenberg and J. de Leeuw, Principal component analysis of three-mode data by means of alternating least squares algorithms, Psychometrika 45(1), 1980.
P. Paatero, A weighted non-negative least squares algorithm for three-way PARAFAC factor analysis, Chemometrics and Intelligent Laboratory Systems 38, 1997.
B. Savas and L. Eldén, Krylov-type methods for tensor computations I, Linear Algebra Appl. 438(2), 2013.
B. Savas and L.-H. Lim, Quasi-Newton methods on Grassmannians and multilinear approximations of tensors, SIAM J. Sci. Comput. 32, 2010.
L. Sorber, M. Van Barel, and L. De Lathauwer, Optimization-based algorithms for tensor decompositions: canonical polyadic decomposition, decomposition in rank-(L_r, L_r, 1) terms, and a new generalization, SIAM J. Optim. 23(2), 2013.
G. Tomasi and R. Bro, PARAFAC and missing values, Chemometrics and Intelligent Laboratory Systems 75, 2005.
L.R. Tucker, Some mathematical notes on three-mode factor analysis, Psychometrika 31(3), 1966.
N. Vannieuwenhoven, R. Vandebril, and K. Meerbergen, A new truncation strategy for the higher-order singular value decomposition, SIAM J. Sci. Comput. 34(2), 2012.
N. Vannieuwenhoven, N. Vanbaelen, K. Meerbergen, and R. Vandebril, The dense tensor-vector product: an initial study, 2013 (in preparation).