High Performance Parallel Tucker Decomposition of Sparse Tensors
1 High Performance Parallel Tucker Decomposition of Sparse Tensors
Oguz Kaya, INRIA and LIP, ENS Lyon, France
SIAM PP 16, April 14, 2016, Paris, France
Joint work with: Bora Uçar, CNRS and LIP, ENS Lyon, France
1/16 Parallel Sparse Tucker Decompositions
2 Tucker Tensor Decomposition
[Figure: a third-order I x J x K tensor X approximated by a core tensor G multiplied in each mode by factor matrices A (I x R1), B (J x R2), and C (K x R3).]
Tucker decomposition provides a rank-(R1, ..., RN) approximation of a tensor.
It consists of a core tensor G ∈ R^(R1 x ... x RN) and N factor matrices having R1, ..., RN columns.
We are interested in the case where X is big, sparse, and of low rank.
Examples: Google web queries, Netflix movie ratings, Amazon product reviews, etc.
4 Tucker Tensor Decomposition
Related work:
Matlab Tensor Toolbox by Kolda et al.
Efficient and scalable computations with sparse tensors (Baskaran et al., 2012)
Parallel tensor compression for large-scale scientific data (Austin et al., 2015)
HaTen2: Billion-scale tensor decompositions (Jeon et al., 2015)
Applications (in data mining):
CubeSVD: A novel approach to personalized web search (Sun et al., 2005)
Tag recommendations based on tensor dimensionality reduction (Symeonidis et al., 2008)
Extended feature combination model for recommendations in location-based mobile services (Sattari et al., 2015)
Goal: compute a sparse Tucker decomposition in parallel (shared/distributed memory).
7 Higher Order Orthogonal Iteration (HOOI)
Algorithm: HOOI for 3rd-order tensors
repeat
  1  Â ← [X ×2 B ×3 C](1)
  2  A ← TRSVD(Â, R1)   // R1 leading left singular vectors
  3  B̂ ← [X ×1 A ×3 C](2)
  4  B ← TRSVD(B̂, R2)
  5  Ĉ ← [X ×1 A ×2 B](3)
  6  C ← TRSVD(Ĉ, R3)
until no more improvement or maximum iterations reached
  7  G ← X ×1 A ×2 B ×3 C
  8  return [G; A, B, C]
We discuss the case where R1 = R2 = ... = RN = R and N = 3.
A ∈ R^(I x R), B ∈ R^(J x R), and C ∈ R^(K x R) are dense.
Â ← [X ×2 B ×3 C](1) ∈ R^(I x R²) is called a tensor-times-matrix multiply (TTM).
Â ∈ R^(I x R^(N-1)), B̂ ∈ R^(J x R^(N-1)), and Ĉ ∈ R^(K x R^(N-1)) are dense (R² columns for N = 3).
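The alternating structure of HOOI can be sketched serially for a small dense tensor. This is only an illustration of the algorithm above, not the sparse parallel implementation the talk develops; all function names are ours, and the random initialization and einsum-based contractions are our choices.

```python
import numpy as np

def unfold(T, mode):
    # Mode-n unfolding: bring `mode` axis first, flatten the rest.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def trsvd(M, r):
    # r leading left singular vectors of M (truncated SVD).
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :r]

def hooi(X, R, iters=10):
    """Dense HOOI sketch for a 3rd-order tensor X, rank (R, R, R)."""
    rng = np.random.default_rng(0)
    # Random orthonormal starting factors (an HOSVD init would also work).
    A = np.linalg.qr(rng.standard_normal((X.shape[0], R)))[0]
    B = np.linalg.qr(rng.standard_normal((X.shape[1], R)))[0]
    C = np.linalg.qr(rng.standard_normal((X.shape[2], R)))[0]
    for _ in range(iters):
        # Each factor update: TTM in the other two modes, then TRSVD.
        A = trsvd(unfold(np.einsum('ijk,jq,kr->iqr', X, B, C), 0), R)
        B = trsvd(unfold(np.einsum('ijk,ip,kr->pjr', X, A, C), 1), R)
        C = trsvd(unfold(np.einsum('ijk,ip,jq->pqk', X, A, B), 2), R)
    # Core tensor: project X onto the factor column spaces.
    G = np.einsum('ijk,ip,jq,kr->pqr', X, A, B, C)
    return G, A, B, C
```

On a tensor of exact multilinear rank (R, R, R), one sweep already recovers the factor column spaces, so the reconstruction error drops to machine precision.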
12 Tensor-Times-Matrix Multiply (TTM)
Â ← [X ×2 B ×3 C](1), Â ∈ R^(I x R²).
B(j,:) ⊗ C(k,:) ∈ R^(R²) is a Kronecker product of rows.
For each nonzero x_{i,j,k}, Â(i,:) receives the update x_{i,j,k} [B(j,:) ⊗ C(k,:)].
Algorithm: Â ← [X ×2 B ×3 C](1)
1  Â ← zeros(I, R²)
foreach x_{i,j,k} ∈ X do
2    Â(i,:) ← Â(i,:) + x_{i,j,k} [B(j,:) ⊗ C(k,:)]
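The per-nonzero update above takes only a few lines of Python; numpy's kron plays the role of the row Kronecker product. The coordinate-list representation and function name are ours, chosen for the sketch.

```python
import numpy as np

def ttm_23(nonzeros, B, C, I):
    """Sketch of A_hat <- [X x_2 B x_3 C]_(1) for a sparse 3rd-order X.

    nonzeros: iterable of (i, j, k, value) entries of X
    B: J x R factor matrix, C: K x R factor matrix
    Returns the dense I x R^2 matrix A_hat.
    """
    R = B.shape[1]
    A_hat = np.zeros((I, R * R))
    for i, j, k, v in nonzeros:
        # Each nonzero contributes v * (B(j,:) ⊗ C(k,:)) to row i.
        A_hat[i, :] += v * np.kron(B[j, :], C[k, :])
    return A_hat
```

The cost is O(nnz(X) * R²) and each nonzero touches exactly one row of Â, which is what makes the per-nonzero update a natural fine-grain task later in the talk.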
13 Outline 1 Introduction 2 Parallel HOOI 3 Results 4 Conclusion
14 (Bad) Fine-Grain Parallel TTM within Tucker-ALS
A and Â are distributed rowwise: process p owns and computes A(Ip,:) and Â(Ip,:).
Tensor nonzeros are partitioned (arbitrarily): process p owns the subset of nonzeros Xp.
Performing x_{i,j,k} [B(j,:) ⊗ C(k,:)] and generating a partial result for Â(i,:) is a fine-grain task.
We use a post-communication scheme at each iteration:
at the beginning, rows of A, B, and C are available;
at the end, only A is updated and communicated.
Partial row results of Â are sent/received (fold); rows of A(Ip,:) are sent/received (expand).
Algorithm: Computing A in fine-grain HOOI at process p
foreach x_{i,j,k} ∈ Xp do
1    Â(i,:) ← Â(i,:) + x_{i,j,k} [B(j,:) ⊗ C(k,:)]
2  Send/receive and sum up partial rows of Â
3  A(Ip,:) ← TRSVD(Â, R)
4  Send/receive rows of A
15 (Bad) Fine-Grain Parallel TTM within Tucker-ALS
The numbers of rows sent/received in fold and expand are equal, but:
each communication unit of expand has size R,
while each communication unit of fold has size R^(N-1).
We therefore want to avoid assembling Â in the fold communication, yet we still need to compute TRSVD(Â, R).
17 Computing TRSVD
[Figure: Â = Â1 + Â2 + Â3, the sum of the unassembled partial results owned by the processes.]
Gram matrix ÂÂᵀ? Iterative solvers?
Either way, we need to perform Âx and Âᵀx efficiently.
20 Computing y ← Âx
[Figure: y ← Âx computed as y1 + y2 + y3, where each process applies its unassembled partial result Âp to x locally.]
For each unit of communication, we perform extra work in MxV.
Instead of communicating R^(N-1) entries, we communicate 1 (per SVD iteration)!
y ← Âᵀx works in reverse with the same communication cost.
The row distribution of y and of the left singular vectors is the same as Â's, so A gets the same row distribution as Â.
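The point of the two slides above is that Â = Σp Âp never needs to be assembled: an iterative SVD solver only requires Âx and Âᵀx callbacks. A serial stand-in for the distributed SLEPc solve can be sketched with scipy's LinearOperator; the function name, the (rows, block) representation of a partial result, and the two-piece splitting are our illustrative choices.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, svds

def trsvd_from_pieces(pieces, R):
    """Leading R left singular vectors of A_hat = sum of partial results,
    without ever assembling A_hat (mirroring the fold-avoiding scheme).

    pieces: list of (rows, block) pairs, where `block` holds the partial
    rows of A_hat produced by one process and `rows` their row indices.
    """
    m = 1 + max(int(r) for rows, _ in pieces for r in rows)
    n = pieces[0][1].shape[1]

    def matvec(x):
        x = np.asarray(x).ravel()
        y = np.zeros(m)
        for rows, block in pieces:      # y += A_hat_p @ x, one piece at a time
            y[rows] += block @ x
        return y

    def rmatvec(y):
        y = np.asarray(y).ravel()
        x = np.zeros(n)
        for rows, block in pieces:      # x += A_hat_p^T @ y
            x += block.T @ y[rows]
        return x

    A = LinearOperator((m, n), matvec=matvec, rmatvec=rmatvec)
    U, s, _ = svds(A, k=R)
    return U[:, np.argsort(-s)]         # columns by decreasing singular value
```

In the distributed setting each matvec additionally requires the one-entry-per-row exchange described above; here the summation over pieces plays that role in shared memory.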
25 (Good) Fine-Grain Parallel TTM within Tucker-ALS
Algorithm: Computing A in fine-grain HOOI at process p
foreach x_{i,j,k} ∈ Xp do
1    Â(i,:) ← Â(i,:) + x_{i,j,k} [B(j,:) ⊗ C(k,:)]
2  A(Ip,:) ← TRSVD(Â, R, MxV(...), MTxV(...))
3  Send/receive rows of A
26 Hypergraph Model for Parallel HOOI
Multi-constraint hypergraph partitioning: we balance computation and memory costs.
By minimizing the cutsize of the hypergraph, we minimize:
the total communication volume of MxV/MTxV,
the total extra MxV/MTxV work, and
the total volume of communication for TTM.
Ideally, we should minimize the maximum, not the total.
[Figure: hypergraph for X = {(1,2,3), (2,3,1), (3,1,2)}; each nonzero x_{i,j,k} is a vertex connected to the nets a_i, b_j, and c_k of the factor-matrix rows it touches.]
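The fine-grain model in the figure can be built directly from the nonzero coordinates: one vertex per nonzero, one net per factor-matrix row. The sketch below only constructs the structure (a real run would hand it to a multi-constraint partitioner such as PaToH, which is not invoked here); the dict-of-sets representation is our choice.

```python
from collections import defaultdict

def fine_grain_hypergraph(nonzeros):
    """One vertex per nonzero of X, one net per row of A, B, and C.

    Net ('a', i) connects every nonzero whose mode-1 index is i,
    and similarly ('b', j) and ('c', k) for modes 2 and 3.
    Returns a dict mapping each net to its set of vertex ids.
    """
    nets = defaultdict(set)
    for v, (i, j, k) in enumerate(nonzeros):
        nets[('a', i)].add(v)   # row i of A
        nets[('b', j)].add(v)   # row j of B
        nets[('c', k)].add(v)   # row k of C
    return dict(nets)

# The slide's toy tensor: X = {(1,2,3), (2,3,1), (3,1,2)}.
H = fine_grain_hypergraph([(1, 2, 3), (2, 3, 1), (3, 1, 2)])
```

A net spanning several parts then corresponds to a row whose partial results must be communicated, which is why cutsize tracks communication volume.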
27 Outline 1 Introduction 2 Parallel HOOI 3 Results 4 Conclusion
28 Experimental Setup
HyperTensor:
Hybrid OpenMP/MPI code in C++.
Depends on BLAS, LAPACK, and the C++11 STL.
SLEPc/PETSc for distributed-memory TRSVD computations.
IBM BlueGene/Q machine:
16 GB memory and 16 cores (at 1.6 GHz) per node.
Experiments using up to 4096 cores (256 nodes).
Ri is set to 5 for the 4-dimensional tensors and 10 for the 3-dimensional ones.
Tensor sizes:
Tensor    | I1   | I2   | I3  | I4   | #nonzeros
Netflix   | 480K | 17K  | 2K  | -    | 100M
NELL      | 3.2M |      | K   | -    | 78M
Delicious | 1K   | 530K | 17M | 2.4M | 140M
Flickr    |      | K    | 28M | 1.6M | 112M
29 Results - Flickr/Delicious
Per-iteration runtime of the parallel HOOI (in seconds).
[Table: runtimes per #nodes/#cores for fine-hp, fine-rd, coarse-hp, and coarse-bl on Delicious and Flickr.]
The coarse-grain kernel is slow due to load imbalance and communication.
On Delicious, fine-hp is 1.8x/5.4x/6.4x faster than fine-rd/coarse-hp/coarse-bl.
On Flickr, fine-hp is 1.5x/3.7x/4.3x faster than fine-rd/coarse-hp/coarse-bl.
All instances achieve scalability to 4096 cores.
30 Results - NELL/Netflix
Per-iteration runtime of the parallel HOOI (in seconds).
[Table: runtimes per #nodes/#cores for fine-hp, fine-rd, coarse-hp, and coarse-bl on NELL and Netflix.]
On Netflix, fine-hp is 2.8x/5x/5.5x faster than fine-rd/coarse-hp/coarse-bl.
On NELL, fine-rd is faster than fine-hp (fine-hp achieves 5x less total communication but incurs 2x more maximum communication).
31 Outline 1 Introduction 2 Parallel HOOI 3 Results 4 Conclusion
32 Conclusion
We provide:
the first high-performance shared/distributed-memory parallel algorithm and implementation for the Tucker decomposition of sparse tensors, and
hypergraph partitioning models of these computations for better scalability.
We achieve scalability up to 4096 cores even with random partitioning.
We enable Tucker-based tensor analysis of very big sparse data.
35 References
[1] O. Kaya and B. Uçar, High-performance parallel algorithms for the Tucker decomposition of higher order sparse tensors, Tech. Rep. RR-8801, Inria, Grenoble Rhône-Alpes, Oct. 2015.
[2] W. Austin, G. Ballard, and T. G. Kolda, Parallel tensor compression for large-scale scientific data, tech. rep., arXiv, 2015.
[3] I. Jeon, E. E. Papalexakis, U. Kang, and C. Faloutsos, HaTen2: Billion-scale tensor decompositions, in IEEE 31st International Conference on Data Engineering (ICDE), 2015.
[4] M. Baskaran, B. Meister, N. Vasilache, and R. Lethin, Efficient and scalable computations with sparse tensors, in IEEE Conference on High Performance Extreme Computing (HPEC), pp. 1-6, Sept. 2012.
[5] J. Sun, H. Zeng, H. Liu, Y. Lu, and Z. Chen, CubeSVD: A novel approach to personalized web search, in Proceedings of the 14th International Conference on World Wide Web (WWW '05), New York, NY, USA, ACM, 2005.
36 Contact
perso.ens-lyon.fr/bora.ucar
More informationTENSOR APPROXIMATION TOOLS FREE OF THE CURSE OF DIMENSIONALITY
TENSOR APPROXIMATION TOOLS FREE OF THE CURSE OF DIMENSIONALITY Eugene Tyrtyshnikov Institute of Numerical Mathematics Russian Academy of Sciences (joint work with Ivan Oseledets) WHAT ARE TENSORS? Tensors
More informationA Randomized Approach for Crowdsourcing in the Presence of Multiple Views
A Randomized Approach for Crowdsourcing in the Presence of Multiple Views Presenter: Yao Zhou joint work with: Jingrui He - 1 - Roadmap Motivation Proposed framework: M2VW Experimental results Conclusion
More informationLarge-scale Matrix Factorization. Kijung Shin Ph.D. Student, CSD
Large-scale Matrix Factorization Kijung Shin Ph.D. Student, CSD Roadmap Matrix Factorization (review) Algorithms Distributed SGD: DSGD Alternating Least Square: ALS Cyclic Coordinate Descent: CCD++ Experiments
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Numerical Linear Algebra Background Cho-Jui Hsieh UC Davis May 15, 2018 Linear Algebra Background Vectors A vector has a direction and a magnitude
More informationarxiv: v1 [cs.lg] 18 Nov 2018
THE CORE CONSISTENCY OF A COMPRESSED TENSOR Georgios Tsitsikas, Evangelos E. Papalexakis Dept. of Computer Science and Engineering University of California, Riverside arxiv:1811.7428v1 [cs.lg] 18 Nov 18
More informationProbabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms
Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms Adrien Todeschini Inria Bordeaux JdS 2014, Rennes Aug. 2014 Joint work with François Caron (Univ. Oxford), Marie
More informationUsing SVD to Recommend Movies
Michael Percy University of California, Santa Cruz Last update: December 12, 2009 Last update: December 12, 2009 1 / Outline 1 Introduction 2 Singular Value Decomposition 3 Experiments 4 Conclusion Last
More informationParallel Numerical Algorithms
Parallel Numerical Algorithms Chapter 6 Structured and Low Rank Matrices Section 6.3 Numerical Optimization Michael T. Heath and Edgar Solomonik Department of Computer Science University of Illinois at
More informationThe Singular Value Decomposition (SVD) and Principal Component Analysis (PCA)
Chapter 5 The Singular Value Decomposition (SVD) and Principal Component Analysis (PCA) 5.1 Basics of SVD 5.1.1 Review of Key Concepts We review some key definitions and results about matrices that will
More informationLU factorization with Panel Rank Revealing Pivoting and its Communication Avoiding version
1 LU factorization with Panel Rank Revealing Pivoting and its Communication Avoiding version Amal Khabou Advisor: Laura Grigori Université Paris Sud 11, INRIA Saclay France SIAMPP12 February 17, 2012 2
More informationMatrix Decomposition and Latent Semantic Indexing (LSI) Introduction to Information Retrieval INF 141/ CS 121 Donald J. Patterson
Matrix Decomposition and Latent Semantic Indexing (LSI) Introduction to Information Retrieval INF 141/ CS 121 Donald J. Patterson Latent Semantic Indexing Outline Introduction Linear Algebra Refresher
More informationDFacTo: Distributed Factorization of Tensors
DFacTo: Distributed Factorization of Tensors Joon Hee Choi Electrical and Computer Engineering Purdue University West Lafayette IN 47907 choi240@purdue.edu S. V. N. Vishwanathan Statistics and Computer
More informationEfficient CP-ALS and Reconstruction From CP
Efficient CP-ALS and Reconstruction From CP Jed A. Duersch & Tamara G. Kolda Sandia National Laboratories Livermore, CA Sandia National Laboratories is a multimission laboratory managed and operated by
More informationTechnical Report TR SPLATT: Efficient and Parallel Sparse Tensor-Matrix Multiplication
Technical Report Department of Computer Science and Engineering University of Minnesota 4-192 Keller Hall 200 Union Street SE Minneapolis, MN 55455-0159 USA TR 15-008 SPLATT: Efficient and Parallel Sparse
More informationTransposition Mechanism for Sparse Matrices on Vector Processors
Transposition Mechanism for Sparse Matrices on Vector Processors Pyrrhos Stathis Stamatis Vassiliadis Sorin Cotofana Electrical Engineering Department, Delft University of Technology, Delft, The Netherlands
More informationComputing least squares condition numbers on hybrid multicore/gpu systems
Computing least squares condition numbers on hybrid multicore/gpu systems M. Baboulin and J. Dongarra and R. Lacroix Abstract This paper presents an efficient computation for least squares conditioning
More informationLearning goals: students learn to use the SVD to find good approximations to matrices and to compute the pseudoinverse.
Application of the SVD: Compression and Pseudoinverse Learning goals: students learn to use the SVD to find good approximations to matrices and to compute the pseudoinverse. Low rank approximation One
More informationA fast randomized algorithm for overdetermined linear least-squares regression
A fast randomized algorithm for overdetermined linear least-squares regression Vladimir Rokhlin and Mark Tygert Technical Report YALEU/DCS/TR-1403 April 28, 2008 Abstract We introduce a randomized algorithm
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Lecture 5: Numerical Linear Algebra Cho-Jui Hsieh UC Davis April 20, 2017 Linear Algebra Background Vectors A vector has a direction and a magnitude
More informationSparse LU Factorization on GPUs for Accelerating SPICE Simulation
Nano-scale Integrated Circuit and System (NICS) Laboratory Sparse LU Factorization on GPUs for Accelerating SPICE Simulation Xiaoming Chen PhD Candidate Department of Electronic Engineering Tsinghua University,
More informationLU Factorization with Panel Rank Revealing Pivoting and Its Communication Avoiding Version
LU Factorization with Panel Rank Revealing Pivoting and Its Communication Avoiding Version Amal Khabou James Demmel Laura Grigori Ming Gu Electrical Engineering and Computer Sciences University of California
More informationAMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences)
AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) Lecture 19: Computing the SVD; Sparse Linear Systems Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical
More informationUSING SINGULAR VALUE DECOMPOSITION (SVD) AS A SOLUTION FOR SEARCH RESULT CLUSTERING
POZNAN UNIVE RSIY OF E CHNOLOGY ACADE MIC JOURNALS No. 80 Electrical Engineering 2014 Hussam D. ABDULLA* Abdella S. ABDELRAHMAN* Vaclav SNASEL* USING SINGULAR VALUE DECOMPOSIION (SVD) AS A SOLUION FOR
More informationLinear Algebra (Review) Volker Tresp 2017
Linear Algebra (Review) Volker Tresp 2017 1 Vectors k is a scalar (a number) c is a column vector. Thus in two dimensions, c = ( c1 c 2 ) (Advanced: More precisely, a vector is defined in a vector space.
More informationAN INDEPENDENT LOOPS SEARCH ALGORITHM FOR SOLVING INDUCTIVE PEEC LARGE PROBLEMS
Progress In Electromagnetics Research M, Vol. 23, 53 63, 2012 AN INDEPENDENT LOOPS SEARCH ALGORITHM FOR SOLVING INDUCTIVE PEEC LARGE PROBLEMS T.-S. Nguyen *, J.-M. Guichon, O. Chadebec, G. Meunier, and
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Oct 27, 2015 Outline One versus all/one versus one Ranking loss for multiclass/multilabel classification Scaling to millions of labels Multiclass
More informationAn Optimized Interestingness Hotspot Discovery Framework for Large Gridded Spatio-temporal Datasets
IEEE Big Data 2015 Big Data in Geosciences Workshop An Optimized Interestingness Hotspot Discovery Framework for Large Gridded Spatio-temporal Datasets Fatih Akdag and Christoph F. Eick Department of Computer
More informationSparse Solutions of Linear Systems of Equations and Sparse Modeling of Signals and Images: Final Presentation
Sparse Solutions of Linear Systems of Equations and Sparse Modeling of Signals and Images: Final Presentation Alfredo Nava-Tudela John J. Benedetto, advisor 5/10/11 AMSC 663/664 1 Problem Let A be an n
More informationA Note on Google s PageRank
A Note on Google s PageRank According to Google, google-search on a given topic results in a listing of most relevant web pages related to the topic. Google ranks the importance of webpages according to
More informationarxiv: v1 [cs.lg] 26 Jul 2017
Updating Singular Value Decomposition for Rank One Matrix Perturbation Ratnik Gandhi, Amoli Rajgor School of Engineering & Applied Science, Ahmedabad University, Ahmedabad-380009, India arxiv:70708369v
More informationParallel Singular Value Decomposition. Jiaxing Tan
Parallel Singular Value Decomposition Jiaxing Tan Outline What is SVD? How to calculate SVD? How to parallelize SVD? Future Work What is SVD? Matrix Decomposition Eigen Decomposition A (non-zero) vector
More informationLarge Scale Tensor Decompositions: Algorithmic Developments and Applications
Large Scale Tensor Decompositions: Algorithmic Developments and Applications Evangelos Papalexakis, U Kang, Christos Faloutsos, Nicholas Sidiropoulos, Abhay Harpale Carnegie Mellon University, KAIST, University
More information