Scalable Tensor Factorizations with Incomplete Data
1 Scalable Tensor Factorizations with Incomplete Data Tamara G. Kolda & Daniel M. Dunlavy, Sandia National Labs; Evrim Acar, Information Technologies Institute, TUBITAK-UEKAE, Turkey; Morten Mørup, Technical University of Denmark. U.S. Department of Energy, Office of Advanced Scientific Computing Research. Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.
2 A Nice Way to Start the Day To: Kolda, Tamara G <tgkolda@sandia.gov> From: Engineer, Joe S <jsengin@sandia.gov> Subject: Help with a math problem July 15, 2010 T. G. Kolda - SIAM Annual 2
3 Matrix Factorization To: Engineer, Joe S <jsengin@sandia.gov> From: Kolda, Tamara G <tgkolda@sandia.gov> Subject: Re: Help with a math problem Dear Joe, No problem, thanks to the handy Singular Value Decomposition (SVD)! Tammy p.s. Would you prefer Nonnegative Matrix Factorization (NMF), Independent Component Analysis (ICA), etc.?
4 A Nicer Way to Start the Day To: Kolda, Tamara G <tgkolda@sandia.gov> From: Engineer, Joe S <jsengin@sandia.gov> Subject: Help with a math problem
5 Tensor Factorization To: Engineer, Joe S <jsengin@sandia.gov> From: Kolda, Tamara G <tgkolda@sandia.gov> Subject: Re: Help with a math problem Dear Joe, I factored your tensor using the CANDECOMP/PARAFAC (CP) tensor decomposition. Though determining tensor rank is NP-hard, I used heuristic methods to make a reasonable determination of the number of components. Tammy
6 A Better Way to Start the Day To: Kolda, Tamara G <tgkolda@sandia.gov> From: Engineer, Joe S <jsengin@sandia.gov> Subject: Help with a math problem Dear Tammy, Our measurement device had some snafus, and so we couldn't collect all the data. But we still need to factorize this (incomplete) matrix. Maybe we can just fill in the missing entries with zeros? -Joe
7 Matrix Factorizations with Missing Data To: Engineer, Joe S <jsengin@sandia.gov> From: Kolda, Tamara G <tgkolda@sandia.gov> Subject: Re: Help with a math problem Dear Joe, Per Ruhe (1974): "When the number of missing observations is small compared to the sample, it is common practice to replace the missing observations by means or extreme values, but when larger parts of the data matrix are empty this is no longer a sensible approach." However, many modern methods exist for factorizing matrices with missing data. A favorite reference is Buchanan & Fitzgibbon (CVPR '05). See also the work on matrix completion by Candès, Recht, and others. Most problems can be solved so long as there are enough known entries and they are distributed reasonably. Tammy
8 The Very Best Way to Start the Day To: Kolda, Tamara G <tgkolda@sandia.gov> From: Engineer, Joe S <jsengin@sandia.gov> Subject: Help with a math problem Dear Tammy, Our measurement device had some snafus, and so we couldn't collect all the data. But we still need to factorize this (incomplete) tensor. -Joe
9 Factor Example: Latent Semantic Indexing. Berry, Dumais, O'Brien (1995). [Figure: a term-by-document matrix (terms: car, service, error, military, repair; documents d1–d3) and its 2-factor approximation, illustrating gauge freedom.]
10 Factor Example: Epilepsy Data. Measurements are recorded at multiple sites (channels) over time. The data is transformed via a continuous wavelet transform (CWT). [Figure: channels × time samples mapped via CWT to a time × frequency × channel tensor; factors capture an eye artifact and a seizure.] Acar, Bingol, Bingol, Bro and Yener, Bioinformatics.
11 A Whole Host of Tensor Decompositions: CANDECOMP/PARAFAC (CP), Tucker, PARATUCK2 (DEDICOM3), Block CP, PARAFAC2. Kolda & Bader, Tensor Decompositions and Applications, SIAM Review, 2009.
12 Tensor Factorizations have Numerous Applications: modeling fluorescence excitation-emission data; signal processing; brain imaging (e.g., fMRI) data; web graph plus anchor term analysis; image compression and classification; texture analysis; text analysis (e.g., multi-way LSI); approximating Newton potentials, stochastic PDEs, etc. References: Sidiropoulos, Giannakis, and Bro, IEEE Trans. Signal Processing; Sun, Tao, and Faloutsos, KDD '06; ERPWAVELAB by Morten Mørup; Furukawa, Kawasaki, Ikeuchi, and Sakauchi, EGRW '02; Doostan, Iaccarino, and Etemadi, J. Computational Physics, 2009; Hazan, Polak, and Shashua, ICCV; Andersen and Bro, J. Chemometrics.
13 Matrix & Tensor Factor Analysis. The Singular Value Decomposition (SVD) or Principal Component Analysis (PCA) expresses a matrix as the sum of component rank-1 matrices. The CANDECOMP/PARAFAC Decomposition (CP) expresses a tensor as the sum of component rank-1 tensors. (Hitchcock '27, Harshman '70, Carroll & Chang '70)
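To make the analogy with the SVD concrete, here is a minimal NumPy sketch (sizes and names are illustrative, not from the talk) that builds a CP model [[A,B,C]] both as an explicit sum of R rank-one tensors and as a single contraction:

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 4, 3, 2, 2
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))

# [[A,B,C]] as an explicit sum of R rank-one (outer-product) tensors
X_sum = sum(np.einsum('i,j,k->ijk', A[:, r], B[:, r], C[:, r]) for r in range(R))

# The same tensor via a single contraction over the component index r
X = np.einsum('ir,jr,kr->ijk', A, B, C)
assert np.allclose(X, X_sum)
```

The two constructions agree entrywise, which is exactly the statement that CP expresses a tensor as a sum of component rank-1 tensors.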
14 Tensors have an Advantage over Matrices for Recovering Factors. Matrices: scale and sign ambiguity; permutation ambiguity; gauge freedom for any invertible matrix S. Tensors: scale and sign ambiguity; permutation ambiguity; otherwise the components are unique under mild conditions!
15 Kruskal's Uniqueness Condition. The k-rank of a matrix A, denoted k_A, is the maximum value of k such that any k columns are linearly independent. An R-component factorization [[A,B,C]] is essentially unique* if k_A + k_B + k_C >= 2R + 2. An R-component factorization [[A^(1), ..., A^(N)]] is essentially unique* if k_{A^(1)} + ... + k_{A^(N)} >= 2R + (N-1). Kruskal (1977, 1989), Sidiropoulos & Bro (2000). *Essentially unique means unique up to permutation and scaling: A = A' Π D_A, B = B' Π D_B, C = C' Π D_C, where Π is an R×R permutation matrix and the diagonal scaling matrices satisfy D_A D_B D_C = I.
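As a small illustration of the definition above, a brute-force k-rank can be computed by checking every subset of columns; this sketch (hypothetical helper, not from the talk) is only practical for tiny matrices:

```python
import numpy as np
from itertools import combinations

def k_rank(A, tol=1e-10):
    """Maximum k such that EVERY set of k columns of A is linearly independent
    (brute force over all column subsets -- illustration only)."""
    n_cols = A.shape[1]
    k = 0
    for size in range(1, n_cols + 1):
        if all(np.linalg.matrix_rank(A[:, list(cols)], tol=tol) == size
               for cols in combinations(range(n_cols), size)):
            k = size
        else:
            break
    return k

# Kruskal's condition for a 3-way, R-component CP: k_A + k_B + k_C >= 2R + 2
```

For example, the 2x3 matrix with columns (1,0), (0,1), (1,1) has k-rank 2: every pair of columns is independent, but three columns in R^2 cannot be.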
16 Motivation: EEG Data. Biomedical signal processing: EEG (electroencephalogram) signals can be recorded using electrodes placed on the scalp. The missing data problem occurs when electrodes get loose or disconnected, causing the signal to be unusable, or when different experiments have overlapping but not identical channels. [Figure: channel × time-frequency slices stacked across experiments into a channel × time-frequency × experiment tensor.] Can we still do this calculation if data are missing?
17 The Missing Data Problem. Standard Problem: Given tensor X, find A, B, and C such that X ≈ [[A,B,C]]. Typically formulated as a least squares problem. Missing Data Problem: Given a subset of the entries of X, find A, B, and C such that x_ijk ≈ [[A,B,C]]_ijk for the known entries.
18 Mathematical Formulation. Define the weight tensor W such that w_ijk = 1 if x_ijk is known and w_ijk = 0 if x_ijk is missing. Then the least squares problem is min_{A,B,C} || W .* (X − [[A,B,C]]) ||_F^2, where .* denotes the elementwise product (.* in MATLAB). With a solution, the tensor can be completed via X̂ = W .* X + (1 − W) .* [[A,B,C]].
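The weighted objective and the completion step can be sketched in a few lines of NumPy (illustrative names; not the Tensor Toolbox implementation):

```python
import numpy as np

def cp_full(A, B, C):
    # Dense tensor [[A,B,C]]_{ijk} = sum_r A[i,r] * B[j,r] * C[k,r]
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def wopt_objective(X, W, A, B, C):
    # f(A,B,C) = || W .* (X - [[A,B,C]]) ||_F^2 -- only known entries (W==1) count
    resid = W * (X - cp_full(A, B, C))
    return float(np.sum(resid ** 2))

def complete(X, W, A, B, C):
    # Keep the known entries of X; fill the missing ones from the model
    return W * X + (1 - W) * cp_full(A, B, C)

rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((n, 2)) for n in (4, 3, 2))
X = cp_full(A, B, C)                              # noise-free rank-2 tensor
W = (rng.random(X.shape) > 0.3).astype(float)     # ~70% of entries observed
```

With noise-free data generated from the model itself, the objective at the true factors is zero and the completion reproduces X exactly.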
19 Brain dynamics can be captured even with extensive missing channels (64 channels × 4392 time-frequency points). [Figure: recovery quality vs. number of missing channels, comparing replacing missing entries with zero against the more sensible approach.]
21 Brain dynamics can be captured even with extensive missing channels (64 channels × 4392 time-frequency points). [Figure: channel × time-frequency × experiment factors with no missing data vs. with 30 channels per experiment missing.]
22 Chemometrics Example. Bro (1997): fluorescence measurements of 5 samples containing 3 amino acids: tyrosine, tryptophan, phenylalanine. Each amino acid corresponds to a rank-one component. Tensor of size 5 × 51 × 201: 5 samples, 51 excitations, 201 emissions. [Figure: factors in the emission mode.]
23 No Missing Data [Movie]
24 No Missing Data (Iteration 7)
25 Replacing 50% Missing Values with Mean [Movie]
26 Replacing 50% Missing Values with Mean (Iteration 7)
27 50% Missing Data using Sensible Approach [Movie]
28 50% Missing Data using Sensible Approach (Iteration 7)
29 75% Missing Data using Sensible Approach [Movie]
30 75% Missing Data using Sensible Approach (Iteration 7)
31 Sensible Approach: Direct Optimization. The goal is to minimize the nonlinear objective f(A,B,C) = || W .* (X − [[A,B,C]]) ||_F^2, which fits only the known values. 2nd-order optimization: Gauss-Newton as implemented in INDAFAC; the size of the Jacobian is (1−M)·IJK [# equations] by R(I+J+K) [# variables], where M is the fraction of missing data (Tomasi & Bro). 1st-order optimization: nonlinear CG as implemented in CP-WOPT (available in the Tensor Toolbox); Acar, Dunlavy, Kolda, Mørup, 2010. Alternative to direct optimization: EM-ALS. Repeat: impute missing values from the model, then update the model. Implementation in parafac from the N-Way Toolbox (Bro et al.).
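The 1st-order route can be sketched with an off-the-shelf nonlinear CG solver (here SciPy's, as a stand-in for the talk's MATLAB CP-WOPT): stack the factor matrices into one vector and hand the objective and its gradient to the optimizer. All names and sizes below are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
I, J, K, R = 10, 8, 6, 2
A0, B0, C0 = (rng.standard_normal((n, R)) for n in (I, J, K))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)          # noise-free ground truth
W = (rng.random((I, J, K)) > 0.5).astype(float)     # ~50% of entries observed

def unpack(v):
    A = v[:I * R].reshape(I, R)
    B = v[I * R:(I + J) * R].reshape(J, R)
    C = v[(I + J) * R:].reshape(K, R)
    return A, B, C

def f_and_grad(v):
    # f = ||W .* ([[A,B,C]] - X)||_F^2 and its gradient (W is 0/1, so W*T = T)
    A, B, C = unpack(v)
    T = W * (np.einsum('ir,jr,kr->ijk', A, B, C) - X)
    f = float(np.sum(T ** 2))
    gA = 2 * np.einsum('ijk,jr,kr->ir', T, B, C)
    gB = 2 * np.einsum('ijk,ir,kr->jr', T, A, C)
    gC = 2 * np.einsum('ijk,ir,jr->kr', T, A, B)
    return f, np.concatenate([gA.ravel(), gB.ravel(), gC.ravel()])

v0 = 0.1 * rng.standard_normal(R * (I + J + K))     # random starting point
res = minimize(f_and_grad, v0, jac=True, method='CG')
```

Note the variable count R(I+J+K) matches the slide's Jacobian column dimension; the solver only ever sees the flattened vector.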
32 EM-ALS Algorithm. Implementation: parafac in the N-way Toolbox by Bro et al.
1. Initialize matrices A, B, C.
2. Complete X using the current model: X̂ = W .* X + (1 − W) .* [[A,B,C]].
3. Compute new A, B, C that best fit the completed X̂ via one or more iterations of alternating least squares (ALS).
4. Go back to step 2 (until convergence).
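The four steps above can be sketched in NumPy as follows (a minimal sketch with hypothetical helper names, not the N-Way Toolbox parafac code; the mode-n unfolding follows NumPy's C ordering):

```python
import numpy as np
from numpy.linalg import lstsq

def unfold(T, mode):
    # Mode-n matricization: rows = mode-n index, columns in C order
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(U, V):
    # Columnwise Kronecker product; row (i,j) of the result is U[i,:] * V[j,:]
    R = U.shape[1]
    return np.einsum('ir,jr->ijr', U, V).reshape(-1, R)

def em_als(X, W, R, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = (rng.standard_normal((n, R)) for n in (I, J, K))  # step 1
    for _ in range(n_iter):
        M = np.einsum('ir,jr,kr->ijk', A, B, C)
        Xh = W * X + (1 - W) * M                 # step 2: impute from the model
        # step 3: one ALS sweep on the completed tensor
        A = lstsq(khatri_rao(B, C), unfold(Xh, 0).T, rcond=None)[0].T
        B = lstsq(khatri_rao(A, C), unfold(Xh, 1).T, rcond=None)[0].T
        C = lstsq(khatri_rao(A, B), unfold(Xh, 2).T, rcond=None)[0].T
    return A, B, C                               # step 4: loop above repeats

rng = np.random.default_rng(3)
At, Bt, Ct = (rng.standard_normal((n, 2)) for n in (10, 8, 6))
X = np.einsum('ir,jr,kr->ijk', At, Bt, Ct)       # exact rank-2 data
W = (rng.random(X.shape) > 0.3).astype(float)    # ~70% observed
A, B, C = em_als(X, W, R=2)
```

Each ALS update uses the identity X_(0) = A (B ⊙ C)ᵀ under this unfolding convention, so each subproblem is an ordinary linear least squares solve.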
33 Exploiting Sparsity in CP-WOPT. Observe that Y = W .* X and Z = W .* [[A,B,C]] are sparse if W is sparse: the key operation is a calculation between two tensors with the same set of structural nonzeros. The gradient can be calculated as ∂f/∂A = −2 (Y − Z)_(1) (C ⊙ B), i.e., an unfolded tensor times a Khatri-Rao product of the factor matrices (and similarly for B and C). More missing data yields a sparser problem!
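The sparse idea can be illustrated with a coordinate (COO-style) list of the observed entries: the model and residual are evaluated only at those coordinates and the gradient is scatter-added back. A sketch with illustrative names (one factor matrix shown; B and C are analogous):

```python
import numpy as np

def sparse_grad_A(idx, vals, A, B, C):
    # Gradient of f = ||W .* (X - [[A,B,C]])||^2 wrt A, touching ONLY the
    # observed entries: idx holds their (i,j,k) coordinates, vals their values.
    i, j, k = idx[:, 0], idx[:, 1], idx[:, 2]
    z = np.sum(A[i] * B[j] * C[k], axis=1)        # model Z at observed entries
    t = 2.0 * (z - vals)                          # scaled residual Z - Y
    gA = np.zeros_like(A)
    np.add.at(gA, i, t[:, None] * (B[j] * C[k]))  # scatter-add per observation
    return gA

rng = np.random.default_rng(2)
I, J, K, R = 6, 5, 4, 3
A, B, C = (rng.standard_normal((n, R)) for n in (I, J, K))
mask = rng.random((I, J, K)) < 0.2                # only 20% of entries known
idx = np.argwhere(mask)
Xd = rng.standard_normal((I, J, K))
vals = Xd[mask]
```

The cost scales with the number of known entries rather than with I·J·K, which is exactly why more missing data yields a cheaper problem.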
34 Numerical Experiment Set-up for Simulated Data.
Step 1: Generate factor matrices with random entries drawn from N(0,1). Each matrix has R=5 columns. Normalize each column to length 1.
Step 2: Generate tensors and add noise (10%).
Step 3: Randomly remove entries or entire fibers.
Step 4: Factorize the incomplete tensor with CP-WOPT (dense or sparse), INDAFAC (Gauss-Newton), or EM-ALS.
Step 5: Compute the FMS (factor match score) of the computed factors against the truth, assuming columns are normalized and λ_r is the product of the norms. Best FMS is 1.
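One common variant of the factor match score in Step 5 can be sketched as below (illustrative code; it assumes the components are already aligned, whereas a full FMS also searches over the permutation ambiguity):

```python
import numpy as np

def fms(factors, factors_hat):
    """Factor match score, one common variant: per component, the product of
    |cosine| similarities across modes, penalized by mismatch in the weights
    lambda_r (product of the column norms). Best score is 1. Components are
    assumed already matched -- no permutation search."""
    R = factors[0].shape[1]
    lam, lam_hat, cos = np.ones(R), np.ones(R), np.ones(R)
    for U, Uh in zip(factors, factors_hat):
        nu = np.linalg.norm(U, axis=0)
        nuh = np.linalg.norm(Uh, axis=0)
        lam, lam_hat = lam * nu, lam_hat * nuh
        cos *= np.abs(np.sum((U / nu) * (Uh / nuh), axis=0))
    penalty = 1 - np.abs(lam - lam_hat) / np.maximum(lam, lam_hat)
    return float(np.min(penalty * cos))
```

A perfect recovery scores 1; rescaling one factor matrix leaves the cosines at 1 but triggers the weight penalty, so the score drops below 1.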
35 Missing Entries (50 × 40 × 30, R=5): No Difference in Accuracy Between Methods. [Figure: FMS for 80%, 90%, and 95% missing entries.]
36 Missing Entries (50 × 40 × 30, R=5): No Difference in Accuracy Between Methods. [Figure: FMS for 80%, 90%, and 95% missing entries.] 95% missing is pretty hard at this size. Note the number of starts!
37 95% Missing Entries, R=5: Larger Problems are Easier. [Figure: FMS for tensors of size 50 × 40 × 30 and two larger sizes.] Note the number of starts!
38 80% Missing (50 × 40 × 30, R=5): Structured Missing Data is Harder. [Figure: FMS for randomly missing entries vs. missing fibers.]
39 Exploiting Sparsity for Speed. [Figure: timing comparison at 80%, 90%, and 95% missing.]
40 CP-WOPT scales to larger problems! 500 × 500 × 500 with M = 99% (1.25 million nonzeros): dense storage = 1 GB, sparse storage = 40 MB. 1000 × 1000 × 1000 with M = 99.5% (5 million nonzeros): dense storage = 8 GB, sparse storage = 160 MB. Restarted and converged to the correct solution! [Figure: time and FMS for dense vs. sparse CP-WOPT.]
41 The Very Best Way to Start the Day To: Engineer, Joe S <jsengin@sandia.gov> From: Kolda, Tamara G <tgkolda@sandia.gov> Subject: Re: Help with a math problem Joe, Here you go. There's still a lot of work to do on this topic, but I think you'll find this to be a workable solution in the short term. Thanks for the opportunity to work on such an interesting problem. -Tammy
42 Challenges for Missing Data Problems. Ubiquitous real-world applications: data doesn't perfectly fit a (multi-)linear model; missing data patterns may be structured. Theory: many recent discoveries in the matrix case; need similar ideas (e.g., coherence, nuclear norm) for tensors! Algorithms: extensions to other tensor factorizations; purposely dropping data for memory/speed of computation, or maybe to avoid local minima; rank determination. For more info: Tammy Kolda, tgkolda@sandia.gov. E. Acar, D. M. Dunlavy, T. G. Kolda and M. Mørup, Scalable Tensor Factorizations for Incomplete Data, arXiv (v1), May 2010.
43 Back-Up Slides
44 Tensor Completion for Internet Traffic Matrices. Network traffic data comprises source-destination information over time (a tensor). Missing data arises due to various collection issues. Our data set: one month of Géant data, 23 × 23 × 2756 (15-minute intervals), modeled using two components. [Figure: error in predicting missing elements as % missing data increases, with the baseline model error shown.]
Data Compression Techniques for Urban Traffic Data Muhammad Tayyab Asif, Kanan Srinivasan, Justin Dauwels and Patrick Jaillet School of Electrical and Electronic Engineering, Nanyang Technological University,
More informationMath 671: Tensor Train decomposition methods II
Math 671: Tensor Train decomposition methods II Eduardo Corona 1 1 University of Michigan at Ann Arbor December 13, 2016 Table of Contents 1 What we ve talked about so far: 2 The Tensor Train decomposition
More informationRecovery of Low-Rank Plus Compressed Sparse Matrices with Application to Unveiling Traffic Anomalies
July 12, 212 Recovery of Low-Rank Plus Compressed Sparse Matrices with Application to Unveiling Traffic Anomalies Morteza Mardani Dept. of ECE, University of Minnesota, Minneapolis, MN 55455 Acknowledgments:
More informationTensor Factorization Using Auxiliary Information
Tensor Factorization Using Auxiliary Information Atsuhiro Narita 1,KoheiHayashi 2,RyotaTomioka 1, and Hisashi Kashima 1,3 1 Department of Mathematical Informatics, The University of Tokyo, 7-3-1 Hongo,
More informationMatrix Completion for Structured Observations
Matrix Completion for Structured Observations Denali Molitor Department of Mathematics University of California, Los ngeles Los ngeles, C 90095, US Email: dmolitor@math.ucla.edu Deanna Needell Department
More informationSlice Oriented Tensor Decomposition of EEG Data for Feature Extraction in Space, Frequency and Time Domains
Slice Oriented Tensor Decomposition of EEG Data for Feature Extraction in Space, and Domains Qibin Zhao, Cesar F. Caiafa, Andrzej Cichocki, and Liqing Zhang 2 Laboratory for Advanced Brain Signal Processing,
More informationTo be published in Optics Letters: Blind Multi-spectral Image Decomposition by 3D Nonnegative Tensor Title: Factorization Authors: Ivica Kopriva and A
o be published in Optics Letters: Blind Multi-spectral Image Decomposition by 3D Nonnegative ensor itle: Factorization Authors: Ivica Kopriva and Andrzej Cichocki Accepted: 21 June 2009 Posted: 25 June
More informationEfficient CP-ALS and Reconstruction From CP
Efficient CP-ALS and Reconstruction From CP Jed A. Duersch & Tamara G. Kolda Sandia National Laboratories Livermore, CA Sandia National Laboratories is a multimission laboratory managed and operated by
More informationDATA MINING LECTURE 8. Dimensionality Reduction PCA -- SVD
DATA MINING LECTURE 8 Dimensionality Reduction PCA -- SVD The curse of dimensionality Real data usually have thousands, or millions of dimensions E.g., web documents, where the dimensionality is the vocabulary
More informationMachine Learning for Signal Processing Sparse and Overcomplete Representations. Bhiksha Raj (slides from Sourish Chaudhuri) Oct 22, 2013
Machine Learning for Signal Processing Sparse and Overcomplete Representations Bhiksha Raj (slides from Sourish Chaudhuri) Oct 22, 2013 1 Key Topics in this Lecture Basics Component-based representations
More informationTensor Completion for Estimating Missing Values in Visual Data
Tensor Completion for Estimating Missing Values in Visual Data Ji Liu, Przemyslaw Musialski 2, Peter Wonka, and Jieping Ye Arizona State University VRVis Research Center 2 Ji.Liu@asu.edu, musialski@vrvis.at,
More informationMachine Learning. Principal Components Analysis. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012
Machine Learning CSE6740/CS7641/ISYE6740, Fall 2012 Principal Components Analysis Le Song Lecture 22, Nov 13, 2012 Based on slides from Eric Xing, CMU Reading: Chap 12.1, CB book 1 2 Factor or Component
More informationNon-Negative Tensor Factorisation for Sound Source Separation
ISSC 2005, Dublin, Sept. -2 Non-Negative Tensor Factorisation for Sound Source Separation Derry FitzGerald, Matt Cranitch φ and Eugene Coyle* φ Dept. of Electronic Engineering, Cor Institute of Technology
More informationc 2008 Society for Industrial and Applied Mathematics
SIAM J MATRIX ANAL APPL Vol 30, No 3, pp 1219 1232 c 2008 Society for Industrial and Applied Mathematics A JACOBI-TYPE METHOD FOR COMPUTING ORTHOGONAL TENSOR DECOMPOSITIONS CARLA D MORAVITZ MARTIN AND
More informationTurbo-SMT: Accelerating Coupled Sparse Matrix-Tensor Factorizations by 200x
Turbo-SMT: Accelerating Coupled Sparse Matrix-Tensor Factorizations by 2x Evangelos E. Papalexakis epapalex@cs.cmu.edu Christos Faloutsos christos@cs.cmu.edu Tom M. Mitchell tom.mitchell@cmu.edu Partha
More informationTemporal analysis of semantic graphs using ASALSAN
Seventh IEEE International Conference on Data Mining Temporal analysis of semantic graphs using ASALSAN Brett W. Bader Sandia National Laboratories Albuquerque, NM, USA bwbader@sandia.gov Richard A. Harshman
More informationNonnegative Matrix Factorization. Data Mining
The Nonnegative Matrix Factorization in Data Mining Amy Langville langvillea@cofc.edu Mathematics Department College of Charleston Charleston, SC Yahoo! Research 10/18/2005 Outline Part 1: Historical Developments
More informationLatent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology
Latent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2016 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276,
More informationLecture Notes 10: Matrix Factorization
Optimization-based data analysis Fall 207 Lecture Notes 0: Matrix Factorization Low-rank models. Rank- model Consider the problem of modeling a quantity y[i, j] that depends on two indices i and j. To
More informationDecompositions of Higher-Order Tensors: Concepts and Computation
L. De Lathauwer Decompositions of Higher-Order Tensors: Concepts and Computation Lieven De Lathauwer KU Leuven Belgium Lieven.DeLathauwer@kuleuven-kulak.be 1 L. De Lathauwer Canonical Polyadic Decomposition
More informationarxiv: v1 [cs.na] 29 Sep 2017
Fast online low-rank tensor subspace tracking by CP decomposition using recursive least squares from incomplete observations Hiroyuki Kasai arxiv:79.276v [cs.na 29 Sep 27 May 27, 28 Abstract We consider
More informationMatrix Factorization & Latent Semantic Analysis Review. Yize Li, Lanbo Zhang
Matrix Factorization & Latent Semantic Analysis Review Yize Li, Lanbo Zhang Overview SVD in Latent Semantic Indexing Non-negative Matrix Factorization Probabilistic Latent Semantic Indexing Vector Space
More informationFactor Matrix Trace Norm Minimization for Low-Rank Tensor Completion
Factor Matrix Trace Norm Minimization for Low-Rank Tensor Completion Yuanyuan Liu Fanhua Shang Hong Cheng James Cheng Hanghang Tong Abstract Most existing low-n-rank imization algorithms for tensor completion
More informationIntroduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin
1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)
More informationA PARALLEL ALGORITHM FOR BIG TENSOR DECOMPOSITION USING RANDOMLY COMPRESSED CUBES (PARACOMP)
A PARALLEL ALGORTHM FOR BG TENSOR DECOMPOSTON USNG RANDOMLY COMPRESSED CUBES PARACOMP N.D. Sidiropoulos Dept. of ECE, Univ. of Minnesota Minneapolis, MN 55455, USA E.E. Papalexakis, and C. Faloutsos Dept.
More informationTheoretical Performance Analysis of Tucker Higher Order SVD in Extracting Structure from Multiple Signal-plus-Noise Matrices
Theoretical Performance Analysis of Tucker Higher Order SVD in Extracting Structure from Multiple Signal-plus-Noise Matrices Himanshu Nayar Dept. of EECS University of Michigan Ann Arbor Michigan 484 email:
More informationCS168: The Modern Algorithmic Toolbox Lecture #18: Linear and Convex Programming, with Applications to Sparse Recovery
CS168: The Modern Algorithmic Toolbox Lecture #18: Linear and Convex Programming, with Applications to Sparse Recovery Tim Roughgarden & Gregory Valiant May 25, 2016 1 The Story So Far Recall the setup
More information