Computationally Efficient CP Tensor Decomposition Update Framework for Emerging Component Discovery in Streaming Data

Pierre-David Letourneau, Muthu Baskaran, Tom Henretty, James Ezick, Richard Lethin
Reservoir Labs, 632 Broadway Suite 803, New York, NY

Abstract: We present Streaming CP Update, an algorithmic framework for updating CP tensor decompositions that is capable of identifying emerging components and can produce decompositions of large, sparse tensors streaming along multiple modes at a low computational cost. We discuss a large-scale implementation of the proposed scheme integrated within the ENSIGN tensor analysis package, and we evaluate and demonstrate the performance of the framework, in terms of computational efficiency and ability to discover emerging components, on a real cyber dataset.

I. INTRODUCTION

A tensor is a multidimensional array that can be used to represent and store multidimensional data. A tensor decomposition is an object that can extract relationships and correlations among tensor data by representing the latter as a combination of simple components (factors; rank-1 tensors; see Figure 1). Tensor decompositions have been successfully used in a multitude of applications, including genomics [1], geospatial analysis [2], cybersecurity [3], chemometrics [4], computer vision [5], data mining [6] and precision medicine [7], to name but a few. There exists a variety of numerical packages capable of computing tensor decompositions, including GigaTensor [8], HaTen2 [9], SPLATT [10], SCouT [11] and BIGtensor [12]. With the exception of SPLATT [10], existing packages can generally only treat immutable tensors. That is, in situations where the amount of data increases (e.g., temporally), they are bound to treat the new data by reconstructing the tensor and re-computing an entirely new decomposition, with little, if any, reuse of the information provided by the older decomposition. This, of course, leads to serious computational inefficiencies.

In this work, we address these inefficiencies in cases where the amount of data increases in a streaming fashion; that is, we consider cases where data increase is related to the growth of the size of an existing tensor. We focus on the CANDECOMP/PARAFAC (CP) decomposition and present the Streaming CP Update, an algorithmic framework for updating CP tensor decompositions that possesses the ability to identify emerging components and can produce robust decompositions of large, sparse streaming tensors at a low computational cost.

Fig. 1. Diagram representing the proposed Streaming CP Update framework for a single streaming mode: an updated decomposition is created from the CP tensor decomposition of an original tensor (top) and that of an update data tensor (middle) that adds information along a streaming mode (horizontal). Non-streaming modes are merged. The streaming (temporal) mode is fully updated. The framework generalizes to multiple streaming modes as well.

The development of decomposition algorithms for efficiently treating streaming tensors is relatively novel within the field of numerical tensor analysis. Recently proposed approaches fall within two categories: 1) perturbation-based methods, which perform the update through a continuous modification (perturbation) of the factors found in an existing decomposition [13], [14]; and 2) component discovery methods, which focus on merging an existing decomposition with a second one obtained from the update data, without further modification of the factors [15].
Our method lies at the intersection of both categories; it is a component discovery method because it merges an existing decomposition with a decomposition of the update data along non-streaming modes. However, it is also a perturbation-based method because it modifies and adapts the streaming-mode factors following the merging step. In this sense, it offers the best of both worlds while keeping computational costs to a minimum. Our contributions in this regard include:
1) A streaming tensor decomposition framework and algorithm: a low-computational-cost, small-memory-footprint algorithm for updating existing tensor decompositions in light of new streaming data;
2) Superior capabilities for identifying and extracting emerging components not present in the original data [16];
3) Extension of streaming updates to multiple modes;
4) Implementation of the framework using high-performance tensor decomposition and manipulation routines (ENSIGN [17]);
5) Evaluation and demonstration of performance on real data.

II. BACKGROUND AND NOTATION

We shall use the following notation: vectors are represented by bold lowercase letters ($\mathbf{v}$), matrices are represented by bold capital letters ($A$), and tensors are represented by bold calligraphic capital letters ($\mathcal{X}$). Tensors are elements of $\mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, where $N$ is the number of modes of the tensor and $I_n$ is the dimension of the tensor along mode $n$. The CP decomposition of a tensor $\mathcal{X}$ is an object denoted $[[A^{(1)}, A^{(2)}, \ldots, A^{(N)}]]$, or $[[A^{(n)}]]$ when the context is clear, where $\{A^{(n)}\}_{n=1}^{N}$ are $I_n \times K$ factor matrices, $K$ is a fixed integer called the rank, and
$$[[A^{(1)}, A^{(2)}, \ldots, A^{(N)}]] = \sum_{k=1}^{K} A^{(1)}(:,k) \circ A^{(2)}(:,k) \circ \cdots \circ A^{(N)}(:,k),$$
where $\circ$ represents the outer product. Here, we use MATLAB-like notation for array indexing (1-indexed: the first index of an array is 1, not 0), so that a colon ($:$) represents all the elements along a certain dimension and a sequence $n : n+m$ represents a restriction to the elements with indices $n$ to $n+m$ included. For instance, the quantity $A^{(1)}(:,k)$ above refers to the $k$th column of $A^{(1)}$. We will also use the symbol $0_{M \times N}$ to represent a matrix of size $M \times N$ with all-zero entries.

We further introduce certain operations on vectors and tensors that will become important in future sections. We denote by $\langle \cdot, \cdot \rangle$ the standard inner product between vectors, or the Frobenius inner product between matrices and tensors. Similarly, $\|\cdot\|$ is the Euclidean norm on vectors and the Frobenius norm on matrices and tensors. Given two matrices $A$ and $B$ of size $M \times N$, the symbols $\ast$ and $\oslash$ represent element-wise multiplication and division, i.e., $(A \ast B)(i,j) = A(i,j)\,B(i,j)$ and $(A \oslash B)(i,j) = A(i,j)/B(i,j)$. We also introduce the Khatri-Rao product
$$A \odot B = \left[\, A(:,1) \otimes B(:,1),\; A(:,2) \otimes B(:,2),\; \ldots,\; A(:,N) \otimes B(:,N) \,\right],$$
where $\otimes$ is the Kronecker product, i.e.,
$$A \otimes B = \begin{bmatrix} A(1,1)B & \cdots & A(1,N)B \\ \vdots & & \vdots \\ A(M,1)B & \cdots & A(M,N)B \end{bmatrix},$$
and $[A_1, A_2]$ indicates horizontal matrix concatenation. Finally, we denote by $X_{(n)}$ the matricization of $\mathcal{X}$ along the $n$th mode, i.e., the re-ordering of the elements of the tensor $\mathcal{X}$ in matrix form such that, for any fixed indices $i_1, \ldots, i_{n-1}, i_{n+1}, \ldots, i_N$, the vector $\{\mathcal{X}(i_1, \ldots, i_{n-1}, j, i_{n+1}, \ldots, i_N)\}_{j=1}^{I_n}$ is a column of $X_{(n)}$.

III. RELATED WORK

Original work pertaining to CP streaming tensor updates is generally associated with Nion et al. [18]. Similar work has also been presented with regard to the Tucker decomposition update in [19], [20], and [21]. The Tucker decomposition, although related to the CP, is more restrictive and will not be addressed in this paper. More recent work in streaming tensor decomposition includes that of Zhou et al. [13], Smith [14], and Pasricha et al. [15]. Zhou et al.'s method [13] modifies the factors of an already-existing decomposition in order to account for the update data. The emphasis is on scalability, and the cornerstone of the method is a thorough form of computational recycling (redundant-computation avoidance) using a special hierarchy specific to the Alternating Least-Squares (ALS) approach. Smith [14] proceeds in a similar fashion, but emphasizes the need to down-weight information that was observed far in the past, while paying particular attention to memory and computational costs. Pasricha et al. [15] focus on discovering emerging components from the update data rather than performing perturbations of existing factors. To do so, a full decomposition of the update data is performed, and the factor matrices are then merged to create an updated decomposition. No further operations are performed following the merging step.

IV. STREAMING TENSOR DECOMPOSITION

In this section, we introduce the Streaming CP Update framework.
We focus on the case of an $(N+1)$-mode tensor for which the $(N+1)$th mode is the streaming mode, and the only mode in which size changes (single-mode streaming). (The choice of the last mode is made purely for convenience and ease of notation; the framework is oblivious to the actual index of the streaming mode.) The generalization of the framework to multi-mode streaming is discussed in Section IV-A. We will generally refer to the streaming mode as the temporal mode and to the remaining modes as the non-temporal modes.

To begin with, we assume that we have access to a rank-$K$ tensor decomposition $[[A^{(n)}]]$ of the original $I_1 \times \cdots \times I_N \times T$ tensor $\mathcal{X}$. We further denote the update tensor by $\mathcal{X}_{new}$ and assume that its size is compatible with that of the original tensor, i.e., that it is of size $I_1 \times \cdots \times I_N \times T_{new}$ for some $T_{new} \in \mathbb{N}$. Under these circumstances, our method can be described as follows:
1) Compute a tensor decomposition of the update data $\mathcal{X}_{new}$;
2) Merge the existing and update tensor decomposition factor matrices along the non-temporal modes;
3) Update the temporal-mode factor matrix;
4) Classify the factors of the updated decomposition (optional);
5) Truncate the updated decomposition (optional).

This framework is summarized in Algorithm 1. The first step uses existing routines for computing a rank-$K_{new}$ (where $K_{new}$ is user-provided) tensor decomposition $[[A^{(n)}_{new}]]$ of the update data tensor $\mathcal{X}_{new}$.

Algorithm 1 Streaming CP update
Input: $\{A^{(n)}\}$, $\mathcal{X}_{new}$, $K_{new} > 0$, $0 < \nu_{sim} \le 1$, $\tau > 0$, $K'$
Compute: $\{A^{(n)}_{new}\} \leftarrow$ rank-$K_{new}$ decomp. of $\mathcal{X}_{new}$
$\{A^{(n)}\},\ \tilde{A}^{N+1}_{new} \leftarrow$ MERGE$(\{A^{(n)}\}, \{A^{(n)}_{new}\}, \nu_{sim})$
$A^{N+1} \leftarrow$ UPDATE$(\{A^{(n)}\}, \tilde{A}^{N+1}_{new})$
$\{C_1, C_2, C_3\} \leftarrow$ CLASSIFY$(\{A^{(n)}\}, K, \bar{K}, \tau)$
$\{A^{(n)}\},\ S_{trunc} \leftarrow$ TRUNCATE$(\{A^{(n)}\}, \bar{K}, K')$
Output: $\{A^{(n)}\}$, $\{C_1, C_2, C_3\}$, $S_{trunc}$
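The update routines below repeatedly use the Khatri-Rao product and the temporal-mode matricization. The following minimal numpy sketch (our own illustration, not ENSIGN's API; the unfolding convention is an assumption chosen to satisfy the identity $X_{(N+1)} = A^{N+1}(A^{(1)} \odot \cdots \odot A^{(N)})^T$) shows both operations on a toy dense tensor:

import numpy as np

def khatri_rao(mats):
    # Column-wise Kronecker product A^(1) ⊙ ... ⊙ A^(N); earlier matrices vary slowest.
    out = mats[0]
    for M in mats[1:]:
        out = (out[:, None, :] * M[None, :, :]).reshape(-1, out.shape[-1])
    return out

def unfold_last(X):
    # Matricize a dense array along its last (temporal) mode: one row per time index.
    return np.moveaxis(X, -1, 0).reshape(X.shape[-1], -1)

# Toy check: a CP tensor satisfies X_(N+1) = A_temporal @ khatri_rao(non_temporal).T
rng = np.random.default_rng(0)
A1, A2, At = rng.random((3, 2)), rng.random((4, 2)), rng.random((5, 2))
X = np.einsum('ik,jk,tk->ijt', A1, A2, At)   # rank-2, 3-mode tensor
assert np.allclose(unfold_last(X), At @ khatri_rao([A1, A2]).T)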

The purpose of the second step is to merge the existing and update tensor factor matrices along the non-temporal modes, by first eliminating redundant factors and then concatenating the resulting matrices. For instance, assume that $\mathcal{X}$ and $\mathcal{X}_{new}$ have non-temporal factor matrices $\{A^{(i)}\}_{i=1}^{N}$ and $\{A^{(j)}_{new}\}_{j=1}^{N}$ respectively. First, we identify the non-temporal factors $A^{(1)}(:,i) \circ \cdots \circ A^{(N)}(:,i)$ and $A^{(1)}_{new}(:,j) \circ \cdots \circ A^{(N)}_{new}(:,j)$ that are shared among both decompositions. To do so, we measure the cosine similarity
$$\sigma(i,j) = \prod_{n=1}^{N} \frac{\langle A^{(n)}(:,i),\; A^{(n)}_{new}(:,j)\rangle}{\|A^{(n)}(:,i)\|\,\|A^{(n)}_{new}(:,j)\|}$$
among each pair, and eliminate from the update data decomposition the factors whose similarity, $\max_i \sigma(i,j)$, exceeds some threshold $\nu_{sim}$. This leaves non-temporal factor matrices $\{\tilde{A}^{(n)}_{new}\}_{n=1}^{N}$ of size $I_n \times \tilde{K}_{new}$, where $\tilde{K}_{new} \le K_{new}$. These correspond to novel components not previously observed in the original data, and they are of prime importance in capturing phenomena that are not mere perturbations of existing components (such as the beginning of a cyber attack in network analysis). Finally, we concatenate the latter with the original factor matrices to create updated factor matrices:
$$A^{(n)} \leftarrow [A^{(n)},\ \tilde{A}^{(n)}_{new}] \quad \text{for } n = 1, \ldots, N.$$
This is summarized in Algorithm 2.

Algorithm 2 Merging non-temporal modes (MERGE)
Input: $\{A^{(n)}\}$, $\{A^{(n)}_{new}\}$, $0 < \nu_{sim} \le 1$
Initialize $\{\tilde{A}^{(n)}_{new}\}_{n=1}^{N+1}$ as empty matrices
for $j$ from 1 to $K_{new}$ do
  Compute: $\sigma_j \leftarrow \max_i \prod_{n=1}^{N} \frac{\langle A^{(n)}(:,i),\, A^{(n)}_{new}(:,j)\rangle}{\|A^{(n)}(:,i)\|\,\|A^{(n)}_{new}(:,j)\|}$
  if $\sigma_j < \nu_{sim}$ then
    for $n$ from 1 to $N+1$ do
      Add $A^{(n)}_{new}(:,j)$ to $\tilde{A}^{(n)}_{new}$
    end for
  end if
end for
for $n$ from 1 to $N$ do
  $A^{(n)} \leftarrow [A^{(n)},\ \tilde{A}^{(n)}_{new}]$
end for
Output: $\{A^{(n)}\}$, $\tilde{A}^{N+1}_{new}$

This step ensures that:
- Non-temporal components found in the decomposition of the original tensor are leveraged to explain similar components in the update data;
- Completely new components are allowed to emerge.

The third step involves the update of the temporal-mode factor matrix. For this purpose, we write
$$A^{N+1} = \begin{bmatrix} A^{N+1}_{old} & 0_{T \times \tilde{K}_{new}} \\ A^{N+1}_{upd,old} & A^{N+1}_{upd,new} \end{bmatrix},$$
a matrix of size $(T + T_{new}) \times \bar{K}$, where $\bar{K} = K + \tilde{K}_{new}$, which corresponds to the temporal-mode factor matrix of the updated decomposition. The zero matrix on the top right indicates that the updated non-temporal factors associated with the right-most indices have no influence on the decomposition until time $T$. $A^{N+1}_{old}$ corresponds to the temporal-mode factor matrix of the original decomposition, $A^{N+1}_{upd,old}$ is a $T_{new} \times K$ matrix corresponding to the temporal-mode update associated with previously observed components, whereas $A^{N+1}_{upd,new}$ is a $T_{new} \times \tilde{K}_{new}$ matrix corresponding to the temporal-mode update associated with novel components.

Under our proposed framework, the upper portion $[A^{N+1}_{old},\ 0]$ of the temporal factor matrix does not require any modifications throughout the update process (see Appendix); only the lower part $A_{upd} = [A^{N+1}_{upd,old},\ A^{N+1}_{upd,new}]$ involves computations. This relies on the assumption that the components of the decomposition are stable, i.e., that an ab initio decomposition of the full tensor would produce non-temporal components associated with the old portion of the data similar to those of $\mathcal{X}$. (In practice, we have found that this is indeed the case; see Section V.) It is also key to the low cost of the method, because the size of the latter ($T_{new} \times \bar{K}$), and therefore the cost of the update, is often orders of magnitude smaller than that of the former ($T \times \bar{K}$). Also, $A^{N+1}_{upd,new}$ is initialized using the temporal factor matrix obtained from the decomposition of $\mathcal{X}_{new}$, i.e., $\tilde{A}^{N+1}_{new}$, whereas $A^{N+1}_{upd,old}$ is initialized randomly. The framework remains the same across decompositions, although the explicit nature of the update process varies.
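As an illustration, a minimal numpy sketch of the merging step follows (our own sketch of Algorithm 2, not ENSIGN's implementation; the function and variable names are hypothetical). It keeps an update component only when the product of its per-mode cosine similarities against every existing component stays below the threshold:

import numpy as np

def merge_factors(factors_old, factors_new, nu_sim):
    """Algorithm 2 (MERGE) sketch: factors_old/factors_new are lists of N+1
    factor matrices, the last one being the temporal mode."""
    N = len(factors_old) - 1
    novel = []
    for j in range(factors_new[0].shape[1]):
        # sigma(i, j): product over non-temporal modes of cosine similarities.
        sims = np.ones(factors_old[0].shape[1])
        for n in range(N):
            a, B = factors_new[n][:, j], factors_old[n]
            sims *= (B.T @ a) / (np.linalg.norm(B, axis=0) * np.linalg.norm(a))
        if sims.max() < nu_sim:   # not similar to any old component: keep as novel
            novel.append(j)
    merged = [np.hstack([factors_old[n], factors_new[n][:, novel]]) for n in range(N)]
    A_temp_new = factors_new[N][:, novel]   # temporal block of the novel components
    return merged, A_temp_new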
In this paper, we focus on three types of decompositions: CP-APR [22] (a probabilistic Poisson framework for count data), CP-ALS [23] (Alternating Least-Squares), and nonnegative CP-ALS (CP-ALS-NN) [23] (CP-ALS with nonnegativity constraints). Algorithms 3-5 provide an explicit representation of the update process for each case. We further emphasize that the Streaming CP Update is a general framework that is not limited to this short list.

Algorithm 3 CP-APR update (UPDATE); see proof in Appendix
Input: $\{A^{(n)}\}$, $A_{upd}$ (initial guess)
Compute: $\Pi \leftarrow (A^{(1)} \odot A^{(2)} \odot \cdots \odot A^{(N)})^T$
while NOT CONVERGED do
  $A_{upd} \leftarrow A_{upd} \ast \left( \left( X_{(N+1),new} \oslash (A_{upd}\,\Pi) \right) \Pi^T \right)$
end while
Output: $A^{N+1} = \begin{bmatrix} [A^{N+1}_{old},\ 0_{T \times \tilde{K}_{new}}] \\ A_{upd} \end{bmatrix}$

Algorithm 4 CP-ALS update (UPDATE)
Input: $\{A^{(n)}\}$, $A_{upd}$ (initial guess)
Compute: $V \leftarrow (A^{(1)T} A^{(1)}) \ast (A^{(2)T} A^{(2)}) \ast \cdots \ast (A^{(N)T} A^{(N)})$
Compute: $W \leftarrow A^{(1)} \odot A^{(2)} \odot \cdots \odot A^{(N)}$
$A_{upd} \leftarrow X_{(N+1),new}\, W\, V^{\dagger}$
Output: $A^{N+1} = \begin{bmatrix} [A^{N+1}_{old},\ 0_{T \times \tilde{K}_{new}}] \\ A_{upd} \end{bmatrix}$
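For concreteness, here is a numpy sketch of the least-squares temporal update of Algorithm 4 (our own illustration under the unfolding convention assumed earlier, with $V^{\dagger}$ taken to be the Moore-Penrose pseudo-inverse):

import numpy as np

def khatri_rao(mats):   # as in the sketch after Algorithm 1
    out = mats[0]
    for M in mats[1:]:
        out = (out[:, None, :] * M[None, :, :]).reshape(-1, out.shape[-1])
    return out

def cp_als_temporal_update(nontemporal, A_temp_old, X_new_unf):
    """nontemporal: merged factor matrices A^(1..N), each with K_bar columns;
    A_temp_old: T x K temporal factor of the original decomposition;
    X_new_unf: update tensor matricized along the temporal mode (T_new x prod I_n)."""
    K_bar = nontemporal[0].shape[1]
    T, K_old = A_temp_old.shape
    V = np.ones((K_bar, K_bar))
    for A in nontemporal:
        V *= A.T @ A                               # Hadamard product of Gram matrices
    W = khatri_rao(nontemporal)                    # A^(1) ⊙ ... ⊙ A^(N)
    A_upd = X_new_unf @ W @ np.linalg.pinv(V)      # T_new x K_bar least-squares solve
    # Assemble the full temporal factor: [A_old, 0] on top, A_upd below.
    top = np.hstack([A_temp_old, np.zeros((T, K_bar - K_old))])
    return np.vstack([top, A_upd])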

Algorithm 5 CP-ALS-NN update (UPDATE)
Input: $\{A^{(n)}\}$, $A_{upd}$ (initial guess)
Compute: $V \leftarrow (A^{(1)T} A^{(1)}) \ast (A^{(2)T} A^{(2)}) \ast \cdots \ast (A^{(N)T} A^{(N)})$
Compute: $W \leftarrow A^{(1)} \odot A^{(2)} \odot \cdots \odot A^{(N)}$
$A_{upd} \leftarrow A_{upd} \ast \left( (X_{(N+1),new}\, W) \oslash (A_{upd}\, V) \right)$
Output: $A^{N+1} = \begin{bmatrix} [A^{N+1}_{old},\ 0_{T \times \tilde{K}_{new}}] \\ A_{upd} \end{bmatrix}$

Following the update stage, we proceed to post-processing. First, our aim is to provide classification information to the user as to which components of the updated decomposition belong to which of the following three categories:
1) Components present in the original decomposition that do not appear in the update data ($C_1$);
2) Components present in the original decomposition that appear in the update data ($C_2$);
3) Novel components ($C_3$).
Our criteria in each case can be described as follows: a component $A^{(1)}(:,i_0) \circ \cdots \circ A^{(N)}(:,i_0)$ present in the original decomposition belongs to class $C_1$ if the associated updated portion of the temporal mode is small; otherwise it belongs to class $C_2$. Class $C_3$ components are those associated with the non-temporal factor matrices $\{\tilde{A}^{(n)}_{new}\}_{n=1}^{N}$. Explicit formulas can be found in Algorithm 6, and the actual threshold $\tau$ is user-provided. This classification process is flexible and may easily accommodate additional features such as forgetfulness, by which factors not contributing to the explanation of recent data are discarded (see, e.g., [14]). This may be achieved using, for instance, weighted norms in Algorithm 6. In addition, it provides a means of quantifying the evolution of the updated decomposition's quality.

The final step involves truncating the resulting decomposition, which now has rank $\bar{K} = K + \tilde{K}_{new}$, back to a decomposition of rank $0 < K' \le \bar{K}$ provided by the user. To do so, we order the components according to their norms and select only the $K'$ largest. Information about which components were eliminated is also provided to the user. Pseudo-code for this operation can be found in Algorithm 7.

Algorithm 6 Classify modes (CLASSIFY)
Input: $\{A^{(n)}\}$, $K$, $\bar{K}$, $\tau > 0$
for $k$ from 1 to $\bar{K}$ do
  Let: $\phi_k \leftarrow A^{(1)}(:,k) \circ \cdots \circ A^{(N)}(:,k)$
  if $k \le K$ then
    if $\left( \sum_{i=1}^{T_{new}} A^{N+1}_{upd,old}(i,k)^2 \right)^{1/2} \Big/ \|A^{N+1}(:,k)\| < \tau$ then
      $\phi_k \in C_1$
    else
      $\phi_k \in C_2$
    end if
  else
    $\phi_k \in C_3$
  end if
end for
Output: $\{C_i\}_{i=1}^{3}$

Although these last two steps are not necessary for the scheme to succeed, they provide a useful and informative summary to the end user, who may not be fully familiar with the details of the framework. Furthermore, by keeping control over the rank of the updated decomposition, we can achieve a trade-off between computational and memory efficiency on the one hand and the quality of the decomposition on the other.

Algorithm 7 Truncate updated decomposition (TRUNCATE)
Input: $\{A^{(n)}\}$, $\bar{K}$, $K' > 0$
Compute: $\lambda_k \leftarrow \prod_{n=1}^{N+1} \|A^{(n)}(:,k)\|$
Sort: $A^{(n)}(:, j_1), \ldots, A^{(n)}(:, j_{\bar{K}})$ such that $\lambda_{j_1} \ge \lambda_{j_2} \ge \cdots \ge \lambda_{j_{\bar{K}}}$
Truncate: $A^{(n)} \leftarrow [A^{(n)}(:, j_1), \ldots, A^{(n)}(:, j_{K'})]$
Output: $\{A^{(n)}\}$, $S_{trunc} = \{j_{K'+1}, \ldots, j_{\bar{K}}\}$

A. Multi-mode Streaming Decomposition

To update along multiple modes, we order the modes and proceed using the single-mode streaming update, updating one mode at a time. This extension, although not the focus of the present paper, has been implemented and has produced results similar to those observed in the single-mode case (Section V). One caveat worth mentioning, however, is that the final decomposition may exhibit a dependence on the particular mode ordering in which one performs the updates, especially if truncation is performed at each step.

B. Streaming CP Downdate

As more and more data enters the stream, the temporal factor matrix grows ever larger. In certain applications, this may quickly prove prohibitive from a memory perspective. In those circumstances, it may be deemed adequate to remove data from the dataset along the streaming mode.
We designate by downdate the process of removing data and then adjusting the decomposition to take the removal into consideration. Downdating can be performed in a way that is analogous to our proposed updating process: assuming the existence of a decomposition $\{A^{(n)}\}$ of a tensor of size $I_1 \times \cdots \times I_N \times (T_{rem} + T)$, our proposed scheme proceeds as follows (a sketch of the bookkeeping in steps 2-3 appears below):
1) Identify the non-temporal components observed in the data to be removed but not present in the remaining data (using, e.g., cosine similarity);
2) Remove the identified non-temporal factors from $\{A^{(n)}\}$;
3) Remove the rows $A^{N+1}(1 : T_{rem}, :)$ from $A^{N+1}$;
4) Adjust the temporal mode using variants of Algorithms 3-5.
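The following numpy sketch (our own illustration; the names are hypothetical) captures the bookkeeping of steps 2-3, with the component identification of step 1 and the temporal readjustment of step 4 assumed to be supplied separately:

import numpy as np

def cp_downdate(factors, T_rem, drop):
    """factors: list of N+1 factor matrices (the last one temporal);
    T_rem: number of leading temporal indices to remove;
    drop: component indices judged absent from the remaining data (step 1)."""
    keep = [k for k in range(factors[0].shape[1]) if k not in set(drop)]
    out = [A[:, keep] for A in factors[:-1]]   # step 2: drop identified factors
    out.append(factors[-1][T_rem:, keep])      # step 3: drop removed time rows
    return out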

V. RESULTS

A. Component Discovery in Streaming Cyber Data

In this section, we illustrate the ability of our streaming tensor decomposition to rapidly discover components in streaming data. We show this capability with a real application use case in the cybersecurity domain. Specifically, we show how we identify a cyber attack at its onset in a real operational network, namely SCinet, and trace its evolution. SCinet, described as "the fastest network connecting the fastest computers," is set up each year at SC, the International Conference for High Performance Computing, Networking, Storage and Analysis.

Fig. 2. A component revealing a suspected DNS amplification DDoS attack from the decomposition of a DNS query tensor.

At SCinet 2017, we installed and operated ENSIGN from a node in the SCinet Network Security Cloud. ENSIGN enabled us to identify a number of suspicious network activities, including network mapping attempts, port scans, and multiple suspected DNS amplification DDoS attacks. We base our illustration of the streaming tensor decomposition on one of the multiple suspected DNS amplification DDoS attacks that were detected using ENSIGN's CP-APR decomposition implementation. We chose CP-APR since it is particularly well suited to the positive integer (count) data found in cyber tensors. It also benefits the most from the streaming framework among all tested methods (Table II).

To create a DNS query tensor from a cyber log containing DNS queries, we used the following fields as tensor modes: time, sender IP, receiver IP, DNS query, and DNS query type. In Figure 2, we show one of the components from the output of the tensor decomposition on the DNS query tensor. The component shows a single subnet from Seychelles attempting to look up a single domain across a large number of SCinet hosts. This lookup was for all DNS records related to a single domain and was repeatedly performed for a period of approximately five hours. Presumably, the sender address was forged and query responses were sent to this forged victim address. In theory, this would overwhelm the victim with response traffic. Since the vast majority of SCinet hosts are not DNS servers, significant traffic to the victim domain was not seen. In this case, it is likely that SCinet was a smaller part of a larger DDoS attack against the victim in Seychelles.

From Figure 2, we infer that the attack took place from 8:30am until 1:30pm. We used our streaming analysis capability to detect and validate that the attack could be identified at its onset in near real-time. This opened up the opportunity to notify the network administrators about suspicious activities and attacks for timely action. Figure 3 illustrates the tracking of the activity as it happens over time, describing the activity at a magnified resolution.

B. Computational Efficiency

In this section, we present experimental results to illustrate the computational efficiency of our framework and how our approach improves the response time of tensor decompositions when analyzing dynamically changing real-world data. We once again use cyber tensors for our illustration of computational efficiency. Specifically, we use the DNS query tensor described in the previous section. We formed tensors from DNS query logs generated over one day, from midnight to 4pm. For these experiments, we fixed the base tensor as the one formed from the collection of DNS queries accumulated from midnight to 9am, and we considered analysis of data streams coming in every hour. Further, we demonstrate the computational efficiency for the single-mode streaming case (i.e., assuming the data grows only along one mode, namely the time mode). Table I lists the different tensors in the order in which they are formed in time, along with their sizes.
TABLE I
CYBER TENSOR DATASETS USED FOR THE EXPERIMENTS

Tensor    | Mode sizes           | Number of non-zeros
dns 0to9  | ..., 128, 69608, ... | ...
dns 0to10 | ..., 128, 69608, ... | ...
dns 0to11 | ..., 128, 69608, ... | ...
dns 0to12 | ..., 128, 69608, ... | ...
dns 0to13 | ..., 128, 69608, ... | ...
dns 0to14 | ..., 128, 69608, ... | ...
dns 0to15 | ..., 128, 69608, ... | ...
dns 0to16 | ..., 128, 69608, ... | ...

In the absence of streaming tensor decomposition, the analysis of each tensor in the list involves a full decomposition of that tensor, without using any information from the decomposition of any of the preceding tensors in the list. Table II presents the computational efficiency, in terms of faster analysis time, when a streaming version of a particular CP method (APR, ALS, ALS-NN) is used instead of the base (full) version of the CP method.

We used a modern multi-core system to evaluate our framework: a quad-socket, 8-core-per-socket system with Intel Xeon processors (Intel Sandy Bridge microarchitecture) and 128 GB of DRAM. We use 64 threads, with hyperthreading on, for the performance runs. We observe a prominent reduction in decomposition analysis time for the streaming version of CP-APR (between 25x and 80x reduction in time). We observe between 3x and 12x reduction in time with the streaming version of CP-ALS. With the streaming version of CP-ALS-NN, we observe between 2x and 3.5x reduction in time. For all of our experiments, we used an existing parallelized and scalable implementation of the various CP decomposition routines [3]. Our profiling shows that the most expensive part of the Streaming CP Update is the decomposition of the update data itself. The remaining components of the framework, although not currently parallelized, do not represent significant overhead and may themselves be easily parallelized.

Fig. 3. Components from the streaming decomposition showing the evolution of the DNS amplification DDoS attack as it happens over time. The attack is identified at its onset.

TABLE II
TIME TAKEN BY THE THREE CP METHODS AND THEIR STREAMING VERSIONS ON CYBER TENSOR DATASETS (TIME IN SECONDS)

          | CP-APR         | CP-ALS         | CP-ALS-NN
Tensor    | Full  | Stream | Full  | Stream | Full  | Stream
dns 0to9  | ...   | ...    | ...   | ...    | ...   | ...
dns 0to10 | ...   | ...    | ...   | ...    | ...   | ...
dns 0to11 | ...   | ...    | ...   | ...    | ...   | ...
dns 0to12 | ...   | ...    | ...   | ...    | ...   | ...
dns 0to13 | ...   | ...    | ...   | ...    | ...   | ...
dns 0to14 | ...   | ...    | ...   | ...    | ...   | ...
dns 0to15 | ...   | ...    | ...   | ...    | ...   | ...
dns 0to16 | ...   | ...    | ...   | ...    | ...   | ...

Finally, Figure 4 compares the final fit of the decomposition output resulting from the application of our Streaming CP Update framework with that of an ab initio decomposition, i.e., merging the old and update data into a single, large tensor and performing an entirely new decomposition. It is clear from the final fit that our streaming decomposition method does not result in any significant loss of accuracy in the decomposition, even after multiple updates. In fact, we even see minor improvements in some cases. This counterintuitive behavior may be accounted for by the nature of the optimization problem underlying the decomposition (descent path, non-convexity), but it is not yet fully understood.

Fig. 4. Final fit (1.0 represents perfect reconstruction) resulting from our Streaming CP Update versus an ab initio decomposition. Using the streaming update framework does not significantly affect the fit (less than 7% for APR, 10% for ALS and 3% for ALS-NN after 7 updates).

VI. CONCLUSIONS

We have presented a novel, computationally efficient framework for performing CP tensor decomposition updates that is capable of updating and tracking the most important components, as well as the quality of the decomposition, on the fly. We have completed a full implementation of the method using high-performance tensor tools, which we used to evaluate the performance of the approach on real data. In doing so, we demonstrated the ability of the technique to capture important information and emerging components within a stream of data, as well as its competitive computational performance.

REFERENCES

[1] Victoria Hore, Ana Viñuela, Alfonso Buil, Julian Knight, Mark I. McCarthy, Kerrin Small, and Jonathan Marchini. Tensor decomposition for multiple-tissue gene expression experiments. Nature Genetics, 48(9):1094, 2016.
[2] Tom Henretty, Muthu Baskaran, James Ezick, David Bruns-Smith, and Tyler A. Simon. A quantitative and qualitative analysis of tensor decompositions on spatiotemporal data. In High Performance Extreme Computing Conference (HPEC), 2017 IEEE, pages 1-7. IEEE, 2017.
[3] Muthu Baskaran, Tom Henretty, Benoit Pradelle, M. Harper Langston, David Bruns-Smith, James Ezick, and Richard Lethin. Memory-efficient parallel tensor decompositions. In High Performance Extreme Computing Conference (HPEC), 2017 IEEE, pages 1-7. IEEE, 2017.
[4] Charlotte Møller Andersen and R. Bro. Practical aspects of PARAFAC modeling of fluorescence excitation-emission data. Journal of Chemometrics, 17(4), 2003.
[5] Tamir Hazan, Simon Polak, and Amnon Shashua. Sparse image coding using a 3D non-negative tensor factorization. In Computer Vision (ICCV), Tenth IEEE International Conference on, volume 1. IEEE, 2005.
[6] Furong Huang. Discovery of latent factors in high-dimensional data using tensor methods. arXiv preprint.
[7] Yuan Luo, Fei Wang, and Peter Szolovits. Tensor factorization toward precision medicine. Briefings in Bioinformatics, 18(3), 2017.
[8] U Kang, Evangelos Papalexakis, Abhay Harpale, and Christos Faloutsos. GigaTensor: scaling tensor analysis up by 100 times - algorithms and discoveries. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2012.
[9] Inah Jeon, Evangelos E. Papalexakis, U Kang, and Christos Faloutsos. HaTen2: Billion-scale tensor decompositions. In Data Engineering (ICDE), 2015 IEEE 31st International Conference on. IEEE, 2015.
[10] Shaden Smith, Niranjay Ravindran, Nicholas D. Sidiropoulos, and George Karypis. SPLATT: Efficient and parallel sparse tensor-matrix multiplication. In Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International. IEEE, 2015.
[11] ByungSoo Jeon, Inah Jeon, Lee Sael, and U Kang. SCouT: Scalable coupled matrix-tensor factorization - algorithm and discoveries. In Data Engineering (ICDE), 2016 IEEE 32nd International Conference on. IEEE, 2016.
[12] Namyong Park, Byungsoo Jeon, Jungwoo Lee, and U Kang. BIGtensor: Mining billion-scale tensor made easy. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management. ACM, 2016.
[13] Shuo Zhou, Nguyen Xuan Vinh, James Bailey, Yunzhe Jia, and Ian Davidson. Accelerating online CP decompositions for higher order tensors. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016.
[14] Shaden Smith, Kejun Huang, Nicholas D. Sidiropoulos, and George Karypis. Streaming tensor factorization for infinite data sources. In Proceedings of the 2018 SIAM International Conference on Data Mining. SIAM, 2018.
[15] Ravdeep Pasricha, Ekta Gujral, and Evangelos E. Papalexakis. Identifying and alleviating concept drift in streaming tensor decomposition. arXiv preprint, 2018.
[16] J. Ezick, M. Baskaran, A. Commike, A. Gudibanda, T. Henretty, M. H. Langston, P. Letourneau, J. Ros-Giralt, and R. Lethin. Eliminating barriers to automated tensor analysis for large-scale flows, January.
[17] Reservoir Labs. ENSIGN Tensor Toolbox.
[18] Dimitri Nion and Nicholas D. Sidiropoulos. Adaptive algorithms to track the PARAFAC decomposition of a third-order tensor. IEEE Transactions on Signal Processing, 57(6), 2009.
[19] Jimeng Sun, Dacheng Tao, and Christos Faloutsos. Beyond streams and graphs: dynamic tensor analysis. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2006.
[20] Jimeng Sun, Dacheng Tao, Spiros Papadimitriou, Philip S. Yu, and Christos Faloutsos. Incremental tensor analysis: Theory and applications. ACM Transactions on Knowledge Discovery from Data (TKDD), 2(3):11, 2008.
[21] Muthu Baskaran, M. Harper Langston, Tahina Ramananandro, David Bruns-Smith, Tom Henretty, James Ezick, and Richard Lethin. Accelerated low-rank updates to tensor decompositions. In High Performance Extreme Computing Conference (HPEC), 2016 IEEE, pages 1-7. IEEE, 2016.
[22] Eric C. Chi and Tamara G. Kolda. On tensors, sparsity, and nonnegative factorizations. SIAM Journal on Matrix Analysis and Applications, 33(4), 2012.
[23] Tamara G. Kolda and Brett W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455-500, 2009.

APPENDIX
STREAMING CP-APR UPDATE - PROOF

In this section, we prove the correctness of the streaming CP-APR update algorithm. Other update algorithms within the framework can be derived in a similar fashion. Our starting point is Algorithms 1-2 of Chi et al. [22], which describe the CP-APR algorithm as an alternating descent scheme. In this case, we fix $n = N+1$ (the temporal mode) and consider the update process, which takes the form:

Algorithm 8 CP-APR descent along mode N+1
while NOT CONVERGED do
  $A^{N+1} \leftarrow A^{N+1} \ast \left( \left( \bar{X}_{(N+1)} \oslash (A^{N+1}\, \Pi) \right) \Pi^T \right)$
end while

where
$$A^{N+1} = \begin{bmatrix} A^{N+1}_{old} & 0_{T \times \tilde{K}_{new}} \\ A^{N+1}_{upd,old} & A^{N+1}_{upd,new} \end{bmatrix}, \qquad \Pi = \left( A^{(N)} \odot A^{(N-1)} \odot \cdots \odot A^{(1)} \right)^T = \begin{bmatrix} \Pi_{old} \\ \Pi_{new} \end{bmatrix}.$$

Under our framework, $\Pi$ is a matrix of size $(K + \tilde{K}_{new}) \times \prod_{n=1}^{N} I_n$; $A^{N+1}$ is a matrix of size $(T + T_{new}) \times (K + \tilde{K}_{new})$ corresponding to the temporal-mode factor matrix of the updated decomposition; $A^{N+1}_{old}$ is of size $T \times K$ and corresponds to the temporal-mode factor matrix of the original decomposition; $A^{N+1}_{upd,old}$ is a $T_{new} \times K$ matrix corresponding to the temporal-mode update of the components we wish to compute; and $A^{N+1}_{upd,new}$ is an analogous $T_{new} \times \tilde{K}_{new}$ matrix associated with emerging components. In particular, we note that Algorithm 8 stops when it has found a matrix $A$ that is a fixed point of the operator
$$U(A) = A \ast \left( \left( \bar{X}_{(N+1)} \oslash (A\, \Pi) \right) \Pi^T \right).$$

Finally, we introduce $X_{(N+1)}$ and $X_{(N+1),new}$, the matricizations of $\mathcal{X}$ and $\mathcal{X}_{new}$ along the temporal mode, respectively, and write $\bar{X}_{(N+1)} = [X_{(N+1)}^T,\ X_{(N+1),new}^T]^T$. We also write $A_{upd} = [A^{N+1}_{upd,old},\ A^{N+1}_{upd,new}]$.

With this notation, it follows that
$$\bar{X}_{(N+1)} \oslash (A^{N+1}\, \Pi) = \begin{bmatrix} X_{(N+1)} \oslash \left( [A^{N+1}_{old},\ 0]\, \Pi \right) \\ X_{(N+1),new} \oslash (A_{upd}\, \Pi) \end{bmatrix} = \begin{bmatrix} X_{(N+1)} \oslash (A^{N+1}_{old}\, \Pi_{old}) \\ X_{(N+1),new} \oslash (A_{upd}\, \Pi) \end{bmatrix}.$$

Therefore, the update takes the form
$$U(A^{N+1}) = \begin{bmatrix} [A^{N+1}_{old},\ 0_{T \times \tilde{K}_{new}}] \ast \left( \left( X_{(N+1)} \oslash (A^{N+1}_{old}\, \Pi_{old}) \right) \Pi^T \right) \\ A_{upd} \ast \left( \left( X_{(N+1),new} \oslash (A_{upd}\, \Pi) \right) \Pi^T \right) \end{bmatrix}.$$

However, since $[[A^{(1)}, \ldots, A^{(N)}, A^{N+1}_{old}]]$ (restricted to the original $K$ components) is a decomposition of $\mathcal{X}$, the top block must be a fixed point by construction, i.e.,
$$[A^{N+1}_{old},\ 0_{T \times \tilde{K}_{new}}] = [A^{N+1}_{old},\ 0_{T \times \tilde{K}_{new}}] \ast \left( \left( X_{(N+1)} \oslash (A^{N+1}_{old}\, \Pi_{old}) \right) \Pi^T \right).$$

This indicates that the known (original) portion of the temporal factor matrix remains fixed throughout the CP-APR update, and that only the portion corresponding to the update data must be modified at each iteration. The streaming CP-APR update thus becomes:

Algorithm 9 CP-APR update (streaming)
$A^{N+1}(1:T, :) \leftarrow [A^{N+1}_{old},\ 0_{T \times \tilde{K}_{new}}]$
Initialize $A_{upd} = A^{N+1}(T+1 : T+T_{new}, :)$
while NOT CONVERGED do
  $A_{upd} \leftarrow A_{upd} \ast \left( \left( X_{(N+1),new} \oslash (A_{upd}\, \Pi) \right) \Pi^T \right)$
end while

which is the form found in Algorithm 3.
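As a small numerical illustration of this fixed-point property, the following self-contained numpy check (our own sketch; it assumes the CP-APR convention of [22] in which non-temporal factor columns are normalized to sum to one, so that each row of $\Pi$ sums to one) builds a synthetic tensor whose old block is exactly decomposed and verifies that one multiplicative step leaves $[A^{N+1}_{old},\ 0]$ unchanged:

import numpy as np

rng = np.random.default_rng(1)
I, J, T, Tn, K, Kn = 4, 5, 6, 2, 3, 2   # toy sizes: two non-temporal modes

# Non-temporal factors with columns normalized to sum to one.
A1 = rng.random((I, K + Kn)); A1 /= A1.sum(axis=0)
A2 = rng.random((J, K + Kn)); A2 /= A2.sum(axis=0)

def khatri_rao2(A, B):
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

Pi = khatri_rao2(A1, A2).T                     # (K+Kn) x (I*J); rows sum to 1

# Old data exactly reproduced by the padded old temporal block [A_old, 0].
A_old = rng.random((T, K))
A_top = np.hstack([A_old, np.zeros((T, Kn))])
X_old = A_top @ Pi

A_upd = rng.random((Tn, K + Kn))               # lower block: initial guess
X_new = rng.random((Tn, I * J))                # arbitrary update data

A = np.vstack([A_top, A_upd])
X = np.vstack([X_old, X_new])
A_next = A * ((X / (A @ Pi)) @ Pi.T)           # one CP-APR multiplicative step

print(np.allclose(A_next[:T], A_top))          # True: the old block is fixed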


More information

CVPR A New Tensor Algebra - Tutorial. July 26, 2017

CVPR A New Tensor Algebra - Tutorial. July 26, 2017 CVPR 2017 A New Tensor Algebra - Tutorial Lior Horesh lhoresh@us.ibm.com Misha Kilmer misha.kilmer@tufts.edu July 26, 2017 Outline Motivation Background and notation New t-product and associated algebraic

More information

Recommendation Systems

Recommendation Systems Recommendation Systems Popularity Recommendation Systems Predicting user responses to options Offering news articles based on users interests Offering suggestions on what the user might like to buy/consume

More information

Streaming multiscale anomaly detection

Streaming multiscale anomaly detection Streaming multiscale anomaly detection DATA-ENS Paris and ThalesAlenia Space B Ravi Kiran, Université Lille 3, CRISTaL Joint work with Mathieu Andreux beedotkiran@gmail.com June 20, 2017 (CRISTaL) Streaming

More information

to be more efficient on enormous scale, in a stream, or in distributed settings.

to be more efficient on enormous scale, in a stream, or in distributed settings. 16 Matrix Sketching The singular value decomposition (SVD) can be interpreted as finding the most dominant directions in an (n d) matrix A (or n points in R d ). Typically n > d. It is typically easy to

More information

Large-scale Matrix Factorization. Kijung Shin Ph.D. Student, CSD

Large-scale Matrix Factorization. Kijung Shin Ph.D. Student, CSD Large-scale Matrix Factorization Kijung Shin Ph.D. Student, CSD Roadmap Matrix Factorization (review) Algorithms Distributed SGD: DSGD Alternating Least Square: ALS Cyclic Coordinate Descent: CCD++ Experiments

More information

Performance Evaluation of the Matlab PCT for Parallel Implementations of Nonnegative Tensor Factorization

Performance Evaluation of the Matlab PCT for Parallel Implementations of Nonnegative Tensor Factorization Performance Evaluation of the Matlab PCT for Parallel Implementations of Nonnegative Tensor Factorization Tabitha Samuel, Master s Candidate Dr. Michael W. Berry, Major Professor Abstract: Increasingly

More information

ARestricted Boltzmann machine (RBM) [1] is a probabilistic

ARestricted Boltzmann machine (RBM) [1] is a probabilistic 1 Matrix Product Operator Restricted Boltzmann Machines Cong Chen, Kim Batselier, Ching-Yun Ko, and Ngai Wong chencong@eee.hku.hk, k.batselier@tudelft.nl, cyko@eee.hku.hk, nwong@eee.hku.hk arxiv:1811.04608v1

More information

Postgraduate Course Signal Processing for Big Data (MSc)

Postgraduate Course Signal Processing for Big Data (MSc) Postgraduate Course Signal Processing for Big Data (MSc) Jesús Gustavo Cuevas del Río E-mail: gustavo.cuevas@upm.es Work Phone: +34 91 549 57 00 Ext: 4039 Course Description Instructor Information Course

More information

Branch Prediction based attacks using Hardware performance Counters IIT Kharagpur

Branch Prediction based attacks using Hardware performance Counters IIT Kharagpur Branch Prediction based attacks using Hardware performance Counters IIT Kharagpur March 19, 2018 Modular Exponentiation Public key Cryptography March 19, 2018 Branch Prediction Attacks 2 / 54 Modular Exponentiation

More information

Efficient Cryptanalysis of Homophonic Substitution Ciphers

Efficient Cryptanalysis of Homophonic Substitution Ciphers Efficient Cryptanalysis of Homophonic Substitution Ciphers Amrapali Dhavare Richard M. Low Mark Stamp Abstract Substitution ciphers are among the earliest methods of encryption. Examples of classic substitution

More information

Preserving Privacy in Data Mining using Data Distortion Approach

Preserving Privacy in Data Mining using Data Distortion Approach Preserving Privacy in Data Mining using Data Distortion Approach Mrs. Prachi Karandikar #, Prof. Sachin Deshpande * # M.E. Comp,VIT, Wadala, University of Mumbai * VIT Wadala,University of Mumbai 1. prachiv21@yahoo.co.in

More information

Recovering Tensor Data from Incomplete Measurement via Compressive Sampling

Recovering Tensor Data from Incomplete Measurement via Compressive Sampling Recovering Tensor Data from Incomplete Measurement via Compressive Sampling Jason R. Holloway hollowjr@clarkson.edu Carmeliza Navasca cnavasca@clarkson.edu Department of Electrical Engineering Clarkson

More information

Quick Introduction to Nonnegative Matrix Factorization

Quick Introduction to Nonnegative Matrix Factorization Quick Introduction to Nonnegative Matrix Factorization Norm Matloff University of California at Davis 1 The Goal Given an u v matrix A with nonnegative elements, we wish to find nonnegative, rank-k matrices

More information

Research Article A Novel Differential Evolution Invasive Weed Optimization Algorithm for Solving Nonlinear Equations Systems

Research Article A Novel Differential Evolution Invasive Weed Optimization Algorithm for Solving Nonlinear Equations Systems Journal of Applied Mathematics Volume 2013, Article ID 757391, 18 pages http://dx.doi.org/10.1155/2013/757391 Research Article A Novel Differential Evolution Invasive Weed Optimization for Solving Nonlinear

More information

A Bregman alternating direction method of multipliers for sparse probabilistic Boolean network problem

A Bregman alternating direction method of multipliers for sparse probabilistic Boolean network problem A Bregman alternating direction method of multipliers for sparse probabilistic Boolean network problem Kangkang Deng, Zheng Peng Abstract: The main task of genetic regulatory networks is to construct a

More information

CS224W: Methods of Parallelized Kronecker Graph Generation

CS224W: Methods of Parallelized Kronecker Graph Generation CS224W: Methods of Parallelized Kronecker Graph Generation Sean Choi, Group 35 December 10th, 2012 1 Introduction The question of generating realistic graphs has always been a topic of huge interests.

More information

Kronecker Product Approximation with Multiple Factor Matrices via the Tensor Product Algorithm

Kronecker Product Approximation with Multiple Factor Matrices via the Tensor Product Algorithm Kronecker Product Approximation with Multiple actor Matrices via the Tensor Product Algorithm King Keung Wu, Yeung Yam, Helen Meng and Mehran Mesbahi Department of Mechanical and Automation Engineering,

More information

LU Factorization. LU factorization is the most common way of solving linear systems! Ax = b LUx = b

LU Factorization. LU factorization is the most common way of solving linear systems! Ax = b LUx = b AM 205: lecture 7 Last time: LU factorization Today s lecture: Cholesky factorization, timing, QR factorization Reminder: assignment 1 due at 5 PM on Friday September 22 LU Factorization LU factorization

More information

Process Model Formulation and Solution, 3E4

Process Model Formulation and Solution, 3E4 Process Model Formulation and Solution, 3E4 Section B: Linear Algebraic Equations Instructor: Kevin Dunn dunnkg@mcmasterca Department of Chemical Engineering Course notes: Dr Benoît Chachuat 06 October

More information

Count-Min Tree Sketch: Approximate counting for NLP

Count-Min Tree Sketch: Approximate counting for NLP Count-Min Tree Sketch: Approximate counting for NLP Guillaume Pitel, Geoffroy Fouquier, Emmanuel Marchand and Abdul Mouhamadsultane exensa firstname.lastname@exensa.com arxiv:64.5492v [cs.ir] 9 Apr 26

More information