A new truncation strategy for the higher-order singular value decomposition


Nick Vannieuwenhoven, K.U.Leuven, Belgium
Workshop on Matrix Equations and Tensor Techniques, RWTH Aachen, Germany, November 21, 2011
Joint work with Raf Vandebril and Karl Meerbergen

Overview
1. Introduction: Context, Notation
2. Orthogonal Tucker approximation
3. Truncated HOSVD
4. Sequentially truncated HOSVD: Definition, Operation count, Approximation error
5. Numerical examples: Compression of images, Handwritten digit classification, Compression of simulation results
6. Conclusions


Context I: Everything is a tensor
- $d = 0$: scalar $a$
- $d = 1$: vector $\mathbf{a}$
- $d = 2$: matrix $A$
- $d \ge 3$: tensor $\mathcal{A}$

Context II: Everything is a tensor
Multidimensional data appears in many applications: image and signal processing, pattern recognition, data mining and machine learning, chemometrics, biomedicine, psychometrics, ...
Two major problems are associated with this data:
1. the storage cost is very high, and
2. analyzing and interpreting the patterns and phenomena in the data is difficult.

Context III: Tensor decompositions for compression
Stable tensor decompositions suitable for compression:
- Low-dimensional case: Tucker (Tucker 1966), Higher-Order SVD (De Lathauwer et al. 2000), Cross-approximation (Oseledets et al. 2008), Sequentially Truncated HOSVD (Vannieuwenhoven et al. 2011).
- High-dimensional case: Hierarchical Tucker (Hackbusch and Kühn 2009, Grasedyck 2010), Tensor-Train (Oseledets 2011).

Context IV: Compression
Low-rank representation of matrices by the SVD:
$$A \approx U S V^T =: (U, V) \cdot S$$
Low-rank representation of tensors by Tucker:
$$\mathcal{A} \approx (\hat U_1, \hat U_2, \hat U_3) \cdot \hat{\mathcal{S}}$$


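Notation I: Mode-k vector space
A tensor $\mathcal{A}$ of order $d$ is an object in the tensor product of $d$ vector spaces:
$$\mathcal{A} \in \mathbb{R}^{n_1} \otimes \mathbb{R}^{n_2} \otimes \cdots \otimes \mathbb{R}^{n_d} \cong \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d}.$$
A 3rd-order tensor has 3 associated vector spaces:
- mode-1 vectors ($\mathbb{R}^{n_1}$),
- mode-2 vectors ($\mathbb{R}^{n_2}$),
- mode-3 vectors ($\mathbb{R}^{n_3}$).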

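Notation II: Unfolding
For $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, the mode-2 unfolding is the matrix $A_{(2)} \in \mathbb{R}^{n_2 \times n_1 n_3}$ whose columns are the mode-2 vectors of $\mathcal{A}$.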

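Notation III
Frobenius norm: $\|\mathcal{A}\|^2 := \sum_{i_1,i_2,i_3} \mathcal{A}_{i_1,i_2,i_3}^2$.
Multilinear multiplication:
$$[(I, M_2, I) \cdot \mathcal{A}]_{(2)} := M_2 A_{(2)}, \qquad (M_1, M_2, M_3) \cdot \mathcal{A} := (M_1, I, I) \cdot (I, M_2, I) \cdot (I, I, M_3) \cdot \mathcal{A}.$$
Projection of the mode-2 vectors onto the span of $U_2$ (orthonormal columns): $\pi_2 \mathcal{A} := (I, U_2 U_2^T, I) \cdot \mathcal{A}$.
Projection of the mode-2 vectors onto the complement of the span of $U_2$: $\pi_2^\perp \mathcal{A} := \mathcal{A} - \pi_2 \mathcal{A}$.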
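The notation above can be made concrete with a few lines of NumPy. This is a minimal sketch, not taken from the talk: the helper names `unfold`, `fold`, `mode_mult` and `project` are illustrative, modes are 0-indexed (mode 2 is axis 1), and the column ordering of the unfolding is one of several valid conventions.

```python
import numpy as np

def unfold(A, k):
    """Mode-k unfolding: the mode-k vectors of A become the columns of a matrix."""
    return np.moveaxis(A, k, 0).reshape(A.shape[k], -1)

def fold(M, k, shape):
    """Inverse of unfold for a tensor of the given shape."""
    rest = [n for i, n in enumerate(shape) if i != k]
    return np.moveaxis(M.reshape([shape[k]] + rest), 0, k)

def mode_mult(A, M, k):
    """Multilinear multiplication in mode k: [(I, ..., M, ..., I) . A]_(k) = M A_(k)."""
    shape = list(A.shape)
    shape[k] = M.shape[0]
    return fold(M @ unfold(A, k), k, shape)

def project(A, U, k):
    """pi_k A = (I, ..., U U^T, ..., I) . A for U with orthonormal columns."""
    return mode_mult(A, U @ U.T, k)

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 5, 6))
U2, _ = np.linalg.qr(rng.standard_normal((5, 2)))  # orthonormal columns
P = project(A, U2, 1)    # pi_2 A
P_perp = A - P           # pi_2^perp A
frob2 = lambda T: float(np.sum(T ** 2))  # squared Frobenius norm
```

Because $\pi_2$ is an orthogonal projection, `frob2(A)` equals `frob2(P) + frob2(P_perp)` up to rounding.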


Orthogonal Tucker approximation problem
Best rank-$(r_1, r_2, r_3)$ approximation problem:
$$\min_{\operatorname{rank}(\mathcal{B}) \le (r_1, r_2, r_3)} \|\mathcal{A} - \mathcal{B}\|_F = \min_{U_i \in O(n_i, r_i)} \|\mathcal{A} - (U_1 U_1^T, U_2 U_2^T, U_3 U_3^T) \cdot \mathcal{A}\|_F,$$
with $O(n_i, r_i)$ the set of $n_i \times r_i$ matrices with orthonormal columns.
The optimum is found by orthogonal projection onto a new, optimal tensor basis, but no closed-form solution is known.

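Orthogonal Tucker approximation I: Definition
$$\mathcal{A} \approx (\hat U_1, \hat U_2, \hat U_3) \cdot \hat{\mathcal{S}}$$
is a rank-$(r_1, r_2, r_3)$ orthogonal Tucker approximation to $\mathcal{A}$.
The columns of $\hat U_1 \in \mathbb{R}^{n_1 \times r_1}$, $\hat U_2 \in \mathbb{R}^{n_2 \times r_2}$ and $\hat U_3 \in \mathbb{R}^{n_3 \times r_3}$ can be extended to a basis of $\mathbb{R}^{n_1}$, $\mathbb{R}^{n_2}$ and $\mathbb{R}^{n_3}$, respectively.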


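Orthogonal Tucker approximation II: Error [VVM2011]
If $\mathcal{A} \approx \hat{\mathcal{A}} := \pi_1 \pi_2 \pi_3 \mathcal{A} = (U_1 U_1^T, U_2 U_2^T, U_3 U_3^T) \cdot \mathcal{A}$, then an error expression is
$$\|\mathcal{A} - \pi_1 \pi_2 \pi_3 \mathcal{A}\|^2 = \|\pi_2^\perp \mathcal{A}\|^2 + \|\pi_1^\perp \pi_2 \mathcal{A}\|^2 + \|\pi_3^\perp \pi_1 \pi_2 \mathcal{A}\|^2,$$
with upper bound
$$\|\mathcal{A} - \pi_1 \pi_2 \pi_3 \mathcal{A}\|^2 \le \|\pi_2^\perp \mathcal{A}\|^2 + \|\pi_1^\perp \mathcal{A}\|^2 + \|\pi_3^\perp \mathcal{A}\|^2.$$
The error expression depends on the processing order of the modes (here: mode 2, then mode 1, then mode 3); the upper bound does not.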
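The error expression and its upper bound can be checked numerically. The sketch below is illustrative only: it uses random orthonormal bases rather than any particular truncation, 0-indexed modes, and the processing order mode 2, mode 1, mode 3; the helper names are assumptions.

```python
import numpy as np

def project(A, U, k):
    """pi_k A: project the mode-k vectors of A onto span(U)."""
    Ak = np.moveaxis(A, k, 0)
    return np.moveaxis(np.tensordot(U @ U.T, Ak, axes=1), 0, k)

def perp(A, U, k):
    """pi_k^perp A = A - pi_k A."""
    return A - project(A, U, k)

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 6, 7))
# One random orthonormal basis (2 columns) per mode.
U = [np.linalg.qr(rng.standard_normal((n, 2)))[0] for n in A.shape]
sq = lambda T: float(np.sum(T ** 2))  # squared Frobenius norm

p2A = project(A, U[1], 1)           # pi_2 A
p1p2A = project(p2A, U[0], 0)       # pi_1 pi_2 A
approx = project(p1p2A, U[2], 2)    # pi_1 pi_2 pi_3 A (the projectors commute)

error = sq(A - approx)
expression = sq(perp(A, U[1], 1)) + sq(perp(p2A, U[0], 0)) + sq(perp(p1p2A, U[2], 2))
bound = sum(sq(perp(A, U[k], k)) for k in range(3))
print(error, expression, bound)
```

The first two printed values agree up to rounding, and the third is at least as large.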


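Truncated Higher-Order SVD (T-HOSVD) [DDV2000]
Recall the upper bound:
$$\|\mathcal{A} - \pi_1 \pi_2 \pi_3 \mathcal{A}\|^2 \le \|\pi_2^\perp \mathcal{A}\|^2 + \|\pi_1^\perp \mathcal{A}\|^2 + \|\pi_3^\perp \mathcal{A}\|^2.$$
Minimize it:
$$\min_{\pi_1, \pi_2, \pi_3} \|\mathcal{A} - \pi_1 \pi_2 \pi_3 \mathcal{A}\|^2 \le \min_{\pi_1, \pi_2, \pi_3} \sum_{k=1}^{3} \|\pi_k^\perp \mathcal{A}\|^2 = \sum_{k=1}^{3} \min_{\pi_k} \|\pi_k^\perp \mathcal{A}\|^2.$$
The minimum of each term is given by the first $r_k$ left singular vectors of $A_{(k)}$, in every mode $k$.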

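Algorithm: rank-$(r_1, r_2, r_3)$ T-HOSVD
  for every mode $k$ do
    Compute the rank-$r_k$ truncated SVD: $A_{(k)} = [\bar U_k \;\; \bar U_k^\perp] \begin{bmatrix} \bar S_k & \\ & \bar S_k^\perp \end{bmatrix} [\bar V_k \;\; \bar V_k^\perp]^T$
  end for
  Project: $\bar{\mathcal{S}} = (\bar U_1^T, \bar U_2^T, \bar U_3^T) \cdot \mathcal{A}$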
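The algorithm above might be sketched in NumPy as follows. This is an illustrative dense implementation under simple assumptions (a full SVD of every unfolding, 0-indexed modes); the function names are not from the talk.

```python
import numpy as np

def unfold(A, k):
    """Mode-k unfolding of A."""
    return np.moveaxis(A, k, 0).reshape(A.shape[k], -1)

def mode_mult(A, M, k):
    """Multiply every mode-k vector of A by the matrix M."""
    return np.moveaxis(np.tensordot(M, np.moveaxis(A, k, 0), axes=1), 0, k)

def t_hosvd(A, ranks):
    """T-HOSVD: every factor comes from an SVD of the full unfolding A_(k)."""
    factors = []
    for k, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(A, k), full_matrices=False)
        factors.append(U[:, :r])          # first r_k left singular vectors
    core = A
    for k, U in enumerate(factors):       # project once all factors are known
        core = mode_mult(core, U.T, k)
    return core, factors

def reconstruct(core, factors):
    """Expand (U_1, ..., U_d) . S back to a full tensor."""
    A = core
    for k, U in enumerate(factors):
        A = mode_mult(A, U, k)
    return A

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 9, 10))
core, factors = t_hosvd(A, (3, 4, 5))
A_bar = reconstruct(core, factors)
rel_err = np.linalg.norm(A - A_bar) / np.linalg.norm(A)
```

Note that each of the $d$ SVDs operates on an unfolding of the full, uncompressed tensor.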


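Sequentially truncated HOSVD (ST-HOSVD) [VVM2011]
Recall the error expression:
$$\|\mathcal{A} - \pi_1 \pi_2 \pi_3 \mathcal{A}\|^2 = \|\pi_2^\perp \mathcal{A}\|^2 + \|\pi_1^\perp \pi_2 \mathcal{A}\|^2 + \|\pi_3^\perp \pi_1 \pi_2 \mathcal{A}\|^2.$$
(Try to) minimize it. Taking the modes in the order 1, 2, 3:
$$\min_{\pi_1, \pi_2, \pi_3} \|\mathcal{A} - \pi_1 \pi_2 \pi_3 \mathcal{A}\|^2 = \min_{\pi_1, \pi_2, \pi_3} \Big[ \|\pi_1^\perp \mathcal{A}\|^2 + \|\pi_2^\perp \pi_1 \mathcal{A}\|^2 + \|\pi_3^\perp \pi_1 \pi_2 \mathcal{A}\|^2 \Big]$$
$$= \min_{\pi_1} \Big[ \|\pi_1^\perp \mathcal{A}\|^2 + \min_{\pi_2} \Big[ \|\pi_2^\perp \pi_1 \mathcal{A}\|^2 + \min_{\pi_3} \|\pi_3^\perp \pi_1 \pi_2 \mathcal{A}\|^2 \Big] \Big].$$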

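Sequentially truncated HOSVD (ST-HOSVD) [VVM2011]
The sequentially truncated HOSVD computes the solution to
$$\pi_1^* = \arg\min_{\pi_1} \|\pi_1^\perp \mathcal{A}\|^2,$$
$$\pi_2^* = \arg\min_{\pi_2} \|\pi_2^\perp \pi_1^* \mathcal{A}\|^2,$$
$$\pi_3^* = \arg\min_{\pi_3} \|\pi_3^\perp \pi_1^* \pi_2^* \mathcal{A}\|^2.$$
$\pi_k^*$ is given by the first $r_k$ left singular vectors of $[\pi_1^* \cdots \pi_{k-1}^* \mathcal{A}]_{(k)}$.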

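Algorithm: rank-$(r_1, r_2, r_3)$ ST-HOSVD
  $\hat{\mathcal{S}} = \mathcal{A}$
  for every mode $k$ do
    Compute the rank-$r_k$ truncated SVD: $\hat S_{(k)} = [\hat U_k \;\; \hat U_k^\perp] \begin{bmatrix} \hat S_k & \\ & \hat S_k^\perp \end{bmatrix} [\hat V_k \;\; \hat V_k^\perp]^T$
    Project: $\hat S_{(k)} \leftarrow \hat U_k^T \hat S_{(k)}$
  end for
Unrolled for three modes: $\hat{\mathcal{S}} = \mathcal{A}$, then $\hat S_{(1)} \leftarrow \hat U_1^T \hat S_{(1)}$, then $\hat S_{(2)} \leftarrow \hat U_2^T \hat S_{(2)}$, then $\hat S_{(3)} \leftarrow \hat U_3^T \hat S_{(3)}$.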
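A minimal NumPy sketch of the sequential variant follows, assuming dense tensors, 0-indexed modes and the natural processing order; the function names and the extra `discarded` return value are illustrative additions. The key difference from T-HOSVD is that the core is projected immediately after each SVD, so later SVDs act on a smaller tensor.

```python
import numpy as np

def unfold(A, k):
    """Mode-k unfolding of A."""
    return np.moveaxis(A, k, 0).reshape(A.shape[k], -1)

def mode_mult(A, M, k):
    """Multiply every mode-k vector of A by the matrix M."""
    return np.moveaxis(np.tensordot(M, np.moveaxis(A, k, 0), axes=1), 0, k)

def st_hosvd(A, ranks):
    """ST-HOSVD: truncate and project mode by mode, shrinking the core as we go."""
    core, factors, discarded = A, [], 0.0
    for k, r in enumerate(ranks):
        U, s, _ = np.linalg.svd(unfold(core, k), full_matrices=False)
        factors.append(U[:, :r])
        discarded += float(np.sum(s[r:] ** 2))   # squared norm removed in this mode
        core = mode_mult(core, U[:, :r].T, k)    # early projection
    return core, factors, discarded

def reconstruct(core, factors):
    """Expand (U_1, ..., U_d) . S back to a full tensor."""
    A = core
    for k, U in enumerate(factors):
        A = mode_mult(A, U, k)
    return A

rng = np.random.default_rng(3)
A = rng.standard_normal((8, 9, 10))
core, factors, discarded = st_hosvd(A, (3, 4, 5))
A_hat = reconstruct(core, factors)
sq_err = float(np.linalg.norm(A - A_hat) ** 2)
```

Here `discarded`, the sum of the squared singular values dropped in each step, matches `sq_err` up to rounding, which is the error expression for the processing order 1, 2, 3.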


Operation count
Theorem. Let $\mathcal{A} \in \mathbb{R}^{n \times n \times \cdots \times n}$ be truncated to rank $(r, r, \ldots, r)$ by the ST-HOSVD and the T-HOSVD. Assume an $O(m^2 n)$ algorithm to compute the SVD of an $m \times n$ matrix, $m \le n$. Then the ST-HOSVD requires
$$O\Big( \sum_{k=1}^{d} r^{k-1} n^{d-k+2} + \sum_{k=1}^{d} r^{k} n^{d-k} \Big)$$
operations, and the T-HOSVD requires
$$O\Big( d\, n^{d+1} + \sum_{k=1}^{d} r^{k} n^{d-k+1} \Big)$$
operations to compute the approximation.

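Operation count: Speedup
[Figure: speedup (vertical axis, 0 to 5) versus $r$ (horizontal axis, 0 to 30), with one curve each for $d = 3$, $d = 4$ and $d = 5$.]
Speedup of ST-HOSVD over T-HOSVD for an order-$d$ tensor of size $30 \times 30 \times \cdots \times 30$ which is truncated to rank $(r, r, \ldots, r)$.
Speedups greater than $d$ are possible with non-cubic tensors.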
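The ratio of the two operation counts in the theorem can be evaluated directly. This sketch treats the expressions inside the $O(\cdot)$ as if all constants were 1, which is an assumption; it gives a model speedup, not a measured one.

```python
def st_hosvd_ops(n, r, d):
    """Leading-order ST-HOSVD cost: SVDs of shrinking unfoldings plus projections."""
    svd = sum(r ** (k - 1) * n ** (d - k + 2) for k in range(1, d + 1))
    proj = sum(r ** k * n ** (d - k) for k in range(1, d + 1))
    return svd + proj

def t_hosvd_ops(n, r, d):
    """Leading-order T-HOSVD cost: d SVDs of full unfoldings plus projections."""
    svd = d * n ** (d + 1)
    proj = sum(r ** k * n ** (d - k + 1) for k in range(1, d + 1))
    return svd + proj

def speedup(n, r, d):
    """Model speedup of ST-HOSVD over T-HOSVD."""
    return t_hosvd_ops(n, r, d) / st_hosvd_ops(n, r, d)

for d in (3, 4, 5):
    print(d, [round(speedup(30, r, d), 2) for r in (1, 5, 10, 20, 30)])
```

In this model the speedup approaches $d$ for very small $r$ and falls to $2n/(n+1)$ at $r = n$.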


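Approximation error I: Hypothesis
Hypothesis. Let $\hat{\mathcal{A}}$ be an ST-HOSVD approximation and $\bar{\mathcal{A}}$ the T-HOSVD approximation of corresponding rank. Then
$$\|\mathcal{A} - \hat{\mathcal{A}}\|_F \overset{?}{\le} \|\mathcal{A} - \bar{\mathcal{A}}\|_F.$$
Not valid in general.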

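Approximation error II: Counterexample
Rank-(1, 1, 1, 1) approximation of the $2 \times 2 \times 2 \times 2$ tensor with slices
$$A_{:,:,1,1} = \begin{bmatrix} 0.5 & 1.7 \\ 1.3 & 0.6 \end{bmatrix}, \quad A_{:,:,2,1} = \begin{bmatrix} 2.4 & 0.1 \\ 0.7 & 1.4 \end{bmatrix}, \quad A_{:,:,1,2} = \begin{bmatrix} 0.1 & 0.1 \\ 2.2 & 0.8 \end{bmatrix}, \quad A_{:,:,2,2} = \begin{bmatrix} 0.3 & 2.5 \\ 0.0 & 0.3 \end{bmatrix}.$$
(Only magnitudes survive in this transcript; minus signs appear to have been lost, since the magnitudes alone do not reproduce the singular vectors below.)
T-HOSVD:
$$\bar{\mathcal{A}} = \big( (0.97325, 0.22975)^T, (0.78940, 0.61388)^T, (0.31546, 0.94894)^T, (0.88167, 0.47186)^T \big) \cdot [2.57934].$$
ST-HOSVD:
$$\hat{\mathcal{A}} = \big( (0.97325, 0.22975)^T, (0.97310, 0.23037)^T, (0.09956, 0.99503)^T, (0.99692, 0.07841)^T \big) \cdot [2.53595].$$
The approximation errors:
$$\|\mathcal{A} - \bar{\mathcal{A}}\|_F^2 = 18.68700 \quad \text{and} \quad \|\mathcal{A} - \hat{\mathcal{A}}\|_F^2 = 18.90896.$$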

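Approximation error III: Sufficient condition
Theorem. Let $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$. Let $\hat{\mathcal{A}}$ be the rank-$(1, r, r)$ ST-HOSVD of $\mathcal{A}$ and let $\bar{\mathcal{A}}$ be the T-HOSVD of $\mathcal{A}$ of the same rank. Then
$$\|\mathcal{A} - \hat{\mathcal{A}}\|_F \le \|\mathcal{A} - \bar{\mathcal{A}}\|_F.$$
In particular, for order-3 tensors the ST-HOSVD yields a rank-(1, 1, 1) approximation that is at least as good as that of the T-HOSVD.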
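The theorem can be illustrated (not proved) on random data. The sketch below assumes the ST-HOSVD processes the rank-1 mode first and compares both errors for rank $(1, 2, 2)$ on random $4 \times 5 \times 6$ tensors; all names and sizes are illustrative.

```python
import numpy as np

def unfold(A, k):
    """Mode-k unfolding of A."""
    return np.moveaxis(A, k, 0).reshape(A.shape[k], -1)

def mode_mult(A, M, k):
    """Multiply every mode-k vector of A by the matrix M."""
    return np.moveaxis(np.tensordot(M, np.moveaxis(A, k, 0), axes=1), 0, k)

def leading(M, r):
    """First r left singular vectors of M."""
    return np.linalg.svd(M, full_matrices=False)[0][:, :r]

def t_factors(A, ranks):
    """T-HOSVD factors: SVDs of the unfoldings of A itself."""
    return [leading(unfold(A, k), r) for k, r in enumerate(ranks)]

def st_factors(A, ranks):
    """ST-HOSVD factors: SVDs of the unfoldings of the shrinking core."""
    core, factors = A, []
    for k, r in enumerate(ranks):
        U = leading(unfold(core, k), r)
        factors.append(U)
        core = mode_mult(core, U.T, k)
    return factors

def approx_error(A, factors):
    """Frobenius norm of A minus its projection onto the given factor spans."""
    B = A
    for k, U in enumerate(factors):
        B = mode_mult(B, U @ U.T, k)
    return float(np.linalg.norm(A - B))

rng = np.random.default_rng(4)
ranks = (1, 2, 2)
gaps = []  # T-HOSVD error minus ST-HOSVD error, expected to be nonnegative
for _ in range(20):
    A = rng.standard_normal((4, 5, 6))
    gaps.append(approx_error(A, t_factors(A, ranks)) - approx_error(A, st_factors(A, ranks)))
print(min(gaps))
```

For other ranks or higher orders the gap can become negative, as the counterexample shows.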


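Approximation error IV: Bounds
ST-HOSVD error bound:
$$\|\mathcal{A} - \pi_1 \pi_2 \pi_3 \mathcal{A}\|^2 = \|\pi_2^\perp \mathcal{A}\|^2 + \|\pi_1^\perp \pi_2 \mathcal{A}\|^2 + \|\pi_3^\perp \pi_1 \pi_2 \mathcal{A}\|^2.$$
T-HOSVD error bound:
$$\|\mathcal{A} - \pi_1 \pi_2 \pi_3 \mathcal{A}\|^2 \le \|\pi_2^\perp \mathcal{A}\|^2 + \|\pi_1^\perp \mathcal{A}\|^2 + \|\pi_3^\perp \mathcal{A}\|^2.$$
Both are bounded by
$$\|\mathcal{A} - \hat{\mathcal{A}}\|_F \le \sqrt{d}\, \|\mathcal{A} - \mathcal{A}_{\mathrm{opt}}\|_F.$$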


Compression of images I
Compression of the Olivetti Research Laboratory faces database [SH1994, VT2003].
Tensor of size $10304 \times 40 \times 10$ (Texel $\times$ Subject $\times$ Expression). Unstructured, 4.1 million nonzeros.
HOOI, T-HOSVD and ST-HOSVD approximations of 6560 different ranks.

Compression of images II
Relative difference between the approximation errors of T-HOSVD and HOOI: $(\mathrm{err}_{\mathrm{HOSVD}} - \mathrm{err}_{\mathrm{HOOI}}) / \mathrm{err}_{\mathrm{HOOI}}$.
[Figure: two panels, expression mode rank = 6 and expression mode rank = 7; horizontal axis: texel mode rank (0 to 400); vertical axis: subject mode rank (0 to 40); color scale: 0% to 7%.]
Average relative error: 2.115%
Maximum relative error: 6.340%

Compression of images II
Relative difference between the approximation errors of ST-HOSVD and HOOI: $(\mathrm{err}_{\mathrm{HOSVD}} - \mathrm{err}_{\mathrm{HOOI}}) / \mathrm{err}_{\mathrm{HOOI}}$, where $\mathrm{err}_{\mathrm{HOSVD}}$ here denotes the ST-HOSVD error.
[Figure: two panels, expression mode rank = 6 and expression mode rank = 7; horizontal axis: texel mode rank (0 to 400); vertical axis: subject mode rank (0 to 40); color scale: 0% to 7%.]
Average relative error: 0.099%
Maximum relative error: 1.012%

Dinner tonight La Finestra at 19h30!

Compression of images III
Total compression time (min):

  HOOI       1841
  ST-HOSVD     91
  T-HOSVD     207


Handwritten digit classification I
Classification of handwritten digits by T-HOSVD [SE2007].
Tensor of size $786 \times 5421 \times 10$ (Texel $\times$ Example $\times$ Digit). Unstructured, 42.6 million nonzeros.
T-HOSVD and ST-HOSVD truncated to a relative error of 10%.

                     T-HOSVD          ST-HOSVD
  Rel. model error   9.90%            9.68%
  Model rank         (94, 511, 10)    (94, 511, 10)

Handwritten digit classification II

                         T-HOSVD      ST-HOSVD
  Classification error   4.94%        4.94%
  Factorization time     49m 26.0s    1m 8.7s

43x speedup!

Handwritten digit classification III
Why? Recall the truncation rank: (94, 511, 10).
T-HOSVD requires:
1. SVD of a $786 \times 54210$ matrix,
2. SVD of a $5421 \times 7860$ matrix, and
3. SVD of a $10 \times 4260906$ matrix.
ST-HOSVD (only) requires:
1. SVD of a $786 \times 54210$ matrix,
2. SVD of a $5421 \times 940$ matrix, and
3. SVD of a $10 \times 48034$ matrix.
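The matrix sizes above follow directly from the tensor size and the truncation rank. A small sketch that reproduces them, assuming the modes are processed in the order 1, 2, 3 (function names are illustrative):

```python
from math import prod

def t_hosvd_svd_shapes(shape):
    """Unfolding sizes for T-HOSVD: every SVD sees the full tensor."""
    return [(n, prod(shape) // n) for n in shape]

def st_hosvd_svd_shapes(shape, ranks):
    """Unfolding sizes for ST-HOSVD: modes already processed are shrunk to their rank."""
    current, shapes = list(shape), []
    for k, r in enumerate(ranks):
        shapes.append((current[k], prod(current) // current[k]))
        current[k] = r   # early projection reduces mode k from n_k to r_k
    return shapes

shape, ranks = (786, 5421, 10), (94, 511, 10)
print(t_hosvd_svd_shapes(shape))
print(st_hosvd_svd_shapes(shape, ranks))
```

With an $O(m^2 n)$ SVD, the second unfolding dominates the T-HOSVD cost, and the ST-HOSVD shrinks its column count from 7860 to 940.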


Compression of simulation results I
Compression of the numerical solution of the heat equation on a square domain, computed by explicit Euler. Inspired by [LVV2010].
Tensor of size $101 \times 101 \times 10001$ ($x \times y \times t$). Partially symmetric, 102.0 million nonzeros.
T-HOSVD and ST-HOSVD truncated to an absolute error of $10^{-4}$ (discretization accuracy).

Compression of simulation results II

                            T-HOSVD                   ST-HOSVD
  Abs. error                $8.512 \cdot 10^{-5}$     $9.587 \cdot 10^{-5}$
  Rank                      (22, 22, 20)              (22, 21, 19)
  Storage (nb. of values)   214144                    203140
  Factorization time        2h 46m                    1m 14.7s

133x speedup!
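The storage figures in the table are consistent with counting the core entries plus the entries of the three factor matrices. A short sketch of that count (the function name is illustrative):

```python
from math import prod

def tucker_storage(shape, ranks):
    """Number of stored values: core (r_1 * ... * r_d) plus factors (sum of n_k * r_k)."""
    return prod(ranks) + sum(n * r for n, r in zip(shape, ranks))

shape = (101, 101, 10001)
full = prod(shape)                                   # entries of the uncompressed tensor
t_storage = tucker_storage(shape, (22, 22, 20))      # T-HOSVD rank from the table
st_storage = tucker_storage(shape, (22, 21, 19))     # ST-HOSVD rank from the table
print(full, t_storage, st_storage, round(full / st_storage, 1))
```

Both decompositions store roughly 0.2% of the original 102.0 million values.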


Conclusions Early projection, as in ST-HOSVD, can greatly improve the performance of T-HOSVD.

Thank you for your attention.

References
[DDV2000] L. De Lathauwer, B. De Moor, and J. Vandewalle, A multilinear singular value decomposition, SIMAX, 21 (2000).
[LVV2010] L.S. Lorente, J.M. Vega, and A. Velazquez, Compression of aerodynamic databases using higher-order singular value decomposition, Aerosp. Sc. and Tech., 14 (2010).
[SE2007] B. Savas and L. Eldén, Handwritten digit classification using higher-order singular value decomposition, Patt. Recog., 40 (2007).
[SH1994] F.S. Samaria and A.C. Harter, Parameterization of a stochastic model of human face identification, Proc. Second IEEE W. Appl. Comp. Vision, 1994.
[VT2003] M.A.O. Vasilescu and D. Terzopoulos, Multilinear subspace analysis of image ensembles, Comp. Vision Patt. Recog. IEEE Conf., 2003.
[VVM2011] N. Vannieuwenhoven, R. Vandebril, and K. Meerbergen, A new truncation strategy for the higher-order singular value decomposition, 2011, submitted. Also: On the truncated multilinear singular value decomposition, Tech. Rep. TW589, K.U.Leuven, March 2011.