Decomposing a three-way dataset of TV-ratings when this is impossible. Alwin Stegeman

Similar documents
JOS M.F. TEN BERGE SIMPLICITY AND TYPICAL RANK RESULTS FOR THREE-WAY ARRAYS

A concise proof of Kruskal s theorem on tensor decomposition

Optimal solutions to non-negative PARAFAC/multilinear NMF always exist

Subtracting a best rank-1 approximation may increase tensor rank

arxiv: v1 [math.ag] 10 Oct 2011

The multiple-vector tensor-vector product

arxiv: v1 [math.ra] 13 Jan 2009

c 2008 Society for Industrial and Applied Mathematics

Tutorial on MATLAB for tensors and the Tucker decomposition

Third-Order Tensor Decompositions and Their Application in Quantum Chemistry

CVPR A New Tensor Algebra - Tutorial. July 26, 2017

Coupled Matrix/Tensor Decompositions:

Tensor Decompositions Workshop Discussion Notes American Institute of Mathematics (AIM) Palo Alto, CA

/16/$ IEEE 1728

Computational Linear Algebra

On the uniqueness of multilinear decomposition of N-way arrays

MATRIX COMPLETION AND TENSOR RANK

A BLIND SPARSE APPROACH FOR ESTIMATING CONSTRAINT MATRICES IN PARALIND DATA MODELS

ENGG5781 Matrix Analysis and Computations Lecture 10: Non-Negative Matrix Factorization and Tensor Decomposition

Numerical multilinear algebra and its applications

On Kruskal s uniqueness condition for the Candecomp/Parafac decomposition

Kruskal s condition for uniqueness in Candecomp/Parafac when ranks and k-ranks coincide

Window-based Tensor Analysis on High-dimensional and Multi-aspect Streams

A Constrained PARAFAC Method for Positive Manifold Data

Multi-Way Compressed Sensing for Big Tensor Data

Orthogonal tensor decomposition

Introduction to Tensors. 8 May 2014

LOW RANK TENSOR DECONVOLUTION

Pascal Lab. I3S R-PC

c 2008 Society for Industrial and Applied Mathematics

Generic and Typical Ranks of Multi-Way Arrays

Must-read Material : Multimedia Databases and Data Mining. Indexing - Detailed outline. Outline. Faloutsos

Fundamentals of Multilinear Subspace Learning

hal , version 1-16 Aug 2009

Globally convergent algorithms for PARAFAC with semi-definite programming

Nonnegative approximations of nonnegative tensors

Typical rank and indscal dimensionality for symmetric three-way arrays of order I 2 2 or I 3 3

Multilinear Analysis of Image Ensembles: TensorFaces

Weighted Singular Value Decomposition for Folded Matrices

1. Structured representation of high-order tensors revisited. 2. Multi-linear algebra (MLA) with Kronecker-product data.

Linear Algebra (Review) Volker Tresp 2018

Fitting a Tensor Decomposition is a Nonlinear Optimization Problem

Hyperdeterminants, secant varieties, and tensor approximations

A Simpler Approach to Low-Rank Tensor Canonical Polyadic Decomposition

Fast Nonnegative Tensor Factorization with an Active-Set-Like Method

Non-negative Tensor Factorization Based on Alternating Large-scale Non-negativity-constrained Least Squares

Dimitri Nion & Lieven De Lathauwer

Tensors versus Matrices, usefulness and unexpected properties

To be published in Optics Letters: Blind Multi-spectral Image Decomposition by 3D Nonnegative Tensor Title: Factorization Authors: Ivica Kopriva and A

SOME ADDITIONAL RESULTS ON PRINCIPAL COMPONENTS ANALYSIS OF THREE-MODE DATA BY MEANS OF ALTERNATING LEAST SQUARES ALGORITHMS. Jos M. F.

FROM BASIS COMPONENTS TO COMPLEX STRUCTURAL PATTERNS Anh Huy Phan, Andrzej Cichocki, Petr Tichavský, Rafal Zdunek and Sidney Lehky

GENERIC AND TYPICAL RANKS OF THREE-WAY ARRAYS

A FLEXIBLE MODELING FRAMEWORK FOR COUPLED MATRIX AND TENSOR FACTORIZATIONS

Big Tensor Data Reduction

Faloutsos, Tong ICDE, 2009

Wafer Pattern Recognition Using Tucker Decomposition

Towards A New Multiway Array Decomposition Algorithm: Elementwise Multiway Array High Dimensional Model Representation (EMAHDMR)

4. Multi-linear algebra (MLA) with Kronecker-product data.

Algorithm 862: MATLAB Tensor Classes for Fast Algorithm Prototyping

A PARALLEL ALGORITHM FOR BIG TENSOR DECOMPOSITION USING RANDOMLY COMPRESSED CUBES (PARACOMP)

Scalable Tensor Factorizations with Incomplete Data

Numerical multilinear algebra in data analysis

Multilinear Subspace Analysis of Image Ensembles

Generic properties of Symmetric Tensors

Eigenvalues of tensors

Linear Algebra (Review) Volker Tresp 2017

TENSOR APPROXIMATION TOOLS FREE OF THE CURSE OF DIMENSIONALITY

Multilinear Algebra in Data Analysis:

, one is often required to find a best rank-r approximation to A in other words, determine vectors x i R d1, y i R d2,...

c 2012 Society for Industrial and Applied Mathematics

Non-negative Tensor Factorization with missing data for the modeling of gene expressions in the Human Brain

A GENERALIZATION OF TAKANE'S ALGORITHM FOR DEDICOM. Yosmo TAKANE

Parallel Tensor Compression for Large-Scale Scientific Data

Matrix Decomposition in Privacy-Preserving Data Mining JUN ZHANG DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF KENTUCKY

the tensor rank is equal tor. The vectorsf (r)

A randomized block sampling approach to the canonical polyadic decomposition of large-scale tensors

On the convergence of higher-order orthogonality iteration and its extension

Nonnegativity Constraints in Numerical Analysis

Sparseness Constraints on Nonnegative Tensor Decomposition

Properties of Matrices and Operations on Matrices

CP DECOMPOSITION AND ITS APPLICATION IN NOISE REDUCTION AND MULTIPLE SOURCES IDENTIFICATION

Online Tensor Factorization for. Feature Selection in EEG

Performance Evaluation of the Matlab PCT for Parallel Implementations of Nonnegative Tensor Factorization

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 57, NO. 6, JUNE

Tensor Decomposition Theory and Algorithms in the Era of Big Data

Data Mining and Matrices

Higher-Order Singular Value Decomposition (HOSVD) for structured tensors

Tensor Analysis. Topics in Data Mining Fall Bruno Ribeiro

Analyzing Data Boxes : Multi-way linear algebra and its applications in signal processing and communications

A new truncation strategy for the higher-order singular value decomposition

Permutation transformations of tensors with an application

Available Ph.D position in Big data processing using sparse tensor representations

Review of Linear Algebra

Information Retrieval

Performance Evaluation of the Matlab PCT for Parallel Implementations of Nonnegative Tensor Factorization

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 52, NO. 9, SEPTEMBER

Sufficient Conditions to Ensure Uniqueness of Best Rank One Approximation

to be more efficient on enormous scale, in a stream, or in distributed settings.

Multi-Linear Mappings, SVD, HOSVD, and the Numerical Solution of Ill-Conditioned Tensor Least Squares Problems

arxiv: v1 [cs.lg] 18 Nov 2018

Transcription:

Decomposing a three-way dataset of TV-ratings when this is impossible Alwin Stegeman a.w.stegeman@rug.nl www.alwinstegeman.nl 1

Summarizing Data in Simple Patterns Information Technology collection of huge data sets, often multi-way data z(i,j,k, ) Approximation: Multi-way data simple patterns data interpretation (psychometrics, neuro-imaging, data mining) separation of chemical compounds (chemometrics) separation of mixed signals (signal processing) faster calculations (algebraic complexity theory, scientific computing) 2

Simple structure = rank 1 2-way array = matrix Z (I J) with entries z(i,j) rank 1: Z = a b T = a ๐ b z(i,j) = a(i)۰b(j) rank(z) = min {R : Z = a 1 ๐b 1 + + a R ๐b R } 3-way array Z (I J K) with entries z(i,j,k) rank 1: Z = a ๐ b ๐ c z(i,j,k) = a(i)۰b(j)۰c(k) rank(z) = min {R : Z = a 1 ๐b 1 ๐c 1 + + a R ๐b R ๐c R } 3

2-way (PCA) decomposition Z b 1 b R = + + + E a 1 a R Z = a 1 ๐ b 1 + + a R ๐ b R + E = A B T + E with A = [a 1 a R ] B = [b 1 b R ] Goal: Find (A,B) that minimize ssq(e) 4

3-way Candecomp/Parafac (CP) c 1 c R Z b 1 b R = + + + E a 1 a R Z = a 1 ๐ b 1 ๐ c 1 + + a R ๐ b R ๐ c R + E Goal: Find (A,B,C) that minimize ssq(e) with C = [c 1 c R ] 5

3-way CP 2-way decomp computation iterative algorithm SVD best rank-r approximation rotational uniqueness existence for R < rank(data) yes under mild conditions not guaranteed yes no yes 6

CP analysis of 3-way TV-ratings data 7

TV show 1 Mash 8

TV show 2 Charlie s Angels 9

TV show 3 All in the Family 10

TV show 4 60 Minutes 11

TV show 5 The Tonight Show 12

TV show 6 Let s Make a Deal 13

TV show 7 The Waltons 14

TV show 8 Saturday Night Live 15

TV show 9 News 16

TV show 10 Kojak 17

TV show 11 Mork and Mindy 18

TV show 12 Jacques Cousteau 19

TV show 13 Football 20

TV show 14 Little House on the Prairie 21

TV show 15 Wild Kingdom 22

Rating Scales 1-8 -6, -5,, -1, 0, 1,, 5, 6 23

Rating Scales 9-16 -6, -5,, -1, 0, 1,, 5, 6 24

TV-ratings data 30 persons have rated 15 TV shows on 16 rating scales Preprocessing: Centering across rating scales Centering across TV shows Normalizing within persons TV data also analyzed by Lundy et al. (1989) and Harshman (2004) Analysis presented is by Stegeman (2014) 25

Output of the CP analysis with R components Matrix A (15 R): Matrix B (16 R): Matrix C (30 R): columns are TV show components columns are rating scales loadings columns are person loadings c 1 c R Z b 1 b R = + + + E a 1 a R 26

Scaling the CP solution Column of A: mean squared component score = 1 Column of B: mean squared loading = 1 Column of C: sum of squared loadings = 4 Z = g 1 (a 1 ๐ b 1 ๐ c 1 ) + + g R (a R ๐ b R ๐ c R ) + E weight g r indicates strength of component r columns are sign changed such that C has positive loadings 27

Fit of the CP solution Fit % = 100 100 ssq(e) / ssq(z) (range 0 to 100) Congruence coefficient of two components cc A (1,2) = a T 1 a2 (range -1 to +1) ssq( a1) ssq( a2) cc(1,2) = cc A (1,2) cc B (1,2) cc C (1,2) 28

The CP solution with 2 components Overall: fit = 41.96 % cc(1,2) = 0.002 Component 1: fit = 28.46 % g 1 = 1.46 Component 2: fit = 13.59 % g 2 = 1.01 Interpretation: Component 1 = Humor Component 2 = Sensitivity 29

Component 1 = Humor TV shows mode Scales mode 30

Component 2 = Sensitivity TV shows mode Scales mode 31

Components 1 and 2 persons plot 32

The CP solution with 3 components Overall: fit = 50.76 % cc(1,2) = -0.996 cc(1,3) = -0.13 cc(2,3) = 0.12 Comps. 1+2: fit = 20.11 % g 1 = 15.23 g 2 = 15.39 Component 3: fit = 24.38 % g 3 = 1.52 Interpretation: Components 1 & 2 =??? Component 3 = Humor 33

Components 1 and 2 TV shows mode 34

Components 1 and 2 Scales mode 35

Comparing the solutions for R=2 and R=3 congruence coefficients of R=2 components (columns) and R=3 components (rows): Humor Sensitivity Comp. 1-0.15-0.41 Comp. 2 0.15 0.46 Humor 0.93 0.01 36

Some Theory Diverging components occur when CP does not have an optimal solution, i.e., best rank-r approximation does not exist (Krijnen et al., 2008; De Silva & Lim, 2008) CP has an optimal solution if the columns of A (or B or C) are restricted to be orthogonal (Harshman & Lundy, 1984; Krijnen et al., 2008) CP has an optimal solution if the data is nonnegative and A,B,C are restricted to be nonnegative (Lim, 2005; Lim & Comon, 2009) 37

R=3 components and orthogonal TV shows mode Overall: fit = 50.22 % cc(r1,r2) = 0 Component 1: fit = 27.19 % g 1 = 1.43 Component 2: fit = 13.04 % g 2 = 0.99 Component 3: fit = 9.99 % g 3 = 0.87 Interpretation: Component 1 = Humor Component 2 = Sensitivity Component 3 = Violence 38

Component 3 = Violence TV shows mode Scales mode 39

Comparing the solutions obtained so far R=2 R=3 R=3 orth. H S Comp.1 Comp.2 H Humor 0.96 0.00-0.20 0.19 0.95 Sensitivity 0.00 0.94-0.30 0.36 0.03 Violence 0.11 0.08 0.30-0.24-0.03 H (R=2) -0.15 0.15 0.93 S (R=2) -0.41 0.46 0.01 the two diverging components relate to S and V 40

Some more Theory CP decompositions with R components (rank R) CP with > R components (rank > R) Z (data) X (boundary) updates 41

CP does not have an optimal solution if optimal boundary point X does not have rank R In that case, the decomposition of X contains one or more interaction terms, e.g., s 2 ๐t 1 ๐u 2 How to find X and its decomposition? Algorithms exist for: I J 2 arrays and R min(i,j) (Stegeman & De Lathauwer, 2009) I J K arrays and R=2 (Rocci & Giordani, 2010) 42

Two-stage method for I J K arrays and general R First fit CP. In case of diverging components, do this: For combinations of nondiverging and groups of 2,3,4 diverging components, the form of the decomposition of the limit point X has been proven (Stegeman, 2012,2013) This form of decomposition is fitted to the data Z with initial values obtained from the diverging CP decomposition (Stegeman, 2012,2013) This yields X and its decomposition with interaction terms (Stegeman, 2012,2013) 43

The form of the limit of two diverging components is: g 111 (s 1 ๐t 1 ๐u 1 ) + g 221 (s 2 ๐t 2 ๐u 1 ) + g 212 (s 2 ๐t 1 ๐u 2 ) For the TV data with R=3, we fit the decomposition: g 111 (s 1 ๐t 1 ๐u 1 ) + g 221 (s 2 ๐t 2 ๐u 1 ) + g 212 (s 2 ๐t 1 ๐u 2 ) + g 333 (s 3 ๐t 3 ๐u 3 ) 44

Decomposition of the limit point in 4 terms Overall: fit = 50.7571 % (50.7569 for R=3) g 111 (s 1 ๐t 1 ๐u 1 ): fit = 7.62 % g 111 = 0.99 g 221 (s 2 ๐t 2 ๐u 1 ): fit = 10.75 % g 221 = 0.95 g 212 (s 2 ๐t 1 ๐u 2 ): fit = 1.55 % g 122 = 0.33 g 333 (s 3 ๐t 3 ๐u 3 ): fit = 24.37 % g 333 = 1.52 Interpretation: s 1 and t 1 = Violence s 2 and t 2 = Sensitivity s 3 and t 3 = Humor 45

Comparison to R=3 solution with orth. TV shows Humor Sensitivity Violence g 111 (s 1 ๐t 1 ๐u 1 ) -0.07 0.13 0.86 g 221 (s 2 ๐t 2 ๐u 1 ) 0.05 0.81 0.02 g 212 (s 2 ๐t 1 ๐u 2 ) -0.03-0.02-0.03 g 333 (s 3 ๐t 3 ๐u 3 ) 0.95 0.04-0.03 46

Interpretation of the decomposition in 4 terms s r = TV show component r t r = Rating scale loadings r u r = Idealized person r TV shows scales id. person weight g (s 1 ๐t 1 ๐u 1 ) Violent Violence 1 0.99 (s 2 ๐t 2 ๐u 1 ) Sensitive Sensitivity 1 0.95 (s 2 ๐t 1 ๐u 2 ) Sensitive Violence 2 0.33 (s 3 ๐t 3 ๐u 3 ) Humorous Humor 3 1.52 Note: u 2 >0 or u 2 <0 varies per person 47

Remarks on TV-ratings analysis The decomposition of the limit point resembles the R=3 CP solution with orthogonal scales. However, orthogonality between Sensitivity and Violence is not intuitive. The sign of the interaction term between Sensitive TV shows and Violence scales differs per person. The decomposition g 111 (s 1 ๐t 1 ๐u 1 ) + g 221 (s 2 ๐t 2 ๐u 1 ) + g 212 (s 2 ๐t 1 ๐u 2 ) is only partially unique, but this does not affect interpretation. 48

General Remarks General results on (non)existence of best rank-r approximations only exist for: R=1 best approx. always exists Z 2 2 2 and R=2 (De Silva & Lim, 2008) Z generic I J 2 and R 2 (Stegeman, 2006, 2008, 2015) The form of the decomposition of the limit point X is a generalization of the Jordan canonical form for matrices (Stegeman, 2013) 49

References De Silva, V. & Lim, L-H. (2008). Tensor rank and the ill-posedness of the best low-rank approximation problem. SIAM Journal on Matrix Analysis and Applications, 30, 1084-1127. Harshman, R.A., & Lundy, M.E. (1984). Data preprocessing and the extended Parafac model. In: Research Methods for Multimode Data Analysis, H.G. Law et al., Editors, Praeger, New York, pp. 216-284. Harshman, R.A. (2004). The problem and nature of degenerate solutions or decompositions of 3-way arrays. Talk at the Tensor Decompositions Workshop, Palo Alto, CA, American Institute of Mathematics. Krijnen, W.P., Dijkstra, T.K. & Stegeman, A. (2008). On the non-existence of optimal solutions and the occurrence of degeneracy in the Candecomp/Parafac model. Psychometrika, 73, 431-439. Lim, L.-H. (2005). Optimal solutions to non-negative Parafac/ multilinear NMF always exist. Talk at the Workshop of Tensor Decompositions and Applications, CIRM, Luminy, Marseille, France. Lim, L.-H. & Comon, P. (2009). Nonnegative approximations of nonnegative tensors. Journal of Chemometrics, 23, 432-441. 50

Lundy, M.E., Harshman, R.A., & Kruskal, J.B. (1989). A two-stage procedure incorporating good features of both trilinear and quadrilinear models. In: Multiway Data Analysis, R. Coppi & S. Bolasco (Editors), North-Holland, pp. 123-130. Rocci, R. & Giordani, P. (2010). A weak degeneracy revealing decomposition for the Candecomp/Parafac model. Journal of Chemometrics, 24, 57-66. Stegeman, A. (2006). Degeneracy in Candecomp/Parafac explained for p p 2 arrays of rank p+1 or higher. Psychometrika, 71, 483-501. Stegeman, A. (2008). Low-rank approximation of generic p q 2 arrays and diverging components in the Candecomp/Parafac model. SIAM Journal on Matrix Analysis and Applications, 30, 988-1007 Stegeman, A. (2012). Candecomp/Parafac from diverging components to a decomposition in block terms. SIAM Journal on Matrix Analysis and Applications, 33, 291-316. Stegeman, A. (2013). A three-way Jordan canonical form as limit of low-rank tensor approximations. SIAM Journal on Matrix Analysis and Applications, 34, 624-650. 51

Stegeman, A. (2014). Finding the limit of diverging components in three-way Candecomp/Parafac - a demonstration of its practical merits. Computational Statistics & Data Analysis, 75, 203-216. Stegeman, A. (2015). On the (non)existence of best low-rank approximations of generic IxJx2 arrays. Submitted. arxiv: 1309.5727 Stegeman, A. & De Lathauwer, L. (2009). A method to avoid diverging components in the Candecomp/Parafac model for generic I xj x2 arrays. SIAM Journal on Matrix Analysis and Applications, 30, 1614-1638. 52