Available online at ScienceDirect. Procedia Computer Science 96 (2016)

20th International Conference on Knowledge Based and Intelligent Information and Engineering Systems

Connected categorical canonical covariance analysis for three-mode three-way data sets based on Tucker model

Jun Tsuchida a,*, Hiroshi Yadohisa b

a Graduate School of Culture and Information Science, Doshisha University, Tataramiyakodani 1-3, Kyotanabe, Kyoto 610-0394, Japan
b Department of Culture and Information Science, Doshisha University, Tataramiyakodani 1-3, Kyotanabe, Kyoto 610-0394, Japan

Abstract

When we work with two three-mode three-way data sets, such as panel data, we often investigate two types of factors: common factors, which represent relationships between the two data sets, and unique factors, which show the uniqueness of each data set relative to the other. We propose a method for investigating common and unique factors simultaneously. Canonical covariance analysis is an existing method that allows the estimation of common and unique factors simultaneously; however, this method was proposed for use with two-mode two-way data, and it is limited to quantitative data. Thus, canonical covariance analysis is not suitable for three-mode three-way data sets or for categorical data sets. To overcome this problem, we build on the concept of the Tucker model and the concept of non-metric principal component analysis to develop and propose a method suitable for the analysis of categorical three-mode three-way data sets. Moreover, we introduce connector matrices, making it easy to determine which factors are common and allowing the selection of different numbers of dimensions for the factors.

© 2016 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of KES International.
Keywords: Alternating least squares; Canonical correlation analysis; Dimension reduction; Matrix factorization

* Corresponding author. Tel.: +81-774-65-7657. E-mail address: jun.tsuchida0328@gmail.com
ISSN 1877-0509. doi:10.1016/j.procs.2016.08.270

1. Introduction

A three-mode three-way data set is obtained from the same set of objects and variables under different conditions; such data are obtained as a set of multivariate data. For example, panel data are often obtained by asking the same question of the same objects at different times. When we work with two three-mode three-way data sets, we often investigate two types of factors. One type is that of common factors; these factors show relationships between the two data sets. The other type is that of unique factors; these factors represent the uniqueness of each data set.

For the investigation of unique factors, we can apply dimension reduction methods such as the Tucker method (Tucker 1966; Kroonenberg 1983) or the parallel factor analysis (PARAFAC) method (Harshman 1970) to three-mode three-way data. These methods are suitable for finding the uniqueness of each data set because they are extensions of principal component analysis. Thus, we can interpret each individual data set well. However, this approach does not allow us to interpret relationships between the data sets.

For the investigation of common factors, we can apply canonical correlation analysis (Hotelling 1936) to three-mode three-way data sets. However, canonical correlation analysis was proposed as a method for two-mode two-way data, such as multivariate data. Therefore, it does not consider the conditions when searching for relationships between data sets. That is, canonical correlation analysis tends to regard the same variable under different conditions as being different variables.

Canonical covariance analysis is a method for investigating common factors and unique factors. However, like canonical correlation analysis, this method was also proposed for two-mode two-way data; therefore, it has the same problem as canonical correlation analysis. Furthermore, these two methods assume that the data are quantitative; thus, they are inadequate for qualitative data. Moreover, three-mode data often include qualitative variables. For example, panel data are often obtained via a questionnaire.

In this paper, we propose a method for investigating common factors and unique factors simultaneously. Our method is based on canonical covariance analysis, non-metric principal component analysis (Young et al. 1978), and the Tucker model. Using the concept of canonical covariance analysis, we can estimate common factors and unique factors simultaneously. Using three-mode three-way non-metric principal component analysis based on the Tucker model, we can apply the proposed method to three-mode three-way data that can be either quantitative or qualitative.

2. Notation

$X = (x_{i j_x k_x})$, $Y = (y_{i j_y k_y})$: the $(I \times J_x \times K_x)$ three-way array and the $(I \times J_y \times K_y)$ three-way array, respectively.
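$x_{i j_x k_x}$ and $y_{i j_y k_y}$ are the value of variable $j_x$ for object $i$ under condition $k_x$ and the value of variable $j_y$ for object $i$ under condition $k_y$, respectively.

$\tilde{X} = (\tilde{x}_{i p_{j_x} k_x})$, $\tilde{Y} = (\tilde{y}_{i p_{j_y} k_y})$: the $(I \times J_x P_x \times K_x)$ three-way array and the $(I \times J_y P_y \times K_y)$ three-way array, respectively. The elements of $\tilde{X}$ and $\tilde{Y}$ are dummy variables of $X$ and $Y$. $p_{j_x}$ and $p_{j_y}$ are the category numbers of variables $j_x$ and $j_y$, respectively: $p_{j_x} = 1, 2, \dots, P_{j_x}$; $P_x = \sum_{j_x=1}^{J_x} P_{j_x}$; $p_{j_y} = 1, 2, \dots, P_{j_y}$; $P_y = \sum_{j_y=1}^{J_y} P_{j_y}$.

$I$, $J_x$, $K_x$, $J_y$, $K_y$: the number of objects, variables of $X$, conditions of $X$, variables of $Y$, and conditions of $Y$, respectively.

$\tilde{X}_{I \times J_x K_x}$, $\tilde{Y}_{I \times J_y K_y}$: the $(I \times J_x P_x K_x)$ matrix $(\tilde{X}_{..1}, \tilde{X}_{..2}, \dots, \tilde{X}_{..K_x})$ and the $(I \times J_y P_y K_y)$ matrix $(\tilde{Y}_{..1}, \tilde{Y}_{..2}, \dots, \tilde{Y}_{..K_y})$, respectively, where $\tilde{X}_{..k_x}$ $(k_x = 1, 2, \dots, K_x)$ and $\tilde{Y}_{..k_y}$ $(k_y = 1, 2, \dots, K_y)$ are the $(I \times J_x P_x)$ matrix whose elements are $(\tilde{x}_{i p_{j_x} k_x})$ and the $(I \times J_y P_y)$ matrix whose elements are $(\tilde{y}_{i p_{j_y} k_y})$, respectively. These are the mode-1 matricizations of $\tilde{X}$ and $\tilde{Y}$, respectively.

$B_x$, $B_y$: the $(J_x \times r_{bx})$ loading matrix for variables of $X$ and the $(J_y \times r_{by})$ loading matrix for variables of $Y$, respectively.

$C_x$, $C_y$: the $(K_x \times r_{cx})$ loading matrix for conditions of $X$ and the $(K_y \times r_{cy})$ loading matrix for conditions of $Y$, respectively.

$Q_x = \mathrm{Bdiag}(\mathbf{q}_{j_x k_x})$, $Q_y = \mathrm{Bdiag}(\mathbf{q}_{j_y k_y})$: the $(J_x P_x K_x \times J_x K_x)$ weight matrix for dummy variables of $X$ and the $(J_y P_y K_y \times J_y K_y)$ weight matrix for dummy variables of $Y$, respectively. $\mathbf{q}_{j_x k_x}$ is the weight vector for variable $j_x$ under condition $k_x$, and $\mathbf{q}_{j_y k_y}$ is the weight vector for variable $j_y$ under condition $k_y$. Bdiag denotes a block diagonal matrix.

$XQ$, $YQ$: an $(I \times J_x \times K_x)$ weighted dummy three-way array of $X$ and an $(I \times J_y \times K_y)$ weighted dummy three-way array of $Y$, respectively. When these arrays are mode-1 matricized, the resulting matrices are equal to $\tilde{X}_{I \times J_x K_x} Q_x$ and $\tilde{Y}_{I \times J_y K_y} Q_y$, respectively.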

$XQ_{J_x \times I K_x}$, $XQ_{K_x \times I J_x}$: the mode-2 and mode-3 matricizations of $XQ$, respectively.

$YQ_{J_y \times I K_y}$, $YQ_{K_y \times I J_y}$: the mode-2 and mode-3 matricizations of $YQ$, respectively.

$F_x$, $F_y$: the $(I \times r_{bx} \times r_{cx})$ factor three-way array of $X$ and the $(I \times r_{by} \times r_{cy})$ factor three-way array of $Y$, respectively.

$F^{(x)}_{I \times r_{bx} r_{cx}}$, $F^{(x)}_{r_{bx} \times I r_{cx}}$, $F^{(x)}_{r_{cx} \times I r_{bx}}$: the mode-1, mode-2, and mode-3 matricizations of $F_x$, respectively.

$F^{(y)}_{I \times r_{by} r_{cy}}$, $F^{(y)}_{r_{by} \times I r_{cy}}$, $F^{(y)}_{r_{cy} \times I r_{by}}$: the mode-1, mode-2, and mode-3 matricizations of $F_y$, respectively.

$\tilde{X}_{j_x k_x}$, $\tilde{Y}_{j_y k_y}$: an $(I \times P_{j_x})$ dummy variable matrix of $X$ that satisfies mode 2 $= j_x$ and mode 3 $= k_x$, and an $(I \times P_{j_y})$ dummy variable matrix of $Y$ that satisfies mode 2 $= j_y$ and mode 3 $= k_y$, respectively.

$F^{(x)}_{k}$: the $(I \times r_{bx})$ sub-matrix of $F^{(x)}_{I \times r_{bx} r_{cx}}$ that satisfies mode 3 $= k$.

$[F^{(x)}]_j$: the $j$th column vector of $F^{(x)}$.

$\mathbf{f}^{(x)}_{i r_{bx} r_{cx}}$, $\mathbf{f}^{(y)}_{i r_{by} r_{cy}}$: the $i$th row vectors of $F^{(x)}$ and $F^{(y)}$, respectively.

$I_n$, $J_n$: the $n$-dimensional identity matrix and the $n$-dimensional centering matrix, respectively.

3. C4A based on Tucker model

In this section, we explain Connected Categorical Canonical Covariance Analysis (C4A) for three-mode three-way data sets, which is based on the Tucker model. First, we describe the model of Categorical Canonical Covariance Analysis (CCCA) for three-mode three-way data sets, which is an extension of the model for two-mode two-way data. Next, since this model does not represent the unique factors, we introduce the connect matrix, which identifies those dimensions that are common factors. Any factor that is not connected with any other factor by the connect matrix is a unique factor.

3.1. Categorical canonical covariance analysis for three-mode three-way data sets

Given two categorical three-mode three-way data sets $X$ and $Y$, we consider the objective function $f$, to be minimized, defined as follows:

$$
f(F_x, F_y, B_x, B_y, C_x, C_y, Q_x, Q_y \mid X, Y)
= \big\| \tilde{X}_{I \times J_x K_x} Q_x - F^{(x)}_{I \times r_b r_c} (C_x \otimes B_x)' \big\|^2
+ \big\| \tilde{Y}_{I \times J_y K_y} Q_y - F^{(y)}_{I \times r_b r_c} (C_y \otimes B_y)' \big\|^2
+ \big\| F^{(x)}_{I \times r_b r_c} - F^{(y)}_{I \times r_b r_c} \big\|^2
\tag{1}
$$

subject to $B_x' B_x = B_y' B_y = I_{r_b}$, $C_x' C_x = C_y' C_y = I_{r_c}$, $\tilde{X}_{j_x k_x} \mathbf{q}_{j_x k_x} = J_I \tilde{X}_{j_x k_x} \mathbf{q}_{j_x k_x}$, $I^{-1} \mathbf{q}_{j_x k_x}' \tilde{X}_{j_x k_x}' \tilde{X}_{j_x k_x} \mathbf{q}_{j_x k_x} = 1$ $(j_x = 1, 2, \dots, J_x;\ k_x = 1, 2, \dots, K_x)$, $\tilde{Y}_{j_y k_y} \mathbf{q}_{j_y k_y} = J_I \tilde{Y}_{j_y k_y} \mathbf{q}_{j_y k_y}$, and $I^{-1} \mathbf{q}_{j_y k_y}' \tilde{Y}_{j_y k_y}' \tilde{Y}_{j_y k_y} \mathbf{q}_{j_y k_y} = 1$ $(j_y = 1, 2, \dots, J_y;\ k_y = 1, 2, \dots, K_y)$.

When we set $A_x = C_x \otimes B_x$ and $A_y = C_y \otimes B_y$, the first and second terms of objective function $f$ are the same as those of two-mode two-way non-metric principal component analysis (NCA). Moreover, when $Q_x$ and $Q_y$ are given, the first and second terms of $f$ are the same as those of two-mode two-way principal component analysis. Therefore, we can regard the first and second terms of $f$ as a constrained NCA. The third term is similar to the objective function of canonical correlation analysis. This term represents the common factors: when $F_x$ is very different from $F_y$, this term takes a large value. Therefore, this method searches for a subspace that maximizes the variance of each data set and the covariance between the data sets simultaneously.

However, this method has two problems. First, we must set the numbers of dimensions of $C_x$ and $C_y$, and of $B_x$ and $B_y$, to be the same. That is, we must assume that the number of unique factors is the same for both data sets, an assumption that is not suitable for real-world data analysis. The other problem is that the third term considers all the factors; that is, it is difficult to determine which factors are the common factors.
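As a rough illustration of how the three terms of objective (1) behave, the NumPy sketch below evaluates them for data that have already been quantified. It assumes that the products of the dummy matrices with $Q_x$ and $Q_y$ are passed in directly as `XQ` and `YQ`; the function names `ccca_objective` and `random_orthonormal` and the toy sizes are hypothetical and do not come from the paper.

```python
import numpy as np

def ccca_objective(XQ, YQ, Fx, Fy, Bx, By, Cx, Cy):
    """Return the three terms of objective (1) separately.

    XQ, YQ : (I, Jx*Kx) and (I, Jy*Ky) quantified, mode-1 matricized data
             (assumed to be computed beforehand).
    Fx, Fy : (I, rb*rc) mode-1 matricized factor arrays.
    Bx, By, Cx, Cy : loading matrices with orthonormal columns.
    """
    Ax = np.kron(Cx, Bx)  # (Jx*Kx, rb*rc), variable index runs fastest
    Ay = np.kron(Cy, By)
    fit_x = np.linalg.norm(XQ - Fx @ Ax.T) ** 2   # first term
    fit_y = np.linalg.norm(YQ - Fy @ Ay.T) ** 2   # second term
    common = np.linalg.norm(Fx - Fy) ** 2         # third term
    return fit_x, fit_y, common

def random_orthonormal(rng, n_rows, n_cols):
    """Random matrix with orthonormal columns (toy initial value)."""
    q, _ = np.linalg.qr(rng.standard_normal((n_rows, n_cols)))
    return q

rng = np.random.default_rng(0)
I, Jx, Kx, Jy, Ky, rb, rc = 20, 5, 4, 6, 3, 2, 2
Bx, By = random_orthonormal(rng, Jx, rb), random_orthonormal(rng, Jy, rb)
Cx, Cy = random_orthonormal(rng, Kx, rc), random_orthonormal(rng, Ky, rc)

# Noise-free toy data generated from one shared factor matrix F.
F = rng.standard_normal((I, rb * rc))
XQ = F @ np.kron(Cx, Bx).T
YQ = F @ np.kron(Cy, By).T

terms_shared = ccca_objective(XQ, YQ, F, F, Bx, By, Cx, Cy)
terms_shifted = ccca_objective(XQ, YQ, F, F + 1.0, Bx, By, Cx, Cy)
```

With shared factors all three terms vanish, while moving $F^{(y)}$ away from $F^{(x)}$ inflates the third term, which is the behaviour the text attributes to it. Splitting the value into three parts is only a convenience for inspection; the objective itself is their sum.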

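3.2. Connected categorical canonical covariance analysis for three-mode three-way data sets

We use the same setting here as that for the CCCA for three-mode three-way data sets given in the previous subsection. To overcome the problems with CCCA, we introduce connect matrices $D_x$ and $D_y$. $D_x$ and $D_y$ indicate which factors are the common factors; that is, the factors connected by $D_x$ and $D_y$ are those that serve to maximize the covariance between the data sets. Using $D_x$ and $D_y$, we can select different numbers of dimensions for $C_x$ and $C_y$, and for $B_x$ and $B_y$. We describe an objective function $g$ for the connected categorical canonical covariance analysis (C4A) as follows:

$$
g(F_x, F_y, B_x, B_y, C_x, C_y, Q_x, Q_y, D_x, D_y \mid X, Y)
= \big\| \tilde{X}_{I \times J_x K_x} Q_x - F^{(x)}_{I \times r_{bx} r_{cx}} (C_x \otimes B_x)' \big\|^2
+ \big\| \tilde{Y}_{I \times J_y K_y} Q_y - F^{(y)}_{I \times r_{by} r_{cy}} (C_y \otimes B_y)' \big\|^2
+ \big\| F^{(x)}_{I \times r_{bx} r_{cx}} D_x - F^{(y)}_{I \times r_{by} r_{cy}} D_y \big\|^2
\tag{2}
$$

subject to $B_x' B_x = I_{r_{bx}}$, $B_y' B_y = I_{r_{by}}$, $C_x' C_x = I_{r_{cx}}$, $C_y' C_y = I_{r_{cy}}$, $\tilde{X}_{j_x k_x} \mathbf{q}_{j_x k_x} = J_I \tilde{X}_{j_x k_x} \mathbf{q}_{j_x k_x}$, $I^{-1} \mathbf{q}_{j_x k_x}' \tilde{X}_{j_x k_x}' \tilde{X}_{j_x k_x} \mathbf{q}_{j_x k_x} = 1$ $(j_x = 1, 2, \dots, J_x;\ k_x = 1, 2, \dots, K_x)$, $\tilde{Y}_{j_y k_y} \mathbf{q}_{j_y k_y} = J_I \tilde{Y}_{j_y k_y} \mathbf{q}_{j_y k_y}$, $I^{-1} \mathbf{q}_{j_y k_y}' \tilde{Y}_{j_y k_y}' \tilde{Y}_{j_y k_y} \mathbf{q}_{j_y k_y} = 1$ $(j_y = 1, 2, \dots, J_y;\ k_y = 1, 2, \dots, K_y)$, $D_x = D_{cx} \otimes D_{bx}$, $D_y = D_{cy} \otimes D_{by}$, $D_{bx} \in \{0, 1\}^{r_{bx} \times c_b}$, $D_{by} \in \{0, 1\}^{r_{by} \times c_b}$, $D_{cx} \in \{0, 1\}^{r_{cx} \times c_c}$, $D_{cy} \in \{0, 1\}^{r_{cy} \times c_c}$,

$$
\sum_{r=1}^{r_{bx}} d^{(bx)}_{r q_b} = 1, \quad \sum_{r=1}^{r_{by}} d^{(by)}_{r q_b} = 1 \quad (q_b = 1, 2, \dots, c_b), \qquad
\sum_{r=1}^{r_{cx}} d^{(cx)}_{r q_c} = 1, \quad \sum_{r=1}^{r_{cy}} d^{(cy)}_{r q_c} = 1 \quad (q_c = 1, 2, \dots, c_c).
$$

When we set $r_{bx} = r_{by}$, $r_{cx} = r_{cy}$, $c_c = r_{cx}$, $c_b = r_{bx}$, $D_x = I_{r_{cx} r_{bx}}$, and $D_y = I_{r_{cx} r_{bx}}$, the objective function $g$ is equal to the objective function $f$ of the CCCA. We describe $D_x$ and $D_y$ as $D_x = D_{cx} \otimes D_{bx}$ and $D_y = D_{cy} \otimes D_{by}$. $D_{bx}$ and $D_{by}$ represent a common factor for variables, while $D_{cx}$ and $D_{cy}$ represent a common factor for conditions.

3.3. Algorithm for C4A

To estimate the parameters of C4A, we use an alternating least squares algorithm, which is described as follows:

Step 1: Set $r_{bx}$, $r_{by}$, $r_{cx}$, $r_{cy}$, $c_c$, and $c_b$
Step 2: Initialize $B_x$, $B_y$, $C_x$, $C_y$, $F_x$, and $F_y$
Step 3: Update $Q_x$ and $Q_y$
Step 4: Update $D_x$ and $D_y$
Step 5: Update $F_x$ and $F_y$
Step 6: Update $C_x$ and $C_y$
Step 7: Update $B_x$ and $B_y$
Step 8: Repeat Steps 3 to 7 until the value of the objective function converges

We explain the details of the steps for updating parameters in the subsections that follow.

3.3.1. Updating $Q_x$ and $Q_y$

Given $B_x$, $B_y$, $C_x$, $C_y$, $F_x$, and $F_y$, we obtain an objective function $g$ as follows:

$$
g(Q_x, Q_y \mid B_x, B_y, C_x, C_y, F_x, F_y, X, Y)
= \big\| \tilde{X}_{I \times J_x K_x} Q_x - F^{(x)}_{I \times r_{bx} r_{cx}} (C_x \otimes B_x)' \big\|^2
+ \big\| \tilde{Y}_{I \times J_y K_y} Q_y - F^{(y)}_{I \times r_{by} r_{cy}} (C_y \otimes B_y)' \big\|^2 + \mathrm{const}
\tag{3}
$$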

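where const is a constant unrelated to $Q_x$ and $Q_y$. From equation (3), the formula for updating $Q_x$ is independent of that for updating $Q_y$. Thus, we first describe the formula for updating $Q_x$. We rewrite the first term in equation (3) as follows:

$$
\big\| \tilde{X}_{I \times J_x K_x} Q_x - F^{(x)}_{I \times r_{bx} r_{cx}} (C_x \otimes B_x)' \big\|^2
= -2 \operatorname{tr}\big( Q_x' \tilde{X}_{I \times J_x K_x}' F^{(x)}_{I \times r_{bx} r_{cx}} (C_x \otimes B_x)' \big) + \mathrm{const}.
\tag{4}
$$

From equation (4), the $Q_x$ that minimizes equation (3) is the $Q_x$ that maximizes $\operatorname{tr}\big( Q_x' \tilde{X}_{I \times J_x K_x}' F^{(x)}_{I \times r_{bx} r_{cx}} (C_x \otimes B_x)' \big)$. Because $Q_x$ is block diagonal by definition, we can maximize this trace by considering each $\mathbf{q}_{j_x k_x}$ separately. The objective function $g^{*}$ for $\mathbf{q}_{j_x k_x}$ is obtained as follows:

$$
g^{*}(\mathbf{q}_{j_x k_x} \mid C_x, B_x, F_x, X)
= \operatorname{tr}\Big( \mathbf{q}_{j_x k_x}' \tilde{X}_{j_x k_x}' \sum_{r=1}^{r_{cx}} c_{k_x r} F^{(x)}_{r} \mathbf{b}_{j_x} \Big),
$$

where $c_{k_x r}$ is the $(k_x, r)$ element of $C_x$ and $\mathbf{b}_{j_x}$ is the $j_x$th row of $B_x$ written as a column vector. Given the constraint on $\mathbf{q}_{j_x k_x}$, this objective function $g^{*}$ is very similar to the objective function of canonical correlation analysis. Therefore, we obtain the formula for updating $\mathbf{q}_{j_x k_x}$ as follows:

$$
\mathbf{q}_{j_x k_x} = \sqrt{I}\, (\tilde{X}_{j_x k_x}' \tilde{X}_{j_x k_x})^{-1/2} \mathbf{u}^{(q_x)},
$$

where $\mathbf{u}^{(q_x)}$ is the first left singular vector of $(\tilde{X}_{j_x k_x}' \tilde{X}_{j_x k_x})^{-1/2} \tilde{X}_{j_x k_x}' J_I \big( \sum_{r=1}^{r_{cx}} c_{k_x r} F^{(x)}_{r} \mathbf{b}_{j_x} \big)$. The formula for updating $\mathbf{q}_{j_y k_y}$ is obtained in the same way, with the result as follows:

$$
\mathbf{q}_{j_y k_y} = \sqrt{I}\, (\tilde{Y}_{j_y k_y}' \tilde{Y}_{j_y k_y})^{-1/2} \mathbf{u}^{(q_y)},
\tag{5}
$$

where $\mathbf{u}^{(q_y)}$ is the first left singular vector of $(\tilde{Y}_{j_y k_y}' \tilde{Y}_{j_y k_y})^{-1/2} \tilde{Y}_{j_y k_y}' J_I \big( \sum_{r=1}^{r_{cy}} c_{k_y r} F^{(y)}_{r} \mathbf{b}_{j_y} \big)$.

3.3.2. Updating $D_x$ and $D_y$

Given $F_x$ and $F_y$, to derive the formulas for updating $D_x$ and $D_y$, we can rewrite the objective function of C4A as follows:

$$
\begin{aligned}
\big\| F^{(x)} D_x - F^{(y)} D_y \big\|^2 + \mathrm{const}
&= \big\| F^{(x)} (D_{cx} \otimes D_{bx}) - F^{(y)} (D_{cy} \otimes D_{by}) \big\|^2 + \mathrm{const} \\
&= \big\| D_{bx}' F^{(x)}_{r_{bx} \times I r_{cx}} (D_{cx} \otimes I_n) - D_{by}' F^{(y)}_{r_{by} \times I r_{cy}} (D_{cy} \otimes I_n) \big\|^2 + \mathrm{const} \qquad (6) \\
&= \big\| D_{cx}' F^{(x)}_{r_{cx} \times I r_{bx}} (D_{bx} \otimes I_n) - D_{cy}' F^{(y)}_{r_{cy} \times I r_{by}} (D_{by} \otimes I_n) \big\|^2 + \mathrm{const} \qquad (7)
\end{aligned}
$$

Given $D_y$ and $D_{cx}$, we can regard equation (6) as a k-means objective function. Thus, we obtain the formula for updating $D_{bx}$ as follows:

$$
d^{(bx)}_{r q} =
\begin{cases}
1 & \Big( r = \arg\min_{r'} \big\| \big[ F^{(x)}_{r_{bx} \times I r_{cx}} (D_{cx} \otimes I_n) \big]_{r'} - \mathbf{d}^{(by)\prime}_{q} F^{(y)}_{r_{by} \times I r_{cy}} (D_{cy} \otimes I_n) \big\|^2 \Big) \\
0 & (\text{otherwise})
\end{cases}
\qquad (q = 1, 2, \dots, c_b),
$$

where $[\,\cdot\,]_{r'}$ here denotes the $r'$th row and $\mathbf{d}^{(by)}_{q}$ is the $q$th column of $D_{by}$.

The formula for updating $D_{cx}$ is derived in the same way. Given $D_y$ and $D_{bx}$, we can also regard equation (7) as a k-means objective function, thus obtaining the following formula for updating $D_{cx}$:

$$
d^{(cx)}_{r q} =
\begin{cases}
1 & \Big( r = \arg\min_{r'} \big\| \big[ F^{(x)}_{r_{cx} \times I r_{bx}} (D_{bx} \otimes I_n) \big]_{r'} - \mathbf{d}^{(cy)\prime}_{q} F^{(y)}_{r_{cy} \times I r_{by}} (D_{by} \otimes I_n) \big\|^2 \Big) \\
0 & (\text{otherwise})
\end{cases}
\qquad (q = 1, 2, \dots, c_c).
$$

$D_y$ is updated in the same way as $D_x$.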
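To make the role of the connect matrices concrete, the following NumPy sketch builds binary matrices with exactly one 1 per column, forms $D_x = D_{cx} \otimes D_{bx}$ and $D_y = D_{cy} \otimes D_{by}$, and evaluates the third term of objective (2). The helper `connect_matrix`, the chosen connections, and the toy sizes are hypothetical; the sketch does not implement the update rules above, only the structure they operate on.

```python
import numpy as np

def connect_matrix(rows, n_factors):
    """Binary (n_factors x len(rows)) matrix with a single 1 per column.

    rows[q] is the (0-based) index of the factor that is connected to
    common dimension q.
    """
    D = np.zeros((n_factors, len(rows)), dtype=int)
    D[rows, np.arange(len(rows))] = 1
    return D

# Different numbers of factors for the two data sets (toy values).
rbx, rcx, rby, rcy = 3, 3, 2, 4

Dbx = connect_matrix([0, 2], rbx)   # c_b = 2 common variable factors
Dby = connect_matrix([1, 0], rby)
Dcx = connect_matrix([1], rcx)      # c_c = 1 common condition factor
Dcy = connect_matrix([3], rcy)

Dx = np.kron(Dcx, Dbx)              # (rcx*rbx, c_c*c_b)
Dy = np.kron(Dcy, Dby)              # (rcy*rby, c_c*c_b)

rng = np.random.default_rng(0)
n_objects = 10
Fx = rng.standard_normal((n_objects, rbx * rcx))  # mode-1 matricized F_x
Fy = rng.standard_normal((n_objects, rby * rcy))  # mode-1 matricized F_y

# Third term of objective (2): only connected factor columns are compared.
third_term = np.linalg.norm(Fx @ Dx - Fy @ Dy) ** 2
```

Multiplying by $D_x$ simply selects the connected columns of $F^{(x)}$, so factors that are not selected do not enter the third term; these are the unique factors in the sense of Section 3. The sketch also shows why the two data sets may use different numbers of factors: only the number of columns of $D_x$ and $D_y$ has to agree.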

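3.3.3. Updating $F_x$ and $F_y$

Given $B_x$, $B_y$, $C_x$, $C_y$, $Q_x$, $Q_y$, $D_x$, and $D_y$, we consider the formulas for updating $F_x$ and $F_y$. For updating $F_x$, we fix $F_y$. Then, we can rewrite the objective function of C4A as follows:

$$
g(F_x \mid B_x, B_y, C_x, C_y, F_y, Q_x, Q_y, D_x, D_y, X, Y)
= \big\| \tilde{X}_{I \times J_x K_x} Q_x - F^{(x)}_{I \times r_{bx} r_{cx}} (C_x \otimes B_x)' \big\|^2
+ \big\| F^{(x)}_{I \times r_{bx} r_{cx}} D_x - F^{(y)}_{I \times r_{by} r_{cy}} D_y \big\|^2 + \mathrm{const}.
\tag{8}
$$

This objective function is similar to that of ridge regression; that is, the first term may be regarded as a regression term and the second term may be regarded as a penalty term. Thus, we obtain the following formula for updating $F_x$:

$$
F^{(x)}_{I \times r_{bx} r_{cx}} = \big( \tilde{X}_{I \times J_x K_x} Q_x (C_x \otimes B_x) + F^{(y)}_{I \times r_{by} r_{cy}} D_y D_x' \big) \big( I_{r_{bx} r_{cx}} + D_x D_x' \big)^{-1}.
$$

The formula for updating $F_y$ is obtained in the same way as that for $F_x$ and is as follows:

$$
F^{(y)}_{I \times r_{by} r_{cy}} = \big( \tilde{Y}_{I \times J_y K_y} Q_y (C_y \otimes B_y) + F^{(x)}_{I \times r_{bx} r_{cx}} D_x D_y' \big) \big( I_{r_{by} r_{cy}} + D_y D_y' \big)^{-1}.
$$

3.3.4. Updating $C_x$ and $C_y$

Given $B_x$, $B_y$, $F_x$, $F_y$, $Q_x$, and $Q_y$, we consider the formulas for updating $C_x$ and $C_y$. We can rewrite the objective function of C4A for updating $C_x$ and $C_y$ as follows:

$$
\begin{aligned}
g(C_x, C_y \mid B_x, B_y, F_x, F_y, Q_x, Q_y, X, Y)
&= \big\| \tilde{X}_{I \times J_x K_x} Q_x - F^{(x)}_{I \times r_{bx} r_{cx}} (C_x \otimes B_x)' \big\|^2
+ \big\| \tilde{Y}_{I \times J_y K_y} Q_y - F^{(y)}_{I \times r_{by} r_{cy}} (C_y \otimes B_y)' \big\|^2 + \mathrm{const} \qquad (9) \\
&= \big\| XQ_{K_x \times I J_x} - C_x F^{(x)}_{r_{cx} \times I r_{bx}} (B_x \otimes I_n)' \big\|^2
+ \big\| YQ_{K_y \times I J_y} - C_y F^{(y)}_{r_{cy} \times I r_{by}} (B_y \otimes I_n)' \big\|^2 + \mathrm{const} \qquad (10)
\end{aligned}
$$

From equation (9), we can see that $C_x$ is unrelated to $C_y$. Thus, we can update $C_x$ and $C_y$ simultaneously. First, we consider the formula for updating $C_x$. From equation (10) and the constraint on $C_x$, we obtain the formula for updating $C_x$ using the same method as Procrustes rotation (Zou et al. 2006). The resulting formula for updating $C_x$ is $C_x = U_{cx} V_{cx}'$, where $U_{cx}$ and $V_{cx}$ are the left and right singular matrices, respectively, of $XQ_{K_x \times I J_x} (B_x \otimes I_n) F^{(x)\prime}_{r_{cx} \times I r_{bx}}$. In the same way, we obtain the formula for updating $C_y$, which is $C_y = U_{cy} V_{cy}'$, where $U_{cy}$ and $V_{cy}$ are the left and right singular matrices, respectively, of $YQ_{K_y \times I J_y} (B_y \otimes I_n) F^{(y)\prime}_{r_{cy} \times I r_{by}}$.

3.3.5. Updating $B_x$ and $B_y$

Updating $B_x$ and $B_y$ is very similar to updating $C_x$ and $C_y$.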
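First, given $C_x$, $C_y$, $F_x$, $F_y$, $Q_x$, and $Q_y$, we rewrite the objective function for updating $B_x$ and $B_y$ as follows:

$$
\begin{aligned}
g(B_x, B_y \mid C_x, C_y, F_x, F_y, Q_x, Q_y, X, Y)
&= \big\| \tilde{X}_{I \times J_x K_x} Q_x - F^{(x)}_{I \times r_{bx} r_{cx}} (C_x \otimes B_x)' \big\|^2
+ \big\| \tilde{Y}_{I \times J_y K_y} Q_y - F^{(y)}_{I \times r_{by} r_{cy}} (C_y \otimes B_y)' \big\|^2 + \mathrm{const} \qquad (11) \\
&= \big\| XQ_{J_x \times I K_x} - B_x F^{(x)}_{r_{bx} \times I r_{cx}} (C_x \otimes I_n)' \big\|^2
+ \big\| YQ_{J_y \times I K_y} - B_y F^{(y)}_{r_{by} \times I r_{cy}} (C_y \otimes I_n)' \big\|^2 + \mathrm{const}. \qquad (12)
\end{aligned}
$$

From equation (11), we see that $B_x$ is unrelated to $B_y$. Therefore, we can update $B_x$ and $B_y$ simultaneously. The objective function for updating $B_x$ and $B_y$ has the same form as the objective function for updating $C_x$ and $C_y$. Thus, we obtain the following formula for updating $B_x$: $B_x = U_{bx} V_{bx}'$, where $U_{bx}$ and $V_{bx}$ are the left and right singular matrices, respectively, of $XQ_{J_x \times I K_x} (C_x \otimes I_n) F^{(x)\prime}_{r_{bx} \times I r_{cx}}$. In the same way, we obtain the formula for updating $B_y$: $B_y = U_{by} V_{by}'$, where $U_{by}$ and $V_{by}$ are the left and right singular matrices, respectively, of $YQ_{J_y \times I K_y} (C_y \otimes I_n) F^{(y)\prime}_{r_{by} \times I r_{cy}}$.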
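The closed-form updates of Sections 3.3.3 to 3.3.5 can be sketched in NumPy as below. This is a minimal illustration on generic toy matrices, not the authors' implementation: `update_F` assumes the loading matrix `A` (standing in for $C_x \otimes B_x$) has orthonormal columns, and `update_orthonormal` takes the cross-product matrix `M` (for $C_x$ this would be $XQ_{K_x \times I J_x} (B_x \otimes I_n) F^{(x)\prime}_{r_{cx} \times I r_{bx}}$) as given. All function and variable names are hypothetical.

```python
import numpy as np

def update_F(XQ, A, F_other, D_self, D_other):
    """Closed-form minimizer over F of
        ||XQ - F A'||^2 + ||F D_self - F_other D_other||^2,
    assuming A has orthonormal columns (A'A = I)."""
    r = A.shape[1]
    rhs = XQ @ A + F_other @ D_other @ D_self.T
    gram = np.eye(r) + D_self @ D_self.T      # symmetric positive definite
    return np.linalg.solve(gram, rhs.T).T     # equals rhs @ inv(gram)

def update_orthonormal(M):
    """Return U V' from the thin SVD M = U S V'.

    Among matrices C with C'C = I, this choice maximizes tr(C' M),
    which is the Procrustes-type step used for the loading matrices."""
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(1)

# Toy check of the F update: the gradient should vanish at the solution.
n_obj, p, r = 30, 12, 4
A, _ = np.linalg.qr(rng.standard_normal((p, r)))
XQ = rng.standard_normal((n_obj, p))
Fy = rng.standard_normal((n_obj, r))
Dx = np.eye(r)[:, [0, 2]]   # connects factors 0 and 2 of the first set
Dy = np.eye(r)[:, [1, 3]]   # to factors 1 and 3 of the second set
Fx = update_F(XQ, A, Fy, Dx, Dy)
grad = -2 * (XQ - Fx @ A.T) @ A + 2 * (Fx @ Dx - Fy @ Dy) @ Dx.T

# Toy check of the orthonormal update on min ||Z - C G||^2 s.t. C'C = I,
# whose cross-product matrix is M = Z G'.
K, rc, m = 8, 3, 50
C_true, _ = np.linalg.qr(rng.standard_normal((K, rc)))
G = rng.standard_normal((rc, m))
Z = C_true @ G              # noise-free data generated from C_true
C_hat = update_orthonormal(Z @ G.T)
```

Solving the linear system instead of forming the inverse explicitly is a numerical convenience; it gives the same result as the formula with $(I + D_x D_x')^{-1}$. In the noise-free toy problem the orthonormal update recovers the generating loading matrix, which is a quick sanity check for an implementation of Steps 6 and 7.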

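4. Numerical example

In this section, we show that the C4A estimator has less bias than previous methods when it is applied to three-mode three-way data generated under the C4A conditions. To evaluate the estimated loading parameters, we compare C4A with CCCA for three-mode three-way data and with CCCA for two-mode two-way data. We set the true parameters $B_x$, $B_y$, $C_x$, and $C_y$ as follows:

$$
\begin{aligned}
B_x &= (\mathbf{b}_{x1}, \mathbf{b}_{x2}, \mathbf{b}_{x3}), & B_y &= (\mathbf{b}_{y1}, \mathbf{b}_{y2}, \mathbf{b}_{y3}), & C_x &= (\mathbf{c}_{x1}, \mathbf{c}_{x2}, \mathbf{c}_{x3}), & C_y &= (\mathbf{c}_{y1}, \mathbf{c}_{y2}, \mathbf{c}_{y3}), \\
\mathbf{b}_{x1} &= (\mathbf{1}_5', \mathbf{0}_{10}')', & \mathbf{b}_{x2} &= (\mathbf{0}_5', \mathbf{1}_5', \mathbf{0}_5')', & \mathbf{b}_{x3} &= (\mathbf{0}_{10}', \mathbf{1}_5')', \\
\mathbf{b}_{y1} &= (\mathbf{1}_5', \mathbf{0}_{11}')', & \mathbf{b}_{y2} &= (\mathbf{0}_5', \mathbf{1}_5', \mathbf{0}_6')', & \mathbf{b}_{y3} &= (\mathbf{0}_{10}', \mathbf{1}_6')', \\
\mathbf{c}_{x1} &= (\mathbf{1}_5', \mathbf{0}_{11}')', & \mathbf{c}_{x2} &= (\mathbf{0}_5', \mathbf{1}_6', \mathbf{0}_5')', & \mathbf{c}_{x3} &= (\mathbf{0}_{11}', \mathbf{1}_5')', \\
\mathbf{c}_{y1} &= (\mathbf{1}_4', \mathbf{0}_8')', & \mathbf{c}_{y2} &= (\mathbf{0}_4', \mathbf{1}_3', \mathbf{0}_5')', & \mathbf{c}_{y3} &= (\mathbf{0}_7', \mathbf{1}_5')',
\end{aligned}
$$

where $\mathbf{1}_d$ and $\mathbf{0}_d$ are the $d$-dimensional one vector and the $d$-dimensional zero vector, respectively. Then, to satisfy the constraints, we normalize the loading matrices. $F_x$ and $F_y$ are generated as follows:

$$
\big( \mathbf{f}^{(x)\prime}_{i r_{bx} r_{cx}},\ \mathbf{f}^{(y)\prime}_{i r_{by} r_{cy}} \big)' \overset{\text{i.i.d.}}{\sim} N(\mathbf{0}, \Sigma), \qquad
\Sigma = (\sigma_{ij}), \qquad
\sigma_{ij} =
\begin{cases}
1 & (i = j) \\
0.8 & ((i, j) \in \{(\ 9), (3\ 7), (4\ 6), (6\ 4)\}) \\
0 & (\text{otherwise}).
\end{cases}
$$

This setting represents the case in which there are two common factor loadings each for variables and for conditions. Thus, there are four common factors in these data. To generate data sets $X$ and $Y$, we first set score data sets $X^{*}$ and $Y^{*}$ as follows:

$$
X^{*}_{I \times J_x K_x} = F^{(x)}_{I \times r_{bx} r_{cx}} (C_x \otimes B_x)' + E_x, \qquad
Y^{*}_{I \times J_y K_y} = F^{(y)}_{I \times r_{by} r_{cy}} (C_y \otimes B_y)' + E_y,
$$

$$
E_x = (\varepsilon^{(x)}_{ij}), \quad E_y = (\varepsilon^{(y)}_{ij}), \quad
\varepsilon^{(x)}_{ij} \overset{\text{i.i.d.}}{\sim} N(0, sd^2), \quad
\varepsilon^{(y)}_{ij} \overset{\text{i.i.d.}}{\sim} N(0, sd^2).
$$

Then, we generate $X$ and $Y$ as follows:

$$
x_{i j_x k_x} =
\begin{cases}
1 & (x^{*}_{i j_x k_x} \le \mathrm{Quantile}(\mathbf{x}^{*}_{j_x k_x}, 0.25)) \\
2 & (\mathrm{Quantile}(\mathbf{x}^{*}_{j_x k_x}, 0.25) < x^{*}_{i j_x k_x} \le \mathrm{Quantile}(\mathbf{x}^{*}_{j_x k_x}, 0.45)) \\
3 & (\mathrm{Quantile}(\mathbf{x}^{*}_{j_x k_x}, 0.45) < x^{*}_{i j_x k_x} \le \mathrm{Quantile}(\mathbf{x}^{*}_{j_x k_x}, 0.85)) \\
4 & (\mathrm{Quantile}(\mathbf{x}^{*}_{j_x k_x}, 0.85) < x^{*}_{i j_x k_x})
\end{cases}
\qquad (i = 1, 2, \dots, n;\ j_x = 1, 2, \dots, 15;\ k_x = 1, 2, \dots, 16),
$$

$$
y_{i j_y k_y} =
\begin{cases}
1 & (y^{*}_{i j_y k_y} \le \mathrm{Quantile}(\mathbf{y}^{*}_{j_y k_y}, 0.2)) \\
2 & (\mathrm{Quantile}(\mathbf{y}^{*}_{j_y k_y}, 0.2) < y^{*}_{i j_y k_y} \le \mathrm{Quantile}(\mathbf{y}^{*}_{j_y k_y}, 0.4)) \\
3 & (\mathrm{Quantile}(\mathbf{y}^{*}_{j_y k_y}, 0.4) < y^{*}_{i j_y k_y} \le \mathrm{Quantile}(\mathbf{y}^{*}_{j_y k_y}, 0.6)) \\
4 & (\mathrm{Quantile}(\mathbf{y}^{*}_{j_y k_y}, 0.6) < y^{*}_{i j_y k_y} \le \mathrm{Quantile}(\mathbf{y}^{*}_{j_y k_y}, 0.8)) \\
5 & (\mathrm{Quantile}(\mathbf{y}^{*}_{j_y k_y}, 0.8) < y^{*}_{i j_y k_y})
\end{cases}
\qquad (i = 1, 2, \dots, n;\ j_y = 1, 2, \dots, 16;\ k_y = 1, 2, \dots, 12),
$$

where $\mathbf{x}^{*}_{j_x k_x}$ is the $I$-dimensional vector of variable $j_x$ under condition $k_x$ of $X^{*}$, $\mathbf{y}^{*}_{j_y k_y}$ is the $I$-dimensional vector of variable $j_y$ under condition $k_y$ of $Y^{*}$, and $\mathrm{Quantile}(\mathbf{x}, h)$ is the function returning the $h$-quantile of $\mathbf{x}$.

We set the number of objects to 300 and 500 and the standard deviation $sd$ of the noise to 0.1 and 0.3. We set the number of dimensions of $C_x$, $C_y$, $B_x$, and $B_y$ to 3. For the two-mode two-way analysis, we set the number of dimensions of $A_x$ and $A_y$ to 9, because the numbers of dimensions of $C_x \otimes B_x$ and $C_y \otimes B_y$ are 9. For each estimator, we calculate the mean squared error as follows:

$$
\frac{1}{R} \sum_{r=1}^{R} \Big( \big\| \hat{C}_x \otimes \hat{B}_x - C_x \otimes B_x \big\|^2 + \big\| \hat{C}_y \otimes \hat{B}_y - C_y \otimes B_y \big\|^2 \Big),
$$

where $R$ is the number of replications and the estimates are those obtained in the $r$th replication. We set $R$ to 100. When we evaluate the mean squared error for the two-mode two-way method, we set $\hat{C}_x \otimes \hat{B}_x = \hat{A}_x$ and $\hat{C}_y \otimes \hat{B}_y = \hat{A}_y$.

Table 1 and Fig. 1 show the simulation results. From Table 1, C4A has the smallest mean squared error. However, the standard deviations of the C4A estimator and of the CCCA estimator for three-mode three-way data are larger than that of the two-mode two-way method. One reason is that C4A and CCCA for three-mode three-way data both suffer from local minima.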
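The quantile-based categorization used to turn the score data into categorical data can be sketched as follows. The function name `discretize_by_quantiles` and the toy score matrices are hypothetical; each column plays the role of one variable-condition pair, and the cut points follow the probabilities stated above (0.25, 0.45, 0.85 for $X$ and 0.2, 0.4, 0.6, 0.8 for $Y$).

```python
import numpy as np

def discretize_by_quantiles(scores, probs):
    """Map each column of `scores` to categories 1..len(probs)+1.

    The cut points are the column's own quantiles at `probs`, and the
    intervals are closed on the right, as in the definition above.
    """
    scores = np.asarray(scores, dtype=float)
    cats = np.empty(scores.shape, dtype=int)
    for col in range(scores.shape[1]):
        cuts = np.quantile(scores[:, col], probs)
        # side="left": value <= cuts[0] gives 0, cuts[0] < value <= cuts[1] gives 1, ...
        cats[:, col] = np.searchsorted(cuts, scores[:, col], side="left") + 1
    return cats

rng = np.random.default_rng(2)
n = 1000
x_scores = rng.standard_normal((n, 3))   # toy stand-in for columns of X*
y_scores = rng.standard_normal((n, 2))   # toy stand-in for columns of Y*

x_cat = discretize_by_quantiles(x_scores, [0.25, 0.45, 0.85])
y_cat = discretize_by_quantiles(y_scores, [0.2, 0.4, 0.6, 0.8])

# Share of objects falling into each category of the first column.
x_shares = np.bincount(x_cat[:, 0], minlength=5)[1:] / n
y_shares = np.bincount(y_cat[:, 0], minlength=6)[1:] / n
```

Because the cut points are computed per column, every variable-condition pair ends up with roughly the same category proportions (about 25%, 20%, 40%, and 15% for $X$), regardless of the scale of the underlying scores.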

Table 1. Mean squared error of each estimator. The value in parentheses is the standard deviation.

Setting              C4A             CCCA for three-mode three-way   CCCA for two-mode two-way
n = 300, sd = 0.1    7.439 (3.395)   .445 (3.283)                    26.356 (.548)
n = 500, sd = 0.1    7.860 (3.503)   .357 (3.546)                    26.370 (.665)
n = 300, sd = 0.3    7.860 (3.587)   2.86 (3.253)                    26.340 (.32)
n = 500, sd = 0.3    7.806 (3.40)    .653 (3.004)                    26.355 (.524)

Fig. 1. Boxplots of the sum of squared errors, with one panel per setting (n = 300, sd = 0.1; n = 500, sd = 0.1; n = 300, sd = 0.3; n = 500, sd = 0.3). The y-axis shows the sum of squared errors. The labels C4A, CCCA, and Two-mode on the x-axis stand for the boxplots of C4A, CCCA for three-mode three-way data, and CCCA for two-mode two-way data, respectively.

5. Conclusion

We have proposed the C4A method. This method has three advantages. First, it can be applied to categorical data sets. The C4A method is based on NCA; therefore, it is easy to extend C4A to a version based on multiple correspondence analysis. Second, it is easy to identify which factors are the common factors. Third, it is easy to identify which factors are the unique factors.

For future study, we see a need to accelerate the algorithm. One loop of the algorithm must perform singular value decomposition at least $m_x + m_y + 4$ times; thus, the larger the number of iterations, the longer the calculation time. When the algorithm is applied to data that have a large number of dimensions in modes 2 and 3, the number of iterations will tend to be larger than when it is applied to data having a small number of dimensions in modes 2 and 3. To overcome this problem, we would consider applying an acceleration method, such as that described in Kuroda et al. (2012), to the C4A algorithm.

References

1. Harshman, R. A. (1970). Foundations of the PARAFAC procedure: Models and conditions for an explanatory multi-modal factor analysis. UCLA Working Papers in Phonetics, 16, 1–84.
2. Hotelling, H. (1936). Relations between two sets of variates. Biometrika, 28, 321–377.
3. Kroonenberg, P. M. (1983). Three-Mode Principal Component Analysis: Theory and Applications (Vol. 2). DSWO Press.
4. Kuroda, M., Mori, Y., Iizuka, M., and Sakakihara, M. (2012). Acceleration of convergence of the alternating least squares algorithm for nonlinear principal components analysis. In Principal Component Analysis (Sanguansat, P., Ed.), 129–144.
5. Tucker, L. R. (1966). Some mathematical notes on three-mode factor analysis. Psychometrika, 31, 279–311.
6. Young, F. W., Takane, Y., and de Leeuw, J. (1978). The principal components of mixed measurement level multivariate data: An alternating least squares method with optimal scaling features. Psychometrika, 43, 279–281.
7. Zou, H., Hastie, T., and Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15, 265–286.