N-mode Analysis (Tensor Framework) Behrouz Saghafi

N-mode Analysis (Tensor Framework)
Drawback of 1-mode analysis (e.g. PCA): it captures the variance along just a single factor.
Our training set varies in more than one factor: people, action, viewpoint, etc.
This motivates analysis in multiple modes.

N-mode Analysis (Related work)
Ding and Ye [1] extend the common matrix SVD to 2D-SVD.
2D-LDA has been introduced by Inoue and Urahama [2].
Vasilescu and Terzopoulos [3-5] proposed using N-mode SVD on the data tensor to decompose it into multiple factors. They applied it to face recognition and to the synthesis and recognition of human motion signatures and actions.
[1] C. Ding and J. Ye, "Two-dimensional Singular Value Decomposition (2DSVD) for 2D Maps and Images," in SIAM Int'l Conf. Data Mining, 2005.
[2] K. Inoue and K. Urahama, "Non-Iterative Two-Dimensional Linear Discriminant Analysis," in ICPR, 2006.
[3] M. A. O. Vasilescu and D. Terzopoulos, "Multilinear Image Analysis for Facial Recognition," in ICPR, 2002.
[4] M. A. O. Vasilescu and D. Terzopoulos, "Multilinear Analysis of Image Ensembles: TensorFaces," in ECCV, 2002.
[5] M. A. O. Vasilescu, "Human Motion Signatures: Analysis, Synthesis, Recognition," in ICPR (3), 2002, pp. 456-460.

Drawbacks of Vasilescu's method for action recognition
It uses point trajectories as features for representing actions, whereas silhouettes are more informative cues. Point trajectories require accurate and expensive tracking, while silhouettes can be approximated from edge maps and so are extracted more efficiently.
The data tensor comprises three modes: actions, people and joint angles. We instead treat frames and pixels as separate modes because they carry different types of information, without noticeably increasing the computational cost.
For action recognition, the person is assumed to be known.
The evaluation uses a very small motion capture database comprising three simple actions: walk, ascend stairs and descend stairs.
No numerical evaluation of the results is provided.

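Tensors
Tensor: extends the concepts of vectors and matrices to higher orders: $\mathcal{A} \in \mathbb{R}^{I_1 \times \dots \times I_n \times \dots \times I_N}$. The order of the tensor is $N$.
An element of $\mathcal{A}$ is denoted $a_{i_1 \dots i_n \dots i_N}$, where $1 \le i_n \le I_n$.
Mode-$n$ vectors of the tensor $\mathcal{A}$: the $I_n$-dimensional vectors obtained by varying index $i_n$ while keeping the other indices fixed, i.e. the column vectors of the matrix $\mathbf{A}_{(n)} \in \mathbb{R}^{I_n \times (I_1 \cdots I_{n-1} I_{n+1} \cdots I_N)}$ that results from flattening the tensor.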

Flattening a tensor
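A minimal NumPy sketch of flattening may help here. The helper names `unfold` and `fold` are assumptions made for illustration, and the column ordering of the flattened matrix follows NumPy's default layout, which may differ from the ordering used in the cited papers; the set of mode-n vectors is the same either way.

```python
import numpy as np

def unfold(tensor, mode):
    """Flatten `tensor` along `mode`: rows index that mode,
    and each column is one mode-n vector."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def fold(matrix, mode, shape):
    """Inverse of `unfold`: rebuild a tensor of the given `shape`."""
    full_shape = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(matrix.reshape(full_shape), 0, mode)

A = np.arange(24).reshape(2, 3, 4)   # a small order-3 tensor
A1 = unfold(A, 1)                    # mode-1 flattening (0-based), shape (3, 8)
```

Note that modes are 0-based in this sketch, while the slides count modes from 1.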

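Product of a tensor by a matrix
$\mathcal{B} = \mathcal{A} \times_n \mathbf{M}$, or equivalently in flattened form, $\mathbf{B}_{(n)} = \mathbf{M} \mathbf{A}_{(n)}$.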
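The mode-n product can be sketched directly from its flattened form: flatten, multiply by the matrix, fold back. The function names below are assumptions for illustration, and modes are 0-based.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-`mode` flattening: rows index that mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def fold(matrix, mode, shape):
    """Inverse of `unfold` for a tensor of the given `shape`."""
    full_shape = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(matrix.reshape(full_shape), 0, mode)

def mode_n_product(tensor, matrix, mode):
    """Multiply every mode-`mode` vector of `tensor` by `matrix`."""
    new_shape = list(tensor.shape)
    new_shape[mode] = matrix.shape[0]
    return fold(matrix @ unfold(tensor, mode), mode, new_shape)

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3, 4))
M = rng.standard_normal((5, 3))
B = mode_n_product(A, M, 1)   # mode 1 grows from size 3 to size 5
```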

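N-mode SVD (HD-SVD)
SVD: $\mathbf{D} = \mathbf{U}_1 \boldsymbol{\Sigma} \mathbf{U}_2^T$
SVD (in terms of mode-n products): $\mathbf{D} = \boldsymbol{\Sigma} \times_1 \mathbf{U}_1 \times_2 \mathbf{U}_2$
N-mode SVD: $\mathcal{D} = \mathcal{Z} \times_1 \mathbf{U}_1 \times_2 \mathbf{U}_2 \cdots \times_n \mathbf{U}_n \cdots \times_N \mathbf{U}_N$, where $\mathcal{Z}$ is the core tensor and $\mathbf{U}_1, \dots, \mathbf{U}_N$ are the mode matrices.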
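One common way to compute such a decomposition is to take the left singular vectors of each flattened matrix as the mode matrices, then project the data tensor onto them to obtain the core tensor. The sketch below assumes this procedure; the function names are illustrative and not taken from the slides.

```python
import numpy as np

def unfold(tensor, mode):
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def fold(matrix, mode, shape):
    full_shape = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(matrix.reshape(full_shape), 0, mode)

def mode_n_product(tensor, matrix, mode):
    new_shape = list(tensor.shape)
    new_shape[mode] = matrix.shape[0]
    return fold(matrix @ unfold(tensor, mode), mode, new_shape)

def n_mode_svd(D):
    """Return (core tensor Z, list of mode matrices U_n)."""
    # Mode matrix n: left singular vectors of the mode-n flattening.
    U = [np.linalg.svd(unfold(D, n), full_matrices=False)[0]
         for n in range(D.ndim)]
    # Core tensor: Z = D x_1 U_1^T x_2 U_2^T ... x_N U_N^T.
    Z = D
    for n, u in enumerate(U):
        Z = mode_n_product(Z, u.T, n)
    return Z, U

rng = np.random.default_rng(0)
D = rng.standard_normal((6, 2, 2))
Z, U = n_mode_svd(D)

# Rebuild D = Z x_1 U_1 x_2 U_2 x_3 U_3 as a sanity check.
D_rebuilt = Z
for n, u in enumerate(U):
    D_rebuilt = mode_n_product(D_rebuilt, u, n)
```

Unlike the diagonal $\boldsymbol{\Sigma}$ of the matrix SVD, the core tensor is in general not diagonal.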

N-mode Action Video Analysis
Form the tensor $\mathcal{D}$ from the image ensembles:
Scenario 1 (3 modes): Mode 1: pixels, Mode 2: actions, Mode 3: people
Scenario 2 (4 modes): Mode 1: pixels, Mode 2: frames, Mode 3: actions, Mode 4: people
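As a rough illustration of how the two data tensors could be assembled, the sketch below stacks toy silhouette clips into arrays with the mode orders listed on this slide. The array `clips`, its sizes, and the choice to concatenate all frames of a clip into the pixel mode for the 3-mode scenario are assumptions made for this example, not details confirmed by the slides.

```python
import numpy as np

n_people, n_actions, n_frames, height, width = 3, 2, 5, 8, 6
rng = np.random.default_rng(0)

# Hypothetical stand-in for aligned binary silhouettes:
# clips[p, a] is one clip of shape (frames, height, width).
clips = rng.integers(
    0, 2, size=(n_people, n_actions, n_frames, height, width)
).astype(float)

# Scenario 2 (4 modes): pixels x frames x actions x people.
D4 = clips.reshape(n_people, n_actions, n_frames, height * width)
D4 = D4.transpose(3, 2, 1, 0)

# Scenario 1 (3 modes): pixels x actions x people, where the pixel mode
# holds all frames of a clip concatenated (an assumption of this sketch).
D3 = clips.reshape(n_people, n_actions, n_frames * height * width)
D3 = D3.transpose(2, 1, 0)
```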

N-mode Action Video Analysis For 3-mode scenario:

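N-mode Action Video Analysis
N-mode SVD of the data tensor: $\mathcal{D} = \mathcal{Z} \times_1 \mathbf{U}_{\text{pixels}} \times_2 \mathbf{U}_{\text{frames}} \times_3 \mathbf{U}_{\text{actions}} \times_4 \mathbf{U}_{\text{people}} = \mathcal{B} \times_3 \mathbf{U}_{\text{actions}}$, where $\mathcal{B}$ is the basis tensor and $\mathbf{U}_{\text{actions}}$ is the action space embedded matrix.
Basis tensor computation: $\mathcal{B} = \mathcal{Z} \times_1 \mathbf{U}_{\text{pixels}} \times_2 \mathbf{U}_{\text{frames}} \times_4 \mathbf{U}_{\text{people}} = \mathcal{D} \times_3 \mathbf{U}_{\text{actions}}^T$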
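The second form of the basis tensor is the cheaper one to compute, since it needs only the action mode matrix. A small sketch on random toy data follows; the sizes and helper names are assumptions, and modes are 0-based, so the action mode is index 2.

```python
import numpy as np

def unfold(tensor, mode):
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def fold(matrix, mode, shape):
    full_shape = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(matrix.reshape(full_shape), 0, mode)

def mode_n_product(tensor, matrix, mode):
    new_shape = list(tensor.shape)
    new_shape[mode] = matrix.shape[0]
    return fold(matrix @ unfold(tensor, mode), mode, new_shape)

ACTIONS = 2  # 0-based index of the action mode
rng = np.random.default_rng(0)
D = rng.standard_normal((12, 4, 3, 2))  # pixels x frames x actions x people

# Action mode matrix: left singular vectors of the action-mode flattening.
U_actions = np.linalg.svd(unfold(D, ACTIONS), full_matrices=False)[0]

# Basis tensor: B = D x_3 U_actions^T.
B = mode_n_product(D, U_actions.T, ACTIONS)

# Sanity check of the slide's identity D = B x_3 U_actions.
D_rebuilt = mode_n_product(B, U_actions, ACTIONS)
```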

Linear projections
Index into the basis tensor for a particular $t$ and $p$: $\mathcal{B}_{t,p}$
Flatten along the action mode: $\mathbf{B}_{t,p(\text{actions})}$
For training frames: $\mathbf{x}_{t,a,p} = \mathbf{B}_{t,p(\text{actions})}^{T} \mathbf{y}_a$, therefore $\mathbf{y}_a = \mathbf{B}_{t,p(\text{actions})}^{-T} \mathbf{x}_{t,a,p}$.
Given an unknown frame $\mathbf{x}$, project it into a set of candidate embedded vectors for every $t$ and $p$: $\mathbf{y}_{t,p} = \mathbf{B}_{t,p(\text{actions})}^{-T} \mathbf{x}$
Compare each $\mathbf{y}_{t,p}$ against the learned vectors $\mathbf{y}_a$ to find the action class in a nearest neighbor framework.
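The projection and nearest neighbor search can be sketched as follows, on random toy data. The sketch assumes that the learned vectors $\mathbf{y}_a$ are the rows of the action mode matrix, that the inverse is a pseudo-inverse (the sub-matrix is not square), and that distances are Euclidean; the function name `classify` and all sizes are illustrative.

```python
import numpy as np

def unfold(tensor, mode):
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def fold(matrix, mode, shape):
    full_shape = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(matrix.reshape(full_shape), 0, mode)

def mode_n_product(tensor, matrix, mode):
    new_shape = list(tensor.shape)
    new_shape[mode] = matrix.shape[0]
    return fold(matrix @ unfold(tensor, mode), mode, new_shape)

def classify(x, B, U_actions):
    """Return the action index whose learned vector is nearest to any
    candidate projection of frame `x` over all (t, p) pairs."""
    best_dist, best_action = np.inf, -1
    for t in range(B.shape[1]):
        for p in range(B.shape[3]):
            B_tp = B[:, t, :, p]             # (pixels, actions): B_{t,p(actions)}^T
            y = np.linalg.pinv(B_tp) @ x     # candidate embedded vector y_{t,p}
            dists = np.linalg.norm(U_actions - y, axis=1)
            a = int(np.argmin(dists))
            if dists[a] < best_dist:
                best_dist, best_action = dists[a], a
    return best_action

rng = np.random.default_rng(0)
D = rng.standard_normal((12, 4, 3, 2))   # pixels x frames x actions x people
U_actions = np.linalg.svd(unfold(D, 2), full_matrices=False)[0]
B = mode_n_product(D, U_actions.T, 2)    # basis tensor

# A training frame of action 2 (frame 1, person 0) is used as the query.
predicted = classify(D[:, 1, 2, 0], B, U_actions)
```

On a training frame the correct $(t, p)$ pair reproduces the learned vector up to rounding, so the query above is assigned its own action; unseen frames would only match approximately.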

Experimental Results (Data Sets)
Weizmann Database: a widely used database of reasonable size.
It contains ten action classes performed by nine different human subjects.
Actions include bending (bend), jumping jack (jack), jumping-forward-on-two-legs (jump), jumping-in-place-on-two-legs (pjump), running (run), galloping sideways (side), skipping (skip), walking (walk), waving-one-hand (wave1), and waving-two-hands (wave2).


Experimental Results (Preprocessing)
We use the silhouettes provided with the database.
All silhouettes are centered and normalized to the same dimensions (64 × 48).
We find the sequence periods using the method of [1], which is based on the absolute correlation between frames.
In the 3-mode scenario: we use the maximum period to select equal-length subsequences.
In the 4-mode scenario: we warp all sequences to the same temporal duration using bicubic interpolation.
[1] R. Cutler and L. Davis, "Robust Real-Time Periodic Motion Detection, Analysis, and Applications," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, pp. 781-796, 2000.
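To make the temporal warping step concrete, here is a simplified sketch that resamples a clip to a fixed number of frames. It uses linear interpolation along time as a stand-in for the bicubic interpolation named on the slide, and the function name `warp_in_time` and the clip sizes are assumptions for illustration.

```python
import numpy as np

def warp_in_time(frames, target_len):
    """Resample a (T, H, W) clip to (target_len, H, W) by linear
    interpolation between neighbouring frames (not bicubic)."""
    T = frames.shape[0]
    src = np.linspace(0, T - 1, target_len)   # fractional source positions
    lo = np.floor(src).astype(int)            # frame just before each position
    hi = np.minimum(lo + 1, T - 1)            # frame just after, clamped
    w = (src - lo)[:, None, None]             # blend weight toward `hi`
    return (1 - w) * frames[lo] + w * frames[hi]

rng = np.random.default_rng(0)
clip = rng.random((7, 64, 48))       # toy clip: 7 frames of 64 x 48
warped = warp_in_time(clip, 10)      # stretched to 10 frames
```

The first and last frames are preserved exactly; intermediate frames are blends, so binary silhouettes would need re-thresholding after warping.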

Experimental Results
1-mode (PCA): 77.78%
3-mode: 80.25%
4-mode: 85.19%
Niebles et al. (CVPR 2007): 72.8%