Higher Order Separable LDA Using Decomposed Tensor Classifiers


Christian Bauckhage, Thomas Käster and John K. Tsotsos
Centre for Vision Research, York University, Toronto, ON, M3J 1P3

Abstract. The idea of understanding collections of digital images as higher order tensors is gaining ever more popularity in computer vision. A growing body of literature suggests that tensorial methods outperform common vector-based representations when it comes to image coding. Surprisingly few contributors, however, have yet recognized the potential that multilinear algebra offers for classification. In this paper, we demonstrate the advantages tensor classifiers offer for view-based object detection. We present a higher order extension of linear discriminant analysis that applies to multilinear objects of arbitrary order. In contrast to other recent contributions, it does not rely on the technique of n-mode SVD. Instead, we apply an alternating least squares procedure to repeated tensor contractions to obtain an R-term approximation of a projection tensor for higher order LDA. The resulting multilinear classifiers train within seconds. As they are separable, they also provide fast runtime behavior. In addition to fast training and runtime, the method provides good accuracy. Empirical results in multilinear color object detection and illumination invariant face detection show that the method performs robustly and reliably on complex, unconstrained natural scenes.

1 Introduction, Motivation and Background

Surveying recent literature, one can note an increased interest in tensor-based methods for image processing and computer vision. This interest appears to have been stirred by the work of Shashua and Levin [1] and Vasilescu and Terzopoulos [2]. As both contributions provide good examples of the different benefits of multilinear representations of image data, it is instructive to take a closer look at what they report.

Interpreting a set of grey level images {A_1, A_2, ...} as a third-order tensor A, Shashua and Levin propose to consider a rank-1 decomposition of this tensor. Their multi-matrix extension of the singular value decomposition (SVD) yields a set of rank-1 matrices whose linear span includes the input data. Empirical results show that projecting the data into the corresponding subspace captures spatial and temporal redundancies in the input and is therefore well suited for image coding and subsequent classification.

Vasilescu and Terzopoulos represent an ensemble of images depicting different faces with different facial expressions under different lighting conditions as a fifth-order tensor B. Given this tensor, they apply the n-mode or higher order SVD (HOSVD) originally developed by De Lathauwer et al. [3]. This leads to an (n-mode) factorization of B, where one of the factors is a fifth-order tensor Z called the core tensor. The significance of Z is that it allows for computing eigen modes of the image ensemble.

The method thus provides a means for flexible, application-dependent dimension reduction: subspaces may be tailored that represent some of the independent modalities in the original input set more faithfully than others.

Similar to these early contributions, most of the more recent papers on tensor methods for computer vision emphasize dimension reduction. Wang and Ahuja [4] extend the HOSVD procedure and obtain sparse but faithful representations of video data. Shashua and Hazan [5] consider non-negative tensor factorizations and demonstrate that these lead to semantically meaningful basis images. Dealing with video rendering, Vlasic et al. [6] apply HOSVD to mesh models of faces, and Wang et al. [7] present a block-wise HOSVD that efficiently handles 6th- or 7th-order tensors representing texture data.

Given the success and benefits of tensor-based subspace methods for image coding, surprisingly little work has been reported on adopting multilinear techniques for classifier design in pattern recognition. An account of early attempts at multivariate classifiers is given in [8]. In the same paper, Tenenbaum and Freeman introduce bilinear models for classification tasks where the input patterns are formed from two independent factors. However, although their classifiers are multilinear functions, they still require vectorized input data. Only recently have several researchers independently reported on how to overcome this requirement and introduced different multilinear extensions of Fisher's linear discriminant analysis (LDA). The methods described in [9-11] all focus on classifying data given in the form of second-order tensors. As each of these techniques is related to the matters treated here, we will describe them in more depth later on. It is worthwhile pointing out already, however, that all three contributions report results showing that interpreting grey-value images as second-order tensors leads to increased performance in visual object recognition.

In this paper, we present an extension of the iterative least squares approach to second-order tensor classification which we introduced in [9]. Following the general strategy laid out there, we show how to extend LDA to tensors of arbitrary higher orders. Addressing the problem of binary classification for appearance-based object recognition, we make use of the concept of orthogonal tensor decompositions and derive an algorithm for training separable, multilinear discriminant classifiers. It will turn out that, by design, our approach to higher order LDA has several favorable characteristics: (i) it only requires few samples for training; (ii) its training times are very fast; (iii) as the resulting classifiers are separable, their runtime behavior is fast as well. In addition, the advantageous, structure-preserving properties of tensor methods in image coding become apparent in image classification, too. Experiments with third-order tensor classifiers revealed that the technique is applicable to color object detection and illumination invariant face detection. On a dataset of RGB images of objects in complex natural scenes, as well as on higher order representations of grey-value images using illumination insensitive features, multilinear classification coped with considerable variation in the data and was fast and robust.

We will first introduce basic definitions and notational conventions needed in later sections.
Section 3 will present our approach to higher order LDA and contrast it with other recent proposals. In Section 4, we will present and discuss our current experimental results. Finally, a summary and an outlook on promising directions for future work will close this contribution.

2 Basic Concepts and Notation

In the remainder of this paper, we will make frequent use of definitions and notational conventions adopted from Kolda [12]. If A is an $m_1 \times m_2 \times \cdots \times m_n$ tensor over $\mathbb{R}$, we say that its order is n and its jth dimension is $m_j$. The elements of A are indexed as $A_{i_1 i_2 \ldots i_n}$, where $i_j \in \{1, 2, \ldots, m_j\}$ for $j = 1, \ldots, n$. The set of all tensors of size $m_1 \times m_2 \times \cdots \times m_n$ is denoted by $T(m_1, m_2, \ldots, m_n)$. Whenever we can neglect the dimensions but wish to refer to the order of a tensor, we simply write $T^n$.

The inner product of two tensors $A, B \in T(m_1, m_2, \ldots, m_n)$ is defined as

$A \cdot B = \sum_{i_1=1}^{m_1} \sum_{i_2=1}^{m_2} \cdots \sum_{i_n=1}^{m_n} A_{i_1 i_2 \ldots i_n} B_{i_1 i_2 \ldots i_n}.$

Using Einstein's summation convention, in which we implicitly sum over repeated indices in products, we may also write $A \cdot B = A_{i_1 i_2 \ldots i_n} B_{i_1 i_2 \ldots i_n}$.

Note that the inner product is a special case of a tensor contraction. This class of operations comprises multiplications of tensors (of possibly different orders) which result in lower order objects. A familiar example is the multiplication of a matrix $M \in T(m_1, m_2)$ with a vector $u \in T(m_2)$: $Mu = v$. The components of the resulting vector $v \in T(m_1)$ are given by $v_i = M_{ij} u_j$. In Penrose's abstract index notation, the indices assume the role of abstract markers in terms of which the algebra is formulated. We can thus express u and M as $u \equiv u_j$ and $M \equiv M_{ij}$, respectively. This introduces precious versatility into the writing of tensor equations. The following expressions, for instance, become equally valid ways to denote the contraction in our example: $Mu = v \;\Leftrightarrow\; v_i = M_{ij} u_j$.

A tensor $A \in T(m_1, m_2, \ldots, m_n)$ is a decomposed or rank-1 tensor if it can be written as $A = a^1 \otimes a^2 \otimes \cdots \otimes a^n$, where $\otimes$ denotes the outer product and the factors $a^j$ are vectors of corresponding dimensions, i.e. $a^j \in \mathbb{R}^{m_j}$. For the elements of a rank-1 tensor we have $A_{i_1 i_2 \ldots i_n} = a^1_{i_1} a^2_{i_2} \cdots a^n_{i_n}$.

Two decomposed tensors $U, V \in T(m_1, m_2, \ldots, m_n)$ are orthogonal if $U \cdot V = \prod_j (u^j \cdot v^j) = 0$. They are said to be completely orthogonal if $u^j \cdot v^j = 0$ for all $1 \le j \le n$.
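To make this notation concrete, the following NumPy fragment (our illustration, not part of the original paper; all variable names are ours) computes a tensor inner product, builds a rank-1 tensor from an outer product, and performs the contraction $v_i = M_{ij} u_j$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two third-order tensors A, B in T(4, 5, 3); their inner product sums the
# element-wise products over all index triples (i1, i2, i3).
A = rng.standard_normal((4, 5, 3))
B = rng.standard_normal((4, 5, 3))
inner = np.einsum('ijk,ijk->', A, B)            # A . B

# A decomposed (rank-1) tensor is the outer product of one vector per mode.
a1, a2, a3 = rng.standard_normal(4), rng.standard_normal(5), rng.standard_normal(3)
R1 = np.einsum('i,j,k->ijk', a1, a2, a3)        # (R1)_{ijk} = a1_i a2_j a3_k

# A contraction lowers the order: a matrix M in T(4, 5) times u in T(5)
# gives v in T(4) with v_i = M_{ij} u_j.
M = rng.standard_normal((4, 5))
u = rng.standard_normal(5)
v = np.einsum('ij,j->i', M, u)
assert np.allclose(v, M @ u)
```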

3 Extending LDA to Higher Order Objects

Linear discriminant analysis (LDA) is a well established, powerful tool for dimensionality reduction and classification, and applications abound (cf. [13] and the references therein). In this paper, we focus on the linear discriminant analysis of (multi)linear data from two classes, which we shall call ω+ and ω−. Next, we briefly summarize traditional LDA and two recent extensions to higher order objects; afterwards we present an alternative approach to generalizing LDA to tensor spaces.

3.1 The Generalized Eigenvalue Approach and its Extensions to Tensors

Traditional LDA deals with vectorial data. Given a set of feature vectors $\{x^1, x^2, \ldots, x^N\}$ containing positive and negative examples, binary LDA seeks projections $w \cdot x^l$, $l = 1, \ldots, N$, of the samples that maximize the inter-class distance of the resulting scalars. The most widely applied technique for finding the direction w of the optimal projection dates back to seminal work by Fisher [14]. He proposed to determine w by maximizing the Rayleigh quotient $w^T S_b w \,/\, w^T S_w w$, where $S_b$ and $S_w$ are matrices that denote the between-class and within-class scatter of the data. Following this proposal, w results from solving the generalized eigenvalue problem $S_b w = \alpha S_w w$. Once w has been found, binary classification simply requires selecting a suitable threshold along this direction.

Recently, Ye et al. [11] applied LDA to matrix spaces. Dealing with data given in the form of matrices X, they seek projection matrices L and R such that projecting the data onto a lower-dimensional space according to $L^T X R$ preserves the structure of the original higher-dimensional space. Their solution is an iterative procedure of solving generalized eigenvalue problems for row- and column-space projections of the data.

More recently yet, Yan et al. [10] extended the Rayleigh criterion to binary discriminant analysis in higher order tensor spaces. Their technique basically relies on the j-mode or higher order SVD (HOSVD)

$A = Z \times_1 U_1 \times_2 U_2 \cdots \times_n U_n$   (1)

developed by De Lathauwer et al. [3]. Here, the core tensor Z is of the same order and dimensions as the tensor A. The $U_j$ are mutually orthogonal, unitary matrices, and the j-mode product $\times_j$ of a tensor $A \in T(m_1, \ldots, m_j, \ldots, m_n)$ with a matrix $U \in T(k_j, m_j)$ results in a tensor $B \in T(m_1, \ldots, m_{j-1}, k_j, m_{j+1}, \ldots, m_n)$. The projection matrices $U_j$ are found by unfolding A along the jth mode: each $U_j$ is given by the left singular matrix of the corresponding unfolded matrix $A_{(j)}$. Higher order SVD of nth-order tensors thus requires n matrix SVDs of matrices of size $m_j \times m_1 m_2 \cdots m_{j-1} m_{j+1} \cdots m_n$. Yan et al. develop an iterative procedure where they (i) initialize the projection matrices $U_j$ to the identity, (ii) compute HOSVD projections of the training data, (iii) compute the between- and within-class scatter matrices along all modes, and (iv) refine the corresponding matrices $U_j$ by solving generalized eigenvalue problems, continuing with the second step until the projection matrices converge to a stable solution.

Both Ye et al. and Yan et al. evaluate their techniques on grey level image data. Both show that, for the task of face recognition, higher order LDA outperforms conventional linear discriminant analysis applied to vectorized image data.
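As a point of reference for the extensions discussed above, the following sketch shows conventional binary LDA via the generalized eigenvalue problem $S_b w = \alpha S_w w$. It is our own illustration under the usual definitions of the scatter matrices; the function name and the small regularization term are assumptions, not part of the paper.

```python
import numpy as np
from scipy.linalg import eigh

def fisher_lda_direction(X_pos, X_neg):
    """Fisher discriminant direction for two classes given as (n_samples, n_features) arrays."""
    mu_p, mu_n = X_pos.mean(axis=0), X_neg.mean(axis=0)
    d = (mu_p - mu_n)[:, None]
    S_b = d @ d.T                                         # between-class scatter
    S_w = np.cov(X_pos, rowvar=False) * (len(X_pos) - 1) \
        + np.cov(X_neg, rowvar=False) * (len(X_neg) - 1)  # within-class scatter
    S_w += 1e-6 * np.eye(S_w.shape[0])                    # guard against a singular S_w
    vals, vecs = eigh(S_b, S_w)                           # generalized problem S_b w = alpha S_w w
    return vecs[:, -1]                                    # eigenvector of the largest alpha

```

For two classes the resulting direction is, up to scale, $S_w^{-1}(\mu_+ - \mu_-)$, which is why a single threshold along w suffices for classification.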

Fig. 1. Examples of 3rd-order tensor representations for computer vision: (a) an RGB image patch $X \in T(72, 92, 3)$ for color object detection; (b) two-layered curvature features $X \in T(70, 70, 2)$ for face detection.

3.2 The Least Mean Squares Approach

A well known but underexploited fact, which Fisher himself pointed out [14], is that binary LDA is equivalent to the least mean squares (LMS) fitting of a hyperplane that separates the classes ω+ and ω−. The projection direction corresponds to the normal vector of the plane. Dealing with vectorial data, the LMS procedure is the following. Given a sample of $l = 1, \ldots, N$ pattern vectors $x^l$ and a corresponding set of class labels $y^l$, the optimal projection direction w for classification results from minimizing the error

$E(w) = \tfrac{1}{2} \sum_l \left( y^l - w \cdot x^l \right)^2.$   (2)

Setting the gradient $\nabla_w E = 0$ and rearranging the resulting terms yields a closed form solution for w:

$w = C_{xx}^{-1}\, \overline{yx}$   (3)

where $C_{xx} = \sum_l x^l \otimes x^l$ is the correlation matrix of the sample vectors and $\overline{yx} = \sum_l y^l x^l$ denotes the cross correlation vector between the samples and the class labels.
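A minimal sketch of this LMS view of LDA (our own illustration; the function and variable names are assumptions):

```python
import numpy as np

def lms_lda_direction(X, y):
    """LMS estimate of the LDA projection direction, Eqs. (2)-(3).

    X: (N, d) matrix whose rows are the sample vectors x^l.
    y: (N,) vector of class labels, e.g. +1 for omega+ and -1 for omega-.
    """
    C_xx = X.T @ X                     # correlation matrix of the samples
    yx = X.T @ y                       # cross-correlation between samples and labels
    return np.linalg.solve(C_xx, yx)   # w = C_xx^{-1} (yx); solve() avoids an explicit inverse
```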

3.3 Extending the LMS Approach to Tensors

In contrast to the contributions discussed above, we propose to make use of the LMS approach to LDA in order to extend it to multilinear objects. Aiming at image data and fast visual object detection, we apply an alternating least squares (ALS) procedure to perform separable higher order LDA. This way of approaching tensor-based LDA arose from two observations concerning color-image processing.

A color image typically consists of several image planes or layers and can thus be interpreted as a third-order tensor $I \in T(m_1, m_2, m_3)$, where $m_1$ and $m_2$ correspond to the x- and y-resolution and $m_3$ counts the number of layers, for instance, $m_3 = 3$ for RGB images (see Fig. 1). Most approaches to appearance based object recognition transform image patches X of size, say, $m \times n \times d$ into vectors $x \in \mathbb{R}^{mnd}$. Because of the extreme dimension of the resulting space, they first require dimensionality reduction before classification can be carried out. The first step towards fast multilinear discriminant analysis for visual processing is to refrain from unfolding X and instead to consider the inner product $W \cdot X = W_{ijk} X_{ijk}$, where the projection tensor W assumes the role of the projection vector w in traditional LDA.

The second step towards fast multilinear discriminant classification results from efficiency considerations. Object detection by iterating over a color image I and multiplying each visited image patch with a projection tensor $W \in T(m, n, d)$ requires $O(mnd)$ operations per pixel. Even on modern computers, this will be prohibitive if m and n are fairly large. Assume, however, that W was given as an R-term sum of rank-1 tensors

$W = \sum_{r=1}^{R} u^r \otimes v^r \otimes w^r$   (4)

where $u^r \in \mathbb{R}^m$, $v^r \in \mathbb{R}^n$ and $w^r \in \mathbb{R}^d$. Then the iteration over the image can be computed as a sequence of one-dimensional convolutions $\sum_r \big( (I * u^r) * v^r \big) * w^r$. This reduces the effort to $O(R(m + n + d))$ operations per pixel and therefore provides a fast multilinear approach to object detection.

In describing how to derive such separable projection tensors from training data, we will, for convenience, first consider the derivation of an R = 1 term separable higher order LDA projection, where $W = u \otimes v \otimes w$. Assume a sample $\{X^l, y^l\}_{l=1,\ldots,N}$ of image patches $X^l$ with corresponding class labels $y^l$. Due to the rank-1 constraint on W, there is no closed form solution that would minimize the error

$E(u, v, w) = \tfrac{1}{2} \sum_l \left( y^l - X^l \cdot (u \otimes v \otimes w) \right)^2.$

However, the following alternating least squares procedure will minimize the error with respect to the individual modes (a code sketch of these steps is given at the end of this subsection):

1. randomly initialize $u \in \mathbb{R}^m$ and $v \in \mathbb{R}^n$
2. for $l = 1, \ldots, N$ compute the contractions $x^l_k = X^l_{ijk}\, u_i v_j$
3. compute $w = C_{x_k x_k}^{-1}\, \overline{y x_k}$
4. for $l = 1, \ldots, N$ compute the contractions $x^l_j = X^l_{ijk}\, u_i w_k$
5. compute $v = C_{x_j x_j}^{-1}\, \overline{y x_j}$
6. for $l = 1, \ldots, N$ compute the contractions $x^l_i = X^l_{ijk}\, v_j w_k$
7. compute $u = C_{x_i x_i}^{-1}\, \overline{y x_i}$

As the procedure starts with arbitrary vectors u and v, steps 2 through 7 must be iterated until a suitable convergence criterion is met. Our implementation considers the refinement of the vector u: if, in iteration t, $\|u(t) - u(t-1)\| \le \epsilon$, the process is stopped. Practical experience shows that this yields quick convergence.

Extending the procedure from computing a single decomposed projection tensor to the derivation of multi-term tensors is straightforward; additive stage-wise modeling [13] provides the toolkit. If $W = \sum_{r=1}^{k} u^r \otimes v^r \otimes w^r$ is a k-term solution for the LDA projection tensor, a (k+1)-term representation can be found by minimizing $E(u^{k+1}, v^{k+1}, w^{k+1})$. Note, however, that it is appropriate to require that every newly found rank-1 tensor $u^{k+1} \otimes v^{k+1} \otimes w^{k+1}$ be orthogonal to the ones derived so far. In this way, the resulting projection tensor W favors directions of maximum variance in the data tensor space over less informative ones. Orthogonality is guaranteed if the individual rank-1 tensors are completely orthogonal. Therefore, the (modified) Gram-Schmidt procedure is applied after steps 3, 5, and 7 of the above algorithm. Although we derived the algorithm focusing on the case of 3rd-order tensors, its underlying principles immediately apply to tensors of arbitrary order. A summary of the general, nth-order form of the procedure, including stage-wise refinement and orthogonalization, is given in Fig. 2.
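The following NumPy sketch renders steps 1-7 for the rank-1, third-order case. It is our own hedged implementation, not the authors' code; the function name, the random seed handling, and the small ridge term added to each correlation matrix are assumptions.

```python
import numpy as np

def als_rank1_lda(X, y, eps=1e-4, t_max=100, seed=0):
    """Rank-1 third-order LDA projection W = u o v o w via alternating least squares.

    X: (N, m, n, d) stack of training patches X^l, y: (N,) labels (+1 / -1).
    """
    N, m, n, d = X.shape
    rng = np.random.default_rng(seed)
    u, v = rng.standard_normal(m), rng.standard_normal(n)   # step 1
    w = np.zeros(d)

    def lms(Z, y):
        # One least-mean-squares step, Eq. (3): C^{-1} times the cross-correlation.
        return np.linalg.solve(Z.T @ Z + 1e-8 * np.eye(Z.shape[1]), Z.T @ y)

    for _ in range(t_max):
        u_prev = u.copy()
        Z = np.einsum('lijk,i,j->lk', X, u, v)   # step 2: contract over u and v
        w = lms(Z, y)                            # step 3
        Z = np.einsum('lijk,i,k->lj', X, u, w)   # step 4: contract over u and w
        v = lms(Z, y)                            # step 5
        Z = np.einsum('lijk,j,k->li', X, v, w)   # step 6: contract over v and w
        u = lms(Z, y)                            # step 7
        if np.linalg.norm(u - u_prev) <= eps:    # stop once u stabilizes
            break
    return u, v, w
```

Each LMS step only inverts a d×d, n×n or m×m system, which is what keeps training fast.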

Input: a training set $\{X^l, y^l\}_{l=1,\ldots,N}$ of image patches $X^l \in T^n$ with class labels $y^l$
Output: a rank-R approximation of an nth-order projection tensor $W = \sum_{r=1}^{R} u_1^r \otimes u_2^r \otimes \cdots \otimes u_n^r$

for r = 1, ..., R
    t = 0
    for j = 1, ..., n-1
        randomly initialize $u_j^r(t)$
        orthogonalize $u_j^r(t)$ w.r.t. $\{u_j^1, \ldots, u_j^{r-1}\}$
    repeat
        t ← t + 1
        for j = n, ..., 1
            for l = 1, ..., N
                contract $x^l_{i_j} = X^l_{i_1 \ldots i_{j-1} i_j i_{j+1} \ldots i_n}\, u^r_{i_1}(t) \cdots u^r_{i_{j-1}}(t)\, u^r_{i_{j+1}}(t) \cdots u^r_{i_n}(t)$
            compute $u_j^r(t) = C_{x_{i_j} x_{i_j}}^{-1}\, \overline{y x_{i_j}}$
            orthogonalize $u_j^r(t)$ w.r.t. $\{u_j^1, \ldots, u_j^{r-1}\}$
    until $\|u_1^r(t) - u_1^r(t-1)\| \le \epsilon$ or $t > t_{max}$
endfor

Fig. 2. Repeated alternating least squares scheme for computing an nth-order LDA projection tensor W given as a sum of R completely orthogonal basis tensors $u_1^r \otimes u_2^r \otimes \cdots \otimes u_n^r$.

3.4 Properties and Benefits of Higher Order LDA

The ALS approach to higher order tensor LDA we have described differs significantly from past work and provides valuable advantages. These are summarized in the following paragraphs.

Multilinear LDA by extension of the LMS approach should not be confused with orthogonal LDA. Our approach does not seek a set of orthogonal discriminant directions as does O-LDA [15]. Rather, we determine a single discriminant direction, requiring that the projection tensor be given as a sum of R pairwise orthogonal tensors of rank 1.

Multilinear LDA by extension of the LMS approach does not aim at computing a higher order SVD. The above derivation of rank-R projection tensors resembles the iterative computation of the conventional matrix SVD. In fact, if the above algorithm were applied to 2nd-order tensors, it would yield vectors $u^r$ and $v^r$ proportional to the left and right singular vectors of the 2nd-order projection tensor W which would result from an unconstrained LMS approach to LDA [9]. It is thus no surprise that, in applied mathematics, alternating least squares techniques have been considered as a means of computing tensor SVDs (cf. e.g. [12]). Note, however, that our goal is not to decompose a given tensor but to find an efficient multilinear classifier directly from training data. Neither are we interested in high-rank solutions; the smaller the rank R of the resulting classifier, the faster its training and runtime behavior.
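For concreteness, here is a hedged, self-contained NumPy sketch of the scheme of Fig. 2. It is our reading of the algorithm, not the authors' implementation: in particular, fitting each new term to the residual of the labels is one plausible interpretation of the stage-wise refinement, and the function name, ridge term, and einsum-based contractions are our own choices.

```python
import numpy as np

def separable_lda(X, y, R=4, eps=1e-4, t_max=100, seed=0):
    """Sketch of Fig. 2: fit W = sum_r u_1^r o u_2^r o ... o u_n^r by repeated ALS.

    X: (N, m_1, ..., m_n) stack of nth-order training patches, y: (N,) labels.
    Returns a list of R factor tuples (u_1^r, ..., u_n^r).
    """
    N, *dims = X.shape
    n = len(dims)
    letters = 'abcdefghij'[:n]                 # einsum letters for the n modes
    rng = np.random.default_rng(seed)
    terms, residual = [], y.astype(float).copy()

    def orthogonalize(vec, prev):
        # Modified Gram-Schmidt against the jth-mode vectors of earlier terms.
        for p in prev:
            vec = vec - (vec @ p) / (p @ p) * p
        return vec

    for r in range(R):
        factors = [orthogonalize(rng.standard_normal(m_j), [t[j] for t in terms])
                   for j, m_j in enumerate(dims)]
        for _ in range(t_max):
            u1_prev = factors[0].copy()
            for j in reversed(range(n)):
                # Contract X with every mode vector except the jth one ...
                others = [factors[k] for k in range(n) if k != j]
                spec = ('l' + letters + ','
                        + ','.join(letters[k] for k in range(n) if k != j)
                        + '->l' + letters[j])
                Z = np.einsum(spec, X, *others)
                # ... then take one LMS step on a small m_j x m_j system (Eq. 3).
                u_j = np.linalg.solve(Z.T @ Z + 1e-8 * np.eye(dims[j]),
                                      Z.T @ residual)
                factors[j] = orthogonalize(u_j, [t[j] for t in terms])
            if np.linalg.norm(factors[0] - u1_prev) <= eps:
                break
        terms.append(tuple(factors))
        # Stage-wise refinement (our reading): the next term is fitted to the
        # part of the labels the current classifier does not yet explain.
        proj = np.einsum('l' + letters + ',' + ','.join(letters) + '->l',
                         X, *factors)
        residual = residual - proj
    return terms
```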

Multilinear LDA by extension of the LMS approach is less involved than approaches generalizing the Rayleigh criterion. While our approach resembles the methods discussed in section 3.1 in that it is an iterative technique, techniques that generalize the Rayleigh criterion to higher orders require the computation of two correlation matrices for each mode in each iteration. Tensor LDA based on the extension of the LMS approach requires only one such matrix per mode and iteration. Moreover, while the method by Yan et al. [10] is as general as our proposal, it necessitates frequent computations of SVDs of large matrices, whereas our approach only requires inversions of reasonably small matrices.

Multilinear LDA by extension of the LMS approach trains quickly. If multivariate data of size $m \times n \times d$ were unfolded into vectors, conventional LDA based on LMS optimization or on solving the generalized eigenvalue problem $S_b w = \alpha S_w w$ would require the computation and inversion of covariance matrices of size $mnd \times mnd$. Even for moderate values of m and n and not too many training examples, this may become infeasible. However, the covariance matrices $C_{x_k x_k}$, $C_{x_j x_j}$ and $C_{x_i x_i}$ that appear in the learning stage of our approach to 3rd-order LDA are of considerably reduced sizes $d \times d$, $n \times n$ and $m \times m$, respectively. Therefore, in addition to its fast runtime due to separability, our technique significantly shortens training time. In practice, we found that, compared to traditional LDA on very high dimensional vector spaces, our tensor space method reduces training times from hours to seconds.

Multilinear LDA by extension of the LMS approach tackles the small sample size problem. This property is closely related to the previous one. The term small sample size problem refers to the effect that, for conventional vectorial LDA, the within-class scatter matrix $S_w$ is often singular because the number of training samples is much smaller than the dimension of the embedding space [15]. Again, as the covariance matrices that appear in computing decomposed, mutually orthogonal projection tensors are of small dimensionality, small sample sizes will not hamper multilinear LDA.

4 Experiments

This section presents two application examples for higher order, multilinear discriminant analysis using the algorithm derived above. First, we address the problem of robust object detection in unconstrained real world environments. Afterwards, we consider fast face detection under a wide range of illumination conditions. Note that our focus is on object detection rather than on object recognition. This accords with one of the basic properties of our classifiers: they are tailored to be separable higher order tensors and thus allow for fast processing of multivariate, 3rd-order image data.

In each experiment, the input was normalized to zero mean, $X = X - M$, where M denotes the mean of the training samples. During runtime, this accounts for only a single operation per pixel, since $(X - M) \cdot W = X \cdot W - M \cdot W$, where the scalar constant $M \cdot W$ can be computed beforehand. Suitable classification thresholds θ were estimated by projecting the training data onto the discriminant direction found in the training phase. Precision-recall curves resulting from sliding θ along this direction allowed for characterizing the performance of our method.
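To illustrate the separable runtime evaluation described above, the sketch below (ours, not the authors' code) slides a rank-R projection tensor over a multi-layer image as one channel contraction plus two one-dimensional passes per term, subtracts the precomputed constant $M \cdot W$, and thresholds the response; the names and the border mode are assumptions.

```python
import numpy as np
from scipy.ndimage import correlate1d

def detect(image, terms, mean_dot_w, theta):
    """Separable sliding-window evaluation of a rank-R projection tensor, Eq. (4).

    image: (m1, m2, d) colour or multi-layer image.
    terms: list of (u, v, w) mode vectors; u runs along rows, v along columns,
           w along the d layers.
    mean_dot_w: precomputed scalar M . W used for zero-mean normalization.
    theta: classification threshold.
    """
    response = np.zeros(image.shape[:2])
    for u, v, w in terms:
        layer = image @ w                                       # contract the layer/colour mode
        layer = correlate1d(layer, u, axis=0, mode='constant')  # 1-D pass along the rows
        layer = correlate1d(layer, v, axis=1, mode='constant')  # 1-D pass along the columns
        response += layer
    response -= mean_dot_w            # (X - M) . W = X . W - M . W per window
    return response > theta, response
```

Per pixel this costs on the order of R(m + n + d) multiplications instead of the mnd required by a direct patch-times-tensor product.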

Fig. 3. Exemplary results of color object detection in natural environments. The evaluation set consists of 77 images from a publicly available dataset of scenes from home and work environments [16]. The task in this experiment was to detect the cylindrical blue cup. For training, 27 randomly chosen images were considered. From each training image, 3 positive and 28 negative training patches were extracted (see Fig. 1(a)). The classifier was thus trained with 837 samples (81 from ω+ (label: +1) and 756 from ω− (label: −1)). On a 1.8 GHz Pentium Mobile notebook running Linux, training took an average time of 9 seconds. Testing the classifier on the remaining 52 images took 34 seconds and produced an equal error rate of 85% (R = 4, θ = 3.1).

Our first test set consists of images showing typical, unconstrained, and cluttered work and home environments under natural illumination (see Fig. 3). The task was to detect one of the recurring objects, a cylindrical blue cup. Details of the experimental setting are provided in Fig. 3.

In our second test, we considered a subset of the Yale face database [17]. The grey-value images in this database were transformed into a 3rd-order representation consisting of two layers. Following Koenderink's proposal of features for illumination invariant image processing [18], we computed mean curvature and Gaussian curvature maps using his fiber bundle interpretation of image data (see Fig. 1(b)).

Fig. 4. Exemplary results of face detection. The evaluation set consists of a subset of 70 images (scaled to a common size) of the Yale face database [17]. The set covers 10 individuals under seven different illumination conditions; the columns in the figure correspond to three of these conditions. The grey-value images in the database were transformed into a two-layered representation using illumination insensitive features as proposed by Koenderink [18]. For training, 21 randomly chosen images were used, where each of the seven conditions was represented by three images. From each training image, 3 positive and 20 negative training patches were extracted. The classifier was thus trained with 483 samples (63 from ω+ (label: +1) and 420 from ω− (label: −1)). On a 1.8 GHz Pentium Mobile notebook running Linux, training took an average time of 4 seconds. Running the classifier on the remaining 49 test images took 30 seconds and produced an equal error rate of 81% (R = 4, θ = 1.566).

Further details concerning the setting are given in Fig. 4. For R = 4 term tensor classifiers, we obtained an equal error rate (the point where recall equals precision) of 85% in the first experiment and 81% in the second one. Lower values of R were less useful; higher values did not yield improvements. Note, however, that, at the cost of only slightly worse precision, the tensor classifiers achieved 100% recall in both experiments.

In the light of these results, it is instructive to contrast multilinear object detection with the currently most popular technique for fast view-based object detection, the cascaded weak classifiers approach by Viola and Jones [19]. This approach provides excellent results where detection has to deal with objects of homogeneous texture (such as faces).

However, applying the method to object detection in natural environments (like the ones in Fig. 3) reveals that less distinctive texture or texture variations due to varying illumination impair the performance [20]. Moreover, the boosting process for feature selection is almost insatiable in terms of training data and training time. Where cascaded weak classifiers require O(10^8) training examples, training our tensor classifiers was done with fewer than 1000 samples; where the training of cascaded weak classifiers requires O(10^1) hours, separable tensor classifiers are trained within seconds. Furthermore, extensions of the approach by Viola and Jones that incorporate temporal information, and thus handle situations where the most salient features are not correlated to texture variations along lower modes of the data (i.e. along the x, y image plane) but to variations along higher modes (for instance, along a temporal or a color direction), are confined to small image patches, since training would become infeasible otherwise. Tensor classifiers, however, automatically and quickly account for multivariate variations and, as our experiments show, accomplish this on patch sizes that are out of reach for cascaded weak classifiers. Because of the latter two characteristics, we believe that tensor-based object detection provides an auspicious avenue to scenarios that necessitate online visual learning, in order to, for instance, cope with changing illumination conditions. This, however, remains a topic for future work.

5 Conclusion and Outlook

Currently, tensor-based methods are gaining popularity as a means to produce dimensionality-reduced image representations. Recent contributions demonstrate that multilinear techniques preserve image structures more faithfully than approaches based on conventional vector space embeddings of image data. The work presented in this paper aims to carry this property over to the design of multilinear classifiers.

We presented an alternating least squares algorithm to learn higher order, multilinear discriminant projection tensors given as an R-term basis expansion of mutually orthogonal rank-1 tensors. We pointed out that this technique is less involved than another recently proposed approach to higher order LDA, because it does not rely on repeated singular value decompositions of large matrices. Instead, due to the separable nature of our classifiers, the matrix inversions that are required during training only apply to reasonably small matrices. Separable, multilinear classifiers thus have several advantages: (i) the amount of required training data is much smaller than for other methods; (ii) training times are very short; (iii) the runtime behavior on multivariate image data is fast, even for classification windows of sizes that are unmanageable for other techniques.

In two exemplary applications, we considered un-preprocessed RGB images and multilayered curvature feature images. Our results underline that multilinear discriminant analysis of 3rd-order image data accounts for salient information distributed across several modes and thus yields robust and reliable results in object detection in complex, natural scenes.

Given these findings, there are numerous promising directions for further research on tensor-based classifiers. Currently, we focus on two questions: How can our approach be extended to multiple classes? Do robust statistical methods provide a simple avenue to even better performance? Concerning the former, we examine the use of tensor contractions different from the ones applied in this paper.

Concerning the latter, we investigate the effects of exchanging the least mean squares steps in our algorithm by robust estimation techniques. Future investigations may deal with incorporating boosting steps and investigating the possibility of applying the kernel trick to higher order linear discriminant analysis.

References

1. Shashua, A., Levin, A.: Linear Image Coding for Regression and Classification using the Tensor-rank Principle. In: Proc. CVPR. Volume I. (2001)
2. Vasilescu, M., Terzopoulos, D.: Multilinear Analysis of Image Ensembles: TensorFaces. In: Proc. ECCV. Volume 2350 of LNCS., Springer (2002)
3. De Lathauwer, L., De Moor, B., Vandewalle, J.: A Multilinear Singular Value Decomposition. SIAM J. Matrix Anal. Appl. 21 (2000)
4. Wang, H., Ahuja, N.: Compact Representation of Multidimensional Data Using Tensor Rank-One Decomposition. In: Proc. ICPR. Volume I. (2004)
5. Shashua, A., Hazan, T.: Non-Negative Tensor Factorization with Applications to Statistics and Computer Vision. In: Proc. Int. Conf. Machine Learning. (2005)
6. Vlasic, D., Brand, M., Pfister, H., Popović, J.: Face Transfer with Multilinear Models. ACM Trans. on Graphics (Proc. SIGGRAPH 05) 24 (2005)
7. Wang, H., Wu, Q., Shi, L., Yu, Y., Ahuja, N.: Out-of-Core Tensor Approximation of Multi-Dimensional Matrices of Visual Data. ACM Trans. on Graphics (Proc. SIGGRAPH 05) 24 (2005)
8. Tenenbaum, J., Freeman, W.: Separating Style and Content with Bilinear Models. Neural Computation 12 (2000)
9. anonymized for review (2005)
10. Yan, S., Xu, D., Zhang, L., Tang, X., Zhang, H.J.: Discriminant Analysis with Tensor Representation. In: Proc. CVPR. Volume I. (2005)
11. Ye, J., Janardan, R., Li, Q.: Two-Dimensional Linear Discriminant Analysis. In Saul, L., Weiss, Y., Bottou, L., eds.: Advances in Neural Information Processing Systems 17. MIT Press, Cambridge, MA (2005)
12. Kolda, T.: Orthogonal Tensor Decompositions. SIAM J. Matrix Anal. Appl. 23 (2001)
13. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer (2001)
14. Fisher, R.: The Use of Multiple Measurements in Taxonomic Problems. Ann. Eugenics 7 (1936)
15. Fukunaga, K.: Introduction to Statistical Pattern Recognition. Academic Press (1990)
16. anonymized for review (2005)
17. Georghiades, A., Belhumeur, P., Kriegman, D.: From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose. IEEE Trans. Pattern Anal. Machine Intell. 23 (2001)
18. Koenderink, J.J., van Doorn, A.J.: Image Processing Done Right. In: Proc. ECCV. Volume 2350 of LNCS., Springer (2002)
19. Viola, P., Jones, M.: Rapid Object Detection using a Boosted Cascade of Simple Features. In: Proc. CVPR. Volume I. (2001)
20. anonymized for review (2004)


More information

Kernel Methods. Machine Learning A W VO

Kernel Methods. Machine Learning A W VO Kernel Methods Machine Learning A 708.063 07W VO Outline 1. Dual representation 2. The kernel concept 3. Properties of kernels 4. Examples of kernel machines Kernel PCA Support vector regression (Relevance

More information

Vectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1. x 2. x =

Vectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1. x 2. x = Linear Algebra Review Vectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1 x x = 2. x n Vectors of up to three dimensions are easy to diagram.

More information

Fisher s Linear Discriminant Analysis

Fisher s Linear Discriminant Analysis Fisher s Linear Discriminant Analysis Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr

More information

Tensor Canonical Correlation Analysis and Its applications

Tensor Canonical Correlation Analysis and Its applications Tensor Canonical Correlation Analysis and Its applications Presenter: Yong LUO The work is done when Yong LUO was a Research Fellow at Nanyang Technological University, Singapore Outline Y. Luo, D. C.

More information

Robust Motion Segmentation by Spectral Clustering

Robust Motion Segmentation by Spectral Clustering Robust Motion Segmentation by Spectral Clustering Hongbin Wang and Phil F. Culverhouse Centre for Robotics Intelligent Systems University of Plymouth Plymouth, PL4 8AA, UK {hongbin.wang, P.Culverhouse}@plymouth.ac.uk

More information

Empirical Discriminative Tensor Analysis for Crime Forecasting

Empirical Discriminative Tensor Analysis for Crime Forecasting Empirical Discriminative Tensor Analysis for Crime Forecasting Yang Mu 1, Wei Ding 1, Melissa Morabito 2, Dacheng Tao 3, 1 Department of Computer Science, University of Massachusetts Boston,100 Morrissey

More information

Motivating the Covariance Matrix

Motivating the Covariance Matrix Motivating the Covariance Matrix Raúl Rojas Computer Science Department Freie Universität Berlin January 2009 Abstract This note reviews some interesting properties of the covariance matrix and its role

More information

Image Analysis. PCA and Eigenfaces

Image Analysis. PCA and Eigenfaces Image Analysis PCA and Eigenfaces Christophoros Nikou cnikou@cs.uoi.gr Images taken from: D. Forsyth and J. Ponce. Computer Vision: A Modern Approach, Prentice Hall, 2003. Computer Vision course by Svetlana

More information

On the convergence of higher-order orthogonality iteration and its extension

On the convergence of higher-order orthogonality iteration and its extension On the convergence of higher-order orthogonality iteration and its extension Yangyang Xu IMA, University of Minnesota SIAM Conference LA15, Atlanta October 27, 2015 Best low-multilinear-rank approximation

More information

Multi-Class Linear Dimension Reduction by. Weighted Pairwise Fisher Criteria

Multi-Class Linear Dimension Reduction by. Weighted Pairwise Fisher Criteria Multi-Class Linear Dimension Reduction by Weighted Pairwise Fisher Criteria M. Loog 1,R.P.W.Duin 2,andR.Haeb-Umbach 3 1 Image Sciences Institute University Medical Center Utrecht P.O. Box 85500 3508 GA

More information

Local Learning Projections

Local Learning Projections Mingrui Wu mingrui.wu@tuebingen.mpg.de Max Planck Institute for Biological Cybernetics, Tübingen, Germany Kai Yu kyu@sv.nec-labs.com NEC Labs America, Cupertino CA, USA Shipeng Yu shipeng.yu@siemens.com

More information

LEC 2: Principal Component Analysis (PCA) A First Dimensionality Reduction Approach

LEC 2: Principal Component Analysis (PCA) A First Dimensionality Reduction Approach LEC 2: Principal Component Analysis (PCA) A First Dimensionality Reduction Approach Dr. Guangliang Chen February 9, 2016 Outline Introduction Review of linear algebra Matrix SVD PCA Motivation The digits

More information

Vector Space Models. wine_spectral.r

Vector Space Models. wine_spectral.r Vector Space Models 137 wine_spectral.r Latent Semantic Analysis Problem with words Even a small vocabulary as in wine example is challenging LSA Reduce number of columns of DTM by principal components

More information

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation. CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.

More information

Notes on Latent Semantic Analysis

Notes on Latent Semantic Analysis Notes on Latent Semantic Analysis Costas Boulis 1 Introduction One of the most fundamental problems of information retrieval (IR) is to find all documents (and nothing but those) that are semantically

More information

Pattern Recognition 2

Pattern Recognition 2 Pattern Recognition 2 KNN,, Dr. Terence Sim School of Computing National University of Singapore Outline 1 2 3 4 5 Outline 1 2 3 4 5 The Bayes Classifier is theoretically optimum. That is, prob. of error

More information

STA 414/2104: Lecture 8

STA 414/2104: Lecture 8 STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks Delivered by Mark Ebden With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable

More information

20 Unsupervised Learning and Principal Components Analysis (PCA)

20 Unsupervised Learning and Principal Components Analysis (PCA) 116 Jonathan Richard Shewchuk 20 Unsupervised Learning and Principal Components Analysis (PCA) UNSUPERVISED LEARNING We have sample points, but no labels! No classes, no y-values, nothing to predict. Goal:

More information

Image Analysis & Retrieval Lec 14 - Eigenface & Fisherface

Image Analysis & Retrieval Lec 14 - Eigenface & Fisherface CS/EE 5590 / ENG 401 Special Topics, Spring 2018 Image Analysis & Retrieval Lec 14 - Eigenface & Fisherface Zhu Li Dept of CSEE, UMKC http://l.web.umkc.edu/lizhu Office Hour: Tue/Thr 2:30-4pm@FH560E, Contact:

More information