Higher Order Separable LDA Using Decomposed Tensor Classifiers


Christian Bauckhage, Thomas Käster and John K. Tsotsos
Centre for Vision Research, York University, Toronto, ON, M3J 1P3

Abstract. The idea of understanding collections of digital images as higher order tensors is gaining ever more popularity in computer vision. A growing body of literature suggests that tensorial methods outperform common vector-based representations when it comes to image coding. Surprisingly few contributors, however, have yet recognized the potential that multilinear algebra offers for classification. In this paper, we demonstrate the advantages tensor classifiers offer for view-based object detection. We present a higher order extension of linear discriminant analysis that applies to multilinear objects of arbitrary order. In contrast to other recent contributions, it does not rely on the technique of n-mode SVD. Instead, we apply an alternating least squares procedure to repeated tensor contractions to obtain an R-term approximation of a projection tensor for higher order LDA. The resulting multilinear classifiers train within seconds. As they are separable, they also provide fast runtime behavior. In addition to fast training and runtime, the method provides good accuracy. Empirical results in multilinear color object detection and illumination invariant face detection show that the method performs robustly and reliably on complex, unconstrained natural scenes.

1 Introduction, Motivation and Background

Surveying recent literature, one can note an increased interest in tensor-based methods for image processing and computer vision. This interest appears to have been stirred by the work of Shashua and Levin [1] and Vasilescu and Terzopoulos [2]. As both contributions provide good examples of the different benefits of multilinear representations of image data, it is instructive to take a closer look at what they report.

Interpreting a set of grey level images {A_1, A_2, ...} as a third-order tensor A, Shashua and Levin propose to consider a rank-1 decomposition of this tensor. Their multi-matrix extension of the singular value decomposition (SVD) yields a set of rank-1 matrices whose linear span includes the input data. Empirical results show that projecting the data into the corresponding subspace captures spatial and temporal redundancies in the input and is therefore well suited for image coding and subsequent classification.

Vasilescu and Terzopoulos represent an ensemble of images depicting different faces with different facial expressions under different lighting conditions as a fifth-order tensor B. Given this tensor, they apply the n-mode or higher order SVD (HOSVD) originally developed by De Lathauwer et al. [3]. This leads to an (n-mode) factorization of B, where one of the factors is a fifth-order tensor Z called the core tensor. The significance of Z is that it allows for computing eigen modes of the image ensemble.

The method thus provides a means for flexible, application-dependent dimension reduction: subspaces may be tailored that represent some of the independent modalities in the original input set more faithfully than others.

Similar to these early contributions, most of the more recent papers on tensor methods for computer vision emphasize dimension reduction. Wang and Ahuja [4] extend the HOSVD procedure and obtain sparse but faithful representations of video data. Shashua and Hazan [5] consider non-negative tensor factorizations and demonstrate that these lead to semantically meaningful basis images. Dealing with video rendering, Vlasic et al. [6] apply HOSVD to mesh models of faces, and Wang et al. [7] present a block-wise HOSVD that efficiently handles 6th- or 7th-order tensors representing texture data.

Given the success and benefits of tensor-based subspace methods for image coding, surprisingly little work has been reported on adopting multilinear techniques for classifier design in pattern recognition. An account of early attempts at multivariate classifiers is given in [8]. In the same paper, Tenenbaum and Freeman introduce bilinear models for classification tasks where the input patterns are formed from two independent factors. However, although their classifiers are multilinear functions, they still require vectorized input data. Only recently have several researchers independently reported on how to overcome this requirement and introduced different multilinear extensions of Fisher's linear discriminant analysis (LDA). The methods described in [9-11] all focus on classifying data given in the form of second-order tensors. As each of these techniques is related to the matters treated here, we will describe them in more depth later on. It is worthwhile pointing out already, however, that all three contributions report results showing that interpreting grey-value images as second-order tensors leads to increased performance in visual object recognition.

In this paper, we present an extension of the iterative least squares approach to second-order tensor classification which we introduced in [9]. Following the general strategy laid out there, we show how to extend LDA to tensors of arbitrary higher orders. Addressing the problem of binary classification for appearance-based object recognition, we make use of the concept of orthogonal tensor decompositions and derive an algorithm for training separable, multilinear discriminant classifiers. It will turn out that, by design, our approach to higher order LDA has several favorable characteristics: (i) it only requires few samples for training; (ii) its training times are very fast; (iii) as the resulting classifiers are separable, their runtime behavior is fast as well. In addition, the advantageous, structure-preserving properties of tensor methods in image coding become apparent in image classification, too. Experiments with third-order tensor classifiers revealed that the technique is applicable to color object detection and illumination invariant face detection. On a dataset of RGB images of objects in complex natural scenes, as well as on higher order representations of grey-value images using illumination insensitive features, multilinear classification coped with considerable variation in the data and was fast and robust.

We will first introduce basic definitions and notational conventions needed in later sections.
Section 3 will present our approach to higher order LDA and contrast it with other recent proposals. In Section 4, we will present and discuss our current experimental results. Finally, a summary and an outlook on promising directions for future work will close this contribution.

2 Basic Concepts and Notation

In the remainder of this paper, we will make frequent use of definitions and notational conventions adopted from Kolda [12]. If A is an $m_1 \times m_2 \times \cdots \times m_n$ tensor over $\mathbb{R}$, we say that its order is n and its jth dimension is $m_j$. The elements of A are indexed as $A_{i_1 i_2 \ldots i_n}$, where $i_j \in \{1, 2, \ldots, m_j\}$ for $j = 1, \ldots, n$. The set of all tensors of size $m_1 \times m_2 \times \cdots \times m_n$ is denoted by $T(m_1, m_2, \ldots, m_n)$. Whenever we can neglect the dimensions but wish to refer to the order of a tensor, we simply write $T^n$.

The inner product of two tensors $A, B \in T(m_1, m_2, \ldots, m_n)$ is defined as

$A \cdot B = \sum_{i_1=1}^{m_1} \sum_{i_2=1}^{m_2} \cdots \sum_{i_n=1}^{m_n} A_{i_1 i_2 \ldots i_n} B_{i_1 i_2 \ldots i_n}.$

Using Einstein's summation convention, in which we implicitly sum over repeated indices in products, we may also write $A \cdot B = A_{i_1 i_2 \ldots i_n} B_{i_1 i_2 \ldots i_n}$.

Note that the inner product is a special case of a tensor contraction. This class of operations comprises multiplications of tensors (of possibly different orders) which result in lower order objects. A familiar example is the multiplication of a matrix $M \in T(m_1, m_2)$ with a vector $u \in T(m_2)$: $Mu = v$. The components of the resulting vector $v \in T(m_1)$ are given by $v_i = M_{ij} u_j$. In Penrose's abstract index notation, the indices assume the role of abstract markers in terms of which the algebra is formulated. We can thus express u and M as $u \equiv u_j$ and $M \equiv M_{ij}$, respectively. This introduces precious versatility into the writing of tensor equations. The following expressions, for instance, become equally valid ways to denote the contraction in our example: $Mu = v \;\Leftrightarrow\; v_i = M_{ij} u_j$.

A tensor $A \in T(m_1, m_2, \ldots, m_n)$ is a decomposed or rank-1 tensor if it can be written as $A = a^1 \otimes a^2 \otimes \cdots \otimes a^n$, where $\otimes$ denotes the outer product and the factors $a^j$ are vectors of corresponding dimensions, i.e. $a^j \in \mathbb{R}^{m_j}$. For the elements of a rank-1 tensor we have $A_{i_1 i_2 \ldots i_n} = a^1_{i_1} a^2_{i_2} \cdots a^n_{i_n}$.

Two decomposed tensors $U, V \in T(m_1, m_2, \ldots, m_n)$ are orthogonal if $U \cdot V = \prod_j (u^j \cdot v^j) = 0$. They are said to be completely orthogonal if $u^j \cdot v^j = 0$ for all $1 \le j \le n$.
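To make this notation concrete, the following NumPy fragment (our illustration, not part of the original paper; all variable names are ours) computes a tensor inner product, builds a rank-1 tensor from an outer product, and performs the contraction $v_i = M_{ij} u_j$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two third-order tensors A, B in T(4, 5, 3); their inner product sums the
# element-wise products over all index triples (i1, i2, i3).
A = rng.standard_normal((4, 5, 3))
B = rng.standard_normal((4, 5, 3))
inner = np.einsum('ijk,ijk->', A, B)            # A . B

# A decomposed (rank-1) tensor is the outer product of one vector per mode.
a1, a2, a3 = rng.standard_normal(4), rng.standard_normal(5), rng.standard_normal(3)
R1 = np.einsum('i,j,k->ijk', a1, a2, a3)        # (R1)_{ijk} = a1_i a2_j a3_k

# A contraction lowers the order: a matrix M in T(4, 5) times u in T(5)
# gives v in T(4) with v_i = M_{ij} u_j.
M = rng.standard_normal((4, 5))
u = rng.standard_normal(5)
v = np.einsum('ij,j->i', M, u)
assert np.allclose(v, M @ u)
```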

3 Extending LDA to Higher Order Objects

Linear discriminant analysis (LDA) is a well established, powerful tool for dimensionality reduction and classification, and applications abound (cf. [13] and the references therein). In this paper, we focus on the linear discriminant analysis of (multi)linear data from two classes, which we shall call ω+ and ω−. Next, we briefly summarize traditional LDA and two recent extensions to higher order objects; afterwards we present an alternative approach to generalizing LDA to tensor spaces.

3.1 The Generalized Eigenvalue Approach and its Extensions to Tensors

Traditional LDA deals with vectorial data. Given a set of feature vectors $\{x^1, x^2, \ldots, x^N\}$ containing positive and negative examples, binary LDA seeks projections $w \cdot x^l$, $l = 1, \ldots, N$, of the samples that maximize the inter-class distance of the resulting scalars. The most widely applied technique for finding the direction w of the optimal projection dates back to seminal work by Fisher [14]. He proposed to determine w by maximizing the Rayleigh quotient $w^T S_b w \,/\, w^T S_w w$, where $S_b$ and $S_w$ are matrices that denote the between-class and within-class scatter of the data. Following this proposal, w results from solving the generalized eigenvalue problem $S_b w = \alpha S_w w$. Once w has been found, binary classification simply requires selecting a suitable threshold along this direction.

Recently, Ye et al. [11] applied LDA to matrix spaces. Dealing with data given in the form of matrices X, they seek projection matrices L and R such that projecting the data onto a lower-dimensional space according to $L^T X R$ preserves the structure of the original higher-dimensional space. Their solution is an iterative procedure of solving generalized eigenvalue problems for row- and column-space projections of the data.

More recently yet, Yan et al. [10] extended the Rayleigh criterion to binary discriminant analysis in higher order tensor spaces. Their technique basically relies on the j-mode or higher order SVD (HOSVD)

$A = Z \times_1 U_1 \times_2 U_2 \cdots \times_n U_n$   (1)

developed by De Lathauwer et al. [3]. Here, the core tensor Z is of the same order and dimensions as the tensor A. The $U_j$ are mutually orthogonal, unitary matrices, and the j-mode product $\times_j$ of a tensor $A \in T(m_1, \ldots, m_j, \ldots, m_n)$ with a matrix $U \in T(k_j, m_j)$ results in a tensor $B \in T(m_1, \ldots, m_{j-1}, k_j, m_{j+1}, \ldots, m_n)$. The projection matrices $U_j$ are found by unfolding A along the jth mode: each $U_j$ is given by the left singular matrix of the corresponding unfolded matrix $A_{(j)}$. Higher order SVD of nth-order tensors thus requires n matrix SVDs of matrices of size $m_j \times m_1 m_2 \cdots m_{j-1} m_{j+1} \cdots m_n$. Yan et al. develop an iterative procedure where they (i) initialize the projection matrices $U_j$ to the identity, (ii) compute HOSVD projections of the training data, (iii) compute the between- and within-class scatter matrices along all modes, and (iv) refine the corresponding matrices $U_j$ by solving generalized eigenvalue problems, continuing with the second step until the projection matrices converge to a stable solution.

Both Ye et al. and Yan et al. evaluate their techniques on grey level image data. Both show that, for the task of face recognition, higher order LDA outperforms conventional linear discriminant analysis applied to vectorized image data.
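As a point of reference for the extensions discussed above, the following sketch shows conventional binary LDA via the generalized eigenvalue problem $S_b w = \alpha S_w w$. It is our own illustration under the usual definitions of the scatter matrices; the function name and the small regularization term are assumptions, not part of the paper.

```python
import numpy as np
from scipy.linalg import eigh

def fisher_lda_direction(X_pos, X_neg):
    """Fisher discriminant direction for two classes given as (n_samples, n_features) arrays."""
    mu_p, mu_n = X_pos.mean(axis=0), X_neg.mean(axis=0)
    d = (mu_p - mu_n)[:, None]
    S_b = d @ d.T                                         # between-class scatter
    S_w = np.cov(X_pos, rowvar=False) * (len(X_pos) - 1) \
        + np.cov(X_neg, rowvar=False) * (len(X_neg) - 1)  # within-class scatter
    S_w += 1e-6 * np.eye(S_w.shape[0])                    # guard against a singular S_w
    vals, vecs = eigh(S_b, S_w)                           # generalized problem S_b w = alpha S_w w
    return vecs[:, -1]                                    # eigenvector of the largest alpha

```

For two classes the resulting direction is, up to scale, $S_w^{-1}(\mu_+ - \mu_-)$, which is why a single threshold along w suffices for classification.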

Fig. 1. Examples of 3rd-order tensor representations for computer vision: (a) an RGB image patch $X \in T(72, 92, 3)$ for color object detection; (b) two-layered curvature features $X \in T(70, 70, 2)$ for face detection.

3.2 The Least Mean Squares Approach

A well known but underexploited fact, which Fisher himself pointed out [14], is that binary LDA is equivalent to the least mean squares (LMS) fitting of a hyperplane that separates the classes ω+ and ω−. The projection direction corresponds to the normal vector of the plane. Dealing with vectorial data, the LMS procedure is the following. Given a sample of $l = 1, \ldots, N$ pattern vectors $x^l$ and a corresponding set of class labels $y^l$, the optimal projection direction w for classification results from minimizing the error

$E(w) = \tfrac{1}{2} \sum_l \left( y^l - w \cdot x^l \right)^2.$   (2)

Setting the gradient $\nabla_w E = 0$ and rearranging the resulting terms yields a closed form solution for w:

$w = C_{xx}^{-1}\, \overline{yx}$   (3)

where $C_{xx} = \sum_l x^l \otimes x^l$ is the correlation matrix of the sample vectors and $\overline{yx} = \sum_l y^l x^l$ denotes the cross correlation vector between the samples and the class labels.
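A minimal sketch of this LMS view of LDA (our own illustration; the function and variable names are assumptions):

```python
import numpy as np

def lms_lda_direction(X, y):
    """LMS estimate of the LDA projection direction, Eqs. (2)-(3).

    X: (N, d) matrix whose rows are the sample vectors x^l.
    y: (N,) vector of class labels, e.g. +1 for omega+ and -1 for omega-.
    """
    C_xx = X.T @ X                     # correlation matrix of the samples
    yx = X.T @ y                       # cross-correlation between samples and labels
    return np.linalg.solve(C_xx, yx)   # w = C_xx^{-1} (yx); solve() avoids an explicit inverse
```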

3.3 Extending the LMS Approach to Tensors

In contrast to the contributions discussed above, we propose to make use of the LMS approach to LDA in order to extend it to multilinear objects. Aiming at image data and fast visual object detection, we apply an alternating least squares (ALS) procedure to perform separable higher order LDA. This way of approaching tensor-based LDA arose from two observations concerning color-image processing.

A color image typically consists of several image planes or layers and can thus be interpreted as a third-order tensor $I \in T(m_1, m_2, m_3)$, where $m_1$ and $m_2$ correspond to the x- and y-resolution and $m_3$ counts the number of layers, for instance, $m_3 = 3$ for RGB images (see Fig. 1). Most approaches to appearance based object recognition transform image patches X of size, say, $m \times n \times d$ into vectors $x \in \mathbb{R}^{mnd}$. Because of the extreme dimension of the resulting space, they first require dimensionality reduction before classification can be carried out. The first step towards fast multilinear discriminant analysis for visual processing is to refrain from unfolding X and instead to consider the inner product $W \cdot X = W_{ijk} X_{ijk}$, where the projection tensor W assumes the role of the projection vector w in traditional LDA.

The second step towards fast multilinear discriminant classification results from efficiency considerations. Object detection by iterating over a color image I and multiplying each visited image patch with a projection tensor $W \in T(m, n, d)$ requires $O(mnd)$ operations per pixel. Even on modern computers, this will be prohibitive if m and n are fairly large. Assume, however, that W was given as an R-term sum of rank-1 tensors

$W = \sum_{r=1}^{R} u^r \otimes v^r \otimes w^r$   (4)

where $u^r \in \mathbb{R}^m$, $v^r \in \mathbb{R}^n$ and $w^r \in \mathbb{R}^d$. Then the iteration over the image can be computed as a sequence of one-dimensional convolutions $\sum_r \big( (I * u^r) * v^r \big) * w^r$. This reduces the effort to $O(R(m + n + d))$ operations per pixel and therefore provides a fast multilinear approach to object detection.

In describing how to derive such separable projection tensors from training data, we will, for convenience, first consider the derivation of an R = 1 term separable higher order LDA projection, where $W = u \otimes v \otimes w$. Assume a sample $\{X^l, y^l\}_{l=1,\ldots,N}$ of image patches $X^l$ with corresponding class labels $y^l$. Due to the rank-1 constraint on W, there is no closed form solution that would minimize the error

$E(u, v, w) = \tfrac{1}{2} \sum_l \left( y^l - X^l \cdot (u \otimes v \otimes w) \right)^2.$

However, the following alternating least squares procedure will minimize the error with respect to the individual modes (a code sketch of these steps is given at the end of this subsection):

1. randomly initialize $u \in \mathbb{R}^m$ and $v \in \mathbb{R}^n$
2. for $l = 1, \ldots, N$ compute the contractions $x^l_k = X^l_{ijk}\, u_i v_j$
3. compute $w = C_{x_k x_k}^{-1}\, \overline{y x_k}$
4. for $l = 1, \ldots, N$ compute the contractions $x^l_j = X^l_{ijk}\, u_i w_k$
5. compute $v = C_{x_j x_j}^{-1}\, \overline{y x_j}$
6. for $l = 1, \ldots, N$ compute the contractions $x^l_i = X^l_{ijk}\, v_j w_k$
7. compute $u = C_{x_i x_i}^{-1}\, \overline{y x_i}$

As the procedure starts with arbitrary vectors u and v, steps 2 through 7 must be iterated until a suitable convergence criterion is met. Our implementation considers the refinement of the vector u: if, in iteration t, $\|u(t) - u(t-1)\| \le \epsilon$, the process is stopped. Practical experience shows that this yields quick convergence.

Extending the procedure from computing a single decomposed projection tensor to the derivation of multi-term tensors is straightforward; additive stage-wise modeling [13] provides the toolkit. If $W = \sum_{r=1}^{k} u^r \otimes v^r \otimes w^r$ is a k-term solution for the LDA projection tensor, a (k+1)-term representation can be found by minimizing $E(u^{k+1}, v^{k+1}, w^{k+1})$. Note, however, that it is appropriate to require that every newly found rank-1 tensor $u^{k+1} \otimes v^{k+1} \otimes w^{k+1}$ be orthogonal to the ones derived so far. In this way, the resulting projection tensor W favors directions of maximum variance in the data tensor space over less informative ones. Orthogonality is guaranteed if the individual rank-1 tensors are completely orthogonal. Therefore, the (modified) Gram-Schmidt procedure is applied after steps 3, 5, and 7 of the above algorithm. Although we derived the algorithm focusing on the case of 3rd-order tensors, its underlying principles immediately apply to tensors of arbitrary order. A summary of the general, nth-order form of the procedure, including stage-wise refinement and orthogonalization, is given in Fig. 2.
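The following NumPy sketch renders steps 1-7 for the rank-1, third-order case. It is our own hedged implementation, not the authors' code; the function name, the random seed handling, and the small ridge term added to each correlation matrix are assumptions.

```python
import numpy as np

def als_rank1_lda(X, y, eps=1e-4, t_max=100, seed=0):
    """Rank-1 third-order LDA projection W = u o v o w via alternating least squares.

    X: (N, m, n, d) stack of training patches X^l, y: (N,) labels (+1 / -1).
    """
    N, m, n, d = X.shape
    rng = np.random.default_rng(seed)
    u, v = rng.standard_normal(m), rng.standard_normal(n)   # step 1
    w = np.zeros(d)

    def lms(Z, y):
        # One least-mean-squares step, Eq. (3): C^{-1} times the cross-correlation.
        return np.linalg.solve(Z.T @ Z + 1e-8 * np.eye(Z.shape[1]), Z.T @ y)

    for _ in range(t_max):
        u_prev = u.copy()
        Z = np.einsum('lijk,i,j->lk', X, u, v)   # step 2: contract over u and v
        w = lms(Z, y)                            # step 3
        Z = np.einsum('lijk,i,k->lj', X, u, w)   # step 4: contract over u and w
        v = lms(Z, y)                            # step 5
        Z = np.einsum('lijk,j,k->li', X, v, w)   # step 6: contract over v and w
        u = lms(Z, y)                            # step 7
        if np.linalg.norm(u - u_prev) <= eps:    # stop once u stabilizes
            break
    return u, v, w
```

Each LMS step only inverts a d×d, n×n or m×m system, which is what keeps training fast.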

Input: a training set $\{X^l, y^l\}_{l=1,\ldots,N}$ of image patches $X^l \in T^n$ with class labels $y^l$
Output: a rank-R approximation of an nth-order projection tensor $W = \sum_{r=1}^{R} u_1^r \otimes u_2^r \otimes \cdots \otimes u_n^r$

for r = 1, ..., R
    t = 0
    for j = 1, ..., n-1
        randomly initialize $u_j^r(t)$
        orthogonalize $u_j^r(t)$ w.r.t. $\{u_j^1, \ldots, u_j^{r-1}\}$
    repeat
        t ← t + 1
        for j = n, ..., 1
            for l = 1, ..., N
                contract $x^l_{i_j} = X^l_{i_1 \ldots i_{j-1} i_j i_{j+1} \ldots i_n}\, u^r_{i_1}(t) \cdots u^r_{i_{j-1}}(t)\, u^r_{i_{j+1}}(t) \cdots u^r_{i_n}(t)$
            compute $u_j^r(t) = C_{x_{i_j} x_{i_j}}^{-1}\, \overline{y x_{i_j}}$
            orthogonalize $u_j^r(t)$ w.r.t. $\{u_j^1, \ldots, u_j^{r-1}\}$
    until $\|u_1^r(t) - u_1^r(t-1)\| \le \epsilon$ or $t > t_{max}$
endfor

Fig. 2. Repeated alternating least squares scheme for computing an nth-order LDA projection tensor W given as a sum of R completely orthogonal basis tensors $u_1^r \otimes u_2^r \otimes \cdots \otimes u_n^r$.

3.4 Properties and Benefits of Higher Order LDA

The ALS approach to higher order tensor LDA we have described differs significantly from past work and provides valuable advantages. These are summarized in the following paragraphs.

Multilinear LDA by extension of the LMS approach should not be confused with orthogonal LDA. Our approach does not seek a set of orthogonal discriminant directions as does O-LDA [15]. Rather, we determine a single discriminant direction, requiring that the projection tensor be given as a sum of R pairwise orthogonal tensors of rank 1.

Multilinear LDA by extension of the LMS approach does not aim at computing a higher order SVD. The above derivation of rank-R projection tensors resembles the iterative computation of the conventional matrix SVD. In fact, if the above algorithm were applied to 2nd-order tensors, it would yield vectors $u^r$ and $v^r$ proportional to the left and right singular vectors of the 2nd-order projection tensor W which would result from an unconstrained LMS approach to LDA [9]. It is thus no surprise that, in applied mathematics, alternating least squares techniques have been considered as a means of computing tensor SVDs (cf. e.g. [12]). Note, however, that our goal is not to decompose a given tensor but to find an efficient multilinear classifier directly from training data. Neither are we interested in high-rank solutions; the smaller the rank R of the resulting classifier, the faster its training and runtime behavior.
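For concreteness, here is a hedged, self-contained NumPy sketch of the scheme of Fig. 2. It is our reading of the algorithm, not the authors' implementation: in particular, fitting each new term to the residual of the labels is one plausible interpretation of the stage-wise refinement, and the function name, ridge term, and einsum-based contractions are our own choices.

```python
import numpy as np

def separable_lda(X, y, R=4, eps=1e-4, t_max=100, seed=0):
    """Sketch of Fig. 2: fit W = sum_r u_1^r o u_2^r o ... o u_n^r by repeated ALS.

    X: (N, m_1, ..., m_n) stack of nth-order training patches, y: (N,) labels.
    Returns a list of R factor tuples (u_1^r, ..., u_n^r).
    """
    N, *dims = X.shape
    n = len(dims)
    letters = 'abcdefghij'[:n]                 # einsum letters for the n modes
    rng = np.random.default_rng(seed)
    terms, residual = [], y.astype(float).copy()

    def orthogonalize(vec, prev):
        # Modified Gram-Schmidt against the jth-mode vectors of earlier terms.
        for p in prev:
            vec = vec - (vec @ p) / (p @ p) * p
        return vec

    for r in range(R):
        factors = [orthogonalize(rng.standard_normal(m_j), [t[j] for t in terms])
                   for j, m_j in enumerate(dims)]
        for _ in range(t_max):
            u1_prev = factors[0].copy()
            for j in reversed(range(n)):
                # Contract X with every mode vector except the jth one ...
                others = [factors[k] for k in range(n) if k != j]
                spec = ('l' + letters + ','
                        + ','.join(letters[k] for k in range(n) if k != j)
                        + '->l' + letters[j])
                Z = np.einsum(spec, X, *others)
                # ... then take one LMS step on a small m_j x m_j system (Eq. 3).
                u_j = np.linalg.solve(Z.T @ Z + 1e-8 * np.eye(dims[j]),
                                      Z.T @ residual)
                factors[j] = orthogonalize(u_j, [t[j] for t in terms])
            if np.linalg.norm(factors[0] - u1_prev) <= eps:
                break
        terms.append(tuple(factors))
        # Stage-wise refinement (our reading): the next term is fitted to the
        # part of the labels the current classifier does not yet explain.
        proj = np.einsum('l' + letters + ',' + ','.join(letters) + '->l',
                         X, *factors)
        residual = residual - proj
    return terms
```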

Multilinear LDA by extension of the LMS approach is less involved than approaches generalizing the Rayleigh criterion. While our approach resembles the methods discussed in section 3.1 in that it is an iterative technique, techniques that generalize the Rayleigh criterion to higher orders require the computation of two correlation matrices for each mode in each iteration. Tensor LDA based on the extension of the LMS approach requires only one such matrix per mode and iteration. Moreover, while the method by Yan et al. [10] is as general as our proposal, it necessitates frequent computations of SVDs of large matrices, whereas our approach only requires inversions of reasonably small matrices.

Multilinear LDA by extension of the LMS approach trains quickly. If multivariate data of size $m \times n \times d$ were unfolded into vectors, conventional LDA based on LMS optimization or on solving the generalized eigenvalue problem $S_b w = \alpha S_w w$ would require the computation and inversion of covariance matrices of size $mnd \times mnd$. Even for moderate values of m and n and not too many training examples, this may become infeasible. However, the covariance matrices $C_{x_k x_k}$, $C_{x_j x_j}$ and $C_{x_i x_i}$ that appear in the learning stage of our approach to 3rd-order LDA are of considerably reduced sizes $d \times d$, $n \times n$ and $m \times m$, respectively. Therefore, in addition to its fast runtime due to separability, our technique significantly shortens training time. In practice, we found that, compared to traditional LDA on very high dimensional vector spaces, our tensor space method reduces training times from hours to seconds.

Multilinear LDA by extension of the LMS approach tackles the small sample size problem. This property is closely related to the previous one. The term small sample size problem refers to the effect that, for conventional vectorial LDA, the within-class scatter matrix $S_w$ is often singular because the number of training samples is much smaller than the dimension of the embedding space [15]. Again, as the covariance matrices that appear in computing decomposed, mutually orthogonal projection tensors are of small dimensionality, small sample sizes will not hamper multilinear LDA.

4 Experiments

This section presents two application examples for higher order, multilinear discriminant analysis using the algorithm derived above. First, we address the problem of robust object detection in unconstrained real world environments. Afterwards, we consider fast face detection under a wide range of illumination conditions. Note that our focus is on object detection rather than on object recognition. This accords with one of the basic properties of our classifiers: they are tailored to be separable higher order tensors and thus allow for fast processing of multivariate, 3rd-order image data.

In each experiment, the input was normalized to zero mean, $X = X - M$, where M denotes the mean of the training samples. During runtime, this accounts for only a single operation per pixel, since $(X - M) \cdot W = X \cdot W - M \cdot W$, where the scalar constant $M \cdot W$ can be computed beforehand. Suitable classification thresholds θ were estimated by projecting the training data onto the discriminant direction found in the training phase. Precision-recall curves resulting from sliding θ along this direction allowed for characterizing the performance of our method.
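To illustrate the separable runtime evaluation described above, the sketch below (ours, not the authors' code) slides a rank-R projection tensor over a multi-layer image as one channel contraction plus two one-dimensional passes per term, subtracts the precomputed constant $M \cdot W$, and thresholds the response; the names and the border mode are assumptions.

```python
import numpy as np
from scipy.ndimage import correlate1d

def detect(image, terms, mean_dot_w, theta):
    """Separable sliding-window evaluation of a rank-R projection tensor, Eq. (4).

    image: (m1, m2, d) colour or multi-layer image.
    terms: list of (u, v, w) mode vectors; u runs along rows, v along columns,
           w along the d layers.
    mean_dot_w: precomputed scalar M . W used for zero-mean normalization.
    theta: classification threshold.
    """
    response = np.zeros(image.shape[:2])
    for u, v, w in terms:
        layer = image @ w                                       # contract the layer/colour mode
        layer = correlate1d(layer, u, axis=0, mode='constant')  # 1-D pass along the rows
        layer = correlate1d(layer, v, axis=1, mode='constant')  # 1-D pass along the columns
        response += layer
    response -= mean_dot_w            # (X - M) . W = X . W - M . W per window
    return response > theta, response
```

Per pixel this costs on the order of R(m + n + d) multiplications instead of the mnd required by a direct patch-times-tensor product.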

Fig. 3. Exemplary results of color object detection in natural environments. The evaluation set consists of 77 images from a publicly available dataset of scenes from home and work environments [16]. The task in this experiment was to detect the cylindrical blue cup. For training, 27 randomly chosen images were considered. From each training image, 3 positive and 28 negative training patches were extracted (see Fig. 1(a)). The classifier was thus trained with 837 samples (81 from ω+ (label: +1) and 756 from ω− (label: −1)). On a 1.8 GHz Pentium Mobile notebook running Linux, training took an average time of 9 seconds. Testing the classifier on the remaining 52 images took 34 seconds and produced an equal error rate of 85% (R = 4, θ = 3.1).

Our first test set consists of images showing typical, unconstrained, and cluttered work and home environments under natural illumination (see Fig. 3). The task was to detect one of the recurring objects, a cylindrical blue cup. Details of the experimental setting are provided in Fig. 3.

In our second test, we considered a subset of the Yale face database [17]. The grey-value images in this database were transformed into a 3rd-order representation consisting of two layers. Following Koenderink's proposal of features for illumination invariant image processing [18], we computed mean curvature and Gaussian curvature maps using his fiber bundle interpretation of image data (see Fig. 1(b)).

Fig. 4. Exemplary results of face detection. The evaluation set consists of a subset of 70 images (scaled to a common size) of the Yale face database [17]. The set covers 10 individuals under seven different illumination conditions; the columns in the figure correspond to three of these conditions. The grey-value images in the database were transformed into a two-layered representation using illumination insensitive features as proposed by Koenderink [18]. For training, 21 randomly chosen images were used, where each of the seven conditions was represented by three images. From each training image, 3 positive and 20 negative training patches were extracted. The classifier was thus trained with 483 samples (63 from ω+ (label: +1) and 420 from ω− (label: −1)). On a 1.8 GHz Pentium Mobile notebook running Linux, training took an average time of 4 seconds. Running the classifier on the remaining 49 test images took 30 seconds and produced an equal error rate of 81% (R = 4, θ = 1.566).

Further details concerning the setting are given in Fig. 4. For R = 4 term tensor classifiers, we obtained an equal error rate (the point where recall equals precision) of 85% in the first experiment and 81% in the second one. Lower values of R were less useful; higher values did not yield improvements. Note, however, that, at the cost of only slightly worse precision, the tensor classifiers achieved 100% recall in both experiments.

In the light of these results, it is instructive to contrast multilinear object detection with the currently most popular technique for fast view-based object detection, the cascaded weak classifiers approach by Viola and Jones [19]. This approach provides excellent results where detection has to deal with objects of homogeneous texture (such as faces).

However, applying the method to object detection in natural environments (like the ones in Fig. 3) reveals that less distinctive texture or texture variations due to varying illumination impair the performance [20]. Moreover, the boosting process for feature selection is almost insatiable in terms of training data and training time. Where cascaded weak classifiers require O(10^8) training examples, training our tensor classifiers was done with fewer than 1000 samples; where the training of cascaded weak classifiers requires O(10^1) hours, separable tensor classifiers are trained within seconds. Furthermore, extensions of the approach by Viola and Jones that incorporate temporal information, and thus handle situations where the most salient features are not correlated to texture variations along lower modes of the data (i.e. along the x, y image plane) but to variations along higher modes (for instance, along a temporal or a color direction), are confined to small image patches, since training would become infeasible otherwise. Tensor classifiers, however, automatically and quickly account for multivariate variations and, as our experiments show, accomplish this on patch sizes that are out of reach for cascaded weak classifiers. Because of the latter two characteristics, we believe that tensor-based object detection provides an auspicious avenue to scenarios that necessitate online visual learning, in order to, for instance, cope with changing illumination conditions. This, however, remains a topic for future work.

5 Conclusion and Outlook

Currently, tensor-based methods are gaining popularity as a means to produce dimensionality-reduced image representations. Recent contributions demonstrate that multilinear techniques preserve image structures more faithfully than approaches based on conventional vector space embeddings of image data. The work presented in this paper aims to carry this property over to the design of multilinear classifiers.

We presented an alternating least squares algorithm to learn higher order, multilinear discriminant projection tensors given as an R-term basis expansion of mutually orthogonal rank-1 tensors. We pointed out that this technique is less involved than another recently proposed approach to higher order LDA, because it does not rely on repeated singular value decompositions of large matrices. Instead, due to the separable nature of our classifiers, the matrix inversions that are required during training only apply to reasonably small matrices. Separable, multilinear classifiers thus have several advantages: (i) the amount of required training data is much smaller than for other methods; (ii) training times are very short; (iii) the runtime behavior on multivariate image data is fast, even for classification windows of sizes that are unmanageable for other techniques.

In two exemplary applications, we considered un-preprocessed RGB images and multilayered curvature feature images. Our results underline that multilinear discriminant analysis of 3rd-order image data accounts for salient information distributed across several modes and thus yields robust and reliable results in object detection in complex, natural scenes.

Given these findings, there are numerous promising directions for further research on tensor-based classifiers. Currently, we focus on two questions: How can our approach be extended to multiple classes? Do robust statistical methods provide a simple avenue to even better performance? Concerning the former, we examine the use of tensor contractions different from the ones applied in this paper.

Concerning the latter, we investigate the effects of exchanging the least mean squares steps in our algorithm by robust estimation techniques. Future investigations may deal with incorporating boosting steps and investigating the possibility of applying the kernel trick to higher order linear discriminant analysis.

References

1. Shashua, A., Levin, A.: Linear Image Coding for Regression and Classification using the Tensor-rank Principle. In: Proc. CVPR. Volume I. (2001)
2. Vasilescu, M., Terzopoulos, D.: Multilinear Analysis of Image Ensembles: TensorFaces. In: Proc. ECCV. Volume 2350 of LNCS., Springer (2002)
3. De Lathauwer, L., De Moor, B., Vandewalle, J.: A Multilinear Singular Value Decomposition. SIAM J. Matrix Anal. Appl. 21 (2000)
4. Wang, H., Ahuja, N.: Compact Representation of Multidimensional Data Using Tensor Rank-One Decomposition. In: Proc. ICPR. Volume I. (2004)
5. Shashua, A., Hazan, T.: Non-Negative Tensor Factorization with Applications to Statistics and Computer Vision. In: Proc. Int. Conf. Machine Learning. (2005)
6. Vlasic, D., Brand, M., Pfister, H., Popović, J.: Face Transfer with Multilinear Models. ACM Trans. on Graphics (Proc. SIGGRAPH 05) 24 (2005)
7. Wang, H., Wu, Q., Shi, L., Yu, Y., Ahuja, N.: Out-of-Core Tensor Approximation of Multi-Dimensional Matrices of Visual Data. ACM Trans. on Graphics (Proc. SIGGRAPH 05) 24 (2005)
8. Tenenbaum, J., Freeman, W.: Separating Style and Content with Bilinear Models. Neural Computation 12 (2000)
9. anonymized for review (2005)
10. Yan, S., Xu, D., Zhang, L., Tang, X., Zhang, H.J.: Discriminant Analysis with Tensor Representation. In: Proc. CVPR. Volume I. (2005)
11. Ye, J., Janardan, R., Li, Q.: Two-Dimensional Linear Discriminant Analysis. In Saul, L., Weiss, Y., Bottou, L., eds.: Advances in Neural Information Processing Systems 17. MIT Press, Cambridge, MA (2005)
12. Kolda, T.: Orthogonal Tensor Decompositions. SIAM J. Matrix Anal. Appl. 23 (2001)
13. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer (2001)
14. Fisher, R.: The Use of Multiple Measurements in Taxonomic Problems. Ann. Eugenics 7 (1936)
15. Fukunaga, K.: Introduction to Statistical Pattern Recognition. Academic Press (1990)
16. anonymized for review (2005)
17. Georghiades, A., Belhumeur, P., Kriegman, D.: From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose. IEEE Trans. Pattern Anal. Machine Intell. 23 (2001)
18. Koenderink, J.J., van Doorn, A.J.: Image Processing Done Right. In: Proc. ECCV. Volume 2350 of LNCS., Springer (2002)
19. Viola, P., Jones, M.: Rapid Object Detection using a Boosted Cascade of Simple Features. In: Proc. CVPR. Volume I. (2001)
20. anonymized for review (2004)


More information

Kernel Methods. Machine Learning A W VO

Kernel Methods. Machine Learning A W VO Kernel Methods Machine Learning A 708.063 07W VO Outline 1. Dual representation 2. The kernel concept 3. Properties of kernels 4. Examples of kernel machines Kernel PCA Support vector regression (Relevance

More information

Vectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1. x 2. x =

Vectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1. x 2. x = Linear Algebra Review Vectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1 x x = 2. x n Vectors of up to three dimensions are easy to diagram.

More information

Fisher s Linear Discriminant Analysis

Fisher s Linear Discriminant Analysis Fisher s Linear Discriminant Analysis Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr

More information

Tensor Canonical Correlation Analysis and Its applications

Tensor Canonical Correlation Analysis and Its applications Tensor Canonical Correlation Analysis and Its applications Presenter: Yong LUO The work is done when Yong LUO was a Research Fellow at Nanyang Technological University, Singapore Outline Y. Luo, D. C.

More information

Robust Motion Segmentation by Spectral Clustering

Robust Motion Segmentation by Spectral Clustering Robust Motion Segmentation by Spectral Clustering Hongbin Wang and Phil F. Culverhouse Centre for Robotics Intelligent Systems University of Plymouth Plymouth, PL4 8AA, UK {hongbin.wang, P.Culverhouse}@plymouth.ac.uk

More information

Empirical Discriminative Tensor Analysis for Crime Forecasting

Empirical Discriminative Tensor Analysis for Crime Forecasting Empirical Discriminative Tensor Analysis for Crime Forecasting Yang Mu 1, Wei Ding 1, Melissa Morabito 2, Dacheng Tao 3, 1 Department of Computer Science, University of Massachusetts Boston,100 Morrissey

More information

Motivating the Covariance Matrix

Motivating the Covariance Matrix Motivating the Covariance Matrix Raúl Rojas Computer Science Department Freie Universität Berlin January 2009 Abstract This note reviews some interesting properties of the covariance matrix and its role

More information

Image Analysis. PCA and Eigenfaces

Image Analysis. PCA and Eigenfaces Image Analysis PCA and Eigenfaces Christophoros Nikou cnikou@cs.uoi.gr Images taken from: D. Forsyth and J. Ponce. Computer Vision: A Modern Approach, Prentice Hall, 2003. Computer Vision course by Svetlana

More information

On the convergence of higher-order orthogonality iteration and its extension

On the convergence of higher-order orthogonality iteration and its extension On the convergence of higher-order orthogonality iteration and its extension Yangyang Xu IMA, University of Minnesota SIAM Conference LA15, Atlanta October 27, 2015 Best low-multilinear-rank approximation

More information

Multi-Class Linear Dimension Reduction by. Weighted Pairwise Fisher Criteria

Multi-Class Linear Dimension Reduction by. Weighted Pairwise Fisher Criteria Multi-Class Linear Dimension Reduction by Weighted Pairwise Fisher Criteria M. Loog 1,R.P.W.Duin 2,andR.Haeb-Umbach 3 1 Image Sciences Institute University Medical Center Utrecht P.O. Box 85500 3508 GA

More information

Local Learning Projections

Local Learning Projections Mingrui Wu mingrui.wu@tuebingen.mpg.de Max Planck Institute for Biological Cybernetics, Tübingen, Germany Kai Yu kyu@sv.nec-labs.com NEC Labs America, Cupertino CA, USA Shipeng Yu shipeng.yu@siemens.com

More information

LEC 2: Principal Component Analysis (PCA) A First Dimensionality Reduction Approach

LEC 2: Principal Component Analysis (PCA) A First Dimensionality Reduction Approach LEC 2: Principal Component Analysis (PCA) A First Dimensionality Reduction Approach Dr. Guangliang Chen February 9, 2016 Outline Introduction Review of linear algebra Matrix SVD PCA Motivation The digits

More information

Vector Space Models. wine_spectral.r

Vector Space Models. wine_spectral.r Vector Space Models 137 wine_spectral.r Latent Semantic Analysis Problem with words Even a small vocabulary as in wine example is challenging LSA Reduce number of columns of DTM by principal components

More information

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation. CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.

More information

Notes on Latent Semantic Analysis

Notes on Latent Semantic Analysis Notes on Latent Semantic Analysis Costas Boulis 1 Introduction One of the most fundamental problems of information retrieval (IR) is to find all documents (and nothing but those) that are semantically

More information

Pattern Recognition 2

Pattern Recognition 2 Pattern Recognition 2 KNN,, Dr. Terence Sim School of Computing National University of Singapore Outline 1 2 3 4 5 Outline 1 2 3 4 5 The Bayes Classifier is theoretically optimum. That is, prob. of error

More information

STA 414/2104: Lecture 8

STA 414/2104: Lecture 8 STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks Delivered by Mark Ebden With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable

More information

20 Unsupervised Learning and Principal Components Analysis (PCA)

20 Unsupervised Learning and Principal Components Analysis (PCA) 116 Jonathan Richard Shewchuk 20 Unsupervised Learning and Principal Components Analysis (PCA) UNSUPERVISED LEARNING We have sample points, but no labels! No classes, no y-values, nothing to predict. Goal:

More information

Image Analysis & Retrieval Lec 14 - Eigenface & Fisherface

Image Analysis & Retrieval Lec 14 - Eigenface & Fisherface CS/EE 5590 / ENG 401 Special Topics, Spring 2018 Image Analysis & Retrieval Lec 14 - Eigenface & Fisherface Zhu Li Dept of CSEE, UMKC http://l.web.umkc.edu/lizhu Office Hour: Tue/Thr 2:30-4pm@FH560E, Contact:

More information