Probabilistic Class-Specific Discriminant Analysis

Alexandros Iosifidis
Department of Engineering, ECE, Aarhus University, Denmark
arXiv [cs.LG], 4 Dec 2018

Abstract: In this paper we formulate a probabilistic model for class-specific discriminant subspace learning. The proposed model can naturally incorporate the multi-modal structure of the negative class, which is neglected by existing methods. Moreover, it can be directly used to define a probabilistic classification rule in the discriminant subspace. We show that existing class-specific discriminant analysis methods are special cases of the proposed probabilistic model and that, by casting them as probabilistic models, they can be extended to class-specific classifiers. We illustrate the performance of the proposed model, in comparison with that of related methods, in both verification and classification problems.

Index Terms: Class-Specific Discriminant Analysis, Multi-modal data distributions, Verification, Classification.

I. INTRODUCTION

Class-Specific Discriminant Analysis (CSDA) [1], [2], [3], [4], [5], [6] determines an optimal subspace suitable for verification problems, where the objective is the discrimination of the class of interest from the rest of the world. As an example, let us consider the person identification problem, either through face verification [1] or through exploiting movement information [7]. Different from person recognition, which is a multi-class classification problem assigning a sample (facial image or movement sequence) to a class in a pre-defined set of classes (person IDs in this case), person identification discriminates the person of interest from all the remaining people. While multi-class discriminant analysis models, like Linear Discriminant Analysis (LDA) and its variants [8], [9], [10], can be applied in such problems, they are inherently limited by the adopted class discrimination definition. That is, the maximal dimensionality of the resulting subspace is restricted by the number of classes, due to the rank of the between-class scatter matrix. In verification problems involving two classes, LDA and its variants lead to one-dimensional subspaces. On the other hand, CSDA, by expressing class discrimination using the out-of-class and intra-class scatter matrices, is able to define subspaces whose dimensionality is restricted by the number of samples forming the smallest class (which is usually the class of interest) or the number of original space dimensions. By defining multiple discriminant dimensions, CSDA has been shown to achieve better class discrimination and better performance compared to LDA [1], [4], [5].

While the definition of class discrimination in CSDA and its variants based on the intra-class and out-of-class scatter matrices overcomes the limitations of LDA related to the dimensionality of the discriminant subspace, it overlooks the structure of the negative class. Since in practice the samples forming the negative class belong to many classes different from the positive one, it is expected that they will form subclasses, as illustrated in the example of Figure 1. Class discrimination as defined by CSDA and its variants disregards this structure.

Fig. 1. An example problem for class-specific discriminant learning where the negative class (blue) forms subclasses.

In this paper, we formulate the class-specific discriminant analysis optimization problem based on a probabilistic model that incorporates the above-described structure of the negative class.
We show that the optimization criterion used by the standard CSDA and its variants corresponds to a special case of the proposed probabilistic model, while new discriminant subspaces can be obtained by allowing the samples of the negative class to form subclasses, automatically determined by applying (unsupervised) clustering techniques on the negative class data. Moreover, the use of the proposed probabilistic model for class-specific discriminant learning naturally leads to a classification rule in the discriminant subspace, something that is not possible when the standard CSDA criterion is considered.

II. RELATED WORK

Let us denote by $\mathcal{S}_p = \{x_1, \dots, x_{N_p}\}$ a set of $D$-dimensional vectors representing samples of the positive class and by $\mathcal{S}_n = \{x_{N_p+1}, \dots, x_N\}$, where $N = N_p + N_n$, a set of $N_n$ vectors representing samples of the negative class. In the following we consider the linear class-specific subspace learning case; we describe how to perform non-linear (kernel-based) class-specific subspace learning following the same processing steps in Section III-D. We would like to determine a linear projection $W \in \mathbb{R}^{D \times d}$ mapping $x_i$ to a $d$-dimensional subspace, i.e. $z_i = W^T x_i$, that enhances the discrimination of the two classes.

Class-Specific Discriminant Analysis [1] defines the projection matrix $W$ as the one maximizing the following criterion:
$$J(W) = \frac{D_n(W)}{D_p(W)}, \quad (1)$$
where $D_n(W)$ and $D_p(W)$ are the out-of-class and intra-class distances defined as follows:
$$D_n(W) = \sum_{x_i \in \mathcal{S}_n} \|W^T x_i - W^T m\|_2^2, \quad (2)$$
$$D_p(W) = \sum_{x_i \in \mathcal{S}_p} \|W^T x_i - W^T m\|_2^2. \quad (3)$$
$m$ is the mean vector of the positive class, i.e. $m = \frac{1}{N_p}\sum_{x_i \in \mathcal{S}_p} x_i$. That is, it is assumed that the positive class is unimodal and $W$ is defined as the matrix mapping the positive class vectors as close as possible to the positive class mean vector, while it maps the negative class vectors as far as possible from it, in the reduced dimensionality space $\mathbb{R}^d$. The criterion $J(W)$ in (1) can be expressed as:
$$J(W) = \frac{Tr(W^T S_n W)}{Tr(W^T S_p W)}, \quad (4)$$
where $Tr(\cdot)$ is the trace operator. $S_n \in \mathbb{R}^{D \times D}$ and $S_p \in \mathbb{R}^{D \times D}$ are the out-of-class and intra-class scatter matrices defined in the space $\mathbb{R}^D$:
$$S_n = \sum_{x_i \in \mathcal{S}_n} (x_i - m)(x_i - m)^T, \quad (5)$$
$$S_p = \sum_{x_i \in \mathcal{S}_p} (x_i - m)(x_i - m)^T. \quad (6)$$
$W$ is obtained by solving the generalized eigen-analysis problem $S_n w = \lambda S_p w$ and keeping the eigenvectors corresponding to the $d$ largest eigenvalues [1]. Here we should note that, since the rank of $S_p$ is at most $N_p$ (considering that usually $N_n > N_p$), the dimensionality of the learnt subspace $d$ is restricted by the number of samples forming the positive class (or by the dimensionality of the original space $D$), i.e. $d \le \min(N_p, D)$. In the case where $S_n$ is singular, a regularized version of the above problem is solved by using $\tilde{S}_n = S_n + \epsilon I$, where $I \in \mathbb{R}^{D \times D}$ is the identity matrix and $\epsilon$ is a small positive value.

A Spectral Regression [12] based solution of (4) has been proposed in [13], [14]. Let us denote by $w$ an eigenvector of the generalized eigen-analysis problem $S_n w = \lambda S_p w$ with eigenvalue $\lambda$. By setting $Xw = v$, the original eigen-analysis problem can be transformed to the eigen-analysis problem $P_n v = \lambda P_p v$, where $P_n = e_n e_n^T - e_n e_p^T - e_p e_n^T + \frac{N_n}{N_p} e_p e_p^T$ and $P_p = (1 + \frac{N_n}{N_p}) e_p e_p^T$. Here $e_p \in \mathbb{R}^N$ and $e_n \in \mathbb{R}^N$ are the positive and negative class indicator vectors. That is, $e_{p,i} = 1$ if $x_i \in \mathcal{S}_p$ and $e_{p,i} = 0$ for $x_i \in \mathcal{S}_n$, and $e_{n,i} = 1 - e_{p,i}$. Based on the above, the projection matrix $W$ optimizing (4) is obtained by applying a two-step process: first, the solution of the eigen-analysis problem $P_n v = \lambda P_p v$, leading to the determination of a matrix $V = [v_1, \dots, v_d]$, where $v_i$ is the eigenvector corresponding to the $i$-th largest eigenvalue; then, the calculation of $W = [w_1, \dots, w_d]$, where:
$$X w_i = v_i, \quad i = 1, \dots, d. \quad (7)$$
It has been shown in [13] that, by exploiting the structure of $P_p$ and $P_n$, their generalized eigenvectors $v$ can be directly obtained using the labeling information of the training vectors. However, in that case the order of the eigenvectors is not related to their discrimination ability. Finally, it has been shown in [14] that the class-specific discriminant analysis problem in (1) can be cast as a low-rank regression problem in which the target vectors are the same as those defined in [13], providing a new proof of the equivalence between class-specific discriminant analysis and class-specific spectral regression.
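As a rough illustration of the above computation, the following sketch (assuming NumPy/SciPy, with hypothetical array names Xp and Xn holding the positive and negative training samples row-wise) obtains a CSDA projection by solving the generalized eigenproblem built from Eqs. (4)-(6); purely for numerical convenience, the sketch regularizes the denominator matrix passed to the solver rather than $S_n$:

```python
import numpy as np
from scipy.linalg import eigh

def csda_projection(Xp, Xn, d, eps=1e-6):
    """Class-specific discriminant projection (rough sketch)."""
    m = Xp.mean(axis=0)                      # positive-class mean vector
    Sp = (Xp - m).T @ (Xp - m)               # intra-class scatter, Eq. (6)
    Sn = (Xn - m).T @ (Xn - m)               # out-of-class scatter, Eq. (5)
    D = Xp.shape[1]
    # generalized eigenproblem S_n w = lambda S_p w; the denominator matrix is
    # regularized here so that the solver receives a positive-definite matrix
    evals, evecs = eigh(Sn, Sp + eps * np.eye(D))
    order = np.argsort(evals)[::-1]          # keep the d largest eigenvalues
    W = evecs[:, order[:d]]
    return W, m
```

A test vector x would then be represented as $z = W^T x$ and, in a ranking setting, ordered by its distance to $W^T m$.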
After determining the data projection matrix $W$, the training vectors $x_i$, $i = 1, \dots, N$, are mapped to the discriminant subspace by $z_i = W^T x_i$. When a classification problem is considered, a classifier is trained using $z_i$. For example, a linear SVM is used in [5], trained on the vectors $d_i = |z_i - \mu|$, where $\mu$ is the mean vector of the positive samples in the discriminant subspace and the absolute value operator is applied element-wise on the resulting vector. Due to the need of training an additional classifier, class-specific discriminant analysis models are usually employed in class-specific ranking settings, where test vectors $x_j$ are mapped to the discriminant subspace by $z_j = W^T x_j$ and are subsequently ordered based on their distance w.r.t. the positive class mean, $d_j = \|z_j - \mu\|_2$.

III. PROBABILISTIC CLASS-SPECIFIC LEARNING

In this section, we follow a probabilistic approach for defining a class-specific discrimination criterion that is able to encode subclass information of the negative class. We call the proposed method Probabilistic Class-Specific Discriminant Analysis (PCSDA). PCSDA assumes that there exists a data generation process for the positive class following the distribution:
$$P(x) \sim \mathcal{N}(m, \Phi_p), \quad (8)$$
where $m$ is the mean vector of the positive class and $\Phi_p$ is the covariance of the underlying data generation process. A (multi-modal) negative class is formed by subclasses, the representations of which are drawn from the following distribution:
$$P(y) \sim \mathcal{N}(m, \Phi_n). \quad (9)$$
Here $y$ denotes the representations of the negative subclasses and $\Phi_n$ is the covariance of the negative subclass generation process. Samples of the negative subclasses are drawn from the following distribution:
$$P(x) = \int \mathcal{N}(y \mid m, \Phi_n)\, \mathcal{N}(x \mid y, \Phi_w)\, dy, \quad (10)$$
where $\Phi_w$ is the covariance of the underlying data generation process for each subclass of the negative class. As we will show later, a special case of the above definition for the negative class is when each subclass is formed by one sample, leading to the class discrimination criterion used by the standard CSDA and its variants.
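To make the generative assumptions of Eqs. (8)-(10) concrete, the following sketch (a hypothetical helper, assuming NumPy; m, Phi_p, Phi_n and Phi_w are given parameter arrays) draws a synthetic positive class and a multi-modal negative class from the model:

```python
import numpy as np

def sample_pcsda(m, Phi_p, Phi_n, Phi_w, Np, K, M, rng=None):
    """Draw synthetic data from the PCSDA generative model (sketch).

    Positive samples:       x ~ N(m, Phi_p)                      Eq. (8)
    Negative subclass mean: y_k ~ N(m, Phi_n)                    Eq. (9)
    Negative samples:       x ~ N(y_k, Phi_w) for each subclass  Eq. (10)
    """
    rng = np.random.default_rng() if rng is None else rng
    Xp = rng.multivariate_normal(m, Phi_p, size=Np)
    Xn = []
    for _ in range(K):
        y_k = rng.multivariate_normal(m, Phi_n)   # subclass representation
        Xn.append(rng.multivariate_normal(y_k, Phi_w, size=M))
    return Xp, np.vstack(Xn)
```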

Given a set of positive samples $\mathcal{S}_p = \{x_1, \dots, x_{N_p}\}$, the probability of correct assignment under our model (and under the assumption that the data are i.i.d.) is equal to:
$$P(\mathcal{S}_p) = \prod_{i=1}^{N_p} P(x_i). \quad (11)$$
Let us assume that the negative class is formed by $K$ subclasses, i.e. $\mathcal{S}_n = \{\mathcal{S}_1, \dots, \mathcal{S}_K\}$, each having a cardinality of $M = N_n / K$. We will show how to relax this assumption in the next subsection. Then the probability of assigning each of the negative samples to the corresponding subclass is equal to:
$$P(\mathcal{S}_k) = \int \mathcal{N}(y \mid m, \Phi_n) \prod_{x_i \in \mathcal{S}_k} P(x_i \mid y, \Phi_w)\, dy. \quad (12)$$
By centering the training samples to the positive class mean (this can always be done by setting $x_i \leftarrow x_i - m$) we have:
$$P(\mathcal{S}_p) = \frac{1}{(2\pi)^{\frac{N_p D}{2}} |\Phi_p|^{\frac{N_p}{2}}} \exp\left(-\frac{1}{2} Tr(\Phi_p^{-1} S_p)\right), \quad (13)$$
$$P(\mathcal{S}_k) = \frac{M^{-\frac{D}{2}}}{(2\pi)^{\frac{MD}{2}} |\Phi_w|^{\frac{M-1}{2}} \left|\Phi_n + \frac{1}{M}\Phi_w\right|^{\frac{1}{2}}} \exp\left(-\frac{1}{2} Tr(\Phi_w^{-1} S_w^{(k)})\right) \exp\left(-\frac{1}{2} Tr\left((\Phi_n + \tfrac{1}{M}\Phi_w)^{-1} S_n^{(k)}\right)\right). \quad (14)$$
In the above, $S_p$ is the scatter matrix of the positive class, i.e. $S_p = \sum_{x_i \in \mathcal{S}_p} x_i x_i^T$, $S_w^{(k)}$ is the within-subclass scatter matrix of the $k$-th subclass of the negative class, i.e. $S_w^{(k)} = \sum_{x_i \in \mathcal{S}_k} (x_i - \bar{x}_k)(x_i - \bar{x}_k)^T$, and $S_n^{(k)}$ is the scatter matrix of the $k$-th subclass w.r.t. the positive class, i.e. $S_n^{(k)} = \bar{x}_k \bar{x}_k^T$, where $\bar{x}_k$ is the mean vector of the $k$-th subclass of the negative class, i.e. $\bar{x}_k = \frac{1}{M}\sum_{x_i \in \mathcal{S}_k} x_i$. Since the assignment of the negative samples to subclasses is not provided by the labels used during training, we define them by applying a clustering method (e.g. K-Means) on the negative class vectors.

A. Training phase

The parameters of the proposed PCSDA are the covariance matrices $\Phi_p$, $\Phi_n$ and $\Phi_w$ defining the data generation processes for the positive and negative classes. These parameters are estimated by maximizing the (log) probability of correct assignment of the vectors $x_i$, $i = 1, \dots, N$:
$$L = \ln P(x_i \in \mathcal{S}_p) + \ln P(x_i \in \mathcal{S}_n), \quad (15)$$
where
$$\ln P(x_i \in \mathcal{S}_p) = C_1 - \frac{N_p}{2} \ln|\Phi_p| - \frac{1}{2} Tr(\Phi_p^{-1} S_p), \quad (16)$$
$$\ln P(x_i \in \mathcal{S}_n) = \sum_{k=1}^{K} \ln P(\mathcal{S}_k) = C_2 - \frac{N_n - K}{2} \ln|\Phi_w| - \frac{K}{2} \ln\left|\Phi_n + \frac{1}{M}\Phi_w\right| - \frac{1}{2} Tr(\Phi_w^{-1} S_w) - \frac{1}{2} Tr\left((\Phi_n + \tfrac{1}{M}\Phi_w)^{-1} S_n\right). \quad (17)$$
$S_n$ and $S_w$ are the total scatter matrices of the negative subclasses, i.e. $S_w = \sum_{k=1}^{K} S_w^{(k)}$ and $S_n = \sum_{k=1}^{K} S_n^{(k)}$. By substituting (16) and (17) in (15), the optimization function takes the form:
$$L = C_3 - \frac{N_p}{2} \ln|\Phi_p| - \frac{1}{2} Tr(\Phi_p^{-1} S_p) - \frac{N_n - K}{2} \ln|\Phi_w| - \frac{K}{2} \ln\left|\Phi_n + \frac{1}{M}\Phi_w\right| - \frac{1}{2} Tr(\Phi_w^{-1} S_w) - \frac{1}{2} Tr\left((\Phi_n + \tfrac{1}{M}\Phi_w)^{-1} S_n\right). \quad (18)$$
The saddle points of $L$ with respect to $\Phi_p$, $\Phi_n$ and $\Phi_w$ lead to:
$$\frac{\partial L}{\partial \Phi_p} = 0 \;\Rightarrow\; \Phi_p = \frac{1}{N_p} S_p, \quad (19)$$
$$\frac{\partial L}{\partial \Phi_n} = 0 \;\Rightarrow\; \Phi_n + \frac{1}{M}\Phi_w = \frac{1}{K} S_n, \quad (20)$$
$$\frac{\partial L}{\partial \Phi_w} = 0 \;\Rightarrow\; \Phi_w = \frac{1}{N_n - K} S_w. \quad (21)$$
Combining (20) and (21) we get:
$$\Phi_n = \frac{1}{K} S_n - \frac{1}{M(N_n - K)} S_w. \quad (22)$$
By combining (21) and (22) we can define the total scatter matrix for the negative class:
$$\Phi_t = \Phi_n + \Phi_w = \frac{1}{K} S_n + \frac{1}{N_n} S_w = S_t. \quad (23)$$
Let us denote by $Q \in \mathbb{R}^{D \times D}$ the eigenvectors of the generalized eigen-analysis problem:
$$\Phi_t q = \lambda \Phi_p q. \quad (24)$$
The columns of $Q$ diagonalize both $\Phi_p$ and $\Phi_t$, i.e. $Q^T \Phi_p Q = I$ and $Q^T \Phi_t Q = \Delta_t$, where $\Delta_t \in \mathbb{R}^{D \times D}$ is a diagonal matrix. Setting $V = Q^{-T}$, we have:
$$\Phi_p = V V^T \quad \text{and} \quad \Phi_t = V \Delta_t V^T. \quad (25)$$
Let $W \in \mathbb{R}^{D \times D}$ be the solution of the eigen-analysis problem $\left(\frac{N_p}{K} S_n + \frac{N_p}{N_n} S_w\right) w = \lambda S_p w$. From (19) and (25):
$$V = W^{-T} \left(\frac{1}{N_p}\Lambda_p\right)^{\frac{1}{2}}, \quad (26)$$
where $\Lambda_p = W^T S_p W$ is a diagonal matrix scaling the columns of $W^{-T}$ to calculate $V$. Combining (24), (25) and (26):
$$\Delta_t = N_p\, \Lambda_t \Lambda_p^{-1}, \quad (27)$$
where $\Lambda_t = W^T S_t W$. Thus, $V$ and $\Delta_t$ can be computed by solving an eigen-analysis problem defined on the input vectors $x_i$, $i = 1, \dots, N$.
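The training phase can be summarized in a few lines of code. The sketch below (assuming NumPy, SciPy and scikit-learn's KMeans, with hypothetical names; it is not the authors' implementation) centers the data on the positive mean, forms the total scatter matrices, estimates the covariances following Eqs. (19)-(23), and folds Eqs. (24)-(27) into a single generalized eigendecomposition, assuming roughly equal subclass sizes:

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def pcsda_fit(Xp, Xn, K, eps=1e-6):
    """Estimate the PCSDA model and its discriminant directions (rough sketch)."""
    D = Xp.shape[1]
    m = Xp.mean(axis=0)
    Xp_c, Xn_c = Xp - m, Xn - m                      # center on the positive mean
    labels = KMeans(n_clusters=K, n_init=10).fit_predict(Xn_c)

    Sp = Xp_c.T @ Xp_c                               # positive-class scatter
    Sw = np.zeros((D, D))                            # total within-subclass scatter
    Sn = np.zeros((D, D))                            # total subclass-mean scatter
    for k in range(K):
        Xk = Xn_c[labels == k]
        xk = Xk.mean(axis=0)
        Sw += (Xk - xk).T @ (Xk - xk)
        Sn += np.outer(xk, xk)

    Phi_p = Sp / Xp.shape[0]                         # Eq. (19)
    Phi_t = Sn / K + Sw / Xn.shape[0]                # Eq. (23)

    # Eqs. (24)-(27) folded into one generalized eigendecomposition: eigh returns
    # Q with Q^T Phi_p Q = I and Q^T Phi_t Q = diag(evals), so Delta_t = evals
    # and the discriminant representation of a centered sample x is z = Q^T x.
    evals, Q = eigh(Phi_t, Phi_p + eps * np.eye(D))
    order = np.argsort(evals)[::-1]                  # most discriminative dimensions first
    return m, Q[:, order], evals[order]
```

In this sketch the returned eigenvalues play the role of the diagonal of $\Delta_t$, and the returned eigenvectors provide the representation $z = Q^T(x - m)$ used in the test phase below.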

In the above we made the assumption that the numbers of samples forming the negative subclasses are equal. This assumption can be relaxed following one of the following approaches. After assigning all negative samples to subclasses and calculating the conditional distributions of each subclass, one can sample $M$ vectors from these distributions. Alternatively, one can calculate the total within-subclass scatter matrix of the negative class $S_w$ and define the subclass scatter matrices as $S_w^{(k)} = \frac{1}{K} S_w$. Note that for the calculation of the model's parameters (Eqs. (19)-(23)), only the total scatter matrices $S_p$, $S_w$ and $S_n$ are used.

B. Test phase

After the estimation of the model's parameters, a new sample represented by the vector $x_*$ can be evaluated. The posterior probabilities of the positive and negative classes are given by:
$$P(c_p \mid x_*) = \frac{p(x_* \mid c_p) P(c_p)}{p(x_*)}, \quad (28)$$
$$P(c_n \mid x_*) = \frac{p(x_* \mid c_n) P(c_n)}{p(x_*)}. \quad (29)$$
The a priori class probabilities $P(c_p)$ and $P(c_n)$ can be calculated from the proportion of the positive and negative samples in the training phase, i.e. $P(c_p) = N_p / N$ and $P(c_n) = N_n / N$. Depending on the problem at hand, it may sometimes be preferable to consider equiprobable classes, i.e. $P(c_p) = P(c_n) = 1/2$, leading to the maximum likelihood classification case. The class-conditional probabilities of $x_*$ are given by:
$$p(x_* \mid c_p) = \frac{1}{(2\pi)^{\frac{D}{2}} |\Phi_p|^{\frac{1}{2}}} \exp\left(-\frac{1}{2} x_*^T \Phi_p^{-1} x_*\right) = \frac{1}{(2\pi)^{\frac{D}{2}} |VV^T|^{\frac{1}{2}}} \exp\left(-\frac{1}{2} z_*^T z_*\right), \quad (30)$$
$$p(x_* \mid c_n) = \frac{1}{(2\pi)^{\frac{D}{2}} |\Phi_t|^{\frac{1}{2}}} \exp\left(-\frac{1}{2} x_*^T \Phi_t^{-1} x_*\right) = \frac{1}{(2\pi)^{\frac{D}{2}} |V \Delta_t V^T|^{\frac{1}{2}}} \exp\left(-\frac{1}{2} z_*^T \Delta_t^{-1} z_*\right), \quad (31)$$
where $z_* = V^{-1} x_*$. In the case where we want $z_* \in \mathbb{R}^d$, $d < D$, we keep the dimensions of $z_*$ corresponding to the $d$ largest diagonal elements of $\Delta_t$. Combining (28)-(31), the ratio of class posterior probabilities is equal to:
$$\lambda(x_*) = \frac{P(c_p)}{P(c_n)}\, |\Delta_t|^{\frac{1}{2}}\, \frac{\exp\left(-\frac{1}{2} z_*^T z_*\right)}{\exp\left(-\frac{1}{2} z_*^T \Delta_t^{-1} z_*\right)} \quad (32)$$
and can also be used to define a classification rule:
$$g(x_*) = \ln P(c_p) - \ln P(c_n) + \frac{1}{2} \ln|\Delta_t| - \frac{1}{2} z_*^T z_* + \frac{1}{2} z_*^T \Delta_t^{-1} z_*, \quad (33)$$
classifying $x_*$ to the positive class if $g(x_*) > 0$ and to the negative class otherwise. In class-specific ranking settings one can follow the process applied when using the standard CSDA approach. First, test vectors $x_j$ are mapped to the discriminant subspace and $z_j$, $j = 1, \dots, N$, are obtained. Then, the $z_j$'s are ordered based on their distance w.r.t. the positive class mean, $d_j = \|z_j - \mu\|_2$.

C. CSDA variants under the probabilistic model

A special case of PCSDA can be obtained by setting the assumption that each negative sample forms a negative subclass, i.e. $K = N_n$ and $M = 1$. In that case $\Phi_w = 0$ and $\Phi_t = \Phi_n$, the negative samples are drawn from a distribution $P(x) \sim \mathcal{N}(m, \Phi_n)$, and $W$ is calculated by solving $S_n w = \tilde{\lambda} S_p w$, where $\tilde{\lambda} = \lambda N_n / N_p$, i.e., we obtain the class discrimination definition of CSDA. The Spectral Regression-based solutions of CSDA in [14], [5], [13] and the low-rank regression solution of [14] use the same class discrimination criterion and, thus, correspond to the same setting of PCSDA. Since the discrimination criterion used in CSDA is a special case of the proposed probabilistic model, all the above-mentioned methods can be extended to perform classification using $g(\cdot)$ in (33).
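Continuing the training sketch given earlier (same hypothetical names m, Q and Delta_t as returned by pcsda_fit), the classification rule of Eq. (33) can be written as follows, with the test vector centered on the positive-class mean inside the function:

```python
import numpy as np

def pcsda_classify(x, m, Q, Delta_t, prior_p=0.5, prior_n=0.5, d=None):
    """Classification rule g(x) of Eq. (33) (sketch).

    Returns +1 (positive class) if g(x) > 0, otherwise -1.
    """
    z = Q.T @ (x - m)                        # discriminant representation
    dt = Delta_t
    if d is not None:                        # keep the d most discriminative dims
        z, dt = z[:d], Delta_t[:d]
    g = (np.log(prior_p) - np.log(prior_n)
         + 0.5 * np.sum(np.log(dt))          # 0.5 * ln|Delta_t|
         - 0.5 * z @ z                       # - 0.5 * z^T z
         + 0.5 * z @ (z / dt))               # + 0.5 * z^T Delta_t^{-1} z
    return 1 if g > 0 else -1
```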
D. Non-linear PCSDA

In the above analysis we considered the linear class-specific subspace learning case. In order to non-linearly map $x_i \in \mathbb{R}^D$ to $z_i \in \mathbb{R}^d$, traditional kernel-based learning methods perform a non-linear mapping of the input space $\mathbb{R}^D$ to the feature space $\mathcal{F}$ using a function $\phi(\cdot)$, i.e. $x_i \in \mathbb{R}^D \rightarrow \phi(x_i) \in \mathcal{F}$. Then, linear class-specific projections are defined by using the training data in $\mathcal{F}$. Since the dimensionality of $\mathcal{F}$ is arbitrary (virtually infinite), the data representations in $\mathcal{F}$ cannot be calculated. Traditional kernel-based learning methods address this issue by exploiting the Representer theorem: the non-linear mapping is implicitly performed using the kernel function encoding dot products in the feature space, i.e. $\kappa(x_i, x_j) = \phi(x_i)^T \phi(x_j)$ [15]. As has been shown in [16], the effective dimensionality of the kernel space $\mathcal{F}$ is at most equal to $L = \min(D, N)$; thus, an explicit non-linear mapping $x_i \in \mathbb{R}^D \rightarrow \phi_i \in \mathbb{R}^L$ can be calculated such that $\kappa(x_i, x_j) = \phi_i^T \phi_j$. This is achieved by using $\Phi = \Sigma^{\frac{1}{2}} U^T$, where $U$ and $\Sigma$ contain the eigenvectors and eigenvalues of the kernel matrix $K \in \mathbb{R}^{N \times N}$ [17]. Thus, the extension of PCSDA to the non-linear (kernel) case can be readily obtained by applying the above-described linear PCSDA on the vectors $\phi_i$, $i = 1, \dots, N$. For the cases where the size of the training set is prohibitive for applying kernel-based discriminant learning, the Nyström-based kernel subspace learning method of [18] or non-linear data mappings based on randomized features, as proposed in [19], can be used. Here we should note that the application of K-Means in $\mathbb{R}^L$ corresponds to the application of kernel K-Means in the original space $\mathbb{R}^D$.
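A rough sketch of such an explicit kernel mapping is given below (assuming NumPy and an RBF kernel; this illustrates the general idea behind [16], [17] rather than reproducing their exact procedure): the kernel matrix is eigendecomposed and the representations $\Phi = \Sigma^{1/2} U^T$ are formed, after which linear PCSDA can be applied to them directly.

```python
import numpy as np

def rbf_kernel_matrix(X, Y, sigma):
    """RBF kernel matrix between the rows of X and the rows of Y."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def explicit_kernel_map(X, sigma, tol=1e-10):
    """Explicit representations phi_i such that phi_i^T phi_j = k(x_i, x_j)."""
    K = rbf_kernel_matrix(X, X, sigma)
    evals, U = np.linalg.eigh(K)                         # K = U diag(evals) U^T
    keep = evals > tol                                   # non-negligible eigenvalues only
    Phi = np.sqrt(evals[keep])[:, None] * U[:, keep].T   # Phi = Sigma^{1/2} U^T, shape (L, N)
    return Phi.T                                         # one L-dimensional row per sample
```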

TABLE I. Average precision on the MNIST dataset, comparing CSDA, PCSDA ($K = N_n$), PCSDA ($K = 5$), PCSDA ($K = 10$), PCSDA ($K = 20$), CSSR [14] and PCSSR (mean over all classes).

TABLE II. Average precision (and standard deviation) over five experiments on the JAFFE dataset for the same seven methods, per expression class (Anger, Disgust, Fear, Happiness, Sadness, Surprise, Neutral) and on average.

TABLE III. Average precision (and standard deviation) over five experiments on the 15 scene dataset for the same seven methods, per class (C1-C15) and on average.

IV. EXPERIMENTS

In this section we provide experimental results illustrating the performance of the proposed PCSDA in comparison with existing CSDA variants. For each dataset, we formed class-specific ranking problems for each class, using the class data as positive samples and the data of the remaining classes as negative samples. In all the experiments the data representations are first mapped to the PCA space preserving 98% of the spectrum energy and subsequently non-linearly mapped to the subspace of the kernel space (as discussed in Section III-D). We used the RBF kernel function, setting the value of $\sigma$ equal to the mean distance between the positive training vectors. The discriminant subspace is subsequently obtained by applying one of the competing discriminant methods. The dimensionality of the discriminant subspace is determined by applying five-fold cross-validation on the training data within the range of $[1, 5]$. Finally, performance is measured in the discriminant subspace using the average precision metric.
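As a concrete reading of this protocol, the following sketch (assuming NumPy, SciPy and scikit-learn, with hypothetical names; the exact experimental pipeline is not published with the paper) keeps 98% of the PCA spectrum energy and sets the RBF scale $\sigma$ to the mean distance between the positive training vectors:

```python
import numpy as np
from sklearn.decomposition import PCA
from scipy.spatial.distance import pdist

def preprocess(X_train, y_train, positive_label, energy=0.98):
    """PCA keeping `energy` of the spectrum, plus RBF scale from positives (sketch)."""
    pca = PCA(n_components=energy, svd_solver='full').fit(X_train)
    Xp = pca.transform(X_train[y_train == positive_label])
    sigma = pdist(Xp).mean()                 # mean distance between positive vectors
    return pca, sigma
```

The PCA-projected data would then be passed through the explicit kernel map sketched in Section III-D before applying one of the competing discriminant methods.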

Table I illustrates the performance over all classes of the MNIST dataset [20]. Since by using the entire training set the performance saturates, we formed a reduced problem by using only the first 100 images of each class in the training set, and we report performance on the entire test set. Table II illustrates the performance over all classes of the JAFFE dataset [21]. We used the grayscale values of the facial images as a representation. For each class-specific ranking problem, we applied five experiments by randomly splitting each class in 70%/30% subsets, using the first subset of each class to form the training set and the second ones to form the test set. Table III illustrates the performance over all classes of the 15 scene dataset [22]. We employed the deep features generated by average pooling over the spatial dimensions of the last convolutional layer of the VGG network [23] trained on the ILSVRC2012 database. For each class-specific ranking problem, we applied five experiments by randomly splitting each class using 70%/30% splits.

In Tables I-III, the variants of the proposed method that outperform the corresponding baseline are highlighted for each problem. As can be seen, the performance obtained for the standard CSDA and its probabilistic version PCSDA ($K = N_n$) is very similar. The differences observed are mainly due to the small differences in the ranking of the used projection vectors (Eq. (27)). Moreover, exploiting subclass information of the negative class can be beneficial, since performance improves in most tested cases. Comparing the CSSR in [14] and its probabilistic version PCSSR, it can be seen that the latter improves the performance considerably. This is due to the use of (27), which ranks all eigenvectors of CSSR according to their class-specific discrimination power.

Finally, in Table IV we compare the performance of CSDA and PCSDA for classification. Since each of the class-specific classification problems is imbalanced, we used the g-mean metric, which is defined as the square root of the product of the correct classification rates of the two classes. Classification is performed by using the classification rule $g(\cdot)$ in (33) for PCSDA. For CSDA, a linear SVM is used, trained on the vectors $d_i = |z_i - \mu|$, similar to [5]. The value of $C$ is optimized in the range $10^{(-3:3)}$ jointly with the discriminant subspace dimensionality by applying five-fold cross-validation on the training data. Comparing the performance of the CSDA+SVM classification scheme with that of PCSDA ($K = N_n$) on the MNIST and JAFFE datasets, we can see that the use of an additional classifier trained on the data representations in the discriminant subspace increases classification performance. By allowing the negative class to form subclasses, the performance of PCSDA increases and is competitive with that of the CSDA+SVM classification scheme. Overall, the PCSDA model achieves competitive performance without requiring the training of a new classifier in the discriminant subspace.

TABLE IV. Performance (g-mean) on the classification problems on the MNIST, 15 scene and JAFFE datasets for CSDA and for PCSDA with $K = N_n$, $K = 5$, $K = 10$ and $K = 20$.

V. CONCLUSIONS

In this paper we proposed a probabilistic model for class-specific discriminant subspace learning that is able to incorporate subclass information, naturally appearing in the negative class, in the optimization criterion. The adoption of a probability-based optimization criterion for class-specific discriminant subspace learning leads to a classifier defined on the data representations in the discriminant subspace. We showed that the proposed approach includes as special cases existing class-specific discriminant analysis methods. Experimental results illustrated the performance of the proposed model, in comparison with that of related methods, in both verification and classification problems.

REFERENCES

[1] J. Kittler, Y. Li, and J. Matas, Face verification using client specific Fisher faces, The Statistics of Directions, Shapes and Images, 2000.
[2] G. Goudelis, S. Zafeiriou, A. Tefas, and I. Pitas, Class-specific kernel discriminant analysis for face verification, IEEE Transactions on Information Forensics and Security, vol. 2, no. 3, 2007.
[3] S. Zafeiriou, G. Tzimiropoulos, M. Petrou, and T. Stathaki, Regularized kernel discriminant analysis with a robust kernel for face recognition and verification, IEEE Transactions on Neural Networks, vol. 23, no. 3, 2012.
[4] S. Arashloo and J. Kittler, Class-specific kernel fusion of multiple descriptors for face verification using multiscale binarized statistical image features, IEEE Transactions on Information Forensics and Security, vol. 9, 2014.
[5] A. Iosifidis, A. Tefas, and I. Pitas, Class-specific reference discriminant analysis with application in human behavior analysis, IEEE Transactions on Human-Machine Systems, vol. 45, no. 3, 2015.
[6] A. Iosifidis, M. Gabbouj, and P. Pekki, Class-specific nonlinear projections using class-specific kernel spaces, IEEE International Conference on Big Data Science and Engineering, 2015.
[7] A. Iosifidis, A. Tefas, and I. Pitas, Activity based person identification using fuzzy representation and discriminant learning, IEEE Transactions on Information Forensics and Security, vol. 7, 2012.
[8] R. Duda, P. Hart, and D. Stork, Pattern Classification, 2nd ed. Wiley-Interscience, 2000.
[9] A. Iosifidis, A. Tefas, and I. Pitas, On the optimal class representation in Linear Discriminant Analysis, IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 9, 2013.
[10] A. Iosifidis, A. Tefas, and I. Pitas, Kernel reference discriminant analysis, Pattern Recognition Letters, vol. 49, 2014.
[11] Y. Jia, F. Nie, and C. Zhang, Trace ratio problem revisited, IEEE Transactions on Neural Networks, vol. 20, no. 4, 2009.
[12] D. Cai, X. He, and J. Han, Spectral regression for efficient regularized subspace learning, International Conference on Computer Vision, 2007.
[13] A. Iosifidis and M. Gabbouj, Scaling up class-specific kernel discriminant analysis for large-scale face verification, IEEE Transactions on Information Forensics and Security, vol. 11, 2016.
[14] A. Iosifidis and M. Gabbouj, Class-specific kernel discriminant analysis revisited: Further analysis and extensions, IEEE Transactions on Cybernetics, vol. 47, 2017.
[15] B. Schölkopf and A. Smola, Learning with Kernels. MIT Press, 2002.
[16] N. Kwak, Nonlinear Projection Trick in kernel methods: An alternative to the kernel trick, IEEE Transactions on Neural Networks and Learning Systems, vol. 24, 2013.
[17] N. Kwak, Implementing kernel methods incrementally by incremental nonlinear projection trick, IEEE Transactions on Cybernetics, vol. 47, 2017.
[18] A. Iosifidis and M. Gabbouj, Nyström-based approximate kernel subspace learning, Pattern Recognition, vol. 57, 2016.
[19] A. Rahimi and B. Recht, Random features for large-scale kernel machines, Advances in Neural Information Processing Systems, 2007.
[20] Y. LeCun, L. Bottou, and Y. Bengio, Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol. 86, 1998.
[21] M. Lyons, S. Akamatsu, M. Kamachi, and J. Gyoba, Coding facial expressions with Gabor wavelets, IEEE International Conference on Automatic Face and Gesture Recognition, 1998.
[22] S. Lazebnik, C. Schmid, and J. Ponce, Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories, IEEE Conference on Computer Vision and Pattern Recognition, 2006.
[23] K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv:1409.1556, 2014.
