Biometric scores fusion based on total error rate minimization


Pattern Recognition 41 (2008)

Biometric scores fusion based on total error rate minimization

Kar-Ann Toh, Jaihie Kim, Sangyoun Lee

Biometrics Engineering Research Center, School of Electrical & Electronic Engineering, Yonsei University, 134 Shinchon-dong, Seodaemun-gu, Seoul, 120-749, Korea

Received 3 October 2006; received in revised form 23 June 2007; accepted 25 July 2007

Abstract

This paper addresses the biometric scores fusion problem from the error rate minimization point of view. In contrast to the conventional approach, which treats fusion classifier design and performance evaluation as a two-stage process, this work directly optimizes the target performance with respect to the fusion classifier design. Based on a smooth approximation to the total error rate of identity verification, a deterministic solution is proposed to solve the fusion optimization problem. The proposed method is applied to a face and iris verification fusion problem, addressing the demand for high security in the modern networked society. Our empirical evaluations show promising potential in terms of decision accuracy and computing efficiency. © 2007 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.

Keywords: Multimodal biometrics; Decision fusion; Equal error rate; Pattern classification; Machine learning

1. Introduction

1.1. Background

Owing to the identity-based nature of authentication, biometrics has gained much attention over recent years, particularly for its potential role in information and forensic security. However, many problems remain to be resolved before biometrics can achieve pervasive application. For instance, due to inherent limitations as well as external sensing factors, no single biometric method can by itself warrant 100% authentication accuracy or universality of usage. Since combining multiple biometric methods can alleviate many of these problems, multimodal biometrics has become a focused field of research.
Existing means to combine or fuse multiple biometric modalities can operate either before matching or after matching. For fusion before matching, two levels can be identified, namely the sensor level and the feature level (see e.g. Refs. [1,2]). For fusion after matching, three levels can be identified, namely the abstract level (see e.g. Ref. [3]), the rank level and the match score level (see e.g. Ref. [4]). Concerning the central module of fusion, either non-training-based or training-based methods can be adopted. Non-training-based methods often assume that the outputs of individual biometric classifiers are the probabilities that the input pattern belongs to a certain class label (see e.g. Refs. [5–8]). Training-based methods do not require this assumption and can operate directly on the match scores generated by biometric verification modules (see e.g. Refs. [3,9–11]). Our work here belongs to the training-based approach working at the match score level.

Since the outcome of biometric verification consists of only two labels, i.e. the query identity is recognized as either a genuine-user or an imposter, the verification process can be treated as a two-category classification problem. This classification treatment holds well for multimodal scores fusion because similar decision labels are anticipated.

(Corresponding author: K.-A. Toh; e-mail: katoh@yonsei.ac.kr.)

1.2. Motivation

Apart from the receiver operating characteristic (ROC) curves, the false acceptance rate (FAR), the false rejection rate (FRR), and the equal error rate (EER) have been used

extensively for comparison of biometric verification performance. These error rates have their own reasons for being widely used: (i) each is a single-index measure and thus simple and direct to interpret as compared with the ROC; (ii) the EER is a compact term indicating both the FAR and the FRR at the same time; and, more importantly, (iii) the EER is based on a projected optimal operating point (of the total error rate, TER) where the FAR curve meets the FRR curve.

The error rate is a percentage count of misclassified samples, and this makes it difficult to analyze directly without imposing strong assumptions regarding the data distribution. Except for Poh and Bengio [12], who approached the problem from the theoretical EER point of view based on a Gaussian assumption, there has been a lack of literature from the biometric community solving, or even acknowledging, this problem. A common practice in the multimodal biometrics community is to perform decision fusion and performance evaluation separately (see all cited references in the previous subsection). For instance, the fusion module is first designed using a certain distance criterion (e.g. the least squares error, LSE) and then the performance is evaluated using the FAR, FRR or EER. Although a certain correlation exists between the learning distance (e.g. LSE) and the decision error rates (percentage counts of incorrectly classified samples), in practice the FAR and FRR outcomes are frequently found to behave rather differently from the optimally learned fusion classifier (optimal with respect to the LSE). This is due to a mismatch between the learning objective (LSE) and the authentication objectives (FAR, FRR or EER). Support vector machines (SVMs) [13,14] have largely advanced the state of decision boundary design. However, they offer no direct clue regarding the error rates without going through the error counting process.
In view of the above problem, we present an attempt to approximate the error counting of FAR and FRR and then optimize the approximated total error rate (TER, which is equal to FAR + FRR) directly with respect to fusion classifier design. Based on extensive experiments on fusing several biometrics, we shall observe the empirical behavior of this formulation.

1.3. Contributions and organization

The contributions of this work are enumerated as follows: (i) formulation of an approximate optimization objective which includes a decision threshold for empirical TER estimation, (ii) proposal of a novel fast closed-form solution to TER minimization, and (iii) provision of empirical evidence using several biometric data sets for fusion study.

The paper is organized as follows: the next section provides several definitions of error rates and a brief account of related linear estimation models. A direct means to compute the TER given a decision model is presented in Section 3. With the TER objective in place, Section 4 presents our proposed method to minimize the TER. In Section 5, we introduce three non-intrusive biometrics from the face and the eye for verification scores fusion. This is followed by extensive experimentation using data from these three biometrics in Section 6. The proposed method is further benchmarked using publicly available data sets. Section 7 summarizes the results and observations, and finally some concluding remarks are given in Section 8.

2. Definitions and preliminaries

Consider the binary classification setting in biometric decision. Suppose we have m learning examples $\{x_i\}_{i=1}^{m} \subset \mathbb{R}^l$ (each an l-dimensional biometric feature vector) and their corresponding class labels $y_i \in \{0, 1\}$, where 0 denotes an imposter and 1 denotes a genuine-user. Let $g : \mathbb{R}^l \rightarrow \mathbb{R}$ be the hypothesis function (biometric classifier) mapping these pattern features onto a scalar measure for decision inference.
Suppose g(x) produces a continuous output; then the output must be thresholded in order to label each example as positive class (genuine-user) or negative class (imposter). Given a decision threshold τ, the class label associated with a new example $x_n$ can be written as

$$\mathrm{cls}(g(x_n)) = \begin{cases} 1\ (\text{genuine-user}) & \text{if } g(x_n) \geq \tau, \\ 0\ (\text{imposter}) & \text{if } g(x_n) < \tau. \end{cases} \qquad (1)$$

For each operational setting of τ, a true positive rate (TP) and a true negative rate (TN) are sufficient to describe the classifier's performance. Alternatively, a false positive rate (FP) and a false negative rate (FN) can also be defined. It is noted that TP + FN = 1 and TN + FP = 1. The relation between these recognition rates can be tabulated as a confusion matrix, as shown in Table 1.

2.1. Equal error rate

In biometric verification, where the basic task is to distinguish between two classes of users, namely the genuine-user and the imposter, the FP rate is also called the FAR, and the FN rate is also called the FRR. By varying the threshold τ from −∞ to +∞ (or from 0 to 1 in a normalized case), the FRR shows an increasing trend while the FAR shows a decreasing trend with respect to this change of τ. Along the variation of threshold τ, there is a point (say, at τ*) where the two curves (FAR and FRR) cross each other, and this point is called the EER. In other words,

$$\mathrm{EER} = \mathrm{FAR}\big|_{\mathrm{FAR}=\mathrm{FRR}} = \mathrm{FRR}\big|_{\mathrm{FRR}=\mathrm{FAR}} = \mathrm{FAR}\big|_{\tau^*} = \mathrm{FRR}\big|_{\tau^*}.$$

2.2. Total error rate

The TER is defined as the sum of the false acceptance and false rejection rates (TER = FAR + FRR).

Table 1
Confusion matrix for two-class problems

Estimate\Truth     P      N
P̂                  TP     FP
N̂                  FN     TN

The EER

mentioned above is frequently used as a performance index for biometric systems because, at this particular operating point, the TER is frequently found to be at its minimum. This is particularly true when the genuine-user and imposter score distributions are normal (see Fig. 1). As such, the EER is frequently approximated by TER/2 at τ* [12], and minimization of the EER may be treated as minimization of the minimum TER. We shall minimize the empirical TER and observe its impact on the observed EER in this development.

Fig. 1. Relations among FAR, FRR, TER and EER (panels: genuine-user and imposter score distributions; FAR, FRR and TER curves, both plotted against the normalized score).

2.3. Linear parametric models

Linear parametric models have been widely used due to their tractability in optimization and related analysis. The embedding of nonlinearities such as kernels and other basis functions into linear regression models has further widened their scope of application (see e.g. Refs. [15–17]). The importance of linear parametric models that embed nonlinearities is thus obvious, and we shall limit our scope to such models in this paper.

A good example of a linear parametric model is multivariate polynomial (MP) regression, which has been shown to possess the capability of describing arbitrary complex nonlinear input-output relationships, attributed to the theoretical ground of Weierstrass's approximation theorem (see e.g. Ref. [18]). However, the number of independent adjustable parameters in MP grows like $l^r$ for an rth-order model with input dimension l [19]. This limitation has recently been addressed by Toh [20] and Toh et al. [21] by reducing the number of polynomial expansion terms for classification applications.
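As an illustration of how the number of MP terms grows like $l^r$, a full multivariate polynomial expansion can be sketched as follows. This is our own illustrative sketch, not the RM model itself (RM keeps far fewer terms), and all names are ours.

```python
# Illustrative sketch: a full multivariate polynomial expansion of an
# l-dimensional input up to total order r.  The term count grows rapidly
# with l and r, which motivates the reduced (RM) model discussed above.
import itertools
import numpy as np

def poly_expand(x, r):
    """Row vector of all monomials of x = (x_1, ..., x_l) up to total order r."""
    terms = [1.0]
    for order in range(1, r + 1):
        for idx in itertools.combinations_with_replacement(range(len(x)), order):
            terms.append(float(np.prod([x[i] for i in idx])))
    return np.array(terms)

x = np.array([0.5, 2.0])     # l = 2, illustrative input
p = poly_expand(x, r=2)      # terms: 1, x1, x2, x1^2, x1*x2, x2^2
```

For l = 2 and r = 2 this already produces six terms; the count is $\binom{l+r}{r}$ in general, which is why a reduced expansion is preferred in practice.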
We shall adopt this reduced multivariate polynomial model (RM) in our experiments, even though the proposed formulation can easily be adapted to other types of linear parametric models with embedded basis functions. Consider an l-dimensional input x and an rth-order polynomial operating on x which gives rise to K polynomial expansion terms. A linear parametric model in this context can be written as

$$g(\alpha, x) = \sum_{k=1}^{K} \alpha_k\, p_k(x) = p(x)\,\alpha, \qquad (2)$$

where $p_k(x)$ corresponds to the kth polynomial expansion term of the row vector $p(x) = [p_1(x), p_2(x), \ldots, p_K(x)]$ and $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_K]^{\mathrm{T}}$ is a column parameter vector. When each input $x_i \in \mathbb{R}^l$ has a known label $y_i \in \mathbb{R}$, giving rise to m learning data pairs $(x_i, y_i)$, $i = 1, 2, \ldots, m$, the learning problem can be supervised. In biometric verification problems, these target labels are known as genuine-user and imposter. Learning of the target labels (packed as $y = [y_1, y_2, \ldots, y_m]^{\mathrm{T}}$) can be accomplished by minimizing an LSE criterion. To stabilize the solution for estimation, a weight decay regularization can be performed [21]. The criterion function to be minimized

is thus

$$J = \frac{1}{2} \sum_{i=1}^{m} [y_i - p(x_i)\alpha]^2 + \frac{b}{2}\,\alpha^{\mathrm{T}}\alpha = \frac{1}{2}\,\| y - P\alpha \|_2^2 + \frac{b}{2}\,\| \alpha \|_2^2, \qquad (3)$$

where b controls the weighting of the regularization and P packs the training samples in matrix form:

$$P = \begin{bmatrix} p(x_1) \\ p(x_2) \\ \vdots \\ p(x_m) \end{bmatrix}. \qquad (4)$$

The estimated training output is given by $\hat{y} = P\alpha$, where the solution for α which minimizes J is

$$\text{LSE}: \quad \alpha = (P^{\mathrm{T}} P + bI)^{-1} P^{\mathrm{T}} y, \qquad (5)$$

with b chosen to be a small value for stability without introducing much bias [21]. I is an identity matrix of the same dimension as $P^{\mathrm{T}}P$. For unseen test data $x_t$, another polynomial matrix $P_t$ can be generated using $p(x_t)$. Prediction of the class label $\hat{y}_t$ can then be performed using the learned α (i.e. $\hat{y}_t = P_t \alpha$) and the classification decision given by Eq. (1). With this background in place, we are ready to discuss the TER and related issues in the sequel.

3. Direct computation of TER

It is noted here that minimization of the minimum TER would be a two-step process if classifier optimization (locating the classifier parameters) and threshold optimization (locating the minimum TER at τ* from FAR and FRR computations) were treated separately. We present a direct method to compute the TER here, and then in the next section we propose a method to minimize this minimum TER directly through fusion classifier design.

Without loss of generality, consider the decision distributions illustrated in Fig. 1, where the genuine-user scores are normally centered at a higher value than the imposter scores. Denoting the variables (x, m) related to positive (genuine-user) and negative (imposter) examples by respective superscripts + and −, it is not difficult to see that the FAR and FRR are merely the averaged counts of decision scores falling within the opposite pattern categories:

$$\mathrm{FAR} = \frac{1}{m^-} \sum_{j=1}^{m^-} \mathbb{1}\{ g(x_j^-) \geq \tau \}, \qquad (6)$$

$$\mathrm{FRR} = \frac{1}{m^+} \sum_{i=1}^{m^+} \mathbb{1}\{ g(x_i^+) < \tau \}, \qquad (7)$$

where the indicator term $\mathbb{1}\{ g(x) \geq \tau \}$ (respectively $\mathbb{1}\{ g(x) < \tau \}$) equals 1 whenever $g(x) \geq \tau$ (respectively $g(x) < \tau$), and 0 otherwise.
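The averaged error counts of Eqs. (6) and (7) can be sketched directly on two score arrays; the data and variable names below are illustrative stand-ins, not from the paper.

```python
# Empirical FAR and FRR as averaged step-function counts, Eqs. (6)-(7).
import numpy as np

def far_frr(g_neg, g_pos, tau):
    """g_neg holds imposter scores g(x_j^-); g_pos holds genuine-user
    scores g(x_i^+).  Each rate is an averaged indicator count."""
    far = np.mean(g_neg >= tau)   # fraction of imposters accepted
    frr = np.mean(g_pos < tau)    # fraction of genuine-users rejected
    return far, frr

g_neg = np.array([0.1, 0.2, 0.6, 0.3])   # imposter scores (illustrative)
g_pos = np.array([0.7, 0.9, 0.4, 0.8])   # genuine-user scores (illustrative)
far, frr = far_frr(g_neg, g_pos, tau=0.5)   # -> 0.25, 0.25
```

The TER that the following development works with is then simply `far + frr`.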
Defining a step function $\delta(\varepsilon_j^-) \triangleq \mathbb{1}\{\varepsilon_j^- \geq 0\}$ with $\varepsilon_j^- = g(x_j^-) - \tau$ for $j = 1, 2, \ldots, m^-$, Eq. (6) can be rewritten as

$$\mathrm{FAR} = \frac{1}{m^-} \sum_{j=1}^{m^-} \delta(\varepsilon_j^-). \qquad (8)$$

We can use the same definition of the step function δ for Eq. (7) when we write $\varepsilon_i^+ = \tau - g(x_i^+) - \Delta$ (Δ, which can be ignored in practice, accounts for the strict inequality in Eq. (7)) for $i = 1, 2, \ldots, m^+$:

$$\mathrm{FRR} = \frac{1}{m^+} \sum_{i=1}^{m^+} \delta(\varepsilon_i^+). \qquad (9)$$

With the FAR and FRR in place, the TER can be written as

$$\mathrm{TER} = \mathrm{FAR} + \mathrm{FRR} = \frac{1}{m^-} \sum_{j=1}^{m^-} \delta(\varepsilon_j^-) + \frac{1}{m^+} \sum_{i=1}^{m^+} \delta(\varepsilon_i^+). \qquad (10)$$

Here we note that the TER can be related to the commonly known accuracy of Eq. (11) when $s = m^-/m^+$ and the normalization factor ($\frac{1}{2}$) is ignored:

$$\text{Accuracy} = \frac{\mathrm{TP} + s\,(1 - \mathrm{FP})}{1 + s}. \qquad (11)$$

Suppose the fusion classifier g consists of some K adjustable parameters $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_K]^{\mathrm{T}}$ operating on the feature vector x, i.e. $g(\alpha, x)$; then the goal of improving the classifier's discrimination performance can be treated as minimizing the TER given by

$$\mathrm{TER}(\alpha, x^+, x^-) = \frac{1}{m^-} \sum_{j=1}^{m^-} \delta(\varepsilon(\alpha, x_j^-)) + \frac{1}{m^+} \sum_{i=1}^{m^+} \delta(\varepsilon(\alpha, x_i^+)), \qquad (12)$$

where $\varepsilon_j^- = \varepsilon(\alpha, x_j^-) = g(\alpha, x_j^-) - \tau$ for $j = 1, 2, \ldots, m^-$ and $\varepsilon_i^+ = \varepsilon(\alpha, x_i^+) = \tau - g(\alpha, x_i^+)$ for $i = 1, 2, \ldots, m^+$.

Remark 1. We note that $g(\alpha, x)$ need not be linear with respect to x. However, when $g(\alpha, x)$ is linear with respect to both α and x, Eq. (12) may be formulated as a perceptron criterion function where only the total number of misclassified samples is accumulated (see Ref. [22, Chapter 5.5]). Due to the piecewise nature of such a perceptron formulation, the criterion function could be ill-posed. In particular, the error-correcting procedure may never cease in the linearly non-separable case. We shall attempt a smooth approximation approach (where $g(\alpha, x)$ can be nonlinear with respect to x), which accumulates all training samples, to overcome this problem.

4. Minimizing the TER

To solve the problem in Eq. (12), an approximation to the non-differentiable step function δ is often adopted. A natural choice for approximating the step function is the sigmoid function [23], whereby the minimization problem becomes

$$\arg\min_{\alpha}\, \mathrm{TER}(\alpha, x^+, x^-) = \arg\min_{\alpha}\, \frac{1}{m^-} \sum_{j=1}^{m^-} \sigma(\varepsilon(\alpha, x_j^-)) + \frac{1}{m^+} \sum_{i=1}^{m^+} \sigma(\varepsilon(\alpha, x_i^+)), \qquad (13)$$

where

$$\sigma(x) = \frac{1}{1 + e^{-\gamma x}}, \quad \gamma > 0, \qquad (14)$$

and $\varepsilon(\alpha, x_j^-) = g(\alpha, x_j^-) - \tau$ for $j = 1, 2, \ldots, m^-$ and $\varepsilon(\alpha, x_i^+) = \tau - g(\alpha, x_i^+)$ for $i = 1, 2, \ldots, m^+$.

There are two problems associated with this approximation. The first is that the formulation is nonlinear with respect to the learning parameters. Although an iterative search can be employed to find local solutions, different initializations may end up at different local solutions, incurring laborious trial-and-error efforts to select an appropriate setting. The second is that the objective function could be ill-conditioned due to the many local plateaus resulting from summing the flat regions of the sigmoid. Much search effort may be spent making little progress in locally flat regions.

4.1. Quadratic approximation

In this section we seek a possible deterministic closed-form solution from matching of the link-loss functional pair [24,25]. Since we adopt a linear link function (polynomials p), a quadratic loss functional matches well to arrive at the desired convexity for the link-loss pair. However, some consideration regarding goodness of approximation to the step function δ is necessary. When all inputs are normalized within [0, 1], the step function can be approximated by centering a quadratic function at the origin. To cater for inputs which go beyond this range, an offset η can be introduced such that only the right arm of the quadratic function is activated for the approximation (see Fig. 2).
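The three loss shapes compared in Fig. 2 can be evaluated numerically; the sketch below is our own illustration, with γ = 5 as in the figure and an assumed offset η = 1.

```python
# Step loss, its sigmoid approximation (Eq. (14) applied to the margin), and
# the shifted quadratic loss used in the sequel.  gamma and eta are assumed.
import numpy as np

def step(e):
    """delta(e): 1 if e >= 0, else 0."""
    return (e >= 0).astype(float)

def sigmoid(e, gamma=5.0):
    """Smooth approximation of the step, Eq. (14)."""
    return 1.0 / (1.0 + np.exp(-gamma * e))

def quad(e, eta=1.0):
    """Shifted quadratic loss [e + eta]^2; only its right arm is active
    over the normalized operating range."""
    return (e + eta) ** 2

e = np.linspace(-1.0, 1.0, 5)              # margins eps in [-1, 1]
losses = np.stack([step(e), sigmoid(e), quad(e)])
```

Unlike the step and sigmoid, the quadratic grows past 1 for large positive margins, but it is convex and leads to the closed-form solution derived next.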
With this idea in mind, the following regularized quadratic TER approximation is proposed:

$$\mathrm{TER}(\alpha, x^+, x^-) = \frac{b}{2}\,\|\alpha\|_2^2 + \frac{1}{2m^-} \sum_{j=1}^{m^-} [\varepsilon(\alpha, x_j^-) + \eta]^2 + \frac{1}{2m^+} \sum_{i=1}^{m^+} [\varepsilon(\alpha, x_i^+) + \eta]^2, \qquad (15)$$

where η > 0 and

$$\varepsilon(\alpha, x_j^-) = g(\alpha, x_j^-) - \tau = p(x_j^-)\alpha - \tau, \qquad (16)$$

$$\varepsilon(\alpha, x_i^+) = \tau - g(\alpha, x_i^+) = \tau - p(x_i^+)\alpha, \qquad (17)$$

for $j = 1, 2, \ldots, m^-$ and $i = 1, 2, \ldots, m^+$.

Fig. 2. Sigmoidal (γ = 5) and quadratic approximations to the step function (step loss and its approximations plotted against the output g).

4.2. Optimizing parameter α

Our first task is to solve for the parameter vector α which minimizes Eq. (15). The optimality condition requires that

$$\frac{\partial\, \mathrm{TER}(\alpha, x^+, x^-)}{\partial \alpha} = 0, \qquad (18)$$

which implies that

$$b\alpha + \frac{1}{m^-} \sum_{j=1}^{m^-} p^{\mathrm{T}}(x_j^-)\,[p(x_j^-)\alpha - \tau + \eta] - \frac{1}{m^+} \sum_{i=1}^{m^+} p^{\mathrm{T}}(x_i^+)\,[\tau - p(x_i^+)\alpha + \eta] = 0,$$

i.e.

$$\left[ bI + \frac{1}{m^-} \sum_{j=1}^{m^-} p^{\mathrm{T}}(x_j^-)\,p(x_j^-) + \frac{1}{m^+} \sum_{i=1}^{m^+} p^{\mathrm{T}}(x_i^+)\,p(x_i^+) \right] \alpha + \frac{\eta - \tau}{m^-} \sum_{j=1}^{m^-} p^{\mathrm{T}}(x_j^-) - \frac{\eta + \tau}{m^+} \sum_{i=1}^{m^+} p^{\mathrm{T}}(x_i^+) = 0. \qquad (19)$$

Abbreviating the row polynomial vectors $p_j^- = p(x_j^-) \in \mathbb{R}^K$ and $p_i^+ = p(x_i^+) \in \mathbb{R}^K$, the solution for α which minimizes Eq. (15) can be written as

$$\alpha = \left[ bI + \frac{1}{m^-} \sum_{j=1}^{m^-} (p_j^-)^{\mathrm{T}} p_j^- + \frac{1}{m^+} \sum_{i=1}^{m^+} (p_i^+)^{\mathrm{T}} p_i^+ \right]^{-1} \left[ \frac{\tau - \eta}{m^-} \sum_{j=1}^{m^-} (p_j^-)^{\mathrm{T}} + \frac{\tau + \eta}{m^+} \sum_{i=1}^{m^+} (p_i^+)^{\mathrm{T}} \right], \qquad (20)$$

where I is an identity matrix of size K × K. In a more compact matrix form, Eq. (20) can be written as

$$\alpha = \left( bI + \frac{1}{m^-} P_-^{\mathrm{T}} P_- + \frac{1}{m^+} P_+^{\mathrm{T}} P_+ \right)^{-1} \left( \frac{\tau - \eta}{m^-}\, P_-^{\mathrm{T}} \mathbf{1}_- + \frac{\tau + \eta}{m^+}\, P_+^{\mathrm{T}} \mathbf{1}_+ \right), \qquad (21)$$

where

$$P_+ = \begin{bmatrix} p(x_1^+) \\ p(x_2^+) \\ \vdots \\ p(x_{m^+}^+) \end{bmatrix}, \qquad P_- = \begin{bmatrix} p(x_1^-) \\ p(x_2^-) \\ \vdots \\ p(x_{m^-}^-) \end{bmatrix}, \qquad (22)$$

and $\mathbf{1}_+ = [1, \ldots, 1]^{\mathrm{T}} \in \mathbb{R}^{m^+}$, $\mathbf{1}_- = [1, \ldots, 1]^{\mathrm{T}} \in \mathbb{R}^{m^-}$.

Remark 2. It is noted here that the decision threshold τ is included in the optimization process when determining α. This differentiates the method from many conventional classifiers (such as neural networks) which do not include an explicit decision threshold during classifier design. Moreover, the solution for minimizing the TER is deterministic, as it does not require initialization. The learning solution in Eq. (21) appears similar in structure to Eq. (5), but with separate normalized covariate and regressor matrices ($P_+$ and $P_-$) corresponding to each class label. This differentiates it from the LSE in Eq. (5), which lumps the two class-specific regressor matrices into a single matrix P. Apart from the inclusion of the threshold, bias and regularization terms, the structure of Eq. (21) also appears analogous to that of the solution to Fisher linear discriminant analysis (see e.g. Ref. [22]). However, since $g(\alpha, x) = p(x)\alpha$ can be nonlinear with respect to x, a nonlinear decision boundary in the x-plane can be obtained, and this is a main advantage over linear classifiers. The proposed formulation can thus be considered a nonlinear discriminant function, and we shall explore an expansion of p(x) using a recently proposed reduced polynomial model, since the full polynomial has an explosive number of parameters as the input dimension and model order increase.
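The compact solution of Eq. (21) can be sketched directly in NumPy for a fixed threshold. The regressor matrices below are random stand-ins, and the values of η, b and τ are illustrative assumptions, not the paper's settings.

```python
# Closed-form TER-minimizing parameter vector, Eq. (21), for a fixed tau.
import numpy as np

def solve_alpha(P_pos, P_neg, tau, eta=1.0, b=1e-3):
    """P_pos: rows p(x_i^+); P_neg: rows p(x_j^-); returns alpha of Eq. (21)."""
    m_pos, K = P_pos.shape
    m_neg = P_neg.shape[0]
    M = b * np.eye(K) + P_neg.T @ P_neg / m_neg + P_pos.T @ P_pos / m_pos
    rhs = ((tau - eta) / m_neg) * P_neg.sum(axis=0) \
        + ((tau + eta) / m_pos) * P_pos.sum(axis=0)   # P^T 1 = column sums
    return np.linalg.solve(M, rhs)

rng = np.random.default_rng(1)
P_pos = rng.random((20, 4)) + 0.5    # illustrative genuine-user regressors
P_neg = rng.random((30, 4)) - 0.5    # illustrative imposter regressors
alpha = solve_alpha(P_pos, P_neg, tau=0.5)
```

The result can be checked against the stationarity condition of Eq. (19): substituting the returned α back into the left-hand side of Eq. (19) yields the zero vector.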
The quadratic approximation here may also be related to a quadratic relaxation technique for the perceptron criterion (see Ref. [22, Chapter 5.6]). However, as mentioned in Ref. [22], the quadratic relaxation suffers from problems with the error-correcting procedure even in linearly non-separable cases, particularly regarding convergence and boundary points. The main advantage of our formulation is that the solution α for classification decision can be obtained in closed form, and it is also least squares optimal in the TER sense.

4.3. Optimizing threshold τ

The threshold value τ appearing in Eqs. (20) and (21) can also be optimized. This is obtained by setting $\partial\,\mathrm{TER}(\alpha, \tau)/\partial \tau = 0$, which gives

$$\frac{1}{m^-} \sum_{j=1}^{m^-} [(p_j^- \alpha - \tau) + \eta] = \frac{1}{m^+} \sum_{i=1}^{m^+} [(\tau - p_i^+ \alpha) + \eta]$$

$$\Rightarrow\quad 2\tau = \frac{1}{m^-} \sum_{j=1}^{m^-} p_j^- \alpha + \frac{1}{m^+} \sum_{i=1}^{m^+} p_i^+ \alpha \quad\Rightarrow\quad \tau = \frac{1}{2m^-}\,(\mathbf{1}_-^{\mathrm{T}} P_- \alpha) + \frac{1}{2m^+}\,(\mathbf{1}_+^{\mathrm{T}} P_+ \alpha). \qquad (23)$$

To solve for τ in Eq. (23), we need the solution for α. Let

$$M = bI + \frac{1}{m^-} P_-^{\mathrm{T}} P_- + \frac{1}{m^+} P_+^{\mathrm{T}} P_+ \qquad (24)$$

so that α from Eq. (21) can be packed as

$$\alpha = \frac{\tau - \eta}{m^-}\, M^{-1} P_-^{\mathrm{T}} \mathbf{1}_- + \frac{\tau + \eta}{m^+}\, M^{-1} P_+^{\mathrm{T}} \mathbf{1}_+. \qquad (25)$$

This compact Eq. (25) can be substituted into Eq. (23), and τ can be solved as

$$\tau = \frac{\eta\,(B + D - A - C)}{2 - (A + B + C + D)}, \qquad (26)$$

with A, B, C, D defined as

$$A = \frac{1}{(m^-)^2}\,\mathbf{1}_-^{\mathrm{T}} P_- M^{-1} P_-^{\mathrm{T}} \mathbf{1}_-, \qquad B = \frac{1}{m^- m^+}\,\mathbf{1}_-^{\mathrm{T}} P_- M^{-1} P_+^{\mathrm{T}} \mathbf{1}_+,$$

$$C = \frac{1}{m^+ m^-}\,\mathbf{1}_+^{\mathrm{T}} P_+ M^{-1} P_-^{\mathrm{T}} \mathbf{1}_-, \qquad D = \frac{1}{(m^+)^2}\,\mathbf{1}_+^{\mathrm{T}} P_+ M^{-1} P_+^{\mathrm{T}} \mathbf{1}_+. \qquad (27)$$
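The closed-form threshold of Eq. (26) can likewise be sketched in NumPy. The regressor matrices are random stand-ins and η, b are assumed values.

```python
# Closed-form threshold via the scalars A, B, C, D of Eqs. (26)-(27).
import numpy as np

def solve_tau(P_pos, P_neg, eta=1.0, b=1e-3):
    """Return the stationary threshold tau of Eq. (26); no iteration needed."""
    m_pos, K = P_pos.shape
    m_neg = P_neg.shape[0]
    M = b * np.eye(K) + P_neg.T @ P_neg / m_neg + P_pos.T @ P_pos / m_pos
    a = P_neg.sum(axis=0) / m_neg        # (1/m^-) 1^T P_-
    c = P_pos.sum(axis=0) / m_pos        # (1/m^+) 1^T P_+
    Minv = np.linalg.inv(M)
    A, B = a @ Minv @ a, a @ Minv @ c
    C, D = c @ Minv @ a, c @ Minv @ c
    return eta * (B + D - A - C) / (2.0 - (A + B + C + D))

rng = np.random.default_rng(2)
P_pos = rng.random((20, 4)) + 0.5    # illustrative genuine-user regressors
P_neg = rng.random((30, 4)) - 0.5    # illustrative imposter regressors
tau = solve_tau(P_pos, P_neg)
```

As a consistency check, substituting this τ into Eq. (25) to obtain α and then evaluating Eq. (23) reproduces the same τ, confirming the fixed point.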

Remark 3. Similar to the estimation of α, the optimal threshold τ can also be obtained in closed form without the need for an iterative process. This τ can in turn be fed into Eq. (21) for optimal estimation of α. Here we note that the bias parameter η in the equation cannot be optimized, due to its uniform contribution to all components of the error rates.

4.4. Summary of proposed algorithm (TER_Q)

For clarity, the procedure to implement the algorithm is summarized as follows.

Training: Set η = 1 and b = 10⁻³.

(1) Generate the regression matrices P₊ and P₋ from the respective genuine-user and imposter training data using Eq. (4).
(2) Generate the matrices M, A, B, C and D using the P₊ and P₋ obtained in the above step.
(3) Compute the optimal decision threshold τ using Eq. (26). Alternatively, τ can be fixed at the mid-point of the design output range.
(4) Compute the optimal fusion classifier parameter α using Eq. (25) and τ.

To test or predict a verification outcome from new data:

(1) Generate the regression polynomial matrix P_t from the test data using Eq. (4).
(2) Compute the decision fusion output ŷ = P_t α.
(3) Decision: if ŷ ≥ τ then the new data is from a genuine-user, else from an imposter.

For convenience, we shall refer to this algorithm as TER_Q. With the algorithm in place, we are ready to perform fusion experiments in the following sections.

5. Biometrics from the eye and face

Attributed to the pioneering work of John Daugman [26,27], the iris is now recognized as a biometric for high-security applications. Apart from its high accuracy, an iris verification system is non-intrusive in terms of physical contact. However, a visual (RGB) iris recognition system can easily be fooled by a high-resolution picture when it is not equipped with anti-spoofing solutions.
Apart from using infra-red iris imaging solutions, fusing several biometric modalities can raise the level of matching engagement, thereby deterring a certain amount of imposter attacks. As part of our continual effort to fuse several modalities in a natural manner considering ease of use, in this study we combine infra-red iris verification scores with two face verification scores from the visual and infra-red spectrums.

5.1. Infra-red iris verification

The infra-red iris images were captured using a monochrome CCD camera (WAT-902A from Watec Co., Ltd.) with infra-red LED illumination. Fig. 3 shows five image samples from five different identities. The raw infra-red iris images were first localized by means of the interior boundary (between pupil and iris) and the exterior boundary (between iris and sclera), which were found using an edge detection algorithm. The localized iris region was then transformed into polar coordinates by a rubber sheet model, whereby a normalization was performed to generate an iris signal for feature extraction. Based on independent component analysis (ICA), a set of basis functions was estimated to represent the iris signal. The coefficients of the ICA expansions were adopted as feature vectors, which were then fed into a cosine distance measure for comparison of two identities. Interested readers are referred to Ref. [28] for more technical details of the infra-red iris verification system.

5.2. Visual face verification

The face is the most common biometric used by humans: we inherently use it to recognize people in our daily interactions. Face recognition is thus an important area in biometrics, for it can be covert as well as non-intrusive. The main approaches to viewer-centered 2D face recognition include holistic, analytic and hybrid methods [29]. The holistic approach uses subspace techniques to reduce the image dimension and then compares image similarity in this subspace.
A very widely used technique for this subspace reduction is principal component analysis (PCA). The analytic approach uses geometrical features, such as the distances between face objects like the eyes, nose and mouth, for similarity measurement. The hybrid approach combines various means, including the holistic and analytic approaches.

The visual face images used in this study were captured under various illumination and pose conditions using a Bumblebee CCD camera produced by Point Grey Research, Inc. (see Ref. [30] for details). The top row of Fig. 4 shows some visual image samples for an identity under various illumination and pose conditions. In this work, we adopted the holistic approach using PCA. To compare the similarity between two face images, the Euclidean distance over the leading eigen-coefficients was used [30].

5.3. Infra-red face verification

Due to the relatively high instrumental cost, the infra-red face has been less studied than the visual face. The infra-red face images used in this study were captured using a ThermoVision S65 produced by FLIR Systems, Inc. As in the visual face case, the images were captured under varying illumination, expression and pose conditions at a fixed image resolution. The bottom row of Fig. 4 shows some infra-red image samples for the same identity under various conditions. As for the visual face, we adopted the holistic approach using PCA for the infra-red face, with the Euclidean distance over the leading eigen-coefficients used to compare two face images [30].
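The holistic PCA matching described above can be sketched as follows; the data, dimensions and number of retained components are illustrative assumptions, not the paper's settings.

```python
# Sketch of holistic PCA matching: project flattened face images onto the
# leading principal components and compare with Euclidean distance.
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((50, 64))     # 50 flattened "images", 64-dim (illustrative)
mean = X.mean(axis=0)
Xc = X - mean
# Leading principal directions via SVD of the centered data matrix
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:10].T                         # retain 10 components (assumed number)

def match_score(img_a, img_b):
    """Euclidean distance between subspace projections; smaller = better match."""
    fa, fb = (img_a - mean) @ W, (img_b - mean) @ W
    return float(np.linalg.norm(fa - fb))
```

Identical images yield a distance of zero; the distance then serves as the raw match-score that is normalized and fused in the experiments below.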

Fig. 3. Infra-red iris samples for five different identities.

Fig. 4. Top row: visual face samples for an identity under different lighting and pose conditions; bottom row: infra-red face samples for the same identity under different lighting and pose conditions.

These two face data sets constitute a true multimodal system, since they were acquired from the same pool of identities.

6. Experiments

6.1. Data sets

In the following experiments, each data set corresponding to infra-red iris (iris-ir), visual face (face-vs) and infra-red face (face-ir) verification consists of 96 identities, with 10 image samples per identity. For training and test purposes, each of these biometric data sets is partitioned into two equal sets, S_train and S_test, each with 96 × 5 samples. The genuine-user and imposter match-scores are generated from these two sets by intra-identity and inter-identity matching among the visual/infra-red image samples for each biometric. A total of 960 (96 × 10) match-scores are thus available for the genuine-user class in each training set and test set for each biometric. As for the imposter scores, there are 114,000 (96 × 95/2 × 25) match-scores over the 96 identities. Since all three biometrics have the same number of genuine-user and imposter samples, an arbitrary one-to-one identity correspondence was assumed among the three biometric data sets. This is a reasonable assumption since our focus here is output scores fusion and not the correlation among different modalities for each identity.

6.2. Preprocessing

In the following experiments, the match-scores for all biometrics are normalized to within the interval [0, 1], all having a higher match-score for a genuine-user than for an imposter, before data fusion is performed. Fig. 5(a)-(b), Fig. 5(c)-(d) and Fig.
5(e)-(f) show the matching performance on the training and test sets, respectively, for the individual iris-ir, face-vs and face-ir verifications before scores fusion. From the match-score distribution plots shown in Fig. 5(a), (c) and (e), the verification performance depends strongly on the overlap zone between the genuine-user and imposter classes. Among the three biometrics, iris-ir has the smallest overlap region and hence the best verification performance. Face-vs has the largest overlap region, giving rise to the worst verification performance. This observation from the score distribution plots is further confirmed by the corresponding ROC plots in Fig. 5(b), (d) and (f).

6.3. Fusion experiments and evaluation setups

Based on the biometric data described above and a publicly available database, the following three sets of fusion experiments are performed:

(i) Fusion of face biometrics (face-vs and face-ir): Due to the possibly large variation of illumination conditions in ground applications, we believe that visual and infra-red images can complement each other. In this experiment, we perform fusion by combining the verification decision scores from face-vs and face-ir, where the images were captured simultaneously. Since face-vs performs poorly due to the large variation of illumination conditions in the database, we shall observe the effect of fusing it with the much better performing face-ir.

(ii) Fusion of faces-and-eye biometrics (iris-ir, face-vs and face-ir): Similar to the face fusion above, an advantage of fusing iris-ir, face-vs and face-ir is that their images can be captured simultaneously. This is important in application because simultaneously presenting all three biometrics is a much more difficult task for an imposter attack than presenting a single biometric. This is especially

true when both the visual and infra-red light frequencies are exploited. We shall observe the impact on the proposed algorithm of the additional dimensionality from combining low- and high-performance biometrics.

Fig. 5. Matching performance for the iris (IR), face (visual) and face (IR) verification systems on the training (solid lines) and test (dashed lines) sets: (a) match-score distribution (iris IR), (b) ROC (iris IR), (c) match-score distribution (face visual), (d) ROC (face visual), (e) match-score distribution (face IR), (f) ROC (face IR).

(iii) Publicly available data sets: Apart from the above experiments using in-house data sets, the proposed TER_Q is further tested on publicly available fusion data sets (the XM2VTS face and speaker verification database, which contains 32 fusion cases [31,32]) so that further comparison can be made by other researchers.

Regarding the algorithm settings and comparison measures, the following items are observed in our experiments:

Comparison platform: To compare the conventional LSE learning and the proposed TER_Q learning, we adopt the RM model of Ref. [21] for decision score fusion, since its number of polynomial coefficients does not explode with respect to model order and feature dimension. Different model orders r ∈ {2, 3, 4, 5, 6} are experimented with for both LSE and TER_Q, such that the experiments give a good overview over different operational settings. For TER_Q, the bias was fixed at η = 1 in all experiments, since it was found to be inert to estimation within the intended operating range.
In all the following fusion experiments, we set b = 3 for the RM model since: (i) it does not introduce much bias in regularization, (ii) it gives a standardized setting for both LSE and the proposed TER Q, and (iii) we found empirically that this setting produces good training and test results in both cases.

Performance evaluation criterion: The EER is adopted as the performance comparison measure in experiments (i) and (ii). There are two reasons for this choice of criterion: (1) it is a single-value index with a clear indication of high and low performance, which is an advantage over ROC or DET curves, where the curves for different algorithms may cross each other; and (2) it is related to our optimization objective (minimization of the TER). For experiment (iii), the HTER (half total error rate) is adopted according to Ref. [31] for direct comparison. Here we note that the HTER can be related to the EER under certain conditions (e.g. Fig. 1).
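The EER used above is the operating point where the false acceptance rate (FAR) equals the false rejection rate (FRR). A minimal sketch of estimating it from genuine and impostor score lists (synthetic scores, hypothetical helper name; averaging FAR and FRR at the nearest crossing, since the two empirical curves need not intersect exactly):

```python
import numpy as np

def eer(genuine, impostor):
    """Estimate the equal error rate by sweeping a decision threshold
    over the pooled scores (higher score means more genuine-like)."""
    best_gap, best_rate = 1.0, 0.5
    for t in np.sort(np.concatenate([genuine, impostor])):
        frr = np.mean(genuine < t)      # genuine users rejected
        far = np.mean(impostor >= t)    # impostors accepted
        if abs(far - frr) < best_gap:
            # average FAR and FRR at the closest crossing point
            best_gap, best_rate = abs(far - frr), 0.5 * (far + frr)
    return best_rate

rng = np.random.default_rng(0)
gen = rng.normal(0.7, 0.1, 1000)   # synthetic genuine scores
imp = rng.normal(0.3, 0.1, 1000)   # synthetic impostor scores
print(round(eer(gen, imp), 3))     # small: the two distributions barely overlap
```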

Fig. 6. Combining face-VS and face-IR verifications: (a) EER plotted over different polynomial orders and RBF kernel widths, (b) a zoom-in view.

Computing effort: In order to observe the computational effort, the CPU time is recorded. All experiments were run on the PC Windows Matlab platform using a Pentium-M 1.73 GHz computer. Although there can be small differences among different runs of the same algorithm, the CPU time provides some indication of the order of difference between two algorithms' run-times. For instance, if Algorithm A uses 1 s and Algorithm B uses 10 s, then we can say that A is approximately 10 times faster than B.

Benchmarking: In order to gauge whether the best possible performance has been attained in experiments (i) and (ii), we conduct similar experiments using SVMs [14,15] as implemented by Ma et al. [33]. Both the polynomial kernel (SVM-poly) and the radial basis function kernel (SVM-Rbf) are experimented over various polynomial orders (r ∈ {2, 3, 4, 5, 6}) and kernel widths (Gamma [.,,...,, ] [33]). We believe these kernel settings provide a reasonable benchmark of the achievable performance. For experiment (iii), an experimental protocol similar to that of Ref. [31] is adopted for the proposed TER Q so that the results can be directly compared in future.

Results (i): fusion of face-VS and face-IR scores

Fig. 6 shows the training (solid lines) and test (dashed lines) EER for the experimented LSE, TER Q, SVM-poly, and SVM-Rbf.
Both the LSE and the TER Q adopted the RM model, and their EER results are plotted over different model orders for the range mentioned in the previous section. For SVM-poly, the same model order range (r ∈ {2, …, 6}) was used, and for SVM-Rbf, the kernel width (Gamma [33]) was chosen from [.,, 2,...,9,, ]. From the overall plot in Fig. 6(a), we see that SVM-Rbf has a low EER at a large kernel width (small Gamma value of 0.1) and a high EER for small kernel widths. The EER values of SVM-Rbf for Gamma [, 2,...,9,, ] are seen to be lower than that of face-VS but higher than that of face-IR. This indicates that a kernel size appropriate for the distributions must be chosen for SVM-Rbf to perform well. From the zoom-in plot in Fig. 6(b), we see that the training and test fusion results crowd, respectively, around the training and test results of the better-performing face-IR (relative to face-VS). For the training cases, LSE, TER Q and SVM-poly show better performance (lower EER) than face-IR at many model orders (particularly r = 3, 4, 5). However, for the test cases, only TER Q shows a clear EER superiority over face-IR. Fig. 7 shows an instance of EER performance at r = 3 using the DET curves.

Fig. 8 shows the CPU times incurred for training (solid lines) and test (dashed lines) when running LSE, TER Q, SVM-poly and SVM-Rbf. From Fig. 8(a), we see that TER Q and LSE have a clear advantage in their low CPU requirements. From Fig. 8(b), we see that the test CPU times are similar for TER Q and LSE since they have similar polynomial expansion terms. The training CPU time for TER Q is seen to be lower than that of LSE due to a vectorized implementation of TER Q, whereas LSE did not capitalize on such a facility. This shows that significant CPU time can be saved through efficient implementation.

Results (ii): fusion of iris-IR, face-VS and face-IR scores

For fusion of three biometrics (iris-IR, face-VS and face-IR), Fig. 9 shows the training (solid lines) and test (dashed lines) EER for the experimented LSE, TER Q, SVM-poly, and SVM-Rbf. Similar to the previous experiment, both the LSE and the TER Q adopted the RM model and their EER results

Fig. 7. DET curves comparing different classifiers for fusion of face biometrics (r = 3 for LSE, TER Q and SVM-poly; Gamma = 0.1 for SVM-Rbf): (a) train data, (b) test data.

Fig. 8. (a) CPU times incurred for fusion of two biometrics (the CPU time for SVM-Rbf-train at Gamma = is s), (b) a zoom-in view.

are plotted over different model orders ranging from 2 to 6. For SVM-poly, the same model order range was plotted. For SVM-Rbf, the kernel width (Gamma) was chosen from [.,, 2,...,9,, ]. From the training results of Fig. 9, we see that only TER Q, LSE and SVM-Rbf at Gamma = have better performance than iris-IR. This is reasonable since the SVMs are not aimed at training the EER directly and require a trial-and-error effort to tune the classifier for the best EER performance. TER Q has the best training results because its training is based on optimization of the TER, which is related to the EER. From the test results of Fig. 9, all four algorithms (LSE, TER Q, SVM-poly, and SVM-Rbf) show improved accuracy compared to the single biometric iris-IR on the test samples. The proposed TER Q is seen to perform best at all model orders. Fig. 10 shows a sample DET plot at r = 5 for

Fig. 9. Combining face-VS, face-IR and iris-IR verifications: EER plotted over different polynomial orders and RBF kernel widths.

Fig. 10. DET curves comparing different classifiers for fusion of all three biometrics (r = 5 for LSE, TER Q and SVM-poly; Gamma = for SVM-Rbf): (a) train data, (b) test data.
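The DET curves in these figures plot the FRR against the FAR over a sweep of decision thresholds. A rough sketch (hypothetical helper, synthetic scores) of computing the points of such a curve:

```python
import numpy as np

def det_points(genuine, impostor, n=100):
    """FAR/FRR pairs over a sweep of decision thresholds: the raw material
    of a DET plot (FRR vs. FAR, conventionally on normal-deviate axes)."""
    lo = min(genuine.min(), impostor.min())
    hi = max(genuine.max(), impostor.max())
    ts = np.linspace(lo, hi, n)
    far = np.array([np.mean(impostor >= t) for t in ts])  # impostors accepted
    frr = np.array([np.mean(genuine < t) for t in ts])    # genuine rejected
    return far, frr

rng = np.random.default_rng(4)
far, frr = det_points(rng.normal(0.7, 0.1, 500), rng.normal(0.3, 0.1, 500))
# far is non-increasing and frr non-decreasing as the threshold is raised
```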

Fig. 11. CPU times incurred for fusion of all three biometrics (the CPU times for SVM-Rbf-train and SVM-Rbf-test at Gamma = are, respectively, s and s).

Fig. 12. Experiments on publicly available data sets: (a) average HTER plotted over different polynomial orders; (b) HTER (LSE and TER Q with r = 6, b = 3) plotted over 32 cases of fusion combinations according to Ref. [31].

LSE, TER Q and SVM-poly. In the same plot, the best DET for SVM-Rbf at kernel width Gamma = is shown. Fig. 11 shows the CPU times incurred for training (solid lines) and test (dashed lines) for LSE, TER Q, SVM-poly and SVM-Rbf. Again, TER Q and LSE show the lowest computing effort in terms of both training and testing. The EER line in Fig. 10 does not cut the DET curves due to the resolution of the relatively small number of genuine-user scores (96). For such cases, an approximation to the EER is adopted, similar to Ref. [34], by averaging the FAR and the FRR at the nearest resolution.

Results (iii): experiments on publicly available data sets

Fig. 12(a) shows the average test HTER obtained from the 32 fusion cases for LSE and TER Q with model orders r ∈ {2, 3, 4, 5, 6} and b = 3. The average test HTER of the mean operator, the weighted-sum-Fisher and the weighted-sum-brute-force from Refs. [31,32] are also included in the figure for comparison. Here, we see that both LSE and TER Q show a decreasing trend in HTER values for r ∈ {2, 3, 4, 5}. However, at r = 6, LSE shows a deterioration of performance (due to serious over-fitting in one case as seen in

Fig. 12(b)) while TER Q maintains the performance improvement trend. At r = 5, 6, TER Q outperforms all the compared fusion techniques. For a per-case view of performance, Fig. 12(b) shows the detailed HTERs for the individual 32 fusion cases [31] at r = 6, b = 3 for LSE and TER Q.

7. Summary of results and discussion

7.1. Summary of observations

The following observations can be made regarding the performance of the proposed TER Q in the previous experiments:

(1) Fusion of face biometrics: when fusing a low-performance system with a high-performance system, inappropriate tuning of an algorithm's adjustable parameters may deteriorate the fusion performance. This is particularly evident in the SVM-Rbf experiments. By comparison, TER Q appears less sensitive to model parameter change (r is its only model parameter).

(2) Fusion of faces-and-iris biometrics: when two biometrics are much stronger than a third one, the chance of a better fusion performance increases compared with merely using one strong and one weak biometric. This is evident from all the test results.

(3) Experiments on publicly available data sets: TER Q shows a more stable prediction output than LSE, particularly at high order settings.

(4) To summarize, TER Q shows: (i) good training and test EER performance, and (ii) fast training and testing in terms of computational effort.

Fig. 13.
Decision contours of different classifiers when combining two biometrics: (a) least-squares error minimization using third-order RM model, (b) SVM using third-order polynomial kernel, (c) SVM using RBF kernel with Gamma = , (d) TER minimization using third-order RM model.

Fig. 14. Decision contours of different classifiers on the visual face and infra-red iris plane (with the infra-red face fixed at a score of 0.5) when combining three biometrics: (a) least-squares error minimization using fifth-order RM model, (b) SVM using fifth-order polynomial kernel, (c) SVM using RBF kernel with Gamma = 0.1, (d) TER minimization using fifth-order RM model.

7.2. Decision landscapes

The good performance can probably be understood from the decision-landscape point of view. Fig. 13 shows the score distributions and decision contours for combining the visual and the infra-red face biometrics. The genuine-user and imposter scores show large overlapping regions and rather indistinguishable distributions, giving rise to a difficult classification problem. In Fig. 13(a)-(d), we show the decision contours for LSE, SVM-poly, SVM-Rbf, and TER Q, respectively. For the LSE method, the decision contours are much affected by the density of the data: they rarely cut through the high-density imposter region (i.e. they fit the data density). For the SVM methods, the decision contours appear to be determined by the distribution structure of the data. This is particularly obvious for the SVM adopting the RBF kernel with a small kernel width. As for the proposed TER minimization, two phenomena are observed: (i) the decision contours appear unaffected by, and run through, the high-density imposter regions as in the SVM-poly case, and (ii) the orientations of the contours appear to follow the direction of the two cluster distributions.
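Contour plots like these are produced by evaluating a trained fusion classifier over a grid of match scores and tracing its 0.5 level set. A minimal sketch (hypothetical helper and a toy linear fuser, not the paper's classifiers):

```python
import numpy as np

def decision_grid(fuser, xlim=(0.0, 1.0), ylim=(0.0, 1.0), n=200):
    """Evaluate a fusion classifier on an n-by-n grid of 2-D match scores.
    The 0.5 level set of the returned array is the decision contour."""
    xs = np.linspace(xlim[0], xlim[1], n)
    ys = np.linspace(ylim[0], ylim[1], n)
    gx, gy = np.meshgrid(xs, ys)
    pairs = np.column_stack([gx.ravel(), gy.ravel()])
    return fuser(pairs).reshape(n, n)

# toy linear fuser with hypothetical weights; a real contour would come
# from e.g. matplotlib's plt.contour(z, levels=[0.5])
z = decision_grid(lambda s: 0.6 * s[:, 0] + 0.4 * s[:, 1])
```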
This suggests that the decision boundary is determined by the classification error distribution rather than by the data density. Fig. 14 shows part of the decision contours for fusion of three biometrics. The plot was obtained on the face-VS and iris-IR plane with face-IR fixed at a score of 0.5. Here, SVM-poly is seen to be rather affected by the high imposter density, which is perhaps due to its local solution property. Both TER Q and SVM-Rbf appear rather insensitive to the distribution density, and their decision contours are seen to cut across the imposter zones.

8. Conclusion

In this paper, an approach to directly optimize the decision total error rate with respect to a fusion classifier design

is proposed for multimodal biometric scores fusion. Through a quadratic approximation of the error counts, a closed-form solution is proposed to solve the optimization problem. Although starting from a different derivation point, the structure of the proposed solution can be related to that of Fisher linear discriminant analysis, except for the inclusion of nonlinear decision capability, normalization and several adjustable terms. This suggests that the proposed formulation is a nonlinear discriminant function, which has advantages over linear functions for complex decision hyper-surfaces. From the error minimization viewpoint, the significance of this formulation is the inclusion of the decision threshold in the optimization solution, which resolves the complexity of TER evaluation and threshold setting.

With ground application scenarios in mind, the proposed method (TER Q) is applied to fuse two face biometrics (based on visual and infra-red images) and an infra-red iris biometric. Extensive experiments were performed over various settings of the algorithm's model order, which is its only major tuning parameter. The performance is benchmarked on a publicly available database, as well as compared with the commonly adopted least-squares error criterion and two support vector machines adopting different kernels. The empirical findings are very encouraging. Our immediate task is to generalize the method to multiple-category problems for wider applications.

Acknowledgements

The authors would like to thank the following colleagues for assistance in the collection and generation of biometrics decision output data: Mr. Sang-Ki Kim and Mr. Kwang-Hyuk Bae. Special thanks go to Dr. Norman Poh for sharing the XM2VTS face and speaker verification data sets for fusion benchmarking.
This work was supported by the Korea Science and Engineering Foundation (KOSEF) through the Biometrics Engineering Research Center (BERC) at Yonsei University.

References

[1] A. Ross, R. Govindarajan, Feature level fusion using hand and face biometrics, in: Proceedings of the SPIE Conference on Biometric Technology for Human Identification II, Orlando, USA, March 2005.
[2] A.A. Ross, K. Nandakumar, A.K. Jain, Handbook of Multibiometrics, vol. 6, International Series on Biometrics, Springer, Berlin, 2006.
[3] Y.S. Huang, C.Y. Suen, A method of combining multiple experts for the recognition of unconstrained handwritten numerals, IEEE Trans. Pattern Anal. Mach. Intell. 17 (1) (1995).
[4] A.K. Jain, K. Nandakumar, A. Ross, Score normalization in multimodal biometric systems, Pattern Recognition 38 (2005).
[5] J. Kittler, M. Hatef, R.P.W. Duin, J. Matas, On combining classifiers, IEEE Trans. Pattern Anal. Mach. Intell. 20 (3) (1998).
[6] L. Hong, A. Jain, S. Pankanti, Can multibiometrics improve performance?, in: Proceedings of AutoID, Summit, NJ, 1999.
[7] A. Ross, A. Jain, Information fusion in biometrics, Pattern Recognition Lett. 24 (2003).
[8] L.I. Kuncheva, C.J. Whitaker, C.A. Shipp, R.P.W. Duin, Limits on the majority vote accuracy in classifier fusion, Pattern Anal. Appl. 6 (2003).
[9] J. Kittler, K. Messer, Fusion of multiple experts in multimodal biometric personal identity verification systems, in: Proceedings of the 12th IEEE Workshop on Neural Networks for Signal Processing, 2002.
[10] L.I. Kuncheva, J.C. Bezdek, R. Duin, Decision templates for multiple classifier design: an experimental comparison, Pattern Recognition 34 (2) (2001).
[11] K.-A. Toh, W.-Y. Yau, X. Jiang, A reduced multivariate polynomial model for multimodal biometrics and classifiers fusion, IEEE Trans. Circuits Systems Video Technol. (Special Issue on Image- and Video-Based Biometrics) 14 (2) (2004).
[12] N. Poh, S. Bengio, How do correlation and variance of base-experts affect fusion in biometric authentication tasks?, IEEE Trans. Signal Process. 53 (2005).
[13] B.E. Boser, I.M. Guyon, V.N. Vapnik, A training algorithm for optimal margin classifiers, in: Fifth Annual Workshop on Computational Learning Theory, Pittsburgh, ACM, New York, 1992.
[14] V.N. Vapnik, Statistical Learning Theory, Wiley-Interscience, New York, 1998.
[15] B.E. Boser, I.M. Guyon, V.N. Vapnik, A training algorithm for optimal margin classifier, in: Proceedings of the Fifth ACM Workshop on Computational Learning Theory, Pittsburgh, PA, 1992.
[16] T. Poggio, F. Girosi, Networks for approximation and learning, Proc. IEEE 78 (9) (1990).
[17] B. Schölkopf, A.J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, Cambridge, MA, 2002.
[18] W.R. Wade, An Introduction to Analysis, second ed., Prentice-Hall, Upper Saddle River, NJ, 2000.
[19] C.M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, New York, 1995.
[20] K.-A. Toh, Fingerprint and speaker verification decisions fusion, in: International Conference on Image Analysis and Processing (ICIAP), Mantova, Italy, 2003.
[21] K.-A. Toh, Q.-L. Tran, D. Srinivasan, Benchmarking a reduced multivariate polynomial pattern classifier, IEEE Trans. Pattern Anal. Mach. Intell. 26 (6) (2004).
[22] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, second ed., Wiley, New York, 2001.
[23] K.-A. Toh, Learning from target knowledge approximation, in: Proceedings of the First IEEE Conference on Industrial Electronics and Applications, Singapore, May 2006.
[24] G.J. Gordon, Generalized² linear² models, in: Advances in Neural Information Processing Systems (NIPS 2002), Vancouver, British Columbia, Canada, December 2002.
[25] P. McCullagh, J.A. Nelder, Generalized Linear Models, second ed., Chapman & Hall, London, 1989.
[26] J. Daugman, High confidence visual recognition of persons by a test of statistical independence, IEEE Trans. Pattern Anal. Mach. Intell. 15 (11) (1993).
[27] J. Daugman, Biometric personal identification system based on iris analysis, U.S. Patent 5,291,560, 1994.
[28] K. Bae, S. Noh, J. Kim, Iris feature extraction using independent component analysis, in: Proceedings of the 4th International Conference on Audio- and Video-Based Person Authentication (AVBPA), Guildford, UK, June 2003.
[29] S.Z. Li, A.K. Jain (Eds.), Handbook of Face Recognition, Springer, New York, 2004.
[30] S.-K. Kim, H. Lee, S. Yu, S. Lee, Robust face recognition by fusion of visual and infrared cues, in: Proceedings of the First IEEE Conference on Industrial Electronics and Applications, Singapore, May 2006.
[31] N. Poh, S. Bengio, Database protocol and tools for evaluating score-level fusion algorithms in biometric authentication, Pattern Recognition 39 (2) (2006).
[32] N. Poh, Multi-system biometrics: optimal fusion and user-specific information, Ph.D. dissertation, Swiss Federal Institute of Technology in Lausanne, 2006.


More information

Learning features by contrasting natural images with noise

Learning features by contrasting natural images with noise Learning features by contrasting natural images with noise Michael Gutmann 1 and Aapo Hyvärinen 12 1 Dept. of Computer Science and HIIT, University of Helsinki, P.O. Box 68, FIN-00014 University of Helsinki,

More information

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation. CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.

More information

Introduction to Support Vector Machines

Introduction to Support Vector Machines Introduction to Support Vector Machines Hsuan-Tien Lin Learning Systems Group, California Institute of Technology Talk in NTU EE/CS Speech Lab, November 16, 2005 H.-T. Lin (Learning Systems Group) Introduction

More information

Automatic Identity Verification Using Face Images

Automatic Identity Verification Using Face Images Automatic Identity Verification Using Face Images Sabry F. Saraya and John F. W. Zaki Computer & Systems Eng. Dept. Faculty of Engineering Mansoura University. Abstract This paper presents two types of

More information

Multivariate statistical methods and data mining in particle physics

Multivariate statistical methods and data mining in particle physics Multivariate statistical methods and data mining in particle physics RHUL Physics www.pp.rhul.ac.uk/~cowan Academic Training Lectures CERN 16 19 June, 2008 1 Outline Statement of the problem Some general

More information

SVMs: Non-Separable Data, Convex Surrogate Loss, Multi-Class Classification, Kernels

SVMs: Non-Separable Data, Convex Surrogate Loss, Multi-Class Classification, Kernels SVMs: Non-Separable Data, Convex Surrogate Loss, Multi-Class Classification, Kernels Karl Stratos June 21, 2018 1 / 33 Tangent: Some Loose Ends in Logistic Regression Polynomial feature expansion in logistic

More information

Lecture 4 Discriminant Analysis, k-nearest Neighbors

Lecture 4 Discriminant Analysis, k-nearest Neighbors Lecture 4 Discriminant Analysis, k-nearest Neighbors Fredrik Lindsten Division of Systems and Control Department of Information Technology Uppsala University. Email: fredrik.lindsten@it.uu.se fredrik.lindsten@it.uu.se

More information

Linear Regression and Its Applications

Linear Regression and Its Applications Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start

More information

Kernels for Multi task Learning

Kernels for Multi task Learning Kernels for Multi task Learning Charles A Micchelli Department of Mathematics and Statistics State University of New York, The University at Albany 1400 Washington Avenue, Albany, NY, 12222, USA Massimiliano

More information

Classification of handwritten digits using supervised locally linear embedding algorithm and support vector machine

Classification of handwritten digits using supervised locally linear embedding algorithm and support vector machine Classification of handwritten digits using supervised locally linear embedding algorithm and support vector machine Olga Kouropteva, Oleg Okun, Matti Pietikäinen Machine Vision Group, Infotech Oulu and

More information

Face Recognition Using Laplacianfaces He et al. (IEEE Trans PAMI, 2005) presented by Hassan A. Kingravi

Face Recognition Using Laplacianfaces He et al. (IEEE Trans PAMI, 2005) presented by Hassan A. Kingravi Face Recognition Using Laplacianfaces He et al. (IEEE Trans PAMI, 2005) presented by Hassan A. Kingravi Overview Introduction Linear Methods for Dimensionality Reduction Nonlinear Methods and Manifold

More information

Analytical Study of Biometrics Normalization and Fusion Techniques For Designing a Multimodal System

Analytical Study of Biometrics Normalization and Fusion Techniques For Designing a Multimodal System Volume Issue 8, November 4, ISSN No.: 348 89 3 Analytical Study of Biometrics Normalization and Fusion Techniques For Designing a Multimodal System Divya Singhal, Ajay Kumar Yadav M.Tech, EC Department,

More information

INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY

INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY [Gaurav, 2(1): Jan., 2013] ISSN: 2277-9655 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY Face Identification & Detection Using Eigenfaces Sachin.S.Gurav *1, K.R.Desai 2 *1

More information

PATTERN CLASSIFICATION

PATTERN CLASSIFICATION PATTERN CLASSIFICATION Second Edition Richard O. Duda Peter E. Hart David G. Stork A Wiley-lnterscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane Singapore Toronto CONTENTS

More information

CITS 4402 Computer Vision

CITS 4402 Computer Vision CITS 4402 Computer Vision A/Prof Ajmal Mian Adj/A/Prof Mehdi Ravanbakhsh Lecture 06 Object Recognition Objectives To understand the concept of image based object recognition To learn how to match images

More information

EXTRACTING BIOMETRIC BINARY STRINGS WITH MINIMAL AREA UNDER THE FRR CURVE FOR THE HAMMING DISTANCE CLASSIFIER

EXTRACTING BIOMETRIC BINARY STRINGS WITH MINIMAL AREA UNDER THE FRR CURVE FOR THE HAMMING DISTANCE CLASSIFIER 17th European Signal Processing Conference (EUSIPCO 29) Glasgow, Scotland, August 24-28, 29 EXTRACTING BIOMETRIC BINARY STRINGS WITH MINIMA AREA UNER THE CURVE FOR THE HAMMING ISTANCE CASSIFIER Chun Chen,

More information

Support Vector Machine Regression for Volatile Stock Market Prediction

Support Vector Machine Regression for Volatile Stock Market Prediction Support Vector Machine Regression for Volatile Stock Market Prediction Haiqin Yang, Laiwan Chan, and Irwin King Department of Computer Science and Engineering The Chinese University of Hong Kong Shatin,

More information

Subcellular Localisation of Proteins in Living Cells Using a Genetic Algorithm and an Incremental Neural Network

Subcellular Localisation of Proteins in Living Cells Using a Genetic Algorithm and an Incremental Neural Network Subcellular Localisation of Proteins in Living Cells Using a Genetic Algorithm and an Incremental Neural Network Marko Tscherepanow and Franz Kummert Applied Computer Science, Faculty of Technology, Bielefeld

More information

Eigenface-based facial recognition

Eigenface-based facial recognition Eigenface-based facial recognition Dimitri PISSARENKO December 1, 2002 1 General This document is based upon Turk and Pentland (1991b), Turk and Pentland (1991a) and Smith (2002). 2 How does it work? The

More information

Nonlinear Support Vector Machines through Iterative Majorization and I-Splines

Nonlinear Support Vector Machines through Iterative Majorization and I-Splines Nonlinear Support Vector Machines through Iterative Majorization and I-Splines P.J.F. Groenen G. Nalbantov J.C. Bioch July 9, 26 Econometric Institute Report EI 26-25 Abstract To minimize the primal support

More information

Characterization of Jet Charge at the LHC

Characterization of Jet Charge at the LHC Characterization of Jet Charge at the LHC Thomas Dylan Rueter, Krishna Soni Abstract The Large Hadron Collider (LHC) produces a staggering amount of data - about 30 petabytes annually. One of the largest

More information

Machine Learning 2017

Machine Learning 2017 Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section

More information

Multimodal Biometric Fusion Joint Typist (Keystroke) and Speaker Verification

Multimodal Biometric Fusion Joint Typist (Keystroke) and Speaker Verification Multimodal Biometric Fusion Joint Typist (Keystroke) and Speaker Verification Jugurta R. Montalvão Filho and Eduardo O. Freire Abstract Identity verification through fusion of features from keystroke dynamics

More information

SVM TRADE-OFF BETWEEN MAXIMIZE THE MARGIN AND MINIMIZE THE VARIABLES USED FOR REGRESSION

SVM TRADE-OFF BETWEEN MAXIMIZE THE MARGIN AND MINIMIZE THE VARIABLES USED FOR REGRESSION International Journal of Pure and Applied Mathematics Volume 87 No. 6 2013, 741-750 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu doi: http://dx.doi.org/10.12732/ijpam.v87i6.2

More information

Learning Kernel Parameters by using Class Separability Measure

Learning Kernel Parameters by using Class Separability Measure Learning Kernel Parameters by using Class Separability Measure Lei Wang, Kap Luk Chan School of Electrical and Electronic Engineering Nanyang Technological University Singapore, 3979 E-mail: P 3733@ntu.edu.sg,eklchan@ntu.edu.sg

More information

Selection of Classifiers based on Multiple Classifier Behaviour

Selection of Classifiers based on Multiple Classifier Behaviour Selection of Classifiers based on Multiple Classifier Behaviour Giorgio Giacinto, Fabio Roli, and Giorgio Fumera Dept. of Electrical and Electronic Eng. - University of Cagliari Piazza d Armi, 09123 Cagliari,

More information

Information Fusion for Local Gabor Features Based Frontal Face Verification

Information Fusion for Local Gabor Features Based Frontal Face Verification Information Fusion for Local Gabor Features Based Frontal Face Verification Enrique Argones Rúa 1, Josef Kittler 2, Jose Luis Alba Castro 1, and Daniel González Jiménez 1 1 Signal Theory Group, Signal

More information

Biometrics: Introduction and Examples. Raymond Veldhuis

Biometrics: Introduction and Examples. Raymond Veldhuis Biometrics: Introduction and Examples Raymond Veldhuis 1 Overview Biometric recognition Face recognition Challenges Transparent face recognition Large-scale identification Watch list Anonymous biometrics

More information

Support Vector Machines: Maximum Margin Classifiers

Support Vector Machines: Maximum Margin Classifiers Support Vector Machines: Maximum Margin Classifiers Machine Learning and Pattern Recognition: September 16, 2008 Piotr Mirowski Based on slides by Sumit Chopra and Fu-Jie Huang 1 Outline What is behind

More information

Iterative Laplacian Score for Feature Selection

Iterative Laplacian Score for Feature Selection Iterative Laplacian Score for Feature Selection Linling Zhu, Linsong Miao, and Daoqiang Zhang College of Computer Science and echnology, Nanjing University of Aeronautics and Astronautics, Nanjing 2006,

More information

COS 429: COMPUTER VISON Face Recognition

COS 429: COMPUTER VISON Face Recognition COS 429: COMPUTER VISON Face Recognition Intro to recognition PCA and Eigenfaces LDA and Fisherfaces Face detection: Viola & Jones (Optional) generic object models for faces: the Constellation Model Reading:

More information

Unsupervised Anomaly Detection for High Dimensional Data

Unsupervised Anomaly Detection for High Dimensional Data Unsupervised Anomaly Detection for High Dimensional Data Department of Mathematics, Rowan University. July 19th, 2013 International Workshop in Sequential Methodologies (IWSM-2013) Outline of Talk Motivation

More information

Small sample size generalization

Small sample size generalization 9th Scandinavian Conference on Image Analysis, June 6-9, 1995, Uppsala, Sweden, Preprint Small sample size generalization Robert P.W. Duin Pattern Recognition Group, Faculty of Applied Physics Delft University

More information

Support Vector Ordinal Regression using Privileged Information

Support Vector Ordinal Regression using Privileged Information Support Vector Ordinal Regression using Privileged Information Fengzhen Tang 1, Peter Tiňo 2, Pedro Antonio Gutiérrez 3 and Huanhuan Chen 4 1,2,4- The University of Birmingham, School of Computer Science,

More information

COMS 4771 Introduction to Machine Learning. Nakul Verma

COMS 4771 Introduction to Machine Learning. Nakul Verma COMS 4771 Introduction to Machine Learning Nakul Verma Announcements HW1 due next lecture Project details are available decide on the group and topic by Thursday Last time Generative vs. Discriminative

More information

Kernel Methods. Barnabás Póczos

Kernel Methods. Barnabás Póczos Kernel Methods Barnabás Póczos Outline Quick Introduction Feature space Perceptron in the feature space Kernels Mercer s theorem Finite domain Arbitrary domain Kernel families Constructing new kernels

More information

Linear Discrimination Functions

Linear Discrimination Functions Laurea Magistrale in Informatica Nicola Fanizzi Dipartimento di Informatica Università degli Studi di Bari November 4, 2009 Outline Linear models Gradient descent Perceptron Minimum square error approach

More information

Two-Layered Face Detection System using Evolutionary Algorithm

Two-Layered Face Detection System using Evolutionary Algorithm Two-Layered Face Detection System using Evolutionary Algorithm Jun-Su Jang Jong-Hwan Kim Dept. of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology (KAIST),

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Introduction. Chapter 1

Introduction. Chapter 1 Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics

More information

Pattern Recognition and Machine Learning. Perceptrons and Support Vector machines

Pattern Recognition and Machine Learning. Perceptrons and Support Vector machines Pattern Recognition and Machine Learning James L. Crowley ENSIMAG 3 - MMIS Fall Semester 2016 Lessons 6 10 Jan 2017 Outline Perceptrons and Support Vector machines Notation... 2 Perceptrons... 3 History...3

More information

SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning

SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning Mark Schmidt University of British Columbia, May 2016 www.cs.ubc.ca/~schmidtm/svan16 Some images from this lecture are

More information

Issues and Techniques in Pattern Classification

Issues and Techniques in Pattern Classification Issues and Techniques in Pattern Classification Carlotta Domeniconi www.ise.gmu.edu/~carlotta Machine Learning Given a collection of data, a machine learner eplains the underlying process that generated

More information

Face Recognition Using Eigenfaces

Face Recognition Using Eigenfaces Face Recognition Using Eigenfaces Prof. V.P. Kshirsagar, M.R.Baviskar, M.E.Gaikwad, Dept. of CSE, Govt. Engineering College, Aurangabad (MS), India. vkshirsagar@gmail.com, madhumita_baviskar@yahoo.co.in,

More information

Machine Learning for Signal Processing Bayes Classification and Regression

Machine Learning for Signal Processing Bayes Classification and Regression Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For

More information

6.867 Machine learning

6.867 Machine learning 6.867 Machine learning Mid-term eam October 8, 6 ( points) Your name and MIT ID: .5.5 y.5 y.5 a).5.5 b).5.5.5.5 y.5 y.5 c).5.5 d).5.5 Figure : Plots of linear regression results with different types of

More information

L11: Pattern recognition principles

L11: Pattern recognition principles L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction

More information

Linear models: the perceptron and closest centroid algorithms. D = {(x i,y i )} n i=1. x i 2 R d 9/3/13. Preliminaries. Chapter 1, 7.

Linear models: the perceptron and closest centroid algorithms. D = {(x i,y i )} n i=1. x i 2 R d 9/3/13. Preliminaries. Chapter 1, 7. Preliminaries Linear models: the perceptron and closest centroid algorithms Chapter 1, 7 Definition: The Euclidean dot product beteen to vectors is the expression d T x = i x i The dot product is also

More information

Outline. Basic concepts: SVM and kernels SVM primal/dual problems. Chih-Jen Lin (National Taiwan Univ.) 1 / 22

Outline. Basic concepts: SVM and kernels SVM primal/dual problems. Chih-Jen Lin (National Taiwan Univ.) 1 / 22 Outline Basic concepts: SVM and kernels SVM primal/dual problems Chih-Jen Lin (National Taiwan Univ.) 1 / 22 Outline Basic concepts: SVM and kernels Basic concepts: SVM and kernels SVM primal/dual problems

More information

Support Vector Machine for Classification and Regression

Support Vector Machine for Classification and Regression Support Vector Machine for Classification and Regression Ahlame Douzal AMA-LIG, Université Joseph Fourier Master 2R - MOSIG (2013) November 25, 2013 Loss function, Separating Hyperplanes, Canonical Hyperplan

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression

More information

When enough is enough: early stopping of biometrics error rate testing

When enough is enough: early stopping of biometrics error rate testing When enough is enough: early stopping of biometrics error rate testing Michael E. Schuckers Department of Mathematics, Computer Science and Statistics St. Lawrence University and Center for Identification

More information

Pramod K. Varshney. EECS Department, Syracuse University This research was sponsored by ARO grant W911NF

Pramod K. Varshney. EECS Department, Syracuse University This research was sponsored by ARO grant W911NF Pramod K. Varshney EECS Department, Syracuse University varshney@syr.edu This research was sponsored by ARO grant W911NF-09-1-0244 2 Overview of Distributed Inference U i s may be 1. Local decisions 2.

More information

Support Vector Regression with Automatic Accuracy Control B. Scholkopf y, P. Bartlett, A. Smola y,r.williamson FEIT/RSISE, Australian National University, Canberra, Australia y GMD FIRST, Rudower Chaussee

More information

Polyhedral Computation. Linear Classifiers & the SVM

Polyhedral Computation. Linear Classifiers & the SVM Polyhedral Computation Linear Classifiers & the SVM mcuturi@i.kyoto-u.ac.jp Nov 26 2010 1 Statistical Inference Statistical: useful to study random systems... Mutations, environmental changes etc. life

More information

Discriminative Models

Discriminative Models No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

Stefanos Zafeiriou, Anastasios Tefas, and Ioannis Pitas

Stefanos Zafeiriou, Anastasios Tefas, and Ioannis Pitas GENDER DETERMINATION USING A SUPPORT VECTOR MACHINE VARIANT Stefanos Zafeiriou, Anastasios Tefas, and Ioannis Pitas Artificial Intelligence and Information Analysis Lab/Department of Informatics, Aristotle

More information

Predicting Time of Peak Foreign Exchange Rates. Charles Mulemi, Lucio Dery 0. ABSTRACT

Predicting Time of Peak Foreign Exchange Rates. Charles Mulemi, Lucio Dery 0. ABSTRACT Predicting Time of Peak Foreign Exchange Rates Charles Mulemi, Lucio Dery 0. ABSTRACT This paper explores various machine learning models of predicting the day foreign exchange rates peak in a given window.

More information

Score calibration for optimal biometric identification

Score calibration for optimal biometric identification Score calibration for optimal biometric identification (see also NIST IBPC 2010 online proceedings: http://biometrics.nist.gov/ibpc2010) AI/GI/CRV 2010, Ottawa Dmitry O. Gorodnichy Head of Video Surveillance

More information

NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function

More information

Benchmarking Non-Parametric Statistical Tests

Benchmarking Non-Parametric Statistical Tests R E S E A R C H R E P O R T I D I A P Benchmarking Non-Parametric Statistical Tests Mikaela Keller a Samy Bengio a Siew Yeung Wong a IDIAP RR 05-38 January 5, 2006 to appear in Advances in Neural Information

More information

Discriminative Models

Discriminative Models No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models

More information