Study on Classification Methods Based on Three Different Learning Criteria. Jae Kyu Suhr

1 Study on Classification Methods Based on Three Different Learning Criteria Jae Kyu Suhr

2 Contents
- Introduction
- Three learning criteria: LSE, TER, AUC
- Methods based on three learning criteria
  - LSE: RM, ELM
  - TER: TER-RM, TER-ELM
  - AUC: AUC-RM
- Experiment
  - Setup: data sets, parameter setting
  - Result: normalization, TER and LAUC results

3 Introduction
- Pattern classification is a widely researched topic for decision making.
- In pattern classification, empirical learning constitutes a major paradigm.
- Under this paradigm, a classifier is designed to minimize a certain cost function (the learning criterion).
- Least Squares Error (LSE) is a commonly used cost function.
- LSE is popular because of its simplicity, clear physical meaning, and tractability for analysis.
- Embedding nonlinearities into linear models has widened the application of the LSE cost function.

4 Introduction
- Recently, two efficient basis functions were proposed.
  - Reduced Multinomial model (RM) [2]. Basis function: a reduced version of the full polynomial.
  - Extreme Learning Machine (ELM) [3]. Basis function: Single-hidden Layer Feedforward Neural networks (SLFNs).
- However, LSE's limitation becomes apparent when high accuracy is required.
- The LSE cost function minimizes the fitting error rather than the classification error, which is the quantity a classification task actually needs to minimize.

5 Introduction
- Three main approaches have been adopted to overcome this drawback of the LSE cost function.
  - Discriminant approach: FDA, GDA
  - Structural approach: SVD
  - Classification-error approach
- In the third approach, two cost functions were recently proposed.
  - Total Error Rate (TER)-based approach (TER-RM, TER-ELM) [4,5]: minimize the total error rate in the training stage.
  - Area under the ROC curve (AUC)-based approach (AUC-RM) [6]: maximize the area under the ROC curve in the training stage.
- The main breakthrough is a smooth approximate formulation for calculating TER and AUC.
  - Quadratic approximation of the counting process
  - Closed-form solution

6 Introduction
- In this paper, five classification methods based on three different learning criteria were evaluated.
  - LSE criterion: RM, ELM
  - TER criterion: TER-RM, TER-ELM
  - AUC criterion: AUC-RM
- Five two-class problems in the UCI database were used for the method evaluation.
  - Pima-diabetes, SPECT-heart, StatLog-heart, Tic-tac-toe, and Wdbc
- An efficient way to normalize feature vectors for RM- and ELM-based methods was discussed.

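7 LSE-based Method
- Parametric model adopting a basis expansion term:
\[ g(\boldsymbol{\alpha}, \mathbf{x}) = \sum_{k=1}^{K} \alpha_k\, p_k(\mathbf{x}) = \mathbf{p}(\mathbf{x})\,\boldsymbol{\alpha} \]
- LSE cost function:
\[ J(\boldsymbol{\alpha}) = \frac{1}{2}\,\lVert \mathbf{y} - \mathbf{P}\boldsymbol{\alpha} \rVert_2^2 + \frac{b}{2}\,\lVert \boldsymbol{\alpha} \rVert_2^2 \]
- Solution for LSE which minimizes J:
\[ \hat{\boldsymbol{\alpha}} = (\mathbf{P}^T\mathbf{P} + b\mathbf{I})^{-1}\,\mathbf{P}^T\mathbf{y} \]
- RM basis function:
\[ \hat{f}(\boldsymbol{\alpha}, \mathbf{x}) = \alpha_0 + \sum_{k=1}^{r}\sum_{j=1}^{l} \alpha_{kj}\, x_j^{k} + \sum_{j=1}^{r} \alpha_{rl+j}\,(x_1 + x_2 + \cdots + x_l)^{j} + \sum_{j=2}^{r} (\boldsymbol{\alpha}_j^{T}\mathbf{x})\,(x_1 + x_2 + \cdots + x_l)^{j-1}, \qquad l, r \ge 2 \]
- ELM basis function:
\[ \mathbf{H} = \begin{bmatrix} \phi(\mathbf{w}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & \phi(\mathbf{w}_p \cdot \mathbf{x}_1 + b_p) \\ \vdots & \ddots & \vdots \\ \phi(\mathbf{w}_1 \cdot \mathbf{x}_m + b_1) & \cdots & \phi(\mathbf{w}_p \cdot \mathbf{x}_m + b_p) \end{bmatrix}_{m \times p} \]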
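The closed-form LSE solution on this slide can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the author's implementation: the toy data, the hidden-layer size, the regularization value `reg` (standing in for b), and the helper names `elm_hidden_matrix` and `lse_solve` are all assumptions made for the example. The same `lse_solve` would apply to an RM-style P matrix.

```python
import numpy as np

def elm_hidden_matrix(X, W, b):
    """Return H with H[i, k] = sigmoid(w_k . x_i + b_k).

    X: (m, d) inputs, W: (d, p) random input weights, b: (p,) random biases.
    """
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def lse_solve(P, y, reg=1e-2):
    """Closed-form regularized LSE: alpha = (P^T P + reg * I)^{-1} P^T y."""
    K = P.shape[1]
    # Solve the linear system instead of forming an explicit inverse.
    return np.linalg.solve(P.T @ P + reg * np.eye(K), P.T @ y)

# Hypothetical toy data: 40 samples, 3 features, 0/1 targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# ELM basis: a random, untrained hidden layer with 10 neurons.
W = rng.normal(size=(3, 10))
b = rng.normal(size=10)
H = elm_hidden_matrix(X, W, b)

alpha = lse_solve(H, y)
scores = H @ alpha                      # g(alpha, x) for every training sample
labels = (scores > 0.5).astype(int)     # threshold halfway between the 0/1 targets
```

Only the output weights `alpha` are estimated; the hidden-layer weights `W` and biases `b` stay at their random values, which is what makes the ELM fit a single linear solve.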

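8 TER-based Method
- Total Error Rate, where L(·) is the step (counting) function:
\[ TER(\boldsymbol{\alpha}, \mathbf{x}^{+}, \mathbf{x}^{-}) = \frac{1}{m^{-}} \sum_{j=1}^{m^{-}} L\big(g(\boldsymbol{\alpha}, \mathbf{x}_j^{-}) - \tau\big) + \frac{1}{m^{+}} \sum_{i=1}^{m^{+}} L\big(\tau - g(\boldsymbol{\alpha}, \mathbf{x}_i^{+})\big) \]
- When using \( g(\boldsymbol{\alpha}, \mathbf{x}) = \mathbf{p}(\mathbf{x})\boldsymbol{\alpha} \) and a quadratic approximation:
\[ TER(\boldsymbol{\alpha}, \mathbf{x}^{+}, \mathbf{x}^{-}) = \frac{b}{2}\lVert \boldsymbol{\alpha} \rVert_2^2 + \frac{1}{2m^{-}} \sum_{j=1}^{m^{-}} \big[\mathbf{p}(\mathbf{x}_j^{-})\boldsymbol{\alpha} - \tau + \eta\big]^2 + \frac{1}{2m^{+}} \sum_{i=1}^{m^{+}} \big[\tau - \mathbf{p}(\mathbf{x}_i^{+})\boldsymbol{\alpha} + \eta\big]^2 \]
- Optimal parameter, with \( \mathbf{p}_j^{-} = \mathbf{p}(\mathbf{x}_j^{-}) \) and \( \mathbf{p}_i^{+} = \mathbf{p}(\mathbf{x}_i^{+}) \):
\[ \boldsymbol{\alpha} = \Big[ b\mathbf{I} + \frac{1}{m^{-}} \sum_{j=1}^{m^{-}} \mathbf{p}_j^{-T}\mathbf{p}_j^{-} + \frac{1}{m^{+}} \sum_{i=1}^{m^{+}} \mathbf{p}_i^{+T}\mathbf{p}_i^{+} \Big]^{-1} \Big[ \frac{\tau - \eta}{m^{-}} \sum_{j=1}^{m^{-}} \mathbf{p}_j^{-T} + \frac{\tau + \eta}{m^{+}} \sum_{i=1}^{m^{+}} \mathbf{p}_i^{+T} \Big] \]
- The same solution in matrix form, where \( \mathbf{1}_{-} \) and \( \mathbf{1}_{+} \) are all-ones vectors of length \( m^{-} \) and \( m^{+} \):
\[ \boldsymbol{\alpha} = \Big[ b\mathbf{I} + \frac{1}{m^{-}} \mathbf{P}_{-}^{T}\mathbf{P}_{-} + \frac{1}{m^{+}} \mathbf{P}_{+}^{T}\mathbf{P}_{+} \Big]^{-1} \Big[ \frac{\tau - \eta}{m^{-}} \mathbf{P}_{-}^{T}\mathbf{1}_{-} + \frac{\tau + \eta}{m^{+}} \mathbf{P}_{+}^{T}\mathbf{1}_{+} \Big] \]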
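A minimal sketch of the TER closed-form solution follows, assuming the quadratic approximation targets the score τ − η for negative samples and τ + η for positive samples, each class weighted by the inverse of its sample count. The toy clusters, the first-order basis with a bias column, and the names `ter_solve` and `with_bias` are hypothetical choices for illustration only.

```python
import numpy as np

def ter_solve(P_neg, P_pos, tau=0.5, eta=0.5, reg=1e-4):
    """Closed-form minimizer of the quadratic TER approximation.

    P_neg: (m_neg, K) basis rows of negative samples.
    P_pos: (m_pos, K) basis rows of positive samples.
    reg plays the role of the regularization constant b.
    """
    m_neg, m_pos = P_neg.shape[0], P_pos.shape[0]
    K = P_neg.shape[1]
    A = reg * np.eye(K) + P_neg.T @ P_neg / m_neg + P_pos.T @ P_pos / m_pos
    rhs = ((tau - eta) / m_neg) * P_neg.sum(axis=0) \
        + ((tau + eta) / m_pos) * P_pos.sum(axis=0)
    return np.linalg.solve(A, rhs)

def with_bias(X):
    """First-order basis [1, x_1, ..., x_d] (a stand-in for an RM or ELM basis)."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

# Hypothetical, well-separated two-class toy data with unequal class sizes.
rng = np.random.default_rng(1)
X_neg = rng.normal(loc=-3.0, size=(50, 2))
X_pos = rng.normal(loc=+3.0, size=(30, 2))
P_neg, P_pos = with_bias(X_neg), with_bias(X_pos)

tau = 0.5
alpha = ter_solve(P_neg, P_pos, tau=tau, eta=0.5)

# Empirical total error rate: false-positive rate plus false-negative rate.
fpr = np.mean(P_neg @ alpha > tau)
fnr = np.mean(P_pos @ alpha <= tau)
ter = fpr + fnr
```

Because each class sum is divided by its own sample count, the larger class does not dominate the fit, which is the practical difference from the plain LSE solution.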

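9 AUC-based Method
- Area under ROC curve:
\[ AUC(\mathbf{x}^{+}, \mathbf{x}^{-}) = \frac{1}{m^{+}m^{-}} \sum_{i=1}^{m^{+}} \sum_{j=1}^{m^{-}} \mathbf{1}_{\,g(\mathbf{x}_i^{+}) > g(\mathbf{x}_j^{-})} \]
- Maximizing AUC is posed as minimizing the area above the curve (AAC), where u(·) is the unit step function:
\[ \arg\min_{\boldsymbol{\alpha}} AAC(\boldsymbol{\alpha}, \mathbf{x}^{+}, \mathbf{x}^{-}) = \arg\min_{\boldsymbol{\alpha}} \frac{1}{m^{+}m^{-}} \sum_{i=1}^{m^{+}} \sum_{j=1}^{m^{-}} u\big(g(\boldsymbol{\alpha}, \mathbf{x}_j^{-}) - g(\boldsymbol{\alpha}, \mathbf{x}_i^{+})\big) \]
- When using a quadratic approximation:
\[ \arg\min_{\boldsymbol{\alpha}} AAC(\boldsymbol{\alpha}, \mathbf{x}^{+}, \mathbf{x}^{-}) \approx \arg\min_{\boldsymbol{\alpha}} \; \frac{b}{2}\lVert \boldsymbol{\alpha} \rVert_2^2 + \frac{1}{2m^{+}m^{-}} \sum_{i=1}^{m^{+}} \sum_{j=1}^{m^{-}} \big[ \big(\mathbf{p}(\mathbf{x}_j^{-}) - \mathbf{p}(\mathbf{x}_i^{+})\big)\boldsymbol{\alpha} + \eta \big]^2 \]
- Optimal parameter, with \( \mathbf{p}_j^{-} = \mathbf{p}(\mathbf{x}_j^{-}) \) and \( \mathbf{p}_i^{+} = \mathbf{p}(\mathbf{x}_i^{+}) \):
\[ \boldsymbol{\alpha} = \Big[ b\mathbf{I} + \frac{1}{m^{+}m^{-}} \sum_{i=1}^{m^{+}} \sum_{j=1}^{m^{-}} (\mathbf{p}_j^{-} - \mathbf{p}_i^{+})^{T} (\mathbf{p}_j^{-} - \mathbf{p}_i^{+}) \Big]^{-1} \Big[ \frac{-\eta}{m^{+}m^{-}} \sum_{i=1}^{m^{+}} \sum_{j=1}^{m^{-}} (\mathbf{p}_j^{-} - \mathbf{p}_i^{+})^{T} \Big] \]
- TER-based threshold:
\[ \tau = \frac{1}{2} \Big[ \frac{1}{m^{-}} \sum_{j=1}^{m^{-}} \mathbf{p}(\mathbf{x}_j^{-})\boldsymbol{\alpha} + \frac{1}{m^{+}} \sum_{i=1}^{m^{+}} \mathbf{p}(\mathbf{x}_i^{+})\boldsymbol{\alpha} \Big] \]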
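The pairwise closed-form solution for the AUC criterion can be sketched as follows. This is a hedged illustration rather than the author's code: it assumes the quadratic approximation pushes every negative-minus-positive score difference toward −η, uses η = 1.0 and `reg` (for b) as arbitrary example values, and takes the threshold as the midpoint of the two class-mean scores. The names `auc_solve` and `empirical_auc` and the toy data are hypothetical.

```python
import numpy as np

def auc_solve(P_neg, P_pos, eta=1.0, reg=1e-4):
    """Closed-form minimizer of the quadratic approximation to 1 - AUC."""
    K = P_neg.shape[1]
    # All pairwise differences p(x_j^-) - p(x_i^+), shape (m_pos * m_neg, K).
    D = (P_neg[None, :, :] - P_pos[:, None, :]).reshape(-1, K)
    n_pairs = D.shape[0]
    A = reg * np.eye(K) + D.T @ D / n_pairs
    rhs = -(eta / n_pairs) * D.sum(axis=0)
    return np.linalg.solve(A, rhs)

def empirical_auc(s_neg, s_pos):
    """Fraction of (positive, negative) pairs ranked correctly."""
    return np.mean(s_pos[:, None] > s_neg[None, :])

# Hypothetical toy data; the raw features are used directly as the basis.
rng = np.random.default_rng(2)
P_neg = rng.normal(loc=-1.0, size=(40, 2))
P_pos = rng.normal(loc=+1.0, size=(25, 2))

alpha = auc_solve(P_neg, P_pos)
s_neg, s_pos = P_neg @ alpha, P_pos @ alpha
auc = empirical_auc(s_neg, s_pos)

# The AUC criterion only ranks samples, so a decision threshold is chosen
# afterwards; here it is the midpoint of the two class-mean scores.
tau = 0.5 * (s_neg.mean() + s_pos.mean())
```

Any constant (bias) column in the basis cancels in the pairwise differences, which is why the threshold has to be set in a separate step.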

10 Method Description
Methods by basis function (rows) and learning criterion (columns):

Basis function | LSE     | TER         | AUC
RM             | RM [2]  | TER-RM [4]  | AUC-RM [6]
SLFNs          | ELM [3] | TER-ELM [5] | -

11 Data Set Description

DB name                            | Number of samples | Number of features | Number of classes | Missing feature values
Pima-diabetes                      |                   |                    | 2 (65% / 35%)     | None
Wisconsin Diagnostic Breast Cancer |                   |                    | 2 (63% / 36%)     | None
SPECT-heart                        | 67                |                    | 2 (79% / 21%)     | None
Statlog-heart                      | 7                 | 3                  | 2 (56% / 44%)     | None
Tic-Tac-Toe Endgame                |                   |                    | 2 (65% / 35%)     | None

12 Experimental Setup
- Validation: -fold cross validation
- Run: runs for all methods and all settings
- RM, TER-RM, AUC-RM: ~ order
- ELM, TER-ELM: activation function: sigmoid; ~ hidden neurons
- TER-RM, TER-ELM: τ = η = 0.5
- AUC-RM: η =
- Data normalization: min-max
  - RM, TER-RM, AUC-RM: data normalization was applied after making P matrix
  - ELM, TER-ELM: data normalization was applied before making H matrix

13 Evaluation Criteria
- Total Error Rate (TER):
\[ TER\,(\%) = \frac{\text{total number of misclassified data samples}}{\text{total number of data samples}} \times 100 \]
- LAUC: negative base-10 logarithm of (1 − AUC), used because the AUC value itself shows little difference between two biometrics that both have high performance:
\[ LAUC = -\log_{10}(1 - AUC) \]
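The two evaluation criteria can be sketched with the standard library alone. The sketch assumes that LAUC denotes −log10(1 − AUC), a reading under which AUC values close to 1 are spread apart; the function names and sample values are hypothetical.

```python
import math

def total_error_rate(y_true, y_pred):
    """Percentage of misclassified samples."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return 100.0 * wrong / len(y_true)

def lauc(auc):
    """Assumed definition: LAUC = -log10(1 - AUC); infinite for a perfect AUC."""
    if auc >= 1.0:
        return math.inf
    return -math.log10(1.0 - auc)

# One of four hypothetical predictions is wrong.
ter_percent = total_error_rate([0, 1, 1, 0], [0, 1, 0, 0])

# Two high AUC values that differ by under 0.01 differ by about 1 in LAUC.
lauc_a = lauc(0.99)
lauc_b = lauc(0.999)
```

On this scale each additional unit of LAUC corresponds to a tenfold reduction of the area above the ROC curve.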

14 Normalization Procedure
- Min-max normalization technique in three different ways:
  - No normalization
  - Normalization before making P or H matrix
  - Normalization after making P or H matrix

15 RM case
[Figure: test error (%) on wdbc versus order for three settings: no normalization, normalization before making P matrix, normalization after making P matrix]
- Normalization after making P matrix has the best performance.
  - The P matrix of RM is produced by multiplying and adding many feature values.
  - This leads to a singularity problem in the matrix inversion.
  - This finally causes the parameter estimation to be unstable.
- Normalization after making P matrix is better than normalization before making P matrix.
  - Even if the feature vectors are normalized before making P matrix, feature values are still multiplied and added when producing P matrix.
  - This can also cause the singularity problem.
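The conditioning argument on this slide can be illustrated numerically. The sketch below is an assumption-laden toy: `poly_features` uses plain per-feature powers as a simplified stand-in for the RM expansion, the raw feature range, the order, and the regularization value are invented for the example, and all helper names are hypothetical.

```python
import numpy as np

def poly_features(X, order):
    """Per-feature powers x_j^k for k = 1..order (simplified, not the full RM model)."""
    return np.hstack([X ** k for k in range(1, order + 1)])

def minmax_fit(A):
    """Column-wise minimum and range; constant columns get range 1 to avoid 0/0."""
    lo, hi = A.min(axis=0), A.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return lo, span

def minmax_apply(A, lo, span):
    """Scale columns using statistics fitted on the training data."""
    return (A - lo) / span

def gram_condition(P, reg=1e-4):
    """Condition number of the matrix inverted in the closed-form solution."""
    return np.linalg.cond(P.T @ P + reg * np.eye(P.shape[1]))

# Hypothetical raw features spanning three orders of magnitude.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 1000.0, size=(100, 4))

P_raw = poly_features(X, order=3)          # no normalization
lo, span = minmax_fit(P_raw)
P_norm = minmax_apply(P_raw, lo, span)     # normalization after making P

cond_raw = gram_condition(P_raw)
cond_norm = gram_condition(P_norm)
```

In this toy setting `cond_norm` comes out many orders of magnitude smaller than `cond_raw`; at test time the same `lo` and `span` fitted on the training P matrix would be reused.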

16 ELM case
[Figure: test error (%) on wdbc for three settings: no normalization, normalization before making H matrix, normalization after making H matrix]
- Normalization before making H matrix has the best performance.
- No normalization and normalization after making H matrix have almost the same performance.

17 ELM case
[Figure: histograms (number of occurrences versus feature value) at each processing stage.
Normalization after making H matrix: input feature, input weight and bias, sigmoid activation function, min-max normalization. Annotation: almost no difference.
Normalization before making H matrix: input feature, min-max normalization, input weight and bias, sigmoid activation function. Annotation: much more informative.]

18 Comparison Results
[Figures: test error (%) and LAUC of RM, ELM, TER-RM, TER-ELM, and AUC-RM on Pima-diabetes and SPECT-heart, plotted against order and number of hidden neurons]

19 Comparison Results
[Figures: test error (%) and LAUC of RM, ELM, TER-RM, TER-ELM, and AUC-RM on StatLog-heart and tic-tac-toe, plotted against order and number of hidden neurons]

20 Comparison Results
[Figures: test error (%) and LAUC of RM, ELM, TER-RM, TER-ELM, and AUC-RM on wdbc, plotted against order and number of hidden neurons]

21 Conclusions
- For data normalization:
  - Normalization should be applied after making P matrix when using the RM basis function.
  - Normalization should be applied before making H matrix when using the ELM basis function.
- For two-class problems:
  - All methods have similar results.
  - In particular, TER-RM and AUC-RM have almost the same performance in terms of TER and LAUC.
    - TER: find the optimal α with a fixed τ to minimize the total error rate.
    - AUC: find the optimal τ with a fixed α to minimize the total error rate.
  - TER and AUC show a very similar trend.

22 References
[1] R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification, second ed. John Wiley & Sons, 2001.
[2] K.-A. Toh, Q.-L. Tran, and D. Srinivasan, Benchmarking a reduced multivariate polynomial pattern classifier, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26, no. 6, pp. 740-755, 2004.
[3] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, Extreme learning machine: Theory and applications, Neurocomputing, vol. 70, pp. 489-501, 2006.
[4] K.-A. Toh and H.-L. Eng, Between classification-error approximation and weighted least-squares learning, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 30, no. 4, pp. 658-669, 2008.
[5] K.-A. Toh, Deterministic Neural Classification, Neural Computation, 2008.
[6] K.-A. Toh, J. Kim, and S. Lee, Maximizing Area Under ROC Curve for Biometric Scores Fusion, Pattern Recognition, 2008.
[7] K.-A. Toh, Learning from Target Knowledge Approximation, Proc. First IEEE Conf. Industrial Electronics and Applications, pp. 85-8, May 2006.
[8] J.A. Hanley and B.J. McNeil, The meaning and use of the area under a receiver operating characteristic (ROC) curve, Radiology, vol. 143, pp. 29-36, 1982.
[9] K.-A. Toh, Between AUC Based and Error Rate Based Learning, The 3rd IEEE Conference on Industrial Electronics and Applications (ICIEA), Singapore, June 2008.
[10] D.J. Newman, S. Hettich, C.L. Blake, and C.J. Merz, UCI Repository of Machine Learning Databases, Univ. of California, Dept. of Information and Computer Sciences.

23 THE END


DESIGNING RBF CLASSIFIERS FOR WEIGHTED BOOSTING

DESIGNING RBF CLASSIFIERS FOR WEIGHTED BOOSTING DESIGNING RBF CLASSIFIERS FOR WEIGHTED BOOSTING Vanessa Gómez-Verdejo, Jerónimo Arenas-García, Manuel Ortega-Moral and Aníbal R. Figueiras-Vidal Department of Signal Theory and Communications Universidad

More information

A Simple Implementation of the Stochastic Discrimination for Pattern Recognition

A Simple Implementation of the Stochastic Discrimination for Pattern Recognition A Simple Implementation of the Stochastic Discrimination for Pattern Recognition Dechang Chen 1 and Xiuzhen Cheng 2 1 University of Wisconsin Green Bay, Green Bay, WI 54311, USA chend@uwgb.edu 2 University

More information

Neural Network to Control Output of Hidden Node According to Input Patterns

Neural Network to Control Output of Hidden Node According to Input Patterns American Journal of Intelligent Systems 24, 4(5): 96-23 DOI:.5923/j.ajis.2445.2 Neural Network to Control Output of Hidden Node According to Input Patterns Takafumi Sasakawa, Jun Sawamoto 2,*, Hidekazu

More information

Predicting the Probability of Correct Classification

Predicting the Probability of Correct Classification Predicting the Probability of Correct Classification Gregory Z. Grudic Department of Computer Science University of Colorado, Boulder grudic@cs.colorado.edu Abstract We propose a formulation for binary

More information

A Metric Approach to Building Decision Trees based on Goodman-Kruskal Association Index

A Metric Approach to Building Decision Trees based on Goodman-Kruskal Association Index A Metric Approach to Building Decision Trees based on Goodman-Kruskal Association Index Dan A. Simovici and Szymon Jaroszewicz University of Massachusetts at Boston, Department of Computer Science, Boston,

More information

What Do Neural Networks Do? MLP Lecture 3 Multi-layer networks 1

What Do Neural Networks Do? MLP Lecture 3 Multi-layer networks 1 What Do Neural Networks Do? MLP Lecture 3 Multi-layer networks 1 Multi-layer networks Steve Renals Machine Learning Practical MLP Lecture 3 7 October 2015 MLP Lecture 3 Multi-layer networks 2 What Do Single

More information

Selection of Classifiers based on Multiple Classifier Behaviour

Selection of Classifiers based on Multiple Classifier Behaviour Selection of Classifiers based on Multiple Classifier Behaviour Giorgio Giacinto, Fabio Roli, and Giorgio Fumera Dept. of Electrical and Electronic Eng. - University of Cagliari Piazza d Armi, 09123 Cagliari,

More information

LBR-Meta: An Efficient Algorithm for Lazy Bayesian Rules

LBR-Meta: An Efficient Algorithm for Lazy Bayesian Rules LBR-Meta: An Efficient Algorithm for Lazy Bayesian Rules Zhipeng Xie School of Computer Science Fudan University 220 Handan Road, Shanghai 200433, PR. China xiezp@fudan.edu.cn Abstract LBR is a highly

More information

Supervised locally linear embedding

Supervised locally linear embedding Supervised locally linear embedding Dick de Ridder 1, Olga Kouropteva 2, Oleg Okun 2, Matti Pietikäinen 2 and Robert P.W. Duin 1 1 Pattern Recognition Group, Department of Imaging Science and Technology,

More information

Dynamic Linear Combination of Two-Class Classifiers

Dynamic Linear Combination of Two-Class Classifiers Dynamic Linear Combination of Two-Class Classifiers Carlo Lobrano 1, Roberto Tronci 1,2, Giorgio Giacinto 1, and Fabio Roli 1 1 DIEE Dept. of Electrical and Electronic Engineering, University of Cagliari,

More information

Linear Methods for Classification

Linear Methods for Classification Linear Methods for Classification Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Classification Supervised learning Training data: {(x 1, g 1 ), (x 2, g 2 ),..., (x

More information

Diversity-Based Boosting Algorithm

Diversity-Based Boosting Algorithm Diversity-Based Boosting Algorithm Jafar A. Alzubi School of Engineering Al-Balqa Applied University Al-Salt, Jordan Abstract Boosting is a well known and efficient technique for constructing a classifier

More information

Kernel-based Feature Extraction under Maximum Margin Criterion

Kernel-based Feature Extraction under Maximum Margin Criterion Kernel-based Feature Extraction under Maximum Margin Criterion Jiangping Wang, Jieyan Fan, Huanghuang Li, and Dapeng Wu 1 Department of Electrical and Computer Engineering, University of Florida, Gainesville,

More information

Extreme Learning Machine: RBF Network Case

Extreme Learning Machine: RBF Network Case Extreme Learning Machine: RBF Network Case Guang-Bin Huang and Chee-Kheong Siew School of Electrical and Electronic Engineering Nanyang Technological University Nanyang Avenue, Singapore 639798 E-mail:

More information

SPSS, University of Texas at Arlington. Topics in Machine Learning-EE 5359 Neural Networks

SPSS, University of Texas at Arlington. Topics in Machine Learning-EE 5359 Neural Networks Topics in Machine Learning-EE 5359 Neural Networks 1 The Perceptron Output: A perceptron is a function that maps D-dimensional vectors to real numbers. For notational convenience, we add a zero-th dimension

More information

Gaussian discriminant analysis Naive Bayes

Gaussian discriminant analysis Naive Bayes DM825 Introduction to Machine Learning Lecture 7 Gaussian discriminant analysis Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark Outline 1. is 2. Multi-variate

More information

Voting Massive Collections of Bayesian Network Classifiers for Data Streams

Voting Massive Collections of Bayesian Network Classifiers for Data Streams Voting Massive Collections of Bayesian Network Classifiers for Data Streams Remco R. Bouckaert Computer Science Department, University of Waikato, New Zealand remco@cs.waikato.ac.nz Abstract. We present

More information

A New Wrapper Method for Feature Subset Selection

A New Wrapper Method for Feature Subset Selection A New Wrapper Method for Feature Subset Selection Noelia Sánchez-Maroño 1 and Amparo Alonso-Betanzos 1 and Enrique Castillo 2 1- University of A Coruña- Department of Computer Science- LIDIA Lab. Faculty

More information

Ch 4. Linear Models for Classification

Ch 4. Linear Models for Classification Ch 4. Linear Models for Classification Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Department of Computer Science and Engineering Pohang University of Science and echnology 77 Cheongam-ro,

More information

Linear Discrimination Functions

Linear Discrimination Functions Laurea Magistrale in Informatica Nicola Fanizzi Dipartimento di Informatica Università degli Studi di Bari November 4, 2009 Outline Linear models Gradient descent Perceptron Minimum square error approach

More information

Artificial Neural Networks

Artificial Neural Networks Introduction ANN in Action Final Observations Application: Poverty Detection Artificial Neural Networks Alvaro J. Riascos Villegas University of los Andes and Quantil July 6 2018 Artificial Neural Networks

More information

Minimal Attribute Space Bias for Attribute Reduction

Minimal Attribute Space Bias for Attribute Reduction Minimal Attribute Space Bias for Attribute Reduction Fan Min, Xianghui Du, Hang Qiu, and Qihe Liu School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu

More information

Divergence based Learning Vector Quantization

Divergence based Learning Vector Quantization Divergence based Learning Vector Quantization E. Mwebaze 1,2, P. Schneider 2, F.-M. Schleif 3, S. Haase 4, T. Villmann 4, M. Biehl 2 1 Faculty of Computing & IT, Makerere Univ., P.O. Box 7062, Kampala,

More information

STUDY ON METHODS FOR COMPUTER-AIDED TOOTH SHADE DETERMINATION

STUDY ON METHODS FOR COMPUTER-AIDED TOOTH SHADE DETERMINATION INTERNATIONAL JOURNAL OF INFORMATION AND SYSTEMS SCIENCES Volume 5, Number 3-4, Pages 351 358 c 2009 Institute for Scientific Computing and Information STUDY ON METHODS FOR COMPUTER-AIDED TOOTH SHADE DETERMINATION

More information

k k k 1 Lecture 9: Applying Backpropagation Lecture 9: Applying Backpropagation 3 Lecture 9: Applying Backpropagation

k k k 1 Lecture 9: Applying Backpropagation Lecture 9: Applying Backpropagation 3 Lecture 9: Applying Backpropagation K-Class Classification Problem Let us denote the -th class by C, with n exemplars or training samples, forming the sets T for = 1,, K: {( x, ) p = 1 n } T = d,..., p p The complete training set is T =

More information

Maximum Likelihood Estimation. only training data is available to design a classifier

Maximum Likelihood Estimation. only training data is available to design a classifier Introduction to Pattern Recognition [ Part 5 ] Mahdi Vasighi Introduction Bayesian Decision Theory shows that we could design an optimal classifier if we knew: P( i ) : priors p(x i ) : class-conditional

More information

Integrating Global and Local Structures: A Least Squares Framework for Dimensionality Reduction

Integrating Global and Local Structures: A Least Squares Framework for Dimensionality Reduction Integrating Global and Local Structures: A Least Squares Framework for Dimensionality Reduction Jianhui Chen, Jieping Ye Computer Science and Engineering Department Arizona State University {jianhui.chen,

More information

Non-Iterative Heteroscedastic Linear Dimension Reduction for Two-Class Data

Non-Iterative Heteroscedastic Linear Dimension Reduction for Two-Class Data Non-Iterative Heteroscedastic Linear Dimension Reduction for Two-Class Data From Fisher to Chernoff M. Loog and R. P.. Duin Image Sciences Institute, University Medical Center Utrecht, Utrecht, The Netherlands,

More information

Two-Layered Face Detection System using Evolutionary Algorithm

Two-Layered Face Detection System using Evolutionary Algorithm Two-Layered Face Detection System using Evolutionary Algorithm Jun-Su Jang Jong-Hwan Kim Dept. of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology (KAIST),

More information

A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems

A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems Machine Learning, 45, 171 186, 001 c 001 Kluwer Academic Publishers. Manufactured in The Netherlands. A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems

More information