Study on Classification Methods Based on Three Different Learning Criteria. Jae Kyu Suhr
1 Study on Classification Methods Based on Three Different Learning Criteria Jae Kyu Suhr
2 Contents
- Introduction
- Three learning criteria: LSE, TER, AUC
- Methods based on three learning criteria
  - LSE: RM, ELM
  - TER: TER-RM, TER-ELM
  - AUC: AUC-RM
- Experiment
  - Setup: data sets, parameter setting
  - Result: normalization, TER and LAUC results
3 Introduction
- Pattern classification is a widely researched topic for decision making.
- In pattern classification, empirical learning constitutes a major paradigm.
  - Under this paradigm, a classifier is designed to minimize a certain cost function (the learning criterion).
- Least Squares Error (LSE) is a commonly used cost function.
  - The reasons for the popularity of LSE are its simplicity, clear physical meaning, and tractability for analysis.
  - The embedding of nonlinearities into linear models has widened the application of the LSE cost function.
4 Introduction
- Recently, two efficient basis functions were proposed.
  - Reduced multinomial Model (RM) [2]: the basis function is a reduced version of the full polynomial.
  - Extreme Learning Machine (ELM) [3]: the basis function is a Single-hidden Layer Feedforward Neural network (SLFN).
- However, LSE's limitation becomes apparent when high accuracy is required.
  - The LSE cost function minimizes the fitting error rather than the classification error, which is the quantity a classification task needs to minimize.
5 Introduction
- Three main approaches have been adopted to overcome this drawback of the LSE cost function.
  - Discriminant approach: FDA, GDA
  - Structural approach: SVD
  - Classification-error approach
- In the third approach, two cost functions were recently proposed.
  - Total Error Rate (TER)-based approach (TER-RM, TER-ELM) [4,5]: minimize the total error rate in the training stage.
  - Area under the ROC curve (AUC)-based approach (AUC-RM) [6]: maximize the area under the ROC curve in the training stage.
- The main breakthrough is a smooth approximate formulation for calculating TER and AUC.
  - Quadratic approximation of the counting process
  - Closed-form solution
6 Introduction
- In this paper, five classification methods based on three different learning criteria were evaluated.
  - LSE criterion: RM, ELM
  - TER criterion: TER-RM, TER-ELM
  - AUC criterion: AUC-RM
- Five two-class problems from the UCI database were used for the method evaluation.
  - Pima-diabetes, SPECT-heart, StatLog-heart, Tic-tac-toe, and Wdbc
- An efficient way to normalize feature vectors for RM- and ELM-based methods was discussed.
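7 LSE-based Method
- Parametric model adopting a basis expansion term:
$$ g(\boldsymbol{\alpha}, \mathbf{x}) = \sum_{k=1}^{K} \alpha_k\, p_k(\mathbf{x}) = \mathbf{p}(\mathbf{x})\,\boldsymbol{\alpha} $$
- LSE cost function:
$$ J(\boldsymbol{\alpha}) = \frac{1}{2}\,\lVert \mathbf{y} - \mathbf{P}\boldsymbol{\alpha} \rVert_2^2 + \frac{b}{2}\,\lVert \boldsymbol{\alpha} \rVert_2^2 $$
- Solution for LSE which minimizes J:
$$ \hat{\boldsymbol{\alpha}} = (\mathbf{P}^T\mathbf{P} + b\,\mathbf{I})^{-1}\,\mathbf{P}^T\mathbf{y} $$
- RM basis function:
$$ \hat{f}_{RM}(\boldsymbol{\alpha}, \mathbf{x}) = \alpha_0 + \sum_{k=1}^{r}\sum_{j=1}^{l} \alpha_{kj}\, x_j^{k} + \sum_{j=1}^{r} \alpha_{rl+j}\,(x_1 + x_2 + \cdots + x_l)^{j} + \sum_{j=2}^{r} (\boldsymbol{\alpha}_j^T \mathbf{x})\,(x_1 + x_2 + \cdots + x_l)^{j-1}, \qquad l, r \ge 2 $$
- ELM basis function:
$$ \mathbf{H} = \begin{bmatrix} \phi(\mathbf{w}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & \phi(\mathbf{w}_p \cdot \mathbf{x}_1 + b_p) \\ \vdots & \ddots & \vdots \\ \phi(\mathbf{w}_1 \cdot \mathbf{x}_m + b_1) & \cdots & \phi(\mathbf{w}_p \cdot \mathbf{x}_m + b_p) \end{bmatrix}_{m \times p} $$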
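The closed-form LSE solution on this slide can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the toy two-class data, the number of hidden neurons `p`, the uniform range of the random weights and the value of `b` are all hypothetical choices, not settings taken from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_hidden_matrix(X, W, bias):
    """Return H with H[i, k] = sigmoid(w_k . x_i + b_k); shape (m, p)."""
    return sigmoid(X @ W + bias)

def lse_solve(P, y, b):
    """Regularized least squares: alpha = (P^T P + b I)^(-1) P^T y."""
    K = P.shape[1]
    return np.linalg.solve(P.T @ P + b * np.eye(K), P.T @ y)

# Hypothetical toy data: two well-separated Gaussian classes in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, size=(50, 2)),
               rng.normal(+2.0, 0.5, size=(50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

# Random, untrained hidden layer (the ELM idea); p = 20 is an arbitrary choice.
p = 20
W = rng.uniform(-1.0, 1.0, size=(2, p))
bias = rng.uniform(-1.0, 1.0, size=p)

H = elm_hidden_matrix(X, W, bias)
alpha = lse_solve(H, y, b=1e-3)

# Classify by thresholding the fitted output at 0.5 (midpoint of the 0/1 targets).
train_accuracy = np.mean((H @ alpha > 0.5) == (y == 1))
```

The same `lse_solve` applies unchanged when `P` is built from a polynomial basis instead of `H`; only the basis expansion differs between the two families.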
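8 TER-based Method
- Total Error Rate, with L the counting (step) function:
$$ \mathrm{TER}(\boldsymbol{\alpha}, \mathbf{x}^+, \mathbf{x}^-) = \frac{1}{m^-}\sum_{j=1}^{m^-} L\big(g(\boldsymbol{\alpha}, \mathbf{x}_j^-) - \tau\big) + \frac{1}{m^+}\sum_{i=1}^{m^+} L\big(\tau - g(\boldsymbol{\alpha}, \mathbf{x}_i^+)\big) $$
- When using $g(\boldsymbol{\alpha}, \mathbf{x}) = \mathbf{p}(\mathbf{x})\boldsymbol{\alpha}$ and a quadratic approximation:
$$ \mathrm{TER}(\boldsymbol{\alpha}, \mathbf{x}^+, \mathbf{x}^-) = \frac{b}{2}\lVert\boldsymbol{\alpha}\rVert_2^2 + \frac{1}{2m^-}\sum_{j=1}^{m^-}\big[\mathbf{p}(\mathbf{x}_j^-)\boldsymbol{\alpha} - \tau + \eta\big]^2 + \frac{1}{2m^+}\sum_{i=1}^{m^+}\big[\tau - \mathbf{p}(\mathbf{x}_i^+)\boldsymbol{\alpha} + \eta\big]^2 $$
- Optimal parameter, with $\mathbf{p}_j = \mathbf{p}(\mathbf{x}_j^-)$ and $\mathbf{p}_i = \mathbf{p}(\mathbf{x}_i^+)$:
$$ \boldsymbol{\alpha} = \Big(b\,\mathbf{I} + \frac{1}{m^-}\sum_{j=1}^{m^-}\mathbf{p}_j^T\mathbf{p}_j + \frac{1}{m^+}\sum_{i=1}^{m^+}\mathbf{p}_i^T\mathbf{p}_i\Big)^{-1}\Big(\frac{\tau-\eta}{m^-}\sum_{j=1}^{m^-}\mathbf{p}_j^T + \frac{\tau+\eta}{m^+}\sum_{i=1}^{m^+}\mathbf{p}_i^T\Big) $$
- In matrix form, with $\mathbf{P}_-$ and $\mathbf{P}_+$ stacking the rows $\mathbf{p}_j$ and $\mathbf{p}_i$, and $\mathbf{1}_-$, $\mathbf{1}_+$ the all-ones vectors of lengths $m^-$ and $m^+$:
$$ \boldsymbol{\alpha} = \Big(b\,\mathbf{I} + \frac{1}{m^-}\mathbf{P}_-^T\mathbf{P}_- + \frac{1}{m^+}\mathbf{P}_+^T\mathbf{P}_+\Big)^{-1}\Big(\frac{\tau-\eta}{m^-}\mathbf{P}_-^T\mathbf{1}_- + \frac{\tau+\eta}{m^+}\mathbf{P}_+^T\mathbf{1}_+\Big) $$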
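A minimal NumPy sketch of the closed-form TER solution is given here. It assumes the class-weighted quadratic objective described on this slide; the toy data, the plain linear basis `[1, x1, x2]` (a stand-in for the RM or ELM expansion), and the value of `b` are hypothetical. Note that the TER computed here is the sum of the two per-class error rates, as in the slide's definition.

```python
import numpy as np

def ter_solve(P_neg, P_pos, tau=0.5, eta=0.5, b=1e-4):
    """Closed-form minimizer of the quadratic TER approximation.

    P_neg: (m_neg, K) basis rows of negative samples.
    P_pos: (m_pos, K) basis rows of positive samples.
    """
    m_neg, K = P_neg.shape
    m_pos = P_pos.shape[0]
    A = b * np.eye(K) + P_neg.T @ P_neg / m_neg + P_pos.T @ P_pos / m_pos
    rhs = ((tau - eta) / m_neg) * P_neg.sum(axis=0) \
        + ((tau + eta) / m_pos) * P_pos.sum(axis=0)
    return np.linalg.solve(A, rhs)

def total_error_rate(scores_neg, scores_pos, tau=0.5):
    """False-positive rate plus false-negative rate at threshold tau."""
    return np.mean(scores_neg > tau) + np.mean(scores_pos < tau)

def add_bias(X):
    """Prepend a constant column (hypothetical linear basis)."""
    return np.hstack([np.ones((len(X), 1)), X])

# Hypothetical, deliberately imbalanced toy data.
rng = np.random.default_rng(0)
P_neg = add_bias(rng.normal(-2.0, 0.5, size=(60, 2)))
P_pos = add_bias(rng.normal(+2.0, 0.5, size=(40, 2)))

alpha = ter_solve(P_neg, P_pos)
ter = total_error_rate(P_neg @ alpha, P_pos @ alpha)
```

With tau = eta = 0.5 the targets are effectively 0 for the negative class and 1 for the positive class, and each class contributes equally to the cost regardless of its size.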
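9 AUC-based Method
- Area under the ROC curve:
$$ \mathrm{AUC}(\mathbf{x}^+, \mathbf{x}^-) = \frac{1}{m^+ m^-}\sum_{i=1}^{m^+}\sum_{j=1}^{m^-} \mathbf{1}_{\,g(\mathbf{x}_i^+) > g(\mathbf{x}_j^-)} $$
- Minimizing the area above the ROC curve (AAC), with u the step function:
$$ \arg\min_{\boldsymbol{\alpha}} \mathrm{AAC}(\boldsymbol{\alpha}, \mathbf{x}^+, \mathbf{x}^-) = \arg\min_{\boldsymbol{\alpha}} \frac{1}{m^+ m^-}\sum_{i=1}^{m^+}\sum_{j=1}^{m^-} u\big(g(\boldsymbol{\alpha}, \mathbf{x}_j^-) - g(\boldsymbol{\alpha}, \mathbf{x}_i^+)\big) $$
- When using a quadratic approximation:
$$ \arg\min_{\boldsymbol{\alpha}} \mathrm{AAC}(\boldsymbol{\alpha}, \mathbf{x}^+, \mathbf{x}^-) \approx \arg\min_{\boldsymbol{\alpha}} \frac{b}{2}\lVert\boldsymbol{\alpha}\rVert_2^2 + \frac{1}{2 m^+ m^-}\sum_{i=1}^{m^+}\sum_{j=1}^{m^-}\Big[\big(\mathbf{p}(\mathbf{x}_j^-) - \mathbf{p}(\mathbf{x}_i^+)\big)\boldsymbol{\alpha} + \eta\Big]^2 $$
- Optimal parameter, with $\mathbf{p}_j = \mathbf{p}(\mathbf{x}_j^-)$ and $\mathbf{p}_i = \mathbf{p}(\mathbf{x}_i^+)$:
$$ \boldsymbol{\alpha} = \Big[b\,\mathbf{I} + \frac{1}{m^+ m^-}\sum_{i=1}^{m^+}\sum_{j=1}^{m^-}(\mathbf{p}_j - \mathbf{p}_i)^T(\mathbf{p}_j - \mathbf{p}_i)\Big]^{-1}\Big[\frac{-\eta}{m^+ m^-}\sum_{i=1}^{m^+}\sum_{j=1}^{m^-}(\mathbf{p}_j - \mathbf{p}_i)^T\Big] $$
- TER-based threshold:
$$ \tau = \frac{1}{2}\Big[\frac{1}{m^-}\sum_{j=1}^{m^-}\mathbf{p}(\mathbf{x}_j^-)\boldsymbol{\alpha} + \frac{1}{m^+}\sum_{i=1}^{m^+}\mathbf{p}(\mathbf{x}_i^+)\boldsymbol{\alpha}\Big] $$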
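The pairwise structure of the AUC-based solution can be sketched as follows. The code assumes the quadratic pairwise objective on this slide and picks the decision threshold as the midpoint of the two class-mean scores; that midpoint rule, `eta=1.0`, the value of `b`, the identity basis and the toy data are all assumptions made for illustration.

```python
import numpy as np

def auc_solve(P_neg, P_pos, eta=1.0, b=1e-4):
    """Closed-form minimizer of the quadratic pairwise (AAC) approximation."""
    K = P_neg.shape[1]
    # One row per (i, j) pair: p(x_j^-) - p(x_i^+); shape (m_pos * m_neg, K).
    D = (P_neg[None, :, :] - P_pos[:, None, :]).reshape(-1, K)
    n_pairs = D.shape[0]
    A = b * np.eye(K) + D.T @ D / n_pairs
    rhs = -(eta / n_pairs) * D.sum(axis=0)
    return np.linalg.solve(A, rhs)

def empirical_auc(scores_neg, scores_pos):
    """Fraction of (positive, negative) pairs ranked correctly."""
    return np.mean(scores_pos[:, None] > scores_neg[None, :])

def midpoint_threshold(scores_neg, scores_pos):
    """Assumed threshold rule: midpoint of the two class-mean scores."""
    return 0.5 * (scores_neg.mean() + scores_pos.mean())

# Hypothetical toy data; the raw features serve as the basis here.
# A constant column would cancel in every pairwise difference, so none is added.
rng = np.random.default_rng(0)
P_neg = rng.normal(-2.0, 0.5, size=(60, 2))
P_pos = rng.normal(+2.0, 0.5, size=(40, 2))

alpha = auc_solve(P_neg, P_pos)
s_neg, s_pos = P_neg @ alpha, P_pos @ alpha
auc = empirical_auc(s_neg, s_pos)
tau = midpoint_threshold(s_neg, s_pos)
```

Because the objective only sees score differences, it fixes the ranking of the samples but not an absolute operating point, which is why a separate threshold step is needed before computing an error rate.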
10 Method Description

| Basis function | LSE | TER | AUC |
|---|---|---|---|
| RM | RM [2] | TER-RM [4] | AUC-RM [6] |
| SLFNs | ELM [3] | TER-ELM [5] | - |
11 Data Set Description

| DB name | Number of samples | Number of features | Number of classes (class ratio) | Missing feature values |
|---|---|---|---|---|
| Pima-diabetes | 768 | 8 | 2 (65% / 35%) | None |
| Wisconsin Diagnostic Breast Cancer | 569 | 30 | 2 (63% / 36%) | None |
| SPECT-heart | 267 | 22 | 2 (79% / 21%) | None |
| Statlog-heart | 270 | 13 | 2 (56% / 44%) | None |
| Tic-Tac-Toe Endgame | 958 | 9 | 2 (65% / 35%) | None |
12 Experimental Setup
- Validation: -fold cross validation
- Run: runs for all methods and all settings
- RM, TER-RM, AUC-RM: ~ order
- ELM, TER-ELM: sigmoid activation function, ~ hidden neurons
- TER-RM, TER-ELM: τ = η = 0.5
- AUC-RM: η =
- Data normalization: min-max
  - RM-based methods: data normalization was applied after making the P matrix
  - ELM-based methods (ELM, TER-ELM): data normalization was applied before making the H matrix
13 Evaluation Criteria
- Total Error Rate (TER):
$$ \mathrm{TER}\,(\%) = \frac{\text{total number of misclassified data samples}}{\text{total number of data samples}} \times 100 $$
- LAUC: negative base-10 logarithm of (1 - AUC), used because the AUC value itself shows little difference between two biometrics that both have high performance:
$$ \mathrm{LAUC} = -\log_{10}(1 - \mathrm{AUC}) $$
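The two evaluation measures can be sketched as below. The LAUC function takes the definition to be -log10(1 - AUC), which is an assumption about the intended formula; the sample labels are made up for illustration.

```python
import numpy as np

def ter_percent(y_true, y_pred):
    """Percentage of misclassified samples."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return 100.0 * np.mean(y_true != y_pred)

def lauc(auc):
    """Assumed definition: LAUC = -log10(1 - AUC). Diverges as AUC approaches 1."""
    return -np.log10(1.0 - auc)

# Hypothetical labels: one of four samples is misclassified.
error = ter_percent([0, 1, 1, 0], [0, 1, 0, 0])

# Two strong classifiers that look nearly identical in raw AUC ...
lauc_a = lauc(0.99)
lauc_b = lauc(0.999)
# ... differ by about one full unit on the LAUC scale.
```

Under this definition a larger LAUC is better, and each additional unit corresponds to a tenfold reduction of the area above the ROC curve.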
14 Normalization Procedure Min-max normalization technique in three different ways: No normalization Normalization before making P or H matrix Normalization after making P or H matrix.
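Min-max normalization itself is a per-column affine rescaling. The sketch below assumes that the minimum and maximum are estimated on the training fold only and then reused for the test fold, which is a common convention and not something the slides state; the small arrays are hypothetical.

```python
import numpy as np

def fit_min_max(X_train):
    """Column-wise minimum and maximum of the training data."""
    return X_train.min(axis=0), X_train.max(axis=0)

def apply_min_max(X, lo, hi):
    """Map each column so that [lo, hi] goes to [0, 1]."""
    # Constant columns have hi == lo; use a span of 1 to avoid division by zero.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span

# Hypothetical data: three features on very different scales, one constant.
X_train = np.array([[1.0, 200.0, 5.0],
                    [3.0, 400.0, 5.0],
                    [2.0, 1000.0, 5.0]])
X_test = np.array([[4.0, 100.0, 5.0]])

lo, hi = fit_min_max(X_train)
X_train_n = apply_min_max(X_train, lo, hi)
X_test_n = apply_min_max(X_test, lo, hi)  # may fall outside [0, 1]
```

The same two functions can be applied either to the raw feature matrix (normalization before building P or H) or to the columns of the expanded matrix (normalization after), which are the cases compared on this slide.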
15 RM case
[Figure: test error (%) on wdbc versus RM order, for no normalization, normalization before making the P matrix, and normalization after making the P matrix]
- Normalization after making the P matrix has the best performance.
  - The P matrix of RM is produced by multiplying and adding many feature values.
  - This leads to a singularity problem in the matrix inversion.
  - This in turn makes the parameter estimation unstable.
- Normalization after making the P matrix is better than normalization before making the P matrix.
  - Even if the feature vectors are normalized before making the P matrix, feature values are still multiplied and added when producing it.
  - This can also cause the singularity problem.
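The conditioning argument for the RM case can be illustrated numerically. The sketch below builds a reduced-multinomial style expansion (constant term, per-feature powers, powers of the feature sum, and feature-times-sum-power terms) and compares the condition number of the regularized matrix P^T P + bI with and without column-wise min-max normalization applied after the expansion. The feature range, order `r`, and `b` are hypothetical, and keeping the constant column unnormalized is a choice made here.

```python
import numpy as np

def rm_expand(X, r):
    """Reduced-multinomial style basis expansion of order r (sketch)."""
    m, l = X.shape
    s = X.sum(axis=1, keepdims=True)                      # x_1 + ... + x_l
    cols = [np.ones((m, 1))]                              # constant term
    cols += [X ** k for k in range(1, r + 1)]             # x_j^k
    cols += [s ** j for j in range(1, r + 1)]             # (sum x)^j
    cols += [X * s ** (j - 1) for j in range(2, r + 1)]   # x_i * (sum x)^(j-1)
    return np.hstack(cols)

def min_max_columns(M):
    lo, hi = M.min(axis=0), M.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (M - lo) / span

def regularized_cond(P, b):
    """Condition number of the matrix inverted in the closed-form solution."""
    return np.linalg.cond(P.T @ P + b * np.eye(P.shape[1]))

# Hypothetical raw features on a large scale.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 200.0, size=(100, 3))
r, b = 3, 1e-4

P_raw = rm_expand(X, r)
P_after = P_raw.copy()
P_after[:, 1:] = min_max_columns(P_raw[:, 1:])  # leave the constant column as is

cond_raw = regularized_cond(P_raw, b)
cond_after = regularized_cond(P_after, b)
```

The expansion has 1 + r + l(2r - 1) columns, several of which are linearly dependent by construction, so the regularization constant b is what keeps the inverse well defined; rescaling the columns after the expansion keeps their magnitudes comparable, which is the effect the slide attributes to normalizing after P is made.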
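16 ELM case
[Figure: test error (%) on wdbc versus number of hidden neurons, for no normalization, normalization before making the H matrix, and normalization after making the H matrix]
- Normalization before making the H matrix has the best performance.
- No normalization and normalization after making the H matrix have almost the same performance.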
17 ELM case
[Figure: histograms (number of occurrences versus feature value) at each stage of building the H matrix. Normalization after making the H matrix: input feature, input weight and bias, sigmoid activation function, min-max normalization; almost no difference. Normalization before making the H matrix: input feature, min-max normalization, input weight and bias, sigmoid activation function; much more informative.]
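One way to read the histograms on this slide is through sigmoid saturation: large unnormalized inputs push almost every hidden output to 0 or 1, so rescaling afterwards changes little. The sketch below illustrates that reading with made-up data; the feature range, weight range, layer size and the saturation cut-off `eps` are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    # Clip the argument so np.exp never overflows for very large |z|.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500.0, 500.0)))

def saturated_fraction(H, eps=0.01):
    """Share of hidden outputs within eps of 0 or 1."""
    return np.mean((H < eps) | (H > 1.0 - eps))

# Hypothetical raw features on a large scale and a random hidden layer.
rng = np.random.default_rng(0)
X_raw = rng.uniform(0.0, 1000.0, size=(200, 10))
W = rng.uniform(-1.0, 1.0, size=(10, 50))
bias = rng.uniform(-1.0, 1.0, size=50)

# Min-max normalization applied BEFORE the hidden layer.
lo, hi = X_raw.min(axis=0), X_raw.max(axis=0)
X_norm = (X_raw - lo) / (hi - lo)

H_raw = sigmoid(X_raw @ W + bias)    # no normalization before H
H_norm = sigmoid(X_norm @ W + bias)  # normalization before H

sat_raw = saturated_fraction(H_raw)
sat_norm = saturated_fraction(H_norm)
```

In this toy setting nearly all entries of `H_raw` are saturated while almost none of `H_norm` are, which is consistent with the slide's observation that normalizing after H gives about the same result as no normalization.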
18 Comparison Results
[Figure: test error (%) and LAUC versus RM order / number of hidden neurons for RM, ELM, TER-RM, TER-ELM, and AUC-RM on Pima-diabetes and SPECT-heart]
19 Comparison Results
[Figure: test error (%) and LAUC versus RM order / number of hidden neurons for RM, ELM, TER-RM, TER-ELM, and AUC-RM on StatLog-heart and tic-tac-toe]
20 Comparison Results
[Figure: test error (%) and LAUC versus RM order / number of hidden neurons for RM, ELM, TER-RM, TER-ELM, and AUC-RM on wdbc]
21 Conclusions
- For data normalization:
  - Normalization should be applied after making the P matrix when using the RM basis function.
  - Normalization should be applied before making the H matrix when using the ELM basis function.
- For two-class problems:
  - All methods give similar results.
  - In particular, TER-RM and AUC-RM have almost the same performance in terms of TER and LAUC.
    - TER: find the optimal α with a fixed τ to minimize the total error rate.
    - AUC: find the optimal τ with a fixed α to minimize the total error rate.
    - TER and AUC show a very similar trend.
22 References
[1] R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification, second ed. John Wiley & Sons.
[2] K.-A. Toh, Q.-L. Tran, and D. Srinivasan, Benchmarking a reduced multivariate polynomial pattern classifier, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26, no. 6, 2004.
[3] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, Extreme learning machine: Theory and applications, Neurocomputing, vol. 70, 2006.
[4] K.-A. Toh and H.-L. Eng, Between classification-error approximation and weighted least-squares learning, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 30, no. 4, 2008.
[5] K.-A. Toh, Deterministic Neural Classification, Neural Computation, 2008.
[6] K.-A. Toh, J. Kim, and S. Lee, Maximizing Area Under ROC Curve for Biometric Scores Fusion, Pattern Recognition, 2008.
[7] K.-A. Toh, Learning from Target Knowledge Approximation, Proc. First IEEE Conf. Industrial Electronics and Applications, pp. 85-8, May 2006.
[8] J.A. Hanley and B.J. McNeil, The meaning and use of the area under a receiver operating characteristic (ROC) curve, Radiology, vol. 143, 1982.
[9] K.-A. Toh, Between AUC Based and Error Rate Based Learning, The 3rd IEEE Conference on Industrial Electronics and Applications (ICIEA), Singapore, June 2008.
[10] D.J. Newman, S. Hettich, C.L. Blake, and C.J. Merz, UCI Repository of Machine Learning Databases, Univ. of California, Dept. of Information and Computer Sciences.
23 THE END
More informationCh 4. Linear Models for Classification
Ch 4. Linear Models for Classification Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Department of Computer Science and Engineering Pohang University of Science and echnology 77 Cheongam-ro,
More informationLinear Discrimination Functions
Laurea Magistrale in Informatica Nicola Fanizzi Dipartimento di Informatica Università degli Studi di Bari November 4, 2009 Outline Linear models Gradient descent Perceptron Minimum square error approach
More informationArtificial Neural Networks
Introduction ANN in Action Final Observations Application: Poverty Detection Artificial Neural Networks Alvaro J. Riascos Villegas University of los Andes and Quantil July 6 2018 Artificial Neural Networks
More informationMinimal Attribute Space Bias for Attribute Reduction
Minimal Attribute Space Bias for Attribute Reduction Fan Min, Xianghui Du, Hang Qiu, and Qihe Liu School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu
More informationDivergence based Learning Vector Quantization
Divergence based Learning Vector Quantization E. Mwebaze 1,2, P. Schneider 2, F.-M. Schleif 3, S. Haase 4, T. Villmann 4, M. Biehl 2 1 Faculty of Computing & IT, Makerere Univ., P.O. Box 7062, Kampala,
More informationSTUDY ON METHODS FOR COMPUTER-AIDED TOOTH SHADE DETERMINATION
INTERNATIONAL JOURNAL OF INFORMATION AND SYSTEMS SCIENCES Volume 5, Number 3-4, Pages 351 358 c 2009 Institute for Scientific Computing and Information STUDY ON METHODS FOR COMPUTER-AIDED TOOTH SHADE DETERMINATION
More informationk k k 1 Lecture 9: Applying Backpropagation Lecture 9: Applying Backpropagation 3 Lecture 9: Applying Backpropagation
K-Class Classification Problem Let us denote the -th class by C, with n exemplars or training samples, forming the sets T for = 1,, K: {( x, ) p = 1 n } T = d,..., p p The complete training set is T =
More informationMaximum Likelihood Estimation. only training data is available to design a classifier
Introduction to Pattern Recognition [ Part 5 ] Mahdi Vasighi Introduction Bayesian Decision Theory shows that we could design an optimal classifier if we knew: P( i ) : priors p(x i ) : class-conditional
More informationIntegrating Global and Local Structures: A Least Squares Framework for Dimensionality Reduction
Integrating Global and Local Structures: A Least Squares Framework for Dimensionality Reduction Jianhui Chen, Jieping Ye Computer Science and Engineering Department Arizona State University {jianhui.chen,
More informationNon-Iterative Heteroscedastic Linear Dimension Reduction for Two-Class Data
Non-Iterative Heteroscedastic Linear Dimension Reduction for Two-Class Data From Fisher to Chernoff M. Loog and R. P.. Duin Image Sciences Institute, University Medical Center Utrecht, Utrecht, The Netherlands,
More informationTwo-Layered Face Detection System using Evolutionary Algorithm
Two-Layered Face Detection System using Evolutionary Algorithm Jun-Su Jang Jong-Hwan Kim Dept. of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology (KAIST),
More informationA Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems
Machine Learning, 45, 171 186, 001 c 001 Kluwer Academic Publishers. Manufactured in The Netherlands. A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems
More information