Semiparametric Discriminant Analysis of Mixture Populations Using Mahalanobis Distance. Probal Chaudhuri and Subhajit Dutta
1 Semiparametric Discriminant Analysis of Mixture Populations Using Mahalanobis Distance Probal Chaudhuri and Subhajit Dutta, Indian Statistical Institute, Kolkata. Workshop on Classification and Regression Trees, Institute of Mathematical Sciences, National University of Singapore.
2 Brief history Fisher, R. A. (1936), The use of multiple measurements in taxonomic problems, Ann. Eugenics, 7. Fisher was interested in the taxonomic classification of different species of Iris. He had measurements on the lengths and the widths of the sepals and the petals of the flowers of three different species, namely, Iris setosa, Iris virginica and Iris versicolor.
3 Brief history (Contd.) Mahalanobis, P. C. (1936), On the generalized distance in statistics, Proc. Nat. Acad. Sci. India, 12. Mahalanobis met Nelson Annandale at the 1920 Nagpur session of the Indian Science Congress. Annandale asked Mahalanobis to analyze anthropometric measurements of Anglo-Indians in Calcutta. This eventually led to the development of the Mahalanobis distance.
4 Mahalanobis distance and Fisher's discriminant function Mahalanobis distance of an observation x from a population with mean µ and dispersion Σ: MD(x, µ, Σ) = {(x − µ)′ Σ^{−1} (x − µ)}^{1/2}. Fisher's linear discriminant function for two populations with means µ_1 and µ_2 and a common dispersion Σ: (x − (µ_1 + µ_2)/2)′ Σ^{−1} (µ_1 − µ_2).
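The two quantities above can be sketched in a few lines of NumPy (an illustrative implementation with arbitrary example parameters, not code from the talk):

```python
import numpy as np

def mahalanobis(x, mu, Sigma):
    # MD(x, mu, Sigma) = {(x - mu)' Sigma^{-1} (x - mu)}^{1/2}
    d = x - mu
    return float(np.sqrt(d @ np.linalg.solve(Sigma, d)))

def fisher_ldf(x, mu1, mu2, Sigma):
    # Fisher's linear discriminant function; a positive score assigns x to class 1
    return float((x - (mu1 + mu2) / 2) @ np.linalg.solve(Sigma, mu1 - mu2))

mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 2.0])
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
x = np.array([0.5, 0.5])
print(mahalanobis(x, mu1, Sigma))      # distance of x from the class-1 population
print(fisher_ldf(x, mu1, mu2, Sigma))  # positive here, so x goes to class 1
```

Using `np.linalg.solve` instead of explicitly inverting Σ is the standard numerically stable choice.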
5 Mahalanobis distance and Fisher's discriminant function (Contd.)
6 Mahalanobis distance and Fisher's discriminant function (Contd.) The linear discriminant function has Bayes risk optimality for Gaussian class distributions which differ in their locations but have the same dispersion. This has been discussed in detail in Welch (1939, Biometrika) and Rao (1948, JRSS-B). In fact, the Bayes risk optimality holds for elliptically symmetric and unimodal class distributions which differ only in their locations.
7 Examples Example (a): Class 1: Mixture of N_d(0, Σ) and N_d(0, 10Σ); Class 2: N_d(0, 5Σ). Here Σ = I_d, and N_d denotes the d-variate normal distribution. Example (b): Class 1: Mixture of U_d(0, Σ, 0, 1) and U_d(0, Σ, 2, 3); Class 2: Mixture of U_d(0, Σ, 1, 2) and U_d(0, Σ, 3, 4). U_d(µ, Σ, r_1, r_2) denotes the uniform distribution over the region {x ∈ R^d : r_1 < ‖Σ^{−1/2}(x − µ)‖ < r_2}. The classes have the same location 0 but different scatters and shapes.
8 Figure: Bayes class boundaries in R^2 for (a) Example (a) and (b) Example (b).
9 Bayes class boundaries (Contd.) The class distributions involve elliptically symmetric distributions. They have the same location (i.e., 0) and differ in their scatters as well as their shapes. No linear or quadratic classifier will work here!
10 Performance of some standard classifiers Figure: Misclassification rates of LDA, QDA, two nonparametric classifiers (k-NN and KDE) and the Bayes classifier, plotted against log(d) for d = 2, 5, 10, 20, 50 and 100: (a) Example (a), (b) Example (b).
11 Nonparametric multinomial additive logistic regression model Suppose that the class densities are elliptically symmetric: f_i(x) = |Σ_i|^{−1/2} g_i(‖Σ_i^{−1/2}(x − µ_i)‖) = ψ_i(MD(x, µ_i, Σ_i)) for i = 1, 2. The class posterior probabilities are p(1 | x) = π_1 f_1(x)/(π_1 f_1(x) + π_2 f_2(x)) and p(2 | x) = 1 − p(1 | x). It is easy to see that log{p(1 | x)/p(2 | x)} = log(π_1 f_1(x)/(π_2 f_2(x))) = log(π_1/π_2) + log ψ_1(MD(x, µ_1, Σ_1)) − log ψ_2(MD(x, µ_2, Σ_2)).
12 Nonparametric multinomial additive logistic regression model (Contd.) The posteriors turn out to be of the form p(1 | x) = p(1 | z(x)) = exp(log ψ_1(z_1(x)) − log ψ_2(z_2(x))) / [1 + exp(log ψ_1(z_1(x)) − log ψ_2(z_2(x)))] and p(2 | x) = p(2 | z(x)) = 1 / [1 + exp(log ψ_1(z_1(x)) − log ψ_2(z_2(x)))], where z(x) = (z_1(x), z_2(x)) = (MD(x, µ_1, Σ_1), MD(x, µ_2, Σ_2)). The posterior probabilities satisfy a generalized additive model (Hastie and Tibshirani, 1990).
13 Nonparametric multinomial additive logistic regression model (Contd.) If we have two normal populations N_d(µ_1, Σ) and N_d(µ_2, Σ), we get a linear logistic regression model for the posterior probabilities. This is related to Fisher's linear discriminant analysis. If we have two normal populations N_d(µ_1, Σ_1) and N_d(µ_2, Σ_2), we get a quadratic logistic regression model for the posterior probabilities. This is related to quadratic discriminant analysis.
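The common-dispersion case is easy to verify numerically: for two Gaussian densities with the same Σ, the log-odds log{f_1(x)/f_2(x)} is exactly linear in x, with slope Σ^{−1}(µ_1 − µ_2). A small NumPy check (arbitrary illustrative parameters):

```python
import numpy as np

def log_gauss(x, mu, Sigma):
    # Log-density of N_d(mu, Sigma) at x
    d = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return float(-0.5 * (len(x) * np.log(2 * np.pi) + logdet
                         + d @ np.linalg.solve(Sigma, d)))

mu1, mu2 = np.array([0.0, 0.0]), np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])

# log{f1(x)/f2(x)} = w'x + b with w = Sigma^{-1}(mu1 - mu2)
# and b = -(mu1' Sigma^{-1} mu1 - mu2' Sigma^{-1} mu2) / 2
w = np.linalg.solve(Sigma, mu1 - mu2)
b = -0.5 * (mu1 @ np.linalg.solve(Sigma, mu1) - mu2 @ np.linalg.solve(Sigma, mu2))

rng = np.random.default_rng(0)
for _ in range(5):
    x = rng.normal(size=2)
    assert np.isclose(log_gauss(x, mu1, Sigma) - log_gauss(x, mu2, Sigma), w @ x + b)
```

Repeating the check with two different Σ's would leave a residual quadratic term, which is the QDA case.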
14 Nonparametric multinomial additive logistic regression model (Contd.) For any 1 ≤ i ≤ (J − 1), it is easy to see that log{p(i | x)/p(J | x)} = log(π_i/π_J) + log ψ_i(MD(x, µ_i, Σ_i)) − log ψ_J(MD(x, µ_J, Σ_J)), where p(i | x) is the posterior probability of the i-th class. For any 1 ≤ i ≤ (J − 1), the posteriors are of the form p(i | x) = p(i | z(x)) = exp(φ_i(z(x))) / [1 + ∑_{k=1}^{J−1} exp(φ_k(z(x)))] and p(J | x) = p(J | z(x)) = 1 / [1 + ∑_{k=1}^{J−1} exp(φ_k(z(x)))], where z(x) = (MD(x, µ_1, Σ_1), …, MD(x, µ_J, Σ_J)).
15 Nonparametric multinomial additive logistic regression model (Contd.) We replace the original feature variables by the Mahalanobis distances from the different classes: x ↦ z(x) = (MD(x, µ_1, Σ_1), …, MD(x, µ_J, Σ_J)). One can use the backfitting algorithm (Hastie and Tibshirani, 1990) to estimate the posterior probabilities from the training data.
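The feature replacement x ↦ z(x) is straightforward to implement. The NumPy sketch below (a hypothetical helper, not the authors' code) computes z(x) for a whole sample, with the class means and dispersions estimated by their usual sample versions on synthetic data:

```python
import numpy as np

def z_transform(X, mus, Sigmas):
    # Map each row x of X to z(x) = (MD(x, mu_1, Sigma_1), ..., MD(x, mu_J, Sigma_J))
    Z = np.empty((X.shape[0], len(mus)))
    for j, (mu, Sigma) in enumerate(zip(mus, Sigmas)):
        D = X - mu
        Q = np.einsum('ni,ni->n', D, np.linalg.solve(Sigma, D.T).T)  # quadratic forms
        Z[:, j] = np.sqrt(Q)
    return Z

rng = np.random.default_rng(0)
X1 = rng.normal(size=(200, 2))          # synthetic class-1 sample
X2 = 2.0 * rng.normal(size=(200, 2))    # synthetic class-2 sample with larger scatter
mus = [X1.mean(axis=0), X2.mean(axis=0)]
Sigmas = [np.cov(X1, rowvar=False), np.cov(X2, rowvar=False)]
Z = z_transform(np.vstack([X1, X2]), mus, Sigmas)   # J-dimensional feature matrix
```

The resulting J-column matrix Z is what the additive logistic regression (e.g., backfitting) is then fitted on, regardless of the original dimension d.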
16 More general class distributions Non-elliptic class distributions. Multi-modal class distributions. Mixture models for class distributions.
17 Finite mixture of elliptically symmetric densities Assume f_i(x) = ∑_{k=1}^{R_i} θ_{ik} |Σ_{ik}|^{−1/2} g_{ik}(‖Σ_{ik}^{−1/2}(x − µ_{ik})‖), where the θ_{ik}'s are positive and satisfy ∑_{k=1}^{R_i} θ_{ik} = 1 for all 1 ≤ i ≤ J. The posterior probability of the i-th class is p(i | x) = ∑_{r=1}^{R_i} p(c_{ir} | x) for all 1 ≤ i ≤ J, where c_{ir} denotes the r-th sub-class in the i-th class. The posterior probability p(c_{ir} | x) satisfies a multinomial additive logistic regression model because the distribution of the sub-population c_{ir} is elliptically symmetric.
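For Gaussian sub-classes the decomposition p(i | x) = ∑_r p(c_ir | x) can be checked directly: form the joint weights π_i θ_{ir} f_{ir}(x), normalize to get the sub-class posteriors, and sum within each class. The sketch below uses illustrative two-class, two-sub-class parameters:

```python
import numpy as np

def gauss_pdf(x, mu, Sigma):
    # Density of N_d(mu, Sigma) at x
    d = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return float(np.exp(-0.5 * (len(x) * np.log(2 * np.pi) + logdet
                                + d @ np.linalg.solve(Sigma, d))))

# (class i, sub-class r) -> (pi_i * theta_ir, mu_ir, Sigma_ir); parameters illustrative
subclasses = {
    (0, 0): (0.25, np.zeros(2), 0.25 * np.eye(2)),
    (0, 1): (0.25, np.zeros(2), np.eye(2)),
    (1, 0): (0.25, -np.ones(2), 0.25 * np.eye(2)),
    (1, 1): (0.25, np.ones(2), 0.25 * np.eye(2)),
}

x = np.array([0.4, -0.2])
joint = {k: w * gauss_pdf(x, mu, S) for k, (w, mu, S) in subclasses.items()}
total = sum(joint.values())
post_sub = {k: v / total for k, v in joint.items()}      # p(c_ir | x)
# Class posterior = sum of its sub-class posteriors
post_class = [sum(v for (i, _), v in post_sub.items() if i == c) for c in (0, 1)]
```

Since x above lies near the class-0 centre, post_class[0] dominates, matching the intuition behind the mixture decomposition.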
18 The missing data problem In the training data we have the class labels, but the sub-class labels are not available. If the sub-class labels were known, we could once again use the backfitting algorithm to estimate the sub-class posteriors. The sub-class labels can therefore be treated as missing observations, and we can use an EM-type algorithm.
19 SPARC: The algorithm Initial E-step: Sub-class labels are estimated by an appropriate cluster analysis of the training data within each class. Initial and later M-steps: Once the sub-class labels are obtained, sub-class posteriors are estimated by fitting a nonparametric multinomial additive logistic regression model using the backfitting algorithm. Later E-steps: The sub-class labels are re-estimated from the current sub-class posterior probabilities. Iterations are carried out until the posterior estimates stabilize. An observation is classified into the class having the largest posterior probability.
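The structure of this loop can be caricatured in NumPy if one deliberately substitutes Gaussian sub-class fits for the paper's nonparametric backfitting step, and a crude k-means pass for the initial cluster analysis — a simplifying assumption; only the E/M alternation is the point here:

```python
import numpy as np

def log_gauss_rows(X, mu, Sigma):
    # Row-wise log-density of N_d(mu, Sigma)
    D = X - mu
    _, logdet = np.linalg.slogdet(Sigma)
    q = np.einsum('ni,ni->n', D, np.linalg.solve(Sigma, D.T).T)
    return -0.5 * (X.shape[1] * np.log(2 * np.pi) + logdet + q)

def kmeans_labels(X, R, rng):
    # Initial E-step: crude k-means clustering of one class into R sub-classes
    centers = X[rng.choice(len(X), size=R, replace=False)]
    for _ in range(10):
        lab = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[lab == r].mean(0) if np.any(lab == r) else centers[r]
                            for r in range(R)])
    return lab

def fit_class(X, R, n_iter=20, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    resp = np.eye(R)[kmeans_labels(X, R, rng)]        # hard initial sub-class labels
    for _ in range(n_iter):
        # M-step: refit sub-class weights, means and dispersions
        w = resp.sum(0) / len(X)
        mus = [(resp[:, r, None] * X).sum(0) / resp[:, r].sum() for r in range(R)]
        Sigmas = [np.cov(X.T, aweights=resp[:, r]) + 1e-6 * np.eye(X.shape[1])
                  for r in range(R)]
        # Later E-step: soft sub-class labels from the current posteriors
        logp = np.stack([np.log(w[r]) + log_gauss_rows(X, mus[r], Sigmas[r])
                         for r in range(R)], axis=1)
        logp -= logp.max(axis=1, keepdims=True)
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
    return w, mus, Sigmas

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-3.0, 0.5, (100, 2)), rng.normal(3.0, 0.5, (100, 2))])
w, mus, Sigmas = fit_class(X, R=2, rng=rng)   # recovers the two sub-class centres
```

In SPARC itself, the M-step fits the nonparametric multinomial additive logistic regression by backfitting rather than Gaussian densities; the alternation between label estimation and model refitting is the same.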
20 A pool of different classifiers Traditional classifiers like LDA and QDA. Nonparametric classifiers based on k-NN and KDE. SVM with the linear kernel and with the radial basis function kernel. Classifiers based on adaptive partitioning of the covariate space: CART, RF and Poly-MARS.
21 A pool of different classifiers (Contd.) Hastie and Tibshirani (1996, JRSS-B) proposed an extension of LDA by modelling the density function of each class by a finite mixture of normal densities. They called their method mixture discriminant analysis (MDA). Fraley and Raftery (2002, JASA) extended MDA to MclustDA, where they considered mixtures of other families of parametric densities.
22 Simulated datasets Examples (a) and (b). Example (c): The first class is an equal mixture of N_d(0, 0.25 I_d) and N_d(0, I_d), and the second class is an equal mixture of N_d(−1, 0.25 I_d) and N_d(1, 0.25 I_d). Example (d): One class distribution is an equal mixture of U_d(0, Σ, 0, 1) and U_d(2, Σ, 2, 3), and the other one is an equal mixture of U_d(1, Σ, 1, 2) and U_d(3, Σ, 3, 4).
23 Analysis of simulated data Figure: Boxplots of misclassification rates of the Bayes classifier, LDA, QDA, SVM (linear and radial kernels), k-NN, KDE, CART, RF, Poly-MARS, MDA, MclustDA and SPARC for (a) Example (a) and (b) Example (b). The center of a box is the mean, and its width is 2√3 sd.
24 Analysis of simulated data Figure: Boxplots of misclassification rates of the Bayes classifier, LDA, QDA, SVM (linear and radial kernels), k-NN, KDE, CART, RF, Poly-MARS, MDA, MclustDA and SPARC for (a) Example (c) and (b) Example (d). The center of a box is the mean, and its width is 2√3 sd.
25 Real data Hemophilia data : Available in the R package rrcov. There are two classes of carrier and non-carrier women. The variables are measurements on AHF activity and AHF antigen. Biomedical data : Available in the CMU data archive. There are two classes of carriers and non-carriers of a rare genetic disease. Variables are four measurements on blood samples of individuals.
26 Analysis of real benchmark data sets Figure: Boxplots of misclassification rates of LDA, QDA, SVM (linear and radial kernels), k-NN, KDE, CART, RF, Poly-MARS, MDA, MclustDA and SPARC for (a) the Hemophilia data and (b) the Biomedical data. The center of a box is the mean, and its width is 2√3 sd.
27 Real data (Contd.) Diabetes data : Available in the UCI data archive. There are three classes that consist of normal individuals, chemical diabetic and overt diabetic patients. The five variables are measurements related to weights of individuals and their insulin and glucose levels in blood. Vehicle data : Available in the UCI data archive. There are four types of vehicles and eighteen measurements related to the shape of each vehicle.
28 Analysis of real benchmark data sets (Contd.) Figure: Boxplots of misclassification rates of LDA, QDA, SVM (linear and radial kernels), k-NN, KDE, CART, RF, Poly-MARS, MDA, MclustDA and SPARC for (a) the Diabetes data and (b) the Vehicle data. The center of a box is the mean, and its width is 2√3 sd.
29 Looking back at the Iris data Figure: Boxplot of misclassification rates of LDA, QDA, SVM (linear and radial kernels), k-NN, KDE, CART, RF, Poly-MARS, MDA, MclustDA and SPARC for the Iris data. The center of the box is the mean, and its width is 2√3 sd.
Fundamentals to Biostatistics Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur Statistics collection, analysis, interpretation of data development of new
More informationLecture 9: Classification, LDA
Lecture 9: Classification, LDA Reading: Chapter 4 STATS 202: Data mining and analysis October 13, 2017 1 / 21 Review: Main strategy in Chapter 4 Find an estimate ˆP (Y X). Then, given an input x 0, we
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationMachine Learning 2017
Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section
More informationBayesian Inference of Interactions and Associations
Bayesian Inference of Interactions and Associations Jun Liu Department of Statistics Harvard University http://www.fas.harvard.edu/~junliu Based on collaborations with Yu Zhang, Jing Zhang, Yuan Yuan,
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More informationUnsupervised clustering of COMBO-17 galaxy photometry
STScI Astrostatistics R tutorials Eric Feigelson (Penn State) November 2011 SESSION 2 Multivariate clustering and classification ***************** ***************** Unsupervised clustering of COMBO-17
More informationIntroduction to Machine Learning Midterm Exam
10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but
More informationClassification 1: Linear regression of indicators, linear discriminant analysis
Classification 1: Linear regression of indicators, linear discriminant analysis Ryan Tibshirani Data Mining: 36-462/36-662 April 2 2013 Optional reading: ISL 4.1, 4.2, 4.4, ESL 4.1 4.3 1 Classification
More informationCS 340 Lec. 18: Multivariate Gaussian Distributions and Linear Discriminant Analysis
CS 3 Lec. 18: Multivariate Gaussian Distributions and Linear Discriminant Analysis AD March 11 AD ( March 11 1 / 17 Multivariate Gaussian Consider data { x i } N i=1 where xi R D and we assume they are
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Introduction to Classification Algorithms Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com Some
More informationOn Some Classification Methods for High Dimensional and Functional Data
On Some Classification Methods for High Dimensional and Functional Data by OLUSOLA SAMUEL MAKINDE A thesis submitted to The University of Birmingham for the degree of Doctor of Philosophy School of Mathematics
More informationSupervised Learning. Regression Example: Boston Housing. Regression Example: Boston Housing
Supervised Learning Unsupervised learning: To extract structure and postulate hypotheses about data generating process from observations x 1,...,x n. Visualize, summarize and compress data. We have seen
More informationSupport Vector Machines for Classification: A Statistical Portrait
Support Vector Machines for Classification: A Statistical Portrait Yoonkyung Lee Department of Statistics The Ohio State University May 27, 2011 The Spring Conference of Korean Statistical Society KAIST,
More informationIntroduction to Machine Learning
1, DATA11002 Introduction to Machine Learning Lecturer: Antti Ukkonen TAs: Saska Dönges and Janne Leppä-aho Department of Computer Science University of Helsinki (based in part on material by Patrik Hoyer,
More informationDiscriminant Kernels based Support Vector Machine
Discriminant Kernels based Support Vector Machine Akinori Hidaka Tokyo Denki University Takio Kurita Hiroshima University Abstract Recently the kernel discriminant analysis (KDA) has been successfully
More informationc 4, < y 2, 1 0, otherwise,
Fundamentals of Big Data Analytics Univ.-Prof. Dr. rer. nat. Rudolf Mathar Problem. Probability theory: The outcome of an experiment is described by three events A, B and C. The probabilities Pr(A) =,
More informationMultivariate statistical methods and data mining in particle physics
Multivariate statistical methods and data mining in particle physics RHUL Physics www.pp.rhul.ac.uk/~cowan Academic Training Lectures CERN 16 19 June, 2008 1 Outline Statement of the problem Some general
More information