Semiparametric Discriminant Analysis of Mixture Populations Using Mahalanobis Distance. Probal Chaudhuri and Subhajit Dutta


1 Semiparametric Discriminant Analysis of Mixture Populations Using Mahalanobis Distance. Probal Chaudhuri and Subhajit Dutta, Indian Statistical Institute, Kolkata. Workshop on Classification and Regression Trees, Institute of Mathematical Sciences, National University of Singapore.

2 Brief history. Fisher, R. A. (1936), The use of multiple measurements in taxonomic problems, Ann. Eugenics, 7. Fisher was interested in the taxonomic classification of different species of Iris. He had measurements on the lengths and the widths of the sepals and the petals of the flowers of three different species, namely, Iris setosa, Iris virginica and Iris versicolor.

3 Brief history (Contd.) Mahalanobis, P. C. (1936), On the generalized distance in statistics, Proc. Nat. Acad. Sci., India, 12. Mahalanobis met Nelson Annandale at the 1920 Nagpur session of the Indian Science Congress. Annandale asked Mahalanobis to analyze anthropometric measurements of Anglo-Indians in Calcutta. This eventually led to the development of the Mahalanobis distance.

4 Mahalanobis distance and Fisher's discriminant function. The Mahalanobis distance of an observation x from a population with mean µ and dispersion Σ is
MD(x, µ, Σ) = {(x − µ)' Σ^{-1} (x − µ)}^{1/2}.
Fisher's linear discriminant function for two populations with means µ_1 and µ_2 and a common dispersion Σ is
(x − (µ_1 + µ_2)/2)' Σ^{-1} (µ_1 − µ_2).
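A minimal numerical sketch of these two formulas in NumPy (the toy parameters below are made up for illustration):

```python
import numpy as np

def mahalanobis(x, mu, Sigma):
    # MD(x, mu, Sigma) = {(x - mu)' Sigma^{-1} (x - mu)}^{1/2}
    diff = x - mu
    return float(np.sqrt(diff @ np.linalg.solve(Sigma, diff)))

def fisher_ldf(x, mu1, mu2, Sigma):
    # (x - (mu1 + mu2)/2)' Sigma^{-1} (mu1 - mu2); positive values favour class 1
    return float((x - (mu1 + mu2) / 2) @ np.linalg.solve(Sigma, mu1 - mu2))

mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
x = np.array([1.5, 0.5])
print(mahalanobis(x, mu1, Sigma), fisher_ldf(x, mu1, mu2, Sigma))
```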

5 Mahalanobis distance and Fisher's discriminant function (Contd.)

6 Mahalanobis distance and Fisher's discriminant function (Contd.) The linear discriminant function has Bayes risk optimality for Gaussian class distributions which differ in their locations but have the same dispersion. This has been discussed in detail in Welch (1939, Biometrika) and Rao (1948, JRSS-B). In fact, the Bayes risk optimality holds for elliptically symmetric and unimodal class distributions which differ only in their locations.

7 Examples. Example (a): Class 1 is a mixture of N_d(0, Σ) and N_d(0, 10Σ); Class 2 is N_d(0, 5Σ). Here Σ = I_d, and N_d denotes the d-variate normal distribution. Example (b): Class 1 is a mixture of U_d(0, Σ, 0, 1) and U_d(0, Σ, 2, 3); Class 2 is a mixture of U_d(0, Σ, 1, 2) and U_d(0, Σ, 3, 4). Here U_d(µ, Σ, r_1, r_2) denotes the uniform distribution over the region {x ∈ R^d : r_1 < ||Σ^{-1/2}(x − µ)|| < r_2}. The classes have the same location 0 but different scatters and shapes.
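For concreteness, here is one way to simulate from U_d(µ, Σ, r_1, r_2) as defined above (a sketch: uniform directions on the sphere, radii drawn with density proportional to r^{d-1} on (r_1, r_2), then an affine map; the Cholesky factor stands in for Σ^{1/2}, which leaves the elliptical norm unchanged):

```python
import numpy as np

def sample_shell_uniform(n, mu, Sigma, r1, r2, rng):
    # Uniform over {x in R^d : r1 < ||Sigma^{-1/2}(x - mu)|| < r2}
    d = len(mu)
    s = rng.standard_normal((n, d))
    s /= np.linalg.norm(s, axis=1, keepdims=True)    # directions uniform on the sphere
    u = rng.uniform(size=(n, 1))
    r = (r1**d + u * (r2**d - r1**d)) ** (1.0 / d)   # radial density proportional to r^(d-1)
    L = np.linalg.cholesky(Sigma)                    # Sigma = L L'
    return mu + (r * s) @ L.T

rng = np.random.default_rng(0)
X = sample_shell_uniform(500, np.zeros(2), np.eye(2), 2.0, 3.0, rng)  # e.g. U_2(0, I, 2, 3)
```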

8 Bayes class boundaries. [Figure: Bayes class boundaries in R^2 for Example (a) and Example (b).]

9 Bayes class boundaries (Contd.) The class distributions involve elliptically symmetric distributions. They have the same location (i.e., 0), and they differ in their scatters as well as their shapes. No linear or quadratic classifier will work here!

10 Performance of some standard classifiers. [Figure: Misclassification rates of LDA, QDA, two nonparametric classifiers (k-NN and KDE) and the Bayes classifier, plotted against log(d) for d = 2, 5, 10, 20, 50 and 100, in Examples (a) and (b).]

11 Nonparametric multinomial additive logistic regression model. Suppose that the class densities are elliptically symmetric:
f_i(x) = |Σ_i|^{-1/2} g_i(||Σ_i^{-1/2}(x − µ_i)||) = ψ_i(MD(x, µ_i, Σ_i)) for i = 1, 2.
The class posterior probabilities are
p(1|x) = π_1 f_1(x) / (π_1 f_1(x) + π_2 f_2(x)) and p(2|x) = 1 − p(1|x).
It is easy to see that
log{p(1|x)/p(2|x)} = log(π_1 f_1(x) / (π_2 f_2(x))) = log(π_1/π_2) + log ψ_1(MD(x, µ_1, Σ_1)) − log ψ_2(MD(x, µ_2, Σ_2)).

12 Nonparametric multinomial additive logistic regression model (Contd.) The posteriors turn out to be of the form
p(1|x) = p(1|z(x)) = exp(log ψ_1(z_1(x)) − log ψ_2(z_2(x))) / [1 + exp(log ψ_1(z_1(x)) − log ψ_2(z_2(x)))],
p(2|x) = p(2|z(x)) = 1 / [1 + exp(log ψ_1(z_1(x)) − log ψ_2(z_2(x)))],
where z(x) = (z_1(x), z_2(x)) = (MD(x, µ_1, Σ_1), MD(x, µ_2, Σ_2)). The posterior probabilities satisfy a generalized additive model (Hastie and Tibshirani, 1990).
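In code, the two-class posterior has the following shape (a sketch; log_psi holds estimates of log ψ_1 and log ψ_2, which are hypothetical placeholders here, and log_prior_ratio = log(π_1/π_2) makes the prior term from the previous slide explicit, 0 corresponding to equal priors as in the display above):

```python
import numpy as np

def two_class_posterior(x, mu, Sigma, log_psi, log_prior_ratio=0.0):
    # z_i(x) = MD(x, mu_i, Sigma_i) for i = 1, 2
    z = [np.sqrt((x - mu[i]) @ np.linalg.solve(Sigma[i], x - mu[i])) for i in range(2)]
    # log-odds: log(pi_1/pi_2) + log psi_1(z_1) - log psi_2(z_2)
    delta = log_prior_ratio + log_psi[0](z[0]) - log_psi[1](z[1])
    return 1.0 / (1.0 + np.exp(-delta))  # = exp(delta) / (1 + exp(delta)) = p(1|x)
```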

13 Nonparametric multinomial additive logistic regression model (Contd.) If we have two normal populations N_d(µ_1, Σ) and N_d(µ_2, Σ), we get a linear logistic regression model for the posterior probabilities. This is related to Fisher's linear discriminant analysis. If we have two normal populations N_d(µ_1, Σ_1) and N_d(µ_2, Σ_2), we get a quadratic logistic regression model for the posterior probabilities. This is related to quadratic discriminant analysis.

14 Nonparametric multinomial additive logistic regression model (Contd.) For any 1 ≤ i ≤ (J − 1), it is easy to see that
log{p(i|x)/p(J|x)} = log(π_i/π_J) + log ψ_i(MD(x, µ_i, Σ_i)) − log ψ_J(MD(x, µ_J, Σ_J)),
where p(i|x) is the posterior probability of the i-th class. For any 1 ≤ i ≤ (J − 1), the posteriors are of the form
p(i|x) = p(i|z(x)) = exp(φ_i(z(x))) / [1 + ∑_{k=1}^{J−1} exp(φ_k(z(x)))],
p(J|x) = p(J|z(x)) = 1 / [1 + ∑_{k=1}^{J−1} exp(φ_k(z(x)))],
where z(x) = (MD(x, µ_1, Σ_1), …, MD(x, µ_J, Σ_J)).

15 Nonparametric multinomial additive logistic regression model (Contd.) We replace the original feature variables by the Mahalanobis distances from the different classes:
x → z(x) = (MD(x, µ_1, Σ_1), …, MD(x, µ_J, Σ_J)).
One can use the backfitting algorithm (Hastie and Tibshirani, 1990) to estimate the posterior probabilities from the training data, as in the sketch below.
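The paper estimates the additive functions by backfitting; as a rough stand-in, the sketch below computes the distance features z(x) and fits an additive logistic model on spline-expanded features with scikit-learn. X_train, y_train, mus and Sigmas are assumptions of this illustration (e.g., class-wise sample means and covariances):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import LogisticRegression

def md_features(X, mus, Sigmas):
    # Map each row x of X to z(x) = (MD(x, mu_1, Sigma_1), ..., MD(x, mu_J, Sigma_J))
    return np.column_stack([
        np.sqrt(np.einsum('ij,ij->i', (X - mu) @ np.linalg.inv(Sig), X - mu))
        for mu, Sig in zip(mus, Sigmas)
    ])

# Hypothetical usage; mus/Sigmas would be class-wise estimates from the training data.
# Z_train = md_features(X_train, mus, Sigmas)
# gam_like = make_pipeline(SplineTransformer(n_knots=8, degree=3),
#                          LogisticRegression(max_iter=2000))
# gam_like.fit(Z_train, y_train)
# posteriors = gam_like.predict_proba(md_features(X_test, mus, Sigmas))
```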

16 More general class distributions Non-elliptic class distributions. Multi-modal class distributions. Mixture models for class distributions.

17 Finite mixture of elliptically symmetric densities. Assume
f_i(x) = ∑_{k=1}^{R_i} θ_{ik} |Σ_{ik}|^{-1/2} g_{ik}(||Σ_{ik}^{-1/2}(x − µ_{ik})||),
where the θ_{ik}'s are positive and satisfy ∑_{k=1}^{R_i} θ_{ik} = 1 for all 1 ≤ i ≤ J. The posterior probability for the i-th class is
p(i|x) = ∑_{r=1}^{R_i} p(c_{ir}|x) for all 1 ≤ i ≤ J,
where c_{ir} denotes the r-th sub-class in the i-th class. The posterior probability p(c_{ir}|x) satisfies a multinomial additive logistic regression model because the distribution of the sub-population c_{ir} is elliptically symmetric.
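Aggregating sub-class posteriors into class posteriors is then a plain sum over each class's columns; a small sketch (the column ordering is an assumption of this illustration):

```python
import numpy as np

def class_posteriors(subclass_post, R):
    # subclass_post: (n, R_1 + ... + R_J) array of p(c_ir | x), columns ordered
    # c_11, ..., c_1R_1, c_21, ..., c_JR_J; R: list [R_1, ..., R_J]
    out, start = [], 0
    for Ri in R:
        out.append(subclass_post[:, start:start + Ri].sum(axis=1))  # p(i|x) = sum_r p(c_ir|x)
        start += Ri
    return np.column_stack(out)  # shape (n, J); rows sum to 1
```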

18 The missing data problem. In the training data we have the class labels, but the sub-class labels are not available. If we had known the sub-class labels, we could once again use the backfitting algorithm to estimate the sub-class posteriors. Sub-class labels can therefore be treated as missing observations, and we can use an EM-type algorithm.

19 SPARC: The algorithm. Initial E-step: sub-class labels are estimated by an appropriate cluster analysis of the training data within each class. Initial and later M-steps: once the sub-class labels are obtained, the sub-class posteriors are estimated by fitting a nonparametric multinomial additive logistic regression model using the backfitting algorithm. Later E-steps: the sub-class labels are re-estimated from the sub-class posterior probabilities. Iterations are carried out until the posterior estimates stabilize. An observation is classified into the class having the largest posterior probability. A schematic version follows.
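A schematic version of this EM-type loop, assuming k-means for the initial clustering and a generic fit_subclass_model in place of the backfitted multinomial additive logistic model (a sketch, not the authors' implementation):

```python
import numpy as np
from sklearn.cluster import KMeans

def sparc_sketch(X, y, R, fit_subclass_model, max_iter=20):
    # R: dict mapping each class label i to its number of sub-classes R_i.
    # fit_subclass_model(X, sub) stands in for the backfitted nonparametric
    # multinomial additive logistic model; it must expose predict_proba whose
    # columns follow the global sub-class indices 0, 1, ..., sum(R)-1.
    classes = np.unique(y)
    cols, start = {}, 0
    for i in classes:                                  # global column indices per class
        cols[i] = np.arange(start, start + R[i])
        start += R[i]
    sub = np.empty(len(y), dtype=int)
    for i in classes:                                  # initial E-step: cluster within class
        idx = np.flatnonzero(y == i)
        sub[idx] = cols[i][KMeans(n_clusters=R[i], n_init=10).fit_predict(X[idx])]
    for _ in range(max_iter):
        model = fit_subclass_model(X, sub)             # initial and later M-steps
        post = model.predict_proba(X)                  # sub-class posteriors
        new_sub = sub.copy()
        for i in classes:                              # later E-steps: reassign sub-class
            idx = np.flatnonzero(y == i)               # labels within the known class
            new_sub[idx] = cols[i][post[np.ix_(idx, cols[i])].argmax(axis=1)]
        if np.all(new_sub == sub):                     # labels/posteriors have stabilized
            break
        sub = new_sub
    return model
```

A new observation would then be classified by summing the fitted sub-class posteriors within each class, as on slide 17, and picking the class with the largest total.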

20 A pool of different classifiers. Traditional classifiers like LDA and QDA. Nonparametric classifiers based on k-NN and KDE. SVM with the linear kernel and with radial basis function kernels. Classifiers based on adaptive partitioning of the covariate space: CART, RF and PolyMARS.

21 A pool of different classifiers (Contd.) Hastie and Tibshirani (1996, JRSS-B) proposed an extension of LDA by modelling the density function of each class by a finite mixture of normal densities. They called their method mixture discriminant analysis (MDA). Fraley and Raftery (2002, JASA) extended MDA to MclustDA, where they considered mixtures of other families of parametric densities.

22 Simulated datasets. Examples (a) and (b), as above. Example (c): the first class is an equal mixture of N_d(0, 0.25 I_d) and N_d(0, I_d), and the second class is an equal mixture of N_d(−1, 0.25 I_d) and N_d(1, 0.25 I_d). Example (d): one class distribution is an equal mixture of U_d(0, Σ, 0, 1) and U_d(2, Σ, 2, 3), and the other one is an equal mixture of U_d(1, Σ, 1, 2) and U_d(3, Σ, 3, 4).
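As an illustration, a generator for Example (c) (a sketch; it reads −1 and 1 as constant d-vectors, as the N_d notation suggests):

```python
import numpy as np

def example_c(n_per_class, d, rng):
    # Class 1: equal mixture of N_d(0, 0.25 I) and N_d(0, I);
    # Class 2: equal mixture of N_d(-1, 0.25 I) and N_d(1, 0.25 I).
    # Note sd = sqrt(variance): 0.25 I gives sd 0.5, I gives sd 1.
    def mixture(mus, sds):
        pick = rng.integers(len(mus), size=n_per_class)
        return np.array([rng.normal(mus[k], sds[k], size=d) for k in pick])
    X1 = mixture([np.zeros(d), np.zeros(d)], [0.5, 1.0])
    X2 = mixture([-np.ones(d), np.ones(d)], [0.5, 0.5])
    return np.vstack([X1, X2]), np.repeat([1, 2], n_per_class)

X, y = example_c(200, d=10, rng=np.random.default_rng(1))
```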

23 Analysis of simulated data. [Figure: Boxplots of misclassification rates in Examples (a) and (b) for the Bayes classifier, LDA, QDA, SVM-l, SVM-r, kNN, KDE, CART, RF, PolyMARS, MDA, MclustDA and SPARC. The center of a box is the mean, and its width is proportional to the standard deviation.]

24 Analysis of simulated data. [Figure: Boxplots of misclassification rates in Examples (c) and (d) for the Bayes classifier, LDA, QDA, SVM-l, SVM-r, kNN, KDE, CART, RF, PolyMARS, MDA, MclustDA and SPARC. The center of a box is the mean, and its width is proportional to the standard deviation.]

25 Real data. Hemophilia data: available in the R package rrcov. There are two classes: carrier and non-carrier women. The variables are measurements of AHF activity and AHF antigen. Biomedical data: available in the CMU data archive. There are two classes: carriers and non-carriers of a rare genetic disease. The variables are four measurements on blood samples of individuals.

26 Analysis of real benchmark data sets. [Figure: Boxplots of misclassification rates for the Hemophilia and Biomedical data across LDA, QDA, SVM-l, SVM-r, kNN, KDE, CART, RF, PolyMARS, MDA, MclustDA and SPARC. The center of a box is the mean, and its width is proportional to the standard deviation.]

27 Real data (Contd.) Diabetes data: available in the UCI data archive. There are three classes, consisting of normal individuals, chemical diabetic patients and overt diabetic patients. The five variables are measurements related to the weights of individuals and their blood insulin and glucose levels. Vehicle data: available in the UCI data archive. There are four types of vehicles and eighteen measurements related to the shape of each vehicle.

28 Analysis of real benchmark data sets (Contd.) [Figure: Boxplots of misclassification rates for the Diabetes and Vehicle data across LDA, QDA, SVM-l, SVM-r, kNN, KDE, CART, RF, PolyMARS, MDA, MclustDA and SPARC. The center of a box is the mean, and its width is proportional to the standard deviation.]

29 Looking back at the Iris data. [Figure: Boxplot of misclassification rates for the Iris data across LDA, QDA, SVM-l, SVM-r, kNN, KDE, CART, RF, PolyMARS, MDA, MclustDA and SPARC. The center of the box is the mean, and its width is proportional to the standard deviation.]
