Classification techniques focus on Discriminant Analysis
Seminar: Potentials of advanced image analysis technology in cereal science research
Ulf Indahl/IMT, 14.06.2010

Task: Supervised Learning
For a given feature vector x in R^p: based on an appropriate model, find the correct class label g in C = {1, 2, ..., L} (L = number of unordered categories). Start by building a classification rule consistent with, and based on, a set of training data (n samples) {(x_i, g_i), i = 1, ..., n} to learn the relationship between feature vectors x in R^p and the groups in C. For new feature vectors x_k: predict the class g_k with good precision (high probability). As always: validation of the model building is required.

Probabilistic/statistical approach based on Bayes' rule
Assume the feature vectors of each class are described by a probability density f(x|g). The a posteriori probability of class g is

p(g|x) = π_g f(x|g) / Σ_j π_j f(x|j),

where π_g is the prior probability of class g. Bayes' rule: assign x to class l if the a posteriori probability p(l|x) > p(g|x) for all g ≠ l. Our focus: assume each group is described by a normal distribution.

LDA (Linear Discriminant Analysis): Assume all groups are multinormal with identical covariance structure Σ and individual group means µ_1, ..., µ_L. For LDA the groups are separated by linear surfaces:
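The linear discriminant functions behind this statement are standard (see e.g. Hastie, Tibshirani & Friedman, 2001); a sketch:

δ_g(x) = xᵀ Σ⁻¹ µ_g − (1/2) µ_gᵀ Σ⁻¹ µ_g + log π_g, for g = 1, ..., L,

and x is assigned to the class with the largest δ_g(x). The boundary between two classes g and l, {x : δ_g(x) = δ_l(x)}, is linear in x because the quadratic term xᵀ Σ⁻¹ x is common to all classes and cancels.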

Case study: the Iris dataset (Anderson/Fisher)
3 classes, the species of iris: Setosa (1), Versicolor (2) and Virginica (3). n = 150 samples (50 of each class), p = 4 features (in principle measurable by appropriate image analysis methodology): measurements of sepal and petal length and width. The dataset was collected by E. Anderson (1935) and analyzed by R.A. Fisher (1936).

Estimation of the LDA model from the empirical data
Demo of simple MATLAB code. Results (confusion matrix, rows = true class, columns = predicted class) of leave-one-out cross-validation:

        pred Set   pred Vers   pred Virg
Set           50           0           0
Vers           0          48           2
Virg           0           1          49

Errors within groups (2 & 3): 3/100.
NB: full leave-one-out cross-validation can be computed without explicit refitting (a consequence of linear modeling, similar to MLR).
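A minimal MATLAB sketch of this leave-one-out experiment (an illustrative reconstruction, not the original demo code; it assumes the Statistics and Machine Learning Toolbox and its bundled fisheriris data):

% Fisher's iris data: meas is 150-by-4, species is a 150-by-1 cell array
load fisheriris

% LDA with leave-one-out cross-validation
cvmodel = fitcdiscr(meas, species, 'DiscrimType', 'linear', 'Leaveout', 'on');
pred = kfoldPredict(cvmodel);          % one held-out prediction per sample

% Confusion matrix: rows = true class, columns = predicted class
C = confusionmat(species, pred)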

Canonical variates: the visual/geometrical facet of LDA
Fisher's idea: project the data such that the classes are optimally separated.

Fisher's Formulation
We want to find the projection a maximizing the quotient

J(a) = (aᵀ B a) / (aᵀ W a),

where W is the pooled within-groups covariance matrix and B is the so-called between-groups covariance matrix measuring the dispersion of the class means:

B ∝ Σ_{g=1..L} n_g (x̄_g − x̄)(x̄_g − x̄)ᵀ,

with x̄_g the mean of group g and x̄ the overall mean. One projection is often not enough to separate the data, and one can use further projections: maximize the above quotient over all a that are W-orthogonal to the projection directions already found.
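A sketch of how the canonical variates can be computed for the Iris data (a generic reconstruction, not the original demo code):

% Data and numeric class labels 1..L
load fisheriris
X = meas;
[~, ~, g] = unique(species);

p = size(X, 2); L = max(g); xbar = mean(X, 1);
B = zeros(p); W = zeros(p);
for k = 1:L
    Xk = X(g == k, :);
    dk = mean(Xk, 1) - xbar;
    B  = B + size(Xk, 1) * (dk' * dk);        % between-groups dispersion
    W  = W + (size(Xk, 1) - 1) * cov(Xk);     % pooled within-groups dispersion
end

% Directions maximizing a'Ba / a'Wa: generalized eigenvectors of (B, W),
% sorted by decreasing eigenvalue; at most L-1 of them are informative
[V, D] = eig(B, W, 'chol');
[~, idx] = sort(diag(D), 'descend');
A = V(:, idx(1:L-1));
Z = X * A;                                    % canonical variate scores

Plotting the two columns of Z against each other gives the scatterplot below.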

[Figure: scatterplot of the IRIS data in the first two canonical variates (Can Var 1 vs Can Var 2); classes setosa, versicolor, virginica.]

LDA applied to the original features = LDA applied to the data after projection onto the subspace spanned by the canonical variates. Compare the estimated class probabilities of LDA on the original features and LDA on the canonical variates.
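A quick check of this equivalence, reusing X, g and the canonical scores Z from the sketch above (the variable names are assumptions carried over from that sketch):

mdl_orig = fitcdiscr(X, g);            % LDA on the 4 original features
mdl_can  = fitcdiscr(Z, g);            % LDA on the 2 canonical variates

[~, P_orig] = predict(mdl_orig, X);    % estimated posterior class probabilities
[~, P_can]  = predict(mdl_can, Z);
max(abs(P_orig(:) - P_can(:)))         % should be numerically negligible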

Classification by regression
Group memberships are dummy-coded. Use MLR to obtain fitted dummies, and classify sample x to the class corresponding to the largest of the L fitted values. This strategy is not optimal (see the MATLAB code applied to the IRIS data below). However, LDA applied to the fitted MLR values = LDA applied to the original features (!)
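A minimal sketch of the dummy-regression classifier (an illustrative reimplementation, not the original demo code; resubstitution only, no cross-validation):

load fisheriris
X = meas;
[~, ~, g] = unique(species);        % numeric class labels 1..3

Y = dummyvar(g);                    % n-by-L indicator (dummy) matrix
Xc = [ones(size(X, 1), 1), X];      % add an intercept column
Bhat = Xc \ Y;                      % multivariate least squares (MLR) fit
Yhat = Xc * Bhat;                   % fitted dummy values
[~, ghat] = max(Yhat, [], 2);       % largest fitted value wins
confusionmat(g, ghat)               % rows = true, columns = predicted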

Classification by regression: the Iris data
Demo of simple MATLAB code. Results (confusion matrix, rows = true class, columns = predicted class):

        pred Set   pred Vers   pred Virg
Set           50           0           0
Vers           0          34          16
Virg           0           7          43

Errors within groups (2 & 3): 23/100.

[Figure: scatterplot of the IRIS data in the fitted dummy values (Ypred1 vs Ypred2); classes setosa, versicolor, virginica.]

Modifications: RDA (Regularized Discriminant Analysis), PDA (Penalized Discriminant Analysis) and FDA (Flexible Discriminant Analysis); see Friedman (1989) and Hastie et al. (1994, 1995) in the references.
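For orientation: in Friedman's RDA the class covariance estimates are shrunk between the QDA and LDA extremes,

Σ̂_k(λ) = (1 − λ) Σ̂_k + λ Σ̂,

with a second parameter shrinking further toward a multiple of the identity, so λ = 0 gives QDA and λ = 1 gives LDA. PDA regularizes the within-class covariance estimate with a penalty, and FDA replaces the underlying linear regression step by a flexible nonparametric one.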

Some other popular classification methods: Nearest Neighbours, Logistic Regression, Classification Trees, Support Vector Machines, Neural Networks.

Modifications cont.: LDA combined with PLS methodology. References:
Ulf G. Indahl, Harald Martens and Tormod Næs, From dummy regression to prior probabilities in PLS-DA, Journal of Chemometrics, Volume 21, Issue 12, pages 529–536.
Kristian Hovde Liland and Ulf Geir Indahl, Powered partial least squares discriminant analysis, Journal of Chemometrics, Volume 23, Issue 1, pages 7–18.

References
Fisher, The use of multiple measurements in taxonomic problems, Annals of Eugenics, 1936.
Duda, Hart & Stork, Pattern Classification, Wiley, 2001.
Hastie, Tibshirani & Friedman, The Elements of Statistical Learning, Springer, 2001.
Ludger Evers, Decision Theory and Discriminant Analysis (lecture notes), University of Oxford, 2004.
Friedman, Regularized discriminant analysis, JASA, 1989.
Hastie, Buja & Tibshirani, Penalized discriminant analysis, Annals of Statistics, 1995.
Hastie, Tibshirani & Buja, Flexible discriminant analysis, JASA, 1994.