ISyE 6416: Computational Statistics, Spring 2017. Lecture 5: Discriminant analysis and classification


ISyE 6416: Computational Statistics Spring 2017 Lecture 5: Discriminant analysis and classification Prof. Yao Xie H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology

Classification Classification is a predictive task in which the response takes values in one of several categories (in the simplest case, two categories). Examples: predicting whether a patient will develop breast cancer or remain healthy, given genetic information; predicting whether or not a user will like a new product, based on user covariates and a history of his/her previous ratings; predicting voting preference based on a voter's social, political, and economic status. Here we consider supervised classification, i.e., we know the labels for the training data.

Fisher's Iris example The Iris flower data set, or Fisher's Iris data set, is a multivariate data set introduced by Sir Ronald Fisher (1936) as an example of discriminant analysis. The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimetres. Based on the combination of these four features, Fisher developed a linear discriminant model to distinguish the species from each other.

Classification, model-based Linear and quadratic discriminant analysis: Gaussian class densities. Mixtures of Gaussians. Naive Bayes: assume each class density is a product of marginal densities, that is, all the variables are independent within each class. Multivariate Gaussian density function:
f(x) = (2π)^{-p/2} |Σ|^{-1/2} exp( -(1/2) (x - μ)^T Σ^{-1} (x - μ) )
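
Below is a minimal sketch (not from the lecture) of evaluating this density with NumPy; the function name gaussian_density and the numerical values are illustrative assumptions.

import numpy as np

def gaussian_density(x, mu, Sigma):
    """Multivariate normal density f(x) with mean mu and covariance Sigma."""
    p = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)   # (x - mu)^T Sigma^{-1} (x - mu)
    norm_const = (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm_const

x = np.array([0.5, -0.2])
mu = np.zeros(2)
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
print(gaussian_density(x, mu, Sigma))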

Linear discriminant analysis (LDA) assumes that the data within each class are normally distributed:
h_j(x) = P(X = x | C = j) = N(μ_j, Σ)
Note that the covariance matrix is common to all classes: Σ_k = Σ, k = 1, ..., K. Each class has its own mean μ_j ∈ R^p. Decision principle (maximum a posteriori): we find the j for which the posterior probability is the largest,
P(C = j | X = x) ∝ h_j(x) π_j

Estimated class given a data point (feature vector) x:
f^{LDA}(x) = argmax_{j=1,...,K} δ_j(x)
Discriminant functions δ_j(x), j = 1, ..., K:
δ_j(x) = μ_j^T Σ^{-1} x - (1/2) μ_j^T Σ^{-1} μ_j + log π_j = a_j^T x + b_j
with a_j = Σ^{-1} μ_j and b_j = -(1/2) μ_j^T Σ^{-1} μ_j + log π_j, so each δ_j is an affine function of x for LDA.
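
A minimal sketch of this decision rule in NumPy (the function name lda_classify and all parameter values are illustrative assumptions, not from the lecture):

import numpy as np

def lda_classify(x, mus, Sigma, pis):
    """Return the index j maximizing delta_j(x) = a_j^T x + b_j."""
    Sigma_inv = np.linalg.inv(Sigma)
    deltas = []
    for mu, pi in zip(mus, pis):
        a = Sigma_inv @ mu                              # a_j = Sigma^{-1} mu_j
        b = -0.5 * mu @ Sigma_inv @ mu + np.log(pi)     # b_j
        deltas.append(a @ x + b)
    return int(np.argmax(deltas))

mus = [np.array([-1.0, 0.0]), np.array([1.0, 0.5])]
Sigma = np.array([[1.0, 0.2], [0.2, 1.0]])
pis = [0.6, 0.4]
print(lda_classify(np.array([0.3, 0.1]), mus, Sigma, pis))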

LDA from data In practice, we estimate the model parameters from training data:
π̂_j = n_j / n, the proportion of observations in class j (n_j: the number of samples labeled as class j)
μ̂_j = (1/n_j) sum_{i: y_i = j} x_i, the centroid of class j
Σ̂ = (1/(n - K)) sum_{j=1}^K sum_{i: y_i = j} (x_i - μ̂_j)(x_i - μ̂_j)^T, the pooled sample covariance matrix
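
A minimal sketch of these plug-in estimates (the helper name lda_fit and the simulated training data are illustrative assumptions):

import numpy as np

def lda_fit(X, y):
    """Return (pi_hat, mu_hat, Sigma_hat) for the classes appearing in y."""
    n, p = X.shape
    classes = np.unique(y)
    K = len(classes)
    pi_hat = np.array([np.mean(y == k) for k in classes])           # n_j / n
    mu_hat = np.array([X[y == k].mean(axis=0) for k in classes])    # class centroids
    Sigma_hat = np.zeros((p, p))
    for k, mu in zip(classes, mu_hat):
        R = X[y == k] - mu                                          # centered class-k data
        Sigma_hat += R.T @ R
    Sigma_hat /= (n - K)                                            # pooled sample covariance
    return pi_hat, mu_hat, Sigma_hat

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-1.0, 0.0], 1.0, size=(40, 2)),
               rng.normal([1.0, 1.0], 1.0, size=(60, 2))])
y = np.array([0] * 40 + [1] * 60)
print(lda_fit(X, y))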

LDA decision boundaries By our decision rule (find the j for which the posterior probability P(C = j | X = x) ∝ h_j(x) π_j is largest), the decision boundary between classes j and k is the set of all x ∈ R^p such that δ_j(x) = δ_k(x), i.e.,
(a_j - a_k)^T x + (b_j - b_k) = 0

LDA computations and sphering The decision boundaries for LDA are useful for graphical purposes, but to classify a new data point x_0 we simply compute δ̂_j(x_0) for each j = 1, ..., K. Note that, up to a constant that does not depend on j, LDA's discriminant function can be written as
-(1/2) (x - μ̂_j)^T Σ̂^{-1} (x - μ̂_j) + log π̂_j
It helps to perform an eigendecomposition of the sample covariance matrix, Σ̂ = U D U^T, so the computation can be simplified:
(x - μ̂_j)^T Σ̂^{-1} (x - μ̂_j) = || D^{-1/2} U^T x - D^{-1/2} U^T μ̂_j ||_2^2 = || x̃ - μ̃_j ||_2^2
which is just the squared Euclidean distance between the sphered data x̃ = D^{-1/2} U^T x and the sphered centroid μ̃_j = D^{-1/2} U^T μ̂_j. Compute the discriminant function δ̂_j = -(1/2) || x̃ - μ̃_j ||_2^2 + log π̂_j, and then assign x to the nearest centroid (after adjusting for log π̂_j).
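
A minimal sketch of this sphering and nearest-centroid computation (the function name lda_nearest_centroid and the parameter values are illustrative assumptions):

import numpy as np

def lda_nearest_centroid(x, mus, Sigma, pis):
    d, U = np.linalg.eigh(Sigma)                 # Sigma = U diag(d) U^T
    W = np.diag(d ** -0.5) @ U.T                 # sphering transform D^{-1/2} U^T
    x_t = W @ x                                  # sphered data point
    mus_t = [W @ mu for mu in mus]               # sphered centroids
    deltas = [-0.5 * np.sum((x_t - mu_t) ** 2) + np.log(pi)
              for mu_t, pi in zip(mus_t, pis)]
    return int(np.argmax(deltas))

mus = [np.array([-1.0, 0.0]), np.array([1.0, 0.5])]
Sigma = np.array([[1.0, 0.2], [0.2, 1.0]])
pis = [0.6, 0.4]
print(lda_nearest_centroid(np.array([0.3, 0.1]), mus, Sigma, pis))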

Quadratic discriminant analysis Estimate the covariance matrix Σ_k separately for each class k. Quadratic discriminant function:
δ_k(x) = -(1/2) log |Σ_k| - (1/2) (x - μ_k)^T Σ_k^{-1} (x - μ_k) + log π_k
QDA fits the data better than LDA, but has more parameters to estimate.
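
A minimal sketch of the quadratic discriminant (the function name qda_classify and the parameter values are illustrative assumptions):

import numpy as np

def qda_classify(x, mus, Sigmas, pis):
    deltas = []
    for mu, Sigma, pi in zip(mus, Sigmas, pis):
        diff = x - mu
        quad = diff @ np.linalg.solve(Sigma, diff)     # (x - mu_k)^T Sigma_k^{-1} (x - mu_k)
        _, logdet = np.linalg.slogdet(Sigma)           # log |Sigma_k|
        deltas.append(-0.5 * logdet - 0.5 * quad + np.log(pi))
    return int(np.argmax(deltas))

mus = [np.array([-1.0, 0.0]), np.array([1.0, 0.5])]
Sigmas = [np.eye(2), np.array([[2.0, 0.3], [0.3, 1.0]])]
pis = [0.6, 0.4]
print(qda_classify(np.array([0.3, 0.1]), mus, Sigmas, pis))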

Diabetes dataset Scatter plot of the data. Without diabetes: stars (class 1); with diabetes: circles (class 2). Solid line: classification boundary obtained by LDA. Dash-dot line: boundary obtained by linear regression of the indicator matrix.

Diabetes dataset Two input variables computed from the principal components of the original 8 variables. Prior probabilities: π̂_1 = 0.651, π̂_2 = 0.349. Class means: μ̂_1 = (-0.4035, -0.1935)^T, μ̂_2 = (0.7528, 0.3611)^T. Pooled covariance:
Σ̂ = [ 1.7925  -0.1461
     -0.1461   1.6634 ]
LDA decision rule:
f^{LDA}(x) = 1 if 1.1443 - x_1 - 0.5802 x_2 ≥ 0, and 2 otherwise.
Within training data classification error rate: 28.26%.
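
As a check, the linear boundary above can be recovered from these estimates; a sketch assuming the parameter values and signs printed here, with the boundary rescaled so that x_1 has coefficient -1:

import numpy as np

pi1, pi2 = 0.651, 0.349
mu1 = np.array([-0.4035, -0.1935])
mu2 = np.array([0.7528, 0.3611])
Sigma = np.array([[1.7925, -0.1461], [-0.1461, 1.6634]])

Sigma_inv = np.linalg.inv(Sigma)
a = Sigma_inv @ (mu1 - mu2)                          # linear coefficients a_1 - a_2
b = -0.5 * (mu1 @ Sigma_inv @ mu1 - mu2 @ Sigma_inv @ mu2) + np.log(pi1 / pi2)
c = np.array([b, a[0], a[1]]) / (-a[0])              # rescale the boundary
print(c)   # roughly [ 1.144, -1.000, -0.580 ], i.e., 1.1443 - x_1 - 0.5802 x_2 = 0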

QDA for diabetes dataset Prior probabilities: π̂_1 = 0.651, π̂_2 = 0.349. Class means: μ̂_1 = (-0.4035, -0.1935)^T, μ̂_2 = (0.7528, 0.3611)^T. Class covariances:
Σ̂_1 = [ 1.6769  -0.0461
       -0.0461   1.5964 ]
Σ̂_2 = [ 2.0087  -0.3330
       -0.3330   1.7887 ]
Within training data classification error rate: 29.04%.

LDA on an expanded basis Expand the input space to include X_1 X_2, X_1^2, X_2^2, so the input is five dimensional: X = (X_1, X_2, X_1 X_2, X_1^2, X_2^2). Estimated class means:
μ̂_1 = (-0.403, -0.193, 0.032, 1.836, 1.630)^T
μ̂_2 = (0.752, 0.361, -0.059, 2.568, 1.912)^T
Pooled covariance:
Σ̂ = [ 1.7925  -0.1461   0.6254   0.3548   0.5215
     -0.1461   1.6634   0.6073   0.7421   1.2193
      0.6254   0.6073   3.5751   1.1118   0.5044
      0.3548   0.7421   1.1118  12.3355   0.0957
      0.5215   1.2193   0.5044   0.0957   4.4650 ]
Classification boundary:
0.651 - 0.728 x_1 - 0.552 x_2 - 0.006 x_1 x_2 - 0.071 x_1^2 + 0.170 x_2^2 = 0

Classification boundaries obtained by LDA using the expanded input space (X_1, X_2, X_1 X_2, X_1^2, X_2^2). Within training data classification error rate: 26.82%, lower than those of LDA and QDA with the original inputs.
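
A minimal sketch of fitting LDA on such an expanded basis (simulated data; scikit-learn's LinearDiscriminantAnalysis stands in for the hand-computed plug-in estimates above, and the helper name expand is an illustrative assumption):

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def expand(X):
    """Map (x1, x2) to the five-dimensional basis (x1, x2, x1*x2, x1^2, x2^2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-0.40, -0.19], 1.3, size=(130, 2)),
               rng.normal([0.75, 0.36], 1.4, size=(70, 2))])
y = np.array([1] * 130 + [2] * 70)

lda = LinearDiscriminantAnalysis().fit(expand(X), y)
err = np.mean(lda.predict(expand(X)) != y)       # within-training-data error rate
print(err)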

Mixture discriminant analysis A single Gaussian per class, as in LDA, can be quite restrictive. Mixture discriminant analysis is a method for supervised classification based on mixture models: it extends linear discriminant analysis by using a mixture of normals to obtain a density estimate for each class.
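
A minimal sketch of the mixture-density idea on simulated data: each class density is estimated with scikit-learn's GaussianMixture and a point is assigned to the class with the largest π̂_j f̂_j(x). This illustrates the density-estimation view rather than the full MDA fitting procedure, and the helper names mixture_da_fit / mixture_da_predict are assumptions.

import numpy as np
from sklearn.mixture import GaussianMixture

def mixture_da_fit(X, y, n_components=2):
    classes = np.unique(y)
    models, priors = [], []
    for k in classes:
        Xk = X[y == k]
        models.append(GaussianMixture(n_components=n_components, random_state=0).fit(Xk))
        priors.append(len(Xk) / len(X))              # class prior pi_hat_k
    return classes, models, np.array(priors)

def mixture_da_predict(X, classes, models, priors):
    # score_samples gives log f_hat_j(x); add the log prior and take the argmax over classes
    log_post = np.column_stack([m.score_samples(X) + np.log(p)
                                for m, p in zip(models, priors)])
    return classes[np.argmax(log_post, axis=1)]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-2.0, 0.0], 1.0, size=(60, 2)),
               rng.normal([2.0, 0.0], 1.0, size=(60, 2)),
               rng.normal([0.0, 3.0], 1.0, size=(80, 2))])
y = np.array([1] * 120 + [2] * 80)                   # class 1 itself is bimodal
classes, models, priors = mixture_da_fit(X, y)
print(np.mean(mixture_da_predict(X, classes, models, priors) == y))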