Discriminant Analysis Documentation
Release 1

Tim Thatcher

May 01, 2016

Contents

1 Installation
2 Theory
  2.1 Linear Discriminant Analysis (LDA)
  2.2 Quadratic Discriminant Analysis (QDA)
  2.3 Canonical Discriminant Analysis (CDA)
  2.4 Using LDA to do QDA
  2.5 Calculation Method
3 Package Interface
4 References
Bibliography


DiscriminantAnalysis.jl is a Julia package for regularized linear and quadratic discriminant analysis (LDA and QDA, respectively). LDA and QDA are distribution-based classifiers built on the assumption that the data follow a multivariate normal distribution. LDA differs from QDA in its assumption about class variability: LDA assumes that all classes share a common within-class covariance matrix, whereas QDA relaxes that constraint and allows each class its own within-class covariance matrix. As a result, LDA is a linear classifier and QDA is a quadratic classifier.


CHAPTER 1
Installation

The source code is available on GitHub: DiscriminantAnalysis.jl

To add the package from Julia:

    Pkg.add("DiscriminantAnalysis")


CHAPTER 2
Theory

Linear and quadratic discriminant analysis arise in the context of classification as simple probabilistic classifiers. Discriminant analysis works under the assumption that each class follows a Gaussian distribution. That is, for each class k, the probability density can be modelled by:

    f_k(x) = \frac{\exp\left(-\frac{1}{2}(x - \mu_k)^\intercal \Sigma_k^{-1} (x - \mu_k)\right)}{(2\pi)^{p/2} \, |\Sigma_k|^{1/2}}

Let \pi_k represent the prior class membership probabilities. Application of Bayes' theorem results in:

    P(K = k \mid X = x) = \frac{f_k(x)\,\pi_k}{\sum_i f_i(x)\,\pi_i}

Noting that the probabilities are non-zero and the natural logarithm is monotonically increasing, the following rule can be used for classification:

    \arg\max_k \frac{f_k(x)\,\pi_k}{\sum_i f_i(x)\,\pi_i} = \arg\max_k \; \log(f_k(x)) + \log(\pi_k)

Applying the natural logarithm simplifies the classification rule when working with a Gaussian distribution. The resulting functions \delta_k are known as discriminant functions. In the context of LDA and QDA, the discriminant functions are of the form:

    \delta_k(x) = \log(f_k(x)) + \log(\pi_k)

2.1 Linear Discriminant Analysis (LDA)

Linear discriminant analysis works under the simplifying assumption that \Sigma_k = \Sigma for each class k; in other words, the classes share a common within-class covariance matrix. Since the x^\intercal \Sigma^{-1} x term is constant across classes, the discriminant function reduces to a linear classifier:

    \delta_k(x) = \mu_k^\intercal \Sigma^{-1} x - \frac{1}{2}\mu_k^\intercal \Sigma^{-1} \mu_k + \log(\pi_k)

The following plot shows the linear classification boundaries that result when a sample data set of two bivariate Gaussian variables is modelled using linear discriminant analysis:
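To make the discriminant function concrete, the following is a minimal Julia sketch that evaluates \delta_k(x) for given class means, a shared covariance matrix, and priors. The function and variable names (lda_discriminants, mus) are illustrative only and are not part of the package API; the package itself avoids explicit matrix inversion, as described in the Calculation Method section.

    # Minimal sketch of the LDA discriminant functions (illustrative only;
    # not the DiscriminantAnalysis.jl implementation).
    using LinearAlgebra

    # δ_k(x) = μ_k'Σ⁻¹x - ½ μ_k'Σ⁻¹μ_k + log(π_k), evaluated for every class k
    function lda_discriminants(x, mus, Σ, priors)
        Σinv = inv(Σ)  # acceptable for a small illustration
        [μ' * Σinv * x - 0.5 * μ' * Σinv * μ + log(p) for (μ, p) in zip(mus, priors)]
    end

    # Example: two classes in two dimensions with equal priors
    mus = [[0.0, 0.0], [2.0, 2.0]]
    Σ = [1.0 0.3; 0.3 1.0]
    scores = lda_discriminants([1.5, 1.0], mus, Σ, [0.5, 0.5])
    argmax(scores)  # index of the predicted class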

2.2 Quadratic Discriminant Analysis (QDA)

Quadratic discriminant analysis does not make the simplifying assumption that all classes share the same covariance matrix. This results in a classifier that is quadratic in x:

    \delta_k(x) = -\frac{1}{2}(x - \mu_k)^\intercal \Sigma_k^{-1}(x - \mu_k) - \frac{1}{2}\log(|\Sigma_k|) + \log(\pi_k)

The following plot shows the quadratic classification boundaries that result when a sample data set of two bivariate Gaussian variables is modelled using quadratic discriminant analysis:
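For comparison, here is a corresponding Julia sketch of the QDA discriminant, again purely illustrative and not the package's implementation:

    # Minimal sketch of the QDA discriminant functions (illustrative only).
    using LinearAlgebra

    # δ_k(x) = -½ (x-μ_k)'Σ_k⁻¹(x-μ_k) - ½ log|Σ_k| + log(π_k)
    function qda_discriminants(x, mus, Σs, priors)
        [-0.5 * (x - μ)' * (Σ \ (x - μ)) - 0.5 * logdet(Σ) + log(p)
         for (μ, Σ, p) in zip(mus, Σs, priors)]
    end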

Note that quadratic discriminant analysis does not necessarily perform better than linear discriminant analysis.

2.3 Canonical Discriminant Analysis (CDA)

Canonical discriminant analysis expands upon linear discriminant analysis by noting that the class centroids lie in a (c - 1)-dimensional subspace of the p dimensions of the data, where c is the number of classes. Defining the between-class covariance matrix:

    \Sigma_b = \frac{1}{c}\sum_{k=1}^{c}(\mu_k - \mu)(\mu_k - \mu)^\intercal

Canonical discriminant analysis then optimizes the generalized Rayleigh quotient of the between-class and within-class covariance matrices to solve for the axes that best describe class separability:

    \arg\max_{w} \frac{w^\intercal \Sigma_b w}{w^\intercal \Sigma_w w}
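As an illustration of the quantities involved, here is a small Julia sketch that forms the between-class covariance from the class centroids and solves the generalized eigenvalue problem \Sigma_b w = \lambda \Sigma_w w for the canonical axes. The names (canonical_axes, Σw) are assumptions made for the example, not package internals.

    # Illustrative sketch: canonical axes from the generalized Rayleigh quotient.
    using LinearAlgebra, Statistics

    function canonical_axes(mus, Σw)
        μ  = mean(mus)                                    # overall centroid (equal class weights)
        Σb = mean([(μk - μ) * (μk - μ)' for μk in mus])   # between-class covariance
        # Generalized eigenproblem Σb·w = λ·Σw·w; the leading eigenvectors span the
        # (c - 1)-dimensional subspace containing the class centroids.
        F = eigen(Symmetric(Σb), Symmetric(Σw))
        F.vectors[:, sortperm(F.values; rev = true)]
    end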

For two-class LDA, the canonical coordinate is perpendicular to the separating hyperplane produced by the decision boundary. For the LDA model above, the dimensionality is reduced from 2 to 1. The following image shows the resulting distribution of points relative to the canonical coordinate:

2.4 Using LDA to do QDA

A quadratic boundary can be generated with LDA by squaring each variable and adding all the interaction terms. For two variables x and y, the expanded feature set is simply:

    x + y + x^2 + y^2 + xy

The transformed variables may then be used as inputs to the LDA model, which results in a quadratic decision boundary:
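A minimal Julia sketch of this feature expansion for a two-column, row-major data matrix follows; the helper name expand_quadratic is made up for the example.

    # Illustrative quadratic feature expansion for a two-column data matrix
    # (row-major ordering: one observation per row).
    function expand_quadratic(X)
        x, y = X[:, 1], X[:, 2]
        hcat(x, y, x .^ 2, y .^ 2, x .* y)
    end

    # model = lda(expand_quadratic(X), y)  # quadratic boundary via LDA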

Note that this boundary does not correspond to the boundary produced by QDA.

2.5 Calculation Method

As a result of floating point arithmetic, full inversion of a matrix may introduce numerical error. Even inversion of a small matrix may produce relatively large errors (see Hilbert matrices), so alternative methods are used to ensure numerical stability. For each class covariance matrix in QDA (or the overall covariance matrix in LDA), a whitening matrix W_k is computed such that:

    V(X_k W_k) = W_k^\intercal V(X_k) W_k = W_k^\intercal \Sigma_k W_k = I \implies W_k = \Sigma_k^{-1/2}

This is accomplished using a QR or singular value decomposition of the data matrix where possible. When the covariance matrix must be calculated directly, the Cholesky decomposition is used to whiten the data instead.

Once the whitening matrix has been computed, we can apply the transformation:

    z_k = W_k^\intercal x \quad\text{or, in matrix form,}\quad Z_k = X W_k

Since the covariance of the transformed data is the identity matrix, the log-determinant term vanishes and no matrix inversion is required. This results in the simplified discriminant function:

    \delta_k(z_k) = -\frac{1}{2}(z_k - \mu_k)^\intercal (z_k - \mu_k) + \log(\pi_k)

where \mu_k here denotes the class mean after the same transformation.
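To make the whitening step concrete, here is a small Julia sketch that whitens data with a Cholesky factorization of the sample covariance and verifies that the transformed covariance is approximately the identity. It only illustrates the idea; the package prefers QR or SVD factorizations of the data matrix where possible, as noted above.

    # Illustrative Cholesky-based whitening (not the package's internal routine).
    using LinearAlgebra, Statistics

    X = randn(100, 2) * [1.0 0.0; 0.8 0.5]  # correlated sample, one observation per row
    Σ = cov(X)                               # sample covariance
    W = inv(cholesky(Σ).U)                   # whitening matrix: W'ΣW ≈ I
    Z = (X .- mean(X, dims = 1)) * W         # decorrelated observations
    cov(Z)                                   # ≈ identity matrix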

CHAPTER 3
Package Interface

Note: Data matrices may be stored in either row-major or column-major ordering of observations. Row-major ordering means each row corresponds to an observation; column-major ordering means each column corresponds to an observation:

    X_{row} = \begin{bmatrix} x_1^\intercal \\ x_2^\intercal \\ \vdots \\ x_n^\intercal \end{bmatrix}
    \qquad
    X_{col} = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}

In DiscriminantAnalysis.jl, the input data matrix X is assumed by default to be stored in the same format as a design matrix in statistics (row-major). The ordering can be switched between row-major and column-major by setting the order argument to Val{:row} or Val{:col}, respectively.

lda(X, y [; order, M, priors, gamma])

Fit a regularized linear discriminant model based on data X and class identifier y. X must be a matrix of floats and y must be a vector of positive integers that index the classes. M is an optional matrix of class means; if M is not supplied, it defaults to point estimates of the class means. The priors argument represents the prior probabilities of class membership; if priors is not supplied, it defaults to equal class weights.

Note: See the format notes for the data matrix X.

gamma is a regularization parameter that shrinks the covariance matrix towards the average eigenvalue:

    \Sigma(\gamma) = (1 - \gamma)\Sigma + \gamma\left(\frac{\operatorname{trace}(\Sigma)}{p}\right) I

This type of regularization can be used to counteract bias in the eigenvalue estimates generated from the sample covariance matrix.

The components of the LDA model may be extracted from the ModelLDA object returned by the lda function:

Field    Description
is_cda   Boolean value; the model is a CDA model if true
W        The whitening matrix used to decorrelate observations
order    The ordering of observations in the data matrix
M        A matrix of class means (one per row)
priors   A vector of class prior probabilities
gamma    The regularization parameter defined above
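Below is a hedged usage sketch based on the documented lda signature; the data and keyword values are invented for illustration.

    using DiscriminantAnalysis

    # Two classes of bivariate observations, stored row-major (one observation per row);
    # the data and keyword values below are purely illustrative.
    X = vcat(randn(50, 2), randn(50, 2) .+ 2.0)
    y = vcat(fill(1, 50), fill(2, 50))

    model = lda(X, y; order = Val{:row}, priors = [0.5, 0.5], gamma = 0.1)

    model.M      # matrix of class means, one per row
    model.gamma  # the regularization parameter passed above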

cda(X, y [; order, M, priors, gamma])

Fit a regularized canonical discriminant model based on data X and class identifier y. The CDA model is identical to an LDA model, except that dimensionality reduction is included in the whitening transformation matrix. See the lda documentation for information on the arguments.

qda(X, y [; order, M, priors, gamma, lambda])

Fit a regularized quadratic discriminant model based on data X and class identifier y. X must be a matrix of floats and y must be a vector of positive integers that index the classes. M is an optional matrix of class means; if M is not supplied, it defaults to point estimates of the class means. The priors argument represents the prior probabilities of class membership; if priors is not supplied, it defaults to equal class weights.

Note: See the format notes for the data matrix X.

lambda is a regularization parameter that shrinks each class covariance matrix towards the overall covariance matrix:

    \Sigma_k(\lambda) = (1 - \lambda)\Sigma_k + \lambda\Sigma

As in LDA, gamma is a regularization parameter that shrinks the covariance matrix towards the average eigenvalue:

    \Sigma_k(\gamma, \lambda) = (1 - \gamma)\Sigma_k(\lambda) + \gamma\left(\frac{\operatorname{trace}(\Sigma_k(\lambda))}{p}\right) I

The components of the QDA model may be extracted from the ModelQDA object returned by the qda function:

Field    Description
W_k      The vector of whitening matrices (one per class)
order    The ordering of observations in the data matrix
M        A matrix of class means (one per row)
priors   A vector of class prior probabilities
gamma    The regularization parameter gamma defined above
lambda   The regularization parameter lambda defined above

discriminants(model, Z)

Returns a matrix of discriminant function values based on model. Each column corresponds to a class discriminant function and each row corresponds to the discriminant function values for an observation in Z. For example, the [i, j] entry of the returned matrix is the discriminant function value of class j for observation i.

classify(model, Z)

Returns a vector of class indices based on the classification rule. This function takes the output of the discriminants function and applies indmax to each row to determine the class.
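A short end-to-end sketch using the documented qda, discriminants, and classify functions follows; the data and regularization values are again illustrative only.

    using DiscriminantAnalysis

    X = vcat(randn(50, 2), randn(50, 2) .+ 2.0)  # row-major: one observation per row
    y = vcat(fill(1, 50), fill(2, 50))

    # Fit a regularized QDA model; the lambda and gamma values are arbitrary.
    model = qda(X, y; lambda = 0.2, gamma = 0.1)

    scores = discriminants(model, X)  # n × c matrix of discriminant function values
    labels = classify(model, X)       # vector of predicted class indices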

CHAPTER 4
References


Bibliography

[fried] Friedman J. 1989. Regularized discriminant analysis. Journal of the American Statistical Association, 84(405); p. 165-175.

[hff] Hastie T, Tibshirani R, Friedman J, Franklin J. 2005. The elements of statistical learning: data mining, inference and prediction. The Mathematical Intelligencer, 27(2); p. 83-85.

