Discriminant Analysis Documentation


Release 1
Tim Thatcher
May 01, 2016


Contents

1 Installation
2 Theory
    2.1 Linear Discriminant Analysis (LDA)
    2.2 Quadratic Discriminant Analysis (QDA)
    2.3 Canonical Discriminant Analysis (CDA)
    2.4 Using LDA to do QDA
    2.5 Calculation Method
3 Package Interface


DiscriminantAnalysis.jl is a Julia package for regularized linear and quadratic discriminant analysis (LDA and QDA, respectively). LDA and QDA are distribution-based classifiers built on the assumption that the data follow a multivariate normal distribution. LDA differs from QDA in its assumption about class variability: LDA assumes that all classes share the same within-class covariance matrix, whereas QDA relaxes that constraint and allows for distinct within-class covariance matrices. As a result, LDA is a linear classifier and QDA is a quadratic classifier.


CHAPTER 1

Installation

The source code is available on GitHub: DiscriminantAnalysis.jl. To add the package from Julia:

    Pkg.add("DiscriminantAnalysis")
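Once installed, the package is loaded in the usual way:

    using DiscriminantAnalysis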


CHAPTER 2

Theory

Linear and quadratic discriminant analysis arise in the context of classification as simple probabilistic classifiers. Discriminant analysis works under the assumption that each class follows a Gaussian distribution; that is, for each class k, the probability density can be modelled by:

    f_k(x) = \frac{\exp\left(-\frac{1}{2}(x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k)\right)}{(2\pi)^{p/2} |\Sigma_k|^{1/2}}

Let \pi_k represent the prior class membership probabilities. Application of Bayes' theorem results in:

    P(K = k \mid X = x) = \frac{f_k(x)\pi_k}{\sum_i f_i(x)\pi_i}

Noting that the densities are positive, that the denominator is the same for every class, and that the natural logarithm is monotonically increasing, the following rule can be used for classification:

    \arg\max_k \frac{f_k(x)\pi_k}{\sum_i f_i(x)\pi_i} = \arg\max_k \left[ \log(f_k(x)) + \log(\pi_k) \right]

Applying the natural logarithm simplifies the classification rule when working with a Gaussian distribution. The resulting set of functions \delta_k are known as discriminant functions. In the context of LDA and QDA, the discriminant functions are of the form:

    \delta_k(x) = \log(f_k(x)) + \log(\pi_k)

2.1 Linear Discriminant Analysis (LDA)

Linear discriminant analysis works under the simplifying assumption that \Sigma_k = \Sigma for each class k; in other words, the classes share a common within-class covariance matrix. Since the -\frac{1}{2} x^\top \Sigma^{-1} x term is constant across classes, the discriminant function reduces to a linear classifier:

    \delta_k(x) = \mu_k^\top \Sigma^{-1} x - \frac{1}{2} \mu_k^\top \Sigma^{-1} \mu_k + \log(\pi_k)

The following plot shows the linear classification boundaries that result when a sample data set of two bivariate Gaussian variables is modelled using linear discriminant analysis:

[Figure: linear decision boundary from LDA on two bivariate Gaussian classes]
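To make the linear discriminant concrete, the following is a minimal, self-contained Julia sketch that evaluates \delta_k(x) for each class and selects the largest. It illustrates the formula above and is not the package's internal implementation; the function and variable names are invented for the example:

    using LinearAlgebra

    # δ_k(x) = μ_k'Σ⁻¹x - ½ μ_k'Σ⁻¹μ_k + log(π_k), with a shared covariance Σ.
    # Factorizing Σ once avoids forming Σ⁻¹ explicitly.
    function lda_classify(x, means, Σ, priors)
        F = cholesky(Symmetric(Σ))
        δ = map(eachindex(means)) do k
            a = F \ means[k]                          # a = Σ⁻¹ μ_k
            dot(a, x) - 0.5 * dot(a, means[k]) + log(priors[k])
        end
        return argmax(δ)                              # index of the winning class
    end

    # Example: two classes in two dimensions with a shared covariance.
    means = [[0.0, 0.0], [2.0, 2.0]]
    Σ     = [1.0 0.3; 0.3 1.0]
    lda_classify([1.8, 1.9], means, Σ, [0.5, 0.5])    # returns 2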

2.2 Quadratic Discriminant Analysis (QDA)

Quadratic discriminant analysis does not make the simplifying assumption that each class shares the same covariance matrix. This results in a classifier that is quadratic in x:

    \delta_k(x) = -\frac{1}{2}(x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) - \frac{1}{2} \log(|\Sigma_k|) + \log(\pi_k)

The following plot shows the quadratic classification boundaries that result when a sample data set of two bivariate Gaussian variables is modelled using quadratic discriminant analysis:

[Figure: quadratic decision boundary from QDA on two bivariate Gaussian classes]
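Following the same pattern, a minimal Julia sketch of the quadratic discriminant (illustrative only; logdet on the Cholesky factorization yields log|\Sigma_k| without forming the determinant directly):

    using LinearAlgebra

    # δ_k(x) = -½(x-μ_k)'Σ_k⁻¹(x-μ_k) - ½ log|Σ_k| + log(π_k), one Σ_k per class.
    function qda_discriminant(x, μ, Σk, prior)
        F = cholesky(Symmetric(Σk))
        d = x - μ
        return -0.5 * dot(d, F \ d) - 0.5 * logdet(F) + log(prior)
    end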

Note that quadratic discriminant analysis does not necessarily perform better than linear discriminant analysis.

2.3 Canonical Discriminant Analysis (CDA)

Canonical discriminant analysis expands upon linear discriminant analysis by noting that the class centroids lie in a (c - 1)-dimensional subspace of the p dimensions of the data, where c is the number of classes. Define the between-class covariance matrix:

    \Sigma_b = \frac{1}{c} \sum_{k=1}^{c} (\mu_k - \mu)(\mu_k - \mu)^\top

Canonical discriminant analysis then maximizes the generalized Rayleigh quotient of the between-class covariance and the within-class covariance to solve for the optimal axes for describing class separability:

    \arg\max_w \frac{w^\top \Sigma_b w}{w^\top \Sigma w}
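The maximizers of this Rayleigh quotient are the leading generalized eigenvectors of the pair (\Sigma_b, \Sigma). A minimal Julia sketch of that step (an illustration, not the package's routine; the helper name is invented):

    using LinearAlgebra

    # Solve Σ_b w = λ Σ w; the eigenvectors belonging to the largest c - 1
    # eigenvalues span the canonical subspace.
    function canonical_axes(Σb, Σ, ncomp)
        λ, W = eigen(Symmetric(Σb), Symmetric(Σ))   # generalized eigenproblem
        order = sortperm(λ; rev = true)             # sort eigenvalues descending
        return W[:, order[1:ncomp]]
    end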

For two-class LDA, the canonical coordinate is perpendicular to the separating hyperplane produced by the decision boundary. For the LDA model above, the dimensionality is reduced from 2 to 1. The following image shows the resulting distribution of points relative to the canonical coordinate:

[Figure: class distributions projected onto the single canonical coordinate]

2.4 Using LDA to do QDA

A quadratic boundary can be generated with LDA by squaring each variable and producing all of the interaction terms. For two variables x and y, the expanded predictor set is simply:

    x + y + x^2 + y^2 + xy

The transformed variables may be used as inputs for the LDA model, resulting in a quadratic decision boundary; a sketch of the expansion follows below.

[Figure: quadratic decision boundary produced by LDA on the expanded predictors]
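A small illustrative Julia sketch of this feature expansion (the helper name is invented; any LDA fit can consume the resulting matrix):

    # Expand two predictor columns into the quadratic basis x, y, x², y², xy.
    function quadratic_expansion(X)
        x, y = X[:, 1], X[:, 2]
        return hcat(x, y, x .^ 2, y .^ 2, x .* y)
    end

    quadratic_expansion([1.0 2.0; 3.0 4.0])   # 2×5 matrix of expanded predictors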

Note that this boundary does not correspond to the same boundary produced by QDA.

2.5 Calculation Method

As a result of floating point arithmetic, full inversion of a matrix may introduce numerical error. Even inversion of a small matrix can produce relatively large errors (see Hilbert matrices), so alternative methods are used to ensure numerical stability. For each class covariance matrix in QDA (or the overall covariance matrix in LDA), a whitening matrix W_k is computed such that:

    V(X_k W_k) = W_k^\top V(X_k) W_k = W_k^\top \Sigma_k W_k = I \implies W_k = \Sigma_k^{-1/2}

This is accomplished using a QR or singular value decomposition of the data matrix where possible. When the covariance matrix must be calculated directly, the Cholesky decomposition is used to whiten the data instead.
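As an illustration of the SVD route, the following sketch whitens a data matrix without ever forming the covariance matrix explicitly (assumptions: observations are stored row-major, and the helper name is invented for this example):

    using LinearAlgebra, Statistics

    # If the centered data matrix factors as Xc = U S Vᵀ with n rows, then
    # Σ = Xc'Xc/(n-1) = V (S²/(n-1)) Vᵀ, so W = V · Diagonal(√(n-1) ./ S)
    # satisfies W'ΣW = I.
    function whitening_matrix(X)
        n  = size(X, 1)
        Xc = X .- mean(X; dims = 1)   # center each column
        F  = svd(Xc)
        return F.V * Diagonal(sqrt(n - 1) ./ F.S)
    end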

Once the whitening matrix has been computed, we can then apply the transformation:

    z_k = W_k^\top x \implies Z_k = X W_k

Since we are now working in the transformed space, the covariance matrix is the identity: its log-determinant vanishes and its inverse is simply the identity matrix. This results in the simplified discriminant function:

    \delta_k(z_k) = -\frac{1}{2} (z_k - \mu_k)^\top (z_k - \mu_k) + \log(\pi_k)

(Here \mu_k denotes the class mean expressed in the whitened coordinates.)
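Combining the two previous sketches, the whitened-space discriminant reduces to a squared Euclidean distance to the whitened class mean plus the log prior (illustrative only):

    # z = W'x and μ_w = W'μ live in the whitened coordinates, where Σ = I.
    function whitened_discriminant(x, W, μ, prior)
        z, μw = W' * x, W' * μ
        return -0.5 * sum(abs2, z .- μw) + log(prior)
    end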

CHAPTER 3

Package Interface

Note: Data matrices may be stored in either row-major or column-major ordering of observations. Row-major ordering means each row corresponds to an observation; column-major ordering means each column corresponds to an observation:

    X_{\text{row}} = \begin{bmatrix} x_1^\top \\ x_2^\top \\ \vdots \\ x_n^\top \end{bmatrix}
    \qquad
    X_{\text{col}} = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}

In DiscriminantAnalysis.jl, the input data matrix X is assumed by default to be stored in the same format as a design matrix in statistics (row-major). The ordering can be switched between row-major and column-major by setting the order argument to Val{:row} or Val{:col}, respectively.

lda(X, y [; order, M, priors, gamma])

    Fit a regularized linear discriminant model based on data X and class identifier y. X must be a matrix of floats and y must be a vector of positive integers that index the classes. M is an optional matrix of class means; if M is not supplied, it defaults to point estimates of the class means. The priors argument represents the prior probabilities of class membership; if priors is not supplied, it defaults to equal class weights.

    Note: See the format notes for the data matrix X above.

    gamma is a regularization parameter that shrinks the covariance matrix towards the average eigenvalue:

        \Sigma(\gamma) = (1 - \gamma)\Sigma + \gamma \left( \frac{\operatorname{trace}(\Sigma)}{p} \right) I

    This type of regularization can be used to counteract bias in the eigenvalue estimates generated from the sample covariance matrix.

    The components of the LDA model may be extracted from the ModelLDA object returned by the lda function:

        Field     Description
        is_cda    Boolean value; the model is a CDA model if true
        W         The whitening matrix used to decorrelate observations
        order     The ordering of observations in the data matrix
        M         A matrix of class means; one per row
        priors    A vector of class prior probabilities
        gamma     The regularization parameter as defined above
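As an illustration of the interface described above, a hedged usage sketch for lda (the data set is synthetic and the keyword values are arbitrary):

    using DiscriminantAnalysis

    # 100 observations of 2 predictors, stored row-major (the default order).
    X = vcat(randn(50, 2), randn(50, 2) .+ 2.0)
    y = vcat(fill(1, 50), fill(2, 50))

    model = lda(X, y; priors = [0.5, 0.5], gamma = 0.1)
    model.W   # whitening matrix, per the field table above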

cda(X, y [; order, M, priors, gamma])

    Fit a regularized canonical discriminant model based on data X and class identifier y. The CDA model is identical to an LDA model, except that dimensionality reduction is included in the whitening transformation matrix. See the lda documentation for information on the arguments.

qda(X, y [; order, M, priors, gamma, lambda])

    Fit a regularized quadratic discriminant model based on data X and class identifier y. X must be a matrix of floats and y must be a vector of positive integers that index the classes. M is an optional matrix of class means; if M is not supplied, it defaults to point estimates of the class means. The priors argument represents the prior probabilities of class membership; if priors is not supplied, it defaults to equal class weights.

    Note: See the format notes for the data matrix X above.

    lambda is a regularization parameter that shrinks the class covariance matrices towards the overall covariance matrix:

        \Sigma_k(\lambda) = (1 - \lambda)\Sigma_k + \lambda\Sigma

    As in LDA, gamma is a regularization parameter that shrinks each covariance matrix towards its average eigenvalue:

        \Sigma_k(\gamma, \lambda) = (1 - \gamma)\Sigma_k(\lambda) + \gamma \left( \frac{\operatorname{trace}(\Sigma_k(\lambda))}{p} \right) I

    The components of the QDA model may be extracted from the ModelQDA object returned by the qda function:

        Field     Description
        W_k       The vector of whitening matrices (one per class)
        order     The ordering of observations in the data matrix
        M         A matrix of class means; one per row
        priors    A vector of class prior probabilities
        gamma     The regularization parameter as defined above
        lambda    The regularization parameter as defined above

discriminants(model, Z)

    Returns a matrix of discriminant function values based on model. Each column of values corresponds to a class discriminant function and each row corresponds to the discriminant function values for an observation in Z. For example, entry [i, j] of the returned matrix is the discriminant function value of class j for observation i.

classify(model, Z)

    Returns a vector of class indices based on the classification rule. This function takes the output of the discriminants function and applies indmax to each row to determine the class.
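A corresponding sketch for qda, discriminants, and classify, reusing the synthetic X and y from the lda example (signatures as documented above; outputs not independently verified):

    model = qda(X, y; lambda = 0.5, gamma = 0.1)

    D = discriminants(model, X)   # n×k matrix: column j holds δ_j for each row
    ŷ = classify(model, X)        # vector of predicted class indices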

