CLASSICAL NORMAL-BASED DISCRIMINANT ANALYSIS


EECS 833, March 2006

Geoff Bohling
Assistant Scientist
Kansas Geological Survey

Overheads and resources available at http://people.ku.edu/~gbohling/eecs833

Example Data

For the next two lectures, we will look at predicting facies from logs for a Lower Cretaceous (~145 to 100 million years old) section in north-central Kansas. Facies assignments from core are available from the Jones well, along with a suite of logs including neutron and density porosity, photoelectric factor, and the thorium, uranium, and potassium components of the spectral gamma ray log. We will recast the density porosity as apparent matrix density (Rhomaa) and the photoelectric factor as apparent matrix volumetric photoelectric absorption (Umaa), so that the six logs employed for discrimination are: Th, U, K, Rhomaa, Umaa, and φN. The six facies picked from core are marine shale, paralic (coastal), floodplain, channel sandstone, splay sandstone, and paleosol.

So, we will train on this data from the Jones well and look at predictions both in the Jones well and the Kenyon well.

For the sake of illustration, we will also look at a two-dimensional, two-group sub-example, trying to discriminate marine and paralic facies. There are 28 core samples designated marine and 56 designated paralic. Classical discriminant analysis assumes that the data from each group follow a multivariate normal distribution, so we will take a bit of time to look at some properties of this distribution.

The Normal (Gaussian) Density Function

The probability density function for a single normally distributed variable, X, with a mean of µ and a standard deviation of σ is given by

    f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left[ -\frac{(x-\mu)^2}{2\sigma^2} \right]

or

    f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ -0.5\,z^2 \right], \qquad z = \frac{x-\mu}{\sigma}

where z represents a standardized version of x. The standardized random variable, Z, follows a normal distribution with a mean of zero and a standard deviation of one, or Z ~ N(0, 1).

The appearance of -z^2, the negative of the squared scaled distance to the mean, in the exponential is very important. This means that squared scaled distances (scaled Euclidean distances) are the natural distance metric for normally distributed variables. This leads, for example, to the close connection between the normal distribution and least-squares regression.
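As a quick sanity check, the univariate density formula can be evaluated directly in Matlab and compared against the Statistics toolbox function normpdf (a minimal sketch; the parameter values are arbitrary illustrations):

    mu = 2; sigma = 1.5;                     % illustrative parameters
    x = -3:0.1:7;                            % evaluation points
    z = (x - mu)/sigma;                      % standardized values
    f = exp(-0.5*z.^2)/(sigma*sqrt(2*pi));   % density from the formula
    max(abs(f - normpdf(x, mu, sigma)))      % ~0, up to round-off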

For the multivariate normal distribution, we now consider a vector of random variables, X, with a vector mean of µ and a covariance matrix Σ. That is, each individual variable, X_i, follows a normal distribution with a mean of µ_i and a variance of σ_i² = Σ_ii, the appropriate diagonal element of the covariance matrix. The covariance between any pair of the variables, X_i and X_j, is given by Σ_ij, and the corresponding correlation is given by

    \rho_{ij} = \frac{\Sigma_{ij}}{\sqrt{\Sigma_{ii}\,\Sigma_{jj}}}.
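The covariance-to-correlation conversion is easy to verify numerically; a minimal sketch with an illustrative covariance matrix (the Statistics toolbox function corrcov performs the same conversion):

    Sigma = [4.0 1.2; 1.2 2.25];   % illustrative covariance matrix
    s = sqrt(diag(Sigma));         % standard deviations sigma_i
    Rho = Sigma ./ (s*s');         % rho_ij = Sigma_ij/(sigma_i*sigma_j)
    Rho - corrcov(Sigma)           % ~0: same conversion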

The multivariate normal density function for X is given by

    f(\mathbf{x}) = (2\pi)^{-p/2}\,|\Sigma|^{-1/2} \exp\!\left[ -\tfrac{1}{2} (\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu}) \right]

where p is the number of variables or components of X and |Σ| is the determinant of the covariance matrix. The quadratic form

    z^2 = (\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})

now represents a squared distance to the vector mean, scaled according to the variances and covariances specified in Σ. This is the squared Mahalanobis distance to the vector mean. Using z² we can express the multivariate normal density function in the same form as the univariate version, apart from appropriate differences in the normalizing factor out front:

    f(\mathbf{x}) = (2\pi)^{-p/2}\,|\Sigma|^{-1/2} \exp[-0.5\,z^2] = \left[ (2\pi)^p\,|\Sigma| \right]^{-1/2} \exp[-0.5\,z^2]

The second form shown will be handy for later development.
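The density can be computed directly from the squared Mahalanobis distance; a minimal sketch with illustrative parameters, which agrees with the Statistics toolbox function mvnpdf:

    mu = [1 2]; Sigma = [2 0.8; 0.8 1];          % illustrative parameters
    x = [0.5 2.5];                               % evaluation point
    p = numel(mu);
    d = (x - mu)';                               % difference from the mean
    z2 = d'*(Sigma\d);                           % squared Mahalanobis distance
    f = exp(-0.5*z2)/sqrt((2*pi)^p*det(Sigma))   % matches mvnpdf(x, mu, Sigma)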

If each component variable, X_i, is scaled to zero mean and unit standard deviation in advance of the analysis, the resulting vector mean is µ = 0 and the covariance matrix is the same as the correlation matrix of the original variables, with 1's on the diagonal and the correlations, ρ_ij, in the off-diagonal locations. In general, statistical analyses (regression, classification, etc.) using these standardized variables yield results equivalent to those based on the original variables. In particular, Mahalanobis distances between points in the standardized space are the same as those between corresponding points in the original space, so that the fundamental configuration of the data is unchanged by the translation (to zero mean) and scaling (to unit standard deviation).

If all the variables are mutually uncorrelated, then the correlation matrix (which is also the covariance matrix in standardized space) is the identity matrix and Mahalanobis distances reduce to Euclidean distances in the standardized space. This is all to say that Mahalanobis distances are essentially Euclidean distances scaled according to the individual variances (or standard deviations) and adjusted to account for correlations among the variables. The latter adjustment is basically a coordinate rotation and re-scaling in accordance with the principal axes of the correlation matrix.

The following plots show contours of the bivariate normal density function and Mahalanobis distances (M.D.) to the points (1, 1) and (1, -1) when the correlation, ρ, between the two variables is zero and when it is nonzero.
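The invariance of Mahalanobis distance under standardization is easy to demonstrate; a minimal sketch with simulated data (the parameter values are arbitrary; the toolbox function mahal returns squared Mahalanobis distances):

    X = mvnrnd([3 10], [4 3; 3 9], 200);   % simulated correlated data
    Z = zscore(X);                         % standardized variables
    d2_orig = mahal(X, X);                 % squared MDs to the centroid
    d2_std  = mahal(Z, Z);                 % same, in standardized space
    max(abs(d2_orig - d2_std))             % ~0: configuration unchanged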

[Figure: contour plots of the bivariate normal density and of Mahalanobis distance for the uncorrelated and correlated cases (not reproduced).]

Fitting the Normal Distribution to Data

The normal distribution is determined by two parameters, the mean and the variance (or standard deviation). In a multivariate context, this means a mean vector and a covariance matrix. Fitting the normal distribution to a set of N data values is a simple matter of computing the average for each variable, X_i:

    \bar{X}_i = \frac{1}{N} \sum_{n=1}^{N} X_{i,n}

and the sample covariance between each pair of variables, X_i and X_j:

    \mathrm{Cov}(X_i, X_j) = \frac{1}{N-1} \sum_{n=1}^{N} \left( X_{i,n} - \bar{X}_i \right) \left( X_{j,n} - \bar{X}_j \right)

These serve as the estimates of the population means and covariances (variances when i = j). The division by (N-1), rather than N, for the covariance gives the usual unbiased estimator. Although it is easy to fit the normal distribution in the sense of computing the sample means and covariances, there is absolutely no guarantee that the resulting normal distribution will actually fit the data well.
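In Matlab these estimates are exactly what the built-in mean and cov functions return; a minimal sketch on illustrative random data:

    X = randn(100, 3);     % 100 observations of 3 variables
    mu_hat = mean(X);      % 1 x 3 vector of sample means
    Sigma_hat = cov(X);    % 3 x 3 sample covariance, (N-1) divisor
    % cov(X, 1) would use the N divisor instead of N-1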


Assessing Multivariate Normality

The goodness-of-fit of the normal distribution to the observed data should be assessed prior to applying normal-based procedures, including classical discriminant analysis. Methods for assessing the goodness of fit to a normal distribution include graphical displays, such as quantile-quantile plots, and numerical tests, such as the Kolmogorov-Smirnov test. The Matlab Statistics toolbox contains various functions for testing normality of univariate data (kstest, jbtest, lillietest). You can also assess fits using the distribution fitting tool (dfittool).

In the multivariate case, each variable must be normally distributed for the entire set to follow a multivariate normal distribution, but normality of the individual variables does not guarantee multivariate normality. However, if the data do follow a multivariate normal distribution, then the squared Mahalanobis distances from the data points to the centroid (mean) should follow a chi-squared distribution with p degrees of freedom. As an example, we can compute squared M.D.'s for the marine Umaa-Rhomaa data points, in the 28 x 2 (N x p) data matrix URMar, using

    N = size(URMar,1);      % number of data points (rows)
    mu = mean(URMar);       % 1 x p vector of column means
    sigma = cov(URMar);     % p x p covariance matrix
    xmd = zeros(N,1);       % initialize vector of squared MDs
    for i = 1:N
        z = (URMar(i,:) - mu)';    % difference from mean
        xmd(i) = z'*(sigma\z);     % squared MD to mean
    end

Then we can use a Kolmogorov-Smirnov test to compare the computed squared M.D.'s to the theoretical chi-squared cumulative distribution function with p = 2 degrees of freedom:

    >> kstest(xmd, [sort(xmd) chi2cdf(sort(xmd), 2)])

    ans =
         0

The K-S test compares the empirical cumulative distribution function for the data to the given theoretical CDF (in this case, the chi-squared with 2 degrees of freedom), comparing the maximum difference between the two to a test statistic whose distribution is known under the null hypothesis that the data follow the specified distribution. The answer of 0 indicates that, at a 5% significance level, we cannot reject the hypothesis that the observed values follow the chi-squared distribution.

We can also construct a quantile-quantile plot of the observed squared M.D.'s versus chi-squared quantiles using code like:

    prb = ((1:N) - 0.5)/N;   % probability values
    % degrees of freedom = number of variables:
    df = size(URMar,2);
    % plot sorted xmd's against chi-squared quantiles:
    plot(chi2inv(prb,df), sort(xmd), '.');
    hold on
    plot([0 14],[0 14],'r-');   % 1-to-1 line

The resulting plot is shown below. This could also be accomplished with Matlab's qqplot function.

[Figure: quantile-quantile plot of squared Mahalanobis distances versus chi-squared quantiles (not reproduced).]

The plot shows noticeable deviation from the 1-to-1 line, but the deviations are not strong enough for the K-S test to reject the possibility that the squared M.D.'s follow a chi-squared distribution.

Discriminant Analysis

Classical discriminant analysis results from assuming that each data point arises with prior probability q_k from one of K different groups or classes, each characterized by its own group mean vector, µ_k, and covariance matrix, Σ_k. Plugging the multivariate normal density function into Bayes' theorem yields the following posterior probability for group k given a vector of observed data values, x:

    P_k(\mathbf{x}) = \Pr[G = k \mid \mathbf{x}] = \frac{q_k f_k(\mathbf{x})}{\sum_{l=1}^{K} q_l f_l(\mathbf{x})} = \frac{q_k (2\pi)^{-p/2} |\Sigma_k|^{-1/2} \exp[-0.5\,z_k^2]}{\sum_{l=1}^{K} q_l (2\pi)^{-p/2} |\Sigma_l|^{-1/2} \exp[-0.5\,z_l^2]}

with

    z_k^2 = (\mathbf{x} - \boldsymbol{\mu}_k)^T \Sigma_k^{-1} (\mathbf{x} - \boldsymbol{\mu}_k)

representing the squared Mahalanobis distance from the data vector x to the k-th group mean. The factor (2π)^{-p/2} is the same for all groups and cancels, so that the posterior probability for each group at x is an exponential transform of the negative squared Mahalanobis distance to the group centroid, adjusted by the prior probability and the determinant of the group covariance matrix.
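Translated into code, the posterior computation is only a few lines; a minimal sketch with hypothetical group parameters (the numbers are arbitrary illustrations, and the (2π)^{-p/2} factor is omitted since it cancels in the ratio):

    mu = {[1; 2], [3; 0]};                 % hypothetical group means
    Sigma = {[1 0.3; 0.3 1], 2*eye(2)};    % hypothetical group covariances
    q = [0.5 0.5];                         % prior probabilities
    x = [2; 1];                            % observation to classify
    K = numel(q);
    unnorm = zeros(K,1);
    for k = 1:K
        d = x - mu{k};                     % difference from group mean
        z2 = d'*(Sigma{k}\d);              % squared Mahalanobis distance
        unnorm(k) = q(k)*exp(-0.5*z2)/sqrt(det(Sigma{k}));
    end
    P = unnorm/sum(unnorm)                 % posterior probabilities P_k(x)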

The training process for classical discriminant analysis simply consists of estimating the mean and covariance matrix for each group or class, based on a training dataset with known classes for each data point. The j-th component of the mean vector for group k is simply the mean of variable j over the N_k data points in group k:

    \bar{x}_{k;j} = \frac{1}{N_k} \sum_{n \in k} x_{n;j}

where n ∈ k indicates the set of data points in group k. The group covariance matrices are commonly estimated in one of two ways. Either a distinct estimate, S_k, is developed for each group's covariance matrix, with entries given by

    S_{k;i,j} = \frac{1}{N_k - 1} \sum_{n \in k} \left( x_{n;i} - \bar{x}_{k;i} \right) \left( x_{n;j} - \bar{x}_{k;j} \right)

or the covariance matrices are assumed to be equal and estimated by a single pooled estimate, S, with entries

    S_{i,j} = \frac{1}{N - K} \sum_{n=1}^{N} \left( x_{n;i} - \bar{x}_{k(n);i} \right) \left( x_{n;j} - \bar{x}_{k(n);j} \right)

where by \bar{x}_{k(n);i} I mean the i-th component of the mean vector for whichever group data point n belongs to, k(n).
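A minimal sketch of the pooled estimate, assuming a training data matrix X (rows are observations) and a vector of group labels g (both names are illustrative):

    labels = unique(g);                  % the K distinct group labels
    [N, p] = size(X);
    K = numel(labels);
    Spooled = zeros(p, p);
    for k = 1:K
        Xk = X(g == labels(k), :);       % observations in group k
        % (N_k - 1)*cov(Xk) is the within-group sum of cross-products
        Spooled = Spooled + (size(Xk,1) - 1)*cov(Xk);
    end
    Spooled = Spooled/(N - K);           % divide by N - K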

If the group covariance matrices are assumed to be equal and estimated by the pooled data covariance matrix, S, then the squared Mahalanobis distance from a data vector x to the mean of group k is given by

    z_k^2 = (\mathbf{x} - \bar{\mathbf{x}}_k)^T S^{-1} (\mathbf{x} - \bar{\mathbf{x}}_k)

and the covariance matrix determinants are all equal, so that Bayes' formula reduces to

    P_k(\mathbf{x}) = \frac{q_k \exp[-0.5\,z_k^2]}{\sum_{l=1}^{K} q_l \exp[-0.5\,z_l^2]}.

Computing the Mahalanobis distances using a common covariance estimate and then assigning each data vector, x, to the group with the highest posterior probability results in an allocation rule that draws linear boundaries between the regions of space allocated to different groups. Thus, this approach is called linear discriminant analysis. Linear discriminant analysis is implemented by the classify function in Matlab's Statistics toolbox, setting the type option to 'linear' (which is the default).

With the Umaa-Rhomaa data in the data matrix URData and the facies numbers (1 for marine, 2 for paralic) in the vector URFacies, we can perform linear discriminant analysis, predicting back on the training data, using

    >> [class,err,prob] = classify(URData,URData,URFacies,'linear');
    >> % the estimated error (misclassification) rate
    >> err
    >> % compare original and predicted facies
    >> crosstab(URFacies,class)

(The numerical output, an error rate and a 2 x 2 confusion table, is not reproduced here.) We can get a contour plot of the posterior probability for the marine facies by predicting over a grid of values specified in URGrid (a 100 x 2 matrix containing all the combinations of 10 regularly spaced Umaa values and 10 regularly spaced Rhomaa values):

    [class,err,prob] = classify(URGrid,URData,URFacies,'linear');

and then contouring the first column of the probability matrix against the grid coordinates (Ug, Rg):

    >> [cs,h] = contour(Ug,Rg,reshape(prob(:,1),10,10)',(0.1:0.1:0.9));
    >> clabel(cs,h)
    >> hold on
    >> plot(URMar(:,1),URMar(:,2),'r+')
    >> plot(URPar(:,1),URPar(:,2),'bo')

[Figure: posterior-probability contours for the marine facies in Umaa-Rhomaa space, with the training data overplotted (not reproduced). I've also reversed the Rhomaa axis.]

Because there are only two classes, the posterior probability for the paralic facies is just one minus that for the marine facies. The Prob = 0.5 contour is the dividing line between the regions in Umaa-Rhomaa space allocated to the two classes by Bayes' rule. Misallocations are inevitable unless there is no overlap between the original classes. Bayes' rule yields the minimum error or misclassification rate when the density estimates going into it are accurate. Adjusting the prior probabilities shifts the placement of the probability contours but does not change their orientation.

If the covariance matrices are not assumed to be equal, and each is instead estimated separately by S_k, then the squared Mahalanobis distance to each group is given by

    z_k^2 = (\mathbf{x} - \bar{\mathbf{x}}_k)^T S_k^{-1} (\mathbf{x} - \bar{\mathbf{x}}_k)

and, in general, the covariance matrix determinants all differ, so that

    P_k(\mathbf{x}) = \frac{q_k |S_k|^{-1/2} \exp[-0.5\,z_k^2]}{\sum_{l=1}^{K} q_l |S_l|^{-1/2} \exp[-0.5\,z_l^2]}.

Using Mahalanobis distances computed from the group-specific covariance matrices leads to an allocation rule that draws quadratic boundaries between groups in the variable space, so this is called quadratic discriminant analysis. It can be implemented using the classify function with type set to 'quadratic'. The Matlab documentation refers to the individual-group covariance estimates as "stratified by group".

Applying quadratic discriminant analysis to the Umaa-Rhomaa data yields:

    >> [class,err,prob] = classify(URData,URData,URFacies,'quadratic');
    >> err   % a slightly lower error rate
    >> crosstab(URFacies,class)

(The numerical output is not reproduced here.)

Note that if the prior probabilities for all groups are assumed to be equal (e.g., you have no basis for assigning unequal priors) and the covariance matrix determinants are assumed to be equal (or differences among them are ignored), then Bayes' formula simply reduces to

    P_k(\mathbf{x}) = \frac{\exp[-0.5\,z_k^2]}{\sum_{l=1}^{K} \exp[-0.5\,z_l^2]}

so that allocating x to the nearest group, in terms of Mahalanobis distance, is equivalent to allocating it to the group with the highest posterior probability. In other words, if you are only interested in allocation, and not probabilities, you can stop after computing the Mahalanobis distances and assign each data point to the group with the minimum Mahalanobis distance. This can be done with the classify function by setting type to 'mahalanobis'. Note that this option uses stratified covariance matrix estimates, not a pooled covariance matrix estimate. Since the stratified estimates, S_k, will in general have differing determinants, running classify with type = 'mahalanobis' will produce different allocations than running it using type = 'quadratic', even if you specify equal priors. The Mahalanobis distance allocations will also differ from those produced using type = 'linear', since the latter computes distances based on the pooled covariance matrix.
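A hand-rolled version of the minimum-distance rule takes only a few lines; a sketch assuming a training matrix Xtrain with labels g and new observations Xnew (illustrative names; mahal uses each group's own, i.e. stratified, mean and covariance):

    labels = unique(g);
    K = numel(labels);
    D2 = zeros(size(Xnew,1), K);
    for k = 1:K
        % squared Mahalanobis distances from each new observation
        % to the centroid of group k, using group k's covariance
        D2(:,k) = mahal(Xnew, Xtrain(g == labels(k), :));
    end
    [~, idx] = min(D2, [], 2);   % nearest group in Mahalanobis distance
    class = labels(idx);         % allocated class for each observation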

Applying linear discriminant analysis to the full Jones dataset, with six facies (specified in the vector JonesFac) and six logs (in the 88 x 6 data matrix JonesVar), and predicting back on the Jones data itself yields:

    >> [class,err,prob] = classify(JonesVar,JonesVar,JonesFac,'linear');
    >> err
    >> crosstab(JonesFac,class)

(Output not reproduced.) Quadratic discriminant analysis yields:

    >> [class,err,prob] = classify(JonesVar,JonesVar,JonesFac,'quadratic');
    >> err

    err =
        0.44

    >> crosstab(JonesFac,class)

So, Q.D.A. produces a lower misallocation rate than L.D.A. when the training data are resubstituted into the allocation rule. This is not necessarily a good thing, since reproducing the training data too accurately (overtraining) can lead to poor generalization. More on that next time.
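Because resubstitution reuses the training data for testing, its error rate is optimistically biased. An illustrative sketch (not from the original notes) of a less biased estimate via leave-one-out cross-validation:

    N = size(JonesVar, 1);
    pred = zeros(N, 1);
    for n = 1:N
        keep = true(N, 1); keep(n) = false;   % hold out observation n
        pred(n) = classify(JonesVar(n,:), ...
                           JonesVar(keep,:), JonesFac(keep), 'linear');
    end
    cvErr = mean(pred ~= JonesFac)   % leave-one-out error estimate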

For the sake of illustration, we will look at the Q.D.A. allocations versus depth:

[Figure: Q.D.A. facies allocations versus depth in the Jones well (not reproduced).]

Note: The equality of the number of categories (facies) and the number of predictor variables (logs) is a purely coincidental aspect of this example.

For all the millions of things I haven't said, see McLachlan, G.J., 1992, Discriminant Analysis and Statistical Pattern Recognition, John Wiley & Sons, Inc., New York, 526 pp.
