Discriminant Analysis (DA)


Discriminant Analysis (DA) involves two main goals:

1) Separation/discrimination: Descriptive Discriminant Analysis (DDA)
2) Classification/allocation: Predictive Discriminant Analysis (PDA)

In DDA:
- Classification of subjects into known groups on the basis of their quantitative characteristics
- Using known groups (classes) and their multiple characteristics to build a model (discriminant rule) that can discriminate among groups; in other words, relating a classification variable to multiple quantitative explanatory variables (p responses)
- The model built may be used to classify new observations into the known groups (PDA)
- The success or error of assigning new observations to known groups depends on the quality of the model built

Differences between DDA and Cluster Analysis
- In DA the grouping is known before the data analysis; we perform the analysis to gain a better understanding of the grouping structure using multiple characteristics
- In cluster analysis the grouping and its structure are not known before the data analysis
- The grouping that results from cluster analysis is only suggestive (not necessarily the true but unknown clustering structure)

Difference between DDA and Regression Analysis
- In DDA the dependent variable is categorical; in regression analysis the dependent variable is continuous

Examples of usage:
- Having a number of subjects who can be classified as having 1) heart disease or 2) no heart disease, together with a set of their medical, physiological, dietary, and other characteristics, to address whether the heart condition can be explained by such characteristics, and if so, to identify a discriminatory model to classify new observations into each of the heart-condition groups (i.e., predict group membership of new subjects) based on the related characteristics
- Credit card companies develop discriminant models based on past records to predict which applicants will be creditworthy or delinquent
- Developing a model to explain the popularity of TV programs (ads, news, etc.) and predict the popularity of a new program
- Males and females, young and adults, conifers and hardwoods, smokers and non-smokers, students successful or unsuccessful in graduating, bankrupt and successful companies (useful for company managers as well as shareholders), attributing an artifact to a civilization or tribe

Assumptions:
- Independence of subjects
- If multivariate normality can be assumed and
  -- the variance/covariance matrices are equal, then linear discriminant analysis should be used
  -- the variance/covariance matrices are not equal, then quadratic discriminant analysis should be used
- If multivariate normality CANNOT be assumed, or the explanatory variables are categorical, then use the SAS GENMOD or LOGISTIC procedures, or the SAS DISCRIM procedure with non-parametric discriminant analysis using the kernel or k-nearest-neighbor method
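A minimal sketch of how this choice can be delegated to SAS itself (the data set MYDATA, class variable GROUP, and predictors X1-X3 are hypothetical placeholders, not from these notes):

proc discrim data=mydata method=normal pool=test slpool=0.05;
   /* POOL=TEST performs a chi-square test of homogeneity of the
      within-group covariance matrices at the SLPOOL= significance level;
      the pooled (linear) rule is used if the test is not rejected,
      the within-group (quadratic) rule if it is rejected */
   class group;
   var x1 x2 x3;
run;

METHOD=NPAR with the KERNEL= or K= option gives the non-parametric alternatives mentioned above.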

A Simple Example of DDA and PDA:

There are two known groups belonging to 2 separate populations (happy and unhappy) and one quantitative variable (income). Furthermore, assume that income is normally distributed with equal variances for the two populations, as below.

[Figure: distributions of income for the two populations; adapted from Afifi & Clark, 1998]

What would be the best criterion, based on the available information (income), to separate the 2 groups (assuming the income means differ significantly, p < 0.01)? In other words, what is the separation (discrimination) value of income (C) for the two groups?

C = (x̄_I + x̄_II) / 2

This can serve as a classification (discrimination) rule to model (predict) future observations:
- an observation with income < C is assigned to the group with the smaller mean income
- an observation with income > C is assigned to the group with the larger mean income

Is it possible to make an error (i.e., misclassification)? Is the probability of misclassification equal for the two groups? How can we calculate the exact error?
- classify the observations based on the rule developed and compare the result to the known classification:

                         Classified as
Known grouping        Happy      Unhappy     %error (1-%correct)
Happy (n = 200)
Unhappy (n = 100)
Total (n = 300)

Is income a good separator of the two groups? How might such error (probability of misclassification) decrease with regard to
- the mean and variance of income?
- the addition of other variables?
- the sample size (# of obs.)?
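A minimal SAS sketch of this single-variable rule (the data set HAPPYDATA with variables GROUP and INCOME, and the two group means, are hypothetical placeholders):

%let mean_I  = 60;    /* assumed mean income of group I (happy)    */
%let mean_II = 40;    /* assumed mean income of group II (unhappy) */

data classified;
   set happydata;                         /* GROUP, INCOME              */
   C = (&mean_I + &mean_II) / 2;          /* cutoff = midpoint of means */
   length pred $7;
   if income > C then pred = 'Happy';
   else pred = 'Unhappy';
run;

/* compare predicted to known grouping to estimate the error rates */
proc freq data=classified;
   tables group*pred;
run;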

Let's add age (x_2) into our analysis and assume normality and equality of variances for age and income for the two groups (I = happy, II = unhappy). The univariate distributions and bivariate concentration ellipses for the two populations may be shown as below.

[Figure: univariate distributions and bivariate concentration ellipses for the two populations; adapted from Afifi & Clark, 1998]

What do the shaded areas indicate? Which analysis (univariate or bivariate) results in greater shaded areas? What is the simplest way of separating the two groups based on both age and income?

What is the mathematical equation of the dividing line z?

z = a_1 x_1 + a_2 x_2

- This was developed first by Fisher (1936) and hence is called Fisher's discriminant function.
- The symbol z here should not be mistaken for the z used as the symbol for a standardized value.
- Fisher calculated the coefficients a_1 and a_2 such that the squared statistical distance (D^2) between the means of the two groups in terms of z values is maximized.
  -- What is the implication of a large D^2? (see the note after the classification discussion below)
- Formulas for their computation can be found in Fisher (1936), Lachenbruch (1975), and Afifi and Azen (1979).
- For each observation within a group, a z value can be calculated using the above formula; then calculate the cutoff C as

C = (z̄_I + z̄_II) / 2

[Figure adapted from Afifi & Clark, 1998]

C can then be used to classify
-- existing observations with their known group membership, and
-- additional observations, to predict their group membership

                         Classified as
Known grouping        Happy      Unhappy     %error (1-%correct)
Happy (n = 200)
Unhappy (n = 100)
Total (n = 300)

Did the addition of the second variable (age) help the discrimination between the two groups?
- It depends on the effects on the error rates: are they decreased?
- Is the overall error acceptable?

How may we improve the discriminant rule further?
- By adding new variables; this cannot be shown graphically, but it holds mathematically.

So, classification may be done based on a single variable, but the classification may not be very accurate. The more variables involved, the smaller the error; but the number of variables that can be involved depends on the number of observations.
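Returning to the D^2 question above: the two-group coefficients and D^2 have a standard closed form (a textbook result, stated here for completeness; S denotes the pooled within-group covariance matrix):

a = S^{-1} (x̄_I - x̄_II)        D^2 = (x̄_I - x̄_II)' S^{-1} (x̄_I - x̄_II)

A large D^2 means the group means are far apart relative to the within-group variability, so the overlap between the groups, and hence the probability of misclassification, is small.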

In general, there are 4 similar ways to develop and use discriminant rules:

1. Fisher's Linear Discriminant Function Rule (presented above)
- for cases when two multivariate normal populations have equal variance/covariance matrices

2. Likelihood Rule
- Rule: Choose g_1 if L(x; μ_1, Σ_1) > L(x; μ_2, Σ_2), and choose g_2 otherwise, where L(x; μ_i, Σ_i) is the likelihood function (the multivariate normal probability density function, presented in earlier lectures)

3. Mahalanobis Distance Rule
- When the two populations have equal variance/covariance matrices, the likelihood rule is equivalent to:
- Rule: Choose g_1 when d_1 < d_2, where d_i = (x - μ_i)' Σ^{-1} (x - μ_i)
- d_i measures how far x is from μ_i (the Mahalanobis squared distance between x and μ_i); the equivalence holds because, with a common Σ, maximizing the normal density is the same as minimizing d_i

4. Posterior Probability Rule, based on Bayes' Theorem
- When the variance/covariance matrices are equal, the quantity P(g_i | x) is defined as

P(g_i | x) = P(x | g_i) P(g_i) / Σ_{i=1}^{k} P(x | g_i) P(g_i)

where:

P(x | g_i) is the probability of observing x assuming the data are from population g_i; in other words, it is
- the proportion of units in population g_i that have a response vector close to x
- it is called the typicality probability
- we use the data to calculate this

P(g_i | x) is the probability of belonging to population g_i conditioned on observing x
- conceptually, P(g_i | x) ≠ P(x | g_i)
- P(g_i | x) is called the posterior probability
- we use Bayes' theorem to calculate this

P(g_i) is the probability of belonging to population g_i
- it is called the prior probability

k is the number of criterion (known) populations

Recall Bayes' theorem from univariate analysis:

P(B | A) = P(A | B) P(B) / P(A), because P(A) P(B | A) = P(B) P(A | B)

Rule: Choose g_1 if P(g_1 | x) > P(g_2 | x), P(g_1 | x) > P(g_3 | x), and so on (i.e., choose the population with the largest posterior probability)

Remark: When the variance/covariance matrices are equal, all four discriminant rules are equivalent.
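A small worked illustration (all numbers invented for illustration only): suppose k = 2 with priors P(g_1) = 0.7 and P(g_2) = 0.3, and for an observed x the typicality probabilities are P(x | g_1) = 0.2 and P(x | g_2) = 0.5. Then

P(g_1 | x) = (0.2)(0.7) / [(0.2)(0.7) + (0.5)(0.3)] = 0.14 / 0.29 ≈ 0.48
P(g_2 | x) = (0.5)(0.3) / 0.29 = 0.15 / 0.29 ≈ 0.52

so the rule assigns x to g_2, even though g_1 has the larger prior probability.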

Note: The following examples are from SAS Help and Documentation, with slight modifications in some cases.

Example 1: Performing a simple discriminant analysis on simulated data

options nocenter ps=35 ls=65 nodate pageno=1;

data a;
   drop n;
   Type = 'H';
   do n = 1 to 20;
      X =     * normal(57391);   /* the numeric constants in the X and Y */
      Y = X   normal(57391);     /* expressions were lost from this copy */
      output;
   end;
   Type = 'C';
   do n = 1 to 30;
      X =     * normal(57391);
      Y = X   normal(57391);
      output;
   end;
run;

symbol1 v='H' c=black;
symbol2 v='C' c=red;
run;

proc print data=a;
run;

[Output of PROC PRINT: the 50 observations (Obs, Type = H or C, X, Y)]

proc gplot;
   plot Y*X=Type / cframe=w nolegend;
run;

[Output: scatter plot of Y versus X with points labeled H and C by Type]

proc discrim data=a all;
   class Type;
   var X Y;
run;

Prior probabilities in the above code are equal by default (as indicated in the output). Anywhere before the RUN statement we may add a PRIORS statement, either PRIORS PROP (proportional to the group sample sizes) or explicit values such as 'C' = 0.7 'H' = 0.3 (or any other desired or hypothesized probabilities); see the sketch after the output below.

The DISCRIM Procedure

Observations   50    DF Total             49
Variables       2    DF Within Classes    48
Classes         2    DF Between Classes    1

Class Level Information

[Output: variable name, frequency, weight, proportion, and prior probability (0.5 each) for Type C and Type H]
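A minimal sketch of the two PRIORS variants mentioned above (the values 0.7/0.3 are the hypothesized ones from the note, not estimates):

proc discrim data=a;
   class Type;
   var X Y;
   priors 'C' = 0.7 'H' = 0.3;   /* or:  priors prop;  */
run;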

[Output: Within-Class SSCP Matrices (Type = C and Type = H), Pooled Within-Class SSCP Matrix, Between-Class SSCP Matrix, and Total-Sample SSCP Matrix, each over X and Y]

[Output: Within-Class Covariance Matrices (Type = C, DF = 29; Type = H, DF = 19), Pooled Within-Class Covariance Matrix (DF = 48), Between-Class Covariance Matrix (DF = 1), and Total-Sample Covariance Matrix (DF = 49), each over X and Y]

[Output: Within-Class (Type = C and Type = H), Pooled Within-Class, Between-Class, and Total-Sample Correlation Coefficients with Pr > |r|; the X-Y correlations in the within-class, pooled, and total-sample tables are significant at < .0001]

[Output: Simple Statistics (N, sum, mean, variance, standard deviation) for X and Y in the total sample, in Type = C, and in Type = H, followed by the Total-Sample and Pooled Within-Class Standardized Class Means for C and H]

Pooled Covariance Matrix Information

[Output: covariance matrix rank and natural log of the determinant of the covariance matrix]

Pairwise Squared Distances Between Groups

D^2(i|j) = (X̄_i - X̄_j)' COV^{-1} (X̄_i - X̄_j)

[Output: squared distances between Type C and Type H]

Univariate Test Statistics

[Output: F statistics (Num DF = 1, Den DF = 48) for X and Y: total, pooled, and between standard deviations, F value, and Pr > F]

Multivariate Statistics and Exact F Statistics

S=1  M=0  N=22.5

[Output: Wilks' Lambda, Pillai's Trace, Hotelling-Lawley Trace, and Roy's Greatest Root, each with its F value, Num DF, Den DF, and Pr > F < .0001]

Linear Discriminant Function

Constant = -0.5 X̄_j' COV^{-1} X̄_j        Coefficient Vector = COV^{-1} X̄_j

Linear Discriminant Function for Type

[Output: constant and X, Y coefficients for Type C and Type H]

Classification Summary for Calibration Data: WORK.A
Resubstitution Summary using Linear Discriminant Function

Posterior Probability of Membership in Each Type

Pr(j|X) = exp(-0.5 D_j^2(X)) / SUM_k exp(-0.5 D_k^2(X))

[Output: number of observations and percent classified into each Type (from C and H), the priors, and the Error Count Estimates for Type C, Type H, and in total]

NOTE: If the prior probabilities are set proportional to the group sample sizes, the total classification error decreases.

To learn how the information given under "Linear Discriminant Function for Type" is used for classification purposes (i.e., to generate the classification table), we may run the following code; the missing coefficients are the constants and the X and Y coefficients from that table:

data b;
   set a;
   class_c = ( *x) + ( *y);   /* constant + coefficients for Type C */
   class_h = ( *x) + ( *y);   /* constant + coefficients for Type H */
   if class_c > class_h then pred_type = 'C';
   else pred_type = 'H';
run;

proc freq data=b;
   tables type*pred_type;
run;
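A hedged alternative sketch that avoids copying the coefficients by hand: the OUT= data set from PROC DISCRIM already contains the assigned class in the automatic variable _INTO_, so the same table can be produced with

proc discrim data=a out=scored noprint;
   class Type;
   var X Y;
run;

proc freq data=scored;
   tables Type*_INTO_;
run;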

Example 2: The iris data (Fisher, 1936) are used. The sepal length, sepal width, petal length, and petal width are measured in millimeters on 50 iris specimens from each of three known species: Iris setosa, I. versicolor, and I. virginica. The first discriminant analysis is performed with a single quantitative variable (petal width) to simplify the output. The GCHART procedure is used to display the sample distribution of petal width in the three species. Note the overlap between species I. versicolor and I. virginica.

data iris;
   input SepalLength SepalWidth PetalLength PetalWidth Species $;
   datalines;
   ... (the 150 data lines are omitted here) ...
;

proc gchart data=iris;
   vbar PetalWidth / subgroup=Species midpoints=0 to 25
                     raxis=axis1 maxis=axis2 legend=legend1 cframe=ligr;
run;

[Output: bar chart of the sample distribution of petal width in the three species]

To use the discriminant model built above to predict the species membership of plants with known petal width but unknown species, 30 plants are simulated and saved in a data set named B using the following code:

data b;
   do plant = 1 to 30;
      PetalWidth = 10 + 4*normal(1);
      output;
   end;
run;

options nocenter ls=75;
proc print data=b noobs;
run;

[Output: listing of the 30 simulated plants (Plant, PetalWidth)]

Data set B can then be used as the test data, via the TESTDATA=, TESTLIST, and TESTID SAS keywords, for predicting the species of the simulated plants.

The following run uses normal-theory methods (METHOD=NORMAL). The CROSSLISTERR option lists the misclassified observations under cross validation and displays cross-validation error-rate estimates. The TESTDATA= option names the data set containing the plants whose species we would like to predict using the discriminant model. The TESTLIST option lists the predicted species membership for each plant in the test data. The TESTID statement indicates the variable that names each observation when its species is predicted; this statement works only if the TESTLIST and/or TESTLISTERR option is used.

Although it is not done in the following run, note that TESTCLASS, TESTDATA, TESTLIST, TESTLISTERR, and TESTID may also be used to split the original data, with known grouping for all observations, into two parts: build the discriminant model on one part, and then test this model on the rest of the data, supplied as TESTDATA (see the sketch after the code below).

proc discrim data=iris method=normal crosslisterr testdata=b testlist;
   class Species;
   var PetalWidth;
   testid Plant;
run;
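A minimal sketch of that train/test use (the 50/50 split and the seed 123 are arbitrary illustrative choices, not from the notes):

data calib test;
   set iris;
   if ranuni(123) < 0.5 then output calib;   /* calibration (training) part */
   else output test;                         /* hold-out (test) part        */
run;

proc discrim data=calib testdata=test testlisterr;
   class Species;
   var PetalWidth;
   testclass Species;   /* known species in TEST enables misclassification counts */
run;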

The DISCRIM Procedure

Total Sample Size   150    DF Total             149
Variables             1    DF Within Classes    147
Classes               3    DF Between Classes     2

Number of Observations Read   150
Number of Observations Used   150

Class Level Information

[Output: variable name, frequency (50 each), weight, proportion, and prior probability for Setosa, Versicolor, and Virginica]

Pooled Covariance Matrix Information

[Output: covariance matrix rank and natural log of the determinant of the covariance matrix]

Pairwise Generalized Squared Distances Between Groups

D^2(i|j) = (X̄_i - X̄_j)' COV^{-1} (X̄_i - X̄_j)

[Output: generalized squared distances among Setosa, Versicolor, and Virginica]

Linear Discriminant Function

Constant = -0.5 X̄_j' COV^{-1} X̄_j        Coefficient Vector = COV^{-1} X̄_j

Linear Discriminant Function for Species

[Output: constant and PetalWidth (Petal Width in mm.) coefficient for Setosa, Versicolor, and Virginica]

Classification Summary for Calibration Data: WORK.IRIS
Resubstitution Summary using Linear Discriminant Function

Generalized Squared Distance Function

D_j^2(X) = (X - X̄_j)' COV^{-1} (X - X̄_j)

Posterior Probability of Membership in Each Species

Pr(j|X) = exp(-0.5 D_j^2(X)) / SUM_k exp(-0.5 D_k^2(X))

[Output: number of observations and percent classified into each species (from Setosa, Versicolor, and Virginica), the priors, and the Error Count Estimates per species and in total]

The DISCRIM Procedure
Classification Results for Calibration Data: WORK.IRIS
Cross-validation Results using Linear Discriminant Function

Generalized Squared Distance Function

D_j^2(X) = (X - X̄_(X)j)' COV_(X)^{-1} (X - X̄_(X)j)

(the subscript (X) indicates that the class mean and covariance matrix are recomputed with observation X left out)

Posterior Probability of Membership in Each Species

Pr(j|X) = exp(-0.5 D_j^2(X)) / SUM_k exp(-0.5 D_k^2(X))

Posterior Probability of Membership in Species

                       Classified
Obs   From Species     into Species      Setosa   Versicolor   Virginica
  5   Virginica        Versicolor *
 ..   Versicolor       Virginica  *
 ..   Virginica        Versicolor *
 ..   Virginica        Versicolor *
 ..   Virginica        Versicolor *
 ..   Versicolor       Virginica  *

* Misclassified observation
(the remaining observation numbers and the posterior probabilities are not reproduced)

The DISCRIM Procedure
Classification Summary for Calibration Data: WORK.IRIS
Cross-validation Summary using Linear Discriminant Function

Generalized Squared Distance Function

D_j^2(X) = (X - X̄_(X)j)' COV_(X)^{-1} (X - X̄_(X)j)

Posterior Probability of Membership in Each Species

Pr(j|X) = exp(-0.5 D_j^2(X)) / SUM_k exp(-0.5 D_k^2(X))

[Output: number of observations and percent classified into each species under cross validation, the priors, and the Error Count Estimates per species and in total]

The DISCRIM Procedure
Classification Results for Test Data: WORK.B
Classification Results using Linear Discriminant Function

Classified into Species (plants 1-30, in order; the posterior probabilities of membership are not reproduced):

Virginica, Versicolor, Versicolor, Setosa, Virginica, Setosa, Versicolor,
Versicolor, Setosa, Versicolor, Setosa, Versicolor, Versicolor, Setosa,
Setosa, Setosa, Versicolor, Versicolor, Setosa, Versicolor, Versicolor,
Versicolor, Versicolor, Versicolor, Setosa, Setosa, Versicolor, Versicolor,
Versicolor, Versicolor

Observation Profile for Test Data

Number of Observations Read   30
Number of Observations Used   30

[Output: number of observations and percent classified into Setosa, Versicolor, and Virginica, with the priors]

Part of the options in PROC DISCRIM, adapted from SAS Help and Documentation:

LISTERR
   displays the resubstitution classification results for misclassified observations only. You can specify this option only when the input data set is an ordinary SAS data set.

NOCLASSIFY
   suppresses the resubstitution classification of the input DATA= data set. You can specify this option only when the input data set is an ordinary SAS data set.

OUT=SAS-data-set
   creates an output SAS data set containing all the data from the DATA= data set, plus the posterior probabilities and the class into which each observation is classified by resubstitution. When you specify the CANONICAL option, the data set also contains new variables with canonical variable scores. See the "OUT= Data Set" section.

OUTCROSS=SAS-data-set
   creates an output SAS data set containing all the data from the DATA= data set, plus the posterior probabilities and the class into which each observation is classified by cross validation. When you specify the CANONICAL option, the data set also contains new variables with canonical variable scores. See the "OUT= Data Set" section.

OUTD=SAS-data-set
   creates an output SAS data set containing all the data from the DATA= data set, plus the group-specific density estimates for each observation. See the "OUT= Data Set" section.

OUTSTAT=SAS-data-set
   creates an output SAS data set containing various statistics such as means, standard deviations, and correlations. When the input data set is an ordinary SAS data set, or when TYPE=CORR, TYPE=COV, TYPE=CSSCP, or TYPE=SSCP, this option can be used to generate discriminant statistics. When you specify the CANONICAL option, canonical correlations, canonical structures, canonical coefficients, and means of canonical variables for each class are included in the data set. If you specify METHOD=NORMAL, the output data set also includes coefficients of the discriminant functions, and the output data set is TYPE=LINEAR (POOL=YES), TYPE=QUAD (POOL=NO), or TYPE=MIXED (POOL=TEST). If you specify METHOD=NPAR, this output data set is TYPE=CORR. This data set also holds calibration information that can be used to classify new observations. See the "Saving and Using Calibration Information" section and the "OUT= Data Set" section.

POSTERR
   displays the posterior probability error-rate estimates of the classification criterion based on the classification results.

TESTDATA=SAS-data-set
   names an ordinary SAS data set with observations that are to be classified. The quantitative variable names in this data set must match those in the DATA= data set. When you specify the TESTDATA= option, you can also specify the TESTCLASS, TESTFREQ, and TESTID statements. When you specify the TESTDATA= option, you can use the TESTOUT= and TESTOUTD= options to generate classification results and group-specific density estimates for observations in the test data set. Note that if the CLASS variable is not present in the TESTDATA= data set, the output will not include misclassification statistics.

TESTLIST
   lists classification results for all observations in the TESTDATA= data set.

TESTLISTERR
   lists only the misclassified observations in the TESTDATA= data set, and only if a TESTCLASS statement is also used.

TESTOUT=SAS-data-set
   creates an output SAS data set containing all the data from the TESTDATA= data set, plus the posterior probabilities and the class into which each observation is classified. When you specify the CANONICAL option, the data set also contains new variables with canonical variable scores. See the "OUT= Data Set" section.

TESTOUTD=SAS-data-set
   creates an output SAS data set containing all the data from the TESTDATA= data set, plus the group-specific density estimates for each observation. See the "OUT= Data Set" section.
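A minimal sketch of TESTOUT= with the simulated plants from Example 2 (like OUT=, the TESTOUT= data set stores the assigned class in the automatic variable _INTO_ plus one posterior-probability variable per class level):

proc discrim data=iris testdata=b testout=b_scored noprint;
   class Species;
   var PetalWidth;
run;

proc print data=b_scored(obs=5);
   var plant PetalWidth _INTO_ Setosa Versicolor Virginica;
run;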

The following is a non-parametric discriminant analysis (METHOD=NPAR). It uses equal bandwidths (smoothing parameters). The value of the radius parameter that, assuming normality, minimizes an approximate mean integrated square error is 0.48 (see the "Nonparametric Methods" section). Choosing r = 0.4 gives a more detailed look at the irregularities in the data. The following statements produce the output below:

proc discrim data=iris method=npar kernel=normal r=.4
             short noclassify crosslisterr;
   class Species;
   var PetalWidth;
   title2 'Using Kernel Density Estimates with Equal Bandwidth';
run;

Output: Kernel Density Estimates with Equal Bandwidth

Discriminant Analysis of Fisher (1936) Iris Data
Using Kernel Density Estimates with Equal Bandwidth

The DISCRIM Procedure

Observations   150    DF Total             149
Variables        1    DF Within Classes    147
Classes          3    DF Between Classes     2

Class Level Information

[Output: variable name, frequency, weight, proportion, and prior probability for Setosa, Versicolor, and Virginica]

Classification Results for Calibration Data: WORK.IRIS
Cross-validation Results using Normal Kernel Density

Posterior Probability of Membership in Species

                       Classified
Obs   From Species     into Species      Setosa   Versicolor   Virginica
  5   Virginica        Versicolor *
 ..   Versicolor       Virginica  *
 ..   Virginica        Versicolor *
 ..   Virginica        Versicolor *
 ..   Virginica        Versicolor *
 ..   Versicolor       Virginica  *

* Misclassified observation

Classification Summary for Calibration Data: WORK.IRIS
Cross-validation Summary using Normal Kernel Density

[Output: number of observations and percent classified into each species, the priors, and the Error Count Estimates per species and in total]
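A minimal sketch of the k-nearest-neighbor alternative mentioned in the assumptions section (k = 5 is an arbitrary illustrative choice, not from the notes):

proc discrim data=iris method=npar k=5 crosslisterr;
   class Species;
   var PetalWidth;
run;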

In the following example:
- All four variables are used.
- POOL=TEST (YES and NO are the other options; YES is the default) tests the homogeneity of the within-group covariance matrices (the null hypothesis). This test rejects the null at the 0.10 level (the SAS default level for declaring the calculated P significant), so separate within-group covariance matrices are used to derive the quadratic discriminant criterion.
- The WCOV and PCOV options display the within-group covariance matrices and the pooled covariance matrix.
- The DISTANCE option displays squared distances between classes.
- The ANOVA and MANOVA options test whether the class means are equal, using ANOVA and MANOVA respectively.
- The LISTERR option lists the misclassified observations under resubstitution.
- The CROSSLISTERR option lists the misclassified observations under cross validation and displays cross-validation error-rate estimates.
- The OUTSTAT= option generates a TYPE=MIXED (because POOL=TEST) output data set containing various statistics such as means, covariances, and coefficients of the discriminant function.

As expected, the resubstitution error-count estimate is smaller than that of the cross-validation method, because resubstitution is optimistically biased.

proc discrim data=iris outstat=irisstat
             wcov pcov method=normal pool=test
             distance anova manova listerr crosslisterr;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

Output: Covariance Matrices

[Output: Within-Class Covariance Matrices for Setosa, Versicolor, and Virginica (each DF = 49), over SepalLength, SepalWidth, PetalLength, and PetalWidth]

[Output: Pooled Within-Class Covariance Matrix (DF = 147) over SepalLength, SepalWidth, PetalLength, and PetalWidth]

Output: Homogeneity Test

Test of Homogeneity of Within Covariance Matrices

Chi-Square      DF      Pr > ChiSq
                        <.0001

Since the chi-square value is significant at the 0.1 level, the within covariance matrices will be used in the discriminant function.

Reference: Morrison, D.F. (1976) Multivariate Statistical Methods.

Output: Squared Distances

Discriminant Analysis of Fisher (1936) Iris Data
Using Quadratic Discriminant Function

The DISCRIM Procedure

[Output: Squared Distance and Generalized Squared Distance to each species from each species (Setosa, Versicolor, Virginica)]

Output: Tests of Equal Class Means

Univariate Test Statistics
F Statistics, Num DF = 2, Den DF = 147

[Output: for each variable (SepalLength, SepalWidth, PetalLength, PetalWidth) — total, pooled, and between standard deviations, R-square, R-square/(1-RSq), F value, and Pr > F (< .0001 for all four variables); average R-square, unweighted and weighted by variance]

Multivariate Statistics and F Approximations

S=2  M=0.5  N=71

[Output: Wilks' Lambda, Pillai's Trace, Hotelling-Lawley Trace, and Roy's Greatest Root, each with its F value, Num DF, Den DF, and Pr > F < .0001]

NOTE: F Statistic for Roy's Greatest Root is an upper bound.
NOTE: F Statistic for Wilks' Lambda is exact.

Output: Misclassified Observations — Resubstitution

Classification Results for Calibration Data: WORK.IRIS
Resubstitution Results using Quadratic Discriminant Function

Posterior Probability of Membership in Species

                       Classified
Obs   From Species     into Species      Setosa   Versicolor   Virginica
  5   Virginica        Versicolor *
 ..   Versicolor       Virginica  *
 ..   Versicolor       Virginica  *

* Misclassified observation

Resubstitution Summary using Quadratic Discriminant Function

[Output: number of observations and percent classified into each species, the priors, and the Error Count Estimates per species and in total]

Output: Misclassified Observations — Cross Validation

Classification Results for Calibration Data: WORK.IRIS
Cross-validation Results using Quadratic Discriminant Function

Posterior Probability of Membership in Species

                       Classified
Obs   From Species     into Species      Setosa   Versicolor   Virginica
  5   Virginica        Versicolor *
 ..   Versicolor       Virginica  *
 ..   Versicolor       Virginica  *
 ..   Versicolor       Virginica  *

* Misclassified observation

[Output: number of observations and percent classified into each species under cross validation, the priors, and the Error Count Estimates per species and in total]

Note that if we assume homogeneity of the variance/covariance matrices (POOL=YES), the error-count estimate for the resubstitution method remains the same, but for the cross-validation method it becomes 0.02.
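A minimal sketch of that comparison run (the same analysis, forcing the pooled covariance matrix):

proc discrim data=iris method=normal pool=yes crosslisterr;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;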
