Applied Multivariate Analysis

Department of Mathematics and Statistics, University of Vaasa, Finland Spring 2017

Discriminant Analysis

1 Discriminant analysis
   Background
   General Setup for the Discriminant Analysis
   Descriptive Discriminant Analysis
   Number of Discriminant Functions

Background

Example 1. Consider the following data on financial ratios for solvent and bankrupt companies.

Financial Ratios of Bankrupt and Solvent Companies, Altman (1968). Source: Morrison (1990), Multivariate Statistical Methods, 3rd ed., McGraw-Hill.

X1 = Working Capital / Total Assets
X2 = Retained Earnings / Total Assets
X3 = Earnings Before Interest and Taxes / Total Assets
X4 = Market Value of Equity / Total Value of Liabilities
X5 = Sales / Total Assets
Group: 1 = Bankrupt, 2 = Solvent

Group      X1      X2      X3      X4    X5
1        36.7   -62.8   -89.5    54.1   1.7
1        25.2   -11.4     4.8     7.0   0.9
1        24.0     3.3    -3.5    20.9   1.1
1       -61.6  -120.8  -103.2    24.7   2.5
1        -1.0   -18.1   -28.8    36.2   1.1
1        18.9    -3.8   -50.6    26.4   0.9
1       -57.2   -61.2   -56.6    11.0   1.7
1         3.0   -20.3   -17.4     8.0   1.0
1        -5.1  -194.5   -25.8     6.5   0.5
1        17.9    20.8    -4.3    22.6   1.0
1         5.4  -106.1   -22.9    23.8   1.5
1        23.0   -39.4   -35.7    69.1   1.2
1       -67.6  -164.1   -17.7     8.7   1.3
1      -185.1  -308.9   -65.8    35.7   0.8
1        13.5     7.2   -22.6    96.1   2.0
1        -5.7  -118.3   -34.2    21.7   1.5
1        72.4  -185.9  -280.0    12.5   6.7
1        17.0   -34.6   -19.4    35.5   3.4
1       -31.2   -27.9     6.3     7.0   1.3
1        14.1   -48.2     6.8    16.6   1.6
1       -60.6   -49.2   -17.2     7.2   0.3
1        26.2   -19.2   -36.7    90.4   0.8
1         7.0   -18.1    -6.5    16.5   0.9
1        53.1   -98.0   -20.8    26.6   1.7
1       -17.2  -129.0   -14.2   267.9   1.3
1        32.7    -4.0   -15.8   177.4   2.1
1        26.7    -8.7   -36.3    32.5   2.8
1        -7.7   -59.2   -12.8    21.3   2.1
1        18.0   -13.1   -17.6    14.6   0.9
1         2.0   -38.0     1.6     7.7   1.2
1       -35.3   -57.9     0.7    13.7   0.8
1         5.1    -8.8    -9.1   100.9   0.9
1         0.0   -64.7    -4.0     0.7   0.1
1        25.2   -11.4     4.8     7.0   0.9

Seppo Pynnönen

Group      X1      X2      X3      X4    X5
2        35.2    43.0    16.4    99.1   1.3
2        38.8    47.0    16.0   126.5   1.9
2        14.0    -3.3     4.0    91.7   2.7
2        55.1    35.0    20.8    72.3   1.9
2        59.3    46.7    12.6   724.1   0.9
2        33.6    20.8    12.5   152.8   2.4
2        52.8    33.0    23.6   475.9   1.5
2        45.6    26.1    10.4   287.9   2.1
2        47.4    68.6    13.8   581.3   1.6
2        40.0    37.3    33.4   228.8   3.5
2        69.0    59.0    23.1   406.0   5.5
2        34.2    49.6    23.8   126.6   1.9
2        47.0    12.5     7.0    53.4   1.8
2        15.4    37.3    34.1   570.1   1.5
2        56.9    35.3     4.2   240.3   0.9
2        43.8    49.5    25.1   115.0   2.6
2        20.7    18.1    13.5    63.1   4.0
2        33.8    31.4    15.7   144.8   1.9
2        35.8    21.5   -14.4    90.0   1.0
2        24.4     8.5     5.8   149.1   1.5
2        48.9    40.6     5.8    82.0   1.8
2        49.9    34.6    26.4   310.0   1.8
2        54.8    19.9    26.7   239.9   2.3
2        39.0    17.4    12.6    60.5   1.3
2        53.0    54.7    14.6   771.7   1.7
2        20.1    53.5    20.6   307.5   1.1
2        53.7    35.6    26.4   289.5   2.0
2        46.1    39.4    30.5   700.0   1.9
2        48.3    53.1     7.1   164.4   1.9
2        46.7    39.8    13.8   229.1   1.2
2        60.3    59.5     7.0   226.6   2.0
2        17.9    16.3    20.4   105.6   1.0
2        24.7    21.7    -7.8   118.6   1.6

Relevant questions then are:
How do the companies in these two groups differ from each other?
Which ratios best discriminate between the groups?
Are the ratios useful for predicting bankruptcies?

Partial answers can be obtained by examining one variable at a time.

For example, sample statistics can be computed for each group. [Table not reproduced in the transcription.]

Some graphics may also be helpful. [Figure not reproduced in the transcription.]

A more complete use of the group separation information, however, is provided by discriminant analysis (DA).

General Setup for the Discriminant Analysis

Discriminant analysis is used for two purposes: (1) describing the major differences among the groups, and (2) classifying subjects on the basis of measurements.

Descriptive Discriminant Analysis

The starting setup: p variables and q mutually exclusive groups.

The goal of descriptive DA is to form k new variables such that
1 The new variables are uncorrelated.
2 The first new variable has the best discriminating power with respect to the given groups. The second new variable has the second best discriminating power and is uncorrelated with the first one, the third has the third best discriminating power and is uncorrelated with the previous ones, etc.

Remark 1. $k \le \min(p, q - 1)$. For example, if q = 2 then k = min(p, 1) = 1.

More precisely, suppose we have observations on random variables $x_1, \ldots, x_p$ from q groups. Then the j-th discriminant function is defined as a linear combination of the original variables
$y_j = a_{j1} x_1 + \cdots + a_{jp} x_p$, (1)
such that $\mathrm{corr}[y_j, y_l] = 0$ for $j \ne l$, and $y_1$ has the best discriminating power, $y_2$ the second best, and so on.

Remark 2. In the basic case the assumption is that the groups differ only with respect to the means of the variables. Consequently, the variances of the variables and the correlations between them are assumed to be the same in all groups (the groups have similar covariance structures).

The idea in deriving the discriminant functions is to divide the total variation into between-group and within-group variation
$T = B + W$, (2)
where T denotes the total covariance matrix, B the between covariance matrix, and W the within covariance matrix. (The identity is exact for the corresponding sums of squares and cross-products, that is, before dividing by the degrees of freedom.)

Technically the problem again reduces to an eigenvalue problem. In this case the eigenvalues are extracted from the matrix
$B W^{-1}$. (3)
The resulting eigenvectors give the coefficients of the discriminant functions $y_j$, $j = 1, \ldots, k$, with $k = \min(q - 1, p)$. The functions are called canonical discriminant functions.
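As a minimal sketch of equations (2) and (3), the following pure-Python example builds the matrices for a made-up data set with two groups and two variables (the numbers are invented for illustration and are not from the lecture). It works with sums of squares and cross-products, for which T = B + W holds exactly, and computes the eigenvalues of W^{-1}B, which are the same as those of BW^{-1}. With q = 2 groups only one eigenvalue is nonzero, in line with k = min(p, q - 1) = 1.

```python
from math import sqrt

# Toy data invented for illustration: q = 2 groups, p = 2 variables.
groups = {
    1: [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0), (2.0, 2.0)],
    2: [(5.0, 4.0), (6.0, 6.0), (7.0, 5.0), (6.0, 5.0)],
}


def mean(rows):
    """Column means of a list of (x1, x2) observations."""
    return [sum(r[j] for r in rows) / len(rows) for j in range(2)]


def sscp(rows, center, weight=1.0):
    """Weighted sum of outer products of deviations from `center`."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - center[0], r[1] - center[1]]
        for i in range(2):
            for j in range(2):
                m[i][j] += weight * d[i] * d[j]
    return m


def add(a, b):
    """Element-wise sum of two 2 x 2 matrices."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


all_rows = [r for rows in groups.values() for r in rows]
grand = mean(all_rows)

T = sscp(all_rows, grand)          # total variation
W = [[0.0, 0.0], [0.0, 0.0]]       # within-group variation
B = [[0.0, 0.0], [0.0, 0.0]]       # between-group variation
for rows in groups.values():
    m = mean(rows)
    W = add(W, sscp(rows, m))
    B = add(B, sscp([m], grand, weight=len(rows)))

# Invert the 2 x 2 matrix W and form M = W^{-1} B.
det_w = W[0][0] * W[1][1] - W[0][1] * W[1][0]
W_inv = [[W[1][1] / det_w, -W[0][1] / det_w],
         [-W[1][0] / det_w, W[0][0] / det_w]]
M = [[sum(W_inv[i][k] * B[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]

# Eigenvalues of a 2 x 2 matrix from its trace and determinant.
tr = M[0][0] + M[1][1]
det_m = M[0][0] * M[1][1] - M[0][1] * M[1][0]
root = sqrt(max(tr * tr - 4.0 * det_m, 0.0))
eigenvalues = [(tr + root) / 2.0, (tr - root) / 2.0]
print(eigenvalues)
```

The second eigenvalue is zero up to rounding because B has rank q - 1 = 1 here.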

Example 2. Consider the bankruptcy data. The analysis can be run with SAS proc candisc or in SPSS (Analyze > Classify > Discriminant). Below are the SAS results.

Example: Discriminant analysis applied to bankrupt data

Canonical Discriminant Analysis

66 Observations    65 DF Total
 5 Variables       64 DF Within Classes
 2 Classes          1 DF Between Classes

Class Level Information

GROUP   Frequency    Weight   Proportion
1              33   33.0000     0.500000
2              33   33.0000     0.500000

Canonical Discriminant Analysis
Within-Class Covariance Matrices

GROUP = 1    DF = 32

Variable          X1          X2          X3          X4          X5
X1         2104.5659   1834.1637   -266.4029    249.8980     18.0357
X2         1834.1637   5085.4767   1632.2018    177.7665    -15.6653
X3         -266.4029   1632.2018   2637.1822    168.3066    -46.6066
X4          249.8980    177.7665    168.3066   3018.2188      1.6108
X5           18.0357    -15.6653    -46.6066      1.6108      1.3509

GROUP = 2    DF = 32

Variable          X1          X2          X3          X4          X5
X1           201.986     117.413      16.740     974.165       1.921
X2           117.413     272.496      52.076    1630.092       0.879
X3            16.740      52.076     118.108     814.591       2.762
X4           974.165    1630.092     814.591   42669.190     -14.529
X5             1.921       0.879       2.762     -14.529       0.865
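The two within-class matrices can be combined into a pooled within-class matrix by weighting each with its degrees of freedom. The sketch below does this for the diagonal elements only, using the variances printed above; the variable names are illustrative. The resulting standard deviations can be compared with the "Pooled STD" column of the univariate test statistics in the same SAS output, and the variance ratios give a first impression of how different the two groups are.

```python
from math import sqrt

# Diagonal elements (variances) of the two within-class covariance
# matrices as printed in the SAS output, for X1..X5.
var_group1 = [2104.5659, 5085.4767, 2637.1822, 3018.2188, 1.3509]
var_group2 = [201.986, 272.496, 118.108, 42669.190, 0.865]
df1, df2 = 32, 32  # within-class degrees of freedom of each group

# Pooled variance: average of the group variances weighted by their df.
pooled_var = [(df1 * v1 + df2 * v2) / (df1 + df2)
              for v1, v2 in zip(var_group1, var_group2)]
pooled_std = [sqrt(v) for v in pooled_var]

# Ratio of the larger to the smaller group variance for each variable.
variance_ratio = [max(v1, v2) / min(v1, v2)
                  for v1, v2 in zip(var_group1, var_group2)]

for name, s, r in zip(["X1", "X2", "X3", "X4", "X5"], pooled_std, variance_ratio):
    print(f"{name}: pooled std = {s:.4f}, variance ratio = {r:.1f}")
```

Large variance ratios, such as those for X1 to X4, are an informal warning that the equal-covariance assumption of Remark 2 may not hold for these data.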

Canonical Discriminant Analysis
Simple Statistics

Total-Sample

Variable    N         Mean     Variance      Std Dev
X1         66     19.28485         1632     40.39972
X2         66    -13.63485         5064     71.15836
X3         66     -8.23182         1920     43.81308
X4         66    147.35909        34186    184.89362
X5         66      1.72121      1.13924      1.06735

GROUP = 1

Variable    N         Mean     Variance      Std Dev
X1         33     -2.83030         2105     45.87555
X2         33    -62.51212         5085     71.31253
X3         33    -31.78182         2637     51.35350
X4         33     40.04545         3018     54.93832
X5         33      1.50303      1.35093      1.16229

GROUP = 2

Variable    N         Mean     Variance      Std Dev
X1         33     41.40000    201.98563     14.21216
X2         33     35.24242    272.49627     16.50746
X3         33     15.31818    118.10841     10.86777
X4         33    254.67273        42669    206.56522
X5         33      1.93939      0.86496      0.93003

Univariate Test Statistics
F Statistics, Num DF = 1, Den DF = 64

Variable   Total STD   Pooled STD   Between STD   R-Squared   RSQ/(1-RSQ)
X1           40.3997      33.9599       31.2755    0.304266        0.4373
X2           71.1584      51.7589       69.1229    0.479063        0.9196
X3           43.8131      37.1166       33.3047    0.293363        0.4152
X4          184.8936     151.1413      151.7644    0.342055        0.5199
X5            1.0673       1.0526        0.3086    0.042428        0.0443

Variable         F    Pr > F
X1         27.9892    0.0001
X2         58.8555    0.0001
X3         26.5698    0.0001
X4         33.2726    0.0001
X5          2.8357    0.0971

Average R-Squared: Unweighted = 0.2922351, Weighted by Variance = 0.3546308

Multivariate Statistics and Exact F Statistics
S=1  M=1.5  N=29

Statistic                       Value         F   Num DF   Den DF   Pr > F
Wilks' Lambda             0.369760775   20.4534        5       60   0.0001
Pillai's Trace            0.630239225   20.4534        5       60   0.0001
Hotelling-Lawley Trace    1.704451275   20.4534        5       60   0.0001
Roy's Greatest Root       1.704451275   20.4534        5       60   0.0001
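The univariate F statistics follow directly from the R-squared column: F = [RSQ/(1-RSQ)] x (Den DF / Num DF). The short sketch below recomputes them from the printed R-squared values; the function name `f_from_r2` is illustrative.

```python
# R-squared values from the "Univariate Test Statistics" table.
r_squared = {"X1": 0.304266, "X2": 0.479063, "X3": 0.293363,
             "X4": 0.342055, "X5": 0.042428}
num_df, den_df = 1, 64  # q - 1 and N - q for N = 66 observations, q = 2 groups


def f_from_r2(r2, num_df, den_df):
    """F statistic implied by the between-group share r2 of total variance."""
    return (r2 / (1.0 - r2)) * (den_df / num_df)


f_values = {name: f_from_r2(r2, num_df, den_df)
            for name, r2 in r_squared.items()}
for name, f in f_values.items():
    print(f"{name}: F = {f:.4f}")
```

Up to rounding of the inputs, the results agree with the F column of the SAS output.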

Example: Discriminant analysis applied to bankrupt data

Canonical Discriminant Analysis

      Canonical     Adjusted Canonical   Approx Standard   Squared Canonical
      Correlation   Correlation          Error             Correlation
1     0.793876      0.781803             0.045863          0.630239

Eigenvalues of INV(E)*H = CanRsq/(1-CanRsq)

      Eigenvalue   Difference   Proportion   Cumulative
1         1.7045            .       1.0000       1.0000

Test of H0: The canonical correlations in the current row and all that follow are zero

      Likelihood Ratio   Approx F   Num DF   Den DF   Pr > F
1           0.36976078    20.4534        5       60   0.0001

NOTE: The F statistic is exact.

Total Canonical Structure

          CAN1
X1    0.694823
X2    0.871854
X3    0.682260
X4    0.736708
X5    0.259462
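With a single discriminant function the multivariate statistics are all functions of the squared canonical correlation. A minimal sketch, using the value printed in the output (the variable names are illustrative):

```python
# Squared canonical correlation from the SAS output (Pillai's trace here).
canrsq = 0.630239225

# Eigenvalue of INV(E)*H, as stated in the output header.
eigenvalue = canrsq / (1.0 - canrsq)

# Wilks' lambda is the product of 1 / (1 + eigenvalue) over all functions;
# with one function this equals 1 - canrsq.
wilks = 1.0 / (1.0 + eigenvalue)

# Exact F for two groups: numerator df = p, denominator df = N - p - 1,
# with N = 66 observations and p = 5 variables.
num_df, den_df = 5, 60
f_exact = ((1.0 - wilks) / wilks) * (den_df / num_df)

print(f"eigenvalue = {eigenvalue:.6f}")
print(f"Wilks' lambda = {wilks:.6f}")
print(f"F = {f_exact:.4f}")
```

This is why all four multivariate tests report the same F value in the two-group case.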

Between Canonical Structure

          CAN1
X1    1.000000
X2    1.000000
X3    1.000000
X4    1.000000
X5    1.000000

Pooled Within Canonical Structure

          CAN1
X1    0.506539
X2    0.734533
X3    0.493528
X4    0.552283
X5    0.161231

Total-Sample Standardized Canonical Coefficients

              CAN1
X1    0.1404518774
X2    0.6028563830
X3    0.6695203123
X4    0.5616859665
X5    0.5320432994

Pooled Within-Class Standardized Canonical Coefficients

              CAN1
X1    0.1180635365
X2    0.4385036080
X3    0.5671902048
X4    0.4591503359
X5    0.5246858501

Raw Canonical Coefficients

              CAN1
X1    0.0034765558
X2    0.0084720383
X3    0.0152812900
X4    0.0030378872
X5    0.4984713894

Class Means on Canonical Variables

GROUP            CAN1
1        -1.285613175
2         1.285613175
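The class means tie back to the eigenvalue reported earlier in the output. The sketch below assumes that the canonical variable is scaled to have unit pooled within-class variance, which is the scaling used by proc candisc, so that its within-group sum of squares equals N - q; the variable names are illustrative.

```python
# Class means on CAN1 and group sizes from the SAS output.
class_means = {1: -1.285613175, 2: 1.285613175}
group_sizes = {1: 33, 2: 33}
n_total = sum(group_sizes.values())   # N = 66
n_groups = len(group_sizes)           # q = 2

grand_mean = sum(group_sizes[g] * class_means[g] for g in class_means) / n_total

# Between-group sum of squares of the canonical variable.
between_ss = sum(group_sizes[g] * (class_means[g] - grand_mean) ** 2
                 for g in class_means)

# Assumption: pooled within-class variance of CAN1 is 1, so the
# within-group sum of squares equals the within degrees of freedom.
within_ss = float(n_total - n_groups)

eigenvalue = between_ss / within_ss
print(f"eigenvalue = {eigenvalue:.6f}")
```

The ratio of between to within variation on CAN1 reproduces the eigenvalue 1.7045, which is exactly the quantity the first discriminant function maximizes.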

The output includes several coefficient matrices. The structure matrices describe the correlations of the original variables with the discriminant function. The most useful of these for interpretation purposes is the within canonical structure. In the case of multiple groups the between canonical structure may also give useful additional information. This structure tells how the means of the variables and the means of the discriminant functions are correlated.

The standardized coefficients are obtained by multiplying the raw coefficients by the standard deviations of the variables. These coefficients give the marginal effect of the (standardized) variable on the discriminant function. Labeling the discriminant function is based on the variables with the largest correlations and the largest standardized coefficients.
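The relation between the raw and the standardized coefficients can be checked against the printed output: each standardized coefficient equals the raw coefficient times the corresponding standard deviation (total-sample or pooled within-class). A minimal sketch with the values from the SAS tables; the list names are illustrative.

```python
# Raw canonical coefficients and standard deviations for X1..X5,
# copied from the SAS output.
raw = [0.0034765558, 0.0084720383, 0.0152812900, 0.0030378872, 0.4984713894]
total_std = [40.3997, 71.1584, 43.8131, 184.8936, 1.0673]
pooled_std = [33.9599, 51.7589, 37.1166, 151.1413, 1.0526]

# Standardized coefficient = raw coefficient * standard deviation.
total_standardized = [c * s for c, s in zip(raw, total_std)]
pooled_standardized = [c * s for c, s in zip(raw, pooled_std)]

for i, name in enumerate(["X1", "X2", "X3", "X4", "X5"]):
    print(f"{name}: total = {total_standardized[i]:.4f}, "
          f"pooled = {pooled_standardized[i]:.4f}")
```

The products agree with the two standardized coefficient tables up to the rounding of the printed standard deviations. X5 has a large raw coefficient only because its scale is small; after standardization it is comparable to the other variables.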

Example 3. From the within canonical structure we observe:
X2 (Retained Earnings / Total Assets) has the highest correlation with the discriminant function.
X4 (Market Value of Equity / Total Value of Liabilities), X1 (Working Capital / Total Assets), and X3 (Earnings Before Interest and Taxes / Total Assets) have the next highest correlations.
X5 (Sales / Total Assets) has a small correlation, but a large standardized coefficient.

Summing up, profitability and a high market value are the properties that protect a company from bankruptcy.

It should be noted that the basic assumption in discriminant analysis is that the variables are normally distributed in each of the groups, and that the covariance matrices are the same. The former assumption is harder to test. The latter is easier (in SPSS select Box's M from the options). If the covariance matrices are not the same, the linear discriminant function analysis is invalid, and one should move to quadratic discriminant function analysis. This method, however, is intended for classification purposes.

Example 4. Testing for the equality of the population covariance matrices:
$H_0: \Sigma_1 = \Sigma_2$, (4)
where $\Sigma_i$ is the covariance matrix of population i (i = 1, 2). SPSS gives the result:

Test Chi-Square Value = 186.18 with 15 degrees of freedom and p-value = 0.0001

The null hypothesis is rejected, hence the analysis results should be interpreted with caution.
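The 15 degrees of freedom can be reproduced from the usual chi-square approximation to Box's M, whose degrees of freedom are p(p + 1)(q - 1)/2 for p variables and q groups. A small sketch; the function name `box_m_df` is illustrative.

```python
def box_m_df(p, q):
    """Degrees of freedom of the chi-square approximation to Box's M:
    the number of distinct covariance elements, p (p + 1) / 2,
    times the number of groups minus one."""
    return p * (p + 1) * (q - 1) // 2


# Bankruptcy data: p = 5 ratios, q = 2 groups.
print(box_m_df(5, 2))  # 15
```

Each group contributes p(p + 1)/2 = 15 distinct variances and covariances, and under the null hypothesis one common set is estimated instead of two.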

Number of Discriminant Functions

In the case of multiple groups (q > 2) the question is: in how many dimensions do the groups differ? In the case of two groups this is not a major problem, because the groups can differ in only one dimension. Generally, however, there can be more discriminating dimensions if q > 2.

Example 5. The following data are a classic example concerning three species of iris (Setosa, Versicolor and Virginica). The following measurements were made:
SL: Sepal Length
SW: Sepal Width
PL: Petal Length
PW: Petal Width

The CANDISC procedure produces the following results. The data step is:

    title;
    data iris;
       title 'Discriminant Analysis of Fisher (1936) Iris Data';
       input sepallen sepalwid petallen petalwid spec_no @@;
       /* pad the first value so that SPECIES is long enough for 'VERSICOLOR' */
       if spec_no=1 then species='SETOSA    ';
       if spec_no=2 then species='VERSICOLOR';
       if spec_no=3 then species='VIRGINICA ';
       label sepallen='Sepal Length in mm.'
             sepalwid='Sepal Width in mm.'
             petallen='Petal Length in mm.'
             petalwid='Petal Width in mm.';
       datalines;
    50 33 14 02 1 64 28 56 22 3 65 28 46 15 2 67 31 56 24 3 63 28 51 15 3
    46 34 14 03 1 69 31 51 23 3 62 22 45 15 2 59 32 48 18 2 46 36 10 02 1
    61 30 46 14 2 60 27 51 16 2 65 30 52 20 3 56 25 39 11 2 65 30 55 18 3
    58 27 51 19 3 68 32 59 23 3 51 33 17 05 1 57 28 45 13 2 62 34 54 23 3
    77 38 67 22 3 63 33 47 16 2 67 33 57 25 3 76 30 66 21 3 49 25 45 17 3
    55 35 13 02 1 67 30 52 23 3 70 32 47 14 2 64 32 45 15 2 61 28 40 13 2
    48 31 16 02 1 59 30 51 18 3 55 24 38 11 2 63 25 50 19 3 64 32 53 23 3
    52 34 14 02 1 49 36 14 01 1 54 30 45 15 2 79 38 64 20 3 44 32 13 02 1
    67 33 57 21 3 50 35 16 06 1 58 26 40 12 2 44 30 13 02 1 77 28 67 20 3
    63 27 49 18 3 47 32 16 02 1 55 26 44 12 2 50 23 33 10 2 72 32 60 18 3
    48 30 14 03 1 51 38 16 02 1 61 30 49 18 3 48 34 19 02 1 50 30 16 02 1
    50 32 12 02 1 61 26 56 14 3 64 28 56 21 3 43 30 11 01 1 58 40 12 02 1
    51 38 19 04 1 67 31 44 14 2 62 28 48 18 3 49 30 14 02 1 51 35 14 02 1
    56 30 45 15 2 58 27 41 10 2 50 34 16 04 1
    ;

Only the first 68 of the 150 observations are listed here.

    title 'Canonical Discriminant Analysis of IRIS data';
    proc candisc data=iris;
       class species;
       var sepallen--petalwid;
    run;

This gives the results:

Canonical Discriminant Analysis of IRIS data

Canonical Discriminant Analysis

150 Observations    149 DF Total
  4 Variables       147 DF Within Classes
  3 Classes           2 DF Between Classes

Class Level Information

SPECIES       Frequency    Weight   Proportion
SETOSA               50   50.0000     0.333333
VERSICOLOR           50   50.0000     0.333333
VIRGINICA            50   50.0000     0.333333

Canonical Discriminant Analysis
Multivariate Statistics and F Approximations
S=2  M=0.5  N=71

Statistic                       Value         F   Num DF   Den DF   Pr > F
Wilks' Lambda             0.023438631   199.145        8      288   0.0001
Pillai's Trace            1.191898825   53.4665        8      290   0.0001
Hotelling-Lawley Trace    32.47732024   580.532        8      286   0.0001
Roy's Greatest Root        32.1919292   1166.96        4      145   0.0001

NOTE: F Statistic for Roy's Greatest Root is an upper bound.
NOTE: F Statistic for Wilks' Lambda is exact.

      Canonical     Adjusted Canonical   Approx Standard   Squared Canonical
      Correlation   Correlation          Error             Correlation
1     0.984821      0.984508             0.002468          0.969872
2     0.471197      0.461445             0.063734          0.222027

Eigenvalues of INV(E)*H = CanRsq/(1-CanRsq)

      Eigenvalue   Difference   Proportion   Cumulative
1        32.1919      31.9065       0.9912       0.9912
2         0.2854            .       0.0088       1.0000

Test of H0: The canonical correlations in the current row and all that follow are zero

      Likelihood Ratio   Approx F   Num DF   Den DF   Pr > F
1           0.02343863   199.1453        8      288   0.0001
2           0.77797337    13.7939        3      145   0.0001

Total Canonical Structure

                 CAN1        CAN2
SEPALLEN     0.791888    0.217593   Sepal Length in mm.
SEPALWID    -0.530759    0.757989   Sepal Width in mm.
PETALLEN     0.984951    0.046037   Petal Length in mm.
PETALWID     0.972812    0.222902   Petal Width in mm.
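The eigenvalues, their proportions and the sequential likelihood ratios in this output can all be recomputed from the two squared canonical correlations. A minimal sketch; the function name `likelihood_ratio` is illustrative, and small differences from the printed values come from rounding of the inputs.

```python
# Squared canonical correlations from the SAS output (iris data).
canrsq = [0.969872, 0.222027]

# Eigenvalues of INV(E)*H and the share of each in their sum.
eigenvalues = [r / (1.0 - r) for r in canrsq]
proportions = [e / sum(eigenvalues) for e in eigenvalues]


def likelihood_ratio(canrsq, start):
    """Wilks-type likelihood ratio for H0: the canonical correlations
    from position `start` onward are all zero."""
    value = 1.0
    for r in canrsq[start:]:
        value *= 1.0 - r
    return value


ratios = [likelihood_ratio(canrsq, m) for m in range(len(canrsq))]
pillai = sum(canrsq)

print("eigenvalues:", [round(e, 4) for e in eigenvalues])
print("proportions:", [round(p, 4) for p in proportions])
print("likelihood ratios:", [round(v, 6) for v in ratios])
print("Pillai's trace:", round(pillai, 6))
```

The first likelihood ratio is Wilks' lambda for all functions together; the second tests whether the second function alone still discriminates after the first has been removed.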

Between Canonical Structure

                 CAN1        CAN2
SEPALLEN     0.991468    0.130348   Sepal Length in mm.
SEPALWID    -0.825658    0.564171   Sepal Width in mm.
PETALLEN     0.999750    0.022358   Petal Length in mm.
PETALWID     0.994044    0.108977   Petal Width in mm.

Pooled Within Canonical Structure

                 CAN1        CAN2
SEPALLEN     0.222596    0.310812   Sepal Length in mm.
SEPALWID    -0.119012    0.863681   Sepal Width in mm.
PETALLEN     0.706065    0.167701   Petal Length in mm.
PETALWID     0.633178    0.737242   Petal Width in mm.

Total-Sample Standardized Canonical Coefficients

                     CAN1            CAN2
SEPALLEN     -0.686779533     0.019958173   Sepal Length in mm.
SEPALWID     -0.668825075     0.943441829   Sepal Width in mm.
PETALLEN      3.885795047    -1.645118866   Petal Length in mm.
PETALWID      2.142238715     2.164135931   Petal Width in mm.

Pooled Within-Class Standardized Canonical Coefficients

                     CAN1            CAN2
SEPALLEN     -.4269548486    0.0124075316   Sepal Length in mm.
SEPALWID     -.5212416758    0.7352613085   Sepal Width in mm.
PETALLEN     0.9472572487    -.4010378190   Petal Length in mm.
PETALWID     0.5751607719    0.5810398645   Petal Width in mm.

Raw Canonical Coefficients

                     CAN1            CAN2
SEPALLEN     -.0829377642    0.0024102149   Sepal Length in mm.
SEPALWID     -.1534473068    0.2164521235   Sepal Width in mm.
PETALLEN     0.2201211656    -.0931921210   Petal Length in mm.
PETALWID     0.2810460309    0.2839187853   Petal Width in mm.

Class Means on Canonical Variables

SPECIES               CAN1            CAN2
SETOSA        -7.607599927     0.215133017
VERSICOLOR     1.825049490    -0.727899622
VIRGINICA      5.782550437     0.512766605

The Wilks' lambda test indicates that there are two statistically significant discriminators at the five percent level. Generally the hypotheses to be tested are, as in factor analysis,
$H_0$: The number of discriminators = m
$H_1$: More are needed (5)
On the basis of the within-matrices, the first discriminator indicates that the species differ with respect to the overall size of the leaves, and the second discriminator that the species also differ with respect to the width of the leaves.

Example 9.6: Bankruptcy risk and signal to reorganization of a company (Laitinen, Luoma, Pynnönen 1996, UV, Discussion Papers 200)

Thus we have four groups.

Sample statistics:

Table 7. Descriptive statistics of groups for estimation data.

             B1 (n=20)          B2 (n=20)          N3 (n=17)          N4 (n=23)       F for eq.
Variable    Mean   Std Dev     Mean   Std Dev     Mean   Std Dev     Mean   Std Dev   of means
ROI       -10.24     8.60      3.52     5.59      2.27     7.14     12.02     5.96    37.66***
TCF       -13.32    10.83      0.13     2.31      0.97     5.00      6.47     5.67    32.48***
QRA         0.58     0.39      0.57     0.55      1.14     0.70      0.85     0.42     4.95**
SCA        -0.61    20.22     -4.75    18.79     13.62    13.19     23.13    19.55    10.39***
DSR         1.09     0.55      0.69     0.25      0.88     0.34      0.57     0.28     7.62***

** = significant at level 0.01, *** = significant at level 0.001

Number of canonical discriminant functions: The results indicate that the third canonical discriminant function is also statistically significant.

Canonical structure and standardized coefficients:

Table 11. Canonical structure and standardized canonical coefficients, both as pooled within.

            Canonical structure*        Standardized coefficient
Variable    CAN1     CAN2     CAN3      CAN1     CAN2     CAN3
ROI        0.702    0.036    0.004     0.717    0.013   -0.737
TCF        0.643    0.059    0.467     0.372   -0.458    0.983
QRA        0.101    0.513    0.653    -0.061    0.563    0.661
SCA        0.252    0.773   -0.168     0.169    0.946   -0.522
DSR       -0.306    0.203    0.149    -0.722    0.034    0.16

* Correlation coefficients between original variables and canonical variables.

Interpretation of the discriminant functions: [figure not reproduced in the transcription]

Group differences: [figure not reproduced in the transcription]

CAN1, the financial performance, shows that financial performance is the main characteristic differentiating healthy and bankrupt firms (as expected).
CAN2, the contrast between dynamic liquidity and static ratios, is the characteristic differentiating reorganizable non-bankrupt firms from reorganizable bankrupt firms.
CAN3, the contrast between liquidity and the other ratios, differentiates reorganizable non-bankrupt firms from healthy firms. The distinction is probably due to the fact that non-bankrupt firms may have cash reserves (high liquidity) but do not use them profitably.