Discriminant Analysis

V.Čekanavičius, G.Murauskas

Discriminant analysis
Discriminant analysis models how one categorical variable depends on one or more normally distributed variables. It can also be used for forecasting.

Data
$(X_{11}, X_{21}, X_{31}, \dots, X_{k1}, Y_1), \dots, (X_{1n}, X_{2n}, X_{3n}, \dots, X_{kn}, Y_n)$.
The dependent variable Y is categorical. The independent variables X are normally distributed within each category. The covariance matrices of the variables are equal across categories.

[Diagram: categorical Y and normal variables $X_1$, $X_2$, $X_3$]

Checking for assumptions
In the social sciences, assumptions are rarely checked in practice. Slight violations of the assumptions are tolerated.

Discriminant properties of each independent variable
The smaller a variable's Wilks' lambda, the better its discriminating power. A variable is statistically significant if the p-value of the Wilks test is below 0.05. Non-significant variables are typically dropped from the model. Sometimes a non-significant variable is retained (if the model with it is much better).

Large Wilks' lambda
If some variables have a much larger Wilks' lambda than the others (5-7 times larger), one should try the discriminant analysis without those variables. Note that dropping variables changes ALL characteristics of the model (Box's test, classification table, etc.).

Canonical functions
Just as in MANOVA, special functions are constructed that account for the variance of the independent variables:

$f_1(x) = a_1 + b_{11} X_1 + b_{21} X_2 + \dots + b_{k1} X_k,$
$f_2(x) = a_2 + b_{12} X_1 + b_{22} X_2 + \dots + b_{k2} X_k,$
$\dots$

If Y has g possible values, then g - 1 canonical functions are constructed (assuming there are at least g - 1 independent variables). They are ordered by importance: the first matters most, then the second, and so on.

Taking canonical functions into account
Canonical functions help to check which of the model's variables are most important. In many cases only the first canonical function really matters.

Checking for a dominant first canonical function
If there is more than one canonical function, we check what percentage of the common variance is explained by the first one. If there is only one canonical function (Y has two values), there is no percentage to check.

If the first canonical function dominates
The more important a variable is, the larger the absolute value of its standardized coefficient, and the larger the absolute value of its correlation with the first canonical function. Unimportant variables are candidates for dropping from the model.

Classification table
The classification table is one of the main indicators of model fit. It shows the correct and incorrect classifications obtained when the discriminant analysis model is applied to the initial data.

Standard investigation:
Classification table.
Checking which canonical functions are more important.
Checking which variables are more important.
Wilks test for significant variables.
(Forecasting.)
Checking for normality (K-S p > 0.05).
Checking for equality of covariance matrices (Box's test p > 0.05).

Example
Is it possible to distinguish among Lithuanians, Latvians and Estonians from their answers to questions about the sea (test1), sport (test2) and neighbouring countries (test3)?

Data
[Screenshot of the data]

Analyze -> Classify -> Discriminant

Analyze -> Classify -> Discriminant
In the dialog, specify the dependent variable and the independent variables.

Statistics
Tick the marked check boxes.

Classify -> Discriminant
Next, open the Classify dialog and tick the marked check boxes.

General statistics

Tests of Equality of Group Means

         Wilks' Lambda   F         df1   df2   Sig. (p-value)
TEST1    .039            406.803   2     33    .000
TEST2    .572            12.364    2     33    .000
TEST3    .311            36.491    2     33    .000

All variables are statistically significant. However, Wilks' lambda is small only for TEST1 (sea), so TEST1 is the most important variable in the model.
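For a test on one variable at a time, Wilks' lambda is the ratio of within-group to total sum of squares, so the F statistic in the table can be recovered as F = ((1 - lambda) / lambda) * (df2 / df1). The following minimal Python sketch checks this against the table; the function name `f_from_wilks` is chosen here for illustration, and small differences are expected because the tabulated lambdas are rounded to three decimals.

```python
def f_from_wilks(wilks_lambda, df1, df2):
    """F statistic implied by a univariate Wilks' lambda (lambda = SS_within / SS_total)."""
    return (1.0 - wilks_lambda) / wilks_lambda * df2 / df1

# (Wilks' lambda, reported F) copied from the table above; df1 = 2, df2 = 33.
table = {
    "TEST1": (0.039, 406.803),
    "TEST2": (0.572, 12.364),
    "TEST3": (0.311, 36.491),
}

recomputed = {name: f_from_wilks(lam, 2, 33) for name, (lam, _) in table.items()}

for name, (lam, f_reported) in table.items():
    print(f"{name}: lambda={lam:.3f}  F recomputed={recomputed[name]:.2f}  F reported={f_reported}")
```

The smaller the lambda, the larger the implied F, which is why TEST1 stands out.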

Remark
Because Wilks' lambda is small only for TEST1, one should also try the discriminant analysis without TEST2 and TEST3 and then compare the two models. (The results of that comparison are omitted here.)

Test Results

Box's M          9.094
F    Approx.     .880
     df1         12
     df2         1791.789
     Sig.        .682
Tests null hypothesis of equal population covariance matrices.

Box's test gives p > 0.05, so the covariance matrices do not differ significantly.

Summary of Canonical Discriminant Functions

Eigenvalues

Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          33.751(a)    99.6            99.6           .986
2          .129(a)      .4              100.0          .338
a. First 2 canonical discriminant functions were used in the analysis.

f1 explains 99.6% of the common variance and f2 explains 0.4%, so f1 dominates.

Structure Matrix

         Function 1   Function 2
TEST1    .854*        .498
TEST2    -.136        .987*
TEST3    .254         .514*
Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions. Variables ordered by absolute size of correlation within function.
*. Largest absolute correlation between each variable and any discriminant function.

The strongest correlation of f1 is with TEST1 (sea).
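The "% of Variance" and "Canonical Correlation" columns of the eigenvalue table follow directly from the eigenvalues: each percentage is the eigenvalue's share of the sum, and the squared canonical correlation equals eigenvalue / (1 + eigenvalue). A short Python sketch, using only the two eigenvalues reported above (the variable names are illustrative):

```python
import math

# Eigenvalues of the two canonical discriminant functions, from the table above.
eigenvalues = [33.751, 0.129]

total = sum(eigenvalues)
percent_of_variance = [100.0 * ev / total for ev in eigenvalues]
canonical_correlation = [math.sqrt(ev / (1.0 + ev)) for ev in eigenvalues]

for i, ev in enumerate(eigenvalues):
    print(f"f{i + 1}: eigenvalue={ev}  "
          f"% of variance={percent_of_variance[i]:.1f}  "
          f"canonical correlation={canonical_correlation[i]:.3f}")
```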

Canonical Discriminant Functions
[Scatter plot: Function 1 (horizontal axis) against Function 2 (vertical axis), with cases and group centroids for SALIS (country): Lithuanians, Latvians, Estonians]
f1 discriminates well; f2 discriminates poorly.

Classification Results(a)

                                  Predicted Group Membership
            SALIS            1 Lithuanians   2 Latvians   3 Estonians   Total
Original
  Count     1 Lithuanians    16              0            0             16
            2 Latvians       0               11           2             13
            3 Estonians      0               2            5             7
  %         1 Lithuanians    100.0           .0           .0            100.0
            2 Latvians       .0              84.6         15.4          100.0
            3 Estonians      .0              28.6         71.4          100.0
a. 88.9% of original grouped cases correctly classified.
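The percentages in the classification table can be recomputed from the counts: each row percentage is the count divided by the row total, and the overall rate is the sum of the diagonal divided by the number of cases. A minimal Python sketch using the counts above (English group names and variable names are chosen here for illustration):

```python
groups = ["Lithuanians", "Latvians", "Estonians"]

# Rows: original group; columns: predicted group (same order as `groups`).
counts = [
    [16, 0, 0],
    [0, 11, 2],
    [0, 2, 5],
]

n_cases = sum(sum(row) for row in counts)
n_correct = sum(counts[i][i] for i in range(len(groups)))
overall_percent = 100.0 * n_correct / n_cases

# Percentage of each original group that was classified correctly.
per_group_percent = {
    group: 100.0 * counts[i][i] / sum(counts[i]) for i, group in enumerate(groups)
}

print(f"Overall: {overall_percent:.1f}% of {n_cases} cases correctly classified")
for group, pct in per_group_percent.items():
    print(f"{group}: {pct:.1f}%")
```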

Classification Results
The diagonal entries of the percentage block (100.0, 84.6 and 71.4) are the percentages of correct classification for each group.

Classification Function Coefficients (forecasting)

              SALIS
              1 Lithuanians   2 Latvians   3 Estonians
TEST1         -1.234          .461         .163
TEST2         7.881           6.221        6.221
TEST3         1.101           .685         .780
(Constant)    -351.724        -301.126     -278.343
Fisher's linear discriminant functions

For Lithuanians, Fisher's function is
-1.23*TEST1 + 7.88*TEST2 + 1.10*TEST3 - 351.72

For Latvians, Fisher's function is
0.46*TEST1 + 6.22*TEST2 + 0.68*TEST3 - 301.12

For Estonians, Fisher's function is
0.16*TEST1 + 6.22*TEST2 + 0.78*TEST3 - 278.34

Forecasting
Let TEST1 = 30, TEST2 = 80, TEST3 = 70. Fisher's functions then give:
For Lithuanians = 318.78.
For Latvians = 257.91.
For Estonians = 269.75.
We forecast that this respondent is Lithuanian, because the Lithuanian Fisher's function has the largest value.
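The forecasting rule above (evaluate every group's Fisher function and pick the largest) can be sketched in Python as follows. The sketch uses the three-decimal coefficients from the Classification Function Coefficients table, so its scores can differ somewhat from the rounded figures quoted on the slide; the ranking is what decides the forecast. The function and variable names are illustrative.

```python
# Classification function coefficients copied from the coefficient table.
COEFFICIENTS = {
    "Lithuanians": {"TEST1": -1.234, "TEST2": 7.881, "TEST3": 1.101, "const": -351.724},
    "Latvians":    {"TEST1": 0.461,  "TEST2": 6.221, "TEST3": 0.685, "const": -301.126},
    "Estonians":   {"TEST1": 0.163,  "TEST2": 6.221, "TEST3": 0.780, "const": -278.343},
}

def fisher_scores(answers):
    """Value of each group's Fisher linear discriminant function for one respondent."""
    return {
        group: coef["const"] + sum(coef[var] * value for var, value in answers.items())
        for group, coef in COEFFICIENTS.items()
    }

respondent = {"TEST1": 30, "TEST2": 80, "TEST3": 70}
scores = fisher_scores(respondent)

# The forecast is the group whose function takes the largest value.
predicted = max(scores, key=scores.get)

for group, score in scores.items():
    print(f"{group}: {score:.2f}")
print("Forecast:", predicted)
```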

Checking for normality
Normality of the variables should be checked within each category of the dependent variable. One can either use Select Cases (three times, once per category) or Split File. We demonstrate the latter.

Data -> Split File
Move the grouping variable into the list and tick the marked option.

Analyze -> Descriptive Statistics -> Explore
Move the test variables into the list, then tick the marked options in the dialogs that follow.

salis = 1 (lietuviai, Lithuanians)
For Lithuanians, test1 is not normal (p < 0.05); test2 and test3 are normal.

The Q-Q plot for test2 also shows agreement with the normal distribution (all points lie close to the line).

salis = 2 (latviai, Latvians)
For Latvians, test1, test2 and test3 are normal.

Checking for normality
One should check normality for ALL variables and all categories (Lithuanians, Latvians and Estonians). Kolmogorov-Smirnov and/or Shapiro-Wilk tests, and sometimes Q-Q plots, suffice.
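As a rough illustration of the Q-Q idea mentioned above, the sketch below pairs the sorted values of one variable within one group with the quantiles of a normal distribution fitted to that group. Points close to a straight line (here summarized by a correlation near 1) suggest approximate normality. This is only a descriptive check, not a replacement for the Kolmogorov-Smirnov or Shapiro-Wilk tests, and the sample values are made up for the example; they are not the data from the slides.

```python
from statistics import NormalDist, mean, stdev

def qq_points(sample):
    """Pairs (theoretical normal quantile, observed value) for a Q-Q plot."""
    observed = sorted(sample)
    n = len(observed)
    fitted = NormalDist(mean(observed), stdev(observed))
    # Plotting positions (i + 0.5) / n keep the probabilities strictly inside (0, 1).
    return [(fitted.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(observed)]

def pearson(pairs):
    """Pearson correlation of the two coordinates of a list of pairs."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical test2 scores for one group (illustrative values only).
test2_one_group = [48, 52, 55, 57, 59, 60, 61, 63, 65, 68, 72]

points = qq_points(test2_one_group)
qq_correlation = pearson(points)
print(f"Q-Q correlation: {qq_correlation:.3f}")
```

In practice the same function would be applied once per variable and per category, mirroring what Split File does in the menus.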