Discriminant Correspondence Analysis

Hervé Abdi

1 Overview

As the name indicates, discriminant correspondence analysis (DCA) is an extension of discriminant analysis (DA) and correspondence analysis (CA). Like discriminant analysis, the goal of DCA is to categorize observations into pre-defined groups, and like correspondence analysis, it is used with nominal variables. The main idea behind DCA is to represent each group by the sum of its observations and to perform a simple CA on the groups-by-variables matrix. The original observations are then projected as supplementary elements, and each observation is assigned to the closest group. The comparison between the a priori and the a posteriori classifications can be used to assess the quality of the discrimination. A similar procedure can be used to assign new observations to categories. The stability of the analysis can be evaluated using cross-validation techniques such as jackknifing or bootstrapping.

In: Neil Salkind (Ed.) (2007). Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage.

Address correspondence to: Hervé Abdi, Program in Cognition and Neurosciences, MS: Gr.4.1, The University of Texas at Dallas, Richardson, TX 75083-0688, USA. E-mail: herve@utdallas.edu, http://www.utd.edu/~herve

2 An example

It is commonly thought that the taste of wines depends upon their origin. As an illustration, we have sampled twelve wines coming from three different origins (4 wines per origin) and asked a professional taster (unaware of the origin of the wines) to rate these wines on 5 scales. The scores of the taster were then transformed into binary codes to form an indicator matrix (as in multiple correspondence analysis). For example, a score of 2 on the "Fruity" scale would be coded by the following pattern of binary values: 0 1 0. An additional unknown wine was also evaluated by the taster, with the goal of predicting its origin from the ratings. The data are given in Table 1.

3 Notations

There are K groups, each group comprising I_k observations; the sum of the I_k's is equal to I, the total number of observations. For convenience, we assume that the observations constitute the rows of the data matrix and that the variables are the columns. There are J variables. The I × J data matrix is denoted X. The indicator matrix is an I × K matrix denoted Y, in which a value of 1 indicates that the row belongs to the group represented by the column and a value of 0 indicates that it does not. The K × J matrix denoted N is called the group matrix; it stores the total of the variables for each category. For our example, it is computed as

N = Y^T X    (1)

(the entries of N, the per-group totals of each column of X, are obtained from the data in Table 1). Performing CA on the group matrix N provides two sets of factor scores: one for the groups (denoted F) and one for the variables (denoted G). These factor scores are, in general, scaled such that their variance is equal to the eigenvalue associated with the factor. The grand total of the table is denoted N, and the first step of the analysis is to compute the probability matrix Z = N^{-1} N (the group matrix divided by its grand total).
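These first steps are simple to express in matrix form. The following sketch (not from the original entry; the function name, the toy data, and the use of NumPy are my own choices) builds the indicator matrix Y, the group matrix N = Y^T X, and the probability matrix Z from a binary data matrix X and a list of group labels.

```python
import numpy as np

# Illustrative sketch (hypothetical helper, not the article's code): build the
# indicator matrix Y, the group matrix N = Y^T X, and the probability matrix Z,
# assuming X is an observations-by-variables 0/1 indicator matrix and `labels`
# gives the group (e.g., region) of each observation.

def group_and_probability_matrices(X, labels):
    """Return (group names, Y, N, Z) for a discriminant correspondence analysis."""
    X = np.asarray(X, dtype=float)
    groups = sorted(set(labels))                                   # the K group names
    Y = np.array([[1.0 if lab == g else 0.0 for g in groups]      # I x K indicator matrix
                  for lab in labels])
    N = Y.T @ X                                                    # K x J group matrix (per-group column totals)
    Z = N / N.sum()                                                # divide by the grand total
    return groups, Y, N, Z

# Tiny made-up example: 6 observations, 2 variables coded on 2 levels each, 3 groups.
if __name__ == "__main__":
    X = np.array([[1, 0, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1]])
    labels = ["A", "A", "B", "B", "C", "C"]
    groups, Y, N, Z = group_and_probability_matrices(X, labels)
    print(groups)      # ['A', 'B', 'C']
    print(N)           # per-group totals of each column
    print(Z.sum())     # 1.0 by construction
```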

Table 1: Data for the wine regions example: twelve wines from three different regions are rated on 5 descriptors (Woody, Fruity, Sweet, Alcohol, Hedonic). A value of 1 indicates that the wine possesses the given value of the variable. The wine W? is an unknown wine treated as a supplementary observation. [0/1 indicator matrix: one row per wine (4 Loire, 4 Rhône, 4 Beaujolais, plus W?), one column per level of each descriptor; entries not reproduced here.]

We denote by r the vector of the row totals of Z (i.e., r = Z1, with 1 being a conformable vector of 1's), by c the vector of the column totals, and D_c = diag{c}, D_r = diag{r}. The factor scores are obtained from the following singular value decomposition:

D_r^{-1/2} (Z - r c^T) D_c^{-1/2} = P Δ Q^T    (2)

(Δ is the diagonal matrix of the singular values, and Λ = Δ^2 is the matrix of the eigenvalues). The row and (respectively) column factor scores are obtained as

F = D_r^{-1/2} P Δ   and   G = D_c^{-1/2} Q Δ.    (3)

The squared (χ^2) distances from the rows and columns to their respective barycenters are obtained as

d_r = diag{F F^T}   and   d_c = diag{G G^T}.    (4)

The squared cosines between row i and factor l, and between column j and factor l, are obtained respectively as

o_{i,l} = f_{i,l}^2 / d_{r,i}^2   and   o_{j,l} = g_{j,l}^2 / d_{c,j}^2    (5)

(with d_{r,i}^2 and d_{c,j}^2 being, respectively, the i-th element of d_r and the j-th element of d_c). Squared cosines help locate the factors important for a given observation. The contributions of row i to factor l and of column j to factor l are obtained respectively as

t_{i,l} = f_{i,l}^2 / λ_l   and   t_{j,l} = g_{j,l}^2 / λ_l.    (6)

Contributions help locate the observations important for a given factor.

Supplementary or illustrative elements can be projected onto the factors using the so-called transition formula. Specifically, let i_sup^T be an illustrative row and j_sup be an illustrative column to be projected. Their coordinates f_sup and g_sup are obtained as:

f_sup^T = (i_sup^T 1)^{-1} i_sup^T G Δ^{-1}   and   g_sup^T = (j_sup^T 1)^{-1} j_sup^T F Δ^{-1}.    (7)

[Note that the scalar terms (i_sup^T 1)^{-1} and (j_sup^T 1)^{-1} are used to ensure that the sum of the elements of i_sup or j_sup is equal to one; if this is already the case, these terms are superfluous.]

After the analysis has been performed on the groups, the original observations are projected as supplementary elements and their factor scores are stored in a matrix denoted F_sup. To compute these scores, first compute the matrix of row profiles

R = (diag{X 1})^{-1} X    (8)

and then apply Equation 7 to obtain

F_sup = R G Δ^{-1}.    (9)

The Euclidean distance between the observations and the groups computed from the factor scores is equal to the χ^2-distance between their row profiles. The I × K matrix D of the squared distances between observations and groups is computed as

D = s_sup 1^T + 1 s^T - 2 F_sup F^T    (10)

with

s_sup = diag{F_sup F_sup^T}   and   s = diag{F F^T}.    (11)

Each observation is then assigned to the closest group.

3.1 Model Evaluation

The quality of the discrimination can be evaluated as a fixed effect model or as a random effect model. For the fixed effect model, the correct classifications are compared to the assignments obtained from Equation 10: the fixed effect model evaluates the quality of the classification on the sample used to build the model. The random effect model evaluates the quality of the classification on new observations. Typically, this step is performed using cross-validation techniques such as jackknifing or bootstrapping.
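Equations 2-3 and 8-11 translate directly into a few lines of linear algebra. The sketch below (again mine, with hypothetical names, assuming NumPy) runs the CA of the group matrix via the SVD, projects the observations as supplementary rows, and assigns each observation to its closest group; a fixed effect confusion matrix can then be obtained by cross-tabulating the assignments against the true group labels.

```python
import numpy as np

# Illustrative sketch (hypothetical helper, not the article's code):
# CA of the K x J group matrix N, projection of the observations as
# supplementary rows (Equations 8-9), and assignment to the closest
# group (Equations 10-11).

def dca_fit_and_assign(N, X):
    """Run CA on the group matrix N and assign each row of X to its closest group."""
    N = np.asarray(N, dtype=float)
    X = np.asarray(X, dtype=float)

    Z = N / N.sum()                          # probability matrix
    r = Z.sum(axis=1)                        # row masses
    c = Z.sum(axis=0)                        # column masses
    Dr_isq = np.diag(1.0 / np.sqrt(r))
    Dc_isq = np.diag(1.0 / np.sqrt(c))

    # Equation 2: SVD of the matrix of centered, standardized residuals
    S = Dr_isq @ (Z - np.outer(r, c)) @ Dc_isq
    P, delta, Qt = np.linalg.svd(S, full_matrices=False)
    keep = delta > 1e-12                     # drop null singular values
    P, delta, Q = P[:, keep], delta[keep], Qt.T[:, keep]

    # Equation 3: factor scores for the groups (F) and the variables (G)
    F = Dr_isq @ P * delta
    G = Dc_isq @ Q * delta

    # Equations 8-9: row profiles of the observations, projected via the transition formula
    R = X / X.sum(axis=1, keepdims=True)
    F_sup = R @ G / delta                    # R G Delta^{-1}

    # Equations 10-11: squared distances between observations and groups
    s_sup = np.sum(F_sup ** 2, axis=1)
    s = np.sum(F ** 2, axis=1)
    D = s_sup[:, None] + s[None, :] - 2.0 * F_sup @ F.T

    assigned = D.argmin(axis=1)              # index of the closest group for each observation
    return F, G, F_sup, D, assigned
```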

Table 2: Factor scores, squared cosines, and contributions for the variables (J-set). Contributions corresponding to negative scores are in italics. [One column per level of the descriptors Woody, Fruity, Sweet, Alcohol, and Hedonic; for each of the first two axes the table lists λ, the percentage of explained inertia, and the factor scores, squared cosines, and contributions; entries not reproduced here.]

Table 3: Factor scores, squared cosines, and contributions for the regions (Loire, Rhône, Beaujolais), with the supplementary rows being the wines from each region and the mysterious wine (W?). Contributions corresponding to negative scores are in italics. [For each of the first two axes the table lists λ, the percentage of explained inertia, and the factor scores, squared cosines, and contributions; entries not reproduced here.]
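The squared cosines and contributions reported in Tables 2 and 3 follow from Equations 4-6. A minimal sketch (mine; the function name is hypothetical), applicable to either the row scores F or the column scores G returned by the previous sketch:

```python
import numpy as np

# Illustrative sketch (not from the entry): squared cosines and contributions
# (Equations 4-6) from a matrix of factor scores and the singular values delta,
# with lambda_l = delta_l ** 2.

def cosines_and_contributions(scores, delta):
    """Return (squared cosines, contributions), each with the same shape as `scores`."""
    scores = np.asarray(scores, dtype=float)
    eigenvalues = np.asarray(delta, dtype=float) ** 2     # lambda_l
    d2 = np.sum(scores ** 2, axis=1, keepdims=True)       # squared chi2 distance to barycenter (Eq. 4)
    cosines = scores ** 2 / d2                            # Eq. 5: each row sums to 1
    contributions = scores ** 2 / eigenvalues             # Eq. 6, as written in the entry
    return cosines, contributions
```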

Figure 1: Projections on the first 2 dimensions (dimension 1: τ = 56%; dimension 2: τ = 44%). (a) The I set: rows (i.e., wines). The wines are projected as supplementary elements; Wine ? is an unknown wine. (b) The J set: columns (i.e., descriptors). The wine categories have also been projected for ease of interpretation. Both panels have the same scale (some projection points have been slightly moved to increase readability). (Projections from Tables 2 and 3.)

Figure 2: Projections on the first 2 dimensions. (a) Fixed effect model: the three regions and the convex envelope of the wines of each region. (b) Random effect model: the jackknifed wines projected back onto the fixed effect solution. The convex envelopes show that the random effect categories have a larger variability and have moved.

4 Results

Tables 2 and 3 give the results of the analysis, and Figure 1 displays them. The fixed effect quality of the model is evaluated by the following confusion matrix:

4 0 0
0 3 0
0 1 4    (12)

In this matrix, the rows are the assigned groups and the columns are the real groups. For example, out of the 5 wines assigned to the wine region Beaujolais (Group 3), one wine was in fact from the Rhône region (Group 2) and 4 wines were from Beaujolais. The overall quality can be computed from the diagonal of the matrix: here we find that 11 (= 4 + 3 + 4) wines out of 12 were correctly classified.

A jackknife procedure was used in order to evaluate the generalization capacity of the analysis to new wines (i.e., this corresponds to a random effect analysis). Each wine was in turn taken out of the sample, a DCA was performed on the remaining 11 wines, and the wine taken out was assigned to the closest group (a code sketch of this leave-one-out loop is given at the end of this entry). As expected, the performance of the model as a random effect model is less impressive than as a fixed effect model: the resulting confusion matrix shows that now only 6 wines out of 12 are correctly classified.

The differences between the fixed and the random effect models are illustrated in Figure 2, where the jackknifed wines have been projected onto the fixed effect solution (using metric multidimensional scaling; see the corresponding entry). The quality of the model can be evaluated by drawing the convex envelope of each category. For the fixed effect model, the centers of gravity of the convex envelopes are the categories, and this illustrates that DCA is a least squares estimation technique.

For the random effect model, the degradation of performance is due to a larger variance (the areas of the convex envelopes are larger) and to a rotation of the envelopes (the convex envelopes are no longer centered on the category centers of gravity).
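Finally, the leave-one-out (jackknife) evaluation described in the Results section can be sketched as follows (my code, not the author's; it reuses the hypothetical helpers from the earlier sketches and assumes that leaving one observation out never empties a group).

```python
import numpy as np

# Illustrative sketch of the jackknife (leave-one-out) evaluation: each
# observation is left out in turn, the DCA is re-fitted on the remaining
# observations, and the left-out observation is assigned to its closest group.
# Uses group_and_probability_matrices() and dca_fit_and_assign() from above.

def jackknife_confusion(X, labels):
    """Leave-one-out confusion matrix (rows = assigned group, columns = true group)."""
    X = np.asarray(X, dtype=float)
    groups = sorted(set(labels))
    K = len(groups)
    confusion = np.zeros((K, K), dtype=int)
    for i in range(len(labels)):
        keep = [k for k in range(len(labels)) if k != i]
        # Re-fit on the remaining observations; group order is unchanged as long
        # as no group is emptied by removing one observation.
        _, _, N_loo, _ = group_and_probability_matrices(X[keep], [labels[k] for k in keep])
        _, _, _, _, assigned = dca_fit_and_assign(N_loo, X[i:i + 1])
        confusion[assigned[0], groups.index(labels[i])] += 1
    return confusion
```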