Correspondence Analysis

Size: px
Start display at page:

Download "Correspondence Analysis"

Transcription

1 Correspondence Analysis Julie Josse, François Husson, Sébastien Lê Applied Mathematics Department, Agrocampus Ouest user-2008 Dortmund, August 11th / 22

2 History Theoretical principles: Fisher (1940) Correspondence Analysis has been actively developed in in Rennes! JP. Benzécri: mathematician and linguist PhD thesis of his student B. Escofier : Correspondence Analysis The beginning of the "French school" 2 / 22

3 CA in the R packages anacor (de Leeuw and mair) ca (Nenadic and Greenacre) ade4 (Chessel) vegan (Dixon) homals (de Leeuw) FactoMineR (Husson et al.) 3 / 22

4 Data, examples Two categorical variables contingency table. Symmetric role of the rows and the columns Examples: examples where a χ 2 test can be applied text-mining: number of times the word i is in the text j solutions (acid, bitter, etc.) - answers (acid, bitter, etc.): number of persons who answer j for the stimulus i perfumes - descriptors: number of times the descriptor j is used to describe the perfume i 4 / 22

5 Notations Figure: Data table in CA. 5 / 22

6 Notations Figure: Row profile and column profile. 6 / 22

7 Aim Rows typology Columns typology Relationship between these two typologies Study the relationship (the correspondence) between the two variables, the gap to independence Visualize the association between levels 7 / 22

8 Example 12 perfumes described by 39 words: floral fruity strong soft light... Angel Aromatics Elixir Chanel Cinéma Coco Mademoiselle / 22

9 Chi-square: Intensity of the relationship χ 2 = ij = n ij (n ij n i. n.j /n) 2, n i. n.j /n (f ij f i. f.j ) 2 f i. f.j, = nφ 2. φ 2 is the intensity of the relationship Chi-square test (Pearson): χ 2 obs χ2 (I 1) (J 1) χ 2 = 615.8, (p-value = 1.7e-56) Highly significant obs 9 / 22

10 Nature of the relationship Contribution to the chi-square: x 2 ij = (n ij n i. n.j /n) 2 n i. n.j /n Contribution of each cell, contribution of each row, contribution of each column? Residuals (positive or negative association): x ij = (n ij n i. n.j /n) ni. n.j /n CA: visualize the residuals matrix X (the gap to independence) As usual, the association structure of X is revealed using the SVD 10 / 22

11 Total inertia Total inertia = φ 2 = trace(xx ) = χ2 n φ 2 = χ2 n = ij (f ij f i. f.j ) 2 f i. f.j φ 2 = ij f.j ( fij f.j f i. ) 2 f i. Similarly: φ 2 = χ2 n = ij f i. ( fij f i. f.j ) 2 f.j 11 / 22

12 Total inertia explained via the profile For the row profile: Weight for the columns Associated weight of the rows i φ 2 = ij f i. f ij f i. = f.j f i. ( fij f i. f.j ) 2 f.j Total inertia = weighted sum of squared distances of the rows profile to the average profile the weight of the row profile is its mass f i. and the squared distance is an Euclidean distance where each squared difference is divided by the corresponding average value f.j 12 / 22

13 χ 2 -distance the row i is a point in R J (with the weight f i. ) d 2 (i, l) = j 1 f.j ( fij f ) 2 lj f i. f l. the column j is a point in R I (with the weight f.j ) d 2 (j, h) = i 1 f i. ( fij f ) 2 ih f.j f.h These χ 2 -distances enjoy good properties (distributional equivalence principle) 13 / 22

14 CA: different formulations In factorial analysis such as PCA, we are looking for dimension which represent in the better way the variability between individuals (i.e the distance to the barycenter), in CA we are looking for dimensions which better represent the gap to independence. Classical presentation of CA: two weighted PCA on "row profile" and on "column profil" Canonical Analysis on the two indicator matrices 14 / 22

15 Link between the two representations: transition formulae F s (i) = 1 f ij G s (j) λs f i. Row i is at the barycenter of the weighted columns (with a scale 1/ λ s ) j G s (k) = 1 f ij F s (i) λs f.j Column k is at the barycenter of the weighted rows (with a scale 1/ λ s ) i 15 / 22

16 Graphical representation CA factor map vanilla Dim 2 (21.12%) sugary Lolita Lempika Cinéma L_instant fruity soft J_adore light Coco Pure Mademoiselle Poison J_adore_et discrete floral fresh Pleasures acid wooded Angel spicy soap strong Aromatics ShalimarElixir old Chanel agressive Dim 1 (60.46%) 16 / 22

17 Graphical representation CA factor map Dim 2 (21.12%) Lolita Lempika Cinéma L_instant J_adore Coco Pure Mademoiselle Poison J_adore_et Pleasures Angel Chanel 5 Aromatics ShalimarElixir Dim 1 (60.46%) 17 / 22

18 Graphical representation CA factor map Dim 2 (21.12%) sugary Lolita Lempika Cinéma L_instant young fruity soft J_adore light vanilla hot wooded Coco Pure Mademoiselle Poison discreet forest J_adore_et floral fresh rose shampoo nature Pleasures shower.gel candy Angel oriental acid spicy vegetable lemon woman soap heavy intense peppery agressive heady drugs strong eau.de.cologne Aromatics ShalimarElixir male alcohol powerful old toilets Chanel 5 amber musky Dim 1 (60.46%) 18 / 22

19 Graphical representation in CA: remarks The barycenter represents the independence The distance between levels of a same variable can be interpreted Representation provided are pseudo-barycentric (dilatation): transition formulae It is not possible to interpret the distance between levels of the two variables but it is at a weighted barycenter of all the levels 19 / 22

20 Helps to interpret Supplementary informations can be added (zero weight)! Percentage of variance for each axis: information brought by λ the dimension s, but, it is interesting to have a look at the s λs eigenvalues. λ s always smaller than 1; the value 1 is obtained for exclusive associations. library(factominer) don=diag(5) a=ca(don) a$eig don=matrix(1,5,5) a=ca(don) a$eig 20 / 22

21 Helps to interpret The maximum number of axes is min(i, J) 1 Quality of the representation: cos 2 Contribution: inertia of a point total inertia = f i.f s(i) 2 λ s. Be careful, extreme points are not those which contribute the most to the dimension 21 / 22

22 Practice library(factominer) perfume = read.table("perfume.txt",header=t,sep="\t",row.names=1) res.ca = CA(perfume,col.sup=16:39) plot(res.ca,invisible="row") plot(res.ca,invisible=c("col","col.sup")) res.ca$eig barplot(res.ca$eig[,1],main="eigenvalues",names.arg=1:nrow(res.ca$eig)) res.ca$row$coord res.ca$row$cos2 res.ca$row$contrib res.ca$col$coord res.ca$col$cos2 res.ca$col$contrib 22 / 22

Correspondence Analysis (CA)

Correspondence Analysis (CA) Correspondence Analysis (CA) François Husson & Magalie Houée-Bigot Department o Applied Mathematics - Rennes Agrocampus husson@agrocampus-ouest.r / 43 Correspondence Analysis (CA) Data 2 Independence model

More information

Bootstrap-Based Regularization for Low-Rank Matrix Estimation

Bootstrap-Based Regularization for Low-Rank Matrix Estimation Journal of Machine Learning Research 17 (016) 1-9 Submitted 1/14; Revised 3/16; Published 8/16 Bootstrap-Based Regularization for Low-Rank Matrix Estimation Julie Josse josse@agrocampus-ouest.fr Department

More information

Handling missing values in Multiple Factor Analysis

Handling missing values in Multiple Factor Analysis Handling missing values in Multiple Factor Analysis François Husson, Julie Josse Applied mathematics department, Agrocampus Ouest, Rennes, France Rabat, 26 March 2014 1 / 10 Multi-blocks data set Groups

More information

Lecture 2: Data Analytics of Narrative

Lecture 2: Data Analytics of Narrative Lecture 2: Data Analytics of Narrative Data Analytics of Narrative: Pattern Recognition in Text, and Text Synthesis, Supported by the Correspondence Analysis Platform. This Lecture is presented in three

More information

Missing values imputation for mixed data based on principal component methods

Missing values imputation for mixed data based on principal component methods Missing values imputation for mixed data based on principal component methods Vincent Audigier, François Husson & Julie Josse Agrocampus Rennes Compstat' 2012, Limassol (Cyprus), 28-08-2012 1 / 21 A real

More information

Correspondence analysis

Correspondence analysis C Correspondence Analysis Hervé Abdi 1 and Michel Béra 2 1 School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX, USA 2 Centre d Étude et de Recherche en Informatique

More information

Evaluation of scoring index with different normalization and distance measure with correspondence analysis

Evaluation of scoring index with different normalization and distance measure with correspondence analysis Evaluation of scoring index with different normalization and distance measure with correspondence analysis Anders Nilsson Master Thesis in Statistics 15 ECTS Spring Semester 2010 Supervisors: Professor

More information

Simple and Canonical Correspondence Analysis Using the R Package anacor

Simple and Canonical Correspondence Analysis Using the R Package anacor Simple and Canonical Correspondence Analysis Using the R Package anacor Jan de Leeuw University of California, Los Angeles Patrick Mair WU Wirtschaftsuniversität Wien Abstract This paper presents the R

More information

Multivariate Statistics Fundamentals Part 1: Rotation-based Techniques

Multivariate Statistics Fundamentals Part 1: Rotation-based Techniques Multivariate Statistics Fundamentals Part 1: Rotation-based Techniques A reminded from a univariate statistics courses Population Class of things (What you want to learn about) Sample group representing

More information

TOTAL INERTIA DECOMPOSITION IN TABLES WITH A FACTORIAL STRUCTURE

TOTAL INERTIA DECOMPOSITION IN TABLES WITH A FACTORIAL STRUCTURE Statistica Applicata - Italian Journal of Applied Statistics Vol. 29 (2-3) 233 doi.org/10.26398/ijas.0029-012 TOTAL INERTIA DECOMPOSITION IN TABLES WITH A FACTORIAL STRUCTURE Michael Greenacre 1 Universitat

More information

Regularized PCA to denoise and visualise data

Regularized PCA to denoise and visualise data Regularized PCA to denoise and visualise data Marie Verbanck Julie Josse François Husson Laboratoire de statistique, Agrocampus Ouest, Rennes, France CNAM, Paris, 16 janvier 2013 1 / 30 Outline 1 PCA 2

More information

Topic 21 Goodness of Fit

Topic 21 Goodness of Fit Topic 21 Goodness of Fit Contingency Tables 1 / 11 Introduction Two-way Table Smoking Habits The Hypothesis The Test Statistic Degrees of Freedom Outline 2 / 11 Introduction Contingency tables, also known

More information

Discriminant Correspondence Analysis

Discriminant Correspondence Analysis Discriminant Correspondence Analysis Hervé Abdi Overview As the name indicates, discriminant correspondence analysis (DCA) is an extension of discriminant analysis (DA) and correspondence analysis (CA).

More information

arxiv: v4 [stat.co] 8 Dec 2017

arxiv: v4 [stat.co] 8 Dec 2017 Multivariate Analysis of Mixed Data: The R Package PCAmixdata Marie Chavent,2, Vanessa Kuentz-Simonet 3, Amaury Labenne 3, Jérôme Saracco 2,4 December, 207 arxiv:4.49v4 [stat.co] 8 Dec 207 Université de

More information

Extended Mosaic and Association Plots for Visualizing (Conditional) Independence. Achim Zeileis David Meyer Kurt Hornik

Extended Mosaic and Association Plots for Visualizing (Conditional) Independence. Achim Zeileis David Meyer Kurt Hornik Extended Mosaic and Association Plots for Visualizing (Conditional) Independence Achim Zeileis David Meyer Kurt Hornik Overview The independence problem in -way contingency tables Standard approach: χ

More information

Visualizing Independence Using Extended Association and Mosaic Plots. Achim Zeileis David Meyer Kurt Hornik

Visualizing Independence Using Extended Association and Mosaic Plots. Achim Zeileis David Meyer Kurt Hornik Visualizing Independence Using Extended Association and Mosaic Plots Achim Zeileis David Meyer Kurt Hornik Overview The independence problem in 2-way contingency tables Standard approach: χ 2 test Alternative

More information

Generalizing Partial Least Squares and Correspondence Analysis to Predict Categorical (and Heterogeneous) Data

Generalizing Partial Least Squares and Correspondence Analysis to Predict Categorical (and Heterogeneous) Data Generalizing Partial Least Squares and Correspondence Analysis to Predict Categorical (and Heterogeneous) Data Hervé Abdi, Derek Beaton, & Gilbert Saporta CARME 2015, Napoli, September 21-23 1 Outline

More information

Correspondence Analysis

Correspondence Analysis STATGRAPHICS Rev. 7/6/009 Correspondence Analysis The Correspondence Analysis procedure creates a map of the rows and columns in a two-way contingency table for the purpose of providing insights into the

More information

missmda: a package to handle missing values in Multivariate exploratory Data Analysis methods

missmda: a package to handle missing values in Multivariate exploratory Data Analysis methods missmda: a package to handle missing values in Multivariate exploratory Data Analysis methods Julie Josse & Francois Husson Applied Mathematics Department Agrocampus Rennes - France user! 2011, Warwick,

More information

Audio transcription of the Principal Component Analysis course

Audio transcription of the Principal Component Analysis course Audio transcription of the Principal Component Analysis course Part I. Data - Practicalities Slides 1 to 8 Pages 2 to 5 Part II. Studying individuals and variables Slides 9 to 26 Pages 6 to 13 Part III.

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis Giorgos Korfiatis Alfa-Informatica University of Groningen Seminar in Statistics and Methodology, 2007 What Is PCA? Dimensionality reduction technique Aim: Extract relevant

More information

Multiple Correspondence Analysis

Multiple Correspondence Analysis Multiple Correspondence Analysis 18 Up to now we have analysed the association between two categorical variables or between two sets of categorical variables where the row variables are different from

More information

6. Let C and D be matrices conformable to multiplication. Then (CD) =

6. Let C and D be matrices conformable to multiplication. Then (CD) = Quiz 1. Name: 10 points per correct answer. (20 points for attendance). 1. Let A = 3 and B = [3 yy]. When is A equal to B? xx A. When x = 3 B. When y = 3 C. When x = y D. Never 2. See 1. What is the dimension

More information

MICHAEL GREENACRE 1. Department of Economics and Business, Universitat Pompeu Fabra, Ramon Trias Fargas, 25-27, Barcelona, Spain

MICHAEL GREENACRE 1. Department of Economics and Business, Universitat Pompeu Fabra, Ramon Trias Fargas, 25-27, Barcelona, Spain Ecology, 91(4), 2010, pp. 958 963 Ó 2010 by the Ecological Society of America Correspondence analysis of raw data MICHAEL GREENACRE 1 Department of Economics and Business, Universitat Pompeu Fabra, Ramon

More information

Algebra of Principal Component Analysis

Algebra of Principal Component Analysis Algebra of Principal Component Analysis 3 Data: Y = 5 Centre each column on its mean: Y c = 7 6 9 y y = 3..6....6.8 3. 3.8.6 Covariance matrix ( variables): S = -----------Y n c ' Y 8..6 c =.6 5.8 Equation

More information

13.1 Categorical Data and the Multinomial Experiment

13.1 Categorical Data and the Multinomial Experiment Chapter 13 Categorical Data Analysis 13.1 Categorical Data and the Multinomial Experiment Recall Variable: (numerical) variable (i.e. # of students, temperature, height,). (non-numerical, categorical)

More information

Correspondence Analysis

Correspondence Analysis Correspondence Analysis Q: when independence of a 2-way contingency table is rejected, how to know where the dependence is coming from? The interaction terms in a GLM contain dependence information; however,

More information

MATH Notebook 3 Spring 2018

MATH Notebook 3 Spring 2018 MATH448001 Notebook 3 Spring 2018 prepared by Professor Jenny Baglivo c Copyright 2010 2018 by Jenny A. Baglivo. All Rights Reserved. 3 MATH448001 Notebook 3 3 3.1 One Way Layout........................................

More information

Multiple Correspondence Analysis of Subsets of Response Categories

Multiple Correspondence Analysis of Subsets of Response Categories Multiple Correspondence Analysis of Subsets of Response Categories Michael Greenacre 1 and Rafael Pardo 2 1 Departament d Economia i Empresa Universitat Pompeu Fabra Ramon Trias Fargas, 25-27 08005 Barcelona.

More information

Statistics 3858 : Contingency Tables

Statistics 3858 : Contingency Tables Statistics 3858 : Contingency Tables 1 Introduction Before proceeding with this topic the student should review generalized likelihood ratios ΛX) for multinomial distributions, its relation to Pearson

More information

Statistics for Managers Using Microsoft Excel

Statistics for Managers Using Microsoft Excel Statistics for Managers Using Microsoft Excel 7 th Edition Chapter 1 Chi-Square Tests and Nonparametric Tests Statistics for Managers Using Microsoft Excel 7e Copyright 014 Pearson Education, Inc. Chap

More information

Singular Value Decompsition

Singular Value Decompsition Singular Value Decompsition Massoud Malek One of the most useful results from linear algebra, is a matrix decomposition known as the singular value decomposition It has many useful applications in almost

More information

Correspondence Analysis & Related Methods

Correspondence Analysis & Related Methods Correspondence Analysis & Related Methods Michael Greenacre SESSION 9: CA applied to rankings, preferences & paired comparisons Correspondence analysis (CA) can also be applied to other types of data:

More information

Dimensionality Reduction: PCA. Nicholas Ruozzi University of Texas at Dallas

Dimensionality Reduction: PCA. Nicholas Ruozzi University of Texas at Dallas Dimensionality Reduction: PCA Nicholas Ruozzi University of Texas at Dallas Eigenvalues λ is an eigenvalue of a matrix A R n n if the linear system Ax = λx has at least one non-zero solution If Ax = λx

More information

Journal of Statistical Software

Journal of Statistical Software JSS Journal of Statistical Software April 2011, Volume 40, Issue 11. http://www.jstatsoft.org/ Simultaneous Analysis in S-PLUS: The SimultAn Package Amaya Zárraga Universidad del País Vasco/ Euskal Herriko

More information

CORRESPONDENCE ANALYSIS: INTRODUCTION OF A SECOND DIMENSIONS TO DESCRIBE PREFERENCES IN CONJOINT ANALYSIS.

CORRESPONDENCE ANALYSIS: INTRODUCTION OF A SECOND DIMENSIONS TO DESCRIBE PREFERENCES IN CONJOINT ANALYSIS. CORRESPONDENCE ANALYSIS: INTRODUCTION OF A SECOND DIENSIONS TO DESCRIBE PREFERENCES IN CONJOINT ANALYSIS. Anna Torres Lacomba Universidad Carlos III de adrid ABSTRACT We show the euivalence of using correspondence

More information

Multidimensional Scaling in R: SMACOF

Multidimensional Scaling in R: SMACOF Multidimensional Scaling in R: SMACOF Patrick Mair Institute for Statistics and Mathematics WU Vienna University of Economics and Business Jan de Leeuw Department of Statistics University of California,

More information

The goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions.

The goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions. The goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions. A common problem of this type is concerned with determining

More information

Principal component analysis

Principal component analysis Principal component analysis Motivation i for PCA came from major-axis regression. Strong assumption: single homogeneous sample. Free of assumptions when used for exploration. Classical tests of significance

More information

DATA MINING LECTURE 8. Dimensionality Reduction PCA -- SVD

DATA MINING LECTURE 8. Dimensionality Reduction PCA -- SVD DATA MINING LECTURE 8 Dimensionality Reduction PCA -- SVD The curse of dimensionality Real data usually have thousands, or millions of dimensions E.g., web documents, where the dimensionality is the vocabulary

More information

Introduction to Statistical Data Analysis Lecture 7: The Chi-Square Distribution

Introduction to Statistical Data Analysis Lecture 7: The Chi-Square Distribution Introduction to Statistical Data Analysis Lecture 7: The Chi-Square Distribution James V. Lambers Department of Mathematics The University of Southern Mississippi James V. Lambers Statistical Data Analysis

More information

Regression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables.

Regression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables. Regression Analysis BUS 735: Business Decision Making and Research 1 Goals of this section Specific goals Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate

More information

Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami

Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami Parametric Assumptions The observations must be independent. Dependent variable should be continuous

More information

Biplots in Practice MICHAEL GREENACRE. Professor of Statistics at the Pompeu Fabra University. Chapter 11 Offprint. Discriminant Analysis Biplots

Biplots in Practice MICHAEL GREENACRE. Professor of Statistics at the Pompeu Fabra University. Chapter 11 Offprint. Discriminant Analysis Biplots Biplots in Practice MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University Chapter 11 Offprint Discriminant Analysis Biplots First published: September 2010 ISBN: 978-84-923846-8-6 Supporting

More information

Computational Systems Biology: Biology X

Computational Systems Biology: Biology X Bud Mishra Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA L#7:(Mar-23-2010) Genome Wide Association Studies 1 The law of causality... is a relic of a bygone age, surviving, like the monarchy,

More information

REGRESSION, DISCRIMINANT ANALYSIS, AND CANONICAL CORRELATION ANALYSIS WITH HOMALS 1. MORALS

REGRESSION, DISCRIMINANT ANALYSIS, AND CANONICAL CORRELATION ANALYSIS WITH HOMALS 1. MORALS REGRESSION, DISCRIMINANT ANALYSIS, AND CANONICAL CORRELATION ANALYSIS WITH HOMALS JAN DE LEEUW ABSTRACT. It is shown that the homals package in R can be used for multiple regression, multi-group discriminant

More information

FINM 331: MULTIVARIATE DATA ANALYSIS FALL 2017 PROBLEM SET 3

FINM 331: MULTIVARIATE DATA ANALYSIS FALL 2017 PROBLEM SET 3 FINM 331: MULTIVARIATE DATA ANALYSIS FALL 2017 PROBLEM SET 3 The required files for all problems can be found in: http://www.stat.uchicago.edu/~lekheng/courses/331/hw3/ The file name indicates which problem

More information

15 Singular Value Decomposition

15 Singular Value Decomposition 15 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing

More information

INTERACTIVE CORRESPONDENCE ANALYSIS IN A DYNAMIC OBJECT-ORIENTED ENVIRONMENT. 1. Introduction

INTERACTIVE CORRESPONDENCE ANALYSIS IN A DYNAMIC OBJECT-ORIENTED ENVIRONMENT. 1. Introduction INTERACTIVE CORRESPONDENCE ANALYSIS IN A DYNAMIC OBJECT-ORIENTED ENVIRONMENT JASON BOND AND GEORGE MICHAILIDIS ABSTRACT. A highly interactive, user-friendly object-oriented software package written in

More information

CS60021: Scalable Data Mining. Dimensionality Reduction

CS60021: Scalable Data Mining. Dimensionality Reduction J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 1 CS60021: Scalable Data Mining Dimensionality Reduction Sourangshu Bhattacharya Assumption: Data lies on or near a

More information

How to analyze multiple distance matrices

How to analyze multiple distance matrices DISTATIS How to analyze multiple distance matrices Hervé Abdi & Dominique Valentin Overview. Origin and goal of the method DISTATIS is a generalization of classical multidimensional scaling (MDS see the

More information

Principal Component Analysis

Principal Component Analysis (in press, 00). Wiley Interdisciplinary Reviews: Computational Statistics, Principal Component Analysis Hervé Abdi Lynne J. Williams Abstract Principal component analysis (pca) is a multivariate technique

More information

Maximum variance formulation

Maximum variance formulation 12.1. Principal Component Analysis 561 Figure 12.2 Principal component analysis seeks a space of lower dimensionality, known as the principal subspace and denoted by the magenta line, such that the orthogonal

More information

Relate Attributes and Counts

Relate Attributes and Counts Relate Attributes and Counts This procedure is designed to summarize data that classifies observations according to two categorical factors. The data may consist of either: 1. Two Attribute variables.

More information

Testing Independence

Testing Independence Testing Independence Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM 1/50 Testing Independence Previously, we looked at RR = OR = 1

More information

Chapter 11 Canonical analysis

Chapter 11 Canonical analysis Chapter 11 Canonical analysis 11.0 Principles of canonical analysis Canonical analysis is the simultaneous analysis of two, or possibly several data tables. Canonical analyses allow ecologists to perform

More information

Chi-Squared Tests. Semester 1. Chi-Squared Tests

Chi-Squared Tests. Semester 1. Chi-Squared Tests Semester 1 Goodness of Fit Up to now, we have tested hypotheses concerning the values of population parameters such as the population mean or proportion. We have not considered testing hypotheses about

More information

Chapter 10. Chapter 10. Multinomial Experiments and. Multinomial Experiments and Contingency Tables. Contingency Tables.

Chapter 10. Chapter 10. Multinomial Experiments and. Multinomial Experiments and Contingency Tables. Contingency Tables. Chapter 10 Multinomial Experiments and Contingency Tables 1 Chapter 10 Multinomial Experiments and Contingency Tables 10-1 1 Overview 10-2 2 Multinomial Experiments: of-fitfit 10-3 3 Contingency Tables:

More information

Sensory Analysis, R tools to analyze Preference liking in function of descriptive measures.

Sensory Analysis, R tools to analyze Preference liking in function of descriptive measures. Sensory Analysis, R tools to analyze Preference liking in function of descriptive measures. Dhafer Malouche essai.academia.edu/dhafermalouche Center of Political Studies, Institute of Social Research University

More information

Biplots in Practice MICHAEL GREENACRE. Professor of Statistics at the Pompeu Fabra University. Chapter 6 Offprint

Biplots in Practice MICHAEL GREENACRE. Professor of Statistics at the Pompeu Fabra University. Chapter 6 Offprint Biplots in Practice MICHAEL GREENACRE Proessor o Statistics at the Pompeu Fabra University Chapter 6 Oprint Principal Component Analysis Biplots First published: September 010 ISBN: 978-84-93846-8-6 Supporting

More information

SCA vs. TCA: an Expertise

SCA vs. TCA: an Expertise SCA vs. TCA: an Expertise Sergio Camiz and Gastão Coelho Gomes 24/07/2015 1 Introduction The aim of this work is to compare both theoretically and in practice two exploratory methods whose aim is apparently

More information

ANOVA approach. Investigates interaction terms. Disadvantages: Requires careful sampling design with replication

ANOVA approach. Investigates interaction terms. Disadvantages: Requires careful sampling design with replication ANOVA approach Advantages: Ideal for evaluating hypotheses Ideal to quantify effect size (e.g., differences between groups) Address multiple factors at once Investigates interaction terms Disadvantages:

More information

Research Methodology: Tools

Research Methodology: Tools MSc Business Administration Research Methodology: Tools Applied Data Analysis (with SPSS) Lecture 05: Contingency Analysis March 2014 Prof. Dr. Jürg Schwarz Lic. phil. Heidi Bruderer Enzler Contents Slide

More information

1 Interpretation. Contents. Biplots, revisited. Biplots, revisited 2. Biplots, revisited 1

1 Interpretation. Contents. Biplots, revisited. Biplots, revisited 2. Biplots, revisited 1 Biplots, revisited 1 Biplots, revisited 2 1 Interpretation Biplots, revisited Biplots show the following quantities of a data matrix in one display: Slide 1 Ulrich Kohler kohler@wz-berlin.de Slide 3 the

More information

HOMOGENEITY ANALYSIS USING EUCLIDEAN MINIMUM SPANNING TREES 1. INTRODUCTION

HOMOGENEITY ANALYSIS USING EUCLIDEAN MINIMUM SPANNING TREES 1. INTRODUCTION HOMOGENEITY ANALYSIS USING EUCLIDEAN MINIMUM SPANNING TREES JAN DE LEEUW 1. INTRODUCTION In homogeneity analysis the data are a system of m subsets of a finite set of n objects. The purpose of the technique

More information

Probabilistic Latent Semantic Analysis

Probabilistic Latent Semantic Analysis Probabilistic Latent Semantic Analysis Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr

More information

CONVERSION OF COORDINATES BETWEEN FRAMES

CONVERSION OF COORDINATES BETWEEN FRAMES ECS 178 Course Notes CONVERSION OF COORDINATES BETWEEN FRAMES Kenneth I. Joy Institute for Data Analysis and Visualization Department of Computer Science University of California, Davis Overview Frames

More information

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007) FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NON-PARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter

More information

LOOKING FOR RELATIONSHIPS

LOOKING FOR RELATIONSHIPS LOOKING FOR RELATIONSHIPS One of most common types of investigation we do is to look for relationships between variables. Variables may be nominal (categorical), for example looking at the effect of an

More information

Matrices: 2.1 Operations with Matrices

Matrices: 2.1 Operations with Matrices Goals In this chapter and section we study matrix operations: Define matrix addition Define multiplication of matrix by a scalar, to be called scalar multiplication. Define multiplication of two matrices,

More information

Frequency Distribution Cross-Tabulation

Frequency Distribution Cross-Tabulation Frequency Distribution Cross-Tabulation 1) Overview 2) Frequency Distribution 3) Statistics Associated with Frequency Distribution i. Measures of Location ii. Measures of Variability iii. Measures of Shape

More information

Y (Nominal/Categorical) 1. Metric (interval/ratio) data for 2+ IVs, and categorical (nominal) data for a single DV

Y (Nominal/Categorical) 1. Metric (interval/ratio) data for 2+ IVs, and categorical (nominal) data for a single DV 1 Neuendorf Discriminant Analysis The Model X1 X2 X3 X4 DF2 DF3 DF1 Y (Nominal/Categorical) Assumptions: 1. Metric (interval/ratio) data for 2+ IVs, and categorical (nominal) data for a single DV 2. Linearity--in

More information

The following postestimation commands are of special interest after ca and camat: biplot of row and column points

The following postestimation commands are of special interest after ca and camat: biplot of row and column points Title stata.com ca postestimation Postestimation tools for ca and camat Description Syntax for predict Menu for predict Options for predict Syntax for estat Menu for estat Options for estat Remarks and

More information

Fall TMA4145 Linear Methods. Exercise set Given the matrix 1 2

Fall TMA4145 Linear Methods. Exercise set Given the matrix 1 2 Norwegian University of Science and Technology Department of Mathematical Sciences TMA445 Linear Methods Fall 07 Exercise set Please justify your answers! The most important part is how you arrive at an

More information

Distributed multilevel matrix completion for medical databases

Distributed multilevel matrix completion for medical databases Distributed multilevel matrix completion for medical databases Julie Josse Ecole Polytechnique, INRIA Séminaire Parisien de Statistique, IHP January 15, 2018 1 / 38 Overview 1 Introduction 2 Single imputation

More information

Principal Component Analysis for Mixed Quantitative and Qualitative Data

Principal Component Analysis for Mixed Quantitative and Qualitative Data Principal Component Analysis for Mixed Quantitative and Qualitative Data Susana Agudelo-Jaramillo Manuela Ochoa-Muñoz Tutor: Francisco Iván Zuluaga-Díaz EAFIT University Medelĺın-Colombia Research Practise

More information

The Principal Component Analysis

The Principal Component Analysis The Principal Component Analysis Philippe B. Laval KSU Fall 2017 Philippe B. Laval (KSU) PCA Fall 2017 1 / 27 Introduction Every 80 minutes, the two Landsat satellites go around the world, recording images

More information

A Multivariate Perspective

A Multivariate Perspective A Multivariate Perspective on the Analysis of Categorical Data Rebecca Zwick Educational Testing Service Ellijot M. Cramer University of North Carolina at Chapel Hill Psychological research often involves

More information

Consideration on Sensitivity for Multiple Correspondence Analysis

Consideration on Sensitivity for Multiple Correspondence Analysis Consideration on Sensitivity for Multiple Correspondence Analysis Masaaki Ida Abstract Correspondence analysis is a popular research technique applicable to data and text mining, because the technique

More information

Chemometrics. Matti Hotokka Physical chemistry Åbo Akademi University

Chemometrics. Matti Hotokka Physical chemistry Åbo Akademi University Chemometrics Matti Hotokka Physical chemistry Åbo Akademi University Linear regression Experiment Consider spectrophotometry as an example Beer-Lamberts law: A = cå Experiment Make three known references

More information

ColourMatrix: White Paper

ColourMatrix: White Paper ColourMatrix: White Paper Finding relationship gold in big data mines One of the most common user tasks when working with tabular data is identifying and quantifying correlations and associations. Fundamentally,

More information

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted

More information

MS-E2112 Multivariate Statistical Analysis (5cr) Lecture 6: Bivariate Correspondence Analysis - part II

MS-E2112 Multivariate Statistical Analysis (5cr) Lecture 6: Bivariate Correspondence Analysis - part II MS-E2112 Multivariate Statistical Analysis (5cr) Lecture 6: Bivariate Correspondence Analysis - part II the Contents the the the Independence The independence between variables x and y can be tested using.

More information

Assessing Sample Variability in the Visualization Techniques related to Principal Component Analysis : Bootstrap and Alternative Simulation Methods.

Assessing Sample Variability in the Visualization Techniques related to Principal Component Analysis : Bootstrap and Alternative Simulation Methods. Draft from : COMPSTAT, Physica Verlag, Heidelberg, 1996, Alberto Prats, Editor, p 205 210. Assessing Sample Variability in the Visualization Techniques related to Principal Component Analysis : Bootstrap

More information

arxiv: v1 [stat.me] 13 May 2016

arxiv: v1 [stat.me] 13 May 2016 Multiple Correspondence Analysis & the Multilogit Bilinear Model arxiv:1605.04212v1 [stat.me] 13 May 2016 William Fithian 1 and Julie Josse 2 1 Berkeley Statistics Department, USA 2 Agrocampus Ouest -

More information

Statistics Handbook. All statistical tables were computed by the author.

Statistics Handbook. All statistical tables were computed by the author. Statistics Handbook Contents Page Wilcoxon rank-sum test (Mann-Whitney equivalent) Wilcoxon matched-pairs test 3 Normal Distribution 4 Z-test Related samples t-test 5 Unrelated samples t-test 6 Variance

More information

Multidimensional Joint Graphical Display of Symmetric Analysis: Back to the Fundamentals

Multidimensional Joint Graphical Display of Symmetric Analysis: Back to the Fundamentals Multidimensional Joint Graphical Display of Symmetric Analysis: Back to the Fundamentals Shizuhiko Nishisato Abstract The basic premise of dual scaling/correspondence analysis lies in the simultaneous

More information

One-stage dose-response meta-analysis

One-stage dose-response meta-analysis One-stage dose-response meta-analysis Nicola Orsini, Alessio Crippa Biostatistics Team Department of Public Health Sciences Karolinska Institutet http://ki.se/en/phs/biostatistics-team 2017 Nordic and

More information

Notes for Math 324, Part 12

Notes for Math 324, Part 12 72 Notes for Math 324, Part 12 Chapter 12 Definition and main properties of probability 12.1 Sample space, events We run an experiment which can have several outcomes. The set consisting by all possible

More information

DESAIN EKSPERIMEN Analysis of Variances (ANOVA) Semester Genap 2017/2018 Jurusan Teknik Industri Universitas Brawijaya

DESAIN EKSPERIMEN Analysis of Variances (ANOVA) Semester Genap 2017/2018 Jurusan Teknik Industri Universitas Brawijaya DESAIN EKSPERIMEN Analysis of Variances (ANOVA) Semester Jurusan Teknik Industri Universitas Brawijaya Outline Introduction The Analysis of Variance Models for the Data Post-ANOVA Comparison of Means Sample

More information

4. Ordination in reduced space

4. Ordination in reduced space Université Laval Analyse multivariable - mars-avril 2008 1 4.1. Generalities 4. Ordination in reduced space Contrary to most clustering techniques, which aim at revealing discontinuities in the data, ordination

More information

STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression

STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression Rebecca Barter April 20, 2015 Fisher s Exact Test Fisher s Exact Test

More information

A Short Course in Basic Statistics

A Short Course in Basic Statistics A Short Course in Basic Statistics Ian Schindler November 5, 2017 Creative commons license share and share alike BY: C 1 Descriptive Statistics 1.1 Presenting statistical data Definition 1 A statistical

More information

Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

More information

Residual-based Shadings for Visualizing (Conditional) Independence

Residual-based Shadings for Visualizing (Conditional) Independence Residual-based Shadings for Visualizing (Conditional) Independence Achim Zeileis David Meyer Kurt Hornik http://www.ci.tuwien.ac.at/~zeileis/ Overview The independence problem in 2-way contingency tables

More information

DE CHAZAL DU MEE BUSINESS SCHOOL AUGUST 2003 MOCK EXAMINATIONS IOP 201-Q (INDUSTRIAL PSYCHOLOGICAL RESEARCH)

DE CHAZAL DU MEE BUSINESS SCHOOL AUGUST 2003 MOCK EXAMINATIONS IOP 201-Q (INDUSTRIAL PSYCHOLOGICAL RESEARCH) DE CHAZAL DU MEE BUSINESS SCHOOL AUGUST 003 MOCK EXAMINATIONS IOP 01-Q (INDUSTRIAL PSYCHOLOGICAL RESEARCH) Time: hours READ THE INSTRUCTIONS BELOW VERY CAREFULLY. Do not open this question paper until

More information

CPSC 340: Machine Learning and Data Mining. More PCA Fall 2017

CPSC 340: Machine Learning and Data Mining. More PCA Fall 2017 CPSC 340: Machine Learning and Data Mining More PCA Fall 2017 Admin Assignment 4: Due Friday of next week. No class Monday due to holiday. There will be tutorials next week on MAP/PCA (except Monday).

More information

1 Overview. Conguence: Congruence coefficient, R V -coefficient, and Mantel coefficient. Hervé Abdi

1 Overview. Conguence: Congruence coefficient, R V -coefficient, and Mantel coefficient. Hervé Abdi In Neil Salkind (Ed.), Encyclopedia of Research Design. Thousand Oaks, CA: Sage. 2010 Conguence: Congruence coefficient, R V -coefficient, and Mantel coefficient Hervé Abdi 1 Overview The congruence between

More information

Linear Algebra Methods for Data Mining

Linear Algebra Methods for Data Mining Linear Algebra Methods for Data Mining Saara Hyvönen, Saara.Hyvonen@cs.helsinki.fi Spring 2007 1. Basic Linear Algebra Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki Example

More information

1 Introduction. 2 A regression model

1 Introduction. 2 A regression model Regression Analysis of Compositional Data When Both the Dependent Variable and Independent Variable Are Components LA van der Ark 1 1 Tilburg University, The Netherlands; avdark@uvtnl Abstract It is well

More information