Correspondence Analysis
|
|
- Solomon Curtis
- 5 years ago
- Views:
Transcription
1 Correspondence Analysis Julie Josse, François Husson, Sébastien Lê Applied Mathematics Department, Agrocampus Ouest user-2008 Dortmund, August 11th / 22
2 History Theoretical principles: Fisher (1940) Correspondence Analysis has been actively developed in in Rennes! JP. Benzécri: mathematician and linguist PhD thesis of his student B. Escofier : Correspondence Analysis The beginning of the "French school" 2 / 22
3 CA in the R packages anacor (de Leeuw and mair) ca (Nenadic and Greenacre) ade4 (Chessel) vegan (Dixon) homals (de Leeuw) FactoMineR (Husson et al.) 3 / 22
4 Data, examples Two categorical variables contingency table. Symmetric role of the rows and the columns Examples: examples where a χ 2 test can be applied text-mining: number of times the word i is in the text j solutions (acid, bitter, etc.) - answers (acid, bitter, etc.): number of persons who answer j for the stimulus i perfumes - descriptors: number of times the descriptor j is used to describe the perfume i 4 / 22
5 Notations Figure: Data table in CA. 5 / 22
6 Notations Figure: Row profile and column profile. 6 / 22
7 Aim Rows typology Columns typology Relationship between these two typologies Study the relationship (the correspondence) between the two variables, the gap to independence Visualize the association between levels 7 / 22
8 Example 12 perfumes described by 39 words: floral fruity strong soft light... Angel Aromatics Elixir Chanel Cinéma Coco Mademoiselle / 22
9 Chi-square: Intensity of the relationship χ 2 = ij = n ij (n ij n i. n.j /n) 2, n i. n.j /n (f ij f i. f.j ) 2 f i. f.j, = nφ 2. φ 2 is the intensity of the relationship Chi-square test (Pearson): χ 2 obs χ2 (I 1) (J 1) χ 2 = 615.8, (p-value = 1.7e-56) Highly significant obs 9 / 22
10 Nature of the relationship Contribution to the chi-square: x 2 ij = (n ij n i. n.j /n) 2 n i. n.j /n Contribution of each cell, contribution of each row, contribution of each column? Residuals (positive or negative association): x ij = (n ij n i. n.j /n) ni. n.j /n CA: visualize the residuals matrix X (the gap to independence) As usual, the association structure of X is revealed using the SVD 10 / 22
11 Total inertia Total inertia = φ 2 = trace(xx ) = χ2 n φ 2 = χ2 n = ij (f ij f i. f.j ) 2 f i. f.j φ 2 = ij f.j ( fij f.j f i. ) 2 f i. Similarly: φ 2 = χ2 n = ij f i. ( fij f i. f.j ) 2 f.j 11 / 22
12 Total inertia explained via the profile For the row profile: Weight for the columns Associated weight of the rows i φ 2 = ij f i. f ij f i. = f.j f i. ( fij f i. f.j ) 2 f.j Total inertia = weighted sum of squared distances of the rows profile to the average profile the weight of the row profile is its mass f i. and the squared distance is an Euclidean distance where each squared difference is divided by the corresponding average value f.j 12 / 22
13 χ 2 -distance the row i is a point in R J (with the weight f i. ) d 2 (i, l) = j 1 f.j ( fij f ) 2 lj f i. f l. the column j is a point in R I (with the weight f.j ) d 2 (j, h) = i 1 f i. ( fij f ) 2 ih f.j f.h These χ 2 -distances enjoy good properties (distributional equivalence principle) 13 / 22
14 CA: different formulations In factorial analysis such as PCA, we are looking for dimension which represent in the better way the variability between individuals (i.e the distance to the barycenter), in CA we are looking for dimensions which better represent the gap to independence. Classical presentation of CA: two weighted PCA on "row profile" and on "column profil" Canonical Analysis on the two indicator matrices 14 / 22
15 Link between the two representations: transition formulae F s (i) = 1 f ij G s (j) λs f i. Row i is at the barycenter of the weighted columns (with a scale 1/ λ s ) j G s (k) = 1 f ij F s (i) λs f.j Column k is at the barycenter of the weighted rows (with a scale 1/ λ s ) i 15 / 22
16 Graphical representation CA factor map vanilla Dim 2 (21.12%) sugary Lolita Lempika Cinéma L_instant fruity soft J_adore light Coco Pure Mademoiselle Poison J_adore_et discrete floral fresh Pleasures acid wooded Angel spicy soap strong Aromatics ShalimarElixir old Chanel agressive Dim 1 (60.46%) 16 / 22
17 Graphical representation CA factor map Dim 2 (21.12%) Lolita Lempika Cinéma L_instant J_adore Coco Pure Mademoiselle Poison J_adore_et Pleasures Angel Chanel 5 Aromatics ShalimarElixir Dim 1 (60.46%) 17 / 22
18 Graphical representation CA factor map Dim 2 (21.12%) sugary Lolita Lempika Cinéma L_instant young fruity soft J_adore light vanilla hot wooded Coco Pure Mademoiselle Poison discreet forest J_adore_et floral fresh rose shampoo nature Pleasures shower.gel candy Angel oriental acid spicy vegetable lemon woman soap heavy intense peppery agressive heady drugs strong eau.de.cologne Aromatics ShalimarElixir male alcohol powerful old toilets Chanel 5 amber musky Dim 1 (60.46%) 18 / 22
19 Graphical representation in CA: remarks The barycenter represents the independence The distance between levels of a same variable can be interpreted Representation provided are pseudo-barycentric (dilatation): transition formulae It is not possible to interpret the distance between levels of the two variables but it is at a weighted barycenter of all the levels 19 / 22
20 Helps to interpret Supplementary informations can be added (zero weight)! Percentage of variance for each axis: information brought by λ the dimension s, but, it is interesting to have a look at the s λs eigenvalues. λ s always smaller than 1; the value 1 is obtained for exclusive associations. library(factominer) don=diag(5) a=ca(don) a$eig don=matrix(1,5,5) a=ca(don) a$eig 20 / 22
21 Helps to interpret The maximum number of axes is min(i, J) 1 Quality of the representation: cos 2 Contribution: inertia of a point total inertia = f i.f s(i) 2 λ s. Be careful, extreme points are not those which contribute the most to the dimension 21 / 22
22 Practice library(factominer) perfume = read.table("perfume.txt",header=t,sep="\t",row.names=1) res.ca = CA(perfume,col.sup=16:39) plot(res.ca,invisible="row") plot(res.ca,invisible=c("col","col.sup")) res.ca$eig barplot(res.ca$eig[,1],main="eigenvalues",names.arg=1:nrow(res.ca$eig)) res.ca$row$coord res.ca$row$cos2 res.ca$row$contrib res.ca$col$coord res.ca$col$cos2 res.ca$col$contrib 22 / 22
Correspondence Analysis (CA)
Correspondence Analysis (CA) François Husson & Magalie Houée-Bigot Department o Applied Mathematics - Rennes Agrocampus husson@agrocampus-ouest.r / 43 Correspondence Analysis (CA) Data 2 Independence model
More informationBootstrap-Based Regularization for Low-Rank Matrix Estimation
Journal of Machine Learning Research 17 (016) 1-9 Submitted 1/14; Revised 3/16; Published 8/16 Bootstrap-Based Regularization for Low-Rank Matrix Estimation Julie Josse josse@agrocampus-ouest.fr Department
More informationHandling missing values in Multiple Factor Analysis
Handling missing values in Multiple Factor Analysis François Husson, Julie Josse Applied mathematics department, Agrocampus Ouest, Rennes, France Rabat, 26 March 2014 1 / 10 Multi-blocks data set Groups
More informationLecture 2: Data Analytics of Narrative
Lecture 2: Data Analytics of Narrative Data Analytics of Narrative: Pattern Recognition in Text, and Text Synthesis, Supported by the Correspondence Analysis Platform. This Lecture is presented in three
More informationMissing values imputation for mixed data based on principal component methods
Missing values imputation for mixed data based on principal component methods Vincent Audigier, François Husson & Julie Josse Agrocampus Rennes Compstat' 2012, Limassol (Cyprus), 28-08-2012 1 / 21 A real
More informationCorrespondence analysis
C Correspondence Analysis Hervé Abdi 1 and Michel Béra 2 1 School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX, USA 2 Centre d Étude et de Recherche en Informatique
More informationEvaluation of scoring index with different normalization and distance measure with correspondence analysis
Evaluation of scoring index with different normalization and distance measure with correspondence analysis Anders Nilsson Master Thesis in Statistics 15 ECTS Spring Semester 2010 Supervisors: Professor
More informationSimple and Canonical Correspondence Analysis Using the R Package anacor
Simple and Canonical Correspondence Analysis Using the R Package anacor Jan de Leeuw University of California, Los Angeles Patrick Mair WU Wirtschaftsuniversität Wien Abstract This paper presents the R
More informationMultivariate Statistics Fundamentals Part 1: Rotation-based Techniques
Multivariate Statistics Fundamentals Part 1: Rotation-based Techniques A reminded from a univariate statistics courses Population Class of things (What you want to learn about) Sample group representing
More informationTOTAL INERTIA DECOMPOSITION IN TABLES WITH A FACTORIAL STRUCTURE
Statistica Applicata - Italian Journal of Applied Statistics Vol. 29 (2-3) 233 doi.org/10.26398/ijas.0029-012 TOTAL INERTIA DECOMPOSITION IN TABLES WITH A FACTORIAL STRUCTURE Michael Greenacre 1 Universitat
More informationRegularized PCA to denoise and visualise data
Regularized PCA to denoise and visualise data Marie Verbanck Julie Josse François Husson Laboratoire de statistique, Agrocampus Ouest, Rennes, France CNAM, Paris, 16 janvier 2013 1 / 30 Outline 1 PCA 2
More informationTopic 21 Goodness of Fit
Topic 21 Goodness of Fit Contingency Tables 1 / 11 Introduction Two-way Table Smoking Habits The Hypothesis The Test Statistic Degrees of Freedom Outline 2 / 11 Introduction Contingency tables, also known
More informationDiscriminant Correspondence Analysis
Discriminant Correspondence Analysis Hervé Abdi Overview As the name indicates, discriminant correspondence analysis (DCA) is an extension of discriminant analysis (DA) and correspondence analysis (CA).
More informationarxiv: v4 [stat.co] 8 Dec 2017
Multivariate Analysis of Mixed Data: The R Package PCAmixdata Marie Chavent,2, Vanessa Kuentz-Simonet 3, Amaury Labenne 3, Jérôme Saracco 2,4 December, 207 arxiv:4.49v4 [stat.co] 8 Dec 207 Université de
More informationExtended Mosaic and Association Plots for Visualizing (Conditional) Independence. Achim Zeileis David Meyer Kurt Hornik
Extended Mosaic and Association Plots for Visualizing (Conditional) Independence Achim Zeileis David Meyer Kurt Hornik Overview The independence problem in -way contingency tables Standard approach: χ
More informationVisualizing Independence Using Extended Association and Mosaic Plots. Achim Zeileis David Meyer Kurt Hornik
Visualizing Independence Using Extended Association and Mosaic Plots Achim Zeileis David Meyer Kurt Hornik Overview The independence problem in 2-way contingency tables Standard approach: χ 2 test Alternative
More informationGeneralizing Partial Least Squares and Correspondence Analysis to Predict Categorical (and Heterogeneous) Data
Generalizing Partial Least Squares and Correspondence Analysis to Predict Categorical (and Heterogeneous) Data Hervé Abdi, Derek Beaton, & Gilbert Saporta CARME 2015, Napoli, September 21-23 1 Outline
More informationCorrespondence Analysis
STATGRAPHICS Rev. 7/6/009 Correspondence Analysis The Correspondence Analysis procedure creates a map of the rows and columns in a two-way contingency table for the purpose of providing insights into the
More informationmissmda: a package to handle missing values in Multivariate exploratory Data Analysis methods
missmda: a package to handle missing values in Multivariate exploratory Data Analysis methods Julie Josse & Francois Husson Applied Mathematics Department Agrocampus Rennes - France user! 2011, Warwick,
More informationAudio transcription of the Principal Component Analysis course
Audio transcription of the Principal Component Analysis course Part I. Data - Practicalities Slides 1 to 8 Pages 2 to 5 Part II. Studying individuals and variables Slides 9 to 26 Pages 6 to 13 Part III.
More informationPrincipal Component Analysis
Principal Component Analysis Giorgos Korfiatis Alfa-Informatica University of Groningen Seminar in Statistics and Methodology, 2007 What Is PCA? Dimensionality reduction technique Aim: Extract relevant
More informationMultiple Correspondence Analysis
Multiple Correspondence Analysis 18 Up to now we have analysed the association between two categorical variables or between two sets of categorical variables where the row variables are different from
More information6. Let C and D be matrices conformable to multiplication. Then (CD) =
Quiz 1. Name: 10 points per correct answer. (20 points for attendance). 1. Let A = 3 and B = [3 yy]. When is A equal to B? xx A. When x = 3 B. When y = 3 C. When x = y D. Never 2. See 1. What is the dimension
More informationMICHAEL GREENACRE 1. Department of Economics and Business, Universitat Pompeu Fabra, Ramon Trias Fargas, 25-27, Barcelona, Spain
Ecology, 91(4), 2010, pp. 958 963 Ó 2010 by the Ecological Society of America Correspondence analysis of raw data MICHAEL GREENACRE 1 Department of Economics and Business, Universitat Pompeu Fabra, Ramon
More informationAlgebra of Principal Component Analysis
Algebra of Principal Component Analysis 3 Data: Y = 5 Centre each column on its mean: Y c = 7 6 9 y y = 3..6....6.8 3. 3.8.6 Covariance matrix ( variables): S = -----------Y n c ' Y 8..6 c =.6 5.8 Equation
More information13.1 Categorical Data and the Multinomial Experiment
Chapter 13 Categorical Data Analysis 13.1 Categorical Data and the Multinomial Experiment Recall Variable: (numerical) variable (i.e. # of students, temperature, height,). (non-numerical, categorical)
More informationCorrespondence Analysis
Correspondence Analysis Q: when independence of a 2-way contingency table is rejected, how to know where the dependence is coming from? The interaction terms in a GLM contain dependence information; however,
More informationMATH Notebook 3 Spring 2018
MATH448001 Notebook 3 Spring 2018 prepared by Professor Jenny Baglivo c Copyright 2010 2018 by Jenny A. Baglivo. All Rights Reserved. 3 MATH448001 Notebook 3 3 3.1 One Way Layout........................................
More informationMultiple Correspondence Analysis of Subsets of Response Categories
Multiple Correspondence Analysis of Subsets of Response Categories Michael Greenacre 1 and Rafael Pardo 2 1 Departament d Economia i Empresa Universitat Pompeu Fabra Ramon Trias Fargas, 25-27 08005 Barcelona.
More informationStatistics 3858 : Contingency Tables
Statistics 3858 : Contingency Tables 1 Introduction Before proceeding with this topic the student should review generalized likelihood ratios ΛX) for multinomial distributions, its relation to Pearson
More informationStatistics for Managers Using Microsoft Excel
Statistics for Managers Using Microsoft Excel 7 th Edition Chapter 1 Chi-Square Tests and Nonparametric Tests Statistics for Managers Using Microsoft Excel 7e Copyright 014 Pearson Education, Inc. Chap
More informationSingular Value Decompsition
Singular Value Decompsition Massoud Malek One of the most useful results from linear algebra, is a matrix decomposition known as the singular value decomposition It has many useful applications in almost
More informationCorrespondence Analysis & Related Methods
Correspondence Analysis & Related Methods Michael Greenacre SESSION 9: CA applied to rankings, preferences & paired comparisons Correspondence analysis (CA) can also be applied to other types of data:
More informationDimensionality Reduction: PCA. Nicholas Ruozzi University of Texas at Dallas
Dimensionality Reduction: PCA Nicholas Ruozzi University of Texas at Dallas Eigenvalues λ is an eigenvalue of a matrix A R n n if the linear system Ax = λx has at least one non-zero solution If Ax = λx
More informationJournal of Statistical Software
JSS Journal of Statistical Software April 2011, Volume 40, Issue 11. http://www.jstatsoft.org/ Simultaneous Analysis in S-PLUS: The SimultAn Package Amaya Zárraga Universidad del País Vasco/ Euskal Herriko
More informationCORRESPONDENCE ANALYSIS: INTRODUCTION OF A SECOND DIMENSIONS TO DESCRIBE PREFERENCES IN CONJOINT ANALYSIS.
CORRESPONDENCE ANALYSIS: INTRODUCTION OF A SECOND DIENSIONS TO DESCRIBE PREFERENCES IN CONJOINT ANALYSIS. Anna Torres Lacomba Universidad Carlos III de adrid ABSTRACT We show the euivalence of using correspondence
More informationMultidimensional Scaling in R: SMACOF
Multidimensional Scaling in R: SMACOF Patrick Mair Institute for Statistics and Mathematics WU Vienna University of Economics and Business Jan de Leeuw Department of Statistics University of California,
More informationThe goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions.
The goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions. A common problem of this type is concerned with determining
More informationPrincipal component analysis
Principal component analysis Motivation i for PCA came from major-axis regression. Strong assumption: single homogeneous sample. Free of assumptions when used for exploration. Classical tests of significance
More informationDATA MINING LECTURE 8. Dimensionality Reduction PCA -- SVD
DATA MINING LECTURE 8 Dimensionality Reduction PCA -- SVD The curse of dimensionality Real data usually have thousands, or millions of dimensions E.g., web documents, where the dimensionality is the vocabulary
More informationIntroduction to Statistical Data Analysis Lecture 7: The Chi-Square Distribution
Introduction to Statistical Data Analysis Lecture 7: The Chi-Square Distribution James V. Lambers Department of Mathematics The University of Southern Mississippi James V. Lambers Statistical Data Analysis
More informationRegression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables.
Regression Analysis BUS 735: Business Decision Making and Research 1 Goals of this section Specific goals Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate
More informationParametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami
Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami Parametric Assumptions The observations must be independent. Dependent variable should be continuous
More informationBiplots in Practice MICHAEL GREENACRE. Professor of Statistics at the Pompeu Fabra University. Chapter 11 Offprint. Discriminant Analysis Biplots
Biplots in Practice MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University Chapter 11 Offprint Discriminant Analysis Biplots First published: September 2010 ISBN: 978-84-923846-8-6 Supporting
More informationComputational Systems Biology: Biology X
Bud Mishra Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA L#7:(Mar-23-2010) Genome Wide Association Studies 1 The law of causality... is a relic of a bygone age, surviving, like the monarchy,
More informationREGRESSION, DISCRIMINANT ANALYSIS, AND CANONICAL CORRELATION ANALYSIS WITH HOMALS 1. MORALS
REGRESSION, DISCRIMINANT ANALYSIS, AND CANONICAL CORRELATION ANALYSIS WITH HOMALS JAN DE LEEUW ABSTRACT. It is shown that the homals package in R can be used for multiple regression, multi-group discriminant
More informationFINM 331: MULTIVARIATE DATA ANALYSIS FALL 2017 PROBLEM SET 3
FINM 331: MULTIVARIATE DATA ANALYSIS FALL 2017 PROBLEM SET 3 The required files for all problems can be found in: http://www.stat.uchicago.edu/~lekheng/courses/331/hw3/ The file name indicates which problem
More information15 Singular Value Decomposition
15 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing
More informationINTERACTIVE CORRESPONDENCE ANALYSIS IN A DYNAMIC OBJECT-ORIENTED ENVIRONMENT. 1. Introduction
INTERACTIVE CORRESPONDENCE ANALYSIS IN A DYNAMIC OBJECT-ORIENTED ENVIRONMENT JASON BOND AND GEORGE MICHAILIDIS ABSTRACT. A highly interactive, user-friendly object-oriented software package written in
More informationCS60021: Scalable Data Mining. Dimensionality Reduction
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 1 CS60021: Scalable Data Mining Dimensionality Reduction Sourangshu Bhattacharya Assumption: Data lies on or near a
More informationHow to analyze multiple distance matrices
DISTATIS How to analyze multiple distance matrices Hervé Abdi & Dominique Valentin Overview. Origin and goal of the method DISTATIS is a generalization of classical multidimensional scaling (MDS see the
More informationPrincipal Component Analysis
(in press, 00). Wiley Interdisciplinary Reviews: Computational Statistics, Principal Component Analysis Hervé Abdi Lynne J. Williams Abstract Principal component analysis (pca) is a multivariate technique
More informationMaximum variance formulation
12.1. Principal Component Analysis 561 Figure 12.2 Principal component analysis seeks a space of lower dimensionality, known as the principal subspace and denoted by the magenta line, such that the orthogonal
More informationRelate Attributes and Counts
Relate Attributes and Counts This procedure is designed to summarize data that classifies observations according to two categorical factors. The data may consist of either: 1. Two Attribute variables.
More informationTesting Independence
Testing Independence Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM 1/50 Testing Independence Previously, we looked at RR = OR = 1
More informationChapter 11 Canonical analysis
Chapter 11 Canonical analysis 11.0 Principles of canonical analysis Canonical analysis is the simultaneous analysis of two, or possibly several data tables. Canonical analyses allow ecologists to perform
More informationChi-Squared Tests. Semester 1. Chi-Squared Tests
Semester 1 Goodness of Fit Up to now, we have tested hypotheses concerning the values of population parameters such as the population mean or proportion. We have not considered testing hypotheses about
More informationChapter 10. Chapter 10. Multinomial Experiments and. Multinomial Experiments and Contingency Tables. Contingency Tables.
Chapter 10 Multinomial Experiments and Contingency Tables 1 Chapter 10 Multinomial Experiments and Contingency Tables 10-1 1 Overview 10-2 2 Multinomial Experiments: of-fitfit 10-3 3 Contingency Tables:
More informationSensory Analysis, R tools to analyze Preference liking in function of descriptive measures.
Sensory Analysis, R tools to analyze Preference liking in function of descriptive measures. Dhafer Malouche essai.academia.edu/dhafermalouche Center of Political Studies, Institute of Social Research University
More informationBiplots in Practice MICHAEL GREENACRE. Professor of Statistics at the Pompeu Fabra University. Chapter 6 Offprint
Biplots in Practice MICHAEL GREENACRE Proessor o Statistics at the Pompeu Fabra University Chapter 6 Oprint Principal Component Analysis Biplots First published: September 010 ISBN: 978-84-93846-8-6 Supporting
More informationSCA vs. TCA: an Expertise
SCA vs. TCA: an Expertise Sergio Camiz and Gastão Coelho Gomes 24/07/2015 1 Introduction The aim of this work is to compare both theoretically and in practice two exploratory methods whose aim is apparently
More informationANOVA approach. Investigates interaction terms. Disadvantages: Requires careful sampling design with replication
ANOVA approach Advantages: Ideal for evaluating hypotheses Ideal to quantify effect size (e.g., differences between groups) Address multiple factors at once Investigates interaction terms Disadvantages:
More informationResearch Methodology: Tools
MSc Business Administration Research Methodology: Tools Applied Data Analysis (with SPSS) Lecture 05: Contingency Analysis March 2014 Prof. Dr. Jürg Schwarz Lic. phil. Heidi Bruderer Enzler Contents Slide
More information1 Interpretation. Contents. Biplots, revisited. Biplots, revisited 2. Biplots, revisited 1
Biplots, revisited 1 Biplots, revisited 2 1 Interpretation Biplots, revisited Biplots show the following quantities of a data matrix in one display: Slide 1 Ulrich Kohler kohler@wz-berlin.de Slide 3 the
More informationHOMOGENEITY ANALYSIS USING EUCLIDEAN MINIMUM SPANNING TREES 1. INTRODUCTION
HOMOGENEITY ANALYSIS USING EUCLIDEAN MINIMUM SPANNING TREES JAN DE LEEUW 1. INTRODUCTION In homogeneity analysis the data are a system of m subsets of a finite set of n objects. The purpose of the technique
More informationProbabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationCONVERSION OF COORDINATES BETWEEN FRAMES
ECS 178 Course Notes CONVERSION OF COORDINATES BETWEEN FRAMES Kenneth I. Joy Institute for Data Analysis and Visualization Department of Computer Science University of California, Davis Overview Frames
More informationCHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)
FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NON-PARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter
More informationLOOKING FOR RELATIONSHIPS
LOOKING FOR RELATIONSHIPS One of most common types of investigation we do is to look for relationships between variables. Variables may be nominal (categorical), for example looking at the effect of an
More informationMatrices: 2.1 Operations with Matrices
Goals In this chapter and section we study matrix operations: Define matrix addition Define multiplication of matrix by a scalar, to be called scalar multiplication. Define multiplication of two matrices,
More informationFrequency Distribution Cross-Tabulation
Frequency Distribution Cross-Tabulation 1) Overview 2) Frequency Distribution 3) Statistics Associated with Frequency Distribution i. Measures of Location ii. Measures of Variability iii. Measures of Shape
More informationY (Nominal/Categorical) 1. Metric (interval/ratio) data for 2+ IVs, and categorical (nominal) data for a single DV
1 Neuendorf Discriminant Analysis The Model X1 X2 X3 X4 DF2 DF3 DF1 Y (Nominal/Categorical) Assumptions: 1. Metric (interval/ratio) data for 2+ IVs, and categorical (nominal) data for a single DV 2. Linearity--in
More informationThe following postestimation commands are of special interest after ca and camat: biplot of row and column points
Title stata.com ca postestimation Postestimation tools for ca and camat Description Syntax for predict Menu for predict Options for predict Syntax for estat Menu for estat Options for estat Remarks and
More informationFall TMA4145 Linear Methods. Exercise set Given the matrix 1 2
Norwegian University of Science and Technology Department of Mathematical Sciences TMA445 Linear Methods Fall 07 Exercise set Please justify your answers! The most important part is how you arrive at an
More informationDistributed multilevel matrix completion for medical databases
Distributed multilevel matrix completion for medical databases Julie Josse Ecole Polytechnique, INRIA Séminaire Parisien de Statistique, IHP January 15, 2018 1 / 38 Overview 1 Introduction 2 Single imputation
More informationPrincipal Component Analysis for Mixed Quantitative and Qualitative Data
Principal Component Analysis for Mixed Quantitative and Qualitative Data Susana Agudelo-Jaramillo Manuela Ochoa-Muñoz Tutor: Francisco Iván Zuluaga-Díaz EAFIT University Medelĺın-Colombia Research Practise
More informationThe Principal Component Analysis
The Principal Component Analysis Philippe B. Laval KSU Fall 2017 Philippe B. Laval (KSU) PCA Fall 2017 1 / 27 Introduction Every 80 minutes, the two Landsat satellites go around the world, recording images
More informationA Multivariate Perspective
A Multivariate Perspective on the Analysis of Categorical Data Rebecca Zwick Educational Testing Service Ellijot M. Cramer University of North Carolina at Chapel Hill Psychological research often involves
More informationConsideration on Sensitivity for Multiple Correspondence Analysis
Consideration on Sensitivity for Multiple Correspondence Analysis Masaaki Ida Abstract Correspondence analysis is a popular research technique applicable to data and text mining, because the technique
More informationChemometrics. Matti Hotokka Physical chemistry Åbo Akademi University
Chemometrics Matti Hotokka Physical chemistry Åbo Akademi University Linear regression Experiment Consider spectrophotometry as an example Beer-Lamberts law: A = cå Experiment Make three known references
More informationColourMatrix: White Paper
ColourMatrix: White Paper Finding relationship gold in big data mines One of the most common user tasks when working with tabular data is identifying and quantifying correlations and associations. Fundamentally,
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationMS-E2112 Multivariate Statistical Analysis (5cr) Lecture 6: Bivariate Correspondence Analysis - part II
MS-E2112 Multivariate Statistical Analysis (5cr) Lecture 6: Bivariate Correspondence Analysis - part II the Contents the the the Independence The independence between variables x and y can be tested using.
More informationAssessing Sample Variability in the Visualization Techniques related to Principal Component Analysis : Bootstrap and Alternative Simulation Methods.
Draft from : COMPSTAT, Physica Verlag, Heidelberg, 1996, Alberto Prats, Editor, p 205 210. Assessing Sample Variability in the Visualization Techniques related to Principal Component Analysis : Bootstrap
More informationarxiv: v1 [stat.me] 13 May 2016
Multiple Correspondence Analysis & the Multilogit Bilinear Model arxiv:1605.04212v1 [stat.me] 13 May 2016 William Fithian 1 and Julie Josse 2 1 Berkeley Statistics Department, USA 2 Agrocampus Ouest -
More informationStatistics Handbook. All statistical tables were computed by the author.
Statistics Handbook Contents Page Wilcoxon rank-sum test (Mann-Whitney equivalent) Wilcoxon matched-pairs test 3 Normal Distribution 4 Z-test Related samples t-test 5 Unrelated samples t-test 6 Variance
More informationMultidimensional Joint Graphical Display of Symmetric Analysis: Back to the Fundamentals
Multidimensional Joint Graphical Display of Symmetric Analysis: Back to the Fundamentals Shizuhiko Nishisato Abstract The basic premise of dual scaling/correspondence analysis lies in the simultaneous
More informationOne-stage dose-response meta-analysis
One-stage dose-response meta-analysis Nicola Orsini, Alessio Crippa Biostatistics Team Department of Public Health Sciences Karolinska Institutet http://ki.se/en/phs/biostatistics-team 2017 Nordic and
More informationNotes for Math 324, Part 12
72 Notes for Math 324, Part 12 Chapter 12 Definition and main properties of probability 12.1 Sample space, events We run an experiment which can have several outcomes. The set consisting by all possible
More informationDESAIN EKSPERIMEN Analysis of Variances (ANOVA) Semester Genap 2017/2018 Jurusan Teknik Industri Universitas Brawijaya
DESAIN EKSPERIMEN Analysis of Variances (ANOVA) Semester Jurusan Teknik Industri Universitas Brawijaya Outline Introduction The Analysis of Variance Models for the Data Post-ANOVA Comparison of Means Sample
More information4. Ordination in reduced space
Université Laval Analyse multivariable - mars-avril 2008 1 4.1. Generalities 4. Ordination in reduced space Contrary to most clustering techniques, which aim at revealing discontinuities in the data, ordination
More informationSTAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression
STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression Rebecca Barter April 20, 2015 Fisher s Exact Test Fisher s Exact Test
More informationA Short Course in Basic Statistics
A Short Course in Basic Statistics Ian Schindler November 5, 2017 Creative commons license share and share alike BY: C 1 Descriptive Statistics 1.1 Presenting statistical data Definition 1 A statistical
More informationMultivariate Analysis of Ecological Data
Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology
More informationResidual-based Shadings for Visualizing (Conditional) Independence
Residual-based Shadings for Visualizing (Conditional) Independence Achim Zeileis David Meyer Kurt Hornik http://www.ci.tuwien.ac.at/~zeileis/ Overview The independence problem in 2-way contingency tables
More informationDE CHAZAL DU MEE BUSINESS SCHOOL AUGUST 2003 MOCK EXAMINATIONS IOP 201-Q (INDUSTRIAL PSYCHOLOGICAL RESEARCH)
DE CHAZAL DU MEE BUSINESS SCHOOL AUGUST 003 MOCK EXAMINATIONS IOP 01-Q (INDUSTRIAL PSYCHOLOGICAL RESEARCH) Time: hours READ THE INSTRUCTIONS BELOW VERY CAREFULLY. Do not open this question paper until
More informationCPSC 340: Machine Learning and Data Mining. More PCA Fall 2017
CPSC 340: Machine Learning and Data Mining More PCA Fall 2017 Admin Assignment 4: Due Friday of next week. No class Monday due to holiday. There will be tutorials next week on MAP/PCA (except Monday).
More information1 Overview. Conguence: Congruence coefficient, R V -coefficient, and Mantel coefficient. Hervé Abdi
In Neil Salkind (Ed.), Encyclopedia of Research Design. Thousand Oaks, CA: Sage. 2010 Conguence: Congruence coefficient, R V -coefficient, and Mantel coefficient Hervé Abdi 1 Overview The congruence between
More informationLinear Algebra Methods for Data Mining
Linear Algebra Methods for Data Mining Saara Hyvönen, Saara.Hyvonen@cs.helsinki.fi Spring 2007 1. Basic Linear Algebra Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki Example
More information1 Introduction. 2 A regression model
Regression Analysis of Compositional Data When Both the Dependent Variable and Independent Variable Are Components LA van der Ark 1 1 Tilburg University, The Netherlands; avdark@uvtnl Abstract It is well
More information