Principal Component Analysis for Mixed Quantitative and Qualitative Data

Size: px

Start display at page:

Download "Principal Component Analysis for Mixed Quantitative and Qualitative Data"

Erin Reynolds
5 years ago
Views:

Ochoa-Muñoz Tutor: Francisco Iván Zuluaga-Díaz EAFIT

1 Principal Component Analysis for Mixed Quantitative and Qualitative Data Susana Agudelo-Jaramillo Manuela Ochoa-Muñoz Tutor: Francisco Iván Zuluaga-Díaz EAFIT University Medelĺın-Colombia Research Practise April 12th, 2016

2 Mixed Quantitative and Qualitative Data Principal Component Analysis for Mixed Quantitative and Qualitative Data 1/23

3 Correspondence Analysis It is a graphical technique to represent information of a contingency table with two inputs, which contains the count of elements for crossclassification of two categorical variables. These tables are based on two qualitative nominal or ordinal variables where categories of one variable appear in rows and other variable categories are represented in columns [de la Fuente Fernández, 2011]. Correspondence analysis can be useful to identify categories that are similar, which therefore can be combined. Principal Component Analysis for Mixed Quantitative and Qualitative Data 2/23

4 Example The following example illustrated how a quantification matrix works for a sample of 12 people and 4 categorical variables. Figure 1: Categories for the four variables taken from [Rencher, 1934] Principal Component Analysis for Mixed Quantitative and Qualitative Data 3/23

5 Example Figure 2: List of 12 people and their categories on four variables taken from [Rencher, 1934] Principal Component Analysis for Mixed Quantitative and Qualitative Data 4/23

6 Example Figure 3: Correspondence analysis of the four variables Principal Component Analysis for Mixed Quantitative and Qualitative Data 5/23

7 Indicator Matrix S ij = 1 if object i belongs to the category of the variable j 0 if object i does not belong to the category of the variable j Figure 4: Indicator matrix G for the data taken from [Rencher, 1934] Principal Component Analysis for Mixed Quantitative and Qualitative Data 6/23

8 Burt Matrix From the indicator matrix G we can get the G G matrix known as the Burt matrix. Figure 5: Burt Matrix G G for the matrix G taken from [Rencher, 1934] Principal Component Analysis for Mixed Quantitative and Qualitative Data 7/23

9 Burt Matrix In the diagonal blocks appear matrices containing the marginal frequencies of each of the variables analyzed. Outside the diagonal appear contingency tables of frequencies corresponding to all combinations 2 to 2 of the variables analyzed. Figure 6: Part of the contingency tables for variables Gender and Age Principal Component Analysis for Mixed Quantitative and Qualitative Data 8/23

10 Quantification Matrices Quantification matrices transform qualitative data into components which facilitates the analysis of results. The idea of using quantification matrices is to define correlation coefficients. The quantification matrices are used to measure similarity and dissimilarity between the objects respect to a variable. Principal Component Analysis for Mixed Quantitative and Qualitative Data 9/23

11 Quantification Matrix G j G j The elements of the quantification matrix G j G j are given by: S ii j = 1 if object i and object i belong to the same category 0 if object i and object i belong to different category S ii j it is a measure of similarity between sample objects i and i in terms of a particular variable j. The frequency categories and the number of categories are not taken into account in this measure of similarity [Kiers, 1989]. Principal Component Analysis for Mixed Quantitative and Qualitative Data 10/23

12 Quantification Matrix G j G j Table 1: Quantification matrix GG of hair color variable Hair Color Principal Component Analysis for Mixed Quantitative and Qualitative Data 11/23

13 Example Figure 7: List of 12 people and their categories on four variables taken from [Rencher, 1934] Principal Component Analysis for Mixed Quantitative and Qualitative Data 12/23

14 Quantification Matrix G j (G j G j) 1 G j In this case Burt matrix inverted is added: Table 2: Burt matrix inverted of hair color variable Hair Color Blond Brown Black Red Blond Brown Black Red Principal Component Analysis for Mixed Quantitative and Qualitative Data 13/23

15 Quantification Matrix G j (G j G j) 1 G j The elements of the quantification matrix G j (G j G j) 1 G j are given by: S ii j = f 1 g if object i and object i belong to the same category 0 if object i and object i belong to different category where f 1 g is the g th diagonal element of (G j G j) 1 [Kiers, 1989]. Principal Component Analysis for Mixed Quantitative and Qualitative Data 14/23

16 Quantification Matrix G j (G j G j) 1 G j Table 3: Quantification matrix G(G G) 1 G of hair color variable Hair Color Principal Component Analysis for Mixed Quantitative and Qualitative Data 15/23

17 Example Figure 8: List of 12 people and their categories on four variables taken from [Rencher, 1934] Principal Component Analysis for Mixed Quantitative and Qualitative Data 16/23

18 Quantification Matrix JG j (G j G j) 1 G j J Here the J matrix is added: J=I n 11 n where I n is the identity matrix, 1 is an ones vector and n is the sample size. Principal Component Analysis for Mixed Quantitative and Qualitative Data 17/23

19 Quantification Matrix JG j (G j G j) 1 G j J Table 4: J matrix J Matrix Principal Component Analysis for Mixed Quantitative and Qualitative Data 18/23

20 Quantification Matrix JG j (G j G j) 1 G j J This quantification matrix is a normalized version of the χ 2 measure. Where χ 2 = 0 if variables are statistically independent [Kiers, 1989]. The elements of the quantification matrix JG j (G j G j) 1 G j J are given by: S ii j = f 1 g n 1 if object i and object i belong to the same category n 1 if object i and object i belong to different category Principal Component Analysis for Mixed Quantitative and Qualitative Data 19/23

21 Quantification Matrix JG j (G j G j) 1 G j J Table 5: Quantification matrix JG(G G) 1 G J of hair color variable Hair Color Principal Component Analysis for Mixed Quantitative and Qualitative Data 20/23

22 Example Figure 9: List of 12 people and their categories on four variables taken from [Rencher, 1934] Principal Component Analysis for Mixed Quantitative and Qualitative Data 21/23

23 References de la Fuente Fernández, S. (2011). Análisis correspondencias simples y múltiples. Universidad Autónoma de Madrid, pages 1 9. Kiers, H. (1989). Three-way methods for the analysis of qualitative and quantitative two-way data. PhD thesis. Rencher, A. C. (1934). Methods of Multivariate Analysis. Wiley Series in Probability and Statistics. Principal Component Analysis for Mixed Quantitative and Qualitative Data 22/23

24 THANKS FOR YOUR ATTENTION Principal Component Analysis for Mixed Quantitative and Qualitative Data 23/23

Principal Component Analysis for Mixed Quantitative and Qualitative Data

1 Principal Component Analysis for Mixed Quantitative and Qualitative Data Susana Agudelo-Jaramillo sagudel9@eafit.edu.co Manuela Ochoa-Muñoz mochoam@eafit.edu.co Francisco Iván Zuluaga-Díaz fzuluag2@eafit.edu.co